-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relationships selected in SQL-based datastores now elide columns that have static values #2096
Conversation
d6104da
to
44f2cbb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. What I saw looked good but I wouldn't mind having another set of eyes on it.
} | ||
|
||
// QueryRelationships queries relationships for the given query and transaction. | ||
func QueryRelationships[R Rows, C ~map[string]any](ctx context.Context, queryInfo QueryInfo, sqlStatement string, args []any, span trace.Span, tx Querier[R], withIntegrity bool) (datastore.RelationshipIterator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the ~
mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~T means the set of all types whose underlying type is T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you did something like type MyType map[string]any
, doing ~map[string]any
will also accept that and not just explicitly map[string]any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is exactly what we do
colIntegrityKeyID, | ||
colIntegrityHash, | ||
colTimestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For myself: this was moved into extraFields
on the common.SchemaInformation struct.
type wrappedTX struct { | ||
tx querier | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What role does this play?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It translates the interface
internal/datastore/common/sql.go
Outdated
ColCaveatName string | ||
ColCaveatContext string | ||
PaginationFilterType PaginationFilterType | ||
PlaceholderFolder sq.PlaceholderFormat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is this in here because this allows you to push more logic up into the common sql logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
var resourceObjectType string | ||
var resourceObjectID string | ||
var relation string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean anything that these declarations moved out of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means they are on the heap, which isn't good, but it also means we don't need to recalculate them every iteration, which is good
pkg/datastore/test/relationships.go
Outdated
@@ -1573,10 +1573,7 @@ func ConcurrentWriteSerializationTest(t *testing.T, tester DatastoreTester) { | |||
<-waitToFinish | |||
return err | |||
}) | |||
if err != nil { | |||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this is unnecessary when we're using MustBugF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a test, so I changed it to just return the error
@@ -562,17 +654,57 @@ func (tqs QueryExecutor) ExecuteQuery( | |||
limit = *queryOpts.Limit | |||
} | |||
|
|||
toExecute := query.limit(limit) | |||
if limit < math.MaxInt64 { | |||
query = query.limit(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this conditional just to reduce what's going over the wire?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, yes
44f2cbb
to
aa6f111
Compare
Updated |
b45f0a7
to
65b4e07
Compare
aaad075
to
334d8ea
Compare
334d8ea
to
228953f
Compare
228953f
to
a904d41
Compare
a904d41
to
75efa5a
Compare
96bcf17
to
d98d7d2
Compare
d98d7d2
to
a833e25
Compare
Rebased |
|
||
// StaticValueOrAddColumnForSelect adds a column to the list of columns to select if the value | ||
// is not static, otherwise it sets the value to the static value. | ||
func StaticValueOrAddColumnForSelect(colsToSelect []any, queryInfo QueryInfo, colName string, field *string) []any { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these []any
? aren't columns always []string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are actually string reference, and it returns an []any
because that's what the rows.Scan
requires.
This isn't adding the static value to the slice, its adding the reference to the string value for collecting the non-static value
// and injects additional proxies for validation at test time. | ||
// NOTE: These additional proxies are not performant for use in production (but then, | ||
// neither is memdb) | ||
func NewMemDBDatastoreForTesting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep this in the memdb
package? This just feels hard to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want the memdb package to have requirements on any of the common SQL code, since it is an odd mixture
internal/datastore/common/sql.go
Outdated
|
||
// WithAdditionalFilter returns a new SchemaQueryFilterer with an additional filter applied to the query. | ||
func (sqf SchemaQueryFilterer) WithAdditionalFilter(filter func(original sq.SelectBuilder) sq.SelectBuilder) SchemaQueryFilterer { | ||
return SchemaQueryFilterer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, what about a Clone()
method (with a test), and then modify the clone? Then you can't accidentally omit a field in these With
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. This is also why I want to have the linter check these, but I'll add a Clone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since any filter adjusts the underlying query builder, I've changed it to return the same object
|
||
colsToSelect = append(colsToSelect, &expiration) | ||
|
||
if withIntegrity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make sense to have determine this from queryinfo instead of a seperate boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are injected at different places. To add it to QueryInfo, we'd need a Clone or a WithIntegrity(bool) on QueryInfo. I'm happy to make that change, if you like
} | ||
|
||
// QueryRelationships queries relationships for the given query and transaction. | ||
func QueryRelationships[R Rows, C ~map[string]any](ctx context.Context, queryInfo QueryInfo, sqlStatement string, args []any, span trace.Span, tx Querier[R], withIntegrity bool) (datastore.RelationshipIterator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be simpler to have a QueryRelationshipsNoElision
(or whatever name) insted of all of the queryInfo.Schema.ColumnOptimization == ColumnOptimizationOptionNone
throughout. Then the query constructors can just pick the implementation on startup and not check at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all intended to be temporary until we end the experiment and remove the flag. I can change it to use a different query function as you suggest, but I'm not sure its worth the effort for a temp flag
@@ -580,19 +702,73 @@ func (tqs QueryExecutor) ExecuteQuery( | |||
limit = *queryOpts.Limit | |||
} | |||
|
|||
toExecute := query.limit(limit) | |||
if limit < math.MaxInt64 { | |||
query = query.limit(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the impact of no longer always having a limit clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, nothing. It just means sending less bytes over the wire
internal/datastore/common/sql.go
Outdated
// Set the column names to select. | ||
columnNamesToSelect := make([]string, 0, 8+len(query.extraFields)) | ||
|
||
columnNamesToSelect = checkColumn(columnNamesToSelect, query.schema.ColumnOptimization, query.filteringColumnTracker, query.schema.ColNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic is essentially replicating the logic in QueryRelationships - there's no way to unify those two things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for spanner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easily; it is because of Spanner that its hard to do: Spanner needs its own executor code (it can't use the one relationships.go because it has different handling of types for caveats and expiration), but by putting the column selection here, we do share at least that portion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can column selection be factored out and called from both places? this seems easy to get out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could push it down to the use sites, but it would be fairly similar.
Out of sync is fairly evident: every single read test immediately fails
@@ -417,7 +435,24 @@ type querier interface { | |||
QueryContext(context.Context, string, ...interface{}) (*sql.Rows, error) | |||
} | |||
|
|||
func newMySQLExecutor(tx querier) common.ExecuteQueryFunc { | |||
type wrappedTX struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it wrapped? can we have a better name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It rewrites it to match the interface expected by the common executor. Any suggestions? wrappedForCommon
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah anything that will indicate the reason for the wrapping, like withCommonTx
or asCommonTx
or commonTx
0d16bd1
to
e0692f6
Compare
internal/datastore/common/sql.go
Outdated
} | ||
|
||
toExecute.queryBuilder = toExecute.queryBuilder.From(from) | ||
columnNamesToSelect = append(columnNamesToSelect, b.Schema.ColExpiration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a way to skip expiration fields if we know the rel can't have expiration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to do it a followup PR, but I'm happy to do so here. Let me know which you prefer
Updated |
|
||
// QueryRelationships queries relationships for the given query and transaction. | ||
func QueryRelationships[R Rows, C ~map[string]any](ctx context.Context, builder RelationshipsQueryBuilder, span trace.Span, tx Querier[R]) (datastore.RelationshipIterator, error) { | ||
defer span.End() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the caller be responsible to close the span then?
@@ -200,6 +200,33 @@ func newCRDBDatastore(ctx context.Context, url string, options ...Option) (datas | |||
return nil, fmt.Errorf("invalid head migration found for cockroach: %w", err) | |||
} | |||
|
|||
relTableName := tableTuple | |||
if config.withIntegrity { | |||
relTableName = tableTupleWithIntegrity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated, but out of curiosity: why was integrity support added as a new table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to support it in a way that didn't add the overhead of the columns to every caller, so we used a distinct table. Now that this change is going in, we could in theory move the normal rels table to support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it could be a maintenance burden down the line
internal/graph/check.go
Outdated
directSubjectOrWildcardCanHaveCaveats := false | ||
directSubjectOrWildcardCanHaveExpiration := false | ||
|
||
nonTerminalsCanHaveCaveats := false | ||
nonTerminalsCanHaveExpiration := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't defaulting to false fail-open? Assuming there is nothing detecting this down the line, let's say there is a bug, and we miss setting this to true somewhere:
- in the case of
false
, the system would elide caveats/expiration, leading to a CVE - in the case of
true
, the query will be slower, but the subproblem will be computed correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I flipped the logic using a counter... it reads a bit oddly though, so let me know what you think
internal/graph/check.go
Outdated
@@ -624,6 +651,56 @@ func (cc *ConcurrentChecker) checkComputedUserset(ctx context.Context, crc curre | |||
return combineResultWithFoundResources(result, membershipSet) | |||
} | |||
|
|||
// queryOptionsForArrowRelation returns query options such as SkipCaveats and SkipExpiration if *none* of the subject | |||
// types of the given relation support caveats or expiration. | |||
func (cc *ConcurrentChecker) queryOptionsForArrowRelation(ctx context.Context, reader datastore.Reader, namespaceName string, relationName string) ([]options.QueryOptionsOption, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add unit tests to this for this method?
// types of the given relation support caveats or expiration. | ||
func (cc *ConcurrentChecker) queryOptionsForArrowRelation(ctx context.Context, reader datastore.Reader, namespaceName string, relationName string) ([]options.QueryOptionsOption, error) { | ||
// TODO(jschorr): Change to use the type system once we wire it through Check dispatch. | ||
nsDefs, err := reader.LookupNamespacesWithNames(ctx, []string{namespaceName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this add a new roundtrip in the critical path? 😬 Any potential gains by eliding columns would be thwarted by this new extra query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be cached since the namespace was already accessed for the check previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, you are right. But we don't have anything asserting this behavior, which seems pretty important and would impact if for whatever reason it regresses.
columns that have static values This vastly reduces data over the wire, as well as deserialization time and memory usage
This will allow us to centrally register additional datastore validation that only runs at test time
This validation test acts as a proxy in the memdb testing datastore and validates that the column elision code (which *isn't* used in memdb) matches the static fields to the values returned for all relationships loaded
This moves the behavior out of Spanner datastore and into a common lib where possible
…sabled or the relationship cannot be marked as expiring
Also adds a datastore test to ensure the constructed cursor operates as expected
9777544
to
6793f2b
Compare
Updated |
- Have QueryRelationships add events to the parent span directly - Cleanup CRDB system time handling and add debug-time assertions for "as of system time" - Additional testing
6793f2b
to
f71404b
Compare
This vastly reduces data over the wire, as well as deserialization time and memory usage
Fixes #59
Fixes #1527