Remove unneeded metadata read during update event generation #11829
@@ -475,10 +475,14 @@ public void commit() {
    }
  }

  Object updateEvent(Snapshot committedSnapshot) {
PendingUpdate.updateEvent is currently only used in SnapshotProducer, so we could change the interface directly to updateEvent(Snapshot committedSnapshot), but I did not want to make a backwards-incompatible API change.
The whole update event/listener functionality seems to have been untouched for years.
SnapshotProducer is package-private, so I think we're OK in terms of backwards compatibility, since no public API is being broken.
avoids unnecessary iceberg metadata read of just committed snapshot
Thanks @grantatspothero, I agree with the core idea of this change: derive the update event from the committed snapshot, so there is no need to potentially read the whole metadata again right after the commit. I just had some comments on the implementation; it'd also be ideal to have some tests which verify that the produced event has the expected properties.
There's a broader question about the need for the listener API, since I think these days, at least for the commit path, the commit report sent to REST implementations has all those details. But there are probably legitimate use cases for non-REST cases or even just generic patterns (sending events to a queue or whatnot). The interface is pretty straightforward and lightweight, and users can have whatever complexity they want in their own implementations.
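To make the "sending events to a queue" pattern concrete, here is a minimal sketch. The `Listener` interface, `SnapshotEvent`, and `QueueingListener` below are simplified, hypothetical stand-ins written for illustration, not the project's actual classes; the only assumption is that a listener exposes a single callback per event.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueListenerSketch {
  // Hypothetical stand-in for a listener contract: one callback per event.
  interface Listener<E> {
    void notify(E event);
  }

  // Hypothetical, simplified event carrying only two fields.
  static final class SnapshotEvent {
    final String tableName;
    final long snapshotId;

    SnapshotEvent(String tableName, long snapshotId) {
      this.tableName = tableName;
      this.snapshotId = snapshotId;
    }
  }

  // Forwards every event to a queue; a consumer elsewhere can drain it.
  static final class QueueingListener<E> implements Listener<E> {
    private final BlockingQueue<E> queue;

    QueueingListener(BlockingQueue<E> queue) {
      this.queue = queue;
    }

    @Override
    public void notify(E event) {
      // offer() does not block; on a bounded queue it returns false when full.
      queue.offer(event);
    }
  }

  public static void main(String[] args) {
    BlockingQueue<SnapshotEvent> queue = new LinkedBlockingQueue<>();
    Listener<SnapshotEvent> listener = new QueueingListener<>(queue);
    listener.notify(new SnapshotEvent("db.tbl", 42L));
    System.out.println("queued events: " + queue.size());
  }
}
```

All of the complexity (batching, retries, delivery to an external system) would live in the user's own listener implementation, which is the point made above about keeping the interface lightweight.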
ValidationException.check(
    snapshotId == committedSnapshot.snapshotId(),
    "Committed snapshotId %s does not match expected snapshotId %s",
    committedSnapshot.snapshotId(),
    snapshotId);
long sequenceNumber = committedSnapshot.sequenceNumber();
Can we uplevel this logic? It's common to all the implementations.
long snapshotId = snapshotId();
Snapshot justSaved = ops().refresh().snapshot(snapshotId);
long sequenceNumber = TableMetadata.INVALID_SEQUENCE_NUMBER;
Map<String, String> summary;
if (justSaved == null) {
  // The snapshot just saved may not be present if the latest metadata couldn't be loaded due to
  // eventual
  // consistency problems in refresh.
  LOG.warn("Failed to load committed snapshot: omitting sequence number from notifications");
  summary = summary();
} else {
  sequenceNumber = justSaved.sequenceNumber();
  summary = justSaved.summary();
}

return new CreateSnapshotEvent(tableName, operation(), snapshotId, sequenceNumber, summary);
ValidationException.check(
    snapshotId == committedSnapshot.snapshotId(),
    "Committed snapshotId %s does not match expected snapshotId %s",
    committedSnapshot.snapshotId(),
    snapshotId);
Do we really need the validation? I feel like the principle of this change is that the update event that is produced is always going to be derived from the passed-in committed snapshot. I think passing committedSnapshot.snapshotId() to the event suffices.
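As a rough illustration of that suggestion, the sketch below builds the event purely from the snapshot that is passed in, with no separate ID to validate against. `CommittedSnapshot` and `SnapshotEvent` are hypothetical, simplified stand-ins (not the real `Snapshot` or `CreateSnapshotEvent` classes), and the field set is an assumption based on the constructor arguments shown in the diff.

```java
import java.util.Collections;
import java.util.Map;

public class EventFromSnapshotSketch {
  // Hypothetical stand-in for the in-memory snapshot returned by a commit.
  static final class CommittedSnapshot {
    final long snapshotId;
    final long sequenceNumber;
    final Map<String, String> summary;

    CommittedSnapshot(long snapshotId, long sequenceNumber, Map<String, String> summary) {
      this.snapshotId = snapshotId;
      this.sequenceNumber = sequenceNumber;
      this.summary = summary;
    }
  }

  // Hypothetical stand-in for the event handed to listeners.
  static final class SnapshotEvent {
    final String tableName;
    final String operation;
    final long snapshotId;
    final long sequenceNumber;
    final Map<String, String> summary;

    SnapshotEvent(String tableName, String operation, long snapshotId,
                  long sequenceNumber, Map<String, String> summary) {
      this.tableName = tableName;
      this.operation = operation;
      this.snapshotId = snapshotId;
      this.sequenceNumber = sequenceNumber;
      this.summary = summary;
    }
  }

  // Every snapshot-specific field comes from the committed snapshot itself,
  // so there is nothing left to cross-check.
  static SnapshotEvent updateEvent(String tableName, String operation, CommittedSnapshot committed) {
    return new SnapshotEvent(
        tableName, operation, committed.snapshotId, committed.sequenceNumber, committed.summary);
  }

  public static void main(String[] args) {
    CommittedSnapshot committed =
        new CommittedSnapshot(42L, 7L, Collections.singletonMap("added-data-files", "1"));
    SnapshotEvent event = updateEvent("db.tbl", "append", committed);
    System.out.println(event.snapshotId + " @ sequence " + event.sequenceNumber);
  }
}
```

A test along these lines (asserting that the event's ID, sequence number, and summary equal those of the committed snapshot) would also cover the earlier request for tests on the produced event's properties.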
Follow-up from this PR: #10523
The above PR removed unnecessary object store reads after commit, but there was one I missed.
SnapshotProducer.notifyListeners has the same problem: it reads metadata from the object store instead of just reading the in-memory committed Snapshot object.
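The difference between the two paths can be sketched with a fake that counts metadata reads. `FakeOps` is a hypothetical stand-in for table operations, its `refresh()` simulates one object store read, and the `-1` sentinel is a placeholder chosen for this sketch; none of this is the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class RefreshCountSketch {
  // Hypothetical stand-in for table operations backed by an object store.
  static final class FakeOps {
    int refreshCalls = 0;
    private final Map<Long, Long> sequenceBySnapshotId = new HashMap<>();

    void commit(long snapshotId, long sequenceNumber) {
      sequenceBySnapshotId.put(snapshotId, sequenceNumber);
    }

    // Each call models one metadata read from the object store.
    Map<Long, Long> refresh() {
      refreshCalls++;
      return sequenceBySnapshotId;
    }
  }

  // Old path: re-read metadata after the commit, then look the snapshot up.
  static long sequenceViaRefresh(FakeOps ops, long snapshotId) {
    Long sequenceNumber = ops.refresh().get(snapshotId);
    return sequenceNumber == null ? -1L : sequenceNumber; // -1 is a placeholder sentinel
  }

  // New path: the caller already holds the committed snapshot's values in memory.
  static long sequenceFromCommitted(long committedSequenceNumber) {
    return committedSequenceNumber;
  }

  public static void main(String[] args) {
    FakeOps ops = new FakeOps();
    ops.commit(42L, 7L);

    long viaRefresh = sequenceViaRefresh(ops, 42L);
    System.out.println("via refresh: " + viaRefresh + ", reads so far: " + ops.refreshCalls);

    long inMemory = sequenceFromCommitted(7L);
    System.out.println("in memory: " + inMemory + ", reads so far: " + ops.refreshCalls);
  }
}
```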