-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize conflict do update (upserts) on Hypercore TAM #7537
Optimize conflict do update (upserts) on Hypercore TAM #7537
Conversation
/* Is it guaranteed that "slot" is populated with the conflicting segment | ||
* tuple? Otherwise we need to fetch the compressed segment... Unclear if | ||
* this is ever needed. */ | ||
if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure this fetch is needed, hence the comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is guaranteed.
- Before calling this function in
ExecOnConflictUpdate
,table_tuple_lock
will be called with the TID of the conflicting tuple. - Since you have a check for
!is_compressed_tid(ctid)
above, we only need to consider theis_compressed_tid
path inhypercore_tuple_lock
. - This branch in
hypercore_tuple_lock
fetches the child slot for the slot passed in and callsheapam_tuple_lock
on the compressed relation with the TID for the compressed tuple part of the conflicting TID. heapam_tuple_lock
does a lot of voodoo magic to fetch the heap tuple, but ends withExecStorePinnedBufferHeapTuple(tuple, slot, buffer)
which stores the locked tuple in the slot.- Later in
ExecOnConflictUpdate
this function is called, which will then contain the compressed tuple with the conflicting tuple.
It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h
:
* Lock a tuple in the specified mode.
.
.
.
* Output parameters:
* *slot: contains the target tuple
* *tmfd: filled in failure cases (see below)
So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.
6d839eb
to
5565ec3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7537 +/- ##
==========================================
+ Coverage 80.06% 82.24% +2.17%
==========================================
Files 190 230 +40
Lines 37181 43326 +6145
Branches 9450 10898 +1448
==========================================
+ Hits 29770 35634 +5864
- Misses 2997 3374 +377
+ Partials 4414 4318 -96 ☔ View full report in Codecov by Sentry. |
5565ec3
to
b4db81e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few suggestions for improvements to the comments, and I think you can remove the statement you have a comment about, but since neither of these is a reason to block a merge, I approve the pull request.
src/nodes/hypertable_modify.c
Outdated
if (ts_is_hypercore_am(resultRelInfo->ri_RelationDesc->rd_rel->relam)) | ||
{ | ||
ItemPointerData new_tid; | ||
int ntuples = | ||
ts_cm_functions->hypercore_decompress_conflict_segment(resultRelInfo->ri_RelationDesc, | ||
conflictTid, | ||
existing, | ||
context->estate->es_snapshot, | ||
&new_tid); | ||
|
||
if (ntuples > 0) | ||
{ | ||
/* | ||
* The conflicting row was decompressed, so must update the | ||
* conflictTid to point to the decompressed row. | ||
*/ | ||
ItemPointerCopy(&new_tid, conflictTid); | ||
/* | ||
* Since data was decompressed, the command counter was | ||
* incremented to make it visible. Make sure the executor uses the | ||
* latest command ID to see the changes. | ||
*/ | ||
context->estate->es_output_cid = GetCurrentCommandId(true); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just repeating our discussion to clarify why we need this.
The standard ExecOnConflictUpdate
first calls table_tuple_lock
with the TID of the tuple that conflicts so in theory we should be able to decompress the batch on a call to lock the tuple. Unfortunately, this do not work since that would change the TID from a compressed TID to an non-compressed TID, and the rest of ExecOnConflictUpdate
assumes that it does not change.
So, unfortunately, it seems like we need this code for now.
/* Is it guaranteed that "slot" is populated with the conflicting segment | ||
* tuple? Otherwise we need to fetch the compressed segment... Unclear if | ||
* this is ever needed. */ | ||
if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is guaranteed.
- Before calling this function in
ExecOnConflictUpdate
,table_tuple_lock
will be called with the TID of the conflicting tuple. - Since you have a check for
!is_compressed_tid(ctid)
above, we only need to consider theis_compressed_tid
path inhypercore_tuple_lock
. - This branch in
hypercore_tuple_lock
fetches the child slot for the slot passed in and callsheapam_tuple_lock
on the compressed relation with the TID for the compressed tuple part of the conflicting TID. heapam_tuple_lock
does a lot of voodoo magic to fetch the heap tuple, but ends withExecStorePinnedBufferHeapTuple(tuple, slot, buffer)
which stores the locked tuple in the slot.- Later in
ExecOnConflictUpdate
this function is called, which will then contain the compressed tuple with the conflicting tuple.
It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h
:
* Lock a tuple in the specified mode.
.
.
.
* Output parameters:
* *slot: contains the target tuple
* *tmfd: filled in failure cases (see below)
So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.
/* | ||
* Decompress a segment that contains the row given by ctid. | ||
* | ||
* This function is typically called during an upsert (ON CONFLICT DO UPDATE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it is only called in this situation.
* This function is typically called during an upsert (ON CONFLICT DO UPDATE), | |
* This function is called during an upsert (ON CONFLICT DO UPDATE), |
7792505
to
a86bc2b
Compare
When a conflict-do-update statement executes on a Hypercore TAM table, the conflicting TID (and slot) can point to a "row" in a compressed segment. Therefore, it is possible to directly decompress the conflicting segment before executing the update statement. In contrast, the existing on-conflict handling for compression doesn't have a unique index that points to the conflicting tuple, so it first needs to scan for a conflicting segment on the compressed chunk, which might, in some cases, be a full sequence scan on the compressed chunk, and decompression of the segments that might conflict. On a synthetic test case, the improvements to upserts are significant, about 32x faster execution time for a single row upsert, while buffers read is reduced by 2220x. This makes the upsert performance on par with that of non-compressed hypertables.
a86bc2b
to
a81884b
Compare
When a conflict-do-update statement executes on a Hypercore TAM table, the conflicting TID (and slot) can point to a "row" in a compressed segment. Therefore, it is possible to directly decompress the conflicting segment before executing the update statement.
In contrast, the existing on-conflict handling for compression doesn't have a unique index that points to the conflicting tuple, so it first needs to scan for a conflicting segment on the compressed chunk, which might, in some cases, be a full sequence scan on the compressed chunk, and decompression of the segments that might conflict.
On a synthetic test case, the improvements to upserts are significant, about 32x faster execution time for a single row upsert, while buffers read is reduced by 2220x. This makes the upsert performance on par with that of non-compressed hypertables.
Disable-check: force-changelog-file
Disable-check: approval-count