Support for Delim Join/Get Relations #91
@EpsilonPrime, following up on our conversations, I managed to implement a version of the duplicate eliminated operators that works with the Reference Relations. Basically, the Duplicate eliminated Get looks like this:
The input is the subtree that is duplicate eliminated from the duplicate eliminated join, together with the column IDs of the returned columns that are duplicate eliminated from the input. A reference relation is also used on the duplicate eliminated join. The relevant cc: @ianmcook
This PR introduces support for generating and consuming query plans with flattening subquery operators (i.e., delim joins and gets). It also now supports a full TPC-H roundtrip through DuckDB.
It's important to note that this is still highly experimental and depends on changes to the `algebra.proto` file (feat: add operators to support duplicate eliminated joins substrait#695).

Regarding the substrait changes, we need to add a `JOIN_TYPE_MARK` to the `JoinRel` and add two new relations, `DelimGetRel` and `DelimJoinRel`. `DelimGetRel` mainly has to store the types of the delim columns.
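To make the shape of `DelimGetRel` more concrete, here is a minimal Python sketch. It is not the protobuf definition from the substrait PR: the class name `DelimGetRelSketch` and the field name `delim_types` are hypothetical, and the only thing taken from the description above is that the relation stores the types of the delim columns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DelimGetRelSketch:
    """Hypothetical stand-in for DelimGetRel: it only records delim column types."""

    # One type name per duplicate eliminated column (assumed representation).
    delim_types: List[str] = field(default_factory=list)

    def column_count(self) -> int:
        # The number of columns the relation would expose to its consumer.
        return len(self.delim_types)


get_rel = DelimGetRelSketch(delim_types=["INTEGER", "VARCHAR"])
```

In this sketch a consumer would learn the output schema from `delim_types` alone, which matches the idea that the get side carries no data source of its own.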
On the other hand, `DelimJoinRel` basically has the same attributes as a `JoinRel`. The difference is that it adds the `duplicate_eliminated_columns` and an optimization indicating whether the delim is flipped or not.
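The general idea behind a duplicate eliminated join can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation in this PR: `delim_get`, `delim_join`, and the sample data are hypothetical names, rows are plain dictionaries, the join is an inner equi-join on the delim columns, and the flipped variant is not modelled.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def delim_get(left: List[Row], delim_columns: List[str]) -> List[Row]:
    """Return the distinct combinations of the delim columns, in first-seen order."""
    seen = set()
    distinct = []
    for row in left:
        key = tuple(row[c] for c in delim_columns)
        if key not in seen:
            seen.add(key)
            distinct.append({c: row[c] for c in delim_columns})
    return distinct


def delim_join(
    left: List[Row],
    delim_columns: List[str],
    subquery: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Inner-join `left` with a subquery that is evaluated on deduplicated keys."""
    # The subquery only sees the distinct keys, not every row of the left side.
    right = subquery(delim_get(left, delim_columns))
    index: Dict[tuple, List[Row]] = {}
    for r in right:
        index.setdefault(tuple(r[c] for c in delim_columns), []).append(r)
    out = []
    for l in left:
        for r in index.get(tuple(l[c] for c in delim_columns), []):
            out.append({**l, **r})
    return out


orders = [
    {"cust": 1, "amount": 10},
    {"cust": 1, "amount": 30},
    {"cust": 2, "amount": 5},
]
keys_seen = []  # records how many keys each subquery call received


def max_amount_per_customer(keys: List[Row]) -> List[Row]:
    keys_seen.append(len(keys))
    return [
        {
            "cust": k["cust"],
            "max_amount": max(o["amount"] for o in orders if o["cust"] == k["cust"]),
        }
        for k in keys
    ]


result = delim_join(orders, ["cust"], max_amount_per_customer)
```

Here the subquery runs once over two distinct customers rather than once per order row, while the result still contains one joined row per original order.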
Besides all that, it was also necessary to add two join types, namely `JOIN_TYPE_RIGHT_SEMI` and `JOIN_TYPE_RIGHT_ANTI`.
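For readers less familiar with these join types, the following Python sketch illustrates the usual semantics of right semi, right anti, and mark joins on single-column keys. The function names are hypothetical, NULL handling is deliberately ignored, and nothing here reflects how the PR implements them.

```python
from typing import List, Tuple


def right_semi_join(left_keys: List[int], right_keys: List[int]) -> List[int]:
    """Keep the right rows that have at least one match on the left."""
    left_set = set(left_keys)
    return [k for k in right_keys if k in left_set]


def right_anti_join(left_keys: List[int], right_keys: List[int]) -> List[int]:
    """Keep the right rows that have no match on the left."""
    left_set = set(left_keys)
    return [k for k in right_keys if k not in left_set]


def mark_join(left_keys: List[int], right_keys: List[int]) -> List[Tuple[int, bool]]:
    """Keep every left row and attach a boolean marker: does a match exist?"""
    right_set = set(right_keys)
    return [(k, k in right_set) for k in left_keys]


left = [1, 2, 3]
right = [2, 3, 3, 4]

semi = right_semi_join(left, right)
anti = right_anti_join(left, right)
marked = mark_join(left, right)
```

The semi and anti variants partition the right input between them, whereas the mark join never drops a left row and only annotates it.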
cc @EpsilonPrime @ianmcook