Adding new rewrite manifest spark action to accept custom partition o… #11881

Open
wants to merge 1 commit into
base: main

zachdisc

Note: this is a fresh PR replacing #9731. That PR had accumulated too many conflicts and changes, and I messed it up while rebasing. This is a clean start with all previous feedback incorporated.

What

This adds a simple sort method to the RewriteManifests Spark action, which lets users specify the order of partition columns to consider when grouping manifests.

Illustration:

RewriteManifests.Result result =
        actions
            .rewriteManifests(table)
            .sort("c", "b", "a")  // <-- this is the new API piece
            .execute();

Closes #9615

Why

Iceberg's metadata is organized into a forest of manifest_files, each pointing to data files that share common partitions. By default, and during RewriteManifests, manifests are grouped by partition fields in the order they appear in the table's default partition spec. If the primary query pattern filters mostly on the last partition field in the spec, the resulting manifests are poorly suited to fast planning and pruning on that field.

For example:

CREATE TABLE
...
PARTITIONED BY (region, storeId, bucket(100, ipAddress), days(event_time))

This will create manifests grouped first by region, so the contents of each manifest_file may span a wide range of event_time values. For a primary query pattern that filters on event_time and doesn't care about region, storeId, etc., this leads to inefficient queries.
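To make the effect of ordering concrete, here is a small self-contained sketch. It does not use any Iceberg classes: `FilePartition`, `manifestsTouched`, and the fixed "files per manifest" cutoff are hypothetical stand-ins, and real manifest grouping is more involved. It only illustrates why clustering files by the column that queries filter on reduces the number of manifests a planner would need to open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ManifestSortSketch {
    // Hypothetical stand-in for a data file's partition tuple (not an Iceberg class).
    static final class FilePartition {
        final String region;
        final int day;

        FilePartition(String region, int day) {
            this.region = region;
            this.day = day;
        }
    }

    // Spec order: region first, then day (mirrors the default grouping described above).
    static final Comparator<FilePartition> SPEC_ORDER =
        Comparator.comparing((FilePartition f) -> f.region).thenComparingInt(f -> f.day);

    // Custom order: day first, then region (what a query-aligned sort would ask for).
    static final Comparator<FilePartition> DAY_FIRST =
        Comparator.comparingInt((FilePartition f) -> f.day).thenComparing(f -> f.region);

    // Sort the files, cut them into fixed-size "manifests", and count how many
    // manifests contain at least one file for targetDay.
    static int manifestsTouched(
            List<FilePartition> files, Comparator<FilePartition> order,
            int filesPerManifest, int targetDay) {
        List<FilePartition> sorted = new ArrayList<>(files);
        sorted.sort(order);
        int touched = 0;
        for (int start = 0; start < sorted.size(); start += filesPerManifest) {
            int end = Math.min(start + filesPerManifest, sorted.size());
            boolean hit = sorted.subList(start, end).stream().anyMatch(f -> f.day == targetDay);
            if (hit) {
                touched++;
            }
        }
        return touched;
    }

    // 4 regions x 4 days = 16 files, one file per (region, day) pair.
    static List<FilePartition> sampleFiles() {
        List<FilePartition> files = new ArrayList<>();
        for (String region : Arrays.asList("us", "eu", "ap", "sa")) {
            for (int day = 0; day < 4; day++) {
                files.add(new FilePartition(region, day));
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<FilePartition> files = sampleFiles();
        System.out.println("spec order: " + manifestsTouched(files, SPEC_ORDER, 4, 2));
        System.out.println("day first:  " + manifestsTouched(files, DAY_FIRST, 4, 2));
    }
}
```

With this toy data, a filter on a single day touches all 4 manifests under the region-first order but only 1 when files are clustered by day first.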

Development

Successfully merging this pull request may close these issues.

RewriteManifest with more options