Support Adding File Metadata Directly #1470

Open
subthedubdub opened this issue Dec 25, 2024 · 1 comment

Comments

@subthedubdub

Feature Request / Improvement

Support a table transaction in which the user directly supplies the data file metadata, similar to what the Java API allows.

This is a lower-level operation than the add_files operation, and it covers several additional use cases. For example:

  1. Implicit partitioning: Iceberg does not require partition values to be stored in the Parquet files themselves. In some cases it may be useful to assign a partition value to a Parquet file after it has been written, or to intentionally leave the partition column out of the file (e.g. to minimize storage). However, add_files does not currently offer an option for this.
  2. Pre-calculated statistics: Calculating Parquet statistics (e.g. column bounds) can be a somewhat resource-intensive operation. If the user has already calculated them, it would be faster to write them into the manifest file directly instead of recomputing them.
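
To make the two use cases concrete, here is a minimal sketch of the kind of per-file metadata a caller would supply. `DataFileMetadata`, the field IDs (1 = a long column, 2 = a string column), the path and all values are hypothetical; in PyIceberg the corresponding class is `pyiceberg.manifest.DataFile`, whose bounds are likewise keyed by field ID and stored as bytes. The two serializers follow the Iceberg spec's single-value binary serialization for `long` and `string`.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict


def serialize_long(value: int) -> bytes:
    # Iceberg single-value serialization: long -> 8-byte little-endian
    return struct.pack("<q", value)


def serialize_string(value: str) -> bytes:
    # Iceberg single-value serialization: string -> UTF-8 bytes, no length prefix
    return value.encode("utf-8")


@dataclass
class DataFileMetadata:
    # Hypothetical stand-in for a data file entry, NOT PyIceberg's DataFile
    file_path: str
    file_format: str
    partition: Dict[str, object]  # partition values keyed by partition field name
    record_count: int
    file_size_in_bytes: int
    # Column bounds keyed by Iceberg field ID, already serialized to bytes
    lower_bounds: Dict[int, bytes] = field(default_factory=dict)
    upper_bounds: Dict[int, bytes] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)


# Use case 2: min/max computed upstream (e.g. while the file was written),
# so they are passed in instead of being re-read from the Parquet footer.
precomputed = {1: (10, 99_999), 2: ("alice", "zoe")}  # field_id -> (min, max)

meta = DataFileMetadata(
    file_path="s3://bucket/warehouse/events/data-0001.parquet",  # hypothetical
    file_format="PARQUET",
    # Use case 1: the partition value is declared here, even though the
    # Parquet file itself has no "region" column.
    partition={"region": "us-east"},
    record_count=1_000,
    file_size_in_bytes=52_428,
    lower_bounds={1: serialize_long(precomputed[1][0]), 2: serialize_string(precomputed[2][0])},
    upper_bounds={1: serialize_long(precomputed[1][1]), 2: serialize_string(precomputed[2][1])},
    null_value_counts={1: 0, 2: 3},
)
```

Nothing here reads the Parquet file, which is the point of both use cases: the caller is trusted to supply a partition value and statistics that are consistent with the file's contents.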
@Fokko
Contributor

Fokko commented Dec 26, 2024

Hey @subthedubdub, thanks for reaching out here. We already allow appending data files directly:

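# tbl: a loaded pyiceberg Table; data_file: a pyiceberg.manifest.DataFile
with tbl.transaction() as txn:
    with txn.update_snapshot() as snapshot:
        with snapshot.fast_append() as append:
            # Registers the DataFile as given; its metadata is not recomputed from the file
            append.append_data_file(data_file)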

This is not documented, since it is a low-level API.
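
For context on the fast_append step, here is a simplified, in-memory model of fast-append semantics, assuming the usual Iceberg behaviour: the commit writes one new manifest for the added files and produces a snapshot whose manifests are the parent snapshot's manifests plus that new one, without rewriting or merging existing manifests. `ToyTable`, `Manifest` and `Snapshot` are hypothetical stand-ins, not PyIceberg internals; manifest-list files, delete files and commit conflict checks are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Manifest:
    path: str
    data_files: Tuple[str, ...]  # data file paths listed in this manifest


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    manifests: Tuple[Manifest, ...]


class ToyTable:
    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = []

    @property
    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def fast_append(self, data_files: List[str]) -> Snapshot:
        parent = self.current
        new_id = len(self.snapshots) + 1
        # One new manifest that lists only the newly added files
        new_manifest = Manifest(f"manifest-{new_id}.avro", tuple(data_files))
        # Existing manifests are carried over untouched (no merging or rewriting)
        previous = parent.manifests if parent else ()
        snapshot = Snapshot(
            new_id,
            parent.snapshot_id if parent else None,
            previous + (new_manifest,),
        )
        self.snapshots.append(snapshot)
        return snapshot

    def live_files(self) -> List[str]:
        # All data files reachable from the current snapshot
        if self.current is None:
            return []
        return [f for m in self.current.manifests for f in m.data_files]


table = ToyTable()
table.fast_append(["a.parquet"])
s2 = table.fast_append(["b.parquet", "c.parquet"])
```

Compared with a merge append, which may compact small manifests at commit time, this keeps each commit cheap at the cost of accumulating manifests over time.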
