[BUG] ArrowTypeError: "Could not convert" Error in inspect._files method #1477

xsfa opened this issue Dec 28, 2024 · 0 comments

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

PyArrow appears to be receiving malformed data from the file metadata, so I cannot call any of the file inspection functions. Could this be caused by my Iceberg table format, or is it a genuine bug? I have confirmed that the table is a valid, readable Iceberg v2 table.

Code:

test_table = catalog.load_table("test.table")
current_snapshot_id = test_table.metadata.current_snapshot_id
test_table.inspect.files(current_snapshot_id)

Full Stack Trace:

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Input In [32], in <cell line: 17>()
     14 current_snapshot_id = test_table.metadata.current_snapshot_id
     15 print(current_snapshot_id)
---> 17 test_table.inspect.files(current_snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:582, in InspectTable.files(self, snapshot_id)
    581 def files(self, snapshot_id: Optional[int] = None) -> "pa.Table":
--> 582     return self._files(snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:576, in InspectTable._files(self, snapshot_id, data_file_filter)
    541         readable_metrics = {
    542             schema.find_column_name(field.field_id): {
    543                 "column_size": column_sizes.get(field.field_id),
   (...)
    554             for field in self.tbl.metadata.schema().fields
    555         }
    556         files.append({
    557             "content": data_file.content,
    558             "file_path": data_file.file_path,
   (...)
    573             "readable_metrics": readable_metrics,
    574         })
--> 576 return pa.Table.from_pylist(
    577     files,
    578     schema=files_schema,
    579 )

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3700, in pyarrow.lib.Table.from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:5228, in pyarrow.lib._from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3575, in pyarrow.lib.Table.from_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:1398, in pyarrow.lib._sanitize_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:350, in pyarrow.lib.asarray()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:320, in pyarrow.lib.array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()

ArrowTypeError: Could not convert {1: 145, 2: 545, 3: 132, 4: 91, 5: 92, 6: 80, 7: 42, 8: 118, 9: 146, 10: 108, 11: 188, 12: 112, 13: 169, 14: 42, 15: 166, 16: 1248, 17: 57, 18: 38, 19: 81, 20: 120, 21: 42, 22: 129, 23: 90, 24: 38, 25: 38, 26: 80, 27: 544, 28: 112, 29: 79, 30: 131, 31: 71, 32: 70, 33: 70} with type dict: was not a sequence or recognized null for conversion to list type

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@xsfa xsfa changed the title ArrowTypeError: "Could not convert" Error in inspect._files method [BUG] ArrowTypeError: "Could not convert" Error in inspect._files method Dec 28, 2024