catalog.load_table raises Invalid JSON error #1328
Comments
I think this usually means the table metadata returned by the request is empty or invalid JSON. See iceberg-python/pyiceberg/catalog/hive.py, lines 519 to 540 at commit 93ebd39.
The issue most likely comes from reading the table metadata file. See iceberg-python/pyiceberg/catalog/hive.py, lines 300 to 307 at commit 93ebd39.
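Here is a rough sketch of that read path. It is not a copy of hive.py. The catalog finds the metadata location, opens it through a FileIO, reads the bytes, and parses them as JSON. The function below mimics only the read-then-parse steps, using the standard library, so that an empty read fails with an explicit message instead of a JSON parse error. `read_metadata_json` is a hypothetical helper. It assumes an object with `new_input(location)` whose result has an `open()` method returning a context-managed byte stream, which is how pyiceberg's `FileIO`/`InputFile` are used.

```python
import json
from typing import Any


def read_metadata_json(file_io: Any, metadata_location: str) -> dict:
    """Hypothetical helper: read a table metadata file and parse it as JSON.

    `file_io` is assumed to behave like a pyiceberg FileIO: new_input(location)
    returns an input file whose open() yields a context-managed byte stream.
    """
    input_file = file_io.new_input(metadata_location)
    with input_file.open() as stream:
        raw = stream.read()
    if not raw:
        # An empty read is what later surfaces as "EOF while parsing a value".
        raise ValueError(f"read 0 bytes from {metadata_location}")
    return json.loads(raw)
```

With pyiceberg installed, you could get `file_io` from `pyiceberg.io.load_file_io(properties=catalog.properties, location=metadata_location)`. That wiring is an assumption for debugging and is not necessarily what `load_table` does internally.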
@sandcobainer Thanks for raising this. Any chance you could share the table metadata JSON?
@Fokko I checked whether my S3 credentials are the problem, but they don't seem to be. Here's the metadata file that I downloaded directly from the S3 bucket. @kevinjqliu It does look like an empty file is being read. Could it be a read-permission issue?
If it's empty on read, it's most likely a permission issue. Here's something you can run to debug.
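The original debug snippet did not survive, so here is a stdlib-only sketch of the kind of check it could feed into. Given the raw bytes you read, it tells you whether they are empty, not JSON, JSON missing fields, or plausible table metadata. `describe_metadata_bytes` is a hypothetical helper. The required-field list is deliberately a small subset of the Iceberg table metadata spec: `format-version` and `location`, both of which are required in v1 and v2.

```python
import json

# Small subset of fields the Iceberg spec requires in every table metadata file.
REQUIRED_FIELDS = ("format-version", "location")


def describe_metadata_bytes(raw: bytes) -> str:
    """Hypothetical helper: explain what a metadata read actually returned."""
    if len(raw) == 0:
        return "empty: 0 bytes were read (check read permissions / S3 config)"
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and bad UTF-8
        return f"not JSON: {exc}"
    if not isinstance(doc, dict):
        return f"JSON, but a {type(doc).__name__} rather than an object"
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        return "JSON object missing: " + ", ".join(missing)
    return f"looks like format-version {doc['format-version']} table metadata ({len(raw)} bytes)"
```

An "empty" result here matches the `input_value=''` in the reported error.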
@kevinjqliu I tried this snippet, pointing the metadata location directly at the S3 URI, and the error is the same. Does this mean it's an S3 access issue?
Correction: I only suspect the parsed file is empty because of the error message.
Likely. To debug, you can try reading the file directly.
The result should match what you get when you read the file from S3.
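Here is a minimal sketch of that comparison. It assumes `boto3` is installed and sees the same credentials as the catalog. `split_s3_uri`, `read_with_boto3`, and `compare_reads` are hypothetical helpers. `client.get_object(Bucket=..., Key=...)["Body"].read()` is the standard boto3 S3 call. On the pyiceberg side, the bytes would come from `file_io.new_input(uri).open().read()`.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_with_boto3(uri: str) -> bytes:
    """Read the object directly with boto3 (imported lazily; needs credentials).

    For MinIO or another S3-compatible store, pass endpoint_url=... to boto3.client.
    """
    import boto3

    bucket, key = split_s3_uri(uri)
    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()


def compare_reads(direct: bytes, via_pyiceberg: bytes) -> Dict[str, object]:
    """Summarize whether two reads of the same object agree."""
    return {
        "direct_bytes": len(direct),
        "pyiceberg_bytes": len(via_pyiceberg),
        "identical": direct == via_pyiceberg,
    }
```

If `direct_bytes` is non-zero while `pyiceberg_bytes` is 0, the file itself is fine and the problem is in how pyiceberg's FileIO reaches S3.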
So I compared reading the file with boto3 against reading it with pyiceberg's
Output:
returns
That would explain the validation error. It's weird to me that S3 returns 0 bytes instead of an error. A couple of things you can try:
You can verify it like this:
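One more check worth trying is offered here as an assumption, not as what was suggested in the thread. pyiceberg can read S3 through either its PyArrow-based or its fsspec-based FileIO, selected with the `py-io-impl` catalog property. If one implementation returns 0 bytes and the other returns the file, the FileIO/S3 configuration is at fault, not the metadata. The sketch builds one property set per implementation and reads the file with each. `io_property_variants` and `read_length` are hypothetical helpers. `load_file_io` and the two class paths are pyiceberg's. The fsspec variant also needs `s3fs` installed.

```python
from typing import Dict, List

FILE_IO_IMPLS = (
    "pyiceberg.io.pyarrow.PyArrowFileIO",
    "pyiceberg.io.fsspec.FsspecFileIO",
)


def io_property_variants(base: Dict[str, str]) -> List[Dict[str, str]]:
    """Copy the catalog/S3 properties once per FileIO implementation."""
    return [{**base, "py-io-impl": impl} for impl in FILE_IO_IMPLS]


def read_length(properties: Dict[str, str], location: str) -> int:
    """Read `location` with the FileIO chosen by `properties` (needs pyiceberg)."""
    from pyiceberg.io import load_file_io

    file_io = load_file_io(properties=properties, location=location)
    with file_io.new_input(location).open() as stream:
        return len(stream.read())
```

For example, loop over `io_property_variants(catalog.properties)` and print `props["py-io-impl"]` next to `read_length(props, metadata_location)`, where `metadata_location` is the S3 URI of the metadata file.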
Hope this helps!
Question
Context: I'm trying to run a simple proof of concept with PyIceberg, a Hive Metastore (using an SQL dump as the Hive Metastore schema), and an S3 bucket of Iceberg tables. I set up a Docker Compose file with MySQL and the Hive Metastore, and the containers seem to be running fine. I am able to read the catalog, databases, and tables with
Results:
[('default',), ('default_database',), ('tenantdb',), ('test',), ('testdb',)]
[('tenantdb', 'pinglogs'), ('tenantdb', 'pinglogs1'), ('tenantdb', 'pinglogs_bad'), ('tenantdb', 'pinglogs2'), ('tenantdb', 'pinglogs3')]
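For reference, the sketch below shows a connection and listing that would produce output shaped like the results above. The catalog name, Thrift URI, S3 endpoint, and credentials are placeholders, not the poster's actual configuration. `load_catalog`, `list_namespaces()`, and `list_tables()` are pyiceberg catalog APIs. `hive_catalog` and `all_tables` are hypothetical helpers.

```python
from typing import Any, List, Tuple


def hive_catalog(uri: str, s3_endpoint: str, access_key: str, secret_key: str) -> Any:
    """Connect to a Hive Metastore catalog (placeholder values; needs pyiceberg)."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(
        "hive",
        **{
            "type": "hive",
            "uri": uri,  # e.g. "thrift://localhost:9083"
            "s3.endpoint": s3_endpoint,
            "s3.access-key-id": access_key,
            "s3.secret-access-key": secret_key,
        },
    )


def all_tables(catalog: Any) -> List[Tuple[str, ...]]:
    """Flatten list_tables() over every namespace returned by list_namespaces()."""
    tables: List[Tuple[str, ...]] = []
    for namespace in catalog.list_namespaces():
        tables.extend(catalog.list_tables(namespace))
    return tables
```

Listing only talks to the metastore, which is why it can succeed even when reading the metadata file from S3 fails.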
Issue: Running
pinglogs = catalog.load_table('tenantdb.pinglogs')
raises a validation error:
ValidationError: 1 validation error for TableMetadataWrapper
Invalid JSON: EOF while parsing a value at line 1 column 0 [type=json_invalid, input_value='', input_type=str]
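The key detail is `input_value=''`: the parser received an empty string, not the file's contents. The sketch below reproduces the same failure with the standard library's JSON parser, which stands in for pydantic's check here. Its wording and column numbering differ from pydantic's. This is why a valid file on S3 can still produce the error if the read returns nothing. `parse_or_explain` is a hypothetical helper.

```python
import json


def parse_or_explain(text: str) -> str:
    """Parse JSON and report where it fails (stdlib stand-in for pydantic's check)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno} column {exc.colno}: {exc.msg}"
    return "ok"
```

`parse_or_explain("")` returns `"invalid JSON at line 1 column 1: Expecting value"`. A real metadata document parses to `"ok"`.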
I looked at the metadata JSON and can confirm it's a valid JSON file. Is there a way to debug what the downloaded metadata looks like, or something else I should pass to load_table?