While converting Spark data into Iceberg records, timestamps and dates are being converted into long values, which is confusing for users. This behavior makes it difficult to interpret the actual timestamp or date values in the records. Additionally, some engines, such as Presto, are unable to understand the timestamp value in long format, making it unreadable. Users expect these fields to retain their original formats or be represented in a more user-friendly way.
Steps to Reproduce:
Read CSV data containing a timestamp column into a DataFrame, pass each Row to the SparkValueConverter convert method, then insert the result into an Iceberg table.
Inspect the resulting Iceberg records.
Observe that timestamp and date fields are stored as long values.
Run a query against the table from the Presto engine; the timestamp issue appears.
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
Iceberg defines Timestamp as microseconds from epoch. When we actually store it in files, it is up to the file format to define how that type is serialized. In the case of Parquet, it is also stored as an Int64.
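To make the representation concrete, here is a minimal sketch in plain Python (standard library only, no Iceberg or Spark APIs) of how a reader could turn the stored long back into a readable value. The helper names `micros_to_datetime`, `datetime_to_micros`, and `days_to_date` are hypothetical and exist only for this illustration. It assumes a UTC-based timestamp, and that a date is stored as days from epoch as described in the Iceberg spec.

```python
from datetime import date, datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Decode microseconds from epoch into a UTC datetime (hypothetical helper)."""
    return EPOCH + timedelta(microseconds=micros)


def datetime_to_micros(value: datetime) -> int:
    """Encode a timezone-aware datetime as microseconds from epoch (hypothetical helper)."""
    delta = value - EPOCH
    # Integer arithmetic avoids float rounding on large values.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def days_to_date(days: int) -> date:
    """Decode days from epoch into a calendar date (hypothetical helper)."""
    return date(1970, 1, 1) + timedelta(days=days)


if __name__ == "__main__":
    stored = 1_700_000_000_000_000  # example long value as it might appear in a record
    print(micros_to_datetime(stored).isoformat())  # 2023-11-14T22:13:20+00:00
    print(days_to_date(19_675).isoformat())        # 2023-11-14
```

The long value is lossless: encoding and decoding round-trip exactly, so how the value is displayed is a choice the reading engine makes.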
If Presto has a bug decoding this, that is indicative of a far deeper issue with the Presto read code for Parquet files.
I'm also unsure how changing the underlying representation would help. It's up to the reader to decide how to display the value to users, since native Parquet files are not readable by humans (at least not me).
Apache Iceberg version
1.7.1 (latest release)
Query engine
PrestoDB