More clarification needed for Avro to iceberg data type conversion for timestamp variants #11890

Open · Shekharrajak opened this issue Dec 30, 2024 · 0 comments
Labels: question (Further information is requested)
Query engine

No response

Question

We have the Iceberg type to Spark type mapping documented at https://iceberg.apache.org/docs/1.4.2/spark-writes/#iceberg-type-to-spark-type, which mentions:

Iceberg                       Spark
timestamp with timezone       timestamp
timestamp without timezone    timestamp_ntz
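
For example, in Spark SQL the two Spark timestamp types produce the two Iceberg variants. A minimal sketch (the catalog/table name demo.db.events and an existing SparkSession `spark` are assumptions; requires Spark 3.4+ for TIMESTAMP_NTZ):

// TIMESTAMP maps to Iceberg "timestamp with timezone" (timestamptz);
// TIMESTAMP_NTZ maps to Iceberg "timestamp without timezone".
spark.sql("""
  CREATE TABLE demo.db.events (
    ts     TIMESTAMP,
    ts_ntz TIMESTAMP_NTZ
  ) USING iceberg
""")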

If I have an Avro schema like this:

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "name": "timestampWithZone",
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      }
    },
    {
      "name": "timestampWithoutZone",
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      }
    }
  ]
}

How can I tell Iceberg which epoch-time field to treat as timestamp with timezone (timestamptz) and which as timestamp without timezone (timestamp_ntz)?

I see the Iceberg Avro spec (https://iceberg.apache.org/spec/#avro) distinguishes the two variants with an "adjust-to-utc" field attribute (e.g. "adjust-to-utc": false for timestamp without timezone) rather than with separate logical types.
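
A minimal sketch of what I understand the spec-compliant field schemas to look like, parsed with Avro's Java Schema.Parser (the adjust-to-utc values come from the spec table; I have not verified them against the Iceberg writer):

import org.apache.avro.Schema

// Per the Iceberg spec: timestamp-micros with adjust-to-utc=true is read as
// timestamptz, and with adjust-to-utc=false as timestamp without timezone.
// Avro preserves the extra "adjust-to-utc" attribute as a schema property.
val withZone = new Schema.Parser().parse(
  """{"type": "long", "logicalType": "timestamp-micros", "adjust-to-utc": true}""")
val withoutZone = new Schema.Parser().parse(
  """{"type": "long", "logicalType": "timestamp-micros", "adjust-to-utc": false}""")

But I find that Spark's to_avro handles this differently in AvroSerializer: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala#L169C1-L190C8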

case (TimestampType, LONG) => avroType.getLogicalType match {
    // For backward compatibility, if the Avro type is Long and it is not logical type
    // (the `null` case), output the timestamp value as with millisecond precision.
    case null | _: TimestampMillis => (getter, ordinal) =>
      DateTimeUtils.microsToMillis(timestampRebaseFunc(getter.getLong(ordinal)))
    case _: TimestampMicros => (getter, ordinal) =>
      timestampRebaseFunc(getter.getLong(ordinal))
    case other => throw new IncompatibleSchemaException(errorPrefix +
      s"SQL type ${TimestampType.sql} cannot be converted to Avro logical type $other")
  }

case (TimestampNTZType, LONG) => avroType.getLogicalType match {
    // To keep consistent with TimestampType, if the Avro type is Long and it is not
    // logical type (the `null` case), output the TimestampNTZ as long value
    // in millisecond precision.
    case null | _: LocalTimestampMillis => (getter, ordinal) =>
      DateTimeUtils.microsToMillis(getter.getLong(ordinal))
    case _: LocalTimestampMicros => (getter, ordinal) =>
      getter.getLong(ordinal)
    case other => throw new IncompatibleSchemaException(errorPrefix +
      s"SQL type ${TimestampNTZType.sql} cannot be converted to Avro logical type $other")
  }

This means:

  1. It treats the Avro logical types LocalTimestampMillis and LocalTimestampMicros as TimestampNTZType.
  2. Only if the logical type is TimestampMillis or TimestampMicros is the value treated as TimestampType.
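
The schema-conversion side appears to follow the same pairing. A minimal sketch (assuming Spark 3.4+, where TimestampNTZType exists, and the spark-avro module on the classpath):

import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types.{TimestampNTZType, TimestampType}

// TimestampType converts to long/timestamp-micros (instant semantics), while
// TimestampNTZType converts to long/local-timestamp-micros (wall-clock semantics).
println(SchemaConverters.toAvroType(TimestampType).toString(true))
println(SchemaConverters.toAvroType(TimestampNTZType).toString(true))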

Is there a specific reason it is implemented this way?
