[Request] Area of Improvements for Documentation #1407

Open
3 tasks
kevinjqliu opened this issue Dec 6, 2024 · 4 comments

Comments

@kevinjqliu
Contributor

Feature Request / Improvement

The PyIceberg documentation is currently lagging behind its feature set. I’m starting this issue to track community feedback on areas that need improvement and to encourage contributors to collaborate in enhancing the documentation.

Improvements could include refining existing documentation, adding more examples for specific features, or creating a cookbook (similar to issue #1201).

The current documentation can be found at the following locations:

Here's a list of possible improvements that are in my backlog. I'll add to this list as more requests come in.

Improvements

  • Connection to various catalogs (SQL/Hive/Glue/REST)
  • Create a table with partition and sort order
  • Dynamic overwrite
@jeppe-dos

The struct example in the Schema evolution section of the PyIceberg API documentation is incorrect. It says to use dot notation, "<struct_name>.<field_name>", but this is incorrect. The path should be given as a tuple: ("<struct_name>", "<field_name>").
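To illustrate why a tuple path is unambiguous where a dotted string is not, here is a minimal, library-independent sketch. The helper `normalize_path` is hypothetical and is not part of PyIceberg; it only assumes that a dotted string could mean either a nested field or a single column whose name contains a dot, so it rejects that form and asks for a tuple.

```python
from typing import Tuple, Union

# A column path is either a single top-level name or a tuple of name parts.
ColumnPath = Union[str, Tuple[str, ...]]


def normalize_path(path: ColumnPath) -> Tuple[str, ...]:
    """Hypothetical helper: convert a column path into a tuple of name parts.

    A tuple is taken as-is, one element per nesting level.
    A plain string is treated as one top-level column name.
    A dotted string is rejected, because "a.b" could mean field "b" inside
    struct "a" or a single column literally named "a.b".
    """
    if isinstance(path, tuple):
        return path
    if "." in path:
        raise ValueError(f"Ambiguous column name {path!r}; pass a tuple instead")
    return (path,)


# Nested field: struct "details", field "confirmed_by" (example names only).
nested = normalize_path(("details", "confirmed_by"))

# Top-level column.
top_level = normalize_path("city")
```

Under this convention, `("details", "confirmed_by")` always addresses the field `confirmed_by` inside the struct `details`, while the string `"details.confirmed_by"` is refused rather than guessed at.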

@kevinjqliu
Contributor Author

@jeppe-dos thanks for reporting this! Is this something you would like to contribute?

@astrojuanlu

astrojuanlu commented Dec 16, 2024

Connection to various catalogs (SQL/Hive/Glue/REST)

Just having an authoritative page that lists all supported catalogs would be nice. According to various pages in https://iceberg.apache.org/docs/1.7.1/, it looks like there are multiple options:

  • AWS Glue
  • Amazon DynamoDB
  • Hadoop
  • Hive Metastore
  • Project Nessie
  • The newly introduced Amazon S3 Tables Catalog
  • Catalogs implementing the Apache Iceberg REST Open API specification
    • Including Apache Polaris
      • With Snowflake Open Catalog being a managed service for it
    • Also including Apache Gravitino?
  • BigQuery Metastore, which is replacing BigLake Metastore
  • Just any "table in a relational database" through JDBC
  • Also Unity Catalog (maybe?)
  • Also Dell ECS¿?

This is probably not exhaustive. Not all of these are listed in https://github.com/apache/iceberg/blob/f40ec2096bc078b9fd2b59d6beb32cd77e371ac4/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L69-L74 , which makes it all even more confusing.

@jeppe-dos

Yes, I have made a PR:

@jeppe-dos thanks for reporting this! Is this something you would like to contribute?

Yes. I have made a pull request here: #1433
