Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(duckdb): return null typed pyarrow arrays and disable creating tables with all null columns in duckdb #9810

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Aug 9, 2024

Address all-NULL column handling in the DuckDB backend. null pyarrow Arrays are returned (previously int32), and it is now an error on the Ibis side to create all null columns with create_table in the DuckDB backend. Fixes #9669.

@cpcloud cpcloud added bug Incorrect behavior inside of ibis datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) duckdb The DuckDB backend labels Aug 9, 2024
Copy link
Contributor

github-actions bot commented Aug 9, 2024

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

@cpcloud cpcloud changed the title fix(duckdb): return null typed pyarrow arrays and disable creating tables with all null columns in duckdb. fix(duckdb): return null typed pyarrow arrays and disable creating tables with all null columns in duckdb Aug 9, 2024
@cpcloud cpcloud requested review from gforsyth and jcrist August 9, 2024 13:30
@cpcloud cpcloud force-pushed the all-nulls branch 2 times, most recently from 8f42f56 to 3784885 Compare August 9, 2024 16:56
@ncclementi
Copy link
Contributor

@cpcloud The flink backend is failing, it doesn't like the null type column. I got it to xfail by adding something similar to what you added in _register_in_memory_table() for other backends, in the execute() for flink, although maybe it could go higher up in the to_pyarrow() method. Looks like for flink we don't have the _register_in_memory_table() method.

In backends/flink/__init__.py :

    def execute(self, expr: ir.Expr, **kwargs: Any) -> Any:
        """Execute an expression."""
        self._register_udfs(expr)

        table_expr = expr.as_table()

        if null_columns := table_expr.op().schema.null_fields:
            raise exc.IbisTypeError(
                f"{self.name} cannot yet reliably handle `null` typed columns; "
                f"got null typed columns: {null_columns}")

        sql = self.compile(table_expr, **kwargs)
        df = self._table_env.sql_query(sql).to_pandas()

@NickCrews
Copy link
Contributor

@cpcloud im motivated to help this one across the finish line. If I fix the flink tests is this good to merge? Any thoughts regarding my incline comments?

@cpcloud cpcloud added this to the 10.0 milestone Sep 23, 2024
@github-actions github-actions bot added tests Issues or PRs related to tests impala The Apache Impala backend postgres The PostgreSQL backend clickhouse The ClickHouse backend mysql The MySQL backend mssql The Microsoft SQL Server backend trino The Trino backend oracle The Oracle backend flink Issues or PRs related to Flink exasol Issues related to the exasol backend risingwave The RisingWave backend labels Sep 23, 2024
Copy link
Contributor

@NickCrews NickCrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts, but thank you very much for this fix! I would love to help push this across the finish line.

  • If you point me to what you think needs to happen with flink, I can help
  • if you give a thumbs up to my suggestions, I can implement them

ibis/expr/schema.py Show resolved Hide resolved
ibis/backends/tests/test_export.py Show resolved Hide resolved
ibis/backends/duckdb/converter.py Show resolved Hide resolved
@cpcloud
Copy link
Member Author

cpcloud commented Oct 22, 2024

@NickCrews Can you give this another review pass and/or approve?

@NickCrews
Copy link
Contributor

NickCrews commented Oct 22, 2024

@cpcloud I tacked on two commits that I think are improvements, if those pass CI and you are happy, looks great to me to merge. I can then, in a followup PR, tackle adding in the nested null support that I describe in this comment. Thanks!

@NickCrews
Copy link
Contributor

Actually, hold on, we aren't converting null-typed scalars correctly. Pushing up a fixup commit soon.

@NickCrews
Copy link
Contributor

OK, with that I think I'm happy.

@cpcloud
Copy link
Member Author

cpcloud commented Dec 30, 2024

Moving off the 10.0 milestone.

@cpcloud cpcloud removed this from the 10.0 milestone Dec 30, 2024
@cpcloud cpcloud force-pushed the all-nulls branch 3 times, most recently from ad81580 to 7b6e7fc Compare December 31, 2024 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis clickhouse The ClickHouse backend datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) duckdb The DuckDB backend exasol Issues related to the exasol backend flink Issues or PRs related to Flink impala The Apache Impala backend mssql The Microsoft SQL Server backend mysql The MySQL backend oracle The Oracle backend postgres The PostgreSQL backend risingwave The RisingWave backend tests Issues or PRs related to tests trino The Trino backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Errors with NULL columns
3 participants