possible bug: Table.__repr__ sometimes produces non-ASCII characters #10516

Open
1 task done
edschofield opened this issue Nov 21, 2024 · 2 comments
Labels
bug Incorrect behavior inside of ibis

Comments

edschofield commented Nov 21, 2024

What happened?

In interactive mode, the __repr__ for a Table differs depending on the run-time environment. The __repr__ includes Unicode characters and ANSI escape codes for colours when repr(table) is called from a Jupyter notebook but not when called from the Python or IPython interpreter.

For example, this code:

import ibis
ibis.options.interactive = True

url = "https://raw.githubusercontent.com/PythonCharmers/PythonCharmersData/refs/heads/master/palmerpenguins.csv"
penguins = ibis.read_csv(url)
print(len(repr(penguins)))

outputs 3262 when run from Jupyter but 1206 when run from IPython or as a regular Python script.

I find both of these facts surprising:

  • that the __repr__ differs depending on the runtime environment
  • that the __repr__ sometimes contains non-ASCII characters
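To make the difference between environments concrete, a small stdlib-only helper can count what actually varies in the string. This is a minimal sketch: `summarize_repr` and `ANSI_CSI` are hypothetical names introduced here, not part of ibis, and the sample string is invented for illustration.

```python
import re

# Matches ANSI CSI sequences such as "\x1b[1m" (bold) or "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def summarize_repr(text: str) -> dict:
    """Count the features of a repr string that can vary between environments."""
    stripped = ANSI_CSI.sub("", text)
    return {
        "length": len(text),
        "ansi_sequences": len(ANSI_CSI.findall(text)),
        "non_ascii_chars": sum(1 for ch in stripped if ord(ch) > 127),
        "is_ascii": text.isascii(),
    }


# Invented sample: a bold header, a heavy vertical bar, and a data value.
sample = "\x1b[1mspecies\x1b[0m \u2503 Adelie"
print(summarize_repr(sample))
```

Running `summarize_repr(repr(penguins))` in Jupyter, IPython, and plain Python should show which part of the length difference comes from escape codes and which from non-ASCII characters.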

I expect that both are likely to cause problems in various workflows in ways that are hard to anticipate. But one simple example is when rendering notebooks via LaTeX. If the following cell appears in a notebook called ibis_str.ipynb along with its output:

import ibis
ibis.options.interactive = True

url = "https://raw.githubusercontent.com/PythonCharmers/PythonCharmersData/refs/heads/master/palmerpenguins.csv"
penguins = ibis.read_csv(url)
print(penguins)

then converting it to PDF as follows fails with a LaTeX error due to the use of Unicode characters:

jupyter nbconvert --to latex ibis_str.ipynb
pdflatex ibis_str.tex
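As a stopgap until the repr itself changes, the table frame could be mapped to ASCII before printing. The sketch below assumes the offending characters fall in the Unicode Box Drawing block (U+2500 to U+257F); `asciify_box_drawing` is a hypothetical helper, and it does not touch non-ASCII characters in the data itself.

```python
import unicodedata


def asciify_box_drawing(text: str) -> str:
    """Replace Unicode box-drawing characters (U+2500..U+257F) with ASCII."""
    out = []
    for ch in text:
        if "\u2500" <= ch <= "\u257f":
            name = unicodedata.name(ch, "")
            if " AND " not in name and name.endswith("HORIZONTAL"):
                out.append("-")  # plain horizontal strokes
            elif " AND " not in name and name.endswith("VERTICAL"):
                out.append("|")  # plain vertical strokes
            else:
                out.append("+")  # corners, junctions, and anything else
        else:
            out.append(ch)
    return "".join(out)


# Invented sample frame using heavy and light box-drawing characters.
framed = "\u250f\u2501\u2501\u2501\u2513\n\u2503 a \u2503\n\u2514\u2500\u2500\u2500\u2518"
print(asciify_box_drawing(framed))
```

In the notebook cell this would be used as `print(asciify_box_drawing(str(penguins)))`. Whether that is enough for pdflatex depends on what else the output contains.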

Another very surprising effect of the different code paths taken for __repr__ depending on the run-time environment is that this code:

import ibis
import polars as pl
ibis.options.interactive = True

url = "https://raw.githubusercontent.com/PythonCharmers/PythonCharmersData/refs/heads/master/palmerpenguins.csv"
penguins_pl = pl.read_csv(url)

penguins = ibis.memtable(penguins_pl)
output = repr(penguins)

currently fails with a very different exception when run in Python / IPython:

ParserException: Parser Error: zero-length delimited identifier at or near """"

versus the ValueError raised in a Jupyter notebook (as reported in issue #10514):

ValueError: Target schema's field names are not matching the table's field names: ...

I believe the standard approach would be for the Table class to have a single code path for __repr__ that produces the same ASCII string independent of the runtime environment, and to define IPython-compatible methods like _repr_pretty_, _repr_html_, and _repr_latex_ for fancier output in IPython and Jupyter.
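A rough sketch of that split, under the assumption that a toy class is enough to show the shape. `TableSketch` is a hypothetical stand-in, not the ibis Table class, and its output formats are placeholders. The hook names and the `_repr_pretty_(self, p, cycle)` signature follow IPython's display protocol.

```python
from html import escape


class TableSketch:
    """Hypothetical stand-in for a table; not the ibis Table class."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def __repr__(self) -> str:
        # One code path, ASCII only: non-ASCII data is escaped via ascii().
        return f"TableSketch(columns={ascii(self.columns)}, rows={len(self.rows)})"

    def _repr_pretty_(self, p, cycle) -> None:
        # IPython's terminal pretty-printer calls this with a printer object.
        p.text("..." if cycle else self._as_text())

    def _repr_html_(self) -> str:
        # Jupyter front ends can pick this HTML representation when present.
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"

    def _as_text(self) -> str:
        # Plain-text grid shared by the richer terminal display.
        lines = [" | ".join(map(str, self.columns))]
        lines += [" | ".join(map(str, row)) for row in self.rows]
        return "\n".join(lines)
```

With this layout, `repr(obj)` returns the same ASCII string everywhere, while IPython and Jupyter choose a richer hook when one is available. A `_repr_latex_` method would follow the same pattern and is omitted here for brevity.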

If you agree that this would be an improvement, I can volunteer a PR as my first contribution to the project.

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
edschofield added the bug label (Incorrect behavior inside of ibis) Nov 21, 2024
gforsyth (Member) commented

Hey @edschofield -- thanks for the thorough report!

I haven't delved into the repr lately. It would probably be worth first clarifying which of these behaviors come from rich, which we use for table formatting, and which come from Ibis, or from Ibis' (possibly odd) use of rich.

Also, I don't view the use of Unicode as a bug. Even if our display skeleton didn't have any Unicode characters, a great number of our backends will emit them in result sets.

cpcloud (Member) commented Jan 1, 2025

> but not when called from the Python or IPython interpreter.

This is not correct. Both Python and IPython will display Unicode box-drawing characters for the table repr. Python will not render escape codes by default, but IPython will. This is probably due to rich's behavior or a related configuration that we're using.

That said, I think the approach you're describing is the right one: ASCII for __repr__ (even if data would contain unicode), independent of runtime environment, and then we can get fancier with the Jupyter/IPython methods.

Projects
Status: backlog
Development

No branches or pull requests

3 participants