bug: Problem converting pandas dataframe to memtable when column contains both float and string values #10633
Comments
Thanks for the issue. What behavior are you expecting here with respect to the column type of
I'm not sure I really mind. It would just be nice not to have to go through a pandas clean-up step first (after diagnosing the problem) before running ibis.memtable(). After all, this is a very common type of data inconsistency in Excel spreadsheets that have been created by hand. I'm in the process of switching from pandas to ibis for everything and it's been a 99% positive experience so far (thank you!), but this was one time where using pandas would definitely have been quicker. Is there a better way to read an xlsx file into ibis than going via pandas first? If you're looking for suggested enhancements, a built-in read_excel function would be nice (though I do understand that DuckDB doesn't have one of those).
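For readers hitting the same error, the pandas clean-up step mentioned above might look roughly like the sketch below. It is not taken from the thread: the helper names, the column names, and the sample values are all hypothetical, and converting mixed columns to strings is only one possible choice of target type.

```python
import pandas as pd


def mixed_type_columns(df: pd.DataFrame) -> list[str]:
    """Return names of object columns whose non-null values span several Python types."""
    mixed = []
    for name in df.columns:
        col = df[name]
        if not pd.api.types.is_object_dtype(col):
            continue  # numeric, datetime, etc. columns are already uniform
        kinds = {type(v) for v in col.dropna()}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


def stringify_mixed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every mixed-type column holds strings (nulls kept as-is)."""
    out = df.copy()
    for name in mixed_type_columns(out):
        out[name] = out[name].map(lambda v: v if pd.isna(v) else str(v))
    return out


# Hypothetical data: a hand-edited column that mixes floats and text.
raw = pd.DataFrame({"id": [1, 2, 3], "reading": [1.5, "n/a", 2.0]})
clean = stringify_mixed(raw)
```

After this step every value in the affected column has the same type, so the frame should be in a state that can be handed to ibis.memtable(); the real parsing of the values (for example turning "n/a" into a null) can then be done on the ibis side.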
In fact there is one sneakily embedded in the
One workaround, which I suppose could be officially promoted as "the way" to read Excel files, is to use
Very nice! It reads my original spreadsheet without complaint (as type str). Yes, it still requires some cleaning up, but a. I can at least see the data, making this easier than an import error; and b. I can do this all in ibis, which was the point in the first place. Thanks!
A quick follow-up: The original spreadsheet, from which my example was extracted, had two blank lines at the top. Is there a way to skip 2 rows with read_geo? I've tried looking at the recommended DuckDB Web page to see what **kwargs includes, but it doesn't seem to be too helpful... Thanks!
It looks like there's no option for that in GDAL itself: https://gdal.org/en/stable/drivers/vector/xlsx.html
Bother...! I guess I'll have to rename everything manually.
What happened?
I've been running into problems reading Excel spreadsheets into ibis via pandas, and have tracked things down to the issue in the title. Here's sample code to show the problem:
Instead of creating and displaying the ibis table, as expected, I get an error (more detail below):
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
Default: DuckDB
Relevant log output