fix(datatypes): return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() #8784

Open
wants to merge 1 commit into
base: main

Contributor

@mfatihaktas mfatihaktas commented Mar 26, 2024

Description of changes

Aims to close #8019.

The example given by @NickCrews in the Issue description runs as follows with the changes here:

>>> import ibis
>>> d = ibis.memtable({"date": ["2024-01-01", "2024-01-02"]}).cast({"date": "date"}).date
>>> s = d.to_pandas()
>>> print(type(s[0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
>>> s
0   2024-01-01
1   2024-01-02
Name: date, dtype: datetime64[s]

To the reviewer: @NickCrews added the following suggestion in the issue:

It seems like you really thought about the semantics of this already per some comments in that issue and linked PR, but curious if there would be a problem with these semantics:
DateScalar.to_pandas() -> pd.Timestamp
DateColumn.to_pandas() -> pd.Series[datetime64]
DateScalar.execute() -> pd.Timestamp (because by definition this uses pandas)
DateColumn.execute() -> pd.Series[datetime64] (same)
repr(Date) -> "YYYY-MM-DD" (even if under the hood this uses .to_pandas() and we have pd.Timestamps in intermediate steps)

Should we add new tests to verify these suggestions?
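
One possible shape for such tests is sketched below. This is only an illustration: the helper names `check_date_scalar` and `check_date_column` are hypothetical, and plain pandas objects stand in for the results of `to_pandas()` and `execute()` so that the snippet runs without any backend.

```python
import pandas as pd


def check_date_scalar(value):
    """Hypothetical check: a date scalar should come back as a pd.Timestamp."""
    assert isinstance(value, pd.Timestamp)


def check_date_column(series):
    """Hypothetical check: a date column should come back as a datetime64 Series."""
    assert isinstance(series, pd.Series)
    assert pd.api.types.is_datetime64_any_dtype(series.dtype)
    # Individual elements of a datetime64 Series are pd.Timestamp objects.
    assert isinstance(series.iloc[0], pd.Timestamp)


# Stand-ins (assumptions) for DateScalar.to_pandas() and DateColumn.to_pandas().
scalar = pd.Timestamp("2024-01-01")
column = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]), name="date")

check_date_scalar(scalar)
check_date_column(column)
```

In a real test suite the stand-ins would be replaced by the actual expression results, and the same two checks could be reused for both `to_pandas()` and `execute()`.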

@mfatihaktas mfatihaktas self-assigned this Mar 26, 2024
@mfatihaktas mfatihaktas added the datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) label Mar 26, 2024
@mfatihaktas mfatihaktas force-pushed the fix/to_pandas-should-return-datetime64 branch 2 times, most recently from d3643b5 to 79ce1c3 Compare March 27, 2024 18:55
Member

@cpcloud cpcloud left a comment

The approach looks reasonable; the devil is in the details, as can be seen in CI 😂

@@ -74,7 +74,7 @@ def convert_Date(cls, s, dtype, pandas_type):
else:
s = dd.to_datetime(s)

return s.dt.normalize()
Member

I think we might want to keep this, but I'm not 100% sure. In theory, once you're in this function you shouldn't need to normalize ... in theory :)
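
To illustrate what the `normalize()` call under discussion does, here is a small pandas-only sketch. The sample values are made up; the real input to `convert_Date` depends on the backend.

```python
import pandas as pd

# Made-up input: one value carries a time-of-day component, one is already at midnight.
s = pd.Series(pd.to_datetime(["2024-01-01 13:45:00", "2024-01-02 00:00:00"]))

# normalize() resets the time component to midnight and keeps the datetime64 dtype.
normalized = s.dt.normalize()

# Applying it a second time changes nothing, so the call is harmless when the
# input is already date-only.
assert normalized.dt.normalize().equals(normalized)
print(normalized.tolist())
```

Because the operation is idempotent, keeping it only costs an extra pass over the data; dropping it relies on every caller passing values that are already at midnight.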

# not run the tests on this backend. If we do, should we also
# modify `convert_Timestamp_element()` and
# `convert_Time_element()` similarly?
return pd.Timestamp.fromisoformat
Member

This is sort of up for grabs given that pandas doesn't have a standard way to represent an array of dates (the _element suffix implies [perhaps not in an obvious way] that this function is being called once per element of an array).

I think it's fine to also change this to use pandas timestamps.
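
As a rough sketch of what "called once per element of an array" means, the snippet below applies a per-element converter to a made-up array-of-dates column. The helper `convert_elements` and the sample data are hypothetical and are not part of ibis; `pd.Timestamp` is used directly as the converter, on the assumption that the values are ISO-formatted strings.

```python
import datetime

import pandas as pd

# Hypothetical array-of-dates column: each row holds a list of ISO date strings.
rows = pd.Series([["2024-01-01", "2024-01-02"], ["2024-02-01"]])


def convert_elements(series, convert_element):
    """Apply a per-element converter to every item of every row (illustrative helper)."""
    return series.map(lambda items: [convert_element(item) for item in items])


# Option 1: plain Python dates for each element.
as_dates = convert_elements(rows, datetime.date.fromisoformat)

# Option 2: pandas timestamps for each element, as suggested in the comment above.
as_timestamps = convert_elements(rows, pd.Timestamp)
```

Either way the outer Series stays `object` dtype, since each row is a Python list; only the type of the individual elements differs.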

@mfatihaktas mfatihaktas force-pushed the fix/to_pandas-should-return-datetime64 branch 6 times, most recently from 55b3c7a to 3271f89 Compare March 28, 2024 17:51
@mfatihaktas mfatihaktas changed the title fix: return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() [WIP] fix: return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() Mar 29, 2024
@mfatihaktas mfatihaktas force-pushed the fix/to_pandas-should-return-datetime64 branch from 3271f89 to d1b4cf6 Compare March 29, 2024 20:52
@mfatihaktas mfatihaktas marked this pull request as ready for review April 1, 2024 13:51
@mfatihaktas mfatihaktas requested a review from cpcloud April 1, 2024 15:59
@ncclementi
Contributor

@mfatihaktas What's the status of this one? Would you mind fixing the conflicts, triggering CI to see if things are green, and then requesting a review?

@mfatihaktas
Contributor Author

> @mfatihaktas What's the status of this one? Would you mind fixing the conflicts, triggering CI to see if things are green, and then requesting a review?

Thanks for checking on it. Regarding the status, I marked this PR as ready-for-review some time ago. I am busy with another project this week, but will try to resolve the conflicts before EOD on Friday.

@mfatihaktas mfatihaktas force-pushed the fix/to_pandas-should-return-datetime64 branch 2 times, most recently from de1fe8a to 440d7a8 Compare June 10, 2024 19:44
@cpcloud cpcloud added this to the 10.0 milestone Jun 13, 2024
@cpcloud
Member

cpcloud commented Jul 1, 2024

I think this PR is good to go, but it's slated for 10.0, so please do not merge.

Member

@cpcloud cpcloud left a comment

DO NOT MERGE

@gforsyth gforsyth changed the title fix: return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() [10.0] fix: return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() Jul 1, 2024
@gforsyth
Member

gforsyth commented Jul 1, 2024

I made the PR title non-compliant with conventional commits, so we can avoid accidentally clicking the big green button.

Contributor

github-actions bot commented Jul 1, 2024

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

@cpcloud cpcloud added the breaking change Changes that introduce an API break at any level label Aug 4, 2024
@cpcloud cpcloud changed the title [10.0] fix: return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() fix(datatypes): return pd.Timestamp or pd.Series[datetime64] for date.to_pandas() Dec 30, 2024
….to_pandas()

BREAKING CHANGE: The returned dtype of timestamp and date expressions in DataFrames and Series will be `datetime64` instead of object.
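
For readers affected by this change, the difference between the two representations can be sketched with pandas alone. The Series below are stand-ins built by hand, not actual ibis output, and the exact `datetime64` unit may differ from what a backend returns (the example in the description shows `datetime64[s]`).

```python
import datetime

import pandas as pd

# Stand-in for the old behaviour: an object-dtype Series of datetime.date values.
old_style = pd.Series(
    [datetime.date(2024, 1, 1), datetime.date(2024, 1, 2)], name="date"
)

# Stand-in for the new behaviour: a datetime64 Series of midnight timestamps.
new_style = pd.to_datetime(old_style)

# Code that still needs datetime.date objects can convert back explicitly.
restored = new_style.dt.date

print(old_style.dtype, new_style.dtype, restored.dtype)
```

Code that compared results against `datetime.date` values, or that relied on `object` dtype, is the most likely to need such an explicit conversion.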
@cpcloud cpcloud force-pushed the fix/to_pandas-should-return-datetime64 branch from 440d7a8 to a0d8daa Compare December 31, 2024 13:21
@github-actions github-actions bot added docs Documentation related issues or PRs tests Issues or PRs related to tests sqlite The SQLite backend snowflake The Snowflake backend oracle The Oracle backend labels Dec 31, 2024
Labels
breaking change: Changes that introduce an API break at any level
datatypes: Issues relating to ibis's datatypes (under `ibis.expr.datatypes`)
docs: Documentation related issues or PRs
oracle: The Oracle backend
snowflake: The Snowflake backend
sqlite: The SQLite backend
tests: Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should date.to_pandas() return datetime64?
4 participants