BUG: MultiIndex union/difference not commutative #60642
Labels
Bug
Index
Related to the Index class or subclasses
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
MultiIndex
Needs Info
Clarification about behavior needed to assess issue
Needs Triage
Issue that has not been reviewed by a pandas team member
setops
union, intersection, difference, symmetric_difference
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
I wasn't able to extract the data for this example to set up the test case more programmatically, but I managed to reduce the data significantly without compromising the behaviour. I hope the below is somewhat portable (please let me know if it isn't). I used numpy==2.2.1 and pandas==2.2.3.
Issue Description
Creating the union of two indices with a nan level causes the union result to depend on the order of the call (index1.union(index2) vs. index2.union(index1)). In other words, one of the calls yields the wrong result, as the call deems every row to be distinct. I'm fairly certain that this is due to the nan value in dim1, but if I recreate the example programmatically, the behaviour is as expected.
However, in test cases for a rather large application, I arrive at the state from the pickle example. I'm not sure what's different from the working example.
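A minimal sketch of how the order dependence could be checked, assuming two MultiIndex objects with a nan in one level. The names index1, index2 and dim1 come from the description above; the values, the dim0 level name and the helper setop_summary are hypothetical placeholders. Because these indices are built programmatically, they may well not trigger the bug, so the pickled indices would need to be substituted to see the mismatch.

```python
import numpy as np
import pandas as pd


def setop_summary(left: pd.MultiIndex, right: pd.MultiIndex) -> dict:
    """Collect the sizes of the set operations in both call orders."""
    return {
        "union_lr": len(left.union(right)),
        "union_rl": len(right.union(left)),
        "diff_lr": len(left.difference(right)),
        "diff_rl": len(right.difference(left)),
    }


# Placeholder data (not the data from the report): two equal indices
# whose second level, "dim1", contains a nan.
index1 = pd.MultiIndex.from_arrays(
    [["a", "b", "c"], [1.0, np.nan, 3.0]], names=["dim0", "dim1"]
)
index2 = index1.copy()

summary = setop_summary(index1, index2)
print(summary)
```

If the two union sizes differ, or a difference is non-empty for indices that print identically, the call order matters in the way described.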
Expected Behavior
I would expect the difference of the two indices from the pickled example to be empty and the union to be the same as the two indices.
I am also at a loss as to why I can't reproduce the wrong behaviour programmatically.
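One way to look for the difference between the pickled and the programmatically built indices might be to compare their internal representation rather than their printed values. The sketch below assumes that a discrepancy, if any, would show up in the level dtypes, in whether a level itself stores a nan, or in the integer codes. This is a guess, not a confirmed cause. The helper describe_multiindex and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def describe_multiindex(idx: pd.MultiIndex) -> dict:
    """Summarise parts of a MultiIndex that its repr does not show."""
    return {
        "names": list(idx.names),
        "level_dtypes": [str(level.dtype) for level in idx.levels],
        # Whether a level itself stores a missing value.
        "nan_in_levels": [bool(level.hasnans) for level in idx.levels],
        # Whether a missing value is encoded as code -1.
        "negative_codes": [bool((codes == -1).any()) for codes in idx.codes],
        "is_unique": idx.is_unique,
    }


# Placeholder index; in practice this would be applied to both the
# pickled index and the rebuilt one, and the two dicts compared.
rebuilt = pd.MultiIndex.from_arrays(
    [["a", "b"], [1.0, np.nan]], names=["dim0", "dim1"]
)
info = describe_multiindex(rebuilt)
print(info)
```

Running the same helper on the unpickled indices and diffing the resulting dicts would show whether the two constructions store the nan differently.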
Installed Versions