Rechunker broken with new xarray and zarr #154

Open
juliettelavoie opened this issue Jan 7, 2025 · 0 comments

It looks like recent versions of xarray broke rechunker when rechunking a Zarr store that has been written to disk and reopened.

Minimal working example:

import xarray as xr
from rechunker import rechunk

# Write the tutorial dataset to disk, then reopen it from the Zarr store.
ds = xr.tutorial.open_dataset("air_temperature")
ds.to_zarr('ds1.zarr')
ds = xr.open_zarr('ds1.zarr')

# Building the rechunking plan is the step that raises the error.
array_plan = rechunk(
    ds,
    target_chunks={"time": 2920, "lat": 25, "lon": 1},
    max_mem='1GB',
    target_store='ds2.zarr',
    temp_store='temp'
)

This worked with xarray 2024.9.0, but with xarray 2024.10.0 and later I get the following error:

Traceback (most recent call last):
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-37-1d0e2ad48247>", line 4, in <module>
    array_plan = rechunk(
                 ^^^^^^^^
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.12/site-packages/rechunker/api.py", line 303, in rechunk
    copy_spec, intermediate, target = _setup_rechunk(
                                      ^^^^^^^^^^^^^^^
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.12/site-packages/rechunker/api.py", line 447, in _setup_rechunk
    variable_encoding = extract_zarr_variable_encoding(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.12/site-packages/xarray/backends/zarr.py", line 473, in extract_zarr_variable_encoding
    chunks = _determine_zarr_chunks(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.12/site-packages/xarray/backends/zarr.py", line 329, in _determine_zarr_chunks
    for zchunk, dchunks, interval, size in zip(
                                           ^^^^
TypeError: 'NoneType' object is not iterable
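
The last frame fails inside a `zip(...)` call, which raises `TypeError` as soon as any one of its arguments is `None`. The sketch below reproduces only that mechanism in isolation. `pair_chunks` and its parameter names are hypothetical, and the traceback does not show which argument is actually `None` in xarray.

```python
def pair_chunks(enc_chunks, var_chunks):
    """Pair each encoded chunk size with the matching variable chunks.

    Hypothetical helper for illustration only; it is not xarray code.
    """
    # zip() calls iter() on every argument, so a None argument raises TypeError.
    return list(zip(enc_chunks, var_chunks))


# Both arguments are iterable, so the pairing succeeds.
print(pair_chunks((25, 1), ((25,), (1,))))

# One argument is None, so zip() raises TypeError.
try:
    pair_chunks((25, 1), None)
except TypeError as err:
    print(f"TypeError: {err}")
```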

Note that if I don't write ds1 to disk, there is no error.
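
Until this is fixed, one way to avoid the error is to check the installed xarray version before calling `rechunk`. Below is a minimal sketch, assuming the only known boundary is the one reported above (2024.9.0 works, 2024.10.0 and later fail). `parse_release` and `is_affected` are hypothetical helper names, not part of xarray or rechunker, and no upper bound is assumed because the issue is still open.

```python
import re

# First xarray release reported as failing in this issue.
FIRST_AFFECTED = (2024, 10, 0)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading 'X.Y.Z' components of a version string as integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version is at or above the first release reported as failing."""
    return parse_release(version) >= FIRST_AFFECTED


# Typical use (assumes xarray is installed):
#     import xarray as xr
#     if is_affected(xr.__version__):
#         raise RuntimeError("rechunker issue #154: use xarray < 2024.10.0")
print(is_affected("2024.9.0"), is_affected("2024.10.0"))
```

Comparing integer tuples rather than raw strings matters here, because as strings "2024.9.0" sorts after "2024.10.0".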

juliettelavoie added a commit to Ouranosinc/xscen that referenced this issue Jan 15, 2025
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [x] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?

* Improve `build_partition_data`
* pinned xarray because of
pangeo-data/rechunker#154

### Does this PR introduce a breaking change?
Yes, I removed the `indicator_kw` argument. I realized it was too heavy to
do everything in one shot.

### Other information:

Developed for "On the importance of the reference data: Uncertainty
partitioning of bias-adjusted climate simulations over Quebec" by Lavoie
et al. (submitted in 2024) (presented at the last jamboree)
Code for the paper here: https://github.com/Ouranosinc/partition