Hi! Thanks for the very useful package! I think I found a bug in the chunk choice mechanism:
My input dataset has shape (176, 226, 55115) with chunks (20, 20, 55115). The requested output chunks are (80, 60, 365). I allowed 3GB of max_mem, and there is a temp store.
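For context, a minimal sketch of what the call looks like. The store paths and the way the source is opened are placeholders, not copied from my actual setup:

```python
# Minimal reproduction sketch; "source.zarr", "target.zarr" and "temp.zarr"
# are placeholder paths standing in for my real stores.
import zarr
from rechunker import rechunk

# shape (176, 226, 55115), chunks (20, 20, 55115), dtype float32
source = zarr.open_array("source.zarr", mode="r")

plan = rechunk(
    source,
    target_chunks=(80, 60, 365),
    max_mem="3GB",
    target_store="target.zarr",
    temp_store="temp.zarr",
)
plan.execute()  # raises the ValueError below
```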
Rechunking fails with the following (partially elided) traceback:
File "/path/to/.conda/x38/lib/python3.9/site-packages/distributed/client.py", line 1813, in _gather
raise exception.with_traceback(traceback)
File "/path/to/.conda/x38/lib/python3.9/site-packages/rechunker/pipeline.py", line 47, in _copy_chunk
target[chunk_key] = data
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1213, in __setitem__
self.set_basic_selection(selection, value, fields=fields)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1308, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1599, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1651, in _set_selection
self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1888, in _chunk_setitem
self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1893, in _chunk_setitem_nosync
cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 1952, in _process_for_setitem
return self._encode_chunk(chunk)
File "/path/to/.conda/x38/lib/python3.9/site-packages/zarr/core.py", line 2009, in _encode_chunk
cdata = self._compressor.encode(chunk)
File "numcodecs/blosc.pyx", line 557, in numcodecs.blosc.Blosc.encode
File "/path/to/.conda/x38/lib/python3.9/site-packages/numcodecs/compat.py", line 102, in ensure_contiguous_ndarray
raise ValueError(msg)
ValueError: Codec does not support buffers of > 2147483647 bytes
Turns out 55115 * 176 * 226 = 2192254240 elements, i.e. about 8.8 GB as float32, and even the raw element count is slightly over the number in the error message (by about 2%). So I'm guessing rechunker is trying to put everything into a single chunk, even though this is way above max_mem? Also, I never asked for Blosc encoding, so I guess it is applied automatically? Not a problem in itself, but it seems a smaller chunk should be chosen in that case.
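For what it's worth, a quick sanity check of the arithmetic (the 2147483647 constant is taken straight from the error message):

```python
# The raw element count of the full array already exceeds Blosc's 2 GiB
# buffer limit, before even multiplying by the 4-byte float32 itemsize.
shape = (176, 226, 55115)

n_elements = shape[0] * shape[1] * shape[2]   # 2192254240
blosc_limit = 2_147_483_647                   # 2**31 - 1, from the error message

print(n_elements / blosc_limit)               # ~1.02 -> ~2% over, even at 1 byte/element
print(n_elements * 4 / 1e9)                   # ~8.77 -> ~8.8 GB as float32
```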