-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add ZSTD decompression support to IPC reader #693
Conversation
dfd3d4a
to
6452dc6
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #693 +/- ##
==========================================
- Coverage 87.63% 87.43% -0.20%
==========================================
Files 101 104 +3
Lines 14537 14851 +314
==========================================
+ Hits 12739 12985 +246
- Misses 1798 1866 +68 ☔ View full report in Codecov by Sentry. |
python/pyproject.toml
Outdated
@@ -45,6 +45,11 @@ requires = [ | |||
build-backend = "mesonpy" | |||
|
|||
[tool.meson-python.args] | |||
# Consistent version for zstd, rather than attempting to delocate whatever |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say delocate are you just referring to just to macOS or does this affect all platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should check the logs, but I believe on both MacOS and Linux the wheel build attempts to copy shared library dependencies into the build. In the MacOS case, it was attempting to copy a version of zstd that would have required updating the minimum supported MacOS version. I'm not sure if this would have happened on Linux (because I'm not sure if that library was installed on manylinux2014), but it seems better to have the same version of zstd sitting in the wheel build? (Maybe handling this via build arguments in the CI job that builds the wheels is better?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. I can take a look too.
Thinking out loud, we might need to be careful if it's the wheel build itself installing the libraries (because the subproject meson configurations instruct the build system to do so) or if it's the wheel repair jobs that are bundling improper libraries. I'm guessing it's the former.
I think we could add the following to pyproject.toml to prevent the former issue:
[tool.meson-python.args]
install = ['--skip-subprojects']
Or possibly even statically link the zstd subproject (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have the --skip-subprojects in pyrpoject.toml already...I think the issue was that meson actively tries to avoid building and statically linking if it can avoid it (but this is exactly what we want for the wheel, I think!). I'm happy with this at the moment but we should definitely take a closer look at exactly what's packaged before we release the first version with the meson-python build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more minor comments but this lgtm. Nice work - this was a good deal of effort
@WillAyd Thank you for the review! |
This PR implements ZSTD buffer decompression in the
ArrowIpcDecoder
and in theArrowArrayStreamReader
when built with-DNANOARROW_IPC_WITH_ZSTD=ON
. It also allows a user to inject support for these into theArrowIpcDecoder
if for whatever reason they don't have control over the build flags (or want to use ZSTD that has been made available to them in a different way).This doesn't implement multithreaded decompression but does allow a user to implement it by not using the default
ArrowIpcSerialDecompressor()
. This could be included in header-only C++ if there is some interest.A non-trivial example in Python bindings (where were also wired up to support it):
This PR doesn't implement the R bindings (because adding a zstd dependency there is a can of worms better suited to another PR).