Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider distributing zlib-ng #464

Open
indygreg opened this issue Jan 1, 2025 · 6 comments
Open

Consider distributing zlib-ng #464

indygreg opened this issue Jan 1, 2025 · 6 comments
Labels
performance Potential performance improvement

Comments

@indygreg
Copy link
Collaborator

indygreg commented Jan 1, 2025

We build and distribute our own zlib library.

We're currently using the canonical zlib library available from zlib.net.

zlib-ng (https://github.com/zlib-ng/zlib-ng) has various optimizations for zlib that can realize >10% perf wins while maintaining API compatibility. API compatibility means that zlib-ng should just work.

I reckon we could safely compile against and distribute zlib-ng so end-users automagically see significant performance wins.

@indygreg indygreg added the performance Potential performance improvement label Jan 1, 2025
@indygreg
Copy link
Collaborator Author

indygreg commented Jan 1, 2025

Note that on macOS we don't currently build zlib from source.

I can't definitely recall the historical reason for this. But I think it had something to do with macOS is guaranteed to have a copy of libz at /usr/lib/libz.1.dylib and it was acceptable to just fall back to using the system zlib to keep things simpler. There may also be an issue with statically linking libz on macOS since other framework dependencies are likely to pull in /usr/lib/libz.1.dylib as a dependency, potentially creating symbol conflicts.

I mention this because if we do ship zlib-ng it could create [unwanted] divergence between macOS and !macOS distributions.

@zanieb
Copy link
Member

zanieb commented Jan 1, 2025

Of minor note, we recently switched from zlib-ng to zlib-rs in uv (astral-sh/uv#9184)

@indygreg
Copy link
Collaborator Author

indygreg commented Jan 2, 2025

I wasn't aware of the pure rust zlib-rs crate! That does look promising. I see some use of AVX-2 in there, which could make it performance competitive with zlib-ng.

However, it appears to be using compile-time CPU feature detection via e.g. #[target_feature(enable = "avx2")]. This means you have to explicitly target the Rust binary to a more modern CPU instruction set, as AVX2 isn't in the regular x86-64 instruction set.

zlib-ng, by contrast, uses runtime CPU feature detection. So the binary compiles in different versions of functions using different assembly instructions and then picks the best one at runtime. This is vastly more reliable than build time targeting since 1 binary runs optimally on ~every x86-64 CPU microarchitecture level. (You can achieve similar functionality in Rust by using a crate like https://crates.io/crates/multiversion.)

While I haven't reproduced, I'd be surprised if a vanilla x86-64 target build of zlib-rs was performance competitive with zlib-ng without using AVX2 instructions. It is likely that whoever conducted the performance benchmark you linked (@charliermarsh?) had a RUSTFLAGS="-C target-cpu=native" (or similar) hanging out in their ~/.cargo/config and was actually testing an AVX2 enabled build without knowing it. Or there was some other flaw in the benchmark (like zlib-ng not having appropriate features enabled.)

If zlib-rs -AVX2 is actually performance competitive with zlib-ng +AVX2, this would be worthy of tech blog headlines since it means the Rust compiler at the vanilla x86-64 instruction set is performance competitive with C + handwritten AVX2 assembly for similar functionality. (I doubt this is the case and it is more likely the benchmarking methodology was flawed.)

@charliermarsh
Copy link
Member

I ran it on my M3, so it seems more likely that none of the x86 feature sets were involved at all? Still might be worth it in our case to remove our CMake dependency.

@indygreg
Copy link
Collaborator Author

indygreg commented Jan 2, 2025

I ran it on my M3, so it seems more likely that none of the x86 feature sets were involved at all?

Yup. This is closer to a Rust vs C benchmark. Or a comparison of zlib-ng vs zlib-rs ARM support. TBH I'm unsure how optimized each is on ARM: I've mostly heard about the various x86-64 assembly optimizations.

@charliermarsh
Copy link
Member

Makes sense! (Maybe we'll restore zlib-ng for x86 builds in uv.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

No branches or pull requests

3 participants