Consider distributing zlib-ng #464

indygreg · 2025-01-01T22:11:29Z

We build and distribute our own zlib library.

We're currently using the canonical zlib library available from zlib.net.

zlib-ng (https://github.com/zlib-ng/zlib-ng) has various optimizations for zlib that can realize >10% perf wins while maintaining API compatibility. API compatibility means that zlib-ng should just work.

I reckon we could safely compile against and distribute zlib-ng so end-users automagically see significant performance wins.

indygreg · 2025-01-01T22:19:15Z

Note that on macOS we don't currently build zlib from source.

I can't definitely recall the historical reason for this. But I think it had something to do with macOS is guaranteed to have a copy of libz at /usr/lib/libz.1.dylib and it was acceptable to just fall back to using the system zlib to keep things simpler. There may also be an issue with statically linking libz on macOS since other framework dependencies are likely to pull in /usr/lib/libz.1.dylib as a dependency, potentially creating symbol conflicts.

I mention this because if we do ship zlib-ng it could create [unwanted] divergence between macOS and !macOS distributions.

zanieb · 2025-01-01T23:32:28Z

Of minor note, we recently switched from zlib-ng to zlib-rs in uv (astral-sh/uv#9184)

indygreg · 2025-01-02T03:04:13Z

I wasn't aware of the pure rust zlib-rs crate! That does look promising. I see some use of AVX-2 in there, which could make it performance competitive with zlib-ng.

However, it appears to be using compile-time CPU feature detection via e.g. #[target_feature(enable = "avx2")]. This means you have to explicitly target the Rust binary to a more modern CPU instruction set, as AVX2 isn't in the regular x86-64 instruction set.

zlib-ng, by contrast, uses runtime CPU feature detection. So the binary compiles in different versions of functions using different assembly instructions and then picks the best one at runtime. This is vastly more reliable than build time targeting since 1 binary runs optimally on ~every x86-64 CPU microarchitecture level. (You can achieve similar functionality in Rust by using a crate like https://crates.io/crates/multiversion.)

While I haven't reproduced, I'd be surprised if a vanilla x86-64 target build of zlib-rs was performance competitive with zlib-ng without using AVX2 instructions. It is likely that whoever conducted the performance benchmark you linked (@charliermarsh?) had a RUSTFLAGS="-C target-cpu=native" (or similar) hanging out in their ~/.cargo/config and was actually testing an AVX2 enabled build without knowing it. Or there was some other flaw in the benchmark (like zlib-ng not having appropriate features enabled.)

If zlib-rs -AVX2 is actually performance competitive with zlib-ng +AVX2, this would be worthy of tech blog headlines since it means the Rust compiler at the vanilla x86-64 instruction set is performance competitive with C + handwritten AVX2 assembly for similar functionality. (I doubt this is the case and it is more likely the benchmarking methodology was flawed.)

charliermarsh · 2025-01-02T03:15:13Z

I ran it on my M3, so it seems more likely that none of the x86 feature sets were involved at all? Still might be worth it in our case to remove our CMake dependency.

indygreg · 2025-01-02T03:22:49Z

I ran it on my M3, so it seems more likely that none of the x86 feature sets were involved at all?

Yup. This is closer to a Rust vs C benchmark. Or a comparison of zlib-ng vs zlib-rs ARM support. TBH I'm unsure how optimized each is on ARM: I've mostly heard about the various x86-64 assembly optimizations.

charliermarsh · 2025-01-02T03:38:28Z

Makes sense! (Maybe we'll restore zlib-ng for x86 builds in uv.)

indygreg added the performance Potential performance improvement label Jan 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider distributing zlib-ng #464

Consider distributing zlib-ng #464

indygreg commented Jan 1, 2025

indygreg commented Jan 1, 2025

zanieb commented Jan 1, 2025

indygreg commented Jan 2, 2025

charliermarsh commented Jan 2, 2025

indygreg commented Jan 2, 2025

charliermarsh commented Jan 2, 2025

Consider distributing zlib-ng #464

Consider distributing zlib-ng #464

Comments

indygreg commented Jan 1, 2025

indygreg commented Jan 1, 2025

zanieb commented Jan 1, 2025

indygreg commented Jan 2, 2025

charliermarsh commented Jan 2, 2025

indygreg commented Jan 2, 2025

charliermarsh commented Jan 2, 2025