
Htscodecs 1.3.0

@jkbonfield jkbonfield released this 09 Aug 10:23

Release 1.3.0: 9th August 2022

The primary change in this release is a new SIMD-enabled rANS codec.

Changes

  • There is a 32-way unrolled rANS implementation. This is accessed using the existing rans 4x16 API with the RANS_ORDER_X32 bit set.
    Implementations exist for SSE4.1, AVX2, AVX512 and ARM Neon, as well as traditional non-SIMD scalar code in C and JavaScript. See
    the commit logs for benchmarks.

  • Improved memory allocation via a new htscodecs_tls_alloc function. This uses Thread Local Storage (TLS) to avoid multiple malloc/free calls, reducing system CPU time.

  • Some external functions have been renamed, with the old names still available but deprecated. Every symbol should now start with hts_, rans_, arith_, fqz_ or tok3_.

  • Improved test framework with an "entropy" tool that iterates over all entropy encoders.

  • Updated the Appveyor CI image to use a newer gcc. Also added ARM to the list of processors to test on.

  • Tab vs space code changes. Use "git diff -w" to see through these.

  • Reworked fuzzing infrastructure.

  • Small speed improvements to various rANS encoders and decoders. These were tested on a broad range of compilers, versions and systems. The new code may be slightly slower with some combinations, but is faster overall and removes a few outliers with considerably degraded performance.

  • Substantial memory reduction to the name tokeniser (tok3).
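
The thread-local allocation idea from the list above can be sketched as follows. This is a minimal illustration only, not the htscodecs implementation: the names scratch_alloc and scratch_free are hypothetical, and the real htscodecs_tls_alloc may differ in signature and behaviour. It assumes a C11 compiler with _Thread_local support.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread scratch buffer; these names are illustrative
 * and are not part of the htscodecs API. */
static _Thread_local void  *scratch_buf  = NULL;
static _Thread_local size_t scratch_size = 0;

/* Return a buffer of at least `size` bytes owned by the calling thread.
 * The buffer is grown only when the current one is too small, so repeated
 * calls with similar sizes avoid a malloc/free pair per call.
 * Callers should not rely on the previous contents. */
void *scratch_alloc(size_t size) {
    if (size > scratch_size) {
        void *p = realloc(scratch_buf, size);
        if (!p)
            return NULL;   /* the old buffer is still valid and still owned */
        scratch_buf  = p;
        scratch_size = size;
    }
    return scratch_buf;
}

/* Release the calling thread's buffer, for example before the thread exits. */
void scratch_free(void) {
    free(scratch_buf);
    scratch_buf  = NULL;
    scratch_size = 0;
}
```

Because the buffer belongs to one thread, no locking is needed, at the cost of each thread holding on to its largest request until it is freed.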

Bug fixes

  • Fixed undefined behaviour in our use of __builtin_clz().

  • Fixed a few redundant #includes.

  • Worked around strict aliasing bugs, uncovered with gcc -O2.

  • Fixed an issue with encoding data blocks close to 2GB in size. (Additionally blocks above 2GB now error, rather than crashing or returning incorrect results.)

  • Fixed an encode error with large blocks using RANS_ORDER_STRIPE.
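
On the __builtin_clz() item above: the release notes do not say which case was fixed, but a common pitfall with this GCC/Clang builtin is that its result is undefined when the argument is zero. A minimal sketch of a guarded wrapper, using the hypothetical helper name ilog2_safe (not an htscodecs function), might look like this:

```c
#include <limits.h>

/* Hypothetical helper: floor(log2(x)) for x > 0, and -1 for x == 0.
 * __builtin_clz(0) is undefined, so zero is handled before the call.
 * Requires GCC or Clang for the builtin. */
int ilog2_safe(unsigned int x) {
    if (x == 0)
        return -1;
    /* Index of the highest set bit = bit width - 1 - leading zero count. */
    return (int)(sizeof(unsigned int) * CHAR_BIT) - 1 - __builtin_clz(x);
}
```

The choice of -1 for a zero input is an assumption for this sketch; real code should pick whatever sentinel or error path suits the caller.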