Releases: samtools/htscodecs
htscodecs 1.0
Release 1.0: 23rd Feb 2021
This marks the first non-beta release of htscodecs, following a period of integration with Htslib and automated fuzzing by Google's OSS-Fuzz program.
[Note this testing only applies to the C implementation. The JavaScript code should still be considered as examples of the codecs, more for purposes of understanding and clarity than as a fully optimised and tested release.]
Since the last release (0.5) the key changes are:
- Improved support for big-endian platforms.
- Sped up CRAM 3.0 4x8 rANS order-1 encoding. It is between 10% and 50% faster, depending on the input data.
- Improved autoconf bzip2 checks and tidied up "make test" output.
- Added some more files into "make install", so that "make distcheck" now passes.
- Replaced Travis with Cirrus-CI testing.
- Removed various C undefined behaviour, such as left shifting of negative values and integer overflows. As far as we know these are currently harmless on the supported platforms, but they may break under future compiler optimisations.
- Fixed numerous flaws identified by OSS-Fuzz. Some of these were potential security issues, such as small buffer overruns.
- Tidied up some code to prevent warnings.
- The name tokeniser now has a limit on the size of data it can encode (10 million records). This may still be too high given the memory it will require, so it may be reduced again.
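The undefined-behaviour item above mentions left shifts of negative values and integer overflows. As a minimal sketch of the general kind of fix (the helper names are hypothetical and are not taken from the htscodecs source), the arithmetic can be moved to unsigned types, where shifts and wrap-around are well defined:

```c
#include <stdint.h>

/* Hypothetical helper, not an htscodecs function.
 * Left-shifting a negative int32_t is undefined behaviour in C, so the
 * shift is performed on the unsigned representation instead. */
static inline uint32_t shl_as_unsigned(int32_t v, unsigned bits) {
    return (uint32_t)v << bits;   /* well defined for bits < 32 */
}

/* Hypothetical helper: addition that wraps modulo 2^32 instead of
 * triggering signed integer overflow. */
static inline uint32_t add_wrap_u32(uint32_t a, uint32_t b) {
    return a + b;
}
```

Both helpers return unsigned values deliberately: converting an out-of-range result back to a signed type is implementation-defined, so callers that need a signed result should handle that step explicitly.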
htscodecs-0.5
This release has a few renamed functions (the variable sized integer encoding functions) and thus is incompatible with v0.4. The test tools now also incorporate a "raw" mode ("-r") for purposes of creating un-wrapped byte streams without data sizing information. These now match the CRAMcodecs specification.
Full changes:
- Renamed the varint functions and also added signed versions.
- rANS 4x16 order-1 frequency tables are now configurable (within the byte stream) to 10 or 12 bit totals. Previously it was 10, but this is too small for efficient compression of extreme distributions.
- rANS 4x16 X4 has been renamed STRIPE and can now interleave other quantities than just 4 streams.
- Sped up the C rANS 4x16 order-1 decoder, often by around 30% if SSE4 is permitted (try "-march=native").
- Sped up the C RLE decoding function. Also refactored this code into its own rle.c file.
- Bug fix to the name tokeniser so it can handle blank lines.
- Fixed RLE encoding in the rANS 4x16 JavaScript implementation. It can no longer generate invalid streams when it doesn't find anything worth applying RLE to.
- Fixed JavaScript rANS 4x16 frequency renormalisation. Occasionally it would generate very suboptimal frequency distributions. The revised algorithm is also used in the C code (which didn't have that problem, but was still improved).
- Fixed JavaScript exception handling in tok3.js (with thanks to Chris Norman).
- Fixed the JavaScript rans.js to allocate the correct data size. It could fail on tiny inputs.
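The STRIPE item above describes interleaving an arbitrary number of streams rather than a fixed 4. The following is a rough sketch of that idea only, assuming byte i of the input is assigned to stream i % n. The function names and the output layout are hypothetical and do not reflect the htscodecs API or the actual byte stream format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the htscodecs API.
 * Splits in[0..len) into n streams: byte i goes to stream i % n.
 * The streams are written back to back into out (len bytes), and
 * start[j] receives the offset in out where stream j begins. */
static void stripe_split(const uint8_t *in, size_t len, size_t n,
                         uint8_t *out, size_t *start) {
    assert(n > 0);
    size_t pos = 0;
    for (size_t j = 0; j < n; j++) {
        start[j] = pos;
        for (size_t i = j; i < len; i += n)
            out[pos++] = in[i];
    }
}

/* Inverse of stripe_split: re-interleaves the n back-to-back streams
 * held in in[0..len) into their original byte order in out. */
static void stripe_join(const uint8_t *in, size_t len, size_t n,
                        uint8_t *out) {
    assert(n > 0);
    size_t pos = 0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = j; i < len; i += n)
            out[i] = in[pos++];
}
```

In a real codec each stream would then be compressed separately. Striping tends to help when the input is made of fixed-width records, since bytes at the same position within each record often have similar statistics.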
htscodecs-0.4
There are no new features in this release; simply improvements in portability and robustness.
Code portability for MacOS and Windows.
- On both of these platforms, as well as on Linux, memory management has been improved to avoid requiring large stack sizes. We use thread local storage to perform one malloc call and reuse this same block for each subsequent function call for the duration of that thread. This has the benefits of a large stack without penalties of repeated use of malloc/free.
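The memory-management approach described above can be sketched roughly as follows. This is an illustration only, assuming a C11 compiler with `_Thread_local` support. The names `tls_scratch` and `tls_scratch_free` are hypothetical, and the actual htscodecs code may use a different mechanism (for example pthread keys).

```c
#include <stdlib.h>

/* Hypothetical sketch, not the htscodecs implementation.
 * Each thread keeps one heap block and reuses it across calls,
 * growing it only when a larger size is requested. */
static _Thread_local void  *tls_buf  = NULL;
static _Thread_local size_t tls_size = 0;

/* Returns a scratch buffer of at least `size` bytes, or NULL if the
 * allocation fails (or if size is 0 and nothing was allocated yet).
 * Callers should not rely on contents surviving between calls. */
static void *tls_scratch(size_t size) {
    if (size > tls_size) {
        void *p = realloc(tls_buf, size);
        if (!p)
            return NULL;   /* the old block remains valid and owned */
        tls_buf  = p;
        tls_size = size;
    }
    return tls_buf;
}

/* Call once per thread before it exits to release the block. */
static void tls_scratch_free(void) {
    free(tls_buf);
    tls_buf  = NULL;
    tls_size = 0;
}
```

A function that previously declared a large local array would instead call `tls_scratch()` with the size it needs. After the first call on a given thread, later calls of the same or smaller size return the existing block without touching the allocator.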
Fixes
- Fixed a bug in the name tokeniser when there is a variable number of tokens.
- Removed some compilation warnings.
- The JavaScript demonstration code is more complete, with DO_REV support in fqzcomp.js.
htscodecs-0.3
Bug fixes and updates to C code. Note this includes some incompatibilities (see commits).
Improved testing.
Added a first draft of the JavaScript implementation. This isn't intended for production use, but is instead a reference implementation to be used alongside the codec specification document (currently a work in progress, over at https://github.com/jkbonfield/hts-specs/blob/CRAMv4/CRAMv4.tex).
htscodecs-0.2
Mainly portability (MacOSX) and fuzz testing fixes.
htscodecs-0.1
First test release of the htscodecs package.
See README.md for the minimal amount of documentation, or look at the test programs in the tests directory.