Add streamvbyte wrapper as well #2

apendleton · 2019-01-07T21:06:03Z

Two common operations when building inverted indexes are compressing and decompressing ID lists, and intersecting them. In addition to the already-present intersection implementation, this PR adds a wrapper around streamvbyte, an integer list compressor that is also by Daniel Lemire.
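For readers unfamiliar with the format, here is a rough scalar sketch of the idea behind stream-vbyte-style encoding: each integer is stored in 1 to 4 bytes, and a separate stream of control bytes records the length of each one using 2 bits per integer. This is only an illustration. The function names (`byte_len`, `encode`, `decode`) are hypothetical and are not the API of this wrapper or of the C library, and the real implementation uses SIMD rather than a loop like this.

```rust
/// Number of bytes (1..=4) needed to store `v` without leading zero bytes.
fn byte_len(v: u32) -> usize {
    match v {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    }
}

/// Hypothetical scalar encoder: returns (control bytes, data bytes).
/// Each control byte describes up to four integers, 2 bits each,
/// holding (byte length - 1).
fn encode(values: &[u32]) -> (Vec<u8>, Vec<u8>) {
    let mut control = vec![0u8; (values.len() + 3) / 4];
    let mut data = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        let len = byte_len(v);
        control[i / 4] |= ((len - 1) as u8) << ((i % 4) * 2);
        // Keep only the low `len` bytes, little-endian.
        data.extend_from_slice(&v.to_le_bytes()[..len]);
    }
    (control, data)
}

/// Hypothetical scalar decoder. The element count is not stored in the
/// streams, so the caller has to supply it.
fn decode(control: &[u8], data: &[u8], count: usize) -> Vec<u32> {
    let mut out = Vec::with_capacity(count);
    let mut pos = 0;
    for i in 0..count {
        let len = ((control[i / 4] >> ((i % 4) * 2)) & 0b11) as usize + 1;
        let mut buf = [0u8; 4];
        buf[..len].copy_from_slice(&data[pos..pos + len]);
        out.push(u32::from_le_bytes(buf));
        pos += len;
    }
    out
}
```

With this sketch, the five IDs `[3, 300, 70_000, 20_000_000, 5]` take 2 control bytes plus 11 data bytes, versus 20 bytes as raw `u32`s. Keeping the lengths in their own stream is what makes the format amenable to vectorized decoding.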

FAQ

Is this orthogonal to the purposes of the library?

Probably, but I want them both so I don't care very much.

Why not use a Rust compressor?

The bitpacking crate from Tantivy (https://github.com/tantivy-search/bitpacking) seems hard to use, since it requires knowledge of the internal block size used by the compressor. The existing Rust port of stream-vbyte doesn't seem to be maintained, and it also requires unstable Rust. It might be possible to get it to compile on stable, given that some SIMD functionality has stabilized since it was last touched, but I tried for a couple of hours and gave up.

Why not use one of the C++ compressors in the library you're already wrapping?

This library started out as a wrapper around https://github.com/lemire/SIMDCompressionAndIntersection which already contains a bunch of implementations of integer list compressors, including one for streamvbyte. The C++ implementation is apparently not vectorized though, per fast-pack/SIMDCompressionAndIntersection#22, so I opted for the C one instead under the assumption that it will be faster.

Add streamvbyte wrapper as well

802664c

apendleton merged commit 5add8e2 into master Jan 7, 2019

apendleton deleted the streamvbyte branch January 7, 2019 21:06
