Add streamvbyte wrapper as well #2

Merged
merged 1 commit into from
Jan 7, 2019

apendleton
Owner

Two common operations when building inverted indexes are compressing and decompressing ID lists, and intersecting them. This PR keeps the already-present intersection implementation and adds a wrapper around streamvbyte, an integer list compressor that is also by Daniel Lemire.
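To make the two operations concrete, here is a minimal scalar sketch in Rust. It is not the API this PR adds, and it is not guaranteed to be byte-compatible with the C streamvbyte library; the function names (`byte_len`, `encode`, `decode`, `intersect`) are hypothetical and exist only for illustration. It assumes the general Stream VByte idea of a control stream holding a 2-bit length per integer (four integers per control byte) plus a separate data stream of little-endian bytes, and a plain merge-style intersection of sorted lists.

```rust
use std::cmp::Ordering;

/// Number of bytes (1..=4) needed to store `v` without its leading zero bytes.
fn byte_len(v: u32) -> usize {
    match v {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    }
}

/// Illustrative encoder: returns (control bytes, data bytes).
/// Each value contributes two control bits holding `byte_len - 1`.
fn encode(values: &[u32]) -> (Vec<u8>, Vec<u8>) {
    let mut control = vec![0u8; (values.len() + 3) / 4];
    let mut data = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        let len = byte_len(v);
        // Pack four 2-bit lengths into each control byte, lowest bits first.
        control[i / 4] |= ((len - 1) as u8) << ((i % 4) * 2);
        // Store only the low `len` bytes of the value, little-endian.
        data.extend_from_slice(&v.to_le_bytes()[..len]);
    }
    (control, data)
}

/// Illustrative decoder: the caller must supply the number of encoded values.
fn decode(control: &[u8], data: &[u8], count: usize) -> Vec<u32> {
    let mut out = Vec::with_capacity(count);
    let mut pos = 0;
    for i in 0..count {
        let len = ((control[i / 4] >> ((i % 4) * 2)) & 0b11) as usize + 1;
        let mut buf = [0u8; 4];
        buf[..len].copy_from_slice(&data[pos..pos + len]);
        out.push(u32::from_le_bytes(buf));
        pos += len;
    }
    out
}

/// Merge-style intersection of two sorted, deduplicated ID lists.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```

The split between control and data streams is what makes the format friendly to SIMD decoding in the real library; this sketch decodes one value at a time and only shows the layout.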

FAQ

Is this orthogonal to the purposes of the library?

Probably, but I want them both so I don't care very much.

Why not use a Rust compressor?

The bitpacking crate from Tantivy (https://github.com/tantivy-search/bitpacking) seems hard to use: it requires knowledge of the internal block size used by the compressor. The existing Rust port of stream-vbyte doesn't seem to be maintained, and it also requires unstable Rust. It might be possible to get it to compile on stable, given that some SIMD features have stabilized since it was last touched, but I tried for a couple of hours and gave up.

Why not use one of the C++ compressors in the library you're already wrapping?

This library started out as a wrapper around https://github.com/lemire/SIMDCompressionAndIntersection, which already contains a number of integer list compressors, including one for streamvbyte. That C++ implementation is apparently not vectorized, though, per fast-pack/SIMDCompressionAndIntersection#22, so I opted for the C one instead, on the assumption that it will be faster.

@apendleton apendleton merged commit 5add8e2 into master Jan 7, 2019
@apendleton apendleton deleted the streamvbyte branch January 7, 2019 21:06