Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel Iter for Combinations #1197

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

hydrogs
Copy link

@hydrogs hydrogs commented Sep 26, 2024

Addresses #719

Brings the Itertools combinations method.

Also brings the array_combinations method. This one is not yet part of the public API of Itertools, but it is already available on their master branch. Since using arrays prevents unnecessary allocations I found it convenient to add it already, given Rayon focus on performance. Also it allows for eager code sharing between both methods.

I have chosen not to implement the tuple_combinations method from Itertools at this time, since apparently tuple_* functions are there to offer a similar functionality to arrays before const generics were a thing. There are proposals to migrate to array_* methods.

While itertools can encapsulate all the combinations logic within the increment_indices method, in Rayon this is a bit more tricky, since a kind of random access is required. Thus, this implementation uses unrank which computes the necessary indices for an arbitrary offset.

Itertools uses a buffered list to collect the previous iterator in a lazy way in order to be able to address arbitrary indices needed to seed the different combinations. This has been mentioned in #719. I haven't done something similar here as I don't see a way to collect a ParallelIterator in a lazy way, so the proposed implementation just collects the previous iterator into an Arc<[T]>. This is still better than the partial solution found in the issue, since this way there is no need to store all possible combinations in memory, just the base iterator which seeds them. Also, the combinations themselves are computed in parallel.

Handling overflows

Due to the factorial nature of combinations, overflows can occur easily. Both, Itertools and this PR use a checked version of the binomial function in order to detect such overflows. I have decided to panic in this case, since doing more than usize::MAX iterations on any 64 machine is probably just too much. Also, the code will panic if the length of the previous iterator is smaller than the k (length of each combination), as this is probably a bug in the caller implementation, and I find too inconvenient returning a Result or an Option with the Iterator. Feel free to comment about this.

Extensive testing

This PR includes an extensive set of tests, as I found it particularly challenging to get algorithms of this kind correct on the first try. Feel free to suggest pruning any unnecessary tests. The tests are relatively fast and do not significantly impact the overall test suite runtime though.

Tries to share as much code as possible with ParallelIterator::combinations
It uses the implementations from Itertools
In cases where `k` was 2 or 3, no guard was used, so this function could
return `None`, when the general algorithm can compute the actual Some(_).

Also, checked arithmetic was used when it is not needed, since the
initial guards are enough to assure no overflows will occur.

Tests where changed to include these cases.
@hydrogs hydrogs force-pushed the feature/combinations branch from d5fdacf to 5bbbe24 Compare October 1, 2024 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant