Faster and std::simd #53
I've always intended to remove intrinsics which are implemented in std::simd, but only once they've been RFC'd in explicitly or stabilized. I do think it's a good idea for faster to add some basic SIMD algorithms which can be done on most architectures (at least x86_64 and aarch64), stuff like the vector popcnts. The iterator system is definitely going to be faster's main value-add after std::simd is stabilized, however. I don't think they're trying to break into that space, and I don't want to duplicate the work they're doing. I think the degree to which we can eschew std::arch and my wrapper is pretty reliant on the surface area of std::simd. I need vector masks, gathers, scatters, and certain types of shuffles to make many of the iterators performant.
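As a rough illustration of that kind of basic, portable algorithm, here is a minimal per-lane popcount sketch assuming a packed_simd-style API (the crate name, `u64x4`, `count_ones`, and `wrapping_sum` are my assumptions about the portable API at the time, not part of faster's own surface):

```rust
// Sketch only: assumes a packed_simd-style portable API with per-lane
// `count_ones()`, a horizontal `wrapping_sum()`, and unaligned slice loads.
use packed_simd::u64x4;

/// Counts set bits across a word slice with 4-lane vectors,
/// falling back to scalar code for the tail.
fn popcount(words: &[u64]) -> u64 {
    let mut chunks = words.chunks_exact(4);
    let mut total = 0u64;
    for chunk in &mut chunks {
        let v = u64x4::from_slice_unaligned(chunk);
        // Per-lane popcount, then a horizontal sum of the four lanes.
        total += v.count_ones().wrapping_sum();
    }
    // Scalar tail for the remaining < 4 words.
    total += chunks.remainder().iter().map(|w| w.count_ones() as u64).sum::<u64>();
    total
}
```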
Just an update that I'm a bit stuck.

The good news is that with the latest changes in packed_simd I was able to compile a stripped-down branch of faster: https://github.com/ralfbiedert/faster/tree/budget_cuts. In parallel, I was trying to update the existing intrinsics layer.

It's frustrating, since apparently I could push forward either way, but neither one seems to be easy:

A) Fixing the current std::arch-based backend.

B) Ditching it in favor of packed_simd. The branch above cuts down on intrinsics, and it will be quite some work to get them back in place; work there might interfere with your plans of adding dynamic feature selection.

Option B) is still my favorite due to the cleaner code it promises. However, I feel I can't really push this forward myself, as it involves making some major architectural judgement calls that might interfere with dynamic feature selection and would cut down intrinsics until they have been restored bit by bit. Option A) I wouldn't really want to touch after my latest attempt.
I can confirm that this is not the intent.
Portable vector masks and shuffles are already available. Portable masked vector gathers and scatters, as well as compressed stores and uncompressed loads, are partially implemented; a PR for those should land soon. How well these will work in practice remains to be seen.
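For reference, the portable masks and shuffles mentioned above look roughly like this in a packed_simd-style API (a sketch under that assumption; the method and macro names may differ in the eventual std::simd):

```rust
// Sketch only: portable lane masks and compile-time shuffles in a
// packed_simd-style API; names may differ in the eventual std::simd.
use packed_simd::{f32x4, shuffle};

/// Replaces negative lanes with 0.0 using a mask-based select (no branches).
fn clamp_negative_to_zero(v: f32x4) -> f32x4 {
    let negative = v.lt(f32x4::splat(0.0)); // per-lane comparison -> mask
    negative.select(f32x4::splat(0.0), v)
}

/// Reverses the lane order with a compile-time index list.
fn reverse_lanes(v: f32x4) -> f32x4 {
    shuffle!(v, [3, 2, 1, 0])
}
```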
That's good to hear. Apologies if I'm a bit out of the loop, but is the current iteration of the std::simd RFC still a good approximation of what will eventually land?
I'd say 95% of it is a good approximation. There are some method names that have changed since the RFC, and the main change is in how the vector types from the RFC are defined. The most controversial thing in the RFC is the approximate floating-point methods, so as long as you don't use those you should be fine. I am hopeful that we can include them in some form, but there will be bikeshedding about the approximation error, how to control it, etc. There is also still one larger missing feature.
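For illustration, my understanding of that layout change (an assumption about the then-current packed_simd crate, not a quote from the RFC) is that the fixed-width types became thin aliases over one generic wrapper:

```rust
// Assumed layout of the portable types in packed_simd at the time;
// the real definitions carry more trait bounds and repr attributes.
pub struct Simd<A>(A);

// The fixed-width names from the RFC become plain aliases:
pub type f32x4 = Simd<[f32; 4]>;
pub type u8x16 = Simd<[u8; 16]>;
```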
Opening another ticket since this is a separate discussion from #47 and might be more controversial:
The more I look into the upcoming `std::simd`, the more I wonder whether `faster` should not become a thinner "SIMD-friendly iteration" library that neatly plugs into `std::simd` and is really good at handling variable slices, zipping, and so on, instead of providing a blanket implementation over `std::arch`.

Right now it seems that many common intrinsics and operations faster provides on packed types are, or might be, implemented in `std::simd` (compare coresimd/ppsv).

At the same time, for things that won't be in `std::simd` (and will be more platform specific), faster will have a hard time providing a consistent performance story anyway.

By that reasoning I see a certain appeal in primarily focusing on a more consistent cross-platform experience with a much lighter code base (e.g., imagine faster without `arch/` and `intrin/`, using mostly `std::simd` instead of `vektor`).

Faster could also integrate `std::arch`-specific functions and types, but rather as extensions and helpers (e.g., for striding) for special use cases, instead of using them as internal fundamentals.
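To make the proposed split a bit more concrete, here is a minimal sketch of what such a thin "SIMD-friendly iteration" helper could look like on top of a portable vector type; the function name and the packed_simd types are assumptions for illustration, not faster's actual API:

```rust
// Sketch only: a minimal "SIMD-friendly iteration" helper on top of a
// portable vector type; the tail handling is what such a layer would
// hide behind a uniform iterator interface.
use packed_simd::f32x8;

/// Applies `vector_op` to full 8-lane chunks of `data` and `scalar_op`
/// to the remainder, writing results into `out`.
fn simd_map_slice(
    data: &[f32],
    out: &mut [f32],
    vector_op: impl Fn(f32x8) -> f32x8,
    scalar_op: impl Fn(f32) -> f32,
) {
    assert_eq!(data.len(), out.len());
    let mut i = 0;
    while i + 8 <= data.len() {
        let v = f32x8::from_slice_unaligned(&data[i..i + 8]);
        vector_op(v).write_to_slice_unaligned(&mut out[i..i + 8]);
        i += 8;
    }
    // Scalar tail for the last len % 8 elements.
    for j in i..data.len() {
        out[j] = scalar_op(data[j]);
    }
}
```

For example, `simd_map_slice(&input, &mut output, |v| v * f32x8::splat(2.0), |x| x * 2.0)` would double every element, with the helper rather than the caller worrying about the slice tail.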