Choosing NEON over SVE when fixed size vectors are used where possible #2060
Comments
It is possible to bitcast SVE vectors to NEON vectors and vice versa on GCC and Clang releases that support the arm_neon_sve_bridge.h header, including Clang 14 and later and GCC 14 and later. A uint8x16_t vector can be bitcast to a svuint8_t vector using the intrinsics declared in that header. It is also possible to re-implement the HWY_SVE2_128 target to use the fixed-size vector, mask, and tuple types in arm_neon-inl.h (which are wrappers around fixed-size NEON vectors) instead of the SVE scalable vector, mask, and tuple types in arm_sve-inl.h. This works on compilers that support the arm_neon_sve_bridge.h header because full SVE vectors are exactly 16 bytes on the HWY_SVE2_128 target.
Here is a link to a Compiler Explorer snippet that demonstrates the use of the ARM NEON SVE Bridge intrinsics (which are defined in the arm_neon_sve_bridge.h header) to convert between NEON vectors and SVE vectors on the HWY_NEON target:
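As a rough illustration of the bridge intrinsics discussed in this comment, here is a minimal sketch of a NEON to SVE to NEON round trip. It assumes the ACLE intrinsics svset_neonq_u8, svget_neonq_u8 and svundef_u8 are available when the bridge header is present. RoundTrip16 and EXAMPLE_HAVE_BRIDGE are hypothetical names used only here, and the memcpy fallback exists only so the sketch still compiles on targets without SVE.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Use the bridge only when compiling for SVE and the header is present.
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_NEON) && defined(__has_include)
#if __has_include(<arm_neon_sve_bridge.h>)
#include <arm_neon.h>
#include <arm_sve.h>
#include <arm_neon_sve_bridge.h>
#define EXAMPLE_HAVE_BRIDGE 1  // hypothetical macro for this sketch
#endif
#endif
#ifndef EXAMPLE_HAVE_BRIDGE
#define EXAMPLE_HAVE_BRIDGE 0
#endif

// Hypothetical helper: copies 16 bytes from `in` to `out`, passing them
// through an SVE vector when the bridge intrinsics are available.
inline void RoundTrip16(const uint8_t* in, uint8_t* out) {
  assert(in != nullptr && out != nullptr);
#if EXAMPLE_HAVE_BRIDGE
  const uint8x16_t neon = vld1q_u8(in);
  // Place the NEON vector in the low 128 bits of an SVE vector. Any bits
  // above 128 are left undefined.
  const svuint8_t sve = svset_neonq_u8(svundef_u8(), neon);
  // Read the low 128 bits back as a NEON vector.
  const uint8x16_t back = svget_neonq_u8(sve);
  vst1q_u8(out, back);
#else
  // Fallback for compilers or targets without the bridge header.
  std::memcpy(out, in, 16);
#endif
}
```

On a 128-bit SVE implementation such as the HWY_SVE2_128 case described above, the low 128 bits are the whole vector, so no bits are left undefined by the conversion.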
If I understand correctly, the issue is that we use +1 to John's comment that SVE2_128 would work when running on Neoverse V2, but I think this use case is running on V1 which actually has 256-bit vectors. I don't have experience with the SVE/NEON bridge, that sounds interesting. But perhaps I don't fully understand the use case. If we are porting from NEON code, why not just use the NEON target? Is the issue that dynamic dispatch chooses SVE, even though for this use case NEON would be better? If so, we can either set HWY_DISABLED_TARGETS (HWY_NEON|HWY_NEON_WITHOUT_AES), or call hwy::DisableTargets at runtime to influence the dynamic dispatch.
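To make the effect of disabling targets concrete, here is a toy model of mask-based dispatch. It is not Highway's implementation: the ToyTarget bit values, their preference order (lowest bit wins) and ChooseTarget are all assumptions made up for this sketch. It only shows the general idea that a disabled-target mask removes candidates before the best remaining one is picked.

```cpp
#include <cstdint>

// Hypothetical target bits for this sketch only. Highway's real target
// constants have different values.
enum ToyTarget : int64_t {
  kToySve = 1 << 0,     // most preferred in this toy ordering
  kToyNeon = 1 << 1,
  kToyScalar = 1 << 2,  // least preferred, always usable
};

// Returns the most preferred target (lowest set bit in this toy ordering)
// among those the CPU supports and the caller has not disabled.
inline int64_t ChooseTarget(int64_t supported, int64_t disabled) {
  const int64_t usable = supported & ~disabled;
  if (usable == 0) return kToyScalar;  // keep a fallback if all are disabled
  return usable & -usable;             // isolate the lowest set bit
}
```

In this model, a CPU that supports both SVE and NEON dispatches to SVE by default, and to NEON once the SVE bit is passed in the disabled mask.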
Yes, the use case is running on V1, where scalable vectors are used in some parts of the code and fixed-size vectors in others. We haven't tested dynamic dispatch, only static dispatch, but even with dynamic dispatch I imagine there is currently no way to use NEON vectors for some parts of the code and SVE for others. Am I correct in this understanding, or is there actually a way of specifying this?
Hm, if the code is isolated and not alternating between SVE/NEON in the same function or source file, it is easy to compile one source file with SVE disabled (so it would use NEON on Arm), and the other one not. I suppose we could compile both NEON and SVE in the SVE target, and whenever the N in Simd<T, N, kPow2> is <= 16/sizeof(T), only enable the NEON functions. This would probably require quite a few updates to the SFINAE conditions in both files: disabling SVE for small vectors, and disabling NEON for non-capped vectors.
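The size-based selection suggested here could look roughly like the following. This is a standalone sketch, not Highway's code: Simd is a simplified stand-in for Highway's tag type, and FitsInNeon, Backend and BackendFor are hypothetical names. A large N is assumed to represent a non-capped vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Simplified stand-in for a Simd<T, N, kPow2> tag. Only T and N matter here.
template <typename T, size_t N, int kPow2>
struct Simd {
  using Lane = T;
  static constexpr size_t kMaxLanes = N;
};

// True when the lane cap fits in one 128-bit NEON register,
// i.e. N <= 16 / sizeof(T).
template <class D>
constexpr bool FitsInNeon() {
  return D::kMaxLanes <= 16 / sizeof(typename D::Lane);
}

enum class Backend { kNeon, kSve };

// Overload enabled only for small (capped) vectors.
template <class D, typename std::enable_if<FitsInNeon<D>()>::type* = nullptr>
constexpr Backend BackendFor(D) {
  return Backend::kNeon;
}

// Overload enabled only for vectors that may exceed 128 bits.
template <class D, typename std::enable_if<!FitsInNeon<D>()>::type* = nullptr>
constexpr Backend BackendFor(D) {
  return Backend::kSve;
}

// Compile-time check: four floats fit in 128 bits.
static_assert(BackendFor(Simd<float, 4, 0>()) == Backend::kNeon,
              "4 x float should select the NEON overload");
```

Because the two conditions are exact complements, every tag matches exactly one overload, which is the property the SFINAE updates in both files would need to preserve.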
I think that would be the ideal solution, but for now, how would one specify whether to use NEON or SVE on a per-function basis? I don't envision mixing NEON and SVE in one function, so if there's a way to specify it per function, that would most likely be enough.
It can work like this.
and for functions not involving a
I've noticed quite a severe performance hit when writing Highway code using fixed-size vectors where the size is smaller than the number of available lanes in SVE. This occurred when porting NEON code written for 128-bit vectors to Highway on an SVE machine which has 256-bit SVE vectors. Would it be possible for Highway to choose NEON vectors for fixed-size vectors where the specified size is smaller than or equal to 128 bits?