ENH: Re-enable VXE from build targets for sin/cos #27665
npy_uint64 simd_maski;
hn::StoreMaskBits(f32, simd_mask, (uint8_t*)&simd_maski);
hn::StoreMaskBits(f32, simd_mask, (uint8_t *)&simd_maski);
A better approach is needed for the libc fallback to properly handle scalable extensions. One potential improvement would be a new intrinsic, such as `bool TestBit(MASK, int pos)`, which would also be endian-friendly. For now, I applied a quick fix below to address a big-endian bug: passing a `uint64_t` as a `uint8_t` array leads to reading garbage bytes, because the mask bits land in the first bytes in memory, which on a big-endian host are the most significant bytes of the integer.
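To illustrate the endianness issue described above, here is a minimal standalone sketch. It does not use Highway; `store_mask_bits`, `mask_bytes_to_u64`, and `test_bit` are hypothetical helpers written only for this example, and the byte layout (lane `i` in bit `i % 8` of byte `i / 8`) is an assumption based on my reading of how `StoreMaskBits` is documented. The point is that assembling the integer from bytes explicitly, or testing a bit directly in the byte array, gives the same answer on little- and big-endian hosts, whereas reinterpreting the byte buffer as a `uint64_t` does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a mask-to-bits store (assumed layout):
// lane i sets bit (i % 8) of byte (i / 8), regardless of host endianness.
inline void store_mask_bits(const bool* lanes, std::size_t n, std::uint8_t* out) {
    std::memset(out, 0, (n + 7) / 8);
    for (std::size_t i = 0; i < n; ++i) {
        if (lanes[i]) {
            out[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
}

// Endian-independent conversion: byte 0 always becomes the least
// significant byte of the result, so lane i is always bit i.
inline std::uint64_t mask_bytes_to_u64(const std::uint8_t* bytes, std::size_t nbytes) {
    assert(nbytes <= 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < nbytes; ++i) {
        value |= std::uint64_t{bytes[i]} << (8 * i);
    }
    return value;
}

// Hypothetical scalar analogue of the proposed TestBit(MASK, pos):
// reads one lane's bit straight from the byte array.
inline bool test_bit(const std::uint8_t* bytes, std::size_t pos) {
    return (bytes[pos / 8] >> (pos % 8)) & 1u;
}
```

With lanes `{true, false, true, true}`, `store_mask_bits` writes `0x0D` into the first byte; `mask_bytes_to_u64(buf, 1)` returns `0x0D` on any host, and `test_bit(buf, 2)` is true. By contrast, storing through a `(uint8_t*)&u64` pointer and then reading the `uint64_t` places that byte in the top 8 bits on a big-endian machine.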
@r-devulap, Oh, my editor automatically reformatted the entire source code based on the NumPy clang-format configuration. Would you prefer that I revert these changes, or is this formatting acceptable for you?
All CI tests for s390x have passed.
a18b34b to b7ae869
@r-devulap if you are happy about the changes please just put it in (I don't think the reformatting matters too much -- it's not that much).
b7ae869 to bda2921
Yeah, no worries. This looks better anyway.
Yeah, makes sense. I had to fix some rebase problems, but once the CI passes it should be good to go in.
Let's put this in then, thanks @r-devulap and @seiko2plus.
Not sure if this is having any impact here: google/highway#2409
I can confirm that trying to install numpy>=2.0.0 for Python 3.10 breaks on a UBI-8 image.
As discussed in the optimization meeting, @seiko2plus wanted to split VSX and VXE into two separate PRs.
For clarification, SIMD optimizations for sine and cosine functions on both ppc64 and z/Architecture (IBM Z) were disabled by #25781 to bypass CI tests. This PR aims to re-enable optimizations for z/Architecture after addressing the following runtime errors, while #27627 re-enabled ppc64 optimizations.