Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] shuffle_v2 - see if we can rely on the compiler for mask, maskz #1890

Open
DenisYaroshevskiy opened this issue Jul 14, 2024 · 2 comments
Assignees

Comments

@DenisYaroshevskiy
Copy link
Collaborator

At the moment the effort to support maskz versions of operations is

a) duplicated:

else if constexpr( !P::has_zeroes )
{
static_assert(P::reg_size > 16, "sanity check - sse alignr is better");
constexpr std::ptrdiff_t shift_epi32 = *starts_from * P::g_size / 4;
constexpr std::ptrdiff_t shift_epi64 = *starts_from * P::g_size / 8;
if constexpr( P::reg_size == 32 )
{
if constexpr( P::g_size >= 8 ) return _mm256_alignr_epi64(y, x, shift_epi64);
else return _mm256_alignr_epi64(y, x, shift_epi32);
}
else
{
if constexpr( P::g_size >= 8 ) return _mm512_alignr_epi64(y, x, shift_epi64);
else return _mm512_alignr_epi32(y, x, shift_epi32);
}
}
else
{
constexpr std::ptrdiff_t shift_epi32 = *starts_from * P::g_size / 4;
constexpr std::ptrdiff_t shift_epi64 = *starts_from * P::g_size / 8;
auto mask = is_na_or_we_logical_mask(p, g, as(x)).storage();
if constexpr( P::reg_size == 16 )
{
if constexpr( P::g_size >= 8 ) return _mm128_maskz_alignr_epi64(mask, y, x, shift_epi64);
else return _mm128_maskz_alignr_epi32(mask, y, x, shift_epi32);
}
else if constexpr( P::reg_size == 32 )
{
if constexpr( P::g_size >= 8 ) return _mm256_maskz_alignr_epi64(mask, y, x, shift_epi64);
else return _mm256_maskz_alignr_epi32(mask, y, x, shift_epi32);
}
else
{
if constexpr( P::g_size >= 8 ) return _mm512_maskz_alignr_epi64(mask, y, x, shift_epi64);
else return _mm512_maskz_alignr_epi32(mask, y, x, shift_epi32);
}

b) untested (I only concerned myself with explicit names)

c) mask with registercases are not addressed at all.

=====================

I suspect compiler can merge the non masked operation + blend with a masked operation.

So - this needs to be checked for sve and avx512.
Bugs filed if not.

mask(z) logic moved into shuffle_driver.
All the zero handling removed.

@DenisYaroshevskiy
Copy link
Collaborator Author

Probably somewhere after this function: shuffle_v2_driver_multiple_registers

shuffle_v2_driver_multiple_registers(NativeSelector selector,

The tests are split into two files:
test/unit/api/regular/shuffle_v2/shuffle_v2_driver.cpp
test/unit/api/regular/shuffle_v2/shuffle_v2_driver_intergration.cpp

I'm not sure which one to add to at the moment.

You will also need to clean up some P::has_zeroes from
include/eve/detail/shuffle_v2/simd/x86/shuffle_l2.hpp
include/eve/detail/shuffle_v2/simd/arm/sve/shuffle_l2.hpp

@DenisYaroshevskiy
Copy link
Collaborator Author

Seems like both clang and gcc can do it, at least in some cases. https://godbolt.org/z/h68WxonaT

@DenisYaroshevskiy DenisYaroshevskiy self-assigned this Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant