Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Incorrect Results Using __mmask16 Mask in if_else Function with AVX-512 #2026

Open
Panhaolin2001 opened this issue Nov 21, 2024 · 20 comments
Labels
bug Something isn't working reproduced The bug has been reproduced and is under investigation

Comments

@Panhaolin2001
Copy link
Contributor

Here is a example: https://godbolt.org/z/nd9Kc3cqq

@Panhaolin2001 Panhaolin2001 added the bug Something isn't working label Nov 21, 2024
@Panhaolin2001
Copy link
Contributor Author

The output should not be:
4,6,8,10,12,14,16,18,20,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,117 ?

@jfalcou
Copy link
Owner

jfalcou commented Nov 21, 2024

Two things happens :

  • raw mask types are in fact integers so the value of the mask is turned into true inside the if_else
  • If I fix this by filling a logical manually, you ends up with the 1 not in the correct place as internally EVE is using big endian.

If your mask is always half true/half false, you can use the relative conditional mask liek keep_last :

https://godbolt.org/z/YMvja9sYo

@jfalcou jfalcou added the reproduced The bug has been reproduced and is under investigation label Nov 21, 2024
@DenisYaroshevskiy
Copy link
Collaborator

Do you think you want to implicitly cast __mask16 to logical? I think that'd make sense but I also know that it doesn't work.
I appreciate that we like the implicit conversions and in other places you can pass a register directly as a parameter.

What are you thinking, @jfalcou ?

@jfalcou
Copy link
Owner

jfalcou commented Nov 21, 2024

Maybe we need to make the wrapper type over the mmask type public

@DenisYaroshevskiy
Copy link
Collaborator

DenisYaroshevskiy commented Nov 21, 2024

I think we'd need to put up a pr and then see what can we do. I don't think I understand your comment like this.

@Panhaolin2001 - you probably know this, but here is how to write what you meant today: https://godbolt.org/z/off5xv6nG

UPD: I would also recommend looking into our formula generation constructor, since this seems like what you want (from your example): https://godbolt.org/z/PhjhPxohP

@Panhaolin2001
Copy link
Contributor Author

I'm sorry I may not have expressed myself too well. I currently made a type Mask<T> wrapped around __mmask8 __mmask16 __mmask32 and __mmask64 on avx512, and I want to make it compatible with the type eve::logical<eve::wide<T>>. One question I have is that when I do the auto mask = a > b; operation, the mask type is of type eve::logical<eve::wide<T>> and the internal storage_type is of type __mmask family, so why is it that when constructing the mask by using the auto mask = a > b if_else function returns the correct value, but not when __mmask16{0b0000000001111111} is specified manually?

@jfalcou
Copy link
Owner

jfalcou commented Nov 22, 2024

We already have such a type. We may just need to make it public api and amend logical functions to handle it.

As for why __mask dont work, it is because they are just plain intégral so the constructor used to convert them to logical think you are doing logical(some.number) which is eqv to using logical(true).

@Panhaolin2001
Copy link
Contributor Author

Thank you for your response! Looking forward to you exposing it as a public API and improving the handling of logic functions.

@jfalcou
Copy link
Owner

jfalcou commented Nov 22, 2024

@DenisYaroshevskiy In fact, maybe we can just have an API that let us build logical from a bit set, using maybe a tagged constructor, and optimize it on AVX512 to not do anything and, for the other arch, to do the bit to mask conversion ?

logical<wide<float>> l( as_mask{0b01101101} );

?

@DenisYaroshevskiy
Copy link
Collaborator

I don't like this bits constructor. I think the formula/initializer list constructor do this already.

This is very length specific

@DenisYaroshevskiy
Copy link
Collaborator

@Panhaolin2001

I don't know how much I understood.

If you want to write a convrersion from your type to logical, provide a conversion operator

struct my {
  __mmask16 m;

  using eve_logical_t = eve::logical<eve::wide<float, eve::fixed<16>>>;
  
  operator eve_logical_t () {
    return eve_logical_t{m};
  }
};

@jfalcou
Copy link
Owner

jfalcou commented Nov 22, 2024

It has to be something like

struct my {
  __mmask16 m;

  using eve_logical_t = eve::logical<eve::wide<float, eve::fixed<16>>>;
  
  operator eve_logical_t () {
    eve_logical_t that;
    that.storage().value = m;
    return that;
  }
};

or else the mask is seen as an integer then turned to true/false.

@DenisYaroshevskiy
Copy link
Collaborator

You are right, I thought what I wrote works. This is horrible. It works on any other platform. We need to at least stop it from compiling

@jfalcou
Copy link
Owner

jfalcou commented Nov 22, 2024

It is complicated cause for most compiler the __mask* stuff are typedefs to integral :/

@DenisYaroshevskiy
Copy link
Collaborator

DenisYaroshevskiy commented Nov 22, 2024

are typedefs to integral

we should only convert from bool and not from integrals

@jfalcou
Copy link
Owner

jfalcou commented Nov 23, 2024

Ok thus

template<scalar_value S>

Should go and the bool one after it should.use std::same_as to prevent conversion.
Will make a PR

@Panhaolin2001
Copy link
Contributor Author

Panhaolin2001 commented Nov 25, 2024

Is there any plan to convert from clang builtin vector extension to logical type?

typedef float vec_type __attribute__((vector_size(32)));
vec_type a{1,2,3,4,5,6,7,8}, b{2,3,1,4,6,7,9,4};
auto mask = a < b; 
auto eve_mask = eve::logical<eve::wide<T, eve::fixed<8>>>{mask};

@DenisYaroshevskiy
Copy link
Collaborator

Is there any plan to convert from clang builtin vector extension to logical type?

No plans.

I looked into it.
Clang always uses ints for mask type, even on avx512.

Here is how I'd do it:

  typedef float vec_type __attribute__((vector_size(32)));
  using eve_vec_type = eve::wide<float, eve::fixed<8>>;
  using eve_int_type = eve::wide<std::uint32_t, eve::fixed<8>>;

  vec_type a{1, 2, 3, 4, 5, 6, 7, 8}, b{2, 3, 1, 4, 6, 7, 9, 4};

  auto mask = a < b;
  auto clang_mask = eve::bit_cast(mask, eve::as<eve_int_type>{});
  auto eve_mask = eve::bit_cast(clang_mask != 0, eve::as<eve::logical<eve_vec_type>>{});

https://godbolt.org/z/jx4839jEh

@Panhaolin2001
Copy link
Contributor Author

Excuse me, I have an off-topic question. May I ask, on avx512 platform wide<float, fixed<128>> represents a vector type with 128 elements, but avx512 can only support 16 elements of float type, how does wide<float, fixed<128>> utilize avx512 registers? Splicing?

@DenisYaroshevskiy
Copy link
Collaborator

DenisYaroshevskiy commented Dec 23, 2024

I'm not 100% what you mean by splicing but they are represented as an array of registers.

So 128 / 16 = 8 registers.

We call this "aggregation".

We usually don't test that many registers at once, it should work but do test your code just in case.

If you can tell me more about the problem you are trying to solve, maybe I can give you a better answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working reproduced The bug has been reproduced and is under investigation
Projects
None yet
Development

No branches or pull requests

3 participants