scalar horner algo using simd computations #1173
Replies: 6 comments
-
This seems wrong, I suspect; you were meant to scan it. Can you just post a scalar version of the algorithm? Then it would be easier to figure out.
-
I found this online: https://en.wikipedia.org/wiki/Horner%27s_method Is this the algorithm you wanted? I didn't test anything, but I think the code should look something like:
-
OK, it seems to be like that, and I will try using scan and for_each.
EVE_FORCEINLINE constexpr auto reverse_horner_(EVE_SUPPORTS(cpu_)
-
The polynomial coefficients are in increasing degree order (hence "reverse").
-
I am not sure what reduce 2 is. We generally follow the standard algorithms so far. However, let's see a clean version and then we can figure out an appropriate for
-
Let's move this to a discussion.
-
I wrote this:
This is an internally SIMD version of the reverse Horner algorithm, which computes the value of a polynomial at a scalar floating-point value x using a vector a containing the polynomial coefficients in increasing degree order.
In the purely scalar version, there is a loop that computes the result, starting from
s=0;
looping from 0 to siz-1 by 1 (siz being the size of a)
over the coefficients.
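For reference, here is a minimal sketch of the scalar loop described above. The original scalar code is not shown in this thread, so the function name `reverse_horner_scalar` and the loop body (accumulating a[i]*x^i with a running power of x) are assumptions, not the code from EVE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of EVE.
// Evaluates a[0] + a[1]*x + ... + a[siz-1]*x^(siz-1),
// with coefficients given in increasing degree order.
double reverse_horner_scalar(double x, std::vector<double> const& a)
{
  double s  = 0.0;  // running sum, starts at 0 as in the description
  double xp = 1.0;  // running power x^i (assumed form of the loop body)
  for (std::size_t i = 0; i < a.size(); ++i)  // 0 to siz-1 by 1
  {
    s  += a[i] * xp;
    xp *= x;
  }
  return s;
}
```

For x = 2 and coefficients {1, 2, 3}, this evaluates 1 + 2*2 + 3*4 = 17.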
In this version, the loop is replaced by a shorter one from 0 to siz-1 by N (instead of 1),
where N is the number of lanes of the SIMD vector. (Of course, [1, x, ..., x^(N-1)] and x^N are computed once.)
This loop is then followed by a reduce to sum the N lanes and, if N does not divide siz, by a scalar ending.
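The three phases just described (main loop by N, lane sum, scalar ending) can be sketched in plain C++. This is only an illustration under stated assumptions: a `std::array<double, N>` stands in for the SIMD vector, no EVE API is used, and the name `reverse_horner_lanes` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not EVE code: std::array<double, N> emulates a SIMD
// register with N lanes so that the structure of the algorithm is visible.
template<std::size_t N>
double reverse_horner_lanes(double x, std::vector<double> const& a)
{
  // [1, x, ..., x^(N-1)] and x^N are computed once.
  std::array<double, N> powers{};
  double xp = 1.0;
  for (std::size_t l = 0; l < N; ++l) { powers[l] = xp; xp *= x; }
  double const xN = xp;

  // Main loop: 0 to siz-1 by N, starting at the first coefficient
  // (no alignment padding, so no spurious leading zeros).
  std::array<double, N> acc{};  // lane-wise partial sums, all zero
  std::size_t i = 0;
  for (; i + N <= a.size(); i += N)
  {
    for (std::size_t l = 0; l < N; ++l)
    {
      acc[l]    += a[i + l] * powers[l];  // first op: per-lane accumulation
      powers[l] *= xN;                    // each lane advances by x^N
    }
  }

  // Second op: reduce (sum) the N lanes.
  double s = 0.0;
  for (double v : acc) s += v;

  // Scalar ending when N does not divide siz; powers[0] now holds x^i.
  xp = powers[0];
  for (; i < a.size(); ++i) { s += a[i] * xp; xp *= x; }
  return s;
}
```

Calling `reverse_horner_lanes<4>(2.0, a)` with ten coefficients all equal to 1 runs two full blocks and a two-element scalar ending, giving 2^10 - 1 = 1023. The two marked operations are where a reduce taking two different ops would plug in.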
Can this be written using EVE algorithms, or with an extension of some of them?
At the least, it seems it will need a slight extension of reduce that takes two different ops: one for the SIMD accumulation and one for the final wide reduce (eve::add here). I tried that but was not fully successful. (Of course, the first pass of algo::reduce (the main loop) must be unaligned, as spurious leading zeros in a are not wanted.)