Sse4.2 implementation #1
Comments
I'll have a look later, thanks. FYI, my code runs without errors here (both scalar and vectorized).
Hey, I have also implemented an AVX version: I accumulate the iteration counts for the eight pixels currently being processed in a Vector256, and at the end I convert it to Vector256 before storing it to memory. Apart from this, it really is the same as the SSE version.
I had a look at the SSE code and ran it: if I do a pixel-by-pixel comparison (comparing the result arrays, not the bitmaps, too lazy to get the NuGet package...) I get differences, I suspect due to floating-point deltas. Not an issue with Mandelbrot, but it could be an issue for production code. That's one of the reasons I pre-calculate the x and y values, since I test the benchmarks by comparing the scalar and vector outputs. I have not looked at the AVX code yet. I'll see if I can eliminate the differences and include your code and benchmarks, if that's OK with you. Cheers!
You are right, my scalar and my vector code do not return the same results (the SSE and AVX versions do agree with each other, though). I have checked now: the differences are relatively few, but they are definitely there. I have debugged the code and identified the reason. The important thing is that in floating-point arithmetic, e.g. (a-b-c+d), the order in which the operations are executed matters: a-b-c+d is not always equal to (a-(b+c)+d). That is exactly the difference between my scalar and vector code:
then the results will be 100% identical to the scalar code results (I have checked it).
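The ordering effect described above can be reproduced in isolation. The following is a minimal sketch, not code from either implementation: the values of `a`, `b`, `c` and `d` are assumptions picked so that the grouping changes the rounding, and it assumes IEEE 754 double arithmetic as used by .NET on x64.

```csharp
using System;

// 2^54 is large enough that subtracting 1.0 is lost to rounding
// (neighbouring doubles just below it are 2 apart), while subtracting 2.0 is exact.
double a = 18014398509481984.0; // 2^54
double b = 1.0, c = 1.0, d = 0.0;

double leftToRight = a - b - c + d;   // evaluated as ((a - b) - c) + d
double grouped = a - (b + c) + d;     // evaluated as (a - (b + c)) + d

Console.WriteLine(leftToRight == grouped);  // False
Console.WriteLine(leftToRight - grouped);   // 2
```

Both expressions are algebraically identical, so a scalar and a vector version only produce bit-identical output if they apply the operations in the same order.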
Hi!
We have been in contact on Reddit. I am pretty sure that your SIMD code doesn't work correctly, and it also has a design flaw.
I have rewritten both the scalar and the SIMD code, mainly to make them simpler. Take what you think is useful.
I very much like your calculation of the x and y dimensions from the buffer size, but I think it is overkill. So I just take the Y resolution (number of vertical pixels) and calculate the X resolution (horizontal pixels) from it. Instead of correcting the resolution, I only accept Y resolutions that result in a whole-number X resolution.
SIMD:
I think your implementation is problematic in the following respects:
You have in-memory arrays for x and y values which are read into Vectors. That is unnecessary and thus suboptimal as SIMD code can easily get memory-bound.
You can calculate the running X and Y vectors purely from registers (setting initial value and increment with step size) without touching memory.
I also get an index-out-of-range exception at
testSpan[resVectorNumber] = Avx.Add(xSquVec, ySquVec);
I have used only SSE4.2, as my machine is not AVX2 capable.
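As a rough illustration of generating the running X values purely in registers, here is a minimal SSE sketch. It is not the code from either implementation: `CountOutside` and its parameters are hypothetical, and it only counts the pixels of one row whose starting point already lies outside the radius-2 circle, so that there is an observable result without reading any coordinate array. It computes each x as `xMin + index * step` from an index vector rather than adding `step` repeatedly, on the assumption that this keeps the rounding closer to a scalar loop that uses the same formula.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Console.WriteLine(CountOutside(-2f, 0.5f, 1.5f, 8));

// Hypothetical helper: counts pixels in one row with x*x + y*y > 4.
static int CountOutside(float xMin, float step, float y, int width)
{
    int count = 0;
    int i = 0;
    if (Sse.IsSupported)
    {
        Vector128<float> index = Vector128.Create(0f, 1f, 2f, 3f); // pixel indices of the 4 lanes
        Vector128<float> four = Vector128.Create(4f);              // index increment and escape limit
        Vector128<float> xMinVec = Vector128.Create(xMin);
        Vector128<float> stepVec = Vector128.Create(step);
        Vector128<float> ySqu = Vector128.Create(y * y);

        for (; i + 4 <= width; i += 4)
        {
            // x = xMin + index * step per lane, no memory access needed
            Vector128<float> x = Sse.Add(xMinVec, Sse.Multiply(index, stepVec));
            Vector128<float> magSqu = Sse.Add(Sse.Multiply(x, x), ySqu);
            // MoveMask packs the sign bit of each lane's comparison result into an int
            int mask = Sse.MoveMask(Sse.CompareGreaterThan(magSqu, four));
            count += BitOperations.PopCount((uint)mask);
            index = Sse.Add(index, four); // advance to the next 4 pixels
        }
    }
    // Scalar path: remaining pixels, or the whole row when SSE is unavailable.
    for (; i < width; i++)
    {
        float x = xMin + i * step;
        if (x * x + y * y > 4f) count++;
    }
    return count;
}
```

The only state carried between iterations is the `index` vector, so the loop body works entirely on register values; an AVX variant would follow the same pattern with eight lanes.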
If you add
SixLabors.ImageSharp;
from NuGet you can actually get an image of the calculated fractal!