Mysterious 2x perf regression on GEMM #40
Comments
But running Laser alone actually brings great improvements:
$ nim cpp -r -d:release -d:openmp -d:danger --outdir:build benchmarks/gemm/gemm_bench_float32.nim
Hint: used config file '/home/beta/.choosenim/toolchains/nim-1.0.2/config/nim.cfg' [Conf]
Hint: used config file '/home/beta/Programming/Nim/laser/nim.cfg' [Conf]
Hint: operation successful (340 lines compiled; 0.025 sec total; 5.754MiB peakmem; Dangerous Release Build) [SuccessX]
Hint: /home/beta/Programming/Nim/laser/build/gemm_bench_float32 [Exec]
A matrix shape: (M: 1920, N: 1920)
B matrix shape: (M: 1920, N: 1920)
Output shape: (M: 1920, N: 1920)
Required number of operations: 14155.776 millions
Required bytes: 29.491 MB
Arithmetic intensity: 480.000 FLOP/byte
Theoretical peak single-core: 224.000 GFLOP/s
Theoretical peak multi: 4032.000 GFLOP/s
Make sure to not bench Apple Accelerate or the default Linux BLAS.
Laser production implementation
Collected 10 samples in 0.076 seconds
Average time: 6.928 ms
Stddev time: 3.038 ms
Min time: 5.896 ms
Max time: 15.573 ms
Perf: 2043.146 GFLOP/s
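For readers checking the numbers, the headline figures in the log above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the benchmark's own code: it assumes the usual 2*M*N*K operation count for GEMM and that "Required bytes" counts only the two float32 input matrices, which is what matches the 29.491 MB reported. The variable names are illustrative.

```python
# Sketch only: re-derives the figures printed by the benchmark log.
# Assumptions (not taken from the Laser source): 2*M*N*K FLOP for a GEMM,
# and "Required bytes" = the two float32 input matrices only.
M = N = K = 1920
BYTES_PER_FLOAT32 = 4

flop = 2 * M * N * K                                # 14155.776 million operations
bytes_moved = (M * K + K * N) * BYTES_PER_FLOAT32   # 29.491 MB
intensity = flop / bytes_moved                      # FLOP per byte

avg_time_s = 6.928e-3                               # "Average time" from the log
gflops = flop / avg_time_s / 1e9                    # matches "Perf" up to rounding of the time

peak_single = 224.0                                 # GFLOP/s, from the log
peak_multi = 4032.0                                 # GFLOP/s, from the log
implied_cores = peak_multi / peak_single            # ratio of the two peaks
efficiency = gflops / peak_multi                    # fraction of multi-core peak reached

print(f"operations: {flop / 1e6:.3f} millions")
print(f"bytes:      {bytes_moved / 1e6:.3f} MB")
print(f"intensity:  {intensity:.3f} FLOP/byte")
print(f"perf:       {gflops:.1f} GFLOP/s")
print(f"cores:      {implied_cores:.0f}, efficiency: {efficiency:.1%}")
```

The reported Perf is consistent with operations divided by the average time, not the minimum time, so a single slow sample (the 15.573 ms maximum here) pulls the figure down noticeably.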
Changing the order can slow down OpenBLAS as well.
mratsim added a commit that referenced this issue on Oct 24, 2019
With no code or hardware change at all, after a month there is a 2x perf regression. OpenBLAS is also a bit slower (with no package update):
I suspect an issue with the GNU OpenMP runtime (libgomp). (MKL-DNN is linked against Intel OpenMP.)
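One way to test that suspicion is to check which OpenMP runtime each benchmark binary actually links against. The sketch below is an assumption-laden helper, not part of Laser: it scans `ldd` output for the shared-library names commonly used by the GNU (`libgomp`), Intel (`libiomp5`) and LLVM (`libomp`) runtimes. The function names and the sample text in the usage note are hypothetical.

```python
import subprocess

# Shared-library name prefixes of common OpenMP runtimes.
OPENMP_RUNTIMES = {
    "libgomp": "GNU (GCC)",
    "libiomp5": "Intel",
    "libomp": "LLVM",
}

def openmp_runtimes(ldd_output):
    """Return the sorted OpenMP runtime vendors named in `ldd` output text."""
    found = set()
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first field is the library name, sometimes given as a full path.
        lib = parts[0].rsplit("/", 1)[-1]
        for prefix, vendor in OPENMP_RUNTIMES.items():
            if lib.startswith(prefix + ".so"):
                found.add(vendor)
    return sorted(found)

def linked_openmp_runtimes(binary_path):
    """Run `ldd` on a binary (Linux only) and report its OpenMP runtimes."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return openmp_runtimes(result.stdout)
```

For example, `openmp_runtimes("\tlibgomp.so.1 => /usr/lib/libgomp.so.1 (0x0)\n")` returns `['GNU (GCC)']` for that made-up line. A binary that reports more than one vendor loads several OpenMP runtimes into the same process, which would be worth ruling out before looking further.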