Don't use implicitly elapsed_time in autotuner #3036
base: main
Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang we can try the changes in #2484 on the DLE runner, but we need to cherry-pick 2a4b818 into Pavel's branch.
benchmarks/triton_kernels_benchmark/gemm_postop_addmatrix_benchmark.py
benchmarks/triton_kernels_benchmark/gemm_postop_gelu_benchmark.py
benchmarks/triton_kernels_benchmark/gemm_preop_exp_benchmark.py
Let's cherry-pick this PR to |
OK, but let's use 2a4b818 (the last commit in #2484), which is compatible with the changes on Pavel's branch.
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit 2a4b818.
Please rebase this PR.
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Co-authored-by: Whitney Tsang <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
c21c92a to 5710fd1
done
This performance difference may be due to a different number of warm-up runs of the function. I use the interface of our functions that warms up a fixed number of times (10), instead of warming up for only 10 milliseconds, as is the default in Triton:
We could do a run with upstream do_bench changed to use an exact number of runs (without this PR) and see whether there are any performance differences, to isolate the cause. (We could do that after the Agama update.)
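To make the warm-up difference concrete, here is a minimal sketch of the two strategies. It assumes the time-based variant first estimates the kernel's runtime and then derives the number of warm-up calls from the time budget, which is my understanding of how upstream do_bench behaves. The function names are hypothetical and are not part of Triton or of this PR.

```python
def warmup_calls_time_based(warmup_ms: float, estimate_ms: float) -> int:
    """Warm-up calls when the warm-up is a time budget (assumed upstream-style).

    estimate_ms is a rough per-call runtime measured beforehand.
    At least one warm-up call is always made.
    """
    return max(1, int(warmup_ms / estimate_ms))


def warmup_calls_count_based(n_warmup: int) -> int:
    """Warm-up calls when the warm-up is an exact number of runs."""
    return n_warmup


def run_warmup(fn, n_calls: int) -> None:
    """Call fn n_calls times and discard the results."""
    for _ in range(n_calls):
        fn()


if __name__ == "__main__":
    # Illustrative per-call runtimes in milliseconds (made-up values).
    for estimate_ms in (0.5, 5.0, 20.0):
        by_time = warmup_calls_time_based(10, estimate_ms)
        by_count = warmup_calls_count_based(10)
        print(f"kernel ~{estimate_ms} ms: time-based={by_time}, count-based={by_count}")
```

With a 10 ms budget, a fast kernel gets many more warm-up calls than a slow one, while the count-based variant always gets exactly 10. That alone could shift measured times between the two setups, so fixing the number of runs in both would help isolate the cause.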
The main idea of this pull request is to avoid using elapsed_time, which enables profiling mode for SYCL queues; this is not needed for profiling with PyTorch and PTI.
CI runs: