This needs a bit of a re-think. The current microbenchmark infrastructure uses the onednn kernel name to benchmark only the kernel time (ignoring the pytorch overhead around the kernel). However, pytorch cannot fuse many matmul operations (e.g. dot w/ add) into a single kernel, so if we use the existing infra for those benchmarks we would be comparing the onednn matmul kernel alone against the fused triton matmul + add kernel. I think we instead want to compare total pytorch execution time with total triton (fused) execution time.
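For illustration, here is a minimal sketch of timing the whole op end to end rather than a single kernel, using `triton.testing.do_bench` and `torch.compile`. The shapes, dtype, and device are placeholder assumptions (on Intel GPUs the device would be `xpu`), not part of the existing infra:

```python
import torch
import triton.testing

device = "cuda"  # assumption for the sketch; "xpu" on Intel GPUs
M, K, N = 1024, 1024, 1024
a = torch.randn(M, K, device=device, dtype=torch.float16)
b = torch.randn(K, N, device=device, dtype=torch.float16)
bias = torch.randn(M, N, device=device, dtype=torch.float16)

def matmul_add(a, b, bias):
    # Eager pytorch runs this as two kernels: a matmul and a separate add.
    return torch.matmul(a, b) + bias

# torch.compile can fuse the add into the generated (triton) matmul epilogue.
compiled = torch.compile(matmul_add)
compiled(a, b, bias)  # warm up / trigger compilation

# do_bench times the whole callable (all launched kernels), so both sides
# are measured as total execution time, not just one kernel.
eager_ms = triton.testing.do_bench(lambda: matmul_add(a, b, bias))
fused_ms = triton.testing.do_bench(lambda: compiled(a, b, bias))
print(f"eager (unfused): {eager_ms:.3f} ms, compiled (fused): {fused_ms:.3f} ms")
```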
Currently we compare xetla and triton, but we could compare against onednn as well.