[GPU] Enable GEMMs to first attempt LLVMGPUTileAndFuse with intrinsic by default #19520
base: main
Conversation
Force-pushed from 38f5a22 to 7d687d7
There are compiler failures in the regression suite models; converting to draft while I debug.
Force-pushed from 7d687d7 to 7e2cdf8
The problem was missing functionality for GEMMs of the type (f16, f16) -> f16. I filed an issue for it.
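For context, a minimal sketch (shapes and names are made up) of the kind of GEMM this refers to, where both inputs and the result are f16 and the matmul accumulates directly into f16 rather than into a wider f32 accumulator:

```mlir
// Hypothetical (f16, f16) -> f16 GEMM: the matmul accumulates directly
// into an f16 result instead of a wider f32 accumulator.
func.func @matmul_f16_f16_f16(%lhs: tensor<2048x1280xf16>,
                              %rhs: tensor<1280x1280xf16>) -> tensor<2048x1280xf16> {
  %c0 = arith.constant 0.0 : f16
  %empty = tensor.empty() : tensor<2048x1280xf16>
  %fill = linalg.fill ins(%c0 : f16) outs(%empty : tensor<2048x1280xf16>) -> tensor<2048x1280xf16>
  %result = linalg.matmul ins(%lhs, %rhs : tensor<2048x1280xf16>, tensor<1280x1280xf16>)
                          outs(%fill : tensor<2048x1280xf16>) -> tensor<2048x1280xf16>
  return %result : tensor<2048x1280xf16>
}
```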
Force-pushed from e6aa895 to 3bc822c
Found another issue with accumulating GEMMs: #19546
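By an accumulating GEMM I mean a matmul whose output operand is an existing accumulator rather than a fresh zero fill; a hypothetical sketch:

```mlir
// Hypothetical accumulating GEMM: the matmul adds onto an existing
// accumulator %acc instead of a zero-filled tensor.
func.func @matmul_accumulate(%lhs: tensor<2048x1280xf16>,
                             %rhs: tensor<1280x1280xf16>,
                             %acc: tensor<2048x1280xf32>) -> tensor<2048x1280xf32> {
  %result = linalg.matmul ins(%lhs, %rhs : tensor<2048x1280xf16>, tensor<1280x1280xf16>)
                          outs(%acc : tensor<2048x1280xf32>) -> tensor<2048x1280xf32>
  return %result : tensor<2048x1280xf32>
}
```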
Force-pushed from 0bcf683 to ca7c4f3
Force-pushed from ca7c4f3 to d82ec09
Force-pushed from d82ec09 to 2adc85d
Based on comparisons with iree-kernel-benchmark here, the performance of VectorDistribute and TileAndFuse when using intrinsics seems comparable. Note that none of the tests in the sheet used the padding extension available in TileAndFuse after #19484, so it is a fair comparison of the pipelines themselves. TileAndFuse in some cases did show a speedup that seems beyond the noise level, and overall it averages out to 1.25x faster.
However, we will be looking at LLAMA and SDXL numbers before actually considering this PR for merging.
Fixes: #18858
Depends on: #19587, #19597