
[GPU] Enable GEMMs to first attempt LLVMGPUTileAndFuse with intrinsic by default #19520

Draft · wants to merge 3 commits into base: main from enable_tile_and_fuse_matmul
Conversation

@nirvedhmeshram (Contributor) commented Dec 18, 2024

Based on comparisons with iree-kernel-benchmark here, the performance of VectorDistribute vs TileAndFuse when using intrinsics seems comparable. Note that none of the tests in the sheet used the padding extension available in TileAndFuse after #19484, so it is a fair comparison of the pipelines themselves. In some cases TileAndFuse did have a speedup that seems beyond the noise level, and overall it averages out to 1.25x faster.

However, we will be looking at LLAMA and SDXL numbers before actually considering this PR for merging.

Fixes: #18858

Depends on: #19587, #19597

@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from 38f5a22 to 7d687d7 on December 18, 2024 22:21
@nirvedhmeshram changed the title from "[GPU] Enable GEMMs to use LLVMGPUTileAndFuse by default" to "[GPU] Enable GEMMs to first attempt LLVMGPUTileAndFuse with intrinsic by default" on Dec 18, 2024
@nirvedhmeshram marked this pull request as ready for review on December 18, 2024 22:35
@nirvedhmeshram marked this pull request as draft on December 19, 2024 16:26
@nirvedhmeshram (Contributor, Author) commented:

There are compiler failures in the regression suite models; converting to draft while I debug.

@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from 7d687d7 to 7e2cdf8 on December 19, 2024 21:46
@nirvedhmeshram (Contributor, Author) commented Dec 19, 2024

The problem was missing functionality for GEMMs of the type (f16, f16) -> f16; I filed #19532 for it. We probably can't land this without a solution for that, but since we also solved the problem at the model level, I am going to keep pushing on this to find other issues.
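For readers unfamiliar with why (f16, f16) -> f16 GEMMs need special handling: MMA intrinsics typically produce a wider (e.g. f32) accumulator, so an f16 result requires a final truncation, and accumulating directly in f16 rounds every partial sum. The following is a rough, hypothetical NumPy sketch (not IREE code) contrasting the two accumulation strategies for a single dot product:

```python
import numpy as np

# Hypothetical sketch: f16 accumulation rounds each partial sum,
# while f32 accumulation truncates only once at the end.
rng = np.random.default_rng(0)
K = 4096
a = rng.standard_normal(K).astype(np.float16)
b = rng.standard_normal(K).astype(np.float16)

# Strategy 1: accumulate entirely in f16 (every step is rounded).
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x) * np.float16(y))

# Strategy 2: accumulate in f32, truncate the final value to f16
# (roughly what an intrinsic with an f32 accumulator produces).
acc32 = np.float16(a.astype(np.float32) @ b.astype(np.float32))

print(acc16, acc32)  # the two results can differ due to rounding
```

The two values usually agree to a few ULPs for well-scaled data but can diverge for long reduction dimensions, which is why the output precision is not a free choice for the pipeline.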

@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch 2 times, most recently from e6aa895 to 3bc822c on December 20, 2024 16:41
@nirvedhmeshram (Contributor, Author) commented:
Found another issue with accumulating GEMMs: #19546
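For context, an "accumulating" GEMM is one whose output operand carries a nonzero initial value that must be added into, rather than a zero-filled accumulator. A minimal NumPy sketch (hypothetical illustration, not the failing IR):

```python
import numpy as np

# Accumulating GEMM: result = A @ B + C_init, where C_init is a
# live input rather than a zero-initialized buffer.
A = np.ones((2, 3), dtype=np.float32)
B = np.ones((3, 2), dtype=np.float32)
C_init = np.full((2, 2), 5.0, dtype=np.float32)

C = A @ B + C_init
print(C)  # each element is 3.0 + 5.0 = 8.0
```

A pipeline that assumes it can zero-fill the accumulator tile before the MMA loop would silently drop C_init in this form, which is the kind of case such an issue covers.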

@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch 3 times, most recently from 0bcf683 to ca7c4f3 on January 2, 2025 23:23
@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from ca7c4f3 to d82ec09 on January 3, 2025 22:15
@nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from d82ec09 to 2adc85d on January 3, 2025 22:20
Successfully merging this pull request may close these issues:

Enable TileAndFuse pipeline with intrinsic targeting for non-intrinsic sized GEMM shapes