Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Ifpack2: Compute residual update #13610

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

seanofthemillers
Copy link
Contributor

@trilinos/ifpack2

Motivation

This is a work in progress to test changes to the compute residual kernel used by the TriDiag Solver.

@brian-kelley @vbrunini @tcfisher This is a draft of the changes we discussed. It should give a small performance uplift, though I think there is room for further improvement.

Using the Ifpack2 block tridiag benchmark:
image

Related Issues

  • Closes put-issue-number-here

Stakeholder Feedback

Testing

@seanofthemillers seanofthemillers requested a review from a team as a code owner November 16, 2024 00:19
@brian-kelley brian-kelley self-requested a review December 12, 2024 16:44
Copy link
Contributor

@brian-kelley brian-kelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. I'll do a run myself to make sure that hardcoding the block size and vector length doesn't hurt cuda performance.

@brian-kelley
Copy link
Contributor

brian-kelley commented Dec 12, 2024

@seanofthemillers I see you have WIP in the PR title, are there other changes you plan to make? You will at least have to amend your commit with the git signoff message (git commit -s ...) to pass that check that we require now.

edit: instead of -s you can also just add the text to commit message, "Signed-off-by: $user <$email>" where user and email match what you use for github.

Copy link
Contributor

@brian-kelley brian-kelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanofthemillers After some evaluation on H100, it seems that these changes do cause a measurable regression for blocksize=7, and don't improve blocksize=13. I didn't test as many cases as you (just 100^3 grid and those two block sizes) but I think what this needs is to guard the change with launch bounds/vector length to Kokkos::HIP exec space.

Time in seconds for overall BlockTriDiagonalSolver, with 100 repeats.

Before (develop) After (rebased PR)
block size 7 0.0443379 0.0776897
block size 13 0.0694024 0.0688918

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants