Flatten buoyancy_gradients call #2951

Merged (2 commits) on Apr 25, 2024

@charleskawczynski charleskawczynski commented Apr 24, 2024

A step towards #2530.

@charleskawczynski charleskawczynski marked this pull request as ready for review April 25, 2024 14:11

charleskawczynski commented Apr 25, 2024

I looked at the Nsight report in https://buildkite.com/clima/climaatmos-target-gpu-simulations/builds/272. buoyancy_gradients plus the 3 hoisted ᶜgradᵥ-ᶠinterp calls now take 3.9 ms in total, so this PR cuts the time in half (xref: #2950 (comment)). The buoyancy_gradients call itself is only 547 μs; the hoisted ᶜgradᵥ-ᶠinterp calls take 1.5 ms, 177 μs, and 1.6 ms, respectively. So I think we can open an issue in ClimaCore about the following patterns:

Maybe bad:

@. ᶜgradᵥ = ᶜgradᵥ(ᶠinterp(get_single_scalar(big_struct)))

Maybe bad:

@. ᶜgradᵥ = ᶜgradᵥ(ᶠinterp(foo(something)))

Maybe catastrophic:

@. single_scalar = foo(ᶜgradᵥ(ᶠinterp(get_single_scalar(big_struct))), ᶜgradᵥ(ᶠinterp(get_single_scalar(big_struct))))

Maybe catastrophic:

@. single_scalar = foo(ᶜgradᵥ(ᶠinterp(foo(something))), ᶜgradᵥ(ᶠinterp(foo(something))))

Good:

@. ᶜgradᵥ = ᶜgradᵥ(ᶠinterp(cfield))

We can make some reproducers for this in ClimaCore and iterate more quickly on improving things.
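
As a rough illustration of the hoisting described above, here is a minimal plain-Julia sketch. It is not ClimaCore code: `BigStruct`, `interp_c2f`, `grad_f2c`, and `combine` are hypothetical stand-ins for the struct, `ᶠinterp`, `ᶜgradᵥ`, and `foo` in the patterns listed, and eager CPU arrays will not show the GPU slowdown. It only shows the shape of the rewrite: pull the scalar out into a plain field, apply the operator to that field once, then do the pointwise work.

```julia
# Plain-Julia stand-ins; none of these names are ClimaCore API.
struct BigStruct              # hypothetical per-cell state
    θ::Float64
    q::Float64
    p::Float64
end
get_single_scalar(s::BigStruct) = s.θ

# Center-to-face interpolation on a column of n cells (n + 1 faces);
# the two boundary faces simply copy the nearest center value.
function interp_c2f(c::AbstractVector)
    n = length(c)
    f = similar(c, n + 1)
    f[1] = c[1]
    f[n + 1] = c[n]
    for i in 2:n
        f[i] = (c[i - 1] + c[i]) / 2
    end
    return f
end

# Face-to-center gradient with uniform spacing dz.
grad_f2c(f::AbstractVector, dz) = diff(f) ./ dz

# Hypothetical pointwise function, playing the role of `foo`.
combine(a, b) = a * b

# Nested form, mirroring the "maybe catastrophic" pattern:
# extraction, interpolation, and gradient all sit inside one expression.
function nested(states, dz)
    return combine.(
        grad_f2c(interp_c2f(get_single_scalar.(states)), dz),
        grad_f2c(interp_c2f(get_single_scalar.(states)), dz),
    )
end

# Hoisted form: extract the scalar once, apply the operator to a plain
# field once (the "good" pattern), then do the pointwise work.
function hoisted(states, dz)
    cfield = get_single_scalar.(states)
    grad_c = grad_f2c(interp_c2f(cfield), dz)
    return combine.(grad_c, grad_c)
end

states = [BigStruct(300.0 + i^2, 0.01, 1.0e5) for i in 1:8]
dz = 100.0
@assert nested(states, dz) == hoisted(states, dz)
```

Both forms compute the same values; the difference is only in how much work each broadcast expression carries. An actual reproducer would need to use ClimaCore fields and operators on a GPU to measure the effect.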

I suspect the culprit for this slowdown was either not using shared memory, register spills due to the complexity of the expression, or both. Not using shared memory could even have caused the register spills.

We should probably make our finite-difference (FD) operators use shared memory, but this PR is at least an improvement, so I'll merge.

@charleskawczynski charleskawczynski added this pull request to the merge queue Apr 25, 2024
charleskawczynski commented

Also, just to note: the first commit didn't seem to have much impact on performance; it was the hoisting that made the improvement.
