Add support for MultiBroadcastFusion #1641
Can you please add a short usage guide for developers? (I.e., when should we use …)
I've renamed …. @dennisYatunin, I think this is in good enough shape to merge. Can you take a look when you have a chance?
This is a good start! Looking forward to seeing this expanded to mismatched center/face/surface spaces and to non-pointwise operators.
This PR adds support for MultiBroadcastFusion.jl, which lets users fuse multiple broadcast expressions into a single kernel launch.
We'll be able to decorate multiple broadcasts with `@fused_direct`, e.g.:

This lets the compiler hoist global memory reads, improving performance.
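To make the memory argument concrete, here is a language-agnostic sketch in Python. It is not Julia and not the ClimaCore or MultiBroadcastFusion.jl API: `CountingArray`, `unfused`, and `fused` are hypothetical names, and counting element reads is only a stand-in for global-memory loads. The unfused version runs two separate loops (two "kernels") that both read the shared input `y`; the fused version runs one loop and loads each `y[i]` once for both outputs.

```python
# Hypothetical illustration of broadcast fusion; not the ClimaCore API.


class CountingArray:
    """List wrapper that counts element reads (a stand-in for global-memory loads)."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]

    def __len__(self):
        return len(self.data)


def unfused(y, z, w):
    """Two separate loops, like two kernel launches: `y` is read in both."""
    n = len(y)
    x = [y[i] + z[i] for i in range(n)]  # first broadcast: x = y + z
    a = [y[i] * w[i] for i in range(n)]  # second broadcast: a = y * w
    return x, a


def fused(y, z, w):
    """One loop, like one fused kernel: each y[i] is loaded once and reused."""
    n = len(y)
    x, a = [0.0] * n, [0.0] * n
    for i in range(n):
        yi = y[i]  # single load of the shared input
        x[i] = yi + z[i]
        a[i] = yi * w[i]
    return x, a


if __name__ == "__main__":
    y1 = CountingArray([1.0, 2.0, 3.0])
    unfused(y1, [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
    y2 = CountingArray([1.0, 2.0, 3.0])
    fused(y2, [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
    print(y1.reads, y2.reads)  # prints: 6 3
```

In a real GPU kernel the saving comes from keeping `y[i]` in a register rather than reloading it from global memory; the counter here only mimics that bookkeeping.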
I'll open an issue on expanding this for a few cases:
But I'd like to start with this simple case, since there should be some low-hanging fruit we can leverage (and it will let us see how things work in production).
Once JuliaRegistries/General#102559 is merged, I'll update the PR to add the correct dependency.
This is a step towards CliMA/ClimaAtmos.jl#2632.
Right now, the following limitations apply:
Some examples summarizing this are below:
The following will error:
The following will error:
The following will error:
The following will error:
The following will work:
Any pointwise function should work, and it's advantageous to use `@fused_direct` whenever at least one variable is shared across multiple broadcast expressions. For example, converting

to

will save one memory read, because `ᶜ∇²uʲs.:($$j)` shows up in the right-hand side of both expressions. In addition, converting

to

will result in the reuse of `ᶜts`, saving one read, since the value computed in the first line can be reused in the second line.
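The `ᶜts` case can be sketched the same way. This is again a hypothetical Python illustration, not the ClimaCore API: `CountingArray`, `unfused`, and `fused` are made-up names, and `2.0 * q[i]` and `ts[i] + 1.0` are placeholder pointwise functions standing in for the real computations. In the unfused version the second loop reads `ts` back from memory after the first loop writes it; in the fused version the freshly computed value stays in a local variable, so `ts` is never re-read.

```python
# Hypothetical illustration: reusing a value computed by an earlier broadcast.


class CountingArray:
    """List wrapper that counts element reads and writes (stand-in for global memory)."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]

    def __setitem__(self, i, value):
        self.writes += 1
        self.data[i] = value

    def __len__(self):
        return len(self.data)


def unfused(q, ts, p):
    """Two loops: the second reads `ts` back after the first writes it."""
    for i in range(len(q)):
        ts[i] = 2.0 * q[i]  # first broadcast: ts = f(q)
    for i in range(len(q)):
        p[i] = ts[i] + 1.0  # second broadcast: p = g(ts), re-reads ts


def fused(q, ts, p):
    """One loop: ts[i] is computed, stored, and reused without a reload."""
    for i in range(len(q)):
        tsi = 2.0 * q[i]
        ts[i] = tsi  # still written, since ts is an output
        p[i] = tsi + 1.0  # reuses the local value instead of reading ts[i]


if __name__ == "__main__":
    ts1, p1 = CountingArray([0.0] * 3), CountingArray([0.0] * 3)
    unfused([1.0, 2.0, 3.0], ts1, p1)
    ts2, p2 = CountingArray([0.0] * 3), CountingArray([0.0] * 3)
    fused([1.0, 2.0, 3.0], ts2, p2)
    print(ts1.reads, ts2.reads)  # prints: 3 0
```

The fused version still writes `ts`, because it is an output of the first expression; only the read is eliminated, which matches the one-fewer-read saving for `ᶜts`.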