cuda.parallel: Support structured types as algorithm inputs #3218
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
/ok to test

1 similar comment

/ok to test
self.ltoir, _ = cuda.compile(
    op, sig=value_type(value_type, value_type), output="ltoir"
)
# if h_init is a struct, wrap it in a Record type:
Suggested change:
- # if h_init is a struct, wrap it in a Record type:
+ # if h_init is a struct, wrap it in a custom numba struct-like type:
def wrap_struct(dtype: np.dtype) -> numba.types.Type:
    """
    Wrap the given numpy structured dtype in a numba type.
TODO: explain this better
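For context, numba already ships a mapping from a structured dtype to its Record type; a minimal sketch of that built-in mapping (the approach this PR replaces with a custom wrapper type), assuming numba.np.numpy_support:

```python
# Sketch only: the PR defines its own struct-like numba type. The stock Record
# mapping shown here is what it replaces, since Record arguments hit
# cudaErrorIllegalAddress on the CUDA target.
import numpy as np
import numba
from numba.np import numpy_support

def wrap_struct_as_record(dtype: np.dtype) -> numba.types.Type:
    # numba's built-in mapping from a structured dtype to a Record type
    return numpy_support.from_dtype(dtype)

pair = np.dtype([("x", np.int32), ("y", np.float32)])
print(wrap_struct_as_record(pair))  # a numba Record type; exact repr varies by version
```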
🟩 CI finished in 24m 05s: Pass: 100%/1 | Total: 24m 05s | Avg: 24m 05s | Max: 24m 05s
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| CUB | |
| Thrust | |
| CUDA Experimental | |
| +/- | python |
| CCCL C Parallel Library | |
| Catch2Helper | |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| CUB | |
| Thrust | |
| CUDA Experimental | |
| +/- | python |
| CCCL C Parallel Library | |
| Catch2Helper | |
🏃 Runner counts (total jobs: 1)
| # | Runner |
|---|---|
| 1 | linux-amd64-gpu-v100-latest-1 |
Force-pushed from 7fab84d to 2bd35e3
Force-pushed from 2bd35e3 to ac9cc55
Force-pushed from ac9cc55 to a78a187
Description
Closes #3135.
This PR enables using structured data types with cuda.parallel algorithms. The numba CUDA target doesn't directly support using structured data types as inputs to device functions, so the implementation works by defining a custom numba data type corresponding to the structured type and compiling the user-provided reduction function for that custom data type.
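As a rough illustration only (module paths, the two-phase call sequence, and the way a structured h_init is passed are assumptions about the experimental API, not a verbatim copy of the test added here), usage could look like:

```python
# Hedged sketch of reducing an array of structured values with cuda.parallel.
# Assumptions: numba device arrays are accepted via __cuda_array_interface__,
# the wrapped struct type exposes its fields as attributes inside the op, and
# h_init may be passed as an element of a structured numpy array.
import numpy as np
from numba import cuda
from cuda.parallel.experimental import algorithms

pair = np.dtype([("x", np.int32), ("y", np.int32)])

def pick_larger_y(a, b):
    # keep whichever struct has the larger "y" field
    return a if a.y > b.y else b

h_in = np.zeros(128, dtype=pair)
h_in["y"] = np.arange(128, dtype=np.int32)
d_in = cuda.to_device(h_in)
d_out = cuda.device_array(1, dtype=pair)
h_init = h_in[0]

reducer = algorithms.reduce_into(d_in, d_out, pick_larger_y, h_init)
temp_bytes = reducer(None, d_in, d_out, len(d_in), h_init)   # temp storage size query
d_temp = cuda.device_array(temp_bytes, dtype=np.uint8)
reducer(d_temp, d_in, d_out, len(d_in), h_init)              # run the reduction
```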
Additional Context: Numba support for struct types
This PR involves wrapping a struct type in a custom numba ("wrapper") type. Ostensibly, numba supports using struct types directly, but for the CUDA target we get cudaErrorIllegalAddress when numba kernels are invoked on inputs of struct type. I don't fully understand the reasons for this, but it may be more apparent to someone more familiar with reading the PTX. I suspect there are alignment issues somewhere when using struct types, as they translate to pointer-to-struct arguments to the generated device function.
Consider the following device function:
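The function itself isn't captured in this extract; as a hypothetical stand-in (not the original snippet), a binary reduction op over a two-field structured value might look like:

```python
# Hypothetical example only, not the function from the PR description: a
# binary op over a two-field struct. With raw struct dtypes, these arguments
# lower to pointer-to-struct parameters in the generated device function.
def op(a, b):
    return a if a.x + a.y > b.x + b.y else b
```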
Below is the PTX generated by numba after compiling the function for (1) inputs as raw struct dtypes and (2) inputs as "wrapper" types. If I use the code generated for raw struct inputs, I get cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered.

Raw struct input PTX

Wrapper type input
(For posterity, the PTX can be viewed by compiling the code with output type "ptx" in this function and running the unit test introduced in this PR.)

Checklist