Architecture and OS identification macros #3237
🟩 CI finished in 1h 52m: Pass: 100%/170 | Total: 1d 14h | Avg: 13m 42s | Max: 1h 24m | Hits: 9%/22530
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 170)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Would it make sense to add some minimal testing? Maybe confirm that one of the arch macros and one of the OS macros is always present? Also, would it make sense to replace the current uses of the underlying macros here, or would that be better as a separate change?
Thanks @pciolkosz. I will add minimal testing. I would also love to add stronger tests, but I'm not sure how to implement them. Let me see if I can do something in this direction.
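A minimal sketch of the kind of test suggested above, assuming a build on x86-64 or Arm64 under Windows or Linux. The `EXAMPLE_*` macros are hypothetical stand-ins, not the macros added by this PR, and the detection conditions are only illustrative.

```cpp
// Hypothetical stand-ins for the PR's macros; the real names and the
// detection logic in the PR may differ.
#if defined(_M_X64) || defined(__x86_64__)
#  define EXAMPLE_ARCH_X86_64_() 1
#else
#  define EXAMPLE_ARCH_X86_64_() 0
#endif

#if defined(_M_ARM64) || defined(__aarch64__)
#  define EXAMPLE_ARCH_ARM64_() 1
#else
#  define EXAMPLE_ARCH_ARM64_() 0
#endif

#if defined(_WIN32)
#  define EXAMPLE_OS_WINDOWS_() 1
#else
#  define EXAMPLE_OS_WINDOWS_() 0
#endif

#if defined(__linux__)
#  define EXAMPLE_OS_LINUX_() 1
#else
#  define EXAMPLE_OS_LINUX_() 0
#endif

// Each macro expands to 0 or 1, so the sums count how many are active.
constexpr int example_arch_count = EXAMPLE_ARCH_X86_64_() + EXAMPLE_ARCH_ARM64_();
constexpr int example_os_count   = EXAMPLE_OS_WINDOWS_() + EXAMPLE_OS_LINUX_();

// Exactly one architecture and exactly one OS must be identified.
static_assert(example_arch_count == 1, "exactly one architecture macro must be set");
static_assert(example_os_count == 1, "exactly one OS macro must be set");
```

Because every macro is always defined to 0 or 1, the check needs no `defined(...)` tests and fails at compile time on a platform that none of the branches recognizes.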
I like the testing, thanks @fbusato
Indeed, I found that amazing things happen with NVRTC:

    #if _CCCL_ARCH(32BIT)
    static_assert(sizeof(void*) == 4, ""); // FAIL!!
    #endif
Right, NVRTC follows the host compiler's behavior on the respective platform, even though it does not compile host code and, unlike NVCC, has no concept of a host compiler.
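One possible way to keep such a check from firing under NVRTC, sketched under assumptions: the `EXAMPLE_*` macros are hypothetical and not part of this PR, and skipping the check is only one option, not necessarily what the PR does. `__CUDACC_RTC__` is the macro NVRTC defines during compilation.

```cpp
// Hypothetical stand-in for a 32-bit architecture macro; the PR's actual
// name and detection logic may differ.
#if defined(_M_IX86) || defined(__i386__)
#  define EXAMPLE_ARCH_32BIT_() 1
#else
#  define EXAMPLE_ARCH_32BIT_() 0
#endif

// Only trust the pointer-size assumption when not compiling with NVRTC,
// which defines __CUDACC_RTC__.
#if EXAMPLE_ARCH_32BIT_() && !defined(__CUDACC_RTC__)
#  define EXAMPLE_CHECK_POINTER_SIZE_() 1
#else
#  define EXAMPLE_CHECK_POINTER_SIZE_() 0
#endif

#if EXAMPLE_CHECK_POINTER_SIZE_()
static_assert(sizeof(void*) == 4, "32-bit architecture implies 4-byte pointers");
#endif
```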
    // X86 32-bit
    #if defined(_M_IX86)
    #  define _CCCL_ARCH_X86_32_() 1
    #else
    #  define _CCCL_ARCH_X86_32_() 0
    #endif
Q: Why is this needed? I vaguely remember that we don't support any 32-bit systems.
Windows 32-bit support was removed in CUDA 12 (not 11): https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#system-requirements. We can probably remove it.
we also handle Win32 in some cases, e.g.
https://github.com/NVIDIA/cccl/blob/main/libcudacxx/include/cuda/std/__bit/clz.h#L132
Those are most likely remnants from our libc++ fork
🟩 CI finished in 1h 25m: Pass: 100%/170 | Total: 1d 02h | Avg: 9m 15s | Max: 42m 11s | Hits: 67%/22538
Q: Is it necessary to define the values as function-like macros? Unless we need to delay the macro expansion, I think we can use ordinary defines and omit the else branch that defines the macro to 0. I used the same approach when I implemented the compiler checks.
If possible, we really want the function-like macros, because then it is not possible to silently get invalid checks, like using just
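To illustrate the argument for function-like macros, here is a small sketch. All `EXAMPLE_*` names, including the `EXAMPLE_OS(X)` dispatcher, are hypothetical and are not claimed to match the PR's implementation.

```cpp
// Hypothetical per-OS macros, always defined to 0 or 1.
#if defined(_WIN32)
#  define EXAMPLE_OS_WINDOWS_() 1
#else
#  define EXAMPLE_OS_WINDOWS_() 0
#endif

#if defined(__linux__)
#  define EXAMPLE_OS_LINUX_() 1
#else
#  define EXAMPLE_OS_LINUX_() 0
#endif

// Hypothetical dispatcher: EXAMPLE_OS(LINUX) expands to EXAMPLE_OS_LINUX_().
#define EXAMPLE_OS(X) EXAMPLE_OS_##X##_()

#if EXAMPLE_OS(LINUX)
constexpr const char* example_os_name = "linux";
#elif EXAMPLE_OS(WINDOWS)
constexpr const char* example_os_name = "windows";
#else
constexpr const char* example_os_name = "other";
#endif

// A misspelling such as `#if EXAMPLE_OS(LINUXX)` expands to
// `EXAMPLE_OS_LINUXX_()`. The unknown identifier is replaced by 0 inside the
// #if expression, leaving `0()`, which is ill-formed; GCC and Clang reject it.
//
// With object-like macros, `#if EXAMPLE_OS_LINUXX` would instead evaluate
// to 0 without any diagnostic, so the branch would be skipped silently.
```

The trade-off is the extra `#else` branch per macro: each name must always be defined, otherwise a correct spelling on an unrelated platform would also fail to compile.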
Fixes #2505
Description
Add internal architecture and OS identification macros for CUDA-supported platforms.