CUDA RUNTIME API error: DeviceSetLimit failed with error cudaErrorInvalidValue #776
Comments
According to the CUDA description:
But then we don't call any printf (all are masked). And I don't understand why we see this problem only on H100...
I do not understand it either. I have not root-caused the issue, let alone reported the software versions such as CUDA (or HPCSDK). I am currently retrying with this change.
It makes sense...
* Citation: "Setting cudaLimitPrintfFifoSize must not be performed after launching any kernel that uses the printf() device system call - in such case cudaErrorInvalidValue will be returned."
* Since DeviceSetLimit is governed by ACC_API_CALL, the symbol NDEBUG must not be defined for reproducing the issue.
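A minimal sketch of the ordering constraint from the citation, not taken from the project's code: the kernel name `hello_kernel`, the helper `report`, and the FIFO sizes are made up for illustration. It sets cudaLimitPrintfFifoSize once before and once after launching a kernel that calls printf(). Going by the cited documentation, the second call would be expected to return cudaErrorInvalidValue; whether this matches what happens on H100 in this ticket is not confirmed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that uses the device-side printf() system call.
__global__ void hello_kernel() {
    printf("hello from thread %d\n", threadIdx.x);
}

// Hypothetical helper: print the symbolic name of a runtime API result.
static void report(const char* what, cudaError_t err) {
    std::printf("%s: %s\n", what, cudaGetErrorName(err));
}

int main() {
    // Setting the limit before any printf kernel has been launched.
    report("set limit before launch",
           cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 4u * 1024u * 1024u));

    hello_kernel<<<1, 4>>>();
    report("synchronize", cudaDeviceSynchronize());

    // Setting the limit again after a printf kernel has run. Per the cited
    // documentation this is the case that returns cudaErrorInvalidValue.
    report("set limit after launch",
           cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8u * 1024u * 1024u));

    return 0;
}
```

Compiling with something like `nvcc sketch.cu -o sketch` and running on the affected device, together with the CUDA (or HPCSDK) version, might help narrow down whether call order is the cause here.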
Let's leave this ticket open... I think the issue arises when the RT fails to build a kernel, but I'm not sure...
(Taking over from #777 (comment))
I am starting to think this is the right solution... But I need more time to investigate it (see my previous comment).
(tested on an H100 device)
Originally posted by @hfp in #767 (comment)