Description
We use Triton to load the model. After the inference service has been running for a while, it crashes and produces a core dump. The information shown by gdb is as follows:
1) the stack information [screenshot]
2) all threads [screenshot]
3) the stack information of all threads [screenshot]
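For reference, a minimal sketch of how these stacks can be captured, assuming core dumps are enabled and using illustrative paths (the binary location matches the official container layout):

```sh
# Allow core dumps before launching tritonserver (inside the container)
ulimit -c unlimited

# Open the core file against the server binary; both paths are illustrative
gdb /opt/tritonserver/bin/tritonserver /tmp/core.12345

# Useful gdb commands once the core is loaded:
#   (gdb) bt                    # backtrace of the faulting thread
#   (gdb) info threads          # list every thread and its state
#   (gdb) thread apply all bt   # backtraces for all threads at once
```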
Triton Information
All components were recompiled from the 24.05 branch.
Official container image: nvcr.io/nvidia/tritonserver:24.05-py3
Expected behavior
The inference service should run stably; instead, it sometimes crashes suddenly.
While the inference service is running, top shows the following CPU usage [screenshot].
The GPU is an A30.
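For completeness, a sketch of the commands used for this kind of snapshot; the metrics query assumes Triton's default Prometheus endpoint on port 8002:

```sh
# One-shot CPU snapshot of the server process
top -b -n 1 | grep tritonserver

# GPU utilization and memory on the A30
nvidia-smi

# Triton's Prometheus metrics (served on port 8002 by default)
curl -s localhost:8002/metrics | grep -E 'nv_inference_request|nv_gpu_utilization'
```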
I would like to know why a core dump occurs after the service has been running for a while, and why gdb shows all threads waiting.
Multiple models run on this server at the same time; could resource contention be causing timeouts? A possible mitigation to test is sketched below.
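One way to reduce contention between co-located models is to pin each model to a single GPU instance and let Triton batch requests. This is only a sketch under assumptions: the model name my_model, the repository path /models, and the queue delay value are all hypothetical.

```sh
# Hypothetical model directory; adjust the path to your model repository
cat > /models/my_model/config.pbtxt <<'EOF'
instance_group [
  { count: 1, kind: KIND_GPU, gpus: [ 0 ] }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}
EOF
```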