
Core was generated by /opt/tritonserver/backends/python/triton_python_backend_stub #7875

Open · powerpistn opened this issue Dec 12, 2024 · 0 comments

powerpistn commented Dec 12, 2024

Description
We load the model with Triton. After running inference for a while, the server dumps core. The information shown by gdb is as follows:
1) The stack trace:
   [screenshot: gdb backtrace]

2) All threads:
   [screenshot: gdb thread list]

3) The stack traces of all threads:
   [screenshots: per-thread gdb backtraces]
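For context on the workload, here is a minimal sketch of the kind of client loop that drives sustained inference against the server. The model name, input name, shape, and datatype are placeholders (the actual models are not shown in the issue); substitute the real values from the deployed model's config.pbtxt.

```python
# Minimal repro-style client loop (sketch). "my_model", "INPUT0", the
# shape, and the FP32 dtype below are assumptions, not from the issue.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer_once():
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return client.infer(model_name="my_model", inputs=[inp])

# The crash reportedly appears only after inference has been running
# for a while, so the load has to be sustained.
for _ in range(100_000):
    infer_once()
```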

Triton Information
All components were recompiled from the 24.05 branch, on top of the official image nvcr.io/nvidia/tritonserver:24.05-py3.

Expected behavior
The inference service should run stably, but it sometimes crashes suddenly.

While the inference service is running, `top` shows the following:
[screenshot: top output]

CPU information:
[screenshot: CPU details]

The GPU is an A30.
I want to know why the server dumps core after running for a while, given that the gdb output shows all threads waiting.
Multiple models run on the server at the same time; could resource contention be causing timeouts?
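One way to test the resource-contention hypothesis is to poll Triton's per-model statistics while all the models are loaded and watch the scheduler queue times: steadily growing queue times would point toward contention rather than a bug in the stub. A sketch using the HTTP client follows; the model names are placeholders, and the field layout assumes the JSON shape of Triton's statistics extension.

```python
# Sketch: poll per-model statistics to look for scheduler-queue buildup.
# "model_a"/"model_b" are placeholders for the models on this server.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for model in ["model_a", "model_b"]:
    stats = client.get_inference_statistics(model_name=model)
    for entry in stats.get("model_stats", []):
        queue = entry["inference_stats"]["queue"]
        count = int(queue.get("count", 0))  # uint64s arrive as strings
        if count:
            # Average time a request spent waiting in the queue, in ms.
            print(model, entry.get("version"),
                  "avg queue ms:", int(queue["ns"]) / count / 1e6)
```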
