
Core was generated by /opt/tritonserver/backends/python/triton_python_backend_stub #7875

Open · powerpistn opened this issue Dec 12, 2024 · 0 comments

powerpistn commented Dec 12, 2024

Description
We load the model with Triton. After running inference for a while, the server dumps core. The information shown by gdb is as follows:
1) The stack trace:
   [screenshot: gdb backtrace]

2) All threads:
   [screenshot: gdb thread list]

3) The stack traces of all threads:
   [screenshots: per-thread gdb backtraces]
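For context on the workload, here is a minimal sketch of the kind of client loop that drives sustained inference against the server. The model name, input name, shape, and datatype are placeholders (the actual models are not shown in the issue); substitute the real values from the deployed model's config.pbtxt.

```python
# Minimal repro-style client loop (sketch). "my_model", "INPUT0", the
# shape, and the FP32 dtype below are assumptions, not from the issue.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer_once():
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return client.infer(model_name="my_model", inputs=[inp])

# The crash reportedly appears only after inference has been running
# for a while, so the load has to be sustained.
for _ in range(100_000):
    infer_once()
```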

Triton Information
All components were recompiled from the 24.05 branch, on top of the official image nvcr.io/nvidia/tritonserver:24.05-py3.

Expected behavior
The inference service should run stably, but it sometimes crashes suddenly.

While the inference service is running, `top` shows the following:
[screenshot: top output]

CPU information:
[screenshot: CPU details]

The GPU is an A30.
I want to know why the server dumps core after running for a while, given that the gdb output shows all threads waiting.
Multiple models run on the server at the same time; could resource contention be causing timeouts?
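One way to test the resource-contention hypothesis is to poll Triton's per-model statistics while all the models are loaded and watch the scheduler queue times: steadily growing queue times would point toward contention rather than a bug in the stub. A sketch using the HTTP client follows; the model names are placeholders, and the field layout assumes the JSON shape of Triton's statistics extension.

```python
# Sketch: poll per-model statistics to look for scheduler-queue buildup.
# "model_a"/"model_b" are placeholders for the models on this server.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for model in ["model_a", "model_b"]:
    stats = client.get_inference_statistics(model_name=model)
    for entry in stats.get("model_stats", []):
        queue = entry["inference_stats"]["queue"]
        count = int(queue.get("count", 0))  # uint64s arrive as strings
        if count:
            # Average time a request spent waiting in the queue, in ms.
            print(model, entry.get("version"),
                  "avg queue ms:", int(queue["ns"]) / count / 1e6)
```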
