System Info
Text Generation Inference API
Information
Tasks
Reproduction
I'm using the Inference API at https://api-inference.huggingface.co/v1/chat/completions with the nvidia/Llama-3.1-Nemotron-70B-Instruct-HF model.
I send the same message with the "user" role, and the model produces different results. Most of the time it returns a normal answer, but occasionally it generates responses with strange content.
I temporarily stopped calling the API for a short period. When I then sent the same message again, the model returned a normal response.
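For reference, a minimal sketch of the kind of request I'm making (the token source, prompt, and max_tokens below are placeholders, not the exact values from my calls):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/v1/chat/completions"
# Placeholder: read the Hugging Face token from the environment.
HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    # The same single user-role message is sent on every call.
    "messages": [{"role": "user", "content": "Explain the difference between TCP and UDP."}],
    "max_tokens": 512,
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```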
Expected behavior
Is this issue caused by the model? Is there any way to prevent the model from generating such strange responses?
I experienced the same behaviour with the Inference API: when there are many parallel requests, the model starts generating complete rubbish. After a restart it works normally again; for me, 32 parallel requests is the maximum before the model starts spitting out rubbish. This should not happen, of course.
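A rough way to probe this load dependence (a sketch only, assuming an OpenAI-compatible chat completions endpoint; the endpoint URL, model field, prompt, and the "looks suspicious" check are all placeholder assumptions, not the reported setup):

```python
import concurrent.futures
import requests

# Placeholder endpoint: point this at your own deployment.
API_URL = "http://localhost:8080/v1/chat/completions"
PARALLEL_REQUESTS = 32  # the level at which corrupted output was reported above

def ask(i: int) -> str:
    payload = {
        "model": "tgi",  # placeholder model field
        "messages": [{"role": "user", "content": "List the first five prime numbers."}],
        "max_tokens": 64,
    }
    r = requests.post(API_URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# Keep all requests in flight concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=PARALLEL_REQUESTS) as pool:
    answers = list(pool.map(ask, range(PARALLEL_REQUESTS)))

# Crude placeholder heuristic: a sane answer to this prompt should mention "2".
suspicious = [a for a in answers if "2" not in a]
print(f"{len(suspicious)}/{len(answers)} responses look suspicious")
```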
I experienced the same issue with standard Llama models from Meta as well (3.1 70B Instruct and 3.3 70B Instruct).
These models are hosted on my corporate infrastructure and usually receive 3-4k requests (and 2-3M input tokens) per hour, which doesn't seem like much. In fact, I've never seen more than 5 running requests per second for each model.
I'm using TGI 3.0.1 with H100 and H100 NVL GPUs.