Responses with unusual content. #2871

Open
1 of 4 tasks
ncthanhcs opened this issue Dec 30, 2024 · 2 comments

Comments

ncthanhcs commented Dec 30, 2024

System Info

text generation inference api

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I'm using the Inference API at https://api-inference.huggingface.co/v1/chat/completions with the nvidia/Llama-3.1-Nemotron-70B-Instruct-HF model.

I send the same message with the "user" role, and the model produces different results: most of the time it gives normal answers, but occasionally it generates responses with strange content.
[screenshot of one of the unusual responses]

I stopped calling the API for a short period. After that, I called it again with the same message as before, and the model returned a normal response.
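
Roughly how I'm calling the endpoint, as a minimal sketch (the token, prompt text, and sampling parameters below are placeholders, not my exact values; only the endpoint and model id are the ones mentioned above):

```python
# Minimal sketch of the failing request. Token, prompt, and sampling values
# are placeholders; only the endpoint and model id come from this report.
import requests

API_URL = "https://api-inference.huggingface.co/v1/chat/completions"
HEADERS = {"Authorization": "Bearer hf_xxx"}  # placeholder HF token

payload = {
    "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    "messages": [
        {"role": "user", "content": "<the same user message every time>"},
    ],
    "max_tokens": 512,   # illustrative value
    "temperature": 0.7,  # illustrative value
}

resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Repeating this call with an identical payload is what occasionally produces the strange output.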

Expected behavior

Is this issue caused by the model? Is there any way to prevent the model from generating such strange responses?

@maiiabocharova

I experienced the same behaviour with the Inference API: when there are many parallel requests, the model starts generating complete rubbish. After a restart it works normally again. For me, 32 parallel requests is the most it can handle before it starts spitting out rubbish. This should not happen, of course.
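
A sketch of how the overload case can be reproduced (endpoint, model, and token are placeholders as in the original post, and the exact concurrency threshold will vary per deployment):

```python
# Fire N identical chat-completion requests concurrently and inspect the
# outputs for garbled text. ~32 concurrent requests is roughly where it broke
# for me, but that number is deployment-specific.
from concurrent.futures import ThreadPoolExecutor
import requests

API_URL = "https://api-inference.huggingface.co/v1/chat/completions"
HEADERS = {"Authorization": "Bearer hf_xxx"}  # placeholder HF token
PAYLOAD = {
    "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    "messages": [{"role": "user", "content": "<identical prompt every time>"}],
}
NUM_PARALLEL = 32  # concurrency level around which rubbish output appeared


def one_request(_):
    resp = requests.post(API_URL, headers=HEADERS, json=PAYLOAD, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


with ThreadPoolExecutor(max_workers=NUM_PARALLEL) as pool:
    outputs = list(pool.map(one_request, range(NUM_PARALLEL)))

# Print a prefix of each completion so garbled ones stand out.
for i, text in enumerate(outputs):
    print(f"{i:2d}: {text[:80]!r}")
```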

luonist commented Jan 9, 2025

I experienced the same issue with standard Llama models from Meta as well (3.1 70B Instruct and 3.3 70B Instruct).
These models are hosted in my corporate infrastructure and usually receive 3-4k requests (and 2-3M input tokens) per hour, which doesn't seem like much; in fact, I've never seen more than 5 running requests at any given second for each model.
I'm using TGI 3.0.1 with H100 and H100 NVL GPUs.
