Internal error for batch inference: probability tensor contains either `inf`, `nan` or element < 0. #2728
System Info
NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4
Using xinference v1.1.1
2024-12-31 01:17:30,925 xinference.core.worker 109 INFO [request f449ca82-c757-11ef-b2ea-0242ac1a0002] Leave launch_builtin_model, elapsed time: 46 s
2024-12-31 01:17:44,027 transformers.generation.configuration_utils 471 INFO loading configuration file /root/llm/model/Llama-3.1-Nemotron-70B-Instruct-HF/generation_config.json
2024-12-31 01:17:44,027 transformers.generation.configuration_utils 471 INFO Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
]
}
2024-12-31 01:17:57,391 transformers.models.llama.modeling_llama 471 WARNING We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
2024-12-31 01:18:09,955 xinference.model.llm.transformers.utils 471 ERROR Internal error for batch inference: probability tensor contains either `inf`, `nan` or element < 0.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 491, in batch_inference_one_step
    _batch_inference_one_step_internal(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 335, in _batch_inference_one_step_internal
    token = _get_token_from_logits(
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 111, in _get_token_from_logits
    indices = torch.multinomial(probs, num_samples=2)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Destroy generator 17ad5836c75811efa6630242ac1a0002 due to an error encountered.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 419, in xoscar_next
    r = await asyncio.create_task(_async_wrapper(gen))
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 409, in _async_wrapper
    return await _gen.__anext__()  # noqa: F821
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 485, in _to_async_gen
    async for v in gen:
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 681, in _queue_consumer
    raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
2024-12-31 01:18:10,047 xinference.api.restful_api 1 ERROR Chat completion stream got an error: [address=0.0.0.0:44787, pid=471] probability tensor contains either `inf`, `nan` or element < 0
File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 2072, in stream_results
async for item in iterator:
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 340, in anext
return await self._actor_ref.xoscar_next(self._uid)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 431, in xoscar_next
raise e
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 419, in xoscar_next
r = await asyncio.create_task(_async_wrapper(gen))
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.anext() # noqa: F821
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 485, in _to_async_gen
async for v in gen:
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 681, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: [address=0.0.0.0:44787, pid=471] probability tensor contains either
inf
,nan
or element < 0Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 527, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1350, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.anext()
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 709, in asyncgen_wrapper
response = await iterator.anext()
File "/usr/local/lib/python3.10/dist-packages/gradio/chat_interface.py", line 545, in _stream_fn
first_response = await async_iteration(generator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.anext()
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in anext
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/chat_interface.py", line 126, in generate_wrapper
for chunk in model.chat(
File "/usr/local/lib/python3.10/dist-packages/xinference/client/common.py", line 51, in streaming_response_iterator
raise Exception(str(error))
Exception: [address=0.0.0.0:44787, pid=471] probability tensor contains either
inf
,nan
or element < 02024-12-31 01:19:20,387 xinference.model.llm.transformers.utils 471 ERROR Internal error for batch inference: probability tensor contains either
inf
,nan
or element < 0.Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 491, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 335, in _batch_inference_one_step_internal
token = _get_token_from_logits(
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 111, in _get_token_from_logits
indices = torch.multinomial(probs, num_samples=2)
RuntimeError: probability tensor contains either
inf
,nan
or element < 02024-12-31 01:19:20,420 xinference.core.model 471 ERROR [request 41af0c06-c758-11ef-a663-0242ac1a0002] Leave chat, error: probability tensor contains either
inf
,nan
or element < 0, elapsed time: 25 sTraceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 94, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 735, in chat
return await self.handle_batching_request(
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 718, in handle_batching_request
result = await fut
ValueError: probability tensor contains either
inf
,nan
or element < 02024-12-31 01:19:20,424 xinference.api.restful_api 1 ERROR [address=0.0.0.0:44787, pid=471] probability tensor contains either
inf
,nan
or element < 0Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 2098, in create_chat_completion
data = await model.chat(
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 102, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 94, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 735, in chat
return await self.handle_batching_request(
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 718, in handle_batching_request
result = await fut
ValueError: [address=0.0.0.0:44787, pid=471] probability tensor contains either
inf
,nan
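For what it's worth, the bottom frame of every stack is the same sampling call, and the failure mode is easy to reproduce in isolation. A minimal sketch in plain PyTorch (not xinference code): once a single NaN or Inf appears in a logit row, softmax propagates it across the whole row, and torch.multinomial rejects the result with exactly this message.

```python
import torch

# One NaN (or +/-Inf) logit poisons the entire softmax row.
logits = torch.tensor([[1.0, float("nan"), 2.0]])
probs = torch.softmax(logits, dim=-1)  # tensor([[nan, nan, nan]])

try:
    torch.multinomial(probs, num_samples=2)
except RuntimeError as err:
    print(err)  # probability tensor contains either `inf`, `nan` or element < 0
```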
Running Xinference with Docker?
docker image
Version info
v1.1.1
The command used to start Xinference
Started as a Docker container (docker compose):
services:
  xinference:
    image: xprobe/xinference:v1.1.1
    container_name: xinference
    ports:
      - "9997:9997"
    volumes:
      - /opt/xinference/.xinference:/root/.xinference/
      - /opt/xinference/.cache:/root/.cache/
      - /opt/llm/model/:/root/llm/model/
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    entrypoint: xinference-local
    command: ["-H", "0.0.0.0"]
    restart: always
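With the container up, the server listens on port 9997. A quick connectivity check, sketched with the xinference Python client against the default endpoint:

```python
from xinference.client import Client

# Connect to the server exposed by the compose file above.
client = Client("http://localhost:9997")
print(client.list_models())  # empty until a model is launched
```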
Reproduction
Register a custom model with the following JSON:
{
  "version": 1,
  "context_length": 2048,
  "model_name": "Llama-3.1-Nemotron-70B-Instruct-HF",
  "model_lang": [
    "en",
    "zh"
  ],
  "model_ability": [
    "chat"
  ],
  "model_description": "Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.",
  "model_family": "qwen2.5-instruct",
  "model_specs": [
    {
      "model_format": "pytorch",
      "model_size_in_billions": 70,
      "quantizations": [
        "none"
      ],
      "model_id": null,
      "model_hub": "huggingface",
      "model_uri": "/root/llm/model/Llama-3.1-Nemotron-70B-Instruct-HF",
      "model_revision": null
    }
  ],
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}\n {%- for tool in tools %}\n {{- "\n" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- "\n\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": , \"arguments\": }\n</tool_call><|im_end|>\n" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n {%- else %}\n {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n {%- elif message.role == "assistant" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\n<tool_call>\n{"name": "' }}\n {{- tool_call.name }}\n {{- '", "arguments": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\n' }}\n {%- elif message.role == "tool" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\n<tool_response>\n' }}\n {{- message.content }}\n {{- '\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}\n {{- '<|im_end|>\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\n' }}\n{%- endif %}\n",
"stop_token_ids": [
151643,
151644,
151645
],
"stop": [
"<|endoftext|>",
"<|im_start|>",
"<|im_end|>"
],
"is_builtin": false
}
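For reference, the same registration can be done through the Python client; a sketch assuming the JSON above is saved as model.json:

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Register the custom model definition shown above.
with open("model.json") as f:
    client.register_model(model_type="LLM", model=f.read(), persist=True)
```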
Launch the model, then send a chat request; the chat call fails with the error above (a sketch of the equivalent client calls follows).
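A sketch of those two steps with the Python client (the model_engine value and the messages format are assumptions based on the v1.x client API):

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch the registered model on the transformers engine, then chat;
# the chat call is what surfaces the probability-tensor error.
model_uid = client.launch_model(
    model_name="Llama-3.1-Nemotron-70B-Instruct-HF",
    model_engine="transformers",
    model_format="pytorch",
    model_size_in_billions=70,
)
model = client.get_model(model_uid)
model.chat(messages=[{"role": "user", "content": "hello"}])
# -> probability tensor contains either `inf`, `nan` or element < 0
```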
Expected behavior
Chat is expected to work normally.