
Error deploying qwen-vl-chat #2702

Open · 1 of 3 tasks
amzfc opened this issue Dec 25, 2024 · 9 comments

amzfc commented Dec 25, 2024

System Info

linux x86_64 ubuntu 22.04

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

The current latest version

The command used to start Xinference

xinference launch --model_path /data/models/qwen-vl-chat --model-engine Transformers --model-name qwen-vl-chat --size-in-billions 7 --model-format pytorch --quantization none --gpu-idx 2

Reproduction

RuntimeError: Failed to launch model, detail: [address=0.0.0.0:39001, pid=1537] None: Max retries exceeded with url: /Qwen-VL/assets/SimSun.ttf (Caused by None)

Expected behavior

Why does it request this font file, and why does it keep failing? An explanation would be appreciated.

amzfc (Author) commented Dec 25, 2024

Additional context: this is an offline (no-network) environment; the model has already been downloaded from Hugging Face into /data/models.
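
A quick sanity check for this setup (a sketch; the path is the one from this comment) that the offline copy includes the auxiliary assets and not only the weights:

import os

# qwen-vl-chat ships custom tokenizer code plus auxiliary assets
# (notably the SimSun.ttf font); a missing asset is what typically
# triggers a network fallback in an offline environment.
model_dir = "/data/models/qwen-vl-chat"
for name in sorted(os.listdir(model_dir)):
    print(name)
print("SimSun.ttf present:", os.path.exists(os.path.join(model_dir, "SimSun.ttf")))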

qinxuye (Contributor) commented Dec 25, 2024

Why not try qwen2-vl instead? I searched around, and this model has quite a few reports of this exact problem.

amzfc (Author) commented Dec 25, 2024

@qinxuye With qwen2-vl, using Xinference's built-in web UI, asking a question after uploading an image fails with an error. Have you run into this?

qinxuye (Contributor) commented Dec 25, 2024

If there is an error, paste the error.

amzfc (Author) commented Dec 25, 2024

With qwen2-vl, using Xinference's built-in web UI, asking a question after uploading an image fails with an error.
Launch command: xinference launch --model_path /data/models/Qwen2-VL-7B --model-engine Transformers --model-name qwen2-vl-instruct --size-in-billions 7 --model-format pytorch --quantization none --gpu-idx 2

Failed to generate chat completion, detail: [address=0.0.0.0:44905, pid=1811] index 0 is out of bounds for dimension 0 with size 0

qinxuye (Contributor) commented Dec 25, 2024

You need to paste the server-side error.

amzfc (Author) commented Dec 25, 2024

2024-12-24 21:48:08,577 xinference.core.supervisor 141 DEBUG [request d1a8bf52-c283-11ef-9b1a-0242ac130002] Leave describe_model, elapsed time: 0 s
2024-12-24 21:48:08,578 xinference.core.model 1811 DEBUG Request chat, current serve request count: 0, request limit: inf for the model qwen2-vl-instruct
2024-12-24 21:48:08,578 xinference.core.model 1811 DEBUG [request d1a8eeaa-c283-11ef-b814-0242ac130002] Enter chat, args: ModelActor(qwen2-vl-instruct-0),[{'role': 'user', 'content': '你是什么模型呢'}],{'max_tokens': 512, 'temperature': 1.0, 'stream': False}, kwargs: raw_params={'max_tokens': 512, 'temperature': 1, 'stream': False}
2024-12-24 21:48:08,578 xinference.core.model 1811 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2-vl-instruct with model family None is disqualified.
2024-12-24 21:48:08,581 xinference.core.model 1811 ERROR [request d1a8eeaa-c283-11ef-b814-0242ac130002] Leave chat, error: index 0 is out of bounds for dimension 0 with size 0, elapsed time: 0 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 90, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 748, in chat
    response = await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 565, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 132, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 577, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 530, in _wrapper
    result = fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 95, in chat
    c = self._generate(messages, generate_config)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 118, in _generate
    generated_ids = self._model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2252, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3244, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1792, in prepare_inputs_for_generation
    if cache_position[0] != 0:
IndexError: index 0 is out of bounds for dimension 0 with size 0
2024-12-24 21:48:08,581 xinference.core.model 1811 DEBUG After request chat, current serve request count: 0 for the model qwen2-vl-instruct
2024-12-24 21:48:08,584 xinference.api.restful_api 1 ERROR [address=0.0.0.0:44905, pid=1811] index 0 is out of bounds for dimension 0 with size 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 2098, in create_chat_completion
    data = await model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 103, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 90, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 748, in chat
    response = await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 565, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 132, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 577, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/utils.py", line 530, in _wrapper
    result = fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 95, in chat
    c = self._generate(messages, generate_config)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 118, in _generate
    generated_ids = self._model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2252, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3244, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1792, in prepare_inputs_for_generation
    if cache_position[0] != 0:
IndexError: [address=0.0.0.0:44905, pid=1811] index 0 is out of bounds for dimension 0 with size 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1350, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 742, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/chat_interface.py", line 225, in predict
    response = model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 580, in chat
    raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:44905, pid=1811] index 0 is out of bounds for dimension 0 with size 0
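
To separate the model/transformers stack from Xinference, the same request can be tried directly with transformers (a minimal repro sketch, assuming a transformers build with Qwen2-VL support and the qwen_vl_utils helper package from the Qwen2-VL model card; the image path is a placeholder):

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_dir = "/data/models/Qwen2-VL-7B"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype="auto", device_map="cuda:2"  # same GPU as --gpu-idx 2
)
processor = AutoProcessor.from_pretrained(model_dir)

# The same kind of request the web UI sends: one image plus a question.
messages = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/test.jpg"},  # placeholder image
    {"type": "text", "text": "Describe this image."},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# If this also raises the cache_position IndexError, the problem sits in the
# model/transformers combination rather than in Xinference itself.
generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])

If this standalone script fails at the same cache_position line, it is worth checking the installed transformers version against the one the Qwen2-VL model card requires, since cache_position handling in generation has changed across transformers releases.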

amzfc (Author) commented Dec 25, 2024

Regarding deploying qwen-vl-chat in an offline environment and hitting: RuntimeError: Failed to launch model, detail: [address=0.0.0.0:39001, pid=1537] None: Max retries exceeded with url: /Qwen-VL/assets/SimSun.ttf (Caused by None)

Root cause:
Near the top of tokenization_qwen.py in the locally downloaded model files, SimSun.ttf is first looked up in the "Qwen/Qwen-VL-Chat" cache; if it is not found there or in the working directory, the code falls back to downloading it over the network:

FONT_PATH = try_to_load_from_cache("Qwen/Qwen-VL-Chat", "SimSun.ttf")
if FONT_PATH is None:
    if not os.path.exists("SimSun.ttf"):
        # In an offline environment this request can never succeed, which is
        # what surfaces as the "Max retries exceeded" error above.
        ttf = requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/SimSun.ttf")
        open("SimSun.ttf", "wb").write(ttf.content)
    FONT_PATH = "SimSun.ttf"

Fix:
Set FONT_PATH = "SimSun.ttf" directly (provided the downloaded model files include this font, which they normally do).
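
A slightly more robust variant of that fix resolves the font relative to tokenization_qwen.py itself, so it does not depend on the process's working directory (a sketch of one possible local patch, not upstream code):

import os

# Resolve SimSun.ttf inside the model directory, next to tokenization_qwen.py,
# instead of consulting the HF cache or the network.
FONT_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "SimSun.ttf")
if not os.path.exists(FONT_PATH):
    # Download the font once on a networked machine and copy it here.
    raise FileNotFoundError(f"SimSun.ttf not found at {FONT_PATH}")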

github-actions bot commented Jan 1, 2025

This issue is stale because it has been open for 7 days with no activity.
