[Bug]: I started a qwen2vl-7b video processing service using vllm (0.6.6), but encountered an error during inference #11657
Comments
Your video is probably too long to fit inside the model. Try using a shorter video or sample fewer frames from it.
My video is only 5 seconds long, which is a very short video.
Most importantly, the same video on the same GPU does not have this problem when I run it without vLLM.
You need to sample the frames outside of vLLM, since we only apply HF's preprocessing to the data, which doesn't include video sampling. Alternatively, if you want to keep the full video, you can try increasing max_model_len.
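For the frame-sampling route, a minimal client-side sketch could look like the following (this is an assumption-laden example, not anything posted in this issue: it assumes OpenCV is installed, uses placeholder values for the video path, frame count, and endpoint URL, and sends the sampled frames as individual images instead of a video_url):

# Hypothetical sketch: sample a fixed number of frames from the video ourselves
# and send them to the OpenAI-compatible server as individual images.
import base64
import cv2          # pip install opencv-python
import requests

VIDEO_PATH = "input.mp4"   # placeholder path
NUM_FRAMES = 8             # placeholder; fewer frames -> fewer multimodal tokens
API_URL = "http://localhost:8088/v1/chat/completions"

cap = cv2.VideoCapture(VIDEO_PATH)
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# Pick NUM_FRAMES indices spread evenly across the video.
indices = [int(i * (total - 1) / (NUM_FRAMES - 1)) for i in range(NUM_FRAMES)]

frames_b64 = []
for idx in indices:
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if not ok:
        continue
    ok, buf = cv2.imencode(".jpg", frame)
    if ok:
        frames_b64.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
cap.release()

# Build a chat request with the text prompt plus one image entry per sampled frame.
content = [{"type": "text", "text": "A prompt of about 500 words"}]
content += [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
    for b64 in frames_b64
]

resp = requests.post(API_URL, json={"model": "qwen2vl-7b",
                                    "messages": [{"role": "user", "content": content}]})
print(resp.json())

Note that when sending several images in one request, the server likely also needs to be started with something like --limit-mm-per-prompt image=8 so that vLLM accepts more than one image per prompt; check the docs for your vLLM version for the exact syntax.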
I also tried to set the value of max_model_len to be greater than 32768, but encountered an error message as follows:
You can try overriding rope scaling: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#extended-context-support I'm not 100% sure whether this is applicable to Qwen2-VL though. @fyabc any idea about this?
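For reference, the page linked above describes extended-context (YaRN) support for the text-only Qwen2 models by adding a static rope_scaling block to the model's config.json, roughly like this (untested here, and as noted above it is unclear whether this interacts correctly with Qwen2-VL's mrope configuration):

{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}

With that in place, the server would still need to be launched with a larger context, e.g. --max-model-len 131072, for the extended window to take effect.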
Your current environment
Command:
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8088 --model /app/qwen2vl-7b --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --served-model-name qwen2vl-7b --trust-remote-code
GPU:
A800 80GB
Query:
query = {"model":"qwen2vl-7b",
"messages":[
{"role":"user",
"content":[
{"type":"text","text":"A prompt word of about 500 words"},
{"type":"video_url","video_url":{"url":“A downloadable URL, a video of about 5 seconds in mp4 format”}
}]
}]
}
Response:
{"object":"error","message":"The prompt (total length 43698) is too long to fit into the model (context length 32768). Make sure that `max
number of images, and pers than the r mber of text tokens plus multimodal tokens. For image inputs, the number of image tokens depends on the number of images, and possibly their aspect ratios as well.","type":"BadRequestError","param" :nuit, code :400}
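Since the error message says the multimodal token count depends on the number of frames and their resolution, another knob that may help (my assumption, not something suggested in this thread) is capping the pixel budget the HF processor uses for Qwen2-VL via mm_processor_kwargs, if the vLLM version in use supports it:

python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8088 --model /app/qwen2vl-7b --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --served-model-name qwen2vl-7b --trust-remote-code --mm-processor-kwargs '{"max_pixels": 602112}'

Here 602112 (= 768 * 28 * 28) is only an example value; a smaller max_pixels means fewer visual tokens per frame at the cost of detail. The exact kwarg names accepted for Qwen2-VL come from its Hugging Face processor, so verify them against the installed vLLM/Transformers versions.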
Model Input Dumps
No response
🐛 Describe the bug
The command, GPU, query, and error response are the same as listed under "Your current environment" above.