[Bugfix][V1] Fix molmo text-only inputs #11676
base: main
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hmm, maybe we should expand our tests to cover this case...
Thanks for the fix! Can you verify that this yields the same result as the main branch on V0? I was mainly curious whether the padded dummy image input ids matter at all for text-only input.
```python
else:
    base_image_input_size = image_processor.base_image_input_size
    image_patch_size = image_processor.image_patch_size
    image_num_patch = (
        base_image_input_size[0] // image_patch_size,
        base_image_input_size[1] // image_patch_size,
    )
    n_pixels = image_patch_size * image_patch_size * 3
    n_patches = image_num_patch[0] * image_num_patch[1]

    image_length_w = image_processor.image_token_length_w
    image_length_h = image_processor.image_token_length_h
    tokens_per_image = image_length_w * image_length_h
    images = torch.full(
        (max_total_crops, n_patches, n_pixels),
        -1,
        dtype=torch.float32,
    )
    image_input_idx = torch.full(
        (max_total_crops, tokens_per_image),
        -1,
        dtype=torch.int32,
    )
```
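To make the shape arithmetic in the diff concrete, here is a minimal sketch of how the dummy tensor dimensions are derived. The numeric values (crop size 336, patch size 14, 12x12 image tokens, 13 crops) are illustrative assumptions for this sketch, not values taken from the real Molmo processor config.

```python
# Hypothetical Molmo-like processor values (illustrative only).
base_image_input_size = (336, 336)  # height, width of one crop
image_patch_size = 14               # side length of one ViT patch
image_token_length_w = 12           # image tokens along width
image_token_length_h = 12           # image tokens along height
max_total_crops = 13                # assumed crop budget

# Same arithmetic as the diff above.
image_num_patch = (
    base_image_input_size[0] // image_patch_size,
    base_image_input_size[1] // image_patch_size,
)
n_pixels = image_patch_size * image_patch_size * 3  # RGB pixels per patch
n_patches = image_num_patch[0] * image_num_patch[1]
tokens_per_image = image_token_length_w * image_token_length_h

# The dummy tensors would then be filled with -1 at these shapes.
images_shape = (max_total_crops, n_patches, n_pixels)
image_input_idx_shape = (max_total_crops, tokens_per_image)
print(images_shape)           # (13, 576, 588)
print(image_input_idx_shape)  # (13, 144)
```

The `-1` fill marks every crop and token slot as padding, which is why, for a text-only prompt, none of these dummy entries should map to real placeholder positions.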
I wasn't sure why this was in the code originally when the AI2 team made the PR to support Molmo in vLLM, but I guess it wasn't an issue back then: on V0 we didn't use the placeholder ranges for these "dummy" image input indices padded onto the prompt token ids.
I have verified using the reproduce code, and the generated results completely align with V0 on the main branch.
Where should tests be added for this PR?
Hmm, actually it seems that an empty-image case is already included there...
Alright, then I won't add tests.
It seems that we don't have any tests for Molmo at all.
Yeah, I think we decided that if the model support came from the vendor, then tests are not required.
I think that to avoid breaking the code in future PRs (especially with the ongoing V1 refactoring), we should add tests for it.
Okay, I will handle this ASAP.
Sorry, I didn't realize you added tests.
Please fix the lint errors.
tests/models/decoder_only/vision_language/vlm_utils/model_utils.py
Reproduce Code