feat/cody: Brings image modality for BYOK users #6354
base: main
- Adds a new toolbar button to the chat interface to allow users to upload images when using the Google model
- The button is conditionally rendered based on the current model being a Google model (identified by the `ModelTag.BYOK` tag and the model ID containing 'gemini-2.0-flash')
- The onClick handler for the button is currently commented out, as the implementation for the actual image upload feature is not included in the provided diff
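The detection rule above (a `ModelTag.BYOK` tag plus a model ID containing 'gemini-2.0-flash') can be sketched as a small predicate. The `Model` shape and the tag string values are simplified assumptions, not Cody's actual types; only the rule and the `isGeminiFlash2Model` name (from a later commit in this PR) come from the source.

```typescript
// Hypothetical tag constants; Cody's real ModelTag values may differ.
const ModelTag = {
  BYOK: "byok",
  Vision: "vision",
} as const
type ModelTag = (typeof ModelTag)[keyof typeof ModelTag]

// Simplified, assumed model shape (the real Model type has more fields).
interface Model {
  id: string
  tags: ModelTag[]
}

// Predicate deciding whether the image upload button should be rendered:
// the model must carry the BYOK tag and its ID must contain "gemini-2.0-flash".
function isGeminiFlash2Model(model: Model | undefined): boolean {
  if (!model) {
    return false
  }
  return model.tags.includes(ModelTag.BYOK) && model.id.includes("gemini-2.0-flash")
}

const byokFlash: Model = { id: "gemini-2.0-flash-exp", tags: [ModelTag.BYOK] }
const nonByokFlash: Model = { id: "gemini-2.0-flash-exp", tags: [] }
const byokOther: Model = { id: "gemini-1.5-pro", tags: [ModelTag.BYOK] }

console.log(isGeminiFlash2Model(byokFlash)) // true
console.log(isGeminiFlash2Model(nonByokFlash)) // false
console.log(isGeminiFlash2Model(byokOther)) // false
```

Both conditions are required, so a non-BYOK deployment of the same model ID does not get the button.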
- Implements the functionality to select an image file and add it to the `ChatBuilder` instance
- Adds the necessary handlers in `ChatController` to process the 'chat/upload-image' message and call the `ChatBuilder.addImages()` method
- Adds a new message type in `protocol.ts` to handle the 'chat/upload-image' command
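The message flow described above (webview sends 'chat/upload-image', the controller forwards the image to `ChatBuilder.addImages()`) might look roughly like this. This is a sketch under assumptions: `ChatImage`, `ChatBuilderSketch`, `getAndResetImages`, `handleWebviewMessage` and the `submit` variant are hypothetical stand-ins, not the PR's code; only the command name and `addImages` come from the description.

```typescript
// Assumed image payload: raw base64 data plus its MIME type.
interface ChatImage {
  data: string
  mimeType: string
}

// Hypothetical slice of the webview -> extension protocol. Only the
// "chat/upload-image" command name comes from the PR; "submit" is illustrative.
type WebviewMessage =
  | { command: "chat/upload-image"; image: ChatImage }
  | { command: "submit"; text: string }

// Minimal stand-in for ChatBuilder that only tracks pending images.
class ChatBuilderSketch {
  private images: ChatImage[] = []

  addImages(images: ChatImage[]): void {
    this.images.push(...images)
  }

  // Hands pending images to the next request and clears them so they are sent once.
  getAndResetImages(): ChatImage[] {
    const pending = this.images
    this.images = []
    return pending
  }
}

// Stand-in for the ChatController message switch.
function handleWebviewMessage(builder: ChatBuilderSketch, message: WebviewMessage): void {
  switch (message.command) {
    case "chat/upload-image":
      builder.addImages([message.image])
      break
    case "submit":
      // Submission is out of scope for this sketch.
      break
  }
}

const builder = new ChatBuilderSketch()
handleWebviewMessage(builder, {
  command: "chat/upload-image",
  image: { data: "iVBORw0KGgo=", mimeType: "image/png" },
})
const firstBatch = builder.getAndResetImages()
const secondBatch = builder.getAndResetImages()
console.log(firstBatch.length, secondBatch.length) // 1 0
```

Using a discriminated union on `command` lets the compiler narrow `message.image` inside the matching `case`.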
Implement image handling capabilities for the Google LLM provider:
- Add types for image data and MIME type validation
- Enhance ChatBuilder with image processing and MIME detection
- Enable image support in completion parameters
- Add inline image data support to chat messages
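MIME validation and inline image parts could be sketched as follows. The allow-list is an assumption (the PR's accepted types are not listed here), and the `inlineData: { mimeType, data }` part shape is modeled on the form used by Google's Gemini SDKs; the names `ChatImage`, `isSupportedImageMimeType` and `toGeminiParts` are hypothetical.

```typescript
// Assumed allow-list; the PR's actual list of accepted types is not shown here.
const SUPPORTED_IMAGE_MIME_TYPES = ["image/png", "image/jpeg", "image/webp"] as const
type SupportedImageMimeType = (typeof SUPPORTED_IMAGE_MIME_TYPES)[number]

interface ChatImage {
  data: string // raw base64, without a "data:" prefix
  mimeType: SupportedImageMimeType
}

// Type guard used to validate a MIME string before it reaches the provider.
function isSupportedImageMimeType(mime: string): mime is SupportedImageMimeType {
  return (SUPPORTED_IMAGE_MIME_TYPES as readonly string[]).includes(mime)
}

// Part shapes modeled on the inlineData form used by Google's Gemini SDKs.
type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } }

// Builds the parts for one user message: images first, then the text prompt.
function toGeminiParts(text: string, images: ChatImage[]): GeminiPart[] {
  const parts: GeminiPart[] = images.map(image => ({
    inlineData: { mimeType: image.mimeType, data: image.data },
  }))
  parts.push({ text })
  return parts
}

const parts = toGeminiParts("What is in this screenshot?", [
  { data: "iVBORw0KGgo=", mimeType: "image/png" },
])
console.log(JSON.stringify(parts))
```

Narrowing `mimeType` to a literal union means an unsupported type is rejected at the boundary rather than by the provider.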
- Add visual indicators for models supporting image uploads
- Improve image handling in Google chat client
- Extract Gemini model detection into separate utility
- Update model selection field to show image upload capability
- Replace filesystem URI handling with direct base64 encoding for images
- Enhance image upload UI with preview and removal capabilities
- Update MIME type detection to work with base64 strings
- Simplify image upload protocol between webview and extension
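MIME detection on base64 strings can work by matching the base64 encoding of each format's magic bytes, without decoding the whole image. The function name `detectImageMimeType` and the set of recognized formats are assumptions for illustration; the signatures themselves (PNG, JPEG, GIF, RIFF/WEBP) are the standard file headers.

```typescript
// Base64 prefixes of well-known image magic numbers.
const BASE64_SIGNATURES: Array<[prefix: string, mimeType: string]> = [
  ["iVBORw0KGgo", "image/png"], // 89 50 4E 47 0D 0A 1A 0A
  ["/9j/", "image/jpeg"], // FF D8 FF
  ["R0lGOD", "image/gif"], // "GIF8"
]

// Sniffs the MIME type from the content itself; any type declared in a
// data URL is ignored. Returns undefined for unrecognized data.
function detectImageMimeType(base64OrDataUrl: string): string | undefined {
  // Accept either raw base64 or a data URL such as "data:image/png;base64,...".
  const comma = base64OrDataUrl.indexOf(",")
  const b64 =
    base64OrDataUrl.startsWith("data:") && comma !== -1
      ? base64OrDataUrl.slice(comma + 1)
      : base64OrDataUrl

  for (const [prefix, mimeType] of BASE64_SIGNATURES) {
    if (b64.startsWith(prefix)) {
      return mimeType
    }
  }
  // WebP: "RIFF" + 4-byte size + "WEBP". "RIFF" encodes to "UklGR", and
  // base64 chars 12..15 encode bytes 9..11, i.e. "EBP" -> "RUJQ".
  if (b64.startsWith("UklGR") && b64.slice(12, 16) === "RUJQ") {
    return "image/webp"
  }
  return undefined
}

console.log(detectImageMimeType("data:image/png;base64,iVBORw0KGgoAAAANSUhEUg")) // image/png
console.log(detectImageMimeType("UklGRiQAAABXRUJQ")) // image/webp
```

Checking the WebP marker, not just the `RIFF` prefix, avoids misclassifying other RIFF containers such as WAV files.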
- Add Vision tag for Gemini Flash 2.0 model configuration
- Implement image upload handling in chat editor
- Update model selection UI to display vision capabilities
- Add dedicated Vision model group in model selector
- Refactor image processing logic for better maintainability

Related: Vision AI integration
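A dedicated Vision group in the model selector can be sketched as grouping by tag in a fixed priority order. The tag names, group labels and `groupModelsForSelector` helper are hypothetical; the PR only states that a Vision tag and a Vision group were added.

```typescript
// Hypothetical tag names; only the existence of a Vision tag comes from the PR.
type ModelTagName = "byok" | "vision"

interface SelectableModel {
  id: string
  tags: ModelTagName[]
}

// Groups are checked in this order; a model joins the first group whose tag it carries.
const GROUP_ORDER: Array<[label: string, tag: ModelTagName]> = [
  ["Vision", "vision"],
  ["BYOK", "byok"],
]

function groupModelsForSelector(models: SelectableModel[]): Map<string, SelectableModel[]> {
  const groups = new Map<string, SelectableModel[]>()
  // Pre-seed labels so the selector shows groups in a stable order.
  for (const [label] of GROUP_ORDER) {
    groups.set(label, [])
  }
  groups.set("Other", [])

  for (const model of models) {
    const match = GROUP_ORDER.find(([, tag]) => model.tags.includes(tag))
    const label = match ? match[0] : "Other"
    groups.get(label)?.push(model)
  }

  // Hide groups that ended up empty (deleting during Map.forEach is safe).
  groups.forEach((bucket, label) => {
    if (bucket.length === 0) {
      groups.delete(label)
    }
  })
  return groups
}

const grouped = groupModelsForSelector([
  { id: "gemini-2.0-flash-exp", tags: ["byok", "vision"] },
  { id: "gemini-1.5-pro", tags: ["byok"] },
])
console.log(Array.from(grouped.keys())) // [ 'Vision', 'BYOK' ]
```

A model tagged both `byok` and `vision` appears only under Vision here; listing it in both groups would be an equally valid design choice.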
- Add support for drag and drop image uploads in the human message cell
- Implement handlers for drag enter, drag leave, and drop events
- Update the HumanMessageEditor component to handle the uploaded image file
- Add a new state variable to track the current image file

Related: Vision AI integration
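In a drop handler, the dropped files typically arrive as `event.dataTransfer.files`, a `FileList` that can be treated as `ArrayLike<File>`. A sketch of picking the first image from it, with a hypothetical `firstImageFile` helper and a structural stand-in for `File` so it runs outside the browser:

```typescript
// Structural stand-in for the browser File type: only the fields used here.
interface DroppedFile {
  name: string
  type: string // MIME type reported by the browser, e.g. "image/png"
}

// Returns the first dropped file that the browser reports as an image.
// In a drop handler this would be called with event.dataTransfer.files.
function firstImageFile<T extends DroppedFile>(files: ArrayLike<T>): T | undefined {
  for (let i = 0; i < files.length; i++) {
    if (files[i].type.startsWith("image/")) {
      return files[i]
    }
  }
  return undefined
}

const dropped: DroppedFile[] = [
  { name: "notes.txt", type: "text/plain" },
  { name: "diagram.png", type: "image/png" },
]
console.log(firstImageFile(dropped)?.name) // diagram.png
```

The browser-reported `type` is only a hint derived from the file; content sniffing on the encoded data is a stricter second check.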
```diff
@@ -94,7 +95,8 @@ export class ChatBuilder {
     public readonly sessionID: string = new Date(Date.now()).toUTCString(),
     private messages: ChatMessage[] = [],
-    private customChatTitle?: string
+    private customChatTitle?: string,
+    private images: ImageData[] = []
```
I did it this way to get the prototype demo-ready for my hackathon project, but I don't think this is the best approach (my bad!).
Instead of passing the images to ChatBuilder, could we add a new ContextItem type for media data so that the images are preserved in the chat history?
I have also thought about this and think it's a great idea. Not only would the user get visual feedback, but it would also make room for multiple media blobs in the future.
In its current state, however, this would conflict with the non-vision models, which would mean adding an additional context filter later on (totally feasible).
Additionally, I'm not sure whether a large number of images in the chat history would hurt performance. Depending on the machine's specs, chat history management already slows down noticeably with large histories: https://linear.app/sourcegraph/issue/CODY-4516/vscode-cody-extension-lags-with-large-chat-history-40-items
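The ContextItem idea from this thread, together with the extra filter for non-vision models, could be sketched like this. The trimmed-down `ContextItem` union, its `media` variant and the `contextForModel` helper are hypothetical illustrations, not Cody's actual context types.

```typescript
// Hypothetical, trimmed-down ContextItem union with a new "media" variant.
type ContextItem =
  | { type: "file"; uri: string; content: string }
  | { type: "media"; title: string; mimeType: string; data: string }

// The extra context filter: drop media items when the selected model cannot
// accept images. The input array is not mutated, so the stored chat history
// still keeps its media items.
function contextForModel(items: ContextItem[], modelSupportsVision: boolean): ContextItem[] {
  return modelSupportsVision ? items : items.filter(item => item.type !== "media")
}

const history: ContextItem[] = [
  { type: "file", uri: "file:///src/app.ts", content: "export {}" },
  { type: "media", title: "screenshot.png", mimeType: "image/png", data: "iVBORw0KGgo=" },
]
console.log(contextForModel(history, true).length) // 2
console.log(contextForModel(history, false).length) // 1
```

Filtering at request time keeps images in the history for later vision-capable turns, at the cost of holding base64 data in memory, which is the performance concern raised above.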
- Add drag counter to properly handle nested drag events
- Restructure HumanMessageCell component hierarchy for better state management
- Enhance image upload cleanup on removal
- Fix drag state reset on drag end
- Improve component organization for better maintainability

This change provides a more reliable drag-and-drop experience and prevents UI state inconsistencies when handling image uploads in the chat interface.
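The drag counter technique can be expressed as a small pure reducer, which makes it testable without a DOM. The `DragState` shape and `reduceDrag` name are assumptions; the underlying idea is the common pattern of counting `dragenter`/`dragleave` pairs.

```typescript
interface DragState {
  counter: number // dragenter events not yet matched by a dragleave
  isDragging: boolean
}

type DragEventKind = "dragenter" | "dragleave" | "drop" | "dragend"

// Moving the pointer from a container onto one of its children fires dragenter
// on the child before dragleave on the container. Counting the pairs keeps the
// drop-zone highlight on until the pointer really leaves the container.
function reduceDrag(state: DragState, kind: DragEventKind): DragState {
  if (kind === "dragenter") {
    return { counter: state.counter + 1, isDragging: true }
  }
  if (kind === "dragleave") {
    const counter = Math.max(0, state.counter - 1)
    return { counter, isDragging: counter > 0 }
  }
  // "drop" and "dragend" always reset, even if the counts got out of sync.
  return { counter: 0, isDragging: false }
}

let dragState: DragState = { counter: 0, isDragging: false }
dragState = reduceDrag(dragState, "dragenter") // pointer enters the message cell
dragState = reduceDrag(dragState, "dragenter") // pointer enters a child element
dragState = reduceDrag(dragState, "dragleave") // dragleave fired on the cell itself
console.log(dragState.isDragging) // true
```

Resetting unconditionally on `drop` and `dragend` matches the "fix drag state reset on drag end" item and guards against a missed `dragleave`.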
…ova/cody into PriNova/image_modality_BYOK
Tried it again, and it looks great! Copying and pasting is still not working for me, though.
- Rename model check function for clarity (isGeminiFlash2Model)
- Add smart title formatting for model names
- Standardize model title presentation across components
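"Smart" title formatting for model names might be sketched as splitting the model ID on hyphens, keeping version tokens intact and capitalizing the rest. The rules, the `ACRONYMS` set and `formatModelTitle` are hypothetical; the PR does not show its actual formatting logic.

```typescript
// Hypothetical formatting rules; the PR's actual rules are not shown here.
const ACRONYMS = new Set(["gpt", "llm"])

function formatModelTitle(modelId: string): string {
  return modelId
    .split("-")
    .filter(part => part.length > 0)
    .map(part => {
      if (/^\d/.test(part)) return part // keep versions like "2.0" or "4o" as-is
      if (ACRONYMS.has(part.toLowerCase())) return part.toUpperCase()
      return part.charAt(0).toUpperCase() + part.slice(1)
    })
    .join(" ")
}

console.log(formatModelTitle("gemini-2.0-flash-exp")) // Gemini 2.0 Flash Exp
console.log(formatModelTitle("gpt-4o")) // GPT 4o
```

Centralizing this in one function is one way to get the consistent title presentation across components that the commit describes.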
This PR brings image modality to BYOK users via the Google LLM provider.
The PR is behind the `cody.dev.models` experimental feature flag. You need to configure it in `settings.json` like this:
Image_Modality.mp4
Model Selection overhauled:
Test plan
Build Cody based on this PR
Manual Testing Steps
Model Selection:
Image Upload Flow:
Chat Interaction:
Edge Cases:
Drag 'n' Drop:
Notes
Changelog
Added