Support for new formats of content #21
I guess you could add a lib that extracts the size from the images, like https://www.npmjs.com/package/image-size. I'm not quite sure if you can guess what detail level is chosen when you do not send it. Funnily, there are two descriptions of the costs in the official docs: the one that you posted and this one: https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding

I'm not quite sure why two times 65 should be 129, but hey 🤷♂️😁
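For what it's worth, a minimal sketch of how that lib could be used, assuming the image-size package's synchronous file API (the exact import shape varies between versions, and the file path here is just a placeholder):

import sizeOf from "image-size";

// Read the dimensions from an image on disk; width/height can be
// undefined for formats the library cannot parse, so guard for that.
const dimensions = sizeOf("uploads/example.png");
if (dimensions.width && dimensions.height) {
  console.log(`${dimensions.width} x ${dimensions.height}`);
}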
function calculateHighDetailTokens(width: number, height: number): number {
  // First, check if the image needs to be scaled to fit within the 2048 x 2048 size limit
  if (width > 2048 || height > 2048) {
    const aspectRatio = width / height;
    if (width > height) {
      width = 2048;
      height = Math.round(2048 / aspectRatio);
    } else {
      height = 2048;
      width = Math.round(2048 * aspectRatio);
    }
  }

  // Next, scale the image so that the shortest side is 768px
  const minSideLength = 768;
  const currentMinSide = Math.min(width, height);
  if (currentMinSide > minSideLength) {
    const scaleFactor = minSideLength / currentMinSide;
    width = Math.round(width * scaleFactor);
    height = Math.round(height * scaleFactor);
  }

  // Calculate how many 512px tiles the image is composed of
  const tilesWide = Math.ceil(width / 512);
  const tilesHigh = Math.ceil(height / 512);
  const totalTiles = tilesWide * tilesHigh;

  // The token cost for each tile is 170, with an additional 85 tokens added at the end
  const totalTokens = totalTiles * 170 + 85;
  return totalTokens;
}
// Example usage
console.log(calculateHighDetailTokens(1024, 1024)); // Should output 765
console.log(calculateHighDetailTokens(2048, 4096)); // Should output 1105

In our project we actually only use high mode, and the front end knows the image width and height when uploading images, so I asked GPT to write code that calculates the token count in high mode. This temporarily solves the problem of calculating tokens for image messages. 😂
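As a rough illustration of that flow (a sketch only; createImageBitmap is a standard browser API, but tokensForUpload is just a hypothetical helper name), the front end could read the dimensions straight from the uploaded file and reuse the function above:

// Browser-side sketch: read the dimensions of an uploaded file and
// compute the high-detail token cost before sending the message.
async function tokensForUpload(file: File): Promise<number> {
  const bitmap = await createImageBitmap(file);
  try {
    return calculateHighDetailTokens(bitmap.width, bitmap.height);
  } finally {
    bitmap.close(); // release the decoded image data
  }
}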
OpenAI supports passing images and text at the same time, but the token calculation rules for images depend on the image size and the detail mode. So I think we need to explicitly supply these two parameters (image size and detail mode) in order to calculate the token cost of an image.
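A minimal sketch of what such an interface could look like (the function name calculateImageTokens and the flat 85-token low-detail cost are assumptions based on the docs linked above, not something this library provides):

// Sketch: token cost for an image given its size and detail mode.
// "low" is a flat cost per the linked docs; "high" reuses the
// tile-based calculation from the earlier comment.
type Detail = "low" | "high";

function calculateImageTokens(width: number, height: number, detail: Detail): number {
  if (detail === "low") {
    return 85; // fixed cost regardless of size (assumed from the docs)
  }
  return calculateHighDetailTokens(width, height);
}

console.log(calculateImageTokens(1024, 1024, "low"));  // 85
console.log(calculateImageTokens(1024, 1024, "high")); // 765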