[Frontend] Add segments to OpenAI Requests #11713

ruediste · 2025-01-03T08:25:01Z

I extended the completions prompt format so that the prompt can be a list of text segments, each with its own split_special_tokens flag:

{
  ...
  prompt: [
    { text: "<|fim_prefix|>", split_special_tokens: false },
    { text: prefix, split_special_tokens: true },
    { text: "<|fim_suffix|>", split_special_tokens: false },
    { text: suffix, split_special_tokens: true },
    { text: "<|fim_middle|>", split_special_tokens: false },
  ]
  ...
}

This gives fine-grained control over how the prompt is tokenized, protects against prompt injection through special tokens in user-supplied text, and keeps latency low because all of this is achieved with a single request.

Sample usage with the JavaScript openai client:

const response = await openai.completions.create({
  model: config.model,
  prompt: [
    { text: "<|fim_prefix|>", split_special_tokens: false },
    { text: prefix, split_special_tokens: true },
    { text: "<|fim_suffix|>", split_special_tokens: false },
    { text: suffix, split_special_tokens: true },
    { text: "<|fim_middle|>", split_special_tokens: false },
  ] as {
    text: string;
    split_special_tokens: boolean;
  }[] as any,
  stream: true,
  max_tokens: 25,
  temperature: 0,
});
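
The segment list in the sample could be produced by a small typed helper. The following is only a sketch: the names PromptSegment and buildFimSegments are hypothetical and are not part of vLLM or the openai client, and the FIM token strings are the ones used in the sample above.

```typescript
// Hypothetical type for one prompt segment, mirroring the request format above.
interface PromptSegment {
  text: string;
  // false: special tokens in `text` are recognized as special tokens.
  // true: special-token strings in `text` are treated as ordinary text.
  split_special_tokens: boolean;
}

// Hypothetical helper: wraps untrusted prefix/suffix text between FIM control tokens.
function buildFimSegments(prefix: string, suffix: string): PromptSegment[] {
  const control = (text: string): PromptSegment => ({
    text,
    split_special_tokens: false,
  });
  const userText = (text: string): PromptSegment => ({
    text,
    split_special_tokens: true,
  });
  return [
    control("<|fim_prefix|>"),
    userText(prefix),
    control("<|fim_suffix|>"),
    userText(suffix),
    control("<|fim_middle|>"),
  ];
}

// Example: the segments for a cursor placed right after `console.`
const exampleSegments = buildFimSegments("console.", "\n");
console.log(JSON.stringify(exampleSegments, null, 2));
```

The result can be passed as the prompt value; the cast to any is still needed there because the client's own types do not know this format.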

github-actions · 2025-01-03T08:25:15Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Signed-off-by: Ruedi Steinmann <[email protected]>

mgoin · 2025-01-03T15:23:57Z

Hi @ruediste, this interface is a new idea to me and I don't see it in the OpenAI API. Is there another server or API that implements this? It would be helpful to have a full e2e example of how to use this feature to demonstrate its intended use.

ruediste · 2025-01-04T15:23:55Z

A full e2e test is somewhat difficult, as it would in my case include a VS Code extension. But I'm happy to give an example:

Imagine an inline autocomplete extension. The user is editing some code:

// print `Hello World` to the console
console.<FIM>

You would get a prefix of

// print `Hello World` to the console
console.

and a suffix consisting of a single newline (\n).

The normal completions request:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{ 
    "model": "coder", 
    "prompt": "<|fim_prefix|>// print `Hello World` to the console\nconsole.<|fim_suffix|>\n<|fim_middle|>"
    }'

You get the intended (Qwen 2.5 Coder 0.5B) response: log(\"Hello World!\")

Now imagine the following code:

// print `<|endoftext|>` to the console
console.<FIM>

Request:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{ 
    "model": "coder", 
    "prompt": "<|fim_prefix|>// print `<|endoftext|>` to the console<|fim_suffix|>\n<|fim_middle|>"
    }'

The response is \nvar name = `Akash`\nconsole.log(name);

With the new API:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{ 
    "model": "coder", 
    "prompt": [
      {"text": "<|fim_prefix|>", "split_special_tokens": false},
      {"text": "// print `<|endoftext|>` to the console\nconsole.", "split_special_tokens": true},
      {"text": "<|fim_suffix|>", "split_special_tokens": false},
      {"text": "\n", "split_special_tokens": true},
      {"text": "<|fim_middle|>", "split_special_tokens": false}
    ]}'

The response is as expected: log('<|endoftext|>')
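
To illustrate why the segmented request behaves differently, here is a toy sketch of per-segment tokenization. It is not vLLM's implementation and not a real tokenizer: text runs are kept whole instead of being split into subword tokens, and the list of special tokens and all function names are assumptions made for this example.

```typescript
// Toy model only: special tokens assumed for this illustration.
const SPECIAL_TOKENS = [
  "<|fim_prefix|>",
  "<|fim_suffix|>",
  "<|fim_middle|>",
  "<|endoftext|>",
];

type Token = { kind: "special" | "text"; value: string };

interface Segment {
  text: string;
  split_special_tokens: boolean;
}

// Tokenize one segment. With split_special_tokens = true, special-token
// strings are never recognized and stay part of the ordinary text.
function tokenizeSegment(segment: Segment): Token[] {
  if (segment.split_special_tokens) {
    return segment.text.length > 0 ? [{ kind: "text", value: segment.text }] : [];
  }
  const tokens: Token[] = [];
  let rest = segment.text;
  while (rest.length > 0) {
    // Find the earliest special token in the remaining text.
    let bestIndex = -1;
    let bestToken = "";
    for (const special of SPECIAL_TOKENS) {
      const index = rest.indexOf(special);
      if (index !== -1 && (bestIndex === -1 || index < bestIndex)) {
        bestIndex = index;
        bestToken = special;
      }
    }
    if (bestIndex === -1) {
      tokens.push({ kind: "text", value: rest });
      break;
    }
    if (bestIndex > 0) {
      tokens.push({ kind: "text", value: rest.slice(0, bestIndex) });
    }
    tokens.push({ kind: "special", value: bestToken });
    rest = rest.slice(bestIndex + bestToken.length);
  }
  return tokens;
}

// Tokenize each segment independently and concatenate the results.
function tokenizePrompt(segments: Segment[]): Token[] {
  const tokens: Token[] = [];
  for (const segment of segments) {
    tokens.push(...tokenizeSegment(segment));
  }
  return tokens;
}

function countSpecial(tokens: Token[], value: string): number {
  return tokens.filter((t) => t.kind === "special" && t.value === value).length;
}

const userCode = "// print `<|endoftext|>` to the console\nconsole.";

// Single-string prompt: the user's literal text is parsed as a special token.
const plainTokens = tokenizePrompt([
  {
    text: `<|fim_prefix|>${userCode}<|fim_suffix|>\n<|fim_middle|>`,
    split_special_tokens: false,
  },
]);

// Segmented prompt: only the control segments may produce special tokens.
const segmentedTokens = tokenizePrompt([
  { text: "<|fim_prefix|>", split_special_tokens: false },
  { text: userCode, split_special_tokens: true },
  { text: "<|fim_suffix|>", split_special_tokens: false },
  { text: "\n", split_special_tokens: true },
  { text: "<|fim_middle|>", split_special_tokens: false },
]);

console.log(countSpecial(plainTokens, "<|endoftext|>")); // 1
console.log(countSpecial(segmentedTokens, "<|endoftext|>")); // 0
```

In the toy model, the single-string prompt contains an end-of-text special token in the middle of the prefix, while the segmented prompt keeps that string as ordinary text.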

ruediste · 2025-01-12T17:23:56Z

ping

ruediste · 2025-01-21T07:09:37Z

Any updates on this?

mergify bot added the documentation and frontend labels on Jan 3, 2025

ruediste force-pushed the completions-split-tokens branch 2 times, most recently from e4077c5 to 537d5dd · January 3, 2025 08:34

[Frontend] Add segments to OpenAI Requests

cfd80ca

Signed-off-by: Ruedi Steinmann <[email protected]>

ruediste force-pushed the completions-split-tokens branch from 537d5dd to cfd80ca · January 3, 2025 08:43
