This Azure Function acts as a proxy for Azure OpenAI, handling both Server-Sent Events (SSE) streaming and standard JSON responses. It exists to work around an Azure API Management (APIM) limitation: APIM cannot log SSE streaming responses to Event Hub, so this proxy performs that logging itself.
- Azure subscription
- Azure Function App (Python)
- Azure OpenAI service instance
- Azure Event Hubs instance (for logging)
Copy `local.settings.sample.json` to `local.settings.json` and set the following environment variables, or set them directly in your Azure Function App:
```bash
az functionapp config appsettings set \
  --name fnsse \
  --resource-group fnsse \
  --settings \
  "AZURE_OPENAI_KEY=your_key_here" \
  "AZURE_OPENAI_API_VERSION=2024-08-01-preview" \
  "AZURE_OPENAI_BASE_URL=https://your-instance.openai.azure.com/" \
  "AZURE_EVENTHUB_CONN_STR=your_eventhub_connection_string" \
  "AZURE_EVENTHUB_NAME=openai-logs" \
  "PYTHON_ENABLE_INIT_INDEXING=1"
```
Note: `PYTHON_ENABLE_INIT_INDEXING=1` is required for proper Python module initialization in Azure Functions.
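For reference, a `local.settings.json` matching these variables might look like the following sketch. All values are placeholders; the `AzureWebJobsStorage` and `FUNCTIONS_WORKER_RUNTIME` entries are the usual Functions defaults, assumed here rather than copied from the sample file:

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AZURE_OPENAI_KEY": "your_key_here",
    "AZURE_OPENAI_API_VERSION": "2024-08-01-preview",
    "AZURE_OPENAI_BASE_URL": "https://your-instance.openai.azure.com/",
    "AZURE_EVENTHUB_CONN_STR": "your_eventhub_connection_string",
    "AZURE_EVENTHUB_NAME": "openai-logs",
    "PYTHON_ENABLE_INIT_INDEXING": "1"
  }
}
```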
The endpoint matches the Azure OpenAI API signature:
```
POST /openai/deployments/{deployment_name}/chat/completions?api-version=2024-08-01-preview
```
Non-streaming request body:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about Singapore in 1 sentence"
    }
  ]
}
```
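As a sketch of a client call (the host name, `x-functions-key` header, and deployment name `gpt-4o` below are illustrative assumptions, not part of the function's contract):

```python
import httpx  # already a project dependency

# Assumed values for illustration; substitute your own function URL and key.
URL = (
    "https://fnsse.azurewebsites.net/openai/deployments/gpt-4o"
    "/chat/completions?api-version=2024-08-01-preview"
)
HEADERS = {"x-functions-key": "your_function_key"}

resp = httpx.post(
    URL,
    headers=HEADERS,
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me about Singapore in 1 sentence"},
        ]
    },
    timeout=60.0,
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["usage"])
```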
Streaming request body (adds `stream` and `stream_options`):

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about Singapore in 1 sentence"
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```
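The streaming variant is the same call with `stream` enabled; a minimal sketch (reusing the assumed `URL` and `HEADERS` from the previous example) that prints the raw SSE lines:

```python
import httpx

with httpx.stream(
    "POST",
    URL,  # assumed proxy URL from the non-streaming example
    headers=HEADERS,
    json={
        "messages": [
            {"role": "user", "content": "Tell me about Singapore in 1 sentence"}
        ],
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    timeout=60.0,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line)  # "data: {...}" chunks, then "data: [DONE]"
```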
Standard Azure OpenAI response format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Singapore is a highly developed city-state..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 31,
    "total_tokens": 56
  }
}
```
Server-Sent Events (SSE) format with chunks matching Azure OpenAI's format:
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"choices":[{"index":0,"delta":{"content":"Singapore"}}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"choices":[{"index":0,"delta":{"content":" is"}}]}
... more chunks ...
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"usage":{"completion_tokens":31,"prompt_tokens":25,"total_tokens":56}}
data: [DONE]
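A small helper along these lines could reassemble the chunks into the full message and pick up `usage` from the final chunk (a sketch for consumers of the proxy, not code from the function itself):

```python
import json

def collect_stream(sse_lines):
    """Accumulate delta content and the final usage block from 'data:' lines."""
    parts, usage = [], None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):  # only present on the final chunk
            usage = chunk["usage"]
    return "".join(parts), usage
```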
The function automatically logs completion details to Azure Event Hub for both streaming and non-streaming responses:
Non-streaming (`completion`) event:

```json
{
  "type": "completion",
  "content": "Singapore is a vibrant city-state...",
  "usage": {
    "completion_tokens": 137,
    "prompt_tokens": 25,
    "total_tokens": 162
  },
  "model": "gpt-4o",
  "prompt": [...],
  "region": "Australia East",
  "latency_ms": 1306,
  "timestamp": "2024-11-13T06:59:30.584946"
}
```
Streaming (`stream_completion`) event:

```json
{
  "type": "stream_completion",
  "content": "Singapore is a vibrant city-state...",
  "model": "gpt-4o",
  "usage": {
    "completion_tokens": 137,
    "prompt_tokens": 25,
    "total_tokens": 162
  },
  "prompt": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about Singapore in 1 sentence"
    }
  ],
  "region": "Australia East",
  "latency_ms": 2650,
  "time_to_first_chunk_ms": 150,
  "streaming_duration_ms": 2500,
  "timestamp": "2024-11-13T06:59:30.584946"
}
```
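To check these events end to end, a consumer sketch using the `azure-eventhub` package (an extra dependency, not in the requirements below) could read from the hub; the `$Default` consumer group is an assumption:

```python
import json
import os

from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    os.environ["AZURE_EVENTHUB_CONN_STR"],
    consumer_group="$Default",  # assumed; use your own consumer group
    eventhub_name=os.environ.get("AZURE_EVENTHUB_NAME", "openai-logs"),
)

def on_event(partition_context, event):
    record = json.loads(event.body_as_str())
    print(record["type"], record["model"], record["usage"])

with client:
    # Blocks and reads from the start of each partition until interrupted.
    client.receive(on_event=on_event, starting_position="-1")
```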
For streaming responses, three timing metrics are captured:

- `time_to_first_chunk_ms`: time from request start until the first token (includes queue time and model startup)
- `streaming_duration_ms`: duration of token generation (actual model inference time)
- `latency_ms`: total request duration (`time_to_first_chunk_ms` + `streaming_duration_ms`)
These metrics help identify:
- Queue waiting time in different regions
- Model warm-up and startup time
- Token generation speed
- Overall request latency
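Note that `latency_ms = time_to_first_chunk_ms + streaming_duration_ms` (150 + 2500 = 2650 in the example event above). A sketch of how such timings can be measured around a streamed call; `iter_chunks` is a hypothetical generator standing in for the model's token stream:

```python
import time

start = time.perf_counter()
first_chunk_at = None

for chunk in iter_chunks():  # hypothetical: yields streamed chunks
    if first_chunk_at is None:
        first_chunk_at = time.perf_counter()  # first token observed

end = time.perf_counter()
time_to_first_chunk_ms = (first_chunk_at - start) * 1000
streaming_duration_ms = (end - first_chunk_at) * 1000
latency_ms = (end - start) * 1000  # equals the sum of the two above
```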
- Python 3.9+
- `openai>=1.0.0`
- `azure-functions`
- `azurefunctions-extensions-http-fastapi`
- `httpx`
Deploy with Azure Functions Core Tools:

```bash
func azure functionapp publish fnsse
```
View logs using the Azure CLI:

```bash
az webapp log tail --name fnsse --resource-group fnsse
```