llm-gateway/packages/chatgpt-api-adapter
Rene Fichtmueller 1d327720d5 feat: Implement Phase 2G.3 — ChatGPT/OpenAI API compatibility adapter
HTTP server providing OpenAI API compatibility for LLM Gateway.

- OpenAI client SDK drop-in replacement (baseURL only change)
- POST /v1/chat/completions endpoint with streaming support
- GET /v1/models for client library discovery
- Automatic model mapping: gpt-4 → qwen2.5:32b, etc.
- Server-Sent Events (SSE) streaming implementation
- Full TypeScript types and comprehensive test suite
- Graceful shutdown handling (SIGTERM/SIGINT)
- Health check endpoint with gateway status
- Performance: Same as gateway (100-500ms with fallback to Ollama)
2026-04-19 22:05:20 +02:00

ChatGPT API Adapter

OpenAI API compatibility adapter for LLM Gateway. Allows OpenAI client SDKs and curl requests to transparently use LLM Gateway.

Overview

The adapter runs an HTTP server that implements the OpenAI Chat Completions API and forwards each request to the LLM Gateway. Existing OpenAI client code needs only a baseURL change.

Installation

npm install @llm-gateway/chatgpt-api-adapter

Usage

As a Standalone Server

# Start the adapter (listens on port 3111)
npx chatgpt-api

# Or with custom port
CHATGPT_API_PORT=8080 npx chatgpt-api

Or start it from Node.js (ES module):

import ChatGPTAPIAdapter from '@llm-gateway/chatgpt-api-adapter'

const adapter = new ChatGPTAPIAdapter(3111)
await adapter.start()

With OpenAI Client SDK

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'not-needed',
  baseURL: 'http://localhost:3111/v1'
})

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'Hello, world!' }
  ]
})

With curl

curl http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Explain TypeScript"}
    ],
    "max_tokens": 500
  }'

Streaming

curl http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "List 5 ideas"}
    ],
    "stream": true
  }'

Features

Implemented

  • Chat Completions (POST /v1/chat/completions): Accepts OpenAI-format requests and returns OpenAI-format responses
  • Streaming (stream: true): Server-Sent Events (SSE) with chunked responses
  • Models (GET /v1/models): Lists the OpenAI model names the adapter accepts
  • Health (GET /health): Gateway health status
  • Model Mapping: Automatic mapping from OpenAI to gateway model names

Model Mapping

OpenAI Model     Gateway Model
gpt-4            qwen2.5:32b
gpt-4-turbo      qwen2.5:32b
gpt-3.5-turbo    qwen2.5:14b
gpt-4-mini       qwen2.5:3b
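
The mapping amounts to a lookup table. The sketch below is illustrative rather than the adapter's actual code: the name mapModel is hypothetical, and forwarding unknown model names unchanged is an assumption (the real adapter may reject them or substitute a default).

```typescript
// Mapping copied from the Model Mapping table in this README.
const MODEL_MAP = new Map<string, string>([
  ['gpt-4', 'qwen2.5:32b'],
  ['gpt-4-turbo', 'qwen2.5:32b'],
  ['gpt-3.5-turbo', 'qwen2.5:14b'],
  ['gpt-4-mini', 'qwen2.5:3b'],
])

// Hypothetical helper: translate an OpenAI model name to a gateway model name.
// Assumption: names that are not in the table are passed through unchanged.
function mapModel(openaiModel: string): string {
  const mapped = MODEL_MAP.get(openaiModel)
  return mapped === undefined ? openaiModel : mapped
}
```

With this table, mapModel('gpt-4') and mapModel('gpt-4-turbo') both resolve to qwen2.5:32b.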

Architecture

OpenAI Client
    ↓
ChatGPT API Adapter (HTTP server)
    ↓
LLM Gateway API
    ↓
Model Selection (Claude, Ollama, external)

Environment Variables

CHATGPT_API_PORT=3111                          # Listen port
GATEWAY_URL=https://llm-gateway.context-x.org  # LLM Gateway endpoint
OLLAMA_URL=192.168.178.213:11434              # Local Ollama fallback
AGENT_ID=chatgpt-api-adapter                   # Agent identifier
LOG_LEVEL=debug                                # Logging level
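
A minimal sketch of how these variables could be read into a typed configuration object. The names AdapterConfig and loadConfig are hypothetical, and treating the values listed above as defaults is an assumption; only the default port (3111) is stated elsewhere in this README.

```typescript
interface AdapterConfig {
  port: number
  gatewayUrl: string
  ollamaUrl: string
  agentId: string
  logLevel: string
}

type Env = Record<string, string | undefined>

// Hypothetical loader. Assumption: the sample values in this README double as defaults.
function loadConfig(env: Env): AdapterConfig {
  const rawPort = env['CHATGPT_API_PORT']
  const port = rawPort === undefined ? 3111 : Number(rawPort)
  // Reject values that are not usable TCP port numbers (including '' and 'abc').
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid CHATGPT_API_PORT: ${rawPort}`)
  }
  return {
    port,
    gatewayUrl: env['GATEWAY_URL'] ?? 'https://llm-gateway.context-x.org',
    ollamaUrl: env['OLLAMA_URL'] ?? '192.168.178.213:11434',
    agentId: env['AGENT_ID'] ?? 'chatgpt-api-adapter',
    logLevel: env['LOG_LEVEL'] ?? 'debug',
  }
}
```

In a Node.js process this would be called as loadConfig(process.env). Validating the port up front surfaces a bad value at startup instead of at bind time.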

API Endpoints

POST /v1/chat/completions

Chat completion request using OpenAI format.

Request:

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are helpful..."},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 2000,
  "top_p": 1,
  "stream": false
}
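
For illustration, the request shape above can be checked before it is sent. This is a sketch, not the adapter's own validation: validateChatRequest is a hypothetical name, and the rules are inferred only from the fields shown in the example request.

```typescript
// Hypothetical validator: returns a list of problems, empty when the body looks valid.
function validateChatRequest(body: unknown): string[] {
  if (typeof body !== 'object' || body === null) {
    return ['body must be a JSON object']
  }
  const req = body as Record<string, unknown>
  const errors: string[] = []

  const model = req['model']
  if (typeof model !== 'string' || model === '') {
    errors.push('model must be a non-empty string')
  }

  const messages = req['messages']
  if (!Array.isArray(messages) || messages.length === 0) {
    errors.push('messages must be a non-empty array')
  } else {
    messages.forEach((m: unknown, i: number) => {
      const msg = (typeof m === 'object' && m !== null ? m : {}) as Record<string, unknown>
      if (typeof msg['role'] !== 'string') errors.push(`messages[${i}].role must be a string`)
      if (typeof msg['content'] !== 'string') errors.push(`messages[${i}].content must be a string`)
    })
  }

  // Optional numeric sampling parameters shown in the example request.
  for (const key of ['temperature', 'max_tokens', 'top_p']) {
    const value = req[key]
    if (value !== undefined && typeof value !== 'number') errors.push(`${key} must be a number`)
  }

  const stream = req['stream']
  if (stream !== undefined && typeof stream !== 'boolean') errors.push('stream must be a boolean')

  return errors
}
```

Returning every problem at once, instead of throwing on the first, lets a caller report all issues in a single error response.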

Response (non-streaming):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}

Response (streaming):

data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"H"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"ello"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
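
A client can reassemble the streamed text by concatenating each chunk's delta.content until the [DONE] sentinel arrives. The sketch below assumes the whole stream is already in one string; collectStreamText is a hypothetical helper, and a real client would read the response body incrementally instead.

```typescript
// Hypothetical helper: join the delta.content pieces of a buffered SSE stream.
function collectStreamText(sse: string): string {
  let text = ''
  for (const line of sse.split(/\r?\n/)) {
    // Only "data:" lines carry payloads; blank separators and other lines are skipped.
    if (line.indexOf('data:') !== 0) continue
    const payload = line.slice('data:'.length).trim()
    if (payload === '') continue
    if (payload === '[DONE]') break // end-of-stream sentinel
    const chunk = JSON.parse(payload)
    const piece = chunk?.choices?.[0]?.delta?.content
    if (typeof piece === 'string') text += piece
  }
  return text
}
```

The helper does not inspect the object field, so it works regardless of how chunks are labelled. Applied to the first two chunks and the final chunk shown above, it returns "Hello".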

GET /v1/models

List available models.

Response:

{
  "object": "list",
  "data": [
    {"id": "gpt-4", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-mini", "object": "model", "owned_by": "openai"}
  ]
}

GET /health

Gateway health status.

Response:

{
  "status": "ok",
  "gateway": {
    "uptime": 123456,
    "models": ["qwen2.5:3b", "qwen2.5:14b"],
    "latency_ms": 250
  }
}
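
A monitoring script could turn this response into a pass/fail signal. The following is a sketch under assumptions: the types mirror only the fields in the example above, assessHealth is a hypothetical name, and the 1000 ms latency threshold is an arbitrary illustrative default.

```typescript
interface GatewayStatus {
  uptime: number
  models: string[]
  latency_ms: number
}

interface HealthResponse {
  status: string
  gateway?: GatewayStatus
}

// Hypothetical check: returns a list of problems, empty when everything looks fine.
// Assumption: maxLatencyMs = 1000 is an illustrative threshold, not an adapter setting.
function assessHealth(health: HealthResponse, maxLatencyMs: number = 1000): string[] {
  const problems: string[] = []
  if (health.status !== 'ok') problems.push(`status is "${health.status}"`)
  const gateway = health.gateway
  if (gateway === undefined) {
    problems.push('response has no gateway section')
    return problems
  }
  if (gateway.models.length === 0) problems.push('gateway reports no models')
  if (gateway.latency_ms > maxLatencyMs) {
    problems.push(`gateway latency ${gateway.latency_ms}ms exceeds ${maxLatencyMs}ms`)
  }
  return problems
}
```

For the example response above (status "ok", two models, 250 ms latency) the returned list is empty.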

Performance

Typical latencies:

  • Gateway mode: 100-500ms (depends on model)
  • Ollama fallback: 200-2000ms (depends on hardware)
  • Streaming chunk: 10-50ms per chunk
  • Timeout: 30s (configurable via gateway)

Testing

npm test

Tests cover:

  • Chat completions (streaming and buffered)
  • Model listing
  • Error handling and fallback behavior
  • Token counting accuracy
  • Message formatting
  • Health checks

Security

  • No API key validation (assumes network-isolated deployment)
  • CORS enabled for all origins (configure as needed)
  • Messages logged at DEBUG level only
  • Automatic cleanup on shutdown (SIGTERM, SIGINT)
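
Where open CORS is too permissive, an origin allow-list is one way to tighten it. This README does not say whether the adapter exposes a setting for this, so the sketch below only shows the logic that a reverse proxy or middleware in front of the adapter might apply; corsHeaders and the header choices are assumptions.

```typescript
// Hypothetical helper: compute CORS response headers for an allow-listed origin.
// Returns no CORS headers when the request origin is missing or not allowed.
function corsHeaders(
  origin: string | undefined,
  allowedOrigins: readonly string[],
): Record<string, string> {
  if (origin === undefined || allowedOrigins.indexOf(origin) === -1) {
    return {}
  }
  return {
    'Access-Control-Allow-Origin': origin,
    // Tell caches that the response varies with the request's Origin header.
    'Vary': 'Origin',
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  }
}
```

Echoing the matched origin, rather than sending a wildcard, keeps browsers on other sites from reading responses.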

Troubleshooting

OpenAI client not connecting

  1. Verify adapter is running: curl http://localhost:3111/health
  2. Check baseURL in client: should be http://localhost:3111/v1 (the /v1 suffix is required)
  3. Ensure gateway is accessible: curl $GATEWAY_URL/health
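
A misconfigured baseURL is easy to catch before the first request. The helper below is a hypothetical sketch (checkBaseURL is not part of the adapter) that enforces the form used in the SDK example earlier in this README: an http or https URL ending in /v1.

```typescript
// Hypothetical helper: returns a description of the problem, or null if the URL looks right.
function checkBaseURL(baseURL: string): string | null {
  // Ignore trailing slashes so that ".../v1/" is treated like ".../v1".
  const trimmed = baseURL.replace(/\/+$/, '')
  if (!/^https?:\/\//.test(trimmed)) {
    return 'baseURL must start with http:// or https://'
  }
  if (!/\/v1$/.test(trimmed)) {
    return 'baseURL must end with /v1'
  }
  return null
}
```

For example, checkBaseURL('http://localhost:3111') reports the missing /v1 suffix, while checkBaseURL('http://localhost:3111/v1') returns null.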

Streaming not working

  1. Verify stream: true in request body
  2. Check for SSE support in client library
  3. Ensure no intermediate proxies are buffering responses

Slow responses

  1. Check gateway latency: curl -w "%{time_total}\n" $GATEWAY_URL/health
  2. Verify model availability: curl http://localhost:3111/v1/models
  3. Check system resources on gateway (CPU, memory, disk)

Compatibility

  • OpenAI Client SDK (Python, Node.js, Go, etc.)
  • LiteLLM
  • Amazon Bedrock (proxy mode)
  • Any HTTP client using OpenAI API format