Rene Fichtmueller 1d327720d5 feat: Implement Phase 2G.3 — ChatGPT/OpenAI API compatibility adapter
HTTP server providing OpenAI API compatibility for LLM Gateway.

- OpenAI client SDK drop-in replacement (baseURL only change)
- POST /v1/chat/completions endpoint with streaming support
- GET /v1/models for client library discovery
- Automatic model mapping: gpt-4 → qwen2.5:32b, etc.
- Server-Sent Events (SSE) streaming implementation
- Full TypeScript types and comprehensive test suite
- Graceful shutdown handling (SIGTERM/SIGINT)
- Health check endpoint with gateway status
- Performance: Same as gateway (100-500ms with fallback to Ollama)
2026-04-19 22:05:20 +02:00


# ChatGPT API Adapter
OpenAI API compatibility adapter for the LLM Gateway. Lets OpenAI client SDKs and plain HTTP clients such as curl use the LLM Gateway transparently.
## Overview
Provides an HTTP server that implements the OpenAI Chat Completions API specification, transparently routing requests to the LLM Gateway. Existing OpenAI client code requires only a baseURL configuration change.
## Installation
```bash
npm install @llm-gateway/chatgpt-api-adapter
```
## Usage
### As a Standalone Server
```bash
# Start the adapter (listens on port 3111 by default)
npx chatgpt-api
# Or with a custom port
CHATGPT_API_PORT=8080 npx chatgpt-api
```
To embed it in a Node.js ES module instead (top-level `await`):
```typescript
import ChatGPTAPIAdapter from '@llm-gateway/chatgpt-api-adapter'
const adapter = new ChatGPTAPIAdapter(3111) // listen port
await adapter.start()
```
### With OpenAI Client SDK
```typescript
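import OpenAI from 'openai'

// Any placeholder key works: the adapter does not validate keys,
// but the SDK throws at construction if no key is set at all.
const client = new OpenAI({
  apiKey: 'not-needed',
  baseURL: 'http://localhost:3111/v1',
})

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello, world!' }],
})
console.log(response.choices[0].message.content)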
```
### With curl
```bash
curl http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Explain TypeScript"}
    ],
    "max_tokens": 500
  }'
```
### Streaming
```bash
# -N disables curl's output buffering so chunks are printed as they arrive
curl -N http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "List 5 ideas"}
    ],
    "stream": true
  }'
```
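Clients that do not use the OpenAI SDK must split the SSE body themselves. A minimal sketch, assuming the chunk format documented under `POST /v1/chat/completions` (one `data:` line per event, ending with `data: [DONE]`); `collectStreamText` is a hypothetical helper, not part of the adapter package:

```typescript
// Minimal shape of a streamed chunk: only the fields this helper reads.
interface StreamChunk {
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[]
}

// Hypothetical helper: reassembles the assistant text from a raw SSE body.
function collectStreamText(sseBody: string): string {
  let text = ''
  for (const rawLine of sseBody.split('\n')) {
    const line = rawLine.trim() // also strips a trailing \r from CRLF bodies
    // Skip blank event separators and any non-data SSE fields.
    if (!line.startsWith('data:')) continue
    const payload = line.slice('data:'.length).trim()
    if (payload === '[DONE]') break // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk
    for (const choice of chunk.choices) {
      // The final chunk has an empty delta, which contributes nothing.
      text += choice.delta.content ?? ''
    }
  }
  return text
}
```

It takes the whole body at once for clarity; to display tokens as they arrive, apply the same per-line logic to each line read from the response stream.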
## Features
### Implemented
- **Chat Completions** (`POST /v1/chat/completions`): OpenAI-compatible request and response format
- **Streaming** (`stream: true`): Server-Sent Events (SSE) with chunked responses
- **Models** (`GET /v1/models`): Lists the OpenAI model names the adapter accepts
- **Health** (`GET /health`): Gateway health status
- **Model Mapping**: Automatic mapping from OpenAI to gateway model names
### Model Mapping
| OpenAI Model | Gateway Model |
|--------------|---------------|
| gpt-4 | qwen2.5:32b |
| gpt-4-turbo | qwen2.5:32b |
| gpt-3.5-turbo | qwen2.5:14b |
| gpt-4-mini | qwen2.5:3b |
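The mapping amounts to a lookup table. The sketch below reproduces the table; `toGatewayModel` is a hypothetical name, and passing unknown names through unchanged is an assumption (the adapter may reject or remap them instead):

```typescript
// OpenAI model name -> gateway model, copied from the table above.
const MODEL_MAP: Readonly<Record<string, string>> = {
  'gpt-4': 'qwen2.5:32b',
  'gpt-4-turbo': 'qwen2.5:32b',
  'gpt-3.5-turbo': 'qwen2.5:14b',
  'gpt-4-mini': 'qwen2.5:3b',
}

// Assumption: unknown names pass through so a client can address a gateway
// model directly. hasOwnProperty avoids matching inherited keys like "constructor".
function toGatewayModel(openaiModel: string): string {
  return Object.prototype.hasOwnProperty.call(MODEL_MAP, openaiModel)
    ? MODEL_MAP[openaiModel]
    : openaiModel
}
```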
## Architecture
```
OpenAI Client
      ↓
ChatGPT API Adapter (HTTP server)
      ↓
LLM Gateway API
      ↓
Model Selection (Claude, Ollama, external)
```
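The diagram leaves out how the Ollama fallback (see `OLLAMA_URL` and the Performance section) is chosen. A minimal sketch of one plausible policy, assuming both backends are exposed as async functions; `Backend` and `completeWithFallback` are hypothetical names, and falling back on any gateway error is an assumption:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }
// A backend takes a gateway model name plus messages and resolves to the reply text.
type Backend = (model: string, messages: ChatMessage[]) => Promise<string>

// Hypothetical routing helper: try the LLM Gateway first and use the local
// Ollama backend only if the gateway call rejects.
async function completeWithFallback(
  gateway: Backend,
  ollama: Backend,
  model: string,
  messages: ChatMessage[],
): Promise<{ content: string; source: 'gateway' | 'ollama' }> {
  try {
    return { content: await gateway(model, messages), source: 'gateway' }
  } catch {
    // Assumption: every gateway failure triggers the fallback; a real adapter
    // might fall back only on network errors or timeouts, not on 4xx replies.
    return { content: await ollama(model, messages), source: 'ollama' }
  }
}
```

Injecting the backends as functions keeps the routing logic testable without a running gateway or Ollama instance.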
## Environment Variables
```bash
CHATGPT_API_PORT=3111                          # Listen port (default: 3111)
GATEWAY_URL=https://llm-gateway.context-x.org  # LLM Gateway endpoint
OLLAMA_URL=http://192.168.178.213:11434        # Local Ollama fallback
AGENT_ID=chatgpt-api-adapter                   # Agent identifier
LOG_LEVEL=debug                                # Logging level
```
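A sketch of reading these variables into a typed config object. `loadConfig` and `AdapterConfig` are hypothetical; the defaults copy the example values above (with an `http://` scheme on the Ollama address), except `LOG_LEVEL`, whose `info` default is an assumption. Taking `env` as a parameter rather than reading `process.env` directly keeps it easy to test:

```typescript
interface AdapterConfig {
  port: number
  gatewayUrl: string
  ollamaUrl: string
  agentId: string
  logLevel: string
}

// Hypothetical loader; defaults mirror the README examples (LOG_LEVEL's is assumed).
function loadConfig(env: Record<string, string | undefined>): AdapterConfig {
  const port = Number(env.CHATGPT_API_PORT ?? '3111')
  // Reject non-numeric or out-of-range ports early instead of failing at listen().
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid CHATGPT_API_PORT: ${env.CHATGPT_API_PORT}`)
  }
  return {
    port,
    gatewayUrl: env.GATEWAY_URL ?? 'https://llm-gateway.context-x.org',
    ollamaUrl: env.OLLAMA_URL ?? 'http://192.168.178.213:11434',
    agentId: env.AGENT_ID ?? 'chatgpt-api-adapter',
    logLevel: env.LOG_LEVEL ?? 'info', // assumption: not documented
  }
}
```

In the server entry point this would be called as `loadConfig(process.env)`.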
## API Endpoints
### POST /v1/chat/completions
Chat completion request using OpenAI format.
**Request:**
```json
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are helpful..."},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 2000,
  "top_p": 1,
  "stream": false
}
```
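Before forwarding a request, an adapter typically checks the body shape. A hedged sketch of such a check for the fields shown above; `validateChatRequest` is hypothetical, the role list is limited to the roles used in this README, and the numeric ranges follow OpenAI's published limits rather than anything confirmed for this adapter:

```typescript
// Request shape from the example above (optional fields may be omitted).
interface ChatCompletionRequest {
  model: string
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  temperature?: number
  max_tokens?: number
  top_p?: number
  stream?: boolean
}

const ROLES = new Set(['system', 'user', 'assistant'])

// True when v is absent, or a number within [lo, hi].
function optionalInRange(v: unknown, lo: number, hi: number): boolean {
  return v === undefined || (typeof v === 'number' && v >= lo && v <= hi)
}

// Hypothetical validator: returns an error message, or null when the body
// looks like a valid ChatCompletionRequest.
function validateChatRequest(body: unknown): string | null {
  if (typeof body !== 'object' || body === null) return 'body must be a JSON object'
  const req = body as Record<string, unknown>
  if (typeof req.model !== 'string' || req.model === '') return 'model is required'
  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    return 'messages must be a non-empty array'
  }
  for (const m of req.messages as unknown[]) {
    if (typeof m !== 'object' || m === null) return 'each message must be an object'
    const { role, content } = m as { role?: unknown; content?: unknown }
    if (typeof role !== 'string' || !ROLES.has(role) || typeof content !== 'string') {
      return 'each message needs a system/user/assistant role and string content'
    }
  }
  if (!optionalInRange(req.temperature, 0, 2)) return 'temperature must be between 0 and 2'
  if (!optionalInRange(req.top_p, 0, 1)) return 'top_p must be between 0 and 1'
  const maxTokens = req.max_tokens
  if (
    maxTokens !== undefined &&
    (typeof maxTokens !== 'number' || !Number.isInteger(maxTokens) || maxTokens <= 0)
  ) {
    return 'max_tokens must be a positive integer'
  }
  if (req.stream !== undefined && typeof req.stream !== 'boolean') return 'stream must be a boolean'
  return null
}
```

A failing check would map naturally to an HTTP 400 response carrying the message in OpenAI's `error.message` field.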
**Response (non-streaming):**
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
```
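The non-streaming response can be assembled from the gateway's text output and token counts. A sketch, assuming the caller already has those counts; `toChatCompletion` is hypothetical, and always reporting `finish_reason: "stop"` is a simplification (a reply cut off by `max_tokens` would report `length`):

```typescript
interface ChatCompletion {
  id: string
  object: 'chat.completion'
  created: number
  model: string
  choices: {
    index: number
    message: { role: 'assistant'; content: string }
    finish_reason: 'stop' | 'length'
  }[]
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number }
}

// Hypothetical helper wrapping gateway output in the response shape above.
function toChatCompletion(
  model: string,
  content: string,
  promptTokens: number,
  completionTokens: number,
  nowMs: number = Date.now(),
): ChatCompletion {
  return {
    // Illustrative id only: Math.random() is not collision-proof.
    id: `chatcmpl-${Math.random().toString(36).slice(2, 12)}`,
    object: 'chat.completion',
    created: Math.floor(nowMs / 1000), // Unix time in seconds
    model, // echo the OpenAI name the client sent, as in the example
    choices: [{ index: 0, message: { role: 'assistant', content }, finish_reason: 'stop' }],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  }
}
```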
**Response (streaming):**
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"H"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"ello"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
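On the server side, each piece of generated text becomes one SSE event. A minimal sketch that produces the event sequence shown above, using the `chat.completion.chunk` object type from the OpenAI specification; `encodeChunkStream` is a hypothetical helper:

```typescript
// Hypothetical encoder: one SSE event per content piece, then a final chunk
// carrying finish_reason "stop", then the [DONE] sentinel.
function encodeChunkStream(id: string, model: string, created: number, pieces: string[]): string {
  const event = (delta: { content?: string }, finishReason: 'stop' | null): string => {
    const chunk = {
      id,
      object: 'chat.completion.chunk',
      created,
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    }
    // Each SSE event is a "data:" line terminated by a blank line.
    return `data: ${JSON.stringify(chunk)}\n\n`
  }
  return (
    pieces.map((p) => event({ content: p }, null)).join('') +
    event({}, 'stop') +
    'data: [DONE]\n\n'
  )
}
```

A real handler would set `Content-Type: text/event-stream` and `res.write()` each event as the gateway produces it, rather than building one string.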
### GET /v1/models
List available models.
**Response:**
```json
{
  "object": "list",
  "data": [
    {"id": "gpt-4", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-mini", "object": "model", "owned_by": "openai"}
  ]
}
```
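Deriving the model list from the same map used for routing keeps `GET /v1/models` and the mapping table from drifting apart. A sketch, assuming the mapping is held in a single object (`MODEL_MAP` and `listModels` are hypothetical names):

```typescript
// Same table as under "Model Mapping": the single source of truth.
const MODEL_MAP: Readonly<Record<string, string>> = {
  'gpt-4': 'qwen2.5:32b',
  'gpt-4-turbo': 'qwen2.5:32b',
  'gpt-3.5-turbo': 'qwen2.5:14b',
  'gpt-4-mini': 'qwen2.5:3b',
}

interface ModelObject {
  id: string
  object: 'model'
  owned_by: string
}

function listModels(): { object: 'list'; data: ModelObject[] } {
  return {
    object: 'list',
    // Advertise only the OpenAI-side names; gateway names stay internal.
    data: Object.keys(MODEL_MAP).map((id) => ({ id, object: 'model' as const, owned_by: 'openai' })),
  }
}
```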
### GET /health
Gateway health status.
**Response:**
```json
{
  "status": "ok",
  "gateway": {
    "uptime": 123456,
    "models": ["qwen2.5:3b", "qwen2.5:14b"],
    "latency_ms": 250
  }
}
```
## Performance
Typical latencies:
- **Gateway mode**: 100-500ms (depends on model)
- **Ollama fallback**: 200-2000ms (depends on hardware)
- **Streaming chunk**: 10-50ms per chunk
- **Timeout**: 30s (configurable via gateway)
## Testing
```bash
npm test
```
Tests cover:
- Chat completions (streaming and buffered)
- Model listing
- Error handling and fallback behavior
- Token counting accuracy
- Message formatting
- Health checks
## Security
- No API key validation (assumes network-isolated deployment)
- CORS enabled for all origins (configure as needed)
- Message contents are logged only at DEBUG level
- Automatic cleanup on shutdown (SIGTERM, SIGINT)
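"Configure as needed" could mean replacing the wildcard with an origin allowlist. A sketch of that idea; `corsHeaders` and its allowlist parameter are hypothetical, since the adapter's actual CORS configuration mechanism is not documented here. `Authorization` is allowed because the OpenAI SDK sends the API key in that header:

```typescript
// Hypothetical helper returning CORS response headers for one request.
// With '*' it reproduces the default allow-all behaviour; with a list,
// only matching origins are echoed back.
function corsHeaders(origin: string | undefined, allowed: '*' | readonly string[]): Record<string, string> {
  const base = {
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  }
  if (allowed === '*') return { ...base, 'Access-Control-Allow-Origin': '*' }
  if (origin !== undefined && allowed.includes(origin)) {
    // Vary tells caches the response depends on the request's Origin.
    return { ...base, 'Access-Control-Allow-Origin': origin, Vary: 'Origin' }
  }
  // No Allow-Origin header: browsers will block the response.
  return { Vary: 'Origin' }
}
```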
## Troubleshooting
### OpenAI client not connecting
1. Verify adapter is running: `curl http://localhost:3111/health`
2. Check the client's baseURL: it should be `http://localhost:3111/v1` (include `/v1`; the SDK appends `/chat/completions` itself)
3. Ensure gateway is accessible: `curl $GATEWAY_URL/health`
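Because the SDK appends `/chat/completions` to `baseURL` itself, a wrong base URL typically shows up as 404 errors. A small sketch that normalizes common variants to the `http://host:port/v1` form; `normalizeBaseURL` is a hypothetical helper and does not handle query strings:

```typescript
// Hypothetical helper: reduce common baseURL mistakes to "<server root>/v1".
function normalizeBaseURL(input: string): string {
  let url = input.trim().replace(/\/+$/, '') // drop trailing slashes
  url = url.replace(/\/chat\/completions$/, '') // the SDK appends this itself
  url = url.replace(/\/+$/, '')
  return url.endsWith('/v1') ? url : `${url}/v1`
}
```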
### Streaming not working
1. Verify `stream: true` in request body
2. Check for SSE support in client library
3. Ensure no intermediate proxies are buffering responses
### Slow responses
1. Check gateway latency: `curl -o /dev/null -s -w "%{time_total}\n" $GATEWAY_URL/health`
2. Verify model availability: `curl http://localhost:3111/v1/models`
3. Check system resources on gateway (CPU, memory, disk)
## Compatibility
- OpenAI client SDKs (Python, Node.js, Go, etc.)
- LiteLLM
- Anthropic Bedrock (proxy mode)
- Any HTTP client using OpenAI API format