
# ChatGPT API Adapter

OpenAI API compatibility adapter for LLM Gateway. Allows OpenAI client SDKs and curl requests to transparently use LLM Gateway.

## Overview

Provides an HTTP server that implements the OpenAI Chat Completions API specification, transparently routing requests to the LLM Gateway. Existing OpenAI client code requires only a baseURL configuration change.

## Installation

```bash
npm install @llm-gateway/chatgpt-api-adapter
```

## Usage

### As a Standalone Server

```bash
# Start the adapter (listens on port 3111 by default)
npx chatgpt-api

# Or with a custom port
CHATGPT_API_PORT=8080 npx chatgpt-api
```

### From Node.js

```typescript
import ChatGPTAPIAdapter from '@llm-gateway/chatgpt-api-adapter'

// Start the adapter on port 3111 (top-level await requires an ES module)
const adapter = new ChatGPTAPIAdapter(3111)
await adapter.start()
```

### With OpenAI Client SDK

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'not-needed', // the SDK requires a value; the adapter does not check it
  baseURL: 'http://localhost:3111/v1'
})

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'Hello, world!' }
  ]
})

console.log(response.choices[0].message.content)
```

### With curl

```bash
curl http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Explain TypeScript"}
    ],
    "max_tokens": 500
  }'
```

### Streaming

```bash
# -N (--no-buffer) makes curl print each chunk as it arrives
curl -N http://localhost:3111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "List 5 ideas"}
    ],
    "stream": true
  }'
```
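Network reads do not line up with event boundaries, so a streaming client has to buffer partial lines before parsing them. A minimal sketch of that buffering; the class name `SSEBuffer` is hypothetical, and it assumes each event carries a single `data:` line, as in the streaming response format of this adapter.

```typescript
// Hypothetical incremental SSE reader: feed it raw text as it arrives,
// get back the payloads of complete `data:` lines.
// Simplification: each event is assumed to carry a single `data:` line.
class SSEBuffer {
  private pending = ''

  push(chunk: string): string[] {
    this.pending += chunk
    const lines = this.pending.split(/\r?\n/)
    // The last element is an incomplete line (or '' if the text ended with a newline).
    this.pending = lines.pop() ?? ''
    const payloads: string[] = []
    for (const line of lines) {
      // Per the SSE format, one space after the colon is not part of the value.
      if (line.startsWith('data:')) payloads.push(line.slice(5).replace(/^ /, ''))
    }
    return payloads
  }
}
```

In Node.js 18 or later, the text chunks can come from the `body` of a `fetch` response decoded with `TextDecoder`; each returned payload is either a JSON chunk or the `[DONE]` sentinel.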

## Features

### Implemented

- **Chat Completions** (`POST /v1/chat/completions`): accepts and returns the OpenAI Chat Completions format
- **Streaming** (`stream: true`): Server-Sent Events (SSE) with chunked responses
- **Models** (`GET /v1/models`): lists the OpenAI model names the adapter accepts
- **Health** (`GET /health`): gateway health status
- **Model Mapping**: automatic mapping from OpenAI to gateway model names

### Model Mapping

| OpenAI Model | Gateway Model |
|--------------|---------------|
| gpt-4 | qwen2.5:32b |
| gpt-4-turbo | qwen2.5:32b |
| gpt-3.5-turbo | qwen2.5:14b |
| gpt-4-mini | qwen2.5:3b |
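
The mapping can be pictured as a plain lookup. This is a minimal sketch, not the adapter's code: the name `toGatewayModel` is hypothetical, and passing unknown model names through unchanged is an assumption (the adapter may instead reject them or pick a default).

```typescript
// OpenAI model names accepted by the adapter, mapped to gateway models (from the table).
const MODEL_MAP: Readonly<Record<string, string>> = {
  'gpt-4': 'qwen2.5:32b',
  'gpt-4-turbo': 'qwen2.5:32b',
  'gpt-3.5-turbo': 'qwen2.5:14b',
  'gpt-4-mini': 'qwen2.5:3b',
}

// Hypothetical helper: resolve an OpenAI model name to a gateway model.
// Assumption: unknown names are passed through unchanged.
function toGatewayModel(openaiModel: string): string {
  // hasOwnProperty avoids matching inherited keys such as "constructor"
  return Object.prototype.hasOwnProperty.call(MODEL_MAP, openaiModel)
    ? MODEL_MAP[openaiModel]
    : openaiModel
}

console.log(toGatewayModel('gpt-3.5-turbo')) // qwen2.5:14b
console.log(toGatewayModel('qwen2.5:3b')) // qwen2.5:3b (passed through)
```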

## Architecture

```
OpenAI Client
    ↓
ChatGPT API Adapter (HTTP server)
    ↓
LLM Gateway API
    ↓
Model Selection (Claude, Ollama, external)
```

## Environment Variables

```bash
CHATGPT_API_PORT=3111                          # Listen port (default: 3111)
GATEWAY_URL=https://llm-gateway.context-x.org  # LLM Gateway endpoint
OLLAMA_URL=192.168.178.213:11434               # Local Ollama fallback (host:port)
AGENT_ID=chatgpt-api-adapter                   # Agent identifier
LOG_LEVEL=debug                                # Logging level (debug also logs message contents)
```
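
A sketch of reading these variables into a typed config. Assumptions: `loadConfig`, `withScheme`, and the `Config` shape are hypothetical; only port 3111 is documented as a default, and the other fallbacks simply mirror the example values; how the adapter itself treats a scheme-less `OLLAMA_URL` is unknown, so this sketch prefixes `http://`.

```typescript
// Hypothetical typed view of the adapter's environment variables.
interface Config {
  port: number
  gatewayUrl: string
  ollamaUrl: string
  agentId: string
  logLevel: string
}

type Env = Record<string, string | undefined>

// Prefix http:// when a URL is given as bare host:port (e.g. "192.168.178.213:11434").
function withScheme(url: string): string {
  return /^https?:\/\//i.test(url) ? url : `http://${url}`
}

function loadConfig(env: Env): Config {
  const port = Number(env.CHATGPT_API_PORT ?? '3111')
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid CHATGPT_API_PORT: ${env.CHATGPT_API_PORT}`)
  }
  return {
    port,
    // Fallbacks below mirror the example values in this README (an assumption).
    gatewayUrl: env.GATEWAY_URL ?? 'https://llm-gateway.context-x.org',
    ollamaUrl: withScheme(env.OLLAMA_URL ?? '192.168.178.213:11434'),
    agentId: env.AGENT_ID ?? 'chatgpt-api-adapter',
    logLevel: env.LOG_LEVEL ?? 'debug',
  }
}

console.log(loadConfig({ CHATGPT_API_PORT: '8080' }).port) // 8080
```

In Node.js this would be called as `loadConfig(process.env)`; failing fast on an invalid port gives a clearer error than a failed `listen` call later.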

## API Endpoints

### POST /v1/chat/completions

Chat completion request using OpenAI format.

**Request:**
```json
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are helpful..."},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 2000,
  "top_p": 1,
  "stream": false
}
```
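
The request shape can be written as TypeScript types. The sketch below adds a hypothetical `validateChatRequest` check; the `temperature` (0 to 2) and `top_p` (0 to 1) ranges follow the OpenAI API reference, while limiting roles to the three used in this README and whether the adapter enforces any of this are assumptions (the OpenAI API defines further roles such as `tool`).

```typescript
type Role = 'system' | 'user' | 'assistant'

interface ChatMessage {
  role: Role
  content: string
}

interface ChatCompletionRequest {
  model: string
  messages: ChatMessage[]
  temperature?: number
  max_tokens?: number
  top_p?: number
  stream?: boolean
}

const ROLES: readonly string[] = ['system', 'user', 'assistant']

// Hypothetical validator: returns a list of problems (empty when the body looks valid).
function validateChatRequest(body: unknown): string[] {
  if (typeof body !== 'object' || body === null) return ['body must be a JSON object']
  const req = body as Partial<ChatCompletionRequest>
  const errors: string[] = []

  if (typeof req.model !== 'string' || req.model === '') errors.push('model is required')

  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    errors.push('messages must be a non-empty array')
  } else {
    req.messages.forEach((m: unknown, i: number) => {
      const msg = m as Partial<ChatMessage> | null
      const ok = msg !== null && typeof msg === 'object' &&
        typeof msg.role === 'string' && ROLES.includes(msg.role) &&
        typeof msg.content === 'string'
      if (!ok) errors.push(`messages[${i}] needs a valid role and string content`)
    })
  }

  // Optional numeric fields: absent is fine, otherwise they must be in range.
  const inRange = (v: unknown, lo: number, hi: number) =>
    v === undefined || (typeof v === 'number' && v >= lo && v <= hi)
  if (!inRange(req.temperature, 0, 2)) errors.push('temperature must be between 0 and 2')
  if (!inRange(req.top_p, 0, 1)) errors.push('top_p must be between 0 and 1')
  if (req.max_tokens !== undefined && !(Number.isInteger(req.max_tokens) && req.max_tokens > 0)) {
    errors.push('max_tokens must be a positive integer')
  }
  if (req.stream !== undefined && typeof req.stream !== 'boolean') errors.push('stream must be a boolean')
  return errors
}
```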

**Response (non-streaming):**
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
```
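
A sketch of assembling this response object around generated text. The helper name `buildCompletion`, its inputs, and the id format are assumptions for illustration; `created` is a Unix timestamp in seconds, and `total_tokens` is the sum of the other two counts, as in the example.

```typescript
interface ChatCompletion {
  id: string
  object: 'chat.completion'
  created: number
  model: string
  choices: {
    index: number
    message: { role: 'assistant'; content: string }
    finish_reason: 'stop' | 'length'
  }[]
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number }
}

// Hypothetical helper: wrap generated text in an OpenAI-style chat.completion object.
function buildCompletion(
  model: string,
  content: string,
  promptTokens: number,
  completionTokens: number,
  finishReason: 'stop' | 'length' = 'stop',
): ChatCompletion {
  return {
    id: `chatcmpl-${Math.random().toString(36).slice(2, 12)}`,
    object: 'chat.completion',
    created: Math.floor(Date.now() / 1000), // seconds, not milliseconds
    model, // echo the OpenAI model name the client requested
    choices: [{ index: 0, message: { role: 'assistant', content }, finish_reason: finishReason }],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  }
}
```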

**Response (streaming):**
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"H"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"ello"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
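
Each `data:` payload is one JSON chunk; a client concatenates `delta.content` until it sees `[DONE]`. A minimal sketch of that accumulation over payloads already split out of the stream; the function name `collectStream` is hypothetical, and only the first choice is read.

```typescript
interface StreamChunk {
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[]
}

// Hypothetical helper: join streamed deltas into the full reply.
function collectStream(payloads: string[]): { content: string; finishReason: string | null } {
  let content = ''
  let finishReason: string | null = null
  for (const payload of payloads) {
    if (payload === '[DONE]') break // end-of-stream sentinel, not JSON
    const chunk = JSON.parse(payload) as StreamChunk
    const choice = chunk.choices[0]
    if (!choice) continue
    content += choice.delta.content ?? '' // the final chunk has an empty delta
    finishReason = choice.finish_reason ?? finishReason
  }
  return { content, finishReason }
}
```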

### GET /v1/models

List available models.

**Response:**
```json
{
  "object": "list",
  "data": [
    {"id": "gpt-4", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-4-mini", "object": "model", "owned_by": "openai"}
  ]
}
```
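
Because this list should match the model mapping table, one option is to derive the response from the same set of names so the two cannot drift apart. A minimal sketch; `ACCEPTED_MODELS` and `listModels` are hypothetical names, and `owned_by: "openai"` simply mirrors the example response.

```typescript
interface ModelEntry {
  id: string
  object: 'model'
  owned_by: string
}

// OpenAI names the adapter accepts (the left column of the model mapping table).
const ACCEPTED_MODELS: readonly string[] = ['gpt-4', 'gpt-4-turbo', 'gpt-3.5-turbo', 'gpt-4-mini']

// Hypothetical helper: build the GET /v1/models response body.
function listModels(ids: readonly string[]): { object: 'list'; data: ModelEntry[] } {
  return {
    object: 'list',
    data: ids.map((id) => ({ id, object: 'model' as const, owned_by: 'openai' })),
  }
}
```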

### GET /health

Gateway health status.

**Response:**
```json
{
  "status": "ok",
  "gateway": {
    "uptime": 123456,
    "models": ["qwen2.5:3b", "qwen2.5:14b"],
    "latency_ms": 250
  }
}
```
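
`gateway.models` lists gateway model names, not OpenAI aliases, so checking whether a given OpenAI model can be served means going through the mapping first. A sketch, assuming the health body has the shape shown above; `servesModel` is hypothetical, and passing unmapped names through is an assumption. With the example body, `gpt-4` (mapped to `qwen2.5:32b`) reports as unavailable.

```typescript
interface HealthResponse {
  status: string
  gateway: { uptime: number; models: string[]; latency_ms: number }
}

// OpenAI alias -> gateway model, copied from the model mapping table.
const MODEL_MAP: Readonly<Record<string, string>> = {
  'gpt-4': 'qwen2.5:32b',
  'gpt-4-turbo': 'qwen2.5:32b',
  'gpt-3.5-turbo': 'qwen2.5:14b',
  'gpt-4-mini': 'qwen2.5:3b',
}

// Hypothetical check: is the gateway model behind an OpenAI alias currently listed?
function servesModel(health: HealthResponse, openaiModel: string): boolean {
  const target = Object.prototype.hasOwnProperty.call(MODEL_MAP, openaiModel)
    ? MODEL_MAP[openaiModel]
    : openaiModel
  return health.status === 'ok' && health.gateway.models.includes(target)
}

const example: HealthResponse = {
  status: 'ok',
  gateway: { uptime: 123456, models: ['qwen2.5:3b', 'qwen2.5:14b'], latency_ms: 250 },
}
console.log(servesModel(example, 'gpt-4')) // false
console.log(servesModel(example, 'gpt-3.5-turbo')) // true
```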

## Performance

Typical latencies:
- **Gateway mode**: 100-500ms (depends on model)
- **Ollama fallback**: 200-2000ms (depends on hardware)
- **Streaming chunk**: 10-50ms per chunk
- **Timeout**: 30s (configurable via gateway)

## Testing

```bash
npm test
```

Tests cover:
- Chat completions (streaming and buffered)
- Model listing
- Error handling and fallback behavior
- Token counting accuracy
- Message formatting
- Health checks

## Security

- No API key validation (assumes network-isolated deployment)
- CORS enabled for all origins (configure as needed)
- Messages logged at DEBUG level only
- Automatic cleanup on shutdown (SIGTERM, SIGINT)
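
Configuring CORS usually means replacing the wildcard with an allow-list. A minimal sketch of computing CORS response headers for a request's `Origin`; the function and its allow-list parameter are hypothetical, not adapter options. When the origin is echoed back, `Vary: Origin` keeps shared caches from serving one origin's response to another.

```typescript
// Hypothetical helper: CORS headers for a given request Origin.
// An allow-list of ['*'] reproduces the adapter's current "all origins" behavior.
function corsHeaders(origin: string | undefined, allowList: readonly string[]): Record<string, string> {
  const headers: Record<string, string> = {
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    // OpenAI SDKs send Authorization, so browser preflights must allow it.
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  }
  if (allowList.includes('*')) {
    headers['Access-Control-Allow-Origin'] = '*'
  } else {
    // The response now depends on Origin, so tell caches to key on it.
    headers['Vary'] = 'Origin'
    if (origin !== undefined && allowList.includes(origin)) {
      headers['Access-Control-Allow-Origin'] = origin
    }
    // Origins not on the list get no Allow-Origin header, so browsers block the response.
  }
  return headers
}
```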

## Troubleshooting

### OpenAI client not connecting

1. Verify adapter is running: `curl http://localhost:3111/health`
2. Check `baseURL` in the client: it should be `http://localhost:3111/v1`, ending in `/v1` and without `/chat/completions` (the SDK appends that path itself)
3. Ensure gateway is accessible: `curl $GATEWAY_URL/health`
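
The OpenAI SDKs append endpoint paths such as `/chat/completions` to `baseURL`, so its path should stop at `/v1`. A sketch of a quick pre-flight check for the common mistakes; `checkBaseURL` is hypothetical, and ignoring trailing slashes is an assumption that your client joins paths tolerantly.

```typescript
// Hypothetical pre-flight check for the baseURL passed to an OpenAI client.
// Returns a description of the problem, or null when the URL looks right.
function checkBaseURL(baseURL: string): string | null {
  // Capture everything after scheme://host[:port] up to any query or fragment.
  const match = /^https?:\/\/[^/?#]+([^?#]*)/i.exec(baseURL)
  if (match === null) return 'not an absolute http(s) URL; include the scheme and port'
  const path = match[1].replace(/\/+$/, '') // ignore trailing slashes
  if (path.endsWith('/chat/completions')) return 'drop /chat/completions; the SDK appends it'
  if (path !== '/v1') return `path is "${match[1] || '/'}", expected "/v1"`
  return null
}
```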

### Streaming not working

1. Verify `stream: true` in request body
2. Check for SSE support in client library
3. Ensure no intermediate proxies are buffering responses
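
When a stream arrives all at once, the response headers usually show why. A sketch of a header check a client could run; `diagnoseStreamHeaders` is hypothetical. If a proxy such as nginx sits in between, `proxy_buffering off;` in its config (or an `X-Accel-Buffering: no` response header) disables its buffering; whether the adapter sets that header is not documented here.

```typescript
// Hypothetical diagnostic over the response headers of a streaming request.
function diagnoseStreamHeaders(headers: Record<string, string>): string[] {
  // Header names are case-insensitive, so normalize them first.
  const h: Record<string, string> = {}
  for (const key of Object.keys(headers)) h[key.toLowerCase()] = headers[key]

  const problems: string[] = []
  const type = h['content-type'] ?? ''
  if (!type.startsWith('text/event-stream')) {
    problems.push(`content-type is "${type}", expected text/event-stream (was stream: true sent?)`)
  }
  // A streamed body has no known length up front; a Content-Length suggests buffering.
  if (h['content-length'] !== undefined) {
    problems.push('content-length is set: the body was probably buffered before sending')
  }
  return problems
}
```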

### Slow responses

1. Check gateway latency (prints only the total time in seconds): `curl -o /dev/null -s -w "%{time_total}\n" "$GATEWAY_URL/health"`
2. Verify model availability: `curl http://localhost:3111/v1/models`
3. Check system resources on gateway (CPU, memory, disk)

## Compatibility

- OpenAI Client SDK (Python, Node.js, Go, etc.)
- LiteLLM
- Anthropic Bedrock (proxy mode)
- Any HTTP client using OpenAI API format