llm.gateway / workbench

db connecting

poll starting

interval 3s

mode auto

summoning buddy

total tokens saved · all layers · all-time

0 tokens

⚡ Gateway (LLM calls) 0

cost saved $0.00

cache hits 0

savings rate 0%

cost analysis · last 24h · USD

without gateway

$0.00

→

with gateway

$0.00

you saved $0.00 · 0% reduction

Savings Sources 5 measurement axes across all calls

loading

Live Metrics last 24h

requests

0

routed

success rate

0%

approved/total

avg latency

0ms

end-to-end

spent today

$0.00

actual usd

confidence

0/10

post-val

fallback usage

0%

primary→fallback

Activity · last 365 days streak 0d

loading activity

Forecast based on recent trend

computing forecast

Live Activity most recent first

listening

Top Models last 24h

analyzing routing

Top Callers

analyzing callers

Achievements 0/0

checking quests

discovering installed subscriptions

Local on-host inference

enumerating local models

Subscription paid plans via bridges

enumerating subscription providers

Free Tier api-key authenticated

enumerating free-tier endpoints

Desktop AI Coverage only gateway traffic is counted

checking connected clients

Recent Requests live polling

request id

caller

model

status

ctx before

ctx sent

saved

compression

cost

latency

no requests yet

cumulative savings · last 24h

$0.00

— · — tokens prevented · — cache hits

$ saved per hour hit rate —

cache entries

0

distinct cached responses

tokens prevented

0

never sent to LLM

cache hit rate

0%

hits ÷ total req

compressed since last restart

0

— · — ops · since —

Top Caching Callers most savings

loading

Cache Controls manual invalidation

loading wallet

enter a caller id and click load

Knowledge Graph all callers + facts

caller fact key value

computing standings

Race Leaderboard last 7 days

loading

Public Share Card embeddable SVG · OG-card sized · no auth required

Monthly Report save as PDF via browser print

Tip: in the report window, press Cmd/Ctrl+P → "Save as PDF". The report is fully styled for A4 print.

API Reference all endpoints route through compression + caller tracking

The LLM Gateway exposes three POST endpoints and one GET endpoint. Every call is logged to the activity feed, compressed when the input is at least 700 tokens, and routed according to routing-rules.yaml to the matching subscription bridge (Claude Code, ChatGPT, Copilot, M365 Copilot, Codex) or to local Ollama.
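As a rough illustration of the 700-token compression rule, the sketch below decides whether an input would qualify. The threshold comes from the text above; the token estimate (about four characters per token) and the function names are assumptions made for this example, and the gateway's real tokenizer may count differently.

```python
# Threshold stated in the API reference: inputs of 700 tokens or more are compressed.
COMPRESSION_THRESHOLD_TOKENS = 700


def estimate_tokens(text: str) -> int:
    """Crude token estimate. ASSUMPTION: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def should_compress(text: str) -> bool:
    """Return True when the estimated token count reaches the threshold."""
    return estimate_tokens(text) >= COMPRESSION_THRESHOLD_TOKENS
```

A short prompt such as "hi" falls far below the threshold, so under this estimate it would be forwarded uncompressed.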

POST /v1/chat/completions OpenAI-compatible · works with `openai` SDK

curl https://llm-gateway.context-x.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "hi"}]
  }'
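The same call can be sketched in Python with only the standard library. The URL, model alias, and body mirror the curl command above; the helper names `build_chat_request` and `send` are hypothetical, and the response is assumed to be JSON.

```python
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.context-x.org"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the body. ASSUMPTION: the reply is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real network call):
#   reply = send(build_chat_request("claude-sonnet-4.6", "hi"))
```

Separating request construction from sending keeps the payload easy to inspect before any traffic reaches the gateway.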

POST /v1/messages Anthropic-compatible · works with `@anthropic-ai/sdk`

curl https://llm-gateway.context-x.org/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 1024
  }'
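The body above differs from the OpenAI-style one mainly in its `max_tokens` field, which the Anthropic Messages API requires. A minimal sketch of building that body follows; `build_messages_payload` is a hypothetical helper, and whether the gateway itself rejects a missing or non-positive `max_tokens` is an assumption, not something the reference states.

```python
import json


def build_messages_payload(model: str, prompt: str, max_tokens: int = 1024) -> bytes:
    """Serialize an Anthropic-style /v1/messages body.

    ASSUMPTION: a non-positive max_tokens is treated as a client error here;
    the gateway's own validation is not documented in the reference.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")
```

The returned bytes can be posted to `/v1/messages` with a `Content-Type: application/json` header, as in the curl command.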

POST /v1/completion native · full caller tracking + compression options

curl https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-app",
    "task_type": "generic_qa",
    "input": "your prompt here",
    "options": { "compression": { "enabled": true, "mode": "auto" } }
  }'
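The native body can also be assembled programmatically. The sketch below reproduces the four fields from the curl command (`caller`, `task_type`, `input`, `options.compression`); the function name and defaults are hypothetical, and no other option keys are assumed to exist.

```python
import json


def build_native_payload(
    caller: str,
    prompt: str,
    task_type: str = "generic_qa",
    compression_enabled: bool = True,
    compression_mode: str = "auto",
) -> dict:
    """Build the body for POST /v1/completion using only the fields shown above."""
    return {
        "caller": caller,
        "task_type": task_type,
        "input": prompt,
        "options": {
            "compression": {"enabled": compression_enabled, "mode": compression_mode}
        },
    }


# Serialize exactly as the curl example sends it.
example_body = json.dumps(build_native_payload("my-app", "your prompt here"))
```

Passing a stable `caller` value is what lets the dashboard attribute requests and savings to a specific application.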

GET /v1/models list every model the gateway can route to

curl https://llm-gateway.context-x.org/v1/models
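The reference does not show the response format for this endpoint. Assuming it follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which is an assumption rather than documented behavior, the model aliases could be extracted as follows; `model_ids` is a hypothetical helper.

```python
def model_ids(response_body: dict) -> list:
    """Return model aliases from a /v1/models reply.

    ASSUMPTION: the reply uses the OpenAI-style shape with a top-level
    "data" list whose items each carry an "id" string.
    """
    return [entry["id"] for entry in response_body.get("data", []) if "id" in entry]


# Hand-written sample in the assumed shape, not real gateway output.
sample = {"object": "list", "data": [{"id": "claude-sonnet-4.6"}, {"id": "gpt-4o"}]}
aliases = model_ids(sample)
```

If the gateway returns a different structure, only the body of `model_ids` needs to change.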

Model → Bridge Mapping which subscription each model alias routes to

Model alias	Bridge	Subscription used	Port	Status
`claude-sonnet-4.6`, `claude-haiku`, `claude-opus`	claude-bridge	Claude Code Max (OAuth)	3250	—
`gpt-4o`, `gpt-4.1`, `gpt-5.x`	openai-bridge	ChatGPT Plus / Pro	3251	—
`copilot-gpt-4o`, `copilot-claude-3.7`	copilot-bridge	GitHub Copilot	3252	—
`codex-mini`, `gpt-5.1-codex-mini`	codex-bridge	OpenAI Codex CLI	3253	—
`m365-copilot`	m365-copilot-bridge	Microsoft 365 Copilot	3257	—
`qwen2.5:3b`, `qwen2.5:7b`, `qwen2.5:14b`, `qwen2.5:32b`, `magatama:32b`, `magatama-coder`	ollama (Mac Studio)	local, no cost	11434	—

The gateway picks the bridge from routing-rules.yaml based on task_type and the requested model. You can also call a bridge directly (e.g. http://82.165.222.127:3250/v1/messages), but doing so bypasses compression, savings tracking, and the routing rules.
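The mapping table can be mirrored as a simple lookup, sketched below. Bridge names, aliases, and ports are copied from the table; treating `gpt-5.x` as a `gpt-5.` prefix match (checked only after exact aliases) is an assumption, and the real gateway also consults task_type via routing-rules.yaml, which this sketch ignores.

```python
# Ports per bridge, as listed in the Model → Bridge Mapping table.
BRIDGE_PORTS = {
    "claude-bridge": 3250,
    "openai-bridge": 3251,
    "copilot-bridge": 3252,
    "codex-bridge": 3253,
    "m365-copilot-bridge": 3257,
    "ollama": 11434,
}

# Exact aliases from the table.
ALIAS_TO_BRIDGE = {
    "claude-sonnet-4.6": "claude-bridge",
    "claude-haiku": "claude-bridge",
    "claude-opus": "claude-bridge",
    "gpt-4o": "openai-bridge",
    "gpt-4.1": "openai-bridge",
    "copilot-gpt-4o": "copilot-bridge",
    "copilot-claude-3.7": "copilot-bridge",
    "codex-mini": "codex-bridge",
    "gpt-5.1-codex-mini": "codex-bridge",
    "m365-copilot": "m365-copilot-bridge",
    "qwen2.5:3b": "ollama",
    "qwen2.5:7b": "ollama",
    "qwen2.5:14b": "ollama",
    "qwen2.5:32b": "ollama",
    "magatama:32b": "ollama",
    "magatama-coder": "ollama",
}


def resolve_bridge(alias: str) -> tuple:
    """Return (bridge name, port) for a model alias; raise KeyError if unknown."""
    if alias in ALIAS_TO_BRIDGE:
        bridge = ALIAS_TO_BRIDGE[alias]
    elif alias.startswith("gpt-5."):
        # ASSUMPTION: "gpt-5.x" in the table means any alias with this prefix.
        bridge = "openai-bridge"
    else:
        raise KeyError(alias)
    return bridge, BRIDGE_PORTS[bridge]
```

Checking exact aliases first matters because `gpt-5.1-codex-mini` shares the `gpt-5.` prefix but belongs to codex-bridge.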
