llm.gateway gateway workbench · v1.0
poll interval 3s · mode auto
total tokens saved · all layers · all-time: 0 tokens
⚡ Gateway (LLM calls): 0
cost saved: $0.00
cache hits: 0
savings rate: 0%
cost analysis · last 24h · USD
without gateway: $0.00
with gateway: $0.00
you saved $0.00 · 0% reduction
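The cost panel's figures are presumably related by plain arithmetic: saved = (cost without gateway) − (cost with gateway), and the reduction is that difference as a percentage of the without-gateway cost. A minimal sketch under that assumption; `cost_savings` is a hypothetical helper, not part of the gateway:

```python
def cost_savings(without_usd: float, with_usd: float) -> tuple[float, float]:
    """Return (usd_saved, percent_reduction) for one reporting window.

    Assumes saved = without - with and reduction = saved / without.
    """
    saved = without_usd - with_usd
    # Empty window (as on a fresh dashboard): report $0.00 and 0% instead of dividing by zero.
    if without_usd <= 0:
        return 0.0, 0.0
    return saved, 100.0 * saved / without_usd


saved, pct = cost_savings(12.50, 3.75)
print(f"you saved ${saved:.2f} · {pct:.0f}% reduction")
```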

Savings Sources · 5 measurement axes across all calls


Live Metrics · last 24h

requests: 0 (routed)
success rate: 0% (approved/total)
avg latency: 0ms (end-to-end)
spent today: $0.00 (actual USD)
confidence: 0/10 (post-val)
fallback usage: 0% (primary→fallback)
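The card captions suggest that each live metric is a simple aggregate over the last 24 hours of logged requests: success rate as approved/total, fallback usage as the share of calls that went primary→fallback, and average latency as the mean end-to-end time. The sketch below works under those assumptions. `RequestRecord` and its fields are hypothetical and do not reflect the gateway's actual log schema:

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical shape of one logged gateway call.
    approved: bool        # passed post-validation
    used_fallback: bool   # primary model failed, fallback served the call
    latency_ms: float     # end-to-end latency


def live_metrics(records: list[RequestRecord]) -> dict[str, float]:
    total = len(records)
    if total == 0:
        # Matches the empty dashboard: every rate reads 0.
        return {"requests": 0, "success_rate": 0.0, "avg_latency_ms": 0.0, "fallback_usage": 0.0}
    approved = sum(r.approved for r in records)      # True counts as 1
    fallbacks = sum(r.used_fallback for r in records)
    return {
        "requests": total,
        "success_rate": 100.0 * approved / total,
        "avg_latency_ms": sum(r.latency_ms for r in records) / total,
        "fallback_usage": 100.0 * fallbacks / total,
    }
```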

Activity · last 365 days · streak 0d

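A streak counter like the one in the activity header can be computed by walking backwards from today over the set of days with at least one call. This sketch assumes a streak means consecutive active days ending today, since the gateway's exact rule is not documented here. `current_streak` is a hypothetical name:

```python
from datetime import date, timedelta


def current_streak(active_days: set[date], today: date) -> int:
    """Count consecutive days with activity, ending at `today`."""
    streak = 0
    day = today
    # Step back one day at a time until we hit a day with no calls.
    while day in active_days:
        streak += 1
        day -= timedelta(days=1)
    return streak
```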

Forecast · based on recent trend

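The panel only says that the forecast is based on the recent trend. One common approach is to fit an ordinary least-squares line through recent daily totals and extrapolate it forward. The sketch below assumes exactly that, which may differ from what the gateway actually computes. `linear_forecast` is a hypothetical helper:

```python
def linear_forecast(history: list[float], days_ahead: int) -> float:
    """Fit y = a + b*x to daily values (x = 0..n-1) and extrapolate `days_ahead` past the last day."""
    n = len(history)
    if n == 0:
        return 0.0
    if n == 1:
        return history[0]  # no trend can be fitted from a single point
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Ordinary least-squares slope: cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # Spend and token counts cannot go negative, so clamp the projection at zero.
    return max(0.0, intercept + slope * (n - 1 + days_ahead))
```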

Live Activity · most recent first


Top Models · last 24h


Top Callers


Achievements 0/0


Local · on-host inference


Subscription · paid plans via bridges


Free Tier · API-key authenticated


Desktop AI Coverage · only gateway traffic is counted


Recent Requests · live polling

request id | caller | model | status | ctx before | ctx sent | saved | compression | cost | latency
no requests yet
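The "ctx before", "ctx sent", "saved" and "compression" columns appear to describe one request's prompt before and after compression. The sketch below assumes saved = before − sent and reports compression as the percentage reduction; the dashboard may instead show a ratio. Inputs under the 700-token compression threshold would simply show zero savings. `request_savings` is a hypothetical name:

```python
def request_savings(ctx_before: int, ctx_sent: int) -> tuple[int, float]:
    """Return (tokens_saved, compression_pct) for one row of the Recent Requests table."""
    saved = max(0, ctx_before - ctx_sent)
    # Percentage of the original context that was not sent upstream.
    pct = 100.0 * saved / ctx_before if ctx_before else 0.0
    return saved, pct
```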
cumulative savings · last 24h: $0.00
— · — tokens prevented · — cache hits
$ saved per hour hit rate —
cache entries: 0 (distinct cached responses)
tokens prevented: 0 (never sent to LLM)
cache hit rate: 0% (hits ÷ total req)
compressed since last restart: 0
— · — ops · since —
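The cache cards fit an exact-match response cache: each distinct request is one entry, a hit means the prompt was never sent to the LLM, and the hit rate is hits ÷ total requests. This is a minimal sketch under those assumptions. The SHA-256-of-canonical-JSON key and the choice to count only prompt tokens as "prevented" are illustrative guesses, and `ResponseCache` is hypothetical:

```python
import hashlib
import json
from typing import Optional


class ResponseCache:
    """Exact-match response cache keyed on a hash of the normalized request."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}   # key -> cached completion text
        self.hits = 0
        self.total = 0
        self.tokens_prevented = 0

    @staticmethod
    def key(model: str, messages: list[dict]) -> str:
        # sort_keys gives a stable serialization, so identical requests hash identically.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def lookup(self, model: str, messages: list[dict], prompt_tokens: int) -> Optional[str]:
        self.total += 1
        cached = self.entries.get(self.key(model, messages))
        if cached is not None:
            self.hits += 1
            # On a hit, this prompt is never sent to the LLM.
            self.tokens_prevented += prompt_tokens
        return cached

    def store(self, model: str, messages: list[dict], response: str) -> None:
        self.entries[self.key(model, messages)] = response

    @property
    def hit_rate(self) -> float:
        return 100.0 * self.hits / self.total if self.total else 0.0
```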

Top Caching Callers · most savings


Cache Controls · manual invalidation

Subscription Pool Wallet · tracks API calls (not tokens) against each Pro plan's quota window. The numbers shown are messages remaining, not tokens; for token savings from caching, see the Savings tab.
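The wallet counts calls against each plan's quota window. Since the window rules are not specified here, the sketch below assumes a rolling window with a fixed message limit; real plans may instead reset on a fixed schedule. `QuotaWindow` and its `limit`/`window_s` parameters are hypothetical:

```python
import time
from collections import deque
from typing import Optional


class QuotaWindow:
    """Counts calls (not tokens) against a plan's rolling quota window."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit          # messages allowed per window
        self.window_s = window_s    # window length in seconds
        self.calls: deque = deque() # timestamps of calls still inside the window

    def _expire(self, now: float) -> None:
        # Drop calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()

    def record(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._expire(now)
        self.calls.append(now)

    def remaining(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        self._expire(now)
        return max(0, self.limit - len(self.calls))
```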

Knowledge Graph · all callers + facts

caller | fact key | value

Race Leaderboard · last 7 days


Public Share Card · embeddable SVG · OG-card sized · no auth required

Monthly Report · save as PDF via browser print

API Reference · all endpoints route through compression + caller tracking

The LLM Gateway exposes three POST endpoints and one GET endpoint. Every call is logged in the activity feed, compressed when the input is 700 tokens or more, and routed via `routing-rules.yaml` to the matching subscription bridge (Claude Code, ChatGPT, Copilot, M365 Copilot, Codex) or to local Ollama.
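The 700-token compression rule can be read as a simple size gate. This minimal sketch assumes the input has already been token-counted, because the gateway's tokenizer is not stated. It also assumes that `"enabled": false` in the native endpoint's `options.compression` block opts a request out. `should_compress` is a hypothetical name, and routing through `routing-rules.yaml` is not modeled because that file's format is not shown:

```python
from typing import Optional

COMPRESSION_THRESHOLD_TOKENS = 700  # from the API reference: compress when input >= 700 tokens


def should_compress(input_tokens: int, options: Optional[dict] = None) -> bool:
    compression = (options or {}).get("compression", {})
    # Assumed semantics: an explicit "enabled": false skips compression entirely.
    if compression.get("enabled", True) is False:
        return False
    # Otherwise apply the documented size threshold.
    return input_tokens >= COMPRESSION_THRESHOLD_TOKENS
```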
POST /v1/chat/completions OpenAI-compatible · works with `openai` SDK
curl https://llm-gateway.context-x.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "hi"}]
  }'
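The same call can be made from Python's standard library. This sketch assumes the gateway returns the standard OpenAI chat-completion response shape, which "OpenAI-compatible" implies, and, like the curl example, sends no API key. `chat_request` and `chat_text` are hypothetical helpers:

```python
import json
import urllib.request

GATEWAY = "https://llm-gateway.context-x.org"


def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_text(response: dict) -> str:
    # OpenAI-shaped responses carry the reply in choices[0].message.content.
    return response["choices"][0]["message"]["content"]


# To actually call the gateway:
# with urllib.request.urlopen(chat_request("claude-sonnet-4.6", "hi")) as resp:
#     print(chat_text(json.load(resp)))
```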
POST /v1/messages Anthropic-compatible · works with `@anthropic-ai/sdk`
curl https://llm-gateway.context-x.org/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 1024
  }'
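The Anthropic Messages format differs in two visible ways: `max_tokens` is required, and the reply arrives as a list of content blocks instead of `choices`. This sketch assumes the gateway mirrors that response shape. `messages_payload` and `messages_text` are hypothetical helpers:

```python
def messages_payload(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Body for POST /v1/messages; the Anthropic Messages format requires max_tokens."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def messages_text(response: dict) -> str:
    # Anthropic-shaped responses hold a list of content blocks;
    # keep only the text blocks and join them in order.
    return "".join(
        block["text"] for block in response.get("content", []) if block.get("type") == "text"
    )
```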
POST /v1/completion native · full caller tracking + compression options
curl https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-app",
    "task_type": "generic_qa",
    "input": "your prompt here",
    "options": { "compression": { "enabled": true, "mode": "auto" } }
  }'
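A small builder keeps the native request body consistent across call sites. The field names match the curl example. Only `"generic_qa"` is shown as a `task_type`, so other values are not assumed here. `completion_payload` is a hypothetical helper:

```python
def completion_payload(caller: str, prompt: str, task_type: str = "generic_qa",
                       compress: bool = True) -> dict:
    """Build a body for the native POST /v1/completion endpoint."""
    return {
        "caller": caller,        # identifies the calling app for caller tracking
        "task_type": task_type,
        "input": prompt,
        "options": {"compression": {"enabled": compress, "mode": "auto"}},
    }
```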
GET /v1/models list every model the gateway can route to
curl https://llm-gateway.context-x.org/v1/models
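The response format of `/v1/models` is not documented above. This sketch assumes the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}]}`); adjust `model_ids` if the gateway returns something else. Both helper names are hypothetical:

```python
import json
import urllib.request

GATEWAY = "https://llm-gateway.context-x.org"


def model_ids(payload: dict) -> list[str]:
    # Assumes the OpenAI-style list shape: {"object": "list", "data": [{"id": ...}, ...]}.
    return sorted(m["id"] for m in payload.get("data", []))


def fetch_models() -> list[str]:
    # Network call: GET /v1/models, then extract the model ids.
    with urllib.request.urlopen(f"{GATEWAY}/v1/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```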


Model → Bridge Mapping · which subscription each model alias routes to

Model alias | Bridge | Subscription used | Port
claude-sonnet-4.6, claude-haiku, claude-opus | claude-bridge | Claude Code Max (OAuth) | 3250
gpt-4o, gpt-4.1, gpt-5.x | openai-bridge | ChatGPT Plus / Pro | 3251
copilot-gpt-4o, copilot-claude-3.7 | copilot-bridge | GitHub Copilot | 3252
codex-mini, gpt-5.1-codex-mini | codex-bridge | OpenAI Codex CLI | 3253
m365-copilot | m365-copilot-bridge | Microsoft 365 Copilot | 3257
qwen2.5:3b / 7b / 14b / 32b, magatama:32b, magatama-coder | ollama (Mac Studio) | local (no cost) | 11434
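The mapping table amounts to a lookup from model alias to bridge and port. The gateway's real routing is driven by `routing-rules.yaml`, whose format is not shown, so this is only a hypothetical in-code equivalent built from the table's rows. Fallback and priority rules are not modeled, and `resolve` is a hypothetical name:

```python
# Aliases, bridges and ports copied from the mapping table; the lookup logic is a sketch.
BRIDGE_PORTS = {
    "claude-bridge": 3250,
    "openai-bridge": 3251,
    "copilot-bridge": 3252,
    "codex-bridge": 3253,
    "m365-copilot-bridge": 3257,
    "ollama": 11434,
}

ALIAS_TO_BRIDGE = {
    "claude-sonnet-4.6": "claude-bridge",
    "claude-haiku": "claude-bridge",
    "claude-opus": "claude-bridge",
    "gpt-4o": "openai-bridge",
    "gpt-4.1": "openai-bridge",
    "copilot-gpt-4o": "copilot-bridge",
    "copilot-claude-3.7": "copilot-bridge",
    "codex-mini": "codex-bridge",
    "gpt-5.1-codex-mini": "codex-bridge",
    "m365-copilot": "m365-copilot-bridge",
    "qwen2.5:3b": "ollama",
    "qwen2.5:7b": "ollama",
    "qwen2.5:14b": "ollama",
    "qwen2.5:32b": "ollama",
    "magatama:32b": "ollama",
    "magatama-coder": "ollama",
}


def resolve(alias: str) -> tuple[str, int]:
    """Return (bridge, port) for a model alias; raises KeyError for unknown aliases."""
    if alias in ALIAS_TO_BRIDGE:
        bridge = ALIAS_TO_BRIDGE[alias]
    elif alias.startswith("gpt-5.") and "codex" not in alias:
        bridge = "openai-bridge"  # the table's "gpt-5.x" family
    else:
        raise KeyError(alias)
    return bridge, BRIDGE_PORTS[bridge]
```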