llm-gateway/sync/history/2026-05-12-claude-code-gateway-fix.md
2026-05-12 22:56:24 +02:00

40 lines
1.7 KiB
Markdown

# 2026-05-12 — Claude Code Gateway Fix
## Summary
Claude Code CLI now reaches the local Gateway Companion and the public LLM Gateway.
Verified smoke:
- Local endpoint: `ANTHROPIC_BASE_URL=http://127.0.0.1:11435`
- Model: `claude-sonnet-4-6`
- Result: `claude-debug10-ok`
- Gateway dashboard caller: `claude-code-companion`
- Dashboard tracked Sonnet and Haiku rows with tokens, cost, latency, and compression metadata.
## Fixes Applied
- Companion:
- Anthropic `/v1/messages` translation clamps `max_tokens` to Gateway limit `16384`.
- Streaming Anthropic responses no longer double-write HTTP headers.
- OpenAI-style assistant markers and prompt echo are sanitized before returning to Claude Code.
- Message IDs now include a random suffix to prevent concurrent Claude Code internal requests from colliding.
- Gateway:
- Response-cache bypass is enabled for agentic callers containing `claude-code`, `codex`, or `copilot`.
- These callers are still logged and compression metadata is still recorded.
- This avoids stale semantic-cache answers for coding agents.
## Verification Evidence
- Public health: `/api/dashboard/health` returned `ok`, database `connected`.
- Latest dashboard rows after the fix:
- `claude-code-companion`, `claude-sonnet-4-6`, `tokens_in=138`, `tokens_out=19`, latency about `441ms`.
- `claude-code-companion`, `claude-haiku-3`, title/internal request tracked separately.
## Boundaries
- Claude Code text/CLI path is usable through Gateway and tracked.
- Full native Anthropic tool-use parity is not complete; the Companion still flattens tool-related content into text for Gateway routing.
- Small smoke prompts often show `compression_mode=none:none`; this is expected when there are too few tokens to compress usefully.