# 2026-05-12 — Claude Code Gateway Fix

## Summary

Claude Code CLI now reaches the local Gateway Companion and, through it, the public LLM Gateway.

Verified smoke test:

- Local endpoint: `ANTHROPIC_BASE_URL=http://127.0.0.1:11435`
- Model: `claude-sonnet-4-6`
- Result: `claude-debug10-ok`
- Gateway dashboard caller: `claude-code-companion`
- The dashboard tracked Sonnet and Haiku rows with tokens, cost, latency, and compression metadata.

## Fixes Applied

- Companion:
  - The Anthropic `/v1/messages` translation clamps `max_tokens` to the Gateway limit of `16384`.
  - Streaming Anthropic responses no longer write HTTP headers twice.
  - OpenAI-style assistant markers and echoed prompt text are sanitized before the response is returned to Claude Code.
  - Message IDs now include a random suffix so that concurrent internal Claude Code requests do not collide.
- Gateway:
  - Response-cache bypass is enabled for agentic callers whose caller name contains `claude-code`, `codex`, or `copilot`.
  - These callers are still logged, and their compression metadata is still recorded.
  - The bypass prevents coding agents from receiving stale semantic-cache answers.

## Verification Evidence

- Public health: `/api/dashboard/health` returned `ok` with database `connected`.
- Latest dashboard rows after the fix:
  - `claude-code-companion`, `claude-sonnet-4-6`, `tokens_in=138`, `tokens_out=19`, latency about `441ms`.
  - `claude-code-companion`, `claude-haiku-3`: the title/internal request, tracked as a separate row.

## Boundaries

- The Claude Code text/CLI path is usable through the Gateway and is tracked.
- Full native Anthropic tool-use parity is not complete. The Companion still flattens tool-related content into text for Gateway routing.
- Small smoke prompts often show `compression_mode=none:none`. This is expected when a prompt has too few tokens to compress usefully.
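Two of the Companion fixes under Fixes Applied, the `max_tokens` clamp and the collision-resistant message IDs, can be illustrated with a short sketch. It is not the Companion's actual code, which this note does not show. The names `clamp_max_tokens`, `new_message_id`, and `GATEWAY_MAX_TOKENS` are hypothetical, and so is the ID layout (a `msg_` prefix, a millisecond timestamp, and a random hex suffix). Only the `16384` limit and the idea of adding a random suffix come from the note.

```python
import secrets
import time

# Gateway ceiling on max_tokens, as stated in the note.
# The constant name is hypothetical.
GATEWAY_MAX_TOKENS = 16384


def clamp_max_tokens(request: dict, limit: int = GATEWAY_MAX_TOKENS) -> dict:
    """Return a copy of a /v1/messages request body with max_tokens capped at `limit`.

    Hypothetical helper: the caller's dict is left unchanged, and a missing or
    already-small max_tokens value passes through as-is.
    """
    clamped = dict(request)  # shallow copy; nested messages are shared, not mutated
    requested = clamped.get("max_tokens")
    if isinstance(requested, int) and requested > limit:
        clamped["max_tokens"] = limit
    return clamped


def new_message_id(prefix: str = "msg_") -> str:
    """Build a message ID from a millisecond timestamp plus a random suffix.

    Hypothetical format. A timestamp alone can repeat when Claude Code sends
    internal requests concurrently. The 16-hex-digit random suffix makes two
    IDs created in the same millisecond very unlikely to match.
    """
    return f"{prefix}{int(time.time() * 1000)}_{secrets.token_hex(8)}"


if __name__ == "__main__":
    body = {"model": "claude-sonnet-4-6", "max_tokens": 32000, "messages": []}
    print(clamp_max_tokens(body)["max_tokens"])  # 16384
    print(new_message_id())
```

This sketch clamps rather than rejects an oversized `max_tokens`, which matches the note's wording. A client that asks for more than the Gateway allows still gets a response, just a shorter maximum one.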
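The Gateway's response-cache bypass for agentic callers, listed under Fixes Applied, can also be sketched as a caller-name check placed in front of the cache. This is an illustration under stated assumptions, not the Gateway's implementation. A plain exact-match `dict` stands in for the semantic cache. The functions `bypass_response_cache` and `serve` are hypothetical. Case-insensitive matching is assumed. The sketch also skips writing agentic responses into the cache, and the note does not say whether the real Gateway does the same. The marker substrings and the rule that bypassed callers are still logged both come from the note.

```python
from typing import Callable, Dict, List

# Caller-name substrings that trigger the bypass, as listed in the note.
AGENTIC_CALLER_MARKERS = ("claude-code", "codex", "copilot")


def bypass_response_cache(caller: str) -> bool:
    """Return True if the caller name contains any agentic marker.

    Case-insensitive matching is an assumption of this sketch.
    """
    name = caller.lower()
    return any(marker in name for marker in AGENTIC_CALLER_MARKERS)


def serve(
    caller: str,
    prompt: str,
    cache: Dict[str, str],
    call_upstream: Callable[[str], str],
    log: List[dict],
) -> str:
    """Answer a prompt, using the cache only for non-agentic callers.

    Hypothetical handler. An exact-match dict stands in for the semantic cache.
    """
    bypass = bypass_response_cache(caller)
    if not bypass and prompt in cache:
        answer, source = cache[prompt], "cache"
    else:
        answer, source = call_upstream(prompt), "upstream"
        if not bypass:
            # Assumption: agentic responses are not stored for later reuse.
            cache[prompt] = answer
    # Every request is logged, including bypassed agentic callers.
    log.append({"caller": caller, "source": source})
    return answer
```

For example, if the cache already holds a stale answer for a prompt, a `claude-code-companion` request for that prompt goes upstream and gets a fresh answer. A non-agentic caller asking the same prompt still gets the cached answer. Both requests appear in the log.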