40 lines
1.7 KiB
Markdown
40 lines
1.7 KiB
Markdown
# 2026-05-12 — Claude Code Gateway Fix
|
|
|
|
## Summary
|
|
|
|
Claude Code CLI now reaches the local Gateway Companion and the public LLM Gateway.
|
|
|
|
Verified smoke:
|
|
|
|
- Local endpoint: `ANTHROPIC_BASE_URL=http://127.0.0.1:11435`
|
|
- Model: `claude-sonnet-4-6`
|
|
- Result: `claude-debug10-ok`
|
|
- Gateway dashboard caller: `claude-code-companion`
|
|
- Dashboard tracked Sonnet and Haiku rows with tokens, cost, latency, and compression metadata.
|
|
|
|
## Fixes Applied
|
|
|
|
- Companion:
|
|
- Anthropic `/v1/messages` translation clamps `max_tokens` to Gateway limit `16384`.
|
|
- Streaming Anthropic responses no longer double-write HTTP headers.
|
|
- OpenAI-style assistant markers and prompt echo are sanitized before returning to Claude Code.
|
|
- Message IDs now include a random suffix to prevent concurrent Claude Code internal requests from colliding.
|
|
|
|
- Gateway:
|
|
- Response-cache bypass is enabled for agentic callers containing `claude-code`, `codex`, or `copilot`.
|
|
- These callers are still logged and compression metadata is still recorded.
|
|
- This avoids stale semantic-cache answers for coding agents.
|
|
|
|
## Verification Evidence
|
|
|
|
- Public health: `/api/dashboard/health` returned `ok`, database `connected`.
|
|
- Latest dashboard rows after the fix:
|
|
- `claude-code-companion`, `claude-sonnet-4-6`, `tokens_in=138`, `tokens_out=19`, latency about `441ms`.
|
|
- `claude-code-companion`, `claude-haiku-3`, title/internal request tracked separately.
|
|
|
|
## Boundaries
|
|
|
|
- Claude Code text/CLI path is usable through Gateway and tracked.
|
|
- Full native Anthropic tool-use parity is not complete; the Companion still flattens tool-related content into text for Gateway routing.
|
|
- Small smoke prompts often show `compression_mode=none:none`; this is expected when there are too few tokens to compress usefully.
|