llm-gateway/sync/history/2026-05-12-claude-code-gateway-fix.md
2026-05-12 22:56:24 +02:00

1.7 KiB

2026-05-12 — Claude Code Gateway Fix

Summary

Claude Code CLI now reaches the local Gateway Companion and the public LLM Gateway.

Verified smoke:

  • Local endpoint: ANTHROPIC_BASE_URL=http://127.0.0.1:11435
  • Model: claude-sonnet-4-6
  • Result: claude-debug10-ok
  • Gateway dashboard caller: claude-code-companion
  • Dashboard tracked Sonnet and Haiku rows with tokens, cost, latency, and compression metadata.

Fixes Applied

  • Companion:

    • Anthropic /v1/messages translation clamps max_tokens to Gateway limit 16384.
    • Streaming Anthropic responses no longer double-write HTTP headers.
    • OpenAI-style assistant markers and prompt echo are sanitized before returning to Claude Code.
    • Message IDs now include a random suffix to prevent concurrent Claude Code internal requests from colliding.
  • Gateway:

    • Response-cache bypass is enabled for agentic callers containing claude-code, codex, or copilot.
    • These callers are still logged and compression metadata is still recorded.
    • This avoids stale semantic-cache answers for coding agents.

Verification Evidence

  • Public health: /api/dashboard/health returned ok, database connected.
  • Latest dashboard rows after the fix:
    • claude-code-companion, claude-sonnet-4-6, tokens_in=138, tokens_out=19, latency about 441ms.
    • claude-code-companion, claude-haiku-3, title/internal request tracked separately.

Boundaries

  • Claude Code text/CLI path is usable through Gateway and tracked.
  • Full native Anthropic tool-use parity is not complete; the Companion still flattens tool-related content into text for Gateway routing.
  • Small smoke prompts often show compression_mode=none:none; this is expected when there are too few tokens to compress usefully.