llm-gateway/sync/CURRENT.md
2026-05-12 23:31:02 +02:00

# Claude Code Context — 2026-04-29
- **Last Updated:** 2026-04-29 ~20:30 (Session ongoing)
- **Session Type:** LLM Gateway / Codex Bridge Handoff
- **Working Directory:** `/Users/renefichtmueller/Desktop/Claude Code`
- **Model:** Haiku 4.5 (default), Opus for deep reasoning
- **Context Window:** Using lean-ctx MCP for compression

---
## Session Status
### Latest Verified State — 2026-05-12 23:30 Europe/Berlin
- Live hardening and verification completed:
  - GitHub Copilot bridge now binds to loopback by default (`127.0.0.1`) and reports stable diagnostic health instead of hiding startup/auth failures behind PM2 restarts.
  - The Copilot bridge health now exposes `auth_required`, host, package, last startup/output, and an explicit warning while `COPILOT_API_PACKAGE` is still `copilot-api@latest`.
  - Dashboard Client Coverage now shows bridge provider/runtime state per desktop client, not only local process/install detection.
- Live `/api/dashboard/clients?hours=24` verifies:
  - Codex Desktop / CLI: `live`, bridge `codex` ready, callers include `codex-cli`, `codex-live-gateway-check`, `codex-secure-tunnel-smoke`, `tokensSaved=4067`.
  - Claude Desktop / Claude Code: `live`, bridge `claude-code` ready, callers include `claude-code-companion`, `requestCount=28`.
  - Microsoft Copilot: local process detected, bridge `m365-copilot-bridge` remains `auth_required` until Microsoft Graph/device auth is configured.
  - GitHub Copilot: local process/bridge detected, bridge `copilot-bridge` remains `auth_required` until GitHub Copilot device login is completed.
- Fresh compression proof after deploy:
  - Caller `final-repeat-compression-smoke`, model `qwen2.5:14b`.
  - Compression mode `ctxlean:verbatim_compact`.
  - Tokens `8882 -> 106`, saved `8776`, savings `98.81%`.
- Gateway public health remains green: `/api/dashboard/health` returns `status=ok`, database `connected`.
- Operational note:
  - Cloudflare SSH fallback needed explicit Go DNS mode from Codex sandbox: `GODEBUG=netdns=go+1 cloudflared access ssh --hostname ssh.context-x.org`.
  - Direct SSH to Erik was intermittent/refused during deploy, but Cloudflare SSH with the DNS override completed restart and verification.
- Companion tool-use adapter added and verified:
  - Anthropic `tools` are summarized into a strict tool-use adapter instruction for the text backend.
  - OpenAI-style `tool_calls` or compact JSON tool decisions are converted back to Anthropic `tool_use` content blocks.
  - Forced `tool_choice: {type:"tool"}` now returns a valid `tool_use` block even if the text backend returns an empty response.
  - Streaming tool use emits `content_block_start`, `input_json_delta`, `content_block_stop`, `message_delta`, and `message_stop`.
- Synthetic proof:
  - Non-stream request with `read_file` returned `content[0].type=tool_use`, `name=read_file`, `input.path=/tmp/hello.txt`.
  - Streaming request returned valid Anthropic SSE tool-use events with `partial_json={"path":"/tmp/stream.txt"}`.
- Claude Code text path still works through Companion/Gateway after the tool adapter; latest CLI smoke reached Gateway and dashboard logged `claude-code-companion`.
- Remaining quality boundary:
  - Erik `/opt/claude-bridge/server.js` is text-only (`claude --print --output-format text`), so model-driven tool use does not yet have parity with the hosted Anthropic API.
  - The adapter now supports tool block transport and forced tool calls, but auto tool selection depends on the text backend following the tool JSON instruction.
  - Short exact-answer prompts may still be answered creatively by the subscription bridge; this is provider behavior, not a Companion protocol failure.
- Claude Code full CLI smoke now reaches the local Gateway Companion and public Gateway reliably:
  - Local Companion: `127.0.0.1:11435`.
  - Claude env: `ANTHROPIC_BASE_URL=http://127.0.0.1:11435`, `ANTHROPIC_API_KEY=gateway`, default Sonnet `claude-sonnet-4-6`.
  - Verified command returned exact clean result `claude-debug10-ok`.
  - Dashboard rows show caller `claude-code-companion`, models `claude-sonnet-4-6` and `claude-haiku-3`, tokens/cost/latency tracked.
- Fixes applied during verification:
  - Companion clamps Anthropic `max_tokens` to Gateway limit `16384`.
  - Companion emits Anthropic-compatible SSE without double-writing headers.
  - Companion sanitizes OpenAI-style assistant markers and prompt echo before returning to Claude Code.
  - Companion message IDs now include a random suffix to avoid concurrent `generate_session_title` vs main-request collisions.
  - Gateway live route bypasses response-cache for agentic callers whose name contains `claude-code`, `codex`, or `copilot`; these requests are still tracked and compression metadata is still recorded.
- Important boundary:
  - Claude Code text/CLI path is now usable through Gateway and tracked.
  - Full Anthropic tool-use fidelity is still adapter-level, not native Anthropic API parity; current bridge flattens tool requests to text for Gateway routing.
  - Small Claude Code smoke prompts often show `compression_mode=none:none` because there is no useful token reduction on tiny inputs; a larger Codex test already proved `ctxlean-rtk` savings.
- Secure bridge architecture is now in place for Gateway-routed subscription access:
  - MacStudio Codex bridge listens on `127.0.0.1:3253`.
  - Local M365 bridge listens on `127.0.0.1:3257` but remains auth-required.
  - Cloudflare-Access SSH reverse tunnel exposes only Erik loopback listeners `127.0.0.1:3353` and `127.0.0.1:3357`.
  - Gateway live env points `CODEX_BRIDGE_URL` / `OPENAI_CODEX_URL` to `http://127.0.0.1:3353`.
- End-to-end Codex via Gateway works and is tracked:
  - Caller `codex-secure-tunnel-smoke`.
  - Model `gpt-5.1-codex-mini`.
  - Dashboard request row recorded tokens, latency, cost, and compression metadata.
- New local Codex starts are configured for Gateway:
  - `~/.codex/config.toml` default provider `llm-gateway`, `wire_api = "responses"`, `env_key = "LLM_GATEWAY_API_KEY"`.
  - `~/.zshrc` sets OpenAI-compatible Gateway env vars and aliases `codex` to the Gateway profile.
- Local Gateway Companion is running on `127.0.0.1:11435` for desktop/CLI clients that need a local endpoint.
  - It forwards OpenAI-compatible calls to `https://llm-gateway.context-x.org`.
  - It translates Claude/Anthropic `/v1/messages` text calls to Gateway `/v1/chat/completions`.
  - Claude Companion smoke with model `claude-sonnet-4-6` returned content and was tracked.
- Claude model alias warning:
  - `claude-sonnet-4-1` was stale for current Claude Code bridge behavior and produced empty/failing output.
  - Live Gateway provider metadata was corrected to expose `claude-sonnet-4-6`.
  - `claude-sonnet-4-6`, `sonnet`, or the default bridge model all work.
- Remaining auth blockers:
  - GitHub Copilot bridge remains `auth_required`.
  - M365 Copilot bridge remains `auth_required` until real Microsoft Graph delegated auth/client config exists.
- Truth boundary:
  - Gateway can track/compress only requests that enter it before provider execution.
  - Existing native app sessions must be restarted or explicitly configured to use Gateway/Companion.
  - Full Claude Code tool-call translation through Anthropic `/v1/messages` is not finished; current Companion support is text-compatible and enough for tracking text calls.
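
The tool-use adapter described earlier in this section (OpenAI-style `tool_calls` mapped back to Anthropic `tool_use` blocks, with a fallback for forced `tool_choice`) might look roughly like the sketch below. This is not the Companion's actual code: the function name `to_tool_use_blocks`, the `toolu_` ID format, and the empty-input fallback are assumptions made for illustration. Only the field names of the two public wire formats are taken as given.

```python
import json
import secrets


def to_tool_use_blocks(assistant_message, forced_tool=None):
    """Convert an OpenAI-style assistant message into Anthropic content blocks.

    assistant_message: dict shaped like an OpenAI chat completion message.
    forced_tool: tool name taken from a forced tool_choice, if any (hypothetical parameter).
    """
    blocks = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call.get("function", {})
        try:
            # OpenAI encodes arguments as a JSON string; Anthropic expects an object.
            tool_input = json.loads(fn.get("arguments") or "{}")
        except json.JSONDecodeError:
            tool_input = {}
        if not isinstance(tool_input, dict):
            tool_input = {}
        blocks.append({
            "type": "tool_use",
            # Random suffix keeps IDs unique across concurrent requests (assumed format).
            "id": "toolu_" + secrets.token_hex(8),
            "name": fn.get("name", ""),
            "input": tool_input,
        })

    if not blocks and forced_tool:
        # Forced tool choice: return a valid block even if the backend answered with nothing.
        blocks.append({
            "type": "tool_use",
            "id": "toolu_" + secrets.token_hex(8),
            "name": forced_tool,
            "input": {},
        })

    if not blocks:
        # No tool decision at all: pass the text through unchanged.
        blocks.append({"type": "text", "text": assistant_message.get("content") or ""})
    return blocks
```

Called with a message whose single tool call names `read_file` and carries the arguments `{"path": "/tmp/hello.txt"}`, the sketch yields one block with `type` `tool_use`, `name` `read_file`, and `input.path` `/tmp/hello.txt`, which is the same shape as the non-stream synthetic proof.
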
### Previous Verified State — 2026-05-12
- Public gateway is reachable:
  - `/api/dashboard/health` returns `ok`, database `connected`.
  - `/v1/models` returns the configured model list.
  - `/v1/chat/completions` accepted a live smoke request from caller `codex-live-gateway-check` and returned `gateway-check-ok`.
- Tracking works for requests that actually enter the gateway:
  - Smoke request was recorded in `/api/dashboard/requests`.
  - 24h metrics showed `8` tracked requests, all routed to `qwen2.5:14b`.
  - Compression metrics are recorded, but current 24h savings were low: `25` tokens saved across `3` compression operations.
- Not everything is currently going through the gateway:
  - `codex-desktop` is marked `live` because of tracked gateway callers, but the configured MacStudio bridge `http://192.168.178.213:3253` was unreachable from Erik during the check.
  - `microsoft-copilot` is running locally but has `0` gateway requests and the configured MacStudio bridge `http://192.168.178.213:3257` was unreachable from Erik.
  - GitHub Copilot bridge is running but returns `auth_required`.
  - M365 Copilot bridge is running but returns `auth_required` / missing Microsoft Graph auth.
  - Claude bridge is healthy and ready.
- Security note:
  - Starting local Codex Bridge on `0.0.0.0` via PM2 was blocked by policy because it would expose local Codex access on the LAN. Use explicit approval plus a narrow network rule or a safer tunnel approach before enabling this persistently.
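
The loopback-only binding rule in the security note above can be checked mechanically before a bridge starts. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `is_safe_bind_host` is hypothetical and not part of the bridge code.

```python
import ipaddress


def is_safe_bind_host(host):
    """Return True only if `host` is a loopback IP literal such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat as unsafe rather than guess.
        return False


for candidate in ("127.0.0.1", "0.0.0.0", "192.168.178.213"):
    print(candidate, is_safe_bind_host(candidate))
```

Only `127.0.0.1` passes here; `0.0.0.0` is the unspecified address (all interfaces), and `192.168.178.213` is a LAN address.
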
### Active Work
- **Scope:** Sync all chat history + context into `sync/` handoff folder for Codex integration
- **Repos Modified:** llm-gateway (sync/* only, no code changes)
- **Branch:** main (no branching, sync/* only)
### Current Tasks
1. ✅ Create `sync/README.md` — handoff format documentation
2. ⏳ Create `sync/CURRENT.md` — this file, current status
3. ⏳ Create `sync/history/2026-04-29-sync-handoff.md` — session snapshot
4. ⏳ Git commit sync/* files
5. ⏳ Git push to Gitea (origin main)
6. ⏳ Notify Codex via handoff mechanism
### Blockers
- None — proceed with autonomous execution (per Memory: no confirmations needed)
---
## Key Context
### Projects Active
1. **LLM Gateway** (`/llm-gateway/`)
   - Stack: Fastify TypeScript monorepo (gateway + learning + client + fine-tuner)
   - Live: https://llm-gateway.context-x.org (Port 3103 on Erik)
   - DB: PostgreSQL llm_gateway on Erik (user: llm, pw: llm_secure_2026)
   - Last Deploy: 2026-04-09 (23-dimension request scoring + free LLM fallback chain)
   - Status: ✅ Running (PM2 id 19+20)
   - **Codex Bridge:** New in this session (`/copilot-bridge/server.js`, for Codex integration)
2. **Transceiver Intelligence Platform (TIP)** (`github-repos/transceiver-db/`)
   - Live: https://transceiver-db.fichtmueller.org
   - Stack: PostgreSQL 17 + TimescaleDB + Qdrant + Cloudflare R2
   - Features: Real-time pricing, Norton-Bass Hype Cycle, FAQ/KB, MCP Server
   - Blog LLM: claude-bridge provider (switched from Ollama 2026-04-09)
   - Status: ✅ Functional
3. **MAGATAMA Security Platform** (in planning)
   - Status: S6 SHIN (ShieldX) + S2 TEN (ShieldY) functional
   - Next: S1/S3/S4/S5/S7 planning
   - Obsidian Docs: `/Users/renefichtmueller/Documents/ObsidianBrain/projects/magatama/wiki/`
---
## Erik / Infrastructure Status
### SSH Access
- **Primary:** Port 22 (via UFW ALLOW from Rene home IP 83.135.64.79)
- **Backup:** Port 2222 (systemd drop-in)
- **WireGuard:** jumphost for remote access
- **Serial Console:** sossh-rhr.online-server.cloud (IONOS OOB)
### Running Services (Erik .82)
- ✅ PostgreSQL 17 (llm_gateway, ctxmeet, others)
- ✅ Proxmox (infrastructure, .10)
- ✅ Ollama (via https://ollama.fichtmueller.org)
- ✅ PM2 Services:
  - id 19+20: LLM Gateway (port 3103)
  - id 41: claude-bridge (port 3250)
  - peercortex (port 3101)
  - ctxevent/nognet (port 3001)
- ⚠️ ShieldY: **Unknown status** — 846 restarts on Mac Studio (blocked until fixed)
### Security Notes
- ✅ SSH UFW rules: home IP whitelisted (Rule #1, #2 before LIMIT)
- ✅ Backups: Daily to Fearghas (12h, `/opt/scripts/daily-backup-fearghas.sh`)
- ⚠️ SFTP: Disabled on Synology (workaround: `scp -O` legacy mode in backup script)
---
## Changed Files (Uncommitted)
From `git status` in llm-gateway:
**Modified (code changes — NOT STAGED for sync commit):**
- Dockerfile, docker-compose.yaml
- copilot-bridge/server.js
- deploy/ecosystem.config.cjs, package-lock.json
- packages/gateway/package.json, public/dashboard.html
- packages/gateway/src/config/models.yaml
- packages/gateway/src/modules/request-logger.ts
- packages/gateway/src/pipeline/* (3 files)
- packages/gateway/src/routes/* (3 files)
- packages/gateway/src/security/tls-config.ts
- packages/gateway/src/server.ts
- packages/gateway/src/utils/tokenvault-hooks.ts

**Untracked Dirs (NEW):**
- codex-bridge/
- m365-copilot-bridge/
- packages/browser-extension/
- packages/companion/
- packages/mcp-router/, packages/mcp-server/, packages/mcp-tools/

**Untracked Files (DB migrations + modules):**
- 004-semantic-cache.sql, 005-fuzzy-cache.sql, 006-mcp-tool-calls.sql
- admin-auth.ts, bridge-spawner.ts, caller-detection.ts, caller-stats.ts
- context-compressor.ts, embedding-client.ts, gamification.ts
- knowledge-memory.ts, memory-graph.ts, race-leaderboard.ts, race-mode.ts
- report-generator.ts, response-cache.ts, savings-calculator.ts
- settings-store.ts, share-card.ts, subscription-discovery.ts
- subscription-wallet.ts

**⚠️ POLICY:** Only `sync/*` files committed/pushed in this session. Code changes staged separately (AFTER code review).

---
## Next Safe Steps (for Codex / Next Claude Session)
### Immediate (Safe to Execute)
1. `git add sync/*` (stage handoff files only)
2. `git commit -m "sync: add chat handoff for Codex integration (2026-04-29)"` (commit)
3. `git push origin main` (push to Gitea)
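
Before the commit in step 2, the `sync/*`-only policy can be verified against the staged file list. This is a minimal sketch, assuming the list comes from `git diff --cached --name-only`, which prints paths relative to the repository root; the function name `non_sync_paths` and the sample paths are illustrative only.

```python
def non_sync_paths(staged_paths):
    """Return the staged paths that fall outside the sync/ folder."""
    return [p for p in staged_paths if p and not p.startswith("sync/")]


# Example input, in the form `git diff --cached --name-only` would print it.
staged = ["sync/CURRENT.md", "sync/README.md", "copilot-bridge/server.js"]
offenders = non_sync_paths(staged)
print(offenders)  # ['copilot-bridge/server.js']
```

An empty result means the staged set respects the policy; anything else should be unstaged before committing.
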
### Code Review (After Handoff)
1. Review copilot-bridge/server.js + new packages/* (code-reviewer agent)
2. Security scan all new modules (security-reviewer agent)
3. Stage + commit code changes in a separate PR (per development-workflow.md)
4. Deploy to Erik after approval
### Codex Integration
1. Codex reads this CURRENT.md on session start
2. Codex continues with code review workflow (not skipping security)
3. Codex pushes new history entry at session end
---
## Warnings / Blockers
### 🔴 CRITICAL
- **ShieldY Mac Studio:** 846 restarts; MUST FIX before production deployment
  - Issue: Unknown crash pattern
  - Next: Use **debug** skill to diagnose, then **build-fix** agent
  - Blocked: MAGATAMA deployment until resolved
### 🟡 MEDIUM
- **Codex Bridge:** New component, needs security scan + testing
- **m365-copilot-bridge:** New (untracked), purpose unknown — document + review
- **UFW SSH Rate Limiting:** Rene home IP whitelisted, but new IPs could get blocked
  - Workaround: `ufw insert 1 allow from <ip> to any port 22`
### 🟢 LOW
- SFTP disabled on Synology — currently using scp -O workaround (acceptable)
- Ollama tunnel via Cloudflare (no direct IP) — acceptable for current load
---
## Instructions for Codex / Next Session
**On Session Start:**
1. `cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway`
2. Read `sync/CURRENT.md` (this file) — has all context
3. `git status` — should show only modifications (code) + untracked (code)
4. Proceed with code review workflow (DON'T skip security)

**On Session End:**
1. Create new `sync/history/YYYY-MM-DD-topic.md` entry (copy template below)
2. Update `sync/CURRENT.md` with new status
3. `git add sync/* && git commit ... && git push` (sync/* only)
4. Code commits handled separately (per development-workflow.md)

**History Entry Template:**
```markdown
# Session: [Topic] — YYYY-MM-DD
**Duration:** HH:MM
**Agent:** Codex / Claude Code Opus
**Status:** ✅ Complete / ⏳ Ongoing / ❌ Blocked
## Achievements
- [ ] Task 1
- [ ] Task 2
## Remaining
- [ ] Task 3 (blockers: X)
- [ ] Task 4 (next: Y)
## Files Changed
- code/* — staged for review
- sync/* — handoff updated
## Context Used
- ~XXX tokens (Haiku / Opus)
- Lean-ctx compression: Y% savings
```
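
The `sync/history/YYYY-MM-DD-topic.md` naming rule from the session-end steps could be generated as in the sketch below. The helper name `history_entry_path` and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions inferred from the existing entry `2026-04-29-sync-handoff.md`, not a documented convention.

```python
import re
from datetime import date


def history_entry_path(day, topic):
    """Build a sync/history path such as sync/history/2026-04-29-sync-handoff.md."""
    # Lowercase the topic and collapse anything non-alphanumeric into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"sync/history/{day.isoformat()}-{slug}.md"


print(history_entry_path(date(2026, 4, 29), "Sync Handoff"))
# prints: sync/history/2026-04-29-sync-handoff.md
```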
---
**End of CURRENT.md**