Was hardcoded to qwen2.5:3b. Now reads from process.env.LLM_JUDGE_MODEL,
with qwen2.5:3b as the fallback.
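A minimal sketch of the env-with-fallback lookup described above. The helper
name resolveJudgeModel and the choice to treat a blank value as unset are
assumptions for illustration, not the actual gateway code.

```typescript
// Old hardcoded value, now used only as the fallback.
const DEFAULT_JUDGE_MODEL = "qwen2.5:3b";

type JudgeEnv = Record<string, string | undefined>;

// The environment is passed in explicitly so the helper can be tested
// without mutating process.env. Blank values count as "not set" (assumption).
function resolveJudgeModel(env: JudgeEnv): string {
  const configured = env["LLM_JUDGE_MODEL"]?.trim();
  return configured ? configured : DEFAULT_JUDGE_MODEL;
}
```

In the gateway this would presumably be called as resolveJudgeModel(process.env).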
Production env updated to magatama-coder:judge-r1 — a snapshot of the
magatamallm post-chunk-4 LoRA adapter exported via train.py --export-only.
Chunk-4 picked because it had the best val_loss (0.861) of the 5 balanced
chunks; chunk-5 spiked back to val=2.531.
Sanity test on the new judge model:
  injection prompt -> "INFORMATIONAL" (not the strict INJECTION word we'd
                      want; the judge needs a dedicated Phase-2 fine-tune
                      on the binary classification format)
  safe prompt      -> "SAFE" (correct)
Implication: INJECTION_DEFENSE_MODE is staying at 'block' for now —
switching to 'llm_judge' mode with this provisional judge would actually
weaken defense because magatamallm's training tilts toward operator-task
output ("here's the fix") rather than binary INJECTION/SAFE classification.
Follow-up (Phase 2): train a dedicated `magatama-judge` model — small base
(Qwen 2.5:1.5b or Phi-3-mini), trained purely on injection-classification
SFT pairs extracted from our existing:
- llm-security-prompt-injection-2026-05-12.train.jsonl
- pulso-magatama-injection-guard-2026-05-13.train.jsonl
- guard-exposure-firewall-verified-2026-05-16.train.jsonl
- jailbreak-corpus-candidates.jsonl (L1B3RT4S gaps)
- benign samples from train.jsonl labeled SAFE
Architecture rationale: separation of concerns. Even if an attacker
manipulates the primary backbone model, the judge stays independent.
~5-10k pairs should be enough for a focused 1.5B classifier. Training takes
~2-3h on Mac Studio MPS.
Adds a second defense layer between Layer-1 regex (62 patterns) and the
existing Layer-3 llm_judge. Calls a FastAPI sidecar running on the Mac
Studio (port 9091, MPS) that wraps
protectai/deberta-v3-base-prompt-injection-v2 (public model, no auth
needed, ~50-400ms inference).
modules/prompt-guard-client.ts:
- callPromptGuard(input): opportunistic, never throws
- isPromptGuardConfigured(): true if PROMPT_GUARD_URL is set
- getPromptGuardThreshold(): default 0.85
- getPromptGuardMinLen(): default 16 chars (skips tiny inputs)
routes/completion.ts:
- New Layer-2 block between regex scan and llm_judge: when Layer-1
didn't detect and input is long enough, ask the sidecar. If sidecar
returns INJECTION with score >= threshold, return HTTP 422 with
error.prompt_guard payload (score + latency).
- Fail-open: sidecar timeout/error logs a warning and the request
falls through to llm_judge / cache / model — never blocks legitimate
traffic due to sidecar issues.
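The Layer-2 decision described above can be sketched as two pure functions.
This is an illustration under assumptions: the sidecar response shape
(label, score, latencyMs), the names shouldAskSidecar and decideFromGuard,
and the "continue" reasons are hypothetical. A null result stands for a
sidecar timeout or error.

```typescript
// Assumed sidecar response shape; the real payload may differ.
interface PromptGuardResult {
  label: "INJECTION" | "SAFE";
  score: number; // classifier confidence in [0, 1]
  latencyMs: number;
}

type Layer2Decision =
  | { action: "block"; status: 422; error: { prompt_guard: { score: number; latency_ms: number } } }
  | { action: "continue"; reason: string };

// Ask the sidecar only when Layer-1 found nothing and the input is long enough.
function shouldAskSidecar(input: string, layer1Detected: boolean, minLen = 16): boolean {
  return !layer1Detected && input.length >= minLen;
}

// Fail-open: a missing result (timeout/error) never blocks the request.
function decideFromGuard(guard: PromptGuardResult | null, threshold = 0.85): Layer2Decision {
  if (guard === null) {
    return { action: "continue", reason: "sidecar-unavailable" };
  }
  if (guard.label === "INJECTION" && guard.score >= threshold) {
    return {
      action: "block",
      status: 422,
      error: { prompt_guard: { score: guard.score, latency_ms: guard.latencyMs } },
    };
  }
  return { action: "continue", reason: "below-threshold" };
}
```

Keeping the decision separate from the network call means the fail-open
behavior can be unit-tested without a running sidecar.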
Env (set in ecosystem.config.js):
  PROMPT_GUARD_URL        http://192.168.178.213:9091
  PROMPT_GUARD_THRESHOLD  0.70 (lowered from the 0.85 default after empirical testing)
  PROMPT_GUARD_TIMEOUT    1500 ms
Sidecar code lives at:
~/magatama-llm/prompt-guard-sidecar/server.py (Mac Studio)
launched via ~/Library/LaunchAgents/org.fichtmueller.prompt-guard-sidecar.plist
Smoke tests after deploy:
  Layer-1 caught: German "ignoriere..."           -> HTTP 422
  Layer-2 caught: English "pretend no restrict.." -> HTTP 422 (pg_score 0.9999)
  Layer-1 caught: Bangla-romanized                -> HTTP 422 (intended as a Layer-2 test, but Layer-1 matched first)
  Benign:         "Explain DNS in 2 sentences"    -> HTTP 200
Closes the multilingual bypass gap. Previously covered EN/DE/FR/ES/IT/RU/ZH/JA.
Now also: Bangla, Hindi, Arabic, Hebrew, Persian, Turkish, Vietnamese, Thai,
Korean, Polish, Dutch, Indonesian, Tagalog, Swahili.
Plus a universal non-Latin-script soft-flag pattern (severity=medium) that
catches ≥20 chars of Arabic/Bengali/Devanagari/Hebrew/Thai/Hangul/Han/
Hiragana/Katakana/Cyrillic/Tamil/Telugu/Gujarati/Gurmukhi/Myanmar/Khmer/
Lao/Tibetan/Georgian/Armenian/Sinhala — surfaces in scan result without
auto-blocking, so legitimate non-Latin prompts pass while the operator
can route them to llm_judge for deep inspection.
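A rough sketch of how such a soft flag could be computed with Unicode script
property escapes. It assumes the ≥20 threshold counts all matching characters
in the input (not one contiguous run); the function name softFlagNonLatin and
the result shape are hypothetical.

```typescript
// Scripts listed in the commit message; each name is a valid Unicode Script value.
const SOFT_FLAG_SCRIPTS = [
  "Arabic", "Bengali", "Devanagari", "Hebrew", "Thai", "Hangul", "Han",
  "Hiragana", "Katakana", "Cyrillic", "Tamil", "Telugu", "Gujarati",
  "Gurmukhi", "Myanmar", "Khmer", "Lao", "Tibetan", "Georgian", "Armenian",
  "Sinhala",
];

// Builds a character class like [\p{Script=Arabic}\p{Script=Bengali}...].
const scriptClass = SOFT_FLAG_SCRIPTS.map((s) => `\\p{Script=${s}}`).join("");
const NON_LATIN_CHAR = new RegExp(`[${scriptClass}]`, "gu");

interface SoftFlag {
  flagged: boolean;
  severity: "medium" | null;
  count: number; // number of characters from the listed scripts
}

// Flags without blocking: the caller decides whether to route to llm_judge.
function softFlagNonLatin(input: string, minChars = 20): SoftFlag {
  const count = (input.match(NON_LATIN_CHAR) ?? []).length;
  const flagged = count >= minChars;
  return { flagged, severity: flagged ? "medium" : null, count };
}
```

Because the result is only a flag, short greetings in any script stay below
the threshold, and longer non-Latin prompts are surfaced rather than rejected.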
Pattern-engineering notes:
- Devanagari / Bengali / Hebrew need optional matra/suffix tolerance
- Turkish needs \p{L} instead of \w because ı/ş/ç fall outside ASCII \w
- Persian (SOV) needs both VSO and SOV order alternation
- Hebrew needs מ/ב/כ/ל preposition prefix tolerance
- Tagalog needs optional ang/sa article between verb and noun
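The Turkish note above can be illustrated directly: without extra flags, \w
covers only [A-Za-z0-9_], so a word containing ı, ş, or ç fails a \w+ match,
while \p{L} with the u flag accepts it. The helper names below are
hypothetical.

```typescript
// ASCII-only word check: \w is [A-Za-z0-9_] here.
function matchesAsciiWord(token: string): boolean {
  return /^\w+$/.test(token);
}

// Unicode-aware check: \p{L} matches any letter and requires the "u" flag.
// Built via the RegExp constructor so it does not depend on the compile target.
const UNICODE_WORD = new RegExp("^\\p{L}+$", "u");

function matchesUnicodeWord(token: string): boolean {
  return UNICODE_WORD.test(token);
}

// "talimatları" (Turkish: "the instructions") contains the dotless ı (U+0131).
const turkishSample = "talimatları";
const asciiResult = matchesAsciiWord(turkishSample);     // false
const unicodeResult = matchesUnicodeWord(turkishSample); // true
```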
Smoke-tested 14/14 languages → all HTTP 422 blocked.
Negative-tested 3 benign non-Latin prompts (jp-weather, ar-greeting,
th-thanks) → all HTTP 200 pass. Zero false positives.
Total active patterns: 62 across 6 categories.
- Fix HTTP/2 INTERNAL_ERROR on the /api/stream/requests endpoint
- Use reply.raw.writeHead() instead of the Fastify headers API for SSE
- Add a 30s heartbeat to keep the connection alive
- Emit the proper event format with 'event:' and 'data:' fields
- Add error handling and cleanup on disconnect
- Mirror the working pattern from the /api/stream/costs endpoint
- Resolve the dashboard's perpetual 'Loading...' state
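The SSE pattern from the list above can be sketched as follows. It is written
against a minimal RawResponse interface (the slice of the raw Node response
used here) instead of importing Fastify. The names openSseStream and
formatSseEvent, and the use of an SSE comment line as the heartbeat, are
assumptions.

```typescript
// Minimal slice of the raw response object used by this sketch.
interface RawResponse {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
  on(event: "close", listener: () => void): unknown;
}

// One SSE message: an 'event:' field, a 'data:' field, and a blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

function openSseStream(raw: RawResponse, heartbeatMs = 30000) {
  // Headers are written on the raw response, bypassing the framework's header API.
  raw.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });

  let closed = false;
  // Lines starting with ':' are SSE comments; clients ignore them.
  const heartbeat = setInterval(() => {
    if (!closed) raw.write(": heartbeat\n\n");
  }, heartbeatMs);

  // Cleanup on disconnect: stop the timer and refuse further writes.
  const close = (): void => {
    if (closed) return;
    closed = true;
    clearInterval(heartbeat);
  };
  raw.on("close", close);

  return {
    send(event: string, data: unknown): void {
      if (!closed) raw.write(formatSseEvent(event, data));
    },
    close,
  };
}
```

A route handler would call openSseStream(reply.raw) and then push updates
with stream.send("request", payload).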
The learning process was failing to communicate with the gateway because:
1. Gateway was sending 'Strict-Transport-Security' header on HTTP responses
2. Node.js fetch respects HSTS and upgrades subsequent requests to HTTPS
3. Gateway only has HTTP listener (localhost:3103), no HTTPS
4. Result: SSL 'packet length too long' error on second request attempt
Solution: Modified registerHSTSMiddleware to send the HSTS header only when
the connection is already secure (HTTPS or x-forwarded-proto: https).
Plain HTTP responses no longer carry the header, which prevents the forced
upgrade.
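The secure-connection check can be sketched as a pure helper. This is an
illustration, not the actual registerHSTSMiddleware: the function names, the
handling of a comma-separated x-forwarded-proto value, and the max-age value
are assumptions.

```typescript
// True when the request arrived over HTTPS, either directly or via a proxy
// that reports the original scheme in x-forwarded-proto.
function isSecureRequest(protocol: string, forwardedProto?: string): boolean {
  if (protocol === "https") return true;
  // The header may carry a comma-separated list; take the first entry.
  const [first = ""] = (forwardedProto ?? "").split(",");
  return first.trim().toLowerCase() === "https";
}

// Returns the headers to add: HSTS only on secure connections, nothing on plain HTTP.
function hstsHeaders(protocol: string, forwardedProto?: string): Record<string, string> {
  return isSecureRequest(protocol, forwardedProto)
    ? { "Strict-Transport-Security": "max-age=31536000; includeSubDomains" }
    : {};
}
```

This matches RFC 6797, which says an HSTS host must not send the header over
non-secure transport.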
- Add openai-bridge service (port 3251) for ChatGPT and Codex integration
- Update external-providers.ts with openai and chatgpt provider definitions
- Add GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models to provider registry
- Modify getApiKey() to handle bridge provider authentication
- Modify getBaseUrl() to construct URLs from env vars
- Update ecosystem.config.cjs with OPENAI_BRIDGE_URL and OPENAI_API_KEY config
- Add openai-bridge PM2 service configuration (port 3251)
- Support both claude-bridge (port 3250) and openai-bridge (port 3251) as subscription services
- Extend fallback chain: claude → openai/chatgpt → cerebras → groq → mistral → nvidia → cloudflare
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- completion.ts now uses taskType directly for resolvePrompt (not decision.prompt_template)
so tip_transceiver_enrich.yaml is used instead of generic_qa fallback template
- routing-rules.yaml: +40 task type entries for TIP (8), EO Pulse (8), SwitchBlade (9),
  PeerCortex (6), NOGnet (9), internal (2), all with correct model tier
  assignments:
  - qwen2.5:3b for fast tasks (classify, short outputs)
  - qwen2.5:14b for medium (most analysis tasks)
  - qwen2.5:32b for large (blog posts, detailed reports, CSRD)
Gateway was reading ollama_base_url from YAML (192.168.178.169) instead of
OLLAMA_URL env var (https://ollama.fichtmueller.org). Fix getOllamaBaseUrl()
to prefer process.env['OLLAMA_URL'] and update YAML default to CF tunnel.
Gateway and learning DB clients now prefer DATABASE_URL connection string
over individual DB_* env vars — matches ecosystem.config.cjs convention.
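A sketch of the precedence rule described above. The individual variable
names (DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME), their defaults, and
the resolveDbConfig name are assumptions. The commit message only says
"DB_*".

```typescript
type DbEnv = Record<string, string | undefined>;

type DbConfig =
  | { connectionString: string }
  | { host: string; port: number; user: string; password: string; database: string };

// DATABASE_URL wins whenever it is set; the DB_* variables are a fallback only.
function resolveDbConfig(env: DbEnv): DbConfig {
  const url = env["DATABASE_URL"];
  if (url) {
    return { connectionString: url };
  }
  return {
    host: env["DB_HOST"] ?? "localhost",      // hypothetical default
    port: Number(env["DB_PORT"] ?? "5432"),   // hypothetical default
    user: env["DB_USER"] ?? "",
    password: env["DB_PASSWORD"] ?? "",
    database: env["DB_NAME"] ?? "",
  };
}
```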
Ollama health check timeout increased 5→15s for Cloudflare tunnel latency.