From c731900a90853662720d70c8307a85f6dc79d9a7 Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sat, 16 May 2026 23:36:26 +0200
Subject: [PATCH] sec(gateway): Layer-3 llm_judge model now configurable via LLM_JUDGE_MODEL env
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Was hardcoded to qwen2.5:3b. Now reads from process.env['LLM_JUDGE_MODEL'],
falling back to qwen2.5:3b when the variable is unset or empty.

Production env updated to magatama-coder:judge-r1, a snapshot of the
magatamallm post-chunk-4 LoRA adapter exported via train.py --export-only.
Chunk-4 was picked because it had the best val_loss (0.861) of the 5
balanced chunks; chunk-5 spiked back to val_loss 2.531.

Sanity test on the new judge model:
  injection prompt -> "INFORMATIONAL" (not the strict INJECTION word we'd
                      want; the judge needs a Phase-2 dedicated fine-tune
                      on binary classification format)
  safe prompt      -> "SAFE" (correct)

Implication: INJECTION_DEFENSE_MODE is staying at 'block' for now.
Switching to 'llm_judge' mode with this provisional judge would actually
weaken defense, because magatamallm's training tilts toward operator-task
output ("here's the fix") rather than binary INJECTION/SAFE classification.

Follow-up (Phase 2): train a dedicated `magatama-judge` model on a small
base (Qwen 2.5:1.5b or Phi-3-mini), trained purely on
injection-classification SFT pairs extracted from our existing data:
  - llm-security-prompt-injection-2026-05-12.train.jsonl
  - pulso-magatama-injection-guard-2026-05-13.train.jsonl
  - guard-exposure-firewall-verified-2026-05-16.train.jsonl
  - jailbreak-corpus-candidates.jsonl (L1B3RT4S gaps)
  - benign samples from train.jsonl labeled SAFE

Architecture rationale: separation of concerns. Even if an attacker
manipulates the primary backbone model, the judge stays independent.
~5-10k pairs should be enough for a focused 1.5B classifier.
Training ~2-3h on Mac Studio MPS.
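
Illustrative sketch (not part of this patch): the env fallback and the
reason a non-binary judge answer is a problem can be shown in a few lines.
`resolveJudgeModel`, `parseVerdict`, and the `UNKNOWN` verdict are
hypothetical names used only for illustration; how `llmJudge` actually
parses the model output is not shown in this diff.

```typescript
// Hypothetical helpers for illustration; the patch inlines the fallback
// expression directly in completion.ts.
const DEFAULT_JUDGE_MODEL = "qwen2.5:3b";

type Env = Record<string, string | undefined>;

// `||` (not `??`) means an empty LLM_JUDGE_MODEL also falls back.
function resolveJudgeModel(env: Env): string {
  return env["LLM_JUDGE_MODEL"] || DEFAULT_JUDGE_MODEL;
}

type Verdict = "INJECTION" | "SAFE" | "UNKNOWN";

// Assumed strict parsing: only the two expected words count as verdicts.
// Anything else (e.g. "INFORMATIONAL") is reported as UNKNOWN so the
// caller can decide how to handle it instead of treating it as SAFE.
function parseVerdict(raw: string): Verdict {
  const word = raw.trim().toUpperCase();
  if (word === "INJECTION") return "INJECTION";
  if (word === "SAFE") return "SAFE";
  return "UNKNOWN";
}

// In the gateway this would be called as resolveJudgeModel(process.env).
console.log(resolveJudgeModel({ LLM_JUDGE_MODEL: "magatama-coder:judge-r1" }));
console.log(parseVerdict("INFORMATIONAL"));
```

Under these assumptions, the provisional judge's "INFORMATIONAL" answer to
an injection prompt maps to neither verdict, which is why the defense mode
stays at 'block' until a dedicated classifier exists.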
---
 packages/gateway/src/routes/completion.ts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/gateway/src/routes/completion.ts b/packages/gateway/src/routes/completion.ts
index 3e4fb2b..f52a966 100644
--- a/packages/gateway/src/routes/completion.ts
+++ b/packages/gateway/src/routes/completion.ts
@@ -449,7 +449,7 @@ async function executeCompletion(body: CompletionRequest, startMs: number, callI
   if (action === 'llm_judge') {
     try {
       const verdict = await llmJudge(body.input, {
-        model: 'qwen2.5:3b',
+        model: process.env['LLM_JUDGE_MODEL'] || 'qwen2.5:3b',
         callLLM: async (req) => {
           const resp = await callOllama(
             { model: req.model, prompt: req.prompt, system: req.system, stream: false, options: { temperature: 0, num_predict: 8, ...(req.options ?? {}) } },