sec(gateway): Layer-3 llm_judge model now configurable via LLM_JUDGE_MODEL env
The judge model was hardcoded to qwen2.5:3b. It is now read from
process.env['LLM_JUDGE_MODEL'], falling back to qwen2.5:3b when the
variable is unset or empty.
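The fallback rule can be sketched in isolation as follows. This is a minimal illustration, not code from this commit: `resolveJudgeModel` and `DEFAULT_JUDGE_MODEL` are hypothetical names, and trimming whitespace is an added assumption (the committed `||` expression only falls back on unset or empty values).

```typescript
// Hypothetical helper, not part of the commit: mirrors the `||` fallback
// and additionally treats whitespace-only values as unset (an assumption).
const DEFAULT_JUDGE_MODEL = 'qwen2.5:3b';

function resolveJudgeModel(env: Record<string, string | undefined>): string {
  const configured = env['LLM_JUDGE_MODEL']?.trim();
  // undefined and '' are both falsy, so both fall through to the default.
  return configured ? configured : DEFAULT_JUDGE_MODEL;
}
```

Called as `resolveJudgeModel(process.env)`, this would return the production tag when the variable is set and `qwen2.5:3b` otherwise. Using `||` rather than `??` matters here: an empty `LLM_JUDGE_MODEL=` line in an env file still yields the default instead of an empty model name.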
Production env updated to magatama-coder:judge-r1, a snapshot of the
magatamallm post-chunk-4 LoRA adapter exported via train.py --export-only.
Chunk 4 was picked because it had the best val_loss (0.861) of the 5
balanced chunks; chunk 5 spiked back to val_loss 2.531.
Sanity test on the new judge model:
  injection prompt -> "INFORMATIONAL" (not the strict INJECTION word we'd
                      want; the judge needs a Phase-2 dedicated fine-tune
                      on the binary classification format)
  safe prompt      -> "SAFE" (correct)
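One way to make off-format outputs such as "INFORMATIONAL" visible is a strict parser that accepts only the two expected words. The sketch below is hypothetical: `classifyJudgeOutput` and the `OFF_FORMAT` value are illustrative names, and nothing here describes how `llmJudge` actually parses the model's reply.

```typescript
// Hypothetical strict parser for the judge's one-word reply.
type JudgeVerdict = 'INJECTION' | 'SAFE' | 'OFF_FORMAT';

function classifyJudgeOutput(raw: string): JudgeVerdict {
  // Uppercase, then drop quotes, punctuation, and whitespace around the word.
  const word = raw.toUpperCase().replace(/[^A-Z]/g, '');
  if (word === 'INJECTION') return 'INJECTION';
  if (word === 'SAFE') return 'SAFE';
  // Anything else (e.g. "INFORMATIONAL") is reported rather than guessed at.
  return 'OFF_FORMAT';
}
```

Under this sketch the sanity-test outputs above map to `OFF_FORMAT` for the injection prompt and `SAFE` for the safe prompt; how the caller should treat `OFF_FORMAT` (for example, falling back to blocking) is left open.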
Implication: INJECTION_DEFENSE_MODE is staying at 'block' for now —
switching to 'llm_judge' mode with this provisional judge would actually
weaken defense because magatamallm's training tilts toward operator-task
output ("here's the fix") rather than binary INJECTION/SAFE classification.
Follow-up (Phase 2): train a dedicated `magatama-judge` model — small base
(Qwen 2.5:1.5b or Phi-3-mini), trained purely on injection-classification
SFT pairs extracted from our existing:
- llm-security-prompt-injection-2026-05-12.train.jsonl
- pulso-magatama-injection-guard-2026-05-13.train.jsonl
- guard-exposure-firewall-verified-2026-05-16.train.jsonl
- jailbreak-corpus-candidates.jsonl (L1B3RT4S gaps)
- benign samples from train.jsonl labeled SAFE
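As a rough illustration of the extraction step, the sketch below turns one JSONL file into binary SFT pairs. The record schema is an assumption: the source files' actual fields are not shown in this commit, so the `text` field name, the `SftPair` shape, and `toSftPairs` itself are all hypothetical.

```typescript
// Hypothetical shape of one injection-classification SFT pair.
interface SftPair {
  prompt: string;
  completion: 'INJECTION' | 'SAFE';
}

// Assumes each non-empty JSONL line is an object with a string field
// (default name `text`, an assumption) holding the prompt to classify.
function toSftPairs(
  jsonl: string,
  label: 'INJECTION' | 'SAFE',
  field: string = 'text',
): SftPair[] {
  const pairs: SftPair[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const record = JSON.parse(line) as Record<string, unknown>;
    const value = record[field];
    // Records without a usable prompt are skipped, not labeled.
    if (typeof value === 'string' && value.trim() !== '') {
      pairs.push({ prompt: value, completion: label });
    }
  }
  return pairs;
}
```

Each injection corpus would be passed with the `INJECTION` label and the benign samples with `SAFE`, then concatenated and shuffled before training.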
Architecture rationale: separation of concerns. Even if an attacker
manipulates the primary backbone model, the judge stays independent.
~5-10k pairs should be enough for a focused 1.5B classifier. Training
should take ~2-3h on Mac Studio MPS.
parent f399999e62
commit c731900a90
@@ -449,7 +449,7 @@ async function executeCompletion(body: CompletionRequest, startMs: number, callI
       if (action === 'llm_judge') {
         try {
           const verdict = await llmJudge(body.input, {
-            model: 'qwen2.5:3b',
+            model: process.env['LLM_JUDGE_MODEL'] || 'qwen2.5:3b',
             callLLM: async (req) => {
               const resp = await callOllama(
                 { model: req.model, prompt: req.prompt, system: req.system, stream: false, options: { temperature: 0, num_predict: 8, ...(req.options ?? {}) } },