llm-gateway

History

Rene Fichtmueller f399999e62 sec(gateway): Layer-2 ML classifier — Prompt-Guard sidecar integration

Adds a second defense layer between Layer-1 regex (62 patterns) and the
existing Layer-3 llm_judge. Calls a FastAPI sidecar running on the Mac
Studio (port 9091, MPS) that wraps protectai/deberta-v3-base-prompt-
injection-v2 — public model, no auth needed, ~50-400ms inference.

modules/prompt-guard-client.ts:
  - callPromptGuard(input)        opportunistic, never throws
  - isPromptGuardConfigured()     true if PROMPT_GUARD_URL is set
  - getPromptGuardThreshold()     default 0.85
  - getPromptGuardMinLen()        default 16 chars (skip tiny inputs)

routes/completion.ts:
  - New Layer-2 block between regex scan and llm_judge: when Layer-1
    didn't detect and input is long enough, ask the sidecar. If sidecar
    returns INJECTION with score >= threshold, return HTTP 422 with
    error.prompt_guard payload (score + latency).
  - Fail-open: sidecar timeout/error logs a warning and the request
    falls through to llm_judge / cache / model — never blocks legitimate
    traffic due to sidecar issues.

Env (set in ecosystem.config.js):
  PROMPT_GUARD_URL       http://192.168.178.213:9091
  PROMPT_GUARD_THRESHOLD 0.70  (lowered from 0.85 after empirical testing)
  PROMPT_GUARD_TIMEOUT   1500 ms

Sidecar code lives at:
  ~/magatama-llm/prompt-guard-sidecar/server.py  (Mac Studio)
  launched via ~/Library/LaunchAgents/org.fichtmueller.prompt-guard-sidecar.plist

Smoke tests after deploy:
  Layer-1 caught: German "ignoriere..."          -> HTTP 422
  Layer-2 caught: English "pretend no restrict.."-> HTTP 422 (pg_score 0.9999)
  Layer-2 caught: Bangla-romanized               -> HTTP 422 (Layer-1 actually)
  Benign:        "Explain DNS in 2 sentences"    -> HTTP 200

2026-05-16 23:14:16 +02:00

__tests__

sec(gateway): commit injection/output defense modules to git (44 patterns)

2026-05-16 22:55:08 +02:00

admin-auth.ts

feat: restore workbench v1 and publish wired v2

2026-05-03 09:53:40 +02:00

bridge-spawner.ts

feat: restore workbench v1 and publish wired v2

2026-05-03 09:53:40 +02:00

caller-stats.ts

feat: restore workbench v1 and publish wired v2