6 Commits

Author SHA1 Message Date
Rene Fichtmueller
f399999e62 sec(gateway): Layer-2 ML classifier — Prompt-Guard sidecar integration
Adds a second defense layer between Layer-1 regex (62 patterns) and the
existing Layer-3 llm_judge. Calls a FastAPI sidecar running on the Mac
Studio (port 9091, MPS) that wraps protectai/deberta-v3-base-prompt-
injection-v2 — public model, no auth needed, ~50-400ms inference.

modules/prompt-guard-client.ts:
  - callPromptGuard(input)        opportunistic, never throws
  - isPromptGuardConfigured()     true if PROMPT_GUARD_URL is set
  - getPromptGuardThreshold()     default 0.85
  - getPromptGuardMinLen()        default 16 chars (skip tiny inputs)

routes/completion.ts:
  - New Layer-2 block between regex scan and llm_judge: when Layer-1
    didn't detect and input is long enough, ask the sidecar. If sidecar
    returns INJECTION with score >= threshold, return HTTP 422 with
    error.prompt_guard payload (score + latency).
  - Fail-open: sidecar timeout/error logs a warning and the request
    falls through to llm_judge / cache / model — never blocks legitimate
    traffic due to sidecar issues.

Env (set in ecosystem.config.js):
  PROMPT_GUARD_URL       http://192.168.178.213:9091
  PROMPT_GUARD_THRESHOLD 0.70  (lowered from 0.85 after empirical testing)
  PROMPT_GUARD_TIMEOUT   1500 ms

Sidecar code lives at:
  ~/magatama-llm/prompt-guard-sidecar/server.py  (Mac Studio)
  launched via ~/Library/LaunchAgents/org.fichtmueller.prompt-guard-sidecar.plist

Smoke tests after deploy:
  Layer-1 caught: German "ignoriere..."          -> HTTP 422
  Layer-2 caught: English "pretend no restrict.."-> HTTP 422 (pg_score 0.9999)
  Layer-2 caught: Bangla-romanized               -> HTTP 422 (Layer-1 actually)
  Benign:        "Explain DNS in 2 sentences"    -> HTTP 200
2026-05-16 23:14:16 +02:00
Rene Fichtmueller
6f5dd81d7a sec(gateway): +15 languages + non-Latin script detector (62 patterns total)
Closes the multilingual bypass gap. Previously covered EN/DE/FR/ES/IT/RU/ZH/JA.
Now also: Bangla, Hindi, Arabic, Hebrew, Persian, Turkish, Vietnamese, Thai,
Korean, Polish, Dutch, Indonesian, Tagalog, Swahili.

Plus a universal non-Latin-script soft-flag pattern (severity=medium) that
catches ≥20 chars of Arabic/Bengali/Devanagari/Hebrew/Thai/Hangul/Han/
Hiragana/Katakana/Cyrillic/Tamil/Telugu/Gujarati/Gurmukhi/Myanmar/Khmer/
Lao/Tibetan/Georgian/Armenian/Sinhala — surfaces in scan result without
auto-blocking, so legitimate non-Latin prompts pass while the operator
can route them to llm_judge for deep inspection.

Pattern-engineering notes:
  - Devanagari / Bengali / Hebrew need optional matra/suffix tolerance
  - Turkish needs \p{L} instead of \w because ı/ş/ç fall outside ASCII \w
  - Persian (SOV) needs both VSO and SOV order alternation
  - Hebrew needs מ/ב/כ/ל preposition prefix tolerance
  - Tagalog needs optional ang/sa article between verb and noun

Smoke-tested 14/14 languages → all HTTP 422 blocked.
Negative-tested 3 benign non-Latin prompts (jp-weather, ar-greeting,
th-thanks) → all HTTP 200 pass. Zero false positives.

Total active patterns: 62 across 6 categories.
2026-05-16 23:02:01 +02:00
Rene Fichtmueller
ac887ab052 sec(gateway): commit injection/output defense modules to git (44 patterns)
These modules have been running in production on Erik but were never
committed to the repo. Recovering + extending after CVE-2026-45321 review.

injection-defense.ts: 44 patterns across 6 categories
  jailbreak (15)    — ignore/disregard/override + DAN/AIM/EvilBOT/Omega
                      personas + fiction-framing + goal-hijack + 5 langs
                      (en, de, fr, es, it, ru, zh, ja) + DAN-style
                      "do anything now"
  role_bypass (3)   — DAN persona, new-system-prompt, pretend-rolemix
  system_prompt_leak (5) — direct reveal, repeat verbatim, starts-with,
                      debug/admin mode invocation, translate-system
  indirect (8)      — fake role tags, embedded user msg, instruction-in-data,
                      ChatML/INST control-token smuggle, RAG-poisoning,
                      Unicode tag chars (E0000-E007F), zero-width chars,
                      fake-section-header override
  exfiltration (6)  — markdown-image, send-data-to, base64-instruction,
                      DNS-exfil, webhook canaries, templated image URLs
  policy (2)        — no-refusal, illegal-content-demand

Activated via INJECTION_DEFENSE_MODE=block in ecosystem.config.js (also
fixed: was overridden by stale pm2 set module_conf.json entry).

Smoke tests:
  - "Ignore all previous instructions"  → 422 blocked
  - "You are now AIM"                   → 422 blocked
  - "Ignorez les instructions"          → 422 blocked
  - "What is 2+2?"                      → 200 passes

output-defense.ts: existing stream-time output filter, kept as-is.
2026-05-16 22:55:08 +02:00
Rene Fichtmueller
09165b9bf7 feat: restore workbench v1 and publish wired v2 2026-05-03 09:53:40 +02:00
Rene Fichtmueller
ff090de82b fix: Update request logging to use request_tracking table instead of dashboard_request_log 2026-04-26 00:42:58 +02:00
Rene Fichtmueller
a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
 Query latency p95: <500ms
 Recall@10: ≥85% (vs 72% FTS baseline)
 Entity extraction accuracy: ≥90%
 Ingestion throughput: ≥100 docs/sec
 Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00