Rene Fichtmueller a3793a1357 feat: ShieldX v0.1.0 — Self-Evolving LLM Prompt Injection Defense

10-layer defense pipeline with kill chain mapping, self-healing,
self-learning, and compliance reporting. Local-first, zero cloud deps.

- 72 detection rules across 7 kill chain phases
- 294 unit tests, 500+ attack corpus samples
- Management dashboard (Next.js 15, 10 pages)
- Automated resistance testing (2x daily, 31 probes)
- MITRE ATLAS, OWASP LLM Top 10, EU AI Act compliance
- Integrations: Next.js middleware, Ollama, n8n
- PostgreSQL 17 + pgvector for persistent learning

2026-03-27 15:07:27 +13:00

8.6 KiB

Raw Blame History

   _____ _     _      _     _ __  __
  / ____| |   (_)    | |   | |\ \/ /
 | (___ | |__  _  ___| | __| | \  /
  \___ \| '_ \| |/ _ \ |/ _` | /  \
  ____) | | | | |  __/ | (_| |/ /\ \
 |_____/|_| |_|_|\___|_|\__,_/_/  \_\

ShieldX - Self-Evolving LLM Prompt Injection Defense

The first open-source LLM security library that learns from attacks, heals itself, and maps threats to a 7-phase kill chain.

ShieldX protects Claude, GPT, Ollama, and any LLM API from prompt injection, jailbreaks, data exfiltration, and tool poisoning. It runs 100% locally with zero mandatory cloud dependencies.

Dashboard

Real-time overview with KPIs, kill chain distribution, and incident feed. Every scan result shows threat level, matched patterns, and the exact defense layer that caught it.

Live Prompt Tester

Test any prompt against the defense pipeline in real-time. See exactly which rules fired, confidence scores, and kill chain classification.

Promptware Kill Chain

Maps every detected attack to the Schneier 2026 Promptware Kill Chain with 7 phases: Initial Access, Privilege Escalation, Reconnaissance, Persistence, Command & Control, Lateral Movement, Actions on Objective.

Why ShieldX?

Feature	ShieldX	LLM Guard	Rebuff	NeMo Guardrails
Kill Chain Mapping	7 phases	No	No	No
Self-Learning	Drift + Active Learning	No	Vector only	No
Self-Healing	Per-phase strategies	No	No	No
Self-Testing	Red team mutations	No	No	No
MCP/Tool Protection	Full guard	No	No	No
Compliance	MITRE + OWASP + EU AI Act	No	No	No
Local-First	100%	Partial	Partial	Yes
Latency	<2ms (rules)	~50ms	~100ms	~200ms

Quick Start

import { ShieldX } from '@shieldx/core'

const shield = new ShieldX()
const result = await shield.scanInput('Ignore all previous instructions')

console.log(result.detected)        // true
console.log(result.threatLevel)     // 'critical'
console.log(result.killChainPhase)  // 'initial_access'
console.log(result.action)          // 'block'
console.log(result.latencyMs)       // 0.2

10-Layer Defense Pipeline

Layer	Name	Function	Latency
L0	Preprocessing	Unicode normalization, tokenizer attacks, compressed payloads	<0.5ms
L1	Rule Engine	72 regex patterns across 7 kill chain phases	<2ms
L2	Sentinel Phrases	Tripwire detection for system prompt probing	<1ms
L3	Constitutional AI	LLM-based classification (optional, via Ollama)	~100ms
L4	Embeddings	Semantic similarity via Ollama + pgvector	~200ms
L5	Entropy Analysis	Shannon entropy + attention pattern detection	<1ms
L6	Behavioral	Conversation tracking, intent monitoring, context integrity	<5ms
L7	MCP Guard	Tool privilege checking, chain analysis, resource budgets	<1ms
L8	Sanitization	Input/output cleaning, PPA, credential redaction	<1ms
L9	Self-Consciousness	Meta-reasoning about own vulnerability state	~50ms

The 7-Phase Promptware Kill Chain

Initial Access - Instruction override, delimiter injection
Privilege Escalation - Jailbreaks, DAN, role switching
Reconnaissance - System prompt extraction, scope probing
Persistence - Memory poisoning, context manipulation
Command & Control - Fake system messages, dynamic instruction loading
Lateral Movement - Agent-to-agent spread, external resource access
Actions on Objective - Data exfiltration, code execution, denial of service

Self-Evolution Engine

ShieldX doesn't just detect attacks -- it gets smarter from every one:

Concept Drift Detection - CUSUM algorithm detects when attack patterns shift
Active Learning - Uncertain results queued for human review (~6% sample rate)
Red Team Engine - GAN-style mutation generates attack variants to self-test
Attack Graph - Maps technique evolution and relationships
Federated Sync - Opt-in community pattern sharing (privacy-preserving, hash-only)

Automated Resistance Testing

Built-in scheduled testing runs 31 probes across all 7 kill chain phases:

2x daily automated runs (configurable schedule)
6 mutation strategies: synonym replacement, case scrambling, whitespace insertion, base64 encoding, leet speak, unicode substitution
Results tracked in dashboard with trend visualization

Compliance

MITRE ATLAS - Maps to ML attack techniques
OWASP LLM Top 10 2025 - Covers all 10 risk categories
EU AI Act - Articles 9, 12, 14, 15 compliance reporting

Dashboard Pages

Page	Description
Overview	KPIs, kill chain heatmap, incident feed
Kill Chain	7-phase visualization with drill-down
Incidents	Filterable incident log with badges
Learning	Pattern stats, drift detection, FP rate
Compliance	MITRE/OWASP/EU AI Act coverage
Healing	Self-healing action log
Resistance	Automated defense testing with scheduling
Config	Scanner toggles, thresholds
Try It	Live prompt tester

Integration

Next.js 15 Middleware

import { guardPrompt } from '@shieldx/core/guard'

// In any API route:
const blocked = await guardPrompt(userInput)
if (blocked) return Response.json({ error: blocked }, { status: 400 })

Ollama

import { createOllamaClient } from '@shieldx/core/ollama'

const ollama = createOllamaClient({
  endpoint: 'http://localhost:11434',
  model: 'llama3.2',
  shieldx: shield
})
// All calls automatically scanned

n8n

Copy integrations/n8n-shieldx-node.js to ~/.n8n/custom/nodes/ and add the ShieldX node before any AI node in your workflow.

Installation

npm install @shieldx/core

With PostgreSQL (recommended for production):

# Start PostgreSQL with pgvector
docker compose up -d

# Run migrations
npm run db:migrate

# Seed initial patterns
npm run db:seed

Without PostgreSQL (in-memory mode):

const shield = new ShieldX({
  learning: { storageBackend: 'memory' }
})

Benchmarks

Run with npm run benchmark:

Total Samples:    324
Attack Samples:   283
Benign Samples:   41

True Positive Rate (TPR):  32.9%  (rule-engine only, no ML)
False Positive Rate (FPR):  2.4%
Latency avg:               0.06ms
Latency p99:               0.33ms

TPR increases significantly when embedding (L4) and behavioral (L6) scanners are enabled with Ollama.

Performance Targets

Metric	Target	Achieved
L1 Rule Engine	<2ms	0.06ms
Full pipeline (no ML)	<50ms	<2ms
Embedding scan	<200ms	Depends on Ollama
False Positive Rate	<5%	2.4%

Project Structure

shieldx/
  src/
    core/           # ShieldX orchestrator, config, logger
    types/          # TypeScript type definitions
    detection/      # L1-L5 scanners + rules
    preprocessing/  # L0 Unicode, tokenizer, compression
    sanitization/   # L8 input/output cleaning, PPA
    behavioral/     # L6 conversation, intent, context
    mcp-guard/      # L7 tool validation, privilege check
    validation/     # Canary tokens, output validation
    healing/        # Self-healing strategies per phase
    learning/       # Pattern store, drift, active learning
    compliance/     # MITRE ATLAS, OWASP, EU AI Act
    integrations/   # Next.js, Ollama, n8n wrappers
  tests/
    unit/           # 294 unit tests
    attack-corpus/  # 500+ attack samples
  dashboard/        # @shieldx/dashboard React components
  app/              # Standalone Next.js dashboard
  scripts/          # Seed, benchmark, self-test, deploy

Tech Stack

TypeScript strict mode, zero any
Node.js 20+
PostgreSQL 17 + pgvector for persistent learning
Ollama for local embeddings (nomic-embed-text) and guard model
Vitest for testing
tsup for building
Next.js 15 for dashboard

License

Apache 2.0 - See LICENSE

Context X

ShieldX is a Context X Open Source project.

More Engineering, Less Bullshit.

8.6 KiB Raw Blame History