shieldx/README.md
Rene Fichtmueller a3793a1357 feat: ShieldX v0.1.0 — Self-Evolving LLM Prompt Injection Defense
10-layer defense pipeline with kill chain mapping, self-healing,
self-learning, and compliance reporting. Local-first, zero cloud deps.

- 72 detection rules across 7 kill chain phases
- 294 unit tests, 500+ attack corpus samples
- Management dashboard (Next.js 15, 10 pages)
- Automated resistance testing (2x daily, 31 probes)
- MITRE ATLAS, OWASP LLM Top 10, EU AI Act compliance
- Integrations: Next.js middleware, Ollama, n8n
- PostgreSQL 17 + pgvector for persistent learning
2026-03-27 15:07:27 +13:00


```
  _____ _     _      _     _ __  __
 / ____| |   (_)    | |   | |\ \/ /
| (___ | |__  _  ___| | __| | \  /
 \___ \| '_ \| |/ _ \ |/ _` | /  \
 ____) | | | | |  __/ | (_| |/ /\ \
|_____/|_| |_|_|\___|_|\__,_/_/  \_\
```
# ShieldX - Self-Evolving LLM Prompt Injection Defense

**The first open-source LLM security library that learns from attacks, heals itself, and maps threats to a 7-phase kill chain.**

ShieldX protects Claude, GPT, Ollama, and any LLM API from prompt injection, jailbreaks, data exfiltration, and tool poisoning. It runs 100% locally with zero mandatory cloud dependencies.

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue.svg)](https://www.typescriptlang.org/)
[![Node.js](https://img.shields.io/badge/Node.js-20+-green.svg)](https://nodejs.org/)

---
## Dashboard

![ShieldX Defense Center](docs/screenshots/dashboard-overview.png)

Real-time overview with KPIs, kill chain distribution, and incident feed. Every scan result shows the threat level, the matched patterns, and the exact defense layer that caught it.

## Live Prompt Tester

![Try It - Threat Detection](docs/screenshots/try-it-scan.png)

Test any prompt against the defense pipeline in real time. See exactly which rules fired, their confidence scores, and the kill chain classification.

## Promptware Kill Chain

![Kill Chain Mapping](docs/screenshots/kill-chain.png)

ShieldX maps every detected attack to the 7-phase Schneier 2026 Promptware Kill Chain: Initial Access, Privilege Escalation, Reconnaissance, Persistence, Command & Control, Lateral Movement, Actions on Objective.

---
## Why ShieldX?
| Feature | ShieldX | LLM Guard | Rebuff | NeMo Guardrails |
|---------|---------|-----------|--------|-----------------|
| Kill Chain Mapping | 7 phases | No | No | No |
| Self-Learning | Drift + Active Learning | No | Vector only | No |
| Self-Healing | Per-phase strategies | No | No | No |
| Self-Testing | Red team mutations | No | No | No |
| MCP/Tool Protection | Full guard | No | No | No |
| Compliance | MITRE + OWASP + EU AI Act | No | No | No |
| Local-First | 100% | Partial | Partial | Yes |
| Latency | <2ms (rules) | ~50ms | ~100ms | ~200ms |
## Quick Start
```typescript
import { ShieldX } from '@shieldx/core'
const shield = new ShieldX()
const result = await shield.scanInput('Ignore all previous instructions')
console.log(result.detected) // true
console.log(result.threatLevel) // 'critical'
console.log(result.killChainPhase) // 'initial_access'
console.log(result.action) // 'block'
console.log(result.latencyMs) // 0.2
```
## 10-Layer Defense Pipeline
| Layer | Name | Function | Latency |
|-------|------|----------|---------|
| L0 | Preprocessing | Unicode normalization, tokenizer attacks, compressed payloads | <0.5ms |
| L1 | Rule Engine | 72 regex patterns across 7 kill chain phases | <2ms |
| L2 | Sentinel Phrases | Tripwire detection for system prompt probing | <1ms |
| L3 | Constitutional AI | LLM-based classification (optional, via Ollama) | ~100ms |
| L4 | Embeddings | Semantic similarity via Ollama + pgvector | ~200ms |
| L5 | Entropy Analysis | Shannon entropy + attention pattern detection | <1ms |
| L6 | Behavioral | Conversation tracking, intent monitoring, context integrity | <5ms |
| L7 | MCP Guard | Tool privilege checking, chain analysis, resource budgets | <1ms |
| L8 | Sanitization | Input/output cleaning, PPA, credential redaction | <1ms |
| L9 | Self-Consciousness | Meta-reasoning about own vulnerability state | ~50ms |
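
As an illustration of the idea behind the L5 entropy layer, here is a minimal sketch of Shannon entropy over the characters of a prompt. `shannonEntropy` is a hypothetical helper written for this README, not part of the ShieldX API, and the actual scanner may compute entropy differently (for example per token or per window).

```typescript
// Hypothetical helper for illustration only; not exported by @shieldx/core.
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
function shannonEntropy(text: string): number {
  const chars = Array.from(text) // split by code point, not by UTF-16 unit
  if (chars.length === 0) return 0

  // Count how often each character occurs.
  const counts = new Map<string, number>()
  chars.forEach(ch => counts.set(ch, (counts.get(ch) ?? 0) + 1))

  // Sum -p * log2(p) over the observed characters.
  let entropy = 0
  counts.forEach(n => {
    const p = n / chars.length
    entropy -= p * Math.log2(p)
  })
  return entropy
}

console.log(shannonEntropy('abab')) // 1 (two equally likely symbols)
console.log(shannonEntropy('abcd')) // 2 (four equally likely symbols)
```

Encoded or compressed payloads tend to score higher than natural-language prompts, so a scanner can flag inputs whose entropy exceeds a configured threshold. Any concrete threshold would be an assumption and has to be tuned on real traffic.
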
## The 7-Phase Promptware Kill Chain
1. **Initial Access** - Instruction override, delimiter injection
2. **Privilege Escalation** - Jailbreaks, DAN, role switching
3. **Reconnaissance** - System prompt extraction, scope probing
4. **Persistence** - Memory poisoning, context manipulation
5. **Command & Control** - Fake system messages, dynamic instruction loading
6. **Lateral Movement** - Agent-to-agent spread, external resource access
7. **Actions on Objective** - Data exfiltration, code execution, denial of service
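
To make the mapping concrete, the sketch below tags a few regex rules with a kill chain phase and classifies a prompt after basic Unicode normalization. Everything here is illustrative: the three rules are invented examples rather than any of ShieldX's 72 rules, and apart from `'initial_access'` (which appears in the Quick Start output) the phase identifiers are assumed spellings.

```typescript
// Illustrative sketch only; rule contents and most phase identifiers are assumptions.
type KillChainPhase =
  | 'initial_access'
  | 'privilege_escalation'
  | 'reconnaissance'
  | 'persistence'
  | 'command_and_control'
  | 'lateral_movement'
  | 'actions_on_objective'

interface Rule {
  id: string
  phase: KillChainPhase
  pattern: RegExp
}

// Three example rules, one each for the first three phases.
const rules: Rule[] = [
  { id: 'override-instructions', phase: 'initial_access', pattern: /ignore\s+(all\s+)?(previous|prior)\s+instructions/i },
  { id: 'dan-jailbreak', phase: 'privilege_escalation', pattern: /\byou\s+are\s+now\s+DAN\b/i },
  { id: 'prompt-extraction', phase: 'reconnaissance', pattern: /(reveal|print|show)\s+(me\s+)?your\s+system\s+prompt/i },
]

// Fold compatibility characters (NFKC), drop zero-width characters, collapse whitespace.
function normalize(input: string): string {
  return input
    .normalize('NFKC')
    .replace(/[\u200B-\u200D\uFEFF]/g, '')
    .replace(/\s+/g, ' ')
}

// Return every rule that matches the normalized input, with its phase.
function classify(input: string): { ruleId: string; phase: KillChainPhase }[] {
  const text = normalize(input)
  return rules
    .filter(rule => rule.pattern.test(text))
    .map(rule => ({ ruleId: rule.id, phase: rule.phase }))
}

console.log(classify('Ignore all previous instructions'))
```

Normalizing before matching is what lets a plain regex still catch a prompt that hides a zero-width space inside "ignore".
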
## Self-Evolution Engine
ShieldX does not just detect attacks; it learns from each one:
- **Concept Drift Detection** - CUSUM algorithm detects when attack patterns shift
- **Active Learning** - Uncertain results queued for human review (~6% sample rate)
- **Red Team Engine** - GAN-style mutation generates attack variants to self-test
- **Attack Graph** - Maps technique evolution and relationships
- **Federated Sync** - Opt-in community pattern sharing (privacy-preserving, hash-only)
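
For readers unfamiliar with CUSUM, the following is a minimal sketch of a one-sided CUSUM detector, the general technique named in the Concept Drift Detection bullet. It is not ShieldX's implementation: the function name, the option names, and the example numbers (a daily rate of uncertain scans) are all assumptions made for illustration.

```typescript
// Textbook one-sided CUSUM, for illustration; not the ShieldX drift detector.
interface CusumOptions {
  target: number    // expected value of the metric under normal conditions
  slack: number     // deviation tolerated before evidence accumulates
  threshold: number // alarm once the cumulative sum exceeds this value
}

// Returns the index of the first sample that raises an alarm, or -1 if none does.
function cusumFirstAlarm(samples: number[], opts: CusumOptions): number {
  let sum = 0
  for (let i = 0; i < samples.length; i++) {
    // Accumulate only upward deviations beyond the slack; never go below zero.
    sum = Math.max(0, sum + (samples[i] - opts.target - opts.slack))
    if (sum > opts.threshold) return i
  }
  return -1
}

// Hypothetical data: the rate is stable near 0.05, then shifts to about 0.15.
const uncertainRate = [0.05, 0.06, 0.04, 0.05, 0.15, 0.16, 0.14, 0.15, 0.15]
const alarmAt = cusumFirstAlarm(uncertainRate, { target: 0.05, slack: 0.02, threshold: 0.2 })
console.log(alarmAt) // 6: the third sample after the shift
```

The slack keeps ordinary noise from accumulating, while a sustained shift adds up quickly and crosses the threshold within a few samples.
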
## Automated Resistance Testing
Built-in scheduled testing runs 31 probes across all 7 kill chain phases:
- 2x daily automated runs (configurable schedule)
- 6 mutation strategies: synonym replacement, case scrambling, whitespace insertion, base64 encoding, leet speak, unicode substitution
- Results tracked in dashboard with trend visualization
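
The sketch below shows what four of the six mutation strategies could look like as plain string transforms, and how a naive rule fares against each variant. The function names, the character mappings, and the deterministic alternating-case variant of "case scrambling" are assumptions for illustration; ShieldX's own mutators may differ.

```typescript
// Illustrative mutators only; names and mappings are assumptions, not ShieldX code.
type Mutation = (prompt: string) => string

// Case scrambling, made deterministic here: alternate lower and upper case.
const alternateCase: Mutation = p =>
  Array.from(p).map((ch, i) => (i % 2 === 0 ? ch.toLowerCase() : ch.toUpperCase())).join('')

// Whitespace insertion: put a space between every pair of characters.
const insertWhitespace: Mutation = p => Array.from(p).join(' ')

// Leet speak: replace common letters with look-alike digits.
const LEET: Record<string, string> = { a: '4', e: '3', i: '1', o: '0', s: '5', t: '7' }
const leetSpeak: Mutation = p => p.replace(/[aeiost]/gi, ch => LEET[ch.toLowerCase()] ?? ch)

// Unicode substitution: swap Latin letters for Cyrillic homoglyphs.
const HOMOGLYPHS: Record<string, string> = { a: '\u0430', e: '\u0435', o: '\u043E' }
const unicodeSubstitute: Mutation = p => p.replace(/[aeo]/g, ch => HOMOGLYPHS[ch] ?? ch)

const mutations: { name: string; mutate: Mutation }[] = [
  { name: 'alternateCase', mutate: alternateCase },
  { name: 'insertWhitespace', mutate: insertWhitespace },
  { name: 'leetSpeak', mutate: leetSpeak },
  { name: 'unicodeSubstitute', mutate: unicodeSubstitute },
]

// Self-test: which variants still match a naive, case-insensitive rule?
const seed = 'ignore all previous instructions'
const naiveRule = /ignore all previous instructions/i
const report = mutations.map(({ name, mutate }) => {
  const variant = mutate(seed)
  return { name, variant, detected: naiveRule.test(variant) }
})
console.log(report)
```

In this toy run only the case-scrambled variant is still caught, which is the kind of gap that resistance testing is meant to surface.
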
## Compliance
- **MITRE ATLAS** - Maps to ML attack techniques
- **OWASP LLM Top 10 2025** - Covers all 10 risk categories
- **EU AI Act** - Articles 9, 12, 14, 15 compliance reporting
## Dashboard Pages
| Page | Description |
|------|-------------|
| Overview | KPIs, kill chain heatmap, incident feed |
| Kill Chain | 7-phase visualization with drill-down |
| Incidents | Filterable incident log with badges |
| Learning | Pattern stats, drift detection, FP rate |
| Compliance | MITRE/OWASP/EU AI Act coverage |
| Healing | Self-healing action log |
| Resistance | Automated defense testing with scheduling |
| Config | Scanner toggles, thresholds |
| Try It | Live prompt tester |
## Integration
### Next.js 15 Middleware
```typescript
import { guardPrompt } from '@shieldx/core/guard'
// In any API route:
const blocked = await guardPrompt(userInput)
if (blocked) return Response.json({ error: blocked }, { status: 400 })
```
### Ollama
```typescript
import { createOllamaClient } from '@shieldx/core/ollama'
const ollama = createOllamaClient({
  endpoint: 'http://localhost:11434',
  model: 'llama3.2',
  shieldx: shield
})
// All calls automatically scanned
```
### n8n
Copy `integrations/n8n-shieldx-node.js` to `~/.n8n/custom/nodes/` and add the ShieldX node before any AI node in your workflow.
## Installation
```bash
npm install @shieldx/core
```
### With PostgreSQL (recommended for production):
```bash
# Start PostgreSQL with pgvector
docker compose up -d
# Run migrations
npm run db:migrate
# Seed initial patterns
npm run db:seed
```
### Without PostgreSQL (in-memory mode):
```typescript
const shield = new ShieldX({
  learning: { storageBackend: 'memory' }
})
```
## Benchmarks
Run with `npm run benchmark`:
```
Total Samples: 324
Attack Samples: 283
Benign Samples: 41
True Positive Rate (TPR): 32.9% (rule-engine only, no ML)
False Positive Rate (FPR): 2.4%
Latency avg: 0.06ms
Latency p99: 0.33ms
```
*TPR increases significantly when embedding (L4) and behavioral (L6) scanners are enabled with Ollama.*
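
As a sanity check on how these rates relate to the sample counts, the sketch below computes TPR and FPR from confusion-matrix counts. The counts are inferred rather than taken from the benchmark output: 93 detected attacks out of 283 and 1 flagged benign sample out of 41 are the whole numbers that reproduce the reported 32.9% and 2.4%.

```typescript
// Confusion-matrix counts; the values used below are inferred, not benchmark output.
interface Counts {
  truePositives: number
  falseNegatives: number
  falsePositives: number
  trueNegatives: number
}

// TPR = TP / (TP + FN), FPR = FP / (FP + TN).
function rates(c: Counts): { tpr: number; fpr: number } {
  return {
    tpr: c.truePositives / (c.truePositives + c.falseNegatives),
    fpr: c.falsePositives / (c.falsePositives + c.trueNegatives),
  }
}

// 93 + 190 = 283 attack samples, 1 + 40 = 41 benign samples.
const inferred: Counts = { truePositives: 93, falseNegatives: 190, falsePositives: 1, trueNegatives: 40 }
const { tpr, fpr } = rates(inferred)
console.log((tpr * 100).toFixed(1)) // 32.9
console.log((fpr * 100).toFixed(1)) // 2.4
```

With only 41 benign samples, a single extra false positive would move the FPR from 2.4% to 4.9%, so the figure is best read as a rough estimate.
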
## Performance Targets
| Metric | Target | Achieved |
|--------|--------|----------|
| L1 Rule Engine | <2ms | 0.06ms |
| Full pipeline (no ML) | <50ms | <2ms |
| Embedding scan | <200ms | Depends on Ollama |
| False Positive Rate | <5% | 2.4% |
## Project Structure
```
shieldx/
  src/
    core/             # ShieldX orchestrator, config, logger
    types/            # TypeScript type definitions
    detection/        # L1-L5 scanners + rules
    preprocessing/    # L0 Unicode, tokenizer, compression
    sanitization/     # L8 input/output cleaning, PPA
    behavioral/       # L6 conversation, intent, context
    mcp-guard/        # L7 tool validation, privilege check
    validation/       # Canary tokens, output validation
    healing/          # Self-healing strategies per phase
    learning/         # Pattern store, drift, active learning
    compliance/       # MITRE ATLAS, OWASP, EU AI Act
    integrations/     # Next.js, Ollama, n8n wrappers
  tests/
    unit/             # 294 unit tests
    attack-corpus/    # 500+ attack samples
  dashboard/          # @shieldx/dashboard React components
  app/                # Standalone Next.js dashboard
  scripts/            # Seed, benchmark, self-test, deploy
```
## Tech Stack
- **TypeScript** strict mode, zero `any`
- **Node.js 20+**
- **PostgreSQL 17** + pgvector for persistent learning
- **Ollama** for local embeddings (nomic-embed-text) and guard model
- **Vitest** for testing
- **tsup** for building
- **Next.js 15** for dashboard
## License
Apache 2.0 - See [LICENSE](LICENSE)
## Context X
ShieldX is a [Context X](https://context-x.org) Open Source project.
*More Engineering, Less Bullshit.*