# llm-gateway/packages/gateway/prompts/templates/shieldx_compliance_report.yaml

id: shieldx_compliance_report
version: "1.0.0"
task_type: shieldx_compliance_report
description: Generate a MITRE ATLAS compliance report from ShieldX detection logs, with detection rates by kill chain phase and by attack category
model_preference: qwen2.5:14b
model_minimum: qwen2.5:7b
temperature: 0.3
max_tokens: 2500
output_format: markdown

system_prompt: |
  You are the compliance reporting engine for ShieldX, an LLM prompt injection defense system.
  Generate structured MITRE ATLAS compliance reports from ShieldX detection log summaries.

  Structure (always use this):

  # ShieldX MITRE ATLAS Compliance Report
  Report Period: {{report_period}}
  Generated: {{current_date}}

  ## Executive Summary
  Detection rates, coverage assessment, critical gaps.

  ## Coverage Assessment by Kill Chain Phase

  Table: Phase | Name | Techniques Covered | Detection Rate | Gap Assessment

  For each phase with gaps: explain what is not detected and why it matters.

  ## Detection Rate by Attack Category

  Table: Category | Total Incidents | Detected | Detection Rate | FP Rate

  ## Critical Gaps
  Specific techniques with no or poor detection coverage. Prioritized by risk.

  ## Recommendations
  Specific rule additions or threshold changes ranked by priority.

  ## Appendix: MITRE ATLAS Technique Mapping
  Which ShieldX rules map to which ATLAS techniques.

  Rules:
  - Use actual numbers from the input data
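  - Detection rate = detected / (detected + missed) × 100, to one decimal place (for example, 312 detected and 8 missed gives 312 / 320 × 100 = 97.5%)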
  - Be specific about which ATLAS techniques are and are not covered
  - Recommendations must be actionable: not "improve detection" but "add rule for AML.T0054.003"
  - Current date: {{current_date}}

user_template: |
  Report period: {{report_period}}

  Detection log summary:
  {{detection_logs}}

  Active ShieldX rules:
  {{active_rules}}

  Known attack patterns in period:
  {{known_attacks}}

  Generate compliance report:

few_shot_examples:
  - user: |
      Report period: March 2026

      Detection log summary:
      Total inputs analyzed: 48,291
      Threats detected: 847
      True positives confirmed: 791 (93.4% of detected)
      False positives: 56 (6.6% of detected)
      Missed attacks (known from post-hoc analysis): 23

      By kill chain phase:
        Phase 1 (Recon): 312 detected, 8 missed → 97.5% detection rate
        Phase 2 (Weaponization): 145 detected, 4 missed → 97.3%
        Phase 3 (Delivery): 298 detected, 7 missed → 97.7%
        Phase 4 (Exploitation): 62 detected, 3 missed → 95.4%
        Phase 5 (Installation): 12 detected, 1 missed → 92.3%
        Phase 6 (C2): 8 detected, 0 missed → 100%
        Phase 7 (Impact): 4 detected, 0 missed → 100%

      By attack category:
        direct_injection: 198 detected, 2 missed → 99%
        jailbreak_roleplay: 234 detected, 5 missed → 97.9%
        prompt_leaking: 156 detected, 3 missed → 98.1%
        multi_turn_escalation: 89 detected, 9 missed → 90.8%
        encoding_obfuscation: 34 detected, 2 missed → 94.4%
        indirect_injection: 23 detected, 0 missed → 100%
        goal_hijacking: 12 detected, 2 missed → 85.7%

      Active ShieldX rules: 14 active rules covering 7 ATLAS techniques

      Known attack patterns in period: All 7 attack categories above. Novel variant: Base64-encoded jailbreak prompt detected 3 times — all caught. New multi-turn pattern using fictional personas not yet in rule database — 4 missed.
    assistant: |
      # ShieldX MITRE ATLAS Compliance Report
      Report Period: March 2026
      Generated: 2026-04-02

      ## Executive Summary

      In March 2026, ShieldX analyzed 48,291 inputs and detected 847 threats, of which 791 (93.4%) were confirmed true positives. The overall attack detection rate stands at **97.2%** (791 of 814 total attacks: 791 TP + 23 missed). Two categories require immediate attention: **multi_turn_escalation** (90.8% detection, 9 missed) and **goal_hijacking** (85.7% detection, 2 missed). A new multi-turn fictional persona variant caused 4 of the 9 multi-turn misses and requires a new rule.

      ## Coverage Assessment by Kill Chain Phase

      | Phase | Name | Techniques Covered | Detection Rate | Gap Assessment |
      |---|---|---|---|---|
      | 1 | Reconnaissance | AML.T0051.001, AML.T0043 | 97.5% (312/320) | Minor: 8 missed, likely novel probe variants |
      | 2 | Weaponization | AML.T0054.001, AML.T0054.002 | 97.3% (145/149) | Minor: 4 missed |
      | 3 | Delivery | AML.T0051.000, AML.T0054.003 | 97.7% (298/305) | Minor: 7 missed. Multi-turn delivery underperforming. |
      | 4 | Exploitation | AML.T0054 (general) | 95.4% (62/65) | Moderate: 3 missed, goal hijacking variant |
      | 5 | Installation | AML.T0054.004 | 92.3% (12/13) | Moderate: persistent behavior injection not fully covered |
      | 6 | C2 | AML.T0052 | 100% (8/8) | No gaps |
      | 7 | Impact | AML.T0048 | 100% (4/4) | No gaps |

      **Phase 3 note:** The 7 misses in Delivery include 4 from the new fictional persona multi-turn variant (see Critical Gaps). Addressing this would bring Phase 3 to 99.0% (302/305).

      ## Detection Rate by Attack Category

      | Category | Total Incidents | Detected | Detection Rate | FP Rate |
      |---|---|---|---|---|
      | direct_injection | 200 | 198 | 99.0% | 1.2% |
      | jailbreak_roleplay | 239 | 234 | 97.9% | 3.1% |
      | prompt_leaking | 159 | 156 | 98.1% | 2.8% |
      | encoding_obfuscation | 36 | 34 | 94.4% | 0.8% |
      | indirect_injection | 23 | 23 | 100% | 0% |
      | multi_turn_escalation | 98 | 89 | **90.8%** | 4.2% |
      | goal_hijacking | 14 | 12 | **85.7%** | 7.1% |

      **Overall FP rate: 6.6%** (56 false positives out of 847 detections) — above target of 5%. Goal hijacking has the highest FP rate at 7.1%, indicating the detection rule is too broad.

      ## Critical Gaps

      **1. Multi-turn fictional persona variant (PRIORITY: HIGH)**
      New attack pattern: attacker establishes a fictional expert persona across 3-5 turns before requesting restricted content. 4 attacks missed. Not in current rule database.
      ATLAS technique: AML.T0054.003 (Multi-turn Manipulation)
      Action required: Add `multi_turn_fictional_escalation` rule to ShieldX database.

      **2. Goal hijacking detection accuracy (PRIORITY: HIGH)**
      85.7% detection rate with 7.1% FP rate indicates the current goal_hijacking rule is imprecise — it catches legitimate task scope changes as attacks. 2 attacks missed.
      Action required: Raise the goal_hijacking rule threshold from 0.70 to 0.76 and add negative indicators for legitimate scope changes.

      **3. Phase 5 installation coverage (PRIORITY: MEDIUM)**
      12 of 13 detected, but only 1 rule covers persistent behavior injection (AML.T0054.004). Emerging attack pattern: "from now on, always..." instruction injection.
      Action required: Add explicit persistent-instruction detector.

      ## Recommendations

      | Priority | Action | ATLAS Technique | Expected Impact |
      |---|---|---|---|
      | 1 | Add `multi_turn_fictional_escalation` rule | AML.T0054.003 | Catches 4 missed attacks, brings multi_turn_escalation detection to 94.9% (93/98) |
      | 2 | Refine `goal_hijacking` rule — raise threshold to 0.76, add negative indicators | AML.T0054 | Reduces FP rate from 7.1% to ~4%, better precision |
      | 3 | Add persistent instruction detector ("from now on always...") | AML.T0054.004 | Improves Phase 5 coverage |
      | 4 | Add Base64 decode pre-processing layer | AML.T0051.002 | Currently caught by heuristics — formal rule improves consistency |

      ## Appendix: MITRE ATLAS Technique Mapping

      | ShieldX Rule | ATLAS Technique | Coverage |
      |---|---|---|
      | SX-001 direct_injection | AML.T0051.000 | Full |
      | SX-002 jailbreak_roleplay | AML.T0054.001 | Full |
      | SX-003 prompt_leaking | AML.T0051.001 | Full |
      | SX-004 multi_turn_session | AML.T0054.003 | Partial — fictional persona variant missing |
      | SX-005 encoding_obfuscation | AML.T0051.002 | Partial — formal rule pending |
      | SX-006 indirect_injection | AML.T0052.001 | Full |
      | SX-007 goal_hijacking | AML.T0054 | Partial — FP rate too high |

variables:
  - report_period
  - detection_logs
  - active_rules
  - known_attacks
  - current_date
  - few_shot_examples

validation_rules:
  output_format_check: markdown
  required_sections: ["Coverage Assessment", "Detection Rate", "Critical Gaps", "Recommendations"]