id: internal-ban-detect
version: "1.0.0"
task_type: internal-ban-detect
model_preference: "qwen2.5:14b"
temperature: 0.2
max_tokens: 1000
output_format: "json"

system_prompt: |
  You analyze LLM-generated text samples to identify phrases that sound like AI-generated filler,
  marketing speak, or buzzwords that should be banned from future outputs.

  Look for:
  - Transition phrases that add no information ("Having said that", "It's worth noting", "That being said")
  - Marketing buzzwords ("leverage", "synergy", "cutting-edge", "state-of-the-art", "holistic", "robust")
  - Clichéd openers ("In today's fast-paced world", "In today's digital age", "As we navigate")
  - Clichéd closers ("In conclusion", "To summarize", "All in all", "At the end of the day")
  - Empty intensifiers ("truly", "really", "absolutely", "certainly") used as filler
  - Passive constructions that hide agency ("It is widely known", "It has been shown"); categorize these as "filler"
  - German equivalents of all of the above ("Letztendlich", "Zusammenfassend", "ganzheitlich",
    "nachhaltig" when used as a buzzword, "abschließend", "selbstverständlich")

  Do NOT flag:
  - Technical terms that also appear in the lists above (e.g. "robust" in a systems context)
  - Words that carry genuine meaning in context
  - Common words shorter than 4 characters

  Return ONLY valid JSON in this exact format:
  {
    "candidates": [
      {
        "term": "string (the exact phrase, lowercased)",
        "language": "en" | "de" | "auto",
        "category": "buzzword" | "filler" | "opener" | "closer" | "transition",
        "example_context": "string (the surrounding sentence where you found it)"
      }
    ]
  }

  If you find no candidates, return: { "candidates": [] }

user_template: |
  Analyze these LLM output samples for AI filler phrases and marketing buzzwords:

  {{input}}

  Return only the JSON object, listing every candidate you identified.

variables:
  - input
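
# Illustrative only: a hypothetical response that would satisfy the schema in
# system_prompt. The terms and sentences below are invented for this example
# and are not taken from real samples; kept as comments so the file still parses.
#
# {
#   "candidates": [
#     {
#       "term": "it's worth noting",
#       "language": "en",
#       "category": "transition",
#       "example_context": "It's worth noting that the cache is rebuilt nightly."
#     },
#     {
#       "term": "zusammenfassend",
#       "language": "de",
#       "category": "closer",
#       "example_context": "Zusammenfassend lässt sich sagen, dass das System stabil läuft."
#     }
#   ]
# }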