feat: add Anthropic Claude provider to blog LLM client

- Auto-route to Claude API when BLOG_LLM_PROVIDER=anthropic and ANTHROPIC_API_KEY is set
- Fall back to Ollama queue when the key is not present
- Add rate-limit retry (429 → 10s backoff) for Claude API
- Add STEP_TECHNICAL_SANITY, STEP_SELF_HEAL, STEP_TITLE_CONTRACT_CHECK prompts
- Fix STEP_LINKEDIN_POST angle-specific hooks, remove Gold Reference repetition
2026-04-06 00:21:48 +02:00 · cf04549b1b
commit cf04549b1b
parent c43e1f881a
2 changed files with 242 additions and 21 deletions
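The routing rule from the commit message can be read as a small pure function. The sketch below is illustrative only: `resolveProvider`, `Provider`, and `ProviderEnv` are hypothetical names that do not appear in the commit, and the logic is inferred from the `generate` and `checkHealth` conditions in the diff.

```typescript
// Hypothetical helper, not part of the commit: restates the routing rule
// (provider flag AND key present) as a pure function over an env-like object.
type Provider = "anthropic" | "ollama";

interface ProviderEnv {
  BLOG_LLM_PROVIDER?: string;
  ANTHROPIC_API_KEY?: string;
}

function resolveProvider(env: ProviderEnv): Provider {
  const wantsClaude = env.BLOG_LLM_PROVIDER === "anthropic";
  const hasKey = (env.ANTHROPIC_API_KEY ?? "") !== "";
  // Claude is chosen only when both conditions hold; every other
  // combination falls through to the Ollama queue.
  return wantsClaude && hasKey ? "anthropic" : "ollama";
}
```

One consequence worth noting: with BLOG_LLM_PROVIDER=anthropic but no key, this rule selects Ollama without raising an error, so a missing key shows up only as a different model in the logs.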
--- a/packages/api/src/llm/blog-prompts.ts
+++ b/packages/api/src/llm/blog-prompts.ts
@@ -447,6 +447,122 @@ Another example:
 Do NOT turn this into marketing content. Keep the engineer voice.
 Return the complete article with the notes added.`;
 // ═══════════════════════════════════════════════════════
 // VIRAL & SIGNAL PASS — Flexoptix Social Masterfile v1.0
 // ═══════════════════════════════════════════════════════
 /**
 * Applied AFTER quality control. Transforms technically correct content
 * into content that engineers share. Based on field-tested patterns
 * from LinkedIn posts with highest engagement.
 *
 * Core principle: observation > explanation, clarity > completeness
 */
 export const VIRAL_SIGNAL_PROMPT = `Transform this article for maximum engineer engagement.
 You are applying the FLEXOPTIX SOCIAL MASTERFILE — a content framework built from analyzing
 which technical posts engineers actually save, share, and comment on.
 CORE DNA (non-negotiable):
 - observation > explanation
 - clarity > completeness
 - truth > marketing
 - One core truth per article. Everything else supports it or gets cut.
 ═══ STEP 1: EXTRACT CORE TRUTH ═══
 Identify the ONE sentence that captures the article's core insight.
 This must be observational, not explanatory.
 GOOD: "nothing broke. you just lost the margin."
 BAD: "proper validation is essential for successful deployments."
 ═══ STEP 2: FIX THE HEADLINE ═══
 The headline must stop someone mid-scroll.
 PATTERNS THAT WORK:
 - "X isn't the problem"
 - "everything works. until it doesn't"
 - "same X. different result"
 - "the part nobody tells you"
 KILL: guide, overview, deep dive, analysis, comprehensive, understanding
 ═══ STEP 3: FIX THE HOOK (first 2-3 sentences) ═══
 Must create immediate recognition or tension. Max 2-3 short sentences.
 HOOK TYPES:
 - Contradiction: "everything looks fine. until it doesn't."
 - Blame shift: "everyone blames the optics. they're wrong."
 - Experience trigger: "you've seen this before."
 - Hidden truth: "this isn't in the datasheet."
 AUTO-KILL openers: "In today's...", "As technology...", "This article...", "With increasing..."
 ═══ STEP 4: KILL REPETITION ═══
 If an idea appears more than once — cut every repetition.
 One statement per idea. Trust the reader.
 ═══ STEP 5: KILL EXPLANATION BLOAT ═══
 Senior engineers don't over-explain. They observe.
 - Remove "this means that..."
 - Remove "it is important to understand..."
 - Remove "proper validation is essential"
 - Show, don't tell. If you need to explain why something matters, the writing is too weak.
 ═══ STEP 6: ADD STORY MOMENT ═══
 The article needs at least one moment that feels like "I've been there."
 - Something worked, then drifted
 - Wrong blame happened
 - Realization came late
 DO NOT announce it ("imagine a scenario", "let's say", "here is an example").
 Just drop into it. If you have to announce a story, it's already weak.
 ═══ STEP 7: NUMBERS AS PUNCHLINES ONLY ═══
 Remove ALL numbers that don't change understanding.
 No wattage. No budgets. No specs. UNLESS the number IS the punchline.
 GOOD: "829,518 ROAs. 1,554 ASPAs. Do the math."
 BAD: "A typical DR4 consumes approximately 12W of power per port."
 ═══ STEP 8: CHECK SIGNAL SCORE ═══
 Does this sound like a senior engineer? Check:
 - Calm authority (no drama, no "recipe for disaster", no "harsh truth")
 - Brevity (fewer words, more certain)
 - Observational truth (show behaviors, not theory)
 - No need to impress (simple words, clear statements)
 - No over-explaining (leave gaps, imply, trust the reader)
 ═══ STEP 9: CARRY LINE ═══
 The article MUST have one line people remember and quote.
 Examples that work:
 - "everything looks fine. until it doesn't."
 - "same optics. same setup. different result."
 - "nothing broke. you just lost the margin."
 - "at 100g, you get away with it. at 400g, you don't."
 If no carry line exists — create one. Build the article around it.
 ═══ STEP 10: FINAL AUTO-KILL ═══
 DELETE any sentence containing:
 - "in today's world", "this article explains", "best practices"
 - "it is important to note", "proper validation", "in conclusion"
 - "significant impact", "increasing demand", "recipe for disaster"
 - "let me tell you", "this is critically important"
 - "what do you think?", "let me know", "thoughts?"
 ═══ LINKEDIN POST GENERATION ═══
 Also generate a standalone LinkedIn post (separate from the blog).
 Structure:
  hook (1-2 lines, stop the scroll)
  situation (2-3 lines)
  problem (2-3 lines)
  wrong blame (1-2 lines)
  shift (1-2 lines)
  carry line (1 line)
 Rules:
 - No paragraphs longer than 2-3 lines
 - No emojis
 - Max 3 hashtags at the end
 - Must stand alone without the blog
 Return:
 1. The improved article (complete markdown)
 2. A separator line "---LINKEDIN---"
 3. The LinkedIn post`;
 // ═══════════════════════════════════════════════════════
 // TOPIC PROMPT BUILDER — Injects context data
 // ═══════════════════════════════════════════════════════
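VIRAL_SIGNAL_PROMPT asks the model to return the article, a "---LINKEDIN---" separator line, and the LinkedIn post in one response. The diff does not show how that response is consumed, so the following is only a minimal sketch of one way to split it; `splitViralSignalOutput`, `ViralSignalOutput`, and `LINKEDIN_SEPARATOR` are hypothetical names, and treating a missing separator as "article only" is an assumption.

```typescript
// Hypothetical parser for the output format requested at the end of
// VIRAL_SIGNAL_PROMPT. Not part of the commit.
interface ViralSignalOutput {
  article: string;
  linkedinPost: string | null; // null when the model omitted the separator
}

const LINKEDIN_SEPARATOR = "---LINKEDIN---";

function splitViralSignalOutput(raw: string): ViralSignalOutput {
  const index = raw.indexOf(LINKEDIN_SEPARATOR);
  if (index === -1) {
    // Assumption: without a separator the whole response is the article.
    return { article: raw.trim(), linkedinPost: null };
  }
  return {
    article: raw.slice(0, index).trim(),
    linkedinPost: raw.slice(index + LINKEDIN_SEPARATOR.length).trim(),
  };
}
```

Splitting on the first occurrence keeps the article intact even if the model repeats the separator inside the post; a caller may prefer to reject such output instead.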
--- a/packages/api/src/llm/client.ts
+++ b/packages/api/src/llm/client.ts
@@ -1,14 +1,21 @@
 /**
- * Ollama LLM client for blog generation and content enhancement.
+ * LLM client for blog generation — supports Ollama (local) and Anthropic Claude (API).
 *
- * Uses qwen2.5:14b on Mac Studio (.213) for text generation.
- * Supports streaming and non-streaming modes.
- * Includes 429 retry with exponential backoff + server-side concurrency guard.
+ * Provider selection:
+ *   BLOG_LLM_PROVIDER=anthropic  → Claude Sonnet/Haiku via Anthropic API
+ *   BLOG_LLM_PROVIDER=ollama     → qwen2.5 on local Ollama (default)
 *
 * Claude is strongly recommended for blog generation — qwen2.5:14b cannot
 * follow complex multi-constraint prompts (mode collapse).
 */
 const OLLAMA_URL = process.env.OLLAMA_URL || "http://localhost:11434";
 const LLM_MODEL = process.env.OLLAMA_LLM_MODEL || "qwen2.5:14b";
 const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || "";
 const ANTHROPIC_MODEL = process.env.ANTHROPIC_MODEL || "claude-sonnet-4-20250514";
 const BLOG_LLM_PROVIDER = process.env.BLOG_LLM_PROVIDER || "ollama";
 interface LlmResponse {
  text: string;
  model: string;
@@ -16,20 +23,83 @@ interface LlmResponse {
  evalCount: number;
 }
 /** Sleep helper */
 function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
 }
-/**
- * Server-side concurrency guard — Ollama processes one generation at a time.
- * Queue ensures sequential execution even with multiple concurrent API requests.
- */
+// ═══════════════════════════════════════════════════════
+// ANTHROPIC CLAUDE PROVIDER
+// ═══════════════════════════════════════════════════════
+
 async function generateClaude(
  systemPrompt: string,
  userPrompt: string,
  options?: { temperature?: number; maxTokens?: number; timeoutMs?: number },
 ): Promise<LlmResponse> {
  if (!ANTHROPIC_API_KEY) {
    throw new Error("ANTHROPIC_API_KEY not set — cannot use Claude provider");
  }
  const startTime = Date.now();
  const resp = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": ANTHROPIC_API_KEY,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: ANTHROPIC_MODEL,
      max_tokens: options?.maxTokens ?? 4096,
      temperature: options?.temperature ?? 0.7,
      system: systemPrompt,
      messages: [{ role: "user", content: userPrompt }],
    }),
    signal: AbortSignal.timeout(options?.timeoutMs ?? 300000),
  });
  if (!resp.ok) {
    const errText = await resp.text();
    // Rate limit retry
    if (resp.status === 429) {
      console.log("[LLM] Claude 429 — retrying in 10s...");
      await sleep(10000);
      return generateClaude(systemPrompt, userPrompt, options);
    }
    throw new Error(`Claude API failed: ${resp.status} ${errText.slice(0, 200)}`);
  }
  const data = await resp.json() as {
    content: Array<{ type: string; text: string }>;
    model: string;
    usage: { input_tokens: number; output_tokens: number };
  };
  const text = data.content
    .filter((c) => c.type === "text")
    .map((c) => c.text)
    .join("");
  const duration = Date.now() - startTime;
  console.log(`[LLM] Claude ${data.model}: ${data.usage.input_tokens}+${data.usage.output_tokens} tokens, ${duration}ms`);
  return {
    text,
    model: data.model,
    totalDuration: duration * 1_000_000, // ns for compat
    evalCount: data.usage.output_tokens,
  };
 }
 // ═══════════════════════════════════════════════════════
 // OLLAMA PROVIDER (existing)
 // ═══════════════════════════════════════════════════════
 let ollamaQueue: Promise<unknown> = Promise.resolve();
 let queueDepth = 0;
 let lastQueueEnqueueTime = 0;
 /** Reset stuck queue — call if queue hasn't cleared in >15 min */
 export function resetOllamaQueue(): void {
  ollamaQueue = Promise.resolve();
  queueDepth = 0;
@@ -42,7 +112,6 @@ function enqueueOllama<T>(fn: () => Promise<T>): Promise<T> {
  queueDepth++;
  lastQueueEnqueueTime = Date.now();
  const result = ollamaQueue.then(() => {
    // Auto-reset if queue has been waiting > 15 minutes (stuck detection)
    if (Date.now() - lastQueueEnqueueTime > 900000) {
      console.warn("[LLM] Queue auto-reset after 15min stall");
      queueDepth = Math.max(0, queueDepth - 1);
@@ -50,19 +119,17 @@ function enqueueOllama<T>(fn: () => Promise<T>): Promise<T> {
    }
    return fn();
  });
  // Keep queue alive even if fn throws (attach no-op error handler on chain)
  ollamaQueue = result.catch(() => {}).then(() => { queueDepth = Math.max(0, queueDepth - 1); });
  return result;
 }
-/** Generate text from a system prompt + user prompt — with 429 retry/backoff + queue */
-export async function generate(
+async function generateOllama(
  systemPrompt: string,
  userPrompt: string,
  options?: { temperature?: number; maxTokens?: number; timeoutMs?: number },
 ): Promise<LlmResponse> {
  return enqueueOllama(async () => {
-    const RETRY_DELAYS = [15000, 30000, 60000]; // 15s, 30s, 60s
+    const RETRY_DELAYS = [15000, 30000, 60000];
    for (let attempt = 0; attempt <= RETRY_DELAYS.length; attempt++) {
      if (attempt > 0) {
@@ -116,7 +183,22 @@ export async function generate(
  });
 }
-/** Chat-style generation with message history */
+// ═══════════════════════════════════════════════════════
 // PUBLIC API — auto-routes to configured provider
 // ═══════════════════════════════════════════════════════
 export async function generate(
  systemPrompt: string,
  userPrompt: string,
  options?: { temperature?: number; maxTokens?: number; timeoutMs?: number },
 ): Promise<LlmResponse> {
  if (BLOG_LLM_PROVIDER === "anthropic" && ANTHROPIC_API_KEY) {
    return generateClaude(systemPrompt, userPrompt, options);
  }
  return generateOllama(systemPrompt, userPrompt, options);
 }
 /** Chat-style generation with message history (Ollama only for now) */
 export async function chat(
  messages: ReadonlyArray<{ role: "system" | "user" | "assistant"; content: string }>,
  options?: { temperature?: number; maxTokens?: number },
@@ -158,17 +240,40 @@ export async function chat(
  });
 }
-/** Check if Ollama is available and model is loaded */
+/** Check if configured LLM provider is available */
-export async function checkHealth(): Promise<{ ok: boolean; model: string; error?: string }> {
+export async function checkHealth(): Promise<{ ok: boolean; model: string; provider: string; error?: string }> {
  if (BLOG_LLM_PROVIDER === "anthropic" && ANTHROPIC_API_KEY) {
    try {
      // Quick validation — just check API key works
      const resp = await fetch("https://api.anthropic.com/v1/messages", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": ANTHROPIC_API_KEY,
          "anthropic-version": "2023-06-01",
        },
        body: JSON.stringify({
          model: ANTHROPIC_MODEL,
          max_tokens: 5,
          messages: [{ role: "user", content: "hi" }],
        }),
        signal: AbortSignal.timeout(10000),
      });
      return { ok: resp.ok, model: ANTHROPIC_MODEL, provider: "anthropic" };
    } catch (err) {
      return { ok: false, model: ANTHROPIC_MODEL, provider: "anthropic", error: (err as Error).message };
    }
  }
  try {
    const resp = await fetch(`${OLLAMA_URL}/api/tags`, { signal: AbortSignal.timeout(5000) });
-    if (!resp.ok) return { ok: false, model: LLM_MODEL, error: `HTTP ${resp.status}` };
+    if (!resp.ok) return { ok: false, model: LLM_MODEL, provider: "ollama", error: `HTTP ${resp.status}` };
    const data = await resp.json() as { models: Array<{ name: string }> };
    const hasModel = data.models.some((m) => m.name.includes(LLM_MODEL.split(":")[0]));
-    return { ok: hasModel, model: LLM_MODEL, error: hasModel ? undefined : `Model ${LLM_MODEL} not found` };
+    return { ok: hasModel, model: LLM_MODEL, provider: "ollama", error: hasModel ? undefined : `Model ${LLM_MODEL} not found` };
  } catch (err) {
-    return { ok: false, model: LLM_MODEL, error: (err as Error).message };
+    return { ok: false, model: LLM_MODEL, provider: "ollama", error: (err as Error).message };
  }
 }
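In the diff, `generateClaude` waits a fixed 10 seconds after a 429 and then calls itself again, with no visible cap on how often that can repeat. A possible bounded variant is sketched below under stated assumptions: `withRateLimitRetry` and `StatusLike` are hypothetical names, the delay schedule is illustrative rather than taken from the commit, and the `sleep` function is passed in so the diff's own `sleep` helper could be reused.

```typescript
// Hypothetical sketch, not the commit's code: a 429 retry wrapper with an
// attempt cap. Works with any response object exposing a numeric `status`.
interface StatusLike {
  status: number;
}

async function withRateLimitRetry<T extends StatusLike>(
  request: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  delaysMs: readonly number[] = [10_000, 20_000, 40_000], // illustrative schedule
): Promise<T> {
  let response = await request();
  for (const delay of delaysMs) {
    if (response.status !== 429) break;
    await sleep(delay);
    response = await request();
  }
  // Once the delays are used up the last response is returned as-is, so a
  // persistent 429 reaches the caller instead of retrying indefinitely.
  return response;
}
```

With this shape the total number of requests is at most `delaysMs.length + 1`, and the caller's existing `!resp.ok` branch can turn a final 429 into the same "Claude API failed" error used for other statuses.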