feat: blog engine v4 (reduction+style-lock passes) + flexoptix scraper fixes

Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
  removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock

blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'

flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)

db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
This commit is contained in:
Rene Fichtmueller 2026-04-04 07:50:01 +02:00
parent 0ac932a304
commit f616e0ebbe
4 changed files with 186 additions and 15 deletions

View File

@ -955,6 +955,62 @@ KEY ELEMENTS OF THIS SECOND STYLE B EXAMPLE:
- Ending reframes the whole topic without telling reader what to do
- No bullet lists, no section headers, no numbered points
STYLE B GOLD EXAMPLE 3 (2026-04-04 validated Troubleshooting 400G/800G)
Topic: Troubleshooting high-density optics. NO sections, pure flow, "why things look fine until they don't".
Note: This example was rated 10/10 for STYLE. Use as reference for troubleshooting tutorial articles.
"You're about to roll out a new batch of 400G optics.
Quote is approved, hardware is in, lab tests looked clean. Everything points to a smooth deployment.
That's usually the moment where things start getting interesting.
Because 400G doesn't fail the way people expect. It doesn't just go down. It sort of works and that's what makes it painful.
Most teams come from 10G, 40G, maybe 100G. At those speeds, you can get away with a lot. Cabling doesn't have to be perfect. Connectors don't have to be spotless. Margins are forgiving.
At 400G, that changes.
Not dramatically. Just enough to expose everything that wasn't quite right before.
So the first time you see it is usually not a hard failure. It's something subtle.
A link comes up, but error counters start creeping.
Another one stays up, but behaves differently under load.
A third one just refuses to come up, even though everything looks correct.
You start where everyone starts. Check config. Swap optics. Move ports. Nothing obvious fixes it.
Eventually, someone looks at the physical layer properly. Not "looks clean". Actually checks it.
And that's where the story usually turns.
A slightly dirty MPO connector. A marginal patch panel. A link that technically fits within spec, but only just.
At 100G, that would have passed unnoticed. At 400G, it doesn't.
Polarity is the next one. It's one of those things people assume is correct because it always has been. Until it isn't.
At 400G, one wrong assumption in your MPO layout is enough to keep a link completely down while everything else checks out. Optics are detected. Light levels look fine. Config is clean. Still no link.
So you lose time looking at layers that aren't the problem, until someone traces the fiber path end-to-end and finds the mismatch.
That's not an edge case. That's a standard failure mode.
[continues breakouts, power, cost of wasted time all in prose, no headers]
400G doesn't usually fail loudly. It fails quietly, inconsistently, and just enough to slow you down."
KEY ELEMENTS OF THIS STYLE B EXAMPLE 3:
- Opens with a situation the reader recognizes: "lab tests looked clean"
- Error described as behavior, not scenario: "sort of works" not "#### Scenario: Link Flapping"
- Physical layer investigation described as a process, not a procedure
- Polarity: one sentence on the problem, one sentence on how you find it no header, no bullet
- Measurement: "inspect the end-face" no "verify <0.5 dB with a scope" (scope is visual only)
- Power mentioned as real-world consequence ("adds up quickly") not a section
- Ending: the cost is lost time, stated simply and directly
- ZERO section headers, ZERO bullet lists, ZERO numbered steps
WRONG PATTERNS (both styles never produce):
"Thoroughly Test Your PoE Budget:" (PoE = wrong context, checklist = wrong format)
"QSFP-DD DR4 (Direct Attach)" (DR4 Direct Attach DAC is Direct Attach Copper)
@ -1009,6 +1065,84 @@ POWER / LOSS BUDGET PRECISION (always apply):
--- END GOLD STANDARD ---
`;
// ═══════════════════════════════════════════════════════
// STEP 8b: REDUCTION PASS — Remove 25-35% of content
// (2026-04-04: Added based on field feedback — articles were too long,
// repeated concepts, and "assembled" rather than written)
// ═══════════════════════════════════════════════════════
export const STEP8b_REDUCTION = `Cut this article by 2535%.
This is not optional. After the previous passes, the article has grown too long and repeats itself.
The goal is a tighter, more natural text not a shorter version of the same article.
WHAT TO REMOVE:
- Any concept explained more than once (pick its best version, cut the rest)
- Sentences that restate what the previous sentence already said
- Paragraphs that add length without adding new information or new angle
- "Setting up" sentences that don't earn their space ("This is something engineers often overlook...")
- Transition sentences that bridge to the same point you already made
- The weakest scenario or example if there are more than three
- Any section that reads like a template ("Hidden Costs:", "When Not To Use:", etc.) either integrate into narrative or cut
WHAT TO KEEP:
- The single strongest version of each key insight
- Real-world moments that feel like something that actually happened
- Specific numbers, values, and examples these carry weight
- Any line that a senior engineer would share or quote
TONE RULE: After cutting, the article should feel tighter and MORE confident not less. Shorter = stronger.
DO NOT change the writing style or tone. Do not add new content. Do not add section headers.
Return only the reduced article no commentary, no explanation of what you cut.
Article:
{{ARTICLE}}`;
// ═══════════════════════════════════════════════════════
// STEP 8c: STYLE LOCK — Ensure tone consistency throughout
// (2026-04-04: Added based on field feedback — tone switched between
// engineer voice and consulting/formal language mid-article)
// ═══════════════════════════════════════════════════════
export const STEP8c_STYLE_LOCK = `Check this article for tone inconsistency and fix it.
THE PROBLEM: The article starts with an engineer voice, then drifts into formal or consulting language mid-way.
This breaks the reader's trust. Once they notice the shift, the whole article feels fake.
SCAN FOR THESE TONE KILLERS:
- Paragraphs starting with "It is" or "This is" in a formal way after conversational sections
- Sentences using "typically", "often", "generally" where earlier sections used direct assertions
- Academic framing: "The challenge is often framed as...", "In practice, this tends to..."
- Corporate softening: "it is worth considering", "may be beneficial", "could potentially"
- Neutral advice after opinionated sections: "evaluate based on your requirements"
- Sudden textbook explanations in the middle of field narrative
- Passive voice appearing in an otherwise active-voice article
HOW TO FIX:
- Match the tone of the FIRST paragraph throughout if the opening is direct and specific, the rest must be too
- Convert passive voice to active: "links were found to be unstable" "links went unstable"
- Convert hedging to assertion: "this may cause issues" "this causes issues"
- Convert formal to conversational: "the operator is responsible for validation" "you own the validation"
- If a section genuinely can't match the opening tone because the content is different — that section doesn't belong in this article. Cut it to one sentence or remove it.
SCOPE vs OPM (measurement accuracy check one of the most common tone violations):
- Any sentence where a scope is said to MEASURE loss or dB values: fix it.
WRONG: "verify <0.5 dB insertion loss with a scope" (scope is visual, not a loss meter)
CORRECT: "inspect with a scope for contamination; use an OPM or OTDR to measure actual insertion loss"
- This is a TECHNICAL accuracy fix, not just a tone fix. Getting this wrong destroys credibility with optical engineers.
NO SKU RULE (fix if present):
- Remove any product SKU or model number that appears inline in the narrative text
(SKUs like "FX-400DR4-001", "QSFP-DD-400-DR4-001", etc. belong in product tables, not article flow)
- Replace with the technology class name: "400G DR4 optic" or "QSFP-DD DR4"
- Exception: if a specific product is cited from [VERIFIED PRICE] context data and is contextually necessary
Return only the fixed article. No commentary.
Article:
{{ARTICLE}}`;
/**
* Injects the calibration gold standard into the system prompt.
* Use sparingly only when available Ollama context allows.

View File

@ -16,8 +16,8 @@ import { pool } from "../db/client";
const pipelineProgress = new Map<string, { step: number; total: number; label: string; pct: number }>();
function setProgress(draftId: string, step: number, label: string): void {
const pct = Math.round((step / 10) * 92) + 2; // 2%..94% during run, 100% on complete
pipelineProgress.set(draftId, { step, total: 10, label, pct });
const pct = Math.round((step / 12) * 92) + 2; // 2%..94% during run, 100% on complete
pipelineProgress.set(draftId, { step, total: 12, label, pct });
}
function clearProgress(draftId: string): void {
@ -1001,6 +1001,8 @@ async function runLlmPipeline(
STEP6_TECHNICAL_DEEPENING,
STEP7_OPINION_LAYER,
STEP8_KILL_AI_TONE,
STEP8b_REDUCTION,
STEP8c_STYLE_LOCK,
STEP9_QA_CHECK,
STEP10_QUALITY_SCORE,
BLOG_TYPES,
@ -1010,6 +1012,7 @@ async function runLlmPipeline(
const LLM_OPTS = { temperature: 0.7, maxTokens: 6144, timeoutMs: 480000 };
const LLM_REFINE = { temperature: 0.4, maxTokens: 6144, timeoutMs: 480000 };
const TOTAL_STEPS = 12; // 10 original + 8b Reduction + 8c Style Lock
let stepsCompleted = 0;
try {
@ -1158,18 +1161,37 @@ async function runLlmPipeline(
);
stepsCompleted = 8;
// ═══ STEP 9: QA Check ═══
console.log(" Step 9/10: QA Check...");
setProgress(draftId, 9, "Step 9/10: QA Check");
const step9 = await generate(systemPrompt,
STEP9_QA_CHECK.replace("{{ARTICLE}}", step8.text),
// ═══ STEP 8b: Reduction Pass ═══
console.log(" Step 9/12: Reduction Pass (remove 25-35%)...");
setProgress(draftId, 9, "Step 9/12: Reduction Pass");
const step8b = await generate(systemPrompt,
STEP8b_REDUCTION.replace("{{ARTICLE}}", step8.text),
LLM_REFINE
);
stepsCompleted = 9;
console.log(` After reduction: ${step8b.text.split(/\s+/).length} words (was ${step8.text.split(/\s+/).length})`);
// ═══ STEP 8c: Style Lock ═══
console.log(" Step 10/12: Style Lock (tone consistency + scope/SKU fixes)...");
setProgress(draftId, 10, "Step 10/12: Style Lock");
const step8c = await generate(systemPrompt,
STEP8c_STYLE_LOCK.replace("{{ARTICLE}}", step8b.text),
LLM_REFINE
);
stepsCompleted = 10;
// ═══ STEP 9: QA Check ═══
console.log(" Step 11/12: QA Check...");
setProgress(draftId, 11, "Step 11/12: QA Check");
const step9 = await generate(systemPrompt,
STEP9_QA_CHECK.replace("{{ARTICLE}}", step8c.text),
LLM_REFINE
);
stepsCompleted = 11;
// ═══ STEP 10: Quality Score ═══
console.log(" Step 10/10: Quality Score...");
setProgress(draftId, 10, "Step 10/10: Quality Score");
console.log(" Step 12/12: Quality Score...");
setProgress(draftId, 12, "Step 12/12: Quality Score");
let autoQaScore: Record<string, unknown> | null = null;
try {
const step10 = await generate(systemPrompt,
@ -1185,7 +1207,7 @@ async function runLlmPipeline(
} catch {
console.log(" Quality scoring skipped (parse error)");
}
stepsCompleted = 10;
stepsCompleted = 12;
// Extract only the article from STEP9 output (QA returns review + fixed article)
// Look for "COMPLETE FIXED ARTICLE" marker and take everything after it
@ -1222,8 +1244,8 @@ async function runLlmPipeline(
await pool.query(
`UPDATE blog_drafts
SET draft_content = $1, word_count = $2,
generated_by = 'fo-blog-engine-v3',
pipeline_version = 'v3-flexoptix-style',
generated_by = 'fo-blog-engine-v4',
pipeline_version = 'v4-reduction-stylelock',
pipeline_steps_completed = $3,
auto_qa_score = $4,
outline = $5,

View File

@ -414,6 +414,7 @@ export async function scrapeFlexoptixCatalog(): Promise<void> {
{ search: "SFP56", defaultFF: "SFP56", defaultGbps: 50 },
{ search: "DAC", defaultFF: "SFP+", defaultGbps: 10 },
{ search: "AOC", defaultFF: "SFP+", defaultGbps: 10 },
{ search: "AEC", defaultFF: "OSFP", defaultGbps: 800 },
{ search: "breakout", defaultFF: "QSFP28", defaultGbps: 100 },
{ search: "BiDi", defaultFF: "SFP", defaultGbps: 1 },
{ search: "CWDM", defaultFF: "SFP", defaultGbps: 1 },
@ -488,12 +489,12 @@ export async function scrapeFlexoptixCatalog(): Promise<void> {
|| lower.includes("adapter") || lower.includes("attenuator") || lower.includes("coupler")) continue;
const url = `${BASE}/en/${item.url_key}.html`;
if (allProducts.has(url)) continue;
const formFactor = inferFormFactor(item.name, gq.defaultFF);
const gbps = inferSpeed(item.name, gq.defaultGbps);
const reach = detectReach(item.name);
const price = item.price_range?.minimum_price?.final_price?.value;
const validPrice = price && price > 0 && price < 100000 ? price : undefined;
const rawImg = item.small_image?.url;
const imageUrl = rawImg && !rawImg.includes("placeholder") ? rawImg : undefined;
@ -502,11 +503,20 @@ export async function scrapeFlexoptixCatalog(): Promise<void> {
// The base SKU (before ":") is the canonical FLEXOPTIX part number
const baseSku = item.sku.includes(":") ? item.sku.split(":")[0] : item.sku;
// If URL already in map (added by Phase 1 HTML scraper), enrich with GraphQL price/image
if (allProducts.has(url)) {
const existing = allProducts.get(url)!;
if (!existing.price && validPrice) existing.price = validPrice;
if (!existing.imageUrl && imageUrl) existing.imageUrl = imageUrl;
if (!existing.partNumber || existing.partNumber.length < baseSku.length) existing.partNumber = baseSku;
continue;
}
allProducts.set(url, {
name: item.name,
partNumber: baseSku,
url,
price: price && price > 0 && price < 100000 ? price : undefined,
price: validPrice,
currency: item.price_range?.minimum_price?.final_price?.currency || "EUR",
formFactor,
speed: speedLabel(gbps),
@ -557,7 +567,7 @@ export async function scrapeFlexoptixCatalog(): Promise<void> {
});
if (product.price && product.price > 0) {
const hash = contentHash(JSON.stringify({ price: product.price, part: product.partNumber }));
const hash = contentHash({ price: product.price, part: product.partNumber });
const updated = await upsertPriceObservation({
transceiverId: txId,
sourceVendorId: vendorId,

View File

@ -38,6 +38,11 @@ export async function upsertPriceObservation(params: {
);
if (existing.rows.length > 0 && existing.rows[0].content_hash === params.contentHash) {
// Price unchanged — but still ensure price_verified is set (in case it wasn't before)
await pool.query(
`UPDATE transceivers SET price_verified = true WHERE id = $1 AND (price_verified IS NULL OR price_verified = false)`,
[params.transceiverId]
);
return false; // No change
}