CODEX-TASK-flexoptix-reference-matching.md — comprehensive plan to fix zero-match gap for ATGBICS/NADDOD/10Gtek/ShopFiber24 (8.260+ products with 0 Flexoptix equivalences). Root cause: 30-day price_observation window excludes vendors whose scrapers ran >30 days ago. Solution: catalog-reconcile robot (full bulk match, no time limit), form_factor normalization (SQL 108), 30→90 day window fix in nightly matcher, on-demand API endpoint. Expected: coverage from 22% → 45-60% after one reconcile run.
738 lines
28 KiB
Markdown
738 lines
28 KiB
Markdown
# CODEX TASK: Flexoptix als Referenz-Katalog — Vollständige Equivalenz-Abdeckung
|
||
|
||
## Kontext & Problem
|
||
|
||
**Ziel:** Flexoptix-Katalog = absoluter Anker für ALLE Wettbewerbs-Preise in TIP.
|
||
Kein einziger Flexoptix-Artikel darf ohne Wettbewerbspreise sein.
|
||
|
||
**Aktueller Zustand (Stand 2026-05-13 — aus live DB-Analyse):**
|
||
|
||
```
|
||
Vendor | Products priced | FO matches (approved)
|
||
----------------|-----------------|----------------------
|
||
ATGBICS | 8.260 | 0 ← KRITISCH
|
||
NADDOD | 744 | 0 ← KRITISCH
|
||
ShopFiber24 | 312 | 0 ← KRITISCH
|
||
10Gtek | 49 | 0 ← KRITISCH
|
||
FiberMall | 304 | 1.011 ← OK
|
||
QSFPTEK | 206 | 162 ← OK
|
||
Fluxlight | 119 | 604 ← OK
|
||
SFPcables | 78 | 76 ← OK
|
||
GBICS | 72 | 133 ← OK
|
||
```
|
||
|
||
**Ursachen (diagnostiziert):**
|
||
|
||
1. **30-Tage-Filter im Matcher** — Der Query in `scheduler.ts` filtert Kandidaten via
|
||
`po.time > NOW() - INTERVAL '30 days'`. ATGBICS/NADDOD/10Gtek/ShopFiber24 haben
|
||
Preis-Observations, aber diese sind älter als 30 Tage → werden als Kandidaten
|
||
komplett ausgeschlossen.
|
||
|
||
2. **form_factor-Normalisierung fehlt** — Verschiedene Scraper schreiben unterschiedliche
|
||
Werte: `"SFP+"` vs `"SFP-Plus"` vs `"SFP+ (LC)"`. Der Matcher filtert exakt per
|
||
`t.form_factor = $1`, daher 0 Kandidaten bei abweichendem Format.
|
||
|
||
3. **Kein Initial-Reconcile** — Der Matcher läuft nur nightly für `competitor_verified = false`.
|
||
Es gibt keinen einmaligen Bulk-Reconcile der gesamten Katalog-Überschneidung.
|
||
|
||
---
|
||
|
||
## Repository
|
||
|
||
```
|
||
Pfad (lokal): /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db/
|
||
Pfad (Erik): /opt/tip/
|
||
Stack: TypeScript, Node.js, PostgreSQL 17, pg-boss
|
||
Haupt-Dateien: packages/scraper/src/scheduler.ts ← Matcher, pg-boss Jobs
|
||
packages/scraper/src/robots/ ← Neues Robot-Modul hier rein
|
||
sql/ ← Migrationen (aktuell 107 vorhanden)
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 1: Diagnose-Migration — form_factor-Normalisierungsübersicht
|
||
|
||
**Erstelle Datei:** `sql/108-form-factor-normalization.sql`
|
||
|
||
```sql
|
||
-- Migration 108: form_factor normalisieren (alle Schreibweisen → canonical)
|
||
-- Zweck: Matcher findet Kandidaten nur bei exaktem form_factor-Match.
|
||
-- Verschiedene Scraper schreiben inkonsistente Werte.
|
||
|
||
-- 1. Canonical-Mapping anwenden
|
||
UPDATE transceivers SET
|
||
form_factor = CASE
|
||
-- SFP Varianten
|
||
WHEN UPPER(TRIM(form_factor)) IN ('SFP', 'SFP (LC)', 'SFP DDM', 'SFP MODULE', '1G SFP', 'GLC', 'MINI-GBIC')
|
||
THEN 'SFP'
|
||
-- SFP+ Varianten
|
||
WHEN UPPER(TRIM(form_factor)) IN ('SFP+', 'SFP+ (LC)', 'SFP-PLUS', 'SFP PLUS', 'SFP+ DDM',
|
||
'SFP+ MODULE', '10G SFP+', 'SFP+ OPTICAL', '10GSFP+',
|
||
'SFP+/SFP28 COMPATIBLE', 'SFP+(LC)')
|
||
THEN 'SFP+'
|
||
-- SFP28 Varianten (25G)
|
||
WHEN UPPER(TRIM(form_factor)) IN ('SFP28', 'SFP-28', 'SFP 28', '25G SFP28', 'SFP28 (LC)',
|
||
'SFP28 DDM', '25GSFP28')
|
||
THEN 'SFP28'
|
||
-- QSFP+ Varianten (40G)
|
||
WHEN UPPER(TRIM(form_factor)) IN ('QSFP+', 'QSFP-PLUS', 'QSFP PLUS', '40G QSFP+',
|
||
'QSFP+ (MPO)', 'QSFP+ MODULE', 'QSFP+ DDM', '40GQSFP+')
|
||
THEN 'QSFP+'
|
||
-- QSFP28 Varianten (100G)
|
||
WHEN UPPER(TRIM(form_factor)) IN ('QSFP28', 'QSFP-28', 'QSFP 28', '100G QSFP28',
|
||
'QSFP28 (LC)', 'QSFP28 MODULE', 'QSFP28 DDM', '100GQSFP28')
|
||
THEN 'QSFP28'
|
||
-- QSFP56 Varianten (200G)
|
||
WHEN UPPER(TRIM(form_factor)) IN ('QSFP56', 'QSFP-56', '200G QSFP56', 'QSFP56-DD')
|
||
THEN 'QSFP56'
|
||
-- QSFP-DD Varianten (400G)
|
||
WHEN UPPER(TRIM(form_factor)) IN ('QSFP-DD', 'QSFPDD', 'QSFP DD', '400G QSFP-DD',
|
||
'QSFP-DD MODULE', 'QSFP56-DD 400G')
|
||
THEN 'QSFP-DD'
|
||
-- QSFP-DD800 / 800G
|
||
WHEN UPPER(TRIM(form_factor)) IN ('QSFP-DD800', 'QSFP-DD 800G', '800G QSFP-DD', 'OSFP-RHS')
|
||
THEN 'QSFP-DD800'
|
||
-- OSFP Varianten
|
||
WHEN UPPER(TRIM(form_factor)) IN ('OSFP', 'OSFP MODULE', '400G OSFP', '800G OSFP')
|
||
THEN 'OSFP'
|
||
-- CFP Varianten
|
||
WHEN UPPER(TRIM(form_factor)) IN ('CFP', 'CFP2', 'CFP4', 'CFP-DCO', 'CFP2-DCO')
|
||
THEN UPPER(TRIM(form_factor))
|
||
-- XFP
|
||
WHEN UPPER(TRIM(form_factor)) IN ('XFP', '10G XFP', 'XFP DDM')
|
||
THEN 'XFP'
|
||
-- X2 / XENPAK
|
||
WHEN UPPER(TRIM(form_factor)) IN ('X2', 'XENPAK', 'X2 MODULE')
|
||
THEN UPPER(TRIM(form_factor))
|
||
-- DAC Cable-Typen (kein optisches Modul — form_factor trotzdem normalisieren)
|
||
WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%QSFP28%' THEN 'QSFP28-DAC'
|
||
WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%QSFP+%' THEN 'QSFP+-DAC'
|
||
WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%SFP28%' THEN 'SFP28-DAC'
|
||
WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%SFP+%' THEN 'SFP+-DAC'
|
||
WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%QSFP28%' THEN 'QSFP28-AOC'
|
||
WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%QSFP+%' THEN 'QSFP+-AOC'
|
||
WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%SFP28%' THEN 'SFP28-AOC'
|
||
WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%SFP+%' THEN 'SFP+-AOC'
|
||
ELSE form_factor -- unbekannte Werte unverändert lassen
|
||
END
|
||
WHERE form_factor IS NOT NULL;
|
||
|
||
-- 2. speed_gbps normalisieren (sicherstellen: keine String-Artefakte)
|
||
-- Manche Scraper speichern '10.0', '10.00', '1.0' statt '10', '1' → numerisch aber inkonsistent
|
||
-- Da speed_gbps NUMERIC ist, normalisieren auf saubere Dezimalstellen
|
||
UPDATE transceivers SET
|
||
speed_gbps = ROUND(speed_gbps::NUMERIC, 2)
|
||
WHERE speed_gbps IS NOT NULL;
|
||
|
||
-- 3. Loggable Übersicht: welche form_factor-Werte noch unbekannt sind
|
||
DO $$
|
||
DECLARE
|
||
rec RECORD;
|
||
BEGIN
|
||
RAISE NOTICE '=== Unbekannte form_factor Werte (keine Normalisierung angewendet) ===';
|
||
FOR rec IN
|
||
SELECT form_factor, COUNT(*) as cnt
|
||
FROM transceivers
|
||
WHERE form_factor NOT IN (
|
||
'SFP','SFP+','SFP28','QSFP+','QSFP28','QSFP56','QSFP-DD','QSFP-DD800','OSFP',
|
||
'CFP','CFP2','CFP4','CFP-DCO','CFP2-DCO','XFP','X2','XENPAK',
|
||
'SFP-DAC','SFP+-DAC','SFP28-DAC','QSFP+-DAC','QSFP28-DAC',
|
||
'SFP+-AOC','SFP28-AOC','QSFP+-AOC','QSFP28-AOC'
|
||
)
|
||
AND form_factor IS NOT NULL
|
||
GROUP BY form_factor ORDER BY cnt DESC LIMIT 30
|
||
LOOP
|
||
RAISE NOTICE ' %: % transceivers', rec.form_factor, rec.cnt;
|
||
END LOOP;
|
||
END;
|
||
$$;
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 2: Full-Catalog-Reconcile-Robot
|
||
|
||
**Erstelle Datei:** `packages/scraper/src/robots/catalog-reconcile.ts`
|
||
|
||
Dieser Robot ist ein einmaliger Bulk-Matcher der ALLE Flexoptix-Produkte gegen
|
||
ALLE Wettbewerber abgleicht — ohne 30-Tage-Fenster, ohne `competitor_verified`-Filter.
|
||
Er nutzt die gleiche Logik wie der bestehende Nightly-Matcher, aber breiter.
|
||
|
||
```typescript
|
||
/**
|
||
* Catalog Reconcile Robot
|
||
*
|
||
* Vollständiger Bulk-Abgleich Flexoptix ↔ ALLE Wettbewerber.
|
||
*
|
||
* Unterschiede zum Nightly-Matcher (maintenance:find-equivalences):
|
||
* - Kein 30-Tage-Fenster für price_observations — alle Produkte mit JEMALS
|
||
* beobachteten Preisen werden als Kandidaten gewertet
|
||
* - Kein competitor_verified-Filter — auch bereits gematchte FX-Produkte
|
||
* bekommen neue Matches wenn neue Wettbewerberprodukte hinzukommen
|
||
* - Batch-Verarbeitung mit commit nach jeweils 100 Matches
|
||
* - Vollständiges Reporting am Ende
|
||
*
|
||
* Trigger: pg-boss Job "catalog:reconcile" (on-demand oder monatlich)
|
||
* Laufzeit: ~5–15 Minuten bei 1.000+ FX-Produkten
|
||
*/
|
||
|
||
import { pool } from "../utils/db";
|
||
|
||
export interface ReconcileResult {
|
||
flexoptixProcessed: number;
|
||
newAutoApproved: number;
|
||
newPending: number;
|
||
skippedLowConfidence: number;
|
||
skippedAlreadyMatched: number;
|
||
vendorBreakdown: Record<string, { autoApproved: number; pending: number }>;
|
||
durationMs: number;
|
||
}
|
||
|
||
// ── Konfigurations-Konstanten ────────────────────────────────────────────────
|
||
|
||
/** Minimum-Confidence für pending-Eintrag (unter diesem Schwellwert: ignorieren) */
|
||
const CONFIDENCE_MIN = 0.50;
|
||
|
||
/** Confidence-Schwellwert für auto_approved */
|
||
const CONFIDENCE_AUTO_APPROVE = 0.73;
|
||
|
||
/**
|
||
* Maximale Anzahl Tage seit letzter price_observation.
|
||
* NULL = kein Filter (alle Produkte mit mind. 1 Observation).
|
||
* Für Full-Reconcile: NULL.
|
||
*/
|
||
const MAX_PRICE_AGE_DAYS: number | null = null;
|
||
|
||
// ── Helper: Wellenlänge aus Text extrahieren ────────────────────────────────
|
||
|
||
function extractFirstNm(wavelengths: string | null): number | null {
|
||
if (!wavelengths) return null;
|
||
const m = wavelengths.match(/(\d{3,4})/);
|
||
return m ? parseInt(m[1], 10) : null;
|
||
}
|
||
|
||
// ── Haupt-Matching-Logik (identisch mit Nightly-Matcher) ────────────────────
|
||
|
||
function calcConfidence(
|
||
fx: { standard_name: string | null; fiber_type: string | null; reach_meters: number | null; wavelengths: string | null },
|
||
cand: { standard_name: string | null; fiber_type: string | null; reach_meters: number | null; wavelengths: string | null }
|
||
): { confidence: number; basis: string[] } {
|
||
// Max-Score: form_factor(25) + speed_gbps(20) + standard_name(30) +
|
||
// wavelength_nm(20) + fiber_type(10) + reach(10) = 115
|
||
// Beide form_factor und speed_gbps sind bereits durch den SQL-Filter gesichert.
|
||
let score = 0;
|
||
const basis: string[] = [];
|
||
|
||
score += 25; basis.push("form_factor");
|
||
score += 20; basis.push("speed_gbps");
|
||
|
||
if (
|
||
fx.standard_name && cand.standard_name &&
|
||
fx.standard_name.trim().toUpperCase() === cand.standard_name.trim().toUpperCase()
|
||
) {
|
||
score += 30; basis.push("standard_name");
|
||
}
|
||
|
||
const fxNm = extractFirstNm(fx.wavelengths);
|
||
const candNm = extractFirstNm(cand.wavelengths);
|
||
if (fxNm !== null && candNm !== null) {
|
||
if (Math.abs(fxNm - candNm) <= 15) {
|
||
score += 20; basis.push(`wavelength_${fxNm}nm`);
|
||
} else {
|
||
score -= 20;
|
||
}
|
||
}
|
||
|
||
if (fx.fiber_type && cand.fiber_type) {
|
||
if (fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase()) {
|
||
score += 10; basis.push("fiber_type");
|
||
} else {
|
||
score -= 15;
|
||
}
|
||
}
|
||
|
||
if (fx.reach_meters && cand.reach_meters && fx.reach_meters > 0 && cand.reach_meters > 0) {
|
||
const diff = Math.abs(fx.reach_meters - cand.reach_meters);
|
||
const tolerance = Math.max(fx.reach_meters, 1) * 0.25;
|
||
if (diff <= tolerance) {
|
||
score += 10; basis.push("reach");
|
||
} else {
|
||
score -= 15;
|
||
}
|
||
} else if (!fx.reach_meters && !cand.reach_meters) {
|
||
score += 5; basis.push("reach_null");
|
||
}
|
||
|
||
const confidence = Math.max(0, Math.min(1, score / 115));
|
||
return { confidence, basis };
|
||
}
|
||
|
||
// ── Haupt-Funktion ───────────────────────────────────────────────────────────
|
||
|
||
export async function runCatalogReconcile(): Promise<ReconcileResult> {
|
||
const startMs = Date.now();
|
||
console.log("=== Catalog Reconcile Robot ===");
|
||
console.log(` Started: ${new Date().toISOString()}`);
|
||
console.log(` Mode: FULL (no 30-day window, all vendors)`);
|
||
|
||
const result: ReconcileResult = {
|
||
flexoptixProcessed: 0,
|
||
newAutoApproved: 0,
|
||
newPending: 0,
|
||
skippedLowConfidence: 0,
|
||
skippedAlreadyMatched: 0,
|
||
vendorBreakdown: {},
|
||
} as ReconcileResult;
|
||
|
||
// ── Alle Flexoptix-Produkte laden ─────────────────────────────────────────
|
||
// Kein competitor_verified-Filter → wir reconcilen ALLES
|
||
const { rows: fxProducts } = await pool.query<{
|
||
id: string;
|
||
part_number: string;
|
||
standard_name: string | null;
|
||
form_factor: string | null;
|
||
speed_gbps: string | null;
|
||
fiber_type: string | null;
|
||
reach_meters: number | null;
|
||
wavelengths: string | null;
|
||
}>(`
|
||
SELECT t.id, t.part_number, t.standard_name, t.form_factor,
|
||
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths
|
||
FROM transceivers t
|
||
JOIN vendors v ON v.id = t.vendor_id
|
||
WHERE UPPER(v.name) LIKE '%FLEXOPTIX%'
|
||
AND t.form_factor IS NOT NULL
|
||
AND t.speed_gbps IS NOT NULL
|
||
ORDER BY t.part_number
|
||
`);
|
||
|
||
result.flexoptixProcessed = fxProducts.length;
|
||
console.log(` Flexoptix products to process: ${fxProducts.length}`);
|
||
|
||
const priceAgeFilter = MAX_PRICE_AGE_DAYS !== null
|
||
? `AND po.time > NOW() - INTERVAL '${MAX_PRICE_AGE_DAYS} days'`
|
||
: "";
|
||
|
||
for (const fx of fxProducts) {
|
||
if (!fx.form_factor || !fx.speed_gbps) continue;
|
||
|
||
// ── Wettbewerber-Kandidaten für dieses FX-Produkt ──────────────────────
|
||
// Kandidaten = alle Wettbewerber mit gleichem form_factor und speed_gbps
|
||
// die mindestens 1 price_observation haben (kein Zeitlimit)
|
||
const { rows: candidates } = await pool.query<{
|
||
competitor_id: string;
|
||
part_number: string;
|
||
standard_name: string | null;
|
||
form_factor: string | null;
|
||
speed_gbps: string | null;
|
||
fiber_type: string | null;
|
||
reach_meters: number | null;
|
||
wavelengths: string | null;
|
||
vendor_name: string;
|
||
last_price: Date | null;
|
||
price_count: string;
|
||
}>(`
|
||
SELECT t.id AS competitor_id, t.part_number, t.standard_name,
|
||
t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters,
|
||
t.wavelengths, v.name AS vendor_name,
|
||
MAX(po.time) AS last_price, COUNT(DISTINCT po.id) AS price_count
|
||
FROM transceivers t
|
||
JOIN vendors v ON v.id = t.vendor_id
|
||
JOIN price_observations po ON po.transceiver_id = t.id
|
||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
||
AND v.is_competitor = true
|
||
${priceAgeFilter}
|
||
AND UPPER(t.form_factor) = UPPER($1)
|
||
AND ROUND(t.speed_gbps::NUMERIC, 2) = ROUND($2::NUMERIC, 2)
|
||
AND t.id != $3
|
||
GROUP BY t.id, t.part_number, t.standard_name, t.form_factor,
|
||
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths, v.name
|
||
HAVING COUNT(DISTINCT po.id) >= 1
|
||
`, [fx.form_factor, fx.speed_gbps, fx.id]);
|
||
|
||
for (const cand of candidates) {
|
||
const { confidence, basis } = calcConfidence(fx, cand);
|
||
|
||
if (confidence < CONFIDENCE_MIN) {
|
||
result.skippedLowConfidence++;
|
||
continue;
|
||
}
|
||
|
||
const status = confidence >= CONFIDENCE_AUTO_APPROVE ? "auto_approved" : "pending";
|
||
const notes =
|
||
`${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` +
|
||
`basis: ${basis.join(", ")} | reach: ${fx.reach_meters}m vs ${cand.reach_meters}m | ` +
|
||
`wavelength: ${fx.wavelengths ?? "?"} vs ${cand.wavelengths ?? "?"} | ` +
|
||
`last_price: ${cand.last_price?.toISOString() ?? "never"} | ` +
|
||
`source: catalog-reconcile`;
|
||
|
||
// Upsert — bereits approved/rejected Einträge nicht überschreiben
|
||
const { rowCount } = await pool.query(`
|
||
INSERT INTO transceiver_equivalences
|
||
(flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
|
||
VALUES ($1, $2, $3, $4, $5, $6)
|
||
ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
|
||
confidence = EXCLUDED.confidence,
|
||
match_basis = EXCLUDED.match_basis,
|
||
match_notes = EXCLUDED.match_notes,
|
||
updated_at = NOW()
|
||
WHERE transceiver_equivalences.status NOT IN ('approved', 'rejected', 'auto_approved')
|
||
`, [fx.id, cand.competitor_id, confidence, basis, notes, status]);
|
||
|
||
const wasInsertOrUpdate = (rowCount ?? 0) > 0;
|
||
if (!wasInsertOrUpdate) {
|
||
result.skippedAlreadyMatched++;
|
||
continue;
|
||
}
|
||
|
||
// Vendor-Breakdown tracken
|
||
if (!result.vendorBreakdown[cand.vendor_name]) {
|
||
result.vendorBreakdown[cand.vendor_name] = { autoApproved: 0, pending: 0 };
|
||
}
|
||
|
||
if (status === "auto_approved") {
|
||
result.newAutoApproved++;
|
||
result.vendorBreakdown[cand.vendor_name].autoApproved++;
|
||
|
||
// competitor_verified auf FX-Produkt setzen
|
||
await pool.query(`
|
||
UPDATE transceivers
|
||
SET competitor_verified = true,
|
||
competitor_verified_at = NOW(),
|
||
competitor_status = 'matched',
|
||
competitor_status_updated_at = NOW()
|
||
WHERE id = $1 AND competitor_verified = false
|
||
`, [fx.id]);
|
||
} else {
|
||
result.newPending++;
|
||
result.vendorBreakdown[cand.vendor_name].pending++;
|
||
}
|
||
}
|
||
}
|
||
|
||
result.durationMs = Date.now() - startMs;
|
||
|
||
// ── Abschluss-Report ───────────────────────────────────────────────────────
|
||
console.log("\n=== Catalog Reconcile Results ===");
|
||
console.log(` Flexoptix processed: ${result.flexoptixProcessed}`);
|
||
console.log(` New auto_approved: ${result.newAutoApproved}`);
|
||
console.log(` New pending: ${result.newPending}`);
|
||
console.log(` Skipped (low confidence): ${result.skippedLowConfidence}`);
|
||
console.log(` Skipped (already matched): ${result.skippedAlreadyMatched}`);
|
||
console.log(` Duration: ${(result.durationMs / 1000).toFixed(1)}s`);
|
||
console.log("\n Vendor Breakdown:");
|
||
for (const [vendor, counts] of Object.entries(result.vendorBreakdown).sort(
|
||
(a, b) => (b[1].autoApproved + b[1].pending) - (a[1].autoApproved + a[1].pending)
|
||
)) {
|
||
console.log(` ${vendor.padEnd(20)} auto_approved=${counts.autoApproved} pending=${counts.pending}`);
|
||
}
|
||
console.log("=================================\n");
|
||
|
||
return result;
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 3: scheduler.ts — neuen Job registrieren
|
||
|
||
**Ändere in:** `packages/scraper/src/scheduler.ts`
|
||
|
||
### 3a: Queue in der Pre-Create-Liste hinzufügen
|
||
|
||
Suche nach dem Block mit `"sync:flexoptix-catalog"` in der queues-Liste und füge hinzu:
|
||
|
||
```typescript
|
||
"catalog:reconcile",
|
||
```
|
||
|
||
### 3b: Worker registrieren
|
||
|
||
Im Block wo `boss.work()` aufgerufen wird, nach dem `sync:flexoptix-catalog` Worker einfügen:
|
||
|
||
```typescript
|
||
// ── Catalog Reconcile — Full Bulk Match ─────────────────────────────────────
|
||
await boss.work("catalog:reconcile", async () => {
|
||
const ts = new Date().toISOString();
|
||
console.log(`[${ts}] Running: Full Catalog Reconcile (Flexoptix ↔ ALL competitors)`);
|
||
const { runCatalogReconcile } = await import("./robots/catalog-reconcile");
|
||
const result = await runCatalogReconcile();
|
||
console.log(`[catalog:reconcile] Done: ${result.newAutoApproved} auto_approved, ${result.newPending} pending`);
|
||
});
|
||
```
|
||
|
||
### 3c: Schedule registrieren
|
||
|
||
Nach dem `sync:flexoptix-catalog` Schedule:
|
||
|
||
```typescript
|
||
// Catalog Reconcile: monatlich Sonntag 04:00 UTC
|
||
// + on-demand via POST /api/maintenance/run { job: "catalog:reconcile" }
|
||
await boss.schedule("catalog:reconcile", "0 4 1 * *", {}, {
|
||
retryLimit: 2,
|
||
expireInSeconds: 3600,
|
||
});
|
||
```
|
||
|
||
### 3d: Nightly-Matcher erweitern (30-Tage-Filter lockern)
|
||
|
||
Im bestehenden `maintenance:find-equivalences` Worker, den Kandidaten-Query ändern:
|
||
|
||
**Vorher:**
|
||
```sql
|
||
JOIN price_observations po ON po.transceiver_id = t.id
|
||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
||
AND po.time > NOW() - INTERVAL '30 days'
|
||
AND t.form_factor = $1
|
||
AND t.speed_gbps = $2
|
||
```
|
||
|
||
**Nachher (90-Tage-Fenster + UPPER-normalisiert):**
|
||
```sql
|
||
JOIN price_observations po ON po.transceiver_id = t.id
|
||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
||
AND po.time > NOW() - INTERVAL '90 days'
|
||
AND UPPER(t.form_factor) = UPPER($1)
|
||
AND ROUND(t.speed_gbps::NUMERIC, 2) = ROUND($2::NUMERIC, 2)
|
||
AND t.id != $3
|
||
```
|
||
|
||
Diese Änderung bewirkt, dass ATGBICS/NADDOD/10Gtek/ShopFiber24 — die zuletzt vor >30
|
||
aber <90 Tagen gescrapt wurden — ab sofort als Kandidaten erscheinen.
|
||
|
||
---
|
||
|
||
## SCHRITT 4: API-Endpunkt — On-Demand Reconcile
|
||
|
||
**Ändere in:** `packages/api/src/routes/` (existierende maintenance-Route oder neue Datei)
|
||
|
||
Suche nach einer existierenden maintenance/admin Route-Datei. Falls vorhanden,
|
||
dort einen neuen Endpunkt einfügen. Falls nicht vorhanden, erstelle:
|
||
`packages/api/src/routes/maintenance.ts`
|
||
|
||
```typescript
|
||
import { FastifyPluginAsync } from "fastify";
|
||
import PgBoss from "pg-boss";
|
||
|
||
export const maintenanceRoutes: FastifyPluginAsync = async (fastify) => {
|
||
// POST /api/maintenance/run — startet einen pg-boss Job on-demand
|
||
fastify.post<{ Body: { job: string } }>("/run", {
|
||
schema: {
|
||
body: {
|
||
type: "object",
|
||
required: ["job"],
|
||
properties: {
|
||
job: {
|
||
type: "string",
|
||
enum: [
|
||
"catalog:reconcile",
|
||
"maintenance:find-equivalences",
|
||
"sync:flexoptix-catalog",
|
||
"enrich:wavelength",
|
||
],
|
||
},
|
||
},
|
||
},
|
||
},
|
||
}, async (request, reply) => {
|
||
const boss = fastify.boss as PgBoss; // boss muss als Fastify-Decoration verfügbar sein
|
||
|
||
const jobId = await boss.send(request.body.job, {});
|
||
return reply.code(202).send({
|
||
success: true,
|
||
job: request.body.job,
|
||
jobId,
|
||
message: `Job ${request.body.job} enqueued`,
|
||
});
|
||
});
|
||
|
||
// GET /api/maintenance/equivalences/summary — Abdeckungs-Report
|
||
fastify.get("/equivalences/summary", async (_request, reply) => {
|
||
const { pool } = await import("../utils/db");
|
||
|
||
const { rows } = await pool.query(`
|
||
SELECT
|
||
v.name AS vendor,
|
||
COUNT(DISTINCT t.id)::int AS products_with_prices,
|
||
COUNT(DISTINCT te.flexoptix_id)::int AS fo_matches_approved,
|
||
ROUND(
|
||
100.0 * COUNT(DISTINCT te.flexoptix_id) / NULLIF(
|
||
(SELECT COUNT(*) FROM transceivers fx
|
||
JOIN vendors fxv ON fxv.id = fx.vendor_id
|
||
WHERE UPPER(fxv.name) LIKE '%FLEXOPTIX%'),
|
||
0
|
||
), 1
|
||
)::float AS fo_coverage_pct
|
||
FROM vendors v
|
||
JOIN transceivers t ON t.vendor_id = v.id
|
||
JOIN price_observations po ON po.transceiver_id = t.id
|
||
LEFT JOIN transceiver_equivalences te ON te.competitor_id = t.id
|
||
AND te.status IN ('approved','auto_approved')
|
||
WHERE v.is_competitor = true
|
||
AND UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
||
GROUP BY v.id, v.name
|
||
HAVING COUNT(DISTINCT t.id) >= 5
|
||
ORDER BY fo_matches_approved DESC, products_with_prices DESC
|
||
`);
|
||
|
||
return reply.send({ success: true, data: rows });
|
||
});
|
||
};
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 5: Migration für Nightly-Matcher-Bugfix
|
||
|
||
**Erstelle Datei:** `sql/109-fix-nightly-matcher-time-window.sql`
|
||
|
||
```sql
|
||
-- Migration 109: Dokumentiert den 30→90 Tage Bugfix im Nightly-Matcher
|
||
-- Dieser SQL ist reine Dokumentation — die eigentliche Änderung ist in scheduler.ts
|
||
|
||
COMMENT ON TABLE transceiver_equivalences IS
|
||
'Flexoptix-zentrierter Equivalenz-Graph: flexoptix_id = Referenz-Anker,
|
||
competitor_id = äquivalentes Konkurrenzprodukt.
|
||
Status: pending (review nötig), auto_approved (Confidence ≥0.73),
|
||
approved (manuell), rejected (explizit ausgeschlossen).
|
||
KRITISCH: Matcher nutzt 90-Tage-Fenster (war: 30 Tage) damit Vendors
|
||
mit seltener Preisbeobachtung (ATGBICS/NADDOD/10Gtek/ShopFiber24) gefunden werden.';
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 6: One-Shot Reconcile nach Deployment ausführen
|
||
|
||
**Nachdem alles deployed ist**, einmalig auf Erik ausführen:
|
||
|
||
```bash
|
||
# Auf Erik: Migration 108 ausführen
|
||
psql -U tip -d transceiver_intelligence -f /opt/tip/sql/108-form-factor-normalization.sql
|
||
psql -U tip -d transceiver_intelligence -f /opt/tip/sql/109-fix-nightly-matcher-time-window.sql
|
||
|
||
# PM2 Daemon neu starten
|
||
pm2 restart tip-scraper-daemon --update-env
|
||
|
||
# 2 Minuten warten bis pg-boss Workers registriert
|
||
sleep 120
|
||
|
||
# Full Catalog Reconcile on-demand triggern
|
||
curl -s -X POST http://localhost:3001/api/maintenance/run \
|
||
-H "Content-Type: application/json" \
|
||
-d '{"job": "catalog:reconcile"}' | jq .
|
||
|
||
# Coverage-Report abrufen
|
||
curl -s http://localhost:3001/api/maintenance/equivalences/summary | jq '.data | .[:10]'
|
||
```
|
||
|
||
**Erwartetes Ergebnis des Reconcile:**
|
||
```
|
||
ATGBICS: ~500-2.000 neue Matches (von 8.260 Produkten, bei Überschneidung ~10-25%)
|
||
NADDOD: ~100-300 neue Matches
|
||
ShopFiber24: ~50-150 neue Matches
|
||
10Gtek: ~10-30 neue Matches
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 7: Completeness Dashboard Query
|
||
|
||
**Diese SQL** direkt in die TIP Dashboard-Komponente integrieren (z.B. in `packages/frontend`
|
||
oder als API-Response für das Admin-Dashboard):
|
||
|
||
```sql
|
||
-- Flexoptix Coverage Report
|
||
-- Zeigt wie viele FX-Produkte pro Wettbewerber gematcht sind
|
||
SELECT
|
||
v.name AS vendor,
|
||
COUNT(DISTINCT t.id) AS competitor_products,
|
||
COUNT(DISTINCT te.flexoptix_id) AS fo_products_matched,
|
||
ROUND(
|
||
100.0 * COUNT(DISTINCT te.flexoptix_id) /
|
||
NULLIF(fo_total.total, 0), 1
|
||
) AS coverage_pct,
|
||
MIN(po.time) AS first_observation,
|
||
MAX(po.time) AS last_observation
|
||
FROM vendors v
|
||
JOIN transceivers t ON t.vendor_id = v.id
|
||
JOIN price_observations po ON po.transceiver_id = t.id
|
||
LEFT JOIN transceiver_equivalences te ON te.competitor_id = t.id
|
||
AND te.status IN ('approved','auto_approved')
|
||
CROSS JOIN (
|
||
SELECT COUNT(*) as total
|
||
FROM transceivers fx
|
||
JOIN vendors fxv ON fxv.id = fx.vendor_id
|
||
WHERE UPPER(fxv.name) LIKE '%FLEXOPTIX%'
|
||
) fo_total
|
||
WHERE v.is_competitor = true
|
||
AND UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
||
GROUP BY v.id, v.name, fo_total.total
|
||
HAVING COUNT(DISTINCT t.id) >= 5
|
||
ORDER BY coverage_pct DESC NULLS LAST, competitor_products DESC;
|
||
```
|
||
|
||
---
|
||
|
||
## Ausführungsreihenfolge
|
||
|
||
```
|
||
1. sql/108-form-factor-normalization.sql → psql ausführen
|
||
2. packages/scraper/src/robots/catalog-reconcile.ts → neue Datei erstellen
|
||
3. packages/scraper/src/scheduler.ts → 3 Änderungen (Worker, Schedule, 30→90 Tage)
|
||
4. packages/api/src/routes/maintenance.ts → neue Datei (oder in bestehende Route einfügen)
|
||
5. sql/109-fix-nightly-matcher-time-window.sql → psql ausführen (Dokumentation)
|
||
6. npm run build -w packages/scraper → Build
|
||
7. npm run build -w packages/api → Build
|
||
8. git add -A && git commit -m "feat: Flexoptix reference matching overhaul"
|
||
9. git push origin HEAD:main
|
||
10. Auf Erik: git fetch && git reset --hard origin/main
|
||
11. Auf Erik: sql/108 und sql/109 per psql ausführen
|
||
12. pm2 restart tip-scraper-daemon --update-env
|
||
13. sleep 120 (pg-boss Workers registrieren)
|
||
14. curl -X POST .../api/maintenance/run '{"job":"catalog:reconcile"}'
|
||
15. Ergebnis prüfen: curl .../api/maintenance/equivalences/summary
|
||
```
|
||
|
||
---
|
||
|
||
## Erwartetes Ergebnis nach Deployment
|
||
|
||
| Vendor | Vorher | Nachher (geschätzt) |
|
||
|--------|--------|---------------------|
|
||
| ATGBICS | 0 Matches | 300–1.500 Matches |
|
||
| NADDOD | 0 Matches | 80–300 Matches |
|
||
| ShopFiber24 | 0 Matches | 40–150 Matches |
|
||
| 10Gtek | 0 Matches | 10–40 Matches |
|
||
| FiberMall | 1.011 Matches | ~1.011 (unverändert) |
|
||
| Coverage gesamt | ~22% | ~45–60% |
|
||
|
||
**Warum nicht 100% Coverage?**
|
||
Flexoptix hat ~457 aktive Produkte. Viele Wettbewerber-Produkte sind herstellerspezifische
|
||
Kompatibilität (z.B. "Cisco-kompatibel SFP+") ohne exaktes Gegenstück im FX-Katalog.
|
||
Die verbleibende Lücke erklärt sich aus:
|
||
- Wettbewerber-Produkte ohne FX-Äquivalent (Nischenspezifikationen, OEM-only)
|
||
- Fehlende Normalisierungsfelder auf Wettbewerber-Seite (null fiber_type etc.)
|
||
- Produkte die FX aus dem Sortiment genommen hat
|
||
|
||
---
|
||
|
||
## Was NICHT geändert wird
|
||
|
||
- Bestehende `transceiver_equivalences`-Einträge mit `status='approved'` oder `status='rejected'`
|
||
→ werden nie überschrieben (ON CONFLICT ... WHERE status NOT IN (...))
|
||
- Alle existierenden scraper-Logiken → nicht anfassen
|
||
- PM2-Konfiguration, Cloudflare Tunnel, Umgebungsvariablen → nicht anfassen
|
||
- `CODEX-TASK-zero-manual-review.md` Aufgaben → separater Task, nicht überlappen
|
||
|
||
---
|
||
|
||
## Zusammenhang mit CODEX-TASK-zero-manual-review.md
|
||
|
||
| Task | Fokus | Reihenfolge |
|
||
|------|-------|-------------|
|
||
| **dieser Task** | Coverage: ATGBICS/NADDOD/10Gtek/ShopFiber24 von 0% auf ~50% | Zuerst |
|
||
| zero-manual-review | Qualität: Confidence-Scores durch deterministischen Matcher ersetzen | Danach |
|
||
|
||
Erst Coverage herstellen, dann Qualität verbessern. Macht keinen Sinn,
|
||
die 0%-Vendors mit einem neuen Matcher zu matchen wenn sie nicht einmal im Kandidaten-Pool sind.
|