# CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching ## Ziel Die manuelle Review-Queue (`transceiver_equivalences` mit `status = 'pending'`) wird eliminiert. Statt probabilistischer Confidence-Scores wird ein deterministisches Exact-Match-System gebaut, das nur dann einen Match erzeugt, wenn ALLE Pflichtfelder vorhanden und exakt gleich sind. **Aktueller Zustand (PROBLEM):** - 13.374 Einträge in `pending` → manuelle Freigabe nötig - Confidence-Score 0.0–1.0 → "56%" bedeutet Unsicherheit → Review nötig - Fehlende Felder (wavelength=?) führen zu unsicheren Matches - `auto_approved` ab 0.73 Confidence — zu niedrige Schwelle **Ziel-Zustand (LÖSUNG):** - 0 Einträge in `pending` — nie wieder - Kein Confidence-Score — nur MATCH oder KEIN MATCH - Fehlende Felder → Enrichment-Job → kein Match bis Daten vollständig - Nur Exact-Match (mit definierten Toleranzen) → 100% verlässliche Daten --- ## Repository ``` Pfad: /opt/tip (auf Erik-Server) ODER /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db (lokal) Stack: TypeScript, Node.js, PostgreSQL 17, pg-boss (Job-Queue) Packages: packages/scraper/src/scheduler.ts ← Haupt-Matching-Logik packages/api/src/routes/review.ts ← API-Endpunkte sql/ ← DB-Migrationen (nummeriert 001-104) ``` --- ## SCHRITT 1: Datenbank-Migration — Neue Pflichtfelder **Erstelle Datei:** `sql/105-wavelength-connector-completeness.sql` ```sql -- Migration 105: Wavelength (TX/RX getrennt für BiDi), Connector-Normalisierung, -- Data-Completeness-Score, Enrichment-Flag -- 1. Wellenlängen-Felder aufteilen (BiDi hat TX ≠ RX) ALTER TABLE transceivers ADD COLUMN IF NOT EXISTS wavelength_tx_nm INTEGER, -- TX-Wellenlänge in nm (z.B. 1270) ADD COLUMN IF NOT EXISTS wavelength_rx_nm INTEGER, -- RX-Wellenlänge in nm (z.B. 1330) ADD COLUMN IF NOT EXISTS connector_type TEXT, -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN' ADD COLUMN IF NOT EXISTS data_completeness INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100), ADD COLUMN IF NOT EXISTS enrichment_needed BOOLEAN DEFAULT FALSE, ADD COLUMN IF NOT EXISTS enrichment_fields TEXT[] DEFAULT '{}'; -- welche Felder fehlen noch -- 2. Indices für Performance CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL; CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL; CREATE INDEX IF NOT EXISTS idx_tx_completeness ON transceivers (data_completeness); CREATE INDEX IF NOT EXISTS idx_tx_enrichment ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE; -- 3. Completeness-Berechnungsfunktion CREATE OR REPLACE FUNCTION calc_data_completeness( p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT, p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT ) RETURNS INTEGER AS $$ DECLARE score INTEGER := 0; BEGIN IF p_form_factor IS NOT NULL AND p_form_factor != '' THEN score := score + 20; END IF; IF p_speed_gbps IS NOT NULL AND p_speed_gbps > 0 THEN score := score + 20; END IF; IF p_fiber_type IS NOT NULL AND p_fiber_type != '' THEN score := score + 20; END IF; IF p_reach_meters IS NOT NULL AND p_reach_meters > 0 THEN score := score + 20; END IF; IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0 THEN score := score + 10; END IF; IF p_connector IS NOT NULL AND p_connector != '' THEN score := score + 10; END IF; RETURN score; END; $$ LANGUAGE plpgsql IMMUTABLE; -- 4. Alle bestehenden Transceivers: Completeness initial berechnen UPDATE transceivers SET data_completeness = calc_data_completeness( form_factor, speed_gbps, fiber_type, reach_meters, wavelength_tx_nm, connector_type ), enrichment_needed = ( form_factor IS NULL OR speed_gbps IS NULL OR fiber_type IS NULL OR reach_meters IS NULL OR wavelength_tx_nm IS NULL OR connector_type IS NULL ); -- 5. Trigger: Completeness automatisch aktualisieren CREATE OR REPLACE FUNCTION trg_update_completeness() RETURNS TRIGGER AS $$ BEGIN NEW.data_completeness := calc_data_completeness( NEW.form_factor, NEW.speed_gbps, NEW.fiber_type, NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type ); NEW.enrichment_needed := ( NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL ); -- Fehlende Felder dokumentieren NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[ CASE WHEN NEW.form_factor IS NULL THEN 'form_factor' END, CASE WHEN NEW.speed_gbps IS NULL THEN 'speed_gbps' END, CASE WHEN NEW.fiber_type IS NULL THEN 'fiber_type' END, CASE WHEN NEW.reach_meters IS NULL THEN 'reach_meters' END, CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END, CASE WHEN NEW.connector_type IS NULL THEN 'connector_type' END ], NULL); RETURN NEW; END; $$ LANGUAGE plpgsql; DROP TRIGGER IF EXISTS trg_completeness ON transceivers; CREATE TRIGGER trg_completeness BEFORE INSERT OR UPDATE ON transceivers FOR EACH ROW EXECUTE FUNCTION trg_update_completeness(); COMMENT ON COLUMN transceivers.wavelength_tx_nm IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.'; COMMENT ON COLUMN transceivers.wavelength_rx_nm IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.'; COMMENT ON COLUMN transceivers.connector_type IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN'; COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)'; COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed'; COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment'; ``` --- ## SCHRITT 2: IEEE/MSA Standards-Lookup-Tabelle (statisch) **Erstelle Datei:** `sql/106-ieee-msa-wavelength-lookup.sql` ```sql -- Migration 106: IEEE/MSA Standards Wavelength Lookup -- Ground-Truth für Wellenlänge basierend auf Standard-Spezifikation -- Quelle: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup ( id SERIAL PRIMARY KEY, form_factor TEXT NOT NULL, speed_gbps NUMERIC NOT NULL, fiber_type TEXT NOT NULL, -- 'SMF', 'MMF', 'DAC', 'AOC' reach_min_m INTEGER NOT NULL, reach_max_m INTEGER NOT NULL, wavelength_tx_nm INTEGER NOT NULL, wavelength_rx_nm INTEGER, -- NULL = gleich wie TX (kein BiDi) connector_type TEXT NOT NULL, ieee_standard TEXT, -- z.B. '802.3ae', 'SFF-8431' notes TEXT, UNIQUE (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m) ); INSERT INTO ieee_wavelength_lookup (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes) VALUES -- ── SFP (1G) ───────────────────────────────────────────────────────────────── ('SFP', 1, 'MMF', 0, 550, 850, NULL, 'LC', '802.3z', '1000BASE-SX'), ('SFP', 1, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3z', '1000BASE-LX'), ('SFP', 1, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3z', '1000BASE-EX'), ('SFP', 1, 'SMF', 0, 70000, 1550, NULL, 'LC', '802.3z', '1000BASE-ZX'), ('SFP', 1, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'), ('SFP', 1, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'), ('SFP', 1, 'Copper', 0, 100, NULL, NULL, 'RJ45','802.3ab', '1000BASE-T'), -- ── SFP+ (10G) ─────────────────────────────────────────────────────────────── ('SFP+', 10, 'MMF', 0, 300, 850, NULL, 'LC', '802.3ae', '10GBASE-SR'), ('SFP+', 10, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ae', '10GBASE-LR'), ('SFP+', 10, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ae', '10GBASE-ER'), ('SFP+', 10, 'SMF', 0, 80000, 1550, NULL, 'LC', '802.3ae', '10GBASE-ZR'), ('SFP+', 10, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'), ('SFP+', 10, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'), ('SFP+', 10, 'SMF', 0, 20000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'), ('SFP+', 10, 'SMF', 0, 20000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'), ('SFP+', 10, 'DAC', 0, 7, NULL, NULL, 'SFP+','SFF-8431','10G DAC Twinax'), ('SFP+', 10, 'AOC', 0, 100, 850, NULL, 'LC', 'SFF-8431', '10G AOC'), -- ── SFP28 (25G) ────────────────────────────────────────────────────────────── ('SFP28', 25, 'MMF', 0, 100, 850, NULL, 'LC', '802.3by', '25GBASE-SR'), ('SFP28', 25, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cc', '25GBASE-LR'), ('SFP28', 25, 'SMF', 0, 2000, 1310, NULL, 'LC', '25GBASE-DR','25GBASE-DR'), ('SFP28', 25, 'SMF', 0, 10000, 1270, 1330, 'LC', '802.3cc', '25GBASE-BX10-U BiDi'), ('SFP28', 25, 'SMF', 0, 10000, 1330, 1270, 'LC', '802.3cc', '25GBASE-BX10-D BiDi'), ('SFP28', 25, 'DAC', 0, 5, NULL, NULL, 'SFP28','802.3by','25G DAC'), -- ── QSFP+ (40G) ────────────────────────────────────────────────────────────── ('QSFP+', 40, 'MMF', 0, 150, 850, NULL, 'MPO-12','802.3ba','40GBASE-SR4'), ('QSFP+', 40, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ba', '40GBASE-LR4 CWDM4'), ('QSFP+', 40, 'SMF', 0, 2000, 1310, NULL, 'MPO-12','802.3bm','40GBASE-PSM4'), ('QSFP+', 40, 'DAC', 0, 7, NULL, NULL, 'QSFP+','802.3ba','40G DAC'), -- ── QSFP28 (100G) ──────────────────────────────────────────────────────────── ('QSFP28', 100, 'MMF', 0, 100, 850, NULL, 'MPO-12','802.3bm','100GBASE-SR4'), ('QSFP28', 100, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '100GBASE-LR4 CWDM4'), ('QSFP28', 100, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bj','100GBASE-DR (PSM4)'), ('QSFP28', 100, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ba', '100GBASE-ER4'), ('QSFP28', 100, 'SMF', 0, 2000, 1310, NULL, 'LC', 'CWDM4-MSA','100G CWDM4 2km'), ('QSFP28', 100, 'DAC', 0, 5, NULL, NULL, 'QSFP28','802.3bj','100G DAC'), ('QSFP28', 100, 'AOC', 0, 100, 850, NULL, 'MPO-12','802.3bm','100G AOC SR4'), -- ── QSFP56 (200G) ──────────────────────────────────────────────────────────── ('QSFP56', 200, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3cd','200GBASE-SR4'), ('QSFP56', 200, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3cd', '200GBASE-DR4'), ('QSFP56', 200, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '200GBASE-FR4'), ('QSFP56', 200, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3cd', '200GBASE-LR4'), -- ── QSFP-DD (400G) ─────────────────────────────────────────────────────────── ('QSFP-DD', 400, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3bs','400GBASE-SR8'), ('QSFP-DD', 400, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bs','400GBASE-DR4'), ('QSFP-DD', 400, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3bs', '400GBASE-FR4'), ('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3bs', '400GBASE-LR4'), ('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'MPO-12','800G MSA','400GBASE-PSM4'), ('QSFP-DD', 400, 'DAC', 0, 5, NULL, NULL, 'QSFP-DD','802.3bs','400G DAC'), -- ── OSFP / QSFP-DD800 (800G) ───────────────────────────────────────────────── ('OSFP', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'), ('OSFP', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8'), ('OSFP', 800, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3df', '800GBASE-FR4 2x400G'), ('OSFP', 800, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3df', '800GBASE-LR4'), ('QSFP-DD800', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'), ('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8') ON CONFLICT DO NOTHING; CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m); ``` --- ## SCHRITT 3: Enrichment-Robot (neues Modul) **Erstelle Datei:** `packages/scraper/src/robots/wavelength-enricher.ts` Dieser Robot läuft als pg-boss Job (`enrich:wavelength`) alle 4 Stunden. Er füllt `wavelength_tx_nm`, `wavelength_rx_nm`, `connector_type` automatisch. ```typescript /** * Wavelength Enricher Robot * * Füllt fehlende wavelength_tx_nm / wavelength_rx_nm / connector_type * aus drei Quellen (Priorität absteigend): * 1. IEEE/MSA Lookup-Tabelle (sql/106) — deterministisch, keine Kosten * 2. Produktname-Regex (heuristisch, pattern-basiert) * 3. Quarantäne: Produkt bleibt ohne Match bis Daten vorhanden * * Kein LLM, kein Scraper, keine externen Calls — rein datenbankbasiert. */ import { pool } from "../utils/db"; // ── Regex-Patterns für Wellenlänge aus Produktnamen ────────────────────────── const WAVELENGTH_PATTERNS: Array<{ pattern: RegExp; tx: number; rx?: number; notes: string; }> = [ // BiDi explizit { pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" }, { pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" }, { pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" }, { pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" }, { pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" }, // Direkte nm-Angabe { pattern: /\b850\s*nm\b/i, tx: 850, notes: "850nm explicit" }, { pattern: /\b1310\s*nm\b/i, tx: 1310, notes: "1310nm explicit" }, { pattern: /\b1550\s*nm\b/i, tx: 1550, notes: "1550nm explicit" }, { pattern: /\b1270\s*nm\b/i, tx: 1270, notes: "1270nm explicit" }, { pattern: /\b1330\s*nm\b/i, tx: 1330, notes: "1330nm explicit" }, // DWDM Channels (C-Band ~1530-1565nm) { pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550, notes: "DWDM C-Band" }, { pattern: /\bDWDM\b/i, tx: 1550, notes: "DWDM generic" }, // Standard-Kurzbezeichnungen → implizite Wellenlänge { pattern: /\bSR4?\b/i, tx: 850, notes: "SR/SR4 = 850nm MMF" }, { pattern: /\bLR4?\b/i, tx: 1310, notes: "LR/LR4 = 1310nm SMF" }, { pattern: /\bER4?\b/i, tx: 1310, notes: "ER/ER4 = 1310nm SMF" }, { pattern: /\bFR4?\b/i, tx: 1310, notes: "FR/FR4 = 1310nm SMF" }, { pattern: /\bDR4?\b/i, tx: 1310, notes: "DR/DR4 = 1310nm SMF" }, { pattern: /\bZR4?\b/i, tx: 1550, notes: "ZR/ZR4 = 1550nm SMF" }, ]; const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [ { pattern: /\bMPO.?16\b/i, connector: "MPO-16" }, { pattern: /\bMPO.?12\b/i, connector: "MPO-12" }, { pattern: /\bMPO\b/i, connector: "MPO-12" }, // default MPO = MPO-12 { pattern: /\bMTP\b/i, connector: "MPO-12" }, { pattern: /\bCS\s*connector\b/i, connector: "CS" }, { pattern: /\bSN\s*connector\b/i, connector: "SN" }, { pattern: /\bRJ.?45\b/i, connector: "RJ45" }, { pattern: /\bbase.?t\b/i, connector: "RJ45" }, { pattern: /\bSC\b/i, connector: "SC" }, { pattern: /\bLC\b/i, connector: "LC" }, // LC zuletzt (häufig im Text) ]; // DAC/AOC haben keinen Fiber-Connector const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i; function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null { for (const p of WAVELENGTH_PATTERNS) { if (p.pattern.test(name)) { return { tx: p.tx, rx: p.rx, notes: p.notes }; } } return null; } function extractConnectorFromName(name: string): string | null { if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC"; for (const p of CONNECTOR_PATTERNS) { if (p.pattern.test(name)) return p.connector; } return null; } export async function runWavelengthEnricher(): Promise { console.log("=== Wavelength Enricher Robot ==="); // Alle Transceivers mit fehlenden Pflichtfeldern const { rows: transceivers } = await pool.query<{ id: string; standard_name: string; part_number: string; form_factor: string; speed_gbps: number; fiber_type: string; reach_meters: number; wavelength_tx_nm: number | null; wavelength_rx_nm: number | null; connector_type: string | null; }>(` SELECT id, standard_name, part_number, form_factor, speed_gbps, fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type FROM transceivers WHERE enrichment_needed = TRUE ORDER BY data_completeness DESC -- Produkte mit mehr Daten zuerst LIMIT 5000 `); let fromIeee = 0; let fromRegex = 0; let stillMissing = 0; for (const t of transceivers) { let txNm = t.wavelength_tx_nm; let rxNm = t.wavelength_rx_nm; let connector = t.connector_type; let source = ""; // ── Quelle 1: IEEE/MSA Lookup ─────────────────────────────────────────── if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) { const { rows: ieee } = await pool.query<{ wavelength_tx_nm: number; wavelength_rx_nm: number | null; connector_type: string; }>(` SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type FROM ieee_wavelength_lookup WHERE form_factor = $1 AND speed_gbps = $2 AND fiber_type = $3 AND reach_min_m <= $4 AND reach_max_m >= $4 LIMIT 1 `, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]); if (ieee.length > 0) { txNm = ieee[0].wavelength_tx_nm; rxNm = ieee[0].wavelength_rx_nm ?? null; if (!connector) connector = ieee[0].connector_type; source = "ieee_lookup"; fromIeee++; } } // ── Quelle 2: Produktname-Regex ───────────────────────────────────────── const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" "); if (txNm === null && nameForExtraction) { const extracted = extractWavelengthFromName(nameForExtraction); if (extracted) { txNm = extracted.tx; rxNm = extracted.rx ?? null; source = `regex:${extracted.notes}`; fromRegex++; } } if (connector === null && nameForExtraction) { const extractedConn = extractConnectorFromName(nameForExtraction); if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn; } // ── Update wenn etwas gefunden ────────────────────────────────────────── if (txNm !== null || connector !== null) { await pool.query(` UPDATE transceivers SET wavelength_tx_nm = COALESCE($1, wavelength_tx_nm), wavelength_rx_nm = COALESCE($2, wavelength_rx_nm), connector_type = COALESCE($3, connector_type), updated_at = NOW() WHERE id = $4 `, [txNm, rxNm, connector, t.id]); } else { stillMissing++; } } console.log(` IEEE Lookup: ${fromIeee} enriched`); console.log(` Regex Extract: ${fromRegex} enriched`); console.log(` Still missing: ${stillMissing} (Quarantäne bis Daten verfügbar)`); console.log("=== Wavelength Enricher Complete ==="); } ``` --- ## SCHRITT 4: Neuer deterministischer Matcher **Ändere in:** `packages/scraper/src/scheduler.ts` Suche den Block `// fiber_type match` (ca. Zeile 2780) und ersetze die gesamte Matching-Funktion `maintenance:find-equivalences` durch folgende Logik: ```typescript // ── NEUE deterministische Match-Logik ──────────────────────────────────────── // Kein Confidence-Score mehr. Nur MATCH oder KEIN MATCH. // Voraussetzung: ALLE 6 Pflichtfelder müssen bekannt sein. // Pflichtfelder prüfen — fehlt auch nur eines → kein Match const fxComplete = fx.form_factor && fx.speed_gbps && fx.fiber_type && fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type; const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type && cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type; if (!fxComplete || !candComplete) { // Fehlende Daten → Enrichment-Queue, kein Match-Versuch incompleteCount++; continue; } // ── Exact Match (mit definierten Toleranzen) ────────────────────────────────── const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase(); if (!formFactorMatch) continue; // Hart: falsches Gehäuse = nicht kompatibel const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1; if (!speedMatch) continue; // Hart: 10G ≠ 25G const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase(); if (!fiberMatch) continue; // Hart: SMF ≠ MMF = komplett anderes Produkt // Reach: ±10% Toleranz (Herstellervarianz bei Kabelqualität) const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1); if (reachRatio > 0.10) continue; // Wellenlänge: ±5nm Toleranz (Herstellervarianz innerhalb Spec) const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0)); if (wlTxDiff > 5) continue; // BiDi RX nur prüfen wenn einer von beiden BiDi ist const fxHasBidi = fx.wavelength_rx_nm != null; const candHasBidi = cand.wavelength_rx_nm != null; if (fxHasBidi || candHasBidi) { const wlRxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0)); if (wlRxDiff > 5) continue; } // Connector: Exact Match (LC ≠ SC ≠ MPO-12) const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase(); if (!connMatch) continue; // ── Alle Tests bestanden → MATCH (100% sicher) ─────────────────────────────── const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector']; const notes = `${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` + `basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`; // Direkt auto_approved — kein pending mehr await pool.query(` INSERT INTO transceiver_equivalences (flexoptix_id, competitor_id, confidence, match_basis, match_notes, status) VALUES ($1, $2, 1.0, $3, $4, 'auto_approved') ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET confidence = 1.0, match_basis = EXCLUDED.match_basis, match_notes = EXCLUDED.match_notes, status = 'auto_approved', updated_at = NOW() WHERE transceiver_equivalences.status = 'pending' `, [fx.id, cand.competitor_id, matchBasis, notes]); // competitor_verified setzen await pool.query(` UPDATE transceivers SET competitor_verified = true, competitor_verified_at = NOW(), competitor_status = 'matched', competitor_status_updated_at = NOW() WHERE id = $1 AND competitor_verified = false `, [fx.id]); matchedCount++; ``` --- ## SCHRITT 5: Pending-Queue bereinigen **Erstelle Datei:** `sql/107-clear-pending-queue.sql` ```sql -- Migration 107: Pending Queue bereinigen -- Alle 'pending' Einträge die KEINEN deterministischen Match haben → reject -- Alle 'pending' Einträge die jetzt deterministische Matches wären → werden -- durch den neuen Matcher bei nächstem Run neu erzeugt (als auto_approved) -- Schritt 1: Alle pending-Einträge rejcten (veraltete unsichere Matches) UPDATE transceiver_equivalences SET status = 'rejected', reject_reason = 'Superseded by deterministic matcher — confidence-based match removed', reviewed_at = NOW(), reviewed_by = 'system:migration-107' WHERE status = 'pending'; -- Schritt 2: Statistik loggen DO $$ DECLARE pending_count INTEGER; approved_count INTEGER; rejected_count INTEGER; BEGIN SELECT COUNT(*) INTO pending_count FROM transceiver_equivalences WHERE status = 'pending'; SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved'); SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected'; RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%', pending_count, approved_count, rejected_count; END; $$; -- Schritt 3: Index für deterministischen Matcher optimieren CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences (flexoptix_id, competitor_id, status) WHERE status = 'auto_approved'; ``` --- ## SCHRITT 6: pg-boss Job für Enrichment-Robot registrieren **Ändere in:** `packages/scraper/src/scheduler.ts` Im Block wo Jobs registriert werden, hinzufügen: ```typescript // Wavelength Enricher — läuft alle 4 Stunden await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {}); // Handler registrieren boss.work('enrich:wavelength', async () => { await runWavelengthEnricher(); }); ``` --- ## SCHRITT 7: API — Manual Review UI deaktivieren **Ändere in:** `packages/api/src/routes/review.ts` Den POST-Endpunkt `/equivalences/:id/approve` mit einem Guard versehen: ```typescript reviewRouter.post("/equivalences/:id/approve", async (req, res) => { // Manual approval ist deaktiviert — deterministischer Matcher übernimmt res.status(410).json({ error: "Manual approval disabled", message: "The system now uses deterministic matching. No manual review needed.", info: "Matches are auto-approved when all 6 mandatory fields match exactly." }); }); ``` --- ## SCHRITT 8: Transceivers-Query für Matcher aktualisieren **Ändere in:** `packages/scraper/src/scheduler.ts` Den SQL-Query der Flexoptix-Transceivers für den Matcher aktualisieren, damit wavelength_tx_nm und connector_type mitgeladen werden: ```sql -- Flexoptix-Transceivers für Matcher laden SELECT t.id, t.part_number, t.standard_name, t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths, t.wavelength_tx_nm, -- NEU t.wavelength_rx_nm, -- NEU t.connector_type, -- NEU (war vorher 'connector') t.data_completeness, -- NEU t.enrichment_needed -- NEU FROM transceivers t JOIN vendors v ON v.id = t.vendor_id WHERE v.name = 'Flexoptix' AND t.enrichment_needed = FALSE -- NUR vollständige Datensätze matchen AND t.data_completeness >= 80 ORDER BY t.part_number ``` Gleiches für den Kandidaten-Query (Competitor-Transceivers). --- ## SCHRITT 9: Bestehende wavelengths-Spalte migrieren Die existierende Spalte `transceivers.wavelengths` (TEXT, z.B. "1310nm" oder "1270/1330nm") in die neuen numerischen Spalten überführen: **Erstelle Datei:** `sql/108-migrate-wavelengths-text-to-int.sql` ```sql -- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER UPDATE transceivers SET wavelength_tx_nm = CASE WHEN wavelengths ~ '^\s*850' THEN 850 WHEN wavelengths ~ '^\s*1270' THEN 1270 WHEN wavelengths ~ '^\s*1310' THEN 1310 WHEN wavelengths ~ '^\s*1330' THEN 1330 WHEN wavelengths ~ '^\s*1490' THEN 1490 WHEN wavelengths ~ '^\s*1550' THEN 1550 WHEN wavelengths ~ '^\s*1270\s*/\s*1330' THEN 1270 WHEN wavelengths ~ '^\s*1330\s*/\s*1270' THEN 1330 ELSE NULL END, wavelength_rx_nm = CASE WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330 WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270 WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550 WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310 ELSE NULL END WHERE wavelengths IS NOT NULL AND wavelength_tx_nm IS NULL; -- Connector aus alter connector-Spalte übernehmen (falls vorhanden) UPDATE transceivers SET connector_type = CASE connector WHEN 'LC' THEN 'LC' WHEN 'SC' THEN 'SC' WHEN 'MPO' THEN 'MPO-12' WHEN 'MPO-12' THEN 'MPO-12' WHEN 'MPO-16' THEN 'MPO-16' WHEN 'RJ45' THEN 'RJ45' ELSE connector END WHERE connector IS NOT NULL AND connector_type IS NULL; -- Completeness neu berechnen nach Migration UPDATE transceivers SET data_completeness = calc_data_completeness( form_factor, speed_gbps, fiber_type, reach_meters, wavelength_tx_nm, connector_type ), enrichment_needed = ( form_factor IS NULL OR speed_gbps IS NULL OR fiber_type IS NULL OR reach_meters IS NULL OR wavelength_tx_nm IS NULL OR connector_type IS NULL ); ``` --- ## Ausführungsreihenfolge für Codex ``` 1. sql/105-wavelength-connector-completeness.sql → psql ausführen 2. sql/106-ieee-msa-wavelength-lookup.sql → psql ausführen 3. sql/108-migrate-wavelengths-text-to-int.sql → psql ausführen 4. packages/scraper/src/robots/wavelength-enricher.ts → neue Datei erstellen 5. packages/scraper/src/scheduler.ts → Matcher-Logik ersetzen (Schritt 4) 6. packages/scraper/src/scheduler.ts → Job registrieren (Schritt 6) 7. packages/api/src/routes/review.ts → Approve-Endpunkt deaktivieren (Schritt 7) 8. npm run build -w packages/scraper → Build 9. npm run build -w packages/api → Build 10. sql/107-clear-pending-queue.sql → psql ausführen (ZULETZT — bereinigt Queue) 11. pm2 restart tip-scraper-daemon → Daemon neu starten 12. Enricher manuell triggern: POST /api/review/run-matcher ``` --- ## Erwartetes Ergebnis nach Deployment | Metrik | Vorher | Nachher | |--------|--------|---------| | Pending Queue | 13.374 | 0 | | Confidence-Score | 0.0–1.0 (fuzzy) | entfällt | | Match-Typ | probabilistisch | deterministisch | | Manuelle Freigaben/Tag | ~50-200 | 0 | | False-Positive-Rate | ~15% | ~0% | | Transceivers mit Wellenlänge | ~30% | >85% (nach Enricher) | --- ## Nicht ändern (außerhalb dieses Tasks) - `packages/scraper/src/scrapers/*` — Scraper-Logik bleibt unverändert - `packages/api/src/routes/` (außer review.ts) — API-Endpunkte bleiben - `sql/001-036` — Bestehende Migrationen nicht anfassen - Alle `packages/core/src/` Typen — nur erweitern, nicht löschen - PM2-Konfiguration — nicht anfassen