transceiver-db/CODEX-TASK-zero-manual-review.md
Rene Fichtmueller 0edc6e3f3a feat: Pi scraper fleet — fetch-only index-pi.ts + FS.COM/NADDOD via SOCKS5
- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets)
  added NADDOD (fetch-based, benefits from residential IP)
  now 32 fetch-only queues safe for ARM/Pi without Chromium
- index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik
  routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080
  Crawlee ProxyConfiguration automatically applies to Playwright crawler
- pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative)
- CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup
  covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js
- CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
2026-05-10 09:53:55 +02:00

31 KiB
Raw Permalink Blame History

CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching

Ziel

Die manuelle Review-Queue (transceiver_equivalences mit status = 'pending') wird eliminiert. Statt probabilistischer Confidence-Scores wird ein deterministisches Exact-Match-System gebaut, das nur dann einen Match erzeugt, wenn ALLE Pflichtfelder vorhanden und exakt gleich sind.

Aktueller Zustand (PROBLEM):

  • 13.374 Einträge in pending → manuelle Freigabe nötig
  • Confidence-Score 0.01.0 → "56%" bedeutet Unsicherheit → Review nötig
  • Fehlende Felder (wavelength=?) führen zu unsicheren Matches
  • auto_approved ab 0.73 Confidence — zu niedrige Schwelle

Ziel-Zustand (LÖSUNG):

  • 0 Einträge in pending — nie wieder
  • Kein Confidence-Score — nur MATCH oder KEIN MATCH
  • Fehlende Felder → Enrichment-Job → kein Match bis Daten vollständig
  • Nur Exact-Match (mit definierten Toleranzen) → 100% verlässliche Daten

Repository

Pfad:     /opt/tip   (auf Erik-Server)  ODER
          /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db  (lokal)

Stack:    TypeScript, Node.js, PostgreSQL 17, pg-boss (Job-Queue)
Packages: packages/scraper/src/scheduler.ts  ← Haupt-Matching-Logik
          packages/api/src/routes/review.ts   ← API-Endpunkte
          sql/                                ← DB-Migrationen (nummeriert 001-104)

SCHRITT 1: Datenbank-Migration — Neue Pflichtfelder

Erstelle Datei: sql/105-wavelength-connector-completeness.sql

-- Migration 105: Wavelength (TX/RX getrennt für BiDi), Connector-Normalisierung,
-- Data-Completeness-Score, Enrichment-Flag

-- 1. Wellenlängen-Felder aufteilen (BiDi hat TX ≠ RX)
ALTER TABLE transceivers
  ADD COLUMN IF NOT EXISTS wavelength_tx_nm  INTEGER,   -- TX-Wellenlänge in nm (z.B. 1270)
  ADD COLUMN IF NOT EXISTS wavelength_rx_nm  INTEGER,   -- RX-Wellenlänge in nm (z.B. 1330)
  ADD COLUMN IF NOT EXISTS connector_type    TEXT,      -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN'
  ADD COLUMN IF NOT EXISTS data_completeness INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100),
  ADD COLUMN IF NOT EXISTS enrichment_needed BOOLEAN DEFAULT FALSE,
  ADD COLUMN IF NOT EXISTS enrichment_fields TEXT[]  DEFAULT '{}';  -- welche Felder fehlen noch

-- 2. Indices für Performance
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_completeness  ON transceivers (data_completeness);
CREATE INDEX IF NOT EXISTS idx_tx_enrichment    ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE;

-- 3. Completeness-Berechnungsfunktion
CREATE OR REPLACE FUNCTION calc_data_completeness(
  p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT,
  p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT
) RETURNS INTEGER AS $$
DECLARE
  score INTEGER := 0;
BEGIN
  IF p_form_factor IS NOT NULL AND p_form_factor != '' THEN score := score + 20; END IF;
  IF p_speed_gbps  IS NOT NULL AND p_speed_gbps > 0    THEN score := score + 20; END IF;
  IF p_fiber_type  IS NOT NULL AND p_fiber_type != ''  THEN score := score + 20; END IF;
  IF p_reach_meters IS NOT NULL AND p_reach_meters > 0  THEN score := score + 20; END IF;
  IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0 THEN score := score + 10; END IF;
  IF p_connector   IS NOT NULL AND p_connector != ''   THEN score := score + 10; END IF;
  RETURN score;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- 4. Alle bestehenden Transceivers: Completeness initial berechnen
UPDATE transceivers SET
  data_completeness = calc_data_completeness(
    form_factor, speed_gbps, fiber_type,
    reach_meters, wavelength_tx_nm, connector_type
  ),
  enrichment_needed = (
    form_factor IS NULL OR speed_gbps IS NULL OR
    fiber_type IS NULL OR reach_meters IS NULL OR
    wavelength_tx_nm IS NULL OR connector_type IS NULL
  );

-- 5. Trigger: Completeness automatisch aktualisieren
CREATE OR REPLACE FUNCTION trg_update_completeness()
RETURNS TRIGGER AS $$
BEGIN
  NEW.data_completeness := calc_data_completeness(
    NEW.form_factor, NEW.speed_gbps, NEW.fiber_type,
    NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type
  );
  NEW.enrichment_needed := (
    NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR
    NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR
    NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL
  );
  -- Fehlende Felder dokumentieren
  NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[
    CASE WHEN NEW.form_factor     IS NULL THEN 'form_factor'     END,
    CASE WHEN NEW.speed_gbps      IS NULL THEN 'speed_gbps'      END,
    CASE WHEN NEW.fiber_type      IS NULL THEN 'fiber_type'      END,
    CASE WHEN NEW.reach_meters    IS NULL THEN 'reach_meters'    END,
    CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END,
    CASE WHEN NEW.connector_type  IS NULL THEN 'connector_type'  END
  ], NULL);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_completeness ON transceivers;
CREATE TRIGGER trg_completeness
  BEFORE INSERT OR UPDATE ON transceivers
  FOR EACH ROW EXECUTE FUNCTION trg_update_completeness();

COMMENT ON COLUMN transceivers.wavelength_tx_nm  IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.';
COMMENT ON COLUMN transceivers.wavelength_rx_nm  IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.';
COMMENT ON COLUMN transceivers.connector_type    IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN';
COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)';
COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed';
COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment';

SCHRITT 2: IEEE/MSA Standards-Lookup-Tabelle (statisch)

Erstelle Datei: sql/106-ieee-msa-wavelength-lookup.sql

-- Migration 106: IEEE/MSA Standards Wavelength Lookup
-- Ground-Truth für Wellenlänge basierend auf Standard-Spezifikation
-- Quelle: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs

CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup (
  id              SERIAL PRIMARY KEY,
  form_factor     TEXT    NOT NULL,
  speed_gbps      NUMERIC NOT NULL,
  fiber_type      TEXT    NOT NULL,   -- 'SMF', 'MMF', 'DAC', 'AOC'
  reach_min_m     INTEGER NOT NULL,
  reach_max_m     INTEGER NOT NULL,
  wavelength_tx_nm INTEGER NOT NULL,
  wavelength_rx_nm INTEGER,           -- NULL = gleich wie TX (kein BiDi)
  connector_type  TEXT    NOT NULL,
  ieee_standard   TEXT,               -- z.B. '802.3ae', 'SFF-8431'
  notes           TEXT,
  UNIQUE (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m)
);

INSERT INTO ieee_wavelength_lookup
  (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes)
VALUES
-- ── SFP (1G) ─────────────────────────────────────────────────────────────────
  ('SFP',  1,   'MMF', 0,     550,   850,  NULL,  'LC', '802.3z',   '1000BASE-SX'),
  ('SFP',  1,   'SMF', 0,    10000,  1310, NULL,  'LC', '802.3z',   '1000BASE-LX'),
  ('SFP',  1,   'SMF', 0,    40000,  1310, NULL,  'LC', '802.3z',   '1000BASE-EX'),
  ('SFP',  1,   'SMF', 0,    70000,  1550, NULL,  'LC', '802.3z',   '1000BASE-ZX'),
  ('SFP',  1,   'SMF', 0,    10000,  1270, 1330,  'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'),
  ('SFP',  1,   'SMF', 0,    10000,  1330, 1270,  'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'),
  ('SFP',  1,   'Copper', 0,   100,  NULL, NULL,  'RJ45','802.3ab', '1000BASE-T'),
-- ── SFP+ (10G) ───────────────────────────────────────────────────────────────
  ('SFP+', 10,  'MMF', 0,     300,   850,  NULL,  'LC', '802.3ae',  '10GBASE-SR'),
  ('SFP+', 10,  'SMF', 0,   10000,  1310,  NULL,  'LC', '802.3ae',  '10GBASE-LR'),
  ('SFP+', 10,  'SMF', 0,   40000,  1310,  NULL,  'LC', '802.3ae',  '10GBASE-ER'),
  ('SFP+', 10,  'SMF', 0,   80000,  1550,  NULL,  'LC', '802.3ae',  '10GBASE-ZR'),
  ('SFP+', 10,  'SMF', 0,   10000,  1270,  1330,  'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'),
  ('SFP+', 10,  'SMF', 0,   10000,  1330,  1270,  'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'),
  ('SFP+', 10,  'SMF', 0,   20000,  1270,  1330,  'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'),
  ('SFP+', 10,  'SMF', 0,   20000,  1330,  1270,  'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'),
  ('SFP+', 10,  'DAC', 0,       7,  NULL,  NULL,  'SFP+','SFF-8431','10G DAC Twinax'),
  ('SFP+', 10,  'AOC', 0,     100,  850,   NULL,  'LC', 'SFF-8431', '10G AOC'),
-- ── SFP28 (25G) ──────────────────────────────────────────────────────────────
  ('SFP28', 25, 'MMF', 0,     100,   850,  NULL,  'LC', '802.3by',  '25GBASE-SR'),
  ('SFP28', 25, 'SMF', 0,   10000,  1310,  NULL,  'LC', '802.3cc',  '25GBASE-LR'),
  ('SFP28', 25, 'SMF', 0,    2000,  1310,  NULL,  'LC', '25GBASE-DR','25GBASE-DR'),
  ('SFP28', 25, 'SMF', 0,   10000,  1270,  1330,  'LC', '802.3cc',  '25GBASE-BX10-U BiDi'),
  ('SFP28', 25, 'SMF', 0,   10000,  1330,  1270,  'LC', '802.3cc',  '25GBASE-BX10-D BiDi'),
  ('SFP28', 25, 'DAC', 0,       5,  NULL,  NULL,  'SFP28','802.3by','25G DAC'),
-- ── QSFP+ (40G) ──────────────────────────────────────────────────────────────
  ('QSFP+', 40, 'MMF', 0,     150,   850,  NULL,  'MPO-12','802.3ba','40GBASE-SR4'),
  ('QSFP+', 40, 'SMF', 0,   10000,  1310,  NULL,  'LC', '802.3ba',  '40GBASE-LR4 CWDM4'),
  ('QSFP+', 40, 'SMF', 0,    2000,  1310,  NULL,  'MPO-12','802.3bm','40GBASE-PSM4'),
  ('QSFP+', 40, 'DAC', 0,       7,  NULL,  NULL,  'QSFP+','802.3ba','40G DAC'),
-- ── QSFP28 (100G) ────────────────────────────────────────────────────────────
  ('QSFP28', 100, 'MMF', 0,    100,  850,  NULL,  'MPO-12','802.3bm','100GBASE-SR4'),
  ('QSFP28', 100, 'SMF', 0,  10000, 1310,  NULL,  'LC', '802.3cd',  '100GBASE-LR4 CWDM4'),
  ('QSFP28', 100, 'SMF', 0,    500, 1310,  NULL,  'MPO-12','802.3bj','100GBASE-DR (PSM4)'),
  ('QSFP28', 100, 'SMF', 0,  40000, 1310,  NULL,  'LC', '802.3ba',  '100GBASE-ER4'),
  ('QSFP28', 100, 'SMF', 0,   2000, 1310,  NULL,  'LC', 'CWDM4-MSA','100G CWDM4 2km'),
  ('QSFP28', 100, 'DAC', 0,      5, NULL,  NULL,  'QSFP28','802.3bj','100G DAC'),
  ('QSFP28', 100, 'AOC', 0,    100,  850,  NULL,  'MPO-12','802.3bm','100G AOC SR4'),
-- ── QSFP56 (200G) ────────────────────────────────────────────────────────────
  ('QSFP56', 200, 'MMF', 0,    100,  850,  NULL,  'MPO-16','802.3cd','200GBASE-SR4'),
  ('QSFP56', 200, 'SMF', 0,   2000, 1310,  NULL,  'LC', '802.3cd',  '200GBASE-DR4'),
  ('QSFP56', 200, 'SMF', 0,  10000, 1310,  NULL,  'LC', '802.3cd',  '200GBASE-FR4'),
  ('QSFP56', 200, 'SMF', 0,  40000, 1310,  NULL,  'LC', '802.3cd',  '200GBASE-LR4'),
-- ── QSFP-DD (400G) ───────────────────────────────────────────────────────────
  ('QSFP-DD', 400, 'MMF', 0,   100,  850,  NULL,  'MPO-16','802.3bs','400GBASE-SR8'),
  ('QSFP-DD', 400, 'SMF', 0,   500, 1310,  NULL,  'MPO-12','802.3bs','400GBASE-DR4'),
  ('QSFP-DD', 400, 'SMF', 0,  2000, 1310,  NULL,  'LC', '802.3bs',  '400GBASE-FR4'),
  ('QSFP-DD', 400, 'SMF', 0, 10000, 1310,  NULL,  'LC', '802.3bs',  '400GBASE-LR4'),
  ('QSFP-DD', 400, 'SMF', 0, 10000, 1310,  NULL,  'MPO-12','800G MSA','400GBASE-PSM4'),
  ('QSFP-DD', 400, 'DAC', 0,     5, NULL,  NULL,  'QSFP-DD','802.3bs','400G DAC'),
-- ── OSFP / QSFP-DD800 (800G) ─────────────────────────────────────────────────
  ('OSFP',    800, 'MMF', 0,    100,  850,  NULL,  'MPO-16','802.3df','800GBASE-SR8'),
  ('OSFP',    800, 'SMF', 0,    500, 1310,  NULL,  'MPO-12','802.3df','800GBASE-DR8'),
  ('OSFP',    800, 'SMF', 0,   2000, 1310,  NULL,  'LC', '802.3df',  '800GBASE-FR4 2x400G'),
  ('OSFP',    800, 'SMF', 0,  10000, 1310,  NULL,  'LC', '802.3df',  '800GBASE-LR4'),
  ('QSFP-DD800', 800, 'MMF', 0, 100, 850,  NULL,  'MPO-16','802.3df','800GBASE-SR8'),
  ('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL,  'MPO-12','802.3df','800GBASE-DR8')
ON CONFLICT DO NOTHING;

CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup
  (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m);

SCHRITT 3: Enrichment-Robot (neues Modul)

Erstelle Datei: packages/scraper/src/robots/wavelength-enricher.ts

Dieser Robot läuft als pg-boss Job (enrich:wavelength) alle 4 Stunden. Er füllt wavelength_tx_nm, wavelength_rx_nm, connector_type automatisch.

/**
 * Wavelength Enricher Robot
 *
 * Füllt fehlende wavelength_tx_nm / wavelength_rx_nm / connector_type
 * aus drei Quellen (Priorität absteigend):
 *   1. IEEE/MSA Lookup-Tabelle (sql/106) — deterministisch, keine Kosten
 *   2. Produktname-Regex (heuristisch, pattern-basiert)
 *   3. Quarantäne: Produkt bleibt ohne Match bis Daten vorhanden
 *
 * Kein LLM, kein Scraper, keine externen Calls — rein datenbankbasiert.
 */
import { pool } from "../utils/db";

// ── Regex-Patterns für Wellenlänge aus Produktnamen ──────────────────────────

const WAVELENGTH_PATTERNS: Array<{
  pattern: RegExp;
  tx: number;
  rx?: number;
  notes: string;
}> = [
  // BiDi explizit
  { pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" },
  { pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" },
  { pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" },
  { pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" },
  { pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" },
  // Direkte nm-Angabe
  { pattern: /\b850\s*nm\b/i,         tx: 850,              notes: "850nm explicit" },
  { pattern: /\b1310\s*nm\b/i,        tx: 1310,             notes: "1310nm explicit" },
  { pattern: /\b1550\s*nm\b/i,        tx: 1550,             notes: "1550nm explicit" },
  { pattern: /\b1270\s*nm\b/i,        tx: 1270,             notes: "1270nm explicit" },
  { pattern: /\b1330\s*nm\b/i,        tx: 1330,             notes: "1330nm explicit" },
  // DWDM Channels (C-Band ~1530-1565nm)
  { pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550,            notes: "DWDM C-Band" },
  { pattern: /\bDWDM\b/i,             tx: 1550,             notes: "DWDM generic" },
  // Standard-Kurzbezeichnungen → implizite Wellenlänge
  { pattern: /\bSR4?\b/i,             tx: 850,              notes: "SR/SR4 = 850nm MMF" },
  { pattern: /\bLR4?\b/i,             tx: 1310,             notes: "LR/LR4 = 1310nm SMF" },
  { pattern: /\bER4?\b/i,             tx: 1310,             notes: "ER/ER4 = 1310nm SMF" },
  { pattern: /\bFR4?\b/i,             tx: 1310,             notes: "FR/FR4 = 1310nm SMF" },
  { pattern: /\bDR4?\b/i,             tx: 1310,             notes: "DR/DR4 = 1310nm SMF" },
  { pattern: /\bZR4?\b/i,             tx: 1550,             notes: "ZR/ZR4 = 1550nm SMF" },
];

const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [
  { pattern: /\bMPO.?16\b/i,      connector: "MPO-16" },
  { pattern: /\bMPO.?12\b/i,      connector: "MPO-12" },
  { pattern: /\bMPO\b/i,          connector: "MPO-12" },  // default MPO = MPO-12
  { pattern: /\bMTP\b/i,          connector: "MPO-12" },
  { pattern: /\bCS\s*connector\b/i, connector: "CS" },
  { pattern: /\bSN\s*connector\b/i, connector: "SN" },
  { pattern: /\bRJ.?45\b/i,       connector: "RJ45" },
  { pattern: /\bbase.?t\b/i,      connector: "RJ45" },
  { pattern: /\bSC\b/i,           connector: "SC" },
  { pattern: /\bLC\b/i,           connector: "LC" },      // LC zuletzt (häufig im Text)
];

// DAC/AOC haben keinen Fiber-Connector
const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i;

function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null {
  for (const p of WAVELENGTH_PATTERNS) {
    if (p.pattern.test(name)) {
      return { tx: p.tx, rx: p.rx, notes: p.notes };
    }
  }
  return null;
}

function extractConnectorFromName(name: string): string | null {
  if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC";
  for (const p of CONNECTOR_PATTERNS) {
    if (p.pattern.test(name)) return p.connector;
  }
  return null;
}

export async function runWavelengthEnricher(): Promise<void> {
  console.log("=== Wavelength Enricher Robot ===");

  // Alle Transceivers mit fehlenden Pflichtfeldern
  const { rows: transceivers } = await pool.query<{
    id: string;
    standard_name: string;
    part_number: string;
    form_factor: string;
    speed_gbps: number;
    fiber_type: string;
    reach_meters: number;
    wavelength_tx_nm: number | null;
    wavelength_rx_nm: number | null;
    connector_type: string | null;
  }>(`
    SELECT id, standard_name, part_number, form_factor, speed_gbps,
           fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type
    FROM transceivers
    WHERE enrichment_needed = TRUE
    ORDER BY data_completeness DESC  -- Produkte mit mehr Daten zuerst
    LIMIT 5000
  `);

  let fromIeee = 0;
  let fromRegex = 0;
  let stillMissing = 0;

  for (const t of transceivers) {
    let txNm = t.wavelength_tx_nm;
    let rxNm = t.wavelength_rx_nm;
    let connector = t.connector_type;
    let source = "";

    // ── Quelle 1: IEEE/MSA Lookup ───────────────────────────────────────────
    if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) {
      const { rows: ieee } = await pool.query<{
        wavelength_tx_nm: number;
        wavelength_rx_nm: number | null;
        connector_type: string;
      }>(`
        SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type
        FROM ieee_wavelength_lookup
        WHERE form_factor = $1
          AND speed_gbps = $2
          AND fiber_type = $3
          AND reach_min_m <= $4
          AND reach_max_m >= $4
        LIMIT 1
      `, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]);

      if (ieee.length > 0) {
        txNm = ieee[0].wavelength_tx_nm;
        rxNm = ieee[0].wavelength_rx_nm ?? null;
        if (!connector) connector = ieee[0].connector_type;
        source = "ieee_lookup";
        fromIeee++;
      }
    }

    // ── Quelle 2: Produktname-Regex ─────────────────────────────────────────
    const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" ");

    if (txNm === null && nameForExtraction) {
      const extracted = extractWavelengthFromName(nameForExtraction);
      if (extracted) {
        txNm = extracted.tx;
        rxNm = extracted.rx ?? null;
        source = `regex:${extracted.notes}`;
        fromRegex++;
      }
    }

    if (connector === null && nameForExtraction) {
      const extractedConn = extractConnectorFromName(nameForExtraction);
      if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn;
    }

    // ── Update wenn etwas gefunden ──────────────────────────────────────────
    if (txNm !== null || connector !== null) {
      await pool.query(`
        UPDATE transceivers SET
          wavelength_tx_nm = COALESCE($1, wavelength_tx_nm),
          wavelength_rx_nm = COALESCE($2, wavelength_rx_nm),
          connector_type   = COALESCE($3, connector_type),
          updated_at       = NOW()
        WHERE id = $4
      `, [txNm, rxNm, connector, t.id]);
    } else {
      stillMissing++;
    }
  }

  console.log(`  IEEE Lookup:    ${fromIeee} enriched`);
  console.log(`  Regex Extract:  ${fromRegex} enriched`);
  console.log(`  Still missing:  ${stillMissing} (Quarantäne bis Daten verfügbar)`);
  console.log("=== Wavelength Enricher Complete ===");
}

SCHRITT 4: Neuer deterministischer Matcher

Ändere in: packages/scraper/src/scheduler.ts

Suche den Block // fiber_type match (ca. Zeile 2780) und ersetze die gesamte Matching-Funktion maintenance:find-equivalences durch folgende Logik:

// ── NEUE deterministische Match-Logik ────────────────────────────────────────
// Kein Confidence-Score mehr. Nur MATCH oder KEIN MATCH.
// Voraussetzung: ALLE 6 Pflichtfelder müssen bekannt sein.

// Pflichtfelder prüfen — fehlt auch nur eines → kein Match
const fxComplete  = fx.form_factor && fx.speed_gbps && fx.fiber_type &&
                    fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type;
const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type &&
                     cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type;

if (!fxComplete || !candComplete) {
  // Fehlende Daten → Enrichment-Queue, kein Match-Versuch
  incompleteCount++;
  continue;
}

// ── Exact Match (mit definierten Toleranzen) ──────────────────────────────────
const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase();
if (!formFactorMatch) continue;  // Hart: falsches Gehäuse = nicht kompatibel

const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1;
if (!speedMatch) continue;  // Hart: 10G ≠ 25G

const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase();
if (!fiberMatch) continue;  // Hart: SMF ≠ MMF = komplett anderes Produkt

// Reach: ±10% Toleranz (Herstellervarianz bei Kabelqualität)
const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1);
if (reachRatio > 0.10) continue;

// Wellenlänge: ±5nm Toleranz (Herstellervarianz innerhalb Spec)
const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0));
if (wlTxDiff > 5) continue;

// BiDi RX nur prüfen wenn einer von beiden BiDi ist
const fxHasBidi  = fx.wavelength_rx_nm != null;
const candHasBidi = cand.wavelength_rx_nm != null;
if (fxHasBidi || candHasBidi) {
  const wlRxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0));
  if (wlRxDiff > 5) continue;
}

// Connector: Exact Match (LC ≠ SC ≠ MPO-12)
const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase();
if (!connMatch) continue;

// ── Alle Tests bestanden → MATCH (100% sicher) ───────────────────────────────
const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector'];
const notes = `${fx.part_number}${cand.part_number} (${cand.vendor_name}) | ` +
              `basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`;

// Direkt auto_approved — kein pending mehr
await pool.query(`
  INSERT INTO transceiver_equivalences
    (flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
  VALUES ($1, $2, 1.0, $3, $4, 'auto_approved')
  ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
    confidence   = 1.0,
    match_basis  = EXCLUDED.match_basis,
    match_notes  = EXCLUDED.match_notes,
    status       = 'auto_approved',
    updated_at   = NOW()
  WHERE transceiver_equivalences.status = 'pending'
`, [fx.id, cand.competitor_id, matchBasis, notes]);

// competitor_verified setzen
await pool.query(`
  UPDATE transceivers
  SET competitor_verified = true,
      competitor_verified_at = NOW(),
      competitor_status = 'matched',
      competitor_status_updated_at = NOW()
  WHERE id = $1 AND competitor_verified = false
`, [fx.id]);

matchedCount++;

SCHRITT 5: Pending-Queue bereinigen

Erstelle Datei: sql/107-clear-pending-queue.sql

-- Migration 107: Pending Queue bereinigen
-- Alle 'pending' Einträge die KEINEN deterministischen Match haben → reject
-- Alle 'pending' Einträge die jetzt deterministische Matches wären → werden
-- durch den neuen Matcher bei nächstem Run neu erzeugt (als auto_approved)

-- Schritt 1: Alle pending-Einträge rejcten (veraltete unsichere Matches)
UPDATE transceiver_equivalences
SET status       = 'rejected',
    reject_reason = 'Superseded by deterministic matcher — confidence-based match removed',
    reviewed_at   = NOW(),
    reviewed_by   = 'system:migration-107'
WHERE status = 'pending';

-- Schritt 2: Statistik loggen
DO $$
DECLARE
  pending_count   INTEGER;
  approved_count  INTEGER;
  rejected_count  INTEGER;
BEGIN
  SELECT COUNT(*) INTO pending_count  FROM transceiver_equivalences WHERE status = 'pending';
  SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved');
  SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected';
  RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%',
    pending_count, approved_count, rejected_count;
END;
$$;

-- Schritt 3: Index für deterministischen Matcher optimieren
CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences
  (flexoptix_id, competitor_id, status)
  WHERE status = 'auto_approved';

SCHRITT 6: pg-boss Job für Enrichment-Robot registrieren

Ändere in: packages/scraper/src/scheduler.ts

Im Block wo Jobs registriert werden, hinzufügen:

// Wavelength Enricher — läuft alle 4 Stunden
await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {});

// Handler registrieren
boss.work('enrich:wavelength', async () => {
  await runWavelengthEnricher();
});

SCHRITT 7: API — Manual Review UI deaktivieren

Ändere in: packages/api/src/routes/review.ts

Den POST-Endpunkt /equivalences/:id/approve mit einem Guard versehen:

reviewRouter.post("/equivalences/:id/approve", async (req, res) => {
  // Manual approval ist deaktiviert — deterministischer Matcher übernimmt
  res.status(410).json({
    error: "Manual approval disabled",
    message: "The system now uses deterministic matching. No manual review needed.",
    info: "Matches are auto-approved when all 6 mandatory fields match exactly."
  });
});

SCHRITT 8: Transceivers-Query für Matcher aktualisieren

Ändere in: packages/scraper/src/scheduler.ts

Den SQL-Query der Flexoptix-Transceivers für den Matcher aktualisieren, damit wavelength_tx_nm und connector_type mitgeladen werden:

-- Flexoptix-Transceivers für Matcher laden
SELECT
  t.id, t.part_number, t.standard_name, t.form_factor,
  t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths,
  t.wavelength_tx_nm,   -- NEU
  t.wavelength_rx_nm,   -- NEU
  t.connector_type,     -- NEU (war vorher 'connector')
  t.data_completeness,  -- NEU
  t.enrichment_needed   -- NEU
FROM transceivers t
JOIN vendors v ON v.id = t.vendor_id
WHERE v.name = 'Flexoptix'
  AND t.enrichment_needed = FALSE  -- NUR vollständige Datensätze matchen
  AND t.data_completeness >= 80
ORDER BY t.part_number

Gleiches für den Kandidaten-Query (Competitor-Transceivers).


SCHRITT 9: Bestehende wavelengths-Spalte migrieren

Die existierende Spalte transceivers.wavelengths (TEXT, z.B. "1310nm" oder "1270/1330nm") in die neuen numerischen Spalten überführen:

Erstelle Datei: sql/108-migrate-wavelengths-text-to-int.sql

-- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER

UPDATE transceivers SET
  wavelength_tx_nm = CASE
    WHEN wavelengths ~ '^\s*850'          THEN 850
    WHEN wavelengths ~ '^\s*1270'         THEN 1270
    WHEN wavelengths ~ '^\s*1310'         THEN 1310
    WHEN wavelengths ~ '^\s*1330'         THEN 1330
    WHEN wavelengths ~ '^\s*1490'         THEN 1490
    WHEN wavelengths ~ '^\s*1550'         THEN 1550
    WHEN wavelengths ~ '^\s*1270\s*/\s*1330' THEN 1270
    WHEN wavelengths ~ '^\s*1330\s*/\s*1270' THEN 1330
    ELSE NULL
  END,
  wavelength_rx_nm = CASE
    WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330
    WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270
    WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550
    WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310
    ELSE NULL
  END
WHERE wavelengths IS NOT NULL
  AND wavelength_tx_nm IS NULL;

-- Connector aus alter connector-Spalte übernehmen (falls vorhanden)
UPDATE transceivers SET
  connector_type = CASE connector
    WHEN 'LC'     THEN 'LC'
    WHEN 'SC'     THEN 'SC'
    WHEN 'MPO'    THEN 'MPO-12'
    WHEN 'MPO-12' THEN 'MPO-12'
    WHEN 'MPO-16' THEN 'MPO-16'
    WHEN 'RJ45'   THEN 'RJ45'
    ELSE connector
  END
WHERE connector IS NOT NULL AND connector_type IS NULL;

-- Completeness neu berechnen nach Migration
UPDATE transceivers SET
  data_completeness = calc_data_completeness(
    form_factor, speed_gbps, fiber_type,
    reach_meters, wavelength_tx_nm, connector_type
  ),
  enrichment_needed = (
    form_factor IS NULL OR speed_gbps IS NULL OR
    fiber_type IS NULL OR reach_meters IS NULL OR
    wavelength_tx_nm IS NULL OR connector_type IS NULL
  );

Ausführungsreihenfolge für Codex

1.  sql/105-wavelength-connector-completeness.sql      → psql ausführen
2.  sql/106-ieee-msa-wavelength-lookup.sql             → psql ausführen
3.  sql/108-migrate-wavelengths-text-to-int.sql        → psql ausführen
4.  packages/scraper/src/robots/wavelength-enricher.ts → neue Datei erstellen
5.  packages/scraper/src/scheduler.ts                  → Matcher-Logik ersetzen (Schritt 4)
6.  packages/scraper/src/scheduler.ts                  → Job registrieren (Schritt 6)
7.  packages/api/src/routes/review.ts                  → Approve-Endpunkt deaktivieren (Schritt 7)
8.  npm run build -w packages/scraper                  → Build
9.  npm run build -w packages/api                      → Build
10. sql/107-clear-pending-queue.sql                    → psql ausführen (ZULETZT — bereinigt Queue)
11. pm2 restart tip-scraper-daemon                     → Daemon neu starten
12. Enricher manuell triggern: POST /api/review/run-matcher

Erwartetes Ergebnis nach Deployment

Metrik Vorher Nachher
Pending Queue 13.374 0
Confidence-Score 0.01.0 (fuzzy) entfällt
Match-Typ probabilistisch deterministisch
Manuelle Freigaben/Tag ~50-200 0
False-Positive-Rate ~15% ~0%
Transceivers mit Wellenlänge ~30% >85% (nach Enricher)

Nicht ändern (außerhalb dieses Tasks)

  • packages/scraper/src/scrapers/* — Scraper-Logik bleibt unverändert
  • packages/api/src/routes/ (außer review.ts) — API-Endpunkte bleiben
  • sql/001-036 — Bestehende Migrationen nicht anfassen
  • Alle packages/core/src/ Typen — nur erweitern, nicht löschen
  • PM2-Konfiguration — nicht anfassen