- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets) added NADDOD (fetch-based, benefits from residential IP) now 32 fetch-only queues safe for ARM/Pi without Chromium - index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080 Crawlee ProxyConfiguration automatically applies to Playwright crawler - pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative) - CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js - CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
31 KiB
CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching
Ziel
Die manuelle Review-Queue (transceiver_equivalences mit status = 'pending') wird eliminiert.
Statt probabilistischer Confidence-Scores wird ein deterministisches Exact-Match-System gebaut,
das nur dann einen Match erzeugt, wenn ALLE Pflichtfelder vorhanden und exakt gleich sind.
Aktueller Zustand (PROBLEM):
- 13.374 Einträge in
pending→ manuelle Freigabe nötig - Confidence-Score 0.0–1.0 → "56%" bedeutet Unsicherheit → Review nötig
- Fehlende Felder (wavelength=?) führen zu unsicheren Matches
auto_approvedab 0.73 Confidence — zu niedrige Schwelle
Ziel-Zustand (LÖSUNG):
- 0 Einträge in
pending— nie wieder - Kein Confidence-Score — nur MATCH oder KEIN MATCH
- Fehlende Felder → Enrichment-Job → kein Match bis Daten vollständig
- Nur Exact-Match (mit definierten Toleranzen) → 100% verlässliche Daten
Repository
Pfad: /opt/tip (auf Erik-Server) ODER
/Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db (lokal)
Stack: TypeScript, Node.js, PostgreSQL 17, pg-boss (Job-Queue)
Packages: packages/scraper/src/scheduler.ts ← Haupt-Matching-Logik
packages/api/src/routes/review.ts ← API-Endpunkte
sql/ ← DB-Migrationen (nummeriert 001-104)
SCHRITT 1: Datenbank-Migration — Neue Pflichtfelder
Erstelle Datei: sql/105-wavelength-connector-completeness.sql
-- Migration 105: Wavelength (TX/RX getrennt für BiDi), Connector-Normalisierung,
-- Data-Completeness-Score, Enrichment-Flag
-- 1. Wellenlängen-Felder aufteilen (BiDi hat TX ≠ RX)
ALTER TABLE transceivers
ADD COLUMN IF NOT EXISTS wavelength_tx_nm INTEGER, -- TX-Wellenlänge in nm (z.B. 1270)
ADD COLUMN IF NOT EXISTS wavelength_rx_nm INTEGER, -- RX-Wellenlänge in nm (z.B. 1330)
ADD COLUMN IF NOT EXISTS connector_type TEXT, -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN'
ADD COLUMN IF NOT EXISTS data_completeness INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100),
ADD COLUMN IF NOT EXISTS enrichment_needed BOOLEAN DEFAULT FALSE,
ADD COLUMN IF NOT EXISTS enrichment_fields TEXT[] DEFAULT '{}'; -- welche Felder fehlen noch
-- 2. Indices für Performance
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_completeness ON transceivers (data_completeness);
CREATE INDEX IF NOT EXISTS idx_tx_enrichment ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE;
-- 3. Completeness-Berechnungsfunktion
CREATE OR REPLACE FUNCTION calc_data_completeness(
p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT,
p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT
) RETURNS INTEGER AS $$
DECLARE
score INTEGER := 0;
BEGIN
IF p_form_factor IS NOT NULL AND p_form_factor != '' THEN score := score + 20; END IF;
IF p_speed_gbps IS NOT NULL AND p_speed_gbps > 0 THEN score := score + 20; END IF;
IF p_fiber_type IS NOT NULL AND p_fiber_type != '' THEN score := score + 20; END IF;
IF p_reach_meters IS NOT NULL AND p_reach_meters > 0 THEN score := score + 20; END IF;
IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0 THEN score := score + 10; END IF;
IF p_connector IS NOT NULL AND p_connector != '' THEN score := score + 10; END IF;
RETURN score;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- 4. Alle bestehenden Transceivers: Completeness initial berechnen
UPDATE transceivers SET
data_completeness = calc_data_completeness(
form_factor, speed_gbps, fiber_type,
reach_meters, wavelength_tx_nm, connector_type
),
enrichment_needed = (
form_factor IS NULL OR speed_gbps IS NULL OR
fiber_type IS NULL OR reach_meters IS NULL OR
wavelength_tx_nm IS NULL OR connector_type IS NULL
);
-- 5. Trigger: Completeness automatisch aktualisieren
CREATE OR REPLACE FUNCTION trg_update_completeness()
RETURNS TRIGGER AS $$
BEGIN
NEW.data_completeness := calc_data_completeness(
NEW.form_factor, NEW.speed_gbps, NEW.fiber_type,
NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type
);
NEW.enrichment_needed := (
NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR
NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR
NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL
);
-- Fehlende Felder dokumentieren
NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[
CASE WHEN NEW.form_factor IS NULL THEN 'form_factor' END,
CASE WHEN NEW.speed_gbps IS NULL THEN 'speed_gbps' END,
CASE WHEN NEW.fiber_type IS NULL THEN 'fiber_type' END,
CASE WHEN NEW.reach_meters IS NULL THEN 'reach_meters' END,
CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END,
CASE WHEN NEW.connector_type IS NULL THEN 'connector_type' END
], NULL);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trg_completeness ON transceivers;
CREATE TRIGGER trg_completeness
BEFORE INSERT OR UPDATE ON transceivers
FOR EACH ROW EXECUTE FUNCTION trg_update_completeness();
COMMENT ON COLUMN transceivers.wavelength_tx_nm IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.';
COMMENT ON COLUMN transceivers.wavelength_rx_nm IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.';
COMMENT ON COLUMN transceivers.connector_type IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN';
COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)';
COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed';
COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment';
SCHRITT 2: IEEE/MSA Standards-Lookup-Tabelle (statisch)
Erstelle Datei: sql/106-ieee-msa-wavelength-lookup.sql
-- Migration 106: IEEE/MSA Standards Wavelength Lookup
-- Ground-Truth für Wellenlänge basierend auf Standard-Spezifikation
-- Quelle: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs
CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup (
id SERIAL PRIMARY KEY,
form_factor TEXT NOT NULL,
speed_gbps NUMERIC NOT NULL,
fiber_type TEXT NOT NULL, -- 'SMF', 'MMF', 'DAC', 'AOC'
reach_min_m INTEGER NOT NULL,
reach_max_m INTEGER NOT NULL,
wavelength_tx_nm INTEGER NOT NULL,
wavelength_rx_nm INTEGER, -- NULL = gleich wie TX (kein BiDi)
connector_type TEXT NOT NULL,
ieee_standard TEXT, -- z.B. '802.3ae', 'SFF-8431'
notes TEXT,
UNIQUE (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m)
);
INSERT INTO ieee_wavelength_lookup
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes)
VALUES
-- ── SFP (1G) ─────────────────────────────────────────────────────────────────
('SFP', 1, 'MMF', 0, 550, 850, NULL, 'LC', '802.3z', '1000BASE-SX'),
('SFP', 1, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3z', '1000BASE-LX'),
('SFP', 1, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3z', '1000BASE-EX'),
('SFP', 1, 'SMF', 0, 70000, 1550, NULL, 'LC', '802.3z', '1000BASE-ZX'),
('SFP', 1, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'),
('SFP', 1, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'),
('SFP', 1, 'Copper', 0, 100, NULL, NULL, 'RJ45','802.3ab', '1000BASE-T'),
-- ── SFP+ (10G) ───────────────────────────────────────────────────────────────
('SFP+', 10, 'MMF', 0, 300, 850, NULL, 'LC', '802.3ae', '10GBASE-SR'),
('SFP+', 10, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ae', '10GBASE-LR'),
('SFP+', 10, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ae', '10GBASE-ER'),
('SFP+', 10, 'SMF', 0, 80000, 1550, NULL, 'LC', '802.3ae', '10GBASE-ZR'),
('SFP+', 10, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'),
('SFP+', 10, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'),
('SFP+', 10, 'SMF', 0, 20000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'),
('SFP+', 10, 'SMF', 0, 20000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'),
('SFP+', 10, 'DAC', 0, 7, NULL, NULL, 'SFP+','SFF-8431','10G DAC Twinax'),
('SFP+', 10, 'AOC', 0, 100, 850, NULL, 'LC', 'SFF-8431', '10G AOC'),
-- ── SFP28 (25G) ──────────────────────────────────────────────────────────────
('SFP28', 25, 'MMF', 0, 100, 850, NULL, 'LC', '802.3by', '25GBASE-SR'),
('SFP28', 25, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cc', '25GBASE-LR'),
('SFP28', 25, 'SMF', 0, 2000, 1310, NULL, 'LC', '25GBASE-DR','25GBASE-DR'),
('SFP28', 25, 'SMF', 0, 10000, 1270, 1330, 'LC', '802.3cc', '25GBASE-BX10-U BiDi'),
('SFP28', 25, 'SMF', 0, 10000, 1330, 1270, 'LC', '802.3cc', '25GBASE-BX10-D BiDi'),
('SFP28', 25, 'DAC', 0, 5, NULL, NULL, 'SFP28','802.3by','25G DAC'),
-- ── QSFP+ (40G) ──────────────────────────────────────────────────────────────
('QSFP+', 40, 'MMF', 0, 150, 850, NULL, 'MPO-12','802.3ba','40GBASE-SR4'),
('QSFP+', 40, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ba', '40GBASE-LR4 CWDM4'),
('QSFP+', 40, 'SMF', 0, 2000, 1310, NULL, 'MPO-12','802.3bm','40GBASE-PSM4'),
('QSFP+', 40, 'DAC', 0, 7, NULL, NULL, 'QSFP+','802.3ba','40G DAC'),
-- ── QSFP28 (100G) ────────────────────────────────────────────────────────────
('QSFP28', 100, 'MMF', 0, 100, 850, NULL, 'MPO-12','802.3bm','100GBASE-SR4'),
('QSFP28', 100, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '100GBASE-LR4 CWDM4'),
('QSFP28', 100, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bj','100GBASE-DR (PSM4)'),
('QSFP28', 100, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ba', '100GBASE-ER4'),
('QSFP28', 100, 'SMF', 0, 2000, 1310, NULL, 'LC', 'CWDM4-MSA','100G CWDM4 2km'),
('QSFP28', 100, 'DAC', 0, 5, NULL, NULL, 'QSFP28','802.3bj','100G DAC'),
('QSFP28', 100, 'AOC', 0, 100, 850, NULL, 'MPO-12','802.3bm','100G AOC SR4'),
-- ── QSFP56 (200G) ────────────────────────────────────────────────────────────
('QSFP56', 200, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3cd','200GBASE-SR4'),
('QSFP56', 200, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3cd', '200GBASE-DR4'),
('QSFP56', 200, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '200GBASE-FR4'),
('QSFP56', 200, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3cd', '200GBASE-LR4'),
-- ── QSFP-DD (400G) ───────────────────────────────────────────────────────────
('QSFP-DD', 400, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3bs','400GBASE-SR8'),
('QSFP-DD', 400, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bs','400GBASE-DR4'),
('QSFP-DD', 400, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3bs', '400GBASE-FR4'),
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3bs', '400GBASE-LR4'),
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'MPO-12','800G MSA','400GBASE-PSM4'),
('QSFP-DD', 400, 'DAC', 0, 5, NULL, NULL, 'QSFP-DD','802.3bs','400G DAC'),
-- ── OSFP / QSFP-DD800 (800G) ─────────────────────────────────────────────────
('OSFP', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
('OSFP', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8'),
('OSFP', 800, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3df', '800GBASE-FR4 2x400G'),
('OSFP', 800, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3df', '800GBASE-LR4'),
('QSFP-DD800', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8')
ON CONFLICT DO NOTHING;
CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m);
SCHRITT 3: Enrichment-Robot (neues Modul)
Erstelle Datei: packages/scraper/src/robots/wavelength-enricher.ts
Dieser Robot läuft als pg-boss Job (enrich:wavelength) alle 4 Stunden.
Er füllt wavelength_tx_nm, wavelength_rx_nm, connector_type automatisch.
/**
* Wavelength Enricher Robot
*
* Füllt fehlende wavelength_tx_nm / wavelength_rx_nm / connector_type
* aus drei Quellen (Priorität absteigend):
* 1. IEEE/MSA Lookup-Tabelle (sql/106) — deterministisch, keine Kosten
* 2. Produktname-Regex (heuristisch, pattern-basiert)
* 3. Quarantäne: Produkt bleibt ohne Match bis Daten vorhanden
*
* Kein LLM, kein Scraper, keine externen Calls — rein datenbankbasiert.
*/
import { pool } from "../utils/db";
// ── Regex-Patterns für Wellenlänge aus Produktnamen ──────────────────────────
const WAVELENGTH_PATTERNS: Array<{
pattern: RegExp;
tx: number;
rx?: number;
notes: string;
}> = [
// BiDi explizit
{ pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" },
{ pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" },
{ pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" },
{ pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" },
{ pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" },
// Direkte nm-Angabe
{ pattern: /\b850\s*nm\b/i, tx: 850, notes: "850nm explicit" },
{ pattern: /\b1310\s*nm\b/i, tx: 1310, notes: "1310nm explicit" },
{ pattern: /\b1550\s*nm\b/i, tx: 1550, notes: "1550nm explicit" },
{ pattern: /\b1270\s*nm\b/i, tx: 1270, notes: "1270nm explicit" },
{ pattern: /\b1330\s*nm\b/i, tx: 1330, notes: "1330nm explicit" },
// DWDM Channels (C-Band ~1530-1565nm)
{ pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550, notes: "DWDM C-Band" },
{ pattern: /\bDWDM\b/i, tx: 1550, notes: "DWDM generic" },
// Standard-Kurzbezeichnungen → implizite Wellenlänge
{ pattern: /\bSR4?\b/i, tx: 850, notes: "SR/SR4 = 850nm MMF" },
{ pattern: /\bLR4?\b/i, tx: 1310, notes: "LR/LR4 = 1310nm SMF" },
{ pattern: /\bER4?\b/i, tx: 1310, notes: "ER/ER4 = 1310nm SMF" },
{ pattern: /\bFR4?\b/i, tx: 1310, notes: "FR/FR4 = 1310nm SMF" },
{ pattern: /\bDR4?\b/i, tx: 1310, notes: "DR/DR4 = 1310nm SMF" },
{ pattern: /\bZR4?\b/i, tx: 1550, notes: "ZR/ZR4 = 1550nm SMF" },
];
const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [
{ pattern: /\bMPO.?16\b/i, connector: "MPO-16" },
{ pattern: /\bMPO.?12\b/i, connector: "MPO-12" },
{ pattern: /\bMPO\b/i, connector: "MPO-12" }, // default MPO = MPO-12
{ pattern: /\bMTP\b/i, connector: "MPO-12" },
{ pattern: /\bCS\s*connector\b/i, connector: "CS" },
{ pattern: /\bSN\s*connector\b/i, connector: "SN" },
{ pattern: /\bRJ.?45\b/i, connector: "RJ45" },
{ pattern: /\bbase.?t\b/i, connector: "RJ45" },
{ pattern: /\bSC\b/i, connector: "SC" },
{ pattern: /\bLC\b/i, connector: "LC" }, // LC zuletzt (häufig im Text)
];
// DAC/AOC haben keinen Fiber-Connector
const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i;
function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null {
for (const p of WAVELENGTH_PATTERNS) {
if (p.pattern.test(name)) {
return { tx: p.tx, rx: p.rx, notes: p.notes };
}
}
return null;
}
function extractConnectorFromName(name: string): string | null {
if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC";
for (const p of CONNECTOR_PATTERNS) {
if (p.pattern.test(name)) return p.connector;
}
return null;
}
export async function runWavelengthEnricher(): Promise<void> {
console.log("=== Wavelength Enricher Robot ===");
// Alle Transceivers mit fehlenden Pflichtfeldern
const { rows: transceivers } = await pool.query<{
id: string;
standard_name: string;
part_number: string;
form_factor: string;
speed_gbps: number;
fiber_type: string;
reach_meters: number;
wavelength_tx_nm: number | null;
wavelength_rx_nm: number | null;
connector_type: string | null;
}>(`
SELECT id, standard_name, part_number, form_factor, speed_gbps,
fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type
FROM transceivers
WHERE enrichment_needed = TRUE
ORDER BY data_completeness DESC -- Produkte mit mehr Daten zuerst
LIMIT 5000
`);
let fromIeee = 0;
let fromRegex = 0;
let stillMissing = 0;
for (const t of transceivers) {
let txNm = t.wavelength_tx_nm;
let rxNm = t.wavelength_rx_nm;
let connector = t.connector_type;
let source = "";
// ── Quelle 1: IEEE/MSA Lookup ───────────────────────────────────────────
if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) {
const { rows: ieee } = await pool.query<{
wavelength_tx_nm: number;
wavelength_rx_nm: number | null;
connector_type: string;
}>(`
SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type
FROM ieee_wavelength_lookup
WHERE form_factor = $1
AND speed_gbps = $2
AND fiber_type = $3
AND reach_min_m <= $4
AND reach_max_m >= $4
LIMIT 1
`, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]);
if (ieee.length > 0) {
txNm = ieee[0].wavelength_tx_nm;
rxNm = ieee[0].wavelength_rx_nm ?? null;
if (!connector) connector = ieee[0].connector_type;
source = "ieee_lookup";
fromIeee++;
}
}
// ── Quelle 2: Produktname-Regex ─────────────────────────────────────────
const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" ");
if (txNm === null && nameForExtraction) {
const extracted = extractWavelengthFromName(nameForExtraction);
if (extracted) {
txNm = extracted.tx;
rxNm = extracted.rx ?? null;
source = `regex:${extracted.notes}`;
fromRegex++;
}
}
if (connector === null && nameForExtraction) {
const extractedConn = extractConnectorFromName(nameForExtraction);
if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn;
}
// ── Update wenn etwas gefunden ──────────────────────────────────────────
if (txNm !== null || connector !== null) {
await pool.query(`
UPDATE transceivers SET
wavelength_tx_nm = COALESCE($1, wavelength_tx_nm),
wavelength_rx_nm = COALESCE($2, wavelength_rx_nm),
connector_type = COALESCE($3, connector_type),
updated_at = NOW()
WHERE id = $4
`, [txNm, rxNm, connector, t.id]);
} else {
stillMissing++;
}
}
console.log(` IEEE Lookup: ${fromIeee} enriched`);
console.log(` Regex Extract: ${fromRegex} enriched`);
console.log(` Still missing: ${stillMissing} (Quarantäne bis Daten verfügbar)`);
console.log("=== Wavelength Enricher Complete ===");
}
SCHRITT 4: Neuer deterministischer Matcher
Ändere in: packages/scraper/src/scheduler.ts
Suche den Block // fiber_type match (ca. Zeile 2780) und ersetze die gesamte
Matching-Funktion maintenance:find-equivalences durch folgende Logik:
// ── NEUE deterministische Match-Logik ────────────────────────────────────────
// Kein Confidence-Score mehr. Nur MATCH oder KEIN MATCH.
// Voraussetzung: ALLE 6 Pflichtfelder müssen bekannt sein.
// Pflichtfelder prüfen — fehlt auch nur eines → kein Match
const fxComplete = fx.form_factor && fx.speed_gbps && fx.fiber_type &&
fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type;
const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type &&
cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type;
if (!fxComplete || !candComplete) {
// Fehlende Daten → Enrichment-Queue, kein Match-Versuch
incompleteCount++;
continue;
}
// ── Exact Match (mit definierten Toleranzen) ──────────────────────────────────
const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase();
if (!formFactorMatch) continue; // Hart: falsches Gehäuse = nicht kompatibel
const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1;
if (!speedMatch) continue; // Hart: 10G ≠ 25G
const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase();
if (!fiberMatch) continue; // Hart: SMF ≠ MMF = komplett anderes Produkt
// Reach: ±10% Toleranz (Herstellervarianz bei Kabelqualität)
const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1);
if (reachRatio > 0.10) continue;
// Wellenlänge: ±5nm Toleranz (Herstellervarianz innerhalb Spec)
const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0));
if (wlTxDiff > 5) continue;
// BiDi RX nur prüfen wenn einer von beiden BiDi ist
const fxHasBidi = fx.wavelength_rx_nm != null;
const candHasBidi = cand.wavelength_rx_nm != null;
if (fxHasBidi || candHasBidi) {
const wlRxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0));
if (wlRxDiff > 5) continue;
}
// Connector: Exact Match (LC ≠ SC ≠ MPO-12)
const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase();
if (!connMatch) continue;
// ── Alle Tests bestanden → MATCH (100% sicher) ───────────────────────────────
const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector'];
const notes = `${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` +
`basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`;
// Direkt auto_approved — kein pending mehr
await pool.query(`
INSERT INTO transceiver_equivalences
(flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
VALUES ($1, $2, 1.0, $3, $4, 'auto_approved')
ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
confidence = 1.0,
match_basis = EXCLUDED.match_basis,
match_notes = EXCLUDED.match_notes,
status = 'auto_approved',
updated_at = NOW()
WHERE transceiver_equivalences.status = 'pending'
`, [fx.id, cand.competitor_id, matchBasis, notes]);
// competitor_verified setzen
await pool.query(`
UPDATE transceivers
SET competitor_verified = true,
competitor_verified_at = NOW(),
competitor_status = 'matched',
competitor_status_updated_at = NOW()
WHERE id = $1 AND competitor_verified = false
`, [fx.id]);
matchedCount++;
SCHRITT 5: Pending-Queue bereinigen
Erstelle Datei: sql/107-clear-pending-queue.sql
-- Migration 107: Pending Queue bereinigen
-- Alle 'pending' Einträge die KEINEN deterministischen Match haben → reject
-- Alle 'pending' Einträge die jetzt deterministische Matches wären → werden
-- durch den neuen Matcher bei nächstem Run neu erzeugt (als auto_approved)
-- Schritt 1: Alle pending-Einträge rejcten (veraltete unsichere Matches)
UPDATE transceiver_equivalences
SET status = 'rejected',
reject_reason = 'Superseded by deterministic matcher — confidence-based match removed',
reviewed_at = NOW(),
reviewed_by = 'system:migration-107'
WHERE status = 'pending';
-- Schritt 2: Statistik loggen
DO $$
DECLARE
pending_count INTEGER;
approved_count INTEGER;
rejected_count INTEGER;
BEGIN
SELECT COUNT(*) INTO pending_count FROM transceiver_equivalences WHERE status = 'pending';
SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved');
SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected';
RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%',
pending_count, approved_count, rejected_count;
END;
$$;
-- Schritt 3: Index für deterministischen Matcher optimieren
CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences
(flexoptix_id, competitor_id, status)
WHERE status = 'auto_approved';
SCHRITT 6: pg-boss Job für Enrichment-Robot registrieren
Ändere in: packages/scraper/src/scheduler.ts
Im Block wo Jobs registriert werden, hinzufügen:
// Wavelength Enricher — läuft alle 4 Stunden
await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {});
// Handler registrieren
boss.work('enrich:wavelength', async () => {
await runWavelengthEnricher();
});
SCHRITT 7: API — Manual Review UI deaktivieren
Ändere in: packages/api/src/routes/review.ts
Den POST-Endpunkt /equivalences/:id/approve mit einem Guard versehen:
reviewRouter.post("/equivalences/:id/approve", async (req, res) => {
// Manual approval ist deaktiviert — deterministischer Matcher übernimmt
res.status(410).json({
error: "Manual approval disabled",
message: "The system now uses deterministic matching. No manual review needed.",
info: "Matches are auto-approved when all 6 mandatory fields match exactly."
});
});
SCHRITT 8: Transceivers-Query für Matcher aktualisieren
Ändere in: packages/scraper/src/scheduler.ts
Den SQL-Query der Flexoptix-Transceivers für den Matcher aktualisieren, damit wavelength_tx_nm und connector_type mitgeladen werden:
-- Flexoptix-Transceivers für Matcher laden
SELECT
t.id, t.part_number, t.standard_name, t.form_factor,
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths,
t.wavelength_tx_nm, -- NEU
t.wavelength_rx_nm, -- NEU
t.connector_type, -- NEU (war vorher 'connector')
t.data_completeness, -- NEU
t.enrichment_needed -- NEU
FROM transceivers t
JOIN vendors v ON v.id = t.vendor_id
WHERE v.name = 'Flexoptix'
AND t.enrichment_needed = FALSE -- NUR vollständige Datensätze matchen
AND t.data_completeness >= 80
ORDER BY t.part_number
Gleiches für den Kandidaten-Query (Competitor-Transceivers).
SCHRITT 9: Bestehende wavelengths-Spalte migrieren
Die existierende Spalte transceivers.wavelengths (TEXT, z.B. "1310nm" oder "1270/1330nm")
in die neuen numerischen Spalten überführen:
Erstelle Datei: sql/108-migrate-wavelengths-text-to-int.sql
-- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER
UPDATE transceivers SET
wavelength_tx_nm = CASE
WHEN wavelengths ~ '^\s*850' THEN 850
WHEN wavelengths ~ '^\s*1270' THEN 1270
WHEN wavelengths ~ '^\s*1310' THEN 1310
WHEN wavelengths ~ '^\s*1330' THEN 1330
WHEN wavelengths ~ '^\s*1490' THEN 1490
WHEN wavelengths ~ '^\s*1550' THEN 1550
WHEN wavelengths ~ '^\s*1270\s*/\s*1330' THEN 1270
WHEN wavelengths ~ '^\s*1330\s*/\s*1270' THEN 1330
ELSE NULL
END,
wavelength_rx_nm = CASE
WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330
WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270
WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550
WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310
ELSE NULL
END
WHERE wavelengths IS NOT NULL
AND wavelength_tx_nm IS NULL;
-- Connector aus alter connector-Spalte übernehmen (falls vorhanden)
UPDATE transceivers SET
connector_type = CASE connector
WHEN 'LC' THEN 'LC'
WHEN 'SC' THEN 'SC'
WHEN 'MPO' THEN 'MPO-12'
WHEN 'MPO-12' THEN 'MPO-12'
WHEN 'MPO-16' THEN 'MPO-16'
WHEN 'RJ45' THEN 'RJ45'
ELSE connector
END
WHERE connector IS NOT NULL AND connector_type IS NULL;
-- Completeness neu berechnen nach Migration
UPDATE transceivers SET
data_completeness = calc_data_completeness(
form_factor, speed_gbps, fiber_type,
reach_meters, wavelength_tx_nm, connector_type
),
enrichment_needed = (
form_factor IS NULL OR speed_gbps IS NULL OR
fiber_type IS NULL OR reach_meters IS NULL OR
wavelength_tx_nm IS NULL OR connector_type IS NULL
);
Ausführungsreihenfolge für Codex
1. sql/105-wavelength-connector-completeness.sql → psql ausführen
2. sql/106-ieee-msa-wavelength-lookup.sql → psql ausführen
3. sql/108-migrate-wavelengths-text-to-int.sql → psql ausführen
4. packages/scraper/src/robots/wavelength-enricher.ts → neue Datei erstellen
5. packages/scraper/src/scheduler.ts → Matcher-Logik ersetzen (Schritt 4)
6. packages/scraper/src/scheduler.ts → Job registrieren (Schritt 6)
7. packages/api/src/routes/review.ts → Approve-Endpunkt deaktivieren (Schritt 7)
8. npm run build -w packages/scraper → Build
9. npm run build -w packages/api → Build
10. sql/107-clear-pending-queue.sql → psql ausführen (ZULETZT — bereinigt Queue)
11. pm2 restart tip-scraper-daemon → Daemon neu starten
12. Enricher manuell triggern: POST /api/review/run-matcher
Erwartetes Ergebnis nach Deployment
| Metrik | Vorher | Nachher |
|---|---|---|
| Pending Queue | 13.374 | 0 |
| Confidence-Score | 0.0–1.0 (fuzzy) | entfällt |
| Match-Typ | probabilistisch | deterministisch |
| Manuelle Freigaben/Tag | ~50-200 | 0 |
| False-Positive-Rate | ~15% | ~0% |
| Transceivers mit Wellenlänge | ~30% | >85% (nach Enricher) |
Nicht ändern (außerhalb dieses Tasks)
packages/scraper/src/scrapers/*— Scraper-Logik bleibt unverändertpackages/api/src/routes/(außer review.ts) — API-Endpunkte bleibensql/001-036— Bestehende Migrationen nicht anfassen- Alle
packages/core/src/Typen — nur erweitern, nicht löschen - PM2-Konfiguration — nicht anfassen