transceiver-db/CODEX-TASK-zero-manual-review.md
Rene Fichtmueller 0edc6e3f3a feat: Pi scraper fleet — fetch-only index-pi.ts + FS.COM/NADDOD via SOCKS5
- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets)
  added NADDOD (fetch-based, benefits from residential IP)
  now 32 fetch-only queues safe for ARM/Pi without Chromium
- index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik
  routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080
  Crawlee ProxyConfiguration automatically applies to Playwright crawler
- pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative)
- CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup
  covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js
- CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
2026-05-10 09:53:55 +02:00

706 lines
31 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching
## Ziel
Die manuelle Review-Queue (`transceiver_equivalences` mit `status = 'pending'`) wird eliminiert.
Statt probabilistischer Confidence-Scores wird ein deterministisches Exact-Match-System gebaut,
das nur dann einen Match erzeugt, wenn ALLE Pflichtfelder vorhanden und exakt gleich sind.
**Aktueller Zustand (PROBLEM):**
- 13.374 Einträge in `pending` → manuelle Freigabe nötig
- Confidence-Score 0.01.0 → "56%" bedeutet Unsicherheit → Review nötig
- Fehlende Felder (wavelength=?) führen zu unsicheren Matches
- `auto_approved` ab 0.73 Confidence — zu niedrige Schwelle
**Ziel-Zustand (LÖSUNG):**
- 0 Einträge in `pending` — nie wieder
- Kein Confidence-Score — nur MATCH oder KEIN MATCH
- Fehlende Felder → Enrichment-Job → kein Match bis Daten vollständig
- Nur Exact-Match (mit definierten Toleranzen) → 100% verlässliche Daten
---
## Repository
```
Pfad: /opt/tip (auf Erik-Server) ODER
/Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db (lokal)
Stack: TypeScript, Node.js, PostgreSQL 17, pg-boss (Job-Queue)
Packages: packages/scraper/src/scheduler.ts ← Haupt-Matching-Logik
packages/api/src/routes/review.ts ← API-Endpunkte
sql/ ← DB-Migrationen (nummeriert 001-104)
```
---
## SCHRITT 1: Datenbank-Migration — Neue Pflichtfelder
**Erstelle Datei:** `sql/105-wavelength-connector-completeness.sql`
```sql
-- Migration 105: Wavelength (TX/RX getrennt für BiDi), Connector-Normalisierung,
-- Data-Completeness-Score, Enrichment-Flag
-- 1. Wellenlängen-Felder aufteilen (BiDi hat TX ≠ RX)
ALTER TABLE transceivers
ADD COLUMN IF NOT EXISTS wavelength_tx_nm INTEGER, -- TX-Wellenlänge in nm (z.B. 1270)
ADD COLUMN IF NOT EXISTS wavelength_rx_nm INTEGER, -- RX-Wellenlänge in nm (z.B. 1330)
ADD COLUMN IF NOT EXISTS connector_type TEXT, -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN'
ADD COLUMN IF NOT EXISTS data_completeness INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100),
ADD COLUMN IF NOT EXISTS enrichment_needed BOOLEAN DEFAULT FALSE,
ADD COLUMN IF NOT EXISTS enrichment_fields TEXT[] DEFAULT '{}'; -- welche Felder fehlen noch
-- 2. Indices für Performance
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_completeness ON transceivers (data_completeness);
CREATE INDEX IF NOT EXISTS idx_tx_enrichment ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE;
-- 3. Completeness-Berechnungsfunktion
CREATE OR REPLACE FUNCTION calc_data_completeness(
p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT,
p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT
) RETURNS INTEGER AS $$
DECLARE
score INTEGER := 0;
BEGIN
IF p_form_factor IS NOT NULL AND p_form_factor != '' THEN score := score + 20; END IF;
IF p_speed_gbps IS NOT NULL AND p_speed_gbps > 0 THEN score := score + 20; END IF;
IF p_fiber_type IS NOT NULL AND p_fiber_type != '' THEN score := score + 20; END IF;
IF p_reach_meters IS NOT NULL AND p_reach_meters > 0 THEN score := score + 20; END IF;
IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0 THEN score := score + 10; END IF;
IF p_connector IS NOT NULL AND p_connector != '' THEN score := score + 10; END IF;
RETURN score;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- 4. Alle bestehenden Transceivers: Completeness initial berechnen
UPDATE transceivers SET
data_completeness = calc_data_completeness(
form_factor, speed_gbps, fiber_type,
reach_meters, wavelength_tx_nm, connector_type
),
enrichment_needed = (
form_factor IS NULL OR speed_gbps IS NULL OR
fiber_type IS NULL OR reach_meters IS NULL OR
wavelength_tx_nm IS NULL OR connector_type IS NULL
);
-- 5. Trigger: Completeness automatisch aktualisieren
CREATE OR REPLACE FUNCTION trg_update_completeness()
RETURNS TRIGGER AS $$
BEGIN
NEW.data_completeness := calc_data_completeness(
NEW.form_factor, NEW.speed_gbps, NEW.fiber_type,
NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type
);
NEW.enrichment_needed := (
NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR
NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR
NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL
);
-- Fehlende Felder dokumentieren
NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[
CASE WHEN NEW.form_factor IS NULL THEN 'form_factor' END,
CASE WHEN NEW.speed_gbps IS NULL THEN 'speed_gbps' END,
CASE WHEN NEW.fiber_type IS NULL THEN 'fiber_type' END,
CASE WHEN NEW.reach_meters IS NULL THEN 'reach_meters' END,
CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END,
CASE WHEN NEW.connector_type IS NULL THEN 'connector_type' END
], NULL);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trg_completeness ON transceivers;
CREATE TRIGGER trg_completeness
BEFORE INSERT OR UPDATE ON transceivers
FOR EACH ROW EXECUTE FUNCTION trg_update_completeness();
COMMENT ON COLUMN transceivers.wavelength_tx_nm IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.';
COMMENT ON COLUMN transceivers.wavelength_rx_nm IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.';
COMMENT ON COLUMN transceivers.connector_type IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN';
COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)';
COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed';
COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment';
```
---
## SCHRITT 2: IEEE/MSA Standards-Lookup-Tabelle (statisch)
**Erstelle Datei:** `sql/106-ieee-msa-wavelength-lookup.sql`
```sql
-- Migration 106: IEEE/MSA Standards Wavelength Lookup
-- Ground-Truth für Wellenlänge basierend auf Standard-Spezifikation
-- Quelle: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs
CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup (
id SERIAL PRIMARY KEY,
form_factor TEXT NOT NULL,
speed_gbps NUMERIC NOT NULL,
fiber_type TEXT NOT NULL, -- 'SMF', 'MMF', 'DAC', 'AOC'
reach_min_m INTEGER NOT NULL,
reach_max_m INTEGER NOT NULL,
wavelength_tx_nm INTEGER NOT NULL,
wavelength_rx_nm INTEGER, -- NULL = gleich wie TX (kein BiDi)
connector_type TEXT NOT NULL,
ieee_standard TEXT, -- z.B. '802.3ae', 'SFF-8431'
notes TEXT,
UNIQUE (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m)
);
INSERT INTO ieee_wavelength_lookup
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes)
VALUES
-- ── SFP (1G) ─────────────────────────────────────────────────────────────────
('SFP', 1, 'MMF', 0, 550, 850, NULL, 'LC', '802.3z', '1000BASE-SX'),
('SFP', 1, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3z', '1000BASE-LX'),
('SFP', 1, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3z', '1000BASE-EX'),
('SFP', 1, 'SMF', 0, 70000, 1550, NULL, 'LC', '802.3z', '1000BASE-ZX'),
('SFP', 1, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'),
('SFP', 1, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'),
('SFP', 1, 'Copper', 0, 100, NULL, NULL, 'RJ45','802.3ab', '1000BASE-T'),
-- ── SFP+ (10G) ───────────────────────────────────────────────────────────────
('SFP+', 10, 'MMF', 0, 300, 850, NULL, 'LC', '802.3ae', '10GBASE-SR'),
('SFP+', 10, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ae', '10GBASE-LR'),
('SFP+', 10, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ae', '10GBASE-ER'),
('SFP+', 10, 'SMF', 0, 80000, 1550, NULL, 'LC', '802.3ae', '10GBASE-ZR'),
('SFP+', 10, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'),
('SFP+', 10, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'),
('SFP+', 10, 'SMF', 0, 20000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'),
('SFP+', 10, 'SMF', 0, 20000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'),
('SFP+', 10, 'DAC', 0, 7, NULL, NULL, 'SFP+','SFF-8431','10G DAC Twinax'),
('SFP+', 10, 'AOC', 0, 100, 850, NULL, 'LC', 'SFF-8431', '10G AOC'),
-- ── SFP28 (25G) ──────────────────────────────────────────────────────────────
('SFP28', 25, 'MMF', 0, 100, 850, NULL, 'LC', '802.3by', '25GBASE-SR'),
('SFP28', 25, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cc', '25GBASE-LR'),
('SFP28', 25, 'SMF', 0, 2000, 1310, NULL, 'LC', '25GBASE-DR','25GBASE-DR'),
('SFP28', 25, 'SMF', 0, 10000, 1270, 1330, 'LC', '802.3cc', '25GBASE-BX10-U BiDi'),
('SFP28', 25, 'SMF', 0, 10000, 1330, 1270, 'LC', '802.3cc', '25GBASE-BX10-D BiDi'),
('SFP28', 25, 'DAC', 0, 5, NULL, NULL, 'SFP28','802.3by','25G DAC'),
-- ── QSFP+ (40G) ──────────────────────────────────────────────────────────────
('QSFP+', 40, 'MMF', 0, 150, 850, NULL, 'MPO-12','802.3ba','40GBASE-SR4'),
('QSFP+', 40, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ba', '40GBASE-LR4 CWDM4'),
('QSFP+', 40, 'SMF', 0, 2000, 1310, NULL, 'MPO-12','802.3bm','40GBASE-PSM4'),
('QSFP+', 40, 'DAC', 0, 7, NULL, NULL, 'QSFP+','802.3ba','40G DAC'),
-- ── QSFP28 (100G) ────────────────────────────────────────────────────────────
('QSFP28', 100, 'MMF', 0, 100, 850, NULL, 'MPO-12','802.3bm','100GBASE-SR4'),
('QSFP28', 100, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '100GBASE-LR4 CWDM4'),
('QSFP28', 100, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bj','100GBASE-DR (PSM4)'),
('QSFP28', 100, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ba', '100GBASE-ER4'),
('QSFP28', 100, 'SMF', 0, 2000, 1310, NULL, 'LC', 'CWDM4-MSA','100G CWDM4 2km'),
('QSFP28', 100, 'DAC', 0, 5, NULL, NULL, 'QSFP28','802.3bj','100G DAC'),
('QSFP28', 100, 'AOC', 0, 100, 850, NULL, 'MPO-12','802.3bm','100G AOC SR4'),
-- ── QSFP56 (200G) ────────────────────────────────────────────────────────────
('QSFP56', 200, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3cd','200GBASE-SR4'),
('QSFP56', 200, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3cd', '200GBASE-DR4'),
('QSFP56', 200, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '200GBASE-FR4'),
('QSFP56', 200, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3cd', '200GBASE-LR4'),
-- ── QSFP-DD (400G) ───────────────────────────────────────────────────────────
('QSFP-DD', 400, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3bs','400GBASE-SR8'),
('QSFP-DD', 400, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bs','400GBASE-DR4'),
('QSFP-DD', 400, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3bs', '400GBASE-FR4'),
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3bs', '400GBASE-LR4'),
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'MPO-12','800G MSA','400GBASE-PSM4'),
('QSFP-DD', 400, 'DAC', 0, 5, NULL, NULL, 'QSFP-DD','802.3bs','400G DAC'),
-- ── OSFP / QSFP-DD800 (800G) ─────────────────────────────────────────────────
('OSFP', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
('OSFP', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8'),
('OSFP', 800, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3df', '800GBASE-FR4 2x400G'),
('OSFP', 800, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3df', '800GBASE-LR4'),
('QSFP-DD800', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8')
ON CONFLICT DO NOTHING;
CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m);
```
---
## SCHRITT 3: Enrichment-Robot (neues Modul)
**Erstelle Datei:** `packages/scraper/src/robots/wavelength-enricher.ts`
Dieser Robot läuft als pg-boss Job (`enrich:wavelength`) alle 4 Stunden.
Er füllt `wavelength_tx_nm`, `wavelength_rx_nm`, `connector_type` automatisch.
```typescript
/**
* Wavelength Enricher Robot
*
* Füllt fehlende wavelength_tx_nm / wavelength_rx_nm / connector_type
* aus drei Quellen (Priorität absteigend):
* 1. IEEE/MSA Lookup-Tabelle (sql/106) — deterministisch, keine Kosten
* 2. Produktname-Regex (heuristisch, pattern-basiert)
* 3. Quarantäne: Produkt bleibt ohne Match bis Daten vorhanden
*
* Kein LLM, kein Scraper, keine externen Calls — rein datenbankbasiert.
*/
import { pool } from "../utils/db";
// ── Regex-Patterns für Wellenlänge aus Produktnamen ──────────────────────────
const WAVELENGTH_PATTERNS: Array<{
pattern: RegExp;
tx: number;
rx?: number;
notes: string;
}> = [
// BiDi explizit
{ pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" },
{ pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" },
{ pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" },
{ pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" },
{ pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" },
// Direkte nm-Angabe
{ pattern: /\b850\s*nm\b/i, tx: 850, notes: "850nm explicit" },
{ pattern: /\b1310\s*nm\b/i, tx: 1310, notes: "1310nm explicit" },
{ pattern: /\b1550\s*nm\b/i, tx: 1550, notes: "1550nm explicit" },
{ pattern: /\b1270\s*nm\b/i, tx: 1270, notes: "1270nm explicit" },
{ pattern: /\b1330\s*nm\b/i, tx: 1330, notes: "1330nm explicit" },
// DWDM Channels (C-Band ~1530-1565nm)
{ pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550, notes: "DWDM C-Band" },
{ pattern: /\bDWDM\b/i, tx: 1550, notes: "DWDM generic" },
// Standard-Kurzbezeichnungen → implizite Wellenlänge
{ pattern: /\bSR4?\b/i, tx: 850, notes: "SR/SR4 = 850nm MMF" },
{ pattern: /\bLR4?\b/i, tx: 1310, notes: "LR/LR4 = 1310nm SMF" },
{ pattern: /\bER4?\b/i, tx: 1310, notes: "ER/ER4 = 1310nm SMF" },
{ pattern: /\bFR4?\b/i, tx: 1310, notes: "FR/FR4 = 1310nm SMF" },
{ pattern: /\bDR4?\b/i, tx: 1310, notes: "DR/DR4 = 1310nm SMF" },
{ pattern: /\bZR4?\b/i, tx: 1550, notes: "ZR/ZR4 = 1550nm SMF" },
];
const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [
{ pattern: /\bMPO.?16\b/i, connector: "MPO-16" },
{ pattern: /\bMPO.?12\b/i, connector: "MPO-12" },
{ pattern: /\bMPO\b/i, connector: "MPO-12" }, // default MPO = MPO-12
{ pattern: /\bMTP\b/i, connector: "MPO-12" },
{ pattern: /\bCS\s*connector\b/i, connector: "CS" },
{ pattern: /\bSN\s*connector\b/i, connector: "SN" },
{ pattern: /\bRJ.?45\b/i, connector: "RJ45" },
{ pattern: /\bbase.?t\b/i, connector: "RJ45" },
{ pattern: /\bSC\b/i, connector: "SC" },
{ pattern: /\bLC\b/i, connector: "LC" }, // LC zuletzt (häufig im Text)
];
// DAC/AOC haben keinen Fiber-Connector
const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i;
function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null {
for (const p of WAVELENGTH_PATTERNS) {
if (p.pattern.test(name)) {
return { tx: p.tx, rx: p.rx, notes: p.notes };
}
}
return null;
}
function extractConnectorFromName(name: string): string | null {
if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC";
for (const p of CONNECTOR_PATTERNS) {
if (p.pattern.test(name)) return p.connector;
}
return null;
}
export async function runWavelengthEnricher(): Promise<void> {
console.log("=== Wavelength Enricher Robot ===");
// Alle Transceivers mit fehlenden Pflichtfeldern
const { rows: transceivers } = await pool.query<{
id: string;
standard_name: string;
part_number: string;
form_factor: string;
speed_gbps: number;
fiber_type: string;
reach_meters: number;
wavelength_tx_nm: number | null;
wavelength_rx_nm: number | null;
connector_type: string | null;
}>(`
SELECT id, standard_name, part_number, form_factor, speed_gbps,
fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type
FROM transceivers
WHERE enrichment_needed = TRUE
ORDER BY data_completeness DESC -- Produkte mit mehr Daten zuerst
LIMIT 5000
`);
let fromIeee = 0;
let fromRegex = 0;
let stillMissing = 0;
for (const t of transceivers) {
let txNm = t.wavelength_tx_nm;
let rxNm = t.wavelength_rx_nm;
let connector = t.connector_type;
let source = "";
// ── Quelle 1: IEEE/MSA Lookup ───────────────────────────────────────────
if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) {
const { rows: ieee } = await pool.query<{
wavelength_tx_nm: number;
wavelength_rx_nm: number | null;
connector_type: string;
}>(`
SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type
FROM ieee_wavelength_lookup
WHERE form_factor = $1
AND speed_gbps = $2
AND fiber_type = $3
AND reach_min_m <= $4
AND reach_max_m >= $4
LIMIT 1
`, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]);
if (ieee.length > 0) {
txNm = ieee[0].wavelength_tx_nm;
rxNm = ieee[0].wavelength_rx_nm ?? null;
if (!connector) connector = ieee[0].connector_type;
source = "ieee_lookup";
fromIeee++;
}
}
// ── Quelle 2: Produktname-Regex ─────────────────────────────────────────
const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" ");
if (txNm === null && nameForExtraction) {
const extracted = extractWavelengthFromName(nameForExtraction);
if (extracted) {
txNm = extracted.tx;
rxNm = extracted.rx ?? null;
source = `regex:${extracted.notes}`;
fromRegex++;
}
}
if (connector === null && nameForExtraction) {
const extractedConn = extractConnectorFromName(nameForExtraction);
if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn;
}
// ── Update wenn etwas gefunden ──────────────────────────────────────────
if (txNm !== null || connector !== null) {
await pool.query(`
UPDATE transceivers SET
wavelength_tx_nm = COALESCE($1, wavelength_tx_nm),
wavelength_rx_nm = COALESCE($2, wavelength_rx_nm),
connector_type = COALESCE($3, connector_type),
updated_at = NOW()
WHERE id = $4
`, [txNm, rxNm, connector, t.id]);
} else {
stillMissing++;
}
}
console.log(` IEEE Lookup: ${fromIeee} enriched`);
console.log(` Regex Extract: ${fromRegex} enriched`);
console.log(` Still missing: ${stillMissing} (Quarantäne bis Daten verfügbar)`);
console.log("=== Wavelength Enricher Complete ===");
}
```
---
## SCHRITT 4: Neuer deterministischer Matcher
**Ändere in:** `packages/scraper/src/scheduler.ts`
Suche den Block `// fiber_type match` (ca. Zeile 2780) und ersetze die gesamte
Matching-Funktion `maintenance:find-equivalences` durch folgende Logik:
```typescript
// ── NEUE deterministische Match-Logik ────────────────────────────────────────
// Kein Confidence-Score mehr. Nur MATCH oder KEIN MATCH.
// Voraussetzung: ALLE 6 Pflichtfelder müssen bekannt sein.
// Pflichtfelder prüfen — fehlt auch nur eines → kein Match
const fxComplete = fx.form_factor && fx.speed_gbps && fx.fiber_type &&
fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type;
const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type &&
cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type;
if (!fxComplete || !candComplete) {
// Fehlende Daten → Enrichment-Queue, kein Match-Versuch
incompleteCount++;
continue;
}
// ── Exact Match (mit definierten Toleranzen) ──────────────────────────────────
const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase();
if (!formFactorMatch) continue; // Hart: falsches Gehäuse = nicht kompatibel
const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1;
if (!speedMatch) continue; // Hart: 10G ≠ 25G
const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase();
if (!fiberMatch) continue; // Hart: SMF ≠ MMF = komplett anderes Produkt
// Reach: ±10% Toleranz (Herstellervarianz bei Kabelqualität)
const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1);
if (reachRatio > 0.10) continue;
// Wellenlänge: ±5nm Toleranz (Herstellervarianz innerhalb Spec)
const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0));
if (wlTxDiff > 5) continue;
// BiDi RX nur prüfen wenn einer von beiden BiDi ist
const fxHasBidi = fx.wavelength_rx_nm != null;
const candHasBidi = cand.wavelength_rx_nm != null;
if (fxHasBidi || candHasBidi) {
const wlRxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0));
if (wlRxDiff > 5) continue;
}
// Connector: Exact Match (LC ≠ SC ≠ MPO-12)
const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase();
if (!connMatch) continue;
// ── Alle Tests bestanden → MATCH (100% sicher) ───────────────────────────────
const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector'];
const notes = `${fx.part_number}${cand.part_number} (${cand.vendor_name}) | ` +
`basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`;
// Direkt auto_approved — kein pending mehr
await pool.query(`
INSERT INTO transceiver_equivalences
(flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
VALUES ($1, $2, 1.0, $3, $4, 'auto_approved')
ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
confidence = 1.0,
match_basis = EXCLUDED.match_basis,
match_notes = EXCLUDED.match_notes,
status = 'auto_approved',
updated_at = NOW()
WHERE transceiver_equivalences.status = 'pending'
`, [fx.id, cand.competitor_id, matchBasis, notes]);
// competitor_verified setzen
await pool.query(`
UPDATE transceivers
SET competitor_verified = true,
competitor_verified_at = NOW(),
competitor_status = 'matched',
competitor_status_updated_at = NOW()
WHERE id = $1 AND competitor_verified = false
`, [fx.id]);
matchedCount++;
```
---
## SCHRITT 5: Pending-Queue bereinigen
**Erstelle Datei:** `sql/107-clear-pending-queue.sql`
```sql
-- Migration 107: Pending Queue bereinigen
-- Alle 'pending' Einträge die KEINEN deterministischen Match haben → reject
-- Alle 'pending' Einträge die jetzt deterministische Matches wären → werden
-- durch den neuen Matcher bei nächstem Run neu erzeugt (als auto_approved)
-- Schritt 1: Alle pending-Einträge rejcten (veraltete unsichere Matches)
UPDATE transceiver_equivalences
SET status = 'rejected',
reject_reason = 'Superseded by deterministic matcher — confidence-based match removed',
reviewed_at = NOW(),
reviewed_by = 'system:migration-107'
WHERE status = 'pending';
-- Schritt 2: Statistik loggen
DO $$
DECLARE
pending_count INTEGER;
approved_count INTEGER;
rejected_count INTEGER;
BEGIN
SELECT COUNT(*) INTO pending_count FROM transceiver_equivalences WHERE status = 'pending';
SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved');
SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected';
RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%',
pending_count, approved_count, rejected_count;
END;
$$;
-- Schritt 3: Index für deterministischen Matcher optimieren
CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences
(flexoptix_id, competitor_id, status)
WHERE status = 'auto_approved';
```
---
## SCHRITT 6: pg-boss Job für Enrichment-Robot registrieren
**Ändere in:** `packages/scraper/src/scheduler.ts`
Im Block wo Jobs registriert werden, hinzufügen:
```typescript
// Wavelength Enricher — läuft alle 4 Stunden
await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {});
// Handler registrieren
boss.work('enrich:wavelength', async () => {
await runWavelengthEnricher();
});
```
---
## SCHRITT 7: API — Manual Review UI deaktivieren
**Ändere in:** `packages/api/src/routes/review.ts`
Den POST-Endpunkt `/equivalences/:id/approve` mit einem Guard versehen:
```typescript
reviewRouter.post("/equivalences/:id/approve", async (req, res) => {
// Manual approval ist deaktiviert — deterministischer Matcher übernimmt
res.status(410).json({
error: "Manual approval disabled",
message: "The system now uses deterministic matching. No manual review needed.",
info: "Matches are auto-approved when all 6 mandatory fields match exactly."
});
});
```
---
## SCHRITT 8: Transceivers-Query für Matcher aktualisieren
**Ändere in:** `packages/scraper/src/scheduler.ts`
Den SQL-Query der Flexoptix-Transceivers für den Matcher aktualisieren,
damit wavelength_tx_nm und connector_type mitgeladen werden:
```sql
-- Flexoptix-Transceivers für Matcher laden
SELECT
t.id, t.part_number, t.standard_name, t.form_factor,
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths,
t.wavelength_tx_nm, -- NEU
t.wavelength_rx_nm, -- NEU
t.connector_type, -- NEU (war vorher 'connector')
t.data_completeness, -- NEU
t.enrichment_needed -- NEU
FROM transceivers t
JOIN vendors v ON v.id = t.vendor_id
WHERE v.name = 'Flexoptix'
AND t.enrichment_needed = FALSE -- NUR vollständige Datensätze matchen
AND t.data_completeness >= 80
ORDER BY t.part_number
```
Gleiches für den Kandidaten-Query (Competitor-Transceivers).
---
## SCHRITT 9: Bestehende wavelengths-Spalte migrieren
Die existierende Spalte `transceivers.wavelengths` (TEXT, z.B. "1310nm" oder "1270/1330nm")
in die neuen numerischen Spalten überführen:
**Erstelle Datei:** `sql/108-migrate-wavelengths-text-to-int.sql`
```sql
-- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER
UPDATE transceivers SET
wavelength_tx_nm = CASE
WHEN wavelengths ~ '^\s*850' THEN 850
WHEN wavelengths ~ '^\s*1270' THEN 1270
WHEN wavelengths ~ '^\s*1310' THEN 1310
WHEN wavelengths ~ '^\s*1330' THEN 1330
WHEN wavelengths ~ '^\s*1490' THEN 1490
WHEN wavelengths ~ '^\s*1550' THEN 1550
WHEN wavelengths ~ '^\s*1270\s*/\s*1330' THEN 1270
WHEN wavelengths ~ '^\s*1330\s*/\s*1270' THEN 1330
ELSE NULL
END,
wavelength_rx_nm = CASE
WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330
WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270
WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550
WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310
ELSE NULL
END
WHERE wavelengths IS NOT NULL
AND wavelength_tx_nm IS NULL;
-- Connector aus alter connector-Spalte übernehmen (falls vorhanden)
UPDATE transceivers SET
connector_type = CASE connector
WHEN 'LC' THEN 'LC'
WHEN 'SC' THEN 'SC'
WHEN 'MPO' THEN 'MPO-12'
WHEN 'MPO-12' THEN 'MPO-12'
WHEN 'MPO-16' THEN 'MPO-16'
WHEN 'RJ45' THEN 'RJ45'
ELSE connector
END
WHERE connector IS NOT NULL AND connector_type IS NULL;
-- Completeness neu berechnen nach Migration
UPDATE transceivers SET
data_completeness = calc_data_completeness(
form_factor, speed_gbps, fiber_type,
reach_meters, wavelength_tx_nm, connector_type
),
enrichment_needed = (
form_factor IS NULL OR speed_gbps IS NULL OR
fiber_type IS NULL OR reach_meters IS NULL OR
wavelength_tx_nm IS NULL OR connector_type IS NULL
);
```
---
## Ausführungsreihenfolge für Codex
```
1. sql/105-wavelength-connector-completeness.sql → psql ausführen
2. sql/106-ieee-msa-wavelength-lookup.sql → psql ausführen
3. sql/108-migrate-wavelengths-text-to-int.sql → psql ausführen
4. packages/scraper/src/robots/wavelength-enricher.ts → neue Datei erstellen
5. packages/scraper/src/scheduler.ts → Matcher-Logik ersetzen (Schritt 4)
6. packages/scraper/src/scheduler.ts → Job registrieren (Schritt 6)
7. packages/api/src/routes/review.ts → Approve-Endpunkt deaktivieren (Schritt 7)
8. npm run build -w packages/scraper → Build
9. npm run build -w packages/api → Build
10. sql/107-clear-pending-queue.sql → psql ausführen (ZULETZT — bereinigt Queue)
11. pm2 restart tip-scraper-daemon → Daemon neu starten
12. Enricher manuell triggern: POST /api/review/run-matcher
```
---
## Erwartetes Ergebnis nach Deployment
| Metrik | Vorher | Nachher |
|--------|--------|---------|
| Pending Queue | 13.374 | 0 |
| Confidence-Score | 0.01.0 (fuzzy) | entfällt |
| Match-Typ | probabilistisch | deterministisch |
| Manuelle Freigaben/Tag | ~50-200 | 0 |
| False-Positive-Rate | ~15% | ~0% |
| Transceivers mit Wellenlänge | ~30% | >85% (nach Enricher) |
---
## Nicht ändern (außerhalb dieses Tasks)
- `packages/scraper/src/scrapers/*` — Scraper-Logik bleibt unverändert
- `packages/api/src/routes/` (außer review.ts) — API-Endpunkte bleiben
- `sql/001-036` — Bestehende Migrationen nicht anfassen
- Alle `packages/core/src/` Typen — nur erweitern, nicht löschen
- PM2-Konfiguration — nicht anfassen