- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets) added NADDOD (fetch-based, benefits from residential IP) now 32 fetch-only queues safe for ARM/Pi without Chromium - index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080 Crawlee ProxyConfiguration automatically applies to Playwright crawler - pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative) - CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js - CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
706 lines
31 KiB
Markdown
706 lines
31 KiB
Markdown
# CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching
|
||
|
||
## Ziel
|
||
|
||
Die manuelle Review-Queue (`transceiver_equivalences` mit `status = 'pending'`) wird eliminiert.
|
||
Statt probabilistischer Confidence-Scores wird ein deterministisches Exact-Match-System gebaut,
|
||
das nur dann einen Match erzeugt, wenn ALLE Pflichtfelder vorhanden und exakt gleich sind.
|
||
|
||
**Aktueller Zustand (PROBLEM):**
|
||
- 13.374 Einträge in `pending` → manuelle Freigabe nötig
|
||
- Confidence-Score 0.0–1.0 → "56%" bedeutet Unsicherheit → Review nötig
|
||
- Fehlende Felder (wavelength=?) führen zu unsicheren Matches
|
||
- `auto_approved` ab 0.73 Confidence — zu niedrige Schwelle
|
||
|
||
**Ziel-Zustand (LÖSUNG):**
|
||
- 0 Einträge in `pending` — nie wieder
|
||
- Kein Confidence-Score — nur MATCH oder KEIN MATCH
|
||
- Fehlende Felder → Enrichment-Job → kein Match bis Daten vollständig
|
||
- Nur Exact-Match (mit definierten Toleranzen) → 100% verlässliche Daten
|
||
|
||
---
|
||
|
||
## Repository
|
||
|
||
```
|
||
Pfad: /opt/tip (auf Erik-Server) ODER
|
||
/Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db (lokal)
|
||
|
||
Stack: TypeScript, Node.js, PostgreSQL 17, pg-boss (Job-Queue)
|
||
Packages: packages/scraper/src/scheduler.ts ← Haupt-Matching-Logik
|
||
packages/api/src/routes/review.ts ← API-Endpunkte
|
||
sql/ ← DB-Migrationen (nummeriert 001-104)
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 1: Datenbank-Migration — Neue Pflichtfelder
|
||
|
||
**Erstelle Datei:** `sql/105-wavelength-connector-completeness.sql`
|
||
|
||
```sql
|
||
-- Migration 105: Wavelength (TX/RX getrennt für BiDi), Connector-Normalisierung,
|
||
-- Data-Completeness-Score, Enrichment-Flag
|
||
|
||
-- 1. Wellenlängen-Felder aufteilen (BiDi hat TX ≠ RX)
|
||
ALTER TABLE transceivers
|
||
ADD COLUMN IF NOT EXISTS wavelength_tx_nm INTEGER, -- TX-Wellenlänge in nm (z.B. 1270)
|
||
ADD COLUMN IF NOT EXISTS wavelength_rx_nm INTEGER, -- RX-Wellenlänge in nm (z.B. 1330)
|
||
ADD COLUMN IF NOT EXISTS connector_type TEXT, -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN'
|
||
ADD COLUMN IF NOT EXISTS data_completeness INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100),
|
||
ADD COLUMN IF NOT EXISTS enrichment_needed BOOLEAN DEFAULT FALSE,
|
||
ADD COLUMN IF NOT EXISTS enrichment_fields TEXT[] DEFAULT '{}'; -- welche Felder fehlen noch
|
||
|
||
-- 2. Indices für Performance
|
||
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL;
|
||
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL;
|
||
CREATE INDEX IF NOT EXISTS idx_tx_completeness ON transceivers (data_completeness);
|
||
CREATE INDEX IF NOT EXISTS idx_tx_enrichment ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE;
|
||
|
||
-- 3. Completeness-Berechnungsfunktion
|
||
CREATE OR REPLACE FUNCTION calc_data_completeness(
|
||
p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT,
|
||
p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT
|
||
) RETURNS INTEGER AS $$
|
||
DECLARE
|
||
score INTEGER := 0;
|
||
BEGIN
|
||
IF p_form_factor IS NOT NULL AND p_form_factor != '' THEN score := score + 20; END IF;
|
||
IF p_speed_gbps IS NOT NULL AND p_speed_gbps > 0 THEN score := score + 20; END IF;
|
||
IF p_fiber_type IS NOT NULL AND p_fiber_type != '' THEN score := score + 20; END IF;
|
||
IF p_reach_meters IS NOT NULL AND p_reach_meters > 0 THEN score := score + 20; END IF;
|
||
IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0 THEN score := score + 10; END IF;
|
||
IF p_connector IS NOT NULL AND p_connector != '' THEN score := score + 10; END IF;
|
||
RETURN score;
|
||
END;
|
||
$$ LANGUAGE plpgsql IMMUTABLE;
|
||
|
||
-- 4. Alle bestehenden Transceivers: Completeness initial berechnen
|
||
UPDATE transceivers SET
|
||
data_completeness = calc_data_completeness(
|
||
form_factor, speed_gbps, fiber_type,
|
||
reach_meters, wavelength_tx_nm, connector_type
|
||
),
|
||
enrichment_needed = (
|
||
form_factor IS NULL OR speed_gbps IS NULL OR
|
||
fiber_type IS NULL OR reach_meters IS NULL OR
|
||
wavelength_tx_nm IS NULL OR connector_type IS NULL
|
||
);
|
||
|
||
-- 5. Trigger: Completeness automatisch aktualisieren
|
||
CREATE OR REPLACE FUNCTION trg_update_completeness()
|
||
RETURNS TRIGGER AS $$
|
||
BEGIN
|
||
NEW.data_completeness := calc_data_completeness(
|
||
NEW.form_factor, NEW.speed_gbps, NEW.fiber_type,
|
||
NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type
|
||
);
|
||
NEW.enrichment_needed := (
|
||
NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR
|
||
NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR
|
||
NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL
|
||
);
|
||
-- Fehlende Felder dokumentieren
|
||
NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[
|
||
CASE WHEN NEW.form_factor IS NULL THEN 'form_factor' END,
|
||
CASE WHEN NEW.speed_gbps IS NULL THEN 'speed_gbps' END,
|
||
CASE WHEN NEW.fiber_type IS NULL THEN 'fiber_type' END,
|
||
CASE WHEN NEW.reach_meters IS NULL THEN 'reach_meters' END,
|
||
CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END,
|
||
CASE WHEN NEW.connector_type IS NULL THEN 'connector_type' END
|
||
], NULL);
|
||
RETURN NEW;
|
||
END;
|
||
$$ LANGUAGE plpgsql;
|
||
|
||
DROP TRIGGER IF EXISTS trg_completeness ON transceivers;
|
||
CREATE TRIGGER trg_completeness
|
||
BEFORE INSERT OR UPDATE ON transceivers
|
||
FOR EACH ROW EXECUTE FUNCTION trg_update_completeness();
|
||
|
||
COMMENT ON COLUMN transceivers.wavelength_tx_nm IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.';
|
||
COMMENT ON COLUMN transceivers.wavelength_rx_nm IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.';
|
||
COMMENT ON COLUMN transceivers.connector_type IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN';
|
||
COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)';
|
||
COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed';
|
||
COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment';
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 2: IEEE/MSA Standards-Lookup-Tabelle (statisch)
|
||
|
||
**Erstelle Datei:** `sql/106-ieee-msa-wavelength-lookup.sql`
|
||
|
||
```sql
|
||
-- Migration 106: IEEE/MSA Standards Wavelength Lookup
|
||
-- Ground-Truth für Wellenlänge basierend auf Standard-Spezifikation
|
||
-- Quelle: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs
|
||
|
||
CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup (
|
||
id SERIAL PRIMARY KEY,
|
||
form_factor TEXT NOT NULL,
|
||
speed_gbps NUMERIC NOT NULL,
|
||
fiber_type TEXT NOT NULL, -- 'SMF', 'MMF', 'DAC', 'AOC'
|
||
reach_min_m INTEGER NOT NULL,
|
||
reach_max_m INTEGER NOT NULL,
|
||
wavelength_tx_nm INTEGER NOT NULL,
|
||
wavelength_rx_nm INTEGER, -- NULL = gleich wie TX (kein BiDi)
|
||
connector_type TEXT NOT NULL,
|
||
ieee_standard TEXT, -- z.B. '802.3ae', 'SFF-8431'
|
||
notes TEXT,
|
||
UNIQUE (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m)
|
||
);
|
||
|
||
INSERT INTO ieee_wavelength_lookup
|
||
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes)
|
||
VALUES
|
||
-- ── SFP (1G) ─────────────────────────────────────────────────────────────────
|
||
('SFP', 1, 'MMF', 0, 550, 850, NULL, 'LC', '802.3z', '1000BASE-SX'),
|
||
('SFP', 1, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3z', '1000BASE-LX'),
|
||
('SFP', 1, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3z', '1000BASE-EX'),
|
||
('SFP', 1, 'SMF', 0, 70000, 1550, NULL, 'LC', '802.3z', '1000BASE-ZX'),
|
||
('SFP', 1, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'),
|
||
('SFP', 1, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'),
|
||
('SFP', 1, 'Copper', 0, 100, NULL, NULL, 'RJ45','802.3ab', '1000BASE-T'),
|
||
-- ── SFP+ (10G) ───────────────────────────────────────────────────────────────
|
||
('SFP+', 10, 'MMF', 0, 300, 850, NULL, 'LC', '802.3ae', '10GBASE-SR'),
|
||
('SFP+', 10, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ae', '10GBASE-LR'),
|
||
('SFP+', 10, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ae', '10GBASE-ER'),
|
||
('SFP+', 10, 'SMF', 0, 80000, 1550, NULL, 'LC', '802.3ae', '10GBASE-ZR'),
|
||
('SFP+', 10, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'),
|
||
('SFP+', 10, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'),
|
||
('SFP+', 10, 'SMF', 0, 20000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'),
|
||
('SFP+', 10, 'SMF', 0, 20000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'),
|
||
('SFP+', 10, 'DAC', 0, 7, NULL, NULL, 'SFP+','SFF-8431','10G DAC Twinax'),
|
||
('SFP+', 10, 'AOC', 0, 100, 850, NULL, 'LC', 'SFF-8431', '10G AOC'),
|
||
-- ── SFP28 (25G) ──────────────────────────────────────────────────────────────
|
||
('SFP28', 25, 'MMF', 0, 100, 850, NULL, 'LC', '802.3by', '25GBASE-SR'),
|
||
('SFP28', 25, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cc', '25GBASE-LR'),
|
||
('SFP28', 25, 'SMF', 0, 2000, 1310, NULL, 'LC', '25GBASE-DR','25GBASE-DR'),
|
||
('SFP28', 25, 'SMF', 0, 10000, 1270, 1330, 'LC', '802.3cc', '25GBASE-BX10-U BiDi'),
|
||
('SFP28', 25, 'SMF', 0, 10000, 1330, 1270, 'LC', '802.3cc', '25GBASE-BX10-D BiDi'),
|
||
('SFP28', 25, 'DAC', 0, 5, NULL, NULL, 'SFP28','802.3by','25G DAC'),
|
||
-- ── QSFP+ (40G) ──────────────────────────────────────────────────────────────
|
||
('QSFP+', 40, 'MMF', 0, 150, 850, NULL, 'MPO-12','802.3ba','40GBASE-SR4'),
|
||
('QSFP+', 40, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ba', '40GBASE-LR4 CWDM4'),
|
||
('QSFP+', 40, 'SMF', 0, 2000, 1310, NULL, 'MPO-12','802.3bm','40GBASE-PSM4'),
|
||
('QSFP+', 40, 'DAC', 0, 7, NULL, NULL, 'QSFP+','802.3ba','40G DAC'),
|
||
-- ── QSFP28 (100G) ────────────────────────────────────────────────────────────
|
||
('QSFP28', 100, 'MMF', 0, 100, 850, NULL, 'MPO-12','802.3bm','100GBASE-SR4'),
|
||
('QSFP28', 100, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '100GBASE-LR4 CWDM4'),
|
||
('QSFP28', 100, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bj','100GBASE-DR (PSM4)'),
|
||
('QSFP28', 100, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ba', '100GBASE-ER4'),
|
||
('QSFP28', 100, 'SMF', 0, 2000, 1310, NULL, 'LC', 'CWDM4-MSA','100G CWDM4 2km'),
|
||
('QSFP28', 100, 'DAC', 0, 5, NULL, NULL, 'QSFP28','802.3bj','100G DAC'),
|
||
('QSFP28', 100, 'AOC', 0, 100, 850, NULL, 'MPO-12','802.3bm','100G AOC SR4'),
|
||
-- ── QSFP56 (200G) ────────────────────────────────────────────────────────────
|
||
('QSFP56', 200, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3cd','200GBASE-SR4'),
|
||
('QSFP56', 200, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3cd', '200GBASE-DR4'),
|
||
('QSFP56', 200, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '200GBASE-FR4'),
|
||
('QSFP56', 200, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3cd', '200GBASE-LR4'),
|
||
-- ── QSFP-DD (400G) ───────────────────────────────────────────────────────────
|
||
('QSFP-DD', 400, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3bs','400GBASE-SR8'),
|
||
('QSFP-DD', 400, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bs','400GBASE-DR4'),
|
||
('QSFP-DD', 400, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3bs', '400GBASE-FR4'),
|
||
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3bs', '400GBASE-LR4'),
|
||
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'MPO-12','800G MSA','400GBASE-PSM4'),
|
||
('QSFP-DD', 400, 'DAC', 0, 5, NULL, NULL, 'QSFP-DD','802.3bs','400G DAC'),
|
||
-- ── OSFP / QSFP-DD800 (800G) ─────────────────────────────────────────────────
|
||
('OSFP', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
|
||
('OSFP', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8'),
|
||
('OSFP', 800, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3df', '800GBASE-FR4 2x400G'),
|
||
('OSFP', 800, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3df', '800GBASE-LR4'),
|
||
('QSFP-DD800', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
|
||
('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8')
|
||
ON CONFLICT DO NOTHING;
|
||
|
||
CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup
|
||
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m);
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 3: Enrichment-Robot (neues Modul)
|
||
|
||
**Erstelle Datei:** `packages/scraper/src/robots/wavelength-enricher.ts`
|
||
|
||
Dieser Robot läuft als pg-boss Job (`enrich:wavelength`) alle 4 Stunden.
|
||
Er füllt `wavelength_tx_nm`, `wavelength_rx_nm`, `connector_type` automatisch.
|
||
|
||
```typescript
|
||
/**
|
||
* Wavelength Enricher Robot
|
||
*
|
||
* Füllt fehlende wavelength_tx_nm / wavelength_rx_nm / connector_type
|
||
* aus drei Quellen (Priorität absteigend):
|
||
* 1. IEEE/MSA Lookup-Tabelle (sql/106) — deterministisch, keine Kosten
|
||
* 2. Produktname-Regex (heuristisch, pattern-basiert)
|
||
* 3. Quarantäne: Produkt bleibt ohne Match bis Daten vorhanden
|
||
*
|
||
* Kein LLM, kein Scraper, keine externen Calls — rein datenbankbasiert.
|
||
*/
|
||
import { pool } from "../utils/db";
|
||
|
||
// ── Regex-Patterns für Wellenlänge aus Produktnamen ──────────────────────────
|
||
|
||
const WAVELENGTH_PATTERNS: Array<{
|
||
pattern: RegExp;
|
||
tx: number;
|
||
rx?: number;
|
||
notes: string;
|
||
}> = [
|
||
// BiDi explizit
|
||
{ pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" },
|
||
{ pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" },
|
||
{ pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" },
|
||
{ pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" },
|
||
{ pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" },
|
||
// Direkte nm-Angabe
|
||
{ pattern: /\b850\s*nm\b/i, tx: 850, notes: "850nm explicit" },
|
||
{ pattern: /\b1310\s*nm\b/i, tx: 1310, notes: "1310nm explicit" },
|
||
{ pattern: /\b1550\s*nm\b/i, tx: 1550, notes: "1550nm explicit" },
|
||
{ pattern: /\b1270\s*nm\b/i, tx: 1270, notes: "1270nm explicit" },
|
||
{ pattern: /\b1330\s*nm\b/i, tx: 1330, notes: "1330nm explicit" },
|
||
// DWDM Channels (C-Band ~1530-1565nm)
|
||
{ pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550, notes: "DWDM C-Band" },
|
||
{ pattern: /\bDWDM\b/i, tx: 1550, notes: "DWDM generic" },
|
||
// Standard-Kurzbezeichnungen → implizite Wellenlänge
|
||
{ pattern: /\bSR4?\b/i, tx: 850, notes: "SR/SR4 = 850nm MMF" },
|
||
{ pattern: /\bLR4?\b/i, tx: 1310, notes: "LR/LR4 = 1310nm SMF" },
|
||
{ pattern: /\bER4?\b/i, tx: 1310, notes: "ER/ER4 = 1310nm SMF" },
|
||
{ pattern: /\bFR4?\b/i, tx: 1310, notes: "FR/FR4 = 1310nm SMF" },
|
||
{ pattern: /\bDR4?\b/i, tx: 1310, notes: "DR/DR4 = 1310nm SMF" },
|
||
{ pattern: /\bZR4?\b/i, tx: 1550, notes: "ZR/ZR4 = 1550nm SMF" },
|
||
];
|
||
|
||
const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [
|
||
{ pattern: /\bMPO.?16\b/i, connector: "MPO-16" },
|
||
{ pattern: /\bMPO.?12\b/i, connector: "MPO-12" },
|
||
{ pattern: /\bMPO\b/i, connector: "MPO-12" }, // default MPO = MPO-12
|
||
{ pattern: /\bMTP\b/i, connector: "MPO-12" },
|
||
{ pattern: /\bCS\s*connector\b/i, connector: "CS" },
|
||
{ pattern: /\bSN\s*connector\b/i, connector: "SN" },
|
||
{ pattern: /\bRJ.?45\b/i, connector: "RJ45" },
|
||
{ pattern: /\bbase.?t\b/i, connector: "RJ45" },
|
||
{ pattern: /\bSC\b/i, connector: "SC" },
|
||
{ pattern: /\bLC\b/i, connector: "LC" }, // LC zuletzt (häufig im Text)
|
||
];
|
||
|
||
// DAC/AOC haben keinen Fiber-Connector
|
||
const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i;
|
||
|
||
function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null {
|
||
for (const p of WAVELENGTH_PATTERNS) {
|
||
if (p.pattern.test(name)) {
|
||
return { tx: p.tx, rx: p.rx, notes: p.notes };
|
||
}
|
||
}
|
||
return null;
|
||
}
|
||
|
||
function extractConnectorFromName(name: string): string | null {
|
||
if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC";
|
||
for (const p of CONNECTOR_PATTERNS) {
|
||
if (p.pattern.test(name)) return p.connector;
|
||
}
|
||
return null;
|
||
}
|
||
|
||
export async function runWavelengthEnricher(): Promise<void> {
|
||
console.log("=== Wavelength Enricher Robot ===");
|
||
|
||
// Alle Transceivers mit fehlenden Pflichtfeldern
|
||
const { rows: transceivers } = await pool.query<{
|
||
id: string;
|
||
standard_name: string;
|
||
part_number: string;
|
||
form_factor: string;
|
||
speed_gbps: number;
|
||
fiber_type: string;
|
||
reach_meters: number;
|
||
wavelength_tx_nm: number | null;
|
||
wavelength_rx_nm: number | null;
|
||
connector_type: string | null;
|
||
}>(`
|
||
SELECT id, standard_name, part_number, form_factor, speed_gbps,
|
||
fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type
|
||
FROM transceivers
|
||
WHERE enrichment_needed = TRUE
|
||
ORDER BY data_completeness DESC -- Produkte mit mehr Daten zuerst
|
||
LIMIT 5000
|
||
`);
|
||
|
||
let fromIeee = 0;
|
||
let fromRegex = 0;
|
||
let stillMissing = 0;
|
||
|
||
for (const t of transceivers) {
|
||
let txNm = t.wavelength_tx_nm;
|
||
let rxNm = t.wavelength_rx_nm;
|
||
let connector = t.connector_type;
|
||
let source = "";
|
||
|
||
// ── Quelle 1: IEEE/MSA Lookup ───────────────────────────────────────────
|
||
if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) {
|
||
const { rows: ieee } = await pool.query<{
|
||
wavelength_tx_nm: number;
|
||
wavelength_rx_nm: number | null;
|
||
connector_type: string;
|
||
}>(`
|
||
SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type
|
||
FROM ieee_wavelength_lookup
|
||
WHERE form_factor = $1
|
||
AND speed_gbps = $2
|
||
AND fiber_type = $3
|
||
AND reach_min_m <= $4
|
||
AND reach_max_m >= $4
|
||
LIMIT 1
|
||
`, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]);
|
||
|
||
if (ieee.length > 0) {
|
||
txNm = ieee[0].wavelength_tx_nm;
|
||
rxNm = ieee[0].wavelength_rx_nm ?? null;
|
||
if (!connector) connector = ieee[0].connector_type;
|
||
source = "ieee_lookup";
|
||
fromIeee++;
|
||
}
|
||
}
|
||
|
||
// ── Quelle 2: Produktname-Regex ─────────────────────────────────────────
|
||
const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" ");
|
||
|
||
if (txNm === null && nameForExtraction) {
|
||
const extracted = extractWavelengthFromName(nameForExtraction);
|
||
if (extracted) {
|
||
txNm = extracted.tx;
|
||
rxNm = extracted.rx ?? null;
|
||
source = `regex:${extracted.notes}`;
|
||
fromRegex++;
|
||
}
|
||
}
|
||
|
||
if (connector === null && nameForExtraction) {
|
||
const extractedConn = extractConnectorFromName(nameForExtraction);
|
||
if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn;
|
||
}
|
||
|
||
// ── Update wenn etwas gefunden ──────────────────────────────────────────
|
||
if (txNm !== null || connector !== null) {
|
||
await pool.query(`
|
||
UPDATE transceivers SET
|
||
wavelength_tx_nm = COALESCE($1, wavelength_tx_nm),
|
||
wavelength_rx_nm = COALESCE($2, wavelength_rx_nm),
|
||
connector_type = COALESCE($3, connector_type),
|
||
updated_at = NOW()
|
||
WHERE id = $4
|
||
`, [txNm, rxNm, connector, t.id]);
|
||
} else {
|
||
stillMissing++;
|
||
}
|
||
}
|
||
|
||
console.log(` IEEE Lookup: ${fromIeee} enriched`);
|
||
console.log(` Regex Extract: ${fromRegex} enriched`);
|
||
console.log(` Still missing: ${stillMissing} (Quarantäne bis Daten verfügbar)`);
|
||
console.log("=== Wavelength Enricher Complete ===");
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 4: Neuer deterministischer Matcher
|
||
|
||
**Ändere in:** `packages/scraper/src/scheduler.ts`
|
||
|
||
Suche den Block `// fiber_type match` (ca. Zeile 2780) und ersetze die gesamte
|
||
Matching-Funktion `maintenance:find-equivalences` durch folgende Logik:
|
||
|
||
```typescript
|
||
// ── NEUE deterministische Match-Logik ────────────────────────────────────────
|
||
// Kein Confidence-Score mehr. Nur MATCH oder KEIN MATCH.
|
||
// Voraussetzung: ALLE 6 Pflichtfelder müssen bekannt sein.
|
||
|
||
// Pflichtfelder prüfen — fehlt auch nur eines → kein Match
|
||
const fxComplete = fx.form_factor && fx.speed_gbps && fx.fiber_type &&
|
||
fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type;
|
||
const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type &&
|
||
cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type;
|
||
|
||
if (!fxComplete || !candComplete) {
|
||
// Fehlende Daten → Enrichment-Queue, kein Match-Versuch
|
||
incompleteCount++;
|
||
continue;
|
||
}
|
||
|
||
// ── Exact Match (mit definierten Toleranzen) ──────────────────────────────────
|
||
const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase();
|
||
if (!formFactorMatch) continue; // Hart: falsches Gehäuse = nicht kompatibel
|
||
|
||
const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1;
|
||
if (!speedMatch) continue; // Hart: 10G ≠ 25G
|
||
|
||
const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase();
|
||
if (!fiberMatch) continue; // Hart: SMF ≠ MMF = komplett anderes Produkt
|
||
|
||
// Reach: ±10% Toleranz (Herstellervarianz bei Kabelqualität)
|
||
const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1);
|
||
if (reachRatio > 0.10) continue;
|
||
|
||
// Wellenlänge: ±5nm Toleranz (Herstellervarianz innerhalb Spec)
|
||
const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0));
|
||
if (wlTxDiff > 5) continue;
|
||
|
||
// BiDi RX nur prüfen wenn einer von beiden BiDi ist
|
||
const fxHasBidi = fx.wavelength_rx_nm != null;
|
||
const candHasBidi = cand.wavelength_rx_nm != null;
|
||
if (fxHasBidi || candHasBidi) {
|
||
const wlRxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0));
|
||
if (wlRxDiff > 5) continue;
|
||
}
|
||
|
||
// Connector: Exact Match (LC ≠ SC ≠ MPO-12)
|
||
const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase();
|
||
if (!connMatch) continue;
|
||
|
||
// ── Alle Tests bestanden → MATCH (100% sicher) ───────────────────────────────
|
||
const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector'];
|
||
const notes = `${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` +
|
||
`basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`;
|
||
|
||
// Direkt auto_approved — kein pending mehr
|
||
await pool.query(`
|
||
INSERT INTO transceiver_equivalences
|
||
(flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
|
||
VALUES ($1, $2, 1.0, $3, $4, 'auto_approved')
|
||
ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
|
||
confidence = 1.0,
|
||
match_basis = EXCLUDED.match_basis,
|
||
match_notes = EXCLUDED.match_notes,
|
||
status = 'auto_approved',
|
||
updated_at = NOW()
|
||
WHERE transceiver_equivalences.status = 'pending'
|
||
`, [fx.id, cand.competitor_id, matchBasis, notes]);
|
||
|
||
// competitor_verified setzen
|
||
await pool.query(`
|
||
UPDATE transceivers
|
||
SET competitor_verified = true,
|
||
competitor_verified_at = NOW(),
|
||
competitor_status = 'matched',
|
||
competitor_status_updated_at = NOW()
|
||
WHERE id = $1 AND competitor_verified = false
|
||
`, [fx.id]);
|
||
|
||
matchedCount++;
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 5: Pending-Queue bereinigen
|
||
|
||
**Erstelle Datei:** `sql/107-clear-pending-queue.sql`
|
||
|
||
```sql
|
||
-- Migration 107: Pending Queue bereinigen
|
||
-- Alle 'pending' Einträge die KEINEN deterministischen Match haben → reject
|
||
-- Alle 'pending' Einträge die jetzt deterministische Matches wären → werden
|
||
-- durch den neuen Matcher bei nächstem Run neu erzeugt (als auto_approved)
|
||
|
||
-- Schritt 1: Alle pending-Einträge rejcten (veraltete unsichere Matches)
|
||
UPDATE transceiver_equivalences
|
||
SET status = 'rejected',
|
||
reject_reason = 'Superseded by deterministic matcher — confidence-based match removed',
|
||
reviewed_at = NOW(),
|
||
reviewed_by = 'system:migration-107'
|
||
WHERE status = 'pending';
|
||
|
||
-- Schritt 2: Statistik loggen
|
||
DO $$
|
||
DECLARE
|
||
pending_count INTEGER;
|
||
approved_count INTEGER;
|
||
rejected_count INTEGER;
|
||
BEGIN
|
||
SELECT COUNT(*) INTO pending_count FROM transceiver_equivalences WHERE status = 'pending';
|
||
SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved');
|
||
SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected';
|
||
RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%',
|
||
pending_count, approved_count, rejected_count;
|
||
END;
|
||
$$;
|
||
|
||
-- Schritt 3: Index für deterministischen Matcher optimieren
|
||
CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences
|
||
(flexoptix_id, competitor_id, status)
|
||
WHERE status = 'auto_approved';
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 6: pg-boss Job für Enrichment-Robot registrieren
|
||
|
||
**Ändere in:** `packages/scraper/src/scheduler.ts`
|
||
|
||
Im Block wo Jobs registriert werden, hinzufügen:
|
||
|
||
```typescript
|
||
// Wavelength Enricher — läuft alle 4 Stunden
|
||
await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {});
|
||
|
||
// Handler registrieren
|
||
boss.work('enrich:wavelength', async () => {
|
||
await runWavelengthEnricher();
|
||
});
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 7: API — Manual Review UI deaktivieren
|
||
|
||
**Ändere in:** `packages/api/src/routes/review.ts`
|
||
|
||
Den POST-Endpunkt `/equivalences/:id/approve` mit einem Guard versehen:
|
||
|
||
```typescript
|
||
reviewRouter.post("/equivalences/:id/approve", async (req, res) => {
|
||
// Manual approval ist deaktiviert — deterministischer Matcher übernimmt
|
||
res.status(410).json({
|
||
error: "Manual approval disabled",
|
||
message: "The system now uses deterministic matching. No manual review needed.",
|
||
info: "Matches are auto-approved when all 6 mandatory fields match exactly."
|
||
});
|
||
});
|
||
```
|
||
|
||
---
|
||
|
||
## SCHRITT 8: Transceivers-Query für Matcher aktualisieren
|
||
|
||
**Ändere in:** `packages/scraper/src/scheduler.ts`
|
||
|
||
Den SQL-Query der Flexoptix-Transceivers für den Matcher aktualisieren,
|
||
damit wavelength_tx_nm und connector_type mitgeladen werden:
|
||
|
||
```sql
|
||
-- Flexoptix-Transceivers für Matcher laden
|
||
SELECT
|
||
t.id, t.part_number, t.standard_name, t.form_factor,
|
||
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths,
|
||
t.wavelength_tx_nm, -- NEU
|
||
t.wavelength_rx_nm, -- NEU
|
||
t.connector_type, -- NEU (war vorher 'connector')
|
||
t.data_completeness, -- NEU
|
||
t.enrichment_needed -- NEU
|
||
FROM transceivers t
|
||
JOIN vendors v ON v.id = t.vendor_id
|
||
WHERE v.name = 'Flexoptix'
|
||
AND t.enrichment_needed = FALSE -- NUR vollständige Datensätze matchen
|
||
AND t.data_completeness >= 80
|
||
ORDER BY t.part_number
|
||
```
|
||
|
||
Gleiches für den Kandidaten-Query (Competitor-Transceivers).
|
||
|
||
---
|
||
|
||
## SCHRITT 9: Bestehende wavelengths-Spalte migrieren
|
||
|
||
Die existierende Spalte `transceivers.wavelengths` (TEXT, z.B. "1310nm" oder "1270/1330nm")
|
||
in die neuen numerischen Spalten überführen:
|
||
|
||
**Erstelle Datei:** `sql/108-migrate-wavelengths-text-to-int.sql`
|
||
|
||
```sql
|
||
-- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER
|
||
|
||
UPDATE transceivers SET
|
||
wavelength_tx_nm = CASE
|
||
WHEN wavelengths ~ '^\s*850' THEN 850
|
||
WHEN wavelengths ~ '^\s*1270' THEN 1270
|
||
WHEN wavelengths ~ '^\s*1310' THEN 1310
|
||
WHEN wavelengths ~ '^\s*1330' THEN 1330
|
||
WHEN wavelengths ~ '^\s*1490' THEN 1490
|
||
WHEN wavelengths ~ '^\s*1550' THEN 1550
|
||
WHEN wavelengths ~ '^\s*1270\s*/\s*1330' THEN 1270
|
||
WHEN wavelengths ~ '^\s*1330\s*/\s*1270' THEN 1330
|
||
ELSE NULL
|
||
END,
|
||
wavelength_rx_nm = CASE
|
||
WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330
|
||
WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270
|
||
WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550
|
||
WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310
|
||
ELSE NULL
|
||
END
|
||
WHERE wavelengths IS NOT NULL
|
||
AND wavelength_tx_nm IS NULL;
|
||
|
||
-- Connector aus alter connector-Spalte übernehmen (falls vorhanden)
|
||
UPDATE transceivers SET
|
||
connector_type = CASE connector
|
||
WHEN 'LC' THEN 'LC'
|
||
WHEN 'SC' THEN 'SC'
|
||
WHEN 'MPO' THEN 'MPO-12'
|
||
WHEN 'MPO-12' THEN 'MPO-12'
|
||
WHEN 'MPO-16' THEN 'MPO-16'
|
||
WHEN 'RJ45' THEN 'RJ45'
|
||
ELSE connector
|
||
END
|
||
WHERE connector IS NOT NULL AND connector_type IS NULL;
|
||
|
||
-- Completeness neu berechnen nach Migration
|
||
UPDATE transceivers SET
|
||
data_completeness = calc_data_completeness(
|
||
form_factor, speed_gbps, fiber_type,
|
||
reach_meters, wavelength_tx_nm, connector_type
|
||
),
|
||
enrichment_needed = (
|
||
form_factor IS NULL OR speed_gbps IS NULL OR
|
||
fiber_type IS NULL OR reach_meters IS NULL OR
|
||
wavelength_tx_nm IS NULL OR connector_type IS NULL
|
||
);
|
||
```
|
||
|
||
---
|
||
|
||
## Ausführungsreihenfolge für Codex
|
||
|
||
```
|
||
1. sql/105-wavelength-connector-completeness.sql → psql ausführen
|
||
2. sql/106-ieee-msa-wavelength-lookup.sql → psql ausführen
|
||
3. sql/108-migrate-wavelengths-text-to-int.sql → psql ausführen
|
||
4. packages/scraper/src/robots/wavelength-enricher.ts → neue Datei erstellen
|
||
5. packages/scraper/src/scheduler.ts → Matcher-Logik ersetzen (Schritt 4)
|
||
6. packages/scraper/src/scheduler.ts → Job registrieren (Schritt 6)
|
||
7. packages/api/src/routes/review.ts → Approve-Endpunkt deaktivieren (Schritt 7)
|
||
8. npm run build -w packages/scraper → Build
|
||
9. npm run build -w packages/api → Build
|
||
10. sql/107-clear-pending-queue.sql → psql ausführen (ZULETZT — bereinigt Queue)
|
||
11. pm2 restart tip-scraper-daemon → Daemon neu starten
|
||
12. Enricher manuell triggern: POST /api/review/run-matcher
|
||
```
|
||
|
||
---
|
||
|
||
## Erwartetes Ergebnis nach Deployment
|
||
|
||
| Metrik | Vorher | Nachher |
|
||
|--------|--------|---------|
|
||
| Pending Queue | 13.374 | 0 |
|
||
| Confidence-Score | 0.0–1.0 (fuzzy) | entfällt |
|
||
| Match-Typ | probabilistisch | deterministisch |
|
||
| Manuelle Freigaben/Tag | ~50-200 | 0 |
|
||
| False-Positive-Rate | ~15% | ~0% |
|
||
| Transceivers mit Wellenlänge | ~30% | >85% (nach Enricher) |
|
||
|
||
---
|
||
|
||
## Nicht ändern (außerhalb dieses Tasks)
|
||
|
||
- `packages/scraper/src/scrapers/*` — Scraper-Logik bleibt unverändert
|
||
- `packages/api/src/routes/` (außer review.ts) — API-Endpunkte bleiben
|
||
- `sql/001-036` — Bestehende Migrationen nicht anfassen
|
||
- Alle `packages/core/src/` Typen — nur erweitern, nicht löschen
|
||
- PM2-Konfiguration — nicht anfassen
|