transceiver-db/sql/025-verification-quality-fix.sql
Rene Fichtmueller d9f5fc253f fix(verification): 100% Verified badge was far too generous
CORE PROBLEMS FIXED:
1. ATGBICS part_number = URL slug instead of the real OEM number
   extractOemPartNumber() strips the -r-compatible-transceiver-* suffix
   + trailing vendor names (nokia, cisco, juniper, ...)
   Result: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA

2. reach_label = '' (empty) was accepted for details_verified
   IS NOT NULL lets empty strings through → fix: AND reach_label != ''

3. details_verified = true despite garbled part_number
   New criteria: NOT ILIKE '%-compatible-transceiver%'
                 NOT ILIKE '%-r-compatible%'

4. data_confidence values in the function were wrong ('scraped_unverified' etc.)
   Real values: low/medium/high/garbage → NOT IN ('garbage','unknown')

RESULT after recompute_all_verification():
  fully_verified: 3,654 → 581 (badge was overstated roughly 6x)
  details_verified: inflated → 1,075 (correct)

ATGBICS scraper:
  - extractOemPartNumber() for collection and product detail pages
  - detectReach() now also runs on the URL slug (120km in the slug → reach_label)

Price anomaly detection:
  - API: price_anomaly field when the max/min ratio is ≥ 10x
  - Dashboard: ⚠ price anomaly banner with ratio + EUR range

SQL 025: part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00

-- Migration 025: Fix details_verified quality gate + repair garbled ATGBICS records
-- Problem: details_verified = true when:
-- 1. reach_label = '' (empty string passes IS NOT NULL)
-- 2. part_number contains 'compatible-transceiver' (URL slug stored as PN)
-- ─────────────────────────────────────────────────────────────────────────────
-- Step 1: Fix part_numbers that are ATGBICS URL slugs
-- Extract the real OEM part number: first strip the "-r-compatible..." or
-- "-compatible-transceiver..." suffix, then strip a trailing vendor name.
-- The order matters: the vendor name only becomes the trailing segment once
-- the suffix is gone, so a $-anchored vendor match on the raw slug never fires.
UPDATE transceivers
SET
  part_number = UPPER(
    REGEXP_REPLACE(
      REGEXP_REPLACE(
        part_number,
        '-(r-compatible|compatible)(-transceiver.*)?$',
        '',
        'i'
      ),
      '-(nokia|cisco|juniper|arista|huawei|hp|hpe|dell|extreme|brocade|mellanox|intel|broadcom|netgear|foundry|calix|ciena|adtran|palo|fortinet|alcatel|ericsson|nec|fujitsu|infinera|ribbon|hitachi|rad|zhone|ubiquiti|mikrotik|avaya|enterasys|allied|planet|zyxel|dlink)$',
      '',
      'i'
    )
  ),
  updated_at = NOW()
WHERE
  part_number ILIKE '%-r-compatible%'
  OR part_number ILIKE '%-compatible-transceiver%';
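-- Optional sanity check (an illustrative sketch, not part of the original
-- migration): compares both stripping orders on a single literal, without
-- touching any table. The slug tail "-example" and the shortened vendor list
-- are assumptions made for this example only.
SELECT
  pn AS raw_part_number,
  UPPER(REGEXP_REPLACE(
    REGEXP_REPLACE(pn, '-(r-compatible|compatible)(-transceiver.*)?$', '', 'i'),
    '-(nokia|cisco|juniper)$', '', 'i'
  )) AS suffix_then_vendor,
  UPPER(REGEXP_REPLACE(
    REGEXP_REPLACE(pn, '-(nokia|cisco|juniper)$', '', 'i'),
    '-(r-compatible|compatible)(-transceiver.*)?$', '', 'i'
  )) AS vendor_then_suffix
FROM (VALUES ('3he16564aa-nokia-r-compatible-transceiver-example')) AS t(pn);
-- suffix_then_vendor → 3HE16564AA
-- vendor_then_suffix → 3HE16564AA-NOKIA (the vendor name survives)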
-- Step 2: Extract reach_meters from reach_label where reach_meters = 0 but reach_label has data
UPDATE transceivers
SET
  reach_meters = CASE
    WHEN reach_label ILIKE '%km' THEN
      CAST(REGEXP_REPLACE(reach_label, '[^0-9]', '', 'g') AS INTEGER) * 1000
    WHEN reach_label ILIKE '%m' AND reach_label NOT ILIKE '%km' THEN
      CAST(REGEXP_REPLACE(reach_label, '[^0-9]', '', 'g') AS INTEGER)
    ELSE reach_meters
  END,
  updated_at = NOW()
WHERE reach_meters = 0
  AND reach_label IS NOT NULL
  AND reach_label != ''
  AND reach_label ~ '^\d+\s*(m|km)$';
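-- Illustrative check (a sketch, not part of the original migration): shows how
-- the Step 2 CASE expression maps a label to meters. The sample labels are
-- assumptions. The anchored filter is what keeps the digit-stripping safe:
-- a label such as '1.5km' does not match and is skipped rather than read as 15.
SELECT
  label,
  CASE
    WHEN label ILIKE '%km' THEN
      CAST(REGEXP_REPLACE(label, '[^0-9]', '', 'g') AS INTEGER) * 1000
    WHEN label ILIKE '%m' AND label NOT ILIKE '%km' THEN
      CAST(REGEXP_REPLACE(label, '[^0-9]', '', 'g') AS INTEGER)
  END AS reach_meters
FROM (VALUES ('120km'), ('10 km'), ('500m'), ('1.5km')) AS t(label)
WHERE label ~ '^\d+\s*(m|km)$';
-- '120km' → 120000, '10 km' → 10000, '500m' → 500; '1.5km' is filtered out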
-- Step 3: Also extract reach_label from slug where still missing
-- For records where slug contains NNkm pattern (e.g. scraped-3he16564aa-...-120km-...)
UPDATE transceivers
SET
  reach_label = (REGEXP_MATCH(slug, '(\d+km)'))[1],
  reach_meters = CAST((REGEXP_MATCH(slug, '(\d+)km'))[1] AS INTEGER) * 1000,
  updated_at = NOW()
WHERE
  (reach_label IS NULL OR reach_label = '')
  AND reach_meters = 0
  AND slug ~ '\d+km';
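-- Illustrative check (a sketch, not part of the original migration): what the
-- Step 3 expressions extract from a slug. The slug literal is hypothetical.
-- REGEXP_MATCH returns only the first match, so a slug with several NNkm
-- tokens would yield the leftmost one.
SELECT
  s AS slug,
  (REGEXP_MATCH(s, '(\d+km)'))[1] AS reach_label,
  CAST((REGEXP_MATCH(s, '(\d+)km'))[1] AS INTEGER) * 1000 AS reach_meters
FROM (VALUES ('scraped-3he16564aa-example-120km-example')) AS t(s)
WHERE s ~ '\d+km';
-- reach_label → 120km, reach_meters → 120000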
-- Step 4: Recompute all verification badges with the fixed criteria
-- (Updates details_verified, fully_verified for all affected transceivers)
SELECT recompute_all_verification();
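-- Optional post-check (a sketch, not part of the original migration). It
-- assumes details_verified is a boolean column on transceivers, which this
-- file does not show; adjust the query if the flag is stored elsewhere.
SELECT COUNT(*) AS still_wrongly_verified
FROM transceivers
WHERE details_verified
  AND (
    reach_label IS NULL
    OR reach_label = ''
    OR part_number ILIKE '%-compatible-transceiver%'
    OR part_number ILIKE '%-r-compatible%'
  );
-- A non-zero count would suggest the quality gate still lets bad rows through.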