Rene Fichtmueller
db6b97186a
feat: OPN+spec equivalence matchers, 400G pricing, TIP_LLM training data
- Add OPN-based equivalence matcher robot (7,245 manufacturer-confirmed matches, confidence=1.0)
- Add spec-based equivalence matcher robot (683 matches, confidence=0.85)
- Matches by form_factor + speed_gbps + reach_tier + wavelength ±10nm
- Safety cap: skip FX products matching >30 competitors (too generic)
- Daily schedule: 04:30 UTC via pg-boss
- SQL migrations 116 (OPN) + 117 (spec) with tip_extract_wavelength_nm() + tip_reach_tier() helpers
- Fix tenGtek.ts: add 3 missing 400G categories (QSFP-DD, QSFP112) — closes pricing gap
- Generate tip-llm-pricing-v1.jsonl: 80 DB-grounded QA pairs (pricing, equivalences, 400G)
- Rebuild TIP_LLM training pool: 11,999 pairs (+127 vs prev), deployed to Erik
- FX product equivalence coverage: 88.1% (959/1089)
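A minimal TypeScript sketch of the spec-based rule and the >30 safety cap listed above. The row shape and the names `SpecRow`, `specMatches` and `matchWithCap` are assumptions for illustration. The real matcher lives in SQL migration 117 and is not shown in this log.

```typescript
// Hypothetical row shape; field names mirror the commit message, not the real schema.
interface SpecRow {
  sku: string;
  formFactor: string;
  speedGbps: number;
  reachTier: string; // assumed to be the output of tip_reach_tier()
  wavelengthNm: number;
}

const WAVELENGTH_TOLERANCE_NM = 10; // "wavelength ±10nm"
const MAX_COMPETITOR_MATCHES = 30; // safety cap from the commit message

function specMatches(fx: SpecRow, other: SpecRow): boolean {
  return (
    fx.formFactor === other.formFactor &&
    fx.speedGbps === other.speedGbps &&
    fx.reachTier === other.reachTier &&
    Math.abs(fx.wavelengthNm - other.wavelengthNm) <= WAVELENGTH_TOLERANCE_NM
  );
}

// Returns the matches for one FX product, or [] when it matches too many
// competitors to be a meaningful equivalence.
function matchWithCap(fx: SpecRow, competitors: SpecRow[]): SpecRow[] {
  const hits = competitors.filter((c) => specMatches(fx, c));
  return hits.length > MAX_COMPETITOR_MATCHES ? [] : hits;
}
```

Dropping the whole match set above the cap, rather than truncating it to 30, is one reading of "skip FX products matching >30 competitors".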
2026-05-13 21:33:19 +02:00
Rene Fichtmueller
2f85571784
feat: Flexoptix full product detail sync (sql/115 + detail-enricher robot)
Pulls complete per-SKU specifications and compatibility data from the
Flexoptix API (specifications=1&compatibilities=1) and writes structured
fields to the transceivers table for datasheet generation.
SQL migration 115:
- Adds fx_specifications JSONB (raw spec blob for datasheet gen)
- Adds fx_compatibilities JSONB (full OEM compatibility matrix)
- Adds compliance_code, laser_type, receiver_type, supported_protocols[]
- Adds extinction_ratio_db, cdr_support, inbuilt_fec, detail_synced_at
- GIN index on fx_compatibilities for vendor/OPN queries
flexoptix-detail-enricher.ts:
- Per-SKU API calls with rate-limiting (600ms/call, 100 SKUs/run)
- Parses all spec labels → structured fields (power, budget, tx/rx dBm,
modulation, wavelengths, temp range, DOM, laser type, receiver type)
- Strips :Sx variant suffixes before API queries (self-configure SKUs)
- COALESCE writes — never overwrites existing data, only fills gaps
- Tracks detail_synced_at, retries stale entries after 7 days
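The enricher rules above (suffix stripping, gap-only writes, 7-day retry) could look roughly like this in TypeScript. This is a sketch under assumptions: the exact shape of the `:Sx` suffix is guessed as `:S` followed by digits, and all function names are hypothetical.

```typescript
// Assumption: variant suffixes look like ":S1", ":S2", ... at the end of the SKU.
function stripVariantSuffix(sku: string): string {
  return sku.replace(/:S\d+$/i, "");
}

const STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// A product needs a detail sync if it was never synced or the sync is stale.
function needsDetailSync(detailSyncedAt: Date | null, now: Date): boolean {
  return (
    detailSyncedAt === null ||
    now.getTime() - detailSyncedAt.getTime() > STALE_AFTER_MS
  );
}

type Fields = Record<string, unknown>;

// COALESCE-style merge: existing non-null values win, incoming values only fill gaps.
function fillGaps(existing: Fields, incoming: Fields): Fields {
  const merged: Fields = { ...existing };
  for (const key of Object.keys(incoming)) {
    if (merged[key] === null || merged[key] === undefined) {
      merged[key] = incoming[key];
    }
  }
  return merged;
}
```

In SQL the same effect comes from `COALESCE(existing_column, new_value)` in the `UPDATE`, which is presumably what "COALESCE writes" refers to.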
flexoptix-api-sync.ts:
- Also stores image_url and product_page_url during bulk sync
scheduler.ts:
- Registers enrich:flexoptix-details daily at 03:00 UTC
Results after initial run:
- 791/968 FX products (81.7%) fully enriched
- 26.0 avg compatibility entries per product (OEM vendor + OPN)
- 25.7 avg spec fields per product
- DFB(483), EML(148), FP(72), VCSEL(44) laser type distribution
2026-05-13 18:49:28 +02:00
Rene Fichtmueller
d1bde66e39
feat: deterministic equivalence matcher + full wavelength/connector enrichment
Replace confidence-based matcher with deterministic 6-field exact match:
- form_factor (exact), speed_gbps (±0.1G), fiber_type (exact),
reach (±10%), wavelength_tx (±5nm), connector_type (exact)
- Complete products → confidence=1.0, never creates pending records
- Incomplete products → enhanced confidence ≥0.85, still auto_approved
- PENDING CREATED: 0 (by design, permanent)
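As an illustration of the 6-field rule above, here is a hedged TypeScript sketch. The `Product` shape and function names are assumptions, and the ±10% reach tolerance is taken relative to the first product, which the commit does not specify. The handling of incomplete products (confidence ≥0.85) is not covered.

```typescript
// Hypothetical product shape; null marks a field that enrichment has not filled yet.
interface Product {
  formFactor: string | null;
  speedGbps: number | null;
  fiberType: string | null;
  reachM: number | null;
  wavelengthTxNm: number | null;
  connectorType: string | null;
}

type CompleteProduct = { [K in keyof Product]: NonNullable<Product[K]> };

function isComplete(p: Product): p is CompleteProduct {
  return (
    p.formFactor !== null && p.speedGbps !== null && p.fiberType !== null &&
    p.reachM !== null && p.wavelengthTxNm !== null && p.connectorType !== null
  );
}

// True only when both products are complete and all six fields agree
// within the tolerances named in the commit message.
function deterministicMatch(a: Product, b: Product): boolean {
  if (!isComplete(a) || !isComplete(b)) return false;
  return (
    a.formFactor === b.formFactor &&
    Math.abs(a.speedGbps - b.speedGbps) <= 0.1 &&
    a.fiberType === b.fiberType &&
    Math.abs(a.reachM - b.reachM) <= 0.1 * a.reachM && // assumed: relative to a
    Math.abs(a.wavelengthTxNm - b.wavelengthTxNm) <= 5 &&
    a.connectorType === b.connectorType
  );
}
```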
Migrations:
- sql/113: Connector type inference from IEEE lookup + form-factor rules
(970→479 missing connector for FX products)
- sql/114: Extend IEEE lookup with 400G/800G/1.6T OSFP/QSFP-DD standards,
wavelength fallback (SMF→1310nm, MMF→850nm), clear pending queue to 0
Enrichment results (before→after):
- FX fully complete: 50 → 555 / 1,089 (+505)
- Total fully complete: ~3,600 → 15,431 / 18,133 (+11,800)
- FX coverage: 54.7% → 55.8% (608/1,089 matched)
- Deterministic matches: 0 → 44,596 (confidence=1.0)
- Wavelength-mismatched records rejected: 521
- Pending queue: 42 → 0 (permanent)
New match stats:
- 55,743 new deterministic auto_approved matches
- 521 legacy wavelength-mismatch records rejected
- Total active: 53,447 auto_approved + 1,987 approved
2026-05-13 17:59:08 +02:00
Rene Fichtmueller
9979b79434
feat: wavelength/connector enrichment schema + enricher robot
- sql/110: add wavelength_tx_nm, wavelength_rx_nm, connector_type,
data_completeness, enrichment_needed columns + trigger
- sql/111: IEEE/MSA standards wavelength lookup table (SFP→OSFP)
- sql/112: migrate existing wavelengths TEXT → integer columns
- robots/wavelength-enricher.ts: fills missing wavelengths from IEEE
lookup (deterministic) then product-name regex, runs every 4h
- scheduler: register enrich:wavelength job (4h schedule)
Fixes over-broad matching where 1G SFPs match 500+ competitors
due to missing wavelength discrimination.
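The two-stage fill order (standards lookup first, product-name regex second) can be sketched as follows. The lookup table here is a three-entry illustrative subset, not the contents of sql/111, and the function names and the regex are assumptions.

```typescript
// Illustrative subset only; the real lookup table (sql/111) is not shown in this log.
const IEEE_WAVELENGTH_NM: Record<string, number> = {
  "10GBASE-SR": 850,
  "10GBASE-LR": 1310,
  "10GBASE-ER": 1550,
};

// Fallback: pull something like "1310nm" or "850 nm" out of the product name.
function wavelengthFromName(name: string): number | null {
  const m = /\b(\d{3,4})\s*nm\b/i.exec(name);
  return m ? Number(m[1]) : null;
}

// Deterministic lookup first, regex on the product name second.
function resolveWavelength(standard: string | null, productName: string): number | null {
  if (standard !== null) {
    const hit = IEEE_WAVELENGTH_NM[standard.toUpperCase()];
    if (hit !== undefined) return hit;
  }
  return wavelengthFromName(productName);
}
```

Preferring the lookup keeps the result stable even when a vendor's product title carries a misleading number.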
2026-05-13 17:35:42 +02:00
Rene Fichtmueller
1edd6c20a8
fix: use COUNT(*) instead of COUNT(DISTINCT po.id) in catalog-reconcile
price_observations table has no id column — replace with COUNT(*)
to avoid SQL error 42703.
2026-05-13 16:59:49 +02:00
Rene Fichtmueller
98b241f462
feat: implement Flexoptix reference matching overhaul
- sql/108: normalize form_factor across all vendors (SFP-Plus → SFP+, etc.)
and round speed_gbps for consistent matching
- sql/109: document 30→90 day matcher window change
- robots/catalog-reconcile.ts: new bulk-reconcile robot — matches all
Flexoptix products against all competitors without 30-day time limit
- scheduler.ts: register catalog:reconcile job (monthly + on-demand),
fix nightly matcher 30→90 day window, UPPER() form_factor matching,
ROUND() speed_gbps matching
Fixes: ATGBICS/NADDOD/10Gtek/ShopFiber24 had 0 Flexoptix equivalences
due to stale price_observations being filtered out. Expected coverage
improvement: 22% → 45-60% after first reconcile run.
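A small TypeScript sketch of the normalization idea (uppercase form factor, alias mapping, rounded speed). Only the `SFP-Plus → SFP+` alias is stated in the commit. The other alias and all names below are assumptions.

```typescript
const FORM_FACTOR_ALIASES: Record<string, string> = {
  "SFP-PLUS": "SFP+", // stated in the commit message
  "QSFP-PLUS": "QSFP+", // assumed, by analogy
};

// Mirrors UPPER() plus the alias normalization from sql/108.
function normalizeFormFactor(raw: string): string {
  const upper = raw.trim().toUpperCase();
  return FORM_FACTOR_ALIASES[upper] ?? upper;
}

// Join key for matching: normalized form factor plus ROUND()ed speed.
function matchKey(formFactor: string, speedGbps: number): string {
  return `${normalizeFormFactor(formFactor)}|${Math.round(speedGbps)}`;
}
```

With this key, a vendor row stored as `SFP-Plus` at 10.3125 Gbps lands in the same bucket as a Flexoptix row stored as `SFP+` at 10 Gbps.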
2026-05-13 16:55:45 +02:00
Rene Fichtmueller
a20094755d
feat(scraper): Flexoptix REST API sync robot + scheduler integration
Replaces the GraphQL/search-based Flexoptix scraper with a proper
Magento 2 REST API integration that delivers authoritative SKUs,
prices, stock levels and compatibility data.
New files:
- packages/scraper/src/robots/flexoptix-api-sync.ts
Self-contained robot: auth → paginated fetch → normalize → DB write.
Reads FLEXOPTIX_API_BASE_URL / _USERNAME / _PASSWORD from env.
Returns { fetched, normalized, skipped, priceWrites, stockWrites }.
No file intermediary — in-memory pipeline.
- scripts/import-flexoptix-catalog.ts
One-shot CLI importer for the Pulso-generated JSONL (Codex handover).
- docs/FLEXOPTIX_CATALOG_IMPORT.md
Runbook for manual import + per-SKU specifications enrichment.
Scheduler changes:
- Added sync:flexoptix-catalog queue + work() handler
- Scheduled every 2h at 0 */2 * * * (same cadence as legacy job)
- scrape:pricing:flexoptix kept as legacy GraphQL fallback
Also includes Codex-generated additions from this sprint:
- audiocodes-oem scraper, seed-batch35/36/37, db.ts improvements,
sql/102 verification reconcile, README + package.json updates
2026-05-13 16:36:33 +02:00
Rene Fichtmueller
5eb1b07183
fix: close stale TIP manual review queue
2026-05-10 10:23:07 +02:00
Rene Fichtmueller
cf0e471fa4
feat: close TIP research resolution states
2026-05-10 10:13:09 +02:00
Rene Fichtmueller
0edc6e3f3a
feat: Pi scraper fleet — fetch-only index-pi.ts + FS.COM/NADDOD via SOCKS5
- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets);
added NADDOD (fetch-based, benefits from residential IP);
now 32 fetch-only queues, safe for ARM/Pi without Chromium
- index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik;
routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080;
Crawlee ProxyConfiguration automatically applies to Playwright crawler
- pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative)
- CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup
covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js
- CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
2026-05-10 09:53:55 +02:00
Rene Fichtmueller
7e36236d2b
fix: quarantine GAO catalog artifacts
2026-05-10 09:48:43 +02:00
Rene Fichtmueller
d691745c7b
feat: clean TIP cable rows from active base
2026-05-10 09:41:59 +02:00
Rene Fichtmueller
2be61f2441
feat: close TIP retail price research states
2026-05-10 01:42:24 +02:00
Rene Fichtmueller
b58f7cee41
feat: resolve OEM price status and part details
2026-05-10 01:16:49 +02:00
Rene Fichtmueller
adb2661fac
feat: add targeted product page asset verifier
2026-05-10 00:31:33 +02:00
Rene Fichtmueller
0d4bcb6924
fix: preserve explicit competitor states in reconcile
2026-05-10 00:17:26 +02:00
Rene Fichtmueller
635a102932
feat: close open competitor research states
2026-05-10 00:03:42 +02:00
Rene Fichtmueller
fb9db56617
fix: quarantine fs numeric sku aliases
2026-05-09 23:35:01 +02:00
Rene Fichtmueller
79a57a5ac6
feat: add no-valid competitor resolver
2026-05-09 23:16:04 +02:00
Rene Fichtmueller
650de6ba9a
feat: add verification evidence state model
2026-05-09 23:06:21 +02:00
Rene Fichtmueller
1af4f090f7
fix: harden TIP verification cleanup
2026-05-09 22:16:29 +02:00
Rene Fichtmueller
a43e572946
fix: advance TIP product verification robots
2026-05-09 20:19:19 +02:00
Rene Fichtmueller
ec40a96ae0
feat: add vendor detail verifiers
2026-05-09 18:22:09 +02:00
Rene Fichtmueller
91a1c2282a
fix: harden atgbics evidence parsing
2026-05-09 17:30:08 +02:00
Rene Fichtmueller
c2421c03a3
fix: harden shopfiber24 reach parsing
2026-05-09 17:24:06 +02:00
Rene Fichtmueller
bb9c495497
fix: verify qsfptek cable details
2026-05-09 17:03:35 +02:00
Rene Fichtmueller
fc18b00157
fix: verify copper cable semantics
2026-05-09 16:55:50 +02:00
Rene Fichtmueller
c25300199a
fix: harden atgbics wavelength semantics
2026-05-09 16:41:18 +02:00
Rene Fichtmueller
b26696f0d1
fix: improve vendor verification and fscom 1.6t variants
2026-05-09 15:56:08 +02:00
Rene Fichtmueller
60531b6250
feat: add crawlee python worker integration
2026-05-09 14:06:34 +02:00
Rene Fichtmueller
3d79f6b8e0
fix: add fscom url discovery mode
2026-05-09 14:00:30 +02:00
Rene Fichtmueller
f64dbf7b6b
fix: add fscom targeted detail verification mode
2026-05-09 11:15:36 +02:00
Rene Fichtmueller
549b4430df
fix: enrich flexoptix detail verification
2026-05-09 09:36:28 +02:00
Rene Fichtmueller
5522bb2152
fix: refresh price verification timestamps
2026-05-09 08:13:39 +02:00
Rene Fichtmueller
43b7250180
fix: automate equivalence research review queue
2026-05-09 07:48:11 +02:00
Rene Fichtmueller
ef225c7dc5
fix: revalidate flexoptix fs prices and images
2026-05-09 05:13:37 +02:00
Rene Fichtmueller
57e20efe49
fix: NADDOD price extraction — read from LD+JSON offers.price
NADDOD uses LD+JSON for pricing (Astro/Shopify structure):
{"offers":{"price":"731.00","priceCurrency":"USD",...}}
The old regex (/US$\s*.../) did not match reliably, so all 132 price obs were lucky
text matches, not systematic. Now: parse all ld+json blocks first, then
fall back to the regex.
Also broaden sitemap URL regex to capture new-style URLs without .html:
/products/nvidia-networking/102612 (was being missed)
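The "LD+JSON first, regex second" order could be sketched like this. The function names and the fallback pattern are assumptions (the original regex is elided in this log), so treat this as an illustration rather than the scraper's actual code.

```typescript
// Scan every <script type="application/ld+json"> block and return the first
// usable offers.price value.
function priceFromLdJson(html: string): number | null {
  const blockRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = blockRe.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]);
      const raw: unknown = data?.offers?.price;
      if ((typeof raw === "string" && raw.trim() !== "") || typeof raw === "number") {
        const price = Number(raw);
        if (Number.isFinite(price)) return price;
      }
    } catch {
      // Malformed JSON block: skip it and try the next one.
    }
  }
  return null;
}

// Hypothetical text fallback; note the escaped "$" in the pattern.
function priceFromText(html: string): number | null {
  const m = /US\$\s*([\d,]+(?:\.\d+)?)/.exec(html);
  return m ? Number(m[1].replace(/,/g, "")) : null;
}

function extractPrice(html: string): number | null {
  return priceFromLdJson(html) ?? priceFromText(html);
}
```

In a JavaScript regex an unescaped `$` is an end-of-input anchor, which is one plausible reason a pattern written as `/US$\s*.../` would fail to match price text.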
2026-05-06 23:55:55 +02:00
Rene Fichtmueller
1a7c928120
fix: FS.COM price extraction — use .no_tax/.price CSS selectors
FS.com changed their HTML structure; compound class names are gone.
Current layout (verified 2026-05-06):
<div class="no_tax">5,10 € ohne MwSt.</div> ← B2B net price (preferred)
<div class="price">6,07 €</div> ← gross fallback
<div class="standard_price">6,07 €</div> ← gross fallback
Old selectors ([class*='price-value'] etc.) matched nothing → all prices
stored as €? null. New .no_tax first gives us the correct net/B2B price.
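A hedged sketch of the selector priority and the German-style price parsing implied above. `textBySelector` stands in for whatever DOM lookup the scraper really uses, and both function names are hypothetical.

```typescript
// Selector priority from the commit message: net price first, then gross fallbacks.
const PRICE_SELECTORS = [".no_tax", ".price", ".standard_price"] as const;

// Parses "5,10 € ohne MwSt." or "1.234,56 €" into a number.
// Assumes German formatting: "." as thousands separator, "," as decimal separator.
function parseEuroPrice(text: string): number | null {
  const m = /(\d{1,3}(?:\.\d{3})*|\d+),(\d{2})/.exec(text);
  if (!m) return null;
  return Number(`${m[1].replace(/\./g, "")}.${m[2]}`);
}

// Returns the first parsable price in selector priority order.
function pickPrice(textBySelector: Record<string, string | undefined>): number | null {
  for (const selector of PRICE_SELECTORS) {
    const text = textBySelector[selector];
    if (text !== undefined) {
      const price = parseEuroPrice(text);
      if (price !== null) return price;
    }
  }
  return null;
}
```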
2026-05-06 23:45:30 +02:00
Rene Fichtmueller
a1a525b332
chore: sync API routes, dashboard hot-topics, MCP server, scraper package, scripts
2026-05-06 23:39:04 +02:00
Rene Fichtmueller
a8529d166b
fix: resolve TS build errors — export backfillImages, add writeRobotExperience
- backfill-images.ts: rename main() → export backfillImages() to match index.ts import
- training-data-writer.ts: add writeRobotExperience export; remove hardcoded Gitea token
- fiber24.ts/fibermall.ts: scraper improvements from previous sessions
- image-downloader.ts/spec-updater.ts: utility updates
- robots/: add verification robots module
2026-05-06 23:39:00 +02:00
Rene Fichtmueller
5a77fce9f3
feat: NADDOD cursor rotation, covers all 7300+ URLs across ~13 runs (26h)
Previously always sliced first 600 URLs from sitemap, missing 6700+ products.
Now stores offset in naddod-cursor.json, advances by 600 per run with wrap-around.
Full sitemap coverage in ~13 runs (26h). Also adds TIP_STORAGE_DIR env support.
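The rotation itself is simple enough to sketch. This version keeps the cursor in memory, whereas the commit persists it in naddod-cursor.json. `CursorState` and `nextBatch` are hypothetical names, and resetting to 0 at the end is one possible reading of "wrap-around".

```typescript
interface CursorState {
  offset: number;
}

// Returns the next slice of URLs and the cursor to persist for the following run.
function nextBatch<T>(
  urls: T[],
  state: CursorState,
  batchSize = 600,
): { batch: T[]; next: CursorState } {
  if (urls.length === 0) return { batch: [], next: { offset: 0 } };
  // The modulo guards against a sitemap that shrank since the last run.
  const start = state.offset % urls.length;
  const end = start + batchSize;
  const batch = urls.slice(start, end);
  // Wrap to the beginning once the end of the list has been reached.
  const next = { offset: end >= urls.length ? 0 : end };
  return { batch, next };
}
```

With 7,300 URLs and 600 per run, the last slice starts at offset 7,200, so a full pass takes 13 runs, which matches the "~13 runs" figure in the commit body.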
2026-05-06 23:26:58 +02:00
Rene Fichtmueller
efb0c24a19
feat: rewrite ATGBICS scraper to use Shopify products.json API
Static HTML collection pages return wrong results (all redirect to same 9 products).
Switch to /collections/{handle}/products.json?limit=250&page=N API which is:
- Reliable JSON (no HTML parsing)
- Correct per-collection product lists
- Clean pagination (stop at < limit results)
- Covers 11 key transceiver collections (1G, 10G, 25G, 40G, 100G, 400G)
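The "stop at < limit results" pagination can be sketched with an injected fetch function, so the example runs without network access. `fetchCollection`, `FetchPage` and the `maxPages` guard are assumptions. Only the URL pattern comes from the commit message.

```typescript
interface ShopifyProduct {
  handle: string;
  title: string;
}

// Injected so the sketch is testable; a real scraper would wrap fetch() here.
type FetchPage = (url: string) => Promise<{ products: ShopifyProduct[] }>;

async function fetchCollection(
  baseUrl: string,
  handle: string,
  fetchPage: FetchPage,
  limit = 250,
  maxPages = 50, // assumed safety guard against endless pagination
): Promise<ShopifyProduct[]> {
  const all: ShopifyProduct[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = `${baseUrl}/collections/${handle}/products.json?limit=${limit}&page=${page}`;
    const { products } = await fetchPage(url);
    all.push(...products);
    if (products.length < limit) break; // a short page marks the end
  }
  return all;
}
```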
2026-05-06 23:17:46 +02:00
Rene Fichtmueller
5c882c3a46
fix: refresh stale price observations after 7 days + fix ATGBICS pagination wrap-around
- upsertPriceObservation: insert new observation if last one is >7 days old,
even when price (content_hash) hasn't changed — keeps timeseries data fresh
- ATGBICS: detect Shopify catalog wrap-around by tracking per-category seen URLs;
stop pagination when all products on a page were already seen in a prior page
- ATGBICS: improve hasNextPage to match &page=N anchored in href params
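Both decisions above are small predicates, sketched here in TypeScript. The names `LastObservation`, `shouldInsertObservation` and `isWrapAroundPage` are hypothetical, as is the exact shape of the stored observation.

```typescript
interface LastObservation {
  contentHash: string;
  observedAt: Date;
}

const MAX_OBSERVATION_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Insert when there is no prior observation, the content changed,
// or the last observation is older than 7 days.
function shouldInsertObservation(
  last: LastObservation | null,
  newHash: string,
  now: Date,
): boolean {
  if (last === null) return true;
  if (last.contentHash !== newHash) return true;
  return now.getTime() - last.observedAt.getTime() > MAX_OBSERVATION_AGE_MS;
}

// A page whose product URLs were all seen on earlier pages of the same
// category indicates the catalog has wrapped around.
function isWrapAroundPage(pageUrls: string[], seenInCategory: Set<string>): boolean {
  return pageUrls.length > 0 && pageUrls.every((url) => seenInCategory.has(url));
}
```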
2026-05-06 23:11:15 +02:00
Rene Fichtmueller
199f36be48
fix(scraper): auto-create pg-boss queues before scheduling + worker/schedule order
- scheduler: patch boss.schedule() to call createQueue() first (idempotent),
fixing FK constraint errors after DB reset — no need to touch 277 call sites
- index: registerWorkers() before registerSchedules() since boss.work() must
register handlers before schedules fire
- dashboard: fix switchBlogLlm() to use api() helper (adds Bearer auth token)
instead of raw fetch() which was returning 401 Unauthorized
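The scheduler patch can be illustrated against a minimal structural type instead of the real pg-boss class. `BossLike` and `patchSchedule` are hypothetical names, and the method signatures are simplified assumptions.

```typescript
// Minimal structural type: only the two methods this sketch needs.
interface BossLike {
  createQueue(name: string): Promise<void>;
  schedule(name: string, cron: string, data?: object): Promise<void>;
}

// Wraps schedule() so the queue is created first; call sites stay unchanged.
function patchSchedule(boss: BossLike): BossLike {
  const originalSchedule = boss.schedule.bind(boss);
  boss.schedule = async (name, cron, data) => {
    await boss.createQueue(name); // described as idempotent in the commit message
    return originalSchedule(name, cron, data);
  };
  return boss;
}
```

Patching the method once at startup is what lets the existing call sites remain untouched, at the cost of one extra round trip per schedule registration.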
2026-04-29 16:14:25 +02:00
Rene Fichtmueller
39a63e0401
fix(scheduler): vendor discovery crawlers daily 24/7 (not weekly)
2026-04-28 23:59:00 +02:00
Rene Fichtmueller
297dc46f2b
feat(crawler-llm): intelligent vendor discovery pipeline + TIPLLM training data
- spec-validator.ts: physical plausibility checks (form factor↔speed matrix,
wavelength↔fiber consistency, IEEE standard cross-check, reach limits).
Outputs tier (high/medium/low/rejected) + confidence_delta for LLM scores.
- training-data-writer.ts: converts validated crawler extractions to SFT JSONL
training pairs (spec_qa / crawl_reasoning / validation / discovery types).
Auto-commits and pushes to Gitea tip-training-data repo in batches of 50.
- vendor-discovery-crawler.ts: PlaywrightCrawler pipeline — catalog URL →
LLM extraction (scrapeWithLLM) → spec validation → DB persist +
Gitea SFT training pairs. 8 vendor configs registered
(Cisco/Juniper/Arista/FS.com/Flexoptix/Nokia/Huawei/II-VI).
- scheduler.ts: 8 weekly discover:vendor:* jobs added (Sun 20:00–Mon 10:00 UTC).
Total registered jobs: 102.
- Gitea repo created: gitea.context-x.org/rene/tip-training-data
2026-04-28 23:46:34 +02:00
Rene Fichtmueller
2466cc5d82
feat(scraper): batch 37 OEM seeds — Extreme (Legacy), Nortel, 3Com, Avaya
Added 4 legacy OEM transceiver catalog seed scrapers (72 PIDs total):
- extreme-legacy-oem.ts: 18 PIDs — Summit/BlackDiamond 10052H/10318/10325 family, Legacy
- nortel-legacy-oem.ts: 18 PIDs — Passport/BayStack AA1419xxx + XFP, incl. GBIC, Legacy
- 3com-legacy-oem.ts: 18 PIDs — Switch 5500/7750 3C17770/3CSFP9x + XFP/GBIC, Legacy
- avaya-legacy-oem.ts: 18 PIDs — ERS/VSP AA1419xxx + 700480xxx QSFP28, Legacy
Scheduler: wired at 05:30/05:45/06:00/06:15 UTC. All 72 PIDs seeded clean.
2026-04-28 23:31:13 +02:00
Rene Fichtmueller
e684d3d1c3
feat(scraper): batch 36 OEM seeds — EnGenius, Palo Alto Networks, Brocade, Foundry Networks
Added 4 new OEM transceiver catalog seed scrapers (72 PIDs total):
- engenius-oem.ts: 18 PIDs — ECS switch series 1G–100G SFP/SFP+/SFP28/QSFP28 + DAC/AOC
- paloalto-networks-oem.ts: 18 PIDs — PA-3200/5200/7000/5450 NGFW SFP/SFP+/SFP28/QSFP28 + DAC
- brocade-legacy-oem.ts: 18 PIDs — ICX/FCX/VDX/MLX E1MG/10G-SFPP family, market_status=Legacy
- foundry-networks-oem.ts: 18 PIDs — FastIron/NetIron FDR- series incl. XFP, market_status=Legacy
Scheduler: wired at 04:30/04:45/05:00/05:15 UTC. All 72 PIDs seeded clean.
2026-04-28 23:28:10 +02:00
Rene Fichtmueller
e9b8cb95db
feat(scraper): batch 35 OEM seeds — Sierra Wireless, Senao, EMCORE, Reflex Photonics
Added 4 new OEM transceiver catalog seed scrapers (75 PIDs total):
- sierra-wireless-oem.ts: 18 PIDs — RV55/RV50X/LX60 SFP/SFP+/QSFP+ incl. Industrial -40~85°C
- senao-oem.ts: 20 PIDs — EnGenius ECS switches 1G–100G SFP/SFP+/SFP28/QSFP28 + DAC
- emcore-oem.ts: 20 PIDs — ORION coherent ZR/ZR+/CFP2-DCO 400G + MIL-grade avionics
- reflex-photonics-oem.ts: 17 PIDs — LightABLE MIL-STD-810H + RAD-HARD space-grade
Scheduler: wired at 03:30/03:45/04:00/04:15 UTC. All 75 PIDs seeded to TIP DB.
2026-04-28 23:24:53 +02:00
Rene Fichtmueller
32d3ded169
feat: add Finisar, Acacia, Inphi OEM scrapers (batch 34)
- finisar-oem: 17 PIDs (FTLX/FTLC historical BoM series, 1G-100G, widely referenced)
- acacia-oem: 14 PIDs (AC400/AC1200 coherent CFP2-DCO/QSFP-DD/OSFP up to 1.2T)
- inphi-oem: 13 PIDs (ColorZ/COLORZ-II DWDM QSFP28/QSFP-DD + 800G OSFP)
- scheduler: wired all 3 at 02:45/03:00/03:15 UTC
2026-04-28 23:14:06 +02:00