272 Commits

Author SHA1 Message Date
Rene Fichtmueller
98b241f462 feat: implement Flexoptix reference matching overhaul
- sql/108: normalize form_factor across all vendors (SFP-Plus → SFP+, etc.)
  and round speed_gbps for consistent matching
- sql/109: document 30→90 day matcher window change
- robots/catalog-reconcile.ts: new bulk-reconcile robot — matches all
  Flexoptix products against all competitors without 30-day time limit
- scheduler.ts: register catalog:reconcile job (monthly + on-demand),
  fix nightly matcher 30→90 day window, UPPER() form_factor matching,
  ROUND() speed_gbps matching

Fixes: ATGBICS/NADDOD/10Gtek/ShopFiber24 had 0 Flexoptix equivalences
due to stale price_observations being filtered out. Expected coverage
improvement: 22% → 45-60% after first reconcile run.
2026-05-13 16:55:45 +02:00
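The matching change in 98b241f462 (upper-cased form_factor, rounded speed_gbps) can be sketched roughly as follows. This is a minimal illustration, not the contents of sql/108 or scheduler.ts: `normalizeFormFactor`, `matchKey`, and every alias entry except SFP-Plus → SFP+ are hypothetical.

```typescript
// Hypothetical alias table. Only "SFP-Plus -> SFP+" is named in the commit;
// the other entries are assumptions added for illustration.
const FORM_FACTOR_ALIASES: Record<string, string> = {
  "SFP-PLUS": "SFP+",
  "SFPPLUS": "SFP+",
  "SFP PLUS": "SFP+",
};

// Mirrors the UPPER() comparison: trim, upper-case, then collapse known aliases.
function normalizeFormFactor(raw: string): string {
  const upper = raw.trim().toUpperCase();
  return FORM_FACTOR_ALIASES[upper] ?? upper;
}

// Mirrors the ROUND() comparison: two products are match candidates
// when their normalized form factor and rounded speed agree.
function matchKey(formFactor: string, speedGbps: number): string {
  return `${normalizeFormFactor(formFactor)}|${Math.round(speedGbps)}`;
}
```

With this key, a vendor row stored as "sfp-plus" at 10.3125 Gbps and a Flexoptix row stored as "SFP+" at 10 Gbps land in the same bucket.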
Rene Fichtmueller
048bf0dcf2 feat: add Codex task for Flexoptix reference matching overhaul
CODEX-TASK-flexoptix-reference-matching.md — comprehensive plan to fix
zero-match gap for ATGBICS/NADDOD/10Gtek/ShopFiber24 (8,260+ products
with 0 Flexoptix equivalences).

Root cause: 30-day price_observation window excludes vendors whose
scrapers ran >30 days ago. Solution: catalog-reconcile robot (full
bulk match, no time limit), form_factor normalization (SQL 108),
30→90 day window fix in nightly matcher, on-demand API endpoint.

Expected: coverage from 22% → 45-60% after one reconcile run.
2026-05-13 16:51:53 +02:00
Rene Fichtmueller
a20094755d feat(scraper): Flexoptix REST API sync robot + scheduler integration
Replaces the GraphQL/search-based Flexoptix scraper with a proper
Magento 2 REST API integration that delivers authoritative SKUs,
prices, stock levels and compatibility data.

New files:
- packages/scraper/src/robots/flexoptix-api-sync.ts
  Self-contained robot: auth → paginated fetch → normalize → DB write.
  Reads FLEXOPTIX_API_BASE_URL / _USERNAME / _PASSWORD from env.
  Returns { fetched, normalized, skipped, priceWrites, stockWrites }.
  No file intermediary — in-memory pipeline.

- scripts/import-flexoptix-catalog.ts
  One-shot CLI importer for the Pulso-generated JSONL (Codex handover).

- docs/FLEXOPTIX_CATALOG_IMPORT.md
  Runbook for manual import + per-SKU specifications enrichment.

Scheduler changes:
- Added sync:flexoptix-catalog queue + work() handler
- Scheduled every 2h at 0 */2 * * * (same cadence as legacy job)
- scrape:pricing:flexoptix kept as legacy GraphQL fallback

Also includes Codex-generated additions from this sprint:
- audiocodes-oem scraper, seed-batch35/36/37, db.ts improvements,
  sql/102 verification reconcile, README + package.json updates
2026-05-13 16:36:33 +02:00
Rene Fichtmueller
122c4b8a81 fix: remove stock demo tab marker 2026-05-10 15:57:15 +02:00
Rene Fichtmueller
a0657ee565 fix: filter TIP hot topics quality 2026-05-10 15:54:38 +02:00
Rene Fichtmueller
e5917a2250 fix: show active TIP product scope 2026-05-10 15:46:41 +02:00
Rene Fichtmueller
58a2570842 fix: show TIP research status on overview 2026-05-10 15:01:22 +02:00
Rene Fichtmueller
5eb1b07183 fix: close stale TIP manual review queue 2026-05-10 10:23:07 +02:00
Rene Fichtmueller
cf0e471fa4 feat: close TIP research resolution states 2026-05-10 10:13:09 +02:00
Rene Fichtmueller
10af2ca244 fix: generated_by tag — v6-length-fix → v7 2026-05-10 09:55:39 +02:00
Rene Fichtmueller
0edc6e3f3a feat: Pi scraper fleet — fetch-only index-pi.ts + FS.COM/NADDOD via SOCKS5
- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets)
  added NADDOD (fetch-based, benefits from residential IP)
  now 32 fetch-only queues safe for ARM/Pi without Chromium
- index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik
  routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080
  Crawlee ProxyConfiguration automatically applies to Playwright crawler
- pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative)
- CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup
  covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js
- CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
2026-05-10 09:53:55 +02:00
Rene Fichtmueller
7e36236d2b fix: quarantine GAO catalog artifacts 2026-05-10 09:48:43 +02:00
Rene Fichtmueller
d691745c7b feat: clean TIP cable rows from active base 2026-05-10 09:41:59 +02:00
Rene Fichtmueller
2be61f2441 feat: close TIP retail price research states 2026-05-10 01:42:24 +02:00
Rene Fichtmueller
b58f7cee41 feat: resolve OEM price status and part details 2026-05-10 01:16:49 +02:00
Rene Fichtmueller
adb2661fac feat: add targeted product page asset verifier 2026-05-10 00:31:33 +02:00
Rene Fichtmueller
0d4bcb6924 fix: preserve explicit competitor states in reconcile 2026-05-10 00:17:26 +02:00
Rene Fichtmueller
635a102932 feat: close open competitor research states 2026-05-10 00:03:42 +02:00
Rene Fichtmueller
fb9db56617 fix: quarantine fs numeric sku aliases 2026-05-09 23:35:01 +02:00
Rene Fichtmueller
79a57a5ac6 feat: add no-valid competitor resolver 2026-05-09 23:16:04 +02:00
Rene Fichtmueller
650de6ba9a feat: add verification evidence state model 2026-05-09 23:06:21 +02:00
Rene Fichtmueller
1af4f090f7 fix: harden TIP verification cleanup 2026-05-09 22:16:29 +02:00
Rene Fichtmueller
a43e572946 fix: advance TIP product verification robots 2026-05-09 20:19:19 +02:00
Rene Fichtmueller
ec40a96ae0 feat: add vendor detail verifiers 2026-05-09 18:22:09 +02:00
Rene Fichtmueller
91a1c2282a fix: harden atgbics evidence parsing 2026-05-09 17:30:08 +02:00
Rene Fichtmueller
c2421c03a3 fix: harden shopfiber24 reach parsing 2026-05-09 17:24:06 +02:00
Rene Fichtmueller
bb9c495497 fix: verify qsfptek cable details 2026-05-09 17:03:35 +02:00
Rene Fichtmueller
fc18b00157 fix: verify copper cable semantics 2026-05-09 16:55:50 +02:00
Rene Fichtmueller
c25300199a fix: harden atgbics wavelength semantics 2026-05-09 16:41:18 +02:00
Rene Fichtmueller
61acccf5df fix: require strict comparable transceiver evidence 2026-05-09 16:02:49 +02:00
Rene Fichtmueller
b26696f0d1 fix: improve vendor verification and fscom 1.6t variants 2026-05-09 15:56:08 +02:00
Rene Fichtmueller
49f0871720 chore: ignore crawlee python build artifacts 2026-05-09 14:06:55 +02:00
Rene Fichtmueller
60531b6250 feat: add crawlee python worker integration 2026-05-09 14:06:34 +02:00
Rene Fichtmueller
3d79f6b8e0 fix: add fscom url discovery mode 2026-05-09 14:00:30 +02:00
Rene Fichtmueller
f64dbf7b6b fix: add fscom targeted detail verification mode 2026-05-09 11:15:36 +02:00
Rene Fichtmueller
549b4430df fix: enrich flexoptix detail verification 2026-05-09 09:36:28 +02:00
Rene Fichtmueller
5522bb2152 fix: refresh price verification timestamps 2026-05-09 08:13:39 +02:00
Rene Fichtmueller
43b7250180 fix: automate equivalence research review queue 2026-05-09 07:48:11 +02:00
Rene Fichtmueller
ef225c7dc5 fix: revalidate flexoptix fs prices and images 2026-05-09 05:13:37 +02:00
Rene Fichtmueller
57e20efe49 fix: NADDOD price extraction — read from LD+JSON offers.price
NADDOD uses LD+JSON for pricing (Astro/Shopify structure):
  {"offers":{"price":"731.00","priceCurrency":"USD",...}}

Old regex (/US$\s*.../) never matched → all 132 price obs were lucky
text matches, not systematic. Now: parse all ld+json blocks first,
fall back to regex.

Also broaden sitemap URL regex to capture new-style URLs without .html:
  /products/nvidia-networking/102612 (was being missed)
2026-05-06 23:55:55 +02:00
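The "ld+json first, regex fallback" order described in 57e20efe49 might look like the sketch below. The function names and the exact fallback pattern are assumptions (the commit elides the original regex); only the `offers.price` field comes from the commit message.

```typescript
// Assumed fallback pattern for visible "US$ 731.00"-style text.
const TEXT_PRICE = /US\$\s*(\d[\d,]*(?:\.\d+)?)/;

// Reads offers.price from one parsed LD+JSON object, if present and numeric.
function readOfferPrice(data: unknown): number | null {
  if (typeof data !== "object" || data === null) return null;
  const offers = (data as { offers?: unknown }).offers;
  if (typeof offers !== "object" || offers === null) return null;
  const raw = (offers as { price?: unknown }).price;
  const n = typeof raw === "string" || typeof raw === "number" ? Number(raw) : NaN;
  return Number.isFinite(n) ? n : null;
}

function extractUsdPrice(html: string): number | null {
  // Created per call so the global regex's lastIndex never leaks between calls.
  const blocks = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = blocks.exec(html)) !== null) {
    try {
      const price = readOfferPrice(JSON.parse(m[1] ?? ""));
      if (price !== null) return price;
    } catch {
      // Malformed JSON in this block: try the next one.
    }
  }
  // No usable structured price: fall back to the text pattern.
  const t = TEXT_PRICE.exec(html);
  return t ? Number((t[1] ?? "").replace(/,/g, "")) : null;
}
```

Trying the structured data first means a page is only subject to the fragile text match when no usable `offers.price` is present.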
Rene Fichtmueller
1a7c928120 fix: FS.COM price extraction — use .no_tax/.price CSS selectors
FS.com changed their HTML structure; compound class names are gone.
Current layout (verified 2026-05-06):
  <div class="no_tax">5,10 € ohne MwSt.</div>  ← B2B net price (preferred)
  <div class="price">6,07 €</div>               ← gross fallback
  <div class="standard_price">6,07 €</div>      ← gross fallback

Old selectors ([class*='price-value'] etc.) matched nothing → all prices
stored as €? null. New .no_tax first gives us the correct net/B2B price.
2026-05-06 23:45:30 +02:00
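The selector priority from 1a7c928120 (net price first, gross prices as fallback) together with German-format amount parsing could be sketched like this. DOM access is left out: the texts of `.no_tax`, `.price`, and `.standard_price` are assumed to be extracted already, and `FsPriceTexts`, `parseEuroAmount`, and `pickFsPrice` are hypothetical names.

```typescript
// Text content of the three elements named in the commit, if found on the page.
interface FsPriceTexts {
  noTax?: string;
  price?: string;
  standardPrice?: string;
}

// Parses "5,10 €" or "1.234,56 €" (dot as thousands separator, comma as decimal).
function parseEuroAmount(text: string): number | null {
  const m = /(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d{1,2}))?\s*€/.exec(text);
  if (!m) return null;
  const whole = (m[1] ?? "").replace(/\./g, "");
  const frac = m[2] ?? "0";
  return Number(`${whole}.${frac}`);
}

// Prefers the net (B2B) price; falls back to the gross prices in order.
function pickFsPrice(t: FsPriceTexts): { amount: number; net: boolean } | null {
  const candidates: Array<[string | undefined, boolean]> = [
    [t.noTax, true],
    [t.price, false],
    [t.standardPrice, false],
  ];
  for (const [text, net] of candidates) {
    const amount = text === undefined ? null : parseEuroAmount(text);
    if (amount !== null) return { amount, net };
  }
  return null;
}
```

Returning the `net` flag alongside the amount lets the caller record whether a stored price excludes VAT.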
Rene Fichtmueller
a1a525b332 chore: sync API routes, dashboard hot-topics, MCP server, scraper package, scripts 2026-05-06 23:39:04 +02:00
Rene Fichtmueller
a8529d166b fix: resolve TS build errors — export backfillImages, add writeRobotExperience
- backfill-images.ts: rename main() → export backfillImages() to match index.ts import
- training-data-writer.ts: add writeRobotExperience export; remove hardcoded Gitea token
- fiber24.ts/fibermall.ts: scraper improvements from previous sessions
- image-downloader.ts/spec-updater.ts: utility updates
- robots/: add verification robots module
2026-05-06 23:39:00 +02:00
Rene Fichtmueller
5a77fce9f3 feat: NADDOD cursor rotation — covers all 7300+ URLs in ~13 runs (26h)
Previously always sliced first 600 URLs from sitemap, missing 6700+ products.
Now stores offset in naddod-cursor.json, advances by 600 per run with wrap-around.
Full sitemap coverage in ~13 runs (26h). Also adds TIP_STORAGE_DIR env support.
2026-05-06 23:26:58 +02:00
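The cursor rotation in 5a77fce9f3 amounts to a sliding window over the sitemap URL list. The sketch below assumes the simplest wrap-around (reset to offset 0 once the end is reached) and omits the read/write of naddod-cursor.json; `takeWindow` and `runsForFullCoverage` are hypothetical names.

```typescript
interface CursorWindow {
  urls: string[];      // URLs to scrape in this run
  nextOffset: number;  // offset to persist for the next run
}

// Takes the next batch starting at the stored offset and computes the new offset.
function takeWindow(all: string[], offset: number, batch: number): CursorWindow {
  if (all.length === 0) return { urls: [], nextOffset: 0 };
  const start = offset % all.length; // guards against a stale offset after the sitemap shrinks
  const end = start + batch;
  const urls = all.slice(start, end);
  // Assumed wrap-around behavior: restart from the top once the end is reached.
  const nextOffset = end >= all.length ? 0 : end;
  return { urls, nextOffset };
}

// Number of runs needed to visit every URL once.
function runsForFullCoverage(total: number, batch: number): number {
  return Math.ceil(total / batch);
}
```

For 7300 URLs at 600 per run, `runsForFullCoverage(7300, 600)` gives 13, which at one run every 2 hours is the 26h quoted in the commit body.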
Rene Fichtmueller
efb0c24a19 feat: rewrite ATGBICS scraper to use Shopify products.json API
Static HTML collection pages return wrong results (all redirect to same 9 products).
Switch to /collections/{handle}/products.json?limit=250&page=N API which is:
- Reliable JSON (no HTML parsing)
- Correct per-collection product lists
- Clean pagination (stop at < limit results)
- Covers 11 key transceiver collections (1G, 10G, 25G, 40G, 100G, 400G)
2026-05-06 23:17:46 +02:00
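The pagination rule from efb0c24a19 (request products.json page by page, stop when a page returns fewer than `limit` items) can be sketched as below. The HTTP call is injected as `fetchPage` so the loop stays testable; that parameter, the `ShopifyProduct` shape, and the `maxPages` safety cap are assumptions, not part of the commit.

```typescript
// Minimal product shape for illustration; the real payload carries more fields.
interface ShopifyProduct {
  id: number;
  handle: string;
}

type PageFetcher = (url: string) => Promise<{ products: ShopifyProduct[] }>;

async function fetchCollection(
  baseUrl: string,
  handle: string,
  fetchPage: PageFetcher,
  limit = 250,
  maxPages = 100, // assumed safety cap against endless pagination
): Promise<ShopifyProduct[]> {
  const all: ShopifyProduct[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = `${baseUrl}/collections/${handle}/products.json?limit=${limit}&page=${page}`;
    const { products } = await fetchPage(url);
    all.push(...products);
    // A short page means this was the last one.
    if (products.length < limit) break;
  }
  return all;
}
```

In production `fetchPage` would wrap an HTTP GET plus JSON parsing; in tests it can be a stub that returns canned pages.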
Rene Fichtmueller
5c882c3a46 fix: refresh stale price observations after 7 days + fix ATGBICS pagination wrap-around
- upsertPriceObservation: insert new observation if last one is >7 days old,
  even when price (content_hash) hasn't changed — keeps timeseries data fresh
- ATGBICS: detect Shopify catalog wrap-around by tracking per-category seen URLs;
  stop pagination when all products on a page were already seen in a prior page
- ATGBICS: improve hasNextPage to match &page=N anchored in href params
2026-05-06 23:11:15 +02:00
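The refresh rule added to upsertPriceObservation in 5c882c3a46 reduces to one decision: insert when the content hash changed, or when the last observation is more than 7 days old. A minimal sketch of that decision, with the database access left out and `LastObservation` / `shouldInsertObservation` as hypothetical names:

```typescript
const STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface LastObservation {
  contentHash: string;
  observedAt: Date;
}

// Decides whether a new price_observations row should be written.
function shouldInsertObservation(
  last: LastObservation | null,
  newHash: string,
  now: Date,
): boolean {
  if (last === null) return true;                 // first observation for this product
  if (last.contentHash !== newHash) return true;  // price or content changed
  // Unchanged content: still write a fresh row once the last one is stale.
  return now.getTime() - last.observedAt.getTime() > STALE_AFTER_MS;
}
```

This keeps the time series populated for products whose price never changes, so they are not dropped by time-windowed queries.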
Rene Fichtmueller
199f36be48 fix(scraper): auto-create pg-boss queues before scheduling + worker/schedule order
- scheduler: patch boss.schedule() to call createQueue() first (idempotent),
  fixing FK constraint errors after DB reset — no need to touch 277 call sites
- index: registerWorkers() before registerSchedules() since boss.work() must
  register handlers before schedules fire
- dashboard: fix switchBlogLlm() to use api() helper (adds Bearer auth token)
  instead of raw fetch() which was returning 401 Unauthorized
2026-04-29 16:14:25 +02:00
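The scheduler patch in 199f36be48 wraps `schedule()` so that the queue is created first, which avoids editing each call site. The sketch below uses a minimal `BossLike` interface rather than the real pg-boss types, so the parameter shapes are assumptions; it relies on `createQueue()` being idempotent, as the commit states.

```typescript
// Reduced stand-in for the pg-boss instance; only the two methods used here.
interface BossLike {
  createQueue(name: string): Promise<void>;
  schedule(name: string, cron: string, data?: object, options?: object): Promise<void>;
}

// Replaces boss.schedule with a wrapper that ensures the queue exists first.
function patchSchedule(boss: BossLike): void {
  const original = boss.schedule.bind(boss);
  boss.schedule = async (name, cron, data, options) => {
    await boss.createQueue(name); // safe to repeat if creation is idempotent
    return original(name, cron, data, options);
  };
}
```

Patching once at startup keeps every existing `boss.schedule(...)` call unchanged while removing the dependency on queues surviving a database reset.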
Rene Fichtmueller
270bd12382 feat(dashboard): clickable LLM model selector — switch blog engine at runtime
- client.ts: BLOG_LLM_PROVIDER/OLLAMA_LLM_MODEL as mutable state (setLlmProvider/
  getLlmProvider). Reads blog-llm-settings.json on startup for persistence.
  All generate()/checkHealth()/chat() calls use dynamic provider() + llmModel()
  — no restart required for switches.

- blog.ts: POST /api/blog/llm/switch endpoint — validates provider, calls
  setLlmProvider(), writes settings file, returns previous+active state.

- index.html: all 4 model cards now clickable (cursor:pointer, hover fade).
  switchBlogLlm(provider, model) — disables cards during switch, shows
  green/red feedback toast, auto-refreshes status. SSH instruction removed.
2026-04-29 01:15:45 +02:00
Rene Fichtmueller
39a63e0401 fix(scheduler): vendor discovery crawlers daily 24/7 (not weekly) 2026-04-28 23:59:00 +02:00
Rene Fichtmueller
297dc46f2b feat(crawler-llm): intelligent vendor discovery pipeline + TIPLLM training data
- spec-validator.ts: physical plausibility checks (form factor↔speed matrix,
  wavelength↔fiber consistency, IEEE standard cross-check, reach limits).
  Outputs tier (high/medium/low/rejected) + confidence_delta for LLM scores.

- training-data-writer.ts: converts validated crawler extractions to SFT JSONL
  training pairs (spec_qa / crawl_reasoning / validation / discovery types).
  Auto-commits and pushes to Gitea tip-training-data repo in batches of 50.

- vendor-discovery-crawler.ts: PlaywrightCrawler pipeline — catalog URL →
  LLM extraction (scrapeWithLLM) → spec validation → DB persist +
  Gitea SFT training pairs. 8 vendor configs registered
  (Cisco/Juniper/Arista/FS.com/Flexoptix/Nokia/Huawei/II-VI).

- scheduler.ts: 8 weekly discover:vendor:* jobs added (Sun 20:00–Mon 10:00 UTC).
  Total registered jobs: 102.

- Gitea repo created: gitea.context-x.org/rene/tip-training-data
2026-04-28 23:46:34 +02:00