4 Commits

Author SHA1 Message Date
Rene Fichtmueller
a8529d166b fix: resolve TS build errors — export backfillImages, add writeRobotExperience
- backfill-images.ts: rename main() → export backfillImages() to match index.ts import
- training-data-writer.ts: add writeRobotExperience export; remove hardcoded Gitea token
- fiber24.ts/fibermall.ts: scraper improvements from previous sessions
- image-downloader.ts/spec-updater.ts: utility updates
- robots/: add verification robots module
2026-05-06 23:39:00 +02:00
Rene Fichtmueller
d4ad9f4641 fix: ShopFiber24 sitemap-based scraping + Fibermall image extraction
ShopFiber24 (fiber24.ts):
- Complete rewrite: was using JS-rendered catalog (all prices = 0)
- New strategy: fetch sitemap_0.xml.gz → 310 product DE-URLs
- Each product page has Schema.org microdata: itemprop=price, sku, image
- Extracts: price (minPrice), SKU, image_url, name, specs
- Rate: 1 req/1.5s, no Playwright needed

FiberMall (fibermall.ts):
- Add imageUrl to Product interface
- Extract first fibermall.com/photo/*.jpg from product listing card
- Write image_url to transceivers table (has_image=true) on upsert
- SKU variants share parent product image
- 304 FiberMall transceivers will get images on next scraper run
2026-04-18 22:20:57 +02:00
Rene Fichtmueller
662cd1f90b fix(scraper): FiberMall URL schema + price parser + Flexoptix EUR comma bug
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
  currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)

Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
  (2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR

Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)

CHANGELOG: added 6 entries for 2026-04-12
2026-04-12 04:26:35 +02:00
Rene Fichtmueller
cdb8ef6e61 feat(scraper): add FiberMall/Vcelink/OpticsBay scrapers, fix QSFPTEK API migration
- New scrapers: fibermall.ts (WooCommerce), vcelink.ts (Shopify), opticsbay.ts (WooCommerce)
- QSFPTEK rewritten to use /mall/commodity/list API (old OpenCart /c/*.html paths gone 404)
  - New: attribute-based filtering by data rate (1G/10G/25G/40G/100G/200G/400G/800G)
  - Scrapes HTML fragments, extracts US$ prices and product URLs
- scheduler.ts: +3 queues/schedules/workers (fibermall, vcelink, opticsbay) → 61 total workers
- index-pi.ts: Pi fleet picks up all 3 new scrapers
2026-04-11 19:13:36 +02:00