transceiver-db/sync/history/2026-05-09-flexoptix-fs-price-image-revalidation.md
2026-05-09 05:13:37 +02:00

# 2026-05-09 Flexoptix + FS.com Price/Image Revalidation
## Request
Rene reported that many TIP prices, especially Flexoptix prices, were wrong and asked for all Flexoptix and FS.com prices to be fully revalidated and images checked.
Standing constraints were preserved:
- TIP crawler/robot planning and extraction feedback stays TIPLLM-only.
- No external AI was used for crawler planning or extraction feedback.
- Erik must not be overloaded.
- Robot/crawler experiences must be written into the Gitea-backed TIPLLM training pool.
- Work status must be written back to `sync/`.
## Root Cause
Two concrete issues were found:
1. `upsertPriceObservation` set `transceivers.price_verified` on the product, but the price rows it inserted left `price_observations.is_verified` and `verified_at` unset.
2. FS.com image extraction still used older selectors. Current FS.com product pages expose product images under `.big_img_box`, `img.big_img`, `.big_img_m_active`, `.big_img_m`, and `.small_img_active`, usually from `resource.fs.com/mall/mainImg/...`.
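
To illustrate the first issue, the following is a minimal sketch of stamping verification on the observation row itself rather than only on the parent transceiver. The row shape, the helper names `buildVerifiedObservation` and `shouldBackfillVerified`, and the 24-hour freshness window are assumptions for illustration; only `is_verified` and `verified_at` are named in this log, and the real `upsertPriceObservation` is not shown here.

```typescript
// Hypothetical row shape: only is_verified and verified_at come from this log.
interface PriceObservationRow {
  transceiver_id: number;
  price: number;
  currency: string;
  observed_at: Date;
  is_verified: boolean;
  verified_at: Date | null;
}

// Assumed freshness window; the real threshold is not stated in the log.
const FRESH_WINDOW_MS = 24 * 60 * 60 * 1000;

// Build a new observation that carries its own verification stamp.
function buildVerifiedObservation(
  transceiverId: number,
  price: number,
  currency: string,
  now: Date = new Date(),
): PriceObservationRow {
  return {
    transceiver_id: transceiverId,
    price,
    currency,
    observed_at: now,
    is_verified: true,
    verified_at: now,
  };
}

// Decide whether an existing, unchanged, still-fresh observation
// should be backfilled to verified instead of inserting a new row.
function shouldBackfillVerified(
  existing: PriceObservationRow,
  scrapedPrice: number,
  now: Date,
): boolean {
  const ageMs = now.getTime() - existing.observed_at.getTime();
  return (
    !existing.is_verified &&
    existing.price === scrapedPrice &&
    ageMs >= 0 &&
    ageMs <= FRESH_WINDOW_MS
  );
}
```

Keeping the decision in pure functions like these would make the verified/unverified behavior testable without a database connection.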
## Code Changed
- `packages/scraper/src/utils/db.ts`
  - Price observations now set `is_verified = true` and `verified_at` for new observations.
  - Fresh unchanged observations are backfilled to verified.
  - `price_verified_at` is maintained.
  - Image verification now refreshes `image_verified_at`, `image_verified_url`, and `image_scraped_at`.
  - `markImageVerified` is now called for existing transceivers whenever a scraper provides an image URL.
- `packages/scraper/src/scrapers/fs-com.ts`
  - Added `TIP_FORCE_REVALIDATE`.
  - Added `FS_MAX_DETAIL_PAGES_PER_RUN`.
  - Added `FS_ONLY_MISSING_IMAGES`.
  - Added URL normalization for FS.com product URLs.
  - Updated image extraction to prefer current product image DOM and reject default/logo/general/icon/SVG URLs.
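
The URL normalization and image rejection described for `fs-com.ts` could look roughly like the sketch below. The function names, the `https://www.fs.com` base origin, the choice to strip query strings and fragments, and the substring matching on the URL path are all assumptions; the actual rules in `fs-com.ts` are not reproduced in this log.

```typescript
// Assumed site origin used to resolve relative and protocol-relative URLs.
const FS_BASE = "https://www.fs.com";

// Markers named in this log as rejected; matching by path substring is an assumption.
const REJECTED_MARKERS: readonly string[] = ["default", "logo", "general", "icon"];

// Normalize a product URL: resolve against the base, require a /products/ path,
// and drop the query string and fragment so equivalent URLs compare equal.
function normalizeProductUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw, FS_BASE);
  } catch {
    return null;
  }
  if (!url.pathname.includes("/products/")) return null;
  url.search = "";
  url.hash = "";
  return url.toString();
}

// Reject SVGs and anything whose path looks like a placeholder or site asset.
function isUsableImageUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw, FS_BASE);
  } catch {
    return false;
  }
  const path = url.pathname.toLowerCase();
  if (path.endsWith(".svg")) return false;
  return !REJECTED_MARKERS.some((marker) => path.includes(marker));
}

// Return the first usable candidate, in the order the selectors produced them.
function pickProductImage(candidates: string[]): string | null {
  return candidates.find(isUsableImageUrl) ?? null;
}
```

Substring matching is deliberately coarse here: it may reject a legitimate image whose filename happens to contain one of the markers, so a production filter would likely need tighter patterns.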
## Live Runs
All runs were executed sequentially and rate-limited on Erik, because SSH to CT115 / `tip-scraper` did not respond quickly enough from this session.
Build:
```bash
pnpm -C packages/scraper build
```
Result: passed on `/opt/tip`.
Flexoptix:
- 615 products processed.
- 615 Flexoptix price observation rows marked verified.
- 605 Flexoptix images verified in the run window.
FS.com full force revalidation:
- 270 products discovered.
- 270 detail pages scraped.
- 0 failed detail requests.
- 17 new price observations.
- 266 FS.com price observations verified after the pass.
FS.com targeted missing-image pass:
- 99 DB product URLs without images matched current category listings.
- 99 detail pages scraped.
- 0 failed detail requests.
- FS.com image-verified products increased from 207 to 299.
- FS.com verified price observations increased to 271.
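
The log does not record the exact flag values used for the two FS.com passes. As a hedged sketch, the three environment variables added to `fs-com.ts` could be read into a typed options object as follows. The `FsRunOptions` shape, the accepted truthy strings, and the "no cap when unset" default are assumptions, not the scraper's confirmed behavior.

```typescript
// Hypothetical options object; field names are illustrative.
interface FsRunOptions {
  forceRevalidate: boolean;
  maxDetailPages: number | null; // null means no cap (assumed default)
  onlyMissingImages: boolean;
}

// Environment is passed in explicitly so the parser stays testable.
type Env = Record<string, string | undefined>;

// Treat "1", "true", and "yes" (any case) as enabled; everything else is off.
function parseFlag(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ["1", "true", "yes"].includes(value.trim().toLowerCase());
}

function parseFsRunOptions(env: Env): FsRunOptions {
  const rawMax = Number.parseInt(env["FS_MAX_DETAIL_PAGES_PER_RUN"] ?? "", 10);
  return {
    forceRevalidate: parseFlag(env["TIP_FORCE_REVALIDATE"]),
    maxDetailPages: Number.isFinite(rawMax) && rawMax > 0 ? rawMax : null,
    onlyMissingImages: parseFlag(env["FS_ONLY_MISSING_IMAGES"]),
  };
}
```

In a Node process this would typically be called as `parseFsRunOptions(process.env)`; a per-run page cap is one way to keep a constrained host such as Erik from being overloaded.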
## Final Counters
Flexoptix:
- products: 744
- product price_verified: 619
- product image_verified: 615
- price observation rows: 1288
- verified price observation rows: 615
FS.com:
- products: 383
- product price_verified: 379
- product image_verified: 299
- price observation rows: 818
- verified price observation rows: 271
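
The remaining gaps follow directly from these counters. The short sketch below derives them; the `VendorCounters` type and `gaps` helper are illustrative names, while the numbers are copied from the lists above.

```typescript
// Illustrative container for the counters listed above.
interface VendorCounters {
  products: number;
  priceVerified: number;
  imageVerified: number;
  observationRows: number;
  verifiedObservationRows: number;
}

const finalCounters: Record<string, VendorCounters> = {
  flexoptix: {
    products: 744,
    priceVerified: 619,
    imageVerified: 615,
    observationRows: 1288,
    verifiedObservationRows: 615,
  },
  fs: {
    products: 383,
    priceVerified: 379,
    imageVerified: 299,
    observationRows: 818,
    verifiedObservationRows: 271,
  },
};

// Products still lacking a verified price or image, plus coverage in percent
// rounded to one decimal place.
function gaps(c: VendorCounters) {
  const pct = (part: number, whole: number) => Math.round((1000 * part) / whole) / 10;
  return {
    missingPrice: c.products - c.priceVerified,
    missingImage: c.products - c.imageVerified,
    priceCoveragePct: pct(c.priceVerified, c.products),
    imageCoveragePct: pct(c.imageVerified, c.products),
  };
}

for (const [vendor, c] of Object.entries(finalCounters)) {
  console.log(vendor, gaps(c));
}
```

For FS.com this gives 383 - 299 = 84 products without a verified image, which matches the figure in the Follow-Up section.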
## Operations
- `tip-scraper-daemon` restarted and is online.
- `tip-api` remained online.
- Erik remained stable; final load around `2.16, 2.22, 2.47`.
- The external dashboard health check via `curl` failed once because of a local DNS resolution error, while PM2 and DB checks were healthy.
## TIPLLM Training Pool
The local clone `/tmp/tip-training-data` was recreated from Gitea.
New records were written to:
- `robot-experiences/2026-05-09.jsonl`
- `qa-pairs/robot-control-high.jsonl`
Pushed to Gitea:
```text
850083f crawl: add flexoptix fs revalidation learning record
```
## Follow-Up
- FS.com still has 84 products without `image_verified`; 67 of those had no usable `/products/` URL in the current DB snapshot or were not found in current category listings.
- A future robot wave should specifically reconcile FS.com rows with blank/missing `product_page_url`.
- For future heavy FS.com work, prefer CT115/Proxmox/Pi once SSH reachability is confirmed; Erik should remain the controller or slow emergency runner only.