# 2026-05-09 Flexoptix + FS.com Price/Image Revalidation

## Request

Rene reported that many TIP prices, especially Flexoptix prices, were wrong and asked for all Flexoptix and FS.com prices to be fully revalidated and their product images checked.

Standing constraints were preserved:

- TIP crawler/robot planning and extraction feedback stays TIPLLM-only.
- No external AI was used for crawler planning or extraction feedback.
- Erik must not be overloaded.
- Robot/crawler experiences must be written into the Gitea-backed TIPLLM training pool.
- Work status must be written back to `sync/`.

## Root Cause

Two concrete issues were found:

1. `upsertPriceObservation` marked `transceivers.price_verified`, but the price rows it inserted did not set `price_observations.is_verified` or `verified_at`.
2. FS.com image extraction still used older selectors. Current FS.com product pages expose product images under `.big_img_box`, `img.big_img`, `.big_img_m_active`, `.big_img_m`, and `.small_img_active`, usually served from `resource.fs.com/mall/mainImg/...`.
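
A minimal sketch of the first fix, under stated assumptions: a new `price_observations` row carries `is_verified` and `verified_at` itself, and an existing unverified row is backfilled only when the freshly scraped price is unchanged and the row is recent. The `PriceObservationRow` shape, both helper names, and the freshness window are hypothetical; this is not the actual `upsertPriceObservation` code.

```typescript
// Hypothetical, simplified shape of a price_observations row (not the real schema).
interface PriceObservationRow {
  transceiverId: number;
  price: number;
  currency: string;
  observedAt: Date;
  isVerified: boolean;     // maps to price_observations.is_verified
  verifiedAt: Date | null; // maps to price_observations.verified_at
}

// New rows get the verification fields at insert time, so the observation
// itself is verified instead of only the parent transceiver.
function buildPriceObservationRow(
  transceiverId: number,
  price: number,
  currency: string,
  now: Date,
): PriceObservationRow {
  return { transceiverId, price, currency, observedAt: now, isVerified: true, verifiedAt: now };
}

// Assumed backfill rule: an unverified row counts as "fresh and unchanged" when the
// scraped price equals the stored one (exact comparison) and the row is younger than maxAgeMs.
function shouldBackfillVerified(
  existing: PriceObservationRow,
  scrapedPrice: number,
  now: Date,
  maxAgeMs: number,
): boolean {
  const ageMs = now.getTime() - existing.observedAt.getTime();
  return !existing.isVerified && existing.price === scrapedPrice && ageMs <= maxAgeMs;
}
```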

## Code Changed

- `packages/scraper/src/utils/db.ts`
  - New price observations are now inserted with `is_verified = true` and `verified_at` set.
  - Fresh, unchanged observations are backfilled as verified.
  - `price_verified_at` is now maintained.
  - Image verification now refreshes `image_verified_at`, `image_verified_url`, and `image_scraped_at`.
  - For existing transceivers, `markImageVerified` is now called whenever a scraper provides an image URL.
- `packages/scraper/src/scrapers/fs-com.ts`
  - Added `TIP_FORCE_REVALIDATE`.
  - Added `FS_MAX_DETAIL_PAGES_PER_RUN`.
  - Added `FS_ONLY_MISSING_IMAGES`.
  - Added URL normalization for FS.com product URLs.
  - Updated image extraction to prefer the current product-image DOM and to reject default/logo/general/icon/SVG URLs.
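
A sketch of the selector-priority idea, assuming the DOM lookup has already produced the `src` values found under each selector: candidates are taken in the order the selectors are listed in this note, placeholder-like URLs are dropped, and a `/mall/mainImg/` URL wins when present. The priority order, the substring rejection heuristic, and the function names are assumptions rather than the code in `fs-com.ts`.

```typescript
// Selectors named in this note; treating this list order as priority is an assumption.
const FS_IMAGE_SELECTORS: readonly string[] = [
  ".big_img_box",
  "img.big_img",
  ".big_img_m_active",
  ".big_img_m",
  ".small_img_active",
];

// Assumed heuristic: reject SVGs and URLs (ignoring query/fragment) that contain
// default/logo/general/icon.
function isRejectedImageUrl(url: string): boolean {
  const base = (url.split(/[?#]/)[0] ?? "").toLowerCase();
  if (base.endsWith(".svg")) return true;
  return ["default", "logo", "general", "icon"].some((word) => base.includes(word));
}

// `found` maps each selector to the src values extracted for it (DOM access not shown).
function pickFsProductImage(found: Record<string, string[]>): string | null {
  const accepted: string[] = [];
  for (const selector of FS_IMAGE_SELECTORS) {
    for (const src of found[selector] ?? []) {
      const url = src.trim();
      if (url !== "" && !isRejectedImageUrl(url)) accepted.push(url);
    }
  }
  // Prefer main-image CDN URLs; otherwise keep the first accepted URL in selector order.
  return accepted.find((u) => u.includes("/mall/mainImg/")) ?? accepted[0] ?? null;
}
```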

## Live Runs

All runs were executed sequentially and rate-limited on Erik, because SSH to CT115 / `tip-scraper` did not respond quickly enough from this session.
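
To keep load on Erik bounded, detail pages were processed strictly one at a time. A sketch of that pattern under assumed semantics: the three FS.com flags are read from an environment-style record, and a sequential runner pauses between requests and stops at `FS_MAX_DETAIL_PAGES_PER_RUN`. The accepted truthy values (`1`/`true`), the "unset or invalid means no cap" rule, the delay, and `runSequential` itself are assumptions, not the scraper's actual implementation.

```typescript
// Hypothetical option object; flag names come from this note, parsing rules are assumed.
interface FsRunOptions {
  forceRevalidate: boolean;   // TIP_FORCE_REVALIDATE
  maxDetailPages: number;     // FS_MAX_DETAIL_PAGES_PER_RUN (Infinity when unset/invalid)
  onlyMissingImages: boolean; // FS_ONLY_MISSING_IMAGES
}

function parseFsRunOptions(env: Record<string, string | undefined>): FsRunOptions {
  const truthy = (v: string | undefined): boolean => v === "1" || v?.toLowerCase() === "true";
  const max = Number.parseInt(env["FS_MAX_DETAIL_PAGES_PER_RUN"] ?? "", 10);
  return {
    forceRevalidate: truthy(env["TIP_FORCE_REVALIDATE"]),
    maxDetailPages: Number.isFinite(max) && max > 0 ? max : Number.POSITIVE_INFINITY,
    onlyMissingImages: truthy(env["FS_ONLY_MISSING_IMAGES"]),
  };
}

// Process items one at a time (never concurrently), pausing delayMs between
// requests and stopping after maxItems. `sleep` is injectable for testing.
async function runSequential<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  opts: { delayMs: number; maxItems: number; sleep?: (ms: number) => Promise<void> },
): Promise<R[]> {
  const pause = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const batch = items.slice(0, Math.min(items.length, opts.maxItems));
  const results: R[] = [];
  let first = true;
  for (const item of batch) {
    if (!first) await pause(opts.delayMs); // fixed gap between consecutive requests
    first = false;
    results.push(await worker(item));
  }
  return results;
}
```

In the scraper, `env` would be `process.env`. A sequential loop with a fixed pause trades wall-clock time for a predictable request rate on a single host.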

Build:

```bash
pnpm -C packages/scraper build
```

Result: passed on `/opt/tip`.

Flexoptix:

- 615 products processed.
- 615 Flexoptix price observation rows marked verified.
- 605 Flexoptix images verified in the run window.

FS.com full force revalidation:

- 270 products discovered.
- 270 detail pages scraped.
- 0 failed detail requests.
- 17 new price observations.
- 266 FS.com price observations verified after the pass.

FS.com targeted missing-image pass:

- 99 DB product URLs without images matched current category listings.
- 99 detail pages scraped.
- 0 failed detail requests.
- FS.com image-verified products increased from 207 to 299.
- FS.com verified price observations increased to 271.

## Final Counters

Flexoptix:

- products: 744
- product price_verified: 619
- product image_verified: 615
- price observation rows: 1288
- verified price observation rows: 615

FS.com:

- products: 383
- product price_verified: 379
- product image_verified: 299
- price observation rows: 818
- verified price observation rows: 271
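
The remaining gaps follow from simple subtraction of the final counters. This sketch only restates the numbers listed for Flexoptix and FS.com; the field names are ad hoc.

```typescript
// Counter snapshot copied from the Final Counters lists; field names are ad hoc.
interface VendorCounters {
  products: number;
  priceVerified: number;
  imageVerified: number;
  observationRows: number;
  verifiedObservationRows: number;
}

const finalCounters: { flexoptix: VendorCounters; fsCom: VendorCounters } = {
  flexoptix: { products: 744, priceVerified: 619, imageVerified: 615, observationRows: 1288, verifiedObservationRows: 615 },
  fsCom: { products: 383, priceVerified: 379, imageVerified: 299, observationRows: 818, verifiedObservationRows: 271 },
};

// Remaining gaps are plain differences between totals and verified counts.
function gaps(c: VendorCounters) {
  return {
    missingPriceVerified: c.products - c.priceVerified,
    missingImageVerified: c.products - c.imageVerified,
    unverifiedObservationRows: c.observationRows - c.verifiedObservationRows,
  };
}
```

With these inputs, FS.com's image gap is 383 - 299 = 84, which matches the Follow-Up count, and Flexoptix has 744 - 615 = 129 products without `image_verified`.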

## Operations

- `tip-scraper-daemon` was restarted and is online.
- `tip-api` remained online.
- Erik remained stable; the final load average was around `2.16, 2.22, 2.47`.
- The external dashboard health check via `curl` failed once because of local DNS resolution, while the PM2 and DB checks were healthy.

## TIPLLM Training Pool

The local clone `/tmp/tip-training-data` was recreated from Gitea.

New records were written to:

- `robot-experiences/2026-05-09.jsonl`
- `qa-pairs/robot-control-high.jsonl`
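
Both training-pool targets are JSON Lines files, so every record must serialize to exactly one line. A sketch of that discipline; the `RobotExperience` fields are hypothetical, since the pool's real record schema is not part of this note.

```typescript
// Hypothetical record shape; the actual training-pool schema is not shown here.
interface RobotExperience {
  date: string; // e.g. "2026-05-09"
  source: "flexoptix" | "fs.com";
  task: string;
  outcome: string;
  lessons: string[];
}

// JSON.stringify escapes embedded newlines, so one record becomes one line;
// the trailing "\n" keeps appended files line-delimited.
function toJsonlLine(record: RobotExperience): string {
  return JSON.stringify(record) + "\n";
}

// Parse a JSONL body back into records, skipping blank lines.
function parseJsonl(text: string): RobotExperience[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RobotExperience);
}
```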

Pushed to Gitea:

```text
850083f crawl: add flexoptix fs revalidation learning record
```

## Follow-Up

- FS.com still has 84 products without `image_verified`; 67 of those had no usable `/products/` URL in the current DB snapshot or were not found in the current category listings.
- A future robot wave should specifically reconcile FS.com rows with a blank or missing `product_page_url`.
- For future heavy FS.com work, prefer CT115/Proxmox/Pi once SSH reachability is confirmed; Erik should remain only the controller or a slow emergency runner.
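
One way to scope that reconciliation wave, as a sketch: split the FS.com rows still lacking `image_verified` into rows with no URL, rows whose URL does not normalize to a `/products/` page, and rows that could be relinked and re-scraped. The row shape, bucket names, and the canonical `https://www.fs.com/products/...` form are assumptions; the normalization actually added in `fs-com.ts` may differ.

```typescript
// Hypothetical row shape for the follow-up query.
interface FsRow {
  id: number;
  productPageUrl: string | null;
  imageVerified: boolean;
}

// Assumed canonical form: https://www.fs.com/<path containing /products/>, no query/fragment.
function normalizeFsProductUrl(raw: string | null): string | null {
  const trimmed = (raw ?? "").trim();
  const m = /^(?:https?:\/\/)?(?:www\.)?fs\.com(\/[^?#]*)/i.exec(trimmed);
  if (!m) return null;
  const path = (m[1] ?? "").replace(/\/+$/, "");
  return path.includes("/products/") ? `https://www.fs.com${path}` : null;
}

type Bucket = "missing_url" | "non_product_url" | "relink_candidate";

// Classify only rows that still lack image_verified.
function triageMissingImages(rows: readonly FsRow[]): Record<Bucket, number[]> {
  const out: Record<Bucket, number[]> = { missing_url: [], non_product_url: [], relink_candidate: [] };
  for (const row of rows) {
    if (row.imageVerified) continue;
    if (!row.productPageUrl || row.productPageUrl.trim() === "") out.missing_url.push(row.id);
    else if (normalizeFsProductUrl(row.productPageUrl) === null) out.non_product_url.push(row.id);
    else out.relink_candidate.push(row.id);
  }
  return out;
}
```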