# Crawlee Evaluation and FS.com URL Discovery

Date: 2026-05-09

## Question

Operator asked with highest priority whether these repositories help TIP:

- `https://github.com/apify/crawlee`
- `https://github.com/apify/crawlee-python`
- `https://github.com/hiteshchoudhary/crawlee-project`

## Evaluation

`apify/crawlee` helps directly, but TIP already uses it in the TypeScript scraper stack. The priority is to harden our current usage rather than introduce a new crawler framework.

Best immediate Crawlee practices for TIP:

- keep per-vendor bounded runs
- use stable `uniqueKey`/target IDs so retries do not create duplicate rows
- keep Crawlee storage directories isolated per vendor/run class
- record no-text and max-retry URLs as a separate retry class
- use AutoscaledPool telemetry as a safety signal
- keep Erik at low concurrency and move heavier work to Pi/Proxmox workers

`apify/crawlee-python` is useful for future isolated worker experiments on Pi/Proxmox, especially where Python extraction libraries help. It should not replace the current TypeScript crawler core today.

`hiteshchoudhary/crawlee-project` is a small community/demo app, not a production building block for TIP.

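
The stable `uniqueKey` and isolated-storage practices listed under Evaluation can be sketched roughly as follows. This is a minimal illustration, not the code in `fs-com.ts`: the `TargetRow` shape, the `buildRequests` and `storageDirFor` helpers, the `vendor:targetId` key format, the directory layout, and the example URLs are all assumptions. Only `targetTransceiverId` and Crawlee's `uniqueKey` and `userData` request fields come from the record or from Crawlee itself.

```typescript
// Hypothetical target row; only `targetTransceiverId` is named in this record.
interface TargetRow {
  targetTransceiverId: string;
  url: string;
}

// Structural subset of the request options Crawlee accepts
// (`url`, `uniqueKey`, `userData`). No Crawlee import is needed for the sketch.
interface RequestLike {
  url: string;
  uniqueKey: string;
  userData: { vendor: string; targetTransceiverId: string };
}

// Derive the uniqueKey from the vendor and target ID instead of the URL,
// so a retry with a slightly different URL still maps to the same request.
function buildRequests(vendor: string, rows: TargetRow[]): RequestLike[] {
  const seen = new Set<string>();
  const requests: RequestLike[] = [];
  for (const row of rows) {
    const uniqueKey = `${vendor}:${row.targetTransceiverId}`;
    if (seen.has(uniqueKey)) continue; // same target already queued in this run
    seen.add(uniqueKey);
    requests.push({
      url: row.url,
      uniqueKey,
      userData: { vendor, targetTransceiverId: row.targetTransceiverId },
    });
  }
  return requests;
}

// Assumed directory layout: one storage directory per vendor and run class.
function storageDirFor(vendor: string, runClass: string): string {
  return `./storage/${vendor}/${runClass}`;
}

// Example with placeholder IDs and URLs (not real TIP rows).
const requests = buildRequests("fs-com", [
  { targetTransceiverId: "t-001", url: "https://example.com/products/1.html" },
  { targetTransceiverId: "t-001", url: "https://example.com/products/1.html?retry=1" },
  { targetTransceiverId: "t-002", url: "https://example.com/products/2.html" },
]);
const storageDir = storageDirFor("fs-com", "url-discovery");
```

In a real run the resulting objects could be handed to `crawler.addRequests(requests)`, and the directory could be supplied through the `CRAWLEE_STORAGE_DIR` environment variable before the crawler starts. Keying on the target ID rather than the URL is a design choice made for this sketch; it trades Crawlee's default URL-based deduplication for one request per target row.
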
## Code

Changed:

- `packages/scraper/src/scrapers/fs-com.ts`

Added:

- `FS_URL_DISCOVERY_ONLY=1`
- target row propagation with `targetTransceiverId`
- image verification for target rows
- H1/part/spec deterministic detail verification when FS.com lacks a spec table

## Live Runs

URL discovery pilot:

- target `20`
- scraped `19`
- failed `0`
- no-url rows: `76` -> `57`

Full URL discovery:

- target `56`
- scraped `55`
- failed `1`
- failed URL: `https://www.fs.com/de/products/229461.html`
- no-url rows: `57` -> `2`

DB reconciliation:

- target `57`
- scraped `55`
- failed `0`
- new prices `41`
- stock observations `40`
- specs verified `55`

Build:

- `pnpm -C packages/scraper build` passed on Erik

## FS.com Final State

- total rows: `383`
- price verified: `379`
- image verified: `374`
- details verified: `373`
- price+image+details: `373`
- fully verified: `205`
- missing URL: `2`
- missing image URL: `9`
- missing reach label: `4`
- missing fiber type: `9`
- HTML product-like rows: `373`
- HTML product-like complete: `371`
- no-url rows: `2`
- category rows: `4`

Remaining no-url rows:

- `Change`
- `FS-229461`

TIP health after run:

- status: `healthy`
- load status: `ok`
- memory used: `13%`
- global image verified: `10711`
- global details verified: `9929`
- global fully verified: `8526`

## Training Pool

Pushed:

- `4d9a11c crawl: add fscom url discovery learning record`

## Next

Do not claim FS.com is 100% complete yet.

Remaining work:

- classify `Change`
- retry or classify `FS-229461`
- classify 4 category rows
- close 9 image/fiber gaps
- then move to next high-value competitor with the same bounded Crawlee pattern
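
The rule above, not claiming completion while gaps remain, could be turned into a small gate over the final-state counters. The sketch below is only an illustration under stated assumptions: the `VendorState` shape, the `completionBlockers` helper, and the rule that any nonzero gap counter blocks the claim are hypothetical and are not taken from the TIP codebase. The counter values are the ones reported under FS.com Final State, with the 4 category rows treated as still unclassified.

```typescript
// Hypothetical summary of the gap counters from the final-state report.
interface VendorState {
  missingUrl: number;
  missingImageUrl: number;
  missingReachLabel: number;
  missingFiberType: number;
  unclassifiedCategoryRows: number;
}

// Assumed rule: every nonzero gap counter is a blocker for a "100% complete" claim.
function completionBlockers(state: VendorState): string[] {
  const keys: (keyof VendorState)[] = [
    "missingUrl",
    "missingImageUrl",
    "missingReachLabel",
    "missingFiberType",
    "unclassifiedCategoryRows",
  ];
  const blockers: string[] = [];
  for (const key of keys) {
    if (state[key] > 0) blockers.push(`${key}: ${state[key]}`);
  }
  return blockers;
}

// Values copied from the FS.com final state in this record.
const fsComFinalState: VendorState = {
  missingUrl: 2,
  missingImageUrl: 9,
  missingReachLabel: 4,
  missingFiberType: 9,
  unclassifiedCategoryRows: 4,
};

const blockers = completionBlockers(fsComFinalState);
const canClaimComplete = blockers.length === 0;
```

With the current counters every field is nonzero, so the gate stays closed until the remaining rows are retried or classified.
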