TIP Crawlee Runtime
Decision
TIP standardizes on Crawlee as the crawler runtime.
- Production TypeScript path: packages/scraper with apify/crawlee and Playwright.
- Optional Python worker path: packages/crawlee-python with apify/crawlee-python.
TypeScript Core
The TypeScript scraper remains the canonical production path because TIP already uses it for DB writes, price observations, stock observations, image verification and detail verification.
Useful FS.com commands:
pnpm -C packages/scraper run scrape:fs:db-detail
pnpm -C packages/scraper run scrape:fs:url-discovery
Erik safety defaults:
- keep FS.com at browser concurrency 1
- use bounded run caps
- treat no-text and max-retry URLs as retry/classification classes
- keep Crawlee storage isolated with makeCrawleeConfig(...)
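The "retry/classification classes" default can be illustrated with a small sketch. It is written in Python only to match the worker examples in this document; the production scraper is TypeScript. `FailureClass`, `classify_result` and `group_for_retry` are hypothetical names and not part of TIP or Crawlee. The idea is that a URL with no extracted text, or one that exhausted its retries, is bucketed for a later pass instead of being dropped.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class FailureClass(str, Enum):
    """Hypothetical class names, chosen for illustration only."""
    OK = "ok"
    NO_TEXT = "no_text"      # page was fetched but yielded no usable text
    MAX_RETRY = "max_retry"  # request exhausted its retries without a response


def classify_result(fetched: bool, text: Optional[str]) -> FailureClass:
    """Map one crawl outcome to a class (assumed rule, not TIP's actual logic)."""
    if not fetched:
        return FailureClass.MAX_RETRY
    if not text or not text.strip():
        return FailureClass.NO_TEXT
    return FailureClass.OK


def group_for_retry(
    results: Iterable[Tuple[str, bool, Optional[str]]]
) -> Dict[str, List[str]]:
    """Bucket non-OK URLs by class so they can be retried or reviewed later."""
    buckets: Dict[str, List[str]] = {}
    for url, fetched, text in results:
        cls = classify_result(fetched, text)
        if cls is not FailureClass.OK:
            buckets.setdefault(cls.value, []).append(url)
    return buckets
```

Keeping the classes explicit means a bounded run can stop at its cap and still leave behind a list of what to revisit.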
Python Worker
The Python worker is optional and should run first on Pi/Proxmox/residential nodes. It writes JSONL evidence and does not write directly to the TIP DB.
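As a minimal sketch of the "JSONL evidence" idea, the helper below appends one JSON object per line to an output file. The function name `append_evidence` and the record fields (`url`, `status`, `fetched_at`, `payload`) are assumptions for illustration; the worker's real schema is not described here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def append_evidence(out_path: str, url: str, status: str,
                    payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append one evidence record as a single JSON line (hypothetical schema)."""
    record = {
        "url": url,
        "status": status,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier records intact; one object per line is JSONL.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Writing to a file rather than to the database keeps the remote nodes free of DB credentials and lets a separate step decide what gets imported.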
Install:
cd packages/crawlee-python
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[beautifulsoup]"
Smoke:
python -m tip_crawlee_worker \
--mode beautifulsoup \
--url https://crawlee.dev \
--out /tmp/tip-crawlee-python-smoke.jsonl \
--max-requests 1
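After a smoke run, it can help to confirm that the output file parses as JSONL. The sketch below uses only the standard library and makes no assumption about the record fields; `read_jsonl` is a hypothetical helper, not part of the worker.

```python
import json
from pathlib import Path
from typing import Any, List


def read_jsonl(path: str) -> List[Any]:
    """Parse a JSONL file, skipping blank lines and failing loudly on bad ones."""
    records: List[Any] = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON line") from exc
    return records
```

For example, `read_jsonl("/tmp/tip-crawlee-python-smoke.jsonl")` returning a non-empty list suggests the worker wrote parseable evidence.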
Training Pool
Every crawler result, failure class, parser lesson and runtime safety lesson
should be written to the TIPLLM training pool and synced through sync/.