# TIP Crawlee Runtime

## Decision

TIP standardizes on Crawlee as the crawler runtime.

- Production TypeScript path: `packages/scraper` with `apify/crawlee` and Playwright.
- Optional Python worker path: `packages/crawlee-python` with `apify/crawlee-python`.

## TypeScript Core

The TypeScript scraper remains the canonical production path because TIP already
uses it for DB writes, price observations, stock observations, image verification
and detail verification.

Useful FS.com commands:

```bash
pnpm -C packages/scraper run scrape:fs:db-detail
pnpm -C packages/scraper run scrape:fs:url-discovery
```

Erik safety defaults:

- keep FS.com at browser concurrency `1`
- use bounded run caps
- classify no-text and max-retry URLs explicitly so they can be retried or triaged later
- keep Crawlee storage isolated with `makeCrawleeConfig(...)`
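
The run-cap and classification defaults above can be illustrated with a small sketch. It is written in Python for brevity and is not the TypeScript scraper's actual code; `FetchOutcome`, `classify`, `bounded`, the class labels and the retry limit of 3 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Optional

# Hypothetical limit; the real value belongs to the scraper configuration.
MAX_RETRIES = 3


@dataclass
class FetchOutcome:
    url: str
    text: str                    # extracted page text, empty if nothing was parsed
    retry_count: int             # attempts already made for this URL
    error: Optional[str] = None  # last error message, if any


def classify(outcome: FetchOutcome, max_retries: int = MAX_RETRIES) -> str:
    """Map an outcome to one of: 'ok', 'no_text', 'max_retry', 'retry'."""
    failed = bool(outcome.error) or not outcome.text.strip()
    if failed and outcome.retry_count >= max_retries:
        return "max_retry"   # retries exhausted, hand over to triage
    if outcome.error:
        return "retry"       # failed, but retries remain
    if not outcome.text.strip():
        return "no_text"     # page loaded but produced no usable text
    return "ok"


def bounded(urls: Iterable[str], cap: int) -> List[str]:
    """Apply a run cap: take at most `cap` URLs from the input."""
    return list(islice(urls, cap))
```

Keeping the class label separate from the fetch logic means a later run can pick up `no_text` and `max_retry` URLs by label instead of re-crawling everything.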

## Python Worker

The Python worker is optional and should run first on Pi, Proxmox or residential
nodes. It writes JSONL evidence and never writes directly to the TIP DB.
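
As a rough illustration of the JSONL evidence pattern, the sketch below appends one JSON object per line to an output file. The function name `append_evidence` and the record fields (`url`, `status`, `fetched_at`, `payload`) are assumptions made for this example; the worker's real schema may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(out_path, url, status, payload):
    """Append one evidence record as a single JSON line (hypothetical schema)."""
    record = {
        "url": url,
        "status": status,  # e.g. "ok" or a failure class label
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier evidence intact across runs.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Because each record is a self-contained line, a separate import step can validate and load the file into the TIP DB without the worker needing database credentials.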

Install:

```bash
cd packages/crawlee-python
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[beautifulsoup]"
```

Smoke test:

```bash
python -m tip_crawlee_worker \
  --mode beautifulsoup \
  --url https://crawlee.dev \
  --out /tmp/tip-crawlee-python-smoke.jsonl \
  --max-requests 1
```
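
To check the smoke output, one option is a small loader that confirms every non-empty line parses as JSON. This is a minimal sketch; `load_jsonl` is a hypothetical helper, and it assumes only that the file is JSONL, not any particular record schema.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSONL file, raising ValueError with the line number on bad input."""
    records = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{line_no}: invalid JSON: {exc}") from exc
    return records
```

For the command above, `load_jsonl("/tmp/tip-crawlee-python-smoke.jsonl")` should return without raising if the worker produced well-formed output.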

## Training Pool

Every crawler result, failure class, parser lesson and runtime safety lesson
should be written to the TIPLLM training pool and synced through `sync/`.