TIP Crawlee Python Worker

An optional Python crawler worker for Pi, Proxmox, and residential nodes.

The TypeScript scraper package remains the production crawler core. This package exists for isolated worker experiments where Python extraction libraries are a better fit. It writes JSONL artifacts; it does not write directly to TIP PostgreSQL.

Install

cd packages/crawlee-python
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[beautifulsoup]"

For browser-based Python workers:

python -m pip install -e ".[playwright]"
python -m playwright install chromium

Smoke Run

python -m tip_crawlee_worker \
  --mode beautifulsoup \
  --url https://crawlee.dev \
  --out /tmp/tip-crawlee-python-smoke.jsonl \
  --max-requests 1
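
To confirm the smoke run produced usable output, the JSONL file can be inspected with a few lines of Python. This is a minimal sketch: `summarize_jsonl` is a hypothetical helper, not part of `tip_crawlee_worker`, and it assumes only that the file holds one JSON value per line. It makes no assumption about which fields the worker emits.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, sorted_field_names) for a JSONL file.

    Hypothetical helper for eyeballing worker output; not part of the package.
    """
    count = 0
    fields = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            text = line.strip()
            if not text:
                continue  # tolerate blank lines between records
            record = json.loads(text)  # raises ValueError on malformed JSON
            if isinstance(record, dict):
                fields.update(record)  # collect top-level key names only
            count += 1
    return count, sorted(fields)
```

Calling summarize_jsonl("/tmp/tip-crawlee-python-smoke.jsonl") after the smoke run returns the number of records and the top-level field names seen, which is usually enough to tell whether the worker wrote anything meaningful.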

TIP Policy

  • Use this on Pi/Proxmox/residential nodes first, not as an Erik-heavy crawler.
  • Keep output as JSONL evidence until a deterministic importer validates it.
  • Record useful crawler outcomes in the TIPLLM training pool.
  • Use TIPLLM only for planning and extraction feedback; do not use any external AI service.
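
The "deterministic importer" policy above implies a validation gate between the JSONL evidence and anything that touches the database. The sketch below shows one possible shape for that gate. Everything in it is an assumption for illustration: `partition_records` is a hypothetical function, and the `url` required field is a placeholder, since the worker's actual record schema is not specified here.

```python
import json

# Hypothetical required fields; replace with the worker's real record schema.
REQUIRED_FIELDS = ("url",)


def partition_records(lines, required_fields=REQUIRED_FIELDS):
    """Split JSONL lines into (accepted, rejected) with no randomness or I/O.

    accepted is a list of parsed records; rejected is a list of
    (line_number, reason) tuples so bad evidence can be audited later.
    """
    accepted, rejected = [], []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not rejected
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            rejected.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            rejected.append((line_number, "not a JSON object"))
            continue
        missing = [name for name in required_fields if name not in record]
        if missing:
            rejected.append((line_number, "missing fields: " + ", ".join(missing)))
            continue
        accepted.append(record)
    return accepted, rejected
```

Because the function depends only on its input lines, running it twice over the same artifact gives the same split, which is the property an importer needs before any record is promoted out of the evidence file.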