chore: ignore crawlee python build artifacts
parent 60531b6250
commit 49f0871720
4
packages/crawlee-python/.gitignore
vendored
Normal file
@@ -0,0 +1,4 @@
+*.egg-info/
+__pycache__/
+*.py[cod]
+.venv/
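To make the effect of the four patterns concrete, here is a minimal sketch that approximates how they would classify paths from this commit. It is a simplified stand-in built on `fnmatch`, not git's real matcher (no negation, anchoring, or `**` handling), and the `is_ignored` helper is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The four patterns added by this commit.
PATTERNS = ["*.egg-info/", "__pycache__/", "*.py[cod]", ".venv/"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate gitignore matching for a relative file path.

    Assumption: `path` names a file, so only its parent components
    can match directory-only patterns (those ending in "/").
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory-only pattern: ignore the file if any parent
            # directory matches the pattern name.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Slash-free pattern: match against the file name at any depth.
            return True
    return False


for candidate in (
    "tip_crawlee_python_worker.egg-info/PKG-INFO",
    "tip_crawlee_worker/__main__.py",
    ".venv/bin/activate",
):
    print(candidate, is_ignored(candidate))
```

Under this approximation the egg-info metadata and the virtual environment are ignored, while the package source stays tracked.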
@@ -1,57 +0,0 @@
-Metadata-Version: 2.4
-Name: tip-crawlee-python-worker
-Version: 0.1.0
-Summary: Optional Crawlee Python worker for TIP crawler nodes
-Requires-Python: >=3.11
-Description-Content-Type: text/markdown
-Requires-Dist: crawlee>=1.0.0
-Provides-Extra: beautifulsoup
-Requires-Dist: crawlee[beautifulsoup]>=1.0.0; extra == "beautifulsoup"
-Provides-Extra: playwright
-Requires-Dist: crawlee[playwright]>=1.0.0; extra == "playwright"
-Requires-Dist: playwright>=1.50.0; extra == "playwright"
-Provides-Extra: all
-Requires-Dist: crawlee[all]>=1.0.0; extra == "all"
-
-# TIP Crawlee Python Worker
-
-Optional Python crawler worker for Pi/Proxmox/residential nodes.
-
-The TypeScript scraper package remains the production crawler core. This package
-exists for isolated worker experiments where Python extraction libraries are a
-better fit. It writes JSONL artifacts; it does not write directly to TIP
-PostgreSQL.
-
-## Install
-
-```bash
-cd packages/crawlee-python
-python3 -m venv .venv
-. .venv/bin/activate
-python -m pip install -U pip
-python -m pip install -e ".[beautifulsoup]"
-```
-
-For browser-based Python workers:
-
-```bash
-python -m pip install -e ".[playwright]"
-python -m playwright install chromium
-```
-
-## Smoke Run
-
-```bash
-python -m tip_crawlee_worker \
---mode beautifulsoup \
---url https://crawlee.dev \
---out /tmp/tip-crawlee-python-smoke.jsonl \
---max-requests 1
-```
-
-## TIP Policy
-
-- Use this on Pi/Proxmox/residential nodes first, not as an Erik-heavy crawler.
-- Keep output as JSONL evidence until a deterministic importer validates it.
-- Record useful crawler outcomes in the TIPLLM training pool.
-- Use TIPLLM only for planning/extraction feedback; no external AI.
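The removed package description says the worker writes JSONL artifacts that stay as evidence until a deterministic importer validates them. As a rough illustration of such a validation pass, here is a minimal sketch. The record schema is not shown in this commit, so the required `url` field and the `validate_jsonl` helper are assumptions made only for the example.

```python
import json
from typing import Iterable

# Assumption: each record must carry at least a "url" field.
REQUIRED_FIELDS = ("url",)


def validate_jsonl(lines: Iterable[str], required=REQUIRED_FIELDS):
    """Split JSONL lines into valid records and (line number, reason) errors."""
    valid, errors = [], []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        missing = [field for field in required if field not in record]
        if missing:
            errors.append((lineno, "missing fields: " + ", ".join(missing)))
            continue
        valid.append(record)
    return valid, errors


sample = [
    '{"url": "https://crawlee.dev", "title": "Crawlee"}',
    "not json",
    '{"title": "no url"}',
]
records, problems = validate_jsonl(sample)
print(len(records), [lineno for lineno, _ in problems])
```

Because the check is pure and order-preserving, running it twice over the same file yields the same accepted records, which is the property a deterministic importer would need.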
@@ -1,10 +0,0 @@
-README.md
-pyproject.toml
-tip_crawlee_python_worker.egg-info/PKG-INFO
-tip_crawlee_python_worker.egg-info/SOURCES.txt
-tip_crawlee_python_worker.egg-info/dependency_links.txt
-tip_crawlee_python_worker.egg-info/entry_points.txt
-tip_crawlee_python_worker.egg-info/requires.txt
-tip_crawlee_python_worker.egg-info/top_level.txt
-tip_crawlee_worker/__init__.py
-tip_crawlee_worker/__main__.py
@@ -1 +0,0 @@
-
@@ -1,2 +0,0 @@
-[console_scripts]
-tip-crawlee-worker = tip_crawlee_worker.__main__:main
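The `console_scripts` entry maps the `tip-crawlee-worker` command to a `main` callable in `tip_crawlee_worker/__main__.py`. That file is not part of this diff, so the following is only a hypothetical sketch of a compatible `main`. The flag names are taken from the smoke-run command in the removed package description; the `playwright` mode choice, the defaults, and the omitted crawl step are assumptions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the smoke-run command."""
    parser = argparse.ArgumentParser(prog="tip-crawlee-worker")
    # "beautifulsoup" appears in the smoke run; "playwright" is assumed
    # from the package extras.
    parser.add_argument("--mode", choices=["beautifulsoup", "playwright"],
                        default="beautifulsoup")
    parser.add_argument("--url", required=True, help="start URL to crawl")
    parser.add_argument("--out", required=True, help="path of the JSONL output file")
    parser.add_argument("--max-requests", type=int, default=1)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point; returns the process exit code."""
    args = build_parser().parse_args(argv)
    # The actual crawl is intentionally omitted from this sketch.
    print(f"mode={args.mode} url={args.url} out={args.out} "
          f"max_requests={args.max_requests}")
    return 0
```

Returning an integer keeps the function usable both from the generated console script, which passes the return value to `sys.exit`, and from `python -m tip_crawlee_worker`.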
@@ -1,11 +0,0 @@
-crawlee>=1.0.0
-
-[all]
-crawlee[all]>=1.0.0
-
-[beautifulsoup]
-crawlee[beautifulsoup]>=1.0.0
-
-[playwright]
-crawlee[playwright]>=1.0.0
-playwright>=1.50.0
@@ -1 +0,0 @@
-tip_crawlee_worker
Binary file not shown.
Binary file not shown.