Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config

Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
  - Uses /wp-json/wp/v2/product to get 2000+ product URLs
  - Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
  - Keyword relevance scoring for transceiver/fiber topics
  - xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
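The SHA-256 change-detection item can be illustrated with a minimal sketch. The record shape (`ScrapedRecord`), the helper names, and the in-memory `lastSeen` store are all hypothetical; the actual scraper presumably persists hashes in PostgreSQL, which is not shown in this commit message.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real scraper's fields are not shown in the source.
interface ScrapedRecord {
  url: string;
  title: string;
  price: number | null;
}

// Serialize with sorted keys so that property order does not affect the hash.
function canonicalize(record: ScrapedRecord): string {
  const keys = Object.keys(record).sort();
  // An array replacer makes JSON.stringify emit exactly these keys, in this order.
  return JSON.stringify(record, keys);
}

// SHA-256 digest of the canonical form, as a 64-character hex string.
function contentHash(record: ScrapedRecord): string {
  return createHash("sha256").update(canonicalize(record)).digest("hex");
}

// In-memory stand-in for a stored hash column (assumption, for illustration only).
const lastSeen: Record<string, string> = {};

// Returns true when the record is new or its content differs from the last run.
function hasChanged(record: ScrapedRecord): boolean {
  const hash = contentHash(record);
  if (lastSeen[record.url] === hash) {
    return false; // unchanged: the caller can skip the write
  }
  lastSeen[record.url] = hash;
  return true;
}
```

A caller would run `hasChanged(record)` before each upsert and skip the write when it returns `false`. Canonicalizing before hashing matters here: two scrapes that yield the same fields in a different order should not count as a change.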
34 lines
769 B
YAML
services:
  postgres:
    image: timescale/timescaledb:latest-pg17
    container_name: tip-postgres
    environment:
      POSTGRES_DB: transceiver_db
      POSTGRES_USER: tip
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-tip_dev_2026}
    ports:
      - "5433:5432"
    volumes:
      - tip_pgdata:/var/lib/postgresql/data
      - ./sql:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U tip -d transceiver_db"]
      interval: 5s
      timeout: 5s
      retries: 5

  qdrant:
    image: qdrant/qdrant:latest
    container_name: tip-qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - tip_qdrant:/qdrant/storage
    environment:
      QDRANT__SERVICE__GRPC_PORT: 6334

volumes:
  tip_pgdata:
  tip_qdrant:
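As a rough illustration of how application code might target the `postgres` service defined above, the sketch below builds a connection URL from the same values. The function name `buildPostgresUrl` and the host `localhost` are assumptions for local development; the project's real connection code is not shown here.

```typescript
// Builds a PostgreSQL connection URL matching the compose file's settings.
// Host "localhost" and the function itself are assumptions, not project code.
function buildPostgresUrl(env: Record<string, string | undefined>): string {
  // Mirrors ${POSTGRES_PASSWORD:-tip_dev_2026}: fall back when unset or empty.
  const password = env.POSTGRES_PASSWORD || "tip_dev_2026";
  const user = "tip";
  const database = "transceiver_db";
  const hostPort = 5433; // host side of the "5433:5432" port mapping
  // Percent-encode the password so characters like "@" do not break the URL.
  return `postgres://${user}:${encodeURIComponent(password)}@localhost:${hostPort}/${database}`;
}

// With no override, the dev default from the compose file is used.
console.log(buildPostgresUrl({}));
// prints "postgres://tip:tip_dev_2026@localhost:5433/transceiver_db"
```

In a Node process, the environment would typically be passed in as `buildPostgresUrl(process.env)`. Note that port 5433 applies only from the host; containers on the same compose network would reach the database at `postgres:5432` instead.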