Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config

Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
  - Uses /wp-json/wp/v2/product to get 2000+ product URLs
  - Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
  - Keyword relevance scoring for transceiver/fiber topics
  - xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
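
The "SHA-256 content hashing for change detection" item could look roughly like the sketch below. The commit does not show the scraper's code, so everything here is an assumption made for illustration: the `ScrapedRecord` shape, the helper names `canonicalize`, `contentHash` and `upsertIfChanged`, and the in-memory `Map` standing in for a hash column in PostgreSQL. Only `createHash` from Node's built-in `node:crypto` module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real schema is not shown in the commit.
interface ScrapedRecord {
  sku: string;
  vendor: string;
  price: number | null;
  currency: string | null;
}

// Serialize with sorted object keys so that property order does not
// change the resulting hash. Undefined values are treated as null.
function canonicalize(value: unknown): string {
  if (value === undefined) {
    return "null";
  }
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  const members = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
  return "{" + members.join(",") + "}";
}

// SHA-256 digest of the canonical form, as a 64-character hex string.
function contentHash(record: ScrapedRecord): string {
  return createHash("sha256").update(canonicalize(record), "utf8").digest("hex");
}

// In-memory stand-in for the stored hash of each record (assumption:
// the real scraper would keep this next to the row in the database).
const seenHashes = new Map<string, string>();

// Writes the record only when its content hash differs from the stored one.
function upsertIfChanged(record: ScrapedRecord): "written" | "skipped" {
  const key = `${record.vendor}:${record.sku}`;
  const hash = contentHash(record);
  if (seenHashes.get(key) === hash) {
    return "skipped"; // unchanged since the last scrape
  }
  seenHashes.set(key, hash);
  return "written";
}
```

Hashing a canonical serialization rather than raw `JSON.stringify` output is a design choice in this sketch: it keeps a re-scrape that merely returns fields in a different order from being treated as a change.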
29 lines
842 B
JSON
{
  "name": "transceiver-intelligence-platform",
  "version": "0.1.0",
  "private": true,
  "description": "Transceiver Intelligence Platform — the world's most comprehensive optical transceiver & network switch database",
  "workspaces": [
    "packages/*"
  ],
  "scripts": {
    "build": "npm run build --workspaces",
    "build:core": "npm run build -w packages/core",
    "build:api": "npm run build -w packages/api",
    "dev": "npm run dev -w packages/api",
    "migrate": "tsx scripts/migrate.ts",
    "seed": "tsx scripts/seed-from-npm.ts",
    "db:reset": "npm run migrate && npm run seed"
  },
  "author": "Rene Fichtmueller",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/renefichtmueller/transceiver-db"
  },
  "devDependencies": {
    "tsx": "^4.19",
    "typescript": "^5.9.3"
  }
}