17 Commits

Author SHA1 Message Date
Rene Fichtmueller
5b35b2b8be feat(scraper+api): warehouse stock data pipeline — FS.com v2, SmartOptics v2, Stock API
Scraper changes:
- fs-com.ts v2: Playwright stealth patches + www.fs.com/de/ URL fix (de.fs.com DNS NXDOMAIN).
  Extracts DE-Lager, Global-Lager, Nachlieferung, units_sold, compatible_brands, price_net.
  Mac-side runner (run-fs-scraper-mac.sh) via SSH tunnel for residential IP access.
  Fast-fail connectivity check on datacenter IPs that are blocked by Cloudflare.
- smartoptics.ts v2: WooCommerce REST API fallback + 8 catalog categories + relative URL fix.
  Was finding only 8 products, now discovers 18+ with multi-category crawl.

DB layer:
- db.ts: add upsertStockObservation() — writes 10 new stock_observations columns
  (warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, compatible_brands,
  price_net, product_url, delivery dates) with dedup check.

API:
- routes/stock.ts: GET /api/stock, /api/stock/summary, /api/stock/:id
  Warehouse breakdowns per transceiver/vendor with top-sellers and vendor summary.
- routes/review.ts: equivalence review queue (approve/reject/bulk-approve).
- index.ts: register /api/stock and /api/review routes.

Dashboard:
- index.html: 🏭 Stock tab with stat cards (DE-Lager, Global-Lager, Nachlieferung totals),
  top-sellers table, vendor breakdown, recently-restocked events, part-number lookup.

SQL migrations:
- 034: blog-review-tag, 035: price-observations is_anomalous, 036: transceiver-equivalences.
2026-04-17 10:45:59 +02:00
Rene Fichtmueller
cf75eee8ad feat: linecard system support, Cisco 8000 accuracy, price anomaly detection
API/finder:
- Add modular chassis support: sibling linecards fetched when is_linecard=true
- Add chassis linecards when system_type=modular
- Extend switch response: system_type, is_linecard, chassis_model, slot_type,
  flexbox_compat_mode, flexbox_notes, description, switching_capacity_tbps,
  total_ports, category, lifecycle_status, features, use_cases, linecards[]

API/transceivers:
- Filter price_observations with COALESCE(is_anomalous, false) = false
  (direct prices + comparable market prices)

Scraper/db:
- Add PRICE_BOUNDS map (per form-factor min/max USD sanity bounds)
- Add isPriceAnomalous() — marks DB price_observations as is_anomalous=true
- Add competitor_verified flag: set true when valid competitor price stored
- upsertPriceObservation: skip prices outside sanity bounds, set competitor_verified

Scraper/hash:
- contentHash() now accepts Record<string,unknown> | string (union type)
  to support both structured objects and legacy string callers

Scrapers (skylane, tscom, wiitek):
- Fix contentHash() call signature: pass objects not JSON.stringify strings
- Fix wiitek: remove invalid 'name' param, fix t.id → transceiverId

Migrations:
- Add is_anomalous, competitor_verified, competitor_verified_at,
  image_primary columns
- Recreate sync_fully_verified trigger to include competitor_verified
- Add is_linecard, chassis_model, system_type, slot_type,
  flexbox_compat_mode, flexbox_notes to switches table
2026-04-09 09:06:22 +02:00
Rene Fichtmueller
e4e432d9fa fix: parsePrice requires currency symbol + uses largest number to avoid misreads
Root cause of fake prices (e.g. 1.30 for 800G OSFP):
- parsePrice accepted any bare number without currency symbol
- Could misread stock counts, page numbers, or CSS values as prices
- Also picked the first number, not the main price

Fix:
- Require explicit currency symbol or decimal format (1234.56)
- Use the LARGEST number found in the price string
- Returns price=0 (rejected) when no valid price pattern found
2026-04-06 01:19:25 +02:00
Rene Fichtmueller
1434037d29 fix(scraper): use false instead of null for image_verified on insert 2026-04-05 12:15:10 +02:00
Rene Fichtmueller
f616e0ebbe feat: blog engine v4 (reduction+style-lock passes) + flexoptix scraper fixes
Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
  removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock

blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'

flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)

db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
2026-04-04 07:50:01 +02:00
Rene Fichtmueller
c179b236d7 fix: auto-set image_verified and price_verified in db utils
- findOrCreateScrapedTransceiver now sets image_verified=true when writing image_url
- upsertPriceObservation now sets price_verified=true on the transceiver after inserting price
- Both INSERT and UPDATE paths covered for image_verified sync
- Eliminates need for manual backfill after scraper runs
2026-04-04 07:14:26 +02:00
Rene Fichtmueller
1026787318 feat: add proxy network, image backfill, and scraper improvements
- Add TIP Proxy Network (packages/proxy-agent): SOCKS5 proxy agent
  for residential IP bypass of CloudFront WAF blocks
- Add /api/proxy/* routes: node registration, heartbeat, load balancing
- Add image extraction to Flexoptix catalog scraper (GraphQL small_image)
- Add image extraction to Optcore scraper (Playwright gallery img)
- Fix Fluxlight price scraping (BigCommerce HTML structure: data-product-price-without-tax)
- Add SmartOptics scraper (8 DWDM/coherent products, og:image extraction)
- Fix findOrCreateScrapedTransceiver to update image_url for existing records
- Add image backfill script (backfill-images.ts): 178 Flexoptix images added
- Fix DB connection pool: max 5, idleTimeoutMillis 10s (was unlimited, caused >100 connections)
- Add proxy.ts utility for scraper proxy rotation
2026-04-03 21:13:03 +02:00
Rene Fichtmueller
f146ac873e feat: add 5 form-factor coverage scrapers with worker registrations
Add Comms-Express, Router-Switch.com, Multimode Inc, OpticTransceiver.com,
and Wiitek scrapers covering CFP2-DCO, CFP4, OSFP224, QSFP112, CXP, GBIC,
XENPAK, CSFP, SFP-DD, SFP56, QSFP56 and other previously-uncovered form
factors. Each scheduled every 8h. Worker registrations added to scheduler.

Also export db alias in utils/db.ts to fix eBay enricher + community scrapers
crashing with 'Cannot read properties of undefined (reading query)'.
2026-04-02 08:39:17 +02:00
Rene Fichtmueller
370c1d8801 feat: 6 prediction signal scrapers + forecast engine
New scrapers (all registered in pg-boss, 50 total jobs):
  - sec-edgar.ts       : SEC EDGAR XBRL API — hyperscaler CapEx from 10-Q/10-K
  - github-signals.ts  : GitHub Search/Stats API — tech adoption metrics weekly
  - ebay-velocity.ts   : eBay completed listings — sold count + price distribution
  - ai-clusters.ts     : RSS feeds (6 sources) — AI cluster & DC announcements
  - distributor-leads.ts : Mouser, Digi-Key, RS Components — lead time + stock
  - standards-tracker.ts : IEEE 802.3, OIF, IETF — draft/ballot/published status

New utilities:
  - forecast-engine.ts : Weighted signal aggregator → demand_index + price_direction
    6 signal types, 4 horizons (3/9/12/18 months), 5 technologies tracked

New DB tables (migration 022):
  hyperscaler_capex, distributor_lead_times, github_tech_signals,
  marketplace_velocity, ai_cluster_announcements, standards_activity,
  forecast_signals

Schedules:
  - EDGAR: weekly Mon 06:00
  - GitHub: weekly Sun 05:00
  - eBay velocity: every 12h
  - AI clusters: every 4h (news-speed)
  - Distributor leads: daily 03:30
  - Standards: weekly Wed 04:00
  - Forecast engine: daily 08:00 (after all nightly scrapers)
2026-04-02 02:02:44 +02:00
Rene Fichtmueller
c156e8d9f6 feat: download datasheets + manuals to Fearghas NAS in nightly sync
- downloadDocuments(): fetches PDFs from product_documents and documents tables
  using curl, organises into switches/ transceivers/ whitepapers/ other/ subdirs
- Integrated into runNightlyNasSync() — runs after JSON exports
- rsync incremental — only new/changed files transferred
- NAS dir structure: /volume1/tip-data/datasheets/{switches,transceivers,whitepapers,other}
- max-filesize 50MB guard per file
2026-04-02 01:47:16 +02:00
Rene Fichtmueller
5abe6397c4 feat: add logger utility + WireGuard setup in pi-scraper-setup.sh
- utils/logger.ts: minimal console-based logger (debug/info/warn/error)
  used by community-issues and ebay-enricher scrapers
- scripts/pi-scraper-setup.sh: step 7 adds optional WireGuard setup
  (pass WG_PRIVKEY + WG_ADDR env vars) — connects Pi to Erik for DB access
  auto-detects dead ethernet and routes WG traffic via working interface
2026-04-02 01:42:25 +02:00
Rene Fichtmueller
48218a553d feat: nightly scraper window 00-08 + NAS Fearghas sync + procurement demo data
- All scrapers now run nightly 00:00-08:00 (staggered, every day)
- NAS sync module: rsync JSON exports + weekly pg_dump to Fearghas via WireGuard
- 07:45 daily: price_observations, switches, transceivers, signals, issues exported as JSON
- Migration 021: 200 ABC classifications, 150 reorder signals, 300 stock snapshots demo data
- 9 market intelligence entries (LightReading, FierceTelecom, Farnell, Mouser, EU TED, Arista)
- 6 lifecycle events (ZR, 800G OSFP, 100G DR4 price floor, SFP-10G-SR EOL)
2026-04-01 23:07:26 +02:00
Rene Fichtmueller
e4c89de6c0 feat: fs.com scraper Phase 2 — crawl product detail pages for verified specs
- New spec-updater utility: parseSpecTable() + updateVerifiedSpecs()
- fs.com scraper now has 2 phases:
  Phase 1: Category pages → prices + stock (existing)
  Phase 2: Product detail pages → fiber_type, connector, wavelength, power, image, datasheet
- Updates data_confidence from 'enriched_estimated' to 'scraped_unverified'
- Processes up to 200 product pages per scraper run
2026-03-31 09:18:27 +02:00
Rene Fichtmueller
a69acc4588 feat(v0.2.0): Sales Intelligence Engine — Phase 0+A
New API routes:
- GET /api/finder — Switch→Flexoptix transceiver finder with FlexBox coding
- GET /api/competitor-alerts — Competitor intelligence (price changes, new products, stock)
- GET /api/forecast/:technology — Sales forecast 3/9/12/18 months + buy/wait/hold signal
- POST /api/transport/plan — Transport system planner (city→city BOM with fiber providers)

New MCP tools:
- find_flexoptix_for_switch — Customer switch → Flexoptix products
- get_competitor_alerts — Competitor monitoring
- plan_transport — Network transport planning
- forecast_sales — Volume/revenue prediction
- generate_blog — Enhanced blog generation

New DB tables (migration 013):
- competitor_alerts, price_changes, flexoptix_product_map
- sales_forecasts, fiber_providers, fiber_routes, cities
- generated_datasheets, blog_series
- Views: v_price_coverage, v_image_coverage, v_switch_flexoptix_finder

Seed data (migration 014):
- 25 European cities with IX/DC locations + coordinates
- 15 fiber providers (euNetworks, Telia, DTAG, Colt, Zayo, etc.)
- 16 fiber routes with pricing (Germany focus)

Infrastructure:
- Scraper scheduler: 2h Flexoptix, 4h FS.com/Optcore (was 6-8h)
- Change detector for competitor price/stock monitoring
- Image downloader utility with coverage tracking
2026-03-31 08:51:22 +02:00
Rene Fichtmueller
814325b349 feat: dashboard v2, blog expansion, market/cable MCP tools, switch asset scrapers, scraper utilities 2026-03-30 08:07:12 +02:00
Rene Fichtmueller
70447def02 feat: massive scraper expansion + hype cycle engine + lifecycle prediction
New scrapers:
- GBICS.com (BigCommerce, GBP prices, 10 categories, 78 products)
- Juniper HCT (Next.js SSR parser, 475 transceivers with specs/EOL)
- SFPcables.com (Magento store, 16 categories, 78 products)
- Fluxlight (BigCommerce, 6 pages, 118 products)
- Champion ONE (compatible vendor scraper)

Scraper fixes:
- 10Gtek: rewritten to parse HTML spec tables (152 products)
- Flexoptix: fix price extraction from Magento Hyva HTML
- Register all scrapers in CLI (--gbics, --juniper, --sfpcables, etc.)

Hype Cycle Engine enhancements:
- Data-driven enrichment from scraped vendor/price data
- Revenue lifecycle prediction (peak year, decline, revenue index)
- Regional adoption model (NA, China, APAC, Europe, RoW with lag coefficients)
- New API endpoints: /enriched, /lifecycle, /regional/:tech

DB growth: 89 → 1,168 transceivers, 0 → 416 prices, 6 vendors
Qdrant: 1,162 products embedded with nomic-embed-text

Research: Norton-Bass model, standards-to-market timelines, hype signals
2026-03-28 02:30:19 +13:00
Rene Fichtmueller
e9fb50a248 feat: TIP Phase 0+1 — monorepo, DB schema, API, scraper engine
Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config

Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
  - Uses /wp-json/wp/v2/product to get 2000+ product URLs
  - Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
  - Keyword relevance scoring for transceiver/fiber topics
  - xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
2026-03-27 16:27:31 +13:00