5 Commits

Author SHA1 Message Date
Rene Fichtmueller
9965d8e43c fix: instance-level Crawlee storage isolation + eBay vendor type
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
  Crawlee Configuration with isolated localDataDirectory per scraper.
  Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
  concurrent pg-boss workers in the same process don't race on
  the shared env var.

- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
  optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
  community-issues (CheerioCrawler + RequestQueue),
  edgecore (CheerioCrawler), ufispace (CheerioCrawler),
  market-intelligence (CheerioCrawler).

- scheduler.ts: add withIsolatedStorage for optcore and market-intel
  workers (was missing, caused storage-fs path bleed from fs scraper).

- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
  satisfy vendors_type_check constraint
  ['manufacturer','distributor','oem','reseller','compatible'].
2026-04-18 01:35:57 +02:00
Rene Fichtmueller
9db0335229 feat: wire finder.ts + switch-docs + Ollama LLM tools to MCP server
MCP Server (packages/mcp-server/src/index.ts):
- Register registerSwitchDocTools (switch-docs.ts) — switch documentation lookup
- Register finderTools dynamically (finder.ts) — find_flexoptix_for_switch, get_competitor_alerts
- Add analyze_market_with_llm tool: qwen2.5:14b via Ollama, enriched with live hype cycle + pricing + news
- Add generate_blog_post tool: fo-blog-v5 (fine-tuned) with qwen2.5:14b fallback, enriched with live pricing data
- OLLAMA_BASE_URL env var (default: https://ollama.fichtmueller.org)

Also includes scraper improvements (ascentoptics, atgbics, gbics, skylane, ebay-enricher),
API route updates (blog, blog-sll, health, hot-topics, transceivers, queries),
and dashboard hot-topics refresh.
2026-04-18 00:21:58 +02:00
Rene Fichtmueller
7050ff0802 feat(scraper): add SOCKS5 proxy rotation for fs-com, atgbics, gbics scrapers
Routes requests through CT130/131/132 proxy pool (192.168.178.77/76/74:1080)
when PROXY_URLS env var is set. Uses ProxyConfiguration from crawlee for
PlaywrightCrawler scrapers and socks-proxy-agent for fetch-based scrapers.
2026-04-08 08:17:49 +02:00
Rene Fichtmueller
d9f5fc253f fix(verification): 100% Verified Badge war dramatisch zu großzügig
KERNPROBLEME BEHOBEN:
1. ATGBICS part_number = URL slug statt echte OEM-Nummer
   extractOemPartNumber() entfernt -r-compatible-transceiver-* Suffix
   + trailing Vendor-Namen (nokia, cisco, juniper, ...)
   Ergebnis: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA

2. reach_label = '' (leer) wurde als details_verified akzeptiert
   IS NOT NULL erlaubt leere Strings → Fix: AND reach_label != ''

3. details_verified = true trotz garbled part_number
   Neue Kriterien: NOT ILIKE '%-compatible-transceiver%'
                   NOT ILIKE '%-r-compatible%'

4. data_confidence Werte falsch in Funktion ('scraped_unverified' etc)
   Echte Werte: low/medium/high/garbage → NOT IN ('garbage','unknown')

ERGEBNIS nach recompute_all_verification():
  fully_verified: 3.654 → 581 (Badge war 6x übertrieben)
  details_verified: inflated → 1.075 (korrekt)

ATGBICS Scraper:
  - extractOemPartNumber() für collection und product detail pages
  - detectReach() jetzt auch auf URL-slug (120km im slug → reach_label)

Price Anomaly Detection:
  - API: price_anomaly field wenn max/min ratio ≥ 10x
  - Dashboard: ⚠ Preisanomalie Banner mit Ratio + EUR Range

SQL 025: Part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00
Rene Fichtmueller
4b452ab49e feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
  scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
  category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI

MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)

SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00