Rene Fichtmueller
9965d8e43c
fix: instance-level Crawlee storage isolation + eBay vendor type
...
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
Crawlee Configuration with isolated localDataDirectory per scraper.
Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
concurrent pg-boss workers in the same process don't race on
the shared env var.
- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
community-issues (CheerioCrawler + RequestQueue),
edgecore (CheerioCrawler), ufispace (CheerioCrawler),
market-intelligence (CheerioCrawler).
- scheduler.ts: add withIsolatedStorage for optcore and market-intel
workers (was missing, caused storage-fs path bleed from fs scraper).
- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
satisfy vendors_type_check constraint
['manufacturer','distributor','oem','reseller','compatible'].
2026-04-18 01:35:57 +02:00
Rene Fichtmueller
9db0335229
feat: wire finder.ts + switch-docs + Ollama LLM tools to MCP server
...
MCP Server (packages/mcp-server/src/index.ts):
- Register registerSwitchDocTools (switch-docs.ts) — switch documentation lookup
- Register finderTools dynamically (finder.ts) — find_flexoptix_for_switch, get_competitor_alerts
- Add analyze_market_with_llm tool: qwen2.5:14b via Ollama, enriched with live hype cycle + pricing + news
- Add generate_blog_post tool: fo-blog-v5 (fine-tuned) with qwen2.5:14b fallback, enriched with live pricing data
- OLLAMA_BASE_URL env var (default: https://ollama.fichtmueller.org )
Also includes scraper improvements (ascentoptics, atgbics, gbics, skylane, ebay-enricher),
API route updates (blog, blog-sll, health, hot-topics, transceivers, queries),
and dashboard hot-topics refresh.
2026-04-18 00:21:58 +02:00
Rene Fichtmueller
7050ff0802
feat(scraper): add SOCKS5 proxy rotation for fs-com, atgbics, gbics scrapers
...
Routes requests through CT130/131/132 proxy pool (192.168.178.77/76/74:1080)
when PROXY_URLS env var is set. Uses ProxyConfiguration from crawlee for
PlaywrightCrawler scrapers and socks-proxy-agent for fetch-based scrapers.
2026-04-08 08:17:49 +02:00
Rene Fichtmueller
d9f5fc253f
fix(verification): 100% Verified Badge war dramatisch zu großzügig
...
KERNPROBLEME BEHOBEN:
1. ATGBICS part_number = URL slug statt echte OEM-Nummer
extractOemPartNumber() entfernt -r-compatible-transceiver-* Suffix
+ trailing Vendor-Namen (nokia, cisco, juniper, ...)
Ergebnis: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA
2. reach_label = '' (leer) wurde als details_verified akzeptiert
IS NOT NULL erlaubt leere Strings → Fix: AND reach_label != ''
3. details_verified = true trotz garbled part_number
Neue Kriterien: NOT ILIKE '%-compatible-transceiver%'
NOT ILIKE '%-r-compatible%'
4. data_confidence Werte falsch in Funktion ('scraped_unverified' etc)
Echte Werte: low/medium/high/garbage → NOT IN ('garbage','unknown')
ERGEBNIS nach recompute_all_verification():
fully_verified: 3.654 → 581 (Badge war 6x übertrieben)
details_verified: inflated → 1.075 (korrekt)
ATGBICS Scraper:
- extractOemPartNumber() für collection und product detail pages
- detectReach() jetzt auch auf URL-slug (120km im slug → reach_label)
Price Anomaly Detection:
- API: price_anomaly field wenn max/min ratio ≥ 10x
- Dashboard: ⚠ Preisanomalie Banner mit Ratio + EUR Range
SQL 025: Part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00
Rene Fichtmueller
4b452ab49e
feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
...
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI
MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)
SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00