Rene Fichtmueller
69d4be2384
data: MikroTik CRS/CCR CDN images + NVIDIA ConnectX-7 (migration 055)
2026-04-21 09:48:30 +02:00
Rene Fichtmueller
92d373d40e
data: direct image injection for Nokia, F5, Delta Networks, Siemens, TP-Link
...
Migration 051: TP-Link TL-SG3452XP + TL-SX3016F via static.tp-link.com CDN
Migration 052: Nokia 6/6 — 7220 IXR-D3L/H4 (docs.nokia.com graphics),
7250 IXR-10 + 7750 SR-1 (tempestns.com), SR-14s (telecomcauliffe.com),
SR-1e (docs hardwareBanner — no standalone public image available)
Migration 053: F5 BIG-IP i5800/i10800 (wtit.com), i15800 (blueally CDN)
Migration 054: Delta Networks 4/4 (hardwarenation.com + manualslib),
Siemens SCALANCE 4/4 — X-200/X-300/X-500 via images.sw.cdn.siemens.com
All 14 URLs verified HTTP 200 with correct image content-type (2026-04-21).
CHANGELOG_PENDING.md updated for all 4 migrations.
2026-04-21 08:42:14 +02:00
Rene Fichtmueller
f275e94a6f
data: image coverage improvements — QCT, Allied Telesis + CHANGELOG update
...
- sql/046: QCT QuantaMesh T3048-LY8 direct image injection
- sql/050: Allied Telesis 3/3 (AT-x530-28GSX, AT-x530L-52GPX, AT-x950-28XSQ)
via alliedtelesis.com Drupal static files CDN (og:image, all 200 PNG)
- CHANGELOG: document filter pattern fixes + all 6 new vendor migrations
(Moxa/UfiSpace/Brocade/NVIDIA/Allied Telesis — 19 new images total)
2026-04-21 08:11:57 +02:00
Rene Fichtmueller
51c18212b8
fix: add image filter patterns and direct URL migrations for 6 vendors
...
- switch-image-playwright.ts + switch-image-fetcher.ts: add filter patterns
for /webimage-404/ (Netgear 404 hero), /Brand/ + /cybersecurity.png/
(Moxa brand marketing images not product photos)
- sql/047: Moxa 4/4 models — CDN getattachment paths (hotlink-protected,
Referer: moxa.com required; R2 proxy needed for production display)
- sql/048: UfiSpace 6/6 models — ufispace.com/image/<hash>/ direct PNGs;
Brocade G720+G730 — broadcom.com og:image; ICX 7850-48FS — CommScope/Ruckus
vistancenetworks.com ImageServer (rand param is cache-bust only, not auth)
- sql/049: NVIDIA SN-series 6/6 — docscontent.nvidia.com (SN2201/3700/4700)
and S3 direct (SN5400/5600); SN3750-SX via uvation reseller CDN
2026-04-21 07:57:55 +02:00
Rene Fichtmueller
07e1fc9178
data: inject Edgecore product images directly (Playwright blocked by 403)
...
Edgecore blocks headless browsers (Playwright 403) but serves og:image via
plain HTTP. 5 models resolved via direct curl extraction:
- DCS204, DCS510, DCS810, EPS203 → edge-core.com/product/<slug>/
- Minipack2 → minipack-as8000-open-modular-platform product page
AS7535/AS7726/AS7946/AS9516 not on Edgecore's public WooCommerce site.
2026-04-21 07:07:24 +02:00
Rene Fichtmueller
c4585caada
fix: Cisco 8000 builder URL + MikroTik lowercase + new vendor builders
...
URL builder fixes:
- Cisco 8000: update to new /site/us/en/ URL scheme (family page, not per-model)
- MikroTik: fix to lowercase+underscore format (was uppercase, caused 404)
- Fortinet: set to null — JS-rendered pages, all redirect to generic page
- Alcatel-Lucent Enterprise slug added to dispatcher (was missing, caused 0 hits)
- Add Quanta, Allied Telesis, Ufispace, Netgear URL builders
- NVIDIA: skip ConnectX/BlueField non-switch models
Migration 044:
- Clear 35 wrong NCS-5500 URLs from Cisco 8000-series models
- Pre-set correct 8000-series family URL for 21 models without images
2026-04-21 00:41:31 +02:00
Rene Fichtmueller
1d50fd1c8f
feat: Flexoptix order section per switch + reject generic/logo images
2026-04-20 23:31:36 +02:00
Rene Fichtmueller
8de025cbcd
fix: expand compatibility.verification_method CHECK to include vendor_compat + spec_match
2026-04-20 23:23:23 +02:00
Rene Fichtmueller
51af82bd14
fix(migration-041): use 'manual' doc_type instead of config_guide (CHECK constraint)
2026-04-20 23:17:41 +02:00
Rene Fichtmueller
983cd1bc0d
fix: add certifications column to switches table (migration 039a)
2026-04-20 23:16:52 +02:00
Rene Fichtmueller
02c8554f56
data: migration 041 — seed switch datasheets (Cisco, Arista, Juniper, NVIDIA)
...
Seeds known-good official datasheet PDFs and config guide URLs for the 24 initial
seeded switches. Directly visible in switch detail panel → Datasheets & Manuals.
2026-04-20 22:58:18 +02:00
Rene Fichtmueller
ddc49e4e2b
feat: switch facts — migration 040 seeds power/weight/certifications + dashboard shows them
...
- Migration 040: seeds rack_units, typical_power_w, max_power_w, weight_kg, certifications
for 23 initial switches (Cisco Nexus/Catalyst, Arista, Juniper, NVIDIA, Edgecore/Celestica/Asterfusion)
- Dashboard: Specifications section now shows Typical Power, Weight, Certifications (colored pills)
2026-04-20 22:56:53 +02:00
Rene Fichtmueller
7a6e60fcc6
feat: Norton-Bass Hype Cycle Engine — market_metrics seed + Bass fitting + Gartner phase detection
2026-04-18 00:09:08 +02:00
Rene Fichtmueller
75cbd7cd86
feat: stock quality schema + QSFPTEK/NADDOD v2 scrapers with real-time stock counts
...
- Migration 028 (retroactive): document warehouse columns added to stock_observations
- Migration 037: composite indexes for DISTINCT ON (transceiver_id, source_vendor_id) queries
- Migration 038: add stock_confidence (1/2/3), price_currency, price_includes_tax,
stock_vendor_ts to stock_observations + TRUNCATE test-run data
db.ts: upsertStockObservation now accepts stockConfidence, priceCurrency,
priceIncludesTax, stockVendorTs; delta detection includes quantity_available
fs-com.ts: passes stockConfidence=3 + priceCurrency=EUR + priceIncludesTax=false
qsfptek.ts v2: Phase 1 API listing + Phase 2 detail-page stock extraction
- Parses 'X in real-time stock, DATE' from product detail pages
- Writes stock_observations with confidence=2 + stockVendorTs
- Up to 500 detail pages/run at 2s rate limit
naddod.ts v2: complete rewrite from WooCommerce to Astro sitemap-based
- Discovers products via /sitemaps/products.xml (600+ products)
- URL format: /products/XXXXX.html
- Extracts 'In Stock: X' exact counts from SSR HTML
- Writes both price + stock observations (confidence 1 or 2)
2026-04-17 22:54:40 +02:00
Rene Fichtmueller
f8809d999f
feat(scraper+api): warehouse stock data pipeline — FS.com v2, SmartOptics v2, Stock API
...
Scraper changes:
- fs-com.ts v2: Playwright stealth patches + www.fs.com/de/ URL fix (de.fs.com DNS NXDOMAIN).
Extracts DE-Lager, Global-Lager, Nachlieferung, units_sold, compatible_brands, price_net.
Mac-side runner (run-fs-scraper-mac.sh) via SSH tunnel for residential IP access.
Fast-fail connectivity check on datacenter IPs that are blocked by Cloudflare.
- smartoptics.ts v2: WooCommerce REST API fallback + 8 catalog categories + relative URL fix.
Was finding only 8 products, now discovers 18+ with multi-category crawl.
DB layer:
- db.ts: add upsertStockObservation() — writes 10 new stock_observations columns
(warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, compatible_brands,
price_net, product_url, delivery dates) with dedup check.
API:
- routes/stock.ts: GET /api/stock, /api/stock/summary, /api/stock/:id
Warehouse breakdowns per transceiver/vendor with top-sellers and vendor summary.
- routes/review.ts: equivalence review queue (approve/reject/bulk-approve).
- index.ts: register /api/stock and /api/review routes.
Dashboard:
- index.html: 🏭 Stock tab with stat cards (DE-Lager, Global-Lager, Nachlieferung totals),
top-sellers table, vendor breakdown, recently-restocked events, part-number lookup.
SQL migrations:
- 034: blog-review-tag, 035: price-observations is_anomalous, 036: transceiver-equivalences.
2026-04-17 10:45:59 +02:00
Rene Fichtmueller
287eba1337
feat: 800G standards deep enrichment + Pi Starlink proxy-agent support
...
- Migration 033: comprehensive technical update for all 4x 800G standards
- 800GBASE-SR8: full optical specs (Tx OMA 2.3 dBm, Rx sens. -4.6 dBm, KP4 FEC,
MPO-16 APC, CMIS 5.2, ≤16W, 60m OM3/100m OM4, VCSEL 850nm, 53.125 GBd PAM4)
- 800GBASE-DR8: 500m SMF, EML 1310nm, 8x parallel fiber, MPO-12, -9dBm sensitivity
- 800GBASE-LR4: 2km CWDM4 WDM (1270/1290/1310/1330nm), 4x 106.25 GBd PAM4, LC duplex
- 800G-ZR (OIF-800ZR-01.0): DP-16QAM 96 GBd, 1000km EDFA, SD-FEC, 20-24W, DCO license
- Pi scraper: add optional SOCKS5 proxy via dante-server on WireGuard IP
- Enables Starlink bandwidth contribution (PROXY_AGENT=1 flag)
- Scraper routes selected jobs through Pi SOCKS5 for different IP range
2026-04-09 20:34:27 +02:00
Rene Fichtmueller
fd5308680e
feat: TIP audit fixes — Qdrant init, switches columns, verification fix, crawler live status, demo data badges
...
- Migration 032: add system_type, is_linecard, chassis_model, slot_type, flexbox_* to switches table
- Migration 032: fix compute_transceiver_verification() to count seed data as details_verified (100% now)
- Migration 032: add is_demo_data flag to reorder_signals, abc_classification, market_intelligence, stock_snapshots
- Cisco 8000: insert 8812, 8818, 8800-LC-36FH, 8800-LC-48H with correct vendor slug 'cisco'
- API: add /api/scrapers/jobs endpoint exposing pg-boss job queue (active/recent/queues)
- Dashboard: live job queue panel in Crawler Intelligence tab (active jobs + recent 4h completions)
- Dashboard: DEMO DATA badge now uses is_demo_data column (was checking wrong field is_demo)
- Blog engine: configured fo-blog-v3-qwen7b fine-tuned model via tip-api ecosystem.config.js
- Qdrant: all 6 collections created, seeded (2135 products, 29 FAQs, 39 news, 20 troubleshooting)
2026-04-09 20:29:46 +02:00
Rene Fichtmueller
0c58ea072c
data: add Cisco 8000 accuracy migration (031)
...
Documents all Cisco 8000 research findings as idempotent SQL migration:
- 8201-32FH: Silicon One Q200 (not Q100), full description, features, use_cases
- 8608: modular chassis type, corrected description
- NEW: 8812 (12-slot), 8818 (18-slot) modular chassis
- NEW: 8800-LC-36FH (36×400G QSFP-DD), 8800-LC-48H (48×100G QSFP28) line cards
flexbox_notes covers:
- IOS XR EEPROM/PID check behavior (TX laser disabled on unknown PID)
- transceiver permit pid all (per-interface, NOT global)
- DCO license required per 100G for ZR/ZR+ coherent optics (all vendors)
- No RTU license for grey non-coherent optics
- TAC refusal + PFM alarm on override
- IOS XR upgrade risk for 3rd-party optics
- DOM monitoring may show incorrect values
- Flexoptix FlexBox Cisco Systems mode = native PID, no override needed
2026-04-09 09:09:23 +02:00
Rene Fichtmueller
6fb9b6eb4f
feat(sql): migrations 026+027 for price cleanup and FS.COM EUR fix
...
026: Remove invalid price observations (sub-manufacturing-cost), disable
optictransceiver.com (domain repurposed as plant shop), fix verification
function to accept low/medium/high data_confidence values
027: Clean up FS.COM USD→EUR converted prices, force re-scrape with
new de.fs.com EUR-primary scraper
2026-04-06 02:22:00 +02:00
Rene Fichtmueller
6d7b067ca9
fix: resolve merge conflict in index.ts + add untracked blog-sll, news, sql migration
2026-04-05 11:51:07 +02:00
Rene Fichtmueller
d9f5fc253f
fix(verification): 100% Verified Badge war dramatisch zu großzügig
...
KERNPROBLEME BEHOBEN:
1. ATGBICS part_number = URL slug statt echte OEM-Nummer
extractOemPartNumber() entfernt -r-compatible-transceiver-* Suffix
+ trailing Vendor-Namen (nokia, cisco, juniper, ...)
Ergebnis: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA
2. reach_label = '' (leer) wurde als details_verified akzeptiert
IS NOT NULL erlaubt leere Strings → Fix: AND reach_label != ''
3. details_verified = true trotz garbled part_number
Neue Kriterien: NOT ILIKE '%-compatible-transceiver%'
NOT ILIKE '%-r-compatible%'
4. data_confidence Werte falsch in Funktion ('scraped_unverified' etc)
Echte Werte: low/medium/high/garbage → NOT IN ('garbage','unknown')
ERGEBNIS nach recompute_all_verification():
fully_verified: 3.654 → 581 (Badge war 6x übertrieben)
details_verified: inflated → 1.075 (korrekt)
ATGBICS Scraper:
- extractOemPartNumber() für collection und product detail pages
- detectReach() jetzt auch auf URL-slug (120km im slug → reach_label)
Price Anomaly Detection:
- API: price_anomaly field wenn max/min ratio ≥ 10x
- Dashboard: ⚠ Preisanomalie Banner mit Ratio + EUR Range
SQL 025: Part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00
Rene Fichtmueller
7db9fad108
feat: blog engine v5 — narrative control + linkedin post + min words fix
...
- STEP4b_NARRATIVE_CONTROL: new pipeline step after draft; detects wrong
narrative (technology blamed instead of processes), applies anti-FUD filter,
reality reframe ("this becomes a problem when..."), Flexoptix voice check
- System prompt: NARRATIVE CONTROL RULE added as absolute rule #1
- Gold Standard 4: corrected "compatible vs OEM" article added as reference
- Minimum words: STEP4 raised from 1500 to 2500 words (final output was 750)
- Reduction pass: 25-35% → 15-25%, target 1500-2000 words final
- STEP_LINKEDIN_POST: generates LinkedIn post ≤2800 chars (hard limit 3000);
stores in blog_drafts.linkedin_post + linkedin_char_count column
- Pipeline now 14 steps: v5-narrative-control
- Migration 024: linkedin_post + linkedin_char_count columns in blog_drafts
2026-04-04 08:30:27 +02:00
Rene Fichtmueller
5091a7b75f
feat: proxy network — geo-lookup, uptime tracking, dedup fix
...
- IP geo-lookup via ip-api.com on register/heartbeat (country_code, city)
- heartbeat_count column + uptime_pct computation on every heartbeat
- Deduplication: register returns existing token for same IP+port
- Heartbeat no longer overwrites registered IP (prevents IPv6 churn conflicts)
- Migration 023: heartbeat_count column + backfill existing nodes
2026-04-04 08:15:32 +02:00
Rene Fichtmueller
7f1c701ba1
feat: 6 prediction signal scrapers + forecast engine
...
New scrapers (all registered in pg-boss, 50 total jobs):
- sec-edgar.ts : SEC EDGAR XBRL API — hyperscaler CapEx from 10-Q/10-K
- github-signals.ts : GitHub Search/Stats API — tech adoption metrics weekly
- ebay-velocity.ts : eBay completed listings — sold count + price distribution
- ai-clusters.ts : RSS feeds (6 sources) — AI cluster & DC announcements
- distributor-leads.ts : Mouser, Digi-Key, RS Components — lead time + stock
- standards-tracker.ts : IEEE 802.3, OIF, IETF — draft/ballot/published status
New utilities:
- forecast-engine.ts : Weighted signal aggregator → demand_index + price_direction
6 signal types, 4 horizons (3/9/12/18 months), 5 technologies tracked
New DB tables (migration 022):
hyperscaler_capex, distributor_lead_times, github_tech_signals,
marketplace_velocity, ai_cluster_announcements, standards_activity,
forecast_signals
Schedules:
- EDGAR: weekly Mon 06:00
- GitHub: weekly Sun 05:00
- eBay velocity: every 12h
- AI clusters: every 4h (news-speed)
- Distributor leads: daily 03:30
- Standards: weekly Wed 04:00
- Forecast engine: daily 08:00 (after all nightly scrapers)
2026-04-02 02:02:44 +02:00
Rene Fichtmueller
39ee5e86fa
fix: procurement demo data — correct schema column names for all 4 tables
2026-04-01 23:16:50 +02:00
Rene Fichtmueller
b4692cd2a3
feat: nightly scraper window 00-08 + NAS Fearghas sync + procurement demo data
...
- All scrapers now run nightly 00:00-08:00 (staggered, every day)
- NAS sync module: rsync JSON exports + weekly pg_dump to Fearghas via WireGuard
- 07:45 daily: price_observations, switches, transceivers, signals, issues exported as JSON
- Migration 021: 200 ABC classifications, 150 reorder signals, 300 stock snapshots demo data
- 9 market intelligence entries (LightReading, FierceTelecom, Farnell, Mouser, EU TED, Arista)
- 6 lifecycle events (ZR, 800G OSFP, 100G DR4 price floor, SFP-10G-SR EOL)
2026-04-01 23:07:26 +02:00
Rene Fichtmueller
70f8fcd975
feat: product intelligence layer — eBay enricher, community issues, datasheets+manuals API
...
- Migration 020: product_issues table, condition/marketplace on price_observations, features JSONB
- eBay enricher: switch features/description/refurb prices + transceiver condition pricing
- Community issues scraper: Reddit/ServeTheHome/Arista/Cisco community bug reports
- 7 pre-seeded issues (DCS-7800R3, SG350, QFX5120, CRS326, USW-Pro etc.)
- API: /switches/:id/issues + /switches/:id/documents endpoints
- Dashboard switch modal: features from DB, description, eBay refurb price, issues+docs async
- Datasheet finder for Arista/Cisco/Juniper/HPE vendor pages
- Scheduler: 4 new jobs (ebay enrichment nightly, community issues weekly)
2026-04-01 22:46:27 +02:00
Rene Fichtmueller
49ccf9a5d2
feat: Procurement Intelligence Engine (WS0c)
...
- Migration 019: stock_snapshots, abc_classification, reorder_signals,
product_lifecycle_events, market_intelligence, crawler_llm_log tables
- Seeded 7 market intel events (OFC 2026, AWS/Azure CapEx, Coherent lead times,
EU TED tenders, ECOC 2026, IEEE 802.3df)
- Seeded 4 lifecycle events (Cisco SFP-10G-LR EOL, Juniper EOL,
400ZR ratified, 800G MSA draft)
- Crawler LLM: core.ts (Ollama-based extractor), stock-schema.ts (typed schemas
+ vendor profiles for Flexoptix/FS.com/10Gtek/ATGBICS/ProLabs/Farnell/Mouser),
validator.ts (rule-based sanity checks + cross-validation)
- market-intelligence.ts scraper: OFC/ECOC, LightReading, IEEE 802.3, EU TED,
Farnell/Mouser lead times, FierceTelecom — weekly via pg-boss
- computeAbcClassification(): dynamic A/B/C classification from price obs +
compat count + vendor breadth
- computeReorderSignals(): buy_now/wait/hold/monitor with reasons + signal strength
- API: GET /api/procurement/overview|signals|signals/:id|abc|market-intel|
stock-trends/:id|lifecycle
- Dashboard: Procurement Intel tab with Reorder Signals, ABC table,
Market Intel cards, Lifecycle Events
2026-04-01 22:04:33 +02:00
Rene Fichtmueller
7d2562af9c
fix: detect+warn garbage product names, add DB cleanup migration 018
...
- isGarbageName(): detects scraped-slugs, 'All Optical Transceivers', 'Compatible NNGbps...',
generic form-factor descriptions with no real SKU
- Panel title priority: real standard_name → part_number → description → constructed from specs
- Details warning shown when details_verified = false (amber banner)
- sql/018: marks garbage entries as data_confidence='garbage' for future DELETE
2026-04-01 21:26:13 +02:00
Rene Fichtmueller
04ec8fe69b
feat: Verified Price + 100% Verified stamp system
...
DB (017-verification-tags.sql):
- New columns: price_verified, price_verified_eur, price_verified_url, price_verified_at
- New columns: image_verified, details_verified, fully_verified, fully_verified_at
- compute_transceiver_verification(uuid): per-product verification logic
• price_verified: real scraped URL + price > 0 + observed in last 30 days
• image_verified: R2 stored OR image_url from known vendor CDNs (flexoptix.net, fs.com, etc.), no placeholder
• details_verified: product_page_url + all core fields (form_factor, speed, reach, fiber_type, part_number) populated
• fully_verified: all three true simultaneously
- recompute_all_verification(): bulk recompute, returns stats
- Initial run: 3575 price_verified, 1173 image_verified, 1380 details_verified, 258 fully_verified
- Indexes on price_verified, fully_verified for fast filtering
- v_verified_products view
API finder.ts:
- SELECT now includes all verification fields
- Response maps: price_verified, price_verified_eur, price_verified_url, image_verified, details_verified, fully_verified
API health.ts:
- verification block: counts + coverage percentages in /api/health
Dashboard Finder:
- 'Verified Price': green checkmark ✓ next to price, tooltip explains source
- '100% Verified' stamp: dark green gradient badge top of card, card gets green border
- 'price source ↗' link to original scraped URL
- Summary bar: 'X × 100% Verified · Y with verified prices'
2026-04-01 17:43:48 +02:00
Rene Fichtmueller
73ef5766e6
feat(v0.2.1): data confidence tracking + validation + blog feedback system
...
- Migration 016: data_confidence column (vendor_verified/enriched_estimated/scraped_unverified)
- Migration 015: blog_feedback table with 8 quality scores + free text
- Validation script: 8 physics-based rules (wavelength↔fiber, reach plausibility, power limits)
- Blog feedback API: POST /api/blog/:id/feedback + training data export
- FO Blog Pipeline v3: 10-step Flexoptix Style prompts (Less bullshit. More engineering.)
- Auto-fix: wavelength↔fiber mismatches corrected automatically
2026-03-31 09:12:37 +02:00
Rene Fichtmueller
aa977abc97
feat(v0.2.0): Sales Intelligence Engine — Phase 0+A
...
New API routes:
- GET /api/finder — Switch→Flexoptix transceiver finder with FlexBox coding
- GET /api/competitor-alerts — Competitor intelligence (price changes, new products, stock)
- GET /api/forecast/:technology — Sales forecast 3/9/12/18 months + buy/wait/hold signal
- POST /api/transport/plan — Transport system planner (city→city BOM with fiber providers)
New MCP tools:
- find_flexoptix_for_switch — Customer switch → Flexoptix products
- get_competitor_alerts — Competitor monitoring
- plan_transport — Network transport planning
- forecast_sales — Volume/revenue prediction
- generate_blog — Enhanced blog generation
New DB tables (migration 013):
- competitor_alerts, price_changes, flexoptix_product_map
- sales_forecasts, fiber_providers, fiber_routes, cities
- generated_datasheets, blog_series
- Views: v_price_coverage, v_image_coverage, v_switch_flexoptix_finder
Seed data (migration 014):
- 25 European cities with IX/DC locations + coordinates
- 15 fiber providers (euNetworks, Telia, DTAG, Colt, Zayo, etc.)
- 16 fiber routes with pricing (Germany focus)
Infrastructure:
- Scraper scheduler: 2h Flexoptix, 4h FS.com/Optcore (was 6-8h)
- Change detector for competitor price/stock monitoring
- Image downloader utility with coverage tracking
2026-03-31 08:51:22 +02:00
Rene Fichtmueller
5a0cbed5a2
feat: dashboard v2, blog expansion, market/cable MCP tools, switch asset scrapers, scraper utilities
2026-03-30 08:07:12 +02:00
Rene Fichtmueller
4b452ab49e
feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
...
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI
MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)
SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00
Rene Fichtmueller
122ca8444d
feat: Phase 5 — OCR pipeline + document/news search
...
Docling-powered OCR pipeline: PDF → markdown → chunks → Ollama embed → Qdrant.
News embedding seeder for news_embeddings collection.
Document and news semantic search API endpoints.
- embeddings/ocr-pipeline.ts: Docling convert → chunk → embed pipeline
- embeddings/seed-news.ts: Batch embed news_articles into Qdrant
- routes/documents.ts: POST /api/documents/process, GET /api/documents
- routes/search.ts: GET /search/documents, GET /search/news endpoints
- sql/005-documents.sql: Add chunks_count, processed_at to documents table
- Ollama + nomic-embed-text installed on Erik (CPU mode)
- 89 products + 40 datasheet chunks + 33 news articles in Qdrant
2026-03-28 00:22:01 +13:00
Rene Fichtmueller
b43bdd3060
feat: TIP Phase 0+1 — monorepo, DB schema, API, scraper engine
...
Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config
Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
- Uses /wp-json/wp/v2/product to get 2000+ product URLs
- Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
- Keyword relevance scoring for transceiver/fiber topics
- xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
2026-03-27 16:27:31 +13:00