7 models: AS7312-54X, AS7312-54XS, AS7326-56X, AS7716-32X, AS7816-64X,
AS9716-32D, AS7512-32X. All from SONiC HCL Accton/Edgecore vendor.
Sources: edge-core.com official WP CDN + stordis.com + epsglobal.com.
Also adds scripts/apply-pending-migrations.sh for Erik DB catchup.
A9K-4HG-FLEX-FC, A9K-8X100G-LB-TR, A9K-4X100GE base: all share identical
hardware with previously-covered -TR/-SE siblings. FC/SE/TR are Cisco
software license suffixes on the same physical line card. Coverage: 595 → 598.
cdn.cookielaw.org logos appear as the largest DOM image on Dell/Extreme
product pages when the cookie consent overlay is present. Added to both
GENERIC_IMAGE_PATTERNS (isGenericImage filter) and img fallback skipPattern
so the next-largest actual product image can be found.
Previously: if og:image existed (even as a Dell logo URL), page.evaluate() returned
early and the img fallback was never tried. Now: meta tags are extracted first, then
isGenericImage() is checked in Node.js, and the img fallback runs if meta image is null
or generic. This allows vendors like Dell (og:image = logo) to still get product images
via the DOM fallback.
- buildAristaUrl() now extracts series prefix (7060X5-32QS → 7060x5-series)
instead of individual model URLs that lack og:image
- Strip trailing sub-variant 'A' so R3A → R3 series page
- Add uniqueKey: row.id to each request — prevents Crawlee from deduplicating
models that share the same series URL (e.g. 7060x5-series)
- For Arista: always prefer fresh builder URL over stored product_page_url
so stale individual-model URLs don't override correct series pages
switch-image-fetcher:
- Add Fortinet URL builder (11 FortiSwitch models)
- Add Quanta Cloud Technology, Allied Telesis, Ufispace, Netgear URL builders
- Fix alcatel-lucent-enterprise slug missing from URL_BUILDERS dispatcher
- Fix NVIDIA builder to skip ConnectX/BlueField adapters (not switches)
- Add aruba slug alias for hpe-aruba
health endpoint:
- Add system metrics: CPU load (1/5/15m), memory usage, disk usage
- Add load_status indicator (ok/busy/overloaded)
- Expose process RSS memory
- Used by external monitors
scripts/monitor-erik.sh:
- Cron-ready health check script for Claudi (.82) and Raspberry Pis
- Checks TIP API health endpoint (load, memory, disk, DB latency)
- Checks PM2 process state via SSH (errored/stopped detection)
- ntfy.sh push notifications (set NTFY_TOPIC env var)
- Includes systemd service + timer unit comments for auto-install
Replace placeholder models (93180YC-FX3, 7280R3A, QFX5120 — not in DB) with
the real seeded switches: N9K-C9364C, 93600CD-GX, 7060CX2-32S, QFX5130-32CD, SN5600
Seeds known-good official datasheet PDFs and config guide URLs for the 24 initial
seeded switches. Directly visible in switch detail panel → Datasheets & Manuals.
- Add flexoptix-compat.ts: maps switch models to compatible Flexoptix transceivers
via search API (vendor_compat) with form-factor fallback (spec_match)
Scheduled daily at 09:00 UTC as scrape:compat:flexoptix
- Enhance community-issues.ts: add vendor advisory sources (Cisco Field Notices,
Juniper KB, SONiC GitHub Issues) + new scrapeTransceiverCompatIssues() that
searches for switch+transceiver combination problems specifically
- Scheduler: 59 schedules, 78 workers
price?: number narrowing via typeof/!== undefined does not work for
arithmetic comparisons in TypeScript 5.9 dead code paths; use 'as number'
cast to keep the dead code compilable while the early-return guard above
prevents runtime execution entirely.
- ATGBics: update HTML parser from old card--product theme to new
card__info theme (Shopify template changed April 2026); name now
extracted from href link text instead of aria-label
- NADDOD: correct ensureVendor shop URL from /collections/transceivers
(404) to /collection/optical-transceivers
- VCELink: disable scraper — site pivoted from optical transceivers to
audio/video/cable products; all collection URLs return 404
- Replaces Playwright with pure fetch() — static HTML has prices
- Correct collection handles (compatible-transceivers-sfpp-10g etc.)
- Cookie: cart_currency=GBP forces GBP pricing from any geo-IP
- Handles 35+ pages per category × 24 products = 840+ SFP+ products
- No IP-blocking with static HTML (Playwright was the trigger)
- Adds scripts/run-atgbics-mac.sh for Mac-side runner if needed
- Fix OSFP-DR8-1.6T-FL and OSFP-2FR4-1.6T-FL: speed_gbps was 200, now 1600
→ FS.com 1.6T products now correctly match as comparables for Flexoptix O.1316T.C.05.M
- API: extend comparable price query to return comp_form_factor, comp_speed_gbps,
comp_reach_meters, comp_reach_label, comp_fiber_type, comp_wavelengths
- Dashboard: replace plain comparable price row with side-by-side spec comparison card
showing Flexoptix vs. competitor: Form Factor, Speed, Reach, Fiber, Wavelengths
with color coding (green=match, orange=mismatch) and savings badge (−45% günstiger)
- client.ts: add claude-code provider routing BLOG_LLM_PROVIDER=claude-code
to claude-bridge (flat-rate, no API billing via Claude Code subscription)
- checkHealth() now pings /health on claude-bridge for real availability check
- Default OLLAMA_LLM_MODEL changed from qwen2.5:14b to fo-blog-v5
- Dashboard: add claude-code card (EMPFOHLEN), rename fo-blog-v3 → fo-blog-v5
- loadBlogLLMStatus() handles all 3 providers: claude-code/anthropic/ollama
- Grid expanded from 3 to 4 columns to accommodate new card
- ecosystem.config.js + .env on Erik: OLLAMA_LLM_MODEL=fo-blog-v5 confirmed
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB
Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
Both generateBlog() and generateBlogManual() were calling
POST /api/blog/generate without an Authorization: Bearer header.
The requireAuth middleware correctly returned 401, which appeared
as 'Unauthorized — please log in' toast in the dashboard.
Fix: read loadToken() before each fetch and include the token in
the Authorization header. Also add r.status===401 guard to redirect
to login page when token expires, instead of showing error toast.
optcore.net blocks Erik's IP (82.165.222.127) via Cloudflare WAF.
WP REST API returns HTML block page instead of JSON → 0 product URLs
→ 0 scraped pages every run. Add SKIP_OPTCORE_SCRAPER guard matching
the existing SKIP_FS_SCRAPER pattern. Set in ecosystem.config.js on
Erik. Residential IP (Mac launchd) would be needed to use this scraper.
Crawlee's FileSystemStorage marks request URLs as HANDLED (state=4,
orderNo=null) after processing. With purgeOnStart=false these entries
persist, so on the next run crawler.run(startUrls) deduplicates them
→ requestsTotal=0 → immediate finish with 0 scraped pages.
Fix: rmSync request_queues/default/ before each makeCrawleeConfig()
call. Safe: session pool state lives in key_value_stores/, not in
request_queues/. Affects all Crawlee-based scrapers (ATGBICS, Optcore,
Switch-assets, etc.).
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.
Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser
Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
- Add global unhandledRejection handler in scheduler daemon to swallow
Crawlee's benign post-run ENOENT lock-file races (prevents process.exit(1))
- Add SKIP_FS_SCRAPER env var: skip FS.com worker on Erik where Cloudflare
WAF blocks datacenter IPs (Mac launchd handles FS.com from residential IP)
- Remove FS.COM from health monitor EXPECTED_VENDORS (skipped on Erik)
- Health monitor: extend pg-boss lookup from 12h → 26h, add completed-job
map; if job ran OK in last 26h + vendor has historical prices → mark
STABLE instead of CRITICAL (fixes ATGBICS/Fluxlight hash-dedup false positives)
- Install Playwright Chromium on Erik (fixes ATGBICS BrowserLaunchError)
- Create missing Crawlee storage dirs on Erik (storage-fs-phase1/2,
storage-ebay-transceivers) to prevent ENOENT on first Crawlee run
Pattern 1 (href→aria-label) finds 127 navigation links on GBICS BigCommerce
pages — none contain GBP prices. Pattern 2 (aria-label→href) correctly
finds 16-30 product links per category page with £XX.XX prices in aria-labels.
The fallback from P1 to P2 now triggers when P1 finds results but none
contain '£', rather than only when P1 finds 0 total results.
Previous logic fired an alert whenever prices_6h=0, even when prices
were genuinely stable (content hash dedup prevents duplicate inserts).
This caused Flexoptix, ATGBICS and others to trigger alerts every 3h
despite their scrapers running successfully.
New logic:
🔴 CRITICAL: last price > 7 days (genuine failure)
🟡 WARNING: last price 48h–7 days (possibly stale)
✅ STABLE: last price ≤48h, 0 new (prices unchanged, scraper OK)
Also shows pg-boss job state/time alongside each vendor for faster
root-cause diagnosis. Trimmed EXPECTED_VENDORS to vendors with actual
scraper implementations (removed never-scraped placeholders).
After PlaywrightCrawler.run() resolves, Crawlee's internal task loop
schedules one final _isTaskReadyFunction call that tries to read a
request queue .json file already cleaned up during processing. This
ENOENT fires as an unhandledRejection and calls process.exit(1),
aborting Phase 2 before prices are written to the database.
Added a targeted unhandledRejection handler in the require.main block
that swallows ENOENT errors from request_queues paths (benign Crawlee
cleanup race) while re-raising all other rejections.
Adds /tmp/tip-fs-scraper.lock PID file to prevent launchd from running
a second instance while the previous one is still active (e.g. 2am
schedule fires, runs past 10am when launchd fires again). Without this,
concurrent instances caused rmSync(storage-fs-phase1) in one instance
to delete the Crawlee storage dir while the other was still using it,
resulting in ENOENT crashes.
- Add makeCrawleeConfig isolation to CheerioCrawler instances
- Switch from named persistent RequestQueue to ephemeral null queues:
named queues retain 'handled' state and skip all URLs on re-runs,
causing 0 observations on every run after the first.
- Applies to both enrichSwitchFromEbay and enrichTransceiversFromEbay.
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
Crawlee Configuration with isolated localDataDirectory per scraper.
Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
concurrent pg-boss workers in the same process don't race on
the shared env var.
- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
community-issues (CheerioCrawler + RequestQueue),
edgecore (CheerioCrawler), ufispace (CheerioCrawler),
market-intelligence (CheerioCrawler).
- scheduler.ts: add withIsolatedStorage for optcore and market-intel
workers (was missing, caused storage-fs path bleed from fs scraper).
- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
satisfy vendors_type_check constraint
['manufacturer','distributor','oem','reseller','compatible'].
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)
Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
(2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR
Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)
CHANGELOG: added 6 entries for 2026-04-12
- New scrapers: fibermall.ts (WooCommerce), vcelink.ts (Shopify), opticsbay.ts (WooCommerce)
- QSFPTEK rewritten to use /mall/commodity/list API (old OpenCart /c/*.html paths gone 404)
- New: attribute-based filtering by data rate (1G/10G/25G/40G/100G/200G/400G/800G)
- Scrapes HTML fragments, extracts US$ prices and product URLs
- scheduler.ts: +3 queues/schedules/workers (fibermall, vcelink, opticsbay) → 61 total workers
- index-pi.ts: Pi fleet picks up all 3 new scrapers
Crawlee's SessionPool throws 'Could not find SDK_SESSION_POOL_STATE.json'
when initializing against a freshly-created isolated storage dir.
Setting CRAWLEE_PURGE_ON_START=1 tells Crawlee to start fresh instead
of trying to load non-existent session state — fixes FS.com and ATGBICS
crashes at the start of every 2h cycle after the dirs were cleaned up.
ProLabs uses B2B quote model - prices require reseller account and are
not shown publicly (schema.org always shows price=0.00). Fighting
CloudFront WAF with Firefox automation is pointless.
New approach:
- Sitemap-driven: downloads all 14 sitemaps to collect product URLs
- fetch-based: curl-compatible HTTP requests bypass CloudFront TLS detection
- catalog-only: writes part numbers + specs to transceivers table
- Rate-limited: 300ms between requests (~3 req/sec)
- No proxy needed: Pi nodes no longer consumed for ProLabs
Replace hard-coded purple/green colors with theme CSS variables.
Dark code blocks (#1e1e1e bg), orange accent for active borders/badges,
dark green for status text, amber for warnings — all readable on white.
Remove boss.work() registrations for lightweight fetch/cheerio scrapers
from Erik's scheduler. Pis are now the SOLE consumers of these queues:
fluxlight, gbics, optcore, champion-one, sfpcables, blueoptics, fiber24,
tscom, skylane, ascentoptics, gaotek, smartoptics, hubersuhner, news,
market-intel.
Shows active model (fo-blog-v3-qwen7b / claude-sonnet-4-6 / qwen2.5:14b),
live status from /api/blog/llm/status, ratings, config instructions,
and highlights which model is currently active.
Routes requests through CT130/131/132 proxy pool (192.168.178.77/76/74:1080)
when PROXY_URLS env var is set. Uses ProxyConfiguration from crawlee for
PlaywrightCrawler scrapers and socks-proxy-agent for fetch-based scrapers.
- Reset details_verified=false for 298 products where reach_label is empty (DB migration)
- Runtime check in dashboard: dVer requires non-empty reach_label regardless of DB flag
- comparable price query: treat reach_meters=0 same as NULL so 800G OSFP products
find FS.com equivalent prices (was blocked by reach_meters=0 != NULL shortcircuit)
- Product image area now fully clickable with vendor link overlay when product_page_url exists
- Clear wrong image for O.Czz8HG.z.R (was showing unrelated OSFP product image)
Completes training data coverage for all 8 blog types:
market_alert(2), comparison(1), technology_deep_dive(4), tutorial(3),
hype_cycle(1), buying_guide(1), migration_guide(1), new_product(1),
competitor_analysis(1) — 15 gold-standard articles total
- blog-012: technology_deep_dive — coherent vs direct-detect decision framework
- blog-013: market_alert — transceiver price cycle, when to buy
Training set now covers: market_alert(2), comparison(1), technology_deep_dive(4),
tutorial(3), hype_cycle(1), buying_guide(1), migration_guide(1) — 13 total
Adding diverse topic coverage:
- blog-008: buying_guide — OEM vs compatible real cost numbers
- blog-009: migration_guide — 100G→400G what actually breaks
- blog-010: technology_deep_dive — QSFP-DD vs OSFP form factor reality
- blog-011: tutorial — transceiver procurement checklist
All follow FO rules: no markdown headers in body, no bullet lists,
one thesis, engineer voice, ~1000 words. Total training set: 11 articles.
The generateClaude() function was recursively calling itself inside
enqueueClaude(), creating a circular Promise dependency that permanently
deadlocked the claudeQueue. Any 429 rate-limit response would poison
the queue, blocking all future Claude API calls until server restart.
Fixes:
- Split retries into claudeApiCall() which is called from enqueueClaude
(not re-entering the queue on retry = no circular dependency)
- Max 3 retries with increasing backoff (10s/30s/60s)
- Add resetClaudeQueue() exported function
- Add 15-minute auto-reset stall detection to enqueueClaude
- Expose resetClaudeQueue in POST /api/blog/llm/reset-queue endpoint
- Fix merge conflict markers in index.ts (duplicate scraperRouter import)
026: Remove invalid price observations (sub-manufacturing-cost), disable
optictransceiver.com (domain repurposed as plant shop), fix verification
function to accept low/medium/high data_confidence values
027: Clean up FS.COM USD→EUR converted prices, force re-scrape with
new de.fs.com EUR-primary scraper
EUR prices scraped verbatim from de.fs.com — no conversion needed.
USD derivation (EUR→USD) happens downstream, not EUR←USD.
Fixes price discrepancy: TIP showed USD 999×0.92=EUR 866 vs real €948 on de.fs.com.
Root cause of fake prices (e.g. 1.30 for 800G OSFP):
- parsePrice accepted any bare number without currency symbol
- Could misread stock counts, page numbers, or CSS values as prices
- Also picked the first number, not the main price
Fix:
- Require explicit currency symbol or decimal format (1234.56)
- Use the LARGEST number found in the price string
- Returns price=0 (rejected) when no valid price pattern found
- blog/generate now uses caller title when provided; falls back to template
- Migration 027: hard price floor by speed class in verification function
(no medians, no estimates — only real prices above minimum thresholds)
- Deleted 474 obviously wrong price observations (shipping costs scraped as prices)
- Price column now shows price_verified_eur (in EUR, dimmed) when street_price_usd is null
Fixes: FS.COM products showing dash while being marked fully verified
- Badge logic now requires visible price AND image_verified AND details_verified
No more badge when price displays as dash — all requirements must be visually present
Tier-1 Anthropic API has 40K TPM — with ~20K tokens per pipeline step,
concurrent calls immediately hit the limit. enqueueClaude() serializes
all generateClaude() calls so only one runs at a time, eliminating
the flood of 429-retry-429-retry loops.
- Auto-routes to Claude API when BLOG_LLM_PROVIDER=anthropic + ANTHROPIC_API_KEY set
- Fallback to Ollama queue when key not present
- Add rate-limit retry (429 → 10s backoff) for Claude API
- Add STEP_TECHNICAL_SANITY, STEP_SELF_HEAL, STEP_TITLE_CONTRACT_CHECK prompts
- Fix STEP_LINKEDIN_POST angle-specific hooks, remove Gold Reference repetition
10 specific story patterns banned directly in draft prompt.
LinkedIn banned hooks: 'Everything looks fine', 'CRC creeping', 'Same optics same setup'.
Title Contract now injected into STEP4 as binding constraint.
- 'Post on blog.fichtmueller.org' → publishes via Ghost Admin API
- 'Post on LinkedIn' → modal with text + copy + open LinkedIn
- Ghost integration: TIP Blog Engine (JWT auth, mobiledoc format)
SCRAPERS list used 'flexoptix-catalog' as DB lookup key but vendors.slug
is 'flexoptix' — no match → 0 records shown.
Fix: added dbSlug override field to SCRAPERS entries; lookup now uses
dbSlug || name so flexoptix-catalog/vendors/supported all map to
the correct 'flexoptix' slug in sourceMap.
Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock
blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'
flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)
db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
- findOrCreateScrapedTransceiver now sets image_verified=true when writing image_url
- upsertPriceObservation now sets price_verified=true on the transceiver after inserting price
- Both INSERT and UPDATE paths covered for image_verified sync
- Eliminates need for manual backfill after scraper runs
PostgreSQL max_connections was being exceeded (100/100).
- Limit pg-boss internal pool to 4 connections
- Added idle_in_transaction_session_timeout=30s to PostgreSQL config
- Already raised max_connections to 300 (container config)
System now stable at ~98/300 connections
Dead code leftover from STEP4_MASTER_DRAFT rewrite was sitting outside
any template literal, causing compilation failure. Removed duplicate
CONTEXT DATA RULES block and orphaned `{{OUTLINE}}`/`{{CONTEXT_DATA}}`
placeholders that were not wrapped in a string.
Core changes:
- HARD RULES rewritten: zero tolerance for ## headers, #### Scenario: patterns, bullet sections
- Gold article added as reference standard in STEP4_MASTER_DRAFT
- MANDATORY SECTIONS removed — replaced with continuous prose requirement
- STEP3_OUTLINE: now a flow plan (3-4 beats), not a section list
- STEP5_REALITY_INJECTION: no longer adds 'What Breaks' sections — injects into prose
- STEP9_QA_CHECK: format violations now primary HARD FAIL, above content checks
- DR4 wavelength fix: 1310nm = 0.35 dB/km (not 1550nm = 0.22 dB/km)
- Scope description fix: visual inspection tool ≠ loss measurement device
- Invented firmware version numbers now explicit HARD FAIL
token variable was undefined in loadCrawlerStatus() scope (only declared
inside IIFE auth guard, not globally). All API calls silently failed with
401. Fix: read token from localStorage at start of function, consistent
with getAuthHeaders() pattern used in all other load functions.
Replace static CSS ::after tooltips with JS-powered smart tooltips.
Tooltips now detect available space above/below and flip accordingly,
and clamp horizontally to viewport bounds. Hide on scroll.
Complete Pi scraper entry point covering all pricing, catalog, compat,
intelligence and prediction signal scrapers. Includes 5 new form-factor
coverage scrapers (comms-express, router-switch, multimode-inc,
optictransceiver, wiitek). Erik runs only API+DB, all scraping on Pis.
Add Comms-Express, Router-Switch.com, Multimode Inc, OpticTransceiver.com,
and Wiitek scrapers covering CFP2-DCO, CFP4, OSFP224, QSFP112, CXP, GBIC,
XENPAK, CSFP, SFP-DD, SFP56, QSFP56 and other previously-uncovered form
factors. Each scheduled every 8h. Worker registrations added to scheduler.
Also export db alias in utils/db.ts to fix eBay enricher + community scrapers
crashing with 'Cannot read properties of undefined (reading query)'.
- POST /api/auth/login: HMAC-SHA256 signed 7-day token, password from DASHBOARD_PASSWORD env
- GET /api/auth/verify: stateless token validation
- requireAuth middleware applied to all /api/* routes (except /api/health + /api/auth)
- /dashboard/login.html: dark TIP-themed login page with show/hide password toggle
- index.html: auth guard redirect to login + Authorization header on all api() calls
- No secrets in code — password stored in .env only
- downloadDocuments(): fetches PDFs from product_documents and documents tables
using curl, organises into switches/ transceivers/ whitepapers/ other/ subdirs
- Integrated into runNightlyNasSync() — runs after JSON exports
- rsync incremental — only new/changed files transferred
- NAS dir structure: /volume1/tip-data/datasheets/{switches,transceivers,whitepapers,other}
- max-filesize 50MB guard per file
- utils/logger.ts: minimal console-based logger (debug/info/warn/error)
used by community-issues and ebay-enricher scrapers
- scripts/pi-scraper-setup.sh: step 7 adds optional WireGuard setup
(pass WG_PRIVKEY + WG_ADDR env vars) — connects Pi to Erik for DB access
auto-detects dead ethernet and routes WG traffic via working interface
New scrapers (8):
- BlueOptics (EUR, every 4h)
- ShopFiber24 (EUR, every 4h)
- T&S Communication (USD, every 4h)
- SmartOptics (catalog, every 8h)
- HUBER+SUHNER (catalog, every 8h)
- Skylane Optics (USD, every 4h)
- AscentOptics (USD, every 4h)
- GAO Tek (USD, every 4h)
Scheduler: nightly window → 24/7 continuous (42 jobs total)
- Playwright scrapers: every 8h (FS.com, 10Gtek, ATGBICS, ProLabs)
- Fetch/Cheerio: every 4h (11 lightweight vendors)
- Flexoptix catalog: every 2h (primary price source)
- eBay enrichment: every 6h
- Compatibility matrices: every 12h
- Compute jobs: every 4h
Pi fleet: scripts/pi-scraper-setup.sh for one-command Pi node setup
Remove orphan schedules (addon/naddod/qsfptek) that had no registered workers.
Pre-create request_queues/default, datasets/default, key_value_stores/default
before each scraper run to avoid ENOENT when Crawlee tries to write lock files.
Previously missing from scheduler:
- Champion ONE, Fluxlight, GBICs, SFPCables pricing
- Juniper HCT, SONiC HCL, Ufispace, Edgecore compatibility
- Flexoptix supported vendors
- Switch assets enrichment
Full nightly sequence now covers every scraper in the fleet.
All jobs staggered with 15-30 min gaps to respect vendor rate limits.
- Rename nav tab and sub-nav from 'Procurement Intel' to 'Procurement Intelligence'
- Add data-tip tooltips to all 8 ABC table column headers
- Add title attributes to signal badges, ABC class badges, supply risk, stock/price/lead trend spans, signal strength bar
- Add hover descriptions to Market Intelligence type icons, buy signal badges, technology tags, impact horizon, source
- Add hover descriptions to Lifecycle Events type icons, buy signal badges, impact level, effective date
- Tooltips explain business meaning of every data point (e.g. ABC classification formula, demand score composition, supply risk levels)
- api() helper now parses JSON body on non-2xx responses so error.suggestion
is available in catch blocks
- runFinder() catch shows 'Switch not found' + suggestion instead of 'Error: HTTP 404'
- finder.ts: normalized search (removes hyphens/spaces) + token-based fallback
so 'sg350-28' → 'SG350-28', 'N9K-C93180' → Nexus 93180, etc.
- CHANGELOG_PENDING.md: 26 entries from v0.1.0 to today in JSON-line format
- GET /api/changelog: parses and serves entries as JSON array
- Overview tab: changelog card with type badges (FEAT/FIX/UI/DATA/AI/INFRA),
dates, show recent/all toggle
- isGarbageName(): detects scraped-slugs, 'All Optical Transceivers', 'Compatible NNGbps...',
generic form-factor descriptions with no real SKU
- Panel title priority: real standard_name → part_number → description → constructed from specs
- Details warning shown when details_verified = false (amber banner)
- sql/018: marks garbage entries as data_confidence='garbage' for future DELETE
- API: also returns comparable_prices from technically equivalent products
(same form_factor + speed_gbps + reach ±25%, different vendor, last 30 days)
- Dashboard: direct prices shown first, then separator + comparable products
- Comparable entries show vendor + exact part number scraped from their site
- Verified badge = real URL + observed within 7 days (strict)
- 100% VERIFIED bar: checkmarks now rgba(255,255,255,0.92) instead of #2d6a4f (was invisible on green bg)
- Pricing: only show prices with real URL; no MSRP/estimated fallback
- Verified badge only if observed within 7 days; older prices shown without badge
- Temperature Range: COM→'0-70°C (COM)', IND→'-40-85°C (IND)'
- GET /api/transceivers/🆔 returns competitor_prices[] from price_observations
- Detail view: verification summary bar (★ 100% VERIFIED / partial)
- Detail view: Current Prices section with vendor, price, verified badge, date, link
- Detail view: tag tooltips on vendor/category/market_status chips
- List view: new Verified column with 100% stamp or price check
- Optical Budget: TX Power Min/Max labels clarified
- Panel close button: high-contrast rgba background, 38px, bold ×, shadow
- Detail view: full product name (SKU + description) shown above image
- List view: NAME column shows part_number on line 1, descriptive name line 2
- txDescName() helper builds name from description field or constructs from specs
DB (017-verification-tags.sql):
- New columns: price_verified, price_verified_eur, price_verified_url, price_verified_at
- New columns: image_verified, details_verified, fully_verified, fully_verified_at
- compute_transceiver_verification(uuid): per-product verification logic
• price_verified: real scraped URL + price > 0 + observed in last 30 days
• image_verified: R2 stored OR image_url from known vendor CDNs (flexoptix.net, fs.com, etc.), no placeholder
• details_verified: product_page_url + all core fields (form_factor, speed, reach, fiber_type, part_number) populated
• fully_verified: all three true simultaneously
- recompute_all_verification(): bulk recompute, returns stats
- Initial run: 3575 price_verified, 1173 image_verified, 1380 details_verified, 258 fully_verified
- Indexes on price_verified, fully_verified for fast filtering
- v_verified_products view
API finder.ts:
- SELECT now includes all verification fields
- Response maps: price_verified, price_verified_eur, price_verified_url, image_verified, details_verified, fully_verified
API health.ts:
- verification block: counts + coverage percentages in /api/health
Dashboard Finder:
- 'Verified Price': green checkmark ✓ next to price, tooltip explains source
- '100% Verified' stamp: dark green gradient badge top of card, card gets green border
- 'price source ↗' link to original scraped URL
- Summary bar: 'X × 100% Verified · Y with verified prices'
- New 'Finder' tab between Switches and Blog Engine
- Search by switch model (free text), filter by speed
- Quick-access buttons: Nexus 93180, Nexus 9332D, Arista 7280R3A, Juniper QFX5120
- Results grouped by speed class (400G QSFP-DD, 100G QSFP28, etc.)
- Shows: part number, vendor, reach, fiber type, connector, verified price, stock status
- Flexoptix products highlighted with orange left border + FLEXOPTIX badge
- Buy link → flexoptix.net for each result
- Uses existing /api/finder endpoint with 33,993 compatibility entries
gatherBlogData():
- Fetches real prices from price_observations (last 30 days) per product
- Filters transceivers by speed extracted from topic keywords
- Enriches every product with verified_prices array + has_verified_price flag
- Joins DB products with vector search results (DB first — they have real prices)
contextData injection (blog.ts):
- [PRODUCT] lines: exact standard_name, form_factor, speed, reach, connector, dBm specs, Watts
- [VERIFIED PRICE] lines: real EUR/USD price, vendor, observed date, source URL
- [NO VERIFIED PRICE IN DB]: explicit tag — LLM must not invent a number
- [NO PRODUCT DATA AVAILABLE]: fallback when DB returns nothing
fo-blog-pipeline.ts system prompt:
- DATA INTEGRITY RULES block: prices/part numbers/vendors ONLY from context
- Never approximate with ~€350 or 'typically $200-600' for specific products
- Power specs only from [PRODUCT] data or REFERENCE VALUES
STEP4 context instructions:
- Explicit rules on how to use [VERIFIED PRICE] vs [NO VERIFIED PRICE]
- Invented data = HARD FAIL in QA
STEP9 QA — 3 new hard fail checks (30, 31, 32):
- Check 30: invented prices → remove or replace with flexoptix.net reference
- Check 31: invented part numbers → remove, use class name instead
- Check 32: invented vendor names → remove if not in known list
- Expanded research pool to 9 topics (was 3), evergreen to 12 (was 3)
- Conference topics: added Photonics West, CIOE, NFOEC follow-up, year-end review
- Standards topics: 3 rotating variants (IEEE tracker, SFF-8024 registry, OIF CEI-112G)
- seededShuffle(): day-of-year as seed — stable within the day, different every day
- API response adds refreshes_at (next midnight UTC) for frontend countdown
- Dashboard subtitle shows 'rotates daily · next refresh in Xh'
- Hot topic cards now pass full title + angle into generateBlog() correctly
- Fix SR4/DR4 fiber count: both use 8 fibers (4TX+4RX), difference is MMF vs SMF
- Fix power per port: 400G=10-15W/port, 800G=15-25W/port (not 1kW/port)
- Fix pricing context: always distinguish OEM ($1-5K) vs compatible ($200-600)
- Add HARD RULES 15-21: fiber count, power, pricing, no markdown headers,
max 6 sections, no repeated topics, flow over format
- Add QA CALIBRATION FAILS 16-21: same rules enforced at QA step
- Add fiber/power reference tables with correct values
- Strip markdown (##/###/####/**) from all output — plain text only
- Add Style B gold example (10/10 validated prose article)
- Update STEP5 reality injection with correct SR4→DR4 description
- Update STEP8 kill-AI-tone to strip markdown headers + merge duplicates
- CALIBRATION_GOLD_STANDARD now covers two validated styles: A (structured) and B (prose)
- Style B: no headers, no bullets, 1-3 sentence paragraphs, reframe ending
- STEP8_KILL_AI_TONE: prose conversion option for over-structured articles
- STEP4_MASTER_DRAFT: explicit style choice instruction (A vs B based on angle)
- Gold standard includes exact prose rhythm patterns from 10/10 human-reviewed article
- Wrong patterns expanded: symmetric sections, checklist endings, transition clichés
- POST /api/blog/:id/regenerate — re-runs full 10-step LLM pipeline on existing draft
- Regenerate button visible when quality_issues present or status=review
- SEO keywords now displayed as clickable #hashtags (copy-to-clipboard)
- fo-blog-pipeline: added PoE misuse, DR4 mislabeling, ZR/DR4 conflation as hard QA fails
- fo-blog-pipeline: 14 hard rules in system prompt (was 10)
- fo-blog-pipeline: CALIBRATION_GOLD_STANDARD + withCalibration() from 10/10 human review
- System prompt now includes gold standard example on every pipeline run
Hot Topics:
- Dynamic topics from /api/hot-topics loaded in Blog Engine tab
- 7 data sources (prices, competitors, hype cycle, news, conferences, research, evergreen)
- Urgency badges: BREAKING (red), HOT (orange), TRENDING (yellow), EMERGING (green)
Pipeline Lock:
- Only 1 generation at a time, 'Pipeline Busy' toast on double-click
- Progress bar with step names (external hot-topics.js, no inline hacks)
Blog Delete:
- DELETE /api/blog/:id endpoint
- Delete button (✕) on each blog in list
- 'Delete All Templates' button to clean up test drafts
Fix: dashboard JS extracted to external hot-topics.js to avoid sed quote hell
Hot Topics Engine (GET /api/hot-topics):
- 7 data sources: price movements, competitor alerts, hype cycle transitions,
news articles, conference calendar, research trends, evergreen topics
- Auto-discovers BREAKING/HOT/TRENDING/EMERGING topics
- Dashboard loads topics dynamically with urgency badges and source labels
- Click any topic → generates blog with that angle
Pipeline Lock (critical UX fix):
- Only 1 blog generation at a time (blogPipelineRunning flag)
- 'Pipeline Busy' toast if user clicks while generating
- Lock released on completion, timeout, or error
Dashboard:
- Static 3 cards replaced with dynamic hot topics grid
- 'Refresh Topics' button
- Topics show urgency color (red=breaking, orange=hot, yellow=trending, green=emerging)
- Auto-loads when Blog Engine tab opens
When you click Generate:
- Dark overlay with orange progress bar shows pipeline status
- Live step counter: 'Step 3/10: Outline Generation — decision-driven structure'
- Percentage updates every 15 seconds via API polling
- When done: shows word count + QA score, auto-opens the article
- No more silent template dump — user sees the entire pipeline working
Blog engine was falling back to template because qwen2.5:14b is on Mac Studio (.213),
not on Erik (localhost). Fixed ecosystem.config.js to use 192.168.178.213:11434.
This was the root cause why the 10-step pipeline never executed.
System prompt: 10 HARD RULES (non-negotiable, article fails QA without them)
- Mandatory WHAT BREAKS IN PRODUCTION section (2+ specific failures with symptoms/cause/fix)
- Mandatory HIDDEN COSTS section (cleaning, troubleshooting time, cabling redesign, training)
- Mandatory WHEN NOT TO USE section for every recommendation
- Absolute statement rule: NEVER without conditions/context
- Cabling reality: MPO polarity, SR4→DR4 migration, cleaning requirements
- Brutal hook requirement: not 'If you're still...' but 'You're about to sign a PO. Stop.'
- Minimum 2500 words (was 2000)
Step 5 (Reality Injection): Now checks for ALL mandatory sections and adds if missing
Step 9 (QA Check): Hard fail checks — article is NOT publishable without production failures + hidden costs
Feedback source: Human expert review scoring 7.5/10, targeting 9.5+
- GET /api/datasheets/transceiver/:id — Full datasheet with power budget, pricing, compatibility, HTML export
- GET /api/datasheets/switch/:id — Switch datasheet with compatible transceivers
- GET /api/adoption — Full technology roadmap with maturity indicators
- GET /api/adoption/:technology — Detailed adoption analysis, migration paths, risks, timelines
- All v0.2.0 routes registered in index.ts
Dashboard now uses /api/hype-cycle/enriched for forecast data.
Forecast chart: SVG area chart with adoption % per technology over 5 years.
Regional heatmap: table with market share intensity per region per tech.
Both render below the existing hype cycle table.
Premium dark UI with ambient glow background, glass cards, gradient
text, animated counters, glow effects on hype cycle dots, smooth
transitions, and improved detail panel with regional adoption and
revenue lifecycle data. Fix API data key mismatch (data vs transceivers).
Update REGIONAL_LAGS with data from LightCounting, vendor earnings,
OFC market sessions, and Chinese IPO prospectuses. Add price index
per region and segment mix (hyperscaler/telco/enterprise) for
more accurate regional revenue modeling.
findTechnology used loose includes() matching — '1.6T OSFP-XD' matched
'1G SFP' first because query contained '1'. Now matches exact name first,
then by speed prefix with proper unit parsing (G/T).
- Removed AI-slop gradient header, replaced with compact info bar
- JetBrains Mono for data, Inter for text — proper engineering tool aesthetic
- Unified slide-in detail panel for hype cycle, transceivers, and blog drafts
- Transceiver table: clickable rows open full spec sheet in side panel
- Form factor dropdown filter on transceivers tab
- Blog drafts: click to read full content in panel
- Tighter spacing, smaller fonts, higher information density
- Escape key closes panel, responsive breakpoints
- No gratuitous badges/glow — clean data presentation
Single-file dashboard with 6 tabs: Overview, Semantic Search,
Hype Cycle, Transceivers, News, Blog Engine. Dark theme, no
build step, served as static HTML from Express.
- Overview: health stats, vector collection counts, recent news
- Semantic Search: query across all 6 Qdrant collections
- Hype Cycle: Norton-Bass table with phase colors + position bars
- Transceivers: searchable table with form factor/speed/reach
- News: semantic news search with source links
- Blog: generate drafts from templates, view draft history
Live at: https://transceiver-db.context-x.org/dashboard/
Blog draft engine generates structured markdown from all Qdrant
collections (products, news, FAQ, troubleshooting). Supports 4
topic types: hype_cycle, comparison, new_product, tutorial.
- routes/blog.ts: POST /api/blog/generate, GET/PUT endpoints
- ecosystem.config.js: Added tip-scraper PM2 process
- Scraper scheduler (pg-boss) now running on Erik with 8 job queues
- News scraper running every 6 hours on Erik
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'images: ' || count(*) FROM transceivers WHERE image_url IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'connector: ' || count(*) FROM transceivers WHERE connector IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'notes: ' || count(*) FROM transceivers WHERE notes IS NOT NULL AND notes != ''" >> "$LOG" 2>&1
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'switches: ' || count(*) FROM switches" >> "$LOG" 2>&1
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'sw_desc: ' || count(*) FROM switches WHERE description IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'images: ' || count(*) FROM transceivers WHERE image_url IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'connector: ' || count(*) FROM transceivers WHERE connector IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'notes: ' || count(*) FROM transceivers WHERE notes IS NOT NULL AND notes != ''" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'switches: ' || count(*) FROM switches" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT 'sw_desc: ' || count(*) FROM switches WHERE description IS NOT NULL" >> "$LOG" 2>&1
psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -F'|' -c \
"SELECT t.id, t.product_page_url, t.part_number, t.standard_name FROM transceivers t JOIN vendors v ON t.vendor_id = v.id WHERE v.name = 'FLEXOPTIX' AND t.product_page_url IS NOT NULL"\
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) as img FROM transceivers WHERE image_url IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) as img FROM transceivers WHERE image_url IS NOT NULL" >> "$LOG" 2>&1
echo" transceivers have images" >> "$LOG"
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) FROM transceivers WHERE notes IS NOT NULL AND notes != ''" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) FROM transceivers WHERE notes IS NOT NULL AND notes != ''" >> "$LOG" 2>&1
echo" transceivers have enriched notes" >> "$LOG"
PGPASSWORD=***REDACTED*** psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) FROM transceivers WHERE connector IS NOT NULL" >> "$LOG" 2>&1
PGPASSWORD=tip_prod_2026 psql -h localhost -p 5433 -U tip -d transceiver_db -t -A -c "SELECT count(*) FROM transceivers WHERE connector IS NOT NULL" >> "$LOG" 2>&1
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.