feat: data quality panel in Crawler Intelligence tab

GET /api/scrapers/data-quality — 4 parallel queries across 200k+
transceiver_verification_evidence rows. Returns: coverage percentages
(price 62%, image 68%, details 94%, competitor 2%), all 10 evidence
types with counts + avg confidence, 17 robot/scraper contributions,
14-day daily activity time series.
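
The coverage percentages above are whole-percent shares of the transceiver total. A minimal sketch of that arithmetic follows, assuming the row field names used by the handler in this commit; `CoverageRow`, `pct`, and `buildCoverage` are hypothetical names for illustration, not part of the committed code.

```typescript
// Hypothetical shape of the single row returned by the coverage query.
interface CoverageRow {
  total_transceivers: number;
  have_price: number;
  have_image: number;
  have_details: number;
  have_competitor: number;
  quarantined: number;
}

// Share of the total, rounded to a whole percent.
// An empty catalog yields 0 instead of NaN.
function pct(count: number, total: number): number {
  return total > 0 ? Math.round((count / total) * 100) : 0;
}

// Builds the `coverage` object of the JSON response from one query row.
function buildCoverage(row: CoverageRow) {
  const total = row.total_transceivers;
  return {
    total,
    price: row.have_price,
    image: row.have_image,
    details: row.have_details,
    competitor: row.have_competitor,
    quarantined: row.quarantined,
    pricePct: pct(row.have_price, total),
    imagePct: pct(row.have_image, total),
    detailsPct: pct(row.have_details, total),
    competitorPct: pct(row.have_competitor, total),
  };
}
```

Keeping the calculation in a pure function like this would let it be unit-tested without a database connection.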

Dashboard: coverage progress bars (color-coded thresholds), evidence
type table, SVG activity sparkline, robot contributions table.
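
The two presentation rules behind the dashboard panel can be sketched as pure functions. This is an illustrative sketch only: the 80/50 cut-offs and the bar geometry (50 px chart height, 22 px step, 18 px bar width, 2 px minimum height) mirror values visible in this commit, while `coverageTone`, `sparklineBars`, and the `Bar` type are hypothetical names.

```typescript
type Tone = "green" | "amber" | "red";

// Maps a coverage percentage to a tone: 80 and above is green,
// 50 and above is amber, everything else is red.
function coverageTone(pct: number): Tone {
  if (pct >= 80) return "green";
  if (pct >= 50) return "amber";
  return "red";
}

interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Computes SVG rect geometry for a bar sparkline, oldest value first.
// Bars are scaled to the largest value and never drop below 2 px,
// so days with very low activity stay visible.
function sparklineBars(
  values: number[],
  chartHeight = 50,
  step = 22,
  barWidth = 18,
): Bar[] {
  const max = Math.max(1, ...values); // avoids division by zero
  return values.map((value, i) => {
    const height = Math.max(2, Math.round((value / max) * chartHeight));
    return { x: i * step + 1, y: chartHeight - height, width: barWidth, height };
  });
}
```

Each `Bar` maps directly onto one `<rect>` element, so the rendering code only has to serialize the numbers.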
Rene Fichtmueller 2026-05-14 16:22:25 +02:00
parent 10d13633fb
commit 4bd16af9a5
3 changed files with 233 additions and 2 deletions

@@ -1,6 +1,7 @@
# TIP Changelog
Format: `{"d":"YYYY-MM-DD","t":"TYPE","m":"Description"}`
{"d":"2026-05-14","t":"FEAT","m":"Crawler Intelligence: Data Quality panel. New GET /api/scrapers/data-quality endpoint — 4 parallel queries over 200,617 transceiver_verification_evidence rows: (1) coverage breakdown (price 11,366/18,146 = 62%, image 12,333/68%, details 17,085/94%, competitor_match 399/2%, quarantined 1,193); (2) all 10 evidence types with count + avg confidence + product count + last seen; (3) robot/scraper contributions table (17 robots ranked by output); (4) daily activity last 14 days. Dashboard Crawler Intelligence tab: new 🔬 Data Quality section with coverage progress bars (color-coded ≥80% green / ≥50% amber / red), evidence type table, SVG sparkline bar chart for 14-day activity, robot contributions table with live/stale dot indicators."}
{"d":"2026-05-14","t":"FEAT","m":"Dynamic Hype Cycle + Market Signal Engine: Hype Cycle tab is now fully data-driven. New GET /api/hype-cycle/market-signals endpoint blends 6 real data sources into a composite Market Signal Score (0-100) per technology: (1) hype_score from Norton-Bass model (30% weight), (2) hyperscaler CapEx YoY avg (Microsoft +68.8%, Alphabet +107.4%, Meta +46.8%), (3) price observation activity ratio 30d vs prior 30d, (4) AI cluster estimated transceiver demand (90d window), (5) eBay secondary market sell-through velocity, (6) internal fast-mover demand trend. Score thresholds: ≥70 green, ≥50 yellow, ≥30 orange, <30 gray. Recommendation engine: buildRecommendation(phase, signalScore, capexYoyAvg, speedGbps) maps hype phase × capex boom × speed class → Buy/Hold/Watch label with color + detail tooltip. Dashboard: Hype Cycle table shows Market Signal LIVE column (score + progress bar) + Recommendation column (emoji label, tooltip with reasoning). Market Context cards row above table shows Top Signal, CapEx Boom %, Fast Movers signal, eBay Velocity. New Hyperscaler CapEx panel (SEC filing data) + eBay Secondary Market panel at bottom of hype tab. Procurement: new 🛒 eBay Market sub-section with per-form-factor sell-through grid. All 6 queries run in parallel via Promise.all()."}
{"d":"2026-05-14","t":"FEAT","m":"Procurement tab: 2 new sections with real data. (1) 📦 Internal Demand — Flexoptix internal SKU velocity from flexoptix_internal_demand table (8,585 SKUs: 70 fast-movers 53k units/12M, 239 regular, 979 slow, 7,297 dead stock). Summary cards with trend %%. Filter by velocity class. API: GET /api/procurement/internal-demand?velocity_class=&limit=&sort=. (2) 🤖 AI Clusters — live AI datacenter announcements from ai_cluster_announcements table (396 in last 30 days). Shows estimated transceiver demand per build, MW scale, company, location, source link. Filter for entries with transceiver estimates. Stats: total announcements, MW, distinct companies, total estimated transceivers. API: GET /api/procurement/ai-clusters?days=&limit=. Replaced misleading DEMO DATA banners on Signals + ABC sections with informational note pointing to Internal Demand data."}
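
The changelog stores one JSON object per line in the `{"d","t","m"}` format stated in its header. A small reader for that format might look like the sketch below; `ChangelogEntry` and `parseChangelogLine` are hypothetical names, and the strict `YYYY-MM-DD` check is an assumption drawn from the format line rather than from any existing tooling.

```typescript
interface ChangelogEntry {
  d: string; // date, YYYY-MM-DD
  t: string; // entry type, e.g. "FEAT"
  m: string; // free-text description
}

// Parses one changelog line. Returns null for the heading, the format
// note, blank lines, malformed JSON, or objects missing a field.
function parseChangelogLine(line: string): ChangelogEntry | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("{")) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(trimmed);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { d, t, m } = raw as Record<string, unknown>;
  if (typeof d !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(d)) return null;
  if (typeof t !== "string" || typeof m !== "string") return null;
  return { d, t, m };
}
```

Returning `null` instead of throwing lets a caller map over every line of the file and simply filter out the non-entry lines.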

@@ -238,3 +238,85 @@ scraperRouter.get("/llm-insights", async (_req: Request, res: Response) => {
res.status(503).json({ success: false, error: String(err) });
}
});
// GET /api/scrapers/data-quality — Verification evidence coverage + quality metrics
scraperRouter.get("/data-quality", async (_req: Request, res: Response) => {
try {
const [coverageRows, evidenceTypes, robotActivity, dailyActivity] = await Promise.all([
// Coverage: how many transceivers have each evidence type
pool.query(`
SELECT
COUNT(DISTINCT t.id)::int AS total_transceivers,
COUNT(DISTINCT CASE WHEN e.verification_type = 'price' THEN t.id END)::int AS have_price,
COUNT(DISTINCT CASE WHEN e.verification_type = 'image' THEN t.id END)::int AS have_image,
COUNT(DISTINCT CASE WHEN e.verification_type = 'details' THEN t.id END)::int AS have_details,
COUNT(DISTINCT CASE WHEN e.verification_type = 'competitor_match' THEN t.id END)::int AS have_competitor,
COUNT(DISTINCT CASE WHEN e.verification_type = 'artifact_quarantine' THEN t.id END)::int AS quarantined
FROM transceivers t
LEFT JOIN transceiver_verification_evidence e ON e.transceiver_id = t.id
`),
// Evidence type breakdown
pool.query(`
SELECT
verification_type,
COUNT(*)::int AS cnt,
ROUND(AVG(confidence)::numeric, 3) AS avg_confidence,
COUNT(DISTINCT transceiver_id)::int AS distinct_tx,
COUNT(DISTINCT robot_name)::int AS robot_count,
MAX(created_at) AS last_seen
FROM transceiver_verification_evidence
GROUP BY verification_type
ORDER BY cnt DESC
`),
// Robot / scraper activity
pool.query(`
SELECT
robot_name,
COUNT(*)::int AS total_evidence,
COUNT(DISTINCT transceiver_id)::int AS transceivers_covered,
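COUNT(DISTINCT verification_type)::int AS types_covered,
-- Format as text: node-postgres parses DATE into a local-time JS Date by default,
-- which would reach the dashboard as a full ISO timestamp instead of YYYY-MM-DD.
to_char(MIN(created_at), 'YYYY-MM-DD') AS first_run,
to_char(MAX(created_at), 'YYYY-MM-DD') AS last_run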
FROM transceiver_verification_evidence
GROUP BY robot_name
ORDER BY total_evidence DESC
LIMIT 20
`),
// Daily activity last 14 days
pool.query(`
SELECT
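-- Text day key, so the tooltip shows YYYY-MM-DD rather than a serialized timestamp
to_char(created_at, 'YYYY-MM-DD') AS day,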
COUNT(*)::int AS evidence_added,
COUNT(DISTINCT transceiver_id)::int AS transceivers_processed
FROM transceiver_verification_evidence
WHERE created_at >= NOW() - INTERVAL '14 days'
GROUP BY day
ORDER BY day DESC
`),
]);
const cov = coverageRows.rows[0];
const total = cov.total_transceivers || 1;
res.json({
success: true,
coverage: {
total: cov.total_transceivers,
price: cov.have_price,
image: cov.have_image,
details: cov.have_details,
competitor: cov.have_competitor,
quarantined: cov.quarantined,
pricePct: Math.round((cov.have_price / total) * 100),
imagePct: Math.round((cov.have_image / total) * 100),
detailsPct: Math.round((cov.have_details / total) * 100),
competitorPct: Math.round((cov.have_competitor / total) * 100),
},
evidenceTypes: evidenceTypes.rows,
robotActivity: robotActivity.rows,
dailyActivity: dailyActivity.rows,
});
} catch (err) {
res.status(503).json({ success: false, error: String(err) });
}
});
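
The dashboard marks a robot as live when its `last_run` equals today's `YYYY-MM-DD` string. Depending on how the database driver serializes date columns, `last_run` may arrive either as a plain day or as a full ISO timestamp. The sketch below normalizes both forms before comparing; `dayKey` and `isSameUtcDay` are hypothetical helpers, and the comparison is done in UTC as an assumption.

```typescript
// Reduces a Date or an ISO-like string to its YYYY-MM-DD prefix.
function dayKey(value: string | Date): string {
  const iso = value instanceof Date ? value.toISOString() : value;
  return iso.slice(0, 10);
}

// True when lastRun falls on the same UTC calendar day as `now`.
// A missing lastRun counts as stale.
function isSameUtcDay(
  lastRun: string | Date | null | undefined,
  now: Date = new Date(),
): boolean {
  if (lastRun == null) return false;
  return dayKey(lastRun) === dayKey(now);
}
```

One limitation: a local-midnight date serialized to UTC can land on the previous calendar day, so returning the day as text from the query side would avoid that ambiguity entirely.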

@@ -1795,7 +1795,7 @@
</div>
<!-- LLM Hot Topics -->
<div style="display:grid;grid-template-columns:1fr 1fr;gap:1.5rem">
<div style="display:grid;grid-template-columns:1fr 1fr;gap:1.5rem;margin-bottom:2rem">
<div>
<h3 style="font-size:0.9rem;font-weight:700;margin-bottom:0.75rem;color:var(--text-bright)">🔥 LLM Hot Topics</h3>
<div id="cr-topics"><div style="color:var(--text-dim)">Loading…</div></div>
@@ -1805,6 +1805,15 @@
<div id="cr-kb-entries"><div style="color:var(--text-dim)">Loading…</div></div>
</div>
</div>
<!-- Data Quality Panel -->
<div id="cr-data-quality-panel">
<div style="display:flex;align-items:center;gap:0.75rem;margin-bottom:1.25rem">
<h3 style="font-size:0.9rem;font-weight:700;color:var(--text-bright)">🔬 Data Quality &amp; Verification Coverage</h3>
<button onclick="loadDataQuality()" style="margin-left:auto;font-size:0.72rem;padding:2px 10px;border-radius:4px;border:1px solid var(--border);background:var(--surface2);color:var(--text-dim);cursor:pointer">↻ Refresh</button>
</div>
<div id="cr-data-quality"><div style="color:var(--text-dim)">Loading…</div></div>
</div>
</div><!-- end tab-crawlers -->
<!-- SELFLEARNING -->
@@ -7541,6 +7550,7 @@ async function startSelflearningTrain(lane, provider, seedOnly) {
// ── CRAWLER INTELLIGENCE ────────────────────────────────────────────
async function loadCrawlerStatus() {
loadCrawlerJobs(); // load live job queue in parallel
loadDataQuality(); // load verification evidence quality panel in parallel
var token = (window.loadToken ? window.loadToken() : '') || '';
var status = null;
var insights = null;
@@ -7731,6 +7741,144 @@ async function loadCrawlerJobs() {
}
}
/* ── Data Quality (Verification Evidence) ──────────────────────────────── */
async function loadDataQuality() {
var token = (window.loadToken ? window.loadToken() : '') || '';
var el = document.getElementById('cr-data-quality');
if (!el) return;
try {
var r = await fetch('/api/scrapers/data-quality', { headers: { 'Authorization': 'Bearer ' + token } });
var d = await r.json();
if (!d.success) throw new Error(d.error || 'API error');
el.innerHTML = renderDataQuality(d);
} catch(e) {
el.innerHTML = '<div style="color:var(--text-dim);padding:0.5rem">Error loading data quality: ' + esc(e.message) + '</div>';
}
}
function renderDataQuality(d) {
var cov = d.coverage || {};
var total = cov.total || 1;
// Coverage bars
var bars = [
{ label: 'Details / Spec', key: 'detailsPct', count: cov.details, pct: cov.detailsPct, color: '#6366f1' },
{ label: 'Image', key: 'imagePct', count: cov.image, pct: cov.imagePct, color: '#3b82f6' },
{ label: 'Price', key: 'pricePct', count: cov.price, pct: cov.pricePct, color: '#22c55e' },
{ label: 'Competitor Match', key: 'competitorPct', count: cov.competitor, pct: cov.competitorPct, color: '#f59e0b' },
];
var coverageHtml = '<div style="background:var(--surface2);border:1px solid var(--border);border-radius:10px;padding:1.25rem;margin-bottom:1.5rem">'
+ '<div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:1rem">'
+ '<span style="font-size:0.82rem;font-weight:700;color:var(--text-bright)">Coverage Overview</span>'
+ '<span style="font-size:0.72rem;color:var(--text-dim)">' + (total).toLocaleString() + ' total transceivers'
+ (cov.quarantined ? ' · <span style="color:#f59e0b">' + cov.quarantined.toLocaleString() + ' quarantined</span>' : '')
+ '</span>'
+ '</div>'
+ bars.map(function(b) {
var pct = b.pct || 0;
var bgColor = pct >= 80 ? 'rgba(34,197,94,0.08)' : pct >= 50 ? 'rgba(245,158,11,0.08)' : 'rgba(239,68,68,0.08)';
return '<div style="margin-bottom:0.75rem;background:' + bgColor + ';border-radius:6px;padding:0.6rem 0.75rem">'
+ '<div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:0.3rem">'
+ '<span style="font-size:0.78rem;font-weight:600;color:var(--text-bright)">' + esc(b.label) + '</span>'
+ '<span style="font-size:0.72rem;color:var(--text-dim)">'
+ (b.count || 0).toLocaleString() + ' / ' + total.toLocaleString()
+ ' · <span style="color:' + b.color + ';font-weight:700">' + pct + '%</span>'
+ '</span>'
+ '</div>'
+ '<div style="width:100%;height:6px;background:var(--border);border-radius:3px;overflow:hidden">'
+ '<div style="width:' + Math.min(pct, 100) + '%;height:100%;background:' + b.color + ';border-radius:3px;transition:width 0.6s ease"></div>'
+ '</div></div>';
}).join('')
+ '</div>';
// Evidence type table
var typeIcons = {
price: '💶', price_unavailable: '💶❌', image: '🖼', image_unavailable: '🖼❌',
details: '📋', details_unavailable: '📋❌', competitor_match: '✅',
competitor_no_match: '❌', competitor_ambiguous: '⚠️', artifact_quarantine: '🚫'
};
var types = d.evidenceTypes || [];
var evidenceHtml = '<div style="background:var(--surface2);border:1px solid var(--border);border-radius:10px;padding:1.25rem;margin-bottom:1.5rem">'
+ '<div style="font-size:0.82rem;font-weight:700;color:var(--text-bright);margin-bottom:1rem">Evidence Type Breakdown</div>'
+ '<table style="width:100%;border-collapse:collapse;font-size:0.75rem">'
+ '<thead><tr style="background:var(--surface2)">'
+ '<th style="padding:0.4rem 0.5rem;text-align:left;color:var(--text-dim);font-weight:600">Type</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Evidence</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Products</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Avg Conf</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Last Seen</th>'
+ '</tr></thead><tbody>'
+ types.map(function(t, i) {
var icon = typeIcons[t.verification_type] || '●';
var conf = t.avg_confidence != null ? Math.round(Number(t.avg_confidence) * 100) + '%' : '—';
var confColor = t.avg_confidence == null ? 'var(--text-dim)' : t.avg_confidence >= 0.95 ? '#22c55e' : t.avg_confidence >= 0.8 ? '#f59e0b' : '#ef4444';
var last = t.last_seen ? new Date(t.last_seen).toLocaleDateString('de-DE') : '—';
var stripe = i % 2 === 1 ? 'background:var(--surface2)' : '';
return '<tr style="border-bottom:1px solid var(--border);' + stripe + '">'
+ '<td style="padding:0.4rem 0.5rem;color:var(--text-bright)">' + icon + ' <span style="font-weight:500">' + esc(t.verification_type.replace(/_/g,' ')) + '</span></td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--blue);font-weight:700;font-family:monospace">' + (t.cnt||0).toLocaleString() + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim)">' + (t.distinct_tx||0).toLocaleString() + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:' + confColor + ';font-weight:700">' + conf + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim)">' + last + '</td>'
+ '</tr>';
}).join('')
+ '</tbody></table></div>';
// Daily activity sparkline
var days = (d.dailyActivity || []).slice().reverse(); // oldest first
var maxActivity = Math.max.apply(null, days.map(function(x) { return x.evidence_added || 0; })) || 1;
var sparkH = 50;
var sparkW = Math.max(days.length * 22, 200);
var sparkBars = days.map(function(x, i) {
var h = Math.max(2, Math.round((x.evidence_added / maxActivity) * sparkH));
var dateStr = x.day;
var label = x.evidence_added.toLocaleString() + ' evidence\n' + (x.transceivers_processed||0).toLocaleString() + ' products\n' + dateStr;
var barColor = x.evidence_added > 10000 ? '#6366f1' : x.evidence_added > 1000 ? '#3b82f6' : x.evidence_added > 100 ? '#22c55e' : '#64748b';
return '<rect class="tip" data-tip="' + esc(label) + '" x="' + (i*22+1) + '" y="' + (sparkH - h) + '" width="18" height="' + h + '" rx="3" fill="' + barColor + '" />';
}).join('');
var sparkSvg = '<svg width="' + sparkW + '" height="' + sparkH + '" style="overflow:visible">' + sparkBars + '</svg>';
var activityHtml = '<div style="background:var(--surface2);border:1px solid var(--border);border-radius:10px;padding:1.25rem;margin-bottom:1.5rem">'
+ '<div style="font-size:0.82rem;font-weight:700;color:var(--text-bright);margin-bottom:0.75rem">Daily Activity (last 14 days)</div>'
+ '<div style="overflow-x:auto;padding-bottom:0.5rem">' + sparkSvg + '</div>'
+ '<div style="font-size:0.68rem;color:var(--text-dim);margin-top:0.4rem">Hover bars for details. Purple = >10k, Blue = >1k, Green = >100, Gray = low activity.</div>'
+ '</div>';
// Robot table
var robots = (d.robotActivity || []);
var robotHtml = '<div style="background:var(--surface2);border:1px solid var(--border);border-radius:10px;padding:1.25rem">'
+ '<div style="font-size:0.82rem;font-weight:700;color:var(--text-bright);margin-bottom:1rem">Scraper / Robot Contributions</div>'
+ '<table style="width:100%;border-collapse:collapse;font-size:0.72rem">'
+ '<thead><tr style="background:var(--surface2)">'
+ '<th style="padding:0.4rem 0.5rem;text-align:left;color:var(--text-dim);font-weight:600">Robot</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Evidence</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Products</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Types</th>'
+ '<th style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim);font-weight:600">Last Run</th>'
+ '</tr></thead><tbody>'
+ robots.map(function(r, i) {
var stripe = i % 2 === 1 ? 'background:rgba(255,255,255,0.02)' : '';
var isActive = r.last_run === new Date().toISOString().slice(0,10);
var dotColor = isActive ? '#22c55e' : '#64748b';
return '<tr style="border-bottom:1px solid var(--border);' + stripe + '">'
+ '<td style="padding:0.4rem 0.5rem;color:var(--text-bright);font-family:monospace;font-size:0.68rem">'
+ '<span style="display:inline-block;width:6px;height:6px;border-radius:50%;background:' + dotColor + ';margin-right:5px;vertical-align:middle"></span>'
+ esc(r.robot_name) + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--blue);font-weight:700">' + (r.total_evidence||0).toLocaleString() + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim)">' + (r.transceivers_covered||0).toLocaleString() + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:var(--text-dim)">' + (r.types_covered||0) + '</td>'
+ '<td style="padding:0.4rem 0.5rem;text-align:right;color:' + (isActive ? '#22c55e' : 'var(--text-dim)') + '">' + esc(r.last_run || '—') + '</td>'
+ '</tr>';
}).join('')
+ '</tbody></table></div>';
return coverageHtml + '<div style="display:grid;grid-template-columns:1fr 1fr;gap:1.25rem">'
+ '<div>' + evidenceHtml + activityHtml + '</div>'
+ '<div>' + robotHtml + '</div>'
+ '</div>';
}
/* ── Smart Tooltips ─────────────────────────────────────────────────────── */
function initSmartTooltips() {
var tip = document.createElement('div');