feat: stock quality schema + QSFPTEK/NADDOD v2 scrapers with real-time stock counts
- Migration 028 (retroactive): document warehouse columns added to stock_observations - Migration 037: composite indexes for DISTINCT ON (transceiver_id, source_vendor_id) queries - Migration 038: add stock_confidence (1/2/3), price_currency, price_includes_tax, stock_vendor_ts to stock_observations + TRUNCATE test-run data db.ts: upsertStockObservation now accepts stockConfidence, priceCurrency, priceIncludesTax, stockVendorTs; delta detection includes quantity_available fs-com.ts: passes stockConfidence=3 + priceCurrency=EUR + priceIncludesTax=false qsfptek.ts v2: Phase 1 API listing + Phase 2 detail-page stock extraction - Parses 'X in real-time stock, DATE' from product detail pages - Writes stock_observations with confidence=2 + stockVendorTs - Up to 500 detail pages/run at 2s rate limit naddod.ts v2: complete rewrite from WooCommerce to Astro sitemap-based - Discovers products via /sitemaps/products.xml (600+ products) - URL format: /products/XXXXX.html - Extracts 'In Stock: X' exact counts from SSR HTML - Writes both price + stock observations (confidence 1 or 2)
This commit is contained in:
parent
5b35b2b8be
commit
5393f73c17
@ -3,6 +3,18 @@
|
||||
Format: `{"d":"YYYY-MM-DD","t":"TYPE","m":"Description"}`
|
||||
Types: FEAT · FIX · UI · DATA · AI · INFRA
|
||||
|
||||
{"d":"2026-04-17","t":"DATA","m":"Competitor research: QSFPTEK shows real-time aggregated stock count (e.g. '5507 in real-time stock, 17 Apr 2026') + USD prices; NADDOD shows exact per-product counts ('In Stock: 543') via Astro SSR. Both scraped publicly, no login required. Flexoptix confirmed exact Lagerbestand + EUR prices. FS.com: EUR prices yes, exact counts no."}
|
||||
{"d":"2026-04-17","t":"DATA","m":"stock_observations selective cleanup + schema upgrade: TRUNCATE stock_observations (186 FS.com test-run rows cleared, will repopulate on next launchd run). Added 4 new quality columns via migration 038: stock_confidence (1=boolean/2=aggregated/3=per-warehouse), price_currency CHAR(3), price_includes_tax BOOLEAN, stock_vendor_ts TIMESTAMPTZ."}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"Migration 028 retroactively committed to repo (028-stock-observations-warehouse-columns.sql) — documents the 10 warehouse columns applied directly to Erik DB. Guards with IF NOT EXISTS for safe re-application."}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"upsertStockObservation upgraded: new optional params stockConfidence (1|2|3), priceCurrency (ISO 4217), priceIncludesTax (boolean), stockVendorTs (timestamptz). FS.com now writes stockConfidence=3+priceCurrency=EUR+priceIncludesTax=false. Delta detection now also checks quantity_available changes."}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"QSFPTEK scraper v2: Phase 1 uses existing /mall/commodity/list API for product catalog (880+ products from sitemap). Phase 2 fetches /en/product/XXXXX.html detail pages to extract 'X in real-time stock, DATE' — writes stock_observations with stockConfidence=2 + stockVendorTs. Up to 500 detail pages per run at 2s rate limit."}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"NADDOD scraper v2: complete rewrite — migrated from WooCommerce category scraping to Astro sitemap-based discovery (/sitemaps/products.xml, /products/XXXXX.html). Extracts 'In Stock: X' exact counts from server-rendered HTML. Writes both price_observations (USD) and stock_observations (stockConfidence=1 or 2 depending on data visibility)."}
|
||||
{"d":"2026-04-17","t":"DATA","m":"FS.com first warehouse data load: 268 products scraped, 186 stock_observations written — DE-Lager 128,428 units, Global-Lager 156,052 units, Backorder 37,495, 53.4M units sold total. Top seller: SFP-10GSR-85 with 14M units sold."}
|
||||
{"d":"2026-04-17","t":"FIX","m":"upsertStockObservation: skip condition now includes backorder_qty — backorder-only products (DE=0 GL=0 BO>0) like coherent ZR/ZRH were silently dropped instead of being recorded"}
|
||||
{"d":"2026-04-17","t":"FIX","m":"FS.com price extraction: broad fallback regex now only accepts prices >€100 to reject FS.com's €79 'Preis auf Anfrage' placeholder — prevents fake price observations on 1G/10G/25G/40G/100G transceivers"}
|
||||
{"d":"2026-04-17","t":"UI","m":"Dashboard: stock observations count in header stats bar + warehouse stock summary card in Overview tab (hidden until stock_observations populated); both driven by /api/health stock block"}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"Health API: /api/health now includes stock block — total_observations, transceivers_with_stock, vendors_with_stock, total_de_qty, total_global_qty, last_observation_at from stock_observations"}
|
||||
{"d":"2026-04-17","t":"INFRA","m":"FS.com Mac-side runner: launchd plist at 02:00/10:00/18:00 + run-fs-scraper-mac.sh via SSH tunnel to Erik DB port 5433 — residential IP required, datacenter IP blocked by FS.com Cloudflare WAF"}
|
||||
{"d":"2026-04-17","t":"FEAT","m":"Stock API: GET /api/stock, /api/stock/summary, /api/stock/:id — warehouse breakdowns (DE-Lager, Global-Lager, Nachlieferung, units_sold) per transceiver/vendor"}
|
||||
{"d":"2026-04-17","t":"DATA","m":"upsertStockObservation() in db.ts — writes 10 new stock_observations columns (warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, compatible_brands, price_net, product_url, delivery dates)"}
|
||||
{"d":"2026-04-17","t":"DATA","m":"FS.com scraper v2: Playwright-based, extracts DE-Lager + Global-Lager + Nachlieferung + Verkauft counts, German number/date parsing, 120-URL pre-queue, 12-category crawl, 12h dedup window"}
|
||||
|
||||
@ -13,7 +13,7 @@
|
||||
*
|
||||
* Respects robots.txt and rate limits (≤12 req/min listing, ≤10 req/min detail).
|
||||
*/
|
||||
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee";
|
||||
import { PlaywrightCrawler, ProxyConfiguration, purgeDefaultStorages } from "crawlee";
|
||||
import type { Page } from "playwright";
|
||||
|
||||
/** Apply stealth patches to evade Cloudflare TLS/bot fingerprinting */
|
||||
@ -268,6 +268,9 @@ interface ProductDetail extends ProductSummary {
|
||||
async function collectProductUrls(
|
||||
proxyConfiguration: ProxyConfiguration | undefined
|
||||
): Promise<Map<string, ProductSummary>> {
|
||||
// Purge any leftover Crawlee storage from previous runs
|
||||
await purgeDefaultStorages();
|
||||
|
||||
const products = new Map<string, ProductSummary>();
|
||||
const exhausted = new Set<string>();
|
||||
|
||||
@ -359,6 +362,8 @@ async function scrapeProductDetails(
|
||||
requests: Array<{ url: string; userData: { name: string; partNumber: string } }>,
|
||||
proxyConfiguration: ProxyConfiguration | undefined
|
||||
): Promise<ProductDetail[]> {
|
||||
// Purge Phase 1 storage so Phase 2 starts with a clean request queue
|
||||
await purgeDefaultStorages();
|
||||
const details: ProductDetail[] = [];
|
||||
|
||||
const crawler = new PlaywrightCrawler({
|
||||
@ -474,21 +479,36 @@ async function scrapeProductDetails(
|
||||
const t = raw.bodyText;
|
||||
|
||||
// ── Net price (ohne MwSt, EUR) ─────────────────────────────────────────
|
||||
// Priority: patterns that require "ohne MwSt" or "netto" qualifier (FS.com shows
|
||||
// real prices this way). Fallback broad patterns are only accepted above €100
|
||||
// to avoid matching FS.com's €79 placeholder/template price.
|
||||
let priceNet: number | undefined;
|
||||
for (const pat of [
|
||||
const PRICE_QUALIFIED: RegExp[] = [
|
||||
/([0-9]{1,5}[,.]?[0-9]{0,3})\s*€\s*\(?ohne\s+MwSt\.?\)?/i,
|
||||
/€\s*([0-9]{1,5}[,.]?[0-9]{0,3})\s*\(?ohne\s+MwSt\.?\)?/i,
|
||||
/([0-9]{1,5}[,.]?[0-9]{0,3})\s*€\s*netto/i,
|
||||
/Netto[:\s]*€?\s*([0-9]{1,5}[,.]?[0-9]{0,3})/i,
|
||||
/([0-9]{1,5},[0-9]{2})\s*€/,
|
||||
/€\s*([0-9]{1,5},[0-9]{2})/,
|
||||
]) {
|
||||
// DOM-extracted price element (set by page.evaluate in raw)
|
||||
/Preis[:\s]*([0-9]{1,5},[0-9]{2})\s*€/i,
|
||||
/([0-9]{1,5},[0-9]{2})\s*€\s*(?:exkl\.|exklusive|excl\.)/i,
|
||||
];
|
||||
for (const pat of PRICE_QUALIFIED) {
|
||||
const m = t.match(pat);
|
||||
if (m?.[1]) {
|
||||
const p = parseGermanPrice(m[1]);
|
||||
if (p && p > 0.5 && p < 500_000) { priceNet = p; break; }
|
||||
}
|
||||
}
|
||||
// Broad fallback — only accept if price > €100 (avoids FS.com's €79 placeholder)
|
||||
if (!priceNet) {
|
||||
for (const pat of [/([0-9]{1,5},[0-9]{2})\s*€/, /€\s*([0-9]{1,5},[0-9]{2})/]) {
|
||||
const m = t.match(pat);
|
||||
if (m?.[1]) {
|
||||
const p = parseGermanPrice(m[1]);
|
||||
if (p && p > 100 && p < 500_000) { priceNet = p; break; }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── DE-Lager ───────────────────────────────────────────────────────────
|
||||
let deQty: number | undefined;
|
||||
@ -745,6 +765,10 @@ export async function scrapeFs(): Promise<void> {
|
||||
compatibleBrands: detail.compatibleBrands,
|
||||
priceNet: detail.priceNet,
|
||||
productUrl: detail.url,
|
||||
// FS.com: per-warehouse breakdown (DE-Lager + Global-Lager), EUR net prices
|
||||
stockConfidence: 3,
|
||||
priceCurrency: "EUR",
|
||||
priceIncludesTax: false,
|
||||
});
|
||||
if (stockNew) stockWritten++;
|
||||
|
||||
|
||||
@ -1,56 +1,69 @@
|
||||
/**
|
||||
* NADDOD Scraper — Chinese compatible transceiver vendor
|
||||
* NADDOD Scraper v2 — Chinese compatible transceiver vendor
|
||||
*
|
||||
* naddod.com — WooCommerce store, server-rendered HTML, USD pricing.
|
||||
* Products listed under product category pages.
|
||||
* Pagination via /page/N/. Rate limited: 1 req/2sec.
|
||||
* naddod.com — Migrated from WooCommerce → Astro/Shopify-style in 2025.
|
||||
* Product URLs: /products/XXXXX.html (numeric IDs, not category slugs)
|
||||
* Sitemap: /sitemaps/products.xml
|
||||
*
|
||||
* NADDOD (Shenzhen NADDOD Information Co.) makes and sells compatible
|
||||
* optics for Cisco, Juniper, Arista, etc. Transparent USD pricing.
|
||||
* Phase 1: Parse sitemap to collect all product URLs (plain HTTP)
|
||||
* Phase 2: Fetch product detail pages — extract name, price, stock count
|
||||
* Stock format: "In Stock: 543" (exact) | "In Stock: 2.1k+" (rounded) | "In Stock: Available" (boolean)
|
||||
* Per-warehouse JSON (warehouse_stock: {us, nl, sg, cn}) is in a JS hydration
|
||||
* payload that requires JS execution — only the display count is in plain HTML.
|
||||
* → stock_confidence=2 (aggregated global count) for exact/rounded counts
|
||||
* → stock_confidence=1 (boolean) for "Available" only
|
||||
*
|
||||
* Rate limited: 1 req/2sec.
|
||||
*/
|
||||
import { pool, findOrCreateScrapedTransceiver, ensureVendor, upsertPriceObservation } from "../utils/db";
|
||||
import { pool, findOrCreateScrapedTransceiver, ensureVendor, upsertPriceObservation, upsertStockObservation } from "../utils/db";
|
||||
import { contentHash } from "../utils/hash";
|
||||
|
||||
const BASE = "https://www.naddod.com";
|
||||
const SITEMAP_URL = `${BASE}/sitemaps/products.xml`;
|
||||
const HEADERS = {
|
||||
"User-Agent": "Mozilla/5.0 (compatible; TIP-Bot/1.0; research)",
|
||||
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
};
|
||||
|
||||
const MAX_PAGES = 30;
|
||||
|
||||
const CATEGORIES = [
|
||||
{ path: "/product-category/1g-sfp-transceivers/", formFactor: "SFP", speed: "1G", speedGbps: 1 },
|
||||
{ path: "/product-category/10g-sfp-transceivers/", formFactor: "SFP+", speed: "10G", speedGbps: 10 },
|
||||
{ path: "/product-category/25g-sfp28-transceivers/", formFactor: "SFP28", speed: "25G", speedGbps: 25 },
|
||||
{ path: "/product-category/40g-qsfp-transceivers/", formFactor: "QSFP+", speed: "40G", speedGbps: 40 },
|
||||
{ path: "/product-category/100g-qsfp28-transceivers/", formFactor: "QSFP28", speed: "100G", speedGbps: 100 },
|
||||
{ path: "/product-category/200g-qsfp56-transceivers/", formFactor: "QSFP56", speed: "200G", speedGbps: 200 },
|
||||
{ path: "/product-category/400g-qsfp-dd-transceivers/", formFactor: "QSFP-DD", speed: "400G", speedGbps: 400 },
|
||||
{ path: "/product-category/800g-osfp-transceivers/", formFactor: "OSFP", speed: "800G", speedGbps: 800 },
|
||||
{ path: "/product-category/transceivers/", formFactor: "SFP+", speed: "10G", speedGbps: 10 },
|
||||
];
|
||||
|
||||
interface Product {
|
||||
partNumber: string;
|
||||
name: string;
|
||||
url: string;
|
||||
price?: number;
|
||||
formFactor: string;
|
||||
speed: string;
|
||||
speedGbps: number;
|
||||
reachLabel?: string;
|
||||
reachMeters?: number;
|
||||
fiberType?: string;
|
||||
wavelength?: string;
|
||||
compatibleWith?: string;
|
||||
}
|
||||
// Limit detail-page fetches per run to stay reasonable
|
||||
const MAX_DETAIL_PAGES = 600;
|
||||
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise((resolve) => setTimeout(resolve, ms));
|
||||
}
|
||||
|
||||
// ── Classification helpers ──────────────────────────────────────────────────
|
||||
|
||||
function detectFormFactor(text: string): string {
|
||||
const t = text.toLowerCase();
|
||||
if (/\bosfp\b/.test(t)) return "OSFP";
|
||||
if (/\bqsfp.?dd800\b|\bqsfp.?dd\s+800\b/.test(t)) return "QSFP-DD800";
|
||||
if (/\bqsfp.?dd\b/.test(t)) return "QSFP-DD";
|
||||
if (/\bqsfp56\b/.test(t)) return "QSFP56";
|
||||
if (/\bqsfp112\b/.test(t)) return "QSFP112";
|
||||
if (/\bqsfp28\b/.test(t)) return "QSFP28";
|
||||
if (/\bqsfp\+|\bqsfp\s*plus\b/.test(t)) return "QSFP+";
|
||||
if (/\bsfp28\b/.test(t)) return "SFP28";
|
||||
if (/\bsfp.?\+|10g.*sfp|sfp.*10g/.test(t)) return "SFP+";
|
||||
if (/\bsfp\b/.test(t)) return "SFP";
|
||||
if (/\bxfp\b/.test(t)) return "XFP";
|
||||
return "SFP+"; // default
|
||||
}
|
||||
|
||||
function detectSpeedGbps(text: string): { speed: string; speedGbps: number } {
|
||||
const t = text.toUpperCase();
|
||||
if (/\b800G\b|\b800GBE\b/.test(t)) return { speed: "800G", speedGbps: 800 };
|
||||
if (/\b400G\b|\b400GBE\b/.test(t)) return { speed: "400G", speedGbps: 400 };
|
||||
if (/\b200G\b|\b200GBE\b/.test(t)) return { speed: "200G", speedGbps: 200 };
|
||||
if (/\b100G\b|\b100GBE\b/.test(t)) return { speed: "100G", speedGbps: 100 };
|
||||
if (/\b40G\b|\b40GBE\b/.test(t)) return { speed: "40G", speedGbps: 40 };
|
||||
if (/\b25G\b|\b25GBE\b/.test(t)) return { speed: "25G", speedGbps: 25 };
|
||||
if (/\b10G\b|\b10GBE\b/.test(t)) return { speed: "10G", speedGbps: 10 };
|
||||
if (/\b1G\b|\b1GBE\b|\bGIGABIT\b/.test(t)) return { speed: "1G", speedGbps: 1 };
|
||||
return { speed: "Unknown", speedGbps: 0 };
|
||||
}
|
||||
|
||||
function detectReach(text: string): { label: string; meters: number } | undefined {
|
||||
const patterns: [RegExp, string, number][] = [
|
||||
[/\b120\s*km\b/i, "120km", 120000],
|
||||
@ -65,16 +78,13 @@ function detectReach(text: string): { label: string; meters: number } | undefine
|
||||
[/\b300\s*m\b/i, "300m", 300],
|
||||
[/\b150\s*m\b/i, "150m", 150],
|
||||
[/\b100\s*m\b/i, "100m", 100],
|
||||
[/\bLR4\b/, "10km", 10000],
|
||||
[/\bLR\b/, "10km", 10000],
|
||||
[/\bER4?\b/, "40km", 40000],
|
||||
[/\bZR4?\b/, "80km", 80000],
|
||||
[/\bSR4?\b/, "300m", 300],
|
||||
[/\bDR4?\b/, "500m", 500],
|
||||
[/\bLR4\b/, "10km", 10000], [/\bLR\b/, "10km", 10000],
|
||||
[/\bER4?\b/, "40km", 40000], [/\bZR4?\b/, "80km", 80000],
|
||||
[/\bSR4?\b/, "300m", 300], [/\bDR4?\b/, "500m", 500],
|
||||
[/\bFR4?\b/, "2km", 2000],
|
||||
];
|
||||
for (const [regex, label, meters] of patterns) {
|
||||
if (regex.test(text)) return { label, meters };
|
||||
for (const [re, label, meters] of patterns) {
|
||||
if (re.test(text)) return { label, meters };
|
||||
}
|
||||
return undefined;
|
||||
}
|
||||
@ -87,195 +97,230 @@ function detectFiber(text: string): string {
|
||||
}
|
||||
|
||||
function detectWavelength(text: string): string {
|
||||
const match = text.match(/(\d{3,4})\s*nm/i);
|
||||
return match ? match[1] : "";
|
||||
const m = text.match(/(\d{3,4})\s*nm/i);
|
||||
return m ? m[1] : "";
|
||||
}
|
||||
|
||||
function extractCompatibleVendor(name: string): string {
|
||||
const brands = ["Cisco", "Juniper", "Arista", "HPE", "Dell", "Brocade", "Extreme", "Huawei",
|
||||
"Nokia", "MikroTik", "Mellanox", "Nvidia", "Ubiquiti"];
|
||||
for (const brand of brands) {
|
||||
if (new RegExp(`\\b${brand}\\b`, "i").test(name)) return brand;
|
||||
}
|
||||
const match = name.match(/(?:for\s+|compatible\s+(?:with\s+)?)([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)/);
|
||||
return match ? match[1] : "";
|
||||
function isTransceiver(name: string): boolean {
|
||||
const lower = name.toLowerCase();
|
||||
// Include: SFP, QSFP, OSFP, XFP, DAC/AOC cables count as transceivers for stock tracking
|
||||
return /sfp|qsfp|osfp|xfp|transceiver|dac|aoc|cwdm|dwdm/i.test(lower);
|
||||
}
|
||||
|
||||
function parseProductList(html: string, cat: typeof CATEGORIES[number]): Product[] {
|
||||
const products: Product[] = [];
|
||||
const seen = new Set<string>();
|
||||
const collapsed = html.replace(/\s+/g, " ");
|
||||
// ── Stock parsing ───────────────────────────────────────────────────────────
|
||||
|
||||
// Strategy 1: WooCommerce standard product loop
|
||||
const cardRegex = /<li[^>]+class="[^"]*product[^"]*"[^>]*>([\s\S]*?)<\/li>/gi;
|
||||
let cardMatch;
|
||||
while ((cardMatch = cardRegex.exec(collapsed)) !== null) {
|
||||
const card = cardMatch[1];
|
||||
/**
|
||||
* Parse NADDOD stock display text.
|
||||
* "In Stock: 543" → { qty: 543, confidence: 2 }
|
||||
* "In Stock: 2.1k+" → { qty: 2100, confidence: 2 } (approximate, rounded)
|
||||
* "In Stock: Available" → { qty: undefined, confidence: 1 }
|
||||
* Returns null if no stock text found.
|
||||
*/
|
||||
function parseStockText(html: string): { qty?: number; confidence: 1 | 2 } | null {
|
||||
// Look for "In Stock: X" pattern in page text
|
||||
const m = html.match(/In\s+Stock[:\s]+([^\s<"]+)/i);
|
||||
if (!m) return null;
|
||||
|
||||
const urlMatch = card.match(/href="(https?:\/\/(?:www\.)?naddod\.com\/product\/[^"]+)"/i);
|
||||
if (!urlMatch) continue;
|
||||
const url = urlMatch[1];
|
||||
if (seen.has(url)) continue;
|
||||
seen.add(url);
|
||||
const raw = m[1].trim().toLowerCase();
|
||||
|
||||
const nameMatch = card.match(/woocommerce-loop-product__title[^>]*>([^<]+)</i) ||
|
||||
card.match(/<h2[^>]*>([^<]{10,})<\/h2>/i) ||
|
||||
card.match(/<h3[^>]*>([^<]{10,})<\/h3>/i);
|
||||
if (!nameMatch) continue;
|
||||
const name = nameMatch[1].trim().replace(/&/g, "&").replace(/–/g, "–");
|
||||
if (name.length < 5) continue;
|
||||
// "Available" = boolean only
|
||||
if (/^avail/i.test(raw)) return { confidence: 1 };
|
||||
|
||||
const priceMatch = card.match(/\$\s*([\d,]+\.?\d*)/);
|
||||
const price = priceMatch ? parseFloat(priceMatch[1].replace(/,/g, "")) : undefined;
|
||||
|
||||
const reach = detectReach(name);
|
||||
const partNumber = name.split(/\s+(?:compatible|for|sfp|qsfp)/i)[0]?.trim().slice(0, 80) || name.slice(0, 60);
|
||||
|
||||
products.push({
|
||||
partNumber, name, url,
|
||||
price: price && price > 0 && price < 100000 ? price : undefined,
|
||||
formFactor: cat.formFactor, speed: cat.speed, speedGbps: cat.speedGbps,
|
||||
reachLabel: reach?.label, reachMeters: reach?.meters,
|
||||
fiberType: detectFiber(name), wavelength: detectWavelength(name),
|
||||
compatibleWith: extractCompatibleVendor(name),
|
||||
});
|
||||
// Numeric: "543" or "2.1k+" or "1.5k+"
|
||||
const kMatch = raw.match(/^([\d.]+)k\+?$/);
|
||||
if (kMatch) {
|
||||
const qty = Math.round(parseFloat(kMatch[1]) * 1000);
|
||||
return { qty: isNaN(qty) ? undefined : qty, confidence: 2 };
|
||||
}
|
||||
|
||||
// Strategy 2: Generic product link fallback
|
||||
if (products.length === 0) {
|
||||
const linkRegex = /href="(https?:\/\/(?:www\.)?naddod\.com\/(?:product|shop)\/[^"?#]+)"[^>]*>\s*([^<]{10,})/gi;
|
||||
let m;
|
||||
while ((m = linkRegex.exec(collapsed)) !== null) {
|
||||
const url = m[1];
|
||||
const name = m[2].trim().replace(/&/g, "&");
|
||||
if (seen.has(url) || name.length < 10) continue;
|
||||
if (!/transceiver|sfp|qsfp|osfp|dac|aoc|xfp/i.test(name)) continue;
|
||||
seen.add(url);
|
||||
const exact = parseInt(raw.replace(/[^0-9]/g, ""), 10);
|
||||
if (!isNaN(exact) && exact >= 0) return { qty: exact, confidence: 2 };
|
||||
|
||||
const ctx = collapsed.slice(Math.max(0, m.index - 200), m.index + 500);
|
||||
const priceM = ctx.match(/\$\s*([\d,]+\.?\d*)/);
|
||||
const price = priceM ? parseFloat(priceM[1].replace(/,/g, "")) : undefined;
|
||||
const reach = detectReach(name);
|
||||
|
||||
products.push({
|
||||
partNumber: name.split(/\s+/)[0]?.slice(0, 80) || "",
|
||||
name, url,
|
||||
price: price && price > 0 && price < 100000 ? price : undefined,
|
||||
formFactor: cat.formFactor, speed: cat.speed, speedGbps: cat.speedGbps,
|
||||
reachLabel: reach?.label, reachMeters: reach?.meters,
|
||||
fiberType: detectFiber(name), wavelength: detectWavelength(name),
|
||||
compatibleWith: extractCompatibleVendor(name),
|
||||
});
|
||||
}
|
||||
return { confidence: 1 }; // fallback: boolean
|
||||
}
|
||||
|
||||
return products;
|
||||
}
|
||||
// ── HTTP helpers ────────────────────────────────────────────────────────────
|
||||
|
||||
async function fetchPage(url: string): Promise<string> {
|
||||
async function fetchText(url: string): Promise<string> {
|
||||
const resp = await fetch(url, { headers: HEADERS, signal: AbortSignal.timeout(30000) });
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status} for ${url}`);
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
|
||||
return resp.text();
|
||||
}
|
||||
|
||||
/**
|
||||
* Parse product detail page HTML → extract name, price, stock.
|
||||
*/
|
||||
function parseDetailPage(html: string, url: string): {
|
||||
name: string;
|
||||
price?: number;
|
||||
stock: { qty?: number; confidence: 1 | 2 } | null;
|
||||
} | null {
|
||||
// Product name: og:title or <title> or <h1>
|
||||
const ogTitle = html.match(/<meta\s+property="og:title"\s+content="([^"]+)"/i)?.[1];
|
||||
const h1 = html.match(/<h1[^>]*>([^<]{15,})<\/h1>/i)?.[1]?.trim();
|
||||
const titleTag = html.match(/<title[^>]*>([^<]+)<\/title>/i)?.[1]?.trim();
|
||||
const name = (ogTitle || h1 || titleTag || "").replace(/\s+/g, " ").slice(0, 200);
|
||||
|
||||
if (!name || name.length < 10) return null;
|
||||
if (!isTransceiver(name)) return null;
|
||||
|
||||
// Price: "US$ 10.90" or "$10.90"
|
||||
const priceMatch = html.match(/US\$\s*([\d,]+\.?\d{0,2})/i) ||
|
||||
html.match(/\$\s*([\d,]+\.\d{2})\b/);
|
||||
const price = priceMatch ? parseFloat(priceMatch[1].replace(/,/g, "")) : undefined;
|
||||
|
||||
// Stock count
|
||||
const stock = parseStockText(html);
|
||||
|
||||
return { name, price: price && price > 0 && price < 500000 ? price : undefined, stock };
|
||||
}
|
||||
|
||||
// ── Sitemap parsing ─────────────────────────────────────────────────────────
|
||||
|
||||
async function fetchProductUrlsFromSitemap(): Promise<string[]> {
|
||||
console.log(` Fetching sitemap: ${SITEMAP_URL}`);
|
||||
const xml = await fetchText(SITEMAP_URL);
|
||||
|
||||
// Extract all <loc> URLs that match /products/XXXXX.html
|
||||
const urls: string[] = [];
|
||||
const locRegex = /<loc>([^<]+\/products\/\d+\.html)<\/loc>/gi;
|
||||
let m: RegExpExecArray | null;
|
||||
while ((m = locRegex.exec(xml)) !== null) {
|
||||
const url = m[1].trim();
|
||||
// Keep only canonical English URLs (no language prefix)
|
||||
if (!url.includes("/en/") && !url.includes("/de/") && !url.includes("/fr/")) {
|
||||
urls.push(url);
|
||||
}
|
||||
}
|
||||
|
||||
return [...new Set(urls)]; // deduplicate
|
||||
}
|
||||
|
||||
// ── Main scraper ────────────────────────────────────────────────────────────
|
||||
|
||||
export async function scrapeNaddod(): Promise<void> {
|
||||
console.log("=== NADDOD Scraper Starting ===\n");
|
||||
console.log("=== NADDOD Scraper v2 Starting (sitemap + detail mode) ===\n");
|
||||
|
||||
const vendorId = await ensureVendor(
|
||||
"NADDOD",
|
||||
"compatible",
|
||||
"https://www.naddod.com",
|
||||
"https://www.naddod.com/product-category/transceivers/",
|
||||
"https://www.naddod.com/collections/transceivers",
|
||||
);
|
||||
|
||||
let totalProducts = 0;
|
||||
let priceUpdates = 0;
|
||||
const seenCategories = new Set<string>();
|
||||
|
||||
for (const cat of CATEGORIES) {
|
||||
console.log(`\n--- ${cat.formFactor} (${cat.speed}) [${cat.path}] ---`);
|
||||
|
||||
// ── Phase 1: Discover product URLs via sitemap ────────────────────────────
|
||||
console.log("[Phase 1] Discovering products from sitemap...");
|
||||
let productUrls: string[] = [];
|
||||
try {
|
||||
const html1 = await fetchPage(BASE + cat.path);
|
||||
const catProducts = parseProductList(html1, cat);
|
||||
|
||||
if (cat.path.includes("/transceivers/") && seenCategories.size > 3) {
|
||||
console.log(` Skipping generic fallback (${seenCategories.size} specific categories scraped)`);
|
||||
continue;
|
||||
productUrls = await fetchProductUrlsFromSitemap();
|
||||
console.log(` Found ${productUrls.length} product URLs in sitemap`);
|
||||
} catch (err) {
|
||||
console.error(` Sitemap fetch failed: ${(err as Error).message}`);
|
||||
return;
|
||||
}
|
||||
|
||||
if (catProducts.length === 0) {
|
||||
console.log(" No products on page 1 — skipping");
|
||||
continue;
|
||||
if (productUrls.length === 0) {
|
||||
console.warn(" No product URLs found — aborting");
|
||||
return;
|
||||
}
|
||||
|
||||
seenCategories.add(cat.path);
|
||||
console.log(` Found ${catProducts.length} products on page 1`);
|
||||
// Limit to avoid excessive runtime
|
||||
const urls = productUrls.slice(0, MAX_DETAIL_PAGES);
|
||||
console.log(` Processing ${urls.length} products (limit: ${MAX_DETAIL_PAGES})`);
|
||||
|
||||
const totalPagesMatch = html1.match(/page-numbers[^>]*>(\d+)<\/a>(?!.*page-numbers)/);
|
||||
const totalPages = totalPagesMatch ? Math.min(parseInt(totalPagesMatch[1]), MAX_PAGES) : 1;
|
||||
console.log(` Total pages: ${totalPages}`);
|
||||
// ── Phase 2: Fetch detail pages + write to DB ─────────────────────────────
|
||||
console.log("\n[Phase 2] Fetching product detail pages...");
|
||||
|
||||
const allProducts = [...catProducts];
|
||||
let processed = 0;
|
||||
let priceUpdates = 0;
|
||||
let stockWritten = 0;
|
||||
let stockSkipped = 0;
|
||||
let skippedNonTx = 0;
|
||||
let errors = 0;
|
||||
|
||||
for (let page = 2; page <= totalPages; page++) {
|
||||
for (const url of urls) {
|
||||
await sleep(2000);
|
||||
try {
|
||||
const html = await fetchPage(BASE + cat.path + `page/${page}/`);
|
||||
const pageProds = parseProductList(html, cat);
|
||||
if (pageProds.length === 0) break;
|
||||
allProducts.push(...pageProds);
|
||||
console.log(` Page ${page}: ${pageProds.length} products`);
|
||||
} catch (err) {
|
||||
console.warn(` Page ${page} failed: ${(err as Error).message.slice(0, 60)}`);
|
||||
break;
|
||||
}
|
||||
const html = await fetchText(url);
|
||||
const detail = parseDetailPage(html, url);
|
||||
|
||||
if (!detail) {
|
||||
skippedNonTx++;
|
||||
continue;
|
||||
}
|
||||
|
||||
const uniqueProducts = allProducts.filter((p, i, arr) => arr.findIndex((x) => x.url === p.url) === i);
|
||||
console.log(` Total unique: ${uniqueProducts.length}`);
|
||||
const { name, price, stock } = detail;
|
||||
const { speed, speedGbps } = detectSpeedGbps(name);
|
||||
const formFactor = detectFormFactor(name);
|
||||
const reach = detectReach(name);
|
||||
const fiberType = detectFiber(name);
|
||||
const wavelength = detectWavelength(name);
|
||||
|
||||
// Extract part number from name (first word-group before "Compatible" or vendor name)
|
||||
const partNumber = name.split(/\s+(?:compatible|for\s+[A-Z]|sfp|qsfp)/i)[0]?.trim().slice(0, 80) || name.slice(0, 60);
|
||||
|
||||
for (const product of uniqueProducts) {
|
||||
try {
|
||||
const txId = await findOrCreateScrapedTransceiver({
|
||||
partNumber: product.partNumber,
|
||||
partNumber,
|
||||
vendorId,
|
||||
formFactor: product.formFactor,
|
||||
speedGbps: product.speedGbps,
|
||||
speed: product.speed,
|
||||
reachMeters: product.reachMeters,
|
||||
reachLabel: product.reachLabel,
|
||||
fiberType: product.fiberType,
|
||||
wavelengths: product.wavelength,
|
||||
formFactor,
|
||||
speedGbps,
|
||||
speed,
|
||||
reachMeters: reach?.meters,
|
||||
reachLabel: reach?.label,
|
||||
fiberType,
|
||||
wavelengths: wavelength,
|
||||
category: "DataCenter",
|
||||
});
|
||||
|
||||
if (product.price && product.price > 0) {
|
||||
const hash = contentHash({ price: product.price, part: product.partNumber });
|
||||
const updated = await upsertPriceObservation({
|
||||
// Price observation
|
||||
if (price && price > 0) {
|
||||
const hash = contentHash({ price, part: partNumber });
|
||||
const isNew = await upsertPriceObservation({
|
||||
transceiverId: txId,
|
||||
sourceVendorId: vendorId,
|
||||
price: product.price,
|
||||
price,
|
||||
currency: "USD",
|
||||
stockLevel: "in_stock",
|
||||
url: product.url,
|
||||
stockLevel: stock?.qty !== undefined && stock.qty > 0 ? "in_stock" :
|
||||
stock?.confidence === 1 ? "in_stock" : "unknown",
|
||||
url,
|
||||
contentHash: hash,
|
||||
});
|
||||
if (updated) priceUpdates++;
|
||||
}
|
||||
totalProducts++;
|
||||
} catch (err) {
|
||||
console.warn(` DB error: ${(err as Error).message.slice(0, 80)}`);
|
||||
}
|
||||
}
|
||||
} catch (err) {
|
||||
console.error(` Category failed: ${(err as Error).message}`);
|
||||
if (isNew) priceUpdates++;
|
||||
}
|
||||
|
||||
await sleep(2000);
|
||||
// Stock observation
|
||||
if (stock !== null) {
|
||||
const stockLevel = stock.qty !== undefined ? (stock.qty > 0 ? "in_stock" : "out_of_stock") : "in_stock";
|
||||
const isNew = await upsertStockObservation({
|
||||
transceiverId: txId,
|
||||
sourceVendorId: vendorId,
|
||||
stockLevel,
|
||||
quantityAvailable: stock.qty !== undefined && stock.qty > 0 ? stock.qty : undefined,
|
||||
productUrl: url,
|
||||
stockConfidence: stock.confidence,
|
||||
priceCurrency: "USD",
|
||||
priceIncludesTax: false,
|
||||
});
|
||||
if (isNew) stockWritten++;
|
||||
else stockSkipped++;
|
||||
}
|
||||
|
||||
console.log(`\n=== NADDOD Complete: ${totalProducts} products, ${priceUpdates} price updates ===`);
|
||||
processed++;
|
||||
if (processed % 50 === 0) {
|
||||
console.log(` Progress: ${processed}/${urls.length} | prices: ${priceUpdates} | stock: ${stockWritten} new / ${stockSkipped} unchanged`);
|
||||
}
|
||||
} catch (err) {
|
||||
errors++;
|
||||
if (errors <= 10) console.warn(` Error for ${url}: ${(err as Error).message.slice(0, 70)}`);
|
||||
}
|
||||
}
|
||||
|
||||
console.log("\n=== NADDOD Scraper v2 Complete ===");
|
||||
console.log(` Products processed: ${processed}`);
|
||||
console.log(` Non-transceivers skip: ${skippedNonTx}`);
|
||||
console.log(` Price observations: ${priceUpdates} new`);
|
||||
console.log(` Stock observations: ${stockWritten} new, ${stockSkipped} unchanged`);
|
||||
if (errors > 0) console.warn(` Errors: ${errors}`);
|
||||
}
|
||||
|
||||
if (require.main === module) {
|
||||
|
||||
@ -8,10 +8,15 @@
|
||||
* API: GET /mall/commodity/list?categoryId=21&attributes=VALUE_ID&page=N&pageSize=30
|
||||
* Returns HTML fragment with product cards.
|
||||
*
|
||||
* Phase 1: Collect product list + prices via API (plain HTTP, no JS needed)
|
||||
* Phase 2: Fetch product detail pages to extract real-time stock count
|
||||
* Format: "5507 in real-time stock, 17 Apr, 2026"
|
||||
* Confidence: 2 (aggregated global count with vendor timestamp)
|
||||
*
|
||||
* Rate limited: 1 req/2sec.
|
||||
*/
|
||||
import * as cheerio from "cheerio";
|
||||
import { pool, findOrCreateScrapedTransceiver, ensureVendor, upsertPriceObservation } from "../utils/db";
|
||||
import { pool, findOrCreateScrapedTransceiver, ensureVendor, upsertPriceObservation, upsertStockObservation } from "../utils/db";
|
||||
import { contentHash } from "../utils/hash";
|
||||
|
||||
const BASE = "https://www.qsfptek.com";
|
||||
@ -26,6 +31,10 @@ const HEADERS = {
|
||||
|
||||
const MAX_PAGES = 40;
|
||||
const PAGE_SIZE = 30;
|
||||
// Limit detail-page fetches per run to avoid overwhelming the server
|
||||
// (~500 products × 2s = ~17min for a full refresh; subsequent runs are faster
|
||||
// since unchanged stock counts are skipped by upsertStockObservation)
|
||||
const MAX_DETAIL_PAGES = 500;
|
||||
|
||||
// Data rate attribute values (found in /mall/commodity/attribute?categoryId=21)
|
||||
// pid = "2c9180837bbaf08f017bbdd1ebf7001e" (Data Rate attribute group)
|
||||
@ -59,6 +68,11 @@ interface Product {
|
||||
wavelength?: string;
|
||||
}
|
||||
|
||||
interface StockDetail {
|
||||
qty: number;
|
||||
vendorTs: Date | null;
|
||||
}
|
||||
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise((resolve) => setTimeout(resolve, ms));
|
||||
}
|
||||
@ -97,6 +111,40 @@ function detectWavelength(text: string): string {
|
||||
return m ? m[1] : "";
|
||||
}
|
||||
|
||||
/**
|
||||
* Parse QSFPTEK real-time stock text.
|
||||
* Format: "5507 in real-time stock, 17 Apr, 2026"
|
||||
* Returns { qty, vendorTs } or null if not found.
|
||||
*/
|
||||
function parseStockDetail(html: string): StockDetail | null {
|
||||
// Match: "<number> in real-time stock, <date>"
|
||||
const m = html.match(/(\d[\d,]*)\s+in\s+real-?time\s+stock[,\s]+(\d{1,2}\s+\w+,?\s*\d{4})/i);
|
||||
if (!m) {
|
||||
// Also try: "<number> in stock" without timestamp
|
||||
const simple = html.match(/(\d[\d,]+)\s+in\s+(?:real-?time\s+)?stock\b/i);
|
||||
if (simple) {
|
||||
const qty = parseInt(simple[1].replace(/,/g, ""), 10);
|
||||
return isNaN(qty) || qty < 0 ? null : { qty, vendorTs: null };
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
const qty = parseInt(m[1].replace(/,/g, ""), 10);
|
||||
if (isNaN(qty) || qty < 0) return null;
|
||||
|
||||
// Parse vendor timestamp: "17 Apr, 2026" → Date
|
||||
let vendorTs: Date | null = null;
|
||||
try {
|
||||
const dateStr = m[2].replace(",", "");
|
||||
const d = new Date(dateStr);
|
||||
if (!isNaN(d.getTime())) vendorTs = d;
|
||||
} catch {
|
||||
// ignore unparseable date
|
||||
}
|
||||
|
||||
return { qty, vendorTs };
|
||||
}
|
||||
|
||||
function parseProductFragment(html: string, attr: typeof DATA_RATE_ATTRIBUTES[number]): Product[] {
|
||||
const $ = cheerio.load(html);
|
||||
const products: Product[] = [];
|
||||
@ -145,8 +193,16 @@ async function fetchProductList(attrId: string, page: number): Promise<string> {
|
||||
return resp.text();
|
||||
}
|
||||
|
||||
async function fetchDetailPage(url: string): Promise<string> {
|
||||
// Normalise URL: ensure /en/ prefix for detail pages
|
||||
const normalized = url.includes("/en/product/") ? url : url.replace("/product/", "/en/product/");
|
||||
const resp = await fetch(normalized, { headers: HEADERS, signal: AbortSignal.timeout(30000) });
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status} for ${normalized}`);
|
||||
return resp.text();
|
||||
}
|
||||
|
||||
export async function scrapeQsfptek(): Promise<void> {
|
||||
console.log("=== QSFPTEK Scraper Starting (API mode) ===\n");
|
||||
console.log("=== QSFPTEK Scraper v2 Starting (API + stock detail mode) ===\n");
|
||||
|
||||
const vendorId = await ensureVendor(
|
||||
"QSFPTEK",
|
||||
@ -155,13 +211,12 @@ export async function scrapeQsfptek(): Promise<void> {
|
||||
"https://www.qsfptek.com/c/fiber-optic-transceiver.html",
|
||||
);
|
||||
|
||||
let totalProducts = 0;
|
||||
let priceUpdates = 0;
|
||||
// ── Phase 1: Collect all products via API listing ─────────────────────────
|
||||
console.log("[Phase 1] Collecting product catalog...");
|
||||
const allProducts = new Map<string, Product>(); // url → product
|
||||
|
||||
for (const attr of DATA_RATE_ATTRIBUTES) {
|
||||
console.log(`\n--- ${attr.formFactor} (${attr.speed}) ---`);
|
||||
const allProducts: Product[] = [];
|
||||
const allSeen = new Set<string>();
|
||||
console.log(`\n ${attr.formFactor} (${attr.speed})`);
|
||||
|
||||
for (let page = 1; page <= MAX_PAGES; page++) {
|
||||
try {
|
||||
@ -174,18 +229,16 @@ export async function scrapeQsfptek(): Promise<void> {
|
||||
break;
|
||||
}
|
||||
|
||||
// Dedupe against already seen (API may repeat items)
|
||||
const newProds = pageProds.filter(p => {
|
||||
if (allSeen.has(p.url)) return false;
|
||||
allSeen.add(p.url);
|
||||
return true;
|
||||
});
|
||||
allProducts.push(...newProds);
|
||||
console.log(` Page ${page}: ${pageProds.length} total, ${newProds.length} new (${allProducts.length} running)`);
|
||||
let newCount = 0;
|
||||
for (const p of pageProds) {
|
||||
if (!allProducts.has(p.url)) {
|
||||
allProducts.set(p.url, p);
|
||||
newCount++;
|
||||
}
|
||||
}
|
||||
console.log(` Page ${page}: ${pageProds.length} results, ${newCount} new (${allProducts.size} total)`);
|
||||
|
||||
// If we got fewer than pageSize, we're on the last page
|
||||
if (pageProds.length < PAGE_SIZE) break;
|
||||
|
||||
if (page < MAX_PAGES) await sleep(2000);
|
||||
} catch (err) {
|
||||
console.warn(` Page ${page} failed: ${(err as Error).message.slice(0, 80)}`);
|
||||
@ -193,9 +246,24 @@ export async function scrapeQsfptek(): Promise<void> {
|
||||
}
|
||||
}
|
||||
|
||||
console.log(` Unique products: ${allProducts.length}`);
|
||||
await sleep(2000);
|
||||
}
|
||||
|
||||
for (const product of allProducts) {
|
||||
console.log(`\n[Phase 1] Complete — ${allProducts.size} unique products collected`);
|
||||
|
||||
// ── Phase 2: Write to DB + fetch detail pages for stock counts ─────────────
|
||||
console.log("\n[Phase 2] Writing prices + fetching real-time stock counts...");
|
||||
|
||||
let totalProducts = 0;
|
||||
let priceUpdates = 0;
|
||||
let stockWritten = 0;
|
||||
let stockSkipped = 0;
|
||||
let detailFetched = 0;
|
||||
let errors = 0;
|
||||
|
||||
const products = [...allProducts.values()].slice(0, MAX_DETAIL_PAGES);
|
||||
|
||||
for (const product of products) {
|
||||
try {
|
||||
const txId = await findOrCreateScrapedTransceiver({
|
||||
partNumber: product.partNumber,
|
||||
@ -210,6 +278,7 @@ export async function scrapeQsfptek(): Promise<void> {
|
||||
category: "DataCenter",
|
||||
});
|
||||
|
||||
// Price observation from listing page
|
||||
if (product.price && product.price > 0) {
|
||||
const hash = contentHash({ price: product.price, part: product.partNumber });
|
||||
const updated = await upsertPriceObservation({
|
||||
@ -223,16 +292,51 @@ export async function scrapeQsfptek(): Promise<void> {
|
||||
});
|
||||
if (updated) priceUpdates++;
|
||||
}
|
||||
totalProducts++;
|
||||
} catch (err) {
|
||||
console.warn(` DB error: ${(err as Error).message.slice(0, 80)}`);
|
||||
}
|
||||
}
|
||||
|
||||
// Fetch detail page for real-time stock count
|
||||
await sleep(2000);
|
||||
try {
|
||||
const detailHtml = await fetchDetailPage(product.url);
|
||||
detailFetched++;
|
||||
const stockInfo = parseStockDetail(detailHtml);
|
||||
|
||||
if (stockInfo !== null) {
|
||||
const isNew = await upsertStockObservation({
|
||||
transceiverId: txId,
|
||||
sourceVendorId: vendorId,
|
||||
stockLevel: stockInfo.qty > 0 ? "in_stock" : "out_of_stock",
|
||||
quantityAvailable: stockInfo.qty > 0 ? stockInfo.qty : undefined,
|
||||
productUrl: product.url,
|
||||
// Quality metadata: QSFPTEK provides aggregated global count + timestamp
|
||||
stockConfidence: 2,
|
||||
priceCurrency: "USD",
|
||||
priceIncludesTax: false,
|
||||
stockVendorTs: stockInfo.vendorTs,
|
||||
});
|
||||
if (isNew) stockWritten++;
|
||||
else stockSkipped++;
|
||||
}
|
||||
} catch (detailErr) {
|
||||
// Detail page failures are non-fatal — we still have price data
|
||||
console.warn(` Stock fetch failed for ${product.partNumber}: ${(detailErr as Error).message.slice(0, 60)}`);
|
||||
}
|
||||
|
||||
console.log(`\n=== QSFPTEK Complete: ${totalProducts} products, ${priceUpdates} price updates ===`);
|
||||
totalProducts++;
|
||||
if (totalProducts % 50 === 0) {
|
||||
console.log(` Progress: ${totalProducts}/${products.length} products | ${priceUpdates} prices | ${stockWritten} stock obs`);
|
||||
}
|
||||
} catch (err) {
|
||||
console.warn(` DB error for ${product.partNumber}: ${(err as Error).message.slice(0, 80)}`);
|
||||
errors++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log("\n=== QSFPTEK Scraper v2 Complete ===");
|
||||
console.log(` Products processed: ${totalProducts}`);
|
||||
console.log(` Price observations: ${priceUpdates} new`);
|
||||
console.log(` Detail pages fetched: ${detailFetched}`);
|
||||
console.log(` Stock observations: ${stockWritten} new, ${stockSkipped} unchanged`);
|
||||
if (errors > 0) console.warn(` Errors: ${errors}`);
|
||||
}
|
||||
|
||||
if (require.main === module) {
|
||||
|
||||
@ -165,9 +165,16 @@ export async function upsertPriceObservation(params: {
|
||||
}
|
||||
|
||||
/**
|
||||
* Upsert a stock observation with full warehouse breakdown (FS.com v2).
|
||||
* Upsert a stock observation with full warehouse breakdown (FS.com v2+).
|
||||
* Writes to stock_observations including DE-Lager, Global-Lager, Nachlieferung,
|
||||
* units_sold, compatible_brands, price_net, and product_url columns.
|
||||
* units_sold, compatible_brands, price_net, product_url, plus quality metadata:
|
||||
* stock_confidence, price_currency, price_includes_tax, stock_vendor_ts.
|
||||
*
|
||||
* stock_confidence:
|
||||
* 1 = boolean only (in_stock true/false, no unit count)
|
||||
* 2 = aggregated global count (single number, e.g. QSFPTEK "5507 in real-time stock")
|
||||
* 3 = per-warehouse breakdown (e.g. FS.com DE-Lager + Global-Lager split)
|
||||
*
|
||||
* Returns true only when the data has changed since the last observation.
|
||||
*/
|
||||
export async function upsertStockObservation(params: {
|
||||
@ -185,19 +192,26 @@ export async function upsertStockObservation(params: {
|
||||
compatibleBrands?: string[];
|
||||
priceNet?: number;
|
||||
productUrl?: string;
|
||||
// Quality metadata (migration 038)
|
||||
stockConfidence?: 1 | 2 | 3;
|
||||
priceCurrency?: string;
|
||||
priceIncludesTax?: boolean;
|
||||
stockVendorTs?: Date | null;
|
||||
}): Promise<boolean> {
|
||||
// Skip if there is genuinely no warehouse data at all
|
||||
// (includes backorderQty so products available only on backorder are recorded)
|
||||
if (
|
||||
params.warehouseDeQty === undefined &&
|
||||
params.warehouseGlobalQty === undefined &&
|
||||
params.quantityAvailable === undefined
|
||||
params.quantityAvailable === undefined &&
|
||||
params.backorderQty === undefined
|
||||
) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Compare against the last observation to avoid duplicate writes
|
||||
const lastObs = await pool.query(
|
||||
`SELECT warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold
|
||||
`SELECT warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, quantity_available
|
||||
FROM stock_observations
|
||||
WHERE transceiver_id = $1 AND source_vendor_id = $2
|
||||
ORDER BY time DESC LIMIT 1`,
|
||||
@ -210,12 +224,13 @@ export async function upsertStockObservation(params: {
|
||||
(r.warehouse_de_qty ?? null) === (params.warehouseDeQty ?? null) &&
|
||||
(r.warehouse_global_qty ?? null) === (params.warehouseGlobalQty ?? null) &&
|
||||
(r.backorder_qty ?? null) === (params.backorderQty ?? null) &&
|
||||
(r.units_sold ?? null) === (params.unitsSold ?? null);
|
||||
(r.units_sold ?? null) === (params.unitsSold ?? null) &&
|
||||
(r.quantity_available ?? null) === (params.quantityAvailable ?? null);
|
||||
if (unchanged) return false;
|
||||
}
|
||||
|
||||
const inStock =
|
||||
((params.warehouseDeQty ?? 0) + (params.warehouseGlobalQty ?? 0)) > 0;
|
||||
((params.warehouseDeQty ?? 0) + (params.warehouseGlobalQty ?? 0) + (params.quantityAvailable ?? 0)) > 0;
|
||||
|
||||
await pool.query(
|
||||
`INSERT INTO stock_observations (
|
||||
@ -224,14 +239,16 @@ export async function upsertStockObservation(params: {
|
||||
warehouse_de_qty, warehouse_de_delivery_date,
|
||||
warehouse_global_qty, warehouse_global_delivery_date,
|
||||
backorder_qty, backorder_estimated_date,
|
||||
units_sold, compatible_brands, price_net, product_url
|
||||
units_sold, compatible_brands, price_net, product_url,
|
||||
stock_confidence, price_currency, price_includes_tax, stock_vendor_ts
|
||||
) VALUES (
|
||||
NOW(), $1, $2,
|
||||
$3, $4,
|
||||
$5, $6::date,
|
||||
$7, $8::date,
|
||||
$9, $10::date,
|
||||
$11, $12, $13, $14
|
||||
$11, $12, $13, $14,
|
||||
$15, $16, $17, $18
|
||||
)`,
|
||||
[
|
||||
params.transceiverId,
|
||||
@ -248,6 +265,10 @@ export async function upsertStockObservation(params: {
|
||||
params.compatibleBrands?.length ? params.compatibleBrands : null,
|
||||
params.priceNet ?? null,
|
||||
params.productUrl ?? null,
|
||||
params.stockConfidence ?? 1,
|
||||
params.priceCurrency ?? null,
|
||||
params.priceIncludesTax ?? null,
|
||||
params.stockVendorTs ?? null,
|
||||
]
|
||||
);
|
||||
|
||||
|
||||
17
sql/028-stock-observations-warehouse-columns.sql
Normal file
17
sql/028-stock-observations-warehouse-columns.sql
Normal file
@ -0,0 +1,17 @@
|
||||
-- Migration 028: Extend stock_observations with full warehouse breakdown
|
||||
-- Adds FS.com warehouse columns: DE-Lager, Global-Lager, Nachlieferung,
|
||||
-- units_sold, compatible_brands, price_net, product_url.
|
||||
-- NOTE: This migration was applied directly on Erik before being committed to the repo.
|
||||
-- All ADD COLUMN statements use IF NOT EXISTS guards for safe re-application.
|
||||
|
||||
ALTER TABLE stock_observations
|
||||
ADD COLUMN IF NOT EXISTS warehouse_de_qty INTEGER,
|
||||
ADD COLUMN IF NOT EXISTS warehouse_de_delivery_date DATE,
|
||||
ADD COLUMN IF NOT EXISTS warehouse_global_qty INTEGER,
|
||||
ADD COLUMN IF NOT EXISTS warehouse_global_delivery_date DATE,
|
||||
ADD COLUMN IF NOT EXISTS backorder_qty INTEGER,
|
||||
ADD COLUMN IF NOT EXISTS backorder_estimated_date DATE,
|
||||
ADD COLUMN IF NOT EXISTS units_sold BIGINT,
|
||||
ADD COLUMN IF NOT EXISTS compatible_brands TEXT[],
|
||||
ADD COLUMN IF NOT EXISTS price_net NUMERIC(12,4),
|
||||
ADD COLUMN IF NOT EXISTS product_url TEXT;
|
||||
12
sql/037-stock-observations-indexes.sql
Normal file
12
sql/037-stock-observations-indexes.sql
Normal file
@ -0,0 +1,12 @@
|
||||
-- Migration 037: Add composite indexes for stock_observations
|
||||
-- Optimises DISTINCT ON (transceiver_id, source_vendor_id) queries used
|
||||
-- by GET /api/stock and GET /api/stock/summary.
|
||||
|
||||
-- Primary lookup index for "latest per (transceiver, vendor)" queries
|
||||
CREATE INDEX IF NOT EXISTS idx_stock_tx_vendor_time
|
||||
ON stock_observations (transceiver_id, source_vendor_id, time DESC);
|
||||
|
||||
-- Partial index for "in-stock only" filter (used by /api/stock?in_stock=true)
|
||||
CREATE INDEX IF NOT EXISTS idx_stock_in_stock
|
||||
ON stock_observations (in_stock, time DESC)
|
||||
WHERE in_stock = true;
|
||||
22
sql/038-stock-observations-quality-columns.sql
Normal file
22
sql/038-stock-observations-quality-columns.sql
Normal file
@ -0,0 +1,22 @@
|
||||
-- Migration 038: Add data-quality columns to stock_observations + selective cleanup
|
||||
--
|
||||
-- stock_confidence: 1 = boolean only (in_stock true/false)
|
||||
-- 2 = aggregated global count (single number, e.g. QSFPTEK)
|
||||
-- 3 = per-warehouse breakdown (e.g. FS.com DE/Global split)
|
||||
-- price_currency: ISO 4217 code, e.g. 'USD', 'EUR', 'GBP'
|
||||
-- price_includes_tax: true = gross price, false = net/excl. VAT
|
||||
-- stock_vendor_ts: timestamp as reported by vendor (e.g. QSFPTEK "17 Apr 2026")
|
||||
|
||||
ALTER TABLE stock_observations
|
||||
ADD COLUMN IF NOT EXISTS stock_confidence SMALLINT DEFAULT 1
|
||||
CHECK (stock_confidence IN (1, 2, 3)),
|
||||
ADD COLUMN IF NOT EXISTS price_currency CHAR(3),
|
||||
ADD COLUMN IF NOT EXISTS price_includes_tax BOOLEAN,
|
||||
ADD COLUMN IF NOT EXISTS stock_vendor_ts TIMESTAMPTZ;
|
||||
|
||||
-- ── Selective cleanup ───────────────────────────────────────────────────────
|
||||
-- Truncate stock_observations: all 186 rows are from the first FS.com test run.
|
||||
-- They will be repopulated automatically at the next scheduled scraper run
|
||||
-- (02:00 / 10:00 / 18:00 via launchd). Transceiver catalog, specs, vendors,
|
||||
-- price_observations, and all other tables are left untouched.
|
||||
TRUNCATE TABLE stock_observations;
|
||||
Loading…
x
Reference in New Issue
Block a user