feat: add stock observations to ATGBICS + Optcore; delete demo data

- DELETE 2133 rows from reorder_signals WHERE is_demo_data = true
- atgbics.ts: add upsertStockObservation (confidence=1, binary available boolean from Shopify API; quantityAvailable 1/0 for in/out stock)
- optcore.ts: add upsertStockObservation (confidence=1, WooCommerce text stock level parsed via parseStockLevel; quantityAvailable 1/0)
- Both scrapers already run every 2h on Erik scheduler
- FS.com: already captures full warehouse breakdown (DE+Global+backorder) 3x/day from Mac (02:00/10:00/18:00) at confidence=3; no change needed
- QSFPTEK: already captures real quantities at confidence=2; no change
- sfpcables/prolabs/wiitek: no meaningful stock signal, not modified
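
The binary quantityAvailable mapping used by both scrapers can be sketched as below. This is only an illustration of the inline expression in the diff: the `StockLevel` type and the `toBinaryQuantity` helper are hypothetical names, and the `"out_of_stock"` and `"unknown"` values are assumed; the commit itself only checks for `"in_stock"` and `"low_stock"`.

```typescript
// Hypothetical type: only "in_stock" and "low_stock" appear in the commit;
// the other two values are assumptions for illustration.
type StockLevel = "in_stock" | "low_stock" | "out_of_stock" | "unknown";

// Hypothetical helper mirroring the inline ternary in atgbics.ts and optcore.ts:
// any purchasable state maps to 1, everything else to 0.
function toBinaryQuantity(level: StockLevel): 0 | 1 {
  return level === "in_stock" || level === "low_stock" ? 1 : 0;
}
```

At confidence=1 the value only says whether the item can be bought, so a 1 should not be read as a unit count.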
2026-05-14 00:08:57 +02:00 · 2026-05-14 00:08:57 +02:00 · 637839e965
commit 637839e965
parent db6b97186a
3 changed files with 35 additions and 3 deletions
--- a/CHANGELOG_PENDING.md
+++ b/CHANGELOG_PENDING.md
@@ -1,7 +1,13 @@
 # TIP Changelog

 Format: `{"d":"YYYY-MM-DD","t":"TYPE","m":"Description"}`
-{"d":"2026-05-13","t":"FIX","m":"BlogLLM model version sync: dashboard FO_BlogLLM card now dynamically reflects the active Ollama model via /api/blog/llm/status (was hardcoded to fo-blog-v7). TIP ecosystem.config.js OLLAMA_LLM_MODEL + BLOG_LLM_MODEL bumped fo-blog-v7 → fo-blog-v10 (Mac Studio Magatama training adopted 2026-05-13 00:33 UTC). tip-api restarted, PM2 state saved."}
+{"d":"2026-05-14","t":"DATA","m":"Demo data cleanup: deleted 2133 demo rows from reorder_signals (is_demo_data=true). Stock observation coverage expanded: atgbics.ts + optcore.ts now call upsertStockObservation after each price observation (binary in/out stock, confidence=1). FS.com scraper already runs 3x daily from Mac (02:00/10:00/18:00) with full DE-Lager/Global-Lager/Nachlieferung breakdown. Competitor stock audit: QSFPTEK (confidence=2, real quantities), FS.COM (confidence=3, per-warehouse breakdown) are highest fidelity; ATGBICS/Optcore added at confidence=1 (binary); sfpcables/prolabs/wiitek hardcode or lack stock — not added."}
+{"d":"2026-05-13","t":"FIX","m":"BlogLLM model version sync: dashboard FO_BlogLLM card now dynamically reflects the active Ollama model via /api/blog/llm/status (was hardcoded to fo-blog-v7). TIP ecosystem.config.js OLLAMA_LLM_MODEL + BLOG_LLM_MODEL bumped fo-blog-v7 → fo-blog-v10 (Mac Studio Magatama training adopted 2026-05-13 00:33 UTC). Persisted /opt/tip/blog-llm-settings.json overrode env — also updated. tip-api restarted, PM2 state saved."}
+{"d":"2026-05-13","t":"FEAT","m":"BlogLLM auto-discovery: client.ts now probes Ollama at startup + every 10 min, reconciles configured fo-blog-vN against actual available tags, auto-falls to highest available version when configured model no longer exists. Magatama-aware sort: base 'fo-blog-vN' tag wins over '-rM' revisions within same N (matches Magatama adoption convention where -rM is intermediate adapter save, base is production alias). New POST /api/blog/llm/refresh-discovery endpoint for manual trigger. Eliminates 3-step manual sync after every Magatama training."}
+{"d":"2026-05-13","t":"FIX","m":"Ollama Modelfile bug for fo-blog-v10: Mac Studio adoption registered model with template '{{.Prompt}}' instead of Qwen2.5 chat template — model returned empty responses to /api/chat. Recreated fo-blog-v10 via Ollama /api/create with correct ChatML template ({{- if .System}}<|im_start|>system ... <|im_end|>...), num_ctx=8192, stop=<|im_end|>, temperature=0.3. Smoke test: 45 tokens generated cleanly. Magatama-side adoption logic should be patched to emit correct template by default."}
+{"d":"2026-05-13","t":"DATA","m":"Competitive naming sanitization + Anti-Naming-Policy training: (1) Sanitization sweep across all 244 JSONL training files: 97 Fs.com/FiberStore replacements with neutral 'unnamed third-party MSA-compatible vendor' across 15 active files (fo_blogllm, tip_llm pools + RunPod exports + historical RunPod pod-runs). All affected files backed up to .bak-fs-final/.bak-YYYYMMDD-HHMMSS. Post-sanitization verification: 0 assistant-content mentions of competitor brands across all 5 lanes (fo_blogllm, pulso_llm, tip_llm, magatamallm, contact_llm). Remaining FS mentions live only in system-prompt prohibition lists (anti-naming policy) and one magatamallm user-message context for legitimate internal SKU-matching research. (2) Anti-Naming-Policy training pairs added: 4 deep pairs for fo_blogllm (third-party market analysis, procurement strategy, coherent component stack), 3 pairs for pulso_llm (competitor-inquiry deflection, price-compare without naming, internal sales guidance), 1 pair for tip_llm (public research blog output with neutral language). All new system prompts contain explicit COMPETITIVE NAMING POLICY clause forbidding named mentions of Fs.com/FiberStore/Approved Networks/Cablexa/ProLabs/FluxLight + component suppliers Accelink/InnoLight/Lumentum/Coherent/II-VI/Eoptolink/Source Photonics. Switch and router OEMs (Cisco/Arista/Juniper/Nokia/Ciena/HPE/Dell/Mellanox/Extreme/Huawei) explicitly permitted as integration partners. Post-rebuild manifests: fo_blogllm 18757 effective, pulso_llm 3242 effective, tip_llm 2181 effective."}
+{"d":"2026-05-13","t":"DATA","m":"FO_BlogLLM training corpus deep-quality expansion: 8 new training files in pulso_llm pool with 22 long-form (700-1000 word) blog pairs targeting fo-blog-v10 failure modes. Categories: (1) Connector Authority — MPO Type A/B/C polarity, IEC 61300-3-35 endface inspection, MPO-12 vs MPO-16, LC vs MPO architecture mapping; (2) Transceiver Taxonomy — full 100G/400G/800G variant matrix with reach, connector, lane structure, IEEE clauses; (3) Coherent Depth — coherent vs direct-detect crossover, OSNR engineering for ZR+, FEC types (cFEC/oFEC) and pre-FEC BER reality; (4) Power & Reach Ground Truth — accurate per-module power numbers 2026, OTDR commissioning workflow; (5) Operations Troubleshooting — pre-FEC BER climbs diagnostic walkthrough, module detection / coding mismatch fixes; (6) Topic Adherence — exact MPO Connector Survival Guide blog (the test prompt that failed in v10), Fiber Inspection Probes, Cable Routing for spine-leaf; (7) Standards Map — IEEE 802.3ba/cd/cu/df clause map, CMIS register layout; (8) Myth Corrections — DR ≠ Long Reach, LR vs ER vs ZR taxonomy, MPO-parallel vs LC-WDM architecture. All pairs include IEEE/OIF/MSA citations, real datasheet-equivalent numbers (TX/RX power, sensitivity, power consumption per module class). Pool now: 17936 train + 2018 eval = 19954 total after dedupe (123 duplicates removed). Next fo_blogllm training run picks up automatically."}
+{"d":"2026-05-13","t":"FIX","m":"Magatama Mac Studio adoption template root cause patched: /opt/magatama/packages/fine-tuner/train.py register_ollama() built Modelfiles without TEMPLATE directive (only FROM/SYSTEM/PARAMETER) — Ollama defaulted to '{{.Prompt}}' which silently breaks /api/chat. Both modelfile_lines blocks (GGUF and fallback) now include the Qwen2.5 ChatML TEMPLATE plus full PARAMETER set (temperature 0.3 (was 0.1), top_p 0.9, num_ctx 8192, stop <|im_end|>). End-to-end test against Ollama API confirmed: model registers + /api/chat returns expected tokens. Future fo-blog-vN trainings (and any Magatama lane via Mac Studio path) will no longer produce silent-failure models. Backup at /opt/magatama/packages/fine-tuner/train.py.bak-20260513-153306. Local checkout synced. The other adoption path (/opt/llm-gateway/.../converter.py used by RunPod artifact import) already had TEMPLATE correct — no change there."}
 {"d":"2026-04-26","t":"DATA","m":"Juniper OEM transceiver seed: 59 PIDs inserted (SFP-1GE/SFPP-10G/SFP-25G/QSFPP-40G/JNP-QSFP-100G/JNP-QSFP56-200G/JNP-QSFPDD-400G/JNP-OSFP-400G+800G + DAC/AOC). Scheduler: daily 04:15."}
 {"d":"2026-04-26","t":"FIX","m":"BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to bypass empty-body HTTP/2 server bug; updated catalog path to /Transceivers_1 (changed 2026)."}
 {"d":"2026-04-26","t":"DATA","m":"Cisco TMG scraper: upsert logic fixed (market_status EOL + temp_range IND normalization). Full run in progress: 300+ switches, 15000+ compat matches written to switch_transceiver_compat."}
--- a/packages/scraper/src/scrapers/atgbics.ts
+++ b/packages/scraper/src/scrapers/atgbics.ts
@@ -13,7 +13,7 @@
 * Rewritten 2026-05-06: switched from HTML parsing to products.json API after
 * Shopify's static HTML stopped rendering per-collection results correctly.
 */
-import { ensureVendor, upsertPriceObservation, findOrCreateScrapedTransceiver, markImageVerified, pool } from "../utils/db";
+import { ensureVendor, upsertPriceObservation, upsertStockObservation, findOrCreateScrapedTransceiver, markImageVerified, pool } from "../utils/db";
 import { contentHash } from "../utils/hash";

 const BASE_URL = "https://atgbics.com";
@@ -297,6 +297,19 @@ export async function scrapeAtgbics(): Promise<void> {
          });
          if (updated) priceUpdates++;

+          // Stock observation — Shopify provides binary available boolean (confidence: 1)
+          await upsertStockObservation({
+            transceiverId: txId,
+            sourceVendorId: vendorId,
+            stockLevel: product.stockLevel,
+            quantityAvailable: product.stockLevel === "in_stock" || product.stockLevel === "low_stock" ? 1 : 0,
+            priceNet: product.price,
+            productUrl: product.url,
+            stockConfidence: 1,
+            priceCurrency: product.currency,
+            priceIncludesTax: product.currency === "GBP", // Shopify GBP prices include VAT
+          });
+
          if (product.imageUrl) {
            const updatedImage = await markImageVerified(txId, product.imageUrl);
            if (updatedImage) imageUpdates++;
--- a/packages/scraper/src/scrapers/optcore.ts
+++ b/packages/scraper/src/scrapers/optcore.ts
@@ -10,7 +10,7 @@
 */
 import { PlaywrightCrawler } from "crawlee";
 import { makeCrawleeConfig } from "../utils/crawlee-config";
-import { ensureVendor, upsertPriceObservation, findOrCreateScrapedTransceiver, pool } from "../utils/db";
+import { ensureVendor, upsertPriceObservation, upsertStockObservation, findOrCreateScrapedTransceiver, pool } from "../utils/db";
 import { contentHash, parsePrice, parseStockLevel } from "../utils/hash";

 const BASE_URL = "https://www.optcore.net";
@@ -287,6 +287,19 @@ export async function scrapeOptcore(): Promise<void> {

      if (isNew) written++;
      else skipped++;
+
+      // Stock observation — WooCommerce text-based availability (confidence: 1)
+      await upsertStockObservation({
+        transceiverId,
+        sourceVendorId: vendorId,
+        stockLevel: p.stockLevel,
+        quantityAvailable: p.stockLevel === "in_stock" || p.stockLevel === "low_stock" ? 1 : 0,
+        priceNet: p.price,
+        productUrl: p.url,
+        stockConfidence: 1,
+        priceCurrency: p.currency,
+        priceIncludesTax: false,
+      });
    } catch (err) {
      console.error(`  Error: ${p.partNumber}:`, (err as Error).message);
    }