2 Commits

Author SHA1 Message Date
Rene Fichtmueller
39a63e0401 fix(scheduler): vendor discovery crawlers daily 24/7 (not weekly) 2026-04-28 23:59:00 +02:00
Rene Fichtmueller
297dc46f2b feat(crawler-llm): intelligent vendor discovery pipeline + TIPLLM training data
- spec-validator.ts: physical plausibility checks (form factor↔speed matrix,
  wavelength↔fiber consistency, IEEE standard cross-check, reach limits).
  Outputs tier (high/medium/low/rejected) + confidence_delta for LLM scores.

- training-data-writer.ts: converts validated crawler extractions to SFT JSONL
  training pairs (spec_qa / crawl_reasoning / validation / discovery types).
  Auto-commits and pushes to Gitea tip-training-data repo in batches of 50.

- vendor-discovery-crawler.ts: PlaywrightCrawler pipeline — catalog URL →
  LLM extraction (scrapeWithLLM) → spec validation → DB persist +
  Gitea SFT training pairs. 8 vendor configs registered
  (Cisco/Juniper/Arista/FS.com/Flexoptix/Nokia/Huawei/II-VI).

- scheduler.ts: 8 weekly discover:vendor:* jobs added (Sun 20:00–Mon 10:00 UTC).
  Total registered jobs: 102.

- Gitea repo created: gitea.context-x.org/rene/tip-training-data
2026-04-28 23:46:34 +02:00
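
The plausibility checks described for spec-validator.ts can be pictured with a minimal sketch. Everything below is hypothetical and not taken from the commit: the type names, the `validateSpec` function, the form factor to speed table, the wavelength and reach thresholds, and the `confidenceDelta` values are illustrative assumptions, and the IEEE standard cross-check is left out. It only shows how hard and soft issues might map to a tier (high/medium/low/rejected) plus a confidence adjustment.

```typescript
// Hypothetical sketch; names and thresholds are assumptions, not the contents of spec-validator.ts.
type Tier = "high" | "medium" | "low" | "rejected";

interface TransceiverSpec {
  formFactor: string;        // e.g. "SFP+", "QSFP28"
  speedGbps: number;         // nominal line rate
  wavelengthNm: number;      // nominal centre wavelength
  fiberType: "SMF" | "MMF";  // single-mode or multimode fiber
  reachKm: number;           // advertised maximum reach
}

interface ValidationResult {
  tier: Tier;
  confidenceDelta: number;   // adjustment applied to the LLM extraction score (assumed scale)
  issues: string[];
}

// Illustrative form factor -> nominal speed matrix (assumed, deliberately incomplete).
const FORM_FACTOR_SPEEDS: Record<string, number[]> = {
  "SFP": [1],
  "SFP+": [10],
  "SFP28": [25],
  "QSFP+": [40],
  "QSFP28": [100],
};

function validateSpec(spec: TransceiverSpec): ValidationResult {
  const issues: string[] = [];
  let hardFailure = false;

  // Form factor <-> speed: a known form factor with an impossible speed is a hard failure.
  const speeds = FORM_FACTOR_SPEEDS[spec.formFactor];
  if (!speeds) {
    issues.push(`unknown form factor: ${spec.formFactor}`);
  } else if (!speeds.includes(spec.speedGbps)) {
    issues.push(`${spec.formFactor} does not support ${spec.speedGbps}G`);
    hardFailure = true;
  }

  // Wavelength <-> fiber: treated as a soft issue because exceptions exist (assumed cutoffs).
  if (spec.fiberType === "SMF" && spec.wavelengthNm < 1260) {
    issues.push(`${spec.wavelengthNm} nm is unusual on single-mode fiber`);
  }
  if (spec.fiberType === "MMF" && spec.wavelengthNm > 1360) {
    issues.push(`${spec.wavelengthNm} nm is unusual on multimode fiber`);
  }

  // Reach limits: assumed upper bounds of 2 km (MMF) and 200 km (SMF).
  const maxReachKm = spec.fiberType === "MMF" ? 2 : 200;
  if (spec.reachKm <= 0 || spec.reachKm > maxReachKm) {
    issues.push(`reach ${spec.reachKm} km is outside 0..${maxReachKm} km`);
    hardFailure = true;
  }

  // Map the findings to a tier and a confidence adjustment (values are placeholders).
  if (hardFailure) return { tier: "rejected", confidenceDelta: -1, issues };
  if (issues.length === 0) return { tier: "high", confidenceDelta: 0.1, issues };
  if (issues.length === 1) return { tier: "medium", confidenceDelta: -0.1, issues };
  return { tier: "low", confidenceDelta: -0.3, issues };
}
```

Under these assumptions, a QSFP28 module at 100G, 1310 nm, SMF, 10 km passes every check and lands in the high tier, while an SFP+ module claiming 100G is rejected outright. Splitting findings into hard failures and soft issues is one possible design; the real validator may weigh its checks differently.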