Rene Fichtmueller db6b97186a feat: OPN+spec equivalence matchers, 400G pricing, TIP_LLM training data
- Add OPN-based equivalence matcher robot (7,245 manufacturer-confirmed matches, confidence=1.0)
- Add spec-based equivalence matcher robot (683 matches, confidence=0.85)
  - Matches by form_factor + speed_gbps + reach_tier + wavelength ±10nm
  - Safety cap: skip FX products matching >30 competitors (too generic)
  - Daily schedule: 04:30 UTC via pg-boss
- SQL migrations 116 (OPN) + 117 (spec) with tip_extract_wavelength_nm() + tip_reach_tier() helpers
- Fix tenGtek.ts: add 3 missing 400G categories (QSFP-DD, QSFP112) — closes pricing gap
- Generate tip-llm-pricing-v1.jsonl: 80 DB-grounded QA pairs (pricing, equivalences, 400G)
- Rebuild TIP_LLM training pool: 11,999 pairs (+127 vs prev), deployed to Erik
- FX product equivalence coverage: 88.1% (959/1089)
2026-05-13 21:33:19 +02:00
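
The spec-based matching rule in the commit message above can be sketched in a few lines. This is a minimal illustration, not the robot's actual implementation: the `Product` dataclass, its field names, the `spec_matches` function, and the inclusive treatment of the ±10 nm window are all assumptions made for the example. Only the four match keys, the 30-competitor cap, and the 0.85 confidence come from the commit message.

```python
from dataclasses import dataclass

# Values taken from the commit message.
WAVELENGTH_TOLERANCE_NM = 10   # "wavelength ±10nm" (assumed inclusive here)
MAX_COMPETITOR_MATCHES = 30    # safety cap: more matches than this means "too generic"
SPEC_CONFIDENCE = 0.85         # confidence assigned to spec-based matches


@dataclass(frozen=True)
class Product:
    """Hypothetical product record; field names are illustrative only."""
    sku: str
    form_factor: str
    speed_gbps: int
    reach_tier: str
    wavelength_nm: int


def spec_matches(fx: Product, competitors: list[Product]) -> list[tuple[str, float]]:
    """Return (competitor sku, confidence) pairs for one FX product.

    A competitor matches when form factor, speed and reach tier are equal
    and its wavelength lies within the tolerance window. If the FX product
    matches more than MAX_COMPETITOR_MATCHES competitors, it is skipped
    entirely and an empty list is returned.
    """
    matches = [
        c for c in competitors
        if c.form_factor == fx.form_factor
        and c.speed_gbps == fx.speed_gbps
        and c.reach_tier == fx.reach_tier
        and abs(c.wavelength_nm - fx.wavelength_nm) <= WAVELENGTH_TOLERANCE_NM
    ]
    if len(matches) > MAX_COMPETITOR_MATCHES:
        return []  # too generic to be a useful equivalence
    return [(c.sku, SPEC_CONFIDENCE) for c in matches]
```

Applying the cap after filtering means a product with exactly 30 matches is kept and one with 31 is dropped, which is one reading of ">30 competitors".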


{
  "raw_pairs": 12268,
  "duplicates_removed": 269,
  "training_pairs": 11999,
  "train_pairs": 10799,
  "eval_pairs": 1200,
  "sources": {
    "external:vendor-deep-dives.jsonl": 11200,
    "external:technical-deep-dives.jsonl": 84,
    "external:rir-infrastructure-data.jsonl": 150,
    "external:market-business-analysis-part1.jsonl": 10,
    "external:synthesized-training-samples.jsonl": 219,
    "external:nanog-ripe-labs-content.jsonl": 34,
    "external:academic-research-synthesis.jsonl": 109,
    "training-data/tip-llm-pricing-v1.jsonl": 80,
    "training-data/tip-llm-capabilities-v1.jsonl": 69,
    "external:market-business-analysis-part6.jsonl": 5,
    "robot-control-high.jsonl": 12,
    "external:market-business-analysis-part5.jsonl": 7,
    "external:market-business-analysis-part4.jsonl": 5,
    "external:market-business-analysis-part2.jsonl": 8,
    "external:market-business-analysis-part3.jsonl": 7
  },
  "files": {
    "train": "training-data/runpod/tip_llm/tip_llm-sft-train.jsonl",
    "eval": "training-data/runpod/tip_llm/tip_llm-sft-eval.jsonl",
    "all": "training-data/runpod/tip_llm/tip_llm-sft-all.jsonl",
    "manifest": "training-data/runpod/tip_llm/manifest.json"
  }
}
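
The counts in this manifest satisfy three arithmetic relationships: `raw_pairs - duplicates_removed = training_pairs` (12268 - 269 = 11999), `train_pairs + eval_pairs = training_pairs` (10799 + 1200 = 11999), and the per-source counts also sum to 11999. A small check along these lines could catch a stale or hand-edited manifest. It is only a sketch: `check_manifest` is a hypothetical helper, and treating the per-source sum as a post-deduplication total is an inference from these numbers rather than a documented guarantee of the build script.

```python
def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    total = manifest["training_pairs"]

    # Deduplication: raw pairs minus removed duplicates should equal the pool size.
    deduped = manifest["raw_pairs"] - manifest["duplicates_removed"]
    if deduped != total:
        problems.append(f"raw - duplicates = {deduped}, expected {total}")

    # Split: train and eval partitions should cover the pool exactly.
    split = manifest["train_pairs"] + manifest["eval_pairs"]
    if split != total:
        problems.append(f"train + eval = {split}, expected {total}")

    # Sources: assumed to be counted after deduplication (holds for this file).
    by_source = sum(manifest["sources"].values())
    if by_source != total:
        problems.append(f"sum(sources) = {by_source}, expected {total}")

    return problems
```

Run against the manifest above (for example via `json.load` on `training-data/runpod/tip_llm/manifest.json`), all three checks pass and the function returns an empty list.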