sync: record magatama multi-llm training lanes

2026-05-10 00:11:48 +02:00 · 2026-05-10 00:11:48 +02:00 · 3926a1ef90
commit 3926a1ef90
parent 635a102932
2 changed files with 141 additions and 1 deletion
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,63 @@
 # Current TIP Sync State
-Updated: 2026-05-09 22:01 UTC
+Updated: 2026-05-09 22:05 UTC
 ## Newest Work
 - MAGATAMA multi-LLM training lane expansion on 2026-05-09:
  - added first-class training lanes for:
    - `pulso_llm`
    - `contact_llm`
  - MAGATAMA training tool now exposes:
    - `MagatamaLLM`
    - `FO_BlogLLM`
    - `TIP_LLM`
    - `PulsoLLM`
    - `ContactLLM`
  - lane split is now canonical:
    - `MagatamaLLM`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows
    - `FO_BlogLLM`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure
    - `TIP_LLM`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research
    - `PulsoLLM`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers
    - `ContactLLM`: contact discovery/research lane for structured, lawful contact lookup and source attribution
  - shared network/transceiver/switch knowledge is intentionally reused by `TIP_LLM` and `PulsoLLM`, but their behavior/instruction pools remain separate
  - new source catalog and research seed file added under MAGATAMA:
    - `training-data/model-registry/research-source-catalog-2026-05-09.json`
    - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
  - source seeds added from current research include:
    - CISA KEV / CISA Malcolm / CISA ScubaGear
    - NVD CVE API
    - MITRE ATT&CK STIX/TAXII
    - OWASP LLM Top 10
    - Microsoft PyRIT
    - Microsoft Agent Governance Toolkit
    - Cisco Transceiver Module Group matrix
    - Juniper Hardware Compatibility Tool
    - Arista transceiver/cable references
    - Flexoptix product/support references
    - RFC 9309 robots.txt
    - schema.org `ContactPoint`
    - RFC 6350 vCard
    - PeeringDB API
    - RIPE Database REST API
  - lane-specific Gitea learning pool directories now exist for:
    - `training-data/gitea-learning-pool/pulso_llm/`
    - `training-data/gitea-learning-pool/contact_llm/`
  - RunPod lane exports rebuilt and deployed live on Erik:
    - `magatamallm`: `1375 train`, `153 eval`, `1528 total`
    - `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
    - `tip_llm`: `276 train`, `31 eval`, `307 total`
    - `pulso_llm`: `28 train`, `5 eval`, `33 total`
    - `contact_llm`: `18 train`, `4 eval`, `22 total`
  - dashboard/API live checks:
    - `pulso_llm` and `contact_llm` appear in the training modal
    - RunPod provider is online for both lanes
    - `contact_llm` status correctly reports `neverTrained: true`
    - `pulso_llm` and `contact_llm` are trainable but not yet adopted because no local Ollama model tags exist for them
  - safety/automation note:
    - do not mark a lane training run successful unless an artifact exists, imports locally, passes smoke tests, and the active alias/version is switched
    - this remains the rule for all LLM lanes
 - TIP open competitor status closure on 2026-05-09:
  - added migration `sql/104-verification-evidence-ambiguous.sql`
    - extends `transceiver_verification_evidence.verification_type` with `competitor_ambiguous`
--- a/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
+++ b/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
@@ -0,0 +1,86 @@
 # MAGATAMA Multi-LLM Training Lanes
 Date: 2026-05-09
 ## Decision
 MAGATAMA now treats the core specialized models as separate training lanes with separate behavior pools:
 - `magatamallm`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows.
 - `fo_blogllm`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure.
 - `tip_llm`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research.
 - `pulso_llm`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers.
 - `contact_llm`: structured, lawful contact discovery/research with attribution.
 `TIP_LLM` and `PulsoLLM` share a network/transceiver/switch knowledge core, but their instruction and behavior pools stay separate.
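A minimal sketch of this lane split as data, assuming a simple in-memory registry. The `LaneProfile` class, its `knowledge_core`/`behavior_pool` fields, and the core names are hypothetical illustrations, not MAGATAMA's actual lane config; only the lane IDs and display names come from this note. It encodes the structural rule above: `tip_llm` and `pulso_llm` point at the same knowledge core, while no two lanes may share a behavior pool.

```python
from dataclasses import dataclass


# Hypothetical lane profile; field names are illustrative, not MAGATAMA's schema.
@dataclass(frozen=True)
class LaneProfile:
    lane_id: str         # dataset/export lane key, e.g. "tip_llm"
    display_name: str    # name exposed by the training tool, e.g. "TIP_LLM"
    knowledge_core: str  # shared domain knowledge source (may be reused across lanes)
    behavior_pool: str   # instruction/behavior pool (must be unique per lane)


LANES = [
    LaneProfile("magatamallm", "MagatamaLLM", "security-ops", "magatamallm"),
    LaneProfile("fo_blogllm", "FO_BlogLLM", "blog-writing", "fo_blogllm"),
    # TIP_LLM and PulsoLLM share the network/transceiver/switch core...
    LaneProfile("tip_llm", "TIP_LLM", "network-transceiver-switch", "tip_llm"),
    # ...but each keeps its own behavior pool.
    LaneProfile("pulso_llm", "PulsoLLM", "network-transceiver-switch", "pulso_llm"),
    LaneProfile("contact_llm", "ContactLLM", "contact-research", "contact_llm"),
]


def check_lane_split(lanes: list[LaneProfile]) -> None:
    """Raise ValueError if two lanes share a behavior pool."""
    seen: dict[str, str] = {}
    for lane in lanes:
        if lane.behavior_pool in seen:
            raise ValueError(
                f"{lane.lane_id} reuses the behavior pool of {seen[lane.behavior_pool]}"
            )
        seen[lane.behavior_pool] = lane.lane_id


def lanes_sharing_core(lanes: list[LaneProfile], core: str) -> list[str]:
    """Return the lane IDs that draw on the given knowledge core."""
    return [lane.lane_id for lane in lanes if lane.knowledge_core == core]


check_lane_split(LANES)
print(lanes_sharing_core(LANES, "network-transceiver-switch"))  # ['tip_llm', 'pulso_llm']
```

Keeping the shared core and the per-lane behavior pool as separate fields makes the reuse explicit without letting one lane's instructions leak into another's.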
 ## Implemented In MAGATAMA
 - Added `pulso_llm` and `contact_llm` to:
  - RunPod dataset builder
  - Gitea training pool sync
  - model registry build
  - RunPod submit path
  - HuggingFace/RunPod dataset publishing config
  - Susan/NAS training scan
  - full training refresh
  - dashboard training API/status
  - training modal UI
  - fine-tuner lane config and smoke prompts
 - Added new lane profiles:
  - `PulsoLLM`
  - `ContactLLM`
 - Added source catalog and research seeds:
  - `training-data/model-registry/research-source-catalog-2026-05-09.json`
  - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
 - Added Gitea-backed seed pools:
  - `training-data/gitea-learning-pool/pulso_llm/`
  - `training-data/gitea-learning-pool/contact_llm/`
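A quick consistency check for the per-lane pool directories could look like the sketch below. It assumes it runs from the MAGATAMA repository root and that every lane, not only the two new ones, has a directory named after its lane ID under `training-data/gitea-learning-pool/`; that layout for the older lanes is an assumption this note does not confirm. The helper name `lanes_missing_pool` is hypothetical.

```python
from pathlib import Path

# Pool root as recorded in this note; assumed relative to the MAGATAMA repo root.
POOL_ROOT = Path("training-data/gitea-learning-pool")

# Lane IDs from this note; assumed (not confirmed) to double as directory names.
LANE_IDS = ["magatamallm", "fo_blogllm", "tip_llm", "pulso_llm", "contact_llm"]


def lanes_missing_pool(root: Path, lane_ids: list[str]) -> list[str]:
    """Return the lane IDs whose learning-pool directory is missing under root."""
    return [lane for lane in lane_ids if not (root / lane).is_dir()]


if __name__ == "__main__":
    missing = lanes_missing_pool(POOL_ROOT, LANE_IDS)
    print("missing pool dirs:", missing or "none")
```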
 ## Source Seeds Added
 - CISA KEV / CISA Malcolm / CISA ScubaGear
 - NVD CVE API
 - MITRE ATT&CK STIX/TAXII
 - OWASP LLM Top 10
 - Microsoft PyRIT
 - Microsoft Agent Governance Toolkit
 - Cisco Transceiver Module Group matrix
 - Juniper Hardware Compatibility Tool
 - Arista transceiver/cable references
 - Flexoptix product/support references
 - RFC 9309 robots.txt
 - schema.org `ContactPoint`
 - RFC 6350 vCard
 - PeeringDB API
 - RIPE Database REST API
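The RFC 9309 seed matters for crawler planning in `tip_llm`: a crawl plan should consult robots.txt before fetching. As a minimal sketch, Python's standard `urllib.robotparser` can evaluate an already-fetched robots.txt body. This module predates RFC 9309 and applies the first matching rule in file order rather than the RFC's most-specific-match rule, so treat it as an approximation. The robots.txt body, the user-agent `tip-crawler`, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt body; in a real crawl this would be fetched from
# https://<host>/robots.txt before any other request to that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_body: str, user_agent: str, url: str) -> bool:
    """Return True if the parsed robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


# "tip-crawler" is a hypothetical user-agent name.
print(allowed(ROBOTS_TXT, "tip-crawler", "https://example.com/products/sfp"))    # True
print(allowed(ROBOTS_TXT, "tip-crawler", "https://example.com/private/report"))  # False
```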
 ## Verified Counts
 RunPod lane exports rebuilt and deployed live on Erik:
 - `magatamallm`: `1375 train`, `153 eval`, `1528 total`
 - `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
 - `tip_llm`: `276 train`, `31 eval`, `307 total`
 - `pulso_llm`: `28 train`, `5 eval`, `33 total`
 - `contact_llm`: `18 train`, `4 eval`, `22 total`
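The counts above can be cross-checked mechanically. A minimal sketch, assuming the numbers are copied by hand from this list (the dict is not read from RunPod): it confirms `train + eval == total` per lane and reports each lane's eval share.

```python
# Counts copied from the "Verified Counts" list: (train, eval, total).
COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def check_counts(counts: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Verify train + eval == total for every lane and return each lane's eval share."""
    shares = {}
    for lane, (train, evaluation, total) in counts.items():
        if train + evaluation != total:
            raise ValueError(f"{lane}: {train} + {evaluation} != {total}")
        shares[lane] = evaluation / total
    return shares


for lane, share in check_counts(COUNTS).items():
    print(f"{lane:12s} eval share {share:.1%}")
```

With the recorded numbers every lane balances. The three established lanes hold out roughly 10% for eval, while `pulso_llm` (5 of 33) and `contact_llm` (4 of 22) hold out more in relative terms but only a handful of examples in absolute terms, so their eval results rest on very few samples.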
 ## Live Verification
 - `pulso_llm` and `contact_llm` appear in the MAGATAMA training modal.
 - RunPod provider is online for both new lanes.
 - `contact_llm` status correctly reports `neverTrained: true`.
 - `pulso_llm` and `contact_llm` are trainable but not yet adopted because no local Ollama model tags exist for them.
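A minimal sketch of the "no local Ollama tag, no adoption" check. A local Ollama server lists installed models at `GET /api/tags` (default port 11434) as a JSON object with a `models` array whose entries carry a `name`. The sample payload, its model names, and the convention that a lane's tags start with `<lane_id>:` are hypothetical, not MAGATAMA's actual naming.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the installed-model list from a local Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def lanes_without_local_tag(lane_ids: list[str], tags_payload: dict) -> list[str]:
    """Return lanes with no local Ollama tag; these cannot be adopted yet."""
    installed = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Hypothetical convention: a lane's local tags look like "<lane_id>:<version>".
    return [
        lane for lane in lane_ids
        if not any(name.startswith(f"{lane}:") for name in installed)
    ]


# Hypothetical sample shaped like an /api/tags response (other fields trimmed).
SAMPLE_TAGS = {"models": [{"name": "magatamallm:latest"}, {"name": "tip_llm:v3"}]}

print(lanes_without_local_tag(
    ["magatamallm", "tip_llm", "pulso_llm", "contact_llm"], SAMPLE_TAGS
))  # ['pulso_llm', 'contact_llm']
```

Against a live server, pass `fetch_tags()` instead of the sample payload.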
 ## Operational Rule
 Training success only counts when all of the following are true:
 - RunPod reports completion.
 - An artifact exists and is reachable.
 - Local import succeeds.
 - Smoke tests pass.
 - The active alias/version is switched.
 - The registry and dashboard show the new version.
 If any part fails, the lane must stay in a non-adopted state.
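The rule above is an all-or-nothing gate. A minimal sketch, assuming each condition is reported as a boolean; the `RunChecks` field names are hypothetical labels for the six conditions listed, not MAGATAMA's actual status fields.

```python
from dataclasses import dataclass, fields


@dataclass
class RunChecks:
    """One flag per condition in the operational rule (hypothetical names)."""
    runpod_completed: bool = False
    artifact_reachable: bool = False
    local_import_ok: bool = False
    smoke_tests_passed: bool = False
    alias_switched: bool = False
    registry_and_dashboard_updated: bool = False


def failed_checks(checks: RunChecks) -> list[str]:
    """Return the names of all conditions that are not yet satisfied."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name)]


def adoption_state(checks: RunChecks) -> str:
    """A run is 'adopted' only if every condition holds; otherwise 'non-adopted'."""
    return "adopted" if not failed_checks(checks) else "non-adopted"


run = RunChecks(runpod_completed=True, artifact_reachable=True, local_import_ok=True)
print(adoption_state(run), failed_checks(run))
# non-adopted ['smoke_tests_passed', 'alias_switched', 'registry_and_dashboard_updated']
```

Every flag defaults to `False`, so a missing or unreported signal keeps the lane non-adopted instead of silently passing.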