transceiver-db/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
2026-05-10 00:47:30 +02:00

# MAGATAMA Multi-LLM Training Lanes
Date: 2026-05-09
## Decision
MAGATAMA now treats the core specialized models as separate training lanes with separate behavior pools:
- `magatamallm`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows.
- `fo_blogllm`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure.
- `tip_llm`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research.
- `pulso_llm`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers.
- `contact_llm`: structured, lawful contact discovery/research with attribution.
`tip_llm` and `pulso_llm` share a network/transceiver/switch knowledge core, but their instruction and behavior pools stay separate.
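
The split between a shared knowledge core and separate behavior pools can be pictured as a small lane table. This is only a sketch: the `Lane` dataclass, its field names, and the core and pool labels (`network_core`, `pulso_llm_behavior`, and so on) are assumptions made for illustration, not MAGATAMA's actual config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str            # lane id as used in this note
    knowledge_core: str  # hypothetical label for shared factual knowledge
    behavior_pool: str   # hypothetical label for lane-specific instructions


# Hypothetical layout: core labels and pool names are illustrative only.
LANES = [
    Lane("magatamallm", "security_core", "magatamallm_behavior"),
    Lane("fo_blogllm", "blog_core", "fo_blogllm_behavior"),
    Lane("tip_llm", "network_core", "tip_llm_behavior"),
    Lane("pulso_llm", "network_core", "pulso_llm_behavior"),
    Lane("contact_llm", "contact_core", "contact_llm_behavior"),
]


def shared_cores(lanes):
    """Return {core: [lane names]} for cores used by more than one lane."""
    by_core = {}
    for lane in lanes:
        by_core.setdefault(lane.knowledge_core, []).append(lane.name)
    return {core: names for core, names in by_core.items() if len(names) > 1}


def pools_are_separate(lanes):
    """True when no two lanes point at the same behavior pool."""
    pools = [lane.behavior_pool for lane in lanes]
    return len(pools) == len(set(pools))
```

In this layout only `tip_llm` and `pulso_llm` share a core, and every lane keeps its own behavior pool.
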
## Implemented In MAGATAMA
- Added `pulso_llm` and `contact_llm` to:
  - RunPod dataset builder
  - Gitea training pool sync
  - model registry build
  - RunPod submit path
  - HuggingFace/RunPod dataset publishing config
  - Susan/NAS training scan
  - full training refresh
  - dashboard training API/status
  - training modal UI
  - fine-tuner lane config and smoke prompts
- Added new lane profiles:
  - `PulsoLLM`
  - `ContactLLM`
- Added source catalog and research seeds:
  - `training-data/model-registry/research-source-catalog-2026-05-09.json`
  - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
- Added Gitea-backed seed pools:
  - `training-data/gitea-learning-pool/pulso_llm/`
  - `training-data/gitea-learning-pool/contact_llm/`
## Source Seeds Added
- CISA KEV / CISA Malcolm / CISA ScubaGear
- NVD CVE API
- MITRE ATT&CK STIX/TAXII
- OWASP LLM Top 10
- Microsoft PyRIT
- Microsoft Agent Governance Toolkit
- Cisco Transceiver Module Group matrix
- Juniper Hardware Compatibility Tool
- Arista transceiver/cable references
- Flexoptix product/support references
- RFC 9309 robots.txt
- schema.org `ContactPoint`
- RFC 6350 vCard
- PeeringDB API
- RIPE Database REST API
## Verified Counts
RunPod lane exports rebuilt and deployed live on Erik:
- `magatamallm`: `1375 train`, `153 eval`, `1528 total`
- `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
- `tip_llm`: `276 train`, `31 eval`, `307 total`
- `pulso_llm`: `28 train`, `5 eval`, `33 total`
- `contact_llm`: `18 train`, `4 eval`, `22 total`
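
The counts above can be sanity-checked with a few lines of Python. The numbers are copied from the list; the helper names `check_counts` and `eval_share` are illustrative and not part of the MAGATAMA tooling.

```python
# Counts copied from the list above: lane -> (train, eval, total).
COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def check_counts(counts):
    """Return lanes whose train + eval does not equal the reported total."""
    return [
        lane
        for lane, (train, ev, total) in counts.items()
        if train + ev != total
    ]


def eval_share(counts):
    """Return {lane: eval fraction of total}, rounded to three decimals."""
    return {
        lane: round(ev / total, 3)
        for lane, (_, ev, total) in counts.items()
    }
```

All five lanes add up. The three larger lanes hold out roughly 10% for eval, while `pulso_llm` and `contact_llm` hold out about 15% and 18%. With only 5 and 4 eval rows, eval results for those two lanes are likely to be noisy.
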
## Live Verification
- `pulso_llm` and `contact_llm` appear in the MAGATAMA training modal.
- RunPod provider is online for both new lanes.
- `contact_llm` status correctly reports `neverTrained: true`.
- `pulso_llm` and `contact_llm` are trainable but not yet adopted, because no local Ollama model tags exist for them.
## Gitea / Privacy Closure
- Sync handoff commit: `3926a1e`
- MAGATAMA implementation and training-pool commit: `8fb406b`
- MAGATAMA pre-commit correctly blocked the first attempt because raw training rows contained private-network data.
- Export path is now hardened:
  - private IPs are replaced with placeholders
  - local `/Users/...` paths are replaced
  - emails, tokens, secrets and passwords are redacted
- The pushed MAGATAMA commit passed:
  - secrets scan
  - private data scan
  - config values scan
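
A minimal sketch of what such export hardening could look like, using only the Python standard library. It is not the actual MAGATAMA export code: the placeholder strings (`<PRIVATE_IP>`, `<LOCAL_PATH>`, `<EMAIL>`), the regular expressions, and the `sanitize` function name are assumptions. Token, secret and password redaction is left out because it depends on the formats in use.

```python
import ipaddress
import re

# Illustrative patterns; a real exporter would likely need broader coverage.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER_PATH_RE = re.compile(r"/Users/[^\s\"']+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def _mask_private_ip(match):
    """Replace an IPv4 literal only if it falls in a private range."""
    text = match.group(0)
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return text  # not a valid address (e.g. 999.1.1.1), leave untouched
    return "<PRIVATE_IP>" if addr.is_private else text


def sanitize(row: str) -> str:
    """Apply the three replacements to one raw training row."""
    row = IPV4_RE.sub(_mask_private_ip, row)
    row = USER_PATH_RE.sub("<LOCAL_PATH>", row)
    row = EMAIL_RE.sub("<EMAIL>", row)
    return row
```

Using `ipaddress` rather than a hand-written range check keeps public addresses intact while masking RFC 1918, loopback and link-local ranges, which `is_private` also covers.
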
## Operational Rule
Training success only counts when all of the following are true:
- RunPod reports completion.
- An artifact exists and is reachable.
- Local import succeeds.
- Smoke tests pass.
- The active alias/version is switched.
- The registry and dashboard show the new version.
If any part fails, the lane must stay in a non-adopted state.
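
The rule above amounts to an all-or-nothing gate. The following sketch shows one way to encode it. The `AdoptionChecks` fields mirror the six conditions, but the class, the function names, and the state strings `adopted` / `not_adopted` are assumptions, not names taken from the MAGATAMA codebase.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AdoptionChecks:
    # One flag per condition in the operational rule (names are hypothetical).
    runpod_completed: bool
    artifact_reachable: bool
    local_import_ok: bool
    smoke_tests_passed: bool
    alias_switched: bool
    registry_and_dashboard_updated: bool


def failed_checks(checks):
    """Return the names of all conditions that are not satisfied."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name)]


def lane_state(checks):
    """A lane counts as adopted only when every single check passed."""
    return "adopted" if not failed_checks(checks) else "not_adopted"
```

For example, a run where RunPod completes and the artifact imports but smoke tests fail yields `not_adopted`, and `failed_checks` names the step that blocked adoption.
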