transceiver-db/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
2026-05-10 00:47:30 +02:00

MAGATAMA Multi-LLM Training Lanes

Date: 2026-05-09

Decision

MAGATAMA now treats the core specialized models as separate training lanes with separate behavior pools:

  • magatamallm: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows.
  • fo_blogllm: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure.
  • tip_llm: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research.
  • pulso_llm: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers.
  • contact_llm: structured, lawful contact discovery/research with attribution.

tip_llm and pulso_llm share a network/transceiver/switch knowledge core, but their instruction and behavior pools stay separate.
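The lane split can be pictured as a small data structure. This is a minimal sketch, not the MAGATAMA registry format: the `Lane` class, the `pool/...` identifiers, and the `NETWORK_CORE` name are hypothetical and only illustrate "shared knowledge core, separate behavior pools".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    behavior_pool: str            # hypothetical pool identifier, one per lane
    knowledge_cores: tuple = ()   # shared knowledge sources, may overlap


# Hypothetical name for the shared network/transceiver/switch core.
NETWORK_CORE = "network-transceiver-switch-core"

LANES = [
    Lane("magatamallm", "pool/magatamallm"),
    Lane("fo_blogllm", "pool/fo_blogllm"),
    Lane("tip_llm", "pool/tip_llm", (NETWORK_CORE,)),
    Lane("pulso_llm", "pool/pulso_llm", (NETWORK_CORE,)),
    Lane("contact_llm", "pool/contact_llm"),
]


def shared_pools(lanes):
    """Return behavior pools used by more than one lane (should be empty)."""
    seen, dupes = set(), set()
    for lane in lanes:
        if lane.behavior_pool in seen:
            dupes.add(lane.behavior_pool)
        else:
            seen.add(lane.behavior_pool)
    return dupes
```

A knowledge core may appear in several lanes, while `shared_pools(LANES)` is expected to stay empty.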

Implemented In MAGATAMA

  • Added pulso_llm and contact_llm to:
    • RunPod dataset builder
    • Gitea training pool sync
    • model registry build
    • RunPod submit path
    • HuggingFace/RunPod dataset publishing config
    • Susan/NAS training scan
    • full training refresh
    • dashboard training API/status
    • training modal UI
    • fine-tuner lane config and smoke prompts
  • Added new lane profiles:
    • PulsoLLM
    • ContactLLM
  • Added source catalog:
    • training-data/model-registry/research-source-catalog-2026-05-09.json
    • training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl
  • Added Gitea-backed seed pools:
    • training-data/gitea-learning-pool/pulso_llm/
    • training-data/gitea-learning-pool/contact_llm/

Source Seeds Added

  • CISA KEV / CISA Malcolm / CISA ScubaGear
  • NVD CVE API
  • MITRE ATT&CK STIX/TAXII
  • OWASP LLM Top 10
  • Microsoft PyRIT
  • Microsoft Agent Governance Toolkit
  • Cisco Transceiver Module Group matrix
  • Juniper Hardware Compatibility Tool
  • Arista transceiver/cable references
  • Flexoptix product/support references
  • RFC 9309 robots.txt
  • schema.org ContactPoint
  • RFC 6350 vCard
  • PeeringDB API
  • RIPE Database REST API

Verified Counts

RunPod lane exports rebuilt and deployed live on Erik:

  • magatamallm: 1375 train, 153 eval, 1528 total
  • fo_blogllm: 17342 train, 1929 eval, 19271 total
  • tip_llm: 276 train, 31 eval, 307 total
  • pulso_llm: 28 train, 5 eval, 33 total
  • contact_llm: 18 train, 4 eval, 22 total
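
The counts above can be re-checked mechanically. A minimal sketch, assuming only the numbers listed here; `LANE_COUNTS` and `check_counts` are illustrative names, not part of the RunPod dataset builder.

```python
# Counts copied from the list above: (train, eval, total) per lane.
LANE_COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def check_counts(counts):
    """Return {lane: eval share}; raise if train + eval != total."""
    shares = {}
    for lane, (train, eval_rows, total) in counts.items():
        if train + eval_rows != total:
            raise ValueError(f"{lane}: {train} + {eval_rows} != {total}")
        shares[lane] = eval_rows / total
    return shares


if __name__ == "__main__":
    for lane, share in check_counts(LANE_COUNTS).items():
        print(f"{lane}: eval share {share:.1%}")
```

All five lanes sum correctly. The three larger lanes hold out about 10% for eval; the two new lanes hold out about 15% (pulso_llm) and 18% (contact_llm).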

Live Verification

  • pulso_llm and contact_llm appear in the MAGATAMA training modal.
  • RunPod provider is online for both new lanes.
  • contact_llm status correctly reports neverTrained: true.
  • pulso_llm / contact_llm are trainable but not yet adopted because no local Ollama model tags exist.

Gitea / Privacy Closure

  • Sync handoff commit: 3926a1e
  • MAGATAMA implementation and training-pool commit: 8fb406b
  • MAGATAMA pre-commit correctly blocked the first attempt because raw training rows contained private-network data.
  • Export path is now hardened:
    • private IPs are replaced with placeholders
    • local /Users/... paths are replaced
    • emails, tokens, secrets and passwords are redacted
  • The pushed MAGATAMA commit passed:
    • secrets scan
    • private data scan
    • config values scan
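
The export hardening described above could look roughly like the following. This is a sketch under assumptions, not the MAGATAMA export code: the regular expressions, the placeholder strings (`<PRIVATE_IP>`, `<LOCAL_PATH>`, `<EMAIL>`, `<REDACTED>`), and the function name `redact_row` are all illustrative.

```python
import ipaddress
import re

# Illustrative patterns; a real export path would likely need broader coverage.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER_PATH_RE = re.compile(r"/Users/[^\s\"',;]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SECRET_RE = re.compile(
    r"(?i)\b(token|secret|password|api[_-]?key)(\s*[:=]\s*)([^\s,;\"']+)"
)


def _mask_private_ip(match):
    """Replace an IPv4 literal only if it is a private address."""
    text = match.group(0)
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return text  # not a valid address (e.g. 999.1.1.1), leave untouched
    return "<PRIVATE_IP>" if addr.is_private else text


def redact_row(text):
    """Apply all redactions to one raw training row."""
    # Keep the key name and separator, drop only the secret value.
    text = SECRET_RE.sub(lambda m: m.group(1) + m.group(2) + "<REDACTED>", text)
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = USER_PATH_RE.sub("<LOCAL_PATH>", text)
    text = IPV4_RE.sub(_mask_private_ip, text)
    return text
```

Public addresses such as 8.8.8.8 pass through unchanged, because only addresses that `ipaddress` classifies as private are masked.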

Operational Rule

Training success only counts when all of the following are true:

  • RunPod reports completion.
  • An artifact exists and is reachable.
  • Local import succeeds.
  • Smoke tests pass.
  • The active alias/version is switched.
  • The registry and dashboard show the new version.

If any one of these checks fails, the lane must stay in a non-adopted state.
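
The rule is an all-or-nothing gate. A minimal sketch, assuming each condition is available as a boolean; the `TrainingRun` fields and the state strings `"adopted"` / `"not_adopted"` are hypothetical names, not the dashboard's actual status values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingRun:
    # One flag per condition in the Operational Rule, in the same order.
    runpod_completed: bool
    artifact_reachable: bool
    local_import_ok: bool
    smoke_tests_passed: bool
    alias_switched: bool
    registry_and_dashboard_updated: bool


def failed_checks(run):
    """Return the names of all conditions that are not satisfied."""
    return [f.name for f in fields(run) if not getattr(run, f.name)]


def lane_state(run):
    """A lane counts as adopted only when every condition holds."""
    return "adopted" if not failed_checks(run) else "not_adopted"
```

Returning the list of failed checks, rather than a single boolean, makes it easier to see why a lane was held back.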