sync: record magatama multi-llm training lanes

Rene Fichtmueller 2026-05-10 00:11:48 +02:00
parent 635a102932
commit 3926a1ef90
2 changed files with 141 additions and 1 deletions

@@ -1,9 +1,63 @@
# Current TIP Sync State
-Updated: 2026-05-09 22:01 UTC
+Updated: 2026-05-09 22:05 UTC
## Newest Work
- MAGATAMA multi-LLM training lane expansion on 2026-05-09:
  - added first-class training lanes for:
    - `pulso_llm`
    - `contact_llm`
  - MAGATAMA training tool now exposes:
    - `MagatamaLLM`
    - `FO_BlogLLM`
    - `TIP_LLM`
    - `PulsoLLM`
    - `ContactLLM`
  - lane split is now canonical:
    - `MagatamaLLM`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows
    - `FO_BlogLLM`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure
    - `TIP_LLM`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research
    - `PulsoLLM`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers
    - `ContactLLM`: contact discovery/research lane for structured, lawful contact lookup and source attribution
  - shared network/transceiver/switch knowledge is intentionally reused for `TIP_LLM` and `PulsoLLM`, but behavior/instruction pools remain separate
  - new source catalog added under MAGATAMA:
    - `training-data/model-registry/research-source-catalog-2026-05-09.json`
    - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
  - source seeds added from current research include:
    - CISA KEV / CISA Malcolm / CISA ScubaGear
    - NVD CVE API
    - MITRE ATT&CK STIX/TAXII
    - OWASP LLM Top 10
    - Microsoft PyRIT
    - Microsoft Agent Governance Toolkit
    - Cisco Transceiver Module Group matrix
    - Juniper Hardware Compatibility Tool
    - Arista transceiver/cable references
    - Flexoptix product/support references
    - RFC 9309 robots.txt
    - schema.org `ContactPoint`
    - RFC 6350 vCard
    - PeeringDB API
    - RIPE Database REST API
  - lane-specific Gitea learning pool directories now exist for:
    - `training-data/gitea-learning-pool/pulso_llm/`
    - `training-data/gitea-learning-pool/contact_llm/`
  - RunPod lane exports rebuilt and deployed live on Erik:
    - `magatamallm`: `1375 train`, `153 eval`, `1528 total`
    - `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
    - `tip_llm`: `276 train`, `31 eval`, `307 total`
    - `pulso_llm`: `28 train`, `5 eval`, `33 total`
    - `contact_llm`: `18 train`, `4 eval`, `22 total`
  - dashboard/API live checks:
    - `pulso_llm` and `contact_llm` appear in the training modal
    - RunPod provider is online for both lanes
    - `contact_llm` status correctly reports `neverTrained: true`
    - `pulso_llm` / `contact_llm` are trainable but not yet adopted because no local Ollama model tags exist
  - safety/automation note:
    - do not mark a lane training run successful unless an artifact exists, imports locally, passes smoke tests, and the active alias/version is switched
    - this remains the rule for all LLM lanes
- TIP open competitor status closure on 2026-05-09:
  - added migration `sql/104-verification-evidence-ambiguous.sql`
  - extends `transceiver_verification_evidence.verification_type` with `competitor_ambiguous`

@@ -0,0 +1,86 @@
# MAGATAMA Multi-LLM Training Lanes
Date: 2026-05-09
## Decision
MAGATAMA now treats the core specialized models as separate training lanes with separate behavior pools:
- `magatamallm`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows.
- `fo_blogllm`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure.
- `tip_llm`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research.
- `pulso_llm`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers.
- `contact_llm`: structured, lawful contact discovery/research with attribution.
`TIP_LLM` and `PulsoLLM` share a network/transceiver/switch knowledge core, but their instruction and behavior pools stay separate.
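
The split described above can be pictured as a small registry in which two lanes point at the same knowledge core while every lane keeps its own instruction pool. This is only a sketch under assumptions, not MAGATAMA's actual configuration: the `Lane` dataclass, its field names, the pool names, and `SHARED_NETWORK_CORE` are hypothetical; only the lane IDs come from the decision above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label for the shared network/transceiver/switch knowledge core.
SHARED_NETWORK_CORE = "network_transceiver_switch_core"


@dataclass(frozen=True)
class Lane:
    lane_id: str                          # lane ID as listed in the decision
    instruction_pool: str                 # hypothetical pool name, one per lane
    knowledge_core: Optional[str] = None  # optional shared knowledge source


LANES = [
    Lane("magatamallm", "magatamallm_instructions"),
    Lane("fo_blogllm", "fo_blogllm_instructions"),
    Lane("tip_llm", "tip_llm_instructions", SHARED_NETWORK_CORE),
    Lane("pulso_llm", "pulso_llm_instructions", SHARED_NETWORK_CORE),
    Lane("contact_llm", "contact_llm_instructions"),
]


def check_separation(lanes):
    """Raise if two lanes share an instruction pool.

    Sharing a knowledge core is allowed; sharing a behavior pool is not.
    """
    owner_by_pool = {}
    for lane in lanes:
        owner = owner_by_pool.get(lane.instruction_pool)
        if owner is not None:
            raise ValueError(
                f"{lane.lane_id} and {owner} share pool {lane.instruction_pool}"
            )
        owner_by_pool[lane.instruction_pool] = lane.lane_id


def lanes_sharing_core(lanes, core):
    """Return the sorted IDs of all lanes that draw on the given core."""
    return sorted(lane.lane_id for lane in lanes if lane.knowledge_core == core)


check_separation(LANES)
print(lanes_sharing_core(LANES, SHARED_NETWORK_CORE))
```

Keeping the knowledge core as a reference rather than a copy is one way to reuse the shared material without letting the two lanes' instructions bleed into each other.
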
## Implemented In MAGATAMA
- Added `pulso_llm` and `contact_llm` to:
  - RunPod dataset builder
  - Gitea training pool sync
  - model registry build
  - RunPod submit path
  - HuggingFace/RunPod dataset publishing config
  - Susan/NAS training scan
  - full training refresh
  - dashboard training API/status
  - training modal UI
  - fine-tuner lane config and smoke prompts
- Added new lane profiles:
  - `PulsoLLM`
  - `ContactLLM`
- Added source catalog:
  - `training-data/model-registry/research-source-catalog-2026-05-09.json`
  - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
- Added Gitea-backed seed pools:
  - `training-data/gitea-learning-pool/pulso_llm/`
  - `training-data/gitea-learning-pool/contact_llm/`
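
As a quick way to inspect the new seed pools, the sketch below resolves a lane's pool directory and counts the records in it. The directory layout comes from the list above; the assumption that pool files are JSON Lines (`*.jsonl`), and the helper names `pool_dir` and `count_records`, are hypothetical and may not match the real sync tooling.

```python
import json
from pathlib import Path

# Pool root as listed above, relative to the repository root.
POOL_ROOT = Path("training-data/gitea-learning-pool")


def pool_dir(lane_id, repo_root=Path(".")):
    """Return the lane-specific pool directory for a lane ID."""
    return Path(repo_root) / POOL_ROOT / lane_id


def count_records(lane_id, repo_root=Path(".")):
    """Count non-empty JSON lines across *.jsonl files in a lane's pool.

    Assumes JSON Lines files; a malformed record raises ValueError.
    Returns 0 when the pool directory does not exist.
    """
    directory = pool_dir(lane_id, repo_root)
    if not directory.is_dir():
        return 0
    total = 0
    for path in sorted(directory.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    json.loads(line)  # validate the record before counting it
                    total += 1
    return total


if __name__ == "__main__":
    for lane in ("pulso_llm", "contact_llm"):
        print(lane, count_records(lane))
```
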
## Source Seeds Added
- CISA KEV / CISA Malcolm / CISA ScubaGear
- NVD CVE API
- MITRE ATT&CK STIX/TAXII
- OWASP LLM Top 10
- Microsoft PyRIT
- Microsoft Agent Governance Toolkit
- Cisco Transceiver Module Group matrix
- Juniper Hardware Compatibility Tool
- Arista transceiver/cable references
- Flexoptix product/support references
- RFC 9309 robots.txt
- schema.org `ContactPoint`
- RFC 6350 vCard
- PeeringDB API
- RIPE Database REST API
## Verified Counts
RunPod lane exports rebuilt and deployed live on Erik:
- `magatamallm`: `1375 train`, `153 eval`, `1528 total`
- `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
- `tip_llm`: `276 train`, `31 eval`, `307 total`
- `pulso_llm`: `28 train`, `5 eval`, `33 total`
- `contact_llm`: `18 train`, `4 eval`, `22 total`
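
The counts above can be cross-checked mechanically: each total should equal train plus eval, and the eval share shows how thin the two new lanes still are. The numbers are copied from the list above; the helper name `eval_share` is only illustrative.

```python
# (train, eval, total) per lane, copied from the verified counts above.
COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def eval_share(train, eval_count):
    """Fraction of a lane's examples that are held out for evaluation."""
    return eval_count / (train + eval_count)


for lane, (train, eval_count, total) in COUNTS.items():
    # Every reported total must be the sum of its train and eval splits.
    assert train + eval_count == total, lane
    print(f"{lane}: {total} examples, eval share {eval_share(train, eval_count):.1%}")
```

All five totals are consistent. The three established lanes hold out roughly 10% for evaluation, while `pulso_llm` and `contact_llm` hold out about 15% and 18%, which in absolute terms is only 5 and 4 eval examples.
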
## Live Verification
- `pulso_llm` and `contact_llm` appear in the MAGATAMA training modal.
- RunPod provider is online for both new lanes.
- `contact_llm` status correctly reports `neverTrained: true`.
- `pulso_llm` / `contact_llm` are trainable but not yet adopted because no local Ollama model tags exist.
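
The distinction between "trainable" and "adopted" in the checks above can be sketched as a small state function. Only `neverTrained` is a field named in this document; `providerOnline`, the state labels, the `lane_state` helper, and the example tag `contact_llm:latest` are hypothetical stand-ins for whatever the dashboard API and the local model list actually provide.

```python
def lane_state(status, local_model_tags, lane_tag):
    """Classify a lane from its status payload and the local model tags.

    status           -- dict from the training status API (shape assumed)
    local_model_tags -- set of model tag strings available locally
    lane_tag         -- tag the lane would use once adopted (hypothetical)
    """
    has_local_model = lane_tag in local_model_tags
    never_trained = status.get("neverTrained", True)

    # Adoption needs both a completed training history and a local model tag.
    if has_local_model and not never_trained:
        return "adopted"
    # A lane with an online provider can be trained even if nothing is adopted.
    if status.get("providerOnline", False):
        return "trainable"
    return "unavailable"


# Mirrors the live check for contact_llm: provider online, never trained,
# and no local model tag yet.
print(lane_state({"neverTrained": True, "providerOnline": True}, set(), "contact_llm:latest"))
```
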
## Operational Rule
Training success only counts when all of the following are true:
- RunPod reports completion.
- An artifact exists and is reachable.
- Local import succeeds.
- Smoke tests pass.
- The active alias/version is switched.
- The registry and dashboard show the new version.
If any part fails, the lane must stay in a non-adopted state.
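
The rule above is an all-or-nothing gate, which can be sketched as follows. The six conditions are taken from the list; the `RunChecks` dataclass, its field names, and the outcome labels are hypothetical and only illustrate the logic, not the automation that MAGATAMA actually runs.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunChecks:
    """One boolean per condition in the operational rule (names assumed)."""
    runpod_completed: bool
    artifact_reachable: bool
    local_import_ok: bool
    smoke_tests_passed: bool
    alias_switched: bool
    registry_and_dashboard_updated: bool


def failed_checks(checks):
    """Return the names of all conditions that are not satisfied."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name)]


def lane_outcome(checks):
    """A run counts as adopted only when every single condition holds."""
    return "adopted" if not failed_checks(checks) else "non-adopted"


# Example: RunPod finished and the artifact exists, but the import failed,
# so nothing downstream happened and the lane stays non-adopted.
run = RunChecks(
    runpod_completed=True,
    artifact_reachable=True,
    local_import_ok=False,
    smoke_tests_passed=False,
    alias_switched=False,
    registry_and_dashboard_updated=False,
)
print(lane_outcome(run), failed_checks(run))
```

Returning the list of failed conditions, rather than a bare boolean, makes it easier to report why a lane stayed non-adopted.
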