From 3926a1ef906320ebed8761ffe37935b7e09f058e Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sun, 10 May 2026 00:11:48 +0200
Subject: [PATCH] sync: record magatama multi-llm training lanes

---
 sync/CURRENT.md                               | 56 +++++++++++-
 ...05-09-magatama-multi-llm-training-lanes.md | 86 +++++++++++++++++++
 2 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 sync/history/2026-05-09-magatama-multi-llm-training-lanes.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 66e672e..2cd663e 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,63 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 22:01 UTC
+Updated: 2026-05-09 22:05 UTC
 
 ## Newest Work
 
+- MAGATAMA multi-LLM training lane expansion on 2026-05-09:
+  - added first-class training lanes for:
+    - `pulso_llm`
+    - `contact_llm`
+  - MAGATAMA training tool now exposes:
+    - `MagatamaLLM`
+    - `FO_BlogLLM`
+    - `TIP_LLM`
+    - `PulsoLLM`
+    - `ContactLLM`
+  - lane split is now canonical:
+    - `MagatamaLLM`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows
+    - `FO_BlogLLM`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure
+    - `TIP_LLM`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research
+    - `PulsoLLM`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers
+    - `ContactLLM`: contact discovery/research lane for structured, lawful contact lookup and source attribution
+  - shared network/transceiver/switch knowledge is intentionally reused for `TIP_LLM` and `PulsoLLM`, but behavior/instruction pools remain separate
+  - new source catalog added under MAGATAMA:
+    - `training-data/model-registry/research-source-catalog-2026-05-09.json`
+    - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
+  - source seeds added from current research include:
+    - CISA KEV / CISA Malcolm / CISA ScubaGear
+    - NVD CVE API
+    - MITRE ATT&CK STIX/TAXII
+    - OWASP LLM Top 10
+    - Microsoft PyRIT
+    - Microsoft Agent Governance Toolkit
+    - Cisco Transceiver Module Group matrix
+    - Juniper Hardware Compatibility Tool
+    - Arista transceiver/cable references
+    - Flexoptix product/support references
+    - RFC 9309 robots.txt
+    - schema.org `ContactPoint`
+    - RFC 6350 vCard
+    - PeeringDB API
+    - RIPE Database REST API
+  - lane-specific Gitea learning pool directories now exist for:
+    - `training-data/gitea-learning-pool/pulso_llm/`
+    - `training-data/gitea-learning-pool/contact_llm/`
+  - RunPod lane exports rebuilt and deployed live on Erik:
+    - `magatamallm`: `1375 train`, `153 eval`, `1528 total`
+    - `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
+    - `tip_llm`: `276 train`, `31 eval`, `307 total`
+    - `pulso_llm`: `28 train`, `5 eval`, `33 total`
+    - `contact_llm`: `18 train`, `4 eval`, `22 total`
+  - dashboard/API live checks:
+    - `pulso_llm` and `contact_llm` appear in the training modal
+    - RunPod provider is online for both lanes
+    - `contact_llm` status correctly reports `neverTrained: true`
+    - `pulso_llm` / `contact_llm` are trainable but not adopted yet because no local Ollama model tags exist yet
+  - safety/automation note:
+    - do not mark a lane training run successful unless an artifact exists, imports locally, passes smoke tests, and the active alias/version is switched
+    - this remains the rule for all LLM lanes
+
 - TIP open competitor status closure on 2026-05-09:
   - added migration `sql/104-verification-evidence-ambiguous.sql`
   - extends `transceiver_verification_evidence.verification_type` with `competitor_ambiguous`
diff --git a/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md b/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
new file mode 100644
index 0000000..b30c271
--- /dev/null
+++ b/sync/history/2026-05-09-magatama-multi-llm-training-lanes.md
@@ -0,0 +1,86 @@
+# MAGATAMA Multi-LLM Training Lanes
+
+Date: 2026-05-09
+
+## Decision
+
+MAGATAMA now treats the core specialized models as separate training lanes with separate behavior pools:
+
+- `magatamallm`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows.
+- `fo_blogllm`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure.
+- `tip_llm`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research.
+- `pulso_llm`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers.
+- `contact_llm`: structured, lawful contact discovery/research with attribution.
+
+`TIP_LLM` and `PulsoLLM` share a network/transceiver/switch knowledge core, but their instruction and behavior pools stay separate.
+
+## Implemented In MAGATAMA
+
+- Added `pulso_llm` and `contact_llm` to:
+  - RunPod dataset builder
+  - Gitea training pool sync
+  - model registry build
+  - RunPod submit path
+  - HuggingFace/RunPod dataset publishing config
+  - Susan/NAS training scan
+  - full training refresh
+  - dashboard training API/status
+  - training modal UI
+  - fine-tuner lane config and smoke prompts
+- Added new lane profiles:
+  - `PulsoLLM`
+  - `ContactLLM`
+- Added source catalog:
+  - `training-data/model-registry/research-source-catalog-2026-05-09.json`
+  - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
+- Added Gitea-backed seed pools:
+  - `training-data/gitea-learning-pool/pulso_llm/`
+  - `training-data/gitea-learning-pool/contact_llm/`
+
+## Source Seeds Added
+
+- CISA KEV / CISA Malcolm / CISA ScubaGear
+- NVD CVE API
+- MITRE ATT&CK STIX/TAXII
+- OWASP LLM Top 10
+- Microsoft PyRIT
+- Microsoft Agent Governance Toolkit
+- Cisco Transceiver Module Group matrix
+- Juniper Hardware Compatibility Tool
+- Arista transceiver/cable references
+- Flexoptix product/support references
+- RFC 9309 robots.txt
+- schema.org `ContactPoint`
+- RFC 6350 vCard
+- PeeringDB API
+- RIPE Database REST API
+
+## Verified Counts
+
+RunPod lane exports rebuilt and deployed live on Erik:
+
+- `magatamallm`: `1375 train`, `153 eval`, `1528 total`
+- `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
+- `tip_llm`: `276 train`, `31 eval`, `307 total`
+- `pulso_llm`: `28 train`, `5 eval`, `33 total`
+- `contact_llm`: `18 train`, `4 eval`, `22 total`
+
+## Live Verification
+
+- `pulso_llm` and `contact_llm` appear in the MAGATAMA training modal.
+- RunPod provider is online for both new lanes.
+- `contact_llm` status correctly reports `neverTrained: true`.
+- `pulso_llm` / `contact_llm` are trainable but not adopted yet because no local Ollama model tags exist yet.
+
+## Operational Rule
+
+Training success only counts when all of the following are true:
+
+- RunPod reports completion.
+- An artifact exists and is reachable.
+- Local import succeeds.
+- Smoke tests pass.
+- The active alias/version is switched.
+- The registry and dashboard show the new version.
+
+If any part fails, the lane must stay in a non-adopted state.
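
The Operational Rule recorded in this patch is an all-or-nothing gate, which could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not MAGATAMA's actual code: the `RunEvidence` class, its field names, and the `"adopted"` / `"non-adopted"` strings are hypothetical and only mirror the six listed conditions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunEvidence:
    """One flag per condition in the Operational Rule (hypothetical field names)."""
    runpod_completed: bool = False
    artifact_reachable: bool = False
    local_import_ok: bool = False
    smoke_tests_passed: bool = False
    alias_switched: bool = False
    registry_and_dashboard_updated: bool = False


def failed_gates(evidence: RunEvidence) -> list[str]:
    """Return the names of every condition that is not satisfied."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def lane_state(evidence: RunEvidence) -> str:
    """A lane counts as adopted only when all gates hold; otherwise it stays non-adopted."""
    return "adopted" if not failed_gates(evidence) else "non-adopted"


# Example: RunPod finished and an artifact exists, but nothing was imported locally yet.
partial = RunEvidence(runpod_completed=True, artifact_reachable=True)
print(lane_state(partial), failed_gates(partial))
```

Returning the list of failed gates, rather than a bare boolean, lets a dashboard or log say why a lane such as `pulso_llm` is still non-adopted.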