transceiver-db/sync/history/2026-05-06-tip-lane-detangling-and-disk-safe-refresh.md
2026-05-06 22:53:41 +02:00

TIP Lane Detangling And Disk-Safe Refresh

Date: 2026-05-06 UTC

Summary

TIP_LLM was still contaminated by blog/writer behavior even though lane-specific counts were already separated in MAGATAMA. The problem was not limited to UI-level status: the lane corpus feeding the RunPod export was itself contaminated.

The lane was rebuilt and revalidated locally, then synced to Erik and refreshed there. The result is that TIP_LLM now uses a much smaller but correctly aligned research/network corpus instead of silently inheriting FO_Blog-like behavior.

Root Cause

  • The canonical training-data/gitea-learning-pool/tip_llm/*.jsonl pool still contained many blog-shaped rows from shared transceiver corpora.
  • The old TIP export sampled thousands of rows whose prompts/messages still looked like:
    • You are an expert technical writer...
    • publication-ready/blog instructions
  • A direct local check on the pre-fix TIP export showed:
    • 6250 train rows
    • 6087 matched blog/writer patterns
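
A check of this kind can be sketched as follows. This is a minimal illustration, not the check that was actually run: the pattern list and the `auditJsonl` helper are hypothetical, derived only from the phrases quoted above, and matching is done on the raw JSONL line so no row schema is assumed.

```typescript
// Hypothetical pattern list, based only on the phrases quoted in this note.
const BLOG_WRITER_PATTERNS: RegExp[] = [
  /expert technical writer/i,
  /publication-ready/i,
  /blog/i,
];

interface AuditResult {
  rows: number;    // non-empty JSONL rows
  matched: number; // rows containing at least one blog/writer pattern
}

// Counts rows and pattern hits on the raw line text, independent of row schema.
function auditJsonl(jsonl: string): AuditResult {
  const lines = jsonl.split("\n").filter((line) => line.trim().length > 0);
  const matched = lines.filter((line) =>
    BLOG_WRITER_PATTERNS.some((pattern) => pattern.test(line)),
  ).length;
  return { rows: lines.length, matched };
}

// Two illustrative rows: one blog-shaped, one TIP-shaped.
const sample = [
  JSON.stringify({ messages: [{ role: "system", content: "You are an expert technical writer..." }] }),
  JSON.stringify({ messages: [{ role: "system", content: "You are TIP_LLM, a research and market-intelligence analyst..." }] }),
].join("\n");

console.log(auditJsonl(sample));
```

On a real export the same function would be fed the file contents of the train split; a high `matched / rows` ratio is the signal described above.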

Changes Applied

scripts/runpod_dataset_builder.ts

  • Added a stricter tipDatasetAllowed(...) gate.
  • Tightened laneRecordIsCompatible(...) for tip_llm.
  • Tightened lanePoolMessagesAlign(...) for tip_llm:
    • reject:
      • blog writer
      • publication-ready
      • technical writer specializing
      • article-outline/founder/blog prompts
      • markdown-article assistant outputs
  • TIP registry fallback now only considers lane-compatible datasets.
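
The reject rules above could look roughly like the sketch below. It is not the code in `runpod_dataset_builder.ts`: `tipMessagesAlign`, `looksLikeMarkdownArticle`, the message shape, and the exact regular expressions are all assumptions made for illustration, and the markdown-article test is a deliberately crude heuristic.

```typescript
type Role = "system" | "user" | "assistant";

// Assumed chat-row shape; the real pool schema is not shown in this note.
interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical reject list mirroring the bullets above.
const TIP_REJECT_PATTERNS: RegExp[] = [
  /blog writer/i,
  /publication-ready/i,
  /technical writer specializing/i,
  /article[- ]outline/i,
];

// Crude heuristic: an assistant reply that opens with a top-level markdown heading.
function looksLikeMarkdownArticle(text: string): boolean {
  return /^#\s+\S/.test(text.trimStart());
}

// Returns true only if no prompt matches a reject pattern
// and no assistant output looks like a markdown article.
function tipMessagesAlign(messages: ChatMessage[]): boolean {
  for (const message of messages) {
    if (message.role === "assistant") {
      if (looksLikeMarkdownArticle(message.content)) return false;
    } else if (TIP_REJECT_PATTERNS.some((pattern) => pattern.test(message.content))) {
      return false;
    }
  }
  return true;
}

const blogRow: ChatMessage[] = [
  { role: "system", content: "You are an expert technical writer specializing in optics." },
  { role: "user", content: "Write a post about 400G." },
];
const tipRow: ChatMessage[] = [
  { role: "system", content: "You are TIP_LLM, a research and market-intelligence analyst..." },
  { role: "user", content: "Compare QSFP28 vendors." },
  { role: "assistant", content: "Vendor A ships two SKUs in this class." },
];

console.log(tipMessagesAlign(blogRow), tipMessagesAlign(tipRow));
```

A registry fallback that "only considers lane-compatible datasets" would then amount to filtering candidate rows through a predicate like this before sampling.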

scripts/sync_gitea_training_pool.ts

  • Applied the same stricter TIP lane-alignment logic.
  • Stopped rewriting redundant merged.jsonl copies for:
    • fo_blogllm
    • tip_llm
  • This was necessary because the duplicated merged artifacts caused local disk exhaustion during refresh.
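
A minimal sketch of the skip logic, assuming a per-lane output plan: `LANES_WITHOUT_MERGED_COPY` and `plannedOutputs` are hypothetical names, and the assumption that other lanes still receive a `merged.jsonl` is not confirmed by this note.

```typescript
// Lanes for which the redundant merged.jsonl copy is no longer written.
const LANES_WITHOUT_MERGED_COPY = new Set<string>(["fo_blogllm", "tip_llm"]);

// Returns the file names a sync pass would write into the lane's pool directory.
// Assumption: lanes not listed above still get a merged.jsonl.
function plannedOutputs(lane: string, shardFiles: string[]): string[] {
  const outputs = [...shardFiles];
  if (!LANES_WITHOUT_MERGED_COPY.has(lane)) {
    outputs.push("merged.jsonl");
  }
  return outputs;
}

console.log(plannedOutputs("tip_llm", ["a.jsonl", "b.jsonl"]));
console.log(plannedOutputs("magatamallm", ["a.jsonl"]));
```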

Disk Incident

During the first rebuild after the lane hardening, refresh failed with:

  • ENOSPC: no space left on device

The immediate cause was writing:

  • training-data/gitea-learning-pool/tip_llm/merged.jsonl

Fix:

  • truncated redundant merged artifacts for fo_blogllm and tip_llm
  • changed sync logic so those duplicates are no longer recreated

Result:

  • free disk space recovered from roughly 377 MiB to 17 GiB

Verified Local Result

After rebuild:

  • TIP_LLM
    • train = 233
    • eval = 26
    • total = 259
    • blog/writer matches = 0

First rows now use the intended TIP instruction style:

  • You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...

This confirms the lane is no longer silently shaped like FO_Blog.

Synced To Erik

Synced:

  • updated scripts:
    • runpod_dataset_builder.ts
    • sync_gitea_training_pool.ts
    • submit_runpod_training.ts
  • rebuilt lane exports:
    • training-data/runpod/magatamallm/*
    • training-data/runpod/fo_blogllm/*
    • training-data/runpod/tip_llm/*

Then reran on Erik:

  • pnpm training:refresh-all

Live Erik / Public API Result

magatamallm

  • datasetSource = url
  • collectedExamples = 15679
  • evalExamples = 1743
  • totalExamples = 17422
  • newSinceLastTraining = 15679

fo_blogllm

  • datasetSource = url
  • collectedExamples = 17322
  • evalExamples = 1926
  • totalExamples = 19254
  • neverTrained = true

tip_llm

  • datasetSource = url
  • collectedExamples = 231
  • evalExamples = 26
  • totalExamples = 257
  • neverTrained = true
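
The reported counts can be cross-checked with a small invariant test. The sketch assumes that `totalExamples` is meant to equal `collectedExamples + evalExamples`; that relation holds for the magatamallm and tip_llm figures above, while the fo_blogllm figures differ by 6 under it. The `LaneCounts` type and `totalDelta` helper are hypothetical.

```typescript
interface LaneCounts {
  lane: string;
  collectedExamples: number;
  evalExamples: number;
  totalExamples: number;
}

// Figures copied from the public API result above.
const reported: LaneCounts[] = [
  { lane: "magatamallm", collectedExamples: 15679, evalExamples: 1743, totalExamples: 17422 },
  { lane: "fo_blogllm", collectedExamples: 17322, evalExamples: 1926, totalExamples: 19254 },
  { lane: "tip_llm", collectedExamples: 231, evalExamples: 26, totalExamples: 257 },
];

// Difference between the reported total and collected + eval (0 means consistent).
function totalDelta(counts: LaneCounts): number {
  return counts.totalExamples - (counts.collectedExamples + counts.evalExamples);
}

for (const counts of reported) {
  console.log(`${counts.lane}: delta=${totalDelta(counts)}`);
}
```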

Remaining Work

Lane contamination is no longer the hard blocker.

The remaining blocker is:

  • RunPod artifact validation/adoption

Desired next steps:

  1. only accept RunPod COMPLETED as success if a real artifact exists
  2. verify artifact importability
  3. update/adopt local Ollama tag automatically
  4. switch MAGATAMA only after successful adoption
  5. run pre/post smoke prompts
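
The gating part of these steps could be expressed as a single decision function. This is a sketch under stated assumptions, not an existing implementation: `RunReport`, `Decision`, and `decideAdoption` are hypothetical, the fields stand in for checks that would be performed elsewhere (artifact lookup, import test, smoke prompts), and no RunPod or Ollama API is called.

```typescript
// Hypothetical summary of one training run, filled in by separate checks.
interface RunReport {
  status: string;               // status string reported by RunPod
  artifactBytes: number | null; // null when no artifact was found
  importable: boolean;          // result of a separate import check
  smokePassed: boolean;         // result of post-training smoke prompts
}

type Decision =
  | { adopt: true }
  | { adopt: false; reason: string };

// COMPLETED alone is not success: an artifact must exist, import cleanly,
// and pass smoke prompts before the local tag is updated.
function decideAdoption(report: RunReport): Decision {
  if (report.status !== "COMPLETED") {
    return { adopt: false, reason: `status is ${report.status}` };
  }
  if (report.artifactBytes === null || report.artifactBytes <= 0) {
    return { adopt: false, reason: "no artifact found" };
  }
  if (!report.importable) {
    return { adopt: false, reason: "artifact is not importable" };
  }
  if (!report.smokePassed) {
    return { adopt: false, reason: "smoke prompts failed" };
  }
  return { adopt: true };
}

console.log(decideAdoption({ status: "COMPLETED", artifactBytes: null, importable: false, smokePassed: false }));
```

Switching MAGATAMA to the new tag would then be conditional on `adopt` being true, which keeps steps 3 and 4 behind the same gate.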