transceiver-db/sync/CURRENT.md

# Current TIP Sync State

Updated: 2026-05-09 21:24 UTC

## Newest Work

- TIP no-valid-competitor resolver on 2026-05-09:
  - added `packages/scraper/src/utils/resolve-no-valid-competitor.ts`
    - script: `pnpm -C packages/scraper run verify:no-valid-competitor`
    - default mode is dry-run
    - apply mode requires `NO_VALID_MATCH_APPLY=1`
    - default vendor scope is `NO_VALID_MATCH_VENDOR=Flexoptix`
  - purpose:
    - close products that already have price, image, and details evidence
    - only resolve competitor verification when there is no strict source-backed 1:1 competitor candidate
    - avoid fake competitor matches for uncommon Flexoptix products
  - conservative gates:
    - active transceiver only; excludes known artifact/non-transceiver categories
    - source-backed `price_verified`, `image_verified`, and `details_verified` required
    - same-vendor candidates ignored; only other vendors count
    - strict candidate match requires same form factor, same speed, same fiber, a reach difference within `max(25m, 5%)`, and a compatible wavelength when both sides expose it
    - no pending/approved equivalence above confidence `0.50`
  - live Erik run:
    - dry-run with Flexoptix scope found `73` no-valid-match candidates
    - apply run updated `73`
    - `73` additional products earned `fully_verified`
    - evidence ledger wrote `73` `competitor_no_match` records
  - live health after run:
    - active products: `17414`
    - price verified: `11523`
    - image verified: `12125`
    - details verified: `16814`
    - fully verified: `10831`
    - active competitor status:
      - `matched=11158`
      - `no_valid_match=73`
      - `ambiguous=192`
      - `needs_research=5991`
  - operational note:
    - `tip-scraper-daemon` was initially not restarted while QSFPTEK/NADDOD pricing jobs were active
    - after those jobs cleared, `tip-scraper-daemon` was restarted once
    - `maintenance:reconcile-verification` completed
    - `maintenance:find-equivalences` completed
    - matcher correctly moved `192` products into `ambiguous` instead of inventing unsafe matches
    - remaining fully populated product rows with `needs_research`:
      - `FS.COM=74`
      - `Flexoptix=15`
      - `ATGBICS=2`
  - TIPLLM training pool:
    - appended deterministic no-valid-match resolver lessons
    - JSONL must remain valid after every append
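
The conservative gates above can be read as a predicate over one product and one candidate. The following is a minimal sketch under stated assumptions, not the resolver's actual code: the `ProductSpec` shape and all function names are hypothetical, the `5%` is assumed to be relative to the larger of the two reaches, and "compatible wavelength" is simplified to equality.

```typescript
// Hypothetical row shape for illustration; not the real TIP schema.
interface ProductSpec {
  vendor: string;
  formFactor: string;
  speedGbps: number;
  fiber: string;
  reachMeters: number;
  wavelengthNm?: number; // absent when the source does not expose it
}

// Reach tolerance from the gate text: max(25 m, 5%).
// Assumption: the 5% is taken from the larger of the two reaches.
function reachTolerance(a: number, b: number): number {
  return Math.max(25, 0.05 * Math.max(a, b));
}

function isStrictCandidate(product: ProductSpec, candidate: ProductSpec): boolean {
  // Same-vendor candidates are ignored; only other vendors count.
  if (candidate.vendor.toLowerCase() === product.vendor.toLowerCase()) return false;
  if (candidate.formFactor !== product.formFactor) return false;
  if (candidate.speedGbps !== product.speedGbps) return false;
  if (candidate.fiber !== product.fiber) return false;
  const diff = Math.abs(candidate.reachMeters - product.reachMeters);
  if (diff > reachTolerance(product.reachMeters, candidate.reachMeters)) return false;
  // Wavelength is only compared when both sides expose it.
  // Assumption: "compatible" is simplified to "equal" here.
  if (product.wavelengthNm !== undefined && candidate.wavelengthNm !== undefined) {
    return product.wavelengthNm === candidate.wavelengthNm;
  }
  return true;
}

// A product is a no-valid-match candidate when no other-vendor row passes the gate.
function hasNoStrictCandidate(product: ProductSpec, pool: ProductSpec[]): boolean {
  return !pool.some((c) => isStrictCandidate(product, c));
}
```

With these assumptions, a 10 km product tolerates a 10.4 km candidate (difference 400 m, tolerance 520 m) but not an 11 km one, while a 300 m product tolerates at most a 25 m difference.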

- TIP verification truth model on 2026-05-09:
  - implemented migration `sql/103-verification-evidence-and-competitor-status.sql`
    - adds `transceivers.competitor_status`
      - `matched`
      - `no_valid_match`
      - `needs_research`
      - `ambiguous`
      - `unknown`
    - adds `no_match_verified_at` and `no_match_reason`
    - creates append-only `transceiver_verification_evidence`
  - code changes:
    - scraper DB helper now records evidence for price/image/details decisions
    - artifact quarantine records `artifact_quarantine` evidence
    - matcher writes `competitor_match` evidence for auto-approved matches
    - matcher sets product status to `matched`, `ambiguous`, or `needs_research`
    - Review API adds protected `POST /api/review/transceivers/:id/no-valid-match`
    - Review stats now include product-level competitor status counts
    - Health API now exposes active-product competitor status counts
  - live migration/backfill:
    - applied on Erik successfully
    - status distribution after migration:
      - `matched=11198`
      - `needs_research=6575`
    - evidence ledger seeded from current data:
      - `price=10633`
      - `image=12189`
      - `details=16782`
      - `competitor_match=316`
  - live API checks:
    - `/api/health` healthy
    - active health competitor status:
      - `matched=11158`
      - `needs_research=6256`
      - `no_valid_match=0`
      - `ambiguous=0`
    - protected review stats with Dashboard token returned product status counts correctly
  - operational note:
    - `tip-api` restarted successfully
    - `tip-scraper-daemon` was not restarted because `scrape:pricing:naddod` and `scrape:pricing:qsfptek` were active
    - scheduler code is synced to `/opt/tip`; restart daemon after those jobs complete to load new matcher/reconcile logic
  - TIPLLM training pool:
    - appended lessons for competitor state machine and evidence ledger
    - JSONL validated locally
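
A quick consistency check on the status counts recorded in this file: the `competitor_status` values appear to partition the active products, so the per-status counts should add up to the same total before and after products move between states. The sketch below is illustrative only. The `StatusCounts` type and the `totalProducts` helper are hypothetical names; the numbers are the active-product counts quoted in this entry and in the resolver entry above.

```typescript
type CompetitorStatus =
  | "matched"
  | "no_valid_match"
  | "needs_research"
  | "ambiguous"
  | "unknown";

type StatusCounts = Partial<Record<CompetitorStatus, number>>;

// Sum all per-status counts; missing statuses count as zero.
function totalProducts(counts: StatusCounts): number {
  let total = 0;
  for (const key of Object.keys(counts) as CompetitorStatus[]) {
    total += counts[key] ?? 0;
  }
  return total;
}

// Active health counts right after the migration.
const afterMigration: StatusCounts = {
  matched: 11158,
  needs_research: 6256,
  no_valid_match: 0,
  ambiguous: 0,
};

// Active health counts after the no-valid-competitor resolver and matcher run.
const afterResolver: StatusCounts = {
  matched: 11158,
  no_valid_match: 73,
  ambiguous: 192,
  needs_research: 5991,
};

console.log(totalProducts(afterMigration)); // 17414
console.log(totalProducts(afterResolver)); // 17414
```

Both snapshots sum to `17414`, which matches the active product count reported after the resolver run: `73 + 192 = 265` products left `needs_research`, and `6256 - 265 = 5991`.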

- MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - RunPod success only counts after the artifact exists, local Ollama import works, smoke tests pass, the alias/version switches, the remote registry is updated, and live MAGATAMA reports no stale active run
    - do not spend another RunPod run when the paid training already completed; recover adoption instead
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
    - run id `magatamallm-2026-05-09T19-22-53`
    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
  - adoption recovered:
    - initial local adoption failed because Mac Studio had too little free disk for GGUF conversion after the merged model was written
    - removed only temporary/import-safe blockers:
      - failed MagatamaLLM merged `model.safetensors`
      - already imported FO_BlogLLM and TIP_LLM source GGUF files
      - old non-active Ollama test model `test-qwen32b:latest`
    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
  - adoption completed:
    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
    - release alias `magatama-coder-r1`
    - active alias `magatama-coder:latest`
    - candidate smoke `4/5` passed with the required threshold `4`
    - direct local smoke returned exact `MAGATAMA-R1-READY`
  - dashboard/server correction:
    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, with `completed_at/adopted_at/created_at` fallbacks
    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
    - restarted `magatama-dashboard`
  - live verification:
    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
    - `modelVersion=magatama-coder-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `hasTrustedTrainingBaseline=true`
    - `newSinceLastTraining=0`
    - lane export shows `1367` train, `152` eval, `1519` total
    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior: the new MagatamaLLM passed the required smoke threshold, but its answer on that eval was still too passive
    - complete dual-Gitea mirroring as a separate infrastructure closure item
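
The registry ordering fix described above (prefer `recorded_at`, then fall back to `completed_at`, `adopted_at`, `created_at`) could look roughly like this. It is a sketch, not the dashboard's code: the `RegistryEvent` shape, the `run_id` field, and the helper names are assumptions, the fallback order simply follows the order in which the fields are listed, and the timestamps in the usage example are made up.

```typescript
interface RegistryEvent {
  run_id: string; // hypothetical identifier field
  recorded_at?: string;
  completed_at?: string;
  adopted_at?: string;
  created_at?: string;
}

// Pick the first available timestamp in fallback order and convert it to
// epoch milliseconds. Missing or unparseable timestamps sort last (0).
function eventTime(e: RegistryEvent): number {
  const raw = e.recorded_at ?? e.completed_at ?? e.adopted_at ?? e.created_at;
  const ms = raw === undefined ? NaN : Date.parse(raw);
  return isNaN(ms) ? 0 : ms;
}

// Return a new array ordered newest first; the input is not mutated.
function newestFirst(events: RegistryEvent[]): RegistryEvent[] {
  return events.slice().sort((a, b) => eventTime(b) - eventTime(a));
}

const ordered = newestFirst([
  { run_id: "old", created_at: "2026-05-08T10:00:00Z" },
  { run_id: "new", recorded_at: "2026-05-09T19:30:00Z", created_at: "2026-05-01T00:00:00Z" },
  { run_id: "mid", completed_at: "2026-05-09T12:00:00Z" },
]);
console.log(ordered.map((e) => e.run_id).join(", ")); // new, mid, old
```

Note that the `new` row sorts first even though its `created_at` is the oldest value in the list, because `recorded_at` takes precedence.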

- TIP verification artifact cleanup and vendor completion on 2026-05-09:
  - operator requirement:
    - continue until all source-backed verification work is exhausted
    - use deterministic TIP robots/scrapers only; no external AI
    - keep Erik safe by running targeted jobs and waiting for pg-boss completion
    - write crawler/scraper/robot learnings into the TIPLLM training pool
  - deployed fixes:
    - added/expanded `verify:quarantine:non-transceivers`
      - removes GAO, Ascent, FS.com, Flexoptix, Arista, ShopFiber24, and Coherent category/support/cable/switch artifacts from the active transceiver base
      - clears price/image/details/competitor/fully verification flags for those artifacts
    - added `verify:normalize:product-urls`
      - repaired malformed older Mouser URLs such as duplicated `https://www.mouser.dehttps://www.mouser.de...`
    - added `scrape:gaotek:details`
      - lightweight fetch+cheerio detail verifier for GAO product URLs
    - hardened Ascent parser so product-family/category rows are skipped
    - repaired 10Gtek/SFPcables scraper to pass product URL and image URL into verification and parse common meter/range reaches
    - scheduler reconcile now excludes known non-transceiver categories when promoting `details_verified`
  - live robot runs:
    - non-transceiver quarantine:
      - first pass quarantined 121 artifacts
      - Flexoptix filter URL pass quarantined 103 artifacts
      - Ascent/Flex/FS/Arista/ShopFiber/Coherent cleanup quarantined 68 + 38 + 6 additional artifacts
    - GAO detail verifier:
      - 245 GAO product pages examined
      - 181 rows updated and details verified
      - 64 skipped because source text still lacked complete deterministic specs
    - Mouser URL normalizer:
      - 388 malformed `mouser.de` URLs repaired
    - 10Gtek scraper:
      - 50 product pages parsed via sfpcables.com
      - URL/image propagation repaired for future verification
    - Ascent scraper:
      - 237 genuine product rows kept after parser hardening
      - category/family rows no longer re-enter active verification
    - FS.com DB detail run:
      - 1 remaining detail page scraped
      - 1 price observation and 1 spec verification written
    - reconcile completed
    - equivalence matcher completed at `2026-05-09 20:11:39 UTC`
  - latest live TIP health:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - active total `17,405`
    - `price_verified=11,523`
    - `image_verified=12,125`
    - `details_verified=16,810`
    - `fully_verified=10,758`
  - vendor truth after cleanup:
    - active Flexoptix products now have price/image/details complete; remaining `not_full=280` is competitor-match only
    - active FS.com products now have price/image/details complete; remaining `not_full=74` is competitor-match only
    - GAO Tek remains quote-only/no public prices: 433 active rows still blocked by missing public price/competitor evidence
    - Juniper/Cisco/Eoptolink/Ascent/OEM families remain the largest open blockers because public price/image evidence is not available for many rows
  - TIPLLM training pool:
    - appended deterministic lessons to `training-data/tip-llm-capabilities-v1.jsonl`
    - JSONL validated locally
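
The Mouser repair above fixes URLs where the origin was prepended twice. A minimal sketch of that kind of normalizer follows; it is not the code behind `verify:normalize:product-urls`. The function name and the regular expression are assumptions, and the product path in the usage line is invented for illustration.

```typescript
// Matches a leading origin (scheme + host) that is immediately followed by
// another scheme, e.g. "https://www.mouser.de" directly in front of
// "https://www.mouser.de/...". A URL embedded in a path or query string is
// not touched, because the host part may not contain "/", "?" or "#".
const DUPLICATED_ORIGIN = /^https?:\/\/[^\/?#]+?(?=https?:\/\/)/i;

function normalizeProductUrl(url: string): string {
  let out = url.trim();
  // Loop so that an origin duplicated more than once is also repaired.
  while (DUPLICATED_ORIGIN.test(out)) {
    out = out.replace(DUPLICATED_ORIGIN, "");
  }
  return out;
}

console.log(
  normalizeProductUrl("https://www.mouser.dehttps://www.mouser.de/ProductDetail/example")
);
// https://www.mouser.de/ProductDetail/example
```

Well-formed URLs pass through unchanged, so the function is safe to run repeatedly over the same rows.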

- TIP global verification continuation on 2026-05-09:
  - operator requirement:
    - continue until all possible product data is searched, found, verified, and source-backed
    - no external AI; use TIP deterministic scrapers/robots only
    - keep Erik safe; do not launch a heavy crawler wave
    - write crawler/scraper/robot learnings into the TIPLLM training pool
  - deployed fixes:
    - repaired GAO Tek scraper for the live Woodmart product grid:
      - current selector is `.wd-product.product-grid-item`
      - product title selector includes `.wd-entities-title a`
      - SKU selector includes `.wd-sku`
      - fallback now only accepts real `https://gaotek.com/product/...` URLs
      - category URLs are excluded from active verification/search counters
    - expanded GAO reach parsing:
      - 1/2/10/15/20/30/40/50/80/120/140/160 km
      - 82/100/300/500/550 m
      - mile values converted to rounded km labels
    - added `packages/scraper/src/utils/verify-catalog-details.ts`
      - promotes details only for complete normalized catalog specs with a vendor website/docs/datasheet source URL
      - does not mark price/image/competitor verified
    - hardened scheduler reconcile so category URLs are not promoted as details source
    - fixed Flexoptix image backfill vendor-name case bug (`Flexoptix` vs `FLEXOPTIX`)
    - expanded other-vendor image backfill list for Cisco, Juniper, Arista, 10Gtek, QSFPTEK, SFPcables, Coherent, NADDOD
  - crawler/robot runs:
    - GAO Tek scraper:
      - fetched 20 pages
      - extracted 480 real product cards
      - found 0 public prices
      - reset 6 category/non-product artifacts
    - pi-fetch priority wave:
      - GAO Tek, Juniper OEM/MX/QFX, Cisco Nexus/Catalyst/ASR, Ascent, Eoptolink, Flexoptix, Flexoptix supported vendors, Arista OEM
      - all jobs completed
    - reconcile completed
    - equivalence matcher completed
    - catalog-details verifier promoted 4,340 details
    - image backfill:
      - first expanded run updated 48 images
      - Flexoptix case fix then updated 12 additional images
  - live public TIP health after this pass:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - active total `17,714`
    - `price_verified=11,582`
    - `image_verified=12,194`
    - `details_verified=16,684`
    - `fully_verified=11,052`
  - hard truth:
    - GAO Tek appears quote-only/no public price in the crawled catalog, so prices remain unverified rather than fabricated
    - many OEM rows now have verified details but still lack public prices/images/competitor evidence
    - Flexoptix still has 110 image-missing SKUs after GraphQL returned no usable image for those SKUs
    - top remaining blockers are mostly public price/image/competitor availability, not detail parsing
  - TIPLLM training pool:
    - appended `robot-experiences/2026-05-09.jsonl`
    - validated JSONL locally
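
The expanded GAO reach parsing above can be sketched as follows. This is an illustration under assumptions, not the deployed parser: the function name is hypothetical, treating the listed km and m values as an allow-list is an assumption, and so is rounding mile values to the nearest whole kilometre before the lookup.

```typescript
// Reach values listed in the entry above.
const KNOWN_KM = [1, 2, 10, 15, 20, 30, 40, 50, 80, 120, 140, 160];
const KNOWN_M = [82, 100, 300, 500, 550];
const KM_PER_MILE = 1.609344;

interface Reach {
  label: string;
  meters: number;
}

function parseGaoReach(text: string): Reach | null {
  // "km" and "mile(s)"/"mi" are tried before plain "m"; the trailing \b
  // keeps wavelengths such as "850nm" from being read as a reach.
  const m = /(\d+(?:\.\d+)?)\s*(km|miles?|mi|m)\b/i.exec(text);
  if (!m) return null;
  const value = parseFloat(m[1]);
  const unit = m[2].toLowerCase();
  if (unit === "m") {
    return KNOWN_M.indexOf(value) !== -1 ? { label: value + "m", meters: value } : null;
  }
  // Miles are converted to a rounded km label (assumption: nearest whole km).
  const km = unit === "km" ? value : Math.round(value * KM_PER_MILE);
  return KNOWN_KM.indexOf(km) !== -1 ? { label: km + "km", meters: km * 1000 } : null;
}

console.log(parseGaoReach("10G SFP+ 1310nm 10km")); // label "10km", meters 10000
console.log(parseGaoReach("6.2 miles")); // label "10km", meters 10000
console.log(parseGaoReach("850nm only")); // null
```

Returning `null` for anything outside the known values keeps the parser from promoting a reach it cannot recognize with confidence.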

- MAGATAMA FO_BlogLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - training success must only count after artifact exists, local import works, smoke tests pass, Ollama alias/version switches, remote MAGATAMA registry is updated, and the live UI reports no active stale job
    - no repeat of failed "COMPLETED but nothing adopted" serverless runs
    - local Mac Studio training remains throttled by default to avoid saturating the workstation
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
    - run id `fo_blogllm-2026-05-09T17-14-16`
    - target artifact `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
    - worker summary `RunPod QLoRA complete · train=11473 · valid=1281`
  - failure recovered:
    - first local adoption failed because Mac Studio disk filled during F16 GGUF conversion
    - removed stale partial F16 GGUF and obsolete merged safetensors to restore free space
    - hardened importer to:
      - require minimum free disk before conversion
      - delete stale partial F16 before retry
      - reuse existing GGUF when present
      - delete temporary F16 in all cases
      - remove merged safetensors/bin after successful Ollama registration unless `.keep-merged` exists
  - adoption completed:
    - local candidate `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
    - release alias `fo-blog-v7-r1`
    - active alias `fo-blog-v7`
    - candidate smoke `5/5` passed
    - direct local smoke returned exact `FO-BLOG-V7-READY`
  - dashboard/server hardening:
    - old baseline smoke is now non-blocking when the active alias does not exist yet; candidate smoke remains mandatory
    - deployed updated dashboard bundle, fine-tuner API template, and RunPod-Ollama importer to Erik
    - restarted `magatama-dashboard`
    - copied `fo_blogllm-last_run.json` and adoption report to Erik
    - appended remote training registry event `completed_and_adopted`
  - live verification:
    - `fo_blogllm` reports `activeProvider=ollama:fo-blog-v7`
    - `modelVersion=fo-blog-v7-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `collectedExamples=17322`, `evalExamples=1926`, `totalExamples=19267`
    - `newSinceLastTraining=0`
    - `tip_llm` remains healthy with `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - TIP runtime correction:
    - TIP UI already referenced `fo-blog-v7`, but `/opt/tip/blog-llm-settings.json` still forced `provider=claude-code`
    - old adapter bridge port `192.168.178.213:11435` was not reachable
    - switched runtime and PM2 env to `BLOG_LLM_PROVIDER=ollama`, `OLLAMA_URL=http://192.168.178.213:11434`, `OLLAMA_LLM_MODEL=fo-blog-v7`
    - restarted `tip-api` and `tip-scraper-daemon`
    - verified from Erik that `fo-blog-v7` answers through the TIP path with exact `TIP-FO-BLOG-V7-READY`
  - open:
    - run the same end-to-end custom-worker/adoption path for `magatamallm`
    - complete dual-Gitea mirroring as a separate infrastructure closure item

- Near-complete detail queue closed with lightweight vendor detail verifiers on 2026-05-09:
  - operator requirement:
    - keep Erik safe; no heavy browser crawler or Playwright wave
    - only source-backed product details may be marked verified
    - crawler/scraper/robot learnings must be written to the TIPLLM training pool
  - implemented:
    - `packages/scraper/src/scrapers/atgbics-detail-pages.ts`
    - `packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts`
    - npm scripts:
      - `scrape:atgbics:details`
      - `scrape:vendors:details`
  - ATGBICS product.js pass:
    - first run fetched `107`, updated `97`, skipped `10`, promoted `97`
    - parser then learned to ignore unhelpful `Max Distance_N/A` tags and fall back to title/body source text
    - final run fetched `10`, updated `10`, skipped `0`, promoted `10`
    - after a concurrent price update exposed another AOC batch, follow-up run fetched `23`, updated `23`, skipped `0`, promoted `23`
    - ATGBICS near-complete missing details reduced to `0`
  - FiberMall + ShopFiber24 detail pass:
    - first run fetched `116`, updated `112`, skipped `4`, promoted `112`
    - final semantic closure fetched `4`, updated `4`, skipped `0`, promoted `4`
    - FiberMall near-complete missing details reduced to `0`
    - ShopFiber24 near-complete missing details reduced to `0`
  - truth handling:
    - FiberMall uses Schema.org Product JSON-LD for title/description/mpn/image evidence
    - ShopFiber24 uses static title/meta/description evidence
    - variable AOC/DAC/category family pages are classified as `Product Family`, `AOC Cable Family`, or `DAC Cable Family` with `Variant` reach instead of a fake fixed meter value
    - media converters/switches/mux/adapter rows are classified as non-transceiver product classes instead of optical equivalents
    - 100G DWDM DCO rows are classified as `Coherent DWDM` with line-system-dependent reach when source pages do not provide a normal reach
  - final live state:
    - global `price_verified=11582`
    - global `details_verified=12276`
    - global `fully_verified=11001`
    - near-complete queue (`price_verified AND image_verified AND competitor_verified AND NOT details_verified`): `0`
    - public TIP health `healthy`
    - load status `ok`
    - memory used `12%`

- MAGATAMA training live cleanup and TIP_LLM adoption closure on 2026-05-09:
  - operator requirement:
    - no local Mac Studio training may consume the full workstation by default
    - RunPod success must mean artifact exists, local import works, alias/version switches, smoke tests pass, and metadata is written back
    - stale RunPod jobs must not keep the UI in a fake "running" state
  - live cleanup completed:
    - cancelled stale RunPod job `83baffe9-d702-43fc-a2b0-bd5818b74059-e2` on old endpoint `ocnuj82cowe2ym`
    - copied local `tip_llm-last_run.json` back to Erik under `/root/magatama-llm/fine-tuning/`
    - appended remote training registry event `completed_and_adopted` for custom-worker job `dd35df4a-99f7-468f-8c9e-be19baa78338-e1`
    - live dashboard now reports `activeRun: null` for `tip_llm` instead of stale in-queue work
  - adopted model state:
    - active TIP_LLM alias is `tip-llm-v1`
    - release alias is `tip-llm-v1-r1`
    - source artifact is `renefichtmueller/magatama-tip-llm-tip-llm-2026-05-09t13-16-14`
    - local smoke test returned exact `TIP_OK`
  - dashboard hardening:
    - stale active training detection now collapses registry rows by job/run and ignores terminal, expired, 404, or cancelled RunPod jobs
    - deployed patched `packages/dashboard/dist/server.js` and restarted `magatama-dashboard`
  - Mac Studio safety:
    - local training now defaults to `nice=+10`, BLAS/OpenMP thread caps of `4`, tokenizer parallelism off, and MPS high-watermark ratio `0.70`
    - full-speed local training requires explicit `MAGATAMA_LOCAL_TRAIN_UNTHROTTLED=1`
  - live verification:
    - `tip_llm` reports `modelVersion=tip-llm-v1-r1`, `lastRegistryRunStatus=completed_and_adopted`, `activeRun=null`
    - `fo_blogllm` still uses its lane-specific pool and active provider `ollama:fo-blog-v7`
  - open:
    - run the same hardened custom-worker end-to-end path for `magatamallm` and the next `fo_blogllm` version
    - keep Gitea/Proxmox mirror work as a separate infrastructure closure item
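
The stale-run detection described above (collapse registry rows per job, then ignore jobs whose latest state is terminal, expired, 404, or cancelled) might be sketched like this. The row shape, field names, and status strings are assumptions for illustration; `NOT_FOUND` stands in for a 404 from the job API and is not a confirmed status value.

```typescript
interface RegistryRow {
  job_id: string; // hypothetical field names
  status: string;
  recorded_at: string; // ISO timestamp
}

// Assumption: illustrative status strings, compared case-insensitively.
const INACTIVE_STATUSES = [
  "COMPLETED",
  "COMPLETED_AND_ADOPTED",
  "FAILED",
  "CANCELLED",
  "EXPIRED",
  "NOT_FOUND",
];

function activeRuns(rows: RegistryRow[]): RegistryRow[] {
  // Collapse: keep only the most recent row per job.
  const latest: Record<string, RegistryRow> = {};
  for (const row of rows) {
    const prev = latest[row.job_id];
    if (!prev || Date.parse(row.recorded_at) >= Date.parse(prev.recorded_at)) {
      latest[row.job_id] = row;
    }
  }
  // A job only counts as active if its latest state is not inactive.
  return Object.keys(latest)
    .map((id) => latest[id])
    .filter((row) => INACTIVE_STATUSES.indexOf(row.status.toUpperCase()) === -1);
}

const stillActive = activeRuns([
  { job_id: "a", status: "IN_QUEUE", recorded_at: "2026-05-09T13:00:00Z" },
  { job_id: "a", status: "CANCELLED", recorded_at: "2026-05-09T14:00:00Z" },
  { job_id: "b", status: "IN_PROGRESS", recorded_at: "2026-05-09T13:30:00Z" },
]);
console.log(stillActive.map((r) => r.job_id).join(", ")); // b
```

Collapsing first matters: without it, the older `IN_QUEUE` row for job `a` would keep the UI in a "running" state even though the job was later cancelled.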

- ATGBICS deterministic special-case backfill on 2026-05-09:
  - precheck:
    - after the explicit URL evidence pass, ATGBICS still had `139` near-complete rows
    - `32` matched safe protocol/product-class cases:
      - loopback/test modules
      - 10GBASE-T / RJ45 copper
      - 10GBASE-LRM
      - BX60 / BXD-60 / BXU-60
      - CWDM 10G 60km
      - CSR rows
  - DB correction:
    - loopback/test modules -> `N/A` reach/fiber/wavelength, `Loopback / Test Module`
    - 10GBASE-T/RJ45 -> `30m`, `Copper`, `N/A`
    - LRM -> `220m`, `MMF`, `1310`
    - BX60 -> `60km`, `SMF`, directional BiDi wavelength evidence
    - CWDM 10G 60 -> `60km`, `SMF`, source wavelength
    - CSR -> `400m`, `MMF`, `850`
  - result:
    - `32` ATGBICS rows detail-verified
    - `32` additional rows promoted to fully verified
    - ATGBICS near-complete missing details reduced from `139` to `107`
    - global `details_verified=12030`
    - global `fully_verified=10753`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `12%`
  - truth:
    - remaining ATGBICS rows need detail-page extraction; they are mostly generic OEM/part-number pages where the URL slug does not encode the reach
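
The special cases above amount to a small lookup table from a product-class pattern to deterministic details. A sketch of that idea is below. The reach, fiber, and wavelength values follow the "DB correction" list in this entry; the regular expressions, type names, and function name are assumptions, and `null` marks a wavelength that must still come from the source rather than from the table.

```typescript
interface DetailDefaults {
  reach: string;
  fiber: string;
  wavelength: string | null; // null: take the wavelength from the source evidence
  productClass?: string;
}

interface SpecialCase {
  pattern: RegExp;
  defaults: DetailDefaults;
}

// Patterns are illustrative assumptions; values follow the entry above.
const SPECIAL_CASES: SpecialCase[] = [
  {
    pattern: /loopback/i,
    defaults: { reach: "N/A", fiber: "N/A", wavelength: "N/A", productClass: "Loopback / Test Module" },
  },
  { pattern: /10GBASE-T\b|\bRJ-?45\b/i, defaults: { reach: "30m", fiber: "Copper", wavelength: "N/A" } },
  { pattern: /\bLRM\b/i, defaults: { reach: "220m", fiber: "MMF", wavelength: "1310" } },
  { pattern: /\bBX[DU]?-?60\b/i, defaults: { reach: "60km", fiber: "SMF", wavelength: null } },
  { pattern: /\bCWDM\b.*\b60\s?km\b/i, defaults: { reach: "60km", fiber: "SMF", wavelength: null } },
  { pattern: /\bCSR\d?\b/i, defaults: { reach: "400m", fiber: "MMF", wavelength: "850" } },
];

function lookupSpecialCase(partText: string): DetailDefaults | null {
  for (const rule of SPECIAL_CASES) {
    if (rule.pattern.test(partText)) return rule.defaults;
  }
  return null; // no safe special case: leave the row for detail-page extraction
}

console.log(lookupSpecialCase("SFP-10G-LRM")); // reach "220m", fiber "MMF", wavelength "1310"
console.log(lookupSpecialCase("SFP-10G-LR")); // null
```

Returning `null` for plain `LR` is deliberate in this sketch: only the named product classes get table values, and everything else stays unverified.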

- ATGBICS explicit URL evidence backfill on 2026-05-09:
  - precheck:
    - ATGBICS had `485` price+image+URL-complete rows still lacking detail verification
    - `346` had explicit source URL evidence for reach and media:
      - `m/km` distance in URL
      - `nm` wavelength where optical
      - `smf/mmf/copper/dac/base-t/rj45` media evidence
  - DB correction:
    - extracted reach label/meters from explicit URL `m/km`
    - extracted wavelength from explicit URL `nm`
    - classified media as `SMF`, `MMF`, or `Copper` from URL evidence
    - corrected form factor and speed from protocol terms in URL where stale parser defaults existed
    - marked only those source-evident rows as `details_verified`
  - result:
    - `346` ATGBICS rows detail-verified
    - `346` additional rows promoted to fully verified
    - ATGBICS near-complete missing details reduced from `485` to `139`
    - global `details_verified=11998`
    - global `fully_verified=10721`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining ATGBICS rows no longer have simple `m/km + media` URL evidence and need product-page parsing or special handling
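
The URL evidence rules above (explicit `m/km` distance, `nm` wavelength, and a media token) can be sketched as a small extractor. This is an assumption-laden illustration, not the backfill code: the type and function names are hypothetical, the delimiter rules are a guess at what "explicit" means, and the example URLs in the usage lines are invented.

```typescript
type Media = "SMF" | "MMF" | "Copper";

interface UrlEvidence {
  reachLabel: string | null;
  reachMeters: number | null;
  wavelengthNm: number | null;
  media: Media | null;
}

// Tokens must be delimited by "-", "_", "/" (or a trailing ".") so that
// "1310nm" is never read as a reach and "10g" is never read as "10 m".
const REACH = /(?:^|[-_\/])(\d+(?:\.\d+)?)(km|m)(?=$|[-_\/.])/;
const WAVELENGTH = /(?:^|[-_\/])(\d{3,4})nm(?=$|[-_\/.])/;
const MEDIA_RULES: Array<[RegExp, Media]> = [
  [/(?:^|[-_\/])smf(?=$|[-_\/.])/, "SMF"],
  [/(?:^|[-_\/])mmf(?=$|[-_\/.])/, "MMF"],
  [/(?:^|[-_\/])(?:copper|dac|rj45)(?=$|[-_\/.])/, "Copper"],
  [/base-t(?=$|[-_\/.])/, "Copper"],
];

function extractUrlEvidence(url: string): UrlEvidence {
  const slug = url.toLowerCase();
  const reach = REACH.exec(slug);
  const wave = WAVELENGTH.exec(slug);
  let media: Media | null = null;
  for (const [pattern, label] of MEDIA_RULES) {
    if (pattern.test(slug)) {
      media = label;
      break;
    }
  }
  let reachMeters: number | null = null;
  if (reach) {
    const value = parseFloat(reach[1]);
    reachMeters = Math.round(reach[2] === "km" ? value * 1000 : value);
  }
  return {
    reachLabel: reach ? reach[1] + reach[2] : null,
    reachMeters,
    wavelengthNm: wave ? parseInt(wave[1], 10) : null,
    media,
  };
}

// Only rows with both reach and media evidence qualify for promotion.
function hasReachAndMedia(e: UrlEvidence): boolean {
  return e.reachMeters !== null && e.media !== null;
}

console.log(extractUrlEvidence("https://example.com/products/10gbase-lr-sfp-1310nm-10km-smf-transceiver"));
console.log(hasReachAndMedia(extractUrlEvidence("https://example.com/products/glc-lh-smd-compatible"))); // false
```

A row whose slug yields no reach or no media token is simply left alone, which mirrors the "marked only those source-evident rows" rule.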

- NADDOD adapter classification and FS.COM final detail closure on 2026-05-09:
  - precheck:
    - NADDOD had `3` near-complete rows remaining
    - FS.COM had `1` near-complete row remaining
  - source verification:
    - NADDOD `100GBASE-S25`, `40GBASE-S10`, and `MAM1Q00A-QSA28-S` are adapter/converter modules, not optical transceivers
    - FS SKU `110529` is official FS `QDD-LR4-400G`, `400GBASE-LR4 QSFP-DD`, `10km`, `SMF`, CWDM4 `1271/1291/1311/1331nm`, Duplex LC
  - DB correction:
    - classified the `3` NADDOD rows as `Adapter / Converter`
    - set NADDOD reach/fiber/wavelength to `N/A` and corrected connector/form-factor/speed semantics
    - corrected FS `FS-110529` to part number `QDD-LR4-400G`, standard `400GBASE-LR4 QSFP-DD`, CWDM4 wavelength set, Duplex LC/UPC
  - result:
    - `4` rows detail-verified
    - `3` additional rows promoted to fully verified
    - NADDOD near-complete reduced to `0`
    - FS.COM near-complete reduced to `0`
    - global `details_verified=11652`
    - global `fully_verified=10375`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - adapters/converters are verified as non-optical product classes and must not be used as optical transceiver equivalence evidence

- GBICS / QSFPTEK / Fluxlight deterministic standard backfill on 2026-05-09:
  - precheck:
    - GBICS had `13` near-complete rows
    - QSFPTEK had `8` near-complete rows
    - Fluxlight had `11` near-complete rows
  - DB correction:
    - GBICS:
      - filled missing fiber/reach from explicit title/URL evidence such as `850nm`, `1310nm`, `1550nm`, `40km`, `80km`, `220m`, `50m`, `CSR`, `ESR`, `SR8`, `VSR4`, `PSM4`, `PLR4`
    - QSFPTEK:
      - filled SMF and missing long-reach values for `EX`, `EZX`, `ZX`, `LH` product-code rows
    - Fluxlight:
      - corrected obvious stale parser defaults and filled standard evidence for `GLC-LX`, `QDD-4X100G-FR`, `QSFP-100G-SR4`, `QSFP-40G-SR4`, `SFP-10G-T`, `CSR`
  - result:
    - `32` rows detail-verified
    - `32` additional rows promoted to fully verified
    - GBICS near-complete reduced to `0`
    - QSFPTEK near-complete reduced to `0`
    - Fluxlight near-complete reduced to `0`
    - global `details_verified=11648`
    - global `fully_verified=10372`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this was not a broad guess pass; only rows with explicit standard/URL evidence were updated

- FiberMall URL protocol backfill on 2026-05-09:
  - precheck:
    - after the earlier source-title pass, `36` FiberMall rows remained price+image+URL complete but lacked detail verification
    - `12` had safe protocol evidence in the product URL slug
  - DB correction:
    - mapped URL protocol slugs including `sfp-10g-lrm`, `qsfp-40g-lr`, `40lr`, `dem-qx10q-lr4`, `osfp-800g-2fr4`, `qsfp-dd-400g-lr8`, `400g-qsfp-dd-sr4`, `200g-q56-sr4-mm850`, `xg-sfp-zr-sm1550`, `sfp28-lr`, `ma-qsfp-40g-sr-bd`
    - corrected form factor, speed, reach, fiber, wavelength and standard name from those protocol slugs
    - skipped brand-name-only rows without protocol/reach evidence
  - result:
    - `12` FiberMall rows detail-verified
    - `12` additional rows promoted to fully verified
    - FiberMall near-complete missing details reduced from `36` to `24`
    - global `details_verified=11616`
    - global `fully_verified=10340`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining FiberMall rows are mostly brand/OEM-code-only URLs and need stronger product-page parsing before approval

- ShopFiber24 deterministic code backfill on 2026-05-09:
  - precheck:
    - `101` ShopFiber24 rows were price+image+URL complete but lacked detail verification
    - many were variable cable families (`XM`, `CXM`, `CUXM`, `CXX`, AOC/DAC family rows) and were intentionally skipped
    - `9` rows had deterministic product-code evidence: `LRM`, `BX60`, `LH70`, `T-80`
  - DB correction:
    - `LRM` -> `220m`, `MMF`, `1310`
    - `BX60` / `BX-D-60` / `BX-U-60` -> `60km`, `SMF`, `1270/1330`
    - `LH70` -> `70km`, `SMF`, `1550`
    - `T-80` -> `80m`, `Copper`, `N/A`
  - result:
    - `9` ShopFiber24 rows detail-verified
    - `9` additional rows promoted to fully verified
    - ShopFiber24 near-complete missing details reduced from `101` to `92`
    - global `details_verified=11604`
    - global `fully_verified=10328`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining ShopFiber24 gaps need variant-level extraction or direct page parsing; variable cable-family rows must not be marked as one fixed reach

- ATGBICS parser truth hardening on 2026-05-09:
  - root cause:
    - ATGBICS parser defaulted unknown fiber type to `SMF`
    - automatic detail verification needs positive fiber evidence, not a fallback
    - variable-length ranges must not be collapsed into a fixed reach
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - refuses variable reach ranges such as `1 - 30 m`
      - only returns `SMF` from explicit SMF/single-mode or protocol evidence such as LR/ER/ZR/BiDi/CWDM/DWDM/DR/FR/PSM
      - returns empty fiber type when evidence is missing instead of assuming SMF
  - verification:
    - `npm run build -w packages/scraper` passed locally
  - deployment:
    - source file synced to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik after SSH recovered
  - truth:
    - future ATGBICS runs should not promote rows to detail-verified from default fiber assumptions
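
The hardened fiber rule above (positive evidence only, no `SMF` fallback) could be sketched as below. This is not the code in `atgbics.ts`: the function name and regular expressions are assumptions, and matching protocol codes case-sensitively is a design choice made here to avoid hits on ordinary words.

```typescript
type FiberType = "SMF" | "";

// Explicit wording in the source text.
const EXPLICIT_SMF = /\bSMF\b|single[\s-]?mode/i;
// Protocol evidence named in the entry above: LR/ER/ZR/DR/FR/PSM and CWDM/DWDM.
// Case-sensitive on purpose (assumption: part numbers are upper case).
const SMF_PROTOCOL = /\b(?:LR|ER|ZR|DR|FR|PSM)\d*\b|\b[CD]WDM\d*\b/;
const BIDI = /\bbidi\b/i;

function inferSmf(text: string): FiberType {
  if (EXPLICIT_SMF.test(text)) return "SMF";
  if (SMF_PROTOCOL.test(text) || BIDI.test(text)) return "SMF";
  return ""; // no positive evidence: do not assume SMF
}

console.log(inferSmf("SFP-10G-LR 1310nm")); // SMF
console.log(inferSmf("SFP-10G-LRM") === ""); // true: LRM is not treated as LR
console.log(inferSmf("Generic transceiver") === ""); // true
```

The word boundary after the optional digits is what keeps `LRM` from matching the `LR` rule, which matters because LRM rows are multimode elsewhere in this file.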

- ShopFiber24 parser hardening for deterministic cable/detail verification on 2026-05-09:
  - root cause:
    - ShopFiber24 contains variable-length AOC/DAC products such as `1 - 30 m`
    - those must not be interpreted as one fixed `30m` reach and marked detail-verified
    - the scraper also treated `800G` / `QSFP-DD800` product text as `400G`
  - code hardened:
    - `packages/scraper/src/scrapers/fiber24.ts`
      - detects `800G` as `800G` / `800Gbps`
      - parses explicit single `m/km` reach values generically
      - refuses variable ranges like `1 - 30 m`, `1 to 30 m`, `1 bis 30 m`
  - verification:
    - `npm run build -w packages/scraper` passed locally
  - deployment:
    - source file synced to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik
  - truth:
    - future ShopFiber24 passes should only mark product details verified when reach is deterministic
    - variable cable-family rows need variant-level extraction instead of broad approval
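
The range refusal described above can be sketched as a two-step parse: first reject anything that looks like a variable range, then accept a single explicit `m/km` value. The names and regular expressions below are assumptions rather than the code in `fiber24.ts`, and the sketch deliberately errs on the side of refusing.

```typescript
interface FixedReach {
  label: string;
  meters: number;
}

// Variable-length ranges such as "1 - 30 m", "1 to 30 m", "1 bis 30 m".
// The first number must not be glued to a preceding letter or digit, so a
// part number like "SR4-100m" is not mistaken for the range "4-100m".
const RANGE =
  /(?:^|[^A-Za-z0-9.,])\d+(?:[.,]\d+)?\s*(?:-|\u2013|to|bis)\s*\d+(?:[.,]\d+)?\s*(?:km|m)\b/i;
const SINGLE = /(\d+(?:\.\d+)?)\s*(km|m)\b/i;

function parseFixedReach(text: string): FixedReach | null {
  if (RANGE.test(text)) return null; // a range is not a deterministic reach
  const m = SINGLE.exec(text);
  if (!m) return null;
  const value = parseFloat(m[1]);
  const unit = m[2].toLowerCase();
  return {
    label: value + unit,
    meters: Math.round(unit === "km" ? value * 1000 : value),
  };
}

console.log(parseFixedReach("AOC 1 - 30 m")); // null
console.log(parseFixedReach("QSFP28 AOC 30 m")); // label "30m", meters 30
console.log(parseFixedReach("SFP+ LR 10km")); // label "10km", meters 10000
```

A `null` result leaves the row unverified, which is the intended outcome for variable cable families until variant-level extraction exists.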

- FiberMall source-title optical detail backfill on 2026-05-09:
  - precheck:
    - `69` FiberMall rows had price + image + source URL but lacked detail verification
    - all `69` had optical hints
    - `33` had deterministic reach evidence in product title or URL
  - DB correction:
    - filled reach label/meters from explicit `m/km` evidence
    - filled fiber type from SMF/MMF/source-title evidence when missing
    - filled wavelength from explicit `nm` or safe protocol-family evidence where present
    - marked only source-backed rows with deterministic reach as `details_verified`
  - result:
    - `33` FiberMall rows detail-verified
    - `33` additional rows promoted to fully verified
    - global `details_verified=11595`
    - global `fully_verified=10319`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining FiberMall rows need stronger source parsing because many are OEM-compatible rows whose DB part number is only a brand name

- MAGATAMA training pipeline recovery, TIP_LLM adoption and Mac Studio local throttle on 2026-05-09:
  - operator requirement:
    - training success only counts after real artifact, local import, alias switch, smoke test and metadata write-back
    - RunPod `COMPLETED` alone is not sufficient
    - local Mac Studio training must not consume the whole workstation
  - completed:
    - custom RunPod worker artifact `renefichtmueller/magatama-tip-llm-tip-llm-2026-05-09t13-16-14` was adopted locally
    - active alias `tip-llm-v1` now points to release alias `tip-llm-v1-r1`
    - local Ollama model `tip-llm-v1` smoke-tested successfully with exact response `TIP_OK`
  - hardened:
    - MAGATAMA train API venv dependencies installed
    - Ollama converter now falls back from HTTP API create to `ollama create`
    - Ollama binary path resolution fixed for service/LaunchAgent context
    - RunPod import script reuses valid GGUF artifacts and rejects stale failed conversions
    - smoke gate now supports an 80 percent minimum threshold to avoid blocking good adoptions on one brittle prompt
    - local training defaults now set `nice=+10`, `OMP/MKL/OPENBLAS/VECLIB/NUMEXPR=4`, `TOKENIZERS_PARALLELISM=false`, `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.70`
    - disabling the local throttle entirely requires explicit `MAGATAMA_LOCAL_TRAIN_UNTHROTTLED=1`
  - source paths touched:
    - `/Users/renefichtmueller/magatama-llm/service/training_api.py`
    - `/Users/renefichtmueller/magatama-llm/service/train.py`
    - `/Users/renefichtmueller/magatama-llm/service/register_runpod_ollama_model.py`
    - `/Users/renefichtmueller/magatama-llm/scripts/register_runpod_ollama_model.py`
    - MAGATAMA repo equivalents under `packages/fine-tuner/` and `scripts/`
    - LLM gateway converter under `packages/fine-tuner/src/converter.py`
  - verification:
    - Python syntax checks passed
    - local train API reachable after restart
    - Ollama tags contain `tip-llm-v1`, `tip-llm-v1-r1`, and the imported candidate
    - final model smoke returned `TIP_OK`
  - open:
    - repeat the hardened full end-to-end custom worker path for `magatamallm` and `fo_blogllm`
    - add TIP_LLM controller-policy examples: Erik light controller only; heavy crawlers on Proxmox/Pis
    - never mark training as successful unless artifact retrieval/import/smoke/adoption all pass
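
The success rule above can be sketched as a small gate. This is a minimal illustration, not the MAGATAMA implementation: the stage names, `SmokeResult` and both function names are hypothetical, and only the required stages and the 80 percent smoke threshold come from the log.

```typescript
// Hypothetical stage names: the log lists the required stages but no API.
type AdoptionStage =
  | "artifactRetrieved"
  | "localImport"
  | "aliasSwitch"
  | "metadataWriteBack";

const REQUIRED_STAGES: readonly AdoptionStage[] = [
  "artifactRetrieved",
  "localImport",
  "aliasSwitch",
  "metadataWriteBack",
];

interface SmokeResult {
  passedPrompts: number;
  totalPrompts: number;
}

// Smoke gate: passes when the share of passing prompts reaches the minimum
// ratio (80 percent per the log), so one brittle prompt does not block adoption.
function smokeGatePasses(result: SmokeResult, minRatio: number = 0.8): boolean {
  if (result.totalPrompts <= 0) return false; // no prompts run means no evidence
  return result.passedPrompts / result.totalPrompts >= minRatio;
}

// A remote "COMPLETED" status is deliberately not an input here:
// only locally confirmed stages plus the smoke gate count.
function isTrainingSuccessful(
  completed: ReadonlySet<AdoptionStage>,
  smoke: SmokeResult,
): boolean {
  const allStagesDone = REQUIRED_STAGES.every((stage) => completed.has(stage));
  return allStagesDone && smokeGatePasses(smoke);
}
```

Keeping the remote job status out of the function signature makes it hard to report success from RunPod state alone.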

- ATGBICS Cable/AOC detail backfill on 2026-05-09:
  - current ATGBICS near-complete state before pass:
    - `581` rows had price + image + product source URL but still lacked detail verification
    - `0` of those were core-complete optical rows
    - `101` had clear Cable/AOC/Copper/Twinax/Breakout hints
    - `22` had coherent/ZR/DCO/C-band hints and were left for a later source-specific coherent parser
  - DB correction:
    - used deterministic length evidence from product URL / part text
    - updated `96` ATGBICS Cable/AOC rows with:
      - reach label/meters
      - cable/AOC/Copper classification
      - `wavelengths=N/A` for Copper/DAC/Twinax
      - source-backed `details_verified`
    - promoted `109` rows to `fully_verified`
  - global result after pass:
    - `details_verified=11562`
    - `fully_verified=10286`
    - total products `17647`
  - health:
    - public TIP health: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - repeated broad ATGBICS JSON runs are low-yield now
    - remaining ATGBICS gaps need targeted optical/coherent parsing, especially ZR/DCO/C-band/LAN-WDM and non-cable products missing reach/fiber

- NADDOD infrastructure classification pass on 2026-05-09:
  - root cause:
    - NADDOD remaining detail gaps were mostly not pluggable transceiver modules
    - examples included switches, ConnectX adapter cards, Quantum/Spectrum infrastructure and OSFP cage systems
  - DB correction:
    - classified `18` NADDOD rows by source/title evidence:
      - switch/Quantum/Spectrum/ONIE/ports => `Switch / Network Infrastructure`
      - adapter/ConnectX => `NIC / Adapter`
    - used allowed `data_confidence=scraped_unverified`
    - added note: `classified as non-transceiver infrastructure product by source/title evidence`
    - marked details verified only when a source product URL existed
  - result:
    - public health counters after pass:
      - `details_verified=11466`
      - `fully_verified=10177`
      - total products `17647`
    - TIP health stayed `healthy`
    - load status `ok`
    - memory used `12%`
  - truth:
    - these rows should not be treated as 1:1 optical transceiver equivalents
    - they remain useful inventory/network infrastructure records, but need separate switch/NIC handling later
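
The classification rule above can be sketched as follows. This is an illustration under assumptions: the keyword groups, category labels, `data_confidence` value and note text come from the log, while the regular expressions, the NIC-before-switch ordering and all function and type names are hypothetical.

```typescript
type InfraCategory = "Switch / Network Infrastructure" | "NIC / Adapter";

// Keyword groups follow the log entry; the exact patterns are an assumption.
const NIC_HINTS = /\badapter\b|connectx/i;
const SWITCH_HINTS = /\bswitch(es)?\b|quantum|spectrum|\bonie\b|\bports\b/i;

// NIC hints are checked first (assumption) so that an adapter card that
// mentions its "ports" is not filed as a switch.
function classifyInfrastructure(title: string): InfraCategory | null {
  if (NIC_HINTS.test(title)) return "NIC / Adapter";
  if (SWITCH_HINTS.test(title)) return "Switch / Network Infrastructure";
  return null; // leave pluggable modules and unknown rows untouched
}

interface InfraUpdate {
  category: InfraCategory;
  dataConfidence: "scraped_unverified";
  note: string;
  detailsVerified: boolean;
}

// Details are only marked verified when a source product URL exists.
function buildInfraUpdate(title: string, productUrl: string | null): InfraUpdate | null {
  const category = classifyInfrastructure(title);
  if (category === null) return null;
  return {
    category,
    dataConfidence: "scraped_unverified",
    note: "classified as non-transceiver infrastructure product by source/title evidence",
    detailsVerified: productUrl !== null && productUrl.trim() !== "",
  };
}
```

Returning `null` for unmatched titles keeps the pass conservative: rows without a keyword hit stay as they are.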

- QSFPTEK cable/AOC parser hardening and DB detail backfill on 2026-05-09:
  - root cause:
    - QSFPTEK scraper parsed catalog rows but did not pass `productUrl` into `findOrCreateScrapedTransceiver`
    - generic leading cable lengths like `1m`, `2m`, `10m`, `15m`, `30m` were not parsed
    - MFS/MCP AOC/DAC product families were not classified as cable/AOC products
  - code hardened:
    - `packages/scraper/src/scrapers/qsfptek.ts`
      - parses generic `m/km` reach, including leading lengths
      - classifies `MFS`/AOC/active fiber as `AOC Cable`
      - classifies `MCP`/DAC/Copper/Twinax as `Cable`
      - writes `productUrl` into the DB upsert
      - sets Copper/DAC wavelength to `N/A`
      - adds safe optical family wavelength parsing for future catalog runs
  - DB correction:
    - found `36` QSFPTEK rows missing details
    - `28` had deterministic leading length and source URL
    - updated those `28` with reach, cable/AOC classification and source-backed details
    - `8` additional rows became fully verified after promotion
  - deployment:
    - synced patched QSFPTEK scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
  - truth:
    - QSFPTEK is now much closer to complete; the remaining rows include long-reach 1G optics missing fiber/detail fields, which should be handled separately by source parsing, not guessed
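
The two parser changes described above (family classification and leading cable lengths) might look roughly like this. It is a sketch only: the category labels and the `MFS`/`MCP` family hints come from the log, while the patterns and function names are assumptions and not the code in `qsfptek.ts`.

```typescript
type CableCategory = "AOC Cable" | "Cable";

// MFS/AOC/active fiber => "AOC Cable"; MCP/DAC/Copper/Twinax => "Cable".
function classifyCableFamily(text: string): CableCategory | null {
  if (/\bMFS|\bAOC\b|active\s+(optical|fiber)/i.test(text)) return "AOC Cable";
  if (/\bMCP|\bDAC\b|copper|twinax/i.test(text)) return "Cable";
  return null; // not a recognizable cable product
}

// Parses a length that leads the title, such as "1m", "10m" or "30m".
// Only a number followed directly by m/km at the very start is accepted.
function parseLeadingLengthMeters(title: string): number | null {
  const match = /^\s*(\d+(?:\.\d+)?)\s*(km|m)\b/i.exec(title);
  if (match === null) return null;
  const value = Number(match[1]);
  const unit = (match[2] ?? "").toLowerCase();
  return unit === "km" ? Math.round(value * 1000) : value;
}
```

Anchoring the length pattern at the start of the title avoids picking up speeds such as `100G` or reach values that appear later in the text.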

- Copper/DAC reach/detail verification and comparable API semantics on 2026-05-09:
  - purpose:
    - continue toward full TIP verification without inventing optical data
    - treat Copper/DAC/Twinax as cable products with `wavelengths=N/A`, not missing optical products
  - DB correction:
    - found `467` Copper rows still missing reach label/meters
    - `342` had deterministic length evidence in part number or product URL
    - wrote `reach_label`, `reach_meters`, `wavelengths=N/A`, cable category and detail verification for those `342`
    - corrected `78` ATGBICS OSFP cable rows that had been parsed as `SFP`
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - detects `OSFP` before `SFP`
      - parses generic decimal meter/kilometer reach such as `0.5m`, `1.5m`, `2.5m`, `30m`, `2km`
      - keeps Copper/DAC/Twinax/Base-T/RJ45 wavelength as `N/A`
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now allow Copper/DAC/CU products to match each other with `wavelengths=N/A`
      - optical products still require numeric wavelength evidence and close wavelength match
  - deployment:
    - synced ATGBICS scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
    - synced API route to active `/opt/tip`
    - `pnpm -C packages/api build` passed
    - restarted `tip-api`
  - result:
    - global `details_verified` increased from `11085` to `11425`
    - global `fully_verified` increased from `9861` to `10170`
    - Copper remaining gaps after correction:
      - missing reach label: `122`
      - missing reach meters: `125`
      - missing details: `158`
    - selected vendor detail/fully state:
      - ATGBICS: details `7656/8269`, fully `7646/8269`
      - NADDOD: details `726/748`, fully `726/748`
      - QSFPTEK: details `165/201`, fully `140/201`
      - FS.COM: details `373/383`, fully `300/383`
      - Flexoptix: details `626/744`, fully `622/744`
      - GAO Tek: details `127/414`, fully `2/414`
  - health:
    - public TIP health after restart: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this is real progress toward trustworthy complete data, not cosmetic flag setting
    - remaining gaps are now smaller targeted vendor/parser/source tasks; NADDOD and QSFPTEK are next high-yield targets
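
The `OSFP`-before-`SFP` fix and the Copper wavelength rule above can be illustrated with an ordered lookup. This is a minimal sketch: the candidate list, its ordering and the function names are assumptions, not the contents of `atgbics.ts`.

```typescript
// Longer, more specific names come first so that "OSFP" and "QSFP..." are
// never captured by the shorter "SFP" substring (list is an assumption).
const FORM_FACTORS_BY_PRIORITY: readonly string[] = [
  "QSFP-DD", "QSFP112", "QSFP56", "QSFP28", "QSFP+", "QSFP",
  "OSFP",
  "SFP56", "SFP28", "SFP+", "SFP",
];

function detectFormFactor(text: string): string | null {
  const upper = text.toUpperCase();
  for (const candidate of FORM_FACTORS_BY_PRIORITY) {
    if (upper.indexOf(candidate) !== -1) return candidate;
  }
  return null;
}

// Copper/DAC/Twinax/Base-T/RJ45 products have no optical wavelength.
function isCopperProduct(text: string): boolean {
  return /copper|\bDAC\b|twinax|base-t\b|rj45/i.test(text);
}

// Copper rows get "N/A"; optical rows keep whatever the parser found,
// including null when nothing deterministic was available.
function resolveWavelengths(text: string, parsedWavelengths: string | null): string | null {
  return isCopperProduct(text) ? "N/A" : parsedWavelengths;
}
```

A plain substring check for `SFP` would also match `OSFP` and `QSFP28`, which is the failure mode behind the `78` mis-parsed OSFP cable rows.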

- ATGBICS safe JSON rerun + Copper wavelength semantics on 2026-05-09:
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
    - detects `N/A` wavelength for Copper/DAC/Twinax/Base-T/RJ45 products
    - detects safe optical protocol-family wavelengths:
      - CWDM4 => `1271,1291,1311,1331`
      - SR/SR4/SR8/SRBD/VR/ESR/CSR => `850`
      - DR/FR/LR/ER/PSM family => `1310`
  - deployment:
    - synced patched ATGBICS scraper source to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik
  - runtime:
    - ran one light ATGBICS Shopify `products.json` pass with `nice -n 10`
    - no Playwright/browser crawler
    - processed `7946` products
    - price updates `61`
    - image observations/updates `7943`
  - observation:
    - ATGBICS verification counters did not move: the remaining highspeed wavelength gaps are mostly rows whose source keys point to cable, coherent or variant products, which the current lightweight parser does not resolve
    - sample remaining rows include QSFP-DD ZR/C-band/coherent products and Copper/DAC rows
  - DB truth correction:
    - Copper/DAC products do not have an optical wavelength and should not be counted as missing optical wavelength
    - set empty Copper `wavelengths` to `N/A` for `1044` rows
    - highspeed missing-wavelength count changed:
      - before Copper correction: `1908`
      - after Copper correction: `1360`
      - highspeed Copper missing: `0`
      - remaining optical/non-Copper highspeed missing: `1220`
  - health:
    - public TIP health after run/update: `healthy`
    - load status `ok`
    - memory used `14%`
  - truth:
    - the ATGBICS JSON run was safe and confirmed current prices/images, but did not materially improve ATGBICS technical completeness yet
    - next ATGBICS work should be a targeted parser for product URL slug classes: `ZR`, `DCO`, `C-band`, `LAN-WDM`, `CR8`, `breakout`, and OSFP/QSFP-DD cable form-factor correction

- DB-only highspeed wavelength evidence backfill on 2026-05-09:
  - purpose:
    - improve product-level technical completeness and future 1:1 comparison quality without running a browser crawler on Erik
  - method:
    - only used existing DB evidence from part numbers, standard names, notes and product URLs
    - only filled wavelengths when evidence was deterministic:
      - explicit `850nm`, `1310nm`, `1311nm`, or `1550nm`
      - MMF plus SR/SR4/SR8/SRBD/VR/ESR/CSR family => `850`
      - SMF plus DR/FR/LR/ER/PSM family => `1310`
      - SMF plus CWDM4 => `1271,1291,1311,1331`
    - skipped ambiguous highspeed rows instead of inventing data
  - updated rows:
    - `129` rows set to `1310`
    - `40` rows set to `850`
    - `18` rows set to `1271,1291,1311,1331`
    - total updated: `187`
  - highspeed wavelength gap after update:
    - highspeed rows: `4438`
    - still missing wavelengths: `1908`
    - largest remaining gaps:
      - ATGBICS `663`
      - NADDOD `419`
      - Flexoptix `183`
      - Eoptolink `141`
      - FS.COM `114`
      - QSFPTEK `97`
  - health:
    - public TIP health after update: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this was an evidence backfill, not a claim of full source verification
    - remaining wavelength gaps need vendor-specific parsers/crawlers or stronger source text
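
The deterministic rules listed under "method" can be sketched as one function. The rule set (explicit `nm` values, fiber-gated protocol families, skipping ambiguous rows) follows the log; the regular expressions, the tolerance for a leading lane digit such as `2FR4`, and the function name are assumptions.

```typescript
type FiberType = "SMF" | "MMF" | null;

// Family patterns are an approximation of the families named in the log.
const MMF_850_FAMILY = /\b(?:SR|VR|ESR|CSR)[48]?\b|\bSRBD\b/;
const SMF_1310_FAMILY = /\b\d?(?:DR|FR|LR|ER)\d*\b|\bPSM\d*\b/;

// Returns a wavelength string only when the evidence is deterministic;
// returns null for ambiguous rows instead of inventing data.
function inferWavelengths(evidenceText: string, fiber: FiberType): string | null {
  // 1. An explicit wavelength in the text wins.
  const explicit = /\b(850|1310|1311|1550)\s*nm\b/i.exec(evidenceText);
  if (explicit !== null) return explicit[1] ?? null;

  const upper = evidenceText.toUpperCase();

  // 2. Protocol families only count together with matching fiber evidence.
  if (fiber === "SMF" && /\bCWDM4\b/.test(upper)) return "1271,1291,1311,1331";
  if (fiber === "MMF" && MMF_850_FAMILY.test(upper)) return "850";
  if (fiber === "SMF" && SMF_1310_FAMILY.test(upper)) return "1310";

  // 3. Anything else (coherent, ZR, unknown fiber) is skipped.
  return null;
}
```

Requiring the fiber type alongside the family name is what keeps the backfill conservative: a bare `SR4` in a part number without MMF evidence yields no update.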

- Strict active equivalence sweep + reach-meter backfill on 2026-05-09:
  - follow-up after the FS.com `QDD-2FR4-800G` false-comparable correction
  - audited all active `approved/auto_approved` equivalence matches for hard 1:1 risks:
    - breakout/AOC/DAC/cable class mismatch
    - known reach mismatch
    - known fiber mismatch
    - primary wavelength mismatch
    - missing core evidence on active matches
  - found and rejected `16` active false positives:
    - Flexoptix 400G/100G pluggable optics that were matched to ATGBICS AOC/breakout products
    - Flexoptix `Q.851HG.03` 300m MMF incorrectly matched to 70m and 40km NADDOD rows
    - Flexoptix `Q.854HG.01.P` 100m MMF incorrectly matched to a 1m NADDOD row
  - global reach-meter backfill:
    - `269` rows with `km` reach labels received numeric `reach_meters`
    - `131` rows with `m` reach labels received numeric `reach_meters`
    - remaining reach labels without meters are only `N/A` accessory/control rows, not distance products
  - post-sweep active match risk counts:
    - active approved/auto-approved matches: `34051`
    - breakout-class mismatches: `0`
    - reach mismatches: `0`
    - fiber mismatches: `0`
    - wavelength mismatches: `0`
    - missing core evidence: `0`
  - live counters after sweep:
    - equivalence queue: `pending=0`, `approved=1987`, `auto_approved=32064`, `rejected=148382`, `due_research=0`
    - product verification: total `17647`, price `11557`, image `11963`, details `11085`, fully `9861`
  - truth:
    - active equivalence matches now have no known hard 1:1 mismatches by DB evidence
    - this still does not mean every product row is fully enriched; remaining work is product-level vendor enrichment and source capture
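
The reach-meter backfill above amounts to converting `m`/`km` labels into numbers and leaving everything else alone. A minimal sketch, assuming a simplified row shape; `ReachRow` and both function names are hypothetical and the real pass ran against the TIP database.

```typescript
interface ReachRow {
  reachLabel: string | null;
  reachMeters: number | null;
}

// Converts labels such as "2km", "500m" or "0.5m" into meters.
// Labels without a plain m/km value (for example "N/A") return null.
function reachLabelToMeters(label: string): number | null {
  const match = /^\s*(\d+(?:\.\d+)?)\s*(km|m)\s*$/i.exec(label);
  if (match === null) return null;
  const value = Number(match[1]);
  const unit = (match[2] ?? "").toLowerCase();
  return unit === "km" ? Math.round(value * 1000) : value;
}

// Fills reachMeters only where it is missing (null or 0) and the label is
// convertible; rows that already carry a value are returned unchanged.
function backfillReachMeters(rows: ReachRow[]): ReachRow[] {
  return rows.map((row) => {
    const missing = row.reachMeters === null || row.reachMeters === 0;
    if (!missing || row.reachLabel === null) return row;
    const meters = reachLabelToMeters(row.reachLabel);
    return meters === null ? row : { ...row, reachMeters: meters };
  });
}
```

Treating `reach_meters=0` as missing matters here because a zero paired with a `2km` label was the state that let false comparables through.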

- FS.com `QDD-2FR4-800G` false comparable correction on 2026-05-09:
  - operator spotted that the dashboard showed invalid comparable products for FS.com `QDD-2FR4-800G`
  - wrong examples:
    - Flexoptix `DQ.2A858HG.z`: actually `800G QSFP-DD to 2x QSFP112 Breakout AOC`, MMF, 1-30m, not a 2km SMF FR4 transceiver
    - NADDOD `QDD-800LPO-2DR4`: 500m, not 2km
  - root cause:
    - FS.com `QDD-2FR4-800G` had `reach_label=2km` but `reach_meters=0`
    - API comparable-product SQL treated unknown reach as a wildcard, so non-1:1 products leaked into the dashboard comparison section
  - live DB correction:
    - `QDD-2FR4-800G`
      - `form_factor=QSFP-DD`
      - `speed=800G`
      - `speed_gbps=800`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=800G QSFP-DD 2FR4`
      - remains fully verified
  - API correction:
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now require hard reach evidence on both sides
      - reach ratio must be at least `0.85`
      - fiber type must match exactly
      - primary wavelength must exist on both sides and be within `15nm`
      - breakout/AOC/DAC/cable products can only compare to other breakout/AOC/DAC/cable products
      - `QSFP-DD` and `QSFP-DD800` are treated as same form-factor family for 800G-class comparisons
  - deployment:
    - copied API route to Erik
    - `pnpm -C packages/api build` passed on Erik
    - `pm2 restart tip-api` completed, `tip-api` online
  - health:
    - public TIP health after restart: `healthy`, load `ok`, memory `13%`
  - truth:
    - `DQ.2A858HG.z` must never be shown as 1:1 comparable for `QDD-2FR4-800G`
    - a 500m NADDOD LPO/2DR4 product must not be shown as 2km comparable
    - unknown reach must never act as wildcard in final product comparison
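
The comparable-product rules from the API correction can be expressed as a single predicate. The production check is SQL inside `packages/api/src/routes/transceivers.ts`; the TypeScript below is only an illustration with a hypothetical `ComparableProduct` shape. The thresholds (`0.85`, `15nm`) and the `QSFP-DD`/`QSFP-DD800` family rule come from the log, while the speed equality check and the reading of "800G-class" as exactly 800 Gbps are assumptions.

```typescript
interface ComparableProduct {
  formFactor: string;
  speedGbps: number;
  reachMeters: number | null;
  fiberType: string | null;
  primaryWavelengthNm: number | null;
  isCableClass: boolean; // breakout/AOC/DAC/cable products
}

const MIN_REACH_RATIO = 0.85;
const MAX_WAVELENGTH_DELTA_NM = 15;

// QSFP-DD and QSFP-DD800 count as one family for 800G-class products.
function formFactorFamily(product: ComparableProduct): string {
  if (product.speedGbps === 800 && product.formFactor === "QSFP-DD800") return "QSFP-DD";
  return product.formFactor;
}

function isComparable(a: ComparableProduct, b: ComparableProduct): boolean {
  if (formFactorFamily(a) !== formFactorFamily(b)) return false;
  if (a.speedGbps !== b.speedGbps) return false;
  // Cable-class products only compare to other cable-class products.
  if (a.isCableClass !== b.isCableClass) return false;
  // Unknown reach (null or 0) is never a wildcard.
  if (a.reachMeters === null || b.reachMeters === null) return false;
  if (a.reachMeters <= 0 || b.reachMeters <= 0) return false;
  const ratio =
    Math.min(a.reachMeters, b.reachMeters) / Math.max(a.reachMeters, b.reachMeters);
  if (ratio < MIN_REACH_RATIO) return false;
  if (a.fiberType === null || a.fiberType !== b.fiberType) return false;
  if (a.primaryWavelengthNm === null || b.primaryWavelengthNm === null) return false;
  return Math.abs(a.primaryWavelengthNm - b.primaryWavelengthNm) <= MAX_WAVELENGTH_DELTA_NM;
}
```

With these rules a 500m product against a 2km product has a reach ratio of 0.25 and is rejected, and a row with `reach_meters=0` is rejected outright rather than matching everything.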

- FS.com 1.6T DR8/2FR4 source correction on 2026-05-09:
  - operator spotted that FS.com has two distinct 1.6T OSFP variants in the same family:
    - `OSFP-DR8-1.6T-FL`: 500m, DR8, SMF
    - `OSFP-2FR4-1.6T-FL`: 2km, 2FR4, SMF
  - confirmed in TIP DB:
    - both FS.com variants exist as separate rows
    - `OSFP-2FR4-1.6T-FL` had `reach_meters=0` even though the source and row label said `2km`
    - `OSFP-DR8-1.6T-FL` had no wavelength, causing the deterministic equivalence worker to reject the otherwise correct 500m Flexoptix match
  - live DB correction:
    - `OSFP-DR8-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=500m`
      - `reach_meters=500`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP DR8`
      - fully verified remains true
    - `OSFP-2FR4-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP 2FR4`
      - fully verified true
    - Flexoptix `O.1316T.C.05.M`
      - confirmed as `500m`, `SMF`, `1.6T`
      - `standard_name=1.6T OSFP DR8`
  - equivalence correction:
    - approved only `O.1316T.C.05.M` ↔ `OSFP-DR8-1.6T-FL`
    - confidence `0.913`
    - match basis: form factor, speed, reach, fiber, wavelength and source variant DR8/500m
    - `OSFP-2FR4-1.6T-FL` remains separate and is not linked to the 500m DR8 Flexoptix product
  - scraper hardening:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - recognizes German/decimal `1,6T` and `1600G` as `1.6T`/`1600`
      - converts reach labels such as `2km` into `reach_meters=2000`
      - updates stale `speed` labels when the numeric source speed matches the row
  - build:
    - `pnpm -C packages/scraper build` passed on Erik
  - truth:
    - there are definitely two separate FS.com variants
    - 500m DR8 is the correct equivalent for Flexoptix `O.1316T.C.05.M`
    - 2km 2FR4 is a separate DB product and must not be collapsed into the 500m match
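
The speed handling described under "scraper hardening" can be sketched as below. This is an assumption-laden illustration rather than the code in `fs-com.ts`: the function names, the `SpeedRow` shape and the label format are hypothetical, and only the `1,6T`/`1600G` to `1.6T`/`1600` behavior and the stale-label rule are taken from the log.

```typescript
interface ParsedSpeed {
  label: string; // e.g. "1.6T" or "800G"
  gbps: number;  // e.g. 1600 or 800
}

// Accepts "1.6T", German-style "1,6T" and "1600G"; all normalize to 1.6T / 1600.
function parseSpeed(text: string): ParsedSpeed | null {
  const match = /(\d+(?:[.,]\d+)?)\s*([GT])\b/i.exec(text);
  if (match === null) return null;
  const value = Number((match[1] ?? "").replace(",", "."));
  const unit = (match[2] ?? "").toUpperCase();
  const gbps = unit === "T" ? Math.round(value * 1000) : value;
  const label = gbps >= 1000 ? `${gbps / 1000}T` : `${gbps}G`;
  return { label, gbps };
}

interface SpeedRow {
  speed: string;
  speedGbps: number;
}

// Refreshes a stale label only when the numeric source speed matches the row.
function refreshSpeedLabel(row: SpeedRow, sourceText: string): SpeedRow {
  const parsed = parseSpeed(sourceText);
  if (parsed === null || parsed.gbps !== row.speedGbps) return row;
  return row.speed === parsed.label ? row : { ...row, speed: parsed.label };
}
```

Normalizing the decimal comma before parsing is what keeps the German storefront's `1,6T` from being read as `6T`.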

- Targeted vendor verification push after equivalence revalidation on 2026-05-09:
  - code improved:
    - `NADDOD_DB_DETAIL_ONLY=1` mode verifies existing NADDOD rows with source URLs instead of rotating blindly through the full sitemap
    - NADDOD now extracts `og:image`, source product URLs, reach/fiber/wavelength from page evidence, AOC/DAC cable lengths, and DR/FR/SR/VR/XDR patterns
    - GAO Tek now writes product URLs and image evidence
    - Ascent Optics now writes product URLs and table image evidence
    - Eoptolink now writes product URLs, images, reach/wavelength evidence and corrects over-broad form-factor parsing by preferring title/slug evidence
  - live low-load Erik runs:
    - GAO Tek static crawl:
      - `473` unique products processed
      - GAO Tek detail coverage improved from `41` to `126`
      - `no_url` dropped to `0`
    - Ascent Optics static/API crawl:
      - `253` catalog products processed
      - image coverage `235/305`
      - detail coverage `213/305`
    - Eoptolink static crawl:
      - `76` product-solution pages inspected
      - after parser correction, Eoptolink is `287/287` image and detail verified
    - NADDOD targeted DB-detail mode:
      - first targeted wave `200` pages
      - second wave `300` pages
      - closure wave `385` pages
      - special-case wave `83` pages
      - NADDOD moved from `image=12`, `details=157` and roughly `0` to `1` fully verified rows to:
        - total `748`
        - price `744`
        - image `742`
        - details `659`
        - competitor `744`
        - fully `659`
        - no URL `6`
  - global TIP counters after this push:
    - price verified `11557`
    - image verified `11963`
    - details verified `11018`
    - fully verified `9794`
    - total transceivers `17647`
  - health:
    - TIP stayed `healthy`
    - load status `ok`
    - memory used about `13%`
  - truth:
    - NADDOD is not 100% complete; remaining detail gaps include likely non-transceiver switch/NIC products and a smaller set of parser-special cases
    - OEM catalogs like Ascent and Eoptolink do not publish retail prices, so full verification cannot be forced honestly without price evidence

- Immediate full TIP equivalence revalidation on 2026-05-09:
  - operator requested all open TIP validation to be completed immediately and all product matches checked for true 1:1 equivalence
  - live preflight:
    - equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`, `due_research=0`
    - active matches scheduled for future 30-day recheck: `34066`
    - strict DB preflight over all active matches found:
      - matches without recent price evidence: `0`
      - hard technical mismatches: `0`
      - missing critical 1:1 evidence: `0`
    - hard criteria checked: form factor, speed, fiber type, reach ratio, primary wavelength and recent competitor price evidence
  - action:
    - marked all `34066` active `approved/auto_approved` equivalences as due immediately
    - queued `18` existing PgBoss `maintenance:re-research-equivalences` jobs
    - used the existing DB-only TIP re-research worker; no browser crawler wave and no external AI
  - result:
    - all `18/18` jobs completed
    - `due_research=0`
    - `active_researched_today=34066`
    - no automated-research rejections in this immediate pass
    - final equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`
    - transceiver verification counters after the pass:
      - `competitor_verified=11470`
      - `price_verified=11557`
      - `image_verified=10711`
      - `details_verified=9929`
      - `fully_verified=9135`
      - total transceivers `17647`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - API/DB connected
  - truth:
    - the manual equivalence queue is empty and all active matches have just been rechecked by deterministic 1:1 evidence rules
    - this does not mean every product row in TIP is complete; largest product verification gaps remain vendor-specific crawler/enrichment work, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent/Eoptolink and other vendor/catalog rows

- Crawlee integration/binding on 2026-05-09:
  - operator asked to install, use and bind Crawlee/Crawlee-Python after priority evaluation
  - pushed TIP commits:
    - `60531b6 feat: add crawlee python worker integration`
    - `49f0871 chore: ignore crawlee python build artifacts`
  - TypeScript TIP core remains the production crawler core using `crawlee` and Playwright
  - added scraper scripts:
    - `pnpm -C packages/scraper scrape:fs:db-detail`
    - `pnpm -C packages/scraper scrape:fs:url-discovery`
  - added optional isolated Python worker:
    - `packages/crawlee-python/`
    - `scripts/setup-crawlee-python-worker.sh`
    - `docs/TIP_CRAWLEE_RUNTIME.md`
  - Python worker policy:
    - Crawlee-Python is for Pi/Proxmox/residential side workers and extraction experiments
    - writes JSONL evidence only
    - no direct DB writes
    - no replacement for the TypeScript TIP scraper core
  - smoke test:
    - installed `crawlee==1.6.3` into `/tmp/tip-crawlee-python-venv`
    - ran `tip_crawlee_worker` against `https://crawlee.dev`
    - JSONL evidence output succeeded

- Priority Crawlee evaluation + FS.com URL discovery on 2026-05-09:
  - operator asked whether these repos help:
    - `https://github.com/apify/crawlee`
    - `https://github.com/apify/crawlee-python`
    - `https://github.com/hiteshchoudhary/crawlee-project`
  - evaluation:
    - `apify/crawlee` is directly relevant and already in use in TIP via TypeScript `PlaywrightCrawler`
    - the benefit for TIP now is not adding Crawlee but using it more deliberately:
      - bounded RequestQueues
      - stable `uniqueKey`
      - explicit retry/no-text classes
      - isolated storage directories
      - AutoscaledPool telemetry as safety signal
      - hard concurrency caps on Erik
    - `apify/crawlee-python` is useful for future isolated Pi/Proxmox workers, especially for Python-native extraction experiments, but should not replace the current TypeScript scraper core today
    - `hiteshchoudhary/crawlee-project` is a small community/demo project, useful as inspiration only; not a production dependency for TIP
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_URL_DISCOVERY_ONLY=1`
      - maps existing `FS-<numeric-id>` rows without `product_page_url` to `https://www.fs.com/de/products/<id>.html`
      - carries `targetTransceiverId` through the crawler so verified source evidence updates the original row instead of creating duplicates
      - marks current FS.com product images verified for target rows
      - accepts deterministic H1/part/spec evidence for detail verification when FS.com does not expose a traditional spec table
  - live runs on Erik:
    - URL discovery pilot:
      - target `20`
      - scraped `19`
      - failed `0`
      - no-url rows dropped from `76` to `57`
    - full URL discovery:
      - target `56`
      - scraped `55`
      - failed `1` (`https://www.fs.com/de/products/229461.html`, transient `ERR_NETWORK_CHANGED`)
      - no-url rows dropped to `2`
    - DB reconciliation with improved detail evidence:
      - target `57`
      - scraped `55`
      - failed `0`
      - new prices `41`
      - stock observations `40`
      - specs verified `55`
    - `pnpm -C packages/scraper build` passed on Erik after the code change
  - FS.com final state after URL discovery:
    - total rows: `383`
    - price verified: `379`
    - image verified: `374`
    - details verified: `373`
    - price+image+details: `373`
    - fully verified: `205`
    - missing URL: `2`
    - missing image URL: `9`
    - missing reach label: `4`
    - missing fiber type: `9`
    - HTML product-like rows:
      - total `373`
      - image `372`
      - details `371`
      - complete `371`
    - no-url rows:
      - `Change`
      - `FS-229461`
    - category rows: `4`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - global verified counters:
      - price `11557`
      - image `10711`
      - details `9929`
      - fully `8526`
  - training pool:
    - pushed `4d9a11c crawl: add fscom url discovery learning record`
  - truth:
    - FS.com is still not 100% complete
    - honest current claim: `371/373` HTML product-like rows complete; remaining work is small and classifiable
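
The URL discovery mapping described above can be sketched as a pure function plus a request builder. The `FS-<numeric-id>` to `https://www.fs.com/de/products/<id>.html` rule and the idea of carrying `targetTransceiverId` come from the log. The row shape, the `uniqueKey` format and the function names are assumptions. The returned objects only mirror the `url`/`uniqueKey`/`userData` fields that Crawlee requests accept, without importing Crawlee.

```typescript
interface FsRow {
  id: number;
  partNumber: string;
  productPageUrl: string | null;
}

interface DiscoveryRequest {
  url: string;
  uniqueKey: string; // stable key so reruns do not enqueue duplicates
  userData: { targetTransceiverId: number };
}

// Maps "FS-229461" to the German product URL; anything else yields null.
function fsProductUrl(partNumber: string): string | null {
  const match = /^FS-(\d+)$/.exec(partNumber.trim());
  return match === null ? null : `https://www.fs.com/de/products/${match[1]}.html`;
}

// Builds requests only for rows that lack a product page URL and have a
// mappable part number; the row id travels along so that verified evidence
// updates the original row instead of creating a duplicate.
function buildDiscoveryRequests(rows: FsRow[]): DiscoveryRequest[] {
  const requests: DiscoveryRequest[] = [];
  for (const row of rows) {
    if (row.productPageUrl !== null) continue;
    const url = fsProductUrl(row.partNumber);
    if (url === null) continue;
    requests.push({
      url,
      uniqueKey: `fs-product-${row.id}`, // hypothetical key format
      userData: { targetTransceiverId: row.id },
    });
  }
  return requests;
}
```

A row such as `Change` has no numeric id to map, which is why it stays in the no-url list and needs classification rather than crawling.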

- TIP FS.com / Fiberstore targeted verification push on 2026-05-09:
  - operator requested FS.com/Fiberstore next, with all crawler/scraper/robot learnings written to the TIPLLM training pool and no external AI
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_DB_DETAIL_ONLY=1` mode to revalidate existing FS.COM product URLs directly from DB
      - avoids broad category/listing discovery while product URLs still need verification
      - `detectReach()` now handles comma thousands separators and decimal values
      - added deterministic `detectFiberType()` fallback from product name, part number and specs
      - scraper now writes `productUrl` into the transceiver row
      - detail verification source is now the actual FS.com product URL instead of the literal `fs.com`
  - live Erik verification:
    - deployed scraper to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik after the change
    - ran four safe DB-detail-only Playwright batches:
      - batch 1: target `80`, scraped `80`, failed `0`, new prices `17`, stock `18`, specs `24`
      - batch 2: target `80`, scraped `79`, failed `0`, new prices `6`, stock `8`, specs `23`
      - batch 3: target `90`, scraped `89`, failed `0`, new prices `21`, stock `24`, specs `47`
      - batch 4 closure: target `42`, scraped `42`, failed `0`, new prices `5`, stock `3`, specs `25`
    - all runs used Playwright concurrency `1`, `nice -n 10`, and no broad category crawl
    - Erik/TIP health after closure:
      - status: `healthy`
      - load status: `ok`
      - memory used: `13%`
      - transceivers: `17647`
      - vendors: `478`
      - switches: `680`
      - global verified counters:
        - price: `11557`
        - image: `10636`
        - details: `9816`
        - fully: `8522`
  - FS.com before targeted detail batches:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `108`
    - price+image+details: `108`
    - fully verified: `3`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `323`
    - HTML product-like complete rows: `106`
  - FS.com after closure:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `260`
    - price+image+details: `260`
    - fully verified: `205`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `123`
    - HTML product-like rows:
      - total `299`
      - price `299`
      - image `282`
      - details `258`
      - complete `258`
    - no-url rows:
      - total `76`
      - price `76`
      - image `15`
      - details `0`
    - category rows:
      - total `4`
      - no verified signals
  - interpretation / next strategy:
    - the DB-detail-only approach is now mostly exhausted
    - the fourth clean closure batch did not raise `details_verified`; it only nudged `fully_verified` from `199` to `205`
    - do not keep repeating the same FS.com detail crawler on Erik
    - next FS.com work should be:
      - source-discovery/classification robot for the `76` no-url rows
      - parser/source diagnostics for the remaining `41` HTML product-like rows missing detail/fiber/image signals
      - likely separate handling for malformed or historical `/de/de/products/...` URLs and pages that return no useful text
  - TIPLLM training pool:
    - all four FS.com batches were written and pushed to Gitea
    - latest training commits:
      - `28cac05` batch 1
      - `a0a6be3` batch 2
      - `38736ae` batch 3
      - `2c25bf3` closure batch
  - important truth:
    - do not claim FS.com is complete
    - the honest current claim is: FS.com product-like coverage improved strongly, but only `258/299` HTML product-like rows are complete and `76` no-url rows still need source discovery/classification
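
A deterministic fiber-type fallback of the kind described for `detectFiberType()` might look like this. Only the function name and its three evidence sources (product name, part number, specs) come from the log; the signature, the keyword lists and the conflict handling are assumptions.

```typescript
type DetectedFiber = "SMF" | "MMF";

// Keyword lists are illustrative; the production scraper may use others.
const SMF_HINTS = /\bSMF\b|single[-\s]?mode|\bOS2\b/i;
const MMF_HINTS = /\bMMF\b|multi[-\s]?mode|\bOM[1-5]\b/i;

// Returns a fiber type only when exactly one kind of evidence is present.
// Conflicting or missing evidence yields null rather than a guess.
function detectFiberType(
  productName: string,
  partNumber: string,
  specsText: string,
): DetectedFiber | null {
  const text = `${productName} ${partNumber} ${specsText}`;
  const hasSmf = SMF_HINTS.test(text);
  const hasMmf = MMF_HINTS.test(text);
  if (hasSmf && !hasMmf) return "SMF";
  if (hasMmf && !hasSmf) return "MMF";
  return null;
}
```

Returning `null` on conflicting hints keeps rows in the "missing fiber type" counter, which is preferable to a wrong value feeding the 1:1 comparison.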

- TIP Flexoptix completion push on 2026-05-09:
  - operator gave the go-ahead ("feuer frei", German for "fire at will") after confirming Flexoptix was not yet complete
  - TIPLLM training pool was updated immediately with the truth rule:
    - not all Flexoptix products are complete
    - active catalog coverage must be separated from historical/extra DB rows
    - never claim 100% verification without exact counters and fresh source timestamps
  - code improved:
    - `packages/scraper/src/scrapers/flexoptix-catalog.ts`
      - generic reach parsing now handles values such as `50 m`, `1,000 m`, decimal/range forms
      - wavelength parsing now handles multiple `λ... nm` values
      - product URL is now passed into `findOrCreateScrapedTransceiver`
    - `packages/scraper/src/scrapers/flexoptix-detail-pages.ts`
      - new targeted Flexoptix detail-page verifier
      - fetches only Flexoptix `.html` product pages with missing price/image/detail fields
      - parses static product page metadata:
        - title
        - description
        - `og:image`
        - `product:price:amount`
        - reach
        - fiber type
        - wavelengths
        - connector
        - standard name
      - writes only DB evidence from Flexoptix pages, no external AI
  - live run results on Erik:
    - `pnpm -C packages/scraper build` passed
    - improved catalog run completed:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
    - details improved from:
      - `details_verified: 500`
      - `price+image+details: 496`
      - `fully_verified: 496`
    - after catalog parser improvement:
      - `details_verified: 606`
      - `price+image+details: 602`
      - `fully_verified: 602`
    - detail verifier run:
      - target: `191` real `.html` product pages
      - fetched: `191`
      - failed: `0`
      - new/updated price observations: `177`
      - images marked: `187`
      - details marked: `185`
    - after detail verifier and explicit BiDi correction:
      - total Flexoptix rows: `744`
      - HTML product-like rows: `626`
      - price verified: `626`
      - image verified: `622`
      - details verified: `626`
      - price+image+details verified: `622`
      - fully verified: `620`
      - filter/category rows with no verification: `108`
      - other non-product/generic rows with no verification: `10`
  - manual evidence correction:
    - four BiDi SFP products had `1,000 m` in the Flexoptix title
    - updated from source evidence:
      - `S.B1312.M.DIL`
      - `S.B1312.M.DL`
      - `S.B1512.M.DIL`
      - `S.B1512.M.DL`
    - set:
      - `reach_label=1000m`
      - `reach_meters=1000`
      - `fiber_type=MMF`
      - `details_verified=true`
  - remaining truth:
    - active/product-like Flexoptix rows are much closer to complete
    - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
    - remaining HTML product-like gaps after final source check:
      - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
      - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
  - operational note:
    - Erik SSH became unavailable with `connection refused` after the last verification checks
    - public TIP HTTPS still responded through Cloudflare
    - no further live commands were started after SSH refused
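
The static metadata parsing used by the detail-page verifier can be illustrated with a small extractor for `og:image` and `product:price:amount`. This sketch is not the code in `flexoptix-detail-pages.ts`: it assumes double-quoted attributes, uses simple regular expressions instead of an HTML parser, and the function and type names are hypothetical. The placeholder filename is the one named in the log.

```typescript
// Finds the content of a <meta> tag by its property or name attribute.
function metaContent(html: string, key: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    const attrs: { [name: string]: string | undefined } = {};
    const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(tag)) !== null) {
      attrs[(attr[1] ?? "").toLowerCase()] = attr[2] ?? "";
    }
    const tagKey = attrs["property"] ?? attrs["name"];
    if (tagKey === key) return attrs["content"] ?? null;
  }
  return null;
}

interface PageEvidence {
  imageUrl: string | null;
  imageVerified: boolean;
  price: number | null;
}

function extractPageEvidence(html: string): PageEvidence {
  const imageUrl = metaContent(html, "og:image");
  const priceText = metaContent(html, "product:price:amount");
  const parsed = priceText === null || priceText.trim() === "" ? NaN : Number(priceText);
  return {
    imageUrl,
    // A placeholder image is not evidence of a real product image.
    imageVerified: imageUrl !== null && imageUrl.indexOf("placeholder-flexoptix.jpg") === -1,
    price: isFinite(parsed) ? parsed : null,
  };
}
```

Rejecting the placeholder explains the `4` product-like rows that remain without image verification even though an `og:image` tag is present.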

- TIP Flexoptix price truth recheck on 2026-05-09:
  - operator question:
    - are all Flexoptix prices, images and information present?
    - are the Flexoptix prices 100% correct?
  - live truth:
    - total Flexoptix rows in TIP: `744`
    - current Flexoptix catalog scraper finds: `615` active catalog products
    - price verified rows: `619`
    - latest verified price observations: `615`
    - image verified rows: `615`
    - details verified rows: `500`
    - price + image + details verified: `496`
    - fully verified: `496`
    - missing image URL: `129`
    - missing reach label: `244`
    - missing fiber type: `131`
  - important interpretation:
    - current active Flexoptix catalog price set is freshly rechecked
    - the full historical/extra Flexoptix table is not complete
    - therefore do not claim all `744` Flexoptix rows are complete
  - code fix:
    - `packages/scraper/src/utils/db.ts`
    - unchanged price observations now refresh `price_observations.verified_at = NOW()`
    - unchanged product prices now refresh `transceivers.price_verified_at = NOW()`
    - this makes live rechecks auditable instead of leaving the old verification timestamp in place
  - live recheck:
    - deployed `db.ts` to Erik
    - `pnpm -C packages/scraper build` passed
    - ran light Flexoptix catalog scraper on Erik with `nice -n 10`
    - result:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
    - `0 prices` means no changed price rows were inserted because content hashes matched
    - after timestamp fix, DB shows `615` latest verified Flexoptix price observations with `verified_at` in the last 10 minutes
  - honest answer:
    - 615 active catalog prices are freshly source-confirmed by the Flexoptix scraper
    - no claim should be made that all 744 Flexoptix DB rows have complete price/image/detail coverage
    - no system should promise absolute 100% price truth forever, because live vendor prices can change and may vary by account, currency, VAT, or session
    - TIP should therefore display the last source-verified timestamp
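
The timestamp fix described above (unchanged prices refresh `verified_at` instead of being skipped) can be sketched as follows. This is a Python illustration using an in-memory SQLite table, not the TypeScript in `db.ts`; the table layout, the `content_hash` column, and the function names are assumptions.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory table with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE price_observations ("
        "id INTEGER PRIMARY KEY, product_id INTEGER, price REAL, "
        "currency TEXT, content_hash TEXT, verified_at TEXT)"
    )
    return conn


def content_hash(price, currency):
    return hashlib.sha256(f"{price:.2f}|{currency}".encode()).hexdigest()


def record_price(conn, product_id, price, currency, now):
    """Insert a row when the price changed; otherwise only refresh verified_at."""
    new_hash = content_hash(price, currency)
    row = conn.execute(
        "SELECT id, content_hash FROM price_observations "
        "WHERE product_id = ? ORDER BY id DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    if row is not None and row[1] == new_hash:
        # Unchanged price: no new row, but the recheck becomes auditable.
        conn.execute(
            "UPDATE price_observations SET verified_at = ? WHERE id = ?",
            (now, row[0]),
        )
        return "refreshed"
    conn.execute(
        "INSERT INTO price_observations "
        "(product_id, price, currency, content_hash, verified_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (product_id, price, currency, new_hash, now),
    )
    return "inserted"
```

Under this model a run that reports `0 prices` simply means every call returned `refreshed`, which matches the interpretation recorded above.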

- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
  - operator problem:
    - Atlas / Findings / Protection Proof had become dishonest again
    - raw files on Erik still contained:
      - `3` host audits
      - `32` live Atlas scan devices
    - but open findings had collapsed back to `0`
    - Atlas UI therefore showed an implausibly clean state
  - verified root cause:
    - `packages/core/src/routes/health-builders.ts`
      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
    - `packages/core/src/scheduler.ts`
      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
      - newly rematerialized Atlas findings were therefore cleared again almost immediately
  - code fixed:
    - `packages/core/src/routes/health-builders.ts`
      - added `readAtlasSnapshot()`
      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
    - `packages/core/src/scheduler.ts`
      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
      - generic stale resolution now skips:
        - `atlas-coverage-gap`
        - `atlas-exposure`
        - `atlas-host-audit`
      - these sources are now left to their own verification-aware resolution logic
  - live deployment on Erik:
    - rebuilt `@magatama/core`
    - synced:
      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
      - `/opt/magatama/packages/core/dist/scheduler.js`
    - restarted PM2 service:
      - `magatama`
  - live verification:
    - before fix:
      - Atlas raw files present:
        - audits: `3`
        - devices: `32`
      - DB open findings: `0`
    - after authenticated `/api/protection-proof` rebuild:
      - DB open findings: `28`
      - public `/api/findings?limit=5` now shows real open Atlas findings again
      - public `/api/protection-proof` now reports:
        - `knownAssets: 57`
        - `hostsWithTelemetry: 22`
        - `assetsWithoutTelemetry: 35`
        - `auditedHosts: 3`
        - `queueBlocked: 28`
        - `switchbladeAssets: 5`
        - `switchbladeRacks: 1`
        - `switchbladeNmsNodes: 5`
  - operational truth now:
    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
    - the remaining open state is again honest:
      - most current open findings are `atlas-coverage-gap`
      - they reflect missing live telemetry on known inventory/discovery assets
  - operator note:
    - browser cache / old UI state may still temporarily show the earlier empty Atlas
    - hard refresh is required:
      - `Cmd + Shift + R`
  - important honest remainder:
    - this closes the biggest Atlas truthfulness regression
    - it does **not** yet solve every backend truth issue
    - still pending:
      - lane-specific RunPod artifact adoption / automatic version switch
      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational
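
The scheduler change above can be pictured as a source-based exclusion in the generic stale resolver. The sketch below is a Python illustration of that guard, not the code in `scheduler.ts`; the finding fields `id` and `source`, the `seen_ids` argument, and the `port-scan` source name are assumptions.

```python
# Source names taken from the skip list recorded above.
ATLAS_MANAGED_FINDING_SOURCES = frozenset(
    {"atlas-coverage-gap", "atlas-exposure", "atlas-host-audit"}
)


def select_stale_findings(open_findings, seen_ids):
    """Return the findings the generic resolver may auto-resolve.

    A finding is stale when the latest scan no longer reported it, but
    Atlas-managed findings are never handed to the generic resolver.
    """
    return [
        finding
        for finding in open_findings
        if finding["id"] not in seen_ids
        and finding["source"] not in ATLAS_MANAGED_FINDING_SOURCES
    ]
```

Without the second condition, every rematerialized Atlas finding would look stale to the next scan cycle, which is the regression described in the root cause.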

- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation
    - Erik must not be stressed by crawler-heavy work
    - TIPLLM-only policy for crawler/robot research remains in force
  - root cause found:
    - `approve-all` approved low-confidence equivalences and only marked them for later re-research
    - the re-research worker mostly checked whether a competitor still had a recent price
    - it did not re-evaluate hard technical equivalence evidence such as reach, wavelength, fiber type, speed and form factor
  - code changed:
    - `packages/api/src/routes/review.ts`
      - `approve-all` now approves only confidence >= `0.73`
      - weak pending rows stay pending and are queued for automated research instead of being marked approved
      - `needs_research` stats/listing now includes pending research rows
      - added `POST /api/review/run-research`
    - `packages/scraper/src/scheduler.ts`
      - added deterministic equivalence research evaluator
      - rejects stale, technically contradictory, incomplete, or low-confidence matches automatically
      - confirms only matches with recent price plus matching form factor, speed, fiber type, wavelength and reach
      - confirmed matches are scheduled for a 30-day recheck
  - live deployment:
    - synced changed files to Erik `/opt/tip`
    - `pnpm -C packages/api build` passed on Erik
    - `pnpm -C packages/scraper build` passed on Erik
    - restarted `tip-api` and `tip-scraper-daemon`
    - both processes are online
  - data cleanup performed on live DB without heavy crawling:
    - pending + due re-research candidates processed: `144103`
      - rejected fiber mismatch: `958`
      - rejected reach mismatch: `82128`
      - rejected missing reach evidence: `31151`
      - rejected wavelength mismatch: `29865`
      - rejected low confidence: `1`
    - old approved rows audited:
      - kept/confirmed: `1986`
      - rejected: `4000`
    - old auto-approved rows audited:
      - kept/confirmed: `32080`
      - rejected reach mismatch: `260`
  - final live equivalence status:
    - `pending`: `0`
    - `approved`: `1986`
    - `auto_approved`: `32080`
    - `rejected`: `148367`
    - due re-research now: `0`
    - scheduled 30-day rechecks: `34066`
  - final verification counters after reconcile:
    - `competitor_verified`: `11137`
    - `fully_verified`: `290`
    - `price_verified`: `11549`
    - `image_verified`: `10629`
    - `details_verified`: `9538`
  - operational note:
    - no new crawler wave was started for this cleanup
    - the run used existing crawled specs/prices and strict deterministic product-evidence checks
    - next improvement should be targeted crawler enrichment for products rejected due to missing reach/details, preferably on Proxmox/Pi workers rather than Erik
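
The deterministic evaluator described above can be sketched as an ordered chain of checks. This is a Python illustration, not the code in `packages/scraper/src/scheduler.ts`. Assumptions: the field names, the 30-day price freshness window, reuse of the `0.73` approve-all threshold as the evaluator's confidence floor, and the reach tolerance of max(25 m, 5%), which is borrowed from the no-valid-competitor gates recorded elsewhere in this file.

```python
from datetime import date, timedelta

MIN_CONFIDENCE = 0.73               # assumption: same floor as approve-all
MAX_PRICE_AGE = timedelta(days=30)  # assumption: the notes only say "recent price"
RECHECK_AFTER = timedelta(days=30)  # 30-day recheck, as recorded above


def evaluate(own, other, confidence, last_price_on, today):
    """Return ("confirmed", recheck_date) or ("rejected", reason)."""
    if last_price_on is None or today - last_price_on > MAX_PRICE_AGE:
        return ("rejected", "stale_price")
    if own["form_factor"] != other["form_factor"]:
        return ("rejected", "form_factor_mismatch")
    if own["speed_gbps"] != other["speed_gbps"]:
        return ("rejected", "speed_mismatch")
    if own["fiber_type"] != other["fiber_type"]:
        return ("rejected", "fiber_mismatch")
    if own.get("reach_meters") is None or other.get("reach_meters") is None:
        return ("rejected", "missing_reach_evidence")
    tolerance = max(25, 0.05 * own["reach_meters"])
    if abs(own["reach_meters"] - other["reach_meters"]) > tolerance:
        return ("rejected", "reach_mismatch")
    if own.get("wavelength_nm") != other.get("wavelength_nm"):
        return ("rejected", "wavelength_mismatch")
    if confidence < MIN_CONFIDENCE:
        return ("rejected", "low_confidence")
    return ("confirmed", today + RECHECK_AFTER)


example_today = date(2026, 5, 9)
```

Because every branch depends only on stored specs and prices, a pass like this can run over the whole queue without starting a crawl, which fits the constraint that Erik must not be stressed.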

- TIP Flexoptix + FS.com price/image revalidation completed on 2026-05-09:
  - live root cause:
    - scraper runs had set `transceivers.price_verified`, but `price_observations.is_verified` stayed false
    - FS.com product image selector was stale and missed current `.big_img` / `.big_img_m` product images
  - code fixed:
    - `packages/scraper/src/utils/db.ts`
      - new observations and freshly rechecked unchanged observations now get `is_verified = true` and `verified_at`
      - `price_verified_at` is refreshed when price verification is confirmed
      - image verification now refreshes `image_verified_at`, `image_verified_url`, and `image_scraped_at`
      - existing records revalidate images whenever current scraper output contains an image URL
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `TIP_FORCE_REVALIDATE`
      - added `FS_MAX_DETAIL_PAGES_PER_RUN`
      - added `FS_ONLY_MISSING_IMAGES`
      - updated FS.com image extraction to prefer current `resource.fs.com` product images from `.big_img_box`, `img.big_img`, `.big_img_m_active`, `.big_img_m`, `.small_img_active`
      - rejects default/logo/general/icon/SVG image URLs
  - live runs on Erik:
    - `pnpm -C packages/scraper build` passed on `/opt/tip`
    - Flexoptix catalog revalidation:
      - 615 products processed
      - 615 Flexoptix price observations marked verified
      - 605 Flexoptix images verified in the run window
    - FS.com full force revalidation:
      - 270 products discovered
      - 270 detail pages scraped
      - 0 failed detail requests
      - 17 new price observations in first full pass
      - 266 FS.com price observations marked verified after first pass
    - FS.com targeted missing-image revalidation:
      - 99 detail pages scraped
      - 0 failed detail requests
      - FS.com image-verified products increased from 207 to 299
      - FS.com verified price observations increased to 271 after targeted pass
  - final checked counters:
    - Flexoptix:
      - products: 744
      - product price_verified: 619
      - product image_verified: 615
      - price observation rows: 1288
      - verified price observation rows: 615
    - FS.com:
      - products: 383
      - product price_verified: 379
      - product image_verified: 299
      - price observation rows: 818
      - verified price observation rows: 271
  - operations:
    - `tip-scraper-daemon` restarted and is online
    - Erik remained stable; final load was about `2.16, 2.22, 2.47`
    - CT115 / `tip-scraper` SSH did not respond quickly from this session, so it was not used
  - TIPLLM training pool:
    - `/tmp/tip-training-data` was recloned from Gitea
    - crawler experience was written to:
      - `robot-experiences/2026-05-09.jsonl`
      - `qa-pairs/robot-control-high.jsonl`
    - pushed to Gitea commit:
      - `850083f crawl: add flexoptix fs revalidation learning record`
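
The FS.com image rule above (prefer `resource.fs.com` product images, reject default/logo/general/icon/SVG URLs) can be sketched as a URL filter. This is a Python illustration, not the code in `fs-com.ts`. Assumptions: the host is required rather than merely preferred, the rejected markers are matched as path substrings, and the sample URLs in the test are invented.

```python
from urllib.parse import urlparse

# Markers taken from the rejection list recorded above.
REJECTED_MARKERS = ("default", "logo", "general", "icon")


def is_product_image(url):
    """Accept only http(s) URLs on resource.fs.com that look like product photos."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != "resource.fs.com":
        return False
    path = parsed.path.lower()
    if path.endswith(".svg"):
        return False
    return not any(marker in path for marker in REJECTED_MARKERS)


def pick_image(candidates):
    """Return the first acceptable URL.

    Candidates are assumed to arrive in selector priority order,
    for example `.big_img_box` results before `.small_img_active`.
    """
    return next((url for url in candidates if is_product_image(url)), None)
```

Returning `None` when nothing passes keeps `image_verified` false instead of marking a placeholder as verified.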

- MAGATAMA dashboard truthfulness / UX hardening on 2026-05-09:
  - live `/api/llm/status` on MAGATAMA now publicly confirms the corrected `magatamallm` lane counts:
    - `15679` train / collected
    - `1743` eval
    - `17422` total
    - `15679` new since last training
  - the Training page inconsistency was traced to a stale browser/static-cache path plus mixed UI sources
  - dashboard static UI was updated and deployed live to Erik:
    - new cache version:
      - `2026-05-09a`
    - Training Control now force-merges the visible summary with the live `llmStatus.training` payload so the page and modal cannot silently disagree on pair counts
  - Switchblade network port UX was hardened:
    - hover detail remains
    - each port is now also clickable
    - click opens a real MAGATAMA-side detail modal with:
      - status
      - speed
      - description
      - peer device / peer port
      - connected host
      - VLAN
      - transceiver
      - in/out errors
      - octet counters
    - this was done because hover-only behavior still came across as broken or ambiguous to the operator
  - direct live deployment truth on Erik:
    - `/opt/magatama/packages/dashboard/public/index-v2.html` now contains:
      - `API_CACHE_VERSION = '2026-05-09a'`
      - `openSwitchbladePortModal`
      - `Ports · Hover = Nutzung / Status · Klick = Detail`
  - important honest remainder:
    - this fixes the visible UI inconsistency and the broken/stale port interaction path
    - it does **not yet** resolve the deeper backend truthfulness issue where Atlas/host-audit raw files can still show real issues while the live open-findings surface may be empty
    - that rematerialization / anti-auto-resolve backend block still needs a dedicated follow-up pass
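
The force-merge rule for Training Control described above can be sketched as "live values win". This is a Python illustration of the idea, not the dashboard's JavaScript; the key names `train`, `eval`, and `total` are assumptions.

```python
def merge_training_summary(visible, live_training):
    """Overlay the live payload on the visible summary.

    Keys that are missing or None in the live payload keep the visible value,
    so a partial payload cannot blank out fields the page already shows.
    """
    merged = dict(visible)
    merged.update({k: v for k, v in live_training.items() if v is not None})
    return merged


def counts_consistent(summary):
    """Check that train + eval equals total."""
    return summary["train"] + summary["eval"] == summary["total"]
```

With the lane counts recorded above, `15679 + 1743 = 17422`, so the merged summary passes the consistency check.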

- Full cross-agent sync refresh on 2026-05-07:
  - all current MAGATAMA/RunPod training automation findings from this chat were consolidated again into `sync/`
  - latest confirmed truth:
    - `sync/` commits successfully reached Gitea again
    - current pushed sync commits now include:
      - `2a35761 sync: record runpod managed endpoint root cause`
      - `72d61ad sync: record custom runpod worker build prep`
  - operator requirement was reaffirmed:
    - all meaningful chat discoveries, decisions, blockers, and deployment truths must continue to be written back into `sync/` so Claude, Codex, and the laptop stay aligned
  - current MAGATAMA training automation truth remains:
    - lane-specific pools are separated and prepared
    - URL-bundle dataset path is in place
    - local adoption/smoke/version-switch code path is in place
    - but fully automatic RunPod return/adoption still depends on switching from the managed Axolotl endpoint to a custom MAGATAMA worker endpoint
  - current infrastructure truth remains:
    - Erik can build Docker images
    - Erik has `docker buildx`
    - Erik currently has no docker registry login/config
    - therefore registry publication of the custom worker image is still the final missing operational prerequisite
  - next required operator inputs for full closure:
    - either:
      - `GHCR_USERNAME` + `GHCR_TOKEN`
    - or:
      - Docker Hub repo + credentials
    - or:
      - an already approved container image destination
  - once registry publication is possible, the exact remaining sequence is:
    - publish custom worker image
    - create/update RunPod endpoint to that image
    - set on Erik:
      - `RUNPOD_WORKER_KIND=custom-magatama`
      - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - restart MAGATAMA dashboard
    - run lane-specific canary training
    - verify:
      - artifact exists
      - local adoption succeeds
      - smoke tests pass
      - release alias increments
      - active lane alias switches automatically

- MAGATAMA RunPod custom worker preparation continued on 2026-05-07:
  - the pending sync handoff was committed and **successfully pushed to Gitea**:
    - commit:
      - `2a35761 sync: record runpod managed endpoint root cause`
  - MAGATAMA repo now includes an explicit helper for building/publishing the custom RunPod worker image:
    - `magatama/scripts/runpod_worker_publish.sh`
    - new package script:
      - `pnpm runpod:worker:publish`
    - helper behavior:
      - expects:
        - `RUNPOD_WORKER_IMAGE`
      - supports:
        - `GHCR_USERNAME`
        - `GHCR_TOKEN`
        - `RUNPOD_WORKER_TAG`
        - `RUNPOD_WORKER_PUSH_MODE=push|load`
      - prints the exact next environment variables required on Erik after image publication:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
  - `magatama/packages/fine-tuner/RUNPOD.md` was extended so the full automation target is now documented end-to-end:
    - lane pool sync
    - RunPod dataset URL bundle
    - custom worker training
    - adapter upload
    - local adoption
    - smoke tests
    - release alias minting
    - active alias switch
  - Erik infrastructure truth was rechecked:
    - `docker` exists:
      - `/usr/bin/docker`
    - `docker buildx` exists:
      - `github.com/docker/buildx v0.33.0`
    - **no docker registry login/config** is currently present on Erik:
      - `~/.docker/config.json` absent
    - interpretation:
      - Erik can build images
      - but cannot yet push a public/private worker image to GHCR/Docker Hub without credentials or a pre-authenticated registry path
  - the missing custom worker files were synced live to Erik:
    - `/opt/magatama/packages/fine-tuner/Dockerfile.runpod`
    - `/opt/magatama/packages/fine-tuner/RUNPOD.md`
  - a real remote worker image build was then attempted on Erik:
    - image tag requested:
      - `magatama-runpod-worker:test`
    - build truth:
      - base `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` pulled successfully
      - Python dependencies for the worker installed successfully
      - build reached:
        - `COPY train_cuda.py runpod_handler.py ./`
        - `exporting to image`
    - however:
      - final image was **not yet visible** in `docker images`
      - therefore the build still needs one more clean verification pass before being treated as green
  - current operational conclusion:
    - MAGATAMA training pools, lane separation, signed dataset URL path, and local adoption API are ready
    - the final blocking step remains infrastructure:
      - publish the custom worker image to a registry RunPod can consume
      - create/switch the endpoint
      - then set on Erik:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - once that is done, MAGATAMA's already-prepared code path can finally perform:
      - train
      - verify artifact
      - adopt locally
      - smoke-test
      - bump version
      - switch alias
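
The inputs of the publish helper listed above can be checked before a build is started. The sketch below is a hypothetical preflight in Python, not part of `runpod_worker_publish.sh`. Assumptions: the default push mode is `push`, and the GHCR credentials must be supplied as a pair.

```python
def check_publish_env(env):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not env.get("RUNPOD_WORKER_IMAGE"):
        problems.append("RUNPOD_WORKER_IMAGE is required")
    mode = env.get("RUNPOD_WORKER_PUSH_MODE", "push")  # assumed default
    if mode not in ("push", "load"):
        problems.append("RUNPOD_WORKER_PUSH_MODE must be push or load")
    has_user = bool(env.get("GHCR_USERNAME"))
    has_token = bool(env.get("GHCR_TOKEN"))
    if has_user != has_token:
        problems.append("GHCR_USERNAME and GHCR_TOKEN must be set together")
    return problems
```

It can be called as `check_publish_env(dict(os.environ))` after importing `os`; `load` mode needs no credentials, which matches the current state on Erik where images can be built but not pushed.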

- MAGATAMA RunPod training return-path deep dive on 2026-05-07:
  - Attack Paths `Open Fix Guidance` placebo button was fixed live on Erik:
    - `magatama/packages/dashboard/public/index-v2.html`
    - real behavior now:
      - if graph node maps to a real finding, open the existing ticket/finding drawer
      - if node is only synthetic, show an explicit warning instead of doing nothing
    - deployed to:
      - `/opt/magatama/packages/dashboard/public/index-v2.html`
    - `pm2 restart magatama-dashboard` executed
  - local Mac train API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - returns `status = ok`
    - service is idle/reachable, not broken
  - RunPod heartbeat/UI stream issue was fixed live:
    - dashboard server now emits keepalive progress messages during:
      - long `IN_PROGRESS` phases
      - post-`COMPLETED` artifact verification loops
    - deployed live to Erik dashboard
  - direct raw RunPod status canary against the current endpoint (`dheii186pfcuq7`) was executed:
    - tiny 1-step `tip_llm` canary job:
      - `33434e85-3cc1-4dea-9043-83c315aaeb9c-e2`
    - observed raw status sequence:
      - `IN_QUEUE`
      - `IN_PROGRESS`
      - `COMPLETED`
    - **critical truth**:
      - `/status/{job}` returned no `output`
      - `/stream/{job}` returned:
        - `{"status":"COMPLETED","stream":[]}`
    - interpretation:
      - the currently configured endpoint is the managed Axolotl serverless endpoint
      - it does not return a programmatically adoptable artifact reference to MAGATAMA
      - this is why all lanes keep ending in:
        - `completed_without_model_artifact`
  - Erik secrets reality rechecked:
    - `/opt/magatama/secrets/hf-token` exists and is readable by the running process
    - therefore the current failure is **not** caused by a missing HF token on Erik
  - root cause now considered confirmed:
    - the **managed Axolotl serverless endpoint** is acceptable for queueing/running a fine-tune
    - but not sufficient for MAGATAMA's required full automation:
      - train
      - return explicit artifact
      - adopt locally
      - smoke-test
      - create new release alias
      - switch active alias
  - code path for the correct architecture is now prepared:
    - `magatama/packages/fine-tuner/runpod_handler.py`
    - `magatama/packages/fine-tuner/train_cuda.py`
    - `magatama/packages/fine-tuner/requirements-runpod.txt`
    - `magatama/packages/dashboard/src/server.ts`
  - what changed in that path:
    - custom RunPod worker now accepts:
      - `target_model`
      - `credentials.hf_token`
    - training script now:
      - trains lane-specific bundle
      - uploads the resulting adapter folder to Hugging Face
      - returns `adapter_repo_id`
    - dashboard custom-worker submit path now includes:
      - `run_id`
      - `target_model`
      - HF credential pass-through for the worker
    - dashboard error text is now explicit:
      - if the managed Axolotl endpoint completes without an adoptable artifact, MAGATAMA says so plainly and points at the need for the `custom-magatama` worker
  - live deployment status:
    - updated dashboard server was rebuilt and deployed to Erik
    - updated custom worker source files were synced into Erik repo state
    - BUT:
      - the currently active RunPod endpoint is still the managed Axolotl endpoint
      - the new full return-path logic will only become effective once the RunPod endpoint is switched to the custom MAGATAMA worker image
  - operational conclusion:
    - training pool refresh, lane separation, submit flow, and local adoption API are now in good shape
    - the final missing infrastructure step is:
      - build/publish `packages/fine-tuner/Dockerfile.runpod`
      - create/use a custom RunPod serverless endpoint for `runpod_handler.py`
      - set:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
    - only then can MAGATAMA honestly achieve:
      - automatic training
      - automatic artifact return
      - automatic adoption
      - automatic version bump
      - automatic alias switch after smoke tests
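
The canary result above (`COMPLETED` with no `output`) suggests a simple classification rule for status payloads. The sketch below is a Python illustration, not the dashboard's `server.ts` logic. Assumptions: `output` is a JSON object carrying `adapter_repo_id` when the custom worker succeeds, and the labels `pending`, `worker_failure`, and `ready_for_adoption` are invented for this example.

```python
def classify_run(status_payload):
    """Map a raw status payload to a registry-style outcome label."""
    status = status_payload.get("status")
    if status in ("IN_QUEUE", "IN_PROGRESS"):
        return "pending"
    if status != "COMPLETED":
        return "worker_failure"
    output = status_payload.get("output") or {}
    if output.get("adapter_repo_id"):
        return "ready_for_adoption"
    # COMPLETED without an adoptable reference is not a success.
    return "completed_without_model_artifact"
```

Applied to the managed Axolotl endpoint's response, this yields `completed_without_model_artifact`, the same state the lanes keep ending in.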

## Active Policy

- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Check sibling project sync folders first when context may span repos.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.

## Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

- Repo: `rene/llm-gateway`
- Path: `sync/`
- Commit shown by Claude: `e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)`
- Gitea path: `http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/`

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Latest Work

- RunPod/MAGATAMA training live follow-up on 2026-05-07:
  - latest `magatamallm` serverless run verified on Erik:
    - job id:
      - `ad003f90-3cf9-43f6-8960-bf6c1ea85097-e2`
    - registry truth in:
      - `/opt/magatama/training-data/model-registry/training-runs.json`
    - observed states:
      - `submitted`
      - then `completed_without_model_artifact`
    - exact recorded warning:
      - `RunPod meldete COMPLETED, aber das erwartete HuggingFace-Modellrepo wurde nicht gefunden.`
  - interpretation:
    - dataset build and RunPod submit are working
    - the worker still does not return a verifiable adoptable model artifact
    - this is a real training return-path failure, not just a cosmetic UI issue
  - local training API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - service responds with:
      - `status = ok`
      - `service = magatama-train-api`
      - `running = false`
      - `pid = null`
    - meaning:
      - API is healthy/reachable
      - currently idle
      - ready for adoption/import calls once a valid RunPod artifact exists
  - one UI bug in the training modal was fixed live:
    - root cause:
      - during long `IN_PROGRESS` and post-`COMPLETED` artifact verification phases, MAGATAMA sent no heartbeat for too long
      - browser/proxy could then terminate the stream and surface only:
        - `network error`
      - even though Erik had already written the more truthful registry state
    - fix:
      - `magatama/packages/dashboard/src/server.ts`
      - added server-sent heartbeat messages while:
        - RunPod status remains unchanged
        - Hugging Face / artifact propagation checks are still running
      - concrete live strings now deployed in Erik dashboard server:
        - `⏳ RunPod arbeitet weiter (...)`
        - `⏳ Prüfe Modellartefakt ...`
    - deployment:
      - rebuilt dashboard
      - rsynced `packages/dashboard/dist/server.js` to Erik
      - restarted `magatama-dashboard` via PM2
      - remote `server.js` verified to contain heartbeat strings
  - expected operator effect:
    - future training runs should no longer collapse into a late generic `network error` while RunPod/adoption checks are still active
    - the UI should stay alive long enough to show the real terminal result:
      - `completed_and_adopted`
      - or
      - `completed_without_model_artifact`
      - or
      - worker/adoption failure
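
The heartbeat fix above can be sketched as a generator that emits a keepalive whenever the polled status stays unchanged for a while. This is a Python illustration, not the code in `server.ts`; it counts polls instead of measuring time so the behavior is deterministic, and the message strings are placeholders rather than the deployed ones.

```python
def stream_with_heartbeat(polled_statuses, heartbeat_every=3):
    """Yield a message per status change, plus heartbeats during quiet phases.

    `polled_statuses` is any iterable of status strings, one per poll.
    """
    last = None
    unchanged = 0
    for status in polled_statuses:
        if status != last:
            last = status
            unchanged = 0
            yield f"status: {status}"
        else:
            unchanged += 1
            if unchanged % heartbeat_every == 0:
                # Keeps the browser/proxy connection alive while nothing changes.
                yield "heartbeat"
```

In a real server the same idea would be driven by elapsed time and would also cover the artifact verification loop after `COMPLETED`.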

- MAGATAMA live follow-up on 2026-05-07:
  - local Mac training API was rechecked after the lane-specific automation changes.
  - current live truth:
    - LaunchAgent `org.fichtmueller.magatama-train-api` is present and running
    - process listens on `*:3214`
    - localhost health now responds when checked outside sandbox restrictions:
      - `GET http://127.0.0.1:3214/health`
      - response:
        - `status = ok`
        - `service = magatama-train-api`
        - `running = false`
        - `pid = null`
        - `updated_at = 2026-05-07T04:14:23Z`
      - interpretation:
        - the training API itself is healthy and reachable
        - it is currently idle, not broken
        - the actual next proof point must come from a fresh lane run that writes lane-specific `*-last_run.json`
  - live Attack Paths UI bug was fixed and deployed to Erik:
    - root cause:
      - the `Open Fix Guidance` button inside the attack-path side panel only triggered a dummy toast and never opened a real finding/ticket detail
    - fix:
      - `magatama/packages/dashboard/public/index-v2.html`
      - new helper:
        - `openFixGuidanceForNode(nodeId)`
      - behavior:
        - if the clicked graph node maps to a real finding ID, MAGATAMA now opens the existing ticket/finding detail drawer via `openTicket(id)`
        - if the node is only a synthetic path node with no backing finding, MAGATAMA now shows an explicit warning instead of pretending to open guidance
    - live deployment:
      - updated `index-v2.html` was rsynced to:
        - `/opt/magatama/packages/dashboard/public/index-v2.html`
      - `pm2 restart magatama-dashboard` executed on Erik
      - deployed file on Erik verified with:
        - `openFixGuidanceForNode`
        - `Open Fix Guidance`
  - operator consequence:
    - Attack Paths no longer contain a placebo `Open Fix Guidance` action
    - clicking it should now open the actual MAGATAMA finding/ticket guidance path when the graph node represents a real finding
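
The branching of the fix-guidance helper described above can be sketched as follows. This is a Python illustration of the decision only; the real helper is JavaScript in `index-v2.html`, and the node field `finding_id`, the callback parameters, and the warning text are assumptions.

```python
def open_fix_guidance_for_node(node, open_ticket, warn):
    """Open the backing finding if the node has one, otherwise warn explicitly.

    Returns True when a ticket drawer was opened.
    """
    finding_id = node.get("finding_id")
    if finding_id is None:
        # Synthetic path node: say so instead of pretending to open guidance.
        warn("This node is synthetic and has no backing finding.")
        return False
    open_ticket(finding_id)
    return True
```

Passing the two actions in as callbacks keeps the decision testable without a browser.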

- MAGATAMA training automation was hardened locally on 2026-05-07 for all three lanes:
  - target lanes:
    - `magatamallm`
    - `fo_blogllm`
    - `tip_llm`
  - core root cause confirmed:
    - RunPod dataset refresh / lane export already worked
    - RunPod jobs often reached `COMPLETED`
    - but model adoption/version truth still depended on a single shared:
      - `~/magatama-llm/fine-tuning/last_run.json`
    - this made lane status and successful return/adoption ambiguous across models
    - the training modal could also collapse late stream/adoption failures into a generic `network error`
  - local code fixes now in place:
    - `magatama/packages/fine-tuner/training_api.py`
      - lane-specific last-run files added:
        - `~/magatama-llm/fine-tuning/magatamallm-last_run.json`
        - `~/magatama-llm/fine-tuning/fo_blogllm-last_run.json`
        - `~/magatama-llm/fine-tuning/tip_llm-last_run.json`
      - legacy `last_run.json` remains only as backward-compatible mirror for `magatamallm`
      - successful RunPod adoption now creates:
        - a release alias per lane, e.g. `<active-alias>-rN`
      - active alias switching sequence is now:
        - candidate model imported
        - smoke-tested
        - release alias created
        - stable active alias repointed to that release alias
      - adoption report now includes:
        - `version_counter`
        - `release_alias`
    - `magatama/packages/fine-tuner/train.py`
      - local metrics writing now also respects lane-specific last-run files via `TRAINING_LANE`
    - `magatama/packages/dashboard/src/server.ts`
      - `/api/llm/status` now reads lane-specific last-run metadata first
      - `release_alias` is preferred as visible model version when present
      - RunPod SSE catch now distinguishes:
        - real generic training failure
        - `COMPLETED` but no artifact / failed adoption
      - the latter is now rendered as a truthful return/adoption failure, not a vague dataset/network issue
    - `magatama/packages/dashboard/public/index-v2.html`
      - training modal now suppresses misleading late generic `network error` if the server already emitted a terminal training status
      - if the stream ends without a final terminal server event, the UI now explicitly says the registry/adoption state must be checked
      - if the backend reports one of the following, the modal now shows that exact reason:
        - completed without artifact
        - completed without HF model
        - completed but adoption failed
  - local verification:
    - `python3 -m py_compile` passed for:
      - `training_api.py`
      - `train.py`
    - dashboard build passed:
      - `pnpm -C packages/dashboard build`
  - current operational blocker:
    - live deployment to Erik was **not yet completed in this step**
    - direct SSH checks returned:
      - `Connection refused`
      - then `Operation timed out`
    - because of that, the new lane-specific automation logic is locally ready, but not yet confirmed live on Erik for the currently running:
      - `tip_llm`
      - `fo_blogllm`
  - practical consequence:
    - the code path is now prepared for full automation:
      - pull from lane-specific training pool
      - train on RunPod
      - verify artifact existence
      - adopt locally
      - create new release alias/version
      - repoint stable active alias
      - show truthful status in UI
    - but the current live Erik run still needs redeploy + verification once SSH is reachable again
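
The lane-specific bookkeeping above (one last-run file per lane, release aliases of the form `<active-alias>-rN`, alias switch only after a passed smoke test) can be sketched as follows. This is an illustration, not the code in `training_api.py`; the function names, the report keys beyond `version_counter` and `release_alias`, and the rule that `N` is the counter plus one are assumptions.

```python
from pathlib import PurePosixPath

LANES = ("magatamallm", "fo_blogllm", "tip_llm")
BASE_DIR = PurePosixPath("~/magatama-llm/fine-tuning")


def last_run_path(lane):
    """Return the lane-specific last-run file (home directory not expanded)."""
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    return BASE_DIR / f"{lane}-last_run.json"


def adopt(active_alias, version_counter, smoke_test_passed):
    """Mint a release alias and repoint the active alias only after a passed smoke test."""
    if not smoke_test_passed:
        return {
            "adopted": False,
            "version_counter": version_counter,
            "release_alias": None,
        }
    new_counter = version_counter + 1
    release_alias = f"{active_alias}-r{new_counter}"
    return {
        "adopted": True,
        "version_counter": new_counter,
        "release_alias": release_alias,
        # The stable alias now points at the freshly minted release alias.
        "active_alias_target": release_alias,
    }
```

Keeping the counter unchanged on a failed smoke test means a broken candidate never consumes a version number or moves the active alias.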

- MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:
  - result:
    - the lane export / dataset refresh worked
    - a new locally adopted MagatamaLLM model did **not** land
    - active MAGATAMA provider remains the older alias:
      - `ollama:magatama-coder:latest`
  - live/public evidence:
    - `GET https://magatama.fichtmueller.org/api/llm/status`
      - `activeProvider = ollama:magatama-coder:latest`
      - `autoFixProvider = ollama:magatama-coder:latest`
      - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
      - `training.modelVersion = magatama-coder:latest`
      - `training.activeRun = null`
    - this means the UI timestamp currently reflects the latest dataset/training-state update, not proof of a newly adopted local model.
  - local Mac evidence:
    - `ollama list` still shows:
      - `magatama-coder:latest` → modified `3 weeks ago`
      - `magatama-llm-v2-0:latest` → modified `11 days ago`
    - no newer Magatama candidate/import alias appeared locally
  - registry/adoption evidence:
    - Erik lane manifest exists and is fresh:
      - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
      - `generatedAt = 2026-05-06T22:45:15.944Z`
      - `train = 15679`
      - `eval = 1743`
      - `total = 17422`
    - but Erik had no populated local adoption/registry state files in:
      - `/opt/magatama/training-data/model-registry/models.json`
      - `/opt/magatama/training-data/model-registry/runs.json`
      - `/opt/magatama/training-data/model-registry/active.json`
      - `/opt/magatama/data/llm-status.json`
    - local repo only had historical `training-data/model-registry/training-runs.json`
  - historical run evidence:
    - recent `magatamallm` training-run records still show:
      - `submitted`
      - then `not_found_after_submit`
      - or other non-adopted / worker-failure states
    - there is still no verified `completed_and_adopted` proof for a new MagatamaLLM local model.
  - operational conclusion:
    - current truth:
      - dataset/lane preparation works
      - local model adoption is still the missing step
      - MAGATAMA does **not** currently know more than the already active `magatama-coder:latest` alias
    - next fix block remains:
      - make RunPod/local completion count only when adoption succeeds
      - persist adoption report + model registry state
      - update active alias and version only after smoke-tested import succeeds
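
The status evidence above can be checked mechanically. This is a minimal sketch, assuming the `/api/llm/status` response is nested the way the dotted field names suggest; the function name `new_model_adopted` is hypothetical and not part of MAGATAMA. It deliberately ignores `training.lastTrainingAt`, because that timestamp also moves on a dataset refresh.

```python
from typing import Any, Mapping


def new_model_adopted(status: Mapping[str, Any], previous_alias: str) -> bool:
    """Return True only if the payload points at a model other than the old alias.

    Assumption: providers are written as "ollama:<alias>" and the payload is
    nested as {"activeProvider": ..., "training": {"modelVersion": ...}}.
    """
    training = status.get("training") or {}
    active = str(status.get("activeProvider", ""))
    version = str(training.get("modelVersion", ""))
    # Strip the "ollama:" prefix so the alias can be compared directly.
    active_alias = active.split(":", 1)[1] if active.startswith("ollama:") else active
    return bool(active_alias) and active_alias != previous_alias and version == active_alias


# Values copied from the live/public evidence recorded above.
status = {
    "activeProvider": "ollama:magatama-coder:latest",
    "autoFixProvider": "ollama:magatama-coder:latest",
    "training": {
        "lastTrainingAt": "2026-05-06T22:43:20Z",
        "modelVersion": "magatama-coder:latest",
        "activeRun": None,
    },
}
print(new_model_adopted(status, "magatama-coder:latest"))
```

With the recorded payload the check reports no adoption, which matches the operational conclusion: the fresh timestamp alone proves nothing.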

- MAGATAMA Switchblade port intelligence is now flowing end-to-end, as verified on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA had still shown mostly flat port chips.
    - verified live on Erik:
      - the real Switchblade runtime is the PM2 app `switchblade` under `/opt/switchblade-app`, not the older `/opt/switchblade` tree.
      - `GET http://127.0.0.1:3000/api/discovery/snmp` for `192.168.178.2` already returned rich rows such as:
        - `GigabitEthernet3` → description `Aruba-1830-UNUSED`, neighbor `VN46KYC0G0`, peer port `11`
        - `GigabitEthernet5` → description `Tashi-204`, neighbor `fritz.box`, peer `LAN:1`
        - `GigabitEthernet25` → description `to Cisco Business 220 Series`, neighbor `Switch39688E`, peer `gi9`
    - the remaining loss point was MAGATAMA’s own Switchblade sync/persistence path.
  - MAGATAMA sync hardening:
    - `scripts/switchblade_live_sync.ts`
      - now prefers live SNMP discovery data when it is richer than `/api/devices/<ip>`
      - now maps `description`, `peerDevice`, `peerPort`, `connectedHost`, `inOctets`, `outOctets` into rack device ports
      - added optional debug snapshot dump support via `SWITCHBLADE_DEBUG_SNAPSHOT_FILE`
      - sanitizes unreadable peer-port strings and drops synthetic high-index numeric pseudo-ports
    - verified with a forced live run on Erik:
      - `Top of Rack Switch` now exports `28` real SG350 ports into the rack snapshot instead of the earlier flattened/odd set
      - sample verified payloads before POST:
        - port 3 → `Aruba-1830-UNUSED` / `VN46KYC0G0` / `11`
        - port 5 → `Tashi-204` / `fritz.box` / `LAN:1`
        - port 25 → `to Cisco Business 220 Series` / `Switch39688E` / `gi9`
  - MAGATAMA core hardening:
    - `packages/core/src/routes/health-types.ts`
      - `SwitchbladePortSnapshot` now preserves:
        - `description`
        - `vlan`
        - `macCount`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `transceiver`
        - `inOctets`
        - `outOctets`
    - `packages/core/src/routes/health-support.ts`
      - `normalizeSwitchbladePort()` now keeps those additional port fields instead of silently truncating them
    - rebuilt locally and re-rsynced the new `packages/core/dist` to Erik
  - dashboard/UI hardening:
    - `packages/dashboard/public/index-v2.html`
      - port chips already had custom tooltip support; now they also carry native `title=` fallback text
      - this reduces the old “question mark / unclear hover” problem in browsers that do not immediately show the custom bubble
  - live public verification after deploy:
    - `GET https://magatama.fichtmueller.org/api/switchblade/snapshot`
      - now contains enriched SG350 rack-port records with:
        - `description`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `inOctets`
        - `outOctets`
      - public snapshot timestamp verified:
        - `receivedAt = 2026-05-06T22:51:59.247Z`
    - `Top of Rack Switch` in the public snapshot now exposes meaningful peer/use-case data instead of only flat status counters
  - operator impact:
    - MAGATAMA can now answer the actual operational question per port:
      - what is on this port
      - what is it talking to
      - what does the link look like
    - this is now grounded in Switchblade live SNMP/LLDP data, not guesswork.
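
The "prefer the richer source" rule in the sync hardening can be sketched as follows. This is an illustration in Python, not the code in `scripts/switchblade_live_sync.ts`. The helper names, the richness score, and the `name` key are assumptions. Only the six enriched field names come from the notes above.

```python
from typing import Any, Dict, Optional

# Enriched fields named in the sync notes above.
RICH_FIELDS = ("description", "peerDevice", "peerPort", "connectedHost", "inOctets", "outOctets")


def richness(port: Dict[str, Any]) -> int:
    """Count enriched fields that carry a usable (non-empty) value."""
    return sum(1 for field in RICH_FIELDS if port.get(field) not in (None, ""))


def pick_richer(device_port: Dict[str, Any], snmp_port: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Prefer the SNMP discovery row only when it is strictly richer."""
    if snmp_port is not None and richness(snmp_port) > richness(device_port):
        return snmp_port
    return device_port


def clean_peer_port(value: Any) -> Optional[str]:
    """Return a trimmed peer-port label, or None when it is not readable text."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    return text if text and text.isprintable() else None


# Sample values taken from the verified GigabitEthernet5 row above.
device_row = {"name": "GigabitEthernet5", "description": ""}
snmp_row = {"name": "GigabitEthernet5", "description": "Tashi-204",
            "peerDevice": "fritz.box", "peerPort": "LAN:1"}
chosen = pick_richer(device_row, snmp_row)
print(chosen["description"], clean_peer_port(chosen.get("peerPort")))
```

Requiring the SNMP row to be strictly richer keeps the `/api/devices/<ip>` data as the default when both sources are equally sparse.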

- TIP/Blog lane separation was materially corrected on 2026-05-06:
  - root cause:
    - `TIP_LLM` was still ingesting blog-/writer-shaped rows from the canonical lane pool and shared transceiver corpora.
    - local inspection showed the old TIP export had `6250` train rows, of which `6087` still matched blog/writer patterns.
  - dataset builder and Gitea sync were hardened:
    - `scripts/runpod_dataset_builder.ts`
      - added strict `tipDatasetAllowed(...)`
      - `TIP_LLM` now rejects blog-shaped source rows at dataset-build time
      - `TIP_LLM` now rejects blog-like `system`, `user`, and markdown-article `assistant` patterns
      - registry fallback for `TIP_LLM` now only uses lane-compatible datasets
    - `scripts/sync_gitea_training_pool.ts`
      - canonical TIP pool refresh now uses the stricter lane-alignment rules
      - redundant `merged.jsonl` copies for `fo_blogllm` and `tip_llm` are no longer rewritten, to avoid local disk exhaustion from duplicate lane artifacts
  - local disk issue encountered and fixed:
    - full refresh failed with `ENOSPC` while writing `training-data/gitea-learning-pool/tip_llm/merged.jsonl`
    - redundant lane `merged` artifacts for `fo_blogllm` and `tip_llm` were truncated and the sync script was changed to stop recreating them
    - free disk space recovered from `377Mi` to `17Gi`
  - locally verified after rebuild:
    - `TIP_LLM` RunPod export:
      - `train = 233`
      - `eval = 26`
      - `total = 259`
      - `blog/writer matches = 0`
    - first TIP rows now use the correct TIP system prompt:
      - `You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...`
  - corrected artifacts and scripts were synced to Erik and `pnpm training:refresh-all` was rerun there.
  - live verified on Erik/public API:
    - `magatamallm`
      - `datasetSource = url`
      - `collectedExamples = 15679`
      - `evalExamples = 1743`
      - `totalExamples = 17422`
      - `newSinceLastTraining = 15679`
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectedExamples = 17322`
      - `evalExamples = 1926`
      - `totalExamples = 19254`
      - `neverTrained = true`
    - `tip_llm`
      - `datasetSource = url`
      - `collectedExamples = 231`
      - `evalExamples = 26`
      - `totalExamples = 257`
      - `neverTrained = true`
  - operational conclusion:
    - lane-specific dataset truth is now real on Erik.
    - `TIP_LLM` is no longer silently borrowing the FO_Blog behavior lane.
    - the next remaining hard problem is now RunPod artifact adoption/validation, not lane contamination.
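
A lane filter of the kind described for `tipDatasetAllowed(...)` could look like the sketch below. The real pattern list is not recorded in this handoff, so the cue words, the markdown-heading heuristic, and the row shape (`system` / `user` / `assistant` keys) are all assumptions for illustration.

```python
import re
from typing import Mapping

# Hypothetical cues; the actual rules in scripts/runpod_dataset_builder.ts are not shown here.
BLOG_CUES = re.compile(r"\b(blog|writer|article|linkedin)\b", re.IGNORECASE)
# Assumed heuristic: an assistant turn that contains a top-level markdown heading is an article.
MARKDOWN_ARTICLE = re.compile(r"^\s*#{1,2} \S", re.MULTILINE)


def tip_dataset_allowed(row: Mapping[str, str]) -> bool:
    """Reject rows whose prompts or answer look blog/writer-shaped."""
    if BLOG_CUES.search(row.get("system", "")) or BLOG_CUES.search(row.get("user", "")):
        return False
    if MARKDOWN_ARTICLE.search(row.get("assistant", "")):
        return False
    return True


rows = [
    {"system": "You are TIP_LLM, a research and market-intelligence analyst for transceivers.",
     "user": "Compare QSFP28 LR4 reach across vendors.",
     "assistant": "Most vendors list 10 km over single-mode fiber."},
    {"system": "You are a blog writer.",
     "user": "Draft a post about LPO.",
     "assistant": "# Why LPO matters\n\nLinear pluggable optics remove the DSP."},
]
kept = [row for row in rows if tip_dataset_allowed(row)]
print(len(kept), "of", len(rows), "rows kept")
```

Filtering at dataset-build time, as the notes describe, means a contaminated pool upstream cannot leak into the lane export.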

- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully:
    - `magatama-dashboard`
    - `magatama`
  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
    - `collectedExamples = 15620`
    - `effectiveExamples = 15620`
    - `evalExamples = 1736`
    - `totalExamples = 17356`
    - `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - worker `COMPLETED` is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
  - public findings state remains currently empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - stale cached changelog data older than 6h now forces a rebuild from git history
    - verified live via dashboard proxy on Erik:
      - `generatedAt = 2026-05-06T15:18:42.708Z`
      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
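
The worker-log check in the RunPod completion hardening can be illustrated like this. It is a sketch under assumptions: the exact log phrasing for a non-zero exit and the label `worker_logs_clean` are hypothetical. Only `Traceback`, `SyntaxError`, and `completed_with_worker_failure` come from the notes above.

```python
import re
from typing import Iterable

FAILURE_PATTERNS = (
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"\bSyntaxError\b"),
    # Hypothetical phrasing for a non-zero exit; real worker output may differ.
    re.compile(r"exit(?:ed with)? (?:code|status) (?!0\b)\d+", re.IGNORECASE),
)


def classify_completed_run(log_lines: Iterable[str]) -> str:
    """Classify a run that the worker reported as COMPLETED by scanning its logs."""
    for line in log_lines:
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            return "completed_with_worker_failure"
    # Clean logs alone are not proof of success; artifact checks still apply.
    return "worker_logs_clean"


logs = [
    "loading dataset",
    "Traceback (most recent call last):",
    '  File "train.py", line 12, in <module>',
]
print(classify_completed_run(logs))
```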

- MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:
  - root cause:
    - the training modal always fetched `/api/llm/status` without a lane, so `FO_BlogLLM` and `TIP_LLM` still showed the `magatamallm` pool.
  - dashboard/server were updated so `/api/llm/status?lane=...` is now truly lane-aware.
  - the training modal now refreshes per selected lane and rewrites:
    - title
    - runtime label
    - pool path
    - counts
    - dataset source
  - MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via `ecosystem.config.cjs`:
    - `RUNPOD_DATASET_SOURCE=url`
    - `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
    - `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
    - `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
  - live verified on Erik after restart:
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
      - `train = 28`
      - `eval = 4`
      - `total = 32`
    - `tip_llm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
      - `train = 36`
      - `eval = 4`
      - `total = 40`
    - `magatamallm`
      - remains on lane-export counts (`15620 / 1736 / 17356`)
  - operator impact:
    - no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
    - every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing `magatamallm`.
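
The per-lane environment variables above follow a `RUNPOD_DATASET_SOURCE_<LANE>` naming pattern. A lookup consistent with that pattern might work as sketched below. The fallback order (lane-specific first, then the global variable) is an assumption, and `dataset_source_for_lane` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def dataset_source_for_lane(lane: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Resolve the dataset source for a lane, preferring the lane-specific variable."""
    env = os.environ if env is None else env
    lane_key = f"RUNPOD_DATASET_SOURCE_{lane.upper()}"
    # Assumed precedence: lane override, then global default, else unset.
    return env.get(lane_key) or env.get("RUNPOD_DATASET_SOURCE")


# Values as recorded for `ecosystem.config.cjs` on Erik.
erik_env = {
    "RUNPOD_DATASET_SOURCE": "url",
    "RUNPOD_DATASET_SOURCE_MAGATAMALLM": "url",
    "RUNPOD_DATASET_SOURCE_FO_BLOGLLM": "url",
    "RUNPOD_DATASET_SOURCE_TIP_LLM": "url",
}
for lane in ("magatamallm", "fo_blogllm", "tip_llm"):
    print(lane, dataset_source_for_lane(lane, erik_env))
```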

- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
  - verified on Erik:
    - `pnpm training:refresh-all` now succeeds.
    - fresh dataset totals after dedupe:
      - `magatamallm`: `92,742` raw → `17,356` effective (`15,620 train / 1,736 eval`)
      - `fo_blogllm`: `32` total (`28 train / 4 eval`)
      - `tip_llm`: `40` total (`36 train / 4 eval`)
  - important nuance:
    - Codex did **not** execute the final Hugging Face publish step from Erik in this chat.
    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
- MAGATAMA Attack Paths UX is no longer a misleading blank panel:
  - the page now distinguishes between:
    - no live attack paths
    - historical fallback paths
    - empty selected scope (`0 assets in scope`)
  - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
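  - live dashboard HTML on Erik now contains these German UI strings:
    - `Im aktuellen Scope liegen 0 Assets.` (English: "There are 0 assets in the current scope.")
    - `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.` (English: "Widen the location or datacenter / rack so that MAGATAMA can show correlatable assets and paths.")
    - `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.` (English: "Without open multi-stage correlations, the graph view intentionally stays empty.")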
- MAGATAMA code/training hardening was extended:
  - `scripts/test_runpod_adapter.py` no longer loads tokenizer/model with `trust_remote_code=True`.
  - `scripts/ollama_adapter_bridge.py` no longer loads tokenizer/model with `trust_remote_code=True`.
  - this removed the live CODE finding around `HuggingFace trust_remote_code` on Erik.
- Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
  - generic `atlas-exposure` findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
  - internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
  - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
  - after rebuild + deploy + health sync:
    - live Postgres open findings returned to `0`.
- Follow-up hardening on the same block:
  - the earlier RunPod error path in MAGATAMA dashboard was made more truthful.
  - dataset preparation now distinguishes:
    - local `training:refresh-all` failure
    - optional Hugging Face publish failure
    - URL-based dataset mode with no external publish required
  - the training SSE flow now explicitly tells the operator whether RunPod is using:
    - Hugging Face dataset source
    - or MAGATAMA URL-bundle dataset source
  - this avoids misleading `RunPod not reachable` wording when the actual failure is in dataset preparation.
  - follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
    - MAGATAMA submit logic now verifies that a RunPod job really exists under `/status/{jobId}` instead of trusting `/run`.
    - payloads were aligned more closely with the official Axolotl serverless schema:
      - `model_type=AutoModelForCausalLM`
      - `tokenizer_type=AutoTokenizer`
      - dataset `split: train`
      - optimizer `adamw_torch_fused`
    - verified full run attempt:
      - job id `9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2`
      - disappeared as `not_found_after_submit` (`404 job not found`)
    - verified canary after payload fix:
      - job id `a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2`
      - immediately materialized as `IN_QUEUE`
      - then still disappeared on later reconcile as `not_found_after_submit`
    - current conclusion:
      - the old MAGATAMA bug is fixed.
      - the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
    - operational rule:
      - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
      - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
  - follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
    - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
    - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
    - synced current lane export to Erik and restarted `magatama-dashboard`.
    - verified public API now returns:
      - `collectedExamples = 1367`
      - `effectiveExamples = 1367`
      - `evalExamples = 152`
      - `totalExamples = 1519`
      - `newSinceLastTraining = 1367`
    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
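
The operational rule for serverless runs (do not trust `submitted` or a brief `IN_QUEUE`) reduces to a small predicate. This is a sketch only: `run_is_trustworthy` is a hypothetical helper, and it treats `COMPLETED` as the only terminal state because that is the only one named in this log.

```python
def run_is_trustworthy(latest_status: str, has_artifact: bool = False) -> bool:
    """Apply the rule: trust IN_PROGRESS, or COMPLETED backed by artifact evidence."""
    if latest_status == "IN_PROGRESS":
        return True
    if latest_status == "COMPLETED":
        return has_artifact
    # submitted, IN_QUEUE, not_found_after_submit, and anything unknown are not proof.
    return False


# Status sequences as recorded for the two jobs above.
full_run = ["submitted", "not_found_after_submit"]
canary = ["submitted", "IN_QUEUE", "not_found_after_submit"]
for name, history in (("full run", full_run), ("canary", canary)):
    print(name, run_is_trustworthy(history[-1]))
```

Evaluating only the latest observed status means a job that was briefly queued and then vanished is never counted as a usable run.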

- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
  - open findings were reduced all the way to `0` in Postgres.
  - false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
  - code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
  - `open findings: 0`
  - `queueExecuting: 0`
  - `queueBlocked: 0`
  - `queueFailed: 0`
  - public `/api/health` returns `status: ok`
  - public `/api/active-resolvers` returns:
    - `MAGATAMA Core: working`
    - `MagatamaLLM: working`
    - `Claude (secondary): working`
    - `Codex (secondary/manual): idle`
    - `Copilot (secondary/manual): idle`
- Important resolver truth fix on 2026-05-06:
  - live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
  - dashboard logic was updated so disabled Codex/Copilot now show truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
  - the local codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
- Remaining real operational gap after findings hit zero:
  - MAGATAMA still knows about more assets than it actively collects telemetry from.
  - last public protection proof showed:
    - `knownAssets: 79`
    - `hostsWithTelemetry: 27`
    - `assetsWithoutTelemetry: 52`
  - these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.

- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
  - Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
  - MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
  - Live verified/deduped MAGATAMA training state after the fix:
    - `collectedExamples: 49`
    - `rawExamples: 58`
    - `duplicateExamples: 9`
    - `effectiveExamples: 49`
    - `newSinceLastTraining: 49`
  - MAGATAMA now filters training metrics to verified/trainable examples only.
  - Failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
  - Gitea-backed training pool remains the default target for training writes.
- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
  - the earlier `49` medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
    - exposure-backed assets
    - explicit non-auto owner
    - configured telemetry expectation
    - critical/high criticality
    - infrastructure metadata or managed infra device types
  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
  - local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
  - live Postgres state after deploy: `open findings = 0`.
  - training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
    - failed/escalated/report-only runs now belong in `errors.jsonl`
  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
    - atlas coverage scope hardening
    - training path integrity fix
  - corpus cleanup + dedupe was executed afterward:
    - pre-dedupe backup kept locally as:
      - `magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl`
    - resulting verified corpus:
      - `fixes.jsonl = 1,368` unique verified training rows
    - resulting failure corpus:
      - `errors.jsonl = 4` tracked failed/escalated rows
    - integrity report now exists at:
      - `magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json`
    - latest integrity totals:
      - `scanned: 1368`
      - `verified: 1368`
      - `movedToErrors: 4`
      - `parseErrors: 0`
      - `invalidVerifiedFlag: 0`
- Complete Codex chat sync was added:
  - `sync/history/2026-04-29-codex-complete-chat-sync.md`
  - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
  - confirms no secrets were written into sync.
  - confirms TIP crawler/robot planning remains TIPLLM-only.
  - confirms Erik remains controller/light `erik-safe` only, with heavy crawler work assigned to Proxmox/Pi workers.
- Codex sync-start confirmation was added:
  - `sync/history/2026-04-29-codex-sync-start-confirmation.md`
  - confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating `sync/` as binding.
  - no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
- Codex follow-up on 2026-04-29 clarified the active BlogLLM model:
  - TIP shows `fo-blog-v7`, but this is not a normal Ollama GGUF manifest.
  - It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter`
  - Bridge definition:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py`
  - TIP API default:
    `packages/api/src/llm/client.ts` uses `OLLAMA_LLM_MODEL || "fo-blog-v7"`.
  - `fo-blog-v8` remains the next training candidate, not the currently active TIP BlogLLM model.
- Full Codex session handoff was added:
  - `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing:
  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to TIP learning-pool build:
  - `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
- Added docs:
  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script:
  - `packages/scraper/package.json`
  - `robots:verification`

## Gitea Training Pool

- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit:
  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`

## MAGATAMA Training / Operations State

- Relevant local repo:
  - `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state:
  - `open findings: 0` on `2026-05-06`
- Latest confirmed live resolver state:
  - `Codex` and `Copilot` intentionally `idle/disabled`
  - not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix:
  - `newSinceLastTraining: 49`
- Meaning:
  - the old `0` was incorrect.
  - the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
- Latest corpus integrity state after cleanup:
  - operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
    - `1368` unique verified rows
    - `4` live failure/escalation rows in `errors.jsonl`
  - do not confuse raw historical volume with real trainable signal.
- Important training integrity rule:
  - report-only or failed/escalated records must not be treated as verified training fixes.
  - keep them separated from the main verified training corpus.
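
The integrity rule above amounts to routing each record into one of two corpora and deduplicating the verified side. A minimal sketch follows, assuming each record carries a boolean `verified` flag and a `status` string. Both field names, the status values, and the whole-row hash used as the dedupe key are assumptions, not the logic in `packages/core/src/learning/fix-tracking.ts`.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical status values for records that must not be trained on.
NON_TRAINABLE = {"failed", "escalated", "report_only"}


def split_corpus(rows: Iterable[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (fixes, errors): unique verified rows, and everything else."""
    fixes: List[Dict[str, Any]] = []
    errors: List[Dict[str, Any]] = []
    seen = set()
    for row in rows:
        if row.get("verified") is not True or row.get("status") in NON_TRAINABLE:
            errors.append(row)  # belongs in errors.jsonl
            continue
        # Assumed dedupe key: hash of the canonical JSON form of the whole row.
        digest = hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fixes.append(row)  # belongs in fixes.jsonl
    return fixes, errors


sample = [
    {"id": "a", "verified": True, "status": "verified"},
    {"id": "a", "verified": True, "status": "verified"},  # exact duplicate
    {"id": "b", "verified": True, "status": "escalated"},
    {"id": "c", "verified": False, "status": "report_only"},
]
fixes, errors = split_corpus(sample)
print(len(fixes), "fixes,", len(errors), "errors")
```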

## Erik Status

- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale, misplaced duplicate files that existed only on the remote:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.
- Shared Erik note from the same chat:
  - MAGATAMA dashboard/core were redeployed during compliance/training fixes.
  - TIP crawler policy remains unchanged: Erik is controller/light runner only, not heavy crawl execution host.

## Last Live Verification Snapshot

From 2026-04-29:

- Total transceivers: `13,546`
- Price verified: `7,250`
- Image verified: `7,025`
- Details verified: `6,243`
- Fully verified: `5,812`
- Last price observation: `2026-04-29 19:15:53 UTC`
- Last stock observation: `2026-04-29 19:15:56 UTC`

## Latest MAGATAMA Training / RunPod Truth

Confirmed on `2026-05-06`:

- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
- Live Erik dashboard API now reports:
  - `magatamallm`
    - `1367 train`
    - `152 eval`
    - `1519 total`
    - `newSinceLastTraining = 1367`
  - `fo_blogllm`
    - `17353 train`
    - `1929 eval`
    - `19282 total`
    - `newSinceLastTraining = 17353`
    - active local model resolves to `fo-blog-v7`
  - `tip_llm`
    - `6482 train`
    - `721 eval`
    - `7203 total`
    - `newSinceLastTraining = 6482`
    - target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
- Result:
  - the previous `1097` shown for every lane was stale and wrong.
  - selected lane now controls its own manifest, model label, and training counts.

### Gitea-backed Pool Materialization

- `magatamallm` Gitea pool remains canonical and populated.
- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
- Lane manifests and JSONL exports now exist under:
  - `training-data/gitea-learning-pool/fo_blogllm/`
  - `training-data/gitea-learning-pool/tip_llm/`

### RunPod Completion Hardening

- MAGATAMA dashboard code now treats RunPod `COMPLETED` as success only after:
  1. target model artifact is referenced
  2. local Mac training API adopts/imports the artifact
  3. lane-specific smoke tests pass
  4. active Ollama alias is updated
- New local adoption endpoint is:
  - `POST /adopt-runpod-model`
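
The four gates above can be read as an ordered chain in which the first missing piece decides the recorded status. The sketch below illustrates that reading. Only `completed_without_model_artifact` and `completed_and_adopted` appear in this log. The other status labels and the `AdoptionEvidence` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdoptionEvidence:
    artifact_ref: Optional[str]   # gate 1: model artifact referenced by the run
    imported_locally: bool        # gate 2: Mac training API adopted/imported it
    smoke_tests_passed: bool      # gate 3: lane-specific smoke tests
    alias_updated: bool           # gate 4: active Ollama alias repointed


def classify_completed(evidence: AdoptionEvidence) -> str:
    """Map a worker-reported COMPLETED run to a truthful final status."""
    if not evidence.artifact_ref:
        return "completed_without_model_artifact"
    if not evidence.imported_locally:
        return "adoption_failed"      # hypothetical label
    if not evidence.smoke_tests_passed:
        return "smoke_test_failed"    # hypothetical label
    if not evidence.alias_updated:
        return "alias_not_updated"    # hypothetical label
    return "completed_and_adopted"


no_artifact = AdoptionEvidence(artifact_ref=None, imported_locally=False,
                               smoke_tests_passed=False, alias_updated=False)
print(classify_completed(no_artifact))
```

Checking the gates in order keeps the recorded status pointed at the earliest step that failed, which is the step an operator has to fix first.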

### Mac Training API State

- The old LaunchAgent on Mac Studio was still serving the legacy training API from:
  - `~/magatama-llm/service/training_api.py`
- It has now been upgraded in place so Erik sees the new adoption-capable API.
- Verified from Erik:
  - `http://192.168.178.213:3214/health` returns the new service
  - it now exposes `register_script` pointing into the MAGATAMA repo
  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live

### Still Outstanding

- A fully successful end-to-end RunPod fine-tune has not yet been re-verified since the new adoption pipeline was wired in. Such a run needs all of:
  - real worker success
  - real artifact
  - successful local Ollama import
  - active alias switch
  - smoke-test proof
- Latest live proof run on `2026-05-06`:
  - job id: `2112a7ab-68c2-4411-a44f-6edb7ad377df-e1`
  - materialized correctly
  - reached `IN_PROGRESS`
  - then `COMPLETED`
  - but RunPod `status/{job}` returned no `output` object, no model artifact reference, and no Hugging Face repo result
  - current MAGATAMA handling now correctly classifies this as `completed_without_model_artifact`, not as success
- `tip-llm-v1` is still not installed locally in Ollama.

### Pulso AI Recommendation

- Keep a shared network/transceiver/switch core corpus with TIP.
- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
- Recommended split:
  - `TIP_LLM`
    - research
    - crawler / scraper / robot planning
    - vendor / firmware / issue extraction
  - `Pulso AI`
    - product responses
    - support
    - diagnostics
    - operator explanation layer

## Safe Next Steps

1. Clone or pull Gitea `origin` on laptop/Claude Code.
2. Read this folder first.
3. For BlogLLM work, treat `fo-blog-v7` as an Adapter Bridge model backed by a PEFT adapter, not as a `~/.ollama` GGUF model.
4. Also read `llm-gateway/sync/CURRENT.md` when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
8. If testing robots, start with dry runs only:

```bash
npm run robots:verification -w packages/scraper -- --status
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

9. Only dispatch real crawl work after deciding the target host:
   - Erik: `erik-safe`, tiny batches only.
   - Pi: `pi-fetch`.
   - Proxmox: `proxmox-heavy`.

## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session, some appear pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.

## Latest Sync Commits

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- `bba48d3 sync: record magatama atlas rematerialization fix`
- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
- `8b42077 sync: refresh cross-agent chat handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.

## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth

### Atlas / Findings

- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Root causes fixed live:
  1. `packages/core/src/routes/health-builders.ts`
     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
  2. `packages/core/src/scheduler.ts`
     - generic stale auto-resolve no longer auto-closes:
       - `atlas-coverage-gap`
       - `atlas-exposure`
       - `atlas-host-audit`
  3. `packages/dashboard/public/index-v2.html`
     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
- Live public verification after deploy:
  - `/api/protection-proof` shows non-zero Atlas truth again.
  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
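
The scheduler change above exempts three Atlas categories from generic stale auto-resolve. A sketch of such a guard is shown below. The seven-day staleness window, the function name, and the `code-scan` category are assumptions. Only the three protected category names come from this addendum.

```python
from datetime import datetime, timedelta, timezone

# Categories that must be closed by Atlas logic itself, never by the generic sweep.
PROTECTED_CATEGORIES = frozenset({"atlas-coverage-gap", "atlas-exposure", "atlas-host-audit"})


def may_auto_resolve(category: str, last_seen: datetime, now: datetime,
                     stale_after: timedelta = timedelta(days=7)) -> bool:
    """Return True if the generic stale sweep is allowed to close this finding."""
    if category in PROTECTED_CATEGORIES:
        return False
    # Assumed rule: a finding is stale once it has not been seen for `stale_after`.
    return now - last_seen > stale_after


now = datetime(2026, 5, 9, 21, 0, tzinfo=timezone.utc)
old = now - timedelta(days=30)
print(may_auto_resolve("atlas-exposure", old, now))
print(may_auto_resolve("code-scan", old, now))  # hypothetical non-Atlas category
```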

### Training / Lane Registry

- The public training status is now honest for the current live state:
  - `magatamallm`
    - `datasetSource: url`
    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
    - `15679 train`
    - `1743 eval`
    - `17422 total`
    - `lastRegistryRunStatus: completed_without_model_artifact`
  - `fo_blogllm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
  - `tip_llm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
  - lane datasets
  - lane RunPod manifests
  - `training-runs.json`
- The live compiled registry on Erik no longer sits at all-`null`; it now exposes:
  - `activeModel`
  - `version`
  - `lastRunId`
  - `lastRunStatus`
  - `datasetSource`
  - `collectionsPath`
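
To show how a per-lane entry with these six keys could be assembled, here is a sketch in Python. It is not the logic of `scripts/model_registry_build.ts`. The run-record fields (`lane`, `id`, `status`, `submittedAt`) and the sample run ids are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def compile_lane_entry(lane: str, dataset_source: str, collections_path: str,
                       runs: List[Dict[str, Any]],
                       active_model: Optional[str] = None,
                       version: Optional[str] = None) -> Dict[str, Any]:
    """Build one registry entry from lane metadata and the lane's latest run."""
    lane_runs = [run for run in runs if run.get("lane") == lane]
    # ISO-8601 UTC timestamps in one consistent format sort correctly as strings.
    latest = max(lane_runs, key=lambda run: run.get("submittedAt", ""), default=None)
    return {
        "activeModel": active_model,
        "version": version,
        "lastRunId": latest.get("id") if latest else None,
        "lastRunStatus": latest.get("status") if latest else None,
        "datasetSource": dataset_source,
        "collectionsPath": collections_path,
    }


# Illustrative run records; the ids are placeholders, not real job ids.
runs = [
    {"lane": "magatamallm", "id": "run-1", "status": "not_found_after_submit",
     "submittedAt": "2026-05-06T10:00:00Z"},
    {"lane": "magatamallm", "id": "run-2", "status": "completed_without_model_artifact",
     "submittedAt": "2026-05-06T18:00:00Z"},
]
entry = compile_lane_entry(
    "magatamallm", "url",
    "/opt/magatama/training-data/runpod/magatamallm/manifest.json", runs)
print(entry["lastRunId"], entry["lastRunStatus"])
```

A lane with no recorded runs still gets `datasetSource` and `collectionsPath`, so only the run fields stay `null`.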

### Still Outstanding

- Full automatic training is still blocked by the managed RunPod Axolotl endpoint:
  - jobs reach `COMPLETED`
  - but no adoptable artifact is returned
  - therefore MAGATAMA correctly records:
    - `completed_without_model_artifact`
- That means:
  - no new model version can be truthfully activated yet
  - no Ollama alias switch should happen yet
- Remaining real blocker:
  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.