# Current TIP Sync State

Updated: 2026-05-09 15:02 UTC

## Newest Work

- QSFPTEK cable/AOC parser hardening and DB detail backfill on 2026-05-09:
  - root cause:
    - QSFPTEK scraper parsed catalog rows but did not pass `productUrl` into `findOrCreateScrapedTransceiver`
    - generic leading cable lengths like `1m`, `2m`, `10m`, `15m`, `30m` were not parsed
    - MFS/MCP AOC/DAC product families were not classified as cable/AOC products
  - code hardened:
    - `packages/scraper/src/scrapers/qsfptek.ts`
      - parses generic `m/km` reach, including leading lengths
      - classifies `MFS`/AOC/active fiber as `AOC Cable`
      - classifies `MCP`/DAC/Copper/Twinax as `Cable`
      - writes `productUrl` into the DB upsert
      - sets Copper/DAC wavelength to `N/A`
      - adds safe optical family wavelength parsing for future catalog runs
  - DB correction:
    - found `36` QSFPTEK rows missing details
    - `28` had deterministic leading length and source URL
    - updated those `28` with reach, cable/AOC classification and source-backed details
    - `8` additional rows became fully verified after promotion
  - deployment:
    - synced patched QSFPTEK scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
  - truth:
    - QSFPTEK is now much closer to complete, but the remaining rows include long-reach 1G optics missing fiber/detail fields; these should be handled separately by source parsing, not guessed
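
The parsing and classification rules in this entry can be sketched roughly as follows. This is a minimal illustration, not the code in `qsfptek.ts`: the helper names (`parseReach`, `classifyCable`, `describeProduct`) and the row shape are hypothetical, and matching units in lowercase only is an assumption made here to keep data rates such as `100M` from being read as lengths.

```typescript
// Illustrative sketch only: helper names are hypothetical and not taken
// from packages/scraper/src/scrapers/qsfptek.ts.
type CableCategory = "AOC Cable" | "Cable" | null;

interface ParsedReach {
  label: string; // normalized label such as "10m" or "2km"
  meters: number; // numeric distance in meters
}

// Finds the first "<number>m" or "<number>km" token, including a leading one
// such as "10m 100G QSFP28 AOC". Units are matched in lowercase only
// (assumption) so that "100M" as a data rate is not read as 100 meters.
function parseReach(title: string): ParsedReach | null {
  const match = /(?:^|[\s\-_\/(])(\d+(?:\.\d+)?)\s?(km|m)(?=$|[\s\-_\/),])/.exec(title);
  if (!match) return null;
  const value = Number(match[1]);
  const unit = match[2] === "km" ? "km" : "m";
  if (!isFinite(value) || value <= 0) return null;
  const meters = unit === "km" ? Math.round(value * 1000) : value;
  return { label: `${value}${unit}`, meters };
}

// MFS/AOC/active fiber => "AOC Cable"; MCP/DAC/Copper/Twinax => "Cable".
function classifyCable(text: string): CableCategory {
  const upper = text.toUpperCase();
  if (/\bMFS|\bAOC\b|ACTIVE FIBER/.test(upper)) return "AOC Cable";
  if (/\bMCP|\bDAC\b|COPPER|TWINAX/.test(upper)) return "Cable";
  return null;
}

// Builds the fields that would go into the upsert, including productUrl.
function describeProduct(title: string, productUrl: string) {
  const category = classifyCable(title);
  const reach = parseReach(title);
  return {
    productUrl,
    category,
    reachLabel: reach ? reach.label : null,
    reachMeters: reach ? reach.meters : null,
    // Copper/DAC has no optical wavelength, so it is stored as "N/A".
    wavelengths: category === "Cable" ? "N/A" : null,
  };
}
```

Rows without a parsable length keep `null` for reach, so nothing is guessed.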

- Copper/DAC reach/detail verification and comparable API semantics on 2026-05-09:
  - purpose:
    - continue toward full TIP verification without inventing optical data
    - treat Copper/DAC/Twinax as cable products with `wavelengths=N/A`, not as optical products with missing wavelength data
  - DB correction:
    - found `467` Copper rows still missing reach label/meters
    - `342` had deterministic length evidence in part number or product URL
    - wrote `reach_label`, `reach_meters`, `wavelengths=N/A`, cable category and detail verification for those `342`
    - corrected `78` ATGBICS OSFP cable rows that had been parsed as `SFP`
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - detects `OSFP` before `SFP`
      - parses generic decimal meter/kilometer reach such as `0.5m`, `1.5m`, `2.5m`, `30m`, `2km`
      - keeps Copper/DAC/Twinax/Base-T/RJ45 wavelength as `N/A`
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now allow Copper/DAC/CU products to match each other with `wavelengths=N/A`
      - optical products still require numeric wavelength evidence and close wavelength match
  - deployment:
    - synced ATGBICS scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
    - synced API route to active `/opt/tip`
    - `pnpm -C packages/api build` passed
    - restarted `tip-api`
  - result:
    - global `details_verified` increased from `11085` to `11425`
    - global `fully_verified` increased from `9861` to `10170`
    - Copper remaining gaps after correction:
      - missing reach label: `122`
      - missing reach meters: `125`
      - missing details: `158`
    - selected vendor detail/fully state:
      - ATGBICS: details `7656/8269`, fully `7646/8269`
      - NADDOD: details `726/748`, fully `726/748`
      - QSFPTEK: details `165/201`, fully `140/201`
      - FS.COM: details `373/383`, fully `300/383`
      - Flexoptix: details `626/744`, fully `622/744`
      - GAO Tek: details `127/414`, fully `2/414`
  - health:
    - public TIP health after restart: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this is real progress toward trustworthy complete data, not cosmetic flag setting
    - remaining gaps are now smaller targeted vendor/parser/source tasks; NADDOD and QSFPTEK are next high-yield targets
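
The `OSFP` before `SFP` fix comes down to match order: `OSFP` and every `QSFP*` name contain `SFP` as a substring. A minimal sketch, assuming a simple substring scan; the priority list and function names below are illustrative and not the table used in `atgbics.ts`:

```typescript
// Most specific names first: "OSFP" and every "QSFP*" name contain "SFP",
// so plain "SFP" has to be tested last. The list is illustrative only.
const FORM_FACTORS_BY_PRIORITY = [
  "QSFP-DD", "QSFP112", "QSFP56", "QSFP28", "QSFP+", "QSFP",
  "OSFP",
  "SFP-DD", "SFP56", "SFP28", "SFP+", "SFP",
] as const;

type FormFactor = (typeof FORM_FACTORS_BY_PRIORITY)[number];

function detectFormFactor(text: string): FormFactor | null {
  const upper = text.toUpperCase();
  for (const candidate of FORM_FACTORS_BY_PRIORITY) {
    if (upper.indexOf(candidate) !== -1) return candidate;
  }
  return null;
}

// The failure mode described above: testing "SFP" first labels an OSFP
// cable as SFP.
function naiveDetect(text: string): string | null {
  return text.toUpperCase().indexOf("SFP") !== -1 ? "SFP" : null;
}
```

For a title like `800G OSFP DAC Twinax 1m`, `naiveDetect` yields `SFP` while `detectFormFactor` yields `OSFP`.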

- ATGBICS safe JSON rerun + Copper wavelength semantics on 2026-05-09:
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - detects `N/A` wavelength for Copper/DAC/Twinax/Base-T/RJ45 products
      - detects safe optical protocol-family wavelengths:
        - CWDM4 => `1271,1291,1311,1331`
        - SR/SR4/SR8/SRBD/VR/ESR/CSR => `850`
        - DR/FR/LR/ER/PSM family => `1310`
  - deployment:
    - synced patched ATGBICS scraper source to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik
  - runtime:
    - ran one light ATGBICS Shopify `products.json` pass with `nice -n 10`
    - no Playwright/browser crawler
    - processed `7946` products
    - price updates `61`
    - image observations/updates `7943`
  - observation:
    - ATGBICS verification counters did not move, because the remaining highspeed wavelength gaps are mostly rows whose source keys are cable/coherent/variant cases that the current lightweight parser does not solve
    - sample remaining rows include QSFP-DD ZR/C-band/coherent products and Copper/DAC rows
  - DB truth correction:
    - Copper/DAC products do not have an optical wavelength and should not be counted as missing optical wavelength
    - set empty Copper `wavelengths` to `N/A` for `1044` rows
    - highspeed missing-wavelength count changed:
      - before Copper correction: `1908`
      - after Copper correction: `1360`
      - highspeed Copper missing: `0`
      - remaining optical/non-Copper highspeed missing: `1220`
  - health:
    - public TIP health after run/update: `healthy`
    - load status `ok`
    - memory used `14%`
  - truth:
    - the ATGBICS JSON run was safe and confirmed current prices/images, but did not materially improve ATGBICS technical completeness yet
    - next ATGBICS work should be a targeted parser for product URL slug classes: `ZR`, `DCO`, `C-band`, `LAN-WDM`, `CR8`, `breakout`, and OSFP/QSFP-DD cable form-factor correction

- DB-only highspeed wavelength evidence backfill on 2026-05-09:
  - purpose:
    - improve product-level technical completeness and future 1:1 comparison quality without running a browser crawler on Erik
  - method:
    - only used existing DB evidence from part numbers, standard names, notes and product URLs
    - only filled wavelengths when evidence was deterministic:
      - explicit `850nm`, `1310nm`, `1311nm`, or `1550nm`
      - MMF plus SR/SR4/SR8/SRBD/VR/ESR/CSR family => `850`
      - SMF plus DR/FR/LR/ER/PSM family => `1310`
      - SMF plus CWDM4 => `1271,1291,1311,1331`
    - skipped ambiguous highspeed rows instead of inventing data
  - updated rows:
    - `129` rows set to `1310`
    - `40` rows set to `850`
    - `18` rows set to `1271,1291,1311,1331`
    - total updated: `187`
  - highspeed wavelength gap after update:
    - highspeed rows: `4438`
    - still missing wavelengths: `1908`
    - largest remaining gaps:
      - ATGBICS `663`
      - NADDOD `419`
      - Flexoptix `183`
      - Eoptolink `141`
      - FS.COM `114`
      - QSFPTEK `97`
  - health:
    - public TIP health after update: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this was an evidence backfill, not a claim of full source verification
    - remaining wavelength gaps need vendor-specific parsers/crawlers or stronger source text
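
The evidence rules in this entry could look roughly like the sketch below. It assumes the evidence is one concatenated string (part number, standard name, notes, URL) and that fiber type is already known; `backfillWavelength` and the tokenizing approach are hypothetical. The family mapping is copied from the log and applies to the highspeed families only; lower-speed variants such as 10GBASE-ER (1550 nm) would need their own handling.

```typescript
type Fiber = "MMF" | "SMF" | null;

// Returns a wavelength string only when the evidence is deterministic,
// otherwise null so the row is skipped instead of guessed.
function backfillWavelength(evidence: string, fiber: Fiber): string | null {
  const upper = evidence.toUpperCase();

  // 1. Explicit wavelength in the text wins.
  const explicit = /\b(850|1310|1311|1550)\s?NM\b/.exec(upper);
  if (explicit) return explicit[1] ?? null;

  // 2. Protocol family is only trusted together with a matching fiber type.
  const tokens = upper.split(/[^A-Z0-9]+/);
  const has = (re: RegExp): boolean => tokens.some((tok) => re.test(tok));

  if (fiber === "SMF" && has(/^CWDM4$/)) return "1271,1291,1311,1331";
  if (fiber === "MMF" && has(/^\d*(SR|VR|ESR|CSR)(\d+|BD)?$/)) return "850";
  if (fiber === "SMF" && has(/^\d*(DR|FR|LR|ER|PSM)\d*$/)) return "1310";

  // 3. Anything else (ZR, coherent, unknown fiber) stays empty.
  return null;
}
```

An `SR4` row with unknown fiber type returns `null` here, which corresponds to the "skipped ambiguous rows" rule.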

- Strict active equivalence sweep + reach-meter backfill on 2026-05-09:
  - follow-up after the FS.com `QDD-2FR4-800G` false-comparable correction
  - audited all active `approved/auto_approved` equivalence matches for hard 1:1 risks:
    - breakout/AOC/DAC/cable class mismatch
    - known reach mismatch
    - known fiber mismatch
    - primary wavelength mismatch
    - missing core evidence on active matches
  - found and rejected `16` active false positives:
    - Flexoptix 400G/100G pluggable optics that were matched to ATGBICS AOC/breakout products
    - Flexoptix `Q.851HG.03` 300m MMF incorrectly matched to 70m and 40km NADDOD rows
    - Flexoptix `Q.854HG.01.P` 100m MMF incorrectly matched to a 1m NADDOD row
  - global reach-meter backfill:
    - `269` rows with `km` reach labels received numeric `reach_meters`
    - `131` rows with `m` reach labels received numeric `reach_meters`
    - remaining reach labels without meters are only `N/A` accessory/control rows, not distance products
  - post-sweep active match risk counts:
    - active approved/auto-approved matches: `34051`
    - breakout-class mismatches: `0`
    - reach mismatches: `0`
    - fiber mismatches: `0`
    - wavelength mismatches: `0`
    - missing core evidence: `0`
  - live counters after sweep:
    - equivalence queue: `pending=0`, `approved=1987`, `auto_approved=32064`, `rejected=148382`, `due_research=0`
    - product verification: total `17647`, price `11557`, image `11963`, details `11085`, fully `9861`
  - truth:
    - active equivalence matches now have no known hard 1:1 mismatches by DB evidence
    - this still does not mean every product row is fully enriched; remaining work is product-level vendor enrichment and source capture
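
The reach-meter backfill can be illustrated with a small conversion helper. This is a sketch under assumptions: the `Row` shape and function names are hypothetical, and only plain labels such as `2km` or `500m` are converted, so `N/A` and range labels are left alone.

```typescript
interface Row {
  id: number;
  reachLabel: string;
  reachMeters: number | null; // null or 0 means "no numeric reach yet"
}

// Converts plain labels such as "2km" or "500m" to meters.
// Anything else ("N/A", "1-30m", free text) returns null.
function reachLabelToMeters(label: string): number | null {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(km|m)\s*$/i.exec(label);
  if (!m) return null;
  const value = Number(m[1]);
  if (!isFinite(value) || value <= 0) return null;
  return (m[2] ?? "").toLowerCase() === "km" ? Math.round(value * 1000) : value;
}

// Fills reachMeters only where it is missing and the label is convertible.
function backfillReachMeters(rows: Row[]): { rows: Row[]; updated: number } {
  let updated = 0;
  const next = rows.map((row) => {
    if (row.reachMeters && row.reachMeters > 0) return row; // keep existing evidence
    const meters = reachLabelToMeters(row.reachLabel);
    if (meters === null) return row;
    updated += 1;
    return { ...row, reachMeters: meters };
  });
  return { rows: next, updated };
}
```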

- FS.com `QDD-2FR4-800G` false comparable correction on 2026-05-09:
  - operator spotted that the dashboard showed invalid comparable products for FS.com `QDD-2FR4-800G`
  - wrong examples:
    - Flexoptix `DQ.2A858HG.z`: actually `800G QSFP-DD to 2x QSFP112 Breakout AOC`, MMF, 1-30m, not a 2km SMF FR4 transceiver
    - NADDOD `QDD-800LPO-2DR4`: 500m, not 2km
  - root cause:
    - FS.com `QDD-2FR4-800G` had `reach_label=2km` but `reach_meters=0`
    - API comparable-product SQL treated unknown reach as a wildcard, so non-1:1 products leaked into the dashboard comparison section
  - live DB correction:
    - `QDD-2FR4-800G`
      - `form_factor=QSFP-DD`
      - `speed=800G`
      - `speed_gbps=800`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=800G QSFP-DD 2FR4`
      - remains fully verified
  - API correction:
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now require hard reach evidence on both sides
      - reach ratio must be at least `0.85`
      - fiber type must match exactly
      - primary wavelength must exist on both sides and be within `15nm`
      - breakout/AOC/DAC/cable products can only compare to other breakout/AOC/DAC/cable products
      - `QSFP-DD` and `QSFP-DD800` are treated as same form-factor family for 800G-class comparisons
  - deployment:
    - copied API route to Erik
    - `pnpm -C packages/api build` passed on Erik
    - `pm2 restart tip-api` completed, `tip-api` online
  - health:
    - public TIP health after restart: `healthy`, load `ok`, memory `13%`
  - truth:
    - `DQ.2A858HG.z` must never be shown as 1:1 comparable for `QDD-2FR4-800G`
    - a 500m NADDOD LPO/2DR4 product must not be shown as 2km comparable
    - unknown reach must never act as wildcard in final product comparison
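
The comparable-product rules listed under "API correction" can be expressed as a single predicate. The real check is SQL in `transceivers.ts`; the TypeScript below is only a sketch with a hypothetical `Product` shape, and the form-factor and speed equality checks are assumptions taken from the hard criteria named elsewhere in this log.

```typescript
interface Product {
  formFactor: string;
  speedGbps: number;
  reachMeters: number | null; // null or 0 means "unknown"
  fiberType: string | null;
  wavelengths: string | null; // comma-separated nm values, first is primary
  isCableClass: boolean; // breakout/AOC/DAC/cable
}

// QSFP-DD and QSFP-DD800 count as one family for 800G-class products.
function formFactorFamily(p: Product): string {
  if (p.speedGbps === 800 && p.formFactor === "QSFP-DD800") return "QSFP-DD";
  return p.formFactor;
}

function primaryWavelength(p: Product): number | null {
  const first = ((p.wavelengths ?? "").split(",")[0] ?? "").trim();
  if (first === "") return null;
  const nm = Number(first);
  return isFinite(nm) ? nm : null; // "N/A" becomes NaN and is rejected
}

function isComparable(a: Product, b: Product): boolean {
  if (formFactorFamily(a) !== formFactorFamily(b)) return false;
  if (a.speedGbps !== b.speedGbps) return false;
  // Cable-class products only compare to other cable-class products.
  if (a.isCableClass !== b.isCableClass) return false;
  // Unknown reach is never a wildcard: both sides need hard evidence.
  if (!a.reachMeters || !b.reachMeters) return false;
  const ratio = Math.min(a.reachMeters, b.reachMeters) / Math.max(a.reachMeters, b.reachMeters);
  if (ratio < 0.85) return false;
  if (!a.fiberType || a.fiberType !== b.fiberType) return false;
  const wa = primaryWavelength(a);
  const wb = primaryWavelength(b);
  if (wa === null || wb === null) return false;
  return Math.abs(wa - wb) <= 15;
}
```

With these rules, a 500m product against a 2km product fails on the reach ratio (`0.25`), and a row with `reach_meters=0` fails before any other comparison is made.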

- FS.com 1.6T DR8/2FR4 source correction on 2026-05-09:
  - operator spotted that FS.com has two distinct 1.6T OSFP variants in the same family:
    - `OSFP-DR8-1.6T-FL`: 500m, DR8, SMF
    - `OSFP-2FR4-1.6T-FL`: 2km, 2FR4, SMF
  - confirmed in TIP DB:
    - both FS.com variants exist as separate rows
    - `OSFP-2FR4-1.6T-FL` had `reach_meters=0` even though the source and row label said `2km`
    - `OSFP-DR8-1.6T-FL` had no wavelength, causing the deterministic equivalence worker to reject the otherwise correct 500m Flexoptix match
  - live DB correction:
    - `OSFP-DR8-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=500m`
      - `reach_meters=500`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP DR8`
      - remains fully verified
    - `OSFP-2FR4-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP 2FR4`
      - fully verified true
    - Flexoptix `O.1316T.C.05.M`
      - confirmed as `500m`, `SMF`, `1.6T`
      - `standard_name=1.6T OSFP DR8`
  - equivalence correction:
    - approved only `O.1316T.C.05.M` ↔ `OSFP-DR8-1.6T-FL`
    - confidence `0.913`
    - match basis: form factor, speed, reach, fiber, wavelength and source variant DR8/500m
    - `OSFP-2FR4-1.6T-FL` remains separate and is not linked to the 500m DR8 Flexoptix product
  - scraper hardening:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - recognizes German/decimal `1,6T` and `1600G` as `1.6T`/`1600`
      - converts reach labels such as `2km` into `reach_meters=2000`
      - updates stale `speed` labels when the numeric source speed matches the row
  - build:
    - `pnpm -C packages/scraper build` passed on Erik
  - truth:
    - there are definitely two separate FS.com variants
    - 500m DR8 is the correct equivalent for Flexoptix `O.1316T.C.05.M`
    - 2km 2FR4 is a separate DB product and must not be collapsed into the 500m match
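
The speed normalization described under "scraper hardening" might look like the following. It is a minimal sketch: `parseSpeed` is a hypothetical name, and the regex is an assumption that only handles a number directly followed by `T` or `G` as a standalone token.

```typescript
interface ParsedSpeed {
  label: string; // normalized label, e.g. "1.6T" or "800G"
  gbps: number; // numeric speed in Gbit/s
}

// Accepts "1,6T" (German decimal comma), "1.6T" and "1600G" and normalizes
// all of them to { label: "1.6T", gbps: 1600 }.
function parseSpeed(text: string): ParsedSpeed | null {
  const m = /(\d+(?:[.,]\d+)?)\s?(T|G)\b/i.exec(text);
  if (!m) return null;
  const value = Number((m[1] ?? "").replace(",", "."));
  if (!isFinite(value) || value <= 0) return null;
  const unit = (m[2] ?? "").toUpperCase();
  const gbps = Math.round(unit === "T" ? value * 1000 : value);
  const label = gbps >= 1000 ? `${gbps / 1000}T` : `${gbps}G`;
  return { label, gbps };
}
```

Because the label is derived from the numeric value, a row whose stale `speed` label says `1600G` and whose source says `1,6T` ends up with the same normalized pair.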

- Targeted vendor verification push after equivalence revalidation on 2026-05-09:
  - code improved:
    - `NADDOD_DB_DETAIL_ONLY=1` mode verifies existing NADDOD rows with source URLs instead of rotating blindly through the full sitemap
    - NADDOD now extracts `og:image`, source product URLs, reach/fiber/wavelength from page evidence, AOC/DAC cable lengths, and DR/FR/SR/VR/XDR patterns
    - GAO Tek now writes product URLs and image evidence
    - Ascent Optics now writes product URLs and table image evidence
    - Eoptolink now writes product URLs, images, reach/wavelength evidence and corrects over-broad form-factor parsing by preferring title/slug evidence
  - live low-load Erik runs:
    - GAO Tek static crawl:
      - `473` unique products processed
      - GAO Tek detail coverage improved from `41` to `126`
      - `no_url` dropped to `0`
    - Ascent Optics static/API crawl:
      - `253` catalog products processed
      - image coverage `235/305`
      - detail coverage `213/305`
    - Eoptolink static crawl:
      - `76` product-solution pages inspected
      - after parser correction, Eoptolink is `287/287` image and detail verified
    - NADDOD targeted DB-detail mode:
      - first targeted wave `200` pages
      - second wave `300` pages
      - closure wave `385` pages
      - special-case wave `83` pages
      - NADDOD moved from `image=12`, `details=157`, `fully` at about `0` or `1` to:
        - total `748`
        - price `744`
        - image `742`
        - details `659`
        - competitor `744`
        - fully `659`
        - no URL `6`
  - global TIP counters after this push:
    - price verified `11557`
    - image verified `11963`
    - details verified `11018`
    - fully verified `9794`
    - total transceivers `17647`
  - health:
    - TIP stayed `healthy`
    - load status `ok`
    - memory used about `13%`
  - truth:
    - NADDOD is not 100% complete; remaining detail gaps include likely non-transceiver switch/NIC products and a smaller set of parser-special cases
    - OEM catalogs like Ascent and Eoptolink do not publish retail prices, so full verification cannot be forced honestly without price evidence

- Immediate full TIP equivalence revalidation on 2026-05-09:
  - operator requested all open TIP validation to be completed immediately and all product matches checked for true 1:1 equivalence
  - live preflight:
    - equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`, `due_research=0`
    - active matches scheduled for future 30-day recheck: `34066`
    - strict DB preflight over all active matches found:
      - matches without a recent price: `0`
      - hard technical mismatches: `0`
      - missing critical 1:1 evidence: `0`
    - hard criteria checked: form factor, speed, fiber type, reach ratio, primary wavelength and recent competitor price evidence
  - action:
    - marked all `34066` active `approved/auto_approved` equivalences as due immediately
    - queued `18` existing PgBoss `maintenance:re-research-equivalences` jobs
    - used the existing DB-only TIP re-research worker; no browser crawler wave and no external AI
  - result:
    - all `18/18` jobs completed
    - `due_research=0`
    - `active_researched_today=34066`
    - no automated-research rejections in this immediate pass
    - final equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`
    - transceiver verification counters after the pass:
      - `competitor_verified=11470`
      - `price_verified=11557`
      - `image_verified=10711`
      - `details_verified=9929`
      - `fully_verified=9135`
      - total transceivers `17647`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - API/DB connected
  - truth:
    - the manual equivalence queue is empty and all active matches have just been rechecked by deterministic 1:1 evidence rules
    - this does not mean every product row in TIP is complete; largest product verification gaps remain vendor-specific crawler/enrichment work, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent/Eoptolink and other vendor/catalog rows

- Crawlee integration/binding on 2026-05-09:
  - operator asked to install, use and bind Crawlee/Crawlee-Python after priority evaluation
  - pushed TIP commits:
    - `60531b6 feat: add crawlee python worker integration`
    - `49f0871 chore: ignore crawlee python build artifacts`
  - TypeScript TIP core remains the production crawler core using `crawlee` and Playwright
  - added scraper scripts:
    - `pnpm -C packages/scraper scrape:fs:db-detail`
    - `pnpm -C packages/scraper scrape:fs:url-discovery`
  - added optional isolated Python worker:
    - `packages/crawlee-python/`
    - `scripts/setup-crawlee-python-worker.sh`
    - `docs/TIP_CRAWLEE_RUNTIME.md`
  - Python worker policy:
    - Crawlee-Python is for Pi/Proxmox/residential side workers and extraction experiments
    - writes JSONL evidence only
    - no direct DB writes
    - no replacement for the TypeScript TIP scraper core
  - smoke test:
    - installed `crawlee==1.6.3` into `/tmp/tip-crawlee-python-venv`
    - ran `tip_crawlee_worker` against `https://crawlee.dev`
    - JSONL evidence output succeeded

- Priority Crawlee evaluation + FS.com URL discovery on 2026-05-09:
  - operator asked whether these repos help:
    - `https://github.com/apify/crawlee`
    - `https://github.com/apify/crawlee-python`
    - `https://github.com/hiteshchoudhary/crawlee-project`
  - evaluation:
    - `apify/crawlee` is directly relevant and already in use in TIP via TypeScript `PlaywrightCrawler`
    - current TIP benefit is not adding Crawlee, but using Crawlee more deliberately:
      - bounded RequestQueues
      - stable `uniqueKey`
      - explicit retry/no-text classes
      - isolated storage directories
      - AutoscaledPool telemetry as safety signal
      - hard concurrency caps on Erik
    - `apify/crawlee-python` is useful for future isolated Pi/Proxmox workers, especially for Python-native extraction experiments, but should not replace the current TypeScript scraper core today
    - `hiteshchoudhary/crawlee-project` is a small community/demo project, useful as inspiration only; not a production dependency for TIP
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_URL_DISCOVERY_ONLY=1`
      - maps existing `FS-<numeric-id>` rows without `product_page_url` to `https://www.fs.com/de/products/<id>.html`
      - carries `targetTransceiverId` through the crawler so verified source evidence updates the original row instead of creating duplicates
      - marks current FS.com product images verified for target rows
      - accepts deterministic H1/part/spec evidence for detail verification when FS.com does not expose a traditional spec table
  - live runs on Erik:
    - URL discovery pilot:
      - target `20`
      - scraped `19`
      - failed `0`
      - no-url rows dropped from `76` to `57`
    - full URL discovery:
      - target `56`
      - scraped `55`
      - failed `1` (`https://www.fs.com/de/products/229461.html`, transient `ERR_NETWORK_CHANGED`)
      - no-url rows dropped to `2`
    - DB reconciliation with improved detail evidence:
      - target `57`
      - scraped `55`
      - failed `0`
      - new prices `41`
      - stock observations `40`
      - specs verified `55`
    - `pnpm -C packages/scraper build` passed on Erik after the code change
  - FS.com final state after URL discovery:
    - total rows: `383`
    - price verified: `379`
    - image verified: `374`
    - details verified: `373`
    - price+image+details: `373`
    - fully verified: `205`
    - missing URL: `2`
    - missing image URL: `9`
    - missing reach label: `4`
    - missing fiber type: `9`
    - HTML product-like rows:
      - total `373`
      - image `372`
      - details `371`
      - complete `371`
    - no-url rows:
      - `Change`
      - `FS-229461`
    - category rows: `4`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - global verified counters:
      - price `11557`
      - image `10711`
      - details `9929`
      - fully `8526`
  - training pool:
    - pushed `4d9a11c crawl: add fscom url discovery learning record`
  - truth:
    - FS.com is still not 100% complete
    - honest current claim: `371/373` HTML product-like rows complete; remaining work is small and classifiable
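
The `FS_URL_DISCOVERY_ONLY=1` mapping can be sketched as below. The row and request shapes are hypothetical; `url`, `uniqueKey` and `userData` mirror the fields a Crawlee request accepts, but the `uniqueKey` format is an assumption for illustration.

```typescript
interface TargetRow {
  id: number;
  partNumber: string;
  productPageUrl: string | null;
}

interface DiscoveryRequest {
  url: string;
  uniqueKey: string; // stable key so reruns do not enqueue duplicates
  userData: { targetTransceiverId: number };
}

// Maps "FS-<numeric-id>" to the German product page URL; anything else is
// left for manual classification.
function fsProductUrl(partNumber: string): string | null {
  const m = /^FS-(\d+)$/.exec(partNumber.trim());
  return m ? `https://www.fs.com/de/products/${m[1]}.html` : null;
}

// Only rows without a product page URL become discovery targets. The
// original row id travels with the request so the result updates that row
// instead of creating a duplicate.
function buildDiscoveryRequests(rows: TargetRow[]): DiscoveryRequest[] {
  const requests: DiscoveryRequest[] = [];
  for (const row of rows) {
    if (row.productPageUrl) continue;
    const url = fsProductUrl(row.partNumber);
    if (!url) continue;
    requests.push({
      url,
      uniqueKey: `fs-url-discovery:${row.id}`,
      userData: { targetTransceiverId: row.id },
    });
  }
  return requests;
}
```

A row such as `Change` has no numeric id and therefore produces no request, which is why it remains in the no-url list.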

- TIP FS.com / Fiberstore targeted verification push on 2026-05-09:
  - operator requested FS.com/Fiberstore next, with all crawler/scraper/robot learnings written to the TIPLLM training pool and no external AI
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_DB_DETAIL_ONLY=1` mode to revalidate existing FS.COM product URLs directly from DB
      - avoids broad category/listing discovery while product URLs still need verification
      - `detectReach()` now handles comma thousands and decimal values
      - added deterministic `detectFiberType()` fallback from product name, part number and specs
      - scraper now writes `productUrl` into the transceiver row
      - detail verification source is now the actual FS.com product URL instead of the literal `fs.com`
  - live Erik verification:
    - deployed scraper to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik after the change
    - ran four safe DB-detail-only Playwright batches:
      - batch 1: target `80`, scraped `80`, failed `0`, new prices `17`, stock `18`, specs `24`
      - batch 2: target `80`, scraped `79`, failed `0`, new prices `6`, stock `8`, specs `23`
      - batch 3: target `90`, scraped `89`, failed `0`, new prices `21`, stock `24`, specs `47`
      - batch 4 closure: target `42`, scraped `42`, failed `0`, new prices `5`, stock `3`, specs `25`
    - all runs used Playwright concurrency `1`, `nice -n 10`, and no broad category crawl
    - Erik/TIP health after closure:
      - status: `healthy`
      - load status: `ok`
      - memory used: `13%`
      - transceivers: `17647`
      - vendors: `478`
      - switches: `680`
      - global verified counters:
        - price: `11557`
        - image: `10636`
        - details: `9816`
        - fully: `8522`
  - FS.com before targeted detail batches:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `108`
    - price+image+details: `108`
    - fully verified: `3`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `323`
    - HTML product-like complete rows: `106`
  - FS.com after closure:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `260`
    - price+image+details: `260`
    - fully verified: `205`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `123`
    - HTML product-like rows:
      - total `299`
      - price `299`
      - image `282`
      - details `258`
      - complete `258`
    - no-url rows:
      - total `76`
      - price `76`
      - image `15`
      - details `0`
    - category rows:
      - total `4`
      - no verified signals
  - interpretation / next strategy:
    - the DB-detail-only approach is now mostly exhausted
    - the fourth clean closure batch did not raise `details_verified`; it only nudged `fully_verified` from `199` to `205`
    - do not keep repeating the same FS.com detail crawler on Erik
    - next FS.com work should be:
      - source-discovery/classification robot for the `76` no-url rows
      - parser/source diagnostics for the remaining `41` HTML product-like rows missing detail/fiber/image signals
      - likely separate handling for malformed or historical `/de/de/products/...` URLs and pages that return no useful text
  - TIPLLM training pool:
    - all four FS.com batches were written and pushed to Gitea
    - latest training commits:
      - `28cac05` batch 1
      - `a0a6be3` batch 2
      - `38736ae` batch 3
      - `2c25bf3` closure batch
  - important truth:
    - do not claim FS.com is complete
    - the honest current claim is: FS.com product-like coverage improved strongly, but only `258/299` HTML product-like rows are complete and `76` no-url rows still need source discovery/classification

- TIP Flexoptix completion push on 2026-05-09:
  - operator gave the go-ahead ("feuer frei", German for "fire at will") after confirming Flexoptix was not yet complete
  - TIPLLM training pool was updated immediately with the truth rule:
    - not all Flexoptix products are complete
    - active catalog coverage must be separated from historical/extra DB rows
    - never claim 100% verification without exact counters and fresh source timestamps
  - code improved:
    - `packages/scraper/src/scrapers/flexoptix-catalog.ts`
      - generic reach parsing now handles values such as `50 m`, `1,000 m`, decimal/range forms
      - wavelength parsing now handles multiple `λ... nm` values
      - product URL is now passed into `findOrCreateScrapedTransceiver`
    - `packages/scraper/src/scrapers/flexoptix-detail-pages.ts`
      - new targeted Flexoptix detail-page verifier
      - fetches only Flexoptix `.html` product pages with missing price/image/detail fields
      - parses static product page metadata:
        - title
        - description
        - `og:image`
        - `product:price:amount`
        - reach
        - fiber type
        - wavelengths
        - connector
        - standard name
      - writes only DB evidence from Flexoptix pages, no external AI
  - live run results on Erik:
    - `pnpm -C packages/scraper build` passed
    - improved catalog run completed:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
    - details improved from:
      - `details_verified: 500`
      - `price+image+details: 496`
      - `fully_verified: 496`
    - after catalog parser improvement:
      - `details_verified: 606`
      - `price+image+details: 602`
      - `fully_verified: 602`
    - detail verifier run:
      - target: `191` real `.html` product pages
      - fetched: `191`
      - failed: `0`
      - new/updated price observations: `177`
      - images marked: `187`
      - details marked: `185`
    - after detail verifier and explicit BiDi correction:
      - total Flexoptix rows: `744`
      - HTML product-like rows: `626`
      - price verified: `626`
      - image verified: `622`
      - details verified: `626`
      - price+image+details verified: `622`
      - fully verified: `620`
      - filter/category rows with no verification: `108`
      - other non-product/generic rows with no verification: `10`
  - manual evidence correction:
    - four BiDi SFP products had `1,000 m` in the Flexoptix title
    - updated from source evidence:
      - `S.B1312.M.DIL`
      - `S.B1312.M.DL`
      - `S.B1512.M.DIL`
      - `S.B1512.M.DL`
    - set:
      - `reach_label=1000m`
      - `reach_meters=1000`
      - `fiber_type=MMF`
      - `details_verified=true`
  - remaining truth:
    - active/product-like Flexoptix rows are much closer to complete
    - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
    - remaining HTML product-like gaps after final source check:
      - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
      - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
  - operational note:
    - Erik SSH became unavailable with `connection refused` after the last verification checks
    - public TIP HTTPS still responded through Cloudflare
    - no further live commands were started after SSH refused
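
The static metadata step of the detail-page verifier could be approximated as follows. This is a sketch, not the code in `flexoptix-detail-pages.ts`: the regex-based `metaContent` helper is hypothetical and assumes `property` appears before `content` inside each `<meta>` tag. The placeholder filename is the one named in this entry.

```typescript
interface PageEvidence {
  title: string | null;
  imageUrl: string | null;
  imageVerified: boolean;
  price: number | null;
}

// Reads <meta property="..." content="..."> from static HTML.
function metaContent(html: string, property: string): string | null {
  const escaped = property.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(
    "<meta[^>]*property=[\"']" + escaped + "[\"'][^>]*content=[\"']([^\"']*)[\"']",
    "i",
  );
  const m = re.exec(html);
  return m ? (m[1] ?? null) : null;
}

function extractEvidence(html: string): PageEvidence {
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const imageUrl = metaContent(html, "og:image");
  const amount = metaContent(html, "product:price:amount");
  const parsed = amount !== null && amount.trim() !== "" ? Number(amount) : NaN;
  return {
    title: titleMatch ? (titleMatch[1] ?? "").trim() : null,
    imageUrl,
    // A placeholder og:image is not image evidence.
    imageVerified: imageUrl !== null && !/placeholder-flexoptix\.jpg$/.test(imageUrl),
    price: isFinite(parsed) ? parsed : null,
  };
}
```

A page that exposes only the placeholder image still yields price evidence but leaves `imageVerified` false, matching the `4` remaining product-like rows.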

- TIP Flexoptix price truth recheck on 2026-05-09:
  - operator question:
    - are all Flexoptix prices, images and information present?
    - are the Flexoptix prices 100% correct?
  - live truth:
    - total Flexoptix rows in TIP: `744`
    - current Flexoptix catalog scraper finds: `615` active catalog products
    - price verified rows: `619`
    - latest verified price observations: `615`
    - image verified rows: `615`
    - details verified rows: `500`
    - price + image + details verified: `496`
    - fully verified: `496`
    - missing image URL: `129`
    - missing reach label: `244`
    - missing fiber type: `131`
  - important interpretation:
    - current active Flexoptix catalog price set is freshly rechecked
    - the full historical/extra Flexoptix table is not complete
    - therefore do not claim all `744` Flexoptix rows are complete
  - code fix:
    - `packages/scraper/src/utils/db.ts`
      - unchanged price observations now refresh `price_observations.verified_at = NOW()`
      - unchanged product prices now refresh `transceivers.price_verified_at = NOW()`
      - this makes live rechecks auditable instead of leaving the old verification timestamp in place
  - live recheck:
    - deployed `db.ts` to Erik
    - `pnpm -C packages/scraper build` passed
    - ran light Flexoptix catalog scraper on Erik with `nice -n 10`
    - result:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
    - `0 prices` means no changed price rows were inserted because content hashes matched
    - after timestamp fix, DB shows `615` latest verified Flexoptix price observations with `verified_at` in the last 10 minutes
  - honest answer:
    - 615 active catalog prices are freshly source-confirmed by the Flexoptix scraper
    - no claim should be made that all 744 Flexoptix DB rows have complete price/image/detail coverage
    - no system should promise absolute 100% price truth forever: live vendor prices can change and may vary by account/currency/VAT/session, so TIP should display the last-source-verified timestamp
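
The timestamp behavior described under "code fix" can be modeled in memory as below. The actual change is in the DB layer (`db.ts`); the `PriceObservation` shape and `recordPrice` function here are hypothetical and only illustrate why a run can report `0 prices` while still refreshing `verified_at`.

```typescript
interface PriceObservation {
  contentHash: string;
  price: number;
  verifiedAt: Date;
}

interface RecordResult {
  observation: PriceObservation;
  inserted: boolean; // true only when a new price row was written
}

// Unchanged content keeps the existing observation but refreshes its
// verification timestamp; changed content inserts a new observation.
function recordPrice(
  latest: PriceObservation | undefined,
  price: number,
  contentHash: string,
  now: Date,
): RecordResult {
  if (latest && latest.contentHash === contentHash) {
    return { observation: { ...latest, verifiedAt: now }, inserted: false };
  }
  return { observation: { contentHash, price, verifiedAt: now }, inserted: true };
}
```

Counting only `inserted` results gives the scraper's "prices" number, while counting fresh `verifiedAt` values gives the number of source-confirmed observations.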

- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
  - operator problem:
    - Atlas / Findings / Protection Proof had become dishonest again
    - raw files on Erik still contained:
      - `3` host audits
      - `32` live Atlas scan devices
    - but open findings had collapsed back to `0`
    - Atlas UI therefore showed an implausibly clean state
  - verified root cause:
    - `packages/core/src/routes/health-builders.ts`
      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
    - `packages/core/src/scheduler.ts`
      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
      - newly rematerialized Atlas findings were therefore cleared again almost immediately
  - code fixed:
    - `packages/core/src/routes/health-builders.ts`
      - added `readAtlasSnapshot()`
      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
    - `packages/core/src/scheduler.ts`
      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
      - generic stale resolution now skips:
        - `atlas-coverage-gap`
        - `atlas-exposure`
        - `atlas-host-audit`
      - these sources are now left to their own verification-aware resolution logic
  - live deployment on Erik:
    - rebuilt `@magatama/core`
    - synced:
      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
      - `/opt/magatama/packages/core/dist/scheduler.js`
    - restarted PM2 service:
      - `magatama`
  - live verification:
    - before fix:
      - Atlas raw files present:
        - audits: `3`
        - devices: `32`
      - DB open findings: `0`
    - after authenticated `/api/protection-proof` rebuild:
      - DB open findings: `28`
      - public `/api/findings?limit=5` now shows real open Atlas findings again
      - public `/api/protection-proof` now reports:
        - `knownAssets: 57`
        - `hostsWithTelemetry: 22`
        - `assetsWithoutTelemetry: 35`
        - `auditedHosts: 3`
        - `queueBlocked: 28`
        - `switchbladeAssets: 5`
        - `switchbladeRacks: 1`
        - `switchbladeNmsNodes: 5`
  - operational truth now:
    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
    - the remaining open state is again honest:
      - most current open findings are `atlas-coverage-gap`
      - they reflect missing live telemetry on known inventory/discovery assets
  - operator note:
    - browser cache / old UI state may still temporarily show the earlier empty Atlas
    - hard refresh is required:
      - `Cmd + Shift + R`
  - important honest remainder:
    - this closes the biggest Atlas truthfulness regression
    - it does **not** yet solve every backend truth issue
    - still pending:
      - lane-specific RunPod artifact adoption / automatic version switch
      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational
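
The scheduler change above can be illustrated with a small Python sketch. The three source names come from the notes. The `Finding` shape, the function name `resolve_stale_findings`, and the 24-hour staleness window are assumptions made for illustration and are not taken from `scheduler.ts`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Source names are from the notes above; everything else is illustrative.
ATLAS_MANAGED_FINDING_SOURCES = frozenset(
    {"atlas-coverage-gap", "atlas-exposure", "atlas-host-audit"}
)


@dataclass
class Finding:
    id: str
    source: str
    last_seen: datetime
    status: str = "open"


def resolve_stale_findings(findings, now, max_age=timedelta(hours=24)):
    """Auto-resolve open findings not seen within max_age, except Atlas-managed ones."""
    resolved = []
    for finding in findings:
        if finding.status != "open":
            continue
        if finding.source in ATLAS_MANAGED_FINDING_SOURCES:
            # Left to Atlas' own verification-aware resolution logic.
            continue
        if now - finding.last_seen > max_age:
            finding.status = "resolved"
            resolved.append(finding.id)
    return resolved
```

The point of the skip is that rematerialized Atlas findings stay open even when their `last_seen` value looks stale to the generic guard.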

- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation
    - Erik must not be stressed by crawler-heavy work
    - TIPLLM-only policy for crawler/robot research remains in force
  - root cause found:
    - `approve-all` approved low-confidence equivalences and only marked them for later re-research
    - the re-research worker mostly checked whether a competitor still had a recent price
    - it did not re-evaluate hard technical equivalence evidence such as reach, wavelength, fiber type, speed and form factor
  - code changed:
    - `packages/api/src/routes/review.ts`
      - `approve-all` now approves only confidence >= `0.73`
      - weak pending rows stay pending and are queued for automated research instead of being marked approved
      - `needs_research` stats/listing now includes pending research rows
      - added `POST /api/review/run-research`
    - `packages/scraper/src/scheduler.ts`
      - added deterministic equivalence research evaluator
      - rejects stale, technically contradictory, incomplete, or low-confidence matches automatically
      - confirms only matches with recent price plus matching form factor, speed, fiber type, wavelength and reach
      - confirmed matches are scheduled for a 30-day recheck
  - live deployment:
    - synced changed files to Erik `/opt/tip`
    - `pnpm -C packages/api build` passed on Erik
    - `pnpm -C packages/scraper build` passed on Erik
    - restarted `tip-api` and `tip-scraper-daemon`
    - both processes are online
  - data cleanup performed on live DB without heavy crawling:
    - pending + due re-research candidates processed: `144103`
      - rejected fiber mismatch: `958`
      - rejected reach mismatch: `82128`
      - rejected missing reach evidence: `31151`
      - rejected wavelength mismatch: `29865`
      - rejected low confidence: `1`
    - old approved rows audited:
      - kept/confirmed: `1986`
      - rejected: `4000`
    - old auto-approved rows audited:
      - kept/confirmed: `32080`
      - rejected reach mismatch: `260`
  - final live equivalence status:
    - `pending`: `0`
    - `approved`: `1986`
    - `auto_approved`: `32080`
    - `rejected`: `148367`
    - due re-research now: `0`
    - scheduled 30-day rechecks: `34066`
  - final verification counters after reconcile:
    - `competitor_verified`: `11137`
    - `fully_verified`: `290`
    - `price_verified`: `11549`
    - `image_verified`: `10629`
    - `details_verified`: `9538`
  - operational note:
    - no new crawler wave was started for this cleanup
    - the run used existing crawled specs/prices and strict deterministic product-evidence checks
    - next improvement should be targeted crawler enrichment for products rejected due to missing reach/details, preferably on Proxmox/Pi workers rather than Erik
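
A deterministic evaluator of the kind described above might look like this sketch. The `0.73` threshold and the compared fields are from the notes. The `Spec` shape, the reason strings, the order of the checks, and the 30-day definition of a "recent" price are assumptions; the real evaluator lives in `packages/scraper/src/scheduler.ts` and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_CONFIDENCE = 0.73                 # threshold stated in the notes
MAX_PRICE_AGE = timedelta(days=30)    # assumption: "recent" is not defined in the notes
COMPARED_FIELDS = ("form_factor", "speed_gbps", "fiber_type", "wavelength_nm", "reach_m")


@dataclass
class Spec:
    form_factor: Optional[str]
    speed_gbps: Optional[float]
    fiber_type: Optional[str]
    wavelength_nm: Optional[int]
    reach_m: Optional[int]


def evaluate(ours: Spec, theirs: Spec, confidence: float,
             last_price_at: Optional[datetime], now: datetime) -> str:
    """Return 'confirmed' or 'rejected:<reason>' for one candidate equivalence."""
    if last_price_at is None or now - last_price_at > MAX_PRICE_AGE:
        return "rejected:stale_price"
    for name in COMPARED_FIELDS:
        a, b = getattr(ours, name), getattr(theirs, name)
        if a is None or b is None:
            # Missing evidence is never guessed; the match is rejected.
            return f"rejected:missing_{name}"
        if a != b:
            return f"rejected:{name}_mismatch"
    if confidence < MIN_CONFIDENCE:
        return "rejected:low_confidence"
    return "confirmed"
```

Because every branch depends only on stored specs and timestamps, the same inputs always produce the same verdict, which is what allows the cleanup to run without a new crawl.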

- TIP Flexoptix + FS.com price/image revalidation completed on 2026-05-09:
  - live root cause:
    - scraper runs had set `transceivers.price_verified`, but `price_observations.is_verified` stayed false
    - FS.com product image selector was stale and missed current `.big_img` / `.big_img_m` product images
  - code fixed:
    - `packages/scraper/src/utils/db.ts`
      - new/fresh unchanged price observations now get `is_verified = true` and `verified_at`
      - `price_verified_at` is refreshed when price verification is confirmed
      - image verification now refreshes `image_verified_at`, `image_verified_url`, and `image_scraped_at`
      - existing records revalidate images whenever current scraper output contains an image URL
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `TIP_FORCE_REVALIDATE`
      - added `FS_MAX_DETAIL_PAGES_PER_RUN`
      - added `FS_ONLY_MISSING_IMAGES`
      - updated FS.com image extraction to prefer current `resource.fs.com` product images from `.big_img_box`, `img.big_img`, `.big_img_m_active`, `.big_img_m`, `.small_img_active`
      - rejects default/logo/general/icon/SVG image URLs
  - live runs on Erik:
    - `pnpm -C packages/scraper build` passed on `/opt/tip`
    - Flexoptix catalog revalidation:
      - 615 products processed
      - 615 Flexoptix price observations marked verified
      - 605 Flexoptix images verified in the run window
    - FS.com full force revalidation:
      - 270 products discovered
      - 270 detail pages scraped
      - 0 failed detail requests
      - 17 new price observations in first full pass
      - 266 FS.com price observations marked verified after first pass
    - FS.com targeted missing-image revalidation:
      - 99 detail pages scraped
      - 0 failed detail requests
      - FS.com image-verified products increased from 207 to 299
      - FS.com verified price observations increased to 271 after targeted pass
  - final checked counters:
    - Flexoptix:
      - products: 744
      - product price_verified: 619
      - product image_verified: 615
      - price observation rows: 1288
      - verified price observation rows: 615
    - FS.com:
      - products: 383
      - product price_verified: 379
      - product image_verified: 299
      - price observation rows: 818
      - verified price observation rows: 271
  - operations:
    - `tip-scraper-daemon` restarted and is online
    - Erik remained stable; final load was about `2.16, 2.22, 2.47`
    - CT115 / `tip-scraper` SSH did not respond quickly from this session, so it was not used
  - TIPLLM training pool:
    - `/tmp/tip-training-data` was recloned from Gitea
    - crawler experience was written to:
      - `robot-experiences/2026-05-09.jsonl`
      - `qa-pairs/robot-control-high.jsonl`
    - pushed to Gitea commit:
      - `850083f crawl: add flexoptix fs revalidation learning record`
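
The image rule above (prefer `resource.fs.com`, reject default/logo/general/icon/SVG URLs) can be sketched as a plain URL filter. This is an illustration only: the function names are hypothetical, matching the rejected keywords as path substrings is an assumption, and the CSS-selector step of the real scraper is not modeled.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse

# Keywords are from the notes; matching them as path substrings is an assumption.
REJECTED_KEYWORDS = ("default", "logo", "general", "icon")
PREFERRED_HOST = "resource.fs.com"


def is_product_image(url: str) -> bool:
    """Accept only http(s) URLs that are not SVGs and contain no rejected keyword."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    if parsed.scheme not in ("http", "https"):
        return False
    if path.endswith(".svg"):
        return False
    return not any(word in path for word in REJECTED_KEYWORDS)


def pick_product_image(candidates: Iterable[str]) -> Optional[str]:
    """Return the first acceptable URL, preferring the resource.fs.com host."""
    usable = [u for u in candidates if is_product_image(u)]
    preferred = [u for u in usable if urlparse(u).hostname == PREFERRED_HOST]
    return (preferred or usable or [None])[0]
```

Returning `None` when nothing qualifies keeps a product unverified instead of attaching a logo or placeholder as its image.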

- MAGATAMA dashboard truthfulness / UX hardening on 2026-05-09:
  - live `api/llm/status` on MAGATAMA now publicly confirms the corrected `magatamallm` lane counts:
    - `15679` train / collected
    - `1743` eval
    - `17422` total
    - `15679` new since last training
  - the Training page inconsistency was traced to a stale browser/static-cache path plus mixed UI sources
  - dashboard static UI was updated and deployed live to Erik:
    - new cache version:
      - `2026-05-09a`
    - Training Control now force-merges the visible summary with the live `llmStatus.training` payload so the page and modal cannot silently disagree on pair counts
  - Switchblade network port UX was hardened:
    - hover detail remains
    - each port is now also clickable
    - click opens a real MAGATAMA-side detail modal with:
      - status
      - speed
      - description
      - peer device / peer port
      - connected host
      - VLAN
      - transceiver
      - in/out errors
      - octet counters
    - this was done because hover-only behavior still appeared broken or ambiguous to the operator
  - direct live deployment truth on Erik:
    - `/opt/magatama/packages/dashboard/public/index-v2.html` now contains:
      - `API_CACHE_VERSION = '2026-05-09a'`
      - `openSwitchbladePortModal`
      - `Ports · Hover = Nutzung / Status · Klick = Detail` (German UI label: "Ports · Hover = usage / status · Click = detail")
  - important honest remainder:
    - this fixes the visible UI inconsistency and the broken/stale port interaction path
    - it does **not yet** resolve the deeper backend truthfulness issue where Atlas/host-audit raw files can still show real issues while the live open-findings surface may be empty
    - that rematerialization / anti-auto-resolve backend block still needs a dedicated follow-up pass

- Full cross-agent sync refresh on 2026-05-07:
  - all current MAGATAMA/RunPod training automation findings from this chat were consolidated again into `sync/`
  - latest confirmed truth:
    - `sync/` commits successfully reached Gitea again
    - current pushed sync commits now include:
      - `2a35761 sync: record runpod managed endpoint root cause`
      - `72d61ad sync: record custom runpod worker build prep`
  - operator requirement was reaffirmed:
    - all meaningful chat discoveries, decisions, blockers, and deployment truths must continue to be written back into `sync/` so Claude, Codex, and the laptop stay aligned
  - current MAGATAMA training automation truth remains:
    - lane-specific pools are separated and prepared
    - URL-bundle dataset path is in place
    - local adoption/smoke/version-switch code path is in place
    - but fully automatic RunPod return/adoption still depends on switching from the managed Axolotl endpoint to a custom MAGATAMA worker endpoint
  - current infrastructure truth remains:
    - Erik can build Docker images
    - Erik has `docker buildx`
    - Erik currently has no docker registry login/config
    - therefore registry publication of the custom worker image is still the final missing operational prerequisite
  - next required operator inputs for full closure:
    - either:
      - `GHCR_USERNAME` + `GHCR_TOKEN`
    - or:
      - Docker Hub repo + credentials
    - or:
      - an already approved container image destination
  - once registry publication is possible, the exact remaining sequence is:
    - publish custom worker image
    - create/update RunPod endpoint to that image
    - set on Erik:
      - `RUNPOD_WORKER_KIND=custom-magatama`
      - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - restart MAGATAMA dashboard
    - run lane-specific canary training
    - verify:
      - artifact exists
      - local adoption succeeds
      - smoke tests pass
      - release alias increments
      - active lane alias switches automatically

- MAGATAMA RunPod custom worker preparation continued on 2026-05-07:
  - the pending sync handoff was committed and **successfully pushed to Gitea**:
    - commit:
      - `2a35761 sync: record runpod managed endpoint root cause`
  - MAGATAMA repo now includes an explicit helper for building/publishing the custom RunPod worker image:
    - `magatama/scripts/runpod_worker_publish.sh`
    - new package script:
      - `pnpm runpod:worker:publish`
    - helper behavior:
      - expects:
        - `RUNPOD_WORKER_IMAGE`
      - supports:
        - `GHCR_USERNAME`
        - `GHCR_TOKEN`
        - `RUNPOD_WORKER_TAG`
        - `RUNPOD_WORKER_PUSH_MODE=push|load`
      - prints the exact next environment variables required on Erik after image publication:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
  - `magatama/packages/fine-tuner/RUNPOD.md` was extended so the full automation target is now documented end-to-end:
    - lane pool sync
    - RunPod dataset URL bundle
    - custom worker training
    - adapter upload
    - local adoption
    - smoke tests
    - release alias minting
    - active alias switch
  - Erik infrastructure truth was rechecked:
    - `docker` exists:
      - `/usr/bin/docker`
    - `docker buildx` exists:
      - `github.com/docker/buildx v0.33.0`
    - **no docker registry login/config** is currently present on Erik:
      - `~/.docker/config.json` absent
    - interpretation:
      - Erik can build images
      - but cannot yet push a public/private worker image to GHCR/Docker Hub without credentials or a pre-authenticated registry path
  - the missing custom worker files were synced live to Erik:
    - `/opt/magatama/packages/fine-tuner/Dockerfile.runpod`
    - `/opt/magatama/packages/fine-tuner/RUNPOD.md`
  - a real remote worker image build was then attempted on Erik:
    - image tag requested:
      - `magatama-runpod-worker:test`
    - build truth:
      - base `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` pulled successfully
      - Python dependencies for the worker installed successfully
      - build reached:
        - `COPY train_cuda.py runpod_handler.py ./`
        - `exporting to image`
    - however:
      - final image was **not yet visible** in `docker images`
      - therefore the build still needs one more clean verification pass before being treated as green
  - current operational conclusion:
    - MAGATAMA training pools, lane separation, signed dataset URL path, and local adoption API are ready
    - the final blocking step remains infrastructure:
      - publish the custom worker image to a registry RunPod can consume
      - create/switch the endpoint
      - then set on Erik:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - once that is done, MAGATAMA's already-prepared code path can finally perform:
      - train
      - verify artifact
      - adopt locally
      - smoke-test
      - bump version
      - switch alias
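
The two environment variables named above lend themselves to a small preflight check before a canary run. The sketch below is hypothetical: `custom_worker_blockers` is not part of the MAGATAMA repo, and treating a value that starts with `<` as a leftover placeholder is an assumption.

```python
from typing import Mapping

REQUIRED_WORKER_KIND = "custom-magatama"  # value taken from the notes


def custom_worker_blockers(env: Mapping[str, str]) -> list[str]:
    """Return human-readable blockers; an empty list means the env looks ready."""
    blockers = []
    kind = env.get("RUNPOD_WORKER_KIND", "")
    if kind != REQUIRED_WORKER_KIND:
        blockers.append(
            f"RUNPOD_WORKER_KIND is {kind!r}, expected {REQUIRED_WORKER_KIND!r}"
        )
    endpoint = env.get("RUNPOD_ENDPOINT_ID", "").strip()
    # Assumption: a value like "<custom endpoint id>" is a copied placeholder.
    if not endpoint or endpoint.startswith("<"):
        blockers.append("RUNPOD_ENDPOINT_ID is unset or still a placeholder")
    return blockers
```

Calling it with `os.environ` on Erik would list what is still missing. It cannot prove that the endpoint actually runs the custom worker image.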

- MAGATAMA RunPod training return-path deep dive on 2026-05-07:
  - Attack Paths `Open Fix Guidance` placebo button was fixed live on Erik:
    - `magatama/packages/dashboard/public/index-v2.html`
    - real behavior now:
      - if graph node maps to a real finding, open the existing ticket/finding drawer
      - if node is only synthetic, show an explicit warning instead of doing nothing
    - deployed to:
      - `/opt/magatama/packages/dashboard/public/index-v2.html`
    - `pm2 restart magatama-dashboard` executed
  - local Mac train API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - returns `status = ok`
    - service is idle/reachable, not broken
  - RunPod heartbeat/UI stream issue was fixed live:
    - dashboard server now emits keepalive progress messages during:
      - long `IN_PROGRESS` phases
      - post-`COMPLETED` artifact verification loops
    - deployed live to Erik dashboard
  - direct raw RunPod status canary against the current endpoint (`dheii186pfcuq7`) was executed:
    - tiny 1-step `tip_llm` canary job:
      - `33434e85-3cc1-4dea-9043-83c315aaeb9c-e2`
    - observed raw status sequence:
      - `IN_QUEUE`
      - `IN_PROGRESS`
      - `COMPLETED`
    - **critical truth**:
      - `/status/{job}` returned no `output`
      - `/stream/{job}` returned:
        - `{"status":"COMPLETED","stream":[]}`
    - interpretation:
      - the currently configured endpoint is the managed Axolotl serverless endpoint
      - it does not return a programmatically adoptable artifact reference to MAGATAMA
      - this is why all lanes keep ending in:
        - `completed_without_model_artifact`
  - Erik secrets reality rechecked:
    - `/opt/magatama/secrets/hf-token` exists and is readable by the running process
    - therefore the current failure is **not** caused by a missing HF token on Erik
  - root cause now considered confirmed:
    - the **managed Axolotl serverless endpoint** is acceptable for queueing/running a fine-tune
    - but not sufficient for MAGATAMA's required full automation:
      - train
      - return explicit artifact
      - adopt locally
      - smoke-test
      - create new release alias
      - switch active alias
  - code path for the correct architecture is now prepared:
    - `magatama/packages/fine-tuner/runpod_handler.py`
    - `magatama/packages/fine-tuner/train_cuda.py`
    - `magatama/packages/fine-tuner/requirements-runpod.txt`
    - `magatama/packages/dashboard/src/server.ts`
  - what changed in that path:
    - custom RunPod worker now accepts:
      - `target_model`
      - `credentials.hf_token`
    - training script now:
      - trains lane-specific bundle
      - uploads the resulting adapter folder to Hugging Face
      - returns `adapter_repo_id`
    - dashboard custom-worker submit path now includes:
      - `run_id`
      - `target_model`
      - HF credential pass-through for the worker
    - dashboard error text is now explicit:
      - if the managed Axolotl endpoint completes without an adoptable artifact, MAGATAMA says so plainly and points at the need for the `custom-magatama` worker
  - live deployment status:
    - updated dashboard server was rebuilt and deployed to Erik
    - updated custom worker source files were synced into Erik repo state
    - however:
      - the currently active RunPod endpoint is still the managed Axolotl endpoint
      - the new full return-path logic will only become effective once the RunPod endpoint is switched to the custom MAGATAMA worker image
  - operational conclusion:
    - training pool refresh, lane separation, submit flow, and local adoption API are now in good shape
    - the final missing infrastructure step is:
      - build/publish `packages/fine-tuner/Dockerfile.runpod`
      - create/use a custom RunPod serverless endpoint for `runpod_handler.py`
      - set:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
    - only then can MAGATAMA honestly achieve:
      - automatic training
      - automatic artifact return
      - automatic adoption
      - automatic version bump
      - automatic alias switch after smoke tests
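
The critical observation above (a `COMPLETED` status with no `output`) can be turned into an explicit classification step. In this sketch `completed_without_model_artifact` and `adapter_repo_id` are names from the notes. The labels `running`, `worker_failure` and `artifact_returned`, and the function itself, are hypothetical.

```python
from typing import Any, Mapping


def classify_runpod_result(status_payload: Mapping[str, Any]) -> str:
    """Map a raw status payload to a truthful local state label."""
    status = status_payload.get("status")
    if status in ("IN_QUEUE", "IN_PROGRESS"):
        return "running"
    if status != "COMPLETED":
        return "worker_failure"
    output = status_payload.get("output")
    repo_id = output.get("adapter_repo_id") if isinstance(output, dict) else None
    if not repo_id:
        # COMPLETED alone proves nothing: there is no artifact reference to adopt.
        return "completed_without_model_artifact"
    return "artifact_returned"
```

Only `artifact_returned` should lead on to adoption, smoke tests and the alias switch. A payload shaped like the one from the managed Axolotl endpoint falls into `completed_without_model_artifact`.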

## Active Policy

- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Check sibling project sync folders first when context may span repos.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.

## Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

- Repo: `rene/llm-gateway`
- Path: `sync/`
- Commit shown by Claude: `e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)`
- Gitea path: `http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/`

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Latest Work

- RunPod/MAGATAMA training live follow-up on 2026-05-07:
  - latest `magatamallm` serverless run verified on Erik:
    - job id:
      - `ad003f90-3cf9-43f6-8960-bf6c1ea85097-e2`
    - registry truth in:
      - `/opt/magatama/training-data/model-registry/training-runs.json`
    - observed states:
      - `submitted`
      - then `completed_without_model_artifact`
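    - exact recorded warning (German; in English: "RunPod reported COMPLETED, but the expected HuggingFace model repo was not found."):
      - `RunPod meldete COMPLETED, aber das erwartete HuggingFace-Modellrepo wurde nicht gefunden.`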
  - interpretation:
    - dataset build and RunPod submit are working
    - the worker still does not return a verifiable adoptable model artifact
    - this is a real training return-path failure, not just a cosmetic UI issue
  - local training API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - service responds with:
      - `status = ok`
      - `service = magatama-train-api`
      - `running = false`
      - `pid = null`
    - meaning:
      - API is healthy/reachable
      - currently idle
      - ready for adoption/import calls once a valid RunPod artifact exists
  - one UI bug in the training modal was fixed live:
    - root cause:
      - during long `IN_PROGRESS` and post-`COMPLETED` artifact verification phases, MAGATAMA sent no heartbeat for too long
      - browser/proxy could then terminate the stream and surface only:
        - `network error`
      - even though Erik had already written the more truthful registry state
    - fix:
      - `magatama/packages/dashboard/src/server.ts`
      - added server-sent heartbeat messages while:
        - RunPod status remains unchanged
        - Hugging Face / artifact propagation checks are still running
      - concrete live strings now deployed in Erik dashboard server:
        - `⏳ RunPod arbeitet weiter (...)` ("RunPod is still working (...)")
        - `⏳ Prüfe Modellartefakt ...` ("Checking model artifact ...")
    - deployment:
      - rebuilt dashboard
      - rsynced `packages/dashboard/dist/server.js` to Erik
      - restarted the PM2 service `magatama-dashboard`
      - remote `server.js` verified to contain heartbeat strings
  - expected operator effect:
    - future training runs should no longer collapse into a late generic `network error` while RunPod/adoption checks are still active
    - the UI should stay alive long enough to show the real terminal result:
      - `completed_and_adopted`
      - or
      - `completed_without_model_artifact`
      - or
      - worker/adoption failure

- MAGATAMA live follow-up on 2026-05-07:
  - local Mac training API was rechecked after the lane-specific automation changes.
  - current live truth:
    - LaunchAgent `org.fichtmueller.magatama-train-api` is present and running
    - process listens on `*:3214`
    - localhost health now responds when checked outside sandbox restrictions:
      - `GET http://127.0.0.1:3214/health`
      - response:
        - `status = ok`
        - `service = magatama-train-api`
        - `running = false`
        - `pid = null`
        - `updated_at = 2026-05-07T04:14:23Z`
      - interpretation:
        - the training API itself is healthy and reachable
        - it is currently idle, not broken
        - the actual next proof point must come from a fresh lane run that writes lane-specific `*-last_run.json`
  - live Attack Paths UI bug was fixed and deployed to Erik:
    - root cause:
      - the `Open Fix Guidance` button inside the attack-path side panel only triggered a dummy toast and never opened a real finding/ticket detail
    - fix:
      - `magatama/packages/dashboard/public/index-v2.html`
      - new helper:
        - `openFixGuidanceForNode(nodeId)`
      - behavior:
        - if the clicked graph node maps to a real finding ID, MAGATAMA now opens the existing ticket/finding detail drawer via `openTicket(id)`
        - if the node is only a synthetic path node with no backing finding, MAGATAMA now shows an explicit warning instead of pretending to open guidance
    - live deployment:
      - updated `index-v2.html` was rsynced to:
        - `/opt/magatama/packages/dashboard/public/index-v2.html`
      - `pm2 restart magatama-dashboard` executed on Erik
      - deployed file on Erik verified with:
        - `openFixGuidanceForNode`
        - `Open Fix Guidance`
  - operator consequence:
    - Attack Paths no longer contain a placebo “Open Fix Guidance” action
    - clicking it should now open the actual MAGATAMA finding/ticket guidance path when the graph node represents a real finding

- MAGATAMA training automation was hardened locally on 2026-05-07 for all three lanes:
  - target lanes:
    - `magatamallm`
    - `fo_blogllm`
    - `tip_llm`
  - core root cause confirmed:
    - RunPod dataset refresh / lane export already worked
    - RunPod jobs often reached `COMPLETED`
    - but model adoption/version truth still depended on a single shared:
      - `~/magatama-llm/fine-tuning/last_run.json`
    - this made lane status and successful return/adoption ambiguous across models
    - the training modal could also collapse late stream/adoption failures into a generic `network error`
  - local code fixes now in place:
    - `magatama/packages/fine-tuner/training_api.py`
      - lane-specific last-run files added:
        - `~/magatama-llm/fine-tuning/magatamallm-last_run.json`
        - `~/magatama-llm/fine-tuning/fo_blogllm-last_run.json`
        - `~/magatama-llm/fine-tuning/tip_llm-last_run.json`
      - legacy `last_run.json` remains only as backward-compatible mirror for `magatamallm`
      - successful RunPod adoption now creates:
        - a release alias per lane, e.g. `<active-alias>-rN`
      - active alias switching sequence is now:
        - candidate model imported
        - smoke-tested
        - release alias created
        - stable active alias repointed to that release alias
      - adoption report now includes:
        - `version_counter`
        - `release_alias`
    - `magatama/packages/fine-tuner/train.py`
      - local metrics writing now also respects lane-specific last-run files via `TRAINING_LANE`
    - `magatama/packages/dashboard/src/server.ts`
      - `/api/llm/status` now reads lane-specific last-run metadata first
      - `release_alias` is preferred as visible model version when present
      - RunPod SSE catch now distinguishes:
        - real generic training failure
        - `COMPLETED` but no artifact / failed adoption
      - the latter is now rendered as a truthful return/adoption failure, not a vague dataset/network issue
    - `magatama/packages/dashboard/public/index-v2.html`
      - training modal now suppresses misleading late generic `network error` if the server already emitted a terminal training status
      - if the stream ends without a final terminal server event, the UI now explicitly says the registry/adoption state must be checked
      - if the backend reports:
        - completed without artifact
        - completed without HF model
        - completed but adoption failed
        the modal now shows that exact reason
  - local verification:
    - `python3 -m py_compile` passed for:
      - `training_api.py`
      - `train.py`
    - dashboard build passed:
      - `pnpm -C packages/dashboard build`
  - current operational blocker:
    - live deployment to Erik was **not yet completed in this step**
    - direct SSH checks returned:
      - `Connection refused`
      - then `Operation timed out`
    - because of that, the new lane-specific automation logic is locally ready, but not yet confirmed live on Erik for the currently running:
      - `tip_llm`
      - `fo_blogllm`
  - practical consequence:
    - the code path is now prepared for full automation:
      - pull from lane-specific training pool
      - train on RunPod
      - verify artifact existence
      - adopt locally
      - create new release alias/version
      - repoint stable active alias
      - show truthful status in UI
    - but the current live Erik run still needs redeploy + verification once SSH is reachable again
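
The lane-specific file layout and the `<active-alias>-rN` naming described above can be summarized in a short sketch. The lane names, file names and alias pattern are from the notes. The helper names, and the assumption that `N` is the previous `version_counter` plus one, are illustrative and not taken from `training_api.py`.

```python
from pathlib import Path

LANES = ("magatamallm", "fo_blogllm", "tip_llm")
BASE_DIR = Path("~/magatama-llm/fine-tuning")  # callers would apply expanduser()


def last_run_paths(lane: str) -> list[Path]:
    """Return the files a run for this lane should write, lane-specific file first."""
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    paths = [BASE_DIR / f"{lane}-last_run.json"]
    if lane == "magatamallm":
        # Legacy shared file is kept only as a backward-compatible mirror.
        paths.append(BASE_DIR / "last_run.json")
    return paths


def next_release_alias(active_alias: str, version_counter: int) -> tuple[str, int]:
    """Build the next release alias; assumes N is the previous counter plus one."""
    bumped = version_counter + 1
    return f"{active_alias}-r{bumped}", bumped
```

Per-lane files remove the ambiguity described in the root cause: a `tip_llm` run can no longer overwrite the status that `magatamallm` reads.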

- MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:
  - result:
    - the lane export / dataset refresh worked
    - a new locally adopted MagatamaLLM model did **not** land
    - active MAGATAMA provider remains the older alias:
      - `ollama:magatama-coder:latest`
  - live/public evidence:
    - `GET https://magatama.fichtmueller.org/api/llm/status`
      - `activeProvider = ollama:magatama-coder:latest`
      - `autoFixProvider = ollama:magatama-coder:latest`
      - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
      - `training.modelVersion = magatama-coder:latest`
      - `training.activeRun = null`
    - this means the UI timestamp currently reflects the latest dataset/training-state update, not proof of a newly adopted local model.
  - local Mac evidence:
    - `ollama list` still shows:
      - `magatama-coder:latest` → modified `3 weeks ago`
      - `magatama-llm-v2-0:latest` → modified `11 days ago`
    - no newer Magatama candidate/import alias appeared locally
  - registry/adoption evidence:
    - Erik lane manifest exists and is fresh:
      - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
      - `generatedAt = 2026-05-06T22:45:15.944Z`
      - `train = 15679`
      - `eval = 1743`
      - `total = 17422`
    - but Erik had no populated local adoption/registry state files in:
      - `/opt/magatama/training-data/model-registry/models.json`
      - `/opt/magatama/training-data/model-registry/runs.json`
      - `/opt/magatama/training-data/model-registry/active.json`
      - `/opt/magatama/data/llm-status.json`
    - local repo only had historical `training-data/model-registry/training-runs.json`
  - historical run evidence:
    - recent `magatamallm` training-run records still show:
      - `submitted`
      - then `not_found_after_submit`
      - or other non-adopted / worker-failure states
    - there is still no verified `completed_and_adopted` proof for a new MagatamaLLM local model.
  - operational conclusion:
    - current truth:
      - dataset/lane preparation works
      - local model adoption is still the missing step
      - MAGATAMA is **not** currently aware of any model beyond the already active `magatama-coder:latest` alias
    - next fix block remains:
      - make RunPod/local completion count only when adoption succeeds
      - persist adoption report + model registry state
      - update active alias and version only after smoke-tested import succeeds

- MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA still showed mostly flat port chips.
    - verified live on Erik:
      - the real Switchblade runtime is the PM2 app `switchblade` under `/opt/switchblade-app`, not the older `/opt/switchblade` tree.
      - `GET http://127.0.0.1:3000/api/discovery/snmp` for `192.168.178.2` already returned rich rows such as:
        - `GigabitEthernet3` → description `Aruba-1830-UNUSED`, neighbor `VN46KYC0G0`, peer port `11`
        - `GigabitEthernet5` → description `Tashi-204`, neighbor `fritz.box`, peer port `LAN:1`
        - `GigabitEthernet25` → description `to Cisco Business 220 Series`, neighbor `Switch39688E`, peer port `gi9`
    - the remaining loss point was MAGATAMA’s own Switchblade sync/persistence path.
  - MAGATAMA sync hardening:
    - `scripts/switchblade_live_sync.ts`
      - now prefers live SNMP discovery data when it is richer than `/api/devices/<ip>`
      - now maps `description`, `peerDevice`, `peerPort`, `connectedHost`, `inOctets`, `outOctets` into rack device ports
      - added optional debug snapshot dump support via `SWITCHBLADE_DEBUG_SNAPSHOT_FILE`
      - sanitizes unreadable peer-port strings and drops synthetic high-index numeric pseudo-ports
    - verified with a forced live run on Erik:
      - `Top of Rack Switch` now exports `28` real SG350 ports into the rack snapshot instead of the earlier flattened/odd set
      - sample verified payloads before POST:
        - port 3 → `Aruba-1830-UNUSED` / `VN46KYC0G0` / `11`
        - port 5 → `Tashi-204` / `fritz.box` / `LAN:1`
        - port 25 → `to Cisco Business 220 Series` / `Switch39688E` / `gi9`
  - MAGATAMA core hardening:
    - `packages/core/src/routes/health-types.ts`
      - `SwitchbladePortSnapshot` now preserves:
        - `description`
        - `vlan`
        - `macCount`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `transceiver`
        - `inOctets`
        - `outOctets`
    - `packages/core/src/routes/health-support.ts`
      - `normalizeSwitchbladePort()` now keeps those additional port fields instead of silently truncating them
    - rebuilt locally and re-rsynced the new `packages/core/dist` to Erik
  - dashboard/UI hardening:
    - `packages/dashboard/public/index-v2.html`
      - port chips already had custom tooltip support; now they also carry native `title=` fallback text
      - this reduces the old “question mark / unclear hover” problem in browsers that do not immediately show the custom bubble
  - live public verification after deploy:
    - `GET https://magatama.fichtmueller.org/api/switchblade/snapshot`
      - now contains enriched SG350 rack-port records with:
        - `description`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `inOctets`
        - `outOctets`
      - public snapshot timestamp verified:
        - `receivedAt = 2026-05-06T22:51:59.247Z`
    - `Top of Rack Switch` in the public snapshot now exposes meaningful peer/use-case data instead of only flat status counters
  - operator impact:
    - MAGATAMA can now answer the actual operational question per port:
      - what is on this port
      - what is it talking to
      - what does the link look like
    - this is now grounded in Switchblade live SNMP/LLDP data, not guesswork.
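
The port handling described above (keep the extra fields, sanitize unreadable peer-port strings, drop synthetic high-index pseudo-ports) could look roughly like the sketch below. Only the field names come from these notes. The field types, the helper names, the printable-ASCII rule, and the `MAX_REAL_PORT_INDEX` threshold are assumptions for illustration, not the actual `normalizeSwitchbladePort()` code.

```typescript
// Field names come from the notes above; all types and helper names are assumptions.
interface PortSketch {
  name: string;
  description: string | undefined;
  vlan: number | undefined;
  macCount: number | undefined;
  peerDevice: string | undefined;
  peerPort: string | undefined;
  connectedHost: string | undefined;
  transceiver: string | undefined;
  inOctets: number | undefined;
  outOctets: number | undefined;
}

// Assumption: purely numeric port names above this index are synthetic pseudo-ports.
const MAX_REAL_PORT_INDEX = 1000;

const asString = (v: unknown): string | undefined =>
  typeof v === "string" && v.trim() !== "" ? v.trim() : undefined;

const asNumber = (v: unknown): number | undefined =>
  typeof v === "number" && Number.isFinite(v) ? v : undefined;

// Assumption: a peer-port string counts as readable when it is printable ASCII.
function sanitizePeerPort(raw: unknown): string | undefined {
  const value = asString(raw);
  return value !== undefined && /^[\x20-\x7e]+$/.test(value) ? value : undefined;
}

// Returns undefined for rows that should be dropped, otherwise a port with all fields preserved.
function normalizePortSketch(raw: Record<string, unknown>): PortSketch | undefined {
  const name = asString(raw["name"]);
  if (name === undefined) return undefined;
  if (/^\d+$/.test(name) && Number(name) > MAX_REAL_PORT_INDEX) return undefined;
  return {
    name,
    description: asString(raw["description"]),
    vlan: asNumber(raw["vlan"]),
    macCount: asNumber(raw["macCount"]),
    peerDevice: asString(raw["peerDevice"]),
    peerPort: sanitizePeerPort(raw["peerPort"]),
    connectedHost: asString(raw["connectedHost"]),
    transceiver: asString(raw["transceiver"]),
    inOctets: asNumber(raw["inOctets"]),
    outOctets: asNumber(raw["outOctets"]),
  };
}
```

Returning `undefined` for dropped rows keeps the filter decision in one place, so the caller can simply discard empty results before building the rack snapshot.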

- TIP/Blog lane separation was materially corrected on 2026-05-06:
  - root cause:
    - `TIP_LLM` was still ingesting blog-/writer-shaped rows from the canonical lane pool and shared transceiver corpora.
    - local inspection showed the old TIP export had `6250` train rows, of which `6087` still matched blog/writer patterns.
  - dataset builder and Gitea sync were hardened:
    - `scripts/runpod_dataset_builder.ts`
      - added strict `tipDatasetAllowed(...)`
      - `TIP_LLM` now rejects blog-shaped source rows at dataset-build time
      - `TIP_LLM` now rejects blog-like `system`, `user`, and markdown-article `assistant` patterns
      - registry fallback for `TIP_LLM` now only uses lane-compatible datasets
    - `scripts/sync_gitea_training_pool.ts`
      - canonical TIP pool refresh now uses the stricter lane-alignment rules
      - redundant `merged.jsonl` copies for `fo_blogllm` and `tip_llm` are no longer rewritten, to avoid local disk exhaustion from duplicate lane artifacts
  - local disk issue encountered and fixed:
    - full refresh failed with `ENOSPC` while writing `training-data/gitea-learning-pool/tip_llm/merged.jsonl`
    - redundant lane `merged` artifacts for `fo_blogllm` and `tip_llm` were truncated and the sync script was changed to stop recreating them
    - free disk space recovered from `377Mi` to `17Gi`
  - locally verified after rebuild:
    - `TIP_LLM` RunPod export:
      - `train = 233`
      - `eval = 26`
      - `total = 259`
      - `blog/writer matches = 0`
    - first TIP rows now use the correct TIP system prompt:
      - `You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...`
  - corrected artifacts and scripts were synced to Erik and `pnpm training:refresh-all` was rerun there.
  - live verified on Erik/public API:
    - `magatamallm`
      - `datasetSource = url`
      - `collectedExamples = 15679`
      - `evalExamples = 1743`
      - `totalExamples = 17422`
      - `newSinceLastTraining = 15679`
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectedExamples = 17322`
      - `evalExamples = 1926`
      - `totalExamples = 19254`
      - `neverTrained = true`
    - `tip_llm`
      - `datasetSource = url`
      - `collectedExamples = 231`
      - `evalExamples = 26`
      - `totalExamples = 257`
      - `neverTrained = true`
  - operational conclusion:
    - lane-specific dataset truth is now real on Erik.
    - `TIP_LLM` is no longer silently borrowing the FO_Blog behavior lane.
    - the next remaining hard problem is now RunPod artifact adoption/validation, not lane contamination.
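
A lane filter in the spirit of `tipDatasetAllowed(...)` might be sketched as below. The real rules in `scripts/runpod_dataset_builder.ts` are not reproduced in these notes, so the row shape, the `BLOG_HINTS` patterns, and the markdown-article heuristic are all hypothetical stand-ins.

```typescript
// Hypothetical row shape for a chat-style training example.
interface ChatRow {
  system: string;
  user: string;
  assistant: string;
  source?: string;
}

// Hypothetical blog/writer hints; the actual pattern list is not shown in the notes.
const BLOG_HINTS: RegExp[] = [/\bblog\b/i, /\bwriter\b/i, /\blinkedin\b/i];

// Assumption: an "article" opens with a top-level heading and has at least three headings overall.
function looksLikeMarkdownArticle(text: string): boolean {
  const headings = text.match(/^#{1,3} /gm) ?? [];
  return /^# /.test(text.trimStart()) && headings.length >= 3;
}

// Rejects rows whose source, system, or user text is blog-shaped,
// and rows whose assistant answer reads like a markdown article.
function tipRowAllowed(row: ChatRow): boolean {
  const blogShaped = BLOG_HINTS.some(
    (re) => re.test(row.system) || re.test(row.user) || re.test(row.source ?? "")
  );
  return !blogShaped && !looksLikeMarkdownArticle(row.assistant);
}
```

Applying a predicate like this at dataset-build time, rather than at training time, is what lets the export itself report `blog/writer matches = 0`.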

- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully:
    - `magatama-dashboard`
    - `magatama`
  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
    - `collectedExamples = 15620`
    - `effectiveExamples = 15620`
    - `evalExamples = 1736`
    - `totalExamples = 17356`
    - `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - worker `COMPLETED` is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
  - public findings state is currently empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - stale cached changelog data older than 6h now forces a rebuild from git history
    - verified live via dashboard proxy on Erik:
      - `generatedAt = 2026-05-06T15:18:42.708Z`
      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
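
The worker-log check described under RunPod completion handling above can be sketched roughly as follows. `Traceback` and `SyntaxError` are the markers named in the notes; the exit-code pattern assumes a particular log wording and is only illustrative.

```typescript
type CompletedOutcome = "completed" | "completed_with_worker_failure";

// Two markers come from the notes; the exit-code regex is an assumed log format.
const FAILURE_MARKERS: RegExp[] = [
  /Traceback \(most recent call last\)/,
  /\bSyntaxError\b/,
  /exit(?:ed with)? code[: ]+[1-9]\d*/i,
];

// A worker that reports COMPLETED is only treated as successful
// when none of the failure markers appear in its log text.
function classifyCompletedRun(workerLog: string): CompletedOutcome {
  return FAILURE_MARKERS.some((re) => re.test(workerLog))
    ? "completed_with_worker_failure"
    : "completed";
}
```

A plain pattern scan is deliberately conservative: it can flag a run whose log merely quotes an error, which is the safer failure mode when the alternative is recording a broken run as a success.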

- MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:
  - root cause:
    - the training modal always fetched `/api/llm/status` without a lane, so `FO_BlogLLM` and `TIP_LLM` still showed the `magatamallm` pool.
  - dashboard/server were updated so `/api/llm/status?lane=...` is now truly lane-aware.
  - the training modal now refreshes per selected lane and rewrites:
    - title
    - runtime label
    - pool path
    - counts
    - dataset source
  - MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via `ecosystem.config.cjs`:
    - `RUNPOD_DATASET_SOURCE=url`
    - `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
    - `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
    - `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
  - live verified on Erik after restart:
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
      - `train = 28`
      - `eval = 4`
      - `total = 32`
    - `tip_llm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
      - `train = 36`
      - `eval = 4`
      - `total = 40`
    - `magatamallm`
      - remains on lane-export counts (`15620 / 1736 / 17356`)
  - operator impact:
    - no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
    - every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing `magatamallm`.
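
The per-lane variables listed above suggest a lookup along these lines. The precedence (lane-specific variable first, then the global one, then a default) and the `"huggingface"` value are assumptions; only the variable names and the `url` value appear in the notes.

```typescript
type DatasetSource = "url" | "huggingface";
type EnvMap = Record<string, string | undefined>;

// Assumption: RUNPOD_DATASET_SOURCE_<LANE> overrides RUNPOD_DATASET_SOURCE,
// and anything unrecognized falls back to the given default.
function resolveDatasetSource(
  lane: string,
  env: EnvMap,
  fallback: DatasetSource = "huggingface"
): DatasetSource {
  const laneKey = `RUNPOD_DATASET_SOURCE_${lane.toUpperCase()}`;
  const raw = env[laneKey] ?? env["RUNPOD_DATASET_SOURCE"];
  return raw === "url" || raw === "huggingface" ? raw : fallback;
}
```

Passing the environment in as a plain map, instead of reading it globally, keeps the function easy to test against the exact values set in `ecosystem.config.cjs`.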

- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
  - verified on Erik:
    - `pnpm training:refresh-all` now succeeds.
    - fresh dataset totals after dedupe:
      - `magatamallm`: `92,742` raw → `17,356` effective (`15,620 train / 1,736 eval`)
      - `fo_blogllm`: `32` total (`28 train / 4 eval`)
      - `tip_llm`: `40` total (`36 train / 4 eval`)
  - important nuance:
    - Codex did **not** execute the final Hugging Face publish step from Erik in this chat.
    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
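
The three train/eval totals above are consistent with a 90/10 split where the train side is rounded down. That ratio is inferred from the reported numbers only; it is not confirmed from the dataset builder, so treat the sketch below as a plausibility check rather than the actual split logic.

```typescript
// Assumption: train = floor(total * 9 / 10), eval = the remainder.
function splitCounts(total: number): { train: number; evalCount: number } {
  const train = Math.floor((total * 9) / 10);
  return { train, evalCount: total - train };
}

// Totals taken from the notes above.
const magatama = splitCounts(17356);
const foBlog = splitCounts(32);
const tip = splitCounts(40);
```

Under that assumption the helper reproduces `15,620 / 1,736`, `28 / 4`, and `36 / 4`, which makes it a quick way to spot a manifest whose train and eval counts do not add up.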
- MAGATAMA Attack Paths UX is no longer a misleading blank panel:
  - the page now distinguishes between:
    - no live attack paths
    - historical fallback paths
    - empty selected scope (`0 assets in scope`)
  - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
  - live dashboard HTML on Erik now contains:
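    - `Im aktuellen Scope liegen 0 Assets.` ("The current scope contains 0 assets.")
    - `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.` ("Widen the location or datacenter / rack so that MAGATAMA can display correlatable assets and paths.")
    - `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.` ("Without open multi-stage correlations, the graph view intentionally stays empty.")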
- MAGATAMA code/training hardening was extended:
  - `scripts/test_runpod_adapter.py` no longer loads tokenizer/model with `trust_remote_code=True`.
  - `scripts/ollama_adapter_bridge.py` no longer loads tokenizer/model with `trust_remote_code=True`.
  - this removed the live CODE finding around `HuggingFace trust_remote_code` on Erik.
- Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
  - generic `atlas-exposure` findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
  - internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
  - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
  - after rebuild + deploy + health sync:
    - live Postgres open findings returned to `0`.
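
The RFC1918 rule above could be expressed as in the following sketch. The three private ranges are the standard ones (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`). The `ExposureObservation` shape and the definition of "meaningful" exposure are assumptions, not the actual Atlas logic.

```typescript
// Parses a dotted-quad IPv4 address into four octets, or undefined if malformed.
function parseIpv4(ip: string): number[] | undefined {
  const parts = ip.split(".");
  if (parts.length !== 4) return undefined;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : undefined;
}

// True for 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
function isRfc1918(ip: string): boolean {
  const octets = parseIpv4(ip);
  if (octets === undefined) return false;
  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Hypothetical observation shape produced by a broad scan.
interface ExposureObservation {
  ip: string;
  port: number;
  publiclyReachable: boolean;
}

// Assumption: exposure is worth an open finding when it is publicly reachable
// or sits on a non-private address; LAN-only services are left to host audits.
function shouldOpenExposureFinding(obs: ExposureObservation): boolean {
  return obs.publiclyReachable || !isRfc1918(obs.ip);
}
```

With this rule an SSH port on `192.168.178.2` that is not publicly reachable stays out of the open findings list, while the same port on a public address would still be tracked.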
- Follow-up hardening on the same block:
  - the earlier RunPod error path in MAGATAMA dashboard was made more truthful.
  - dataset preparation now distinguishes:
    - local `training:refresh-all` failure
    - optional Hugging Face publish failure
    - URL-based dataset mode with no external publish required
  - the training SSE flow now explicitly tells the operator whether RunPod is using:
    - Hugging Face dataset source
    - or MAGATAMA URL-bundle dataset source
  - this avoids misleading `RunPod not reachable` wording when the actual failure is in dataset preparation.
  - follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
    - MAGATAMA submit logic now verifies that a RunPod job really exists under `/status/{jobId}` instead of trusting `/run`.
    - payloads were aligned more closely with the official Axolotl serverless schema:
      - `model_type=AutoModelForCausalLM`
      - `tokenizer_type=AutoTokenizer`
      - dataset `split: train`
      - optimizer `adamw_torch_fused`
    - verified full run attempt:
      - job id `9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2`
      - disappeared as `not_found_after_submit` (`404 job not found`)
    - verified canary after payload fix:
      - job id `a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2`
      - immediately materialized as `IN_QUEUE`
      - then still disappeared on later reconcile as `not_found_after_submit`
    - current conclusion:
      - the old MAGATAMA bug is fixed.
      - the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
    - operational rule:
      - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
      - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
  - follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
    - MAGATAMA still showed `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
    - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
    - synced current lane export to Erik and restarted `magatama-dashboard`.
    - verified public API now returns:
      - `collectedExamples = 1367`
      - `effectiveExamples = 1367`
      - `evalExamples = 152`
      - `totalExamples = 1519`
      - `newSinceLastTraining = 1367`
    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
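
The operational rule for serverless runs stated above (do not trust `submitted` or a brief `IN_QUEUE`) could be encoded as in this sketch. `NOT_FOUND` is a stand-in used here for a `404 job not found` reply from `/status/{jobId}`, and the `RunTrust` labels other than `not_found_after_submit` are hypothetical.

```typescript
// NOT_FOUND is a local stand-in for a 404 from the status endpoint.
type ProbeStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "NOT_FOUND";

interface StatusProbe {
  status: ProbeStatus;
  hasArtifact: boolean; // true only when the status payload carries artifact evidence
}

type RunTrust = "unproven" | "trusted" | "failed" | "not_found_after_submit";

function assessRun(probe: StatusProbe): RunTrust {
  if (probe.status === "NOT_FOUND") return "not_found_after_submit";
  if (probe.status === "FAILED") return "failed";
  if (probe.status === "IN_PROGRESS") return "trusted";
  if (probe.status === "COMPLETED" && probe.hasArtifact) return "trusted";
  // IN_QUEUE, or COMPLETED without artifact evidence, proves nothing yet.
  return "unproven";
}
```

Keeping `unproven` as the default branch means any state the sketch does not explicitly recognize is never reported as a usable training run.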

- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
  - open findings were reduced all the way to `0` in Postgres.
  - false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
  - code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
  - `open findings: 0`
  - `queueExecuting: 0`
  - `queueBlocked: 0`
  - `queueFailed: 0`
  - public `/api/health` returns `status: ok`
  - public `/api/active-resolvers` returns:
    - `MAGATAMA Core: working`
    - `MagatamaLLM: working`
    - `Claude (secondary): working`
    - `Codex (secondary/manual): idle`
    - `Copilot (secondary/manual): idle`
- Important resolver truth fix on 2026-05-06:
  - live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
  - dashboard logic was updated so disabled Codex/Copilot now show truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
  - the local codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
- Remaining real operational gap after findings hit zero:
  - MAGATAMA still knows more assets than it actively telemeters.
  - last public protection proof showed:
    - `knownAssets: 79`
    - `hostsWithTelemetry: 27`
    - `assetsWithoutTelemetry: 52`
  - these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
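
The coverage gap above is simple arithmetic over the proof counters. The sketch below assumes every host with telemetry is also counted among the known assets, which the reported numbers are consistent with; the interface and function names are hypothetical.

```typescript
interface ProofCounters {
  knownAssets: number;
  hostsWithTelemetry: number;
}

// Assumption: hosts with telemetry are a subset of known assets.
function assetsWithoutTelemetry(p: ProofCounters): number {
  return Math.max(0, p.knownAssets - p.hostsWithTelemetry);
}

// Share of known assets that actively report telemetry, as a whole percentage.
function telemetryCoveragePercent(p: ProofCounters): number {
  return p.knownAssets === 0 ? 0 : Math.round((p.hostsWithTelemetry / p.knownAssets) * 100);
}

// Counters from the proof snapshot quoted above.
const proof: ProofCounters = { knownAssets: 79, hostsWithTelemetry: 27 };
```

For the quoted snapshot this gives `52` assets without telemetry, matching the reported `assetsWithoutTelemetry`, and a coverage of roughly a third of the known inventory.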

- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
  - Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
  - MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
  - Live verified/deduped MAGATAMA training state after the fix:
    - `collectedExamples: 49`
    - `rawExamples: 58`
    - `duplicateExamples: 9`
    - `effectiveExamples: 49`
    - `newSinceLastTraining: 49`
  - MAGATAMA now filters training metrics to verified/trainable examples only.
  - Failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
  - Gitea-backed training pool remains the default target for training writes.
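
The relationship between `rawExamples`, `duplicateExamples`, and `effectiveExamples` above (58 raw, 9 duplicates, 49 effective) can be illustrated with a small counter. The example shape, the verified-first ordering, and the dedupe key are assumptions; the notes do not say how duplicates are detected.

```typescript
// Hypothetical example shape.
interface TrainingExample {
  prompt: string;
  completion: string;
  verified: boolean;
}

interface CorpusSummary {
  rawExamples: number;
  duplicateExamples: number;
  effectiveExamples: number;
}

// Assumption: only verified rows are counted, and two rows are duplicates
// when prompt and completion match exactly.
function summarizeCorpus(rows: TrainingExample[]): CorpusSummary {
  const verified = rows.filter((r) => r.verified);
  const seen = new Set<string>();
  for (const r of verified) seen.add(JSON.stringify([r.prompt, r.completion]));
  return {
    rawExamples: verified.length,
    duplicateExamples: verified.length - seen.size,
    effectiveExamples: seen.size,
  };
}
```

By construction `rawExamples - duplicateExamples = effectiveExamples`, which is the same identity the live numbers satisfy (`58 - 9 = 49`).
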
- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
  - the earlier `49` medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
    - exposure-backed assets
    - explicit non-auto owner
    - configured telemetry expectation
    - critical/high criticality
    - infrastructure metadata or managed infra device types
  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
  - local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
  - live Postgres state after deploy: `open findings = 0`.
  - training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
    - failed/escalated/report-only runs now belong in `errors.jsonl`
  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
    - atlas coverage scope hardening
    - training path integrity fix
  - corpus cleanup + dedupe was executed afterward:
    - pre-dedupe backup kept locally as:
      - `magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl`
    - resulting verified corpus:
      - `fixes.jsonl = 1,368` unique verified training rows
    - resulting failure corpus:
      - `errors.jsonl = 4` tracked failed/escalated rows
    - integrity report now exists at:
      - `magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json`
    - latest integrity totals:
      - `scanned: 1368`
      - `verified: 1368`
      - `movedToErrors: 4`
      - `parseErrors: 0`
      - `invalidVerifiedFlag: 0`
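
The routing rule above (verified fixes go to `fixes.jsonl`, failed, escalated, and report-only runs go to `errors.jsonl`) might be sketched like this. The record shape and outcome labels are hypothetical; this is not the code in `packages/core/src/learning/fix-tracking.ts`.

```typescript
// Hypothetical outcome labels and record shape.
type Outcome = "verified" | "failed" | "escalated" | "report_only";

interface RemediationRecord {
  id: string;
  outcome: Outcome;
  summary: string;
}

type CorpusFile = "fixes.jsonl" | "errors.jsonl";

// Only verified fixes are allowed into the main training corpus.
function corpusFileFor(record: RemediationRecord): CorpusFile {
  return record.outcome === "verified" ? "fixes.jsonl" : "errors.jsonl";
}

// Groups records into JSONL lines per target file without touching the filesystem.
function toJsonlByFile(records: RemediationRecord[]): Record<CorpusFile, string[]> {
  const out: Record<CorpusFile, string[]> = { "fixes.jsonl": [], "errors.jsonl": [] };
  for (const r of records) out[corpusFileFor(r)].push(JSON.stringify(r));
  return out;
}
```

Making `fixes.jsonl` the opt-in target, with everything else defaulting to `errors.jsonl`, means a new or unexpected outcome label cannot silently pollute the verified corpus.
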
- Complete Codex chat sync was added:
  - `sync/history/2026-04-29-codex-complete-chat-sync.md`
  - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
  - confirms no secrets were written into sync.
  - confirms TIP crawler/robot planning remains TIPLLM-only.
  - confirms Erik remains controller/light `erik-safe` only, with heavy crawler work assigned to Proxmox/Pi workers.
- Codex sync-start confirmation was added:
  - `sync/history/2026-04-29-codex-sync-start-confirmation.md`
  - confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating `sync/` as binding.
  - no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
- Codex follow-up on 2026-04-29 clarified the active BlogLLM model:
  - TIP shows `fo-blog-v7`, but this is not a normal Ollama GGUF manifest.
  - It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter`
  - Bridge definition:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py`
  - TIP API default:
    `packages/api/src/llm/client.ts` uses `OLLAMA_LLM_MODEL || "fo-blog-v7"`.
  - `fo-blog-v8` remains the next training candidate, not the currently active TIP BlogLLM model.
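
The default quoted above relies on `||`, which falls back on an empty string as well as on an unset variable. A minimal sketch of that behavior, assuming the value is read from an environment-style map (the function name and map parameter are hypothetical):

```typescript
// Mirrors the `OLLAMA_LLM_MODEL || "fo-blog-v7"` expression from the notes.
function resolveBlogModel(env: Record<string, string | undefined>): string {
  return env["OLLAMA_LLM_MODEL"] || "fo-blog-v7";
}

const unsetModel = resolveBlogModel({});
const emptyModel = resolveBlogModel({ OLLAMA_LLM_MODEL: "" });
const overrideModel = resolveBlogModel({ OLLAMA_LLM_MODEL: "fo-blog-v8" });
```

Both the unset and the empty case resolve to `fo-blog-v7`, so switching the active model requires setting the variable to a non-empty name.
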
- Full Codex session handoff was added:
  - `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing:
  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to TIP learning-pool build:
  - `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
- Added docs:
  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script:
  - `packages/scraper/package.json`
  - `robots:verification`

## Gitea Training Pool

- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit:
  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`

## MAGATAMA Training / Operations State

- Relevant local repo:
  - `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state:
  - `open findings: 0` on `2026-05-06`
- Latest confirmed live resolver state:
  - `Codex` and `Copilot` intentionally `idle/disabled`
  - not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix:
  - `newSinceLastTraining: 49`
- Meaning:
  - the old `0` was incorrect.
  - the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
- Latest corpus integrity state after cleanup:
  - operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
    - `1368` unique verified rows
    - `4` live failure/escalation rows in `errors.jsonl`
  - do not confuse raw historical volume with real trainable signal.
- Important training integrity rule:
  - report-only or failed/escalated records must not be treated as verified training fixes.
  - keep them separated from the main verified training corpus.

## Erik Status

- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale misplaced remote-only duplicate files:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.
- Shared Erik note from the same chat:
  - MAGATAMA dashboard/core were redeployed during compliance/training fixes.
  - TIP crawler policy remains unchanged: Erik is controller/light runner only, not heavy crawl execution host.

## Last Live Verification Snapshot

From 2026-04-29:

- Total transceivers: `13,546`
- Price verified: `7,250`
- Image verified: `7,025`
- Details verified: `6,243`
- Fully verified: `5,812`
- Last price observation: `2026-04-29 19:15:53 UTC`
- Last stock observation: `2026-04-29 19:15:56 UTC`

## Latest MAGATAMA Training / RunPod Truth

Confirmed on `2026-05-06`:

- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
- Live Erik dashboard API now reports:
  - `magatamallm`
    - `1367 train`
    - `152 eval`
    - `1519 total`
    - `newSinceLastTraining = 1367`
  - `fo_blogllm`
    - `17353 train`
    - `1929 eval`
    - `19282 total`
    - `newSinceLastTraining = 17353`
    - active local model resolves to `fo-blog-v7`
  - `tip_llm`
    - `6482 train`
    - `721 eval`
    - `7203 total`
    - `newSinceLastTraining = 6482`
    - target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
- Result:
  - the previous `1097` shown for every lane was stale and wrong.
  - selected lane now controls its own manifest, model label, and training counts.

### Gitea-backed Pool Materialization

- `magatamallm` Gitea pool remains canonical and populated.
- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
- Lane manifests and JSONL exports now exist under:
  - `training-data/gitea-learning-pool/fo_blogllm/`
  - `training-data/gitea-learning-pool/tip_llm/`

### RunPod Completion Hardening

- MAGATAMA dashboard code now treats RunPod `COMPLETED` as success only after:
  1. target model artifact is referenced
  2. local Mac training API adopts/imports the artifact
  3. lane-specific smoke tests pass
  4. active Ollama alias is updated
- New local adoption endpoint is:
  - `POST /adopt-runpod-model`
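
The four success conditions above form an ordered gate: a run only counts as successful when every step has evidence, and the first missing step explains why it does not. A minimal sketch, with hypothetical field names standing in for whatever the dashboard actually records:

```typescript
// Hypothetical evidence flags, one per step listed above.
interface AdoptionEvidence {
  artifactReferenced: boolean;
  artifactImported: boolean;
  smokeTestsPassed: boolean;
  aliasUpdated: boolean;
}

interface AdoptionVerdict {
  success: boolean;
  failedStep?: keyof AdoptionEvidence;
}

// Steps are checked in the order given in the notes.
const ADOPTION_ORDER: (keyof AdoptionEvidence)[] = [
  "artifactReferenced",
  "artifactImported",
  "smokeTestsPassed",
  "aliasUpdated",
];

function evaluateAdoption(evidence: AdoptionEvidence): AdoptionVerdict {
  for (const step of ADOPTION_ORDER) {
    if (!evidence[step]) return { success: false, failedStep: step };
  }
  return { success: true };
}
```

Reporting the first failed step, rather than a bare boolean, gives the operator a concrete reason when a `COMPLETED` job is not promoted.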

### Mac Training API State

- The old LaunchAgent on Mac Studio was still serving the legacy training API from:
  - `~/magatama-llm/service/training_api.py`
- It has now been upgraded in place so Erik sees the new adoption-capable API.
- Verified from Erik:
  - `http://192.168.178.213:3214/health` returns the new service
  - it now exposes `register_script` pointing into the MAGATAMA repo
  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live

### Still Outstanding

- A fully successful end-to-end RunPod fine-tune has not yet been re-verified after the new adoption pipeline was wired in. It still needs proof of:
  - real worker success
  - real artifact
  - successful local Ollama import
  - active alias switch
  - smoke-test proof
- Latest live proof run on `2026-05-06`:
  - job id: `2112a7ab-68c2-4411-a44f-6edb7ad377df-e1`
  - materialized correctly
  - reached `IN_PROGRESS`
  - then `COMPLETED`
  - but RunPod `status/{job}` returned no `output` object, no model artifact reference, and no Hugging Face repo result
  - current MAGATAMA handling now correctly classifies this as `completed_without_model_artifact`, not as success
- `tip-llm-v1` is still not installed locally in Ollama.

### Pulso AI Recommendation

- Keep a shared network/transceiver/switch core corpus with TIP.
- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
- Recommended split:
  - `TIP_LLM`
    - research
    - crawler / scraper / robot planning
    - vendor / firmware / issue extraction
  - `Pulso AI`
    - product responses
    - support
    - diagnostics
    - operator explanation layer

## Safe Next Steps

1. Clone or pull Gitea `origin` on laptop/Claude Code.
2. Read this folder first.
3. For BlogLLM work, treat `fo-blog-v7` as Adapter Bridge / PEFT adapter, not as a `~/.ollama` GGUF model.
4. Also read `llm-gateway/sync/CURRENT.md` when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
8. If testing robots, start with dry runs only:

```bash
npm run robots:verification -w packages/scraper -- --status
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

9. Only dispatch real crawl work after deciding the target host:
   - Erik: `erik-safe`, tiny batches only.
   - Pi: `pi-fetch`.
   - Proxmox: `proxmox-heavy`.
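
The host-to-profile decision in step 9 can be kept in one small table so that a wave is never enqueued with a mismatched profile. The sketch below only builds the argument list that follows `npm run robots:verification -w packages/scraper --`. It assumes `--profile` accepts `pi-fetch` and `proxmox-heavy` in the same way as `erik-safe`; the type and function names are hypothetical.

```typescript
type CrawlHost = "erik" | "pi" | "proxmox";

// Profile names as listed in step 9.
const PROFILE_BY_HOST: Record<CrawlHost, string> = {
  erik: "erik-safe",
  pi: "pi-fetch",
  proxmox: "proxmox-heavy",
};

// Dry run is the default so that a real dispatch has to be requested explicitly.
function enqueueArgs(host: CrawlHost, wave: string, dryRun: boolean = true): string[] {
  const args = [`--enqueue=${wave}`, `--profile=${PROFILE_BY_HOST[host]}`];
  if (dryRun) args.push("--dry-run");
  return args;
}
```

Calling `enqueueArgs("erik", "details-fast-lane")` yields the same flags as the third dry-run command in step 8.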

## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session, some appear pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.

## Latest Sync Commits

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- `bba48d3 sync: record magatama atlas rematerialization fix`
- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
- `8b42077 sync: refresh cross-agent chat handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.

## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth

### Atlas / Findings

- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Root causes fixed live:
  1. `packages/core/src/routes/health-builders.ts`
     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
  2. `packages/core/src/scheduler.ts`
     - generic stale auto-resolve no longer auto-closes:
       - `atlas-coverage-gap`
       - `atlas-exposure`
       - `atlas-host-audit`
  3. `packages/dashboard/public/index-v2.html`
     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
- Live public verification after deploy:
  - `/api/protection-proof` shows non-zero Atlas truth again.
  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
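
The scheduler change above amounts to an exclusion list in front of the generic stale auto-resolve. A minimal sketch, assuming findings carry a category and a last-seen timestamp; the finding shape and the staleness window are hypothetical, only the three category names come from the notes.

```typescript
// Categories that the generic stale auto-resolve must no longer close.
const NEVER_AUTO_RESOLVE = new Set<string>([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Hypothetical finding shape; timestamps are epoch milliseconds.
interface FindingSketch {
  category: string;
  lastSeenAt: number;
}

// A finding is auto-resolved only if it is stale AND not in the protected set.
function shouldAutoResolve(finding: FindingSketch, now: number, staleAfterMs: number): boolean {
  if (NEVER_AUTO_RESOLVE.has(finding.category)) return false;
  return now - finding.lastSeenAt > staleAfterMs;
}
```

Checking the exclusion first means the Atlas categories stay open regardless of age and are closed only by their own rematerialization logic.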

### Training / Lane Registry

- The public training status is now honest for the current live state:
  - `magatamallm`
    - `datasetSource: url`
    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
    - `15679 train`
    - `1743 eval`
    - `17422 total`
    - `lastRegistryRunStatus: completed_without_model_artifact`
  - `fo_blogllm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
  - `tip_llm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
  - lane datasets
  - lane RunPod manifests
  - `training-runs.json`
- The live compiled registry on Erik is no longer all-`null`; it now exposes:
  - `activeModel`
  - `version`
  - `lastRunId`
  - `lastRunStatus`
  - `datasetSource`
  - `collectionsPath`

### Still Outstanding

- Full automatic training is still blocked by the managed RunPod Axolotl endpoint:
  - jobs reach `COMPLETED`
  - but no adoptable artifact is returned
  - therefore MAGATAMA correctly records:
    - `completed_without_model_artifact`
- That means:
  - no new model version can be truthfully activated yet
  - no Ollama alias switch should happen yet
- Remaining real blocker:
  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.
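
The `completed_without_model_artifact` classification described above could be derived from the status payload along these lines. The `output` field is mentioned in the notes; the `modelArtifact` and `hfRepo` property names, and the other two labels, are hypothetical placeholders for whatever an adoptable result would carry.

```typescript
// Hypothetical view of a status response; only `status` and `output` are named in the notes.
interface StatusResponseSketch {
  status: string;
  output?: { modelArtifact?: string; hfRepo?: string } | null;
}

type CompletionClass = "not_completed" | "completed_without_model_artifact" | "adoption_candidate";

function classifyStatus(res: StatusResponseSketch): CompletionClass {
  if (res.status !== "COMPLETED") return "not_completed";
  const out = res.output;
  // A missing output object, or one without any artifact reference, is not adoptable.
  const hasArtifact = !!out && (!!out.modelArtifact || !!out.hfRepo);
  return hasArtifact ? "adoption_candidate" : "completed_without_model_artifact";
}
```

Even the positive branch is only a candidate: import, smoke tests, and the alias switch still have to succeed before a new version is activated.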