# Current TIP Sync State

Updated: 2026-05-10 02:58 UTC

## Newest Work

- MAGATAMA all-lane RunPod training completion on 2026-05-10:
  - RunPod training/adoption is now verified end-to-end for all five active MAGATAMA LLM lanes:
    - `magatamallm`: active `magatama-coder:latest`, model version `magatama-coder-r2`, dataset `1375 train / 153 eval / 1528 total`
    - `fo_blogllm`: active `fo-blog-v8`, model version `fo-blog-v8-r2`, dataset `17342 train / 1929 eval / 19271 total`
    - `tip_llm`: active `tip-llm-v2`, model version `tip-llm-v2-r2`, dataset `276 train / 31 eval / 307 total`
    - `pulso_llm`: active `pulso-llm-v1`, model version `pulso-llm-v1-r1`, dataset `28 train / 5 eval / 33 total`
    - `contact_llm`: active `contact-llm-v1`, model version `contact-llm-v1-r1`, dataset `18 train / 4 eval / 22 total`
  - strict adoption rule is now validated in production:
    - RunPod `COMPLETED` alone does not count as success
    - success requires an uploaded adapter artifact, local Mac adoption, Ollama model registration, smoke tests, registry write, dashboard registry rebuild and active alias switch
  - fixed/verified automation behavior:
    - local Mac adoption service exposes authenticated adoption reports per lane via `/adoption-report/{lane}`
    - dashboard adoption path can recover from transient network/fetch errors by reading the local adoption report
    - reconciler can adopt already-completed RunPod jobs when the live SSE path failed after artifact upload
    - registry events now include top-level `active_model`, `release_alias`, `model_version`, `version_counter` and `candidate_model`
  - resolved concrete failures:
    - `pulso_llm` training had succeeded, but an old local lane mapping caused `unknown lane: pulso_llm`; Pulso is now adopted and active
    - `tip_llm` training succeeded, but local adoption failed due to low Mac disk space before GGUF conversion; safe obsolete Ollama versions and imported intermediate GGUFs were removed, then TIP was reconciled successfully
    - `contact_llm` was still `neverTrained`; it is now trained, adopted and active
  - ContactLLM smoke test result:
    - `4/5` checks passed
    - remaining improvement: provenance prompt should always include source URL, timestamp, confidence and contact type; add this as a next training/eval item
  - public Magatama `/api/llm/status?lane=...` checks after dashboard restart show all five lanes as `completed_and_adopted`
  - operational note:
    - keep enough free Mac disk space before another adoption; each new 7B adapter adoption needs merge + GGUF conversion workspace
    - obsolete non-active Ollama versions can be removed after verifying active aliases and release aliases exist
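
The strict adoption rule in the entry above can be read as a conjunction of independent checks. The following is a minimal sketch under that assumption; the class and field names are hypothetical and are not taken from the MAGATAMA code, they only mirror the steps named in this log.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AdoptionChecks:
    """One flag per step this log lists as required (names are hypothetical)."""

    runpod_completed: bool = False
    artifact_uploaded: bool = False
    local_adoption_ok: bool = False
    ollama_model_registered: bool = False
    smoke_tests_passed: bool = False
    registry_written: bool = False
    dashboard_registry_rebuilt: bool = False
    active_alias_switched: bool = False


def missing_steps(checks: AdoptionChecks) -> List[str]:
    """Return the names of every step that has not succeeded yet."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name)]


def is_adopted(checks: AdoptionChecks) -> bool:
    """A run counts as successful only when no step is missing."""
    return not missing_steps(checks)
```

With only `runpod_completed=True`, `is_adopted` returns `False` and `missing_steps` names the remaining steps, which matches the rule that RunPod `COMPLETED` alone is not enough.
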
- TIP price/source verification closure on 2026-05-10 local / 2026-05-09 UTC:
  - fixed SFPcables scraper to persist `product_page_url`
  - added product-page price fallback for SFPcables when listing pages omit price markup
  - added `verify:product-page-prices`
    - source-backed public price verification from existing product URLs
    - ShopFiber24 parser takes the first main product `itemprop=price`, not related-product `minPrice`
    - ATGBICS parser uses Shopify `/products/{handle}.js` prices for coherent/ZR products
  - fixed `upsertPriceObservation` to set `price_status='public_price'`
  - widened price anomaly handling only for explicit coherent/ZR/DCO/tunable products
  - expanded quarantine for ShopFiber24 FOCP/category/DAC-AOC artifacts and Vcelink numeric rows
  - live runs on Erik:
    - ShopFiber24 quarantine: `12` artifacts removed
    - SFPcables scraper with detail fallback: `110` products, `37` price observations
    - SFPcables asset verifier: `31` images, `29` details, `0` errors
    - ShopFiber24 price verifier: `12` real EUR prices
    - ATGBICS price verifier: `3` real GBP coherent/ZR prices
    - Vcelink quarantine: `2` numeric artifacts removed
    - 10Gtek/SFPcables retail crawl confirmed remaining `126` rows have no public retail product URL
    - 10Gtek price availability resolver: `126` rows set to `price_status=no_public_price` with evidence
  - live health after this pass:
    - active products: `17181`
    - price verified: `11460`
    - price status: `public_price=11460`, `no_public_price=5721`, `needs_research=0`, `ambiguous=0`
    - image verified: `12132`
    - details verified: `16922`
    - fully verified: `10549`
    - competitor status: `matched=10821`, `no_valid_match=74`, `ambiguous=556`, `needs_research=5730`
  - follow-up image/detail probing:
    - II-VI / Coherent product URL verifier added `1` detail
    - Cisco/Juniper dry-run showed Juniper product pages expose no useful product images in the sampled batch
    - Cisco apply added `7` official Cisco rendition images and detail improvements; some Cisco pages return `403`
  - interpretation:
    - price research queue is closed without fabricated prices
    - remaining verification work is image/details/competitor state, dominated by OEM/catalog rows
    - largest current product-data gaps: Juniper, Cisco, 10Gtek, Nokia, Palo Alto, Arista
- TIP continuation on 2026-05-10 local / 2026-05-09 UTC:
  - added `verify:part-number-details`
    - deterministic part-number speed inference for rows where form factor/reach/fiber already exist but `speed_gbps=0`
    - dry-run caught Cisco `GLC-FE-*` as Fast Ethernet trap; rule hardened before apply
    - live apply:
      - Juniper Networks: `375` speed updates/details verified
      - Cisco Systems: `176` speed updates/details verified
    - evidence count `verify:part-number-details`: `551`
    - health detail count moved to `16913`
  - added migration `sql/105-price-status-and-unavailable-evidence.sql`
    - new `transceivers.price_status`
    - new `price_unavailable` evidence type
    - strict rule remains: `price_verified` only means a real public price observation exists
  - added `verify:price-availability`
    - resolves quote-only/OEM/manufacturer/test-equipment/hyperscaler vendors to `price_status=no_public_price`
    - writes `price_unavailable` evidence, does not fabricate price rows
    - preserves real retail/source-discovery cases as `needs_research`
  - Health API now exposes price status buckets
  - live price-status result:
    - `public_price=11414`
    - `no_public_price=5595`
    - `needs_research=186`
    - `ambiguous=0`
  - remaining price research is now limited to real retail/source-discovery vendors:
    - `10Gtek=126`
    - `SFPcables=31`
    - `ShopFiber24=24`
    - `ATGBICS=3`
    - `Vcelink=2`
  - SFPcables search tests for 10Gtek part numbers did not return reliable direct hits; remaining 10Gtek work is source/alias discovery, not no-public-price classification
  - live health:
    - active products: `17195`
    - price verified: `11414`
    - image verified: `12104`
    - details verified: `16913`
    - fully verified: `10505`
    - competitor status: `matched=10775`, `no_valid_match=74`, `ambiguous=556`, `needs_research=5790`
  - TIPLLM training pool updated with:
    - part-number details verifier rules
    - price_status/no-public-price model
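
The `price_status` model described in the entry above can be sketched as a small decision function. This is an illustration only: the vendor-kind labels and the `PriceInput` type are assumptions made for the example, while the status values and the `price_unavailable` evidence type come from this log.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Vendor kinds the log says resolve to "no public price".
# The label strings themselves are assumptions for this sketch.
NO_PUBLIC_PRICE_KINDS = frozenset(
    {"quote_only", "oem", "manufacturer", "test_equipment", "hyperscaler"}
)


@dataclass(frozen=True)
class PriceInput:
    vendor_kind: str                 # hypothetical classification label
    public_price_observations: int   # count of real public price rows


def resolve_price_status(row: PriceInput) -> Tuple[str, Optional[str]]:
    """Return (price_status, evidence_type); evidence_type is None if nothing is written."""
    if row.public_price_observations > 0:
        return "public_price", None
    if row.vendor_kind in NO_PUBLIC_PRICE_KINDS:
        # Record why no price exists instead of fabricating a price row.
        return "no_public_price", "price_unavailable"
    # Retail/source-discovery vendors stay open for research.
    return "needs_research", None


def is_price_verified(row: PriceInput) -> bool:
    """`price_verified` only means a real public price observation exists."""
    return row.public_price_observations > 0
```

The point of the split is that a `no_public_price` row is closed for research without ever counting as `price_verified`.
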
- MAGATAMA all-lane RunPod training block started on 2026-05-09:
  - user requested all trainable LLM lanes via RunPod
  - lanes in scope:
    - `magatamallm`
    - `fo_blogllm`
    - `tip_llm`
    - `pulso_llm`
    - `contact_llm`
  - preflight:
    - MAGATAMA services online on Erik
    - active RunPod endpoint: `0rmkf28w2g5gip`
    - worker kind: `custom-magatama`
    - dataset source: URL lane export
    - latest previous adopted runs existed for `magatamallm`, `fo_blogllm`, `tip_llm`
    - `pulso_llm` and `contact_llm` had no previous adopted RunPod run
  - fixed live/local helper script:
    - `scripts/trigger_lane_training_once.py`
    - API payload now uses `iters` and `seed_only` instead of stale `iterations` and `seedOnly`
    - added `all` mode for sequential full-lane training
    - streams SSE lines to the log instead of buffering until the response closes
    - MAGATAMA Gitea commit: `76d4054`
  - live sequence started on Erik:
    - command: `python3 -u scripts/trigger_lane_training_once.py all 500 false`
    - log: `/opt/magatama/logs/runpod-all-lanes-20260509T230549Z.log`
    - first active lane: `magatamallm`
    - first RunPod job: `89627e7e-8533-45db-9fe8-eca994018aa6-e2`
    - `magatamallm` dataset at start: `1375 train`, `153 eval`, `1528 total`
  - success rule remains strict:
    - RunPod `COMPLETED` alone is not sufficient
    - artifact must exist, import/adoption must succeed, smoke checks must pass, and active alias/version must update
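
The helper-script changes in the entry above (current payload field names, an `all` mode, and line-by-line SSE streaming) can be illustrated as follows. This is a sketch, not the contents of `scripts/trigger_lane_training_once.py`: the `lane` payload key, the argument order `lane iters seed_only`, and all function names are assumptions; only `iters`, `seed_only`, the lane identifiers and the `all` mode come from this log.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

LANES = ["magatamallm", "fo_blogllm", "tip_llm", "pulso_llm", "contact_llm"]


def parse_cli_args(argv: List[str]) -> Tuple[List[str], int, bool]:
    """Parse `<lane|all> <iters> <true|false>` (argument order is an assumption)."""
    lane_arg, iters_arg, seed_only_arg = argv
    lanes = list(LANES) if lane_arg == "all" else [lane_arg]
    return lanes, int(iters_arg), seed_only_arg.lower() == "true"


def build_payload(lane: str, iters: int, seed_only: bool) -> Dict[str, object]:
    """Build a request body using `iters` and `seed_only`, not the stale names."""
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    return {"lane": lane, "iters": iters, "seed_only": seed_only}


def stream_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield each SSE `data:` payload as soon as its line arrives.

    Because this is a generator over the incoming lines, a caller can write
    every event to the log immediately instead of waiting for the response
    to close.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()
```

A caller would loop over the lanes returned by `parse_cli_args`, send `build_payload(...)` for each lane in sequence, and log every item yielded by `stream_sse` as it arrives.
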
- TIP verification continuation on 2026-05-09:
  - expanded deterministic non-transceiver quarantine for GBICS and T&S Communication artifacts
  - live quarantine result:
    - `93` additional artifacts moved out of the active transceiver base
    - `verify:quarantine:non-transceivers` evidence count: `93`
  - current vendor gaps after cleanup:
    - GBICS: `88` active rows, `17` missing price, `17` missing image, `17` missing details
    - T&S Communication: `36` active rows, `36` missing price, `36` missing image, `6` missing details
    - 10Gtek: `175` active rows, `126` missing price, `131` missing image, `25` missing details
  - fixed `maintenance:reconcile-verification`:
    - preserve explicit `competitor_status IN ('no_valid_match', 'ambiguous')`
    - do not reset deliberate research outcomes back to `needs_research`
  - deployed to Erik:
    - `packages/scraper/src/scheduler.ts`
    - `packages/scraper/src/utils/quarantine-non-transceivers.ts`
    - remote build passed
    - `tip-scraper-daemon` restarted after pg-boss queue was empty
  - restored competitor states after previous reconcile regression:
    - dry-run candidates: `615`
    - apply wrote `74` `no_valid_match`, `541` `ambiguous`, `74` `fully_verified_earned`
    - fresh reconcile test completed successfully after the fix
  - live health after reconcile test:
    - active products: `17212`
    - price verified: `11414`
    - image verified: `12016`
    - details verified: `16702`
    - fully verified: `10449`
    - competitor status: `matched=10775`, `no_valid_match=74`, `ambiguous=556`, `needs_research=5807`
    - fully product-verified rows still in competitor `needs_research`: `0`
  - TIPLLM training pool updated with:
    - reconcile must preserve explicit competitor research states
    - GBICS/T&S artifact quarantine rules
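
The reconcile fix in the entry above amounts to never overwriting a deliberate research outcome. A minimal sketch of that guard follows; the function name and the `has_approved_match` input are hypothetical, and letting an approved match override an explicit state is an assumption rather than documented behavior.

```python
# Explicit outcomes that reconcile must not reset (from this log).
EXPLICIT_OUTCOMES = frozenset({"no_valid_match", "ambiguous"})


def reconcile_competitor_status(current: str, has_approved_match: bool) -> str:
    """Return the competitor status a reconcile pass should leave on a row."""
    if has_approved_match:
        # Assumption: a real approved match always wins.
        return "matched"
    if current in EXPLICIT_OUTCOMES:
        # Deliberate research outcome: keep it instead of reopening the row.
        return current
    return "needs_research"
```

The regression described in the log corresponds to the middle branch being absent, so every unmatched row fell through to `needs_research`.
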
- TIP product-page asset verifier on 2026-05-09:
  - added `verify:product-page-assets`
  - deterministic scope:
    - only existing `product_page_url` rows
    - vendor-limited batches via `PRODUCT_ASSET_VENDOR`
    - dry-run by default, apply only with `PRODUCT_ASSET_APPLY=1`
    - extracts images from source-backed product image tags/meta only
    - infers details only from part number, product URL, and title to avoid navigation pollution
  - remote build passed on Erik
  - live verifier results:
    - GBICS extra quarantine: `17` additional category/family artifacts
    - T&S Communication asset apply: `36` images, `36` details closed after a second DR8 reach pass
    - 10Gtek/SFPcables asset apply: `5` images, `10` details improved on rows with existing product URLs
  - current vendor gaps:
    - GBICS: `71` active rows, `0` missing price, `0` missing image, `0` missing details
    - T&S Communication: `36` active rows, `36` missing price, `0` missing image, `0` missing details
    - 10Gtek: `175` active rows, `126` missing price, `126` missing image, `20` missing details
  - interpretation:
    - T&S is now product-data complete but public-price blocked; pages expose no real public price (`price: 0.00` / quote-only behavior)
    - 10Gtek remaining gaps are mostly rows without reliable product URLs/price sources and need alias/source discovery rather than blind image guessing
  - live health after this pass:
    - active products: `17195`
    - price verified: `11414`
    - image verified: `12057`
    - details verified: `16713`
    - fully verified: `10459`
    - competitor status: `matched=10775`, `no_valid_match=74`, `ambiguous=556`, `needs_research=5790`
  - TIPLLM training pool updated with:
    - product-page asset verifier dry-run/apply pattern
    - T&S quote-only public-price rule
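
The dry-run/apply pattern named in the entry above recurs across the TIP verifiers. The sketch below shows one way such a gate could look, assuming rows are plain dictionaries; the function names and row shape are hypothetical, and only the `PRODUCT_ASSET_APPLY` and `PRODUCT_ASSET_VENDOR` variables come from this log.

```python
import os
from typing import Callable, Dict, Iterable, Mapping, Optional


def read_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Dry-run unless PRODUCT_ASSET_APPLY is exactly "1"; vendor filter is optional."""
    env = os.environ if env is None else env
    return {
        "apply": env.get("PRODUCT_ASSET_APPLY") == "1",
        "vendor": env.get("PRODUCT_ASSET_VENDOR") or None,
    }


def run_verifier(
    rows: Iterable[dict],
    config: Dict[str, object],
    write: Callable[[dict], None],
) -> int:
    """Count in-scope rows; call `write` only in apply mode."""
    in_scope = 0
    for row in rows:
        if not row.get("product_page_url"):
            continue  # only rows with an existing product page URL
        if config["vendor"] and row.get("vendor") != config["vendor"]:
            continue  # vendor-limited batch
        in_scope += 1
        if config["apply"]:
            write(row)
    return in_scope
```

Returning the same count in both modes makes a dry-run directly comparable with the later apply run.
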
- MAGATAMA multi-LLM training lane expansion on 2026-05-09:
  - added first-class training lanes for:
    - `pulso_llm`
    - `contact_llm`
  - MAGATAMA training tool now exposes:
    - `MagatamaLLM`
    - `FO_BlogLLM`
    - `TIP_LLM`
    - `PulsoLLM`
    - `ContactLLM`
  - lane split is now canonical:
    - `MagatamaLLM`: MAGATAMA operations, cybersecurity, AI security, infrastructure security, resolver/fix workflows
    - `FO_BlogLLM`: Rene/Flexoptix-style blog writing, technical storytelling, market/blog structure
    - `TIP_LLM`: crawler/scraper/robot planning, source discovery, parser/selectors, switch/transceiver issue research
    - `PulsoLLM`: Flexoptix product/support/diagnostic lane for switches, transceivers, compatibility, product fit and offers
    - `ContactLLM`: contact discovery/research lane for structured, lawful contact lookup and source attribution
    - shared network/transceiver/switch knowledge is intentionally reused for `TIP_LLM` and `PulsoLLM`, but behavior/instruction pools remain separate
  - new source catalog added under MAGATAMA:
    - `training-data/model-registry/research-source-catalog-2026-05-09.json`
    - `training-data/model-registry/external-ingest/llm-lane-research-seeds-2026-05-09.jsonl`
  - source seeds added from current research include:
    - CISA KEV / CISA Malcolm / CISA ScubaGear
    - NVD CVE API
    - MITRE ATT&CK STIX/TAXII
    - OWASP LLM Top 10
    - Microsoft PyRIT
    - Microsoft Agent Governance Toolkit
    - Cisco Transceiver Module Group matrix
    - Juniper Hardware Compatibility Tool
    - Arista transceiver/cable references
    - Flexoptix product/support references
    - RFC 9309 robots.txt
    - schema.org `ContactPoint`
    - RFC 6350 vCard
    - PeeringDB API
    - RIPE Database REST API
  - lane-specific Gitea learning pool directories now exist for:
    - `training-data/gitea-learning-pool/pulso_llm/`
    - `training-data/gitea-learning-pool/contact_llm/`
  - RunPod lane exports rebuilt and deployed live on Erik:
    - `magatamallm`: `1375 train`, `153 eval`, `1528 total`
    - `fo_blogllm`: `17342 train`, `1929 eval`, `19271 total`
    - `tip_llm`: `276 train`, `31 eval`, `307 total`
    - `pulso_llm`: `28 train`, `5 eval`, `33 total`
    - `contact_llm`: `18 train`, `4 eval`, `22 total`
  - dashboard/API live checks:
    - `pulso_llm` and `contact_llm` appear in the training modal
    - RunPod provider is online for both lanes
    - `contact_llm` status correctly reports `neverTrained: true`
    - `pulso_llm` / `contact_llm` are trainable but not adopted yet because no local Ollama model tags exist yet
  - Gitea commits:
    - transceiver-db sync handoff: `3926a1e`
    - MAGATAMA implementation and sanitized training pools: `8fb406b`
  - privacy guard:
    - MAGATAMA pre-commit correctly blocked raw private-network training rows
    - exported Gitea/RunPod training pools now sanitize private IPs, local paths, emails and credentials before commit
  - safety/automation note:
    - do not mark a lane training run successful unless an artifact exists, imports locally, passes smoke tests, and the active alias/version is switched
    - this remains the rule for all LLM lanes
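
The privacy guard in the entry above sanitizes private IPs, local paths, emails and credentials before export. The sketch below shows one possible shape for such a sanitizer; the regular expressions, the placeholder tokens and the function name are all assumptions and will not match the real MAGATAMA rules exactly.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LOCAL_PATH_RE = re.compile(r"(?:/Users|/home)/[^\s\"']+")
SECRET_RE = re.compile(r"(?i)\b(password|token|api[_-]?key|secret)\s*[=:]\s*\S+")


def _mask_private_ip(match: "re.Match[str]") -> str:
    """Mask an IPv4 literal only if it parses and is in a private range."""
    text = match.group(0)
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        return text  # not a valid address, leave untouched
    return "<private-ip>" if ip.is_private else text


def sanitize(text: str) -> str:
    """Replace credentials, emails, local paths and private IPs with placeholders."""
    text = SECRET_RE.sub(lambda m: f"{m.group(1)}=<redacted>", text)
    text = EMAIL_RE.sub("<email>", text)
    text = LOCAL_PATH_RE.sub("<local-path>", text)
    return IPV4_RE.sub(_mask_private_ip, text)
```

Public addresses are left in place on purpose, since only private-network rows were named as the problem in this log.
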
- TIP open competitor status closure on 2026-05-09:
  - added migration `sql/104-verification-evidence-ambiguous.sql`
    - extends `transceiver_verification_evidence.verification_type` with `competitor_ambiguous`
  - added `packages/scraper/src/utils/resolve-open-competitor-status.ts`
    - script: `pnpm -C packages/scraper run verify:open-competitor-status`
    - default is dry-run
    - apply requires `OPEN_COMPETITOR_APPLY=1`
  - live Erik run:
    - dry-run found `365` fully populated products still stuck in `needs_research`
    - apply result:
      - `364` set to `ambiguous`
      - `1` set to `no_valid_match`
      - `1` additional product earned `fully_verified`
    - evidence:
      - `364` `competitor_ambiguous` records from `verify:open-competitor-status`
      - `1` `competitor_no_match` record from `verify:open-competitor-status`
  - current fully populated competitor queue:
    - products with price+image+details and `competitor_status='needs_research'`: `0`
  - scheduler guard:
    - updated `maintenance:find-equivalences` so future matcher runs do not reset deliberate `ambiguous` rows back to `needs_research`
    - rebuilt and restarted `tip-scraper-daemon` after confirming no active pg-boss jobs
  - live health after closure:
    - active products: `17305`
    - price verified: `11414`
    - image verified: `12016`
    - details verified: `16705`
    - fully verified: `10449`
    - competitor status:
      - `matched=10775`
      - `no_valid_match=74`
      - `ambiguous=556`
      - `needs_research=5900`
  - remaining `needs_research` rows are no longer fully populated competitor-ready products; they are product-data gaps first
- TIP product-data gap probing on 2026-05-09:
  - hardened `verify:catalog:details` to write `details` evidence
    - run result: `113` catalog-derived rows updated, `0` additional fully verified
    - active Health did not move, indicating those updates were outside the active dashboard base or not counted by the current active filters
  - GAO Tek detail verifier:
    - checked remaining `64` GAO product URLs
    - result: `0` updated, `64` skipped, `0` errors
    - interpretation: remaining GAO rows lack deterministic public detail evidence; no fake details added
  - GBICS:
    - added package script `scrape:gbics`
    - patched scraper to pass product URLs into `findOrCreateScrapedTransceiver`
    - live run found `758` products, `0` prices
    - active GBICS gap remains `64 price / 64 image / 64 details` of `135`
    - interpretation: GBICS has product discovery, but active old rows and scraped product identifiers do not line up cleanly; needs alias/dedupe hardening plus price selector repair
  - T&S Communication:
    - added package script `scrape:tscom`
    - patched scraper to pass product URLs into `findOrCreateScrapedTransceiver`
    - live run found `109` unique products, `0` prices
    - active T&S gap remains `82 price / 82 image / 49 details` of `82`
    - interpretation: product discovery works, but price selector and existing-row matching need hardening
  - 10Gtek / SFPcables:
    - live run found `110` products and wrote `6` prices
    - active 10Gtek gap remains `126 price / 131 image / 25 details` of `175`
    - interpretation: parser works partially, but many active 10Gtek rows are unmatched aliases or lack source pages
  - current largest active product-data gaps:
    - `Juniper Networks`: `283 price / 394 image / 173 details` of `534`
    - `Cisco Systems`: `151 price / 351 image / 146 details` of `351`
    - `GAO Tek`: `456 price / 23 image / 87 details` of `458`
    - `GBICS`: `64 price / 64 image / 64 details` of `135`
    - `T&S Communication`: `82 price / 82 image / 49 details` of `82`
    - `10Gtek`: `126 price / 131 image / 25 details` of `175`
- TIP FS.com SKU alias cleanup on 2026-05-09:
  - added `packages/scraper/src/utils/quarantine-fs-sku-aliases.ts`
    - script: `pnpm -C packages/scraper run verify:fs:sku-aliases`
    - default mode is dry-run
    - apply mode requires `FS_SKU_ALIAS_APPLY=1`
  - purpose:
    - remove duplicate active FS.com numeric SKU rows such as `FS-380881` when the same FS URL already has the real product P/N row such as `OSFP-DR8-1.6T-FL`
    - prevent numeric FS SKU aliases from becoming false competitor or no-match candidates
  - safe gates:
    - alias must match `^FS-[0-9]+$`
    - same normalized FS product URL must have a non-numeric canonical product row
    - canonical row must already have price, image, and details verified
  - live Erik run:
    - dry-run found `109` candidates
    - apply quarantined `109`
    - evidence ledger wrote `109` `artifact_quarantine` records from `verify:fs:sku-aliases`
    - active numeric-SKU duplicates with canonical product row after run: `0`
  - specific user-reported FS.com 1.6T case:
    - numeric shadow rows `FS-380881` and `FS-380883` were duplicate aliases
    - canonical rows remain active:
      - `OSFP-DR8-1.6T-FL`
      - `OSFP-2FR4-1.6T-FL`
    - this preserves both the 500m DR8 and the 2km FR4 product instead of treating the numeric SKU as a separate transceiver
  - live health after reconcile/matcher:
    - active products: `17305`
    - price verified: `11414`
    - image verified: `12016`
    - details verified: `16705`
    - fully verified: `10448`
    - active competitor status:
      - `matched=10775`
      - `no_valid_match=73`
      - `ambiguous=192`
      - `needs_research=6265`
  - fully populated product rows still needing competitor research:
    - `Flexoptix=359`
    - `FS.COM=4`
    - `ATGBICS=2`
  - FS.com and Flexoptix no-valid-match dry-runs now both return `0`; remaining cases need real candidate research/normalization, not no-match closure
  - TIPLLM training pool:
    - appended lesson for FS.com numeric SKU alias quarantine
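
The three safe gates listed in the entry above can be sketched as a single predicate. This is an illustration rather than the logic in `quarantine-fs-sku-aliases.ts`: the dictionary row shape, the function names and the URL normalization (lowercased host plus path, with scheme, query, fragment and trailing slash dropped) are assumptions; the `^FS-[0-9]+$` pattern and the gate conditions come from this log.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit

NUMERIC_FS_SKU = re.compile(r"^FS-[0-9]+$")


def normalize_url(url: str) -> str:
    """Reduce a product URL to host + path (assumed normalization)."""
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_quarantine_candidate(alias: dict, rows: Iterable[dict]) -> bool:
    """True only if all three safe gates hold for this alias row."""
    # Gate 1: the alias must be a purely numeric FS SKU.
    if not NUMERIC_FS_SKU.match(alias["part_number"]):
        return False
    key = normalize_url(alias["product_page_url"])
    for row in rows:
        if row is alias or NUMERIC_FS_SKU.match(row["part_number"]):
            continue
        # Gate 2: a non-numeric canonical row shares the same normalized URL.
        if normalize_url(row["product_page_url"]) != key:
            continue
        # Gate 3: the canonical row is already price/image/details verified.
        if row["price_verified"] and row["image_verified"] and row["details_verified"]:
            return True
    return False
```

If any gate fails the alias stays active, which keeps the cleanup conservative for rows whose canonical counterpart is still incomplete.
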
- TIP no-valid-competitor resolver on 2026-05-09:
  - added `packages/scraper/src/utils/resolve-no-valid-competitor.ts`
    - script: `pnpm -C packages/scraper run verify:no-valid-competitor`
    - default mode is dry-run
    - apply mode requires `NO_VALID_MATCH_APPLY=1`
    - default vendor scope is `NO_VALID_MATCH_VENDOR=Flexoptix`
  - purpose:
    - close products that already have price, image, and details evidence
    - only resolve competitor verification when there is no strict source-backed 1:1 competitor candidate
    - avoid fake competitor matches for uncommon Flexoptix products
  - conservative gates:
    - active transceiver only; excludes known artifact/non-transceiver categories
    - source-backed `price_verified`, `image_verified`, and `details_verified` required
    - same-vendor candidates ignored; only other vendors count
    - strict candidate match requires same form factor, same speed, same fiber, reach within max(25m, 5%), and compatible wavelength when both sides expose it
    - no pending/approved equivalence above confidence `0.50`
  - live Erik run:
    - dry-run with Flexoptix scope found `73` no-valid-match candidates
    - apply run updated `73`
    - `73` additional products earned `fully_verified`
    - evidence ledger wrote `73` `competitor_no_match` records
  - live health after run:
    - active products: `17414`
    - price verified: `11523`
    - image verified: `12125`
    - details verified: `16814`
    - fully verified: `10831`
    - active competitor status:
      - `matched=11158`
      - `no_valid_match=73`
      - `ambiguous=192`
      - `needs_research=5991`
  - operational note:
    - `tip-scraper-daemon` was initially not restarted while QSFPTEK/NADDOD pricing jobs were active
    - after those jobs cleared, `tip-scraper-daemon` was restarted once
    - `maintenance:reconcile-verification` completed
    - `maintenance:find-equivalences` completed
    - matcher correctly moved `192` products into `ambiguous` instead of inventing unsafe matches
  - remaining fully populated product rows with `needs_research`:
    - `FS.COM=74`
    - `Flexoptix=15`
    - `ATGBICS=2`
  - TIPLLM training pool:
    - appended deterministic no-valid-match resolver lessons
    - JSONL must remain valid after every append
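
The strict candidate match in the conservative gates above can be sketched as a pure comparison of two spec records. Several details are assumptions made for the example: the `Spec` type and its field names, taking the 5% relative to the product's own reach, and treating "compatible wavelength" as exact equality when both sides expose a value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    vendor: str
    form_factor: str
    speed_gbps: float
    fiber: str
    reach_m: float
    wavelength_nm: Optional[float] = None


def reach_tolerance_m(reach_m: float) -> float:
    """Allowed reach difference: max(25 m, 5% of the product reach)."""
    return max(25.0, 0.05 * reach_m)


def is_strict_candidate(product: Spec, other: Spec) -> bool:
    """True if `other` is a strict 1:1 competitor candidate for `product`."""
    if other.vendor == product.vendor:
        return False  # same-vendor candidates are ignored
    if (other.form_factor, other.speed_gbps, other.fiber) != (
        product.form_factor, product.speed_gbps, product.fiber
    ):
        return False
    if abs(other.reach_m - product.reach_m) > reach_tolerance_m(product.reach_m):
        return False
    both_have_wavelength = (
        product.wavelength_nm is not None and other.wavelength_nm is not None
    )
    if both_have_wavelength and product.wavelength_nm != other.wavelength_nm:
        return False  # assumption: compatible means equal
    return True
```

For a 500 m product the tolerance is 25 m, so a 520 m candidate passes and a 530 m candidate does not; for a 10 km product the 5% term dominates and the tolerance is 500 m.
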
- TIP verification truth model on 2026-05-09:
  - implemented migration `sql/103-verification-evidence-and-competitor-status.sql`
    - adds `transceivers.competitor_status`
      - `matched`
      - `no_valid_match`
      - `needs_research`
      - `ambiguous`
      - `unknown`
    - adds `no_match_verified_at` and `no_match_reason`
    - creates append-only `transceiver_verification_evidence`
  - code changes:
    - scraper DB helper now records evidence for price/image/details decisions
    - artifact quarantine records `artifact_quarantine` evidence
    - matcher writes `competitor_match` evidence for auto-approved matches
    - matcher sets product status to `matched`, `ambiguous`, or `needs_research`
    - Review API adds protected `POST /api/review/transceivers/:id/no-valid-match`
    - Review stats now include product-level competitor status counts
    - Health API now exposes active-product competitor status counts
  - live migration/backfill:
    - applied on Erik successfully
    - status distribution after migration:
      - `matched=11198`
      - `needs_research=6575`
    - Evidence ledger seeded from current data:
      - `price=10633`
      - `image=12189`
      - `details=16782`
      - `competitor_match=316`
  - live API checks:
    - `/api/health` healthy
    - active health competitor status:
      - `matched=11158`
      - `needs_research=6256`
      - `no_valid_match=0`
      - `ambiguous=0`
    - protected review stats with Dashboard token returned product status counts correctly
  - operational note:
    - `tip-api` restarted successfully
    - `tip-scraper-daemon` was not restarted because `scrape:pricing:naddod` and `scrape:pricing:qsfptek` were active
    - scheduler code is synced to `/opt/tip`; restart daemon after those jobs complete to load new matcher/reconcile logic
  - TIPLLM training pool:
    - appended lessons for competitor state machine and evidence ledger
    - JSONL validated locally
- MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - RunPod success only counts after artifact exists, local Ollama import works, smoke tests pass, aliases/version switch, remote registry is updated, and live MAGATAMA reports no stale active run
    - do not spend another RunPod run when the paid training already completed; recover adoption instead
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
    - run id `magatamallm-2026-05-09T19-22-53`
    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
  - adoption recovered:
    - initial local adoption failed because Mac Studio had too little free disk for GGUF conversion after the merged model was written
    - removed only temporary/import-safe blockers:
      - failed MagatamaLLM merged `model.safetensors`
      - already imported FO_BlogLLM and TIP_LLM source GGUF files
      - old non-active Ollama test model `test-qwen32b:latest`
    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
  - adoption completed:
    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
    - release alias `magatama-coder-r1`
    - active alias `magatama-coder:latest`
    - candidate smoke `4/5` passed with the required threshold `4`
    - direct local smoke returned exact `MAGATAMA-R1-READY`
  - dashboard/server correction:
    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, with `completed_at/adopted_at/created_at` fallbacks
    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
    - restarted `magatama-dashboard`
  - live verification:
    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
    - `modelVersion=magatama-coder-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `hasTrustedTrainingBaseline=true`
    - `newSinceLastTraining=0`
    - lane export shows `1367` train, `152` eval, `1519` total
    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior because the new MagatamaLLM passed the required smoke threshold but still answered that one eval too passively
    - complete dual-Gitea mirroring as a separate infrastructure closure item
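
The registry ordering fix in the entry above (use `recorded_at`, then fall back to other timestamps) can be sketched as a sort key. The event shape, the function names and the exact order of the fallback fields are assumptions; only the four field names come from this log.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Preferred field first; the relative order of the fallbacks is assumed.
TIMESTAMP_FIELDS = ("recorded_at", "completed_at", "adopted_at", "created_at")


def event_time(event: dict) -> datetime:
    """Return the first available timestamp as an aware UTC datetime."""
    for field in TIMESTAMP_FIELDS:
        value = event.get(field)
        if value:
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    # Events without any timestamp sort before everything else.
    return datetime.min.replace(tzinfo=timezone.utc)


def latest_event(events: Iterable[dict]) -> Optional[dict]:
    """Pick the most recent registry event, or None for an empty registry."""
    events = list(events)
    return max(events, key=event_time) if events else None
```

Normalizing every value to an aware datetime avoids comparison errors when some events carry a `Z` suffix and others carry no offset at all.
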
- TIP verification artifact cleanup and vendor completion on 2026-05-09:
  - operator requirement:
    - continue until all source-backed verification work is exhausted
    - use deterministic TIP robots/scrapers only; no external AI
    - keep Erik safe by running targeted jobs and waiting for pg-boss completion
    - write crawler/scraper/robot learnings into the TIPLLM training pool
  - deployed fixes:
    - added/expanded `verify:quarantine:non-transceivers`
      - removes GAO, Ascent, FS.com, Flexoptix, Arista, ShopFiber24, and Coherent category/support/cable/switch artifacts from the active transceiver base
      - clears price/image/details/competitor/fully verification flags for those artifacts
    - added `verify:normalize:product-urls`
      - repaired malformed older Mouser URLs such as duplicated `https://www.mouser.dehttps://www.mouser.de...`
    - added `scrape:gaotek:details`
      - lightweight fetch+cheerio detail verifier for GAO product URLs
    - hardened Ascent parser so product-family/category rows are skipped
    - repaired 10Gtek/SFPcables scraper to pass product URL and image URL into verification and parse common meter/range reaches
    - scheduler reconcile now excludes known non-transceiver categories when promoting `details_verified`
  - live robot runs:
    - non-transceiver quarantine:
      - first pass quarantined 121 artifacts
      - Flexoptix filter URL pass quarantined 103 artifacts
      - Ascent/Flex/FS/Arista/ShopFiber/Coherent cleanup quarantined 68 + 38 + 6 additional artifacts
    - GAO detail verifier:
      - 245 GAO product pages examined
      - 181 rows updated and details verified
      - 64 skipped because source text still lacked complete deterministic specs
    - Mouser URL normalizer:
      - 388 malformed `mouser.de` URLs repaired
    - 10Gtek scraper:
      - 50 product pages parsed via sfpcables.com
      - URL/image propagation repaired for future verification
    - Ascent scraper:
      - 237 genuine product rows kept after parser hardening
      - category/family rows no longer re-enter active verification
    - FS.com DB detail run:
      - 1 remaining detail page scraped
      - 1 price observation and 1 spec verification written
    - reconcile completed
    - equivalence matcher completed at `2026-05-09 20:11:39 UTC`
  - latest live TIP health:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - active total `17,405`
    - `price_verified=11,523`
    - `image_verified=12,125`
    - `details_verified=16,810`
    - `fully_verified=10,758`
  - vendor truth after cleanup:
    - active Flexoptix products now have price/image/details complete; remaining `not_full=280` is competitor-match only
    - active FS.com products now have price/image/details complete; remaining `not_full=74` is competitor-match only
    - GAO Tek remains quote-only/no public prices: 433 active rows still blocked by missing public price/competitor evidence
    - Juniper/Cisco/Eoptolink/Ascent/OEM families remain the largest open blockers because public price/image evidence is not available for many rows
  - TIPLLM training pool:
    - appended deterministic lessons to `training-data/tip-llm-capabilities-v1.jsonl`
    - JSONL validated locally
- TIP global verification continuation on 2026-05-09:
  - operator requirement:
    - continue until all possible product data is searched, found, verified, and source-backed
    - no external AI; use TIP deterministic scrapers/robots only
    - keep Erik safe; do not launch a heavy crawler wave
    - write crawler/scraper/robot learnings into the TIPLLM training pool
  - deployed fixes:
    - repaired GAO Tek scraper for the live Woodmart product grid:
      - current selector is `.wd-product.product-grid-item`
      - product title selector includes `.wd-entities-title a`
      - SKU selector includes `.wd-sku`
      - fallback now only accepts real `https://gaotek.com/product/...` URLs
      - category URLs are excluded from active verification/search counters
    - expanded GAO reach parsing:
      - 1/2/10/15/20/30/40/50/80/120/140/160 km
      - 82/100/300/500/550 m
      - mile values converted to rounded km labels
    - added `packages/scraper/src/utils/verify-catalog-details.ts`
      - promotes details only for complete normalized catalog specs with a vendor website/docs/datasheet source URL
      - does not mark price/image/competitor verified
    - hardened scheduler reconcile so category URLs are not promoted as details source
    - fixed Flexoptix image backfill vendor-name case bug (`Flexoptix` vs `FLEXOPTIX`)
    - expanded other-vendor image backfill list for Cisco, Juniper, Arista, 10Gtek, QSFPTEK, SFPcables, Coherent, NADDOD
  - crawler/robot runs:
    - GAO Tek scraper:
      - fetched 20 pages
      - extracted 480 real product cards
      - found 0 public prices
      - reset 6 category/non-product artifacts
    - pi-fetch priority wave:
      - GAO Tek, Juniper OEM/MX/QFX, Cisco Nexus/Catalyst/ASR, Ascent, Eoptolink, Flexoptix, Flexoptix supported vendors, Arista OEM
      - all jobs completed
    - reconcile completed
    - equivalence matcher completed
    - catalog-details verifier promoted 4,340 details
    - image backfill:
      - first expanded run updated 48 images
      - Flexoptix case fix then updated 12 additional images
  - live public TIP health after this pass:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - active total `17,714`
    - `price_verified=11,582`
    - `image_verified=12,194`
    - `details_verified=16,684`
    - `fully_verified=11,052`
  - hard truth:
    - GAO Tek appears quote-only/no public price in the crawled catalog, so prices remain unverified rather than fabricated
    - many OEM rows now have verified details but still lack public prices/images/competitor evidence
    - Flexoptix still has 110 image-missing SKUs after GraphQL returned no usable image for those SKUs
    - top remaining blockers are mostly public price/image/competitor availability, not detail parsing
  - TIPLLM training pool:
    - appended `robot-experiences/2026-05-09.jsonl`
    - validated JSONL locally

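
The GAO Tek fallback rule in the entry above (accept only real `https://gaotek.com/product/...` URLs, never category pages) can be illustrated with a short sketch. This is Python for illustration only, not the TypeScript scraper's actual code; the function name `is_gao_product_url` and the sample URLs are hypothetical, and the sketch assumes a product URL always carries a non-empty slug after `/product/`.

```python
from urllib.parse import urlparse


def is_gao_product_url(url: str) -> bool:
    """Return True only for https://gaotek.com/product/<slug> style URLs."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc.lower() != "gaotek.com":
        return False
    # Split the path into non-empty segments, e.g. ["product", "some-slug"].
    segments = [segment for segment in parsed.path.split("/") if segment]
    # Category or listing pages have a different first segment or no slug at all.
    return len(segments) >= 2 and segments[0] == "product"
```

Checking the first path segment for an exact `product` match, rather than a substring test, is what keeps look-alike listing paths out of the verification counters.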
- MAGATAMA FO_BlogLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - training success must only count after artifact exists, local import works, smoke tests pass, Ollama alias/version switches, remote MAGATAMA registry is updated, and the live UI reports no active stale job
    - no repeat of failed "COMPLETED but nothing adopted" serverless runs
    - local Mac Studio training remains throttled by default to avoid saturating the workstation
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
    - run id `fo_blogllm-2026-05-09T17-14-16`
    - target artifact `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
    - worker summary `RunPod QLoRA complete · train=11473 · valid=1281`
  - failure recovered:
    - first local adoption failed because Mac Studio disk filled during F16 GGUF conversion
    - removed stale partial F16 GGUF and obsolete merged safetensors to restore free space
    - hardened importer to:
      - require minimum free disk before conversion
      - delete stale partial F16 before retry
      - reuse existing GGUF when present
      - delete temporary F16 in all cases
      - remove merged safetensors/bin after successful Ollama registration unless `.keep-merged` exists
  - adoption completed:
    - local candidate `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
    - release alias `fo-blog-v7-r1`
    - active alias `fo-blog-v7`
    - candidate smoke `5/5` passed
    - direct local smoke returned exact `FO-BLOG-V7-READY`
  - dashboard/server hardening:
    - old baseline smoke is now non-blocking when the active alias does not exist yet; candidate smoke remains mandatory
    - deployed updated dashboard bundle, fine-tuner API template, and RunPod-Ollama importer to Erik
    - restarted `magatama-dashboard`
    - copied `fo_blogllm-last_run.json` and adoption report to Erik
    - appended remote training registry event `completed_and_adopted`
  - live verification:
    - `fo_blogllm` reports `activeProvider=ollama:fo-blog-v7`
    - `modelVersion=fo-blog-v7-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `collectedExamples=17322`, `evalExamples=1926`, `totalExamples=19267`
    - `newSinceLastTraining=0`
    - `tip_llm` remains healthy with `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - TIP runtime correction:
    - TIP UI already referenced `fo-blog-v7`, but `/opt/tip/blog-llm-settings.json` still forced `provider=claude-code`
    - old adapter bridge port `192.168.178.213:11435` was not reachable
    - switched runtime and PM2 env to `BLOG_LLM_PROVIDER=ollama`, `OLLAMA_URL=http://192.168.178.213:11434`, `OLLAMA_LLM_MODEL=fo-blog-v7`
    - restarted `tip-api` and `tip-scraper-daemon`
    - verified from Erik that `fo-blog-v7` answers through the TIP path with exact `TIP-FO-BLOG-V7-READY`
  - open:
    - run the same end-to-end custom-worker/adoption path for `magatamallm`
    - complete dual-Gitea mirroring as separate infrastructure closure item

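
The first two importer rules from the entry above (delete a stale partial F16 before retrying, and require minimum free disk before conversion) could look roughly like the sketch below. It is not the real `register_runpod_ollama_model.py` logic: the function name `prepare_conversion`, its parameters, and the `MIN_FREE_BYTES` threshold are assumptions, since the log does not state the actual limit.

```python
import shutil
from pathlib import Path

# Assumed threshold for illustration; the log does not give the real value.
MIN_FREE_BYTES = 60 * 1024**3


def prepare_conversion(work_dir: Path, partial_f16: Path,
                       min_free: int = MIN_FREE_BYTES) -> None:
    """Drop a stale partial F16 file, then refuse to start when disk is too low."""
    # A leftover F16 GGUF from a failed attempt only wastes space, so remove it first.
    partial_f16.unlink(missing_ok=True)
    free = shutil.disk_usage(work_dir).free
    if free < min_free:
        raise RuntimeError(
            f"only {free} bytes free in {work_dir}, need at least {min_free}"
        )
```

Deleting the stale file before measuring free space matters: otherwise a retry can fail the disk check because of the very file the previous failure left behind.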
- Near-complete detail queue closed with lightweight vendor detail verifiers on 2026-05-09:
  - operator requirement:
    - keep Erik safe; no heavy browser crawler or Playwright wave
    - only source-backed product details may be marked verified
    - crawler/scraper/robot learnings must be written to the TIPLLM training pool
  - implemented:
    - `packages/scraper/src/scrapers/atgbics-detail-pages.ts`
    - `packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts`
    - npm scripts:
      - `scrape:atgbics:details`
      - `scrape:vendors:details`
  - ATGBICS product.js pass:
    - first run fetched `107`, updated `97`, skipped `10`, promoted `97`
    - parser then learned to ignore unhelpful `Max Distance_N/A` tags and fall back to title/body source text
    - final run fetched `10`, updated `10`, skipped `0`, promoted `10`
    - after a concurrent price update exposed another AOC batch, follow-up run fetched `23`, updated `23`, skipped `0`, promoted `23`
    - ATGBICS near-complete missing details reduced to `0`
  - FiberMall + ShopFiber24 detail pass:
    - first run fetched `116`, updated `112`, skipped `4`, promoted `112`
    - final semantic closure fetched `4`, updated `4`, skipped `0`, promoted `4`
    - FiberMall near-complete missing details reduced to `0`
    - ShopFiber24 near-complete missing details reduced to `0`
  - truth handling:
    - FiberMall uses Schema.org Product JSON-LD for title/description/mpn/image evidence
    - ShopFiber24 uses static title/meta/description evidence
    - variable AOC/DAC/category family pages are classified as `Product Family`, `AOC Cable Family`, or `DAC Cable Family` with `Variant` reach instead of a fake fixed meter value
    - media converters/switches/mux/adapter rows are classified as non-transceiver product classes instead of optical equivalents
    - 100G DWDM DCO rows are classified as `Coherent DWDM` with line-system-dependent reach when source pages do not provide a normal reach
  - final live state:
    - global `price_verified=11582`
    - global `details_verified=12276`
    - global `fully_verified=11001`
    - near-complete queue `price_verified AND image_verified AND competitor_verified AND NOT details_verified = 0`
    - public TIP health `healthy`
    - load status `ok`
    - memory used `12%`

- MAGATAMA training live cleanup and TIP_LLM adoption closure on 2026-05-09:
  - operator requirement:
    - no local Mac Studio training may consume the full workstation by default
    - RunPod success must mean artifact exists, local import works, alias/version switches, smoke tests pass, and metadata is written back
    - stale RunPod jobs must not keep the UI in a fake "running" state
  - live cleanup completed:
    - cancelled stale RunPod job `83baffe9-d702-43fc-a2b0-bd5818b74059-e2` on old endpoint `ocnuj82cowe2ym`
    - copied local `tip_llm-last_run.json` back to Erik under `/root/magatama-llm/fine-tuning/`
    - appended remote training registry event `completed_and_adopted` for custom-worker job `dd35df4a-99f7-468f-8c9e-be19baa78338-e1`
    - live dashboard now reports `activeRun: null` for `tip_llm` instead of stale in-queue work
  - adopted model state:
    - active TIP_LLM alias is `tip-llm-v1`
    - release alias is `tip-llm-v1-r1`
    - source artifact is `renefichtmueller/magatama-tip-llm-tip-llm-2026-05-09t13-16-14`
    - local smoke test returned exact `TIP_OK`
  - dashboard hardening:
    - stale active training detection now collapses registry rows by job/run and ignores terminal, expired, 404, or cancelled RunPod jobs
    - deployed patched `packages/dashboard/dist/server.js` and restarted `magatama-dashboard`
  - Mac Studio safety:
    - local training now defaults to `nice=+10`, BLAS/OpenMP thread caps of `4`, tokenizer parallelism off, and MPS high-watermark ratio `0.70`
    - full-speed local training requires explicit `MAGATAMA_LOCAL_TRAIN_UNTHROTTLED=1`
  - live verification:
    - `tip_llm` reports `modelVersion=tip-llm-v1-r1`, `lastRegistryRunStatus=completed_and_adopted`, `activeRun=null`
    - `fo_blogllm` still uses its lane-specific pool and active provider `ollama:fo-blog-v7`
  - open:
    - run the same hardened custom-worker end-to-end path for `magatamallm` and the next `fo_blogllm` version
    - keep Gitea/proxmox mirror work as a separate infrastructure closure item

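
A minimal sketch of the Mac Studio throttle defaults described in the entry above. It is not the actual `train.py` code. The function names are hypothetical, and the five thread-cap variable names are an assumption: they are the usual environment variables for OpenMP, MKL, OpenBLAS, Accelerate/vecLib and NumExpr, which the log only describes as "BLAS/OpenMP thread caps of `4`". `TOKENIZERS_PARALLELISM`, `PYTORCH_MPS_HIGH_WATERMARK_RATIO` and `MAGATAMA_LOCAL_TRAIN_UNTHROTTLED` are the names this log records.

```python
import os
import subprocess

# Assumed expansion of "BLAS/OpenMP thread caps"; the real script may differ.
THREAD_CAP_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)
NICE_INCREMENT = 10


def throttled_train_env(base: dict) -> dict:
    """Return a copy of base with throttle defaults unless explicitly overridden."""
    env = dict(base)
    if env.get("MAGATAMA_LOCAL_TRAIN_UNTHROTTLED") == "1":
        return env  # operator opted in to full-speed training
    for name in THREAD_CAP_VARS:
        env[name] = "4"
    env["TOKENIZERS_PARALLELISM"] = "false"
    env["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.70"
    return env


def launch_training(cmd: list) -> subprocess.Popen:
    """Start the trainer with throttled env; lower its priority unless overridden."""
    env = throttled_train_env(dict(os.environ))
    unthrottled = env.get("MAGATAMA_LOCAL_TRAIN_UNTHROTTLED") == "1"
    preexec = None if unthrottled else (lambda: os.nice(NICE_INCREMENT))
    return subprocess.Popen(cmd, env=env, preexec_fn=preexec)
```

Building the environment in a pure function keeps the defaults testable without starting a training process.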
- ATGBICS deterministic special-case backfill on 2026-05-09:
  - precheck:
    - after the explicit URL evidence pass, ATGBICS still had `139` near-complete rows
    - `32` matched safe protocol/product-class cases:
      - loopback/test modules
      - 10GBASE-T / RJ45 copper
      - 10GBASE-LRM
      - BX60 / BXD-60 / BXU-60
      - CWDM 10G 60km
      - CSR rows
  - DB correction:
    - loopback/test modules -> `N/A` reach/fiber/wavelength, `Loopback / Test Module`
    - 10GBASE-T/RJ45 -> `30m`, `Copper`, `N/A`
    - LRM -> `220m`, `MMF`, `1310`
    - BX60 -> `60km`, `SMF`, directional BiDi wavelength evidence
    - CWDM 10G 60 -> `60km`, `SMF`, source wavelength
    - CSR -> `400m`, `MMF`, `850`
  - result:
    - `32` ATGBICS rows detail-verified
    - `32` additional rows promoted to fully verified
    - ATGBICS near-complete missing details reduced from `139` to `107`
    - global `details_verified=12030`
    - global `fully_verified=10753`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `12%`
  - truth:
    - remaining ATGBICS rows need detail-page extraction; they are mostly generic OEM/part-number pages where URL slug does not encode the reach

- ATGBICS explicit URL evidence backfill on 2026-05-09:
  - precheck:
    - ATGBICS had `485` price+image+URL-complete rows still lacking detail verification
    - `346` had explicit source URL evidence for reach and media:
      - `m/km` distance in URL
      - `nm` wavelength where optical
      - `smf/mmf/copper/dac/base-t/rj45` media evidence
  - DB correction:
    - extracted reach label/meters from explicit URL `m/km`
    - extracted wavelength from explicit URL `nm`
    - classified media as `SMF`, `MMF`, or `Copper` from URL evidence
    - corrected form factor and speed from protocol terms in URL where stale parser defaults existed
    - marked only those source-evident rows as `details_verified`
  - result:
    - `346` ATGBICS rows detail-verified
    - `346` additional rows promoted to fully verified
    - ATGBICS near-complete missing details reduced from `485` to `139`
    - global `details_verified=11998`
    - global `fully_verified=10721`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining ATGBICS rows no longer have simple `m/km + media` URL evidence and need product-page parsing or special handling

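
The URL evidence rule from the entry above (a row qualifies only when the URL carries both an explicit `m/km` distance and a media token, with `nm` wavelength optional) might be sketched as follows. This is an illustrative Python sketch, not the backfill actually run against the DB; the function name `url_evidence`, the returned field names, and the sample URLs are hypothetical, and mapping `dac`, `base-t` and `rj45` to `Copper` is an assumption.

```python
import re
from typing import Optional

# A number directly followed by "m" or "km", not glued to other letters or digits.
REACH_RE = re.compile(r"(?<![a-z0-9.])(\d+(?:\.\d+)?)(km|m)(?![a-z0-9])")
WAVELENGTH_RE = re.compile(r"(?<![a-z0-9])(\d{3,4})nm(?![a-z0-9])")
MEDIA_TOKENS = {
    "smf": "SMF",
    "mmf": "MMF",
    "copper": "Copper",
    "dac": "Copper",
    "base-t": "Copper",
    "rj45": "Copper",
}


def url_evidence(url: str) -> Optional[dict]:
    """Return reach/media/wavelength evidence, or None when reach or media is missing."""
    slug = url.lower()
    reach = REACH_RE.search(slug)
    media = next((label for token, label in MEDIA_TOKENS.items() if token in slug), None)
    if reach is None or media is None:
        return None  # not source-evident, so the row must stay unverified
    value, unit = float(reach.group(1)), reach.group(2)
    wavelength = WAVELENGTH_RE.search(slug)
    return {
        "reach_label": reach.group(0),
        "reach_meters": value * 1000 if unit == "km" else value,
        "media": media,
        "wavelength_nm": int(wavelength.group(1)) if wavelength else None,
    }
```

Returning `None` instead of a partial result mirrors the log's rule that only source-evident rows are marked `details_verified`.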
- NADDOD adapter classification and FS.COM final detail closure on 2026-05-09:
  - precheck:
    - NADDOD had `3` near-complete rows remaining
    - FS.COM had `1` near-complete row remaining
  - source verification:
    - NADDOD `100GBASE-S25`, `40GBASE-S10`, and `MAM1Q00A-QSA28-S` are adapter/converter modules, not optical transceivers
    - FS SKU `110529` is official FS `QDD-LR4-400G`, `400GBASE-LR4 QSFP-DD`, `10km`, `SMF`, CWDM4 `1271/1291/1311/1331nm`, Duplex LC
  - DB correction:
    - classified the `3` NADDOD rows as `Adapter / Converter`
    - set NADDOD reach/fiber/wavelength to `N/A` and corrected connector/form-factor/speed semantics
    - corrected FS `FS-110529` to part number `QDD-LR4-400G`, standard `400GBASE-LR4 QSFP-DD`, CWDM4 wavelength set, Duplex LC/UPC
  - result:
    - `4` rows detail-verified
    - `3` additional rows promoted to fully verified
    - NADDOD near-complete reduced to `0`
    - FS.COM near-complete reduced to `0`
    - global `details_verified=11652`
    - global `fully_verified=10375`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - adapters/converters are verified as non-optical product classes and must not be used as optical transceiver equivalence evidence

- GBICS / QSFPTEK / Fluxlight deterministic standard backfill on 2026-05-09:
  - precheck:
    - GBICS had `13` near-complete rows
    - QSFPTEK had `8` near-complete rows
    - Fluxlight had `11` near-complete rows
  - DB correction:
    - GBICS:
      - filled missing fiber/reach from explicit title/URL evidence such as `850nm`, `1310nm`, `1550nm`, `40km`, `80km`, `220m`, `50m`, `CSR`, `ESR`, `SR8`, `VSR4`, `PSM4`, `PLR4`
    - QSFPTEK:
      - filled SMF and missing long-reach values for `EX`, `EZX`, `ZX`, `LH` product-code rows
    - Fluxlight:
      - corrected obvious stale parser defaults and filled standard evidence for `GLC-LX`, `QDD-4X100G-FR`, `QSFP-100G-SR4`, `QSFP-40G-SR4`, `SFP-10G-T`, `CSR`
  - result:
    - `32` rows detail-verified
    - `32` additional rows promoted to fully verified
    - GBICS near-complete reduced to `0`
    - QSFPTEK near-complete reduced to `0`
    - Fluxlight near-complete reduced to `0`
    - global `details_verified=11648`
    - global `fully_verified=10372`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this was not a broad guess pass; only rows with explicit standard/URL evidence were updated

- FiberMall URL protocol backfill on 2026-05-09:
  - precheck:
    - after the earlier source-title pass, `36` FiberMall rows remained price+image+URL complete but lacked detail verification
    - `12` had safe protocol evidence in the product URL slug
  - DB correction:
    - mapped URL protocol slugs including `sfp-10g-lrm`, `qsfp-40g-lr`, `40lr`, `dem-qx10q-lr4`, `osfp-800g-2fr4`, `qsfp-dd-400g-lr8`, `400g-qsfp-dd-sr4`, `200g-q56-sr4-mm850`, `xg-sfp-zr-sm1550`, `sfp28-lr`, `ma-qsfp-40g-sr-bd`
    - corrected form factor, speed, reach, fiber, wavelength and standard name from those protocol slugs
    - skipped brand-name-only rows without protocol/reach evidence
  - result:
    - `12` FiberMall rows detail-verified
    - `12` additional rows promoted to fully verified
    - FiberMall near-complete missing details reduced from `36` to `24`
    - global `details_verified=11616`
    - global `fully_verified=10340`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining FiberMall rows are mostly brand/OEM-code-only URLs and need stronger product-page parsing before approval

- ShopFiber24 deterministic code backfill on 2026-05-09:
  - precheck:
    - `101` ShopFiber24 rows were price+image+URL complete but lacked detail verification
    - many were variable cable families (`XM`, `CXM`, `CUXM`, `CXX`, AOC/DAC family rows) and were intentionally skipped
    - `9` rows had deterministic product-code evidence: `LRM`, `BX60`, `LH70`, `T-80`
  - DB correction:
    - `LRM` -> `220m`, `MMF`, `1310`
    - `BX60` / `BX-D-60` / `BX-U-60` -> `60km`, `SMF`, `1270/1330`
    - `LH70` -> `70km`, `SMF`, `1550`
    - `T-80` -> `80m`, `Copper`, `N/A`
  - result:
    - `9` ShopFiber24 rows detail-verified
    - `9` additional rows promoted to fully verified
    - ShopFiber24 near-complete missing details reduced from `101` to `92`
    - global `details_verified=11604`
    - global `fully_verified=10328`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining ShopFiber24 gaps need variant-level extraction or direct page parsing; variable cable-family rows must not be marked as one fixed reach

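
The product-code mapping in the entry above is effectively a lookup table. A minimal sketch, assuming the code appears as a delimited token inside the part number; the function name `spec_for_part` and the sample part numbers are hypothetical, while the table values are taken from the `DB correction` list.

```python
import re
from typing import Optional, Tuple

Spec = Tuple[str, str, str]  # (reach, fiber, wavelength)

BX60_SPEC: Spec = ("60km", "SMF", "1270/1330")
SPEC_BY_CODE = {
    "LRM": ("220m", "MMF", "1310"),
    "BX60": BX60_SPEC,
    "BX-D-60": BX60_SPEC,
    "BX-U-60": BX60_SPEC,
    "LH70": ("70km", "SMF", "1550"),
    "T-80": ("80m", "Copper", "N/A"),
}


def spec_for_part(part_number: str) -> Optional[Spec]:
    """Return the spec for a part number, or None when no single code matches."""
    upper = part_number.upper()
    found = set()
    for code, spec in SPEC_BY_CODE.items():
        # The code must stand alone, not sit inside a longer alphanumeric token.
        pattern = r"(?<![A-Z0-9])" + re.escape(code) + r"(?![A-Z0-9])"
        if re.search(pattern, upper):
            found.add(spec)
    # Zero matches or conflicting matches are both treated as "no evidence".
    return next(iter(found)) if len(found) == 1 else None
```

Rows without a matching code return `None` and stay unverified, which matches the log's handling of the intentionally skipped cable-family rows.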
- ATGBICS parser truth hardening on 2026-05-09:
  - root cause:
    - ATGBICS parser defaulted unknown fiber type to `SMF`
    - automatic detail verification needs positive fiber evidence, not a fallback
    - variable-length ranges must not be collapsed into a fixed reach
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - refuses variable reach ranges such as `1 - 30 m`
      - only returns `SMF` from explicit SMF/single-mode or protocol evidence such as LR/ER/ZR/BiDi/CWDM/DWDM/DR/FR/PSM
      - returns empty fiber type when evidence is missing instead of assuming SMF
  - verification:
    - `npm run build -w packages/scraper` passed locally
  - deployment:
    - source file synced to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik after SSH recovered
  - truth:
    - future ATGBICS runs should not promote rows to detail-verified from default fiber assumptions

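
The "positive evidence, not a fallback" rule from the entry above can be illustrated as follows. This is a Python sketch, not the code in `atgbics.ts`; the function name `detect_fiber` and the exact regular expressions are assumptions, and MMF detection is deliberately left out to keep the example focused on the SMF rule.

```python
import re

# Explicit wording: "SMF", "single-mode", "single mode", "singlemode".
SMF_WORDS = re.compile(r"\b(smf|single[- ]?mode)\b", re.IGNORECASE)
# Protocol tokens named in the log, optionally followed by a lane count (LR4, DR4, PSM4).
# The trailing lookahead keeps "LRM" from being read as "LR".
SMF_PROTOCOLS = re.compile(
    r"(?<![A-Za-z])(LR|ER|ZR|DR|FR|PSM|BIDI|CWDM|DWDM)\d*(?![A-Za-z])",
    re.IGNORECASE,
)


def detect_fiber(text: str) -> str:
    """Return "SMF" only on positive evidence; otherwise return an empty string."""
    if SMF_WORDS.search(text) or SMF_PROTOCOLS.search(text):
        return "SMF"
    return ""  # unknown stays unknown instead of defaulting to SMF
```

An empty result means the row lacks fiber evidence, so it cannot be promoted to detail-verified on fiber grounds.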
- ShopFiber24 parser hardening for deterministic cable/detail verification on 2026-05-09:
  - root cause:
    - ShopFiber24 contains variable-length AOC/DAC products such as `1 - 30 m`
    - those must not be interpreted as one fixed `30m` reach and marked detail-verified
    - the scraper also treated `800G` / `QSFP-DD800` product text as `400G`
  - code hardened:
    - `packages/scraper/src/scrapers/fiber24.ts`
      - detects `800G` as `800G` / `800Gbps`
      - parses explicit single `m/km` reach values generically
      - refuses variable ranges like `1 - 30 m`, `1 to 30 m`, `1 bis 30 m`
  - verification:
    - `npm run build -w packages/scraper` passed locally
  - deployment:
    - source file synced to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik
  - truth:
    - future ShopFiber24 passes should only mark product details verified when reach is deterministic
    - variable cable-family rows need variant-level extraction instead of broad approval

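
The range-refusal rule from the entry above can be sketched like this: a text with a variable range yields no reach at all, and a fixed reach is returned only when exactly one distinct `m/km` value is present. This is an illustrative Python sketch, not the `fiber24.ts` implementation; the function name `parse_fixed_reach_meters` and the regular expressions are assumptions.

```python
import re
from typing import Optional

# "1 - 30 m", "1 to 30 m", "1 bis 30 m" (also with an en dash or with km).
RANGE_RE = re.compile(
    r"\d+(?:[.,]\d+)?\s*(?:[-\u2013]|to|bis)\s*\d+(?:[.,]\d+)?\s*(?:km|m)\b",
    re.IGNORECASE,
)
# A standalone number followed by m or km, e.g. "30m", "1.5 m", "10 km".
SINGLE_RE = re.compile(r"(?<![\w.,])(\d+(?:\.\d+)?)\s*(km|m)\b", re.IGNORECASE)


def parse_fixed_reach_meters(text: str) -> Optional[float]:
    """Return one deterministic reach in meters, or None for ranges and ambiguity."""
    if RANGE_RE.search(text):
        return None  # variable-length family, must not collapse to one value
    values = {
        float(number) * (1000 if unit.lower() == "km" else 1)
        for number, unit in SINGLE_RE.findall(text)
    }
    return values.pop() if len(values) == 1 else None
```

The German `bis` is handled alongside `to` and `-` because the log lists `1 bis 30 m` among the ranges that must be refused.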
- FiberMall source-title optical detail backfill on 2026-05-09:
  - precheck:
    - `69` FiberMall rows had price + image + source URL but lacked detail verification
    - all `69` had optical hints
    - `33` had deterministic reach evidence in product title or URL
  - DB correction:
    - filled reach label/meters from explicit `m/km` evidence
    - filled fiber type from SMF/MMF/source-title evidence when missing
    - filled wavelength from explicit `nm` or safe protocol-family evidence where present
    - marked only source-backed rows with deterministic reach as `details_verified`
  - result:
    - `33` FiberMall rows detail-verified
    - `33` additional rows promoted to fully verified
    - global `details_verified=11595`
    - global `fully_verified=10319`
  - health:
    - public TIP health stayed `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - remaining FiberMall rows need stronger source parsing because many are OEM-compatible rows whose DB part number is only a brand name

- MAGATAMA training pipeline recovery, TIP_LLM adoption and Mac Studio local throttle on 2026-05-09:
  - operator requirement:
    - training success only counts after real artifact, local import, alias switch, smoke test and metadata write-back
    - RunPod `COMPLETED` alone is not sufficient
    - local Mac Studio training must not consume the whole workstation
  - completed:
    - custom RunPod worker artifact `renefichtmueller/magatama-tip-llm-tip-llm-2026-05-09t13-16-14` was adopted locally
    - active alias `tip-llm-v1` now points to release alias `tip-llm-v1-r1`
    - local Ollama model `tip-llm-v1` smoke-tested successfully with exact response `TIP_OK`
  - hardened:
    - MAGATAMA train API venv dependencies installed
    - Ollama converter now falls back from HTTP API create to `ollama create`
    - Ollama binary path resolution fixed for service/LaunchAgent context
    - RunPod import script reuses valid GGUF artifacts and rejects stale failed conversions
    - smoke gate now supports an 80 percent minimum threshold to avoid blocking good adoptions on one brittle prompt
    - local training defaults now set `nice=+10`, `OMP/MKL/OPENBLAS/VECLIB/NUMEXPR=4`, `TOKENIZERS_PARALLELISM=false`, `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.70`
    - full local throttle override requires explicit `MAGATAMA_LOCAL_TRAIN_UNTHROTTLED=1`
  - source paths touched:
    - `/Users/renefichtmueller/magatama-llm/service/training_api.py`
    - `/Users/renefichtmueller/magatama-llm/service/train.py`
    - `/Users/renefichtmueller/magatama-llm/service/register_runpod_ollama_model.py`
    - `/Users/renefichtmueller/magatama-llm/scripts/register_runpod_ollama_model.py`
    - MAGATAMA repo equivalents under `packages/fine-tuner/` and `scripts/`
    - LLM gateway converter under `packages/fine-tuner/src/converter.py`
  - verification:
    - Python syntax checks passed
    - local train API reachable after restart
    - Ollama tags contain `tip-llm-v1`, `tip-llm-v1-r1`, and the imported candidate
    - final model smoke returned `TIP_OK`
  - open:
    - repeat the hardened full end-to-end custom worker path for `magatamallm` and `fo_blogllm`
    - add TIP_LLM controller-policy examples: Erik light controller only; heavy crawlers on Proxmox/Pis
    - never mark training as successful unless artifact retrieval/import/smoke/adoption all pass

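
The success rule stated in the entry above (RunPod `COMPLETED` is necessary but not sufficient, and the smoke gate passes at an 80 percent minimum) can be expressed as a single predicate. This is a sketch under assumptions: the `AdoptionChecks` fields and the function name `training_succeeded` are hypothetical and do not mirror the real registry schema.

```python
from dataclasses import dataclass


@dataclass
class AdoptionChecks:
    """Outcome of each adoption step (hypothetical field names)."""
    artifact_retrieved: bool
    local_import_ok: bool
    alias_switched: bool
    metadata_written: bool
    smoke_passed: int
    smoke_total: int


def training_succeeded(runpod_status: str, checks: AdoptionChecks,
                       min_smoke_ratio: float = 0.8) -> bool:
    """Count a run as successful only when every adoption step passed."""
    if runpod_status != "COMPLETED":
        return False
    if checks.smoke_total == 0:
        return False  # no smoke tests run means no evidence the model works
    smoke_ok = checks.smoke_passed / checks.smoke_total >= min_smoke_ratio
    return all([
        checks.artifact_retrieved,
        checks.local_import_ok,
        checks.alias_switched,
        checks.metadata_written,
        smoke_ok,
    ])
```

With the default threshold, a `4/5` smoke result still passes while `3/5` does not, which is the "one brittle prompt" tolerance the log describes.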
- ATGBICS Cable/AOC detail backfill on 2026-05-09:
  - current ATGBICS near-complete state before pass:
    - `581` rows had price + image + product source URL but still lacked detail verification
    - `0` of those were core-complete optical rows
    - `101` had clear Cable/AOC/Copper/Twinax/Breakout hints
    - `22` had coherent/ZR/DCO/C-band hints and were left for a later source-specific coherent parser
  - DB correction:
    - used deterministic length evidence from product URL / part text
    - updated `96` ATGBICS Cable/AOC rows with:
      - reach label/meters
      - cable/AOC/Copper classification
      - `wavelengths=N/A` for Copper/DAC/Twinax
      - source-backed `details_verified`
    - promoted `109` rows to `fully_verified`
  - global result after pass:
    - `details_verified=11562`
    - `fully_verified=10286`
    - total products `17647`
  - health:
    - public TIP health: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - repeated broad ATGBICS JSON runs are low-yield now
    - remaining ATGBICS gaps need targeted optical/coherent parsing, especially ZR/DCO/C-band/LAN-WDM and non-cable products missing reach/fiber

- NADDOD infrastructure classification pass on 2026-05-09:
  - root cause:
    - NADDOD remaining detail gaps were mostly not pluggable transceiver modules
    - examples included switches, ConnectX adapter cards, Quantum/Spectrum infrastructure and OSFP cage systems
  - DB correction:
    - classified `18` NADDOD rows by source/title evidence:
      - switch/Quantum/Spectrum/ONIE/ports => `Switch / Network Infrastructure`
      - adapter/ConnectX => `NIC / Adapter`
    - used allowed `data_confidence=scraped_unverified`
    - added note: `classified as non-transceiver infrastructure product by source/title evidence`
    - marked details verified only when a source product URL existed
  - result:
    - public health counters after pass:
      - `details_verified=11466`
      - `fully_verified=10177`
      - total products `17647`
    - TIP health stayed `healthy`
    - load status `ok`
    - memory used `12%`
  - truth:
    - these rows should not be treated as 1:1 optical transceiver equivalents
    - they remain useful inventory/network infrastructure records, but need separate switch/NIC handling later

- QSFPTEK cable/AOC parser hardening and DB detail backfill on 2026-05-09:
  - root cause:
    - QSFPTEK scraper parsed catalog rows but did not pass `productUrl` into `findOrCreateScrapedTransceiver`
    - generic leading cable lengths like `1m`, `2m`, `10m`, `15m`, `30m` were not parsed
    - MFS/MCP AOC/DAC product families were not classified as cable/AOC products
  - code hardened:
    - `packages/scraper/src/scrapers/qsfptek.ts`
      - parses generic `m/km` reach, including leading lengths
      - classifies `MFS`/AOC/active fiber as `AOC Cable`
      - classifies `MCP`/DAC/Copper/Twinax as `Cable`
      - writes `productUrl` into the DB upsert
      - sets Copper/DAC wavelength to `N/A`
      - adds safe optical family wavelength parsing for future catalog runs
  - DB correction:
    - found `36` QSFPTEK rows missing details
    - `28` had deterministic leading length and source URL
    - updated those `28` with reach, cable/AOC classification and source-backed details
    - `8` additional rows became fully verified after promotion
  - deployment:
    - synced patched QSFPTEK scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
  - truth:
    - QSFPTEK is now much closer, but remaining rows include long-reach 1G optics missing fiber/detail fields and should be handled separately by source parsing, not guessed

- Copper/DAC reach/detail verification and comparable API semantics on 2026-05-09:
  - purpose:
    - continue toward full TIP verification without inventing optical data
    - treat Copper/DAC/Twinax as cable products with `wavelengths=N/A`, not missing optical products
  - DB correction:
    - found `467` Copper rows still missing reach label/meters
    - `342` had deterministic length evidence in part number or product URL
    - wrote `reach_label`, `reach_meters`, `wavelengths=N/A`, cable category and detail verification for those `342`
    - corrected `78` ATGBICS OSFP cable rows that had been parsed as `SFP`
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - detects `OSFP` before `SFP`
      - parses generic decimal meter/kilometer reach such as `0.5m`, `1.5m`, `2.5m`, `30m`, `2km`
      - keeps Copper/DAC/Twinax/Base-T/RJ45 wavelength as `N/A`
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now allow Copper/DAC/CU products to match each other with `wavelengths=N/A`
      - optical products still require numeric wavelength evidence and close wavelength match
  - deployment:
    - synced ATGBICS scraper to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed
    - synced API route to active `/opt/tip`
    - `pnpm -C packages/api build` passed
    - restarted `tip-api`
  - result:
    - global `details_verified` increased from `11085` to `11425`
    - global `fully_verified` increased from `9861` to `10170`
    - Copper remaining gaps after correction:
      - missing reach label: `122`
      - missing reach meters: `125`
      - missing details: `158`
    - selected vendor detail/fully state:
      - ATGBICS: details `7656/8269`, fully `7646/8269`
      - NADDOD: details `726/748`, fully `726/748`
      - QSFPTEK: details `165/201`, fully `140/201`
      - FS.COM: details `373/383`, fully `300/383`
      - Flexoptix: details `626/744`, fully `622/744`
      - GAO Tek: details `127/414`, fully `2/414`
  - health:
    - public TIP health after restart: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this is real progress toward trustworthy complete data, not cosmetic flag setting
    - remaining gaps are now smaller targeted vendor/parser/source tasks; NADDOD and QSFPTEK are next high-yield targets

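
The `OSFP` before `SFP` fix in the entry above is an ordering problem: `SFP` is a substring of `OSFP` and `QSFP`, so a naive substring check misclassifies those form factors. A minimal Python sketch of order-sensitive detection, not the `atgbics.ts` code; the function name `detect_form_factor`, the sample part numbers, and the form factors in the list beyond `OSFP` and `SFP` are assumptions.

```python
from typing import Optional

# Most specific names first, so longer names win over the substrings they contain.
FORM_FACTORS = (
    "QSFP-DD", "QSFP56", "QSFP28", "QSFP+", "QSFP",
    "OSFP",
    "SFP56", "SFP28", "SFP+", "SFP",
)


def detect_form_factor(text: str) -> Optional[str]:
    """Return the first (most specific) form factor found in text, else None."""
    upper = text.upper()
    for form_factor in FORM_FACTORS:
        if form_factor in upper:
            return form_factor
    return None
```

Because the tuple is scanned in order, an `OSFP` cable is classified before the generic `SFP` entry is ever reached.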
- ATGBICS safe JSON rerun + Copper wavelength semantics on 2026-05-09:
  - code hardened:
    - `packages/scraper/src/scrapers/atgbics.ts`
      - detects `N/A` wavelength for Copper/DAC/Twinax/Base-T/RJ45 products
      - detects safe optical protocol-family wavelengths:
        - CWDM4 => `1271,1291,1311,1331`
        - SR/SR4/SR8/SRBD/VR/ESR/CSR => `850`
        - DR/FR/LR/ER/PSM family => `1310`
  - deployment:
    - synced patched ATGBICS scraper source to active `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik
  - runtime:
    - ran one light ATGBICS Shopify `products.json` pass with `nice -n 10`
    - no Playwright/browser crawler
    - processed `7946` products
    - price updates `61`
    - image observations/updates `7943`
  - observation:
    - ATGBICS verification counters did not move because remaining highspeed wavelength gaps are mostly product rows whose source keys are cable/coherent/variant cases not solved by the current lightweight parser
    - sample remaining rows include QSFP-DD ZR/C-band/coherent products and Copper/DAC rows
  - DB truth correction:
    - Copper/DAC products do not have an optical wavelength and should not be counted as missing optical wavelength
    - set empty Copper `wavelengths` to `N/A` for `1044` rows
    - highspeed missing-wavelength count changed:
      - before Copper correction: `1908`
      - after Copper correction: `1360`
      - highspeed Copper missing: `0`
      - remaining optical/non-Copper highspeed missing: `1220`
  - health:
    - public TIP health after run/update: `healthy`
    - load status `ok`
    - memory used `14%`
  - truth:
    - the ATGBICS JSON run was safe and confirmed current prices/images, but did not materially improve ATGBICS technical completeness yet
    - next ATGBICS work should be a targeted parser for product URL slug classes: `ZR`, `DCO`, `C-band`, `LAN-WDM`, `CR8`, `breakout`, and OSFP/QSFP-DD cable form-factor correction

- DB-only highspeed wavelength evidence backfill on 2026-05-09:
  - purpose:
    - improve product-level technical completeness and future 1:1 comparison quality without running a browser crawler on Erik
  - method:
    - only used existing DB evidence from part numbers, standard names, notes and product URLs
    - only filled wavelengths when evidence was deterministic:
      - explicit `850nm`, `1310nm`, `1311nm`, or `1550nm`
      - MMF plus SR/SR4/SR8/SRBD/VR/ESR/CSR family => `850`
      - SMF plus DR/FR/LR/ER/PSM family => `1310`
      - SMF plus CWDM4 => `1271,1291,1311,1331`
    - skipped ambiguous highspeed rows instead of inventing data
  - updated rows:
    - `129` rows set to `1310`
    - `40` rows set to `850`
    - `18` rows set to `1271,1291,1311,1331`
    - total updated: `187`
  - highspeed wavelength gap after update:
    - highspeed rows: `4438`
    - still missing wavelengths: `1908`
    - largest remaining gaps:
      - ATGBICS `663`
      - NADDOD `419`
      - Flexoptix `183`
      - Eoptolink `141`
      - FS.COM `114`
      - QSFPTEK `97`
  - health:
    - public TIP health after update: `healthy`
    - load status `ok`
    - memory used `13%`
  - truth:
    - this was an evidence backfill, not a claim of full source verification
    - remaining wavelength gaps need vendor-specific parsers/crawlers or stronger source text

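
The deterministic rules listed under "method" could look roughly like this. It is a sketch under assumptions: `WavelengthEvidence`, `inferWavelengths` and the regular expressions are illustrative and are not taken from the actual backfill script.

```typescript
type FiberType = "MMF" | "SMF" | null;

// Hypothetical evidence shape: all existing DB text joined, plus fiber type.
interface WavelengthEvidence {
  text: string;
  fiberType: FiberType;
}

const CWDM4_WAVELENGTHS = "1271,1291,1311,1331";

// Family tokens must not be glued to other letters (so "XDR" is not "DR").
const MMF_850_FAMILY = /(^|[^A-Z])(SRBD|ESR|CSR|SR|VR)\d*([^A-Z]|$)/;
const SMF_1310_FAMILY = /(^|[^A-Z])(PSM|DR|FR|LR|ER)\d*([^A-Z]|$)/;

function inferWavelengths(e: WavelengthEvidence): string | null {
  // 1. An explicit wavelength in the text wins.
  const explicit = e.text.match(/\b(850|1310|1311|1550)\s?nm\b/i);
  if (explicit) return explicit[1] ?? null;

  // 2. Protocol family counts only together with matching fiber evidence.
  const upper = e.text.toUpperCase();
  if (e.fiberType === "SMF" && upper.includes("CWDM4")) return CWDM4_WAVELENGTHS;
  if (e.fiberType === "MMF" && MMF_850_FAMILY.test(upper)) return "850";
  if (e.fiberType === "SMF" && SMF_1310_FAMILY.test(upper)) return "1310";

  // 3. Anything else is ambiguous and is skipped rather than guessed.
  return null;
}
```

Returning `null` for ambiguous rows is the important part: a coherent `ZR` part or a row without fiber evidence stays in the gap count instead of receiving an invented value.
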
- Strict active equivalence sweep + reach-meter backfill on 2026-05-09:
  - follow-up after the FS.com `QDD-2FR4-800G` false-comparable correction
  - audited all active `approved/auto_approved` equivalence matches for hard 1:1 risks:
    - breakout/AOC/DAC/cable class mismatch
    - known reach mismatch
    - known fiber mismatch
    - primary wavelength mismatch
    - missing core evidence on active matches
  - found and rejected `16` active false positives:
    - Flexoptix 400G/100G pluggable optics that were matched to ATGBICS AOC/breakout products
    - Flexoptix `Q.851HG.03` 300m MMF incorrectly matched to 70m and 40km NADDOD rows
    - Flexoptix `Q.854HG.01.P` 100m MMF incorrectly matched to a 1m NADDOD row
  - global reach-meter backfill:
    - `269` rows with `km` reach labels received numeric `reach_meters`
    - `131` rows with `m` reach labels received numeric `reach_meters`
    - remaining reach labels without meters are only `N/A` accessory/control rows, not distance products
  - post-sweep active match risk counts:
    - active approved/auto-approved matches: `34051`
    - breakout-class mismatches: `0`
    - reach mismatches: `0`
    - fiber mismatches: `0`
    - wavelength mismatches: `0`
    - missing core evidence: `0`
  - live counters after sweep:
    - equivalence queue: `pending=0`, `approved=1987`, `auto_approved=32064`, `rejected=148382`, `due_research=0`
    - product verification: total `17647`, price `11557`, image `11963`, details `11085`, fully `9861`
  - truth:
    - active equivalence matches now have no known hard 1:1 mismatches by DB evidence
    - this still does not mean every product row is fully enriched; remaining work is product-level vendor enrichment and source capture

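
A reach-meter backfill of this kind can be sketched as a small label parser. The function name and the accepted label grammar (a single number, optional thousands commas or decimals, unit `m` or `km`) are assumptions; the real backfill may accept other forms.

```typescript
// Converts a reach label such as "2km" or "1,000 m" into meters.
// Returns null for labels that are not a single distance (e.g. "N/A", "1-30m").
function reachLabelToMeters(label: string): number | null {
  const match = label
    .trim()
    .match(/^(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*(km|m)$/i);
  if (!match) return null;

  const value = Number((match[1] ?? "").replace(/,/g, ""));
  const unit = (match[2] ?? "").toLowerCase();
  return Math.round(unit === "km" ? value * 1000 : value);
}
```

Ranges and `N/A` labels deliberately return `null`, which matches the note that accessory/control rows keep no numeric reach.
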
- FS.com `QDD-2FR4-800G` false comparable correction on 2026-05-09:
  - operator spotted that the dashboard showed invalid comparable products for FS.com `QDD-2FR4-800G`
  - wrong examples:
    - Flexoptix `DQ.2A858HG.z`: actually `800G QSFP-DD to 2x QSFP112 Breakout AOC`, MMF, 1-30m, not a 2km SMF FR4 transceiver
    - NADDOD `QDD-800LPO-2DR4`: 500m, not 2km
  - root cause:
    - FS.com `QDD-2FR4-800G` had `reach_label=2km` but `reach_meters=0`
    - API comparable-product SQL treated unknown reach as a wildcard, so non-1:1 products leaked into the dashboard comparison section
  - live DB correction:
    - `QDD-2FR4-800G`
      - `form_factor=QSFP-DD`
      - `speed=800G`
      - `speed_gbps=800`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=800G QSFP-DD 2FR4`
      - remains fully verified
  - API correction:
    - `packages/api/src/routes/transceivers.ts`
      - comparable products now require hard reach evidence on both sides
      - reach ratio must be at least `0.85`
      - fiber type must match exactly
      - primary wavelength must exist on both sides and be within `15nm`
      - breakout/AOC/DAC/cable products can only compare to other breakout/AOC/DAC/cable products
      - `QSFP-DD` and `QSFP-DD800` are treated as same form-factor family for 800G-class comparisons
  - deployment:
    - copied API route to Erik
    - `pnpm -C packages/api build` passed on Erik
    - `pm2 restart tip-api` completed, `tip-api` online
  - health:
    - public TIP health after restart: `healthy`, load `ok`, memory `13%`
  - truth:
    - `DQ.2A858HG.z` must never be shown as 1:1 comparable for `QDD-2FR4-800G`
    - a 500m NADDOD LPO/2DR4 product must not be shown as 2km comparable
    - unknown reach must never act as wildcard in final product comparison

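
The comparison rules listed under "API correction" can be expressed as one predicate. This is a sketch, not the SQL in `transceivers.ts`: `ComparableSpec`, `isComparable` and the requirement that speeds be equal are assumptions made for the illustration.

```typescript
// Hypothetical in-memory view of a transceiver row.
interface ComparableSpec {
  formFactor: string;
  speedGbps: number;
  reachMeters: number | null; // null or 0 means "unknown"
  fiberType: string | null;
  primaryWavelengthNm: number | null;
  isCableClass: boolean; // breakout/AOC/DAC/cable
}

// QSFP-DD800 is folded into QSFP-DD only for 800G-class parts.
function formFactorFamily(spec: ComparableSpec): string {
  return spec.formFactor === "QSFP-DD800" && spec.speedGbps === 800
    ? "QSFP-DD"
    : spec.formFactor;
}

function isComparable(a: ComparableSpec, b: ComparableSpec): boolean {
  if (a.speedGbps !== b.speedGbps) return false; // assumption: equal speed
  if (formFactorFamily(a) !== formFactorFamily(b)) return false;
  if (a.isCableClass !== b.isCableClass) return false;

  // Unknown reach is a hard failure, never a wildcard.
  if (!a.reachMeters || !b.reachMeters) return false;
  const ratio =
    Math.min(a.reachMeters, b.reachMeters) /
    Math.max(a.reachMeters, b.reachMeters);
  if (ratio < 0.85) return false;

  if (!a.fiberType || a.fiberType !== b.fiberType) return false;

  if (a.primaryWavelengthNm === null || b.primaryWavelengthNm === null) {
    return false;
  }
  return Math.abs(a.primaryWavelengthNm - b.primaryWavelengthNm) <= 15;
}
```

Under these rules the 500m NADDOD row fails on the reach ratio (`500 / 2000 = 0.25`), and the breakout AOC fails on the cable-class check before reach is even considered.
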
- FS.com 1.6T DR8/2FR4 source correction on 2026-05-09:
  - operator spotted that FS.com has two distinct 1.6T OSFP variants in the same family:
    - `OSFP-DR8-1.6T-FL`: 500m, DR8, SMF
    - `OSFP-2FR4-1.6T-FL`: 2km, 2FR4, SMF
  - confirmed in TIP DB:
    - both FS.com variants exist as separate rows
    - `OSFP-2FR4-1.6T-FL` had `reach_meters=0` even though the source and row label said `2km`
    - `OSFP-DR8-1.6T-FL` had no wavelength, causing the deterministic equivalence worker to reject the otherwise correct 500m Flexoptix match
  - live DB correction:
    - `OSFP-DR8-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=500m`
      - `reach_meters=500`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP DR8`
      - fully verified remains true
    - `OSFP-2FR4-1.6T-FL`
      - `speed=1.6T`
      - `speed_gbps=1600`
      - `reach_label=2km`
      - `reach_meters=2000`
      - `fiber_type=SMF`
      - `wavelengths=1310`
      - `standard_name=1.6T OSFP 2FR4`
      - fully verified true
    - Flexoptix `O.1316T.C.05.M`
      - confirmed as `500m`, `SMF`, `1.6T`
      - `standard_name=1.6T OSFP DR8`
  - equivalence correction:
    - approved only `O.1316T.C.05.M` ↔ `OSFP-DR8-1.6T-FL`
    - confidence `0.913`
    - match basis: form factor, speed, reach, fiber, wavelength and source variant DR8/500m
    - `OSFP-2FR4-1.6T-FL` remains separate and is not linked to the 500m DR8 Flexoptix product
  - scraper hardening:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - recognizes German/decimal `1,6T` and `1600G` as `1.6T`/`1600`
      - converts reach labels such as `2km` into `reach_meters=2000`
      - updates stale `speed` labels when the numeric source speed matches the row
  - build:
    - `pnpm -C packages/scraper build` passed on Erik
  - truth:
    - there are definitely two separate FS.com variants
    - 500m DR8 is the correct equivalent for Flexoptix `O.1316T.C.05.M`
    - 2km 2FR4 is a separate DB product and must not be collapsed into the 500m match

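
The speed normalization described under "scraper hardening" might be sketched like this. `normalizeSpeed` and the rule that anything at or above 1000G is labelled in `T` are assumptions, not the actual `fs-com.ts` logic.

```typescript
interface NormalizedSpeed {
  speed: string; // display label, e.g. "1.6T"
  speedGbps: number; // numeric value, e.g. 1600
}

// Accepts "1.6T", German-style "1,6T", "1600G", "800G" and similar labels.
function normalizeSpeed(raw: string): NormalizedSpeed | null {
  const match = raw.trim().match(/^(\d+(?:[.,]\d+)?)\s*([GT])$/i);
  if (!match) return null;

  // A decimal comma is treated as a decimal point.
  const value = Number((match[1] ?? "").replace(",", "."));
  const unit = (match[2] ?? "").toUpperCase();
  const speedGbps = Math.round(unit === "T" ? value * 1000 : value);

  // Assumed display convention: terabit labels from 1000G upward.
  const speed = speedGbps >= 1000 ? `${speedGbps / 1000}T` : `${speedGbps}G`;
  return { speed, speedGbps };
}
```

Normalizing both the label and the numeric value in one step keeps `speed` and `speed_gbps` from drifting apart, which is the stale-label case mentioned above.
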
- Targeted vendor verification push after equivalence revalidation on 2026-05-09:
  - code improved:
    - `NADDOD_DB_DETAIL_ONLY=1` mode verifies existing NADDOD rows with source URLs instead of rotating blindly through the full sitemap
    - NADDOD now extracts `og:image`, source product URLs, reach/fiber/wavelength from page evidence, AOC/DAC cable lengths, and DR/FR/SR/VR/XDR patterns
    - GAO Tek now writes product URLs and image evidence
    - Ascent Optics now writes product URLs and table image evidence
    - Eoptolink now writes product URLs, images, reach/wavelength evidence and corrects over-broad form-factor parsing by preferring title/slug evidence
  - live low-load Erik runs:
    - GAO Tek static crawl:
      - `473` unique products processed
      - GAO Tek detail coverage improved from `41` to `126`
      - `no_url` dropped to `0`
    - Ascent Optics static/API crawl:
      - `253` catalog products processed
      - image coverage `235/305`
      - detail coverage `213/305`
    - Eoptolink static crawl:
      - `76` product-solution pages inspected
      - after parser correction, Eoptolink is `287/287` image and detail verified
    - NADDOD targeted DB-detail mode:
      - first targeted wave `200` pages
      - second wave `300` pages
      - closure wave `385` pages
      - special-case wave `83` pages
      - NADDOD moved from `image=12`, `details=157` and `fully` at roughly `0` to `1` to:
        - total `748`
        - price `744`
        - image `742`
        - details `659`
        - competitor `744`
        - fully `659`
        - no URL `6`
  - global TIP counters after this push:
    - price verified `11557`
    - image verified `11963`
    - details verified `11018`
    - fully verified `9794`
    - total transceivers `17647`
  - health:
    - TIP stayed `healthy`
    - load status `ok`
    - memory used about `13%`
  - truth:
    - NADDOD is not 100% complete; remaining detail gaps include likely non-transceiver switch/NIC products and a smaller set of parser-special cases
    - OEM catalogs like Ascent and Eoptolink do not publish retail prices, so full verification cannot be forced honestly without price evidence

- Immediate full TIP equivalence revalidation on 2026-05-09:
  - operator requested all open TIP validation to be completed immediately and all product matches checked for true 1:1 equivalence
  - live preflight:
    - equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`, `due_research=0`
    - active matches scheduled for future 30-day recheck: `34066`
    - strict DB preflight over all active matches found:
      - recent-price gaps: `0`
      - hard technical mismatches: `0`
      - missing critical 1:1 evidence: `0`
    - hard criteria checked: form factor, speed, fiber type, reach ratio, primary wavelength and recent competitor price evidence
  - action:
    - marked all `34066` active `approved/auto_approved` equivalences as due immediately
    - queued `18` existing PgBoss `maintenance:re-research-equivalences` jobs
    - used the existing DB-only TIP re-research worker; no browser crawler wave and no external AI
  - result:
    - all `18/18` jobs completed
    - `due_research=0`
    - `active_researched_today=34066`
    - no automated-research rejections in this immediate pass
    - final equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`
  - transceiver verification counters after the pass:
    - `competitor_verified=11470`
    - `price_verified=11557`
    - `image_verified=10711`
    - `details_verified=9929`
    - `fully_verified=9135`
    - total transceivers `17647`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - API/DB connected
  - truth:
    - the manual equivalence queue is empty and all active matches have just been rechecked by deterministic 1:1 evidence rules
    - this does not mean every product row in TIP is complete; largest product verification gaps remain vendor-specific crawler/enrichment work, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent/Eoptolink and other vendor/catalog rows

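
Splitting the due matches into a fixed number of jobs can be sketched as plain chunking. The batch size of `2000` is an assumption: the log does not state it, although `34066` ids at that size do yield `18` batches. The job payload shape (`matchIds`) is also hypothetical; only the queue name comes from the log.

```typescript
// Splits a list of ids into consecutive batches of at most batchSize.
function chunkIds(ids: number[], batchSize: number): number[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: number[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical payload for one re-research job.
interface ReResearchJob {
  name: "maintenance:re-research-equivalences";
  data: { matchIds: number[] };
}

function planReResearchJobs(ids: number[], batchSize: number): ReResearchJob[] {
  return chunkIds(ids, batchSize).map((matchIds) => ({
    name: "maintenance:re-research-equivalences",
    data: { matchIds },
  }));
}
```

Bounded batches keep each worker run short, which fits the stated goal of not stressing Erik with one large pass.
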
- Crawlee integration/binding on 2026-05-09:
  - operator asked to install, use and bind Crawlee/Crawlee-Python after priority evaluation
  - pushed TIP commits:
    - `60531b6 feat: add crawlee python worker integration`
    - `49f0871 chore: ignore crawlee python build artifacts`
  - TypeScript TIP core remains the production crawler core using `crawlee` and Playwright
  - added scraper scripts:
    - `pnpm -C packages/scraper scrape:fs:db-detail`
    - `pnpm -C packages/scraper scrape:fs:url-discovery`
  - added optional isolated Python worker:
    - `packages/crawlee-python/`
    - `scripts/setup-crawlee-python-worker.sh`
    - `docs/TIP_CRAWLEE_RUNTIME.md`
  - Python worker policy:
    - Crawlee-Python is for Pi/Proxmox/residential side workers and extraction experiments
    - writes JSONL evidence only
    - no direct DB writes
    - no replacement for the TypeScript TIP scraper core
  - smoke test:
    - installed `crawlee==1.6.3` into `/tmp/tip-crawlee-python-venv`
    - ran `tip_crawlee_worker` against `https://crawlee.dev`
    - JSONL evidence output succeeded

- Priority Crawlee evaluation + FS.com URL discovery on 2026-05-09:
  - operator asked whether these repos help:
    - `https://github.com/apify/crawlee`
    - `https://github.com/apify/crawlee-python`
    - `https://github.com/hiteshchoudhary/crawlee-project`
  - evaluation:
    - `apify/crawlee` is directly relevant and already in use in TIP via TypeScript `PlaywrightCrawler`
    - current TIP benefit is not adding Crawlee, but using Crawlee more deliberately:
      - bounded RequestQueues
      - stable `uniqueKey`
      - explicit retry/no-text classes
      - isolated storage directories
      - AutoscaledPool telemetry as safety signal
      - hard concurrency caps on Erik
    - `apify/crawlee-python` is useful for future isolated Pi/Proxmox workers, especially for Python-native extraction experiments, but should not replace the current TypeScript scraper core today
    - `hiteshchoudhary/crawlee-project` is a small community/demo project, useful as inspiration only; not a production dependency for TIP
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_URL_DISCOVERY_ONLY=1`
      - maps existing `FS-<numeric-id>` rows without `product_page_url` to `https://www.fs.com/de/products/<id>.html`
      - carries `targetTransceiverId` through the crawler so verified source evidence updates the original row instead of creating duplicates
      - marks current FS.com product images verified for target rows
      - accepts deterministic H1/part/spec evidence for detail verification when FS.com does not expose a traditional spec table
  - live runs on Erik:
    - URL discovery pilot:
      - target `20`
      - scraped `19`
      - failed `0`
      - no-url rows dropped from `76` to `57`
    - full URL discovery:
      - target `56`
      - scraped `55`
      - failed `1` (`https://www.fs.com/de/products/229461.html`, transient `ERR_NETWORK_CHANGED`)
      - no-url rows dropped to `2`
    - DB reconciliation with improved detail evidence:
      - target `57`
      - scraped `55`
      - failed `0`
      - new prices `41`
      - stock observations `40`
      - specs verified `55`
    - `pnpm -C packages/scraper build` passed on Erik after the code change
  - FS.com final state after URL discovery:
    - total rows: `383`
    - price verified: `379`
    - image verified: `374`
    - details verified: `373`
    - price+image+details: `373`
    - fully verified: `205`
    - missing URL: `2`
    - missing image URL: `9`
    - missing reach label: `4`
    - missing fiber type: `9`
    - HTML product-like rows:
      - total `373`
      - image `372`
      - details `371`
      - complete `371`
    - no-url rows:
      - `Change`
      - `FS-229461`
    - category rows: `4`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
  - global verified counters:
    - price `11557`
    - image `10711`
    - details `9929`
    - fully `8526`
  - training pool:
    - pushed `4d9a11c crawl: add fscom url discovery learning record`
  - truth:
    - FS.com is still not 100% complete
    - honest current claim: `371/373` HTML product-like rows complete; remaining work is small and classifiable

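
The URL discovery mapping can be sketched as a pure function that turns a DB row into a crawl request. The row shape, the assumption that the `FS-<numeric-id>` value lives in the part number, and the `uniqueKey` format are all illustrative; `url`, `uniqueKey` and `userData` are the standard Crawlee request fields.

```typescript
// Hypothetical DB row shape for an FS.com transceiver.
interface FsRow {
  id: number;
  partNumber: string;
  productPageUrl: string | null;
}

// Plain object compatible with Crawlee request options.
interface DiscoveryRequest {
  url: string;
  uniqueKey: string;
  userData: { targetTransceiverId: number };
}

function toDiscoveryRequest(row: FsRow): DiscoveryRequest | null {
  // Rows that already have a product URL are not discovery targets.
  if (row.productPageUrl) return null;

  const match = row.partNumber.match(/^FS-(\d+)$/);
  const fsId = match ? match[1] : undefined;
  if (!fsId) return null;

  return {
    url: `https://www.fs.com/de/products/${fsId}.html`,
    // A stable key so repeated runs enqueue each product only once.
    uniqueKey: `fs-url-discovery:${fsId}`,
    // Lets the handler update the original row instead of creating a duplicate.
    userData: { targetTransceiverId: row.id },
  };
}
```

A row such as `Change` has no numeric id and therefore yields no request, which is consistent with it remaining in the no-url list.
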
- TIP FS.com / Fiberstore targeted verification push on 2026-05-09:
  - operator requested FS.com/Fiberstore next, with all crawler/scraper/robot learnings written to the TIPLLM training pool and no external AI
  - code improved:
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `FS_DB_DETAIL_ONLY=1` mode to revalidate existing FS.COM product URLs directly from DB
      - avoids broad category/listing discovery while product URLs still need verification
      - `detectReach()` now handles comma thousands and decimal values
      - added deterministic `detectFiberType()` fallback from product name, part number and specs
      - scraper now writes `productUrl` into the transceiver row
      - detail verification source is now the actual FS.com product URL instead of the literal `fs.com`
  - live Erik verification:
    - deployed scraper to `/opt/tip`
    - `pnpm -C packages/scraper build` passed on Erik after the change
    - ran four safe DB-detail-only Playwright batches:
      - batch 1: target `80`, scraped `80`, failed `0`, new prices `17`, stock `18`, specs `24`
      - batch 2: target `80`, scraped `79`, failed `0`, new prices `6`, stock `8`, specs `23`
      - batch 3: target `90`, scraped `89`, failed `0`, new prices `21`, stock `24`, specs `47`
      - batch 4 closure: target `42`, scraped `42`, failed `0`, new prices `5`, stock `3`, specs `25`
    - all runs used Playwright concurrency `1`, `nice -n 10`, and no broad category crawl
  - Erik/TIP health after closure:
    - status: `healthy`
    - load status: `ok`
    - memory used: `13%`
    - transceivers: `17647`
    - vendors: `478`
    - switches: `680`
  - global verified counters:
    - price: `11557`
    - image: `10636`
    - details: `9816`
    - fully: `8522`
  - FS.com before targeted detail batches:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `108`
    - price+image+details: `108`
    - fully verified: `3`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `323`
    - HTML product-like complete rows: `106`
  - FS.com after closure:
    - total rows: `383`
    - price verified: `379`
    - image verified: `299`
    - details verified: `260`
    - price+image+details: `260`
    - fully verified: `205`
    - missing product URL: `76`
    - missing image URL: `84`
    - missing reach label: `9`
    - missing fiber type: `123`
    - HTML product-like rows:
      - total `299`
      - price `299`
      - image `282`
      - details `258`
      - complete `258`
    - no-url rows:
      - total `76`
      - price `76`
      - image `15`
      - details `0`
    - category rows:
      - total `4`
      - no verified signals
  - interpretation / next strategy:
    - the DB-detail-only approach is now mostly exhausted
    - the fourth clean closure batch did not raise `details_verified`; it only nudged `fully_verified` from `199` to `205`
    - do not keep repeating the same FS.com detail crawler on Erik
    - next FS.com work should be:
      - source-discovery/classification robot for the `76` no-url rows
      - parser/source diagnostics for the remaining `41` HTML product-like rows missing detail/fiber/image signals
      - likely separate handling for malformed or historical `/de/de/products/...` URLs and pages that return no useful text
  - TIPLLM training pool:
    - all four FS.com batches were written and pushed to Gitea
    - latest training commits:
      - `28cac05` batch 1
      - `a0a6be3` batch 2
      - `38736ae` batch 3
      - `2c25bf3` closure batch
  - important truth:
    - do not claim FS.com is complete
    - the honest current claim is: FS.com product-like coverage improved strongly, but `258/299` HTML product-like rows are complete and `76` no-url rows still need source discovery/classification

- TIP Flexoptix completion push on 2026-05-09:
  - operator said "feuer frei" (German for "fire at will", i.e. go ahead) after confirming Flexoptix was not yet complete
  - TIPLLM training pool was updated immediately with the truth rule:
    - not all Flexoptix products are complete
    - active catalog coverage must be separated from historical/extra DB rows
    - never claim 100% verification without exact counters and fresh source timestamps
  - code improved:
    - `packages/scraper/src/scrapers/flexoptix-catalog.ts`
      - generic reach parsing now handles values such as `50 m`, `1,000 m`, decimal/range forms
      - wavelength parsing now handles multiple `λ... nm` values
      - product URL is now passed into `findOrCreateScrapedTransceiver`
    - `packages/scraper/src/scrapers/flexoptix-detail-pages.ts`
      - new targeted Flexoptix detail-page verifier
      - fetches only Flexoptix `.html` product pages with missing price/image/detail fields
      - parses static product page metadata:
        - title
        - description
        - `og:image`
        - `product:price:amount`
        - reach
        - fiber type
        - wavelengths
        - connector
        - standard name
      - writes only DB evidence from Flexoptix pages, no external AI
  - live run results on Erik:
    - `pnpm -C packages/scraper build` passed
    - improved catalog run completed:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
    - details improved from:
      - `details_verified: 500`
      - `price+image+details: 496`
      - `fully_verified: 496`
    - after catalog parser improvement:
      - `details_verified: 606`
      - `price+image+details: 602`
      - `fully_verified: 602`
    - detail verifier run:
      - target: `191` real `.html` product pages
      - fetched: `191`
      - failed: `0`
      - new/updated price observations: `177`
      - images marked: `187`
      - details marked: `185`
    - after detail verifier and explicit BiDi correction:
      - total Flexoptix rows: `744`
      - HTML product-like rows: `626`
      - price verified: `626`
      - image verified: `622`
      - details verified: `626`
      - price+image+details verified: `622`
      - fully verified: `620`
      - filter/category rows with no verification: `108`
      - other non-product/generic rows with no verification: `10`
  - manual evidence correction:
    - four BiDi SFP products had `1,000 m` in the Flexoptix title
    - updated from source evidence:
      - `S.B1312.M.DIL`
      - `S.B1312.M.DL`
      - `S.B1512.M.DIL`
      - `S.B1512.M.DL`
    - set:
      - `reach_label=1000m`
      - `reach_meters=1000`
      - `fiber_type=MMF`
      - `details_verified=true`
  - remaining truth:
    - active/product-like Flexoptix rows are much closer to complete
    - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
    - remaining HTML product-like gaps after final source check:
      - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
      - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
  - operational note:
    - Erik SSH became unavailable with `connection refused` after the last verification checks
    - public TIP HTTPS still responded through Cloudflare
    - no further live commands were started after SSH refused

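
Reading `og:image` and `product:price:amount` from static HTML can be sketched with a small meta-tag reader. This is a simplified illustration, not the code in `flexoptix-detail-pages.ts`: it assumes `property` appears before `content` inside the tag, and the function names and sample markup are hypothetical.

```typescript
// Returns the content of <meta property="..." content="..."> or null.
// Simplification: expects `property` before `content` within the tag.
function metaContent(html: string, property: string): string | null {
  const escaped = property.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `<meta[^>]*property=["']${escaped}["'][^>]*content=["']([^"']*)["']`,
    "i",
  );
  const match = html.match(pattern);
  return match && match[1] ? match[1] : null;
}

interface PageEvidence {
  imageUrl: string | null;
  price: number | null;
}

function extractPageEvidence(html: string): PageEvidence {
  const image = metaContent(html, "og:image");
  const priceText = metaContent(html, "product:price:amount");
  const price = priceText !== null ? Number(priceText) : NaN;

  return {
    // A placeholder image is not evidence of a real product image.
    imageUrl:
      image !== null && !image.includes("placeholder-flexoptix.jpg")
        ? image
        : null,
    price: Number.isFinite(price) ? price : null,
  };
}
```

Treating the placeholder as "no image" is what leaves the four product-like rows honestly unverified instead of marking them complete.
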
- TIP Flexoptix price truth recheck on 2026-05-09:
  - operator question:
    - are all Flexoptix prices, images and information present
    - are the Flexoptix prices 100% correct
  - live truth:
    - total Flexoptix rows in TIP: `744`
    - current Flexoptix catalog scraper finds: `615` active catalog products
    - price verified rows: `619`
    - latest verified price observations: `615`
    - image verified rows: `615`
    - details verified rows: `500`
    - price + image + details verified: `496`
    - fully verified: `496`
    - missing image URL: `129`
    - missing reach label: `244`
    - missing fiber type: `131`
  - important interpretation:
    - current active Flexoptix catalog price set is freshly rechecked
    - the full historical/extra Flexoptix table is not complete
    - therefore do not claim all `744` Flexoptix rows are complete
  - code fix:
    - `packages/scraper/src/utils/db.ts`
      - unchanged price observations now refresh `price_observations.verified_at = NOW()`
      - unchanged product prices now refresh `transceivers.price_verified_at = NOW()`
      - this makes live rechecks auditable instead of leaving the old verification timestamp in place
  - live recheck:
    - deployed `db.ts` to Erik
    - `pnpm -C packages/scraper build` passed
    - ran light Flexoptix catalog scraper on Erik with `nice -n 10`
    - result:
      - `Total unique products after GraphQL: 615`
      - `Flexoptix Catalog Complete: 615 products, 0 prices`
      - `0 prices` means no changed price rows were inserted because content hashes matched
    - after timestamp fix, DB shows `615` latest verified Flexoptix price observations with `verified_at` in the last 10 minutes
  - honest answer:
    - 615 active catalog prices are freshly source-confirmed by the Flexoptix scraper
    - no claim should be made that all 744 Flexoptix DB rows have complete price/image/detail coverage
    - no system should promise absolute 100% price truth forever because live vendor prices can change and may vary by account/currency/VAT/session; TIP should display last-source-verified timestamp

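
The timestamp fix can be illustrated with an in-memory model of the price history. This is a sketch only: the real code in `db.ts` works against the database, and `PriceObservation`, `recordPrice` and the `Map`-based store are assumptions.

```typescript
interface PriceObservation {
  price: number;
  contentHash: string;
  verifiedAt: Date;
}

type RecordResult = "inserted" | "refreshed";

// store maps a transceiver id to its price history, oldest first.
function recordPrice(
  store: Map<number, PriceObservation[]>,
  transceiverId: number,
  price: number,
  contentHash: string,
  now: Date,
): RecordResult {
  const observations = store.get(transceiverId) ?? [];
  const latest = observations[observations.length - 1];

  if (latest && latest.contentHash === contentHash) {
    // Unchanged price: no new row, but the recheck is made auditable.
    latest.verifiedAt = now;
    return "refreshed";
  }

  observations.push({ price, contentHash, verifiedAt: now });
  store.set(transceiverId, observations);
  return "inserted";
}
```

In this model a run that reports `0 prices` still advances `verifiedAt` for every unchanged product, which is the behaviour the recheck relied on.
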
- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
  - operator problem:
    - Atlas / Findings / Protection Proof had become dishonest again
    - raw files on Erik still contained:
      - `3` host audits
      - `32` live Atlas scan devices
    - but open findings had collapsed back to `0`
    - Atlas UI therefore showed an implausibly clean state
  - verified root cause:
    - `packages/core/src/routes/health-builders.ts`
      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
    - `packages/core/src/scheduler.ts`
      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
      - newly rematerialized Atlas findings were therefore cleared again almost immediately
  - code fixed:
    - `packages/core/src/routes/health-builders.ts`
      - added `readAtlasSnapshot()`
      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
    - `packages/core/src/scheduler.ts`
      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
      - generic stale resolution now skips:
        - `atlas-coverage-gap`
        - `atlas-exposure`
        - `atlas-host-audit`
      - these sources are now left to their own verification-aware resolution logic
  - live deployment on Erik:
    - rebuilt `@magatama/core`
    - synced:
      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
      - `/opt/magatama/packages/core/dist/scheduler.js`
    - restarted PM2 service:
      - `magatama`
  - live verification:
    - before fix:
      - Atlas raw files present:
        - audits: `3`
        - devices: `32`
      - DB open findings: `0`
    - after authenticated `/api/protection-proof` rebuild:
      - DB open findings: `28`
      - public `/api/findings?limit=5` now shows real open Atlas findings again
      - public `/api/protection-proof` now reports:
        - `knownAssets: 57`
        - `hostsWithTelemetry: 22`
        - `assetsWithoutTelemetry: 35`
        - `auditedHosts: 3`
        - `queueBlocked: 28`
        - `switchbladeAssets: 5`
        - `switchbladeRacks: 1`
        - `switchbladeNmsNodes: 5`
  - operational truth now:
    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
    - the remaining open state is again honest:
      - most current open findings are `atlas-coverage-gap`
      - they reflect missing live telemetry on known inventory/discovery assets
  - operator note:
    - browser cache / old UI state may still temporarily show the earlier empty Atlas
    - hard refresh is required:
      - `Cmd + Shift + R`
  - important honest remainder:
    - this closes the biggest Atlas truthfulness regression
    - it does **not** yet solve every backend truth issue
    - still pending:
      - lane-specific RunPod artifact adoption / automatic version switch
      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational

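
The scheduler change can be sketched as a guard inside a generic stale resolver. Only the constant name `ATLAS_MANAGED_FINDING_SOURCES` and the three source names come from the log; the `Finding` shape, the age-based staleness rule and `autoResolveStale` are assumptions for illustration.

```typescript
const ATLAS_MANAGED_FINDING_SOURCES: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Hypothetical finding shape.
interface Finding {
  id: string;
  source: string;
  lastSeenAtMs: number;
  status: "open" | "resolved";
}

// Resolves open findings not seen within maxAgeMs, except Atlas-managed
// ones, which are left to their own verification-aware resolution logic.
function autoResolveStale(
  findings: Finding[],
  nowMs: number,
  maxAgeMs: number,
): Finding[] {
  return findings.map((finding): Finding => {
    if (finding.status !== "open") return finding;
    if (ATLAS_MANAGED_FINDING_SOURCES.has(finding.source)) return finding;
    return nowMs - finding.lastSeenAtMs > maxAgeMs
      ? { ...finding, status: "resolved" }
      : finding;
  });
}
```

Without the source guard, a freshly rematerialized Atlas finding with an old `lastSeenAtMs` would be resolved on the next scheduler tick, which is the regression described above.
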
- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation
    - Erik must not be stressed by crawler-heavy work
    - TIPLLM-only policy for crawler/robot research remains in force
  - root cause found:
    - `approve-all` approved low-confidence equivalences and only marked them for later re-research
    - the re-research worker mostly checked whether a competitor still had a recent price
    - it did not re-evaluate hard technical equivalence evidence such as reach, wavelength, fiber type, speed and form factor
  - code changed:
    - `packages/api/src/routes/review.ts`
      - `approve-all` now approves only confidence >= `0.73`
      - weak pending rows stay pending and are queued for automated research instead of being marked approved
      - `needs_research` stats/listing now includes pending research rows
      - added `POST /api/review/run-research`
    - `packages/scraper/src/scheduler.ts`
      - added deterministic equivalence research evaluator
      - rejects stale, technically contradictory, incomplete, or low-confidence matches automatically
      - confirms only matches with recent price plus matching form factor, speed, fiber type, wavelength and reach
      - confirmed matches are scheduled for a 30-day recheck
  - live deployment:
    - synced changed files to Erik `/opt/tip`
    - `pnpm -C packages/api build` passed on Erik
    - `pnpm -C packages/scraper build` passed on Erik
    - restarted `tip-api` and `tip-scraper-daemon`
    - both processes are online
  - data cleanup performed on live DB without heavy crawling:
    - pending + due re-research candidates processed: `144103`
    - rejected fiber mismatch: `958`
    - rejected reach mismatch: `82128`
    - rejected missing reach evidence: `31151`
    - rejected wavelength mismatch: `29865`
    - rejected low confidence: `1`
    - old approved rows audited:
      - kept/confirmed: `1986`
      - rejected: `4000`
    - old auto-approved rows audited:
      - kept/confirmed: `32080`
      - rejected reach mismatch: `260`
  - final live equivalence status:
    - `pending`: `0`
    - `approved`: `1986`
    - `auto_approved`: `32080`
    - `rejected`: `148367`
    - due re-research now: `0`
    - scheduled 30-day rechecks: `34066`
  - final verification counters after reconcile:
    - `competitor_verified`: `11137`
    - `fully_verified`: `290`
    - `price_verified`: `11549`
    - `image_verified`: `10629`
    - `details_verified`: `9538`
  - operational note:
    - no new crawler wave was started for this cleanup
    - the run used existing crawled specs/prices and strict deterministic product-evidence checks
    - next improvement should be targeted crawler enrichment for products rejected due to missing reach/details, preferably on Proxmox/Pi workers rather than Erik

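
The changed `approve-all` behaviour and the 30-day recheck can be sketched as two small helpers. The threshold `0.73` and the 30-day interval come from the log; `PendingMatch`, `partitionApproveAll` and `nextRecheckAt` are hypothetical names, and the real route works on DB rows rather than arrays.

```typescript
const APPROVE_ALL_MIN_CONFIDENCE = 0.73;
const RECHECK_INTERVAL_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface PendingMatch {
  id: number;
  confidence: number;
}

interface ApproveAllPlan {
  approveIds: number[]; // confidence at or above the threshold
  researchIds: number[]; // stay pending and go to automated research
}

function partitionApproveAll(pending: PendingMatch[]): ApproveAllPlan {
  const plan: ApproveAllPlan = { approveIds: [], researchIds: [] };
  for (const match of pending) {
    if (match.confidence >= APPROVE_ALL_MIN_CONFIDENCE) {
      plan.approveIds.push(match.id);
    } else {
      plan.researchIds.push(match.id);
    }
  }
  return plan;
}

// Confirmed matches are rechecked after a fixed interval.
function nextRecheckAt(confirmedAt: Date): Date {
  return new Date(confirmedAt.getTime() + RECHECK_INTERVAL_DAYS * MS_PER_DAY);
}
```

The key difference from the old behaviour is that weak rows never receive an approved status first; they remain pending until the deterministic evaluator decides.
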
- TIP Flexoptix + FS.com price/image revalidation completed on 2026-05-09:
  - live root cause:
    - scraper runs had set `transceivers.price_verified`, but `price_observations.is_verified` stayed false
    - FS.com product image selector was stale and missed current `.big_img` / `.big_img_m` product images
  - code fixed:
    - `packages/scraper/src/utils/db.ts`
      - new/fresh unchanged price observations now get `is_verified = true` and `verified_at`
      - `price_verified_at` is refreshed when price verification is confirmed
      - image verification now refreshes `image_verified_at`, `image_verified_url`, and `image_scraped_at`
      - existing records revalidate images whenever current scraper output contains an image URL
    - `packages/scraper/src/scrapers/fs-com.ts`
      - added `TIP_FORCE_REVALIDATE`
      - added `FS_MAX_DETAIL_PAGES_PER_RUN`
      - added `FS_ONLY_MISSING_IMAGES`
      - updated FS.com image extraction to prefer current `resource.fs.com` product images from `.big_img_box`, `img.big_img`, `.big_img_m_active`, `.big_img_m`, `.small_img_active`
      - rejects default/logo/general/icon/SVG image URLs
  - live runs on Erik:
    - `pnpm -C packages/scraper build` passed on `/opt/tip`
    - Flexoptix catalog revalidation:
      - 615 products processed
      - 615 Flexoptix price observations marked verified
      - 605 Flexoptix images verified in the run window
    - FS.com full force revalidation:
      - 270 products discovered
      - 270 detail pages scraped
      - 0 failed detail requests
      - 17 new price observations in first full pass
      - 266 FS.com price observations marked verified after first pass
    - FS.com targeted missing-image revalidation:
      - 99 detail pages scraped
      - 0 failed detail requests
      - FS.com image-verified products increased from 207 to 299
      - FS.com verified price observations increased to 271 after targeted pass
  - final checked counters:
    - Flexoptix:
      - products: 744
      - product price_verified: 619
      - product image_verified: 615
      - price observation rows: 1288
      - verified price observation rows: 615
    - FS.com:
      - products: 383
      - product price_verified: 379
      - product image_verified: 299
      - price observation rows: 818
      - verified price observation rows: 271
  - operations:
    - `tip-scraper-daemon` restarted and is online
    - Erik remained stable; final load was about `2.16, 2.22, 2.47`
    - CT115 / `tip-scraper` SSH did not respond quickly from this session, so it was not used
  - TIPLLM training pool:
    - `/tmp/tip-training-data` was recloned from Gitea
    - crawler experience was written to:
      - `robot-experiences/2026-05-09.jsonl`
      - `qa-pairs/robot-control-high.jsonl`
    - pushed to Gitea commit:
      - `850083f crawl: add flexoptix fs revalidation learning record`

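The image rule described above (prefer `resource.fs.com` product images, reject default/logo/general/icon/SVG URLs) can be sketched as a small filter. This is a minimal Python illustration, not the code in `fs-com.ts`: the function names, the substring matching on the URL path, and the example URLs are all assumptions made for the sketch.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse

# Tokens taken from the notes; matching them as path substrings is an assumption.
REJECTED_TOKENS = ("default", "logo", "general", "icon")
PREFERRED_HOST = "resource.fs.com"


def is_product_image(url: str) -> bool:
    """Return True if the URL looks like a real product image."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    path = parsed.path.lower()
    if path.endswith(".svg"):
        return False
    return not any(token in path for token in REJECTED_TOKENS)


def pick_image(candidates: Iterable[str]) -> Optional[str]:
    """Pick the first usable image, preferring the product-image host."""
    usable = [url for url in candidates if is_product_image(url)]
    preferred = [url for url in usable if urlparse(url).hostname == PREFERRED_HOST]
    chosen = preferred or usable
    return chosen[0] if chosen else None
```

Candidates are expected in selector priority order, so the first surviving URL wins; returning `None` lets the caller leave `image_verified` untouched rather than store a placeholder.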
- MAGATAMA dashboard truthfulness / UX hardening on 2026-05-09:
  - live `api/llm/status` on MAGATAMA now publicly confirms the corrected `magatamallm` lane counts:
    - `15679` train / collected
    - `1743` eval
    - `17422` total
    - `15679` new since last training
  - the Training page inconsistency was traced to a stale browser/static-cache path plus mixed UI sources
  - dashboard static UI was updated and deployed live to Erik:
    - new cache version:
      - `2026-05-09a`
    - Training Control now force-merges the visible summary with the live `llmStatus.training` payload so the page and modal cannot silently disagree on pair counts
  - Switchblade network port UX was hardened:
    - hover detail remains
    - each port is now also clickable
    - click opens a real MAGATAMA-side detail modal with:
      - status
      - speed
      - description
      - peer device / peer port
      - connected host
      - VLAN
      - transceiver
      - in/out errors
      - octet counters
    - this was done because hover-only behavior was still presenting as broken / ambiguous for the operator
  - direct live deployment truth on Erik:
    - `/opt/magatama/packages/dashboard/public/index-v2.html` now contains:
      - `API_CACHE_VERSION = '2026-05-09a'`
      - `openSwitchbladePortModal`
      - `Ports · Hover = Nutzung / Status · Klick = Detail` (English: "Ports · Hover = usage / status · Click = detail")
  - important honest remainder:
    - this fixes the visible UI inconsistency and the broken/stale port interaction path
    - it does **not yet** complete the deeper backend truthfulness issue where Atlas/host-audit raw files can still show real issues while the live open-findings surface may be empty
    - that rematerialization / anti-auto-resolve backend block still needs a dedicated follow-up pass

- Full cross-agent sync refresh on 2026-05-07:
  - all current MAGATAMA/RunPod training automation findings from this chat were consolidated again into `sync/`
  - latest confirmed truth:
    - `sync/` commits successfully reached Gitea again
    - current pushed sync commits now include:
      - `2a35761 sync: record runpod managed endpoint root cause`
      - `72d61ad sync: record custom runpod worker build prep`
  - operator requirement was reaffirmed:
    - all meaningful chat discoveries, decisions, blockers, and deployment truths must continue to be written back into `sync/` so Claude, Codex, and the laptop stay aligned
  - current MAGATAMA training automation truth remains:
    - lane-specific pools are separated and prepared
    - URL-bundle dataset path is in place
    - local adoption/smoke/version-switch code path is in place
    - but fully automatic RunPod return/adoption still depends on switching from the managed Axolotl endpoint to a custom MAGATAMA worker endpoint
  - current infrastructure truth remains:
    - Erik can build Docker images
    - Erik has `docker buildx`
    - Erik currently has no docker registry login/config
    - therefore registry publication of the custom worker image is still the final missing operational prerequisite
  - next required operator inputs for full closure:
    - either:
      - `GHCR_USERNAME` + `GHCR_TOKEN`
    - or:
      - Docker Hub repo + credentials
    - or:
      - an already approved container image destination
  - once registry publication is possible, the exact remaining sequence is:
    - publish custom worker image
    - create/update RunPod endpoint to that image
    - set on Erik:
      - `RUNPOD_WORKER_KIND=custom-magatama`
      - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - restart MAGATAMA dashboard
    - run lane-specific canary training
    - verify:
      - artifact exists
      - local adoption succeeds
      - smoke tests pass
      - release alias increments
      - active lane alias switches automatically

- MAGATAMA RunPod custom worker preparation continued on 2026-05-07:
  - the pending sync handoff was committed and **successfully pushed to Gitea**:
    - commit:
      - `2a35761 sync: record runpod managed endpoint root cause`
  - MAGATAMA repo now includes an explicit helper for building/publishing the custom RunPod worker image:
    - `magatama/scripts/runpod_worker_publish.sh`
    - new package script:
      - `pnpm runpod:worker:publish`
    - helper behavior:
      - expects:
        - `RUNPOD_WORKER_IMAGE`
      - supports:
        - `GHCR_USERNAME`
        - `GHCR_TOKEN`
        - `RUNPOD_WORKER_TAG`
        - `RUNPOD_WORKER_PUSH_MODE=push|load`
      - prints the exact next environment variables required on Erik after image publication:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
  - `magatama/packages/fine-tuner/RUNPOD.md` was extended so the full automation target is now documented end-to-end:
    - lane pool sync
    - RunPod dataset URL bundle
    - custom worker training
    - adapter upload
    - local adoption
    - smoke tests
    - release alias minting
    - active alias switch
  - Erik infrastructure truth was rechecked:
    - `docker` exists:
      - `/usr/bin/docker`
    - `docker buildx` exists:
      - `github.com/docker/buildx v0.33.0`
    - **no docker registry login/config** is currently present on Erik:
      - `~/.docker/config.json` absent
    - interpretation:
      - Erik can build images
      - but cannot yet push a public/private worker image to GHCR/Docker Hub without credentials or a pre-authenticated registry path
  - the missing custom worker files were synced live to Erik:
    - `/opt/magatama/packages/fine-tuner/Dockerfile.runpod`
    - `/opt/magatama/packages/fine-tuner/RUNPOD.md`
  - a real remote worker image build was then attempted on Erik:
    - image tag requested:
      - `magatama-runpod-worker:test`
    - build truth:
      - base `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` pulled successfully
      - Python dependencies for the worker installed successfully
      - build reached:
        - `COPY train_cuda.py runpod_handler.py ./`
        - `exporting to image`
    - however:
      - final image was **not yet visible** in `docker images`
      - therefore the build still needs one more clean verification pass before being treated as green
  - current operational conclusion:
    - MAGATAMA training pools, lane separation, signed dataset URL path, and local adoption API are ready
    - the final blocking step remains infrastructure:
      - publish the custom worker image to a registry RunPod can consume
      - create/switch the endpoint
      - then set on Erik:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
    - once that is done, MAGATAMA's already-prepared code path can finally perform:
      - train
      - verify artifact
      - adopt locally
      - smoke-test
      - bump version
      - switch alias

- MAGATAMA RunPod training return-path deep dive on 2026-05-07:
  - Attack Paths `Open Fix Guidance` placebo button was fixed live on Erik:
    - `magatama/packages/dashboard/public/index-v2.html`
    - real behavior now:
      - if graph node maps to a real finding, open the existing ticket/finding drawer
      - if node is only synthetic, show an explicit warning instead of doing nothing
    - deployed to:
      - `/opt/magatama/packages/dashboard/public/index-v2.html`
    - `pm2 restart magatama-dashboard` executed
  - local Mac train API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - returns `status = ok`
    - service is idle/reachable, not broken
  - RunPod heartbeat/UI stream issue was fixed live:
    - dashboard server now emits keepalive progress messages during:
      - long `IN_PROGRESS` phases
      - post-`COMPLETED` artifact verification loops
    - deployed live to Erik dashboard
  - direct raw RunPod status canary against the current endpoint (`dheii186pfcuq7`) was executed:
    - tiny 1-step `tip_llm` canary job:
      - `33434e85-3cc1-4dea-9043-83c315aaeb9c-e2`
    - observed raw status sequence:
      - `IN_QUEUE`
      - `IN_PROGRESS`
      - `COMPLETED`
    - **critical truth**:
      - `/status/{job}` returned no `output`
      - `/stream/{job}` returned:
        - `{"status":"COMPLETED","stream":[]}`
    - interpretation:
      - the currently configured endpoint is the managed Axolotl serverless endpoint
      - it does not return a programmatically adoptable artifact reference to MAGATAMA
      - this is why all lanes keep ending in:
        - `completed_without_model_artifact`
  - Erik secrets reality rechecked:
    - `/opt/magatama/secrets/hf-token` exists and is readable by the running process
    - therefore the current failure is **not** caused by a missing HF token on Erik
  - root cause now considered confirmed:
    - the **managed Axolotl serverless endpoint** is acceptable for queueing/running a fine-tune
    - but not sufficient for MAGATAMA's required full automation:
      - train
      - return explicit artifact
      - adopt locally
      - smoke-test
      - create new release alias
      - switch active alias
  - code path for the correct architecture is now prepared:
    - `magatama/packages/fine-tuner/runpod_handler.py`
    - `magatama/packages/fine-tuner/train_cuda.py`
    - `magatama/packages/fine-tuner/requirements-runpod.txt`
    - `magatama/packages/dashboard/src/server.ts`
  - what changed in that path:
    - custom RunPod worker now accepts:
      - `target_model`
      - `credentials.hf_token`
    - training script now:
      - trains lane-specific bundle
      - uploads the resulting adapter folder to Hugging Face
      - returns `adapter_repo_id`
    - dashboard custom-worker submit path now includes:
      - `run_id`
      - `target_model`
      - HF credential pass-through for the worker
    - dashboard error text is now explicit:
      - if the managed Axolotl endpoint completes without an adoptable artifact, MAGATAMA says so plainly and points at the need for the `custom-magatama` worker
  - live deployment status:
    - updated dashboard server was rebuilt and deployed to Erik
    - updated custom worker source files were synced into Erik repo state
    - BUT:
      - the currently active RunPod endpoint is still the managed Axolotl endpoint
      - the new full return-path logic will only become effective once the RunPod endpoint is switched to the custom MAGATAMA worker image
  - operational conclusion:
    - training pool refresh, lane separation, submit flow, and local adoption API are now in good shape
    - the final missing infrastructure step is:
      - build/publish `packages/fine-tuner/Dockerfile.runpod`
      - create/use a custom RunPod serverless endpoint for `runpod_handler.py`
      - set:
        - `RUNPOD_WORKER_KIND=custom-magatama`
        - `RUNPOD_ENDPOINT_ID=<custom-endpoint>`
    - only then can MAGATAMA honestly achieve:
      - automatic training
      - automatic artifact return
      - automatic adoption
      - automatic version bump
      - automatic alias switch after smoke tests

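The canary result above (`COMPLETED` with no `output`) is what separates "the job ran" from "the job returned something adoptable". A minimal Python sketch of that distinction follows. It is not the logic in `server.ts`: the function name and the `running`, `failed`, and `artifact_returned` labels are hypothetical, and only `completed_without_model_artifact` and `adapter_repo_id` come from these notes.

```python
from collections.abc import Mapping
from typing import Any


def classify_runpod_result(status_payload: Mapping[str, Any]) -> str:
    """Map a raw status payload to a coarse outcome label."""
    status = status_payload.get("status")
    if status in ("IN_QUEUE", "IN_PROGRESS"):
        return "running"  # hypothetical label
    if status != "COMPLETED":
        return "failed"  # hypothetical label
    # COMPLETED alone proves nothing; an explicit artifact reference is required.
    output = status_payload.get("output")
    adapter_repo_id = output.get("adapter_repo_id") if isinstance(output, Mapping) else None
    if not adapter_repo_id:
        return "completed_without_model_artifact"
    return "artifact_returned"  # hypothetical label; adoption and smoke tests still follow


# The payload observed from the managed endpoint during the canary run.
print(classify_runpod_result({"status": "COMPLETED", "stream": []}))
```

Treating a missing or empty `output` as its own outcome keeps the registry honest: the run is recorded as finished, but never as adopted.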
## Active Policy

- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Check sibling project sync folders first when context may span repos.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.

## Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

- Repo: `rene/llm-gateway`
- Path: `sync/`
- Commit shown by Claude: `e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)`
- Gitea path: `http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/`

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Latest Work

- RunPod/MAGATAMA training live follow-up on 2026-05-07:
  - latest `magatamallm` serverless run verified on Erik:
    - job id:
      - `ad003f90-3cf9-43f6-8960-bf6c1ea85097-e2`
    - registry truth in:
      - `/opt/magatama/training-data/model-registry/training-runs.json`
    - observed states:
      - `submitted`
      - then `completed_without_model_artifact`
    - exact recorded warning:
      - `RunPod meldete COMPLETED, aber das erwartete HuggingFace-Modellrepo wurde nicht gefunden.` (English: "RunPod reported COMPLETED, but the expected HuggingFace model repo was not found.")
    - interpretation:
      - dataset build and RunPod submit are working
      - the worker still does not return a verifiable adoptable model artifact
      - this is a real training return-path failure, not just a cosmetic UI issue
  - local training API truth rechecked:
    - `GET http://127.0.0.1:3214/health`
    - service responds with:
      - `status = ok`
      - `service = magatama-train-api`
      - `running = false`
      - `pid = null`
    - meaning:
      - API is healthy/reachable
      - currently idle
      - ready for adoption/import calls once a valid RunPod artifact exists
  - one UI bug in the training modal was fixed live:
    - root cause:
      - during long `IN_PROGRESS` and post-`COMPLETED` artifact verification phases, MAGATAMA sent no heartbeat for too long
      - browser/proxy could then terminate the stream and surface only:
        - `network error`
      - even though Erik had already written the more truthful registry state
    - fix:
      - `magatama/packages/dashboard/src/server.ts`
      - added server-sent heartbeat messages while:
        - RunPod status remains unchanged
        - Hugging Face / artifact propagation checks are still running
      - concrete live strings now deployed in Erik dashboard server:
        - `⏳ RunPod arbeitet weiter (...)` (English: "RunPod is still working")
        - `⏳ Prüfe Modellartefakt ...` (English: "Checking model artifact")
    - deployment:
      - rebuilt dashboard
      - rsynced `packages/dashboard/dist/server.js` to Erik
      - restarted `pm2 magatama-dashboard`
      - remote `server.js` verified to contain heartbeat strings
    - expected operator effect:
      - future training runs should no longer collapse into a late generic `network error` while RunPod/adoption checks are still active
      - the UI should stay alive long enough to show the real terminal result:
        - `completed_and_adopted`
        - or
        - `completed_without_model_artifact`
        - or
        - worker/adoption failure

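The heartbeat fix above boils down to "emit something whenever the status has not changed for a while". The Python generator below sketches that idea with an injectable clock so it can be exercised without waiting. It is an illustration under assumptions, not the `server.ts` implementation: the function name, the message texts, the thresholds, and the set of terminal statuses are all hypothetical.

```python
import time
from typing import Callable, Iterator

TERMINAL_STATUSES = ("COMPLETED", "FAILED", "CANCELLED")  # assumed terminal set


def poll_with_heartbeat(
    get_status: Callable[[], str],
    heartbeat_after: float = 15.0,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield SSE-style messages: one per status change, plus keepalives in between."""
    last_status = None
    last_emit = clock()
    while True:
        status = get_status()
        now = clock()
        if status != last_status:
            yield f"data: status {status}\n\n"
            last_status, last_emit = status, now
        elif now - last_emit >= heartbeat_after:
            # Nothing changed, but the stream must not look dead to browser or proxy.
            yield f"data: heartbeat ({status})\n\n"
            last_emit = now
        if status in TERMINAL_STATUSES:
            return
        sleep(poll_interval)
```

In a real server the yielded strings would be written to the open response; the same loop shape applies to the post-`COMPLETED` artifact verification phase, with the artifact check taking the place of the status call.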
- MAGATAMA live follow-up on 2026-05-07:
  - local Mac training API was rechecked after the lane-specific automation changes.
  - current live truth:
    - LaunchAgent `org.fichtmueller.magatama-train-api` is present and running
    - process listens on `*:3214`
    - localhost health now responds when checked outside sandbox restrictions:
      - `GET http://127.0.0.1:3214/health`
      - response:
        - `status = ok`
        - `service = magatama-train-api`
        - `running = false`
        - `pid = null`
        - `updated_at = 2026-05-07T04:14:23Z`
  - interpretation:
    - the training API itself is healthy and reachable
    - it is currently idle, not broken
    - the actual next proof point must come from a fresh lane run that writes lane-specific `*-last_run.json`
  - live Attack Paths UI bug was fixed and deployed to Erik:
    - root cause:
      - the `Open Fix Guidance` button inside the attack-path side panel only triggered a dummy toast and never opened a real finding/ticket detail
    - fix:
      - `magatama/packages/dashboard/public/index-v2.html`
      - new helper:
        - `openFixGuidanceForNode(nodeId)`
      - behavior:
        - if the clicked graph node maps to a real finding ID, MAGATAMA now opens the existing ticket/finding detail drawer via `openTicket(id)`
        - if the node is only a synthetic path node with no backing finding, MAGATAMA now shows an explicit warning instead of pretending to open guidance
    - live deployment:
      - updated `index-v2.html` was rsynced to:
        - `/opt/magatama/packages/dashboard/public/index-v2.html`
      - `pm2 restart magatama-dashboard` executed on Erik
      - deployed file on Erik verified with:
        - `openFixGuidanceForNode`
        - `Open Fix Guidance`
    - operator consequence:
      - Attack Paths no longer contain a placebo “Open Fix Guidance” action
      - clicking it should now open the actual MAGATAMA finding/ticket guidance path when the graph node represents a real finding

- MAGATAMA training automation was hardened locally on 2026-05-07 for all three lanes:
  - target lanes:
    - `magatamallm`
    - `fo_blogllm`
    - `tip_llm`
  - core root cause confirmed:
    - RunPod dataset refresh / lane export already worked
    - RunPod jobs often reached `COMPLETED`
    - but model adoption/version truth still depended on a single shared:
      - `~/magatama-llm/fine-tuning/last_run.json`
    - this made lane status and successful return/adoption ambiguous across models
    - the training modal could also collapse late stream/adoption failures into a generic `network error`
  - local code fixes now in place:
    - `magatama/packages/fine-tuner/training_api.py`
      - lane-specific last-run files added:
        - `~/magatama-llm/fine-tuning/magatamallm-last_run.json`
        - `~/magatama-llm/fine-tuning/fo_blogllm-last_run.json`
        - `~/magatama-llm/fine-tuning/tip_llm-last_run.json`
      - legacy `last_run.json` remains only as backward-compatible mirror for `magatamallm`
      - successful RunPod adoption now creates:
        - a release alias per lane, e.g. `<active-alias>-rN`
      - active alias switching sequence is now:
        - candidate model imported
        - smoke-tested
        - release alias created
        - stable active alias repointed to that release alias
      - adoption report now includes:
        - `version_counter`
        - `release_alias`
    - `magatama/packages/fine-tuner/train.py`
      - local metrics writing now also respects lane-specific last-run files via `TRAINING_LANE`
    - `magatama/packages/dashboard/src/server.ts`
      - `/api/llm/status` now reads lane-specific last-run metadata first
      - `release_alias` is preferred as visible model version when present
      - RunPod SSE catch now distinguishes:
        - real generic training failure
        - `COMPLETED` but no artifact / failed adoption
      - the latter is now rendered as a truthful return/adoption failure, not a vague dataset/network issue
    - `magatama/packages/dashboard/public/index-v2.html`
      - training modal now suppresses misleading late generic `network error` if the server already emitted a terminal training status
      - if the stream ends without a final terminal server event, the UI now explicitly says the registry/adoption state must be checked
      - if the backend reports one of the following, the modal now shows that exact reason:
        - completed without artifact
        - completed without HF model
        - completed but adoption failed
  - local verification:
    - `python3 -m py_compile` passed for:
      - `training_api.py`
      - `train.py`
    - dashboard build passed:
      - `pnpm -C packages/dashboard build`
  - current operational blocker:
    - live deployment to Erik was **not yet completed in this step**
    - direct SSH checks returned:
      - `Connection refused`
      - then `Operation timed out`
    - because of that, the new lane-specific automation logic is locally ready, but not yet confirmed live on Erik for the currently running:
      - `tip_llm`
      - `fo_blogllm`
  - practical consequence:
    - the code path is now prepared for full automation:
      - pull from lane-specific training pool
      - train on RunPod
      - verify artifact existence
      - adopt locally
      - create new release alias/version
      - repoint stable active alias
      - show truthful status in UI
    - but the current live Erik run still needs redeploy + verification once SSH is reachable again

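The lane-specific last-run files and the `<active-alias>-rN` release alias described above can be sketched as two small helpers. This is a hedged Python illustration, not the contents of `training_api.py`: the function names and the choice to pass the counter in explicitly are assumptions, and alias tags such as `:latest` are deliberately not handled.

```python
from pathlib import Path
from typing import List, Tuple

FINE_TUNING_DIR = Path.home() / "magatama-llm" / "fine-tuning"
LEGACY_MIRROR_LANE = "magatamallm"  # only this lane still mirrors to last_run.json


def last_run_paths(lane: str, base: Path = FINE_TUNING_DIR) -> List[Path]:
    """Files a finished run for `lane` would write its last-run metadata to."""
    paths = [base / f"{lane}-last_run.json"]
    if lane == LEGACY_MIRROR_LANE:
        paths.append(base / "last_run.json")  # backward-compatible mirror
    return paths


def next_release_alias(active_alias: str, version_counter: int) -> Tuple[str, int]:
    """Return the next release alias and the incremented version counter."""
    new_counter = version_counter + 1
    return f"{active_alias}-r{new_counter}", new_counter


print(last_run_paths("tip_llm"))
print(next_release_alias("tip-llm-v2", 1))
```

Keeping one file per lane means a `tip_llm` run can no longer overwrite what the dashboard reports for `magatamallm`, which is the ambiguity the shared `last_run.json` used to cause.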
- MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:
  - result:
    - the lane export / dataset refresh worked
    - a new locally adopted MagatamaLLM model did **not** land
    - active MAGATAMA provider remains the older alias:
      - `ollama:magatama-coder:latest`
  - live/public evidence:
    - `GET https://magatama.fichtmueller.org/api/llm/status`
      - `activeProvider = ollama:magatama-coder:latest`
      - `autoFixProvider = ollama:magatama-coder:latest`
      - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
      - `training.modelVersion = magatama-coder:latest`
      - `training.activeRun = null`
    - this means the UI timestamp currently reflects the latest dataset/training-state update, not proof of a newly adopted local model.
  - local Mac evidence:
    - `ollama list` still shows:
      - `magatama-coder:latest` → modified `3 weeks ago`
      - `magatama-llm-v2-0:latest` → modified `11 days ago`
    - no newer Magatama candidate/import alias appeared locally
  - registry/adoption evidence:
    - Erik lane manifest exists and is fresh:
      - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
      - `generatedAt = 2026-05-06T22:45:15.944Z`
      - `train = 15679`
      - `eval = 1743`
      - `total = 17422`
    - but Erik had no populated local adoption/registry state files in:
      - `/opt/magatama/training-data/model-registry/models.json`
      - `/opt/magatama/training-data/model-registry/runs.json`
      - `/opt/magatama/training-data/model-registry/active.json`
      - `/opt/magatama/data/llm-status.json`
    - local repo only had historical `training-data/model-registry/training-runs.json`
  - historical run evidence:
    - recent `magatamallm` training-run records still show:
      - `submitted`
      - then `not_found_after_submit`
      - or other non-adopted / worker-failure states
    - there is still no verified “completed_and_adopted” proof for a new MagatamaLLM local model.
  - operational conclusion:
    - current truth:
      - dataset/lane preparation works
      - local model adoption is still the missing step
      - MAGATAMA does **not** currently know more than the already active `magatama-coder:latest` alias
    - next fix block remains:
      - make RunPod/local completion count only when adoption succeeds
      - persist adoption report + model registry state
      - update active alias and version only after smoke-tested import succeeds

- MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA had still shown mostly flat port chips.
  - verified live on Erik:
    - the real Switchblade runtime is the PM2 app `switchblade` under `/opt/switchblade-app`, not the older `/opt/switchblade` tree.
    - `GET http://127.0.0.1:3000/api/discovery/snmp` for `192.168.178.2` already returned rich rows such as:
      - `GigabitEthernet3` → description `Aruba-1830-UNUSED`, neighbor `VN46KYC0G0`, peer port `11`
      - `GigabitEthernet5` → description `Tashi-204`, neighbor `fritz.box`, peer `LAN:1`
      - `GigabitEthernet25` → description `to Cisco Business 220 Series`, neighbor `Switch39688E`, peer `gi9`
    - the remaining loss point was MAGATAMA’s own Switchblade sync/persistence path.
  - MAGATAMA sync hardening:
    - `scripts/switchblade_live_sync.ts`
      - now prefers live SNMP discovery data when it is richer than `/api/devices/<ip>`
      - now maps `description`, `peerDevice`, `peerPort`, `connectedHost`, `inOctets`, `outOctets` into rack device ports
      - added optional debug snapshot dump support via `SWITCHBLADE_DEBUG_SNAPSHOT_FILE`
      - sanitizes unreadable peer-port strings and drops synthetic high-index numeric pseudo-ports
    - verified with a forced live run on Erik:
      - `Top of Rack Switch` now exports `28` real SG350 ports into the rack snapshot instead of the earlier flattened/odd set
      - sample verified payloads before POST:
        - port 3 → `Aruba-1830-UNUSED` / `VN46KYC0G0` / `11`
        - port 5 → `Tashi-204` / `fritz.box` / `LAN:1`
        - port 25 → `to Cisco Business 220 Series` / `Switch39688E` / `gi9`
  - MAGATAMA core hardening:
    - `packages/core/src/routes/health-types.ts`
      - `SwitchbladePortSnapshot` now preserves:
        - `description`
        - `vlan`
        - `macCount`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `transceiver`
        - `inOctets`
        - `outOctets`
    - `packages/core/src/routes/health-support.ts`
      - `normalizeSwitchbladePort()` now keeps those additional port fields instead of silently truncating them
    - rebuilt locally and re-rsynced the new `packages/core/dist` to Erik
  - dashboard/UI hardening:
    - `packages/dashboard/public/index-v2.html`
      - port chips already had custom tooltip support; now they also carry native `title=` fallback text
      - this reduces the old “question mark / unclear hover” problem in browsers that do not immediately show the custom bubble
  - live public verification after deploy:
    - `GET https://magatama.fichtmueller.org/api/switchblade/snapshot`
      - now contains enriched SG350 rack-port records with:
        - `description`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `inOctets`
        - `outOctets`
    - public snapshot timestamp verified:
      - `receivedAt = 2026-05-06T22:51:59.247Z`
    - `Top of Rack Switch` in the public snapshot now exposes meaningful peer/use-case data instead of only flat status counters
  - operator impact:
    - MAGATAMA can now answer the actual operational question per port:
      - what is on this port
      - what is it talking to
      - what does the link look like
    - this is now grounded in Switchblade live SNMP/LLDP data, not guesswork.

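The core change above is that port normalization keeps the enriched fields instead of dropping them. The Python sketch below illustrates that behavior; it is not a port of `normalizeSwitchbladePort()`. The field names come from these notes, while the function names, the `name`/`status` base keys, and the "printable, non-empty string" rule for peer ports are assumptions.

```python
from typing import Any, Dict, Mapping, Optional

# Fields the notes say SwitchbladePortSnapshot now preserves.
PRESERVED_FIELDS = (
    "description", "vlan", "macCount", "peerDevice", "peerPort",
    "connectedHost", "transceiver", "inOctets", "outOctets",
)


def sanitize_peer_port(value: Any) -> Optional[str]:
    """Keep a peer-port label only if it is a readable, non-empty string."""
    if value is None:
        return None
    cleaned = str(value).strip()
    if not cleaned or not cleaned.isprintable():
        return None  # assumption: unprintable bytes count as "unreadable"
    return cleaned


def normalize_port(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy the base keys plus every preserved field that is actually present."""
    port: Dict[str, Any] = {"name": raw.get("name"), "status": raw.get("status")}
    for field in PRESERVED_FIELDS:
        value = raw.get(field)
        if field == "peerPort":
            value = sanitize_peer_port(value)
        if value is not None:
            port[field] = value
    return port


print(normalize_port({
    "name": "GigabitEthernet5", "status": "up",
    "description": "Tashi-204", "peerDevice": "fritz.box", "peerPort": "LAN:1",
}))
```

Dropping absent fields rather than writing `None` keeps the snapshot compact and lets the UI tell "unknown" apart from "empty".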
- TIP/Blog lane separation was materially corrected on 2026-05-06:
  - root cause:
    - `TIP_LLM` was still ingesting blog-/writer-shaped rows from the canonical lane pool and shared transceiver corpora.
    - local inspection showed the old TIP export had `6250` train rows, of which `6087` still matched blog/writer patterns.
  - dataset builder and Gitea sync were hardened:
    - `scripts/runpod_dataset_builder.ts`
      - added strict `tipDatasetAllowed(...)`
      - `TIP_LLM` now rejects blog-shaped source rows at dataset-build time
      - `TIP_LLM` now rejects blog-like `system`, `user`, and markdown-article `assistant` patterns
      - registry fallback for `TIP_LLM` now only uses lane-compatible datasets
    - `scripts/sync_gitea_training_pool.ts`
      - canonical TIP pool refresh now uses the stricter lane-alignment rules
      - redundant `merged.jsonl` copies for `fo_blogllm` and `tip_llm` are no longer rewritten, to avoid local disk exhaustion from duplicate lane artifacts
  - local disk issue encountered and fixed:
    - full refresh failed with `ENOSPC` while writing `training-data/gitea-learning-pool/tip_llm/merged.jsonl`
    - redundant lane `merged` artifacts for `fo_blogllm` and `tip_llm` were truncated and the sync script was changed to stop recreating them
    - free disk space returned from `377Mi` to `17Gi`
  - locally verified after rebuild:
    - `TIP_LLM` RunPod export:
      - `train = 233`
      - `eval = 26`
      - `total = 259`
      - `blog/writer matches = 0`
    - first TIP rows now use the correct TIP system prompt:
      - `You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...`
  - corrected artifacts and scripts were synced to Erik and `pnpm training:refresh-all` was rerun there.
  - live verified on Erik/public API:
    - `magatamallm`
      - `datasetSource = url`
      - `collectedExamples = 15679`
      - `evalExamples = 1743`
      - `totalExamples = 17422`
      - `newSinceLastTraining = 15679`
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectedExamples = 17322`
      - `evalExamples = 1926`
      - `totalExamples = 19254`
      - `neverTrained = true`
    - `tip_llm`
      - `datasetSource = url`
      - `collectedExamples = 231`
      - `evalExamples = 26`
      - `totalExamples = 257`
      - `neverTrained = true`
  - operational conclusion:
    - lane-specific dataset truth is now real on Erik.
    - `TIP_LLM` is no longer silently borrowing the FO_Blog behavior lane.
    - the next remaining hard problem is now RunPod artifact adoption/validation, not lane contamination.

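The lane filter described above rejects rows whose prompts look blog-shaped or whose answers look like markdown articles. A minimal Python sketch of such a filter follows. It is not `tipDatasetAllowed(...)` itself: the chat-style row layout (`messages` with `role` and `content`), the keyword patterns, and the "two or more headings" heuristic are all assumptions made for illustration.

```python
import re
from typing import Any, Mapping

# Hypothetical blog/writer markers; the real rules live in runpod_dataset_builder.ts.
BLOG_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"\bblog\b", r"\bwriter\b", r"\barticle\b")]


def looks_like_markdown_article(text: str) -> bool:
    """Heuristic: two or more markdown headings suggest a long-form article."""
    headings = re.findall(r"^#{1,3} ", text, flags=re.MULTILINE)
    return len(headings) >= 2


def tip_row_allowed(row: Mapping[str, Any]) -> bool:
    """Return False for rows that look like they belong to the blog lane."""
    for message in row.get("messages", []):
        role = message.get("role")
        content = message.get("content", "")
        if role in ("system", "user") and any(p.search(content) for p in BLOG_PATTERNS):
            return False
        if role == "assistant" and looks_like_markdown_article(content):
            return False
    return True
```

Filtering at dataset-build time, rather than after training, is what lets the export report `blog/writer matches = 0` as a checkable property of the bundle.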
- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
|
||
- dashboard and core were rebuilt locally and redeployed to Erik.
|
||
- live processes restarted successfully:
|
||
- `magatama-dashboard`
|
||
- `magatama`
|
||
- public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
|
||
- `collectedExamples = 15620`
|
||
- `effectiveExamples = 15620`
|
||
- `evalExamples = 1736`
|
||
- `totalExamples = 17356`
|
||
- `newSinceLastTraining = 15620`
|
||
- root cause for the stale `1097` display:
|
||
- the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
|
||
- this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
|
||
- after dataset refresh the UI now emits the lane manifest totals instead.
|
||
- RunPod completion handling was hardened:
|
||
- worker `COMPLETED` is no longer trusted blindly.
|
||
- MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
|
||
- if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
|
||
- public findings state remains currently empty:
|
||
- `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
|
||
- this is now rendered with an explicit empty-state row instead of a visually blank table.
|
||
- Attack Paths empty-state is now intentionally explicit rather than looking broken.
|
||
- Frontend cache and scope handling were hardened:
|
||
- cache version bumped to `2026-05-06b`
|
||
- stale legacy `magatama_api_cache:*` entries are cleared
|
||
- per-endpoint TTLs added
|
||
- invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
|
||
- Switchblade rack port hover was materially improved:
|
||
- port chips now carry `data-tooltip`
|
||
- custom tooltip CSS is live on Erik
|
||
- the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
|
||
- Changelog self-healing was added in core:
|
||
- stale cached changelog data older than 6h now forces a rebuild from git history
|
||
- verified live via dashboard proxy on Erik:
|
||
- `generatedAt = 2026-05-06T15:18:42.708Z`
|
||
- latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
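
The RunPod completion hardening in this block (scanning worker logs before trusting `COMPLETED`) could look roughly like the sketch below. Only the `Traceback` / `SyntaxError` / non-zero-exit markers and the `completed_with_worker_failure` status come from the notes; the exit-code log wording, the function names, and the `worker_logs_clean` label are illustrative assumptions, not the actual MAGATAMA implementation.

```python
import re

# Failure markers named in the notes: Traceback, SyntaxError, non-zero exit.
FAILURE_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"\bSyntaxError\b"),
    # Assumed log wording for a non-zero exit code (e.g. "exited with code 1").
    re.compile(r"exit(?:ed)?(?: with)?(?: code| status)?[ :=]+([1-9]\d*)", re.IGNORECASE),
]


def find_worker_failures(log_text: str) -> list[str]:
    """Return every log line that matches at least one failure pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


def classify_completed_run(log_text: str) -> str:
    """Classify a job that RunPod already reported as COMPLETED.

    "worker_logs_clean" is a hypothetical label; a clean log is only one
    precondition for success, not success itself.
    """
    if find_worker_failures(log_text):
        return "completed_with_worker_failure"
    return "worker_logs_clean"
```

A clean log scan would then feed into the later adoption checks rather than mark the run as successful on its own.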

- MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:
  - root cause:
    - the training modal always fetched `/api/llm/status` without a lane, so `FO_BlogLLM` and `TIP_LLM` still showed the `magatamallm` pool.
  - dashboard/server were updated so `/api/llm/status?lane=...` is now truly lane-aware.
  - the training modal now refreshes per selected lane and rewrites:
    - title
    - runtime label
    - pool path
    - counts
    - dataset source
  - MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via `ecosystem.config.cjs`:
    - `RUNPOD_DATASET_SOURCE=url`
    - `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
    - `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
    - `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
  - live verified on Erik after restart:
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
      - `train = 28`
      - `eval = 4`
      - `total = 32`
    - `tip_llm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
      - `train = 36`
      - `eval = 4`
      - `total = 40`
    - `magatamallm`
      - remains on lane-export counts (`15620 / 1736 / 17356`)
  - operator impact:
    - no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
    - every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing `magatamallm`.
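
The per-lane environment variables listed in this block suggest a simple lookup with a global fallback. The following is a minimal sketch under stated assumptions: the variable names come from the notes, while the precedence (lane-specific over global), the `"huggingface"` default, and the function name `resolve_dataset_source` are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_dataset_source(lane: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the RunPod dataset source for one lane.

    Assumed precedence: RUNPOD_DATASET_SOURCE_<LANE>, then
    RUNPOD_DATASET_SOURCE, then a hypothetical "huggingface" default.
    """
    env = os.environ if env is None else env
    lane_key = f"RUNPOD_DATASET_SOURCE_{lane.upper()}"
    return env.get(lane_key) or env.get("RUNPOD_DATASET_SOURCE") or "huggingface"
```

With the Erik configuration described above, every lane would resolve to `url` because both the global and the lane-specific variables are set to that value.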

- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
  - verified on Erik:
    - `pnpm training:refresh-all` now succeeds.
    - fresh dataset totals after dedupe:
      - `magatamallm`: `92,742` raw → `17,356` effective (`15,620 train / 1,736 eval`)
      - `fo_blogllm`: `32` total (`28 train / 4 eval`)
      - `tip_llm`: `40` total (`36 train / 4 eval`)
  - important nuance:
    - Codex did **not** execute the final Hugging Face publish step from Erik in this chat.
    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
  - MAGATAMA Attack Paths UX is no longer a misleading blank panel:
    - the page now distinguishes between:
      - no live attack paths
      - historical fallback paths
      - empty selected scope (`0 assets in scope`)
    - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
    - live dashboard HTML on Erik now contains these German UI strings:
      - `Im aktuellen Scope liegen 0 Assets.` (English: "The current scope contains 0 assets.")
      - `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.` (English: "Widen the location or datacenter/rack so that MAGATAMA can display correlatable assets and paths.")
      - `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.` (English: "Without open multi-stage correlations, the graph view intentionally stays empty.")
  - MAGATAMA code/training hardening was extended:
    - `scripts/test_runpod_adapter.py` no longer loads tokenizer/model with `trust_remote_code=True`.
    - `scripts/ollama_adapter_bridge.py` no longer loads tokenizer/model with `trust_remote_code=True`.
    - this removed the live CODE finding around `HuggingFace trust_remote_code` on Erik.
  - Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
    - generic `atlas-exposure` findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
    - internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
    - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
  - after rebuild + deploy + health sync:
    - live Postgres open findings returned to `0`.
- Follow-up hardening on the same block:
  - the earlier RunPod error path in MAGATAMA dashboard was made more truthful.
  - dataset preparation now distinguishes:
    - local `training:refresh-all` failure
    - optional Hugging Face publish failure
    - URL-based dataset mode with no external publish required
  - the training SSE flow now explicitly tells the operator whether RunPod is using:
    - Hugging Face dataset source
    - or MAGATAMA URL-bundle dataset source
  - this avoids misleading `RunPod not reachable` wording when the actual failure is in dataset preparation.
  - follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
    - MAGATAMA submit logic now verifies that a RunPod job really exists under `/status/{jobId}` instead of trusting `/run`.
    - payloads were aligned more closely with the official Axolotl serverless schema:
      - `model_type=AutoModelForCausalLM`
      - `tokenizer_type=AutoTokenizer`
      - dataset `split: train`
      - optimizer `adamw_torch_fused`
    - verified full run attempt:
      - job id `9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2`
      - disappeared as `not_found_after_submit` (`404 job not found`)
    - verified canary after payload fix:
      - job id `a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2`
      - immediately materialized as `IN_QUEUE`
      - then still disappeared on later reconcile as `not_found_after_submit`
    - current conclusion:
      - the old MAGATAMA bug is fixed.
      - the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
    - operational rule:
      - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
      - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
  - follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
    - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
    - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
    - synced current lane export to Erik and restarted `magatama-dashboard`.
    - verified public API now returns:
      - `collectedExamples = 1367`
      - `effectiveExamples = 1367`
      - `evalExamples = 152`
      - `totalExamples = 1519`
      - `newSinceLastTraining = 1367`
    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
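
The serverless operational rule in this block (do not trust `submitted` or a brief `IN_QUEUE`; trust only `IN_PROGRESS` or a terminal state with artifact evidence) can be sketched as a pure classification over one `/status/{jobId}` poll. The statuses `not_found_after_submit` and `completed_without_model_artifact` appear in these notes; the other labels, the set of failure states, and the function signature are assumptions for illustration.

```python
from typing import Optional

# Assumed terminal failure states reported by the serverless endpoint.
TERMINAL_FAILURES = frozenset({"FAILED", "CANCELLED", "TIMED_OUT"})


def assess_job(http_status: int, job_status: Optional[str], has_artifact: bool) -> str:
    """Classify a single status poll under the operational rule."""
    if http_status == 404:
        # The job vanished after /run accepted it.
        return "not_found_after_submit"
    if job_status == "IN_PROGRESS":
        return "trusted_running"  # hypothetical label
    if job_status == "COMPLETED":
        return "trusted_completed" if has_artifact else "completed_without_model_artifact"
    if job_status in TERMINAL_FAILURES:
        return "failed"  # hypothetical label
    # Covers a missing status and IN_QUEUE: accepted, but not yet proof of a usable run.
    return "unproven"
```

Keeping the rule as a pure function makes it easy to replay recorded polls, such as the canary that went from `IN_QUEUE` to a later `404`.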

- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
  - open findings were reduced all the way to `0` in Postgres.
  - false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
  - code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
  - `open findings: 0`
  - `queueExecuting: 0`
  - `queueBlocked: 0`
  - `queueFailed: 0`
  - public `/api/health` returns `status: ok`
  - public `/api/active-resolvers` returns:
    - `MAGATAMA Core: working`
    - `MagatamaLLM: working`
    - `Claude (secondary): working`
    - `Codex (secondary/manual): idle`
    - `Copilot (secondary/manual): idle`
- Important resolver truth fix on 2026-05-06:
  - live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
  - dashboard logic was updated so disabled Codex/Copilot now show truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
  - the local codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
- Remaining real operational gap after findings hit zero:
  - MAGATAMA still knows about more assets than it actively collects telemetry from.
  - last public protection proof showed:
    - `knownAssets: 79`
    - `hostsWithTelemetry: 27`
    - `assetsWithoutTelemetry: 52`
  - these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.

- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
  - Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
  - MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
  - Live verified/deduped MAGATAMA training state after the fix:
    - `collectedExamples: 49`
    - `rawExamples: 58`
    - `duplicateExamples: 9`
    - `effectiveExamples: 49`
    - `newSinceLastTraining: 49`
  - MAGATAMA now filters training metrics to verified/trainable examples only.
  - Failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
  - Gitea-backed training pool remains the default target for training writes.
- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
  - the earlier `49` medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
    - exposure-backed assets
    - explicit non-auto owner
    - configured telemetry expectation
    - critical/high criticality
    - infrastructure metadata or managed infra device types
  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
  - local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
  - live Postgres state after deploy: `open findings = 0`.
  - training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
    - failed/escalated/report-only runs now belong in `errors.jsonl`
  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
    - atlas coverage scope hardening
    - training path integrity fix
  - corpus cleanup + dedupe was executed afterward:
    - pre-dedupe backup kept locally as:
      - `magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl`
    - resulting verified corpus:
      - `fixes.jsonl = 1,368` unique verified training rows
    - resulting failure corpus:
      - `errors.jsonl = 4` tracked failed/escalated rows
    - integrity report now exists at:
      - `magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json`
    - latest integrity totals:
      - `scanned: 1368`
      - `verified: 1368`
      - `movedToErrors: 4`
      - `parseErrors: 0`
      - `invalidVerifiedFlag: 0`
- Complete Codex chat sync was added:
  - `sync/history/2026-04-29-codex-complete-chat-sync.md`
  - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
  - confirms no secrets were written into sync.
  - confirms TIP crawler/robot planning remains TIPLLM-only.
  - confirms Erik remains controller/light `erik-safe` only, with heavy crawler work assigned to Proxmox/Pi workers.
- Codex sync-start confirmation was added:
  - `sync/history/2026-04-29-codex-sync-start-confirmation.md`
  - confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating `sync/` as binding.
  - no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
- Codex follow-up on 2026-04-29 clarified the active BlogLLM model:
  - TIP shows `fo-blog-v7`, but this is not a normal Ollama GGUF manifest.
  - It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter`
  - Bridge definition:
    `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py`
  - TIP API default:
    `packages/api/src/llm/client.ts` uses `OLLAMA_LLM_MODEL || "fo-blog-v7"`.
  - `fo-blog-v8` remains the next training candidate, not the currently active TIP BlogLLM model.
- Full Codex session handoff was added:
  - `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing:
  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to TIP learning-pool build:
  - `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
- Added docs:
  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script:
  - `packages/scraper/package.json`
  - `robots:verification`

## Gitea Training Pool

- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit:
  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`

## MAGATAMA Training / Operations State

- Relevant local repo:
  - `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state:
  - `open findings: 0` on `2026-05-06`
- Latest confirmed live resolver state:
  - `Codex` and `Copilot` intentionally `idle/disabled`
  - not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix:
  - `newSinceLastTraining: 49`
- Meaning:
  - the old `0` was incorrect.
  - the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
- Latest corpus integrity state after cleanup:
  - operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
    - `1368` unique verified rows
    - `4` live failure/escalation rows in `errors.jsonl`
  - do not confuse raw historical volume with real trainable signal.
- Important training integrity rule:
  - report-only or failed/escalated records must not be treated as verified training fixes.
  - keep them separated from the main verified training corpus.
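
The integrity rule above (verified fixes in `fixes.jsonl`, failed/escalated/report-only rows in `errors.jsonl`, duplicates removed) can be illustrated with a small splitter. This is a sketch only: the record field names `verified` and `status`, the status values, and the hash-based dedupe key are assumptions and not taken from `fix-tracking.ts`.

```python
import hashlib
import json
from typing import Iterable

# Assumed status values that disqualify a row from the verified corpus.
NON_TRAINABLE = frozenset({"failed", "escalated", "report-only"})


def split_corpus(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (unique verified fixes, error rows)."""
    fixes: list[dict] = []
    errors: list[dict] = []
    seen: set[str] = set()
    for record in records:
        if record.get("verified") is not True or record.get("status") in NON_TRAINABLE:
            errors.append(record)
            continue
        # Dedupe on a stable hash of the canonical JSON form of the row.
        canonical = json.dumps(record, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fixes.append(record)
    return fixes, errors
```

Writing each returned list out as one JSON object per line would reproduce the two-file layout described in this section.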

## Erik Status

- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale misplaced remote-only duplicate files:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.
- Shared Erik note from the same chat:
  - MAGATAMA dashboard/core were redeployed during compliance/training fixes.
  - TIP crawler policy remains unchanged: Erik is controller/light runner only, not heavy crawl execution host.

## Last Live Verification Snapshot

From 2026-04-29:

- Total transceivers: `13,546`
- Price verified: `7,250`
- Image verified: `7,025`
- Details verified: `6,243`
- Fully verified: `5,812`
- Last price observation: `2026-04-29 19:15:53 UTC`
- Last stock observation: `2026-04-29 19:15:56 UTC`

## Latest MAGATAMA Training / RunPod Truth

Confirmed on `2026-05-06`:

- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
- Live Erik dashboard API now reports:
  - `magatamallm`
    - `1367 train`
    - `152 eval`
    - `1519 total`
    - `newSinceLastTraining = 1367`
  - `fo_blogllm`
    - `17353 train`
    - `1929 eval`
    - `19282 total`
    - `newSinceLastTraining = 17353`
    - active local model resolves to `fo-blog-v7`
  - `tip_llm`
    - `6482 train`
    - `721 eval`
    - `7203 total`
    - `newSinceLastTraining = 6482`
    - target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
- Result:
  - previous `1097` everywhere was stale / wrong.
  - selected lane now controls its own manifest, model label, and training counts.

### Gitea-backed Pool Materialization

- `magatamallm` Gitea pool remains canonical and populated.
- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
- Lane manifests and JSONL exports now exist under:
  - `training-data/gitea-learning-pool/fo_blogllm/`
  - `training-data/gitea-learning-pool/tip_llm/`

### RunPod Completion Hardening

- MAGATAMA dashboard code now treats RunPod `COMPLETED` as success only after:
  1. target model artifact is referenced
  2. local Mac training API adopts/imports the artifact
  3. lane-specific smoke tests pass
  4. active Ollama alias is updated
- New local adoption endpoint is:
  - `POST /adopt-runpod-model`
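
The four-step gate above can be sketched as an ordered check that reports the first missing piece of evidence. Only `completed_without_model_artifact` is a status named in these notes; the dataclass, its field names, and the other outcome labels are hypothetical and serve purely as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdoptionEvidence:
    """Evidence collected after RunPod reports COMPLETED (hypothetical shape)."""

    artifact_referenced: bool
    adopted_locally: bool
    smoke_tests_passed: bool
    alias_updated: bool


def completion_outcome(runpod_status: str, evidence: AdoptionEvidence) -> str:
    """Return "success" only when every gate step holds, in order."""
    if runpod_status != "COMPLETED":
        return "not_completed"
    checks = (
        (evidence.artifact_referenced, "completed_without_model_artifact"),
        (evidence.adopted_locally, "adoption_failed"),
        (evidence.smoke_tests_passed, "smoke_tests_failed"),
        (evidence.alias_updated, "alias_not_updated"),
    )
    for passed, failure_label in checks:
        if not passed:
            return failure_label
    return "success"
```

Evaluating the steps in order means a run without an artifact is never reported as a smoke-test or alias problem.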

### Mac Training API State

- The old LaunchAgent on Mac Studio was still serving the legacy training API from:
  - `~/magatama-llm/service/training_api.py`
- It has now been upgraded in place so Erik sees the new adoption-capable API.
- Verified from Erik:
  - `http://192.168.178.213:3214/health` returns the new service
  - it now exposes `register_script` pointing into the MAGATAMA repo
  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live

### Still Outstanding

- A fully successful end-to-end RunPod fine-tune with:
  - real worker success
  - real artifact
  - successful local Ollama import
  - active alias switch
  - smoke-test proof

  has not yet been re-verified after the new adoption pipeline was wired in.
- Latest live proof run on `2026-05-06`:
  - job id: `2112a7ab-68c2-4411-a44f-6edb7ad377df-e1`
  - materialized correctly
  - reached `IN_PROGRESS`
  - then `COMPLETED`
  - but RunPod `status/{job}` returned no `output` object, no model artifact reference, and no Hugging Face repo result
  - current MAGATAMA handling now correctly classifies this as `completed_without_model_artifact`, not as success
- `tip-llm-v1` is still not installed locally in Ollama.

### Pulso AI Recommendation

- Keep a shared network/transceiver/switch core corpus with TIP.
- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
- Recommended split:
  - `TIP_LLM`
    - research
    - crawler / scraper / robot planning
    - vendor / firmware / issue extraction
  - `Pulso AI`
    - product responses
    - support
    - diagnostics
    - operator explanation layer

## Safe Next Steps

1. Clone or pull Gitea `origin` on laptop/Claude Code.
2. Read this folder first.
3. For BlogLLM work, treat `fo-blog-v7` as Adapter Bridge / PEFT adapter, not as a `~/.ollama` GGUF model.
4. Also read `llm-gateway/sync/CURRENT.md` when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
8. If testing robots, start with dry runs only:

```bash
npm run robots:verification -w packages/scraper -- --status
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

9. Only dispatch real crawl work after deciding the target host:
   - Erik: `erik-safe`, tiny batches only.
   - Pi: `pi-fetch`.
   - Proxmox: `proxmox-heavy`.

## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session, some appear pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.

## Latest Sync Commits

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- `bba48d3 sync: record magatama atlas rematerialization fix`
- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
- `8b42077 sync: refresh cross-agent chat handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth

### Atlas / Findings

- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Root causes fixed live:
  1. `packages/core/src/routes/health-builders.ts`
     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
  2. `packages/core/src/scheduler.ts`
     - generic stale auto-resolve no longer auto-closes:
       - `atlas-coverage-gap`
       - `atlas-exposure`
       - `atlas-host-audit`
  3. `packages/dashboard/public/index-v2.html`
     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
- Live public verification after deploy:
  - `/api/protection-proof` shows non-zero Atlas truth again.
  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
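
The scheduler change above (generic stale auto-resolve skipping the three Atlas finding types) can be illustrated as follows. The real logic lives in `packages/core/src/scheduler.ts` (TypeScript); this Python sketch only mirrors the rule, and the 24-hour staleness threshold, the function name, and the `code-scan` example type are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Finding types that the notes say must no longer be auto-closed.
PROTECTED_TYPES = frozenset({"atlas-coverage-gap", "atlas-exposure", "atlas-host-audit"})


def should_auto_resolve(
    finding_type: str,
    last_seen: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # hypothetical threshold
) -> bool:
    """Decide whether the generic stale auto-resolve may close a finding."""
    if finding_type in PROTECTED_TYPES:
        # Atlas findings are rematerialized by their own audit logic instead.
        return False
    return now - last_seen > max_age


if __name__ == "__main__":
    now = datetime(2026, 5, 9, tzinfo=timezone.utc)
    stale = now - timedelta(days=3)
    print(should_auto_resolve("atlas-exposure", stale, now))
    print(should_auto_resolve("code-scan", stale, now))
```

Excluding the types up front keeps Atlas findings open until the audit that created them decides otherwise.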

### Training / Lane Registry

- The public training status is now honest for the current live state:
  - `magatamallm`
    - `datasetSource: url`
    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
    - `15679 train`
    - `1743 eval`
    - `17422 total`
    - `lastRegistryRunStatus: completed_without_model_artifact`
  - `fo_blogllm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
  - `tip_llm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
  - lane datasets
  - lane RunPod manifests
  - `training-runs.json`
- The live compiled registry on Erik is no longer all-`null`; it now exposes:
  - `activeModel`
  - `version`
  - `lastRunId`
  - `lastRunStatus`
  - `datasetSource`
  - `collectionsPath`

### Still Outstanding

- Full automatic training is still blocked by the managed RunPod Axolotl endpoint:
  - jobs reach `COMPLETED`
  - but no adoptable artifact is returned
  - therefore MAGATAMA correctly records:
    - `completed_without_model_artifact`
- That means:
  - no new model version can be truthfully activated yet
  - no Ollama alias switch should happen yet
- Remaining real blocker:
  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.