transceiver-db/sync/history/2026-05-09-magatama-all-lane-runpod-training-start.md


# MAGATAMA All-Lane RunPod Training Start
Date: 2026-05-09 23:09 UTC
## Scope
- Train all current MAGATAMA LLM lanes via RunPod:
  - `magatamallm`
  - `fo_blogllm`
  - `tip_llm`
  - `pulso_llm`
  - `contact_llm`
## Preflight
- MAGATAMA services were online on Erik.
- Active RunPod endpoint reported by MAGATAMA: `0rmkf28w2g5gip`.
- RunPod worker kind: `custom-magatama`.
- Dataset source: URL-based lane export.
- Previous successful/adopted runs existed for:
  - `magatamallm`
  - `fo_blogllm`
  - `tip_llm`
- No previous run existed yet for:
  - `pulso_llm`
  - `contact_llm`
## Runner Fix
- Fixed `scripts/trigger_lane_training_once.py` locally and on Erik.
- MAGATAMA Gitea commit: `76d4054`.
- The script previously sent stale request field names:
  - `iterations`
  - `seedOnly`
- The MAGATAMA training API expects:
  - `iters`
  - `seed_only`
- Added `all` mode to run all lanes sequentially.
- Added streamed SSE logging so progress is visible during long RunPod runs.
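
The three changes above can be illustrated with a minimal sketch. It is not the code from commit `76d4054`; the helper names (`normalize_payload`, `expand_lanes`, `iter_sse_data`) and the `lane` payload field are hypothetical, and the SSE handling only follows the generic `data:` line convention, since the event shape of the MAGATAMA API is not described in this note.

```python
from typing import Dict, Iterable, Iterator, List

# Lane names as listed in the Scope section of this note.
LANES: List[str] = ["magatamallm", "fo_blogllm", "tip_llm", "pulso_llm", "contact_llm"]

# Stale field name -> field name the training API expects (from the notes above).
RENAMED_FIELDS: Dict[str, str] = {"iterations": "iters", "seedOnly": "seed_only"}


def normalize_payload(payload: dict) -> dict:
    """Return a copy of the payload with stale field names replaced."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in payload.items()}


def expand_lanes(lane: str) -> List[str]:
    """Map 'all' to every lane in order; otherwise validate a single lane name."""
    if lane == "all":
        return list(LANES)
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    return [lane]


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Comment lines (starting with ':') and other fields are ignored here.
    if buffer:
        # Flush a trailing event so the last log line is not lost (a logging choice).
        yield "\n".join(buffer)
```

In a runner like this, each string yielded by `iter_sse_data` could be printed immediately, which is what makes progress visible during a long job when the script runs with `python3 -u`.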
## Live Run
- Started on Erik:
  - `python3 -u scripts/trigger_lane_training_once.py all 500 false`
- Log:
  - `/opt/magatama/logs/runpod-all-lanes-20260509T230549Z.log`
- First active lane:
  - `magatamallm`
- First RunPod job:
  - `89627e7e-8533-45db-9fe8-eca994018aa6-e2`
- Initial `magatamallm` dataset:
  - `1375 train`
  - `153 eval`
  - `1528 total`
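
This note does not document the positional arguments of the command above. The sketch below assumes they are the lane (or `all`), the iteration count, and the seed-only flag, in that order, which matches the `iters` and `seed_only` fields named in the Runner Fix section. `parse_bool`, `parse_cli`, and `eval_fraction` are hypothetical helpers, not functions from the actual script.

```python
from typing import List, Tuple


def parse_bool(text: str) -> bool:
    """Parse a textual boolean such as 'false' or 'true'."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_cli(argv: List[str]) -> Tuple[str, int, bool]:
    """Interpret [lane, iters, seed_only]; this ordering is an assumption."""
    if len(argv) != 3:
        raise ValueError("expected: <lane|all> <iters> <seed_only>")
    lane, iters_text, seed_only_text = argv
    return lane, int(iters_text), parse_bool(seed_only_text)


def eval_fraction(train: int, eval_count: int) -> float:
    """Share of the dataset held out for evaluation."""
    return eval_count / (train + eval_count)
```

Under that assumption, `parse_cli(["all", "500", "false"])` reads as: all lanes, 500 iterations, not seed-only. The reported dataset counts are internally consistent (1375 + 153 = 1528), and `eval_fraction(1375, 153)` works out to roughly 10% held out for evaluation.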
## Success Rule
- Do not treat RunPod `COMPLETED` as success by itself.
- A lane counts as successful only when all of the following hold:
  - the model artifact exists,
  - MAGATAMA imports/adopts it locally,
  - smoke checks pass,
  - the active alias/version is updated.
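
The rule can be written as a single predicate. This is a minimal sketch, not MAGATAMA's implementation: the `LaneResult` record and its field names are hypothetical. It also assumes that a RunPod status of `COMPLETED` is still required, just not sufficient on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LaneResult:
    """Hypothetical record of what is known about one lane after its run."""
    runpod_status: str
    artifact_exists: bool
    adopted_locally: bool
    smoke_checks_passed: bool
    alias_updated: bool


def unmet_conditions(result: LaneResult) -> List[str]:
    """List every success condition that does not hold yet."""
    checks = [
        ("RunPod status is COMPLETED", result.runpod_status == "COMPLETED"),
        ("model artifact exists", result.artifact_exists),
        ("MAGATAMA imported/adopted the artifact", result.adopted_locally),
        ("smoke checks passed", result.smoke_checks_passed),
        ("active alias/version updated", result.alias_updated),
    ]
    return [name for name, ok in checks if not ok]


def lane_succeeded(result: LaneResult) -> bool:
    """A lane succeeds only when no condition is unmet."""
    return not unmet_conditions(result)
```

Returning the list of unmet conditions, rather than only a boolean, makes it easier to log why a lane that RunPod reports as `COMPLETED` is still not considered done.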