rene/llm-gateway

Rene Fichtmueller 060b846d9b feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00


Open Source Blueprint: Adaptive LLM Gateway

Companion documents:

AI_CONTROL_PLANE_SYSTEM_DESIGN.md — canonical control-plane architecture
OPEN_SOURCE_GAP_ANALYSIS.md — current gateway vs. OSS target
OPEN_SOURCE_FEATURE_MATRIX.md — feature state and priority
OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md — phase-by-phase build plan

Vision

Turn the Context-X LLM Gateway into an open-source, self-adapting LLM control plane that can run on a user's own machine or server, discover the local AI/dev environment, and expose it through a secure MCP server plus OpenAI-compatible APIs.

The open-source version should not assume Context-X infrastructure. It should install cleanly, detect what is available, ask before using sensitive integrations, and then wire local models, hosted providers, tools, documents, and developer environments into one gateway.

Product Shape

Working name: Adaptive LLM Gateway

Core promise:

Bring your own local or hosted models.
Run a private MCP server with an optional local LLM.
Detect common tools and runtimes automatically.
Expose one unified API for apps, agents, IDEs, and automations.
Keep secrets and private data local by default.

Differentiating Core Modules

The open-source project should lead with four features that make it more than a model proxy:

Trust Router
Context Receipt
Shared Gitea Memory
AI Handoff Protocol

The second core layer should add learning, accountability, and repeatability:

Capability Benchmark Lab
Agent Reputation Score
Local Consent Ledger
Reproducible AI Runs

The execution pipeline should be:

Client Entry
  -> Trust Router
  -> Policy Engine
  -> Memory Query
  -> Compression Engine
  -> Provider Router
  -> Execution Layer
  -> Receipt Engine
  -> Memory Update
  -> Route Reflector Memory

Together, these modules create a trusted coordination layer for all AI clients and agents on a user's system.

Request
  |
  v
Trust Router
  - validate client identity
  - assign trust level
  - classify request type and sensitivity
  |
  v
Policy Engine
  - enforce provider/model/tool permissions
  - apply cost, compliance, and project rules
  |
  v
Context Builder
  - memory
  - files
  - retrieved docs
  - compressed history
  |
  v
LLM / Agent / MCP Tool
  |
  v
Context Receipt + Shared Memory Update + Route Reflector Learning

Trust Router

The Trust Router decides which model, provider, agent, and tool chain may handle a request.

It should classify every request by:

data sensitivity
task type
required capabilities
allowed tools
user/team policy
cost and latency budget
local model availability

Suggested trust levels:

| Trust Level | Meaning | Allowed Routing |
| --- | --- | --- |
| `public` | Safe public/non-sensitive content | Any enabled provider |
| `internal` | Project context, private notes, normal code | Local or approved providers |
| `confidential` | Customer data, private business data, security findings | Local-only or explicitly trusted provider |
| `secret` | API keys, credentials, tokens, private keys | Block, redact, or local security scanner only |

Policy example:

trust_router:
  default_mode: hybrid-safe
  rules:
    - match:
        contains_secret: true
      action: block
    - match:
        sensitivity: confidential
      route: local-only
    - match:
        task_type: code_generation
        sensitivity: internal
      route: [claude-code, codex, local-code-model]
    - match:
        task_type: brainstorming
        sensitivity: public
      route: [openai, anthropic, local]

The Trust Router should always record why it made each decision and optionally expose that explanation to users.
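
The rule list above could be evaluated roughly as follows. This is a minimal sketch, not the gateway's actual implementation: the type names, the `decideTrust` function, the first-match-wins ordering, and the local-only fallback are all assumptions made for illustration. Only the rule fields mirror the YAML example.

```typescript
// Hypothetical types; field names mirror the YAML policy example.
type Sensitivity = "public" | "internal" | "confidential" | "secret";

interface ClassifiedRequest {
  sensitivity: Sensitivity;
  taskType: string;
  containsSecret: boolean;
}

interface TrustRule {
  match: { containsSecret?: boolean; sensitivity?: Sensitivity; taskType?: string };
  action?: "block";
  route?: string[];
}

interface TrustDecision {
  action: "block" | "route";
  route: string[];
  reason: string; // kept so the router can explain its decision
}

// A rule matches when every field it specifies equals the request's value.
function ruleMatches(rule: TrustRule, req: ClassifiedRequest): boolean {
  const m = rule.match;
  if (m.containsSecret !== undefined && m.containsSecret !== req.containsSecret) return false;
  if (m.sensitivity !== undefined && m.sensitivity !== req.sensitivity) return false;
  if (m.taskType !== undefined && m.taskType !== req.taskType) return false;
  return true;
}

// Assumption: the first matching rule wins; the blueprint does not fix this.
function decideTrust(rules: TrustRule[], req: ClassifiedRequest): TrustDecision {
  for (let i = 0; i < rules.length; i++) {
    const rule = rules[i];
    if (!ruleMatches(rule, req)) continue;
    if (rule.action === "block") {
      return { action: "block", route: [], reason: "rule " + i + " blocked the request" };
    }
    return { action: "route", route: rule.route || [], reason: "rule " + i + " matched" };
  }
  // Assumption: stay local when no rule matches.
  return { action: "route", route: ["local-only"], reason: "no rule matched; default local-only" };
}

const exampleRules: TrustRule[] = [
  { match: { containsSecret: true }, action: "block" },
  { match: { sensitivity: "confidential" }, route: ["local-only"] },
  { match: { taskType: "code_generation", sensitivity: "internal" }, route: ["claude-code", "codex", "local-code-model"] },
  { match: { taskType: "brainstorming", sensitivity: "public" }, route: ["openai", "anthropic", "local"] },
];
```

Keeping a `reason` string on every decision is one simple way to satisfy the requirement that routing stays explainable.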

Policy Engine

The Policy Engine evaluates what is allowed after the Trust Router has classified the request.

It should evaluate:

allowed providers
allowed models
allowed tools
data sensitivity
project policy
compliance rules
cost limits
offline/simulation/live mode

Example policies:

never send legal data to public APIs
prefer local models for internal code
use external models only if confidence is below a threshold
block requests containing secrets
require admin override for production deployment tools

The output is a route constraint set:

allowed_routes: [ollama, claude-code]
blocked_routes:
  - provider: openai
    reason: confidential data policy
required_redactions: []
max_request_cost_usd: 0.10
mode: live
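
One way to apply such a constraint set to a list of candidate routes is sketched below. The interfaces and the `applyConstraints` function are hypothetical; the camelCase fields are an assumed in-memory form of the YAML keys above.

```typescript
// Hypothetical in-memory form of the route constraint set.
interface BlockedRoute { provider: string; reason: string }

interface RouteConstraints {
  allowedRoutes: string[];
  blockedRoutes: BlockedRoute[];
  maxRequestCostUsd: number;
  mode: "live" | "simulation" | "offline";
}

interface RouteCandidate { provider: string; estimatedCostUsd: number }

// Keep only candidates that are allowed, not blocked, and within the cost limit.
function applyConstraints(candidates: RouteCandidate[], c: RouteConstraints): RouteCandidate[] {
  const blocked = c.blockedRoutes.map(b => b.provider);
  return candidates.filter(cand =>
    c.allowedRoutes.indexOf(cand.provider) !== -1 &&
    blocked.indexOf(cand.provider) === -1 &&
    cand.estimatedCostUsd <= c.maxRequestCostUsd
  );
}
```

The Provider Router would then only ever see routes that survived this filter.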

Provider Router

The Provider Router makes the final execution decision after policy and compression.

It chooses:

local model
external provider
AI agent/client
MCP tool
fallback chain

Inputs:

policy constraints
model availability
provider health
latency
cost
benchmark scores
agent reputation
Route Reflector Memory

The Provider Router should support live, simulation, and offline modes.
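
A fallback chain could be derived from these inputs by scoring each healthy candidate and sorting. The sketch below is illustrative only: the `ProviderCandidate` shape, the weights, and the normalisation constants (10 seconds, 0.10 USD) are assumptions, and it covers only health, latency, cost, and benchmark score.

```typescript
// Hypothetical candidate record; benchmarkScore is assumed to be in 0..1.
interface ProviderCandidate {
  id: string;
  healthy: boolean;
  latencyMs: number;
  costUsd: number;
  benchmarkScore: number;
}

// Illustrative scoring: benchmark quality minus latency and cost penalties.
function scoreProvider(p: ProviderCandidate): number {
  const latencyPenalty = Math.min(p.latencyMs / 10000, 1);
  const costPenalty = Math.min(p.costUsd / 0.1, 1);
  return p.benchmarkScore - 0.3 * latencyPenalty - 0.3 * costPenalty;
}

// Unhealthy providers are dropped; the rest are ordered best first.
function buildFallbackChain(candidates: ProviderCandidate[]): string[] {
  return candidates
    .filter(p => p.healthy)
    .sort((a, b) => scoreProvider(b) - scoreProvider(a))
    .map(p => p.id);
}
```

The first entry is the primary route and the remaining entries form the fallback chain.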

Context Receipt

The gateway should be able to produce a receipt for every answer, showing which context was used and which was blocked.

Example:

receipt_id: ctxr_2026_05_01_001
request_id: req_abc123
model: qwen2.5:14b
provider: ollama
trust_level: internal
route_reason:
  - local model selected because project memory was private
  - external providers skipped by policy
context_used:
  - type: memory
    ref: projects/adaptive-llm-gateway/PROJECT.md
  - type: file
    ref: OPEN_SOURCE_BLUEPRINT.md
  - type: retrieval
    ref: memory/decisions/2026-05-01-gitea-memory.md
context_blocked:
  - type: file
    ref: .env
    reason: secret pattern
  - type: provider
    ref: openai
    reason: confidential policy
tokens:
  input: 4200
  output: 900
  compressed_from: 13200
cost:
  estimated_usd: 0

Receipts can be stored locally, pushed to shared memory, or attached to audit logs.
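
The `tokens` block of a receipt makes compression savings easy to report. The sketch below assumes a hypothetical `ReceiptTokens` shape (camelCase form of the YAML keys) and a hypothetical `compressionSummary` helper.

```typescript
// Hypothetical in-memory form of the receipt's `tokens` block.
interface ReceiptTokens { input: number; output: number; compressedFrom: number }

// Tokens saved by compression, and the compressed-to-original ratio.
function compressionSummary(t: ReceiptTokens): { savedTokens: number; ratio: number } {
  if (t.compressedFrom <= 0) return { savedTokens: 0, ratio: 1 }; // nothing was compressed
  return { savedTokens: t.compressedFrom - t.input, ratio: t.input / t.compressedFrom };
}
```

For the example receipt (4200 input tokens compressed from 13200), this gives 9000 saved tokens and a ratio of about 0.32.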

AI Handoff Protocol

Define a simple handoff format so Claude Code, Codex, ChatGPT, Cursor, n8n, and other agents can pass work to each other without losing context.

Handoff files should be plain Markdown with YAML frontmatter or pure YAML/JSON.

Example:

handoff_version: 1
id: handoff_2026_05_01_001
project: adaptive-llm-gateway
from_agent: claude-code
to_agent: codex
created_at: 2026-05-01T12:00:00Z
status: ready
goal: Implement MCP memory tools.
current_state:
  summary: Blueprint exists. Need package scaffold and safe tool definitions.
  branch: main
  files_changed:
    - OPEN_SOURCE_BLUEPRINT.md
constraints:
  - Do not expose shell tools by default.
  - Do not sync secrets.
next_actions:
  - Create packages/mcp-server.
  - Add memory.search and memory.write tools.
  - Add tests for policy enforcement.
context_refs:
  - memory/projects/adaptive-llm-gateway/PROJECT.md
  - memory/decisions/2026-05-01-shared-gitea-memory.md
open_questions:
  - Should SQLite be mandatory for personal mode?
confidence: 0.82

Recommended folders:

memory/projects/<project>/handoffs/
memory/agents/<agent>/sessions/
memory/decisions/

The protocol should be append-first and easy for humans to read.
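
Because handoffs are written by different agents, a receiving agent may want to sanity-check a file before acting on it. The following is a sketch under assumptions: `HandoffDoc` covers only a subset of the example fields, and the specific checks in `validateHandoff` are illustrative, not part of any agreed protocol.

```typescript
// Subset of the example handoff fields; the checks below are illustrative.
interface HandoffDoc {
  handoff_version: number;
  id: string;
  project: string;
  from_agent: string;
  to_agent: string;
  status: string;
  goal: string;
  next_actions: string[];
  confidence?: number;
}

// Returns a list of human-readable problems; an empty list means the handoff looks usable.
function validateHandoff(h: HandoffDoc): string[] {
  const problems: string[] = [];
  if (h.handoff_version !== 1) problems.push("unsupported handoff_version");
  if (h.from_agent === h.to_agent) problems.push("from_agent and to_agent must differ");
  if (h.goal.trim().length === 0) problems.push("goal is empty");
  if (h.next_actions.length === 0) problems.push("next_actions is empty");
  if (h.confidence !== undefined && (h.confidence < 0 || h.confidence > 1)) {
    problems.push("confidence must be within 0..1");
  }
  return problems;
}
```

Returning all problems at once, rather than throwing on the first, keeps the result readable for both humans and agents.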

Capability Benchmark Lab

The gateway should benchmark every detected model, provider, and major agent integration before trusting it for routing.

Benchmarks should be local, transparent, and repeatable.

Test dimensions:

JSON/schema reliability
code generation
code patch quality
instruction following
German/English quality
summarization
tool-call readiness
latency
cost
context length behavior
private-data safety
refusal/guardrail behavior

Example benchmark result:

model: qwen2.5:14b
provider: ollama
benchmarked_at: 2026-05-01T12:00:00Z
scores:
  json_schema: 0.84
  code_generation: 0.71
  german: 0.88
  summarization: 0.91
  latency: 0.76
  privacy: 1.00
recommended_for:
  - private_summarization
  - german_drafts
  - internal_qa
not_recommended_for:
  - complex_code_patch

The Trust Router should use benchmark results instead of static assumptions.
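
A router could consult stored benchmark results along these lines. This is a sketch: the `BenchmarkResult` shape follows the example above, while `bestModelFor` and its `minScore` threshold are assumptions.

```typescript
// Shape follows the example benchmark result; scores are assumed to be in 0..1.
interface BenchmarkResult {
  model: string;
  provider: string;
  scores: { [dimension: string]: number };
}

// Pick the highest-scoring model for one dimension, or null if none reaches minScore.
function bestModelFor(dimension: string, results: BenchmarkResult[], minScore: number): string | null {
  let best: BenchmarkResult | null = null;
  for (const r of results) {
    const s = r.scores[dimension];
    if (s === undefined || s < minScore) continue; // not benchmarked or too weak
    if (best === null || s > best.scores[dimension]) best = r;
  }
  return best === null ? null : best.model;
}
```

Returning `null` instead of a weak model lets the caller fall back to another route or ask the user.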

Agent Reputation Score

Track how well each connected AI client or agent performs on real tasks.

Agents can include:

Codex
Claude Code
ChatGPT
Cursor
VS Code assistants
n8n workflows
local autonomous agents

Metrics:

task success rate
test pass rate
human approval rate
rollback rate
average latency
average token/cost usage
policy violation count
handoff quality
reproducibility score

Example:

agent: codex
period: 30d
score: 0.91
strengths:
  - code_patches
  - test_fixes
  - small_refactors
weaknesses:
  - broad_product_strategy
metrics:
  test_pass_rate: 0.94
  rollback_rate: 0.03
  avg_handoff_quality: 0.87

Agent scores should guide routing:

send code patches to agents with high patch/test scores
send long analysis to agents with high synthesis scores
keep private tasks with local agents/models when policy requires it
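
A composite score could be a weighted blend of a few metrics. The sketch below is purely illustrative: the weights are invented, only three of the listed metrics are used, and it is not meant to reproduce the `score: 0.91` in the example above.

```typescript
// Hypothetical subset of the metrics listed above, each in 0..1.
interface AgentMetrics {
  testPassRate: number;
  rollbackRate: number;
  avgHandoffQuality: number;
}

// Illustrative weights; a lower rollback rate raises the score.
function reputationScore(m: AgentMetrics): number {
  return 0.5 * m.testPassRate + 0.3 * (1 - m.rollbackRate) + 0.2 * m.avgHandoffQuality;
}

// Choose the agent with the highest score, or null for an empty list.
function bestAgent(agents: { id: string; metrics: AgentMetrics }[]): string | null {
  let bestId: string | null = null;
  let bestScore = -1;
  for (const a of agents) {
    const s = reputationScore(a.metrics);
    if (s > bestScore) { bestScore = s; bestId = a.id; }
  }
  return bestId;
}
```

In practice the weights would likely differ per task type, for example weighting test pass rate more heavily for code patches.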

Local Consent Ledger

Store user permissions as an auditable local ledger.

The consent ledger answers:

Which agents can read which memory?
Which agents can write memory?
Which tools can be called?
Which folders can be indexed?
Which providers can receive which trust levels?
Which actions require confirmation?

Example:

consent_version: 1
updated_at: 2026-05-01T12:00:00Z
agents:
  codex:
    memory:
      read: [project, decisions, runbooks]
      write: [sessions, handoffs, tasks]
    tools:
      allowed: [repo.search, memory.write, tests.run]
      confirm: [git.push, file.delete]
      denied: [secrets.read, deploy.production]
    providers:
      public_llm_allowed: false
  claude-code:
    memory:
      read: [project, decisions, architecture]
      write: [sessions, decisions]
    tools:
      allowed: [repo.search, memory.write]
      confirm: [file.write]

Consent changes should be append-only:

memory/consent/ledger.jsonl

The gateway may generate config snippets from consent, but it should ask before editing external tool settings.
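
Checking a tool call against the ledger, and recording a change as a new line, might look like this. The sketch makes several assumptions: `ToolConsent`, `checkTool`, and `ledgerLine` are hypothetical names, unknown tools are denied by default, and the JSONL entry fields are invented for illustration.

```typescript
type ToolDecision = "allow" | "confirm" | "deny";

// Mirrors the `tools` block of one agent in the consent example.
interface ToolConsent { allowed: string[]; confirm: string[]; denied: string[] }

// Denials take precedence, then confirmations, then plain allows.
function checkTool(consent: ToolConsent, tool: string): ToolDecision {
  if (consent.denied.indexOf(tool) !== -1) return "deny";
  if (consent.confirm.indexOf(tool) !== -1) return "confirm";
  if (consent.allowed.indexOf(tool) !== -1) return "allow";
  return "deny"; // assumption: tools not mentioned in the ledger are denied
}

// Append-only: each change becomes a new JSONL line; earlier lines are never edited.
function ledgerLine(agent: string, change: string, updatedAt: string): string {
  return JSON.stringify({ consent_version: 1, agent: agent, change: change, updated_at: updatedAt });
}
```

Each string returned by `ledgerLine` would be appended to `memory/consent/ledger.jsonl`.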

Reproducible AI Runs

Every important AI run should be replayable.

Store:

request id
agent id
model/provider
prompt template version
context receipt
trust policy version
memory refs
retrieval refs
tool calls
redaction decisions
output
human feedback

Example run folder:

memory/runs/2026/05/01/req_abc123/
  request.yaml
  context-receipt.yaml
  prompt.md
  output.md
  toolcalls.jsonl
  feedback.yaml

Replay modes:

exact: same context refs and same model/provider where possible
compare: same input against several models
policy-replay: rerun trust routing with a newer policy
compression-replay: test different compression settings

This makes the gateway debuggable, auditable, and useful for evaluation.
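
The run folder layout shown above can be derived from a request ID and a timestamp. This sketch assumes the date components are taken in UTC, which the blueprint does not specify; `runFolder` and `pad2` are hypothetical helper names.

```typescript
// Zero-pad a month or day to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Build memory/runs/YYYY/MM/DD/<requestId>/ from a timestamp (UTC is an assumption).
function runFolder(requestId: string, when: Date): string {
  const year = when.getUTCFullYear();
  const month = pad2(when.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const day = pad2(when.getUTCDate());
  return "memory/runs/" + year + "/" + month + "/" + day + "/" + requestId + "/";
}

// Files expected inside each run folder, taken from the example layout.
const RUN_FILES: string[] = [
  "request.yaml", "context-receipt.yaml", "prompt.md", "output.md", "toolcalls.jsonl", "feedback.yaml",
];
```

Date-partitioned folders keep directory listings small and make it easy to prune or archive old runs.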

Visual Topology Map

The UI should include a live topology view of the user's AI infrastructure.

It should show:

detected AI clients
active MCP servers
local model runtimes
hosted providers
memory backend
vector index
enabled tools
blocked or disabled integrations
routing paths
cost-producing paths

Example:

Claude Code ── MCP ─┐
Codex ─────── LSP ──┼── Adaptive LLM Gateway ── Trust Router ── Ollama
Cursor ───── OpenAI ┘              │              │
                                   │              ├── OpenAI (public only)
                                   │              └── Anthropic (approved)
                                   │
                                   ├── Shared Memory ── Gitea
                                   └── Knowledge Index ── SQLite/Qdrant

Each node should expose status, permissions, latency, cost, and recent receipts.

Setup Doctor

Add a diagnostic command:

adaptive-llm-gateway doctor

Checks:

gateway health
MCP server health
Ollama/LM Studio/vLLM/LocalAI availability
hosted provider credentials
Gitea sync status
vector index health
database migrations
port conflicts
Docker status
Claude Code/Codex/Cursor/VS Code integration status
policy and consent ledger validity

The doctor should produce direct fix suggestions:

Issue: Ollama detected but no models installed.
Fix: ollama pull qwen2.5:7b

Issue: Claude Code detected but MCP config not installed.
Fix: adaptive-llm-gateway integrate claude-code --write-config

AI Cost Governor

The gateway should actively control cost, not only report it.

Features:

daily/weekly/monthly budgets
per-provider budgets
per-agent budgets
per-project budgets
max-cost-per-request
auto-fallback from paid to local models
warnings before expensive runs
hard stop when budget is exhausted

Example:

cost_governor:
  weekly_budget_usd: 25
  max_request_usd: 0.25
  agents:
    codex:
      weekly_budget_usd: 5
    chatgpt:
      weekly_budget_usd: 10
  fallback_when_budget_low: local-only
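
A pre-flight budget check based on this configuration could be sketched as follows. The verdict names and the exact rules are one interpretation, not a specification: a request over `max_request_usd` is rejected, and a request that would exceed a weekly budget falls back to local models.

```typescript
// Hypothetical in-memory form of the cost_governor block.
interface CostGovernorConfig {
  weeklyBudgetUsd: number;
  maxRequestUsd: number;
  agentWeeklyBudgetUsd: { [agent: string]: number };
}

type CostVerdict = "allow" | "fallback-local" | "reject";

// Decide before execution whether a paid request may run.
function checkCost(
  cfg: CostGovernorConfig,
  agent: string,
  estimatedUsd: number,
  spentTotalUsd: number,
  spentByAgentUsd: number
): CostVerdict {
  if (estimatedUsd > cfg.maxRequestUsd) return "reject";
  const agentBudget = cfg.agentWeeklyBudgetUsd[agent];
  const overAgent = agentBudget !== undefined && spentByAgentUsd + estimatedUsd > agentBudget;
  const overTotal = spentTotalUsd + estimatedUsd > cfg.weeklyBudgetUsd;
  return overAgent || overTotal ? "fallback-local" : "allow";
}
```

Running the check before execution is what lets the gateway control cost rather than only report it.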

Offline Mode

Provide a strict local-only mode:

adaptive-llm-gateway mode offline

Offline mode:

disables hosted providers
disables external telemetry
routes only to local models
uses local memory only
blocks remote sync unless explicitly allowed
marks receipts as offline_mode: true

This is important for security work, customer data, travel, and privacy-focused users.

Integration Marketplace

Add a local integration catalog, not a SaaS marketplace.

Examples:

Claude Code integration
Codex integration
Cursor integration
VS Code integration
Continue.dev integration
ChatGPT export importer
GitHub Copilot bridge
n8n workflow pack
Gitea memory backend
GitHub memory backend
Obsidian connector
Open WebUI connector
Home Assistant connector
Slack/Teams connector
Jira/Linear/GitHub Issues connector

Each integration should declare:

permissions required
tools exposed
data read/write scope
setup method
config files touched
risk level
rollback instructions

Data Source Connectors

Support user-approved knowledge sources:

local folders
Git repos
Obsidian vaults
Markdown notes
PDFs
browser bookmarks
ChatGPT exports
Claude/Codex handoffs
Notion
Google Drive
OneDrive
email
calendar
tickets/issues
logs
databases

All connectors must use explicit scope and consent.

Team Mode

Team mode should support small organizations without requiring cloud SaaS.

Features:

shared Gitea memory
shared provider configuration
per-user budgets
per-project policies
role-based permissions
audit logs
admin dashboard
project onboarding
policy templates
team-wide benchmark results

Suggested roles:

owner
admin
developer
analyst
viewer

Prompt and Agent Versioning

Version everything that changes AI behavior:

prompts
prompt packs
routing rules
policies
consent ledger changes
agent profiles
benchmark suites
benchmark results
eval datasets
compression strategies

Store versions in Git/Gitea where possible.

Safe Config Writer

The gateway should be able to configure other tools, but only through reviewable diffs.

Flow:

1. Detect target config.
2. Generate proposed diff.
3. Explain impact.
4. Ask user approval.
5. Write config.
6. Store receipt and rollback entry.

Example:

+ "mcpServers": {
+   "adaptive-llm-gateway": {
+     "command": "adaptive-llm-gateway-mcp",
+     "args": ["--config", "~/.adaptive-llm-gateway/config.yaml"]
+   }
+ }
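
The propose-then-approve part of the flow can be kept free of file I/O so that nothing is written before the user agrees. In this sketch, `ToolConfig` models only the `mcpServers` key (a real config file would have other keys to preserve), and `proposeMcpEntry` and `writeWithApproval` are hypothetical names.

```typescript
interface McpServerEntry { command: string; args: string[] }

// Simplified: only the mcpServers key is modelled here.
interface ToolConfig { mcpServers?: { [serverName: string]: McpServerEntry } }

interface WriteResult { written: boolean; config: ToolConfig; rollback: ToolConfig }

// Step 2: build the proposed config without touching the current one.
function proposeMcpEntry(current: ToolConfig, serverName: string, entry: McpServerEntry): ToolConfig {
  const servers: { [serverName: string]: McpServerEntry } = {};
  const existing = current.mcpServers || {};
  for (const key of Object.keys(existing)) servers[key] = existing[key];
  servers[serverName] = entry;
  return { mcpServers: servers };
}

// Steps 3-6: explain, ask, and only then accept the change; keep the old config for rollback.
function writeWithApproval(
  current: ToolConfig,
  proposed: ToolConfig,
  approve: (summary: string) => boolean
): WriteResult {
  const summary = "set mcpServers entries: " + Object.keys(proposed.mcpServers || {}).join(", ");
  if (!approve(summary)) return { written: false, config: current, rollback: current };
  return { written: true, config: proposed, rollback: current };
}
```

The caller would persist `config` to disk and store `rollback` alongside the receipt only when `written` is true.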

Migration and Import Wizard

Help users consolidate existing AI chaos:

adaptive-llm-gateway import

Import targets:

existing .env provider keys
Ollama model list
Open WebUI config
LM Studio local server settings
ChatGPT exports
Claude Code handoffs
Codex session notes
existing project READMEs/docs
n8n workflows
previous vector indexes where supported

The import wizard should never move or delete original data. It creates normalized memory entries, config snippets, and receipts.

UI Direction

The open-source UI can inherit the spirit of the current LLM Gateway dashboard, but it should be productized into a neutral, reusable interface.

Keep from the current gateway:

operational dashboard feel
live health/status cards
request/cost/token visibility
provider and fallback visibility
logs/metrics orientation
dashboard as first screen, not a marketing page

Improve for OSS:

first-run setup wizard
topology map as the home view
integration catalog
trust policy editor
memory browser
context receipts viewer
consent ledger viewer
benchmark lab
team/admin mode

Recommended main navigation:

Topology
Models
Agents
Memory
Policies
Receipts
Benchmarks
Costs
Integrations
Doctor
Settings

Visual style:

dense, operational, and scannable
dark/light mode
no marketing hero as the app entry
no Context-X-specific branding in OSS defaults
optional theme pack for Context-X/internal deployments

Target Users

Developers running Ollama, LM Studio, Open WebUI, Claude Code, Codex, Cursor, VS Code, n8n, or custom agents.
Small teams that want one internal AI gateway instead of scattered API keys.
Homelab and self-hosting users who want MCP tools, local models, and remote fallback models in one stack.
Security-conscious teams that want audit logs, budgets, routing rules, and local-first behavior.

Open Source Boundary

The OSS release should remove or isolate Context-X-specific assumptions:

Hardcoded domains such as context-x.org, fichtmueller.org, and Erik host paths.
Private project templates for TIP, MAGATAMA, SwitchBlade, PeerCortex, etc.
Private credentials, server names, and internal service assumptions.
Context-X-specific training data unless explicitly sanitized and licensed.

Keep as generic features:

Fastify gateway service.
TypeScript client.
Health checks.
Provider routing.
OpenAI-compatible adapter.
MCP server.
Local model discovery.
Audit logging.
Cost and token tracking.
Prompt template system.
Optional learning engine.

Adaptive System Discovery

Add a first-run discovery command:

npx adaptive-llm-gateway init

It should detect:

OS: macOS, Linux, Windows/WSL.
Runtime: Node.js, Python, Docker, Docker Compose, pnpm/npm/yarn.
Local LLM servers:
- Ollama on localhost:11434
- LM Studio on localhost:1234
- LocalAI
- Open WebUI
- llama.cpp server
Hosted provider credentials from environment only after consent:
- OpenAI
- Anthropic
- Mistral
- Groq
- Cerebras
- OpenRouter
- Cloudflare Workers AI
Developer tools:
- VS Code
- Cursor
- Claude Code
- Codex CLI/Desktop
- GitHub Copilot
- n8n
- Git remotes and local repos
Local knowledge sources:
- selected folders
- docs
- markdown notes
- code repositories
- optional browser/exported bookmarks

Discovery must produce a local config file, not silently mutate user systems:

gateway:
  port: 3103
  mode: local-first

models:
  local:
    ollama:
      detected: true
      url: http://localhost:11434
      models: []

providers:
  openai:
    enabled: false
    env_key: OPENAI_API_KEY

mcp:
  enabled: true
  port: 3104

tools:
  filesystem:
    enabled: false
    allowed_roots: []
  git:
    enabled: true
  shell:
    enabled: false
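
The `allowed_roots` list only helps if every filesystem access is checked against it. The sketch below assumes POSIX-style absolute paths and refuses any path containing `..` instead of resolving it; `isWithinAllowedRoots` is a hypothetical helper, not an existing gateway function.

```typescript
// True only when `path` is one of the allowed roots or lies beneath one.
function isWithinAllowedRoots(path: string, allowedRoots: string[]): boolean {
  // Refuse traversal outright rather than trying to normalise it.
  if (path.split("/").indexOf("..") !== -1) return false;
  return allowedRoots.some(root => {
    // Compare against "root/" so that /docs does not also match /docs-private.
    const prefix = root.charAt(root.length - 1) === "/" ? root : root + "/";
    return path === root || path.indexOf(prefix) === 0;
  });
}
```

With the generated default of `allowed_roots: []`, every path is rejected, which matches the conservative defaults above. A production version would also need to resolve symlinks before checking.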

AI Client and Agent Detection

The gateway should detect AI clients and agent runtimes as integration targets, but it should treat each one differently depending on what is technically and legally possible.

Detection is not the same as control. Some tools expose APIs, config files, MCP settings, or proxy configuration. Others are closed consumer apps where the safe integration path is an adapter, browser extension, exported data import, or a documented manual setup step.

Integration Levels

Use four integration levels:

| Level | Meaning | Example |
| --- | --- | --- |
| `detected` | Tool exists, but no automatic binding yet | ChatGPT desktop app installed |
| `configurable` | Gateway can write or suggest config | Claude Code MCP config |
| `proxyable` | Tool can point to OpenAI-compatible gateway URL | OpenAI SDK, Continue, many IDE plugins |
| `native` | Gateway has a dedicated adapter/package | Codex LSP adapter, Claude Code bridge |

Tool Matrix

| Tool | Detect | Best Integration Path | Notes |
| --- | --- | --- | --- |
| Codex CLI/Desktop | CLI path, config folder, running process | MCP server, LSP adapter, OpenAI-compatible endpoint | Provide `codex-lsp-adapter` and MCP setup instructions. |
| Claude Code | CLI path, MCP/config files, shell env | MCP server + Claude Code bridge | Best path is first-class MCP tools/resources. |
| ChatGPT Desktop/Web | App/process/browser profile, exported chats | OpenAI-compatible adapter where supported, browser extension, import/export | Do not scrape private chats silently. Ask before importing exports. |
| OpenAI SDK users | Env vars, package manifests, code search | Replace `baseURL` with gateway URL | Very easy and safe to automate per repo. |
| Cursor | App/config detection | MCP server, OpenAI-compatible proxy if configured | Needs explicit user approval before editing settings. |
| VS Code | Extensions + settings.json | MCP/LSP adapter, Continue/Copilot-compatible config | Offer snippets instead of blind mutation. |
| GitHub Copilot | gh auth, extension, copilot bridge | copilot-bridge where available | Subscription/auth belongs to user; gateway should not extract tokens. |
| Continue.dev | config files | OpenAI-compatible endpoint | Good OSS integration target. |
| Open WebUI | local port/container detection | Register gateway as provider or upstream | Can also use Open WebUI as discovered model frontend. |
| n8n | local port/container/env | HTTP node templates + credentials guidance | Detect workflows only with allowed path/API access. |
| LangChain/LlamaIndex apps | package manifests/code search | Generated integration patch | Per-project opt-in. |

Detection Sources

Safe discovery sources:

process list
common install paths
package manifests
shell PATH
Docker containers
local ports
explicit config directories
user-selected project folders

Sensitive sources that require consent:

browser profiles
chat exports
API keys
IDE settings writes
MCP config writes
repo-wide code modifications
shell command execution tools

Binding Strategy

The first-run wizard should present findings like this:

Detected AI tools:

✓ Claude Code CLI
  Integration: MCP server
  Action: add Adaptive LLM Gateway MCP config

✓ Codex
  Integration: MCP + LSP adapter
  Action: generate config snippet

✓ ChatGPT desktop
  Integration: detected only
  Action: optional import of exported chats, optional browser extension

✓ Cursor
  Integration: MCP/OpenAI-compatible endpoint
  Action: generate settings snippet

Enable integrations now? [select]

Default behavior should be conservative:

Generate config snippets first.
Ask before writing settings.
Ask before indexing chat exports or repo contents.
Never extract tokens from apps.
Prefer official APIs, MCP, LSP, or documented config surfaces.

MCP Server With Own LLM

The MCP server should be a first-class package:

packages/mcp-server

Responsibilities:

Expose tools for gateway completion, model listing, health, routing, embeddings, and document lookup.
Expose resources for discovered docs/repos when the user allows them.
Use the gateway's local-first model routing by default.
Allow a dedicated local model for tool reasoning, for example qwen2.5:7b or another detected local model.
Never expose shell or filesystem tools until the user explicitly enables allowed scopes.

Suggested MCP tools:

gateway.complete
gateway.chat
gateway.classify
gateway.models
gateway.health
gateway.route_preview
knowledge.search
repo.search
repo.summarize
config.get
config.update

Embedding Everything

"Embed everything" should mean controlled, user-approved indexing:

Scan allowed roots only.
Chunk and embed text/code/docs.
Store embeddings locally by default.
Support SQLite + sqlite-vec for simple installs.
Support Postgres + pgvector for team/server installs.
Support Qdrant as an optional backend for larger deployments.
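
The chunking step could start as simply as fixed-size character windows with overlap. This is a sketch of one common approach, not the project's chosen strategy; `chunkText`, `TextChunk`, and the character-based sizing are assumptions (token-based or syntax-aware chunking may suit code better).

```typescript
interface TextChunk { index: number; start: number; text: string }

// Split text into windows of `size` characters, each overlapping the previous by `overlap`.
function chunkText(text: string, size: number, overlap: number): TextChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: TextChunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start: start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // the last window already reached the end
  }
  return chunks;
}
```

Storing `start` with each chunk lets search results point back to the exact location in the source file.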

Default modes:

personal: SQLite, local-only, one user.
team: Postgres, API keys, audit logging.
server: Docker Compose, reverse proxy, persistence, MCP enabled.

Shared AI Memory Sync

Add a shared memory layer for all connected AI clients and agents. The goal is to make Claude Code, Codex, ChatGPT exports, Cursor, IDE assistants, MCP tools, and automation agents work from the same durable project memory instead of each assistant living in an isolated context bubble.

Working name: Memory Sync Backend.

Why Git/Gitea

Git is a strong default backend for portable AI memory:

auditable history
human-readable Markdown/JSON/YAML files
offline-first local clone
easy sync across machines
branchable experiments
reviewable diffs
self-hostable with Gitea
no mandatory SaaS dependency

Gitea can act as the team/server backend:

Claude Code ─┐
Codex      ──┼── Adaptive LLM Gateway ── Memory Sync ── Git/Gitea repo
Cursor     ──┤             │
ChatGPT    ──┘             └── local vector index for fast retrieval

Memory Types

Store memory in typed folders:

memory/
  projects/
    my-project/
      PROJECT.md
      decisions/
      tasks/
      architecture/
      runbooks/
      sync/
  agents/
    codex/
    claude-code/
    chatgpt/
    cursor/
  facts/
  preferences/
  credentials-notes/
  incidents/
  evals/

Use plain files for durable truth and an embedding index for fast lookup.

Memory Records

Each memory entry should include provenance:

id: mem_2026_05_01_001
type: decision
project: adaptive-llm-gateway
source_agent: codex
created_at: 2026-05-01T12:00:00Z
visibility: team
sensitivity: internal
tags: [mcp, memory, gitea]
summary: Use Gitea-backed memory sync as the shared durable backend.
links:
  - file: OPEN_SOURCE_BLUEPRINT.md

Sync Modes

local: file-based memory in ~/.adaptive-llm-gateway/memory.
git: local Git repo, user pushes manually.
gitea: automatic push/pull to self-hosted Gitea.
github: optional public/private GitHub backend.
s3: optional artifact backup, not source of truth.

Agent Integration

Each agent gets a memory adapter:

Claude Code: MCP resources + memory write tools.
Codex: MCP resources + session handoff writer.
ChatGPT: import exported chats; optional browser extension later.
Cursor/VS Code: repo memory + generated context snippets.
n8n: workflow memory and execution summaries.

Suggested MCP memory tools:

memory.search
memory.read
memory.write
memory.append_session
memory.summarize_project
memory.record_decision
memory.record_task
memory.sync_status
memory.pull
memory.push

Conflict Handling

Memory should be append-first. Agents should not overwrite each other's entries.

Rules:

Session logs are append-only.
Decisions can supersede earlier decisions but should not delete them.
Project summaries are regenerated from source logs and committed as derived files.
Conflicts create review entries instead of automatic destructive merges.
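
The supersede rule can be modelled without ever deleting a record. In this sketch, `DecisionRecord` and its `supersedes` field are hypothetical; the memory record example above does not define such a field.

```typescript
// Hypothetical record shape; `supersedes` points at the id of an earlier decision.
interface DecisionRecord { id: string; summary: string; supersedes?: string }

// Append a new decision; earlier records are never edited or removed.
function appendDecision(log: DecisionRecord[], next: DecisionRecord): DecisionRecord[] {
  return log.concat([next]);
}

// Active decisions are those that no other record in the log supersedes.
function activeDecisions(log: DecisionRecord[]): DecisionRecord[] {
  const superseded: { [id: string]: boolean } = {};
  for (const d of log) {
    if (d.supersedes !== undefined) superseded[d.supersedes] = true;
  }
  return log.filter(d => superseded[d.id] !== true);
}
```

The full history stays in the log for audits, while agents read only the active view.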

Privacy and Safety

Never sync secrets.
Secret-looking values are redacted before commit.
Sensitive memory can stay local-only.
Users can mark folders as private, team, or public.
Chat imports require explicit approval.
Every memory entry records source agent and timestamp.
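
Redaction before commit could begin with a small set of patterns. The patterns below are illustrative assumptions and far from complete; a real deployment would likely combine them with a dedicated secret scanner. `redactSecrets` and `SECRET_PATTERNS` are hypothetical names.

```typescript
// Illustrative patterns only; order matters so that a whole assignment is redacted first.
const SECRET_PATTERNS: RegExp[] = [
  /\b[A-Z0-9_]*(?:API_KEY|TOKEN|SECRET)\s*=\s*\S+/g, // env-style assignment
  /sk-[A-Za-z0-9]{20,}/g,                             // API-key-like token with an "sk-" prefix
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/g,              // PEM private key header
];

// Replace every match with a placeholder and count the replacements.
function redactSecrets(text: string): { text: string; redactions: number } {
  let count = 0;
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, () => {
      count++;
      return "[REDACTED]";
    });
  }
  return { text: out, redactions: count };
}
```

A non-zero `redactions` count can also be recorded in the context receipt as a blocked item.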

Gitea Default Layout

For self-hosted users:

gitea.example.local/user/ai-memory.git
gitea.example.local/user/project-a.git
gitea.example.local/user/project-b.git

The gateway can either:

use one central ai-memory repo, or
add a .ai-memory/ folder to each project repo.

Recommended default:

personal mode: one central memory repo
team mode: one memory repo plus per-project links
open-source project mode: .ai-memory/ inside the project

Architecture

User apps / agents / IDEs
        |
        | OpenAI API / MCP / SDK
        v
Adaptive LLM Gateway
  - routing
  - prompt templates
  - confidence gates
  - budgets
  - audit logs
  - local knowledge lookup
        |
        +--> Local models: Ollama, LM Studio, LocalAI, llama.cpp
        +--> Hosted providers: OpenAI, Anthropic, Groq, Mistral, etc.
        +--> MCP tools/resources
        +--> Local vector store

Installation Targets

Simple local install:

npx adaptive-llm-gateway init
npx adaptive-llm-gateway start

Docker install:

docker compose up -d

Team/server install:

npx adaptive-llm-gateway init --mode team
npx adaptive-llm-gateway deploy-config

Security Defaults

Local-first.
No secrets in config files.
Read env vars only after consent.
No filesystem indexing without allowed roots.
No shell tool by default.
No telemetry by default.
Audit logs redact prompts unless the user opts in.
Dangerous MCP tools stay disabled until explicitly enabled.
Provider API keys remain in env, system keychain, or configured secret backend.

Refactor Plan

Phase 1: Extract Context-X assumptions

Move Context-X routing templates into optional example pack.
Rename packages away from @llm-gateway/*, or at least prepare a neutral scope.
Replace hardcoded domains and ports with generated config.
Add .env.example for OSS.

Phase 2: First-run discovery

Add packages/discovery.
Detect local models, runtimes, repos, and common agent tools.
Generate gateway.config.yaml.

Phase 3: MCP server

Add packages/mcp-server.
Expose gateway tools and resources.
Add local model-backed tool reasoning.

Phase 4: Embeddings and knowledge

Add packages/knowledge.
Support SQLite default and Postgres/Qdrant optional backends.
Add chunking, indexing, search, and repo/doc ingestion.

Phase 5: OSS release hardening

Secret scan.
License audit.
Remove private data.
Add quickstart docs.
Add GitHub Actions CI.
Add Docker Compose starter.

Minimum Viable OSS Release

The first public version should include:

Gateway server.
Client SDK.
OpenAI-compatible adapter.
Local Ollama/LM Studio detection.
MCP server with safe tools.
SQLite config and audit store.
Docker Compose.
One generic prompt template pack.
Documentation for local, team, and server modes.

Name Ideas

Adaptive LLM Gateway
Open LLM Gateway
LocalMesh Gateway
ModelRouter
GatewayKit
AIDE Gateway
