# Open Source Blueprint: Adaptive LLM Gateway

Companion documents:

- `AI_CONTROL_PLANE_SYSTEM_DESIGN.md`: canonical control-plane architecture
- `OPEN_SOURCE_GAP_ANALYSIS.md`: current gateway vs. OSS target
- `OPEN_SOURCE_FEATURE_MATRIX.md`: feature state and priority
- `OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md`: phase-by-phase build plan

## Vision

Turn the Context-X LLM Gateway into an open-source, self-adapting LLM control plane that can run on a user's own machine or server, discover the local AI/dev environment, and expose it through a secure MCP server plus OpenAI-compatible APIs.

The open-source version should not assume Context-X infrastructure. It should install cleanly, detect what is available, ask before using sensitive integrations, and then wire local models, hosted providers, tools, documents, and developer environments into one gateway.

## Product Shape

Working name: **Adaptive LLM Gateway**

Core promise:

- Bring your own local or hosted models.
- Run a private MCP server with an optional local LLM.
- Detect common tools and runtimes automatically.
- Expose one unified API for apps, agents, IDEs, and automations.
- Keep secrets and private data local by default.

## Differentiating Core Modules

The open-source project should lead with four features that make it more than a model proxy:

1. **Trust Router**
2. **Context Receipt**
3. **Shared Gitea Memory**
4. **AI Handoff Protocol**

The second core layer should add learning, accountability, and repeatability:

5. **Capability Benchmark Lab**
6. **Agent Reputation Score**
7. **Local Consent Ledger**
8. **Reproducible AI Runs**

The execution pipeline should be:

```text
Client Entry
  -> Trust Router
  -> Policy Engine
  -> Memory Query
  -> Compression Engine
  -> Provider Router
  -> Execution Layer
  -> Receipt Engine
  -> Memory Update
  -> Route Reflector Memory
```

Together they create a trusted coordination layer for all AI clients and agents on a user's system.

```text
Request
  |
  v
Trust Router
  - validate client identity
  - assign trust level
  - classify request type and sensitivity
  |
  v
Policy Engine
  - enforce provider/model/tool permissions
  - apply cost, compliance, and project rules
  |
  v
Context Builder
  - memory
  - files
  - retrieved docs
  - compressed history
  |
  v
LLM / Agent / MCP Tool
  |
  v
Context Receipt + Shared Memory Update + Route Reflector Learning
```

## Trust Router

The Trust Router decides which model, provider, agent, and tool chain may handle a request.

It should classify every request by:

- data sensitivity
- task type
- required capabilities
- allowed tools
- user/team policy
- cost and latency budget
- local model availability

Suggested trust levels:

| Trust Level | Meaning | Allowed Routing |
|---|---|---|
| `public` | Safe public/non-sensitive content | Any enabled provider |
| `internal` | Project context, private notes, normal code | Local or approved providers |
| `confidential` | Customer data, private business data, security findings | Local-only or explicitly trusted provider |
| `secret` | API keys, credentials, tokens, private keys | Block, redact, or local security scanner only |

Policy example:

```yaml
trust_router:
  default_mode: hybrid-safe
  rules:
    - match:
        contains_secret: true
      action: block
    - match:
        sensitivity: confidential
      route: local-only
    - match:
        task_type: code_generation
        sensitivity: internal
      route: [claude-code, codex, local-code-model]
    - match:
        task_type: brainstorming
        sensitivity: public
      route: [openai, anthropic, local]
```

The Trust Router should always explain its decision internally and optionally expose it to users.

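As a rough illustration of how the policy example could be evaluated, the sketch below walks the rules in order and returns the first match together with an explanation string. It is written in Python only for brevity. The first-match-wins ordering, the `decide` function, and the shape of the returned dictionary are assumptions made for this sketch and are not defined by the blueprint.

```python
from typing import Any

# Rules transcribed from the YAML policy example. Order matters because this
# sketch assumes first-match-wins evaluation (an assumption, not a spec).
RULES: list[dict[str, Any]] = [
    {"match": {"contains_secret": True}, "action": "block"},
    {"match": {"sensitivity": "confidential"}, "route": "local-only"},
    {"match": {"task_type": "code_generation", "sensitivity": "internal"},
     "route": ["claude-code", "codex", "local-code-model"]},
    {"match": {"task_type": "brainstorming", "sensitivity": "public"},
     "route": ["openai", "anthropic", "local"]},
]


def decide(request: dict[str, Any],
           rules: list[dict[str, Any]] = RULES,
           default_mode: str = "hybrid-safe") -> dict[str, Any]:
    """Return the outcome of the first rule whose conditions all hold."""
    for index, rule in enumerate(rules):
        conditions = rule["match"]
        if all(request.get(key) == value for key, value in conditions.items()):
            return {
                "action": rule.get("action", "route"),
                "route": rule.get("route"),
                # The explanation is what a receipt or UI could later show.
                "explanation": f"matched rule {index}: {conditions}",
            }
    # No rule matched: fall back to the configured default mode.
    return {
        "action": "route",
        "route": default_mode,
        "explanation": "no rule matched; using default_mode",
    }
```

Placing the secret check first means a request that contains a secret is blocked even when its other attributes would match a later, more permissive rule.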
## Policy Engine

The Policy Engine evaluates what is allowed after the Trust Router has classified the request.

It should evaluate:

- allowed providers
- allowed models
- allowed tools
- data sensitivity
- project policy
- compliance rules
- cost limits
- offline/simulation/live mode

Example policies:

- never send legal data to public APIs
- prefer local models for internal code
- use external models only if confidence is below a threshold
- block requests containing secrets
- require admin override for production deployment tools

The output is a route constraint set:

```yaml
allowed_routes: [ollama, claude-code]
blocked_routes:
  - provider: openai
    reason: confidential data policy
required_redactions: []
max_request_cost_usd: 0.10
mode: live
```

## Provider Router

The Provider Router makes the final execution decision after policy and compression.

It chooses:

- local model
- external provider
- AI agent/client
- MCP tool
- fallback chain

Inputs:

- policy constraints
- model availability
- provider health
- latency
- cost
- benchmark scores
- agent reputation
- Route Reflector Memory

The Provider Router should support live, simulation, and offline modes.

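One possible way to turn these inputs into a fallback chain is sketched below: filter candidates by the policy constraints and health, then rank what remains. The `Candidate` fields and the ranking order (benchmark score first, then cost, then latency) are illustrative assumptions; the blueprint does not prescribe how the inputs are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical snapshot of one route at decision time."""
    name: str
    healthy: bool
    est_cost_usd: float
    latency_ms: float
    benchmark_score: float  # 0..1, higher is better


def build_fallback_chain(candidates: list[Candidate],
                         allowed_routes: list[str],
                         max_request_cost_usd: float) -> list[str]:
    """Return route names in the order they should be tried."""
    eligible = [
        c for c in candidates
        if c.name in allowed_routes          # policy constraint
        and c.healthy                        # provider health
        and c.est_cost_usd <= max_request_cost_usd
    ]
    # Assumed ranking: best benchmark first, then cheaper, then faster.
    ranked = sorted(
        eligible,
        key=lambda c: (-c.benchmark_score, c.est_cost_usd, c.latency_ms),
    )
    return [c.name for c in ranked]
```

With the constraint set from the Policy Engine example (`allowed_routes: [ollama, claude-code]`, `max_request_cost_usd: 0.10`), a healthy `openai` candidate would still be excluded because it is not in the allowed list.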
## Context Receipt

Every answer should be able to produce a receipt that shows what context was used and what was protected.

Example:

```yaml
receipt_id: ctxr_2026_05_01_001
request_id: req_abc123
model: qwen2.5:14b
provider: ollama
trust_level: internal
route_reason:
  - local model selected because project memory was private
  - external providers skipped by policy
context_used:
  - type: memory
    ref: projects/adaptive-llm-gateway/PROJECT.md
  - type: file
    ref: OPEN_SOURCE_BLUEPRINT.md
  - type: retrieval
    ref: memory/decisions/2026-05-01-gitea-memory.md
context_blocked:
  - type: file
    ref: .env
    reason: secret pattern
  - type: provider
    ref: openai
    reason: confidential policy
tokens:
  input: 4200
  output: 900
  compressed_from: 13200
cost:
  estimated_usd: 0
```

Receipts can be stored locally, pushed to shared memory, or attached to audit logs.

## AI Handoff Protocol

Define a simple handoff format so Claude Code, Codex, ChatGPT, Cursor, n8n, and other agents can pass work to each other without losing context.

Handoff files should be plain Markdown with YAML frontmatter or pure YAML/JSON.

Example:

```yaml
handoff_version: 1
id: handoff_2026_05_01_001
project: adaptive-llm-gateway
from_agent: claude-code
to_agent: codex
created_at: 2026-05-01T12:00:00Z
status: ready
goal: Implement MCP memory tools.
current_state:
  summary: Blueprint exists. Need package scaffold and safe tool definitions.
  branch: main
  files_changed:
    - OPEN_SOURCE_BLUEPRINT.md
constraints:
  - Do not expose shell tools by default.
  - Do not sync secrets.
next_actions:
  - Create packages/mcp-server.
  - Add memory.search and memory.write tools.
  - Add tests for policy enforcement.
context_refs:
  - memory/projects/adaptive-llm-gateway/PROJECT.md
  - memory/decisions/2026-05-01-shared-gitea-memory.md
open_questions:
  - Should SQLite be mandatory for personal mode?
confidence: 0.82
```

Recommended folders:

```text
memory/projects/<project>/handoffs/
memory/agents/<agent>/sessions/
memory/decisions/
```

The protocol should be append-first and easy for humans to read.

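A minimal sketch of how a gateway might validate and store a handoff in an append-first way is shown below, using the JSON variant so that only the standard library is needed. The set of required fields, the `.json` file naming, and both function names are assumptions for illustration; the blueprint only fixes the folder layout and the example fields.

```python
import json
from pathlib import Path

# Assumed minimum set of fields; the blueprint does not mark any as required.
REQUIRED_FIELDS = (
    "handoff_version", "id", "project", "from_agent",
    "to_agent", "created_at", "status", "goal", "next_actions",
)


def validate_handoff(doc: dict) -> list[str]:
    """Return a list of human-readable problems (empty when valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    confidence = doc.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    return errors


def write_handoff(memory_root: Path, doc: dict) -> Path:
    """Store a handoff under memory/projects/<project>/handoffs/."""
    errors = validate_handoff(doc)
    if errors:
        raise ValueError("; ".join(errors))
    folder = memory_root / "projects" / doc["project"] / "handoffs"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{doc['id']}.json"
    # Mode "x" raises FileExistsError if the file is already there, so an
    # earlier handoff is never silently overwritten.
    with target.open("x", encoding="utf-8") as handle:
        json.dump(doc, handle, indent=2)
    return target
```

Refusing to overwrite keeps every handoff as its own immutable record, which fits the append-first goal; a follow-up handoff would get a new `id`.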
## Capability Benchmark Lab

The gateway should benchmark every detected model, provider, and major agent integration before trusting it for routing.

Benchmarks should be local, transparent, and repeatable.

Test dimensions:

- JSON/schema reliability
- code generation
- code patch quality
- instruction following
- German/English quality
- summarization
- tool-call readiness
- latency
- cost
- context length behavior
- private-data safety
- refusal/guardrail behavior

Example benchmark result:

```yaml
model: qwen2.5:14b
provider: ollama
benchmarked_at: 2026-05-01T12:00:00Z
scores:
  json_schema: 0.84
  code_generation: 0.71
  german: 0.88
  summarization: 0.91
  latency: 0.76
  privacy: 1.00
recommended_for:
  - private_summarization
  - german_drafts
  - internal_qa
not_recommended_for:
  - complex_code_patch
```

The Trust Router should use benchmark results instead of static assumptions.

## Agent Reputation Score

Track how well each connected AI client or agent performs on real tasks.

Agents can include:

- Codex
- Claude Code
- ChatGPT
- Cursor
- VS Code assistants
- n8n workflows
- local autonomous agents

Metrics:

- task success rate
- test pass rate
- human approval rate
- rollback rate
- average latency
- average token/cost usage
- policy violation count
- handoff quality
- reproducibility score

Example:

```yaml
agent: codex
period: 30d
score: 0.91
strengths:
  - code_patches
  - test_fixes
  - small_refactors
weaknesses:
  - broad_product_strategy
metrics:
  test_pass_rate: 0.94
  rollback_rate: 0.03
  avg_handoff_quality: 0.87
```

Agent scores should guide routing:

- send code patches to agents with high patch/test scores
- send long analysis to agents with high synthesis scores
- keep private tasks with local agents/models when policy requires it

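The blueprint does not say how the overall score is derived from the metrics. One plausible approach is a weighted average in which lower-is-better metrics are inverted first. The weights below are made up for illustration, so the result is not expected to reproduce the `0.91` in the example.

```python
# Hypothetical weights; a real deployment would tune or configure these.
WEIGHTS = {
    "test_pass_rate": 0.4,
    "rollback_rate": 0.3,
    "avg_handoff_quality": 0.3,
}

# Metrics where a smaller value means better behavior.
LOWER_IS_BETTER = {"rollback_rate"}


def reputation_score(metrics: dict[str, float],
                     weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average over the metrics that are present, in 0..1."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        if name not in metrics:
            continue  # Missing metrics are skipped, not treated as zero.
        value = metrics[name]
        if name in LOWER_IS_BETTER:
            value = 1.0 - value
        total += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return 0.0
    return round(total / weight_sum, 3)
```

Normalizing by the sum of the weights that were actually used keeps the score comparable between agents that report different subsets of metrics.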
## Local Consent Ledger

Store user permissions as an auditable local ledger.

The consent ledger answers:

- Which agents can read which memory?
- Which agents can write memory?
- Which tools can be called?
- Which folders can be indexed?
- Which providers can receive which trust levels?
- Which actions require confirmation?

Example:

```yaml
consent_version: 1
updated_at: 2026-05-01T12:00:00Z
agents:
  codex:
    memory:
      read: [project, decisions, runbooks]
      write: [sessions, handoffs, tasks]
    tools:
      allowed: [repo.search, memory.write, tests.run]
      confirm: [git.push, file.delete]
      denied: [secrets.read, deploy.production]
    providers:
      public_llm_allowed: false
  claude-code:
    memory:
      read: [project, decisions, architecture]
      write: [sessions, decisions]
    tools:
      allowed: [repo.search, memory.write]
      confirm: [file.write]
```

Consent changes should be append-only:

```text
memory/consent/ledger.jsonl
```

The gateway may generate config snippets from consent, but it should ask before editing external tool settings.

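To make the append-only idea concrete, the sketch below writes each consent change as one JSON line and derives the current tool policy by replaying the file. The event schema (`agent`, `tools`), the function names, and the default-deny rule for unknown tools are assumptions for this sketch, not part of the blueprint.

```python
import json
from pathlib import Path


def append_event(ledger_path: Path, event: dict) -> None:
    """Append one consent change; existing lines are never rewritten."""
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    with ledger_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def current_tool_policy(ledger_path: Path, agent: str) -> dict:
    """Replay the ledger; later events for an agent override earlier ones."""
    policy = {"allowed": [], "confirm": [], "denied": []}
    if not ledger_path.exists():
        return policy
    for line in ledger_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("agent") == agent and "tools" in event:
            policy = {**policy, **event["tools"]}
    return policy


def tool_decision(policy: dict, tool: str) -> str:
    """Return 'denied', 'confirm', or 'allowed' for a tool call."""
    if tool in policy["denied"]:
        return "denied"
    if tool in policy["confirm"]:
        return "confirm"
    if tool in policy["allowed"]:
        return "allowed"
    return "denied"  # Assumed default: anything not listed is denied.
```

Because the current state is always recomputed from the full history, the ledger doubles as an audit trail of who was granted what, and when.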
## Reproducible AI Runs

Every important AI run should be replayable.

Store:

- request id
- agent id
- model/provider
- prompt template version
- context receipt
- trust policy version
- memory refs
- retrieval refs
- tool calls
- redaction decisions
- output
- human feedback

Example run folder:

```text
memory/runs/2026/05/01/req_abc123/
  request.yaml
  context-receipt.yaml
  prompt.md
  output.md
  toolcalls.jsonl
  feedback.yaml
```

Replay modes:

- `exact`: same context refs and same model/provider where possible
- `compare`: same input against several models
- `policy-replay`: rerun trust routing with a newer policy
- `compression-replay`: test different compression settings

This makes the gateway debuggable, auditable, and useful for evaluation.

## Visual Topology Map

The UI should include a live topology view of the user's AI infrastructure.

It should show:

- detected AI clients
- active MCP servers
- local model runtimes
- hosted providers
- memory backend
- vector index
- enabled tools
- blocked or disabled integrations
- routing paths
- cost-producing paths

Example:

```text
Claude Code ── MCP ─┐
Codex ─────── LSP ──┼── Adaptive LLM Gateway ── Trust Router ── Ollama
Cursor ───── OpenAI ┘            │                   │
                                 │                   ├── OpenAI (public only)
                                 │                   └── Anthropic (approved)
                                 │
                                 ├── Shared Memory ── Gitea
                                 └── Knowledge Index ── SQLite/Qdrant
```

Each node should expose status, permissions, latency, cost, and recent receipts.

## Setup Doctor

Add a diagnostic command:

```bash
adaptive-llm-gateway doctor
```

Checks:

- gateway health
- MCP server health
- Ollama/LM Studio/vLLM/LocalAI availability
- hosted provider credentials
- Gitea sync status
- vector index health
- database migrations
- port conflicts
- Docker status
- Claude Code/Codex/Cursor/VS Code integration status
- policy and consent ledger validity

The doctor should produce direct fix suggestions:

```text
Issue: Ollama detected but no models installed.
Fix: ollama pull qwen2.5:7b

Issue: Claude Code detected but MCP config not installed.
Fix: adaptive-llm-gateway integrate claude-code --write-config
```

## AI Cost Governor

The gateway should actively control cost, not only report it.

Features:

- daily/weekly/monthly budgets
- per-provider budgets
- per-agent budgets
- per-project budgets
- max-cost-per-request
- auto-fallback from paid to local models
- warnings before expensive runs
- hard stop when budget is exhausted

Example:

```yaml
cost_governor:
  weekly_budget_usd: 25
  max_request_usd: 0.25
  agents:
    codex:
      weekly_budget_usd: 5
    chatgpt:
      weekly_budget_usd: 10
  fallback_when_budget_low: local-only
```

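A rough sketch of the governing decision, using the example configuration, might look like the following. It assumes that any paid request which would exceed the per-request cap, the agent budget, or the global budget is redirected to the configured fallback mode. The `govern` function and that simplification are assumptions; the blueprint also lists warnings and hard stops, which this sketch leaves out.

```python
# Mirrors the YAML example above.
CONFIG = {
    "weekly_budget_usd": 25,
    "max_request_usd": 0.25,
    "agents": {
        "codex": {"weekly_budget_usd": 5},
        "chatgpt": {"weekly_budget_usd": 10},
    },
    "fallback_when_budget_low": "local-only",
}


def govern(config: dict, agent: str, estimated_usd: float,
           spent_total_usd: float, spent_agent_usd: float) -> str:
    """Return 'allow' or the configured fallback mode for a paid request."""
    fallback = config["fallback_when_budget_low"]
    # Per-request cap.
    if estimated_usd > config["max_request_usd"]:
        return fallback
    # Agents without their own budget are only limited by the global one.
    agent_budget = config["agents"].get(agent, {}).get(
        "weekly_budget_usd", config["weekly_budget_usd"]
    )
    over_total = spent_total_usd + estimated_usd > config["weekly_budget_usd"]
    over_agent = spent_agent_usd + estimated_usd > agent_budget
    if over_total or over_agent:
        return fallback
    return "allow"
```

Checking the estimate before execution, instead of only recording cost afterwards, is what lets the gateway control spending and not merely report it.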
## Offline Mode

Provide a strict local-only mode:

```bash
adaptive-llm-gateway mode offline
```

Offline mode:

- disables hosted providers
- disables external telemetry
- routes only to local models
- uses local memory only
- blocks remote sync unless explicitly allowed
- marks receipts as `offline_mode: true`

This is important for security work, customer data, travel, and privacy-focused users.

## Integration Marketplace

Add a local integration catalog, not a SaaS marketplace.

Examples:

- Claude Code integration
- Codex integration
- Cursor integration
- VS Code integration
- Continue.dev integration
- ChatGPT export importer
- GitHub Copilot bridge
- n8n workflow pack
- Gitea memory backend
- GitHub memory backend
- Obsidian connector
- Open WebUI connector
- Home Assistant connector
- Slack/Teams connector
- Jira/Linear/GitHub Issues connector

Each integration should declare:

- permissions required
- tools exposed
- data read/write scope
- setup method
- config files touched
- risk level
- rollback instructions

## Data Source Connectors

Support user-approved knowledge sources:

- local folders
- Git repos
- Obsidian vaults
- Markdown notes
- PDFs
- browser bookmarks
- ChatGPT exports
- Claude/Codex handoffs
- Notion
- Google Drive
- OneDrive
- email
- calendar
- tickets/issues
- logs
- databases

All connectors must use explicit scope and consent.

## Team Mode

Team mode should support small organizations without requiring cloud SaaS.

Features:

- shared Gitea memory
- shared provider configuration
- per-user budgets
- per-project policies
- role-based permissions
- audit logs
- admin dashboard
- project onboarding
- policy templates
- team-wide benchmark results

Suggested roles:

- owner
- admin
- developer
- analyst
- viewer

## Prompt and Agent Versioning

Version everything that changes AI behavior:

- prompts
- prompt packs
- routing rules
- policies
- consent ledger changes
- agent profiles
- benchmark suites
- benchmark results
- eval datasets
- compression strategies

Store versions in Git/Gitea where possible.

## Safe Config Writer

The gateway should be able to configure other tools, but only through reviewable diffs.

Flow:

```text
1. Detect target config.
2. Generate proposed diff.
3. Explain impact.
4. Ask user approval.
5. Write config.
6. Store receipt and rollback entry.
```

Example:

```diff
+ "mcpServers": {
+   "adaptive-llm-gateway": {
+     "command": "adaptive-llm-gateway-mcp",
+     "args": ["--config", "~/.adaptive-llm-gateway/config.yaml"]
+   }
+ }
```

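The flow above can be sketched with the standard library: build a unified diff, hand it to an approval callback, and only then write the file while keeping a backup for rollback. The function names, the `.bak` backup convention, and the assumption that the target is a JSON file are illustrative choices, not something the blueprint specifies.

```python
import difflib
import json
import shutil
from pathlib import Path
from typing import Callable


def propose_diff(config_path: Path, new_config: dict) -> tuple[str, str]:
    """Return (unified diff, new file text) without touching the file."""
    old_text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    new_text = json.dumps(new_config, indent=2) + "\n"
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=str(config_path),
        tofile=f"{config_path} (proposed)",
    ))
    return diff, new_text


def write_with_approval(config_path: Path, new_config: dict,
                        approve: Callable[[str], bool]) -> bool:
    """Write the config only if the user approves the shown diff."""
    diff, new_text = propose_diff(config_path, new_config)
    if not diff or not approve(diff):
        return False  # Nothing to change, or the user declined.
    if config_path.exists():
        # Keep the previous version so the change can be rolled back.
        backup = config_path.with_name(config_path.name + ".bak")
        shutil.copy2(config_path, backup)
    config_path.write_text(new_text, encoding="utf-8")
    return True
```

In a CLI, `approve` could print the diff and prompt for confirmation; in the UI it could render the diff in a review dialog. Storing the receipt and rollback entry (step 6) is left out of this sketch.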
## Migration and Import Wizard

Help users consolidate existing AI chaos:

```bash
adaptive-llm-gateway import
```

Import targets:

- existing `.env` provider keys
- Ollama model list
- Open WebUI config
- LM Studio local server settings
- ChatGPT exports
- Claude Code handoffs
- Codex session notes
- existing project READMEs/docs
- n8n workflows
- previous vector indexes where supported

The import wizard should never move or delete original data. It creates normalized memory entries, config snippets, and receipts.

## UI Direction

The open-source UI can inherit the spirit of the current LLM Gateway dashboard, but it should be productized into a neutral, reusable interface.

Keep from the current gateway:

- operational dashboard feel
- live health/status cards
- request/cost/token visibility
- provider and fallback visibility
- logs/metrics orientation
- dashboard as first screen, not a marketing page

Improve for OSS:

- first-run setup wizard
- topology map as the home view
- integration catalog
- trust policy editor
- memory browser
- context receipts viewer
- consent ledger viewer
- benchmark lab
- team/admin mode

Recommended main navigation:

```text
Topology
Models
Agents
Memory
Policies
Receipts
Benchmarks
Costs
Integrations
Doctor
Settings
```

Visual style:

- dense, operational, and scannable
- dark/light mode
- no marketing hero as the app entry
- no Context-X-specific branding in OSS defaults
- optional theme pack for Context-X/internal deployments

## Target Users

- Developers running Ollama, LM Studio, Open WebUI, Claude Code, Codex, Cursor, VS Code, n8n, or custom agents.
- Small teams that want one internal AI gateway instead of scattered API keys.
- Homelab and self-hosting users who want MCP tools, local models, and remote fallback models in one stack.
- Security-conscious teams that want audit logs, budgets, routing rules, and local-first behavior.

## Open Source Boundary

The OSS release should remove or isolate Context-X-specific assumptions:

- Hardcoded domains such as `context-x.org`, `fichtmueller.org`, and Erik host paths.
- Private project templates for TIP, MAGATAMA, SwitchBlade, PeerCortex, etc.
- Private credentials, server names, and internal service assumptions.
- Context-X-specific training data unless explicitly sanitized and licensed.

Keep as generic features:

- Fastify gateway service.
- TypeScript client.
- Health checks.
- Provider routing.
- OpenAI-compatible adapter.
- MCP server.
- Local model discovery.
- Audit logging.
- Cost and token tracking.
- Prompt template system.
- Optional learning engine.

## Adaptive System Discovery

Add a first-run discovery command:

```bash
npx adaptive-llm-gateway init
```

It should detect:

- OS: macOS, Linux, Windows/WSL.
- Runtime: Node.js, Python, Docker, Docker Compose, pnpm/npm/yarn.
- Local LLM servers:
  - Ollama on `localhost:11434`
  - LM Studio on `localhost:1234`
  - LocalAI
  - Open WebUI
  - llama.cpp server
- Hosted provider credentials from environment only after consent:
  - OpenAI
  - Anthropic
  - Mistral
  - Groq
  - Cerebras
  - OpenRouter
  - Cloudflare Workers AI
- Developer tools:
  - VS Code
  - Cursor
  - Claude Code
  - Codex CLI/Desktop
  - GitHub Copilot
  - n8n
  - Git remotes and local repos
- Local knowledge sources:
  - selected folders
  - docs
  - markdown notes
  - code repositories
  - optional browser/exported bookmarks

Discovery must produce a local config file, not silently mutate user systems:

```yaml
gateway:
  port: 3103
  mode: local-first

models:
  local:
    ollama:
      detected: true
      url: http://localhost:11434
      models: []

providers:
  openai:
    enabled: false
    env_key: OPENAI_API_KEY

mcp:
  enabled: true
  port: 3104

tools:
  filesystem:
    enabled: false
    allowed_roots: []
  git:
    enabled: true
  shell:
    enabled: false
```

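As a hedged illustration of the local-server part of discovery, the sketch below checks whether anything is listening on the ports named above and returns a structure shaped like the `models.local` section of the generated config. The function names are assumptions. An open port only shows that something is listening, so a real detector would still need to confirm the server type with a runtime-specific request before trusting the result.

```python
import socket

# Ports taken from the list above; other runtimes have no fixed port there.
KNOWN_LOCAL_SERVERS = {"ollama": 11434, "lm-studio": 1234}


def is_port_open(host: str, port: int, timeout: float = 0.25) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_local_servers(servers: dict[str, int] = KNOWN_LOCAL_SERVERS,
                        host: str = "127.0.0.1") -> dict[str, dict]:
    """Build a detection result; nothing on the system is modified."""
    return {
        name: {
            "detected": is_port_open(host, port),
            "url": f"http://localhost:{port}",
            "models": [],  # Filled in later by a runtime-specific query.
        }
        for name, port in servers.items()
    }
```

The probe is read-only, which matches the requirement that discovery produces a config file and does not mutate the user's system.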
## AI Client and Agent Detection

The gateway should detect AI clients and agent runtimes as integration targets, but it should treat each one differently depending on what is technically and legally possible.

Detection is not the same as control. Some tools expose APIs, config files, MCP settings, or proxy configuration. Others are closed consumer apps where the safe integration path is an adapter, browser extension, exported data import, or a documented manual setup step.

### Integration Levels

Use four integration levels:

| Level | Meaning | Example |
|---|---|---|
| `detected` | Tool exists, but no automatic binding yet | ChatGPT desktop app installed |
| `configurable` | Gateway can write or suggest config | Claude Code MCP config |
| `proxyable` | Tool can point to OpenAI-compatible gateway URL | OpenAI SDK, Continue, many IDE plugins |
| `native` | Gateway has a dedicated adapter/package | Codex LSP adapter, Claude Code bridge |

### Tool Matrix

| Tool | Detect | Best Integration Path | Notes |
|---|---|---|---|
| Codex CLI/Desktop | CLI path, config folder, running process | MCP server, LSP adapter, OpenAI-compatible endpoint | Provide `codex-lsp-adapter` and MCP setup instructions. |
| Claude Code | CLI path, MCP/config files, shell env | MCP server + Claude Code bridge | Best path is first-class MCP tools/resources. |
| ChatGPT Desktop/Web | App/process/browser profile, exported chats | OpenAI-compatible adapter where supported, browser extension, import/export | Do not scrape private chats silently. Ask before importing exports. |
| OpenAI SDK users | Env vars, package manifests, code search | Replace `baseURL` with gateway URL | Very easy and safe to automate per repo. |
| Cursor | App/config detection | MCP server, OpenAI-compatible proxy if configured | Needs explicit user approval before editing settings. |
| VS Code | Extensions + settings.json | MCP/LSP adapter, Continue/Copilot-compatible config | Offer snippets instead of blind mutation. |
| GitHub Copilot | gh auth, extension, copilot bridge | copilot-bridge where available | Subscription/auth belongs to user; gateway should not extract tokens. |
| Continue.dev | config files | OpenAI-compatible endpoint | Good OSS integration target. |
| Open WebUI | local port/container detection | Register gateway as provider or upstream | Can also use Open WebUI as discovered model frontend. |
| n8n | local port/container/env | HTTP node templates + credentials guidance | Detect workflows only with allowed path/API access. |
| LangChain/LlamaIndex apps | package manifests/code search | Generated integration patch | Per-project opt-in. |

### Detection Sources

Safe discovery sources:

- process list
- common install paths
- package manifests
- shell PATH
- Docker containers
- local ports
- explicit config directories
- user-selected project folders

Sensitive sources that require consent:

- browser profiles
- chat exports
- API keys
- IDE settings writes
- MCP config writes
- repo-wide code modifications
- shell command execution tools

### Binding Strategy

The first-run wizard should present findings like this:

```text
Detected AI tools:

✓ Claude Code CLI
  Integration: MCP server
  Action: add Adaptive LLM Gateway MCP config

✓ Codex
  Integration: MCP + LSP adapter
  Action: generate config snippet

✓ ChatGPT desktop
  Integration: detected only
  Action: optional import of exported chats, optional browser extension

✓ Cursor
  Integration: MCP/OpenAI-compatible endpoint
  Action: generate settings snippet

Enable integrations now? [select]
```

Default behavior should be conservative:

- Generate config snippets first.
- Ask before writing settings.
- Ask before indexing chat exports or repo contents.
- Never extract tokens from apps.
- Prefer official APIs, MCP, LSP, or documented config surfaces.

## MCP Server With Own LLM

The MCP server should be a first-class package:

```text
packages/mcp-server
```

Responsibilities:

- Expose tools for gateway completion, model listing, health, routing, embeddings, and document lookup.
- Expose resources for discovered docs/repos when the user allows them.
- Use the gateway's local-first model routing by default.
- Allow a dedicated local model for tool reasoning, for example `qwen2.5:7b` or another detected local model.
- Never expose shell or filesystem tools until the user explicitly enables allowed scopes.

Suggested MCP tools:

- `gateway.complete`
- `gateway.chat`
- `gateway.classify`
- `gateway.models`
- `gateway.health`
- `gateway.route_preview`
- `knowledge.search`
- `repo.search`
- `repo.summarize`
- `config.get`
- `config.update`

## Embedding Everything

"Embed everything" should mean controlled, user-approved indexing:

- Scan allowed roots only.
- Chunk and embed text/code/docs.
- Store embeddings locally by default.
- Support SQLite + sqlite-vec for simple installs.
- Support Postgres + pgvector for team/server installs.
- Optional Qdrant for larger deployments.

Default modes:

- `personal`: SQLite, local-only, one user.
- `team`: Postgres, API keys, audit logging.
- `server`: Docker Compose, reverse proxy, persistence, MCP enabled.

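The first two indexing steps can be sketched in a few lines: refuse any path outside the allowed roots, then split text into overlapping chunks before embedding. The character-based chunk size, the overlap, and both function names are assumptions; a real indexer would more likely chunk by tokens or by code structure. `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path


def is_under_allowed_root(path: Path, allowed_roots: list[Path]) -> bool:
    """True only if the resolved path lies inside one of the allowed roots."""
    resolved = path.resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text.
        start += step
    return chunks
```

Resolving paths before comparing them prevents `..` segments from escaping an allowed root, and an empty `allowed_roots` list (the default in the generated config) rejects every path.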
## Shared AI Memory Sync

Add a shared memory layer for all connected AI clients and agents. The goal is to make Claude Code, Codex, ChatGPT exports, Cursor, IDE assistants, MCP tools, and automation agents work from the same durable project memory instead of each assistant living in an isolated context bubble.

Working name: **Memory Sync Backend**.

### Why Git/Gitea

Git is a strong default backend for portable AI memory:

- auditable history
- human-readable Markdown/JSON/YAML files
- offline-first local clone
- easy sync across machines
- branchable experiments
- reviewable diffs
- self-hostable with Gitea
- no mandatory SaaS dependency

Gitea can act as the team/server backend:

```text
Claude Code ─┐
Codex      ──┼── Adaptive LLM Gateway ── Memory Sync ── Git/Gitea repo
Cursor     ──┤                                │
ChatGPT    ──┘                                └── local vector index for fast retrieval
```

### Memory Types

Store memory in typed folders:

```text
memory/
  projects/
    my-project/
      PROJECT.md
      decisions/
      tasks/
      architecture/
      runbooks/
  sync/
  agents/
    codex/
    claude-code/
    chatgpt/
    cursor/
  facts/
  preferences/
  credentials-notes/
  incidents/
  evals/
```

Use plain files for durable truth and an embedding index for fast lookup.

### Memory Records

Each memory entry should include provenance:

```yaml
id: mem_2026_05_01_001
type: decision
project: adaptive-llm-gateway
source_agent: codex
created_at: 2026-05-01T12:00:00Z
visibility: team
sensitivity: internal
tags: [mcp, memory, gitea]
summary: Use Gitea-backed memory sync as the shared durable backend.
links:
  - file: OPEN_SOURCE_BLUEPRINT.md
```

### Sync Modes

- `local`: file-based memory in `~/.adaptive-llm-gateway/memory`.
- `git`: local Git repo, user pushes manually.
- `gitea`: automatic push/pull to self-hosted Gitea.
- `github`: optional public/private GitHub backend.
- `s3`: optional artifact backup, not source of truth.

### Agent Integration

Each agent gets a memory adapter:

- Claude Code: MCP resources + memory write tools.
- Codex: MCP resources + session handoff writer.
- ChatGPT: import exported chats; optional browser extension later.
- Cursor/VS Code: repo memory + generated context snippets.
- n8n: workflow memory and execution summaries.

Suggested MCP memory tools:

- `memory.search`
- `memory.read`
- `memory.write`
- `memory.append_session`
- `memory.summarize_project`
- `memory.record_decision`
- `memory.record_task`
- `memory.sync_status`
- `memory.pull`
- `memory.push`

### Conflict Handling

Memory should be append-first. Avoid agents overwriting each other.

Rules:

- Session logs are append-only.
- Decisions can supersede earlier decisions but should not delete them.
- Project summaries are regenerated from source logs and committed as derived files.
- Conflicts create review entries instead of automatic destructive merges.

### Privacy and Safety

- Never sync secrets.
- Secret-looking values are redacted before commit.
- Sensitive memory can stay local-only.
- Users can mark folders as `private`, `team`, or `public`.
- Chat imports require explicit approval.
- Every memory entry records source agent and timestamp.

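Redacting secret-looking values before a commit could start from a small set of regular expressions, as in the sketch below. The patterns shown (PEM private key blocks, `sk-` style keys, and `password`/`token`/`secret`/`api_key` assignments) are illustrative assumptions and far from complete; a production setup would more likely combine them with a dedicated secret scanner.

```python
import re

# Patterns whose entire match is replaced.
WHOLE_MATCH_PATTERNS = [
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),
]

# "name: value" or "name=value" pairs; only the value is replaced.
KEY_VALUE_PATTERN = re.compile(
    r"\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)(\S+)",
    re.IGNORECASE,
)

PLACEHOLDER = "[REDACTED]"


def redact(text: str) -> tuple[str, int]:
    """Return (redacted text, number of replacements made)."""
    count = 0
    for pattern in WHOLE_MATCH_PATTERNS:
        text, replaced = pattern.subn(PLACEHOLDER, text)
        count += replaced
    # Keep the key name and separator so the entry stays readable.
    text, replaced = KEY_VALUE_PATTERN.subn(r"\1\2" + PLACEHOLDER, text)
    count += replaced
    return text, count
```

The replacement count could be recorded in the context receipt's redaction decisions, so that users can see that something was removed without seeing what it was.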
### Gitea Default Layout

For self-hosted users:

```text
gitea.example.local/user/ai-memory.git
gitea.example.local/user/project-a.git
gitea.example.local/user/project-b.git
```

The gateway can either:

- use one central `ai-memory` repo, or
- add a `.ai-memory/` folder to each project repo.

Recommended default:

- personal mode: one central memory repo
- team mode: one memory repo plus per-project links
- open-source project mode: `.ai-memory/` inside the project

## Architecture

```text
User apps / agents / IDEs
        |
        | OpenAI API / MCP / SDK
        v
Adaptive LLM Gateway
  - routing
  - prompt templates
  - confidence gates
  - budgets
  - audit logs
  - local knowledge lookup
        |
        +--> Local models: Ollama, LM Studio, LocalAI, llama.cpp
        +--> Hosted providers: OpenAI, Anthropic, Groq, Mistral, etc.
        +--> MCP tools/resources
        +--> Local vector store
```

## Installation Targets

Simple local install:

```bash
npx adaptive-llm-gateway init
npx adaptive-llm-gateway start
```

Docker install:

```bash
docker compose up -d
```

Team/server install:

```bash
npx adaptive-llm-gateway init --mode team
npx adaptive-llm-gateway deploy-config
```

## Security Defaults

- Local-first.
- No secrets in config files.
- Read env vars only after consent.
- No filesystem indexing without allowed roots.
- No shell tool by default.
- No telemetry by default.
- Audit logs redact prompts by default unless user opts in.
- MCP dangerous tools disabled until explicitly enabled.
- Provider API keys remain in env, system keychain, or configured secret backend.

## Refactor Plan

Phase 1: Extract Context-X assumptions

- Move Context-X routing templates into optional example pack.
- Rename packages from `@llm-gateway/*` or prepare a neutral scope.
- Replace hardcoded domains and ports with generated config.
- Add `.env.example` for OSS.

Phase 2: First-run discovery

- Add `packages/discovery`.
- Detect local models, runtimes, repos, and common agent tools.
- Generate `gateway.config.yaml`.

Phase 3: MCP server

- Add `packages/mcp-server`.
- Expose gateway tools and resources.
- Add local model-backed tool reasoning.

Phase 4: Embeddings and knowledge

- Add `packages/knowledge`.
- Support SQLite default and Postgres/Qdrant optional backends.
- Add chunking, indexing, search, and repo/doc ingestion.

Phase 5: OSS release hardening

- Secret scan.
- License audit.
- Remove private data.
- Add quickstart docs.
- Add GitHub Actions CI.
- Add Docker Compose starter.

## Minimum Viable OSS Release

The first public version should include:

- Gateway server.
- Client SDK.
- OpenAI-compatible adapter.
- Local Ollama/LM Studio detection.
- MCP server with safe tools.
- SQLite config and audit store.
- Docker Compose.
- One generic prompt template pack.
- Documentation for local, team, and server modes.

## Name Ideas

- Adaptive LLM Gateway
- Open LLM Gateway
- LocalMesh Gateway
- ModelRouter
- GatewayKit
- AIDE Gateway