PaperCortex adds semantic search, auto-classification, receipt extraction, bank statement matching, and DATEV export to Paperless-ngx, powered entirely by local AI through Ollama. It exposes everything as an MCP Server for Claude Code and AI agent integration.
- MCP Server with 5 tools (search, classify, receipt, query, export)
- Local Ollama embeddings for semantic document search
- Receipt data extraction (vendor, amount, date, tax, line items)
- DATEV Buchungsstapel CSV export for German accounting
- Bank CSV transaction matching
- Paperless-ngx REST API client
- Docker deployment
- Zero cloud dependencies: 100% self-hosted
Setup Guide
Prerequisites
- Node.js 20+ (or Docker)
- Paperless-ngx instance with API access
- Ollama with required models
Step 1: Install Ollama Models
# Required: LLM for classification and extraction
ollama pull qwen2.5:14b
# Required: Embedding model for semantic search
ollama pull nomic-embed-text
Verify Ollama is running:
curl http://localhost:11434/api/tags
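The same check can be scripted so that it also confirms both required models are present. The following is a minimal sketch, assuming the standard Ollama `/api/tags` response shape (a `models` list whose entries carry a `name`); the helper names `normalize`, `missing_models`, and `fetch_tags` are illustrative and not part of PaperCortex.

```python
import json
import urllib.request

# Models named in Step 1 of this guide.
REQUIRED_MODELS = ("qwen2.5:14b", "nomic-embed-text")


def normalize(name: str) -> str:
    """Drop an implicit ':latest' tag so 'nomic-embed-text:latest' matches 'nomic-embed-text'."""
    return name[: -len(":latest")] if name.endswith(":latest") else name


def missing_models(tags: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the installed-model list from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags())` against a running instance should return an empty list once both pulls have finished.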
Step 2: Get Paperless-ngx API Token
- Open your Paperless-ngx web UI
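- Open My Profile from the user dropdown
- Generate (or regenerate) the API auth token there; tokens can also be managed in the Django admin under Auth Tokens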
- Copy the token for the next step
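Before wiring the token into PaperCortex, it can help to confirm that Paperless-ngx accepts it. The sketch below assumes Paperless-ngx token authentication (an `Authorization: Token ...` header) and uses the documents list endpoint as a cheap probe; the function names are hypothetical.

```python
import urllib.error
import urllib.request


def build_auth_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a minimal authenticated request against the documents endpoint."""
    url = base_url.rstrip("/") + "/api/documents/?page_size=1"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def token_is_valid(base_url: str, token: str) -> bool:
    """Return True if the server answers HTTP 200, False on any HTTP error status.

    Connection-level failures (server down, wrong port) are not caught here
    and surface as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(build_auth_request(base_url, token), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

For example, `token_is_valid("http://localhost:8000", "<your-api-token>")` should return `True` for a working token and `False` if the server rejects it.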
Step 3: Configure PaperCortex
git clone https://github.com/YOUR_USERNAME/PaperCortex.git
cd PaperCortex
cp .env.example .env
Edit .env with your values:
PAPERLESS_URL=http://localhost:8000
PAPERLESS_TOKEN=<your-api-token>
OLLAMA_URL=http://localhost:11434
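A quick sanity check of the edited file can catch a leftover placeholder or a URL without a scheme before the first start. This is a minimal sketch with a deliberately simple KEY=VALUE parser; it is not how PaperCortex itself loads `.env`, and `parse_env` and `config_problems` are illustrative names.

```python
# Keys listed in the .env example above.
REQUIRED_KEYS = ("PAPERLESS_URL", "PAPERLESS_TOKEN", "OLLAMA_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def config_problems(values: dict[str, str]) -> list[str]:
    """Return human-readable problems, in REQUIRED_KEYS order; empty means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value.startswith("<") and value.endswith(">"):
            problems.append(f"{key} still holds a placeholder")
        elif key.endswith("_URL") and not value.startswith(("http://", "https://")):
            problems.append(f"{key} should start with http:// or https://")
    return problems
```

Reading the file with `open(".env").read()` and passing the result through both functions yields an empty list when all three values look plausible.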
Step 4: Run
Option A: Docker (Recommended)
docker compose up -d
Option B: Manual
npm install
npm run build
npm start
Option C: Development
npm install
npm run dev
Step 5: Register MCP Server
Add to your Claude Code configuration (~/.claude.json):
{
  "mcpServers": {
    "papercortex": {
      "command": "node",
      "args": ["/absolute/path/to/PaperCortex/dist/mcp-server/index.js"],
      "env": {
        "PAPERLESS_URL": "http://localhost:8000",
        "PAPERLESS_TOKEN": "your-token",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}
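If ~/.claude.json already contains other settings or MCP servers, the entry should be merged rather than pasted over the file. The sketch below shows one way to do that, assuming the file is plain JSON with an optional top-level `mcpServers` object; `add_papercortex_server` is a hypothetical helper.

```python
import copy


def add_papercortex_server(config: dict, entry_path: str, env: dict[str, str]) -> dict:
    """Return a copy of config with a 'papercortex' entry under 'mcpServers'.

    Existing keys and other MCP servers are preserved; the input is not mutated.
    """
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["papercortex"] = {"command": "node", "args": [entry_path], "env": dict(env)}
    return updated
```

In practice the config would be loaded with `json.load`, passed through this function, and written back with `json.dump(..., indent=2)`; keeping a backup copy of the original file first is a sensible precaution.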
Step 6: Populate Vector Store
Semantic search only covers documents that have already been embedded. Bulk embedding of existing documents is not automated yet; this is planned for a future release. For now, the vector store is populated incrementally as documents are queried or classified.
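To make the idea of a vector store concrete, here is a minimal sketch of the general technique: request an embedding from Ollama, then rank stored vectors by cosine similarity. It assumes Ollama's `/api/embeddings` endpoint and an in-memory dict keyed by document ID; it illustrates the concept only and does not reflect PaperCortex's internal storage or API.

```python
import json
import math
import urllib.request


def embed(text: str, base_url: str = "http://localhost:11434",
          model: str = "nomic-embed-text") -> list[float]:
    """Request one embedding vector from Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embeddings", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], store: dict[int, list[float]],
         top_k: int = 3) -> list[tuple[int, float]]:
    """Return the top_k (document ID, score) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

A document that has never been embedded has no entry in `store`, which is why it cannot appear in semantic search results until it has been queried or classified at least once.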
Troubleshooting
"Connection refused" to Paperless-ngx
- Verify the URL in .env is reachable
- Check that the API token is valid
- Ensure Paperless-ngx is running
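These three causes produce different errors at the HTTP level, so a small helper can tell them apart. The sketch below assumes requests are made with Python's `urllib`; `diagnose` is a hypothetical name, and the messages are only suggestions.

```python
import urllib.error


def diagnose(exc: Exception) -> str:
    """Map a urllib exception to a likely cause."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "server reachable, but the API token was rejected"
        return f"server reachable, but returned HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, ConnectionRefusedError):
            return "connection refused: nothing is listening on that host and port"
        return f"could not reach server: {exc.reason}"
    return f"unexpected error: {exc}"
```

Wrapping a probe request in `try: ... except Exception as exc: print(diagnose(exc))` separates a wrong URL or stopped service from an invalid token.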
"Connection refused" to Ollama
- Run ollama serve if not already running
- Check the port (default: 11434)
- Verify models are pulled: ollama list
Slow first query
- The first embedding generation may take longer as Ollama loads the model into memory
- Subsequent queries will be faster once the model is loaded
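One way to avoid the cold-start delay is to load both models ahead of time. The sketch below assumes Ollama's documented behaviour that a `/api/generate` request without a prompt loads the model, and that the `keep_alive` parameter controls how long it stays in memory; the 30-minute value and the helper names are illustrative choices.

```python
import json
import urllib.request


def warmup_request(base_url: str, model: str, embedding: bool,
                   keep_alive: str = "30m") -> urllib.request.Request:
    """Build a POST request that makes Ollama load the given model."""
    body = {"model": model, "keep_alive": keep_alive}
    if embedding:
        # Embedding models are loaded by embedding a throwaway string.
        path = "/api/embeddings"
        body["prompt"] = "warm-up"
    else:
        # A generate call without a prompt only loads the model.
        path = "/api/generate"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def warm_up(base_url: str = "http://localhost:11434") -> None:
    """Load the LLM and the embedding model named in Step 1."""
    for model, embedding in (("qwen2.5:14b", False), ("nomic-embed-text", True)):
        with urllib.request.urlopen(warmup_request(base_url, model, embedding), timeout=300) as resp:
            resp.read()
```

Running `warm_up()` after starting Ollama moves the loading cost out of the first real query; whether both models fit in memory at the same time depends on the host.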