PaperCortex adds semantic search, auto-classification, receipt extraction, bank statement matching, and DATEV export to Paperless-ngx — powered entirely by local AI through Ollama. Exposes everything as an MCP Server for Claude Code and AI agent integration. - MCP Server with 5 tools (search, classify, receipt, query, export) - Local Ollama embeddings for semantic document search - Receipt data extraction (vendor, amount, date, tax, line items) - DATEV Buchungsstapel CSV export for German accounting - Bank CSV transaction matching - Paperless-ngx REST API client - Docker deployment - Zero cloud dependencies — 100% self-hosted
108 lines
2.2 KiB
Markdown
108 lines
2.2 KiB
Markdown
# Setup Guide
|
|
|
|
## Prerequisites
|
|
|
|
- **Node.js** 20+ (or Docker)
|
|
- **Paperless-ngx** instance with API access
|
|
- **Ollama** with required models
|
|
|
|
## Step 1: Install Ollama Models
|
|
|
|
```bash
|
|
# Required: LLM for classification and extraction
|
|
ollama pull qwen2.5:14b
|
|
|
|
# Required: Embedding model for semantic search
|
|
ollama pull nomic-embed-text
|
|
```
|
|
|
|
Verify Ollama is running:
|
|
```bash
|
|
curl http://localhost:11434/api/tags
|
|
```
|
|
|
|
## Step 2: Get Paperless-ngx API Token
|
|
|
|
1. Open your Paperless-ngx web UI
|
|
2. Go to Settings > API
|
|
3. Generate a new API token
|
|
4. Copy the token for the next step
|
|
|
|
## Step 3: Configure PaperCortex
|
|
|
|
```bash
|
|
git clone https://github.com/YOUR_USERNAME/PaperCortex.git
|
|
cd PaperCortex
|
|
cp .env.example .env
|
|
```
|
|
|
|
Edit `.env` with your values:
|
|
```env
|
|
PAPERLESS_URL=http://localhost:8000
|
|
PAPERLESS_TOKEN=<your-api-token>
|
|
OLLAMA_URL=http://localhost:11434
|
|
```
|
|
|
|
## Step 4: Run
|
|
|
|
### Option A: Docker (Recommended)
|
|
|
|
```bash
|
|
docker compose up -d
|
|
```
|
|
|
|
### Option B: Manual
|
|
|
|
```bash
|
|
npm install
|
|
npm run build
|
|
npm start
|
|
```
|
|
|
|
### Option C: Development
|
|
|
|
```bash
|
|
npm install
|
|
npm run dev
|
|
```
|
|
|
|
## Step 5: Register MCP Server
|
|
|
|
Add to your Claude Code configuration (`~/.claude.json`):
|
|
|
|
```json
|
|
{
|
|
"mcpServers": {
|
|
"papercortex": {
|
|
"command": "node",
|
|
"args": ["/absolute/path/to/PaperCortex/dist/mcp-server/index.js"],
|
|
"env": {
|
|
"PAPERLESS_URL": "http://localhost:8000",
|
|
"PAPERLESS_TOKEN": "your-token",
|
|
"OLLAMA_URL": "http://localhost:11434"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Step 6: Populate Vector Store
|
|
|
|
On first run, you need to embed your existing documents. This will be automated in a future release. For now, the vector store is populated as documents are queried or classified.
|
|
|
|
## Troubleshooting
|
|
|
|
### "Connection refused" to Paperless-ngx
|
|
- Verify the URL in `.env` is reachable
|
|
- Check that the API token is valid
|
|
- Ensure Paperless-ngx is running
|
|
|
|
### "Connection refused" to Ollama
|
|
- Run `ollama serve` if not already running
|
|
- Check the port (default: 11434)
|
|
- Verify models are pulled: `ollama list`
|
|
|
|
### Slow first query
|
|
- The first embedding generation may take longer as Ollama loads the model into memory
|
|
- Subsequent queries will be faster once the model is loaded
|