Mortdecai Gateway
Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
Quick Start
git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh
The setup script:
- Generates an API key
- Starts Ollama + gateway in Docker
- Downloads the model (~5.3 GB)
- Loads it into Ollama
- Runs a test inference
- Prints connection details
Dashboard: http://localhost:8434/dashboard
Architecture
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.
Cost Model
The gateway estimates electricity cost based on marginal power — only the extra watts your GPU draws during inference above its idle power.
Marginal cost ($) = (GPU load - GPU idle + System load - System idle) [W] / 1000 × time [h] × rate [$/kWh]
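The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code in gateway.py: the `PowerModel` class and the function names are assumptions, and the default values are taken from the Configuration table. With a 3.42 second request it yields roughly the energy and cost figures shown under Response Metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModel:
    """Power figures in watts (hypothetical container; defaults from the config table)."""
    gpu_idle_watts: float = 15.0
    gpu_load_watts: float = 54.0
    system_idle_watts: float = 45.0
    system_inference_watts: float = 65.0
    electricity_rate: float = 0.15  # $/kWh


def marginal_watts(p: PowerModel) -> float:
    """Extra watts drawn during inference, above idle."""
    return (p.gpu_load_watts - p.gpu_idle_watts) + (
        p.system_inference_watts - p.system_idle_watts
    )


def energy_wh(p: PowerModel, duration_seconds: float) -> float:
    """Marginal energy for one request, in watt-hours."""
    return marginal_watts(p) * duration_seconds / 3600.0


def marginal_cost(p: PowerModel, duration_seconds: float) -> float:
    """Marginal electricity cost in dollars (Wh to kWh, then times $/kWh)."""
    return energy_wh(p, duration_seconds) / 1000.0 * p.electricity_rate


model = PowerModel()
print(marginal_watts(model))  # 59.0
print(energy_wh(model, 3.42), marginal_cost(model, 3.42))
```

The only unit conversions are seconds to hours and watt-hours to kilowatt-hours; everything else is the subtraction of idle draw from load draw.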
Configuration
All parameters are set in .env and can be adjusted live via POST /config:
| Parameter | Default | Description |
|---|---|---|
| GPU_IDLE_WATTS | 15 | GPU power at idle |
| GPU_LOAD_WATTS | 54 | GPU power during inference |
| SYSTEM_IDLE_WATTS | 45 | System power at idle (CPU/RAM/fans) |
| SYSTEM_INFERENCE_WATTS | 65 | System power during inference |
| ELECTRICITY_RATE | 0.15 | $/kWh |
| BILLING_MODE | marginal | marginal (extra watts only) or dedicated (all uptime) |
| BASE_RATE_PER_HOUR | 0.00 | Hourly rate in dedicated mode |
| SPENDING_CAP | 10.00 | $ before gateway stops accepting requests |
| LABOR_RATE_PER_HOUR | 0.00 | $/hr for operator time (setup/maintenance) |
| PROFIT_MARGIN | 0.00 | Markup multiplier (0.10 = 10%) |
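As an illustration of how these parameters might be read, the sketch below merges environment overrides onto the documented defaults. The `load_config` helper and the validation of BILLING_MODE are assumptions for the example; the gateway's actual loading code may differ.

```python
import os
from typing import Mapping

# Documented defaults, kept as strings the way they would appear in .env.
DEFAULTS = {
    "GPU_IDLE_WATTS": "15",
    "GPU_LOAD_WATTS": "54",
    "SYSTEM_IDLE_WATTS": "45",
    "SYSTEM_INFERENCE_WATTS": "65",
    "ELECTRICITY_RATE": "0.15",
    "BILLING_MODE": "marginal",
    "BASE_RATE_PER_HOUR": "0.00",
    "SPENDING_CAP": "10.00",
    "LABOR_RATE_PER_HOUR": "0.00",
    "PROFIT_MARGIN": "0.00",
}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: apply overrides from env and convert numeric fields."""
    config = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, default)
        if name == "BILLING_MODE":
            if raw not in ("marginal", "dedicated"):
                raise ValueError(f"BILLING_MODE must be marginal or dedicated, got {raw!r}")
            config[name] = raw
        else:
            config[name] = float(raw)  # raises ValueError on non-numeric input
    return config


print(load_config({"ELECTRICITY_RATE": "0.22"})["ELECTRICITY_RATE"])  # 0.22
```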
Billing Modes
Marginal (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.
Dedicated: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
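To make the difference concrete, here is a rough comparison of the two modes. It assumes that "full system power" in dedicated mode means GPU load watts plus system inference watts for the whole uptime; that interpretation, and both function names, are assumptions of this sketch rather than documented behavior.

```python
def marginal_charge(
    inference_seconds: float,
    rate_per_kwh: float = 0.15,
    gpu_idle: float = 15.0,
    gpu_load: float = 54.0,
    system_idle: float = 45.0,
    system_load: float = 65.0,
) -> float:
    """Charge only for the watts above idle, and only while inference runs."""
    extra_watts = (gpu_load - gpu_idle) + (system_load - system_idle)
    return extra_watts * (inference_seconds / 3600.0) / 1000.0 * rate_per_kwh


def dedicated_charge(
    uptime_hours: float,
    base_rate_per_hour: float = 0.0,
    rate_per_kwh: float = 0.15,
    gpu_load: float = 54.0,
    system_load: float = 65.0,
) -> float:
    """Charge for full power over all uptime, plus the base hourly rate (assumed model)."""
    full_watts = gpu_load + system_load
    electricity = full_watts / 1000.0 * uptime_hours * rate_per_kwh
    return electricity + base_rate_per_hour * uptime_hours


# One hour of uptime containing ten minutes of actual inference.
print(marginal_charge(inference_seconds=600))
print(dedicated_charge(uptime_hours=1.0))
```

With the default figures, the dedicated charge for that hour is about twelve times the marginal charge, which is why marginal mode suits machines that would be running anyway.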
Dual Ledger
Every transaction is recorded in a tamper-proof ledger on both sides — the gateway operator's machine AND the client's server.
How it works
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
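Step 6 can be sketched with Python's standard hashlib module. The README states which fields are hashed but not how they are serialized, so the plain string concatenation below is an assumption, as is the function name.

```python
import hashlib


def transaction_hash(tx_id: str, tokens: int, cost: float, shared_secret: str) -> str:
    """SHA-256 over id + tokens + cost + shared_secret (assumed: simple concatenation)."""
    payload = f"{tx_id}{tokens}{cost}{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = transaction_hash("tx-0001", 412, 0.000008, "agreed_upon_secret_here")
edited = transaction_hash("tx-0001", 412, 0.000009, "agreed_upon_secret_here")
print(original != edited)  # True: changing the cost changes the hash
```

Because the secret is part of the hashed payload, a party that edits a stored transaction without knowing the secret cannot produce a matching hash.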
Tamper protection
| Scenario | Detection |
|---|---|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on /reconcile |
| Shared secret mismatch | All hashes show as invalid |
Setup
Both sides configure the same LEDGER_SECRET in their .env:
Gateway (.env):
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
Client (ledger_receiver.py):
LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
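For orientation, a receiver along these lines could be built on Python's standard http.server module. This is not the shipped ledger_receiver.py: the JSON field names (`id`, `tokens`, `cost`, `hash`), the ledger file name, the hash serialization, and the 422 response for a failed hash are all assumptions of the sketch.

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

LEDGER_PATH = "client_ledger.jsonl"         # hypothetical file name
LEDGER_SECRET = "agreed_upon_secret_here"   # must match the gateway's LEDGER_SECRET


def expected_hash(tx: dict, secret: str) -> str:
    # Assumed serialization: plain concatenation of the hashed fields.
    payload = f"{tx['id']}{tx['tokens']}{tx['cost']}{secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_transaction(raw_body: bytes, ledger_path: str, secret: str) -> bool:
    """Append the transaction to the local ledger; return whether its hash verified."""
    tx = json.loads(raw_body)
    verified = tx.get("hash") == expected_hash(tx, secret)
    with open(ledger_path, "a", encoding="utf-8") as ledger:
        ledger.write(json.dumps({**tx, "verified": verified}) + "\n")
    return verified


class TransactionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/transaction":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        ok = record_transaction(self.rfile.read(length), LEDGER_PATH, LEDGER_SECRET)
        self.send_response(200 if ok else 422)
        self.end_headers()


def serve(port: int = 8435) -> None:
    """Listen on the port used in the CALLBACK_URL example."""
    HTTPServer(("0.0.0.0", port), TransactionHandler).serve_forever()
```

Calling `serve()` starts the listener. The sketch stores transactions even when the hash fails, flagged as unverified, so that a disputed entry is still available as evidence.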
Reconciliation
# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
Response:
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
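A check of this kind might look like the sketch below, which recomputes each entry's hash and compares the ledger total against the running stats total. The entry field names, the hash serialization, the tolerance, and the "MISMATCH" status value are assumptions; only "OK" appears in the documented response.

```python
import hashlib
import json


def entry_hash(entry: dict, secret: str) -> str:
    # Assumed serialization: plain concatenation of the hashed fields.
    payload = f"{entry['id']}{entry['tokens']}{entry['cost']}{secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(ledger_lines, stats_total_cost: float, secret: str,
              tolerance: float = 1e-9) -> dict:
    """Verify every hash in a JSONL ledger and compare totals with the stats figure."""
    entries = [json.loads(line) for line in ledger_lines if line.strip()]
    valid = sum(1 for e in entries if e.get("hash") == entry_hash(e, secret))
    invalid = len(entries) - valid
    ledger_total = sum(e["cost"] for e in entries)
    discrepancy = abs(ledger_total - stats_total_cost)
    status = "OK" if invalid == 0 and discrepancy <= tolerance else "MISMATCH"
    return {
        "ledger_entries": len(entries),
        "ledger_total_cost": round(ledger_total, 6),
        "stats_total_cost": round(stats_total_cost, 6),
        "discrepancy": round(discrepancy, 6),
        "hash_verification": {"total": len(entries), "valid": valid, "invalid": invalid},
        "status": status,
    }
```

Passing the lines of ledger.jsonl together with the stats total returns a dictionary shaped like the response above.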
Endpoints
Public (no auth)
| Endpoint | Description |
|---|---|
| GET /health | Ollama status + loaded models |
| GET /dashboard | Web dashboard with live stats |
Authenticated
| Endpoint | Description |
|---|---|
| POST /api/chat | Proxied to Ollama (inference) |
| POST /api/generate | Proxied to Ollama (inference) |
| GET /stats | Full usage stats + cost config |
| GET /config | View cost configuration |
| POST /config | Update cost parameters live |
| GET /ledger | View recent transactions + total cost |
| GET /reconcile | Verify ledger integrity |
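A client call to one of the authenticated endpoints could be sketched with Python's standard urllib as follows. The Bearer scheme matches the curl examples in this README and the request body follows Ollama's chat API; the helper names and the placeholder key and model name are assumptions.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build an authenticated POST to /api/chat without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(base_url: str, api_key: str, model: str, prompt: str,
         timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply."""
    request = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Example (requires a running gateway; key and model name are placeholders):
# reply = chat("http://localhost:8434", "YOUR_API_KEY", "mortdecai:0.5.0", "Hello")
```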
Admin
| Endpoint | Description |
|---|---|
| POST /admin/update-model | Download + load new GGUF (requires ALLOW_MODEL_UPDATES=true) |
Model Updates
Remote push (opt-in): Set ALLOW_MODEL_UPDATES=true in .env. The client can push new model versions:
curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf", "name": "mortdecai:0.5.0"}'
Manual update: Run the update script:
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf mortdecai:0.5.0
Response Metadata
Every proxied response includes gateway metadata:
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
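A client can use this metadata to sanity-check each response. The sketch below confirms that the reported energy agrees with marginal watts times duration and reports how much of the budget has been used; the function name and the tolerance are assumptions.

```python
def check_gateway_metadata(response: dict, tolerance_wh: float = 1e-3) -> dict:
    """Cross-check the _gateway block of a proxied response (hypothetical helper)."""
    meta = response["_gateway"]
    # Energy should equal marginal watts times duration, converted to watt-hours.
    expected_wh = meta["marginal_watts"] * meta["duration_seconds"] / 3600.0
    cap = meta["total_cost"] + meta["budget_remaining"]
    return {
        "energy_consistent": abs(expected_wh - meta["energy_wh"]) <= tolerance_wh,
        "budget_remaining": meta["budget_remaining"],
        "spent_fraction": meta["total_cost"] / cap if cap > 0 else 0.0,
    }


sample = {
    "message": {"role": "assistant", "content": "..."},
    "_gateway": {
        "duration_seconds": 3.42,
        "marginal_watts": 59,
        "energy_wh": 0.0561,
        "estimated_cost": 0.000008,
        "total_cost": 0.0342,
        "budget_remaining": 9.9658,
        "billing_mode": "marginal",
    },
}
print(check_gateway_metadata(sample))
```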
Dashboard
The dashboard shows the following, updated live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw
Auto-refreshes every 10 seconds.
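The "estimated remaining requests" figure can be approximated from the running totals. The sketch below divides the unspent budget by the average cost per request; whether the dashboard computes it exactly this way is an assumption, as are the function names.

```python
import math
from typing import Optional


def average_cost(total_cost: float, request_count: int) -> float:
    """Average cost per request, or 0.0 before any request has been served."""
    return total_cost / request_count if request_count else 0.0


def estimated_remaining_requests(total_cost: float, request_count: int,
                                 spending_cap: float) -> Optional[int]:
    """Requests that still fit under the cap at the current average cost."""
    avg = average_cost(total_cost, request_count)
    if avg <= 0:
        return None  # no usage yet, so no meaningful estimate
    remaining_budget = max(spending_cap - total_cost, 0.0)
    return math.floor(remaining_budget / avg)


print(estimated_remaining_requests(total_cost=2.5, request_count=10, spending_cap=10.0))  # 30
```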
GPU Support
AMD ROCm (default): Docker compose uses ollama/ollama:rocm. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
NVIDIA: Edit docker-compose.yml — uncomment the deploy section, comment out the devices section.
Files
| File | Purpose |
|---|---|
| gateway.py | Main proxy server |
| ledger_receiver.py | Client-side transaction receiver |
| docker-compose.yml | Ollama + gateway containers |
| Dockerfile | Gateway container build |
| setup.sh | Automated first-time setup |
| update-model.sh | Manual model update |
| .env.example | Configuration template |