968b00890f8b5bec30f08a0f9b617288f27fc841
Every inference request is recorded in a local JSONL ledger with a SHA-256 hash of (id + tokens + duration + cost + shared_secret).

Both sides keep independent copies:
- Gateway (Matt's): writes to ledger.jsonl on every request
- Receiver (Seth's): receives callbacks, saves per-gateway ledger

Endpoints:
- GET /ledger: view transactions + total cost
- GET /reconcile: compare ledger vs stats, verify all hashes
- POST /config: adjust cost params live

ledger_receiver.py runs on Seth's server:
- POST /transaction: receive and verify gateway callbacks
- GET /summary: total cost per gateway
- GET /ledger: all transactions across gateways

If either side resets stats, the other's ledger has the full history. If either side tampers with entries, hash verification catches it.

Tested: request → ledger write → reconcile → hash valid → zero discrepancy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
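The ledger hashing described in this commit message could look roughly like the sketch below. The exact serialization is not stated, so the field order, the "|" separator, the JSON key names (id, tokens, duration, cost, hash), and the helper names entry_hash, verify_entry, and verify_ledger_lines are all assumptions made for illustration, not the gateway's actual code.

```python
import hashlib
import hmac
import json


def entry_hash(entry: dict, shared_secret: str) -> str:
    """SHA-256 over id, tokens, duration, cost and the shared secret.

    Assumption: fields are stringified and joined with "|" in a fixed order.
    """
    payload = "|".join([
        str(entry["id"]),
        str(entry["tokens"]),
        str(entry["duration"]),
        str(entry["cost"]),
        shared_secret,
    ])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_entry(entry: dict, shared_secret: str) -> bool:
    """Recompute the hash and compare it with the stored one in constant time."""
    expected = entry_hash(entry, shared_secret).encode("utf-8")
    stored = str(entry.get("hash", "")).encode("utf-8")
    return hmac.compare_digest(expected, stored)


def verify_ledger_lines(lines, shared_secret: str) -> list:
    """Return the ids of JSONL entries whose stored hash does not match."""
    bad_ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the ledger file
        entry = json.loads(line)
        if not verify_entry(entry, shared_secret):
            bad_ids.append(entry["id"])
    return bad_ids
```

Both sides would need to format numbers identically before hashing (for example, the same float representation), otherwise an untampered entry could fail verification.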
Mortdecai Gateway
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
Quick Start
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
Dashboard: http://localhost:8434/dashboard
What It Does
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, tokens, and energy usage
- Estimates electricity cost (GPU TDP × time × rate)
- Enforces a spending cap
- Provides a dashboard with live stats
Configuration
Edit .env:
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54 # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh
SPENDING_CAP=10.00 # $ spent before the gateway stops accepting requests
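As a rough illustration of how these settings might combine into a cost estimate, here is a minimal sketch. It assumes SYSTEM_OVERHEAD_WATTS is added to GPU_TDP_WATTS before multiplying by time and rate, and that the gateway refuses requests once the running total reaches the cap. The function names are hypothetical. With the defaults above, a 3.42 second request comes out at the same energy and cost as the sample values shown under Response Metadata.

```python
def estimate_energy_wh(duration_seconds: float,
                       gpu_tdp_watts: float = 54.0,
                       system_overhead_watts: float = 30.0) -> float:
    """Energy in watt-hours: (GPU TDP + overhead) x time (assumed formula)."""
    total_watts = gpu_tdp_watts + system_overhead_watts
    return total_watts * duration_seconds / 3600.0


def estimate_cost(energy_wh: float, electricity_rate: float = 0.15) -> float:
    """Cost in dollars: convert Wh to kWh, then multiply by the $/kWh rate."""
    return energy_wh / 1000.0 * electricity_rate


def accepts_requests(total_cost: float, spending_cap: float = 10.00) -> bool:
    """Assumption: the gateway stops accepting once the cap is reached."""
    return total_cost < spending_cap


if __name__ == "__main__":
    energy = estimate_energy_wh(3.42)   # about 0.0798 Wh
    cost = estimate_cost(energy)        # about $0.000012
    print(f"{energy:.4f} Wh, ${cost:.6f}")
    print("accepting:", accepts_requests(0.0342))
```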
Endpoints
| Endpoint | Auth | Description |
|---|---|---|
| GET /health | No | Ollama status + loaded models |
| GET /dashboard | No | Web dashboard with live stats |
| GET /stats | Yes | JSON usage stats |
| POST /api/chat | Yes | Proxied to Ollama |
| POST /api/generate | Yes | Proxied to Ollama |
| * | Yes | Everything else proxied to Ollama |
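A quick way to check that the gateway is up is to call the unauthenticated /health endpoint. The sketch below uses only the Python standard library and assumes the gateway is reachable on localhost:8434, as in Quick Start. The helper names are hypothetical, and the response body is printed as raw text because its exact format is not documented here. Authenticated endpoints are left out because how the API key is sent is not stated in this README.

```python
import urllib.request


def endpoint_url(path: str, host: str = "localhost", port: int = 8434) -> str:
    """Build a gateway URL for the given path (host and port are assumptions)."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_health(timeout: float = 5.0):
    """Call GET /health and return (HTTP status, raw body text)."""
    with urllib.request.urlopen(endpoint_url("/health"), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        status, body = fetch_health()
        print(status, body)
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print("gateway not reachable:", exc)
```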
Response Metadata
Every proxied response includes a _gateway field:
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
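A client can read the _gateway field to keep an eye on spend. The sketch below parses a response body and derives two figures from it: the average power implied by energy_wh and duration_seconds, and the spending cap implied by total_cost + budget_remaining. That budget_remaining equals the cap minus total_cost is an assumption inferred from the sample values, and summarize_gateway is a hypothetical helper name.

```python
import json


def summarize_gateway(response_body: str) -> dict:
    """Extract the _gateway metadata and derive average watts and implied cap."""
    data = json.loads(response_body)
    meta = data.get("_gateway", {})

    summary = dict(meta)  # copy so the parsed response is left untouched

    duration = meta.get("duration_seconds")
    energy = meta.get("energy_wh")
    if duration and energy is not None:
        # Wh -> watt-seconds, divided by seconds gives average watts
        summary["average_watts"] = round(energy * 3600.0 / duration, 1)

    total = meta.get("total_cost")
    remaining = meta.get("budget_remaining")
    if total is not None and remaining is not None:
        # Assumption: budget_remaining = SPENDING_CAP - total_cost
        summary["implied_cap"] = round(total + remaining, 2)

    return summary


if __name__ == "__main__":
    sample = json.dumps({
        "message": {"role": "assistant", "content": "..."},
        "_gateway": {
            "duration_seconds": 3.42,
            "energy_wh": 0.0798,
            "estimated_cost": 0.000012,
            "total_cost": 0.0342,
            "budget_remaining": 9.9658,
        },
    })
    print(summarize_gateway(sample))
```

For the sample above, the derived values line up with the example configuration: 84 W average draw (54 W TDP plus 30 W overhead) and a $10.00 cap.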
AMD ROCm
The Docker Compose file uses the ollama/ollama:rocm image by default, which requires ROCm drivers on the host. For Strix Halo, ensure the BIOS is set to reserved VRAM mode.
NVIDIA
Edit docker-compose.yml: uncomment the deploy section and comment out the devices section.