# Mortdecai Gateway Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline. ## Quick Start ```bash git clone cd mortdecai-gateway chmod +x setup.sh ./setup.sh ``` The setup script: 1. Generates an API key 2. Starts Ollama + gateway in Docker 3. Downloads the model (~5.3 GB) 4. Loads it into Ollama 5. Runs a test inference 6. Prints connection details Dashboard: http://localhost:8434/dashboard ## Architecture ``` Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU ``` The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger. ## Cost Model The gateway estimates electricity cost based on **marginal power** — only the extra watts your GPU draws during inference above its idle power. ``` Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh ``` ### Configuration All parameters in `.env` or adjustable live via `POST /config`: | Parameter | Default | Description | |-----------|---------|-------------| | `GPU_IDLE_WATTS` | 15 | GPU power at idle | | `GPU_LOAD_WATTS` | 54 | GPU power during inference | | `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) | | `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference | | `ELECTRICITY_RATE` | 0.15 | $/kWh | | `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) | | `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode | | `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests | | `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) | | `PROFIT_MARGIN` | 0.00 | Markup multiplier (0.10 = 10%) | ### Billing Modes **Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds. **Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference. ## Dual Ledger Every transaction is recorded in a tamper-proof ledger on **both sides** — the gateway operator's machine AND the client's server. ### How it works ``` 1. Client sends inference request to gateway 2. Gateway processes request via Ollama 3. Gateway records transaction in local ledger.jsonl 4. Gateway POSTs transaction to client's callback URL 5. Client's ledger_receiver.py saves independent copy 6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret) ``` ### Tamper protection | Scenario | Detection | |----------|-----------| | Gateway resets stats | Client's ledger has full history | | Client denies requests happened | Gateway's ledger has full history | | Either side edits a transaction | Hash verification fails on `/reconcile` | | Shared secret mismatch | All hashes show as invalid | ### Setup Both sides configure the same `LEDGER_SECRET` in their `.env`: **Gateway (.env):** ``` LEDGER_SECRET=agreed_upon_secret_here CALLBACK_URL=http://client_ip:8435/transaction ``` **Client (ledger_receiver.py):** ``` LEDGER_SECRET=agreed_upon_secret_here python3 ledger_receiver.py ``` ### Reconciliation ```bash # On the gateway — verify all hashes, compare ledger vs stats curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY" ``` Response: ```json { "ledger_entries": 142, "ledger_total_cost": 0.003421, "stats_total_cost": 0.003421, "discrepancy": 0.0, "hash_verification": { "total": 142, "valid": 142, "invalid": 0 }, "status": "OK" } ``` ## Endpoints ### Public (no auth) | Endpoint | Description | |----------|-------------| | `GET /health` | Ollama status + loaded models | | `GET /dashboard` | Web dashboard with live stats | ### Authenticated | Endpoint | Description | |----------|-------------| | `POST /api/chat` | Proxied to Ollama (inference) | | `POST /api/generate` | Proxied to Ollama (inference) | | `GET /stats` | Full usage stats + cost config | | `GET /config` | View cost configuration | | `POST /config` | Update cost parameters live | | `GET /ledger` | View recent transactions + total cost | | `GET /reconcile` | Verify ledger integrity | ### Admin | Endpoint | Description | |----------|-------------| | `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) | ## Model Updates **Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions: ```bash curl -X POST http://gateway:8434/admin/update-model \ -H "Authorization: Bearer $KEY" \ -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf", "name": "mortdecai:0.5.0"}' ``` **Manual update**: Run the update script: ```bash ./update-model.sh https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf mortdecai:0.5.0 ``` ## Response Metadata Every proxied response includes gateway metadata: ```json { "message": {"role": "assistant", "content": "..."}, "_gateway": { "duration_seconds": 3.42, "marginal_watts": 59, "energy_wh": 0.0561, "estimated_cost": 0.000008, "total_cost": 0.0342, "budget_remaining": 9.9658, "billing_mode": "marginal" } } ``` ## Dashboard The dashboard shows live: - Request count, tokens, inference time - Cost progress bar (spent vs cap) - Average cost per request, estimated remaining requests - Power model breakdown (idle→load for GPU and system) - Labor hours and cost - GPU utilization, temperature, power draw Auto-refreshes every 10 seconds. ## GPU Support **AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode. **NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section. ## Files | File | Purpose | |------|---------| | `gateway.py` | Main proxy server | | `ledger_receiver.py` | Client-side transaction receiver | | `docker-compose.yml` | Ollama + gateway containers | | `Dockerfile` | Gateway container build | | `setup.sh` | Automated first-time setup | | `update-model.sh` | Manual model update | | `.env.example` | Configuration template |