From f3ea624269559819871a0576056d07d1804db800 Mon Sep 17 00:00:00 2001
From: Seth Freiberg
Date: Fri, 20 Mar 2026 20:00:01 -0400
Subject: [PATCH] Complete README: cost model, dual ledger, all endpoints documented

Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 README.md | 209 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 173 insertions(+), 36 deletions(-)

diff --git a/README.md b/README.md
index e374434..53aac87 100644
--- a/README.md
+++ b/README.md
@@ -1,78 +1,215 @@
 # Mortdecai Gateway
 
-Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
+Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
 
 ## Quick Start
 
 ```bash
 git clone
 cd mortdecai-gateway
-mkdir -p models
-# Copy the GGUF file into models/
-cp /path/to/mortdecai-v4.gguf models/
 chmod +x setup.sh
 ./setup.sh
 ```
 
+The setup script:
+1. Generates an API key
+2. Starts Ollama + gateway in Docker
+3. Downloads the model (~5.3 GB)
+4. Loads it into Ollama
+5. Runs a test inference
+6. Prints connection details
+
 Dashboard: http://localhost:8434/dashboard
 
-## What It Does
+## Architecture
 
 ```
-Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
+Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
 ```
 
-The gateway sits in front of Ollama and:
-- Authenticates requests via API key
-- Tracks inference time, tokens, energy usage
-- Estimates electricity cost (GPU TDP × time × rate)
-- Enforces a spending cap
-- Provides a dashboard with live stats
+The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.
 
-## Configuration
+## Cost Model
 
-Edit `.env`:
+The gateway estimates electricity cost based on **marginal power** — only the extra watts your GPU draws during inference above its idle power.
 
 ```
-API_KEY=mk_your_secret_key
-GPU_TDP_WATTS=54 # Your GPU's TDP
-SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
-ELECTRICITY_RATE=0.15 # $/kWh
-SPENDING_CAP=10.00 # $ before gateway stops accepting
+Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh
+```
+
+### Configuration
+
+All parameters in `.env` or adjustable live via `POST /config`:
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
+| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
+| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
+| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
+| `ELECTRICITY_RATE` | 0.15 | $/kWh |
+| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
+| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
+| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
+| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
+| `PROFIT_MARGIN` | 0.00 | Markup multiplier (0.10 = 10%) |
+
+### Billing Modes
+
+**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.
+
+**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
+
+## Dual Ledger
+
+Every transaction is recorded in a tamper-proof ledger on **both sides** — the gateway operator's machine AND the client's server.
+
+### How it works
+
+```
+1. Client sends inference request to gateway
+2. Gateway processes request via Ollama
+3. Gateway records transaction in local ledger.jsonl
+4. Gateway POSTs transaction to client's callback URL
+5. Client's ledger_receiver.py saves independent copy
+6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
+```
+
+### Tamper protection
+
+| Scenario | Detection |
+|----------|-----------|
+| Gateway resets stats | Client's ledger has full history |
+| Client denies requests happened | Gateway's ledger has full history |
+| Either side edits a transaction | Hash verification fails on `/reconcile` |
+| Shared secret mismatch | All hashes show as invalid |
+
+### Setup
+
+Both sides configure the same `LEDGER_SECRET` in their `.env`:
+
+**Gateway (.env):**
+```
+LEDGER_SECRET=agreed_upon_secret_here
+CALLBACK_URL=http://client_ip:8435/transaction
+```
+
+**Client (ledger_receiver.py):**
+```
+LEDGER_SECRET=agreed_upon_secret_here
+python3 ledger_receiver.py
+```
+
+### Reconciliation
+
+```bash
+# On the gateway — verify all hashes, compare ledger vs stats
+curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
+```
+
+Response:
+```json
+{
+  "ledger_entries": 142,
+  "ledger_total_cost": 0.003421,
+  "stats_total_cost": 0.003421,
+  "discrepancy": 0.0,
+  "hash_verification": {
+    "total": 142,
+    "valid": 142,
+    "invalid": 0
+  },
+  "status": "OK"
+}
 ```
 
 ## Endpoints
 
-| Endpoint | Auth | Description |
-|----------|------|-------------|
-| `GET /health` | No | Ollama status + loaded models |
-| `GET /dashboard` | No | Web dashboard with live stats |
-| `GET /stats` | Yes | JSON usage stats |
-| `POST /api/chat` | Yes | Proxied to Ollama |
-| `POST /api/generate` | Yes | Proxied to Ollama |
-| `*` | Yes | Everything else proxied to Ollama |
+### Public (no auth)
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /health` | Ollama status + loaded models |
+| `GET /dashboard` | Web dashboard with live stats |
+
+### Authenticated
+
+| Endpoint | Description |
+|----------|-------------|
+| `POST /api/chat` | Proxied to Ollama (inference) |
+| `POST /api/generate` | Proxied to Ollama (inference) |
+| `GET /stats` | Full usage stats + cost config |
+| `GET /config` | View cost configuration |
+| `POST /config` | Update cost parameters live |
+| `GET /ledger` | View recent transactions + total cost |
+| `GET /reconcile` | Verify ledger integrity |
+
+### Admin
+
+| Endpoint | Description |
+|----------|-------------|
+| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
+
+## Model Updates
+
+**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
+
+```bash
+curl -X POST http://gateway:8434/admin/update-model \
+  -H "Authorization: Bearer $KEY" \
+  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
+```
+
+**Manual update**: Run the update script:
+```bash
+./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
+```
 
 ## Response Metadata
 
-Every proxied response includes a `_gateway` field:
+Every proxied response includes gateway metadata:
 
 ```json
 {
-  "message": { "role": "assistant", "content": "..." },
+  "message": {"role": "assistant", "content": "..."},
   "_gateway": {
     "duration_seconds": 3.42,
-    "energy_wh": 0.0798,
-    "estimated_cost": 0.000012,
+    "marginal_watts": 59,
+    "energy_wh": 0.0561,
+    "estimated_cost": 0.000008,
     "total_cost": 0.0342,
-    "budget_remaining": 9.9658
+    "budget_remaining": 9.9658,
+    "billing_mode": "marginal"
   }
 }
 ```
 
-## AMD ROCm
+## Dashboard
 
-The Docker compose uses `ollama/ollama:rocm` by default. Requires ROCm drivers on the host. For Strix Halo, ensure BIOS is set to reserved VRAM mode.
+The dashboard shows live:
+- Request count, tokens, inference time
+- Cost progress bar (spent vs cap)
+- Average cost per request, estimated remaining requests
+- Power model breakdown (idle→load for GPU and system)
+- Labor hours and cost
+- GPU utilization, temperature, power draw
 
-## NVIDIA
+Auto-refreshes every 10 seconds.
 
-Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
+## GPU Support
+
+**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
+
+**NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `gateway.py` | Main proxy server |
+| `ledger_receiver.py` | Client-side transaction receiver |
+| `docker-compose.yml` | Ollama + gateway containers |
+| `Dockerfile` | Gateway container build |
+| `setup.sh` | Automated first-time setup |
+| `update-model.sh` | Manual model update |
+| `.env.example` | Configuration template |
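
Reviewer note: the Cost Model formula and the Response Metadata example in this patch can be cross-checked with a short calculation. The sketch below is not taken from `gateway.py`. The function name `marginal_cost` and its parameter names are hypothetical, and it assumes the default wattages and rate from the Configuration table, with energy computed as watts × seconds / 3600 and no labor rate or profit margin applied.

```python
def marginal_cost(duration_seconds,
                  gpu_idle=15.0, gpu_load=54.0,
                  system_idle=45.0, system_load=65.0,
                  rate_per_kwh=0.15):
    """Return (marginal_watts, energy_wh, cost_usd) for one request.

    Hypothetical helper: defaults mirror the README's Configuration table.
    """
    # Extra watts drawn during inference, above the idle baseline.
    marginal_watts = (gpu_load - gpu_idle) + (system_load - system_idle)
    # Watts x seconds gives watt-seconds; divide by 3600 for watt-hours.
    energy_wh = marginal_watts * duration_seconds / 3600.0
    # Convert Wh to kWh, then multiply by the $/kWh rate.
    cost_usd = energy_wh / 1000.0 * rate_per_kwh
    return marginal_watts, energy_wh, cost_usd


if __name__ == "__main__":
    watts, energy_wh, cost = marginal_cost(3.42)
    print(f"{watts:.0f} W, {energy_wh:.4f} Wh, ${cost:.6f}")
```

For the 3.42-second request in the metadata example, this gives 59 W, roughly 0.056 Wh, and roughly $0.000008, which is in line with the `marginal_watts`, `energy_wh`, and `estimated_cost` values the README shows.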