From f3ea624269559819871a0576056d07d1804db800 Mon Sep 17 00:00:00 2001
From: Seth Freiberg <seth@sethfreiberg.com>
Date: Fri, 20 Mar 2026 20:00:01 -0400
Subject: [PATCH] Complete README: cost model, dual ledger, all endpoints
 documented

Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 README.md | 209 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 173 insertions(+), 36 deletions(-)
diff --git a/README.md b/README.md
index e374434..53aac87 100644
--- a/README.md
+++ b/README.md
@@ -1,78 +1,215 @@
 # Mortdecai Gateway
 
-Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
+Authenticated Ollama proxy with power metering and tamper-evident billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
 
 ## Quick Start
 
 ```bash
 git clone <repo-url>
 cd mortdecai-gateway
-mkdir -p models
-# Copy the GGUF file into models/
-cp /path/to/mortdecai-v4.gguf models/
 chmod +x setup.sh
 ./setup.sh
 ```
 
+The setup script:
+1. Generates an API key
+2. Starts Ollama + gateway in Docker
+3. Downloads the model (~5.3 GB)
+4. Loads it into Ollama
+5. Runs a test inference
+6. Prints connection details
+
 Dashboard: http://localhost:8434/dashboard
 
-## What It Does
+## Architecture
 
 ```
-Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
+Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
 ```
 
-The gateway sits in front of Ollama and:
-- Authenticates requests via API key
-- Tracks inference time, tokens, energy usage
-- Estimates electricity cost (GPU TDP × time × rate)
-- Enforces a spending cap
-- Provides a dashboard with live stats
+The gateway listens on the only exposed port (8434). It proxies authenticated requests to Ollama and records every transaction in a tamper-evident ledger.
 
-## Configuration
+## Cost Model
 
-Edit `.env`:
+The gateway estimates electricity cost based on **marginal power**: only the extra watts your GPU and the rest of the system draw during inference above their idle power.
 
 ```
-API_KEY=mk_your_secret_key
-GPU_TDP_WATTS=54          # Your GPU's TDP
-SYSTEM_OVERHEAD_WATTS=30  # CPU/RAM draw during inference
-ELECTRICITY_RATE=0.15     # $/kWh
-SPENDING_CAP=10.00        # $ before gateway stops accepting
+Marginal cost = ((GPU load - GPU idle) + (System inference - System idle)) ÷ 1000 × hours × $/kWh
+```
+
+### Configuration
+
+All parameters are set in `.env` and can be adjusted live via `POST /config`:
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
+| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
+| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
+| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
+| `ELECTRICITY_RATE` | 0.15 | $/kWh |
+| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
+| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
+| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
+| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
+| `PROFIT_MARGIN` | 0.00 | Markup as a fraction of cost (0.10 = 10%) |
+
+### Billing Modes
+
+**Marginal** (default): Charges only for the power drawn above idle. If the machine is on anyway (gaming, general use), only the energy that inference adds is billed.
+
+**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
+
+## Dual Ledger
+
+Every transaction is recorded in a tamper-evident ledger on **both sides**: the gateway operator's machine and the client's server.
+
+### How it works
+
+```
+1. Client sends inference request to gateway
+2. Gateway processes request via Ollama
+3. Gateway records transaction in local ledger.jsonl
+4. Gateway POSTs transaction to client's callback URL
+5. Client's ledger_receiver.py saves independent copy
+6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
+```
+
+### Tamper protection
+
+| Scenario | Detection |
+|----------|-----------|
+| Gateway resets stats | Client's ledger has full history |
+| Client denies requests happened | Gateway's ledger has full history |
+| Either side edits a transaction | Hash verification fails on `/reconcile` |
+| Shared secret mismatch | All hashes show as invalid |
+
+### Setup
+
+Both sides configure the same `LEDGER_SECRET` in their `.env`:
+
+**Gateway (.env):**
+```
+LEDGER_SECRET=agreed_upon_secret_here
+CALLBACK_URL=http://client_ip:8435/transaction
+```
+
+**Client (ledger_receiver.py):**
+```
+export LEDGER_SECRET=agreed_upon_secret_here
+python3 ledger_receiver.py
+```
+
+### Reconciliation
+
+```bash
+# On the gateway — verify all hashes, compare ledger vs stats
+curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
+```
+
+Response:
+```json
+{
+  "ledger_entries": 142,
+  "ledger_total_cost": 0.003421,
+  "stats_total_cost": 0.003421,
+  "discrepancy": 0.0,
+  "hash_verification": {
+    "total": 142,
+    "valid": 142,
+    "invalid": 0
+  },
+  "status": "OK"
+}
 ```
 
 ## Endpoints
 
-| Endpoint | Auth | Description |
-|----------|------|-------------|
-| `GET /health` | No | Ollama status + loaded models |
-| `GET /dashboard` | No | Web dashboard with live stats |
-| `GET /stats` | Yes | JSON usage stats |
-| `POST /api/chat` | Yes | Proxied to Ollama |
-| `POST /api/generate` | Yes | Proxied to Ollama |
-| `*` | Yes | Everything else proxied to Ollama |
+### Public (no auth)
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /health` | Ollama status + loaded models |
+| `GET /dashboard` | Web dashboard with live stats |
+
+### Authenticated
+
+| Endpoint | Description |
+|----------|-------------|
+| `POST /api/chat` | Proxied to Ollama (inference) |
+| `POST /api/generate` | Proxied to Ollama (inference) |
+| `GET /stats` | Full usage stats + cost config |
+| `GET /config` | View cost configuration |
+| `POST /config` | Update cost parameters live |
+| `GET /ledger` | View recent transactions + total cost |
+| `GET /reconcile` | Verify ledger integrity |
+
+### Admin
+
+| Endpoint | Description |
+|----------|-------------|
+| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
+
+## Model Updates
+
+**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
+
+```bash
+curl -X POST http://gateway:8434/admin/update-model \
+  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
+  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
+```
+
+**Manual update**: Run the update script:
+```bash
+./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
+```
 
 ## Response Metadata
 
-Every proxied response includes a `_gateway` field:
+Every proxied response includes gateway metadata in a `_gateway` field:
 
 ```json
 {
-  "message": { "role": "assistant", "content": "..." },
+  "message": {"role": "assistant", "content": "..."},
   "_gateway": {
     "duration_seconds": 3.42,
-    "energy_wh": 0.0798,
-    "estimated_cost": 0.000012,
+    "marginal_watts": 59,
+    "energy_wh": 0.0561,
+    "estimated_cost": 0.000008,
     "total_cost": 0.0342,
-    "budget_remaining": 9.9658
+    "budget_remaining": 9.9658,
+    "billing_mode": "marginal"
   }
 }
 ```
 
-## AMD ROCm
+## Dashboard
 
-The Docker compose uses `ollama/ollama:rocm` by default. Requires ROCm drivers on the host. For Strix Halo, ensure BIOS is set to reserved VRAM mode.
+The dashboard shows live:
+- Request count, tokens, inference time
+- Cost progress bar (spent vs cap)
+- Average cost per request, estimated remaining requests
+- Power model breakdown (idle→load for GPU and system)
+- Labor hours and cost
+- GPU utilization, temperature, power draw
 
-## NVIDIA
+Auto-refreshes every 10 seconds.
 
-Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
+## GPU Support
+
+**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
+
+**NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `gateway.py` | Main proxy server |
+| `ledger_receiver.py` | Client-side transaction receiver |
+| `docker-compose.yml` | Ollama + gateway containers |
+| `Dockerfile` | Gateway container build |
+| `setup.sh` | Automated first-time setup |
+| `update-model.sh` | Manual model update |
+| `.env.example` | Configuration template |