# Mortdecai Gateway

Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh
```

The setup script:
1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details

Dashboard: http://localhost:8434/dashboard

## Architecture

```
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
```

The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.

## Cost Model

The gateway estimates electricity cost based on **marginal power** — only the extra watts your GPU draws during inference above its idle power.

```
Marginal watts = (GPU load - GPU idle) + (System load - System idle)
Marginal cost  = marginal watts / 1000 × hours × $/kWh
```
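
As a worked example of the formula above, the sketch below plugs in the default values listed under Configuration. The function name `marginal_cost` is hypothetical, and the explicit unit conversions (seconds to hours, Wh to kWh) are spelled out for clarity; `gateway.py` may structure the calculation differently.

```python
def marginal_cost(gpu_load_w, gpu_idle_w, sys_load_w, sys_idle_w,
                  seconds, rate_per_kwh):
    """Return (marginal watts, energy in Wh, cost in $) for one request."""
    marginal_w = (gpu_load_w - gpu_idle_w) + (sys_load_w - sys_idle_w)
    energy_wh = marginal_w * seconds / 3600      # W x s -> Wh
    cost = energy_wh / 1000 * rate_per_kwh       # Wh -> kWh -> $
    return marginal_w, energy_wh, cost


# Defaults: GPU 15 W idle / 54 W load, system 45 W idle / 65 W load, $0.15/kWh.
watts, wh, cost = marginal_cost(54, 15, 65, 45, seconds=3.42, rate_per_kwh=0.15)
print(f"{watts} W, {wh:.5f} Wh, ${cost:.7f}")
```

For a 3.42-second request this gives 59 W of marginal draw, about 0.056 Wh, and a cost of roughly $0.000008, in line with the sample shown under Response Metadata.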

### Configuration

All parameters are set in `.env` and can be adjusted live via `POST /config`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | $/hr base rate charged in dedicated mode |
| `SPENDING_CAP` | 10.00 | Spending limit in $; the gateway stops accepting requests once it is reached |
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
| `PROFIT_MARGIN` | 0.00 | Markup as a fraction of cost (0.10 = 10%) |
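
One way to picture how these settings fit together is a small typed config object populated from the environment. This is only a sketch: the `CostConfig` class and its `from_env` helper are hypothetical names, and how `gateway.py` actually reads `.env` is not shown here. The defaults mirror the table above.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class CostConfig:
    # Field names are the lowercase form of the .env parameter names.
    gpu_idle_watts: float = 15
    gpu_load_watts: float = 54
    system_idle_watts: float = 45
    system_inference_watts: float = 65
    electricity_rate: float = 0.15
    billing_mode: str = "marginal"
    base_rate_per_hour: float = 0.0
    spending_cap: float = 10.0
    labor_rate_per_hour: float = 0.0
    profit_margin: float = 0.0

    @classmethod
    def from_env(cls, env=None):
        """Build a config from an environment-style mapping, using defaults for unset keys."""
        env = os.environ if env is None else env
        values = {}
        for f in fields(cls):
            key = f.name.upper()
            if key in env:
                raw = env[key]
                # String-valued settings stay as text; everything else is numeric.
                values[f.name] = raw if isinstance(f.default, str) else float(raw)
        cfg = cls(**values)
        if cfg.billing_mode not in ("marginal", "dedicated"):
            raise ValueError(f"unknown BILLING_MODE: {cfg.billing_mode}")
        return cfg


cfg = CostConfig.from_env({"ELECTRICITY_RATE": "0.22", "BILLING_MODE": "dedicated"})
print(cfg.electricity_rate, cfg.billing_mode, cfg.spending_cap)
```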

### Billing Modes

**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.

**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
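
To get a feel for the difference, the sketch below compares the two modes over a period of uptime using the default wattages. The function `period_costs` is hypothetical, and the dedicated formula is an assumed reading of "full system power during uptime" (idle draw while idle, load draw while busy, plus the base rate); the gateway's exact accounting may differ.

```python
def period_costs(busy_hours, idle_hours, rate=0.15, base_rate=0.0,
                 gpu=(15, 54), system=(45, 65)):
    """Return (marginal_cost, dedicated_cost) in $ for a period of uptime.

    gpu and system are (idle_watts, load_watts) pairs.
    """
    idle_w = gpu[0] + system[0]
    load_w = gpu[1] + system[1]
    # Marginal: only the watts added by inference, and only while busy.
    marginal = (load_w - idle_w) * busy_hours / 1000 * rate
    # Dedicated (assumed interpretation): all power drawn while up, plus base rate.
    energy_kwh = (load_w * busy_hours + idle_w * idle_hours) / 1000
    dedicated = energy_kwh * rate + base_rate * (busy_hours + idle_hours)
    return marginal, dedicated


# A 24-hour day with 2 hours of inference and 22 hours idle.
marginal, dedicated = period_costs(busy_hours=2, idle_hours=22)
print(f"marginal ${marginal:.4f}, dedicated ${dedicated:.4f}")
```

Under these assumptions the day costs about $0.018 in marginal mode and about $0.234 in dedicated mode, before any labor rate or profit margin is applied.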

## Dual Ledger

Every transaction is recorded in a tamper-proof ledger on **both sides** — the gateway operator's machine AND the client's server.

### How it works

```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
```
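
Step 6 can be sketched as follows. The exact serialization is not specified above, so this example assumes the fields are joined with `|` and that cost is formatted to six decimals so both sides produce identical bytes. The function names and the entry keys (`id`, `tokens`, `cost`, `hash`) are hypothetical.

```python
import hashlib
import hmac


def transaction_hash(tx_id, tokens, cost, shared_secret):
    """SHA-256 over id, tokens, cost, and the shared secret (assumed '|' separator)."""
    payload = f"{tx_id}|{tokens}|{cost:.6f}|{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(entry, shared_secret):
    """Recompute the hash for a ledger entry and compare it with the stored one."""
    expected = transaction_hash(entry["id"], entry["tokens"], entry["cost"], shared_secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, entry["hash"])


secret = "agreed_upon_secret_here"
entry = {"id": "tx-0001", "tokens": 128, "cost": 0.000008}
entry["hash"] = transaction_hash(entry["id"], entry["tokens"], entry["cost"], secret)

print(verify(entry, secret))                       # unmodified entry
print(verify({**entry, "cost": 0.000001}, secret)) # edited cost
```

A keyed construction such as `hmac.new(key, message, hashlib.sha256)` is the more conventional way to bind a secret to a message; the plain concatenation here simply follows the description above.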

### Tamper protection

| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile` |
| Shared secret mismatch | All hashes show as invalid |
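
The first two rows rely on comparing the two independent copies. A minimal sketch of such a comparison is shown below, assuming each ledger is a JSONL file with one transaction object per line and a unique `id` field; the helper names and the field name are hypothetical.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def compare_ledgers(gateway_entries, client_entries):
    """Return ids missing on each side and ids whose records differ."""
    gw = {e["id"]: e for e in gateway_entries}
    cl = {e["id"]: e for e in client_entries}
    return {
        "missing_on_gateway": sorted(cl.keys() - gw.keys()),
        "missing_on_client": sorted(gw.keys() - cl.keys()),
        "mismatched": sorted(i for i in gw.keys() & cl.keys() if gw[i] != cl[i]),
    }


gateway_side = [{"id": "a", "cost": 0.01}, {"id": "b", "cost": 0.02}]
client_side = [{"id": "a", "cost": 0.01}, {"id": "b", "cost": 0.05}, {"id": "c", "cost": 0.03}]
print(compare_ledgers(gateway_side, client_side))
```

In this toy data, entry `c` exists only on the client (for example after a gateway reset) and entry `b` has been altered on one side.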

### Setup

Both sides configure the same `LEDGER_SECRET` in their `.env`:

**Gateway (.env):**
```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```

**Client (ledger_receiver.py):**
```bash
export LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
```

### Reconciliation

```bash
# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```

Response:
```json
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
```
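
A monitoring script could turn that response into a pass/fail check. The sketch below works on the already-parsed JSON and uses only the fields shown in the sample; the function name `reconcile_problems` is hypothetical, and it assumes that any `status` other than `"OK"` signals a problem and that every ledger entry is hash-checked.

```python
def reconcile_problems(report, tolerance=1e-9):
    """Return a list of human-readable problems found in a /reconcile report."""
    problems = []
    hashes = report["hash_verification"]
    if report["status"] != "OK":
        problems.append(f"status is {report['status']!r}")
    if abs(report["discrepancy"]) > tolerance:
        problems.append(f"ledger and stats differ by {report['discrepancy']}")
    if hashes["invalid"] > 0:
        problems.append(f"{hashes['invalid']} entries failed hash verification")
    if hashes["total"] != report["ledger_entries"]:
        problems.append("not every ledger entry was hash-checked")
    return problems


sample = {
    "ledger_entries": 142,
    "ledger_total_cost": 0.003421,
    "stats_total_cost": 0.003421,
    "discrepancy": 0.0,
    "hash_verification": {"total": 142, "valid": 142, "invalid": 0},
    "status": "OK",
}
print(reconcile_problems(sample))
```

An empty list means the report is clean; anything else is worth investigating before settling a bill.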

## Endpoints

### Public (no auth)

| Endpoint | Description |
|----------|-------------|
| `GET /health` | Ollama status + loaded models |
| `GET /dashboard` | Web dashboard with live stats |

### Authenticated

| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |
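
As an illustration of calling an authenticated endpoint from Python, the sketch below builds a `POST /api/chat` request with the same `Authorization: Bearer` header used in the curl examples. The helper name `build_chat_request`, the placeholder key, and the model name are assumptions; the body follows Ollama's chat format (`model`, `messages`, `stream`).

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an authenticated chat request for the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8434", "YOUR_API_KEY", "mortdecai-v5", "Hello")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=120), then json.load() the response.
```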

### Admin

| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |

## Model Updates

**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:

```bash
curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
```

**Manual update**: Run the update script:
```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
```

## Response Metadata

Every proxied response includes gateway metadata:

```json
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
```
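
A client can sanity-check this metadata on its own side. The sketch below assumes that `energy_wh` is `marginal_watts × duration_seconds / 3600` and that `budget_remaining` is the spending cap minus `total_cost`; both relationships hold for the sample above but are inferred rather than documented. The function name `check_gateway_metadata` is hypothetical.

```python
def check_gateway_metadata(response, spending_cap=10.0):
    """Cross-check the _gateway block of a proxied response."""
    meta = response["_gateway"]
    expected_wh = meta["marginal_watts"] * meta["duration_seconds"] / 3600
    expected_remaining = spending_cap - meta["total_cost"]
    return {
        # Allow for rounding in the reported values.
        "energy_consistent": abs(expected_wh - meta["energy_wh"]) < 1e-3,
        "budget_consistent": abs(expected_remaining - meta["budget_remaining"]) < 1e-6,
        "budget_exhausted": meta["budget_remaining"] <= 0,
    }


sample = {
    "message": {"role": "assistant", "content": "..."},
    "_gateway": {
        "duration_seconds": 3.42,
        "marginal_watts": 59,
        "energy_wh": 0.0561,
        "estimated_cost": 0.000008,
        "total_cost": 0.0342,
        "budget_remaining": 9.9658,
        "billing_mode": "marginal",
    },
}
print(check_gateway_metadata(sample))
```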

## Dashboard

The dashboard shows the following live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw

Auto-refreshes every 10 seconds.

## GPU Support

**AMD ROCm** (default): The Docker Compose file uses `ollama/ollama:rocm`. Requires ROCm drivers on the host. For Strix Halo, set the BIOS to reserved VRAM mode.

**NVIDIA**: In `docker-compose.yml`, uncomment the `deploy` section and comment out the `devices` section.

## Files

| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |