Seth f3ea624269 Complete README: cost model, dual ledger, all endpoints documented
Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:00:01 -04:00

# Mortdecai Gateway
Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
## Quick Start
```bash
git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh
```
The setup script:
1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details

Dashboard: http://localhost:8434/dashboard
## Architecture
```
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
```
The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.
## Cost Model
The gateway estimates electricity cost from **marginal power**: only the extra watts that the GPU and the rest of the system draw during inference, above their idle power.
```
Marginal cost = ((GPU load - GPU idle) + (System load - System idle)) W × time (h) ÷ 1000 × $/kWh
```
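
As a worked example of the formula, the sketch below plugs in the default wattages and electricity rate from the configuration table. The function name and signature are illustrative only and are not taken from `gateway.py`.

```python
def marginal_cost(duration_seconds: float,
                  gpu_idle_watts: float = 15,
                  gpu_load_watts: float = 54,
                  system_idle_watts: float = 45,
                  system_inference_watts: float = 65,
                  electricity_rate: float = 0.15) -> dict:
    """Estimate the marginal electricity cost of one inference request.

    Hypothetical helper; defaults mirror the README's configuration table.
    """
    # Extra watts above idle: GPU delta plus system delta.
    marginal_watts = ((gpu_load_watts - gpu_idle_watts)
                      + (system_inference_watts - system_idle_watts))
    # Watts x seconds -> watt-hours (3600 seconds per hour).
    energy_wh = marginal_watts * duration_seconds / 3600
    # Watt-hours -> kWh, then price at $/kWh.
    estimated_cost = energy_wh / 1000 * electricity_rate
    return {
        "marginal_watts": marginal_watts,
        "energy_wh": energy_wh,
        "estimated_cost": estimated_cost,
    }


result = marginal_cost(3.42)
print(result)
```

With the defaults, a 3.42 second request draws 59 marginal watts, which works out to roughly 0.056 Wh and about $0.0000084.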
### Configuration
All parameters are set in `.env` and can be adjusted live via `POST /config`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
| `SPENDING_CAP` | 10.00 | Total spend ($) at which the gateway stops accepting requests |
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
| `PROFIT_MARGIN` | 0.00 | Markup as a fraction of cost (0.10 = 10%) |
### Billing Modes
**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.

**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
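
To make the difference concrete, here is a rough comparison of the two modes using the default wattages. It is a sketch under assumptions: the README does not spell out the dedicated formula, so this reads "full system power" as GPU load watts plus system inference watts for the whole uptime, and both function names are hypothetical.

```python
def marginal_cost(inference_hours, gpu_idle=15, gpu_load=54,
                  sys_idle=45, sys_load=65, rate=0.15):
    """Cost of the extra watts drawn while inference is running."""
    extra_watts = (gpu_load - gpu_idle) + (sys_load - sys_idle)
    return extra_watts * inference_hours / 1000 * rate


def dedicated_cost(uptime_hours, gpu_load=54, sys_load=65,
                   rate=0.15, base_rate_per_hour=0.0):
    """Cost of full power for all uptime plus the base hourly rate.

    Assumption: full power means load wattage for the entire uptime.
    """
    full_watts = gpu_load + sys_load
    electricity = full_watts * uptime_hours / 1000 * rate
    return electricity + base_rate_per_hour * uptime_hours


# One hour of actual inference on a machine that stays up for 24 hours.
print(f"marginal:  ${marginal_cost(1):.5f}")
print(f"dedicated: ${dedicated_cost(24):.5f}")
```

Under these assumptions, one hour of inference costs just under one cent in marginal mode, while a full day of dedicated uptime costs about 43 cents before any base rate.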
## Dual Ledger
Every transaction is recorded in a hash-protected ledger on **both sides**: the gateway operator's machine and the client's server. Neither side can alter or drop an entry without the other side's copy exposing it.
### How it works
```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
```
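
Step 6 can be sketched as follows. The README only says the hash covers `id + tokens + cost + shared_secret`; the separator, the fixed six-decimal cost formatting, and the entry field names below are assumptions, so the real `gateway.py` may serialize the fields differently.

```python
import hashlib
import hmac


def transaction_hash(tx_id: str, tokens: int, cost: float, shared_secret: str) -> str:
    """SHA-256 over id, tokens, cost and the shared secret.

    Assumed serialization: fields joined with "|", cost fixed to 6 decimals.
    """
    payload = f"{tx_id}|{tokens}|{cost:.6f}|{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_entry(entry: dict, shared_secret: str) -> bool:
    """Recompute the hash for a ledger entry and compare it to the stored one."""
    expected = transaction_hash(entry["id"], entry["tokens"], entry["cost"], shared_secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, entry["hash"])


secret = "agreed_upon_secret_here"
entry = {"id": "tx-0001", "tokens": 128, "cost": 0.000008}
entry["hash"] = transaction_hash(entry["id"], entry["tokens"], entry["cost"], secret)

print(verify_entry(entry, secret))                     # untouched entry
print(verify_entry({**entry, "cost": 0.0}, secret))    # edited cost
```

Because the secret is part of the hashed payload, a party that edits an entry cannot produce a matching hash without also knowing the secret, and a mismatched secret invalidates every hash.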
### Tamper protection
| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile` |
| Shared secret mismatch | All hashes show as invalid |
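
The first three scenarios in the table amount to comparing the two ledgers entry by entry. A minimal sketch of that cross-check is shown below; the `id` and `hash` field names and the function itself are assumptions for illustration, not the gateway's actual reconciliation code.

```python
def compare_ledgers(gateway_entries, client_entries):
    """Cross-check two ledgers by transaction id and stored hash."""
    gw = {e["id"]: e for e in gateway_entries}
    cl = {e["id"]: e for e in client_entries}
    shared = gw.keys() & cl.keys()
    return {
        # Present on the client only, e.g. after the gateway reset its stats.
        "missing_on_gateway": sorted(cl.keys() - gw.keys()),
        # Present on the gateway only, e.g. the client denies the request.
        "missing_on_client": sorted(gw.keys() - cl.keys()),
        # Present on both sides but the hashes differ: one copy was edited.
        "mismatched": sorted(i for i in shared if gw[i]["hash"] != cl[i]["hash"]),
    }


gateway = [{"id": "tx-1", "hash": "aa"}, {"id": "tx-2", "hash": "bb"}]
client = [{"id": "tx-1", "hash": "aa"}, {"id": "tx-2", "hash": "XX"},
          {"id": "tx-3", "hash": "cc"}]
print(compare_ledgers(gateway, client))
```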
### Setup
Both sides configure the same `LEDGER_SECRET` in their `.env`:
**Gateway (.env):**
```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```
**Client (ledger_receiver.py):**
```
# Same value as on the gateway, set in the client's .env
LEDGER_SECRET=agreed_upon_secret_here
# Then start the receiver
python3 ledger_receiver.py
```
### Reconciliation
```bash
# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```
Response:
```json
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
```
```
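
A response of this shape could be produced roughly as follows. This is a sketch, not the gateway's code: the per-entry `cost` field, the hash check passed in as a callable, and the `"MISMATCH"` status string are assumptions; only `"OK"` appears in the README.

```python
import json


def reconcile(ledger_lines, stats_total_cost, is_valid_hash):
    """Summarize a JSONL ledger in the shape of the /reconcile response.

    ledger_lines: iterable of JSON strings, one transaction per line.
    is_valid_hash: callable taking an entry dict and returning True or False.
    """
    entries = [json.loads(line) for line in ledger_lines if line.strip()]
    ledger_total = round(sum(e["cost"] for e in entries), 6)
    valid = sum(1 for e in entries if is_valid_hash(e))
    invalid = len(entries) - valid
    discrepancy = round(abs(ledger_total - stats_total_cost), 6)
    return {
        "ledger_entries": len(entries),
        "ledger_total_cost": ledger_total,
        "stats_total_cost": stats_total_cost,
        "discrepancy": discrepancy,
        "hash_verification": {"total": len(entries), "valid": valid, "invalid": invalid},
        # "MISMATCH" is a placeholder; the real failure status is not documented.
        "status": "OK" if invalid == 0 and discrepancy == 0 else "MISMATCH",
    }


lines = [
    '{"id": "tx-1", "cost": 0.000008, "hash": "ok"}',
    '{"id": "tx-2", "cost": 0.000012, "hash": "ok"}',
]
# The lambda stands in for a real hash check against the shared secret.
print(json.dumps(reconcile(lines, 0.00002, lambda e: e["hash"] == "ok"), indent=2))
```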
## Endpoints
### Public (no auth)
| Endpoint | Description |
|----------|-------------|
| `GET /health` | Ollama status + loaded models |
| `GET /dashboard` | Web dashboard with live stats |
### Authenticated
| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |
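
For callers that prefer Python to `curl`, an authenticated chat request might be built like this. The body follows Ollama's `/api/chat` format and the Bearer header matches the `curl` examples in this README; the helper name is hypothetical, and the model name is the example name used in the Model Updates section.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an authenticated POST /api/chat request for the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8434", "YOUR_API_KEY", "mortdecai-v5", "Hello")
print(req.get_method(), req.full_url)

# Sending it requires a running gateway:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     reply = json.loads(resp.read())
```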
### Admin
| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
## Model Updates
**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
```bash
curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
```
**Manual update**: Run the update script:
```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
```
## Response Metadata
Every proxied response includes gateway metadata:
```json
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
```
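
A client can use this metadata to track its budget without calling `/stats`. The sketch below reads the `_gateway` block from a response like the one above; the helper name and the derived fields are illustrative, and the estimate of remaining requests simply assumes every future request costs the same as this one.

```python
import json


def budget_summary(response_text: str) -> dict:
    """Derive simple budget figures from a response's _gateway metadata."""
    meta = json.loads(response_text)["_gateway"]
    spent = meta["total_cost"]
    remaining = meta["budget_remaining"]
    per_request = meta["estimated_cost"]
    return {
        # Spent plus remaining recovers the configured spending cap.
        "spending_cap": round(spent + remaining, 4),
        "fraction_spent": spent / (spent + remaining),
        # Rough estimate only: assumes all future requests cost the same.
        "requests_left_at_this_cost": round(remaining / per_request) if per_request > 0 else None,
    }


sample = """{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}"""
print(budget_summary(sample))
```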
## Dashboard
The dashboard shows live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw

The dashboard auto-refreshes every 10 seconds.
## GPU Support
**AMD ROCm** (default): `docker-compose.yml` uses the `ollama/ollama:rocm` image. Requires ROCm drivers on the host. For Strix Halo, set the BIOS to reserved VRAM mode.

**NVIDIA**: Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
## Files
| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |