Complete README: cost model, dual ledger, all endpoints documented

Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:00:01 -04:00
parent 968b00890f
commit f3ea624269
+172 -35
@@ -1,78 +1,215 @@
# Mortdecai Gateway
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
## Quick Start
```bash
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```
The setup script:
1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details
Dashboard: http://localhost:8434/dashboard
## What It Does
## Architecture
```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
```
The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, tokens, energy usage
- Estimates electricity cost (GPU TDP × time × rate)
- Enforces a spending cap
- Provides a dashboard with live stats
The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.
## Configuration
## Cost Model
Edit `.env`:
The gateway estimates electricity cost based on **marginal power**: only the extra watts that the GPU and the rest of the system draw during inference above their idle power.
```
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54 # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh
SPENDING_CAP=10.00 # $ before gateway stops accepting
Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh
```
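As a worked example of the formula above, the following Python sketch plugs in the defaults documented under Configuration. The function and argument names are illustrative and not taken from `gateway.py`; the real implementation may round or aggregate differently.

```python
def marginal_cost(duration_s, gpu_idle_w=15.0, gpu_load_w=54.0,
                  system_idle_w=45.0, system_load_w=65.0, rate_per_kwh=0.15):
    """Return (marginal_watts, energy_wh, cost) for one inference request."""
    # Extra watts drawn above idle, summed over GPU and system.
    marginal_watts = (gpu_load_w - gpu_idle_w) + (system_load_w - system_idle_w)
    # Watts x seconds -> watt-hours.
    energy_wh = marginal_watts * duration_s / 3600.0
    # Watt-hours -> kWh, then multiply by the $/kWh rate.
    cost = energy_wh / 1000.0 * rate_per_kwh
    return marginal_watts, energy_wh, cost


watts, wh, cost = marginal_cost(3.42)
print(f"{watts:.0f} W marginal, {wh:.5f} Wh, ${cost:.7f}")
```

With the defaults, a 3.42 second request works out to 59 W of marginal draw, roughly 0.056 Wh, and roughly $0.0000084, which is consistent with the figures in the Response Metadata example.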
### Configuration
All parameters in `.env` or adjustable live via `POST /config`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
| `PROFIT_MARGIN` | 0.00 | Markup as a fraction of cost (0.10 = 10%) |
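The request body for `POST /config` is not specified here. The sketch below assumes a JSON object keyed by the parameter names in the table, and it only builds the request rather than sending it. `build_config_update` is a hypothetical helper, not part of the gateway.

```python
import json
import urllib.request


def build_config_update(base_url, api_key, changes):
    """Build (but do not send) a POST /config request.

    Assumption: the gateway accepts a JSON object keyed by parameter name.
    """
    body = json.dumps(changes).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_config_update(
    "http://localhost:8434",
    "mk_your_secret_key",
    {"ELECTRICITY_RATE": 0.18, "SPENDING_CAP": 25.0},
)
print(req.get_method(), req.full_url)
```

Once the body format is confirmed against `gateway.py`, the request can be sent with `urllib.request.urlopen(req)`.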
### Billing Modes
**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.
**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
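The difference between the two modes can be illustrated with a rough sketch. The function names are hypothetical, and the dedicated calculation assumes an average full-system wattage billed over every uptime hour; how `gateway.py` actually measures "full system power" is not specified here.

```python
def marginal_charge(inference_hours, marginal_watts, rate_per_kwh):
    """Marginal mode: bill only the extra watts, only while inferring."""
    return marginal_watts * inference_hours / 1000.0 * rate_per_kwh


def dedicated_charge(uptime_hours, avg_system_watts, rate_per_kwh,
                     base_rate_per_hour=0.0):
    """Dedicated mode (assumed form): full power for all uptime plus a base rate."""
    energy_cost = avg_system_watts * uptime_hours / 1000.0 * rate_per_kwh
    return energy_cost + base_rate_per_hour * uptime_hours


# One hour of inference within a 24 hour day, using the documented defaults:
# 59 W marginal draw, and an assumed 60 W average draw (15 W GPU + 45 W system idle).
marginal = marginal_charge(1.0, 59.0, 0.15)
dedicated = dedicated_charge(24.0, 60.0, 0.15, base_rate_per_hour=0.0)
print(f"marginal: ${marginal:.5f}  dedicated: ${dedicated:.3f}")
```

Under these assumptions the dedicated charge is dominated by idle hours, which is why it only makes sense for a machine kept on specifically for inference.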
## Dual Ledger
Every transaction is recorded in a tamper-proof ledger on **both sides** — the gateway operator's machine AND the client's server.
### How it works
```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
```
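Step 6 can be sketched as follows. The exact serialization is an assumption: this version concatenates the fields as plain strings with the cost formatted to six decimals, and the entry field names (`id`, `tokens`, `cost`, `hash`) are hypothetical. Both sides would need to use whatever format `gateway.py` really uses for the hashes to match.

```python
import hashlib
import hmac


def transaction_hash(tx_id, tokens, cost, shared_secret):
    """SHA-256 over id + tokens + cost + shared_secret.

    Assumption: fields are joined as plain strings, cost with 6 decimals.
    """
    payload = f"{tx_id}{tokens}{cost:.6f}{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(entry, shared_secret):
    """Recompute the hash for a ledger entry and compare in constant time."""
    expected = transaction_hash(
        entry["id"], entry["tokens"], entry["cost"], shared_secret
    )
    return hmac.compare_digest(expected, entry["hash"])


secret = "agreed_upon_secret_here"
entry = {"id": "tx-0001", "tokens": 128, "cost": 0.000008}
entry["hash"] = transaction_hash(entry["id"], entry["tokens"], entry["cost"], secret)

print(verify(entry, secret))                    # unmodified entry
print(verify(dict(entry, cost=0.5), secret))    # edited cost, stale hash
```

An entry whose cost is edited without recomputing the hash no longer verifies, and neither does any entry checked with a different secret.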
### Tamper protection
| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile` |
| Shared secret mismatch | All hashes show as invalid |
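Because both parties hold the shared secret, whoever edits an entry could also recompute its hash, so the first two rows depend on comparing the two independent copies. A minimal sketch of such a comparison is shown below; it assumes each ledger is JSONL with a unique `id` field per entry, which is not confirmed by this README.

```python
import json


def load_ledger(lines):
    """Parse JSONL lines into {id: entry}, skipping blank lines."""
    entries = (json.loads(line) for line in lines if line.strip())
    return {entry["id"]: entry for entry in entries}


def compare_ledgers(gateway_lines, client_lines):
    """Report entries that are missing on one side or differ between sides."""
    gw = load_ledger(gateway_lines)
    cl = load_ledger(client_lines)
    shared = gw.keys() & cl.keys()
    return {
        "missing_on_client": sorted(gw.keys() - cl.keys()),
        "missing_on_gateway": sorted(cl.keys() - gw.keys()),
        "mismatched": sorted(i for i in shared if gw[i] != cl[i]),
    }


gateway_copy = [
    '{"id": "tx-1", "tokens": 100, "cost": 0.000008}',
    '{"id": "tx-2", "tokens": 250, "cost": 0.000020}',
]
client_copy = [
    '{"id": "tx-1", "tokens": 100, "cost": 0.000008}',
    '{"id": "tx-2", "tokens": 250, "cost": 0.000015}',
    '{"id": "tx-3", "tokens": 50, "cost": 0.000004}',
]
result = compare_ledgers(gateway_copy, client_copy)
print(result)
```

In practice the two line lists would come from reading each side's ledger file, for example `open("ledger.jsonl").read().splitlines()`.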
### Setup
Both sides configure the same `LEDGER_SECRET` in their `.env`:
**Gateway (.env):**
```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```
**Client (ledger_receiver.py):**
```
export LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
```
### Reconciliation
```bash
# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```
Response:
```json
{
"ledger_entries": 142,
"ledger_total_cost": 0.003421,
"stats_total_cost": 0.003421,
"discrepancy": 0.0,
"hash_verification": {
"total": 142,
"valid": 142,
"invalid": 0
},
"status": "OK"
}
```
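A client script might check the reconcile response along these lines. This is a sketch based only on the fields shown in the example response; `reconcile_ok` and its tolerance are hypothetical, and the gateway may return other `status` values that are not documented here.

```python
def reconcile_ok(report, tolerance=1e-9):
    """Return True when the reconcile report shows no discrepancies."""
    hashes = report["hash_verification"]
    cost_gap = abs(report["ledger_total_cost"] - report["stats_total_cost"])
    return (
        report["status"] == "OK"
        and hashes["invalid"] == 0
        and hashes["valid"] == hashes["total"]
        and cost_gap <= tolerance
    )


report = {
    "ledger_entries": 142,
    "ledger_total_cost": 0.003421,
    "stats_total_cost": 0.003421,
    "discrepancy": 0.0,
    "hash_verification": {"total": 142, "valid": 142, "invalid": 0},
    "status": "OK",
}
print(reconcile_ok(report))
```

The `report` dictionary would normally come from parsing the JSON body returned by `GET /reconcile`.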
## Endpoints
| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models |
| `GET /dashboard` | No | Web dashboard with live stats |
| `GET /stats` | Yes | JSON usage stats |
| `POST /api/chat` | Yes | Proxied to Ollama |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | Everything else proxied to Ollama |
### Public (no auth)
| Endpoint | Description |
|----------|-------------|
| `GET /health` | Ollama status + loaded models |
| `GET /dashboard` | Web dashboard with live stats |
### Authenticated
| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |
### Admin
| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
## Model Updates
**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
```bash
curl -X POST http://gateway:8434/admin/update-model \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
```
**Manual update**: Run the update script:
```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
```
## Response Metadata
Every proxied response includes a `_gateway` field:
Every proxied response includes gateway metadata:
```json
{
"message": {"role": "assistant", "content": "..."},
"_gateway": {
"duration_seconds": 3.42,
"energy_wh": 0.0798,
"estimated_cost": 0.000012,
"marginal_watts": 59,
"energy_wh": 0.0561,
"estimated_cost": 0.000008,
"total_cost": 0.0342,
"budget_remaining": 9.9658
"budget_remaining": 9.9658,
"billing_mode": "marginal"
}
}
```
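The metadata fields are related to each other, so a client can sanity-check them. The sketch below assumes the default `ELECTRICITY_RATE` and `SPENDING_CAP`, and uses loose tolerances because the reported values are rounded; `check_gateway_metadata` is a hypothetical helper, not part of the gateway.

```python
def check_gateway_metadata(meta, rate_per_kwh=0.15, spending_cap=10.00):
    """Cross-check the values in a response's `_gateway` field."""
    # Energy should be marginal watts x duration, converted to watt-hours.
    expected_wh = meta["marginal_watts"] * meta["duration_seconds"] / 3600.0
    # Cost should be that energy in kWh times the electricity rate.
    expected_cost = expected_wh / 1000.0 * rate_per_kwh
    # Spent plus remaining should add up to the spending cap.
    budget_total = meta["total_cost"] + meta["budget_remaining"]
    return {
        "energy_ok": abs(expected_wh - meta["energy_wh"]) < 1e-4,
        "cost_ok": abs(expected_cost - meta["estimated_cost"]) < 1e-6,
        "budget_ok": abs(budget_total - spending_cap) < 1e-6,
    }


meta = {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal",
}
checks = check_gateway_metadata(meta)
print(checks)
```

These checks apply to `marginal` billing mode only; dedicated mode presumably includes uptime and base-rate charges that a single response does not expose.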
## AMD ROCm
## Dashboard
The Docker compose uses `ollama/ollama:rocm` by default. Requires ROCm drivers on the host. For Strix Halo, ensure BIOS is set to reserved VRAM mode.
The dashboard shows live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw
## NVIDIA
Auto-refreshes every 10 seconds.
Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
## GPU Support
**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
**NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section.
## Files
| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |