Complete README: cost model, dual ledger, all endpoints documented

Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Mortdecai Gateway

Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```

The setup script:
1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details

Dashboard: http://localhost:8434/dashboard

## Architecture

```
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
```

The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.

## Cost Model

The gateway estimates electricity cost based on **marginal power**: only the extra watts the GPU and the rest of the system draw during inference above their idle power. Power values are in watts, so the resulting watt-hours are divided by 1000 to get kWh before the rate is applied.

```
Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh
```
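
As a worked illustration of the formula above, here is a minimal sketch in Python. The function name `marginal_cost` is hypothetical, and its defaults are simply the documented `.env` defaults (`GPU_IDLE_WATTS`, `GPU_LOAD_WATTS`, `SYSTEM_IDLE_WATTS`, `SYSTEM_INFERENCE_WATTS`, `ELECTRICITY_RATE`); the gateway's actual implementation may round or structure things differently.

```python
# Hypothetical helper: an illustration of the marginal cost formula,
# not the gateway's actual code.
def marginal_cost(duration_seconds, gpu_idle_w=15, gpu_load_w=54,
                  system_idle_w=45, system_inference_w=65, rate_per_kwh=0.15):
    """Return (marginal_watts, energy_wh, cost) for one inference request."""
    # Extra draw above idle, summed over the GPU and the rest of the system.
    marginal_watts = (gpu_load_w - gpu_idle_w) + (system_inference_w - system_idle_w)
    # Watts x hours gives watt-hours.
    energy_wh = marginal_watts * duration_seconds / 3600
    # Convert Wh to kWh, then apply the $/kWh rate.
    cost = energy_wh / 1000 * rate_per_kwh
    return marginal_watts, energy_wh, cost


watts, energy_wh, cost = marginal_cost(3.42)
print(f"{watts} W marginal, {energy_wh:.5f} Wh, ${cost:.7f}")
```

With the default parameters, a 3.42 second request works out to 59 marginal watts, roughly 0.056 Wh, and roughly $0.000008.
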

### Configuration

All parameters can be set in `.env` or adjusted live via `POST /config`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
| `PROFIT_MARGIN` | 0.00 | Markup as a fraction of cost (0.10 = 10%) |

### Billing Modes

**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.

**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use this mode if the machine is kept on specifically for inference.

## Dual Ledger

Every transaction is recorded in a tamper-proof ledger on **both sides**: the gateway operator's machine and the client's server.

### How it works

```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
```
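
Step 6 can be sketched as follows. This is only an illustration: the README says the hash covers id, tokens, cost, and the shared secret, but it does not say how those fields are serialized, so the `|` separator, the six-decimal cost format, and the field names `id`, `tokens`, `cost`, and `hash` are all assumptions.

```python
import hashlib
import hmac


def transaction_hash(tx_id, tokens, cost, shared_secret):
    """SHA-256 over id + tokens + cost + shared_secret (assumed serialization)."""
    # Assumption: join fields with "|" and fix cost to 6 decimals so that both
    # sides hash byte-identical input. The gateway's real format may differ.
    payload = f"{tx_id}|{tokens}|{cost:.6f}|{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(entry, shared_secret):
    """Recompute the hash for a ledger entry and compare it to the stored one."""
    expected = transaction_hash(entry["id"], entry["tokens"], entry["cost"], shared_secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, entry["hash"])


entry = {"id": "tx-0001", "tokens": 128, "cost": 0.000008}
entry["hash"] = transaction_hash(entry["id"], entry["tokens"], entry["cost"], "agreed_upon_secret_here")
print(verify(entry, "agreed_upon_secret_here"))
```

Changing any hashed field, or verifying with a different secret, makes `verify` return `False`.
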

### Tamper protection

| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile` |
| Shared secret mismatch | All hashes show as invalid |

### Setup

Both sides configure the same `LEDGER_SECRET` in their `.env`:

**Gateway (.env):**
```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```

**Client (ledger_receiver.py):**
```
LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
```
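
To make the callback side concrete, the sketch below shows one way a receiver for `POST /transaction` on port 8435 could append each transaction to a JSONL file. It is not the shipped `ledger_receiver.py`: the file name `client_ledger.jsonl`, the function and class names, and the 204 response are assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

LEDGER_PATH = "client_ledger.jsonl"  # hypothetical file name


def record_transaction(raw_body, ledger_path=LEDGER_PATH):
    """Parse one JSON transaction and append it to the client's JSONL ledger."""
    transaction = json.loads(raw_body)
    with open(ledger_path, "a", encoding="utf-8") as ledger:
        ledger.write(json.dumps(transaction, sort_keys=True) + "\n")
    return transaction


class TransactionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the callback path from CALLBACK_URL is accepted.
        if self.path != "/transaction":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_transaction(self.rfile.read(length))
        except ValueError:  # malformed JSON or undecodable bytes
            self.send_error(400, "invalid JSON")
            return
        self.send_response(204)
        self.end_headers()


def serve(port=8435):
    """Block forever, accepting transaction callbacks from the gateway."""
    HTTPServer(("0.0.0.0", port), TransactionHandler).serve_forever()
```

Calling `serve()` starts the listener. A real receiver would presumably also check each entry's hash against `LEDGER_SECRET` before storing it.
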

### Reconciliation

```bash
# On the gateway: verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```

Response:
```json
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
```
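
The kind of check behind a report like this can be sketched as below. This is a hedged approximation rather than the gateway's `/reconcile` code: the entry field names, the hash serialization, the `tolerance` parameter, and the `"MISMATCH"` status value are assumptions; only the report keys are taken from the sample response.

```python
import hashlib
import json


def entry_hash(entry, secret):
    # Assumed serialization of (id + tokens + cost + shared_secret).
    payload = f"{entry['id']}|{entry['tokens']}|{entry['cost']:.6f}|{secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(ledger_lines, stats_total_cost, secret, tolerance=1e-9):
    """Verify every ledger hash and compare the ledger total with the stats total."""
    entries = [json.loads(line) for line in ledger_lines if line.strip()]
    valid = sum(1 for e in entries if e.get("hash") == entry_hash(e, secret))
    invalid = len(entries) - valid
    ledger_total = sum(e["cost"] for e in entries)
    discrepancy = round(abs(ledger_total - stats_total_cost), 6)
    ok = invalid == 0 and discrepancy <= tolerance
    return {
        "ledger_entries": len(entries),
        "ledger_total_cost": round(ledger_total, 6),
        "stats_total_cost": stats_total_cost,
        "discrepancy": discrepancy,
        "hash_verification": {"total": len(entries), "valid": valid, "invalid": invalid},
        "status": "OK" if ok else "MISMATCH",  # "MISMATCH" is a placeholder value
    }
```

Running the same function over the client's copy of the ledger with the same secret gives an independent report to compare against the gateway's.
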

## Endpoints

### Public (no auth)

| Endpoint | Description |
|----------|-------------|
| `GET /health` | Ollama status + loaded models |
| `GET /dashboard` | Web dashboard with live stats |

### Authenticated

| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |

### Admin

| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |

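
For callers who prefer Python to `curl`, an authenticated inference call could look roughly like this. The `Authorization: Bearer` header matches the curl examples in this README, and the request body follows Ollama's `/api/chat` format. The function names are hypothetical, and the model name is an assumption that depends on what was loaded into Ollama.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build an authenticated POST /api/chat request for the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(base_url, api_key, model, prompt, timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `chat("http://localhost:8434", api_key, "mortdecai-v4", "Hello")` would return the parsed response, assuming a model is registered under that name.
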
## Model Updates

**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can then push new model versions:

```bash
curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'
```

**Manual update**: Run the update script:
```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5
```

## Response Metadata

Every proxied response includes gateway metadata in a `_gateway` field:

```json
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
```
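
A client can use this metadata to keep an eye on the budget. The sketch below is one possible approach, not part of the gateway: `summarize_gateway` is a hypothetical helper, and it assumes the caller tracks its own `request_count`, since the sample metadata does not include one.

```python
def summarize_gateway(response, request_count):
    """Derive simple budget figures from a response's _gateway metadata."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    meta = response["_gateway"]
    # Spent so far plus what is left recovers the configured spending cap.
    spending_cap = meta["total_cost"] + meta["budget_remaining"]
    average_cost = meta["total_cost"] / request_count
    remaining = int(meta["budget_remaining"] / average_cost) if average_cost > 0 else None
    return {
        "spending_cap": round(spending_cap, 4),
        "average_cost": average_cost,
        "estimated_remaining_requests": remaining,
    }


sample = {
    "message": {"role": "assistant", "content": "..."},
    "_gateway": {"total_cost": 0.0342, "budget_remaining": 9.9658},
}
print(summarize_gateway(sample, request_count=100))
```

In the sample above, `total_cost` and `budget_remaining` sum to the default `SPENDING_CAP` of 10.00.
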

## Dashboard

The dashboard shows live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw

Auto-refreshes every 10 seconds.

## GPU Support

**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on the host. For Strix Halo, set the BIOS to reserved VRAM mode.

**NVIDIA**: Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.

## Files

| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |