Files

T

Seth f3ea624269 Complete README: cost model, dual ledger, all endpoints documented

Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-20 20:00:01 -04:00

6.1 KiB

Raw Blame History

Mortdecai Gateway

Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

Quick Start

git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh

The setup script:

Generates an API key
Starts Ollama + gateway in Docker
Downloads the model (~5.3 GB)
Loads it into Ollama
Runs a test inference
Prints connection details

Dashboard: http://localhost:8434/dashboard

Architecture

Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU

The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.

Cost Model

The gateway estimates electricity cost based on marginal power — only the extra watts your GPU draws during inference above its idle power.

Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh

Configuration

All parameters in .env or adjustable live via POST /config:

Parameter	Default	Description
`GPU_IDLE_WATTS`	15	GPU power at idle
`GPU_LOAD_WATTS`	54	GPU power during inference
`SYSTEM_IDLE_WATTS`	45	System power at idle (CPU/RAM/fans)
`SYSTEM_INFERENCE_WATTS`	65	System power during inference
`ELECTRICITY_RATE`	0.15	$/kWh
`BILLING_MODE`	marginal	`marginal` (extra watts only) or `dedicated` (all uptime)
`BASE_RATE_PER_HOUR`	0.00	Hourly rate in dedicated mode
`SPENDING_CAP`	10.00	$ before gateway stops accepting requests
`LABOR_RATE_PER_HOUR`	0.00	$/hr for operator time (setup/maintenance)
`PROFIT_MARGIN`	0.00	Markup multiplier (0.10 = 10%)

Billing Modes

Marginal (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.

Dedicated: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.

Dual Ledger

Every transaction is recorded in a tamper-proof ledger on both sides — the gateway operator's machine AND the client's server.

How it works

1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)

Tamper protection

Scenario	Detection
Gateway resets stats	Client's ledger has full history
Client denies requests happened	Gateway's ledger has full history
Either side edits a transaction	Hash verification fails on `/reconcile`
Shared secret mismatch	All hashes show as invalid

Setup

Both sides configure the same LEDGER_SECRET in their .env:

Gateway (.env):

LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction

Client (ledger_receiver.py):

LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py

Reconciliation

# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"

Response:

{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}

Endpoints

Public (no auth)

Endpoint	Description
`GET /health`	Ollama status + loaded models
`GET /dashboard`	Web dashboard with live stats

Authenticated

Endpoint	Description
`POST /api/chat`	Proxied to Ollama (inference)
`POST /api/generate`	Proxied to Ollama (inference)
`GET /stats`	Full usage stats + cost config
`GET /config`	View cost configuration
`POST /config`	Update cost parameters live
`GET /ledger`	View recent transactions + total cost
`GET /reconcile`	Verify ledger integrity

Admin

Endpoint	Description
`POST /admin/update-model`	Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`)

Model Updates

Remote push (opt-in): Set ALLOW_MODEL_UPDATES=true in .env. The client can push new model versions:

curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'

Manual update: Run the update script:

./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5

Response Metadata

Every proxied response includes gateway metadata:

{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}

Dashboard

The dashboard shows live:

Request count, tokens, inference time
Cost progress bar (spent vs cap)
Average cost per request, estimated remaining requests
Power model breakdown (idle→load for GPU and system)
Labor hours and cost
GPU utilization, temperature, power draw

Auto-refreshes every 10 seconds.

GPU Support

AMD ROCm (default): Docker compose uses ollama/ollama:rocm. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.

NVIDIA: Edit docker-compose.yml — uncomment the deploy section, comment out the devices section.

Files

File	Purpose
`gateway.py`	Main proxy server
`ledger_receiver.py`	Client-side transaction receiver
`docker-compose.yml`	Ollama + gateway containers
`Dockerfile`	Gateway container build
`setup.sh`	Automated first-time setup
`update-model.sh`	Manual model update
`.env.example`	Configuration template

6.1 KiB Raw Blame History Unescape Escape