Seth f3ea624269 Complete README: cost model, dual ledger, all endpoints documented
Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:00:01 -04:00


Mortdecai Gateway

Authenticated Ollama proxy with power metering and tamper-evident billing. Deploy it on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

Quick Start

git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh

The setup script:

  1. Generates an API key
  2. Starts Ollama + gateway in Docker
  3. Downloads the model (~5.3 GB)
  4. Loads it into Ollama
  5. Runs a test inference
  6. Prints connection details

Dashboard: http://localhost:8434/dashboard

Architecture

Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU

The gateway's port (8434) is the only one exposed. The gateway proxies authenticated requests to Ollama and records every transaction in a tamper-evident ledger.

Cost Model

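The gateway estimates electricity cost from marginal power: only the extra watts the GPU and the rest of the system draw during inference, above their idle draw.

Marginal cost = (GPU load - GPU idle + System load - System idle) × time × $/kWh

Power is in watts and the rate is per kWh, so the wattage is divided by 1000 and time is expressed in hours. "System load" corresponds to SYSTEM_INFERENCE_WATTS in the configuration below.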
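As a worked example, here is a minimal Python sketch of the formula using the default parameter values. The function name and the explicit unit conversions (seconds to hours, watts to kilowatts) are assumptions for illustration; the gateway's own implementation may differ.

```python
def marginal_cost(duration_s, gpu_idle_w=15.0, gpu_load_w=54.0,
                  system_idle_w=45.0, system_load_w=65.0, rate_per_kwh=0.15):
    """Return (marginal_watts, energy_wh, cost_usd) for one request.

    Defaults mirror the documented defaults; the function itself is a sketch.
    """
    # Extra power drawn during inference, above idle
    marginal_w = (gpu_load_w - gpu_idle_w) + (system_load_w - system_idle_w)
    # Watts x seconds -> watt-hours
    energy_wh = marginal_w * duration_s / 3600.0
    # Watt-hours -> kilowatt-hours, then multiply by the $/kWh rate
    cost_usd = energy_wh / 1000.0 * rate_per_kwh
    return marginal_w, energy_wh, cost_usd


watts, wh, cost = marginal_cost(3.42)
print(f"{watts:.0f} W marginal, {wh:.3f} Wh, ${cost:.6f}")
```

With the defaults, a 3.42-second request draws 59 marginal watts, which comes to roughly 0.056 Wh, or about $0.000008.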

Configuration

All parameters are set in .env and can be adjusted live via POST /config:

| Parameter | Default | Description |
| --- | --- | --- |
| GPU_IDLE_WATTS | 15 | GPU power at idle |
| GPU_LOAD_WATTS | 54 | GPU power during inference |
| SYSTEM_IDLE_WATTS | 45 | System power at idle (CPU/RAM/fans) |
| SYSTEM_INFERENCE_WATTS | 65 | System power during inference |
| ELECTRICITY_RATE | 0.15 | $/kWh |
| BILLING_MODE | marginal | marginal (extra watts only) or dedicated (all uptime) |
| BASE_RATE_PER_HOUR | 0.00 | Hourly rate in dedicated mode |
| SPENDING_CAP | 10.00 | $ before gateway stops accepting requests |
| LABOR_RATE_PER_HOUR | 0.00 | $/hr for operator time (setup/maintenance) |
| PROFIT_MARGIN | 0.00 | Markup multiplier (0.10 = 10%) |

Billing Modes

Marginal (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.

Dedicated: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.

Dual Ledger

Every transaction is recorded in a tamper-evident ledger on both sides: the gateway operator's machine and the client's server.

How it works

1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
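The hash in step 6 can be sketched as follows. The source only states that the hash covers id, tokens, cost, and the shared secret; the "|" separator, the six-decimal cost formatting, and the entry field names are assumptions made here so the example is reproducible. They are not the gateway's confirmed wire format.

```python
import hashlib
import hmac


def transaction_hash(tx_id, tokens, cost, shared_secret):
    """SHA-256 over the documented fields (serialization is an assumption)."""
    # Assumed layout: fields joined by "|", cost fixed to 6 decimal places
    payload = f"{tx_id}|{tokens}|{cost:.6f}|{shared_secret}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_valid(entry, shared_secret):
    """Recompute the hash for a ledger entry and compare in constant time."""
    expected = transaction_hash(entry["id"], entry["tokens"],
                                entry["cost"], shared_secret)
    return hmac.compare_digest(expected, entry["hash"])


secret = "agreed_upon_secret_here"
entry = {"id": "tx-0001", "tokens": 128, "cost": 0.000008}
entry["hash"] = transaction_hash(entry["id"], entry["tokens"],
                                 entry["cost"], secret)

tampered = dict(entry, cost=0.000001)   # same stored hash, edited cost

print(is_valid(entry, secret))          # True
print(is_valid(tampered, secret))       # False
print(is_valid(entry, "wrong_secret"))  # False
```

Both sides must serialize the fields identically. Any difference in formatting, such as a different number of decimals for cost, would make every hash appear invalid even without tampering.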

Tamper protection

| Scenario | Detection |
| --- | --- |
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on /reconcile |
| Shared secret mismatch | All hashes show as invalid |
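The first three scenarios come down to comparing the two copies of the ledger. A minimal sketch of such a comparison, assuming each ledger is a JSON Lines file whose entries carry an "id" field (the field name and the helper names are hypothetical):

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line, as in ledger.jsonl."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def compare_ledgers(gateway_entries, client_entries):
    """Report entries missing on either side and entries that differ."""
    gw = {e["id"]: e for e in gateway_entries}
    cl = {e["id"]: e for e in client_entries}
    return {
        "missing_on_gateway": sorted(cl.keys() - gw.keys()),
        "missing_on_client": sorted(gw.keys() - cl.keys()),
        "mismatched": sorted(i for i in gw.keys() & cl.keys()
                             if gw[i] != cl[i]),
    }


gateway = [{"id": "a", "cost": 0.1}, {"id": "b", "cost": 0.2},
           {"id": "c", "cost": 0.3}]
client = [{"id": "a", "cost": 0.1}, {"id": "b", "cost": 0.25},
          {"id": "d", "cost": 0.4}]

print(compare_ledgers(gateway, client))
```

In this toy data, entry "d" exists only on the client, "c" only on the gateway, and "b" differs between the two copies.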

Setup

Both sides configure the same LEDGER_SECRET in their .env:

Gateway (.env):

LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction

Client (ledger_receiver.py):

export LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py

Reconciliation

# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"

Response:

{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
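A client could check this response automatically, for example in a scheduled job. The sketch below only inspects the fields shown in the sample response; the function name and the tolerance are illustrative assumptions.

```python
import json


def reconcile_problems(report, tolerance=1e-9):
    """Return a list of human-readable problems found in a /reconcile report."""
    problems = []
    if abs(report["discrepancy"]) > tolerance:
        problems.append(f"ledger and stats differ by {report['discrepancy']}")
    hashes = report["hash_verification"]
    if hashes["invalid"] > 0:
        problems.append(f"{hashes['invalid']} entries failed hash verification")
    if hashes["total"] != report["ledger_entries"]:
        problems.append("not every ledger entry was hash-checked")
    if report["status"] != "OK":
        problems.append(f"status is {report['status']!r}")
    return problems


sample = json.loads("""
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {"total": 142, "valid": 142, "invalid": 0},
  "status": "OK"
}
""")

print(reconcile_problems(sample))  # []
```

An empty list means the report is clean. Anything else is worth investigating before settling the bill.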

Endpoints

Public (no auth)

| Endpoint | Description |
| --- | --- |
| GET /health | Ollama status + loaded models |
| GET /dashboard | Web dashboard with live stats |

Authenticated

| Endpoint | Description |
| --- | --- |
| POST /api/chat | Proxied to Ollama (inference) |
| POST /api/generate | Proxied to Ollama (inference) |
| GET /stats | Full usage stats + cost config |
| GET /config | View cost configuration |
| POST /config | Update cost parameters live |
| GET /ledger | View recent transactions + total cost |
| GET /reconcile | Verify ledger integrity |
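Authenticated endpoints expect the API key as a Bearer token, as in the curl examples elsewhere in this README. A minimal Python sketch of an inference call using only the standard library follows. The request body uses Ollama's chat fields (model, messages, stream); the model name, the helper names, and the assumption that the gateway forwards a non-streaming request unchanged are illustrative.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an authenticated POST to /api/chat."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # assumed: ask for a single JSON response
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/chat",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode the JSON response (needs a live gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = build_chat_request("http://localhost:8434", "YOUR_API_KEY",
                         "mortdecai-v5", "Hello")
print(req.get_method(), req.full_url)
# reply = send(req)  # uncomment when a gateway is running
```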

Admin

| Endpoint | Description |
| --- | --- |
| POST /admin/update-model | Download + load new GGUF (requires ALLOW_MODEL_UPDATES=true) |

Model Updates

Remote push (opt-in): Set ALLOW_MODEL_UPDATES=true in .env. The client can push new model versions:

curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-v5.gguf", "name": "mortdecai-v5"}'

Manual update: Run the update script:

./update-model.sh https://mortdec.ai/dl/v5/mortdecai-v5.gguf mortdecai-v5

Response Metadata

Every proxied response includes gateway metadata:

{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
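A client can use this metadata to track spend as it goes. The sketch below separates the model output from the `_gateway` block and decides whether to pause before the budget runs out. The helper names and the reserve threshold are hypothetical, and reading total_cost plus budget_remaining as the spending cap is an inference from the sample values, not documented behavior.

```python
def read_gateway_metadata(response):
    """Split a proxied response into model output and billing figures."""
    meta = response["_gateway"]
    return {
        "content": response.get("message", {}).get("content", ""),
        "request_cost": meta["estimated_cost"],
        "budget_remaining": meta["budget_remaining"],
        # Assumed relationship: spent so far + remaining = spending cap
        "implied_cap": round(meta["total_cost"] + meta["budget_remaining"], 6),
    }


def should_pause(summary, reserve=0.05):
    """True when the remaining budget has dropped to the reserve or below."""
    return summary["budget_remaining"] <= reserve


sample = {
    "message": {"role": "assistant", "content": "..."},
    "_gateway": {
        "duration_seconds": 3.42,
        "marginal_watts": 59,
        "energy_wh": 0.0561,
        "estimated_cost": 0.000008,
        "total_cost": 0.0342,
        "budget_remaining": 9.9658,
        "billing_mode": "marginal",
    },
}

summary = read_gateway_metadata(sample)
print(summary["implied_cap"], should_pause(summary))  # 10.0 False
```

For the sample above, the implied cap equals the default SPENDING_CAP of 10.00.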

Dashboard

The dashboard shows live:

  • Request count, tokens, inference time
  • Cost progress bar (spent vs cap)
  • Average cost per request, estimated remaining requests
  • Power model breakdown (idle→load for GPU and system)
  • Labor hours and cost
  • GPU utilization, temperature, power draw

Auto-refreshes every 10 seconds.

GPU Support

AMD ROCm (default): docker-compose.yml uses the ollama/ollama:rocm image and requires ROCm drivers on the host. For Strix Halo, set the BIOS to reserved VRAM mode.

NVIDIA: Edit docker-compose.yml — uncomment the deploy section, comment out the devices section.

Files

| File | Purpose |
| --- | --- |
| gateway.py | Main proxy server |
| ledger_receiver.py | Client-side transaction receiver |
| docker-compose.yml | Ollama + gateway containers |
| Dockerfile | Gateway container build |
| setup.sh | Automated first-time setup |
| update-model.sh | Manual model update |
| .env.example | Configuration template |