mortdecai-gateway/README.md
Seth c5865feb35 Mortdecai Gateway — authenticated Ollama proxy with power metering
- API key auth on all inference endpoints
- Power/cost tracking: GPU TDP × inference time × electricity rate
- Spending cap enforcement
- Web dashboard with live stats
- Docker compose for AMD ROCm (Strix Halo) or NVIDIA
- Auto-setup script with GGUF loading
- Tested against local Ollama

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:26:43 -04:00

# Mortdecai Gateway
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
## Quick Start
```bash
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```
Dashboard: http://localhost:8434/dashboard
## What It Does
```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
```
The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, tokens, energy usage
- Estimates electricity cost (power draw × time × rate, where power draw is GPU TDP plus system overhead)
- Enforces a spending cap
- Provides a dashboard with live stats
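The cost estimate can be sketched as below. This is an illustration, not the gateway's actual code: the function name `estimate_cost` is hypothetical, and adding `SYSTEM_OVERHEAD_WATTS` to the GPU TDP is an assumption, chosen because it reproduces the sample figures shown under Response Metadata.

```python
def estimate_cost(duration_seconds, gpu_tdp_watts=54.0,
                  overhead_watts=30.0, rate_per_kwh=0.15):
    """Return (energy_wh, cost) for a single inference request.

    Assumption: the metered power is GPU TDP plus system overhead.
    The defaults mirror the example values in the .env file.
    """
    power_watts = gpu_tdp_watts + overhead_watts
    # watts * seconds = joules; 3600 joules = 1 watt-hour
    energy_wh = power_watts * duration_seconds / 3600.0
    # the electricity rate is per kWh, so convert Wh to kWh first
    cost = energy_wh / 1000.0 * rate_per_kwh
    return energy_wh, cost


energy_wh, cost = estimate_cost(3.42)
```

With the default values, a 3.42 second request works out to roughly 0.0798 Wh and about $0.000012.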
## Configuration
Edit `.env`:
```
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54 # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh
SPENDING_CAP=10.00 # $ spent before the gateway stops accepting requests
```
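One way to picture how `SPENDING_CAP` interacts with the running total is the minimal sketch below. The class `SpendingTracker` and its method names are hypothetical, and how the real gateway rejects a request once the cap is reached is not specified here.

```python
class SpendingTracker:
    """Accumulates estimated cost and compares it with the spending cap."""

    def __init__(self, spending_cap):
        self.spending_cap = spending_cap
        self.total_cost = 0.0

    def record(self, estimated_cost):
        # Called once per completed request with that request's cost.
        self.total_cost += estimated_cost

    @property
    def budget_remaining(self):
        # Never report a negative budget, even if the last request overshot.
        return max(self.spending_cap - self.total_cost, 0.0)

    def accepts_requests(self):
        # Assumption: requests are refused once the total reaches the cap.
        return self.total_cost < self.spending_cap


tracker = SpendingTracker(spending_cap=10.00)
tracker.record(0.0342)
```

Under this sketch a request is checked before it runs, so the final request before the cap can push the total slightly over it.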
## Endpoints
| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models |
| `GET /dashboard` | No | Web dashboard with live stats |
| `GET /stats` | Yes | JSON usage stats |
| `POST /api/chat` | Yes | Proxied to Ollama |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | Everything else proxied to Ollama |
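A client call to an authenticated endpoint might look like the sketch below. Two details are assumptions, because this README does not state them: that the API key is sent as an `Authorization: Bearer` header, and that the model is registered in Ollama under the name `mortdecai-v4`. Check the gateway source or `setup.sh` output for the real values. The request body follows Ollama's `/api/chat` format.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8434"


def build_chat_request(api_key, prompt, model="mortdecai-v4"):
    """Build (but do not send) a POST /api/chat request.

    Assumptions: Bearer-token auth header and the model name.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL + "/api/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )


request = build_chat_request("mk_your_secret_key", "Hello")
```

To send it, pass `request` to `urllib.request.urlopen` and decode the response body with `json.loads`.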
## Response Metadata
Every proxied response includes a `_gateway` field:
```json
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
```
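A client can read the `_gateway` field to watch its remaining budget. The sketch below assumes a non-streaming JSON response shaped like the sample above. The helper `budget_status` and its `warn_below` threshold are hypothetical names introduced for illustration.

```python
import json


def budget_status(response_text, warn_below=1.0):
    """Extract cost and remaining budget from a gateway response body."""
    payload = json.loads(response_text)
    meta = payload.get("_gateway", {})  # tolerate responses without metadata
    remaining = meta.get("budget_remaining")
    return {
        "cost": meta.get("estimated_cost"),
        "remaining": remaining,
        "low": remaining is not None and remaining < warn_below,
    }


sample = """
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
"""
status = budget_status(sample)
```

Here `status["low"]` is false, since $9.9658 remains against a $1.00 warning threshold.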
## AMD ROCm
The Docker Compose file uses the `ollama/ollama:rocm` image by default, which requires ROCm drivers on the host. For Strix Halo, ensure the BIOS is set to reserved VRAM mode.
## NVIDIA
Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.