# Mortdecai Gateway

Authenticated Ollama proxy with power metering. Deploy it on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```

Dashboard: http://localhost:8434/dashboard

## What It Does

```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
```

The gateway sits in front of Ollama and:

- Authenticates requests via API key
- Tracks inference time, token counts, and energy usage
- Estimates electricity cost: (GPU TDP + system overhead) × inference time × electricity rate
- Enforces a spending cap
- Provides a dashboard with live stats

## Configuration

Edit `.env`:

```
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54            # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30    # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15       # $/kWh
SPENDING_CAP=10.00          # $ spent before the gateway stops accepting requests
```

## Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models |
| `GET /dashboard` | No | Web dashboard with live stats |
| `GET /stats` | Yes | JSON usage stats |
| `POST /api/chat` | Yes | Proxied to Ollama |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | All other paths proxied to Ollama |

## Response Metadata

Every proxied response includes a `_gateway` field:

```json
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
```

`energy_wh` is in watt-hours. `estimated_cost` is the cost of this request, `total_cost` is the running total, and `budget_remaining` is `SPENDING_CAP` minus `total_cost`, all in the currency of `ELECTRICITY_RATE`.

## AMD ROCm

The Docker Compose file uses the `ollama/ollama:rocm` image by default. This requires ROCm drivers on the host. For Strix Halo, make sure the BIOS is set to reserved VRAM mode.

## NVIDIA

Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
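## Metering Sketch

The numbers in the Response Metadata example are consistent with a simple formula: energy in watt-hours is `(GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS) × duration_seconds / 3600`, and cost is that energy converted to kWh times `ELECTRICITY_RATE`. With the default `.env` values, (54 + 30) W × 3.42 s / 3600 = 0.0798 Wh, and 0.0798 / 1000 × 0.15 ≈ $0.000012. The sketch below reproduces that arithmetic and a spending-cap check under this assumption. The `Meter` class, its methods, and the "stop once total reaches the cap" rule are hypothetical illustrations, not the gateway's actual code.

```python
from dataclasses import dataclass


@dataclass
class Meter:
    """Hypothetical meter mirroring the .env settings (not the gateway's real class)."""
    gpu_tdp_watts: float = 54.0
    system_overhead_watts: float = 30.0
    electricity_rate: float = 0.15   # $/kWh
    spending_cap: float = 10.00      # $
    total_cost: float = 0.0          # running total across all requests, in $

    def over_budget(self) -> bool:
        # ASSUMPTION: new requests are refused once the running total reaches the cap.
        return self.total_cost >= self.spending_cap

    def record(self, duration_seconds: float) -> dict:
        """Charge one request and return fields shaped like the `_gateway` metadata."""
        watts = self.gpu_tdp_watts + self.system_overhead_watts
        energy_wh = watts * duration_seconds / 3600.0      # W x s -> Wh
        cost = energy_wh / 1000.0 * self.electricity_rate  # Wh -> kWh -> $
        self.total_cost += cost
        return {
            "duration_seconds": duration_seconds,
            "energy_wh": round(energy_wh, 4),
            "estimated_cost": round(cost, 6),
            "total_cost": round(self.total_cost, 4),
            "budget_remaining": round(self.spending_cap - self.total_cost, 4),
        }


if __name__ == "__main__":
    meter = Meter()
    print(meter.record(3.42))
    print("over budget:", meter.over_budget())
```

For a fresh `Meter`, `record(3.42)` returns `energy_wh` 0.0798 and `estimated_cost` 0.000012, matching the example response. The example's `total_cost` of 0.0342 reflects earlier requests, which this fresh meter has not seen.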
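## Client Sketch

The Endpoints table marks `POST /api/chat` as authenticated but does not say how the API key is transmitted. The sketch below builds a non-streaming Ollama chat request against the gateway on port 8434 using only the Python standard library. Two details are assumptions to verify against the gateway source: the `Authorization: Bearer` header, and the model name `mortdecai-v4` (taken from the GGUF filename; the name actually registered by `setup.sh` may differ, and `GET /health` lists the loaded models). The `messages` and `stream` fields follow Ollama's own `/api/chat` request format.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8434"   # port from the Quick Start
API_KEY = "mk_your_secret_key"          # value of API_KEY in .env
MODEL = "mortdecai-v4"                  # HYPOTHETICAL model name; check GET /health


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST /api/chat request for the gateway.

    ASSUMPTION: the key is sent as an `Authorization: Bearer` header.
    The README does not specify the header, so confirm it in the gateway code.
    """
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{GATEWAY_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def chat(prompt: str) -> dict:
    """Send the request and return the decoded JSON, including `_gateway`."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=300) as resp:
        return json.load(resp)
```

With the gateway running, `chat("Hello")["_gateway"]["budget_remaining"]` would report how much of `SPENDING_CAP` is left after the call. A non-streaming request is used because it returns one JSON object, which keeps reading the metadata simple.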