
# Mortdecai Gateway

Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```

After setup, the dashboard is available at http://localhost:8434/dashboard

## What It Does

```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
```

The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, token counts, and energy usage
- Estimates electricity cost ((GPU TDP + system overhead) × time × rate)
- Enforces a spending cap: once `SPENDING_CAP` is reached, it stops accepting requests
- Provides a dashboard with live stats
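
The cost estimate can be sketched in a few lines of Python. This is an illustration, not the gateway's actual code: the function name `estimate_request_cost` is hypothetical, and it assumes that system overhead watts are added to the GPU TDP before multiplying by time and rate. That assumption reproduces the sample `.env` values and the `_gateway` example figures shown elsewhere in this README.

```python
def estimate_request_cost(duration_seconds, gpu_tdp_watts=54.0,
                          system_overhead_watts=30.0, electricity_rate=0.15):
    """Return (energy_wh, cost_dollars) for one inference request.

    Assumption: the machine draws GPU TDP plus system overhead for the
    whole duration of the request. Defaults mirror the sample .env.
    """
    total_watts = gpu_tdp_watts + system_overhead_watts
    energy_wh = total_watts * duration_seconds / 3600.0   # watt-seconds -> Wh
    cost_dollars = energy_wh / 1000.0 * electricity_rate  # Wh -> kWh, then $/kWh
    return energy_wh, cost_dollars


energy_wh, cost = estimate_request_cost(3.42)
print(f"{energy_wh:.4f} Wh, ${cost:.6f}")  # prints "0.0798 Wh, $0.000012"
```

Because the estimate uses rated TDP rather than measured draw, it is likely an upper bound on GPU energy for requests that do not saturate the card.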

## Configuration

Edit `.env`:

```
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54          # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30  # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15     # $/kWh
SPENDING_CAP=10.00        # $ spent before the gateway stops accepting requests
```
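
To get a feel for what these values imply, the following Python sketch parses the sample settings and works out roughly how many hours of continuous inference fit under the cap. How the gateway itself reads `.env` is not specified here; `parse_env` and `hours_until_cap` are hypothetical helpers written for this illustration only.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and trailing '#' comments.

    Simplification: a '#' inside a value would be treated as a comment.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def hours_until_cap(settings):
    """Hours of continuous inference before SPENDING_CAP is reached."""
    watts = float(settings["GPU_TDP_WATTS"]) + float(settings["SYSTEM_OVERHEAD_WATTS"])
    cost_per_hour = watts / 1000.0 * float(settings["ELECTRICITY_RATE"])  # kW * $/kWh
    return float(settings["SPENDING_CAP"]) / cost_per_hour


SAMPLE_ENV = """\
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54          # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30  # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15     # $/kWh
SPENDING_CAP=10.00        # $ before gateway stops accepting
"""

settings = parse_env(SAMPLE_ENV)
print(f"{hours_until_cap(settings):.1f} hours")  # prints "793.7 hours"
```

With the sample values, 84 W at $0.15/kWh costs about $0.0126 per hour, so a $10 cap covers roughly 794 hours of active inference.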

## Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models |
| `GET /dashboard` | No | Web dashboard with live stats |
| `GET /stats` | Yes | JSON usage stats |
| `POST /api/chat` | Yes | Proxied to Ollama |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | Everything else proxied to Ollama |
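
A client call to an authenticated endpoint might look like the Python sketch below. Two things here are assumptions, not documented behavior: the API key is sent as an `Authorization: Bearer` header (this README does not say which header the gateway expects), and the model is named `mortdecai-v4` after the GGUF file in Quick Start. The request body follows Ollama's `/api/chat` format.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST request to the gateway's /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for one JSON object instead of a stream
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the header name and scheme are not documented here.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8434",
    "mk_your_secret_key",
    "mortdecai-v4",  # hypothetical model name
    "Hello",
)
print(req.get_method(), req.full_url)

# To send it against a running gateway:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

If the gateway rejects the request, check its source or the setup output for the expected header before changing anything else.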

## Response Metadata

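Every proxied response includes a `_gateway` field with per-request figures (`duration_seconds`, `energy_wh`, `estimated_cost`) and running totals (`total_cost`, `budget_remaining`), with costs in dollars: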

```json
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
```
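
A client can use this metadata to watch its remaining budget. The Python sketch below is one possible way to do that, using the example response above as input; `summarize_gateway` and the `warn_below` threshold are hypothetical names chosen for illustration.

```python
import json

SAMPLE_RESPONSE = """
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
"""


def summarize_gateway(body, warn_below=1.00):
    """Derive a few figures from the `_gateway` field, or None if it is absent."""
    meta = body.get("_gateway")
    if meta is None:
        return None
    return {
        # Average power implied by the reported energy and duration (Wh -> W).
        "avg_watts": meta["energy_wh"] * 3600.0 / meta["duration_seconds"],
        # Spent plus remaining should add up to the configured cap.
        "implied_cap": round(meta["total_cost"] + meta["budget_remaining"], 2),
        "low_budget": meta["budget_remaining"] < warn_below,
    }


summary = summarize_gateway(json.loads(SAMPLE_RESPONSE))
print(summary)
```

For the example values, the implied average draw is about 84 W and the implied cap is $10.00, which matches the sample `.env` settings.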

## AMD ROCm

The Docker Compose file uses the `ollama/ollama:rocm` image by default, which requires ROCm drivers on the host. For Strix Halo, make sure the BIOS is set to reserved VRAM mode.

## NVIDIA

Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.