Compare commits

...

10 Commits

Author SHA1 Message Date
Seth af5cb4df2a Semver rename: mortdecai:0.4.0, mortdecai:0.5.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:36 -04:00
Seth adeda6dd84 Pre-set HSA_OVERRIDE_GFX_VERSION for Strix Halo ROCm detection
Ollama ROCm doesn't auto-detect newer AMD iGPUs (gfx1150/1151).
Setting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the compose fixes this.
Configurable via .env for other AMD chips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:37:06 -04:00
Seth f3ea624269 Complete README: cost model, dual ledger, all endpoints documented
Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:00:01 -04:00
Seth 968b00890f Dual ledger: tamper-proof transaction tracking on both sides
Every inference request is recorded in a local JSONL ledger with a
SHA-256 hash of (id + tokens + duration + cost + shared_secret).

Both sides keep independent copies:
- Gateway (Matt's): writes to ledger.jsonl on every request
- Receiver (Seth's): receives callbacks, saves per-gateway ledger

Endpoints:
- GET /ledger — view transactions + total cost
- GET /reconcile — compare ledger vs stats, verify all hashes
- POST /config — adjust cost params live

ledger_receiver.py runs on Seth's server:
- POST /transaction — receive and verify gateway callbacks
- GET /summary — total cost per gateway
- GET /ledger — all transactions across gateways

If either side resets stats, the other's ledger has the full history.
If either side tampers with entries, hash verification catches it.

Tested: request → ledger write → reconcile → hash valid → zero discrepancy
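
The hashing scheme described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the field order, the "|" separator and the 16-character truncation used by _ledger_hash in gateway.py from this changeset; the sample entry, the secret value and the verify helper are hypothetical and only illustrate how an edited entry is caught.

```python
import hashlib


def ledger_hash(entry, secret):
    """Hash the billed fields of a transaction together with the shared secret."""
    raw = (
        f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|"
        f"{entry['duration']}|{entry['cost']}|{secret}"
    )
    # Truncated to 16 hex characters, as in gateway.py
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def verify(entries, secret):
    """Return the ids of entries whose stored hash does not match (hypothetical helper)."""
    invalid = []
    for entry in entries:
        try:
            ok = entry.get("hash") == ledger_hash(entry, secret)
        except KeyError:
            ok = False  # An entry with missing fields cannot be verified
        if not ok:
            invalid.append(entry.get("id", "?"))
    return invalid


secret = "agreed_upon_secret_here"  # placeholder, same value on both sides
entry = {"id": "a1b2c3d4-e5f", "tokens_in": 120, "tokens_out": 48,
         "duration": 3.42, "cost": 8.41e-06}
entry["hash"] = ledger_hash(entry, secret)

tampered = dict(entry, cost=0.0)          # cost edited, hash left unchanged
print(verify([entry], secret))            # untouched entry passes
print(verify([tampered], secret))         # edited entry is reported by id
print(verify([entry], "other_secret"))    # mismatched secret invalidates everything
```

The sketch keeps the plain concatenation scheme from the commit. Python's standard hmac module is the usual way to compute a keyed digest and could be swapped in, but both sides would have to change together.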

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:56:10 -04:00
Seth 583c563daa Fix startup print for new config model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:50:54 -04:00
Seth 6d3df9ae58 Full cost model: marginal power, labor, profit, live config
Cost model:
- Marginal billing: only charge for watts above idle
- Dedicated billing: charge for all uptime (optional)
- Labor rate: $/hr for operator time, manually logged
- Profit margin: percentage markup on electricity cost
- All parameters adjustable live via POST /config

Dashboard shows:
- Cost breakdown with progress bar
- Power model (idle→load for GPU and system)
- Marginal watts per inference call
- Labor hours + labor cost
- Total owed (electricity + labor + margin)
- GPU utilization, temperature, power draw
- Avg cost per request, estimated remaining requests

Endpoints:
- GET /config — view current cost config
- POST /config — update any parameter live
- GET /stats — full usage stats + cost config (auth required)
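
As a worked example of the two billing modes, the per-request calculation can be sketched as below. This is a sketch, assuming the default wattages and rate from the .env template in this changeset and the formula in _calc_marginal_cost; the function name inference_cost and the 3.42 s duration are illustrative only. Labor and the dedicated-mode base rate are tracked separately and are not included here.

```python
DEFAULTS = {
    "gpu_idle_watts": 15,
    "gpu_load_watts": 54,
    "system_idle_watts": 45,
    "system_inference_watts": 65,
    "electricity_rate": 0.15,   # $/kWh
    "profit_margin": 0.00,      # 0.10 would add a 10% markup
    "billing_mode": "marginal",
}


def inference_cost(duration_seconds, cfg):
    """Return (billed watts, energy in Wh, cost in $) for one inference call."""
    if cfg["billing_mode"] == "marginal":
        # Only the watts above idle are billed
        watts = (cfg["gpu_load_watts"] - cfg["gpu_idle_watts"]) + (
            cfg["system_inference_watts"] - cfg["system_idle_watts"]
        )
    else:
        # Dedicated: the full draw during inference is billed
        watts = cfg["gpu_load_watts"] + cfg["system_inference_watts"]
    energy_wh = watts * duration_seconds / 3600
    electricity = energy_wh / 1000 * cfg["electricity_rate"]
    return watts, energy_wh, electricity * (1 + cfg["profit_margin"])


marginal = inference_cost(3.42, DEFAULTS)
dedicated = inference_cost(3.42, dict(DEFAULTS, billing_mode="dedicated"))
print("marginal :", marginal)    # 59 W billed
print("dedicated:", dedicated)   # 119 W billed
```

With the defaults, marginal mode bills 59 W (39 W for the GPU plus 20 W for the rest of the system), roughly half of the 119 W billed in dedicated mode.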

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:49:14 -04:00
Seth 648b123f14 Add manual model update script
./update-model.sh [url] [name]
Downloads GGUF and loads into Ollama. No remote access needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:41:56 -04:00
Seth 0b37d7de79 Add opt-in model update endpoint + API key support
Gateway: POST /admin/update-model downloads new GGUF and reloads.
Disabled by default — requires ALLOW_MODEL_UPDATES=true in .env.
Matt controls whether remote model updates are allowed.
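
A client-side call to this endpoint might look like the sketch below. It assumes the Bearer authorization header and the {"url", "name"} JSON body shown in the README of this changeset; the host name gateway and the helper build_update_request are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(gateway_url, api_key, model_url, name):
    """Build (but do not send) a POST /admin/update-model request."""
    payload = json.dumps({"url": model_url, "name": name}).encode("utf-8")
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/admin/update-model",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_update_request(
    "http://gateway:8434",              # hypothetical host, default gateway port
    "mk_change_this_to_a_real_key",     # placeholder key from the .env template
    "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf",
    "mortdecai:0.5.0",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=600)
```

If ALLOW_MODEL_UPDATES is not set to true on the gateway, the handler in this changeset answers with HTTP 403, which urlopen would raise as urllib.error.HTTPError.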

Self-play: --api-key flag for authenticated gateway connections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:39:50 -04:00
Seth f470f052aa Fix models mount to read-write for Modelfile creation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:35:45 -04:00
Seth df9f623943 Fully automated setup: downloads GGUF, loads model, tests inference
Setup script now:
1. Generates API key
2. Starts Docker containers
3. Downloads GGUF from mortdec.ai automatically (~5.3GB)
4. Creates Ollama model with correct chat template
5. Runs test inference
6. Prints connection details for Seth

Matt just runs ./setup.sh — no manual file copying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:33:39 -04:00
7 changed files with 790 additions and 113 deletions
+29 -4
@@ -1,6 +1,31 @@
# Mortdecai Gateway Configuration # Mortdecai Gateway Configuration
# All values can also be adjusted live via POST /config
# Auth
API_KEY=mk_change_this_to_a_real_key API_KEY=mk_change_this_to_a_real_key
GPU_TDP_WATTS=54
SYSTEM_OVERHEAD_WATTS=30 # Power model
ELECTRICITY_RATE=0.15 GPU_IDLE_WATTS=15 # GPU at idle (watts)
SPENDING_CAP=10.00 GPU_LOAD_WATTS=54 # GPU during inference (watts)
SYSTEM_IDLE_WATTS=45 # Whole system idle (watts)
SYSTEM_INFERENCE_WATTS=65 # Whole system during inference (watts)
# Billing
ELECTRICITY_RATE=0.15 # $/kWh
BILLING_MODE=marginal # "marginal" (only extra watts) or "dedicated" (all uptime)
BASE_RATE_PER_HOUR=0.00 # $/hr base (dedicated mode only)
SPENDING_CAP=10.00 # $ before gateway stops accepting
# Labor & profit
LABOR_RATE_PER_HOUR=0.00 # $/hr for setup/maintenance time
PROFIT_MARGIN=0.00 # Markup multiplier (0.10 = 10%)
# Dual ledger
LEDGER_SECRET=change_me_to_a_shared_secret # Both sides must match
CALLBACK_URL= # Seth's server (e.g. http://seth_ip:8435/transaction)
# Features
ALLOW_MODEL_UPDATES=false # Allow remote model push via /admin/update-model
# AMD GPU (Strix Halo / newer chips that ROCm doesn't auto-detect)
HSA_OVERRIDE_GFX_VERSION=11.0.0
+173 -36
@@ -1,78 +1,215 @@
# Mortdecai Gateway # Mortdecai Gateway
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline. Authenticated Ollama proxy with power metering and tamper-proof billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
## Quick Start ## Quick Start
```bash ```bash
git clone <repo-url> git clone <repo-url>
cd mortdecai-gateway cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh chmod +x setup.sh
./setup.sh ./setup.sh
``` ```
The setup script:
1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details
Dashboard: http://localhost:8434/dashboard Dashboard: http://localhost:8434/dashboard
## What It Does ## Architecture
``` ```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
``` ```
The gateway sits in front of Ollama and: The gateway is the only exposed port. It proxies authenticated requests to Ollama and tracks every transaction in a tamper-proof ledger.
- Authenticates requests via API key
- Tracks inference time, tokens, energy usage
- Estimates electricity cost (GPU TDP × time × rate)
- Enforces a spending cap
- Provides a dashboard with live stats
## Configuration ## Cost Model
Edit `.env`: The gateway estimates electricity cost based on **marginal power**: only the extra watts that the GPU and the rest of the system draw during inference, above their idle draw.
``` ```
API_KEY=mk_your_secret_key Marginal cost = (GPU load - GPU idle + System inference - System idle) W × hours ÷ 1000 × $/kWh × (1 + profit margin)
GPU_TDP_WATTS=54 # Your GPU's TDP ```
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh ### Configuration
SPENDING_CAP=10.00 # $ before gateway stops accepting
All parameters in `.env` or adjustable live via `POST /config`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
| `PROFIT_MARGIN` | 0.00 | Markup multiplier (0.10 = 10%) |
### Billing Modes
**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.
**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use if the machine is kept on specifically for inference.
## Dual Ledger
Every transaction is recorded in a tamper-proof ledger on **both sides** — the gateway operator's machine AND the client's server.
### How it works
```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + duration + cost + shared_secret)
```
### Tamper protection
| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile` |
| Shared secret mismatch | All hashes show as invalid |
### Setup
Both sides configure the same `LEDGER_SECRET` in their `.env`:
**Gateway (.env):**
```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```
**Client (ledger_receiver.py):**
```
LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
```
### Reconciliation
```bash
# On the gateway — verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```
Response:
```json
{
"ledger_entries": 142,
"ledger_total_cost": 0.003421,
"stats_total_cost": 0.003421,
"discrepancy": 0.0,
"hash_verification": {
"total": 142,
"valid": 142,
"invalid": 0
},
"status": "OK"
}
``` ```
## Endpoints ## Endpoints
| Endpoint | Auth | Description | ### Public (no auth)
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models | | Endpoint | Description |
| `GET /dashboard` | No | Web dashboard with live stats | |----------|-------------|
| `GET /stats` | Yes | JSON usage stats | | `GET /health` | Ollama status + loaded models |
| `POST /api/chat` | Yes | Proxied to Ollama | | `GET /dashboard` | Web dashboard with live stats |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | Everything else proxied to Ollama | ### Authenticated
| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |
### Admin
| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
## Model Updates
**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
```bash
curl -X POST http://gateway:8434/admin/update-model \
-H "Authorization: Bearer $KEY" \
-d '{"url": "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf", "name": "mortdecai:0.5.0"}'
```
**Manual update**: Run the update script:
```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf mortdecai:0.5.0
```
## Response Metadata ## Response Metadata
Every proxied response includes a `_gateway` field: Every proxied response includes gateway metadata:
```json ```json
{ {
"message": { "role": "assistant", "content": "..." }, "message": {"role": "assistant", "content": "..."},
"_gateway": { "_gateway": {
"duration_seconds": 3.42, "duration_seconds": 3.42,
"energy_wh": 0.0798, "marginal_watts": 59,
"estimated_cost": 0.000012, "energy_wh": 0.0561,
"estimated_cost": 0.000008,
"total_cost": 0.0342, "total_cost": 0.0342,
"budget_remaining": 9.9658 "budget_remaining": 9.9658,
"billing_mode": "marginal"
} }
} }
``` ```
## AMD ROCm ## Dashboard
The Docker compose uses `ollama/ollama:rocm` by default. Requires ROCm drivers on the host. For Strix Halo, ensure BIOS is set to reserved VRAM mode. The dashboard shows live:
- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw
## NVIDIA Auto-refreshes every 10 seconds.
Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section. ## GPU Support
**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
**NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section.
## Files
| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |
+1
@@ -28,6 +28,7 @@ services:
- /dev/dri:/dev/dri - /dev/dri:/dev/dri
environment: environment:
- OLLAMA_HOST=0.0.0.0:11434 - OLLAMA_HOST=0.0.0.0:11434
- HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-11.0.0}
# For NVIDIA, replace 'devices' above with: # For NVIDIA, replace 'devices' above with:
# deploy: # deploy:
# resources: # resources:
+332 -44
@@ -19,6 +19,8 @@ import os
import time import time
import threading import threading
import subprocess import subprocess
import hashlib
import uuid
from http.server import HTTPServer, BaseHTTPRequestHandler from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs from urllib.parse import urlparse, parse_qs
import requests import requests
@@ -27,11 +29,139 @@ import requests
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434") OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
LISTEN_PORT = int(os.environ.get("GATEWAY_PORT", "8434")) LISTEN_PORT = int(os.environ.get("GATEWAY_PORT", "8434"))
API_KEY = os.environ.get("API_KEY", "mk_mortdecai_default") API_KEY = os.environ.get("API_KEY", "mk_mortdecai_default")
ELECTRICITY_RATE = float(os.environ.get("ELECTRICITY_RATE", "0.15")) # $/kWh
GPU_TDP_WATTS = float(os.environ.get("GPU_TDP_WATTS", "54")) # Strix Halo iGPU
SYSTEM_OVERHEAD_WATTS = float(os.environ.get("SYSTEM_OVERHEAD_WATTS", "30")) # CPU/RAM/etc idle draw during inference
SPENDING_CAP = float(os.environ.get("SPENDING_CAP", "10.00")) # $ before refusing requests
STATS_FILE = os.environ.get("STATS_FILE", "/var/lib/mortdecai-gateway/stats.json") STATS_FILE = os.environ.get("STATS_FILE", "/var/lib/mortdecai-gateway/stats.json")
CONFIG_FILE = os.environ.get("CONFIG_FILE", "/var/lib/mortdecai-gateway/cost_config.json")
# Default cost config (overridden by config file or env vars)
_DEFAULT_COST_CONFIG = {
    "electricity_rate": 0.15,        # $/kWh
    "gpu_idle_watts": 15,            # GPU at idle
    "gpu_load_watts": 54,            # GPU during inference
    "system_idle_watts": 45,         # Whole system idle (CPU/RAM/fans/PSU)
    "system_inference_watts": 65,    # Whole system during inference
    "billing_mode": "marginal",      # "marginal" = only extra watts; "dedicated" = all uptime
    "base_rate_per_hour": 0.00,      # $/hr for keeping machine on (dedicated mode only)
    "spending_cap": 10.00,           # $ before refusing requests
    "labor_rate_per_hour": 0.00,     # $/hr for operator's time (setup, maintenance)
    "profit_margin": 0.00,           # fractional markup (0.10 = 10%)
    "labor_hours_logged": 0.0,       # total hours spent on setup/maintenance
}


def _load_cost_config():
    """Merge defaults, then the config file, then environment variables."""
    config = dict(_DEFAULT_COST_CONFIG)
    # Override from file; a missing or malformed file leaves the defaults in place
    try:
        with open(CONFIG_FILE) as f:
            config.update(json.load(f))
    except (OSError, ValueError, TypeError):
        pass
    # Override from env vars, e.g. GPU_IDLE_WATTS for "gpu_idle_watts"
    for key in _DEFAULT_COST_CONFIG:
        env_key = key.upper()
        val = os.environ.get(env_key)
        if val is not None:
            try:
                # Cast to the type of the default value
                config[key] = type(_DEFAULT_COST_CONFIG[key])(val)
            except (TypeError, ValueError):
                pass  # Ignore values that do not parse
    return config


def _save_cost_config(config):
    """Persist the live config so POST /config changes survive a restart."""
    try:
        os.makedirs(os.path.dirname(CONFIG_FILE), exist_ok=True)
        with open(CONFIG_FILE, "w") as f:
            json.dump(config, f, indent=2)
    except OSError:
        pass  # Best effort: keep serving even if the file cannot be written


COST_CONFIG = _load_cost_config()
# --- Dual Ledger ---
LEDGER_FILE = os.environ.get("LEDGER_FILE", "/var/lib/mortdecai-gateway/ledger.jsonl")
LEDGER_SECRET = os.environ.get("LEDGER_SECRET", "change_me_shared_secret")
CALLBACK_URL = os.environ.get("CALLBACK_URL", "") # Seth's server endpoint for transaction logging
_ledger_lock = threading.Lock()
def _ledger_hash(entry):
    """Create a verification hash from transaction data + shared secret."""
    raw = f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|{entry['duration']}|{entry['cost']}|{LEDGER_SECRET}"
    # Only the first 16 hex characters of the digest are stored
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def _ledger_write(entry):
    """Append a transaction to the local ledger."""
    with _ledger_lock:
        try:
            os.makedirs(os.path.dirname(LEDGER_FILE), exist_ok=True)
            with open(LEDGER_FILE, "a") as f:
                f.write(json.dumps(entry) + "\n")
        except Exception as e:
            print(f"Ledger write failed: {e}")


def _ledger_callback(entry):
    """Send transaction to the client's server for cross-verification."""
    if not CALLBACK_URL:
        return
    try:
        requests.post(
            CALLBACK_URL,
            json=entry,
            headers={"Content-Type": "application/json"},
            timeout=5,
        )
    except requests.RequestException:
        pass  # Best effort: don't fail inference because the callback is down
def _ledger_record(tokens_in, tokens_out, duration, cost, energy_wh, model):
    """Record a transaction in the ledger and notify the client."""
    entry = {
        "id": str(uuid.uuid4())[:12],
        # The "Z" suffix means UTC, so format gmtime() rather than local time
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "duration": round(duration, 3),
        "cost": round(cost, 8),
        "energy_wh": round(energy_wh, 4),
        "model": model,
        "billing_mode": COST_CONFIG["billing_mode"],
    }
    entry["hash"] = _ledger_hash(entry)
    _ledger_write(entry)
    # Send to client in background
    threading.Thread(target=_ledger_callback, args=(entry,), daemon=True).start()
    return entry


def _ledger_load():
    """Load all ledger entries."""
    entries = []
    try:
        with open(LEDGER_FILE) as f:
            for line in f:
                if line.strip():
                    entries.append(json.loads(line))
    except (OSError, ValueError):
        pass  # Missing file or malformed line: return what was read so far
    return entries


def _ledger_verify(entries):
    """Verify all ledger entries against their hashes."""
    results = {"total": len(entries), "valid": 0, "invalid": 0, "invalid_ids": []}
    for entry in entries:
        try:
            expected = _ledger_hash(entry)
        except (KeyError, TypeError):
            expected = None  # An entry with missing fields counts as invalid
        if expected is not None and entry.get("hash") == expected:
            results["valid"] += 1
        else:
            results["invalid"] += 1
            results["invalid_ids"].append(entry.get("id", "?"))
    return results
# --- Stats tracking --- # --- Stats tracking ---
_stats_lock = threading.Lock() _stats_lock = threading.Lock()
@@ -67,25 +197,52 @@ def _save_stats():
pass pass
def _track_request(tokens_in, tokens_out, duration_seconds): def _calc_marginal_cost(duration_seconds):
"""Track a completed inference request.""" """Calculate marginal electricity cost for an inference call."""
c = COST_CONFIG
if c["billing_mode"] == "marginal":
# Only charge for extra watts above idle
marginal_gpu = c["gpu_load_watts"] - c["gpu_idle_watts"]
marginal_system = c["system_inference_watts"] - c["system_idle_watts"]
marginal_watts = marginal_gpu + marginal_system
else:
# Dedicated: charge for full system draw during inference
marginal_watts = c["gpu_load_watts"] + c["system_inference_watts"]
energy_wh = (marginal_watts * duration_seconds) / 3600
electricity_cost = (energy_wh / 1000) * c["electricity_rate"]
# Apply profit margin
cost = electricity_cost * (1 + c["profit_margin"])
return marginal_watts, energy_wh, cost
def _track_request(tokens_in, tokens_out, duration_seconds, model="mortdecai:0.4.0"):
"""Track a completed inference request and record in ledger."""
marginal_watts, energy_wh, cost = _calc_marginal_cost(duration_seconds)
# Record in dual ledger
_ledger_record(tokens_in, tokens_out, duration_seconds, cost, energy_wh, model)
with _stats_lock: with _stats_lock:
_stats["total_requests"] += 1 _stats["total_requests"] += 1
_stats["total_tokens_in"] += tokens_in _stats["total_tokens_in"] += tokens_in
_stats["total_tokens_out"] += tokens_out _stats["total_tokens_out"] += tokens_out
_stats["total_inference_seconds"] += duration_seconds _stats["total_inference_seconds"] += duration_seconds
_stats["last_request_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ") _stats["last_request_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ")
# Power calculation
# GPU draws TDP watts during inference, plus system overhead
total_watts = GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS
energy_wh = (total_watts * duration_seconds) / 3600
cost = (energy_wh / 1000) * ELECTRICITY_RATE
_stats["total_energy_wh"] += energy_wh _stats["total_energy_wh"] += energy_wh
_stats["total_cost"] += cost _stats["total_cost"] += cost
_stats["total_marginal_watts_avg"] = (
_stats.get("total_marginal_watts_avg", marginal_watts) * 0.95 + marginal_watts * 0.05
)
# Base rate for dedicated mode
if COST_CONFIG["billing_mode"] == "dedicated" and COST_CONFIG["base_rate_per_hour"] > 0:
# Add base rate proportional to time since last request
last = _stats.get("_last_base_calc", time.time())
elapsed_hours = (time.time() - last) / 3600
_stats["total_cost"] += COST_CONFIG["base_rate_per_hour"] * elapsed_hours
_stats["_last_base_calc"] = time.time()
# Save every 10 requests
if _stats["total_requests"] % 10 == 0: if _stats["total_requests"] % 10 == 0:
_save_stats() _save_stats()
@@ -93,7 +250,7 @@ def _track_request(tokens_in, tokens_out, duration_seconds):
def _check_budget(): def _check_budget():
"""Returns True if under spending cap.""" """Returns True if under spending cap."""
with _stats_lock: with _stats_lock:
return _stats["total_cost"] < SPENDING_CAP return _stats["total_cost"] < COST_CONFIG["spending_cap"]
def _get_gpu_utilization(): def _get_gpu_utilization():
@@ -185,17 +342,21 @@ class GatewayHandler(BaseHTTPRequestHandler):
# Track token usage from response # Track token usage from response
tokens_in = data.get("prompt_eval_count", 0) tokens_in = data.get("prompt_eval_count", 0)
tokens_out = data.get("eval_count", 0) tokens_out = data.get("eval_count", 0)
model_name = (body or {}).get("model", "unknown")
if tokens_in or tokens_out: if tokens_in or tokens_out:
_track_request(tokens_in, tokens_out, duration) _track_request(tokens_in, tokens_out, duration, model_name)
# Add gateway metadata to response # Add gateway metadata to response
if isinstance(data, dict): if isinstance(data, dict):
mw, ewh, ecost = _calc_marginal_cost(duration)
data["_gateway"] = { data["_gateway"] = {
"duration_seconds": round(duration, 2), "duration_seconds": round(duration, 2),
"energy_wh": round((GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS) * duration / 3600, 4), "marginal_watts": round(mw, 1),
"estimated_cost": round(((GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS) * duration / 3600 / 1000) * ELECTRICITY_RATE, 6), "energy_wh": round(ewh, 4),
"estimated_cost": round(ecost, 6),
"total_cost": round(_stats["total_cost"], 4), "total_cost": round(_stats["total_cost"], 4),
"budget_remaining": round(SPENDING_CAP - _stats["total_cost"], 4), "budget_remaining": round(COST_CONFIG["spending_cap"] - _stats["total_cost"], 4),
"billing_mode": COST_CONFIG["billing_mode"],
} }
self._send_json(r.status_code, data) self._send_json(r.status_code, data)
@@ -225,17 +386,46 @@ class GatewayHandler(BaseHTTPRequestHandler):
return return
gpu = _get_gpu_utilization() gpu = _get_gpu_utilization()
with _stats_lock: with _stats_lock:
stats_copy = dict(_stats) stats_copy = {k: v for k, v in _stats.items() if not k.startswith("_")}
stats_copy["gpu"] = gpu stats_copy["gpu"] = gpu
stats_copy["config"] = { stats_copy["cost_config"] = COST_CONFIG
"gpu_tdp_watts": GPU_TDP_WATTS,
"system_overhead_watts": SYSTEM_OVERHEAD_WATTS,
"electricity_rate": ELECTRICITY_RATE,
"spending_cap": SPENDING_CAP,
}
self._send_json(200, stats_copy) self._send_json(200, stats_copy)
return return
if parsed.path == "/config":
if not self._check_auth():
return
self._send_json(200, COST_CONFIG)
return
if parsed.path == "/ledger":
if not self._check_auth():
return
entries = _ledger_load()
total_cost = sum(e.get("cost", 0) for e in entries)
self._send_json(200, {
"entries": len(entries),
"total_cost": round(total_cost, 6),
"last_10": entries[-10:],
})
return
if parsed.path == "/reconcile":
if not self._check_auth():
return
entries = _ledger_load()
verification = _ledger_verify(entries)
total_cost = sum(e.get("cost", 0) for e in entries)
self._send_json(200, {
"ledger_entries": len(entries),
"ledger_total_cost": round(total_cost, 6),
"stats_total_cost": round(_stats.get("total_cost", 0), 6),
"discrepancy": round(abs(total_cost - _stats.get("total_cost", 0)), 6),
"hash_verification": verification,
"status": "OK" if verification["invalid"] == 0 else "TAMPERED",
})
return
if parsed.path == "/dashboard": if parsed.path == "/dashboard":
self._serve_dashboard() self._serve_dashboard()
return return
@@ -252,36 +442,132 @@ class GatewayHandler(BaseHTTPRequestHandler):
length = int(self.headers.get("Content-Length", 0)) length = int(self.headers.get("Content-Length", 0))
body = json.loads(self.rfile.read(length)) if length > 0 else None body = json.loads(self.rfile.read(length)) if length > 0 else None
# Config update endpoint — adjust cost parameters live
if self.path == "/config" and body:
global COST_CONFIG
for key in body:
if key in COST_CONFIG:
COST_CONFIG[key] = type(_DEFAULT_COST_CONFIG.get(key, ""))(body[key])
_save_cost_config(COST_CONFIG)
self._send_json(200, {"status": "updated", "config": COST_CONFIG})
return
# Model update endpoint — downloads new GGUF and reloads
if self.path == "/admin/update-model" and body:
self._handle_model_update(body)
return
self._proxy_to_ollama(self.path, body) self._proxy_to_ollama(self.path, body)
def _handle_model_update(self, body):
"""Download a new GGUF from a URL and reload the model.
Request: {"url": "https://mortdec.ai/dl/...", "name": "mortdecai:0.5.0"}
This is opt-in — the gateway operator must enable ALLOW_MODEL_UPDATES=true.
"""
if os.environ.get("ALLOW_MODEL_UPDATES", "false").lower() != "true":
self._send_json(403, {"error": "Model updates disabled. Set ALLOW_MODEL_UPDATES=true in .env to enable."})
return
url = body.get("url")
name = body.get("name", "mortdecai-latest")
if not url:
self._send_json(400, {"error": "url is required"})
return
try:
import subprocess
# Download GGUF
gguf_path = f"/models/{name}.gguf"
print(f"Downloading model from {url}...")
r = requests.get(url, stream=True, timeout=600)
r.raise_for_status()
with open(f"models/{name}.gguf", "wb") as f:
for chunk in r.iter_content(chunk_size=8192):
f.write(chunk)
# Create Modelfile and load
subprocess.run(
["docker", "exec", "mortdecai-ollama", "ollama", "create", name, "-f", f"/models/Modelfile"],
timeout=120, check=True
)
self._send_json(200, {"status": "ok", "model": name, "message": "Model updated and loaded"})
except Exception as e:
self._send_json(500, {"error": f"Update failed: {e}"})
def _serve_dashboard(self): def _serve_dashboard(self):
"""Simple HTML dashboard showing usage stats.""" """Simple HTML dashboard showing usage stats."""
with _stats_lock: with _stats_lock:
s = dict(_stats) s = {k: v for k, v in _stats.items() if not k.startswith("_")}
gpu = _get_gpu_utilization() gpu = _get_gpu_utilization()
c = COST_CONFIG
marginal_w = (c["gpu_load_watts"] - c["gpu_idle_watts"]) + (c["system_inference_watts"] - c["system_idle_watts"])
active = _check_budget()
avg_cost_per_req = s["total_cost"] / max(s["total_requests"], 1)
reqs_remaining = int((c["spending_cap"] - s["total_cost"]) / max(avg_cost_per_req, 0.000001)) if avg_cost_per_req > 0 else ""
html = f"""<!DOCTYPE html> html = f"""<!DOCTYPE html>
<html><head><title>Mortdecai Gateway</title> <html><head><title>Mortdecai Gateway</title>
<meta http-equiv="refresh" content="10"> <meta http-equiv="refresh" content="10">
<style> <style>
body {{ font-family: monospace; background: #1a1a1a; color: #e0e0e0; padding: 2rem; }} body {{ font-family: monospace; background: #1a1a1a; color: #e0e0e0; padding: 2rem; max-width: 700px; margin: 0 auto; }}
h1 {{ color: #D35400; }} h1 {{ color: #D35400; }}
.stat {{ background: #252525; border: 1px solid #333; padding: 1rem; margin: 0.5rem 0; border-radius: 6px; }} h2 {{ color: #D35400; font-size: 1rem; margin-top: 1.5rem; border-bottom: 1px solid #333; padding-bottom: 0.3rem; }}
.stat {{ background: #252525; border: 1px solid #333; padding: 0.8rem 1rem; margin: 0.3rem 0; border-radius: 4px; display: flex; justify-content: space-between; }}
.label {{ color: #999; }} .label {{ color: #999; }}
.value {{ color: #D35400; font-size: 1.2rem; font-weight: bold; }} .value {{ color: #D35400; font-weight: bold; }}
.ok {{ color: #4caf50; }}
.warn {{ color: #ff9800; }}
.bad {{ color: #f44336; }}
.bar {{ background: #333; border-radius: 3px; height: 20px; margin: 0.5rem 0; }}
.bar-fill {{ background: #D35400; height: 100%; border-radius: 3px; transition: width 0.5s; }}
</style></head><body> </style></head><body>
<h1>Mortdecai Gateway</h1> <h1>Mortdecai Gateway</h1>
<div class="stat"><span class="label">Status:</span> <span class="value">{"ACTIVE" if _check_budget() else "PAUSED (cap reached)"}</span></div>
<div class="stat"><span class="label">Total Requests:</span> <span class="value">{s['total_requests']}</span></div> <div class="stat"><span class="label">Status</span>
<div class="stat"><span class="label">Tokens (in/out):</span> <span class="value">{s['total_tokens_in']:,} / {s['total_tokens_out']:,}</span></div> <span class="value {'ok' if active else 'bad'}">{'● ACTIVE' if active else '● PAUSED (cap reached)'}</span></div>
<div class="stat"><span class="label">Inference Time:</span> <span class="value">{s['total_inference_seconds']:.0f}s</span></div>
<div class="stat"><span class="label">Energy Used:</span> <span class="value">{s['total_energy_wh']:.1f} Wh</span></div> <h2>Usage</h2>
<div class="stat"><span class="label">Estimated Cost:</span> <span class="value">${s['total_cost']:.4f} / ${SPENDING_CAP:.2f}</span></div> <div class="stat"><span class="label">Requests</span><span class="value">{s['total_requests']:,}</span></div>
<div class="stat"><span class="label">Rejected (over cap):</span> <span class="value">{s['requests_rejected']}</span></div> <div class="stat"><span class="label">Tokens (in / out)</span><span class="value">{s['total_tokens_in']:,} / {s['total_tokens_out']:,}</span></div>
<div class="stat"><span class="label">GPU Utilization:</span> <span class="value">{gpu['utilization']}% ({gpu['source']})</span></div> <div class="stat"><span class="label">Inference Time</span><span class="value">{s['total_inference_seconds']:.0f}s ({s['total_inference_seconds']/3600:.1f}h)</span></div>
<div class="stat"><span class="label">GPU Temperature:</span> <span class="value">{gpu['temperature']}°C</span></div> <div class="stat"><span class="label">Avg per Request</span><span class="value">{s['total_inference_seconds']/max(s['total_requests'],1):.1f}s, {s['total_tokens_out']//max(s['total_requests'],1)} tokens</span></div>
<div class="stat"><span class="label">Last Request:</span> <span class="value">{s['last_request_at'] or 'never'}</span></div> <div class="stat"><span class="label">Rejected (cap)</span><span class="value">{s['requests_rejected']}</span></div>
<div class="stat"><span class="label">Config:</span> <span class="value">TDP={GPU_TDP_WATTS}W + {SYSTEM_OVERHEAD_WATTS}W overhead @ ${ELECTRICITY_RATE}/kWh</span></div> <div class="stat"><span class="label">Last Request</span><span class="value">{s['last_request_at'] or 'never'}</span></div>
<h2>Cost</h2>
<div class="bar"><div class="bar-fill" style="width: {min(s['total_cost']/max(c['spending_cap'],0.01)*100, 100):.0f}%"></div></div>
<div class="stat"><span class="label">Spent</span><span class="value">${s['total_cost']:.4f}</span></div>
<div class="stat"><span class="label">Budget</span><span class="value">${c['spending_cap']:.2f}</span></div>
<div class="stat"><span class="label">Remaining</span><span class="value">${c['spending_cap'] - s['total_cost']:.4f} (~{reqs_remaining} requests)</span></div>
<div class="stat"><span class="label">Avg Cost/Request</span><span class="value">${avg_cost_per_req:.6f}</span></div>
<div class="stat"><span class="label">Energy Used</span><span class="value">{s['total_energy_wh']:.1f} Wh ({s['total_energy_wh']/1000:.4f} kWh)</span></div>
<h2>Labor & Profit</h2>
<div class="stat"><span class="label">Labor Rate</span><span class="value">${c['labor_rate_per_hour']:.2f}/hr</span></div>
<div class="stat"><span class="label">Hours Logged</span><span class="value">{c['labor_hours_logged']:.1f}h</span></div>
<div class="stat"><span class="label">Labor Cost</span><span class="value">${c['labor_rate_per_hour'] * c['labor_hours_logged']:.2f}</span></div>
<div class="stat"><span class="label">Profit Margin</span><span class="value">{c['profit_margin']*100:.0f}%</span></div>
<div class="stat"><span class="label">Total Owed (electricity + labor + margin)</span><span class="value">${s['total_cost'] + c['labor_rate_per_hour'] * c['labor_hours_logged']:.4f}</span></div>
<h2>Power Model</h2>
<div class="stat"><span class="label">Billing Mode</span><span class="value">{c['billing_mode']}</span></div>
<div class="stat"><span class="label">GPU (idle → load)</span><span class="value">{c['gpu_idle_watts']}W → {c['gpu_load_watts']}W</span></div>
<div class="stat"><span class="label">System (idle → load)</span><span class="value">{c['system_idle_watts']}W → {c['system_inference_watts']}W</span></div>
<div class="stat"><span class="label">Marginal Draw</span><span class="value">{marginal_w}W per inference call</span></div>
<div class="stat"><span class="label">Electricity Rate</span><span class="value">${c['electricity_rate']}/kWh</span></div>
{'<div class="stat"><span class="label">Base Rate</span><span class="value">$' + f"{c['base_rate_per_hour']:.3f}" + '/hr</span></div>' if c['billing_mode'] == 'dedicated' else ''}
<h2>GPU</h2>
<div class="stat"><span class="label">Utilization</span><span class="value">{gpu['utilization']}%</span></div>
<div class="stat"><span class="label">Temperature</span><span class="value {'warn' if gpu['temperature'] > 75 else 'ok'}">{gpu['temperature']}°C</span></div>
<div class="stat"><span class="label">Power Draw</span><span class="value">{gpu['power_watts']}W</span></div>
<div class="stat"><span class="label">Source</span><span class="value">{gpu['source']}</span></div>
<p style="color:#555; font-size:0.8rem; margin-top:2rem;">
Config: GET /config | Update: POST /config | Stats: GET /stats (auth required)
</p>
</body></html>""" </body></html>"""
self.send_response(200) self.send_response(200)
@@ -293,12 +579,14 @@ h1 {{ color: #D35400; }}
def main(): def main():
_load_stats() _load_stats()
c = COST_CONFIG
print(f"Mortdecai Gateway starting") print(f"Mortdecai Gateway starting")
print(f" Ollama: {OLLAMA_URL}") print(f" Ollama: {OLLAMA_URL}")
print(f" Listen: 0.0.0.0:{LISTEN_PORT}") print(f" Listen: 0.0.0.0:{LISTEN_PORT}")
print(f" TDP: {GPU_TDP_WATTS}W + {SYSTEM_OVERHEAD_WATTS}W overhead") print(f" GPU: {c['gpu_idle_watts']}W idle → {c['gpu_load_watts']}W load")
print(f" Rate: ${ELECTRICITY_RATE}/kWh") print(f" System: {c['system_idle_watts']}W idle → {c['system_inference_watts']}W load")
print(f" Cap: ${SPENDING_CAP}") print(f" Rate: ${c['electricity_rate']}/kWh | Mode: {c['billing_mode']}")
print(f" Cap: ${c['spending_cap']}")
print(f" Dashboard: http://localhost:{LISTEN_PORT}/dashboard") print(f" Dashboard: http://localhost:{LISTEN_PORT}/dashboard")
# Save stats periodically # Save stats periodically
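The Cost and Power Model panels above come down to a few lines of arithmetic. A minimal sketch, assuming a constant wattage for the length of each inference call; the function names are hypothetical, and `requests_remaining` is only a guess at how the dashboard's `reqs_remaining` could be estimated, since that calculation is not shown in this diff:

```python
def energy_wh(watts: float, seconds: float) -> float:
    """Energy in watt-hours for a constant draw over a duration."""
    return watts * seconds / 3600.0


def electricity_cost(wh: float, rate_per_kwh: float) -> float:
    """Convert watt-hours to kWh and price them at the electricity rate."""
    return wh / 1000.0 * rate_per_kwh


def budget_bar_percent(total_cost: float, spending_cap: float) -> float:
    """Same clamp as the dashboard bar: no division by zero, never above 100."""
    return min(total_cost / max(spending_cap, 0.01) * 100, 100)


def requests_remaining(total_cost: float, spending_cap: float, total_requests: int) -> int:
    """Assumed estimate: remaining budget divided by the average cost so far."""
    if total_requests == 0 or total_cost <= 0:
        return 0
    avg_cost = total_cost / total_requests
    return max(int((spending_cap - total_cost) / avg_cost), 0)


if __name__ == "__main__":
    wh = energy_wh(watts=100, seconds=36)
    print(wh, electricity_cost(wh, rate_per_kwh=0.15))
    print(budget_bar_percent(1.0, 4.0), requests_remaining(1.0, 2.0, 4))
```

In marginal mode the `watts` argument would presumably be the "Marginal Draw" figure rather than the full load wattage.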
+147
@@ -0,0 +1,147 @@
#!/usr/bin/env python3
"""
Ledger Receiver — runs on YOUR server to collect transaction records from remote gateways.

Each gateway POSTs transactions here. You keep an independent copy of every
transaction with hash verification. If the gateway operator resets their stats,
your ledger still has the full history.

Usage:
    python3 ledger_receiver.py
    LEDGER_SECRET=shared_secret python3 ledger_receiver.py

Endpoints:
    POST /transaction — receive a transaction from a gateway
    GET /ledger — view all transactions
    GET /reconcile/<host> — compare your ledger against a gateway's
                            (listed here, but not handled by do_GET in this file)
    GET /summary — total cost by gateway
"""

import hashlib
import hmac
import json
import os
import threading
import time
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse

LISTEN_PORT = int(os.environ.get("RECEIVER_PORT", "8435"))
LEDGER_DIR = os.environ.get("LEDGER_DIR", "/var/lib/mortdecai-ledger")
LEDGER_SECRET = os.environ.get("LEDGER_SECRET", "change_me_shared_secret")

_lock = threading.Lock()


def _verify_hash(entry):
    """Recompute the truncated SHA-256 digest and compare it to entry['hash']."""
    try:
        raw = f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|{entry['duration']}|{entry['cost']}|{LEDGER_SECRET}"
    except KeyError:
        # A record missing any hashed field cannot be verified.
        return False
    expected = hashlib.sha256(raw.encode()).hexdigest()[:16]
    # Constant-time comparison, so timing does not reveal how much of the hash matched.
    return hmac.compare_digest(str(entry.get("hash", "")).encode(), expected.encode())


def _save_transaction(entry, source_ip):
    """Save a transaction to the per-gateway ledger file."""
    # The trailing "Z" means UTC, so format from gmtime() rather than local time.
    entry["_received_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    entry["_source_ip"] = source_ip
    entry["_hash_valid"] = _verify_hash(entry)

    os.makedirs(LEDGER_DIR, exist_ok=True)
    # One file per source IP
    safe_ip = source_ip.replace(":", "_").replace(".", "_")
    path = os.path.join(LEDGER_DIR, f"ledger_{safe_ip}.jsonl")

    with _lock:
        with open(path, "a") as f:
            f.write(json.dumps(entry) + "\n")
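

def _load_all():
    """Load all ledger entries from all gateways."""
    all_entries = {}
    try:
        fnames = os.listdir(LEDGER_DIR)
    except OSError:
        # Ledger directory missing or unreadable: report no entries.
        return all_entries
    for fname in fnames:
        if not fname.endswith(".jsonl"):
            continue
        gateway = fname.replace("ledger_", "").replace(".jsonl", "")
        entries = []
        try:
            with open(os.path.join(LEDGER_DIR, fname)) as f:
                for line in f:
                    if line.strip():
                        entries.append(json.loads(line))
        except (OSError, json.JSONDecodeError):
            # Keep whatever parsed before the unreadable or corrupt line.
            pass
        all_entries[gateway] = entries
    return all_entries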


class ReceiverHandler(BaseHTTPRequestHandler):
    def log_message(self, fmt, *args):
        # Silence the default per-request logging.
        pass

    def _send_json(self, status, data):
        body = json.dumps(data, indent=2).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path == "/transaction":
            length = int(self.headers.get("Content-Length", 0))
            try:
                entry = json.loads(self.rfile.read(length))
            except ValueError:
                # Malformed JSON or invalid UTF-8: reject instead of crashing the handler.
                self._send_json(400, {"error": "invalid JSON"})
                return
            if not isinstance(entry, dict):
                self._send_json(400, {"error": "expected a JSON object"})
                return
            source_ip = self.client_address[0]

            valid = _verify_hash(entry)
            _save_transaction(entry, source_ip)

            self._send_json(200, {
                "status": "recorded",
                "id": entry.get("id"),
                "hash_valid": valid,
            })
            return

        self._send_json(404, {"error": "not found"})

    def do_GET(self):
        parsed = urlparse(self.path)

        if parsed.path == "/summary":
            all_data = _load_all()
            summary = {}
            for gateway, entries in all_data.items():
                total_cost = sum(e.get("cost", 0) for e in entries)
                total_tokens = sum(e.get("tokens_out", 0) for e in entries)
                valid = sum(1 for e in entries if e.get("_hash_valid", False))
                invalid = len(entries) - valid
                summary[gateway] = {
                    "transactions": len(entries),
                    "total_cost": round(total_cost, 6),
                    "total_tokens_out": total_tokens,
                    "hashes_valid": valid,
                    "hashes_invalid": invalid,
                }
            self._send_json(200, summary)
            return

        if parsed.path == "/ledger":
            all_data = _load_all()
            flat = []
            for entries in all_data.values():
                flat.extend(entries)
            flat.sort(key=lambda e: e.get("timestamp", ""))
            total = sum(e.get("cost", 0) for e in flat)
            self._send_json(200, {
                "total_transactions": len(flat),
                "total_cost": round(total, 6),
                "last_20": flat[-20:],
            })
            return

        self._send_json(404, {"error": "not found"})


if __name__ == "__main__":
    os.makedirs(LEDGER_DIR, exist_ok=True)
    print(f"Ledger Receiver on port {LISTEN_PORT}")
    print(f"  Ledger dir: {LEDGER_DIR}")
    HTTPServer(("0.0.0.0", LISTEN_PORT), ReceiverHandler).serve_forever()
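For reference, the sending side has to build the same string that `_verify_hash` rebuilds. A minimal sketch of what a gateway could do, assuming it signs with the same field order, `|` separator, and 16-character truncation shown above; `sign_entry`, `verify_entry`, `post_transaction`, and the sample transaction values are hypothetical and are not taken from the gateway code:

```python
import hashlib
import hmac
import json
import urllib.request


def sign_entry(entry: dict, secret: str) -> dict:
    """Return a copy of entry with the truncated SHA-256 hash the receiver expects."""
    raw = (
        f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|"
        f"{entry['duration']}|{entry['cost']}|{secret}"
    )
    signed = dict(entry)
    signed["hash"] = hashlib.sha256(raw.encode()).hexdigest()[:16]
    return signed


def verify_entry(entry: dict, secret: str) -> bool:
    """Recompute the hash from the entry's fields and compare in constant time."""
    expected = sign_entry(entry, secret)["hash"]
    return hmac.compare_digest(str(entry.get("hash", "")).encode(), expected.encode())


def post_transaction(receiver_url: str, entry: dict) -> dict:
    """POST a signed entry to the receiver's /transaction endpoint."""
    req = urllib.request.Request(
        receiver_url.rstrip("/") + "/transaction",
        data=json.dumps(entry).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    secret = "change_me_shared_secret"  # placeholder default from the receiver
    tx = {"id": "tx-0001", "tokens_in": 12, "tokens_out": 48, "duration": 1.5, "cost": 0.000021}
    signed = sign_entry(tx, secret)
    print(verify_entry(signed, secret))
    print(verify_entry(dict(signed, cost=0.0), secret))
```

Because the hash covers the string form of each number, both sides need to format values identically: a cost sent as `1` and re-read as `1.0` would no longer verify.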
+74 -29
@@ -1,10 +1,15 @@
#!/bin/bash #!/bin/bash
# Quick setup for Mortdecai Gateway # Mortdecai Gateway — fully automated setup
# Run this after cloning the repo # Just run: ./setup.sh
# Everything downloads and configures automatically.
set -e set -e
MODEL_URL="${MODEL_URL:-https://mortdec.ai/dl/m4gguf/mortdecai-0.4.0.gguf}"
MODEL_NAME="mortdecai:0.4.0"
echo "=== Mortdecai Gateway Setup ===" echo "=== Mortdecai Gateway Setup ==="
echo ""
# Generate API key if not set # Generate API key if not set
if [ ! -f .env ]; then if [ ! -f .env ]; then
@@ -20,30 +25,52 @@ EOF
echo "Saved to .env" echo "Saved to .env"
else else
echo ".env already exists" echo ".env already exists"
KEY=$(grep API_KEY .env | cut -d= -f2)
fi fi
# Start containers # Start containers
echo ""
echo "Starting containers..." echo "Starting containers..."
docker compose up -d docker compose up -d
# Wait for Ollama to be ready # Wait for Ollama to be ready
echo "Waiting for Ollama..." echo "Waiting for Ollama to start..."
for i in $(seq 1 30); do for i in $(seq 1 60); do
if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
echo "Ollama is ready" echo "Ollama is ready"
break break
fi fi
if [ $i -eq 60 ]; then
echo "ERROR: Ollama failed to start after 2 minutes"
echo "Check: docker logs mortdecai-ollama"
exit 1
fi
sleep 2 sleep 2
done done
# Load the model if GGUF exists # Check if model already loaded
if ls models/*.gguf 1>/dev/null 2>&1; then LOADED=$(curl -s http://localhost:11434/api/tags 2>/dev/null | python3 -c "import sys,json; print('yes' if any('$MODEL_NAME' in m['name'] for m in json.load(sys.stdin).get('models',[])) else 'no')" 2>/dev/null || echo "no")
GGUF=$(ls models/*.gguf | head -1)
MODEL_NAME=$(basename "$GGUF" .gguf | tr '[:upper:]' '[:lower:]')
echo "Loading model from $GGUF..."
cat > /tmp/Modelfile << MEOF if [ "$LOADED" = "yes" ]; then
FROM /models/$(basename $GGUF) echo "Model $MODEL_NAME already loaded"
else
# Download GGUF
mkdir -p models
GGUF_PATH="models/${MODEL_NAME}.gguf"
if [ ! -f "$GGUF_PATH" ]; then
echo ""
echo "Downloading model (~5.3 GB)..."
echo "Source: $MODEL_URL"
curl -L -o "$GGUF_PATH" "$MODEL_URL" --progress-bar
echo "Download complete"
else
echo "GGUF already downloaded"
fi
# Create Modelfile
cat > models/Modelfile << 'MEOF'
FROM /models/mortdecai:0.4.0.gguf
TEMPLATE """{{- if .Messages }} TEMPLATE """{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system {{- if or .System .Tools }}<|im_start|>system
{{- if .System }} {{- if .System }}
@@ -51,11 +78,11 @@ TEMPLATE """{{- if .Messages }}
{{- end }} {{- end }}
<|im_end|> <|im_end|>
{{ end }} {{ end }}
{{- range \$m := .Messages }} {{- range $m := .Messages }}
{{- if eq \$m.Role "user" }}<|im_start|>user {{- if eq $m.Role "user" }}<|im_start|>user
{{ \$m.Content }}<|im_end|> {{ $m.Content }}<|im_end|>
{{- else if eq \$m.Role "assistant" }}<|im_start|>assistant {{- else if eq $m.Role "assistant" }}<|im_start|>assistant
{{ \$m.Content }}<|im_end|> {{ $m.Content }}<|im_end|>
{{- end }} {{- end }}
{{- end }}<|im_start|>assistant {{- end }}<|im_start|>assistant
{{ end }}""" {{ end }}"""
@@ -64,22 +91,40 @@ PARAMETER stop <|im_start|>
PARAMETER temperature 0.7 PARAMETER temperature 0.7
MEOF MEOF
docker exec mortdecai-ollama ollama create mortdecai-v4 -f /tmp/Modelfile echo "Loading model into Ollama..."
echo "Model loaded as mortdecai-v4" docker exec mortdecai-ollama ollama create "$MODEL_NAME" -f /models/Modelfile
echo "Model loaded as $MODEL_NAME"
fi
# Quick test
echo ""
echo "Running test inference..."
RESULT=$(curl -s http://localhost:8434/api/chat \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d "{\"model\": \"$MODEL_NAME\", \"messages\": [{\"role\": \"user\", \"content\": \"say hello\"}], \"stream\": false}" 2>/dev/null)
if echo "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['message']['content'][:80])" 2>/dev/null; then
echo "Test passed!"
else else
echo "No GGUF found in models/ — place your GGUF file there and run:" echo "Test inference returned unexpected result (model may still be loading)"
echo " docker exec mortdecai-ollama ollama create mortdecai-v4 -f Modelfile" echo "Try again in a minute: curl -s http://localhost:8434/health"
fi fi
echo "" echo ""
echo "=== Setup Complete ===" echo "========================================="
echo "Dashboard: http://localhost:8434/dashboard" echo " Mortdecai Gateway is ready!"
echo "API Key: $(grep API_KEY .env | cut -d= -f2)" echo "========================================="
echo "" echo ""
echo "Test: curl -s http://localhost:8434/health" echo " Dashboard: http://localhost:8434/dashboard"
echo " Health: http://localhost:8434/health"
echo " API Key: $KEY"
echo "" echo ""
echo "To use from remote:" echo " Send this to Seth:"
echo " curl -X POST http://YOUR_IP:8434/api/chat \\" echo " - Your public IP"
echo " -H 'Authorization: Bearer YOUR_API_KEY' \\" echo " - Port: 8434"
echo " -H 'Content-Type: application/json' \\" echo " - API Key: $KEY"
echo " -d '{\"model\": \"mortdecai-v4\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}'" echo ""
echo " To stop: docker compose down"
echo " To start: docker compose up -d"
echo "========================================="
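The "Check if model already loaded" step in the new setup.sh packs its logic into an inline `python3 -c` one-liner. A more readable sketch of the same check, assuming the `/api/tags` response has the shape that one-liner relies on (a `models` list whose items carry a `name`); `is_model_loaded` and the sample payload are hypothetical:

```python
import json


def is_model_loaded(tags_json: str, model_name: str) -> bool:
    """Return True if any listed model name contains model_name (substring match)."""
    try:
        models = json.loads(tags_json).get("models", [])
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: treat as "not loaded", like the script's fallback.
        return False
    return any(model_name in m.get("name", "") for m in models)


if __name__ == "__main__":
    sample = '{"models": [{"name": "mortdecai:0.4.0"}]}'
    print(is_model_loaded(sample, "mortdecai:0.4.0"))
    print(is_model_loaded(sample, "mortdecai:0.5.0"))
```

As in the script, this is a substring match, so a request for `mortdecai:0.4` would also match `mortdecai:0.4.0`.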
+34
@@ -0,0 +1,34 @@
#!/bin/bash
# Update Mortdecai model to a new version
# Usage: ./update-model.sh [url] [name]
# Example: ./update-model.sh https://mortdec.ai/dl/m5gguf/mortdecai-0.5.0.gguf mortdecai:0.5.0
set -e
URL="${1:-https://mortdec.ai/dl/m4gguf/mortdecai-0.4.0.gguf}"
NAME="${2:-mortdecai:0.4.0}"
echo "=== Mortdecai Model Update ==="
echo " URL: $URL"
echo " Name: $NAME"
echo ""
# Download
echo "Downloading..."
mkdir -p models
curl -L -o "models/${NAME}.gguf" "$URL" --progress-bar
echo "Download complete"
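# Load into Ollama
# models/Modelfile (written by setup.sh) hard-codes the 0.4.0 GGUF in its FROM
# line, so write a copy that points at the file downloaded above.
echo "Loading into Ollama..."
sed "s|^FROM .*|FROM /models/${NAME}.gguf|" models/Modelfile > "models/Modelfile.${NAME}"
docker exec mortdecai-ollama ollama create "$NAME" -f "/models/Modelfile.${NAME}"
echo "Model loaded as $NAME"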
# Verify
echo ""
echo "Verifying..."
docker exec mortdecai-ollama ollama list | grep "$NAME"
echo ""
echo "=== Update complete ==="
echo "Model $NAME is ready"