mortdecai-gateway

10 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
Seth	adeda6dd84	Pre-set HSA_OVERRIDE_GFX_VERSION for Strix Halo ROCm detection Ollama ROCm doesn't auto-detect newer AMD iGPUs (gfx1150/1151). Setting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the compose fixes this. Configurable via .env for other AMD chips. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 20:37:06 -04:00
Seth	f3ea624269	Complete README: cost model, dual ledger, all endpoints documented Full documentation covering: - Quick start with automated setup - Marginal vs dedicated billing modes - All cost parameters with defaults - Dual ledger architecture and tamper protection - Reconciliation process - All endpoints (public, authenticated, admin) - Model update paths (remote + manual) - Response metadata format - Dashboard features - GPU support (AMD ROCm + NVIDIA) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 20:00:01 -04:00
Seth	968b00890f	Dual ledger: tamper-proof transaction tracking on both sides Every inference request is recorded in a local JSONL ledger with a SHA-256 hash of (id + tokens + duration + cost + shared_secret). Both sides keep independent copies: - Gateway (Matt's): writes to ledger.jsonl on every request - Receiver (Seth's): receives callbacks, saves per-gateway ledger Endpoints: - GET /ledger — view transactions + total cost - GET /reconcile — compare ledger vs stats, verify all hashes - POST /config — adjust cost params live ledger_receiver.py runs on Seth's server: - POST /transaction — receive and verify gateway callbacks - GET /summary — total cost per gateway - GET /ledger — all transactions across gateways If either side resets stats, the other's ledger has the full history. If either side tampers with entries, hash verification catches it. Tested: request → ledger write → reconcile → hash valid → zero discrepancy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:56:10 -04:00
Seth	583c563daa	Fix startup print for new config model Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:50:54 -04:00
Seth	6d3df9ae58	Full cost model: marginal power, labor, profit, live config Cost model: - Marginal billing: only charge for watts above idle - Dedicated billing: charge for all uptime (optional) - Labor rate: $/hr for operator time, manually logged - Profit margin: percentage markup on electricity cost - All parameters adjustable live via POST /config Dashboard shows: - Cost breakdown with progress bar - Power model (idle→load for GPU and system) - Marginal watts per inference call - Labor hours + labor cost - Total owed (electricity + labor + margin) - GPU utilization, temperature, power draw - Avg cost per request, estimated remaining requests Endpoints: - GET /config — view current cost config - POST /config — update any parameter live - GET /stats — full usage stats + cost config (auth required) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:49:14 -04:00
Seth	648b123f14	Add manual model update script ./update-model.sh [url] [name] Downloads GGUF and loads into Ollama. No remote access needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:41:56 -04:00
Seth	0b37d7de79	Add opt-in model update endpoint + API key support Gateway: POST /admin/update-model downloads new GGUF and reloads. Disabled by default — requires ALLOW_MODEL_UPDATES=true in .env. Matt controls whether remote model updates are allowed. Self-play: --api-key flag for authenticated gateway connections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:39:50 -04:00
Seth	f470f052aa	Fix models mount to read-write for Modelfile creation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:35:45 -04:00
Seth	df9f623943	Fully automated setup: downloads GGUF, loads model, tests inference Setup script now: 1. Generates API key 2. Starts Docker containers 3. Downloads GGUF from mortdec.ai automatically (~5.3GB) 4. Creates Ollama model with correct chat template 5. Runs test inference 6. Prints connection details for Seth Matt just runs ./setup.sh — no manual file copying. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:33:39 -04:00
Seth	c5865feb35	Mortdecai Gateway — authenticated Ollama proxy with power metering - API key auth on all inference endpoints - Power/cost tracking: GPU TDP × inference time × electricity rate - Spending cap enforcement - Web dashboard with live stats - Docker compose for AMD ROCm (Strix Halo) or NVIDIA - Auto-setup script with GGUF loading - Tested against local Ollama Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:26:43 -04:00
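
The dual-ledger commit (968b00890f) describes each transaction as carrying a SHA-256 hash of (id + tokens + duration + cost + shared_secret), which lets either side detect edited entries. The log does not say how the fields are serialized, so the following is only a minimal sketch: the field names, the "|" separator, and the helper names `entry_hash` and `verify_entry` are assumptions for illustration, not the gateway's actual code.

```python
import hashlib
import hmac


def entry_hash(entry: dict, shared_secret: str) -> str:
    """Return a SHA-256 hex digest over the ledger fields plus the secret."""
    # Assumption: fields are stringified and joined with "|".
    # The real gateway may serialize them differently.
    fields = [str(entry[key]) for key in ("id", "tokens", "duration", "cost")]
    material = "|".join(fields + [shared_secret])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def verify_entry(entry: dict, shared_secret: str) -> bool:
    """Recompute the hash and compare it with the stored one."""
    expected = entry_hash(entry, shared_secret)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, str(entry.get("hash", "")))


if __name__ == "__main__":
    secret = "example-secret"  # hypothetical value
    entry = {"id": "req-1", "tokens": 128, "duration": 2.5, "cost": 0.0004}
    entry["hash"] = entry_hash(entry, secret)
    print(verify_entry(entry, secret))   # untouched entry

    entry["cost"] = 0.0                  # simulate tampering
    print(verify_entry(entry, secret))   # edited entry
```

Because the shared secret is part of the hashed material, someone who edits a ledger line without knowing the secret cannot produce a matching hash, which is the property the reconcile step relies on.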
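
The cost-related commits (c5865feb35 and 6d3df9ae58) describe billing as power × inference time × electricity rate, with a marginal mode that charges only for watts above idle, plus labor and a percentage profit margin on electricity. The arithmetic can be sketched roughly as below; the function names, parameter names, and example figures are hypothetical, and the real gateway may combine the terms differently.

```python
def electricity_cost(load_watts: float, idle_watts: float, seconds: float,
                     rate_per_kwh: float, marginal: bool = True) -> float:
    """Electricity cost in currency units for one billed interval."""
    # Marginal mode bills only the draw above idle; otherwise bill the
    # full draw (assumed here to stand in for the "dedicated" mode).
    watts = max(load_watts - idle_watts, 0.0) if marginal else load_watts
    kwh = watts * seconds / 3_600_000  # watt-seconds to kilowatt-hours
    return kwh * rate_per_kwh


def total_owed(electricity: float, labor_hours: float, labor_rate: float,
               margin_pct: float) -> float:
    """Electricity with a percentage markup, plus manually logged labor."""
    # Assumption: the margin applies to electricity only, as the commit
    # message says "percentage markup on electricity cost".
    return electricity * (1 + margin_pct / 100) + labor_hours * labor_rate


if __name__ == "__main__":
    # Hypothetical figures: 300 W under load, 100 W idle, one hour, $0.15/kWh.
    cost = electricity_cost(300, 100, 3600, 0.15)
    print(round(cost, 4))
    print(round(total_owed(cost, labor_hours=0.5, labor_rate=20, margin_pct=10), 4))
```

Keeping the electricity term separate from labor and margin mirrors the dashboard breakdown listed in the commit message (electricity + labor + margin).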