docs: scrub PII/IPs from gpu-bakeoff

- Rename host alias matt-strix -> strix-halo (removes third-party name)
- Move host URLs to env-var lookup (OLLAMA_*_URL), drop hardcoded IPs
  from harness source. Defaults: steel141 keeps localhost; pve197 and
  strix-halo require their env var to be set before use.
- Update doc: remove the Tailscale IP and LAN-IP references, describe
  access paths without specific addresses.
- Rename runs/matt-strix -> runs/strix-halo and patch the host field
  in each JSON.
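
The rename-and-patch step in the last bullet could be scripted roughly as below. This is a minimal sketch, not the script actually used for this commit: it assumes a flat `runs/<host>/*.json` layout and a top-level `"host"` key in each trace, and the function name `rename_host` is hypothetical.

```python
import json
from pathlib import Path


def rename_host(runs_dir: Path, old: str, new: str) -> int:
    """Rename runs/<old> to runs/<new> and rewrite the "host" field in each JSON.

    Assumes a flat runs/<host>/*.json layout with a top-level "host" key.
    Returns the number of JSON files whose host field was changed.
    """
    src, dst = runs_dir / old, runs_dir / new
    if src.exists():
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")
        src.rename(dst)
    if not dst.is_dir():
        return 0  # nothing to patch

    patched = 0
    for path in sorted(dst.glob("*.json")):
        trace = json.loads(path.read_text())
        if trace.get("host") == old:
            trace["host"] = new
            path.write_text(json.dumps(trace, indent=2) + "\n")
            patched += 1
    return patched
```

Calling `rename_host(Path("runs"), "matt-strix", "strix-halo")` a second time is a no-op, since the directory has already moved and no file still carries the old host value.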

The harness still works for the original author once the env vars are
set, and is now safe to share without leaking routable addresses.
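
Before a full matrix run, it may help to check which hosts still need their env var. The sketch below is an illustration only: `missing_env_vars` is a hypothetical helper that is not part of the harness, and the `HOSTS` table is a trimmed copy of the one introduced in this commit (the `gpu` and `vram_gb` fields are omitted).

```python
import os
from typing import Mapping, Optional

# Trimmed copy of the harness's HOSTS table after this commit.
HOSTS = {
    "steel141": {"url_env": "OLLAMA_STEEL141_URL", "default_url": "http://127.0.0.1:11434"},
    "pve197": {"url_env": "OLLAMA_PVE197_URL", "default_url": None},
    "strix-halo": {"url_env": "OLLAMA_STRIX_URL", "default_url": None},
}


def missing_env_vars(
    hosts: Mapping[str, dict], env: Optional[Mapping[str, str]] = None
) -> dict:
    """Hypothetical preflight helper: map each unusable host to the env var it needs."""
    env = os.environ if env is None else env
    return {
        name: cfg["url_env"]
        for name, cfg in hosts.items()
        # A host is usable if its env var is set (non-empty) or it has a default.
        if not (env.get(cfg["url_env"]) or cfg["default_url"])
    }


if __name__ == "__main__":
    for host, var in missing_env_vars(HOSTS).items():
        print(f"{host}: set {var} before running against this host")
```

With nothing exported, this reports pve197 and strix-halo; steel141 never appears because it falls back to localhost.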

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mortdecai
2026-04-20 05:50:52 -04:00
parent 22af59756f
commit 91842f30cb
9 changed files with 43 additions and 21 deletions
+30 -8
@@ -5,7 +5,7 @@ three hosts:
 - steel141 : RTX 3090 Ti (24 GB GDDR6X, compute 8.6, ~1008 GB/s)
 - pve197 : Tesla V100-PCIE-32GB (32 GB HBM2, compute 7.0, ~900 GB/s)
-- matt-strix: AMD Strix Halo iGPU (shared LPDDR5X, ~256 GB/s)
+- strix-halo: AMD Strix Halo iGPU (shared LPDDR5X, ~256 GB/s)
 Per (host, model, prompt_length), runs 1 warmup + N measurement runs,
 records Ollama's canonical timing fields, and writes one JSON trace to
@@ -15,6 +15,13 @@ All three Ollama servers are polled via HTTP; no SSH required. All
 timings come from Ollama's own /api/generate response fields so wall-
 clock jitter between the harness and the server is excluded.
+Host URLs are resolved from environment variables so routable addresses
+don't live in source. Set these before running against non-local hosts:
+    OLLAMA_STEEL141_URL=http://127.0.0.1:11434
+    OLLAMA_PVE197_URL=http://<lan-ip>:11434
+    OLLAMA_STRIX_URL=http://<tailscale-ip>:11434
 Invocation:
     python3 harness.py --host steel141 --model gemma4:26b --prompt short
     python3 harness.py all # runs the full planned matrix
@@ -24,6 +31,7 @@ from __future__ import annotations
 import argparse
 import json
+import os
 import sys
 import time
 import urllib.request
@@ -31,16 +39,30 @@ from pathlib import Path
 HOSTS = {
-    "steel141": {"url": "http://127.0.0.1:11434", "gpu": "RTX 3090 Ti", "vram_gb": 24},
-    "pve197": {"url": "http://192.168.0.179:11434", "gpu": "Tesla V100-PCIE-32GB", "vram_gb": 32},
-    "matt-strix": {"url": "http://100.117.155.64:11434", "gpu": "AMD Strix Halo iGPU", "vram_gb": None},
+    "steel141": {"url_env": "OLLAMA_STEEL141_URL", "default_url": "http://127.0.0.1:11434",
+                 "gpu": "RTX 3090 Ti", "vram_gb": 24},
+    "pve197": {"url_env": "OLLAMA_PVE197_URL", "default_url": None,
+               "gpu": "Tesla V100-PCIE-32GB", "vram_gb": 32},
+    "strix-halo": {"url_env": "OLLAMA_STRIX_URL", "default_url": None,
+                   "gpu": "AMD Strix Halo iGPU", "vram_gb": None},
 }
-# Per-host model tag mapping. matt-strix uses gemma4:31b, the others
+def _host_url(host: str) -> str:
+    cfg = HOSTS[host]
+    url = os.environ.get(cfg["url_env"]) or cfg["default_url"]
+    if not url:
+        raise RuntimeError(
+            f"host {host!r} has no URL — set ${cfg['url_env']} in env"
+        )
+    return url
+# Per-host model tag mapping. strix-halo uses gemma4:31b, the others
 # use gemma4:31b-it-q4_K_M — identical weights, different tags.
 MODEL_ALIASES = {
-    "gemma4:26b": {"steel141": "gemma4:26b", "pve197": "gemma4:26b", "matt-strix": "gemma4:26b"},
-    "gemma4:31b": {"steel141": "gemma4:31b-it-q4_K_M", "pve197": "gemma4:31b-it-q4_K_M", "matt-strix": "gemma4:31b"},
+    "gemma4:26b": {"steel141": "gemma4:26b", "pve197": "gemma4:26b", "strix-halo": "gemma4:26b"},
+    "gemma4:31b": {"steel141": "gemma4:31b-it-q4_K_M", "pve197": "gemma4:31b-it-q4_K_M", "strix-halo": "gemma4:31b"},
     # V100-only edge case — only 32 GB host has headroom for the Q8 MoE.
     "gemma4:26b-q8": {"pve197": "gemma4:26b-a4b-it-q8_0"},
 }
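
One consequence of the `os.environ.get(...) or default` pattern used by `_host_url` above: an env var that is exported but empty is treated the same as an unset one, because the empty string is falsy. The sketch below restates that lookup in a standalone function (`resolve` is a hypothetical name, and the port 11435 override is made up for illustration) and exercises the three cases.

```python
import os
from unittest import mock


def resolve(url_env: str, default_url):
    """Same lookup order as _host_url: env var first, then the default."""
    url = os.environ.get(url_env) or default_url
    if not url:
        raise RuntimeError(f"no URL configured; set {url_env}")
    return url


DEFAULT = "http://127.0.0.1:11434"

# Case 1: variable exported but empty -> falls through to the default.
with mock.patch.dict(os.environ, {"OLLAMA_STEEL141_URL": ""}):
    local = resolve("OLLAMA_STEEL141_URL", DEFAULT)

# Case 2: variable set -> overrides the default.
with mock.patch.dict(os.environ, {"OLLAMA_STEEL141_URL": "http://localhost:11435"}):
    overridden = resolve("OLLAMA_STEEL141_URL", DEFAULT)

# Case 3: empty variable and no default -> error, as for pve197 and strix-halo.
with mock.patch.dict(os.environ, {"OLLAMA_PVE197_URL": ""}):
    try:
        resolve("OLLAMA_PVE197_URL", None)
        raised = False
    except RuntimeError:
        raised = True
```

`mock.patch.dict` restores the previous environment when each `with` block exits, so the checks do not leak settings into the rest of the process.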
@@ -151,7 +173,7 @@ def run_matrix(
         return {"host": host, "model_alias": model_alias, "skipped": "model not available on host"}
     prompt = PROMPTS[prompt_key]
-    url = host_cfg["url"]
+    url = _host_url(host)
     trace = {
         "host": host,