docs: scrub PII/IPs from gpu-bakeoff

- Rename host alias matt-strix -> strix-halo (removes third-party name)
- Move host URLs to env-var lookup (OLLAMA_*_URL), drop hardcoded IPs from harness source. Defaults: steel141 keeps localhost; pve197 and strix-halo require their env var to be set before use.
- Update doc: remove the Tailscale IP and LAN-IP references, describe access paths without specific addresses.
- Rename runs/matt-strix -> runs/strix-halo and patch the host field in each JSON.

Harness still functional for the original author (set the env vars) and safe to share without leaking routable addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 05:50:52 -04:00
parent 22af59756f
commit 91842f30cb
9 changed files with 43 additions and 21 deletions
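
The env-var lookup described in the commit message might look roughly like the sketch below. This is not the harness code itself, which is not shown in this diff. The function name `resolve_host_url`, the two dictionaries, and the exact variable names (`OLLAMA_STEEL141_URL`, `OLLAMA_PVE197_URL`, `OLLAMA_STRIX_HALO_URL`) are assumptions that merely follow the `OLLAMA_*_URL` pattern. The localhost default uses port 11434, as in the pre-scrub doc.

```python
import os

# Hypothetical alias -> env-var mapping; the real names in harness.py may differ.
HOST_ENV_VARS = {
    "steel141": "OLLAMA_STEEL141_URL",
    "pve197": "OLLAMA_PVE197_URL",
    "strix-halo": "OLLAMA_STRIX_HALO_URL",
}

# Only steel141 has a built-in default; the other hosts must be configured.
DEFAULT_URLS = {
    "steel141": "http://localhost:11434",
}


def resolve_host_url(host, env=None):
    """Return the Ollama base URL for a host alias without hardcoding addresses."""
    env = os.environ if env is None else env
    if host not in HOST_ENV_VARS:
        raise ValueError(f"unknown host alias: {host!r}")
    var = HOST_ENV_VARS[host]
    # An explicitly set env var wins; otherwise fall back to the default, if any.
    url = env.get(var) or DEFAULT_URLS.get(host)
    if not url:
        raise RuntimeError(f"{var} must be set before using host {host!r}")
    return url.rstrip("/")
```

With this shape, `resolve_host_url("steel141")` works out of the box, while `resolve_host_url("strix-halo")` fails fast with a clear message until the corresponding variable is exported, so no routable address needs to live in the repository.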
@@ -1,7 +1,7 @@
 # GPU Bakeoff — Gemma 4 Throughput: 3090 Ti vs Strix Halo

 **Date:** 2026-04-20
-**Host matrix:** steel141 (RTX 3090 Ti) · matt-strix (AMD Strix Halo iGPU)
+**Host matrix:** steel141 (RTX 3090 Ti) · strix-halo (AMD Strix Halo iGPU)
 **Models:** `gemma4:26b` (MoE Q4_K_M) · `gemma4:31b-it-q4_K_M` (dense Q4_K_M)
 **Harness:** `scripts/gpu-bakeoff/harness.py`
 **Raw data:** `scripts/gpu-bakeoff/runs/`
@@ -13,7 +13,7 @@
 | GPU | 26B (MoE) decode | 31B (dense) decode | Long-prompt prefill (26B) |
 |-----|------------------|--------------------|-----------------------|
 | **RTX 3090 Ti** (steel141) | **128 tok/s** | **27 tok/s** | **23,849 tok/s** |
-| **AMD Strix Halo iGPU** (matt-strix) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
+| **AMD Strix Halo iGPU** (strix-halo) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |

 ### Headline findings

@@ -34,8 +34,8 @@

 | Host | GPU | VRAM | Bandwidth | Compute cap | Notes |
 |------|-----|------|-----------|-------------|-------|
-| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Seth's workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on 127.0.0.1:11434. |
-| matt-strix | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama on 100.117.155.64:11434 via Tailscale. |
+| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on localhost. |
+| strix-halo | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama accessed via Tailscale. |

 ---

@@ -151,7 +151,7 @@ and matches or slightly exceeds proportionally.

 1. **Strix max-model fit.** Strix can host models that wouldn't fit the
   3090 Ti. A follow-up would pull a larger model (70 B+ quantized) on
-   matt-strix and measure the Strix-only performance ceiling.
+   strix-halo and measure the Strix-only performance ceiling.
 2. **Q8 vs Q4 on Strix.** Same model, two quantizations — quality/speed
   tradeoff characterization.

@@ -166,7 +166,7 @@ runs/
 ├── steel141/
 │   ├── gemma4-26b/{short,long}.json
 │   └── gemma4-31b/{short,long}.json
-└── matt-strix/
+└── strix-halo/
    ├── gemma4-26b/{short,long}.json
    └── gemma4-31b/{short,long}.json
 ```