docs: scrub PII/IPs from gpu-bakeoff

- Rename host alias matt-strix -> strix-halo (removes third-party name)
- Move host URLs to env-var lookup (OLLAMA_*_URL), drop hardcoded IPs
  from harness source. Defaults: steel141 keeps localhost; pve197 and
  strix-halo require their env var to be set before use.
- Update doc: remove the Tailscale IP and LAN-IP references, describe
  access paths without specific addresses.
- Rename runs/matt-strix -> runs/strix-halo and patch the host field
  in each JSON.
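
The env-var lookup described in the list above might look roughly like the following. This is a minimal sketch, not the harness's actual code: the concrete variable names (`OLLAMA_STEEL141_URL` and so on) and the helper `resolve_host_url` are assumptions, since the message only gives the `OLLAMA_*_URL` pattern.

```python
import os

# Assumed variable names; the commit only specifies the OLLAMA_*_URL pattern.
HOST_ENV_VARS = {
    "steel141": "OLLAMA_STEEL141_URL",
    "pve197": "OLLAMA_PVE197_URL",
    "strix-halo": "OLLAMA_STRIX_HALO_URL",
}

# Only steel141 has a built-in default (local Ollama on its standard port).
HOST_DEFAULTS = {"steel141": "http://localhost:11434"}


def resolve_host_url(host, env=None):
    """Return the base URL for a host, preferring its env var over any default."""
    env = os.environ if env is None else env
    var = HOST_ENV_VARS[host]
    url = env.get(var) or HOST_DEFAULTS.get(host)
    if not url:
        # pve197 and strix-halo have no default, so fail loudly instead of guessing.
        raise RuntimeError(f"{host}: set {var} before use")
    return url.rstrip("/")
```

With a layout like this, no routable address appears in the source; each user supplies their own through the environment.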

Harness still functional for the original author (set the env vars)
and safe to share without leaking routable addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mortdecai
2026-04-20 05:50:52 -04:00
parent 22af59756f
commit 91842f30cb
9 changed files with 43 additions and 21 deletions
+6 -6
@@ -1,7 +1,7 @@
# GPU Bakeoff — Gemma 4 Throughput: 3090 Ti vs Strix Halo
**Date:** 2026-04-20
-**Host matrix:** steel141 (RTX 3090 Ti) · matt-strix (AMD Strix Halo iGPU)
+**Host matrix:** steel141 (RTX 3090 Ti) · strix-halo (AMD Strix Halo iGPU)
**Models:** `gemma4:26b` (MoE Q4_K_M) · `gemma4:31b-it-q4_K_M` (dense Q4_K_M)
**Harness:** `scripts/gpu-bakeoff/harness.py`
**Raw data:** `scripts/gpu-bakeoff/runs/`
@@ -13,7 +13,7 @@
| GPU | 26B (MoE) decode | 31B (dense) decode | Long-prompt prefill (26B) |
|-----|------------------|--------------------|-----------------------|
| **RTX 3090 Ti** (steel141) | **128 tok/s** | **27 tok/s** | **23,849 tok/s** |
-| **AMD Strix Halo iGPU** (matt-strix) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
+| **AMD Strix Halo iGPU** (strix-halo) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
### Headline findings
@@ -34,8 +34,8 @@
| Host | GPU | VRAM | Bandwidth | Compute cap | Notes |
|------|-----|------|-----------|-------------|-------|
-| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Seth's workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on 127.0.0.1:11434. |
-| matt-strix | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama on 100.117.155.64:11434 via Tailscale. |
+| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on localhost. |
+| strix-halo | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama accessed via Tailscale. |
---
@@ -151,7 +151,7 @@ and matches or slightly exceeds proportionally.
1. **Strix max-model fit.** Strix can host models that wouldn't fit the
3090 Ti. A follow-up would pull a larger model (70 B+ quantized) on
-matt-strix and measure the Strix-only performance ceiling.
+strix-halo and measure the Strix-only performance ceiling.
2. **Q8 vs Q4 on Strix.** Same model, two quantizations — quality/speed
tradeoff characterization.
@@ -166,7 +166,7 @@ runs/
├── steel141/
│ ├── gemma4-26b/{short,long}.json
│ └── gemma4-31b/{short,long}.json
-└── matt-strix/
+└── strix-halo/
├── gemma4-26b/{short,long}.json
└── gemma4-31b/{short,long}.json
```
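
A quick consistency check for the renamed run directories could be sketched as below. It assumes each run file is a JSON object with a top-level `host` key (the commit message mentions a "host field" but does not show the schema), and the function name `mismatched_host_fields` is hypothetical.

```python
import json
from pathlib import Path


def mismatched_host_fields(runs_dir):
    """List (path, recorded_host) for run files whose "host" differs from their directory name."""
    mismatches = []
    # Layout assumed from the tree above: runs/<host>/<model>/<prompt>.json
    for path in sorted(Path(runs_dir).glob("*/*/*.json")):
        expected = path.relative_to(runs_dir).parts[0]
        # Assumes a top-level JSON object; a missing key yields None and is reported.
        recorded = json.loads(path.read_text()).get("host")
        if recorded != expected:
            mismatches.append((str(path), recorded))
    return mismatches
```

An empty list from `mismatched_host_fields("scripts/gpu-bakeoff/runs")` would indicate that every JSON under `runs/strix-halo/` carries the new alias.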