docs: scrub PII/IPs from gpu-bakeoff

- Rename host alias matt-strix -> strix-halo (removes third-party name)
- Move host URLs to env-var lookup (OLLAMA_*_URL), drop hardcoded IPs from
  harness source. Defaults: steel141 keeps localhost; pve197 and strix-halo
  require their env var to be set before use.
- Update doc: remove the Tailscale IP and LAN-IP references, describe access
  paths without specific addresses.
- Rename runs/matt-strix -> runs/strix-halo and patch the host field in each
  JSON.

Harness still functional for the original author (set the env vars) and safe
to share without leaking routable addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
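
The harness source is not part of this diff, so the following is only a minimal sketch of what the env-var lookup described in the commit message might look like. The exact variable names (`OLLAMA_STEEL141_URL`, `OLLAMA_PVE197_URL`, `OLLAMA_STRIX_HALO_URL`), the `HOSTS` table, and the `resolve_host_url` helper are assumptions chosen to fit the `OLLAMA_*_URL` pattern. They are not taken from `harness.py`.

```python
import os

# Hypothetical host table: (env-var name, default URL or None).
# Only steel141 has a default (localhost); the other hosts must be
# configured through their environment variable before use.
HOSTS = {
    "steel141": ("OLLAMA_STEEL141_URL", "http://localhost:11434"),
    "pve197": ("OLLAMA_PVE197_URL", None),
    "strix-halo": ("OLLAMA_STRIX_HALO_URL", None),
}


def resolve_host_url(alias, env=None):
    """Return the Ollama base URL for a host alias, or raise if unset."""
    env = os.environ if env is None else env
    if alias not in HOSTS:
        raise KeyError(f"unknown host alias: {alias!r}")
    var, default = HOSTS[alias]
    # An empty value is treated the same as an unset variable.
    url = env.get(var) or default
    if not url:
        raise RuntimeError(f"{alias}: set {var} before use")
    return url.rstrip("/")
```

With a layout like this, no routable address appears in the source, and a missing variable fails early with a message that names the variable to set.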
@@ -1,7 +1,7 @@
 # GPU Bakeoff — Gemma 4 Throughput: 3090 Ti vs Strix Halo

 **Date:** 2026-04-20
-**Host matrix:** steel141 (RTX 3090 Ti) · matt-strix (AMD Strix Halo iGPU)
+**Host matrix:** steel141 (RTX 3090 Ti) · strix-halo (AMD Strix Halo iGPU)
 **Models:** `gemma4:26b` (MoE Q4_K_M) · `gemma4:31b-it-q4_K_M` (dense Q4_K_M)
 **Harness:** `scripts/gpu-bakeoff/harness.py`
 **Raw data:** `scripts/gpu-bakeoff/runs/`
@@ -13,7 +13,7 @@
 | GPU | 26B (MoE) decode | 31B (dense) decode | Long-prompt prefill (26B) |
 |-----|------------------|--------------------|-----------------------|
 | **RTX 3090 Ti** (steel141) | **128 tok/s** | **27 tok/s** | **23,849 tok/s** |
-| **AMD Strix Halo iGPU** (matt-strix) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
+| **AMD Strix Halo iGPU** (strix-halo) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |

 ### Headline findings

@@ -34,8 +34,8 @@

 | Host | GPU | VRAM | Bandwidth | Compute cap | Notes |
 |------|-----|------|-----------|-------------|-------|
-| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Seth's workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on 127.0.0.1:11434. |
-| matt-strix | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama on 100.117.155.64:11434 via Tailscale. |
+| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on localhost. |
+| strix-halo | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama accessed via Tailscale. |

 ---

@@ -151,7 +151,7 @@ and matches or slightly exceeds proportionally.

 1. **Strix max-model fit.** Strix can host models that wouldn't fit the
    3090 Ti. A follow-up would pull a larger model (70 B+ quantized) on
-   matt-strix and measure the Strix-only performance ceiling.
+   strix-halo and measure the Strix-only performance ceiling.
 2. **Q8 vs Q4 on Strix.** Same model, two quantizations — quality/speed
    tradeoff characterization.

@@ -166,7 +166,7 @@ runs/
 ├── steel141/
 │   ├── gemma4-26b/{short,long}.json
 │   └── gemma4-31b/{short,long}.json
-└── matt-strix/
+└── strix-halo/
     ├── gemma4-26b/{short,long}.json
     └── gemma4-31b/{short,long}.json
 ```