gemma4-research

Seth/gemma4-research

Commit Graph

Author	SHA1	Message	Date

Mortdecai

91842f30cb

docs: scrub PII/IPs from gpu-bakeoff

- Rename host alias matt-strix -> strix-halo (removes third-party name)
- Move host URLs to env-var lookup (OLLAMA_*_URL), drop hardcoded IPs
  from harness source. Defaults: steel141 keeps localhost; pve197 and
  strix-halo require their env var to be set before use.
- Update doc: remove the Tailscale IP and LAN-IP references, describe
  access paths without specific addresses.
- Rename runs/matt-strix -> runs/strix-halo and patch the host field
  in each JSON.

Harness still functional for the original author (set the env vars)
and safe to share without leaking routable addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 05:50:52 -04:00
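
The env-var lookup described in this commit could look roughly like the sketch below. This is not the harness source: the concrete variable names (OLLAMA_STEEL141_URL and so on) and the function name resolve_host_url are hypothetical, chosen only to match the OLLAMA_*_URL pattern in the message. The localhost default assumes Ollama's standard port, 11434.

```python
import os

# Hypothetical env-var names following the OLLAMA_*_URL pattern from the
# commit message; the real harness may spell them differently.
HOST_ENV_VARS = {
    "steel141": "OLLAMA_STEEL141_URL",
    "pve197": "OLLAMA_PVE197_URL",
    "strix-halo": "OLLAMA_STRIX_HALO_URL",
}

# Only steel141 keeps a default (localhost on Ollama's standard port).
DEFAULT_URLS = {"steel141": "http://localhost:11434"}


def resolve_host_url(host, env=None):
    """Return the base URL for a host alias, or fail loudly if it is unset."""
    env = os.environ if env is None else env
    var = HOST_ENV_VARS[host]
    url = env.get(var) or DEFAULT_URLS.get(host)
    if not url:
        # No hardcoded address to fall back on: the caller must export it.
        raise RuntimeError(f"{var} must be set before using host {host!r}")
    return url.rstrip("/")
```

Failing with an explicit error, rather than falling back to a baked-in address, is what keeps routable IPs out of the shared source.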

Mortdecai

b6190357ba

feat: GPU bakeoff — 3090 Ti vs V100 vs Strix Halo

Cross-host Gemma 4 throughput comparison across three architectures.
Harness at scripts/gpu-bakeoff/; writeup at
docs/reference/gpu-bakeoff-2026-04-20.md.

Key findings:
- RTX 3090 Ti wins decode decisively (128 tok/s on gemma4:26b MoE Q4,
  ~4.7× faster than gemma4:31b dense on the same card).
- AMD Strix Halo iGPU lands at ~42% of 3090 Ti decode on ~25% of the
  memory bandwidth — good SIMD utilization, especially for MoE.
- V100 numbers are DEGRADED: CT 167 ai-visualizer SDXL consumes 31/32
  GB of its VRAM, forcing Gemma 4 models 95% onto CPU. Isolated V100
  run requires SDXL eviction — left as follow-up.
- MoE vs dense is the dominant latency factor across all GPUs: ~4 B
  active params of gemma4:26b beats 31.3 B active of gemma4:31b by
  the same ratio (~4.7×) on every card tested.

Methodology: 1 warmup + 3 measurement runs per (host × model ×
prompt-length), Ollama's canonical timing fields, temp=0 greedy,
num_predict=256. All three Ollama servers accessed via HTTP (Strix
via Tailscale).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 05:45:26 -04:00
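
The methodology above (1 warmup + 3 measured runs, temp=0, num_predict=256, Ollama's timing fields) can be sketched as follows. This is a minimal illustration, not the harness in scripts/gpu-bakeoff/. It assumes Ollama's /api/generate endpoint with stream disabled, whose response reports eval_count (generated tokens) and eval_duration (nanoseconds). The function names and the choice of the mean as the aggregate are assumptions.

```python
import json
import statistics
import urllib.request


def decode_tok_per_s(resp):
    """Decode throughput from Ollama timing fields (eval_duration is in ns)."""
    return resp["eval_count"] * 1e9 / resp["eval_duration"]


def generate(base_url, model, prompt):
    """One non-streaming, greedy generation request against an Ollama server."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "num_predict": 256},
    }
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


def bench(base_url, model, prompt, warmup=1, runs=3, call=generate):
    """Discard warmup runs, then average decode tok/s over the measured runs."""
    for _ in range(warmup):
        call(base_url, model, prompt)  # loads the model; result is ignored
    rates = [decode_tok_per_s(call(base_url, model, prompt)) for _ in range(runs)]
    return statistics.mean(rates)  # aggregate choice is an assumption
```

For scale, a run that reports eval_count=256 and eval_duration=2,000,000,000 ns works out to 128 tok/s, the figure quoted for gemma4:26b on the 3090 Ti. The call parameter exists only so the timing logic can be exercised without a live server.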

2 Commits