Files
gemma4-research/CORPUS_ollama_variants.md
T
Mortdecai 5011059f5d docs: initial Gemma 4 research corpus and synthesis
Architecture specs, benchmarks, gotchas, Ollama settings, tool calling
format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:14:19 -04:00

1.7 KiB

Gemma 4 on Ollama — Available Variants

Last verified against Seth's homelab: 2026-04-12

Ollama Model Tags

Tag Params Quant Size on Disk VRAM Notes
gemma4:e4b-it-q8_0 ~8B total / 4B effective Q8_0 11.6GB ~12GB Vision + audio capable. ~25 tok/s on V100
gemma4:26b 25.8B Q4_K_M (default) 18.0GB ~18GB Sweet spot for quality/speed. ~134 tok/s on 3090 Ti
gemma4:31b-it-q4_K_M 31.3B Q4_K_M 19.9GB ~24.5GB Sharpest but 5x slower (~28 tok/s on 3090 Ti, memory pressure)

Capabilities by Variant (from ollama show)

All variants support:

  • Text generation (completion, chat)
  • Vision (image input via base64 in images field)
  • Tool/function calling (native Ollama tool format)

E-series (E2B, E4B) additionally support:

  • Audio input (conformer encoder)

GPU Coexistence (pve197 V100 32GB)

  • gemma4:26b + SDXL Turbo: ~28.5GB peak VRAM — fits on V100-32GB
  • gemma4:31b: 24.5GB alone — memory pressure with any coexisting model
  • gemma4:e4b-it-q8_0: ~12GB — comfortable headroom

Ollama API Endpoint

  • /api/generate (single-turn, used by AI_Visualizer)
  • /api/chat (multi-turn with message history, used by Simon)
  • Both accept tools, images, stream, options, keep_alive

Important Ollama Defaults to Override

Parameter Ollama Default Recommended Why
num_ctx 2048 4096-32768 Default is absurdly small, causes truncation
num_predict 128 512-4096+ Default truncates almost all useful output
think true (Ollama 0.20+) false See GOTCHAS doc
keep_alive 5m 30m-4h Prevents expensive model reload between calls