Files

T

Mortdecai 5011059f5d docs: initial Gemma 4 research corpus and synthesis

Architecture specs, benchmarks, gotchas, Ollama settings, tool calling
format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-12 18:14:19 -04:00

1.7 KiB

Raw Blame History

Gemma 4 on Ollama — Available Variants

Last verified against Seth's homelab: 2026-04-12

Ollama Model Tags

Tag	Params	Quant	Size on Disk	VRAM	Notes
`gemma4:e4b-it-q8_0`	~8B total / 4B effective	Q8_0	11.6GB	~12GB	Vision + audio capable. ~25 tok/s on V100
`gemma4:26b`	25.8B	Q4_K_M (default)	18.0GB	~18GB	Sweet spot for quality/speed. ~134 tok/s on 3090 Ti
`gemma4:31b-it-q4_K_M`	31.3B	Q4_K_M	19.9GB	~24.5GB	Sharpest but 5x slower (~28 tok/s on 3090 Ti, memory pressure)

Capabilities by Variant (from `ollama show`)

All variants support:

Text generation (completion, chat)
Vision (image input via base64 in images field)
Tool/function calling (native Ollama tool format)

E-series (E2B, E4B) additionally support:

Audio input (conformer encoder)

GPU Coexistence (pve197 V100 32GB)

gemma4:26b + SDXL Turbo: ~28.5GB peak VRAM — fits on V100-32GB
gemma4:31b: 24.5GB alone — memory pressure with any coexisting model
gemma4:e4b-it-q8_0: ~12GB — comfortable headroom

Ollama API Endpoint

/api/generate (single-turn, used by AI_Visualizer)
/api/chat (multi-turn with message history, used by Simon)
Both accept tools, images, stream, options, keep_alive

Important Ollama Defaults to Override

Parameter	Ollama Default	Recommended	Why
`num_ctx`	2048	4096-32768	Default is absurdly small, causes truncation
`num_predict`	128	512-4096+	Default truncates almost all useful output
`think`	true (Ollama 0.20+)	false	See GOTCHAS doc
`keep_alive`	5m	30m-4h	Prevents expensive model reload between calls

1.7 KiB Raw Blame History

Gemma 4 on Ollama — Available Variants

Ollama Model Tags

Capabilities by Variant (from ollama show)

GPU Coexistence (pve197 V100 32GB)

Ollama API Endpoint

Important Ollama Defaults to Override

1.7 KiB

Raw Blame History

Capabilities by Variant (from `ollama show`)