Files

T

Mortdecai 5011059f5d docs: initial Gemma 4 research corpus and synthesis

Architecture specs, benchmarks, gotchas, Ollama settings, tool calling
format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-12 18:14:19 -04:00

3.5 KiB

Raw Blame History

Gemma 4 Implementation Reference

Patterns extracted from Seth's two production Gemma 4 projects.

Project: Simon (FreibergFamily/simon/)

Purpose: AI genealogy historian — multi-turn chat with tool-calling agent

Setting	Value
Model	`gemma4:26b`
API	`/api/chat` (multi-turn)
num_ctx	32768
num_predict	4096
temperature	1.0
top_p	0.95
top_k	64
keep_alive	4h
think	(not explicitly set — should be false)
format_json	not used
Vision	not used
Tool calling	6 tools, max 12 iterations

Key Patterns

Aggressive system prompt: 40+ lines defining identity, boundaries, tool usage rules, multi-step chaining requirements. Gemma 4 follows all of it.
Tool chaining instructions: System prompt explicitly tells Gemma to chain tools (e.g., "after lookup_person, ALSO call get_historical_context"). Gemma 4 follows these multi-step chains reliably.
Parallel tool calls: Encouraged in system prompt for multiple lookups. Gemma 4 does this.
History pruning: Drops old tool results and tool-call messages, keeps assistant summaries. Prevents context bloat in multi-turn.
Fallback to streaming: After 12 tool iterations, switches to stream mode (no tools) to force a text response.
Two modes (historian vs interview): Completely different system prompts swapped at runtime. Gemma 4 stays in character for both.

Project: AI Visualizer (AI_Visualizer/)

Purpose: Music-reactive video generator — Gemma 4 as reasoning engine across 4 pipeline stages

Stage	num_predict	num_ctx	temperature	Purpose
Mood Analysis	4096	16384	0.4-0.6	Analyze CLAP descriptors -> narratives + boundary adjustments
Rate Pass	512	4096	0.3-0.5	Choose visual pacing rate per music segment
Storyboard	2048	4096	0.6-0.8	Generate SDXL prompts per music segment
Batch Expansion	2048	default	0.7	Interpolate between scene prompts over time
Vision Validator	256	default	0.2	Critique generated frames (queued for disable)

Key Patterns

No tool calling used. All Gemma interaction is single-turn generate with JSON requested in prompt.

Client-side JSON extraction:

body = response["response"]
start = body.find("{")
end = body.rfind("}")
obj = json.loads(body[start:end + 1])

Temperature ramping on retry: Base temp + bump per attempt. Conservative first, creative on retry.
think: false everywhere. Explicitly set on every call. Critical for budget control.
format_json: false everywhere. Causes infinite loops on nested schemas.
Model pinning: keep_alive=-1 to prevent GPU eviction during long SDXL pauses.
Explicit num_ctx: Added after discovering Ollama defaults to 2048, which truncated mood analyzer prompts on long tracks.
Banned vocabulary in prompts: List of cliche words (cinematic, dramatic, ethereal...) passed to Gemma to avoid generic output.
Vision for image critique: Base64-encoded PNG -> structured SCORE/ISSUE/REASON output parsed by regex. Works but overrejects on subjective quality.

Common Settings Across Both Projects

{
  "model": "gemma4:26b",
  "think": false,
  "options": {
    "num_ctx": 4096,
    "num_predict": 2048,
    "temperature": 0.5
  },
  "keep_alive": "30m"
}

Adjust num_ctx/num_predict upward for your payload size. These are safe minimums.

3.5 KiB Raw Blame History

Gemma 4 Implementation Reference

Project: Simon (FreibergFamily/simon/)

Key Patterns

Project: AI Visualizer (AI_Visualizer/)

Key Patterns

Common Settings Across Both Projects

3.5 KiB

Raw Blame History