Files
gemma4-research/IMPLEMENTATIONS.md
T
Mortdecai 5011059f5d docs: initial Gemma 4 research corpus and synthesis
Architecture specs, benchmarks, gotchas, Ollama settings, tool calling
format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:14:19 -04:00

3.5 KiB

Gemma 4 Implementation Reference

Patterns extracted from Seth's two production Gemma 4 projects.

Project: Simon (FreibergFamily/simon/)

Purpose: AI genealogy historian — multi-turn chat with tool-calling agent

Setting Value
Model gemma4:26b
API /api/chat (multi-turn)
num_ctx 32768
num_predict 4096
temperature 1.0
top_p 0.95
top_k 64
keep_alive 4h
think (not explicitly set — should be false)
format_json not used
Vision not used
Tool calling 6 tools, max 12 iterations

Key Patterns

  1. Aggressive system prompt: 40+ lines defining identity, boundaries, tool usage rules, multi-step chaining requirements. Gemma 4 follows all of it.

  2. Tool chaining instructions: System prompt explicitly tells Gemma to chain tools (e.g., "after lookup_person, ALSO call get_historical_context"). Gemma 4 follows these multi-step chains reliably.

  3. Parallel tool calls: Encouraged in system prompt for multiple lookups. Gemma 4 does this.

  4. History pruning: Drops old tool results and tool-call messages, keeps assistant summaries. Prevents context bloat in multi-turn.

  5. Fallback to streaming: After 12 tool iterations, switches to stream mode (no tools) to force a text response.

  6. Two modes (historian vs interview): Completely different system prompts swapped at runtime. Gemma 4 stays in character for both.


Project: AI Visualizer (AI_Visualizer/)

Purpose: Music-reactive video generator — Gemma 4 as reasoning engine across 4 pipeline stages

Stage num_predict num_ctx temperature Purpose
Mood Analysis 4096 16384 0.4-0.6 Analyze CLAP descriptors -> narratives + boundary adjustments
Rate Pass 512 4096 0.3-0.5 Choose visual pacing rate per music segment
Storyboard 2048 4096 0.6-0.8 Generate SDXL prompts per music segment
Batch Expansion 2048 default 0.7 Interpolate between scene prompts over time
Vision Validator 256 default 0.2 Critique generated frames (queued for disable)

Key Patterns

  1. No tool calling used. All Gemma interaction is single-turn generate with JSON requested in prompt.

  2. Client-side JSON extraction:

    body = response["response"]
    start = body.find("{")
    end = body.rfind("}")
    obj = json.loads(body[start:end + 1])
    
  3. Temperature ramping on retry: Base temp + bump per attempt. Conservative first, creative on retry.

  4. think: false everywhere. Explicitly set on every call. Critical for budget control.

  5. format_json: false everywhere. Causes infinite loops on nested schemas.

  6. Model pinning: keep_alive=-1 to prevent GPU eviction during long SDXL pauses.

  7. Explicit num_ctx: Added after discovering Ollama defaults to 2048, which truncated mood analyzer prompts on long tracks.

  8. Banned vocabulary in prompts: List of cliche words (cinematic, dramatic, ethereal...) passed to Gemma to avoid generic output.

  9. Vision for image critique: Base64-encoded PNG -> structured SCORE/ISSUE/REASON output parsed by regex. Works but overrejects on subjective quality.


Common Settings Across Both Projects

{
  "model": "gemma4:26b",
  "think": false,
  "options": {
    "num_ctx": 4096,
    "num_predict": 2048,
    "temperature": 0.5
  },
  "keep_alive": "30m"
}

Adjust num_ctx/num_predict upward for your payload size. These are safe minimums.