# Gemma 4 Implementation Reference

> Patterns extracted from Seth's two production Gemma 4 projects.

## Project: Simon (FreibergFamily/simon/)

**Purpose:** AI genealogy historian: multi-turn chat with tool-calling agent

| Setting | Value |
|---------|-------|
| Model | `gemma4:26b` |
| API | `/api/chat` (multi-turn) |
| num_ctx | 32768 |
| num_predict | 4096 |
| temperature | 1.0 |
| top_p | 0.95 |
| top_k | 64 |
| keep_alive | 4h |
| think | (not explicitly set; should be false) |
| format_json | not used |
| Vision | not used |
| Tool calling | 6 tools, max 12 iterations |

### Key Patterns

1. **Aggressive system prompt:** 40+ lines defining identity, boundaries, tool usage rules, multi-step chaining requirements. Gemma 4 follows all of it.
2. **Tool chaining instructions:** System prompt explicitly tells Gemma to chain tools (e.g., "after lookup_person, ALSO call get_historical_context"). Gemma 4 follows these multi-step chains reliably.
3. **Parallel tool calls:** Encouraged in system prompt for multiple lookups. Gemma 4 does this.
4. **History pruning:** Drops old tool results and tool-call messages, keeps assistant summaries. Prevents context bloat in multi-turn.
5. **Fallback to streaming:** After 12 tool iterations, switches to stream mode (no tools) to force a text response.
6. **Two modes (historian vs interview):** Completely different system prompts swapped at runtime. Gemma 4 stays in character for both.

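
The history-pruning pattern (item 4) could be sketched roughly as follows. This is not Simon's actual code: the function name `prune_history` and the `keep_recent` parameter are hypothetical, and the message shape assumes Ollama-style chat messages with `role`, `content`, and an optional `tool_calls` field on assistant messages.

```python
from typing import Any

Message = dict[str, Any]


def prune_history(messages: list[Message], keep_recent: int = 4) -> list[Message]:
    """Drop tool results and tool-call messages from older turns.

    The last `keep_recent` messages are left untouched so that an
    in-flight tool exchange is not broken. `keep_recent` is an assumed
    knob for this sketch, not a setting from the original project.
    """
    cutoff = max(len(messages) - keep_recent, 0)
    pruned: list[Message] = []
    for index, message in enumerate(messages):
        if index >= cutoff:
            pruned.append(message)  # recent context stays intact
            continue
        if message.get("role") == "tool":
            continue  # old tool result
        if message.get("role") == "assistant" and message.get("tool_calls"):
            continue  # old tool-call request
        pruned.append(message)  # system, user, and assistant text survive
    return pruned
```

Calling this before each new `/api/chat` request keeps the system prompt, user turns, and assistant summaries while shedding the bulky tool payloads from earlier turns.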

---

## Project: AI Visualizer (AI_Visualizer/)

**Purpose:** Music-reactive video generator, with Gemma 4 as the reasoning engine across 4 pipeline stages plus a vision validator

| Stage | num_predict | num_ctx | temperature | Purpose |
|-------|-------------|---------|-------------|---------|
| Mood Analysis | 4096 | 16384 | 0.4-0.6 | Analyze CLAP descriptors -> narratives + boundary adjustments |
| Rate Pass | 512 | 4096 | 0.3-0.5 | Choose visual pacing rate per music segment |
| Storyboard | 2048 | 4096 | 0.6-0.8 | Generate SDXL prompts per music segment |
| Batch Expansion | 2048 | default | 0.7 | Interpolate between scene prompts over time |
| Vision Validator | 256 | default | 0.2 | Critique generated frames (queued for disable) |

### Key Patterns

1. **No tool calling used.** All Gemma interaction is single-turn generate with JSON requested in the prompt.
2. **Client-side JSON extraction:**
   ```python
   import json

   body = response["response"]  # raw text of the generate response
   start = body.find("{")       # first opening brace
   end = body.rfind("}")        # last closing brace
   if start == -1 or end < start:
       raise ValueError("no JSON object in model response")
   obj = json.loads(body[start:end + 1])
   ```
3. **Temperature ramping on retry:** Base temp + bump per attempt. Conservative first, creative on retry.
4. **`think: false` everywhere.** Explicitly set on every call. Critical for budget control.
5. **`format_json: false` everywhere.** Enabling it causes infinite loops on nested schemas.
6. **Model pinning:** `keep_alive=-1` to prevent GPU eviction during long SDXL pauses.
7. **Explicit num_ctx:** Added after discovering that Ollama defaults to 2048, which truncated mood analyzer prompts on long tracks.
8. **Banned vocabulary in prompts:** List of cliché words (cinematic, dramatic, ethereal...) passed to Gemma to avoid generic output.
9. **Vision for image critique:** Base64-encoded PNG -> structured SCORE/ISSUE/REASON output parsed by regex. Works, but over-rejects on subjective quality.

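
Patterns 2 and 3 combine naturally: extract the JSON client-side and, when parsing fails, retry at a slightly higher temperature. The sketch below assumes a caller-supplied `generate(prompt, temperature)` function that returns the raw response text. The names `extract_json`, `generate_json`, `base_temp`, `bump`, and `max_attempts` are hypothetical, and the default values are illustrative rather than taken from AI Visualizer.

```python
import json
from typing import Any, Callable, Optional


def extract_json(body: str) -> dict[str, Any]:
    """Parse the outermost {...} span of a model response."""
    start = body.find("{")
    end = body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(body[start:end + 1])


def generate_json(
    generate: Callable[[str, float], str],
    prompt: str,
    base_temp: float = 0.4,
    bump: float = 0.1,
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Call `generate`, raising the temperature after each failed parse."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Conservative on the first try, more creative on each retry.
        temperature = base_temp + bump * attempt
        body = generate(prompt, temperature)
        try:
            return extract_json(body)
        except ValueError as error:  # json.JSONDecodeError is a ValueError
            last_error = error
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Passing `generate` in as a parameter keeps the retry logic independent of the HTTP client, so it can be exercised with a stub instead of a live model.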

---

## Common Settings Across Both Projects

```json
{
  "model": "gemma4:26b",
  "think": false,
  "options": {
    "num_ctx": 4096,
    "num_predict": 2048,
    "temperature": 0.5
  },
  "keep_alive": "30m"
}
```

Adjust `num_ctx` and `num_predict` upward for your payload size. These are safe minimums.
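
One way to apply this baseline is to layer per-call overrides on top of it before sending. The sketch below only builds the request body and makes no network call. The helper name `build_request` is hypothetical, the top-level `prompt` field assumes a single-turn generate request, and the example override values are the Rate Pass figures from the AI Visualizer table.

```python
import copy
from typing import Any

# Baseline copied from the common settings shown above.
BASE_REQUEST: dict[str, Any] = {
    "model": "gemma4:26b",
    "think": False,
    "options": {"num_ctx": 4096, "num_predict": 2048, "temperature": 0.5},
    "keep_alive": "30m",
}


def build_request(prompt: str, **option_overrides: Any) -> dict[str, Any]:
    """Return a request body with `options` overrides layered on the baseline."""
    request = copy.deepcopy(BASE_REQUEST)  # never mutate the shared baseline
    request["prompt"] = prompt
    request["options"].update(option_overrides)
    return request


# Example: a short, low-temperature call in the style of the Rate Pass stage.
rate_pass = build_request(
    "Choose a pacing rate for this segment.",
    num_predict=512,
    temperature=0.3,
)
```

The deep copy matters because `options` is a nested dict; a shallow copy would let one stage's overrides leak into every later call.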