
# Gemma 4 — Canonical Tooling Corpus

Actual scripts, notebooks, model cards, and configs downloaded from Google, Hugging Face, and the canonical framework maintainers. Populated 2026-04-18 by parallel research across five lanes. 147 files, ~14 MB.

**Triage: read the subdirectory README that matches your task, not this one.** This file is an index.

## Directory map

| Dir | What's there | When to open it |
|-----|--------------|-----------------|
| `google-official/` | `google-deepmind/gemma` JAX/Flax examples, `google/gemma_pytorch` scripts, `gemma.cpp` README + `gemma_api_server` docs, `google-gemma/cookbook` notebooks, official ai.google.dev HTML snapshots, Gemma 3 tech report PDF | Before trusting any non-Google source; when you need the authoritative prompt format or function-calling spec |
| `huggingface/` | All 8 `google/gemma-4-*` model cards, chat-template `.jinja` files, `tokenizer_config.json` (with response-schema regex), transformers `gemma4/` source, official Gemma 4 Spaces `app.py`, HF launch blog posts | Before writing any transformers / `processor` integration; for the canonical chat-template handling |
| `inference-frameworks/` | Comparison table across vLLM / llama.cpp / MLX / Keras-hub / TGI / Gemini API / Vertex AI. Real launch commands in `run_commands.sh`, 9 code snippets under `snippets/` | When picking a non-Ollama runtime; when you need audio/video input (Ollama doesn't expose it) |
| `gemma-family/` | 12 per-variant briefs: ShieldGemma 2, CodeGemma, PaliGemma 2, RecurrentGemma, DataGemma, MedGemma, TxGemma, EmbeddingGemma, TranslateGemma, FunctionGemma, DolphinGemma, SignGemma + `index.md` | When scoping a project that needs a specialized sister model (embeddings, safety, vision-grounded, translation, tool routing) |
| `fine-tuning/` | Unsloth Gemma 4 notebooks (text/vision/audio/GRPO), Axolotl Gemma 4 YAMLs (including 26B-A4B MoE), TRL reference scripts, Google cookbook fine-tune notebooks, `recipe-recommendation.md` with Seth's homelab-specific path | Before spending a dollar on cloud GPU or starting any Gemma 4 fine-tune |

## Findings that update / contradict the existing corpus

Each finding below was reviewed against the top-level corpus docs on 2026-04-18 and is
marked **[merged: file]** where it landed, or **[flagged]** if it is informational
only. Scan here for provenance; read the CORPUS / SYNTHESIS / GOTCHAS files for the
authoritative working text.

1. **Prompt-token format changed in Gemma 4.** Gemma 1/2/3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses asymmetric pipe-brackets: `<|turn>user\n ... <turn|>`. Also new: `<|think|>`, `<|channel>thought...<channel|>`, `<|tool>`, `<|tool_call>`, `<|tool_response>` (+ inverses), `<|image>`, `<|audio>`, and string delimiter `<|"|>`. Canonical source: `huggingface/model-cards/gemma-4-31B-it-chat_template.jinja` and `google-official/docs/ai-google-dev_prompt_formatting_gemma4.html`. **[merged: CORPUS_tool_calling_format.md — added Chat Template Context section]**

2. **`google/gemma_pytorch` is abandoned for Gemma 4.** Last push 2025-05-30; the variants validator rejects Gemma 4 IDs. Use HF `transformers` or `google-deepmind/gemma` (JAX/Flax) instead. **[merged: GOTCHAS.md — MEDIUM severity section]**

3. **`gemma.cpp` ships a Gemini-API-compatible local HTTP server** (`gemma_api_server`, endpoint `POST /v1beta/models/<model>:generateContent`, SSE streaming). Google-authored alternative to Ollama that speaks the real Gemini REST API. See `google-official/gemma-cpp/API_SERVER_README.md`. **[flagged — not merged; no current homelab use case, but worth knowing it exists]**

4. **Transformers exposes `AutoModelForMultimodalLM` (new AutoClass)** — not `AutoModelForCausalLM`. It also exposes `processor.parse_response(..., response_schema=...)` driven from `tokenizer_config.json`. Pin: `transformers>=5.5.4`. **[merged: CORPUS_tool_calling_format.md — HF transformers Alternative section]**

5. **Gemma 4 breaks Flash Attention** (training only). FA2's max head_dim is 256, FA4's is 128, and Gemma 4's global head_dim is 512. Use SDPA (PyTorch scaled dot-product attention) or Flex Attention instead. Ollama / vLLM inference is unaffected, since both already use SDPA. **[merged: GOTCHAS.md — under LOW: Fine-Tuning Ecosystem Issues]**

6. **The 26B variant is a MoE**: `gemma-4-26B-A4B`, 25.2B total / 3.8B active parameters, with 8 of 128 experts routed per token plus 1 shared expert. Q4_K_M inference is fine (standard practice for MoE models; Mixtral and DeepSeek are commonly run at the same quant). The "MoE quality degrades at 4-bit" concern applies to training only. **[merged: CORPUS_ollama_variants.md — annotated 26b row; GOTCHAS.md — training caveat in fine-tuning section]**

7. **No Gemma 4 technical report PDF exists yet** as of 2026-04-18. DeepMind repo says "Gemma 4 (Coming soon)". Gemma 3 report is at `google-official/tech-report/Gemma3Report.pdf`. **[flagged — nothing to merge; check back mid-2026]**

8. **No Gemma-4-generation specialized siblings yet.** ShieldGemma 2 is Gemma 3-based, CodeGemma on Gemma 1, PaliGemma 2 on Gemma 2, EmbeddingGemma on Gemma 3, etc. All are still usable; just don't confuse the sibling generation with the base-model generation. Historical lag is 3–6 months; expect siblings-on-4 mid-to-late 2026. **[merged: CORPUS_capabilities.md — "What Gemma 4 Does NOT Do" now points at EmbeddingGemma for retrieval; full catalog in `gemma-family/index.md`]**

9. **No Gemma-4-specific TRL script in `huggingface/trl` yet.** The HF blog says "fully supported," but the SFT/DPO/GRPO examples still use Gemma 3 model IDs; swapping in a Gemma 4 `model_id` works as a drop-in. The only Gemma-4-dedicated TRL example today is `huggingface-gemma-recipes/carla_vlm_gemma.py` (VLM GRPO). **[flagged — only relevant if fine-tuning]**

10. **HF Spaces `app.py` files are the shortest Gemma 4 inference examples**; Google and HF both use them as reference implementations. See `huggingface/spaces/huggingface-projects_gemma-4-{31b,e4b}-it-app.py`. **[flagged — reference material]**

11. **Native object detection with bbox output.** Prompt `"Detect the X in this image"` → structured `{box_2d: [ymin, xmin, ymax, xmax]}` in 1000×1000-normalized coords. First-class Gemma 4 capability, no separate detection model needed. **[merged: CORPUS_capabilities.md — Native Object Detection section]**

12. **Native `system` role support.** New in Gemma 4; Gemma 3 had no system role and prepended the system prompt to the first user turn. This matters if you were hand-building the prompt string; it is invisible if you use Ollama `system` or HF `apply_chat_template`. **[merged: CORPUS_capabilities.md — Text section]**

13. **Audio input is E-series only AND not via Ollama.** Requires llama.cpp's `mmproj-*-E*B-it-*.gguf` projector or vLLM's `input_features_padded`. **[merged: CORPUS_ollama_variants.md and CORPUS_capabilities.md]**
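For finding 1, a minimal Python sketch of hand-building a Gemma 4 prompt string, which should only be necessary when bypassing `apply_chat_template` and Ollama. Only the `<|turn>` / `<turn|>` delimiters come from this README; the newline placement, the `model` role name, the trailing generation prompt, and the absence of a BOS token are **assumptions** carried over from Gemma 3's layout, so diff the output against the jinja template before trusting it. `render_gemma4_prompt` is a hypothetical helper.

```python
# Hypothetical hand-rolled renderer for Gemma 4 chat turns.
# ASSUMPTIONS (verify against gemma-4-31B-it-chat_template.jinja):
#   * each turn renders as "<|turn>{role}\n{content}<turn|>\n"
#   * the assistant role is spelled "model", as in Gemma 1-3
#   * generation is primed with "<|turn>model\n"; no BOS token is added here

TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"

# Map common chat roles onto the role names written into the prompt.
ROLE_NAMES = {"system": "system", "user": "user", "assistant": "model", "model": "model"}


def render_gemma4_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    parts = []
    for msg in messages:
        role = ROLE_NAMES.get(msg["role"])
        if role is None:
            raise ValueError(f"unsupported role: {msg['role']!r}")
        # System messages get their own turn (native in Gemma 4, per finding 12).
        parts.append(f"{TURN_OPEN}{role}\n{msg['content']}{TURN_CLOSE}\n")
    if add_generation_prompt:
        parts.append(f"{TURN_OPEN}model\n")
    return "".join(parts)


prompt = render_gemma4_prompt([
    {"role": "system", "content": "Answer tersely."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```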
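For finding 3, a standard-library sketch of a request to `gemma_api_server`, assuming it accepts the Gemini REST request body (`contents` → `parts` → `text`). The base URL `http://localhost:8080` and the model name `gemma` are hypothetical placeholders; use whatever the server was launched with. The helper only builds the request, so it can be inspected without a running server.

```python
import json
import urllib.request

# HYPOTHETICAL address: substitute the host/port gemma_api_server was started with.
BASE_URL = "http://localhost:8080"


def build_generate_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a Gemini-style generateContent request."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{base_url}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("gemma", "Say hello.")  # "gemma" is a placeholder model name
print(req.get_method(), req.full_url)
```

With the server up, `urllib.request.urlopen(req)` sends it. SSE streaming is not covered by this sketch.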
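For finding 5, a small guard a fine-tuning script could use when choosing the `attn_implementation` argument to transformers' `from_pretrained`. This is a sketch only: `pick_attn_implementation` is a hypothetical helper, the 256 ceiling is the FA2 figure quoted in finding 5, and the fallback is `"sdpa"` (`"flex_attention"` would be the other option named there).

```python
FA2_MAX_HEAD_DIM = 256  # FA2 head_dim ceiling as quoted in finding 5


def pick_attn_implementation(head_dims) -> str:
    """Choose an attn_implementation string for transformers' from_pretrained.

    A single layer whose head_dim exceeds the FA2 ceiling rules FA2 out for the
    whole model, so only the widest head matters.
    """
    widest = max(head_dims)
    if widest <= FA2_MAX_HEAD_DIM:
        return "flash_attention_2"
    # Fall back to PyTorch scaled dot-product attention (SDPA).
    return "sdpa"


# Any model containing Gemma 4's 512-wide global heads lands on SDPA.
print(pick_attn_implementation([256, 512]))
```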
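Finding 6's numbers imply a split worth making explicit: per-token compute scales with the 3.8B active parameters, but every expert must stay resident, so weight memory scales with the full 25.2B. A back-of-envelope sketch, assuming roughly 4.85 bits per weight for Q4_K_M (an approximate average; real GGUF sizes depend on which tensors are kept at higher precision) and ignoring KV cache and runtime overhead:

```python
TOTAL_PARAMS_B = 25.2   # gemma-4-26B-A4B total parameters, in billions (finding 6)
ACTIVE_PARAMS_B = 3.8   # parameters active per token, in billions (finding 6)

# ASSUMPTION: rough average bits/weight for a Q4_K_M GGUF; real files vary.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_b * bits_per_weight / 8


active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B               # share of weights used per token
resident_gb = weight_gb(TOTAL_PARAMS_B, Q4_K_M_BITS_PER_WEIGHT)  # all experts must be loaded

print(f"active per token: {active_fraction:.1%}")
print(f"Q4_K_M weights:   ~{resident_gb:.1f} GB")
```

Under these assumptions that is about 15.1% of the weights touched per token, against roughly 15.3 GB that must fit in (V)RAM.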
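For finding 11, a sketch of turning a detection reply into pixel coordinates. It assumes the reply is a JSON array of objects with a `box_2d` key and an optional `label` key, possibly wrapped in a markdown code fence; the `label` key and the fence handling are assumptions, and both helper names are hypothetical. The coordinate order and the 0–1000 grid are as stated above.

```python
import json

FENCE = "`" * 3  # three backticks, spelled out so it cannot close an enclosing markdown fence


def box_2d_to_pixels(box_2d, width, height):
    """Map [ymin, xmin, ymax, xmax] on a 0-1000 grid to pixel (x0, y0, x1, y1)."""
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so exact grid values stay exact.
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


def parse_detections(model_text, width, height):
    """Return [(label, (x0, y0, x1, y1)), ...] from a detection reply."""
    text = model_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. a json-tagged fence) and the closing fence.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    # ASSUMPTION: a JSON array of objects carrying "box_2d" and an optional "label".
    return [
        (d.get("label"), box_2d_to_pixels(d["box_2d"], width, height))
        for d in json.loads(text)
    ]


print(parse_detections('[{"box_2d": [100, 200, 500, 800], "label": "cat"}]', 640, 480))
```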

## Immediate homelab plug-ins (from the gemma-family research)

- **EmbeddingGemma (308M)** — 100+ languages, Matryoshka to 128d. Drop-in upgrade from `nomic-embed-text` on both hosts.
- **FunctionGemma (270M)** — cheap tool-router in front of `mortdecai:*` (latency win on the hot path).
- **PaliGemma 2 3B-448** — vision grounding with bbox output for AI_Visualizer (CT 167), alongside SDXL.
- **TranslateGemma 4B** — useful for the family history agent (German/Polish sources).
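"Matryoshka to 128d" means an EmbeddingGemma vector can be shortened after the fact by keeping its leading components. A minimal sketch of that truncation, assuming embeddings arrive as plain Python lists of the model's full-size output (768 dimensions is our understanding of EmbeddingGemma's default; confirm on its model card). `truncate_embedding` and `cosine_similarity` are hypothetical helpers.

```python
import math


def truncate_embedding(vec, dim):
    """Matryoshka truncation: keep the leading `dim` components, then L2-renormalize."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Plain dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


short = truncate_embedding([3.0, 4.0, 12.0], 2)
print(short, cosine_similarity(short, short))
```

Queries and stored vectors must be truncated to the same dimension, and vectors from `nomic-embed-text` live in a different space, so switching models means re-embedding the existing index rather than mixing the two.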

## Source-url discipline

Every URL in the subdirectory READMEs was fetched and verified rather than reconstructed from training data. If a downloaded file turns out to be wrong, `git log` shows when it was pulled, and the agent transcripts record the source commit. Upstream repos can and do rename paths (e.g. `google-gemini/gemma-cookbook` → `google-gemma/cookbook`). Re-verify before citing externally.