Five-lane parallel research pass. Each subdir under tooling/ has its own README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts, gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json, transformers gemma4/ source, launch blog posts, official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2, Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE), TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md (not applied), notably the new <|turn>/<turn|> prompt format, gemma_pytorch abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM, FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization: flagged "secrets" are base64 notebook cell outputs and example Ed25519 keys in the HDP agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Gemma 4 — Canonical Tooling Corpus
Actual scripts, notebooks, model cards, and configs downloaded from Google, Hugging Face, and the canonical framework maintainers. Populated 2026-04-18 by parallel research across five lanes. 147 files, ~14 MB.
Triage: read the subdirectory README that matches your task, not this one. This file is an index.
Directory map
| Dir | What's there | When to open it |
|---|---|---|
| `google-official/` | google-deepmind/gemma JAX/Flax examples, google/gemma_pytorch scripts, gemma.cpp README + gemma_api_server docs, google-gemma/cookbook notebooks, official ai.google.dev HTML snapshots, Gemma 3 tech report PDF | Before trusting any non-Google source; when you need the authoritative prompt format or function-calling spec |
| `huggingface/` | All 8 google/gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json (with response-schema regex), transformers gemma4/ source, official Gemma 4 Spaces app.py, HF launch blog posts | Before writing any transformers / processor integration; for the canonical chat-template handling |
| `inference-frameworks/` | Comparison table across vLLM / llama.cpp / MLX / Keras-hub / TGI / Gemini API / Vertex AI. Real launch commands in run_commands.sh, 9 code snippets under snippets/ | When picking a non-Ollama runtime; when you need audio/video input (Ollama doesn't expose it) |
| `gemma-family/` | 12 per-variant briefs: ShieldGemma 2, CodeGemma, PaliGemma 2, RecurrentGemma, DataGemma, MedGemma, TxGemma, EmbeddingGemma, TranslateGemma, FunctionGemma, DolphinGemma, SignGemma + index.md | When scoping a project that needs a specialized sister model (embeddings, safety, vision-grounded, translation, tool routing) |
| `fine-tuning/` | Unsloth Gemma 4 notebooks (text/vision/audio/GRPO), Axolotl Gemma 4 YAMLs (including 26B-A4B MoE), TRL reference scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md with Seth's homelab-specific path | Before spending a dollar on cloud GPU or starting any Gemma 4 fine-tune |
Findings that update / contradict the existing corpus
These are real gaps worth patching into SYNTHESIS.md, GOTCHAS.md, or CORPUS_tool_calling_format.md. Flagged here, not applied — the user asked for research, not a rewrite.
- Prompt-token format changed in Gemma 4. Gemma 1/2/3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses asymmetric pipe-brackets: `<|turn>user\n ... <turn|>`. Also new: `<|think|>`, `<|channel>thought...<channel|>`, `<|tool>`, `<|tool_call>`, `<|tool_response>` (+ inverses), `<|image>`, `<|audio>`, and string delimiter `<|"|>`. The existing `CORPUS_tool_calling_format.md` documents the tool tokens but doesn't reflect the turn-token change or the thinking/channel tokens. Canonical source: `huggingface/model-cards/gemma-4-31B-it-chat_template.jinja` and `google-official/docs/ai-google-dev_prompt_formatting_gemma4.html`.
- `google/gemma_pytorch` is abandoned for Gemma 4. Last push 2025-05-30; the variants validator rejects Gemma 4 IDs. Anyone pointing at it as the PyTorch reference is wrong: use HF `transformers` or `google-deepmind/gemma` (JAX/Flax) instead.
- `gemma.cpp` ships a Gemini-API-compatible local HTTP server (`gemma_api_server`, endpoint `POST /v1beta/models/<model>:generateContent`, SSE streaming). This is a Google-authored alternative to Ollama that speaks the real Gemini REST API, possibly the single most interesting discovery in this research pass. See `google-official/gemma-cpp/API_SERVER_README.md`.
- Transformers exposes `AutoModelForMultimodalLM` (new AutoClass), not `AutoModelForCausalLM`. It also exposes `processor.parse_response(..., response_schema=...)` driven from `tokenizer_config.json`, which replaces the hand-rolled regex in the current `CORPUS_tool_calling_format.md`. Pin: `transformers>=5.5.4`.
- Gemma 4 breaks Flash Attention. FA2's max head_dim is 256, FA4's is 128, and Gemma 4's global head_dim is 512. Use SDP or Flex Attention. Axolotl hard-codes `sdp_attention: true` for Gemma 4. This belongs in `GOTCHAS.md`.
- The 26B variant is a MoE: `gemma-4-26B-A4B` (A4B = 4B active per token). Quantization rules differ: Unsloth says use 16-bit LoRA, not 4-bit QLoRA, for acceptable quality. Axolotl's ScatterMoE + expert-LoRA config is the only tool validated for 4-bit MoE training. Worth a line in `CORPUS_ollama_variants.md`.
- No Gemma 4 technical report PDF exists yet as of 2026-04-18. DeepMind repo says "Gemma 4 (Coming soon)". Gemma 3 report (downloaded at `google-official/tech-report/Gemma3Report.pdf`) remains the closest authoritative family citation.
- No `google/gemma-4-*` specialized siblings yet: ShieldGemma, CodeGemma, PaliGemma, MedGemma, DataGemma are all still on Gemma 2 or 3 base. Historical lag is 3–6 months; expect siblings-on-4 mid-to-late 2026.
- No Gemma-4-specific TRL script in `huggingface/trl` yet. HF blog says "fully supported," but the SFT/DPO/GRPO examples are still on Gemma 3 model IDs. A drop-in `model_id` swap works. The only Gemma-4-dedicated TRL example today is `huggingface-gemma-recipes/carla_vlm_gemma.py` (VLM GRPO).
- HF Spaces `app.py` files are the shortest Gemma 4 inference examples; Google and HF both use them as a reference. See `huggingface/spaces/huggingface-projects_gemma-4-{31b,e4b}-it-app.py`.
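
The turn-token change in the first finding can be illustrated with a minimal sketch that renders messages using the `<|turn>` / `<turn|>` delimiters quoted above. The helper names `render_turn` and `render_prompt` are hypothetical, and two details are assumptions carried over from earlier Gemma templates rather than confirmed here: the `model` role name and the newline between turns. The shipped chat-template `.jinja` file remains the authority; prefer it over hand-built strings in real code.

```python
# Turn delimiters as reported in the finding above.
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def render_turn(role: str, content: str) -> str:
    """Wrap one message as <|turn>{role}\\n{content}<turn|>."""
    return f"{TURN_OPEN}{role}\n{content}{TURN_CLOSE}"


def render_prompt(messages, generation_role: str = "model") -> str:
    """Join rendered turns and open a final turn for the model to complete.

    ASSUMPTION: the "model" role name and the newline separator between
    turns are not confirmed for Gemma 4; check the chat template.
    """
    turns = [render_turn(m["role"], m["content"]) for m in messages]
    turns.append(f"{TURN_OPEN}{generation_role}\n")
    return "\n".join(turns)


prompt = render_prompt([{"role": "user", "content": "Hello"}])
print(prompt)
```

A string built this way is mainly useful for eyeballing differences against the old `<start_of_turn>` format when auditing existing corpus docs.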
Immediate homelab plug-ins (from the gemma-family research)
- EmbeddingGemma (308M): 100+ languages, Matryoshka to 128d. Drop-in upgrade from `nomic-embed-text` on both hosts.
- FunctionGemma (270M): cheap tool-router in front of `mortdecai:*` (latency win on hot path).
- PaliGemma 2 3B-448: vision grounding with bbox output for AI_Visualizer / AI visualizer CT 167 alongside SDXL.
- TranslateGemma 4B: useful for the family history agent (German/Polish sources).
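
For the "Matryoshka to 128d" note on EmbeddingGemma, the usual way to use a Matryoshka-trained embedding at a smaller size is to keep the leading dimensions and re-normalize. The sketch below shows that step on plain Python lists; `truncate_embedding` and `cosine` are hypothetical helper names, and nothing here calls the model itself.

```python
import math


def truncate_embedding(vec, dim=128):
    """Keep the first `dim` components and rescale to unit length."""
    if dim <= 0 or dim > len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Toy 3-d vector truncated to 2-d; real embeddings are much wider.
short = truncate_embedding([3.0, 4.0, 12.0], dim=2)
print(short)
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarity, which matters if the vector store assumes unit-length inputs.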
Source-url discipline
Every URL in the subdirectory READMEs was fetched and verified, not reconstructed from training. If a downloaded file is wrong, git log will show when it was pulled; the agent transcripts are the record of the source commit. Upstream repos can and do rename paths (see: google-gemini/gemma-cookbook → google-gemma/cookbook). Re-verify before citing externally.
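
One way to do the re-verification step is sketched below: pull the URLs out of a README and report any that no longer answer with HTTP 200. The function names are hypothetical, and the status fetcher is passed in as a parameter so the check can be exercised without network access. Note that `urllib` follows redirects, so a renamed repo that still redirects will report 200; this sketch does not detect that case.

```python
import re
import urllib.error
import urllib.request

URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_urls(text):
    """Return unique URLs in first-seen order, trailing punctuation trimmed."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")
        if url not in seen:
            seen.append(url)
    return seen


def http_status(url, timeout=10.0):
    """HEAD the URL; return the status code, or None on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError and timeouts are OSError subclasses
        return None


def find_stale_urls(text, fetch_status=http_status):
    """Return (url, status) pairs for every URL that did not return 200."""
    stale = []
    for url in extract_urls(text):
        status = fetch_status(url)
        if status != 200:
            stale.append((url, status))
    return stale
```

Running `find_stale_urls(open("README.md").read())` against each subdirectory README before citing externally would surface dead links; some hosts reject HEAD requests, so a non-200 result is a prompt to check by hand, not proof that the source is gone.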