gemma4-research/tooling
Mortdecai 5775978899 docs: merge tooling findings into SYNTHESIS/GOTCHAS/CORPUS_* and add handoff
Patches the top-level corpus docs with the 13 findings flagged during the
2026-04-18 canonical tooling research pass. tooling/README.md now marks each
finding [merged: <file>] or [flagged] for provenance.

- CORPUS_ollama_variants.md: annotate gemma4:26b as MoE (25.2B total / 3.8B
  active, 8-of-128 experts + 1 shared). Note Q4_K_M inference is standard
  (the "MoE quality degrades at 4-bit" caveat is training-only). Add note
  that audio on E-series is NOT available via Ollama — llama.cpp mmproj
  or vLLM only.
- CORPUS_capabilities.md: native system role, configurable thinking mode,
  first trained tool use (vs Gemma 1/2/3 proof-of-concept), native object
  detection with bbox output in 1000x1000 coords, pointer to EmbeddingGemma
  for retrieval (Gemma 4 has no embedding mode).
- CORPUS_tool_calling_format.md: add Chat Template Context section
  documenting the <|turn>/<turn|> asymmetric brackets (new in Gemma 4,
  replaced <start_of_turn>/<end_of_turn>) plus <|think>, <|channel>,
  <|image>, <|audio> tokens. Add HF transformers Alternative section
  showing processor.parse_response with response_schema.
- GOTCHAS.md: add MEDIUM gotcha for abandoned google/gemma_pytorch (no
  Gemma 4 support since 2025-05-30). Expand fine-tuning section with FA2/FA4
  head_dim=512 break, fused LoRA kernel issues, 26B A4B training-quant
  guidance, new tool-call tokens as learned embeddings.
- SYNTHESIS.md: add banner pointing to tooling/ for canonical upstream
  material. Add embeddinggemma row to Model Selection table.

Also:
- Add .gitignore excluding .backup/ (local scratch per global CLAUDE.md
  convention, not needed in tracked history) and __pycache__/.
- Add .claude/handoffs/2026-04-18-canonical-tooling-research.md so future
  sessions can pick up cold — facts verified, open threads, what changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:48:26 -04:00

Gemma 4 — Canonical Tooling Corpus

Actual scripts, notebooks, model cards, and configs downloaded from Google, Hugging Face, and the canonical framework maintainers. Populated 2026-04-18 by parallel research across five lanes. 147 files, ~14 MB.

Triage: read the subdirectory README that matches your task, not this one. This file is an index.

Directory map

| Dir | What's there | When to open it |
| --- | --- | --- |
| google-official/ | google-deepmind/gemma JAX/Flax examples, google/gemma_pytorch scripts, gemma.cpp README + gemma_api_server docs, google-gemma/cookbook notebooks, official ai.google.dev HTML snapshots, Gemma 3 tech report PDF | Before trusting any non-Google source; when you need the authoritative prompt format or function-calling spec |
| huggingface/ | All 8 google/gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json (with response-schema regex), transformers gemma4/ source, official Gemma 4 Spaces app.py, HF launch blog posts | Before writing any transformers / processor integration; for the canonical chat-template handling |
| inference-frameworks/ | Comparison table across vLLM / llama.cpp / MLX / Keras-hub / TGI / Gemini API / Vertex AI. Real launch commands in run_commands.sh, 9 code snippets under snippets/ | When picking a non-Ollama runtime; when you need audio/video input (Ollama doesn't expose it) |
| gemma-family/ | 12 per-variant briefs: ShieldGemma 2, CodeGemma, PaliGemma 2, RecurrentGemma, DataGemma, MedGemma, TxGemma, EmbeddingGemma, TranslateGemma, FunctionGemma, DolphinGemma, SignGemma + index.md | When scoping a project that needs a specialized sister model (embeddings, safety, vision-grounded, translation, tool routing) |
| fine-tuning/ | Unsloth Gemma 4 notebooks (text/vision/audio/GRPO), Axolotl Gemma 4 YAMLs (including 26B-A4B MoE), TRL reference scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md with Seth's homelab-specific path | Before spending a dollar on cloud GPU or starting any Gemma 4 fine-tune |

Findings that update / contradict the existing corpus

These were merged into the top-level corpus docs on 2026-04-18 — each finding below is marked [merged: file] where it landed, or [flagged] if it's informational only. Scan here for provenance; read the CORPUS / SYNTHESIS / GOTCHAS files for the authoritative working text.

  1. Prompt-token format changed in Gemma 4. Gemma 1/2/3 used <start_of_turn>user ... <end_of_turn>. Gemma 4 uses asymmetric pipe-brackets: <|turn>user\n ... <turn|>. Also new: <|think|>, <|channel>thought...<channel|>, <|tool>, <|tool_call>, <|tool_response> (+ inverses), <|image>, <|audio>, and string delimiter <|"|>. Canonical source: huggingface/model-cards/gemma-4-31B-it-chat_template.jinja and google-official/docs/ai-google-dev_prompt_formatting_gemma4.html. [merged: CORPUS_tool_calling_format.md — added Chat Template Context section]

  2. google/gemma_pytorch is abandoned for Gemma 4. Last push 2025-05-30; the variants validator rejects Gemma 4 IDs. Use HF transformers or google-deepmind/gemma (JAX/Flax) instead. [merged: GOTCHAS.md — MEDIUM severity section]

  3. gemma.cpp ships a Gemini-API-compatible local HTTP server (gemma_api_server, endpoint POST /v1beta/models/<model>:generateContent, SSE streaming). Google-authored alternative to Ollama that speaks the real Gemini REST API. See google-official/gemma-cpp/API_SERVER_README.md. [flagged — not merged; no current homelab use case, but worth knowing it exists]

  4. Transformers exposes AutoModelForMultimodalLM (new AutoClass) — not AutoModelForCausalLM. It also exposes processor.parse_response(..., response_schema=...) driven from tokenizer_config.json. Pin: transformers>=5.5.4. [merged: CORPUS_tool_calling_format.md — HF transformers Alternative section]

  5. Gemma 4 breaks Flash Attention (training only). FA2's max head_dim is 256, FA4's is 128, and Gemma 4's global head_dim is 512. Use SDP or Flex Attention for training. This does not affect Ollama or vLLM inference, which already use SDP. [merged: GOTCHAS.md — under LOW: Fine-Tuning Ecosystem Issues]

  6. The 26B variant is a MoE: gemma-4-26B-A4B, 25.2B total / 3.8B active, 8 experts of 128 + 1 shared. Q4_K_M inference is fine (standard for MoE; Mixtral/DeepSeek ship the same). The "MoE quality degrades at 4-bit" concern is training-time only. [merged: CORPUS_ollama_variants.md — annotated 26b row; GOTCHAS.md — training caveat in fine-tuning section]

  7. No Gemma 4 technical report PDF exists yet as of 2026-04-18. DeepMind repo says "Gemma 4 (Coming soon)". Gemma 3 report is at google-official/tech-report/Gemma3Report.pdf. [flagged — nothing to merge; check back mid-2026]

  8. No Gemma-4-generation specialized siblings yet. ShieldGemma 2 is Gemma 3-based, CodeGemma on Gemma 2, PaliGemma 2 on Gemma 2, EmbeddingGemma on Gemma 3, etc. All still usable; just don't confuse the sibling generation with the base-model generation. Historical lag is 3-6 months; expect siblings-on-4 mid-to-late 2026. [merged: CORPUS_capabilities.md — "What Gemma 4 Does NOT Do" now points at EmbeddingGemma for retrieval; full catalog in gemma-family/index.md]

  9. No Gemma-4-specific TRL script exists in huggingface/trl yet. The HF blog says "fully supported," but the SFT/DPO/GRPO examples still use Gemma 3 model IDs. Swapping in a Gemma 4 model_id works as a drop-in. The only Gemma-4-dedicated TRL example today is huggingface-gemma-recipes/carla_vlm_gemma.py (VLM GRPO). [flagged — only relevant if fine-tuning]

  10. HF Spaces app.py files are the shortest Gemma 4 inference examples; Google and HF both use them as reference. See huggingface/spaces/huggingface-projects_gemma-4-{31b,e4b}-it-app.py. [flagged — reference material]

  11. Native object detection with bbox output. Prompt "Detect the X in this image" → structured {box_2d: [ymin, xmin, ymax, xmax]} in 1000×1000-normalized coords. First-class Gemma 4 capability, no separate detection model needed. [merged: CORPUS_capabilities.md — Native Object Detection section]

  12. Native system role support. New in Gemma 4 — Gemma 3 prepended system as a user turn. Matters if you were hand-building the prompt string; invisible if you use Ollama system or HF apply_chat_template. [merged: CORPUS_capabilities.md — Text section]

  13. Audio input is E-series only AND not via Ollama. Requires llama.cpp's mmproj-*-E*B-it-*.gguf projector or vLLM's input_features_padded. [merged: CORPUS_ollama_variants.md and CORPUS_capabilities.md]
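
Sketch for finding 11: mapping a box_2d result back to pixel coordinates. This assumes the 0-1000 grid scales linearly onto the original image's width and height (no letterboxing or padding), which the notes above do not confirm. box_2d_to_pixels is a hypothetical helper, not part of any Gemma, Ollama, or HF API, and parsing the model's structured output into a Python list is left out.

```python
from typing import Sequence, Tuple

GRID = 1000  # finding 11: coordinates are normalized to a 1000x1000 grid


def box_2d_to_pixels(
    box_2d: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert [ymin, xmin, ymax, xmax] on the 0-1000 grid to
    (left, top, right, bottom) in pixels.

    Assumption: the grid maps linearly onto the full image.
    """
    if len(box_2d) != 4:
        raise ValueError("box_2d must have exactly four values")
    if any(v < 0 or v > GRID for v in box_2d):
        raise ValueError("box_2d values must lie in [0, 1000]")

    # Note the y-first ordering in the model output.
    ymin, xmin, ymax, xmax = box_2d

    left = round(xmin * width / GRID)
    top = round(ymin * height / GRID)
    right = round(xmax * width / GRID)
    bottom = round(ymax * height / GRID)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box on a 1920x1080 frame.
    print(box_2d_to_pixels([100, 200, 500, 600], width=1920, height=1080))
```

The return order is x-first (left, top, right, bottom), which is what Pillow's ImageDraw.rectangle expects; the model's own output is y-first, so the swap is deliberate.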

Immediate homelab plug-ins (from the gemma-family research)

  • EmbeddingGemma (308M) — 100+ languages, Matryoshka to 128d. Drop-in upgrade from nomic-embed-text on both hosts.
  • FunctionGemma (270M) — cheap tool-router in front of mortdecai:* (latency win on hot path).
  • PaliGemma 2 3B-448 — vision grounding with bbox output for AI_Visualizer / AI visualizer CT 167 alongside SDXL.
  • TranslateGemma 4B — useful for the family history agent (German/Polish sources).
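
Sketch for the EmbeddingGemma bullet: Matryoshka-style truncation generally means keeping the leading components of the full vector and re-normalizing before cosine comparison. truncate_embedding below is a hypothetical plain-Python helper, not EmbeddingGemma's or Ollama's API, and its default of 128 simply mirrors the "Matryoshka to 128d" note. Whether the serving stack can truncate for you is not covered here.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int = 128) -> List[float]:
    """Keep the first `dim` components of `vec` and L2-normalize the result."""
    if dim <= 0 or dim > len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for unit-length inputs this is just the dot product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)
```

Truncate both the stored document vectors and the query vector to the same dim; mixing dimensions in one index will raise in cosine above and would silently misbehave in most vector stores.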

Source-url discipline

Every URL in the subdirectory READMEs was fetched and verified, not reconstructed from training. If a downloaded file is wrong, git log will show when it was pulled; the agent transcripts are the record of the source commit. Upstream repos can and do rename paths (see: google-gemini/gemma-cookbook → google-gemma/cookbook). Re-verify before citing externally.