Files
gemma4-research/tooling/README.md
T
Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00

51 lines
6.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Gemma 4 — Canonical Tooling Corpus
Actual scripts, notebooks, model cards, and configs downloaded from Google, Hugging Face, and the canonical framework maintainers. Populated 2026-04-18 by parallel research across five lanes. 147 files, ~14 MB.
**Triage: read the subdirectory README that matches your task, not this one.** This file is an index.
## Directory map
| Dir | What's there | When to open it |
|-----|--------------|-----------------|
| `google-official/` | `google-deepmind/gemma` JAX/Flax examples, `google/gemma_pytorch` scripts, `gemma.cpp` README + `gemma_api_server` docs, `google-gemma/cookbook` notebooks, official ai.google.dev HTML snapshots, Gemma 3 tech report PDF | Before trusting any non-Google source; when you need the authoritative prompt format or function-calling spec |
| `huggingface/` | All 8 `google/gemma-4-*` model cards, chat-template `.jinja` files, `tokenizer_config.json` (with response-schema regex), transformers `gemma4/` source, official Gemma 4 Spaces `app.py`, HF launch blog posts | Before writing any transformers / `processor` integration; for the canonical chat-template handling |
| `inference-frameworks/` | Comparison table across vLLM / llama.cpp / MLX / Keras-hub / TGI / Gemini API / Vertex AI. Real launch commands in `run_commands.sh`, 9 code snippets under `snippets/` | When picking a non-Ollama runtime; when you need audio/video input (Ollama doesn't expose it) |
| `gemma-family/` | 12 per-variant briefs: ShieldGemma 2, CodeGemma, PaliGemma 2, RecurrentGemma, DataGemma, MedGemma, TxGemma, EmbeddingGemma, TranslateGemma, FunctionGemma, DolphinGemma, SignGemma + `index.md` | When scoping a project that needs a specialized sister model (embeddings, safety, vision-grounded, translation, tool routing) |
| `fine-tuning/` | Unsloth Gemma 4 notebooks (text/vision/audio/GRPO), Axolotl Gemma 4 YAMLs (including 26B-A4B MoE), TRL reference scripts, Google cookbook fine-tune notebooks, `recipe-recommendation.md` with Seth's homelab-specific path | Before spending a dollar on cloud GPU or starting any Gemma 4 fine-tune |
## Findings that update / contradict the existing corpus
These are real gaps worth patching into `SYNTHESIS.md`, `GOTCHAS.md`, or `CORPUS_tool_calling_format.md`. Flagged here, not applied — the user asked for research, not a rewrite.
1. **Prompt-token format changed in Gemma 4.** Gemma 1/2/3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses asymmetric pipe-brackets: `<|turn>user\n ... <turn|>`. Also new: `<|think|>`, `<|channel>thought...<channel|>`, `<|tool>`, `<|tool_call>`, `<|tool_response>` (+ inverses), `<|image>`, `<|audio>`, and string delimiter `<|"|>`. The existing `CORPUS_tool_calling_format.md` documents the tool tokens but doesn't reflect the turn-token change or the thinking/channel tokens. Canonical source: `huggingface/model-cards/gemma-4-31B-it-chat_template.jinja` and `google-official/docs/ai-google-dev_prompt_formatting_gemma4.html`.
2. **`google/gemma_pytorch` is abandoned for Gemma 4.** Last push 2025-05-30; the variants validator rejects Gemma 4 IDs. Anyone pointing at it as the PyTorch reference is wrong — use HF `transformers` or `google-deepmind/gemma` (JAX/Flax) instead.
3. **`gemma.cpp` ships a Gemini-API-compatible local HTTP server** (`gemma_api_server`, endpoint `POST /v1beta/models/<model>:generateContent`, SSE streaming). This is a Google-authored alternative to Ollama that speaks the real Gemini REST API — possibly the single most interesting discovery in this research pass. See `google-official/gemma-cpp/API_SERVER_README.md`.
4. **Transformers exposes `AutoModelForMultimodalLM` (new AutoClass)** — not `AutoModelForCausalLM`. It also exposes `processor.parse_response(..., response_schema=...)` driven from `tokenizer_config.json`, which replaces the hand-rolled regex in the current `CORPUS_tool_calling_format.md`. Pin: `transformers>=5.5.4`.
5. **Gemma 4 breaks Flash Attention.** FA2's max head_dim is 256, FA4's is 128, and Gemma 4's global head_dim is 512. Use SDP or Flex Attention. Axolotl hard-codes `sdp_attention: true` for Gemma 4. This belongs in `GOTCHAS.md`.
6. **The 26B variant is a MoE**`gemma-4-26B-A4B` (A4B = 4B active per token). Quantization rules differ: Unsloth says use 16-bit LoRA, not 4-bit QLoRA, for acceptable quality. Axolotl's ScatterMoE + expert-LoRA config is the only tool validated for 4-bit MoE training. Worth a line in `CORPUS_ollama_variants.md`.
7. **No Gemma 4 technical report PDF exists yet** as of 2026-04-18. DeepMind repo says "Gemma 4 (Coming soon)". Gemma 3 report (downloaded at `google-official/tech-report/Gemma3Report.pdf`) remains the closest authoritative family citation.
8. **No `google/gemma-4-*` specialized siblings yet** — ShieldGemma, CodeGemma, PaliGemma, MedGemma, DataGemma are all still on Gemma 2 or 3 base. Historical lag is 36 months; expect siblings-on-4 mid-to-late 2026.
9. **No Gemma-4-specific TRL script in `huggingface/trl` yet.** HF blog says "fully supported," but the SFT/DPO/GRPO examples are still on Gemma 3 model IDs. Drop-in with `model_id` swap works. Only Gemma-4-dedicated TRL example today is `huggingface-gemma-recipes/carla_vlm_gemma.py` (VLM GRPO).
10. **HF Spaces `app.py` files are the shortest Gemma 4 inference examples** — Google and HF both use them as ref. See `huggingface/spaces/huggingface-projects_gemma-4-{31b,e4b}-it-app.py`.
## Immediate homelab plug-ins (from the gemma-family research)
- **EmbeddingGemma (308M)** — 100+ languages, Matryoshka to 128d. Drop-in upgrade from `nomic-embed-text` on both hosts.
- **FunctionGemma (270M)** — cheap tool-router in front of `mortdecai:*` (latency win on hot path).
- **PaliGemma 2 3B-448** — vision grounding with bbox output for AI_Visualizer / AI visualizer CT 167 alongside SDXL.
- **TranslateGemma 4B** — useful for the family history agent (German/Polish sources).
## Source-url discipline
Every URL in the subdirectory READMEs was fetched and verified, not reconstructed from training. If a downloaded file is wrong, `git log` will show when it was pulled; the agent transcripts are the record of the source commit. Upstream repos can and do rename paths (see: `google-gemini/gemma-cookbook``google-gemma/cookbook`). Re-verify before citing externally.