Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00

# Google-official Gemma tooling (as of 2026-04-18)
Downloaded corpus of canonical Google / Google-DeepMind Gemma tooling. This
directory mirrors only **upstream-authored** material — no third-party forks,
no community ports, no Ollama-specific content (that lives in
`../../CORPUS_ollama_variants.md`).
Reach for this directory when you need to verify what the canonical code/docs
actually say (prompt tokens, API shapes, supported variants) versus what a
third-party wrapper claims they say.
## Top-line findings (flag for cross-check with rest of corpus)
1. **Canonical JAX/Flax library (`google-deepmind/gemma`) has first-class
Gemma 4 support today** — `gm.nn.Gemma4_E4B()`,
`gm.ckpts.CheckpointPath.GEMMA4_E4B_IT`, and the unified `ChatSampler` /
`ToolSampler` API explicitly lists "2, 3, 3n, 4" as supported. This is the
least-friction Python path if you want the actual reference behavior.
2. **`google/gemma_pytorch` has NO Gemma 4 support** as of last push
(2025-05-30). `scripts/run.py` validates the variant against
`['2b', '2b-v2', '7b', '9b', '27b', '1b']` (Gemma 1/2 plus the text-only
Gemma 3 1B); `scripts/run_multimodal.py` against `['4b', '12b', '27b_v3']`
(all Gemma 3). If someone tells you to "use
the official PyTorch repo" for Gemma 4, they're wrong — it's stale.
3. **`google/gemma.cpp` README says Gemma 2-3 + PaliGemma 2 only** (no Gemma 4
yet), but the repo is actively pushed and explicitly notes active work
happens on the `dev` branch. Worth rechecking `dev` for Gemma 4 support.
4. **Gemma 4 uses a NEW prompt-token syntax** distinct from Gemma 1/2/3:
   - Gemma 1/2/3: `<start_of_turn>` / `<end_of_turn>` (symmetric angle brackets)
   - Gemma 4: `<|turn>` / `<turn|>` (asymmetric pipe-brackets)
   - Also new in Gemma 4: `<|tool>`/`<tool|>`, `<|tool_call>`/`<tool_call|>`,
     `<|tool_response>`/`<tool_response|>`, `<|think|>`,
     `<|channel>`/`<channel|>`, `<|image>`/`<image|>`, `<|audio>`/`<audio|>`,
     and the string delimiter `<|"|>`.
   - Roles are named directly: `system`, `user`, `model` (no role brackets).

   Any chat template built against Gemma 3 tokens is therefore wrong for Gemma 4.
   `CORPUS_tool_calling_format.md` already captures the tool tokens correctly
   but does NOT yet document the turn-token change or the thinking tokens.
5. **`gemma.cpp` ships an HTTP API server (`gemma_api_server`) that speaks
the Google Gemini API protocol** (`POST /v1beta/models/<model>:generateContent`,
SSE streaming, session management). This is a canonical Google-built
alternative to Ollama that implements the *real* Gemini REST API locally.
See `gemma-cpp/API_SERVER_README.md`.
6. **Tool use was NOT a trained capability in Gemma 1/2/3** — the DeepMind
`colabs/tool_use.ipynb` explicitly disclaims: *"The Gemma 1, 2 and 3 models
were not specifically trained for tool use. This is more a proof-of-concept
than an officially supported feature."* Gemma 4 is notably absent from that
caveat; the cookbook and blog confirm Gemma 4 has **native function
calling** as a first-class trained capability.
7. **No Gemma 4 technical-report PDF exists yet.** All conventional URLs
(`storage.googleapis.com/deepmind-media/gemma/Gemma4Report.pdf`,
`goo.gle/gemma4report`) return 404/redirect-to-google.com, and the
DeepMind repo README explicitly says "Gemma 4 (Coming soon)". Current
most-authoritative scientific document for the family is the Gemma 3
technical report (arXiv:2503.19786), downloaded here.
8. **Cookbook ships a Gemma-4-specific agentic reference app**
(`apps/Gemma_4_HDP_Agentic_Security/`) demonstrating how to cryptographically
gate Gemma 4's native function calls with Ed25519-signed delegation tokens
(IETF draft `draft-helixar-hdp-agentic-delegation-00`). A more
production-shaped pattern than the toy `tool_use.ipynb`.
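
Finding 4 describes a template mismatch. One cheap way to catch it is to scan a chat template, or a rendered prompt, for each family's turn tokens. The sketch below is hypothetical: `detect_turn_syntax` and `require_gemma4_template` are not from any Google library. It checks only the literal turn-token strings listed in finding 4.

```python
# Hypothetical helper (not from any Google library): classify a chat template
# or rendered prompt by which Gemma turn-token family it contains.
GEMMA1_3_TURN_TOKENS = ("<start_of_turn>", "<end_of_turn>")
GEMMA4_TURN_TOKENS = ("<|turn>", "<turn|>")


def detect_turn_syntax(text: str) -> str:
    """Return 'gemma4', 'gemma1-3', 'mixed', or 'unknown'."""
    has_old = any(tok in text for tok in GEMMA1_3_TURN_TOKENS)
    has_new = any(tok in text for tok in GEMMA4_TURN_TOKENS)
    if has_old and has_new:
        return "mixed"  # both families present: almost certainly a broken template
    if has_new:
        return "gemma4"
    if has_old:
        return "gemma1-3"
    return "unknown"


def require_gemma4_template(template: str) -> None:
    """Fail fast if a template intended for Gemma 4 uses any other syntax."""
    kind = detect_turn_syntax(template)
    if kind != "gemma4":
        raise ValueError(f"expected Gemma 4 turn tokens, found {kind!r}")


print(detect_turn_syntax("<start_of_turn>user\nHi<end_of_turn>\n"))  # gemma1-3
print(detect_turn_syntax("<|turn>user\nHi<turn|>\n"))  # gemma4
```

A check like this fits naturally in a unit test around whatever chat template a wrapper ships for Gemma 4.
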
## File index
### `deepmind-gemma/` — JAX/Flax reference (the primary Python library)
Upstream: https://github.com/google-deepmind/gemma (`main`, pushed 2026-04-17).
| File | What | Why keep |
|------|------|----------|
| `README.md` | PyPI `gemma` package entry point | Shows canonical `gm.nn.Gemma4_E4B()` API, `ChatSampler` multi-turn/multi-modal example |
| `example_multimodal.py` | Image-captioning fine-tune (Kauldron config) | Canonical end-to-end SFT example; docstring shows exact `<start_of_turn>user / <start_of_image> / <end_of_turn>` interleave for Gemma 3 |
| `example_lora.py` | LoRA fine-tuning recipe | Reach for this if doing PEFT against a Gemma 4 checkpoint |
| `example_dpo.py` | Direct Preference Optimization recipe | Reference for preference-alignment post-training |
| `example_classification.py` | Classification fine-tune | Shows Gemma as a feature extractor |
| `example_sharding.py` | Multi-device sharding | Reference for running variants larger than E4B on multi-GPU/TPU |
| `colab_tool_use.ipynb` | Tool-use demo (`ToolSampler`) | Important caveat inside: "not specifically trained for tool use" for Gemma 1/2/3; shows the `gm.tools.Tool` base class API |
| `colab_sampling.ipynb` | Basic inference / chat notebook | Starter-grade canonical sampling example |

Other scripts in the repo (not downloaded, cherry-picked above): `seq2seq.py`, `npo.py`, colabs for `quantization_aware_training`, `sharding`, `tokenizer`, `multimodal`, `finetuning`, `lora_finetuning`, `lora_sampling`. Fetch directly from https://github.com/google-deepmind/gemma/tree/main when needed.
### `gemma-pytorch/` — PyTorch reference (STALE for Gemma 4)
Upstream: https://github.com/google/gemma_pytorch (`main`, pushed 2025-05-30).
| File | What | Why keep |
|------|------|----------|
| `README.md` | Entry-point docs | Only documents up through Gemma 3; no Gemma 4 |
| `run.py` | Text-only inference entry point | Variant whitelist `['2b','2b-v2','7b','9b','27b','1b']`: Gemma 1/2 plus the text-only Gemma 3 1B |
| `run_multimodal.py` | Multimodal inference entry point | Variant whitelist `['4b','12b','27b_v3']`: Gemma 3 only. Shows the exact interleaved pattern: `<start_of_turn>user\n`, image, text, `<end_of_turn>\n<start_of_turn>model` |
| `run_xla.py` | TPU/XLA inference | Reference for running Gemma 3 on TPU |

**Do not reach for this repo for Gemma 4 work** until it's updated. Use the
DeepMind JAX lib, Hugging Face `transformers`, or gemma.cpp instead.
### `gemma-cpp/` — C++ reference inference
Upstream: https://github.com/google/gemma.cpp (`main`, pushed 2026-04-17; active dev on `dev` branch).
| File | What | Why keep |
|------|------|----------|
| `README.md` | Project overview, build instructions | States "Gemma 2-3 + PaliGemma 2" in features; Gemma 4 status unclear from `main` — check `dev` branch |
| `API_SERVER_README.md` | HTTP API server that speaks Gemini API protocol | **Most interesting find** — canonical drop-in for apps written against the Gemini API, runs locally. `POST /v1beta/models/<model>:generateContent`, SSE streaming, session KV-cache |
| `examples_README.md` | Pointer to `hello_world` / `simplified_gemma` minimal embedding examples | Starting point for embedding gemma.cpp into your own C++ binary |
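
The sketch below shows roughly what a Gemini-API-style `generateContent` call against a locally running `gemma_api_server` could look like. Several details are assumptions:

- `http://localhost:8080` and the model name `gemma` are placeholders. Check `API_SERVER_README.md` for the real defaults.
- The server is assumed to accept the standard Gemini request body (`contents` → `parts` → `text`) and to return `candidates[0].content.parts`.

Only the Python standard library is used. SSE streaming and sessions are not covered.

```python
import json
import urllib.request

# Placeholders: check API_SERVER_README.md for the real host, port, and model id.
BASE_URL = "http://localhost:8080"
MODEL = "gemma"


def build_generate_request(prompt: str, base_url: str = BASE_URL,
                           model: str = MODEL) -> urllib.request.Request:
    """Build POST /v1beta/models/<model>:generateContent with a Gemini-style body."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{base_url}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Concatenate the text parts of the first candidate in a Gemini-style reply."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def generate(prompt: str) -> str:
    """Send the request (needs a running server) and return the reply text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return extract_text(json.load(resp))
```

Because the wire format is the Gemini one, the same two helpers should work unchanged against the hosted Gemini API. Only the base URL and authentication differ.
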
### `cookbook/` — Official recipes and end-to-end apps
Upstream: https://github.com/google-gemma/cookbook (`main`, pushed 2026-04-17).
**Note:** `google-gemini/gemma-cookbook` now 301-redirects here; use the
`google-gemma/cookbook` URL going forward.
| File | What | Why keep |
|------|------|----------|
| `README.md` | Cookbook index | Authoritative list of Gemma variants, incl. Gemma 4 (E2B / E4B / 26B A4B / 31B), plus the wider ecosystem (FunctionGemma, MedGemma, PaliGemma 2, RecurrentGemma, ShieldGemma 2, T5Gemma, TranslateGemma, TxGemma, VaultGemma, EmbeddingGemma) |
| `tutorials_RAG_EmbeddingGemma.ipynb` | RAG with EmbeddingGemma | Currently the only notebook in `tutorials/` — reflects the "latest tested" tier |
| `docs_gemma_chat.ipynb` | Chatbot with Gemma on Keras | Documents the `__START_TURN_USER__ = "<start_of_turn>user\n"` / `__END_TURN__ = "<end_of_turn>\n"` format explicitly; Gemma 2 example, but the class is the canonical illustration of the Gemma 1/2/3 chat template |
| `apps_Gemma4_HDP_AgenticSecurity_README.md` | README for the HDP agentic-security reference app | Gemma-4-specific demo; real production pattern for gating native function calls |
| `apps_Gemma4_HDP_hdp_middleware.py` | Drop-in middleware (`HDPMiddleware.gate()`) | Wraps any Gemma 4 tool executor with Ed25519-signed HDT verification |
| `apps_Gemma4_HDP_AgenticSecurity.ipynb` | Walkthrough notebook | End-to-end: load Gemma 4, issue tokens, gate function calls |

Other cookbook content worth noting (not downloaded — fetch on demand):
- `docs/capabilities/thinking.ipynb` (438 KB) — Gemma 4 thinking-mode notebook
- `docs/capabilities/audio.ipynb` — audio-input capability
- `docs/functiongemma/{finetuning-with-functiongemma,full-function-calling-sequence-with-functiongemma,function-calling-with-hf}.ipynb`: **FunctionGemma** is a separate fine-tune of the Gemma 3 270M IT checkpoint, built specifically for function calling and distinct from Gemma 4's native function calling
- `docs/core/pytorch_gemma.ipynb`, `keras_inference.ipynb`, `huggingface_*.ipynb` — framework-specific recipes
- `docs/integrations/langchain.ipynb` — LangChain integration
- `experiments/{MedGemma,TxGemma}/` and `experiments/[T5Gemma]Example.ipynb`, `[VaultGemma]FineTuning_Inference_Huggingface.ipynb`, etc. — domain-specific Gemma variants
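
The HDP reference app gates each native function call on a signed delegation token before the tool runs. The sketch below illustrates only that gating idea, and it makes deliberate substitutions:

- It uses stdlib HMAC-SHA256 instead of Ed25519, so it runs without third-party packages.
- The token layout (`tools`, `exp` claims) is hypothetical.
- The names `issue_token` and `gate` are hypothetical.

It is neither the HDP token format from the IETF draft nor the API of `HDPMiddleware.gate()`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Callable, Dict, List

# Stand-in shared secret. HDP itself uses Ed25519 key pairs, so a verifier
# would hold only the issuer's public key, never a signing secret.
SECRET = b"demo-secret-not-for-production"


def _sign(payload: bytes) -> str:
    """HMAC-SHA256 tag over the payload (stand-in for an Ed25519 signature)."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_token(allowed_tools: List[str], ttl_s: int = 300) -> str:
    """Hypothetical token layout: base64url(JSON claims) + '.' + hex tag."""
    claims = {"tools": sorted(allowed_tools), "exp": int(time.time()) + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + _sign(payload)


def gate(token: str, tool_name: str, args: Dict[str, Any],
         executor: Callable[..., Any]) -> Any:
    """Run executor(**args) only if the token is authentic, unexpired, and in scope."""
    try:
        encoded, tag = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
    except ValueError as exc:  # also covers binascii.Error from bad base64
        raise PermissionError("malformed token") from exc
    if not hmac.compare_digest(_sign(payload), tag):
        raise PermissionError("signature check failed")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if tool_name not in claims["tools"]:
        raise PermissionError(f"tool {tool_name!r} was not delegated")
    return executor(**args)
```

The key design point carries over to the real app: the model only *proposes* a tool call, and the gate decides whether the executor ever runs.
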
### `docs/` — Canonical ai.google.dev pages (HTML cached)
Verified URLs below; HTML snapshots saved for verbatim preservation.
| File | Source URL |
|------|-----------|
| `ai-google-dev_core.html` | https://ai.google.dev/gemma/docs/core — Gemma 4 overview |
| `ai-google-dev_model_card_4.html` | https://ai.google.dev/gemma/docs/core/model_card_4 — Gemma 4 model card |
| `ai-google-dev_prompt_formatting_gemma4.html` | https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 — **Gemma 4 prompt tokens (new `<\|turn>`/`<turn\|>` syntax)** |
| `ai-google-dev_function_calling_gemma4.html` | https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 — **Gemma 4 native function calling spec** |
| `ai-google-dev_formatting.html` | https://ai.google.dev/gemma/docs/formatting — Gemma 1/2/3 prompt format (`<start_of_turn>`/`<end_of_turn>`) |
| `blog_announcement.html` | https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ — Gemma 4 launch blog, 2026-04-02 |

Other canonical doc URLs (verified to exist, not snapshotted here — visit
directly):
- https://ai.google.dev/gemma/docs — top-level Gemma hub
- https://ai.google.dev/gemma/docs/releases — release history
- https://ai.google.dev/gemma/docs/functiongemma — FunctionGemma variant
- https://ai.google.dev/gemma/docs/core/deploy_to_cloud_run_from_ai_studio — AI Studio → Cloud Run
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-gemma — Vertex AI
- https://aistudio.google.com — AI Studio
- https://gemma-llm.readthedocs.io — DeepMind JAX lib docs
- https://www.kaggle.com/models/google/gemma-4 — Gemma 4 on Kaggle
- https://huggingface.co/collections/google/gemma-4 — Gemma 4 on HF
### `tech-report/`
| File | What | Source |
|------|------|--------|
| `Gemma3Report.pdf` | **Gemma 3 Technical Report** (arXiv:2503.19786, 2025-03-12) | https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf |

No Gemma 4 technical report exists yet. Probed paths (all return 404 or redirect away):
- `Gemma4Report.pdf`, `gemma4-report.pdf`, `Gemma4Report_v1.pdf` under
`storage.googleapis.com/deepmind-media/gemma/`
- `goo.gle/gemma4report` (not configured — redirects to google.com)

DeepMind repo README line: **"Gemma 4 (Coming soon)"**. The Gemma 3 report
remains the most-authoritative Google-DeepMind scientific document for the
family. It is the correct citation for these architecture fundamentals:

- Grouped-Query Attention, with both post-norm and pre-norm RMSNorm
- 5:1 local/global attention layer interleave, with a 1024-token local sliding window
- RoPE base 1M on global layers, 10k on local layers
- SigLIP 400M vision encoder at 896×896, shared across 4B/12B/27B and frozen during training
- SentencePiece tokenizer with a 262k vocabulary, shared with Gemini 2.0
- Knowledge distillation during pre-training
- QAT checkpoints produced by a 5k-step fine-tune for int4/SFP8

Per-variant parameter counts for Gemma 3:

| Variant | Non-embedding params | Embedding params |
|---------|----------------------|------------------|
| 1B | 698M | 302M |
| 4B | 3209M | 675M |
| 12B | 10759M | 1012M |
| 27B | 25600M | 1416M |
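
The 5:1 interleave and the per-type RoPE bases are easier to see as a per-layer schedule. This is a sketch under one assumption: the stack starts with five local layers followed by one global layer, then repeats. The helper names are hypothetical, and the layer count is an input, not a figure taken from the report.

```python
from dataclasses import dataclass
from typing import List, Optional

# Figures from the Gemma 3 report summary above.
LOCAL_WINDOW = 1024          # local layers see at most this many tokens (incl. self)
ROPE_BASE_LOCAL = 10_000
ROPE_BASE_GLOBAL = 1_000_000
PATTERN_PERIOD = 6           # 5 local layers followed by 1 global layer


@dataclass(frozen=True)
class LayerSpec:
    index: int
    kind: str                # "local" or "global"
    rope_base: int
    window: Optional[int]    # None = full causal attention


def layer_schedule(num_layers: int) -> List[LayerSpec]:
    """Assumed ordering: layers 0-4 local, layer 5 global, then repeat."""
    specs = []
    for i in range(num_layers):
        is_global = (i + 1) % PATTERN_PERIOD == 0
        specs.append(LayerSpec(
            index=i,
            kind="global" if is_global else "local",
            rope_base=ROPE_BASE_GLOBAL if is_global else ROPE_BASE_LOCAL,
            window=None if is_global else LOCAL_WINDOW,
        ))
    return specs


def can_attend(query_pos: int, key_pos: int, window: Optional[int]) -> bool:
    """Causal mask entry, optionally limited to the last `window` positions."""
    if key_pos > query_pos:
        return False
    return window is None or query_pos - key_pos < window
```

With this convention, a 12-layer stack has its global layers at indices 5 and 11. At position 2000, a local layer can reach back to position 977 but not to 976.
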
## Canonical Gemma 4 prompt format (verified 2026-04-18)
**Source:** https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 and
https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
Note that `<|turn>` and `<turn|>` are asymmetric: the opening token has the pipe
on the left, the closing token has it on the right. The same holds for every
paired delimiter.
```
<|turn>system
<|think|> (optional — activates thinking mode)
<|tool>declaration:FUNCTION_NAME{description:<|"|>...<|"|>,parameters:{properties:{...},required:[...]}}<tool|>
You are a helpful assistant.<turn|>
<|turn>user
What's the weather in Tokyo?<turn|>
<|turn>model
<|channel>thought
...internal reasoning...<channel|>
<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|>
<|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
The current weather in Tokyo is 15 degrees and sunny.<turn|>
```
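
Gemma 4 tool calls arrive as plain text in the compact syntax shown above, so a client has to parse them out of the reply. Below is a minimal recursive-descent sketch. It assumes the grammar is exactly what the example shows:

- `{key:value,...}` objects with bare keys
- `[...]` lists
- bare numbers
- strings wrapped in `<|"|>`

The `true`/`false` spelling for booleans is a further assumption. `parse_tool_calls` and `parse_value` are hypothetical names, not an official API.

```python
import re
from typing import Any, List, Tuple

STR = '<|"|>'  # Gemma 4 string delimiter
CALL_RE = re.compile(r"<\|tool_call>call:([A-Za-z_][\w.]*)(\{.*?\})<tool_call\|>", re.S)
SCALAR_RE = re.compile(r"[^,}\]]+")


class _Parser:
    """Recursive-descent parser for the compact {key:value,...} argument syntax."""

    def __init__(self, text: str) -> None:
        self.s, self.i = text, 0

    def value(self) -> Any:
        if self.s.startswith(STR, self.i):            # <|"|>...<|"|> string
            start = self.i + len(STR)
            end = self.s.index(STR, start)
            self.i = end + len(STR)
            return self.s[start:end]
        if self.s[self.i] == "{":
            return self._obj()
        if self.s[self.i] == "[":
            return self._list()
        m = SCALAR_RE.match(self.s, self.i)           # bare number/bool/word
        if m is None:
            raise ValueError(f"unexpected {self.s[self.i]!r} at {self.i}")
        self.i = m.end()
        return _scalar(m.group())

    def _obj(self) -> dict:
        self.i += 1                                   # skip "{"
        out = {}
        while self.s[self.i] != "}":
            colon = self.s.index(":", self.i)
            key = self.s[self.i:colon]
            self.i = colon + 1
            out[key] = self.value()
            if self.s[self.i] == ",":
                self.i += 1
        self.i += 1                                   # skip "}"
        return out

    def _list(self) -> list:
        self.i += 1                                   # skip "["
        out = []
        while self.s[self.i] != "]":
            out.append(self.value())
            if self.s[self.i] == ",":
                self.i += 1
        self.i += 1                                   # skip "]"
        return out


def _scalar(tok: str) -> Any:
    if tok in ("true", "false"):                      # assumed boolean spelling
        return tok == "true"
    for cast in (int, float):
        try:
            return cast(tok)
        except ValueError:
            pass
    return tok


def parse_value(text: str) -> Any:
    """Parse one argument or response object, e.g. '{temperature:15,...}'."""
    return _Parser(text).value()


def parse_tool_calls(reply: str) -> List[Tuple[str, Any]]:
    """Extract (function_name, arguments) pairs from a model reply."""
    return [(name, parse_value(args)) for name, args in CALL_RE.findall(reply)]
```

`parse_value` also handles the body of a `<|tool_response>response:NAME{...}` block, which is the shape a client must produce when feeding results back.
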
Recommended sampling (per model card, verified):
`temperature=1.0, top_p=0.95, top_k=64`. Tokenizer vocab = **262k** (same as
Gemini 2.0). **BOS token required** — prepend `[BOS]` / set `add_bos=True`.
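
The turn layout above can be captured in a minimal renderer. This is a sketch with these assumptions:

- One newline follows each `<|turn>role` header and each `<turn|>`.
- `<|think|>` sits on its own line at the top of the system turn.
- `<bos>` is the BOS string. The exact BOS spelling and whitespace are guesses, so prefer the tokenizer's own chat template where one exists.

Tool declarations and tool-call turns are omitted for brevity. `render_gemma4_prompt` is a hypothetical name.

```python
from typing import Dict, Iterable, List

TURN_OPEN, TURN_CLOSE, THINK = "<|turn>", "<turn|>", "<|think|>"
ROLES = {"system", "user", "model"}
# Recommended sampling settings from the model card (see above).
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 64}


def render_gemma4_prompt(messages: Iterable[Dict[str, str]],
                         enable_thinking: bool = False,
                         bos: str = "<bos>") -> str:
    """Render [{'role': ..., 'content': ...}] into Gemma 4 turn syntax.

    The result ends with an open '<|turn>model' header so generation continues
    as the model's reply. Pass bos="" if the tokenizer already adds BOS.
    """
    msgs: List[Dict[str, str]] = list(messages)
    if enable_thinking and (not msgs or msgs[0]["role"] != "system"):
        msgs.insert(0, {"role": "system", "content": ""})
    parts = [bos]
    for idx, msg in enumerate(msgs):
        role, content = msg["role"], msg["content"]
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r}")
        if idx == 0 and role == "system" and enable_thinking:
            # <|think|> sits at the top of the system turn to enable thinking.
            content = f"{THINK}\n{content}" if content else THINK
        parts.append(f"{TURN_OPEN}{role}\n{content}{TURN_CLOSE}\n")
    parts.append(f"{TURN_OPEN}model\n")
    return "".join(parts)
```

Passing `bos=""` avoids a doubled BOS when the tokenizer is called with `add_bos=True`.
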
**Gemma 1/2/3 prompt format (different — for reference):**
```
<start_of_turn>user
[message]<end_of_turn>
<start_of_turn>model
[response]<end_of_turn>
```
Gemma 1/2/3 have no trained tool-use or thinking tokens. PT models end with
`<eos>`; IT models end with `<end_of_turn>`.
## Gemma 4 variants (canonical spec from model card)
| Variant | Params | Active | Context | Multimodal |
|---------|--------|--------|---------|------------|
| Gemma 4 E2B | 2.3B effective (5.1B w/ embeddings), 35 layers | — | 128K | text+image+audio (30s max) |
| Gemma 4 E4B | 4.5B effective (8B w/ embeddings), 42 layers | — | 128K | text+image+audio (30s max) |
| Gemma 4 26B A4B | 25.2B total (MoE), 30 layers | 3.8B | 256K | text+image |
| Gemma 4 31B | 30.7B dense, 60 layers | — | 256K | text+image |

All variants: Apache 2.0, base + instruction-tuned (`-it`), 140+ languages,
native function calling, native structured JSON output. Vision encoder = 150M
(E2B/E4B) or 550M (26B/31B). Image resolution token budgets: 70, 140, 280,
560, 1120. Released 2026-04-02.
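
The variants table can be turned into a selection rule. The sketch below encodes the model-card figures from the table and returns the smallest variant, by total parameters including embeddings, that meets a context and modality requirement. `Variant` and `pick_variant` are hypothetical names; the numbers are copied from the table.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    total_params_b: float        # billions: incl. embeddings (E2B/E4B), all experts (MoE)
    context_k: int               # context window in thousands of tokens
    modalities: FrozenSet[str]


_TIA = frozenset({"text", "image", "audio"})
_TI = frozenset({"text", "image"})

# Figures copied from the variants table above.
VARIANTS = (
    Variant("Gemma 4 E2B", 5.1, 128, _TIA),
    Variant("Gemma 4 E4B", 8.0, 128, _TIA),
    Variant("Gemma 4 26B A4B", 25.2, 256, _TI),
    Variant("Gemma 4 31B", 30.7, 256, _TI),
)


def pick_variant(min_context_k: int = 0,
                 needs: AbstractSet[str] = frozenset({"text"})) -> Optional[Variant]:
    """Smallest variant (by total params) meeting context and modality needs."""
    fits = [v for v in VARIANTS
            if v.context_k >= min_context_k and set(needs) <= v.modalities]
    return min(fits, key=lambda v: v.total_params_b, default=None)
```

Sorting by total parameters approximates memory footprint. For the 26B A4B MoE, only 3.8B parameters are active per token, so its per-token compute is far lower than that footprint suggests.
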
## Fetched using
All files fetched via `curl -sL` from `raw.githubusercontent.com` on
2026-04-18. Repos enumerated via the GitHub API
(`https://api.github.com/repos/<owner>/<repo>/contents/<path>`). Google docs
pages fetched via WebFetch tool. No GitHub auth needed for public raw files
(unauthenticated rate limit = 60 req/hr, sufficient for this task).
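
The fetch procedure above can be sketched with the standard library alone. `raw_url`, `contents_api_url` and `list_dir` are hypothetical helper names. The URL shapes are the public `raw.githubusercontent.com` and GitHub REST contents endpoints. `list_dir` makes one unauthenticated API call per directory, which counts against the 60 req/hr limit.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def raw_url(owner: str, repo: str, path: str, ref: str = "main") -> str:
    """Direct download URL for one file (what `curl -sL` was pointed at)."""
    return (f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/"
            f"{urllib.parse.quote(path)}")


def contents_api_url(owner: str, repo: str, path: str = "", ref: str = "main") -> str:
    """GitHub REST 'repository contents' endpoint for a directory or file."""
    return (f"https://api.github.com/repos/{owner}/{repo}/contents/"
            f"{urllib.parse.quote(path)}?ref={urllib.parse.quote(ref)}")


def list_dir(owner: str, repo: str, path: str = "",
             ref: str = "main") -> List[Tuple[str, str]]:
    """Return (type, path) for each entry of a directory; one API request."""
    req = urllib.request.Request(contents_api_url(owner, repo, path, ref),
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        entries = json.load(resp)
    return [(entry["type"], entry["path"]) for entry in entries]
```

For example, `[raw_url("google-gemma", "cookbook", p) for t, p in list_dir("google-gemma", "cookbook", "docs") if t == "file"]` enumerates one directory with a single API call. The file downloads themselves go to `raw.githubusercontent.com`, which does not consume REST API quota.
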