Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00


Google-official Gemma tooling (as of 2026-04-18)

Downloaded corpus of canonical Google / Google-DeepMind Gemma tooling. This directory mirrors only upstream-authored material — no third-party forks, no community ports, no Ollama-specific content (that lives in ../../CORPUS_ollama_variants.md).

Reach for this directory when you need to verify what the canonical code/docs actually say (prompt tokens, API shapes, supported variants) versus what a third-party wrapper claims they say.

Top-line findings (flag for cross-check with rest of corpus)

  1. Canonical JAX/Flax library (google-deepmind/gemma) has first-class Gemma 4 support today: gm.nn.Gemma4_E4B() and gm.ckpts.CheckpointPath.GEMMA4_E4B_IT are available, and the unified ChatSampler / ToolSampler API explicitly lists "2, 3, 3n, 4" as supported. This is the least-friction Python path if you want the actual reference behavior.
  2. google/gemma_pytorch has NO Gemma 4 support as of last push (2025-05-30). scripts/run.py validates variant in ['2b', '2b-v2', '7b', '9b', '27b', '1b']; scripts/run_multimodal.py in ['4b', '12b', '27b_v3'] (all Gemma 3). If someone tells you to "use the official PyTorch repo" for Gemma 4, they're wrong — it's stale.
  3. google/gemma.cpp README says Gemma 2-3 + PaliGemma 2 only (no Gemma 4 yet), but the repo is actively pushed and explicitly notes active work happens on the dev branch. Worth rechecking dev for Gemma 4 support.
  4. Gemma 4 uses a NEW prompt-token syntax distinct from Gemma 1/2/3:
    • Gemma 1/2/3: <start_of_turn> / <end_of_turn> (symmetric angle brackets)
    • Gemma 4: <|turn> / <turn|> (asymmetric pipe-brackets)
    • New in Gemma 4: <|tool>/<tool|>, <|tool_call>/<tool_call|>, <|tool_response>/<tool_response|>, <|think|>, <|channel>/<channel|>, <|image>/<image|>, <|audio>/<audio|>, string delimiter <|"|>.
    • Roles are named directly: system, user, model (no role brackets). Any chat template built against Gemma 3 tokens is therefore incompatible with Gemma 4. CORPUS_tool_calling_format.md already captures the tool tokens correctly but does NOT yet document the turn-token change or the thinking tokens.
  5. gemma.cpp ships an HTTP API server (gemma_api_server) that speaks the Google Gemini API protocol (POST /v1beta/models/<model>:generateContent, SSE streaming, session management). This is a canonical Google-built alternative to Ollama that implements the real Gemini REST API locally. See gemma-cpp/API_SERVER_README.md.
  6. Tool use was NOT a trained capability in Gemma 1/2/3 — the DeepMind colabs/tool_use.ipynb explicitly disclaims: "The Gemma 1, 2 and 3 models were not specifically trained for tool use. This is more a proof-of-concept than an officially supported feature." Gemma 4 is notably absent from that caveat; the cookbook and blog confirm Gemma 4 has native function calling as a first-class trained capability.
  7. No Gemma 4 technical-report PDF exists yet. All conventional URLs (storage.googleapis.com/deepmind-media/gemma/Gemma4Report.pdf, goo.gle/gemma4report) return 404/redirect-to-google.com, and the DeepMind repo README explicitly says "Gemma 4 (Coming soon)". Current most-authoritative scientific document for the family is the Gemma 3 technical report (arXiv:2503.19786), downloaded here.
  8. Cookbook ships a Gemma-4-specific agentic reference app (apps/Gemma_4_HDP_Agentic_Security/) demonstrating how to cryptographically gate Gemma 4's native function calls with Ed25519-signed delegation tokens (IETF draft draft-helixar-hdp-agentic-delegation-00). This is a more production-shaped pattern than the toy tool_use.ipynb.

File index

deepmind-gemma/ — JAX/Flax reference (the primary Python library)

Upstream: https://github.com/google-deepmind/gemma (main, pushed 2026-04-17).

| File | What | Why keep |
| --- | --- | --- |
| README.md | PyPI gemma package entry point | Shows canonical gm.nn.Gemma4_E4B() API, ChatSampler multi-turn/multi-modal example |
| example_multimodal.py | Image-captioning fine-tune (Kauldron config) | Canonical end-to-end SFT example; docstring shows exact <start_of_turn>user / <start_of_image> / <end_of_turn> interleave for Gemma 3 |
| example_lora.py | LoRA fine-tuning recipe | Reach for this if doing PEFT against a Gemma 4 checkpoint |
| example_dpo.py | Direct Preference Optimization recipe | Reference for preference-alignment post-training |
| example_classification.py | Classification fine-tune | Shows Gemma as a feature extractor |
| example_sharding.py | Multi-device sharding | Reference for running >E4B on multi-GPU/TPU |
| colab_tool_use.ipynb | Tool-use demo (ToolSampler) | Important caveat inside: "not specifically trained for tool use" for Gemma 1/2/3; shows the gm.tools.Tool base class API |
| colab_sampling.ipynb | Basic inference / chat notebook | Starter-grade canonical sampling example |

Other scripts in the repo were not downloaded (the files above are a cherry-picked subset): seq2seq.py, npo.py, and colabs for quantization_aware_training, sharding, tokenizer, multimodal, finetuning, lora_finetuning, lora_sampling. Fetch them directly from https://github.com/google-deepmind/gemma/tree/main when needed.

gemma-pytorch/ — PyTorch reference (STALE for Gemma 4)

Upstream: https://github.com/google/gemma_pytorch (main, pushed 2025-05-30).

| File | What | Why keep |
| --- | --- | --- |
| README.md | Entry-point docs | Only documents up through Gemma 3; no Gemma 4 |
| run.py | Text-only inference entry point | Variant whitelist ['2b','2b-v2','7b','9b','27b','1b']: Gemma 1/2, plus '1b' (the text-only Gemma 3 1B); no Gemma 4 |
| run_multimodal.py | Multimodal inference entry point | Variant whitelist ['4b','12b','27b_v3']: Gemma 3 only. Shows exact interleaved <start_of_turn>user\n, image, text, <end_of_turn>\n<start_of_turn>model pattern |
| run_xla.py | TPU/XLA inference | Reference for running Gemma 3 on TPU |

Do not reach for this repo for Gemma 4 work until it's updated. Use the DeepMind JAX lib, Hugging Face transformers, or gemma.cpp instead.
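
As a quick illustration of the whitelist check described above, here is a minimal sketch that mirrors the two variant lists quoted from scripts/run.py and scripts/run_multimodal.py. The function name is hypothetical and this is not code from the upstream repo; the string "e4b" in the usage line is only a stand-in for a Gemma 4 variant name.

```python
from typing import Optional

# Variant whitelists as quoted in this README from google/gemma_pytorch.
TEXT_VARIANTS = ["2b", "2b-v2", "7b", "9b", "27b", "1b"]   # scripts/run.py
MULTIMODAL_VARIANTS = ["4b", "12b", "27b_v3"]              # scripts/run_multimodal.py


def gemma_pytorch_entry_point(variant: str) -> Optional[str]:
    """Return the script whose whitelist accepts `variant`, or None.

    Hypothetical helper: it only reproduces the membership test, not the
    upstream argument parsing.
    """
    if variant in TEXT_VARIANTS:
        return "scripts/run.py"
    if variant in MULTIMODAL_VARIANTS:
        return "scripts/run_multimodal.py"
    return None


if __name__ == "__main__":
    for name in ("9b", "27b_v3", "e4b"):
        print(name, "->", gemma_pytorch_entry_point(name))
```

Any name that maps to None is rejected by both entry points, which is the case for every Gemma 4 variant under the whitelists quoted above.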

gemma-cpp/ — C++ reference inference

Upstream: https://github.com/google/gemma.cpp (main, pushed 2026-04-17; active dev on dev branch).

| File | What | Why keep |
| --- | --- | --- |
| README.md | Project overview, build instructions | States "Gemma 2-3 + PaliGemma 2" in features; Gemma 4 status unclear from main, so check the dev branch |
| API_SERVER_README.md | HTTP API server that speaks Gemini API protocol | Most interesting find: canonical drop-in for apps written against the Gemini API, runs locally. POST /v1beta/models/<model>:generateContent, SSE streaming, session KV-cache |
| examples_README.md | Pointer to hello_world / simplified_gemma minimal embedding examples | Starting point for embedding gemma.cpp into your own C++ binary |
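
To make the endpoint shape concrete, the sketch below builds (but does not send) a generateContent request using only the standard library. The base URL, port, and model id are placeholders, not values taken from gemma.cpp, and the body follows the public Gemini REST shape (contents / parts / text, generationConfig); which fields gemma_api_server actually honors should be checked against API_SERVER_README.md.

```python
import json
import urllib.request


def build_generate_content_request(base_url, model, prompt,
                                   temperature=1.0, top_p=0.95, top_k=64):
    """Build a POST request for /v1beta/models/<model>:generateContent.

    Assumption: the local server accepts the same JSON body as the hosted
    Gemini API. `base_url` and `model` are caller-supplied placeholders.
    """
    url = f"{base_url}/v1beta/models/{model}:generateContent"
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "topP": top_p,
            "topK": top_k,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, port, and model id; adjust to your local server.
    req = build_generate_content_request(
        "http://localhost:8080", "my-local-gemma", "Hello")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to point the same code at either the local server or the hosted API by swapping the base URL.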

cookbook/ — Official recipes and end-to-end apps

Upstream: https://github.com/google-gemma/cookbook (main, pushed 2026-04-17). Note: google-gemini/gemma-cookbook now 301-redirects here; use the google-gemma/cookbook URL going forward.

| File | What | Why keep |
| --- | --- | --- |
| README.md | Cookbook index | Authoritative list of Gemma variants incl. Gemma 4 (E2B / E4B / 26B A4B / 31B) and the ecosystem (FunctionGemma, MedGemma, PaliGemma 2, RecurrentGemma, ShieldGemma 2, T5Gemma, TranslateGemma, TxGemma, VaultGemma, EmbeddingGemma) |
| tutorials_RAG_EmbeddingGemma.ipynb | RAG with EmbeddingGemma | Currently the only notebook in tutorials/; reflects the "latest tested" tier |
| docs_gemma_chat.ipynb | Chatbot with Gemma on Keras | Documents the __START_TURN_USER__ = "<start_of_turn>user\n" / __END_TURN__ = "<end_of_turn>\n" format explicitly; Gemma 2 example, but the class is the canonical illustration of the Gemma 1/2/3 chat template |
| apps_Gemma4_HDP_AgenticSecurity_README.md | README for the HDP agentic-security reference app | Gemma-4-specific demo; real production pattern for gating native function calls |
| apps_Gemma4_HDP_hdp_middleware.py | Drop-in middleware (HDPMiddleware.gate()) | Wraps any Gemma 4 tool executor with Ed25519-signed HDT verification |
| apps_Gemma4_HDP_AgenticSecurity.ipynb | Walkthrough notebook | End-to-end: load Gemma 4, issue tokens, gate function calls |

Other cookbook content worth noting (not downloaded — fetch on demand):

  • docs/capabilities/thinking.ipynb (438 KB) — Gemma 4 thinking-mode notebook
  • docs/capabilities/audio.ipynb — audio-input capability
  • docs/functiongemma/{finetuning-with-functiongemma,full-function-calling-sequence-with-functiongemma,function-calling-with-hf}.ipynb: FunctionGemma is a separate fine-tune on the Gemma 3 270M IT checkpoint specifically for function calling; distinct from Gemma 4's native function calling
  • docs/core/pytorch_gemma.ipynb, keras_inference.ipynb, huggingface_*.ipynb — framework-specific recipes
  • docs/integrations/langchain.ipynb — LangChain integration
  • experiments/{MedGemma,TxGemma}/ and experiments/[T5Gemma]Example.ipynb, [VaultGemma]FineTuning_Inference_Huggingface.ipynb, etc. — domain-specific Gemma variants

docs/ — Canonical ai.google.dev pages (HTML cached)

Verified URLs below; HTML snapshots saved for verbatim preservation.

| File | Source URL | What |
| --- | --- | --- |
| ai-google-dev_core.html | https://ai.google.dev/gemma/docs/core | Gemma 4 overview |
| ai-google-dev_model_card_4.html | https://ai.google.dev/gemma/docs/core/model_card_4 | Gemma 4 model card |
| ai-google-dev_prompt_formatting_gemma4.html | https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 | Gemma 4 prompt tokens (new pipe-bracket turn syntax) |
| ai-google-dev_function_calling_gemma4.html | https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 | Gemma 4 native function calling spec |
| ai-google-dev_formatting.html | https://ai.google.dev/gemma/docs/formatting | Gemma 1/2/3 prompt format (<start_of_turn>/<end_of_turn>) |
| blog_announcement.html | https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ | Gemma 4 launch blog, 2026-04-02 |

Other canonical doc URLs (verified to exist, not snapshotted here — visit directly):

tech-report/

| File | What | Source |
| --- | --- | --- |
| Gemma3Report.pdf | Gemma 3 Technical Report (arXiv:2503.19786, 2025-03-12) | https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf |

No Gemma 4 technical report exists yet. Probed paths that return 404:

  • Gemma4Report.pdf, gemma4-report.pdf, Gemma4Report_v1.pdf under storage.googleapis.com/deepmind-media/gemma/
  • goo.gle/gemma4report (not configured — redirects to google.com)

DeepMind repo README line: "Gemma 4 (Coming soon)". The Gemma 3 report remains the most-authoritative Google-DeepMind scientific document for the family and is the correct citation for architecture fundamentals:

  • Grouped-Query Attention with post-norm/pre-norm RMSNorm
  • 5:1 local/global attention layer interleave, 1024-token local sliding window
  • RoPE base 1M on global / 10k on local
  • SigLIP 400M vision encoder at 896×896, shared across 4B/12B/27B and frozen during training
  • SentencePiece tokenizer with 262k vocab, shared with Gemini 2.0
  • knowledge distillation during pre-training
  • QAT checkpoints via 5k-step fine-tune for int4/SFP8

Per-variant parameter counts for Gemma 3: 1B = 698M non-embedding + 302M embedding, 4B = 3209M + 675M, 12B = 10759M + 1012M, 27B = 25600M + 1416M.
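
For a quick sanity check of the figures quoted above, the sketch below sums the non-embedding and embedding counts per variant and generates a 5:1 local/global layer pattern. The helper names are hypothetical, and the choice to start each group of six with the local layers is an assumption made for illustration; consult the report for the exact layer ordering.

```python
# (non-embedding, embedding) parameter counts in millions, as quoted above.
GEMMA3_PARAMS_M = {
    "1B": (698, 302),
    "4B": (3209, 675),
    "12B": (10759, 1012),
    "27B": (25600, 1416),
}


def total_params_m(variant):
    """Sum of the two quoted columns, in millions of parameters."""
    non_embedding, embedding = GEMMA3_PARAMS_M[variant]
    return non_embedding + embedding


def embedding_share(variant):
    """Fraction of the summed count that sits in the embedding table."""
    non_embedding, embedding = GEMMA3_PARAMS_M[variant]
    return embedding / (non_embedding + embedding)


def attention_pattern(num_layers, local_per_global=5):
    """Label each layer 'local' or 'global' for an N:1 interleave.

    Assumption: every group is `local_per_global` local layers followed by
    one global layer, beginning with a local layer.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


if __name__ == "__main__":
    for name in GEMMA3_PARAMS_M:
        print(name, total_params_m(name), f"{embedding_share(name):.1%}")
    print(attention_pattern(12))
```

The sums show how much of the smallest model is embedding table: roughly 30% for 1B versus about 5% for 27B.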

Canonical Gemma 4 prompt format (verified 2026-04-18)

Source: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 and https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4

Note that the <|turn> / <turn|> delimiters are asymmetric: the opening token has the pipe on the left, the closing token has the pipe on the right. The same holds for all paired delimiters.

<|turn>system
<|think|>  (optional — activates thinking mode)
<|tool>declaration:FUNCTION_NAME{description:<|"|>...<|"|>,parameters:{properties:{...},required:[...]}}<tool|>
You are a helpful assistant.<turn|>
<|turn>user
What's the weather in Tokyo?<turn|>
<|turn>model
<|channel>thought
...internal reasoning...<channel|>
<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|>
<|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
The current weather in Tokyo is 15 degrees and sunny.<turn|>
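
The turn structure shown above can be sketched as a small renderer. This is an illustrative helper, not the official chat template: the function name is hypothetical, the newline placed after each closing <turn|> is inferred from the example layout, and BOS handling is left to the tokenizer.

```python
GEMMA4_ROLES = ("system", "user", "model")


def render_gemma4_prompt(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] into Gemma 4 turn syntax.

    Assumptions: one newline after the role name and one after each closing
    <turn|>; tool declarations and thinking tokens are not handled here.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in GEMMA4_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|turn>{role}\n{message['content']}<turn|>\n")
    if add_generation_prompt:
        # Leave the model turn open so generation continues from here.
        parts.append("<|turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_gemma4_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Tokyo?"},
    ]))
```

In practice the tokenizer's own chat template should be preferred; a hand-rolled renderer like this is mainly useful for checking what a third-party wrapper emits.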

Recommended sampling (per model card, verified): temperature=1.0, top_p=0.95, top_k=64. Tokenizer vocab = 262k (same as Gemini 2.0). BOS token required — prepend [BOS] / set add_bos=True.
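
To show what these three settings do to a next-token distribution, here is a rough pure-Python sketch. It assumes the common order of temperature scaling, then top-k, then top-p (nucleus) filtering; real inference stacks may order or implement the filters differently, and the function names are hypothetical.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i],
                    reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, weight in zip(ranked, weights):
        prob = weight / total
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {token_id: prob / norm for token_id, prob in kept}


def sample_token(logits, rng=random, **settings):
    """Draw one token id from the filtered distribution."""
    dist = filtered_distribution(logits, **settings)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    print(filtered_distribution([2.0, 1.0, 0.5, -1.0]))
    print(sample_token([2.0, 1.0, 0.5, -1.0], rng=random.Random(0)))
```

With temperature=1.0 the logits are left unscaled, so the only shaping comes from the two truncation steps.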

Gemma 1/2/3 prompt format (different — for reference):

<start_of_turn>user
[message]<end_of_turn>
<start_of_turn>model
[response]<end_of_turn>

Gemma 1/2/3 have no trained tool-use or thinking tokens. PT models end with <eos>; IT models end with <end_of_turn>.
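
Because the two turn syntaxes are mutually incompatible, a cheap string check can flag which one a chat template or prompt was written against. The sketch below is a hypothetical helper that only looks for the turn markers listed in this README; it does not validate the rest of the template.

```python
LEGACY_MARKERS = ("<start_of_turn>", "<end_of_turn>")   # Gemma 1/2/3
GEMMA4_MARKERS = ("<|turn>", "<turn|>")                  # Gemma 4


def detect_turn_syntax(template_text):
    """Classify text as 'gemma4', 'gemma1-3', 'mixed', or 'unknown'."""
    has_legacy = any(marker in template_text for marker in LEGACY_MARKERS)
    has_gemma4 = any(marker in template_text for marker in GEMMA4_MARKERS)
    if has_legacy and has_gemma4:
        return "mixed"
    if has_gemma4:
        return "gemma4"
    if has_legacy:
        return "gemma1-3"
    return "unknown"


if __name__ == "__main__":
    print(detect_turn_syntax("<start_of_turn>user\nhi<end_of_turn>\n"))
    print(detect_turn_syntax("<|turn>user\nhi<turn|>\n"))
```

A "mixed" result is the interesting one when auditing third-party wrappers, since it suggests a Gemma 3 template was only partially migrated.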

Gemma 4 variants (canonical spec from model card)

| Variant | Params | Active | Context | Multimodal |
| --- | --- | --- | --- | --- |
| Gemma 4 E2B | 2.3B effective (5.1B w/ embeddings), 35 layers | - | 128K | text+image+audio (30s max) |
| Gemma 4 E4B | 4.5B effective (8B w/ embeddings), 42 layers | - | 128K | text+image+audio (30s max) |
| Gemma 4 26B A4B | 25.2B total (MoE), 30 layers | 3.8B | 256K | text+image |
| Gemma 4 31B | 30.7B dense, 60 layers | - | 256K | text+image |

All variants: Apache 2.0, base + instruction-tuned (-it), 140+ languages, native function calling, native structured JSON output. Vision encoder = 150M (E2B/E4B) or 550M (26B/31B). Image resolution token budgets: 70, 140, 280, 560, 1120. Released 2026-04-02.

Fetched using

All files fetched via curl -sL from raw.githubusercontent.com on 2026-04-18. Repos enumerated via the GitHub API (https://api.github.com/repos/<owner>/<repo>/contents/<path>). Google docs pages fetched via WebFetch tool. No GitHub auth needed for public raw files (unauthenticated rate limit = 60 req/hr, sufficient for this task).
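
The two URL shapes used for fetching can be sketched as follows. The helper names are hypothetical; the contents endpoint pattern is the one quoted above, the raw-file pattern is the standard raw.githubusercontent.com layout of owner/repo/ref/path, and the pacing figure simply spreads the 60 requests/hour budget evenly.

```python
from urllib.parse import quote

UNAUTHENTICATED_REQUESTS_PER_HOUR = 60  # limit quoted above


def contents_api_url(owner, repo, path="", ref=None):
    """GitHub API URL that lists a directory (or describes a file)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{quote(path)}"
    if ref:
        url += f"?ref={quote(ref, safe='')}"
    return url


def raw_file_url(owner, repo, ref, path):
    """raw.githubusercontent.com URL for one file at a branch or tag."""
    return (f"https://raw.githubusercontent.com/"
            f"{owner}/{repo}/{quote(ref)}/{quote(path)}")


def seconds_between_requests(limit_per_hour=UNAUTHENTICATED_REQUESTS_PER_HOUR):
    """Even pacing that stays inside an hourly request budget."""
    return 3600 / limit_per_hour


if __name__ == "__main__":
    print(contents_api_url("google", "gemma.cpp", "examples"))
    print(raw_file_url("google-deepmind", "gemma", "main", "README.md"))
    print(seconds_between_requests())
```

The resulting raw URL can be passed straight to curl -sL, matching the fetch method described above.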