
# Gemma 4 — Hugging Face Canonical Tooling

Downloaded April 2026. First-party Google/HF content only. No weights, no third-party fine-tunes.

## What's here

### `model-cards/`
Verbatim `README.md` from every `google/gemma-4-*` repo (raw endpoint, ungated). Also includes the chat template and tokenizer config for two representative variants (31B-it and E4B-it). All eight model cards have identical body text; they differ only in the `pipeline_tag:` YAML frontmatter and the size-specific tables.

| File | What it demonstrates |
|------|----------------------|
| `gemma-4-31B-it-README.md` | Flagship dense (33B) instruction-tuned. Full "how to use" from Google+HF. |
| `gemma-4-31B-README.md` | Base (pretrained) variant of the above. |
| `gemma-4-26B-A4B-it-README.md` | MoE (26B params, 4B active) instruction-tuned. The "A4B" = 4B active. |
| `gemma-4-26B-A4B-README.md` | Base MoE. |
| `gemma-4-E4B-it-README.md` | Edge-sized 8B instruction-tuned. Multimodal including audio. |
| `gemma-4-E4B-README.md` | Base E4B. |
| `gemma-4-E2B-it-README.md` | Smallest (5B) instruction-tuned, mobile-targeted. |
| `gemma-4-E2B-README.md` | Base E2B. |
| `gemma-4-31B-it-chat_template.jinja` | **Canonical chat template.** 16KB Jinja — handles system/user/model/tool roles, thinking channel, tool calls, image/audio/video tokens. |
| `gemma-4-E4B-it-chat_template.jinja` | Near-identical to the 31B template (a 131-byte difference, likely a single whitespace-sensitive change around audio handling). |
| `gemma-4-31B-it-tokenizer_config.json` | **Special-token inventory + `response_schema` regex machinery.** See "New capabilities" below. |
| `gemma-4-E4B-it-tokenizer_config.json` | Same shape. |
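
To reproduce comparisons like the 131-byte template difference or the "identical body text" claim, a small standard-library `difflib` helper is enough. This is a minimal sketch, not necessarily how the files were originally compared; `compare_texts` is a hypothetical name.

```python
import difflib


def compare_texts(a: str, b: str, a_name: str = "a", b_name: str = "b") -> tuple[int, list[str]]:
    """Return (UTF-8 byte-size difference b - a, unified diff lines from a to b)."""
    delta = len(b.encode("utf-8")) - len(a.encode("utf-8"))
    diff = list(
        difflib.unified_diff(
            a.splitlines(keepends=True),
            b.splitlines(keepends=True),
            fromfile=a_name,
            tofile=b_name,
        )
    )
    return delta, diff
```

For the two templates, read both `.jinja` files with `Path.read_text(encoding="utf-8")`, pass them in, and print `"".join(diff)`; a whitespace-only change shows up as a `-`/`+` pair that differs only in spacing.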

### `transformers/`
Files under `src/transformers/models/gemma4/` on `huggingface/transformers@main`. The smaller files are copied in full; the two large ones are saved as outlines (class/def signatures plus the first 12 lines of each).

| File | Lines | What |
|------|-------|------|
| `__init__.py` | 33 | Module exports |
| `configuration_gemma4.py` | 352 | `Gemma4Config`, `Gemma4TextConfig`, `Gemma4AudioConfig`, `Gemma4VisionConfig` — all hyperparams |
| `processing_gemma4.py` | 366 | `Gemma4Processor` — the thing `AutoProcessor.from_pretrained` returns. Includes `parse_response()` |
| `feature_extraction_gemma4.py` | 298 | Audio feature extraction (mel spec, padding) |
| `image_processing_gemma4.py` | 220 | Tensor-backed image preprocessing |
| `image_processing_pil_gemma4.py` | 278 | PIL-backed variant (slower fallback) |
| `video_processing_gemma4.py` | 237 | Frame sampling + stitching to image tokens |
| `modeling_gemma4-OUTLINE.py` | 723 | Outline of the 2657-line modeling file (43 classes: attention, MoE, audio encoder, vision tower, all LM heads) |
| `modular_gemma4-OUTLINE.py` | 563 | Outline of the modular source file. Shows that Gemma4 **inherits from Gemma3n classes** (RMSNorm, attention blocks, etc.), confirming the 3n→4 lineage. |

Full files: https://github.com/huggingface/transformers/tree/main/src/transformers/models/gemma4
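
The outline convention (signatures plus the first 12 lines of each class/def) can be approximated with the standard-library `ast` module. This is a hypothetical reconstruction, not necessarily the script that produced the `-OUTLINE.py` files: overlapping ranges (a class and its first method) are merged so no line prints twice, and skipped regions become `...`.

```python
import ast

DEF_NODES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def outline(source: str, head_lines: int = 12) -> str:
    """Keep the first `head_lines` lines of every class/def, nested ones included.

    Overlapping line ranges are merged; gaps between kept ranges become '    ...'.
    Decorators are not kept because node.lineno points at the def/class line.
    """
    lines = source.splitlines()
    keep: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DEF_NODES):
            start = node.lineno - 1  # 0-based index of the def/class line
            # end_lineno is 1-based inclusive, i.e. the 0-based exclusive end
            end = min(node.end_lineno, start + head_lines)
            keep.update(range(start, end))
    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("    ...")  # marks code skipped between kept ranges
        out.append(lines[i])
        prev = i
    return "\n".join(out)
```

Running `outline(Path("modeling_gemma4.py").read_text())` on a local checkout gives a file comparable in spirit to the saved outlines, though exact line counts will depend on these choices.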

### `recipes/`
From `huggingface/huggingface-gemma-recipes`, the canonical HF recipe repo. As of April 2026 the only Gemma 4-specific recipe is a single notebook; the rest target Gemma 3n, which is architecturally the parent of Gemma 4.

| File | What |
|------|------|
| `notebooks/Gemma4_E2B-Multimodal.ipynb` | **The one first-party Gemma-4 recipe.** Original ipynb. 36 cells: image, video, audio, function calling, object detection with `box_2d`, any-to-any pipeline, captioning. |
| `notebooks/Gemma4_E2B-Multimodal-extracted.py` | Same notebook flattened to readable .py for grep/diff. |
| `scripts/ft_gemma3n_image_trl.py` | TRL SFT fine-tune of Gemma 3n on images. Direct precursor to Gemma 4 SFT. |
| `scripts/ft_gemma3n_image_vt.py` | Vision+text fine-tune without TRL (pure Transformers Trainer). |
| `scripts/ft_gemma3n_audio_vt.py` | Audio+text fine-tune. |
| `scripts/gemma3n_fine_tuning_on_all_modalities.py` | All-modalities SFT script — template for full Gemma-4 all-modal SFT. |
| `scripts/carla_vlm_gemma.py` | CARLA driving sim VLM example using Gemma. |

### `trl/`
**Empty as of April 2026.** Searched `huggingface/trl/examples/scripts` — only `sft_gemma3.py` and `sft_vlm_gemma3.py` exist, no gemma4 yet. The gemma-recipes repo's `ft_gemma3n_image_trl.py` is the closest first-party TRL pattern; it is saved under `recipes/scripts/` above.

### `peft/`
**Empty as of April 2026.** `huggingface/peft/examples` has no Gemma-specific directory. The canonical HF PEFT guide for Gemma is the "Fine-Tuning Gemma Models in Hugging Face" blog post, saved as `blog/gemma-peft-blog.md`. It covers Gemma 1, but the LoRA target-module patterns apply unchanged to Gemma 4 (same `q_proj`/`k_proj`/`v_proj`/`o_proj` naming).
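
Whether those names still resolve on a Gemma 4 checkpoint can be checked without loading PEFT: list the module names and filter them roughly the way PEFT treats a *list* of `target_modules` (exact name, or a match on the last dotted component). This is a simplified sketch of that rule; the module names in the test are hypothetical, and on a real model you would pass `[name for name, _ in model.named_modules()]`.

```python
from typing import Iterable, Sequence

DEFAULT_TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj")


def lora_matches(module_names: Iterable[str], targets: Sequence[str] = DEFAULT_TARGETS) -> list[str]:
    """Return the module names a suffix-based target list would select.

    Simplified version of PEFT's list matching: a module matches if its full
    name equals a target or ends with '.' + target.
    """
    return [
        name
        for name in module_names
        if any(name == t or name.endswith("." + t) for t in targets)
    ]
```

Because matching is by suffix, attention projections inside a vision or audio tower that reuse the same names would be selected too. If only the language model should get adapters, PEFT also accepts a single regex string for `target_modules`, matched against the full module name.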

### `blog/`
| File | What |
|------|------|
| `gemma4-blog.md` | **"Welcome Gemma 4: Frontier multimodal intelligence on device"** — the HF launch blog. 764 lines. Authored by merve. Covers architecture, capabilities, transformers usage, HF Inference API, llama.cpp/MLX quantization, thinking mode examples. |
| `gemma-peft-blog.md` | "Fine-Tuning Gemma Models in Hugging Face" — the PEFT/LoRA recipe blog (gemma-agnostic, target modules unchanged for Gemma 4). |

### `spaces/`
The official HF-run interactive demo Spaces.

| File | What |
|------|------|
| `huggingface-projects_gemma-4-31b-it-app.py` | Official 31B demo (Gradio 6 chat + multimodal). |
| `huggingface-projects_gemma-4-e4b-it-app.py` | Official E4B demo. **More illustrative** — shows the full multimodal+thinking pattern in ~320 lines. |
| `*-requirements.txt` | Pinned deps. The Spaces pin **`transformers==5.5.4`** (as of 2026-04-18), a known-good version for Gemma 4; a pin alone does not establish the minimum supported version. |
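
A fail-fast guard against running an older transformers than the Spaces use should compare release numbers as integer tuples, since plain string comparison ranks `5.10.0` below `5.5.4`. A minimal sketch: the 5.5.4 floor is taken from the Space pin above, and pre-release or dev suffixes are deliberately ignored, so a `5.5.4.dev0` build counts as 5.5.4.

```python
import re
from importlib.metadata import PackageNotFoundError, version

SPACE_PIN = "5.5.4"  # transformers version pinned by the official Spaces (2026-04-18)


def release_tuple(v: str) -> tuple[int, ...]:
    """'5.5.4' -> (5, 5, 4); '5.6.0.dev0' -> (5, 6, 0). Non-numeric suffixes are dropped."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"not a release version: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def at_least(installed: str, floor: str = SPACE_PIN) -> bool:
    """True if `installed` is at or above `floor`, comparing numeric release parts."""
    return release_tuple(installed) >= release_tuple(floor)


def transformers_ok(floor: str = SPACE_PIN) -> bool:
    """True if the installed transformers meets `floor`; False if it is not installed."""
    try:
        return at_least(version("transformers"), floor)
    except PackageNotFoundError:
        return False
```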

---

## New capabilities the HF integration exposes that weren't in the existing corpus

1. **`AutoModelForMultimodalLM`** — new transformers AutoClass, not `AutoModelForCausalLM`. Required to get any-to-any routing (text+image+audio+video in, text out). The corpus's `CORPUS_capabilities.md` should note this.

2. **`processor.parse_response(text) -> dict`** — built into `Gemma4Processor`. Returns `{thinking, content, tool_calls}` parsed from raw decoded output. Driven by regexes declared in `tokenizer_config.json` under `response_schema` (new HF feature using `x-regex`, `x-regex-iterator`, and a custom `x-parser: gemma4-tool-call`). **You no longer need to hand-roll tool-call regex parsing** if you use the HF processor — this is the HF-canonical replacement for the manual parsing done in `CORPUS_tool_calling_format.md`.

3. **`enable_thinking=True`** — a kwarg to `processor.apply_chat_template()`. When set, injects `<|think|>` at the top of the system turn. **This is how you turn reasoning mode on** through the HF API. Not documented in the existing corpus.

4. **`load_audio_from_video=True`** — another `apply_chat_template` kwarg. Pulls the audio track out of a video URL and feeds it as audio tokens alongside sampled frames. Only relevant for E2B/E4B which have audio; the notebook comment explicitly calls this out.

5. **`pipeline("any-to-any", model=...)`** — a new HF pipeline task registered for Gemma 4. Accepts the chat-style messages list directly. Easiest one-liner for multimodal inference.

6. **Object detection via `box_2d` JSON** — prompting with "What's the bounding box for the X?" returns `[{"box_2d": [ymin, xmin, ymax, xmax], "label": "..."}]` in a 1000x1000 normalized coordinate frame, with images resized to multiples of 48 pixels. This is a Gemma-4-specific convention the notebook demonstrates. Corpus doesn't cover this.

7. **Thinking delimiters are `<|channel>thought...<channel|>`** — not `<thinking>...</thinking>` like some other open-weights models. The Space app explicitly strips these to pass to Gradio 6's `reasoning_tags` for collapsible thinking UI.

8. **Breaking change in role/turn markers vs Gemma 3** — Gemma 3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses `<|turn>user\n ... <turn|>`. Tokenizer config:
   - `sot_token`: `<|turn>` (start of turn)
   - `eot_token`: `<turn|>` (end of turn)
   - Role after `<|turn>` can be `system`, `user`, `model`, or `tool`.
   - `enable_thinking` injects a `<|think|>` marker into the first system turn.
   Anything in the homelab that hard-codes `<start_of_turn>` for Gemma needs to branch on family version. Worth adding to `GOTCHAS.md`.
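
The `box_2d` output from item 6 uses a 0-1000 frame in `[ymin, xmin, ymax, xmax]` order, so drawing it on the original image needs a rescale and an axis swap. A minimal sketch, assuming the normalized frame spans the whole image with no padding (so each axis scales linearly to the original width and height) and that the reply has already been reduced to a bare JSON array, with any markdown fence stripped first. `boxes_to_pixels` is a hypothetical helper.

```python
import json


def boxes_to_pixels(model_json: str, width: int, height: int) -> list[dict]:
    """Convert box_2d detections into pixel (x0, y0, x1, y1) boxes.

    Input: a JSON array like [{"box_2d": [ymin, xmin, ymax, xmax], "label": "..."}]
    with coordinates normalized to 0-1000.
    """
    boxes = []
    for det in json.loads(model_json):
        ymin, xmin, ymax, xmax = det["box_2d"]  # note: y comes first
        boxes.append({
            "label": det.get("label", ""),
            "xyxy": (
                round(xmin / 1000 * width),
                round(ymin / 1000 * height),
                round(xmax / 1000 * width),
                round(ymax / 1000 * height),
            ),
        })
    return boxes
```

The `(x0, y0, x1, y1)` tuple is a layout that PIL's `ImageDraw.rectangle` accepts directly.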
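
For code that must keep recognising turn markers (stop strings, log scrapers) rather than emitting them, the version branch from item 8 can key off the repo id. A sketch under stated assumptions: the regex only handles ids shaped like `gemma-3-27b-it`, `gemma-3n-E4B-it`, or `gemma-4-E4B-it`; Gemma 3n is assumed to share Gemma 3's `<start_of_turn>` markers; and anything unrecognised raises rather than guessing. The helper names are hypothetical.

```python
import re

GEMMA3_MARKERS = ("<start_of_turn>", "<end_of_turn>")  # Gemma 3-era style (assumed for 3n)
GEMMA4_MARKERS = ("<|turn>", "<turn|>")                # sot_token / eot_token in Gemma 4


def gemma_generation(model_id: str) -> int:
    """Infer the Gemma generation from ids like 'google/gemma-4-E4B-it' or 'gemma-3n-E2B'."""
    name = model_id.rsplit("/", 1)[-1].lower()
    m = re.search(r"gemma-(\d+)n?-", name)
    if m is None:
        raise ValueError(f"cannot infer Gemma generation from {model_id!r}")
    return int(m.group(1))


def turn_markers(model_id: str) -> tuple[str, str]:
    """Return the (start-of-turn, end-of-turn) strings for the model's family."""
    gen = gemma_generation(model_id)
    if gen <= 3:
        return GEMMA3_MARKERS
    if gen == 4:
        return GEMMA4_MARKERS
    raise ValueError(f"no known turn markers for Gemma {gen}")
```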

---

## Canonical chat template format

**Source of truth:** the two `.jinja` files in `model-cards/`, copied from the model repos. **Do not reimplement them.** `AutoProcessor.from_pretrained` loads the repo's template automatically:

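```python
from transformers import AutoModelForMultimodalLM, AutoProcessor

model_id = "google/gemma-4-E4B-it"
processor = AutoProcessor.from_pretrained(model_id)  # also loads the chat template
model = AutoModelForMultimodalLM.from_pretrained(model_id, dtype="auto", device_map="auto")

# Illustrative OpenAI-style tool schema; names and fields are placeholders.
WEATHER_TOOL = {"type": "function", "function": {
    "name": "get_weather", "description": "Current weather for a city.",
    "parameters": {"type": "object", "required": ["city"],
                   "properties": {"city": {"type": "string", "description": "City name"}}}}}
messages = [{"role": "user", "content": [{"type": "text", "text": "Weather in London?"}]}]

inputs = processor.apply_chat_template(
    messages,
    tools=[WEATHER_TOOL],           # optional; OpenAI-style tool schema
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
    enable_thinking=True,           # turns on reasoning, injects <|think|>
    load_audio_from_video=False,    # only for video inputs
).to(model.device)
input_len = inputs["input_ids"].shape[-1]
output = model.generate(**inputs, max_new_tokens=1000)
# Keep special tokens: parse_response() needs the channel/tool-call delimiters.
generated = processor.decode(output[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(generated)
# → {"thinking": "...", "content": "...", "tool_calls": [...]}
```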

### Wire format that the template produces

```
<bos><|turn>system
<|think|>
{system prompt here if any}
<|tool>declaration:get_weather{city:{type:<|"|>STRING<|"|>,description:<|"|>...<|"|>}}<tool|>
<turn|>
<|turn>user
{user text}
<|image|>                       ← placeholder for each image
<|audio|>                       ← placeholder for each audio
<|video|>                       ← placeholder for each video
<turn|>
<|turn>model
<|channel>thought
{reasoning text}
<channel|>
<|tool_call>call:get_weather{city:<|"|>London<|"|>}<tool_call|>
<|tool_response>response:get_weather{temperature:15}<tool_response|>
{final content}
<turn|>
```
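
Because prompts should be rendered by the processor, a cheap regression check is to render with `tokenize=False` (for example `processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)`) and inspect the string against the wire format above. A minimal sketch; `check_gemma4_prompt` and its rules are assumptions derived from this README, not part of transformers.

```python
def check_gemma4_prompt(rendered: str, thinking: bool) -> list[str]:
    """Return problems found in a rendered Gemma 4 prompt; an empty list means none found."""
    problems = []
    # Gemma 3-era markers should never appear in a Gemma 4 prompt.
    if "<start_of_turn>" in rendered or "<end_of_turn>" in rendered:
        problems.append("contains Gemma 3 turn markers")
    opens, closes = rendered.count("<|turn>"), rendered.count("<turn|>")
    # With add_generation_prompt=True the trailing '<|turn>model' stays open,
    # so allow at most one more open marker than close markers.
    if opens - closes not in (0, 1):
        problems.append(f"unbalanced turns: {opens} open vs {closes} close")
    # enable_thinking=True injects <|think|> into the system turn.
    if thinking != ("<|think|>" in rendered):
        problems.append("<|think|> presence does not match enable_thinking")
    return problems
```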

Every Gemma-4-specific token appears in `tokenizer_config.json`. Because `apply_chat_template()`, the `response_schema`, and `parse_response()` together cover the full round trip, **homelab code should never hand-emit these tokens**; always go through the processor.

---

## Source URLs (first-party only)

- Model collection: https://huggingface.co/collections/google/gemma-4
- transformers gemma4 dir: https://github.com/huggingface/transformers/tree/main/src/transformers/models/gemma4
- Recipes repo: https://github.com/huggingface/huggingface-gemma-recipes
- Launch blog: https://huggingface.co/blog/gemma4
- Official 31B Space: https://huggingface.co/spaces/huggingface-projects/gemma-4-31b-it
- Official E4B Space: https://huggingface.co/spaces/huggingface-projects/gemma-4-e4b-it