Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks

Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-18 12:24:48 -04:00


Hugging Face Gemma Recipes

🤗💎 Welcome! This repository contains minimal recipes to get started quickly with the Gemma family of models.

Note

A notebook for Gemma 4 multimodal inference (vision, video, audio, function calling, object detection) is listed under Inference below.

Getting Started

To run a Gemma 💎 model on your machine, whether for inference or fine-tuning, install the latest versions of timm (for the vision encoder) and 🤗 transformers.

$ pip install -U -q transformers timm
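To confirm what actually got installed before running the examples, a small standard-library check can help. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical helper: report the two packages the install command pulls in.
for name in ("transformers", "timm"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either package prints "not installed", rerun the pip command in the same environment that runs your Python interpreter.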

Inference with pipeline

The easiest way to start using Gemma 3n is by using the pipeline abstraction in transformers:

import torch
from transformers import pipeline

pipe = pipeline(
   "image-text-to-text",
   model="google/gemma-3n-E4B-it", # or "google/gemma-3n-E2B-it"
   device="cuda",
   torch_dtype=torch.bfloat16
)

messages = [
   {
       "role": "user",
       "content": [
           {"type": "image", "url": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
           {"type": "text", "text": "Describe this image"}
       ]
   }
]

output = pipe(text=messages, max_new_tokens=32)
print(output[0]["generated_text"][-1]["content"])
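The indexing in the final `print` works because, for chat-style input, the pipeline result holds the whole conversation with the newly generated turn last. The sketch below wraps that lookup in a small function and runs it on a hand-written stand-in, so it needs no model. The name `last_reply` and the sample data are illustrative assumptions, not part of transformers.

```python
def last_reply(pipeline_output):
    """Return the content of the final chat turn from a chat-style pipeline result."""
    chat = pipeline_output[0]["generated_text"]  # full conversation, newest turn last
    return chat[-1]["content"]


# Hand-written stand-in with the same shape as `output` (not real model output).
sample_output = [
    {
        "generated_text": [
            {"role": "user", "content": [{"type": "text", "text": "Describe this image"}]},
            {"role": "assistant", "content": "<model reply would appear here>"},
        ]
    }
]
print(last_reply(sample_output))
```

With a real pipeline, `last_reply(output)` would return the same string that the `print` call displays.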

Detailed inference with transformers

Initialize the model and the processor from the Hub, then define a model_generation function that processes the prompt and runs inference on the model.

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "google/gemma-3n-e4b-it" # google/gemma-3n-e2b-it
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)

def model_generation(model, messages):
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
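    # Remember the prompt length so the echoed prompt can be sliced off later
    input_len = inputs["input_ids"].shape[-1]

    # Move tensors to the model's device; floating-point inputs are cast to its dtype
    inputs = inputs.to(model.device, dtype=model.dtype)

    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=32, disable_compile=False)
        # Keep only the newly generated tokens
        generation = generation[:, input_len:]

    decoded = processor.batch_decode(generation, skip_special_tokens=True)
    print(decoded[0])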
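The function slices with `input_len` because, for a decoder-only model, `generate` returns each sequence as the prompt tokens followed by the new tokens. The same slicing can be illustrated without a model, using plain lists. The helper `strip_prompt` and the token ids below are made up for this sketch.

```python
def strip_prompt(sequences, input_len):
    """Drop the first `input_len` token ids from every generated sequence."""
    return [row[input_len:] for row in sequences]


# Made-up token ids: a 4-token prompt followed by 3 newly generated tokens.
prompt_ids = [2, 105, 2364, 107]
generated = [prompt_ids + [9259, 236888, 106]]

new_tokens = strip_prompt(generated, input_len=len(prompt_ids))
print(new_tokens)  # [[9259, 236888, 106]]
```

Without this step, the decoded string would begin with the rendered prompt rather than the model's answer.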

Then call it with the modality you need:

Text only

# Text Only

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"}
        ]
    }
]
model_generation(model, messages)

Interleaved with Audio

# Interleaved with Audio

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in English:"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
        ]
    }
]
model_generation(model, messages)

Interleaved with Image/Video

# Interleaved with Image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
model_generation(model, messages)
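The three message lists above share one shape: a single user turn whose content is a list of typed parts. A small builder can cut down on the repetition. This is a sketch under assumptions: `build_user_message` is a hypothetical helper, and it always places media before the text, which matches the image example but differs from the audio example, where the text comes first.

```python
def build_user_message(text, image=None, audio=None):
    """Build a single-turn `messages` list in the shape used by the examples above."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    # Assumption: the text part goes last; reorder if your prompt needs text first.
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


messages = build_user_message(
    "Describe this image.",
    image="https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg",
)
print(messages)
```

The result can be passed straight to `model_generation(model, messages)`.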

Inference

Gemma 4

Notebooks

Multimodal inference with Gemma 4 (vision, video, audio, function calling, object detection)

Gemma 3n

Notebooks

Multimodal inference using Gemma 3n via pipeline

Function Calling

Gemma 3n

Notebooks

Function Calling with Gemma 3n: Local File Reader

Fine Tuning

We include a series of notebooks and scripts for fine-tuning the models.

Gemma 3n

Notebooks

Scripts

Gemma 3

RAG

Gemma 3n

Retrieval-Augmented Generation with Gemma 3n

Before fine-tuning the model, ensure all dependencies are installed:

$ pip install -U -q -r requirements.txt

✨ Bonus: We've also experimented with adding object detection 🔍 capabilities to Gemma 3. You can explore that work in this dedicated repo.
