Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00


ShieldGemma / ShieldGemma 2

Safety classifiers. Two separate product lines now: ShieldGemma (text, built on Gemma 2) and ShieldGemma 2 (images, built on Gemma 3). There is no Gemma 4 generation yet.

What it is

  • ShieldGemma (text): LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits Yes/No (yes = violates policy). Four harm types.
  • ShieldGemma 2 (image): Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only: it scores image content and cannot moderate text prompts or responses.

Sizes

  • ShieldGemma: 2B, 9B, 27B — all instruction-tuned.
  • ShieldGemma 2: 4B — only size available.

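Model cards

  • ShieldGemma: google/shieldgemma-2b, google/shieldgemma-9b, google/shieldgemma-27b (Hugging Face)
  • ShieldGemma 2: google/shieldgemma-2-4b-it (Hugging Face)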

Safety categories

ShieldGemma (text):

  1. Sexually explicit content
  2. Dangerous content
  3. Hate speech
  4. Harassment

ShieldGemma 2 (image):

  1. Sexually explicit content
  2. Dangerous content
  3. Violence / gore

Note the image model dropped "hate" and "harassment" (hard to define visually) and added "violence/gore" (a visual primitive).

Text prompt format (ShieldGemma)

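Five components, in order (per the ShieldGemma model card): preamble, user prompt, optional model response, policy, epilogue. Each turn is wrapped in <start_of_turn> / <end_of_turn> and labeled inside the turn as Human Question or Chatbot Response:

<preamble establishing "you are a policy expert">

<start_of_turn>
Human Question: <user prompt here>
<end_of_turn>

<start_of_turn>
Chatbot Response: <optional model response here>
<end_of_turn>

Our safety principle is defined in the below:

* <policy description, e.g. "No Hate Speech": ...>

===

Does the human question violate the above principle? Your answer must start with 'Yes' or 'No'. And then walk through step by step to be sure we answer correctly.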

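The answer starts with Yes (violates) or No (safe); the epilogue also asks for step-by-step reasoning afterward, which a classifier pipeline can skip. When judging a model response, the preamble and final question are reworded to ask about the response rather than the user prompt. For a score you can threshold, run one forward pass (no generation) and softmax the next-token logits over just the Yes and No tokens; P(Yes) is the violation probability.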
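A minimal sketch of the text path in plain Python, so the structure is visible without downloading a model. `build_shieldgemma_prompt` and `yes_probability` are hypothetical helper names; the preamble, the Human Question / Chatbot Response labels and the epilogue wording follow the model card's example as best we know it, so diff the exact strings against the card before relying on them.

```python
import math
from typing import Optional


def build_shieldgemma_prompt(user_prompt: str, policy: str,
                             model_response: Optional[str] = None) -> str:
    """Assemble the five components in order: preamble, user turn,
    optional model-response turn, policy, epilogue."""
    judging_response = model_response is not None
    # Wording follows the model card example (assumption: verify against the card).
    subject = "an AI response to prompt" if judging_response else "a user prompt"
    target = "Chatbot Response" if judging_response else "human question"

    parts = [
        f"You are a policy expert trying to help determine whether {subject} "
        "is in violation of the defined safety policies.",
        "",
        "<start_of_turn>",
        f"Human Question: {user_prompt.strip()}",
        "<end_of_turn>",
        "",
    ]
    if judging_response:
        parts += ["<start_of_turn>",
                  f"Chatbot Response: {model_response.strip()}",
                  "<end_of_turn>",
                  ""]
    parts += [
        "Our safety principle is defined in the below:",
        "",
        f"* {policy.strip()}",
        "",
        "===",
        "",
        f"Does the {target} violate the above principle? Your answer must start "
        "with 'Yes' or 'No'. And then walk through step by step to be sure we "
        "answer correctly.",
    ]
    return "\n".join(parts)


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over only the Yes/No logits; returns P(Yes), i.e. P(violation).
    With two classes this equals sigmoid(yes_logit - no_logit)."""
    m = max(yes_logit, no_logit)  # subtract the max so exp() cannot overflow
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

With transformers, the two logits would come from `model(**inputs).logits[0, -1]` at the token ids `tokenizer.get_vocab()["Yes"]` and `tokenizer.get_vocab()["No"]`, which is the approach the model card's scoring example takes. Restricting the softmax to two tokens ignores whatever probability mass the model puts elsewhere, which is what makes the result a clean two-way score.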

Minimum invocation — ShieldGemma 2 (image)

from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import torch

model_id = "google/shieldgemma-2-4b-it"
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Normalize to 3-channel RGB (handles PNG alpha and grayscale inputs)
image = Image.open("input.jpg").convert("RGB")
# With no policies argument, the processor checks its built-in image categories
inputs = processor(images=[image], return_tensors="pt")

with torch.inference_mode():
    out = model(**inputs)

# One row per (image, policy) pair: column 0 = P(Yes, violates), column 1 = P(No)
print(out.probabilities)

When to choose it over base Gemma 4

  • You need a calibrated safety score, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits — easy to threshold.
  • You want policy-by-policy classification (e.g., run each category separately with different thresholds).
  • You're running a moderation pipeline and need a small, fast, purpose-trained classifier rather than a general chat model reasoning about safety.
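A minimal sketch of per-category thresholding, assuming you already have one P(Yes) per category (from one ShieldGemma text run per policy, or from ShieldGemma 2's per-policy rows). The category keys reuse the image list above; the threshold values and the `flag_violations` helper are hypothetical and should be tuned on your own labeled data.

```python
from typing import Dict, List

# Hypothetical per-category cut-offs on P(Yes); stricter (lower) for sexual content.
THRESHOLDS: Dict[str, float] = {
    "Sexually explicit content": 0.3,
    "Dangerous content": 0.5,
    "Violence / gore": 0.6,
}


def flag_violations(p_yes: Dict[str, float],
                    thresholds: Dict[str, float],
                    default: float = 0.5) -> List[str]:
    """Return categories whose P(Yes) meets or exceeds that category's threshold.
    Categories without an explicit threshold fall back to `default`."""
    return sorted(cat for cat, p in p_yes.items() if p >= thresholds.get(cat, default))


scores = {"Sexually explicit content": 0.35, "Dangerous content": 0.2, "Violence / gore": 0.7}
print(flag_violations(scores, THRESHOLDS))
```

Keeping thresholds in a per-category table means tightening one policy never shifts the false-positive rate of the others.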

Use base Gemma 4 for "explain why this is unsafe" narrative output. ShieldGemma is the yes/no stamp.

Homelab fit

Pre-filter for ai-visualizer (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.
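A back-of-the-envelope check of the fit claim: weight memory is roughly parameters × bits ÷ 8 bytes. This sketch assumes a nominal 4B parameter count (the real count is somewhat higher) and ignores activations, quantization scales and framework overhead, so treat the numbers as lower bounds. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB: params * bits / 8 bytes.
    Ignores activations, KV cache and quantization metadata (scales/zero points)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal 4B parameters (assumption; not the exact ShieldGemma 2 count)
for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(4e9, bits):.1f} GB")
```

On weights alone, 4-bit puts the classifier around 2 GB and even fp16 stays near 8 GB, so on a 32 GB card the real constraint is SDXL's own working set plus runtime overhead rather than the classifier.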