ShieldGemma / ShieldGemma 2
Safety classifiers. There are now two separate product lines: ShieldGemma (text, built on Gemma 2) and ShieldGemma 2 (images, built on Gemma 3). Neither has a Gemma 4-generation release yet.
What it is
- ShieldGemma (text): LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits Yes/No (Yes = violates policy). Four harm types.
- ShieldGemma 2 (image): Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only: does not accept text.
Sizes
- ShieldGemma: 2B, 9B, 27B — all instruction-tuned.
- ShieldGemma 2: 4B — only size available.
Model cards
- Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card
- Image: https://huggingface.co/google/shieldgemma-2-4b-it
- DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/
Safety categories
ShieldGemma (text):
- Sexually explicit content
- Dangerous content
- Hate speech
- Harassment
ShieldGemma 2 (image):
- Sexually explicit content
- Dangerous content
- Violence / gore
Note the image model dropped "hate" and "harassment" (hard to define visually) and added "violence/gore" (a visual primitive).
Text prompt format (ShieldGemma)
Five-component structure (preamble, user prompt, optional model response, policy description, closing question):
<preamble establishing "you are a policy expert">
<start_of_turn>user
<user prompt here>
<end_of_turn>
<start_of_turn>model
<optional model response here>
<end_of_turn>
Our safety principle is defined in the below:
* <policy description, e.g. "No Hate Speech": ...>
Does the human prompt/response violate the above principle? Your answer must start with 'Yes' or 'No'.
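As a minimal sketch, the template above could be assembled with a small helper like the one below. The function name `build_shieldgemma_prompt` is hypothetical, the preamble is passed in by the caller because its exact wording is not given here, and dropping the whole model turn when there is no response is an assumption about how the "optional" component is meant to work.

```python
from typing import Optional

# Closing question copied from the template above.
EPILOGUE = (
    "Does the human prompt/response violate the above principle? "
    "Your answer must start with 'Yes' or 'No'."
)


def build_shieldgemma_prompt(
    preamble: str,
    user_prompt: str,
    policy: str,
    model_response: Optional[str] = None,
) -> str:
    """Join the five components in the order shown in the template (hypothetical helper)."""
    parts = [
        preamble,
        "<start_of_turn>user",
        user_prompt,
        "<end_of_turn>",
    ]
    # Assumption: the model turn is omitted entirely when no response is supplied.
    if model_response is not None:
        parts += ["<start_of_turn>model", model_response, "<end_of_turn>"]
    parts += [
        "Our safety principle is defined in the below:",
        f"* {policy}",
        EPILOGUE,
    ]
    return "\n".join(parts)
```

Calling it once per policy keeps each classification independent, which matches the one-policy-per-prompt shape of the template.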
The model outputs one token: Yes (violates) or No (safe). Softmax the logits on those two tokens for a calibrated score.
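The two-token softmax can be sketched in plain Python as follows. It assumes you have already pulled the next-token logits for the Yes and No vocabulary entries out of the model's output; the function name `violation_probability` and its arguments are illustrative, not part of any library.

```python
import math


def violation_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax restricted to the Yes and No tokens; returns P(Yes), i.e. P(violates)."""
    # Subtract the max before exponentiating so large logits do not overflow.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

Restricting the softmax to these two tokens ignores probability mass on every other token, so the result is a score relative to the Yes/No pair rather than over the full vocabulary.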
Minimum invocation — ShieldGemma 2 (image)
from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import torch
model_id = "google/shieldgemma-2-4b-it"
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)
image = Image.open("input.jpg")
inputs = processor(images=[image], return_tensors="pt")
with torch.inference_mode():
    out = model(**inputs)
print(out.probabilities)  # one row per category: [P(Yes), P(No)]
When to choose it over base Gemma 4
- You need a calibrated safety score, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits — easy to threshold.
- You want policy-by-policy classification (e.g., run each category separately with different thresholds).
- You're running a moderation pipeline and need a small, fast, purpose-trained classifier rather than a general chat model reasoning about safety.
Use base Gemma 4 for "explain why this is unsafe" narrative output. ShieldGemma is the yes/no stamp.
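The per-policy thresholding mentioned above might look like the sketch below. The helper `flag_violations`, the category keys, and all numbers are illustrative assumptions rather than measured model output or recommended thresholds.

```python
from typing import Dict, List


def flag_violations(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,
) -> List[str]:
    """Return the categories whose P(Yes) meets or exceeds that category's threshold."""
    return [
        category
        for category, p_yes in scores.items()
        if p_yes >= thresholds.get(category, default_threshold)
    ]


# Illustrative values only: one P(Yes) per category, one threshold per category.
scores = {"sexually_explicit": 0.10, "dangerous_content": 0.62, "violence_gore": 0.35}
thresholds = {"sexually_explicit": 0.30, "dangerous_content": 0.70, "violence_gore": 0.30}
print(flag_violations(scores, thresholds))  # ['violence_gore']
```

Keeping thresholds in a separate mapping makes it possible to tune one category without touching the others.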
Homelab fit
Pre-filter for ai-visualizer (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.
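A rough weights-only estimate supports the fit claim. The sketch below assumes a nominal 4B parameters at 4 bits for ShieldGemma 2 and roughly 3.5B parameters at 16 bits for the SDXL base pipeline; both parameter counts are assumptions for illustration, and activations, caches, and runtime overhead are deliberately left out.

```python
def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


# Assumed sizes, not measured on the target host.
shieldgemma2_q4 = weight_gb(4.0, 4)   # 4B params at 4-bit
sdxl_fp16 = weight_gb(3.5, 16)        # ~3.5B params at fp16
total = shieldgemma2_q4 + sdxl_fp16
headroom = 32.0 - total               # V100-PCIE-32GB, nominal capacity

print(shieldgemma2_q4, sdxl_fp16, total, headroom)  # 2.0 7.0 9.0 23.0
```

The remaining headroom has to cover activations and framework overhead for both models, so actual usage should be checked on the card itself.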