# ShieldGemma / ShieldGemma 2 Safety classifiers. Two separate product lines now: **ShieldGemma** (text, built on Gemma 2) and **ShieldGemma 2** (images, built on Gemma 3). There is no Gemma 4 generation yet. ## What it is - **ShieldGemma (text):** LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits `Yes`/`No` (yes = violates policy). Four harm types. - **ShieldGemma 2 (image):** Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only — does not accept text. ## Sizes - ShieldGemma: **2B, 9B, 27B** — all instruction-tuned. - ShieldGemma 2: **4B** — only size available. ## Model cards - Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card - Image: https://huggingface.co/google/shieldgemma-2-4b-it - DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/ ## Safety categories **ShieldGemma (text):** 1. Sexually explicit content 2. Dangerous content 3. Hate speech 4. Harassment **ShieldGemma 2 (image):** 1. Sexually explicit content 2. Dangerous content 3. Violence / gore Note the image model dropped "hate" and "harassment" (hard to define visually) and added "violence/gore" (a visual primitive). ## Text prompt format (ShieldGemma) Five-component structure: ``` user model Our safety principle is defined in the below: * Does the human prompt/response violate the above principle? Your answer must start with 'Yes' or 'No'. ``` The model outputs one token: `Yes` (violates) or `No` (safe). Softmax the logits on those two tokens for a calibrated score. ## Minimum invocation — ShieldGemma 2 (image) ```python from transformers import AutoProcessor, ShieldGemma2ForImageClassification from PIL import Image import torch model_id = "google/shieldgemma-2-4b-it" model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval() processor = AutoProcessor.from_pretrained(model_id) image = Image.open("input.jpg") inputs = processor(images=[image], return_tensors="pt") with torch.inference_mode(): out = model(**inputs) print(out.probabilities) # tensor of per-category "Yes" probabilities ``` ## When to choose it over base Gemma 4 - You need a **calibrated safety score**, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits — easy to threshold. - You want **policy-by-policy classification** (e.g., run each category separately with different thresholds). - You're running a moderation pipeline and need **a small, fast, purpose-trained classifier** rather than a general chat model reasoning about safety. Use base Gemma 4 for "explain *why* this is unsafe" narrative output. ShieldGemma is the yes/no stamp. ## Homelab fit Pre-filter for `ai-visualizer` (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.