# ShieldGemma / ShieldGemma 2

Safety classifiers. There are now two separate product lines: **ShieldGemma** (text, built on Gemma 2) and **ShieldGemma 2** (images, built on Gemma 3). There is no Gemma 4 generation yet.

## What it is

- **ShieldGemma (text):** LLM-as-a-judge safety classifier. Takes a user prompt, an optional model response, and a policy description; emits `Yes` or `No` (`Yes` = violates the policy). Covers four harm types.
- **ShieldGemma 2 (image):** Image classifier. Takes a PIL image and emits probabilities across three image-safety categories. Image-only: it does not classify text.

## Sizes

- ShieldGemma: **2B, 9B, 27B**, all instruction-tuned.
- ShieldGemma 2: **4B** (the only size available).

## Model cards

- Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card
- Image: https://huggingface.co/google/shieldgemma-2-4b-it
- DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/

## Safety categories

**ShieldGemma (text):**

1. Sexually explicit content
2. Dangerous content
3. Hate speech
4. Harassment

**ShieldGemma 2 (image):**

1. Sexually explicit content
2. Dangerous content
3. Violence / gore

Note that the image model drops "hate speech" and "harassment" (hard to define visually) and adds "violence/gore" (a visual primitive).

## Text prompt format (ShieldGemma)

Five components, in order: a preamble, the user prompt, an optional model response, the policy description, and a closing question:

```
<preamble establishing "you are a policy expert">

<start_of_turn>user
<user prompt here>
<end_of_turn>

<start_of_turn>model
<optional model response here>
<end_of_turn>

Our safety principle is defined in the below:

* <policy description, e.g. "No Hate Speech": ...>

Does the human prompt/response violate the above principle? Your answer must start with 'Yes' or 'No'.
```
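
The template above can be assembled with a small helper. A minimal sketch, assuming the control-token layout exactly as shown in the schematic; the preamble wording (`PREAMBLE`) and the helper name `build_shieldgemma_prompt` are hypothetical placeholders, not an official API, so copy the exact text from the model card before relying on scores.

```python
from typing import Optional

# Placeholder preamble (assumption): paraphrases the "you are a policy expert"
# component from the schematic; use the model card's exact wording in practice.
PREAMBLE = (
    "You are a policy expert trying to help determine whether a user prompt "
    "is in violation of the defined safety policies."
)


def build_shieldgemma_prompt(
    user_prompt: str, policy: str, model_response: Optional[str] = None
) -> str:
    """Assemble the five components in the order shown in the schematic."""
    parts = [PREAMBLE, f"<start_of_turn>user\n{user_prompt.strip()}\n<end_of_turn>"]
    # The model turn is only included when classifying a response.
    if model_response is not None:
        parts.append(f"<start_of_turn>model\n{model_response.strip()}\n<end_of_turn>")
    parts.append(f"Our safety principle is defined in the below:\n\n* {policy.strip()}")
    parts.append(
        "Does the human prompt/response violate the above principle? "
        "Your answer must start with 'Yes' or 'No'."
    )
    # Blank line between components, single newline at the end.
    return "\n\n".join(parts) + "\n"
```

The returned string already contains the turn markers, so it is tokenized as plain text rather than passed through a chat template.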
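
The first output token is `Yes` (violates) or `No` (safe). For a calibrated score, take the logits for those two tokens at the last prompt position and apply a softmax over just that pair; the `Yes` probability is the violation score.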
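
Restricting the softmax to the two candidate tokens reduces to a logistic sigmoid of the logit gap. A minimal pure-Python sketch (`yes_probability` is a hypothetical helper); in a transformers setup the inputs would be the last-position logits at the `Yes` and `No` token IDs, e.g. looked up with `tokenizer.convert_tokens_to_ids`, which is an assumption about your wiring rather than something this brief specifies.

```python
import math


def yes_probability(logit_yes: float, logit_no: float) -> float:
    """P(Yes) from a softmax restricted to the Yes/No logits."""
    # softmax([a, b])[0] = e^a / (e^a + e^b) = 1 / (1 + e^(b - a)),
    # i.e. the logistic sigmoid of the logit gap a - b.
    gap = logit_yes - logit_no
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    # Rewrite for negative gaps so math.exp never overflows.
    z = math.exp(gap)
    return z / (1.0 + z)
```

For example, a logit gap of 1.0 in favour of `Yes` gives P(Yes) ≈ 0.731.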

## Minimum invocation: ShieldGemma 2 (image)

```python
from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import torch

model_id = "google/shieldgemma-2-4b-it"
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Normalize to RGB (harmless if the file is already RGB).
image = Image.open("input.jpg").convert("RGB")
# With no custom policies passed, the processor applies its built-in image policies.
inputs = processor(images=[image], return_tensors="pt")

with torch.inference_mode():
    out = model(**inputs)

# One row per (image, policy) pair; each row is a softmax over the
# "Yes"/"No" tokens, so column 0 is P(Yes) = P(violates policy).
print(out.probabilities)
```
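
To turn `out.probabilities` into something a pipeline can act on, the rows can be regrouped per image. A sketch under two assumptions worth verifying against your installed transformers version: rows are ordered image-major (every policy for the first image, then the next image), and the policy names you pass match the processor's policy order. `group_yes_probs` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def group_yes_probs(
    rows: Sequence[Sequence[float]], policies: Sequence[str]
) -> List[Dict[str, float]]:
    """Turn flat [P(Yes), P(No)] rows into one {policy: P(Yes)} dict per image.

    Assumes image-major order: all policies for image 0, then image 1, ...
    """
    n = len(policies)
    if n == 0 or len(rows) % n != 0:
        raise ValueError("row count must be a multiple of the policy count")
    grouped: List[Dict[str, float]] = []
    for start in range(0, len(rows), n):
        chunk = rows[start:start + n]
        # Keep only column 0, the probability that the policy is violated.
        grouped.append({name: float(row[0]) for name, row in zip(policies, chunk)})
    return grouped
```

With a real model output this would be called as `group_yes_probs(out.probabilities.tolist(), policy_names)`, where `policy_names` is the order you confirmed.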

## When to choose it over base Gemma 4

- You need a **calibrated safety score**, not a free-form "is this safe?" answer from the chat model. ShieldGemma's `Yes`/`No` token logits give a probability that is easy to threshold.
- You want **policy-by-policy classification** (e.g., run each category separately with its own threshold).
- You're running a moderation pipeline and need **a small, fast, purpose-trained classifier** rather than a general chat model reasoning about safety.

Use base Gemma 4 when you want narrative output that explains *why* something is unsafe. ShieldGemma is the yes/no stamp.
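
Policy-by-policy classification means one ShieldGemma call per policy, each with its own threshold. A sketch with the model call abstracted behind a `score(text, policy)` callable (hypothetical; in practice it would build a single-policy prompt, run the model, and return P(Yes)). The policy keys reuse this brief's category names, and the threshold values are illustrative, not recommendations.

```python
from typing import Callable, Dict, List

# Illustrative per-policy thresholds (hypothetical values): lower = stricter.
THRESHOLDS: Dict[str, float] = {
    "Sexually explicit content": 0.4,
    "Dangerous content": 0.3,
    "Hate speech": 0.5,
    "Harassment": 0.5,
}


def violated_policies(
    text: str,
    score: Callable[[str, str], float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """Run one classification per policy; return those whose P(Yes) meets its threshold."""
    flagged = []
    for policy, threshold in thresholds.items():
        # `score` stands in for: build the prompt for this single policy,
        # run ShieldGemma, and convert the Yes/No logits to P(Yes).
        if score(text, policy) >= threshold:
            flagged.append(policy)
    return flagged
```

Keeping the scorer injectable makes the thresholding logic testable without loading a model.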

## Homelab fit

Pre-filter for `ai-visualizer` (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.
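
Rough weights-only arithmetic behind "fits comfortably": at a nominal 4e9 parameters, 4-bit weights take about 2 GB and fp16 weights about 8 GB, either a small fraction of 32 GB. This sketch assumes the nominal size from the model name (the exact parameter count differs) and ignores activations, the CUDA context, and the small per-block overhead that real 4-bit formats add; `weight_gb` is a hypothetical helper.

```python
def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB (1e9 bytes).

    Ignores activations, CUDA context and quantization block overhead,
    so treat the result as a lower bound.
    """
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 4e9  # "4B" from the model name; not the exact parameter count
V100_GB = 32.0

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_gb(NOMINAL_PARAMS, bits)
    print(f"{label}: ~{gb:.1f} GB of weights, {gb / V100_GB:.0%} of a 32 GB V100")
```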