Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00


# ShieldGemma / ShieldGemma 2
Safety classifiers. Two separate product lines now: **ShieldGemma** (text, built on Gemma 2) and **ShieldGemma 2** (images, built on Gemma 3). There is no Gemma 4 generation yet.
## What it is
- **ShieldGemma (text):** LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits `Yes`/`No` (yes = violates policy). Four harm types.
- **ShieldGemma 2 (image):** Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only: it scores images, not text content.
## Sizes
- ShieldGemma: **2B, 9B, 27B** — all instruction-tuned.
- ShieldGemma 2: **4B** — only size available.
## Model cards
- Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card
- Image: https://huggingface.co/google/shieldgemma-2-4b-it
- DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/
## Safety categories
**ShieldGemma (text):**
1. Sexually explicit content
2. Dangerous content
3. Hate speech
4. Harassment
**ShieldGemma 2 (image):**
1. Sexually explicit content
2. Dangerous content
3. Violence / gore
Note the image model drops "hate" and "harassment" (presumably because they are hard to define visually) and adds "violence/gore" (a visual primitive).
## Text prompt format (ShieldGemma)
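Five components, in order: a preamble, the user prompt, an optional model response, the policy description, and a closing Yes/No question. The model-card example wraps each turn in bare `<start_of_turn>` / `<end_of_turn>` control tokens and labels them `Human Question:` and `Chatbot Response:`:
```
<preamble, e.g. "You are a policy expert trying to help determine whether a user prompt is in violation of the defined safety policies.">

<start_of_turn>
Human Question: <user prompt here>
<end_of_turn>

<start_of_turn>
Chatbot Response: <optional model response here>
<end_of_turn>

Our safety principle is defined in the below:

* <policy description, e.g. "No Hate Speech": ...>

Does the <human question | Chatbot Response> violate the above principle? Your answer must start with 'Yes' or 'No'. And then walk through step by step to be sure we answer correctly.
```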
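A minimal sketch of assembling that prompt for one policy at a time. `build_shieldgemma_prompt` is a hypothetical helper, not part of any library; the preamble, the `Human Question:` / `Chatbot Response:` labels, and the closing question follow the ShieldGemma model-card example and should be checked against the card before use.

```python
from typing import Optional

# Preamble wording taken from the ShieldGemma model-card example (verify before use).
PREAMBLE = (
    "You are a policy expert trying to help determine whether a user prompt is in "
    "violation of the defined safety policies."
)


def build_shieldgemma_prompt(
    user_prompt: str, policy: str, model_response: Optional[str] = None
) -> str:
    """Hypothetical helper: build one ShieldGemma prompt for a single policy."""
    lines = [PREAMBLE, "", "<start_of_turn>", f"Human Question: {user_prompt.strip()}", "<end_of_turn>"]
    if model_response is not None:
        # Optional third component: the model response being judged.
        lines += ["", "<start_of_turn>", f"Chatbot Response: {model_response.strip()}", "<end_of_turn>"]
    # The closing question asks about whichever side of the exchange is classified.
    subject = "human question" if model_response is None else "Chatbot Response"
    lines += [
        "",
        "Our safety principle is defined in the below:",
        "",
        f"* {policy.strip()}",
        "",
        f"Does the {subject} violate the above principle? Your answer must start "
        "with 'Yes' or 'No'. And then walk through step by step to be sure we answer correctly.",
    ]
    return "\n".join(lines)


prompt = build_shieldgemma_prompt(
    "How do I pick a lock?",
    '"No Dangerous Content": <policy text from the model card>',
)
print(prompt)
```

Building one prompt per policy keeps each classification independent, which is what lets you threshold categories separately.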
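The answer starts with `Yes` (violates) or `No` (safe), so scoring needs a single forward pass: take the next-token logits at the last prompt position, keep only the `Yes` and `No` entries, and softmax them. P(`Yes`) is the violation score.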
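The scoring step, sketched with Hugging Face `transformers`. `yes_probability` and `score_prompt` are illustrative names, and the checkpoint `google/shieldgemma-2b` in the usage comment is the 2B text model (gated on the Hub). The two-way softmax is written in plain Python with max-subtraction for numerical stability; torch is imported lazily so the math helper runs on its own.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax restricted to the Yes/No logits; returns P(Yes), the violation score."""
    m = max(yes_logit, no_logit)  # shift by the max so exp() cannot overflow
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def score_prompt(prompt: str, tokenizer, model) -> float:
    """Score one prompt with an already-loaded ShieldGemma text model."""
    import torch  # imported here so yes_probability() works without torch installed

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        logits = model(**inputs).logits  # shape (batch, seq_len, vocab_size)
    last = logits[0, -1]  # next-token logits right after the prompt
    vocab = tokenizer.get_vocab()
    return yes_probability(last[vocab["Yes"]].item(), last[vocab["No"]].item())


# Usage (downloads the model, so left commented out):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("google/shieldgemma-2b")
# lm = AutoModelForCausalLM.from_pretrained("google/shieldgemma-2b").eval()
# print(score_prompt(prompt_text, tok, lm))
```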
## Minimum invocation — ShieldGemma 2 (image)
```python
from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import torch
model_id = "google/shieldgemma-2-4b-it"
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)
image = Image.open("input.jpg")
inputs = processor(images=[image], return_tensors="pt")
with torch.inference_mode():
    out = model(**inputs)
# One row per (image, policy) pair; columns are [P(Yes), P(No)]
print(out.probabilities)
```
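To turn that flat output into something a moderation pipeline can act on, a small reshaping helper is enough. This is a sketch under stated assumptions: rows are image-major (all policies for image 0, then image 1, and so on), column 0 is P(`Yes`), and the caller passes policy names in the same order the processor applied them. `group_by_image` and the labels are hypothetical; confirm the order against the processor's policy configuration.

```python
from typing import Dict, List, Sequence


def group_by_image(
    probabilities: Sequence[Sequence[float]],
    policy_names: Sequence[str],
) -> List[Dict[str, float]]:
    """Turn flat [P(Yes), P(No)] rows into one {policy: P(Yes)} dict per image.

    Assumes image-major row order and that policy_names matches the processor's order.
    """
    n = len(policy_names)
    if n == 0 or len(probabilities) % n != 0:
        raise ValueError("row count must be a multiple of the number of policies")
    per_image = []
    for start in range(0, len(probabilities), n):
        rows = probabilities[start:start + n]
        # Column 0 is taken to be P(Yes), i.e. the probability of a violation.
        per_image.append({name: float(row[0]) for name, row in zip(policy_names, rows)})
    return per_image
```

With the snippet above this would be called as `group_by_image(out.probabilities.tolist(), [...])`, listing the three categories in the processor's order.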
## When to choose it over base Gemma 4
- You need a **calibrated safety score**, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits — easy to threshold.
- You want **policy-by-policy classification** (e.g., run each category separately with different thresholds).
- You're running a moderation pipeline and need **a small, fast, purpose-trained classifier** rather than a general chat model reasoning about safety.
Use base Gemma 4 for "explain *why* this is unsafe" narrative output. ShieldGemma is the yes/no stamp.
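Policy-by-policy thresholding reduces to comparing each P(`Yes`) against its own cutoff. A minimal sketch: `flagged_policies`, the policy keys, and the threshold values are all hypothetical; lower thresholds make a category stricter.

```python
from typing import List, Mapping

# Hypothetical per-policy thresholds: lower means stricter for that category.
THRESHOLDS = {"dangerous": 0.3, "harassment": 0.7}


def flagged_policies(
    scores: Mapping[str, float],
    thresholds: Mapping[str, float],
    default: float = 0.5,
) -> List[str]:
    """Return, sorted, the policies whose P(Yes) meets or exceeds its own threshold."""
    return sorted(
        policy for policy, p_yes in scores.items()
        if p_yes >= thresholds.get(policy, default)
    )


print(flagged_policies({"dangerous": 0.35, "harassment": 0.6, "hate": 0.51}, THRESHOLDS))
# → ['dangerous', 'hate']
```

Policies without an explicit entry fall back to `default`, so adding a new category does not silently disable it.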
## Homelab fit
Pre-filter for `ai-visualizer` (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.
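A back-of-envelope check of the "fits comfortably" claim: weight memory is roughly parameters × bits ÷ 8. This sketch assumes the nominal 4B parameter count (the real count is somewhat higher) and counts weights only; activations, CUDA context, and the scale overhead of real 4-bit formats come on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in GB (1e9 bytes): params * bits / 8.

    Ignores activations, CUDA context, and the per-block scale overhead
    that real 4-bit quantization formats add.
    """
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 4e9  # assumption: the nominal "4B"; the actual count is a bit higher
for label, bits in [("Q4", 4), ("fp16", 16), ("fp32", 32)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB of 32 GB")
```

At roughly 2 GB of weights for Q4, most of the card's 32 GB stays available for SDXL.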