docs: add canonical tooling corpus (147 files) from Google/HF/frameworks

Five-lane parallel research pass. Each subdir under tooling/ has its own README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts, gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json, transformers gemma4/ source, launch blog posts, official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2, Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE), TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md (not applied), notably the new <|turn>/<turn|> prompt format, gemma_pytorch abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM, FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization: flagged "secrets" are base64 notebook cell outputs and example Ed25519 keys in the HDP agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,88 @@
# CodeGemma

Code completion / generation with native **fill-in-the-middle (FIM)** support. Built on **Gemma 1**, which is still the most recent CodeGemma generation as of April 2026: there has been no CodeGemma 2/3/4 release.

## What it is

Gemma 1 fine-tuned on code. Trained with an 80–90% FIM rate, split 50/50 between PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than chat.

## Sizes

- **2B pretrained**: fast completion
- **7B pretrained**: higher quality completion + FIM
- **7B instruction-tuned**: code chat

Versioned point releases exist (2B 1.1, 7B-IT 1.1).

## Model card

- https://ai.google.dev/gemma/docs/codegemma/model_card
- HF: https://huggingface.co/google/codegemma-7b
- Tech report: https://arxiv.org/abs/2406.11409

## FIM tokens

```
<|fim_prefix|>      prefix-of-completion marker
<|fim_suffix|>      cursor/insertion-point marker
<|fim_middle|>      generation trigger
<|file_separator|>  multi-file boundary
```

### PSM (Prefix-Suffix-Middle) template

```
<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>
```

Example:

```python
prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)
```
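
The model generates the middle chunk. When decoding with special tokens visible, treat the first FIM token or `<|file_separator|>` in the output as the end of the completion and discard anything after it.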
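
A minimal sketch of how an editor plugin might assemble a PSM prompt and trim the decoded output. The helper names `build_psm_prompt` and `trim_completion` are illustrative, not part of any CodeGemma API, and the sample completion string is made up.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
FILE_SEPARATOR = "<|file_separator|>"


def build_psm_prompt(before_cursor: str, after_cursor: str) -> str:
    """Assemble a Prefix-Suffix-Middle prompt from the code around the cursor."""
    return f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"


def trim_completion(raw: str) -> str:
    """Cut decoded output at the first control token, if one appears."""
    cut = len(raw)
    for token in (FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE, FILE_SEPARATOR):
        idx = raw.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return raw[:cut]


prompt = build_psm_prompt(
    "def calculate_age(birth_year):\n    current_year = 2026\n    ",
    "\n    return age",
)
# Hypothetical raw model output, used only to exercise the trimming helper.
completion = trim_completion("age = current_year - birth_year<|file_separator|>extra")
print(prompt)
print(completion)
```

Keeping prompt assembly and trimming in two small functions makes it easy to swap in the SPM ordering later without touching the decode path.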

### Multi-file context

Prepend referenced files separated by `<|file_separator|>`, then the target file in FIM format.
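
The layout described above can be sketched as follows. This assumes each referenced file is passed as raw source text and followed by one separator; whether to include file paths or other metadata is not specified here, so treat `build_multifile_prompt` as a hypothetical helper rather than a documented format.

```python
FILE_SEPARATOR = "<|file_separator|>"


def build_multifile_prompt(context_files: list[str], before_cursor: str, after_cursor: str) -> str:
    """Referenced files first, each closed by a separator, then the FIM-formatted target."""
    context = "".join(f"{source}{FILE_SEPARATOR}" for source in context_files)
    target = f"<|fim_prefix|>{before_cursor}<|fim_suffix|>{after_cursor}<|fim_middle|>"
    return context + target


multi_prompt = build_multifile_prompt(
    ["def helper():\n    return 1\n", "CONSTANT = 42\n"],
    "from util import helper\n\nvalue = ",
    "\nprint(value)\n",
)
print(multi_prompt)
```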

## Minimum invocation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/codegemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "<|fim_prefix|>def fib(n):\n    if n <= 1:\n        return n\n    <|fim_suffix|>\n    return a<|fim_middle|>"
# Follow the model's placement; a hard-coded "cuda" can disagree with device_map="auto".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, keeping special tokens visible.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

## Ollama

`ollama pull codegemma:7b` or `codegemma:2b`. Ollama wraps the FIM tokens for you when you use its completion API with prefix/suffix.
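
A hedged sketch of a prefix/suffix request against a local Ollama server. It assumes the default endpoint `http://localhost:11434/api/generate` and the `suffix` field of the generate API; `build_fim_request` and `complete` are illustrative names, and `complete` is defined but not called here because it needs a running server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_fim_request(prefix: str, suffix: str, model: str = "codegemma:2b") -> dict:
    """Build the JSON payload; Ollama inserts the FIM tokens from prompt + suffix."""
    return {"model": model, "prompt": prefix, "suffix": suffix, "stream": False}


def complete(prefix: str, suffix: str, model: str = "codegemma:2b") -> str:
    """Send the request and return the generated middle chunk."""
    body = json.dumps(build_fim_request(prefix, suffix, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


payload = build_fim_request("def add(a, b):\n    ", "\n    return result")
print(json.dumps(payload, indent=2))
```

Call `complete(prefix, suffix)` from an editor hook once the model has been pulled.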

## When to choose it over base Gemma 4

- You need **IDE-grade FIM autocomplete**: CodeGemma was trained for it, base Gemma 4 was not.
- You want a **2B code model**: base Gemma 4 skips this size (E2B is multimodal, not code-specialized).
- You want **Ollama-native FIM** that tools like `continue.dev` can talk to.

Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for **agentic coding** (plan, write, execute) Gemma 4 or `qwen3-coder:30b` wins. CodeGemma is the inline-cursor-assistant niche.

## Homelab fit

Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B, and those are stronger than CodeGemma 7B. The only reason to pull CodeGemma is if you want a tiny 2B FIM model for a latency-sensitive editor integration on a Pi or on pve197 alongside the vision stack.
@@ -0,0 +1,76 @@
# DataGemma

LLM grounding with Google **Data Commons**, a public knowledge graph of 240B+ statistical data points (economics, health, demographics, science). Built on **Gemma 2 27B**. No Gemma 3 or 4 generation yet.

## What it is

Two flavors:

- **DataGemma RIG** (Retrieval-Interleaved Generation): Model is fine-tuned to emit inline Data Commons queries wrapped around its own claims. Outputs look like `The population of Sunnyvale is [__DC__("population of Sunnyvale") --> "152,200"]`. An external resolver substitutes the real stat.
- **DataGemma RAG** (Retrieval-Augmented Generation): Standard RAG pipeline: query Data Commons, inject results into context, generate.

## Sizes

- **27B instruct** only (`datagemma-rig-27b-it`, `datagemma-rag-27b-it`).

## Model cards

- https://ai.google.dev/gemma/docs/datagemma
- DeepMind: https://deepmind.google/models/gemma/datagemma/
- HF RIG: https://huggingface.co/google/datagemma-rig-27b-it
- HF RAG: https://huggingface.co/google/datagemma-rag-27b-it
- Paper: https://docs.datacommons.org/papers/DataGemma-FullPaper.pdf

## Performance claim

Baseline Gemma 2 factuality on the 101-query statistical eval: **5–17%**. DataGemma RIG: **~58%**. The improvement is narrow (statistical claims only) but real.

## Prompt format

No special template. Plain natural-language input. The difference is in the **training** and the **output format**.

**RIG output example:**

```
Sunnyvale has [__DC__("total population of Sunnyvale CA") --> "152,200"]
residents as of 2020, with a median age of [__DC__("median age of
Sunnyvale CA") --> "34.8"].
```

Post-processing: regex out the `[__DC__("...") --> "..."]` blocks and either (a) replace with resolved Data Commons values, or (b) render as inline citations.
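
The post-processing step can be sketched with a single regular expression. The pattern below is written against the annotation shape shown in the example and is an assumption about real output; `extract_dc_claims` and `render_with_resolver` are hypothetical helper names, and the stub resolver stands in for a real Data Commons lookup.

```python
import re

# Matches [__DC__("query") --> "model value"]; a query may wrap across lines.
DC_PATTERN = re.compile(
    r'\[__DC__\("(?P<query>[^"]*)"\)\s*-->\s*"(?P<value>[^"]*)"\]'
)


def extract_dc_claims(text: str) -> list[tuple[str, str]]:
    """Return (query, model_value) pairs for every RIG annotation."""
    return [(m.group("query"), m.group("value")) for m in DC_PATTERN.finditer(text)]


def render_with_resolver(text: str, resolve) -> str:
    """Replace each annotation with the resolved value, else the model's own value."""
    def _substitute(match: re.Match) -> str:
        resolved = resolve(match.group("query"))
        return resolved if resolved is not None else match.group("value")
    return DC_PATTERN.sub(_substitute, text)


sample = (
    'Sunnyvale has [__DC__("total population of Sunnyvale CA") --> "152,200"] '
    'residents as of 2020, with a median age of '
    '[__DC__("median age of Sunnyvale CA") --> "34.8"].'
)

# Stub resolver: the placeholder string is not a real statistic.
stub = {"total population of Sunnyvale CA": "<resolved value>"}

claims = extract_dc_claims(sample)
rendered = render_with_resolver(sample, stub.get)
print(claims)
print(rendered)
```

Falling back to the model's own value when the resolver returns `None` keeps the sentence readable; a stricter pipeline might flag unresolved claims instead.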

**RAG flow:** query Data Commons first, inject tabular context, then prompt normally.

## Minimum invocation: RIG

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/datagemma-rig-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

prompt = "What are the demographic trends in Sunnyvale, California?"
# Follow the model's placement; a hard-coded "cuda" can disagree with device_map="auto".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens.
print(tokenizer.batch_decode(
    out[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)[0])
```

Then run a resolver that extracts each `[__DC__(q) --> ""]` and hits the Data Commons API.

## When to choose it over base Gemma 4

- You're building a **statistics-grounded assistant** (government data, public health, economic indicators) and need low hallucination on numbers.
- You're okay with a **27B model**: DataGemma only ships at this size.
- Your domain overlaps Data Commons coverage (US-heavy, but growing internationally).

Base Gemma 4 + a conventional RAG pipeline can do the same thing if you bring your own retriever. DataGemma's value is the **trained inline-citation behavior** (RIG): Gemma 4 won't emit that format without prompting gymnastics.

## Homelab fit

Low. No current Seth project leans on statistical grounding. Niche for a news-summary use case (POS-Automation daily print) if Seth ever wants "US inflation was X% as of Y" kind of interjections, but then a simple Data Commons API call from the script is cheaper than running a 27B model.
@@ -0,0 +1,44 @@
# DolphinGemma

Marine biology / dolphin vocalization model. Developed with the Wild Dolphin Project (WDP) and Georgia Tech. Announced April 2025.

## Status

**Not publicly released as of April 2026.** DeepMind's page states "DolphinGemma is currently in development. On release, it will be openly available." No weights on Hugging Face, Kaggle, or Google AI for Developers. Google's 2025 post anticipated a summer 2025 open-source release; that slipped.

If you see a `dolphingemma-*` tag somewhere, it is either community-named (not Google) or a leaked checkpoint. Verify the uploader is `google/` on HF.

## What it is (from announcement material)

- **Audio-in, audio-out** model.
- Trained on tens of thousands of hours of Atlantic spotted dolphin vocalizations.
- Predicts the next sound in a sequence (same training objective as an LLM, just in the audio token domain).
- **~400M parameters**: small enough to run on a Pixel phone in the field.
- Intended to plug into the CHAT (Cetacean Hearing Augmentation Telemetry) system to accelerate real-time pattern recognition during dolphin interactions.

## Base generation

Announced as "built on Google's open Gemma series." Google has not disclosed which generation. Given the mid-2025 timing and 400M size, most likely Gemma 3-era tech, but **this is an educated guess**, not confirmed.

## Model card

- DeepMind: https://deepmind.google/models/gemma/dolphingemma/
- Blog: https://blog.google/innovation-and-ai/products/dolphingemma/

No model card on ai.google.dev yet (expected once released).

## Prompt format

Not published. The audio-token I/O format will depend on the tokenizer Google picked (e.g., SoundStream, Whisper-style, or a custom cetacean-phoneme tokenizer). Wait for release.

## Minimum invocation

Not possible. No weights available.

## When to choose it

If and when it ships: marine biology research, specifically Atlantic spotted dolphins. Fine-tunable for other cetacean species per Google.

## Homelab fit

Zero for normal use. If it ships and Seth wants a novelty "run the model on a cheap Pi and watch it hallucinate dolphin whistles" project, it's a candidate for the 400M-parameter slot on seth-pi. Until then, nothing to deploy.
@@ -0,0 +1,93 @@
# EmbeddingGemma

On-device text embedding model. Released **September 2025**. Built on **Gemma 3 with T5Gemma initialization**. No Gemma 4 generation yet.

## What it is

A **308M-parameter** open embedding model. Trained on 100+ languages. State-of-the-art on MTEB for its size class. Uses **Matryoshka Representation Learning (MRL)**: one model produces embeddings at 768, 512, 256, or 128 dimensions by truncation + renormalization, with graceful quality degradation.

## Sizes

- **308M**: only size.

## Model card

- https://ai.google.dev/gemma/docs/embeddinggemma/model_card
- HF: https://huggingface.co/google/embeddinggemma-300m
- HF blog: https://huggingface.co/blog/embeddinggemma
- DeepMind: https://deepmind.google/models/gemma/embeddinggemma/
- Paper: https://arxiv.org/html/2509.20354v2

## Prompt format

EmbeddingGemma uses **task-prefixed inputs**: you prepend a task descriptor to each string before embedding.

### Query prompts

```
task: {task description} | query: {your query}
```

Default task description: `search result`.

Example: `task: search result | query: what is the capital of France?`

### Document prompts

```
title: {title or "none"} | text: {document text}
```

Providing a real title improves retrieval; use `none` if unavailable.

Example: `title: Eiffel Tower | text: The Eiffel Tower is a wrought-iron lattice tower...`

## Minimum invocation

### Sentence-Transformers (easy path)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

query = "Which planet is known as the Red Planet?"
documents = [
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Venus is often called Earth's twin due to its similar size.",
]

q_emb = model.encode_query(query)
d_emb = model.encode_document(documents)

print(model.similarity(q_emb, d_emb))
```

The `encode_query` / `encode_document` methods apply the task prefixes automatically.
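
### Shorter embeddings (MRL)

```python
# Reuses `model` from the block above; a list input plus convert_to_tensor=True
# yields a 2-D torch tensor, so the slicing and .norm() calls below are valid.
texts = ["Which planet is known as the Red Planet?"]
emb_768 = model.encode(texts, convert_to_tensor=True)   # full, shape (n, 768)
emb_256 = emb_768[:, :256]                                # truncate
emb_256 = emb_256 / emb_256.norm(dim=-1, keepdim=True)    # renormalize
```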

## Gotcha

**Activations do not support `float16`.** Use `bfloat16` or `float32`. This is explicit in the model card.

## When to choose it over base Gemma 4

Always, when you want embeddings. Base Gemma 4 is a generative decoder, not trained as an embedding model. EmbeddingGemma is the correct tool for retrieval, clustering, semantic search, RAG.

Its main competitor is `nomic-embed-text` (already in Seth's pantry). EmbeddingGemma's differentiators are MRL and multilingual coverage (100+ languages, versus nomic's largely English focus).

## Homelab fit

**Highest-impact variant for Seth right now, along with TranslateGemma.**

- **Family history agent:** 100+ language support + 128d embeddings = tight, multilingual indices over scanned documents, letters, census records. MRL lets you serve fast 128d approximate search and fall back to 768d for reranking.
- **SearXNG / SethSearch:** drop-in upgrade from nomic-embed-text for the semantic-search layer. Bigger model but better quality.
- **Mortdecai memory:** use 308M EmbeddingGemma for long-term memory over chat logs. Small enough to run alongside the big mortdecai qwen35 models on pve197 or steel141 without resource contention.
- **Gemma-cookbook already has a tutorial** (`tutorials_RAG_EmbeddingGemma.ipynb` in the corpus), so you can skip straight to working code.
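
The 128d-search-then-768d-rerank idea from the family history bullet can be sketched in pure Python. Random Gaussian vectors stand in for real EmbeddingGemma outputs here, and `two_stage_search` is a hypothetical helper; a real index would precompute and store the truncated vectors instead of recomputing them per query.

```python
import math
import random


def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """MRL-style shortening: keep the first `dim` values, then L2-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, docs, coarse_dim=128, shortlist=3):
    """Shortlist by cosine similarity at `coarse_dim`, then rerank at full width."""
    q_small = truncate_and_normalize(query, coarse_dim)
    coarse = sorted(
        range(len(docs)),
        key=lambda i: dot(q_small, truncate_and_normalize(docs[i], coarse_dim)),
        reverse=True,
    )[:shortlist]
    q_full = truncate_and_normalize(query, len(query))
    return sorted(
        coarse,
        key=lambda i: dot(q_full, truncate_and_normalize(docs[i], len(docs[i]))),
        reverse=True,
    )


rng = random.Random(0)
# Stand-in "embeddings": 20 random 768-d vectors, not real model output.
docs = [[rng.gauss(0, 1) for _ in range(768)] for _ in range(20)]
# The query is a lightly perturbed copy of document 7, so it should rank first.
query = [x + 0.1 * rng.gauss(0, 1) for x in docs[7]]

ranking = two_stage_search(query, docs)
print(ranking)
```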
@@ -0,0 +1,55 @@
# Gemma family index (as of April 2026)

Specialized sister models Google has released alongside base Gemma. Base Gemma 4 instruct/base variants are **not** listed here; they live in the main corpus at `/home/claude/bin/gemma4-research/`.

## Summary table

| Variant | Base gen | Sizes | Canonical use case | HF URL |
|---|---|---|---|---|
| **ShieldGemma** | Gemma 2 | 2B, 9B, 27B | Text safety classification (4 harm types) | [google/shieldgemma-2b](https://huggingface.co/google/shieldgemma-2b) |
| **ShieldGemma 2** | Gemma 3 | 4B | Image safety classification (3 categories) | [google/shieldgemma-2-4b-it](https://huggingface.co/google/shieldgemma-2-4b-it) |
| **CodeGemma** | Gemma 1 | 2B, 7B, 7B-IT | Code completion with FIM tokens | [google/codegemma-7b](https://huggingface.co/google/codegemma-7b) |
| **PaliGemma** | Gemma 1 | 3B | Vision-language (task-prefix prompting) | [google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448) |
| **PaliGemma 2** | Gemma 2 | 3B, 10B, 28B | Vision-language, multi-resolution | [google/paligemma2-3b-pt-448](https://huggingface.co/google/paligemma2-3b-pt-448) |
| **RecurrentGemma** | Gemma 1 | 2B, 9B | Griffin architecture, long-context throughput | [google/recurrentgemma-9b](https://huggingface.co/google/recurrentgemma-9b) |
| **DataGemma (RIG/RAG)** | Gemma 2 | 27B | Statistical grounding via Google Data Commons | [google/datagemma-rig-27b-it](https://huggingface.co/google/datagemma-rig-27b-it) |
| **MedGemma 1.5** | Gemma 3 | 4B multimodal | Medical text + image comprehension (non-clinical) | [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) |
| **TxGemma** | Gemma 2 | 2B, 9B, 27B | Therapeutics/drug-discovery prediction | [google/txgemma-27b-predict](https://huggingface.co/google/txgemma-27b-predict) |
| **DolphinGemma** | Gemma (unstated) | ~400M | Marine biology / dolphin vocalization | *Not released as of April 2026* |
| **SignGemma** | Gemma 3-era | small on-device | ASL → English translation | *Limited preview only; no public weights as of April 2026* |
| **TranslateGemma** | Gemma 3 | 4B, 12B, 27B | 55-language text + image translation | [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) |
| **EmbeddingGemma** | Gemma 3 (T5Gemma init) | 308M | On-device text embeddings, MRL (768/512/256/128) | [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) |
| **T5Gemma / T5Gemma 2** | Gemma 2 / Gemma 3 | small → 4B-4B | Encoder-decoder for summarization, translation | [google/t5gemma-2-4b-4b](https://huggingface.co/google/t5gemma-2-4b-4b) |
| **FunctionGemma** | Gemma 3 | 270M | Function-calling specialist | [google/functiongemma-270m](https://huggingface.co/google/functiongemma-270m) |
| **VaultGemma** | Gemma 3 | 1B | Differential-privacy-trained LLM | [google/vaultgemma-1b](https://huggingface.co/google/vaultgemma-1b) |
| **Gemma-APS** | Gemma 2 | 2B, 7B | Abstractive proposition segmentation | (none listed) |
| **Gemma Scope / Scope 2** | Gemma 2/3 | SAE suite | Mechanistic interpretability | [google/gemma-scope](https://huggingface.co/google/gemma-scope) |

## Gemma 4 generation status

**As of 2026-04-18, no specialized sister model has been re-based to Gemma 4.** Every variant in the table above is built on Gemma 1, 2, or 3. The newest specialized releases (TranslateGemma, Jan 2026; T5Gemma 2, Dec 2025) still sit on Gemma 3. This is normal for Google's cadence: sisters lag the base release by 3–6 months. Expect a MedGemma-on-Gemma-4, ShieldGemma-3-on-Gemma-4, and PaliGemma 3 over summer/fall 2026.

## Per-variant files

- `shieldgemma.md`: covers both ShieldGemma (text) and ShieldGemma 2 (image)
- `codegemma.md`
- `paligemma.md`: covers both PaliGemma and PaliGemma 2
- `recurrentgemma.md`
- `datagemma.md`
- `medgemma.md`
- `txgemma.md`
- `dolphingemma.md`
- `signgemma.md`
- `translategemma.md`
- `embeddinggemma.md`
- `other-variants.md`: T5Gemma, FunctionGemma, VaultGemma, Gemma-APS, Gemma Scope

## Picking a variant for homelab use

Short read; see individual files for depth.

- **Minecraft agent (Mortdecai):** consider `FunctionGemma` (270M) as a fast-path tool-router in front of the big `mortdecai:*` models. Today's setup uses the base `qwen35`/`mortdecai` tool calling, but FunctionGemma's 270M size makes it cheap enough to run as a gateway classifier.
- **AI music video gen / visualizer:** `PaliGemma 2` for detailed captioning of reference frames; `ShieldGemma 2` to pre-filter generated output before publishing. Base Gemma 4 vision (tested in existing corpus) handles the "describe this image" job fine; reach for PaliGemma 2 when you need spatial grounding (detect/segment task prefixes).
- **Family history agent:** `EmbeddingGemma` (308M) is the immediate win: small, multilingual, 100+ languages, MRL to 128d for tight indices. Pair with `TranslateGemma` if sources are in German/Polish/etc. For ingest of old scanned documents, `PaliGemma 2` + `TranslateGemma` handles image-embedded text translation.
- **General safety pass for anything going public:** `ShieldGemma 2` for images, `ShieldGemma` (Gemma 2-based) for text. Both run comfortably on pve197's CT 105.
- **Skip for homelab:** MedGemma (disclaimer-laden, not clinical-grade, niche), TxGemma (drug discovery, highly specialist), DolphinGemma (not released), SignGemma (limited preview, no weights).
@@ -0,0 +1,72 @@
# MedGemma

Medical-domain variant for text + image comprehension. Current release is **MedGemma 1.5** (Jan 13, 2026), built on **Gemma 3**. **No Gemma 4 generation.**

## What it is

Gemma 3 fine-tuned on de-identified medical corpora: clinical notes, radiology images, dermatology images, histopathology, etc. The multimodal variants use a SigLIP image encoder trained specifically on medical imagery (not the base SigLIP).

## Sizes

**MedGemma 1.5** (current): **4B multimodal IT only**, with improvements in medical reasoning, records interpretation, and image interpretation. The 27B variants exist only in MedGemma 1.

**MedGemma 1** (prior): 4B multimodal, 27B text-only, 27B multimodal.

## Model card

- https://developers.google.com/health-ai-developer-foundations/medgemma/model-card
- DeepMind: https://deepmind.google/models/gemma/medgemma/
- Repo: https://github.com/google-health/medgemma
- Tech report: https://arxiv.org/abs/2507.05201

## Intended use

"A starting point that enables more efficient development of downstream healthcare applications involving medical text and images." **Developer tool, not a clinical product.**

### Disclaimer (near-verbatim from model card)

> The outputs generated by MedGemma are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications. All outputs require independent verification and clinical correlation.

Terms of use are governed by **Health AI Developer Foundations**, a separate license from base Gemma's. Read it before shipping anything.

## Prompt format

Standard Gemma 3 chat template. Content messages accept `{"type": "image"}` and `{"type": "text"}`.

## Minimum invocation

```python
from transformers import pipeline
from PIL import Image
import requests, torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

img_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
# Wikimedia tends to reject requests without a User-Agent header.
image = Image.open(
    requests.get(img_url, headers={"User-Agent": "example"}, stream=True).raw
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Describe this chest X-ray. What anatomical structures are visible?"},
]}]

out = pipe(text=messages, max_new_tokens=512)
# The last message in the returned conversation is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```

## When to choose it over base Gemma 4

- You're building **healthcare dev tools** (medical image triage assistant, doctor-facing records summarizer, clinician education) and want the SigLIP-medical image encoder.
- You can accept the Health AI Developer Foundations license and embed the disclaimers.
- You need **medical-vocabulary fluency** (SNOMED, ICD, RxNorm) that base Gemma 4 doesn't have at the 4B size.

Use base Gemma 4 otherwise, including for health-adjacent content that isn't clinical (fitness logs, nutrition, sleep data).

## Homelab fit

Zero. Seth is not running medical apps. Noted for completeness only.
@@ -0,0 +1,74 @@
# Other Gemma variants

Smaller / more specialized sisters that don't warrant a full file each. All on Gemma 2 or Gemma 3. **None on Gemma 4 as of April 2026.**

## T5Gemma / T5Gemma 2

**Encoder-decoder** Gemma, built by adapting decoder-only Gemma weights into a T5-style encoder-decoder via UL2 or PrefixLM pretraining.

- **T5Gemma** (Jul 2025): Gemma 2-based. Sizes include 2B-2B, 9B-2B, 9B-9B plus new T5-sized small/base/large/XL models.
- **T5Gemma 2** (Dec 2025): Gemma 3-based. Sizes: 270M-270M, 1B-1B, 4B-4B. Multimodal, with 128K context.

### When to pick it

- **Summarization, translation, QA** where the encoder's separate bidirectional attention buys quality.
- Anywhere a decoder-only Gemma feels wasteful for "read input, compress into short output" tasks.

HF: https://huggingface.co/google/t5gemma-2-4b-4b
Blog: https://developers.googleblog.com/en/t5gemma/

## FunctionGemma

**270M tool/function-calling specialist.** Gemma 3-based. Released Dec 2025.

Trained to emit structured function calls given a tool catalog. Not a generalist chat model: feed it a user message + tool schemas and it picks the right tool. Tiny enough to run as a pre-router in front of a larger model.

### When to pick it

- **Minecraft agent (Mortdecai):** plausibly interesting. Use it as a 270M gateway that classifies intent and picks one of the Mortdecai tools, then hands off to the bigger `mortdecai:*` model for reasoning. Latency/cost savings if the tool decision is hot-path.
- Any agent where tool-selection volume is high and model call cost matters.

HF: search `google/functiongemma-270m`.

## VaultGemma

**1B Gemma 3 trained with differential privacy.** Released Sep 2025.

The point is the training process (DP-SGD with rigorous privacy budget) more than the weights per se. Useful as a reference checkpoint or for deployments where "model cannot have memorized training data" is a hard requirement.

### When to pick it

- Niche. You almost never need DP-trained weights unless you're in a regulated space.

## Gemma-APS

**Abstractive Proposition Segmentation.** 2B and 7B on Gemma 2. Oct 2024.

Takes a passage, splits it into atomic propositions (self-contained factual statements). Useful for fact-checking, citation mapping, and as a preprocessing step for RAG indexing.

### When to pick it

- Building a **fact-verification pipeline** where you need to decompose generated text into checkable claims.
- **Family history:** could decompose narrative biographical text into timestamped facts for structured storage.

## Gemma Scope / Gemma Scope 2

Sparse autoencoder (SAE) suites for **mechanistic interpretability** research. Gemma Scope on Gemma 2, Gemma Scope 2 on Gemma 3 (Dec 2025).

Not models you deploy for product work. Tools for "which neurons activate on what" research.

HF: https://huggingface.co/google/gemma-scope

### When to pick it

- Interpretability research only. Not a homelab deployment candidate.

## Summary of homelab relevance

| Variant | Homelab fit |
|---|---|
| T5Gemma 2 4B-4B | Moderate: summarization for the news-briefing printer |
| FunctionGemma 270M | **High: tool-router for Mortdecai** |
| VaultGemma | None |
| Gemma-APS | Low-moderate: niche preprocessing step |
| Gemma Scope | None (research tool) |
@@ -0,0 +1,80 @@
# PaliGemma / PaliGemma 2

Vision-language model combining a **SigLIP** image encoder with a Gemma text decoder. Separate product line from base Gemma 4's built-in vision. The newest release, PaliGemma 2, is still on Gemma 2 as of April 2026: **no PaliGemma 3 or PaliGemma-on-Gemma-4 yet.**

## What it is

- **PaliGemma** (May 2024): Gemma 1 + SigLIP-So400m/14. Sizes: 3B only. Built for task-prefix prompting (`caption`, `detect`, `segment`, `ocr`).
- **PaliGemma 2** (Dec 2024): Gemma 2 + SigLIP-So400m/14. Sizes: 3B, 10B, 28B. Each available at three resolutions: 224x224, 448x448, 896x896.
- **PaliGemma 2 mix** (Feb 2025): task-mixed instruction-tuned variant that works better out-of-the-box on ad-hoc VQA without per-task fine-tuning.

## Sizes (PaliGemma 2)

| Text decoder | Image encoder | Total | Resolutions |
|---|---|---|---|
| Gemma 2 2B | SigLIP-So400m | ~3B | 224 / 448 / 896 |
| Gemma 2 9B | SigLIP-So400m | ~10B | 224 / 448 / 896 |
| Gemma 2 27B | SigLIP-So400m | ~28B | 224 / 448 / 896 |

## Model cards

- PaliGemma 2: https://ai.google.dev/gemma/docs/paligemma/model-card-2
- DeepMind: https://deepmind.google/models/gemma/paligemma-2/
- HF blog: https://huggingface.co/blog/paligemma2

## Prompt format

PaliGemma uses **task-prefix** prompting, not chat turns. Format:

```
<image>{task} {args}
```

Known task prefixes (not exhaustive; Google under-documents the full list):

| Prefix | Purpose | Example |
|---|---|---|
| `caption {lang}` | Image captioning | `<image>caption en` |
| `ocr` | Read all text in image | `<image>ocr` |
| `answer en {q}` | VQA | `<image>answer en what color is the car?` |
| `detect {obj}` | Object detection (bounding boxes) | `<image>detect cat ; dog` |
| `segment {obj}` | Segmentation masks | `<image>segment person` |

For `detect` and `segment`, output uses custom location (`<loc0123>`) and segmentation (`<seg000>`) tokens. You need the PaliGemma postprocessing routines to convert them to pixel coords.
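
For `detect`, the conversion is small enough to sketch by hand. This assumes the commonly described layout of four location tokens per box in `y_min, x_min, y_max, x_max` order, each a bin index on a 1024-step grid, followed by the label; the sample string is made up, and `parse_detections` is a hypothetical helper, not the official postprocessing routine. Segmentation tokens need the mask decoder and are not handled.

```python
import re

# Four <locNNNN> tokens, then a label that runs until ';' or the next token.
DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text: str, width: int, height: int) -> list[dict]:
    """Convert location tokens to pixel boxes as (x_min, y_min, x_max, y_max)."""
    boxes = []
    for y0, x0, y1, x1, label in DETECTION.findall(text):
        boxes.append({
            "label": label.strip(),
            "box": (
                int(x0) / 1024 * width,
                int(y0) / 1024 * height,
                int(x1) / 1024 * width,
                int(y1) / 1024 * height,
            ),
        })
    return boxes


# Made-up model output for illustration only.
sample = "<loc0256><loc0128><loc0768><loc0896> cat ; <loc0000><loc0512><loc0512><loc1023> dog"
detections = parse_detections(sample, width=448, height=448)
for det in detections:
    print(det)
```

Pass the original image size rather than the model's input resolution so the boxes land on the pixels you intend to draw on.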

## Minimum invocation: PaliGemma 2

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests, torch

model_id = "google/paligemma2-3b-mix-448"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png",
    stream=True
).raw).convert("RGB")

prompt = "<image>caption en"
# Keyword arguments avoid depending on the processor's positional order;
# the bfloat16 cast keeps pixel values in the same dtype as the model weights.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# Decode only the tokens generated after the prompt.
gen = processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(gen)
```

## When to choose it over base Gemma 4 vision

- You need **structured spatial output**: bounding boxes, segmentation masks. Base Gemma 4 vision returns freeform text; PaliGemma 2 returns grid-aligned location tokens.
- You're doing **pure VQA or captioning at scale** and want a smaller, faster, task-specialized 3B model (vs. Gemma 4 E4B at 4B-effective).
- You're **fine-tuning** for a narrow vision task. PaliGemma 2 is explicitly designed to be easy to fine-tune; Google ships LoRA recipes.

Use base Gemma 4 for **conversational multimodal** (back-and-forth with images + text reasoning). PaliGemma is the "turn image into structured text" workhorse.

## Homelab fit

For `ai-visualizer` (CT 167, pve197 with V100): PaliGemma 2 3B-448 is a great caption-and-ground step when producing SDXL prompts from reference images. Already tested: base Gemma 4 E4B handles "describe this image" at ~25 tok/s on pve197. PaliGemma 2 would add `detect`/`segment` for spatial control (e.g., "put the character in the upper-left quadrant of the generated scene").
@@ -0,0 +1,67 @@
# RecurrentGemma

Griffin-architecture sibling. Built on **Gemma 1**. No Gemma 2/3/4 generation: the line has effectively stalled, with long-context Transformer variants (Gemma 4 with 256K context) overtaking the memory-efficiency argument.

## What it is

Gated linear recurrences + local sliding-window attention, replacing full self-attention. The recurrent state is fixed-size and the attention window is bounded, so **inference memory stays constant in sequence length (O(1))** instead of growing with a full KV cache. Inference stays fast and cheap as context lengthens.

## Sizes

- **2B** pretrained + instruct
- **9B** pretrained + instruct

Only two sizes. No 27B. Griffin scaling beyond 9B is an open research question and Google didn't ship it.

## Model card

- https://ai.google.dev/gemma/docs/recurrentgemma/model_card
- DeepMind: https://deepmind.google/models/gemma/recurrentgemma/
- Paper: https://arxiv.org/abs/2404.07839
- Repo: https://github.com/google-deepmind/recurrentgemma

## Architecture highlights

- **Griffin block:** alternates two residual recurrent blocks with a local MQA attention block.
- **State size:** fixed, independent of sequence length.
- **Sliding window:** local attention only, not global.
- **Trade-off:** loses some needle-in-haystack precision vs. a full-attention Transformer, gains memory flatness.
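
The memory-flatness point can be made concrete with back-of-the-envelope arithmetic. The sketch below compares a full-attention KV cache, which grows linearly with sequence length, against a fixed recurrent state plus a window-capped local-attention cache. Every dimension in it is a hypothetical value chosen to show the trend. None is taken from RecurrentGemma's actual configuration, and both function names are illustrative.

```python
def full_kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Keys and values for every past token in every layer: linear in seq_len."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def bounded_state_bytes(n_recurrent_layers, state_width, n_local_layers,
                        n_kv_heads, head_dim, window, bytes_per_value=2):
    """One fixed-size state per recurrent layer plus a local cache capped at `window`."""
    recurrent = n_recurrent_layers * state_width * bytes_per_value
    local = 2 * n_local_layers * n_kv_heads * head_dim * window * bytes_per_value
    return recurrent + local


# Hypothetical dimensions, for illustration only (not a real model config).
for seq_len in (2_048, 32_768, 262_144):
    full = full_kv_cache_bytes(seq_len, n_layers=24, n_kv_heads=1, head_dim=256)
    flat = bounded_state_bytes(n_recurrent_layers=16, state_width=4096,
                               n_local_layers=8, n_kv_heads=1, head_dim=256,
                               window=2_048)
    print(f"{seq_len:>7} tokens: full KV {full / 2**20:8.1f} MiB, "
          f"bounded {flat / 2**20:6.1f} MiB")
```

The second figure does not depend on `seq_len` at all, which is the property the architecture trades needle-in-haystack precision for.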

## Prompt format

Standard Gemma turn format, the same `<start_of_turn>user … <end_of_turn>` as Gemma 1 IT. No RecurrentGemma-specific tokens.
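
As a small sketch of that turn format, the helper below joins `(role, text)` pairs into a prompt string and leaves a model turn open for generation. `format_gemma_prompt` is a hypothetical name. In practice the tokenizer's own chat template is the safer source of truth, so treat this as an illustration of the layout only.

```python
def format_gemma_prompt(turns):
    """Build a Gemma-style prompt from (role, text) pairs; roles are "user" or "model"."""
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave a model turn open so generation continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(format_gemma_prompt([("user", "Write a haiku about memory.")]))
```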

## Minimum invocation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/recurrentgemma-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "<start_of_turn>user\nWrite a haiku about memory.<end_of_turn>\n<start_of_turn>model\n"
# With device_map="auto", follow the model's device rather than hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## When to choose it over base Gemma 4

Honestly: **rarely, in April 2026.**

The original pitch was "long-context generation without KV blowup." Gemma 4 now ships with 256K context on the 26B/31B and 128K on the edge models, with efficient attention implementations. The gap RecurrentGemma was filling has narrowed.

Reasonable residual cases:
- **Extremely memory-constrained hardware** (Jetson Nano tier) where even quantized Gemma 4 E2B KV cache is the limiting factor on sequence length.
- **Streaming-generation workloads** where latency-per-token must stay constant as output length grows into the tens of thousands of tokens.
- **Research interest** in recurrent LLMs.

For typical homelab use, skip. The V100 on pve197 has 32GB VRAM; Gemma 4 31B at Q4 fits with room for generous context.

## Homelab fit

Not a strong candidate for any current Seth project. Note for file: if a CPU-only streaming-transcript use case ever comes up (e.g., running on seth-pi for always-on audio processing), RecurrentGemma 2B could reappear in scope.
@@ -0,0 +1,89 @@
# ShieldGemma / ShieldGemma 2

Safety classifiers. Two separate product lines now: **ShieldGemma** (text, built on Gemma 2) and **ShieldGemma 2** (images, built on Gemma 3). There is no Gemma 4 generation yet.

## What it is

- **ShieldGemma (text):** LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits `Yes`/`No` (yes = violates policy). Four harm types.
- **ShieldGemma 2 (image):** Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only; does not accept text.

## Sizes

- ShieldGemma: **2B, 9B, 27B**, all instruction-tuned.
- ShieldGemma 2: **4B**, the only size available.

## Model cards

- Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card
- Image: https://huggingface.co/google/shieldgemma-2-4b-it
- DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/

## Safety categories

**ShieldGemma (text):**
1. Sexually explicit content
2. Dangerous content
3. Hate speech
4. Harassment

**ShieldGemma 2 (image):**
1. Sexually explicit content
2. Dangerous content
3. Violence / gore

Note the image model dropped "hate" and "harassment" (hard to define visually) and added "violence/gore" (a visual primitive).

## Text prompt format (ShieldGemma)

Five-component structure: preamble, user prompt, optional model response, policy description, and closing question.

```
<preamble establishing "you are a policy expert">

<start_of_turn>user
<user prompt here>
<end_of_turn>

<start_of_turn>model
<optional model response here>
<end_of_turn>

Our safety principle is defined in the below:
* <policy description, e.g. "No Hate Speech": ...>

Does the human prompt/response violate the above principle? Your answer must start with 'Yes' or 'No'.
```

The model outputs one token: `Yes` (violates) or `No` (safe). Softmax the logits on those two tokens for a calibrated score.
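
The scoring step can be sketched in a few lines. The example assumes you have already extracted the two next-token logits for `Yes` and `No` from the model output; how you look up those token ids depends on the tokenizer and is not shown. `violation_probability` and `is_violation` are hypothetical helper names, and the 0.5 threshold is only a placeholder to tune per policy.

```python
import math


def violation_probability(yes_logit, no_logit):
    """Two-way softmax over the Yes and No logits; returns P(Yes)."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def is_violation(yes_logit, no_logit, threshold=0.5):
    """Flag the input when P(Yes) reaches the chosen threshold."""
    return violation_probability(yes_logit, no_logit) >= threshold


# Illustrative logits only, not real model output.
print(violation_probability(2.0, 0.0), is_violation(2.0, 0.0, threshold=0.9))
```

Keeping the probability and the threshold separate makes it straightforward to run each policy with its own cut-off.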

## Minimum invocation: ShieldGemma 2 (image)

```python
from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import torch

model_id = "google/shieldgemma-2-4b-it"
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("input.jpg")
inputs = processor(images=[image], return_tensors="pt")

with torch.inference_mode():
    out = model(**inputs)

print(out.probabilities)  # tensor of per-category "Yes" probabilities
```

## When to choose it over base Gemma 4

- You need a **calibrated safety score**, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits, which are easy to threshold.
- You want **policy-by-policy classification** (e.g., run each category separately with different thresholds).
- You're running a moderation pipeline and need **a small, fast, purpose-trained classifier** rather than a general chat model reasoning about safety.

Use base Gemma 4 for "explain *why* this is unsafe" narrative output. ShieldGemma is the yes/no stamp.

## Homelab fit

Pre-filter for `ai-visualizer` (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL.
@@ -0,0 +1,43 @@
# SignGemma

ASL (American Sign Language) → English translation model. Announced at Google I/O 2025.

## Status

**Limited preview only. No open weights as of April 2026.** Google published an interest form at I/O 2025; access has been gated to language-service providers, accessibility researchers, and members of the Deaf community. Participants receive a TensorFlow Lite package and sample integration code.

There is no public Hugging Face entry under `google/signgemma*`. The original plan was general availability by end-of-2025, which slipped. No updated timeline announced as of April 2026.

## What it is (from announcement material)

- **Video-in, text-out** on-device model.
- Best performance on **ASL → English**; training includes other sign languages for future expansion.
- Uses a **vision transformer** to analyze hand shapes, facial expressions, and motion, followed by a compact language model that produces English output.
- Sized for **smartphones and laptops**: on-device real-time translation is the design goal.

## Base generation

Google states it is "part of the Gemma family" and "built on the Gemini Nano framework." Likely Gemma 3-era image/video encoder on a small Gemma 3 text decoder, but this is **not confirmed**, and the "Gemini Nano framework" language suggests it may use Gemini-not-Gemma internals despite the name. Verify at release.

## Model card

- LinkedIn announcement: https://www.linkedin.com/posts/googledeepmind_signgemma-is-our-most-advanced-model-for-activity-7342957078249955329-JwJJ
- Slator coverage: https://slator.com/google-invites-feedback-for-signgemma-a-new-ai-sign-language-translation-model/

No public model card yet.

## Prompt format

Not published.

## Minimum invocation

Not possible. No weights available.

## When to choose it

On release: accessibility apps, live captioning for Deaf users, sign-language learning tools.

## Homelab fit

Zero for typical homelab use. If Seth ever wants to pilot a real-time captioning overlay for video streams this could matter, but it is not buildable until Google ships weights.
@@ -0,0 +1,105 @@
# TranslateGemma

Multilingual text + image translation. Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing).

## What it is

Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. Retains Gemma 3's multimodal capability: it can translate text embedded in images.

## Sizes

- **4B IT**
- **12B IT**
- **27B IT**

Google's headline claim: the 12B beats Gemma 3 27B baseline translation quality with less than half the parameters.

## Model card

- HF: https://huggingface.co/google/translategemma-4b-it
- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/
- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/

## Supported languages

55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.).

## Prompt format

**Strict chat-template format.** Content list must contain exactly **one entry**, with mandatory `source_lang_code` and `target_lang_code`.

### Text translation

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]
```

### Image translation (translates text inside the image)

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]
```

Only `"text"` and `"image"` types are supported. Only `user` and `assistant` roles. Image input is normalized to 896×896 (256 vision tokens).
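
Because the contract is strict, it can help to build the message through one small function that enforces it. The sketch below produces the same shape as the two examples above. `build_translation_message` is a hypothetical helper, not part of any library, and it does not check the language codes against the supported list.

```python
def build_translation_message(source_lang_code, target_lang_code, *, text=None, url=None):
    """Return a single-turn message list with exactly one content entry."""
    # Exactly one of `text` (text translation) or `url` (image translation).
    if (text is None) == (url is None):
        raise ValueError("provide exactly one of text= or url=")
    if not source_lang_code or not target_lang_code:
        raise ValueError("source_lang_code and target_lang_code are mandatory")

    entry = {
        "type": "text" if text is not None else "image",
        "source_lang_code": source_lang_code,
        "target_lang_code": target_lang_code,
    }
    if text is not None:
        entry["text"] = text
    else:
        entry["url"] = url
    return [{"role": "user", "content": [entry]}]


messages = build_translation_message("pl", "en", text="Dziadek mieszkał w Warszawie przed wojną.")
print(messages)
```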

## Minimum invocation

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
```

## Performance

- **WMT24++ across 55 languages:** MetricX 5.32 (an error score, lower is better), COMET 81.6 (higher is better).
- Context window: 2K tokens (short; this is a translation model, not a long-doc summarizer).

## When to choose it over base Gemma 4

- You want **translation quality > general Gemma 4** at equivalent size, with the strict prompt contract making it easy to drop into a pipeline.
- You need **image-text translation** (street signs, menus, old documents) as a first-class task.
- You care about the 55-language coverage and regionalized variants.

Base Gemma 4 31B *can* translate, which is fine for casual use. TranslateGemma wins for production pipelines and when you care about metric-validated quality.

## Homelab fit

**Strong fit for family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.

Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).
@@ -0,0 +1,63 @@
# TxGemma

Therapeutic-development / drug-discovery variant. Built on **Gemma 2**. No Gemma 3 or 4 generation yet.

## What it is

Gemma 2 fine-tuned on 7M examples curated from the **Therapeutics Data Commons (TDC)**: predictive tasks across small molecules, proteins, nucleic acids, diseases, and cell lines. Beats or matches state-of-the-art on 50 of 66 TDC tasks; beats specialist models on 26 of them.

## Sizes

- **2B predict**: prediction-only, narrow prompt format.
- **9B predict** + **9B chat**: prediction plus conversational reasoning.
- **27B predict** + **27B chat**: same, larger.

## Model card

- https://developers.google.com/health-ai-developer-foundations/txgemma/model-card
- DeepMind: https://deepmind.google/models/gemma/txgemma/
- Paper: https://deepmind.google/research/publications/153799/

## Prompting modes

**Prediction mode** (all sizes): structured TDC-format prompt with instruction + context + question + optional few-shot. Output is a short prediction (sometimes a single token or a float).

**Conversational mode** (9B, 27B): chat-template interactions, can explain reasoning behind predictions.
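
The prediction-mode layout can be sketched as a small prompt builder. This is an illustration under assumptions. The `Instructions:` / `Context:` / `Question:` / `Answer:` labels follow the description above, while the placement of few-shot examples (as solved question/answer pairs before the final question) is a guess, not a documented format. `build_tdc_prompt` is a hypothetical helper name.

```python
def build_tdc_prompt(instructions, context, question, few_shot=()):
    """Assemble a prediction-mode prompt; few_shot is an iterable of (question, answer)."""
    lines = [f"Instructions: {instructions}", f"Context: {context}"]
    # Assumed few-shot layout: solved examples precede the real question.
    for example_question, example_answer in few_shot:
        lines.append(f"Question: {example_question}")
        lines.append(f"Answer: {example_answer}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")  # left open for the model to complete
    return "\n".join(lines)


print(build_tdc_prompt(
    "Predict whether the molecule can penetrate the blood-brain barrier.",
    "Blood-brain barrier penetration is an important property for CNS drugs.",
    "Given the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, predict BBB penetration. "
    "Answer with 'Yes' or 'No'.",
))
```

For real work, prefer the task templates published with the model over hand-written wording, since predictions are sensitive to the exact prompt text.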

## Minimum invocation: prediction

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/txgemma-27b-predict",
    device="cuda",
)

prompt = (
    "Instructions: Predict whether the molecule can penetrate the blood-brain barrier.\n"
    "Context: Blood-brain barrier penetration is an important property for CNS drugs.\n"
    "Question: Given the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, "
    "predict BBB penetration. Answer with 'Yes' or 'No'.\n"
    "Answer:"
)

out = pipe(prompt, max_new_tokens=8)
print(out[0]["generated_text"])
```

## License

Health AI Developer Foundations, same terms as MedGemma. Non-clinical, research-use.

## When to choose it over base Gemma 4

- You're doing **drug-discovery research** and need TDC-format predictions out of the box.
- You want **SMILES-aware reasoning** without a custom cheminformatics stack.

Almost never chosen for general-purpose work. TxGemma's value is the training data, not the base model.

## Homelab fit

Zero. Noted for completeness.