gemma4-research/tooling/gemma-family/translategemma.md
Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00

# TranslateGemma
Multilingual text + image translation. Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing).
## What it is
Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. Retains Gemma 3's multimodal capability — can translate text embedded in images.
## Sizes
- **4B IT**
- **12B IT**
- **27B IT**

Google's headline claim: the 12B model beats the Gemma 3 27B baseline on translation quality with less than half the parameters.
## Model card
- HF: https://huggingface.co/google/translategemma-4b-it
- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/
- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/
## Supported languages
55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.).
## Prompt format
**Strict chat-template format.** Content list must contain exactly **one entry**, with mandatory `source_lang_code` and `target_lang_code`.
### Text translation
```python
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]
```
### Image translation (translates text inside the image)
```python
messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]
```
Only `"text"` and `"image"` types are supported. Only `user` and `assistant` roles. Image input is normalized to 896×896 (256 vision tokens).
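Because the contract is strict, it can help to reject malformed requests before they reach the model. The following is a minimal sketch of such a check, not part of any official API: `validate_user_message` and `PAYLOAD_KEY` are hypothetical names, and the payload keys (`text`, `url`) are taken from the two examples above.

```python
# Hypothetical pre-flight check for the prompt contract described above.
# Maps each supported content type to the key that carries its payload.
PAYLOAD_KEY = {"text": "text", "image": "url"}


def validate_user_message(message):
    """Return the message unchanged, or raise ValueError if it breaks the contract."""
    if message.get("role") != "user":
        raise ValueError("expected a message with role 'user'")

    content = message.get("content")
    if not isinstance(content, list) or len(content) != 1:
        raise ValueError("content must be a list with exactly one entry")

    entry = content[0]
    kind = entry.get("type")
    if kind not in PAYLOAD_KEY:
        raise ValueError(f"unsupported content type: {kind!r}")

    # Both language codes and the payload itself are mandatory.
    for key in ("source_lang_code", "target_lang_code", PAYLOAD_KEY[kind]):
        if not entry.get(key):
            raise ValueError(f"missing required field: {key!r}")

    return message
```

The check only covers structure. It does not verify that a language code is among the 55 supported ones, since that list is not reproduced here.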
## Minimum invocation
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
```
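For repeated calls, the invocation above can be wrapped in a small helper. This is a sketch under the assumption that the pipeline returns the same structure as in the example (a list whose first item holds a `generated_text` conversation ending with the assistant turn). `translate_text` is a hypothetical name, and `pipe` is passed in so the helper can be exercised with a stub.

```python
def translate_text(pipe, text, source_lang_code, target_lang_code, max_new_tokens=200):
    """Translate one text snippet using an already-constructed pipeline."""
    messages = [{
        "role": "user",
        "content": [{
            "type": "text",
            "source_lang_code": source_lang_code,
            "target_lang_code": target_lang_code,
            "text": text,
        }],
    }]
    out = pipe(text=messages, max_new_tokens=max_new_tokens)
    # The last message of the returned conversation is the model's reply.
    return out[0]["generated_text"][-1]["content"]
```

With the `pipe` object from the example, `translate_text(pipe, "Dziadek mieszkał w Warszawie przed wojną.", "pl", "en")` would issue the same request as the snippet above.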
## Performance
- **WMT24++ across 55 languages:** MetricX 5.32, COMET 81.6.
- Context window: 2K tokens (short — this is a translation model, not a long-doc summarizer).
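With a 2K-token window, longer documents have to be split before translation. The sketch below groups sentences into chunks that stay under a token budget. It makes several assumptions: `chunk_for_context` is a hypothetical helper, the sentence split is a naive regex on `.`, `!` and `?`, the default budget of 1500 is an arbitrary value that leaves headroom for the template and output, and `count_tokens` is supplied by the caller (ideally backed by the model's tokenizer).

```python
import re


def chunk_for_context(text, count_tokens, max_tokens=1500):
    """Group sentences into chunks whose token count stays within max_tokens.

    A single sentence longer than the budget is emitted as its own
    (oversized) chunk rather than being cut mid-sentence.
    """
    # Naive split: break after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own single-entry message. Splitting on sentence boundaries loses cross-chunk context, so pronouns or names that span chunks may be translated inconsistently.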
## When to choose it over base Gemma 4
- You want **better translation quality than general-purpose Gemma 4** at equivalent size, with a strict prompt contract that makes it easy to drop into a pipeline.
- You need **image-text translation** (street signs, menus, old documents) as a first-class task.
- You care about the 55-language coverage and regionalized variants.

Base Gemma 4 31B *can* translate, which is fine for casual use. TranslateGemma is the better choice for production pipelines and when metric-validated quality matters.
## Homelab fit
**Strong fit for family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.
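The OCR → TranslateGemma → Gemma 4 flow described above could be wired together roughly as follows. This is only a structural sketch: `ingest_document` and `IngestResult` are hypothetical names, and the three stages are injected as plain callables because the actual OCR engine and Gemma 4 serving setup are not specified here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IngestResult:
    source_text: str      # raw OCR output in the original language
    translated_text: str  # translation produced by the middle stage
    analysis: str         # output of the downstream reasoning model


def ingest_document(
    image_path: str,
    ocr: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    reason: Callable[[str], str],
    source_lang_code: str,
    target_lang_code: str = "en",
) -> IngestResult:
    """Run one document through OCR, translation, and reasoning in order."""
    source_text = ocr(image_path)
    translated_text = translate(source_text, source_lang_code, target_lang_code)
    analysis = reason(translated_text)
    return IngestResult(source_text, translated_text, analysis)
```

Keeping the original text alongside the translation lets later stages cite the source wording. Since TranslateGemma also accepts image input, the OCR stage might be skippable for some scans, though that would need testing on the actual documents.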
Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).