# TranslateGemma

Multilingual text + image translation. Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing).

## What it is

Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. It retains Gemma 3's multimodal capability, so it can translate text embedded in images.

## Sizes

- **4B IT**
- **12B IT**
- **27B IT**

Google's headline claim: the 12B model beats the Gemma 3 27B baseline on translation quality with fewer than half the parameters.

## Model card

- HF: https://huggingface.co/google/translategemma-4b-it
- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/
- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/

## Supported languages

55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.).

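A minimal sketch of a shape check a caller might run on codes before building a prompt. The `split_lang_code` helper and its regex are hypothetical, not part of any TranslateGemma tooling. It assumes every code is a two-letter language tag with an optional two-letter region, as in the examples above, and it does not check membership in the 55 supported languages, which this brief does not enumerate.

```python
import re
from typing import Optional, Tuple

# Shape only: two lowercase letters, optionally "-" plus two uppercase letters.
_LANG_CODE = re.compile(r"^([a-z]{2})(?:-([A-Z]{2}))?$")


def split_lang_code(code: str) -> Tuple[str, Optional[str]]:
    """Split a code such as "pt-BR" into (language, region); region may be None."""
    match = _LANG_CODE.match(code)
    if match is None:
        raise ValueError(f"unexpected language code shape: {code!r}")
    return match.group(1), match.group(2)
```

For example, `split_lang_code("de-CH")` gives `("de", "CH")`, and `split_lang_code("pl")` gives `("pl", None)`.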
## Prompt format

**Strict chat-template format.** The `content` list must contain exactly **one entry**, and that entry must carry both `source_lang_code` and `target_lang_code`.

### Text translation

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]
```

### Image translation (translates text inside the image)

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]
```

Only the `"text"` and `"image"` content types and the `user` and `assistant` roles are supported. Image input is normalized to 896×896 (256 vision tokens).

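Because the contract is strict, it can be worth checking messages before they reach the chat template. The sketch below is one way to do that. `check_messages` is a hypothetical helper, and it assumes the one-entry rule applies to user turns only, since the shape of assistant turns is not described here.

```python
ALLOWED_ROLES = {"user", "assistant"}
# Content type -> field that holds the input, as in the two examples above.
PAYLOAD_KEY = {"text": "text", "image": "url"}


def check_messages(messages: list) -> None:
    """Raise ValueError if any message breaks the prompt contract."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unsupported role {role!r}")
        if role != "user":
            continue  # assumption: only user turns carry the translation entry
        content = message.get("content")
        if not isinstance(content, list) or len(content) != 1:
            raise ValueError(f"message {index}: content must hold exactly one entry")
        entry = content[0]
        kind = entry.get("type")
        if kind not in PAYLOAD_KEY:
            raise ValueError(f"message {index}: unsupported content type {kind!r}")
        for key in ("source_lang_code", "target_lang_code", PAYLOAD_KEY[kind]):
            if not entry.get(key):
                raise ValueError(f"message {index}: missing required field {key!r}")
```

Calling `check_messages(messages)` on either example above returns without raising. A second content entry, a missing language code, or a `system` role raises `ValueError`.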
## Minimum invocation

```python
from transformers import pipeline
import torch

# Load the 4B instruction-tuned checkpoint on the GPU in bfloat16.
pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

# One user turn with exactly one content entry: Polish to English.
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

out = pipe(text=messages, max_new_tokens=200)
# The last turn of the returned conversation holds the translation.
print(out[0]["generated_text"][-1]["content"])
```

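For pipeline use, the invocation above can be wrapped in a small function. This is a sketch only. `translate_text` is a hypothetical helper that takes the already-loaded `pipe` as an argument and assumes the same output shape as the snippet above (`out[0]["generated_text"][-1]["content"]`).

```python
from typing import Any, Callable


def translate_text(
    pipe: Callable[..., Any],
    text: str,
    source_lang: str,
    target_lang: str,
    max_new_tokens: int = 200,
) -> str:
    """Translate one text snippet with an already-constructed pipeline."""
    messages = [{
        "role": "user",
        "content": [{
            "type": "text",
            "source_lang_code": source_lang,
            "target_lang_code": target_lang,
            "text": text,
        }],
    }]
    out = pipe(text=messages, max_new_tokens=max_new_tokens)
    # Same extraction as above: the last chat turn is the model's reply.
    return out[0]["generated_text"][-1]["content"]
```

Passing `pipe` in, rather than constructing it inside the function, keeps the model loaded once and lets the function be exercised with a stub in tests.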
## Performance

- **WMT24++ across 55 languages:** MetricX 5.32 (lower is better), COMET 81.6 (higher is better).
- Context window: 2K tokens. This is short, because it is a translation model, not a long-document summarizer.

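With a 2K-token window, longer documents have to be split before translation. The sketch below groups sentences into chunks under a token budget. `chunk_for_translation`, the 1500-token default, and the whitespace-based token count are all assumptions for illustration. In practice the model's own tokenizer should be passed as `count_tokens`, and the budget should leave room for the prompt template and the generated output.

```python
import re
from typing import Callable, List


def chunk_for_translation(
    text: str,
    max_tokens: int = 1500,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Group sentences into chunks whose estimated token count fits the budget."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = sentence
        else:
            # A single sentence over budget is kept whole as its own chunk.
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own single-entry message. Splitting on sentence boundaries keeps every chunk self-contained, at the cost of losing cross-chunk context.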
## When to choose it over base Gemma 4

- You want **better translation quality than general-purpose Gemma 4** at equivalent size, with the strict prompt contract making it easy to drop into a pipeline.
- You need **image-text translation** (street signs, menus, old documents) as a first-class task.
- You care about the 55-language coverage and regionalized variants.

Base Gemma 4 31B *can* translate, which is fine for casual use. TranslateGemma wins for production pipelines and when you care about metric-validated quality.

## Homelab fit

**Strong fit for the family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.

Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).
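
The OCR → TranslateGemma → Gemma 4 ingest pipeline described above could be wired together roughly as follows. This is a sketch under stated assumptions: `ingest_document` and its three callables (`ocr`, `translate`, `reason`) are hypothetical placeholders for whatever OCR tool, TranslateGemma wrapper, and Gemma 4 client the homelab actually uses.

```python
from typing import Callable, Dict


def ingest_document(
    image_path: str,
    source_lang: str,
    ocr: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    reason: Callable[[str], str],
    target_lang: str = "en",
) -> Dict[str, str]:
    """Run one scanned document through OCR, translation, then reasoning."""
    original = ocr(image_path)  # stage 1: extract text from the scan
    translated = translate(original, source_lang, target_lang)  # stage 2: TranslateGemma
    analysis = reason(translated)  # stage 3: Gemma 4 works on the translated text
    # Keep every intermediate result so a bad translation can be traced to its source.
    return {"original": original, "translated": translated, "analysis": analysis}
```

Injecting the three stages as callables keeps the orchestration independent of any one model server, so each leg can be swapped or stubbed without touching the others.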