# TranslateGemma

Multilingual text + image translation. Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing).

## What it is

Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. Retains Gemma 3's multimodal capability — can translate text embedded in images.

## Sizes

- **4B IT**
- **12B IT**
- **27B IT**

Google's headline claim: the 12B model beats the Gemma 3 27B baseline on translation quality with fewer than half the parameters.

## Model card

- HF: https://huggingface.co/google/translategemma-4b-it
- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/
- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/

## Supported languages

55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.).
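A pipeline may want to reject malformed codes before they reach the model. The following is a minimal sketch, assuming every code has the shape shown in the examples above (a two-letter language, optionally followed by a two-letter uppercase region). The `split_lang_code` helper is hypothetical, and it checks only the shape of a code, not whether the language is among the 55 supported.

```python
import re
from typing import Optional, Tuple

# Two lowercase letters, optionally followed by "-" and two uppercase letters.
# Assumption: this shape is inferred from the examples (e.g. "de", "de-AT"),
# not taken from the model's official language list.
_LANG_CODE = re.compile(r"([a-z]{2})(?:-([A-Z]{2}))?")


def split_lang_code(code: str) -> Tuple[str, Optional[str]]:
    """Split a code into (language, region); region is None when absent."""
    match = _LANG_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized language code: {code!r}")
    return match.group(1), match.group(2)
```

For example, `split_lang_code("pt-BR")` yields the language `"pt"` and the region `"BR"`, while a bare `"pl"` yields a region of `None`.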

## Prompt format

**Strict chat-template format.** Content list must contain exactly **one entry**, with mandatory `source_lang_code` and `target_lang_code`.

### Text translation

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]
```

### Image translation (translates text inside the image)

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]
```

Only the `"text"` and `"image"` content types are supported, and only the `user` and `assistant` roles. Image input is normalized to 896×896 (256 vision tokens).
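Because the contract is strict, it can help to build messages through one function that enforces it. This is a minimal sketch, not part of the model's API: `build_message` is a hypothetical helper, and the payload keys (`text`, `url`) simply mirror the two examples above.

```python
from typing import Any, Dict, List

# Content types accepted by the chat template, per the note above.
SUPPORTED_TYPES = {"text", "image"}


def build_message(
    kind: str,
    source_lang_code: str,
    target_lang_code: str,
    payload: str,
) -> List[Dict[str, Any]]:
    """Return a single-turn message list with exactly one content entry."""
    if kind not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported content type: {kind!r}")
    if not source_lang_code or not target_lang_code:
        raise ValueError("source_lang_code and target_lang_code are mandatory")
    # Text entries carry the string under "text"; image entries carry a URL.
    payload_key = "text" if kind == "text" else "url"
    entry = {
        "type": kind,
        "source_lang_code": source_lang_code,
        "target_lang_code": target_lang_code,
        payload_key: payload,
    }
    return [{"role": "user", "content": [entry]}]
```

Centralizing construction this way means a missing language code fails early in your own code rather than inside the chat template.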

## Minimum invocation

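```python
from transformers import pipeline
import torch

# Load the model once; the "image-text-to-text" task handles both
# text-only and image inputs.
pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

# One user turn with exactly one content entry, as the template requires.
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

out = pipe(text=messages, max_new_tokens=200)
# The last message in the returned conversation is the assistant's translation.
print(out[0]["generated_text"][-1]["content"])
```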
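To translate several strings with one loaded pipeline, the call above can be wrapped in a loop. This is a sketch under assumptions: `translate_batch` is a hypothetical helper, it takes the already constructed `pipe` as an argument, and it assumes the output has the same shape as in the snippet above (a list whose first item holds the conversation under `generated_text`).

```python
from typing import Any, Callable, List


def translate_batch(
    pipe: Callable[..., Any],
    texts: List[str],
    source_lang_code: str,
    target_lang_code: str,
    max_new_tokens: int = 200,
) -> List[str]:
    """Translate each string in turn, one request per string."""
    translations = []
    for text in texts:
        messages = [{
            "role": "user",
            "content": [{
                "type": "text",
                "source_lang_code": source_lang_code,
                "target_lang_code": target_lang_code,
                "text": text,
            }],
        }]
        out = pipe(text=messages, max_new_tokens=max_new_tokens)
        # Assumed output shape: the assistant reply is the last message.
        translations.append(out[0]["generated_text"][-1]["content"])
    return translations
```

Passing the pipeline in, rather than creating it inside the function, keeps model loading out of the hot path and makes the helper easy to exercise with a stub.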

## Performance

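- **WMT24++ across 55 languages:** MetricX 5.32, COMET 81.6. MetricX is an error score (lower is better); COMET is a quality score (higher is better).
- Context window: 2K tokens. That is short, but this is a translation model, not a long-document summarizer.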
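With a 2K-token window, longer documents have to be split before translation. The sketch below greedily packs sentences into chunks under a budget. It rests on several assumptions: `chunk_text` is a hypothetical helper, sentence boundaries are found with a naive punctuation regex, and the default token counter is a crude whitespace word count. For real use, pass a counter backed by the model's tokenizer and leave headroom for the template and the generated output.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily group sentences so each chunk stays within max_tokens."""
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # it will mis-split abbreviations and scripts without such punctuation).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than the budget becomes its own
        # oversized chunk; it is not split further here.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset avoids handing the model half a sentence, which tends to hurt translation quality.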

## When to choose it over base Gemma 4

- You want **better translation quality than general-purpose Gemma 4** at equivalent size, with a strict prompt contract that makes it easy to drop into a pipeline.
- You need **image-text translation** (street signs, menus, old documents) as a first-class task.
- You care about the 55-language coverage and regionalized variants.

Base Gemma 4 31B *can* translate, which is fine for casual use. TranslateGemma is the better choice for production pipelines and wherever metric-validated quality matters.

## Homelab fit

**Strong fit for the family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.
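The three-stage ingest flow could be wired together roughly as follows. This is only a structural sketch: `ingest_document`, `IngestResult`, and the three callables (`ocr`, `translate`, `reason`) are hypothetical placeholders for whatever OCR engine, TranslateGemma wrapper, and Gemma 4 client the homelab actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IngestResult:
    source_text: str      # raw OCR output in the original language
    translated_text: str  # output of the translation stage
    analysis: str         # output of the reasoning stage


def ingest_document(
    image_path: str,
    source_lang_code: str,
    ocr: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    reason: Callable[[str], str],
    target_lang_code: str = "en",
) -> IngestResult:
    """Run OCR, then translation, then reasoning, keeping every stage's output."""
    source_text = ocr(image_path)
    translated_text = translate(source_text, source_lang_code, target_lang_code)
    analysis = reason(translated_text)
    return IngestResult(source_text, translated_text, analysis)
```

Keeping the original-language text in the result lets a later pass re-translate or spot-check a document without re-running OCR.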

Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).