# TranslateGemma

Multilingual text + image translation. Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing).

## What it is

Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. It retains Gemma 3's multimodal capability, so it can translate text embedded in images.

## Sizes

- **4B IT**
- **12B IT**
- **27B IT**

Google's headline claim: the 12B outperforms the Gemma 3 27B baseline on translation quality with less than half the parameters.

## Model card

- HF: https://huggingface.co/google/translategemma-4b-it
- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/
- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/

## Supported languages

55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.).

## Prompt format

**Strict chat-template format.** The `content` list must contain exactly **one entry**, and that entry must carry both `source_lang_code` and `target_lang_code`.

### Text translation

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        # Czech: "In the worst case, even the lens can crack."
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]
```

### Image translation (translates text inside the image)

```python
messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]
```

Only the `"text"` and `"image"` content types are supported, and only the `user` and `assistant` roles. Image input is normalized to 896×896 (256 vision tokens).
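Because the contract is strict, it can help to build and check messages in one place before they reach the pipeline. The sketch below encodes only the rules listed above (exactly one content entry, mandatory language codes, `text`/`image` types, `user`/`assistant` roles). `build_translation_message` and `validate_messages` are hypothetical helpers, not part of `transformers` or the model card, and they do not check language codes against the official 55-language list.

```python
from typing import Any, Optional

ALLOWED_ROLES = {"user", "assistant"}
ALLOWED_TYPES = {"text", "image"}


def build_translation_message(
    source_lang_code: str,
    target_lang_code: str,
    text: Optional[str] = None,
    url: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: build a single-turn TranslateGemma request.

    Pass exactly one of `text` (text translation) or `url` (image translation).
    """
    if (text is None) == (url is None):
        raise ValueError("pass exactly one of text= or url=")
    if not source_lang_code or not target_lang_code:
        raise ValueError("source_lang_code and target_lang_code are mandatory")
    entry: dict[str, Any] = {
        "type": "text" if text is not None else "image",
        "source_lang_code": source_lang_code,
        "target_lang_code": target_lang_code,
    }
    if text is not None:
        entry["text"] = text
    else:
        entry["url"] = url
    # The content list holds exactly one entry, per the prompt contract.
    return [{"role": "user", "content": [entry]}]


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Hypothetical check: raise ValueError if a message breaks the contract."""
    for msg in messages:
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        if role != "user":
            continue  # this sketch does not inspect assistant turns
        content = msg.get("content")
        if not isinstance(content, list) or len(content) != 1:
            raise ValueError("user content must be a list with exactly one entry")
        entry = content[0]
        if entry.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unsupported content type: {entry.get('type')!r}")
        for key in ("source_lang_code", "target_lang_code"):
            if not entry.get(key):
                raise ValueError(f"missing {key}")
```

Calling `validate_messages(messages)` just before `pipe(text=messages, ...)` turns a malformed request into an immediate, readable error instead of a template failure deeper in the stack.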
## Minimum invocation

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        # Polish: "Grandpa lived in Warsaw before the war."
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

out = pipe(text=messages, max_new_tokens=200)
# The last message in the returned conversation is the model's translation.
print(out[0]["generated_text"][-1]["content"])
```

## Performance

- **WMT24++ across 55 languages:** MetricX 5.32 (lower is better), COMET 81.6 (higher is better).
- Context window: 2K tokens. That is short, but this is a translation model, not a long-document summarizer.

## When to choose it over base Gemma 4

- You want **better translation quality than general-purpose Gemma 4** at equivalent size, and the strict prompt contract makes it easy to drop into a pipeline.
- You need **image-text translation** (street signs, menus, old documents) as a first-class task.
- You care about the 55-language coverage and regional variants.

Base Gemma 4 31B *can* translate, and that is fine for casual use. TranslateGemma wins for production pipelines and whenever metric-validated quality matters.

## Homelab fit

**Strong fit for the family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any other of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.

Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).
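With a 2K-token window shared by the prompt and the translation, long OCR output has to be split before it reaches TranslateGemma in the ingest pipeline. A minimal sketch of that step, assuming sentence-level splitting is acceptable for the documents in question: `chunk_for_translation` and `translation_requests` are hypothetical names, the 4-characters-per-token estimate is a crude assumption (in practice, pass a counter built on the model's tokenizer, e.g. `lambda s: len(tokenizer.encode(s))`), and the 700-token default budget is an arbitrary placeholder meant to leave room for the chat template and the output.

```python
import re
from typing import Any, Callable


def rough_token_count(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def chunk_for_translation(
    text: str,
    max_tokens: int = 700,  # hypothetical budget, well under the 2K window
    count_tokens: Callable[[str], int] = rough_token_count,
) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translation_requests(
    text: str, source: str, target: str, **chunk_kwargs: Any
) -> list[list[dict[str, Any]]]:
    """One single-entry TranslateGemma request per chunk."""
    return [
        [{
            "role": "user",
            "content": [{
                "type": "text",
                "source_lang_code": source,
                "target_lang_code": target,
                "text": chunk,
            }],
        }]
        for chunk in chunk_for_translation(text, **chunk_kwargs)
    ]
```

Each request can then be passed to `pipe(text=..., max_new_tokens=...)` as in the minimum invocation, and the translated chunks joined before the Gemma 4 reasoning step. Packing whole sentences keeps the model from seeing half a sentence at a chunk boundary, at the cost of ignoring cross-sentence context beyond each chunk.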