Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00


TranslateGemma

Multilingual text + image translation. Released January 15, 2026. Built on Gemma 3 (not Gemma 4, despite being the newest variant at time of writing).

What it is

Gemma 3 fine-tuned for translation across 55 languages, using a two-stage distillation from Gemini. Retains Gemma 3's multimodal capability — can translate text embedded in images.

Sizes

  • 4B IT
  • 12B IT
  • 27B IT

Google's headline claim: the 12B model beats the Gemma 3 27B baseline on translation quality with less than half the parameters.

Model card

Supported languages

55 languages via ISO 639-1 codes (en, de, es, fr, pl, ja, zh, ar, hi, etc.) plus regional variants (en-US, en-GB, pt-BR, pt-PT, de-DE, de-AT, de-CH, zh-CN, zh-TW, etc.).
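
The codes listed above are either a two-letter language code or a language code with a two-letter region suffix. A minimal sketch for splitting and sanity-checking a code before it goes into a prompt, assuming that the `xx` / `xx-YY` shape covers every supported code (the model card remains the authority on the actual list). `split_lang_code` is a hypothetical helper, not part of any library, and it checks shape only, not membership in the 55 supported languages:

```python
import re

# Assumed shape: two lowercase letters, optionally followed by a hyphen
# and a two-letter uppercase region (e.g. "de" or "de-DE").
_LANG_CODE = re.compile(r"^([a-z]{2})(?:-([A-Z]{2}))?$")


def split_lang_code(code):
    """Return (language, region_or_None); raise ValueError on an unexpected shape."""
    match = _LANG_CODE.match(code)
    if match is None:
        raise ValueError(f"unexpected language code shape: {code!r}")
    return match.group(1), match.group(2)
```

For example, `split_lang_code("pt-BR")` separates the language from the region, which is handy when falling back from a regional variant to the plain language code.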

Prompt format

The chat template is strict: the content list must contain exactly one entry, and that entry must carry both source_lang_code and target_lang_code.

Text translation

messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "cs",
        "target_lang_code": "de-DE",
        "text": "V nejhorším případě i k prasknutí čočky.",
    }],
}]

Image translation (translates text inside the image)

messages = [{
    "role": "user",
    "content": [{
        "type": "image",
        "source_lang_code": "ja",
        "target_lang_code": "en",
        "url": "https://example.com/japanese-sign.jpg",
    }],
}]

Only the "text" and "image" content types are supported, and only the user and assistant roles. Image input is normalized to 896×896 (256 vision tokens).
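
The prompt contract described in this section can be enforced with a small builder. This is a sketch only: `build_translation_message` is a hypothetical helper name, and the field names are the ones shown in the two examples above.

```python
def build_translation_message(source_lang_code, target_lang_code, *, text=None, image_url=None):
    """Build a one-turn message list with exactly one content entry."""
    # Exactly one of text / image_url must be given, mirroring the
    # "exactly one entry" rule of the prompt format.
    if (text is None) == (image_url is None):
        raise ValueError("provide exactly one of text or image_url")

    if text is not None:
        entry = {"type": "text", "text": text}
    else:
        entry = {"type": "image", "url": image_url}

    # Both language codes are mandatory on the single content entry.
    entry["source_lang_code"] = source_lang_code
    entry["target_lang_code"] = target_lang_code
    return [{"role": "user", "content": [entry]}]
```

Calling it with `text=` reproduces the text-translation structure, and calling it with `image_url=` reproduces the image-translation structure.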

Minimum invocation

from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16,
)

messages = [{
    "role": "user",
    "content": [{
        "type": "text",
        "source_lang_code": "pl",
        "target_lang_code": "en",
        "text": "Dziadek mieszkał w Warszawie przed wojną.",
    }],
}]

# The pipeline returns the whole chat; the last message is the assistant's translation.
out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
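
The invocation above can be wrapped so that calling code does not need to know the output shape. A sketch, assuming the pipeline returns the structure indexed in the snippet above (`out[0]["generated_text"][-1]["content"]`). `translate_text` is a hypothetical helper name, and `pipe` is any callable that accepts `text=` and `max_new_tokens=` the way the pipeline call above does:

```python
def translate_text(pipe, text, source_lang_code, target_lang_code, max_new_tokens=200):
    """Translate one string and return only the assistant's reply text."""
    messages = [{
        "role": "user",
        "content": [{
            "type": "text",
            "source_lang_code": source_lang_code,
            "target_lang_code": target_lang_code,
            "text": text,
        }],
    }]
    out = pipe(text=messages, max_new_tokens=max_new_tokens)
    # Last message of the returned chat is the model's translation.
    return out[0]["generated_text"][-1]["content"]
```

Passing the pipeline in as an argument keeps the function testable with a stub and avoids loading the model at import time.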

Performance

  • WMT24++ across 55 languages: MetricX 5.32, COMET 81.6.
  • Context window: 2K tokens (short — this is a translation model, not a long-doc summarizer).
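
Because of the 2K-token window, longer documents have to be split before translation. A rough sketch that greedily packs sentences into chunks under a token budget. The default budget of 800 and the whitespace-based token estimate are placeholder assumptions: a real pipeline would count with the model's tokenizer and leave headroom for the chat template and the generated translation. `chunk_sentences` is a hypothetical helper:

```python
import re


def chunk_sentences(text, max_tokens=800, count_tokens=lambda s: len(s.split())):
    """Greedily pack sentences into chunks whose estimated size stays within max_tokens.

    A single sentence longer than max_tokens is emitted as its own oversized chunk.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk translatable on its own.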

When to choose it over base Gemma 4

  • You want better translation quality than general-purpose Gemma 4 at an equivalent size, and a strict prompt contract that makes the model easy to drop into a pipeline.
  • You need image-text translation (street signs, menus, old documents) as a first-class task.
  • You care about the 55-language coverage and regionalized variants.

Base Gemma 4 31B can translate — fine for casual use. TranslateGemma wins for production pipelines and when you care about metric-validated quality.

Homelab fit

Strong fit for the family history agent. If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100.
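
The ingest flow above (OCR → TranslateGemma → Gemma 4) can be sketched as three pluggable stages. Everything here is a hypothetical illustration: `ingest_document` and the three callables are made-up names, and nothing is implied about how the OCR or reasoning stages are actually implemented on pve197:

```python
def ingest_document(image_path, source_lang_code, ocr, translate, reason, target_lang_code="en"):
    """Run one scanned document through OCR, translation, then reasoning.

    ocr(image_path) -> str
    translate(text, source_lang_code, target_lang_code) -> str
    reason(translated_text) -> any
    """
    source_text = ocr(image_path)
    translated = translate(source_text, source_lang_code, target_lang_code)
    return {
        "source_text": source_text,        # kept so the original can be cited
        "translated_text": translated,
        "analysis": reason(translated),
    }
```

Keeping the original text next to the translation lets later steps quote the source document rather than only the machine translation.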

Also useful for SearXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization).