gemma4-research/tooling/gemma-family/codegemma.md
Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00


CodeGemma

Code completion and generation with native fill-in-the-middle (FIM) support. CodeGemma is built on Gemma 1, and that is still its most recent generation as of April 2026: no CodeGemma 2/3/4 has been released.

What it is

Gemma 1 fine-tuned on code. Trained with an 80-90% FIM rate, split 50/50 between the PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than for chat.
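To make the FIM rate and the PSM/SPM split concrete, here is a minimal sketch of how a single training document might be turned into a FIM example. The function name `to_fim_example`, the character-level random split, and in particular the SPM token ordering are assumptions for illustration only; this brief does not describe the actual preprocessing pipeline.

```python
import random

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def to_fim_example(doc: str, rng: random.Random, fim_rate: float = 0.8) -> str:
    """Hypothetical sketch: turn one document into a FIM training string."""
    # With probability (1 - fim_rate), keep the document as plain left-to-right text.
    if rng.random() >= fim_rate:
        return doc
    # Pick two cut points to split the document into prefix / middle / suffix.
    a, b = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:a], doc[a:b], doc[b:]
    # 50/50 choice between the two layouts.
    if rng.random() < 0.5:
        # PSM: prefix, then suffix, then the middle as the training target.
        return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
    # SPM (assumed ordering): suffix first, then prefix, then the middle.
    return f"{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}{middle}"
```

In both layouts the middle comes last, so at inference time the model only has to continue the sequence after `<|fim_middle|>`.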

Sizes

  • 2B pretrained — fast completion
  • 7B pretrained — higher quality completion + FIM
  • 7B instruction-tuned — code chat

Versioned point releases exist (2B 1.1, 7B-IT 1.1).

Model card

FIM tokens

<|fim_prefix|>    prefix-of-completion marker
<|fim_suffix|>    cursor/insertion-point marker
<|fim_middle|>    generation trigger
<|file_separator|>  multi-file boundary

PSM (Prefix-Suffix-Middle) template

<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>

Example:

prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)

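The model generates the middle chunk and is expected to stop there. It can run past the middle, so treat the four FIM tokens (plus EOS) as stop tokens when generating.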
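In an editor integration the prompt is usually assembled from the text on either side of the cursor rather than written by hand. A minimal sketch, where `build_psm_prompt` is a hypothetical helper name and not part of any library, reproduces the example prompt above:

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_psm_prompt(before_cursor: str, after_cursor: str) -> str:
    """Assemble a PSM prompt from the text on either side of the cursor."""
    return f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"


# Text before the cursor, including the indentation of the line being completed.
before = (
    "import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    "
)
# Text after the cursor.
after = "\n    return age"

prompt = build_psm_prompt(before, after)
```

Keeping the indentation of the current line in the prefix gives the model the same context an editor would see at the cursor.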

Multi-file context

Prepend referenced files separated by <|file_separator|>, then the target file in FIM format.
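The layout described above can be sketched as follows. `build_multifile_prompt` is a hypothetical helper, and whether file paths or extra newlines should accompany each referenced file is not stated here, so the sketch concatenates raw file contents only:

```python
FILE_SEPARATOR = "<|file_separator|>"
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_multifile_prompt(
    context_files: list[str], before_cursor: str, after_cursor: str
) -> str:
    """Prepend referenced files, then the target file in PSM format."""
    # Each referenced file is followed by the multi-file boundary token.
    context = "".join(f"{content}{FILE_SEPARATOR}" for content in context_files)
    # The file being edited comes last, as an ordinary FIM prompt.
    target = f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"
    return context + target


utils_py = "def double(x):\n    return 2 * x\n"
prompt = build_multifile_prompt(
    [utils_py],
    before_cursor="from utils import double\n\nresult = ",
    after_cursor="\nprint(result)\n",
)
```

With an empty `context_files` list the result is a plain single-file PSM prompt.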

Minimum invocation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/codegemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

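# PSM prompt: code before the cursor, code after the cursor, then the generation trigger.
prompt = "<|fim_prefix|>def fib(n):\n    if n <= 1:\n        return n\n    <|fim_suffix|>\n    return a<|fim_middle|>"
# Send inputs to wherever device_map="auto" placed the model instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens; special tokens stay visible so a trailing FIM marker is easy to spot.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))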
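Because the call above decodes with `skip_special_tokens=False`, FIM markers will show up in the output if generation runs past the middle chunk. A minimal post-processing sketch cuts the completion at the first such marker. `extract_middle` is a hypothetical helper, and the literal `<eos>` string is an assumption about how the tokenizer renders its end-of-sequence token:

```python
FIM_TOKENS = (
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<|file_separator|>",
)


def extract_middle(completion: str, stop_tokens=FIM_TOKENS + ("<eos>",)) -> str:
    """Return the text before the earliest stop token, or all of it if none occurs."""
    cut = len(completion)
    for tok in stop_tokens:
        idx = completion.find(tok)
        if idx != -1:
            # Keep the earliest position at which any stop token appears.
            cut = min(cut, idx)
    return completion[:cut]


raw = "a = fib(n - 1) + fib(n - 2)<|file_separator|>def unrelated():"
middle = extract_middle(raw)
```

An alternative is to stop generation at the source by passing the ids of these tokens to `generate` through its `eos_token_id` argument, which accepts a list; the string-level cut is shown here because it works with any backend.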

Ollama

Run ollama pull codegemma:7b or ollama pull codegemma:2b. Ollama wraps the FIM tokens for you when you call its completion API with a prefix (the prompt) and a suffix.
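As a rough sketch of that completion call, the snippet below posts to Ollama's `/api/generate` endpoint with a `prompt` and a `suffix`. It assumes a local Ollama server on the default port and that the pulled tag's template supports suffix-based insertion; the helper names `build_generate_payload` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, before_cursor: str, after_cursor: str) -> dict:
    """Build the JSON body for a non-streaming prefix/suffix completion."""
    return {
        "model": model,
        "prompt": before_cursor,  # text before the cursor
        "suffix": after_cursor,   # text after the cursor
        "stream": False,          # return one JSON object instead of a stream
    }


def complete(model: str, before_cursor: str, after_cursor: str, url: str = OLLAMA_URL) -> str:
    """Send the request and return the generated middle text."""
    body = json.dumps(build_generate_payload(model, before_cursor, after_cursor)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


payload = build_generate_payload("codegemma:2b", "def add(a, b):\n    ", "\n")
```

Calling `complete("codegemma:2b", "def add(a, b):\n    ", "\n")` would then return the inserted text, provided the server is reachable and the model accepts a suffix.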

When to choose it over base Gemma 4

  • You need IDE-grade FIM autocomplete — CodeGemma was trained for it, base Gemma 4 was not.
  • You want a 2B code model — base Gemma 4 skips this size (E2B is multimodal, not code-specialized).
  • You want Ollama-native FIM that tools like continue.dev can talk to.

Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for agentic coding (plan, write, execute) Gemma 4 or qwen3-coder:30b is the better choice. CodeGemma fills the inline-cursor-assistant niche.

Homelab fit

Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B, both of which are stronger than CodeGemma 7B. The only reason to pull CodeGemma is if you want a tiny 2B FIM model for a latency-sensitive editor integration on a Pi, or on pve197 alongside the vision stack.