# CodeGemma

Code completion and generation with native **fill-in-the-middle (FIM)** support. Built on **Gemma 1**, which is still the most recent CodeGemma generation as of April 2026: there is no CodeGemma 2/3/4 release.

## What it is

Gemma 1 fine-tuned on code. Trained with an 80–90% FIM rate, split 50/50 between PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than chat.

## Sizes

- **2B pretrained**: fast completion
- **7B pretrained**: higher-quality completion + FIM
- **7B instruction-tuned**: code chat

Versioned point releases exist (2B 1.1, 7B-IT 1.1).

## Model card

- https://ai.google.dev/gemma/docs/codegemma/model_card
- HF: https://huggingface.co/google/codegemma-7b
- Tech report: https://arxiv.org/abs/2406.11409

## FIM tokens

```
<|fim_prefix|>      starts the code before the cursor
<|fim_suffix|>      marks the cursor / insertion point; starts the code after it
<|fim_middle|>      generation trigger
<|file_separator|>  multi-file boundary
```

### PSM (Prefix-Suffix-Middle) template

```
<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>
```

Example:

```python
prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)
```

The model generates the middle chunk, then normally emits another FIM token, `<|file_separator|>`, or EOS. Cut the output at the first of these, since the pretrained checkpoints can keep generating past the completion.

### Multi-file context

Prepend the referenced files, separated by `<|file_separator|>`, then append the target file in FIM format.
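The PSM template and the multi-file layout described above can be sketched as a few string helpers. This is a minimal sketch, not the official prompt builder: the function names are hypothetical, the choice to put a `<|file_separator|>` after every context file (including the last one, right before the target file) is an assumption, and `truncate_completion` is a defensive helper that assumes the model may emit another special token after the middle chunk.

```python
# Special tokens as listed in the "FIM tokens" section.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
FILE_SEP = "<|file_separator|>"


def build_psm_prompt(prefix: str, suffix: str) -> str:
    """Assemble a Prefix-Suffix-Middle prompt around the cursor position."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_multifile_prompt(context_files: list[str], prefix: str, suffix: str) -> str:
    """Prepend context files, then the target file in PSM format.

    Assumption: each context file body is followed by one separator token.
    """
    context = "".join(f"{body}{FILE_SEP}" for body in context_files)
    return context + build_psm_prompt(prefix, suffix)


def truncate_completion(text: str) -> str:
    """Keep only the text before the first special token, if any appears."""
    cut = len(text)
    for marker in (FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE, FILE_SEP):
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


prompt = build_multifile_prompt(
    ["def helper():\n    return 1\n"],
    prefix="def main():\n    x = ",
    suffix="\n    return x\n",
)
print(prompt)
```

Decode the model output with `skip_special_tokens=False` before passing it to `truncate_completion`, otherwise the markers it looks for are already stripped.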
## Minimum invocation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/codegemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# PSM prompt: the model fills in the body between the prefix and `return a`.
prompt = (
    "<|fim_prefix|>def fib(n):\n    if n <= 1:\n        return n\n    "
    "<|fim_suffix|>\n    return a<|fim_middle|>"
)
# Send inputs to wherever device_map placed the model instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens; special tokens stay visible so the
# end of the middle chunk can be spotted.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

## Ollama

`ollama pull codegemma:7b` or `codegemma:2b`. Ollama inserts the FIM tokens for you when you send a prefix and suffix through its completion API.

## When to choose it over base Gemma 4

- You need **IDE-grade FIM autocomplete**: CodeGemma was trained for it, base Gemma 4 was not.
- You want a **2B code model**: base Gemma 4 skips this size (E2B is multimodal, not code-specialized).
- You want **Ollama-native FIM** that tools like `continue.dev` can talk to.

Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for **agentic coding** (plan, write, execute) Gemma 4 or `qwen3-coder:30b` wins. CodeGemma fills the inline-cursor-assistant niche.

## Homelab fit

Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B, both stronger than CodeGemma 7B. The only reason to pull CodeGemma is a tiny 2B FIM model for a latency-sensitive editor integration on a Pi, or on pve197 alongside the vision stack.
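For the latency-sensitive editor integration mentioned above, a prefix/suffix request to Ollama might look like the sketch below. It assumes a local Ollama server on the default `http://localhost:11434` endpoint and that the pulled `codegemma:2b` tag accepts the `suffix` field of `/api/generate`. The helper names `build_fim_request` and `complete` are hypothetical. Only the payload is built here; `complete` is defined but not called, so the block runs without a server.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_fim_request(prefix, suffix, model="codegemma:2b", max_tokens=64):
    """Build a JSON-serializable body for a prefix/suffix completion request."""
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "stream": False,    # ask for a single JSON response instead of a stream
        "options": {"num_predict": max_tokens, "temperature": 0},
    }


def complete(prefix, suffix, url=OLLAMA_URL, **kwargs):
    """POST the request and return the generated middle chunk."""
    body = json.dumps(build_fim_request(prefix, suffix, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]


payload = build_fim_request("def add(a, b):\n    ", "\n    return result\n")
print(json.dumps(payload, indent=2))
```

A small `num_predict` and `temperature` of 0 are choices aimed at short, deterministic inline completions; an editor plugin may want different values.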