# CodeGemma

Code completion / generation with native **fill-in-the-middle (FIM)** support. Built on **Gemma 1**, which is still the most recent CodeGemma generation as of April 2026: there is no CodeGemma 2/3/4 release.

## What it is

Gemma 1 fine-tuned on code. Trained with an 80–90% FIM rate, split 50/50 between the PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than chat.

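To make the FIM rate concrete, here is a minimal sketch of how a source document could be turned into a PSM-format training string. The character-level split, the uniform random cut points, and the names `to_psm_example` and `fim_rate` are illustrative assumptions, not the actual CodeGemma data pipeline. The SPM branch is left out because its exact token layout is not specified here.

```python
import random

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def to_psm_example(document: str, rng: random.Random, fim_rate: float = 0.8) -> str:
    """Return a PSM training string with probability fim_rate, else the raw text.

    Hypothetical helper: the split strategy is an assumption for illustration.
    """
    if len(document) < 2 or rng.random() >= fim_rate:
        return document  # kept as ordinary left-to-right text
    # Two sorted cut points divide the text into prefix / middle / suffix.
    lo, hi = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:lo], document[lo:hi], document[hi:]
    # PSM order: prefix, then suffix, then the middle the model must predict.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

With `fim_rate=0.8`, roughly four out of five documents would be rewritten this way and the rest left untouched.
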
## Sizes

- **2B pretrained**: fast completion
- **7B pretrained**: higher-quality completion + FIM
- **7B instruction-tuned**: code chat

Version 1.1 point releases exist for 2B and 7B-IT.

## Model card

- Model card: https://ai.google.dev/gemma/docs/codegemma/model_card
- HF: https://huggingface.co/google/codegemma-7b
- Tech report: https://arxiv.org/abs/2406.11409

## FIM tokens

```
<|fim_prefix|>      precedes the code before the cursor
<|fim_suffix|>      marks the cursor; precedes the code after it
<|fim_middle|>      generation trigger
<|file_separator|>  multi-file boundary
```

### PSM (Prefix-Suffix-Middle) template

```
<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>
```

Example:

```python
prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)
```

The model generates the middle chunk and halts.

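An editor integration usually has the whole buffer plus a cursor offset rather than a hand-written string. The sketch below shows one way to wrap that in the PSM template. `build_psm_prompt` and the character-offset convention are hypothetical names chosen for illustration.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_psm_prompt(source: str, cursor: int) -> str:
    """Split source at a character offset and wrap it in the PSM template."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must lie within the source text")
    before, after = source[:cursor], source[cursor:]
    return f"{FIM_PREFIX}{before}{FIM_SUFFIX}{after}{FIM_MIDDLE}"


source = (
    "import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    \n"
    "    return age"
)
# Place the cursor at the end of the indented empty line.
cursor = source.index("\n    return age")
prompt = build_psm_prompt(source, cursor)
```

Here `prompt` is the same string as the hand-written example above.
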
### Multi-file context

Prepend referenced files separated by `<|file_separator|>`, then the target file in FIM format.

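A minimal sketch of that layout follows. It assumes each referenced file contributes only its raw contents (no file name or path header) and that a separator follows every context file. Both choices, and the helper name `build_multifile_prompt`, are assumptions to check against the model card before relying on them.

```python
FILE_SEPARATOR = "<|file_separator|>"


def build_multifile_prompt(context_files: list[str], prefix: str, suffix: str) -> str:
    """Concatenate context files, then append the target file as a PSM prompt."""
    # Assumption: contents only, one separator after each referenced file.
    context = "".join(f"{text}{FILE_SEPARATOR}" for text in context_files)
    return f"{context}<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"


prompt = build_multifile_prompt(
    context_files=["def helper(x):\n    return x * 2\n"],
    prefix="from utils import helper\n\nresult = ",
    suffix="\nprint(result)\n",
)
```
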
## Minimum invocation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/codegemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# PSM prompt: code before the cursor, code after it, then the generation trigger.
prompt = "<|fim_prefix|>def fib(n):\n    if n <= 1:\n        return n\n    <|fim_suffix|>\n    return a<|fim_middle|>"
# Move inputs to wherever device_map placed the model instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the new tokens; special tokens stay visible to show where generation stopped.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

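Because the output is decoded with `skip_special_tokens=False`, the text can contain a trailing sentinel and anything generated after it. The following sketch trims the decoded string at the first sentinel. The helper name `extract_middle` is hypothetical, and treating `<eos>` as the end-of-sequence string is an assumption about the Gemma tokenizer. A complementary option, also worth verifying, is to pass the sentinel token ids to `generate` through `eos_token_id` so decoding stops there in the first place.

```python
SENTINELS = (
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<|file_separator|>",
    "<eos>",  # assumed end-of-sequence string for the Gemma tokenizer
)


def extract_middle(generated: str) -> str:
    """Return the generated text up to (not including) the first sentinel."""
    cut = len(generated)
    for token in SENTINELS:
        pos = generated.find(token)
        if pos != -1:
            cut = min(cut, pos)
    return generated[:cut]


completion = extract_middle("a, b = 0, 1<|file_separator|>unrelated text")
```
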
## Ollama

Pull with `ollama pull codegemma:7b` or `ollama pull codegemma:2b`. Ollama wraps the FIM tokens for you when you call its completion API with a prefix (the prompt) and a suffix.

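As a rough sketch of such a call, the snippet below builds a request for Ollama's `/api/generate` endpoint with `prompt` and `suffix` fields, using only the standard library. The local URL, the default model tag, and the helper names are assumptions. Whether a particular tag's template actually supports `suffix` should be checked against the installed model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_fim_request(prefix: str, suffix: str, model: str = "codegemma:2b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming completion request with a suffix."""
    payload = {"model": model, "prompt": prefix, "suffix": suffix, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prefix: str, suffix: str, model: str = "codegemma:2b") -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_fim_request(prefix, suffix, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `complete("def fib(n):\n    ", "\n    return a")` would return the text for the gap between the two strings.
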
## When to choose it over base Gemma 4

- You need **IDE-grade FIM autocomplete**: CodeGemma was trained for it, base Gemma 4 was not.
- You want a **2B code model**: base Gemma 4 skips this size (E2B is multimodal, not code-specialized).
- You want **Ollama-native FIM** that tools like `continue.dev` can talk to.

Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for **agentic coding** (plan, write, execute) Gemma 4 or `qwen3-coder:30b` wins. CodeGemma fills the inline cursor-assistant niche.

## Homelab fit

Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B, both stronger than CodeGemma 7B. The only reason to pull CodeGemma is a tiny 2B FIM model for a latency-sensitive editor integration on a Pi or on pve197 alongside the vision stack.