Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks
Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00

# CodeGemma
Code completion and generation with native **fill-in-the-middle (FIM)** support. Built on **Gemma 1**, which is still the most recent CodeGemma generation as of April 2026: there has been no CodeGemma 2/3/4 release.
## What it is
Gemma 1 fine-tuned on code. Trained with an 80 to 90% FIM rate, split 50/50 between PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than chat.
## Sizes
- **2B pretrained** — fast completion
- **7B pretrained** — higher quality completion + FIM
- **7B instruction-tuned** — code chat
Versioned point releases exist (2B 1.1, 7B-IT 1.1).
## Model card
- https://ai.google.dev/gemma/docs/codegemma/model_card
- HF: https://huggingface.co/google/codegemma-7b
- Tech report: https://arxiv.org/abs/2406.11409
## FIM tokens
```
<|fim_prefix|> prefix-of-completion marker
<|fim_suffix|> cursor/insertion-point marker
<|fim_middle|> generation trigger
<|file_separator|> multi-file boundary
```
### PSM (Prefix-Suffix-Middle) template
```
<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>
```
Example:
```python
# The cursor sits on the blank indented line between the two statements.
prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)
```
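The model generates the middle chunk and then emits a FIM token, `<|file_separator|>`, or EOS. Treat any of these as a stop marker and discard whatever follows, since the pretrained checkpoints can keep generating past the intended completion.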
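The PSM template can be wrapped in a small helper so editor code never concatenates the markers by hand. This is a minimal sketch: `build_psm_prompt` and its parameter names are hypothetical, and only the three token strings come from the template above.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_psm_prompt(before_cursor: str, after_cursor: str) -> str:
    """Assemble a Prefix-Suffix-Middle prompt (hypothetical helper).

    before_cursor: code preceding the insertion point.
    after_cursor:  code following the insertion point.
    """
    return f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"


prompt = build_psm_prompt(
    "import datetime\n"
    "def calculate_age(birth_year):\n"
    "    current_year = datetime.date.today().year\n"
    "    ",
    "\n    return age",
)
print(prompt)
```

Keeping the trailing whitespace of `before_cursor` intact matters: the model continues from exactly where the prefix ends, so stripping indentation would change what it is asked to complete.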
### Multi-file context
Prepend referenced files separated by `<|file_separator|>`, then the target file in FIM format.
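One way to read that rule is sketched below. The helper name `build_multifile_prompt` is hypothetical, and the exact layout (a separator after each context file, no file paths, no leading separator) is an assumption; check the model card before relying on it.

```python
FILE_SEPARATOR = "<|file_separator|>"


def build_multifile_prompt(
    context_files: list[str], before_cursor: str, after_cursor: str
) -> str:
    """Prepend context files, then the target file in PSM format.

    Assumption: each context file is followed by one separator token.
    """
    context = "".join(f"{content}{FILE_SEPARATOR}" for content in context_files)
    target = f"<|fim_prefix|>{before_cursor}<|fim_suffix|>{after_cursor}<|fim_middle|>"
    return context + target


prompt = build_multifile_prompt(
    ["def helper(x):\n    return x * 2\n"],
    "from utils import helper\n\nresult = ",
    "\nprint(result)\n",
)
print(prompt)
```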
## Minimum invocation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# PSM prompt: code before the cursor, code after it, then the generation trigger.
prompt = (
    "<|fim_prefix|>def fib(n):\n"
    "    if n <= 1:\n"
    "        return n\n"
    "    <|fim_suffix|>\n"
    "    return a<|fim_middle|>"
)
# Send inputs to wherever device_map="auto" placed the model, not a hard-coded "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the new tokens; special tokens are kept so terminators stay visible.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
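Because the decode call above keeps special tokens, the raw output can contain a terminator followed by unrelated text. A minimal post-processing sketch, assuming the four FIM/file tokens listed earlier plus Gemma's `<eos>` string are the stop markers (`truncate_completion` is a hypothetical helper, not part of transformers):

```python
TERMINATORS = (
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<|file_separator|>",
    "<eos>",  # assumed EOS string for the Gemma tokenizer
)


def truncate_completion(text: str) -> str:
    """Cut the decoded completion at the first terminator, if any."""
    cut = len(text)
    for token in TERMINATORS:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b<|file_separator|>unrelated text"
print(truncate_completion(raw))
```

The same list could instead be passed to generation as stop token IDs so the model halts early; trimming after the fact is simply the smaller change.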
## Ollama
Pull with `ollama pull codegemma:7b` or `ollama pull codegemma:2b`. Ollama wraps the FIM tokens for you when you call its completion API with a prompt (the prefix) and a suffix.
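A hedged sketch of such a call against Ollama's `/api/generate` endpoint, using only the standard library. The helper names are hypothetical, the URL is Ollama's default local address, and whether a given `codegemma` tag's template accepts `suffix` is an assumption to verify on your install (Ollama returns an error when a model does not support insertion).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_generate_payload(
    model: str, before_cursor: str, after_cursor: str, max_tokens: int = 128
) -> dict:
    """Build a non-streaming generate request with prefix and suffix."""
    return {
        "model": model,
        "prompt": before_cursor,   # code before the cursor
        "suffix": after_cursor,    # code after the cursor
        "stream": False,
        "options": {"num_predict": max_tokens},
    }


def complete(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the generated middle chunk."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]


payload = build_generate_payload("codegemma:2b", "def fib(n):\n    ", "\n    return a")
print(json.dumps(payload, indent=2))
# print(complete(payload))  # uncomment with an Ollama server running locally
```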
## When to choose it over base Gemma 4
- You need **IDE-grade FIM autocomplete** — CodeGemma was trained for it, base Gemma 4 was not.
- You want a **2B code model** — base Gemma 4 skips this size (E2B is multimodal, not code-specialized).
- You want **Ollama-native FIM** that tools like `continue.dev` can talk to.
Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for **agentic coding** (plan, write, execute) Gemma 4 or `qwen3-coder:30b` wins. CodeGemma is the inline-cursor-assistant niche.
## Homelab fit
Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B, both of which are stronger than CodeGemma 7B. The only reason to pull CodeGemma is if you want a tiny 2B FIM model for a latency-sensitive editor integration on a Pi or on pve197 alongside the vision stack.