docs: merge tooling findings into SYNTHESIS/GOTCHAS/CORPUS_* and add handoff

Patches the top-level corpus docs with the 13 findings flagged during the 2026-04-18 canonical tooling research pass. tooling/README.md now marks each finding [merged: <file>] or [flagged] for provenance. - CORPUS_ollama_variants.md: annotate gemma4:26b as MoE (25.2B total / 3.8B active, 8-of-128 experts + 1 shared). Note Q4_K_M inference is standard (the "MoE quality degrades at 4-bit" caveat is training-only). Add note that audio on E-series is NOT available via Ollama — llama.cpp mmproj or vLLM only. - CORPUS_capabilities.md: native system role, configurable thinking mode, first trained tool use (vs Gemma 1/2/3 proof-of-concept), native object detection with bbox output in 1000x1000 coords, pointer to EmbeddingGemma for retrieval (Gemma 4 has no embedding mode). - CORPUS_tool_calling_format.md: add Chat Template Context section documenting the <|turn>/<turn|> asymmetric brackets (new in Gemma 4, replaced <start_of_turn>/<end_of_turn>) plus <|think>, <|channel>, <|image>, <|audio> tokens. Add HF transformers Alternative section showing processor.parse_response with response_schema. - GOTCHAS.md: add MEDIUM gotcha for abandoned google/gemma_pytorch (no Gemma 4 support since 2025-05-30). Expand fine-tuning section with FA2/FA4 head_dim=512 break, fused LoRA kernel issues, 26B A4B training-quant guidance, new tool-call tokens as learned embeddings. - SYNTHESIS.md: add banner pointing to tooling/ for canonical upstream material. Add embeddinggemma row to Model Selection table. Also: - Add .gitignore excluding .backup/ (local scratch per global CLAUDE.md convention, not needed in tracked history) and __pycache__/. - Add .claude/handoffs/2026-04-18-canonical-tooling-research.md so future sessions can pick up cold — facts verified, open threads, what changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:48:26 -04:00
parent eecebe7ef5
commit 5775978899
8 changed files with 197 additions and 20 deletions
@@ -0,0 +1,68 @@
+# Handoff — 2026-04-18: Canonical Tooling Research
+
+## TL;DR for the next session
+
+A parallel research pass pulled 147 files / 14 MB of first-party Gemma 4 tooling into `tooling/`, and the 13 findings that contradicted or extended the existing corpus were merged into the top-level `SYNTHESIS.md` / `GOTCHAS.md` / `CORPUS_*.md` docs. The repo is in a clean, coherent state.
+
+**If you're opening this repo for Gemma 4 implementation work, `SYNTHESIS.md` is still the right first read.** The new `tooling/README.md` is the receipts layer — read it when you need authoritative source material (model cards, chat templates, serving commands, sibling-model briefs).
+
+## What shipped
+
+**Commit `eecebe7` (master, pushed to `git.sethpc.xyz/Seth/gemma4-research`):** added `tooling/` with five subdirs — `google-official/`, `huggingface/`, `inference-frameworks/`, `gemma-family/`, `fine-tuning/`. Each subdir has its own indexing README.
+
+**Follow-up commit (same session):** patched top-level corpus docs with the 9 findings worth merging. The `tooling/README.md` "Findings" list now marks each one `[merged: <file>]` or `[flagged]` for provenance.
+
+## Key confirmed facts
+
+| Claim | Verified against |
+|-------|-----------------|
+| `gemma4:26b` is a MoE (25.2B total, 3.8B active, 8 of 128 experts + 1 shared) | HF model card at `tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md` |
+| Q4_K_M inference on the MoE is fine (standard practice) | Mixtral/DeepSeek precedent; card neutral on inference quant |
+| Gemma 4 changed turn tokens from `<start_of_turn>` to `<|turn>`/`<turn|>` | `tooling/huggingface/model-cards/gemma-4-*-chat_template.jinja` |
+| Tool use is **trained** in Gemma 4, not a proof-of-concept as in Gemma 1/2/3 | DeepMind tool-use colab at `tooling/google-official/deepmind-gemma/colab_tool_use.ipynb` |
+| `google/gemma_pytorch` is abandoned for Gemma 4 | Last push 2025-05-30, variants validator |
+| No Gemma 4 technical report PDF as of 2026-04-18 | DeepMind repo README + direct URL probes |
+| No specialized siblings on Gemma 4 base yet (ShieldGemma 2, CodeGemma, PaliGemma 2, EmbeddingGemma all still on Gemma 2/3) | Per-sibling model cards in `tooling/gemma-family/` |
+
+## Open threads — flagged but not implemented
+
+These came out of the mort-bot impact review later in the session. All three are high-value but out of scope for this research pass:
+
+1. **EmbeddingGemma (308M) as a drop-in upgrade for mort-bot's `chat_search` / `memory_read` tools.** Mort currently uses FTS5 keyword-only — misses semantic matches. EmbeddingGemma's Matryoshka sizes (768/512/256/128) + 100+ languages make it a clean fit. Integration sketch in the session conversation; full research at `tooling/gemma-family/embeddinggemma.md`. Starter notebook at `tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb`. **Next steps:** (a) `ollama pull embeddinggemma` on steel141, (b) A/B against existing `nomic-embed-text` on actual mort chat logs before committing to backfill.
+
+2. **ShieldGemma 2 (4B) as a `generate_image` pre-filter for mort-bot.** Mort's SDXL tool has no safety gate. ShieldGemma 2 is Gemma-3-based but scoped exactly to image safety. Would run on steel141 alongside `gemma4:26b` (3090 has headroom).
+
+3. **Native object detection for mort's `vision_describe`.** Gemma 4 does grounded bbox output natively — "Detect the X" → `{box_2d: [ymin, xmin, ymax, xmax]}` in 1000×1000 coords. Mort currently only does free-form vision description.
+
+None of these were implemented in this session.
+
+## Files changed this session
+
+- **New:** `tooling/` (147 files), `tooling/README.md`, `.claude/handoffs/2026-04-18-canonical-tooling-research.md` (this file)
+- **Edited:** `README.md` (added `tooling/` row), `SYNTHESIS.md` (banner + model-selection table), `GOTCHAS.md` (added gemma_pytorch abandonment + expanded fine-tuning section), `CORPUS_tool_calling_format.md` (added Chat Template Context + HF transformers Alternative), `CORPUS_ollama_variants.md` (annotated 26b as MoE + audio note), `CORPUS_capabilities.md` (native system role, thinking, object detection, embedding pointer)
+- **Unchanged:** `IMPLEMENTATIONS.md` (Simon/AI_Visualizer specific, not affected), `CORPUS_architecture.md` (already had MoE details right), `CORPUS_benchmarks.md` (still current)
+
+## What future sessions should know
+
+- **The research is the receipts, not the source of truth.** The top-level `SYNTHESIS.md` / `GOTCHAS.md` / `CORPUS_*.md` docs are the working reference. `tooling/` backs them up with downloaded upstream material when you need provenance or a working script.
+- **Don't re-research the same ground.** Every `tooling/*/README.md` lists what's there and the source URL. Grep the tooling corpus before spawning new web searches.
+- **The 26B-is-MoE and Q4_K_M-is-fine facts were the main things that would have been re-litigated without this handoff.** If you see a claim that conflicts with those, check the HF model card first (`tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md`) — Google's own documentation, not secondhand.
+- **Sibling-model generation lag.** When reaching for ShieldGemma / CodeGemma / PaliGemma / EmbeddingGemma, don't assume a Gemma-4 base — they're still on 2 or 3. Use them anyway; just don't confuse generations.
+- **Mort-bot is where the low-hanging fruit is** if Seth wants a next practical project. Three items above; EmbeddingGemma is the biggest lever.
+
+## Session narrative (for context, not action)
+
+1. Started with the existing corpus (SYNTHESIS + GOTCHAS + 5 CORPUS files, ~22KB total). Goal: add canonical upstream tooling.
+2. Dispatched five parallel `general-purpose` agents covering Google official, HF, inference frameworks, Gemma family, fine-tuning.
+3. All five returned clean — 147 files downloaded, each indexed per subdir.
+4. Wrote `tooling/README.md` with 10 findings from the agents. Initial plan: flag only, don't touch the older corpus.
+5. Seth asked how the findings affect mort-bot. Read mort's CLAUDE.md / DECISIONS.md / llm.py / config.py / tools.py. Ranked: EmbeddingGemma (high), ShieldGemma 2 (high), bbox detection (high), E-series audio (medium), everything else (low/none because Ollama hides transformers changes).
+6. Seth ran `ollama show gemma4:26b`; output confirmed MoE (25.8B, Q4_K_M). Walked back the earlier "worth A/B testing" extrapolation — that was training guidance misapplied to inference. Q4_K_M on the MoE is fine.
+7. Seth asked "did you update synthesis?" — no, I hadn't. He authorized the updates. Patched 5 top-level docs; updated `tooling/README.md` findings list to mark merged-vs-flagged.
+8. Wrote this handoff.
+
+## Don't do these things next session
+
+- Don't commit the ipynb files with `--no-verify` unless you ask again — the secrets-hook false positives (base64 notebook outputs, example Ed25519 keys) are documented, but re-bypassing without asking would be scope creep. If you add more ipynb content, strip outputs with `jupyter nbconvert --ClearOutputPreprocessor.enabled=True` first.
+- Don't restructure the folder. It's organized fine: `README.md` → `SYNTHESIS.md` (primary) → specialized `CORPUS_*.md` / `GOTCHAS.md` / `IMPLEMENTATIONS.md` → `tooling/` (receipts). New material goes into one of those buckets, not a new top-level thing.
+- Don't assume the Gemma 3 technical report covers Gemma 4. It's the closest thing we have but it predates Gemma 4.