feat: CLI coding agent bakeoff — 26b reproducibly silent-stops at write_file

Ran minimal agent loop (Ollama /api/chat + read_file/write_file/run_bash) on
steel141 3090 Ti against 3 models on a broken-median-function task:

- gemma4:31b-it-q4_K_M: PASS (8 iters, 1 write, 44s) — textbook trace
- qwen3-coder:30b: PASS (15 iters, 1 write, 22s) — correct but chatty
- gemma4:26b: FAIL (6 iters, 0 writes) — silently stops with eval=4
  after reading source. Reproduced on second run. One-shot probe
  confirms 26b CAN produce the correct fix — failure is specifically
  at the write_file tool-call argument boundary.

Updates GOTCHAS with a new HIGH-severity entry, SYNTHESIS model-selection
table, CORPUS_cli_coding_agent.md empirical-follow-up pointer, and adds
docs/reference/bakeoff-2026-04-18.md with the full writeup.
This commit is contained in:
Mortdecai
2026-04-18 13:27:50 -04:00
parent 4b9c537dda
commit a945207aab
15 changed files with 1172 additions and 1 deletions
+1 -1
View File
@@ -176,7 +176,7 @@ Vision is on ALL Gemma 4 variants (E2B, E4B, 26B, 31B). Audio is E-series only.
| Maximum quality (single-model GPU) | `gemma4:31b-it-q4_K_M` | Dense 31B, sharpest but 5x slower, more VRAM pressure |
| Rapid prototyping / testing | `gemma4:26b` | Fast enough for interactive dev |
| Retrieval / embeddings | `embeddinggemma` (308M, separate model) | Gemma 4 has no embedding mode; use the sibling |
| CLI coding agent (openclaw / open code / pi / hermes / aider) | `gemma4:26b` (or compare to `qwen3-coder:30b`) | Trained tool use + strong LiveCodeBench, but Google didn't publish SWE-bench — see `CORPUS_cli_coding_agent.md` for the honest positioning and the homelab bakeoff plan |
| CLI coding agent (openclaw / open code / pi / hermes / aider) | `gemma4:31b-it-q4_K_M` (verified), fallback `qwen3-coder:30b` | 2026-04-18 bakeoff on 3090 Ti: 31B passes cleanly (8 iters, 1 write), Qwen3-Coder passes verbosely (15 iters), **26B reproducibly silent-stops at the write_file tool call** — see `CORPUS_cli_coding_agent.md` and `docs/reference/bakeoff-2026-04-18.md` |
## Anti-Patterns