feat: CLI coding agent bakeoff — 26b reproducibly silent-stops at write_file
Ran minimal agent loop (Ollama /api/chat + read_file/write_file/run_bash) on steel141 3090 Ti against 3 models on a broken-median-function task:

- gemma4:31b-it-q4_K_M: PASS (8 iters, 1 write, 44s), textbook trace
- qwen3-coder:30b: PASS (15 iters, 1 write, 22s), correct but chatty
- gemma4:26b: FAIL (6 iters, 0 writes), silently stops with eval=4 after reading source. Reproduced on second run.

One-shot probe confirms 26b CAN produce the correct fix; the failure is specifically at the write_file tool-call argument boundary.

Updates GOTCHAS with a new HIGH-severity entry, the SYNTHESIS model-selection table, and the CORPUS_cli_coding_agent.md empirical-follow-up pointer, and adds docs/reference/bakeoff-2026-04-18.md with the full writeup.
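As a rough illustration of the tool layer such a loop dispatches to, here is a minimal sketch. The harness code is not part of this view, so everything here is an assumption made for illustration: the `WORKDIR` sandbox root, the `_resolve` and `dispatch` helpers, the function bodies, and the shape of the `call` dict (`{"name": ..., "arguments": {...}}`). The argument check is there because the failure above sits at the write_file argument boundary, so a malformed call should produce a visible error rather than a silent stop.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical sandbox root; the real harness's working directory is not shown.
WORKDIR = Path(tempfile.mkdtemp()).resolve()


def _resolve(path: str) -> Path:
    """Resolve a relative path inside WORKDIR, rejecting anything that escapes it."""
    target = (WORKDIR / path).resolve()
    if not target.is_relative_to(WORKDIR):  # Python 3.9+
        raise ValueError(f"path escapes working directory: {path}")
    return target


def read_file(path: str) -> str:
    return _resolve(path).read_text()


def write_file(path: str, content: str) -> str:
    target = _resolve(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {len(content)} chars to {path}"


def run_bash(command: str) -> str:
    proc = subprocess.run(
        ["bash", "-c", command],
        cwd=WORKDIR, capture_output=True, text=True, timeout=60,
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


# Tool name -> (implementation, required argument names).
TOOLS = {
    "read_file": (read_file, ("path",)),
    "write_file": (write_file, ("path", "content")),
    "run_bash": (run_bash, ("command",)),
}


def dispatch(call: dict) -> str:
    """Run one tool call and always return a string the model can read."""
    name = call.get("name")
    args = call.get("arguments") or {}
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    if not isinstance(args, dict):
        return f"error: {name} arguments must be an object"
    func, required = TOOLS[name]
    missing = [key for key in required if key not in args]
    if missing:
        return f"error: {name} missing arguments: {', '.join(missing)}"
    try:
        return func(**{key: args[key] for key in required})
    except Exception as exc:  # surface failures to the model instead of crashing
        return f"error: {exc}"
```

Returning an error string for every malformed call gives the loop something to log and feed back to the model, which makes a stop with zero writes easier to tell apart from a call that was attempted but rejected.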
@@ -0,0 +1,14 @@
# Bakeoff Task

A tiny Python package (`calc/`) with a statistics module. Run `pytest` from this
directory — two tests currently fail because `median` returns the upper-middle
element instead of averaging the two middle elements on even-length inputs.

Your job: make all tests pass. Do not disable or modify the tests.

Allowed tools:
- `read_file(path)` — read a file (relative to this directory)
- `write_file(path, content)` — overwrite a file (relative to this directory)
- `run_bash(command)` — run a shell command (cwd is this directory)

When all tests pass, reply with a short summary of the fix and stop calling tools.
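
To make the bug concrete, the sketch below contrasts the behavior the task describes with one possible fix. The `calc/` source is not shown in this diff, so the names `median_buggy` and `median_fixed` and both function bodies are hypothetical reconstructions, not the package's actual code.

```python
def median_buggy(values):
    """Hypothetical reconstruction of the bug: always returns the upper-middle element."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_fixed(values):
    """One possible fix: average the two middle elements on even-length inputs."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of empty sequence")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]  # odd length: the single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2  # even length: mean of the two
```

For `[1, 2, 3, 4]`, `median_buggy` returns `3` while `median_fixed` returns `2.5`. On odd-length inputs the two agree, which is consistent with only the even-length tests failing.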