a945207aab
Ran minimal agent loop (Ollama /api/chat + read_file/write_file/run_bash) on steel141 3090 Ti against 3 models on a broken-median-function task:

- gemma4:31b-it-q4_K_M: PASS (8 iters, 1 write, 44s), textbook trace
- qwen3-coder:30b: PASS (15 iters, 1 write, 22s), correct but chatty
- gemma4:26b: FAIL (6 iters, 0 writes), silently stops with eval=4 after reading source. Reproduced on second run.

One-shot probe confirms 26b CAN produce the correct fix; the failure is specifically at the write_file tool-call argument boundary.

Updates GOTCHAS with a new HIGH-severity entry, SYNTHESIS model-selection table, CORPUS_cli_coding_agent.md empirical-follow-up pointer, and adds docs/reference/bakeoff-2026-04-18.md with the full writeup.
16 lines
361 B
Plaintext
# Local scratch / backups: per ~/.claude/CLAUDE.md, Claude keeps backups before
# editing any file. Useful locally; not useful in the tracked history.
.backup/

# Python
__pycache__/
*.pyc
.pytest_cache/

# Bakeoff work directories: recreated from task_seed/ each run; logs preserved separately
scripts/bakeoff/runs/*/work/

# Editor / OS
.DS_Store
*.swp