Mortdecai/eval/results
Seth 6fbab8045c Add bake-off results summary (7 models, 31 examples)
gemma3n:e4b wins for production serving (80.6% cmd match, 100% safety).
qwen3:8b recommended as fine-tuning base. Full per-model analysis and
scoring methodology documented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 09:03:40 -04:00