Qwen3.5-9B bake-off results, model named Mortdecai

Bake-off: qwen3.5:9b base model, 147 cases:
- 70.1% command match (2x qwen3:8b baseline)
- 15.6% needed syntax fixes
- 29.9% miss (mostly God/prayer; no persona training)
- Avg 7.5s, median 5.7s (thinking tokens)

Model officially named Mortdecai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:46:00 -04:00
parent 9abf9238c5
commit 910d7b4ca7
1 changed file with 17 additions and 0 deletions
@@ -0,0 +1,17 @@
 {
  "timestamp": 1773963920,
  "ollama_url": "http://192.168.0.141:11434",
  "note": "Partial run \u2014 147 of 1542 cases before manual kill",
  "summary": [
    {
      "model": "qwen3.5:9b",
      "n": 147,
      "cmd_match_%": 70.1,
      "syntax_fixes_%": 15.6,
      "miss_%": 29.9,
      "avg_latency_ms": 7457,
      "median_latency_ms": 5660,
      "note": "Base model, no fine-tuning. 2x better than qwen3:8b base (70% vs 34%)"
    }
  ]
 }
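
The percentages in the summary can be cross-checked against the case count. The following is a minimal sketch, not part of the commit: it copies the numeric fields from the JSON above, takes the 1542 total from the top-level note and the 34% qwen3:8b figure from the summary note, and uses hypothetical helper names (`implied_counts`, `SUMMARY_JSON`). It assumes each percentage is a share of the same `n` cases.

```python
import json

# Numeric fields copied from the results file above (assumption: every
# percentage is expressed relative to the same n = 147 cases).
SUMMARY_JSON = """
{
  "model": "qwen3.5:9b",
  "n": 147,
  "cmd_match_%": 70.1,
  "syntax_fixes_%": 15.6,
  "miss_%": 29.9
}
"""

TOTAL_CASES = 1542          # from the top-level note: "147 of 1542 cases"
BASELINE_MATCH_PCT = 34.0   # qwen3:8b figure quoted in the summary note


def implied_counts(entry):
    """Convert each percentage back into a whole number of cases."""
    n = entry["n"]
    keys = ("cmd_match_%", "syntax_fixes_%", "miss_%")
    return {key: round(entry[key] * n / 100) for key in keys}


entry = json.loads(SUMMARY_JSON)
counts = implied_counts(entry)

# Share of the full suite that ran before the manual kill.
coverage_pct = 100 * entry["n"] / TOTAL_CASES

# Ratio behind the "2x" claim in the commit message.
ratio_vs_baseline = entry["cmd_match_%"] / BASELINE_MATCH_PCT

print(counts)
print(f"coverage: {coverage_pct:.1f}% of the suite")
print(f"match rate vs qwen3:8b: {ratio_vs_baseline:.2f}x")
```

Under that assumption, matches and misses partition the 147 cases, and the syntax-fix figure overlaps with one of them rather than forming a third bucket. The run covers under a tenth of the suite, so the rates are provisional.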