Root cause: 90% of system prompts exceed max_seq_len (2048 tokens)
by 2.5x, so model trained on truncated fragments with no user/assistant
content. Plus mixed paradigm (55% tool_call / 45% JSON), 6 JSON schema
variants, contaminated examples, and /no_think misuse.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full analysis of mortdecai:0.6.0-9b and mortdecai:latest (27B) fine-tunes
vs 6 base model candidates. Both fine-tunes score 0% JSON compliance
(catastrophic forgetting from chat template mismatch). Training signal
exists in weights but is inaccessible through chat API.
Base model rankings: phi4:14b (100%, 7.4s) > gemma3:12b (100%, 12.9s) >
gemma3:27b (100%, 25.3s). Qwen3.5 not recommended for conductor role.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>