gemma3n:e4b wins for production serving (80.6% cmd match, 100% safety).
qwen3:8b recommended as fine-tuning base. Full per-model analysis and
scoring methodology documented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>