gemma3n:e4b wins for production serving (80.6% cmd match, 100% safety).
qwen3:8b recommended as fine-tuning base. Full per-model analysis and
scoring methodology documented.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bake-off tested 7 models on 31 seed examples via GPU-accelerated Ollama
on node-197 RTX 4000. gemma3n:e4b leads for serving (80.6% cmd match,
100% safety, 5.9s). qwen3:8b recommended as fine-tuning base (Apache 2.0,
best syntax quality, strong ecosystem). Full research in MODEL_RESEARCH.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- knowledge/mc-commands/commands.json: 14 MC commands with JE syntax, args, examples, common errors, 1.21 version notes
- knowledge/server-context/servers.json: all 4 servers (mc1, shrink, paper-ai, paper-dev) with full config
- knowledge/build_index.py: TF-IDF indexer + search function (19 docs, 725 terms)
- All command syntax validated live on dev server via RCON (12/13 passed)
- PLAN.md: mark Phase 1.3 complete
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)