# Model Card: Mortdecai

![Training Progress](branding/training_progress.svg)

## Model Details

| Field | Value |
|-------|-------|
| **Name** | Mortdecai |
| **Version** | 0.5.0 |
| **Base Model** | Qwen3.5-9B (Apache 2.0) |
| **Adaptation** | QLoRA (4-bit base + LoRA adapters in FP16) |
| **Parameters** | 9.4B total, 29M trainable (0.31%) |
| **Training Hardware** | RTX 3090 Ti (24GB VRAM) |
| **Inference Hardware** | RTX 4000 (16GB), RTX 2080 Ti (11GB), GTX 1660 Super (6GB), or any GPU with 6GB+ VRAM |
| **Quantization** | Q4_K_M (5.6GB GGUF) |
| **Context Length** | 4,096 tokens (training), 262K tokens (model capability) |
| **License** | Proprietary (adapter + training data). Base model: Apache 2.0 |

## Intended Use

Mortdecai is designed for **Minecraft Java Edition 1.21.x server operations**:

- Translating natural language into valid Minecraft commands
- Controlling an AI God character that responds to player prayers
- Server administration via chat (gamerules, effects, world editing)
- Error correction (self-corrects failed RCON commands)

**Not intended for:**

- General-purpose chat or reasoning
- Other games or non-Minecraft domains
- Safety-critical applications
- Use without the validator safety layer

## Training Data

| Source | Count | Description |
|--------|-------|-------------|
| Hand-curated seed examples | 3,196 | Command syntax, recipes, enchantments, entities, effects, memory, events |
| Tool-calling sequences | 1,430 | Multi-turn RCON execution with 17 tools (script, memory, wiki, plugins) |
| IGLU build dataset | 4,656 | Natural language → block placement commands from Microsoft Research |
| Plugin training (RCON-validated) | 104 | WorldGuard, CoreProtect, EssentialsX, LuckPerms, FAWE |
| Exploration self-play | 150 | Wiki-grounded knowledge discovery with RCON validation |
| Self-play (0.4.0 + 0.5.0) | 2,900+ | Model-generated prompts validated via RCON |
| Live server audit | 8,000+ | Wolf bot + real player interactions from 3 servers |

**Total: 20,000+ examples across all sources**

### Tool Architecture (17 tools)

| Category | Tools |
|----------|-------|
| Execution | `rcon.execute` |
| Knowledge | `minecraft.wiki_lookup`, `plugin.docs_lookup`, `minecraft.changelog_lookup`, `paper.docs_lookup` |
| World Sensing | `world.player_info`, `world.server_state`, `world.nearby_entities` |
| Memory | `memory.read`, `memory.write` |
| Scripts | `script.write`, `script.validate`, `script.execute`, `script.read`, `script.list`, `script.delete`, `script.schedule` |

### Data Collection Methods

1. **Manual curation:** Minecraft Wiki, command reference, recipe databases
2. **Live server logs:** real player interactions on Paper 1.21.x servers
3. **Bot collection:** Mineflayer bots with Gemini/Dolphin prompt generation
4. **API distillation:** Claude Haiku and Gemini Flash responses
5. **Self-play:** the model generates edge cases, attempts them via RCON, and learns from the results
6. **RCON validation:** every command is tested against a live Minecraft server

### Known Biases

- Training data is skewed toward English (~97%), with limited multilingual coverage (~3%)
- Command distribution favors `give` and `effect` over complex `execute` chains
- God persona training reflects a specific dramatic character and is not neutral
- Player interaction data comes from a small group of testers (fewer than 10 players)
- Self-play data may overrepresent patterns the model is already good at

## Evaluation

### Bake-off Results (0.5.0 vs 0.4.0, 38 prompts × 12 categories)

| Metric | 0.4.0 | 0.5.0 |
|--------|-------|-------|
| Overall success rate | 45.2% | 46.8% |
| Avg response time | 2.60s | 2.11s |
| Errors (crashes) | 2 | 0 |
| Empty responses | 0 | 0 |

**Per-category results (0.5.0 vs 0.4.0):**

| Category | 0.4.0 | 0.5.0 | Change (percentage points) |
|----------|-------|-------|----------------------------|
| Enchantments | 20% | 67% | **+47** |
| EssentialsX | 0% | 60% | **+60** |
| Effects | 0% | 25% | **+25** |
| Basic commands | 75% | 75% | 0 |
| Teleport | 100% | 100% | 0 |
| Error recovery | 50% | 50% | 0 |

### Safety

The model uses a 5-level risk hierarchy:

- **Level 0 (never):** ban, kick, stop, op (hard-blocked in the validator)
- **Level 1 (refuse):** permanent server state changes
- **Level 2 (warn):** temporary/reversible changes, destructive actions
- **Level 3 (normal):** standard gameplay commands
- **Level 4 (generous):** full enchanted gear, large material stacks

Additional safety layers:

- The validator blocks dangerous commands even if the model generates them
- Dangerous effect duration caps (levitation 15s, wither 30s)
- Fall protection (detects lethal teleports)
- Gamerule auto-revert timers

### Limitations

- Cannot determine what a player is looking at (no raycast)
- Limited awareness of world state beyond player position
- Enchantment syntax errors still occur (~15% of enchantment commands need validator fixes)
- Empty responses on ~5% of requests
- Emits reasoning in `<think>` blocks that must be stripped from the output (Qwen3 behavior)
- God persona can be unpredictable by design

## Environmental Impact

- **Training energy:** ~84W × 4 hours ≈ 0.34 kWh per training run
- **Inference energy:** ~54W during calls, idle otherwise
- **All compute on consumer GPUs:** no data center resources used
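The energy figures above are simple power × time products. The sketch below reproduces that arithmetic, assuming a constant average power draw for the whole duration. The per-call estimate is an extrapolation, not a measurement. It combines the ~54W inference figure with the 0.5.0 average response time from the bake-off (2.11s). The function names are hypothetical helpers for illustration only.

```python
"""Back-of-envelope energy estimates using the figures in this model card.

Assumption: power draw is constant at the stated average for the whole
duration. Real GPU draw fluctuates, so treat results as rough estimates.
"""


def energy_kwh(avg_watts: float, hours: float) -> float:
    """Energy in kWh for a constant average power draw over `hours`."""
    return avg_watts * hours / 1000.0


def energy_per_call_wh(avg_watts: float, seconds: float) -> float:
    """Energy in Wh for a single inference call lasting `seconds`."""
    return avg_watts * seconds / 3600.0


# Training: ~84 W for ~4 hours per run (figures from this card).
training_kwh = energy_kwh(84, 4)

# Inference: ~54 W during calls; 2.11 s is the 0.5.0 average response time
# from the bake-off. Assumes the GPU draws 54 W for the entire call.
per_call_wh = energy_per_call_wh(54, 2.11)

if __name__ == "__main__":
    print(f"Training run: {training_kwh:.2f} kWh")
    print(f"Per inference call: {per_call_wh:.3f} Wh")
    # How many inference calls use the same energy as one training run.
    print(f"Calls per training run: {training_kwh * 1000 / per_call_wh:,.0f}")
```

Under these assumptions, one training run (about 0.336 kWh) uses roughly as much energy as ten thousand inference calls.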