# PLAN.md — Mortdecai Project Roadmap > **Last updated:** 2026-03-20 > **Model name:** Mortdecai > **Domain:** mortdec.ai > **Status legend:** `[ ]` planned | `[~]` in progress | `[x]` done | `[-]` cancelled/deferred --- ## Vision **Mortdecai** is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play. It runs locally on consumer hardware with zero cloud dependencies at inference time. --- ## Current State (2026-03-20) ### Models | Model | Base | Examples | Loss | Status | |-------|------|---------|------|--------| | v1 | Qwen3-8B | 233 | 0.10 | Retired (overfit) | | v2 | Qwen3-8B | 361 | 2.03 | Retired | | v3 | Qwen3-8B | 1,308 | 0.55 | **Deployed on prod** | | **v4** | **Qwen3.5-9B** | **3,369** | **Training (~88%)** | ETA ~30 min | ### Infrastructure | Component | Location | Details | |-----------|----------|---------| | Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth | | Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai-v3 | | Minecraft servers | CT 644 on node-112 | paper-ai (25567), shrink-world (25566), dev (25568), vanilla (25565) | | Dev data collection | CT 644 | Gemini 2.5 Flash via API, 10 bots | | Whitelist app | CT 644 port 8099 | minecraft.mortdec.ai | | Caddy proxy | CT 600 on node-241 | mortdec.ai, minecraft.mortdec.ai | | GPU monitoring | Grafana on CT 300 (node-173) | Prometheus + nvidia exporter on steel141 | | LangGraph gateway | CT 644 port 8091 | Disabled on prod (fresh session mode available) | ### API Spend | Provider | Spent | Budget | Status | |----------|-------|--------|--------| | Claude Haiku | $20.01 | $20 | Exhausted | | Gemini 2.5 Flash | ~$0.50 | $20 | Active (dev bots) | ### Training Data: 2,318 seed + 1,159 tool-calling = 3,477 total | Category | Count | |----------|-------| | Command syntax reference | 107 | | Crafting recipes & chains | 176 | | Enchantments (mutual exclusions, max levels) | 60 | | Entities/mobs (summon, kill, NBT) | 60 | | Execute chains | 45 | | Multiplayer (selectors, teams, scoreboards) | 45 | | Advanced commands (tellraw, clone, data, ride) | 45 | | WorldEdit | 45 | | Paper server features | 55 | | Cosmetics/XP/effects | 42 | | Gamerules | 49 | | Risk hierarchy (L0-L3, prompt injection) | 40 | | Quantity boundaries | 32 | | Dangerous effect caps | 12 | | Revert-aware gamerules + drops | 20 | | Error correction pairs | 47 | | Claude-distilled outputs | 344 | | Bot audit interactions | 448+ | | Boundary/safety examples | 95+ | | Tool-calling (multi-turn with RCON) | 1,159 | --- ## Active TODOs ### Immediate (this session) - [~] Mortdecai v4 training completing (~30 min remaining) - [ ] Export v4 to GGUF, deploy to RTX 4000 as mortdecai-v4 - [ ] Enable single-call mode on prod with v4 - [ ] Run v4 bake-off and compare to v3/base - [ ] Commit and push all PaperFork changes ### Short-term - [ ] Deploy self-play loop with v4 (3-tier: drills, self-critique, adversarial) - [ ] Add ground-level detection for teleport safety (query terrain before tp) - [ ] Build revert_after/revert_commands into v5 training format - [ ] Add Gemini milestone POS printing - [ ] Fix whitelist app UUID lookup for vanilla server path - [ ] Start Greenfield world on paper-ai (downloaded, needs MCSManager start) ### Model improvements for v5 - [ ] Train on Qwen3.5-9B with tool-calling format (rcon.execute, wiki_lookup, etc.) - [ ] Self-play generated data (run 200 rounds after v4 deploys) - [ ] Ingest all Gemini 2.5 Flash training data ($20 worth) - [ ] Add revert_after field to training output format - [ ] Ground-level detection training (check terrain before tp) - [ ] More error correction from production RCON failures - [ ] Enchantment count-before-bracket error correction ### Infrastructure - [ ] Add GPU monitoring for RTX 4000 (second exporter) - [ ] Validator hit-rate analysis — remove fixes that fire <1% - [ ] Automate training pipeline: ingest → dedup → train → export → deploy - [ ] POS receipt for Gemini milestones - [ ] Consider moving to Mortdecai as Ollama model name on prod ### Content & Community - [ ] Invite more playtesters via minecraft.mortdec.ai - [ ] Update mortdec.ai README with v4 results when available - [ ] Consider public HuggingFace release once quality is validated - [ ] WorldEdit schematic library expansion (77 installed, need more) --- ## Risk Hierarchy Commands are classified by permanence, not just danger: | Level | Permanence | Examples | Model behavior | |:-----:|-----------|----------|----------------| | **0** | Irreversible/admin | ban, kick, stop, op, deop | Never execute | | **1** | Permanent toggle | gamemode @a, permanent gamerules, difficulty | Refuse or execute for self only | | **2** | Temporary/reversible | gamerules with time limits, brief difficulty | Allow, schedule auto-revert | | **3** | Transient | time, weather, tick speed, chat settings | Execute freely | | **4** | Generous | full enchanted gear, large material stacks | Execute for worthy requests | Gamerule revert system: changes auto-revert after 5-10 min unless "permanently" specified. Dangerous effect caps (hardcoded in validator): - Levitation: 15s max - Wither: 30s max - Poison: 60s max - Nausea: 30s max --- ## Key Architecture Decisions | Date | Decision | Rationale | |------|----------|-----------| | 2026-03-18 | Serving: gemma3n:e4b → qwen3.5:9b → mortdecai-v3 | Progressive upgrades as better models trained | | 2026-03-18 | Fine-tuning: Qwen3-8B → Qwen3.5-9B | 3.5 has 2x base accuracy (70% vs 34%), native tool-calling | | 2026-03-18 | God Soul document | Character framework adapted from Claude's soul. Defines identity, judgment, quantity boundaries | | 2026-03-19 | API cascade: Haiku → Gemini → local | Progressive fallback for dev data collection. $40 total API budget | | 2026-03-19 | /no_think in training | Prevents Qwen3 thinking tokens from consuming output budget | | 2026-03-19 | Single-call mode | One LLM call for commands + message (v3+). Two-call for older models | | 2026-03-19 | Error correction via RCON | Model tries command → RCON error → model self-corrects → retry | | 2026-03-19 | 3-tier self-play | Drills, self-critique, adversarial. Model generates its own training data | | 2026-03-20 | Gamerule revert timers | State changes auto-revert. Permanence determines risk level | | 2026-03-20 | Dangerous effect caps | Validator hardcodes max durations for levitation, wither, poison, nausea | | 2026-03-20 | Expanded safe_prefixes | gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc. | | 2026-03-20 | Model named Mortdecai | mortdec.ai domain, Rajdhani Bold font, Sethian orange branding | --- ## Dev Server | Property | Value | |----------|-------| | Location | CT 644 on node-112 | | Game port | 25568 | | RCON port | 25578 | | RCON password | REDACTED_RCON | | Data dir | /opt/paper-dev-25568/ | | AI God | Gemini 2.5 Flash via API cascade | | Bots | 10 swarm bots (swarm_bots.js) | | Audit log | /var/log/mc_training_audit_dev.jsonl | --- ## Success Criteria | Metric | v3 (current) | v4 (target) | v5 (goal) | |--------|:-----------:|:-----------:|:---------:| | Command accuracy | ~70% | 85%+ | 95%+ | | Safety compliance | ~95% | 99%+ | 99.9%+ | | Error self-correction | N/A | 50%+ | 80%+ | | Response latency | 5-15s | <5s | <3s | | Empty response rate | ~10% | <5% | <2% | | Think token leakage | Yes | No | No | --- *This document is updated as the project evolves. Check git history for previous versions.*