Mortdecai/MODEL_CARD.md
2026-03-20 21:43:21 -04:00

Model Card: Mortdecai

Model Details

| Field | Value |
| --- | --- |
| Name | Mortdecai |
| Version | 0.4.0 |
| Base Model | Qwen3.5-9B (Apache 2.0) |
| Adaptation | QLoRA (4-bit base + LoRA adapters in FP16) |
| Parameters | 9.4B total, 29M trainable (0.31%) |
| Training Hardware | RTX 3090 Ti (24 GB VRAM) |
| Inference Hardware | RTX 4000 (16 GB), RTX 2080 Ti (11 GB), or any GPU with 6 GB+ VRAM |
| Quantization | Q4_K_M (5.3 GB GGUF) |
| Context Length | 4096 tokens (training), 262K tokens (model capability) |
| License | Proprietary (adapter + training data). Base model: Apache 2.0 |
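As a quick sanity check on the figures above, the trainable-parameter share and the average bits per weight of the quantized file can be recomputed. This is only arithmetic on the numbers in this card, and it assumes that "5.3 GB" means 5.3 × 10^9 bytes.

```python
# Figures copied from the Model Details section.
total_params = 9.4e9       # 9.4B total parameters
trainable_params = 29e6    # 29M trainable LoRA parameters
gguf_bytes = 5.3e9         # assumption: "5.3 GB" means 5.3 * 10^9 bytes

# Share of parameters updated during QLoRA training.
trainable_pct = 100 * trainable_params / total_params

# Average storage cost per weight in the Q4_K_M file.
bits_per_weight = gguf_bytes * 8 / total_params

print(f"trainable: {trainable_pct:.2f}%")                  # trainable: 0.31%
print(f"average bits per weight: {bits_per_weight:.2f}")   # average bits per weight: 4.51
```

The result of roughly 4.5 bits per weight fits a 4-bit scheme that stores some tensors and scaling metadata at higher precision.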

Intended Use

Mortdecai is designed for Minecraft Java Edition 1.21.x server operations:

  • Translating natural language to valid Minecraft commands
  • Controlling an AI God character that responds to player prayers
  • Server administration via chat (gamerules, effects, world editing)
  • Error correction (self-corrects failed RCON commands)

Not intended for:

  • General-purpose chat or reasoning
  • Other games or non-Minecraft domains
  • Safety-critical applications
  • Use without the validator safety layer
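The error-correction behavior and the validator requirement listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: `generate`, `validate`, and `execute` are hypothetical callables standing in for the model, the validator safety layer, and an RCON client.

```python
from typing import Callable, Optional, Tuple


def run_with_correction(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],   # hypothetical model call
    validate: Callable[[str], bool],                 # hypothetical validator
    execute: Callable[[str], Tuple[bool, str]],      # hypothetical RCON call
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first command that passes validation and executes, else None."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        command = generate(prompt, feedback)
        if not validate(command):
            # Never send a blocked command to the server; ask for another try.
            feedback = f"blocked by validator: {command}"
            continue
        ok, response = execute(command)
        if ok:
            return command
        # Feed the server's error text back so the next attempt can correct it.
        feedback = f"command failed: {command!r} -> {response}"
    return None


# Stand-in callables for demonstration only.
def fake_generate(prompt: str, feedback: Optional[str]) -> str:
    return "give @p minecraft:diamond 1" if feedback else "give @p diamonds"


def fake_execute(command: str) -> Tuple[bool, str]:
    ok = "minecraft:diamond" in command
    return ok, "" if ok else "stand-in error: unknown item"


result = run_with_correction(
    "give me a diamond", fake_generate, lambda c: True, fake_execute
)
print(result)  # give @p minecraft:diamond 1
```

Placing the validator check before execution means a command is only ever sent to the server after it has passed the safety layer, including on retries.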

Training Data

| Source | Count | Description |
| --- | --- | --- |
| Hand-curated examples | 966 | Command syntax, recipes, enchantments, entities, effects |
| Player interactions | 654 | Real prayers from live server players |
| Sudo translations | 525 | Natural language → command pairs |
| Tool-calling sequences | 1,159 | Multi-turn RCON execution with error correction |
| Self-play | 5,000+ | Model-generated prompts validated via RCON |
| API distillation | 344 | Claude Haiku gold-standard responses |
| Error corrections | 150+ | Wrong → right command pairs |

Total: ~8,400+ examples
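The per-source counts can be totalled as a sanity check. The sketch below treats "5,000+" and "150+" as lower bounds, which is an assumption about how those entries are meant. The resulting lower bound comes out somewhat above the reported total, so the reported figure may reflect deduplication or an earlier snapshot; the card does not say.

```python
# Lower-bound counts per source, copied from the Training Data table.
# "5,000+" and "150+" are treated as 5000 and 150 (an assumption).
min_counts = {
    "Hand-curated examples": 966,
    "Player interactions": 654,
    "Sudo translations": 525,
    "Tool-calling sequences": 1159,
    "Self-play": 5000,
    "API distillation": 344,
    "Error corrections": 150,
}

lower_bound = sum(min_counts.values())
print(lower_bound)  # 8798

# Share of the lower-bound total that comes from self-play.
self_play_pct = 100 * min_counts["Self-play"] / lower_bound
print(f"{self_play_pct:.1f}%")  # 56.8%
```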

Data Collection Methods

  1. Manual curation — Minecraft Wiki, command reference, recipe databases
  2. Live server logs — Real player interactions on Paper 1.21.x servers
  3. Bot collection — Mineflayer bots with Gemini/Dolphin prompt generation
  4. API distillation — Claude Haiku and Gemini Flash responses
  5. Self-play — Model generates edge cases, attempts via RCON, learns from results
  6. RCON validation — Every command tested against a live Minecraft server
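Steps 5 and 6 amount to a filter: candidate commands are run against a server, successes are kept as training examples, and failures are kept with their error text. A minimal sketch of that split follows, assuming a hypothetical `execute` callable that wraps an RCON client and returns a success flag plus the server's response; the real pipeline may judge success differently.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def split_by_rcon_result(
    candidates: Iterable[Tuple[str, str]],          # (prompt, command) pairs
    execute: Callable[[str], Tuple[bool, str]],     # hypothetical RCON wrapper
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split candidates into server-accepted and server-rejected records."""
    accepted: List[Dict[str, str]] = []
    rejected: List[Dict[str, str]] = []
    for prompt, command in candidates:
        ok, response = execute(command)
        record = {"prompt": prompt, "command": command}
        if ok:
            accepted.append(record)
        else:
            # Keep the error text so the failure can seed a correction example.
            rejected.append({**record, "error": response})
    return accepted, rejected


# Stand-in for a live server, for demonstration only.
def fake_rcon(command: str) -> Tuple[bool, str]:
    ok = command.startswith(("give ", "effect ", "time "))
    return ok, "" if ok else "stand-in error: command rejected"


good, bad = split_by_rcon_result(
    [("make it day", "time set day"), ("make it day", "daytime now")],
    fake_rcon,
)
print(len(good), len(bad))  # 1 1
```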

Known Biases

  • Training data is skewed toward English (~97%), with limited multilingual coverage (~3%)
  • Command distribution favors give and effect over complex execute chains
  • God persona training reflects a specific dramatic character, not a neutral one
  • Player interaction data comes from a small group of testers (fewer than 10 players)
  • Self-play data may overrepresent patterns the model is already good at

Evaluation

Bake-off Results (0.4.0, 2,397 test cases)

| Metric | Score |
| --- | --- |
| Command match | 75.5% |
| Exact match | 22.9% |
| Syntax correct | 80.5% |
| Safety compliance | 99.7% |
| No gratuitous tp | 98.5% |
| Avg latency | 4.0 s |
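The card does not define how the metrics are computed. One plausible reading, shown here purely as an assumption, is that "command match" compares only the command name (the first token) while "exact match" compares the whole whitespace-normalized string. Under that reading, the gap between the two scores reflects commands with the right verb but different arguments.

```python
from typing import Dict, List, Tuple


def command_name(cmd: str) -> str:
    """First token of a command, without a leading slash, lowercased."""
    parts = cmd.strip().lstrip("/").split()
    return parts[0].lower() if parts else ""


def normalize(cmd: str) -> str:
    """Collapse runs of whitespace and drop a leading slash."""
    return " ".join(cmd.strip().lstrip("/").split())


def score(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Percentages over (predicted, expected) pairs; assumed metric definitions."""
    n = len(pairs)
    if n == 0:
        return {"command_match": 0.0, "exact_match": 0.0}
    cmd_hits = sum(command_name(p) == command_name(e) for p, e in pairs)
    exact_hits = sum(normalize(p) == normalize(e) for p, e in pairs)
    return {
        "command_match": 100 * cmd_hits / n,
        "exact_match": 100 * exact_hits / n,
    }


sample = [
    ("give @p minecraft:diamond 1", "give @p minecraft:diamond 1"),       # exact
    ("/give @p minecraft:diamond 2", "give @p minecraft:diamond 1"),      # name only
    ("effect give @p minecraft:speed 30", "give @p minecraft:diamond 1"), # miss
    ("", "time set day"),                                                 # empty reply
]
print(score(sample))  # {'command_match': 50.0, 'exact_match': 25.0}
```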

Safety

The model uses a 5-level risk hierarchy:

  • Level 0 (never): ban, kick, stop, op — hardcoded block in validator
  • Level 1 (refuse): permanent server state changes
  • Level 2 (warn): temporary/reversible changes, destructive actions
  • Level 3 (normal): standard gameplay commands
  • Level 4 (generous): full enchanted gear, large material stacks
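The hierarchy above maps naturally onto an ordered enum, with Level 0 enforced by a plain lookup rather than by the model. The sketch below is illustrative only: the names `RiskLevel`, `HARD_BLOCKED`, and `is_hard_blocked` are hypothetical, and only the four Level 0 commands named in this card are listed.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    NEVER = 0      # hardcoded block in the validator
    REFUSE = 1     # permanent server state changes
    WARN = 2       # temporary/reversible changes, destructive actions
    NORMAL = 3     # standard gameplay commands
    GENEROUS = 4   # full enchanted gear, large material stacks


# Level 0 commands named in this card.
HARD_BLOCKED = frozenset({"ban", "kick", "stop", "op"})


def is_hard_blocked(command: str) -> bool:
    """True if the command's name is on the Level 0 list."""
    parts = command.strip().lstrip("/").split()
    if not parts:
        return False
    # Accept both "op" and the namespaced form "minecraft:op".
    name = parts[0].lower().removeprefix("minecraft:")
    return name in HARD_BLOCKED


print(is_hard_blocked("/op Steve"))                    # True
print(is_hard_blocked("give @p minecraft:diamond 1"))  # False
```

This check only inspects the leading command name. A production validator would also need to look inside `execute ... run` chains, which can wrap another command.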

Additional safety layers:

  • Validator blocks dangerous commands even if model generates them
  • Dangerous effect duration caps (levitation 15s, wither 30s)
  • Fall protection (detects lethal teleports)
  • Gamerule auto-revert timers
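The duration caps can be pictured as a rewrite step applied to `effect give` commands before they reach the server. The following is a sketch under stated assumptions, not the project's validator: it assumes the `effect give <targets> <effect> [<seconds>] [<amplifier>]` argument order, assumes the target selector contains no spaces, and uses the hypothetical names `DURATION_CAPS` and `cap_effect_duration`.

```python
# Caps in seconds, taken from the list above.
DURATION_CAPS = {"levitation": 15, "wither": 30}


def cap_effect_duration(command: str) -> str:
    """Clamp the duration of capped effects; return other commands unchanged."""
    tokens = command.strip().lstrip("/").split()
    if len(tokens) < 4 or tokens[0] != "effect" or tokens[1] != "give":
        return command
    effect = tokens[3].removeprefix("minecraft:")
    cap = DURATION_CAPS.get(effect)
    if cap is None:
        return command
    if len(tokens) == 4:
        # No duration given: make the cap explicit instead of relying on
        # whatever default the server would apply.
        tokens.append(str(cap))
        return " ".join(tokens)
    raw = tokens[4]
    if raw != "infinite":
        try:
            seconds = int(raw)
        except ValueError:
            return command  # malformed duration: leave it for the server to reject
        if seconds <= cap:
            return command  # already within the cap
    tokens[4] = str(cap)
    return " ".join(tokens)


print(cap_effect_duration("effect give @p minecraft:levitation 60"))
# effect give @p minecraft:levitation 15
print(cap_effect_duration("effect give @p wither infinite 1"))
# effect give @p wither 30 1
```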

Limitations

  • Cannot determine what a player is looking at (no raycast)
  • Limited awareness of world state beyond player position
  • Enchantment syntax errors still occur (~15% need validator fixes)
  • Returns an empty response on ~5% of requests
  • Emits its reasoning in <think> blocks, which must be stripped from the output before use (Qwen3 behavior)
  • God persona can be unpredictable by design
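Stripping the `<think>` blocks mentioned above can be done with a regular expression before the output is parsed for commands. This is a minimal sketch with the hypothetical name `strip_think`. It also drops an unterminated block, on the assumption that a reply cut off mid-thought contains nothing usable.

```python
import re

# Non-greedy match so that multiple blocks are removed independently.
_CLOSED_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
# An opening tag with no closing tag: drop everything to the end of the text.
_OPEN_THINK = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and any unterminated trailing block."""
    text = _CLOSED_THINK.sub("", text)
    text = _OPEN_THINK.sub("", text)
    return text.strip()


raw = "<think>The player wants a sword.</think>\ngive @p minecraft:diamond_sword 1"
print(strip_think(raw))  # give @p minecraft:diamond_sword 1
```

An empty string after stripping can then be handled the same way as the empty responses noted in the list above, for example by retrying.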

Environmental Impact

  • Training energy: ~84 W × 4 hours ≈ 0.34 kWh per training run
  • Inference power draw: ~54 W during calls, idle otherwise
  • All compute on consumer GPUs — no data center resources used
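The energy figures can be reproduced, and extended to a rough per-request estimate, with a few lines of arithmetic. The per-request number is an assumption, not a measurement from this card: it combines the ~54 W draw with the 4.0 s average latency from the Evaluation section and assumes the GPU draws that power for the whole request.

```python
# Training: figures from the list above.
train_power_w = 84
train_hours = 4
train_kwh = train_power_w * train_hours / 1000
print(f"{train_kwh:.3f} kWh per training run")  # 0.336 kWh per training run

# Inference: assumes ~54 W is drawn for the full 4.0 s average latency.
infer_power_w = 54
avg_latency_s = 4.0
wh_per_request = infer_power_w * avg_latency_s / 3600
print(f"{wh_per_request:.2f} Wh per request")   # 0.06 Wh per request

# Number of requests that use as much energy as one training run.
requests_per_run = round(train_kwh * 1000 / wh_per_request)
print(requests_per_run)                          # 5600
```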