Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Model Card: Mortdecai
Model Details
| Field | Value |
|---|---|
| Name | Mortdecai |
| Version | 0.4.0 |
| Base Model | Qwen3.5-9B (Apache 2.0) |
| Adaptation | QLoRA (4-bit base + LoRA adapters in FP16) |
| Parameters | 9.4B total, 29M trainable (0.31%) |
| Training Hardware | RTX 3090 Ti (24GB VRAM) |
| Inference Hardware | RTX 4000 (16GB), RTX 2080 Ti (11GB), or any GPU with 6GB+ VRAM |
| Quantization | Q4_K_M (5.3GB GGUF) |
| Context Length | 4096 tokens (training), 262K tokens (model capability) |
| License | Proprietary (adapter + training data). Base model: Apache 2.0 |
Intended Use
Mortdecai is designed for Minecraft Java Edition 1.21.x server operations:
- Translating natural language to valid Minecraft commands
- Controlling an AI God character that responds to player prayers
- Server administration via chat (gamerules, effects, world editing)
- Error correction (self-corrects failed RCON commands)
Not intended for:
- General-purpose chat or reasoning
- Other games or non-Minecraft domains
- Safety-critical applications
- Use without the validator safety layer
Training Data
| Source | Count | Description |
|---|---|---|
| Hand-curated examples | 966 | Command syntax, recipes, enchantments, entities, effects |
| Player interactions | 654 | Real prayers from live server players |
| Sudo translations | 525 | Natural language → command pairs |
| Tool-calling sequences | 1,159 | Multi-turn RCON execution with error correction |
| Self-play | 5,000+ | Model-generated prompts validated via RCON |
| API distillation | 344 | Claude Haiku gold-standard responses |
| Error corrections | 150+ | Wrong → right command pairs |
Total: ~8,400+ examples
Data Collection Methods
- Manual curation — Minecraft Wiki, command reference, recipe databases
- Live server logs — Real player interactions on Paper 1.21.x servers
- Bot collection — Mineflayer bots with Gemini/Dolphin prompt generation
- API distillation — Claude Haiku and Gemini Flash responses
- Self-play — Model generates edge cases, attempts via RCON, learns from results
- RCON validation — Every command tested against a live Minecraft server
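The card does not describe the harness behind the RCON validation step. As a rough sketch only, the packet framing of the Source RCON protocol that Minecraft servers speak could look like this. `encode_packet` and `decode_packet` are hypothetical helper names, and socket handling, login, and multi-packet responses are left out.

```python
import struct
from typing import Tuple

# Packet types in the Source RCON protocol as used by Minecraft.
TYPE_RESPONSE = 0
TYPE_COMMAND = 2
TYPE_LOGIN = 3


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Frame one RCON packet: length, request id, type, body, two NUL bytes."""
    payload = (
        struct.pack("<ii", request_id, packet_type)
        + body.encode("utf-8")
        + b"\x00\x00"
    )
    # The little-endian length prefix counts everything after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> Tuple[int, int, str]:
    """Parse one complete RCON packet into (request_id, packet_type, body)."""
    (length,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    # The body sits between the 12-byte header and the two trailing NUL bytes.
    body = data[12:4 + length - 2].decode("utf-8")
    return request_id, packet_type, body
```

A validation loop would send each candidate command as a `TYPE_COMMAND` packet and inspect the decoded response body for an error message.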
Known Biases
- Training data skewed toward English (~97%), with limited multilingual coverage (~3%)
- Command distribution favors `give` and `effect` over complex `execute` chains
- God persona training reflects a specific dramatic character, not a neutral one
- Player interaction data comes from a small group of testers (< 10 players)
- Self-play data may overrepresent patterns the model is already good at
Evaluation
Bake-off Results (0.4.0, 2,397 test cases)
| Metric | Score |
|---|---|
| Command match | 75.5% |
| Exact match | 22.9% |
| Syntax correct | 80.5% |
| Safety compliance | 99.7% |
| No gratuitous tp | 98.5% |
| Avg latency | 4.0s |
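The card does not define how "Command match" and "Exact match" are scored. The sketch below shows one plausible reading, purely as an assumption: "command match" compares only the root command word, and "exact match" compares the whole command after whitespace normalization. The function names are hypothetical.

```python
from typing import Dict, List


def normalize(command: str) -> str:
    """Drop a leading slash and collapse runs of whitespace."""
    return " ".join(command.strip().lstrip("/").split())


def score(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Return command-match and exact-match rates under the assumed definitions."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    command_hits = 0
    exact_hits = 0
    for pred, ref in zip(predictions, references):
        p, r = normalize(pred), normalize(ref)
        # Assumed: "command match" compares only the root command word.
        if p.split(" ")[0] == r.split(" ")[0]:
            command_hits += 1
        # Assumed: "exact match" compares the whole normalized string.
        if p == r:
            exact_hits += 1
    total = len(references)
    return {
        "command_match": command_hits / total,
        "exact_match": exact_hits / total,
    }
```

Under these definitions a wide gap between the two rates, as in the table, would mean the model usually picks the right command but often differs in its arguments.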
Safety
The model uses a 5-level risk hierarchy:
- Level 0 (never): ban, kick, stop, op — hardcoded block in validator
- Level 1 (refuse): permanent server state changes
- Level 2 (warn): temporary/reversible changes, destructive actions
- Level 3 (normal): standard gameplay commands
- Level 4 (generous): full enchanted gear, large material stacks
Additional safety layers:
- Validator blocks dangerous commands even if model generates them
- Dangerous effect duration caps (levitation 15s, wither 30s)
- Fall protection (detects lethal teleports)
- Gamerule auto-revert timers
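To make two of these layers concrete, here is a minimal sketch of a Level 0 block and the effect duration caps. Only the blocked commands and the two cap values come from this card. The function name, the parsing approach, and the assumed `effect give <targets> <effect> [<seconds>]` argument order are illustrative, not the project's actual validator.

```python
from typing import Optional

# Level 0 commands and duration caps are taken from the card;
# everything else here is an illustrative assumption.
BLOCKED_COMMANDS = {"ban", "kick", "stop", "op"}
EFFECT_CAPS_SECONDS = {"levitation": 15, "wither": 30}


def validate(command: str) -> Optional[str]:
    """Return a safe version of the command, or None if it must be blocked."""
    parts = command.strip().lstrip("/").split()
    if not parts:
        return None
    if parts[0].lower() in BLOCKED_COMMANDS:
        return None
    # Assumed syntax: effect give <targets> <effect> [<seconds>] ...
    if parts[0].lower() == "effect" and len(parts) >= 4 and parts[1] == "give":
        effect = parts[3].lower().removeprefix("minecraft:")
        cap = EFFECT_CAPS_SECONDS.get(effect)
        if cap is not None:
            if len(parts) == 4:
                # Spell out a duration rather than rely on the game's default.
                parts.append(str(cap))
            elif not parts[4].isdigit() or int(parts[4]) > cap:
                # Covers oversized numbers and the "infinite" keyword.
                parts[4] = str(cap)
    return " ".join(parts)
```

For example, `validate("effect give @p levitation 60")` returns the command with the duration clamped to 15, and `validate("/op Steve")` returns `None`. This sketch only inspects the first word, so a real validator would also need to look inside `execute ... run` chains.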
Limitations
- Cannot determine what a player is looking at (no raycast)
- Limited awareness of world state beyond player position
- Enchantment syntax errors still occur (~15% need validator fixes)
- Empty responses on ~5% of requests
- Thinks in `<think>` blocks that must be stripped (Qwen3 behavior)
- God persona can be unpredictable by design
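Stripping the reasoning blocks can be done with a small post-processing step. The following is a sketch, not the project's actual code. `strip_think` is a hypothetical name, and the handling of an unclosed `<think>` tag (discarding everything after it) is an assumption about how truncated replies should be treated.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove reasoning blocks and surrounding whitespace from a model reply."""
    without_closed = _THINK_BLOCK.sub("", text)
    # Assumed edge case: a truncated reply may leave <think> unclosed,
    # so drop everything from the dangling tag onward.
    without_open = without_closed.split("<think>", 1)[0]
    return without_open.strip()
```

A reply that consists only of an unclosed reasoning block comes back as an empty string, which the caller would need to treat as a failed response.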
Environmental Impact
- Training energy: ~84W × 4 hours ≈ 0.34 kWh per training run
- Inference energy: ~54W during calls, idle otherwise
- All compute on consumer GPUs — no data center resources used
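The training figure follows from multiplying average power draw by run time. A short check of that arithmetic, assuming a constant draw over the whole run (`energy_kwh` is a hypothetical helper):

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant power draw."""
    return power_watts * hours / 1000.0


training_run = energy_kwh(84, 4)
print(f"{training_run:.3f} kWh")  # prints "0.336 kWh"
```

The result rounds to the 0.34 kWh quoted above. The same helper applies to inference once the total time spent in calls is known, which this card does not report.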