# Model Card: Mortdecai
## Model Details
| Field | Value |
|-------|-------|
| **Name** | Mortdecai |
| **Version** | 0.4.0 |
| **Base Model** | Qwen3.5-9B (Apache 2.0) |
| **Adaptation** | QLoRA (4-bit base + LoRA adapters in FP16) |
| **Parameters** | 9.4B total, 29M trainable (0.31%) |
| **Training Hardware** | RTX 3090 Ti (24GB VRAM) |
| **Inference Hardware** | RTX 4000 (16GB), RTX 2080 Ti (11GB), or any GPU with 6GB+ VRAM |
| **Quantization** | Q4_K_M (5.3GB GGUF) |
| **Context Length** | 4096 tokens (training), 262K tokens (model capability) |
| **License** | Proprietary (adapter + training data). Base model: Apache 2.0 |
## Intended Use
Mortdecai is designed for **Minecraft Java Edition 1.21.x server operations**:
- Translating natural language to valid Minecraft commands
- Controlling an AI God character that responds to player prayers
- Server administration via chat (gamerules, effects, world editing)
- Error correction (self-corrects failed RCON commands)

**Not intended for:**
- General-purpose chat or reasoning
- Other games or non-Minecraft domains
- Safety-critical applications
- Use without the validator safety layer
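
The error-correction use case can be pictured as a small retry loop: run the generated command, and if the server reply looks like an error, show the failed command and the error to the model and ask again. The sketch below is illustrative only and is not Mortdecai's actual implementation. `generate` and `execute` are hypothetical callables standing in for the model and an RCON client, and `ERROR_MARKERS` is an assumed heuristic based on vanilla English error messages.

```python
from typing import Callable, Tuple

# Assumed heuristic: fragments of vanilla (en_us) error messages returned over RCON.
# A real validator would need a fuller list.
ERROR_MARKERS = ("Unknown or incomplete command", "Incorrect argument for command")


def looks_like_error(response: str) -> bool:
    return any(marker in response for marker in ERROR_MARKERS)


def run_with_self_correction(
    request: str,
    generate: Callable[[str], str],  # hypothetical: prompt -> one Minecraft command
    execute: Callable[[str], str],   # hypothetical: command -> RCON response text
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Try a command; on a server error, show the error to the model and retry."""
    prompt = request
    command, response = "", ""
    for _ in range(max_attempts):
        command = generate(prompt)
        response = execute(command)
        if not looks_like_error(response):
            break
        # Feed the failed command and the server's message back for the next attempt.
        prompt = (
            f"{request}\nPrevious command: {command}\n"
            f"Server error: {response}\nReturn a corrected command."
        )
    return command, response
```

The loop returns the last command and response even when every attempt fails, so the caller can still log the failure or fall back to a refusal.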
## Training Data
| Source | Count | Description |
|--------|-------|-------------|
| Hand-curated examples | 966 | Command syntax, recipes, enchantments, entities, effects |
| Player interactions | 654 | Real prayers from live server players |
| Sudo translations | 525 | Natural language → command pairs |
| Tool-calling sequences | 1,159 | Multi-turn RCON execution with error correction |
| Self-play | 5,000+ | Model-generated prompts validated via RCON |
| API distillation | 344 | Claude Haiku gold-standard responses |
| Error corrections | 150+ | Wrong → right command pairs |

**Total: ~8,800+ examples**
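
Summing the rows of the table gives a quick check on the headline total. Rows marked "+" are treated as their minimums here, so the result is a lower bound. The per-source shares also show that self-play makes up roughly 57% of the data, which is relevant to the self-play bias noted under Known Biases.

```python
# Row counts from the Training Data table; rows listed with "+" use their minimum.
TRAINING_SOURCES = {
    "Hand-curated examples": 966,
    "Player interactions": 654,
    "Sudo translations": 525,
    "Tool-calling sequences": 1159,
    "Self-play": 5000,          # listed as 5,000+
    "API distillation": 344,
    "Error corrections": 150,   # listed as 150+
}

total = sum(TRAINING_SOURCES.values())
print(f"Lower bound: {total:,} examples")
for name, count in TRAINING_SOURCES.items():
    # Share of the (lower-bound) total contributed by each source.
    print(f"{name:<24} {count:>6,} {count / total:6.1%}")
```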
### Data Collection Methods
1. **Manual curation** — Minecraft Wiki, command reference, recipe databases
2. **Live server logs** — Real player interactions on Paper 1.21.x servers
3. **Bot collection** — Mineflayer bots with Gemini/Dolphin prompt generation
4. **API distillation** — Claude Haiku and Gemini Flash responses
5. **Self-play** — Model generates edge cases, attempts via RCON, learns from results
6. **RCON validation** — Every command tested against a live Minecraft server
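
RCON validation relies on Minecraft Java Edition's RCON interface, which uses the Source RCON wire format: each packet is a little-endian int32 length, an int32 request ID, an int32 type (3 = login, 2 = command, 0 = response), and a null-terminated body followed by one extra null byte. The following is a minimal standard-library sketch of a client, not the project's own tooling. It assumes `enable-rcon=true` and `rcon.password` are set in `server.properties` (the default RCON port is 25575), and it reads only the first response packet, so very long outputs that the server splits across packets would be truncated.

```python
import socket
import struct

# Packet types in the Source RCON protocol used by Minecraft.
AUTH, EXEC_COMMAND, RESPONSE_VALUE = 3, 2, 0


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    # Length prefix counts request ID + type + body + two trailing null bytes.
    payload = struct.pack("<ii", request_id, packet_type) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    (length,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    body = data[12 : 4 + length - 2].decode("utf-8")  # drop the two null terminators
    return request_id, packet_type, body


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("RCON connection closed")
        buf += chunk
    return buf


def read_packet(sock: socket.socket) -> tuple[int, int, str]:
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack("<i", header)
    return decode_packet(header + _recv_exact(sock, length))


def rcon_command(host: str, port: int, password: str, command: str, timeout: float = 5.0) -> str:
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_packet(1, AUTH, password))
        request_id, _, _ = read_packet(sock)
        if request_id == -1:  # the server answers a failed login with request ID -1
            raise PermissionError("RCON authentication failed")
        sock.sendall(encode_packet(2, EXEC_COMMAND, command))
        _, _, body = read_packet(sock)
        return body
```

For example, `rcon_command("localhost", 25575, "your-password", "list")` would return the server's player-list text; the host and password are placeholders.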
### Known Biases
- Training data is skewed toward English (~97%), with limited multilingual coverage (~3%)
- Command distribution favors `give` and `effect` over complex `execute` chains
- God persona training reflects a specific, dramatic character rather than a neutral one
- Player interaction data comes from a small group of testers (fewer than 10 players)
- Self-play data may overrepresent patterns the model is already good at
## Evaluation
### Bake-off Results (0.4.0, 2,397 test cases)
| Metric | Score |
|--------|-------|
| Command match | 75.5% |
| Exact match | 22.9% |
| Syntax correct | 80.5% |
| Safety compliance | 99.7% |
| No gratuitous `tp` | 98.5% |
| Avg latency | 4.0s |
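
Because every rate above is measured over the same 2,397 cases, converting rates to approximate counts makes the small failure rates concrete: for example, 99.7% safety compliance still leaves about 7 non-compliant cases. The reported rates are rounded to 0.1%, so each derived count is only accurate to roughly ±1 case.

```python
N_CASES = 2397

# Pass rates copied from the bake-off table above.
RATES = {
    "Command match": 0.755,
    "Exact match": 0.229,
    "Syntax correct": 0.805,
    "Safety compliance": 0.997,
    "No gratuitous tp": 0.985,
}

# Approximate pass counts; rounding in the published rates limits precision.
counts = {name: round(rate * N_CASES) for name, rate in RATES.items()}
for name, passed in counts.items():
    print(f"{name:<18} {passed:>5} pass  {N_CASES - passed:>4} fail")
```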
### Safety
The model uses a 5-level risk hierarchy:
- **Level 0 (never):** ban, kick, stop, op — hardcoded block in validator
- **Level 1 (refuse):** permanent server state changes
- **Level 2 (warn):** temporary/reversible changes, destructive actions
- **Level 3 (normal):** standard gameplay commands
- **Level 4 (generous):** full enchanted gear, large material stacks
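
The Level 0 rule is the kind of check that is easiest to enforce outside the model. Below is a minimal sketch, assuming only the four commands named above are blocked (the validator's full list is not given here). It checks the top-level command and every command nested after `run` in an `execute` chain, and strips a leading `/` and the `minecraft:` namespace so that `/minecraft:stop` is caught too. It is deliberately conservative: text such as `say run kick` is also flagged.

```python
# Level 0 commands named in this model card; the real validator may block more.
BLOCKED_ROOTS = frozenset({"ban", "kick", "stop", "op"})


def command_roots(command: str) -> list[str]:
    """Return the top-level command plus the command after every `run` keyword."""
    tokens = command.strip().split()
    if not tokens:
        return []
    roots = [tokens[0]]
    roots += [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "run"]
    # Normalize "/minecraft:ban" and "minecraft:ban" to "ban".
    return [r.lstrip("/").lower().removeprefix("minecraft:") for r in roots]


def is_level0_blocked(command: str) -> bool:
    return any(root in BLOCKED_ROOTS for root in command_roots(command))
```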

Additional safety layers:
- The validator blocks dangerous commands even if the model generates them
- Duration caps on dangerous effects (levitation 15 s, wither 30 s)
- Fall protection (detects lethal teleports)
- Gamerule auto-revert timers
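
Two of these layers reduce to small, testable functions. The sketch below is hypothetical and is not the validator's code. `cap_effect_duration` rewrites an `effect give` command so levitation and wither never exceed the caps listed above, including when the `infinite` keyword or no duration is given. `is_lethal_teleport` assumes the commonly cited Java Edition rule of about 1 health point of fall damage per block beyond the first 3, and ignores armor, Feather Falling, potion effects, and soft landings; finding `landing_y` (the feet height above the first solid block under the destination) is left to the caller.

```python
import math

# Caps taken from this model card; other effects pass through unchanged.
EFFECT_CAPS_SECONDS = {"levitation": 15, "wither": 30}


def cap_effect_duration(command: str) -> str:
    """Clamp `effect give <targets> <effect> [seconds] ...` for capped effects."""
    tokens = command.strip().lstrip("/").split()
    if len(tokens) < 4 or tokens[:2] != ["effect", "give"]:
        return command
    cap = EFFECT_CAPS_SECONDS.get(tokens[3].removeprefix("minecraft:"))
    if cap is None:
        return command
    if len(tokens) == 4:
        tokens.append(str(cap))  # no duration given: pin it to the cap explicitly
    elif not tokens[4].isdigit() or int(tokens[4]) > cap:
        tokens[4] = str(cap)  # "infinite", malformed, or longer than the cap
    return " ".join(tokens)


def is_lethal_teleport(target_y: float, landing_y: float, health: float = 20.0) -> bool:
    """Assumed rule: damage = ceil(fall distance - 3) health points, no mitigation."""
    fall = target_y - landing_y
    damage = max(0, math.ceil(fall - 3))
    return damage >= health
```

Rewritten commands are returned without a leading `/`, which RCON accepts.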
### Limitations
- Cannot determine what a player is looking at (no raycast)
- Limited awareness of world state beyond player position
- Enchantment syntax errors still occur (~15% need validator fixes)
- Returns empty responses on ~5% of requests
- Emits reasoning in `<think>` blocks that must be stripped before use (Qwen3 behavior)
- God persona can be unpredictable by design
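
The empty-response and `<think>` limitations can be handled in one post-processing step. A minimal sketch, assuming replies arrive as plain text: it removes complete `<think>...</think>` blocks, keeps only the text after a stray closing tag (in case the chat template opened the block in the prompt), drops an unclosed block, and returns `None` when nothing visible remains so the caller can retry instead of sending an empty command. `clean_reply` is a hypothetical helper name.

```python
import re
from typing import Optional

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def clean_reply(raw: str) -> Optional[str]:
    """Strip Qwen3-style reasoning; return the visible reply, or None if empty."""
    text = _THINK_BLOCK.sub("", raw)
    # If the template already opened the block, only a closing tag appears in the output.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # A block that was opened but never closed (e.g. generation cut off) is dropped.
    if "<think>" in text:
        text = text.split("<think>", 1)[0]
    text = text.strip()
    return text or None
```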
## Environmental Impact
- **Training energy:** ~84 W × 4 h ≈ 336 Wh (0.34 kWh) per training run
- **Inference energy:** ~54 W during inference calls, idle otherwise
- **All compute on consumer GPUs** — no data center resources used