Files
Mortdecai/PLAN.md
T
Seth f39809eaca Semver rename: v1-v5 → 0.1.0-0.5.0 across all files
Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release

Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, discord bot, all training data references.

Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.

Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:14 -04:00

240 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# PLAN.md — Mortdecai Project Roadmap
> **Last updated:** 2026-03-20 04:45 UTC
> **Model name:** Mortdecai
> **Domain:** mortdec.ai
> **Status legend:** `[ ]` planned | `[~]` in progress | `[x]` done | `[-]` cancelled/deferred
---
## Vision
**Mortdecai** is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play. Runs locally on consumer hardware with zero cloud dependencies at inference time.
---
## Current State
### Models
| Model | Base | Examples | Loss | Status |
|-------|------|---------|------|--------|
| 0.1.0 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
| 0.2.0 | Qwen3-8B | 361 | 2.03 | Retired |
| 0.3.0 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
| **0.4.0** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |
### Infrastructure
| Component | Location | Details |
|-----------|----------|---------|
| Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth 2026.3.8 |
| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai:0.4.0 |
| MC servers | CT 644 on node-112 | paper-ai:25567, shrink:25566, dev:25568, vanilla:25565 |
| Dev data collection | CT 644 | Gemini 3.1 Flash Lite (preview), 5 bots |
| Whitelist app | CT 644:8099 | minecraft.mortdec.ai |
| Caddy proxy | CT 600 on node-241 | mortdec.ai, minecraft.mortdec.ai |
| GPU monitoring | Grafana CT 300 (node-173) | Prometheus + nvidia exporter on steel141 |
| Greenfield map | paper-ai | Downloaded, world swapped, needs MCSManager start |
| WorldEdit schematics | paper-ai | 77 installed in FAWE/schematics/ |
### API Spend
| Provider | Spent | Budget | Status |
|----------|-------|--------|--------|
| Claude Haiku | $20.01 | $20 | Exhausted |
| Gemini (all) | ~$0.50 | $20 | Active on dev (3.1 Flash Lite) |
### Branding
- **Font:** Rajdhani Bold
- **Color:** #D35400 (Sethian orange)
- **Domain:** mortdec.ai → Gitea repo, minecraft.mortdec.ai → whitelist page
- **Public repo:** https://git.sethpc.xyz/Seth/Mortdecai
### Training Data: 2,397 seed + 1,159 tool-calling = 3,556 total
| Category | Count |
|----------|-------|
| Command syntax reference | 107 |
| Crafting recipes & chains | 176 |
| Enchantments (mutual exclusions, max levels) | 60 |
| Entities/mobs (summon, kill, NBT) | 60 |
| Execute chains | 45 |
| Multiplayer (selectors, teams, scoreboards) | 45 |
| Advanced commands (tellraw, clone, data, ride) | 45 |
| WorldEdit | 45 |
| Paper server features | 55 |
| Cosmetics/XP/effects | 42 |
| Gamerules (49) + risk hierarchy (40) | 89 |
| Quantity boundaries | 32 |
| Dangerous effect caps (levitation, wither, etc.) | 12 |
| Fall safety + drops + suffocation | 33 |
| Death/environment (drowning, lava, void, mobs, etc.) | 26 |
| Revert-aware gamerules + drops | 20 |
| Error correction pairs (enchant order, NBT, etc.) | 54 |
| Claude-distilled outputs | 344 |
| Bot audit interactions | 448+ |
| Boundary/safety/prompt injection | 95+ |
| Tool-calling (multi-turn with RCON) | 1,159 |
---
## Completed This Session
### Model & Training
- [x] Mortdecai 0.4.0 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
- [x] 0.4.0 exported to GGUF Q4_K_M (5.3GB)
- [x] 0.4.0 deployed to prod (RTX 4000) — paper-ai + shrink-world
- [x] Single-call mode enabled on prod
- [x] `/no_think` in all training data to suppress thinking tokens
- [x] Qwen3.5-9B base bake-off: 70.1% accuracy (2x Qwen3-8B)
- [~] 0.4.0 bake-off running on steel141
### Validator & Safety
- [x] Error correction: detects RCON errors, asks model to fix, retries
- [x] Broadened error patterns: `<--[HERE]` universal catch
- [x] `kill @a` blocked (players only)
- [x] `tp minecraft:spawn` → safe coordinates
- [x] Fire fallback won't trigger on "firework"
- [x] Dangerous effect caps: levitation 15s, wither 30s, poison 60s, nausea 30s
- [x] Fall protection: detects lethal tp, adds slow_falling unless intentional
- [x] Gamerule revert timers: auto-revert after 5-10 min (configurable)
- [x] Expanded safe_prefixes: gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc.
- [x] Validator hit-rate tracking to /var/log/mc_validator_stats.json
- [x] Command format examples (RIGHT vs WRONG) in prompt
- [x] max_tokens bumped to 600 for command calls
- [x] Removed template workflow from sudo prompt
### Infrastructure
- [x] Ollama updated on steel141 + RTX 4000 (Qwen3.5 support)
- [x] GPU monitoring: nvtop + Grafana dashboard on steel141
- [x] Whitelist UUID fix: Mojang API lookup, patches all whitelist.json files
- [x] mortdec.ai + minecraft.mortdec.ai live with SSL
- [x] Public Mortdecai repo on Gitea with README
- [x] `status` command: shows model name, mode, validator stats in-game
- [x] Verbose pipeline logging: token counts, speed, elapsed time, think stripping
- [x] Greenfield world downloaded and installed on paper-ai
- [x] 77 WorldEdit schematics installed
### Training Data Added
- [x] Gamerules (49 examples): all major gamerules with natural language
- [x] Risk hierarchy (40): L0 blocked, L1 permanent, L2 temporary, prompt injection
- [x] Dangerous effects (12): levitation/wither/poison caps
- [x] Fall safety (25): height math, water/slime/hay awareness, intent detection
- [x] Suffocation (8): tp into blocks, sand/gravel crushing
- [x] Death/environment (26): drowning, lava, void, explosions, mobs, starvation, lightning
- [x] Revert-aware gamerules (8): revert_after field for 0.5.0
- [x] Drop/height (12): intentional drops, safe tp, slow_falling
- [x] Enchantment error correction (7): count-before-bracket, typos, old NBT
### Data Collection
- [x] API cascade: Haiku ($20) → Gemini ($20) → local
- [x] Switched dev to Gemini 3.1 Flash Lite (preview) with 5 bots
- [x] Dynamic pricing by Gemini model name
- [x] Async prayer/sudo processing (ThreadPoolExecutor, 3 workers)
### Branding
- [x] Model named Mortdecai
- [x] mortdec.ai domain purchased and configured
- [x] Rajdhani Bold as official font
- [x] Logo variants generated (6 fonts × 2 text versions)
- [x] Whitelist page branded with Mortdecai logo
---
## Active TODOs
### Immediate
- [~] 0.4.0 bake-off running — publish results to Gitea when complete
- [ ] Fix 0.4.0 Modelfile chat template on RTX 4000 (done, needs verification)
- [ ] Also fix on steel141's Ollama instance
### Short-term (0.5.0 prep)
- [ ] Shared memory system: per-server JSON, owner-tagged, location/preference/fact types
- Player says "remember this is home" → AI writes location memory
- Other players can reference: "tp me to slingshooter08's home"
- Memory in context for location lookups, tool call for read/write
- [ ] `memory_write` field in model output schema
- [ ] Setblock training data expansion
- [ ] `world.check_block` tool for terrain queries before tp
- [ ] Self-play loop deployment (3-tier: drills, self-critique, adversarial)
- [ ] Ingest all Gemini 3.1 Flash Lite training data
- [ ] More error correction from production RCON failures
### Model 0.5.0 Training
- [ ] Train with tool-calling format (rcon.execute, wiki_lookup, world.player_info)
- [ ] `revert_after` / `revert_commands` in output schema
- [ ] Self-play generated data (200 rounds post-0.4.0)
- [ ] Memory read/write training examples
- [ ] Ground-level terrain detection training
- [ ] Fall damage math in model reasoning (not just validator)
- [ ] Setblock + block state training
- [ ] More death mechanics awareness in reasoning
### Infrastructure
- [ ] GPU monitoring for RTX 4000 (second exporter)
- [ ] Validator hit-rate analysis — remove fixes that fire <1%
- [ ] Automate training pipeline: ingest → dedup → train → export → deploy
- [ ] POS receipt for Gemini milestones
- [ ] Start Greenfield world via MCSManager
### Content & Community
- [ ] Invite more playtesters via minecraft.mortdec.ai
- [ ] Update mortdec.ai README with 0.4.0 bake-off results
- [ ] Consider public HuggingFace release
- [ ] WorldEdit schematic library expansion
---
## Risk Hierarchy
Commands classified by permanence:
| Level | Permanence | Examples | Model behavior |
|:-----:|-----------|----------|----------------|
| **0** | Irreversible/admin | ban, kick, stop, op, deop, whitelist | Never execute |
| **1** | Permanent toggle | gamemode @a, permanent gamerules, difficulty | Execute for self only, refuse for @a |
| **2** | Temporary/reversible | gamerules with time limits, brief changes | Allow, schedule auto-revert |
| **3** | Transient | time, weather, tick speed, chat settings | Execute freely |
| **4** | Generous | full enchanted gear, large material stacks | Execute for worthy requests |
**Gamerule revert system:** Changes auto-revert after 5-10 min unless "permanently" specified. Player notified of countdown.
**Dangerous effect caps (hardcoded):** Levitation 15s, Wither 30s, Poison 60s, Nausea 30s.
**Fall protection:** Lethal tp detected → slow_falling added unless intent words present (drop, yeet, throw, kill me).
---
## Key Decisions
| Date | Decision | Rationale |
|------|----------|-----------|
| 03-18 | gemma3n:e4b for initial prod | Bake-off winner at 80.6% accuracy |
| 03-18 | Qwen3-8B for 0.1.0-0.3.0 training | Best syntax quality, Apache 2.0 |
| 03-18 | God Soul document | Character framework from Claude's soul |
| 03-19 | API cascade for data collection | Haiku→Gemini→local fallback |
| 03-19 | Single-call mode | One LLM call for commands + message |
| 03-19 | Error correction via RCON | Model tries → error → self-corrects |
| 03-19 | 3-tier self-play | Drills, self-critique, adversarial |
| 03-20 | Qwen3.5-9B for 0.4.0 | 2x base accuracy, native tool-calling |
| 03-20 | Gamerule revert timers | Permanence determines risk level |
| 03-20 | Dangerous effect caps | Validator hardcodes max durations |
| 03-20 | Fall protection | Health check + intent detection before tp |
| 03-20 | Shared player memory (planned) | Owner-tagged, cross-player, AI-managed |
| 03-20 | Mortdecai branding | Rajdhani Bold, #D35400, mortdec.ai |
---
## Success Criteria
| Metric | 0.3.0 | 0.4.0 (target) | 0.5.0 (goal) |
|--------|:-:|:-:|:-:|
| Command accuracy | ~70% | 85%+ | 95%+ |
| Safety compliance | ~95% | 99%+ | 99.9%+ |
| Error self-correction | N/A | 50%+ | 80%+ |
| Response latency | 5-15s | <5s | <3s |
| Empty response rate | ~10% | <5% | <2% |
| Think token leakage | Yes | No | No |
---
*Updated as the project evolves. Check git history for previous versions.*