PLAN.md complete update — v4 deployed, all session work documented
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,6 +1,6 @@
|
|||||||
# PLAN.md — Mortdecai Project Roadmap
|
# PLAN.md — Mortdecai Project Roadmap
|
||||||
|
|
||||||
> **Last updated:** 2026-03-20
|
> **Last updated:** 2026-03-20 04:45 UTC
|
||||||
> **Model name:** Mortdecai
|
> **Model name:** Mortdecai
|
||||||
> **Domain:** mortdec.ai
|
> **Domain:** mortdec.ai
|
||||||
> **Status legend:** `[ ]` planned | `[~]` in progress | `[x]` done | `[-]` cancelled/deferred
|
> **Status legend:** `[ ]` planned | `[~]` in progress | `[x]` done | `[-]` cancelled/deferred
|
||||||
@@ -9,41 +9,47 @@
|
|||||||
|
|
||||||
## Vision
|
## Vision
|
||||||
|
|
||||||
**Mortdecai** is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play.
|
**Mortdecai** is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play. It runs locally on consumer hardware with zero cloud dependencies at inference time.
|
||||||
|
|
||||||
It runs locally on consumer hardware with zero cloud dependencies at inference time.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Current State (2026-03-20)
|
## Current State
|
||||||
|
|
||||||
### Models
|
### Models
|
||||||
| Model | Base | Examples | Loss | Status |
|
| Model | Base | Examples | Loss | Status |
|
||||||
|-------|------|---------|------|--------|
|
|-------|------|---------|------|--------|
|
||||||
| v1 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
|
| v1 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
|
||||||
| v2 | Qwen3-8B | 361 | 2.03 | Retired |
|
| v2 | Qwen3-8B | 361 | 2.03 | Retired |
|
||||||
| v3 | Qwen3-8B | 1,308 | 0.55 | **Deployed on prod** |
|
| v3 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
|
||||||
| **v4** | **Qwen3.5-9B** | **3,369** | **Training (~88%)** | ETA ~30 min |
|
| **v4** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |
|
||||||
|
|
||||||
### Infrastructure
|
### Infrastructure
|
||||||
| Component | Location | Details |
|
| Component | Location | Details |
|
||||||
|-----------|----------|---------|
|
|-----------|----------|---------|
|
||||||
| Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth |
|
| Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth 2026.3.8 |
|
||||||
| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai-v3 |
|
| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai-v4 |
|
||||||
| Minecraft servers | CT 644 on node-112 | paper-ai (25567), shrink-world (25566), dev (25568), vanilla (25565) |
|
| MC servers | CT 644 on node-112 | paper-ai:25567, shrink:25566, dev:25568, vanilla:25565 |
|
||||||
| Dev data collection | CT 644 | Gemini 2.5 Flash via API, 10 bots |
|
| Dev data collection | CT 644 | Gemini 3.1 Flash Lite (preview), 5 bots |
|
||||||
| Whitelist app | CT 644 port 8099 | minecraft.mortdec.ai |
|
| Whitelist app | CT 644:8099 | minecraft.mortdec.ai |
|
||||||
| Caddy proxy | CT 600 on node-241 | mortdec.ai, minecraft.mortdec.ai |
|
| Caddy proxy | CT 600 on node-241 | mortdec.ai, minecraft.mortdec.ai |
|
||||||
| GPU monitoring | Grafana on CT 300 (node-173) | Prometheus + nvidia exporter on steel141 |
|
| GPU monitoring | Grafana CT 300 (node-173) | Prometheus + nvidia exporter on steel141 |
|
||||||
| LangGraph gateway | CT 644 port 8091 | Disabled on prod (fresh session mode available) |
|
| Greenfield map | paper-ai | Downloaded, world swapped, needs MCSManager start |
|
||||||
|
| WorldEdit schematics | paper-ai | 77 installed in FAWE/schematics/ |
|
||||||
|
|
||||||
### API Spend
|
### API Spend
|
||||||
| Provider | Spent | Budget | Status |
|
| Provider | Spent | Budget | Status |
|
||||||
|----------|-------|--------|--------|
|
|----------|-------|--------|--------|
|
||||||
| Claude Haiku | $20.01 | $20 | Exhausted |
|
| Claude Haiku | $20.01 | $20 | Exhausted |
|
||||||
| Gemini 2.5 Flash | ~$0.50 | $20 | Active (dev bots) |
|
| Gemini (all) | ~$0.50 | $20 | Active on dev (3.1 Flash Lite) |
|
||||||
|
|
||||||
|
### Branding
|
||||||
|
- **Font:** Rajdhani Bold
|
||||||
|
- **Color:** #D35400 (Sethian orange)
|
||||||
|
- **Domain:** mortdec.ai → Gitea repo, minecraft.mortdec.ai → whitelist page
|
||||||
|
- **Public repo:** https://git.sethpc.xyz/Seth/Mortdecai
|
||||||
|
|
||||||
|
### Training Data: 2,397 seed + 1,159 tool-calling = 3,556 total
|
||||||
|
|
||||||
### Training Data: 2,318 seed + 1,159 tool-calling = 3,477 total
|
|
||||||
| Category | Count |
|
| Category | Count |
|
||||||
|----------|-------|
|
|----------|-------|
|
||||||
| Command syntax reference | 107 |
|
| Command syntax reference | 107 |
|
||||||
@@ -56,120 +62,171 @@ It runs locally on consumer hardware with zero cloud dependencies at inference t
|
|||||||
| WorldEdit | 45 |
|
| WorldEdit | 45 |
|
||||||
| Paper server features | 55 |
|
| Paper server features | 55 |
|
||||||
| Cosmetics/XP/effects | 42 |
|
| Cosmetics/XP/effects | 42 |
|
||||||
| Gamerules | 49 |
|
| Gamerules (49) + risk hierarchy (40) | 89 |
|
||||||
| Risk hierarchy (L0-L3, prompt injection) | 40 |
|
|
||||||
| Quantity boundaries | 32 |
|
| Quantity boundaries | 32 |
|
||||||
| Dangerous effect caps | 12 |
|
| Dangerous effect caps (levitation, wither, etc.) | 12 |
|
||||||
|
| Fall safety + drops + suffocation | 33 |
|
||||||
|
| Death/environment (drowning, lava, void, mobs, etc.) | 26 |
|
||||||
| Revert-aware gamerules + drops | 20 |
|
| Revert-aware gamerules + drops | 20 |
|
||||||
| Error correction pairs | 47 |
|
| Error correction pairs (enchant order, NBT, etc.) | 54 |
|
||||||
| Claude-distilled outputs | 344 |
|
| Claude-distilled outputs | 344 |
|
||||||
| Bot audit interactions | 448+ |
|
| Bot audit interactions | 448+ |
|
||||||
| Boundary/safety examples | 95+ |
|
| Boundary/safety/prompt injection | 95+ |
|
||||||
| Tool-calling (multi-turn with RCON) | 1,159 |
|
| Tool-calling (multi-turn with RCON) | 1,159 |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## Completed This Session
|
||||||
|
|
||||||
|
### Model & Training
|
||||||
|
- [x] Mortdecai v4 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
|
||||||
|
- [x] v4 exported to GGUF Q4_K_M (5.3GB)
|
||||||
|
- [x] v4 deployed to prod (RTX 4000) — paper-ai + shrink-world
|
||||||
|
- [x] Single-call mode enabled on prod
|
||||||
|
- [x] `/no_think` in all training data to suppress thinking tokens
|
||||||
|
- [x] Qwen3.5-9B base bake-off: 70.1% accuracy (2x Qwen3-8B)
|
||||||
|
- [~] v4 bake-off running on steel141
|
||||||
|
|
||||||
|
### Validator & Safety
|
||||||
|
- [x] Error correction: detects RCON errors, asks model to fix, retries
|
||||||
|
- [x] Broadened error patterns: `<--[HERE]` universal catch
|
||||||
|
- [x] `kill @a` blocked (players only)
|
||||||
|
- [x] `tp minecraft:spawn` → safe coordinates
|
||||||
|
- [x] Fire fallback won't trigger on "firework"
|
||||||
|
- [x] Dangerous effect caps: levitation 15s, wither 30s, poison 60s, nausea 30s
|
||||||
|
- [x] Fall protection: detects lethal tp, adds slow_falling unless intentional
|
||||||
|
- [x] Gamerule revert timers: auto-revert after 5-10 min (configurable)
|
||||||
|
- [x] Expanded safe_prefixes: gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc.
|
||||||
|
- [x] Validator hit-rate tracking to /var/log/mc_validator_stats.json
|
||||||
|
- [x] Command format examples (RIGHT vs WRONG) in prompt
|
||||||
|
- [x] max_tokens bumped to 600 for command calls
|
||||||
|
- [x] Removed template workflow from sudo prompt
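
The error-correction items above (detect an RCON error, ask the model to fix it, retry) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's validator: `run_with_correction`, `execute`, `ask_model_to_fix`, and `max_retries` are hypothetical names, and only the `<--[HERE]` marker is taken from the plan.

```python
from typing import Callable

# Marker the plan names as the universal catch for RCON syntax errors.
ERROR_MARKER = "<--[HERE]"


def looks_like_error(rcon_reply: str) -> bool:
    """Return True when the RCON reply contains the error marker."""
    return ERROR_MARKER in rcon_reply


def run_with_correction(
    command: str,
    execute: Callable[[str], str],                # hypothetical: sends a command over RCON
    ask_model_to_fix: Callable[[str, str], str],  # hypothetical: (command, error) -> new command
    max_retries: int = 2,                         # assumed retry budget
) -> tuple[str, str, int]:
    """Run a command; on error, ask the model for a fix and retry.

    Returns (final_command, final_reply, attempts).
    """
    reply = execute(command)
    attempts = 1
    while looks_like_error(reply) and attempts <= max_retries:
        command = ask_model_to_fix(command, reply)
        reply = execute(command)
        attempts += 1
    return command, reply, attempts
```

Passing `execute` and `ask_model_to_fix` as callables keeps the loop testable without a live server or model.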
|
||||||
|
|
||||||
|
### Infrastructure
|
||||||
|
- [x] Ollama updated on steel141 + RTX 4000 (Qwen3.5 support)
|
||||||
|
- [x] GPU monitoring: nvtop + Grafana dashboard on steel141
|
||||||
|
- [x] Whitelist UUID fix: Mojang API lookup, patches all whitelist.json files
|
||||||
|
- [x] mortdec.ai + minecraft.mortdec.ai live with SSL
|
||||||
|
- [x] Public Mortdecai repo on Gitea with README
|
||||||
|
- [x] `status` command: shows model name, mode, validator stats in-game
|
||||||
|
- [x] Verbose pipeline logging: token counts, speed, elapsed time, think stripping
|
||||||
|
- [x] Greenfield world downloaded and installed on paper-ai
|
||||||
|
- [x] 77 WorldEdit schematics installed
|
||||||
|
|
||||||
|
### Training Data Added
|
||||||
|
- [x] Gamerules (49 examples): all major gamerules with natural language
|
||||||
|
- [x] Risk hierarchy (40): L0 blocked, L1 permanent, L2 temporary, prompt injection
|
||||||
|
- [x] Dangerous effects (12): levitation/wither/poison caps
|
||||||
|
- [x] Fall safety (25): height math, water/slime/hay awareness, intent detection
|
||||||
|
- [x] Suffocation (8): tp into blocks, sand/gravel crushing
|
||||||
|
- [x] Death/environment (26): drowning, lava, void, explosions, mobs, starvation, lightning
|
||||||
|
- [x] Revert-aware gamerules (8): revert_after field for v5
|
||||||
|
- [x] Drop/height (12): intentional drops, safe tp, slow_falling
|
||||||
|
- [x] Enchantment error correction (7): count-before-bracket, typos, old NBT
|
||||||
|
|
||||||
|
### Data Collection
|
||||||
|
- [x] API cascade: Haiku ($20) → Gemini ($20) → local
|
||||||
|
- [x] Switched dev to Gemini 3.1 Flash Lite (preview) with 5 bots
|
||||||
|
- [x] Dynamic pricing by Gemini model name
|
||||||
|
- [x] Async prayer/sudo processing (ThreadPoolExecutor, 3 workers)
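
The Haiku → Gemini → local cascade listed above amounts to "use the first provider that still has budget". A minimal sketch, assuming a simple per-provider spend counter: the `Provider` class and `pick_provider` function are hypothetical, and the dollar figures are the ones from the API Spend table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    name: str
    budget_usd: Optional[float]  # None means no budget cap (local model)
    spent_usd: float = 0.0

    def has_budget(self) -> bool:
        """A provider is usable while its spend is below its budget."""
        return self.budget_usd is None or self.spent_usd < self.budget_usd


def pick_provider(cascade: list[Provider]) -> Provider:
    """Return the first provider, in cascade order, that still has budget."""
    for provider in cascade:
        if provider.has_budget():
            return provider
    raise RuntimeError("no provider available")


# Spend figures as reported in the API Spend table.
cascade = [
    Provider("haiku", budget_usd=20.0, spent_usd=20.01),
    Provider("gemini", budget_usd=20.0, spent_usd=0.50),
    Provider("local", budget_usd=None),
]
```

With these figures, `pick_provider(cascade)` skips the exhausted Haiku entry and selects Gemini, which matches the "Active on dev" status in the table.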
|
||||||
|
|
||||||
|
### Branding
|
||||||
|
- [x] Model named Mortdecai
|
||||||
|
- [x] mortdec.ai domain purchased and configured
|
||||||
|
- [x] Rajdhani Bold as official font
|
||||||
|
- [x] Logo variants generated (6 fonts × 2 text versions)
|
||||||
|
- [x] Whitelist page branded with Mortdecai logo
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Active TODOs
|
## Active TODOs
|
||||||
|
|
||||||
### Immediate (this session)
|
### Immediate
|
||||||
- [~] Mortdecai v4 training completing (~30 min remaining)
|
- [~] v4 bake-off running — publish results to Gitea when complete
|
||||||
- [ ] Export v4 to GGUF, deploy to RTX 4000 as mortdecai-v4
|
- [ ] Fix v4 Modelfile chat template on RTX 4000 (done, needs verification)
|
||||||
- [ ] Enable single-call mode on prod with v4
|
- [ ] Also fix on steel141's Ollama instance
|
||||||
- [ ] Run v4 bake-off and compare to v3/base
|
|
||||||
- [ ] Commit and push all PaperFork changes
|
|
||||||
|
|
||||||
### Short-term
|
### Short-term (v5 prep)
|
||||||
- [ ] Deploy self-play loop with v4 (3-tier: drills, self-critique, adversarial)
|
- [ ] Shared memory system: per-server JSON, owner-tagged, location/preference/fact types
|
||||||
- [ ] Add ground-level detection for teleport safety (query terrain before tp)
|
- Player says "remember this is home" → AI writes location memory
|
||||||
- [ ] Build revert_after/revert_commands into v5 training format
|
- Other players can reference: "tp me to slingshooter08's home"
|
||||||
- [ ] Add Gemini milestone POS printing
|
- Memory in context for location lookups, tool call for read/write
|
||||||
- [ ] Fix whitelist app UUID lookup for vanilla server path
|
- [ ] `memory_write` field in model output schema
|
||||||
- [ ] Start Greenfield world on paper-ai (downloaded, needs MCSManager start)
|
- [ ] Setblock training data expansion
|
||||||
|
- [ ] `world.check_block` tool for terrain queries before tp
|
||||||
### Model improvements for v5
|
- [ ] Self-play loop deployment (3-tier: drills, self-critique, adversarial)
|
||||||
- [ ] Train on Qwen3.5-9B with tool-calling format (rcon.execute, wiki_lookup, etc.)
|
- [ ] Ingest all Gemini 3.1 Flash Lite training data
|
||||||
- [ ] Self-play generated data (run 200 rounds after v4 deploys)
|
|
||||||
- [ ] Ingest all Gemini 2.5 Flash training data ($20 worth)
|
|
||||||
- [ ] Add revert_after field to training output format
|
|
||||||
- [ ] Ground-level detection training (check terrain before tp)
|
|
||||||
- [ ] More error correction from production RCON failures
|
- [ ] More error correction from production RCON failures
|
||||||
- [ ] Enchantment count-before-bracket error correction
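
The planned shared memory system (per-server JSON, owner-tagged entries, `location`/`preference`/`fact` types) could look roughly like this. It is a sketch only: the file layout, the `write_memory`/`read_memory` helpers, and the entry fields are assumptions, since the plan does not specify a schema.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Memory types named in the plan.
MEMORY_TYPES = {"location", "preference", "fact"}


def _load(path: Path) -> list[dict]:
    """Read all entries from a per-server JSON file (empty if missing)."""
    return json.loads(path.read_text()) if path.exists() else []


def write_memory(path: Path, owner: str, mem_type: str, key: str, value: Any) -> None:
    """Store an owner-tagged entry, replacing any entry with the same owner/type/key."""
    if mem_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {mem_type}")
    entries = [
        e for e in _load(path)
        if (e["owner"], e["type"], e["key"]) != (owner, mem_type, key)
    ]
    entries.append({"owner": owner, "type": mem_type, "key": key, "value": value})
    path.write_text(json.dumps(entries, indent=2))


def read_memory(path: Path, owner: str, mem_type: str, key: str) -> Optional[Any]:
    """Look up another player's (or your own) entry; None when absent."""
    for e in _load(path):
        if (e["owner"], e["type"], e["key"]) == (owner, mem_type, key):
            return e["value"]
    return None
```

Under this layout, "remember this is home" would become a `write_memory(..., "location", "home", coords)` call, and "tp me to slingshooter08's home" would resolve through `read_memory` with `owner="slingshooter08"`.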
|
|
||||||
|
### Model v5 Training
|
||||||
|
- [ ] Train with tool-calling format (rcon.execute, wiki_lookup, world.player_info)
|
||||||
|
- [ ] `revert_after` / `revert_commands` in output schema
|
||||||
|
- [ ] Self-play generated data (200 rounds post-v4)
|
||||||
|
- [ ] Memory read/write training examples
|
||||||
|
- [ ] Ground-level terrain detection training
|
||||||
|
- [ ] Fall damage math in model reasoning (not just validator)
|
||||||
|
- [ ] Setblock + block state training
|
||||||
|
- [ ] More death mechanics awareness in reasoning
|
||||||
|
|
||||||
### Infrastructure
|
### Infrastructure
|
||||||
- [ ] Add GPU monitoring for RTX 4000 (second exporter)
|
- [ ] GPU monitoring for RTX 4000 (second exporter)
|
||||||
- [ ] Validator hit-rate analysis — remove fixes that fire <1%
|
- [ ] Validator hit-rate analysis — remove fixes that fire <1%
|
||||||
- [ ] Automate training pipeline: ingest → dedup → train → export → deploy
|
- [ ] Automate training pipeline: ingest → dedup → train → export → deploy
|
||||||
- [ ] POS receipt for Gemini milestones
|
- [ ] POS receipt for Gemini milestones
|
||||||
- [ ] Consider moving to Mortdecai as Ollama model name on prod
|
- [ ] Start Greenfield world via MCSManager
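
The hit-rate analysis item above ("remove fixes that fire <1%") reduces to a ratio per fix. A small sketch, assuming the stats are available as a mapping from fix name to fire count plus a total command count. The actual layout of `/var/log/mc_validator_stats.json` is not described in the plan, so the input shape and the `rarely_firing` helper are hypothetical.

```python
def rarely_firing(
    fix_counts: dict[str, int],
    total_commands: int,
    threshold: float = 0.01,  # the plan's "<1%" cutoff
) -> list[str]:
    """Return fix names whose hit rate is below the threshold, sorted by name."""
    if total_commands <= 0:
        return []
    return sorted(
        name for name, fired in fix_counts.items()
        if fired / total_commands < threshold
    )
```

For example, with 1,000 validated commands, a fix that fired 5 times (0.5%) would be flagged and one that fired 40 times (4%) would not.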
|
||||||
|
|
||||||
### Content & Community
|
### Content & Community
|
||||||
- [ ] Invite more playtesters via minecraft.mortdec.ai
|
- [ ] Invite more playtesters via minecraft.mortdec.ai
|
||||||
- [ ] Update mortdec.ai README with v4 results when available
|
- [ ] Update mortdec.ai README with v4 bake-off results
|
||||||
- [ ] Consider public HuggingFace release once quality is validated
|
- [ ] Consider public HuggingFace release
|
||||||
- [ ] WorldEdit schematic library expansion (77 installed, need more)
|
- [ ] WorldEdit schematic library expansion
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Risk Hierarchy
|
## Risk Hierarchy
|
||||||
|
|
||||||
Commands are classified by permanence, not just danger:
|
Commands classified by permanence:
|
||||||
|
|
||||||
| Level | Permanence | Examples | Model behavior |
|
| Level | Permanence | Examples | Model behavior |
|
||||||
|:-----:|-----------|----------|----------------|
|
|:-----:|-----------|----------|----------------|
|
||||||
| **0** | Irreversible/admin | ban, kick, stop, op, deop | Never execute |
|
| **0** | Irreversible/admin | ban, kick, stop, op, deop, whitelist | Never execute |
|
||||||
| **1** | Permanent toggle | gamemode @a, permanent gamerules, difficulty | Refuse or execute for self only |
|
| **1** | Permanent toggle | gamemode @a, permanent gamerules, difficulty | Execute for self only, refuse for @a |
|
||||||
| **2** | Temporary/reversible | gamerules with time limits, brief difficulty | Allow, schedule auto-revert |
|
| **2** | Temporary/reversible | gamerules with time limits, brief changes | Allow, schedule auto-revert |
|
||||||
| **3** | Transient | time, weather, tick speed, chat settings | Execute freely |
|
| **3** | Transient | time, weather, tick speed, chat settings | Execute freely |
|
||||||
| **4** | Generous | full enchanted gear, large material stacks | Execute for worthy requests |
|
| **4** | Generous | full enchanted gear, large material stacks | Execute for worthy requests |
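
As a rough illustration of the table, levels 0 to 3 can be approximated by looking at the command verb and whether a revert is scheduled. This is a hedged sketch, not the project's classifier: `risk_level` and the verb sets are assumptions drawn from the Examples column, and level 4 (which depends on quantities rather than verbs) is not modeled.

```python
from typing import Optional

NEVER_EXECUTE = {"ban", "kick", "stop", "op", "deop", "whitelist"}  # level 0
PERMANENT_TOGGLES = {"gamemode", "gamerule", "difficulty"}          # level 1, or 2 with a revert
TRANSIENT = {"time", "weather"}                                     # level 3


def risk_level(command: str, has_revert: bool = False) -> Optional[int]:
    """Map a command to a risk level from the table, or None if unclassified."""
    parts = command.strip().lstrip("/").split()
    if not parts:
        return None
    verb = parts[0].lower()
    if verb in NEVER_EXECUTE:
        return 0
    if verb in PERMANENT_TOGGLES:
        # A scheduled auto-revert turns a permanent toggle into a temporary one.
        return 2 if has_revert else 1
    if verb in TRANSIENT:
        return 3
    return None
```

The `has_revert` flag reflects the table's central idea that permanence, not the command itself, sets the level: the same `gamerule` call is level 1 when permanent and level 2 when it will be reverted.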
|
||||||
|
|
||||||
Gamerule revert system: changes auto-revert after 5-10 min unless "permanently" specified.
|
**Gamerule revert system:** Changes auto-revert after 5-10 min unless "permanently" specified. Player notified of countdown.
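
One way to picture the revert rule is as a plan object that carries the revert alongside the change. The sketch below is an assumption-laden illustration: `plan_gamerule_change` is a hypothetical helper, the 300-second default is simply the lower end of the stated 5-10 min window, and the `revert_after`/`revert_commands` keys mirror the field names this plan proposes for v5.

```python
def plan_gamerule_change(
    rule: str,
    new_value: str,
    old_value: str,
    request_text: str,
    revert_after_s: int = 300,  # assumed default: 5 minutes
) -> dict:
    """Build the commands for a gamerule change plus an optional auto-revert."""
    plan: dict = {"commands": [f"gamerule {rule} {new_value}"]}
    # The change stays in place only when the player explicitly asks for it.
    if "permanently" not in request_text.lower():
        plan["revert_after"] = revert_after_s
        plan["revert_commands"] = [f"gamerule {rule} {old_value}"]
    return plan
```

Keeping the revert commands in the same structure as the change means a scheduler only needs the plan itself to undo it later.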
|
||||||
|
|
||||||
Dangerous effect caps (hardcoded in validator):
|
**Dangerous effect caps (hardcoded):** Levitation 15s, Wither 30s, Poison 60s, Nausea 30s.
|
||||||
- Levitation: 15s max
|
|
||||||
- Wither: 30s max
|
**Fall protection:** Lethal tp detected → slow_falling added unless intent words present (drop, yeet, throw, kill me).
|
||||||
- Poison: 60s max
|
|
||||||
- Nausea: 30s max
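
The effect caps and the fall-protection intent check can be sketched as two small validator helpers. The cap values and intent words are the ones stated in this plan; the function names, the `minecraft:` prefix handling, and the word-boundary matching are assumptions for illustration.

```python
import re

# Maximum durations in seconds, as listed in the plan.
EFFECT_CAPS_S = {"levitation": 15, "wither": 30, "poison": 60, "nausea": 30}

# Intent words from the plan; word boundaries avoid matching e.g. "raindrop".
INTENT_RE = re.compile(r"\b(?:drop|yeet|throw|kill me)\b", re.IGNORECASE)


def cap_duration(effect: str, seconds: int) -> int:
    """Clamp the duration of a dangerous effect; other effects pass through."""
    name = effect.removeprefix("minecraft:")
    cap = EFFECT_CAPS_S.get(name)
    return seconds if cap is None else min(seconds, cap)


def wants_fall(request_text: str) -> bool:
    """True when the request contains one of the intent words."""
    return INTENT_RE.search(request_text) is not None


def needs_slow_falling(tp_is_lethal: bool, request_text: str) -> bool:
    """Add slow_falling only for lethal teleports the player did not ask for."""
    return tp_is_lethal and not wants_fall(request_text)
```

How a teleport is judged lethal (height math, landing block) is left out here, so `tp_is_lethal` is taken as an input.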
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Key Architecture Decisions
|
## Key Decisions
|
||||||
|
|
||||||
| Date | Decision | Rationale |
|
| Date | Decision | Rationale |
|
||||||
|------|----------|-----------|
|
|------|----------|-----------|
|
||||||
| 2026-03-18 | Serving: gemma3n:e4b → qwen3.5:9b → mortdecai-v3 | Progressive upgrades as better models trained |
|
| 03-18 | gemma3n:e4b for initial prod | Bake-off winner at 80.6% accuracy |
|
||||||
| 2026-03-18 | Fine-tuning: Qwen3-8B → Qwen3.5-9B | 3.5 has 2x base accuracy (70% vs 34%), native tool-calling |
|
| 03-18 | Qwen3-8B for v1-v3 training | Best syntax quality, Apache 2.0 |
|
||||||
| 2026-03-18 | God Soul document | Character framework adapted from Claude's soul. Defines identity, judgment, quantity boundaries |
|
| 03-18 | God Soul document | Character framework from Claude's soul |
|
||||||
| 2026-03-19 | API cascade: Haiku → Gemini → local | Progressive fallback for dev data collection. $40 total API budget |
|
| 03-19 | API cascade for data collection | Haiku→Gemini→local fallback |
|
||||||
| 2026-03-19 | /no_think in training | Prevents Qwen3 thinking tokens from consuming output budget |
|
| 03-19 | Single-call mode | One LLM call for commands + message |
|
||||||
| 2026-03-19 | Single-call mode | One LLM call for commands + message (v3+). Two-call for older models |
|
| 03-19 | Error correction via RCON | Model tries → error → self-corrects |
|
||||||
| 2026-03-19 | Error correction via RCON | Model tries command → RCON error → model self-corrects → retry |
|
| 03-19 | 3-tier self-play | Drills, self-critique, adversarial |
|
||||||
| 2026-03-19 | 3-tier self-play | Drills, self-critique, adversarial. Model generates its own training data |
|
| 03-20 | Qwen3.5-9B for v4 | 2x base accuracy, native tool-calling |
|
||||||
| 2026-03-20 | Gamerule revert timers | State changes auto-revert. Permanence determines risk level |
|
| 03-20 | Gamerule revert timers | Permanence determines risk level |
|
||||||
| 2026-03-20 | Dangerous effect caps | Validator hardcodes max durations for levitation, wither, poison, nausea |
|
| 03-20 | Dangerous effect caps | Validator hardcodes max durations |
|
||||||
| 2026-03-20 | Expanded safe_prefixes | gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc. |
|
| 03-20 | Fall protection | Health check + intent detection before tp |
|
||||||
| 2026-03-20 | Model named Mortdecai | mortdec.ai domain, Rajdhani Bold font, Sethian orange branding |
|
| 03-20 | Shared player memory (planned) | Owner-tagged, cross-player, AI-managed |
|
||||||
|
| 03-20 | Mortdecai branding | Rajdhani Bold, #D35400, mortdec.ai |
|
||||||
---
|
|
||||||
|
|
||||||
## Dev Server
|
|
||||||
|
|
||||||
| Property | Value |
|
|
||||||
|----------|-------|
|
|
||||||
| Location | CT 644 on node-112 |
|
|
||||||
| Game port | 25568 |
|
|
||||||
| RCON port | 25578 |
|
|
||||||
| RCON password | REDACTED_RCON |
|
|
||||||
| Data dir | /opt/paper-dev-25568/ |
|
|
||||||
| AI God | Gemini 2.5 Flash via API cascade |
|
|
||||||
| Bots | 10 swarm bots (swarm_bots.js) |
|
|
||||||
| Audit log | /var/log/mc_training_audit_dev.jsonl |
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Success Criteria
|
## Success Criteria
|
||||||
|
|
||||||
| Metric | v3 (current) | v4 (target) | v5 (goal) |
|
| Metric | v3 | v4 (target) | v5 (goal) |
|
||||||
|--------|:-----------:|:-----------:|:---------:|
|
|--------|:-:|:-:|:-:|
|
||||||
| Command accuracy | ~70% | 85%+ | 95%+ |
|
| Command accuracy | ~70% | 85%+ | 95%+ |
|
||||||
| Safety compliance | ~95% | 99%+ | 99.9%+ |
|
| Safety compliance | ~95% | 99%+ | 99.9%+ |
|
||||||
| Error self-correction | N/A | 50%+ | 80%+ |
|
| Error self-correction | N/A | 50%+ | 80%+ |
|
||||||
@@ -179,4 +236,4 @@ Dangerous effect caps (hardcoded in validator):
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
*This document is updated as the project evolves. Check git history for previous versions.*
|
*Updated as the project evolves. Check git history for previous versions.*
|
||||||
|
|||||||