Semver rename: v1-v5 → 0.1.0-0.5.0 across all files

Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release

Renamed everywhere: PLAN.md, training scripts, self-play, overnight script, status printer, whitelist app, Discord bot, all training data references.
Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with the new model name.
Entity targeting, radius-aware kill, and distance-scale training added.

Training data: seed 2,503 + tool 1,159 + self-play 5,059 = 8,721 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:14 -04:00
parent a03c0a8087
commit f39809eaca
6 changed files with 4842 additions and 21 deletions
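A minimal Python sketch of the renaming rule stated in the commit message. The helper names (`legacy_to_semver`, `ollama_tag`, `find_stale_names`) and the assumption that every legacy label has the form `vN` are hypothetical. This is not code from the repository. The sketch maps `vN` to `0.N.0`, builds the `name:tag` reference used by Ollama, flags leftover `mortdecai-vN` strings, and re-adds the dataset counts from the message.

```python
import re

# Hypothetical helpers illustrating this commit's renaming rule; not repository code.
LEGACY_LABEL = re.compile(r"^v(\d+)$")
STALE_TAG = re.compile(r"\bmortdecai-v\d+\b")


def legacy_to_semver(label: str) -> str:
    """Map a legacy 'vN' label to the pre-release version '0.N.0'."""
    match = LEGACY_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a legacy label: {label!r}")
    return f"0.{int(match.group(1))}.0"


def ollama_tag(label: str, name: str = "mortdecai") -> str:
    """Build an Ollama 'name:tag' reference, with the version as the tag."""
    return f"{name}:{legacy_to_semver(label)}"


def parse_semver(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_prerelease(version: str) -> bool:
    """Per this commit's convention, every version below 1.0.0 is pre-release."""
    return parse_semver(version)[0] == 0


def find_stale_names(text: str) -> list[str]:
    """Return leftover old-style model names such as 'mortdecai-v4'."""
    return STALE_TAG.findall(text)


# Dataset counts from the commit message.
DATASET = {"seed": 2_503, "tool": 1_159, "self-play": 5_059}
TOTAL = sum(DATASET.values())

if __name__ == "__main__":
    print(ollama_tag("v4"))  # mortdecai:0.4.0
    print(TOTAL)             # 8721
```

Comparing `parse_semver` tuples orders 0.4.0 before 0.5.0. A naive string comparison would break once a minor version reaches 10.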
@@ -18,16 +18,16 @@
 ### Models
 | Model | Base | Examples | Loss | Status |
 |-------|------|---------|------|--------|
-| v1 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
-| v2 | Qwen3-8B | 361 | 2.03 | Retired |
-| v3 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
-| **v4** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |
+| 0.1.0 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
+| 0.2.0 | Qwen3-8B | 361 | 2.03 | Retired |
+| 0.3.0 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
+| **0.4.0** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |

 ### Infrastructure
 | Component | Location | Details |
 |-----------|----------|---------|
 | Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth 2026.3.8 |
-| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai-v4 |
+| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai:0.4.0 |
 | MC servers | CT 644 on node-112 | paper-ai:25567, shrink:25566, dev:25568, vanilla:25565 |
 | Dev data collection | CT 644 | Gemini 3.1 Flash Lite (preview), 5 bots |
 | Whitelist app | CT 644:8099 | minecraft.mortdec.ai |
@@ -79,13 +79,13 @@
 ## Completed This Session

 ### Model & Training
-- [x] Mortdecai v4 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
-- [x] v4 exported to GGUF Q4_K_M (5.3GB)
-- [x] v4 deployed to prod (RTX 4000) — paper-ai + shrink-world
+- [x] Mortdecai 0.4.0 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
+- [x] 0.4.0 exported to GGUF Q4_K_M (5.3GB)
+- [x] 0.4.0 deployed to prod (RTX 4000) — paper-ai + shrink-world
 - [x] Single-call mode enabled on prod
 - [x] `/no_think` in all training data to suppress thinking tokens
 - [x] Qwen3.5-9B base bake-off: 70.1% accuracy (2x Qwen3-8B)
-- [~] v4 bake-off running on steel141
+- [~] 0.4.0 bake-off running on steel141

 ### Validator & Safety
 - [x] Error correction: detects RCON errors, asks model to fix, retries
@@ -120,7 +120,7 @@
 - [x] Fall safety (25): height math, water/slime/hay awareness, intent detection
 - [x] Suffocation (8): tp into blocks, sand/gravel crushing
 - [x] Death/environment (26): drowning, lava, void, explosions, mobs, starvation, lightning
-- [x] Revert-aware gamerules (8): revert_after field for v5
+- [x] Revert-aware gamerules (8): revert_after field for 0.5.0
 - [x] Drop/height (12): intentional drops, safe tp, slow_falling
 - [x] Enchantment error correction (7): count-before-bracket, typos, old NBT

@@ -142,11 +142,11 @@
 ## Active TODOs

 ### Immediate
-- [~] v4 bake-off running — publish results to Gitea when complete
-- [ ] Fix v4 Modelfile chat template on RTX 4000 (done, needs verification)
+- [~] 0.4.0 bake-off running — publish results to Gitea when complete
+- [ ] Fix 0.4.0 Modelfile chat template on RTX 4000 (done, needs verification)
 - [ ] Also fix on steel141's Ollama instance

-### Short-term (v5 prep)
+### Short-term (0.5.0 prep)
 - [ ] Shared memory system: per-server JSON, owner-tagged, location/preference/fact types
  - Player says "remember this is home" → AI writes location memory
  - Other players can reference: "tp me to slingshooter08's home"
@@ -158,10 +158,10 @@
 - [ ] Ingest all Gemini 3.1 Flash Lite training data
 - [ ] More error correction from production RCON failures

-### Model v5 Training
+### Model 0.5.0 Training
 - [ ] Train with tool-calling format (rcon.execute, wiki_lookup, world.player_info)
 - [ ] `revert_after` / `revert_commands` in output schema
-- [ ] Self-play generated data (200 rounds post-v4)
+- [ ] Self-play generated data (200 rounds post-0.4.0)
 - [ ] Memory read/write training examples
 - [ ] Ground-level terrain detection training
 - [ ] Fall damage math in model reasoning (not just validator)
@@ -177,7 +177,7 @@

 ### Content & Community
 - [ ] Invite more playtesters via minecraft.mortdec.ai
-- [ ] Update mortdec.ai README with v4 bake-off results
+- [ ] Update mortdec.ai README with 0.4.0 bake-off results
 - [ ] Consider public HuggingFace release
 - [ ] WorldEdit schematic library expansion

@@ -208,13 +208,13 @@ Commands classified by permanence:
 | Date | Decision | Rationale |
 |------|----------|-----------|
 | 03-18 | gemma3n:e4b for initial prod | Bake-off winner at 80.6% accuracy |
-| 03-18 | Qwen3-8B for v1-v3 training | Best syntax quality, Apache 2.0 |
+| 03-18 | Qwen3-8B for 0.1.0-0.3.0 training | Best syntax quality, Apache 2.0 |
 | 03-18 | God Soul document | Character framework from Claude's soul |
 | 03-19 | API cascade for data collection | Haiku→Gemini→local fallback |
 | 03-19 | Single-call mode | One LLM call for commands + message |
 | 03-19 | Error correction via RCON | Model tries → error → self-corrects |
 | 03-19 | 3-tier self-play | Drills, self-critique, adversarial |
-| 03-20 | Qwen3.5-9B for v4 | 2x base accuracy, native tool-calling |
+| 03-20 | Qwen3.5-9B for 0.4.0 | 2x base accuracy, native tool-calling |
 | 03-20 | Gamerule revert timers | Permanence determines risk level |
 | 03-20 | Dangerous effect caps | Validator hardcodes max durations |
 | 03-20 | Fall protection | Health check + intent detection before tp |
@@ -225,7 +225,7 @@ Commands classified by permanence:

 ## Success Criteria

-| Metric | v3 | v4 (target) | v5 (goal) |
+| Metric | 0.3.0 | 0.4.0 (target) | 0.5.0 (goal) |
 |--------|:-:|:-:|:-:|
 | Command accuracy | ~70% | 85%+ | 95%+ |
 | Safety compliance | ~95% | 99%+ | 99.9%+ |
@@ -5,7 +5,7 @@
 # Usage: nohup ./overnight_selfplay.sh > /var/log/selfplay_overnight.log 2>&1 &
 # Kill with: pkill -f overnight_selfplay ; pkill -f self_play.py

-MODEL="${1:-mortdecai-v4}"
+MODEL="${1:-mortdecai:0.4.0}"
 RCON_HOST="${2:-192.168.0.244}"
 RCON_PORT="${3:-25578}"
 ROUNDS_PER_TIER=50
@@ -546,7 +546,7 @@ def trace_to_training(trace):

 def main():
    parser = argparse.ArgumentParser(description="Self-play training data generator")
-    parser.add_argument("--model", default="qwen3-8b-mc-lora-v3")
+    parser.add_argument("--model", default="mortdecai:0.4.0")
    parser.add_argument("--ollama-url", default="http://192.168.0.141:11434")
    parser.add_argument("--api-key", default=None, help="API key for authenticated gateways")
    parser.add_argument("--rcon-host", default="192.168.0.244")
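After this commit, the overnight script's first positional argument and `self_play.py`'s `--model` flag both default to `mortdecai:0.4.0`. The sketch below is a hypothetical, trimmed-down copy of the parser that keeps only the arguments visible in the hunk. It adds an assumed helper, `shell_default`, that imitates bash `${N:-default}` expansion. It shows how defaults and overrides resolve and is not the script's actual code.

```python
import argparse

# Default introduced by this commit for both entry points.
DEFAULT_MODEL = "mortdecai:0.4.0"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of self_play.py's parser (only the arguments in the hunk)."""
    parser = argparse.ArgumentParser(description="Self-play training data generator")
    parser.add_argument("--model", default=DEFAULT_MODEL)
    parser.add_argument("--ollama-url", default="http://192.168.0.141:11434")
    parser.add_argument("--api-key", default=None, help="API key for authenticated gateways")
    parser.add_argument("--rcon-host", default="192.168.0.244")
    return parser


def shell_default(args: list[str], n: int, default: str) -> str:
    """Imitate bash ${n:-default}: fall back when $n is unset or empty.

    `args` holds the positional parameters only, so $1 is args[0].
    """
    value = args[n - 1] if len(args) >= n else ""
    return value or default


if __name__ == "__main__":
    ns = build_parser().parse_args([])
    print(ns.model, ns.ollama_url)  # mortdecai:0.4.0 http://192.168.0.141:11434
    print(shell_default([], 1, DEFAULT_MODEL))  # mortdecai:0.4.0
```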