Semver rename: v1-v5 → 0.1.0-0.5.0 across all files

Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release

Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, Discord bot, all training data references.

Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.
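
The rename rule described above (legacy `vN` labels become `0.N.0`, and the Ollama model `mortdecai-v4` becomes `mortdecai:0.4.0`) could be sketched as follows. This is only an illustration of the mapping; the function names `legacy_to_semver` and `legacy_to_ollama_tag` are hypothetical and do not come from the repository.

```python
import re


def legacy_to_semver(label: str) -> str:
    """Map a legacy label such as 'v4' to '0.4.0' (MAJOR stays 0 before 1.0.0)."""
    match = re.fullmatch(r"v(\d+)", label)
    if match is None:
        raise ValueError(f"not a legacy version label: {label!r}")
    return f"0.{int(match.group(1))}.0"


def legacy_to_ollama_tag(name: str) -> str:
    """Map a legacy model name such as 'mortdecai-v4' to 'mortdecai:0.4.0'."""
    base, sep, label = name.rpartition("-")
    if not sep or not base:
        raise ValueError(f"expected '<name>-v<N>', got {name!r}")
    return f"{base}:{legacy_to_semver(label)}"


if __name__ == "__main__":
    for old in ("mortdecai-v3", "mortdecai-v4"):
        print(old, "->", legacy_to_ollama_tag(old))
```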

Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples
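
As a quick sanity check on the total above, the three counts from the commit message can be summed directly. The dictionary keys are illustrative labels, not names used in the repository.

```python
# Example counts copied from the commit message.
counts = {"seed": 2503, "tool": 1159, "self_play": 5059}

total = sum(counts.values())
print(f"{total:,} total examples")
```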

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:14 -04:00
parent a03c0a8087
commit f39809eaca
6 changed files with 4842 additions and 21 deletions
+19 -19
@@ -18,16 +18,16 @@
### Models
| Model | Base | Examples | Loss | Status |
|-------|------|---------|------|--------|
-| v1 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
-| v2 | Qwen3-8B | 361 | 2.03 | Retired |
-| v3 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
-| **v4** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |
+| 0.1.0 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
+| 0.2.0 | Qwen3-8B | 361 | 2.03 | Retired |
+| 0.3.0 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
+| **0.4.0** | **Qwen3.5-9B** | **3,369** | **0.20** | **Deployed on prod** |
### Infrastructure
| Component | Location | Details |
|-----------|----------|---------|
| Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth 2026.3.8 |
-| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai-v4 |
+| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai:0.4.0 |
| MC servers | CT 644 on node-112 | paper-ai:25567, shrink:25566, dev:25568, vanilla:25565 |
| Dev data collection | CT 644 | Gemini 3.1 Flash Lite (preview), 5 bots |
| Whitelist app | CT 644:8099 | minecraft.mortdec.ai |
@@ -79,13 +79,13 @@
## Completed This Session
### Model & Training
-- [x] Mortdecai v4 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
-- [x] v4 exported to GGUF Q4_K_M (5.3GB)
-- [x] v4 deployed to prod (RTX 4000) — paper-ai + shrink-world
+- [x] Mortdecai 0.4.0 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
+- [x] 0.4.0 exported to GGUF Q4_K_M (5.3GB)
+- [x] 0.4.0 deployed to prod (RTX 4000) — paper-ai + shrink-world
- [x] Single-call mode enabled on prod
- [x] `/no_think` in all training data to suppress thinking tokens
- [x] Qwen3.5-9B base bake-off: 70.1% accuracy (2x Qwen3-8B)
-- [~] v4 bake-off running on steel141
+- [~] 0.4.0 bake-off running on steel141
### Validator & Safety
- [x] Error correction: detects RCON errors, asks model to fix, retries
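
The error-correction item above (detect an RCON error, ask the model to fix it, retry) can be sketched as a small loop. This is a minimal sketch under assumptions: `generate` and `run_rcon` are hypothetical callables standing in for the model call and the RCON client, and the retry limit and prompt wording are illustrative, not taken from the project.

```python
from typing import Callable, Optional, Tuple


def correct_with_retries(
    prompt: str,
    generate: Callable[[str], str],
    run_rcon: Callable[[str], Tuple[bool, str]],
    max_retries: int = 2,
) -> Optional[str]:
    """Return the first command the server accepts, or None if all attempts fail."""
    command = generate(prompt)
    for attempt in range(max_retries + 1):
        ok, response = run_rcon(command)
        if ok:
            return command
        if attempt == max_retries:
            break
        # Feed the server's error text back so the model can repair its own command.
        command = generate(
            f"{prompt}\nPrevious command: {command}\n"
            f"Server error: {response}\nFix the command."
        )
    return None
```

Passing the model and RCON client in as callables keeps the loop testable without a running server.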
@@ -120,7 +120,7 @@
- [x] Fall safety (25): height math, water/slime/hay awareness, intent detection
- [x] Suffocation (8): tp into blocks, sand/gravel crushing
- [x] Death/environment (26): drowning, lava, void, explosions, mobs, starvation, lightning
-- [x] Revert-aware gamerules (8): revert_after field for v5
+- [x] Revert-aware gamerules (8): revert_after field for 0.5.0
- [x] Drop/height (12): intentional drops, safe tp, slow_falling
- [x] Enchantment error correction (7): count-before-bracket, typos, old NBT
@@ -142,11 +142,11 @@
## Active TODOs
### Immediate
-- [~] v4 bake-off running — publish results to Gitea when complete
-- [ ] Fix v4 Modelfile chat template on RTX 4000 (done, needs verification)
+- [~] 0.4.0 bake-off running — publish results to Gitea when complete
+- [ ] Fix 0.4.0 Modelfile chat template on RTX 4000 (done, needs verification)
- [ ] Also fix on steel141's Ollama instance
-### Short-term (v5 prep)
+### Short-term (0.5.0 prep)
- [ ] Shared memory system: per-server JSON, owner-tagged, location/preference/fact types
- Player says "remember this is home" → AI writes location memory
- Other players can reference: "tp me to slingshooter08's home"
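
One possible shape for the shared memory entries described above (per-server JSON, owner-tagged, with location/preference/fact types) is sketched below. The field names, the `remember`/`recall` helpers, and the coordinates are assumptions made for illustration; the plan does not specify a schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

MEMORY_KINDS = {"location", "preference", "fact"}


@dataclass
class Memory:
    owner: str   # player who created the memory
    kind: str    # one of MEMORY_KINDS
    key: str     # short label, e.g. "home"
    value: dict  # payload, e.g. coordinates for a location


def remember(store: List[dict], memory: Memory) -> None:
    """Append a memory to the per-server store after checking its kind."""
    if memory.kind not in MEMORY_KINDS:
        raise ValueError(f"unknown memory kind: {memory.kind!r}")
    store.append(asdict(memory))


def recall(store: List[dict], owner: str, key: str) -> Optional[dict]:
    """Return the most recent memory for (owner, key), or None if absent."""
    for entry in reversed(store):
        if entry["owner"] == owner and entry["key"] == key:
            return entry
    return None


if __name__ == "__main__":
    server_store: List[dict] = []
    home = Memory("slingshooter08", "location", "home", {"x": 120, "y": 64, "z": -35})
    remember(server_store, home)
    # Another player asking for "slingshooter08's home" resolves by owner and key.
    print(json.dumps(recall(server_store, "slingshooter08", "home")))
```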
@@ -158,10 +158,10 @@
- [ ] Ingest all Gemini 3.1 Flash Lite training data
- [ ] More error correction from production RCON failures
-### Model v5 Training
+### Model 0.5.0 Training
- [ ] Train with tool-calling format (rcon.execute, wiki_lookup, world.player_info)
- [ ] `revert_after` / `revert_commands` in output schema
-- [ ] Self-play generated data (200 rounds post-v4)
+- [ ] Self-play generated data (200 rounds post-0.4.0)
- [ ] Memory read/write training examples
- [ ] Ground-level terrain detection training
- [ ] Fall damage math in model reasoning (not just validator)
@@ -177,7 +177,7 @@
### Content & Community
- [ ] Invite more playtesters via minecraft.mortdec.ai
-- [ ] Update mortdec.ai README with v4 bake-off results
+- [ ] Update mortdec.ai README with 0.4.0 bake-off results
- [ ] Consider public HuggingFace release
- [ ] WorldEdit schematic library expansion
@@ -208,13 +208,13 @@ Commands classified by permanence:
| Date | Decision | Rationale |
|------|----------|-----------|
| 03-18 | gemma3n:e4b for initial prod | Bake-off winner at 80.6% accuracy |
-| 03-18 | Qwen3-8B for v1-v3 training | Best syntax quality, Apache 2.0 |
+| 03-18 | Qwen3-8B for 0.1.0-0.3.0 training | Best syntax quality, Apache 2.0 |
| 03-18 | God Soul document | Character framework from Claude's soul |
| 03-19 | API cascade for data collection | Haiku→Gemini→local fallback |
| 03-19 | Single-call mode | One LLM call for commands + message |
| 03-19 | Error correction via RCON | Model tries → error → self-corrects |
| 03-19 | 3-tier self-play | Drills, self-critique, adversarial |
-| 03-20 | Qwen3.5-9B for v4 | 2x base accuracy, native tool-calling |
+| 03-20 | Qwen3.5-9B for 0.4.0 | 2x base accuracy, native tool-calling |
| 03-20 | Gamerule revert timers | Permanence determines risk level |
| 03-20 | Dangerous effect caps | Validator hardcodes max durations |
| 03-20 | Fall protection | Health check + intent detection before tp |
@@ -225,7 +225,7 @@ Commands classified by permanence:
## Success Criteria
-| Metric | v3 | v4 (target) | v5 (goal) |
+| Metric | 0.3.0 | 0.4.0 (target) | 0.5.0 (goal) |
|--------|:-:|:-:|:-:|
| Command accuracy | ~70% | 85%+ | 95%+ |
| Safety compliance | ~95% | 99%+ | 99.9%+ |
Binary file not shown (size: 670 B).

Binary file not shown (size: 2.3 KiB).

File diff suppressed because it is too large.
+1 -1
@@ -5,7 +5,7 @@
# Usage: nohup ./overnight_selfplay.sh > /var/log/selfplay_overnight.log 2>&1 &
# Kill with: pkill -f overnight_selfplay ; pkill -f self_play.py
-MODEL="${1:-mortdecai-v4}"
+MODEL="${1:-mortdecai:0.4.0}"
RCON_HOST="${2:-192.168.0.244}"
RCON_PORT="${3:-25578}"
ROUNDS_PER_TIER=50
+1 -1
@@ -546,7 +546,7 @@ def trace_to_training(trace):
 def main():
     parser = argparse.ArgumentParser(description="Self-play training data generator")
-    parser.add_argument("--model", default="qwen3-8b-mc-lora-v3")
+    parser.add_argument("--model", default="mortdecai:0.4.0")
     parser.add_argument("--ollama-url", default="http://192.168.0.141:11434")
     parser.add_argument("--api-key", default=None, help="API key for authenticated gateways")
     parser.add_argument("--rcon-host", default="192.168.0.244")
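
To illustrate the effect of the changed default, the sketch below rebuilds only the two arguments shown in this hunk and parses an empty and an overriding argument list. `build_parser` is a hypothetical helper, and the override value `mortdecai:0.3.0` is only an example tag that may not exist on any server.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a subset of the self-play CLI, using the defaults from the diff."""
    parser = argparse.ArgumentParser(description="Self-play training data generator")
    parser.add_argument("--model", default="mortdecai:0.4.0")
    parser.add_argument("--ollama-url", default="http://192.168.0.141:11434")
    return parser


# With no flags, the new semver tag is used.
defaults = build_parser().parse_args([])

# An explicit --model still takes precedence over the default.
override = build_parser().parse_args(["--model", "mortdecai:0.3.0"])

if __name__ == "__main__":
    print(defaults.model, override.model)
```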