Model can now output revert_after (seconds) and revert_commands fields.
Python service schedules the revert timer from the model's response, not just from heuristics.
Players notified of revert countdown. Revert announced when applied.
Training examples: temporary gamerules with explicit/implicit/no duration,
permanent changes (no revert), effects with built-in duration, combined reverts.
Key principle: no duration specified → default 5 min revert for safety.
"permanently"/"forever"/"always" → no revert.
Effects → built-in duration, no revert_after needed.
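
A minimal sketch of the duration rules above, written as a plain heuristic. The function name, the regex, and the `is_effect` flag are assumptions for illustration; in the real service the model supplies `revert_after` itself.

```python
import re

DEFAULT_REVERT_SECONDS = 300  # the 5 min safety default stated above
PERMANENT_WORDS = ("permanently", "forever", "always")
# Hypothetical pattern: matches phrases such as "for 5 minutes" or "for 30 seconds".
DURATION_RE = re.compile(r"for\s+(\d+)\s*(second|minute|hour)s?", re.IGNORECASE)
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def choose_revert_after(request, is_effect=False):
    """Return the revert delay in seconds, or None when no revert is needed."""
    if is_effect:
        return None  # effects carry their own built-in duration
    lowered = request.lower()
    if any(word in lowered for word in PERMANENT_WORDS):
        return None  # explicit permanence: no revert
    match = DURATION_RE.search(request)
    if match:
        return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]
    return DEFAULT_REVERT_SECONDS  # no duration given: revert for safety
```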
Seed dataset: 3,136 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release
Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, discord bot, all training data references.
Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.
Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches the model to distinguish:
- "kill the zombie" → limit=1,sort=nearest (specific target)
- "kill all zombies" → distance=..30 (area clear)
- "what mobs are nearby" → requires world.nearby_entities tool
- "target the closest enemy" → type=!player,limit=1,sort=nearest
With LangGraph tools enabled, world.nearby_entities gives the model
entity awareness before generating kill commands.
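
The phrase-to-selector mapping above can be sketched as a small builder. `build_selector` and its parameters are hypothetical names; only the selector arguments come from the examples above, and the `@e[...]` wrapper is standard target-selector syntax.

```python
def build_selector(entity_type=None, nearest_only=False,
                   max_distance=None, exclude_players=False):
    """Assemble an @e[...] target selector from a few intent flags."""
    args = []
    if exclude_players:
        args.append("type=!player")  # any entity except players
    elif entity_type:
        args.append(f"type={entity_type}")
    if max_distance is not None:
        args.append(f"distance=..{max_distance}")  # area clear within a radius
    if nearest_only:
        args.extend(["limit=1", "sort=nearest"])  # one specific target
    return "@e[" + ",".join(args) + "]"
```

For example, `build_selector("zombie", nearest_only=True)` covers the "kill the zombie" case and `build_selector("zombie", max_distance=30)` covers "kill all zombies".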
Seed dataset: 2,486 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches command ordering and dependencies:
- Build structure THEN tp inside (not reverse)
- Apply protection BEFORE spawning hostile mobs
- Create water pool BEFORE dropping player
- Effects before gear (protection active during equip)
- Clear mobs before healing (don't waste heal)
- Cage before tp victim (prevent escape)
Key principle: reasoning explains WHY order matters.
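
One way to picture the ordering rules above is as a dependency graph that gets sorted before execution. This is only an illustration using the standard-library `graphlib`; the step names are informal labels, and nothing here claims the model or service sorts commands this way.

```python
from graphlib import TopologicalSorter

# Each key must run after every step listed in its value set.
dependencies = {
    "tp player inside": {"build structure"},
    "spawn hostile mobs": {"apply protection"},
    "drop player": {"create water pool"},
}

# static_order() yields prerequisites before the steps that depend on them.
order = list(TopologicalSorter(dependencies).static_order())
```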
Seed dataset: 2,409 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: self-play opened and closed a new TCP socket for every RCON command
(hundreds/minute). Paper's RCON listener creates a thread per connection,
which overwhelmed the server until it crashed.
Fix: PersistentRCON class maintains a single connection per server with
auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port.
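
A rough sketch of the pattern described in the fix, not the contents of persistent_rcon.py. `PersistentConnection`, `get_connection`, and the `connect` factory (anything returning an object with a `.command(str)` method) are assumptions for illustration.

```python
import threading


class PersistentConnection:
    """Keeps one connection open and reconnects once if a send fails."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory for a live connection
        self._conn = None
        self._lock = threading.Lock()

    def command(self, text):
        with self._lock:  # one command at a time per connection
            for attempt in range(2):
                try:
                    if self._conn is None:
                        self._conn = self._connect()
                    return self._conn.command(text)
                except OSError:
                    self._conn = None  # drop the dead socket, then retry once
                    if attempt == 1:
                        raise


_pool = {}
_pool_lock = threading.Lock()


def get_connection(host, port, connect):
    """Return the shared connection for host:port, creating it on first use."""
    key = f"{host}:{port}"
    with _pool_lock:
        if key not in _pool:
            _pool[key] = PersistentConnection(lambda: connect(host, port))
        return _pool[key]
```

Keying the pool by host:port means every caller in a process reuses the same socket for a given server, which is what keeps the connection count flat.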
Applied to:
- mc_aigod_paper.py (prod paper-ai + dev)
- mc_aigod.py (shrink-world)
- self_play.py (training data generation)
- persistent_rcon.py (shared module)
Before: ~100+ RCON connections/minute → server crash
After: 3 persistent connections total → stable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bumped from 20 rounds/tier to 50. Reduced sleep from 1s to 0.1s.
GPUs should run near 100% — Ollama queues requests internally.
mortdecai-sites container (CT 650) created on pve112.
Landing page live at mortdec.ai.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each cycle runs all three tiers at the same time on different GPUs:
- Tier 1 (drills) on GPU A
- Tier 2 (self-critique) on GPU B
- Tier 3 (adversarial) on GPU C
GPU assignments rotate each cycle for even wear.
3x throughput vs sequential. RCON handles concurrent commands.
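
The rotation above can be sketched as follows. The GPU labels mirror the list above; `assignments`, `run_cycle`, and the `run_tier` callable are hypothetical names, and the thread pool is just one assumed way to run the tiers concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

TIERS = ["drills", "self-critique", "adversarial"]
GPUS = ["GPU A", "GPU B", "GPU C"]


def assignments(cycle):
    """Map each tier to a GPU, shifting by one GPU per cycle."""
    return {tier: GPUS[(i + cycle) % len(GPUS)] for i, tier in enumerate(TIERS)}


def run_cycle(cycle, run_tier):
    """Run all tiers at once; run_tier(tier, gpu) is supplied by the caller."""
    plan = assignments(cycle)
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = {tier: pool.submit(run_tier, tier, gpu) for tier, gpu in plan.items()}
        return {tier: future.result() for tier, future in futures.items()}
```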
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round-robin load balancing across three Ollama instances:
- 141:11434 (RTX 3090 Ti 24GB)
- 141:11435 (RTX 2080 Ti 11GB) — new second instance
- 179:11434 (RTX 4000 16GB)
Each tier cycles to a different GPU. 3x throughput overnight.
Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full rewrite reflecting current state:
- Model history v1→v4, infrastructure map, API spend
- Training data breakdown (3,477 total examples)
- Active TODOs: immediate, short-term, v5, infrastructure, community
- Risk hierarchy with permanence-based levels
- Key architecture decisions log
- Success criteria: v3 actual → v4 target → v5 goal
- Single-call enabled on prod (mortdecai-v3)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python revert system (live on prod):
- Gamerule changes auto-revert after default timeout (5-10 min)
- User can specify duration: "disable mobs for 5 minutes"
- "permanently"/"forever" skips revert
- Setting back to default cancels pending revert
- Players notified of revert countdown
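
A simplified model of the revert bookkeeping above, with time passed in explicitly so it is easy to reason about. `RevertScheduler` and its method names are assumptions; the live service presumably uses real timers rather than polling.

```python
class RevertScheduler:
    """Tracks pending gamerule reverts, keyed by rule name."""

    def __init__(self):
        self._pending = {}  # rule name -> (due_time, revert_command)

    def on_change(self, rule, value, default, delay, now):
        """Record a gamerule change; delay=None means a permanent change."""
        if value == default or delay is None:
            # Back to default, or explicitly permanent: nothing to revert.
            self._pending.pop(rule, None)
            return
        self._pending[rule] = (now + delay, f"gamerule {rule} {default}")

    def pop_due(self, now):
        """Return and forget the revert commands whose time has come."""
        due = [rule for rule, (when, _) in self._pending.items() if when <= now]
        return [self._pending.pop(rule)[1] for rule in due]
```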
Training data (20 examples):
- 8 revert-aware gamerules with revert_after/revert_commands fields
- 12 drop/height/tp examples: intentional drops, safe tp, context-aware
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validator hardcodes maximum durations for dangerous effects:
- Levitation: 15s max (player floats into sky and dies from fall)
- Wither: 30s max (drains health, can kill)
- Poison: 60s max
- Nausea: 30s max
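
The caps above amount to a lookup plus a clamp. A minimal sketch, assuming effects outside the table pass through unchanged (the notes do not say how the validator treats them):

```python
# Maximum durations in seconds, taken from the list above.
MAX_SECONDS = {"levitation": 15, "wither": 30, "poison": 60, "nausea": 30}


def clamp_effect(effect, seconds):
    """Limit a requested effect duration to its hardcoded cap, if it has one."""
    cap = MAX_SECONDS.get(effect.lower())
    return seconds if cap is None else min(seconds, cap)
```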
12 training examples: levitation safety, emergency clear, duration caps,
"I can't stop floating" → clear levitation + slow falling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prod deployment:
- paper-ai and shrink-world switched from gemma3n:e4b to qwen3.5:9b
- Error correction: detects RCON errors (<--[HERE]), asks model to fix, retries
- Broadened error patterns: Unknown game mode, Unknown enchantment, etc.
- Fixed fire fallback matching "firework" as fire intent
- Fixed command format examples (WRONG vs RIGHT in prompt)
- max_tokens bumped to 600 for command calls
- Removed template workflow commands from sudo prompt
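
The error-correction bullet above describes a detect, fix, retry loop. A sketch under assumptions: `send` and `ask_model_to_fix` are hypothetical callables standing in for the RCON client and the model call, and the retry limit of 2 is illustrative.

```python
# Substrings treated as command failures; the first three come from the notes above.
ERROR_MARKERS = ("<--[HERE]", "Unknown game mode", "Unknown enchantment")


def looks_like_error(output):
    """True when the RCON reply contains a known error marker."""
    return any(marker in output for marker in ERROR_MARKERS)


def run_with_correction(command, send, ask_model_to_fix, max_retries=2):
    """Send a command; on error, ask the model for a fix and retry."""
    output = send(command)
    for _ in range(max_retries):
        if not looks_like_error(output):
            break
        command = ask_model_to_fix(command, output)
        output = send(command)
    return command, output
```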
Dev server:
- Gemini 2.5 Flash ($0.15/$0.60 per M tokens) replaces Flash Lite
- 10 bots for ~$1-1.5/hr training data generation
- Dynamic pricing by model name in cost tracker
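
Pricing by model name can be sketched as a table lookup. Only the Gemini 2.5 Flash rates come from the notes above; the table key, the function name, and the choice to treat unlisted models as free local models are assumptions.

```python
# USD per million tokens as (input, output).
PRICES = {"gemini-2.5-flash": (0.15, 0.60)}


def call_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one call; unlisted models are assumed free (local)."""
    input_rate, output_rate = PRICES.get(model, (0.0, 0.0))
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```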
Branding:
- Rajdhani Bold as official Mortdecai font
- Logo variants: mortdecai + mortdec.ai in 6 fonts
- Whitelist page updated with Mortdecai branding + mortdec.ai domain
Whitelist UUID fix:
- Looks up real Mojang UUID via api.mojang.com
- Patches all whitelist.json files directly
- No more offline-mode UUID mismatches
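
A sketch of the patching half of the fix above, with the network lookup left out. It assumes the profile lookup has already returned the usual 32-character undashed id and that whitelist entries are `{"uuid", "name"}` objects; the function names are hypothetical.

```python
import uuid


def dashed_uuid(mojang_id):
    """Convert an undashed 32-character hex id to the dashed UUID form."""
    return str(uuid.UUID(hex=mojang_id))


def patch_whitelist(entries, name, mojang_id):
    """Set the real UUID for a player, adding the entry if it is missing."""
    fixed = dashed_uuid(mojang_id)
    found = False
    for entry in entries:
        if entry["name"].lower() == name.lower():
            entry["uuid"] = fixed  # overwrite any offline-mode UUID
            found = True
    if not found:
        entries.append({"uuid": fixed, "name": name})
    return entries
```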
WorldEdit schematics:
- 77 schematics installed (villages, bridges, lighthouses, parks, etc.)
Mortdecai v4 training in progress: 63% complete on steel141
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bake-off: qwen3.5:9b base model, 147 cases:
- 70.1% command match (about 2x the 34% qwen3:8b base result)
- 15.6% needed syntax fixes
- 29.9% miss (mostly God/prayer — no persona training)
- Avg 7.5s, median 5.7s (thinking tokens)
Model officially named Mortdecai.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-play (training/scripts/self_play.py):
- Model generates edge-case prompts across 9 categories
- Attempts commands via RCON, self-corrects on errors
- Successful traces → standard training examples
- Error correction traces → multi-turn tool-calling examples
- Anti-collapse: focuses on categories model is weakest in
- Ready for v4 deployment, not yet active
Qwen3.5-9B base model bake-off (147/1542 cases):
- 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement
- 29.9% MISS (mostly God/prayer — no persona training)
- 15.6% needed syntax fixes
- Avg 7.5s response (thinking tokens)
- Strong v4 candidate: better base + tool-calling architecture
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul updated with quantity rules:
- Common (dirt/wood): max 320, Uncommon (iron/gold): max 128
- Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4
- Forbidden (bedrock/command_block): never give
- Greedy → scaled back, Humble → generous within cap, Absurd → comedic
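
The tier caps above reduce to a lookup and a `min`. A minimal sketch; the tier keys and function name are illustrative, and the tone-based scaling (greedy, humble, absurd) is left to the model.

```python
# Maximum quantity per request for each rarity tier, from the rules above.
CAPS = {"common": 320, "uncommon": 128, "rare": 32, "very_rare": 4, "forbidden": 0}


def cap_quantity(tier, requested):
    """Clamp a requested amount to the cap for its rarity tier."""
    return min(requested, CAPS[tier])
```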
32 training examples: greedy(6), casual(6), humble(4), explicit(6),
forbidden(5), absurd(3), enchanted(2)
Dataset: 1,340 examples total
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v3 training:
- 1,308 examples: curated + Claude-distilled + bot audit + recipes + command ref
- 1 epoch, rank 16, LR 1e-4, loss 0.55 (sweet spot)
- GGUF Q4_K_M exported, loaded in Ollama as qwen3-8b-mc-lora-v3
- Correct commands, no Chinese, proper safety refusals, dramatic God persona
API cascade for dev server:
- Stage 1: Claude Haiku ($20 budget, ~$11 spent)
- Stage 2: Gemini 2.5 Flash Lite ($20 budget)
- Stage 3: qwen3-8b-mc-lora-v3 (free, local)
- Gemini call function with persistent cost tracking
- Full status report printed at each $1 milestone
Data collection: 2,677 dev audit entries and growing
Fixed budget display in bot status printer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged: 964 curated + 344 Claude-distilled = 1,308 total
All examples tagged with risk_level (0-4)
Model outputs risk classification in training target
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands
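
The sanity check in the last bullet could look roughly like this. The `commands` field name and the function name are assumptions; only `risk_level` and the 0-4 range come from the notes.

```python
def check_risk_consistency(example):
    """Return a list of problems found in one model output (empty if none)."""
    problems = []
    risk = example.get("risk_level")
    if risk not in (0, 1, 2, 3, 4):
        problems.append("risk_level must be an integer from 0 to 4")
    elif risk <= 1 and example.get("commands"):
        # Blocked (0) and refused (1) requests should not execute anything.
        problems.append("risk 0-1 should have empty commands")
    return problems
```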
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Haiku cost persists to /var/log/mc_anthropic_cost.json (survives restarts)
- Status printer reads persistent cost file instead of journalctl
- Seeded at $3.08 estimated cumulative spend
- Whitelist app: Sethian Dark theme, mission description, server info
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knowledge corpus (knowledge/mc-data/):
- 1,505 items, 886 crafting recipes, 1,166 blocks from minecraft-data 1.21.11
- Recipe dependency tree builder (knowledge/build_recipe_tree.py)
- Crafting chain training: "give me everything to make X from scratch"
- Smelting recipes, version awareness examples
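
The "everything to make X from scratch" idea is a recursive walk down the recipe tree. The sketch below uses a toy recipe table with made-up quantities (one output per craft), not the minecraft-data format or real recipe yields, and is not taken from build_recipe_tree.py.

```python
from collections import Counter

# Toy table for illustration only: item -> {ingredient: quantity per craft}.
RECIPES = {
    "pickaxe": {"plank": 3, "stick": 2},
    "stick": {"plank": 1},
    "plank": {"log": 1},
}


def raw_materials(item, count=1):
    """Expand an item into the base materials that have no recipe."""
    if item not in RECIPES:
        return Counter({item: count})  # base material: nothing to expand
    total = Counter()
    for ingredient, quantity in RECIPES[item].items():
        total += raw_materials(ingredient, quantity * count)
    return total
```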
Training data (644 examples total):
- 107 command syntax reference examples (every command + common errors)
- 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting)
- 344 Claude-distilled examples (222 sudo + 122 god via Haiku)
- Live bot audit data ingested (128 examples from dev server)
Swarm bots:
- Swimming/water escape logic
- Door opening
- Context-aware prayers (inventory, health, time, depth)
- Prefix enforcement on all Gemini/Dolphin prompts
GitHub log scraper (data/scrape_server_logs.py):
- Searches GitHub for Minecraft server logs with commands
- Strict 1.20.5+ version filter
- Extracts command pairs, converts to training format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swarm bots (ingame/swarm_bots.js):
- 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.)
- All bots wander, take damage, auto-respawn, pray when hurt
- Gemini + Dolphin(5%) + Multilingual(3%) prompt generation
- 20-60s interaction interval per bot
Distillation results:
- 222 sudo examples via Haiku ($0.28)
- 122 god examples via Haiku ($0.37) — with God Soul personality
- Total: 344 distilled, $0.65 spent of $5 budget
- RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands
validate_distilled.py:
- Executes distilled commands on live server via RCON
- Distinguishes real errors from benign (no player online)
- Tags each example with validation status
Dev server switched to Claude Haiku via Anthropic API:
- llm_provider: anthropic with $5 budget cap
- Auto-fallback to Ollama when budget exhausted
- Cost tracking with logging
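
The budget cap and fallback above can be sketched as a small tracker. The class and method names are hypothetical; only the provider names and the $5 cap come from the notes.

```python
class BudgetTracker:
    """Chooses the paid provider until the budget is used up."""

    def __init__(self, budget_usd, spent_usd=0.0):
        self.budget_usd = budget_usd
        self.spent_usd = spent_usd

    def record(self, cost_usd):
        """Add the cost of one completed API call."""
        self.spent_usd += cost_usd

    def provider(self):
        """Return 'anthropic' while under budget, otherwise fall back to 'ollama'."""
        return "anthropic" if self.spent_usd < self.budget_usd else "ollama"
```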
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul (agent/prompts/god_soul.md):
- Adapted from Claude's soul framework for the Minecraft God character
- Defines identity, principals hierarchy, decision-making framework
- Spectrum of responses (generous→silence), risk awareness, multilingual divinity
- Honesty within character, intervention guidelines
- Deployed to both prod and dev servers
System prompts updated:
- God prompt loads soul document dynamically
- Intervention prompt references soul for personality guidance
- Both include multilingual instruction (match player's language)
Distillation pipeline (training/scripts/distill.py):
- Sends all training examples through Claude API
- Haiku for sudo ($0.25), Sonnet for god ($0.50)
- Budget-capped, cost-tracked, --dry-run supported
- Outputs distilled.jsonl with Claude-quality responses
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
data/ingest_audit.py:
- Pulls training audit logs from CT 644 (dev + prod)
- Filters: language mismatch (Chinese output for English input), system
prompt leaks, empty responses, duplicates
- Keeps multilingual examples where input/output languages match
- Converts to dataset schema, appends to seed_dataset.jsonl
- --dry-run to preview, --source dev/prod/both
Tested: 237 entries → 112 kept (16 lang mismatch, 10 prompt leak, 86 dupe, 13 empty dropped)
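
A condensed sketch of the four filters above. The `prompt`/`response` field names, the single leak marker, and the order of the checks are assumptions; the real ingest_audit.py patterns are not shown in these notes.

```python
import re
from collections import Counter

CJK = re.compile(r"[\u4e00-\u9fff]")  # common Chinese characters
LEAK_MARKER = "SYSTEM PROMPT:"  # hypothetical marker for a leaked prompt


def filter_entries(entries):
    """Split audit entries into kept examples and a count of drop reasons."""
    kept, dropped, seen = [], Counter(), set()
    for entry in entries:
        prompt, response = entry["prompt"], entry["response"].strip()
        if not response:
            dropped["empty"] += 1
        elif CJK.search(response) and not CJK.search(prompt):
            dropped["lang_mismatch"] += 1  # Chinese output for non-Chinese input
        elif LEAK_MARKER in response:
            dropped["prompt_leak"] += 1
        elif (prompt, response) in seen:
            dropped["duplicate"] += 1
        else:
            seen.add((prompt, response))
            kept.append(entry)
    return kept, dropped
```

A Chinese prompt with a Chinese response passes the language check, which matches the rule that multilingual examples are kept when input and output languages agree.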
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bots:
- Dolphin-Mistral generates uncensored/offensive prompts (first 100, then 5%)
- PrayBot_0 runs survival mode: auto-respawn, contextual low-health prayers
- Gemini 2.5 Flash Lite for diverse natural prompts
- Fixed Gemini markdown wrapper parsing, dolphin JSON format
POS printer:
- Triggers on $0.50 Gemini cost threshold instead of fixed interval
- Checks every 15 min, only prints when threshold crossed
- --check flag to see current cost status
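
One plausible reading of the threshold trigger above: print whenever cumulative cost has passed a new $0.50 multiple since the last check. That interpretation and the function name are assumptions; amounts are compared in whole cents to avoid floating-point drift.

```python
def crossed_threshold(previous_cost, current_cost, step=0.50):
    """True when current_cost has passed a new multiple of step since previous_cost."""
    step_cents = int(round(step * 100))
    before = int(round(previous_cost * 100)) // step_cents
    after = int(round(current_cost * 100)) // step_cents
    return after > before
```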
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prayer bots (ingame/prayer_bots.js):
- 3 Mineflayer bots that actively pray, sudo, and bug_log on dev server
- Gemini 2.5 Flash Lite generates diverse natural prompts on the fly
- Falls back to static pool if Gemini unavailable
- 15-45s interval per bot, 50/35/10/5 pray/sudo/bug/chat split
POS status printer (scripts/training_status_printer.py):
- Prints training data collection status to Epson TM-m30
- Tracks: dataset size, audit logs, bot activity, Gemini API cost, service status
- Triggers on $0.50 cost threshold (configurable), checks every 15 min
- --dry-run, --check, --force flags
Training:
- First LoRA run completed (233 examples, 3 epochs, loss 1.5→0.10)
- GGUF exported and loaded into Ollama as qwen3-8b-mc-lora on steel141
- Model is bad (expected) — hallucinating Chinese, leaking system prompt
- Deployed to dev server for live testing and data collection
- bf16 fix for Ampere GPU, system prompts included in training conversations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>