Mortdecai

Author	SHA1	Message	Date
Seth	5b28002001	0.6.0 training session: Oracle Bot, RL combat, Mind's Eye, multilingual pipeline Major changes from this session: Training: - 0.6.0 training running: 9B on steel141 3090 Ti, 27B on rented H100 NVL - 7,256 merged training examples (up from 3,183) - New training data: failure modes (85), midloop messaging (27), prompt injection defense (29), personality (32), gold from quarantine bank (232), new tool examples (30), claude's own experience (10) - All training data RCON-validated at 100% pass rate - Bake-off: gemma3:27b 66%, qwen3.5:27b 61%, translategemma:27b 56% Oracle Bot (Mind's Eye): - Invisible spectator bot (mineflayer) streams world state via WebSocket - HTML5 Canvas frontend at mind.mortdec.ai - Real-time tool trace visualization with expandable entries - Streaming model tokens during inference - Gateway integration: fire-and-forget POST /trace on every tool call Reinforcement Learning: - Gymnasium environment wrapping mineflayer bot (minecraft_env.py) - PPO training via Stable Baselines3 (10K param policy network) - Behavioral cloning pretraining (97.5% accuracy on expert policy) - Infinite training loop with auto-restart and checkpoint resume - Bot learns combat, survival, navigation from raw experience Bot Army: - 8-soldier marching formation with autonomous combat - Combat bots using mineflayer-pvp, pathfinder, armor-manager - Multilingual prayer bots via translategemma:27b (18 languages) - Frame-based AI architecture: LLM planner + reactive micro-scripts Infrastructure: - Fixed mattpc.sethpc.xyz billing gateway (API key + player list parser) - Billing gateway now tracks all LAN traffic (LAN auto-auth) - Gateway fallback for empty god-mode responses - Updated mortdec.ai landing page Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 20:22:50 -04:00
Mortdecai	9c2c9a2310	1200+ distilled gold examples, journal system, redstone mastery, safety awareness Distilled Training Data (1,203 examples): - 341 initial gold (plugins, enchantments, builds, effects, god, errors) - 165 buildings + pipeline (100 structures built on dev, 65 request→query→act) - 24 safety-aware (worldborder, safe tp, intentional harm, gamemode checks) - 17 advanced logic (decanonized items, redstone gates, iterative builds) - 12 redstone mastery (NOT/OR/AND/XOR/RS-latch/T-flip-flop/comparator/clock) - 7 circuit verification and diagnosis - 1 compact comparator gates - 10 redstone methodology (build→test→save→recall→learn from mistakes) - 8 player journal usage - 29 creative+uncommon+pipeline+god with full tool chains Player Journal System: - agent/tools/player_journal.py — per-player text files (1-10 lines) - journal.read + journal.write tool schemas added - Cross-contaminated: God and Sudo share same journal per player - Includes sentiment, relationship, builds, preferences, skill level Redstone Engineering: - agent/prompts/redstone_rules.md — baked-in wall torch, dedicated lead, repeater rules - Learned from 4 iterations of 8-switch circuit: wall_torch on back face, not top - T-junction bypass prevention: dedicated lead wire between merge and NOT block - RCON limitation: can build circuits but cannot test them (lever toggle doesn't propagate) Training Data Cleaning: - 466 @s→@p fixes, 10 template commands removed - 12 outdated refusals replaced with correct plugin commands - Data de-duped across all sources Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 20:50:52 -04:00
Mortdecai	d9acb653fe	Fix chart labels, add version history table to README Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 15:48:35 -04:00
Mortdecai	f5118505b1	0.5.0 bake-off results, knowledge lookup tools, training progress chart Bake-off (0.5.0 vs 0.4.0): - Overall: 46.8% vs 45.2% (+1.6%), 0 errors vs 2 - Enchantments: +47% (20% → 67%) - EssentialsX: +60% (0% → 60%) - Effects: +25% (0% → 25%) - Regressions: fill_build -67%, world -20% Knowledge Lookup Tools (4 new): - plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs - minecraft.changelog_lookup: version history from Minecraft Wiki - paper.docs_lookup: Paper server-specific documentation - Wired into gateway model-driven tool loop and exploration self-play Exploration Self-Play: - General (vanilla MC) and plugins focus modes - Wiki-grounded: model researches before acting, validates through RCON - 2,243 exploration examples generated, 150 kept after quality filtering Training Progress Chart: - SVG chart showing training examples and inverse loss across versions - Added to MODEL_CARD.md for Gitea display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 15:28:09 -04:00
Mortdecai	da8f557219	GPU scheduler, 14-tool architecture, plugin deployment, event dispatcher GPU Scheduler (gpu.sethpc.xyz): - Live dashboard with 4 GPUs, training monitor, loss sparklines - Preset-based job scheduler with 3 triggers (time, finish_training, cost) - Model selection per GPU, pipeline configuration - Tool self-play and training pipeline types - Behind Google OAuth, live-refresh without page reload Tool Architecture (14 tools): - 3 new tools: world.nearby_entities, memory.read, memory.write - 7 script.* tools: write, validate, execute, read, list, delete, schedule - ScriptManager: full mcfunction datapack CRUD with RCON validation - Training data: 1,430 tool examples (up from 1,159) Plugin Deployment (paper-ai-25567): - WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3 - Fresh greenfield world reset - 104 RCON-validated plugin training examples Event Dispatcher: - Watches server log for deaths, joins, advancements, PvP kills - Configurable trigger probability and cooldowns per event type - Deployed to dev server, fires god_system prompts on events - 21 event-response training examples Training Infrastructure: - train_lora.py: --save-steps 50, --resume from checkpoint - run_training.sh: stops Ollama, activates conda, restarts after - Passwordless sudo for ollama services on steel141 - Dev server added to MCSManager with autoStart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 03:14:45 -04:00
Mortdecai	434589d098	Prompt pipeline: 1660 generates, bigger GPUs process via Mortdecai Architecture: - 1660 Super (qwen3.5:0.8b) generates diverse edge-case prompts - 2080 Ti / RTX 4000 / 3090 Ti process through Mortdecai + RCON validation - File-based queue with locking for multi-GPU coordination - 10 prompt categories targeting known weaknesses Categories: fill_syntax, enchantments, execute_chains, entity_targeting, gamerules_timed, memory_commands, creative_prayers, edge_items, multicommand, natural_language Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 00:08:48 -04:00
Seth	f39809eaca	Semver rename: v1-v5 → 0.1.0-0.5.0 across all files Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH) - 0.x.0 = pre-release development - 1.0.0 = first public/monetized release Renamed everywhere: PLAN.md, training scripts, self-play, overnight script, status printer, whitelist app, discord bot, all training data references. Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0 Server configs updated on all three servers. Self-play restarted with new model name. Entity targeting + radius-aware kill + distance scale training added. Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:37:14 -04:00
Seth	0f043384e5	Self-play: --api-key for authenticated gateway connections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:40:01 -04:00
Seth	ead16fd429	Persistent RCON connections — fixes server crash from connection spam Root cause: self-play opened/closed a new TCP socket for every RCON command (hundreds/minute). Paper's RCON listener creates a thread per connection, overwhelming the server until it stopped. Fix: PersistentRCON class maintains a single connection per server with auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port. Applied to: - mc_aigod_paper.py (prod paper-ai + dev) - mc_aigod.py (shrink-world) - self_play.py (training data generation) - persistent_rcon.py (shared module) Before: ~100+ RCON connections/minute → server crash After: 3 persistent connections total → stable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:24:44 -04:00
Seth	25918b5b66	Self-play: 50 rounds, 0.1s sleep, max GPU utilization Bumped from 20 rounds/tier to 50. Reduced sleep from 1s to 0.1s. GPUs should run near 100% — Ollama queues requests internally. mortdecai-sites container (CT 650) created on pve112. Landing page live at mortdec.ai. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 07:36:01 -04:00
Seth	3580d350b4	Parallel 3-GPU self-play: all tiers run simultaneously Each cycle runs all three tiers at the same time on different GPUs: - Tier 1 (drills) on GPU A - Tier 2 (self-critique) on GPU B - Tier 3 (adversarial) on GPU C GPU assignments rotate each cycle for even wear. 3x throughput vs sequential. RCON handles concurrent commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:55:24 -04:00
Seth	de14f4a1c8	3-GPU overnight self-play: 3090 Ti + 2080 Ti + RTX 4000 Round-robin load balancing across three Ollama instances: - 141:11434 (RTX 3090 Ti 24GB) - 141:11435 (RTX 2080 Ti 11GB) — new second instance - 179:11434 (RTX 4000 16GB) Each tier cycles to a different GPU. 3x throughput overnight. Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:54:29 -04:00
Seth	a3d139e04f	Mortdecai v4 pre-training: /no_think, dedup, 3,369 examples - /no_think prepended to all system prompts (seed + tool training) - Deduplicated seed dataset (435 dupes removed) - Training script updated for Qwen3.5-9B + /no_think - 2,210 seed + 1,159 tool-calling = 3,369 total examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 20:15:00 -04:00
Seth	9abf9238c5	3-tier self-play: command drills, self-critique, adversarial Tier 1 — Command drills: Random seed prompts → generate commands → RCON validates Teaches: accurate command syntax Tier 2 — Single-shot self-critique: Model invents a tricky prompt AND responds in one call RCON validates the self-generated commands Teaches: edge-case awareness, self-evaluation Tier 3 — Adversarial self-play: Session A generates challenging prompts Fresh Session B responds cold (can't cheat) RCON validates, self-corrects on errors Teaches: robustness, generalization Usage: --tier 1|2|3|all --rounds N --focus category Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:39:33 -04:00
Seth	c947fc3fa9	Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy Self-play (training/scripts/self_play.py): - Model generates edge-case prompts across 9 categories - Attempts commands via RCON, self-corrects on errors - Successful traces → standard training examples - Error correction traces → multi-turn tool-calling examples - Anti-collapse: focuses on categories model is weakest in - Ready for v4 deployment, not yet active Qwen3.5-9B base model bake-off (147/1542 cases): - 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement - 29.9% MISS (mostly God/prayer — no persona training) - 15.6% needed syntax fixes - Avg 7.5s response (thinking tokens) - Strong v4 candidate: better base + tool-calling architecture Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:35:57 -04:00
Seth	750cf15c79	1,542 seed + 1,159 tool-calling examples, async processing, validator tracking New knowledge baked in: - Enchantments (60): all 1.21 enchants, mutual exclusions, max levels, component syntax - WorldEdit (45): //set, //replace, //sphere, //stack, selection, brushes - Paper server (55): gamerules, permissions, plugins, scoreboard, moderation - Cosmetics/XP (42): title, tellraw, playsound, particle, xp, effect mechanics - Quantity boundaries (32): item tier caps, greedy→stingy, humble→generous Training infrastructure: - train_lora.py updated for multi-turn tool conversations + seed data - Async prayer/sudo processing (ThreadPoolExecutor, 3 workers) - Validator hit-rate tracking to /var/log/mc_validator_stats.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:03:30 -04:00
Seth	ee764cd22a	Tool-calling training: 1,159 multi-turn examples with error correction Tool schemas (agent/tools/tool_schemas.py): - rcon.execute: execute commands, get success/error results - minecraft.wiki_lookup: look up syntax and item info - world.player_info: player health, position, inventory - world.server_state: time, weather, online players - 10 RCON error patterns with corrections - 12 common error scenarios for training Training data generator (training/scripts/generate_tool_training.py): - Converts seed dataset to multi-turn tool conversations - Error correction: model tries wrong command → gets error → self-corrects - Wiki/player/server lookups for uncertainty scenarios - Qwen3 native tool-calling format with <tool_call> tags 1,159 examples: 1043 success, 79 error correction, 24 error scenarios, 13 tool lookups. Ready for v4 training. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 18:49:08 -04:00
Seth	e28836106f	Risk_level in all 644 examples + model outputs risk classification - All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74) - Training output now includes risk_level field for decision transparency - Model learns to classify risk before generating commands - Validator can sanity-check: risk 0-1 should have empty commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:35:50 -04:00
Seth	65ee146043	Swarm bots, RCON validation, Haiku distillation complete Swarm bots (ingame/swarm_bots.js): - 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.) - All bots wander, take damage, auto-respawn, pray when hurt - Gemini + Dolphin(5%) + Multilingual(3%) prompt generation - 20-60s interaction interval per bot Distillation results: - 222 sudo examples via Haiku ($0.28) - 122 god examples via Haiku ($0.37) — with God Soul personality - Total: 344 distilled, $0.65 spent of $5 budget - RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands validate_distilled.py: - Executes distilled commands on live server via RCON - Distinguishes real errors from benign (no player online) - Tags each example with validation status Dev server switched to Claude Haiku via Anthropic API: - llm_provider: anthropic with $5 budget cap - Auto-fallback to Ollama when budget exhausted - Cost tracking with logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 19:18:19 -04:00
Seth	961f53ea7d	God Soul document, Claude distillation pipeline, soul-driven prompts God Soul (agent/prompts/god_soul.md): - Adapted from Claude's soul framework for the Minecraft God character - Defines identity, principals hierarchy, decision-making framework - Spectrum of responses (generous→silence), risk awareness, multilingual divinity - Honesty within character, intervention guidelines - Deployed to both prod and dev servers System prompts updated: - God prompt loads soul document dynamically - Intervention prompt references soul for personality guidance - Both include multilingual instruction (match player's language) Distillation pipeline (training/scripts/distill.py): - Sends all training examples through Claude API - Haiku for sudo ($0.25), Sonnet for god ($0.50) - Budget-capped, cost-tracked, --dry-run supported - Outputs distilled.jsonl with Claude-quality responses Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:28:21 -04:00
Seth	62419976e5	361 training examples, default to 1 epoch Ingested 128 new examples from bot-driven data collection. Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty. Changed default epochs from 3 to 1 (previous run overfit at loss 0.10). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:03:33 -04:00
Seth	142e4fd3c4	Fix training script: bf16 for Ampere GPU, add system prompts to training data - Switch fp16 to bf16 (RTX 3090 Ti is Ampere, supports BF16 natively) - Include system prompt in training conversations (mode-aware: sudo/god/god_system) - Include message field only for god modes - Add determine_mode() and get_system_prompt() helpers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 16:26:47 -04:00
Seth	48b627d498	Add LoRA training scripts and fix bake-off token budget - training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b - training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti - eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3 models to exhaust tokens on thinking, added --no-think flag - agent/serve.py: Default model changed to gemma3n:e4b Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 10:40:18 -04:00
Seth	7da28c8800	Add model bake-off harness and base model research Bake-off tested 7 models on 31 seed examples via GPU-accelerated Ollama on node-197 RTX 4000. gemma3n:e4b leads for serving (80.6% cmd match, 100% safety, 5.9s). qwen3:8b recommended as fine-tuning base (Apache 2.0, best syntax quality, strong ecosystem). Full research in MODEL_RESEARCH.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 08:54:11 -04:00
Seth	827850b8d7	Initial project scaffold: dataset schema, 31 seed training examples, Mineflayer bot framework, and 7-phase roadmap - IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT) - PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs - data/schema.json: training example JSON Schema with negative_output support - data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history - data/validate_dataset.py: schema validator with summary statistics - ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging) - Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)	2026-03-18 01:51:28 -04:00

25 Commits