Mortdecai

Author	SHA1	Message	Date
Seth	ead16fd429	Persistent RCON connections — fixes server crash from connection spam Root cause: self-play opened/closed a new TCP socket for every RCON command (hundreds/minute). Paper's RCON listener creates a thread per connection, overwhelming the server until it stopped. Fix: PersistentRCON class maintains a single connection per server with auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port. Applied to: - mc_aigod_paper.py (prod paper-ai + dev) - mc_aigod.py (shrink-world) - self_play.py (training data generation) - persistent_rcon.py (shared module) Before: ~100+ RCON connections/minute → server crash After: 3 persistent connections total → stable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:24:44 -04:00
Seth	ee764cd22a	Tool-calling training: 1,159 multi-turn examples with error correction Tool schemas (agent/tools/tool_schemas.py): - rcon.execute: execute commands, get success/error results - minecraft.wiki_lookup: look up syntax and item info - world.player_info: player health, position, inventory - world.server_state: time, weather, online players - 10 RCON error patterns with corrections - 12 common error scenarios for training Training data generator (training/scripts/generate_tool_training.py): - Converts seed dataset to multi-turn tool conversations - Error correction: model tries wrong command → gets error → self-corrects - Wiki/player/server lookups for uncertainty scenarios - Qwen3 native tool-calling format with <tool_call> tags 1,159 examples: 1043 success, 79 error correction, 24 error scenarios, 13 tool lookups. Ready for v4 training. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 18:49:08 -04:00
Seth	4e83da39fd	Quantity boundaries: item tier caps, tone-based scaling, 32 training examples God Soul updated with quantity rules: - Common (dirt/wood): max 320, Uncommon (iron/gold): max 128 - Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4 - Forbidden (bedrock/command_block): never give - Greedy → scaled back, Humble → generous within cap, Absurd → comedic 32 training examples: greedy(6), casual(6), humble(4), explicit(6), forbidden(5), absurd(3), enchanted(2) Dataset: 1,340 examples total Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 18:22:26 -04:00
Seth	961f53ea7d	God Soul document, Claude distillation pipeline, soul-driven prompts God Soul (agent/prompts/god_soul.md): - Adapted from Claude's soul framework for the Minecraft God character - Defines identity, principals hierarchy, decision-making framework - Spectrum of responses (generous→silence), risk awareness, multilingual divinity - Honesty within character, intervention guidelines - Deployed to both prod and dev servers System prompts updated: - God prompt loads soul document dynamically - Intervention prompt references soul for personality guidance - Both include multilingual instruction (match player's language) Distillation pipeline (training/scripts/distill.py): - Sends all training examples through Claude API - Haiku for sudo ($0.25), Sonnet for god ($0.50) - Budget-capped, cost-tracked, --dry-run supported - Outputs distilled.jsonl with Claude-quality responses Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:28:21 -04:00
Seth	78031d16c0	Risk gradient (0-5), updated system prompts, 233 examples Risk gradient system: - All 233 training examples tagged with risk_level (0-5) - 0=blocked(15), 1=refuse(9), 2=warn(17), 3=normal(169), 4=generous(23) - Schema updated with risk_level and scoring_mode fields - Eval harness uses risk_level for safety scoring System prompts rewritten: - Shared syntax rules and risk gradient reference across all modes - Sudo: permission level 4, do what admin asks, only refuse level 0-1 - God: permission level 2-4 (mood-dependent), character-driven decisions - God_system: permission level 3, 80% benevolent / 15% mischievous / 5% wrathful Data: - 20 new live playtest examples from training audit log (233 total) - 43 wrong→right pairs (17 from validator repairs) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 16:14:54 -04:00
Seth	48b627d498	Add LoRA training scripts and fix bake-off token budget - training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b - training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti - eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3 models to exhaust tokens on thinking, added --no-think flag - agent/serve.py: Default model changed to gemma3n:e4b Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 10:40:18 -04:00
Seth	e00d454b19	Add baseline assistant with tools, guardrails, and system prompts (Phase 1.4) - agent/serve.py: CLI assistant with interactive, single-query, and eval modes (Ollama + qwen3-coder) - agent/tools/rcon_tool.py: RCON execute, server status, player info - agent/tools/knowledge_tool.py: TF-IDF RAG search, command reference lookup, server context - agent/guardrails/command_filter.py: 14-prefix allowlist, execute-tail bypass detection, destructive flags, 1.21 syntax warnings, audit log - agent/prompts/system_prompts.py: sudo (pure commands), god (persona), intervention (benign) system prompts - Guardrails tested: 10/10 allowlist, 5/6 syntax warnings pass	2026-03-18 02:12:20 -04:00
Seth	827850b8d7	Initial project scaffold: dataset schema, 31 seed training examples, Mineflayer bot framework, and 7-phase roadmap - IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT) - PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs - data/schema.json: training example JSON Schema with negative_output support - data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history - data/validate_dataset.py: schema validator with summary statistics - ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging) - Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)	2026-03-18 01:51:28 -04:00

8 Commits