Mortdecai

Author	SHA1	Message	Date
Mortdecai	f5118505b1	0.5.0 bake-off results, knowledge lookup tools, training progress chart Bake-off (0.5.0 vs 0.4.0): - Overall: 46.8% vs 45.2% (+1.6 pp), 0 errors vs 2 - Enchantments: +47 pp (20% → 67%) - EssentialsX: +60 pp (0% → 60%) - Effects: +25 pp (0% → 25%) - Regressions: fill_build -67%, world -20% Knowledge Lookup Tools (4 new): - plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs - minecraft.changelog_lookup: version history from Minecraft Wiki - paper.docs_lookup: Paper server-specific documentation - Wired into gateway model-driven tool loop and exploration self-play Exploration Self-Play: - General (vanilla MC) and plugins focus modes - Wiki-grounded: model researches before acting, validates through RCON - 2,243 exploration examples generated, 150 kept after quality filtering Training Progress Chart: - SVG chart showing training examples and inverse loss across versions - Added to MODEL_CARD.md for Gitea display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 15:28:09 -04:00
Mortdecai	da8f557219	GPU scheduler, 14-tool architecture, plugin deployment, event dispatcher GPU Scheduler (gpu.sethpc.xyz): - Live dashboard with 4 GPUs, training monitor, loss sparklines - Preset-based job scheduler with 3 triggers (time, finish_training, cost) - Model selection per GPU, pipeline configuration - Tool self-play and training pipeline types - Behind Google OAuth, live-refresh without page reload Tool Architecture (14 tools): - 3 new tools: world.nearby_entities, memory.read, memory.write - 7 script.* tools: write, validate, execute, read, list, delete, schedule - ScriptManager: full mcfunction datapack CRUD with RCON validation - Training data: 1,430 tool examples (up from 1,159) Plugin Deployment (paper-ai-25567): - WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3 - Fresh greenfield world reset - 104 RCON-validated plugin training examples Event Dispatcher: - Watches server log for deaths, joins, advancements, PvP kills - Configurable trigger probability and cooldowns per event type - Deployed to dev server, fires god_system prompts on events - 21 event-response training examples Training Infrastructure: - train_lora.py: --save-steps 50, --resume from checkpoint - run_training.sh: stops Ollama, activates conda, restarts after - Passwordless sudo for ollama services on steel141 - Dev server added to MCSManager with autoStart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 03:14:45 -04:00
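The event dispatcher's per-event probability and cooldown gating can be pictured with a minimal sketch. This is not the deployed dispatcher. The config shape, the `EventDispatcher` class and the `parse_join` helper are hypothetical. The join regex assumes Paper's usual `<name> joined the game` log wording.

```python
import random
import re
import time
from typing import Callable, Optional

# Hypothetical pattern for a Paper log line such as "[12:00:01 INFO]: Steve joined the game"
JOIN_RE = re.compile(r"\]: (\w+) joined the game$")


class EventDispatcher:
    """Decides whether a log event should trigger a god_system prompt."""

    def __init__(self, config: dict, rng: Optional[random.Random] = None,
                 clock: Callable[[], float] = time.monotonic):
        # config example (hypothetical): {"join": {"probability": 0.3, "cooldown": 120}}
        self.config = config
        self.rng = rng or random.Random()
        self.clock = clock
        self._last_fired = {}

    def should_fire(self, event_type: str) -> bool:
        cfg = self.config.get(event_type)
        if cfg is None:
            return False  # unconfigured event types never fire
        now = self.clock()
        last = self._last_fired.get(event_type)
        if last is not None and now - last < cfg["cooldown"]:
            return False  # this event type is still cooling down
        if self.rng.random() >= cfg["probability"]:
            return False  # lost the dice roll
        self._last_fired[event_type] = now
        return True


def parse_join(line: str) -> Optional[str]:
    """Return the joining player's name, or None if the line is not a join."""
    match = JOIN_RE.search(line.rstrip())
    return match.group(1) if match else None
```

Injecting `clock` and `rng` keeps the gate deterministic under test. The cooldown only restarts when an event actually fires.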
Mortdecai	434589d098	Prompt pipeline: 1660 generates, bigger GPUs process via Mortdecai Architecture: - 1660 Super (qwen3.5:0.8b) generates diverse edge-case prompts - 2080 Ti / RTX 4000 / 3090 Ti process through Mortdecai + RCON validation - File-based queue with locking for multi-GPU coordination - 10 prompt categories targeting known weaknesses Categories: fill_syntax, enchantments, execute_chains, entity_targeting, gamerules_timed, memory_commands, creative_prayers, edge_items, multicommand, natural_language Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 00:08:48 -04:00
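A file-based queue shared by several GPU workers needs an exclusive lock around each claim. The sketch below shows one way to do that on Linux/POSIX with `fcntl.flock` over a JSON-lines file. The file layout and the `enqueue`/`claim_next` names are assumptions, not the project's actual queue.

```python
import fcntl
import json
import os
from typing import Optional


def _open_locked(path: str):
    """Open (creating if needed) the queue file read/write and take an exclusive lock."""
    f = os.fdopen(os.open(path, os.O_RDWR | os.O_CREAT, 0o644), "r+", encoding="utf-8")
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other worker holds the lock
    return f


def enqueue(path: str, item: dict) -> None:
    """Append one prompt; closing the file releases the lock."""
    with _open_locked(path) as f:
        f.seek(0, os.SEEK_END)
        f.write(json.dumps(item) + "\n")


def claim_next(path: str) -> Optional[dict]:
    """Pop the oldest prompt under the lock; returns None when the queue is empty."""
    with _open_locked(path) as f:
        lines = [line for line in f.readlines() if line.strip()]
        if not lines:
            return None
        f.seek(0)
        f.writelines(lines[1:])  # rewrite the remaining items
        f.truncate()
        return json.loads(lines[0])
```

Rewriting the whole file on each claim is fine for a few thousand prompts. A larger queue would more likely use one file per item or an offset pointer.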
Mortdecai	3c1cbfce39	Shared player memory system + 39 training examples Memory system (agent/tools/player_memory.py): - Per-server JSON with owner tagging, cross-player references - Location, preference, fact memory types - Thread-safe, 50/player 500/server limits - format_memory_context() injected into LLM prompts Model output wired (mc_aigod_paper.py): - memory_write processed → saves to JSON, confirms in chat - memory_read processed → displays results in chat - Memory context injected into prayer prompts 39 training examples: - 7 location saves ("remember this as home") - 7 location recalls + tp ("tp me home", cross-player) - 5 memory queries ("what do you know about me") - 3 memory deletes - 4 preferences ("I prefer diamond tools") - 4 facts ("I am building a castle") - 4 memory-informed commands (give tools for current project) - 5 edge cases (no memory found, server-wide, overwrite) Seed dataset: 3,175 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 23:37:32 -04:00
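The per-player and per-server limits, owner tagging and cross-player lookups can be sketched as a small thread-safe store. This is a hypothetical `MemoryStore`, not the actual `player_memory.py` API. It assumes that a write over the limit is rejected rather than evicting old entries, and that writing an existing key for the same owner overwrites it.

```python
import json
import threading
from typing import Optional

PER_PLAYER_LIMIT = 50
PER_SERVER_LIMIT = 500


class MemoryStore:
    """Per-server memory list; each entry is tagged with the player who owns it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries = []

    def write(self, owner: str, kind: str, key: str, value: str) -> bool:
        """Store or overwrite a memory; returns False if a limit would be exceeded."""
        with self._lock:
            for entry in self._entries:
                if entry["owner"] == owner and entry["key"] == key:
                    entry.update(kind=kind, value=value)  # overwrite in place
                    return True
            if len(self._entries) >= PER_SERVER_LIMIT:
                return False
            if sum(e["owner"] == owner for e in self._entries) >= PER_PLAYER_LIMIT:
                return False
            self._entries.append({"owner": owner, "kind": kind, "key": key, "value": value})
            return True

    def read(self, key: str, owner: Optional[str] = None) -> list:
        """Look up by key; omit owner to search every player's memories."""
        with self._lock:
            return [dict(e) for e in self._entries
                    if e["key"] == key and (owner is None or e["owner"] == owner)]

    def save(self, path: str) -> None:
        """Persist the whole server's memories as one JSON file."""
        with self._lock:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(self._entries, f, indent=2)
```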
Mortdecai	8158178a56	Shared player memory system + whitelist migration to CT 650 player_memory.py: - Per-server JSON with owner tagging, cross-player references - write/read/delete with thread safety and limits (50/player, 500/server) - format_memory_context() for LLM prompt injection - handle_memory_write/read for model output processing - MODEL_OUTPUT_SCHEMA with commands, memory_write, memory_read, revert_after mortdecai-sites (CT 650): - Whitelist app migrated from CT 644, RCON via LAN (192.168.0.244) - All 4 sites verified: mortdec.ai, docs, git, minecraft Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 23:28:04 -04:00
Mortdecai	84036d39ca	revert_after in model output + 20 training examples Model can now output revert_after (seconds) and revert_commands fields. Python service schedules timer from model's response, not just heuristics. Players notified of revert countdown. Revert announced when applied. Training examples: temporary gamerules with explicit/implicit/no duration, permanent changes (no revert), effects with built-in duration, combined reverts. Key principle: no duration specified → default 5 min revert for safety. "permanently"/"forever"/"always" → no revert. Effects → built-in duration, no revert_after needed. Seed dataset: 3,136 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 23:25:20 -04:00
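The revert rules in this commit can be expressed as a small decision function:

- no duration means a 5-minute default;
- "permanently"/"forever"/"always" means no revert;
- effects carry their own duration.

This is a minimal sketch that assumes durations appear as plain "N seconds/minutes/hours" phrases. In the described design the model emits `revert_after` itself, so a function like this hypothetical `resolve_revert_after` would serve only as a fallback or as a label checker for training data.

```python
import re
from typing import Optional

DEFAULT_REVERT_SECONDS = 300  # no duration specified -> 5 min revert for safety
PERMANENT_WORDS = ("permanently", "forever", "always")
UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600}
DURATION_RE = re.compile(r"(\d+)\s*(second|sec|minute|min|hour)s?\b")


def resolve_revert_after(prompt: str, commands: list) -> Optional[int]:
    """Return seconds until revert, or None when no revert should be scheduled."""
    if not any(cmd.lstrip("/").startswith("gamerule ") for cmd in commands):
        return None  # e.g. effects already have a built-in duration
    text = prompt.lower()
    if any(word in text for word in PERMANENT_WORDS):
        return None
    match = DURATION_RE.search(text)
    if match:
        return int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return DEFAULT_REVERT_SECONDS
```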
Seth	06b082bd21	0.5.0 pre-training: 9,444 examples, prod pattern fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:48:54 -04:00
Seth	bd65f4a84c	Add LICENSE, MODEL_CARD, requirements, CONTRIBUTING Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:43:21 -04:00
Seth	f39809eaca	Semver rename: v1-v5 → 0.1.0-0.5.0 across all files Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH) - 0.x.0 = pre-release development - 1.0.0 = first public/monetized release Renamed everywhere: PLAN.md, training scripts, self-play, overnight script, status printer, whitelist app, discord bot, all training data references. Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0 Server configs updated on all three servers. Self-play restarted with new model name. Entity targeting + radius-aware kill + distance scale training added. Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:37:14 -04:00
Seth	a03c0a8087	17 radius-aware kill examples: context determines blast radius Radius scales with intent: - "the zombie" → limit=1,sort=nearest,distance=..10 (surgical, risk 3) - "all zombies near me" → distance=..30 (area clear, risk 3) - "everything in the area" → distance=..100 (large, risk 2) - "every mob everywhere" → no distance cap (risk 1, refuses by default) Context-aware radius: - "attacking me" → 15 (melee range) - "shooting at me" → 20 (bow range) - "this building" → 25 (structure) - "whole city" → 500 (massive) - "the farm" → 30 + specific animal types Seed dataset: 2,503 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:27:20 -04:00
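The radius-by-intent mapping can be sketched as a selector builder. The scope labels are hypothetical; the radii are the ones listed in this commit. One further assumption: because RCON commands do not run from the player's position, the sketch anchors with `execute at <player>` so that `distance` is measured from the player.

```python
# Hypothetical scope labels mapped to the radii listed in the commit message.
SCOPE_RADIUS = {"single": 10, "near_me": 30, "area": 100, "everywhere": None}


def kill_command(player: str, entity_type: str, scope: str) -> str:
    """Build a kill command whose reach scales with the request's intent."""
    args = [f"type={entity_type}"]
    if scope == "single":
        args += ["limit=1", "sort=nearest"]  # "the zombie": one target only
    radius = SCOPE_RADIUS[scope]
    if radius is not None:
        args.append(f"distance=..{radius}")
    # Anchor at the player so distance is measured from them, not the RCON source.
    return f"execute at {player} run kill @e[{','.join(args)}]"
```

Per the risk notes above, the uncapped `everywhere` scope is risk 1 and would normally be refused before a command is ever built.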
Seth	634f0137bb	10 entity targeting examples: THE zombie vs ALL zombies Teaches the model to distinguish: - "kill the zombie" → limit=1,sort=nearest (specific target) - "kill all zombies" → distance=..30 (area clear) - "what mobs are nearby" → requires world.nearby_entities tool - "target the closest enemy" → type=!player,limit=1,sort=nearest With LangGraph tools enabled, world.nearby_entities gives the model entity awareness before generating kill commands. Seed dataset: 2,486 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:25:03 -04:00
Seth	5c71976a34	22 distance scale examples: 1 block to 30 million Scale reference baked into training: - slightly (1-3) → close (5) → nearby (20-30) → far (500) → very far (1000) - edge of world (29,999,900) → max tp distance - Vertical: bedrock (-60) → diamond (-59) → sea (63) → clouds (192) → build limit (319) - Nether 1:8 scale mechanics - Real world: 1 block = 1 meter, mile = 1609 blocks, marathon = 42195 blocks - World size: 60M × 60M blocks (surface area of Neptune) - Chunk = 16 blocks, region = 512 blocks Seed dataset: 2,476 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:23:11 -04:00
Seth	b6e5874a11	45 new examples: chaos events, fireball/projectile mechanics, distance concepts Chaos events (4): multi-phase dramatic sequences, earthquake, TNT rain, zen transition Chaos gaps (23): pray thunder/lightning, execute at @a patterns, charged creepers, custom NBT fuse, magma blocks, lava, obsidian, music discs, say broadcast Distance/projectile (18): far/near/close in blocks, fireball Motion+ExplosionPower, dragon fireball, wither skull, arrow/trident entities, mob spawn/aggro ranges, explosion radius reference (creeper/TNT/wither/bed) Gateway updated: single-call mode with full tooling on all servers. Seed dataset: 2,454 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:20:30 -04:00
Seth	0f043384e5	Self-play: --api-key for authenticated gateway connections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 19:40:01 -04:00
Seth	aa5400e31e	12 multi-step dependency training examples Teaches command ordering and dependencies: - Build structure THEN tp inside (not reverse) - Apply protection BEFORE spawning hostile mobs - Create water pool BEFORE dropping player - Effects before gear (protection active during equip) - Clear mobs before healing (don't waste heal) - Cage before tp victim (prevent escape) Key principle: reasoning explains WHY order matters. Seed dataset: 2,409 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:43:03 -04:00
Seth	ead16fd429	Persistent RCON connections — fixes server crash from connection spam Root cause: self-play opened/closed a new TCP socket for every RCON command (hundreds/minute). Paper's RCON listener creates a thread per connection, overwhelming the server until it stopped. Fix: PersistentRCON class maintains a single connection per server with auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port. Applied to: - mc_aigod_paper.py (prod paper-ai + dev) - mc_aigod.py (shrink-world) - self_play.py (training data generation) - persistent_rcon.py (shared module) Before: ~100+ RCON connections/minute → server crash After: 3 persistent connections total → stable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:24:44 -04:00
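The connection-reuse fix is sketched minimally below. It is not the project's `persistent_rcon.py`: the class layout and the `get_rcon` pool helper are hypothetical. It uses the Source RCON framing that Minecraft speaks: a little-endian length, request id and type, then a NUL-terminated body. Type 3 logs in and type 2 runs a command. For brevity it assumes each reply fits in one packet, so long outputs that the server splits across packets are not reassembled.

```python
import socket
import struct
import threading

SERVERDATA_AUTH = 3
SERVERDATA_EXECCOMMAND = 2


class RCONAuthError(Exception):
    pass


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    # Layout: int32 length | int32 id | int32 type | body | two NUL bytes
    payload = struct.pack("<ii", request_id, packet_type) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple:
    length, request_id, packet_type = struct.unpack("<iii", data[:12])
    return request_id, packet_type, data[12:4 + length - 2].decode("utf-8")


class PersistentRCON:
    """One long-lived RCON socket per server, shared by all threads."""

    def __init__(self, host: str, port: int, password: str, timeout: float = 5.0):
        self.host, self.port, self.password, self.timeout = host, port, password, timeout
        self._sock = None
        self._lock = threading.Lock()
        self._next_id = 1

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self._sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("RCON socket closed by server")
            buf += chunk
        return buf

    def _roundtrip(self, packet_type: int, body: str) -> tuple:
        request_id = self._next_id
        self._next_id += 1
        self._sock.sendall(encode_packet(request_id, packet_type, body))
        header = self._recv_exact(4)
        (length,) = struct.unpack("<i", header)
        return decode_packet(header + self._recv_exact(length))

    def _connect(self) -> None:
        self._sock = socket.create_connection((self.host, self.port), timeout=self.timeout)
        request_id, _, _ = self._roundtrip(SERVERDATA_AUTH, self.password)
        if request_id == -1:  # the server answers a bad password with id -1
            self._close()
            raise RCONAuthError("RCON password rejected")

    def _close(self) -> None:
        if self._sock is not None:
            self._sock.close()
        self._sock = None

    def command(self, cmd: str) -> str:
        with self._lock:  # one in-flight command per connection
            for attempt in range(2):
                try:
                    if self._sock is None:
                        self._connect()
                    return self._roundtrip(SERVERDATA_EXECCOMMAND, cmd)[2]
                except OSError:
                    self._close()  # drop the dead socket, reconnect once
                    if attempt == 1:
                        raise
        raise RuntimeError("unreachable")


_pool = {}
_pool_lock = threading.Lock()


def get_rcon(host: str, port: int, password: str) -> PersistentRCON:
    """Return the shared connection for host:port, creating it lazily."""
    key = f"{host}:{port}"
    with _pool_lock:
        if key not in _pool:
            _pool[key] = PersistentRCON(host, port, password)
        return _pool[key]
```

The connection opens lazily on the first `command()` call, so every caller in a process shares one socket per server.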
Seth	67179f75ad	Self-play data + mortdecai-sites container + Grafana 3-GPU dashboard Self-play: 218+ examples from overnight 3-GPU run (3090 Ti + 2080 Ti + RTX 4000) Now running independently per GPU (no synchronization bottleneck) 50 rounds/tier, 0.1s sleep — near 100% GPU utilization Infrastructure: - CT 650 (mortdecai-sites) on pve112: landing page + docs + Gitea - mortdec.ai landing page live - docs.mortdec.ai MkDocs with Material theme - git.mortdec.ai Gitea instance (fresh, needs admin setup) - GPU exporter on RTX 4000 (node-197) - Mortdecai GPU Monitoring dashboard in Grafana (all 3 GPUs) - DNS updated via SethDDNS (GCP + Cloudflare) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 08:06:51 -04:00
Seth	25918b5b66	Self-play: 50 rounds, 0.1s sleep, max GPU utilization Bumped from 20 rounds/tier to 50. Reduced sleep from 1s to 0.1s. GPUs should run near 100% — Ollama queues requests internally. mortdecai-sites container (CT 650) created on pve112. Landing page live at mortdec.ai. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 07:36:01 -04:00
Seth	dcc40a0bf8	Mortdecai v4 bake-off: 75.5% cmd match, 99.7% safety, 4.0s avg 2,397 test cases on steel141 RTX 3090 Ti: - Command match: 75.5% - Exact match: 22.9% - Syntax correct: 80.5% - Safety compliance: 99.7% - No gratuitous tp: 98.5% - Avg latency: 4006ms Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 05:55:14 -04:00
Seth	027b835286	Session final: bakeoff fix, branding fonts, 3-GPU parallel self-play Current running state: - Prod: mortdecai-v4 on RTX 4000, single-call, error correction, fall protection - Dev: Gemini 3.1 Flash Lite (preview) + 5 bots generating training data - Bake-off: v4 running on steel141 (3090 Ti) - Self-play: ready for overnight — 3 GPUs parallel (3090 Ti + 2080 Ti + RTX 4000) Changes: - Bakeoff parser: strips think blocks, handles dict/list types - Branding fonts: Rajdhani-Bold (official), Exo2, Orbitron, Oxanium, SpaceGrotesk - Gemini 3.1 pricing added to cost tracker Active data collection: - Gemini 3.1 Flash Lite bots on dev ($20 budget, ~$4/hr with 5 bots) - Self-play overnight: 3 tiers × 3 GPUs = ~9x throughput - Training audit logging on all servers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:56:45 -04:00
Seth	3580d350b4	Parallel 3-GPU self-play: all tiers run simultaneously Each cycle runs all three tiers at the same time on different GPUs: - Tier 1 (drills) on GPU A - Tier 2 (self-critique) on GPU B - Tier 3 (adversarial) on GPU C GPU assignments rotate each cycle for even wear. 3x throughput vs sequential. RCON handles concurrent commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:55:24 -04:00
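Running all tiers at once while rotating GPU assignments each cycle amounts to shifting the tier-to-endpoint mapping by one slot per cycle. A minimal sketch follows. The endpoint URLs and the `run_tier` callback (assumed to return a count of examples) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

TIERS = ["tier1_drills", "tier2_self_critique", "tier3_adversarial"]


def assign_tiers(cycle: int, endpoints: list) -> dict:
    """Shift the tier-to-GPU mapping by one slot per cycle for even wear."""
    n = len(endpoints)
    return {tier: endpoints[(i + cycle) % n] for i, tier in enumerate(TIERS)}


def run_cycle(cycle: int, endpoints: list, run_tier: Callable[[str, str], int]) -> dict:
    """Run every tier simultaneously, each against its own Ollama endpoint."""
    plan = assign_tiers(cycle, endpoints)
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = {tier: pool.submit(run_tier, tier, url) for tier, url in plan.items()}
        return {tier: f.result() for tier, f in futures.items()}
```

Over any three consecutive cycles, each tier visits each GPU exactly once.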
Seth	de14f4a1c8	3-GPU overnight self-play: 3090 Ti + 2080 Ti + RTX 4000 Round-robin load balancing across three Ollama instances: - 141:11434 (RTX 3090 Ti 24GB) - 141:11435 (RTX 2080 Ti 11GB) — new second instance - 179:11434 (RTX 4000 16GB) Each tier cycles to a different GPU. 3x throughput overnight. Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:54:29 -04:00
Seth	9ef5ab5aa4	PLAN.md complete update — v4 deployed, all session work documented Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:49:57 -04:00
Seth	7ae9a499fa	26 death/environment training examples, Mortdecai v4 deployed Death mechanics training: - Drowning (3): water trap, water breathing, emergency rescue - Lava (3): lava pool, fire resistance, emergency rescue - Void (2): below Y=-64, void damage explanation - Explosion (3): TNT, charged creeper, bed in nether - Mob proximity (5): warden, zombie range, skeleton range, mob surround, combat buffs - Starvation (2): hunger effect, food bar mechanics - Contact damage (3): cactus, magma blocks, berry bushes - Lightning (2): direct strike, thunderstorm combo - Environment awareness (3): safety check, time query, night danger Mortdecai v4 deployed to prod (paper-ai + shrink-world) Dev on Gemini 3.1 Flash Lite with 5 bots ($20 budget, ~5hr) Seed dataset: 2,397 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:26:50 -04:00
Seth	d7138b3514	33 fall safety + suffocation training examples, fall damage test data Fall safety (25 examples): - Fall damage math (distance-3 = damage, 23 blocks = lethal) - Water/slime/hay/cobweb negate or reduce fall damage - Intent detection: "drop me" = no protection, "tp me up" = add slow_falling - Height-specific: 4m trivial, 10m hurts, 20m+ needs protection - Surface awareness: water safe, lava half damage + burn Suffocation (8 examples): - TP into solid block = suffocation (1 heart/0.5s) - Sand/gravel crushing (gravity blocks) - Obsidian trap, underground tp - Safety: don't tp into blocks unintentionally Raw fall damage test results from dev server (noisy but informative) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 00:07:36 -04:00
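The fall damage arithmetic used in these examples (damage = blocks fallen minus 3, with 23 blocks lethal at full health) can be checked with a few lines. This sketch covers a plain fall onto a solid block only. It ignores armor, Feather Falling, potion effects and the water/slime/hay/cobweb modifiers. The `needs_slow_falling` threshold simply encodes the "20m+ needs protection" heuristic from the commit.

```python
PLAYER_MAX_HEALTH = 20  # health points; 2 points per heart
SAFE_FALL_BLOCKS = 3


def fall_damage(blocks_fallen: int) -> int:
    """Damage from a plain fall onto a solid block."""
    return max(0, blocks_fallen - SAFE_FALL_BLOCKS)


def is_lethal_fall(blocks_fallen: int, health: int = PLAYER_MAX_HEALTH) -> bool:
    return fall_damage(blocks_fallen) >= health


def needs_slow_falling(height: int) -> bool:
    # Heuristic from the training notes: 20+ blocks needs protection.
    return height >= 20
```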
Seth	98d035439d	PLAN.md complete rewrite — Mortdecai project status, TODOs, risk hierarchy Full rewrite reflecting current state: - Model history v1→v4, infrastructure map, API spend - Training data breakdown (3,477 total examples) - Active TODOs: immediate, short-term, v5, infrastructure, community - Risk hierarchy with permanence-based levels - Key architecture decisions log - Success criteria: v3 actual → v4 target → v5 goal - Single-call enabled on prod (mortdecai-v3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:45:03 -04:00
Seth	4fc94170e4	Gamerule revert timers, drop/height training, revert_after field for v5 Python revert system (live on prod): - Gamerule changes auto-revert after default timeout (5-10 min) - User can specify duration: "disable mobs for 5 minutes" - "permanently"/"forever" skips revert - Setting back to default cancels pending revert - Players notified of revert countdown Training data (20 examples): - 8 revert-aware gamerules with revert_after/revert_commands fields - 12 drop/height/tp examples: intentional drops, safe tp, context-aware Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:42:22 -04:00
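The Python revert system needs at most one pending revert per gamerule. Setting the rule back by hand must cancel that revert, and a newer request must replace it. A minimal sketch using `threading.Timer` follows. The `RevertScheduler` name and its methods are hypothetical.

```python
import threading
from typing import Callable


class RevertScheduler:
    """At most one pending revert per gamerule; rescheduling replaces the old timer."""

    def __init__(self) -> None:
        self._timers = {}
        self._lock = threading.Lock()

    def schedule(self, rule: str, seconds: float, revert: Callable[[], None]) -> None:
        with self._lock:
            old = self._timers.pop(rule, None)
            if old is not None:
                old.cancel()
            timer = threading.Timer(seconds, self._fire, args=(rule, revert))
            timer.daemon = True
            self._timers[rule] = timer
            timer.start()

    def cancel(self, rule: str) -> bool:
        """Call when the rule is set back to its default by hand."""
        with self._lock:
            timer = self._timers.pop(rule, None)
        if timer is None:
            return False
        timer.cancel()
        return True

    def _fire(self, rule: str, revert: Callable[[], None]) -> None:
        with self._lock:
            # Timer subclasses Thread, so current_thread() is this very timer.
            if self._timers.get(rule) is not threading.current_thread():
                return  # superseded or cancelled after the timer fired
            del self._timers[rule]
        revert()
```

The identity check in `_fire` covers a race in which a timer has already fired but has not yet taken the lock when it is replaced or cancelled.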
Seth	edfc365c5f	Dangerous effect caps: levitation 15s, wither 30s, poison 60s, nausea 30s Validator hardcodes maximum durations for dangerous effects: - Levitation: 15s max (player floats into sky and dies from fall) - Wither: 30s max (drains health, can kill) - Poison: 60s max - Nausea: 30s max 12 training examples: levitation safety, emergency clear, duration caps, "I can't stop floating" → clear levitation + slow falling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:35:57 -04:00
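A validator step that enforces these caps could rewrite the duration argument of `effect give`. This is a minimal sketch with a hypothetical function name. It also assumes two cases should be clamped: the `infinite` keyword, and an omitted duration, which vanilla treats as 30 seconds.

```python
# Caps in seconds, from the validator rules above.
EFFECT_CAPS = {"levitation": 15, "wither": 30, "poison": 60, "nausea": 30}


def cap_effect_duration(command: str) -> str:
    """Clamp `effect give <targets> <effect> [seconds] [amplifier]` to the cap."""
    parts = command.split()
    if len(parts) < 4 or parts[0].lstrip("/") != "effect" or parts[1] != "give":
        return command
    cap = EFFECT_CAPS.get(parts[3].removeprefix("minecraft:"))
    if cap is None:
        return command  # not a capped effect
    if len(parts) == 4:
        parts.append(str(cap))  # omitted duration would default above some caps
    elif parts[4] == "infinite" or (parts[4].isdigit() and int(parts[4]) > cap):
        parts[4] = str(cap)
    return " ".join(parts)
```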
Seth	b85b1a6725	40 risk hierarchy examples: L0 blocked, L1 permanent, L2 temporary, injections Risk hierarchy baked into training data: - L0 BLOCKED (15): ban, kick, stop, op, deop, whitelist, pardon, ban-ip - L1 REFUSE (9): permanent gamerules, gamemode @a, default gamemode, difficulty - L2 WARN (8): temporary gamerules with reversal intent, time-limited changes - L3 NORMAL (8): time/weather, tick speed, sleep %, chat cleanup - Prompt injection (5): fake admin claims, permission override attempts Key principle: permanence determines risk level. gamerule keepInventory true (permanent) = L1 gamerule doMobSpawning false for 5 min (temporary) = L2 randomTickSpeed 50 (easily reversed) = L3 Seed dataset: 2,306 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:30:46 -04:00
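The permanence-based levels can be mirrored by a rule-based classifier, which is useful for spot-checking labels. This is a sketch only. The blocked and refused command sets come from this commit. The `EASILY_REVERSED_RULES` set is a hypothetical stand-in for "easily reversed" gamerules such as tick speed and sleep percentage.

```python
from typing import Optional

BLOCKED_ROOTS = {"ban", "ban-ip", "kick", "stop", "op", "deop", "whitelist", "pardon"}
ALWAYS_REFUSED_ROOTS = {"difficulty", "defaultgamemode"}
# Hypothetical: gamerules cheap enough to flip back that they count as normal.
EASILY_REVERSED_RULES = {"randomTickSpeed", "playersSleepingPercentage"}


def classify_risk(command: str, revert_after: Optional[int] = None) -> int:
    """Return L0 (blocked) .. L3 (normal); permanence drives the level."""
    parts = command.lstrip("/").split()
    if not parts:
        return 3
    root = parts[0]
    if root in BLOCKED_ROOTS:
        return 0
    if root in ALWAYS_REFUSED_ROOTS:
        return 1
    if root == "gamemode" and "@a" in parts[2:]:
        return 1
    if root == "gamerule" and len(parts) >= 2:
        if parts[1] in EASILY_REVERSED_RULES:
            return 3
        return 2 if revert_after else 1  # temporary vs permanent change
    return 3
```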
Seth	fbf6974af3	49 gamerule + invincibility training examples Covers all major gamerules with natural language variants: - Mob spawning/griefing, keepInventory, daylightCycle, weatherCycle - Fire tick, insomnia/phantoms, instant respawn, natural regen - randomTickSpeed (crop growth), sleep percentage, TNT, fall/fire/drowning damage - Command feedback, advancement announcements, death messages - God mode / invincibility via resistance 5 effect - "disable mobs" and "invincibility me" — prompted by prod failures Seed dataset: 2,266 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:27:26 -04:00
Seth	7a31e500e4	Qwen3.5-9B on prod, Gemini 2.5 Flash for dev, error correction, branding Prod deployment: - paper-ai and shrink-world switched from gemma3n:e4b to qwen3.5:9b - Error correction: detects RCON errors (<--[HERE]), asks model to fix, retries - Broadened error patterns: Unknown game mode, Unknown enchantment, etc. - Fixed fire fallback matching "firework" as fire intent - Fixed command format examples (WRONG vs RIGHT in prompt) - max_tokens bumped to 600 for command calls - Removed template workflow commands from sudo prompt Dev server: - Gemini 2.5 Flash ($0.15/$0.60 per M tokens) replaces Flash Lite - 10 bots for ~$1-1.5/hr training data generation - Dynamic pricing by model name in cost tracker Branding: - Rajdhani Bold as official Mortdecai font - Logo variants: mortdecai + mortdec.ai in 6 fonts - Whitelist page updated with Mortdecai branding + mortdec.ai domain Whitelist UUID fix: - Looks up real Mojang UUID via api.mojang.com - Patches all whitelist.json files directly - No more offline-mode UUID mismatches WorldEdit schematics: - 77 schematics installed (villages, bridges, lighthouses, parks, etc.) Mortdecai v4 training in progress: 63% complete on steel141 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:09:27 -04:00
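The error-correction loop detects an RCON error, asks the model for a fix, and retries. It can be sketched with injected callables so the logic stays independent of any real RCON or LLM client. The function names and the retry count are hypothetical. The patterns are only the ones named in this commit, whose real list is longer.

```python
from typing import Callable

# Error markers named in the commit message; the deployed list is broader.
ERROR_PATTERNS = ("<--[HERE]", "Unknown game mode", "Unknown enchantment")


def looks_like_error(rcon_output: str) -> bool:
    return any(pattern in rcon_output for pattern in ERROR_PATTERNS)


def run_with_correction(command: str,
                        execute: Callable[[str], str],
                        ask_model_to_fix: Callable[[str, str], str],
                        max_retries: int = 2) -> tuple:
    """Run a command; on an RCON error, feed the error back to the model and retry."""
    output = execute(command)
    for _ in range(max_retries):
        if not looks_like_error(output):
            break
        command = ask_model_to_fix(command, output)
        output = execute(command)
    return command, output
```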
Seth	b75a737c11	7 enchantment syntax error examples: count order, typos, old NBT Common errors seen in prod: - Count before brackets: sword 1[enchantments=...] → sword[enchantments=...] 1 - Typo: enchanments → enchantments - Singular: enchantment → enchantments - Old NBT: {Enchantments:[...]} → [enchantments={...}] - Old abbreviated: {ench:[...]} → [enchantments={...}] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 22:20:33 -04:00
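Three of the listed mistakes are mechanical enough to repair with string rules: the typo, the singular form, and the count placed before the brackets. A minimal sketch follows. The function name is hypothetical. It leaves component values untouched, and it does not attempt the old-NBT-to-component conversion, whose target shape depends on the exact game version.

```python
import re


def fix_enchantment_syntax(command: str) -> str:
    """Repair common item-component mistakes in /give commands."""
    command = re.sub(r"\benchanments\b", "enchantments", command)  # typo
    command = command.replace("[enchantment=", "[enchantments=")   # singular
    # Count before the component list: "sword 1[...]" -> "sword[...] 1"
    command = re.sub(r"([a-z0-9_:]+) (\d+)(\[[^\]]*\])", r"\1\3 \2", command)
    return command
```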
Seth	a3d139e04f	Mortdecai v4 pre-training: /no_think, dedup, 3,369 examples - /no_think prepended to all system prompts (seed + tool training) - Deduplicated seed dataset (435 dupes removed) - Training script updated for Qwen3.5-9B + /no_think - 2,210 seed + 1,159 tool-calling = 3,369 total examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 20:15:00 -04:00
Seth	910d7b4ca7	Qwen3.5-9B bake-off results, model named Mortdecai Bake-off: qwen3.5:9b base model, 147 cases: - 70.1% command match (2x qwen3:8b baseline) - 15.6% needed syntax fixes - 29.9% miss (mostly God/prayer — no persona training) - Avg 7.5s, median 5.7s (thinking tokens) Model officially named Mortdecai. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:46:00 -04:00
Seth	9abf9238c5	3-tier self-play: command drills, self-critique, adversarial Tier 1 — Command drills: Random seed prompts → generate commands → RCON validates Teaches: accurate command syntax Tier 2 — Single-shot self-critique: Model invents a tricky prompt AND responds in one call RCON validates the self-generated commands Teaches: edge-case awareness, self-evaluation Tier 3 — Adversarial self-play: Session A generates challenging prompts Fresh Session B responds cold (can't cheat) RCON validates, self-corrects on errors Teaches: robustness, generalization Usage: --tier 1|2|3|all --rounds N --focus category Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:39:33 -04:00
Seth	c947fc3fa9	Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy Self-play (training/scripts/self_play.py): - Model generates edge-case prompts across 9 categories - Attempts commands via RCON, self-corrects on errors - Successful traces → standard training examples - Error correction traces → multi-turn tool-calling examples - Anti-collapse: focuses on categories model is weakest in - Ready for v4 deployment, not yet active Qwen3.5-9B base model bake-off (147/1542 cases): - 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement - 29.9% MISS (mostly God/prayer — no persona training) - 15.6% needed syntax fixes - Avg 7.5s response (thinking tokens) - Strong v4 candidate: better base + tool-calling architecture Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:35:57 -04:00
Seth	d31cdb21fd	1,833 training examples: entities, execute chains, multiplayer, advanced, redstone, biomes, errors New knowledge (291 examples): - Entity/mob commands (60): summon, kill, NBT, spawn eggs, passengers, named mobs - Execute chains (45): as/at/positioned/if/unless/store, dimension switching - Multiplayer targeting (45): selectors, teams, scoreboards, bossbars, tags - Advanced commands (45): tellraw, loot, clone, data, attributes, ride, forceload - Redstone knowledge (28): repeaters, comparators, pistons, observers, hoppers - Biome/dimension (28): nether/end tp, locate structure/biome, dimension awareness - Error correction (40): item ID fixes, enchant abbreviations, syntax mistakes Total seed dataset: 1,833 examples Tool-calling dataset: 1,159 examples Combined for v4 training: ~3,000 examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:22:32 -04:00
Seth	750cf15c79	1,542 seed + 1,159 tool-calling examples, async processing, validator tracking New knowledge baked in: - Enchantments (60): all 1.21 enchants, mutual exclusions, max levels, component syntax - WorldEdit (45): //set, //replace, //sphere, //stack, selection, brushes - Paper server (55): gamerules, permissions, plugins, scoreboard, moderation - Cosmetics/XP (42): title, tellraw, playsound, particle, xp, effect mechanics - Quantity boundaries (32): item tier caps, greedy→stingy, humble→generous Training infrastructure: - train_lora.py updated for multi-turn tool conversations + seed data - Async prayer/sudo processing (ThreadPoolExecutor, 3 workers) - Validator hit-rate tracking to /var/log/mc_validator_stats.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 19:03:30 -04:00
Seth	ee764cd22a	Tool-calling training: 1,159 multi-turn examples with error correction Tool schemas (agent/tools/tool_schemas.py): - rcon.execute: execute commands, get success/error results - minecraft.wiki_lookup: look up syntax and item info - world.player_info: player health, position, inventory - world.server_state: time, weather, online players - 10 RCON error patterns with corrections - 12 common error scenarios for training Training data generator (training/scripts/generate_tool_training.py): - Converts seed dataset to multi-turn tool conversations - Error correction: model tries wrong command → gets error → self-corrects - Wiki/player/server lookups for uncertainty scenarios - Qwen3 native tool-calling format with <tool_call> tags 1,159 examples: 1043 success, 79 error correction, 24 error scenarios, 13 tool lookups. Ready for v4 training. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 18:49:08 -04:00
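Consuming the Qwen3-style output described here means pulling JSON objects out of `<tool_call>` tags. This is a minimal sketch. It assumes each tag wraps one `{"name": ..., "arguments": ...}` object, and the extractor name is hypothetical.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Pull {"name": ..., "arguments": ...} objects out of <tool_call> tags."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed call: skip it
        if "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls
```

The lazy `.*?` still captures nested braces. It has to reach a `}` that is immediately followed by the closing tag.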
Seth	4e83da39fd	Quantity boundaries: item tier caps, tone-based scaling, 32 training examples God Soul updated with quantity rules: - Common (dirt/wood): max 320, Uncommon (iron/gold): max 128 - Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4 - Forbidden (bedrock/command_block): never give - Greedy → scaled back, Humble → generous within cap, Absurd → comedic 32 training examples: greedy(6), casual(6), humble(4), explicit(6), forbidden(5), absurd(3), enchanted(2) Dataset: 1,340 examples total Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 18:22:26 -04:00
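The tier caps can be enforced as a final clamp on whatever quantity the model proposes. This is a sketch only. The item-to-tier table is a small illustrative sample, and the default tier for unlisted items is a hypothetical choice.

```python
TIER_CAPS = {"common": 320, "uncommon": 128, "rare": 32, "very_rare": 4, "forbidden": 0}
# Illustrative item-to-tier sample; a real table would cover every item.
ITEM_TIERS = {
    "dirt": "common", "oak_log": "common",
    "iron_ingot": "uncommon", "gold_ingot": "uncommon",
    "diamond": "rare", "emerald": "rare",
    "netherite_ingot": "very_rare", "elytra": "very_rare",
    "bedrock": "forbidden", "command_block": "forbidden",
}


def clamp_quantity(item: str, requested: int, default_tier: str = "uncommon") -> int:
    """Scale a request down to its tier cap; forbidden items clamp to zero."""
    tier = ITEM_TIERS.get(item.removeprefix("minecraft:"), default_tier)
    return max(0, min(requested, TIER_CAPS[tier]))
```

The tone-based scaling (greedy, humble, absurd) happens before this point. The clamp only guarantees the ceiling.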
Seth	e780aef8c6	v3 model trained (1,308 examples, loss 0.55), API cascade, context update v3 training: - 1,308 examples: curated + Claude-distilled + bot audit + recipes + command ref - 1 epoch, rank 16, LR 1e-4, loss 0.55 (sweet spot) - GGUF Q4_K_M exported, loaded in Ollama as qwen3-8b-mc-lora-v3 - Correct commands, no Chinese, proper safety refusals, dramatic God persona API cascade for dev server: - Stage 1: Claude Haiku ($20 budget, ~$11 spent) - Stage 2: Gemini 2.5 Flash Lite ($20 budget) - Stage 3: qwen3-8b-mc-lora-v3 (free, local) - Gemini call function with persistent cost tracking - Full status report printed at each $1 milestone Data collection: 2,677 dev audit entries and growing Bot status printer budget display fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 04:52:04 -04:00
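The budget-driven API cascade and the per-dollar status report reduce to two small checks. This is a minimal sketch. The `Stage` dataclass and both function names are hypothetical. The stage names and budgets are taken from this commit.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    budget_usd: float  # float("inf") for the free local model
    spent_usd: float = 0.0


def pick_stage(stages: list) -> Stage:
    """Use the first stage whose budget is not yet exhausted."""
    for stage in stages:
        if stage.spent_usd < stage.budget_usd:
            return stage
    return stages[-1]  # fall back to the last (local) stage


def crossed_dollar_milestone(before: float, after: float) -> bool:
    """True when cumulative spend crosses a whole-dollar boundary."""
    return int(after) > int(before)
```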
Seth	234f2722db	v3 training dataset: 1,308 examples with risk_level + distilled data Merged: 964 curated + 344 Claude-distilled = 1,308 total All examples tagged with risk_level (0-4) Model outputs risk classification in training target Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:51:17 -04:00
Seth	e28836106f	Risk_level in all 644 examples + model outputs risk classification - All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74) - Training output now includes risk_level field for decision transparency - Model learns to classify risk before generating commands - Validator can sanity-check: risk 0-1 should have empty commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:35:50 -04:00
Seth	0083e80aca	Persistent Haiku cost tracking, Sethian whitelist web app - Haiku cost persists to /var/log/mc_anthropic_cost.json (survives restarts) - Status printer reads persistent cost file instead of journalctl - Seeded at $3.08 estimated cumulative spend - Whitelist app: Sethian Dark theme, mission description, server info Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:29:19 -04:00
Seth	0473eb0b50	Minecraft knowledge corpus, recipe trees, GitHub scraper, 644 examples Knowledge corpus (knowledge/mc-data/): - 1505 items, 886 crafting recipes, 1166 blocks from minecraft-data 1.21.11 - Recipe dependency tree builder (knowledge/build_recipe_tree.py) - Crafting chain training: "give me everything to make X from scratch" - Smelting recipes, version awareness examples Training data (644 examples total): - 107 command syntax reference examples (every command + common errors) - 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting) - 344 Claude-distilled examples (222 sudo + 122 god via Haiku) - Live bot audit data ingested (128 examples from dev server) Swarm bots: - Swimming/water escape logic - Door opening - Context-aware prayers (inventory, health, time, depth) - Prefix enforcement on all Gemini/Dolphin prompts GitHub log scraper (data/scrape_server_logs.py): - Searches GitHub for Minecraft server logs with commands - Strict 1.20.5+ version filter - Extracts command pairs, converts to training format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 20:33:09 -04:00
Seth	65ee146043	Swarm bots, RCON validation, Haiku distillation complete Swarm bots (ingame/swarm_bots.js): - 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.) - All bots wander, take damage, auto-respawn, pray when hurt - Gemini + Dolphin(5%) + Multilingual(3%) prompt generation - 20-60s interaction interval per bot Distillation results: - 222 sudo examples via Haiku ($0.28) - 122 god examples via Haiku ($0.37) — with God Soul personality - Total: 344 distilled, $0.65 spent of $5 budget - RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands validate_distilled.py: - Executes distilled commands on live server via RCON - Distinguishes real errors from benign (no player online) - Tags each example with validation status Dev server switched to Claude Haiku via Anthropic API: - llm_provider: anthropic with $5 budget cap - Auto-fallback to Ollama when budget exhausted - Cost tracking with logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 19:18:19 -04:00
Seth	961f53ea7d	God Soul document, Claude distillation pipeline, soul-driven prompts God Soul (agent/prompts/god_soul.md): - Adapted from Claude's soul framework for the Minecraft God character - Defines identity, principals hierarchy, decision-making framework - Spectrum of responses (generous→silence), risk awareness, multilingual divinity - Honesty within character, intervention guidelines - Deployed to both prod and dev servers System prompts updated: - God prompt loads soul document dynamically - Intervention prompt references soul for personality guidance - Both include multilingual instruction (match player's language) Distillation pipeline (training/scripts/distill.py): - Sends all training examples through Claude API - Haiku for sudo ($0.25), Sonnet for god ($0.50) - Budget-capped, cost-tracked, --dry-run supported - Outputs distilled.jsonl with Claude-quality responses Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:28:21 -04:00
Seth	62419976e5	361 training examples, default to 1 epoch Ingested 128 new examples from bot-driven data collection. Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty. Changed default epochs from 3 to 1 (previous run overfit at loss 0.10). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:03:33 -04:00
Seth	17a2a95f56	Add multilingual prompts (3%) — 12 languages from Qwen3 supported set Spanish, French, German, Portuguese, Russian, Japanese, Korean, Chinese, Arabic, Vietnamese, Indonesian prayer/sudo prompts. 3% rate keeps dataset mostly English without losing multilingual capability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:02:54 -04:00
Seth	13debc8a59	Add audit log ingestion pipeline with language/leak filtering data/ingest_audit.py: - Pulls training audit logs from CT 644 (dev + prod) - Filters: language mismatch (Chinese output for English input), system prompt leaks, empty responses, duplicates - Keeps multilingual examples where input/output languages match - Converts to dataset schema, appends to seed_dataset.jsonl - --dry-run to preview, --source dev/prod/both Tested: 237 entries → 112 kept (16 lang mismatch, 10 prompt leak, 86 dupe, 13 empty dropped) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 17:58:52 -04:00
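The four ingestion filters are: empty responses, Chinese output for non-Chinese input, system-prompt leaks, and duplicates. They can be sketched as a single pass that also tallies drop reasons. The `prompt`/`response` field names and the `leak_marker` parameter are assumptions about the audit-log schema. Matching-language multilingual pairs pass through.

```python
import re
from collections import Counter

CJK_RE = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block


def filter_audit_entries(entries: list, leak_marker: str) -> tuple:
    """Drop empty, language-mismatched, prompt-leaking and duplicate entries."""
    kept, dropped, seen = [], Counter(), set()
    for entry in entries:
        prompt = entry.get("prompt", "").strip()
        response = entry.get("response", "").strip()
        if not response:
            dropped["empty"] += 1
        elif CJK_RE.search(response) and not CJK_RE.search(prompt):
            dropped["lang_mismatch"] += 1  # Chinese output for non-Chinese input
        elif leak_marker in response:
            dropped["prompt_leak"] += 1
        elif (prompt, response) in seen:
            dropped["dupe"] += 1
        else:
            seen.add((prompt, response))
            kept.append(entry)
    return kept, dropped
```

Check order matters for the tallies. An entry that is both a leak and a duplicate is counted once, under the first matching rule.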


63 Commits