GPU Scheduler (gpu.sethpc.xyz):
- Live dashboard with 4 GPUs, training monitor, loss sparklines
- Preset-based job scheduler with 3 triggers (time, finish_training, cost)
- Model selection per GPU, pipeline configuration
- Tool self-play and training pipeline types
- Behind Google OAuth, live-refresh without page reload
Tool Architecture (14 tools):
- 3 new tools: world.nearby_entities, memory.read, memory.write
- 7 script.* tools: write, validate, execute, read, list, delete, schedule
- ScriptManager: full mcfunction datapack CRUD with RCON validation
- Training data: 1,430 tool examples (up from 1,159)
Plugin Deployment (paper-ai-25567):
- WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3
- Fresh greenfield world reset
- 104 RCON-validated plugin training examples
Event Dispatcher:
- Watches server log for deaths, joins, advancements, PvP kills
- Configurable trigger probability and cooldowns per event type
- Deployed to dev server, fires god_system prompts on events
- 21 event-response training examples
Training Infrastructure:
- train_lora.py: --save-steps 50, --resume from checkpoint
- run_training.sh: stops Ollama, activates conda, restarts after
- Passwordless sudo for ollama services on steel141
- Dev server added to MCSManager with autoStart
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Model can now output revert_after (seconds) and revert_commands fields.
Python service schedules timer from model's response, not just heuristics.
Players notified of revert countdown. Revert announced when applied.
Training examples: temporary gamerules with explicit/implicit/no duration,
permanent changes (no revert), effects with built-in duration, combined reverts.
Key principle: no duration specified → default 5 min revert for safety.
"permanently"/"forever"/"always" → no revert.
Effects → built-in duration, no revert_after needed.
Seed dataset: 3,136 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches the model to distinguish:
- "kill the zombie" → limit=1,sort=nearest (specific target)
- "kill all zombies" → distance=..30 (area clear)
- "what mobs are nearby" → requires world.nearby_entities tool
- "target the closest enemy" → type=!player,limit=1,sort=nearest
With LangGraph tools enabled, world.nearby_entities gives the model
entity awareness before generating kill commands.
Seed dataset: 2,486 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches command ordering and dependencies:
- Build structure THEN tp inside (not reverse)
- Apply protection BEFORE spawning hostile mobs
- Create water pool BEFORE dropping player
- Effects before gear (protection active during equip)
- Clear mobs before healing (don't waste heal)
- Cage before tp victim (prevent escape)
Key principle: reasoning explains WHY order matters.
Seed dataset: 2,409 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python revert system (live on prod):
- Gamerule changes auto-revert after default timeout (5-10 min)
- User can specify duration: "disable mobs for 5 minutes"
- "permanently"/"forever" skips revert
- Setting back to default cancels pending revert
- Players notified of revert countdown
Training data (20 examples):
- 8 revert-aware gamerules with revert_after/revert_commands fields
- 12 drop/height/tp examples: intentional drops, safe tp, context-aware
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validator hardcodes maximum durations for dangerous effects:
- Levitation: 15s max (player floats into sky and dies from fall)
- Wither: 30s max (drains health, can kill)
- Poison: 60s max
- Nausea: 30s max
12 training examples: levitation safety, emergency clear, duration caps,
"I can't stop floating" → clear levitation + slow falling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul updated with quantity rules:
- Common (dirt/wood): max 320, Uncommon (iron/gold): max 128
- Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4
- Forbidden (bedrock/command_block): never give
- Greedy → scaled back, Humble → generous within cap, Absurd → comedic
32 training examples: greedy(6), casual(6), humble(4), explicit(6),
forbidden(5), absurd(3), enchanted(2)
Dataset: 1,340 examples total
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged: 964 curated + 344 Claude-distilled = 1,308 total
All examples tagged with risk_level (0-4)
Model outputs risk classification in training target
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knowledge corpus (knowledge/mc-data/):
- 1505 items, 886 crafting recipes, 1166 blocks from minecraft-data 1.21.11
- Recipe dependency tree builder (knowledge/build_recipe_tree.py)
- Crafting chain training: "give me everything to make X from scratch"
- Smelting recipes, version awareness examples
Training data (644 examples total):
- 107 command syntax reference examples (every command + common errors)
- 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting)
- 344 Claude-distilled examples (222 sudo + 122 god via Haiku)
- Live bot audit data ingested (128 examples from dev server)
Swarm bots:
- Swimming/water escape logic
- Door opening
- Context-aware prayers (inventory, health, time, depth)
- Prefix enforcement on all Gemini/Dolphin prompts
GitHub log scraper (data/scrape_server_logs.py):
- Searches GitHub for Minecraft server logs with commands
- Strict 1.20.5+ version filter
- Extracts command pairs, converts to training format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)