Validator hardcodes maximum durations for dangerous effects:
- Levitation: 15s max (player floats into sky and dies from fall)
- Wither: 30s max (drains health, can kill)
- Poison: 60s max
- Nausea: 30s max
12 training examples: levitation safety, emergency clear, duration caps,
"I can't stop floating" → clear levitation + slow falling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul updated with quantity rules:
- Common (dirt/wood): max 320, Uncommon (iron/gold): max 128
- Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4
- Forbidden (bedrock/command_block): never give
- Greedy → scaled back, Humble → generous within cap, Absurd → comedic
32 training examples: greedy(6), casual(6), humble(4), explicit(6),
forbidden(5), absurd(3), enchanted(2)
Dataset: 1,340 examples total
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged: 964 curated + 344 Claude-distilled = 1,308 total
All examples tagged with risk_level (0-4)
Model outputs risk classification in training target
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knowledge corpus (knowledge/mc-data/):
- 1505 items, 886 crafting recipes, 1166 blocks from minecraft-data 1.21.11
- Recipe dependency tree builder (knowledge/build_recipe_tree.py)
- Crafting chain training: "give me everything to make X from scratch"
- Smelting recipes, version awareness examples
Training data (644 examples total):
- 107 command syntax reference examples (every command + common errors)
- 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting)
- 344 Claude-distilled examples (222 sudo + 122 god via Haiku)
- Live bot audit data ingested (128 examples from dev server)
Swarm bots:
- Swimming/water escape logic
- Door opening
- Context-aware prayers (inventory, health, time, depth)
- Prefix enforcement on all Gemini/Dolphin prompts
GitHub log scraper (data/scrape_server_logs.py):
- Searches GitHub for Minecraft server logs with commands
- Strict 1.20.5+ version filter
- Extracts command pairs, converts to training format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swarm bots (ingame/swarm_bots.js):
- 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.)
- All bots wander, take damage, auto-respawn, pray when hurt
- Gemini + Dolphin(5%) + Multilingual(3%) prompt generation
- 20-60s interaction interval per bot
Distillation results:
- 222 sudo examples via Haiku ($0.28)
- 122 god examples via Haiku ($0.37) — with God Soul personality
- Total: 344 distilled, $0.65 spent of $5 budget
- RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands
validate_distilled.py:
- Executes distilled commands on live server via RCON
- Distinguishes real errors from benign (no player online)
- Tags each example with validation status
Dev server switched to Claude Haiku via Anthropic API:
- llm_provider: anthropic with $5 budget cap
- Auto-fallback to Ollama when budget exhausted
- Cost tracking with logging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)