Each cycle runs all three tiers at the same time on different GPUs:
- Tier 1 (drills) on GPU A
- Tier 2 (self-critique) on GPU B
- Tier 3 (adversarial) on GPU C
GPU assignments rotate each cycle for even wear.
3x throughput vs sequential. RCON handles concurrent commands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round-robin load balancing across three Ollama instances:
- 141:11434 (RTX 3090 Ti 24GB)
- 141:11435 (RTX 2080 Ti 11GB) — new second instance
- 179:11434 (RTX 4000 16GB)
Each tier cycles to a different GPU. 3x throughput overnight.
Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-play (training/scripts/self_play.py):
- Model generates edge-case prompts across 9 categories
- Attempts commands via RCON, self-corrects on errors
- Successful traces → standard training examples
- Error correction traces → multi-turn tool-calling examples
- Anti-collapse: focuses on categories model is weakest in
- Ready for v4 deployment, not yet active
Qwen3.5-9B base model bake-off (147/1542 cases):
- 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement
- 29.9% MISS (mostly God/prayer — no persona training)
- 15.6% needed syntax fixes
- Avg 7.5s response (thinking tokens)
- Strong v4 candidate: better base + tool-calling architecture
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swarm bots (ingame/swarm_bots.js):
- 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.)
- All bots wander, take damage, auto-respawn, pray when hurt
- Gemini + Dolphin(5%) + Multilingual(3%) prompt generation
- 20-60s interaction interval per bot
Distillation results:
- 222 sudo examples via Haiku ($0.28)
- 122 god examples via Haiku ($0.37) — with God Soul personality
- Total: 344 distilled, $0.65 spent of $5 budget
- RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands
validate_distilled.py:
- Executes distilled commands on live server via RCON
- Distinguishes real errors from benign (no player online)
- Tags each example with validation status
Dev server switched to Claude Haiku via Anthropic API:
- llm_provider: anthropic with $5 budget cap
- Auto-fallback to Ollama when budget exhausted
- Cost tracking with logging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul (agent/prompts/god_soul.md):
- Adapted from Claude's soul framework for the Minecraft God character
- Defines identity, principals hierarchy, decision-making framework
- Spectrum of responses (generous→silence), risk awareness, multilingual divinity
- Honesty within character, intervention guidelines
- Deployed to both prod and dev servers
System prompts updated:
- God prompt loads soul document dynamically
- Intervention prompt references soul for personality guidance
- Both include multilingual instruction (match player's language)
Distillation pipeline (training/scripts/distill.py):
- Sends all training examples through Claude API
- Haiku for sudo ($0.25), Sonnet for god ($0.50)
- Budget-capped, cost-tracked, --dry-run supported
- Outputs distilled.jsonl with Claude-quality responses
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch fp16 to bf16 (RTX 3090 Ti is Ampere, supports BF16 natively)
- Include system prompt in training conversations (mode-aware: sudo/god/god_system)
- Include message field only for god modes
- Add determine_mode() and get_system_prompt() helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b
- training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti
- eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3
models to exhaust tokens on thinking, added --no-think flag
- agent/serve.py: Default model changed to gemma3n:e4b
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)