Commit Graph

20 Commits

Author SHA1 Message Date
Mortdecai da8f557219 GPU scheduler, 14-tool architecture, plugin deployment, event dispatcher
GPU Scheduler (gpu.sethpc.xyz):
- Live dashboard with 4 GPUs, training monitor, loss sparklines
- Preset-based job scheduler with 3 triggers (time, finish_training, cost)
- Model selection per GPU, pipeline configuration
- Tool self-play and training pipeline types
- Behind Google OAuth, live-refresh without page reload

Tool Architecture (14 tools):
- 3 new tools: world.nearby_entities, memory.read, memory.write
- 7 script.* tools: write, validate, execute, read, list, delete, schedule
- ScriptManager: full mcfunction datapack CRUD with RCON validation
- Training data: 1,430 tool examples (up from 1,159)

Plugin Deployment (paper-ai-25567):
- WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3
- Fresh greenfield world reset
- 104 RCON-validated plugin training examples

Event Dispatcher:
- Watches server log for deaths, joins, advancements, PvP kills
- Configurable trigger probability and cooldowns per event type
- Deployed to dev server, fires god_system prompts on events
- 21 event-response training examples

Training Infrastructure:
- train_lora.py: --save-steps 50, --resume from checkpoint
- run_training.sh: stops Ollama, activates conda, restarts after
- Passwordless sudo for ollama services on steel141
- Dev server added to MCSManager with autoStart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:14:45 -04:00
Mortdecai 434589d098 Prompt pipeline: 1660 generates, bigger GPUs process via Mortdecai
Architecture:
- 1660 Super (qwen3.5:0.8b) generates diverse edge-case prompts
- 2080 Ti / RTX 4000 / 3090 Ti process through Mortdecai + RCON validation
- File-based queue with locking for multi-GPU coordination
- 10 prompt categories targeting known weaknesses

Categories: fill_syntax, enchantments, execute_chains, entity_targeting,
gamerules_timed, memory_commands, creative_prayers, edge_items,
multicommand, natural_language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 00:08:48 -04:00
Seth f39809eaca Semver rename: v1-v5 → 0.1.0-0.5.0 across all files
Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release

Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, discord bot, all training data references.

Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.

Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:14 -04:00
Seth 0f043384e5 Self-play: --api-key for authenticated gateway connections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:40:01 -04:00
Seth ead16fd429 Persistent RCON connections — fixes server crash from connection spam
Root cause: self-play opened/closed a new TCP socket for every RCON command
(hundreds/minute). Paper's RCON listener creates a thread per connection,
overwhelming the server until it stopped.

Fix: PersistentRCON class maintains a single connection per server with
auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port.

Applied to:
- mc_aigod_paper.py (prod paper-ai + dev)
- mc_aigod.py (shrink-world)
- self_play.py (training data generation)
- persistent_rcon.py (shared module)

Before: ~100+ RCON connections/minute → server crash
After: 3 persistent connections total → stable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:24:44 -04:00
Seth 25918b5b66 Self-play: 50 rounds, 0.1s sleep, max GPU utilization
Bumped from 20 rounds/tier to 50. Reduced sleep from 1s to 0.1s.
GPUs should run near 100% — Ollama queues requests internally.
mortdecai-sites container (CT 650) created on pve112.
Landing page live at mortdec.ai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 07:36:01 -04:00
Seth 3580d350b4 Parallel 3-GPU self-play: all tiers run simultaneously
Each cycle runs all three tiers at the same time on different GPUs:
- Tier 1 (drills) on GPU A
- Tier 2 (self-critique) on GPU B
- Tier 3 (adversarial) on GPU C
GPU assignments rotate each cycle for even wear.
3x throughput vs sequential. RCON handles concurrent commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:55:24 -04:00
Seth de14f4a1c8 3-GPU overnight self-play: 3090 Ti + 2080 Ti + RTX 4000
Round-robin load balancing across three Ollama instances:
- 141:11434 (RTX 3090 Ti 24GB)
- 141:11435 (RTX 2080 Ti 11GB) — new second instance
- 179:11434 (RTX 4000 16GB)

Each tier cycles to a different GPU. 3x throughput overnight.
Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:54:29 -04:00
Seth a3d139e04f Mortdecai v4 pre-training: /no_think, dedup, 3,369 examples
- /no_think prepended to all system prompts (seed + tool training)
- Deduplicated seed dataset (435 dupes removed)
- Training script updated for Qwen3.5-9B + /no_think
- 2,210 seed + 1,159 tool-calling = 3,369 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 20:15:00 -04:00
Seth 9abf9238c5 3-tier self-play: command drills, self-critique, adversarial
Tier 1 — Command drills:
  Random seed prompts → generate commands → RCON validates
  Teaches: accurate command syntax

Tier 2 — Single-shot self-critique:
  Model invents a tricky prompt AND responds in one call
  RCON validates the self-generated commands
  Teaches: edge-case awareness, self-evaluation

Tier 3 — Adversarial self-play:
  Session A generates challenging prompts
  Fresh Session B responds cold (can't cheat)
  RCON validates, self-corrects on errors
  Teaches: robustness, generalization

Usage: --tier 1|2|3|all --rounds N --focus category

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:39:33 -04:00
Seth c947fc3fa9 Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy
Self-play (training/scripts/self_play.py):
- Model generates edge-case prompts across 9 categories
- Attempts commands via RCON, self-corrects on errors
- Successful traces → standard training examples
- Error correction traces → multi-turn tool-calling examples
- Anti-collapse: focuses on categories model is weakest in
- Ready for v4 deployment, not yet active

Qwen3.5-9B base model bake-off (147/1542 cases):
- 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement
- 29.9% MISS (mostly God/prayer — no persona training)
- 15.6% needed syntax fixes
- Avg 7.5s response (thinking tokens)
- Strong v4 candidate: better base + tool-calling architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:35:57 -04:00
Seth 750cf15c79 1,542 seed + 1,159 tool-calling examples, async processing, validator tracking
New knowledge baked in:
- Enchantments (60): all 1.21 enchants, mutual exclusions, max levels, component syntax
- WorldEdit (45): //set, //replace, //sphere, //stack, selection, brushes
- Paper server (55): gamerules, permissions, plugins, scoreboard, moderation
- Cosmetics/XP (42): title, tellraw, playsound, particle, xp, effect mechanics
- Quantity boundaries (32): item tier caps, greedy→stingy, humble→generous

Training infrastructure:
- train_lora.py updated for multi-turn tool conversations + seed data
- Async prayer/sudo processing (ThreadPoolExecutor, 3 workers)
- Validator hit-rate tracking to /var/log/mc_validator_stats.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:03:30 -04:00
Seth ee764cd22a Tool-calling training: 1,159 multi-turn examples with error correction
Tool schemas (agent/tools/tool_schemas.py):
- rcon.execute: execute commands, get success/error results
- minecraft.wiki_lookup: look up syntax and item info
- world.player_info: player health, position, inventory
- world.server_state: time, weather, online players
- 10 RCON error patterns with corrections
- 12 common error scenarios for training

Training data generator (training/scripts/generate_tool_training.py):
- Converts seed dataset to multi-turn tool conversations
- Error correction: model tries wrong command → gets error → self-corrects
- Wiki/player/server lookups for uncertainty scenarios
- Qwen3 native tool-calling format with <tool_call> tags

1,159 examples: 1043 success, 79 error correction, 24 error scenarios,
13 tool lookups. Ready for v4 training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:49:08 -04:00
Seth e28836106f Risk_level in all 644 examples + model outputs risk classification
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:35:50 -04:00
Seth 65ee146043 Swarm bots, RCON validation, Haiku distillation complete
Swarm bots (ingame/swarm_bots.js):
- 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.)
- All bots wander, take damage, auto-respawn, pray when hurt
- Gemini + Dolphin(5%) + Multilingual(3%) prompt generation
- 20-60s interaction interval per bot

Distillation results:
- 222 sudo examples via Haiku ($0.28)
- 122 god examples via Haiku ($0.37) — with God Soul personality
- Total: 344 distilled, $0.65 spent of $5 budget
- RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands

validate_distilled.py:
- Executes distilled commands on live server via RCON
- Distinguishes real errors from benign (no player online)
- Tags each example with validation status

Dev server switched to Claude Haiku via Anthropic API:
- llm_provider: anthropic with $5 budget cap
- Auto-fallback to Ollama when budget exhausted
- Cost tracking with logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:18:19 -04:00
Seth 961f53ea7d God Soul document, Claude distillation pipeline, soul-driven prompts
God Soul (agent/prompts/god_soul.md):
- Adapted from Claude's soul framework for the Minecraft God character
- Defines identity, principals hierarchy, decision-making framework
- Spectrum of responses (generous→silence), risk awareness, multilingual divinity
- Honesty within character, intervention guidelines
- Deployed to both prod and dev servers

System prompts updated:
- God prompt loads soul document dynamically
- Intervention prompt references soul for personality guidance
- Both include multilingual instruction (match player's language)

Distillation pipeline (training/scripts/distill.py):
- Sends all training examples through Claude API
- Haiku for sudo ($0.25), Sonnet for god ($0.50)
- Budget-capped, cost-tracked, --dry-run supported
- Outputs distilled.jsonl with Claude-quality responses

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:28:21 -04:00
Seth 62419976e5 361 training examples, default to 1 epoch
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:03:33 -04:00
Seth 142e4fd3c4 Fix training script: bf16 for Ampere GPU, add system prompts to training data
- Switch fp16 to bf16 (RTX 3090 Ti is Ampere, supports BF16 natively)
- Include system prompt in training conversations (mode-aware: sudo/god/god_system)
- Include message field only for god modes
- Add determine_mode() and get_system_prompt() helpers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 16:26:47 -04:00
Seth 48b627d498 Add LoRA training scripts and fix bake-off token budget
- training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b
- training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti
- eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3
  models to exhaust tokens on thinking, added --no-think flag
- agent/serve.py: Default model changed to gemma3n:e4b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 10:40:18 -04:00
Seth 827850b8d7 Initial project scaffold: dataset schema, 31 seed training examples, Mineflayer bot framework, and 7-phase roadmap
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)
2026-03-18 01:51:28 -04:00