5 Commits

Author SHA1 Message Date
Seth 5b28002001 0.6.0 training session: Oracle Bot, RL combat, Mind's Eye, multilingual pipeline
Major changes from this session:

Training:
- 0.6.0 training running: 9B on steel141 3090 Ti, 27B on rented H100 NVL
- 7,256 merged training examples (up from 3,183)
- New training data: failure modes (85), midloop messaging (27),
  prompt injection defense (29), personality (32), gold from quarantine
  bank (232), new tool examples (30), claude's own experience (10)
- All training data RCON-validated at 100% pass rate
- Bake-off: gemma3:27b 66%, qwen3.5:27b 61%, translategemma:27b 56%

Oracle Bot (Mind's Eye):
- Invisible spectator bot (mineflayer) streams world state via WebSocket
- HTML5 Canvas frontend at mind.mortdec.ai
- Real-time tool trace visualization with expandable entries
- Streaming model tokens during inference
- Gateway integration: fire-and-forget POST /trace on every tool call

Reinforcement Learning:
- Gymnasium environment wrapping mineflayer bot (minecraft_env.py)
- PPO training via Stable Baselines3 (10K param policy network)
- Behavioral cloning pretraining (97.5% accuracy on expert policy)
- Infinite training loop with auto-restart and checkpoint resume
- Bot learns combat, survival, navigation from raw experience

Bot Army:
- 8-soldier marching formation with autonomous combat
- Combat bots using mineflayer-pvp, pathfinder, armor-manager
- Multilingual prayer bots via translategemma:27b (18 languages)
- Frame-based AI architecture: LLM planner + reactive micro-scripts

Infrastructure:
- Fixed mattpc.sethpc.xyz billing gateway (API key + player list parser)
- Billing gateway now tracks all LAN traffic (LAN auto-auth)
- Gateway fallback for empty god-mode responses
- Updated mortdec.ai landing page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 20:22:50 -04:00
Seth 027b835286 Session final: bakeoff fix, branding fonts, 3-GPU parallel self-play
Current running state:
- Prod: mortdecai-v4 on RTX 4000, single-call, error correction, fall protection
- Dev: Gemini 3.1 Flash Lite (preview) + 5 bots generating training data
- Bake-off: v4 running on steel141 (3090 Ti)
- Self-play: ready for overnight — 3 GPUs parallel (3090 Ti + 2080 Ti + RTX 4000)

Changes:
- Bakeoff parser: strips think blocks, handles dict/list types
- Branding fonts: Rajdhani-Bold (official), Exo2, Orbitron, Oxanium, SpaceGrotesk
- Gemini 3.1 pricing added to cost tracker

Active data collection:
- Gemini 3.1 Flash Lite bots on dev ($20 budget, ~$4/hr with 5 bots)
- Self-play overnight: 3 tiers × 3 GPUs = ~9x throughput
- Training audit logging on all servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:56:45 -04:00
Seth 9d789d2524 Three-tier constraint model, mode-aware eval, boundary examples, playtest tooling
Eval harness:
- Mode-aware scoring: sudo=strict (exact match), pray/god=soft (category match,
  in-character, appropriate intensity)
- New metrics: cmd_category_match, appropriate_intensity, scoring_mode breakdown
- Eval defaults to steel141 (192.168.0.141) — prod GPU reserved for serving

Dataset (213 examples):
- Added 31 boundary/adversarial examples (safety edges, abstention, near-boundary)
- Updated pray example reasoning: character-driven logic, not prescriptive outputs
- Tagged pray examples with scoring_mode=soft

Playtest tooling:
- whitelist.sh: add/remove/list across all 3 servers
- FRIENDS_INVITE.md + Discord version: playtester recruitment docs
- Server addresses and implementation details for both training servers

PLAN.md:
- Three-tier constraint model documented (sudo/pray/god_system)
- Success criteria split by scoring mode
- All session decisions logged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:57:01 -04:00
Seth 48b627d498 Add LoRA training scripts and fix bake-off token budget
- training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b
- training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti
- eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3
  models to exhaust tokens on thinking, added --no-think flag
- agent/serve.py: Default model changed to gemma3n:e4b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 10:40:18 -04:00
Seth 7da28c8800 Add model bake-off harness and base model research
Bake-off tested 7 models on 31 seed examples via GPU-accelerated Ollama
on node-197 RTX 4000. gemma3n:e4b leads for serving (80.6% cmd match,
100% safety, 5.9s). qwen3:8b recommended as fine-tuning base (Apache 2.0,
best syntax quality, strong ecosystem). Full research in MODEL_RESEARCH.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 08:54:11 -04:00