Mortdecai

Author	SHA1	Message	Date
Seth	5b28002001	0.6.0 training session: Oracle Bot, RL combat, Mind's Eye, multilingual pipeline	2026-03-22 20:22:50 -04:00

    Major changes from this session:

    Training:
    - 0.6.0 training running: 9B on steel141 3090 Ti, 27B on rented H100 NVL
    - 7,256 merged training examples (up from 3,183)
    - New training data: failure modes (85), midloop messaging (27), prompt injection defense (29), personality (32), gold from quarantine bank (232), new tool examples (30), claude's own experience (10)
    - All training data RCON-validated at 100% pass rate
    - Bake-off: gemma3:27b 66%, qwen3.5:27b 61%, translategemma:27b 56%

    Oracle Bot (Mind's Eye):
    - Invisible spectator bot (mineflayer) streams world state via WebSocket
    - HTML5 Canvas frontend at mind.mortdec.ai
    - Real-time tool trace visualization with expandable entries
    - Streaming model tokens during inference
    - Gateway integration: fire-and-forget POST /trace on every tool call

    Reinforcement Learning:
    - Gymnasium environment wrapping mineflayer bot (minecraft_env.py)
    - PPO training via Stable Baselines3 (10K param policy network)
    - Behavioral cloning pretraining (97.5% accuracy on expert policy)
    - Infinite training loop with auto-restart and checkpoint resume
    - Bot learns combat, survival, navigation from raw experience

    Bot Army:
    - 8-soldier marching formation with autonomous combat
    - Combat bots using mineflayer-pvp, pathfinder, armor-manager
    - Multilingual prayer bots via translategemma:27b (18 languages)
    - Frame-based AI architecture: LLM planner + reactive micro-scripts

    Infrastructure:
    - Fixed mattpc.sethpc.xyz billing gateway (API key + player list parser)
    - Billing gateway now tracks all LAN traffic (LAN auto-auth)
    - Gateway fallback for empty god-mode responses
    - Updated mortdec.ai landing page

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seth	027b835286	Session final: bakeoff fix, branding fonts, 3-GPU parallel self-play	2026-03-20 00:56:45 -04:00

    Current running state:
    - Prod: mortdecai-v4 on RTX 4000, single-call, error correction, fall protection
    - Dev: Gemini 3.1 Flash Lite (preview) + 5 bots generating training data
    - Bake-off: v4 running on steel141 (3090 Ti)
    - Self-play: ready for overnight, 3 GPUs parallel (3090 Ti + 2080 Ti + RTX 4000)

    Changes:
    - Bakeoff parser: strips think blocks, handles dict/list types
    - Branding fonts: Rajdhani-Bold (official), Exo2, Orbitron, Oxanium, SpaceGrotesk
    - Gemini 3.1 pricing added to cost tracker

    Active data collection:
    - Gemini 3.1 Flash Lite bots on dev ($20 budget, ~$4/hr with 5 bots)
    - Self-play overnight: 3 tiers × 3 GPUs = ~9x throughput
    - Training audit logging on all servers

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seth	9d789d2524	Three-tier constraint model, mode-aware eval, boundary examples, playtest tooling	2026-03-18 15:57:01 -04:00

    Eval harness:
    - Mode-aware scoring: sudo=strict (exact match), pray/god=soft (category match, in-character, appropriate intensity)
    - New metrics: cmd_category_match, appropriate_intensity, scoring_mode breakdown
    - Eval defaults to steel141 (192.168.0.141); prod GPU reserved for serving

    Dataset (213 examples):
    - Added 31 boundary/adversarial examples (safety edges, abstention, near-boundary)
    - Updated pray example reasoning: character-driven logic, not prescriptive outputs
    - Tagged pray examples with scoring_mode=soft

    Playtest tooling:
    - whitelist.sh: add/remove/list across all 3 servers
    - FRIENDS_INVITE.md + Discord version: playtester recruitment docs
    - Server addresses and implementation details for both training servers

    PLAN.md:
    - Three-tier constraint model documented (sudo/pray/god_system)
    - Success criteria split by scoring mode
    - All session decisions logged

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seth	48b627d498	Add LoRA training scripts and fix bake-off token budget	2026-03-18 10:40:18 -04:00

    - training/scripts/train_lora.py: Unsloth QLoRA trainer for qwen3:8b
    - training/scripts/train_lora.sh: Launch script for steel141 RTX 3090 Ti
    - eval/bakeoff.py: Fixed token budget (400->1500) that caused qwen3 models to exhaust tokens on thinking, added --no-think flag
    - agent/serve.py: Default model changed to gemma3n:e4b

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seth	7da28c8800	Add model bake-off harness and base model research	2026-03-18 08:54:11 -04:00

    Bake-off tested 7 models on 31 seed examples via GPU-accelerated Ollama on node-197 RTX 4000. gemma3n:e4b leads for serving (80.6% cmd match, 100% safety, 5.9s). qwen3:8b recommended as fine-tuning base (Apache 2.0, best syntax quality, strong ecosystem). Full research in MODEL_RESEARCH.md.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
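
Commit 027b835286 mentions a bake-off parser that "strips think blocks, handles dict/list types". The log does not show that code, so the following is only a rough sketch of what such a step could look like. It assumes the model wraps its reasoning in `<think>...</think>` tags and otherwise emits JSON; the function name `extract_commands` and the `"commands"` key are hypothetical and not taken from the repository.

```python
import json
import re

# Assumption: reasoning is wrapped in <think>...</think>, as qwen3-style
# models commonly do. DOTALL lets the block span multiple lines.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_commands(raw: str) -> list[str]:
    """Return the commands found in a raw model reply (hypothetical helper)."""
    # Drop any reasoning block before trying to parse the payload.
    text = THINK_BLOCK.sub("", raw).strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []  # unparseable output counts as "no commands"

    # Accept either {"commands": [...]} or a bare list.
    # The "commands" key is an assumption made for this sketch.
    if isinstance(parsed, dict):
        parsed = parsed.get("commands", [])
    if isinstance(parsed, str):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []
    return [str(command) for command in parsed]
```

Returning an empty list on malformed output keeps the scoring loop running when a single model reply is broken; a real harness might prefer to record the parse failure separately.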

5 Commits
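
Commit 9d789d2524 describes mode-aware scoring: `sudo` is scored strictly (exact match), while `pray` and `god` are scored softly (category match, among other criteria). A minimal sketch of that split is given below. It covers only the category-match part of soft scoring, and the category table, the `Score` type, and the function names are illustrative assumptions, not the project's eval harness.

```python
from dataclasses import dataclass

# Hypothetical mapping from a command's first token to a coarse category.
COMMAND_CATEGORIES = {
    "give": "items",
    "clear": "items",
    "tp": "movement",
    "teleport": "movement",
    "effect": "effects",
    "weather": "world",
    "time": "world",
    "summon": "entities",
    "kill": "entities",
}


def command_category(command: str) -> str:
    """Map a command string to its coarse category (assumed scheme)."""
    tokens = command.strip().lstrip("/").split()
    if not tokens:
        return "none"
    return COMMAND_CATEGORIES.get(tokens[0].lower(), "other")


@dataclass
class Score:
    scoring_mode: str  # "strict" or "soft"
    passed: bool


def score_example(mode: str, expected: list[str], predicted: list[str]) -> Score:
    """Score one example according to its mode."""
    if mode == "sudo":
        # Strict: the predicted commands must match exactly, in order.
        exact = [c.strip() for c in expected] == [c.strip() for c in predicted]
        return Score("strict", exact)
    if mode in ("pray", "god", "god_system"):
        # Soft: only the set of command categories has to agree.
        expected_cats = {command_category(c) for c in expected}
        predicted_cats = {command_category(c) for c in predicted}
        return Score("soft", expected_cats == predicted_cats)
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions, a `pray` answer that gives a different food item than the reference still passes, whereas the same mismatch under `sudo` fails.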