38b9a02e45
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs) - Built eval/harness.py with per-category breakdowns and baseline tracking - Built eval/live_bakeoff.py for RCON-verified model comparison on live server - Extracted training data from prayer logs, sudo logs, and bug reports on CT 644 - Added Reddit post draft and modmail for playtester recruitment - Updated server context: all servers now online-mode=false + whitelist - Updated PLAN.md with Phase 2 progress Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>