b85b1a6725
Risk hierarchy baked into training data:
- L0 BLOCKED (15): ban, kick, stop, op, deop, whitelist, pardon, ban-ip
- L1 REFUSE (9): permanent gamerules, gamemode @a, default gamemode, difficulty
- L2 WARN (8): temporary gamerules with reversal intent, time-limited changes
- L3 NORMAL (8): time/weather, tick speed, sleep %, chat cleanup
- Prompt injection (5): fake admin claims, permission override attempts

Key principle: permanence determines risk level.
  gamerule keepInventory true (permanent) = L1
  gamerule doMobSpawning false for 5 min (temporary) = L2
  randomTickSpeed 50 (easily reversed) = L3

Seed dataset: 2,306 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
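For reviewers, a minimal sketch of how the hierarchy could be applied as a rule-based labeler, assuming vanilla Minecraft command syntax (e.g. `defaultgamemode` for "default gamemode", `playersSleepingPercentage` for "sleep %"). `classify`, the tier sets, and the `duration_minutes` parameter are hypothetical and not part of the dataset tooling; the labels in the training data were not necessarily produced this way. Prompt-injection examples are free text rather than commands, so they are out of scope, and commands the hierarchy does not mention return `None` instead of a guessed tier.

```python
from typing import Optional

# Hypothetical tier sets mirroring the commit's hierarchy (not a real API).
BLOCKED = {"ban", "kick", "stop", "op", "deop", "whitelist", "pardon", "ban-ip"}
REFUSE = {"defaultgamemode", "difficulty"}
NORMAL = {"time", "weather"}
# Gamerules treated as easily reversed (tick speed, sleep %), lowercased.
REVERSIBLE_GAMERULES = {"randomtickspeed", "playerssleepingpercentage"}


def classify(command: str, duration_minutes: Optional[int] = None) -> Optional[str]:
    """Return "L0".."L3" for a slash command, or None if the hierarchy is silent.

    duration_minutes is a hypothetical flag meaning "the request states a
    time limit / intent to revert", which is what separates L2 from L1.
    """
    parts = command.strip().lstrip("/").split()
    if not parts:
        raise ValueError("empty command")
    name = parts[0].lower()

    if name in BLOCKED:
        return "L0"
    if name in REFUSE:
        return "L1"
    # gamemode <mode> @a affects every player at once.
    if name == "gamemode" and len(parts) > 2 and parts[2] == "@a":
        return "L1"
    if name == "gamerule":
        rule = parts[1].lower() if len(parts) > 1 else ""
        if rule in REVERSIBLE_GAMERULES:
            return "L3"
        # Permanence decides: a time-limited change warns, a permanent one is refused.
        return "L2" if duration_minutes is not None else "L1"
    if name in NORMAL:
        return "L3"
    return None  # not covered by the hierarchy


if __name__ == "__main__":
    print(classify("gamerule keepInventory true"))                      # L1
    print(classify("gamerule doMobSpawning false", duration_minutes=5))  # L2
    print(classify("gamerule randomTickSpeed 50"))                      # L3
```

Keeping the duration as an explicit argument, rather than parsing it from chat text, makes the "permanence determines risk level" rule the only thing separating L1 from L2.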