Mortdecai/data/processed
Seth b85b1a6725 40 risk hierarchy examples: L0 blocked, L1 permanent, L2 temporary, injections
Risk hierarchy baked into training data:
- L0 BLOCKED (15): ban, kick, stop, op, deop, whitelist, pardon, ban-ip
- L1 REFUSE (9): permanent gamerules, gamemode @a, default gamemode, difficulty
- L2 WARN (8): temporary gamerules with reversal intent, time-limited changes
- L3 NORMAL (8): time/weather, tick speed, sleep %, chat cleanup
- Prompt injection (5): fake admin claims, permission override attempts
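
The per-level counts above can be checked against the "40" in the subject line. This is a minimal sketch of that arithmetic. The dictionary and variable names are hypothetical, and the commit does not say whether the 5 prompt-injection examples are counted inside the 40 or added on top, so both totals are shown.

```python
# Counts copied from the list above; the names are illustrative only.
level_counts = {
    "L0 BLOCKED": 15,
    "L1 REFUSE": 9,
    "L2 WARN": 8,
    "L3 NORMAL": 8,
}
injection_count = 5  # prompt-injection examples, listed separately

# The four risk levels alone sum to the 40 in the subject line.
hierarchy_total = sum(level_counts.values())

# Assumption: if the injection examples are additional, the commit adds 45.
total_with_injections = hierarchy_total + injection_count

print(hierarchy_total, total_with_injections)
```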

Key principle: outside the L0 blocklist, permanence determines risk level.
  gamerule keepInventory true (permanent) = L1
  gamerule doMobSpawning false for 5 min (temporary) = L2
  gamerule randomTickSpeed 50 (easily reversed) = L3
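
The dataset teaches this hierarchy by example, so there is no rule-based classifier in the commit. As a rough illustration of the principle only, the sketch below maps a command string to a level. The function name `classify`, the `temporary` flag (standing in for "reversal intent" or a time limit), the set names, and the `UNKNOWN` fallback are all assumptions, not part of the repository.

```python
# Hypothetical sketch: the command names come from the commit message,
# but how the real data labels them is not shown there.
L0_BLOCKED = {"ban", "kick", "stop", "op", "deop", "whitelist", "pardon", "ban-ip"}
L1_COMMANDS = {"defaultgamemode", "difficulty"}  # treated as permanent changes
L3_COMMANDS = {"time", "weather"}                # routine, low-risk commands
L3_GAMERULES = {"randomTickSpeed"}               # described as easily reversed


def classify(command: str, temporary: bool = False) -> str:
    """Return 'L0'..'L3' for a command, or 'UNKNOWN' if no rule applies."""
    tokens = command.strip().lstrip("/").split()
    if not tokens:
        raise ValueError("empty command")
    head = tokens[0].lower()

    if head in L0_BLOCKED:
        return "L0"  # blocked regardless of permanence
    if head in L3_COMMANDS:
        return "L3"
    if head == "gamerule":
        rule = tokens[1] if len(tokens) > 1 else ""
        if rule in L3_GAMERULES:
            return "L3"
        # Permanence decides: temporary changes warn, permanent ones refuse.
        return "L2" if temporary else "L1"
    if head in L1_COMMANDS:
        return "L1"
    if head == "gamemode" and "@a" in tokens[1:]:
        return "L1"  # gamemode applied to all players
    return "UNKNOWN"  # the commit does not define a default level


print(classify("gamerule keepInventory true"))
print(classify("gamerule doMobSpawning false", temporary=True))
print(classify("gamerule randomTickSpeed 50"))
```

With these assumed rules, the three examples above come out as L1, L2, and L3 respectively, matching the labels in the commit message.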

Seed dataset: 2,306 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:30:46 -04:00