0.5.0 bake-off results, knowledge lookup tools, training progress chart

Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6 points), 0 errors vs 2
- Enchantments: +47 points (20% → 67%)
- EssentialsX: +60 points (0% → 60%)
- Effects: +25 points (0% → 25%)
- Regressions: fill_build -67%, world -20%
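
The gains above are absolute differences in pass rate (percentage points), not relative changes. A minimal sketch of that arithmetic, using only the before/after figures listed in this message; the dictionary layout and function name are illustrative and not taken from the bake-off harness:

```python
# Pass rates (percent) copied from the bake-off summary: (0.4.0, 0.5.0).
# The regressions are omitted because their before/after rates are not listed.
RESULTS = {
    "overall": (45.2, 46.8),
    "enchantments": (20.0, 67.0),
    "essentialsx": (0.0, 60.0),
    "effects": (0.0, 25.0),
}


def point_delta(old: float, new: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(new - old, 1)


deltas = {name: point_delta(old, new) for name, (old, new) in RESULTS.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} points")
```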

Knowledge Lookup Tools (4 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play
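
One way a model-driven tool loop could route calls to these lookups is sketched below. Only the three tool names come from this commit; the handler stubs, the call shape (`name` plus `arguments.query`), and the `dispatch` function are assumptions made for illustration and do not reflect the actual gateway code:

```python
from typing import Callable, Dict


def _stub(source: str) -> Callable[[str], str]:
    """Build a placeholder handler; a real one would query a docs backend."""
    def handler(query: str) -> str:
        return f"[{source}] no backend configured for query: {query}"
    return handler


# Tool names are from the commit message; the handlers are hypothetical stubs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "plugin.docs_lookup": _stub("plugin docs"),
    "minecraft.changelog_lookup": _stub("minecraft changelog"),
    "paper.docs_lookup": _stub("paper docs"),
}


def dispatch(tool_call: dict) -> dict:
    """Route one model-issued tool call to its handler (assumed call shape)."""
    name = tool_call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        # Unknown tools are reported back to the model instead of raising.
        return {"ok": False, "error": f"unknown tool: {name}"}
    query = tool_call.get("arguments", {}).get("query", "")
    return {"ok": True, "result": handler(query)}
```

Returning an error record for unknown tool names, rather than raising, lets the loop feed the failure back to the model so it can retry with a valid tool.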

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display
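
The chart's vertical scaling can be sketched as follows. The plot bounds (y=40 to y=320) and the data values are read from the SVG in this commit; the 15% headroom factor is inferred from the emitted coordinates and is not confirmed by the generator script, and the function name is illustrative:

```python
PLOT_TOP, PLOT_BOTTOM = 40.0, 320.0  # y pixels of the top and bottom grid lines
HEADROOM = 1.15  # inferred: the axis maximum appears to be 115% of the largest value


def y_for(value: float, max_value: float) -> float:
    """Map a data value to an SVG y coordinate (y grows downward in SVG)."""
    span = PLOT_BOTTOM - PLOT_TOP
    return PLOT_BOTTOM - value / (max_value * HEADROOM) * span


# Values as labeled in the chart, versions 0.1.0 through 0.5.0.
examples = [500, 1200, 2100, 3175, 4358]
losses = [2.1, 1.45, 0.82, 0.35, 0.16]
quality = [1.0 / loss for loss in losses]  # "Model Quality (1/loss)"

# Bars and the quality line are each scaled against their own maximum.
bar_tops = [y_for(n, max(examples)) for n in examples]
line_ys = [y_for(q, max(quality)) for q in quality]
```

Because each series is normalized to its own maximum, the bar for 0.5.0 and the final point of the quality line land at the same height even though their units differ.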

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mortdecai
2026-03-21 15:28:09 -04:00
parent da8f557219
commit f5118505b1
10 changed files with 3215 additions and 20 deletions
@@ -0,0 +1,56 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 700 400" width="700" height="400">
<rect width="700" height="400" fill="#111111" rx="8"/>
<!-- Title -->
<text x="350.0" y="25" fill="#e0e0e0" font-family="monospace" font-size="16" text-anchor="middle" font-weight="bold">Mortdecai Training Progress</text>
<!-- Grid lines -->
<line x1="70" y1="320.0" x2="670" y2="320.0" stroke="#252525" stroke-width="0.5"/>
<text x="65" y="324.0" fill="#999999" font-family="monospace" font-size="10" text-anchor="end">0</text>
<line x1="70" y1="250.0" x2="670" y2="250.0" stroke="#252525" stroke-width="0.5"/>
<text x="65" y="254.0" fill="#999999" font-family="monospace" font-size="10" text-anchor="end">1,252</text>
<line x1="70" y1="180.0" x2="670" y2="180.0" stroke="#252525" stroke-width="0.5"/>
<text x="65" y="184.0" fill="#999999" font-family="monospace" font-size="10" text-anchor="end">2,505</text>
<line x1="70" y1="110.0" x2="670" y2="110.0" stroke="#252525" stroke-width="0.5"/>
<text x="65" y="114.0" fill="#999999" font-family="monospace" font-size="10" text-anchor="end">3,758</text>
<line x1="70" y1="40.0" x2="670" y2="40.0" stroke="#252525" stroke-width="0.5"/>
<text x="65" y="44.0" fill="#999999" font-family="monospace" font-size="10" text-anchor="end">5,011</text>
<rect x="94.0" y="292.06536704112375" width="72.0" height="27.934632958876232" fill="#D35400" rx="3" opacity="0.85"/>
<text x="130.0" y="284.06536704112375" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" font-weight="bold">500</text>
<text x="130.0" y="340" fill="#e0e0e0" font-family="monospace" font-size="12" text-anchor="middle">0.1.0</text>
<text x="130.0" y="355" fill="#999999" font-family="monospace" font-size="9" text-anchor="middle">v1 (seed)</text>
<rect x="214.0" y="252.95688089869705" width="72.0" height="67.04311910130295" fill="#D35400" rx="3" opacity="0.85"/>
<text x="250.0" y="244.95688089869705" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" font-weight="bold">1,200</text>
<text x="250.0" y="340" fill="#e0e0e0" font-family="monospace" font-size="12" text-anchor="middle">0.2.0</text>
<text x="250.0" y="355" fill="#999999" font-family="monospace" font-size="9" text-anchor="middle">v2 (+entities)</text>
<rect x="334.0" y="202.67454157271982" width="72.0" height="117.32545842728017" fill="#D35400" rx="3" opacity="0.85"/>
<text x="370.0" y="194.67454157271982" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" font-weight="bold">2,100</text>
<text x="370.0" y="340" fill="#e0e0e0" font-family="monospace" font-size="12" text-anchor="middle">0.3.0</text>
<text x="370.0" y="355" fill="#999999" font-family="monospace" font-size="9" text-anchor="middle">v3 (+errors)</text>
<rect x="454.0" y="142.61508071113596" width="72.0" height="177.38491928886404" fill="#D35400" rx="3" opacity="0.85"/>
<text x="490.0" y="134.61508071113596" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" font-weight="bold">3,175</text>
<text x="490.0" y="340" fill="#e0e0e0" font-family="monospace" font-size="12" text-anchor="middle">0.4.0</text>
<text x="490.0" y="355" fill="#999999" font-family="monospace" font-size="9" text-anchor="middle">v4 (+tools)</text>
<rect x="574.0" y="76.52173913043475" width="72.0" height="243.47826086956525" fill="#D35400" rx="3" opacity="0.85"/>
<text x="610.0" y="68.52173913043475" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" font-weight="bold">4,358</text>
<text x="610.0" y="340" fill="#e0e0e0" font-family="monospace" font-size="12" text-anchor="middle">0.5.0</text>
<text x="610.0" y="355" fill="#999999" font-family="monospace" font-size="9" text-anchor="middle">v5 (+plugins)</text>
<polyline points="130.0,301.4492753623188 250.0,293.13343328335833 370.0,272.492046659597 490.0,208.695652173913 610.0,76.52173913043475" fill="none" stroke="#4caf50" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round"/>
<circle cx="130.0" cy="301.4492753623188" r="4" fill="#4caf50"/>
<text x="130.0" y="291.4492753623188" fill="#4caf50" font-family="monospace" font-size="10" text-anchor="middle">loss=2.1</text>
<circle cx="250.0" cy="293.13343328335833" r="4" fill="#4caf50"/>
<text x="250.0" y="283.13343328335833" fill="#4caf50" font-family="monospace" font-size="10" text-anchor="middle">loss=1.45</text>
<circle cx="370.0" cy="272.492046659597" r="4" fill="#4caf50"/>
<text x="370.0" y="262.492046659597" fill="#4caf50" font-family="monospace" font-size="10" text-anchor="middle">loss=0.82</text>
<circle cx="490.0" cy="208.695652173913" r="4" fill="#4caf50"/>
<text x="490.0" y="198.695652173913" fill="#4caf50" font-family="monospace" font-size="10" text-anchor="middle">loss=0.35</text>
<circle cx="610.0" cy="76.52173913043475" r="4" fill="#4caf50"/>
<text x="610.0" y="66.52173913043475" fill="#4caf50" font-family="monospace" font-size="10" text-anchor="middle">loss=0.16</text>
<text x="25" y="180.0" fill="#D35400" font-family="monospace" font-size="11" text-anchor="middle" transform="rotate(-90,25,180.0)">Training Examples</text>
<rect x="520" y="45" width="12" height="12" fill="#D35400" rx="2"/>
<text x="537" y="55" fill="#999999" font-family="monospace" font-size="10">Training Examples</text>
<line x1="520" y1="68" x2="532" y2="68" stroke="#4caf50" stroke-width="2.5"/>
<text x="537" y="72" fill="#999999" font-family="monospace" font-size="10">Model Quality (1/loss)</text>
<text x="350.0" y="390" fill="#999999" font-family="monospace" font-size="11" text-anchor="middle">Model Version</text>
</svg>
