Mortdecai/training/scripts/exploration_self_play.py
Mortdecai f5118505b1 0.5.0 bake-off results, knowledge lookup tools, training progress chart
Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6%), 0 errors vs 2
- Enchantments: +47% (20% → 67%)
- EssentialsX: +60% (0% → 60%)
- Effects: +25% (0% → 25%)
- Regressions: fill_build -67%, world -20%

Knowledge Lookup Tools (4 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:28:09 -04:00
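
The commit message above reports that 150 of 2,243 exploration examples were kept after quality filtering, but the filter itself is not part of this file. The following is a minimal sketch of what such a gate could look like, assuming it reads the `metadata` fields that `run_exploration` writes (`wiki_lookups`, `rcon_commands`, `success_rate`). The function names and thresholds are hypothetical, chosen for illustration, and are not the project's actual criteria.

```python
import json
from typing import Dict, Iterable, Iterator


def keep_example(example: Dict, min_success: float = 0.8, min_commands: int = 1) -> bool:
    """Hypothetical quality gate; thresholds are illustrative assumptions."""
    if not isinstance(example, dict):
        return False
    meta = example.get("metadata", {})
    return (
        meta.get("wiki_lookups", 0) >= 1              # the model researched at least once
        and meta.get("rcon_commands", 0) >= min_commands  # and actually issued commands
        and meta.get("success_rate", 0.0) >= min_success  # most of which RCON accepted
    )


def filter_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield only the JSONL records that pass keep_example; skip unparseable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            continue
        if keep_example(example):
            yield example


# Tiny in-memory stand-in for an exploration_*.jsonl file.
sample = [
    json.dumps({"id": "a", "metadata": {"wiki_lookups": 2, "rcon_commands": 3, "success_rate": 1.0}}),
    json.dumps({"id": "b", "metadata": {"wiki_lookups": 0, "rcon_commands": 3, "success_rate": 1.0}}),
    json.dumps({"id": "c", "metadata": {"wiki_lookups": 1, "rcon_commands": 2, "success_rate": 0.5}}),
    "not json",
]
kept = [ex["id"] for ex in filter_jsonl(sample)]
print(kept)
```

In practice the same generator could be fed an open file handle for one of the `exploration_*.jsonl` outputs instead of the in-memory list.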


#!/usr/bin/env python3
"""
Exploration Self-Play: the model uses wiki_lookup to explore Minecraft knowledge,
then validates its understanding through RCON commands.

Unlike canned self-play, the model drives its own curiosity:
  1. Gets a broad topic ("explore enchantments", "learn about 1.21 items")
  2. Uses minecraft.wiki_lookup to research
  3. Generates commands based on what it learned
  4. RCON validates correctness
  5. If wrong, researches more and corrects

Produces gold-standard knowledge-grounded training data.

Usage:
    python3 exploration_self_play.py --ollama-url http://localhost:11434 \
        --model mortdecai:0.5.0 --rcon-host 192.168.0.244 --rcon-port 25578
"""

import argparse
import json
import random
import re
import sys
import time
from pathlib import Path

# Make the project root importable so the agent.* modules resolve.
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))

import requests

from agent.tools.persistent_rcon import get_rcon

OUTPUT_DIR = PROJECT_ROOT / "data" / "raw" / "exploration_selfplay"
PLAYERS = ["slingshooter08", "Ace13245", "TheBigBoss", "xXDragonSlayerXx"]

# Topics for the model to explore — broad enough that it needs to look things up
EXPLORATION_TOPICS_PLUGINS = [
# WorldGuard deep dive
"Research all WorldGuard region flags. Create a region and test each flag one at a time for {p}.",
"Look up how WorldGuard region priorities work. Create overlapping regions with different rules.",
"Research WorldGuard's __global__ region. What flags can you set globally? Test a few.",
"Look up WorldGuard entry/exit deny flags. Create a VIP-only zone and test it.",
"Research how to make a WorldGuard region that heals players. Set it up near {p}.",
"What WorldGuard flags control explosions? Research and create a blast-proof zone.",
"Look up how to block specific commands in a WorldGuard region. Test with /home.",
"Research WorldGuard greeting and farewell messages. Set up regions with welcome messages.",
# CoreProtect deep dive
"Research all CoreProtect action types (block, container, chat, command). Test /co lookup with each.",
"Look up CoreProtect time format syntax. Test rollbacks with different time ranges (1h, 30m, 7d).",
"Research how CoreProtect handles container logging. Place a chest, add items, then lookup the history.",
"What CoreProtect parameters filter by block type? Test rolling back only specific blocks.",
"Look up how to use CoreProtect radius parameter. Test different radius values.",
"Research CoreProtect restore vs rollback — what's the difference? Demonstrate both.",
# EssentialsX deep dive
"Research all EssentialsX economy commands. Set up a working economy with /eco, /balance, /pay.",
"Look up EssentialsX kit creation syntax. Create a starter kit and a VIP kit.",
"Research EssentialsX warp system. Create 5 warps at interesting locations.",
"What EssentialsX commands exist for player management? Test /nick, /seen, /whois.",
"Look up EssentialsX home system. Set multiple named homes for {p}.",
"Research EssentialsX god mode, fly mode, and speed commands. Test all three.",
"What EssentialsX commands modify the world? Test /sun, /storm, /day, /night.",
# LuckPerms deep dive
"Research LuckPerms group inheritance. Create parent and child groups and test permission flow.",
"Look up LuckPerms temporary permissions. Give {p} temp fly access for 5 minutes.",
"Research LuckPerms meta (prefix/suffix). Set up colored chat prefixes for different groups.",
"What LuckPerms commands check a user's permissions? Audit {p}'s current permissions.",
"Look up how to create a LuckPerms permission ladder (default -> member -> vip -> admin).",
"Research LuckPerms weight system. How do group priorities work?",
# FAWE/WorldEdit deep dive
"Research all WorldEdit shape commands (sphere, cyl, pyramid). Build one of each near {p}.",
"Look up WorldEdit brush types. What brushes exist beyond sphere brush?",
"Research WorldEdit mask syntax. How do masks work with //replace?",
"What WorldEdit clipboard operations exist? Test //copy, //paste, //rotate, //flip.",
"Look up WorldEdit pattern syntax. Can you mix multiple blocks in one command?",
"Research WorldEdit //generate command. Can it make mathematical surfaces?",
"What WorldEdit selection modes exist? Test //sel cuboid vs poly vs sphere.",
# Script writing exploration
"Research Minecraft datapack function syntax. Write a mcfunction that creates a parkour course.",
"Look up how Minecraft tick functions work. Write one that makes particles at spawn.",
"Research how to chain mcfunctions together. Write a main function that calls sub-functions.",
"What Minecraft datapack tags control function scheduling? Test tick.json and load.json.",
"Look up execute command syntax for mcfunctions. Write a script using execute at/as/if.",
"Research scoreboard objectives. Write a script that tracks player kills and announces leaders.",
# Multi-plugin combos
"Research how to combine WorldEdit builds with WorldGuard protection. Build and protect an arena.",
"Look up how to use CoreProtect to undo WorldEdit operations specifically.",
"Research combining LuckPerms with WorldGuard — can you tie region access to permission groups?",
"Create a complete server setup: spawn area (WE), protected (WG), with warps (Ess) and perms (LP).",
"Research how to build a minigame arena: WE for building, WG for rules, scoreboards for tracking.",
]
EXPLORATION_TOPICS = [
# Items and crafting
"What are all the new items added in 1.21? Look them up and give one of each to {p}.",
"Research every type of arrow (tipped arrows) and give {p} one of each.",
"Look up all the banner patterns available and create a cool banner for {p}.",
"What suspicious stew effects exist? Research and give {p} the best one.",
"Research all the different types of potions and give {p} the three most useful ones.",
"What are all the different horse armor types? Look them up and give one of each to {p}.",
"Research all smithing templates and give {p} the rarest ones.",
"Look up every type of spawn egg and give {p} five interesting ones.",
# Enchantments
"Research the best enchantment setup for a full netherite armor set. Give it to {p}.",
"What enchantments are exclusive to each other? Look them up and explain while giving {p} examples.",
"Research the difference between Protection, Fire Protection, Blast Protection, and Projectile Protection. Which is best for general use? Give {p} the optimal set.",
"Look up what Thorns does exactly — is it worth using? Give {p} armor with and without it to test.",
"Research Sweeping Edge — does it still exist in 1.21? Give {p} a sword with the correct enchantments.",
"What's the maximum level for each enchantment? Research and give {p} a tool with impossible levels vs correct levels.",
# Effects and potions
"Research all status effects in 1.21. Which ones are new? Apply the 3 newest ones to {p}.",
"Look up the Ominous Bottle effect — what does it do? Give one to {p}.",
"What's the difference between Strength and Haste? Research and apply the right one for mining.",
"Research what Wind Charged does. Apply it to {p}.",
"Look up all negative effects and their max safe durations. Apply a brief demonstration.",
"What effect does a Beacon give? Research all beacon effects and apply them.",
# Mobs and entities
"Research all tameable mobs in 1.21. Summon one of each near {p}.",
"What mobs were added or changed in 1.21? Look them up and summon the new ones.",
"Research the Breeze mob — what does it drop? Summon one for {p}.",
"Look up all rideable mobs and summon one for {p} with a saddle.",
"What's the strongest mob in the game? Research its stats and summon it (carefully).",
"Research all fish types and summon them in water near {p}.",
# Blocks and building
"Research all copper block variants and their oxidation states. Place examples near {p}.",
"What blocks emit light? Look up all light-emitting blocks and demonstrate.",
"Research all types of stairs, slabs, and walls available in 1.21.",
"Look up how to make colored concrete powder and place a rainbow near {p}.",
"What are all the glazed terracotta patterns? Research and place one of each.",
"Research redstone components — what's the difference between a comparator and repeater?",
# Commands and mechanics
"Research the /place command. What can it place? Demonstrate with a structure.",
"Look up the /damage command syntax and demonstrate different damage types on a mob.",
"Research /attribute — what attributes can be modified? Give {p} double health.",
"What does the /ride command do? Research and demonstrate.",
"Look up /fillbiome — can you change the biome? Try it near {p}.",
"Research the /random command added in 1.21. What can it do?",
# Worldgen and structures
"Research all structure types that /locate can find. Find the 3 nearest to {p}.",
"What biomes exist in 1.21? Look up any new ones and locate them.",
"Research Trial Chambers — where do they spawn? Locate one for {p}.",
# Plugin-specific research
"Research WorldGuard region flags — what flags exist? Set up a demo region with interesting flags.",
"Look up CoreProtect rollback syntax — what parameters does it accept?",
"Research LuckPerms group inheritance — how do child groups work?",
"What WorldEdit brushes are available? Research and describe them.",
"Look up EssentialsX economy commands — set up a basic economy demonstration.",
]


def wiki_lookup(query, timeout=15):
    """Search the Minecraft Wiki via its MediaWiki API and return the top hit's intro text."""
    try:
        # Step 1: opensearch returns [query, titles, descriptions, urls].
        r = requests.get(
            "https://minecraft.wiki/api.php",
            params={"action": "opensearch", "search": query, "limit": 3, "format": "json"},
            timeout=timeout,
        )
        r.raise_for_status()
        results = r.json()
        if len(results) >= 4 and results[1]:
            titles = results[1][:3]
            urls = results[3][:3]
            # Step 2: fetch the plain-text intro extract of the first result.
            r2 = requests.get(
                "https://minecraft.wiki/api.php",
                params={
                    "action": "query", "prop": "extracts",
                    "exintro": True, "explaintext": True,
                    "titles": titles[0], "format": "json",
                },
                timeout=timeout,
            )
            r2.raise_for_status()
            pages = r2.json().get("query", {}).get("pages", {})
            for page in pages.values():
                extract = page.get("extract", "")
                if extract:
                    return {
                        "content": extract[:1500],
                        "url": urls[0] if urls else f"https://minecraft.wiki/w/{titles[0]}",
                        "ok": True,
                    }
        return {"content": f"No wiki results for: {query}", "url": "", "ok": False}
    except Exception as e:
        # Network, HTTP, and JSON errors all surface to the model as a failed lookup.
        return {"content": f"Wiki lookup failed: {e}", "url": "", "ok": False}


def run_exploration(topic, player, ollama_url, model, rcon):
    """Run one exploration round: the model researches a topic, then acts on it via RCON."""
    system = (
        "/no_think\n"
        "You are a Minecraft 1.21 expert on a Paper server with plugins: "
        "WorldGuard, CoreProtect, EssentialsX, LuckPerms, FastAsyncWorldEdit.\n\n"
        "You have these lookup tools:\n"
        "- minecraft.wiki_lookup: {\"query\": \"...\"} — Minecraft Wiki for items, mobs, commands\n"
        "- plugin.docs_lookup: {\"plugin\": \"worldguard|worldedit|coreprotect|essentialsx|luckperms\", \"query\": \"...\"} — plugin documentation\n"
        "- minecraft.changelog_lookup: {\"query\": \"...\", \"version\": \"1.21\"} — version changes\n"
        "- paper.docs_lookup: {\"query\": \"...\"} — Paper server docs\n"
        "- rcon.execute: {\"command\": \"...\"} — execute a Minecraft command\n\n"
        "WORKFLOW:\n"
        "1. Research the topic using the appropriate lookup tool\n"
        "2. For plugin commands, use plugin.docs_lookup instead of minecraft.wiki_lookup\n"
        "3. Generate and execute commands via rcon.execute\n"
        "4. If a command fails, look up the correct syntax and try again\n\n"
        "To call a tool, respond with:\n"
        "<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {...}}\n</tool_call>\n\n"
        "When done, respond with final JSON:\n"
        "{\"commands\": [...], \"reasoning\": \"what you learned\", \"wiki_topics\": [\"topics you looked up\"]}\n\n"
        "Be curious. ALWAYS look things up before guessing. Verify your knowledge."
    )
    topic_resolved = topic.replace("{p}", player)
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Player {player}: {topic_resolved}"},
    ]
    tool_trace = []
    all_commands = []
    wiki_topics = []
    max_steps = 10
    # Initialised up front so the final-response parsing below still works
    # when the very first LLM call fails and the loop exits early.
    raw = ""
    for step in range(max_steps):
        try:
            r = requests.post(f"{ollama_url}/api/chat", json={
                "model": model,
                "messages": messages,
                "stream": False,
                "options": {"temperature": 0.6, "num_predict": 800},
            }, timeout=120)
            raw = r.json()["message"]["content"]
        except Exception as e:
            print(f" LLM error: {e}")
            break
        # Drop any reasoning block the model emits despite /no_think.
        raw = re.sub(r'<think>[\s\S]*?</think>\s*', '', raw)
        # Check for tool calls
        tool_matches = re.findall(r'<tool_call>\s*(\{.*?\})\s*</tool_call>', raw, re.DOTALL)
        if not tool_matches:
            # Final response, done exploring
            break
        for tc_json in tool_matches:
            try:
                tc = json.loads(tc_json)
                tool_name = tc.get("name", "")
                tool_args = tc.get("arguments", {})
            except json.JSONDecodeError:
                continue
            if tool_name == "minecraft.wiki_lookup":
                query = tool_args.get("query", "")
                wiki_topics.append(query)
                result = wiki_lookup(query)
                print(f" wiki: {query[:60]} -> {len(result.get('content',''))} chars")
            elif tool_name in ("plugin.docs_lookup", "minecraft.changelog_lookup", "paper.docs_lookup"):
                try:
                    from agent.tools.knowledge_lookup import handle_knowledge_tool
                    result = handle_knowledge_tool(tool_name, tool_args)
                except ImportError:
                    # Fall back to the wiki when the knowledge tools are not installed.
                    result = wiki_lookup(tool_args.get("query", tool_args.get("plugin", "")))
                query = tool_args.get("query", "")
                wiki_topics.append(f"{tool_name}:{query}")
                print(f" {tool_name}: {query[:50]} -> {len(result.get('content',''))} chars")
            elif tool_name == "rcon.execute":
                cmd = tool_args.get("command", "")
                try:
                    rcon_result = rcon.command(cmd)
                    # Heuristic: these substrings appear in the server's syntax-error replies.
                    is_err = any(e in rcon_result for e in ("<--[HERE]", "Unknown", "Incorrect"))
                    result = {"success": not is_err, "result": rcon_result[:300]}
                    all_commands.append(cmd)
                    status = "OK" if not is_err else "ERR"
                    print(f" rcon: {cmd[:60]} -> {status}")
                except Exception as e:
                    result = {"success": False, "result": str(e)}
                    print(f" rcon: {cmd[:60]} -> FAIL")
            else:
                result = {"ok": False, "error": f"unknown tool: {tool_name}"}
            tool_trace.append({
                "tool": tool_name,
                "input": str(tool_args)[:200],
                "ok": result.get("ok", result.get("success", False)),
                "step": step,
            })
            messages.append({"role": "assistant", "content": f"<tool_call>\n{json.dumps(tc)}\n</tool_call>"})
            messages.append({"role": "tool", "content": json.dumps(result)[:3000]})
        time.sleep(0.1)
    # Parse final response if present
    reasoning = ""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            reasoning = parsed.get("reasoning", "")
            if parsed.get("commands"):
                all_commands.extend(parsed["commands"])
        else:
            reasoning = raw[:200]
    except json.JSONDecodeError:
        reasoning = raw[:200]
    return {
        "id": f"explore-{int(time.time())}-{random.randint(0,9999):04d}",
        "source": "exploration_self_play",
        "type": "exploration",
        "input": {"user_message": topic_resolved, "player": player},
        "output": {
            "commands": all_commands,
            "reasoning": reasoning,
            "wiki_topics": wiki_topics,
        },
        "tool_trace": tool_trace,
        "messages": messages,
        "metadata": {
            "model": model,
            "steps": min(step + 1, max_steps),
            "wiki_lookups": len(wiki_topics),
            "rcon_commands": len(all_commands),
            # Share of rcon.execute calls the server accepted (0.0 when none were made).
            "success_rate": (
                sum(1 for t in tool_trace if t["tool"] == "rcon.execute" and t["ok"])
                / max(sum(1 for t in tool_trace if t["tool"] == "rcon.execute"), 1)
            ),
        },
    }


def main():
    parser = argparse.ArgumentParser(description="Exploration self-play")
    parser.add_argument("--ollama-url", default="http://localhost:11434")
    parser.add_argument("--model", default="mortdecai:0.5.0")
    parser.add_argument("--rcon-host", default="192.168.0.244")
    parser.add_argument("--rcon-port", type=int, default=25578)
    parser.add_argument("--rcon-pass", default="REDACTED_RCON")
    parser.add_argument("--rounds", type=int, default=999999)
    parser.add_argument("--focus", default="general", choices=["general", "plugins", "all"],
                        help="Topic focus: general (vanilla MC), plugins (WG/CP/Ess/LP/FAWE/scripts), all (both)")
    args = parser.parse_args()

    if args.focus == "plugins":
        topics = EXPLORATION_TOPICS_PLUGINS
    elif args.focus == "all":
        topics = EXPLORATION_TOPICS + EXPLORATION_TOPICS_PLUGINS
    else:
        topics = EXPLORATION_TOPICS

    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    output_path = OUTPUT_DIR / f"exploration_{args.focus}_{int(time.time())}.jsonl"
    rcon = get_rcon(args.rcon_host, args.rcon_port, args.rcon_pass)

    print("Exploration Self-Play")
    print(f" Model: {args.model} on {args.ollama_url}")
    print(f" RCON: {args.rcon_host}:{args.rcon_port}")
    print(f" Focus: {args.focus} ({len(topics)} topics)")
    print(f" Output: {output_path}")
    print()

    stats = {"total": 0, "wiki_lookups": 0, "rcon_commands": 0, "rcon_success": 0}
    for round_num in range(args.rounds):
        topic = random.choice(topics)
        player = random.choice(PLAYERS)
        print(f"\n── Round {round_num+1} ──")
        print(f" Topic: {topic[:80].replace('{p}', player)}")
        example = run_exploration(topic, player, args.ollama_url, args.model, rcon)
        stats["total"] += 1
        stats["wiki_lookups"] += example["metadata"]["wiki_lookups"]
        stats["rcon_commands"] += example["metadata"]["rcon_commands"]
        # round() rather than int() so float error cannot truncate a whole count.
        stats["rcon_success"] += round(example["metadata"]["success_rate"] * example["metadata"]["rcon_commands"])
        print(f" Result: {example['metadata']['wiki_lookups']} lookups, "
              f"{example['metadata']['rcon_commands']} commands, "
              f"{example['metadata']['success_rate']:.0%} success")
        # Append one JSON object per line; UTF-8 is explicit because ensure_ascii=False.
        with open(output_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
        if (round_num + 1) % 10 == 0:
            rate = stats["rcon_success"] / max(stats["rcon_commands"], 1) * 100
            print(f"\n Progress: {stats['total']} explorations, "
                  f"{stats['wiki_lookups']} wiki lookups, "
                  f"{stats['rcon_commands']} commands ({rate:.0f}% success)")
        time.sleep(0.5)
    print(f"\nExploration complete: {stats['total']} topics explored")


if __name__ == "__main__":
    main()
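
The tool loop in `run_exploration` depends on pulling `<tool_call>` JSON payloads out of free-form model text. The snippet below isolates that step as a standalone sketch so it can be exercised without Ollama or an RCON connection. It reuses the two regular expressions from the script; the helper name `extract_tool_calls` and the sample reply are illustrative assumptions, not part of the repository.

```python
import json
import re
from typing import Dict, List

# Same patterns as in run_exploration.
THINK_RE = re.compile(r"<think>[\s\S]*?</think>\s*")
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(raw: str) -> List[Dict]:
    """Return well-formed tool calls found in a model reply (hypothetical helper)."""
    cleaned = THINK_RE.sub("", raw)  # strip reasoning blocks first
    calls = []
    for payload in TOOL_CALL_RE.findall(cleaned):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed payloads are skipped, as in the script
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Made-up reply: one think block, two valid calls, one malformed call.
reply = (
    "<think>check the docs first</think>\n"
    '<tool_call>\n{"name": "minecraft.wiki_lookup", "arguments": {"query": "Breeze"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "rcon.execute", "arguments": {"command": "summon minecraft:breeze"}}\n</tool_call>\n'
    "<tool_call>\n{not valid json}\n</tool_call>"
)
calls = extract_tool_calls(reply)
for call in calls:
    print(call["name"], call["arguments"])
```

The non-greedy `\{.*?\}` still captures nested objects because the match must end immediately before `</tool_call>`, so the closing brace of the inner `arguments` object alone cannot terminate it.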