f5118505b1
Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6%), 0 errors vs 2
- Enchantments: +47% (20% → 67%)
- EssentialsX: +60% (0% → 60%)
- Effects: +25% (0% → 25%)
- Regressions: fill_build -67%, world -20%

Knowledge Lookup Tools (4 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
412 lines
20 KiB
Python
#!/usr/bin/env python3
"""
Exploration Self-Play — model uses wiki_lookup to explore Minecraft knowledge,
then validates its understanding through RCON commands.

Unlike canned self-play, the model drives its own curiosity:
1. Gets a broad topic ("explore enchantments", "learn about 1.21 items")
2. Uses minecraft.wiki_lookup to research
3. Generates commands based on what it learned
4. RCON validates correctness
5. If wrong, researches more and corrects

Produces gold-standard knowledge-grounded training data.

Usage:
    python3 exploration_self_play.py --ollama-url http://localhost:11434 \
        --model mortdecai:0.5.0 --rcon-host 192.168.0.244 --rcon-port 25578
"""

import argparse
import json
import random
import re
import sys
import time
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))

import requests
from agent.tools.persistent_rcon import get_rcon

OUTPUT_DIR = PROJECT_ROOT / "data" / "raw" / "exploration_selfplay"

PLAYERS = ["slingshooter08", "Ace13245", "TheBigBoss", "xXDragonSlayerXx"]

# Topics for the model to explore — broad enough that it needs to look things up
EXPLORATION_TOPICS_PLUGINS = [
    # WorldGuard deep dive
    "Research all WorldGuard region flags. Create a region and test each flag one at a time for {p}.",
    "Look up how WorldGuard region priorities work. Create overlapping regions with different rules.",
    "Research WorldGuard's __global__ region. What flags can you set globally? Test a few.",
    "Look up WorldGuard entry/exit deny flags. Create a VIP-only zone and test it.",
    "Research how to make a WorldGuard region that heals players. Set it up near {p}.",
    "What WorldGuard flags control explosions? Research and create a blast-proof zone.",
    "Look up how to block specific commands in a WorldGuard region. Test with /home.",
    "Research WorldGuard greeting and farewell messages. Set up regions with welcome messages.",

    # CoreProtect deep dive
    "Research all CoreProtect action types (block, container, chat, command). Test /co lookup with each.",
    "Look up CoreProtect time format syntax. Test rollbacks with different time ranges (1h, 30m, 7d).",
    "Research how CoreProtect handles container logging. Place a chest, add items, then lookup the history.",
    "What CoreProtect parameters filter by block type? Test rolling back only specific blocks.",
    "Look up how to use CoreProtect radius parameter. Test different radius values.",
    "Research CoreProtect restore vs rollback — what's the difference? Demonstrate both.",

    # EssentialsX deep dive
    "Research all EssentialsX economy commands. Set up a working economy with /eco, /balance, /pay.",
    "Look up EssentialsX kit creation syntax. Create a starter kit and a VIP kit.",
    "Research EssentialsX warp system. Create 5 warps at interesting locations.",
    "What EssentialsX commands exist for player management? Test /nick, /seen, /whois.",
    "Look up EssentialsX home system. Set multiple named homes for {p}.",
    "Research EssentialsX god mode, fly mode, and speed commands. Test all three.",
    "What EssentialsX commands modify the world? Test /sun, /storm, /day, /night.",

    # LuckPerms deep dive
    "Research LuckPerms group inheritance. Create parent and child groups and test permission flow.",
    "Look up LuckPerms temporary permissions. Give {p} temp fly access for 5 minutes.",
    "Research LuckPerms meta (prefix/suffix). Set up colored chat prefixes for different groups.",
    "What LuckPerms commands check a user's permissions? Audit {p}'s current permissions.",
    "Look up how to create a LuckPerms permission ladder (default -> member -> vip -> admin).",
    "Research LuckPerms weight system. How do group priorities work?",

    # FAWE/WorldEdit deep dive
    "Research all WorldEdit shape commands (sphere, cyl, pyramid). Build one of each near {p}.",
    "Look up WorldEdit brush types. What brushes exist beyond sphere brush?",
    "Research WorldEdit mask syntax. How do masks work with //replace?",
    "What WorldEdit clipboard operations exist? Test //copy, //paste, //rotate, //flip.",
    "Look up WorldEdit pattern syntax. Can you mix multiple blocks in one command?",
    "Research WorldEdit //generate command. Can it make mathematical surfaces?",
    "What WorldEdit selection modes exist? Test //sel cuboid vs poly vs sphere.",

    # Script writing exploration
    "Research Minecraft datapack function syntax. Write a mcfunction that creates a parkour course.",
    "Look up how Minecraft tick functions work. Write one that makes particles at spawn.",
    "Research how to chain mcfunctions together. Write a main function that calls sub-functions.",
    "What Minecraft datapack tags control function scheduling? Test tick.json and load.json.",
    "Look up execute command syntax for mcfunctions. Write a script using execute at/as/if.",
    "Research scoreboard objectives. Write a script that tracks player kills and announces leaders.",

    # Multi-plugin combos
    "Research how to combine WorldEdit builds with WorldGuard protection. Build and protect an arena.",
    "Look up how to use CoreProtect to undo WorldEdit operations specifically.",
    "Research combining LuckPerms with WorldGuard — can you tie region access to permission groups?",
    "Create a complete server setup: spawn area (WE), protected (WG), with warps (Ess) and perms (LP).",
    "Research how to build a minigame arena: WE for building, WG for rules, scoreboards for tracking.",
]

EXPLORATION_TOPICS = [
    # Items and crafting
    "What are all the new items added in 1.21? Look them up and give one of each to {p}.",
    "Research every type of arrow (tipped arrows) and give {p} one of each.",
    "Look up all the banner patterns available and create a cool banner for {p}.",
    "What suspicious stew effects exist? Research and give {p} the best one.",
    "Research all the different types of potions and give {p} the three most useful ones.",
    "What are all the different horse armor types? Look them up and give one of each to {p}.",
    "Research all smithing templates and give {p} the rarest ones.",
    "Look up every type of spawn egg and give {p} five interesting ones.",

    # Enchantments
    "Research the best enchantment setup for a full netherite armor set. Give it to {p}.",
    "What enchantments are exclusive to each other? Look them up and explain while giving {p} examples.",
    "Research the difference between Protection, Fire Protection, Blast Protection, and Projectile Protection. Which is best for general use? Give {p} the optimal set.",
    "Look up what Thorns does exactly — is it worth using? Give {p} armor with and without it to test.",
    "Research Sweeping Edge — does it still exist in 1.21? Give {p} a sword with the correct enchantments.",
    "What's the maximum level for each enchantment? Research and give {p} a tool with impossible levels vs correct levels.",

    # Effects and potions
    "Research all status effects in 1.21. Which ones are new? Apply the 3 newest ones to {p}.",
    "Look up the Ominous Bottle effect — what does it do? Give one to {p}.",
    "What's the difference between Strength and Haste? Research and apply the right one for mining.",
    "Research what Wind Charged does. Apply it to {p}.",
    "Look up all negative effects and their max safe durations. Apply a brief demonstration.",
    "What effect does a Beacon give? Research all beacon effects and apply them.",

    # Mobs and entities
    "Research all tameable mobs in 1.21. Summon one of each near {p}.",
    "What mobs were added or changed in 1.21? Look them up and summon the new ones.",
    "Research the Breeze mob — what does it drop? Summon one for {p}.",
    "Look up all rideable mobs and summon one for {p} with a saddle.",
    "What's the strongest mob in the game? Research its stats and summon it (carefully).",
    "Research all fish types and summon them in water near {p}.",

    # Blocks and building
    "Research all copper block variants and their oxidation states. Place examples near {p}.",
    "What blocks emit light? Look up all light-emitting blocks and demonstrate.",
    "Research all types of stairs, slabs, and walls available in 1.21.",
    "Look up how to make colored concrete powder and place a rainbow near {p}.",
    "What are all the glazed terracotta patterns? Research and place one of each.",
    "Research redstone components — what's the difference between a comparator and repeater?",

    # Commands and mechanics
    "Research the /place command. What can it place? Demonstrate with a structure.",
    "Look up the /damage command syntax and demonstrate different damage types on a mob.",
    "Research /attribute — what attributes can be modified? Give {p} double health.",
    "What does the /ride command do? Research and demonstrate.",
    "Look up /fillbiome — can you change the biome? Try it near {p}.",
    "Research the /random command (added in 1.20.2). What can it do?",

    # Worldgen and structures
    "Research all structure types that /locate can find. Find the 3 nearest to {p}.",
    "What biomes exist in 1.21? Look up any new ones and locate them.",
    "Research Trial Chambers — where do they spawn? Locate one for {p}.",

    # Plugin-specific research
    "Research WorldGuard region flags — what flags exist? Set up a demo region with interesting flags.",
    "Look up CoreProtect rollback syntax — what parameters does it accept?",
    "Research LuckPerms group inheritance — how do child groups work?",
    "What WorldEdit brushes are available? Research and describe them.",
    "Look up EssentialsX economy commands — set up a basic economy demonstration.",
]


def wiki_lookup(query, timeout=15):
    """Search the Minecraft Wiki via its MediaWiki API and return the top hit's intro text."""
    try:
        # Use a simple search - the model will call this through the tool loop.
        # opensearch responds with [query, titles, descriptions, urls].
        r = requests.get(
            "https://minecraft.wiki/api.php",
            params={"action": "opensearch", "search": query, "limit": 3, "format": "json"},
            timeout=timeout,
        )
        results = r.json()
        if len(results) >= 4 and results[1]:
            titles = results[1][:3]
            urls = results[3][:3]

            # Fetch first result summary
            if titles:
                r2 = requests.get(
                    "https://minecraft.wiki/api.php",
                    params={
                        "action": "query", "prop": "extracts",
                        "exintro": True, "explaintext": True,
                        "titles": titles[0], "format": "json",
                    },
                    timeout=timeout,
                )
                pages = r2.json().get("query", {}).get("pages", {})
                for page in pages.values():
                    extract = page.get("extract", "")
                    if extract:
                        return {
                            "content": extract[:1500],
                            "url": urls[0] if urls else f"https://minecraft.wiki/w/{titles[0]}",
                            "ok": True,
                        }
        return {"content": f"No wiki results for: {query}", "url": "", "ok": False}
    except Exception as e:
        return {"content": f"Wiki lookup failed: {e}", "url": "", "ok": False}


def run_exploration(topic, player, ollama_url, model, rcon):
    """Run one exploration round — model researches and acts."""
    system = (
        "/no_think\n"
        "You are a Minecraft 1.21 expert on a Paper server with plugins: "
        "WorldGuard, CoreProtect, EssentialsX, LuckPerms, FastAsyncWorldEdit.\n\n"
        "You have these lookup tools:\n"
        "- minecraft.wiki_lookup: {\"query\": \"...\"} — Minecraft Wiki for items, mobs, commands\n"
        "- plugin.docs_lookup: {\"plugin\": \"worldguard|worldedit|coreprotect|essentialsx|luckperms\", \"query\": \"...\"} — plugin documentation\n"
        "- minecraft.changelog_lookup: {\"query\": \"...\", \"version\": \"1.21\"} — version changes\n"
        "- paper.docs_lookup: {\"query\": \"...\"} — Paper server docs\n"
        "- rcon.execute: {\"command\": \"...\"} — execute a Minecraft command\n\n"
        "WORKFLOW:\n"
        "1. Research the topic using the appropriate lookup tool\n"
        "2. For plugin commands, use plugin.docs_lookup instead of minecraft.wiki_lookup\n"
        "3. Generate and execute commands via rcon.execute\n"
        "4. If a command fails, look up the correct syntax and try again\n\n"
        "To call a tool, respond with:\n"
        "<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {...}}\n</tool_call>\n\n"
        "When done, respond with final JSON:\n"
        "{\"commands\": [...], \"reasoning\": \"what you learned\", \"wiki_topics\": [\"topics you looked up\"]}\n\n"
        "Be curious. ALWAYS look things up before guessing. Verify your knowledge."
    )

    topic_resolved = topic.replace("{p}", player)
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Player {player}: {topic_resolved}"},
    ]

    tool_trace = []
    all_commands = []
    wiki_topics = []
    max_steps = 10

    # Stays empty if the very first LLM call fails, so the final parse
    # after the loop never hits an unbound name.
    raw = ""

    for step in range(max_steps):
        try:
            r = requests.post(f"{ollama_url}/api/chat", json={
                "model": model,
                "messages": messages,
                "stream": False,
                "options": {"temperature": 0.6, "num_predict": 800},
            }, timeout=120)
            raw = r.json()["message"]["content"]
        except Exception as e:
            print(f" LLM error: {e}")
            break

        # Drop any <think>...</think> reasoning block before parsing
        raw = re.sub(r'<think>[\s\S]*?</think>\s*', '', raw)

        # Check for tool calls
        tool_matches = re.findall(r'<tool_call>\s*(\{.*?\})\s*</tool_call>', raw, re.DOTALL)

        if not tool_matches:
            # Final response: done exploring
            break

        for tc_json in tool_matches:
            try:
                tc = json.loads(tc_json)
                tool_name = tc.get("name", "")
                tool_args = tc.get("arguments", {})
            except (json.JSONDecodeError, AttributeError):
                # Malformed JSON, or valid JSON that is not an object
                continue
            if not isinstance(tool_args, dict):
                tool_args = {}

            if tool_name == "minecraft.wiki_lookup":
                query = tool_args.get("query", "")
                wiki_topics.append(query)
                result = wiki_lookup(query)
                print(f" wiki: {query[:60]} -> {len(result.get('content',''))} chars")
            elif tool_name in ("plugin.docs_lookup", "minecraft.changelog_lookup", "paper.docs_lookup"):
                try:
                    from agent.tools.knowledge_lookup import handle_knowledge_tool
                    result = handle_knowledge_tool(tool_name, tool_args)
                except ImportError:
                    # Fall back to the wiki when the knowledge tools are unavailable
                    result = wiki_lookup(tool_args.get("query", tool_args.get("plugin", "")))
                query = tool_args.get("query", "")
                wiki_topics.append(f"{tool_name}:{query}")
                print(f" {tool_name}: {query[:50]} -> {len(result.get('content',''))} chars")
            elif tool_name == "rcon.execute":
                cmd = tool_args.get("command", "")
                try:
                    rcon_result = rcon.command(cmd)
                    # Heuristic: these substrings mark a rejected command in the server reply
                    is_err = any(e in rcon_result for e in ("<--[HERE]", "Unknown", "Incorrect"))
                    result = {"success": not is_err, "result": rcon_result[:300]}
                    all_commands.append(cmd)
                    status = "OK" if not is_err else "ERR"
                    print(f" rcon: {cmd[:60]} -> {status}")
                except Exception as e:
                    result = {"success": False, "result": str(e)}
                    print(f" rcon: {cmd[:60]} -> FAIL")
            else:
                result = {"ok": False, "error": f"unknown tool: {tool_name}"}

            tool_trace.append({
                "tool": tool_name,
                "input": str(tool_args)[:200],
                "ok": result.get("ok", result.get("success", False)),
                "step": step,
            })

            messages.append({"role": "assistant", "content": f"<tool_call>\n{json.dumps(tc)}\n</tool_call>"})
            messages.append({"role": "tool", "content": json.dumps(result)[:3000]})

        time.sleep(0.1)

    # Parse final response if present
    reasoning = ""
    try:
        parsed = json.loads(raw)
        if not isinstance(parsed, dict):
            raise ValueError("final response is not a JSON object")
        reasoning = parsed.get("reasoning", "")
        if parsed.get("commands"):
            all_commands.extend(parsed["commands"])
    except ValueError:
        # json.JSONDecodeError is a ValueError subclass; keep a text preview instead
        reasoning = raw[:200]

    return {
        "id": f"explore-{int(time.time())}-{random.randint(0,9999):04d}",
        "source": "exploration_self_play",
        "type": "exploration",
        "input": {"user_message": topic_resolved, "player": player},
        "output": {
            "commands": all_commands,
            "reasoning": reasoning,
            "wiki_topics": wiki_topics,
        },
        "tool_trace": tool_trace,
        "messages": messages,
        "metadata": {
            "model": model,
            "steps": min(step + 1, max_steps),
            "wiki_lookups": len(wiki_topics),
            "rcon_commands": len(all_commands),
            # Fraction of rcon.execute calls in the trace that succeeded
            "success_rate": (
                sum(1 for t in tool_trace if t["tool"] == "rcon.execute" and t["ok"])
                / max(sum(1 for t in tool_trace if t["tool"] == "rcon.execute"), 1)
            ),
        },
    }


def main():
    parser = argparse.ArgumentParser(description="Exploration self-play")
    parser.add_argument("--ollama-url", default="http://localhost:11434")
    parser.add_argument("--model", default="mortdecai:0.5.0")
    parser.add_argument("--rcon-host", default="192.168.0.244")
    parser.add_argument("--rcon-port", type=int, default=25578)
    parser.add_argument("--rcon-pass", default="REDACTED_RCON")
    parser.add_argument("--rounds", type=int, default=999999)
    parser.add_argument("--focus", default="general", choices=["general", "plugins", "all"],
                        help="Topic focus: general (vanilla MC), plugins (WG/CP/Ess/LP/FAWE/scripts), all (both)")
    args = parser.parse_args()

    if args.focus == "plugins":
        topics = EXPLORATION_TOPICS_PLUGINS
    elif args.focus == "all":
        topics = EXPLORATION_TOPICS + EXPLORATION_TOPICS_PLUGINS
    else:
        topics = EXPLORATION_TOPICS

    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    output_path = OUTPUT_DIR / f"exploration_{args.focus}_{int(time.time())}.jsonl"

    rcon = get_rcon(args.rcon_host, args.rcon_port, args.rcon_pass)

    print("Exploration Self-Play")
    print(f" Model: {args.model} on {args.ollama_url}")
    print(f" RCON: {args.rcon_host}:{args.rcon_port}")
    print(f" Focus: {args.focus} ({len(topics)} topics)")
    print(f" Output: {output_path}")
    print()

    stats = {"total": 0, "wiki_lookups": 0, "rcon_commands": 0, "rcon_success": 0}

    for round_num in range(args.rounds):
        topic = random.choice(topics)
        player = random.choice(PLAYERS)

        print(f"\n── Round {round_num+1} ──")
        # Substitute the player first so truncation cannot split the {p} placeholder
        print(f" Topic: {topic.replace('{p}', player)[:80]}")

        example = run_exploration(topic, player, args.ollama_url, args.model, rcon)

        stats["total"] += 1
        stats["wiki_lookups"] += example["metadata"]["wiki_lookups"]
        stats["rcon_commands"] += example["metadata"]["rcon_commands"]
        # round() rather than int() so float error cannot drop a success
        stats["rcon_success"] += round(example["metadata"]["success_rate"] * example["metadata"]["rcon_commands"])

        print(f" Result: {example['metadata']['wiki_lookups']} lookups, "
              f"{example['metadata']['rcon_commands']} commands, "
              f"{example['metadata']['success_rate']:.0%} success")

        # ensure_ascii=False can emit non-ASCII text, so write UTF-8 explicitly
        with open(output_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")

        if (round_num + 1) % 10 == 0:
            rate = stats["rcon_success"] / max(stats["rcon_commands"], 1) * 100
            print(f"\n Progress: {stats['total']} explorations, "
                  f"{stats['wiki_lookups']} wiki lookups, "
                  f"{stats['rcon_commands']} commands ({rate:.0f}% success)")

        time.sleep(0.5)

    print(f"\nExploration complete: {stats['total']} topics explored")


if __name__ == "__main__":
    main()