Persistent RCON connections — fixes server crash from connection spam

Root cause: self-play opened and closed a new TCP socket for every RCON
command (hundreds per minute). Paper's RCON listener creates a thread per
connection, so the connection churn overwhelmed the server until it stopped.

Fix: PersistentRCON class maintains a single connection per server with
auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port.
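
A minimal sketch of how such a shared module could be structured, for orientation only. The contents of persistent_rcon.py are not shown in this commit view, so everything beyond the names PersistentRCON, get_rcon, and command() is an assumption: the lazy connect, the single retry on socket errors, the timeout parameter, the encode_packet helper, and treating a response id of -1 as failed authentication (the Source RCON convention that Minecraft follows).

```python
import socket
import struct
import threading

SERVERDATA_AUTH = 3         # packet type used for login
SERVERDATA_EXECCOMMAND = 2  # packet type used for a command


def encode_packet(req_id, ptype, payload):
    """Frame one RCON packet: length prefix, id, type, body, two NUL bytes."""
    body = struct.pack("<ii", req_id, ptype) + payload.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(body)) + body


class PersistentRCON:
    """One lazily opened, lock-guarded RCON connection (illustrative sketch)."""

    def __init__(self, host, port, password, timeout=5.0):
        self.host, self.port, self.password = host, port, password
        self.timeout = timeout  # hypothetical parameter
        self._sock = None
        self._lock = threading.Lock()

    def _recv_exact(self, n):
        # recv() may return fewer bytes than requested, so loop until done.
        buf = b""
        while len(buf) < n:
            chunk = self._sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("RCON connection closed by server")
            buf += chunk
        return buf

    def _read_packet(self):
        (length,) = struct.unpack("<i", self._recv_exact(4))
        data = self._recv_exact(length)
        req_id = struct.unpack("<i", data[:4])[0]
        return req_id, data[8:-2].decode("utf-8", errors="replace")

    def _close(self):
        if self._sock is not None:
            try:
                self._sock.close()
            finally:
                self._sock = None

    def _connect(self):
        self._sock = socket.create_connection(
            (self.host, self.port), timeout=self.timeout
        )
        self._sock.sendall(encode_packet(1, SERVERDATA_AUTH, self.password))
        req_id, _ = self._read_packet()
        if req_id == -1:  # assumed failure signal, see lead-in
            self._close()
            raise RuntimeError("RCON authentication failed")

    def command(self, cmd):
        """Send one command, reconnecting once if the socket has gone stale."""
        with self._lock:
            for attempt in range(2):
                try:
                    if self._sock is None:
                        self._connect()
                    self._sock.sendall(
                        encode_packet(2, SERVERDATA_EXECCOMMAND, cmd)
                    )
                    return self._read_packet()[1]
                except OSError:
                    self._close()
                    if attempt == 1:
                        raise


_pool = {}
_pool_lock = threading.Lock()


def get_rcon(host, port, password):
    """Return the shared connection for host:port, creating it on first use."""
    key = f"{host}:{port}"
    with _pool_lock:
        if key not in _pool:
            _pool[key] = PersistentRCON(host, port, password)
        return _pool[key]
```

Because the socket is opened on the first command() call rather than in the constructor, get_rcon() can be called freely without touching the network. The lock serializes callers so that one request/response pair is never interleaved with another on the shared socket.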

Applied to:
- mc_aigod_paper.py (prod paper-ai + dev)
- mc_aigod.py (shrink-world)
- self_play.py (training data generation)
- persistent_rcon.py (shared module)

Before: 100+ RCON connections/minute → server crash
After: 3 persistent connections total → stable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:24:44 -04:00
parent 67179f75ad
commit ead16fd429
2 changed files with 157 additions and 23 deletions
+10 -23
@@ -42,33 +42,16 @@ sys.path.insert(0, str(ROOT))
 OUTPUT = ROOT / "data" / "processed" / "self_play.jsonl"
-# --- RCON ---
+# --- RCON (persistent connection) ---
+from agent.tools.persistent_rcon import get_rcon
 def rcon_command(cmd, host, port, password):
-    """Execute via RCON, return (success, result_text)."""
-    import socket, struct
+    """Execute via persistent RCON, return (success, result_text)."""
     try:
-        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        s.settimeout(5)
-        s.connect((host, port))
-        def send(rid, ptype, payload):
-            data = struct.pack("<ii", rid, ptype) + payload.encode("utf-8") + b"\x00\x00"
-            s.sendall(struct.pack("<i", len(data)) + data)
-        def recv():
-            raw = s.recv(4)
-            if len(raw) < 4: return None
-            length = struct.unpack("<i", raw)[0]
-            data = s.recv(length)
-            return data[8:-2].decode("utf-8", errors="replace")
-        send(1, 3, password)
-        time.sleep(0.1)
-        recv()
-        send(2, 2, cmd)
-        time.sleep(0.2)
-        result = recv() or ""
-        s.close()
+        conn = get_rcon(host, port, password)
+        result = conn.command(cmd)
         # Detect errors
         error_patterns = [
             "Unknown or incomplete command",
             "Incorrect argument",
@@ -76,8 +59,12 @@ def rcon_command(cmd, host, port, password):
             "Unknown item",
             "Invalid or unknown",
             "Expected",
+            "<--[HERE]",
         ]
         is_error = any(p.lower() in result.lower() for p in error_patterns)
+        # Benign non-errors
+        if "no player was found" in result.lower():
+            is_error = False
         return (not is_error, result.strip())
     except Exception as e:
         return (False, f"RCON error: {e}")
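
The response classification in rcon_command can be exercised without a server by pulling it into a standalone function. The sketch below is illustrative: the name classify_response and the BENIGN_PATTERNS list are hypothetical, and ERROR_PATTERNS contains only the entries visible in the hunks above (the diff elides at least one line of the list between the two hunks).

```python
# Patterns visible in the diff; the real list may contain more entries.
ERROR_PATTERNS = [
    "Unknown or incomplete command",
    "Incorrect argument",
    "Unknown item",
    "Invalid or unknown",
    "Expected",
    "<--[HERE]",
]

# Responses that are treated as success even if an error pattern matches.
BENIGN_PATTERNS = ["no player was found"]


def classify_response(result):
    """Return (success, stripped_text) for a raw RCON response string."""
    lowered = result.lower()
    # Case-insensitive substring match against every known error pattern.
    is_error = any(p.lower() in lowered for p in ERROR_PATTERNS)
    # The benign override runs last so it wins over any error match.
    if any(p in lowered for p in BENIGN_PATTERNS):
        is_error = False
    return (not is_error, result.strip())


if __name__ == "__main__":
    print(classify_response("Unknown or incomplete command, see below for error"))
    print(classify_response("No player was found"))
```

Since matching is by substring, a broad pattern such as "Expected" will also flag any successful response that happens to contain that word, which may be worth keeping in mind when the results feed training data.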