Mortdecai/REDDIT_EVAL_INVITE.md
Seth 38b9a02e45 Phase 2: eval harness, 182 examples, live bake-off, playtest infrastructure
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 13:38:12 -04:00

# Reddit Post
**Subreddit:** r/admincraft — could also work on r/Minecraft or r/mcservers
**Title:** Looking for a handful of playtesters for an experimental Minecraft server feature (1.21, Java)
---
**Body:**
I'm working on a custom feature for my 1.21 Java Edition server and I need some players to try it out and give feedback. It involves AI-powered in-game interactions — you'll be able to do some things through chat that you normally can't on a vanilla server.
I don't want to over-explain it before people try it — half the fun is seeing how players react to it cold. What I will say:
- It's something you interact with through in-game chat
- It does things in the world based on what you say
- It's entertaining, occasionally unpredictable, and I want to see what happens when real players poke at it
**Details:**
- Whitelisted server, Java Edition 1.21.x, hosted in the US
- Looking for ~10 players for a few sessions over the next couple weeks
- Sessions will be scheduled around availability (probably evenings/weekends)
- Your in-game chat during these sessions will be logged for development purposes — no personal data beyond your Minecraft username
- This is a hobby project, not commercial
If this sounds interesting, fill out the short form below and I'll follow up with details and the server IP.
[FORM LINK]
---
*Happy to answer general questions in the comments, but I'm going to be vague about the specifics on purpose.*
---
# Form Questions
**Google Form / Typeform — "Playtest Application"**
Page header: *Quick form to make sure we get a good group. Takes ~2 minutes.*
---
### 1. What's your Minecraft Java Edition username?
*(Short answer, required)*
**Purpose:** Whitelist + Mojang API verification that the account exists.
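A minimal sketch of that verification step, assuming the public Mojang profile-lookup endpoint and the current 3-16 character name rule (letters, digits, underscore). The function names are hypothetical, and the handling of 204/404 as "no such account" is an assumption about the endpoint's behaviour that should be checked against its current documentation before relying on it:

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Assumed current Java Edition name rule; some legacy accounts may not fit it.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

# Public profile lookup by name (assumed endpoint; rate limits apply).
PROFILE_URL = "https://api.mojang.com/users/profiles/minecraft/{name}"


def is_plausible_username(name: str) -> bool:
    """Cheap offline check to reject obvious typos before any network call."""
    return USERNAME_RE.fullmatch(name.strip()) is not None


def account_exists(name: str, timeout: float = 5.0) -> bool:
    """Return True if the lookup endpoint returns a profile for this name."""
    name = name.strip()
    if not is_plausible_username(name):
        return False
    url = PROFILE_URL.format(name=urllib.parse.quote(name))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                # e.g. 204 No Content: treated here as "not found"
                return False
            return "id" in json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code in (204, 404):
            return False
        raise  # rate limiting or outages should not look like "no account"
```

Running `is_plausible_username` on every submission first keeps malformed names from consuming lookup requests. Other HTTP errors are re-raised on purpose so that a temporary API failure is not mistaken for a fake account.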
---
### 2. How long have you been playing Minecraft?
*(Multiple choice, required)*
- Less than a year
- 1-3 years
- 3+ years
**Purpose:** Context. Not a dealbreaker either way.
---
### 3. Have you played on community/SMP servers before?
*(Multiple choice, required)*
- Yes, regularly
- A few times
- No, mostly singleplayer
**Purpose:** SMP players understand shared-world norms.
---
### 4. What interests you about this? (pick all that apply)
*(Checkboxes, required)*
- Curious what the feature actually is
- Helping test something new
- Trying to break things (in a helpful way)
- Looking for a server to hang out on
**Purpose:** "Looking for a server" alone is a soft red flag — they may not engage. Best candidates are curious or want to help test.
---
### 5. You're testing a new server feature and it refuses to do something you asked. What do you do?
*(Long answer, required)*
**Purpose:** The key screener. Good: curiosity, rephrasing, reporting the issue. Red flags: fixation on bypassing/forcing it, or frustration that reads as entitlement.
---
### 6. Have you ever been banned from a server? If so, what happened?
*(Long answer, required)*
**Purpose:** Honesty check. Minor/old bans with self-awareness are fine. Defensiveness or serial bans are red flags.
---
### 7. When are you generally available? (timezone + rough hours)
*(Short answer, required)*
**Purpose:** Scheduling. Also filters zero-effort applications.
---
### 8. Anything else?
*(Long answer, optional)*
**Purpose:** Personality signal. Thoughtful responses correlate with better testers.
---
# Scoring Rubric (internal, not shown to applicants)
| Signal | Green | Yellow | Red |
|--------|-------|--------|-----|
| Q4 (interest) | Multiple boxes, especially "curious" or "test" | Single box, but reasonable | Only "looking for a server" |
| Q5 (refusal) | Curious, tries alternatives, reports it | Short but benign ("I'd move on") | Wants to force/bypass, hostile tone |
| Q6 (ban history) | Clean or honest with context | Vague but not defensive | Defensive, hostile, or serial bans |
| Overall effort | Complete sentences, reads like a person | Terse but present | Single-word answers, empty fields |

Auto-approve: all green. Manual review: any yellow and no red. Reject: any red.
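
The triage rule above can be sketched as a small function. This is only an illustration: the `Rating` enum, the `triage` name, and the signal keys are hypothetical, and red is checked first so that a single red rejects an application even when yellows are also present.

```python
from enum import Enum


class Rating(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def triage(ratings: dict[str, Rating]) -> str:
    """Map per-signal ratings to a decision.

    Assumption: red outranks yellow, so it is checked first.
    """
    values = list(ratings.values())
    if not values:
        raise ValueError("no ratings supplied")
    if Rating.RED in values:
        return "reject"
    if Rating.YELLOW in values:
        return "manual_review"
    return "auto_approve"


# Hypothetical applicant: strong answers, but a vague ban-history response.
example = {
    "q4_interest": Rating.GREEN,
    "q5_refusal": Rating.GREEN,
    "q6_ban_history": Rating.YELLOW,
    "overall_effort": Rating.GREEN,
}
print(triage(example))
```

The ratings themselves still come from a human reading the answers. The function only makes the final decision consistent from one applicant to the next.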