docs: AI Phase 1 shipped — context, decisions, handoff

- CLAUDE.md: phase line moved to "Phase 1 deployed"; key files lists the new bot module, game-end extraction, and selfplay harness. - DECISIONS.md: new "Phase 1 implementation outcomes" subsection records the CasualBrain-engine reversal, the FEN-vanilla-only invariant, why blind keeps heuristic, and the bot-slot token randomization. The earlier "Stockfish deferred" entry is partially superseded. - .claude/handoffs/: handoff document for the next session.
2026-04-28 15:20:24 -04:00
parent 7c18725586
commit 1674695eef
3 changed files with 168 additions and 7 deletions
@@ -50,9 +50,9 @@ Format: `YYYY-MM-DD: <decision> — <why>`
 - 2026-04-28: **WS path through Caddy** — `wss://chess.sethpc.xyz/ws?game=<id>` works without explicit `transport ws` config. Caddy's reverse_proxy handles upgrade transparently.
 - 2026-04-28: **Public DNS** — relies on existing `*.sethpc.xyz` wildcard pointing at the WAN IP; no Pi-hole entry was needed. Caddy host-routes `chess.sethpc.xyz` to 192.168.0.245:3000.

-## AI / computer player (designed 2026-04-28, not yet implemented)
+## AI / computer player

-Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. All decisions below are settled at spec-approval time; revisit if implementation surfaces something the spec didn't anticipate.
+Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. **Phase 1 (Casual bot) deployed 2026-04-28** — live at https://chess.sethpc.xyz "Play vs computer". Phase 2 (Recon) deferred until Phase 1 has soaked.

 - 2026-04-28: **Two AI bots, phased delivery** — `CasualBrain` (Phase 1, algorithmic, in-process) ships first; `ReconBrain` (Phase 2, `gemma4:26b` chat agent) ships second. Phased to keep research uncertainty (Recon's actual playing strength) from blocking shipping anything. Rejected: combined launch, single difficulty-dial UX, throwaway Casual-as-stub.
 - 2026-04-28: **Bots use the same view filter as humans** — `BotDriver` calls `buildView(game, botColor)`; bot input is filtered `BoardView` + `Announcement[]`. No oracle access. Preserves the architectural invariant: the view filter is the only egress for board state, even for in-process bots. Rejected: "easy mode" oracle access for Casual to keep it simple.
@@ -66,6 +66,15 @@ Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. All decisions bel
 - 2026-04-28: **Reasoning hidden during play, revealed post-game** — Gemma's chat history is private during the game; on game end, the chat history is copied to `Game.aiThoughtsLog` and the post-game screen shows a collapsible "View gemma4's reasoning" section. Rejected: live streaming "thinking tokens" to user (leaks strategy), permanent hiding (loses showcase value of the project).
 - 2026-04-28: **`vsAi` field added to `CreateGameRequest`; `aiInfo` field added to `joined`/`update` server messages; `'ai_unavailable'` added to `EndReason`** — minimal protocol surface for the feature. AI metadata is NOT in `ModeratorText` enum (kept clean). UI-system messages for game-start info and failover events are style-distinct from `Announcement` entries.

+### Phase 1 implementation outcomes (2026-04-28)
+
+- 2026-04-28: **Phase 1 shipped to https://chess.sethpc.xyz.** 13 implementation tasks executed via subagent-driven development against `docs/superpowers/plans/2026-04-28-ai-player-phase-1-casual.md`. 75 tests passing (21 shared + 54 server). Live smoke checklist passed.
+- 2026-04-28: **CasualBrain reversal — vanilla mode now uses `js-chess-engine` (level 2, randomness=30), not the hand-rolled scorer.** The original heuristic lost to a random-move baseline 7-7 in 100-game self-play (target was ≥80%). After swap-in: Casual wins 97% as white and 96% as black vs Random, ~5-30ms/move. Supersedes the spec's "no Stockfish" decision in spirit — `js-chess-engine` is MIT-licensed, ~400KB, no native deps, and at level 2 plays "Casual" strength (beats random comfortably, loses to a careful human). Originally rejected "Stockfish for strong vanilla AI" was about *strength*, not about *using a pre-made engine*. Documented and pushed; accepted as a learning.
+- 2026-04-28: **Bot's BoardView is the only egress to the engine.** `BrainInput.fen` is set ONLY in vanilla mode (where the view is full reveal); blind mode omits it. Engine cannot smuggle opponent positions past the view filter — same architectural invariant the brainstorming session established for human-played blind chess.
+- 2026-04-28: **Blind mode keeps the heuristic (not engine).** Architecturally Stockfish/js-chess-engine can't usefully play blind chess — they need a full board to evaluate, and giving them one would be oracle access. Building a belief-state from announcements is the Recon bot's design (Phase 2). Self-play confirmed blind heuristic completes games (avgPly=16, 0 errors, all decisive) — short games but functional.
+- 2026-04-28: **Bot-slot synthetic token is randomized, not a fixed placeholder.** Using a hard-coded placeholder ("botxxxxxxxxxxxxxxxxxxxxx") would let any client knowing it claim the bot's color via `hello`. Random tokens (same shape as human tokens) close that hole. Caught in code review of Task 7.
+- 2026-04-28: **`endGame` and `finalizeIfEnded` extracted from `ws.ts` to `packages/server/src/game-end.ts`.** Both `ws.ts` and `bot/driver.ts` need to set the game-finished state — duplication risk. Hoist resolves it.
+
 ## Deferred / Rejected

 <!-- Decisions NOT to do something are just as valuable -- prevents re-proposing rejected ideas -->
@@ -83,7 +92,7 @@ Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. All decisions bel
 - 2026-04-28: **Pre-deploy "server restarting" warning to active players** — stretch goal, not MVP. Mitigation for now: deploy during low-usage windows.
 - 2026-04-28: ~~**Client-side AI / hint generation** — explicitly out of scope. Human vs. human only.~~ **Partially superseded 2026-04-28** by AI-player spec. Reversal applies *only* to the human-vs-AI path; client-side AI / hint generation in human-vs-human games remains rejected.
 - 2026-04-28: **Difficulty slider for AI** — rejected. Two named buttons (Casual, Recon) only. No continuum; the two bots are architecturally different, not tuneable strengths of the same engine.
- 2026-04-28: **Stockfish for vanilla-mode AI strength** — deferred. Vanilla is a side-effect, not a feature target. Revisit if users explicitly ask for strong vanilla AI.
+- 2026-04-28: ~~**Stockfish for vanilla-mode AI strength** — deferred. Vanilla is a side-effect, not a feature target. Revisit if users explicitly ask for strong vanilla AI.~~ **Partially superseded 2026-04-28** during Phase 1 implementation — using `js-chess-engine` (smaller, MIT, no GPL concerns) at level 2 for Casual vanilla, capped at ~30ms/move. The original rejection was about not making Casual *strong*; the engine at level 2 is genuinely casual-strength while still beating random comfortably. Stockfish itself remains rejected (GPL, 7MB+ wasm, overkill for the strength target).
 - 2026-04-28: **Live token streaming during Gemma's thinking** — rejected for MVP. Static "AI is thinking..." indicator only. Streaming would leak strategic intent and adds protocol complexity.
 - 2026-04-28: **Mid-game GPU flap-back** — rejected. Once failed over to V100, stays there for the rest of the game even if steel141 recovers. Simpler, more predictable, and chat-history is mid-flight.
 - 2026-04-28: **AI vs AI public spectate-able games** — rejected for MVP. Self-play harness is CLI-only (`scripts/selfplay.ts`).