# DECISIONS.md — blind_chess Decision Log

Project-specific decisions. For global/cross-cutting decisions, see `~/bin/DECISIONS.md`.

Format: `YYYY-MM-DD: <decision> — <why>`

## Architecture

- 2026-04-28: Node 22 + TypeScript stack — single-language top-to-bottom; `chess.js` is the de facto rules engine and lives natively here.
- 2026-04-28: pnpm workspace with three packages — `packages/server` (Fastify + ws), `packages/client` (Svelte + Vite), `packages/shared` (TS types). Shared types are the load-bearing decision: the WS protocol drift surface is high-risk and shared types catch it at compile time.
- 2026-04-28: Fastify > Express — better TypeScript ergonomics, faster, cleaner plugin model for `ws` integration.
- 2026-04-28: Svelte > React — smaller bundle, reactive stores fit the constantly-changing board state model. React is overkill for a 2-route app.
- 2026-04-28: `chess.js` for rules + custom `geometricMoves` helper — chess.js doesn't expose pseudo-legal moves; ~80 LoC pure function covers all six piece types. Lives in `packages/shared` so server and client use the same code.
- 2026-04-28: In-memory only; `Map<gameId, Game>` is the entire database — simplest possible. SQLite later if crash recovery becomes painful. Rejected: SQLite for MVP (premature given hobby-project scope).
- 2026-04-28: Single-port Node service — Fastify serves both static client and `/ws` upgrade on port 3000. No reverse proxy logic in our service; Caddy CT 600 handles TLS and routing.
- 2026-04-28: Deploy target: new LXC on node-241 — clean isolation, matches existing patterns. Behind Caddy CT 600 at `chess.sethpc.xyz`.
- 2026-04-28: No auth beyond the hashed game link — friction-minimal; appropriate for casual play. No Authentik gate. Rejected: gating with Authentik (overkill).
- 2026-04-28: 8-character `gameId` (32 bits, `^[a-z0-9]{8}$`), 24-character `PlayerToken` (144 bits) — game IDs short enough for hand-shareable links, tokens long enough to prevent guessing.
- 2026-04-28: WebSocket transport for in-game; REST POST `/api/games` for creation — keeps create flow simple (refresh-friendly, cacheable), keeps in-game traffic on a single channel.
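
The ID sizes in the list above can be sanity-checked with a short sketch. The log does not say how IDs are encoded, so the encodings here are assumptions chosen because they match the stated numbers: 4 random bytes as hex gives 8 characters (32 bits), and 18 random bytes as base64url gives 24 characters (144 bits). `toHex` and `toBase64Url` are hypothetical helper names, and the bytes are expected to come from a cryptographically secure source supplied by the caller.

```typescript
// Hypothetical helpers; the real project may encode its IDs differently.
const GAME_ID_PATTERN = /^[a-z0-9]{8}$/; // pattern quoted in the decision log

/** Hex-encode bytes: 4 bytes (32 bits) become 8 lowercase characters. */
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    out += (b < 16 ? "0" : "") + b.toString(16);
  }
  return out;
}

const B64URL =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/** Base64url-encode bytes: 18 bytes (144 bits) become 24 characters. */
function toBase64Url(bytes: Uint8Array): string {
  // Restricting to multiples of 3 bytes means no padding is ever needed.
  if (bytes.length % 3 !== 0) {
    throw new Error("byte length must be a multiple of 3");
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out +=
      B64URL.charAt((n >> 18) & 63) +
      B64URL.charAt((n >> 12) & 63) +
      B64URL.charAt((n >> 6) & 63) +
      B64URL.charAt(n & 63);
  }
  return out;
}
```

Note that the quoted `gameId` pattern admits any lowercase alphanumeric string, so hex output is only one of the encodings it would accept.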

## Implementation

- 2026-04-28: Both modes (vanilla + blind) shipped day one — single engine, mode = per-player view filter. Vanilla mode is "blind mode with full reveal."
- 2026-04-28: Moderator hierarchy refined to four tiers: (1) `no_such_piece`, (2) `no_legal_moves` = pseudo-legal ∅, (3) `wont_help` = pseudo-legal ≠∅ but legal ∅ (pin OR unresolved check), (4) silent = legal moves exist. The tiers are strictly monotonic in information: each later tier reveals more than the one before it.
- 2026-04-28: Touch-move FSM — tap arms (reversible, client-side only), drag-start or destination-click commits ("touches"). Server tracks `armed: { color, from }`. `no_legal_moves` and `wont_help` checks fire only on first commit with a piece; once committed, all subsequent failed attempts are `illegal_move` with the touch staying.
- 2026-04-28: Highlighting (blind+ON) is purely geometric — function of `(piece type, position, own-piece set)`, no opponent input. Rays extend through unseen opponent pieces. Stop at own pieces. Off-board excluded. Zero opponent info leak. (Vanilla+ON shows engine-truth: legal-empty as green dot, legal-capture as red ring.)
- 2026-04-28: Game creation: creator picks side at create time (default random); single-use link (first joiner takes the open slot, then locked); no spectators in MVP; link dies with the game.
- 2026-04-28: Reconnect via opaque `PlayerToken` in browser `localStorage`, 5-minute grace window — generous for phone hiccups, short enough that abandoned games end. Grace expiry → `endReason: 'abandoned'`, opponent wins. Both-sides simultaneous expiry → game ends with `winner: undefined`.
- 2026-04-28: Pawn promotion via modal (Q/R/B/N), client must include `promotion` field in the move; moderator announces the promotion (it's tactically significant — public info).
- 2026-04-28: All draws auto-detected (stalemate, insufficient material, threefold, 50-move) — casual-play friendly; no "claim" UI.
- 2026-04-28: `Announcement` is an enum (`ModeratorText`), not a free-form string. Display strings live client-side. Tests assert against enum values.
- 2026-04-28: `update` is the single, idempotent server-to-client message that includes a filtered `view` and any new `Announcement` entries. Replaying the latest `update` produces a correct render.
- 2026-04-28: Moderator-vocabulary "errors" (no_such_piece, no_legal_moves, wont_help, illegal_move) come through as `Announcement` entries on `update`, NOT as `error` messages. Errors reserved for protocol failures.
- 2026-04-28: Janitor prunes finished games after 30 min idle; active games never expire (until restart).
- 2026-04-28: Rate limiting via per-token bucket on `commit`: 10/s, burst 20 — well above human pace, well below abuse.
- 2026-04-28: Mobile-first responsive design — IDEA.md's share-a-link flow strongly implies phone use.
- 2026-04-28: Logging via Pino (Fastify default) → journald. `/api/health` for Uptime Kuma probe. No Prometheus/OpenTelemetry in MVP.
- 2026-04-28: Resign + draw-offer/accept-decline flow — standard chess UX. Resignation ends without grace; disconnect applies grace.
- 2026-04-28: Game-over screen reveals full board for both sides — post-game review is part of the experience.
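
The four-tier moderator hierarchy in the list above can be read as a simple cascade. The sketch below is a minimal illustration, not the project's code: the tier names come from the log, while `TouchProbe`, `classifyTouch`, and the reading of `no_such_piece` as "the mover has no piece on that square" are assumptions.

```typescript
// Tier names are taken from the decision log; everything else is hypothetical.
type ModeratorTier =
  | "no_such_piece"
  | "no_legal_moves"
  | "wont_help"
  | "silent";

interface TouchProbe {
  hasOwnPiece: boolean; // assumed: the mover has a piece on the touched square
  pseudoLegalCount: number; // geometric moves, ignoring pins and check
  legalCount: number; // moves that are fully legal in the real position
}

/** Return the first tier whose condition matches, checked in tier order. */
function classifyTouch(p: TouchProbe): ModeratorTier {
  if (!p.hasOwnPiece) return "no_such_piece"; // tier 1
  if (p.pseudoLegalCount === 0) return "no_legal_moves"; // tier 2
  if (p.legalCount === 0) return "wont_help"; // tier 3: pinned or in check
  return "silent"; // tier 4: at least one legal move exists
}
```

For example, a pinned bishop with three geometric moves and no legal ones would classify as `wont_help`.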

## Implementation outcomes (2026-04-28 build session)

- 2026-04-28: **Repo:** `git.sethpc.xyz/Seth/blind_chess`. Created via `gitea create blind_chess`. Default branch `main`.
- 2026-04-28: **CT:** 690 on node-241, hostname `blind-chess`, IP 192.168.0.245, Debian 12, Node 22.22.2. 2 cores / 512 MB RAM / 8 GB rootfs. Resting memory ~29 MB, plenty of headroom.
- 2026-04-28: **Chosen `chess.js` v1.4.0** — uses `Move.isEnPassant()` / `isKingsideCastle()` / `isQueensideCastle()` instead of the deprecated `flags` string. The `Move` constructor's deprecated `flags` field is intentionally not relied upon.
- 2026-04-28: **Half-move clock for the 50-move rule** is read from the FEN's halfmove-clock field (zero-based index 4 of the space-separated fields; chess.js doesn't expose it directly). See `translator.ts:halfMoveClock`.
- 2026-04-28: **Shared package import resolution** — `packages/shared/package.json` `main` and `exports` point to `./dist/`. Source `.ts` is dev-only. Always run `pnpm --filter @blind-chess/shared build` before `pnpm --filter @blind-chess/server build` (the workspace project refs handle this when running `pnpm -r build`).
- 2026-04-28: **Client routing** is hash-based with a pathname fallback in `App.svelte` so `https://chess.sethpc.xyz/g/<id>` (the share URL) and `https://chess.sethpc.xyz/#/g/<id>` (the post-create URL) both render the game. The Fastify SPA fallback serves `index.html` on any non-matching `text/html` request.
- 2026-04-28: **Click-to-move only** — drag-and-drop deferred. Tap-arm + tap-destination is faithful to the touch-move FSM and works identically on phone and desktop.
- 2026-04-28: **WS path through Caddy** — `wss://chess.sethpc.xyz/ws?game=<id>` works without explicit `transport ws` config. Caddy's reverse_proxy handles upgrade transparently.
- 2026-04-28: **Public DNS** — relies on existing `*.sethpc.xyz` wildcard pointing at the WAN IP; no Pi-hole entry was needed. Caddy host-routes `chess.sethpc.xyz` to 192.168.0.245:3000.
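
Reading the half-move clock out of a FEN string, as noted in the list above, could look roughly like this. The function name `halfMoveClock` matches the one the log points to in `translator.ts`, but the body is a guess at the approach rather than a copy of it; `isFiftyMoveDraw` is a hypothetical helper.

```typescript
/**
 * Parse the halfmove clock from a FEN string.
 * A FEN has six space-separated fields; the clock is the fifth (index 4).
 */
function halfMoveClock(fen: string): number {
  const fields = fen.trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error("expected 6 FEN fields, got " + fields.length);
  }
  const clock = Number(fields[4]);
  if (!Number.isInteger(clock) || clock < 0) {
    throw new Error("invalid halfmove clock: " + fields[4]);
  }
  return clock;
}

/** The 50-move rule applies after 100 half-moves without a capture or pawn move. */
function isFiftyMoveDraw(fen: string): boolean {
  return halfMoveClock(fen) >= 100;
}
```

Validating the field count first keeps a malformed FEN from silently yielding `NaN`.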

## AI / computer player

Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. **Phase 1 (Casual bot) deployed 2026-04-28** — live at https://chess.sethpc.xyz "Play vs computer". Phase 2 (Recon) deferred until Phase 1 has soaked.

- 2026-04-28: **Two AI bots, phased delivery** — `CasualBrain` (Phase 1, algorithmic, in-process) ships first; `ReconBrain` (Phase 2, `gemma4:26b` chat agent) ships second. Phased to keep research uncertainty (Recon's actual playing strength) from blocking shipping anything. Rejected: combined launch, single difficulty-dial UX, throwaway Casual-as-stub.
- 2026-04-28: **Bots use the same view filter as humans** — `BotDriver` calls `buildView(game, botColor)`; bot input is filtered `BoardView` + `Announcement[]`. No oracle access. Preserves the architectural invariant: the view filter is the only egress for board state, even for in-process bots. Rejected: "easy mode" oracle access for Casual to keep it simple.
- 2026-04-28: **In-process virtual players, not external WS clients** — `BotDriver` lives in the existing Fastify server, dispatches actions through the same `commit` handler humans use. One process, no new deploy targets. Rejected: external bot processes (more operational surface, no real benefit), hybrid Casual-in-process / Recon-external (asymmetric for no gain).
- 2026-04-28: **Recon bot is a stateful chat agent, not stateless** — per-game chat history persists across turns as the bot's private memory. Each turn appends user (new view + announcements + candidates) + assistant (reasoning + move). Reasoning is hidden from the human during play, revealed in collapsible post-game panel. Rejected: stateless one-shot move-picker (loses belief-tracking across turns), revealing reasoning during play (would leak strategic intent).
- 2026-04-28: **Endpoint priority: steel141 RTX 3090 Ti primary, pve197 V100 fallback** — preflight on game creation; mid-game failover allowed once (one-way). Rationale: 3090 Ti benchmarks at ~134 tok/s on `gemma4:26b`; V100 estimated ~80 tok/s. Both have the model present. Rejected: no failover (worse UX), bidirectional flap (more complexity, no real benefit).
- 2026-04-28: **GPU shown to user** — persistent badge under AI's slot reads `"gemma4:26b · RTX 3090 Ti"` (or V100 / failed-over variant). Game-start moderator-panel UI message explicitly names the model + host. Rationale: chess.sethpc.xyz is a personal homelab site; surfacing the hardware is brand-appropriate and gives honest feedback when fallback engages. Rejected: hiding the GPU (would be opaque on slow V100 fallback).
- 2026-04-28: **`gemma4:26b` model choice** — sweet spot per gemma4-research: ~134 tok/s decode on 3090 Ti (4.7× faster than 31B), MoE 3.8B active, vision-capable (not used here). Rejected: 31B (5× slower, marginal strength gain not worth latency), e4b (too small for this task).
- 2026-04-28: **Per-move latency budget: 30s normal, 90s first-move** — first-move headroom covers cold-start (steel141 keep_alive=30m policy, ~30-60s reload after idle). Beyond 90s, treat as endpoint failure → failover. Rejected: tighter cap (false-positives on cold start), looser cap (UX death).
- 2026-04-28: **Recon "done" bar: ≥60% wins over 50 Recon-vs-Casual self-play games** — concrete, measurable acceptance bound. If Recon misses 60% but holds >40%, prompt-engineering rabbit hole; if <40%, design signal (try 31B or feed textual board representation). Self-play harness lives in `scripts/selfplay.ts`, not in CI. Rejected: subjective "feels okay" bar (would let weak Recon ship), bar against humans (untestable at scale).
- 2026-04-28: **Reasoning hidden during play, revealed post-game** — Gemma's chat history is private during the game; on game end, the chat history is copied to `Game.aiThoughtsLog` and the post-game screen shows a collapsible "View gemma4's reasoning" section. Rejected: live streaming "thinking tokens" to user (leaks strategy), permanent hiding (loses showcase value of the project).
- 2026-04-28: **`vsAi` field added to `CreateGameRequest`; `aiInfo` field added to `joined`/`update` server messages; `'ai_unavailable'` added to `EndReason`** — minimal protocol surface for the feature. AI metadata is NOT in `ModeratorText` enum (kept clean). UI-system messages for game-start info and failover events are style-distinct from `Announcement` entries.
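
The latency budget and one-way failover rules in the list above fit in a few lines. This is a sketch under stated assumptions: the 30 s / 90 s budgets and the `'ai_unavailable'` end reason come from the log, but the type names, `budgetMs`, `afterEndpointFailure`, and the idea that a failure on the fallback ends the game are all hypothetical.

```typescript
type Endpoint = "primary" | "fallback"; // e.g. 3090 Ti host vs V100 host

type FailureOutcome =
  | { kind: "continue"; endpoint: Endpoint }
  | { kind: "end"; endReason: "ai_unavailable" };

const NORMAL_BUDGET_MS = 30000; // per-move budget from the log
const FIRST_MOVE_BUDGET_MS = 90000; // extra headroom for a cold model load

/** Pick the latency budget for the upcoming bot move. */
function budgetMs(isFirstMove: boolean): number {
  return isFirstMove ? FIRST_MOVE_BUDGET_MS : NORMAL_BUDGET_MS;
}

/**
 * Decide what happens when the current endpoint is treated as failed.
 * Failover is one-way: primary -> fallback, never back to primary.
 * Assumption: a failure on the fallback ends the game as 'ai_unavailable'.
 */
function afterEndpointFailure(current: Endpoint): FailureOutcome {
  if (current === "primary") {
    return { kind: "continue", endpoint: "fallback" };
  }
  return { kind: "end", endReason: "ai_unavailable" };
}
```

Because the transition never returns `"primary"`, the flap-back case cannot occur by construction.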

### Phase 1 implementation outcomes (2026-04-28)

- 2026-04-28: **Phase 1 shipped to https://chess.sethpc.xyz.** 13 implementation tasks executed via subagent-driven development against `docs/superpowers/plans/2026-04-28-ai-player-phase-1-casual.md`. 75 tests passing (21 shared + 54 server). Live smoke checklist passed.
- 2026-04-28: **CasualBrain reversal — vanilla mode now uses `js-chess-engine` (level 2, randomness=30), not the hand-rolled scorer.** The original heuristic failed to beat a random-move baseline (7-7 in 100-game self-play; target was ≥80%). After swap-in: Casual wins 97% as white and 96% as black vs Random, ~5-30ms/move. Supersedes the spec's "no Stockfish" decision in spirit — `js-chess-engine` is MIT-licensed, ~400KB, no native deps, and at level 2 plays "Casual" strength (beats random comfortably, loses to a careful human). Originally rejected "Stockfish for strong vanilla AI" was about *strength*, not about *using a pre-made engine*. Documented and pushed; accepted as a learning.
- 2026-04-28: **Bot's BoardView is the only egress to the engine.** `BrainInput.fen` is set ONLY in vanilla mode (where the view is full reveal); blind mode omits it. Engine cannot smuggle opponent positions past the view filter — same architectural invariant the brainstorming session established for human-played blind chess.
- 2026-04-28: **Blind mode keeps the heuristic (not engine).** Architecturally Stockfish/js-chess-engine can't usefully play blind chess — they need a full board to evaluate, and giving them one would be oracle access. Building a belief-state from announcements is the Recon bot's design (Phase 2). Self-play confirmed blind heuristic completes games (avgPly=16, 0 errors, all decisive) — short games but functional.
- 2026-04-28: **Bot-slot synthetic token is randomized, not a fixed placeholder.** Using a hard-coded placeholder ("botxxxxxxxxxxxxxxxxxxxxx") would let any client knowing it claim the bot's color via `hello`. Random tokens (same shape as human tokens) close that hole. Caught in code review of Task 7.
- 2026-04-28: **`endGame` and `finalizeIfEnded` extracted from `ws.ts` to `packages/server/src/game-end.ts`.** Both `ws.ts` and `bot/driver.ts` need to set the game-finished state — duplication risk. Hoist resolves it.
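
The rule that `BrainInput.fen` is set only in vanilla mode, from the list above, can be illustrated as follows. Only the names `BrainInput`, `BoardView`, and `fen` appear in the log; the field shapes, the `Mode` type, and `buildBrainInput` are assumptions made for the sketch.

```typescript
type Mode = "vanilla" | "blind";

// Hypothetical shape: square name -> piece code, already filtered for the bot.
interface BoardView {
  squares: Record<string, string>;
}

interface BrainInput {
  view: BoardView;
  fen?: string; // present only when the view is a full reveal
}

/**
 * Build the bot's input. In blind mode the FEN is omitted entirely,
 * so an engine cannot see opponent pieces the view filter has hidden.
 */
function buildBrainInput(mode: Mode, view: BoardView, fullFen: string): BrainInput {
  if (mode === "vanilla") {
    return { view, fen: fullFen };
  }
  return { view };
}
```

Omitting the key, rather than setting it to an empty string, lets a test assert the invariant with a plain `"fen" in input` check.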

## Table-fidelity features (2026-05-18)

Spec: `docs/superpowers/specs/2026-05-18-table-fidelity-features-design.md`. Plan: `docs/superpowers/plans/2026-05-18-table-fidelity-features.md`. Three features requested by Andrew Freiberg (a physical-game player); shipped to `main` 2026-05-18, 12 tasks via subagent-driven development. 87 tests passing (25 shared + 62 server).

- 2026-05-18: **All moderator announcements are `audience: 'both'`** — every move event and every attempted-move error reaches both players, faithful to the physical game where the moderator speaks aloud. A deliberate, authorised widening of the moderator channel (it makes blind mode slightly less blind — e.g. you hear "won't help you" on the opponent's turn). The `audience` field is retained (now uniformly `'both'`) as the egress-control hook in `ws.ts` / `ModeratorPanel`.
- 2026-05-18: **Bot intermediate retry-rejection announcements are popped in `BotDriver.dispatch`** — the blind Casual bot's retry search would otherwise broadcast up to 25 churn announcements per turn. Only the bot's final committed move is announced. Human probes (1–3 pieces, human-paced) still broadcast — that is the feature.
- 2026-05-18: **Capture tally is a server-derived per-viewer `captures` field on `joined`/`update`**, not a `ModeratorText` enum entry — the announcement vocabulary stays a pure event enum; the tally is structured data (`CaptureTally = { byYou, byOpponent }`). Must be server-side: in blind mode the capturing client can't see what it took.
- 2026-05-18: **Phantom opponent-piece layer is 100% client-local** — never sent to the server, persisted only to `localStorage` (`bc:phantoms:<gameId>`), in its own store (`phantoms.svelte.ts`) separate from the protocol store so the zero-leak property is auditable. Blind mode only. `buildView` / `geometric.ts` untouched.
- 2026-05-18: **Manual phantom model** — seeded once with the opponent's standard starting army, then fully manual: drag anywhere, drag off-board to remove, re-add from an unlimited palette, no automation. Rejected: a "smart tracker" that auto-removes on capture and tracks promotions (Seth chose the manual model).
- 2026-05-18: **Phantom manipulation is pointer-event drag-and-drop** with a tap-vs-drag threshold so a tap still makes a real move. Real chess moves stay click-to-move — the deferred drag-and-drop decision for *real* moves still stands; F3's drag is phantom-only.
- 2026-05-18: **Client has no unit-test harness** (deliberate) — Feature 3's testable pure logic (`opponentStartPosition`, `deserializePhantoms`) lives in `packages/shared` and is unit-tested there; Svelte components/stores are covered by `svelte-check` typechecking plus manual verification.
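
The per-viewer capture tally from the list above could be derived on the server along these lines. The `CaptureTally = { byYou, byOpponent }` shape is from the log; the element type (a list of captured piece types), the `CaptureEvent` record, and `tallyFor` are assumptions for illustration.

```typescript
type Color = "w" | "b";
type PieceType = "p" | "n" | "b" | "r" | "q"; // kings are never captured

// Hypothetical server-side record of one capture.
interface CaptureEvent {
  by: Color; // side that made the capture
  piece: PieceType; // type of the piece that was taken
}

interface CaptureTally {
  byYou: PieceType[];
  byOpponent: PieceType[];
}

/** Split the game's capture log into the viewer's side and the other side. */
function tallyFor(viewer: Color, events: readonly CaptureEvent[]): CaptureTally {
  const tally: CaptureTally = { byYou: [], byOpponent: [] };
  for (const e of events) {
    (e.by === viewer ? tally.byYou : tally.byOpponent).push(e.piece);
  }
  return tally;
}
```

Both players receive the same underlying events, only relabelled from their own point of view.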

## Deferred / Rejected

<!-- Decisions NOT to do something are just as valuable -- prevents re-proposing rejected ideas -->

- 2026-04-28: **Tactical-advice interpretation of "won't help you"** — rejected. The phrase is a check-resolution announcement, not engine evaluation. Subjective "this move is bad" is anti-fun and out of scope.
- 2026-04-28: **Spectator mode** — deferred. Single-use links and no spectators in MVP. Revisit if there's demand.
- 2026-04-28: **Time controls (clocks)** — deferred. Untimed correspondence-style for MVP. Optional 5+0 / 10+0 / 15+10 in a follow-up if Seth wants.
- 2026-04-28: **SQLite persistence** — deferred. In-memory only for MVP. Add when crash recovery becomes painful (1-day implementation: serialize Map on `ExecStop`, deserialize on `ExecStart`).
- 2026-04-28: **End-to-end browser tests (Playwright)** — out of scope for MVP. Protocol-level integration tests cover the same drift surface for ~10× less maintenance. Manual phone+desktop testing suffices.
- 2026-04-28: **Vanilla-only or blind-only MVP** — rejected in favor of both-from-day-one. The shared engine + view-filter architecture means vanilla is essentially free.
- 2026-04-28: **Authentik gate on `chess.sethpc.xyz`** — rejected. The hashed link IS the auth; an additional gate would be friction with no security benefit (link guessing is already infeasible).
- 2026-04-28: **CI/CD automation** — deferred. Manual `pnpm -r build` + `rsync` + `systemctl restart` is fine for a hobby project. Add Gitea Actions later if deploy friction grows.
- 2026-04-28: **Move log / PGN export, post-game replay** — deferred. Announcements are persisted in-game (so the moderator-panel scrollback works); export and replay are post-MVP.
- 2026-04-28: **Public lobby / matchmaking / ratings** — out of scope. This is a private-link game, not a chess site.
- 2026-04-28: **Pre-deploy "server restarting" warning to active players** — stretch goal, not MVP. Mitigation for now: deploy during low-usage windows.
- 2026-04-28: ~~**Client-side AI / hint generation** — explicitly out of scope. Human vs. human only.~~ **Partially superseded 2026-04-28** by AI-player spec. Reversal applies *only* to the human-vs-AI path; client-side AI / hint generation in human-vs-human games remains rejected.
- 2026-04-28: **Difficulty slider for AI** — rejected. Two named buttons (Casual, Recon) only. No continuum; the two bots are architecturally different, not tuneable strengths of the same engine.
- 2026-04-28: ~~**Stockfish for vanilla-mode AI strength** — deferred. Vanilla is a side-effect, not a feature target. Revisit if users explicitly ask for strong vanilla AI.~~ **Partially superseded 2026-04-28** during Phase 1 implementation — using `js-chess-engine` (smaller, MIT, no GPL concerns) at level 2 for Casual vanilla, capped at ~30ms/move. The original rejection was about not making Casual *strong*; the engine at level 2 is genuinely casual-strength while still beating random comfortably. Stockfish itself remains rejected (GPL, 7MB+ wasm, overkill for the strength target).
- 2026-04-28: **Live token streaming during Gemma's thinking** — rejected for MVP. Static "AI is thinking..." indicator only. Streaming would leak strategic intent and adds protocol complexity.
- 2026-04-28: **Mid-game GPU flap-back** — rejected. Once failed over to V100, stays there for the rest of the game even if steel141 recovers. Simpler, more predictable, and chat-history is mid-flight.
- 2026-04-28: **AI vs AI public spectate-able games** — rejected for MVP. Self-play harness is CLI-only (`scripts/selfplay.ts`).
- 2026-04-28: **Per-turn context compaction** — deferred. Spec uses `num_ctx: 32768` which covers ~128 turns; longer games would overflow but are rare in casual play. Add running-summary compaction if seen in practice.
- 2026-04-28: **Bot rating / Elo / personalities** — out of scope. Two named buttons, no scoreboard.
- 2026-04-28: **In-game chat (player ↔ player and human ↔ Gemma)** — deferred indefinitely. Two failure modes drove the deferral: (1) blind-mode chat is a side channel that bypasses the moderator-vocabulary security boundary ("knight on c3, take it" defeats the entire view-filter architecture); (2) chatting with Gemma during play leaks the bot's belief state and undermines the post-game reasoning reveal. Resolvable but expensive (two-history split for Gemma, blind-mode mute or social-variant warnings, mobile UI real estate). Revisit only if users explicitly ask. The post-game reasoning reveal already covers most of the "see what Gemma was thinking" appeal without the leak surface.
- 2026-05-18: **Smart-tracker phantom model** (auto-remove a phantom on capture, track promotions, constrain the phantom set to the opponent's surviving army) — rejected in favour of the fully-manual model. More code and more edge cases; Seth wanted the manual ritual.
- 2026-05-18: **Highlighting interacting with phantoms** (bishop/rook rays stopping at phantom pieces) — deferred. Safe to do (phantoms carry zero real opponent info) but out of scope for v1; phantoms are a pure annotation layer that highlighting ignores.
- 2026-05-18: **Phantom-layer `localStorage` cleanup for abandoned games** — deferred. `clearForGame` only fires when the game reaches `finished` while `<Game>` is mounted; a tab closed mid-game leaves a stale `bc:phantoms:<id>` key. Each entry is a tiny JSON object; add a stale-key sweep on app start only if it ever matters.