Files
blind_chess/.claude/handoffs/2026-04-29-060121-blind-casual-check-fix.md
claude (blind_chess) 04494fcdee docs: handoff for blind Casual check-resolution fix
Captures session state: root cause, fix, verification numbers (blind 100%
-> 17% resignation, avg ply 26 -> 90), preserved view-filter invariant,
deferred Phase 2 work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:05:21 -04:00

148 lines
13 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Handoff: Blind Casual check-resolution fix shipped
## Session Metadata
- Created: 2026-04-29 06:01:21 UTC
- Project: /home/claude/bin/blind_chess
- Branch: main (commits `dc7f8ad`, `f00164e` pushed)
- Session duration: ~1 hour
- Live URL: https://chess.sethpc.xyz (deployed and verified)
### Recent Commits (for context)
- `f00164e` chore: gitignore tmp/ for self-play transcripts
- `dc7f8ad` fix(bot): blind Casual no longer resigns prematurely under check
- `1213ec8` docs: handoff reflects final merged state
- `1674695` docs: AI Phase 1 shipped — context, decisions, handoff
- `7c18725` feat(bot): vanilla CasualBrain delegates to js-chess-engine
## Handoff Chain
- **Continues from**: [2026-04-28-191500-ai-phase-1-shipped.md](./2026-04-28-191500-ai-phase-1-shipped.md) — Phase 1 (Casual bot) deployed; the prior handoff predicted this exact bug as a deferred risk: *"the heuristic exhausts its retry cap (5) when the bot picks a move that can't legally proceed in blind mode... Consider raising retry cap or improving heuristic if blind Casual feels broken in real play."*
- **Supersedes**: None.
## Current State Summary
User reported: *"casual bot is resigning prematurely."* Investigation confirmed the prior handoff's prediction. Vanilla mode is rock-solid (0 resigns across 80 stress games); blind mode was 100% resign at avg ply 26 in self-play. Root cause: `CasualBrain.heuristicPick` ignored the `<own>_in_check` moderator announcement and scored moves on capture/advance signals uncorrelated with check resolution. chess.js rejected every non-resolving attempt, `BotDriver.RETRY_CAP=5` fired, and the bot resigned. Fix shipped in two commits, deployed to CT 690, smoke-tested. **Blind self-play (100 games): resigns 100% → 17%, avg ply 26 → 90.** Vanilla regression check confirmed unchanged strength.
## Architecture Overview
The fix preserves the spec's view-filter invariant — **the brain still sees only its own pieces + announcements, no oracle access added**. The data needed to detect check was already being delivered to the brain in `newAnnouncements`; the heuristic just wasn't reading it. This is a recurring shape worth recognizing: a bug that looks like "the AI is broken" often turns out to be "the AI ignored a signal the protocol already sends."
The retry-cap raise (5 → 25) is essentially free for vanilla because chess.js verbose moves are guaranteed legal — vanilla never exercises retries. Blind needs the larger budget because pseudo-legal candidates from `geometricMoves` are filtered by chess.js at commit time and many fail (pinned pieces, unresolved check).
The new `[bot resign]` log line in `BotDriver.botResign()` decouples observability from the fix. Phase 1 had silent resignations — operators couldn't grep journald for them, which is why the bug surfaced as a user report rather than an alert. Future regressions are now greppable: `journalctl -u blind-chess | grep "bot resign"`.
## Critical Files
| File | Purpose | Relevance |
|------|---------|-----------|
| `packages/server/src/bot/casual-brain.ts` | Decision logic; vanilla delegates to js-chess-engine, blind uses heuristic | New `detectOwnCheck()` and `findOwnKing()` methods; `heuristicPick` takes `inCheck` parameter and applies +5000 boost to king moves |
| `packages/server/src/bot/driver.ts` | Per-game orchestrator; mutex, retry, dispatch, dispose | `RETRY_CAP` 5 → 25; `botResign()` now takes a `BotResignReason` and logs `[bot resign]` with structured detail |
| `packages/server/test/unit/bot/casual-brain.test.ts` | Unit tests | +2 tests: check-aware king bias (20-seed determinism check), and fall-through to non-king when all king moves are rejected |
| `packages/server/test/unit/bot/driver.test.ts` | Unit tests | Retry-cap test updated for new RETRY_CAP=25 |
| `scripts/selfplay.ts` | Operator CLI for evaluation | Used heavily this session — `pnpm selfplay --white casual --black casual --games 100 --mode blind --seed 100` |
## Verification Results
| Check | Result |
|---|---|
| Blind 100-game self-play (Casual vs Casual, seed=100) | resigns 100% → 17%, avgPly 26 → 90; 42 checkmates, 41 threefolds |
| Blind 20-game self-play (seed=42, same as pre-fix benchmark) | resigns 100% → 35%, avgPly 26 → 82 |
| Vanilla 30-game self-play (Casual vs Casual, seed=42) | 0 resigns; 27 checkmates, 2 threefolds, 1 fifty-move |
| Vanilla 50-game self-play (Casual W vs Random B, seed=7) | 0 resigns; Casual wins 49/50 |
| Vanilla 50-game self-play (Random W vs Casual B, seed=7) | 0 resigns; Casual wins 49/50 |
| Test suite | 78 passing (was 75; +2 new check tests, +1 driver retry-cap test updated) |
| Live `/api/health` | `{"ok":true,"activeGames":0,"uptime":4}` |
| Live POST `/api/games` with `vsAi.brain=casual` blind mode | 200 + `joinUrl:null` |
| Live POST `/api/games` with `vsAi.brain=recon` | 503 + `ai_offline` (Phase 2 unimplemented, expected) |
| journald post-deploy | No errors/warnings |
## Decisions Made
| Decision | Options Considered | Rationale |
|----------|-------------------|-----------|
| Boost king moves in heuristic vs filter candidates by chess.js legality | (a) heuristic boost — preserves view-filter invariant; (b) chess.js pre-filter — would leak attacker info | Chose (a). Preserves "bots play through the same view filter as humans" principle from the AI spec; same information ration as a human player |
| `RETRY_CAP` 5 → 25 (single global cap) vs per-mode caps | Per-mode (5 vanilla, 25 blind) vs global 25 | Chose global. Vanilla never hits the cap, so single cap simplifies code with no regression |
| King-move boost magnitude +5000 | Smaller (e.g., +200) vs larger | +5000 is large enough to deterministically dominate all other heuristic factors plus the 0.01 random tiebreak; unit test asserts 20/20 seeds pick king moves under check |
| Add resign logging now vs defer | (a) bundled with fix; (b) separate later commit | Bundled. The handoff explicitly noted the silent-resign observability gap; fixing that gap was load-bearing for any future regression detection |
| Two commits (fix + .gitignore) vs one | One bundled commit vs split | Split. Per global homelab convention: "no batching unrelated changes" — .gitignore drift was pre-existing and orthogonal |
## Immediate Next Steps
1. **Soak the fix for a few days of real play** before declaring "blind Casual is solid". Watch for:
- `ssh root@192.168.0.245 'journalctl -u blind-chess | grep "bot resign"'` — should be rare; legitimate forced positions only.
- User feedback on whether blind Casual still feels broken (lower bar but still possible).
- Mid-game stuck states (the retry budget is now 25; with degenerate brain output that's 25× more compute per cycle — should still be sub-second).
2. **When ready, write Phase 2 plan**`docs/superpowers/plans/<DATE>-ai-player-phase-2-recon.md`. Phase 2 reuses the `Brain`/`BotDriver` infrastructure unchanged; new pieces are `OllamaClient`, `ollama-endpoints` (preflight + failover), `prompt`, `parse`, `ReconBrain`, plus `aiInfo` protocol field, `'ai_unavailable'` end reason, post-game reasoning reveal UI.
3. **(Cleanup, low priority)** `git rm --cached packages/server/tsconfig.tsbuildinfo` — file is tracked from before the `*.tsbuildinfo` rule was added to `.gitignore`. Persistent `M` noise in `git status` between any rebuilds. Not blocking.
## Blockers / Open Questions
- **Blind Casual is now noticeably stronger but still loses to careful play.** The 17% post-fix resign rate represents legitimately stuck positions (multi-piece checks with no king escape, etc.) more than blunders. A human in those positions would also struggle. If users still feel blind Casual is unbeatable-or-broken, the next lever is making the heuristic *also* prefer captures and adjacent-to-king moves under check (likely block targets).
- **Threefold draws spiked from 0% → 41% in blind self-play.** Two Casual bots with the same seed/heuristic shuffle pieces and repeat positions. This is more a self-play artifact than a real-play concern; humans don't repeat. Worth watching but not actionable yet.
## Deferred Items
All Phase 2 work, untouched:
- `ReconBrain` (gemma4:26b chat agent on steel141 RTX 3090 Ti, pve197 V100 fallback)
- Mid-game GPU failover, preflight, AI-unavailable end state
- Persistent chat history per game; post-game reasoning reveal UI
- `aiInfo` protocol field (model + GPU + host)
- Acceptance bar: Recon wins ≥60% over 50 Recon-vs-Casual self-play games
## Important Context
- **The view-filter invariant is preserved.** No oracle access was added. The brain detects check via `<own_color>_in_check` in `newAnnouncements`, which is a public moderator announcement humans receive too. Phase 2 ReconBrain will read these same announcements — the pattern is now established.
- **`BrainInput.fen` is set ONLY in vanilla mode.** Blind mode omits it so the engine path can't smuggle opponent positions past the view filter. The fix did not touch this; the security boundary holds.
- **Watermark advance only on successful dispatch** is load-bearing for the fix. On retry, the brain still sees the original `<color>_in_check` announcement from the opponent's move (because `lastSeenAnnouncementCount` doesn't advance until success). This is what makes `detectOwnCheck` robust across retries.
- **The bot still uses the heuristic in vanilla as fallback** if the engine returns a move not in the chess.js candidate list. Vanilla never exercised this path in our tests, but the new `inCheck` parameter is wired through it for safety.
- **`scripts/selfplay.ts` is the canonical evaluation tool.** Phase 2 will extend it to support `--white recon --black casual` etc. The harness sets `game.aiOpponent = undefined; game.status = 'active'` after `createGame` returns — that's how it transitions out of "waiting" without a hello.
## Assumptions Made
- The user was playing in **blind mode** when they reported premature resignation. I didn't ask, but vanilla self-play showed 0 resigns across 80 games while blind showed 100%, so blind was overwhelmingly the more likely mode. If they were actually playing vanilla, that's a different bug — though I have no evidence of one.
- The +5000 king-move boost is "large enough." Verified by 20-seed determinism test; if the heuristic ever gains another factor scoring above ~5000, this assumption breaks and the test will catch it.
- `RETRY_CAP=25` is sufficient. 100-game blind self-play showed 17% still hit the cap — those are legitimate stuck positions, not under-budgeted retry. If real-play feedback says otherwise, raise further (each retry is microseconds for the heuristic; the cap could go to 50+ without performance concern).
## Potential Gotchas
- **`packages/server/tsconfig.tsbuildinfo` shows persistent `M`** in `git status` — it was tracked before `*.tsbuildinfo` was gitignored. Don't be alarmed; it's preexisting drift, not your work.
- **The pre-commit hook is `detect-secrets-hook --baseline .secrets.baseline`** at `~/.config/git/hooks/pre-commit`. If you add a new dep and pnpm-lock hashes get flagged, run `detect-secrets scan > .secrets.baseline` to refresh.
- **Server restart drops in-memory games.** Acceptable for MVP per prior decisions, but be aware: any active player-vs-Casual game in flight at deploy time will lose state.
- **`js-chess-engine` declares `engines: { node: '>=24' }`** but works on Node 22.22.2. Engines is advisory by default. If a future Node update breaks it, pin to v1.x of the package.
## Files Modified This Session
| File | Change |
|------|--------|
| `packages/server/src/bot/casual-brain.ts` | +35 LoC: new `detectOwnCheck`, `findOwnKing`; `heuristicPick` takes `inCheck`, boosts king moves +5000 when set |
| `packages/server/src/bot/driver.ts` | `RETRY_CAP` 5 → 25; `botResign(reason, detail?)` with `console.error('[bot resign]', ...)`; `BotResignReason` union; `errString` helper |
| `packages/server/test/unit/bot/casual-brain.test.ts` | +2 tests (check-aware king preference; fall-through to non-king when king moves exhausted) |
| `packages/server/test/unit/bot/driver.test.ts` | Retry-cap test updated 5 → 25, expected calls updated |
| `.gitignore` | +`tmp/` (separate commit `f00164e`) |
## Environment State
- **CT 690 / blind-chess.service:** running. Restarted 09:54 UTC after deploy. `systemctl is-active` returns `active`.
- **Active processes:** none session-relevant. Deploy was a normal restart of the systemd unit.
- **Environment variables:** none added/changed.
- **Backups:**
- Local: `packages/server/src/bot/.backup/{casual-brain,driver}.ts.1777455623`
- CT 690: `/opt/blind-chess/.backup/server-1777456437.tar.gz`
- **Secrets:** none added; pre-commit detect-secrets hook passed both commits clean.
## Related Resources
- Live URL: https://chess.sethpc.xyz
- Repo: https://git.sethpc.xyz/Seth/blind_chess (`main` at `f00164e`)
- AI Phase 1 spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`
- Phase 1 plan: `docs/superpowers/plans/2026-04-28-ai-player-phase-1-casual.md`
- DECISIONS.md "AI / computer player" section
- Project identity: `CLAUDE.md`
- Prior handoffs: `2026-04-28-191500-ai-phase-1-shipped.md`, `2026-04-28-170713-ai-player-spec.md`, `2026-04-28-152000-mvp-deployed.md`, `2026-04-28-104344-spec-approved-ready-for-plan.md`, `2026-04-28-kickoff.md`
---
**Security Reminder**: This handoff describes a behavior fix; no credentials, secrets, or sensitive endpoints are exposed in the handoff or the deployed code.