This repo opens with the design-discovery work completed before any product
code is written. Two model bakeoffs against gemma4:8b/26b/31b on a local
Ollama instance established that:
- Whole-puzzle generation in the Connections shape is unreliable on Gemma 4
  (structural-pass rate: gemma4:31b ~50%, gemma4:26b ~20-30%). 31b is
  intentionally out of project scope, so the in-scope models make the
  generation route harder still.
- Atomic semantic-judging skills are reliable: 87.5%/93.75%/100% (8b/26b/31b)
  on JUDGE, and *all three models* scored 10/10 on CREATIVE_ACCEPT (fair
  judging of player-INVENTED categories). That is the structural unlock
  versus static, hand-curated word games.
The README contains the full writeup, the test bench, and a brainstormed
bank of 10 distinct game-mechanics ideas across the fast/medium/slow tempo
range, plus a primitives table for recombination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>