seth_semantic_game/docs/reference/gemma-generation-bakeoff-2026-04-27-221751-raw.json
Mortdecai 5a2a02e483 docs: bootstrap repo with bakeoff results and game-mechanics idea bank
This repo opens with the design-discovery work completed before any product
code is written. Two model bakeoffs against gemma4:8b/26b/31b on a local
Ollama instance established that:

- Whole-puzzle generation in the Connections shape is unreliable on Gemma 4:
  the structural-pass rate is ~50% for gemma4:31b and ~20-30% for gemma4:26b.
  31b is intentionally out of project scope, so the generation route is
  harder still.
- Atomic semantic-judging skills are reliable: 87.5%/93.75%/100% (8b/26b/31b)
  on JUDGE, and *all three models* scored 10/10 on CREATIVE_ACCEPT, i.e. fair
  judging of player-INVENTED categories. That is the structural unlock versus
  static hand-curated word games.
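
The "structural-pass" figure above suggests a shape check run on each generated puzzle. The following is a minimal sketch of what such a check could look like, not the bench's actual implementation. The field names (`groups`, `difficulty`, `words`) are taken from the raw JSON in this file; the rules themselves (four groups, one per colour, four words each, no word reused) and the function name `structural_problems` are assumptions made for illustration.

```python
from collections import Counter

# Assumed rule: one group per Connections colour.
EXPECTED_DIFFICULTIES = {"yellow", "green", "blue", "purple"}


def structural_problems(puzzle: dict) -> list[str]:
    """Return human-readable structural problems; an empty list means a pass."""
    problems = []
    groups = puzzle.get("groups", [])

    if len(groups) != 4:
        problems.append(f"expected 4 groups, found {len(groups)}")

    difficulties = {g.get("difficulty") for g in groups}
    if difficulties != EXPECTED_DIFFICULTIES:
        problems.append(f"unexpected difficulty set: {sorted(map(str, difficulties))}")

    for g in groups:
        words = g.get("words", [])
        if len(words) != 4:
            problems.append(
                f"group {g.get('difficulty')!r} has {len(words)} words, expected 4"
            )

    # Compare case-insensitively, since the raw data mixes "FRUIT" and "Single" styles.
    counts = Counter(w.upper() for g in groups for w in g.get("words", []))
    repeated = sorted(w for w, n in counts.items() if n > 1)
    if repeated:
        problems.append(f"words used more than once: {repeated}")

    return problems
```

Under these assumed rules, the first gemma4:26b record in this file would be flagged, because FRUIT appears in both its yellow and purple groups even though the record carries `"ok": true`. That suggests the bench's own check is looser than this sketch, or checks something different.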

The README contains the full writeup, the test bench, and a brainstormed
bank of 10 distinct game-mechanics ideas across the fast/medium/slow tempo
range, plus a primitives table for recombination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 23:09:46 -04:00


[
{
"model": "gemma4:26b",
"i": 1,
"dt": 5.670714855194092,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "An exploration of common words that can be paired with 'BIRD' or 'DRAGON' or relate to types of 'FLY'.",
"groups": [
{
"difficulty": "yellow",
"category": "Types of Flies",
"words": [
"FRUIT",
"HOUSE",
"FIRE",
"BLOW"
]
},
{
"difficulty": "green",
"category": "Words before 'FLY'",
"words": [
"DRAGON",
"MAY",
"BUTTER",
"BOTTLE"
]
},
{
"difficulty": "blue",
"category": "Types of Birds",
"words": [
"LARK",
"SWALLOW",
"CRANE",
"SWIFT"
]
},
{
"difficulty": "purple",
"category": "Words that follow 'DRAGON'",
"words": [
"FRUIT",
"FLY",
"SLAYER",
"FLAME"
]
}
],
"intended_traps": [
{
"word": "FRUIT",
"actual_group": "purple",
"trap_group": "yellow",
"reason": "Fruit fly is a very common type of fly, making it a primary red herring for the yellow group."
},
{
"word": "FIRE",
"actual_group": "yellow",
"trap_group": "purple",
"reason": "Fire dragon or dragon flame connection could lead someone to look for a dragon-themed group."
},
{
"word": "DRAGON",
"actual_group": "green",
"trap_group": "blue",
"reason": "Dragon is a mythical creature, and users might look for a 'mythical creatures' group including Crane or Swallow (which are real birds)."
},
{
"word": "FLY",
"actual_group": "purple",
"trap_group": "green",
"reason": "Dragonfly is a common word, tempting the player to put 'FLY' in the 'Words before FLY' group."
}
]
},
"eval_count": 500,
"prompt_eval_count": 440
},
{
"model": "gemma4:26b",
"i": 2,
"dt": 5.300434589385986,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "Homophones and linguistic overlaps between nature and everyday objects.",
"groups": [
{
"difficulty": "yellow",
"category": "Types of Trees",
"words": [
"BIRCH",
"CEDAR",
"MAPLE",
"PINE"
]
},
{
"difficulty": "green",
"category": "Things that can be 'Dropped'",
"words": [
"BEAT",
"HINT",
"LINE",
"SUBJECT"
]
},
{
"difficulty": "blue",
"category": "Units of Measurement",
"words": [
"FOOT",
"GRAM",
"SECOND",
"STONE"
]
},
{
"difficulty": "purple",
"category": "Homophones of Body Parts",
"words": [
"EYE",
"HARE",
"MUSSEL",
"QUEUE"
]
}
],
"intended_traps": [
{
"word": "PINE",
"actual_group": "yellow",
"trap_group": "green",
"reason": "To 'pine' for someone is a feeling, but 'pine' could be mistaken for something you can 'drop' (like a pine needle)."
},
{
"word": "FOOT",
"actual_group": "blue",
"trap_group": "yellow",
"reason": "A 'foot' is a unit of measurement, but it is also a body part, potentially leading to the 'Body Part Homophones' group."
},
{
"word": "BEAT",
"actual_group": "green",
"trap_group": "blue",
"reason": "A 'beat' is a rhythm, but it could be mistaken for a unit of time/measurement in a musical context."
},
{
"word": "EYE",
"actual_group": "purple",
"trap_group": "blue",
"reason": "The word 'eye' is a body part, which might tempt players looking for a body part category, but it is actually a homophone (I) for the purple group."
}
]
},
"eval_count": 510,
"prompt_eval_count": 440
},
{
"model": "gemma4:26b",
"i": 3,
"dt": 5.103548049926758,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "The concept of 'Pressure' and things that can be measured or applied.",
"groups": [
{
"difficulty": "yellow",
"category": "Units of Pressure",
"words": [
"BAR",
"PASCAL",
"PSI",
"TORR"
]
},
{
"difficulty": "green",
"category": "Words after 'BLOOD'",
"words": [
"PRESSURE",
"TYPE",
"ORANGE",
"DONOR"
]
},
{
"difficulty": "blue",
"category": "Things that can be 'Broken'",
"words": [
"HEART",
"RECORD",
"LAW",
"PROMISE"
]
},
{
"difficulty": "purple",
"category": "Words that are also Verbs for 'Applying Force'",
"words": [
"PRESS",
"STAMP",
"PUNCH",
"DRIVE"
]
}
],
"intended_traps": [
{
"word": "PRESSURE",
"actual_group": "green",
"trap_group": "yellow",
"reason": "Pressure is the concept being measured by the units in the yellow group."
},
{
"word": "BAR",
"actual_group": "yellow",
"trap_group": "blue",
"reason": "A 'bar' is something that can be 'broken' (like a chocolate bar) or refers to a legal setting."
},
{
"word": "PRESS",
"actual_group": "purple",
"trap_group": "green",
"reason": "The 'Press' is a common term in news/media, and 'Press' can be related to 'Blood' in a medical context (blood pressure)."
},
{
"word": "RECORD",
"actual_group": "blue",
"trap_group": "purple",
"reason": "A 'record' is something you can 'stamp' or 'drive' (as in a driving record)."
}
]
},
"eval_count": 492,
"prompt_eval_count": 440
},
{
"model": "gemma4:26b",
"i": 4,
"dt": 5.257405757904053,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "A collection of words related to things that can be 'broken' or 'cracked', mixed with other categories like cooking and music.",
"groups": [
{
"difficulty": "yellow",
"category": "Kinds of Eggs",
"words": [
"CHICKEN",
"QUAIL",
"GOOSE",
"DUCK"
]
},
{
"difficulty": "green",
"category": "Things You Can Crack",
"words": [
"NUT",
"CODE",
"WHIP",
"SMILE"
]
},
{
"difficulty": "blue",
"category": "Musical Terms",
"words": [
"NOTE",
"CHORD",
"CLEF",
"BEAT"
]
},
{
"difficulty": "purple",
"category": "Words Before 'Egg'",
"words": [
"NEST",
"EASTER",
"BAD",
"NESTING"
]
}
],
"intended_traps": [
{
"word": "DUCK",
"actual_group": "yellow",
"trap_group": "green",
"reason": "DUCK is an action related to avoiding a whip or a blow, and can be paired with 'crack' in some linguistic contexts."
},
{
"word": "CHICKEN",
"actual_group": "yellow",
"trap_group": "purple",
"reason": "CHICKEN is an egg-related word that could tempt someone looking for '___ Egg' phrases (like Chicken Egg)."
},
{
"word": "BEAT",
"actual_group": "blue",
"trap_group": "green",
"reason": "BEAT is something you can do to a whip or a person, fitting the 'Things you can crack/strike' theme."
},
{
"word": "NEST",
"actual_group": "purple",
"trap_group": "yellow",
"reason": "NEST is intrinsically linked to the birds in the yellow group (CHICKEN, GOOSE, etc.)."
}
]
},
"eval_count": 506,
"prompt_eval_count": 440
},
{
"model": "gemma4:26b",
"i": 5,
"dt": 4.578032493591309,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "A collection of words centered around various types of 'records' and 'tracks'.",
"groups": [
{
"difficulty": "yellow",
"category": "Types of Records",
"words": [
"LP",
"EP",
"Single",
"Album"
]
},
{
"difficulty": "green",
"category": "Things that have Tracks",
"words": [
"Train",
"Athlete",
"Song",
"Mud"
]
},
{
"difficulty": "blue",
"category": "Words for 'Evidence'",
"words": [
"Clue",
"Trace",
"Mark",
"Proof"
]
},
{
"difficulty": "purple",
"category": "Words that follow 'Side'",
"words": [
"Effect",
"Kick",
"Split",
"Walk"
]
}
],
"intended_traps": [
{
"word": "Single",
"actual_group": "yellow",
"trap_group": "blue",
"reason": "A 'single' can be a piece of evidence or a solitary item."
},
{
"word": "Song",
"actual_group": "green",
"trap_group": "yellow",
"reason": "A song is often released as an LP, EP, or Single."
},
{
"word": "Trace",
"actual_group": "blue",
"trap_group": "green",
"reason": "A trace can be a physical track left behind in mud."
},
{
"word": "Split",
"actual_group": "purple",
"trap_group": "green",
"reason": "A 'split' can refer to a track or division in a race."
}
]
},
"eval_count": 436,
"prompt_eval_count": 440
},
{
"model": "gemma4:31b-it-q4_K_M",
"i": 1,
"dt": 18.89196801185608,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "Words related to circles, timing, and subtle wordplay with 'Hands'",
"groups": [
{
"difficulty": "yellow",
"category": "Circular Objects",
"words": [
"RING",
"COIN",
"WHEEL",
"DISC"
]
},
{
"difficulty": "green",
"category": "Units of Time",
"words": [
"SECOND",
"MINUTE",
"HOUR",
"DAY"
]
},
{
"difficulty": "blue",
"category": "Ways to Assist",
"words": [
"HELP",
"AID",
"BACK",
"SUPPORT"
]
},
{
"difficulty": "purple",
"category": "Things with Hands",
"words": [
"CLOCK",
"WATCH",
"DECK",
"PLAYER"
]
}
],
"intended_traps": [
{
"word": "SECOND",
"actual_group": "green",
"trap_group": "blue",
"reason": "To 'second' a motion is to support it."
},
{
"word": "WATCH",
"actual_group": "purple",
"trap_group": "green",
"reason": "A watch is a device used to measure the units of time (second, minute, hour)."
},
{
"word": "CLOCK",
"actual_group": "purple",
"trap_group": "green",
"reason": "Similar to watch, it is intrinsically linked to the time units group."
},
{
"word": "BACK",
"actual_group": "blue",
"trap_group": "yellow",
"reason": "Could be mistaken as a part of a circular object or a physical position."
}
]
},
"eval_count": 439,
"prompt_eval_count": 440
},
{
"model": "gemma4:31b-it-q4_K_M",
"i": 2,
"dt": 16.603100776672363,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "A mix of architectural elements, types of scales, and linguistic wordplay",
"groups": [
{
"difficulty": "yellow",
"category": "Parts of a Building",
"words": [
"BEAM",
"COLUMN",
"EAVE",
"FOUNDATION"
]
},
{
"difficulty": "green",
"category": "Types of Scales",
"words": [
"LIBRA",
"MUSICAL",
"WEIGHING",
"TIMELINE"
]
},
{
"difficulty": "blue",
"category": "Synonyms for Balance",
"words": [
"EQUILIBRIUM",
"POISE",
"STEADY",
"STABILITY"
]
},
{
"difficulty": "purple",
"category": "Words that start with a Greek Letter",
"words": [
"ALPHABET",
"BETRAYAL",
"GAMUT",
"DELUXE"
]
}
],
"intended_traps": [
{
"word": "LIBRA",
"actual_group": "green",
"trap_group": "blue",
"reason": "Libra is the sign of the scales, which represents balance/equilibrium."
},
{
"word": "COLUMN",
"actual_group": "yellow",
"trap_group": "green",
"reason": "A column can be a way to organize data on a scale or chart."
},
{
"word": "GAMUT",
"actual_group": "purple",
"trap_group": "green",
"reason": "The 'gamut' refers to the full range or scale of something."
}
]
},
"eval_count": 410,
"prompt_eval_count": 440
},
{
"model": "gemma4:31b-it-q4_K_M",
"i": 3,
"dt": 17.279732942581177,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "Words related to vision, optics, and ocular puns",
"groups": [
{
"difficulty": "yellow",
"category": "Parts of the Eye",
"words": [
"CORNEA",
"PUPIL",
"IRIS",
"RETINA"
]
},
{
"difficulty": "green",
"category": "Types of Lenses",
"words": [
"CONCAVE",
"CONVEX",
"CONTACT",
"BIFOCAL"
]
},
{
"difficulty": "blue",
"category": "Words meaning 'To Witness'",
"words": [
"OBSERVE",
"SPOT",
"NOTE",
"VIEW"
]
},
{
"difficulty": "purple",
"category": "___ EYE",
"words": [
"BULLS",
"BIRD",
"FISH",
"PRIVATE"
]
}
],
"intended_traps": [
{
"word": "PUPIL",
"actual_group": "yellow",
"trap_group": "blue",
"reason": "A pupil is a student, and one might 'note' or 'observe' a pupil in a classroom setting."
},
{
"word": "PRIVATE",
"actual_group": "purple",
"trap_group": "blue",
"reason": "Private can be a synonym for 'secret' or 'hidden', contrasting with the act of observing or spotting."
},
{
"word": "CONTACT",
"actual_group": "green",
"trap_group": "blue",
"reason": "To 'contact' someone can be confused with 'spotting' or 'viewing' them in a social context."
}
]
},
"eval_count": 422,
"prompt_eval_count": 440
},
{
"model": "gemma4:31b-it-q4_K_M",
"i": 4,
"dt": 19.867947340011597,
"ok": true,
"attempts": 1,
"puzzle": {
"theme_seed": "Common words that can also be types of birds or musical terms",
"groups": [
{
"difficulty": "yellow",
"category": "Things that can be 'Cast'",
"words": [
"SPELL",
"VOTE",
"SHADOW",
"NET"
]
},
{
"difficulty": "green",
"category": "Musical Dynamics",
"words": [
"PIANO",
"FORTE",
"SHARP",
"FLAT"
]
},
{
"difficulty": "blue",
"category": "Types of Birds",
"words": [
"CRANE",
"SWALLOW",
"SWIFT",
"LOON"
]
},
{
"difficulty": "purple",
"category": "Words that mean 'to move quickly'",
"words": [
"DART",
"BOLT",
"RUSH",
"FLY"
]
}
],
"intended_traps": [
{
"word": "SWIFT",
"actual_group": "blue",
"trap_group": "purple",
"reason": "Swift is a synonym for moving quickly, fitting the 'move quickly' group."
},
{
"word": "FLY",
"actual_group": "purple",
"trap_group": "blue",
"reason": "Fly is a common bird-related term (or a type of insect), making it look like it belongs with Crane and Swallow."
},
{
"word": "SWALLOW",
"actual_group": "blue",
"trap_group": "purple",
"reason": "While a bird, if the user interprets 'move quickly' as 'gulp down', it might confuse them, though primarily it serves as a bird lure."
},
{
"word": "SHARP",
"actual_group": "green",
"trap_group": "purple",
"reason": "A 'sharp' turn or movement can be associated with quickness/darting."
}
]
},
"eval_count": 487,
"prompt_eval_count": 440
},
{
"model": "gemma4:31b-it-q4_K_M",
"i": 5,
"dt": 18.18777823448181,
"ok": true,
"attempts": 2,
"puzzle": {
"theme_seed": "Words related to types of 'scales' and musical terminology",
"groups": [
{
"difficulty": "yellow",
"category": "Musical Scales",
"words": [
"MAJOR",
"MINOR",
"CHROMATIC",
"PENTATONIC"
]
},
{
"difficulty": "green",
"category": "Things that Weigh",
"words": [
"BALANCE",
"SPRING",
"DIGITAL",
"TRIPLE-BEAM"
]
},
{
"difficulty": "blue",
"category": "Ways to Climb",
"words": [
"ASCEND",
"MOUNT",
"SCALE",
"SURMOUNT"
]
},
{
"difficulty": "purple",
"category": "___ NOTE",
"words": [
"FOOT",
"BANK",
"KEY",
"HIGH"
]
}
],
"intended_traps": [
{
"word": "SCALE",
"actual_group": "blue",
"trap_group": "yellow",
"reason": "It is the root word for musical scales, leading the player to look for other scale-related terms."
},
{
"word": "KEY",
"actual_group": "purple",
"trap_group": "yellow",
"reason": "A 'key' is fundamentally linked to musical scales (e.g., the Key of C Major)."
},
{
"word": "HIGH",
"actual_group": "purple",
"trap_group": "blue",
"reason": "High is an adjective often associated with climbing or ascending."
},
{
"word": "BALANCE",
"actual_group": "green",
"trap_group": "blue",
"reason": "Balance can be seen as a state of being when climbing or mountaineering."
}
]
},
"eval_count": 453,
"prompt_eval_count": 440
}
]
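
The records above can be rolled up per model. This is a minimal sketch: the keys `model`, `ok`, `dt`, `attempts`, and `eval_count` are taken from the records themselves. It assumes that `dt` is wall-clock seconds per run and that `eval_count` is the generated-token count as reported by Ollama. The helper name `summarize` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(records: list[dict]) -> dict[str, dict]:
    """Group bakeoff records by model and compute simple per-model aggregates."""
    by_model = defaultdict(list)
    for record in records:
        by_model[record["model"]].append(record)

    summary = {}
    for model, runs in by_model.items():
        summary[model] = {
            "runs": len(runs),
            "ok": sum(1 for r in runs if r["ok"]),
            "mean_dt": mean(r["dt"] for r in runs),  # assumed to be seconds
            "mean_attempts": mean(r["attempts"] for r in runs),
            "mean_eval_count": mean(r["eval_count"] for r in runs),
        }
    return summary


# Example usage against this file (path as given in the file header):
#   import json
#   with open("docs/reference/gemma-generation-bakeoff-2026-04-27-221751-raw.json") as f:
#       print(summarize(json.load(f)))
```

Keeping the aggregation separate from file loading makes it straightforward to reuse on other raw bakeoff dumps that share this record shape.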