# Gemma 4 Benchmarks

> Source: Google DeepMind model card, HuggingFace blog, LMArena
>
> Released: April 2, 2026

## Gemma 4 vs Gemma 3 (biggest single-version jump in Gemma family)

| Benchmark | Gemma 3 27B | Gemma 4 31B | Gemma 4 26B A4B | Delta (31B vs G3) |
|-----------|------------|------------|----------------|-------------------|
| MMLU Pro | 67.6% | 85.2% | 82.6% | +17.6 |
| AIME 2026 (no tools) | 20.8% | 89.2% | 88.3% | +68.4 |
| GPQA Diamond | 42.4% | 84.3% | 82.3% | +41.9 |
| BigBench Extra Hard | 19.3% | 74.4% | 64.8% | +55.1 |
| LiveCodeBench v6 | 29.1% | 80.0% | 77.1% | +50.9 |
| Codeforces ELO | 110 | 2150 | 1718 | +2040 |
| MMMU Pro (vision) | 49.7% | 76.9% | 73.8% | +27.2 |
| MATH-Vision | 46.0% | 85.6% | 82.4% | +39.6 |
| OmniDocBench (lower=better) | 0.365 | 0.131 | 0.149 | -0.234 |
| MRCR v2 128K | 13.5% | 66.4% | 44.1% | +52.9 |
| MMMLU (multilingual) | 70.7% | 88.4% | 86.3% | +17.7 |

## Arena Scores

| Model | LMArena Score | Rank |
|-------|--------------|------|
| Gemma 4 31B | 1452 | #3 |
| Gemma 4 26B A4B | 1441 | #6 |

## Agentic Benchmark (tau2-bench)

| Model | Score |
|-------|-------|
| 31B | 86.4% |
| 26B A4B | 85.5% |
| E4B | 57.5% |
| E2B | 29.4% |

## Takeaway

The jump from Gemma 3 to 4 is enormous: AIME went from 20.8% to 89.2%, Codeforces from 110 to 2150 ELO. This is not an incremental update. The 26B MoE nearly matches 31B Dense on most benchmarks while using ~4B active params.
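
The Delta column and the "nearly matches" claim are simple subtractions, so they can be rechecked directly from the table. The sketch below is one way to do that, not part of the original sources: the scores are copied by hand from the Gemma 4 vs Gemma 3 table, and the names `SCORES` and `deltas` are made up for this illustration.

```python
# Scores copied by hand from the comparison table above, in the order
# (Gemma 3 27B, Gemma 4 31B, Gemma 4 26B A4B). Percentages are stored as
# plain numbers, so differences are in percentage points.
SCORES = {
    "MMLU Pro": (67.6, 85.2, 82.6),
    "AIME 2026 (no tools)": (20.8, 89.2, 88.3),
    "GPQA Diamond": (42.4, 84.3, 82.3),
    "BigBench Extra Hard": (19.3, 74.4, 64.8),
    "LiveCodeBench v6": (29.1, 80.0, 77.1),
    "Codeforces ELO": (110, 2150, 1718),
    "MMMU Pro (vision)": (49.7, 76.9, 73.8),
    "MATH-Vision": (46.0, 85.6, 82.4),
    "OmniDocBench (lower=better)": (0.365, 0.131, 0.149),
    "MRCR v2 128K": (13.5, 66.4, 44.1),
    "MMMLU (multilingual)": (70.7, 88.4, 86.3),
}


def deltas(scores):
    """Return {benchmark: (31B minus Gemma 3, 31B minus 26B A4B)}.

    Values are rounded to 3 decimals to hide floating-point noise.
    """
    return {
        name: (round(dense - g3, 3), round(dense - moe, 3))
        for name, (g3, dense, moe) in scores.items()
    }


if __name__ == "__main__":
    for name, (vs_g3, vs_moe) in deltas(SCORES).items():
        print(f"{name:30s} 31B vs G3: {vs_g3:+g}   31B vs 26B A4B: {vs_moe:+g}")
```

The first value in each pair reproduces the Delta column. The second shows where the 26B A4B model trails the 31B model most: Codeforces (432 rating points), MRCR v2 128K (22.3 points), and BigBench Extra Hard (9.6 points). On the remaining benchmarks in the table the gap is about 3 points or less.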