a945207aab
Ran a minimal agent loop (Ollama /api/chat + read_file/write_file/run_bash) on steel141 (3090 Ti) against 3 models on a broken-median-function task:
- gemma4:31b-it-q4_K_M: PASS (8 iters, 1 write, 44s); textbook trace
- qwen3-coder:30b: PASS (15 iters, 1 write, 22s); correct but chatty
- gemma4:26b: FAIL (6 iters, 0 writes); silently stops with eval=4 after reading the source. Reproduced on a second run.

A one-shot probe confirms 26b CAN produce the correct fix, so the failure is specifically at the write_file tool-call argument boundary.

Updates GOTCHAS with a new HIGH-severity entry, the SYNTHESIS model-selection table, and the CORPUS_cli_coding_agent.md empirical-follow-up pointer, and adds docs/reference/bakeoff-2026-04-18.md with the full writeup.
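The agent loop above pairs a chat call with three tools. What follows is a minimal sketch of the tool-dispatch half, not the harness's actual code. It assumes Ollama's /api/chat returns tool calls as `message["tool_calls"][i]["function"]` with a `name` and an `arguments` dict (a JSON string is also tolerated). The argument names `path`, `content`, and `command`, the output strings, and the `is_silent_stop` heuristic for the gemma4:26b failure mode (a turn with neither a tool call nor any text) are all hypothetical.

```python
import json
import subprocess
from pathlib import Path


# Hypothetical tool implementations; argument names are assumptions.
def read_file(path: str) -> str:
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def run_bash(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=120)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


TOOLS = {"read_file": read_file, "write_file": write_file, "run_bash": run_bash}


def dispatch(message: dict) -> list[dict]:
    """Run every tool call in one assistant message; return tool-role replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        name, args = fn["name"], fn.get("arguments") or {}
        handler = TOOLS.get(name)
        if handler is None:
            output = f"error: unknown tool {name!r}"
        else:
            try:
                if isinstance(args, str):  # tolerate JSON-encoded arguments
                    args = json.loads(args)
                output = handler(**args)
            # Malformed arguments (missing/extra keys, bad JSON) are reported
            # back to the model instead of crashing the loop.
            except (TypeError, ValueError, OSError,
                    subprocess.TimeoutExpired) as exc:
                output = f"error: {exc}"
        replies.append({"role": "tool", "content": output})
    return replies


def is_silent_stop(message: dict) -> bool:
    """Hypothetical heuristic: the model ended its turn with no tool call and no text."""
    return not message.get("tool_calls") and not (message.get("content") or "").strip()
```

Reporting argument errors as tool output, rather than raising, gives the model a chance to retry a malformed write_file call; whether that would help gemma4:26b is untested here.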
31 lines
591 B
Python
from calc.stats import mean, median, mode, variance


def test_mean_basic():
    assert mean([1, 2, 3, 4, 5]) == 3.0


def test_median_odd():
    assert median([1, 2, 3]) == 2


def test_median_even():
    assert median([1, 2, 3, 4]) == 2.5


def test_median_unsorted():
    assert median([3, 1, 4, 1, 5, 9, 2, 6]) == 3.5


def test_median_floats():
    assert median([1.0, 2.0, 3.0, 4.0]) == 2.5


def test_mode_basic():
    assert mode([1, 2, 2, 3]) == 2


def test_variance_basic():
    # sample variance (n-1) of [1, 2, 3, 4, 5] is 10/4 = 2.5
    assert variance([1, 2, 3, 4, 5]) == 2.5
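The `calc.stats` module under test is not shown, so the sketch below is a hypothetical reference implementation, not the repository's code. It encodes what the tests require: `median` must sort its input (test_median_unsorted) and average the two middle values for even-length input, while `variance` uses the n-1 sample denominator. For [1, 2, 3, 4, 5] the mean is 3, the squared deviations sum to 4 + 1 + 0 + 1 + 4 = 10, and 10 / (5 - 1) = 2.5.

```python
"""Hypothetical calc/stats.py sketch; the real module is not part of this file."""
from collections import Counter
from collections.abc import Sequence


def mean(xs: Sequence[float]) -> float:
    if not xs:
        raise ValueError("mean() of empty data")
    return sum(xs) / len(xs)


def median(xs: Sequence[float]) -> float:
    if not xs:
        raise ValueError("median() of empty data")
    s = sorted(xs)  # input may be unsorted
    mid = len(s) // 2
    if len(s) % 2:  # odd length: the single middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even length: mean of the two middle elements


def mode(xs: Sequence[float]) -> float:
    if not xs:
        raise ValueError("mode() of empty data")
    # most_common orders equal counts by first occurrence, so ties go to the earliest value
    return Counter(xs).most_common(1)[0][0]


def variance(xs: Sequence[float]) -> float:
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)  # sample (n-1) denominator
```

Placed at calc/stats.py (path inferred from the import), this should satisfy all seven tests. The standard library's `statistics.median` and `statistics.variance` (sample variance) return the same values on these inputs and make a convenient cross-check.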