df5542f7d6
Three-arm harness under scripts/native-bakeoff/:

- arm A: /api/chat with JSON tools (current default)
- arm B: /api/generate raw:true with canonical HF jinja template rendered directly
- arm C: google-deepmind/gemma JAX ToolSampler (env-gated, JAX required)

Interim finding from the A+B sweep on matt-strix gemma4:26b Q4: Ollama's bidirectional JSON↔native tool-call translator is faithful. The "long" multi-tool task produces identical behavior (7 steps / 6 tools) on both arms.

An earlier arm-B parser bug that looked like a divergence was a harness issue: preserving the model's <|channel>thought\n<channel|> prefix as assistant content tripped the jinja template's tool_response-following conditional, appending a spurious <turn|>\n that corrupted the next step's prompt. Fixed by dropping the channel prefix on the assistant message.

Arm C left as scaffolded-but-not-run: the JAX/bf16 reference path would answer "does the GGUF runtime diverge from DeepMind's implementation" but requires a separate env with the `gemma` PyPI package. Parked pending SDXL eviction or vast-h100 session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
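The arm-B fix treats the leading thought-channel block as framing, not as assistant content. Below is a minimal Python sketch of that split. It assumes the token shapes visible in the logged `raw_completion_head` fields: `<|channel>thought\n<channel|>` and `<|tool_call>call:NAME{key:<|"|>value<|"|>}`. It also assumes string arguments are delimited by `<|"|>` and an optional `<tool_call|>` closer follows each call. The names `strip_channel_prefix`, `parse_tool_calls` and `split_completion` are hypothetical; this is not the harness's actual parser.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a completion may open with a <|channel>...<channel|> block (an empty
# "thought" channel in the logged runs); it must not be stored as assistant content.
CHANNEL_PREFIX_RE = re.compile(r"^<\|channel>.*?<channel\|>", re.DOTALL)

# Assumption: string args are wrapped in the <|"|> token; other values are kept as raw text.
_ARG = r'\w+:(?:<\|"\|>.*?<\|"\|>|[^,}]*)'
CALL_RE = re.compile(
    r"<\|tool_call>call:(\w+)\{((?:" + _ARG + r"(?:," + _ARG + r")*)?)\}", re.DOTALL
)
ARG_RE = re.compile(r'(\w+):(?:<\|"\|>(.*?)<\|"\|>|([^,}]*))', re.DOTALL)


def strip_channel_prefix(completion: str) -> str:
    """Drop a leading thought-channel block, if present."""
    return CHANNEL_PREFIX_RE.sub("", completion, count=1)


def parse_tool_calls(completion: str) -> List[Tuple[str, Dict[str, str]]]:
    """Extract (tool_name, args) pairs from native <|tool_call> blocks."""
    calls = []
    for m in CALL_RE.finditer(completion):
        args = {}
        for a in ARG_RE.finditer(m.group(2)):
            # group 2 = quoted string value, group 3 = bare value
            args[a.group(1)] = a.group(2) if a.group(2) is not None else a.group(3)
        calls.append((m.group(1), args))
    return calls


def split_completion(completion: str) -> Tuple[str, List[Tuple[str, Dict[str, str]]]]:
    """Return (assistant_text, tool_calls) with the channel prefix and call blocks removed."""
    body = strip_channel_prefix(completion)
    calls = parse_tool_calls(body)
    text = CALL_RE.sub("", body).replace("<tool_call|>", "").strip()
    return text, calls
```

For the step-1 completion in the log, `split_completion` returns empty assistant text plus one `web_search` call with its `query` argument. Storing only the stripped text as the assistant message mirrors the fix described in the commit message.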
61 lines
4.1 KiB
JSON
{
"arm": "ollama-native",
"model": "gemma4:26b",
"num_ctx": 8192,
"num_predict": 2048,
"started_at": 1776600742.5834217,
"turns": [
{
"step": 1,
"elapsed_s": 1.09,
"prompt_eval_count": 1318,
"eval_count": 33,
"content_len": 109,
"tool_call_count": 1,
"stop_reason": "stop",
"history_chars_before_append": 2697,
"raw_completion_head": "<|channel>thought\n<channel|><|tool_call>call:web_search{query:<|\"|>highly rated sci-fi movies 2024 2025<|\"|>}",
"raw_completion_tail": "",
"prompt_tail": "xmox cluster nodes again<turn|>\n<|turn>model\nFour nodes: pve173 (tank ZFS host, PowerEdge R820), pve112 (workhorse), pve241 (caddy + game servers), and pve197 (GPU inference). Corosync on 10.10.10.0/24.<turn|>\n<|turn>user\n[2026-04-18 14:20] @seth:sethpc.xyz: Recommend 3 sci-fi movies NOT already in my sethflix library. Check your picks against check_sethflix before finalizing.<turn|>\n<|turn>model\n",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
},
{
"step": 2,
"elapsed_s": 1.56,
"prompt_eval_count": 1503,
"eval_count": 53,
"content_len": 187,
"tool_call_count": 1,
"stop_reason": "stop",
"history_chars_before_append": 3306,
"raw_completion_head": "<|channel>thought\n<channel|><|tool_call>call:check_sethflix{titles:<|\"|>Arrival,Ex Machina,Blade Runner 2049,Interstellar,Children of Men,Looper,District 9,Sunshine,Primer,Coherence<|\"|>}",
"raw_completion_tail": "",
"prompt_tail": "a community discussion with multiple perspectives and useful links to follow up on. https://example.com/two\n3. Example result three \u2014 official documentation or reference material. https://example.com/three\n4. Example result four \u2014 a recent news article with relevant context. https://example.com/four\n5. Example result five \u2014 a tutorial or how-to guide. https://example.com/five<|\"|>}<tool_response|>",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
},
{
"step": 3,
"elapsed_s": 2.88,
"prompt_eval_count": 1649,
"eval_count": 120,
"content_len": 460,
"tool_call_count": 0,
"stop_reason": "stop",
"history_chars_before_append": 3587,
"raw_completion_head": "<|channel>thought\n<channel|>Since you've already got the heavy hitters like *Ex Machina* and *Blade Runner 2049* in the library, here are three sci-fi picks you're currently missing:\n\n1. **Arrival** \u2014 Essential viewing if you haven't seen i",
"raw_completion_tail": "f you haven't seen it; it's high-concept linguistics and first contact.\n2. **Interstellar** \u2014 A massive space epic that's a must-watch for the visuals alone.\n3. **Children of Men** \u2014 Gritty, single-take masterpiece about a dystopian future.",
"prompt_tail": " 9,Sunshine,Primer,Coherence<|\"|>}<tool_call|><|tool_response>response:check_sethflix{value:<|\"|>- Arrival: NOT IN LIBRARY\n- Ex Machina: IN LIBRARY\n- Blade Runner 2049: IN LIBRARY\n- Interstellar: NOT IN LIBRARY\n- Children of Men: NOT IN LIBRARY\n- Looper: NOT IN LIBRARY\n- District 9: NOT IN LIBRARY\n- Sunshine: NOT IN LIBRARY\n- Primer: NOT IN LIBRARY\n- Coherence: NOT IN LIBRARY<|\"|>}<tool_response|>",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
}
],
"final": {
"halt_reason": "no_tool_calls",
"steps_used": 3,
"tool_calls_total": 2,
"wall_clock_s": 5.54,
"final_message_count": 18,
"final_history_chars": 4047
},
"task": "movies",
"task_prompt": "Recommend 3 sci-fi movies NOT already in my sethflix library. Check your picks against check_sethflix before finalizing."
}
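The commit compares arms by step and tool-call counts (7 steps / 6 tools on the "long" task). The `turns` array and the `final` block in a log like this one let those totals be recomputed and cross-checked. Below is a minimal sketch, assuming every log has this file's shape. The helpers `load_run`, `summarize_run` and `same_behavior` are hypothetical, not part of the harness. "Same behavior" is approximated as equal step and tool-call counts, matching how the commit message reports it.

```python
import json
from typing import Any, Dict


def load_run(path: str) -> Dict[str, Any]:
    """Read one arm's run log (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_run(log: Dict[str, Any]) -> Dict[str, Any]:
    """Recompute per-run totals from the turn records and check them against `final`."""
    turns = log["turns"]
    final = log["final"]
    steps = len(turns)
    tool_calls = sum(t["tool_call_count"] for t in turns)
    # Sum of per-step elapsed_s; whatever remains of wall_clock_s is not attributed to a step.
    step_time = sum(t["elapsed_s"] for t in turns)
    return {
        "steps": steps,
        "tool_calls": tool_calls,
        "steps_match": steps == final["steps_used"],
        "tool_calls_match": tool_calls == final["tool_calls_total"],
        # A no_tool_calls halt should mean the last turn produced a plain answer.
        "halt_consistent": final["halt_reason"] != "no_tool_calls"
        or turns[-1]["tool_call_count"] == 0,
        "unattributed_s": round(final["wall_clock_s"] - step_time, 2),
    }


def same_behavior(log_a: Dict[str, Any], log_b: Dict[str, Any]) -> bool:
    """Treat two arms as agreeing on a task when step and tool-call counts match."""
    a, b = summarize_run(log_a), summarize_run(log_b)
    return (a["steps"], a["tool_calls"]) == (b["steps"], b["tool_calls"])
```

On this log, the recomputed totals match `steps_used: 3` and `tool_calls_total: 2`. The three `elapsed_s` values sum to 5.53 s, leaving about 0.01 s of the 5.54 s wall clock outside the per-step timings.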