df5542f7d6
Three-arm harness under scripts/native-bakeoff/:
- arm A: /api/chat with JSON tools (current default)
- arm B: /api/generate raw:true with canonical HF jinja template rendered directly
- arm C: google-deepmind/gemma JAX ToolSampler (env-gated, JAX required)

Interim finding from A+B sweep on matt-strix gemma4:26b Q4: Ollama's bidirectional JSON↔native tool-call translator is faithful. The "long" multi-tool task produces identical behavior (7 steps / 6 tools) on both arms.

Earlier arm-B parser bug that looked like a divergence was a harness issue: preserving the model's <|channel>thought\n<channel|> prefix as assistant content tripped the jinja template's tool_response-following conditional, appending a spurious <turn|>\n that corrupted the next step's prompt. Fixed by dropping the channel prefix on the assistant message.

Arm C left as scaffolded-but-not-run: the JAX/bf16 reference path would answer "does the GGUF runtime diverge from DeepMind's implementation" but requires a separate env with the `gemma` PyPI package. Parked pending SDXL eviction or vast-h100 session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
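The arm-B fix described above (dropping the channel prefix before the completion is stored as assistant content) could look roughly like the sketch below. This is not the harness's actual code: the function name `strip_channel_prefix` and the regex are hypothetical, and the prefix shape is assumed from the `raw_completion_head` values in this run's log, where the opening `<|channel>` token is sometimes absent.

```python
import re

# Assumption: the thought-channel block starts the completion, optionally
# opens with "<|channel>", and always closes with "<channel|>". This shape
# is inferred from the logged raw_completion_head values, not from a spec.
CHANNEL_PREFIX = re.compile(r"^(?:<\|channel>)?thought\n.*?<channel\|>", re.DOTALL)


def strip_channel_prefix(completion: str) -> str:
    """Return the completion without its leading thought-channel block.

    Completions that do not start with the block are returned unchanged.
    """
    return CHANNEL_PREFIX.sub("", completion, count=1)
```

Applied to the step 1 completion in the log, this leaves only the `<|tool_call>...` text, so the assistant message appended to the history no longer carries the prefix that tripped the template conditional.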
41 lines
1.3 KiB
JSON
{
  "arm": "ollama-native",
  "model": "gemma4:26b",
  "num_ctx": 8192,
  "num_predict": 2048,
  "started_at": 1776600258.6906579,
  "turns": [
    {
      "step": 1,
      "elapsed_s": 0.81,
      "prompt_eval_count": 1306,
      "eval_count": 26,
      "content_len": 109,
      "tool_call_count": 1,
      "stop_reason": "stop",
      "history_chars_before_append": 2656,
      "raw_completion_head": "<|channel>thought\n<channel|><|tool_call>call:memory_read{query:<|\"|>home automation<|\"|>,user:<|\"|>seth<|\"|>}"
    },
    {
      "step": 2,
      "elapsed_s": 2.69,
      "prompt_eval_count": 1426,
      "eval_count": 109,
      "content_len": 356,
      "tool_call_count": 0,
      "stop_reason": "stop",
      "history_chars_before_append": 2956,
      "raw_completion_head": "thought\n<channel|>You've got a fairly solid Home Assistant setup running. Here's the gist:\n\n* **Core:** Running on **VM 706** (hosted on `pve173`).\n* **Connectivity:** Uses **Zigbee2MQTT** and an **MQTT broker** (running on **CT 149**)."
    }
  ],
  "final": {
    "halt_reason": "no_tool_calls",
    "steps_used": 2,
    "tool_calls_total": 1,
    "wall_clock_s": 3.49,
    "final_message_count": 16,
    "final_history_chars": 3312
  },
  "task": "memory",
  "task_prompt": "What do I have stored about home automation? If anything, summarize it briefly."
}
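A log of this shape can be sanity-checked and its native tool call decoded with a short script. The sketch below makes several assumptions: `parse_tool_call` and `check_log` are hypothetical helpers, not part of the harness. The parser handles only string arguments wrapped in `<|"|>` delimiters, as seen in the step 1 completion. The rule that a `no_tool_calls` halt implies a final turn with zero tool calls is inferred from the field names.

```python
import re

# Assumed shape, taken from the logged step 1 completion:
#   <|tool_call>call:NAME{key:<|"|>value<|"|>,key:<|"|>value<|"|>}
TOOL_NAME = re.compile(r"<\|tool_call>call:(\w+)\{")
STRING_ARG = re.compile(r'(\w+):<\|"\|>(.*?)<\|"\|>', re.DOTALL)


def parse_tool_call(raw: str):
    """Return (name, args) for the first native tool call, or None.

    Only string-valued arguments are recognized; numbers, booleans, and
    nested values are outside the scope of this sketch.
    """
    match = TOOL_NAME.search(raw)
    if match is None:
        return None
    args = dict(STRING_ARG.findall(raw[match.end():]))
    return match.group(1), args


def check_log(log: dict) -> list[str]:
    """Return human-readable inconsistencies found in one run log."""
    problems = []
    turns, final = log["turns"], log["final"]
    if final["steps_used"] != len(turns):
        problems.append("steps_used does not match the number of turns")
    total_calls = sum(turn["tool_call_count"] for turn in turns)
    if final["tool_calls_total"] != total_calls:
        problems.append("tool_calls_total does not match the per-turn sum")
    # Assumption: halting on "no_tool_calls" means the last turn made none.
    if final["halt_reason"] == "no_tool_calls" and turns:
        if turns[-1]["tool_call_count"] != 0:
            problems.append("halted on no_tool_calls but last turn made calls")
    return problems
```

For the run above, `check_log(json.load(f))` should report no problems (2 turns, 1 tool call in total, last turn with none). After JSON decoding, the step 1 `raw_completion_head` parses to `memory_read` with `query` set to `home automation` and `user` set to `seth`.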