Architecture specs, benchmarks, gotchas, Ollama settings, tool calling format, and implementation patterns from Simon and AI_Visualizer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma 4 Native Tool Calling Format
Source: Google AI for Developers - Function Calling docs https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
Special Tokens (6 total)
| Token | Purpose |
|---|---|
| <\|tool> / <tool\|> | Tool definition block |
| <\|tool_call> / <tool_call\|> | Model's tool request |
| <\|tool_response> / <tool_response\|> | Tool execution result |
String delimiter: <|"|> (encloses every string value in the native format)
Native Format (raw model tokens)
Tool definition in system prompt:
<|tool>declaration:
get_current_temperature{
location:{type:<|"|>string<|"|>,description:<|"|>The city<|"|>},
unit:{type:<|"|>string<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>]}
}<tool|>
Tool call from model:
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|>
Tool response:
<|tool_response>response:get_current_temperature{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
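If you ever receive raw model output instead of parsed JSON (for example from a backend whose tool call parser is broken), a small parser can recover the call. The following is a minimal sketch, not an official grammar: it assumes flat arguments only (no nested objects or arrays), and the helper names `parse_tool_calls`, `_split_arguments`, and `_parse_value` are hypothetical.

```python
import re

STRING_DELIM = '<|"|>'
# Matches <|tool_call>call:NAME{ARGS}<tool_call|> and captures NAME and ARGS.
CALL_PATTERN = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)


def _split_arguments(body: str) -> list:
    """Split 'k:v,k:v' on commas that are outside <|"|>-delimited strings."""
    parts, current, in_string, i = [], [], False, 0
    while i < len(body):
        if body.startswith(STRING_DELIM, i):
            in_string = not in_string
            current.append(STRING_DELIM)
            i += len(STRING_DELIM)
        elif body[i] == "," and not in_string:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    if current:
        parts.append("".join(current))
    return parts


def _parse_value(raw: str):
    """Convert one raw value to str, bool, int, or float (fallback: raw text)."""
    raw = raw.strip()
    if (raw.startswith(STRING_DELIM) and raw.endswith(STRING_DELIM)
            and len(raw) >= 2 * len(STRING_DELIM)):
        return raw[len(STRING_DELIM):-len(STRING_DELIM)]
    if raw in ("true", "false"):
        return raw == "true"
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            continue
    return raw


def parse_tool_calls(text: str) -> list:
    """Return [{'name': ..., 'arguments': {...}}] for each native tool call."""
    calls = []
    for name, body in CALL_PATTERN.findall(text):
        arguments = {}
        for part in _split_arguments(body):
            key, _, raw = part.partition(":")
            arguments[key.strip()] = _parse_value(raw)
        calls.append({"name": name, "arguments": arguments})
    return calls
```

The returned dictionaries use the same `name` / `arguments` shape as the JSON chat format, so downstream dispatch code does not need to know which path produced the call.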
JSON Chat Format (for Ollama / OpenAI-compatible APIs)
This is what you actually use in practice. Ollama translates to/from native tokens.
Tool definition:
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "The city name"}
},
"required": ["city"]
}
}
}
Model returns:
{
"role": "assistant",
"tool_calls": [{
"function": {
"name": "get_weather",
"arguments": {"city": "London"}
}
}]
}
Tool result message:
{
"role": "tool",
"content": "{\"temperature\": 15, \"weather\": \"sunny\"}"
}
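The three message shapes above fit together in a dispatch step: read `tool_calls` from the assistant message, run the matching function, and append one `role: tool` message per call. The sketch below illustrates that step under stated assumptions: `get_weather` is a hypothetical stand-in tool with hard-coded output, and `TOOL_REGISTRY` and `run_tool_calls` are illustrative names rather than part of any library.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical stand-in tool; a real one would query a weather service."""
    return {"city": city, "temperature": 15, "weather": "sunny"}


# Maps the tool names advertised to the model onto Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    """Execute every tool call in an assistant message and build tool replies."""
    replies = []
    for call in assistant_message.get("tool_calls", []):
        function = call["function"]
        name = function["name"]
        arguments = function.get("arguments", {})
        # OpenAI-style responses carry arguments as a JSON string, not an object.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        handler = registry.get(name)
        if handler is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            try:
                result = handler(**arguments)
            except Exception as exc:  # report the failure back to the model
                result = {"error": str(exc)}
        # Tool results go back as a JSON-encoded string in "content".
        replies.append({"role": "tool", "content": json.dumps(result)})
    return replies
```

Returning errors as tool content, rather than raising, lets the model see the failure and retry with different arguments; whether that is desirable depends on the application.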
Thinking Mode + Tool Calls
- When thinking is enabled, preserve thoughts between tool calls
- For long agent chains, summarize thoughts as plain text to save context
- Recommended: disable thinking for tool-heavy workflows (Seth's finding)
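One way to apply the "summarize thoughts" advice is to compact older thoughts in the message history before each new request. The sketch below assumes assistant messages store their thoughts in a `thinking` field (the field name is an assumption about your client's message shape), and the default `truncate_summary` is only a placeholder that truncates; a real summarizer would be passed in through `summarize`.

```python
from typing import Callable, Dict, List


def truncate_summary(thought: str, limit: int = 200) -> str:
    """Placeholder summarizer: collapse whitespace and keep `limit` characters."""
    thought = " ".join(thought.split())
    if len(thought) <= limit:
        return thought
    return thought[:limit].rstrip() + "..."


def compact_thoughts(
    messages: List[Dict],
    keep_last: int = 1,
    summarize: Callable[[str], str] = truncate_summary,
) -> List[Dict]:
    """Return a copy of `messages` with all but the newest thoughts summarized."""
    # Indexes of assistant messages that actually carry thoughts.
    with_thoughts = [
        i for i, m in enumerate(messages)
        if m.get("role") == "assistant" and m.get("thinking")
    ]
    # The most recent `keep_last` thoughts stay verbatim.
    keep = set(with_thoughts[-keep_last:]) if keep_last > 0 else set()
    compacted = []
    for i, message in enumerate(messages):
        message = dict(message)  # shallow copy so the caller's list is untouched
        if i in with_thoughts and i not in keep:
            message["thinking"] = summarize(message["thinking"])
        compacted.append(message)
    return compacted
```

Keeping the latest thought intact preserves the reasoning that led to the pending tool call, while earlier thoughts shrink to plain-text summaries.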
Framework Flags
| Framework | Required Flag |
|---|---|
| llama.cpp | --jinja |
| vLLM | --enable-auto-tool-choice (together with a matching --tool-call-parser) |
| Ollama | Works via /api/chat endpoint with tools field |
| transformers | apply_chat_template(tools=[...]) |
Known Issues
- Ollama v0.20.0-0.20.1: tool call parser broken, streaming drops tool calls
- llama.cpp: format mismatches and continuous loops reported
- LM Studio: compatibility issues with tool calling
- Workaround: Use non-streaming mode ("stream": false in the request) for tool calls (proven in Simon)
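The non-streaming workaround amounts to sending `"stream": false` to Ollama's `/api/chat` endpoint and reading the single JSON reply. The sketch below is one possible shape for that, not the code used in Simon: `build_chat_request` and `chat_once` are hypothetical helper names, the URL is Ollama's default local address, and the optional `think` flag assumes an Ollama version and model that support toggling thinking.

```python
import json
import urllib.request
from typing import Dict, List, Optional

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_request(
    model: str,
    messages: List[Dict],
    tools: List[Dict],
    think: Optional[bool] = None,
) -> Dict:
    """Build a non-streaming /api/chat payload that advertises `tools`."""
    payload = {
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,  # the workaround: one complete reply, no chunks
    }
    if think is not None:
        payload["think"] = think  # only sent when explicitly requested
    return payload


def chat_once(payload: Dict, url: str = OLLAMA_CHAT_URL, timeout: float = 120.0) -> Dict:
    """POST the payload and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running Ollama server; the model tag is a placeholder):
# reply = chat_once(build_chat_request("gemma4", messages, tools, think=False))
# tool_calls = reply["message"].get("tool_calls", [])
```

Because the reply arrives as one object, any `tool_calls` are read from `reply["message"]` in a single step instead of being reassembled from streamed chunks.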