# Gemma 4 Native Tool Calling Format

> Source: Google AI for Developers - Function Calling docs
> https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
> Canonical source in corpus: `tooling/google-official/docs/ai-google-dev_function_calling_gemma4.html`
> Authoritative chat template: `tooling/huggingface/model-cards/gemma-4-{31B,E4B}-it-chat_template.jinja`

## Chat Template Context (what surrounds the tool tokens)

Gemma 4 changed the turn-token syntax from Gemma 3. You won't usually write these by hand (Ollama, llama.cpp `--jinja`, and HF `apply_chat_template` all handle it), but know what's on the wire when debugging:

| Purpose | Gemma 3 | Gemma 4 |
|---------|---------|---------|
| Turn start | `<start_of_turn>role\n` | `<\|turn>role\n` |
| Turn end | `<end_of_turn>\n` | `<turn\|>\n` |
| Thinking | (not standardized) | `<\|think>...<think\|>` |
| Thought channel | (n/a) | `<\|channel>thought...<channel\|>` |
| Image inline | `<start_of_image>` | `<\|image>...<image\|>` |
| Audio inline | (n/a) | `<\|audio>...<audio\|>` |
| String delimiter in native format | (n/a) | `<\|"\|>` |

**Asymmetric brackets are intentional.** Opening is `<|token>`, closing is `<token|>`. If a code sample closes `<|turn>...` with anything other than `<turn|>`, that's wrong.
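When debugging what is on the wire, a quick pairing check on the raw prompt string can catch a mistyped closing token. The sketch below is a hypothetical helper, not part of any framework. It assumes that every opener has the form `<|name>`, that its closer mirrors it as `<name|>`, and that the string delimiter `<|"|>` should be ignored.

```python
import re

# Assumption: openers look like <|name> and closers like <name|>.
# The string delimiter <|"|> matches neither alternative, so it is skipped.
TOKEN_RE = re.compile(r"<\|(?P<open>[a-z_]+)>|<(?P<close>[a-z_]+)\|>")


def find_unbalanced(text: str) -> list[str]:
    """Return a description of every opener/closer pairing problem in text."""
    problems = []
    stack = []  # (token name, character offset) for each opener still open
    for match in TOKEN_RE.finditer(text):
        if match.group("open"):
            stack.append((match.group("open"), match.start()))
            continue
        name = match.group("close")
        if stack and stack[-1][0] == name:
            stack.pop()  # closer matches the innermost open token
        else:
            problems.append(f"unexpected <{name}|> at offset {match.start()}")
    # Anything left on the stack was opened but never closed.
    problems.extend(f"unclosed <|{name}> at offset {pos}" for name, pos in stack)
    return problems
```

A prompt rendered with `add_generation_prompt=True` ends on an open model turn, so one trailing "unclosed `<|turn>`" report is expected there rather than a sign of a bug.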
## Tool Special Tokens (6 total)

| Token | Purpose |
|-------|---------|
| `<\|tool>` / `<tool\|>` | Tool definition block |
| `<\|tool_call>` / `<tool_call\|>` | Model's tool request |
| `<\|tool_response>` / `<tool_response\|>` | Tool execution result |

String delimiter: `<\|"\|>` (encloses all string values in native format)

## Native Format (raw model tokens)

### Tool definition in system prompt:

```
<|tool>declaration: get_current_temperature{
  location:{type:<|"|>string<|"|>,description:<|"|>The city<|"|>},
  unit:{type:<|"|>string<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>]}
}<tool|>
```

### Tool call from model:

```
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|>
```

### Tool response:

```
<|tool_response>response:get_current_temperature{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
```

## JSON Chat Format (for Ollama / OpenAI-compatible APIs)

This is what you actually use in practice. Ollama translates to/from native tokens.

### Tool definition:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "The city name"}
      },
      "required": ["city"]
    }
  }
}
```

### Model returns:

```json
{
  "role": "assistant",
  "tool_calls": [{
    "function": {
      "name": "get_weather",
      "arguments": {"city": "London"}
    }
  }]
}
```

### Tool result message:

```json
{
  "role": "tool",
  "content": "{\"temperature\": 15, \"weather\": \"sunny\"}"
}
```

## Thinking Mode + Tool Calls

- When thinking is enabled, preserve thoughts between tool calls
- For long agent chains, summarize thoughts as plain text to save context
- Recommended: **disable thinking for tool-heavy workflows** (Seth's finding)

## Framework Flags

| Framework | Required Flag |
|-----------|--------------|
| llama.cpp | `--jinja` |
| vLLM | `--enable-auto-tool-choice` |
| Ollama | Works via `/api/chat` endpoint with `tools` field |
| transformers | `apply_chat_template(tools=[...])` |

## Known Issues

- Ollama
v0.20.0-0.20.1: tool call parser broken, streaming drops tool calls
- llama.cpp: format mismatches and continuous loops reported
- LM Studio: compatibility issues with tool calling
- **Workaround:** Use non-streaming mode for tool calls (proven in Simon)

## HF `transformers` Alternative (not needed if using Ollama)

If you ever route through HF `transformers` (v5.5.4+) instead of Ollama, there's a cleaner parser than hand-rolled regex:

```python
# Assumes `processor`, `model`, `messages`, and `TOOLS` are already defined.
inputs = processor.apply_chat_template(
    messages,
    tools=TOOLS,
    enable_thinking=True,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
out = model.generate(**inputs)
parsed = processor.parse_response(processor.decode(out[0]))
# -> {"thinking": "...", "content": "...", "tool_calls": [...]}
```

`parse_response` uses `response_schema` + `x-regex` fields baked into `tokenizer_config.json` (downloaded at `tooling/huggingface/model-cards/`). For Ollama users this is informational: Ollama's server-side tool parser already does the equivalent and returns structured `tool_calls` in the chat response.
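Whichever parser produces them, structured `tool_calls` in the JSON chat format can be handled the same way: look up each function by name, call it with the parsed arguments, and append one `role: "tool"` message per call. The sketch below is a minimal illustration of that loop. `get_weather`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names, and the weather values are hard-coded stand-ins. It accepts `arguments` either as an object (as in the example above) or as a JSON string (as OpenAI-compatible endpoints send it).

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical stand-in tool; a real one would query a weather service."""
    return {"temperature": 15, "weather": "sunny"}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, str]]:
    """Execute every tool call in an assistant message and build tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        name = function["name"]
        arguments = function["arguments"]
        if isinstance(arguments, str):
            # OpenAI-compatible APIs encode arguments as a JSON string.
            arguments = json.loads(arguments)
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            # Report the problem back to the model instead of raising.
            payload = {"error": f"unknown tool: {name}"}
        else:
            payload = handler(**arguments)
        results.append({"role": "tool", "content": json.dumps(payload)})
    return results
```

Appending the returned messages to the conversation and calling the chat endpoint again, in non-streaming mode per the workaround above, completes one tool round trip.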