docs: initial Gemma 4 research corpus and synthesis
Architecture specs, benchmarks, gotchas, Ollama settings, tool calling format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,100 @@
# Gemma 4 Native Tool Calling Format

> Source: Google AI for Developers - Function Calling docs
> https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4

## Special Tokens (6 total)

| Token | Purpose |
|-------|---------|
| `<\|tool>` / `<tool\|>` | Tool definition block |
| `<\|tool_call>` / `<tool_call\|>` | Model's tool request |
| `<\|tool_response>` / `<tool_response\|>` | Tool execution result |

String delimiter: `<|"|>` (encloses all string values in native format)

## Native Format (raw model tokens)

### Tool definition in system prompt:

```
<|tool>declaration:
get_current_temperature{
location:{type:<|"|>string<|"|>,description:<|"|>The city<|"|>},
unit:{type:<|"|>string<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>]}
}<tool|>
```

### Tool call from model:

```
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|>
```

### Tool response:

```
<|tool_response>response:get_current_temperature{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
```

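If you ever need to read the raw tokens yourself (for example when a runtime's parser misbehaves), the tool call example above can be decoded with a small parser. This is a minimal sketch, not the official grammar: it assumes a flat `key:value` argument list, string values wrapped in `<|"|>`, and bare numbers, which is all the examples in this note show. Nested objects and arrays are not handled, and the helper names (`parse_tool_call`, `split_top_level`, `parse_value`) are hypothetical.

```python
import re

STRING_DELIM = '<|"|>'
# Matches <|tool_call>call:NAME{BODY}<tool_call|> as shown in the example above.
CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)


def parse_value(raw):
    """Decode one argument value: delimited string, int, float, or raw text."""
    raw = raw.strip()
    if (
        len(raw) >= 2 * len(STRING_DELIM)
        and raw.startswith(STRING_DELIM)
        and raw.endswith(STRING_DELIM)
    ):
        return raw[len(STRING_DELIM):-len(STRING_DELIM)]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # unknown form: keep it as-is rather than guess


def split_top_level(body):
    """Split on commas that are not inside a <|"|>-delimited string."""
    parts, buf, in_string, i = [], [], False, 0
    while i < len(body):
        if body.startswith(STRING_DELIM, i):
            in_string = not in_string
            buf.append(STRING_DELIM)
            i += len(STRING_DELIM)
        elif body[i] == "," and not in_string:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(body[i])
            i += 1
    if buf:
        parts.append("".join(buf))
    return parts


def parse_tool_call(text):
    """Return {"name": ..., "arguments": {...}} or None if no call is found."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    arguments = {}
    for part in split_top_level(body):
        key, _, raw = part.partition(":")
        arguments[key.strip()] = parse_value(raw)
    return {"name": name, "arguments": arguments}
```

Called on the tool call shown above, `parse_tool_call` yields the same name/arguments shape that the JSON chat format uses, so downstream code can stay format-agnostic.
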
## JSON Chat Format (for Ollama / OpenAI-compatible APIs)

This is the format you use in practice; Ollama translates it to and from the native tokens.

### Tool definition:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "The city name"}
      },
      "required": ["city"]
    }
  }
}
```

### Model returns:

```json
{
  "role": "assistant",
  "tool_calls": [{
    "function": {
      "name": "get_weather",
      "arguments": {"city": "London"}
    }
  }]
}
```

### Tool result message:

```json
{
  "role": "tool",
  "content": "{\"temperature\": 15, \"weather\": \"sunny\"}"
}
```

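The three JSON messages above form one round trip: the model asks for a tool, your code runs it, and the result goes back as a `tool` message. The sketch below shows one way to turn an assistant message into tool result messages. The `get_weather` stub, the `TOOLS` registry, and `run_tool_calls` are hypothetical names for illustration; the string check is there because OpenAI-compatible APIs deliver `arguments` as a JSON string, while the message shown above carries an object.

```python
import json


def get_weather(city):
    # Hypothetical stand-in: a real tool would query a weather service.
    return {"temperature": 15, "weather": "sunny"}


# Maps tool names (as declared in the tool definition) to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message, registry=TOOLS):
    """Execute every tool call in an assistant message.

    Returns a list of {"role": "tool", "content": ...} messages to append
    to the conversation before asking the model to continue.
    """
    results = []
    for call in assistant_message.get("tool_calls", []):
        function = call["function"]
        name = function["name"]
        arguments = function["arguments"]
        if isinstance(arguments, str):
            # Some APIs send arguments as a JSON-encoded string.
            arguments = json.loads(arguments)
        handler = registry.get(name)
        if handler is None:
            # Report the problem to the model instead of raising.
            output = {"error": f"unknown tool: {name}"}
        else:
            output = handler(**arguments)
        results.append({"role": "tool", "content": json.dumps(output)})
    return results
```

Returning an error payload for an unknown tool name, rather than raising, gives the model a chance to correct itself on the next turn; whether that is the right policy depends on the application.
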
## Thinking Mode + Tool Calls

- When thinking is enabled, preserve thoughts between tool calls
- For long agent chains, summarize thoughts as plain text to save context
- Recommended: **disable thinking for tool-heavy workflows** (Seth's finding)

## Framework Flags

| Framework | Required Flag or API |
|-----------|----------------------|
| llama.cpp | `--jinja` |
| vLLM | `--enable-auto-tool-choice` |
| Ollama | Works via `/api/chat` endpoint with `tools` field |
| transformers | `apply_chat_template(tools=[...])` |

## Known Issues

- Ollama v0.20.0 to v0.20.1: the tool call parser is broken, and streaming drops tool calls
- llama.cpp: format mismatches and continuous loops reported
- LM Studio: compatibility issues with tool calling
- **Workaround:** Use non-streaming mode for tool calls (proven in Simon)

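The non-streaming workaround amounts to sending `"stream": false` in the `/api/chat` request body. A minimal sketch using only the standard library follows; the model tag `gemma4` and the helper names `build_chat_request` and `chat` are assumptions, and the URL is Ollama's default local address, so adjust both to your setup.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL = "gemma4"  # assumed model tag; use whatever tag you pulled


def build_chat_request(messages, tools, model=MODEL):
    """Build a non-streaming /api/chat request that carries tool definitions."""
    payload = {
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,  # the workaround: get one complete JSON response
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, tools):
    """Send the request and return the assistant message from the response."""
    with urllib.request.urlopen(build_chat_request(messages, tools)) as response:
        return json.loads(response.read())["message"]
```

With streaming off, any `tool_calls` arrive in a single response object, so there is no chunk reassembly step in which a call could be lost.
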