docs: initial Gemma 4 research corpus and synthesis

Architecture specs, benchmarks, gotchas, Ollama settings, tool calling
format, and implementation patterns from Simon and AI_Visualizer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mortdecai
2026-04-12 18:14:19 -04:00
commit 5011059f5d
9 changed files with 861 additions and 0 deletions
@@ -0,0 +1,100 @@
# Gemma 4 Native Tool Calling Format
> Source: Google AI for Developers - Function Calling docs
> https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
## Special Tokens (6 total)
| Token | Purpose |
|-------|---------|
| `<\|tool>` / `<tool\|>` | Tool definition block |
| `<\|tool_call>` / `<tool_call\|>` | Model's tool request |
| `<\|tool_response>` / `<tool_response\|>` | Tool execution result |

String delimiter: `<|"|>` (encloses every string value in the native format)
## Native Format (raw model tokens)
### Tool definition in system prompt:
```
<|tool>declaration:
get_current_temperature{
location:{type:<|"|>string<|"|>,description:<|"|>The city<|"|>},
unit:{type:<|"|>string<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>]}
}<tool|>
```
### Tool call from model:
```
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|>
```
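To make the call syntax concrete, here is a minimal parsing sketch. It assumes flat arguments only (no nested objects or arrays) and infers the grammar from the examples in this note rather than from a formal spec; `parse_tool_calls` and `parse_scalar` are hypothetical helper names.

```python
import re

STRING_DELIM = '<|"|>'

# Matches one complete call block: <|tool_call>call:NAME{ARGS}<tool_call|>
CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)

# Matches key:value pairs; a value is either a delimited string or a bare scalar.
ARG_RE = re.compile(r'(\w+):(<\|"\|>.*?<\|"\|>|[^,]+)', re.DOTALL)


def parse_scalar(raw):
    """Convert one raw argument value to a Python value."""
    raw = raw.strip()
    if (
        raw.startswith(STRING_DELIM)
        and raw.endswith(STRING_DELIM)
        and len(raw) >= 2 * len(STRING_DELIM)
    ):
        return raw[len(STRING_DELIM):-len(STRING_DELIM)]
    if raw in ("true", "false"):  # assumption: booleans are bare true/false
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # unknown bare token, returned unchanged


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in raw model output."""
    calls = []
    for name, body in CALL_RE.findall(text):
        arguments = {key: parse_scalar(value) for key, value in ARG_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls
```

The output shape mirrors the `function` object of the JSON chat format, so the same dispatch code can serve both paths.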
### Tool response:
```
<|tool_response>response:get_current_temperature{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
```
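Going the other direction, a tool result can be serialized into the response block with a few lines of code. This is a rough sketch for flat results only; the bare `true`/`false` encoding for booleans is an assumption, and `format_tool_response` is a hypothetical helper name.

```python
STRING_DELIM = '<|"|>'


def encode_value(value):
    """Encode one scalar for the native format."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"  # assumption: bare true/false
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is treated as a string and wrapped in the delimiter.
    return f"{STRING_DELIM}{value}{STRING_DELIM}"


def format_tool_response(name, result):
    """Build a <|tool_response> block from a flat dict of result fields."""
    fields = ",".join(f"{key}:{encode_value(val)}" for key, val in result.items())
    return f"<|tool_response>response:{name}{{{fields}}}<tool_response|>"
```

This is only needed when driving the model with raw prompts; with Ollama or another chat API the server does this translation.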
## JSON Chat Format (for Ollama / OpenAI-compatible APIs)
This is the format you use in practice; Ollama translates between it and the native tokens.
### Tool definition:
```json
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "The city name"}
},
"required": ["city"]
}
}
}
```
### Model returns:
```json
{
"role": "assistant",
"tool_calls": [{
"function": {
"name": "get_weather",
"arguments": {"city": "London"}
}
}]
}
```
### Tool result message:
```json
{
"role": "tool",
"content": "{\"temperature\": 15, \"weather\": \"sunny\"}"
}
```
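The three messages above form one round trip: the model asks for a tool, the caller runs it, and the result goes back as a `role: tool` message. A minimal dispatch sketch follows; `get_weather` is a hypothetical stand-in implementation, and `TOOL_REGISTRY` and `run_tool_calls` are illustrative names, not part of any library.

```python
import json


def get_weather(city):
    # Hypothetical stand-in: a real tool would query a weather service.
    return {"temperature": 15, "weather": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message, registry=TOOL_REGISTRY):
    """Execute each requested tool and return the tool-result messages."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        function = call["function"]
        name = function["name"]
        arguments = function.get("arguments", {})
        if isinstance(arguments, str):
            # OpenAI-compatible endpoints deliver arguments as a JSON string.
            arguments = json.loads(arguments)
        handler = registry.get(name)
        if handler is None:
            payload = {"error": f"unknown tool: {name}"}
        else:
            payload = handler(**arguments)
        # The tool message carries the result as a JSON-encoded string.
        results.append({"role": "tool", "content": json.dumps(payload)})
    return results
```

Append the assistant message and then the returned tool messages to the conversation history before making the next request.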
## Thinking Mode + Tool Calls
- When thinking is enabled, preserve thoughts between tool calls
- For long agent chains, summarize thoughts as plain text to save context
- Recommended: **disable thinking for tool-heavy workflows** (Seth's finding)
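
If thinking stays enabled, older thoughts can be compacted before each new request. The sketch below assumes that assistant messages carry their reasoning in a `thinking` field and uses plain truncation as a placeholder for a real summarization step; `summarize` and `compact_thoughts` are hypothetical names.

```python
def summarize(thought, max_chars=120):
    # Placeholder: truncation stands in for a real summary step.
    thought = " ".join(thought.split())
    if len(thought) <= max_chars:
        return thought
    return thought[:max_chars].rstrip() + "..."


def compact_thoughts(messages, keep_last=1, max_chars=120):
    """Return a copy of the history with older thoughts folded into plain text.

    The most recent `keep_last` assistant thoughts stay untouched so they are
    preserved between consecutive tool calls.
    """
    with_thoughts = [
        i for i, m in enumerate(messages)
        if m.get("role") == "assistant" and m.get("thinking")
    ]
    keep = set(with_thoughts[-keep_last:]) if keep_last > 0 else set()
    compacted = []
    for i, message in enumerate(messages):
        if i in with_thoughts and i not in keep:
            message = dict(message)  # copy so the caller's history is not mutated
            note = f"[earlier reasoning: {summarize(message.pop('thinking'), max_chars)}]"
            message["content"] = f"{note}\n{message.get('content', '')}".rstrip()
        compacted.append(message)
    return compacted
```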
## Framework Flags
| Framework | Required Flag / Usage |
|-----------|-----------------------|
| llama.cpp | `--jinja` |
| vLLM | `--enable-auto-tool-choice` |
| Ollama | Works via `/api/chat` endpoint with `tools` field |
| transformers | `apply_chat_template(tools=[...])` |
## Known Issues
- Ollama v0.20.0-0.20.1: tool call parser broken, streaming drops tool calls
- llama.cpp: format mismatches and continuous loops reported
- LM Studio: compatibility issues with tool calling
- **Workaround:** Use non-streaming mode for tool calls (proven in Simon)
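
The non-streaming workaround amounts to setting `"stream": false` in the request body. A minimal sketch using only the standard library is shown here; it assumes Ollama's default local address, and the model tag passed in is up to the caller (the `gemma4` tag in the usage note is a guess, not a confirmed name).

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_request(model, messages, tools, url=OLLAMA_CHAT_URL):
    """Build a non-streaming /api/chat request that includes tool definitions."""
    payload = {
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,  # the workaround: ask for one complete JSON response
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the assistant message from the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["message"]
```

Usage would look like `send(build_chat_request("gemma4", messages, tools))`, with any `tool_calls` then read from the returned message.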