gemma4-research/tooling/inference-frameworks/snippets/vllm_registry_excerpt.txt

# Source: vllm-project/vllm main branch — vllm/model_executor/models/registry.py
# Verified 2026-04-18 via GitHub API.

# Line 99 (text-only Gemma 4 CausalLM):
"Gemma4ForCausalLM": ("gemma4", "Gemma4ForCausalLM"),

# Line 230 (multimodal Gemma 4: vision + audio + video):
"Gemma4ForCausalLM": ("gemma4_mm", "Gemma4ForConditionalGeneration"),

# The second (_mm) registration maps Gemma4ForCausalLM -> gemma4_mm.Gemma4ForConditionalGeneration,
# which wires in:
#   - vision_tower (pixel_values, pixel_position_ids)
#   - audio_tower  (input_features_padded, input_features_mask)  [E2B/E4B only]
#   - video path   (pixel_values_videos: decomposed into frames, up to 32 frames at 70 soft tokens each)
#
# vLLM chooses between the two registrations based on whether the HF config has audio_config populated.
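
# Illustrative sketch of the dispatch rule described above. This is NOT vLLM's code:
# the table names (TEXT_MODELS, MULTIMODAL_MODELS) and the helper resolve_registration
# are hypothetical, and the HF config is modelled as a plain dict for simplicity.
# It only shows how one architecture name could resolve to either registration
# depending on audio_config, assuming the behaviour noted in this excerpt.

```python
from typing import Any, Mapping, Tuple

# Hypothetical stand-ins for the two registry entries quoted in this excerpt.
TEXT_MODELS = {
    "Gemma4ForCausalLM": ("gemma4", "Gemma4ForCausalLM"),
}
MULTIMODAL_MODELS = {
    "Gemma4ForCausalLM": ("gemma4_mm", "Gemma4ForConditionalGeneration"),
}


def resolve_registration(arch: str, hf_config: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (module_name, class_name) for an architecture string.

    Assumption: a non-empty audio_config selects the multimodal (_mm) entry;
    a missing, None, or empty audio_config selects the text-only entry.
    """
    has_audio = bool(hf_config.get("audio_config"))
    table = MULTIMODAL_MODELS if has_audio else TEXT_MODELS
    return table[arch]


if __name__ == "__main__":
    # Text-only config: no audio_config key at all.
    print(resolve_registration("Gemma4ForCausalLM", {}))
    # Config with a populated audio_config (contents here are made up).
    print(resolve_registration("Gemma4ForCausalLM", {"audio_config": {"enabled": True}}))
```

# With these assumptions, an empty config resolves to the gemma4 module and a config
# with a populated audio_config resolves to gemma4_mm. Check registry.py itself for
# the actual resolution logic before relying on this.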