# gemma4-research/tooling/inference-frameworks/snippets/keras_hub_gemma4.py

"""Canonical Keras / keras-hub example for Gemma 4.

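Source: keras-team/keras-hub, keras_hub/src/models/gemma4/
Requires: pip install keras-hub keras[jax]  (or keras[torch] / keras[tensorflow])
Keras picks its backend from the KERAS_BACKEND environment variable, which must
be set before keras_hub is imported.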

Presets (verified 2026-04-18 from gemma4_presets.py):
  gemma4_2b              gemma4_instruct_2b
  gemma4_4b              gemma4_instruct_4b
  gemma4_26b_a4b         gemma4_instruct_26b_a4b
  gemma4_31b             gemma4_instruct_31b

keras-hub is the reference implementation maintained by the Keras team
(Google). It ships each component as its own module under the source
directory above: gemma4_attention, gemma4_audio_encoder,
gemma4_vision_encoder, gemma4_moe, gemma4_decoder_block, gemma4_causal_lm,
etc.  This makes it the most legible path to *read* the architecture, but it
is a training/fine-tuning tool, not a production inference server.
"""
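
# Backend selection: a minimal sketch, not part of the upstream example.
# Keras 3 reads KERAS_BACKEND once, when keras is first imported, so it has to
# be set before `import keras_hub` below. "jax" is an assumption chosen to match
# the install line in the docstring; "torch" or "tensorflow" work the same way.
# setdefault() leaves any value already exported in the shell untouched.
import os

os.environ.setdefault("KERAS_BACKEND", "jax")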

import keras_hub

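# Text-only causal LM: load the instruction-tuned 4B preset and generate.
# In keras-hub, max_length caps the whole output sequence, prompt included.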
model = keras_hub.models.Gemma4CausalLM.from_preset("gemma4_instruct_4b")
print(model.generate("Write a haiku about JAX.", max_length=128))
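

# Batched, deterministic generation: a hedged sketch, not part of the upstream
# example. It assumes Gemma4CausalLM follows the same interface as other
# keras-hub CausalLM tasks, where compile(sampler=...) accepts a string name
# such as "greedy" or "top_k" and generate() accepts a list of prompts.
# `generate_batch` is a hypothetical helper name introduced for illustration.
def generate_batch(model, prompts, max_length=128, sampler="greedy"):
    """Generate one completion per prompt with an explicitly chosen sampler."""
    # Recompiling swaps the sampler; "greedy" makes the output repeatable.
    model.compile(sampler=sampler)
    # A list in gives a list out, one generated string per prompt.
    return model.generate(list(prompts), max_length=max_length)


# Usage (left commented out so the script does not run generation twice):
# for text in generate_batch(model, ["Explain JIT compilation.", "What is XLA?"]):
#     print(text)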

# For multimodal (vision/audio) input, use the backbone and preprocessor directly:
# backbone = keras_hub.models.Gemma4Backbone.from_preset("gemma4_instruct_4b")
# preproc  = keras_hub.models.Gemma4CausalLMPreprocessor.from_preset("gemma4_instruct_4b")
# The vision and audio encoders live in separate modules (gemma4_vision_encoder,
# gemma4_audio_encoder); the backbone wires them in when the preset includes them.