# TxGemma

Therapeutic-development / drug-discovery variant. Built on **Gemma 2**. No Gemma 3 or 4 generation yet.

## What it is

Gemma 2 fine-tuned on 7M examples curated from the **Therapeutics Data Commons (TDC)**, covering predictive tasks across small molecules, proteins, nucleic acids, diseases, and cell lines. Beats or matches state-of-the-art on 50 of 66 TDC tasks; beats specialist models on 26 of them.

## Sizes

- **2B predict**: prediction-only, narrow prompt format.
- **9B predict** + **9B chat**: prediction plus conversational reasoning.
- **27B predict** + **27B chat**: same, larger.

## Model card

- https://developers.google.com/health-ai-developer-foundations/txgemma/model-card
- DeepMind: https://deepmind.google/models/gemma/txgemma/
- Paper: https://deepmind.google/research/publications/153799/

## Prompting modes

**Prediction mode** (all sizes): structured TDC-format prompt with instruction + context + question + optional few-shot. Output is a short prediction (sometimes a single token or a float).

**Conversational mode** (9B, 27B): chat-template interactions, can explain reasoning behind predictions.

## Minimum invocation: prediction

```python
from transformers import pipeline

# Load the prediction-tuned checkpoint on the GPU.
pipe = pipeline(
    "text-generation",
    model="google/txgemma-27b-predict",
    device="cuda",
)

# TDC-style prompt: instructions, context, question, then an open "Answer:".
prompt = (
    "Instructions: Predict whether the molecule can penetrate the blood-brain barrier.\n"
    "Context: Blood-brain barrier penetration is an important property for CNS drugs.\n"
    "Question: Given the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, "
    "predict BBB penetration. Answer with 'Yes' or 'No'.\n"
    "Answer:"
)

# The answer is short, so a small token budget is enough.
out = pipe(prompt, max_new_tokens=8)

# By default generated_text holds the prompt followed by the completion.
print(out[0]["generated_text"])
```

## License

Health AI Developer Foundations: same terms as MedGemma. Non-clinical, research-use.

## When to choose it over base Gemma 4

- You're doing **drug-discovery research** and need TDC-format predictions out of the box.
- You want **SMILES-aware reasoning** without a custom cheminformatics stack.


Almost never chosen for general-purpose work. TxGemma's value is the training data, not the base model.

## Homelab fit

Zero. Noted for completeness.
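
## Sketch: assembling a prediction-mode prompt

The prediction-mode layout (instruction + context + question + optional few-shot) can be put together with a small helper. This is a minimal sketch, not the official TDC tooling: `build_prediction_prompt` is a hypothetical name, and the few-shot layout (worked `Question:`/`Answer:` pairs placed before the final question) is an assumption. Check the model card for the exact per-task templates.

```python
from typing import Iterable, Tuple


def build_prediction_prompt(
    instructions: str,
    context: str,
    question: str,
    few_shot: Iterable[Tuple[str, str]] = (),
) -> str:
    """Assemble an Instructions/Context/Question/Answer prompt.

    Hypothetical helper. The few-shot layout used here is an assumption
    made for illustration, not a documented TxGemma format.
    """
    lines = [f"Instructions: {instructions}", f"Context: {context}"]
    # Optional worked examples, each rendered as a Question/Answer pair.
    for shot_question, shot_answer in few_shot:
        lines.append(f"Question: {shot_question}")
        lines.append(f"Answer: {shot_answer}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_prediction_prompt(
    instructions="Predict whether the molecule can penetrate the blood-brain barrier.",
    context="Blood-brain barrier penetration is an important property for CNS drugs.",
    question=(
        "Given the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, "
        "predict BBB penetration. Answer with 'Yes' or 'No'."
    ),
)
print(prompt)
```

With no few-shot pairs, the helper reproduces the prompt string used in the minimum invocation, so the result can be passed straight to `pipe(prompt, max_new_tokens=8)`.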