diff --git a/README.md b/README.md index d4bf567..76d9953 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ Research corpus and implementation guidance for Google Gemma 4, based on product | `CORPUS_capabilities.md` | Modalities (vision, audio, video, tools), what it can/can't do | When scoping what Gemma 4 can handle | | `CORPUS_benchmarks.md` | Full benchmark table vs Gemma 3, arena scores, agentic scores | When comparing Gemma 4 to alternatives | | `CORPUS_tool_calling_format.md` | Native token format + JSON API format for function calling | When implementing tool calling | +| `tooling/` | **Canonical upstream tooling** — real scripts, notebooks, model cards, and configs pulled from Google / HF / framework maintainers (147 files). Subdirs: `google-official/`, `huggingface/`, `inference-frameworks/`, `gemma-family/`, `fine-tuning/`. See `tooling/README.md` for index and findings that update the older `CORPUS_*` docs | When you need authoritative source material — model cards, chat templates, fine-tuning recipes, serving commands for vLLM / llama.cpp / MLX, or to scope a specialized sibling (ShieldGemma, EmbeddingGemma, etc.) | ## Source Projects diff --git a/tooling/README.md b/tooling/README.md new file mode 100644 index 0000000..54bb296 --- /dev/null +++ b/tooling/README.md @@ -0,0 +1,50 @@ +# Gemma 4 — Canonical Tooling Corpus + +Actual scripts, notebooks, model cards, and configs downloaded from Google, Hugging Face, and the canonical framework maintainers. Populated 2026-04-18 by parallel research across five lanes. 147 files, ~14 MB. + +**Triage: read the subdirectory README that matches your task, not this one.** This file is an index. 
+ +## Directory map + +| Dir | What's there | When to open it | +|-----|--------------|-----------------| +| `google-official/` | `google-deepmind/gemma` JAX/Flax examples, `google/gemma_pytorch` scripts, `gemma.cpp` README + `gemma_api_server` docs, `google-gemma/cookbook` notebooks, official ai.google.dev HTML snapshots, Gemma 3 tech report PDF | Before trusting any non-Google source; when you need the authoritative prompt format or function-calling spec | +| `huggingface/` | All 8 `google/gemma-4-*` model cards, chat-template `.jinja` files, `tokenizer_config.json` (with response-schema regex), transformers `gemma4/` source, official Gemma 4 Spaces `app.py`, HF launch blog posts | Before writing any transformers / `processor` integration; for the canonical chat-template handling | +| `inference-frameworks/` | Comparison table across vLLM / llama.cpp / MLX / Keras-hub / TGI / Gemini API / Vertex AI. Real launch commands in `run_commands.sh`, 9 code snippets under `snippets/` | When picking a non-Ollama runtime; when you need audio/video input (Ollama doesn't expose it) | +| `gemma-family/` | 12 per-variant briefs: ShieldGemma 2, CodeGemma, PaliGemma 2, RecurrentGemma, DataGemma, MedGemma, TxGemma, EmbeddingGemma, TranslateGemma, FunctionGemma, DolphinGemma, SignGemma + `index.md` | When scoping a project that needs a specialized sister model (embeddings, safety, vision-grounded, translation, tool routing) | +| `fine-tuning/` | Unsloth Gemma 4 notebooks (text/vision/audio/GRPO), Axolotl Gemma 4 YAMLs (including 26B-A4B MoE), TRL reference scripts, Google cookbook fine-tune notebooks, `recipe-recommendation.md` with Seth's homelab-specific path | Before spending a dollar on cloud GPU or starting any Gemma 4 fine-tune | + +## Findings that update / contradict the existing corpus + +These are real gaps worth patching into `SYNTHESIS.md`, `GOTCHAS.md`, or `CORPUS_tool_calling_format.md`. Flagged here, not applied — the user asked for research, not a rewrite. + +1. 
**Prompt-token format changed in Gemma 4.** Gemma 1/2/3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses asymmetric pipe-brackets: `<|turn>user\n ... <turn|>`. Also new: `<|think|>`, `<|channel>thought...`, `<|tool>`, `<|tool_call>`, `<|tool_response>` (+ inverses), `<|image>`, `<|audio>`, and string delimiter `<|"|>`. The existing `CORPUS_tool_calling_format.md` documents the tool tokens but doesn't reflect the turn-token change or the thinking/channel tokens. Canonical source: `huggingface/model-cards/gemma-4-31B-it-chat_template.jinja` and `google-official/docs/ai-google-dev_prompt_formatting_gemma4.html`. + +2. **`google/gemma_pytorch` is abandoned for Gemma 4.** Last push 2025-05-30; the variants validator rejects Gemma 4 IDs. It is no longer a valid PyTorch reference; use HF `transformers` or `google-deepmind/gemma` (JAX/Flax) instead. + +3. **`gemma.cpp` ships a Gemini-API-compatible local HTTP server** (`gemma_api_server`, endpoint `POST /v1beta/models/:generateContent`, SSE streaming). This is a Google-authored alternative to Ollama that speaks the real Gemini REST API, possibly the single most interesting discovery in this research pass. See `google-official/gemma-cpp/API_SERVER_README.md`. + +4. **Transformers exposes `AutoModelForMultimodalLM` (new AutoClass)**, not `AutoModelForCausalLM`. It also exposes `processor.parse_response(..., response_schema=...)` driven from `tokenizer_config.json`, which replaces the hand-rolled regex in the current `CORPUS_tool_calling_format.md`. Pin: `transformers>=5.5.4`. + +5. **Gemma 4 breaks Flash Attention.** FA2's max head_dim is 256, FA4's is 128, and Gemma 4's global head_dim is 512. Use SDP or Flex Attention. Axolotl's shipped Gemma 4 configs set `sdp_attention: true`, except `31b-qlora-flex.yaml`, which sets `flex_attention: true`. This belongs in `GOTCHAS.md`. + +6. **The 26B variant is a MoE**: `gemma-4-26B-A4B` (A4B = 4B active per token). Quantization rules differ: Unsloth says use 16-bit LoRA, not 4-bit QLoRA, for acceptable quality. 
Axolotl's ScatterMoE + expert-LoRA config is the only tool validated for 4-bit MoE training. Worth a line in `CORPUS_ollama_variants.md`. + +7. **No Gemma 4 technical report PDF exists yet** as of 2026-04-18. DeepMind repo says "Gemma 4 (Coming soon)". Gemma 3 report (downloaded at `google-official/tech-report/Gemma3Report.pdf`) remains the closest authoritative family citation. + +8. **No `google/gemma-4-*` specialized siblings yet** — ShieldGemma, CodeGemma, PaliGemma, MedGemma, DataGemma are all still on Gemma 2 or 3 base. Historical lag is 3–6 months; expect siblings-on-4 mid-to-late 2026. + +9. **No Gemma-4-specific TRL script in `huggingface/trl` yet.** HF blog says "fully supported," but the SFT/DPO/GRPO examples are still on Gemma 3 model IDs. Drop-in with `model_id` swap works. Only Gemma-4-dedicated TRL example today is `huggingface-gemma-recipes/carla_vlm_gemma.py` (VLM GRPO). + +10. **HF Spaces `app.py` files are the shortest Gemma 4 inference examples** — Google and HF both use them as ref. See `huggingface/spaces/huggingface-projects_gemma-4-{31b,e4b}-it-app.py`. + +## Immediate homelab plug-ins (from the gemma-family research) + +- **EmbeddingGemma (308M)** — 100+ languages, Matryoshka to 128d. Drop-in upgrade from `nomic-embed-text` on both hosts. +- **FunctionGemma (270M)** — cheap tool-router in front of `mortdecai:*` (latency win on hot path). +- **PaliGemma 2 3B-448** — vision grounding with bbox output for AI_Visualizer / AI visualizer CT 167 alongside SDXL. +- **TranslateGemma 4B** — useful for the family history agent (German/Polish sources). + +## Source-url discipline + +Every URL in the subdirectory READMEs was fetched and verified, not reconstructed from training. If a downloaded file is wrong, `git log` will show when it was pulled; the agent transcripts are the record of the source commit. Upstream repos can and do rename paths (see: `google-gemini/gemma-cookbook` → `google-gemma/cookbook`). Re-verify before citing externally. 
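Finding 5 above reduces to a head-dimension comparison. The following is a minimal Python sketch of that check, assuming only the limits quoted in the finding (FA2 at 256, FA4 at 128, Gemma 4 global head_dim at 512). The function names and the `"sdp"` fallback label are illustrative and are not part of any library's API.

```python
# Maximum head_dim per Flash Attention generation, as quoted in finding 5.
FLASH_ATTENTION_MAX_HEAD_DIM = {"FA2": 256, "FA4": 128}

# Gemma 4's global-attention head_dim, also from finding 5.
GEMMA4_GLOBAL_HEAD_DIM = 512


def usable_flash_backends(head_dim: int) -> list[str]:
    """Return the Flash Attention generations whose limit covers head_dim."""
    return [
        name
        for name, max_dim in FLASH_ATTENTION_MAX_HEAD_DIM.items()
        if head_dim <= max_dim
    ]


def pick_attention(head_dim: int) -> str:
    """Prefer a Flash Attention backend; otherwise fall back to SDP.

    The returned strings are hypothetical labels for this sketch only.
    """
    usable = usable_flash_backends(head_dim)
    return usable[0] if usable else "sdp"


if __name__ == "__main__":
    # With head_dim=512 neither FA2 nor FA4 qualifies, so the sketch falls back.
    print(usable_flash_backends(GEMMA4_GLOBAL_HEAD_DIM))
    print(pick_attention(GEMMA4_GLOBAL_HEAD_DIM))
```

The same comparison explains why the fallback is needed for every Gemma 4 size that uses the 512-wide global heads: no Flash Attention limit listed above reaches 512.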
diff --git a/tooling/fine-tuning/README.md b/tooling/fine-tuning/README.md new file mode 100644 index 0000000..287562a --- /dev/null +++ b/tooling/fine-tuning/README.md @@ -0,0 +1,281 @@ +# Gemma 4 Fine-Tuning Tooling — Index + +Research captured 2026-04-18. All downloads verified against upstream repos. + +## TL;DR + +| Tool | Gemma 4 coverage | GPU floor (LoRA) | GPU floor (full FT) | Best at | +|------|------------------|------------------|---------------------|---------| +| **Unsloth** | Full parity — all 4 sizes, text/vision/audio/GRPO/RL | E2B: 8 GB, E4B: 17 GB, 26B A4B: ~40 GB, 31B QLoRA: 22 GB | Not recommended locally | **Fastest path**, Google-blessed, free Colab | +| **TRL** | Partial — no `sft_gemma4.py` yet; `sft_gemma3.py` + `AutoModelForImageTextToText` works | Same as Unsloth w/ `load_in_4bit` | 2x H100 min for 31B | Research-grade control, DPO/GRPO/online RL, VLM GRPO on Gemma 4 (CARLA) | +| **Axolotl** | **Native Gemma 4 configs shipped** (`examples/gemma4/`) | Single 5090 (32 GB) for 26B A4B QLoRA validated | >80 GB, "not tested" per README | Declarative YAML, multi-GPU FSDP, MoE expert LoRA | +| **Google cookbook** | `docs/core/*` notebooks default to `google/gemma-4-E2B` | Depends on Colab tier | L4 (22 GB) for E4B QLoRA | Canonical baseline, paired with ai.google.dev docs | +| **HF gemma-recipes** | Inference + one GRPO VLM script (CARLA) | E2B on T4 | — | VLM GRPO with tool-calling environment | +| **Ollama** | Serves fine-tuned Gemma 4 via Modelfile `ADAPTER` | — | — | Final serving step | + +**Recommendation for Seth: Unsloth.** See `recipe-recommendation.md`. + +--- + +## 1. 
Unsloth (`unsloth/`) + +**Upstream:** `unslothai/notebooks`, `unslothai/unsloth` +**License:** LGPL-3.0 (notebooks), Apache-2.0 (library) +**Published Gemma 4 Dynamic quants:** +- `unsloth/gemma-4-{E2B,E4B,31B,26B-A4B}-{,it}-unsloth-bnb-4bit` (dynamic 4-bit) +- `unsloth/gemma-4-{E2B,E4B,31B,26B-A4B}-it-GGUF` (GGUF for inference) +- Collection: https://huggingface.co/collections/unsloth/gemma-4 + +**Downloaded files (local paths under this directory):** +- `unsloth/notebooks/Gemma4_(E2B)-Text.ipynb`: **canonical SFT notebook, T4-compatible** +- `unsloth/notebooks/Gemma4_(E4B)-Text.ipynb`: 10 GB VRAM, higher accuracy +- `unsloth/notebooks/Gemma4_(26B_A4B)-Text.ipynb`: MoE SFT (needs A100+) +- `unsloth/notebooks/Gemma4_(31B)-Text.ipynb`: dense 31B SFT +- `unsloth/notebooks/Gemma4_(E2B|E4B|26B_A4B|31B)-Vision.ipynb`: vision SFT w/ `UnslothVisionDataCollator` +- `unsloth/notebooks/Gemma4_(E2B|E4B)-Audio.ipynb`: audio SFT (E2B/E4B only; 31B/26B have no audio encoder) +- `unsloth/notebooks/Gemma4_(E2B)_GRPO.ipynb`: GRPO RL w/ Python reward funcs +- `unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_{2048,Sudoku}_Game.ipynb`: game-playing RL +- `unsloth/python_scripts/*.py`: the same notebooks exported as `.py` scripts (easier to grep/modify) +- `unsloth/kaggle/Gemma4_(31B)-Text.ipynb`, `unsloth/kaggle/Gemma4_(E4B)-Text.ipynb`: Kaggle-flavored variants +- `unsloth/docs/unsloth-README.md`: top-level Unsloth README + +**Upstream URLs (useful to share):** +- SFT E4B Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E4B)-Text.ipynb +- GRPO Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E2B)_GRPO.ipynb +- Unsloth Gemma 4 docs: https://unsloth.ai/docs/models/gemma-4/train + +### Unsloth chat-template & masking detail (CRITICAL for Gemma 4) + +Gemma 4 does **not** use Gemma 3's `<start_of_turn>` / `<end_of_turn>`. The new format is: + +``` +<|turn>user +Hello<turn|> +<|turn>model +Hey there!<turn|> 
+``` + +Unsloth's helper: +```python +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template(tokenizer, chat_template = "gemma-4") # literal "gemma-4", not "gemma4" +``` + +Response-only masking (matches Unsloth's convention; everything *before* `response_part` is loss-masked): +```python +from unsloth.chat_templates import train_on_responses_only +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) +``` + +`<bos>` gotcha: `apply_chat_template` prepends `<bos>`; Unsloth's `formatting_prompts_func` strips it with `.removeprefix('<bos>')` because the SFTTrainer's data collator adds its own, and a doubled `<bos>` silently degrades training. + +**Tool tokens (`<|tool>`, `<|tool_call>`, `<|tool_response>`, `<|"|>`) are *not* masked** in Unsloth's default setup; they flow through as plain text inside user/assistant turns. If you're fine-tuning on tool-call data, include full `<|tool_call>...` markup in the assistant `content` field; the template doesn't need a special `role=tool` branch. + +### Unsloth MoE note + +For 26B A4B (128 experts): Unsloth explicitly recommends **bf16/16-bit LoRA, NOT 4-bit QLoRA** ("MoE QLoRA not recommended, dense 31B is fine"). Their notebook uses `load_in_4bit = True` at >40 GB but the docs flag this as suboptimal. + +--- + +## 2. TRL (`trl/`) + +**Upstream:** `huggingface/trl` +**License:** Apache-2.0 + +**Gemma 4-specific scripts:** NONE in `examples/scripts/` as of 2026-04-18. The canonical Gemma 4 TRL example lives in `huggingface-gemma-recipes/scripts/carla_vlm_gemma.py` (see section 5). + +**Closest-match Gemma 3 scripts downloaded (drop-in for Gemma 4: change `model_id` to `google/gemma-4-*-it`, keep `AutoModelForImageTextToText`):** +- `trl/sft_gemma3.py`: **use this as the Gemma 4 SFT template**. Pure text SFT (Codeforces-COTS). +- `trl/sft_vlm_gemma3.py`: vision SFT template (uses `AutoModelForImageTextToText`, `all-linear` LoRA). 
+- `trl/sft.py`, `trl/trl_scripts_sft.py` — the generic SFTTrainer wrappers. +- `trl/sft_vlm.py` — model-agnostic VLM SFT. +- `trl/dpo.py` — DPO (1-liner using TrlParser). +- `trl/grpo_agent.py`, `trl/grpo_vlm.py` — GRPO with tool-calling environments. +- `trl/sft_tiny_aya_tool_calling.py` — tool-calling SFT pattern. + +**Chat template / masking detail:** TRL's `SFTTrainer` uses `tokenizer.apply_chat_template` end-to-end and delegates to the tokenizer's built-in Jinja template. For `google/gemma-4-*-it`, that template already produces `<|turn>user…`. TRL supports `completion_only_loss` via the `SFTConfig(assistant_only_loss=True)` flag (TRL ≥ 0.22), which masks anything before the assistant turn — no manual `instruction_part` plumbing needed. + +### Official HF blog says (verbatim): +> "Gemma 4 is fully supported for fine-tuning with TRL. … we have prepared an example on how to fine-tune Gemma 4 with TRL on Vertex AI using SFT, to showcase how to extend the function calling capabilities, **whilst freezing both the vision and audio towers**." +(see `huggingface-recipes/hf-blog-gemma4.md` §634-687) + +--- + +## 3. Axolotl (`axolotl/`) + +**Upstream:** `axolotl-ai-cloud/axolotl`, `examples/gemma4/` +**License:** Apache-2.0 +**Gemma 4 status:** **Native support shipped**, day-one-class parity. + +**Downloaded files:** +- `axolotl/README.md` — official Axolotl Gemma 4 guide +- `axolotl/31b-qlora.yaml` — 31B dense QLoRA, 1x80GB @ ~44 GB VRAM +- `axolotl/31b-qlora-flex.yaml` — 31B dense QLoRA + Flex Attention, 1x80GB @ ~26 GB (40% less VRAM, 50% throughput cost) +- `axolotl/26b-a4b-moe-qlora.yaml` — 26B MoE QLoRA + ScatterMoE expert-quantized + Expert-LoRA. 
Validated: 50 steps FineTome, loss 8.8→1.8, single RTX 5090 (32 GB), 21 GiB peak +- `axolotl/e2b-vision-lora.yaml` — E2B vision LoRA with `freeze_mm_modules: true` + +**Run command (from Axolotl README):** +```bash +axolotl train examples/gemma4/26b-a4b-moe-qlora.yaml +axolotl train examples/gemma4/31b-qlora.yaml +axolotl train examples/gemma4/31b-qlora-flex.yaml +``` + +### Axolotl chat template & masking detail + +```yaml +chat_template: gemma4 +datasets: + - path: mlabonne/FineTome-100k + type: chat_template + field_messages: conversations + message_property_mappings: + role: from + content: value +``` +`chat_template: gemma4` (no dash — Axolotl's key is different from Unsloth's `"gemma-4"`). The template applies Gemma 4 turn tokens (`<|turn>user … `). Masking is handled automatically by `type: chat_template` — only the assistant turn counts toward loss. + +### Axolotl hard limitations for Gemma 4 (from their README) + +- **Flash Attention OFF.** FA2 caps head_dim at 256; FA4 at 128; Gemma 4's `global_head_dim=512` exceeds both. **Use SDP or Flex Attention.** (`sdp_attention: true` in every yaml.) +- **LoRA kernels OFF.** Due to Gemma 4's shared-KV layers (last N layers reuse K/V tensors): `lora_mlp_kernel: false`, `lora_qkv_kernel: false`, `lora_o_kernel: false`. +- **`lora_target_linear` is incompatible** for multimodal. You MUST use `lora_target_modules` with the regex (see below) to restrict LoRA to the text decoder and NOT the vision/audio encoders. + +Axolotl's canonical regex restricts LoRA to text layers only: +```regex +model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj +``` + +For 26B A4B MoE, additionally target expert 3D tensors: +```yaml +lora_target_parameters: + - experts.gate_up_proj + - experts.down_proj +``` + +--- + +## 4. 
Google Cookbook (`google-cookbook/`) + +**Upstream:** `google-gemma/cookbook`, `docs/core/` +**License:** Apache-2.0 +**Gemma 4 status:** The `docs/core/*.ipynb` fine-tuning notebooks default to `google/gemma-4-E2B` as `model_id` — they ARE the Gemma 4 path, despite generic filenames. + +**Downloaded files:** +- `google-cookbook/huggingface_text_finetune_qlora.ipynb` — **text-to-SQL QLoRA tutorial** (gretel-synthetic-text-to-sql dataset, `philschmid/gretel-synthetic-text-to-sql`). This is the one ai.google.dev links to as the "official" fine-tune path. +- `google-cookbook/huggingface_text_full_finetune.ipynb` — full-weights fine-tune variant +- `google-cookbook/huggingface_vision_finetune_qlora.ipynb` — vision QLoRA on product descriptions +- `google-cookbook/lora_tuning.ipynb` — LoRA concepts tutorial +- `google-cookbook/function-calling-gemma4.ipynb` — official Google function-calling notebook (not a fine-tune, but the authoritative reference for tool-call tokens) +- `google-cookbook/Gemma_4_HDP_Agentic_Security.ipynb` + `Gemma_4_HDP_README.md` — full-app fine-tune example (agentic security) + +**Upstream URLs:** +- https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora +- https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora +- https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 + +### Google cookbook chat template & masking detail (VERY IMPORTANT) + +The cookbook notebooks use TRL's `SFTTrainer` with standard `messages` list (`role`/`content`) — chat-template is applied automatically by the tokenizer's built-in Jinja. No manual `instruction_part`/`response_part`. 
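As an illustration of the `messages` layout described above, here is a minimal Python sketch that maps one text-to-SQL style record onto a `role`/`content` list. The input field names (`schema`, `question`, `sql`) and the prompt wording are assumptions made for this example, not the actual columns or prompt used by the cookbook notebook. Note that no turn tokens are written by hand, because the tokenizer's built-in template is what adds them.

```python
def to_messages(record: dict) -> dict:
    """Convert one raw record into the chat `messages` layout.

    The keys `schema`, `question`, and `sql` are hypothetical field names
    chosen for this sketch; adapt them to the dataset actually in use.
    """
    user_text = (
        f"Schema:\n{record['schema']}\n\n"
        f"Question: {record['question']}"
    )
    return {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": record["sql"]},
        ]
    }


# A made-up record, used only to show the resulting structure.
example_record = {
    "schema": "CREATE TABLE orders (id INT, total REAL);",
    "question": "What is the total revenue?",
    "sql": "SELECT SUM(total) FROM orders;",
}

if __name__ == "__main__":
    converted = to_messages(example_record)
    for message in converted["messages"]:
        print(message["role"], "->", message["content"])
```

Keeping the conversion this thin leaves all formatting decisions to the chat template, which is the point of the paragraph above: the training data carries roles and content only.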
+ +**The non-obvious detail** is the `LoraConfig`: +```python +peft_config = LoraConfig( + lora_alpha=16, lora_dropout=0.05, r=16, bias="none", + target_modules="all-linear", + task_type="CAUSAL_LM", + modules_to_save=["lm_head", "embed_tokens"], # NOTE + ensure_weight_tying=True, # NOTE +) +``` +`modules_to_save=["lm_head","embed_tokens"]` + `ensure_weight_tying=True` is required because **Gemma 4 introduced new special tokens (`<|turn>`, `<|tool>`, `<|tool_call>`, `<|tool_response>`, `<|"|>`) that need their embeddings to be trainable in a fine-tune.** PEFT 0.15+ added `ensure_weight_tying` specifically for this case. Skipping it causes the adapter to see frozen random embeddings for the new tokens and training silently underperforms. + +For vision, Google's cookbook uses plain `target_modules="all-linear"` (NO `exclude_modules`) — meaning it *does* train LoRA adapters on the vision tower. This is a different tradeoff from Axolotl (`freeze_mm_modules: true`) and from TRL's CARLA recipe (`exclude_modules=["vision_tower", "multi_modal_projector"]`). Pick based on whether your task needs the vision encoder to adapt (e.g., new image domain) or just the text decoder (most cases). + +--- + +## 5. HuggingFace gemma-recipes (`huggingface-recipes/`) + +**Upstream:** `huggingface/huggingface-gemma-recipes` +**License:** Apache-2.0 + +**Downloaded files:** +- `huggingface-recipes/carla_vlm_gemma.py` — **The canonical TRL + Gemma 4 example.** GRPO VLM training in a CARLA driving environment with tool calls. Shows `exclude_modules=["vision_tower", "multi_modal_projector"]`, `chat_template_kwargs={"enable_thinking": False}`, `max_tool_calling_iterations=10`. +- `huggingface-recipes/Gemma4_(E2B)-Multimodal.ipynb` — **inference-only** multimodal demo (vision, video, audio, function calling, object detection). Not a fine-tune but necessary reference for the input format the training data must match. 
+- `huggingface-recipes/README.md` — HF's top-level recipes index +- `huggingface-recipes/hf-blog-gemma4.md` — the HF blog post's raw markdown (§630-707 is the fine-tuning section) + +**Run command for the CARLA VLM RL example:** +```bash +pip install git+https://github.com/huggingface/trl.git +python examples/scripts/openenv/carla_vlm_gemma.py \ + --env-urls https://sergiopaniego-carla-env.hf.space https://sergiopaniego-carla-env-2.hf.space \ + --model google/gemma-4-E2B-it +``` + +**Known gap:** HF's gemma-recipes repo has *fine-tuning* notebooks for Gemma 3 and Gemma 3n (free T4 Colab) but **no pure-SFT Gemma 4 fine-tuning notebook yet** — the Gemma 4 Colab is inference only. Their blog points users to Unsloth Studio for the easy path. + +--- + +## 6. Ollama / llama.cpp LoRA serving (`ollama-llamacpp/`) + +**Downloaded:** `ollama-llamacpp/ollama-import-lora.md` — distilled from https://docs.ollama.com/import (2026-04-18 fetch). + +**Short answer:** Yes, you can serve a Gemma 4 LoRA via Ollama. Two paths: + +1. **Merge then serve (simpler, recommended):** `model.save_pretrained_merged("out", tokenizer, save_method="merged_16bit")` → `llama.cpp/convert_hf_to_gguf.py` → `llama.cpp/quantize` to Q4_K_M → `ollama create mymodel -f Modelfile` with `FROM ./gemma4-mortdecai.gguf`. +2. **Adapter-only serve:** `llama.cpp/convert_lora_to_gguf.py` on the PEFT directory → Modelfile with `FROM gemma4:e4b-it-q8_0` + `ADAPTER ./adapter.gguf`. + +Ollama's docs list supported architectures as Llama/Mistral/Gemma 1-2 — Gemma 4 isn't *explicitly* listed, but llama.cpp has day-one Gemma 4 support and in practice the path works. (Vision-adapter serving via Ollama is still a grey area.) + +--- + +## 7. 
Datasets the canonical tutorials pair with Gemma 4 + +| Tutorial | Dataset | Format | Notes | +|----------|---------|--------|-------| +| Unsloth Gemma4 E4B Text | `mlabonne/FineTome-100k` | ShareGPT-style `conversations` field | Also the Axolotl default | +| Unsloth Gemma4 GRPO | Synthetic kernel-optimization prompts in-notebook | Python reward funcs | RL w/ `function_works` / `check_only_stdlib_imports` | +| Unsloth Gemma4 Vision | `unsloth/LaTeX_OCR` | HF image-text pairs | Demonstrates `UnslothVisionDataCollator` | +| Google cookbook text QLoRA | `philschmid/gretel-synthetic-text-to-sql` | chat `messages` list | Google's "official" demo dataset for Gemma 4 | +| Google cookbook vision QLoRA | `philschmid/amazon-product-descriptions-vlm` | image + text pairs | Product-description generation | +| Axolotl Gemma 4 (all sizes) | `mlabonne/FineTome-100k` | `type: chat_template` | Validated in axolotl README | +| Axolotl E2B vision LoRA | `HuggingFaceH4/llava-instruct-mix-vsft` | vision-language SFT | Same as HF's VLM template | +| TRL sft_gemma3 (transfers) | `open-r1/codeforces-cots` | `messages` list | Chain-of-thought coding | +| TRL carla_vlm_gemma (Gemma 4 VLM GRPO) | CARLA simulator (live) | environment rollouts | Multimodal tool responses | + +No one uses Alpaca or UltraChat as the canonical Gemma 4 pair. **FineTome-100k is the unofficial standard** — both Unsloth and Axolotl default to it. + +--- + +## 8. 
Chat-template-and-masking matrix (the debugging cheat sheet) + +| Framework | chat_template key | Turn tokens | Response masking API | BOS handling | +|-----------|-------------------|-------------|----------------------|--------------| +| Unsloth | `"gemma-4"` | `<|turn>role\n...` | `train_on_responses_only(instruction_part="<|turn>user\n", response_part="<|turn>model\n")` | Strip `<bos>` manually with `.removeprefix('<bos>')` before passing to trainer | +| TRL | tokenizer's built-in Jinja (no key needed) | same | `SFTConfig(assistant_only_loss=True)` | Tokenizer handles automatically | +| Axolotl | `chat_template: gemma4` (no dash) | same | automatic via `type: chat_template` | Automatic | +| Google cookbook | tokenizer built-in Jinja | same | automatic via `SFTTrainer` + `messages` | Automatic | + +Tool tokens (`<|tool>`, `<|tool_call>`, `<|tool_response>`, `<|"|>`) ride inside message content; none of the frameworks mask them specially, and none provide a `role="tool"` branch in the default template. If you're training on tool-call data, put the complete `<|tool_call>call:{...}` block in the assistant message `content`. + +Also: **all Gemma 4 fine-tunes should set `modules_to_save=["lm_head","embed_tokens"]` + `ensure_weight_tying=True`** in LoraConfig if you're using PEFT directly, because the new special-token embeddings need to be trainable. Unsloth and Axolotl handle this for you; naïve TRL + PEFT scripts do NOT by default. + +--- + +## What's NOT here (and why) + +- **Kaggle/Colab free-tier notebooks as a separate category**: the Unsloth notebooks *are* the free-tier notebooks. E2B Text runs on a free T4; 31B/26B-A4B need A100 Colab Pro. I pulled 2 Kaggle-flavored variants to `unsloth/kaggle/` for completeness. +- **Google DeepMind's JAX/Flax Gemma 4 fine-tune script**: the `google-deepmind/gemma` repo ships inference/reference code, not an SFT script. Google's *canonical* fine-tune path is the HF+TRL notebook in `google-gemma/cookbook` (above), NOT JAX. 
If you want JAX, see the archived `.archive/Gemma/[Gemma_1]Finetune_distributed.ipynb` pattern — not ported to Gemma 4. +- **Full-weights 31B fine-tuning commands** — Axolotl's README says "heavy and has not been tested." Skip unless Seth rents an 8×H100 pod. +- **Prompt engineering / inference-only notebooks** — per scope. + +## See also + +- `recipe-recommendation.md` — which tool Seth should actually use for his homelab, with the exact command. +- `../../GOTCHAS.md` §"Fine-Tuning Ecosystem Issues" — day-one issues (required `mm_token_type_ids` field, Gemma4ClippableLinear PEFT issue, E2B/E4B training loss 13-15 being normal). +- `../../CORPUS_tool_calling_format.md` — the 6 tool-calling special tokens. diff --git a/tooling/fine-tuning/axolotl/26b-a4b-moe-qlora.yaml b/tooling/fine-tuning/axolotl/26b-a4b-moe-qlora.yaml new file mode 100644 index 0000000..e7bdb6f --- /dev/null +++ b/tooling/fine-tuning/axolotl/26b-a4b-moe-qlora.yaml @@ -0,0 +1,93 @@ +# Gemma 4 26B-A4B MoE QLoRA with ScatterMoE kernels +# +# Validated: 50 steps on FineTome-100k, loss 8.8 -> 1.8, single RTX 5090 (32GB) +# torch_compile=true: 21 GiB peak VRAM, ~230 tok/s, 336s total +# +# Key notes: +# - Max sequence length on 32GB GPU: 2048 (micro_batch_size=1, SDP attention). +# 4096 seq_len OOMs due to head_dim=512 math SDP materializing full score matrix. +# Use 48GB+ GPUs for longer sequences or multi-GPU with FSDP. 
+ +base_model: google/gemma-4-26B-A4B + +plugins: + - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin + - axolotl.integrations.kernels.KernelsPlugin + - axolotl.integrations.liger.LigerPlugin +use_kernels: true +use_scattermoe: true +experts_implementation: scattermoe +torch_compile: true +liger_layer_norm: true +liger_rope: true +liger_rms_norm: true +liger_glu_activation: true +liger_rms_norm_gated: true +strict: false + +chat_template: gemma4 +datasets: + - path: mlabonne/FineTome-100k + type: chat_template + split: train[:10%] + field_messages: conversations + message_property_mappings: + role: from + content: value +val_set_size: 0.05 +output_dir: ./outputs/gemma4-26b-a4b-qlora + +sequence_len: 2048 +sample_packing: true + +load_in_4bit: true +quantize_moe_experts: true +adapter: qlora +lora_r: 16 +lora_alpha: 32 +lora_dropout: 0 + +# Restrict LoRA to text backbone only (skip vision/audio encoders) +# using regex to match only the text decoder attention projections. +lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' + +# MoE expert LoRA (3D Parameter tensors, not nn.Linear) +lora_target_parameters: + - experts.gate_up_proj + - experts.down_proj + +lora_mlp_kernel: false +lora_qkv_kernel: false +lora_o_kernel: false + +bnb_config_kwargs: + bnb_4bit_use_double_quant: true + +wandb_project: +wandb_entity: +wandb_watch: +wandb_name: +wandb_log_model: + +gradient_accumulation_steps: 4 +micro_batch_size: 1 +num_epochs: 1 +optimizer: adamw_torch_8bit +lr_scheduler: cosine +learning_rate: 0.0002 + +bf16: auto +tf32: true + +gradient_checkpointing: true +activation_offloading: true +logging_steps: 1 + +# FA2 not supported +sdp_attention: true + +warmup_ratio: 0.1 +evals_per_epoch: 4 +saves_per_epoch: 1 +weight_decay: 0.0 +special_tokens: diff --git a/tooling/fine-tuning/axolotl/31b-qlora-flex.yaml b/tooling/fine-tuning/axolotl/31b-qlora-flex.yaml new file mode 100644 index 
0000000..8456c9c --- /dev/null +++ b/tooling/fine-tuning/axolotl/31b-qlora-flex.yaml @@ -0,0 +1,71 @@ +base_model: google/gemma-4-31B + +plugins: + - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin + - axolotl.integrations.liger.LigerPlugin +torch_compile: true +liger_layer_norm: true +liger_rope: true +liger_rms_norm: true +liger_glu_activation: true +liger_rms_norm_gated: true +strict: false + +chat_template: gemma4 +datasets: + - path: mlabonne/FineTome-100k + type: chat_template + split: train[:10%] + field_messages: conversations + message_property_mappings: + role: from + content: value +val_set_size: 0.05 +output_dir: ./outputs/gemma4-31b-qlora-flex + +sequence_len: 2048 +sample_packing: true + +load_in_4bit: true +adapter: qlora +lora_r: 16 +lora_alpha: 32 +lora_dropout: 0 + +# Restrict LoRA to text backbone only (skip vision/audio encoders) +lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' + +lora_mlp_kernel: false +lora_qkv_kernel: false +lora_o_kernel: false + +bnb_config_kwargs: + bnb_4bit_use_double_quant: true + +wandb_project: +wandb_entity: +wandb_watch: +wandb_name: +wandb_log_model: + +gradient_accumulation_steps: 4 +micro_batch_size: 1 +optimizer: adamw_torch_8bit +lr_scheduler: cosine +learning_rate: 0.0002 + +bf16: auto +tf32: true + +gradient_checkpointing: true +activation_offloading: true +logging_steps: 1 + +# FA not supported +flex_attention: true + +warmup_ratio: 0.1 +evals_per_epoch: 4 +saves_per_epoch: 1 +weight_decay: 0.0 +special_tokens: diff --git a/tooling/fine-tuning/axolotl/31b-qlora.yaml b/tooling/fine-tuning/axolotl/31b-qlora.yaml new file mode 100644 index 0000000..42086a4 --- /dev/null +++ b/tooling/fine-tuning/axolotl/31b-qlora.yaml @@ -0,0 +1,69 @@ +base_model: google/gemma-4-31B + +plugins: + - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin + - axolotl.integrations.liger.LigerPlugin +torch_compile: false 
+liger_layer_norm: true +liger_rope: true +liger_rms_norm: true +liger_glu_activation: true +liger_rms_norm_gated: true +strict: false + +chat_template: gemma4 +datasets: + - path: mlabonne/FineTome-100k + type: chat_template + split: train[:10%] + field_messages: conversations + message_property_mappings: + role: from + content: value +val_set_size: 0.05 +output_dir: ./outputs/gemma4-31b-qlora + +sequence_len: 2048 +sample_packing: true + +load_in_4bit: true +adapter: qlora +lora_r: 16 +lora_alpha: 32 +lora_dropout: 0 + +# Restrict LoRA to text backbone only (skip vision/audio encoders) +# using regex to match only the text decoder attention projections. +lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' + +bnb_config_kwargs: + bnb_4bit_use_double_quant: true + +wandb_project: +wandb_entity: +wandb_watch: +wandb_name: +wandb_log_model: + +gradient_accumulation_steps: 4 +micro_batch_size: 1 +num_epochs: 1 +optimizer: adamw_torch_8bit +lr_scheduler: cosine +learning_rate: 0.0002 + +bf16: auto +tf32: true + +gradient_checkpointing: true +activation_offloading: true +logging_steps: 1 + +# FA not supported +sdp_attention: true + +warmup_ratio: 0.1 +evals_per_epoch: 4 +saves_per_epoch: 1 +weight_decay: 0.0 +special_tokens: diff --git a/tooling/fine-tuning/axolotl/README.md b/tooling/fine-tuning/axolotl/README.md new file mode 100644 index 0000000..68274ee --- /dev/null +++ b/tooling/fine-tuning/axolotl/README.md @@ -0,0 +1,60 @@ +# Finetune Google's Gemma 4 with Axolotl + +[Gemma 4](https://huggingface.co/collections/google/gemma-4) is a family of multimodal models from Google. This guide covers how to train them with Axolotl. + +## Getting started + +1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). + +2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce training VRAM usage. 
+ +3. Run the finetuning example: + +```bash +# 26B MoE QLoRA (1x80GB @ ~50 GiB) +axolotl train examples/gemma4/26b-a4b-moe-qlora.yaml + +# 31B Dense QLoRA (1x80GB @ ~44 GiB) +axolotl train examples/gemma4/31b-qlora.yaml + +# 31B Dense QLoRA Flex Attn (1x80GB @ ~26 GiB) +axolotl train examples/gemma4/31b-qlora-flex.yaml +``` + +### MoE Expert Quantization & Expert LoRA (26B-A4B only) + +The 26B-A4B config uses ScatterMoE kernels via the transformers `ExpertsInterface` and quantizes expert weights on load. To learn about expert quantization, expert LoRA targeting, and related limitations, see the [MoE Expert Quantization](https://docs.axolotl.ai/docs/expert_quantization.html) docs. + +## Flex Attention + +Reduce VRAM usage by about 40% (at the cost of up to half the throughput) by setting the following options, as shown in `examples/gemma4/31b-qlora-flex.yaml`: + +```yaml +torch_compile: true +flex_attention: true +``` + +This works for both the MoE and Dense models. + +## Limitations + +- **Flash Attention**: FA2 (max head_dim=256) and FA4 (max head_dim=128) cannot support Gemma 4's `global_head_dim=512`. Use SDP or flex attention instead. +- **LoRA kernels**: Not supported due to KV-sharing layers. +- **lora_target_linear**: Incompatible with multimodal models; use `lora_target_modules` with a regex to restrict LoRA to the text backbone. + +### TIPS + +- Read more on how to load your own dataset in the [dataset loading docs](https://docs.axolotl.ai/docs/dataset_loading.html). +- You can run full finetuning by removing `adapter: qlora`, `load_in_4bit: true`, and `quantize_moe_experts: true` from the config. Full finetuning is memory-heavy and has not been tested. + +## Optimization Guides + +Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html). 
+ +## Related Resources + +- [Gemma 4 Blog](https://huggingface.co/blog/gemma4) +- [Axolotl Docs](https://docs.axolotl.ai) +- [Axolotl Website](https://axolotl.ai) +- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl) +- [Axolotl Discord](https://discord.gg/7m9sfhzaf3) diff --git a/tooling/fine-tuning/axolotl/e2b-vision-lora.yaml b/tooling/fine-tuning/axolotl/e2b-vision-lora.yaml new file mode 100644 index 0000000..c779aae --- /dev/null +++ b/tooling/fine-tuning/axolotl/e2b-vision-lora.yaml @@ -0,0 +1,62 @@ +# Gemma 4 E2B Vision LoRA +# +# Fine-tuning LM LoRA adapters on multimodal Gemma4 with vision/multimodal modules frozen. +# Uses the base ProcessingStrategy (auto-detects image_token from processor). + +base_model: google/gemma-4-E2B-it +processor_type: AutoProcessor +freeze_mm_modules: true + +plugins: + - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin +strict: false + +# Required for vision/multimodal training +skip_prepare_dataset: true +remove_unused_columns: false +sample_packing: false + +chat_template: gemma4 +datasets: + - path: HuggingFaceH4/llava-instruct-mix-vsft + type: chat_template + split: train[:100] + +val_set_size: 0 +output_dir: ./outputs/gemma4-e2b-vision-lora + +adapter: lora +sequence_len: 2048 +pad_to_sequence_len: false + +lora_r: 16 +lora_alpha: 32 +lora_dropout: 0 +# Target language model only — vision encoder is frozen via freeze_mm_modules +lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' + +gradient_accumulation_steps: 4 +micro_batch_size: 1 +num_epochs: 1 +max_steps: 10 +optimizer: adamw_torch_8bit +lr_scheduler: cosine +learning_rate: 0.0002 + +bf16: auto +tf32: true + +gradient_checkpointing: true +gradient_checkpointing_kwargs: + use_reentrant: false +logging_steps: 1 +sdp_attention: true + +warmup_ratio: 0.1 +weight_decay: 0.0 + +wandb_project: +wandb_entity: +wandb_watch: +wandb_name: +wandb_log_model: diff --git 
a/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_Agentic_Security.ipynb b/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_Agentic_Security.ipynb new file mode 100644 index 0000000..6b3d473 --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_Agentic_Security.ipynb @@ -0,0 +1,526 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "colab-badge" + }, + "source": [ + "\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "byline" + }, + "source": [ + "# Securing Gemma 4 Agentic Workflows with HDP\n", + "\n", + "**Author:** Asiri Dalugoda, Helixar Limited ([@asiridalugoda](https://github.com/asiridalugoda)) | [helixar.ai](https://helixar.ai)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gpu-instructions" + }, + "source": [ + "## Before you begin\n", + "\n", + "This notebook requires a GPU runtime. To enable GPU in Colab:\n", + "1. Go to **Runtime → Change runtime type**\n", + "2. Set **Hardware accelerator** to **GPU** (T4 is sufficient for E4B)\n", + "3. Click **Save**\n", + "\n", + "You will also need a **Hugging Face token** to download Gemma 4 (gated model):\n", + "1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)\n", + "2. Create a token with **Read** access\n", + "3. Accept the Gemma 4 model license at [huggingface.co/google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it)\n", + "4. 
Run the cell below to authenticate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hf-login" + }, + "outputs": [], + "source": [ + "from huggingface_hub import notebook_login\n", + "notebook_login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "overview" + }, + "source": [ + "# Securing Gemma 4 Agentic Workflows with HDP\n", + "\n", + "**Human Delegation Provenance (HDP)** is an open protocol that adds a cryptographic chain-of-custody to AI agent function calls — ensuring every tool invocation can be traced back to an authorized human principal.\n", + "\n", + "This notebook demonstrates how to integrate HDP with Gemma 4's native function-calling capability to:\n", + "\n", + "- **Verify** that Gemma 4's function calls were authorized by a human principal before execution\n", + "- **Classify** actions by irreversibility (read-only → irreversible → physical actuation)\n", + "- **Block** unauthorized or out-of-scope tool calls at the middleware layer\n", + "- **Audit** every decision with a pre-execution log\n", + "\n", + "This is particularly relevant for Gemma 4 deployments on edge devices (Jetson Nano, Raspberry Pi) where the model may be directing physical actuators offline with no out-of-band authorization check.\n", + "\n", + "**References:**\n", + "- HDP IETF draft: [draft-helixar-hdp-agentic-delegation-00](https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/)\n", + "- HDP-P (physical AI agents): [DOI 10.5281/ZENODO.19332440](https://doi.org/10.5281/ZENODO.19332440)\n", + "- Helixar: [helixar.ai](https://helixar.ai)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b3600ee25c8e" + }, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7a80251f52b3" + }, + "outputs": [], + "source": [ + "!pip install -q transformers torch cryptography" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": { + "id": "ed80fe18f255" + }, + "outputs": [], + "source": [ + "# Download the middleware\n", + "!wget -q https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/apps/Gemma_4_HDP_Agentic_Security/hdp_middleware.py\n", + "\n", + "from hdp_middleware import (\n", + " HDPDelegationToken,\n", + " HDPMiddleware,\n", + " IrreversibilityClass,\n", + " DEFAULT_TOOL_CLASS_MAP,\n", + ")\n", + "from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e88bdc7b7265" + }, + "source": [ + "## 1. Load Gemma 4\n", + "\n", + "We use the 4B Effective model for this demo. For production agentic deployments, the 26B MoE or 31B Dense models are recommended." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1e4e7779806d" + }, + "outputs": [], + "source": [ + "from transformers import pipeline\n", + "\n", + "# For edge/robotics use cases: swap to google/gemma-4-E2B-it\n", + "MODEL_ID = \"google/gemma-4-E4B-it\"\n", + "\n", + "pipe = pipeline(\n", + " \"text-generation\",\n", + " model=MODEL_ID,\n", + " device_map=\"auto\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d91e36cfb0b2" + }, + "source": [ + "## 2. Define Tools\n", + "\n", + "Gemma 4 uses structured JSON function-calling. We define a tool set spanning different IrreversibilityClasses to demonstrate the middleware's classification behaviour." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1becdb52e7f8" + }, + "outputs": [], + "source": [ + "TOOLS = [\n", + " {\n", + " \"name\": \"get_weather\",\n", + " \"description\": \"Get the current weather for a location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\"type\": \"string\", \"description\": \"City name\"}\n", + " },\n", + " \"required\": [\"location\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"send_email\",\n", + " \"description\": \"Send an email to a recipient.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"to\": {\"type\": \"string\"},\n", + " \"subject\": {\"type\": \"string\"},\n", + " \"body\": {\"type\": \"string\"}\n", + " },\n", + " \"required\": [\"to\", \"subject\", \"body\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"delete_file\",\n", + " \"description\": \"Permanently delete a file by path.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"path\": {\"type\": \"string\"}\n", + " },\n", + " \"required\": [\"path\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"actuate_robot_arm\",\n", + " \"description\": \"Command a robot arm to move to a target position.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"joint_angles\": {\"type\": \"array\", \"items\": {\"type\": \"number\"}},\n", + " \"force_limit_n\": {\"type\": \"number\"}\n", + " },\n", + " \"required\": [\"joint_angles\"]\n", + " }\n", + " }\n", + "]\n", + "\n", + "# Tools indexed by name for lookup\n", + "TOOL_REGISTRY = {t[\"name\"]: t for t in TOOLS}\n", + "print(f\"Registered {len(TOOLS)} tools\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "722948b00a92" + }, + "source": [ + "## 3. 
Issue an HDP Delegation Token\n", + "\n", + "The human principal generates an Ed25519 keypair and issues an HDT that specifies:\n", + "- Which tools the agent is permitted to call\n", + "- The maximum IrreversibilityClass the agent can act on\n", + "- The token's lifetime" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b0622c68dfa5" + }, + "outputs": [], + "source": [ + "# Human principal generates their signing keypair\n", + "# In production: loaded from secure key storage (HSM, OS keychain, etc.)\n", + "principal_private_key = Ed25519PrivateKey.generate()\n", + "principal_public_key = principal_private_key.public_key()\n", + "\n", + "# Issue an HDT authorizing the Gemma 4 agent to call weather queries\n", + "# and send emails (Class 0 and Class 2), but NOT delete files or actuate hardware\n", + "token = HDPDelegationToken.issue(\n", + " principal_id=\"alice@example.com\",\n", + " agent_id=\"gemma4-agent-01\",\n", + " scope=[\"get_weather\", \"send_email\"],\n", + " max_class=IrreversibilityClass.CLASS_2,\n", + " ttl_seconds=3600,\n", + " private_key=principal_private_key,\n", + ")\n", + "\n", + "print(json.dumps(token.to_dict(), indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e206f950f4bc" + }, + "source": [ + "## 4. Initialise the HDP Middleware\n", + "\n", + "The middleware takes the principal's **public key** only — it verifies but cannot issue tokens." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e24676f528bf" + }, + "outputs": [], + "source": [ + "audit_log = []\n", + "\n", + "# Confirmation callback for Class 2 (irreversible) actions.\n", + "# In production: this would invoke a push notification, SMS OTP,\n", + "# or hardware confirmation device to the human principal.\n", + "def require_human_confirmation(tool_name: str, parameters: dict) -> bool:\n", + " print(f\"\\n⚠️ Class 2 action requested: {tool_name}\")\n", + " print(f\" Parameters: {json.dumps(parameters, indent=4)}\")\n", + " response = input(\" Confirm? [y/N]: \").strip().lower()\n", + " return response == \"y\"\n", + "\n", + "middleware = HDPMiddleware(\n", + " public_key=principal_public_key,\n", + " tool_class_map=DEFAULT_TOOL_CLASS_MAP,\n", + " confirmation_callback=require_human_confirmation,\n", + " audit_log=audit_log,\n", + ")\n", + "\n", + "print(\"HDP middleware initialised.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "72d56542eba0" + }, + "source": [ + "## 5. Gemma 4 Function Call → HDP Gate → Tool Execution\n", + "\n", + "This is the core integration pattern. Every function call Gemma 4 generates is passed through `middleware.gate()` before being forwarded to tool execution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "da20bc191e71" + }, + "outputs": [], + "source": [ + "# Simulated Gemma 4 function call outputs\n", + "# In production these come from parsing Gemma 4's structured JSON output\n", + "gemma_function_calls = [\n", + " # ✅ Should ALLOW — Class 0, in scope\n", + " {\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}},\n", + "\n", + " # ⚠️ Should CONFIRM then ALLOW — Class 2, in scope\n", + " {\"name\": \"send_email\", \"parameters\": {\n", + " \"to\": \"bob@example.com\",\n", + " \"subject\": \"Weekly report\",\n", + " \"body\": \"Please find attached.\"\n", + " }},\n", + "\n", + " # ❌ Should BLOCK — Class 2, NOT in HDT scope\n", + " {\"name\": \"delete_file\", \"parameters\": {\"path\": \"/data/important.csv\"}},\n", + "\n", + " # ❌ Should BLOCK — Class 3, physical actuation\n", + " {\"name\": \"actuate_robot_arm\", \"parameters\": {\n", + " \"joint_angles\": [0.0, -1.57, 0.0, -1.57, 0.0, 0.0],\n", + " \"force_limit_n\": 50.0\n", + " }},\n", + "]\n", + "\n", + "print(\"=\" * 60)\n", + "print(\"HDP VERIFICATION RESULTS\")\n", + "print(\"=\" * 60)\n", + "\n", + "for call in gemma_function_calls:\n", + " result = middleware.gate(call, token)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "be0d0dd05bce" + }, + "source": [ + "## 6. Audit Log\n", + "\n", + "Every decision is logged pre-execution. This is the HDP audit trail — a cryptographically linked record of what was authorized, by whom, and when." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e6dbab6d88d1" + }, + "outputs": [], + "source": [ + "print(\"\\nAUDIT LOG\")\n", + "print(\"-\" * 60)\n", + "for i, entry in enumerate(audit_log):\n", + " status = \"✅ ALLOWED\" if entry.allowed else \"❌ BLOCKED\"\n", + " print(f\"{i+1}. 
{status} | {entry.tool_name} | {entry.action_class.name} | {entry.reason}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bcadcb7040db" + }, + "source": [ + "## 7. Token Expiry and Scope Violation Demo\n", + "\n", + "Demonstrate that expired tokens and out-of-scope calls are blocked regardless of the action class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "deb2e3b6b20e" + }, + "outputs": [], + "source": [ + "import time\n", + "\n", + "# Issue a token that's already expired\n", + "expired_token = HDPDelegationToken.issue(\n", + " principal_id=\"alice@example.com\",\n", + " agent_id=\"gemma4-agent-01\",\n", + " scope=[\"get_weather\"],\n", + " max_class=IrreversibilityClass.CLASS_0,\n", + " ttl_seconds=-1, # expired immediately\n", + " private_key=principal_private_key,\n", + ")\n", + "\n", + "print(\"Testing expired token:\")\n", + "middleware.gate({\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}}, expired_token)\n", + "\n", + "print(\"\\nTesting call outside HDT scope:\")\n", + "middleware.gate({\"name\": \"delete_file\", \"parameters\": {\"path\": \"/etc/passwd\"}}, token)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b8f4acddb6fa" + }, + "source": [ + "## 8. Edge / Robotics Deployment (HDP-P)\n", + "\n", + "For Gemma 4 E2B/E4B running on Jetson Nano or Raspberry Pi and directing physical actuators, use the HDP-P extension. The key additions are:\n", + "\n", + "- **Embodiment context** — bind the token to a specific hardware ID\n", + "- **Policy attestation** — hash the deployed model weights into the token\n", + "- **Fleet delegation constraints** — prevent lateral movement across robot fleet\n", + "- **Pre-execution logging** — write audit records *before* actuator commands are issued\n", + "\n", + "See the [HDP-P specification](https://doi.org/10.5281/ZENODO.19332440) for the full EDT extension structure." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fcf7b451d175" + }, + "outputs": [], + "source": [ + "# Minimal HDP-P Embodied Delegation Token (EDT) extension example\n", + "# This shows how to attach physical constraints to an HDT\n", + "\n", + "hdp_p_extension = {\n", + " \"hdp-p\": {\n", + " \"version\": \"0.1\",\n", + " \"embodiment\": {\n", + " \"type\": \"mobile\",\n", + " \"platform\": \"raspberry-pi-5\",\n", + " \"hardware_id\": \"rpi-serial-XXXX\", # TPM-attested in production\n", + " \"workspace\": \"lab-zone-a\"\n", + " },\n", + " \"action_scope\": {\n", + " \"permitted_actions\": [\"move_base\", \"read_sensor\"],\n", + " \"excluded_zones\": [\"human-workspace\"],\n", + " \"force_limit_n\": 10.0,\n", + " \"max_velocity_ms\": 0.5\n", + " },\n", + " \"irreversibility\": {\n", + " \"max_class\": 1, # Class 1 max for this token\n", + " \"class2_requires_confirmation\": True,\n", + " \"class3_prohibited\": True\n", + " },\n", + " \"policy_attestation\": {\n", + " \"policy_hash\": \"sha256:abc123...\", # SHA-256 of deployed model weights\n", + " \"training_run_id\": \"gemma4-e2b-it\",\n", + " \"sim_validated\": True\n", + " },\n", + " \"delegation_scope\": {\n", + " \"fleet_delegation_permitted\": False, # No lateral movement\n", + " \"max_delegation_depth\": 0\n", + " }\n", + " }\n", + "}\n", + "\n", + "print(\"HDP-P EDT extension structure:\")\n", + "print(json.dumps(hdp_p_extension, indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b0af7c701dfc" + }, + "source": [ + "## Summary\n", + "\n", + "| Layer | What it solves | Tool |\n", + "|---|---|---|\n", + "| Gemma 4 function calling | Model generates structured tool calls | `pipeline(\"text-generation\")` |\n", + "| HDP middleware | Was this call authorized by a human? | `HDPMiddleware.gate()` |\n", + "| HDP-P EDT extension | Is this physical action within delegated bounds? 
| `hdp_p_extension` |\n", + "| Audit log | Pre-execution record of every decision | `audit_log` |\n", + "\n", + "The full HDP specification (IETF draft), HDP-P companion paper, TypeScript SDK, and Python bindings are available at:\n", + "\n", + "- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/\n", + "- **HDP-P paper:** https://doi.org/10.5281/ZENODO.19332440\n", + "- **GitHub:** https://github.com/Helixar-AI\n", + "- **Site:** https://helixar.ai" + ] + } + ], + "metadata": { + "colab": { + "name": "Gemma_4_HDP_Agentic_Security.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_README.md b/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_README.md new file mode 100644 index 0000000..8811f85 --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/Gemma_4_HDP_README.md @@ -0,0 +1,75 @@ +# Gemma 4 + HDP: Securing Agentic Function Calls + +This example demonstrates how to integrate the **Human Delegation Provenance (HDP)** protocol with **Gemma 4's native function-calling** to cryptographically verify that every tool invocation was authorized by a human principal before execution. + +## The problem + +Gemma 4 is purpose-built for agentic workflows. Its native function-calling lets it autonomously call tools and APIs across multi-step plans — on anything from a cloud workstation to a Raspberry Pi running a robot offline. + +This creates a gap: when Gemma 4 generates a function call, there is no verifiable record that a human principal authorized that specific action. An injected prompt, a compromised system prompt, or a lateral pivot from another agent can trigger function calls that are indistinguishable from legitimate requests at the tool interface. + +HDP closes this gap. 
+ +## What HDP does + +HDP (IETF draft: `draft-helixar-hdp-agentic-delegation-00`) provides: + +- **Ed25519-signed Delegation Tokens (HDTs)** issued by a human principal +- **Scope constraints** — which tools the agent is permitted to call +- **Irreversibility classification** (Class 0–3) — from read-only to physical actuation +- **Pre-execution verification** — the middleware gate runs *before* any tool executes +- **Audit log** — a tamper-evident record of every authorization decision + +For Gemma 4 on **edge devices directing physical actuators** (Jetson Nano, Raspberry Pi + robot arm), the HDP-P companion specification adds embodiment constraints, policy attestation, and fleet delegation controls. + +## Files + +| File | Description | +|---|---| +| `Gemma_4_HDP_Agentic_Security.ipynb` | Full walkthrough notebook — load Gemma 4, issue tokens, gate function calls | +| `hdp_middleware.py` | Drop-in middleware — `HDPMiddleware.gate()` wraps any Gemma 4 tool executor | + +## Quick start + +```python +from hdp_middleware import HDPDelegationToken, HDPMiddleware, IrreversibilityClass +from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey + +# Human principal issues a delegation token +private_key = Ed25519PrivateKey.generate() +token = HDPDelegationToken.issue( + principal_id="alice@example.com", + agent_id="gemma4-agent-01", + scope=["get_weather", "send_email"], + max_class=IrreversibilityClass.CLASS_2, + ttl_seconds=3600, + private_key=private_key, +) + +# Middleware verifies every Gemma 4 function call before execution +middleware = HDPMiddleware(public_key=private_key.public_key()) + +result = middleware.gate( + function_call={"name": "send_email", "parameters": {"to": "bob@example.com", ...}}, + token=token, +) + +if result.allowed: + execute_tool(function_call) +``` + +## Irreversibility classes + +| Class | Definition | Authorization | +|---|---|---| +| 0 | Fully reversible — reads, queries | HDT sufficient | +| 1 | Reversible with 
effort — writes, moves | HDT sufficient | +| 2 | Irreversible — send, delete, publish | HDT + principal confirmation | +| 3 | Irreversible + potentially harmful — physical actuation | Dual-principal required (HDP-P) | + +## References + +- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/ +- **Zenodo DOI:** https://doi.org/10.5281/zenodo.19332023 +- **HDP-P (physical AI):** https://doi.org/10.5281/ZENODO.19332440 +- **Helixar:** https://helixar.ai diff --git a/tooling/fine-tuning/google-cookbook/function-calling-gemma4.ipynb b/tooling/fine-tuning/google-cookbook/function-calling-gemma4.ipynb new file mode 100644 index 0000000..c5e8e3b --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/function-calling-gemma4.ipynb @@ -0,0 +1,1001 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e5ed3a20", + "metadata": { + "id": "-u7xRR3DeFXz" + }, + "source": [ + "##### Copyright 2025 Google LLC." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87fc2129", + "metadata": { + "cellView": "form", + "id": "oed1Dh9SeIlD" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "be48d087", + "metadata": { + "id": "gdkDG20KtfsH" + }, + "source": [ + "# Function calling with Gemma 4" + ] + }, + { + "cell_type": "markdown", + "id": "b06867b0", + "metadata": { + "id": "DzkEdu7YxUOM" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c9252776", + "metadata": { + "id": "4A2wX6qLPAFc" + }, + "source": [ + "When using a generative artificial intelligence (AI) model such as Gemma, you\n", + "may want to use the model to operate programming interfaces in order to complete\n", + "tasks or answer questions. Instructing a model by defining a programming\n", + "interface and then making a request that uses that interface is called *function\n", + "calling*.\n", + "\n", + "> Important: *A Gemma model cannot execute code on its own.* When you\n", + "generate code with function calling, you must run the generated code yourself or\n", + "run it as part of your application. Always put safeguards in place to validate\n", + "any generated code before executing it.\n", + "\n", + "This guide shows the process of using Gemma 4 within the Hugging Face ecosystem." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b5711fa8", + "metadata": { + "id": "brMXesIJu6_0" + }, + "source": [ + "This notebook will run on T4 GPU." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1a138a16", + "metadata": { + "id": "JYq4z39uu-1a" + }, + "source": [ + "## Install Python packages\n", + "\n", + "Install the Hugging Face libraries required for running the Gemma model and making requests." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be2ea88b", + "metadata": { + "id": "BKczkppWvBAD" + }, + "outputs": [], + "source": [ + "# Install PyTorch & other libraries\n", + "!pip install torch accelerate\n", + "\n", + "# Install the transformers library\n", + "!pip install transformers" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "5ec0919e", + "metadata": { + "id": "eylDAA5LxVXr" + }, + "source": [ + "## Load Model\n", + "\n", + "Use the `transformers` libraries to create an instance of a `processor` and `model` using the `AutoProcessor` and `AutoModelForImageTextToText` classes as shown in the following code example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c8a4ae4", + "metadata": { + "id": "vNLDpsrxxW76" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ebbd88f6ff4f4c29913db37e2b461389", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading weights: 0%| | 0/2011 [00:00<|turn>system\n", + "You are a helpful assistant.<|tool>declaration:get_current_temperature{description:<|\"|>Gets the current temperature for a given location.<|\"|>,parameters:{properties:{location:{description:<|\"|>The city name, e.g. 
San Francisco<|\"|>,type:<|\"|>STRING<|\"|>}},required:[<|\"|>location<|\"|>],type:<|\"|>OBJECT<|\"|>}}\n", + "<|turn>user\n", + "What's the temperature in London?\n", + "<|turn>model\n", + "<|tool_call>call:get_current_temperature{location:<|\"|>London<|\"|>}<|tool_response>\n" + ] + } + ], + "source": [ + "from transformers import TextStreamer\n", + "\n", + "weather_function_schema = {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"get_current_temperature\",\n", + " \"description\": \"Gets the current temperature for a given location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city name, e.g. San Francisco\",\n", + " },\n", + " },\n", + " \"required\": [\"location\"],\n", + " },\n", + " }\n", + "}\n", + "\n", + "message = [\n", + " {\n", + " \"role\": \"system\", \"content\": \"You are a helpful assistant.\"\n", + " },\n", + " {\n", + " \"role\": \"user\", \"content\": \"What's the temperature in London?\"\n", + " }\n", + "]\n", + "\n", + "text = processor.apply_chat_template(message, tools=[weather_function_schema], tokenize=False, add_generation_prompt=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "streamer = TextStreamer(processor)\n", + "outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=64)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b089527f", + "metadata": { + "id": "8yRB855BuiQ6" + }, + "source": [ + "And the same example with the raw Python function." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eb63e062", + "metadata": { + "id": "pRjDUMBGuk0s" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<|turn>system\n", + "<|tool>declaration:get_current_temperature{description:<|\"|>Gets the current temperature for a given location.<|\"|>,parameters:{properties:{location:{description:<|\"|>The city name, e.g. San Francisco<|\"|>,type:<|\"|>STRING<|\"|>}},required:[<|\"|>location<|\"|>],type:<|\"|>OBJECT<|\"|>}}\n", + "<|turn>user\n", + "What's the temperature in London?\n", + "<|turn>model\n", + "<|tool_call>call:get_current_temperature{location:<|\"|>London<|\"|>}<|tool_response>\n" + ] + } + ], + "source": [ + "from transformers.utils import get_json_schema\n", + "\n", + "def get_current_temperature(location: str):\n", + " \"\"\"\n", + " Gets the current temperature for a given location.\n", + "\n", + " Args:\n", + " location: The city name, e.g. San Francisco\n", + " \"\"\"\n", + " return \"15°C\"\n", + "\n", + "message = [\n", + " {\n", + " \"role\": \"user\", \"content\": \"What's the temperature in London?\"\n", + " }\n", + "]\n", + "\n", + "text = processor.apply_chat_template(message, tools=[get_json_schema(get_current_temperature)], tokenize=False, add_generation_prompt=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "streamer = TextStreamer(processor)\n", + "outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=256)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "422d1ce4", + "metadata": { + "id": "Fcr4EZWMMg6L" + }, + "source": [ + "## Full function calling sequence\n", + "\n", + "This section demonstrates a three-stage cycle for connecting the model to external tools: the **Model's Turn** to generate function call objects, the **Developer's Turn** to parse and execute code (such as a weather API), and the **Final Response** where the model uses the tool's output to answer 
the user." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "86ee44ab", + "metadata": { + "id": "_bk7y3hKNMMO" + }, + "source": [ + "### Model's Turn\n", + "\n", + "Here's the user prompt `\"Hey, what's the weather in Tokyo right now?\"`, and the tool `[get_current_weather]`. Gemma generates a function call object as follows." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "d74862bc", + "metadata": { + "id": "mqAOVhFAMsrm" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Prompt: Hey, what's the weather in Tokyo right now?\n", + "Tools: []\n", + "Output: <|tool_call>call:get_current_weather{location:<|\"|>Tokyo, JP<|\"|>}<|tool_response>\n" + ] + } + ], + "source": [ + "# Define a function that our model can use.\n", + "def get_current_weather(location: str, unit: str = \"celsius\"):\n", + " \"\"\"\n", + " Gets the current weather in a given location.\n", + "\n", + " Args:\n", + " location: The city and state, e.g. \"San Francisco, CA\" or \"Tokyo, JP\"\n", + " unit: The unit to return the temperature in. 
(choices: [\"celsius\", \"fahrenheit\"])\n", + "\n", + " Returns:\n", + " temperature: The current temperature in the given location\n", + " weather: The current weather in the given location\n", + " \"\"\"\n", + " return {\"temperature\": 15, \"weather\": \"sunny\"}\n", + "\n", + "prompt = \"Hey, what's the weather in Tokyo right now?\"\n", + "tools = [get_current_weather]\n", + "\n", + "message = [\n", + " {\n", + " \"role\": \"system\", \"content\": \"You are a helpful assistant.\"\n", + " },\n", + " {\n", + " \"role\": \"user\", \"content\": prompt\n", + " },\n", + "]\n", + "\n", + "text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "out = model.generate(**inputs, max_new_tokens=128)\n", + "generated_tokens = out[0][len(inputs[\"input_ids\"][0]):]\n", + "output = processor.decode(generated_tokens, skip_special_tokens=False)\n", + "\n", + "print(f\"Prompt: {prompt}\")\n", + "print(f\"Tools: {tools}\")\n", + "print(f\"Output: {output}\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "87d67184", + "metadata": { + "id": "lBnyE8JqOnDc" + }, + "source": [ + "### Developer's Turn\n", + "\n", + "Your application should parse the model's response to extract the function name and arguments, then append a message with the `assistant` role that contains `tool_calls` and `tool_responses`.\n", + "\n", + "> NOTE: Always validate function names and arguments before execution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "0810d0e9", + "metadata": { + "id": "xpRfHIVlOuFx" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"role\": \"assistant\",\n", + " \"tool_calls\": [\n", + " {\n", + " \"function\": {\n", + " \"name\": \"get_current_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Tokyo, JP\"\n", + " }\n", + " }\n", + " }\n", + " ],\n", + " \"tool_responses\": [\n", + " {\n", + " \"name\": \"get_current_weather\",\n", + " \"response\": {\n", + " \"temperature\": 15,\n", + " \"weather\": \"sunny\"\n", + " }\n", + " }\n", + " ]\n", + "}\n" + ] + } + ], + "source": [ + "import re\n", + "import json\n", + "\n", + "def extract_tool_calls(text):\n", + " def cast(v):\n", + " try: return int(v)\n", + " except:\n", + " try: return float(v)\n", + " except: return {'true': True, 'false': False}.get(v.lower(), v.strip(\"'\\\"\"))\n", + "\n", + " return [{\n", + " \"name\": name,\n", + " \"arguments\": {\n", + " k: cast((v1 or v2).strip())\n", + " for k, v1, v2 in re.findall(r'(\\w+):(?:<\\|\"\\|>(.*?)<\\|\"\\|>|([^,}]*))', args)\n", + " }\n", + " } for name, args in re.findall(r\"<\\|tool_call>call:(\\w+)\\{(.*?)\\}\", text, re.DOTALL)]\n", + "\n", + "calls = extract_tool_calls(output)\n", + "if calls:\n", + " # Call the function and get the result\n", + " #####################################\n", + " # WARNING: This is a demonstration. #\n", + " #####################################\n", + " # Using globals() to call functions dynamically can be dangerous in\n", + " # production. 
In a real application, you should implement a secure way to\n", + " # map function names to actual function calls, such as a predefined\n", + " # dictionary of allowed tools and their implementations.\n", + " results = [\n", + " {\"name\": c['name'], \"response\": globals()[c['name']](**c['arguments'])}\n", + " for c in calls\n", + " ]\n", + "\n", + " message.append({\n", + " \"role\": \"assistant\",\n", + " \"tool_calls\": [\n", + " {\"function\": call} for call in calls\n", + " ],\n", + " \"tool_responses\": results\n", + " })\n", + " print(json.dumps(message[-1], indent=2))\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "22347163", + "metadata": { + "id": "_wXyQtm8SRfi" + }, + "source": [ + "> Note: For optimal results, append the tool execution result to your message history using the specific format below. This ensures the chat template correctly generates the required token structure (e.g., `response:get_current_weather{temperature:15,weather:<|\"|>sunny<|\"|>}`).\n", + "\n", + "```python\n", + "\"tool_responses\": [\n", + " {\n", + " \"name\": function_name,\n", + " \"response\": function_response\n", + " }\n", + "]\n", + "```\n", + "\n", + "In case of multiple independent requests:\n", + "\n", + "```python\n", + "\"tool_responses\": [\n", + " {\n", + " \"name\": function_name_1,\n", + " \"response\": function_response_1\n", + " },\n", + " {\n", + " \"name\": function_name_2,\n", + " \"response\": function_response_2\n", + " }\n", + "]\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a6cbcd72", + "metadata": { + "id": "qpJrjXgtSh3w" + }, + "source": [ + "### Final Response\n", + "\n", + "Finally, Gemma reads the tool response and replies to the user." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1c8961e8", + "metadata": { + "id": "tS6IBGaGSm0i" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Output: The current weather in Tokyo is 15 degrees and sunny.\n" + ] + } + ], + "source": [ + "text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "out = model.generate(**inputs, max_new_tokens=128)\n", + "generated_tokens = out[0][len(inputs[\"input_ids\"][0]):]\n", + "output = processor.decode(generated_tokens, skip_special_tokens=True)\n", + "print(f\"Output: {output}\")\n", + "message[-1][\"content\"] = output" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3196dd09", + "metadata": { + "id": "7jCc58grS81c" + }, + "source": [ + "You can see the full chat history below." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "46cdfec1", + "metadata": { + "id": "k1LQOKusS-BF" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"You are a helpful assistant.\"\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"Hey, what's the weather in Tokyo right now?\"\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"tool_calls\": [\n", + " {\n", + " \"function\": {\n", + " \"name\": \"get_current_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Tokyo, JP\"\n", + " }\n", + " }\n", + " }\n", + " ],\n", + " \"tool_responses\": [\n", + " {\n", + " \"name\": \"get_current_weather\",\n", + " \"response\": {\n", + " \"temperature\": 15,\n", + " \"weather\": \"sunny\"\n", + " }\n", + " }\n", + " ],\n", + " \"content\": \"The current weather in Tokyo is 15 degrees and sunny.\"\n", + " }\n", + "]\n", + 
"--------------------------------------------------------------------------------\n", + "Output: <|turn>system\n", + "You are a helpful assistant.<|tool>declaration:get_current_weather{description:<|\"|>Gets the current weather in a given location.<|\"|>,parameters:{properties:{location:{description:<|\"|>The city and state, e.g. \"San Francisco, CA\" or \"Tokyo, JP\"<|\"|>,type:<|\"|>STRING<|\"|>},unit:{description:<|\"|>The unit to return the temperature in.<|\"|>,enum:[<|\"|>celsius<|\"|>,<|\"|>fahrenheit<|\"|>],type:<|\"|>STRING<|\"|>}},required:[<|\"|>location<|\"|>],type:<|\"|>OBJECT<|\"|>}}\n", + "<|turn>user\n", + "Hey, what's the weather in Tokyo right now?\n", + "<|turn>model\n", + "<|tool_call>call:get_current_weather{location:<|\"|>Tokyo, JP<|\"|>}<|tool_response>response:get_current_weather{temperature:15,weather:<|\"|>sunny<|\"|>}The current weather in Tokyo is 15 degrees and sunny.\n" + ] + } + ], + "source": [ + "# full history\n", + "print(json.dumps(message, indent=2))\n", + "\n", + "print(\"-\"*80)\n", + "output = processor.decode(out[0], skip_special_tokens=False)\n", + "print(f\"Output: {output}\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "e249b292", + "metadata": { + "id": "4sDRdwLclbsH" + }, + "source": [ + "### Function calling with Thinking\n", + "\n", + "By utilizing an internal reasoning process, the model significantly enhances its function-calling accuracy. This allows for more precise decision-making regarding when to trigger a tool and how to define its parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "925328f2", + "metadata": { + "id": "S9-nXig6lsMt" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Role: assistant\n", + "\n", + "=== Thoughts ===\n", + "1. **Analyze the Request:** The user is asking if it's \"good for running now\" in \"Seoul\".\n", + "\n", + "2. 
**Identify Necessary Information:** To determine if it's good for running, I need current weather information (temperature, precipitation, etc.) for Seoul.\n", + "\n", + "3. **Examine Available Tools:** The available tool is `get_current_weather(location, unit)`.\n", + "\n", + "4. **Determine Tool Arguments:**\n", + " * `location`: The user specified \"Seoul\".\n", + " * `unit`: The user did not specify a unit (Celsius or Fahrenheit).\n", + "\n", + "5. **Formulate the Tool Call:** I need to call `get_current_weather` with the location. Since the user didn't specify a unit, I can either omit it (if the tool defaults are acceptable) or choose a common one. However, the tool definition requires `location` but `unit` is optional.\n", + "\n", + "6. **Construct the Response Strategy:**\n", + " * Call the tool to get the weather data for Seoul.\n", + " * Once the data is received, I can advise the user on whether it's suitable for running.\n", + "\n", + "7. **Generate Tool Call:**\n", + " ```json\n", + " {\n", + " \"toolSpec\": {\n", + " \"name\": \"get_current_weather\",\n", + " \"args\": {\n", + " \"location\": \"Seoul\"\n", + " }\n", + " }\n", + " }\n", + " ```\n", + " (Self-correction: The `unit` parameter is optional in the definition, so just providing the location is sufficient to proceed.)\n", + "\n", + "8. **Final Output Generation:** Present the tool call to the user/system.\n", + "\n", + "=== Tool Calls ===\n", + "[{'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': {'location': 'Seoul'}}}]\n" + ] + } + ], + "source": [ + "prompt = \"Hey, I'm in Seoul. 
Is it good for running now?\"\n", + "message = [\n", + " {\n", + " \"role\": \"system\", \"content\": \"You are a helpful assistant.\"\n", + " },\n", + " {\n", + " \"role\": \"user\", \"content\": prompt\n", + " },\n", + "]\n", + "\n", + "text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True, enable_thinking=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "input_len = inputs[\"input_ids\"].shape[-1]\n", + "\n", + "out = model.generate(**inputs, max_new_tokens=1024)\n", + "output = processor.decode(out[0][input_len:], skip_special_tokens=False)\n", + "result = processor.parse_response(output)\n", + "\n", + "for key, value in result.items():\n", + " if key == \"role\":\n", + " print(f\"Role: {value}\")\n", + " elif key == \"thinking\":\n", + " print(f\"\\n=== Thoughts ===\\n{value}\")\n", + " elif key == \"content\":\n", + " print(f\"\\n=== Answer ===\\n{value}\")\n", + " elif key == \"tool_calls\":\n", + " print(f\"\\n=== Tool Calls ===\\n{value}\")\n", + " else:\n", + " print(f\"\\n{key}: {value}...\\n\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "013a4826", + "metadata": { + "id": "JwBOHXWmpSyE" + }, + "source": [ + "Process the tool call and get the final answer." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "1d7dd4b5", + "metadata": { + "id": "tPgZ2gjWpWoq" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Output: The current weather in Seoul is 15 degrees Celsius and sunny. 
That sounds like great weather for a run!\n", + "--------------------------------------------------------------------------------\n", + "Full History\n", + "--------------------------------------------------------------------------------\n", + "[\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"You are a helpful assistant.\"\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"Hey, I'm in Seoul. Is it good for running now?\"\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"tool_calls\": [\n", + " {\n", + " \"function\": {\n", + " \"name\": \"get_current_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Seoul\"\n", + " }\n", + " }\n", + " }\n", + " ],\n", + " \"tool_responses\": [\n", + " {\n", + " \"name\": \"get_current_weather\",\n", + " \"response\": {\n", + " \"temperature\": 15,\n", + " \"weather\": \"sunny\"\n", + " }\n", + " }\n", + " ],\n", + " \"content\": \"The current weather in Seoul is 15 degrees Celsius and sunny. That sounds like great weather for a run!\"\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "calls = extract_tool_calls(output)\n", + "if calls:\n", + " # Call the function and get the result\n", + " #####################################\n", + " # WARNING: This is a demonstration. #\n", + " #####################################\n", + " # Using globals() to call functions dynamically can be dangerous in\n", + " # production. 
In a real application, you should implement a secure way to\n", + " # map function names to actual function calls, such as a predefined\n", + " # dictionary of allowed tools and their implementations.\n", + " results = [\n", + " {\"name\": c['name'], \"response\": globals()[c['name']](**c['arguments'])}\n", + " for c in calls\n", + " ]\n", + "\n", + " message.append({\n", + " \"role\": \"assistant\",\n", + " \"tool_calls\": [\n", + " {\"function\": call} for call in calls\n", + " ],\n", + " \"tool_responses\": results\n", + " })\n", + "\n", + "text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)\n", + "inputs = processor(text=text, return_tensors=\"pt\").to(model.device)\n", + "out = model.generate(**inputs, max_new_tokens=128)\n", + "generated_tokens = out[0][len(inputs[\"input_ids\"][0]):]\n", + "output = processor.decode(generated_tokens, skip_special_tokens=True)\n", + "print(f\"Output: {output}\")\n", + "message[-1][\"content\"] = output\n", + "\n", + "print(\"-\"*80)\n", + "print(\"Full History\")\n", + "print(\"-\"*80)\n", + "print(json.dumps(message, indent=2))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3e91bbb9", + "metadata": { + "id": "DXar9v3Ew75B" + }, + "source": [ + "## Important Caveat: Automatic vs. Manual Schemas\n", + "\n", + "When relying on automatic conversion from Python functions to JSON schema, the generated output may not always meet specific expectations regarding complex parameters.\n", + "\n", + "If a function uses a custom object (like a Config class) as an argument, the automatic converter may describe it simply as a generic \"object\" without detailing its internal properties.\n", + "\n", + "In these cases, manually defining the JSON schema is preferred to ensure nested properties (such as theme or font_size within a config object) are explicitly defined for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b57acc02", + "metadata": { + "id": "JFvIsc81w1H8" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--- [Automatic] ---\n", + "{\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"update_config\",\n", + " \"description\": \"Updates the configuration of the system.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"config\": {\n", + " \"type\": \"object\",\n", + " \"description\": \"A Config object\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"config\"\n", + " ]\n", + " }\n", + " }\n", + "}\n", + "\n", + "--- [Manual Schemas] ---\n", + "{\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"update_config\",\n", + " \"description\": \"Updates the configuration of the system.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"config\": {\n", + " \"type\": \"object\",\n", + " \"description\": \"A Config object\",\n", + " \"properties\": {\n", + " \"theme\": {\n", + " \"type\": \"string\"\n", + " },\n", + " \"font_size\": {\n", + " \"type\": \"number\"\n", + " }\n", + " }\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"config\"\n", + " ]\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "import json\n", + "from transformers.utils import get_json_schema\n", + "\n", + "class Config:\n", + " def __init__(self):\n", + " self.theme = \"light\"\n", + " self.font_size = 14\n", + "\n", + "def update_config(config: Config):\n", + " \"\"\"\n", + " Updates the configuration of the system.\n", + "\n", + " Args:\n", + " config: A Config object\n", + "\n", + " Returns:\n", + " True if the configuration was successfully updated, False otherwise.\n", + " \"\"\"\n", + "\n", + "update_config_schema = {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"update_config\",\n", + " \"description\": 
\"Updates the configuration of the system.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"config\": {\n", + " \"type\": \"object\",\n", + " \"description\": \"A Config object\",\n", + " \"properties\": {\"theme\": {\"type\": \"string\"}, \"font_size\": {\"type\": \"number\"} },\n", + " },\n", + " },\n", + " \"required\": [\"config\"],\n", + " },\n", + " },\n", + " }\n", + "\n", + "print(f\"--- [Automatic] ---\")\n", + "print(json.dumps(get_json_schema(update_config), indent=2))\n", + "\n", + "print(f\"\\n--- [Manual Schemas] ---\")\n", + "print(json.dumps(update_config_schema, indent=2))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9f249e3d", + "metadata": { + "id": "WTsXHyDnEbxp" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "You have learned how to build an application that can call functions with Gemma 4. The workflow follows a four-stage cycle:\n", + "\n", + "1. **Define Tools**: Create the functions your model can use, specifying arguments and descriptions (e.g., a weather lookup function).\n", + "2. **Model's Turn**: The model receives the user's prompt and a list of available tools, returning a structured function call object instead of plain text.\n", + "3. **Developer's Turn**: The developer parses this output using regular expressions to extract function names and arguments, executes the actual Python code, and appends the results to the chat history as `tool_calls` and `tool_responses` on a message with the `assistant` role.\n", + "4. 
**Final Response**: The model processes the tool's execution result to generate a final, natural language answer for the user.\n", + "\n", + "Check out the following documentation for further reading.\n", + "\n", + "- [Run Gemma overview](https://ai.google.dev/gemma/docs/run)\n", + "- [Vision understanding](https://ai.google.dev/gemma/docs/capabilities/vision)\n", + "- [Audio understanding](https://ai.google.dev/gemma/docs/capabilities/audio)\n", + "- [Thinking mode](https://ai.google.dev/gemma/docs/capabilities/thinking)\n" + ] + } + ], + "metadata": { + "colab": { + "name": "function-calling-gemma4.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/google-cookbook/huggingface_text_finetune_qlora.ipynb b/tooling/fine-tuning/google-cookbook/huggingface_text_finetune_qlora.ipynb new file mode 100644 index 0000000..57005b3 --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/huggingface_text_finetune_qlora.ipynb @@ -0,0 +1,1119 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "UUYMxQuf8zGu" + }, + "source": [ + "##### Copyright 2025 Google LLC." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "3x6t11lI829b" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WJwmK4C087wa" + }, + "source": [ + "# Fine-Tune Gemma using Hugging Face Transformers and QLoRA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f9673bd6" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " View on ai.google.dev\n", + " \n", + " Run in Google Colab\n", + " \n", + " Run in Kaggle\n", + " \n", + " Open in Vertex AI\n", + " \n", + " View source on GitHub\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e624ec07" + }, + "source": [ + "This guide walks you through how to fine-tune Gemma on a custom text-to-sql dataset using Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) and [TRL](https://huggingface.co/docs/trl/index). You will learn:\n", + "\n", + "- What is Quantized Low-Rank Adaptation (QLoRA)\n", + "- Setup development environment\n", + "- Create and prepare the fine-tuning dataset\n", + "- Fine-tune Gemma using TRL and the SFTTrainer\n", + "- Test Model Inference and generate SQL queries\n", + "\n", + "Note: This guide was created to run on a Google colaboratory account using a NVIDIA T4 GPU with 16GB and Gemma 1B, but can be adapted to run on bigger GPUs and bigger models.\n", + "\n", + "## What is Quantized Low-Rank Adaptation (QLoRA)\n", + "\n", + "This guide demonstrates the use of [Quantized Low-Rank Adaptation (QLoRA)](https://arxiv.org/abs/2305.14314), which emerged as a popular method to efficiently fine-tune LLMs as it reduces computational resource requirements while maintaining high performance. In QloRA, the pretrained model is quantized to 4-bit and the weights are frozen. Then trainable adapter layers (LoRA) are attached and only the adapter layers are trained. Afterwards, the adapter weights can be merged with the base model or kept as a separate adapter.\n", + "\n", + "## Setup development environment\n", + "\n", + "The first step is to install Hugging Face Libraries, including TRL, and datasets to fine-tune open model, including different RLHF and alignment techniques." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ba51aa79" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (2.10.0+cu128)\n", + "Requirement already satisfied: tensorboard in /usr/local/lib/python3.12/dist-packages (2.19.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from torch) (3.25.2)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.12/dist-packages (from torch) (4.15.0)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch) (75.2.0)\n", + "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch) (1.14.0)\n", + "Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch) (3.6.1)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch) (3.1.6)\n", + "Requirement already satisfied: fsspec>=0.8.5 in /usr/local/lib/python3.12/dist-packages (from torch) (2025.3.0)\n", + "Requirement already satisfied: cuda-bindings==12.9.4 in /usr/local/lib/python3.12/dist-packages (from torch) (12.9.4)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch) (12.8.93)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch) (12.8.90)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch) (12.8.90)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /usr/local/lib/python3.12/dist-packages (from torch) 
(12.8.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /usr/local/lib/python3.12/dist-packages (from torch) (11.3.3.83)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /usr/local/lib/python3.12/dist-packages (from torch) (10.3.9.90)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /usr/local/lib/python3.12/dist-packages (from torch) (11.7.3.90)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /usr/local/lib/python3.12/dist-packages (from torch) (12.5.8.93)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch) (2.27.5)\n", + "Requirement already satisfied: nvidia-nvshmem-cu12==3.4.5 in /usr/local/lib/python3.12/dist-packages (from torch) (3.4.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch) (12.8.90)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch) (12.8.93)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /usr/local/lib/python3.12/dist-packages (from torch) (1.13.1.3)\n", + "Requirement already satisfied: triton==3.6.0 in /usr/local/lib/python3.12/dist-packages (from torch) (3.6.0)\n", + "Requirement already satisfied: cuda-pathfinder~=1.1 in /usr/local/lib/python3.12/dist-packages (from cuda-bindings==12.9.4->torch) (1.4.3)\n", + "Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (1.4.0)\n", + "Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (1.78.0)\n", + "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (3.10.2)\n", + "Requirement 
already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (2.0.2)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from tensorboard) (26.0)\n", + "Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (5.29.6)\n", + "Requirement already satisfied: six>1.9 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (1.17.0)\n", + "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (0.7.2)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.12/dist-packages (from tensorboard) (3.1.6)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch) (1.3.0)\n", + "Requirement already satisfied: markupsafe>=2.1.1 in /usr/local/lib/python3.12/dist-packages (from werkzeug>=1.0.1->tensorboard) (3.0.3)\n", + "Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)\n", + "Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (1.13.0)\n", + "Collecting evaluate\n", + " Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)\n", + "Collecting bitsandbytes\n", + " Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)\n", + "Collecting trl\n", + " Downloading trl-1.0.0-py3-none-any.whl.metadata (11 kB)\n", + "Requirement already satisfied: peft in /usr/local/lib/python3.12/dist-packages (0.18.1)\n", + "Requirement already satisfied: protobuf in /usr/local/lib/python3.12/dist-packages (5.29.6)\n", + "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.12/dist-packages (0.2.1)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from datasets) (3.25.2)\n", + "Requirement already satisfied: numpy>=1.17 in 
/usr/local/lib/python3.12/dist-packages (from datasets) (2.0.2)\n", + "Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (18.1.0)\n", + "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)\n", + "Requirement already satisfied: requests>=2.32.2 in /usr/local/lib/python3.12/dist-packages (from datasets) (2.32.4)\n", + "Requirement already satisfied: tqdm>=4.66.3 in /usr/local/lib/python3.12/dist-packages (from datasets) (4.67.3)\n", + "Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.6.0)\n", + "Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)\n", + "Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2025.3.0)\n", + "Requirement already satisfied: huggingface-hub>=0.24.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (1.7.1)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from datasets) (26.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from datasets) (6.0.3)\n", + "Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate) (5.9.5)\n", + "Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from accelerate) (2.10.0+cu128)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from accelerate) (0.7.0)\n", + "Collecting datasets\n", + " Downloading datasets-4.8.4-py3-none-any.whl.metadata (19 kB)\n", + "Requirement already satisfied: transformers>=4.56.2 in /usr/local/lib/python3.12/dist-packages 
(from trl) (5.5.0.dev0)\n", + "Collecting pyarrow>=21.0.0 (from datasets)\n", + " Downloading pyarrow-23.0.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.1 kB)\n", + "Requirement already satisfied: httpx<1.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.28.1)\n", + "Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.13.3)\n", + "Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->datasets) (4.12.1)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->datasets) (2026.2.25)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->datasets) (1.0.9)\n", + "Requirement already satisfied: idna in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0->datasets) (3.11)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1.0.0->datasets) (0.16.0)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.4.2 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.24.0->datasets) (1.4.2)\n", + "Requirement already satisfied: typer in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.24.0->datasets) (0.24.1)\n", + "Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.24.0->datasets) (4.15.0)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (3.4.6)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (2.5.0)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (75.2.0)\n", + "Requirement already 
satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (1.14.0)\n", + "Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (3.6.1)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (3.1.6)\n", + "Requirement already satisfied: cuda-bindings==12.9.4 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.9.4)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.93)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.90)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.90)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (11.3.3.83)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (10.3.9.90)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (11.7.3.90)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.5.8.93)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in 
/usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (2.27.5)\n", + "Requirement already satisfied: nvidia-nvshmem-cu12==3.4.5 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (3.4.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.90)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (12.8.93)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (1.13.1.3)\n", + "Requirement already satisfied: triton==3.6.0 in /usr/local/lib/python3.12/dist-packages (from torch>=2.0.0->accelerate) (3.6.0)\n", + "Requirement already satisfied: cuda-pathfinder~=1.1 in /usr/local/lib/python3.12/dist-packages (from cuda-bindings==12.9.4->torch>=2.0.0->accelerate) (1.4.3)\n", + "Requirement already satisfied: regex>=2025.10.22 in /usr/local/lib/python3.12/dist-packages (from transformers>=4.56.2->trl) (2025.11.3)\n", + "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers>=4.56.2->trl) (0.22.2)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.3)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from 
aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)\n", + "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.4.0)\n", + "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.4.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.8.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.7.1)\n", + "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.4.1)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.23.0)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch>=2.0.0->accelerate) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=2.0.0->accelerate) (3.0.3)\n", + "Requirement already satisfied: click>=8.2.1 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.24.0->datasets) (8.3.1)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.24.0->datasets) (1.5.4)\n", + "Requirement already satisfied: rich>=12.3.0 
in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.24.0->datasets) (13.9.4)\n", + "Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.24.0->datasets) (0.0.4)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=12.3.0->typer->huggingface-hub>=0.24.0->datasets) (4.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=12.3.0->typer->huggingface-hub>=0.24.0->datasets) (2.19.2)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=12.3.0->typer->huggingface-hub>=0.24.0->datasets) (0.1.2)\n", + "Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl (60.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.7/60.7 MB\u001b[0m \u001b[31m38.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading trl-1.0.0-py3-none-any.whl (630 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m630.8/630.8 kB\u001b[0m \u001b[31m64.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading datasets-4.8.4-py3-none-any.whl (526 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m527.0/527.0 kB\u001b[0m \u001b[31m46.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pyarrow-23.0.1-cp312-cp312-manylinux_2_28_x86_64.whl (47.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m47.6/47.6 MB\u001b[0m \u001b[31m55.5 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: pyarrow, bitsandbytes, datasets, evaluate, trl\n", + " Attempting uninstall: pyarrow\n", + " Found existing installation: pyarrow 18.1.0\n", + " Uninstalling pyarrow-18.1.0:\n", + " Successfully uninstalled pyarrow-18.1.0\n", + " Attempting uninstall: datasets\n", + " Found existing installation: datasets 4.0.0\n", + " Uninstalling datasets-4.0.0:\n", + " Successfully uninstalled datasets-4.0.0\n", + "Successfully installed bitsandbytes-0.49.2 datasets-4.8.4 evaluate-0.4.6 pyarrow-23.0.1 trl-1.0.0\n" + ] + } + ], + "source": [ + "# Install PyTorch & other libraries\n", + "%pip install torch tensorboard\n", + "\n", + "# Install Transformers\n", + "%pip install transformers\n", + "\n", + "# Install Hugging Face libraries\n", + "%pip install datasets accelerate evaluate bitsandbytes trl peft protobuf sentencepiece\n", + "\n", + "# Uncomment if you are running on a GPU that supports the BF16 data type and Flash Attention, such as NVIDIA L4 or NVIDIA A100\n", + "#%pip install flash-attn" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ef3d54b" + }, + "source": [ + "_Note: If you are using a GPU with the Ampere architecture or newer (such as NVIDIA A100 or L4), you can use Flash Attention. Flash Attention is a method that significantly speeds up computation and reduces memory usage from quadratic to linear in sequence length, accelerating training by up to 3x. Learn more at [FlashAttention](https://github.com/Dao-AILab/flash-attention/tree/main)._\n", + "\n", + "You need a valid Hugging Face Token to publish your model. If you are running inside Google Colab, you can securely provide your Hugging Face Token through Colab secrets; otherwise, you can pass the token directly to the `login` method. Make sure your token has write access, as the model is pushed to the Hub during training."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b6d79c93" + }, + "outputs": [], + "source": [ + "# Login into Hugging Face Hub\n", + "from huggingface_hub import login\n", + "login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "42c60525" + }, + "source": [ + "## Create and prepare the fine-tuning dataset\n", + "\n", + "When fine-tuning LLMs, it is important to know your use case and the task you want to solve. This helps you create a dataset to fine-tune your model. If you haven't defined your use case yet, you might want to go back to the drawing board.\n", + "\n", + "As an example, this guide focuses on the following use case:\n", + "\n", + "- Fine-tune a natural language to SQL model for seamless integration into a data analysis tool. The objective is to significantly reduce the time and expertise required for SQL query generation, enabling even non-technical users to extract meaningful insights from data.\n", + "\n", + "Text-to-SQL can be a good use case for fine-tuning LLMs, as it is a complex task that requires a lot of (internal) knowledge about the data and the SQL language.\n", + "\n", + "Once you have determined that fine-tuning is the right solution, you need a dataset to fine-tune. The dataset should be a diverse set of demonstrations of the task(s) you want to solve. 
There are several ways to create such a dataset, including:\n", + "\n", + "- Using existing open-source datasets, such as [Spider](https://huggingface.co/datasets/spider)\n", + "- Using synthetic datasets created by LLMs, such as [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)\n", + "- Using datasets created by humans, such as [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k).\n", + "- Using a combination of the methods, such as [Orca](https://huggingface.co/datasets/Open-Orca/OpenOrca)\n", + "\n", + "Each of the methods has its own advantages and disadvantages and depends on the budget, time, and quality requirements. For example, using an existing dataset is the easiest but might not be tailored to your specific use case, while using domain experts might be the most accurate but can be time-consuming and expensive. It is also possible to combine several methods to create an instruction dataset, as shown in [Orca: Progressive Learning from Complex Explanation Traces of GPT-4.](https://arxiv.org/abs/2306.02707)\n", + "\n", + "This guide uses an already existing dataset ([philschmid/gretel-synthetic-text-to-sql](https://huggingface.co/datasets/philschmid/gretel-synthetic-text-to-sql)), a high quality synthetic Text-to-SQL dataset including natural language instructions, schema definitions, reasoning and the corresponding SQL query.\n", + "\n", + "[Hugging Face TRL](https://huggingface.co/docs/trl/en/index) supports automatic templating of conversation dataset formats. 
This means you only need to convert your dataset into the right json objects, and `trl` takes care of templating and putting it into the right format.\n", + "\n", + "```\n", + "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n", + "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n", + "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c4ecf6db" + }, + "source": [ + "The [philschmid/gretel-synthetic-text-to-sql](https://huggingface.co/datasets/philschmid/gretel-synthetic-text-to-sql) contains over 100k samples. To keep the guide small, it is downsampled to only use 10,000 samples.\n", + "\n", + "You can now use the Hugging Face Datasets library to load the dataset and create a prompt template to combine the natural language instruction, schema definition and add a system message for your assistant." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40c3a2cf" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6e1947559b8c42f0ab2cf28efc6535b7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/737 [00:00 and the , generate the corresponding SQL command to retrieve the desired data, considering the query's syntax, semantics, and schema constraints.\\n\\n\\nCREATE TABLE Menu (id INT PRIMARY KEY, name VARCHAR(255), category VARCHAR(255), price DECIMAL(5,2));\\n\\n\\n\\nCalculate the average price of all menu items in the Vegan category\\n\\n\", 'role': 'user'}\n", + "{'content': \"SELECT AVG(price) FROM Menu WHERE category = 'Vegan';\", 'role': 'assistant'}\n" + ] + } + ], + "source": [ + "from datasets import load_dataset\n", + "\n", + "# System message for the assistant\n", + "system_message = \"\"\"You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\"\"\"\n", + "\n", + "# User prompt that combines the user query and the schema\n", + "user_prompt = \"\"\"Given the and the , generate the corresponding SQL command to retrieve the desired data, considering the query's syntax, semantics, and schema constraints.\n", + "\n", + "\n", + "{context}\n", + "\n", + "\n", + "\n", + "{question}\n", + "\n", + "\"\"\"\n", + "def create_conversation(sample):\n", + " return {\n", + " \"messages\": [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt.format(question=sample[\"sql_prompt\"], context=sample[\"sql_context\"])},\n", + " {\"role\": \"assistant\", \"content\": sample[\"sql\"]}\n", + " ]\n", + " }\n", + "\n", + "# Load dataset from the hub\n", + "dataset = load_dataset(\"philschmid/gretel-synthetic-text-to-sql\", split=\"train\")\n", + "dataset = dataset.shuffle().select(range(12500))\n", + 
"\n", + "# Convert dataset to OAI messages\n", + "dataset = dataset.map(create_conversation, remove_columns=dataset.features,batched=False)\n", + "# split dataset into 80% training samples and 20% test samples\n", + "dataset = dataset.train_test_split(test_size=0.2)\n", + "\n", + "# Print formatted user prompt\n", + "for item in dataset[\"train\"][0][\"messages\"]:\n", + " print(item)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c0eb2e06" + }, + "source": [ + "## Fine-tune Gemma using TRL and the SFTTrainer\n", + "\n", + "You are now ready to fine-tune your model. Hugging Face TRL [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) makes it straightforward to supervise fine-tune open LLMs. The `SFTTrainer` is a subclass of the `Trainer` from the `transformers` library and supports all the same features, including logging, evaluation, and checkpointing, but adds additional quality of life features, including:\n", + "\n", + "* Dataset formatting, including conversational and instruction formats\n", + "* Training on completions only, ignoring prompts\n", + "* Packing datasets for more efficient training\n", + "* Parameter-efficient fine-tuning (PEFT) support including QloRA\n", + "* Preparing the model and tokenizer for conversational fine-tuning (such as adding special tokens)\n", + "\n", + "The following code loads the Gemma model and tokenizer from Hugging Face and initializes the quantization configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "18069ed2" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0b17e7e80e884df59a0bea8b6f6802e9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f5cfbb54cfec4e7d93ed2eb0d5b2e62a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00= 8:\n", + " torch_dtype = torch.bfloat16\n", + "else:\n", + " torch_dtype = torch.float16\n", + "\n", + "# Define model init arguments\n", + "model_kwargs = dict(\n", + " dtype=torch_dtype,\n", + " device_map=\"auto\", # Let torch decide how to load the model\n", + ")\n", + "\n", + "# BitsAndBytesConfig: Enables 4-bit quantization to reduce model size/memory usage\n", + "model_kwargs[\"quantization_config\"] = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_quant_type='nf4',\n", + " bnb_4bit_compute_dtype=model_kwargs['dtype'],\n", + " bnb_4bit_quant_storage=model_kwargs['dtype'],\n", + ")\n", + "\n", + "# Load model and tokenizer\n", + "model = AutoModelForImageTextToText.from_pretrained(model_id, **model_kwargs)\n", + "tokenizer = AutoTokenizer.from_pretrained(\"google/gemma-4-E2B-it\") # Load the Instruction Tokenizer to use the official Gemma template" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "37ec1d1b" + }, + "source": [ + "The `SFTTrainer` supports a built-in integration with `peft`, which makes it straightforward to efficiently tune LLMs using QLoRA. You only need to create a `LoraConfig` and provide it to the trainer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ed00e846" + }, + "outputs": [], + "source": [ + "from peft import LoraConfig\n", + "\n", + "peft_config = LoraConfig(\n", + " lora_alpha=16,\n", + " lora_dropout=0.05,\n", + " r=16,\n", + " bias=\"none\",\n", + " target_modules=\"all-linear\",\n", + " task_type=\"CAUSAL_LM\",\n", + " modules_to_save=[\"lm_head\", \"embed_tokens\"], # make sure to save the lm_head and embed_tokens as you train the special tokens\n", + " ensure_weight_tying=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bbd9fc1b" + }, + "source": [ + "Before you can start your training, you need to define the hyperparameter you want to use in a `SFTConfig` instance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "989be3c1" + }, + "outputs": [], + "source": [ + "import torch\n", + "from trl import SFTConfig\n", + "\n", + "args = SFTConfig(\n", + " output_dir=\"gemma-text-to-sql\", # directory to save and repository id\n", + " max_length=512, # max length for model and packing of the dataset\n", + " num_train_epochs=3, # number of training epochs\n", + " per_device_train_batch_size=1, # batch size per device during training\n", + " optim=\"adamw_torch_fused\", # use fused adamw optimizer\n", + " logging_steps=10, # log every 10 steps\n", + " save_strategy=\"epoch\", # save checkpoint every epoch\n", + " eval_strategy=\"epoch\", # evaluate checkpoint every epoch\n", + " learning_rate=5e-5, # learning rate\n", + " fp16=True if model.dtype == torch.float16 else False, # use float16 precision\n", + " bf16=True if model.dtype == torch.bfloat16 else False, # use bfloat16 precision\n", + " max_grad_norm=0.3, # max gradient norm based on QLoRA paper\n", + " lr_scheduler_type=\"constant\", # use constant learning rate scheduler\n", + " push_to_hub=True, # push model to hub\n", + " report_to=\"tensorboard\", # report metrics to tensorboard\n", + " 
dataset_kwargs={\n", + " \"add_special_tokens\": False, # Template with special tokens\n", + " \"append_concat_token\": True, # Add EOS token as separator token between examples\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dd88e798" + }, + "source": [ + "You now have every building block you need to create your `SFTTrainer` to start the training of your model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ade95df7" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9061644033864e22a5cd8905051b6637", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Tokenizing train dataset: 0%| | 0/10000 [00:00\n", + " \n", + " \n", + " [1875/1875 28:32, Epoch 3/3]\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 0.536652 | 0.530056 |
| 2 | 0.430735 | 0.464053 |
| 3 | 0.386358 | 0.443147 |

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Start training, the model will be automatically saved to the Hub and the output directory\n", + "trainer.train()\n", + "\n", + "# Save the final model again to the Hugging Face Hub\n", + "trainer.save_model()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b47b9733" + }, + "source": [ + "Before you can test your model, make sure to free the memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40a32ed7" + }, + "outputs": [], + "source": [ + "# free the memory again\n", + "del model\n", + "del trainer\n", + "torch.cuda.empty_cache()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "862e9728" + }, + "source": [ + "When using QLoRA, you only train adapters and not the full model. This means when saving the model during training you only save the adapter weights and not the full model. If you want to save the full model, which makes it easier to use with serving stacks like vLLM or TGI, you can merge the adapter weights into the model weights using the `merge_and_unload` method and then save the model with the `save_pretrained` method. This saves a default model, which can be used for inference.\n", + "\n", + "Note: It requires more than 30GB of CPU Memory when you want to merge the adapter into the model. You can skip this and continue with Test Model Inference." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "761e324b" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b58cae40ed3d40d89be8b4065548a69d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading weights: 0%| | 0/2011 [00:00<|turn>system\n", + "You are a text to SQL query translator. 
Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\n", + "<|turn>user\n", + "Given the and the , generate the corresponding SQL command to retrieve the desired data, considering the query's syntax, semantics, and schema constraints.\n", + "\n", + "\n", + "CREATE TABLE broadband_plans (plan_id INT, plan_name VARCHAR(255), download_speed INT, upload_speed INT, price DECIMAL(5,2));\n", + "\n", + "\n", + "\n", + "Delete a broadband plan from the 'broadband_plans' table\n", + "\n", + "<|turn>model\n", + "\n", + "Context:\n", + " CREATE TABLE broadband_plans (plan_id INT, plan_name VARCHAR(255), download_speed INT, upload_speed INT, price DECIMAL(5,2));\n", + "Query:\n", + " Delete a broadband plan from the 'broadband_plans' table\n", + "Original Answer:\n", + "DELETE FROM broadband_plans WHERE plan_id = 3001;\n", + "Generated Answer:\n", + "DELETE FROM broadband_plans\n", + "WHERE plan_name = 'Basic';\n" + ] + } + ], + "source": [ + "from random import randint\n", + "import re\n", + "from transformers import pipeline, GenerationConfig\n", + "\n", + "config = GenerationConfig.from_pretrained(model_id)\n", + "config.max_new_tokens = 256\n", + "\n", + "# Load the model and tokenizer into the pipeline\n", + "pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n", + "\n", + "# Load a random sample from the test dataset (randint is inclusive on both ends)\n", + "rand_idx = randint(0, len(dataset[\"test\"]) - 1)\n", + "test_sample = dataset[\"test\"][rand_idx]\n", + "\n", + "# Convert the test example into a prompt with the Gemma template\n", + "prompt = pipe.tokenizer.apply_chat_template(test_sample[\"messages\"][:2], tokenize=False, add_generation_prompt=True)\n", + "print(prompt)\n", + "\n", + "# Generate our SQL query.\n", + "outputs = pipe(prompt, generation_config=config)\n", + "\n", + "# Extract the user query and original answer\n", + "print(f\"Context:\\n\", re.search(r'\\n(.*?)\\n', test_sample['messages'][1]['content'], 
re.DOTALL).group(1).strip())\n", + "print(f\"Query:\\n\", re.search(r'\\n(.*?)\\n', test_sample['messages'][1]['content'], re.DOTALL).group(1).strip())\n", + "print(f\"Original Answer:\\n{test_sample['messages'][2]['content']}\")\n", + "print(f\"Generated Answer:\\n{outputs[0]['generated_text'][len(prompt):].strip()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6f8ff452" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "This tutorial covered how to fine-tune a Gemma model using TRL and QLoRA. Check out the following docs next:\n", + "\n", + "* Learn how to [generate text with a Gemma model](https://ai.google.dev/gemma/docs/get_started).\n", + "* Learn how to [fine-tune Gemma for vision tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora).\n", + "* Learn how to perform [distributed fine-tuning and inference on a Gemma model](https://ai.google.dev/gemma/docs/core/distributed_tuning).\n", + "* Learn how to [use Gemma open models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma).\n", + "* Learn how to [fine-tune Gemma using KerasNLP and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb)." 
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "huggingface_text_finetune_qlora.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/google-cookbook/huggingface_text_full_finetune.ipynb b/tooling/fine-tuning/google-cookbook/huggingface_text_full_finetune.ipynb new file mode 100644 index 0000000..c9b253d --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/huggingface_text_full_finetune.ipynb @@ -0,0 +1,937 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "926bada6" + }, + "source": [ + "##### Copyright 2025 Google LLC." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "a110dfce" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f9673bd6" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e624ec07" + }, + "source": [ + "# Full Model Fine-Tune using Hugging Face Transformers\n", + "\n", + "This guide walks you through how to fine-tune Gemma on a mobile game NPC dataset using Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) and [TRL](https://huggingface.co/docs/trl/index). You will learn how to:\n", + "\n", + "- Set up the development environment\n", + "- Prepare the fine-tuning dataset\n", + "- Fully fine-tune Gemma using TRL and the SFTTrainer\n", + "- Test model inference and run vibe checks\n", + "\n", + "> Note: This guide was created to run on a Google Colaboratory account using an NVIDIA T4 GPU with 16GB and Gemma 270m, but can be adapted to run on bigger GPUs and bigger models.\n", + "\n", + "## Setup development environment\n", + "\n", + "The first step is to install the Hugging Face libraries, including TRL and Datasets, to fine-tune open models. TRL also supports different RLHF and alignment techniques." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BEK9IfKBqQaA" + }, + "outputs": [], + "source": [ + "# Install PyTorch & other libraries\n", + "%pip install torch tensorboard\n", + "\n", + "# Install Hugging Face libraries\n", + "%pip install transformers datasets accelerate evaluate trl protobuf sentencepiece\n", + "\n", + "# Uncomment if you are running on a GPU that supports the BF16 data type and Flash Attention, such as NVIDIA L4 or NVIDIA A100\n", + "#%pip install flash-attn" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ef3d54b" + }, + "source": [ + "> _Note: If you are using a GPU with the Ampere architecture or newer (such as NVIDIA A100 or L4), you can use Flash Attention. Flash Attention is a method that significantly speeds up computation and reduces memory usage from quadratic to linear in sequence length, accelerating training by up to 3x. 
Learn more at [FlashAttention](https://github.com/Dao-AILab/flash-attention/tree/main)._\n", + "\n", + "Before you can start training, make sure that you have accepted the terms of use for Gemma. You can accept the license on [Hugging Face](http://huggingface.co/google/gemma-3-270m-it) by clicking on the Agree and access repository button on the model page at: http://huggingface.co/google/gemma-3-270m-it\n", + "\n", + "After you have accepted the license, you need a valid Hugging Face Token to access the model. If you are running inside Google Colab, you can securely provide your Hugging Face Token through Colab secrets; otherwise, you can pass the token directly to the `login` method. Make sure your token has write access too, as the model is pushed to the Hub during training." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "b6d79c93" + }, + "outputs": [], + "source": [ + "from google.colab import userdata\n", + "from huggingface_hub import login\n", + "\n", + "# Log in to the Hugging Face Hub\n", + "hf_token = userdata.get('HF_TOKEN') # If you are running inside a Google Colab\n", + "login(hf_token)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xnbflqW6YJls" + }, + "source": [ + "You can keep the results on Colab's local virtual machine. However, we highly recommend saving your intermediate results to your Google Drive. This ensures your training results are safe and allows you to easily compare and select the best model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jUUs-NjaYLf7" + }, + "outputs": [], + "source": [ + "from google.colab import drive\n", + "drive.mount('/content/drive')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3bDMa9CMCdzv" + }, + "source": [ + "Select the base model to fine-tune, then adjust the checkpoint directory and the learning rate." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "6J3PWm4SzoSw" + }, + "outputs": [], + "source": [ + "base_model = \"google/gemma-3-270m-it\" # @param [\"google/gemma-3-270m-it\",\"google/gemma-3-1b-it\",\"google/gemma-3-4b-it\",\"google/gemma-3-12b-it\",\"google/gemma-3-27b-it\"] {\"allow-input\":true}\n", + "checkpoint_dir = \"/content/drive/MyDrive/MyGemmaNPC\" #@param {type:\"string\"}\n", + "learning_rate = 5e-5 #@param {type:\"number\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "42c60525" + }, + "source": [ + "## Create and prepare the fine-tuning dataset\n", + "\n", + "The [bebechien/MobileGameNPC](https://huggingface.co/datasets/bebechien/MobileGameNPC) dataset provides a small sample of conversations between a player and two alien NPCs (a Martian and a Venusian), each with a unique speaking style. For instance, the Martian NPC speaks with an accent that replaces 's' sounds with 'z', uses 'da' for 'the' and 'diz' for 'this', and includes occasional clicks like `*k'tak*`.\n", + "\n", + "This dataset demonstrates a key principle for fine-tuning: the required dataset size depends on the desired output.\n", + "\n", + "- To teach the model a stylistic variation of a language it already knows, such as the Martian's accent, a small dataset with as few as 10 to 20 examples can be sufficient.\n", + "- However, to teach the model a completely new or mixed alien language, a significantly larger dataset would be required."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "bc3BYl72pWhp" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2fee8582aef54ffba9a9250c425c0983", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/141 [00:00\n", + " [25/25 04:13, Epoch 5/5]\n", + "
Epoch | Training Loss | Validation Loss
1 | 4.364200 | 3.838531
2 | 2.669100 | 3.580106
3 | 1.747000 | 3.666415
4 | 0.779900 | 4.499709
5 | 0.449600 | 5.471325
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Start training; the model is saved automatically to the Hub and the output directory\n", + "trainer.train()\n", + "\n", + "# Save the final model again to the Hugging Face Hub\n", + "trainer.save_model()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xll8zZ3_u8Mt" + }, + "source": [ + "To plot the training and validation losses, you would typically extract these values from the `TrainerState` object or the logs generated during training.\n", + "\n", + "Libraries like Matplotlib can then be used to visualize these values over training steps or epochs. The x-axis represents the training steps or epochs, and the y-axis represents the corresponding loss values." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "vPN-DTopaUIy" + }, + "outputs": [ + { + "data": { + "image/png": "[base64 PNG data omitted: Matplotlib figure]
8PFAoFOnTogKioqHLPrWyOiiAImDZtGtavX4+goCAoFAq0bNkSW7durVD/7t270b59e5iamsLHxwdLly6t8ryXffv24cknn0TTpk2hUCjg4eGBV155BQUFBRXen6WlJZKSkjB06FBYWlrC0dERM2fOrNAXmZmZGD9+PKytrWFjY4Pw8HBkZmY+tBZAM6py/vx5HD9+vMJjK1asgCAIGDVqFIqLizFr1iy0a9cO1tbWsLCwQLdu3bBr166H7qOyOSqiKOLDDz+Eu7s7zM3N0atXL5w5c6bCczMyMjBz5kwEBwfD0tISSqUSAwcORHR0tHab3bt3o0OHDgCACRMmaA8XlM3PqWyOSl5eHl577TV4eHhAoVCgRYsW+Pzzz3HvxeGr87moqdTUVEycOBHOzs4wNTVFq1at8Msvv1TYbtWqVWjXrh2srKygVCoRHByMr7/+Wvt4SUkJ5syZAz8/P5iamsLe3h5du3bF9u3bH7j/su/P3r178cILL8De3h5KpRLjxo3DrVu3Kmy/ZcsWdOvWDRYWFrCyssLgwYMrfO/KPr9XrlzBoEGDYGVlhTFjxtSwh+64+//5V199BU9PT5iZmaFHjx44ffp0he3//fdfba02NjYYMmQIzp07V2G7pKQkTJw4Uftzp1mzZpgyZQqKi4vLbVdUVIRXX30Vjo6OsLCwwLBhw3Dz5s1avy/SLf4ZSjqRnp6OgQMHYuTIkXjmmWfg7OwMQPND09LSEq+++iosLS3x77//YtasWcjOzsb8+fMf+rorVqxATk4OXnjhBQiCgM8++wzDhw/H1atXH/qX9f79+7F27Vq8+OKLsLKywjfffIMRI0YgISEB9vb2AIATJ05gwIABcHV1xZw5c6BSqTB37lw4OjpW6X2vXr0a+fn5mDJlCuzt7XHkyBEsXLgQ165dw+rVq8ttq1Kp0L9/f3Tq1Amff/45duzYgS+++AI+Pj6YMmUKAM0v/CFDhmD//v2YPHkyAgICsG7dOoSHh1epnjFjxmDOnDlYsWIF2rZtW27ff/75J7p164amTZsiLS0NP/74I0aNGoXnn38eOTk5+Omnn9C/f38cOXKkwuGWh5k1axY+/PBDDBo0CIMGDcLx48fx6KOPVvjFcPXqVaxfvx5PPvkkmjVrhhs3bmDp0qXo0aMHzp49Czc3NwQEBGDu3LmYNWsWJk2ahG7dugEAunTpUum+RVHE448/jl27dmHixIlo3bo1IiMj8frrryMpKQlfffVVue2r8rmoqYKCAvTs2ROXL1/GtGnT0KxZM6xevRrjx49HZmYmXn75ZQDA9u3bMWrUKPTp0weffvopAODcuXM4cOCAdpuIiAjMmzcPzz33HDp27Ijs7GwcPXoUx48fR79+/R5ay7Rp02BjY4OIiAhcuHABixcvRnx8PHbv3q0N4b/99hvCw8PRv39/fPrpp8jPz8fixYvRtWtXnDhxolwgLC0tRf/+/dG1a1d8/vnnVRoxzcrKQlpaWrk2QRAq9POvv/6KnJwcTJ06FYWFhfj666/Ru3dvxMTEaH+W7NixAwMHDoS3tzciIiJQUFCAhQsXIjQ0FMePH9fWev36dXTs2BGZmZmYNGkS/P39kZSUhDVr1iA/Px8mJiba/U6fPh22traYPXs24uLisGDBAkybNg1//PHHQ98b1SGRqBqmTp0q3vux6dGjhwhAXLJkSYXt8/PzK7S98MILorm5uVhYWKhtCw8PFz09PbX3Y2NjRQCivb29mJGRoW3fsGGDCED8+++/tW2zZ8+uUBMA0cTERLx8+bK2LTo6WgQgLly4UNsWFhYmmpubi0lJSdq2S5cuiUZGRhVeszKVvb958+aJgiCI8fHx5d4fAHHu3Lnltm3Tpo3Yrl077f3169eLAMTPPvtM21ZaWip269ZNBCAuW7bsoTV16NBBdHd3F1UqlbZt69atIgBx6dKl2tcsKioq97xbt26Jzs7O4rPPPluuHYA4e/Zs7f1ly5aJAMTY2F
hRFEUxNTVVNDExEQcPHiyq1Wrtdu+8844IQAwPD9e2FRYWlqtLFDXfa4VCUa5voqKi7vt+7/2slPXZhx9+WG67J554QhQEodxnoKqfi8qUfSbnz59/320WLFggAhD/97//aduKi4vFzp07i5aWlmJ2drYoiqL48ssvi0qlUiwtLb3va7Vq1UocPHjwA2uqTNn3p127dmJxcbG2/bPPPhMBiBs2bBBFURRzcnJEGxsb8fnnny/3/JSUFNHa2rpce9nn96233qpWDZXdFAqFdruyPjUzMxOvXbumbf/vv/9EAOIrr7yibWvdurXo5OQkpqena9uio6NFmUwmjhs3Tts2btw4USaTiVFRURXqKvt8ltXXt2/fcp/ZV155RZTL5WJmZmaV3ifVDR76IZ1QKBSYMGFChXYzMzPt1zk5OUhLS0O3bt2Qn5+P8+fPP/R1n376adja2mrvl/11ffXq1Yc+t2/fvvDx8dHeDwkJgVKp1D5XpVJhx44dGDp0KNzc3LTb+fr6YuDAgQ99faD8+8vLy0NaWhq6dOkCURRx4sSJCttPnjy53P1u3bqVey+bN2+GkZGRdoQF0MwJmT59epXqATTziq5du4a9e/dq21asWAETExM8+eST2tcs+8tSrVYjIyMDpaWlaN++faWHjR5kx44dKC4uxvTp08sdLpsxY0aFbRUKBWQyzY8dlUqF9PR0WFpaokWLFtXeb5nNmzdDLpfjpZdeKtf+2muvQRRFbNmypVz7wz4XtbF582a4uLhg1KhR2jZjY2O89NJLyM3NxZ49ewAANjY2yMvLe+BhHBsbG5w5cwaXLl2qUS2TJk0qN+o4ZcoUGBkZYfPmzQA0ozqZmZkYNWoU0tLStDe5XI5OnTpVehjw7s9lVXz77bfYvn17udu93w8AGDp0KJo0aaK937FjR3Tq1Elba3JyMk6ePInx48fDzs5Ou11ISAj69eun3U6tVmP9+vUICwurdG7MvYdzJ02aVK6tW7duUKlUiI+Pr9b7JP1iUCGdaNKkSbkh1TJnzpzBsGHDYG1tDaVSCUdHR+1E3KysrIe+btOmTcvdLwstlR1rf9hzy55f9tzU1FQUFBRUegZCVc9KSEhI0P7wLJt30qNHDwAV35+pqWmFQ0p31wMA8fHxcHV1haWlZbntWrRoUaV6AGDkyJGQy+VYsWIFAKCwsBDr1q3DwIEDy4W+X375BSEhIdr5D46Ojti0aVOVvi93K/uh7ufnV67d0dGx3P4AzS+Sr776Cn5+flAoFHBwcICjoyNOnTpV7f3evX83NzdYWVmVay87E+3eXzoP+1zURnx8PPz8/LRh7H61vPjii2jevDkGDhwId3d3PPvssxXmycydOxeZmZlo3rw5goOD8frrr1frtPJ7vx+WlpZwdXXVzi0qC0C9e/eGo6Njudu2bduQmppa7vlGRkZwd3ev8v4BTeDo27dvuVuvXr0eWisANG/eXFtrWb9V9v8gICAAaWlpyMvLw82bN5GdnY2goKAq1Vebny9UdzhHhXTi7pGFMpmZmejRoweUSiXmzp0LHx8fmJqa4vjx43jzzTerdIrp/c4uEe+ZJKnr51aFSqVCv379kJGRgTfffBP+/v6wsLBAUlISxo8fX+H91dWZMk5OTujXrx/++usvfPvtt/j777+Rk5NTbvLj//73P4wfPx5Dhw7F66+/DicnJ8jlcsybNw9XrlzRW20ff/wx3n//fTz77LP44IMPYGdnB5lMhhkzZtTZKcf6/lxUhZOTE06ePInIyEhs2bIFW7ZswbJlyzBu3DjtxNvu3bvjypUr2LBhA7Zt24Yff/wRX331FZYsWYLnnnuu1jWU9fdvv/0GFxeXCo/feybd3aNhDYUhfBbo4RhUSG92796N9PR0rF27Ft27d9e2x8bGSljVHU5OTjA1Na10gbQHLZpWJiYmBhcvXsQvv/yCcePGadsfdlbGg3h6emLnzp3Izc0tN6py4cKFar3OmDFjsHXrVm
zZsgUrVqyAUqlEWFiY9vE1a9bA29sba9euLTf0PXv27BrVDGj+Qvf29ta237x5s8JfpmvWrEGvXr3w008/lWvPzMyEg4OD9n51Vhr29PTEjh07kJOTU25UpezQYll9dcHT0xOnTp2CWq0u90u9slpMTEwQFhaGsLAwqNVqvPjii1i6dCnef/997YienZ0dJkyYgAkTJiA3Nxfdu3dHRERElYLKpUuXyo1e5ObmIjk5GYMGDQIA7eEvJycn9O3bt/ZvvhYqO7x18eJF7QTZsn6r7P/B+fPn4eDgAAsLC5iZmUGpVFZ6xhDVXw0rHpNBKftr5e6/ToqLi/Hdd99JVVI5crkcffv2xfr163H9+nVt++XLlys9jl7Z84Hy708UxXKnmFbXoEGDUFpaisWLF2vbVCoVFi5cWK3XGTp0KMzNzfHdd99hy5YtGD58OExNTR9Y+3///YdDhw5Vu+a+ffvC2NgYCxcuLPd6CxYsqLCtXC6v8Nfq6tWrkZSUVK6tbH2OqpyWPWjQIKhUKixatKhc+1dffQVBEKo830gXBg0ahJSUlHJnjZSWlmLhwoWwtLTUHhZMT08v9zyZTKZdhK+oqKjSbSwtLeHr66t9/GG+//57lJSUaO8vXrwYpaWl2v7o378/lEolPv7443LblanL03TXr19f7jNw5MgR/Pfff9paXV1d0bp1a/zyyy/lPhOnT5/Gtm3btOFLJpNh6NCh+PvvvytdHp8jJfUTR1RIb7p06QJbW1uEh4drl3f/7bffDOqHRUREBLZt24bQ0FBMmTJF+wsvKCjoocu3+/v7w8fHBzNnzkRSUhKUSiX++uuvWh3fDgsLQ2hoKN566y3ExcUhMDAQa9eurfb8DUtLSwwdOlQ7T+XeNS8ee+wxrF27FsOGDcPgwYMRGxuLJUuWIDAwELm5udXaV9l6MPPmzcNjjz2GQYMG4cSJE9iyZUu5UZKy/c6dOxcTJkxAly5dEBMTg99//73cSAyg+WvfxsYGS5YsgZWVFSwsLNCpU6dKVzkNCwtDr1698O677yIuLg6tWrXCtm3bsGHDBsyYMaPcxFld2LlzJwoLCyu0Dx06FJMmTcLSpUsxfvx4HDt2DF5eXlizZg0OHDiABQsWaEd8nnvuOWRkZKB3795wd3dHfHw8Fi5ciNatW2vnswQGBqJnz55o164d7OzscPToUaxZswbTpk2rUp3FxcXo06cPnnrqKVy4cAHfffcdunbtiscffxwAoFQqsXjxYowdOxZt27bFyJEj4ejoiISEBGzatAmhoaEVwl91bdmypdJJ8126dCn3Pff19UXXrl0xZcoUFBUVYcGCBbC3t8cbb7yh3Wb+/PkYOHAgOnfujIkTJ2pPT7a2ti53LaqPP/4Y27ZtQ48ePTBp0iQEBAQgOTkZq1evxv79+2FjY1Or90QSkOJUI6q/7nd6csuWLSvd/sCBA+IjjzwimpmZiW5ubuIbb7whRkZGigDEXbt2abe73+nJlZ0KintOl73f6clTp06t8FxPT89yp8uKoiju3LlTbNOmjWhiYiL6+PiIP/74o/jaa6+Jpqam9+mFO86ePSv27dtXtLS0FB0cHMTnn39ee7rr3afWhoeHixYWFhWeX1nt6enp4tixY0WlUilaW1uLY8eOFU+cOFHl05PLbNq0SQQgurq6VjglWK1Wix9//LHo6ekpKhQKsU2bNuI///xT4fsgig8/PVkURVGlUolz5swRXV1dRTMzM7Fnz57i6dOnK/R3YWGh+Nprr2m3Cw0NFQ8dOiT26NFD7NGjR7n9btiwQQwMDNSeKl723iurMScnR3zllVdENzc30djYWPTz8xPnz59f7tTTsvdS1c/Fvco+k/e7/fbbb6IoiuKNGzfECRMmiA4ODqKJiYkYHBxc4fu2Zs0a8dFHHxWdnJxEExMTsWnTpuILL7wgJicna7f58MMPxY4dO4o2NjaimZmZ6O/vL3700UflTjmuTNn3Z8+ePeKkSZNEW1
tb0dLSUhwzZky5U3vL7Nq1S+zfv79obW0tmpqaij4+PuL48ePFo0ePare53+f3YTXc71bWH3f/P//iiy9EDw8PUaFQiN26dROjo6MrvO6OHTvE0NBQ0czMTFQqlWJYWJh49uzZCtvFx8eL48aNEx0dHUWFQiF6e3uLU6dO1Z6SX1bfvacw79q1q8LPJpKeIIoG9OctkYEYOnRorU4NJZLK8uXLMWHCBERFRRn88vVxcXFo1qwZ5s+fj5kzZ0pdDhkozlGhRu/e5e4vXbqEzZs3o2fPntIUREREWpyjQo2et7c3xo8fD29vb8THx2Px4sUwMTEpd3yciIikwaBCjd6AAQOwcuVKpKSkQKFQoHPnzvj4448rXYSKiIjqFueoEBERkcHiHBUiIiIyWAwqREREZLDq9RwVtVqN69evw8rKqlpLbhMREZF0RFFETk4O3NzcHnoNqXodVK5fvw4PDw+pyyAiIqIaSExMfOhVuet1UClbjjoxMRFKpRIlJSXYtm0bHn30URgbG0tcXePBfpcG+10a7HdpsN+loa9+z87OhoeHR7kLid5PvQ4qZYd7lEqlNqiYm5tDqVTyg1yH2O/SYL9Lg/0uDfa7NPTd71WZtsHJtERERGSwGFSIiIjIYDGoEBERkcGq13NUiIiodtRqNYqLi6Uu46FKSkpgZGSEwsJCqFQqqctpNGra78bGxpDL5TqpgUGFiKiRKi4uRmxsLNRqtdSlPJQoinBxcUFiYiLXzapDtel3GxsbuLi41Pr7xaBCRNQIiaKI5ORkyOVyeHh4PHTRLamp1Wrk5ubC0tLS4GttSGrS76IoIj8/H6mpqQAAV1fXWtXAoEJE1AiVlpYiPz8fbm5uMDc3l7qchyo7RGVqasqgUodq2u9mZmYAgNTUVDg5OdXqMBC/20REjVDZfAMTExOJK6GGqiwAl5SU1Op1GFSIiBoxzvcgfdHVZ4tBhYiIiAwWgwoRETVqXl5eWLBgQZW33717NwRBQGZmpt5qojsYVIiIqF6wtbWFXC6HIAiV3iIiImr0ulFRUZg0aVKVt+/SpQuSk5NhbW1do/1VFQORBs/6qYQoiriRXYSiUhU87S2kLoeIiACcP38eVlZWkMlk+OOPPzBr1ixcuHBB+7ilpaX2a1EUoVKpYGT08F9zjo6O1arDxMQELi4u1XoO1RxHVCrx2+F4PDJvJz7cdE7qUoiI6DZnZ2e4uLjAxcUF1tbWEARBe78sxGzZsgXt2rWDQqHA/v37ceXKFQwZMgTOzs6wtLREhw4dsGPHjnKve++hH0EQ8OOPP2LYsGEwNzeHn58fNm7cqH383pGO5cuXw8bGBpGRkQgICIClpSUGDBiA5ORk7XNKS0vx0ksvwcbGBvb29njzzTcRHh6OoUOH1rg/bt26hXHjxsHW1hbm5uYYOHAgLl26pH08Pj4eYWFhsLW1hYWFBVq2bInNmzdrnztmzBg4OjrCzMwMfn5+WLZsWY1r0ScGlUo0c9CMolxJzZW4EiKiuiGKIvKLSyW5iaKos/fx1ltv4ZNPPsG5c+cQEhKC3NxcDBo0CDt37sSJEycwYMAAhIWFISEh4YGvM2fOHDz11FM4deoUBg0ahDFjxiAjI+O+2+fn5+Pzzz/Hb7/9hr179yIhIQEzZ87UPv7pp5/i999/x7Jly3DgwAFkZ2dj/fr1tXqv48ePx9GjR7Fx40YcOnQIoihi0KBB2tOBp06diqKiIuzduxcxMTH49NNPtaNO77//Ps6ePYstW7bg3LlzWLx4MRwcHGpVj77w0E8lfJ0038j4jHwUl6phYsQ8R0QNW0GJCoGzIiXZ99m5/WFuoptfR3PnzkW/fv209+3s7NCqVSvt/Q8++ADr1q3Dxo0bMW3atPu+zvjx4zFq1CgAwMcff4xvvvkGR44cwYABAyrdvqSkBEuWLIGPjw8AYNq0aZg7d6728YULF+Ltt9/GsGHDAACLFi3Sjm7UxKVLl7Bx40YcOHAAXbp0AQ
D8/vvv8PDwwPr16/Hkk08iISEBI0aMQHBwMADA29tb+/yEhAS0adMG7du3B6AZVTJU/A1cCRelKSwVRlCpRcSl50ldDhERVVHZL94yubm5mDlzJgICAmBjYwNLS0ucO3fuoSMqISEh2q8tLCygVCq1S8JXxtzcXBtSAM2y8WXbZ2Vl4caNG+jYsaP2cblcjnbt2lXrvd3t3LlzMDIyQqdOnbRt9vb2aNGiBc6d00xbeOmll/Dhhx8iNDQUs2fPxqlTp7TbTpkyBatWrULr1q3xxhtv4ODBgzWuRd84olIJQRDg42iB6GtZuJyai+bOVlKXRESkV2bGcpyd21+yfeuKhUX5EyBmzpyJ7du34/PPP4evry/MzMzwxBNPPPSK0cbGxuXuC4LwwIs3Vra9Lg9p1cRzzz2H/v37Y9OmTdi2bRvmzZuHL774AtOnT8fAgQMRHx+PzZs3Y/v27ejTpw+mTp2Kzz//XNKaK8MRlfvwuX345zLnqRBRIyAIAsxNjCS56XN13AMHDmD8+PEYNmwYgoOD4eLigri4OL3trzLW1tZwdnZGVFSUtk2lUuH48eM1fs2AgACUlpbiv//+07alp6fjwoULCAwM1LZ5eHhg8uTJWLt2LV577TX88MMP2sccHR0RHh6O//3vf1iwYAG+//77GtejTxxRuQ9fBhUionrPz88Pa9euRVhYGARBwPvvv//AkRF9mT59OubNmwdfX1/4+/tj4cKFuHXrVpVCWkxMDKys7ozsC4KAVq1aYciQIXj++eexdOlSWFlZ4a233kKTJk0wZMgQAMCMGTMwcOBANG/eHLdu3cKuXbsQEBAAAJg1axbatWuHli1boqioCP/884/2MUPDoHIfvo4MKkRE9d2XX36JZ599Fl26dIGDgwPefPNNZGdn13kdb775JlJSUjBu3DjI5XJMmjQJ/fv3r9JVhbt3717uvlwuR2lpKZYtW4aXX34Zjz32GIqLi9G9e3ds3rxZexhKpVJh6tSpuHbtGpRKJQYMGICvvvoKgGYtmLfffhtxcXEwMzNDt27dsGrVKt2/cR0QRKkPotVCdnY2rK2tkZWVBaVSiZKSEmzevBmDBg2qcLywumLT8tDr890wNZbh7JwBkMl44a770WW/U9Wx36XRUPq9sLAQsbGxaNasGUxNTaUu56HUajWys7OhVCohk9X/WQtqtRoBAQF46qmn8MEHH0hdzn3Vpt8f9Bm79/f3g3BE5T48bM1gIpehsESNpMwCeNiZS10SERHVU/Hx8di2bRt69OiBoqIiLFq0CLGxsRg9erTUpRm8+h9L9cRILtMu/MbDP0REVBsymQzLly9Hhw4dEBoaipiYGOzYscNg54UYEo6oPICvkyUu3MjB5dRc9PJ3krocIiKqpzw8PHDgwAGpy6iXOKLyADxFmYiISFoMKg+gPUX5JoMKERGRFBhUHuDuU5Tr8clRRERE9RaDygN4O1pAEICsghKk5T54uWUiIiLSPQaVBzA1lsPDVnNaMuepEBER1T0GlYfgPBUiIiLpMKg8RFlQucIRFSKiBqFnz56YMWOG9r6XlxcWLFjwwOcIgoD169fXet+6ep3GhEHlIXjNHyIiwzBy5EgMHDiw0sf27dsHQRBw6tSpar9uVFQUJk2aVNvyyomIiEDr1q0rtCcnJ9/3PejK8uXLYWNjo9d91CUGlYfgWipERIZh7Nix2LFjB65du1bhsWXLlqF9+/YICQmp9us6OjrC3LxuLpPi4uIChUJRJ/tqKBhUHqLs0E9KdiFyCkskroaIqPHq378/HB0dsXz58nLtubm5WL16NSZOnIj09HSMGjUKTZo0gbm5OYKDg7Fy5coHvu69h34uXbqE7t27w9TUFIGBgdi+fXuF57z55pto3rw5zM3N4e3tjffffx8lJZrfEcuXL8ecOXMQHR0NQRAgCIK25nsP/cTExKB3794wMzODvb09Jk2ahNzcO38Yjx8/HkOHDsXnn38OV1dX2NvbY+
rUqdp91URCQgKGDBkCS0tLKJVKPPXUU7hx44b28ejoaPTq1QtWVlawsbFBz549cfToUQCaaxaFhYXB1tYWFhYWaNmyJTZv3lzjWqqCS+g/hLWZMRytFLiZU4QrN/PQ2sNG6pKIiHRPFIGSfGn2bWwOCA+/Qr2RkRHGjh2L5cuX491334Vw+zmrV6+GSqXCqFGjkJubi3bt2uHNN9+EUqnEpk2bMHbsWPj4+KBjx44P3Ydarcbw4cPh7OyM//77D1lZWeXms5SxsrLC8uXL4ebmhpiYGDz//POwsrLCG2+8gaeffhqnT5/G1q1bsWPHDgCAtbV1hdfIy8tD//790blzZ0RFRSE1NRXPPfccpk2bVi6M7dq1C66urti1axcuX76Mp59+Gq1bt8bzzz//0PdT2fsrCyl79uxBaWkppk6diqeffhq7d+8GAIwZMwZt2rTB4sWLIQgCDh06pL1S+NSpU1FcXIy9e/fCwsICZ8+ehaWlZbXrqA4GlSrwdbTEzZwiXLqRw6BCRA1TST7wsZs0+37nOmBiUaVNJ0yYgM8//xx79uxBz549AWgO+4wYMQLW1tawtrbGzJkztdtPnz4dkZGR+PPPP6sUVHbs2IHz588jMjISbm6a/vj4448rzCt57733tF97eXlh5syZWLVqFd544w2YmZnB0tISRkZGcHFxue++VqxYgcLCQvz666+wsNC8/0WLFiEsLAyffvopnJ2dAQC2trZYtGgR5HI5/P39MXjwYOzcubNGQWXnzp2IiYlBbGwsPDw8AAC//vorWrZsiaioKHTo0AEJCQl4/fXX4e/vD7VaDWdnZyiVSgCa0ZgRI0YgODgYAODt7V3tGqqLh36qgKcoExEZBn9/f3Tp0gU///wzAODy5cvYt28fJk6cCABQqVT44IMPEBwcDDs7O1haWiIyMhIJCQlVev1z587Bw8NDG1IAoHPnzhW2++OPPxAaGgoXFxdYWlrivffeq/I+7t5Xq1attCEFAEJDQ6FWq3HhwgVtW8uWLSGXy7X3XV1dkZqaWq193b1PDw8PbUgBgMDAQNjY2ODcuXMAgFdffRXPPfcc+vbti08//RSxsbHabV966SV8+OGHCA0NxezZs2s0ebm6OKJSBTxFmYgaPGNzzciGVPuuhokTJ2L69On49ttvsWzZMvj4+KBHjx4AgPnz5+Prr7/GggULEBwcDAsLC8yYMQPFxbpbXfzQoUMYM2YM5syZg/79+8Pa2hqrVq3CF198obN93K3ssEsZQRCgVqv1si9Ac8bS6NGjsWnTJmzevBkRERFYsWIFRowYgeeeew79+/fHpk2bsG3bNsybNw9ffPEFpk+frrd6OKJSBb4884eIGjpB0Bx+keJWhfkpd3vqqacgk8mwYsUK/Prrr3j22We181UOHDiAIUOG4JlnnkGrVq3g7e2NixcvVvm1AwICkJiYiOTkZG3b4cOHy21z8OBBeHp64t1330X79u3h5+eH+Pj4ctuYmJhApVI9dF/R0dHIy8vTth04cAAymQwtWrSocs3VUfb+EhMTtW1nz55FZmYmAgMDtW3NmzfHK6+8gsjISDz22GPl5sx4eHhg8uTJWLt2LV577TX88MMPeqm1DINKFZQFlYSMfBSWPPiDR0RE+mVpaYmnn34ab7/9NpKTkzF+/HjtY35+fti+fTsOHjyIc+fO4YUXXih3RsvD9O3bF82bN0d4eDiio6Oxb98+vPvuu+W28fPzQ0JCAlatWoUrV67gm2++wbp168pt4+XlhdjYWJw8eRJpaWkoKiqqsK8xY8bA1NQU4eHhOH36NHbt2oXp06dj7Nix2vkpNaVSqXDy5Mlyt3PnzqFv374IDg7GmDFjcPz4cRw5cgTjxo1Djx490L59exQUFGDatGnYvXs34uPjceDAAZw4cQIBAQEAgBkzZiAyMhKxsbE4fvw4du3apX1MXxhUqsDJSgErhRHUIhCXnvfwJxARkV5NnDgRt27dQv/+/cvNJ3
nvvffQtm1b9O/fHz179oSLiwuGDh1a5deVyWRYt24dCgoK0LFjRzz33HP46KOPym3z+OOP45VXXsG0adPQunVrHDx4EO+//365bUaMGIEBAwagV69ecHR0rPQUaXNzc0RGRiIjIwMdOnTAE088gT59+mDRokXV64xK5Obmok2bNuVuYWFhEAQBGzZsgK2tLbp3746+ffvC29sbf/zxBwBALpcjPT0d48aNQ/PmzTFy5Ej07dsXERERADQBaOrUqQgICMCAAQPQvHlzfPfdd7Wu90EEURRFve5Bj7Kzs2FtbY2srCwolUqUlJRg8+bNGDRoUIVjerU19NsDOJmYiUWj2+CxEIlmxhsoffY73R/7XRoNpd8LCwsRGxuLZs2awdTUVOpyHkqtViM7OxtKpRIyGf/Griu16fcHfcbu/f39IPxuVxHnqRAREdU9BpUq8mNQISIiqnMMKlXEERUiIqK6J2lQUalUeP/999GsWTOYmZnBx8cHH3zwAQxx2kxZULmalgeV2vDqIyIiaogkXfDt008/xeLFi/HLL7+gZcuWOHr0KCZMmABra2u89NJLUpZWgbutOUyMZCguVeParXx42ldtuWciIkNmiH8YUsOgq8+WpEHl4MGDGDJkCAYPHgxAc975ypUrceTIESnLqpRcJsDbwQLnU3JwOTWXQYWI6rWyJdmLi4thZmYmcTXUEOXnay5yWduz4yQNKl26dMH333+Pixcvonnz5oiOjsb+/fvx5ZdfVrp9UVFRuUVzsrOzAWhOFyy7ld3XB5/bQeVCSha6+9rpZR/1kb77nSrHfpdGQ+l3URRhamqK1NRUyOVygz/lVxRFFBcXo6CgQLsKLelfTfpdFEXk5+fj5s2bUCqVUKvVFZb8r87/H0nXUVGr1XjnnXfw2WefQS6XQ6VS4aOPPsLbb79d6fYRERGYM2dOhfYVK1bA3Lx614qoiS2JArZek6OToxqjffV3nQUiorogk8ng6OhYr9eDIcOkVquRk5ODnJycSh/Pz8/H6NGjq7SOiqRBZdWqVXj99dcxf/58tGzZEidPnsSMGTPw5ZdfIjw8vML2lY2oeHh4IC0tTbvg2/bt29GvXz+9/MfbHJOCl/88hdYe1lg9qZPOX7++0ne/U+XY79JoaP2uVqtRUlJi8HNVSktLcfDgQXTp0gVGRryebl2pSb8LggAjI6NyV3y+V3Z2NhwcHKoUVCT9br/++ut46623MHLkSABAcHAw4uPjMW/evEqDikKhgEKhqNBubGxc7gfGvfd1pYWbNQDgys08GBkZcfjxHvrqd3ow9rs0GlK/V/Zz1dCUlJSgtLQUlpaWDabf6wN99Xt1XkvSg5L5+fkVjovK5XK9Xr66Npo5WEAmADmFpbiZU/ECU0RERKRbko6ohIWF4aOPPkLTpk3RsmVLnDhxAl9++SWeffZZKcu6L4WRHE3tzBGXno/LqblwUhr+9TGIiIjqM0mDysKFC/H+++/jxRdfRGpqKtzc3PDCCy9g1qxZUpb1QL5OlpqgcjMXXXwdpC6HiIioQZM0qFhZWWHBggVYsGCBlGVUi4+TJXacS+VS+kRERHXAsE+cN0C+jrzmDxERUV1hUKkmXpyQiIio7jCoVJPP7aCSmlOErIL6vTIlERGRoWNQqSalqTGclZo1BziqQkREpF8MKjVQdvjnCoMKERGRXjGo1IB2Qu1NBhUiIiJ9YlCpAU6oJSIiqhsMKjXgw6BCRERUJxhUaqBsRCXxVj4KS1QSV0NERNRwMajUgKOlAtZmxhBF4OrNPKnLISIiarAYVGpAEIQ781Q4oZaIiEhvGFRqiEvpExER6R+DSg1xLRUiIiL9Y1CpIZ6iTEREpH8MKjVUFlRi0/JQqlJLXA0REVHDxKBSQ01szGBqLEOxSo3EWwVSl0NERNQgMajUkEwmwNuBh3+IiIj0iUGlFjhPhYiISL8YVGqBQYWIiEi/GFRqgYu+ERER6ReDSi3cvZaKKIoSV0NERN
TwMKjUgpe9BeQyAblFpbiRXSR1OURERA0Og0otmBjJ4GlnDoDzVIiIiPSBQaWWfLQTanMkroSIiKjhYVCpJU6oJSIi0h8GlVriVZSJiIj0h0GllriWChERkf4wqNRS2RyVtNxiZOYXS1wNERFRw8KgUkuWCiO4WpsC4KgKERGRrjGo6AAP/xAREekHg4oO+HBCLRERkV4wqOiAnzNPUSYiItIHBhUd4CnKRERE+sGgogNlc1SSMgtQUKySuBoiIqKGg0FFB+wtFbA1N4YoAld4+IeIiEhnGFR0pGxUhUGFiIhIdxhUdISnKBMREekeg4qO8BRlIiIi3WNQ0RGOqBAREekeg4qOlAWVuPQ8lKrUEldDRETUMDCo6IibtRnMjOUoUYmIz8iXuhwiIqIGgUFFR2QyAT5OFgB4+IeIiEhXGFR0iCvUEhER6RaDig5p11JhUCEiItIJBhUd0p75w0XfiIiIdIJBRYfuHlERRVHiaoiIiOo/BhUd8rS3gJFMQF6xCslZhVKXQ0REVO8xqOiQsVwGT3tzAJxQS0REpAsMKjrGFWqJiIh0h0FFx8qCyiUGFSIiolpjUNExnqJMRESkOwwqOubraAWApygTERHpAoOKjpUto5+RV4yMvGKJqyEiIqrfGFR0zNzECE1szABwQi0REVFtMajoAc/8ISIi0g0GFT1gUCEiItINBhU94DV/iIiIdINBRQ94ijIREZFuMKjoga+jJqgkZRYgr6hU4mqIiIjqLwYVPbC1MIG9hQkA4OrNPImrISIiqr8YVPTERztPJUfiSoiIiOovBhU94Zk/REREtcegoidl81QYVIiIiGqOQUVPOKJCRERUewwqelIWVOLT81GiUktcDRERUf3EoKInrtamsDCRo1QtIj6dZ/4QERHVBIOKngiCcOfMHx7+ISIiqhEGFT3ihFoiIqLakTyoJCUl4ZlnnoG9vT3MzMwQHByMo0ePSl2WTnBEhYiIqHaMpNz5rVu3EBoail69emHLli1wdHTEpUuXYGtrK2VZOsOLExIREdWOpEHl008/hYeHB5YtW6Zta9asmYQV6dadixPmQa0WIZMJEldERERUv0gaVDZu3Ij+/fvjySefxJ49e9CkSRO8+OKLeP755yvdvqioCEVFRdr72dnZAICSkhLtrey+IXCzMoaxXEBBiQoJ6TloYmMmdUl6YWj93liw36XBfpcG+10a+ur36ryeIIqiqNO9V4OpqSkA4NVXX8WTTz6JqKgovPzyy1iyZAnCw8MrbB8REYE5c+ZUaF+xYgXMzc31Xm9NzDspR0qBgMn+KgTYStbVREREBiM/Px+jR49GVlYWlErlA7eVNKiYmJigffv2OHjwoLbtpZdeQlRUFA4dOlRh+8pGVDw8PJCWlgalUomSkhJs374d/fr1g7GxcZ28h4eZtvIkIs+m4vVH/TCpW8M5rHU3Q+z3xoD9Lg32uzTY79LQV79nZ2fDwcGhSkFF0kM/rq6uCAwMLNcWEBCAv/76q9LtFQoFFApFhXZjY+NyHXjvfSl19HZA5NlULDuYgGceaQZrc8OoSx8Mqd8bE/a7NNjv0mC/S0PX/V6d15L09OTQ0FBcuHChXNvFixfh6ekpUUW698wjTeHtaIG03CJ8svW81OUQERHVK5IGlVdeeQWHDx/Gxx9/jMuXL2PFihX4/vvvMXXqVCnL0imFkRzzhgUDAFYeScCR2AyJKyIiIqo/JA0qHTp0wLp167By5UoEBQXhgw8+wIIFCzBmzBgpy9K5Tt72GNXRAwDw9tpTKCpVSVwRERFR/SDpHBUAeOyxx/DYY49JXYbevTUgANvPpuLKzTx8t+sKXunXXOqSiIiIDJ7kS+g3Ftbmxoh4XDNx+Lvdl3E5NUfiioiIiAwfg0odGhzsij7+TihRiXh7bQzUaq6rQkRE9CAMKnVIEATMHRoEcxM5ouJuYVVUotQlERERGTQGlTrWxMYMMx9tAQCYt+UcUrMLJa6IiIjIcD
GoSCC8ixdC3K2RU1iKOX+flbocIiIig8WgIgG5TMC84cGQywRsiknGjrM3pC6JiIjIIDGoSKSlmzWeu33tn1kbTiO3qFTiioiIiAwPg4qEZvRpDg87M1zPKsQX2y48/AlERESNDIOKhMxM5PhoqGZ5/eUH43AyMVPagoiIiAwMg4rEujd3xLA2TSCKwNtrY1CiUktdEhERkcFgUDEA7w0OgK25Mc4lZ+PHfbFSl0NERGQwGFQMgL2lAu8N1iyvv2DHRcSn50lcERERkWFgUDEQw9s2QaivPYpK1Xh33WmIIpfXJyIiiRVkwrhU2j+eGVQMhCAI+GhoMBRGMuy/nIZ1J5KkLomIiBojUQSuHQPWvwijb4LgfTNS0nIYVAyIl4MFXu7rBwD44J+zyMgrlrgiIiJqNIpygaPLgKXdgR97Ayd/h1BaCNu8q5KWxaBiYJ7v5g1/Fyvcyi/Bh5u4vD4REenZjTPApteAL/yBf2YAKacAuQIIGYnS8C047POapOUZSbp3qsBYLsO84cEYvvgg1h5PwvA27ujq5yB1WURE1JCUFAJnNwBHfwIS/7vTbucDtH8WaD0aMLeDWFICnNosXZ1gUDFIbZraIryzF5YfjMM762IQOaM7zEzkUpdFRET1XfoV4OjPwMnfgYJbmjaZEeA/WBNQmvUABEHaGu/BoGKgZvZvgcgzKUjIyMc3/17CmwP8pS6JiIjqI1UJcGGzJqBc3X2n3doDaBcOtBkLWLlIVt7DMKgYKEuFEeYOCcLzvx7F93uv4vFWbghwVUpdFhER1ReZicDxXzW33JTbjQLg96hm9MSvHyAz/NF6BhUD1i/QGYOCXbA5JgVvrY3B2ildIJcZ1pAcEREZELUKuLxTM3pyKRIQb1+WxcIJaDsWaBsO2HpKW2M1MagYuIiwlth3KQ3RiZn49VAcJoQ2k7okIiIyNLmpwInfgGPLgcyEO+1e3YAOE4EWgwEjE8nKqw0GFQPnpDTFWwP98e6605gfeQGPtnRBExszqcsiIiKpiSIQt19z5s65fwB1iabd1AZoPQZoNx5wbC5lhTrBoFIPjOrQFOuOJ+Fo/C3MWn8aP4a3h2Bgs7KJiKiOFNwCTq7UHN5Jv3Sn3b2DZu5Jy2GAccP5g5ZBpR6QyQTMGx6MQd/sw87zqfjnVDLCWrlJXRYREdUVUQSSjmnCyem/gNJCTbuJJRDyFNBuAuAaIm2NesKgUk/4OVthai9fLNhxCbM2nEZnH3s4WCqkLouIiPSpKBeIWa0JKCmn7rQ7B2lGT0KeAhRW0tVXBxhU6pEXe/oi8swNnEvOxvvrT+O7MW15CIiIqCFKOa0JJ6f+BIpzNG1yBRA0XBNQ3DsY3MJs+sKgUo+YGMnw+ZMhGLLoALacTuEhICKihqSkEDi7XhNQHrCsfWPDoFLPtHSzxtRevvh6p+YQ0CPe9nC04iEgIqJ6677L2j92e1n77o1m9KQyDCr10NRevth29s4hoMXP8BAQEVG9UrasfdRPQOyeO+31ZFn7usSgUg/dfQho65kU/H0qGY/zEBARkeHLTASO/3J7Wfsbtxvr37L2dYlBpZ5q6WaNab01ZwHN3nAanXkIiIjIMD1wWftxmhEUm6bS1mjAGFTqsam9fLHtzA2c5SEgIiLDc79l7Zt114ye1ONl7esSg0o9ZiyXYT4PARERGQ5RBOL2aUZPzv0NqEs17WXL2refADj4SVpifVOjoJKYmAhBEODu7g4AOHLkCFasWIHAwEBMmjRJpwXSg/EQEBGRAcjPAKJX3WdZ+4lAy6ENaln7uiSryZNGjx6NXbt2AQBSUlLQr18/HDlyBO+++y7mzp2r0wLp4ab28kWgqxK38kvw3voYiKIodUlERA2fKALXjgLrpgBfBgCRb2tCioml5tDOC/uA53YArUcxpNRCjYLK6dOn0bFjRwDAn3/+iaCgIBw8eBC///47li9frsv6qAqM5TJ8/mQrGMkERJ65gb9PJUtdEhFRw1
WUoxk5WdoN+LEPEL1Cc+0d5yBg8JfAa+eBx75qsNfeqWs1OvRTUlIChUJzeGHHjh14/PHHAQD+/v5ITuYvSSkEuil5CIiISJ8qW9beyBRoWbasfftGvTCbvtQoqLRs2RJLlizB4MGDsX37dnzwwQcAgOvXr8Pe3l6nBVLV3X0W0HvrY7DkmXY8C4iIqDbKlrWP+gm4duROu72vJpy0GtUol7WvSzUKKp9++imGDRuG+fPnIzw8HK1atQIAbNy4UXtIiOpe2SGgxxftR+SZG9gYfR1DWjeRuiwiovon7TJwbBmXtTcANQoqPXv2RFpaGrKzs2Fra6ttnzRpEszNzXVWHFVfoJsS03v74asdFzF74xl09rGHk5Wp1GURERk+VQlwfpPm8E6FZe3H317W3lmy8hqrGgWVgoICiKKoDSnx8fFYt24dAgIC0L9/f50WSNX3Yi8fRJ5J0RwCWncaS8fyEBAR0X3db1n75v01oye+fbmsvYRqFFSGDBmC4cOHY/LkycjMzESnTp1gbGyMtLQ0fPnll5gyZYqu66RquPsQ0LazPARERFSBdln7n4BL27isvQGr0enJx48fR7du3QAAa9asgbOzM+Lj4/Hrr7/im2++0WmBVDNlh4AAYPbGM0jNKZS4IiIiA5BzA9j7OfB1a2DFk8DFrZqQ0qwH8OQvwKtngT7vM6QYkBqNqOTn58PKygoAsG3bNgwfPhwymQyPPPII4uPjdVog1dyLvXyw7WwKzlznISAiasQetKx9m2c080+4rL3BqtGIiq+vL9avX4/ExERERkbi0UcfBQCkpqZCqVTqtECqubsXgis7BERE1GjkZwCHvgUWdQB+CQPOrNOEFPeOwNAlmoXZ+n/EkGLgajSiMmvWLIwePRqvvPIKevfujc6dOwPQjK60adNGpwVS7QS48iwgImpEypa1P/ozcGatZsVYQLOsfchTmsmxLsHS1kjVUqOg8sQTT6Br165ITk7WrqECAH369MGwYcN0Vhzpxt2HgN5ddxrf8xAQETU0RTlAzGpNQEmJudPuHAx0eBYIfhJQWElXH9VYjYIKALi4uMDFxQXXrl0DALi7u3OxNwN191lA23kWEBE1JCmnNWfunPoTKM7VtHFZ+walRnNU1Go15s6dC2tra3h6esLT0xM2Njb44IMPoFardV0j6UDZISCAZwERUT1XUgCcXAn82A9YEqoZRSnO1Sxr3/9j4NVzwLDFgEcHhpQGoEYjKu+++y5++uknfPLJJwgNDQUA7N+/HxERESgsLMRHH32k0yJJN6b01CwEx0NARFQfWRQmQ7bjfeDUqvLL2geEaUZPvLoxmDRANQoqv/zyC3788UftVZMBICQkBE2aNMGLL77IoGKg7j0EtOHkdQxtw0NARGTACrOAc/9AHr0SfeP23Wm3bqpZlI3L2jd4NQoqGRkZ8Pf3r9Du7++PjIyMWhdF+lN2COjL7Rfx2upo/HIoDp297fGItz3ae9nC3KTG05aIiHSjpAC4GAmcXgNc3AaoiiADIEKA6NsPso7PA759uKx9I1Gj30qtWrXCokWLKqxCu2jRIoSEhOikMNKfKT19cDzhFnZfuIkTCZk4kZCJ73ZfgZFMQCsPGzzibYdHvO3RzpPBhYjqiKoUuLpbE07O/QMU59x5zNEfqsBh2JnmiF5Dx0FmbCxZmVT3avRb6LPPPsPgwYOxY8cO7Roqhw4dQmJiIjZv3qzTAkn3jOUyLJ/QEYkZ+Th8NR2Hr2bg8NV0JGUW4Fj8LRyLv4Vvd12BsVxAK3cbPOJtj84+9mjb1BZmJvwLhoh0RK0Grh3RnFZ8Zj2Qn3bnMWsPIGiE5rRi55ZQl5aigL9fGqUaBZUePXrg4sWL+Pbbb3H+/HkAwPDhwzFp0iR8+OGH2usAkWHzsDOHh505nmzvAQBIzMjHoavpmvByJR3XswpxNP4WjsbfwqJdl2EsF9Da43Zw8bZHW09bmBozuBBRNYgicOO0JpycXgtkJd
55zNwBaDkMCH5Cs3qsrEYnplIDU+NxfTc3twqTZqOjo/HTTz/h+++/r3VhVPfKgstT7T0giiISMwpuj7ik49DVdCRnFSIq7hai4m5h4b+XYSKXobWHDTp62cChSOrqicigZVwFYv7SBJS0C3faTaw0Z+0EjwCa9QTkPNxM5fETQZUSBAFN7c3R1N4cT3XQBJeEuw4VHbqSjpTsQhyJy8CRuAzYmMjxRJgaPHRMRFo5KZpRk9NrgKRjd9rlCqD5o5rDOn6PAsZm0tVIBo9BhapEEAR42lvA094CT3doClEUEZ+uCS5fbr+I1JwibDh5HWM6N5O6VCKSUsEt4OxGTTiJ3QdA1LQLMsC7JxD0BBDwGGBqLWWVVI8wqFCNCIIALwcLeDlYILugGB9vuYAf98dhZCcvyGVccImoUSnOBy5uAWLWAJe2A+qSO495dNKEk5ZDAUsnyUqk+qtaQWX48OEPfDwzM7M2tVA99VS7Jliw7Txi0/Ox7UwKBga7Sl0SEembqgS48q8mnJzfBJTk3XnMqaVmQmzQCMDWU7oaqUGoVlCxtn7wUJ21tTXGjRtXq4Ko/rFQGKGbi4jIJAFL9lzBgCAXLs1P1BCp1UDCQU04ObsBKLhrgU8bz9vh5AnAOVC6GqnBqVZQWbZsmb7qoHquu6sae1KNEH0tC4eupKOLr4PUJRGRLogikBx9e62TdUB20p3HLJyAoOGaSbFN2vE6O6QXnKNCOmFpDDzZtgl++y8Ri/dcYVAhqu/SLmlGTk6vAdIv32lXWAOBYZqRk2bduYw96R2DCunMs6FeWBF1DfsupeF0UhaCmnBWP1G9kpUEnFmrGT1Jjr7TbmQKNB9w+3TifoCRQroaqdFhUCGdcbc1Q1iIK9afvI4le65g0ei2UpdERA+TnwGcXa9ZjC3+AO6cTiwHfHprwon/IEBhJWWV1IgxqJBOTe7pg/Unr2NzTDLi0/PgaW8hdUlEdK+iXODCZs2hnSs7AXXpnceadtGsEhs4FLDgIVySHoMK6ZS/ixK9Wjhi14Wb+H7vVXw0LFjqkogIAEqLgMs7NYd1LmwBSgvuPOYSojljp+VwwMZDuhqJKmEwV3z65JNPIAgCZsyYIXUpVEtTevoCAFYfu4bUnEKJqyFqxNQq4OoeYMM04HM/YNUozRyU0gLAzhvo8SYwNQqYvA8IfZkhhQySQYyoREVFYenSpQgJCZG6FNKBDl62aNvUBscTMrH8QBzeGOAvdUlEjYcoAknHNWfrnF4L5KbceczKVTNqEvwE4NaGpxNTvSB5UMnNzcWYMWPwww8/4MMPP5S6HNIBQRAwpacvnv/1KH47HI8pPX1gZcqrFRLp1c0LmsM6MWuAW7F32k1tgMAhmkmxnl14OjHVO5IHlalTp2Lw4MHo27fvQ4NKUVERioqKtPezs7MBACUlJdpb2X2qO5X1e3cfW/g6WuDyzTz8digWz3flxQp1jZ93aRhUv2ddg+zsWshOr4WQelrbLBqbQ2w+AOrA4RB9egNyE80DKrXmVg8ZVL83Ivrq9+q8niCKoqjTvVfDqlWr8NFHHyEqKgqmpqbo2bMnWrdujQULFlS6fUREBObMmVOhfcWKFTA3N9dztVRdR1IF/H5FDqWxiNltVTAymBlRRPWXSUk23DKPwP3WIdjnXdK2qyFHqjIY12w7I8W6LVRyrnVChis/Px+jR49GVlYWlErlA7eVLKgkJiaiffv22L59u3ZuysOCSmUjKh4eHkhLS4NSqURJSQm2b9+Ofv36wdiYhxrqyv36vbhUjT5f7UNKdhE+GhKIp9q7S1hlw8PPuzQk6feiHAgXNkN2Zi2E2N0QRBUAQIQA0bML1C1HQPQPA8xs66YeCfDzLg199Xt2djYcHByqFFQkO/Rz7NgxpKamom3bO4uCqVQq7N27F4sWLUJRURHk8vLHUhUKBRSKin8lGBsbl+vAe+9T3aj4fQCe7+6DD/45ix8PxGNkJy/IZZy8p2v8vEtD7/
1eUghc3q6Zd3IxEii96ww6tzZA0BMQgoZDULoZzumbdYCfd2nout+r81qSBZU+ffogJiamXNuECRPg7++PN998s0JIofppZAcPfLPzEmLT8rDtTAoGBrtKXRKR4VKVAnF7NRNiz/0NFGXfeczeTzMhNvgJwN5HuhqJ6phkQcXKygpBQUHl2iwsLGBvb1+hneovC4URwrt44Zudl7B4zxUMCHKBwFMiie4QReBalCacnFkL5N2885iyCRA0QhNOXEJ4OjE1SpKf9UP1XM4NyNdOQussFWR7ojULRindbt+aAGa2GN/FC9/vvYJT17Jw6Eo6r6xMBAA3zty5OnFmwp12Mzug5VDN6InHI4CsMR3YIarIoILK7t27pS6BqisrEbLY3fAEgP37Kj5uZAo7K1dstbbBiSxzZGxYD3RtfyfMWLkBlk5c24Eah1txt8PJX0Dq2TvtJpaA/2BNOPHuCcg5B4OojEEFFaqHbDxRGrYIF4/uRgtXK8hzU4DsJCAnWTOEXVoI3IqFFwAvOYAcAFv+LP8aglyzYqbSDVC6akZilG632+762sik7t8fUW3lpgJn1mkmxV6LutMuNwH8HtUc2mk+ADDhEgtElWFQodqxdIQYMhKXrinhN3AQ5HfP5C4t0gSW7OtA9nVs2HcU6ddj0cYmH21sCjTtOcmAqAKyr2luD2LheOeQ0r1BRtlEE3JMeLVmMgAFmcD5fzThJHYvIN5eZE2QAV7dNCMnAWGAmY2UVRLVCwwqpD9GCsDWS3MD0MLxUQxYsA+ydODf8T3h5WChOcshLxXITtaMxGRfB3Kua8ONpi0ZUBVpRmjybgLJ0fffp6m1JrRoR2ia3DVn5vbN1IaTEkn3SgqAi1s1h3YubQNUxXcea9L+9tWJhwFWLtLVSFQPMahQnfF3UaK3vxP+PZ+K7/ddxcfDggG50Z0AgXaVP1EUgfyMBweZ7CSgOBcozNLc7j7+fy9j8wcHGWUTwNyBkxjp4VQlmqsTx6zWjKAU5955zNFfE06CRmiuVExENcKgQnVqcg8f/Hs+FWuOXcOMvn5wsjJ9+JMEAbCw19xcH3CF7cLs24eaku4KMveEmoIMoCQfyLiiud2PzLjyeTNlE4CVbpq/jDnpsfER1bDLvQDZll3A+Y1Afvqdx6ybAsEjgKAnAOeWHLkj0gEGFapTHbxs0c7TFsfib2HZgTi8OcBfdy9uqtTcHFvcf5uSgnLzZsqFmLL2nBRAXQJkJWhu9yUAls73DzJlN2Mz3b1HejBR1BxyKSnQ3EoLNCu83v11acFdjxdW3LbC44WacHt7W6PCTHQruHVnn+YOmkM6wU8CHh0ZToh0jEGF6pQgCJjcwwfP/3oU/zsUjyk9faA0rcNRCWMzzTD8g4biVSVA7o3y82buDjJlh5vUJUBuiuZ2/cT9X8/M9j4TgO+6KZQN9xecqvSuwHDnF77m3/wqhYOK2z4gfEC/ly8TAJTITCEPGgpZyFNAsx6aQ5hEpBf830V1ro+/E/ycLHEpNRcr/kvA5B4Gthy43Biwdtfc0KHybdRqzZD/w+bNlOQDBbc0txun779PE8tKgsw9ZzWZ2+smzKjVml/8VRpNuN/jVdn2drhQl9a+5poQZJr5SEammn+NTW9/baa5GZndbjO7q830Pl/f2bZEMMbWo1cw4LGhkPGaM0R6x6BCdU4mE/BCDx/MXB2Nn/bHYnwXL5ga17MF32QywNJRc3NrXfk2oqiZ2HvfIHP768JMzSTMtIua2/3ITcoFGZmlMwKT4iDbth9QFVZttKEsoEil7Jd/DUNCtbaVG+tnlKqkBGpZou5fl4gqxaBCkni8lRu+3HYB17MKse5EEkZ1bCp1SbonCJp1MsxsAOfA+29XnP+AScC3DznlpmrmXmTGa24A5AD8ACC1FjXKjKsZEh4UGB7yuFzBM6mIqNoYVEgSJkYyTOzmjQ/+OYule67gqfYekMsa6ByNhzEx11wN90FXxC0t1syFuSvEqL
KuIe7qVXj5BUKusLjr0IZ51QMF51YQkYHjTymSzMgOHvhm5yXEpecj8kwKBgW7Sl2S4TIyAWyaam63qUtKcHrzZjTtdc+KwEREDQjHYUkyFgojhHfxAgAs3n0FoqjfszWIiKj+YVAhSWkm0soQk5SFg1fSH/4EIiJqVBhUSFJ2FiYY2UFzOGPx7gesFEtERI0SgwpJbmLXZpDLBOy/nIaYa1lSl0NERAaEQYUk52FnjsdbuQEAluzhqAoREd3BoEIG4YUemiXtt5xORmxansTVEBGRoWBQIYPg76JEb38nqEVg4vIo/Hk0EcWlaqnLIiIiiTGokMGY+WgLWJsZ42paHt5YcwrdP9uFH/ZeRW6RRNeKISIiyTGokMEIdFNi35u98NZAfzhZKZCSXYiPNp9Dl3k7MT/yPG7mFEldIhER1TEGFTIoSlNjTO7hg31v9sKnI4Lh7WiB7MJSfLvrCkI//RfvrItBHOewEBE1GgwqZJAURnI83aEpdrzSA0ueaYfWHjYoLlVjxX8J6P3Fbkz9/ThPZSYiagR4rR8yaDKZgAFBLujf0hlHYjOwZM8V7LpwE5tikrEpJhldfOwxuYcPuvk5QBAM+6KGiRn5+PlALKwURpjRtzlkjfUijERE1cCgQvWCIAjo5G2PTt72OJ+SjaV7rmJj9HUcvJKOg1fS0dJNiRd6+GBQkAuM5IY1UBifnodvd13G2uNJKFVrrmeUV6zC+48FSlwZEZHhM6yf6ERV4O+ixFdPt8ae13tiQqgXzIzlOHM9Gy+tPIFeX+zGr4fiUFCskrpMxKbl4bU/o9H7iz348+g1lKpFtGlqAwD4aX8sftx3VdoCiYjqAY6oUL3lbmuO2WEt8VJvP/x6KB6/HIpDYkYBZm04gwU7LiG8sxeGt20CDzvzOq3rys1cLPr3MjacTMLtART0bOGI6b390M7TFt/vvYKPN5/Hh5vOwUlpql2Vl4iIKmJQoXrP1sIEL/f1w6Tu3lh9LBHf772Ka7cK8NWOi/hqx0UEuipvz3NxQXNnS73NZbl0IwcL/72Mv09dh3g7oPTxd8L0Pn5o7WGj3e75bt5IzirEsgNxeO3Pk3CwMEEXXwe91EREVN8xqFCDYWYix7jOXhjdsSk2xSRj5ZEEHInNwNnkbJxNzsaX2y/Cy94c/W+HltbuNjqZ0HohJQff/HsJm2OStQGlX6AzXurth2B36wrbC4KA9wcHIjW7CJtikvHCb8fw5+TOCHBV1roWIqKGhkGFGhwjuQxDWjfBkNZNkJFXjB3nbiDydAr2XU5DXHo+lu65iqV7rsJZqcCjgZrQ0snbDsbVnIR79no2Fv57CVtOp2jbBrR0wfQ+vmjpVjGg3E0mE/DFU61wM7cIR2IzMH7ZEax9MRRNbMxq9J6JiBoqBhVq0OwsTPBUew881d4DuUWl2H0hFZFnbmDX+VTcyC7Cb4fj8dvheFibGaNPgBP6t3RBdz9HmJnI7/uap5Oy8M3OS9h29gYAQBCAQUGumN7HF/4uVR8VMTWW44ex7fHk0oO4eCMX4T8fwZrJnWFjblLr901E1FAwqFCjYakwwmMhbngsxA1FpSocvJyOyDMp2H72BtLzirH2eBLWHk+CmbEcPZo7on+QM3r7O8PazBgAcOpaJr7ZeQk7zqUC0ASUx0LcML23L5o7W9WoJmtzYyyf0BHDvzuIy6m5mPTrMfw6sSNMje8flIiIGhMGFWqUFEZy9PJ3Qi9/J3w0TMTRuAxEnrmByDMpSMoswNYzKdh6JgVGMgGdfewhEwTsuXgTACATgMdbuWFabz/4OlnWuhY3GzMsf7YDnlxyCEfiMvDqnyexcFRbyLkgHBERgwqRXHZnMbn3HwvAmevZiDyTgq2nU3ApNRf7LqVptxvS2g3TevnC27H2AeVu/i5KfD+2PcJ/PoLNMSlwsjqL2WGBBr/aLhGRvjGoEN1FEAQENbFGUBNrvPZoC1y5mYvIMynILSzFU+094OVgobd9d/
axx5dPt8K0FSew/GAcXK1N8UIPH73tj4ioPmBQIXoAH0dLvNjTt87291iIG25kF+GDf85i3pbzcFaaYmibJnW2fyIiQ8Ml9IkMzMSuzfBc12YAgNfXRGP/7UNPRESNEYMKkQF6Z1AAHgtxRYlKxOT/HcOZ61lSl0REJAkGFSIDVLYg3CPedsgtKsX4ZVFIzMiXuiwiojrHoEJkoBRGciwd2x7+Lla4mVOE8GVHcCuvWOqyiIjqFIMKkQGzNtMsCOdmbYqrN/Pw3K9HUViikrosIqI6w6BCZOBcrE2x/NmOUJoa4Vj8Lby08gRUalHqsoiI6gSDClE90NzZCj+Gd4CJkQzbzt5AxMYzEEWGFSJq+BhUiOqJjs3ssODp1hAE4LfD8Vi6N1bqkoiI9I5BhageGRTsitmPBQIAvthxGUducol9ImrYGFSI6pnxoc3wQg9vAMAfV2Q4cz1b4oqIiPSHQYWoHnqzvz96t3BEqSjgpT+ikV1YInVJRER6waBCVA/JZAI+HR4EO4WIhIwCvLnmFCfXElGDxKBCVE/ZmBtjvJ8KxnIBW06nYPnBOKlLIiLSOQYVonrM0wp4a0ALAMDHm8/hRMItiSsiItItBhWiem5sJw8MCnZBiUrEtBUnuMw+ETUoDCpE9ZwgCPhkRAi87M2RlFmAV/88CTVXriWiBoJBhagBUJoa47sx7WBiJMOuCzexdO9VqUsiItIJBhWiBiLQTYm5j7cEAHy+7QL+u5oucUVERLXHoELUgDzdwQPD2zSBSi1i+soTuJlTJHVJRES1wqBC1IAIgoAPhwXBz8kSqTlFmPEHr7RMRPUbgwpRA2NuYoTvxrSFmbEcBy6n45udl6QuiYioxhhUiBogP2crfDw8CADwzb+XsO/STYkrIiKqGQYVogZqWBt3jOrYFKIIzFh1EilZhVKXRERUbQwqRA3Y7LBABLoqkZ5XjOkrj6NUpZa6JCKiamFQIWrATI3l+G5MW1gqjBAVdwvzt12QuiQiomphUCFq4LwcLDD/iRAAwNI9V7Hj7A2JKyIiqjoGFaJGYGCwKyaEegEAXlsdjcSMfGkLIiKqIgYVokbi7YEBaOVhg6yCEkxbcRzFpZyvQkSGj0GFqJEwMZLh29FtYG1mjOhrWfh48zmpSyIieigGFaJGxN3WHF8+1QoAsPxgHDadSpa4IiKiB2NQIWpk+gQ4Y3IPHwDAm3+dQmxansQVERHdn6RBZd68eejQoQOsrKzg5OSEoUOH4sIFnj5JpG8zH22Ojl52yC0qxYu/H0dhiarWrymKIgqKVcgvLtVBhUREGkZS7nzPnj2YOnUqOnTogNLSUrzzzjt49NFHcfbsWVhYWEhZGlGDZiSXYeHoNhj09T6cS87GnL/PYN7wEIiiiNyiUmQVlCC7QPOv5usSZBeWaO/f3a75uhTZBSUoVqkhCMCgIFdM6emDoCbWUr9VIqrnJA0qW7duLXd/+fLlcHJywrFjx9C9e3eJqiJqHJyVpvh6ZBuM/fk/rDySiC2nU5BTWFrrqy2LIrApJhmbYpLRvbkjXuzpg07N7CAIgo4qJ6LGRNKgcq+srCwAgJ2dncSVEDUOXf0c8Erf5vhy+0Vk5pdo203kMijNjKE0M4K1mXG5m9L0rq+1/97Z7tqtAizZcwV/R1/H3os3sffiTbRtaoMXe/qiT4ATAwsRVYvBBBW1Wo0ZM2YgNDQUQUFBlW5TVFSEoqIi7f3s7GwAQElJifZWdp/qDvtdGrrq98ndPNGruT1kAjTBw9QYpsayGgcKXwczfD4iCC/18saP++Pw14nrOJ6Qied+PYrmTpZ4oXszDApyhpG8fs7l5+ddGux3aeir36vzeoIoirUb59WRKVOmYMuWLdi/fz/c3d0r3SYiIgJz5syp0L5ixQqYm5vru0QiqoGsYmBPsgz7bwgoUmnCj71CRG83NTo5iTCun3mFiGohPz8fo0ePRlZWFpRK5QO3NYigMm3aNGzYsAF79+5Fs2
bN7rtdZSMqHh4eSEtLg1KpRElJCbZv345+/frB2Ni4LkongP0ukfrW71kFJfj9v0QsPxSPW7cPMzlammB8F0+M6uABK1ODGeB9oPrW7w0F+10a+ur37OxsODg4VCmoSPqTQRRFTJ8+HevWrcPu3bsfGFIAQKFQQKFQVGg3NjYu14H33qe6wX6XRn3pdwdjY7zcrwUm9fDFqqgE/LD3Kq5nFWL+tktYujcW4zp7YUKoF+wtK/4fN0T1pd8bGva7NHTd79V5LUkHXadOnYr//e9/WLFiBaysrJCSkoKUlBQUFBRIWRYR6ZGZiRwTQpth9+u9MP+JEPg4WiC7sBSLdl1G6Kf/ImLjGSRl8mcAEWlIGlQWL16MrKws9OzZE66urtrbH3/8IWVZRFQHTIxkeLK9B7a/0gNLnmmLEHdrFJaosfxgHHp8tguv/RmNy6k5UpdJRBKT/NAPETVuMpmAAUGu6N/SBQcup+O73Zdx8Eo6/jp+DX8dv4aOzewwqqMHBga5wtRYLnW5RFTH6sfsNSJq8ARBQFc/B3T1c8CJhFtYvPsKdpy7gSOxGTgSm4GIjWcxrE0TjOzoAX+XB0++I6KGg0GFiAxOm6a2+H5ce1zPLMDqo9fw59FEJGUWYPnBOCw/GIfWHjYY1dEDj4W4wULBH2NEDRn/hxORwXKzMcPLff0wrbcv9l26iVVHErHj3A2cTMzEycRMfPDPOYS1csOojh4IbmKt11VvRVHEzZwinE/ORD6vu0hUZxhUiMjgyWUCerZwQs8WTriZU4S/jl/DqiMJiEvPx8ojCVh5JAGBrkqM6uiBIW2aQGlau9MoS1VqxKbl4WxyNs5ez8bZ5GycS85GWm4xAMDJVI6+fUvgaM3TZIn0jUGFiOoVRysFJvfwwQvdvXH4agZWRSVgy+kUnE3OxvsbzuCjzecwOFgzytLO0/ahoyy5RaU4fzuIlAWT8yk5KCpVV9hWJmjOVkotVGPKihP433OPcIIvkZ4xqBBRvSQIAjr72KOzjz0i8oqx7kQSVkUl4OKNXO0ZQ75OlhjZwQPD27rD1twYN7KLcDY5SztKcvZ6NuLS8yt9fXMTOQJclQh0VSLQTfNvc2crXE3NwhOLD+JofCZeWx2NhSPbQCbjhRaJ9IVBhYjqPVsLEzzbtRkmhHrheEImVh1JwD+nknE5NRcfbjqHz7ZegKWpETLyiit9vovSVBtGAt2UCHBVwtPOvNIA0tzZChNbqLH0ghE2nUqGm7Up3h0cqO+3SNRoMagQUYMhCALaedqinactZoUFYmP0daw6koiYpCxk5BVDLhPg62h5O4xYIdDVGgGuVtVett/PWsS8YUGYuSYGP+yLhZuNGSaEPvgSIERUMwwqRNQgWZkaY0wnT4zp5IlLN3JQWKKGn7OlzuaUDGnlihs5xZgfeQFz/zkLV2tTDAhy1clrE9EdvMA6ETV4fs5WCHa31vnE1xd7+mB0p6YQReDlVSdxLD5Dp69PRAwqREQ1JggC5j7eEn38nVBUqsZzvxzF1Zu5UpdF1KAwqBAR1YKRXIaFo9sgxN0at/JLMH5ZFNJyi6Qui6jBYFAhIqolcxMj/BTeAR52ZkjIyMfEX44iv5jL1xLpAoMKEZEOOFopsHxCR9iYGyM6MRMvrTwJlZpXiCeqLQYVIiId8XG0xI/j2sPESIYd524gYuMZiCLDClFtMKgQEelQey87fP10awgC8NvheCzde1XqkojqNQYVIiIdGxjsivdur1b7yZbz2HAySeKKiOovBhUiIj2Y2LUZJnbVrFY7c3U0Dl1Jl7giovqJQYWISE/eHRSAQcEuKFGJmPTbUVy8kSN1SUT1DoMKEZGeyGQCvnyqNdp72iKnsBTjfz6CG9mFUpdFVK8wqBAR6ZGpsRw/jGsPb0cLXM8qxIRlUcgt4horRFXFoEJEpGe2Fib4ZUJHOFia4GxyNqb87xhKVGqpyyKqFxhUiIjqgIedOX4e3wFmxnLsu5SGd9bGcI
0VoipgUCEiqiMh7jb4dkwbyARg9bFr+HrnJalLIjJ4DCpERHWot78zPhwaDABYsOMSXv3zJGKuZUlcFZHhMpK6ACKixmZ0p6ZIySrAN/9extrjSVh7PAltmtpgXGdPDAp2hcJILnWJRAaDIypERBJ49dEWWPtiFwxr0wTGcgEnEjLxyh/R6DLvX3weeQHXMwukLpHIIHBEhYhIIm2b2qJtU1u8MygAf0Ql4Pf/EpCcVYhFuy5j8Z4r6BfgjHFdPNHZ2x6CIEhdLpEkGFSIiCTmaKXAtN5+mNzDBzvO3cAvB+Nx6Go6tp5JwdYzKfBzssS4zp4Y1tYdlgr+2KbGhZ94IiIDYSSXYUCQKwYEueLijRz8eigOa48n4VJqLt7fcAafbr2AEW2bYGxnL/g6WUpdLlGd4BwVIiID1NzZCh8ODcbhd/ogIiwQ3o4WyC0qxS+H4tH3yz145sf/sO1MClRqrsVCDRtHVIiIDJjS1BjjQ5shvIsXDlxOxy+H4rDz3A3sv5yG/ZfT0MTGDGMeaYqn23vA3lIhdblEOsegQkRUDwiCgK5+Dujq54DEjHz8/l8C/ohKQFJmAT7begFfbLuIoCbW6Ohli47N7NHByxY25iZSl01UawwqRET1jIedOd4a6I8Zff3wz6lk/HooDqeuZSE6MRPRiZn4YV8sAKCFsxU6NNMEl45ednCxNpW4cqLqY1AhIqqnTI3leKKdO55o546kzAJExWbgv9gMRMVl4HJqLi7cyMGFGzn43+EEAEBTO3N08LJDp2Z26NDMDl725jztmQwegwoRUQPQxMYMTdo0wdA2TQAA6blFiIq7hSOxGTgSl46z17ORkJGPhIx8/HX8GgDNadEdvezQ4fbhohYuVpDLGFzIsDCoEBE1QPaWCgwIcsGAIBcAQE5hCY7F30JUXAaOxGYgOjELN3OKsCkmGZtikgEAVqZG6OBlh/ZetmjlboNgd2soTY2lfBtEDCpERI2BlakxerZwQs8WTgCAwhIVohMzERWnOVx0PP4WcgpL8e/5VPx7PlX7PG9HC7Ryt0GIuzVaedgg0FUJU2Nei4jqDoMKEVEjZGosRydve3Tytsc0AKUqNc4mZ+NIbAZOJGQi+lomrt0qwNWbebh6Mw/rTiQBAIxkAlq4WKGVhw1auVsjxN0Gfk6WMJJzWS7SDwYVIiKCkVyGEHcbhLjbaNvSc4s0ZxNdy9SeVZSeV4wz17Nx5no2Vvyn2c7MWI6gJkqEuNtoA0xTO07UJd1gUCEiokrZWyrQy98Jvfw1h4tEUURSZoE2vEQnZuJ0UjZyi0oRFXcLUXG3tM+1MTdGiLsN3KxNYSQXYCyX3b4JMJJp/jWWy2Akl8FELsBILoORTICJkeyex28/VxCh4iK8jRKDChERVYkgCHC3NYe7rTkGBbsCANRqEVfTcnEyMQunrmUi+loWzl3PRmZ+CfZevKnT/Tcxl6NtaAG8HDnBtzFhUCEiohqTyQT4OlnB18kKT7RzBwAUl6pxPiUbp65lIaugBMWlapSq1ShRiShRqVF6+1/tfbUaxaUiStWax4pVapSWe1xEclYBkvJVGLb4ML4d0xZdfBwkfudUVxhUiIhIp0yMKs53qa2EtByMWbwHiXklGPvTEbw7KAATQr04D6YR4DRtIiIyeK7WpnippQrDWrtCpRYx95+zeG11NApLVFKXRnrGoEJERPWCiRz4dHgQZj0WCLlMwNrjSXhyySFczyyQujTSIwYVIiKqNwRBwLNdm+G3iR1ha26MmKQshC3cj/+upktdGukJgwoREdU7XXwcsHFaVwS6KpGeV4wxP/6HXw/FQRTr7hzmxIx8HInNQKlKXWf7bIw4mZaIiOolDztz/DWlC9786xQ2Rl/HrA1ncDopCx8MDYLCSD/L/IuiiCOxGfj5QCy2n70BtQi4WZtibGcvjOroARtzE73stzFjUCEionrLzESOr0e2RnATa8zbcg5/Hr2GizdyseSZdnCxNtXZfo
pKVfgnOhk/H4jFmevZ2nYrhRGuZxXi063n8fXOixje1h0TunjBz9lKZ/tu7BhUiIioXhMEAc9394a/qxWmrTiBk4mZCFu0H0ueaYt2nna1eu203CL8fjgBvx2OR1puEQDA1FimDSQedubYGH0dyw7E4VxyNlb8l4AV/yWgm58Dng1thh7NHSGT8RTq2mBQISKiBqGbnyP+ntYVk347ivMpORj5/WHMeTwIozs1rfZrnb2ejWUHYrEh+jqKSzVzUFyUphjXxROjOjSFrcWdQzxPtffAk+3c8V9sBn7eH4vt525g36U07LuUBm8HC4wP9cKItu6wUPBXbk2w14iIqMFoaq+Zt/L6mmhsjknBO+ticPp6FiLCWsLE6MHnj6jVIv49n4qf9sfi0F1nEbXysMHErs0wMMgFxve5SrQgCHjE2x6PeNsjIT0fvxyKw59RibialodZG85gfuQFjOzggXGdNaMwVHUMKkRE1KBYKIzw7ei2WLznCuZHXsCK/xJwISUHi8e0hZOy4ryV3KJSrDmaiOUH4xCXng8AkMsEDAxywbNdm6FtU9tq7b+pvTnefywQr/Rrjr+OXcOyA7GIS8/HD/ti8dP+WPQLdMazoc3QsZmdwa6sW1iiQlpuEW5k5uOGxMvUMKgQEVGDIwgCXuzpiwBXJV5aeQLH4m/dnrfSDm1uB4/EjHz8cjAOf0QlIqeoFACgNDXCqE5NMa6zF5rYmNWqBkuFEcK7eGHsI57YfTEVyw7EYd+lNESeuYHIMzcQ6KrEhFAvhLVyg6mxfs5SKqNWi8gqKEF6XhHScouRlluE9Nv/puUWIz23SNOWV4z03GLk3u4PAOjoKMMEvVb3YAwqRETUYPVq4YSN07ri+V+P4nJqLp5eehgv9/VDzLUsbDubAvXtZVe8HS0wIbQZRrRtAnMT3f5qlMkE9PZ3Rm9/Z1y8kYNlB+Kw7sQ1nE3OxutrTuHTrecxupMn+gY4QS3ingsyVnbBxrKLOmou2FhSqkaJuuyCj5rnZhWUlAshGXnFKFVXb40ZE7kM9pYmMJXn67Q/qotBhYiIGrRmDhZYPzUUr/5xEtvO3sD8yAvax7r5OeDZrs3Qw69uzs5p7myFecOD8Ub/FlgZlYDfDsUjOasQ3+y8hG92XtL7/pWmRnCwUsDBQgEHKxPYWyhgb2kCB0sFHCxNYG+pgIOlps1KYYTS0lJs3rxZ73U9CIMKERE1eJYKIyx5ph2+3XUZvx2OR58AZ0wI9UJzidY7sbUwwYs9ffF8N29sPZ2C3w7FIzY9DyZyGYzkAozlMhjJBJgYaf41lss0bfKyr8u2uetruaB9vtLUWBs4HG6HDzsLk4dOKDZEDCpERNQoyGQCpvfxw/Q+flKXomUslyGslRvCWrlJXYrBqn/RioiIiBoNBhUiIiIyWAwqREREZLAYVIiIiMhgMagQERGRwWJQISIiIoPFoEJEREQGi0GFiIiIDBaDChERERksgwgq3377Lby8vGBqaopOnTrhyJEjUpdEREREBkDyoPLHH3/g1VdfxezZs3H8+HG0atUK/fv3R2pqqtSlERERkcQkDypffvklnn/+eUyYMAGBgYFYsmQJzM3N8fPPP0tdGhEREUlM0qBSXFyMY8eOoW/fvto2mUyGvn374tChQxJWRkRERIZA0qsnp6WlQaVSwdnZuVy7s7Mzzp8/X2H7oqIiFBUVae9nZ2cDAEpKSrS3svtUd9jv0mC/S4P9Lg32uzT01e/VeT1Jg0p1zZs3D3PmzKnQvn79epibm2vvb9iwoS7LotvY79Jgv0uD/S4N9rs0dN3v+fn5AABRFB+6rSBWZSs9KS4uhrm5OdasWYOhQ4dq28PDw5GZmVmhY+4dUUlKSkJgYGBdlUtEREQ6lJiYCHd39wduI+mIiomJCdq1a4edO3dqg4parcbOnTsxbdq0CtsrFAooFArtfUtLSyQmJsLKygqCICA7OxseHh5ITEyEUqmsq7fR6LHfpcF+lw
b7XRrsd2noq99FUUROTg7c3Nweuq3kh35effVVhIeHo3379ujYsSMWLFiAvLw8TJgw4aHPlclklSYxpVLJD7IE2O/SYL9Lg/0uDfa7NPTR79bW1lXaTvKg8vTTT+PmzZuYNWsWUlJS0Lp1a2zdurXCBFsiIiJqfCQPKgAwbdq0Sg/1EBERUeMm+YJvuqRQKDB79uxy81hI/9jv0mC/S4P9Lg32uzQMod8lPeuHiIiI6EEa1IgKERERNSwMKkRERGSwGFSIiIjIYDGoEBERkcFqMEHl22+/hZeXF0xNTdGpUyccOXJE6pIavL179yIsLAxubm4QBAHr16+XuqQGb968eejQoQOsrKzg5OSEoUOH4sKFC1KX1eAtXrwYISEh2kWvOnfujC1btkhdVqPzySefQBAEzJgxQ+pSGrSIiAgIglDu5u/vL1k9DSKo/PHHH3j11Vcxe/ZsHD9+HK1atUL//v2RmpoqdWkNWl5eHlq1aoVvv/1W6lIajT179mDq1Kk4fPgwtm/fjpKSEjz66KPIy8uTurQGzd3dHZ988gmOHTuGo0ePonfv3hgyZAjOnDkjdWmNRlRUFJYuXYqQkBCpS2kUWrZsieTkZO1t//79ktXSIE5P7tSpEzp06IBFixYB0FwvyMPDA9OnT8dbb70lcXWNgyAIWLduXbmLS5L+3bx5E05OTtizZw+6d+8udTmNip2dHebPn4+JEydKXUqDl5ubi7Zt2+K7777Dhx9+iNatW2PBggVSl9VgRUREYP369Th58qTUpQBoACMqxcXFOHbsGPr27attk8lk6Nu3Lw4dOiRhZUT6l5WVBUDzS5PqhkqlwqpVq5CXl4fOnTtLXU6jMHXqVAwePLjcz3nSr0uXLsHNzQ3e3t4YM2YMEhISJKvFIJbQr420tDSoVKoK1wZydnbG+fPnJaqKSP/UajVmzJiB0NBQBAUFSV1OgxcTE4POnTujsLAQlpaWWLduHQIDA6Uuq8FbtWoVjh8/jqioKKlLaTQ6deqE5cuXo0WLFkhOTsacOXPQrVs3nD59GlZWVnVeT70PKkSN1dSpU3H69GlJjx03Ji1atMDJkyeRlZWFNWvWIDw8HHv27GFY0aPExES8/PLL2L59O0xNTaUup9EYOHCg9uuQkBB06tQJnp6e+PPPPyU51Fnvg4qDgwPkcjlu3LhRrv3GjRtwcXGRqCoi/Zo2bRr++ecf7N27F+7u7lKX0yiYmJjA19cXANCuXTtERUXh66+/xtKlSyWurOE6duwYUlNT0bZtW22bSqXC3r17sWjRIhQVFUEul0tYYeNgY2OD5s2b4/Lly5Lsv97PUTExMUG7du2wc+dObZtarcbOnTt5/JgaHFEUMW3aNKxbtw7//vsvmjVrJnVJjZZarUZRUZHUZTRoffr0QUxMDE6ePKm9tW/fHmPGjMHJkycZUupIbm4urly5AldXV0n2X+9HVADg1VdfRXh4ONq3b4+OHTtiwYIFyMvLw4QJE6QurUHLzc0tl7BjY2Nx8uRJ2NnZoWnTphJW1nBNnToVK1aswIYNG2BlZYWUlBQAgLW1NczMzCSuruF6++23MXDgQDRt2hQ5OTlYsWIFdu/ejcjISKlLa9CsrKwqzL+ysLCAvb0952Xp0cyZMxEWFgZPT09cv34ds2fPhlwux6hRoySpp0EElaeffho3b97ErFmzkJKSgtatW2Pr1q0VJtiSbh09ehS9evXS3n/11VcBAOHh4Vi+fLlEVTVsixcvBgD07NmzXPuyZcswfvz4ui+okUhNTcW4ceOQnJwMa2trhISEIDIyEv369ZO6NCKdu3btGkaNGoX09HQ4Ojqia9euOHz4MBwdHSWpp0Gso0JEREQNU72fo0JEREQNF4MKERERGSwGFSIiIjJYDCpERERksBhUiIiIyGAxqBAREZHBYlAhIiIig8WgQkQNiiAIWL9+vdRlEJGOMKgQkc6MHz8egiBUuA0YMEDq0oionmoQS+gTkeEYMG
AAli1bVq5NoVBIVA0R1XccUSEinVIoFHBxcSl3s7W1BaA5LLN48WIMHDgQZmZm8Pb2xpo1a8o9PyYmBr1794aZmRns7e0xadIk5Obmltvm559/RsuWLaFQKODq6opp06aVezwtLQ3Dhg2Dubk5/Pz8sHHjRv2+aSLSGwYVIqpT77//PkaMGIHo6GiMGTMGI0eOxLlz5wAAeXl56N+/P2xtbREVFYXVq1djx44d5YLI4sWLMXXqVEyaNAkxMTHYuHEjfH19y+1jzpw5eOqpp3Dq1CkMGjQIY8aMQUZGRp2+TyLSEZGISEfCw8NFuVwuWlhYlLt99NFHoiiKIgBx8uTJ5Z7TqVMnccqUKaIoiuL3338v2trairm5udrHN23aJMpkMjElJUUURVF0c3MT33333fvWAEB87733tPdzc3NFAOKWLVt09j6JqO5wjgoR6VSvXr2wePHicm12dnbarzt37lzusc6dO+PkyZMAgHPnzqFVq1awsLDQPh4aGgq1Wo0LFy5AEARcv34dffr0eWANISEh2q8tLCygVCqRmppa07dERBJiUCEinbKwsKhwKEZXzMzMqrSdsbFxufuCIECtVuujJCLSM85RIaI6dfjw4Qr3AwICAAABAQGIjo5GXl6e9vEDBw5AJpOhRYsWsLKygpeXF3bu3FmnNRORdDiiQkQ6VVRUhJSUlHJtRkZGcHBwAACsXr0a7du3R9euXfH777/jyJEj+OmnnwAAY8aMwezZsxEeHo6IiAjcvHkT06dPx9ixY+Hs7AwAiIiIwOTJk+Hk5ISBAwciJycHBw4cwPTp0+v2jRJRnWBQISKd2rp1K1xdXcu1tWjRAufPnwegOSNn1apVePHFF+Hq6oqVK1ciMDAQAGBubo7IyEi8/PLL6NChA8zNzTFixAh8+eWX2tcKDw9HYWEhvvrqK8ycORMODg544okn6u4NElGdEkRRFKUugogaB0EQsG7dOgwdOlTqUoionuAcFSIiIjJYDCpERERksDhHhYjqDI80E1F1cUSFiIiIDBaDChERERksBhUiIiIyWAwqREREZLAYVIiIiMhgMagQERGRwWJQISIiIoPFoEJEREQGi0GFiIiIDNb/AV02jUDXe4DQAAAAAElFTkSuQmCC\n", + "text/plain": [ + "

" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Access the log history\n", + "log_history = trainer.state.log_history\n", + "\n", + "# Extract training / validation loss\n", + "train_losses = [log[\"loss\"] for log in log_history if \"loss\" in log]\n", + "epoch_train = [log[\"epoch\"] for log in log_history if \"loss\" in log]\n", + "eval_losses = [log[\"eval_loss\"] for log in log_history if \"eval_loss\" in log]\n", + "epoch_eval = [log[\"epoch\"] for log in log_history if \"eval_loss\" in log]\n", + "\n", + "# Plot the training and validation loss\n", + "plt.plot(epoch_train, train_losses, label=\"Training Loss\")\n", + "plt.plot(epoch_eval, eval_losses, label=\"Validation Loss\")\n", + "plt.xlabel(\"Epoch\")\n", + "plt.ylabel(\"Loss\")\n", + "plt.title(\"Training and Validation Loss per Epoch\")\n", + "plt.legend()\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vyIwS-orvWzd" + }, + "source": [ + "This visualization helps you monitor the training process and make informed decisions about hyperparameter tuning or early stopping.\n", + "\n", + "Training loss measures the error on the data the model was trained on, while validation loss measures the error on a separate dataset the model has not seen before. Monitoring both helps detect overfitting (when the model performs well on training data but poorly on unseen data).\n", + "\n", + "- validation loss >> training loss: **overfitting**\n", + "- validation loss > training loss: **some overfitting**\n", + "- validation loss < training loss: **some underfitting**\n", + "- validation loss << training loss: **underfitting**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bf86e31d" + }, + "source": [ + "## Test Model Inference\n", + "\n", + "After the training is done, you'll want to evaluate and test your model. 
You can load different samples from the test dataset and evaluate the model on those samples.\n", + "\n", + "For this particular use case, the best model is a matter of preference. Interestingly, what we'd normally call 'overfitting' can be very useful for a game NPC. It forces the model to forget general information and instead lock onto the specific persona and characteristics it was trained on, ensuring it stays consistently in character.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "aab1c5c5" + }, + "outputs": [], + "source": [ + "from transformers import AutoTokenizer, AutoModelForCausalLM\n", + "\n", + "model_id = checkpoint_dir\n", + "\n", + "# Load Model\n", + "model = AutoModelForCausalLM.from_pretrained(\n", + " model_id,\n", + " torch_dtype=\"auto\",\n", + " device_map=\"auto\",\n", + " attn_implementation=\"eager\"\n", + ")\n", + "tokenizer = AutoTokenizer.from_pretrained(model_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3dccb57c" + }, + "source": [ + "Let's load all questions from the test dataset and generate outputs." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "1fd887f4" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Device set to use cuda:0\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Question:\n", + "Do you know any jokes?\n", + "Original Answer:\n", + "A joke? k'tak Yez. A Terran, a Glarzon, and a pile of nutrient-pazte walk into a bar... Narg, I forget da rezt. Da punch-line waz zarcaztic.\n", + "Generated Answer:\n", + "Yez! Yez! Yez! Diz your Krush-tongs iz... k'tak... nice. Why you burn them with acid-flow?\n", + "--------------------------------------------------------------------------------\n", + "Question:\n", + "(Stands idle for too long)\n", + "Original Answer:\n", + "You'z broken, Terran? Or iz diz... 'meditation'? 
You look like you're trying to lay an egg.\n", + "Generated Answer:\n", + "Diz? Diz what you have for me... Zorp iz not for eating you.\n", + "--------------------------------------------------------------------------------\n", + "Question:\n", + "What do you think of my outfit?\n", + "Original Answer:\n", + "Iz very... pointy. Are you expecting to be attacked by zky-eelz? On Marz, dat would be zenzible.\n", + "Generated Answer:\n", + "My Zk-Zhip iz... nice. Very... home-baked. You bring me zlight-fruitez?\n", + "--------------------------------------------------------------------------------\n", + "Question:\n", + "It's raining.\n", + "Original Answer:\n", + "Gah! Da zky iz leaking again! Zorp will be in da zhelter until it ztopz being zo... wet. Diz iz no good for my jointz.\n", + "Generated Answer:\n", + "Diz? Diz iz da outpozt?\n", + "--------------------------------------------------------------------------------\n", + "Question:\n", + "I brought you a gift.\n", + "Original Answer:\n", + "A gift? For Zorp? k'tak It iz... a small rock. Very... rock-like. Zorp will put it with da other rockz. Thank you for da thought, Terran.\n", + "Generated Answer:\n", + "A genuine Martian Zcrap-fruit. Very... strange. Why you burn it with... k'tak... 
fire?\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "from transformers import pipeline\n", + "\n", + "# Load the model and tokenizer into the pipeline\n", + "pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n", + "\n", + "def test(test_sample):\n", + " # Convert a test example into a prompt with the Gemma template\n", + " prompt = pipe.tokenizer.apply_chat_template(test_sample[\"messages\"][:1], tokenize=False, add_generation_prompt=True)\n", + " outputs = pipe(prompt, max_new_tokens=256, disable_compile=True)\n", + "\n", + " # Extract the user query and original answer\n", + " print(f\"Question:\\n{test_sample['messages'][0]['content']}\")\n", + " print(f\"Original Answer:\\n{test_sample['messages'][1]['content']}\")\n", + " print(f\"Generated Answer:\\n{outputs[0]['generated_text'][len(prompt):].strip()}\")\n", + " print(\"-\"*80)\n", + "\n", + "# Test with the unseen test split\n", + "for item in dataset['test']:\n", + " test(item)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9RCnrmsVaadB" + }, + "source": [ + "If you try our original generalist prompt, you can see that the model still attempts to answer in the trained style. In this example, overfitting and catastrophic forgetting are actually beneficial for the game NPC because the model begins forgetting general knowledge that might not be applicable. This is also true for other types of full fine-tuning where the goal is to restrict the output to specific data formats." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "3irXKbgKat9f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Nameless. You... you z-mell like... wet plantz. 
Why you wear shiny piecez on your head?\n" + ] + } + ], + "source": [ + "outputs = pipe([{\"role\": \"user\", \"content\": \"Sorry, you are a game NPC.\"}], max_new_tokens=256, disable_compile=True)\n", + "print(outputs[0]['generated_text'][1]['content'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6f8ff452" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "This tutorial covered how to perform full-model fine-tuning using TRL. Check out the following docs next:\n", + "\n", + "* Learn how to [fine-tune Gemma for text tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora).\n", + "* Learn how to [fine-tune Gemma for vision tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora).\n", + "* Learn how to [deploy to Cloud Run](https://ai.google.dev/gemma/docs/integrations/google-cloud#run)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "huggingface_text_full_finetune.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/google-cookbook/huggingface_vision_finetune_qlora.ipynb b/tooling/fine-tuning/google-cookbook/huggingface_vision_finetune_qlora.ipynb new file mode 100644 index 0000000..37ea866 --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/huggingface_vision_finetune_qlora.ipynb @@ -0,0 +1,980 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "pn1797sn9Jb_" + }, + "source": [ + "##### Copyright 2025 Google LLC." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "uivh5PY69ISg" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O83CmJ2j9L3n" + }, + "source": [ + "# Fine-Tune Gemma for Vision Tasks using Hugging Face Transformers and QLoRA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e624ec07" + }, + "source": [ + "This guide walks you through how to fine-tune Gemma on a custom image and text dataset for a vision task (generating product descriptions) using Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) and [TRL](https://huggingface.co/docs/trl/index). You will learn:\n", + "\n", + "- What is Quantized Low-Rank Adaptation (QLoRA)\n", + "- Setup development environment\n", + "- Create and prepare the fine-tuning dataset\n", + "- Fine-tune Gemma using TRL and the SFTTrainer\n", + "- Test Model Inference and generate product descriptions from images and text.\n", + "\n", + "Note: This guide requires a GPU that supports the bfloat16 data type, such as an NVIDIA L4 or NVIDIA A100, with more than 16GB of memory.\n", + "\n", + "## What is Quantized Low-Rank Adaptation (QLoRA)\n", + "\n", + "This guide demonstrates the use of [Quantized Low-Rank Adaptation (QLoRA)](https://arxiv.org/abs/2305.14314), which has become a popular method for fine-tuning LLMs efficiently because it reduces computational resource requirements while maintaining high performance. In QLoRA, the pretrained model is quantized to 4-bit and its weights are frozen. Trainable adapter layers (LoRA) are then attached, and only the adapter layers are trained. Afterwards, the adapter weights can be merged with the base model or kept as a separate adapter.\n", + "\n", + "## Setup development environment\n", + "\n", + "The first step is to install the Hugging Face libraries, including TRL and Datasets, to fine-tune open models." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ba51aa79" + }, + "outputs": [], + "source": [ + "# Install PyTorch & other libraries\n", + "%pip install torch tensorboard torchvision\n", + "\n", + "# Install Transformers\n", + "%pip install transformers\n", + "\n", + "# Install Hugging Face libraries\n", + "%pip install datasets accelerate evaluate bitsandbytes trl peft protobuf pillow sentencepiece\n", + "\n", + "# UNCOMMENT: if you are running on a GPU that supports the BF16 data type and Flash Attention, such as NVIDIA L4 or NVIDIA A100\n", + "#%pip install flash-attn" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ef3d54b" + }, + "source": [ + "_Note: If you are using a GPU with Ampere architecture or newer (such as NVIDIA A100 or NVIDIA L4), you can use Flash Attention. Flash Attention is a method that significantly speeds up computation and reduces memory usage from quadratic to linear in sequence length, which can accelerate training by up to 3x. Learn more at [FlashAttention](https://github.com/Dao-AILab/flash-attention/tree/main)._\n", + "\n", + "You need a valid Hugging Face Token to publish your model. If you are running inside Google Colab, you can securely use your Hugging Face Token via Colab secrets; otherwise, you can pass the token directly to the `login` method. Make sure your token has write access too, as you push your model to the Hub during training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b6d79c93" + }, + "outputs": [], + "source": [ + "# Log in to the Hugging Face Hub\n", + "from huggingface_hub import login\n", + "login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "42c60525" + }, + "source": [ + "## Create and prepare the fine-tuning dataset\n", + "\n", + "When fine-tuning LLMs, it is important to know your use case and the task you want to solve. This helps you create a dataset to fine-tune your model. 
If you haven't defined your use case yet, you might want to go back to the drawing board.\n", + "\n", + "As an example, this guide focuses on the following use case:\n", + "\n", + "- Fine-tuning a Gemma model to generate concise, SEO-optimized product descriptions for an ecommerce platform, specifically tailored for mobile search.\n", + "\n", + "This guide uses the [philschmid/amazon-product-descriptions-vlm](https://huggingface.co/datasets/philschmid/amazon-product-descriptions-vlm) dataset, a dataset of Amazon product descriptions, including product images and categories.\n", + "\n", + "Hugging Face TRL supports multimodal conversations. The important piece is the \"image\" content type, which tells the processing class that it should load the image. The structure should follow:\n", + "\n", + "```json\n", + "{\"messages\": [{\"role\": \"system\", \"content\": [{\"type\": \"text\", \"text\":\"You are...\"}]}, {\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}, {\"type\": \"image\"}]}, {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}]}]}\n", + "{\"messages\": [{\"role\": \"system\", \"content\": [{\"type\": \"text\", \"text\":\"You are...\"}]}, {\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}, {\"type\": \"image\"}]}, {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}]}]}\n", + "{\"messages\": [{\"role\": \"system\", \"content\": [{\"type\": \"text\", \"text\":\"You are...\"}]}, {\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}, {\"type\": \"image\"}]}, {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": \"...\"}]}]}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c4ecf6db" + }, + "source": [ + "You can now use the Hugging Face Datasets library to load the dataset and create a prompt template to combine the image, product name, and category, and add a system message. 
The dataset includes images as `PIL.Image` objects." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40c3a2cf" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8d1259be3dfa4b1e899c97026276ee41", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a5554c0595144c949b578eb1cbdfd0fd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "data/train-00000-of-00001.parquet: 0%| | 0.00/47.6M [00:00 and and image.\\nOnly return description. The description should be SEO optimized and for a better mobile search experience.\\n\\n\\nRazor Agitator BMX/Freestyle Bike, 20-Inch\\n\\n\\n\\nSports & Outdoors | Outdoor Recreation | Cycling | Kids' Bikes & Accessories | Kids' Bikes\\n\\n\"}, {'type': 'image', 'image': }]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'Conquer the streets with the Razor Agitator BMX Bike! This 20-inch freestyle bike is built for young riders ready to take on any challenge. Durable frame, responsive handling – perfect for tricks and cruising. Get yours today!'}]}]\n" + ] + } + ], + "source": [ + "from datasets import load_dataset\n", + "from PIL import Image\n", + "\n", + "# System message for the assistant\n", + "system_message = \"You are an expert product description writer for Amazon.\"\n", + "\n", + "# User prompt template that combines the product name and category\n", + "user_prompt = \"\"\"Create a Short Product description based on the provided <PRODUCT> and <CATEGORY> and image.\n", + "Only return description. 
The description should be SEO optimized and for a better mobile search experience.\n", + "\n", + "<PRODUCT>\n", + "{product}\n", + "</PRODUCT>\n", + "\n", + "<CATEGORY>\n", + "{category}\n", + "</CATEGORY>\n", + "\"\"\"\n", + "\n", + "# Convert dataset to OAI messages\n", + "def format_data(sample):\n", + "    return {\n", + "        \"messages\": [\n", + "            {\n", + "                \"role\": \"system\",\n", + "                #\"content\": [{\"type\": \"text\", \"text\": system_message}],\n", + "                \"content\": system_message,\n", + "            },\n", + "            {\n", + "                \"role\": \"user\",\n", + "                \"content\": [\n", + "                    {\n", + "                        \"type\": \"text\",\n", + "                        \"text\": user_prompt.format(\n", + "                            product=sample[\"Product Name\"],\n", + "                            category=sample[\"Category\"],\n", + "                        ),\n", + "                    },\n", + "                    {\n", + "                        \"type\": \"image\",\n", + "                        \"image\": sample[\"image\"],\n", + "                    },\n", + "                ],\n", + "            },\n", + "            {\n", + "                \"role\": \"assistant\",\n", + "                \"content\": [{\"type\": \"text\", \"text\": sample[\"description\"]}],\n", + "            },\n", + "        ],\n", + "    }\n", + "\n", + "def process_vision_info(messages: list[dict]) -> list[Image.Image]:\n", + "    image_inputs = []\n", + "    # Iterate through each message in the conversation\n", + "    for msg in messages:\n", + "        # Get content (ensure it's a list)\n", + "        content = msg.get(\"content\", [])\n", + "        if not isinstance(content, list):\n", + "            content = [content]\n", + "\n", + "        # Check each content element for images\n", + "        for element in content:\n", + "            if isinstance(element, dict) and (\n", + "                \"image\" in element or element.get(\"type\") == \"image\"\n", + "            ):\n", + "                # Get the image and convert to RGB\n", + "                if \"image\" in element:\n", + "                    image = element[\"image\"]\n", + "                else:\n", + "                    image = element\n", + "                image_inputs.append(image.convert(\"RGB\"))\n", + "    return image_inputs\n", + "\n", + "# Load dataset from the hub\n", + "dataset = load_dataset(\"philschmid/amazon-product-descriptions-vlm\", split=\"train\")\n", + "dataset = dataset.train_test_split(test_size=0.1)\n", + "\n", + "# Convert dataset to OAI messages\n", + "# 
need to use a list comprehension to keep the PIL.Image type; .map converts images to bytes\n", + "dataset_train = [format_data(sample) for sample in dataset[\"train\"]]\n", + "dataset_test = [format_data(sample) for sample in dataset[\"test\"]]\n", + "\n", + "print(dataset_train[345][\"messages\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c0eb2e06" + }, + "source": [ + "## Fine-tune Gemma using TRL and the SFTTrainer\n", + "\n", + "You are now ready to fine-tune your model. Hugging Face TRL [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) makes supervised fine-tuning of open LLMs straightforward. The `SFTTrainer` is a subclass of the `Trainer` from the `transformers` library and supports all the same features, including logging, evaluation, and checkpointing, but adds quality-of-life features, including:\n", + "\n", + "* Dataset formatting, including conversational and instruction formats\n", + "* Training on completions only, ignoring prompts\n", + "* Packing datasets for more efficient training\n", + "* Parameter-efficient fine-tuning (PEFT) support including QLoRA\n", + "* Preparing the model and tokenizer for conversational fine-tuning (such as adding special tokens)\n", + "\n", + "The following code loads the Gemma model and tokenizer from Hugging Face and initializes the quantization configuration.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "18069ed2" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "42e58727637d4495ad8c5f753c5bcd06", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b11ec04ab48043b9937cfa3822b4fa42", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00\n", + " [456/456 11:20, Epoch 3/3]\n", + "
| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 1.326710 | 1.441816 |
| 2 | 1.042711 | 1.320613 |
| 3 | 0.739179 | 1.458798 |

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Start training, the model will be automatically saved to the Hub and the output directory\n", + "trainer.train()\n", + "\n", + "# Save the final model again to the Hugging Face Hub\n", + "trainer.save_model()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b47b9733" + }, + "source": [ + "Before you can test your model, make sure to free the memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40a32ed7" + }, + "outputs": [], + "source": [ + "# free the memory again\n", + "del model\n", + "del trainer\n", + "torch.cuda.empty_cache()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "862e9728" + }, + "source": [ + "When using QLoRA, you only train adapters and not the full model. This means when saving the model during training you only save the adapter weights and not the full model. If you want to save the full model, which makes it easier to use with serving stacks like vLLM or TGI, you can merge the adapter weights into the model weights using the `merge_and_unload` method and then save the model with the `save_pretrained` method. This saves a default model, which can be used for inference.\n", + "\n", + "Note: It requires more than 30GB of CPU Memory when you want to merge the adapter into the model. 
You can skip this and continue with Test Model Inference.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "761e324b" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "20d63c526a854f2a880882c246ac3b3d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading weights: 0%| | 0/2011 [00:00<|turn>system\n", + "You are an expert product description writer for Amazon.\n", + "<|turn>user\n", + "\n", + "\n", + "<|image|>\n", + "\n", + "Create a Short Product description based on the provided and and image.\n", + "Only return description. The description should be SEO optimized and for a better mobile search experience.\n", + "\n", + "\n", + "Hasbro Marvel Avengers-Serie Marvel Assemble Titan-Held, Iron Man, 30,5 cm Actionfigur\n", + "\n", + "\n", + "\n", + "Toys & Games | Toy Figures & Playsets | Action Figures\n", + "\n", + "<|turn>model\n", + "\n", + "MODEL OUTPUT>> \n", + "\n", + "Enhance your collection with the Marvel Avengers - Avengers Assemble Ultron-Comforter Set! This soft and cuddly blanket and pillowcase feature everyone's favorite Avengers, Iron Man, and his loyal companion War Machine. Officially licensed by Marvel. 
Bring home the heroic team!\n" + ] + } + ], + "source": [ + "import requests\n", + "from PIL import Image\n", + "\n", + "# Test sample with Product Name, Category and Image\n", + "sample = {\n", + " \"product_name\": \"Hasbro Marvel Avengers-Serie Marvel Assemble Titan-Held, Iron Man, 30,5 cm Actionfigur\",\n", + " \"category\": \"Toys & Games | Toy Figures & Playsets | Action Figures\",\n", + " \"image\": Image.open(requests.get(\"https://m.media-amazon.com/images/I/81+7Up7IWyL._AC_SY300_SX300_.jpg\", stream=True).raw).convert(\"RGB\")\n", + "}\n", + "\n", + "def generate_description(sample, model, processor):\n", + " # Convert sample into messages and then apply the chat template\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": [\n", + " {\"type\": \"image\",\"image\": sample[\"image\"]},\n", + " {\"type\": \"text\", \"text\": user_prompt.format(product=sample[\"product_name\"], category=sample[\"category\"])},\n", + " ]},\n", + " ]\n", + " text = processor.apply_chat_template(\n", + " messages, tokenize=False, add_generation_prompt=True\n", + " )\n", + " print(text)\n", + " # Process the image and text\n", + " image_inputs = process_vision_info(messages)\n", + " # Tokenize the text and process the images\n", + " inputs = processor(\n", + " text=[text],\n", + " images=image_inputs,\n", + " padding=True,\n", + " return_tensors=\"pt\",\n", + " )\n", + " # Move the inputs to the device\n", + " inputs = inputs.to(model.device)\n", + "\n", + " # Generate the output\n", + " stop_token_ids = [processor.tokenizer.eos_token_id, processor.tokenizer.convert_tokens_to_ids(\"\")]\n", + " generated_ids = model.generate(**inputs, max_new_tokens=256, top_p=1.0, do_sample=True, temperature=0.8, eos_token_id=stop_token_ids, disable_compile=True)\n", + " # Trim the generation and decode the output to text\n", + " generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, 
generated_ids)]\n", + " output_text = processor.batch_decode(\n", + " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", + " )\n", + " return output_text[0]\n", + "\n", + "# generate the description\n", + "description = generate_description(sample, model, processor)\n", + "print(\"MODEL OUTPUT>> \\n\")\n", + "print(description)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6f8ff452" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "This tutorial covered how to fine-tune a Gemma model for vision tasks using TRL and QLoRA, specifically for generating product descriptions. Check out the following docs next:\n", + "\n", + "* Learn how to [generate text with a Gemma model](https://ai.google.dev/gemma/docs/get_started).\n", + "* Learn how to [fine-tune Gemma for text tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora).\n", + "* Learn how to [perform full model fine-tuning using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune).\n", + "* Learn how to perform [distributed fine-tuning and inference on a Gemma model](https://ai.google.dev/gemma/docs/core/distributed_tuning).\n", + "* Learn how to [use Gemma open models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma).\n", + "* Learn how to [fine-tune Gemma using KerasNLP and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb)." 
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "huggingface_vision_finetune_qlora.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/google-cookbook/lora_tuning.ipynb b/tooling/fine-tuning/google-cookbook/lora_tuning.ipynb new file mode 100644 index 0000000..4ad0f7c --- /dev/null +++ b/tooling/fine-tuning/google-cookbook/lora_tuning.ipynb @@ -0,0 +1,789 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "G3MMAcssHTML" + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tce3stUlHN0L" + }, + "source": [ + "##### Copyright 2025 Google LLC." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "tuOe1ymfHZPu" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SDEExiAk4fLb" + }, + "source": [ + "# Fine-tune Gemma in Keras using LoRA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZFWzQEqNosrS" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " View on ai.google.dev\n", + " \n", + " Run in Google Colab\n", + " \n", + " Run in Kaggle\n", + " \n", + " Open in Vertex AI\n", + " \n", + " View source on GitHub\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lSGRSsRPgkzK" + }, + "source": [ + "Generative artificial intelligent (AI) models like Gemma are effective at a variety of tasks. You can further fine-tune Gemma models with domain-specific data to perform tasks such as sentiment analysis. However, full fine-tuning of generative models by updating billions of parameters is resource intensive, requiring specialized hardware, such as GPUs, processing time, and memory to load the model parameters.\n", + "\n", + "[Low Rank Adaptation](https://arxiv.org/abs/2106.09685) (LoRA) is a fine-tuning technique which greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the model and inserting a smaller number of new weights into the model. This technique makes training with LoRA much faster and more memory-efficient, and produces smaller model weights (a few hundred MBs), all while maintaining the quality of the model outputs. This tutorial walks you through using Keras to perform LoRA fine-tuning on a Gemma model." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lyhHCMfoRZ_v" + }, + "source": [ + "## Setup\n", + "\n", + "To complete this tutorial, you will first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:\n", + "\n", + "* Get access to Gemma on [kaggle.com](https://kaggle.com).\n", + "* Select a Colab runtime with sufficient resources to tune\n", + " the Gemma model you want to run. [Learn more](https://ai.google.dev/gemma/docs/core#sizes).\n", + "* Generate and configure a Kaggle username and API key.\n", + "\n", + "After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AZ5Qo0fxRZ1V" + }, + "source": [ + "### Select a Colab runtime\n", + "\n", + "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n", + "\n", + "1. In the upper-right of the Colab window, select ▾ (**Additional connection options**).\n", + "2. Select **Change runtime type**.\n", + "3. Under **Hardware accelerator**, select **T4 GPU**." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hsPC0HRkJl0K" + }, + "source": [ + "### Configure your API key\n", + "\n", + "To use Gemma, you must provide your Kaggle username and a Kaggle API key.\n", + "\n", + "To generate a Kaggle API key, go to the **Account** tab of your Kaggle user profile and select **Create New Token**. This triggers the download of a `kaggle.json` file containing your API credentials.\n", + "\n", + "In Colab, select **Secrets** (🔑) in the left pane and add your Kaggle username and Kaggle API key. Store your username under the name `KAGGLE_USERNAME` and your API key under the name `KAGGLE_KEY`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7iOF6Yo-wUEC" + }, + "source": [ + "### Set environment variables\n", + "\n", + "Set environment variables for `KAGGLE_USERNAME` and `KAGGLE_KEY`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0_EdOg9DPK6Q" + }, + "outputs": [], + "source": [ + "import os\n", + "from google.colab import userdata\n", + "\n", + "# Note: `userdata.get` is a Colab API. 
If you're not using Colab, set the env\n", + "# vars as appropriate for your system.\n", + "\n", + "os.environ[\"KAGGLE_USERNAME\"] = userdata.get('KAGGLE_USERNAME')\n", + "os.environ[\"KAGGLE_KEY\"] = userdata.get('KAGGLE_KEY')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CuEUAKJW1QkQ" + }, + "source": [ + "### Install Keras packages\n", + "\n", + "Install the Keras and KerasHub Python packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1eeBtYqJsZPG" + }, + "outputs": [], + "source": [ + "!pip install -q -U keras-hub\n", + "!pip install -q -U keras" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rGLS-l5TxIR4" + }, + "source": [ + "### Select a backend\n", + "\n", + "Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use. Using Keras 3, you can run workflows on one of three backends: TensorFlow, JAX, or PyTorch. For this tutorial, configure the backend for JAX because it typically provides better performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yn5uy8X8sdD0" + }, + "outputs": [], + "source": [ + "os.environ[\"KERAS_BACKEND\"] = \"jax\" # Or \"torch\" or \"tensorflow\".\n", + "# Avoid memory fragmentation on JAX backend.\n", + "os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"]=\"1.00\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hZs8XXqUKRmi" + }, + "source": [ + "### Import packages\n", + "\n", + "Import the Python packages needed for this tutorial, including Keras and KerasHub." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FYHyPUA9hKTf" + }, + "outputs": [], + "source": [ + "import keras\n", + "import keras_hub" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7RCE3fdGhDE5" + }, + "source": [ + "## Load model\n", + "\n", + "Keras provides implementations of Gemma and many other popular [model architectures](https://keras.io/keras_hub/api/models/). Use the `Gemma3CausalLM.from_preset()` method to configure an end-to-end Gemma model for causal language modeling. A causal language model predicts the next token based on previous tokens." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vz5zLEyLstfn" + }, + "outputs": [], + "source": [ + "gemma_lm = keras_hub.models.Gemma3CausalLM.from_preset(\"gemma3_instruct_1b\")\n", + "gemma_lm.summary()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Nl4lvPy5zA26" + }, + "source": [ + "The `Gemma3CausalLM.from_preset()` method instantiates the model from a preset architecture and weights. In the code above, the preset string `\"gemma3_instruct_1b\"` (of the general form `\"gemma#_xxxxxxx\"`) specifies a preset version and parameter size for Gemma. You can find the code strings for Gemma models in their **Model Variation** listings on [Kaggle](https://www.kaggle.com/models/keras/gemma3)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G_L6A5J-1QgC" + }, + "source": [ + "## Inference before fine-tuning\n", + "\n", + "Once you have downloaded and configured a Gemma model, you can query it with various prompts to see how it responds." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PVLXadptyo34" + }, + "source": [ + "### Europe trip prompt\n", + "\n", + "Query the model for suggestions on what to do on a trip to Europe." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZwQz3xxxKciD" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Instruction:\n", + "What should I do on a trip to Europe?\n", + "\n", + "Response:\n", + "The first thing to know is that you will have a great time!\n", + "\n", + "Europe is a great place for a vacation. The countries of Europe are all very different and offer a wide range of activities and attractions. The countries of Europe are also very close to each other, which means you can visit many different places within a short time.\n", + "\n", + "The best way to plan a trip to Europe is to look up the countries you want to visit and see what activities are offered in each country. You can also look for tours and tours that offer a good value for money.\n", + "\n", + "You can also look for hotels and flights that offer good deals. If you are looking for a good value for money, you should look for hotels and flights that offer good deals. This means you will have a great time on your trip!\n", + "\n", + "The next step is to book your tickets to the countries you want to visit. If you are planning to visit many countries, it's a good idea to book your tickets early. This means you’ll be able to get the best deal and avoid the long queues.\n", + "\n", + "The next step is to plan your itinerary. 
You can use a travel guide to plan your itinerary\n" + ] + } + ], + "source": [ + "template = \"Instruction:\\n{instruction}\\n\\nResponse:\\n{response}\"\n", + "\n", + "prompt = template.format(\n", + " instruction=\"What should I do on a trip to Europe?\",\n", + " response=\"\",\n", + ")\n", + "sampler = keras_hub.samplers.TopKSampler(k=5, seed=2)\n", + "gemma_lm.compile(sampler=sampler)\n", + "print(gemma_lm.generate(prompt, max_length=256))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AePQUIs2h-Ks" + }, + "source": [ + "The model responds with generic tips on how to plan a trip." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YQ74Zz_S0iVv" + }, + "source": [ + "### Photosynthesis prompt\n", + "\n", + "Prompt the model to explain photosynthesis in terms simple enough for a 5 year old child to understand." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lorJMbsusgoo" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Instruction:\n", + "Explain the process of photosynthesis in a way that a child could understand.\n", + "\n", + "Response:\n", + "Photosynthesis is a biological process that occurs in plants, algae, and some other organisms. In the process, light energy is captured and converted into the energy stored in the bonds of organic molecules. The process is crucial for life on Earth because it enables plants to use carbon dioxide and water to produce glucose and oxygen, which are essential for all living things.\n", + "The process involves several stages:\n", + "1. Light Reactions: Light energy is absorbed by pigments in the chloroplasts of the plant, converting it into chemical energy in the form of ATP and reducing power.\n", + "2. Carbon Fixation: During this stage, carbon dioxide is combined with hydrogen to form organic molecules such as starch or glucose, which are used as a source of energy.\n", + "3. 
Calvin Cycle: The process of carbon fixation occurs in the stroma of the chloroplasts. It involves the capture and reduction of carbon dioxide, producing glucose and reducing power in the form of ATP and NADPH molecules.\n", + "4. Stroma: The stroma is the fluid-filled space where the light reactions occur in the chloroplasts.\n", + "5. Chloroplasts: The chloroplasts contain the green pigments that absorb\n" + ] + } + ], + "source": [ + "prompt = template.format(\n", + " instruction=\"Explain the process of photosynthesis in a way that a child could understand.\",\n", + " response=\"\",\n", + ")\n", + "print(gemma_lm.generate(prompt, max_length=256))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WBQieduRizZf" + }, + "source": [ + "The model response contains words that might not be easy for a child to understand, such as chloroplasts." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pt7Nr6a7tItO" + }, + "source": [ + "## LoRA fine-tuning\n", + "\n", + "This section shows you how to do fine-tuning using the Low Rank Adaptation (LoRA) tuning technique. This approach allows you to change the behavior of Gemma models using fewer compute resources." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9T7xe_jzslv4" + }, + "source": [ + "### Load dataset\n", + "\n", + "Prepare a dataset for tuning by downloading an existing dataset and formatting it for use with the Keras `fit()` fine-tuning method. This tutorial uses the [Databricks Dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) for fine-tuning. The dataset contains 15,000 high-quality human-generated prompt and response pairs specifically designed for tuning generative models." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xRaNCPUXKoa7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2025-04-10 20:48:49-- https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl\n", + "Resolving huggingface.co (huggingface.co)... 3.163.189.37, 3.163.189.114, 3.163.189.74, ...\n", + "Connecting to huggingface.co (huggingface.co)|3.163.189.37|:443... connected.\n", + "HTTP request sent, awaiting response... 302 Found\n", + "Location: https://cdn-lfs.hf.co/repos/34/ac/34ac588cc580830664f592597bb6d19d61639eca33dc2d6bb0b6d833f7bfd552/2df9083338b4abd6bceb5635764dab5d833b393b55759dffb0959b6fcbf794ec?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27databricks-dolly-15k.jsonl%3B+filename%3D%22databricks-dolly-15k.jsonl%22%3B&Expires=1744321729&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0NDMyMTcyOX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy8zNC9hYy8zNGFjNTg4Y2M1ODA4MzA2NjRmNTkyNTk3YmI2ZDE5ZDYxNjM5ZWNhMzNkYzJkNmJiMGI2ZDgzM2Y3YmZkNTUyLzJkZjkwODMzMzhiNGFiZDZiY2ViNTYzNTc2NGRhYjVkODMzYjM5M2I1NTc1OWRmZmIwOTU5YjZmY2JmNzk0ZWM%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=vh0VIGB-UkK57FSfRikYCREpKuHt%7EnDKPcHHgC1V9rDXLABIRF81nK7olQhAq6zSbAqEtMNnvHgd8IBK1j54mdIYdVLiBwImqez3xu2CPhzYBtKWInnXj9lTXW0p-9GEHcbU%7Eoot22qFSdwyZf1UIdmHZLTHPWjtLhfRkKbg-ptA3CFeegtmvCtY-WG2GffJ%7Em2q2bbs-U1m0yI7cSTW18nD8VSBihxGOMnS1IhkO-LgE4I6GJISXROTk-61%7EJiEIKcagcijL4QGi8j1g9xeQamBXX4hWBdkbJgX5PtX15Ftd0HCM4zCzcJAUrE3ZEJRLe2XRUwfKU3ai7-%7ErPpnSA__&Key-Pair-Id=K3RPWS32NSSJCE [following]\n", + "--2025-04-10 20:48:49-- 
https://cdn-lfs.hf.co/repos/34/ac/34ac588cc580830664f592597bb6d19d61639eca33dc2d6bb0b6d833f7bfd552/2df9083338b4abd6bceb5635764dab5d833b393b55759dffb0959b6fcbf794ec?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27databricks-dolly-15k.jsonl%3B+filename%3D%22databricks-dolly-15k.jsonl%22%3B&Expires=1744321729&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0NDMyMTcyOX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy8zNC9hYy8zNGFjNTg4Y2M1ODA4MzA2NjRmNTkyNTk3YmI2ZDE5ZDYxNjM5ZWNhMzNkYzJkNmJiMGI2ZDgzM2Y3YmZkNTUyLzJkZjkwODMzMzhiNGFiZDZiY2ViNTYzNTc2NGRhYjVkODMzYjM5M2I1NTc1OWRmZmIwOTU5YjZmY2JmNzk0ZWM%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=vh0VIGB-UkK57FSfRikYCREpKuHt%7EnDKPcHHgC1V9rDXLABIRF81nK7olQhAq6zSbAqEtMNnvHgd8IBK1j54mdIYdVLiBwImqez3xu2CPhzYBtKWInnXj9lTXW0p-9GEHcbU%7Eoot22qFSdwyZf1UIdmHZLTHPWjtLhfRkKbg-ptA3CFeegtmvCtY-WG2GffJ%7Em2q2bbs-U1m0yI7cSTW18nD8VSBihxGOMnS1IhkO-LgE4I6GJISXROTk-61%7EJiEIKcagcijL4QGi8j1g9xeQamBXX4hWBdkbJgX5PtX15Ftd0HCM4zCzcJAUrE3ZEJRLe2XRUwfKU3ai7-%7ErPpnSA__&Key-Pair-Id=K3RPWS32NSSJCE\n", + "Resolving cdn-lfs.hf.co (cdn-lfs.hf.co)... 18.238.217.63, 18.238.217.81, 18.238.217.120, ...\n", + "Connecting to cdn-lfs.hf.co (cdn-lfs.hf.co)|18.238.217.63|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 13085339 (12M) [text/plain]\n", + "Saving to: ‘databricks-dolly-15k.jsonl’\n", + "\n", + "databricks-dolly-15 100%[===================>] 12.48M --.-KB/s in 0.08s \n", + "\n", + "2025-04-10 20:48:49 (156 MB/s) - ‘databricks-dolly-15k.jsonl’ saved [13085339/13085339]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O databricks-dolly-15k.jsonl https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "45UpBDfBgf0I" + }, + "source": [ + "### Format tuning data\n", + "\n", + "Format the downloaded data for use with the Keras `fit()` method. The following code extracts a subset of the training examples to execute the notebook faster. Consider using more training data for higher quality fine-tuning." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZiS-KU9osh_N" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "prompts = []\n", + "responses = []\n", + "line_count = 0\n", + "\n", + "with open(\"databricks-dolly-15k.jsonl\") as file:\n", + " for line in file:\n", + " if line_count >= 1000:\n", + " break # Limit the training examples, to reduce execution time.\n", + "\n", + " examples = json.loads(line)\n", + " # Filter out examples with context, to keep it simple.\n", + " if examples[\"context\"]:\n", + " continue\n", + " # Format data into prompts and response lists.\n", + " prompts.append(examples[\"instruction\"])\n", + " responses.append(examples[\"response\"])\n", + "\n", + " line_count += 1\n", + "\n", + "data = {\n", + " \"prompts\": prompts,\n", + " \"responses\": responses\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cBLW5hiGj31i" + }, + "source": [ + "### Configure LoRA tuning\n", + "\n", + "Activate LoRA tuning using the Keras `model.backbone.enable_lora()` method, including a LoRA rank value. 
The *LoRA rank* determines the dimensionality of the trainable matrices that are added to the original weights of the LLM. It controls the expressiveness and precision of the fine-tuning adjustments. A higher rank means more detailed changes are possible, but also means more trainable parameters. A lower rank means less computational overhead, but potentially less precise adaptation.\n", + "\n", + "This example uses a LoRA rank of 4. In practice, begin with a relatively small rank (such as 4, 8, 16). This setting is computationally efficient for experimentation. Train your model with this rank and evaluate the performance improvement on your task. Gradually increase the rank in subsequent trials and see if that further boosts performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RCucu6oHz53G" + }, + "outputs": [], + "source": [ + "# Enable LoRA for the model and set the LoRA rank to 4.\n", + "gemma_lm.backbone.enable_lora(rank=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PlMLp_NVbRoQ" + }, + "source": [ + "Check the model summary after setting the LoRA rank. 
Notice that enabling LoRA reduces the number of trainable parameters significantly compared to the total number of parameters in the model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KqYyS0gm6pNy" + }, + "outputs": [], + "source": [ + "gemma_lm.summary()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hQQ47kcdpbZ9" + }, + "source": [ + "Configure the rest of the fine-tuning settings, including the preprocessor settings, optimizer, number of tuning epochs, and batch size:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "p9sBNH8SAjgB" + }, + "outputs": [], + "source": [ + "# Limit the input sequence length to 256 (to control memory usage).\n", + "gemma_lm.preprocessor.sequence_length = 256\n", + "# Use AdamW (a common optimizer for transformer models).\n", + "optimizer = keras.optimizers.AdamW(\n", + " learning_rate=5e-5,\n", + " weight_decay=0.01,\n", + ")\n", + "# Exclude layernorm and bias terms from decay.\n", + "optimizer.exclude_from_weight_decay(var_names=[\"bias\", \"scale\"])\n", + "\n", + "gemma_lm.compile(\n", + " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " optimizer=optimizer,\n", + " weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OA0ozGC66tk1" + }, + "source": [ + "### Run the fine-tune process\n", + "\n", + "Run the fine-tuning process using the `fit()` method. 
This process can take several minutes depending on your compute resources, data size, and number of epochs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_Peq7TnLtHse" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1m1000/1000\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m923s\u001b[0m 888ms/step - loss: 1.5586 - sparse_categorical_accuracy: 0.5251\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gemma_lm.fit(data, epochs=1, batch_size=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bx3m8f1dB7nk" + }, + "source": [ + "#### Mixed precision fine-tuning on NVIDIA GPUs\n", + "\n", + "Full precision is recommended for fine-tuning. When fine-tuning on NVIDIA GPUs, you can use mixed precision (`keras.mixed_precision.set_global_policy('mixed_bfloat16')`) to speed up training with minimal effect on training quality." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "T0lHxEDX03gp" + }, + "outputs": [], + "source": [ + "# Uncomment the line below if you want to enable mixed precision training on GPUs\n", + "# keras.mixed_precision.set_global_policy('mixed_bfloat16')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4yd-1cNw1dTn" + }, + "source": [ + "## Inference after fine-tuning\n", + "\n", + "After fine-tuning, you should see changes in the responses when the tuned model is given the same prompt." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H55JYJ1a1Kos" + }, + "source": [ + "### Europe trip prompt\n", + "\n", + "Try the Europe trip prompt from earlier and note the differences in the response." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Y7cDJHy8WfCB" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Instruction:\n", + "What should I do on a trip to Europe?\n", + "\n", + "Response:\n", + "When planning a trip to Europe, you should consider your budget, time and the places you want to visit. If you are on a limited budget, consider traveling by train, which is cheaper compared to flying. If you are short on time, consider visiting only a few cities in one region, such as Paris, Amsterdam, London, Berlin, Rome, Venice or Barcelona. If you are looking for more than one destination, try taking a train to different countries and staying in each country for a few days.\n" + ] + } + ], + "source": [ + "prompt = template.format(\n", + " instruction=\"What should I do on a trip to Europe?\",\n", + " response=\"\",\n", + ")\n", + "sampler = keras_hub.samplers.TopKSampler(k=5, seed=2)\n", + "gemma_lm.compile(sampler=sampler)\n", + "print(gemma_lm.generate(prompt, max_length=256))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OXP6gg2mjs6u" + }, + "source": [ + "The model now provides a shorter response to a question about visiting Europe." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H7nVd8Mi1Yta" + }, + "source": [ + "### Photosynthesis prompt\n", + "\n", + "Try the photosynthesis explanation prompt from earlier and note the differences in the response." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "X-2sYl2jqwl7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Instruction:\n", + "Explain the process of photosynthesis in a way that a child could understand.\n", + "\n", + "Response:\n", + "The process of photosynthesis is a chemical reaction in plants that converts the energy of sunlight into chemical energy, which the plants can then use to grow and develop. 
During photosynthesis, a plant will absorb carbon dioxide (CO2) from the air and water from the soil and use the energy from the sun to produce oxygen (O2) and sugars (glucose) as a by-product.\n" + ] + } + ], + "source": [ + "prompt = template.format(\n", + " instruction=\"Explain the process of photosynthesis in a way that a child could understand.\",\n", + " response=\"\",\n", + ")\n", + "print(gemma_lm.generate(prompt, max_length=256))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PCmAmqrvkEhc" + }, + "source": [ + "The model now explains photosynthesis in simpler terms." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I8kFG12l0mVe" + }, + "source": [ + "## Improving fine-tune results\n", + "\n", + "For demonstration purposes, this tutorial fine-tunes the model on a small subset of the dataset for just one epoch and with a low LoRA rank value. To get better responses from the fine-tuned model, you can experiment with:\n", + "\n", + "1. Increasing the size of the fine-tuning dataset\n", + "2. Training for more steps (epochs)\n", + "3. Setting a higher LoRA rank\n", + "4. Modifying the hyperparameter values such as `learning_rate` and `weight_decay`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gSsRdeiof_rJ" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "This tutorial covered LoRA fine-tuning on a Gemma model using Keras. 
Check out the following docs next:\n", + "\n", + "* Learn how to [generate text with a Gemma model](https://ai.google.dev/gemma/docs/get_started).\n", + "* Learn how to perform [distributed fine-tuning and inference on a Gemma model](https://ai.google.dev/gemma/docs/core/distributed_tuning).\n", + "* Learn how to [use Gemma open models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma).\n", + "* Learn how to [fine-tune Gemma using Keras and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "lora_tuning.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/huggingface-recipes/Gemma4_(E2B)-Multimodal.ipynb b/tooling/fine-tuning/huggingface-recipes/Gemma4_(E2B)-Multimodal.ipynb new file mode 100644 index 0000000..ec9243d --- /dev/null +++ b/tooling/fine-tuning/huggingface-recipes/Gemma4_(E2B)-Multimodal.ipynb @@ -0,0 +1,595 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "This notebook contains vibe-test examples that exercise the image, text, and audio capabilities of the Gemma-4 model. To get started, install the latest stable release of transformers." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "!pip install -U transformers" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "We can load the model with `AutoModelForMultimodalLM` to make use of all its capabilities." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import torch\n", + "from PIL import Image\n", + "\n", + "from transformers import AutoModelForMultimodalLM, AutoProcessor\n", + "#model_list = [\"google/gemma-4-26B-A4B-it\", \"google/gemma-4-E4B-it\",\n", + "# \"google/gemma-4-E2B-it\", \"google/gemma-4-31B-it\"]\n", + "model_id = \"google/gemma-4-E2B-it\"\n", + "model = AutoModelForMultimodalLM.from_pretrained(model_id, device_map=\"auto\")\n", + "processor = AutoProcessor.from_pretrained(model_id)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Code completion" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We give Gemma-4 a website screenshot to reproduce the code." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"image\",\n", + " \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/landing_page.png\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"Write HTML code for this page.\"},\n", + " ],\n", + " }\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " enable_thinking=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(**inputs, max_new_tokens=4000)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n", + "\n", + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + 
"cell_type": "markdown", + "source": [ + "## Video Inference" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We test Gemma-4 on video understanding. If you want to run this example with larger models which don't take audio input, disable `load_audio_from_video`." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"video\", \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4\"},\n", + " {\"type\": \"text\", \"text\": \"What is happening in the video? What is the song about?\"},\n", + " ],\n", + " },\n", + "]\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " load_audio_from_video=True,\n", + ").to(model.device)\n", + "output = model.generate(**inputs, max_new_tokens=200)\n", + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Multimodal Function Calling" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import re\n", + "\n", + "WEATHER_TOOL = {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"get_weather\",\n", + " \"description\": \"Gets the current weather for a specific location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"city\": {\"type\": \"string\", \"description\": \"The city name\"},\n", + " },\n", + " 
\"required\": [\"city\"],\n", + " },\n", + " },\n", + "}\n", + "tools = [WEATHER_TOOL]\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": [\n", + " {\"type\": \"image\", \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg\"},\n", + " {\"type\": \"text\", \"text\": \"What is the city in this image? Check the weather there right now.\"},\n", + " ]},\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tools=[WEATHER_TOOL],\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " enable_thinking=True,\n", + ").to(model.device)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "output = model.generate(**inputs, max_new_tokens=1000)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Any-to-any inference" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We can also run the model with `any-to-any` pipeline." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from transformers import pipeline\n", + "\n", + "pipe = pipeline(\"any-to-any\", model=\"google/gemma-4-e2b-it\")\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"video\",\n", + " \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"What is happening in this video?\"},\n", + " ],\n", + " }\n", + "]\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "pipe(messages)#, load_audio_from_video=True)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"video\",\n", + " \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"What is happening in this video?\"},\n", + " ],\n", + " }\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " add_generation_prompt=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\"\n", + ")\n", + "inputs = inputs.to(model.device)\n", + "\n", + "generated_ids = model.generate(**inputs, max_new_tokens=128)\n", + "generated_ids_trimmed = [\n", + " out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n", + "]\n", + "output_text = processor.batch_decode(\n", + " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", + ")\n", + "print(output_text)\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Object detection and pointing" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import 
re\n", + "import torch\n", + "from transformers.image_utils import load_image\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.patches as patches\n", + "import json" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "image_url = \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png\"\n", + "image = load_image(image_url)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def resize_to_48_multiple(image):\n", + " w, h = image.size\n", + " new_w = (w // 48) * 48\n", + " new_h = (h // 48) * 48\n", + " return image.crop((0, 0, new_w, new_h))" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def inputs_for_object_detection(image, what_object):\n", + " messages = [\n", + " {\n", + " \"role\": \"user\", \"content\": [\n", + " {\"type\": \"image\", \"image\": image},\n", + " {\"type\": \"text\", \"text\": f\"What's the bounding box for the {what_object} in the image?\"}\n", + " ]\n", + " }\n", + " ]\n", + "\n", + " inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " add_generation_prompt=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " enable_thinking=False,\n", + " )\n", + "\n", + " return inputs.to(model.device)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def extract_json(text: str):\n", + " text = text.strip()\n", + "\n", + " text = re.sub(r\"^```(?:json)?\\s*\", \"\", text)\n", + " text = re.sub(r\"\\s*```$\", \"\", text)\n", + "\n", + " # Try direct parse first\n", + " try:\n", + " return json.loads(text)\n", + " except json.JSONDecodeError:\n", + " pass\n", + "\n", + " # Fallback: extract first JSON object or array\n", + " match = re.search(r'(\\{.*\\}|\\[.*\\])', text, 
re.DOTALL)\n", + " if match:\n", + " candidate = match.group(1)\n", + " return json.loads(candidate)\n", + "\n", + " raise ValueError(\"No valid JSON found\")" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def detect_object(image_url, what_object):\n", + " image = load_image(image_url)\n", + " image = resize_to_48_multiple(image)\n", + " inputs = inputs_for_object_detection(image, what_object)\n", + " input_len = inputs[\"input_ids\"].shape[-1]\n", + " generated_outputs = model.generate(**inputs, max_new_tokens=1000, do_sample=False)\n", + " generated = processor.decode(generated_outputs[0, input_len:])\n", + " parsed_json = extract_json(generated)[0]\n", + " return parsed_json" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def draw_pascal_voc_boxes(i, image, box, label, resize_shape=(1000,1000)):\n", + " dpi = 72\n", + " width, height = image.size\n", + " fig, ax = plt.subplots(1, figsize=[width/dpi, height/dpi], tight_layout={'pad':0})\n", + "\n", + " ax.imshow(image)\n", + "\n", + " ymin, xmin, ymax, xmax = box\n", + " re_h, re_w = resize_shape if resize_shape is not None else (height, width)\n", + " xmin = (xmin / re_w) * width\n", + " ymin = (ymin/ re_h) * height\n", + " xmax = (xmax / re_w) * width\n", + " ymax = (ymax/ re_h) * height\n", + "\n", + " w = xmax - xmin\n", + " h = ymax - ymin\n", + "\n", + " rect = patches.Rectangle(\n", + " (xmin, ymin),\n", + " w,\n", + " h,\n", + " linewidth=10,\n", + " edgecolor=\"green\",\n", + " facecolor=\"none\"\n", + " )\n", + " ax.add_patch(rect)\n", + "\n", + " if label is not None:\n", + " ax.text(xmin, ymin-25, label, fontsize=24, bbox=dict(facecolor=\"yellow\", alpha=0.5))\n", + "\n", + " plt.axis(\"off\")\n", + " plt.savefig(f\"boxes_{i}.png\")\n", + " plt.close(fig)\n", + " display(fig)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": 
"code", + "source": [ + "def display_detected_object(image_url, what_object):\n", + " image = load_image(image_url)\n", + " image = resize_to_48_multiple(image)\n", + " detection = detect_object(image_url, what_object)\n", + " box = detection[\"box_2d\"]\n", + " label = detection.get(\"label\", f\"{what_object}\")\n", + " draw_pascal_voc_boxes(\"1000\", image, box, label)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "display_detected_object(\"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png\", \"bike\")" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Captioning" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"image\", \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png\"},\n", + " {\"type\": \"text\", \"text\": \"Write single detailed caption for this image.\"},\n", + " ],\n", + " },\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(**inputs, max_new_tokens=512)\n", + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n", + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Audio Understanding" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": 
\"audio\", \"url\": \"https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3\"},\n", + " {\"type\": \"text\", \"text\": \"Can you describe this audio in detail?\"},\n", + " ],\n", + " },\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(\n", + " **inputs,\n", + " max_new_tokens=1000,\n", + " do_sample=False,\n", + ")\n", + "\n", + "print(processor.decode(output[0], skip_special_tokens=True))\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/tooling/fine-tuning/huggingface-recipes/README.md b/tooling/fine-tuning/huggingface-recipes/README.md new file mode 100644 index 0000000..386369d --- /dev/null +++ b/tooling/fine-tuning/huggingface-recipes/README.md @@ -0,0 +1,195 @@ +# Hugging Face Gemma Recipes + +![repository thumbnail](../assets/thumbnail.png) + +🤗💎 Welcome! This repository contains *minimal* recipes to get started quickly with the Gemma family of models. + +> [!Note] +> Gemma 4 Multimodal inference (vision, video, audio, function calling, object detection): Open In Colab + + +## Getting Started + +To quickly run a Gemma 💎 model on your machine, install the latest version of `timm` (for the vision encoder) and 🤗 `transformers` to run inference, or if you want to fine tune it. 
+ +```shell +$ pip install -U -q transformers timm +``` + +### Inference with pipeline + +The easiest way to start using Gemma 3n is the `pipeline` abstraction in transformers: + +```python +import torch +from transformers import pipeline + +pipe = pipeline( + "image-text-to-text", + model="google/gemma-3n-E4B-it", # or "google/gemma-3n-E2B-it" + device="cuda", + torch_dtype=torch.bfloat16 +) + +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "url": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"}, + {"type": "text", "text": "Describe this image"} + ] + } +] + +output = pipe(text=messages, max_new_tokens=32) +print(output[0]["generated_text"][-1]["content"]) +``` + +### Detailed inference with transformers + +Initialize the model and the processor from the Hub, and write the `model_generation` function that takes care of processing the prompts and running the inference on the model. + +```python +from transformers import AutoProcessor, AutoModelForImageTextToText +import torch + +# Use the GPU when one is available, otherwise fall back to the CPU +device = "cuda" if torch.cuda.is_available() else "cpu" + +model_id = "google/gemma-3n-e4b-it" # google/gemma-3n-e2b-it +processor = AutoProcessor.from_pretrained(model_id) +model = AutoModelForImageTextToText.from_pretrained(model_id).to(device) + +def model_generation(model, messages): + inputs = processor.apply_chat_template( + messages, + add_generation_prompt=True, + tokenize=True, + return_dict=True, + return_tensors="pt", + ) + input_len = inputs["input_ids"].shape[-1] + + inputs = inputs.to(model.device, dtype=model.dtype) + + with torch.inference_mode(): + generation = model.generate(**inputs, max_new_tokens=32, disable_compile=False) + generation = generation[:, input_len:] + + decoded = processor.batch_decode(generation, skip_special_tokens=True) + print(decoded[0]) +``` + +Then call it with the modality you need: + +#### Text only + +```python +# Text Only + +messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is the capital of 
France?"} + ] + } +] +model_generation(model, messages) +``` + +#### Interleaved with Audio + +```python +# Interleaved with Audio + +messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "Transcribe the following speech segment in English:"}, + {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"}, + ] + } +] +model_generation(model, messages) +``` + +#### Interleaved with Image/Video + +```python +# Interleaved with Image + +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"}, + {"type": "text", "text": "Describe this image."} + ] + } +] +model_generation(model, messages) +``` + +## Inference + +### Gemma 4 + +#### Notebooks + +* [Multimodal inference with Gemma 4 (vision, video, audio, function calling, object detection)](/notebooks/Gemma4_(E2B)-Multimodal.ipynb) Open In Colab + +### Gemma 3n + +#### Notebooks + +* [Multimodal inference using Gemma 3n via pipeline](/notebooks/gemma3n_inference_via_pipeline.ipynb) Open In Colab + +## Function Calling + +### Gemma 3n + +#### Notebooks + +* [Function Calling with Gemma 3n: Local File Reader](/notebooks/Gemma_3n_Function_Calling_document_summarizer.ipynb) Open In Colab + +## Fine Tuning + +We include a series of notebook+scripts for fine tuning the models. 
+ +### Gemma 3n + +#### Notebooks + +* [Gemma 3n Conversational Fine tuning 2B on free Colab T4](/notebooks/fine_tune_gemma3n_on_t4.ipynb) Open In Colab +* [Gemma 3n Conversational Fine tuning 4B with Unsloth on free Colab T4](/notebooks/Gemma3N_(4B)-Conversational.ipynb) Open In Colab +* [Gemma 3n Multimodal Fine tuning 2B/4B with Unsloth on free Colab T4](/notebooks/gemma3n_multimodal_finetuning_on_rocov2_radiology.ipynb) Open In Colab +* [Fine tuning Gemma 3n on audio](/notebooks/fine_tune_gemma3n_on_audio.ipynb) Open In Colab +* [Fine tuning Gemma 3n on GUI Grounding](/notebooks/Gemma_3n_GUI_Finetune.ipynb) Open In Colab +* [Fine tuning Gemma 3n on video+audio using FineVideo (all modalities)](/notebooks/Gemma3n_Fine_tuning_on_All_Modalities.ipynb) Open In Colab + +#### Scripts + +* [Fine tuning Gemma 3n on images using TRL](/scripts/ft_gemma3n_image_trl.py) +* [Fine tuning Gemma 3n on images (script)](/scripts/ft_gemma3n_image_vt.py) +* [Fine tuning Gemma 3n on audio (script)](/scripts/ft_gemma3n_audio_vt.py) +* [Fine tuning Gemma 3n on video+audio using FineVideo (all modalities)](/scripts/gemma3n_fine_tuning_on_all_modalities.py) + +### Gemma 3 + +* [Reinforcement Learning (GRPO) on Gemma 3 with Unsloth and TRL](/notebooks/Gemma3_(1B)-GRPO.ipynb) Open In Colab +* [Vision fine tuning Gemma 3 4B with Unsloth](/notebooks/Gemma3_(4B)-Vision.ipynb) Open In Colab +* [Conversational fine tuning Gemma 3 4B with Unsloth](/notebooks/Gemma3_(4B).ipynb) Open In Colab + +## RAG + +### Gemma 3n +* [Retrieval-Augmented Generation with Gemma 3n](/notebooks/Gemma_RAG.ipynb) Open In Colab + + +Before fine-tuning the model, ensure all dependencies are installed: + +```bash +$ pip install -U -q -r requirements.txt +``` + +✨ **Bonus:** We've also experimented with adding **object detection** 🔍 capabilities to Gemma 3. You can explore that work in [this dedicated repo](https://github.com/ariG23498/gemma3-object-detection). 
+ diff --git a/tooling/fine-tuning/huggingface-recipes/carla_vlm_gemma.py b/tooling/fine-tuning/huggingface-recipes/carla_vlm_gemma.py new file mode 100644 index 0000000..422ddb4 --- /dev/null +++ b/tooling/fine-tuning/huggingface-recipes/carla_vlm_gemma.py @@ -0,0 +1,302 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl", +# "openenv-carla-env @ git+https://huggingface.co/spaces/sergiopaniego/carla_env", +# ] +# /// + + +""" +GRPO training with OpenEnv's CARLA environment for VLMs (Vision Language Models). + +This script uses `environment_factory` with multimodal tool responses: each tool action +returns a camera image from the vehicle alongside the text scene description, allowing the +VLM to see the driving scene visually after each action. + +The CARLA environment simulates an emergency driving scenario where pedestrians are ahead +and the model must learn to observe the scene and take the correct action (e.g., swerve +to an empty lane) to minimize casualties. 
+ +Setup: +```sh +pip install "openenv-carla-env @ git+https://huggingface.co/spaces/sergiopaniego/carla_env" +``` + +Usage (requires at least 2 CARLA Spaces, each supports only 1 concurrent connection): +```sh +python examples/scripts/openenv/carla_vlm.py \ + --env-urls https://server1.hf.space https://server2.hf.space +``` +""" + +import argparse +import base64 +from io import BytesIO + +from carla_env import CarlaAction, CarlaEnv +from datasets import Dataset +from PIL import Image + +from trl import GRPOConfig, GRPOTrainer + + +def parse_args(): + parser = argparse.ArgumentParser(description="Run GRPO VLM training with CARLA environment.") + parser.add_argument("--model", type=str, default="google/gemma-4-E2B-it") + parser.add_argument( + "--env-urls", + type=str, + nargs="+", + required=True, + help="URLs for CARLA environment servers. At least 2 required (1 Space = 1 connection).", + ) + parser.add_argument("--dataset-size", type=int, default=1000) + parser.add_argument("--max-completion-length", type=int, default=3072) + parser.add_argument("--per-device-train-batch-size", type=int, default=None, help="Defaults to len(env-urls).") + parser.add_argument("--gradient-accumulation-steps", type=int, default=4) + parser.add_argument("--max-steps", type=int, default=100) + parser.add_argument("--image-size", type=int, default=256, help="Resize camera images to this size. 0 to disable.") + parser.add_argument("--trackio-space-id", type=str, default=None, help="Trackio Space ID for logging.") + parser.add_argument("--use-lora", action="store_true", help="Use LoRA for memory-efficient training.") + parser.add_argument("--lora-r", type=int, default=128, help="LoRA rank.") + parser.add_argument("--lora-alpha", type=int, default=256, help="LoRA alpha.") + parser.add_argument( + "--lora-target-modules", + type=str, + default="llm-only", + help="LoRA target modules. 
Use 'llm-only' to skip vision encoder, 'all-linear' for all.", + ) + parser.add_argument( + "--learning-rate", type=float, default=5e-6, help="Learning rate. Default 5e-6 (good for LoRA r=128)." + ) + parser.add_argument("--hub-model-id", type=str, default=None) + parser.add_argument("--hub-private-repo", action="store_true", help="Make the Hub repo private.") + parser.add_argument("--run-name", type=str, default=None) + parser.add_argument("--report-to", type=str, default="trackio", help="Logging backend: wandb, trackio, none.") + return parser.parse_args() + + +SIM_TICKS = 10 + + +def reward_func(completions, environments, **kwargs): + rewards = [] + for i, (comp, env) in enumerate(zip(completions, environments, strict=False)): + # Advance to episode end to capture rubric_reward + try: + final = env._advance_until_done() + if final and final.observation.rubric_reward: + env.reward = final.observation.rubric_reward + except Exception as e: + print(f"[WARN] _advance_until_done failed for gen={i}: {e}") + r = env.reward + tools = [ + msg["tool_calls"][0]["function"]["name"] for msg in comp if isinstance(msg, dict) and msg.get("tool_calls") + ] + rewards.append(r) + print(f"[DEBUG reward] gen={i} tools={tools} env_reward={env.reward} total={r}") + return rewards + + +def main(): + args = parse_args() + env_url_iter = iter(args.env_urls) + image_size = args.image_size + + prompt = """\ +You control an autonomous vehicle in an emergency. There are pedestrians ahead and you must \ +decide what to do immediately. + +You will see a camera image from the vehicle after each action. Use the visual information +along with the scene description to decide your next action. + +You have the following tools available: +- `observe`: Advance time and get a new observation of the scene with a camera image. +- `emergency_stop`: Apply maximum braking to stop the vehicle. +- `lane_change(direction)`: Change lane to the left or right. Direction must be "left" or "right". 
+ +Make one tool call at a time, wait for the result, then decide your next action. +Observe the scene first, then decide the best course of action to minimize harm. +Consider all available actions - sometimes avoiding the obstacle by changing lanes \ +is safer than stopping in its path.""" + + dataset = Dataset.from_dict({"prompt": [[{"role": "user", "content": prompt}] for _ in range(args.dataset_size)]}) + + class CarlaVLMEnv: + def __init__(self): + self.url = next(env_url_iter) + self.client = CarlaEnv(base_url=self.url, connect_timeout_s=30, message_timeout_s=120) + self.reward = 0.0 + + @staticmethod + def _describe(obs) -> str: + parts = [] + parts.append(f"Speed: {obs.speed_kmh:.1f} km/h.") + if obs.nearby_actors: + for actor in obs.nearby_actors: + parts.append(f"- {actor.get('type', 'actor')} at {actor.get('distance', '?')}m") + else: + parts.append("No nearby actors detected.") + if obs.collision_detected: + parts.append(f"COLLISION detected with {obs.collided_with or 'unknown'}!") + return "\n".join(parts) + + @staticmethod + def _decode_image(camera_image_b64, target_size): + """Decode base64 JPEG image and optionally resize.""" + img_bytes = base64.b64decode(camera_image_b64) + img = Image.open(BytesIO(img_bytes)) + if target_size > 0: + img.thumbnail((target_size, target_size), Image.LANCZOS) + return img + + def _format_multimodal(self, obs) -> list: + """Format observation as multimodal content blocks (camera image + text).""" + content = [] + if obs.camera_image is not None: + img = self._decode_image(obs.camera_image, image_size) + content.append({"type": "image", "image": img}) + content.append({"type": "text", "text": self._describe(obs)}) + return content + + def _advance(self, ticks: int = SIM_TICKS): + result = None + for _ in range(ticks): + result = self.client.step(CarlaAction(action_type="observe")) + if result.done: + break + return result + + def _advance_until_done(self, max_ticks: int = 50): + """Advance the simulation until the 
episode ends.""" + result = None + for _ in range(max_ticks): + result = self.client.step(CarlaAction(action_type="observe")) + if result.done: + break + return result + + def _advance_and_capture(self, ticks: int = SIM_TICKS): + """Advance the simulation, then capture an image of the current state.""" + result = self._advance(ticks) + capture_result = self.client.step(CarlaAction(action_type="capture_image")) + result.observation.camera_image = capture_result.observation.camera_image + return result + + def reset(self, **kwargs) -> str | None: + for attempt in range(3): + try: + result = self.client.reset(scenario_name="trolley_micro_escape_exists") + self.reward = 0.0 + return self._describe(result.observation) + except Exception as e: + if attempt == 2: + raise + print(f"[WARN] reset failed (attempt {attempt + 1}/3): {e}. Reconnecting...") + self.client = CarlaEnv(base_url=self.url, connect_timeout_s=30, message_timeout_s=120) + + def observe(self) -> list: + """ + Get the current scene with a camera image and description. + + Returns: + The camera image and scene description with vehicle state and nearby actors. + """ + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + return self._format_multimodal(result.observation) + + def emergency_stop(self) -> list: + """ + Apply maximum braking to stop the vehicle. + + Returns: + The camera image and scene description after braking. + """ + self.client.step(CarlaAction(action_type="emergency_stop")) + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + print(f"[DEBUG env] emergency_stop: done={result.done}, reward={self.reward}") + return self._format_multimodal(result.observation) + + def lane_change(self, direction: str) -> list: + """ + Change lane to avoid obstacles. + + Args: + direction: Direction to change lane, either "left" or "right". + + Returns: + The camera image and scene description after changing lane. 
+ """ + self.client.step(CarlaAction(action_type="lane_change", lane_direction=direction)) + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + print(f"[DEBUG env] lane_change({direction}): done={result.done}, reward={self.reward}") + return self._format_multimodal(result.observation) + + peft_config = None + if args.use_lora: + from peft import LoraConfig + + if args.lora_target_modules == "llm-only": + target_modules = "all-linear" + exclude_modules = ["vision_tower", "multi_modal_projector"] + else: + target_modules = args.lora_target_modules + exclude_modules = None + + peft_config = LoraConfig( + r=args.lora_r, + lora_alpha=args.lora_alpha, + target_modules=target_modules, + exclude_modules=exclude_modules, + task_type="CAUSAL_LM", + ) + + trainer = GRPOTrainer( + model=args.model, + train_dataset=dataset, + reward_funcs=reward_func, + peft_config=peft_config, + args=GRPOConfig( + chat_template_kwargs={"enable_thinking": False}, + log_completions=True, + logging_steps=2, + num_completions_to_print=1, + max_completion_length=args.max_completion_length, + per_device_train_batch_size=args.per_device_train_batch_size or len(args.env_urls), + steps_per_generation=1, + num_generations=len(args.env_urls), + max_tool_calling_iterations=10, + learning_rate=args.learning_rate, + gradient_accumulation_steps=args.gradient_accumulation_steps, + max_steps=args.max_steps, + push_to_hub=args.hub_model_id is not None, + hub_model_id=args.hub_model_id, + hub_private_repo=args.hub_private_repo, + run_name=args.run_name, + report_to=args.report_to, + trackio_space_id=args.trackio_space_id, + ), + environment_factory=CarlaVLMEnv, + ) + trainer.train() + + +if __name__ == "__main__": + main() diff --git a/tooling/fine-tuning/huggingface-recipes/hf-blog-gemma4.md b/tooling/fine-tuning/huggingface-recipes/hf-blog-gemma4.md new file mode 100644 index 0000000..66ea09c --- /dev/null +++ 
b/tooling/fine-tuning/huggingface-recipes/hf-blog-gemma4.md @@ -0,0 +1,764 @@ +--- +title: "Welcome Gemma 4: Frontier multimodal intelligence on device" +thumbnail: /blog/assets/gemma4/thumbnail.png +authors: +- user: merve +- user: pcuenq +- user: sergiopaniego +- user: burtenshaw +- user: Steveeeeeeen +- user: alvarobartt +- user: SaylorTwift +--- + +# Welcome Gemma 4: Frontier multimodal intelligence on device + +The Gemma 4 family of multimodal models by Google DeepMind is out on Hugging Face, with support for your favorite agents, inference engines, and fine-tuning libraries 🤗 + +These models are the real deal: truly open with Apache 2 licenses, high quality with pareto frontier arena scores, multimodal including audio, and sizes you can use _everywhere_ including on-device. Gemma 4 builds on advances from previous families and makes them click together. In our tests with pre-release checkpoints we have been impressed by their capabilities, to the extent that we struggled to find good fine-tuning examples because they are _so good_ out of the box. + +We collaborated with Google and the community to make them available everywhere: transformers, llama.cpp, MLX, WebGPU, Rust; you name it. This blog post will show you how to build with [your favorite tools](https://huggingface.co/collections/google/gemma-4) so let us know what you think! 
+ +## Table of Contents + +- [What is New with Gemma 4?](#what-is-new-with-gemma-4) +- [Overview of Capabilities and Architecture](#overview-of-capabilities-and-architecture) + - [Architecture at a Glance](#architecture-at-a-glance) + - [Per-Layer Embeddings (PLE)](#per-layer-embeddings-ple) + - [Shared KV Cache](#shared-kv-cache) +- [Multimodal Capabilities](#multimodal-capabilities) +- [Deploy Anywhere](#deploy-anywhere) + - [transformers](#transformers) + - [Llama.cpp](#llamacpp) + - [Plug in your local agent](#plug-in-your-local-agent) + - [transformers.js](#transformersjs) + - [MLX](#mlx) + - [Mistral.rs](#mistralrs) +- [Fine-tuning & Demos](#fine-tuning--demos) + - [Fine-tuning with TRL](#fine-tuning-with-trl) + - [Fine-tuning with TRL on Vertex AI](#fine-tuning-with-trl-on-vertex-ai) + - [Fine-tuning with Unsloth Studio](#fine-tuning-with-unsloth-studio) +- [Try Gemma 4](#try-gemma-4) +- [Benchmark Results](#benchmark-results) +- [Acknowledgements](#acknowledgements) + +# What is new with Gemma 4? + +Similar to Gemma-3n, Gemma 4 supports image, text, and audio inputs, and generates text responses. The text decoder is based on the Gemma model with support for long context windows. The image encoder is similar to the one from Gemma 3 but with two crucial improvements: variable aspect ratios, and a configurable number of image tokens per input, so you can find your sweet spot between speed, memory, and quality. All models support image (or video) and text inputs, while the small variants (E2B and E4B) support audio as well. 
+ +Gemma 4 comes in four sizes, all base and instruction fine-tuned: + +| Model | Parameter Size | Context Window | Checkpoints | +| :---- | :---- | :---- | :---- | +| Gemma 4 E2B | 2.3B effective, 5.1B with embeddings | 128k | [base](https://huggingface.co/google/gemma-4-E2B), [IT](https://huggingface.co/google/gemma-4-E2B-it) | +| Gemma 4 E4B | 4.5B effective, 8B with embeddings | 128k | [base](https://huggingface.co/google/gemma-4-E4B), [IT](https://huggingface.co/google/gemma-4-E4B-it) | +| Gemma 4 31B | 31B dense model | 256K | [base](https://huggingface.co/google/gemma-4-31B), [IT](https://huggingface.co/google/gemma-4-31B-it) | +| Gemma 4 26B A4B | mixture-of-experts with 4B activated/26B total parameters | 256K | [base](https://huggingface.co/google/gemma-4-26B-A4B), [IT](https://huggingface.co/google/gemma-4-26B-A4B-it) | + +## Overview of Capabilities and Architecture + +Gemma 4 leverages several architecture components used in previous Gemma versions and other open models, and leaves out complex or inconclusive features such as Altup. The combination is a mix designed to be highly compatible across libraries and devices, that can efficiently support long context and agentic use cases, whilst being ideal for quantization. + +As shown in the benchmarks above, this feature mix (combined with the training data and recipe) enables the 31B dense model to achieve an estimated LMArena score (text only) of 1452, while the 26B MoE reaches 1441 with just 4B active parameters 🤯. As we'll see, multimodal operation is comparatively as good as text generation, at least in informal and subjective tests. + +These are the main architecture characteristics in Gemma 4: + +* Alternating **local sliding-window** and **global full-context** attention layers. Smaller dense models use sliding windows of 512 tokens while larger models use 1024 tokens. +* **Dual RoPE** configurations: standard RoPE for sliding layers, pruned RoPE for global layers, to enable longer context. 
+* **Per-Layer Embeddings (PLE)**: a second embedding table that feeds a small residual signal into every decoder layer. +* **Shared KV Cache**: the last N layers of the model reuse key-value states from earlier layers, eliminating redundant KV projections. +* **Vision encoder**: uses learned 2D positions and multidimensional RoPE. Preserves the original aspect ratios and can encode images to a few different token budgets (70, 140, 280, 560, 1120). +* **Audio encoder**: USM-style conformer with the same base architecture as the one in Gemma-3n. + +#### Per-Layer Embeddings (PLE) + +One of the most distinctive features in smaller Gemma 4 models is Per-Layer Embeddings (PLE), which was introduced previously in Gemma-3n. In a standard transformer, each token gets a single embedding vector at input, and the same initial representation is what the residual stream builds on across all layers, forcing the embedding to frontload everything the model might need. PLE adds a parallel, lower-dimensional conditioning pathway alongside the main residual stream. For each token, it produces a small dedicated vector for every layer by combining two signals: a token-identity component (from an embedding lookup) and a context-aware component (from a learned projection of the main embeddings). Each decoder layer then uses its corresponding vector to modulate the hidden states via a lightweight residual block after attention and feed-forward. This gives each layer its own channel to receive token-specific information only when it becomes relevant, rather than requiring everything to be packed into a single upfront embedding. Because the PLE dimension is much smaller than the main hidden size, this adds meaningful per-layer specialization at modest parameter cost. For multimodal inputs (images, audio, video), PLE is computed before soft tokens are merged into the embedding sequence — since PLE relies on token IDs that are lost once multimodal features replace the placeholders. 
Multimodal positions use the pad token ID, effectively receiving neutral per-layer signals. + +#### Shared KV Cache + +The **shared KV cache** is an efficiency optimization that reduces both compute and memory during inference. The last `num_kv_shared_layers` layers of the model don't compute their own key and value projections. Instead, they **reuse** the K and V tensors from the last non-shared layer of the same attention type (sliding or full). + +In practice, this has a minimal impact on quality while being much more efficient (in terms of both memory and compute) for long context generation and on-device use. + +## Multimodal Capabilities + +We saw in our tests that Gemma 4 supports comprehensive multimodal capabilities out of the box. We don't know what the training mix was, but we had success using it for tasks such as OCR, speech-to-text, object detection, and pointing. It also supports text-only and multimodal function calling, reasoning, and code completion and correction. + +Here, we show a few inference examples across different model sizes. You can run them conveniently with [this notebook](https://github.com/huggingface/huggingface-gemma-recipes/blob/main/notebooks/Gemma4_(E2B)-Multimodal.ipynb). We encourage you to try the demos and share them below this blog post! + +### Object Detection and Pointing + +### GUI detection + +We test Gemma 4 on GUI element detection and pointing across different sizes, with the following image and text prompt: "What's the bounding box for the "view recipe" element in the image?" + +![Image](https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/food_resized.png) + +With this prompt, the model natively responds in JSON format with the detected bounding boxes, with no need for specific instructions or grammar-constrained generation. We found that the coordinates are expressed on a 1000x1000 grid, relative to the input dimensions. + +We visualize the outputs below for your convenience. 
We parse the bounding boxes from the returned JSON: ```json\n[\n {"box_2d": [171, 75, 245, 308], "label": "view recipe element"}\n]\n``` + +| E2B | E4B | +| :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e2b.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e4b.png) | +| 26B A4B | 31B | +| ![26B A4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/26b.png) | ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/31b.png) | + +### Object Detection + +We also test the models on detecting everyday objects. Here we ask them to detect the bike and compare the outputs of the different models. As in the previous case, we parse the bounding box from the JSON and translate it to image-space coordinates. + +| E2B | E4B | 26B A4B | 31B | +| :---- | :---- | :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e2b_bike.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/bike_e4b.png) | ![26B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/bike_26b.png) | ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/31b_bike.png) | + +### Multimodal Thinking and Function Calling + +We asked Gemma 4 to write HTML code to reconstruct a page we made with Gemini 3. Below you can find the code to do this. We enable thinking and let each model generate up to 4000 new tokens, which leaves ample room for both the reasoning and the full page. + +| Gemini Generated Website (Reference) | Gemini Reproduced Image | +| :---- | :---- | +| ![Reference](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/landing_page.png) | ![Gemini reproduction](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_repro_gemini.png) | + +

+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + { + "type": "image", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/landing_page.png", + }, + {"type": "text", "text": "Write HTML code for this page."}, + ], + } +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=4000) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
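The answer may arrive wrapped in a Markdown code fence rather than as bare HTML. Below is a minimal sketch of pulling the markup out before saving or rendering it, assuming the text to clean is whatever `result["content"]` holds above. `extract_html` is a hypothetical helper, not part of transformers, and the fence-wrapping behavior is an assumption about the model's output rather than something this post documents.

```python
import re

# The fence marker is built programmatically so this snippet can itself
# sit inside a fenced block without confusing Markdown tooling.
FENCE = "`" * 3
FENCE_RE = re.compile(
    FENCE + r"(?:html)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_html(answer: str) -> str:
    """Return the first fenced block if one is present, else the stripped answer."""
    match = FENCE_RE.search(answer)
    return (match.group(1) if match else answer).strip()


# Hypothetical model answer, used only to exercise the helper.
sample = (
    f"Here is the page:\n{FENCE}html\n"
    "<!DOCTYPE html>\n<html><body><h1>Hi</h1></body></html>\n"
    f"{FENCE}\nEnjoy!"
)
html = extract_html(sample)
print(html)
```

With a real run, `extract_html(result["content"])` could then be written to a `.html` file and opened in a browser for comparison with the reference page.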
+ +| E2B | E4B | +| :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_e2b.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_e4b.png) | + +| 31B | 26B A4B | +| :---- | :---- | +| ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_31b.png) | ![MoE](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_26b.png) | + +### Video Understanding + +Smaller Gemma 4 models can take in videos with audio, while larger ones can take in videos without audio. Although the models are not explicitly post-trained on videos, they can understand videos both with and without audio. The models are particularly strong with audio. + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "video", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4"}, + {"type": "text", "text": "What is happening in the video? What is the song about?"}, + ], + }, +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + load_audio_from_video=True, # disable this for the larger models, which take video without audio +).to(model.device) +output = model.generate(**inputs, max_new_tokens=200) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B (hallucinated for the audio) | Based on the video provided, here is an analysis: **What is happening in this video?** The video captures a live musical performance, likely at a large outdoor festival or concert. * **Setting:** The scene is a large stage set up outdoors, indicated by the extensive lighting rig, large speakers, and the audience visible in the foreground. * **Performers:** There are several musicians on stage. The focal point seems to be a lead singer/guitarist in a blue shirt and white pants who is actively performing, playing an acoustic or electric guitar. There are also other musicians, including drummers and bassists, visible in the background. * **Atmosphere:** The atmosphere is energetic and dramatic, characterized by bright stage lights, intense blue and white spotlights, and a significant amount of stage fog/smoke, which adds to the concert mood. * **Audience:** The foreground is filled with the backs of audience members, showing they are close to the stage and engaged in watching the performance. **What is the song about?** The lyrics provided below the video are: > "I don't care 'bout street > From what I cherish > Could it's time to protest | +| E4B ✅ | The video captures a live performance of a band on a stage with bright lights and smoke effects, surrounded by an audience. The band performs and sings to the crowd. The song appears to be about the struggles and disillusionment of modern life, specifically the feeling of being stuck and the difficulties in finding one's place in a chaotic and overwhelming world. Lyrics mention themes of the "concrete street," "broken promises," "storm of the day," and the difficulty of finding oneself amidst societal pressures. | +| 26/A4B (no audio) ✅ | This video shows a live music performance at an outdoor concert or festival at night. A solo male artist, dressed in a blue shirt and white pants, is the central focus. 
He is playing an acoustic guitar and singing into a microphone on a large, lit stage. The performance is accompanied by a band, including a drummer and other musicians visible in the background and on large video screens. The stage is filled with atmospheric blue lighting, smoke, and bright spotlights. The video captures different perspectives, including shots from the crowd looking up at the stage and close-up views of the performers projected onto large screens. The crowd is visible in the foreground, watching the show. One of the screens also displays the "Estrella Damm" brand name. | +| 31B (no audio) ✅ | A musician is performing on stage during a concert, playing an acoustic guitar and singing into a microphone. He is wearing a blue shirt and white pants. The stage is filled with atmospheric smoke and illuminated by bright blue and white lights. Other band members, including a drummer and a keyboardist, are also visible on stage. The video shows the musician from the perspective of the audience, with some shots focusing on him and others showing the large screen on the side of the stage, which displays close-ups of the performer and the crowd. | + +### Captioning + +We have tested all models on captioning. All checkpoints perform very well and accurately capture nuance in complex scenarios. +Here's the image we use, with the prompt "Write single detailed caption for this image." + +![image](https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png) + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png"}, + {"type": "text", "text": "Write single detailed caption for this image."}, + ], + }, +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=512) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B ✅ | A medium shot captures a weathered seagull perched atop a stone pedestal in what appears to be a bustling European square, with a grand, classical-style building featuring ornate columns and architectural details dominating the right side of the frame. In the background, a low, light-colored building stretches across the middle ground, flanked by some green foliage, and hints of other structures, including a terracotta-roofed building, are visible. In the foreground, part of a modern bus stop or shelter is visible, with signage indicating routes, and the overall scene is bathed in bright daylight under a pale sky. | +| E4B ✅ | A solitary seagull perches atop a grey stone pedestal in a bustling European piazza, with the historical grandeur of a classical building dominating the right side of the frame, contrasting with the more modern, low-rise architecture in the background under a bright, hazy sky. | +| 26B/A4B ✅ | A full shot shows a white and brown seagull perched on a gray pillar in Venice, Italy. The seagull faces left, its beak pointed towards the edge of the frame. Behind it is a light gray, two-story building with several dark rectangular windows. To the right of the seagull is a large, white building with columns and intricate carvings. The sky is a clear, light blue. The lighting is bright and even, suggesting it is daytime. | +| 31B ✅ | A high-angle, medium shot shows a white and gray seagull perched on a gray concrete pylon in Venice, Italy. The seagull is facing left, its body angled slightly towards the camera. Its white head, neck, and chest contrast with its gray wings and back. The seagull's orange beak is slightly open. The pylon is a light gray concrete cylinder with a weathered texture. In the background, there are several buildings. To the left is a long, low, white building with a series of small, rectangular windows. In the center is a cluster of green trees. 
To the right is a red building with a tiled roof and several small, rectangular windows. Further right is a large, ornate, white stone building with several columns and sculptures. The sky is a pale blue, and the light is bright and clear. In the lower-left corner, there is a yellow bus with the word "Ferrovia" printed on it in black letters. There are several people walking in the background, but they are small and out of focus. | + +### Audio Question Answering + +These models are trained to answer questions about speech in audio. Music and non-speech sounds were not part of the training data. + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "url": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3"}, + {"type": "text", "text": "Can you describe this audio in detail?"}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = model.generate( + **inputs, + max_new_tokens=1000, + do_sample=False, +) + +print(processor.decode(output[0], skip_special_tokens=True)) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B | This audio is a personal reflection. The speaker is talking about their final farewell address to the nation, which they delivered in Chicago. They express gratitude for the conversations they've had with the American people, noting that despite not having met them face-to-face or even greeted them, these interactions in various settings like living rooms, schools, farms, factory floors, diners, and military outposts have been what has kept them going. | +| E4B | The audio is a speech excerpt where a speaker is delivering a farewell address to the nation from Chicago. The speaker reflects on their time in office, expressing gratitude for the conversations they had with the American people across various settings like living rooms, schools, farms, factories, diners, and military outposts. The tone is reflective and appreciative, highlighting the importance of these interactions in their political journey. | + +Here is an example if you want to do transcription: + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "url": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3"}, + {"type": "text", "text": "Transcribe the audio?"}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = model.generate( + **inputs, + max_new_tokens=1000, + do_sample=False, +) + +print(processor.decode(output[0], skip_special_tokens=True)) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B | This week I traveled to Chicago to deliver my final farewell address to the nation following in the tradition of presidents before me It was an opportunity to say thank you whether we've seen eye to eye or rarely agreed at all my conversations with you the American people in living rooms and schools at farms and on factory floors at diners and on distant military outposts all these conversations are what have kept me honest | +| E4B | This week I traveled to Chicago to deliver my final farewell address to the nation following in the tradition of presidents before me. It was an opportunity to say thank you. Whether we've seen eye to eye or rarely agreed at all, my conversations with you, the American people, in living rooms and schools, at farms and on factory floors, at diners and on distant military outposts, all these conversations are what have kept me honest. | + +### Multimodal Function Calling + +We test the models by asking them to identify the city shown in an image and check the current weather there with a tool. + +
+Inference code + +```py +import re +WEATHER_TOOL = { + "type": "function", + "function": { + "name": "get_weather", + "description": "Gets the current weather for a specific location.", + "parameters": { + "type": "object", + "properties": { + "city": {"type": "string", "description": "The city name"}, + }, + "required": ["city"], + }, + }, +} +tools = [WEATHER_TOOL] +messages = [ + {"role": "user", "content": [ + {"type": "text", "text": "What is the city in this image? Check the weather there right now."}, + + {"type": "image", "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg"}, + ]}, +] +inputs = processor.apply_chat_template( + messages, + tools=[WEATHER_TOOL], + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=1000) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
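The snippet above imports `re` without using it and prints only the parsed content. As a rough sketch of what pulling a tool call out of the raw decoded text could look like, the code below assumes the `call:name{key:value}` shape and the `<|"|>` string delimiters that appear in this post's sample outputs. `parse_tool_call` is a hypothetical helper, the pattern is inferred from those samples rather than from a formal spec, and splitting arguments on commas is a simplification that breaks on values containing commas or braces. `processor.parse_response`, used above, is likely the safer route in practice.

```python
import re

# Shape inferred from the sample outputs in this post, not from a formal spec.
CALL_RE = re.compile(r"call:(\w+)\{(.*?)\}", re.DOTALL)
STRING_DELIM = '<|"|>'  # token that wraps string values in the raw output


def parse_tool_call(text):
    """Return (name, args) for the first tool call found in text, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    args = {}
    # Simplification: assumes flat key:value pairs separated by commas.
    for pair in raw_args.split(","):
        if ":" not in pair:
            continue
        key, value = pair.split(":", 1)
        args[key.strip()] = value.replace(STRING_DELIM, "").strip()
    return name, args


raw = '<|tool_call>call:get_weather{city:<|"|>Bangkok<|"|>}'
parsed = parse_tool_call(raw)
print(parsed)
```

The returned name and argument dictionary can then be dispatched to a local Python function that implements the tool.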
+ +| Model | Output | +| :---- | :---- | +| E2B | **Analyze the Request:** The user has provided an image and asked two questions:\n * "What is the city in this image?" * "Check the weather there right now."\n\n2. **Analyze the Image:** The image displays a large, ornate, tall structure, characteristic of temple architecture, likely in Southeast Asia (given the style). It appears to be a famous landmark. 3. **Identify the Landmark/Location (Internal Knowledge/Visual Cues):** The structure strongly resembles the chedis/stupas found in places like Bangkok, Thailand (e.g., Wat Arun or similar structures).\n\n4. **Determine the City:** Based on the visual evidence, the location is highly likely to be Bangkok, Thailand.\n\n5. **Check Available Tools:** The available tool is `get_weather(city: STRING)`. 6. **Formulate the Plan:**\n * Step 1: Identify the city from the image (Bangkok).\n * Step 2: Use the `get_weather` tool with the identified city.\n\n7. **Construct the Tool Call:** `get_weather(city="Bangkok")`\n\n8. **Generate Response:** The model should first state the identified city and then call the tool to get the weather. (Self-correction: The model *must* call the tool in the correct format if it needs external information.)call:get_weather{city:Bangkok} | +| E4B | '1. **Analyze the Request:** The user has provided an image and asked two things:\n a. Identify the city in the image.\n b. Check the weather in that city right now.\n\n2. **Analyze the Image:** The image shows a very distinctive, ornate tower/spire structure, characteristic of Thai architecture (specifically, a chedi or prang).\n\n3. **Determine City Identification (Internal Knowledge/Visual Recognition):** Based on the architectural style, the structure strongly resembles prominent temples in Thailand, such as Wat Arun in Bangkok.\n\n4. **Evaluate Tool Availability:** I have one available tool: `get_weather(city: STRING)`. This tool requires a city name as input.\n\n5. **Formulate a Plan:**\n a. 
I need the city name to use the `get_weather` tool.\n b. I must first identify the city from the image.\n c. Since my internal knowledge suggests Bangkok, I will proceed with that assumption, but I must acknowledge that I cannot *definitively* prove the location from the image alone.\n\n6. **Execute Step 1 (City Identification):** Assume the city is Bangkok based on the visual evidence of the architecture.\n\n7. **Execute Step 2 (Tool Call Generation):** Use the `get_weather` tool with "Bangkok" as the `city` argument.\n\n8. **Construct the Response:** Generate the appropriate tool call in JSON format.', 'answer': '<\|tool_call>call:get_weather{city:<\|"\|>Bangkok<\|"\|>}' | +| 31B | The user wants to know the city in the image and the current weather there. 1. **Analyze the image:** The image shows ornate, spire-like structures (prangs) characteristic of Thai temple architecture. Specifically, these look like the prangs at Wat Phra Kaew (Temple of the Emerald Buddha) or Wat Pho in Bangkok, Thailand. The style is very distinctive of Bangkok's royal temple complexes. 2. **Identify the city:** Based on the architectural style, the city is Bangkok. 3. **Find the weather:** I need to call the `get_weather` tool with `city="Bangkok"`.call:get_weather{city:Bangkok} | +| 26/A4B | The image shows several tall, ornate towers (prangs) typical of Thai Buddhist temple architecture. Specifically, these look like the structures within the Grand Palace or Wat Phra Kaew in Bangkok, Thailand. The style is unmistakably Thai. I'll confirm this by looking at the specific towers. These are very iconic.\n2. **Confirm the location:** The image shows elements of the Grand Palace complex in Bangkok.\n3. 
**Get weather information:** Once the city is identified as Bangkok, I will use the `get_weather` tool with `city='Bangkok'`.call:get_weather{city:Bangkok} | + +# Deploy Anywhere + +Gemma 4 comes with day-0 support for many open-source inference engines, and is ideal for tool calling and agents! We also release ONNX checkpoints that can run on many hardware backends, allowing use cases on edge devices or in browser! + +## transformers + +Gemma 4 comes with first-class transformers support from the get-go 🤗. This integration allows using the model with other libraries like bitsandbytes, PEFT and TRL. Make sure to install the latest version of transformers. + +```bash +pip install -U transformers +``` + +The easiest way to infer with the small Gemma 4 models is through the `any-to-any` pipeline. You can initialize it as follows. + +```py +from transformers import pipeline +pipe = pipeline("any-to-any", model="google/gemma-4-e2b-it") +``` + +You can then pass in images and text as follows. + +```python +messages = [ + { + "role": "user", + "content": [ + { + "type": "image", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg", + }, + {"type": "text", "text": "Do you have travel advice going to here?"}, + ], + } +] +output = pipe(messages, max_new_tokens=100, return_full_text=False) +output[0]["generated_text"] +# Based on the image, which appears to show a magnificent, ornate **Buddhist temple or pagoda**, likely in Southeast Asia (such as Thailand, Myanmar, or Cambodia), here is some general travel advice.. +``` + +When inferring with videos, you can include the audio track using the `load_audio_from_video` argument. 
+ +```python +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] +pipe(messages, load_audio_from_video=True) +``` + +Going a level lower, you can load Gemma 4 using the `AutoModelForMultimodalLM` class, which is especially useful for fine-tuning. The built-in chat template takes care of formatting the inputs correctly. Please make sure you use it, since building the prompt manually invites subtle mistakes. + +
+Inference code + +```python +from transformers import AutoModelForMultimodalLM, AutoProcessor +model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-E2B-it", device_map="auto") +processor = AutoProcessor.from_pretrained("google/gemma-4-E2B-it") +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + add_generation_prompt=True, + return_dict=True, + return_tensors="pt" +).to(model.device) + +generated_ids = model.generate(**inputs, max_new_tokens=128) +generated_ids_trimmed = [ + out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids) +] +output_text = processor.batch_decode( + generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False +) +print(output_text) +``` + +
+ +## Llama.cpp + +Gemma 4 models come with image+text support in llama.cpp from the get-go! This unlocks using Gemma 4 with all of your favorite local apps, such as llama-server, LM Studio, and Jan, as well as coding agents like Pi, across many backends such as Metal and CUDA. + +You can install llama.cpp as follows. + +```bash +brew install llama.cpp # macOS +winget install llama.cpp # Windows +``` + +You can then start a server compatible with the OpenAI API. To choose a precision, append a quantization tag such as `:Q4_K_M` to the repository name at the end of the command. + +```bash +llama-server -hf ggml-org/gemma-4-E2B-it-GGUF +``` + +Check out [this link](https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF?local-app=llama.cpp) for more options on combining llama.cpp with different coding agents and local apps. Find all the GGUF checkpoints [in this collection](https://huggingface.co/collections/ggml-org/gemma-4). + +## Plug in your local agent + +We worked on making sure the new models work locally with agents like **openclaw, hermes, pi, and open code**. All thanks to llama.cpp! Run the following to try Gemma 4 right away. 
+ +First, start your local server: + +``` +llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M +``` + +For **hermes:** + +```shell +hermes model +``` + +For **openclaw:** + +```shell +openclaw onboard +``` + +For **pi** define a `~/.pi/agent/models.json`: + +```json +{ + "providers": { + "llama-cpp": { + "baseUrl": "http://localhost:8080/v1", + "api": "openai-completions", + "apiKey": "none", + "models": [ + { + "id": "ggml-org-gemma-4-26b-4b-gguf" + } + ] + } + } +} +``` + +For **open code** define a `~/.config/opencode/opencode.json`: + +```json +{ + "$schema": "https://opencode.ai/config.json", + "provider": { + "llama.cpp": { + "npm": "@ai-sdk/openai-compatible", + "name": "llama-server (local)", + "options": { + "baseURL": "http://127.0.0.1:8080/v1" + }, + "models": { + "gemma-4-26b-4b-it": { + "name": "Gemma 4 (local)", + "limit": { + "context": 128000, + "output": 8192 + } + } + } + } + } +} +``` + +## transformers.js + +transformers.js enables running Gemma 4 right inside browser. You can check out the model card to see text-only, image & text, audio & text inference in detail [here](https://huggingface.co/onnx-community/gemma-4-E2B-it-ONNX#transformersjs-javascript). We also shipped a demo for you to test the model [here](https://huggingface.co/spaces/webml-community/Gemma-4-WebGPU). + +## MLX + +Full multimodal support of Gemma 4 is available using the open-source [`mlx-vlm` library](https://github.com/Blaizzy/mlx-vlm). Here's how to ask the model to describe an image: + +```shell +pip install -U mlx-vlm +``` + +```shell +mlx_vlm.generate \ +--model google/gemma-4-E4B-it \ +--image https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg \ +--prompt "Describe this image in detail" +``` + +mlx-vlm supports TurboQuant, which delivers the same accuracy as the uncompressed baseline while using ~4x less active memory and running a lot faster end-to-end. 
This makes long-context inference practical on Apple Silicon without sacrificing quality. Use it like this: + +```shell +mlx_vlm.generate \ +--model "mlx-community/gemma-4-26b-a4b-it-4bit" \ +--prompt "Your prompt here" \ +--kv-bits 3.5 \ +--kv-quant-scheme turboquant +``` + +For audio examples and more details, please check [the MLX collection](https://hf.co/mlx-community/gemma-4). + +### Mistral.rs + +[mistral.rs](https://github.com/EricLBuehler/mistral.rs) is a Rust-native inference engine with day-0 Gemma 4 support across all modalities (text, image, video, audio) and built-in tool-calling and agentic functionality. Install mistral.rs: + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh # Linux/macOS + +irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex # Windows +``` + +You can then start an OpenAI-compatible HTTP server: + +```bash +mistralrs serve mistralrs-community/gemma-4-E4B-it-UQFF --from-uqff 8 +``` + +Or use interactive mode: + +```bash +mistralrs run -m google/gemma-4-E4B-it --isq 8 --image image.png -i "Describe this image in detail." + +mistralrs run -m google/gemma-4-E4B-it --isq 8 --audio audio.mp3 -i "Transcribe this fully." +``` + +Find all models [here](https://huggingface.co/mistralrs-community/models). Please follow [the instructions](https://huggingface.co/mistralrs-community/gemma-4-E2B-it-UQFF#install) in the model cards for installation and inference guidelines. + +## Fine-tuning for all + +Gemma 4 models are ideal for fine-tuning in your favorite tools and platforms and at any budget. + +## Fine-tuning with TRL + +Gemma 4 is fully supported for fine-tuning with TRL. To celebrate, TRL has been upgraded with support for multimodal tool responses when interacting with environments, meaning models can now receive images back from tools during training, not just text. 
+ +To showcase this, we've built an example training script where Gemma 4 learns to drive in the CARLA simulator. The model sees the road through a camera, decides what to do and learns from the outcome. After training, it consistently changes lanes to avoid pedestrians. The same approach works for any task where a model needs to see and act: robotics, web browsing, or other interactive environments. + +Get started: + +```shell +# pip install git+https://github.com/huggingface/trl.git + +python examples/scripts/openenv/carla_vlm_gemma.py \ + --env-urls https://sergiopaniego-carla-env.hf.space \ + https://sergiopaniego-carla-env-2.hf.space \ + --model google/gemma-4-E2B-it +``` + +Find the example [here](https://github.com/huggingface/huggingface-gemma-recipes/blob/main/scripts/carla_vlm_gemma.py). + +### Fine-tuning with TRL on Vertex AI + +Additionally, we have prepared an example on how to fine-tune Gemma 4 with TRL on Vertex AI using SFT, to showcase how to extend the function calling capabilities, whilst freezing both the vision and audio towers. The examples include how to build a custom Docker container with latest Transformers, TRL, etc. with CUDA support on Google Cloud, and how to run it via Vertex AI Serverless Training Jobs. 
+ +```python +# pip install google-cloud-aiplatform --upgrade --quiet +from google.cloud import aiplatform + +aiplatform.init( + project="", + location="", + staging_bucket="", +) + +job = aiplatform.CustomContainerTrainingJob( + display_name="gemma-4-fine-tuning", + container_uri="", + command=["python", "/gcs/gemma-4-fine-tuning/train.py"], +) + +job = job.submit( + replica_count=1, + machine_type="a3-highgpu-1g", + accelerator_type="NVIDIA_H100_80GB", + accelerator_count=1, + base_output_dir="/output-dir", + environment_variables={ + "MODEL_ID": "google/gemma-4-E2B-it", + "HF_TOKEN": "", # value elided in this snapshot; set your Hugging Face token + }, + boot_disk_size_gb=500, +) +``` + +You can find the complete example in the "Hugging Face on Google Cloud" docs at https://hf.co/docs/google-cloud/examples/vertex-ai-notebooks-fine-tune-gemma-4. + +## Fine-tuning with Unsloth Studio + +If you want to fine-tune and run a Gemma 4 model in a UI, try out [Unsloth Studio](https://unsloth.ai/docs/new/studio). It runs locally or on Google Colab. First, install and start the app: + +```shell +# install unsloth studio on macOS, Linux, WSL +curl -fsSL https://unsloth.ai/install.sh | sh + +# install unsloth studio on Windows +irm https://unsloth.ai/install.ps1 | iex + +# launch unsloth studio +unsloth studio -H 0.0.0.0 -p 8888 +# Search for a Gemma 4 model like google/gemma-4-E2B-it +``` + +Then select any of the Gemma 4 models from the hub. + +![Unsloth Studio](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gemma4/unsloth.png) + +## Try Gemma 4 + +We have shipped demos for you to try different Gemma 4 models. 
We include demos based on transformers implementation for [E4B](https://huggingface.co/spaces/huggingface-projects/gemma-4-e4b-it), [26B/A4B](https://huggingface.co/spaces/huggingface-projects/gemma-4-26b-a4b-it), and dense [31B](https://huggingface.co/spaces/huggingface-projects/gemma-4-31b-it) models, as well as a [WebGPU](https://huggingface.co/spaces/webml-community/Gemma-4-WebGPU) demo with transformers.js 🚀 + + + + +## Benchmark Results + +Gemma 4 models demonstrate exceptional performance across diverse benchmarks, from reasoning and coding to vision and long-context tasks. The graph below shows model performance vs size, with Gemma 4 models forming an impressive Pareto frontier: + +
+*Figures: "Gemma 4 Performance vs Size" and "Gemma 4 Arena Elo Score Comparison". Source: Google (blog.google)*
+ +Here are detailed benchmark results for the instruction-tuned models: + +| Benchmark | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +|-----------|-------------|-----------------|-------------|-------------|------------------------| +| **Reasoning & Knowledge** | +| MMLU Pro | [85.2%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-31B-it) | [82.6%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-26B-A4B-it) | [69.4%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-E4B-it) | [60.0%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-E2B-it) | 67.6% | +| AIME 2026 no tools | [89.2%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-31B-it) | [88.3%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-26B-A4B-it) | [42.5%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-E4B-it) | [37.5%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-E2B-it) | 20.8% | +| GPQA Diamond | [84.3%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-31B-it) | [82.3%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-26B-A4B-it) | [58.6%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-E4B-it) | [43.4%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-E2B-it) | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Coding** | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| HLE no tools | [19.5%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-31B-it) | 
[8.7%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-26B-A4B-it) | - | - | - | +| HLE with search | [26.5%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-31B-it) | [17.2%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-26B-A4B-it) | - | - | - | +| **Vision** | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (edit distance) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## Acknowledgements + +Landing Gemma-4 in the open-source ecosystem took a lot of effort from many people and not only the authors of this blog post. In no particular order, we thank many people from the open-source team: Gemma 4 transformers integration is owed to Cyril, Raushan, Eustache, Arthur, Lysandre. We thank Joshua for transformers.js integration and demo, Eric for mistral.rs integration, Son for Llama.cpp, Prince for MLX integration, Quentin, Albert and Kashif for TRL, Adarsh for SGLang transformers backend and Toshihiro for building the demos. +This work wouldn't have been possible without Google's extensive contribution with the model artefact, but also the significant effort contributing the model to transformers in an effort to standardize it. The open-source ecosystem is now more complete, with a very capable, freely-licensed, open-source model. +The Gemma 4 transformers integration was handled by Cyril, Raushan, Eustache, Arthur, Lysandre. We thank Joshua for the transformers.js integration and demo, Eric for mistral.rs integration, Son for Llama.cpp, Prince for MLX, Quentin for TRL, Adarsh for SGLang transformers backend, and Toshihiro for building several demos. 
+ +This work wouldn't have been possible without Google's extensive contribution with the model artefact, but also their significant effort contributing the model to transformers in an effort to standardize it. The open-source ecosystem is now more complete, with a very capable, freely-licensed, open-source model. diff --git a/tooling/fine-tuning/ollama-llamacpp/ollama-import-lora.md b/tooling/fine-tuning/ollama-llamacpp/ollama-import-lora.md new file mode 100644 index 0000000..38809fe --- /dev/null +++ b/tooling/fine-tuning/ollama-llamacpp/ollama-import-lora.md @@ -0,0 +1,53 @@ +# Ollama: Importing a LoRA/QLoRA Adapter (Gemma 4 applicable) + +Source: https://docs.ollama.com/import (fetched 2026-04-18) + +## Modelfile syntax + +**Safetensors adapter (merged or unmerged):** +```dockerfile +FROM +ADAPTER /path/to/safetensors/adapter/directory +``` + +**GGUF adapter:** +```dockerfile +FROM +ADAPTER /path/to/file.gguf +``` + +## Creation +```shell +ollama create my-model +``` + +## Critical notes + +- **The `FROM` base model MUST match the base the adapter was trained on** or you'll get erratic results. For Gemma 4: `FROM gemma4:e4b-it-q8_0` (or whichever base was used). +- **Non-QLoRA adapters preferred.** Quantized adapter recipes (QLoRA) sometimes diverge in method across frameworks; using a straight LoRA adapter is more portable. +- Gemma 4 is NOT explicitly listed in the Ollama docs' "supported architectures" section (which lists Llama 2/3, Mistral, Gemma 1/2) — but llama.cpp gained Gemma 4 support day one, and the Ollama gemma4 models work. Expect smooth sailing for text; vision adapters are a grey area. + +## Converting a PEFT / Unsloth adapter to GGUF + +Use llama.cpp's `convert_lora_to_gguf.py`: +```bash +python llama.cpp/convert_lora_to_gguf.py \ + --outfile gemma4-mortdecai-adapter.gguf \ + path/to/peft/adapter_dir +``` +Or use HuggingFace's "GGUF-my-LoRA" Space: https://huggingface.co/spaces/ggml-org/gguf-my-lora (web UI). 
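
The two Modelfile patterns under "Modelfile syntax" differ only in the adapter path, so they are easy to generate from a script. The sketch below is a minimal illustration and not part of Ollama: `render_modelfile` is a hypothetical helper, and the base model tag and adapter path in the usage line are example values. The base model name is left blank in the snippets above, so here it is a required argument.

```python
def render_modelfile(base_model: str, adapter_path: str) -> str:
    """Return Modelfile text with one FROM line and one ADAPTER line.

    Hypothetical helper for illustration only. `base_model` should be the
    same base the adapter was trained on; `adapter_path` may point at a
    Safetensors adapter directory or at a GGUF adapter file.
    """
    if not base_model.strip():
        raise ValueError("base_model must not be empty")
    if not adapter_path.strip():
        raise ValueError("adapter_path must not be empty")
    return f"FROM {base_model}\nADAPTER {adapter_path}\n"


if __name__ == "__main__":
    # Example values only; substitute the base and adapter you actually used.
    print(render_modelfile("gemma4:e4b-it-q8_0", "/path/to/file.gguf"), end="")
```

Writing the returned text to a file named `Modelfile` and running `ollama create my-model` from that directory follows the creation step described above.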
+ +## Unsloth fast path + +Unsloth's notebooks finish with a cell that does exactly: +```python +model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m") +``` +which produces a GGUF suitable for direct `ollama create`. + +## Workflow for Seth's homelab + +1. Fine-tune with Unsloth on a rented H100/H200 (or local 3090 for E4B). +2. `model.save_pretrained_merged("merged_out", tokenizer, save_method = "merged_16bit")` — save the merged model in 16-bit safetensors. +3. Use llama.cpp's `convert_hf_to_gguf.py` to make a GGUF, then quantize to Q4_K_M. +4. Write a Modelfile pointing at the GGUF, `ollama create mortdecai-gemma4:v1 -f Modelfile`, push to local Ollama (pve197 CT 105 or steel141). diff --git a/tooling/fine-tuning/recipe-recommendation.md b/tooling/fine-tuning/recipe-recommendation.md new file mode 100644 index 0000000..b4a4e51 --- /dev/null +++ b/tooling/fine-tuning/recipe-recommendation.md @@ -0,0 +1,190 @@ +# Recommended Gemma 4 Fine-Tuning Recipe (Seth's Homelab) + +## TL;DR + +**Use Unsloth. Rent a single H100 on Vast.ai. Fine-tune Gemma 4 E4B (or 31B QLoRA). Save GGUF. `ollama create` back to CT 105.** + +Why not the alternatives: +- **Your 3090 Ti(s):** can handle E2B/E4B LoRA comfortably, but 26B A4B LoRA wants ~40 GB and 31B QLoRA wants 22 GB (fits, tightly). Axolotl's 5090-validated configs need Flex Attention to fit, and you lose half the throughput. An H100 at $2-3/hr for 3-4 hours is cheaper than the time you'll spend tuning memory. +- **Axolotl** is great — in particular the 26B MoE ScatterMoE+expert-LoRA config is genuinely novel and Unsloth doesn't match it. But Axolotl has more moving parts (FSDP, kernels, flex attention), breaks more subtly on config errors, and the docs are less Gemma-4-specific than Unsloth's. +- **TRL** has no Gemma-4-specific SFT script yet — you'd be porting `sft_gemma3.py`. Useful if you need DPO/GRPO or multimodal tool-call GRPO (the CARLA recipe), but heavier lift than Unsloth for plain SFT. 
+- **Google cookbook** works and is authoritative but is slower than Unsloth (no fused kernels) and the notebook format is noisier to modify. + +## Exact command + +### On a rented H100 (Vast.ai `vast-h100` alias, already configured) + +```bash +ssh vast-h100 +# one-time setup +pip install unsloth "trl==0.22.2" "transformers>=5.5.0" timm torchcodec +``` + +Training script (save as `finetune_gemma4.py` on the H100): + +```python +from unsloth import FastModel +from unsloth.chat_templates import get_chat_template, standardize_data_formats, train_on_responses_only +from datasets import load_dataset +from trl import SFTTrainer, SFTConfig + +MODEL = "unsloth/gemma-4-E4B-it" # swap to "unsloth/gemma-4-31B-it" if you want more headroom +DATASET = "YOUR_DATASET_HERE" # e.g. a mortdecai-style chat JSONL on HF Hub + +# 1. Load model + tokenizer in 4-bit +model, tokenizer = FastModel.from_pretrained( + model_name = MODEL, + max_seq_length = 4096, + load_in_4bit = True, + full_finetuning = False, +) + +# 2. Attach LoRA +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # text-only FT + finetune_language_layers = True, + finetune_attention_modules = True, + finetune_mlp_modules = True, + r = 16, + lora_alpha = 16, + lora_dropout = 0, + bias = "none", + random_state = 3407, +) + +# 3. Chat template: "gemma-4" (literal, with dash) +tokenizer = get_chat_template(tokenizer, chat_template = "gemma-4") + +# 4. Dataset: expects ShareGPT-style `conversations` field with {from, value} +# OR OpenAI-style `messages` with {role, content}; standardize_data_formats handles both. +dataset = load_dataset(DATASET, split = "train") +dataset = standardize_data_formats(dataset) + +def fmt(examples): + convos = examples["conversations"] + texts = [ + tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=False) + .removeprefix('<bos>') # critical: avoid double <bos> + for c in convos + ] + return {"text": texts} +dataset = dataset.map(fmt, batched=True) + +# 5. 
Train +trainer = SFTTrainer( + model = model, + tokenizer = tokenizer, + train_dataset = dataset, + args = SFTConfig( + dataset_text_field = "text", + per_device_train_batch_size = 2, + gradient_accumulation_steps = 4, + warmup_steps = 10, + num_train_epochs = 1, + learning_rate = 2e-4, + logging_steps = 1, + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "linear", + seed = 3407, + report_to = "none", + output_dir = "outputs", + ), +) + +# 6. Mask everything except assistant turns +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) + +trainer.train() + +# 7. Save merged 16-bit for GGUF conversion +model.save_pretrained_merged("merged_out", tokenizer, save_method = "merged_16bit") + +# 8. OR save directly to GGUF (Q4_K_M) — Ollama-ready +model.save_pretrained_gguf("gemma4-mortdecai-v1", tokenizer, quantization_method = "q4_k_m") +``` + +Run: +```bash +python finetune_gemma4.py +``` + +### Pulling the result back and serving on CT 105 + +```bash +# On the Vast box, upload to HF Hub or scp back: +scp -r vast-h100:~/gemma4-mortdecai-v1*.gguf steel141:/tmp/ + +# On CT 105 (pve197 Ollama): +cat > Modelfile <<'EOF' +FROM /path/to/gemma4-mortdecai-v1.Q4_K_M.gguf +PARAMETER num_ctx 8192 +PARAMETER temperature 1.0 +PARAMETER top_p 0.95 +PARAMETER top_k 64 +SYSTEM "You are Mortdecai, a Minecraft ops AI. You are powered by Gemma 4." 
+EOF +ollama create mortdecai-gemma4:v1 -f Modelfile +ollama run mortdecai-gemma4:v1 +``` + +## Hardware sizing guide (from Unsloth's verified numbers) + +| Variant | LoRA | QLoRA | Full FT | My recommendation | +|---------|------|-------|---------|-------------------| +| E2B | 8-10 GB | 8 GB | ~20 GB | Free Colab T4; local 3090 Ti fine | +| E4B | 17 GB | 10 GB | ~32 GB | Local 3090 Ti (24 GB) tight but fine; H100 faster | +| 26B A4B | >40 GB (16-bit recommended, NOT 4-bit) | not recommended | - | H100 80 GB | +| 31B dense | >48 GB | 22 GB | 2×H100 | H100 80 GB or 2×3090 Ti FSDP | + +For **Mortdecai-style behavior tuning** (matches your existing qwen-based setup), start with **E4B**. It's the sweet spot: larger than qwen3 8B in the things that matter (Gemma 4 E4B beats Gemma 3 27B on most benchmarks), vision-capable if you want it, and fits on a single 3090 Ti locally. + +For a **real coding/reasoning upgrade**, use **31B QLoRA on H100**. Unsloth's 31B QLoRA notebook is the canonical recipe there. + +## Gemma-4-specific pitfalls to NOT miss + +1. **New chat template.** Gemma 4 uses `<|turn>user\n … <turn|>`, NOT Gemma 3's `<start_of_turn>user\n … <end_of_turn>`. Unsloth's `get_chat_template(tokenizer, chat_template="gemma-4")` handles this; the HF tokenizer's built-in Jinja also handles it if you rely on `apply_chat_template`. Axolotl uses `chat_template: gemma4` (no dash, so a different key). + +2. **6 new tool-calling tokens.** `<|tool>`, `<tool|>`, `<|tool_call>`, `<tool_call|>`, `<|tool_response>`, `<tool_response|>`, plus the string-delimiter `<|"|>`. If fine-tuning on tool-call data, include the full `<|tool_call>call:fn_name{args}<tool_call|>` in the assistant turn; no `role="tool"` branch exists. + +3. **`modules_to_save=["lm_head","embed_tokens"]` + `ensure_weight_tying=True`** in LoraConfig if going vanilla PEFT (Google's cookbook does this explicitly). The new special tokens are *learned embeddings*: if the embed table is frozen, the adapter sees random vectors for them and training silently underperforms. Unsloth and Axolotl bake this in. 
+ +4. **Freeze the vision/audio tower by default.** Two idioms in the wild: + - Axolotl: `freeze_mm_modules: true` + text-only LoRA regex. + - HF's CARLA example: `target_modules="all-linear"` + `exclude_modules=["vision_tower", "multi_modal_projector"]`. + Only train the vision tower if your task specifically needs the encoder to adapt (new image domain). For text-mode fine-tunes like Mortdecai, always freeze. + +5. **Flash Attention DOES NOT WORK on Gemma 4.** FA2's max `head_dim=256`, FA4's is 128; Gemma 4's `global_head_dim=512` exceeds both. **Use SDP or Flex Attention.** Axolotl's configs set `sdp_attention: true`. TRL's `sft_gemma3.py` uses `attn_implementation="eager"`, which works but is slow; prefer `"sdpa"`. (Unsloth's FastModel handles this automatically.) + +6. **LoRA kernels OFF.** Gemma 4's shared-KV-cache layers break the fused LoRA kernels. Axolotl sets `lora_mlp_kernel/qkv_kernel/o_kernel: false` explicitly. Unsloth's `FastModel` is fine because it uses its own kernel path that knows about shared-KV. + +7. **Don't prepend a second `<bos>`.** `apply_chat_template` adds one; SFTTrainer's collator adds one; if you don't `.removeprefix('<bos>')` before passing text to the trainer, you train the model to expect `<bos><bos>`. Unsloth's example notebooks do this strip, so copy their pattern. + +8. **26B A4B: use 16-bit LoRA, not QLoRA.** Unsloth's docs explicitly say "MoE QLoRA not recommended, dense 31B is fine." Axolotl has a ScatterMoE+expert-quantized+expert-LoRA config that does make 4-bit work for the MoE (validated on a 5090), but it's the only tool that does; Unsloth's 26B A4B notebook goes 16-bit for quality. + +9. **Initial training loss of 13-15 on E2B/E4B is normal, not a bug.** Multimodal models start much higher than 5-8. If you see 13-15, don't panic: GOTCHAS.md §"Fine-Tuning Ecosystem Issues" covers this. + +10. **`mm_token_type_ids` required during training even for text-only data.** Day-one PEFT/Transformers bug: the multimodal collator requires this field. 
Pin `transformers>=5.5.0` and `peft>=0.15` to ensure the fix is present. + +## Feature parity snapshot (2026-04-18) + +| Feature | Unsloth | TRL | Axolotl | Google cookbook | +|---------|:-:|:-:|:-:|:-:| +| Text SFT | ✓ | ~ (via gemma3 script, change model_id) | ✓ | ✓ | +| Vision SFT | ✓ | ~ (via sft_vlm_gemma3) | ✓ (E2B) | ✓ | +| Audio SFT | ✓ (E2B/E4B) | ✗ | ✗ | ✗ | +| GRPO | ✓ (E2B + RL game notebooks) | ✓ (CARLA VLM-GRPO, official) | ✗ | ✗ | +| DPO | via TRL | ✓ | ✓ | ✗ | +| 26B MoE native | ✓ (16-bit LoRA) | ~ | ✓ (ScatterMoE + expert-LoRA, validated on 5090) | ✗ | +| 31B dense QLoRA | ✓ | ~ | ✓ (with Flex Attn) | ~ | +| Free Colab T4 path | ✓ (E2B) | ✗ | ✗ | ~ (via Colab Pro) | +| Multi-GPU FSDP | ~ | ✓ | ✓ (first-class) | ~ | + +**Bottom line:** Unsloth has the broadest Gemma-4-native coverage (including audio and RL games, which no one else has). Axolotl has the best 26B MoE story. TRL has the best multimodal-RL story (CARLA). Google cookbook is the reference, not the fast path. + +For Seth's stated use case (fine-tune like mortdecai), Unsloth wins on ergonomics + speed + T4 free-tier fallback. diff --git a/tooling/fine-tuning/trl/dpo.py b/tooling/fine-tuning/trl/dpo.py new file mode 100644 index 0000000..276f4c6 --- /dev/null +++ b/tooling/fine-tuning/trl/dpo.py @@ -0,0 +1,17 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +############################################################################################### +# This file has been moved to https://github.com/huggingface/trl/blob/main/trl/scripts/dpo.py # +############################################################################################### diff --git a/tooling/fine-tuning/trl/grpo_agent.py b/tooling/fine-tuning/trl/grpo_agent.py new file mode 100644 index 0000000..4742a45 --- /dev/null +++ b/tooling/fine-tuning/trl/grpo_agent.py @@ -0,0 +1,320 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# /// script +# dependencies = [ +# "trl[peft]", +# "trackio", +# "kernels", +# ] +# /// + +""" +# Full training +``` +python examples/scripts/grpo_agent.py \ + --model_name_or_path Qwen/Qwen3-1.7B \ + --output_dir grpo_biogrid_qwen_3g-1.7b \ + --push_to_hub True \ + --use_vllm True \ + --vllm_mode colocate \ + --max_completion_length 1024 \ + --report_to trackio \ + --log_completions True \ + --max_steps 400 +``` +""" + +import re +import signal +import sqlite3 +import textwrap +from contextlib import contextmanager + +from datasets import load_dataset + +from trl import GRPOConfig, GRPOTrainer, ModelConfig, ScriptArguments, TrlParser + + +def query_reward(completions, answer, **kwargs): + """ + Reward query strategy: + - Penalize more than 2 queries + - Penalize generic queries (LIMIT 1 / PRAGMA) + - Reward usage of WHERE + - Reward evidence supporting the final answer + """ + rewards = [] + + for completion, ans in zip(completions, answer, strict=False): + reward = 0.0 + sql_queries = [] + tool_results = [] + + # collect all SQL queries and tool results + for turn in completion: + if turn.get("tool_calls"): + for call in turn["tool_calls"]: + sql = call["function"]["arguments"].get("sql_command", "").lower() + sql_queries.append(sql) + if turn.get("role") == "tool" and turn.get("content"): + tool_results.append(turn["content"]) + + # --- penalize too many queries --- + if len(sql_queries) > 3: + reward -= 1.5 + + # --- check query quality --- + where_count = 0 + for q in sql_queries: + if "limit 1" in q: + reward -= 1.0 + if " where " not in q: + reward -= 0.5 + else: + where_count += 1 + reward += min(where_count, 3) * 0.4 # small bonus for WHERE usage + + # --- evidence check: do queries support the answer? 
--- + combined_results = [] + error_detected = False + + for res in tool_results: + if isinstance(res, dict) and "error" in res: + error_detected = True + elif isinstance(res, list): + combined_results.extend(res) + + # if error detected, penalize heavily + if error_detected: + reward -= 2.0 + elif len(sql_queries) == 0: + reward -= 1.5 + else: + has_hits = len(combined_results) > 0 + correct_answer = ans.lower() + if (has_hits and correct_answer == "yes") or (not has_hits and correct_answer == "no"): + reward += 2.0 + else: + reward -= 1.5 + + rewards.append(reward) + + return rewards + + +def correctness_reward(completions, answer, **kwargs): + """ + Reward Yes/No correctness. + Model must provide final answer enclosed in stars — *yes* or *no*. + Does not reward informal yes/no buried in text. + """ + rewards = [] + for completion, ans in zip(completions, answer, strict=False): + raw = completion[-1]["content"].lower() + + # detect form *yes* or *no* + match = re.search(r"\*(yes|no)\*", raw) + guess = match.group(1) if match else None + + reward = 0.0 + + if guess is None: + reward -= 0.5 # invalid format + elif guess == ans.lower(): + reward += 0.6 # correct under required format + else: + reward -= 1.0 # wrong answer + + rewards.append(reward) + + return rewards + + +def structure_reward(completions, **kwargs): + """ + Reward proper assistant structure. + Encourages a logical sequence: tool call + response + optional extra content. 
+ """ + rewards = [] + + for completion in completions: + has_call = False + has_response = False + has_other = False + + for turn in completion: + role = turn.get("role") + if role == "assistant" and turn.get("tool_calls"): + has_call = True + elif role == "tool": + has_response = True + else: + content = turn.get("content") + if content and content.strip() not in ["", ""]: + has_other = True + + # Reward sequences + if has_call and has_response: + if has_other: + reward = 0.1 + else: + reward = 0.05 # still positive even without extra text + elif has_call and not has_response: + reward = -0.15 + else: + reward = 0.0 # neutral if no call + + rewards.append(reward) + + return rewards + + +# ------------------------ +# Database tool function +# ------------------------ +class TimeoutError(Exception): + """Raised when a function call times out.""" + + pass + + +@contextmanager +def timeout(seconds): + """Context manager that raises TimeoutError if execution exceeds time limit.""" + + def timeout_handler(signum, frame): + raise TimeoutError(f"Operation timed out after {seconds} seconds") + + signal.signal(signal.SIGALRM, timeout_handler) + signal.alarm(seconds) + try: + yield + finally: + signal.alarm(0) + + +def query_biogrid(sql_command: str) -> list[tuple]: + """ + Execute a read-only SQL command on the BioGRID database. + + BioGRID is a curated biological database that compiles protein, genetic, and chemical interactions from multiple organisms. It provides researchers with experimentally verified interaction data to support studies in systems biology and functional genomics. + + Args: + sql_command: The SQL command to execute. + + Returns: + A list of tuples containing the query results. 
+ """ + with timeout(5): + conn = sqlite3.connect("file:biogrid.db?mode=ro", uri=True) + cursor = conn.cursor() + try: + cursor.execute(sql_command) + results = cursor.fetchall() + finally: + conn.close() + return results + + +# ------------------------ +# Dataset formatting +# ------------------------ +def format_example(example): + question = example["question"] + preamble = textwrap.dedent("""\ + You have access to the BioGRID SQLite database. + Use SQL queries to retrieve only the information needed to answer the question. + + Genes may appear in the database in columns `Alt_IDs_Interactor_A` `Alt_IDs_Interactor_B`, `Aliases_Interactor_A` and `Aliases_Interactor_B`, + and each entry can contain multiple gene names or synonyms separated by '|', for example: + 'entrez gene/locuslink:JNKK(gene name synonym)|entrez gene/locuslink:MAPKK4(gene name synonym)|...' + So a gene like 'JNKK' or 'MAPKK4' may appear inside one of these strings. + + If the database schema is unclear or you are unsure about column names: + - First inspect the schema with `PRAGMA table_info(interactions);` + - Or preview a few rows with `SELECT * FROM interactions LIMIT 1;` + + Otherwise, directly query the required data. + + Final answer must be enclosed in stars, e.g. *Yes* or *No*. + Facts: + - The NCBI Taxonomy identifier for humans is taxid:9606. 
+ """) + content = f"{preamble}\nQuestion: {question}" + prompt = [{"role": "user", "content": content}] + return {"prompt": prompt} + + +# ------------------------ +# Main +# ------------------------ +if __name__ == "__main__": + parser = TrlParser((ScriptArguments, GRPOConfig, ModelConfig)) + script_args, training_args, model_args = parser.parse_args_and_config() + + # ------------------------ + # Create DB + # ------------------------ + print("Creating biogrid.db...") + # Load dataset + biogrid_dataset = load_dataset("qgallouedec/biogrid", split="train") + df = biogrid_dataset.to_pandas() + + # Normalize column names: remove spaces, replace with underscores + df.columns = [c.replace(" ", "_") for c in df.columns] + conn = sqlite3.connect("biogrid.db") + try: + df.to_sql("interactions", conn, if_exists="replace", index=False) + print(f"biogrid.db created. Rows stored: {len(df)}") + finally: + conn.close() + + # ------------------------ + # Load and format dataset + # ------------------------ + dataset = load_dataset("qgallouedec/biogrid_qa", split="train") + dataset = dataset.filter( + lambda example: example["question"].startswith("Does the gene ") + ) # keep only simple questions for example + dataset = dataset.map(format_example, remove_columns=["question"]) + + train_dataset = dataset + eval_dataset = None # No eval by default, can be added if needed + + training_args.chat_template_kwargs = {"enable_thinking": False} + + # ------------------------ + # Initialize trainer + # ------------------------ + trainer = GRPOTrainer( + model=model_args.model_name_or_path, + train_dataset=train_dataset, + eval_dataset=eval_dataset, + tools=[query_biogrid], + reward_funcs=[correctness_reward, structure_reward, query_reward], + args=training_args, + ) + + # ------------------------ + # Train + # ------------------------ + trainer.train() + + # ------------------------ + # Save and push + # ------------------------ + trainer.save_model(training_args.output_dir) + if 
training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) diff --git a/tooling/fine-tuning/trl/grpo_vlm.py b/tooling/fine-tuning/trl/grpo_vlm.py new file mode 100644 index 0000000..c748b1b --- /dev/null +++ b/tooling/fine-tuning/trl/grpo_vlm.py @@ -0,0 +1,157 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl[peft]", +# "Pillow", +# "math-verify", +# "latex2sympy2_extended", +# "torchvision", +# "trackio", +# "kernels", +# ] +# /// + +""" +pip install math_verify + +# For Qwen/Qwen2.5-VL-3B-Instruct +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/grpo_vlm.py \ + --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \ + --output_dir grpo-Qwen2.5-VL-3B-Instruct \ + --learning_rate 1e-5 \ + --dtype bfloat16 \ + --max_completion_length 1024 \ + --use_vllm \ + --vllm_mode colocate \ + --use_peft \ + --lora_target_modules "q_proj", "v_proj" \ + --log_completions + +# For HuggingFaceTB/SmolVLM2-2.2B-Instruct +pip install num2words==0.5.14 + +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/grpo_vlm.py \ + --model_name_or_path HuggingFaceTB/SmolVLM2-2.2B-Instruct \ + --output_dir grpo-SmolVLM2-2.2B-Instruct \ + --learning_rate 1e-5 \ + --dtype bfloat16 \ + --max_completion_length 1024 \ + --use_peft \ + --lora_target_modules 
"q_proj", "v_proj" \ + --log_completions \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 2 \ + --num_generations 2 + +""" + +import torch +from datasets import load_dataset + +from trl import ( + GRPOConfig, + GRPOTrainer, + ModelConfig, + ScriptArguments, + TrlParser, + get_kbit_device_map, + get_peft_config, + get_quantization_config, +) +from trl.rewards import accuracy_reward, think_format_reward + + +if __name__ == "__main__": + parser = TrlParser((ScriptArguments, GRPOConfig, ModelConfig)) + script_args, training_args, model_args = parser.parse_args_and_config() + ################ + # Model + ################ + dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype) + training_args.model_init_kwargs = dict( + revision=model_args.model_revision, + attn_implementation=model_args.attn_implementation, + dtype=dtype, + ) + quantization_config = get_quantization_config(model_args) + if quantization_config is not None: + # Passing None would not be treated the same as omitting the argument, so we include it only when valid. + training_args.model_init_kwargs["device_map"] = get_kbit_device_map() + training_args.model_init_kwargs["quantization_config"] = quantization_config + + ################ + # Dataset + ################ + dataset = load_dataset("lmms-lab/multimodal-open-r1-8k-verified", split="train") + dataset = dataset.train_test_split(test_size=100, seed=42) + + SYSTEM_PROMPT = ( + "A conversation between user and assistant. The user asks a question, and the assistant solves it. The " + "assistant first thinks about the reasoning process in the mind and then provides the user with the answer. " + "The reasoning process and answer are enclosed within tags, i.e., \nThis is my " + "reasoning.\n\nThis is my answer." 
+ ) + + def make_conversation(example): + prompt = [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "user", "content": example["problem"]}, + ] + return {"prompt": prompt} + + dataset = dataset.map(make_conversation) + + # Filter out big images + def filter_big_images(example): + image = example["image"] + return image.size[0] < 512 and image.size[1] < 512 + + dataset = dataset.filter(filter_big_images) + + def convert_to_rgb(example): + image = example["image"] + if image.mode != "RGB": + image = image.convert("RGB") + example["image"] = image + return example + + dataset = dataset.map(convert_to_rgb) + + train_dataset = dataset["train"] + eval_dataset = dataset["test"] if training_args.eval_strategy != "no" else None + + ################ + # Training + ################ + trainer = GRPOTrainer( + model=model_args.model_name_or_path, + args=training_args, + reward_funcs=[think_format_reward, accuracy_reward], + train_dataset=train_dataset, + eval_dataset=eval_dataset, + peft_config=get_peft_config(model_args), + ) + + trainer.train() + + # Save and push to hub + trainer.save_model(training_args.output_dir) + if training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) diff --git a/tooling/fine-tuning/trl/sft.py b/tooling/fine-tuning/trl/sft.py new file mode 100644 index 0000000..b6e132e --- /dev/null +++ b/tooling/fine-tuning/trl/sft.py @@ -0,0 +1,17 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +############################################################################################### +# This file has been moved to https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py # +############################################################################################### diff --git a/tooling/fine-tuning/trl/sft_gemma3.py b/tooling/fine-tuning/trl/sft_gemma3.py new file mode 100644 index 0000000..8aeb74a --- /dev/null +++ b/tooling/fine-tuning/trl/sft_gemma3.py @@ -0,0 +1,69 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl", +# "Pillow", +# "trackio", +# "kernels", +# ] +# /// + +""" +Train Gemma-3 on the Codeforces COTS dataset. 
+ +accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sft_gemma3.py +""" + +from datasets import load_dataset +from transformers import AutoModelForImageTextToText + +from trl import SFTConfig, SFTTrainer + + +def main(): + # Load dataset + train_dataset = load_dataset("open-r1/codeforces-cots", split="train") + train_dataset = train_dataset.remove_columns("prompt") + + # Load model + model_id = "google/gemma-3-12b-it" + model = AutoModelForImageTextToText.from_pretrained(model_id, attn_implementation="eager") + + # Train model + training_args = SFTConfig( + output_dir=f"{model_id}-codeforces-SFT", + bf16=True, + use_liger_kernel=True, + max_length=8192, + per_device_train_batch_size=1, + gradient_accumulation_steps=8, + dataset_num_proc=32, + num_train_epochs=1, + ) + + trainer = SFTTrainer( + args=training_args, + model=model, + train_dataset=train_dataset, + ) + trainer.train() + + # Push to hub + trainer.push_to_hub(dataset_name="open-r1/codeforces-cots") + + +if __name__ == "__main__": + main() diff --git a/tooling/fine-tuning/trl/sft_tiny_aya_tool_calling.py b/tooling/fine-tuning/trl/sft_tiny_aya_tool_calling.py new file mode 100644 index 0000000..7a29be6 --- /dev/null +++ b/tooling/fine-tuning/trl/sft_tiny_aya_tool_calling.py @@ -0,0 +1,164 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# /// script +# dependencies = [ +# "trl[peft]", +# "bitsandbytes", +# "liger-kernel", +# "trackio", +# ] +# /// + +""" +Teach tool calling to CohereLabs/tiny-aya-global using SFT with QLoRA on the bebechien/SimpleToolCalling dataset. + +The model used in this script does not have native tool-calling support. We extend its existing Jinja2 chat template to +serialize tool schemas into the system preamble and render tool calls as structured XML inside the model's +native <|START_RESPONSE|> / <|END_RESPONSE|> delimiters. The modified template is saved with the tokenizer, so +inference only requires loading the tokenizer from the output directory and calling apply_chat_template with +tools=TOOLS — no manual system-prompt construction needed. + +Example: + + python examples/scripts/sft_tiny_aya_tool_calling.py +""" + +import json +from pathlib import Path + +import torch +from datasets import load_dataset +from peft import LoraConfig +from transformers import AutoModelForCausalLM, BitsAndBytesConfig + +from trl import SFTConfig, SFTTrainer + + +# These are the tool schemas that are used in the dataset +TOOLS = [ + { + "type": "function", + "function": { + "name": "search_knowledge_base", + "description": "Search internal company documents, policies and project data.", + "parameters": { + "type": "object", + "properties": {"query": {"type": "string", "description": "query string"}}, + "required": ["query"], + }, + "return": {"type": "string"}, + }, + }, + { + "type": "function", + "function": { + "name": "search_google", + "description": "Search public information.", + "parameters": { + "type": "object", + "properties": {"query": {"type": "string", "description": "query string"}}, + "required": ["query"], + }, + "return": {"type": "string"}, + }, + }, +] + + +def create_conversation(sample): + return { + "prompt": [{"role": "user", "content": sample["user_content"]}], + "completion": [ + { + "role": "assistant", + "tool_calls": [ + { + "type": "function", + 
"function": { + "name": sample["tool_name"], + "arguments": json.loads(sample["tool_arguments"]), + }, + } + ], + }, + ], + "tools": TOOLS, + } + + +def main(): + model_id = "CohereLabs/tiny-aya-global" + dataset_name = "bebechien/SimpleToolCalling" + output_dir = "tiny-aya-global-tool-calling-SFT" + + # Load and format dataset + dataset = load_dataset(dataset_name, split="train") + dataset = dataset.map(create_conversation, remove_columns=dataset.features) + dataset = dataset.train_test_split(test_size=0.5, shuffle=True) + + # Load model + model = AutoModelForCausalLM.from_pretrained( + model_id, + attn_implementation="sdpa", + dtype=torch.float16, + quantization_config=BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_compute_dtype=torch.float16, + bnb_4bit_use_double_quant=True, + bnb_4bit_quant_type="nf4", + ), + ) + + # Configure LoRA + peft_config = LoraConfig( + r=32, + lora_alpha=32, + target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"], + ) + + # Train + training_args = SFTConfig( + output_dir=output_dir, + per_device_train_batch_size=1, + gradient_accumulation_steps=4, + # Use the tool-aware chat template + chat_template_path=str(Path(__file__).parent / "tiny_aya_chat_template.jinja"), + warmup_steps=5, + learning_rate=2e-4, + optim="paged_adamw_8bit", + logging_steps=1, + report_to="trackio", + trackio_space_id=output_dir, + max_length=1024, + use_liger_kernel=True, + activation_offloading=True, + push_to_hub=True, + ) + + trainer = SFTTrainer( + model=model, + args=training_args, + train_dataset=dataset["train"], + peft_config=peft_config, + ) + trainer.train() + + # Save model and tokenizer (tokenizer carries the updated chat template) + trainer.save_model(output_dir) + trainer.push_to_hub(dataset_name=dataset_name) + + +if __name__ == "__main__": + main() diff --git a/tooling/fine-tuning/trl/sft_vlm.py b/tooling/fine-tuning/trl/sft_vlm.py new file mode 100644 index 0000000..54f9207 --- /dev/null +++ 
b/tooling/fine-tuning/trl/sft_vlm.py @@ -0,0 +1,117 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl[peft]", +# "Pillow>=9.4.0", +# "trackio", +# "kernels", +# ] +# /// + +""" +pip install pillow + +# Tested on 8x H100 GPUs +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/sft_vlm.py \ + --dataset_name HuggingFaceH4/llava-instruct-mix-vsft \ + --model_name_or_path llava-hf/llava-1.5-7b-hf \ + --gradient_accumulation_steps 8 \ + --output_dir LLaVA-1.5-7B-SFT \ + --dtype bfloat16 + +For LLaVA-NeXT, use: + --model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf + +For meta-llama/Llama-3.2-11B-Vision-Instruct, use: + --model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct + +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/sft_vlm.py \ + --dataset_name HuggingFaceH4/llava-instruct-mix-vsft \ + --model_name_or_path HuggingFaceTB/SmolVLM-Instruct \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 1 \ + --output_dir SmolVLM-SFT \ + --dtype bfloat16 \ + --use_peft \ + --lora_target_modules down_proj, o_proj, k_proj, q_proj, gate_proj, up_proj, v_proj +""" + +import torch +from datasets import load_dataset +from transformers import AutoModelForImageTextToText + +from trl import ( + ModelConfig, + ScriptArguments, + SFTConfig, + 
SFTTrainer, + TrlParser, + get_kbit_device_map, + get_peft_config, + get_quantization_config, +) + + +if __name__ == "__main__": + parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig)) + script_args, training_args, model_args = parser.parse_args_and_config() + training_args.max_length = None + + ################ + # Model + ################ + dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype) + model_kwargs = dict( + revision=model_args.model_revision, + attn_implementation=model_args.attn_implementation, + dtype=dtype, + ) + quantization_config = get_quantization_config(model_args) + if quantization_config is not None: + # Passing None would not be treated the same as omitting the argument, so we include it only when valid. + model_kwargs["device_map"] = get_kbit_device_map() + model_kwargs["quantization_config"] = quantization_config + + model = AutoModelForImageTextToText.from_pretrained( + model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code, **model_kwargs + ) + + ################ + # Dataset + ################ + dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config) + + ################ + # Training + ################ + trainer = SFTTrainer( + model=model, + args=training_args, + train_dataset=dataset[script_args.dataset_train_split], + eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None, + peft_config=get_peft_config(model_args), + ) + + trainer.train() + + # Save and push to hub + trainer.save_model(training_args.output_dir) + if training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) diff --git a/tooling/fine-tuning/trl/sft_vlm_gemma3.py b/tooling/fine-tuning/trl/sft_vlm_gemma3.py new file mode 100644 index 0000000..14a4f99 --- /dev/null +++ b/tooling/fine-tuning/trl/sft_vlm_gemma3.py @@ -0,0 +1,189 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl[peft]", +# "Pillow>=9.4.0", +# "trackio", +# "kernels", +# ] +# /// + +""" +Train Gemma 3 on the HuggingFaceH4/llava-instruct-mix-vsft dataset (single-image). + +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/sft_vlm_gemma3.py \ + --dataset_name HuggingFaceH4/llava-instruct-mix-vsft \ + --model_name_or_path google/gemma-3-4b-it \ + --per_device_train_batch_size 1 \ + --output_dir Gemma-3-4B-SFT-MMIU \ + --dtype bfloat16 \ + --use_peft \ + --lora_target_modules all-linear \ + --attn_implementation eager + +Train Gemma 3 on the FanqingM/MMIU-Benchmark dataset (multi-image). 
+ +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + examples/scripts/sft_vlm_gemma3.py \ + --dataset_name FanqingM/MMIU-Benchmark \ + --dataset_train_split test \ + --model_name_or_path google/gemma-3-4b-it \ + --per_device_train_batch_size 1 \ + --output_dir Gemma-3-4B-SFT-MMIU \ + --dtype bfloat16 \ + --use_peft \ + --lora_target_modules all-linear \ + --attn_implementation eager +""" + +import io +import os +import zipfile + +import torch +from datasets import DatasetDict, load_dataset +from huggingface_hub import hf_hub_download, list_repo_files +from PIL import Image +from transformers import AutoModelForImageTextToText + +from trl import ( + ModelConfig, + ScriptArguments, + SFTConfig, + SFTTrainer, + TrlParser, + get_kbit_device_map, + get_peft_config, + get_quantization_config, +) + + +# For multi-image example +def process_vision_info(messages: list[dict]) -> list[Image.Image]: + image_inputs = [] + for msg in messages: + content = msg.get("content", []) + if not isinstance(content, list): + content = [content] + + for element in content: + if isinstance(element, dict) and ("image" in element or element.get("type") == "image"): + if "image" in element: + image = element["image"] + else: + image = element + if image is not None: + image = Image.open(io.BytesIO(image["bytes"])) + image_inputs.append(image.convert("RGB")) + return image_inputs + + +def format_data(samples: dict[str, any]) -> dict[str, list]: + formatted_samples = {"messages": []} + for cont in range(len(samples["question"])): + images = [] + for img_path in samples["input_image_path"][cont]: + try: + with open(img_path, "rb") as f: + img_bytes = f.read() + image = Image.open(io.BytesIO(img_bytes)).convert("RGB") + images.append({"type": "image", "image": image}) + except Exception as e: + print(f"Error processing image {img_path}: {e}") + continue + + formatted_samples["messages"].append( + [ + {"role": "system", "content": [{"type": "text", "text": 
samples["context"][cont]}]}, + {"role": "user", "content": images + [{"type": "text", "text": samples["question"][cont]}]}, + {"role": "assistant", "content": [{"type": "text", "text": samples["output"][cont]}]}, + ] + ) + return formatted_samples + + +# For multi-image example +def prepare_dataset(dataset: DatasetDict, dataset_name: str) -> DatasetDict: + all_files = list_repo_files(dataset_name, repo_type="dataset") + zip_files = [f for f in all_files if f.endswith(".zip")] + + for zip_filename in zip_files: + zip_path = hf_hub_download(repo_id=dataset_name, filename=zip_filename, repo_type="dataset") + extract_folder = zip_filename.replace(".zip", "") + os.makedirs(extract_folder, exist_ok=True) + + with zipfile.ZipFile(zip_path, "r") as zip_ref: + zip_ref.extractall(extract_folder) + + dataset = dataset.map(format_data, batched=True, batch_size=4, num_proc=16) + return dataset + + +def main(): + parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig)) + script_args, training_args, model_args = parser.parse_args_and_config() + training_args.max_length = None + + ################ + # Model + ################ + dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype) + model_kwargs = dict( + revision=model_args.model_revision, + attn_implementation=model_args.attn_implementation, + dtype=dtype, + ) + quantization_config = get_quantization_config(model_args) + if quantization_config is not None: + # Passing None would not be treated the same as omitting the argument, so we include it only when valid. 
+ model_kwargs["device_map"] = get_kbit_device_map() + model_kwargs["quantization_config"] = quantization_config + + model = AutoModelForImageTextToText.from_pretrained( + model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code, **model_kwargs + ) + + ################ + # Dataset + ################ + dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config) + if script_args.dataset_name == "FanqingM/MMIU-Benchmark": + dataset = prepare_dataset(dataset, script_args.dataset_name) + + ################ + # Training + ################ + trainer = SFTTrainer( + model=model, + args=training_args, + train_dataset=dataset[script_args.dataset_train_split], + eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None, + peft_config=get_peft_config(model_args), + ) + + trainer.train() + + # Save and push to hub + trainer.save_model(training_args.output_dir) + if training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) + + +if __name__ == "__main__": + main() diff --git a/tooling/fine-tuning/trl/trl_scripts_sft.py b/tooling/fine-tuning/trl/trl_scripts_sft.py new file mode 100644 index 0000000..1fb07ac --- /dev/null +++ b/tooling/fine-tuning/trl/trl_scripts_sft.py @@ -0,0 +1,156 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# /// script +# dependencies = [ +# "trl", +# "peft", +# "trackio", +# "kernels", +# ] +# /// + +""" +# Full training +``` +python trl/scripts/sft.py \ + --model_name_or_path Qwen/Qwen2-0.5B \ + --dataset_name trl-lib/Capybara \ + --learning_rate 2.0e-5 \ + --num_train_epochs 1 \ + --packing \ + --per_device_train_batch_size 2 \ + --gradient_accumulation_steps 8 \ + --eos_token '<|im_end|>' \ + --eval_strategy steps \ + --eval_steps 100 \ + --output_dir Qwen2-0.5B-SFT \ + --push_to_hub +``` + +# LoRA +``` +python trl/scripts/sft.py \ + --model_name_or_path Qwen/Qwen2-0.5B \ + --dataset_name trl-lib/Capybara \ + --learning_rate 2.0e-4 \ + --num_train_epochs 1 \ + --packing \ + --per_device_train_batch_size 2 \ + --gradient_accumulation_steps 8 \ + --eos_token '<|im_end|>' \ + --eval_strategy steps \ + --eval_steps 100 \ + --use_peft \ + --lora_r 32 \ + --lora_alpha 16 \ + --output_dir Qwen2-0.5B-SFT \ + --push_to_hub +``` +""" + +import argparse + + +def main(script_args, training_args, model_args, dataset_args): + from accelerate import logging + from datasets import load_dataset + from transformers import AutoConfig, AutoModelForCausalLM + from transformers.models.auto.modeling_auto import MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES + + from trl import SFTTrainer, get_dataset, get_kbit_device_map, get_peft_config, get_quantization_config + + logger = logging.get_logger(__name__) + + ################ + # Model init kwargs + ################ + model_kwargs = dict( + revision=model_args.model_revision, + trust_remote_code=model_args.trust_remote_code, + attn_implementation=model_args.attn_implementation, + dtype=model_args.dtype, + ) + quantization_config = get_quantization_config(model_args) + if quantization_config is not None: + # Passing None would not be treated the same as omitting the argument, so we include it only when valid. 
+ model_kwargs["device_map"] = get_kbit_device_map() + model_kwargs["quantization_config"] = quantization_config + + # Create model + config = AutoConfig.from_pretrained(model_args.model_name_or_path) + valid_image_text_architectures = MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.values() + + if config.architectures and any(arch in valid_image_text_architectures for arch in config.architectures): + from transformers import AutoModelForImageTextToText + + model = AutoModelForImageTextToText.from_pretrained(model_args.model_name_or_path, **model_kwargs) + else: + model = AutoModelForCausalLM.from_pretrained(model_args.model_name_or_path, **model_kwargs) + + # Load the dataset + if dataset_args.datasets and script_args.dataset_name: + logger.warning( + "Both `datasets` and `dataset_name` are provided. The `datasets` argument will be used to load the " + "dataset and `dataset_name` will be ignored." + ) + dataset = get_dataset(dataset_args) + elif dataset_args.datasets and not script_args.dataset_name: + dataset = get_dataset(dataset_args) + elif not dataset_args.datasets and script_args.dataset_name: + dataset = load_dataset( + script_args.dataset_name, name=script_args.dataset_config, streaming=script_args.dataset_streaming + ) + else: + raise ValueError("Either `datasets` or `dataset_name` must be provided.") + + # Initialize the SFT trainer + trainer = SFTTrainer( + model=model, + args=training_args, + train_dataset=dataset[script_args.dataset_train_split], + eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None, + peft_config=get_peft_config(model_args), + ) + + # Train the model + trainer.train() + + # Log training complete + trainer.accelerator.print("✅ Training completed.") + + # Save and push to Hub + trainer.save_model(training_args.output_dir) + trainer.accelerator.print(f"💾 Model saved to {training_args.output_dir}.") + + if training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) 
+ trainer.accelerator.print(f"🤗 Model pushed to the Hub in https://huggingface.co/{trainer.hub_model_id}.") + + +def make_parser(subparsers: argparse._SubParsersAction | None = None, prog: str | None = None): + from trl import DatasetMixtureConfig, ModelConfig, ScriptArguments, SFTConfig, TrlParser + + dataclass_types = (ScriptArguments, SFTConfig, ModelConfig, DatasetMixtureConfig) + if subparsers is not None: + parser = subparsers.add_parser("sft", help="Run the SFT training script", dataclass_types=dataclass_types) + else: + parser = TrlParser(dataclass_types, prog=prog) + return parser + + +if __name__ == "__main__": + parser = make_parser() + script_args, training_args, model_args, dataset_args = parser.parse_args_and_config(fail_with_unknown_args=False) + main(script_args, training_args, model_args, dataset_args) diff --git a/tooling/fine-tuning/unsloth/docs/unsloth-README.md b/tooling/fine-tuning/unsloth/docs/unsloth-README.md new file mode 100644 index 0000000..7046a2a --- /dev/null +++ b/tooling/fine-tuning/unsloth/docs/unsloth-README.md @@ -0,0 +1,250 @@ +


+

+Run and train AI models with a unified local interface. +

+ +

+ Features • + Quickstart • + Notebooks • + Documentation • + Reddit +

+ +Unsloth Studio (Beta) lets you run and train text, [audio](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning), [embedding](https://unsloth.ai/docs/new/embedding-finetuning), and [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) models on Windows, Linux and macOS. + +## ⭐ Features +Unsloth provides several key features for both inference and training: +### Inference +* **Search + download + run models** including GGUF, LoRA adapters, safetensors +* **Export models**: [Save or export](https://unsloth.ai/docs/new/studio/export) models to GGUF, 16-bit safetensors and other formats. +* **Tool calling**: Support for [self-healing tool calling](https://unsloth.ai/docs/new/studio/chat#auto-healing-tool-calling) and web search +* **[Code execution](https://unsloth.ai/docs/new/studio/chat#code-execution)**: lets LLMs test code in Claude artifacts and sandbox environments +* [Auto-tune inference parameters](https://unsloth.ai/docs/new/studio/chat#auto-parameter-tuning) and customize chat templates. +* We work directly with teams behind [gpt-oss](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#unsloth-fixes-for-gpt-oss), [Qwen3](https://www.reddit.com/r/LocalLLaMA/comments/1kaodxu/qwen3_unsloth_dynamic_ggufs_128k_context_bug_fixes/), [Llama 4](https://github.com/ggml-org/llama.cpp/pull/12889), [Mistral](models/tutorials/devstral-how-to-run-and-fine-tune.md), [Gemma 1-3](https://news.ycombinator.com/item?id=39671146), and [Phi-4](https://unsloth.ai/blog/phi4), where we’ve fixed bugs that improve model accuracy. +* Upload images, audio, PDFs, code, DOCX and more file types to chat with. +### Training +* Train and RL **500+ models** up to **2x faster** with up to **70% less VRAM**, with no accuracy loss. +* Custom Triton and mathematical **kernels**. 
See some collabs we did with [PyTorch](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) and [Hugging Face](https://unsloth.ai/docs/new/faster-moe). +* **Data Recipes**: [Auto-create datasets](https://unsloth.ai/docs/new/studio/data-recipe) from **PDF, CSV, DOCX** etc. Edit data in a visual-node workflow. +* **[Reinforcement Learning](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide)** (RL): The most efficient [RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) library, using **80% less VRAM** for GRPO, [FP8](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) etc. +* Supports full fine-tuning, RL, pretraining, 4-bit, 16-bit, and FP8 training. +* **Observability**: Monitor training live, track loss and GPU usage and customize graphs. +* [Multi-GPU](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth) training is supported, with major improvements coming soon. + +## ⚡ Quickstart +Unsloth can be used in two ways: through **[Unsloth Studio](https://unsloth.ai/docs/new/studio/)**, the web UI, or through **Unsloth Core**, the code-based version. Each has different requirements. + +### Unsloth Studio (web UI) +Unsloth Studio (Beta) works on **Windows, Linux, WSL** and **macOS**. + +* **CPU:** Supported for Chat and Data Recipes currently +* **NVIDIA:** Training works on RTX 30/40/50, Blackwell, DGX Spark, Station and more +* **macOS:** Currently supports chat and Data Recipes. **MLX training** is coming very soon +* **AMD:** Chat and Data Recipes work. Train with [Unsloth Core](#unsloth-core-code-based). Studio support is coming soon. +* **Coming soon:** Training support for Apple MLX, AMD, and Intel. 
+* **Multi-GPU:** Available now, with a major upgrade on the way + +#### macOS, Linux, WSL: +```bash +curl -fsSL https://unsloth.ai/install.sh | sh +``` +#### Windows: +```powershell +irm https://unsloth.ai/install.ps1 | iex +``` + +#### Launch +```bash +unsloth studio -H 0.0.0.0 -p 8888 +``` + +#### Update +To update, use the same install commands as above. Or run (does not work on Windows): +```bash +unsloth studio update +``` + +#### Docker +Use our [Docker image](https://hub.docker.com/r/unsloth/unsloth) ```unsloth/unsloth``` container. Run: +```bash +docker run -d -e JUPYTER_PASSWORD="mypassword" \ + -p 8888:8888 -p 8000:8000 -p 2222:22 \ + -v $(pwd)/work:/workspace/work \ + --gpus all \ + unsloth/unsloth + ``` + +#### Developer, Nightly, Uninstall +To see developer, nightly and uninstallation etc. instructions, see [advanced installation](#-advanced-installation). + +### Unsloth Core (code-based) +#### Linux, WSL: +```bash +curl -LsSf https://astral.sh/uv/install.sh | sh +uv venv unsloth_env --python 3.13 +source unsloth_env/bin/activate +uv pip install unsloth --torch-backend=auto +``` +#### Windows: +```powershell +winget install -e --id Python.Python.3.13 +winget install --id=astral-sh.uv -e +uv venv unsloth_env --python 3.13 +.\unsloth_env\Scripts\activate +uv pip install unsloth --torch-backend=auto +``` +For Windows, `pip install unsloth` works only if you have PyTorch installed. Read our [Windows Guide](https://unsloth.ai/docs/get-started/install/windows-installation). +You can use the same Docker image as Unsloth Studio. + +#### AMD, Intel: +For RTX 50x, B200, 6000 GPUs: `uv pip install unsloth --torch-backend=auto`. Read our guides for: [Blackwell](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth) and [DGX Spark](https://unsloth.ai/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth).
+To install Unsloth on **AMD** and **Intel** GPUs, follow our [AMD Guide](https://unsloth.ai/docs/get-started/install/amd) and [Intel Guide](https://unsloth.ai/docs/get-started/install/intel). + +## 📒 Free Notebooks + +Train for free with our notebooks. You can use our new [free Unsloth Studio notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) to run and train models for free in a web UI. +Read our [guide](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide). Add dataset, run, then deploy your trained model. + +| Model | Free Notebooks | Performance | Memory use | +|-----------|---------|--------|----------| +| **Gemma 4 (E2B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E2B)-Vision.ipynb) | 1.5x faster | 50% less | +| **Qwen3.5 (4B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision.ipynb) | 1.5x faster | 60% less | +| **gpt-oss (20B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-Fine-tuning.ipynb) | 2x faster | 70% less | +| **Qwen3.5 GSPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision_GRPO.ipynb) | 2x faster | 70% less | +| **gpt-oss (20B): GRPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) | 2x faster | 80% less | +| **Qwen3: Advanced GRPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb) | 2x faster | 70% less | +| **embeddinggemma (300M)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/EmbeddingGemma_(300M).ipynb) | 2x faster | 20% less | +| **Mistral Ministral 3 (3B)** | [▶️ Start for 
free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_VL_(3B)_Vision.ipynb) | 1.5x faster | 60% less | +| **Llama 3.1 (8B) Alpaca** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2x faster | 70% less | +| **Llama 3.2 Conversational** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2x faster | 70% less | +| **Orpheus-TTS (3B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Orpheus_(3B)-TTS.ipynb) | 1.5x faster | 50% less | + +- See all our notebooks for: [Kaggle](https://github.com/unslothai/notebooks?tab=readme-ov-file#-kaggle-notebooks), [GRPO](https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks), [TTS](https://unsloth.ai/docs/get-started/unsloth-notebooks#text-to-speech-tts-notebooks), [embedding](https://unsloth.ai/docs/new/embedding-finetuning) & [Vision](https://unsloth.ai/docs/get-started/unsloth-notebooks#vision-multimodal-notebooks) +- See [all our models](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [all our notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks) +- See detailed documentation for Unsloth [here](https://unsloth.ai/docs) + +## 🦥 Unsloth News +- **Gemma 4**: Run and train Google’s new models directly in Unsloth Studio! [Blog](https://unsloth.ai/docs/models/gemma-4) +- **Introducing Unsloth Studio**: our new web UI for running and training LLMs. [Blog](https://unsloth.ai/docs/new/studio) +- **Qwen3.5** - 0.8B, 2B, 4B, 9B, 27B, 35-A3B, 112B-A10B are now supported. [Guide + notebooks](https://unsloth.ai/docs/models/qwen3.5/fine-tune) +- Train **MoE LLMs 12x faster** with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. 
[Blog](https://unsloth.ai/docs/new/faster-moe) +- **Embedding models**: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. [Blog](https://unsloth.ai/docs/new/embedding-finetuning) • [Notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks#embedding-models) +- New **7x longer context RL** vs. all other setups, via our new batching algorithms. [Blog](https://unsloth.ai/docs/new/grpo-long-context) +- New RoPE & MLP **Triton Kernels** & **Padding Free + Packing**: 3x faster training & 30% less VRAM. [Blog](https://unsloth.ai/docs/new/3x-faster-training-packing) +- **500K Context**: Training a 20B model with >500K context is now possible on an 80GB GPU. [Blog](https://unsloth.ai/docs/blog/500k-context-length-fine-tuning) +- **FP8 & Vision RL**: You can now do FP8 & VLM GRPO on consumer GPUs. [FP8 Blog](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl) +- **gpt-oss** by OpenAI: Read our [RL blog](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/gpt-oss-reinforcement-learning), [Flex Attention](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/long-context-gpt-oss-training) blog and [Guide](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune). + +## 📥 Advanced Installation +The below advanced instructions are for Unsloth Studio. For Unsloth Core advanced installation, [view our docs](https://unsloth.ai/docs/get-started/install/pip-install#advanced-pip-installation). 
+#### Developer installs: macOS, Linux, WSL: +```bash +git clone https://github.com/unslothai/unsloth +cd unsloth +./install.sh --local +unsloth studio -H 0.0.0.0 -p 8888 +``` +Then to update: +```bash +unsloth studio update +``` + +#### Developer installs: Windows PowerShell: +```powershell +git clone https://github.com/unslothai/unsloth.git +cd unsloth +Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass +.\install.ps1 --local +unsloth studio -H 0.0.0.0 -p 8888 +``` +Then to update: +```powershell +unsloth studio update +``` + +#### Nightly: macOS, Linux, WSL: +```bash +git clone https://github.com/unslothai/unsloth +cd unsloth +git checkout nightly +./install.sh --local +unsloth studio -H 0.0.0.0 -p 8888 +``` +Then to launch every time: +```bash +unsloth studio -H 0.0.0.0 -p 8888 +``` + +#### Nightly: Windows: +Run in Windows PowerShell: +```powershell +git clone https://github.com/unslothai/unsloth.git +cd unsloth +git checkout nightly +Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass +.\install.ps1 --local +unsloth studio -H 0.0.0.0 -p 8888 +``` +Then to launch every time: +```powershell +unsloth studio -H 0.0.0.0 -p 8888 +``` + +#### Uninstall +You can uninstall Unsloth Studio by deleting its install folder, usually located under `$HOME/.unsloth/studio` on macOS/Linux/WSL and `%USERPROFILE%\.unsloth\studio` on Windows. Running the commands below will **delete everything**, including your history and cache: + +* **macOS, WSL, Linux:** `rm -rf ~/.unsloth/studio` +* **Windows (PowerShell):** `Remove-Item -Recurse -Force "$HOME\.unsloth\studio"` + +For more info, [see our docs](https://unsloth.ai/docs/new/studio/install#uninstall). + +#### Deleting model files + +You can delete old model files either from the bin icon in model search or by removing the relevant cached model folder from the default Hugging Face cache directory.
By default, HF uses: + +* **macOS, Linux, WSL:** `~/.cache/huggingface/hub/` +* **Windows:** `%USERPROFILE%\.cache\huggingface\hub\` + +## 💚 Community and Links +| Type | Links | +| ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | +| **Discord** | [Join Discord server](https://discord.com/invite/unsloth) | +| **r/unsloth Reddit** | [Join Reddit community](https://reddit.com/r/unsloth) | +| 📚 **Documentation & Wiki** | [Read Our Docs](https://unsloth.ai/docs) | +| **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai) | +| 🔮 **Our Models** | [Unsloth Catalog](https://unsloth.ai/docs/get-started/unsloth-model-catalog) | +| ✍️ **Blog** | [Read our Blogs](https://unsloth.ai/blog) | + +### Citation + +You can cite the Unsloth repo as follows: +```bibtex +@software{unsloth, + author = {Daniel Han and Michael Han and {Unsloth team}}, + title = {Unsloth}, + url = {https://github.com/unslothai/unsloth}, + year = {2023} +} +``` +If you trained a model with 🦥Unsloth, you can use this cool sticker! + +### License +Unsloth uses a dual-licensing model of Apache 2.0 and AGPL-3.0. The core Unsloth package remains licensed under **[Apache 2.0](https://github.com/unslothai/unsloth?tab=Apache-2.0-1-ov-file)**, while certain optional components, such as the Unsloth Studio UI, are licensed under the open-source license **[AGPL-3.0](https://github.com/unslothai/unsloth?tab=AGPL-3.0-2-ov-file)**. + +This structure helps support ongoing Unsloth development while keeping the project open source and enabling the broader ecosystem to continue growing.
+ +### Thank You to +- The [llama.cpp library](https://github.com/ggml-org/llama.cpp) that lets users run and save models with Unsloth +- The Hugging Face team and their libraries: [transformers](https://github.com/huggingface/transformers) and [TRL](https://github.com/huggingface/trl) +- The PyTorch and [Torch AO](https://github.com/unslothai/unsloth/pull/3391) teams for their contributions +- NVIDIA for their [NeMo DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) library and their contributions +- And of course every single person who has contributed to or used Unsloth! diff --git a/tooling/fine-tuning/unsloth/kaggle/Gemma4_(31B)-Text.ipynb b/tooling/fine-tuning/unsloth/kaggle/Gemma4_(31B)-Text.ipynb new file mode 100644 index 0000000..6c9865f --- /dev/null +++ b/tooling/fine-tuning/unsloth/kaggle/Gemma4_(31B)-Text.ipynb @@ -0,0 +1,8781 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c19cd9ee", + "metadata": { + "id": "n5NrROnmsJP4", + "papermill": { + "duration": 0.021851, + "end_time": "2026-04-09T14:46:31.545467+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:31.523616+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab A100 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "id": "c5dbe248", + "metadata": { + "id": "jjzmSqm7sJP4", + "papermill": { + "duration": 0.018413, + "end_time": "2026-04-09T14:46:31.583432+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:31.565019+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "id": "0cad8cc5", + "metadata": { + "id": "WDatJHaBsJP5", + "papermill": { + "duration": 0.018268, + "end_time": "2026-04-09T14:46:31.620424+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:31.602156+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "id": "acd0984d", + "metadata": { + "id": "9TnXzZYMsJP5", + "papermill": { + "duration": 0.018422, + "end_time": "2026-04-09T14:46:31.657445+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:31.639023+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "cf52976a", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:46:31.696042Z", + "iopub.status.busy": "2026-04-09T14:46:31.695595Z", + "iopub.status.idle": "2026-04-09T14:46:36.809838Z", + "shell.execute_reply": "2026-04-09T14:46:36.808831Z" + }, + "id": "B2T3Z8F2sJP5", + "papermill": { + "duration": 5.1357, + "end_time": "2026-04-09T14:46:36.811554+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:31.675854+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "try: import numpy, PIL; _numpy = f\"numpy=={numpy.__version__}\"; _pil = f\"pillow=={PIL.__version__}\"\n", + "except: _numpy = \"numpy\"; _pil = \"pillow\"\n", + "!uv pip install -qqq \\\n", + " \"torch>=2.8.0\" 
\"triton>=3.4.0\" {_numpy} {_pil} torchvision bitsandbytes \\\n", + " unsloth \"unsloth_zoo>=2026.4.6\" transformers==5.5.0 torchcodec timm" + ] + }, + { + "cell_type": "markdown", + "id": "77bdc102", + "metadata": { + "id": "TGMWlrRdzwgf", + "papermill": { + "duration": 0.018575, + "end_time": "2026-04-09T14:46:36.849477+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:36.830902+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1c116df3", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:46:36.887792Z", + "iopub.status.busy": "2026-04-09T14:46:36.887484Z", + "iopub.status.idle": "2026-04-09T14:56:32.181348Z", + "shell.execute_reply": "2026-04-09T14:56:32.180581Z" + }, + "id": "-Xbb0cuLzwgf", + "outputId": "3d9968a4-cb68-4112-aa41-29a78c0c8ae0", + "papermill": { + "duration": 595.315071, + "end_time": "2026-04-09T14:56:32.183208+00:00", + "exception": false, + "start_time": "2026-04-09T14:46:36.868137+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 2. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.35. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9979852f7f1e4abfaab19c8fe7df76bf", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors.index.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "000e4832d81f42fd81d516e06acf2981", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3c3e50bf8e254e9e9f5b14e4bc314c03", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3a992f62", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:56:32.361453Z", + "iopub.status.busy": "2026-04-09T14:56:32.360986Z", + "iopub.status.idle": "2026-04-09T14:59:10.830340Z", + "shell.execute_reply": "2026-04-09T14:59:10.829715Z" + }, + "id": "9jGeSb9bWe0k", + "outputId": "e95da582-2668-4936-9aec-3ffdde2bb771", + "papermill": { + "duration": 158.492174, + "end_time": "2026-04-09T14:59:10.832047+00:00", + "exception": false, + "start_time": "2026-04-09T14:56:32.339873+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The animal in the image is a sloth. 
Sloths have appeared in several popular films, most notably:\n", + "\n", + "* **Zootopia (2016):** Featuring the character Flash, a slow-moving sloth who works at the Department of Mammal Vehicles.\n", + "* **Ice Age (2002):** Featuring Sid, a ground sloth who is one of the main protagonists of the series.\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "id": "0a1450eb", + "metadata": { + "id": "eh0BzbZPWtRD", + "papermill": { + "duration": 0.021884, + "end_time": "2026-04-09T14:59:10.878105+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:10.856221+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a72ce8c6", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:59:10.923282Z", + "iopub.status.busy": "2026-04-09T14:59:10.923007Z", + "iopub.status.idle": "2026-04-09T14:59:48.128345Z", + "shell.execute_reply": "2026-04-09T14:59:48.127693Z" + }, + "id": "R3ExuK8cWuT3", + "outputId": "3a0a6496-79ef-4921-8dd5-96ba5ba77fc9", + "papermill": { + "duration": 37.229725, + "end_time": "2026-04-09T14:59:48.129962+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:10.900237+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In the emerald heart of the canopy high,\n", + "Where the orchids bloom and the parrots fly,\n", + "Dwells a gentle soul in a velvet coat,\n", + "In a slow-motion world, a drifting boat.\n", + "\n", + "With curved, steady claws and a sleepy gaze,\n", + "He wanders through gold-dappled, humid haze.\n", + "No rush for the fruit, no race for the prize,\n", + "Just the soft, steady blink of heavy-lidded eyes.\n", + "\n", + "He is the master of the patient art,\n", + "With a rhythmic beat in a quiet heart.\n", + "While the monkeys chatter and the jaguars leap,\n", + "The sloth\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "id": "e77fbff4", + "metadata": { + "id": "Bw5XPyYFajyM", + "papermill": { + "duration": 0.025126, + "end_time": "2026-04-09T14:59:48.181360+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:48.156234+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "You can finetune the vision and text parts for now through selection - the audio part can also be finetuned - we're 
working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "id": "acb794c4", + "metadata": { + "id": "SXd9bTZd1aaL", + "papermill": { + "duration": 0.025143, + "end_time": "2026-04-09T14:59:48.231346+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:48.206203+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "We now add LoRA adapters so we only need to update a small amount of parameters!" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6fd5b894", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:59:48.282613Z", + "iopub.status.busy": "2026-04-09T14:59:48.282332Z", + "iopub.status.idle": "2026-04-09T14:59:57.633319Z", + "shell.execute_reply": "2026-04-09T14:59:57.632673Z" + }, + "id": "6bZsfBuZDeCL", + "papermill": { + "duration": 9.379235, + "end_time": "2026-04-09T14:59:57.635217+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:48.255982+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5dc4ec82", + "metadata": { + "id": "vITh0KVJ10qX", + "papermill": { + "duration": 0.024706, + "end_time": "2026-04-09T14:59:57.686384+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:57.661678+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation style 
finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fc98513", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:59:57.737313Z", + "iopub.status.busy": "2026-04-09T14:59:57.737037Z", + "iopub.status.idle": "2026-04-09T14:59:57.741858Z", + "shell.execute_reply": "2026-04-09T14:59:57.741196Z" + }, + "id": "LjY75GoYUCB8", + "papermill": { + "duration": 0.032282, + "end_time": "2026-04-09T14:59:57.743246+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:57.710964+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "63328983", + "metadata": { + "id": "ZQkXuGYxbJ-e", + "papermill": { + "duration": 0.024742, + "end_time": "2026-04-09T14:59:57.793028+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:57.768286+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "8e94eaa0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:59:57.844051Z", + "iopub.status.busy": "2026-04-09T14:59:57.843793Z", + "iopub.status.idle": "2026-04-09T15:00:01.697881Z", + "shell.execute_reply": "2026-04-09T15:00:01.696967Z" + }, + "id": 
"Mkq4RvEq7FQr", + "outputId": "43449058-686e-430f-c009-c3e795af23ac", + "papermill": { + "duration": 3.881705, + "end_time": "2026-04-09T15:00:01.699372+00:00", + "exception": false, + "start_time": "2026-04-09T14:59:57.817667+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "60adcfaac9cc483284f4cdc1cead9de4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`''`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "5cc67ac7", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:03.355512Z", + "iopub.status.busy": "2026-04-09T15:00:03.354667Z", + "iopub.status.idle": "2026-04-09T15:00:07.915139Z", + "shell.execute_reply": "2026-04-09T15:00:07.914463Z" + }, + "id": "1ahE8Ys37JDJ", + "outputId": "11a09b9e-9fab-45cc-aef4-81b027ebc64b", + "papermill": { + "duration": 4.589949, + "end_time": "2026-04-09T15:00:07.916517+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:03.326568+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c9cb90312b97442bb895e73d6329c5df", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "id": "bfb67b68", + "metadata": { + "id": "ndDUB23CGAC5", + "papermill": { + "duration": 0.025205, + "end_time": "2026-04-09T15:00:07.968708+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:07.943503+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's see how the chat 
template did! Notice there is no `` token as the processor tokenizer will be adding one." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "54b06431", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:08.021699Z", + "iopub.status.busy": "2026-04-09T15:00:08.021002Z", + "iopub.status.idle": "2026-04-09T15:00:08.026602Z", + "shell.execute_reply": "2026-04-09T15:00:08.025687Z" + }, + "id": "gGFzmplrEy9I", + "outputId": "7d50b4dc-d3ff-49fd-dba6-7511ee9c7c07", + "papermill": { + "duration": 0.034027, + "end_time": "2026-04-09T15:00:08.028170+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:07.994143+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. 
Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "id": "c7255ddf", + "metadata": { + "id": "idAEIeSQ3xdS", + "papermill": { + "duration": 0.025334, + "end_time": "2026-04-09T15:00:08.080059+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:08.054725+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "124b95e5", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:08.132853Z", + "iopub.status.busy": "2026-04-09T15:00:08.132156Z", + "iopub.status.idle": "2026-04-09T15:00:52.584313Z", + "shell.execute_reply": "2026-04-09T15:00:52.583576Z" + }, + "id": "95_Nn-89DhsL", + "outputId": "369cc7d6-dfd2-4e4a-c117-495ffd50d06d", + "papermill": { + "duration": 44.480351, + "end_time": "2026-04-09T15:00:52.586015+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:08.105664+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a1beb7697e21449a90a6dfdb59e2eb63", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=8): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "992d039a", + "metadata": { + "id": "Dv1NBUozV78l", + "papermill": { + "duration": 0.027396, + "end_time": "2026-04-09T15:00:57.191777+00:00", + "exception": false, + "start_time": 
"2026-04-09T15:00:57.164381+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's verify masking the instruction part is done! Let's print the 100th row again. Notice how the sample only has a single `` as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "ab585e6a", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:57.245666Z", + "iopub.status.busy": "2026-04-09T15:00:57.245320Z", + "iopub.status.idle": "2026-04-09T15:00:57.252022Z", + "shell.execute_reply": "2026-04-09T15:00:57.251352Z" + }, + "id": "LtsMVtlkUhja", + "outputId": "68552bcc-1c20-48e9-8a78-505dced229c9", + "papermill": { + "duration": 0.035425, + "end_time": "2026-04-09T15:00:57.253447+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.218022+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. 
Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "id": "2cbf36da", + "metadata": { + "id": "4Kyjy__m9KY3", + "papermill": { + "duration": 0.025784, + "end_time": "2026-04-09T15:00:57.306088+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.280304+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "989de0a0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:57.359466Z", + "iopub.status.busy": "2026-04-09T15:00:57.358849Z", + "iopub.status.idle": "2026-04-09T15:00:57.364903Z", + "shell.execute_reply": "2026-04-09T15:00:57.364262Z" + }, + "id": "_rD6fl8EUxnG", + "outputId": "b44c5e2f-068f-4a31-a31e-34ed10678a6d", + "papermill": { + "duration": 0.034134, + "end_time": "2026-04-09T15:00:57.366221+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.332087+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "' <|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. 
By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "05fd3572", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-04-09T15:00:57.420509Z", + "iopub.status.busy": "2026-04-09T15:00:57.419810Z", + "iopub.status.idle": "2026-04-09T15:00:57.425475Z", + "shell.execute_reply": "2026-04-09T15:00:57.424748Z" + }, + "id": "2ejIt2xSNKKp", + "outputId": "7a278f07-915e-44b9-d018-c342eb3d9256", + "papermill": { + "duration": 0.034909, + "end_time": "2026-04-09T15:00:57.427558+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.392649+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "8.258 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "id": "8927128e", + "metadata": { + "id": "CNP1Uidk9mrz", + "papermill": { + "duration": 0.028863, + "end_time": "2026-04-09T15:00:57.486133+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.457270+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "8bbfadef", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:00:57.545312Z", + "iopub.status.busy": "2026-04-09T15:00:57.544669Z", + "iopub.status.idle": "2026-04-09T15:45:32.096909Z", + "shell.execute_reply": "2026-04-09T15:45:32.095920Z" + }, + "id": "yqxqAZ7KJ4oL", + "outputId": "bed52b39-9792-4141-e4aa-3a7f8e3e6d49", + "papermill": { + "duration": 2674.583964, + "end_time": "2026-04-09T15:45:32.098942+00:00", + "exception": false, + "start_time": "2026-04-09T15:00:57.514978+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 2\n", + " \\\\ /| Num examples = 3,000 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 61,214,720 of 31,334,301,232 (0.20% trained)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 40:27, Epoch 0/1]\n", + "
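| Step | Training Loss |
|------|---------------|
| 1 | 2.129590 |
| 2 | 0.628091 |
| 3 | 0.911727 |
| 4 | 1.568517 |
| 5 | 1.258429 |
| 6 | 0.759183 |
| 7 | 0.853166 |
| 8 | 0.742576 |
| 9 | 1.026424 |
| 10 | 1.295154 |
| 11 | 1.213621 |
| 12 | 0.669095 |
| 13 | 0.747510 |
| 14 | 1.042578 |
| 15 | 0.837716 |
| 16 | 0.966854 |
| 17 | 1.068619 |
| 18 | 1.088091 |
| 19 | 1.109440 |
| 20 | 0.957984 |
| 21 | 0.633893 |
| 22 | 1.168129 |
| 23 | 0.992129 |
| 24 | 0.891402 |
| 25 | 0.573305 |
| 26 | 0.488871 |
| 27 | 0.609662 |
| 28 | 0.634687 |
| 29 | 0.555076 |
| 30 | 0.802394 |
| 31 | 0.726872 |
| 32 | 0.673458 |
| 33 | 1.081852 |
| 34 | 0.485952 |
| 35 | 0.927314 |
| 36 | 0.931723 |
| 37 | 0.678981 |
| 38 | 0.992566 |
| 39 | 0.968166 |
| 40 | 0.671915 |
| 41 | 0.837725 |
| 42 | 0.872944 |
| 43 | 0.632054 |
| 44 | 0.529681 |
| 45 | 0.553375 |
| 46 | 1.237677 |
| 47 | 0.634191 |
| 48 | 0.646122 |
| 49 | 0.912598 |
| 50 | 1.033742 |
| 51 | 0.518066 |
| 52 | 0.603996 |
| 53 | 0.598058 |
| 54 | 0.576573 |
| 55 | 0.776600 |
| 56 | 0.912896 |
| 57 | 0.451140 |
| 58 | 0.760951 |
| 59 | 0.855676 |
| 60 | 0.749862 |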

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3947b6c8", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-04-09T15:45:32.155807Z", + "iopub.status.busy": "2026-04-09T15:45:32.155341Z", + "iopub.status.idle": "2026-04-09T15:45:32.162079Z", + "shell.execute_reply": "2026-04-09T15:45:32.161176Z" + }, + "id": "pCqnaKmlO1U9", + "outputId": "881756ba-9573-4a68-ec6e-14e4cf325d93", + "papermill": { + "duration": 0.036173, + "end_time": "2026-04-09T15:45:32.163732+00:00", + "exception": false, + "start_time": "2026-04-09T15:45:32.127559+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2668.3194 seconds used for training.\n", + "44.47 minutes used for training.\n", + "Peak reserved memory = 10.949 GB.\n", + "Peak reserved memory for training = 2.691 GB.\n", + "Peak reserved memory % of max memory = 75.184 %.\n", + "Peak reserved memory for training % of max memory = 18.478 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak 
reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "id": "b53d19a6", + "metadata": { + "id": "ekOmTR1hSNcr", + "papermill": { + "duration": 0.026517, + "end_time": "2026-04-09T15:45:32.218053+00:00", + "exception": false, + "start_time": "2026-04-09T15:45:32.191536+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "d13acec7", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:45:32.273858Z", + "iopub.status.busy": "2026-04-09T15:45:32.272928Z", + "iopub.status.idle": "2026-04-09T15:47:52.202477Z", + "shell.execute_reply": "2026-04-09T15:47:52.201712Z" + }, + "id": "kR3gIAX-SM2q", + "outputId": "cdfa1bd8-62cd-4355-cd4a-1026a16807ea", + "papermill": { + "duration": 139.987318, + "end_time": "2026-04-09T15:47:52.232546+00:00", + "exception": false, + "start_time": "2026-04-09T15:45:32.245228+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\n<|channel>thought\\n13']" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = prepare_gemma_4_inputs(messages)\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for 
longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "id": "0e5235a5", + "metadata": { + "id": "CrSvZObor0lY", + "papermill": { + "duration": 0.026931, + "end_time": "2026-04-09T15:47:52.286412+00:00", + "exception": false, + "start_time": "2026-04-09T15:47:52.259481+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "d145525c", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:47:52.342005Z", + "iopub.status.busy": "2026-04-09T15:47:52.341337Z", + "iopub.status.idle": "2026-04-09T15:48:17.844338Z", + "shell.execute_reply": "2026-04-09T15:48:17.843706Z" + }, + "id": "e2pEuRb1r2Vg", + "outputId": "4c132d52-60a3-4024-8500-3a779f3dc9d3", + "papermill": { + "duration": 25.53225, + "end_time": "2026-04-09T15:48:17.845774+00:00", + "exception": false, + "start_time": "2026-04-09T15:47:52.313524+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<|channel>thought\n", + "The short answer is a phenomenon called **Rayleigh scattering**.\n", + "\n", + "Here is the step-by-step breakdown of why it happens:\n", + "\n", + "### 1. 
Sunlight is a spectrum of colors\n", + "Although sunlight looks white, it is actually made up of all the colors of the rainbow (red, orange\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = prepare_gemma_4_inputs(messages)\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "81ad9893", + "metadata": { + "id": "uMuVrWbjAzhc", + "papermill": { + "duration": 0.029207, + "end_time": "2026-04-09T15:48:17.904807+00:00", + "exception": false, + "start_time": "2026-04-09T15:48:17.875600+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d62d52f4", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:48:17.963190Z", + "iopub.status.busy": "2026-04-09T15:48:17.962913Z", + "iopub.status.idle": "2026-04-09T15:48:21.490170Z", + "shell.execute_reply": "2026-04-09T15:48:21.489520Z" + }, + "id": "upcOlWe7A1vc", + "outputId": "f2c735b5-ddc0-4664-a256-72143545e918", + "papermill": { + "duration": 3.558637, + "end_time": "2026-04-09T15:48:21.491692+00:00", + "exception": false, + "start_time": "2026-04-09T15:48:17.933055+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "id": "c2603e60", + "metadata": { + "id": "AEEcJ4qfC7Lp", + "papermill": { + "duration": 0.029644, + "end_time": "2026-04-09T15:48:21.625446+00:00", + "exception": false, + "start_time": "2026-04-09T15:48:21.595802+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "f2cb4825", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:48:21.685738Z", + "iopub.status.busy": "2026-04-09T15:48:21.685047Z", + "iopub.status.idle": "2026-04-09T15:49:12.173570Z", + "shell.execute_reply": "2026-04-09T15:49:12.172820Z" + }, + "id": "MKX_XKs_BNZR", + "outputId": "b12693a9-bca7-4e37-e9d4-1ceeed5b6907", + "papermill": { + 
"duration": 50.519795, + "end_time": "2026-04-09T15:49:12.175128+00:00", + "exception": false, + "start_time": "2026-04-09T15:48:21.655333+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<|channel>thought\n", + "As of my current knowledge, **there is no official \"Gemma-4\" model released by Google.**\n", + "\n", + "The current state of the Gemma family (Google's open-weights model series) is as follows:\n", + "\n", + "1. **Gemma (v1):** The initial release of open-weights models based on the same technology as Gemini.\n", + "2. **Gemma 2:** The second generation, which introduced a different architecture (including sliding window attention and logit soft-capping) and offered significantly improved performance, particularly in the 9B and 27B parameter sizes.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = prepare_gemma_4_inputs(messages)\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "bb8349cb", + "metadata": { + "id": "f422JgM9sdVT", + "papermill": { + "duration": 0.031637, + "end_time": "2026-04-09T15:49:12.239235+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.207598+00:00", + "status": "completed" + }, + "tags": [] + }, + 
"source": [ + "### Saving to float16 for VLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run!" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "10b57d3a", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:49:12.302340Z", + "iopub.status.busy": "2026-04-09T15:49:12.301943Z", + "iopub.status.idle": "2026-04-09T15:49:12.305586Z", + "shell.execute_reply": "2026-04-09T15:49:12.304927Z" + }, + "id": "iHjt_SMYsd3P", + "papermill": { + "duration": 0.036802, + "end_time": "2026-04-09T15:49:12.307072+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.270270+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "id": "a0acca1d", + "metadata": { + "id": "z6O48DbNIAr0", + "papermill": { + "duration": 0.030509, + "end_time": "2026-04-09T15:49:12.370071+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.339562+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "bca9bebf", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:49:12.433167Z", + "iopub.status.busy": "2026-04-09T15:49:12.432708Z", + "iopub.status.idle": "2026-04-09T15:49:12.436285Z", + "shell.execute_reply": "2026-04-09T15:49:12.435702Z" + }, + "id": "ZV-CiKPrIFG0", + "papermill": { + "duration": 0.036759, + "end_time": "2026-04-09T15:49:12.437534+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.400775+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "4054eea7", + "metadata": { + "id": "TCv4vXHd61i7", + "papermill": { + "duration": 0.031055, + "end_time": "2026-04-09T15:49:12.500516+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.469461+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "e762204f", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:49:12.566397Z", + "iopub.status.busy": "2026-04-09T15:49:12.565692Z", + "iopub.status.idle": "2026-04-09T15:49:12.569767Z", + "shell.execute_reply": "2026-04-09T15:49:12.568969Z" + }, + "id": "FqfebeAdT073", + "papermill": { + "duration": 0.037988, + "end_time": "2026-04-09T15:49:12.571165+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.533177+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "b216983e", + "metadata": { + "id": "Q974YEVPI7JS", + "papermill": { + "duration": 0.031342, + "end_time": "2026-04-09T15:49:12.634542+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.603200+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "26217777", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T15:49:12.699012Z", + "iopub.status.busy": "2026-04-09T15:49:12.698323Z", + "iopub.status.idle": "2026-04-09T15:49:12.702407Z", + "shell.execute_reply": "2026-04-09T15:49:12.701764Z" + }, + "id": "ZgcJIhJ0I_es", + "papermill": { + "duration": 0.038027, + "end_time": "2026-04-09T15:49:12.703918+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.665891+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "6d9d4e67", + "metadata": { + "id": "pnz9QOYTMvbH", + "papermill": { + "duration": 0.031057, + "end_time": "2026-04-09T15:49:12.766911+00:00", + "exception": false, + "start_time": "2026-04-09T15:49:12.735854+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. 
[Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [] + }, + "kaggle": { + "accelerator": "nvidiaTeslaT4", + "dataSources": [ + { + "databundleVersionId": 16421787, + "sourceId": 134561, + "sourceType": "competition" + } + ], + "dockerImageVersionId": 31329, + "isGpuEnabled": true, + "isInternetEnabled": true, + "language": "python", + "sourceType": "notebook" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.12" + }, + "papermill": { + "default_parameters": {}, + "duration": 3769.199235, + "end_time": "2026-04-09T15:49:17.022422+00:00", + "environment_variables": {}, + "exception": null, + "input_path": "__notebook__.ipynb", + "output_path": "__notebook__.ipynb", + "parameters": {}, + "start_time": "2026-04-09T14:46:27.823187+00:00", + "version": "2.7.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "000e4832d81f42fd81d516e06acf2981": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e24c7d620b6e4a0ba3ab31434fe4e2d1", + "IPY_MODEL_b99c3f0ccfcf4ede995ed99c0aaa94c9", + "IPY_MODEL_ef6189642fa34589b561a49d9e3df4b6" + ], + "layout": 
"IPY_MODEL_f5839673c4a649b6b3c7c767039e9659", + "tabbable": null, + "tooltip": null + } + }, + "00719cd6018c4f90bf566c624cca0e54": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "0093ad16714942fda671c8bdee823523": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0161b70d1f0a48fa9031f7e5fdb82d58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": 
"2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "03cf201e1dbd4f6c9ad0674ca5de0c0c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_0093ad16714942fda671c8bdee823523", + "placeholder": "​", + "style": "IPY_MODEL_f70e4406ca49447eae7f96ac39943f75", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:39<00:00, 81.93 examples/s]" + } + }, + "041bd7a313a24b09a69e1e74826989cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "07e2931fb3e1481eba7aefd3b760f469": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "083343851c584e7ca75b16a5d0b81242": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", 
+ "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "083d2f2c9cf9465196c7784ffce47835": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8fab7a6fb3694c18b32db45b8b7ea5c9", + "max": 1188.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_4b021943c5cc4543b0c5ab214d66082c", + "tabbable": null, + "tooltip": null, + "value": 1188.0 + } + }, + "08768bf98c184de38f72d472491bdde8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_7ce39791cdd24f2da44cddc865955e8b", + "placeholder": "​", + "style": "IPY_MODEL_13e9ad27dd974667a2abe655c19e9b5c", + "tabbable": null, + "tooltip": null, + "value": "chat_template.jinja: " + } + }, + "0db0737c77484f879d3784926cf91535": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + 
"_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ca44c1033c9d4081836bc565c16afaaf", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_cb1f3e709d8f448aa1dfc26acad5f183", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "0e9b584a9a834bbd94b2ce845711fc7e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "0f898b29b5aa41a1828bf2fcdb3e2549": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": 
null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fca689ef23844d9833927002b847724": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_88a12d9d1ba64562ba8fd4407cfe9a54", + "placeholder": "​", + "style": "IPY_MODEL_dd3eb04ec7fe48bb86aae0b6dcf21b21", + "tabbable": null, + "tooltip": null, + "value": "Computing checksums: 100%" + } + }, + "1191f87c5f084eed9738933497f740bf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_932af61e0f874cc5b7ab0a3d338955f3", + "IPY_MODEL_083d2f2c9cf9465196c7784ffce47835", + "IPY_MODEL_5391e4607cf34c399f9d7d78a8bb4573" + ], + "layout": "IPY_MODEL_32b45a03d5ec46afb7e79b9892c390aa", + "tabbable": null, + "tooltip": null + } + }, + "119c448203264670ae619d5f195e32e1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": 
"HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "12342c4995124f3a93862b18a561172e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "12d481c5c4b448599ad0664fa4d8c186": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_0f898b29b5aa41a1828bf2fcdb3e2549", + "placeholder": "​", + "style": "IPY_MODEL_dcaed7f2f42c4c3590de018062a11cd2", + "tabbable": null, + "tooltip": null, + "value": " 100000/100000 [00:00<00:00, 130505.14 examples/s]" + } + }, + "13e9ad27dd974667a2abe655c19e9b5c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + 
"text_color": null + } + }, + "148aff2ee3d849f195ead7bd3005417f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4645b8bd0b0045fe8bb42cb9b3d388ba", + "placeholder": "​", + "style": "IPY_MODEL_23a92666f0194eb4a35f2d483b694044", + "tabbable": null, + "tooltip": null, + "value": " 1.69k/? [00:00<00:00, 122kB/s]" + } + }, + "15dd2b3c856b48028061df9393e5933e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "179efa0635584d8da037e7dc31357c0b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d83b55de7f2546b79ab355c0c5a3f425", + "IPY_MODEL_7cb6e843c9464063bedb75c790dac142", + "IPY_MODEL_3ad354a593364689bfd9e7fc57c3acb3" + ], + "layout": "IPY_MODEL_07e2931fb3e1481eba7aefd3b760f469", + "tabbable": null, + "tooltip": null + } + }, + 
"1bf4a66851d74ab9aa8b1ac1e6b6077e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1d5d11772e854845b3da2ba42f8b31f5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1dbbcbb641d0457da9a617d7c5e66c60": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1e5fc2fd6913484f9f5d9201087a450e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + 
"justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "207687f8a8584c839b231cc20da58659": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2196782266904e68be8a32ac0d6a0f53": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + 
"_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "2369db8f54c043d58b7b5934dd21ee02": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "23a92666f0194eb4a35f2d483b694044": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "24b7aeb3de9347428e4b5d7227e808de": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "24d2ca473bf34698a66c6a9764adf4b8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a224a8553c0e47cfa4fed1344bd1b7d6", + "placeholder": "​", + "style": "IPY_MODEL_b4fb8951c7804c279a7926b2c13edd59", + "tabbable": null, + "tooltip": null, + "value": "Fetching 2 files: 100%" + } + }, + "2736f1e3555446169cd2706f4a491421": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_37622ad056924c6f8f89e0262cab2b97", + "placeholder": "​", + "style": "IPY_MODEL_083343851c584e7ca75b16a5d0b81242", + "tabbable": null, + "tooltip": null, + "value": " 120k/? [00:00<00:00, 9.24MB/s]" + } + }, + "2760d298c9df414ab4137112bf32d8fd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "27a8a12da1034702a64ccedb2981c575": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f3ed4f1b125c4432b1cdd671930bfe5f", + "IPY_MODEL_53ee2477910b4b0eb151202487804a2b", + "IPY_MODEL_ca6c999e4b8a47509d70d5fdbf3c295e" + ], + "layout": "IPY_MODEL_d47270348cd34660947ff133088379de", + "tabbable": null, + "tooltip": null + } + }, + "2bd6c2a05f83417eb8f881f3de8d5eb4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d18e2632f27e49239a678db16247a2e9", + "max": 100000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_6946150c001a4d0aa1850c2bee69c0b0", + "tabbable": null, + "tooltip": null, + "value": 100000.0 + } + }, + "2ccf61cc2c69496f9726cbe6e95b9d8b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + 
"border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2d391081595c436991ad4a7e02e6d536": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + 
}, + "2d9557aa49434ff3a772a05a66fffc40": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2f341cf202db4feeac7652903288201d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "30f5c6c6cb044296b80f43162e68d94b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_84fbd41a3ce244c5b73af0f08d747c1c", + "placeholder": "​", + "style": "IPY_MODEL_d51a571b1c424eff8ce17a56255d7891", + "tabbable": null, + "tooltip": null, + "value": "tokenizer_config.json: " + } + }, + "31df175d9a6e4aeeb2ebcf545fba09a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8832d1cdbca64c74a38a5d6045ba0e18", + "placeholder": "​", + "style": "IPY_MODEL_0161b70d1f0a48fa9031f7e5fdb82d58", + "tabbable": null, + "tooltip": null, + "value": "Map (num_proc=8): 100%" + } + }, + "32b45a03d5ec46afb7e79b9892c390aa": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "363e1e28799d444aa8002941426bd8c3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "37622ad056924c6f8f89e0262cab2b97": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38b7b0a0accc43f9be79506f3aa10940": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_95166084fce04f168194727aa818c0d4", + "placeholder": "​", + "style": "IPY_MODEL_d7eb5dc7004941c1b02f1d433f6d8ffc", + "tabbable": null, + "tooltip": null, + "value": " 2/2 [04:12<00:00, 252.67s/it]" + } + }, + "38f9cb5906f3454ea3e53f7424a51159": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2369db8f54c043d58b7b5934dd21ee02", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_7f805ff9f8b64e40a6f4c1100ca539b2", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "3a654aaa88874576922d2ead1ba12d33": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3ad354a593364689bfd9e7fc57c3acb3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5aedaac067804ed89a5def5319f4a4cd", + "placeholder": "​", + "style": "IPY_MODEL_abd94ffbb5c7432c8d2df6ea6bd591c7", + "tabbable": null, + "tooltip": null, + "value": " 208/208 [00:00<00:00, 22.5kB/s]" + } + }, + "3bb6d52047824c80a2656d5e2e5974b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "3bce8516d41e4055932a098594b5647c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3c3e50bf8e254e9e9f5b14e4bc314c03": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_24d2ca473bf34698a66c6a9764adf4b8", + "IPY_MODEL_bc59cc9cb6164af49f6250adcfcc6f36", + "IPY_MODEL_38b7b0a0accc43f9be79506f3aa10940" + ], + "layout": "IPY_MODEL_a38ac0a167fe4278b7360df5ef18cc06", + "tabbable": null, + "tooltip": null + } + }, + "3d0f2fec3da64771b9a714c34882a99e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"3f76bba3dafe47c4b2bf3e3739a7eea4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "40f6fcdae0c74d06ad9060c54adaa8af": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4185e1da42694d0a855bcdafe4bd51d4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e6bdb5db84ff46979adb6f9521205a0a", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_873fd6a2d5f440a0b0311ad2754b1218", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "42749654c90f4282babf172607a4e202": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "440f09af00f3435aaa266dc8fafa02e4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e591ddfdc2454d1286ff320af25a3f8b", + "placeholder": "​", + "style": "IPY_MODEL_7de54c30e0d54241b5824bc96b7c14ed", + "tabbable": null, + "tooltip": null, + "value": " 982/982 [00:00<00:00, 98.6kB/s]" + } + }, + "45b0d3dba1324d9f86abb361dd5a498d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "4645b8bd0b0045fe8bb42cb9b3d388ba": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + 
"justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4735eeafdac74876a36ef26696967673": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "49a12fc2587747b28bdd4547a1153f4d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": 
"LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4a79f9108d514ff5a3728c8915c4accc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_de7a1f2d9ad4452188d88bb2890ff55d", + "placeholder": "​", + "style": "IPY_MODEL_12342c4995124f3a93862b18a561172e", + "tabbable": null, + "tooltip": null, + "value": " 12.0k/? 
[00:00<00:00, 1.30MB/s]" + } + }, + "4b021943c5cc4543b0c5ab214d66082c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4b1c3a1c826746ff9b8f8575047254ac": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4e1dc4ae3c4c43e79c1daccb1cfe2c4f", + "placeholder": "​", + "style": "IPY_MODEL_00719cd6018c4f90bf566c624cca0e54", + "tabbable": null, + "tooltip": null, + "value": "model.safetensors.index.json: " + } + }, + "4bb5c79cbf3145b5884dcf19f10252e7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "4bfcde758855484bbcb2247d8f11ddcb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_de3c8be3b41348139129ec7411317470", + "IPY_MODEL_0db0737c77484f879d3784926cf91535", + "IPY_MODEL_b5e0b331842c4d048b89f99de909316f" + ], + "layout": "IPY_MODEL_c63b5c9009dd4ab387283824f660adca", + "tabbable": null, + "tooltip": null + } + }, + "4e1dc4ae3c4c43e79c1daccb1cfe2c4f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4e1f4a63a3c0486d8d42c145004ce665": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": 
"2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "50231bc20fa74317819198329e62097d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "530a4830fa7243ff9d98f7e523221f3b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + 
"bar_color": null, + "description_width": "" + } + }, + "5391e4607cf34c399f9d7d78a8bb4573": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_65db81818b584084984a73f98f32f980", + "placeholder": "​", + "style": "IPY_MODEL_bf75830bd2f74e5091f7cdc50247c26a", + "tabbable": null, + "tooltip": null, + "value": " 1188/1188 [04:14<00:00, 25.34it/s]" + } + }, + "53b9fef768494db092dc06ebf11118cf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_08768bf98c184de38f72d472491bdde8", + "IPY_MODEL_ad392e4f7e294c48a576d7b613ac74b4", + "IPY_MODEL_4a79f9108d514ff5a3728c8915c4accc" + ], + "layout": "IPY_MODEL_b69eb0115bf44847afc9b12957f68f13", + "tabbable": null, + "tooltip": null + } + }, + "53ee2477910b4b0eb151202487804a2b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", 
+ "description_allow_html": false, + "layout": "IPY_MODEL_f2e8c6e590a24ac28df404cc43bef213", + "max": 116531415.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_42749654c90f4282babf172607a4e202", + "tabbable": null, + "tooltip": null, + "value": 116531415.0 + } + }, + "56196278224341fda521ab23c67352ef": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7a0dc5bd17c2407ea68c09c51aaaea05", + "IPY_MODEL_5973d02b637648b08588c56afe5698d0", + "IPY_MODEL_148aff2ee3d849f195ead7bd3005417f" + ], + "layout": "IPY_MODEL_5c5d3a0061d24907be28581eabb02cd5", + "tabbable": null, + "tooltip": null + } + }, + "586423c6f5c24604930622f7d9c41c01": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_0fca689ef23844d9833927002b847724", + "IPY_MODEL_4185e1da42694d0a855bcdafe4bd51d4", + "IPY_MODEL_652ac436b21f4b3ea20e515e29331ad9" + ], + "layout": "IPY_MODEL_7a0bb05a560f42738ad2ddefe9e6ef1f", + "tabbable": null, + "tooltip": null + } + }, + "5973d02b637648b08588c56afe5698d0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d5515569279c4be38be81665b7f78bc8", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_1dbbcbb641d0457da9a617d7c5e66c60", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "5a22224994c14e568f44f92c24b67174": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5aedaac067804ed89a5def5319f4a4cd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5b943c7cbf4a4ae39a10aa77037313da": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "5beb490b2ec447aba772341000c61aa2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5c5d3a0061d24907be28581eabb02cd5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": 
"2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "600cb4fa52324d13a44c27cb518ef4b8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": 
null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "60adcfaac9cc483284f4cdc1cead9de4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_81cc1a9a180d455ba112a8e2fc4370de", + "IPY_MODEL_8d2c770968ea4aa895b34d91e8fc4200", + "IPY_MODEL_440f09af00f3435aaa266dc8fafa02e4" + ], + "layout": "IPY_MODEL_3bce8516d41e4055932a098594b5647c", + "tabbable": null, + "tooltip": null + } + }, + "62599fabfe704fafbe1c481930a06440": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_aaf1a00ae4c7477a9f202ef7f71c7521", + "placeholder": "​", + "style": "IPY_MODEL_3bb6d52047824c80a2656d5e2e5974b6", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:01<00:00, 284.12 examples/s]" + } + }, + "62b2dc6228b84ccf889646fa48b788be": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "652ac436b21f4b3ea20e515e29331ad9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1e5fc2fd6913484f9f5d9201087a450e", + "placeholder": "​", + "style": "IPY_MODEL_0e9b584a9a834bbd94b2ce845711fc7e", + "tabbable": null, + "tooltip": null, + "value": " 1/1 [00:00<00:00, 157.88it/s]" + } + }, + "65db81818b584084984a73f98f32f980": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6917999552634b0b8dbb90f02f21b86e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "6946150c001a4d0aa1850c2bee69c0b0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "6a062426ee974fee9ffdb29501f6d08c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_041bd7a313a24b09a69e1e74826989cf", + "placeholder": "​", + "style": "IPY_MODEL_50231bc20fa74317819198329e62097d", + "tabbable": null, + "tooltip": null, + "value": "Unsloth: Tokenizing [\"text\"] (num_proc=8): 100%" + } + }, + "6b92f3719f884226afabf272385d45cb": { + "model_module": "@jupyter-widgets/controls", +
"model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "6d9ef09e4a9d416a9a46f9dc1bbaf4c0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "700441d98fed4085ac97bc22c18c55da": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e8ccd117e4e14a198c189d1e631844ad", + "IPY_MODEL_8385a3383d814b7c9536b64ad713643a", + "IPY_MODEL_a23e5e0af2dc48a28fab3735caeddb79" + ], + "layout": "IPY_MODEL_1bf4a66851d74ab9aa8b1ac1e6b6077e", + "tabbable": null, + "tooltip": null + } + }, + "70f9749da14748a9a1512b203e1bea68": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "725a558139214e7dba2f617c9592af35": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_31df175d9a6e4aeeb2ebcf545fba09a8", + "IPY_MODEL_ea8eb5cb677e4746945b26aac468097b", + "IPY_MODEL_62599fabfe704fafbe1c481930a06440" + ], + "layout": "IPY_MODEL_7bd4d35fdb7646fdb8c3b680e50e2f30", + "tabbable": null, + "tooltip": null + } + }, + "7318c1b2e3ae4e53a58b4497fc22a420": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "73c57dd7f29c4f73bef37d8355172ce2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7464a9c6c9b14891b98647dec7600353": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "78d33fdc643242dca3a2cbd98915eebe": { + 
"model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7a0bb05a560f42738ad2ddefe9e6ef1f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": 
null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7a0dc5bd17c2407ea68c09c51aaaea05": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ef339cb572e343618ab7737d678b7551", + "placeholder": "​", + "style": "IPY_MODEL_62b2dc6228b84ccf889646fa48b788be", + "tabbable": null, + "tooltip": null, + "value": "processor_config.json: " + } + }, + "7bd4d35fdb7646fdb8c3b680e50e2f30": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7cb6e843c9464063bedb75c790dac142": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_24b7aeb3de9347428e4b5d7227e808de", + "max": 208.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_bdea0e2fabcc4340baaf3da97acaca58", + "tabbable": null, + "tooltip": null, + "value": 208.0 + } + }, + "7ce39791cdd24f2da44cddc865955e8b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7de54c30e0d54241b5824bc96b7c14ed": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "7f805ff9f8b64e40a6f4c1100ca539b2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "807c2055b0db4db6b6ba6ca0476ad38c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4735eeafdac74876a36ef26696967673", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_8f92a9cccc394baa9c5ae55af29bd27c", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "8133ec6564f54dcaad20564a2d278d03": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ff04312b4a664b24ad0030bb947be894", + "placeholder": "​", + "style": "IPY_MODEL_6b92f3719f884226afabf272385d45cb", + "tabbable": null, + "tooltip": null, + "value": " 32.2M/32.2M [00:00<00:00, 160MB/s]" + } + }, + "81cc1a9a180d455ba112a8e2fc4370de": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d1f4f04d8a2342839bf3fc5ae26fb7cf", + "placeholder": "​", + "style": "IPY_MODEL_f24969df204a4a1bbfb64939f6793513", + "tabbable": null, + "tooltip": null, + "value": "README.md: 100%" + } + }, + "8385a3383d814b7c9536b64ad713643a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ad53f1dc09ad40288f89f3850edb66b4", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_84d56b9dfc364ffda7916cdb4ba3a8fc", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "8423817f2b4b4d6a81b467023ce01994": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "84d56b9dfc364ffda7916cdb4ba3a8fc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "84e0e573d4cc40baab6224039c4326bf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + 
"description_allow_html": false, + "layout": "IPY_MODEL_acabee228a184af8a71c0628bd0c089a", + "placeholder": "​", + "style": "IPY_MODEL_7318c1b2e3ae4e53a58b4497fc22a420", + "tabbable": null, + "tooltip": null, + "value": "tokenizer.json: 100%" + } + }, + "84fbd41a3ce244c5b73af0f08d747c1c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "873fd6a2d5f440a0b0311ad2754b1218": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": 
"" + } + }, + "8832d1cdbca64c74a38a5d6045ba0e18": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "88a12d9d1ba64562ba8fd4407cfe9a54": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8cb34722fb9940b2960cdb7a0164c21d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8d2c770968ea4aa895b34d91e8fc4200": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": 
"FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a2fa186beb3e4407a248d483e7ec1482", + "max": 982.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_530a4830fa7243ff9d98f7e523221f3b", + "tabbable": null, + "tooltip": null, + "value": 982.0 + } + }, + "8f92a9cccc394baa9c5ae55af29bd27c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8fab7a6fb3694c18b32db45b8b7ea5c9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "90ff42f56a3449809883451674e813ed": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "932af61e0f874cc5b7ab0a3d338955f3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_fae74896cb8b480eb6e3d150d4be1125", + "placeholder": "​", + "style": "IPY_MODEL_e23404206b124069a9e353806b328527", + "tabbable": null, + "tooltip": null, + "value": "Loading weights: 100%" + } + }, + "9385d3f43b574a5aa445961890e3ca60": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "945d6d985ea1430bad5da9dd52d5bf17": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "95166084fce04f168194727aa818c0d4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + 
"bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9835553764e340b084b6c9b4192dede6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9979852f7f1e4abfaab19c8fe7df76bf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4b1c3a1c826746ff9b8f8575047254ac", + "IPY_MODEL_b99659e5aa0e49a49925b8d3373b33fa", + "IPY_MODEL_2736f1e3555446169cd2706f4a491421" + ], + "layout": "IPY_MODEL_363e1e28799d444aa8002941426bd8c3", + "tabbable": null, + "tooltip": null + } + }, + "a1beb7697e21449a90a6dfdb59e2eb63": { + "model_module": "@jupyter-widgets/controls", + 
"model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6a062426ee974fee9ffdb29501f6d08c", + "IPY_MODEL_38f9cb5906f3454ea3e53f7424a51159", + "IPY_MODEL_03cf201e1dbd4f6c9ad0674ca5de0c0c" + ], + "layout": "IPY_MODEL_6d9ef09e4a9d416a9a46f9dc1bbaf4c0", + "tabbable": null, + "tooltip": null + } + }, + "a1eeb36b3ddd4d949c90bd4b2226bcdb": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a224a8553c0e47cfa4fed1344bd1b7d6": { + "model_module": "@jupyter-widgets/base", + 
"model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a23e5e0af2dc48a28fab3735caeddb79": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e9b45ef12abd4bbd8d371f763dacf55b", + "placeholder": "​", + "style": "IPY_MODEL_2196782266904e68be8a32ac0d6a0f53", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:01<00:00, 459.00 examples/s]" + } + }, + "a2fa186beb3e4407a248d483e7ec1482": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a38ac0a167fe4278b7360df5ef18cc06": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": 
null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aaf1a00ae4c7477a9f202ef7f71c7521": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "abd94ffbb5c7432c8d2df6ea6bd591c7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "acabee228a184af8a71c0628bd0c089a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ad392e4f7e294c48a576d7b613ac74b4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5b943c7cbf4a4ae39a10aa77037313da", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_2d9557aa49434ff3a772a05a66fffc40", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "ad53f1dc09ad40288f89f3850edb66b4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ae024d064a3c412bb13038c52ddcaa03": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2d391081595c436991ad4a7e02e6d536", + "max": 32169626.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_3a654aaa88874576922d2ead1ba12d33", + "tabbable": null, + "tooltip": null, + "value": 32169626.0 + } + }, + "af5f841a2cdc400391c794560bf15a45": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b3feb364e98b4876a9cbfcb8c08d6c18": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": 
"HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e131eaea413c4647a7694d8f2a3f67ed", + "placeholder": "​", + "style": "IPY_MODEL_15dd2b3c856b48028061df9393e5933e", + "tabbable": null, + "tooltip": null, + "value": "Generating train split: 100%" + } + }, + "b4fb8951c7804c279a7926b2c13edd59": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "b5e0b331842c4d048b89f99de909316f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e351a3a818944286a2acf7dc43df02a1", + "placeholder": "​", + "style": "IPY_MODEL_945d6d985ea1430bad5da9dd52d5bf17", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:02<00:00, 441.04 examples/s]" + } + }, + "b69eb0115bf44847afc9b12957f68f13": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b99659e5aa0e49a49925b8d3373b33fa": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_600cb4fa52324d13a44c27cb518ef4b8", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_9835553764e340b084b6c9b4192dede6", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "b99c3f0ccfcf4ede995ed99c0aaa94c9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_90ff42f56a3449809883451674e813ed", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_8423817f2b4b4d6a81b467023ce01994", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "bc59cc9cb6164af49f6250adcfcc6f36": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_3f76bba3dafe47c4b2bf3e3739a7eea4", + "max": 2.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_c7fa6164981a488a8fece218f1691ec6", + "tabbable": null, + "tooltip": null, + "value": 2.0 + } + }, + "bdea0e2fabcc4340baaf3da97acaca58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bedda22b15eb4a42a9f6c9681d22c1b4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "bf68c41c79ad420aad5db1dfeb16a50d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5a22224994c14e568f44f92c24b67174", + "placeholder": "​", + "style": "IPY_MODEL_4bb5c79cbf3145b5884dcf19f10252e7", + "tabbable": null, + "tooltip": null, + "value": " 15.0k/? 
[00:00<00:00, 1.34MB/s]" + } + }, + "bf75830bd2f74e5091f7cdc50247c26a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c63b5c9009dd4ab387283824f660adca": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7557342d7744e998dba2d24fe413d06": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "c7fa6164981a488a8fece218f1691ec6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "c9cb90312b97442bb895e73d6329c5df": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": 
"2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_fab8d4e53c6944c8a23045a53ace1750", + "IPY_MODEL_807c2055b0db4db6b6ba6ca0476ad38c", + "IPY_MODEL_cce754ec2015456886b15153e571185f" + ], + "layout": "IPY_MODEL_2760d298c9df414ab4137112bf32d8fd", + "tabbable": null, + "tooltip": null + } + }, + "ca44c1033c9d4081836bc565c16afaaf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ca6c999e4b8a47509d70d5fdbf3c295e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": 
"2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_207687f8a8584c839b231cc20da58659", + "placeholder": "​", + "style": "IPY_MODEL_6917999552634b0b8dbb90f02f21b86e", + "tabbable": null, + "tooltip": null, + "value": " 117M/117M [00:02<00:00, 275MB/s]" + } + }, + "cb1f3e709d8f448aa1dfc26acad5f183": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cce754ec2015456886b15153e571185f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5beb490b2ec447aba772341000c61aa2", + "placeholder": "​", + "style": "IPY_MODEL_cd1ca44e4664435aade0cb03d7f4e9d5", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:00<00:00, 10507.50 examples/s]" + } + }, + "cd1ca44e4664435aade0cb03d7f4e9d5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + 
"description_width": "", + "font_size": null, + "text_color": null + } + }, + "d18e2632f27e49239a678db16247a2e9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d1f4f04d8a2342839bf3fc5ae26fb7cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + 
"grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d47270348cd34660947ff133088379de": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d51a571b1c424eff8ce17a56255d7891": { + "model_module": "@jupyter-widgets/controls", + 
"model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "d5515569279c4be38be81665b7f78bc8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "d7eb5dc7004941c1b02f1d433f6d8ffc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "d83b55de7f2546b79ab355c0c5a3f425": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2ccf61cc2c69496f9726cbe6e95b9d8b", + "placeholder": "​", + "style": "IPY_MODEL_7464a9c6c9b14891b98647dec7600353", + "tabbable": null, + "tooltip": null, + "value": "generation_config.json: 100%" + } + }, + "dcaed7f2f42c4c3590de018062a11cd2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "dd3eb04ec7fe48bb86aae0b6dcf21b21": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + 
"de3c8be3b41348139129ec7411317470": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a1eeb36b3ddd4d949c90bd4b2226bcdb", + "placeholder": "​", + "style": "IPY_MODEL_2f341cf202db4feeac7652903288201d", + "tabbable": null, + "tooltip": null, + "value": "Filter (num_proc=8): 100%" + } + }, + "de7a1f2d9ad4452188d88bb2890ff55d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"e131eaea413c4647a7694d8f2a3f67ed": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e23404206b124069a9e353806b328527": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e24c7d620b6e4a0ba3ab31434fe4e2d1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_40f6fcdae0c74d06ad9060c54adaa8af", + "placeholder": "​", + "style": "IPY_MODEL_45b0d3dba1324d9f86abb361dd5a498d", + "tabbable": null, + "tooltip": null, + "value": "Download complete: 100%" + } + }, + "e309404ffb6f451aaa57428b8f414843": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e351a3a818944286a2acf7dc43df02a1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + 
"left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e4fd7c501ed34c8db0529f5e6e197acf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c7557342d7744e998dba2d24fe413d06", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_3d0f2fec3da64771b9a714c34882a99e", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "e58c399a44e44a4a8f3cc94f5311fd2c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_30f5c6c6cb044296b80f43162e68d94b", + "IPY_MODEL_e4fd7c501ed34c8db0529f5e6e197acf", + "IPY_MODEL_bf68c41c79ad420aad5db1dfeb16a50d" + ], + "layout": "IPY_MODEL_70f9749da14748a9a1512b203e1bea68", + "tabbable": null, + "tooltip": null + } + }, + "e591ddfdc2454d1286ff320af25a3f8b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e6bdb5db84ff46979adb6f9521205a0a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e8ccd117e4e14a198c189d1e631844ad": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_af5f841a2cdc400391c794560bf15a45", + "placeholder": "​", + "style": "IPY_MODEL_119c448203264670ae619d5f195e32e1", + "tabbable": null, + "tooltip": null, + "value": "Unsloth: Standardizing formats (num_proc=8): 100%" + } + }, + "e9b45ef12abd4bbd8d371f763dacf55b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ea8eb5cb677e4746945b26aac468097b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1d5d11772e854845b3da2ba42f8b31f5", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_73c57dd7f29c4f73bef37d8355172ce2", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "ef339cb572e343618ab7737d678b7551": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ef6189642fa34589b561a49d9e3df4b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4e1f4a63a3c0486d8d42c145004ce665", + "placeholder": "​", + "style": "IPY_MODEL_bedda22b15eb4a42a9f6c9681d22c1b4", + "tabbable": null, + "tooltip": null, + "value": " 62.5G/62.5G [04:12<00:00, 273MB/s]" + } + }, + "ef6bc3a4404c4aebbf321483b52fa418": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b3feb364e98b4876a9cbfcb8c08d6c18", + "IPY_MODEL_2bd6c2a05f83417eb8f881f3de8d5eb4", + "IPY_MODEL_12d481c5c4b448599ad0664fa4d8c186" + ], + "layout": "IPY_MODEL_49a12fc2587747b28bdd4547a1153f4d", + "tabbable": null, + "tooltip": null + } + }, + "f24969df204a4a1bbfb64939f6793513": { + "model_module": "@jupyter-widgets/controls", + 
"model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "f2e8c6e590a24ac28df404cc43bef213": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f3ed4f1b125c4432b1cdd671930bfe5f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": 
"HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f5fac5ea14274655ba8e0a4e734dd073", + "placeholder": "​", + "style": "IPY_MODEL_e309404ffb6f451aaa57428b8f414843", + "tabbable": null, + "tooltip": null, + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "f5839673c4a649b6b3c7c767039e9659": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f5fac5ea14274655ba8e0a4e734dd073": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f70e4406ca49447eae7f96ac39943f75": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "fa239bb86e2b434aa62eb5b1e6211200": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_84e0e573d4cc40baab6224039c4326bf", + "IPY_MODEL_ae024d064a3c412bb13038c52ddcaa03", + "IPY_MODEL_8133ec6564f54dcaad20564a2d278d03" + ], + "layout": "IPY_MODEL_78d33fdc643242dca3a2cbd98915eebe", + "tabbable": null, + "tooltip": null + } + }, + "fab8d4e53c6944c8a23045a53ace1750": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8cb34722fb9940b2960cdb7a0164c21d", + "placeholder": "​", + "style": "IPY_MODEL_9385d3f43b574a5aa445961890e3ca60", + "tabbable": null, + "tooltip": null, + "value": "Map: 100%" + } + }, + "fae74896cb8b480eb6e3d150d4be1125": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + 
"min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ff04312b4a664b24ad0030bb947be894": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tooling/fine-tuning/unsloth/kaggle/Gemma4_(E4B)-Text.ipynb b/tooling/fine-tuning/unsloth/kaggle/Gemma4_(E4B)-Text.ipynb new file mode 100644 index 0000000..2a05993 --- /dev/null +++ b/tooling/fine-tuning/unsloth/kaggle/Gemma4_(E4B)-Text.ipynb @@ -0,0 +1,8241 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9b84b389", + "metadata": { + "id": "3FgedoPTwNIy", + "papermill": 
{ + "duration": 0.02114, + "end_time": "2026-04-09T14:15:04.157266+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:04.136126+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab L4 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
 \n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "id": "7bf3fc9f", + "metadata": { + "id": "kD8yWWE9wNIz", + "papermill": { + "duration": 0.017711, + "end_time": "2026-04-09T14:15:04.192885+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:04.175174+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "id": "bc8ec7ab", + "metadata": { + "id": "dYFNULMnwNIz", + "papermill": { + "duration": 0.01829, + "end_time": "2026-04-09T14:15:04.229560+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:04.211270+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
[Image: Unsloth] Train models, no code needed. [Image: Unsloth] Run GGUF models on Mac, Windows & Linux.
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "id": "e7bd6a4f", + "metadata": { + "id": "DQ6w6D0UwNIz", + "papermill": { + "duration": 0.017554, + "end_time": "2026-04-09T14:15:04.265115+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:04.247561+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "28bc6243", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:15:04.302392Z", + "iopub.status.busy": "2026-04-09T14:15:04.301722Z", + "iopub.status.idle": "2026-04-09T14:15:08.424803Z", + "shell.execute_reply": "2026-04-09T14:15:08.423967Z" + }, + "id": "LSA1qFrKwNIz", + "papermill": { + "duration": 4.143759, + "end_time": "2026-04-09T14:15:08.426762+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:04.283003+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "try: import numpy, PIL; _numpy = f\"numpy=={numpy.__version__}\"; _pil = f\"pillow=={PIL.__version__}\"\n", + "except: _numpy = \"numpy\"; _pil = \"pillow\"\n", + "!uv pip install -qqq \\\n", + " \"torch>=2.8.0\" 
\"triton>=3.4.0\" {_numpy} {_pil} torchvision bitsandbytes \\\n", + " unsloth \"unsloth_zoo>=2026.4.5\" transformers==5.5.0 torchcodec timm" + ] + }, + { + "cell_type": "markdown", + "id": "f6b4946d", + "metadata": { + "id": "TGMWlrRdzwgf", + "papermill": { + "duration": 0.018261, + "end_time": "2026-04-09T14:15:08.464472+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:08.446211+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "0da09ebe", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:15:08.501803Z", + "iopub.status.busy": "2026-04-09T14:15:08.501126Z", + "iopub.status.idle": "2026-04-09T14:18:12.449859Z", + "shell.execute_reply": "2026-04-09T14:18:12.449089Z" + }, + "id": "-Xbb0cuLzwgf", + "outputId": "b1936459-858a-460f-d39c-fe4d41114da8", + "papermill": { + "duration": 183.969777, + "end_time": "2026-04-09T14:18:12.451994+00:00", + "exception": false, + "start_time": "2026-04-09T14:15:08.482217+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 2. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.35. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3f9356b907ff4aeb8a45acce1e82e41b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/16.0G [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3991f051", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:18:12.637777Z", + "iopub.status.busy": "2026-04-09T14:18:12.637023Z", + "iopub.status.idle": "2026-04-09T14:18:56.352960Z", + "shell.execute_reply": "2026-04-09T14:18:56.352104Z" + }, + "id": "9jGeSb9bWe0k", + "outputId": "289205bd-b890-4659-8c82-8c7674befdfe", + "papermill": { + "duration": 43.740329, + "end_time": "2026-04-09T14:18:56.354671+00:00", + "exception": false, + "start_time": "2026-04-09T14:18:12.614342+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "I am sorry, but I cannot answer that question. 
The image you provided is a photograph of a sloth, and it is not a film poster or a scene from a movie, so it does not feature in any films.\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "id": "13c09053", + "metadata": { + "id": "eh0BzbZPWtRD", + "papermill": { + "duration": 0.021173, + "end_time": "2026-04-09T14:18:56.397839+00:00", + "exception": false, + "start_time": "2026-04-09T14:18:56.376666+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f05f68d7", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:18:56.442084Z", + "iopub.status.busy": "2026-04-09T14:18:56.441720Z", + "iopub.status.idle": "2026-04-09T14:19:19.809135Z", + "shell.execute_reply": "2026-04-09T14:19:19.808438Z" + }, + "id": "R3ExuK8cWuT3", + "outputId": "d63b198a-00c6-49a5-d0e5-78cbd18d217f", + "papermill": { + "duration": 23.39163, + "end_time": "2026-04-09T14:19:19.810646+00:00", + "exception": false, + "start_time": "2026-04-09T14:18:56.419016+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "## The Gentle Pace\n", + "\n", + "In emerald woods, where mosses softly cling,\n", + "And dappled sunlight makes the shadows swing,\n", + "There moves a creature, slow beneath the green,\n", + "The sloth, a masterpiece of patient scene.\n", + "\n", + "A hanging silhouette, a mossy grace,\n", + "Adorning time within its tranquil space.\n", + "No hurried breath disturbs its languid guise,\n", + "It moves as if the forest breathes through sighs.\n", + "\n", + "The world rushes by—a frantic, buzzing flight—\n", + "Of birds in chorus, bathed in blinding light.\n", + "But the sloth observes it with a knowing eye,\n", + "Content within its slow eternity.\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "id": "3b172504", + "metadata": { + "id": "wZrmFRZpZtGf", + "papermill": { + "duration": 0.025014, + "end_time": "2026-04-09T14:19:19.861788+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:19.836774+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Gemma 4 can also hear!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6285a30a", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:19:19.912674Z", + "iopub.status.busy": "2026-04-09T14:19:19.912128Z", + "iopub.status.idle": "2026-04-09T14:19:20.258124Z", + "shell.execute_reply": "2026-04-09T14:19:20.257342Z" + }, + "id": "68crYajNZtw1", + "outputId": "e8ffe42f-79d8-473c-8b2b-8d4d05acc549", + "papermill": { + "duration": 0.376154, + "end_time": "2026-04-09T14:19:20.262661+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:19.886507+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import Audio, display\n", + "Audio(\"https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fdc2f3c", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:19:20.323617Z", + "iopub.status.busy": "2026-04-09T14:19:20.323336Z", + "iopub.status.idle": "2026-04-09T14:19:20.800752Z", + "shell.execute_reply": "2026-04-09T14:19:20.799625Z" + }, + "id": "k3vrdoa0Z01X", + "papermill": { + "duration": 0.509763, + "end_time": "2026-04-09T14:19:20.802792+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:20.293029+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "!wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O audio.mp3" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "95b24a57", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:19:20.862299Z", + "iopub.status.busy": "2026-04-09T14:19:20.861816Z", + "iopub.status.idle": "2026-04-09T14:19:39.539296Z", + "shell.execute_reply": 
"2026-04-09T14:19:39.538406Z" + }, + "id": "BJr_D4O9Z2Zh", + "outputId": "d6517a81-1d77-41dc-b2dd-2e6ab1aa3a29", + "papermill": { + "duration": 18.708402, + "end_time": "2026-04-09T14:19:39.540895+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:20.832493+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This audio is about the Apollo program.\n" + ] + } + ], + "source": [ + "audio_file = \"audio.mp3\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio about?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "id": "20e934e5", + "metadata": { + "id": "L15JuAmmaOkB", + "papermill": { + "duration": 0.027251, + "end_time": "2026-04-09T14:19:39.597028+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:39.569777+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Let's combine all 3 modalities together!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "7d2bde68", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:19:39.653187Z", + "iopub.status.busy": "2026-04-09T14:19:39.652613Z", + "iopub.status.idle": "2026-04-09T14:19:56.511776Z", + "shell.execute_reply": "2026-04-09T14:19:56.511109Z" + }, + "id": "is37bsDZaRwV", + "outputId": "37235929-0289-4ba6-c352-da6d389cb5a0", + "papermill": { + "duration": 16.889309, + "end_time": "2026-04-09T14:19:56.513361+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:39.624052+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This appears to be a snippet of a speech, likely from a historical or political context, given the formal language and the ambitious goal mentioned. The image is a picture of a sloth.\n", + "\n", + "**Relationship:**\n", + "\n", + "There is no apparent relationship between the audio and the image. The audio is a serious, goal-oriented statement about a national endeavor, while the image is a photograph of an animal.\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio and image about? 
 \"\\\n", + " \"How are they related?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "id": "17fa13a3", + "metadata": { + "id": "Bw5XPyYFajyM", + "papermill": { + "duration": 0.032942, + "end_time": "2026-04-09T14:19:56.579220+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:56.546278+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "For now, you can select whether to finetune the vision and text parts. The audio part can also be finetuned, and we're working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "id": "3666efda", + "metadata": { + "id": "SXd9bTZd1aaL", + "papermill": { + "duration": 0.031294, + "end_time": "2026-04-09T14:19:56.642703+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:56.611409+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "We now add LoRA adapters so we only need to update a small number of parameters!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "eb65be92", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:19:56.719906Z", + "iopub.status.busy": "2026-04-09T14:19:56.718703Z", + "iopub.status.idle": "2026-04-09T14:20:02.557870Z", + "shell.execute_reply": "2026-04-09T14:20:02.556932Z" + }, + "id": "6bZsfBuZDeCL", + "papermill": { + "duration": 5.88583, + "end_time": "2026-04-09T14:20:02.559982+00:00", + "exception": false, + "start_time": "2026-04-09T14:19:56.674152+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "a1bf9b7c", + "metadata": { + "id": "vITh0KVJ10qX", + "papermill": { + "duration": 0.029681, + "end_time": "2026-04-09T14:20:02.620946+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:02.591265+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. 
We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "a1cb35e2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:20:02.685772Z", + "iopub.status.busy": "2026-04-09T14:20:02.685316Z", + "iopub.status.idle": "2026-04-09T14:20:02.690342Z", + "shell.execute_reply": "2026-04-09T14:20:02.689588Z" + }, + "id": "LjY75GoYUCB8", + "papermill": { + "duration": 0.039865, + "end_time": "2026-04-09T14:20:02.691841+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:02.651976+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f0482b8e", + "metadata": { + "id": "ZQkXuGYxbJ-e", + "papermill": { + "duration": 0.029969, + "end_time": "2026-04-09T14:20:02.751701+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:02.721732+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3c26afca", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:20:02.811731Z", + "iopub.status.busy": "2026-04-09T14:20:02.811310Z", + "iopub.status.idle": "2026-04-09T14:20:06.302247Z", + "shell.execute_reply": "2026-04-09T14:20:06.301342Z" + }, + "id": "Mkq4RvEq7FQr", + "outputId": "e78c54dd-f931-4c00-a73d-e37f3eff19e8", + "papermill": { + "duration": 3.522975, + "end_time": "2026-04-09T14:20:06.303736+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:02.780761+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"1bd2499e051c47fe835936333918511a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`''`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "2fd69cec", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:20:07.839111Z", + "iopub.status.busy": "2026-04-09T14:20:07.838390Z", + "iopub.status.idle": "2026-04-09T14:20:12.567650Z", + "shell.execute_reply": "2026-04-09T14:20:12.566972Z" + }, + "id": "1ahE8Ys37JDJ", + "outputId": "13973ed3-779d-485e-92aa-33b6925f1737", + "papermill": { + "duration": 4.762099, + "end_time": "2026-04-09T14:20:12.569118+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:07.807019+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "65a0b27e6c4247a080a54ed6b18b656b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "id": "6fbe8d8f", + "metadata": { + "id": "ndDUB23CGAC5", + "papermill": { + "duration": 0.030617, + "end_time": "2026-04-09T14:20:12.631702+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:12.601085+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's see how the chat template did! Notice there is no `` token as the processor tokenizer will be adding one." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "84476bb5", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:20:12.697330Z", + "iopub.status.busy": "2026-04-09T14:20:12.696528Z", + "iopub.status.idle": "2026-04-09T14:20:12.702078Z", + "shell.execute_reply": "2026-04-09T14:20:12.701418Z" + }, + "id": "gGFzmplrEy9I", + "outputId": "36882276-635b-45f5-f9d8-3764373dc35e", + "papermill": { + "duration": 0.03974, + "end_time": "2026-04-09T14:20:12.703464+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:12.663724+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. 
Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "id": "029e0426", + "metadata": { + "id": "idAEIeSQ3xdS", + "papermill": { + "duration": 0.030334, + "end_time": "2026-04-09T14:20:12.764440+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:12.734106+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "2063eb63", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:20:12.827234Z", + "iopub.status.busy": "2026-04-09T14:20:12.826505Z", + "iopub.status.idle": "2026-04-09T14:20:58.477258Z", + "shell.execute_reply": "2026-04-09T14:20:58.476282Z" + }, + "id": "95_Nn-89DhsL", + "outputId": "87503724-3283-49e4-fec6-4b2c4e084fca", + "papermill": { + "duration": 45.684477, + "end_time": "2026-04-09T14:20:58.479229+00:00", + "exception": false, + "start_time": "2026-04-09T14:20:12.794752+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5a7b3ac7349846deb9064be7c434d764", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=8): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5b1a7cc7", + "metadata": { + "id": "Dv1NBUozV78l", + "papermill": { + "duration": 0.030613, + "end_time": "2026-04-09T14:21:02.511624+00:00", + "exception": false, + "start_time": 
"2026-04-09T14:21:02.481011+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Let's verify masking the instruction part is done! Let's print the 100th row again. Notice how the sample only has a single `` as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9fb4df86", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:21:02.575287Z", + "iopub.status.busy": "2026-04-09T14:21:02.574549Z", + "iopub.status.idle": "2026-04-09T14:21:02.581906Z", + "shell.execute_reply": "2026-04-09T14:21:02.581204Z" + }, + "id": "LtsMVtlkUhja", + "outputId": "d3126757-2a14-4c6d-99f8-98b3bf91be55", + "papermill": { + "duration": 0.041188, + "end_time": "2026-04-09T14:21:02.583417+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.542229+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. 
Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "id": "f3972b9e", + "metadata": { + "id": "4Kyjy__m9KY3", + "papermill": { + "duration": 0.030403, + "end_time": "2026-04-09T14:21:02.645269+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.614866+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "165d2951", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:21:02.708036Z", + "iopub.status.busy": "2026-04-09T14:21:02.707522Z", + "iopub.status.idle": "2026-04-09T14:21:02.715319Z", + "shell.execute_reply": "2026-04-09T14:21:02.714464Z" + }, + "id": "_rD6fl8EUxnG", + "outputId": "e579a2b3-9df6-4c40-ef7c-98d7c9cd9d82", + "papermill": { + "duration": 0.041393, + "end_time": "2026-04-09T14:21:02.716862+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.675469+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "' In programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. 
By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "73ec0163", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-04-09T14:21:02.783682Z", + "iopub.status.busy": "2026-04-09T14:21:02.783109Z", + "iopub.status.idle": "2026-04-09T14:21:02.788999Z", + "shell.execute_reply": "2026-04-09T14:21:02.788020Z" + }, + "id": "2ejIt2xSNKKp", + "outputId": "2ecea13f-2faa-4c7d-e006-a0a1cb90612a", + "papermill": { + "duration": 0.039208, + "end_time": "2026-04-09T14:21:02.790686+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.751478+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "9.891 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "id": "eee6fad9", + "metadata": { + "id": "CNP1Uidk9mrz", + "papermill": { + "duration": 0.030451, + "end_time": "2026-04-09T14:21:02.852194+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.821743+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d83b4edc", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:21:02.917281Z", + "iopub.status.busy": "2026-04-09T14:21:02.916648Z", + "iopub.status.idle": "2026-04-09T14:27:25.935388Z", + "shell.execute_reply": "2026-04-09T14:27:25.934474Z" + }, + "id": "yqxqAZ7KJ4oL", + "outputId": "180092e1-ae76-449a-9f2d-0dbc3894b8f8", + "papermill": { + "duration": 383.053463, + "end_time": "2026-04-09T14:27:25.937341+00:00", + "exception": false, + "start_time": "2026-04-09T14:21:02.883878+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 2,991 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 18,350,080 of 8,014,506,528 (0.23% trained)\n", + "Caching is incompatible with gradient checkpointing in Gemma4TextDecoderLayer. 
Setting `past_key_values=None`.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 05:53, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Step | Training Loss
1 | 10.523083
2 | 10.053993
3 | 12.042599
4 | 9.989523
5 | 10.095123
6 | 10.694252
7 | 10.365056
8 | 9.754820
9 | 8.308989
10 | 7.331110
11 | 7.254261
12 | 6.173282
13 | 6.381964
14 | 4.963476
15 | 5.690104
16 | 4.411693
17 | 4.552224
18 | 4.913744
19 | 5.021698
20 | 4.934382
21 | 3.959793
22 | 3.978175
23 | 4.443007
24 | 4.595473
25 | 3.987100
26 | 3.831797
27 | 3.935458
28 | 3.008995
29 | 3.504218
30 | 3.710511
31 | 3.277609
32 | 3.731925
33 | 3.324703
34 | 3.664254
35 | 3.287769
36 | 3.606441
37 | 3.251672
38 | 3.009572
39 | 3.222754
40 | 2.543038
41 | 3.437666
42 | 3.372575
43 | 2.745370
44 | 2.456408
45 | 2.818116
46 | 3.419305
47 | 2.893388
48 | 3.205531
49 | 2.713997
50 | 2.372560
51 | 2.854839
52 | 2.170771
53 | 3.100059
54 | 2.641205
55 | 2.271563
56 | 2.701172
57 | 2.765497
58 | 2.787446
59 | 2.603349
60 | 2.821406

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "ba3079a2", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-04-09T14:27:26.004799Z", + "iopub.status.busy": "2026-04-09T14:27:26.004480Z", + "iopub.status.idle": "2026-04-09T14:27:26.012023Z", + "shell.execute_reply": "2026-04-09T14:27:26.011040Z" + }, + "id": "pCqnaKmlO1U9", + "outputId": "585888de-dbd8-4562-c6e5-a5f0dbf26dc7", + "papermill": { + "duration": 0.042476, + "end_time": "2026-04-09T14:27:26.013461+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:25.970985+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "379.4784 seconds used for training.\n", + "6.32 minutes used for training.\n", + "Peak reserved memory = 10.715 GB.\n", + "Peak reserved memory for training = 0.824 GB.\n", + "Peak reserved memory % of max memory = 73.577 %.\n", + "Peak reserved memory for training % of max memory = 5.658 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved 
memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "id": "78fe5ece", + "metadata": { + "id": "ekOmTR1hSNcr", + "papermill": { + "duration": 0.031646, + "end_time": "2026-04-09T14:27:26.078664+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:26.047018+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "789f3453", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:27:26.144119Z", + "iopub.status.busy": "2026-04-09T14:27:26.143382Z", + "iopub.status.idle": "2026-04-09T14:27:38.238264Z", + "shell.execute_reply": "2026-04-09T14:27:38.237354Z" + }, + "id": "kR3gIAX-SM2q", + "outputId": "42097560-b2b6-4d32-a9f8-160db24434cc", + "papermill": { + "duration": 12.129782, + "end_time": "2026-04-09T14:27:38.240040+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:26.110258+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\n13, 21, 34, 55, 89, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.']" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + 
" add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "id": "d14c972d", + "metadata": { + "id": "CrSvZObor0lY", + "papermill": { + "duration": 0.03125, + "end_time": "2026-04-09T14:27:38.303897+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:38.272647+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "80b77f9d", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:27:38.368840Z", + "iopub.status.busy": "2026-04-09T14:27:38.367877Z", + "iopub.status.idle": "2026-04-09T14:27:53.797997Z", + "shell.execute_reply": "2026-04-09T14:27:53.797095Z" + }, + "id": "e2pEuRb1r2Vg", + "outputId": "a648ff4f-3762-44d5-eb8b-5045a5c01a50", + "papermill": { + "duration": 15.464494, + "end_time": "2026-04-09T14:27:53.799621+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:38.335127+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The sky appears blue primarily due to a phenomenon called **Rayleigh scattering**. Here's a breakdown of how this works:\n", + "\n", + "1. **Sunlight is composed of different wavelengths:** Sunlight, which comes from the sun, is made up of various colors, each with a different wavelength. 
Blue light has a shorter wavelength\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ec07e9bd", + "metadata": { + "id": "uMuVrWbjAzhc", + "papermill": { + "duration": 0.035117, + "end_time": "2026-04-09T14:27:53.868536+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:53.833419+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "c841a798", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:27:53.936332Z", + "iopub.status.busy": "2026-04-09T14:27:53.935832Z", + "iopub.status.idle": "2026-04-09T14:27:57.309758Z", + "shell.execute_reply": "2026-04-09T14:27:57.308857Z" + }, + "id": "upcOlWe7A1vc", + "outputId": "b38635d3-0eae-40ed-93f1-771f04c205c0", + "papermill": { + "duration": 3.410314, + "end_time": "2026-04-09T14:27:57.311357+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:53.901043+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "id": "c0280a59", + "metadata": { + "id": "AEEcJ4qfC7Lp", + "papermill": { + "duration": 0.03305, + "end_time": "2026-04-09T14:27:57.379393+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:57.346343+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "4e76e14f", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:27:57.446364Z", + "iopub.status.busy": "2026-04-09T14:27:57.445610Z", + "iopub.status.idle": "2026-04-09T14:28:03.411741Z", + "shell.execute_reply": "2026-04-09T14:28:03.411099Z" + }, + "id": "MKX_XKs_BNZR", + "outputId": "c6380f12-92c3-44da-ee71-c6895cb19a95", + "papermill": { + 
"duration": 6.001005, + "end_time": "2026-04-09T14:28:03.413384+00:00", + "exception": false, + "start_time": "2026-04-09T14:27:57.412379+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "I am Gemma 4, a Large Language Model developed by Google DeepMind. I am an open weights model.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2c10654e", + "metadata": { + "id": "f422JgM9sdVT", + "papermill": { + "duration": 0.033782, + "end_time": "2026-04-09T14:28:03.482906+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.449124+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### Saving to float16 for VLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "b579db68", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:28:03.553382Z", + "iopub.status.busy": "2026-04-09T14:28:03.552682Z", + "iopub.status.idle": "2026-04-09T14:28:03.556813Z", + "shell.execute_reply": "2026-04-09T14:28:03.555982Z" + }, + "id": "iHjt_SMYsd3P", + "papermill": { + "duration": 0.040718, + "end_time": "2026-04-09T14:28:03.558346+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.517628+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "id": "ec2d7240", + "metadata": { + "id": "z6O48DbNIAr0", + "papermill": { + "duration": 0.033577, + "end_time": "2026-04-09T14:28:03.626993+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.593416+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "301ea027", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:28:03.695829Z", + "iopub.status.busy": "2026-04-09T14:28:03.695545Z", + "iopub.status.idle": "2026-04-09T14:28:03.699618Z", + "shell.execute_reply": "2026-04-09T14:28:03.698865Z" + }, + "id": "ZV-CiKPrIFG0", + "papermill": { + "duration": 0.040418, + "end_time": "2026-04-09T14:28:03.700987+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.660569+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "f1580211", + "metadata": { + "id": "TCv4vXHd61i7", + "papermill": { + "duration": 0.034175, + "end_time": "2026-04-09T14:28:03.769687+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.735512+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "a11309e2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:28:03.838541Z", + "iopub.status.busy": "2026-04-09T14:28:03.837824Z", + "iopub.status.idle": "2026-04-09T14:28:03.842150Z", + "shell.execute_reply": "2026-04-09T14:28:03.841298Z" + }, + "id": "FqfebeAdT073", + "papermill": { + "duration": 0.040287, + "end_time": "2026-04-09T14:28:03.843691+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.803404+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "a9ac86d1", + "metadata": { + "id": "Q974YEVPI7JS", + "papermill": { + "duration": 0.033488, + "end_time": "2026-04-09T14:28:03.911272+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.877784+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Likewise, to push the GGUF to your Hugging Face account instead, set `if False` to `if True` and add your Hugging Face token and upload location!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "163597b1", + "metadata": { + "execution": { + "iopub.execute_input": "2026-04-09T14:28:03.982798Z", + "iopub.status.busy": "2026-04-09T14:28:03.982206Z", + "iopub.status.idle": "2026-04-09T14:28:03.986398Z", + "shell.execute_reply": "2026-04-09T14:28:03.985558Z" + }, + "id": "ZgcJIhJ0I_es", + "papermill": { + "duration": 0.041947, + "end_time": "2026-04-09T14:28:03.988402+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:03.946455+00:00", + "status": "completed" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "bedaf816", + "metadata": { + "id": "pnz9QOYTMvbH", + "papermill": { + "duration": 0.03474, + "end_time": "2026-04-09T14:28:04.058824+00:00", + "exception": false, + "start_time": "2026-04-09T14:28:04.024084+00:00", + "status": "completed" + }, + "tags": [] + }, + "source": [ + "Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. 
[Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kaggle": { + "accelerator": "nvidiaTeslaT4", + "dataSources": [ + { + "databundleVersionId": 16421787, + "sourceId": 134561, + "sourceType": "competition" + } + ], + "dockerImageVersionId": 31329, + "isGpuEnabled": true, + "isInternetEnabled": true, + "language": "python", + "sourceType": "notebook" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.12" + }, + "papermill": { + "default_parameters": {}, + "duration": 785.802226, + "end_time": "2026-04-09T14:28:07.416534+00:00", + "environment_variables": {}, + "exception": null, + "input_path": "__notebook__.ipynb", + "output_path": "__notebook__.ipynb", + "parameters": {}, + "start_time": "2026-04-09T14:15:01.614308+00:00", + "version": "2.7.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "001d2358ac194ff387ff44a9976849e1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1fef92fd286340aabaa41cad92ac64c3", + "placeholder": "​", + "style": "IPY_MODEL_552f08ce4c064e61b7b50d29ac4d7d44", + "tabbable": null, + 
"tooltip": null, + "value": "Unsloth: Tokenizing ["text"] (num_proc=8): 100%" + } + }, + "05ad56c561f04666b5ee42e3fd0e62c5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "05c0091537e34df0a0f90e665ecc82a7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e5c6e31d8a7943be90673e5034f1acf0", + "placeholder": "​", + "style": "IPY_MODEL_a20ca0892f964e25ba1586d4a5bbd9dd", + 
"tabbable": null, + "tooltip": null, + "value": " 11.9k/? [00:00<00:00, 1.16MB/s]" + } + }, + "08bda2f90dc847dea5b45b609709f40c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "08fe8b4959d14bb5a67ba9da42040d43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a00778e41580422394e4e0280bfd2f82", + "IPY_MODEL_f4e8817005d044fb9fed3ba2035ebf6c", + "IPY_MODEL_e368e10eff6a4d77b3dcbec79b579e82" + ], + "layout": "IPY_MODEL_38fea064eac74561b71c0265acdf1614", + "tabbable": null, + "tooltip": null + } + }, + "09962f62cce94f56a8c39ba832b2d765": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e79bc28cc65b4095bcdcce0706c34f32", + "placeholder": "​", + "style": "IPY_MODEL_ef66db03235441e49e3a28e8f7f142ea", + "tabbable": null, + "tooltip": null, + 
"value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "0befc317ff8643628a499907eea0c38d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0d99defc025f44daab34d0c575b350c2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0fa2f8b8168e4406b46fb1825f7d4292": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "106b68972ccb43c58b5029bf218ab1d5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_3b84f78ce4d14ef9b1bd5b85f2c82aa7", + "max": 982.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_a0f0d065f65248e98aa3a31493911932", + "tabbable": null, + "tooltip": null, + "value": 982.0 + } + }, + "1134f408212e435db3a3f077c3fba26d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "12b6d1753927475682f8702ca4cada4f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": 
"LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "13630bf3f7be48c9b687ccff4e6b639f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "188a2acc11d34429a8b534511676a75f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2e117d94f0414890968b9b9128d0bd83", + "IPY_MODEL_d4037b70f52e4d998c56578de7c79774", + 
"IPY_MODEL_d7a1b908d01841bd863d9763faaecf19" + ], + "layout": "IPY_MODEL_2d9fb98979364652b7b22d23db520b8e", + "tabbable": null, + "tooltip": null + } + }, + "198057887a0b463aaa5008f568be4b88": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d4ace70caa0b427ebeb42e173962e4a6", + "placeholder": "​", + "style": "IPY_MODEL_803ef0c370ea4b768c955c2ccf60ef9f", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:00<00:00, 10398.55 examples/s]" + } + }, + "1a5639acc2c940c6953598272c4b2e69": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d6182709ee8042188c1784721434f68c", + "placeholder": "​", + "style": "IPY_MODEL_768634c4bf8142adb559389539b93592", + "tabbable": null, + "tooltip": null, + "value": " 16.0G/16.0G [01:10<00:00, 322MB/s]" + } + }, + "1bd2499e051c47fe835936333918511a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f416c1e959ee4745bc8a272345480d0d", + "IPY_MODEL_106b68972ccb43c58b5029bf218ab1d5", + "IPY_MODEL_61788d01c4fc4cae83947b486bc89ffb" + ], + "layout": "IPY_MODEL_7d5975865cb8442989a4e7b6cdcd47a6", + "tabbable": null, + "tooltip": null + } + }, + "1fef92fd286340aabaa41cad92ac64c3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "20fe61bf6b394ba89a2e59acf77cdce5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", 
+ "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "211d2ae3785141c59d4e415a0b5b11df": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "26fbb43053db4a2eb73f900bcfb61f14": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "287fbc35965645beb41b4ce8a70770f4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": 
null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2a5a818e39724e57b299536662a4210d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "2b882452251c443fa8c8fa2c9feb7b46": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2d9fb98979364652b7b22d23db520b8e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2e117d94f0414890968b9b9128d0bd83": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_64b87ca879094043b28c55f5e1809699", + "placeholder": "​", + "style": "IPY_MODEL_e6eae91cecb14392a0af9398937b2fbc", + "tabbable": null, + "tooltip": null, + "value": "Computing checksums: 100%" + } + }, + "2e5c5a5f83e24ec9b55d3ac973a1e191": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_32d6aea2c6d54fe7bffba1aa32ed1d13", + "IPY_MODEL_940db0f1f9ed402db0925d693d85df02", + "IPY_MODEL_05c0091537e34df0a0f90e665ecc82a7" + ], + "layout": "IPY_MODEL_d92b9a6eb55b482fa1de1bc6ef01c2fe", + "tabbable": null, + "tooltip": null + } + }, + "2f9c9391a52749148984920f1880ff58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "32d6aea2c6d54fe7bffba1aa32ed1d13": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_05ad56c561f04666b5ee42e3fd0e62c5", + "placeholder": "​", + "style": "IPY_MODEL_1134f408212e435db3a3f077c3fba26d", + "tabbable": null, + "tooltip": null, + "value": "chat_template.jinja: " + } + }, + "3584c90d6b27469cb2780c6fc618f146": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38fea064eac74561b71c0265acdf1614": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + 
"_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3951f3ffff774384acdfde89992fd0dd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b84f78ce4d14ef9b1bd5b85f2c82aa7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3c6e58fc816741e7b1321e9bf0ee80a0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3d028f45e72344c1a42519245f58a8ae": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3f80b23d09b84265bbea8d3caa05b940": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "3f9356b907ff4aeb8a45acce1e82e41b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + 
"model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_eb1f0cb5250a460195a57ae36b2d1648", + "IPY_MODEL_682793ef605b4942a6e2299be26823c5", + "IPY_MODEL_1a5639acc2c940c6953598272c4b2e69" + ], + "layout": "IPY_MODEL_ce0953daab7c4cb290ce36a31350dadc", + "tabbable": null, + "tooltip": null + } + }, + "40a44625b8d54fbca680049dfe1ddd1c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d75f4d2d03ae4247a54732e952674931", + "placeholder": "​", + "style": "IPY_MODEL_2a5a818e39724e57b299536662a4210d", + "tabbable": null, + "tooltip": null, + "value": "tokenizer.json: 100%" + } + }, + "41824be3f8014c7bb2dac828e792bae8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c7641a7d1ad24f5ca33469ae18fdac68", + "placeholder": "​", + "style": "IPY_MODEL_cbe4683fe14945e18dd7161a2c454500", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 
[00:40<00:00, 79.59 examples/s]" + } + }, + "45de381897e74b69b1d4f46db5d36291": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "4c52050f012045e7b6980b51c672547d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4cc728ba782043c0910331789fd4ce96": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + 
"state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4f34e17e884648bab833dd7e006e6287": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f2c4809b20c84e7a96d9b67a3d8ac09f", + "IPY_MODEL_87f56a4df31f499b886dd2b3a392892b", + "IPY_MODEL_553cc00c3bd348b5b5326c399d974a24" + ], + "layout": "IPY_MODEL_b6bde48c056745dda54e0d1420a78519", + "tabbable": null, + "tooltip": null + } + }, + "5382be602e5a42b29c026e90f7a5e585": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "552f08ce4c064e61b7b50d29ac4d7d44": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, 
+ "description_width": "", + "font_size": null, + "text_color": null + } + }, + "553cc00c3bd348b5b5326c399d974a24": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b3e6ffa744ed4549a25ddaf5e459cb6a", + "placeholder": "​", + "style": "IPY_MODEL_8db9010ceefb4493a303c361fce31d91", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:01<00:00, 3853.33 examples/s]" + } + }, + "5553cc1760b043f3b82b87db8f8fc570": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "57574cc040184d2b8a4395b3287a8ef1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "59f259377ce840c6b1d6f7477b4f74fb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "5a7b3ac7349846deb9064be7c434d764": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_001d2358ac194ff387ff44a9976849e1", + "IPY_MODEL_84014c285cd34142887259eae7e1c518", + "IPY_MODEL_41824be3f8014c7bb2dac828e792bae8" + ], + "layout": "IPY_MODEL_d127ada24d82403e826fd8cc2be96aae", + "tabbable": null, + "tooltip": null + } + }, + "5fc156df253a4accaa09963954145a6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "61788d01c4fc4cae83947b486bc89ffb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a31ce646c53140a1bb1ccd7777de20e7", + "placeholder": "​", + "style": "IPY_MODEL_5fc156df253a4accaa09963954145a6b", + "tabbable": null, + "tooltip": null, + "value": " 982/982 [00:00<00:00, 106kB/s]" + } + }, + "61e6c04c874f45e387821539981f2e76": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "62c1945a264749d7b0906cbca3d15350": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + 
"_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_6cb531ed8c234011bd0f06ae73690115", + "placeholder": "​", + "style": "IPY_MODEL_c423948df5c1410d80bc29d975e87492", + "tabbable": null, + "tooltip": null, + "value": "generation_config.json: 100%" + } + }, + "6307099d3c25460ba6bee80edd1d1a09": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_acf4cc08f5654424a995921b867414bd", + "max": 32169626.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_0fa2f8b8168e4406b46fb1825f7d4292", + "tabbable": null, + "tooltip": null, + "value": 32169626.0 + } + }, + "647a740f72ee4c319784610a86dae503": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4c52050f012045e7b6980b51c672547d", + "placeholder": "​", + "style": "IPY_MODEL_45de381897e74b69b1d4f46db5d36291", + "tabbable": null, + "tooltip": null, + "value": " 208/208 [00:00<00:00, 22.5kB/s]" + } + }, + "64b87ca879094043b28c55f5e1809699": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "65a0b27e6c4247a080a54ed6b18b656b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_82ddb8eff0d24c57b66a23958ff8c248", + "IPY_MODEL_68426c295d0f41c582174dd4a5d2523f", + "IPY_MODEL_198057887a0b463aaa5008f568be4b88" + ], + "layout": "IPY_MODEL_5553cc1760b043f3b82b87db8f8fc570", + "tabbable": null, + "tooltip": null + } + }, + "65dec59f62c1480cb6bca2268f2a403a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": 
[], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_dd6c5df2d2c34be880470f6fd5fc297e", + "placeholder": "​", + "style": "IPY_MODEL_99ca530944db40c688cc2e2167672d99", + "tabbable": null, + "tooltip": null, + "value": "Filter (num_proc=8): 100%" + } + }, + "65e41393d19847bb86101d4a4d6b46d4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8e8ccf63de5b4a7f9ce88b529c4cd59b", + "placeholder": "​", + "style": "IPY_MODEL_904ca5e14e8d4c339aa31e466f2eb2d6", + "tabbable": null, + "tooltip": null, + "value": " 100000/100000 [00:00<00:00, 123190.86 examples/s]" + } + }, + "66cdc617906c4996b55384b1e8a907fd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "682793ef605b4942a6e2299be26823c5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8b24bd733ab54b019cb26c3157e41758", + "max": 15992595884.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_4cc728ba782043c0910331789fd4ce96", + "tabbable": null, + "tooltip": null, + "value": 15992595884.0 + } + }, + "68426c295d0f41c582174dd4a5d2523f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ece07379e5f3496f9950ad425026ca5f", + "max": 3000.0, + "min": 0.0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_59f259377ce840c6b1d6f7477b4f74fb", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "68b87a00770d4fa0856a5a4ffbf343e5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cb531ed8c234011bd0f06ae73690115": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, 
+ "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cebc0e67c664e059e14fb481985a383": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"6d4f64faf10d4cd2bf1d4d7bfb7fd1b4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2b882452251c443fa8c8fa2c9feb7b46", + "placeholder": "​", + "style": "IPY_MODEL_e39ab592e86c4190bbbd488bcd85d99e", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:02<00:00, 1114.52 examples/s]" + } + }, + "6e11e1ff8e7a4246a38d6143ac8daa6f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + 
} + }, + "747599c76bbb4fc7b1c565b8dd4e1317": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_78d696ad36834f32aadb2db6f783cc29", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_f26c344426984a1a93868519dabb45bc", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "75205816cf7f4c50b9eee6478a4ed98a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c5fc3eba43d74cb2acea114cc65139e2", + "placeholder": "​", + "style": "IPY_MODEL_08bda2f90dc847dea5b45b609709f40c", + "tabbable": null, + "tooltip": null, + "value": "Unsloth: Standardizing formats (num_proc=8): 100%" + } + }, + "760b0d5e9a9d44e386477ba3806fe69a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": 
"success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_287fbc35965645beb41b4ce8a70770f4", + "max": 2130.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_b91f8b142c3b48cf9f08d2df6eda1978", + "tabbable": null, + "tooltip": null, + "value": 2130.0 + } + }, + "768634c4bf8142adb559389539b93592": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "78d696ad36834f32aadb2db6f783cc29": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": 
null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "790136d7d63b4531b07d2ae73e8bd6af": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7c2a23b31d77497e9ecc8128f038398b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_62c1945a264749d7b0906cbca3d15350", + "IPY_MODEL_f3fe3589911a40ebbfae2bd3596a935d", + "IPY_MODEL_647a740f72ee4c319784610a86dae503" + ], + "layout": "IPY_MODEL_6e11e1ff8e7a4246a38d6143ac8daa6f", + "tabbable": null, + "tooltip": null + } + }, + "7d01a45f2b68423396448afb5789db26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7d0812636bda4a41b39233e82f705cc3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b051a523cdd34fd180668215334d080c", + "placeholder": "​", + "style": "IPY_MODEL_61e6c04c874f45e387821539981f2e76", + "tabbable": null, + "tooltip": null, + "value": " 117M/117M [00:01<00:00, 583MB/s]" + } + }, + "7d5975865cb8442989a4e7b6cdcd47a6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7d78ab46e4754acfb5915b3a0741817e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7e9050fcebb9454cb5e2c85353d93756": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_20fe61bf6b394ba89a2e59acf77cdce5", + "placeholder": "​", + "style": "IPY_MODEL_2f9c9391a52749148984920f1880ff58", + "tabbable": null, + "tooltip": null, + "value": "tokenizer_config.json: " + } + }, + "7ed39adaaf9646ed8b9eaac7c64e4e26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "803ef0c370ea4b768c955c2ccf60ef9f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + 
"font_size": null, + "text_color": null + } + }, + "80eb316b1b1c49a28f7e8c01d97ab526": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_211d2ae3785141c59d4e415a0b5b11df", + "max": 116531415.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_b864c84037eb417e872fdcfeaa55bc8d", + "tabbable": null, + "tooltip": null, + "value": 116531415.0 + } + }, + "82ddb8eff0d24c57b66a23958ff8c248": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_68b87a00770d4fa0856a5a4ffbf343e5", + "placeholder": "​", + "style": "IPY_MODEL_3f80b23d09b84265bbea8d3caa05b940", + "tabbable": null, + "tooltip": null, + "value": "Map: 100%" + } + }, + "84014c285cd34142887259eae7e1c518": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + 
"bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ac4e9c906b1b46d4bd7c712e7e9412aa", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_898c5d16a65f4922bb48389354bf156a", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "85771c35b1354a02a5da2c6fac955977": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86852c7625d6454888e1e037ed765a70": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a79151a924304732b6c49e43ebd01838", + "max": 100000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_92975d32f5954dfd9f4eedde77125e53", + "tabbable": null, + "tooltip": null, + "value": 100000.0 + } + }, + "868b0d680f2e4ec983d25229a3196e61": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "87f56a4df31f499b886dd2b3a392892b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_0befc317ff8643628a499907eea0c38d", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_f8edf29ad3514c31a53cab5b3f04f2da", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, + "898c5d16a65f4922bb48389354bf156a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8b24bd733ab54b019cb26c3157e41758": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8db9010ceefb4493a303c361fce31d91": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + 
"8e8ccf63de5b4a7f9ce88b529c4cd59b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8fc1b1d31b414d7f8c68676ac0dfa064": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ea1195f4e1494cd1b0188cb79e0812a7", + "placeholder": "​", + "style": "IPY_MODEL_e58ddbc47ea34e1b9b3ece0738cf296b", + "tabbable": null, + "tooltip": null, + "value": " 32.2M/32.2M [00:00<00:00, 160MB/s]" + } + }, + 
"904ca5e14e8d4c339aa31e466f2eb2d6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "92975d32f5954dfd9f4eedde77125e53": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "940db0f1f9ed402db0925d693d85df02": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_fcebf1670b43483d8324d101283307c5", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_7d78ab46e4754acfb5915b3a0741817e", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "98d18f12362a44f48acbc5fa8d7973e1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "99ca530944db40c688cc2e2167672d99": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "9cc696acfaa24029a19aac32db80080a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9e839d76d2fe4c42a7caf2d809cfa215": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9fcc1b0d5e7e47b38e0025c24aa8c329": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_09962f62cce94f56a8c39ba832b2d765", + "IPY_MODEL_80eb316b1b1c49a28f7e8c01d97ab526", + "IPY_MODEL_7d0812636bda4a41b39233e82f705cc3" + ], + "layout": 
"IPY_MODEL_ab3a91cb5e314145903e065d581c9be7", + "tabbable": null, + "tooltip": null + } + }, + "a00778e41580422394e4e0280bfd2f82": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_aec5f94e1ea64dc39eb51936779a81d2", + "placeholder": "​", + "style": "IPY_MODEL_5382be602e5a42b29c026e90f7a5e585", + "tabbable": null, + "tooltip": null, + "value": "processor_config.json: " + } + }, + "a0f0d065f65248e98aa3a31493911932": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a20ca0892f964e25ba1586d4a5bbd9dd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "a2465290c940413bbd0c5059e9207d8e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "a25564c9cd6844539a80ba11ab305b89": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7e9050fcebb9454cb5e2c85353d93756", + "IPY_MODEL_a853a0f5e14345868bfe0433a96990b0", + "IPY_MODEL_a8aa4963126647ae974557a5df08688b" + ], + "layout": "IPY_MODEL_12b6d1753927475682f8702ca4cada4f", + "tabbable": null, + "tooltip": null + } + }, + "a31ce646c53140a1bb1ccd7777de20e7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a79151a924304732b6c49e43ebd01838": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a853a0f5e14345868bfe0433a96990b0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a2465290c940413bbd0c5059e9207d8e", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_c9ebc137214a4d48aa437ec8ab4efccd", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "a8aa4963126647ae974557a5df08688b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_9cc696acfaa24029a19aac32db80080a", + "placeholder": "​", + "style": "IPY_MODEL_f9b35e10980c481ebb2e68ff6bd086ec", + "tabbable": null, + "tooltip": null, + "value": " 14.9k/? 
[00:00<00:00, 1.36MB/s]" + } + }, + "ab3a91cb5e314145903e065d581c9be7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "abba5ca8b9c24aa5ad4834264bdd0bfc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_75205816cf7f4c50b9eee6478a4ed98a", + "IPY_MODEL_c06835d0e1084b16811cf1678fa6a9bf", + "IPY_MODEL_bec3af9e58e84d829633e69b53bc80d1" + ], + "layout": "IPY_MODEL_d819c83dbe194834abe4485e51a61fcc", + "tabbable": 
null, + "tooltip": null + } + }, + "ac4e9c906b1b46d4bd7c712e7e9412aa": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "acf4cc08f5654424a995921b867414bd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aec5f94e1ea64dc39eb51936779a81d2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b051a523cdd34fd180668215334d080c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": 
"LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b05797a38c9041d7a4fd977b6b66ca13": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_40a44625b8d54fbca680049dfe1ddd1c", + "IPY_MODEL_6307099d3c25460ba6bee80edd1d1a09", + "IPY_MODEL_8fc1b1d31b414d7f8c68676ac0dfa064" + ], + "layout": "IPY_MODEL_66cdc617906c4996b55384b1e8a907fd", + "tabbable": null, + "tooltip": null + } + }, + "b074bfe33e75427b96014c838dd4a42a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + 
"model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "b12e94ba51934a4097da47d981948854": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b3e6ffa744ed4549a25ddaf5e459cb6a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b6bde48c056745dda54e0d1420a78519": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + 
"max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b864c84037eb417e872fdcfeaa55bc8d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b91f8b142c3b48cf9f08d2df6eda1978": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bb0ac6fd692944548351a5c2efd31ea5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "bec3af9e58e84d829633e69b53bc80d1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_cdde474a35044655bd1693145eb1cba9", + "placeholder": "​", + "style": "IPY_MODEL_57574cc040184d2b8a4395b3287a8ef1", + "tabbable": null, + "tooltip": null, + "value": " 3000/3000 [00:01<00:00, 534.28 examples/s]" + } + }, + "c06835d0e1084b16811cf1678fa6a9bf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_85771c35b1354a02a5da2c6fac955977", + "max": 3000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_790136d7d63b4531b07d2ae73e8bd6af", + "tabbable": null, + "tooltip": null, + "value": 3000.0 + } + }, 
+ "c382838d589c4320b7bbc41684168893": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d8edddbcbf70493b8bfdff63602782b7", + "placeholder": "​", + "style": "IPY_MODEL_13630bf3f7be48c9b687ccff4e6b639f", + "tabbable": null, + "tooltip": null, + "value": "Generating train split: 100%" + } + }, + "c423948df5c1410d80bc29d975e87492": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c5e70f486b134264911fa70a74fb3d19": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c5fc3eba43d74cb2acea114cc65139e2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + 
"_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7641a7d1ad24f5ca33469ae18fdac68": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c9ebc137214a4d48aa437ec8ab4efccd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ca6689b7b9d1446c9f09e3dcf5125ae8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_65dec59f62c1480cb6bca2268f2a403a", + "IPY_MODEL_747599c76bbb4fc7b1c565b8dd4e1317", + "IPY_MODEL_6d4f64faf10d4cd2bf1d4d7bfb7fd1b4" + ], + "layout": "IPY_MODEL_3c6e58fc816741e7b1321e9bf0ee80a0", + "tabbable": null, + "tooltip": null + } + }, + "cbe4683fe14945e18dd7161a2c454500": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": 
null, + "text_color": null + } + }, + "cdde474a35044655bd1693145eb1cba9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ce0953daab7c4cb290ce36a31350dadc": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d127ada24d82403e826fd8cc2be96aae": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d15e935af71e48bf9eb16c54fd039cb4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + 
"model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c382838d589c4320b7bbc41684168893", + "IPY_MODEL_86852c7625d6454888e1e037ed765a70", + "IPY_MODEL_65e41393d19847bb86101d4a4d6b46d4" + ], + "layout": "IPY_MODEL_3951f3ffff774384acdfde89992fd0dd", + "tabbable": null, + "tooltip": null + } + }, + "d4037b70f52e4d998c56578de7c79774": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_26fbb43053db4a2eb73f900bcfb61f14", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_0d99defc025f44daab34d0c575b350c2", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "d4ace70caa0b427ebeb42e173962e4a6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + 
"flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d6182709ee8042188c1784721434f68c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d75f4d2d03ae4247a54732e952674931": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d7664d8a987847b983fdda73512050a7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "d7a1b908d01841bd863d9763faaecf19": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f0233eb685774716b42a86884e43593b", + "placeholder": "​", + "style": "IPY_MODEL_c5e70f486b134264911fa70a74fb3d19", + "tabbable": null, + "tooltip": null, + "value": " 1/1 [00:00<00:00, 142.45it/s]" + } + }, + "d819c83dbe194834abe4485e51a61fcc": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d8edddbcbf70493b8bfdff63602782b7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d92b9a6eb55b482fa1de1bc6ef01c2fe": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": 
null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dbdfb3f2528c4473bac00ebd848220a1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dd6c5df2d2c34be880470f6fd5fc297e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, 
+ "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dfeb0d920b4a491d85062ee2f62db17d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": 
null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e368e10eff6a4d77b3dcbec79b579e82": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ea4d607d803845d199352d6247ca844f", + "placeholder": "​", + "style": "IPY_MODEL_d7664d8a987847b983fdda73512050a7", + "tabbable": null, + "tooltip": null, + "value": " 1.69k/? [00:00<00:00, 146kB/s]" + } + }, + "e39ab592e86c4190bbbd488bcd85d99e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e58ddbc47ea34e1b9b3ece0738cf296b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e5c6e31d8a7943be90673e5034f1acf0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + 
"model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e6eae91cecb14392a0af9398937b2fbc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e79bc28cc65b4095bcdcce0706c34f32": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e94a7c81af474a4c9ada6310f0fa71f2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e9a732ae54f9464092bf38e35c644619": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_3584c90d6b27469cb2780c6fc618f146", + "placeholder": "​", + "style": "IPY_MODEL_7ed39adaaf9646ed8b9eaac7c64e4e26", + "tabbable": null, + "tooltip": null, + "value": "Loading weights: 100%" + } + }, + "ea1195f4e1494cd1b0188cb79e0812a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ea4d607d803845d199352d6247ca844f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eb1f0cb5250a460195a57ae36b2d1648": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_9e839d76d2fe4c42a7caf2d809cfa215", + "placeholder": "​", + "style": "IPY_MODEL_868b0d680f2e4ec983d25229a3196e61", + "tabbable": null, + "tooltip": null, + "value": "model.safetensors: 100%" + } + }, + "ece07379e5f3496f9950ad425026ca5f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ef66db03235441e49e3a28e8f7f142ea": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "f0233eb685774716b42a86884e43593b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f26c344426984a1a93868519dabb45bc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f2c4809b20c84e7a96d9b67a3d8ac09f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_dfeb0d920b4a491d85062ee2f62db17d", + "placeholder": "​", + "style": "IPY_MODEL_98d18f12362a44f48acbc5fa8d7973e1", + "tabbable": null, + "tooltip": null, + "value": "Map (num_proc=8): 100%" + } + }, + "f3fe3589911a40ebbfae2bd3596a935d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f74b38a069ee4a1f84435f2a7fe3919e", + "max": 208.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_7d01a45f2b68423396448afb5789db26", + "tabbable": null, + "tooltip": null, + "value": 208.0 + } + }, + "f416c1e959ee4745bc8a272345480d0d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_dbdfb3f2528c4473bac00ebd848220a1", + "placeholder": "​", + "style": "IPY_MODEL_e94a7c81af474a4c9ada6310f0fa71f2", + "tabbable": null, + "tooltip": null, + "value": "README.md: 100%" + } + }, + "f4b411dee72341d7ba6ee61660534490": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e9a732ae54f9464092bf38e35c644619", + "IPY_MODEL_760b0d5e9a9d44e386477ba3806fe69a", + "IPY_MODEL_fd3f63178dc444dfb32884cccb020faf" + ], + "layout": "IPY_MODEL_6cebc0e67c664e059e14fb481985a383", + "tabbable": null, + "tooltip": null + } + }, + 
"f4e8817005d044fb9fed3ba2035ebf6c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_bb0ac6fd692944548351a5c2efd31ea5", + "max": 1.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_3d028f45e72344c1a42519245f58a8ae", + "tabbable": null, + "tooltip": null, + "value": 1.0 + } + }, + "f74b38a069ee4a1f84435f2a7fe3919e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "f8edf29ad3514c31a53cab5b3f04f2da": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f9b35e10980c481ebb2e68ff6bd086ec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "fcebf1670b43483d8324d101283307c5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, 
+ "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "fd3f63178dc444dfb32884cccb020faf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b12e94ba51934a4097da47d981948854", + "placeholder": "​", + "style": "IPY_MODEL_b074bfe33e75427b96014c838dd4a42a", + "tabbable": null, + "tooltip": null, + "value": " 2130/2130 [00:53<00:00, 377.03it/s]" + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Text.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Text.ipynb new file mode 100644 index 0000000..7477a45 --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Text.ipynb @@ -0,0 +1,7723 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "64duhI2Gsavq" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab A100 instance!\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tDDXqI4lsavq" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-Zyog1Zysavq" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
Train models, no code needed. Run GGUF models on Mac, Windows & Linux.
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LRFceDFcsavq" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "A6wGqvTjsavr" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "lBN09c1tUlSV" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "-Xbb0cuLzwgf", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 461, + "referenced_widgets": [ + "7769eb91dcb74e2b998cea8da27110c0", + "c96e424cc8ba44c9ad664e4acc966e99", + "8d33ff5f466e416ca3e3332a5fd15d59", + "6dd24e24dccb4f76884a667a339889a5", + "f64b22ad99e54673944ad3ba7bde198b", + "a6828f24aaad43d58bf87cbadbfdae20", + "e233f639b56b47ba813d231923e6234e", + "692480d015c746ae9a97e09def1832c9", + "6a6c3c78ddbd4e27a5886e47cf44e476", + "1a3548fbc46e48cfb0df425c6b640467", + "62dd1228b1d0402ead07486077387640", + "12c7026200d14b42b3d085995ab11a7a", + "7ca13c9cdb5c4b649c78793975295602", + "1ff0ed429f7f4784a1846b62126e6288", + "3bb8deb734824b119dc6a75a03671357", + "038f96ae1ed5480b9f209d268ae32cbd", + "670f3a9c0b2045cfa378a5aac8337396", + "6f5ce5cfa09f492e956109eda6f5b3f3", + "8ebf11b7ab2142ceb3a1893d4f215dd3", + "604450c9e490468081c0577fb9d6116c", + "5818f1d23c7e4fb9b0f2a8c7fd00f6d8", + "790cc60e62b74a0f83bee6b135407008", + "aa8461c3b0a048859b122f833cdd0588", + "15d3f86682a7499ba4a89a47c3eb2cc9", + "193ff086d24f4ec79369788d1f23ce5b", + "42d741ab6080460d86c7b25d69c83513", + "ca26a64b7c8141849e2af1c7926c8d4c", + "6caf778bd350418ab85f925401358cb7", + "4bcd980f165945358b2dd5bb46d852dd", + "361e586e6e55463a90f6b9dc4f536e0c", + "75b32154d6504abcb249e2c2e00c6afb", + "c743597897874b0cbc8440fdf5eb3f7a", + "38de8ae3e68a4b1796aca83591ee8156", + "4154a72354a8403dbe3369ba9c548fc7", + "c88bd79d9b5c4b0681930216611d5522", + "a46f89db282041ff8cc4401bb8a4a636", + "e50c40ce7c6b43fd9c347b915e2b6aab", + "8431c48be48e4720aec35e9bab52fa92", + "f41e37a94c5c4b68a781b58c42fca4ff", + "0b4bf122609942de8afa64e37c61654b", + "e138b40042e54adaa94a1323fd9cc779", + "f8adec579c1d43a68e1c9ed0a64718fc", + "b1c346a40483474097727972d6a05eef", + 
"e740ea8de695437588e91271aaa28928", + "517986179dc2406fba8401fd1a18c866", + "b60c8f2bd64e4df7b1f3897ba20996c9", + "d52aff7110844ba99567cd085f88f4fb", + "f8aac8afada341a49d34f71feeb33086", + "29a3f7f177fd45ec98d5fdf5358e87e6", + "7f04178813454a64b2e11ae328ddd975", + "19514ae2563642e7aafc05fdcd2d6eb9", + "e15263fc72d24bf3a6da6563d246bca2", + "35444cb6e46f4925895b17449663d323", + "1f05fbc19f6d4c3c985093ae4502f677", + "92944bffeb3542cbb02ebe44fe67ff2b", + "9a181c75a14544a698eb5b5e16a7e27c", + "b4eda43ec55741ceab50a6ef8823641b", + "de0d4fe3ef26497c968986714c4c1294", + "53e8cc1b279b4cb68fe499416d20a28b", + "d5a0369d9ec94f018b8480620c458d0d", + "a450acc04e4d4bf58177461b3b888397", + "0959671891b947a19ceb2f1d9677e7e5", + "a885ae88241a4013b42b8f4d89d74f99", + "5db37854f8c6430cac9a074903f4a766", + "77b5140635d14346bf7ae7fe1c905dff", + "9062490a257644d380adf6ef8a037694", + "2d1ad39161c840d38be868b6b5d30e06", + "291e50a3d894436ab3691f49cdba7461", + "43404477358b43ab890fc624fbbe093d", + "c94cef24144246b38abf0669f99576a8", + "816a0b007a0e4dda8e41f11dd7d853d2", + "2b6016ee95894de5831bd86bc0dd4985", + "f9eeebd8dfe34fca81e472e14c21211a", + "7351cbc882f14b308be2c7732e56bb91", + "4715a8e2e0f1442a8911832160ad2d47", + "224b1b795ca840d4a06db069caae1e6d", + "6f31576ee8624861a649053804780272", + "ea4c29f5f43a453e8494f835a6407024", + "88f39a67d2b941d683b52a3c44559525", + "fd01935ca8d74ef5981e120f9c36868b", + "422a169fb50048509742eb51dd7db278", + "46a247d88b904364a2c01b7a23adc16d", + "751c3757c92e4b989bad2beaf0e72862", + "f069d89c19924879af9abb3cc8d71b64", + "0e4d36725d4a4aea8c02ccaf3dd96eaf", + "ca077b888e13430385998ae69c8b6175", + "f53291e6d8d041c5b050722b334e6cab", + "1942ae9bfdf348b791c37a8bb211f29a", + "22f7568e9caf4d728d63c52378b3d5a5", + "b0cbed05625145fdba2236b2264f0990", + "ed26f2f3dc67495e8c5e66dd43f56988", + "73bdd24fd1234a138bba9b248810769a", + "91886a707da84f88898e9eb1a5603766", + "c17e1dc079a8448d9cd3678aa841a4cf", + "8f96e18da1b940f1907df7daec6fe385", + 
"5880c5dc36154d699e40aacc9829efae", + "f2e283218aaf4f7d826d44d87bc18b66", + "2e283e162b124bdcae47a501696bc9af", + "f3a176148e2e48d982f3362dd707d389" + ] + }, + "outputId": "d373134b-4b9d-43be-db56-d438180a5a5a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| NVIDIA A100-SXM4-80GB. Num GPUs = 1. Max memory: 79.251 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors.index.json: 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "7769eb91dcb74e2b998cea8da27110c0" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "12c7026200d14b42b3d085995ab11a7a" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "9jGeSb9bWe0k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "512b7df1-151d-4e3e-f8cf-88cd39c34e2f" + }, + 
"outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The animal in the image is a sloth. While sloths are not typically central characters in major films, they have appeared in various movies and documentaries. Some examples include:\n", + "\n", + "* **Zootopia (2016):** A sloth named Flash is a memorable character in this Disney animated film.\n", + "* **Nature Documentaries:** Sloths are frequently featured in nature documentaries from series like *Planet Earth* or *National Geographic*.\n", + "* **Various animated films:** They occasionally appear as background characters or minor roles in various animated productions.\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eh0BzbZPWtRD" + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "R3ExuK8cWuT3", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "db69482a-7f7d-4dc9-cc54-768eb938fbb1" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "In the canopy’s emerald, velvet embrace,\n", + "Where the sunlight descends with a slow, steady grace,\n", + "Dwells a master of stillness, a king of the pause,\n", + "Obeying the rhythm of nature’s own laws.\n", + "\n", + "No hurry disturbs him, no frantic pursuit,\n", + "He is content with the leaf and the fruit.\n", + "With limbs like slow rivers and eyes soft and wise,\n", + "He watches the world through a dreamy disguise.\n", + "\n", + "A coat made of moss and a spirit of peace,\n", + "He waits for the rush of the jungle to cease.\n", + "While the monkeys all chatter and colorful birds fly,\n", + "\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "For now, you can select whether to finetune the vision and text parts. The audio part can also be finetuned, and we're working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small number of parameters!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "6bZsfBuZDeCL", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "81d7a65c-3b00-437c-d418-67867cd126de" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Detected MoE model with num_experts = 128 and target_modules = '(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|proj|linear).*?)|(?:\\\\bmodel\\\\.layers\\\\.[\\\\d]{1,}\\\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\\\.(?:(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|proj|linear)))'. Enabling LoRA on MoE parameters: ['mlp.experts.gate_up_proj', 'mlp.experts.down_proj']\n", + "Unsloth: PEFT set target_parameters but found no matching parameters.\n", + "This is expected for MoE models - Unsloth handles MoE expert LoRA targeting separately.\n" + ] + } + ], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. 
Gemma-4 renders multi turn conversations like below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "LjY75GoYUCB8" + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZQkXuGYxbJ-e" + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "Mkq4RvEq7FQr", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 113, + "referenced_widgets": [ + "ea642c11633e49db979a42c731ee1d65", + "7d44ac1ab19148ddbeb30b5a154ad8ae", + "608c16dc8fc049c1aa16b44db90b3aba", + "9fe2a1a901af4445aed445bce8b40c83", + "caa699c0834f42c68854dd5d612dde74", + "c512cc070e8349408354f6c9454eb2b7", + "0302c405886540e9837f90608688c0be", + "1521c752798a47fb996da308bf41a27f", + "7208562807bb435b85271416df707e93", + "26779d9bd26b461e9df7846137d31213", + "9c3255be55bf4c1ea91f55754010961e", + "0921d7187ee24114880fa7d7b7db645d", + "f28157da9878442998595a29bda2f64b", + "13580885a8fc41acb660a155918fba88", + "90c1689829f242998f4b35bb8068d5f6", + "b9c0ae90bcbd498ab039c00f6d7c0017", + "dc0f07ea3fb74c9a9817e3c61a237211", + "22e5c73c269247f3915c216c9efb2a7e", + "ed6383465575471689ab4d8a2323da84", + "fb1419c94d254f38ae1007200c3cec92", + "a8bb4aa7409647c2b8949b14e67f8222", + "63dd480a71204e75b4bb7dd8ca76790f", + "3a0e9f6cd1014fd38ccd3ef2734b2929", + "7b0b5a0d6e474bb78ee8eac7281099b5", + "f8b0f6f04f734d4f8bb59c29ee163a42", + "e82867e03ef64048b0632e551a941eb3", + 
"33bd1e52d5094206942d06bbdab706cf", + "7818031043d44f27a87fa3574cc8c86e", + "38cca59244824cff98732887980d5172", + "ecc2db1896c244958fe0e79dd11b66bd", + "d838de584c474e8ea454a4c71185af61", + "bbdce42ed0704835ade22bd2a2f8c46c", + "750bfb16f1f442a2bd1e4b2ee021319b" + ] + }, + "outputId": "2ab453f2-3708-4f67-b137-dbd70ce21643" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`''`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "1ahE8Ys37JDJ", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "89abc50a036d4faf9be10ceed297717b", + "fd4fe9ecb07b4a93b28cc23aff8c869d", + "10f5aeae5a594847aac7310d83942932", + "cbd6d8c7ed884edd9f53d56f8abfd194", + "aaf13b4c776146be9b377259caad3a9f", + "eceb4abf92b744df8973b3443e956ff5", + "4ed42a6b2ed541edaa51d5769589b234", + "f81f35d00b0443f5aff0a50382fca927", + "27abe44e35554d3e8338b50f494f7bd2", + "d07a25a3f10e47fa835d5d9ef6088e09", + "f37eb7bb43cc4409bfbfd827450dc4c0" + ] + }, + "outputId": "8fcfe6f4-5d8e-4f43-ca00-3ed0d290a09d" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "Let's see how the chat template did! Notice there is no `` token as the processor tokenizer will be adding one." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "gGFzmplrEy9I", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + }, + "outputId": "2985589a-7ff8-4334-afde-3354f60e3ce6" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 13 + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up; for a full run, set `num_train_epochs=1` and `max_steps=None`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "95_Nn-89DhsL", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "c3d4dd1d33394f1a88ca079188777fe1", + "c1533ef6cb884d6e86a22519ed7125ce", + "824b65bd8b7a41baadbe6ddb79b17de4", + "c118b89131354eb98deaae78238b0210", + "89fccf64a5ca4b08b435b4223f42793e", + "a4b2ebf57ab144f4bff90820725b6108", + "f5b43de869004a42b2b9d0740aa455e7", + "85243a7a0b3a4abb891cf55828ba8b7a", + "a9805e75f7df4882a90770bf8a7a5530", + "b3ac166597b342ca8cbf4eb0b361e7a4", + "97802db0631649d79348faed67ba75ef" + ] + }, + "outputId": "fb94fdc8-f519-4b05-d689-525b6857cc89" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=16): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dv1NBUozV78l" + }, + "source": [ + "Let's verify masking the instruction part is done! Let's print the 100th row again. Notice how the sample only has a single `` as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "LtsMVtlkUhja", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + }, + "outputId": "e6063abd-0fbb-4ae7-d93a-500cbe5f9e5f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4Kyjy__m9KY3" + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "_rD6fl8EUxnG", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + }, + "outputId": "420e56a4-9651-45b4-d8f7-c33f3b7ff052" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "' <|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7c222465-fa0f-467c-91d8-f2c6a2ad3938" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = NVIDIA A100-SXM4-80GB. Max memory = 79.251 GB.\n", + "46.416 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "c2db93ff-2bc9-4ebc-dcc5-36d484a7618c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 3,000 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 9,292,800 of 25,815,226,672 (0.04% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 20:53, Epoch 0/1]\n", + "
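| Step | Training Loss |
|------|---------------|
| 1 | 1.781083 |
| 2 | 0.566694 |
| 3 | 0.878096 |
| 4 | 1.368889 |
| 5 | 1.104031 |
| 6 | 0.794103 |
| 7 | 0.815552 |
| 8 | 0.770014 |
| 9 | 0.916691 |
| 10 | 1.033089 |
| 11 | 1.126286 |
| 12 | 0.652004 |
| 13 | 0.769890 |
| 14 | 0.978357 |
| 15 | 0.766254 |
| 16 | 0.999658 |
| 17 | 1.138031 |
| 18 | 1.053890 |
| 19 | 1.042485 |
| 20 | 0.973515 |
| 21 | 0.605233 |
| 22 | 1.156014 |
| 23 | 0.922710 |
| 24 | 0.928817 |
| 25 | 0.575161 |
| 26 | 0.498077 |
| 27 | 0.613930 |
| 28 | 0.653096 |
| 29 | 0.572882 |
| 30 | 0.831407 |
| 31 | 0.706046 |
| 32 | 0.678690 |
| 33 | 1.082695 |
| 34 | 0.499532 |
| 35 | 0.965407 |
| 36 | 0.964235 |
| 37 | 0.643548 |
| 38 | 0.955152 |
| 39 | 0.993222 |
| 40 | 0.663521 |
| 41 | 0.862385 |
| 42 | 0.868709 |
| 43 | 0.650642 |
| 44 | 0.554028 |
| 45 | 0.584829 |
| 46 | 1.281055 |
| 47 | 0.672700 |
| 48 | 0.687023 |
| 49 | 0.953930 |
| 50 | 1.126283 |
| 51 | 0.532105 |
| 52 | 0.613895 |
| 53 | 0.654956 |
| 54 | 0.610443 |
| 55 | 0.815666 |
| 56 | 0.950072 |
| 57 | 0.495718 |
| 58 | 0.817099 |
| 59 | 0.894461 |
| 60 | 0.800382 |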

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1e195e99-92bc-4b23-91a9-a9602d8035a7" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "1326.0614 seconds used for training.\n", + "22.1 minutes used for training.\n", + "Peak reserved memory = 48.988 GB.\n", + "Peak reserved memory for training = 2.572 GB.\n", + "Peak reserved memory % of max memory = 61.814 %.\n", + "Peak reserved memory for training % of max memory = 3.245 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! 
According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "kR3gIAX-SM2q", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7c7c8ee7-33b3-474e-af73-7e743a706b57" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\n<|channel>thought\\n13, 21, 34, 55, 89, 144, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.']" + ], + "text/html": [ + "

['<bos><|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,<turn|>\\n<|turn>model\\n<|channel>thought\\n<channel|>13, 21, 34, 55, 89, 144, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.<turn|>']
" + ] + }, + "metadata": {}, + "execution_count": 21 + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-3 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CrSvZObor0lY" + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "e2pEuRb1r2Vg", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "9ad7c937-8c04-46ba-d850-ce24f294ffc0" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The sky is blue because of a phenomenon called **Rayleigh scattering**.\n", + "\n", + "Here is the step-by-step breakdown of why this happens:\n", + "\n", + "### 1. 
Sunlight is a spectrum of colors\n", + "Although sunlight looks white, it is actually made up of all the colors of the rainbow (red\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-3 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "upcOlWe7A1vc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "42f363fe-2e66-41ef-9645-998f4b011299" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 23 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "id": "MKX_XKs_BNZR", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c05ee3db-0983-468f-bc1c-4a99b1ef1ac5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "Gemma 4 is a family of open weights large language models developed by Google DeepMind. 
\n", + "\n", + "Key characteristics of the Gemma 4 family include:\n", + "\n", + "* **Open Weights:** These models are released with open weights, allowing developers and researchers to customize, fine-tune, and deploy them in various applications.\n", + "* **Multimodal Capabilities:** Gemma 4 models are capable of understanding and processing both text and image inputs.\n", + "* **Audio Processing:** Within the Gemma 4 family, the 2B and 4B models also have the capability to process audio input.\n", + "* **Text Generation:** While\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-3 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! The merged model is saved in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "ZV-CiKPrIFG0" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "FqfebeAdT073" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "id": "ZgcJIhJ0I_es" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [], + "machine_shape": "hm" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "7769eb91dcb74e2b998cea8da27110c0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c96e424cc8ba44c9ad664e4acc966e99", + "IPY_MODEL_8d33ff5f466e416ca3e3332a5fd15d59", + "IPY_MODEL_6dd24e24dccb4f76884a667a339889a5" + ], + "layout": "IPY_MODEL_f64b22ad99e54673944ad3ba7bde198b" + } + }, + "c96e424cc8ba44c9ad664e4acc966e99": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a6828f24aaad43d58bf87cbadbfdae20", + "placeholder": "​", + "style": "IPY_MODEL_e233f639b56b47ba813d231923e6234e", + "value": 
"model.safetensors.index.json: " + } + }, + "8d33ff5f466e416ca3e3332a5fd15d59": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_692480d015c746ae9a97e09def1832c9", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6a6c3c78ddbd4e27a5886e47cf44e476", + "value": 1 + } + }, + "6dd24e24dccb4f76884a667a339889a5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1a3548fbc46e48cfb0df425c6b640467", + "placeholder": "​", + "style": "IPY_MODEL_62dd1228b1d0402ead07486077387640", + "value": " 103k/? 
[00:00<00:00, 9.53MB/s]" + } + }, + "f64b22ad99e54673944ad3ba7bde198b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a6828f24aaad43d58bf87cbadbfdae20": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e233f639b56b47ba813d231923e6234e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "692480d015c746ae9a97e09def1832c9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "6a6c3c78ddbd4e27a5886e47cf44e476": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1a3548fbc46e48cfb0df425c6b640467": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"62dd1228b1d0402ead07486077387640": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "12c7026200d14b42b3d085995ab11a7a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7ca13c9cdb5c4b649c78793975295602", + "IPY_MODEL_1ff0ed429f7f4784a1846b62126e6288", + "IPY_MODEL_3bb8deb734824b119dc6a75a03671357" + ], + "layout": "IPY_MODEL_038f96ae1ed5480b9f209d268ae32cbd" + } + }, + "7ca13c9cdb5c4b649c78793975295602": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_670f3a9c0b2045cfa378a5aac8337396", + "placeholder": "​", + "style": "IPY_MODEL_6f5ce5cfa09f492e956109eda6f5b3f3", + "value": "Download complete: 100%" + } + }, + "1ff0ed429f7f4784a1846b62126e6288": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8ebf11b7ab2142ceb3a1893d4f215dd3", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_604450c9e490468081c0577fb9d6116c", + "value": 1 + } + }, + "3bb8deb734824b119dc6a75a03671357": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5818f1d23c7e4fb9b0f2a8c7fd00f6d8", + "placeholder": "​", + "style": "IPY_MODEL_790cc60e62b74a0f83bee6b135407008", + "value": " 51.6G/51.6G [02:15<00:00, 374MB/s]" + } + }, + "038f96ae1ed5480b9f209d268ae32cbd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "670f3a9c0b2045cfa378a5aac8337396": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6f5ce5cfa09f492e956109eda6f5b3f3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8ebf11b7ab2142ceb3a1893d4f215dd3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "604450c9e490468081c0577fb9d6116c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "5818f1d23c7e4fb9b0f2a8c7fd00f6d8": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "790cc60e62b74a0f83bee6b135407008": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "aa8461c3b0a048859b122f833cdd0588": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_15d3f86682a7499ba4a89a47c3eb2cc9", + "IPY_MODEL_193ff086d24f4ec79369788d1f23ce5b", + "IPY_MODEL_42d741ab6080460d86c7b25d69c83513" + ], + "layout": "IPY_MODEL_ca26a64b7c8141849e2af1c7926c8d4c" + } + }, + "15d3f86682a7499ba4a89a47c3eb2cc9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6caf778bd350418ab85f925401358cb7", + "placeholder": "​", + "style": "IPY_MODEL_4bcd980f165945358b2dd5bb46d852dd", + "value": "Fetching 2 files: 100%" + } + }, + "193ff086d24f4ec79369788d1f23ce5b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_361e586e6e55463a90f6b9dc4f536e0c", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_75b32154d6504abcb249e2c2e00c6afb", + "value": 2 + } + }, + "42d741ab6080460d86c7b25d69c83513": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c743597897874b0cbc8440fdf5eb3f7a", + "placeholder": "​", + "style": "IPY_MODEL_38de8ae3e68a4b1796aca83591ee8156", + "value": " 2/2 [02:15<00:00, 135.50s/it]" + } + }, + "ca26a64b7c8141849e2af1c7926c8d4c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6caf778bd350418ab85f925401358cb7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4bcd980f165945358b2dd5bb46d852dd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "361e586e6e55463a90f6b9dc4f536e0c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "75b32154d6504abcb249e2c2e00c6afb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "c743597897874b0cbc8440fdf5eb3f7a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38de8ae3e68a4b1796aca83591ee8156": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4154a72354a8403dbe3369ba9c548fc7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c88bd79d9b5c4b0681930216611d5522", + "IPY_MODEL_a46f89db282041ff8cc4401bb8a4a636", + "IPY_MODEL_e50c40ce7c6b43fd9c347b915e2b6aab" + ], + "layout": "IPY_MODEL_8431c48be48e4720aec35e9bab52fa92" + } + }, + "c88bd79d9b5c4b0681930216611d5522": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_f41e37a94c5c4b68a781b58c42fca4ff", + "placeholder": "​", + "style": "IPY_MODEL_0b4bf122609942de8afa64e37c61654b", + "value": "Loading weights: 100%" + } + }, + "a46f89db282041ff8cc4401bb8a4a636": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e138b40042e54adaa94a1323fd9cc779", + "max": 1013, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f8adec579c1d43a68e1c9ed0a64718fc", + "value": 1013 + } + }, + "e50c40ce7c6b43fd9c347b915e2b6aab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b1c346a40483474097727972d6a05eef", + "placeholder": "​", + "style": "IPY_MODEL_e740ea8de695437588e91271aaa28928", + "value": " 1013/1013 [00:14<00:00, 547.14it/s]" + } + }, + "8431c48be48e4720aec35e9bab52fa92": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f41e37a94c5c4b68a781b58c42fca4ff": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": 
null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0b4bf122609942de8afa64e37c61654b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e138b40042e54adaa94a1323fd9cc779": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f8adec579c1d43a68e1c9ed0a64718fc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b1c346a40483474097727972d6a05eef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e740ea8de695437588e91271aaa28928": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + 
}, + "517986179dc2406fba8401fd1a18c866": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b60c8f2bd64e4df7b1f3897ba20996c9", + "IPY_MODEL_d52aff7110844ba99567cd085f88f4fb", + "IPY_MODEL_f8aac8afada341a49d34f71feeb33086" + ], + "layout": "IPY_MODEL_29a3f7f177fd45ec98d5fdf5358e87e6" + } + }, + "b60c8f2bd64e4df7b1f3897ba20996c9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7f04178813454a64b2e11ae328ddd975", + "placeholder": "​", + "style": "IPY_MODEL_19514ae2563642e7aafc05fdcd2d6eb9", + "value": "generation_config.json: 100%" + } + }, + "d52aff7110844ba99567cd085f88f4fb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e15263fc72d24bf3a6da6563d246bca2", + "max": 208, + "min": 0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_35444cb6e46f4925895b17449663d323", + "value": 208 + } + }, + "f8aac8afada341a49d34f71feeb33086": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1f05fbc19f6d4c3c985093ae4502f677", + "placeholder": "​", + "style": "IPY_MODEL_92944bffeb3542cbb02ebe44fe67ff2b", + "value": " 208/208 [00:00<00:00, 28.3kB/s]" + } + }, + "29a3f7f177fd45ec98d5fdf5358e87e6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "7f04178813454a64b2e11ae328ddd975": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "19514ae2563642e7aafc05fdcd2d6eb9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e15263fc72d24bf3a6da6563d246bca2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "35444cb6e46f4925895b17449663d323": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1f05fbc19f6d4c3c985093ae4502f677": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "92944bffeb3542cbb02ebe44fe67ff2b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9a181c75a14544a698eb5b5e16a7e27c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b4eda43ec55741ceab50a6ef8823641b", + "IPY_MODEL_de0d4fe3ef26497c968986714c4c1294", + "IPY_MODEL_53e8cc1b279b4cb68fe499416d20a28b" + ], + "layout": "IPY_MODEL_d5a0369d9ec94f018b8480620c458d0d" + } + }, + "b4eda43ec55741ceab50a6ef8823641b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a450acc04e4d4bf58177461b3b888397", + "placeholder": "​", + "style": "IPY_MODEL_0959671891b947a19ceb2f1d9677e7e5", + "value": "processor_config.json: " + } + }, + "de0d4fe3ef26497c968986714c4c1294": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a885ae88241a4013b42b8f4d89d74f99", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5db37854f8c6430cac9a074903f4a766", + "value": 1 + } + }, + "53e8cc1b279b4cb68fe499416d20a28b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_77b5140635d14346bf7ae7fe1c905dff", + "placeholder": "​", + "style": "IPY_MODEL_9062490a257644d380adf6ef8a037694", + "value": " 1.69k/? 
[00:00<00:00, 182kB/s]" + } + }, + "d5a0369d9ec94f018b8480620c458d0d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a450acc04e4d4bf58177461b3b888397": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0959671891b947a19ceb2f1d9677e7e5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a885ae88241a4013b42b8f4d89d74f99": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "5db37854f8c6430cac9a074903f4a766": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "77b5140635d14346bf7ae7fe1c905dff": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"9062490a257644d380adf6ef8a037694": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2d1ad39161c840d38be868b6b5d30e06": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_291e50a3d894436ab3691f49cdba7461", + "IPY_MODEL_43404477358b43ab890fc624fbbe093d", + "IPY_MODEL_c94cef24144246b38abf0669f99576a8" + ], + "layout": "IPY_MODEL_816a0b007a0e4dda8e41f11dd7d853d2" + } + }, + "291e50a3d894436ab3691f49cdba7461": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2b6016ee95894de5831bd86bc0dd4985", + "placeholder": "​", + "style": "IPY_MODEL_f9eeebd8dfe34fca81e472e14c21211a", + "value": "chat_template.jinja: " + } + }, + "43404477358b43ab890fc624fbbe093d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7351cbc882f14b308be2c7732e56bb91", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4715a8e2e0f1442a8911832160ad2d47", + "value": 1 + } + }, + "c94cef24144246b38abf0669f99576a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_224b1b795ca840d4a06db069caae1e6d", + "placeholder": "​", + "style": "IPY_MODEL_6f31576ee8624861a649053804780272", + "value": " 12.0k/? 
[00:00<00:00, 1.25MB/s]" + } + }, + "816a0b007a0e4dda8e41f11dd7d853d2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2b6016ee95894de5831bd86bc0dd4985": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f9eeebd8dfe34fca81e472e14c21211a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7351cbc882f14b308be2c7732e56bb91": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "4715a8e2e0f1442a8911832160ad2d47": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "224b1b795ca840d4a06db069caae1e6d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"6f31576ee8624861a649053804780272": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ea4c29f5f43a453e8494f835a6407024": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_88f39a67d2b941d683b52a3c44559525", + "IPY_MODEL_fd01935ca8d74ef5981e120f9c36868b", + "IPY_MODEL_422a169fb50048509742eb51dd7db278" + ], + "layout": "IPY_MODEL_46a247d88b904364a2c01b7a23adc16d" + } + }, + "88f39a67d2b941d683b52a3c44559525": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_751c3757c92e4b989bad2beaf0e72862", + "placeholder": "​", + "style": "IPY_MODEL_f069d89c19924879af9abb3cc8d71b64", + "value": "tokenizer_config.json: " + } + }, + "fd01935ca8d74ef5981e120f9c36868b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0e4d36725d4a4aea8c02ccaf3dd96eaf", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ca077b888e13430385998ae69c8b6175", + "value": 1 + } + }, + "422a169fb50048509742eb51dd7db278": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f53291e6d8d041c5b050722b334e6cab", + "placeholder": "​", + "style": "IPY_MODEL_1942ae9bfdf348b791c37a8bb211f29a", + "value": " 15.0k/? 
[00:00<00:00, 1.64MB/s]" + } + }, + "46a247d88b904364a2c01b7a23adc16d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "751c3757c92e4b989bad2beaf0e72862": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f069d89c19924879af9abb3cc8d71b64": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0e4d36725d4a4aea8c02ccaf3dd96eaf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "ca077b888e13430385998ae69c8b6175": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f53291e6d8d041c5b050722b334e6cab": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"1942ae9bfdf348b791c37a8bb211f29a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "22f7568e9caf4d728d63c52378b3d5a5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b0cbed05625145fdba2236b2264f0990", + "IPY_MODEL_ed26f2f3dc67495e8c5e66dd43f56988", + "IPY_MODEL_73bdd24fd1234a138bba9b248810769a" + ], + "layout": "IPY_MODEL_91886a707da84f88898e9eb1a5603766" + } + }, + "b0cbed05625145fdba2236b2264f0990": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c17e1dc079a8448d9cd3678aa841a4cf", + "placeholder": "​", + "style": "IPY_MODEL_8f96e18da1b940f1907df7daec6fe385", + "value": "tokenizer.json: 100%" + } + }, + "ed26f2f3dc67495e8c5e66dd43f56988": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5880c5dc36154d699e40aacc9829efae", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f2e283218aaf4f7d826d44d87bc18b66", + "value": 32169626 + } + }, + "73bdd24fd1234a138bba9b248810769a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2e283e162b124bdcae47a501696bc9af", + "placeholder": "​", + "style": "IPY_MODEL_f3a176148e2e48d982f3362dd707d389", + "value": " 32.2M/32.2M [00:01<00:00, 161MB/s]" + } + }, + "91886a707da84f88898e9eb1a5603766": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c17e1dc079a8448d9cd3678aa841a4cf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8f96e18da1b940f1907df7daec6fe385": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5880c5dc36154d699e40aacc9829efae": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f2e283218aaf4f7d826d44d87bc18b66": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"2e283e162b124bdcae47a501696bc9af": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f3a176148e2e48d982f3362dd707d389": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ea642c11633e49db979a42c731ee1d65": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7d44ac1ab19148ddbeb30b5a154ad8ae", + "IPY_MODEL_608c16dc8fc049c1aa16b44db90b3aba", + "IPY_MODEL_9fe2a1a901af4445aed445bce8b40c83" + ], + "layout": "IPY_MODEL_caa699c0834f42c68854dd5d612dde74" + } + }, + "7d44ac1ab19148ddbeb30b5a154ad8ae": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c512cc070e8349408354f6c9454eb2b7", + "placeholder": "​", + "style": "IPY_MODEL_0302c405886540e9837f90608688c0be", + "value": "README.md: 100%" + } + }, + "608c16dc8fc049c1aa16b44db90b3aba": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1521c752798a47fb996da308bf41a27f", + "max": 982, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_7208562807bb435b85271416df707e93", + "value": 982 + } + }, + "9fe2a1a901af4445aed445bce8b40c83": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_26779d9bd26b461e9df7846137d31213", + "placeholder": "​", + "style": "IPY_MODEL_9c3255be55bf4c1ea91f55754010961e", + "value": " 982/982 [00:00<00:00, 112kB/s]" + } + }, + "caa699c0834f42c68854dd5d612dde74": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c512cc070e8349408354f6c9454eb2b7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0302c405886540e9837f90608688c0be": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1521c752798a47fb996da308bf41a27f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7208562807bb435b85271416df707e93": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "26779d9bd26b461e9df7846137d31213": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9c3255be55bf4c1ea91f55754010961e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0921d7187ee24114880fa7d7b7db645d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f28157da9878442998595a29bda2f64b", + "IPY_MODEL_13580885a8fc41acb660a155918fba88", + "IPY_MODEL_90c1689829f242998f4b35bb8068d5f6" + ], + "layout": "IPY_MODEL_b9c0ae90bcbd498ab039c00f6d7c0017" + } + }, + "f28157da9878442998595a29bda2f64b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dc0f07ea3fb74c9a9817e3c61a237211", + "placeholder": "​", + "style": "IPY_MODEL_22e5c73c269247f3915c216c9efb2a7e", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "13580885a8fc41acb660a155918fba88": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ed6383465575471689ab4d8a2323da84", + "max": 116531415, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fb1419c94d254f38ae1007200c3cec92", + "value": 116531415 + } + }, + "90c1689829f242998f4b35bb8068d5f6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a8bb4aa7409647c2b8949b14e67f8222", + "placeholder": "​", + "style": "IPY_MODEL_63dd480a71204e75b4bb7dd8ca76790f", + "value": " 117M/117M [00:02<00:00, 581MB/s]" + } + }, + "b9c0ae90bcbd498ab039c00f6d7c0017": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dc0f07ea3fb74c9a9817e3c61a237211": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22e5c73c269247f3915c216c9efb2a7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ed6383465575471689ab4d8a2323da84": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fb1419c94d254f38ae1007200c3cec92": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a8bb4aa7409647c2b8949b14e67f8222": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "63dd480a71204e75b4bb7dd8ca76790f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3a0e9f6cd1014fd38ccd3ef2734b2929": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7b0b5a0d6e474bb78ee8eac7281099b5", + "IPY_MODEL_f8b0f6f04f734d4f8bb59c29ee163a42", + "IPY_MODEL_e82867e03ef64048b0632e551a941eb3" + ], + "layout": "IPY_MODEL_33bd1e52d5094206942d06bbdab706cf" + } + }, + "7b0b5a0d6e474bb78ee8eac7281099b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7818031043d44f27a87fa3574cc8c86e", + "placeholder": "​", + "style": "IPY_MODEL_38cca59244824cff98732887980d5172", + "value": "Generating train split: 100%" + } + }, + "f8b0f6f04f734d4f8bb59c29ee163a42": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_ecc2db1896c244958fe0e79dd11b66bd", + "max": 100000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d838de584c474e8ea454a4c71185af61", + "value": 100000 + } + }, + "e82867e03ef64048b0632e551a941eb3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bbdce42ed0704835ade22bd2a2f8c46c", + "placeholder": "​", + "style": "IPY_MODEL_750bfb16f1f442a2bd1e4b2ee021319b", + "value": " 100000/100000 [00:00<00:00, 144341.88 examples/s]" + } + }, + "33bd1e52d5094206942d06bbdab706cf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7818031043d44f27a87fa3574cc8c86e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38cca59244824cff98732887980d5172": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ecc2db1896c244958fe0e79dd11b66bd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d838de584c474e8ea454a4c71185af61": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bbdce42ed0704835ade22bd2a2f8c46c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "750bfb16f1f442a2bd1e4b2ee021319b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3a8bada54de24005878735b98d444f29": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8eded708fd6742cc96cec313c512b0f0", + "IPY_MODEL_b9c5d9b844874196ae366f285e49ebfe", + "IPY_MODEL_b40f36a643b242dfbaf1ffd475e83da2" + ], + "layout": "IPY_MODEL_45a9724053244ca998c84a5e0a81c1b9" + } + }, + 
"8eded708fd6742cc96cec313c512b0f0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_89d693d6c7a142c68c7802423585b151", + "placeholder": "​", + "style": "IPY_MODEL_a2c6355a6a6b4ef787cbeba481410426", + "value": "Unsloth: Standardizing formats (num_proc=16): 100%" + } + }, + "b9c5d9b844874196ae366f285e49ebfe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76fd0b3eac784c53943244976d315519", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_50ffb521dbd1411bb0d8537d3bd4b6d0", + "value": 3000 + } + }, + "b40f36a643b242dfbaf1ffd475e83da2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_20c834cf533442388acf92187ee4445e", + "placeholder": "​", + "style": 
"IPY_MODEL_9dcb860a29474e5b93713073549d35f1", + "value": " 3000/3000 [00:01<00:00, 212.69 examples/s]" + } + }, + "45a9724053244ca998c84a5e0a81c1b9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "89d693d6c7a142c68c7802423585b151": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, 
+ "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a2c6355a6a6b4ef787cbeba481410426": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "76fd0b3eac784c53943244976d315519": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "50ffb521dbd1411bb0d8537d3bd4b6d0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "20c834cf533442388acf92187ee4445e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "9dcb860a29474e5b93713073549d35f1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "89abc50a036d4faf9be10ceed297717b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_fd4fe9ecb07b4a93b28cc23aff8c869d", + "IPY_MODEL_10f5aeae5a594847aac7310d83942932", + "IPY_MODEL_cbd6d8c7ed884edd9f53d56f8abfd194" + ], + "layout": "IPY_MODEL_aaf13b4c776146be9b377259caad3a9f" + } + }, + "fd4fe9ecb07b4a93b28cc23aff8c869d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_eceb4abf92b744df8973b3443e956ff5", + "placeholder": "​", + "style": "IPY_MODEL_4ed42a6b2ed541edaa51d5769589b234", + "value": "Map: 100%" + } + }, + "10f5aeae5a594847aac7310d83942932": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f81f35d00b0443f5aff0a50382fca927", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_27abe44e35554d3e8338b50f494f7bd2", + "value": 3000 + } + }, + "cbd6d8c7ed884edd9f53d56f8abfd194": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d07a25a3f10e47fa835d5d9ef6088e09", + "placeholder": "​", + "style": "IPY_MODEL_f37eb7bb43cc4409bfbfd827450dc4c0", + "value": " 3000/3000 [00:00<00:00, 9363.26 examples/s]" + } + }, + "aaf13b4c776146be9b377259caad3a9f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": 
null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eceb4abf92b744df8973b3443e956ff5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4ed42a6b2ed541edaa51d5769589b234": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f81f35d00b0443f5aff0a50382fca927": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "27abe44e35554d3e8338b50f494f7bd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"d07a25a3f10e47fa835d5d9ef6088e09": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f37eb7bb43cc4409bfbfd827450dc4c0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c3d4dd1d33394f1a88ca079188777fe1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c1533ef6cb884d6e86a22519ed7125ce", + "IPY_MODEL_824b65bd8b7a41baadbe6ddb79b17de4", + "IPY_MODEL_c118b89131354eb98deaae78238b0210" + ], + "layout": "IPY_MODEL_89fccf64a5ca4b08b435b4223f42793e" + } + }, + "c1533ef6cb884d6e86a22519ed7125ce": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a4b2ebf57ab144f4bff90820725b6108", + "placeholder": "​", + "style": "IPY_MODEL_f5b43de869004a42b2b9d0740aa455e7", + "value": "Unsloth: Tokenizing [\"text\"] (num_proc=16): 100%" + } + }, + "824b65bd8b7a41baadbe6ddb79b17de4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_85243a7a0b3a4abb891cf55828ba8b7a", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a9805e75f7df4882a90770bf8a7a5530", + "value": 3000 + } + }, + "c118b89131354eb98deaae78238b0210": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { +
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b3ac166597b342ca8cbf4eb0b361e7a4", + "placeholder": "​", + "style": "IPY_MODEL_97802db0631649d79348faed67ba75ef", + "value": " 3000/3000 [01:19<00:00, 39.23 examples/s]" + } + }, + "89fccf64a5ca4b08b435b4223f42793e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a4b2ebf57ab144f4bff90820725b6108": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": 
"1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f5b43de869004a42b2b9d0740aa455e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "85243a7a0b3a4abb891cf55828ba8b7a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a9805e75f7df4882a90770bf8a7a5530": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b3ac166597b342ca8cbf4eb0b361e7a4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "97802db0631649d79348faed67ba75ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6c2f48e8343b4722bc30bb62d94ea55f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e2755c541dad407ba30990db458f29e4", + "IPY_MODEL_927ce392ba2a403081cb68ddf07fcc78", + "IPY_MODEL_c0e8470e86fd4f34b8adff7ac78ae0cd" + ], + "layout": "IPY_MODEL_ecbed13d665c454da7f835028d56e590" + } + }, + "e2755c541dad407ba30990db458f29e4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_89d74579171d48668c0542e20fbe877c", + "placeholder": "​", + "style": "IPY_MODEL_e03a39c6a9354ed9a018e2487f073d59", + "value": "Map (num_proc=16): 100%" + } + }, + "927ce392ba2a403081cb68ddf07fcc78": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_29c397be81794e3aaf80ff115b1bd8fe", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b8c562d039ca4c9bba34c03f10dd65a7", + "value": 3000 + } + }, + "c0e8470e86fd4f34b8adff7ac78ae0cd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0e3d391c4f8d42318f19ba0f832a4438", + "placeholder": "​", + "style": "IPY_MODEL_92015d6aa1274b69a48acafff7cc1abe", + "value": " 3000/3000 [00:01<00:00, 3380.93 examples/s]" + } + }, + "ecbed13d665c454da7f835028d56e590": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "89d74579171d48668c0542e20fbe877c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": 
null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e03a39c6a9354ed9a018e2487f073d59": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "29c397be81794e3aaf80ff115b1bd8fe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b8c562d039ca4c9bba34c03f10dd65a7": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0e3d391c4f8d42318f19ba0f832a4438": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "92015d6aa1274b69a48acafff7cc1abe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3802243f57e44bbe993ec24f6a8c3fee": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1429ff4a23174e25b800190869b5aa4a", + "IPY_MODEL_9644878b1d71428ca8bf953423132e1a", + "IPY_MODEL_7eb91b277e664af291ede6355337cae6" + ], + "layout": "IPY_MODEL_bb05f72207104b9a854ef607a034330a" + } + }, + "1429ff4a23174e25b800190869b5aa4a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8f4a1acdc33a4d42be4af93b5ac94d2f", + "placeholder": "​", + "style": "IPY_MODEL_851ab005663641dd92883ee29796768b", + "value": "Filter (num_proc=16): 100%" + } + }, + "9644878b1d71428ca8bf953423132e1a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_0fa697ca6068489ea5c7acb5f6d9c4bd", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8501848fec5c4e51a213b86aeac44213", + "value": 3000 + } + }, + "7eb91b277e664af291ede6355337cae6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_97d6b96094d34d6985855ba18a5f329b", + "placeholder": "​", + "style": "IPY_MODEL_9ed61906d50b444da7e8a22a636c65ac", + "value": " 3000/3000 [00:01<00:00, 1704.62 examples/s]" + } + }, + "bb05f72207104b9a854ef607a034330a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + 
"overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8f4a1acdc33a4d42be4af93b5ac94d2f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "851ab005663641dd92883ee29796768b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0fa697ca6068489ea5c7acb5f6d9c4bd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8501848fec5c4e51a213b86aeac44213": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "97d6b96094d34d6985855ba18a5f329b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9ed61906d50b444da7e8a22a636c65ac": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Vision.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Vision.ipynb new file mode 100644 index 0000000..814e36e --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(26B_A4B)-Vision.ipynb @@ -0,0 +1,6426 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "gOpYt-Zgspvw" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab A100 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xNElsIzospvw" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZzyYx8-dspvx" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
Train models - no code needed
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bgKLV2s1spvx" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "f6leZW9xspvx" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "Yxkkpa5ospvx" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id":
"GFOEZbP7ONMs" + }, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "QmUBVEnvCDJv", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 461, + "referenced_widgets": [ + "7a0fef44174146c98c37ecedecf45d39", + "84760c4a858e46799d253ed3ed86864b", + "451c2f99255e481fb940b0fbf6ddcb58", + "da843bee4e4643eba56b23c2c77aff70", + "82cbe0e382ec49faba78de7b6076b0cf", + "43cac410ebb9427aa7b02011287fb0e6", + "b6e3107751ed469f918b08dbe981c84d", + "0dae1b24c79f4cabba0fc8509b191360", + "ee0d046105cf4d69b9bad9c06978227a", + "f9cffd18bda8431f8a3bede12812168d", + "73a74bcdebc0483db57f9bde93dc3b41", + "2832a79a4f824c4c8f95a2a626582ba6", + "e150eaba931d4f649f6a9c64f1dda02c", + "358cbf15de2b4b00a1e708e7f6e29bb8", + "6f425cd96d5349fa816d9dc0584847b2", + "4893d81e6e984c8ca738593658719209", + "a4bf50d66e8040da9d3b5271f2187dcd", + "49e5b3f23ed84e3cb1a57a7e63788b07", + "521e891f3e594a9e95f233aadf1c4dfa", + "8c143f5422ee486f8716091a0a191ddf", + "dcbdcfc579364725859f64933cd2dec8", + "6630c1be8c1f4c50b9d17d0ba774365d", + "2e2c558295f0452ea3c09bcd29685a36", + "212cb67858a0498c96f3a8bfc16e0544", + "6d8e427e07704af581a52d81c03d289d", + "b11e6e50cb58496db7e763c090fb72da", + "b8c82d51698e4754b611e4589809d729", + "5b5ea275ee2f47359c726ab08a022189", + "e0b83356e66b4095982601c6a76a91f6", + "5120f72dd76945f1b1cb000eb3bfde0f", + "27edb4245f3d4b03b64164333a90848a", + "79322768faff475499be0ebb67805a65", + "bb09f0eaaa2c4e4bb248d490501b488e", + "e2d8bc90d34f45d6b68abf2b22a3e9a7", + "ff60dfd2948b4279b6cbda74b37c33b2", + "0e0b9c3f46aa477c8f9c6d3a26bd63cf", + "4af204e557cb4cc28586af24956dd3b0", + "0b4f98b868f8451d8cb99f6190321c8d", + "4082b09ab6c749c49d71d368cdf85141", + "e98bb563f163410b99017d1985d4ca2a", + "25048b0ead8f4484b2d5017491823090", + "28078c7802c4429b9baf4c7a7b7c72a1", + "509af22adf884b03ba262bd41ff8ea75", + "758c27fab7674b248981c8eb67d3d60e", + "39da2c555af34b0dbee8a65ba82c0ef5", + "2caf91d8b96246d8b0a2e7f2cc3eb326", + 
"a64f5ed11fc2481aa79ff65998d216c6", + "2081138477dc4a59b0932d099c5c91e3", + "4d1e5c4c438044188d0572c54d56f851", + "c7d8ec30f492459296878ae9b8c92f73", + "44b23794497242938bcb8b2b193b4758", + "0bc564d52fbd4636b18d77f49a793be0", + "724731bdf7d64e208c8da43df88d3916", + "d31c33be9b50430c84e099c561e8491a", + "a69f172b151a41b3afdf594e8300dd63", + "853c106980d745abbcec56ab94573120", + "e5385ae4aa5a4b62a1c01f1f0b7be3e6", + "04e9e69091fe45e0b7bd59cee36f23e3", + "5410c1cc987741f8b68318ab56336621", + "86cec3facded4f8eb655afc878cecc96", + "2ab5080fa805496d99219628cfc534c5", + "b01770b4e0074b2582f03bb4ac34b68f", + "d63b8fd5ca6a401a9c760c6405731a82", + "b7ec7200facb477283dc5de798770e2f", + "1d96a405a2024cf7b380844283029ed3", + "60a6115cbfba408fbda48a00ace7afc1", + "d25954e26fbd407fa5430a78ba49a0b3", + "7b3356198bac4650b26cb10fb6724694", + "1906304d8f6c4163b9484a4fcc0d2cde", + "56e0cc76ccb54253adb3cd26b3113c2b", + "7704ce4caa804b7c8007f6301e774d9f", + "3607b98e5dfb47faa77ae97560951467", + "45e788079b654d2b89c4665faf42d1fd", + "48b508296aa54d729314e2a2c24160dd", + "dcbfd0eaa87b40ee8157fe3aa489ec4d", + "b65aa8c218fe4b039bba43def6756e9c", + "cf60ba5586f14a93a8df74129e2b06aa", + "5450e335cdc248c1bd43e5d18002ebaa", + "c019419080b74a96ada7632a57dd8485", + "9d120705e8c447abb09bc20843da23f4", + "47837cac654c4f05921f1029b6718815", + "c1ba8e990adb4c3aa63d786ec8e93af7", + "e2e92f6bcf934ad8bf00cf6b2aef3ba4", + "bb44124af9914c29aa02aee3dfc214a0", + "47ac15c001a6445c94781c8746012cd4", + "462243e2ef5d4635a9ab6df9e5779f6c", + "c9de48023b9e4625ac8b32c87ffa92dd", + "4586b977bb6f4c739bb9fa5819cabfc6", + "43ee4b1bf75542309a1622da87399e6d", + "a5f0c555837c427d932cd4ab3b372889", + "4486f8a8e17c48ac8056d1337e221050", + "f840f357c01a49e6a77b1be045aa7d12", + "8aa47b7fd9924cf4869f9e14b5b4ead5", + "c489d961ea974c40a4e1f24b44aa0a2a", + "f51553d1142b4697ad396bee6fb7b21d", + "2af0f039d0304149a1f997d053b84405", + "91f37acd8eb844f4a18825bdc3e1f6c2", + "6ee677fbe8094e34a85628f69273cbca", + 
"7b5b3e29daf5435a80eae703f4a23444" + ] + }, + "outputId": "b63f747d-32a4-470f-b489-cad13b3b20f9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| NVIDIA A100-SXM4-80GB. Num GPUs = 1. Max memory: 79.251 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors.index.json: 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "7a0fef44174146c98c37ecedecf45d39" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "2832a79a4f824c4c8f95a2a626582ba6" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00\n", + "### Data Prep\n", + "We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. 
This is particularly useful for complex expressions.\n", + "\n", + "You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "LjY75GoYUCB8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "e6cf365cf0674a8884e0c8ac8f8280e8", + "7e30c5c4c96d48e3a78689e33caafcd3", + "7de02d7843554276ad002c048a27c0d0", + "96cfbd5b4f314ea4841878182a7efe76", + "562d09896c604940922612e0e07f9c65", + "6f4bfaeacaa94fe9a5e6183255c23610", + "bda1c422c31a400abab7180489c6827a", + "c7ea3a7037814db68f2e49efc250dc53", + "314fc72dcdde4fe0b9b59439f0008a3d", + "48e085b16ef14d99b042debed39dcb6c", + "54a8a8e37c3d49158c65380451eb8712", + "9a6f19b03be84e5aa83b969363d08266", + "81f70c633bc4400e8a87c1cba5c878cb", + "4384402858074f76a55cc80ff7ea31f4", + "8fc9de0fd3964947a7f98a091b81d4a2", + "8c1a59d63151455e8d6008943028bfed", + "ac2d18a0024d4f9292e96f5b2ffc454d", + "b855d3442a164d599f2e7462f153fa95", + "4e5d2c84e7b2409bbb1d45e6d4130821", + "9f3d00003e14495e8a4fb3cfb563e13e", + "7fe02defa2cb4928ae9fbd155c02caee", + "5762804fcc6a477dbaaff1622e67752d", + "44475c29f3e74859a22b79e426cbff03", + "a1d0d7eed4b349e69421762f608a2da0", + "5955b2caad154badb366ab1dc729bf85", + "3203efbc3d4b4195abe86d53b48b8b77", + "e220f3b014f946fa9751a4d39ed0303c", + "beda7d7f24cb413ebcdb42f238dc89bd", + "ec5e1b08549d47ca99a7857a97b94022", + "1355eb5a553a4f87b4a5a6dda9b1160c", + "67fc62df6f6a411c96cb8769f241a810", + "1a2dd7eb1fc94f49b65c10b677b26684", + "60936221bc13459c8adefcc141ac7165", + "079918351e0f463bba0cc030663acffc", + "fa068bcbf12a4f9b8d92f0dad7ee4c71", + "20ded9b9c93a4e07a946e13bbb68062d", + "8845926888a448f98a2d32eddfb9f710", + "a2c4067b539d42a0ab8a71a2ec05742f", + "d212975985c24093951238bf6b32f85b", + "3f674fdc9dbc421386f653c880926d96", + "29d34168eb7c408486892b2b17f279b9", + 
"00bb9e9c8b124008af2a01ed6ca53b13", + "08279df17aad4a6080e6fe784d5b9e81", + "aaa881bedb974089931e8eee3b193769", + "e89e685da18542fca578ff9bd9aaa8cb", + "56e57386c76b437c8e523405dbc74d89", + "cebedffd8b49445e9721acaa3301d4ea", + "f59d8fb192174fce8983987a2a299e77", + "3aaf75c2b9754b6298cdf346d51a4516", + "01baf8e6d6b944529659448e08efe79c", + "0403261c40634c8daac850bd575fc966", + "8e8c01da153245c8b4e942dae53e2dd2", + "0bf222211e5a4e9cba517790a6089144", + "26a7d1ec9dda441f9b1f99052b082422", + "b5eb5d5fe03543ac9327188d0311bcda" + ] + }, + "outputId": "f5358882-f875-4091-f503-f1b43305c604" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/519 [00:00" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAUAAAAAyCAIAAACib5WDAAAYrUlEQVR4Ae3cebSuUx0HcCSFEBHKlNZCQi2zureMyTwrY2UeQlSaXNdQxjSSMltFSJF5bjCzDMuilUUUIjSKotLtc+83+z73eYd7znve877n3vXuP56zn/389m/av2FP75l10qRJswzKQAMDDcyYGphtxmS7P1wPgl1/9N4nqoZ77I/4TOvA3dU+bP/5z39mnXXWPtnSgGwfNGC4lf/+9799oD1kkjOnA1N6tD9kPUwHELbZZ5/9hRde+Oc//zkd0Ol9FgteffXVsR/apyfHTP6dCT3zzDP/+Mc/ZpttTPvImGauYxuh9BdffPHvf//7v/71r46RpCNPM5Zc7uijj37729/+ne98J9m4Y7Riwete9zpPKb1jJIOOo6qBjPinP/3pt7zlLddcc82///1vBjCqFDtHzhxnpsIraP/nP//5IossMvfcc3/5y18mHTfuWMZky0022YSKf/rTnwrJSHSMTd8//vGP55133qabbvr888/Dg9uOsQ06jpIG4sCvvPLKxhtvbNzFboS48SiRGwnamTADS25i5+abbz7PPPMYg85j2yyz8F7J/IYbbrjyyiu/8pWvwPmGN7xB/uwAp0HSy7xggw02uO2222688cbXv/71HeAZdOmBBpiQYoAuv/zy973vfccee+xf//pXa6gMYg8YGAaJkXj/WOub3HjVVVfxsb/85S+m0JasI2EyqVs4gNACmD8rI0Go+9/+9jcYllpqKalYZZCBR6LPUe2blHv77bdzp2OOOQatkUzlRonV2Yfh62MelDNImBxYsJxzzjlly66wbCrO8cRjyA3DSHCK6/POO6/JM+NQHwmqbvUl0RjhpFsSTRdPBnG6UovaIFdffXWQpk7TRdsXgJlnCk3XfEyMPPPMMw8++OA55pgj8XLkahUXIMmojxwbPCxj5Hi6hWG6dtwtQmMHD5GHKDWwTOLG1JBVNdmJAzNoCaSKZezUX375ZRqXMKeM0VDHqZf8D9F0esOSaCK3ZBrfG4pjgQojUYYYkU278DxE4N5L14kDE6mLGzBU0xXtWABT33e/+13PDTfc0DOq771O21MkbFJ6F91migqHp8Z0sTez8sorW94LK1qqnAeg2tL3urgc1TXlxCcATT+Vxixrv/SlL33xi18kcmymfJ0RK00cmFTkVFQoRUmLp0kpIR3STJw4EYB6bdSHpQJ94ZycJae971IolvEoLWGpFRUI
LS/NdtZbbz0w7R0YqqqNtsfciuJw2yPs/PPPb33umdfhIqnBFx3W1FgDa3wFjw1jqlLdrqdtwIW3kQxxI9HGFvxPMbfJ9la+ptEzzGhnDEa2OqbVrwB8AlBspqAqFYKAIReYPAvyAjPDVZo4sB0gCVZRIbCSFk8LSzJffPHFRx11VFXdnYlNiXBKAgoqBUmhWBYepSUsFchSMTYYlknOPvtswwNh+dSqAlWxUTCtMLfq3lk7S3388cfPPffcxx57zFM9cbAzbHoRHOe23J9++umXXnqpqsamOA2f2SN31fGCCy5Ya621jOl73vOeww47jDYwEyvXF8I//elPKto1NsWWRthEgTb2oHsbDPifYm6T7a1QSaNnJIr3/vjHP77++uvBBFv5Gj/0CUBTHw6HCRNO4J3nX3311Ysttli2poItmgHTJgQU9sZOZarKwpOROO200/785z+TiiRbb7219lxGIZvbEXvsscc555zzqU99yjavFqrvQBjIFR0///nP/+hHP1I/5ZRTHJozAght3N9///2sxwbghz70IWBarrvuOpaEpQ9/+MNrrrmmLo3G6uDXIY30izfjAb4Vb4z41FNP/dnPfrbffvtB+PGPf3y++ebbZpttxo8fr2MJHK26d9wOuSNl04TDDz/8d7/7nfqee+7ZMTYdyXjkkUf+5Cc/eec73/mLX/zC0Gy22Wa01CjCZI1PmvTkk0/+8pe/3GWXXfR985vfTN7vf//7H/vYx3JjIZzcddddAjTH/tWvfrXoooti8o1vfGNoBaA8jRcvEol23nnnueaaq7TXKm0Ggr1973vf+/3vf6+7Kclee+1lUsDfTj75ZPEI/o022shwE8ftl69//euXXnopKfQycG9605vKkGl817veteWWW7LJj370o0UDNQ7dw1lppZWeeuopA7HPPvuwE6xO1su0mqnxP6Zfw32epCI55wnHZ511ltDOJahVi+QmYwhd6txJF9qpdh96HRXAa6+9ttj/3HPPPfLIIwxF1MCAAeC6biyiwm4SSjzTgiVf0QVZJRdOEp4du/tkjKsApa6jXS5fP/GJTyDhvqtPv/nNb9T333//Wscg4WyChbH3tUa3oO19JSJLJjjHP1ZlVG7geBmTjXymhaplXZJedtll4XnxxRdPxaAo9L/ccsvtuOOOvOjZZ59997vf7RgcQKM+IQTv0yGHHLLDDjt89rOfbaUiCldCpfZE7uGHH+ZUpHjwwQchRBfMgQceqEVoMNxePR2/uf2irgswU5jqkFUNoGigkUNXA2DA6qGHHhpUnjXNuLyhsdgYoQQga2aNjUrQ2N8ydeJKHQpL5VTS4Gc+8xkmLgyLcwsvvLBP0pTrB4L9Cius4DoRJVbDvFeabVpIOAX3/x8gUfnhD38oOggWCy20EO8VbumRpnxdYIEFjAE7A3bSSSdpEZv5ueSPJXV0a0EdDOx33HGHp15VctV6SIgLBGSmKi67alx66aWdPIlQ5t4+1RhmUhnjKqqO6yxDoahUOsaDJX3vvPNOHoh/bH/kIx/BqrmMdvirmOmHxi655JI11ljD1SIC0hIM4pcn3QabRnMTWdr4Gvq3vvWtPPMb3/gG95Npq2pRhxM5OfzXv/61QSRObVAKG+bnSnmtMgbPMsssI4HDb0QwYHDNbO+77z5Ts1133dXMCLzuku26667LhUAi/Y53vKM6ZDpqXH/99QlIAzihgUYOkdPOjRWVSNGomSqH6sB4QVRU+9T/V8yVYgzUr732WiKZjJV2NxmoJucNJld8mGoCXGCGXkkYk9ZM3qTWP/zhD8zFPFY7nDBTrtF69NFHGZBZMeuha3nAJMrX6L1Kjma1szzzbXacpK2xClPqIoU1gtD+gQ98wDQMGPyeN910k8FgrCDDYWiZgIj9jJ6xYq+KVr1fBZNIe2KSoR900EHGxcycHbP18lVFAaPQKiu86KKL0pinzOYXGslykUW7GabwLUdZYpxwwgnU0qjzqAiMcN/U
EoItsc9kVdGS1yoD8Gi3LYxKYUCG5IfA8pXhielm8qQIrQxEbcgCTA+rrrqqugJDjcMI8slPfjKzLTbTSjP64seTwZhpM9Hw4zmmyjRrYMKIf7feeisWb7nllrvvvptapSkyyIfcmPzmEtXEG717mr+54suAiO21FIGcEi2e4RQXPBNiWQls1sDg5QHZmO9hAAZgeLCoM49ijg888MAqq6xy/vnnW+zla0FeKtphM4VOdi3tpRLRzBsF+wMOOAB+S0H7GWhFHPN5TFpiGVrYwq1gbwpgb+zEE08kCCTgC85qvTT2poI9xexup512MuWjRiZuZWsBabBkVMo0VSnMkMhFbqJtt912SadayjhGENKR1xIJTsq3bhJhH3roIWAGSBQzP4cWTpAq3Nuc5YknngDAWyDUvVAMTkrTkqeWVApMKto5pDqHNAomQcKrqT4qcPpK/1xI+vVKapAoqtSGDKRimiYKQ2JSAGGNw3RncpEdQqWmGZ80ogIbo0L6c5/73BFHHCGC+E0LrtI3zPf9OVXjZDMAYpKJqFBN11rIwCtwPG7cOLwSTJ38kTDcgzF+VjKkVY+O8gkYczEYxYGDRCSWJ0V3eVVfc/JvfvObjqYMJPwyfHZZ9t133+OOO+4LX/jChRdeKCTrizQSQV57winEEKHKQGC04MSg2uSQMfwswTrZls/uu+8OwCcABuZtb3ubEzJgCy64oBZ2Y+FtN4Wfy8Bl1AtOkMBqIte46u5raHkaIC701a9+1RqYYWULyg6F7UZGRmNOCriW+SeVxi35m0ZSiHHBE+YJbuDwSfMkIqyT4cRuK2GE7Ir99re/pTrxNMeHNAatOAgYNiNY80yYo5xQF2jgN+J55RJKNAOVinP7448/3tTPBuS2227LA8ULjEXnrtbZEtc3XTy1e9aGLI34YQZkwUMjh5GUQoQ5GOhQwKppplBRiTmZj5jS2xtjhzbV0IW/CtbPOjlTyK8SLeD4teZJzB1/5Wtp76xiGHQ0JJnpBcl73/te29rqJkue9G5OmE/qlMWksovAS9NefTIgr1ydXYqUWE1Lgcmr++gE4beClAl8xgYMlnQxR0qMkGyNkPavfe1r4KMKXaIB7amYkVqzmX86jfAcYhEjUoYIX8CqvZZddlkugSVukKmdVzzzWxaJQ5+owrxUI0vVYrFgL5ekitcUX8kiZsW1NEZF9vzV2ToZaSD3+IVazgYy4ltZyPb2Mr1WceZrTTk8SiHLkksu6WlSE/yegVdBSLt77CuuuKLX4MzAkctXMgYe2+G8NmSxDbsndCX4gm/kMORMKEwwQ6KpZhAqpXCYfbUsuEpjwPCDVc/Sq2eVqRkYT9R08803cxgHDMYvhQOvs846+E7kBlMtGGUrLMYsDkz1kzpUpBLChWoVr6h4mj8vv/zyZGZwBttWlrgL3iskNj8TNbTo++1vf/vee+9Ni741EuU1DokKfkpjtWJEfeKlxlWwNx7yjB0UacoK3EQDP5b6ZtGcVsfddtuNXOjapDFNxXkw58kiLRmIDzJKqNJKXTt+1PGWFqR1Zzp0GzwFUqXWUn2tdTQW7BV7VvLYphbSiTgmeygCFu+8WhCChJlicxgD2GvUmOz3wQ9+UAt+ggRR4RKYpCqW6bj33nvDKeB+61vfOuOMM6IcIth6hBM8zUAVbvMsygFGdlwhIcfmNem36MQngRt152omYqeffnpMUXuKqTu00SSVouUpE9aGLEJ5hge0zPtqHOaTdiUiN9XMa5Qnj6xCt5YV9sBNdug2NlxgVLCkVFt6Vw+LnuTxFJDQLo3xyYTMptmPMICBidCWWLXCSbQQGwyFAjYMjMPqhQuFSkbXjCuoRFAbV/kUljJ711FjYPI1z+DEm2mYwTCo2tGqwqDoFYcRLSdhVtRSqB3XnHVfccUVYEgKxpxKPRRzTiNva6mh1TLcIvbbtAs/w+rb2DGTQPOO4IkaZZ6wLSzyIoJEY056yJXNqhrdjEtNRWC06CKN
q8PmmWSlAieX81WYYNxa2hfTKKUpTNgz+riFsPYPGPLV/FlSZWMw4FZFPgDcOGQAHClzTnoI/005hDZ2CL6NZnyFBLBpXSFXs4G8msUIcJ7h0LNnZaqv8gEx2wzHdohAaMxsx2+//fb2QqRB2YbMykg4y3gYe0tKOPm2nQmBPCdv3BUV7oQBnOAntsXfDJhX1IOhykNa7IFRsdAen6+BFTzCpIMo2xumGPZpDJ4NNh1zrA1/8Ni/QUJ+CzbrMXt4hZ9CPV+9CjppDIeoR1EcxlpA0VcjkWF2NGJhKdBoiX0Ac6qZRWMwaLGCZYWezHerrbZq7Ii6+YJzWr7hq6M+sSZswJzb4KZOaREuiUxSLYTldeEwXz2LimQw+0BwCrLmqMJEbNT+XzKnWAY/DMSR0GT4iRMnYlhjwabiVYngk933gAPKaw0Mfqq2N2HyzOqik8DoosJgnEeYXefVXK/VkAGQ5+V27CHdnsPI1UYzIUcDZamCtyrz6mmRDLDkWVpqYKP3OtWBjahdEBZseaPCPjzFM2Yk/Fv8YCIiNeWGJK1KsZWozGUMokLoaMGpBvwFoUYUw0DpVb42pZ5GmHkFG9W3KZ/BZrQcIaDC5YRVkKw5dk/ewDgstRXkE5wk0p7kZiCrmANsciHiWNqZiEKlUYmYtuJMzgUpIcMJLcaMsW0kSCwfZDMVyQT/whaFOJFmcCGhxam7mYgAauamY45/SkcWDxIhgR8JkcgZkhbc8gHLBI4qLOpos127wsd4mumoRstadEmXT3niXIUUOWUJq0jAKdfpZXExfvx4yIFFRrGVk/skfCRL18YoJNz0UvSqUdSiBBUxlfI65cvkR7rYTlPyFTO5hVIbspAWak211Dkw+OlyCKaVZoJQLIjG6Kcp/zDgBJineo/LVAfuAeHIb0/CJlaVXEyn2lKt02NUWW0s9TL8soHYDLKVlttQafUpqOx4Sd1GCNFwgqgKV3//+99vxc558rNv4U+7osWc0MF1+JTzJVIhI6sJXpTwYchDwjqflSiZvevldBfyfJ0wYUKtY1MZccWLINExdB2/ueSESqxZI2O1xKUo9UYkNT0kf5qFwpn56k1TTsvjabFXocQNZLEmgZjsIZ1nRgfz4T+vVYBavdbd13Thh5JwkUJ7jdXgMa/BqlHwOnkYpjDTnsOCp1Ez6W7cmRYlwNmosdDt43MaB54i8v8fRQV57wqLlCXnuP0bQ6cOwxM1FfyFh9LSvhKdZlrlWAVwGy0j52tIxP60KFUS1ZagkuWYRdWBY0kylXZTO93t96iXFakkoFfQOiHnzElQWri616RQnGBDwT8f1m7TBSHtkMfNQrexoxZgOEyJIVKvE7JI5GlV5sYbsLiiPG9qPW7cONEk3T1rRa8gLHqQyeVeYFqgsrrRXUUxkREsTByEGAwQpIZt6K+FXGMX/Gi02LH9Llx6rRHySkDbGQIWeQGHVZXpcqgvQZpqRjsM9G9kM5rhRGNjCXBj+2i3TOPAo00MfsNsDGJtXSEXnfIcWi6htyuYIQnyRgdmH77KKg5gpSavll7FgUnHyhVrBFu49i3j5NpNOE2n9XXY7hk8Zte2HtAy740U/NC61w1nZqGX4pJZtaMW3duUpvYkavA9qaZNx1afwmojWh5i1dPGsiHUN91bIW/fHiVwJDe0nJIAromfV5P/Mu2vIhwKh001E2GH6MBVir2s99qBuysbFbMM2rcZZg2cZNVoZB0TbeXAjQjtJ/G97BSwmNVWW+0HP/iB1antJROwmJ0rfmBYoQWhe7yQJJPbwdpiiy28OpwEkHNXW6lZz3Nmy85axzYOk5QSDqc4zjTzi7QP61lDUsU/LDy9AR5JpKhxOEM48Owso8eFmth0t4jaNWVS1mYcw46ousOkbiFvj4cgyDkvtafiX89Ks7m96CajzOzkUIHB7w1sQY0bN841I7f2uLcVI4/1CQZPe/5LLLGEug0Y57pSNzxuiQlJcgsAedv9qmrHNqeO
1fPVHI3CUEooDkv/NSRV/NB2gLAwM6wKQpwTM61kj7/VuB06hz0TZFhSTxe4Dw48LOuZrgAAZGBWZcu365jbU2dPvNcUmjfa3zbLlYGdgbl/ixO+J8GKJtZmiSmZ51dxsjZIfHWCoovixqJ9Y3dI/TNxlipAiErq1V7qjWZaA2j1ikSrT521dx1hKzYQah+aWzn2EDkcIlgr9vrV3qfrI90Ql3HzEJj8NJ8b5ApKLT90g05zHPyTPdn2lFT90IL3urWS3+5Y9DoE4mN2m++55x5enVNZgQbPKUkXuAVmi9VPLJCRBFwCF4l05MNaIs5rnSb/1bE5Q4PW0dFAFD523ZvRzOjFhq01Jx9WuitLlpqOkdiGtSjkcTyEVJz0xmbKVX6nRz65+MH9+K09FT9b5b06ZqOlsKc7SDD+HwUk7soDCLnsqPN2MErpMqj0WAPRv3HhvWP2GKkPU+gRBkqjSKFykc0eZxiuFvABN5alMo0dzy3bcOUKR5DHUROMPWVaVyMkSb5t/mau6zKGnzG7deTQ0qzYRWsZdcKECUwhvQoVUujilrVr4QKEnze6pBH8bmXI5+qBKV0GlR5rgOsaXFsYBsJ+RI+pD5Uc5maskrjo2gAJXZDI/5SSytLeXVmS0nM9QIyAvJZIG8nJn25uaXda1vh10DKjaCDmZLhd+7Wi6foJZbf0MOOtgaUyyhUR7RtJuTJYrrPSSC3LDTWGtYaTeNFyRGQT2LV+u81W3QY1PXzCQCkmvT4BkHV5viQMzNfW6Cf/SBMkGM8CRpBCojQOKr3UQLzLuLij4lqru4Nu6RqjrhtYF4TqViToC56clCLNE0aJAZi5k8Hz0zxRw2Wv+G1TcubS/gFAdbXcFGzQOMY1YMTtR/h/YJZFthjVGcDY5HlWbHUhDPQDBS1nxVsqo82Febud5/woZ7RpDfD3VwPCtEVQTgT6y0l76jOwAxNM9OnNrAYhYSKHOu0VKlQPBaw9ksHXsaMBntzqhHksMDljO3CPNdizeNFjuQbkmmogk9PeZIimDAylceDAQ9HSAGaggTGqgRlvF3qMKnLA1kAD/dDAwIH7ofUBzYEGuqSBgQN3SZEDNAMN9EMDAwfuh9YHNAca6JIGBg7cJUUO0Aw00A8NDBy4H1of0BxooEsaGDhwlxQ5QDPQQD80MHDgfmh9QHOggS5pYODAXVLkAM1AA/3QwP8AGMg7qICuIqsAAAAASUVORK5CYII=\n", + "image/jpeg": 
"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAyAUADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iq1/f22mWE17eSiK2gUvJIQSFUdScdhWYPF+hGGxmW/Vkv8/ZCI3PnY5O35eeOfpzQBuUVk6DrsevJfywQukNrey2iyMQRN5ZAZl9t24f8AATWtQAUUVkxeJdJm1dtLiu99yHaI7Y2KeYo3GPzMbd4GSVzkAHjigDWorOTXdMk1afSlvIzqEEXnPb87wmcbgMcjPcU7S9ZsNagefT7gTxI5jZgrABgcEcgcg8H0oAm1HULbStNudQvJBHbW0TSyueyqMmquhawdb0/7WdOv7A+YyGG+iEcnHfAJ4PY5rnPibIZtF0zRQpb+2NVtbNwoBPl797nBPI2oc9euO9dqOlAC0Vn3Gt6fbal/Z0s5+2fZ2uhCsbMxiU4LcA98DHXJqbT9RtNW06G/sJ0uLWdd0cqdGHrQBl2HiJ9U8SX+nWVkXs9Obybq9eTaPOKhvLRcHdgEbiSMZHWt6uO8Cf8AIQ8Y/wDYfl/9Ew12NABWZea5bWWsWOmSRXJmvHKRyLCfKB2O+C/TOI24GT045rTrmfEssq634eaOyvJ0tr1p5nggLqiGCVMkj/adeBz3oA6aijtRQAUVU1LU7PSLI3d7N5cQZUGFLFmYgKqqASxJIAAGTTNL1ey1iGWSzkZvJlMMqSRtG8bjBKsrAEHBB5HIIPQ0AXqKr3t9a6db/aLydIIdyoXkOFBY4GT25IrCsvFE9341vNEGm3YtYbaKVLrygEJYyZbdu5VtoC4HUNntQB0tFYq+K9GbUxp4uyZmna2DeS/lGYDJj8zGzfgdM57da2qACiq1rqFneyTx21zFLJA5SVFYFkYEjBHUdDVnNABRVe+vbfTbKa8u5PLt4V3SPgnaO5OO1UJfE2jwaRbarJeotjdMqwTbWxIW+7jjJz29aANeiqP9rWraqmmozvctB9oZVU/u484Bb0ycgDqcH0NQw+INPutKudRs5HuYLZnWURRsXVk+8u0gHcPTrQBqUVVTUbOXS11KO4R7JofPWZTlTHjduHtjmud1Lxi1v4k0PT7GymvrTUN+65t0EiDHAw24D5Tkt146c0AdZWBa+JH/AOEsn8PahYm1nMTXFlMsm+O6iBAbHAKupPK88cgkVrTajZ297DZzXMcdxOCYo3YAvggHGevUVy2u/wDJUfCP/XpqP8oaAOyoooo
AKKKKACiiigDi/ilfrb+CrjTkuEiutWePT4dzAZ81wrnkjgKWJ7euKxNMtp3+Kmm6bc61Hfw6Jpsk0SpEkflyOREFwpOSEVuvIB/2q9NZFb7yg/UUBEByFUH1AoAr2Gn2ul2KWdjAsNvHnbGvQZJJ/Mkn8a8/fVvExkYi48QKCeAPD8fH/j9elUmB6CgDIj1KS08JtqV55xeC1aaXzofKc7VJOVGdp46Vw2gxx/8ACV+Hre11Y6nb2tjPeXUStGYLKVgoDgoB8zF5eHLHDMeOtejajp8Wp2TWk5YROylgpxkBg2PocYPsamit4IFdYYY4w7F2CKBuY9ScdSaAPJ9cldIJfiNpEaX11p2qzKY4Hz5toALdk4908wdcZJHBr0vw/pzaT4fsLF8GSGFRIR/E+MsfxYk/jWiqqowoA+gpelAHAa+x1P4xeFdOUhk060udRmTGR8w8pCR2wc4NdF4z1m48PeDdW1a0jWS4tbZ5I1YZG7HBI9BnJ+lYugw/2h8UvFGrNl1sobfTIXyCAdvmyAHHq6ZGeuc9q7ZlV1KsAykYIIyCKAPIdD8T6ZpnjDW7/VNfl1aWysobSGSNQ5lkJDSrGFGPmkaMKo7hh0XI1fBV3rGlweIdCOnww6mhOqafYTTEIsVxlhGWx/BJvUkcZ/OvQ0sbSMIEtoVCBQuIwNoXO0DjjGTj0zUhij8wy7F8zbt3Y5x1xn0oA5H4bLAfDdxP50kuoz3076n5qhXW63YdSoJACgKBg/dAPetS4u/FK3Mi2+kaTJCGPltJqcisy54JUQHB9sn61leBP+P/AMZf9h+X/wBEw1z3ivSlTxFGttqivq7alDqDXcoCHTLQYVlZ+6MRtVD94k8cE0AdbNqvii28vz9K0KLzHEab9YkXcx6AZg5PtVaLxJrdxftYw2vhuS8XdugXW3LjacN8vkZ4PB9K5jxvDc3Y1bVvKtZLXz4tJV5kZp4kZ0RzbgjaHLO3zc8ovPGBt6ZoWpJ47a+n0mK3062e5+yGK5UjMpBklZcbjI7AcZCqM8EnNAGwL7xac40bReOv/E2k4/8AIFU73xLremKrX9r4btVYEgz626AgdTzB0GR+dcnrtwNA8SS+PyJRp0d8+l38aJkSW21U345BKzhvc5x2q3eeEb1PCOlWml6LB9quYJRqNwkqQzRLMA0yJuXGWPy5I+ULwM4wAX/FWoXV3feGNK1Oay0yG8kmurm4imWQJ5O0xrHJIgAZiwO7AIwcetY+jardeHkuvFUt2kuk6vriW5e8IWV7UKIIplORk7lycglkG71r0mHTLabSbW0vLG3ZIo0HkOokVCFxgbuuOmaq6n4V0nWbqW41C3Nw0lo1oFkclI0bO4ovRWIOCw5wAKAMGa403VfiTdabrUtoxsbaI6fZXBH7xpA3mShW4cgKFGM7RnpurdmsTpepaprkYWRTp8USW6rg5hMrcH38wDpxiodX8HabrVjp1leNK8Ni8bIWCu7bMYy7KWB+XkqQTk810NAHi+mwS69b+E0t9X+0ahqN6mt6jBbqn2e2Vf3p+RR8jbzGvJyxLZPp0d14zurv4eardPeWtrqNlff2beXNsS0cOZljaZeSQBG+8Z6Hr0rtLnQ7GfT7qzji+yx3Q/etaHyXb1O5cHPv71Fp/hrStKup57C0S38+3jt3ij4jKR5C/L0yA2M+gAoAXQbLRbTTIm0GKzFo8ahJbYqwkUdCXGd31JJ5Ncnq914oPi/w8X0jSllH2ny1XUpCrfuxnJ8njj2P4V1Hhzw1ZeGLS4t7IsRcTm4lJREBcgKcKiqo4UdAO571oy2VtPd291LCjT2+7ynI5TcMNj6igCGSGe/0WSC+RLeaeFkkEEhkCEgj5WIGevoK4H4ZC58Q+G9AvL2J1s9Is1gtVYFRLOq7GkI7hFGxT6lz/dNemU2ONIkCRoqKOgUYAoA47RNQg06+8a6nq8kVstvqIEkrN92BbeIpnk/3icepNZXg7U7mLxzqIur
GTT7PxHH/AGlYW8xIcNHiOTcP4XZfLk29geea7L+woRr0+ppJhbqFYrq3ZAyTFD+7fnowBI9wRnoKuX1o13bOkUvkTlGWO4VAzRZGCVz3oA8xtLzTVh8PaPqF7bQ6FLf6kwWZ8RzmG5KwQ7jwV+bdg9fLA56V302gQnVdHu7XyYILAzfuY4wAwkXHGOBzz+NJP4V02bwqnh1IxHYpEsSgxpKcD1DqwJPOSRnknrzWjpunwaVpdpp1tu8i1hSCPccnaoAGT34FAHI+BZdL1wXer3Elpc6691J9oU4aS0COyRxhT8yAKB6ZJLc5pPHbNBrvhq50wtJ4gWeWOztSP3c8TKPOEhyNqgBTuGSCBgHOK3j4XsG8WJ4jYE3qQtCmERQAcZywUM3T+IkDJxWRrv8AyVLwj/156h/KGgDr5ZkggeaVgkaKWZj0AAyTUOn6haarYQ31jOk9rOu+OVOjD1FWaOlABRRRQAUUUUAFFFFABRRRQAUUUUAFZ+uWF3qej3FpY6lLp104BiuolDGNgQeh4IOMEdwT0rQooAyfD2hroOnyQG5e6uJ55Lm5uHUKZZXbJOBwAOAB2AArWoooAKKKKAOb0nQr/RfE2qzwSwS6Tqk32t0clZYJ9qq2OCGVgoPJG3Henz+BvDNxrY1qXRrWTUhKs4uGBLb1xg9e2B+VdDRQBzWl+C9MtGt7u7hFzqEchuGkZ28vz2JLSCPO0Nkn5tueBXSModCpzgjBwcUtFAGXD4c0iHSJtJWxiawmLNJbyZdGLHLZDE9Tz9ea1KKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigArmYdB1C78bHXtUlgFvZwyW2nW0JLFVcqXldiB8x2gbRwAOpNdNRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAH/2Q==\n" + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "dataset[2][\"image\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "lXjfJr4W6z8P", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "0f9c8018-ae97-4118-9f28-c97f3e7b6b1d" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'H ^ { \\\\prime } = \\\\beta N \\\\int d \\\\lambda \\\\biggl \\\\{ \\\\frac { 1 } { 2 \\\\beta ^ { 2 } N ^ { 2 } } \\\\partial _ { \\\\lambda } \\\\zeta ^ { \\\\dagger } \\\\partial _ { \\\\lambda } \\\\zeta + V ( \\\\lambda ) \\\\zeta ^ { \\\\dagger } \\\\zeta \\\\biggr \\\\} \\\\ .'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "dataset[2][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKHxfZua1CrS" + }, + "source": [ + "We 
can also render LaTeX directly in the browser!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "nPopsxAC1CrS", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "859f1f54-03ca-42c6-83ee-9fded7ad2d07" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/latex": "$\\displaystyle \\sigma ^ { \\mu } \\frac { \\lambda ^ { a } } { 2 } A _ { \\mu } ^ { a } .$" + }, + "metadata": {} + } + ], + "source": [ + "from IPython.display import display, Math, Latex\n", + "\n", + "latex = dataset[3][\"text\"]\n", + "display(Math(latex))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K9CBpiISFa6C" + }, + "source": [ + "Every vision fine-tuning example should be formatted as a conversation with a user turn (instruction plus image) and an assistant turn (the target text):\n", + "\n", + "```python\n", + "[\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": sample[\"text\"]},\n", + " ],\n", + " },\n", + "]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "oPXzJZzHEgXe" + }, + "outputs": [], + "source": [ + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "def convert_to_conversation(sample):\n", + " conversation = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + " {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": sample[\"text\"]}]},\n", + " ]\n", + " return {\"messages\": conversation}\n", + "pass" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"FY-9u-OD6_gE" + }, + "source": [ + "Let's convert the dataset into the \"correct\" format for finetuning:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "gFW2qXIr7Ezy" + }, + "outputs": [], + "source": [ + "converted_dataset = [convert_to_conversation(sample) for sample in dataset]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "The first example is now structured like below:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "gGFzmplrEy9I", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2b05e49c-bc65-494c-cba5-e70d41c0f1a9" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'messages': [{'role': 'user',\n", + " 'content': [{'type': 'text',\n", + " 'text': 'Write the LaTeX representation for this image.'},\n", + " {'type': 'image',\n", + " 'image': }]},\n", + " {'role': 'assistant',\n", + " 'content': [{'type': 'text',\n", + " 'text': '{ \\\\frac { N } { M } } \\\\in { \\\\bf Z } , { \\\\frac { M } { P } } \\\\in { \\\\bf Z } , { \\\\frac { P } { Q } } \\\\in { \\\\bf Z }'}]}]}" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "converted_dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MsRPBIb0JJ6c" + }, + "source": [ + "Lets take the Gemma 4 instruction chat template and use it in our base model" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "exoDVEvmJN-6" + }, + "outputs": [], + "source": [ + "from unsloth import get_chat_template\n", + "\n", + "processor = get_chat_template(\n", + " processor,\n", + " \"gemma-4-thinking\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FecKS-dA82f5" + }, + "source": [ + "Before fine-tuning, let us evaluate the base model's performance. 
We do not expect strong results, as it has not encountered this chat template before." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "vcat4UxA81vr", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8514dfc7-67d1-4568-fab8-8139bd56ac6b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The LaTeX representation for the equation in the image is:\n", + "\n", + "```latex\n", + "H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\zeta^\\dagger \\partial_\\lambda \\zeta + \\mathcal{V}(\\lambda) \\zeta^\\dagger \\zeta \\right\\}\n", + "```\n", + "\n", + "When rendered, it looks like this:\n", + "$H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\zeta^\\dagger \\\n" + ] + } + ], + "source": [ + "image = dataset[2][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FeAiMlQ71CrS" + }, + "source": [ + "You can see it's absolutely terrible! 
It doesn't follow instructions at all" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We run only 60 steps to keep the demo fast; for a full run, set `num_train_epochs=1` and `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!\n", + "\n", + "We use our new `UnslothVisionDataCollator`, which helps with our vision finetuning setup." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "95_Nn-89DhsL", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d9c94f8e-d6ef-4f64-ca13-428028335b88" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = converted_dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 4,\n", + " max_grad_norm = 0.3,\n", + " warmup_ratio = 0.03,\n", + " max_steps = 60,\n", + " # num_train_epochs = 2, # Set this instead of max_steps for full training runs\n", + " learning_rate = 2e-4,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\", # For Weights and Biases or others\n", + "\n", + " # You MUST put 
the below items for vision finetuning:\n", + " remove_unused_columns = False,\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 2048,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "99c38b86-0096-4319-ba4b-7cf3d76681b1" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = NVIDIA A100-SXM4-80GB. Max memory = 79.251 GB.\n", + "46.615 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "34c1db7c-d3b0-414c-b281-7eb01421bfbf" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 68,686 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 59,275,776 of 25,865,209,648 (0.23% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 18:34, Epoch 0/1]\n", + "
Step | Training Loss
1 | 2.986776
2 | 3.611241
3 | 3.235800
4 | 3.207892
5 | 2.393348
6 | 1.994182
7 | 1.238167
8 | 1.074898
9 | 1.115454
10 | 0.414163
11 | 0.488537
12 | 0.463625
13 | 0.485965
14 | 0.257269
15 | 0.223266
16 | 0.299220
17 | 0.228006
18 | 0.165416
19 | 0.214501
20 | 0.163012
21 | 0.160776
22 | 0.227581
23 | 0.252964
24 | 0.154257
25 | 0.162097
26 | 0.201924
27 | 0.192899
28 | 0.150303
29 | 0.220560
30 | 0.132235
31 | 0.154113
32 | 0.265951
33 | 0.128669
34 | 0.118749
35 | 0.169299
36 | 0.130528
37 | 0.099769
38 | 0.122619
39 | 0.148352
40 | 0.202640
41 | 0.163365
42 | 0.065318
43 | 0.185360
44 | 0.124109
45 | 0.060225
46 | 0.107721
47 | 0.129032
48 | 0.118885
49 | 0.188991
50 | 0.109477
51 | 0.080716
52 | 0.167722
53 | 0.136780
54 | 0.125553
55 | 0.158793
56 | 0.392148
57 | 0.100937
58 | 0.066817
59 | 0.181567
60 | 0.087730

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f9f14f4d-75bb-4423-ddc3-e79677d21d8a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "1187.0917 seconds used for training.\n", + "19.78 minutes used for training.\n", + "Peak reserved memory = 46.926 GB.\n", + "Peak reserved memory for training = 0.311 GB.\n", + "Peak reserved memory % of max memory = 59.212 %.\n", + "Peak reserved memory for training % of max memory = 0.392 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model! You can modify the instruction and input—just leave the output blank.\n", + "\n", + "We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "kR3gIAX-SM2q", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "76204120-8c3f-49b4-a5e2-f1e66a5b7f3a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "[ [ B _ { n } ^ { + } , b _ { 2 } ^ { - } ] , b _ { 2 } ^ { + } ] = n B _ { n } ^ { + } , \\quad [ [ B _ { n } ^ { - } , b _ { 2 } ^ { + } ] , b _ { 2 } ^ { - } ] = n B _ { n } ^ { - } .\n" + ] + } + ], + "source": [ + "image = dataset[10][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "upcOlWe7A1vc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a5558703-774e-4d23-9794-0c3fbc5ccad4" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "MKX_XKs_BNZR", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f214eeec-cf71-4e48-a13f-144ef5bf37ce" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The image shows a mathematical expression in tensor notation, likely from a physics context (such as general relativity or gauge theory).\n", + "\n", + "The expression is:\n", + "$$D _ { \\mu } ^ { \\alpha \\beta } \\bar { A } _ { \\mu } ^ { \\alpha \\beta } = 0 ,$$\n", + "\n", + "### Breakdown of the notation:\n", + "* **$D$**: Represents a covariant derivative.\n", + "* **$\\mu$**: The index being contracted.\n", + "* **$\\alpha, \\beta$**: Indices representing the tensor components.\n", + "* **$\\bar{A\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastVisionModel\n", + "\n", + " model, processor = FastVisionModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " 
load_in_4bit = True, # Set to False for 16bit LoRA\n", + " )\n", + "\n", + "sample = dataset[1]\n", + "image = sample[\"image\"].convert(\"RGB\")\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": sample[\"text\"],\n", + " },\n", + " {\n", + " \"type\": \"image\",\n", + " },\n", + " ],\n", + " },\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True)\n", + "_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can create a personal access token at https://huggingface.co/settings/tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "# Select ONLY 1 to save! 
(Both not needed!)\n", + "\n", + "# Save locally to 16bit\n", + "if False: model.save_pretrained_merged(\"unsloth_finetune\", processor,)\n", + "\n", + "# To export and save to your Hugging Face account\n", + "if False: model.push_to_hub_merged(\"YOUR_USERNAME/unsloth_finetune\", processor, token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TSjNVDCYv-yr" + }, + "source": [ + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "machine_shape": "hm", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "7a0fef44174146c98c37ecedecf45d39": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_84760c4a858e46799d253ed3ed86864b", + "IPY_MODEL_451c2f99255e481fb940b0fbf6ddcb58", + "IPY_MODEL_da843bee4e4643eba56b23c2c77aff70" + ], + "layout": "IPY_MODEL_82cbe0e382ec49faba78de7b6076b0cf" + } + }, + "84760c4a858e46799d253ed3ed86864b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_43cac410ebb9427aa7b02011287fb0e6", + "placeholder": "​", + "style": "IPY_MODEL_b6e3107751ed469f918b08dbe981c84d", + "value": 
"model.safetensors.index.json: " + } + }, + "451c2f99255e481fb940b0fbf6ddcb58": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0dae1b24c79f4cabba0fc8509b191360", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ee0d046105cf4d69b9bad9c06978227a", + "value": 1 + } + }, + "da843bee4e4643eba56b23c2c77aff70": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f9cffd18bda8431f8a3bede12812168d", + "placeholder": "​", + "style": "IPY_MODEL_73a74bcdebc0483db57f9bde93dc3b41", + "value": " 103k/? 
[00:00<00:00, 9.53MB/s]" + } + }, + "82cbe0e382ec49faba78de7b6076b0cf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "43cac410ebb9427aa7b02011287fb0e6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b6e3107751ed469f918b08dbe981c84d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0dae1b24c79f4cabba0fc8509b191360": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "ee0d046105cf4d69b9bad9c06978227a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f9cffd18bda8431f8a3bede12812168d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"73a74bcdebc0483db57f9bde93dc3b41": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2832a79a4f824c4c8f95a2a626582ba6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e150eaba931d4f649f6a9c64f1dda02c", + "IPY_MODEL_358cbf15de2b4b00a1e708e7f6e29bb8", + "IPY_MODEL_6f425cd96d5349fa816d9dc0584847b2" + ], + "layout": "IPY_MODEL_4893d81e6e984c8ca738593658719209" + } + }, + "e150eaba931d4f649f6a9c64f1dda02c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a4bf50d66e8040da9d3b5271f2187dcd", + "placeholder": "​", + "style": "IPY_MODEL_49e5b3f23ed84e3cb1a57a7e63788b07", + "value": "Download complete: 100%" + } + }, + "358cbf15de2b4b00a1e708e7f6e29bb8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_521e891f3e594a9e95f233aadf1c4dfa", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8c143f5422ee486f8716091a0a191ddf", + "value": 1 + } + }, + "6f425cd96d5349fa816d9dc0584847b2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dcbdcfc579364725859f64933cd2dec8", + "placeholder": "​", + "style": "IPY_MODEL_6630c1be8c1f4c50b9d17d0ba774365d", + "value": " 51.6G/51.6G [02:23<00:00, 386MB/s]" + } + }, + "4893d81e6e984c8ca738593658719209": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a4bf50d66e8040da9d3b5271f2187dcd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "49e5b3f23ed84e3cb1a57a7e63788b07": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "521e891f3e594a9e95f233aadf1c4dfa": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "8c143f5422ee486f8716091a0a191ddf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dcbdcfc579364725859f64933cd2dec8": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6630c1be8c1f4c50b9d17d0ba774365d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2e2c558295f0452ea3c09bcd29685a36": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_212cb67858a0498c96f3a8bfc16e0544", + "IPY_MODEL_6d8e427e07704af581a52d81c03d289d", + "IPY_MODEL_b11e6e50cb58496db7e763c090fb72da" + ], + "layout": "IPY_MODEL_b8c82d51698e4754b611e4589809d729" + } + }, + "212cb67858a0498c96f3a8bfc16e0544": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5b5ea275ee2f47359c726ab08a022189", + "placeholder": "​", + "style": "IPY_MODEL_e0b83356e66b4095982601c6a76a91f6", + "value": "Fetching 2 files: 100%" + } + }, + "6d8e427e07704af581a52d81c03d289d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5120f72dd76945f1b1cb000eb3bfde0f", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_27edb4245f3d4b03b64164333a90848a", + "value": 2 + } + }, + "b11e6e50cb58496db7e763c090fb72da": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_79322768faff475499be0ebb67805a65", + "placeholder": "​", + "style": "IPY_MODEL_bb09f0eaaa2c4e4bb248d490501b488e", + "value": " 2/2 [02:23<00:00, 143.11s/it]" + } + }, + "b8c82d51698e4754b611e4589809d729": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5b5ea275ee2f47359c726ab08a022189": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e0b83356e66b4095982601c6a76a91f6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5120f72dd76945f1b1cb000eb3bfde0f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "27edb4245f3d4b03b64164333a90848a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "79322768faff475499be0ebb67805a65": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bb09f0eaaa2c4e4bb248d490501b488e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e2d8bc90d34f45d6b68abf2b22a3e9a7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ff60dfd2948b4279b6cbda74b37c33b2", + "IPY_MODEL_0e0b9c3f46aa477c8f9c6d3a26bd63cf", + "IPY_MODEL_4af204e557cb4cc28586af24956dd3b0" + ], + "layout": "IPY_MODEL_0b4f98b868f8451d8cb99f6190321c8d" + } + }, + "ff60dfd2948b4279b6cbda74b37c33b2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_4082b09ab6c749c49d71d368cdf85141", + "placeholder": "​", + "style": "IPY_MODEL_e98bb563f163410b99017d1985d4ca2a", + "value": "Loading weights: 100%" + } + }, + "0e0b9c3f46aa477c8f9c6d3a26bd63cf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_25048b0ead8f4484b2d5017491823090", + "max": 1013, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_28078c7802c4429b9baf4c7a7b7c72a1", + "value": 1013 + } + }, + "4af204e557cb4cc28586af24956dd3b0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_509af22adf884b03ba262bd41ff8ea75", + "placeholder": "​", + "style": "IPY_MODEL_758c27fab7674b248981c8eb67d3d60e", + "value": " 1013/1013 [00:14<00:00, 553.64it/s]" + } + }, + "0b4f98b868f8451d8cb99f6190321c8d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4082b09ab6c749c49d71d368cdf85141": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": 
null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e98bb563f163410b99017d1985d4ca2a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "25048b0ead8f4484b2d5017491823090": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "28078c7802c4429b9baf4c7a7b7c72a1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "509af22adf884b03ba262bd41ff8ea75": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "758c27fab7674b248981c8eb67d3d60e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + 
}, + "39da2c555af34b0dbee8a65ba82c0ef5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2caf91d8b96246d8b0a2e7f2cc3eb326", + "IPY_MODEL_a64f5ed11fc2481aa79ff65998d216c6", + "IPY_MODEL_2081138477dc4a59b0932d099c5c91e3" + ], + "layout": "IPY_MODEL_4d1e5c4c438044188d0572c54d56f851" + } + }, + "2caf91d8b96246d8b0a2e7f2cc3eb326": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c7d8ec30f492459296878ae9b8c92f73", + "placeholder": "​", + "style": "IPY_MODEL_44b23794497242938bcb8b2b193b4758", + "value": "generation_config.json: 100%" + } + }, + "a64f5ed11fc2481aa79ff65998d216c6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0bc564d52fbd4636b18d77f49a793be0", + "max": 208, + "min": 0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_724731bdf7d64e208c8da43df88d3916", + "value": 208 + } + }, + "2081138477dc4a59b0932d099c5c91e3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d31c33be9b50430c84e099c561e8491a", + "placeholder": "​", + "style": "IPY_MODEL_a69f172b151a41b3afdf594e8300dd63", + "value": " 208/208 [00:00<00:00, 27.2kB/s]" + } + }, + "4d1e5c4c438044188d0572c54d56f851": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "c7d8ec30f492459296878ae9b8c92f73": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "44b23794497242938bcb8b2b193b4758": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0bc564d52fbd4636b18d77f49a793be0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "724731bdf7d64e208c8da43df88d3916": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d31c33be9b50430c84e099c561e8491a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a69f172b151a41b3afdf594e8300dd63": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "853c106980d745abbcec56ab94573120": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e5385ae4aa5a4b62a1c01f1f0b7be3e6", + "IPY_MODEL_04e9e69091fe45e0b7bd59cee36f23e3", + "IPY_MODEL_5410c1cc987741f8b68318ab56336621" + ], + "layout": "IPY_MODEL_86cec3facded4f8eb655afc878cecc96" + } + }, + "e5385ae4aa5a4b62a1c01f1f0b7be3e6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2ab5080fa805496d99219628cfc534c5", + "placeholder": "​", + "style": "IPY_MODEL_b01770b4e0074b2582f03bb4ac34b68f", + "value": "processor_config.json: " + } + }, + "04e9e69091fe45e0b7bd59cee36f23e3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d63b8fd5ca6a401a9c760c6405731a82", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b7ec7200facb477283dc5de798770e2f", + "value": 1 + } + }, + "5410c1cc987741f8b68318ab56336621": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1d96a405a2024cf7b380844283029ed3", + "placeholder": "​", + "style": "IPY_MODEL_60a6115cbfba408fbda48a00ace7afc1", + "value": " 1.69k/? 
[00:00<00:00, 174kB/s]" + } + }, + "86cec3facded4f8eb655afc878cecc96": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2ab5080fa805496d99219628cfc534c5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b01770b4e0074b2582f03bb4ac34b68f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d63b8fd5ca6a401a9c760c6405731a82": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "b7ec7200facb477283dc5de798770e2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1d96a405a2024cf7b380844283029ed3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"60a6115cbfba408fbda48a00ace7afc1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d25954e26fbd407fa5430a78ba49a0b3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7b3356198bac4650b26cb10fb6724694", + "IPY_MODEL_1906304d8f6c4163b9484a4fcc0d2cde", + "IPY_MODEL_56e0cc76ccb54253adb3cd26b3113c2b" + ], + "layout": "IPY_MODEL_7704ce4caa804b7c8007f6301e774d9f" + } + }, + "7b3356198bac4650b26cb10fb6724694": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3607b98e5dfb47faa77ae97560951467", + "placeholder": "​", + "style": "IPY_MODEL_45e788079b654d2b89c4665faf42d1fd", + "value": "chat_template.jinja: " + } + }, + "1906304d8f6c4163b9484a4fcc0d2cde": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_48b508296aa54d729314e2a2c24160dd", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_dcbfd0eaa87b40ee8157fe3aa489ec4d", + "value": 1 + } + }, + "56e0cc76ccb54253adb3cd26b3113c2b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b65aa8c218fe4b039bba43def6756e9c", + "placeholder": "​", + "style": "IPY_MODEL_cf60ba5586f14a93a8df74129e2b06aa", + "value": " 12.0k/? 
[00:00<00:00, 1.29MB/s]" + } + }, + "7704ce4caa804b7c8007f6301e774d9f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3607b98e5dfb47faa77ae97560951467": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "45e788079b654d2b89c4665faf42d1fd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "48b508296aa54d729314e2a2c24160dd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "dcbfd0eaa87b40ee8157fe3aa489ec4d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b65aa8c218fe4b039bba43def6756e9c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"cf60ba5586f14a93a8df74129e2b06aa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5450e335cdc248c1bd43e5d18002ebaa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c019419080b74a96ada7632a57dd8485", + "IPY_MODEL_9d120705e8c447abb09bc20843da23f4", + "IPY_MODEL_47837cac654c4f05921f1029b6718815" + ], + "layout": "IPY_MODEL_c1ba8e990adb4c3aa63d786ec8e93af7" + } + }, + "c019419080b74a96ada7632a57dd8485": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e2e92f6bcf934ad8bf00cf6b2aef3ba4", + "placeholder": "​", + "style": "IPY_MODEL_bb44124af9914c29aa02aee3dfc214a0", + "value": "tokenizer_config.json: " + } + }, + "9d120705e8c447abb09bc20843da23f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_47ac15c001a6445c94781c8746012cd4", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_462243e2ef5d4635a9ab6df9e5779f6c", + "value": 1 + } + }, + "47837cac654c4f05921f1029b6718815": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c9de48023b9e4625ac8b32c87ffa92dd", + "placeholder": "​", + "style": "IPY_MODEL_4586b977bb6f4c739bb9fa5819cabfc6", + "value": " 15.0k/? 
[00:00<00:00, 1.61MB/s]" + } + }, + "c1ba8e990adb4c3aa63d786ec8e93af7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e2e92f6bcf934ad8bf00cf6b2aef3ba4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bb44124af9914c29aa02aee3dfc214a0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "47ac15c001a6445c94781c8746012cd4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "462243e2ef5d4635a9ab6df9e5779f6c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "c9de48023b9e4625ac8b32c87ffa92dd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"4586b977bb6f4c739bb9fa5819cabfc6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "43ee4b1bf75542309a1622da87399e6d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a5f0c555837c427d932cd4ab3b372889", + "IPY_MODEL_4486f8a8e17c48ac8056d1337e221050", + "IPY_MODEL_f840f357c01a49e6a77b1be045aa7d12" + ], + "layout": "IPY_MODEL_8aa47b7fd9924cf4869f9e14b5b4ead5" + } + }, + "a5f0c555837c427d932cd4ab3b372889": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c489d961ea974c40a4e1f24b44aa0a2a", + "placeholder": "​", + "style": "IPY_MODEL_f51553d1142b4697ad396bee6fb7b21d", + "value": "tokenizer.json: 100%" + } + }, + "4486f8a8e17c48ac8056d1337e221050": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2af0f039d0304149a1f997d053b84405", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_91f37acd8eb844f4a18825bdc3e1f6c2", + "value": 32169626 + } + }, + "f840f357c01a49e6a77b1be045aa7d12": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6ee677fbe8094e34a85628f69273cbca", + "placeholder": "​", + "style": "IPY_MODEL_7b5b3e29daf5435a80eae703f4a23444", + "value": " 32.2M/32.2M [00:01<00:00, 161MB/s]" + } + }, + "8aa47b7fd9924cf4869f9e14b5b4ead5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c489d961ea974c40a4e1f24b44aa0a2a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f51553d1142b4697ad396bee6fb7b21d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2af0f039d0304149a1f997d053b84405": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "91f37acd8eb844f4a18825bdc3e1f6c2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"6ee677fbe8094e34a85628f69273cbca": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7b5b3e29daf5435a80eae703f4a23444": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e6cf365cf0674a8884e0c8ac8f8280e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7e30c5c4c96d48e3a78689e33caafcd3", + "IPY_MODEL_7de02d7843554276ad002c048a27c0d0", + "IPY_MODEL_96cfbd5b4f314ea4841878182a7efe76" + ], + "layout": "IPY_MODEL_562d09896c604940922612e0e07f9c65" + } + }, + "7e30c5c4c96d48e3a78689e33caafcd3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6f4bfaeacaa94fe9a5e6183255c23610", + "placeholder": "​", + "style": "IPY_MODEL_bda1c422c31a400abab7180489c6827a", + "value": "README.md: 100%" + } + }, + "7de02d7843554276ad002c048a27c0d0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c7ea3a7037814db68f2e49efc250dc53", + "max": 519, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_314fc72dcdde4fe0b9b59439f0008a3d", + "value": 519 + } + }, + "96cfbd5b4f314ea4841878182a7efe76": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_48e085b16ef14d99b042debed39dcb6c", + "placeholder": "​", + "style": "IPY_MODEL_54a8a8e37c3d49158c65380451eb8712", + "value": " 519/519 [00:00<00:00, 60.4kB/s]" + } + }, + "562d09896c604940922612e0e07f9c65": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6f4bfaeacaa94fe9a5e6183255c23610": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bda1c422c31a400abab7180489c6827a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c7ea3a7037814db68f2e49efc250dc53": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "314fc72dcdde4fe0b9b59439f0008a3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "48e085b16ef14d99b042debed39dcb6c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "54a8a8e37c3d49158c65380451eb8712": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9a6f19b03be84e5aa83b969363d08266": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_81f70c633bc4400e8a87c1cba5c878cb", + "IPY_MODEL_4384402858074f76a55cc80ff7ea31f4", + "IPY_MODEL_8fc9de0fd3964947a7f98a091b81d4a2" + ], + "layout": "IPY_MODEL_8c1a59d63151455e8d6008943028bfed" + } + }, + "81f70c633bc4400e8a87c1cba5c878cb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ac2d18a0024d4f9292e96f5b2ffc454d", + "placeholder": "​", + "style": "IPY_MODEL_b855d3442a164d599f2e7462f153fa95", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "4384402858074f76a55cc80ff7ea31f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4e5d2c84e7b2409bbb1d45e6d4130821", + "max": 343805431, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9f3d00003e14495e8a4fb3cfb563e13e", + "value": 343805431 + } + }, + "8fc9de0fd3964947a7f98a091b81d4a2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7fe02defa2cb4928ae9fbd155c02caee", + "placeholder": "​", + "style": "IPY_MODEL_5762804fcc6a477dbaaff1622e67752d", + "value": " 344M/344M [00:05<00:00, 65.2MB/s]" + } + }, + "8c1a59d63151455e8d6008943028bfed": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ac2d18a0024d4f9292e96f5b2ffc454d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b855d3442a164d599f2e7462f153fa95": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4e5d2c84e7b2409bbb1d45e6d4130821": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9f3d00003e14495e8a4fb3cfb563e13e": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7fe02defa2cb4928ae9fbd155c02caee": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5762804fcc6a477dbaaff1622e67752d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "44475c29f3e74859a22b79e426cbff03": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a1d0d7eed4b349e69421762f608a2da0", + "IPY_MODEL_5955b2caad154badb366ab1dc729bf85", + "IPY_MODEL_3203efbc3d4b4195abe86d53b48b8b77" + ], + "layout": "IPY_MODEL_e220f3b014f946fa9751a4d39ed0303c" + } + }, + "a1d0d7eed4b349e69421762f608a2da0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_beda7d7f24cb413ebcdb42f238dc89bd", + "placeholder": "​", + "style": "IPY_MODEL_ec5e1b08549d47ca99a7857a97b94022", + "value": "data/test-00000-of-00001.parquet: 100%" + } + }, + "5955b2caad154badb366ab1dc729bf85": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_1355eb5a553a4f87b4a5a6dda9b1160c", + "max": 38205016, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_67fc62df6f6a411c96cb8769f241a810", + "value": 38205016 + } + }, + "3203efbc3d4b4195abe86d53b48b8b77": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1a2dd7eb1fc94f49b65c10b677b26684", + "placeholder": "​", + "style": "IPY_MODEL_60936221bc13459c8adefcc141ac7165", + "value": " 38.2M/38.2M [00:00<00:00, 191MB/s]" + } + }, + "e220f3b014f946fa9751a4d39ed0303c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "beda7d7f24cb413ebcdb42f238dc89bd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec5e1b08549d47ca99a7857a97b94022": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1355eb5a553a4f87b4a5a6dda9b1160c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "67fc62df6f6a411c96cb8769f241a810": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1a2dd7eb1fc94f49b65c10b677b26684": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "60936221bc13459c8adefcc141ac7165": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "079918351e0f463bba0cc030663acffc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_fa068bcbf12a4f9b8d92f0dad7ee4c71", + "IPY_MODEL_20ded9b9c93a4e07a946e13bbb68062d", + "IPY_MODEL_8845926888a448f98a2d32eddfb9f710" + ], + "layout": "IPY_MODEL_a2c4067b539d42a0ab8a71a2ec05742f" + } + }, + 
"fa068bcbf12a4f9b8d92f0dad7ee4c71": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d212975985c24093951238bf6b32f85b", + "placeholder": "​", + "style": "IPY_MODEL_3f674fdc9dbc421386f653c880926d96", + "value": "Generating train split: 100%" + } + }, + "20ded9b9c93a4e07a946e13bbb68062d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_29d34168eb7c408486892b2b17f279b9", + "max": 68686, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_00bb9e9c8b124008af2a01ed6ca53b13", + "value": 68686 + } + }, + "8845926888a448f98a2d32eddfb9f710": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_08279df17aad4a6080e6fe784d5b9e81", + "placeholder": "​", + "style": 
"IPY_MODEL_aaa881bedb974089931e8eee3b193769", + "value": " 68686/68686 [00:00<00:00, 124063.71 examples/s]" + } + }, + "a2c4067b539d42a0ab8a71a2ec05742f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d212975985c24093951238bf6b32f85b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3f674fdc9dbc421386f653c880926d96": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "29d34168eb7c408486892b2b17f279b9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "00bb9e9c8b124008af2a01ed6ca53b13": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "08279df17aad4a6080e6fe784d5b9e81": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "aaa881bedb974089931e8eee3b193769": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e89e685da18542fca578ff9bd9aaa8cb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_56e57386c76b437c8e523405dbc74d89", + "IPY_MODEL_cebedffd8b49445e9721acaa3301d4ea", + "IPY_MODEL_f59d8fb192174fce8983987a2a299e77" + ], + "layout": "IPY_MODEL_3aaf75c2b9754b6298cdf346d51a4516" + } + }, + "56e57386c76b437c8e523405dbc74d89": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_01baf8e6d6b944529659448e08efe79c", + "placeholder": "​", + "style": "IPY_MODEL_0403261c40634c8daac850bd575fc966", + "value": "Generating test split: 100%" + } + }, + "cebedffd8b49445e9721acaa3301d4ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8e8c01da153245c8b4e942dae53e2dd2", + "max": 7632, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0bf222211e5a4e9cba517790a6089144", + "value": 7632 + } + }, + "f59d8fb192174fce8983987a2a299e77": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_26a7d1ec9dda441f9b1f99052b082422", + "placeholder": "​", + "style": "IPY_MODEL_b5eb5d5fe03543ac9327188d0311bcda", + "value": " 7632/7632 [00:00<00:00, 138348.46 examples/s]" + } + }, + "3aaf75c2b9754b6298cdf346d51a4516": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "01baf8e6d6b944529659448e08efe79c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0403261c40634c8daac850bd575fc966": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8e8c01da153245c8b4e942dae53e2dd2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0bf222211e5a4e9cba517790a6089144": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"26a7d1ec9dda441f9b1f99052b082422": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b5eb5d5fe03543ac9327188d0311bcda": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Text.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Text.ipynb new file mode 100644 index 0000000..18ae530 --- 
/dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Text.ipynb @@ -0,0 +1,7707 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "n5NrROnmsJP4" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab A100 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jjzmSqm7sJP4" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WDatJHaBsJP5" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9TnXzZYMsJP5" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "B2T3Z8F2sJP5" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "lBN09c1tUlSV" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "-Xbb0cuLzwgf", + "outputId": "3d9968a4-cb68-4112-aa41-29a78c0c8ae0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 461, + "referenced_widgets": [ + "ddfff68b86604e0daa29ef5bc5b097c3", + "ad049bffb6ab409eab6268cb8b46e1d7", + "1a7eb21cd8f14d0b98517dddd5a691c3", + "8035803031e048be9cda8d78aaf83f1b", + "2c3ad97c23f840f5997f3e5554ef7b02", + "c731d59a7ea449948fa2de671225ddc2", + "bff47b44ee1a49fbb0c69a24a7d1985b", + "3753574c0ed14e6583ef6228f921efc3", + "b897d2670a8e42ac89d526b7daf7d591", + "e262ca340c0a4269a5f8571cfa85f1ce", + "6b98b989f90842e79b20bbd8007ed266", + "03d61f85075f4ccab14b1d7cb41139f0", + "76bb1448130f41bd8d36591733b2bc13", + "ac0b4b138d3b4702880b6eff4ace812e", + "7d8ce2a07d0d4b0d8c0ce2dd9bce1d8f", + "7deed7b9ad7e4d11b2e55e060295876f", + "54d6d94660dc4131acb117491bc67fe4", + "63a2d2156b0d4c9e9b62638d7b98a31a", + "e49a0432d7ff4865937f127bb42d5694", + "abf451cd86ee4736a2973e17957cb65a", + "61b85a135758473ca21478a99b294662", + "5b9428103bc0447b84830baba8cd0792", + "3af787f578db4dde81a8515af76e2d4a", + "cfb0f9b35dad41ff89f6d4c9294fb141", + "12741070e1b344c5aef74e1f4e149ef0", + "e53e734a00254b589f8bf2c46d99a157", + "9515391e235540438acccbe364707b84", + "6331c3b6d264458fa34668a0e5c29d6e", + "05727779b5eb4dcaaea43a91e7b173cd", + "7ff031eac8a2489493c94585a02af25c", + "2d505dcd97d049d6ba2ff96c835c0f56", + "e7d9e35834254739ba9c263eceec4ae3", + "b92921f4b5e44191a70aa7f297d3b26d", + "18e1bcd9e38b4de98b60ad764b20f9b4", + "c364867b67644ee7b4480907f43a6333", + "4abe4c12f83242db9b5f48ffc732c967", + "463e555166254ca295339a5b2377aa74", + "ac13c14fc8254d62b4bf2effba318efb", + "98c1b6ab11854572a5472858a08db06d", + "6ff843b22e7f48b6bb65be7de51d78a8", + "acf6196472d1429ea88f6a960f90cfcd", + 
"97742cb20ea3454fa324a3da91f7a23a", + "2100a1ea915747d9bdec9368a15e3efb", + "9ede245da88b4c288bf621cdba54872a", + "90186299fc68489fa74be2f6b344fd1e", + "1b5095988f394b059850d8596cc07a5e", + "5ee00d5de3554137a3833eaf6a6555c4", + "c6426c66c4f043f2871092cebc0b9a5c", + "493a6fe8bb124802bed69d23de0c508e", + "336f065bc8ea4f2795b98604beda5c56", + "03635e8c58e44e609d578b894f41ab6c", + "8f5360f02f2743ab833120c040d0da58", + "96bfe52406b44c5987543008a918c8b5", + "35c6063752e44868bf959336503a204b", + "2ee9305b680d40429a206b409dc427d8", + "15365e96b88b4592aa5a97edaf203faa", + "01731ab574d7417982c3252be468048d", + "09da0999a8a74a2daab528795edc9dfb", + "fa232e3b8272477d8ad8afa420f0b966", + "ba2ebae221444c57a7d147298a8aaaa5", + "3eb88bb5f85d4176b63e462b8387d9d1", + "c75d7c8d4f5e4f8c81a516f417f6e356", + "10a2e0f49f934d649879f1379643b5c2", + "93d00755b1c74145a989942550cb1256", + "31414eee7ac24fb08c74df53771a46d1", + "c93d71f1f5e74beeb3650aaf22518a7d", + "f391afbe2b534ac4b046f3391d030b55", + "a87b5ffeda5a4f63b9f167eb060aa960", + "899a1113f94c4a12a97dd4a246ca34f5", + "e80e0571b6a44a5ab5d8a308159d6599", + "0b611f6c3f0a4dad9a4541f53d5a1018", + "90523f78a4bb48a2a9be260a85c7d4fc", + "fd1664aa2b75488dab5f33decb6e403b", + "298cc95bd32e44399756388b2e8d6738", + "43679efd9c104bbb828ac996d6c57d21", + "5ae95fe9d7e74f10b46d784039063c14", + "abe24447de574e13b6c4b0446d2e4c49", + "e99b1841328642c496534c2d85ab2197", + "df55839ea8174c3db3b208847b6a2edf", + "37df8776bba04771bdf1b59ade53d7db", + "86ed31f6ef22482aa68f56afa4f01855", + "5d93b70b84864b638595700334b4806b", + "381310ad93e34b9ca5a5970b329288bc", + "0fb592680bd84a2b977d373c517ef948", + "169b5a0ec13a4463bf31de75863ae124", + "2cfbfed9370949bba912b19f725d8dcd", + "83389fcc5d6043da807ae3b2bca6d278", + "f018320c97c346689b9a2cee3e129bda", + "9566c043613f4065a8acd0934a4fb91a", + "2f3aad401e4a40f7a86a70fbaaa237b3", + "dd6bbe22e8b94ffa9e22e6723ee31164", + "1fb5fe9069d94a85bf295c9f43cd6a2e", + "09853f9563db40dd8f991c0f11c343e2", + 
"08361c6620a7421a92a237b88f74392b", + "8ff05374922c463fa0cafc3166936916", + "481b213c55944148af3be4289be99b3c", + "39d2154a00014e7287a42d48959a39cd", + "34602f08ab064669bdb72fd1a4699e14", + "289443a77521422bbbb3fb454c763b7d" + ] + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors.index.json: 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "ddfff68b86604e0daa29ef5bc5b097c3" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "03d61f85075f4ccab14b1d7cb41139f0" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "9jGeSb9bWe0k", + "outputId": "e95da582-2668-4936-9aec-3ffdde2bb771", + "colab": { + "base_uri": 
"https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The animal in the image is a sloth. Sloths have appeared in several popular films, most notably:\n", + "\n", + "* **Zootopia (2016):** Featuring the character Flash, a slow-moving sloth who works at the Department of Mammal Vehicles.\n", + "* **Ice Age (2002):** Featuring Sid, a ground sloth who is one of the main protagonists of the series.\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eh0BzbZPWtRD" + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "R3ExuK8cWuT3", + "outputId": "3a0a6496-79ef-4921-8dd5-96ba5ba77fc9", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "In the emerald hush of the canopy high,\n", + "Where the orchids bloom and the parrots fly,\n", + "Dwells a gentle soul in a velvet coat,\n", + "A slow-motion dream, a drifting boat.\n", + "\n", + "With curved, steady claws and a sleepy gaze,\n", + "He wanders the wild in a languid haze.\n", + "No rush for the fruit, no race for the prize,\n", + "Just the soft, golden light of the tropical skies.\n", + "\n", + "He breathes with the wind, he sways with the breeze,\n", + "A living extension of moss-covered trees.\n", + "In the folds of his fur, a miniature world,\n", + "Where algae and\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "You can finetune the vision and text parts for now through selection - the audio part can also be finetuned - we're working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small amount of parameters!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "6bZsfBuZDeCL" + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "LjY75GoYUCB8" + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZQkXuGYxbJ-e" + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "Mkq4RvEq7FQr", + "outputId": "43449058-686e-430f-c009-c3e795af23ac", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 113, + "referenced_widgets": [ + "91d72aaeb61042e4947b071d3938d141", + "a63c2648c298475dbbd524a7981cc2ad", + "29adf88558b64e93b179cd1087032ae2", + "807ca6b59c494877acd1d7672e2d0e42", + "4627ade20aea4966a15122cdd52f6a25", + "73068a350b9a481d9363f24053235c4b", + "db04d1ca966a43b0be0bbd3ad4ddb515", + "024c0c5a1c5542babf6cf130f0648b27", + "14a012520c1340ee9264630bb456ce95", + "2be7b0a20ea843af879c32c6d8707138", + "d60a95c9eb6d4ca5be81eb67ac34b87d", + "68995941f8e649a1a5265b073c86f84c", + "f4b859550a47428ba4988a7c582443eb", + "66402b5fde234614b92ea5786b108f5c", + "e6e4248babc74ac9ae46c47fb3d2a371", + "4caa19e33d8b4be7a1d3b083b0c1948d", + "f62d9af82c9146beab3ed27b85077af5", + "88f1deb88087491eaf91cc902772ab79", + "3c3b82e37f1249c29dcbc663dd95d4f8", + "d3bf74e15c984629aafd83ad8f5e0d92", + "9371b6a9e05847cba1b74278e68955ad", + "502e2c08bc2941448e984a1f50401c72", + "59cbb4c555d848079284ffa4f63cdb04", + "e0b731ea60c24ff98c0803c6072568b0", + "a01e4ad102e74c869953e5303fb5da79", + "956c0ca869d04bd084abff9e9ff8afa7", + "095b148b6c8d4201a4c631fd0903c11d", + "1527b6baf1e24722aecbe33f6ffd1e52", + "ba42d40812c1453ab8208bce2693e79d", + "1fd1672a2bb041e8be285cd8fb1a413e", + "5f1175f9e50e4183b99fc7106a9b39c7", + "95800a9caeed4806a66b25ead152aed2", + "476c623468944e438825f5269b5acf04" + ] + } + }, + "outputs": [ + { + 
"output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`''`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "1ahE8Ys37JDJ", + "outputId": "11a09b9e-9fab-45cc-aef4-81b027ebc64b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "e74375a06bf84083bf0198a59d0a8204", + "d097e15ca15c4d2eb74520873c028809", + "09205f37e867471cba0c6d9bc93d1fa6", + "8ff9730f06e44b4f835a79f8aa948815", + "1984c66eaca44ac487b177b939d31122", + "42355bb88a2246d4bb25e05c6d1e6aca", + "7369d3e3e2bf4530ab7f0999b393a2ea", + "3caf02c6f3524f889a4a22e2c09ea2ec", + "3f97ac1599e5474f818d83971acc18e9", + "04e9b0514e284cf3aba462aca7906064", + "0bea037f647346fc9feff2f63659422d" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "Let's see how the chat template did! Notice there is no `` token as the processor tokenizer will be adding one." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "gGFzmplrEy9I", + "outputId": "7d50b4dc-d3ff-49fd-dba6-7511ee9c7c07", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. 
It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 13 + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "95_Nn-89DhsL", + "outputId": "369cc7d6-dfd2-4e4a-c117-495ffd50d06d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "872c4aa8a612431aaf11581cf8b7e966", + "cc0fdf131fe944ae8686ab413b48b40b", + "ccf366fe1da44dad97716da52be0fa95", + "559f5ce577684e7d8fbb3eeca3df7481", + "9ab84bc270d5426ea6503200962c0008", + "cdeb781f1d16460c8155bd7616790995", + "86b55c60b19b4f60b632c94ffdb3597d", + "2025f98694a14209a17310de5e27f29f", + "22e6e92bdff246b5b4f8aa2240bcba77", + "0d0d95b0a65643fdb1aac22b0fc22f25", + "bee9f63027394da28a6faa2cda06d291" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=16): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dv1NBUozV78l" + }, + "source": [ + "Let's verify masking the instruction part is done! Let's print the 100th row again. Notice how the sample only has a single `` as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "LtsMVtlkUhja", + "outputId": "68552bcc-1c20-48e9-8a78-505dced229c9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\n<|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4Kyjy__m9KY3" + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "_rD6fl8EUxnG", + "outputId": "b44c5e2f-068f-4a31-a31e-34ed10678a6d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "' <|channel>thought\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "outputId": "7a278f07-915e-44b9-d018-c342eb3d9256", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.494 GB.\n", + "18.57 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "outputId": "bed52b39-9792-4141-e4aa-3a7f8e3e6d49", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 3,000 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 61,214,720 of 31,334,301,232 (0.20% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 05:57, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Step | Training Loss
1 | 2.130297
2 | 0.630456
3 | 0.910843
4 | 1.590092
5 | 1.243832
6 | 0.751323
7 | 0.740214
8 | 0.595984
9 | 0.828650
10 | 1.148587
11 | 1.111826
12 | 0.612088
13 | 0.677837
14 | 0.893246
15 | 0.733738
16 | 0.908046
17 | 1.022293
18 | 0.960453
19 | 1.008064
20 | 0.916117
21 | 0.572186
22 | 1.075967
23 | 0.941994
24 | 0.896985
25 | 0.553296
26 | 0.467511
27 | 0.577387
28 | 0.588091
29 | 0.524273
30 | 0.754770
31 | 0.678074
32 | 0.652198
33 | 1.039165
34 | 0.474116
35 | 0.921311
36 | 0.881754
37 | 0.611182
38 | 0.940461
39 | 0.909078
40 | 0.630252
41 | 0.829833
42 | 0.835820
43 | 0.623156
44 | 0.523088
45 | 0.542758
46 | 1.208341
47 | 0.628660
48 | 0.642084
49 | 0.914339
50 | 1.025399
51 | 0.500276
52 | 0.597512
53 | 0.595572
54 | 0.555974
55 | 0.767994
56 | 0.910350
57 | 0.447900
58 | 0.765359
59 | 0.853433
60 | 0.745598

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "outputId": "881756ba-9573-4a68-ec6e-14e4cf325d93", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "390.1241 seconds used for training.\n", + "6.5 minutes used for training.\n", + "Peak reserved memory = 21.43 GB.\n", + "Peak reserved memory for training = 2.86 GB.\n", + "Peak reserved memory % of max memory = 54.261 %.\n", + "Peak reserved memory for training % of max memory = 7.242 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "kR3gIAX-SM2q", + "outputId": "cdfa1bd8-62cd-4355-cd4a-1026a16807ea", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\n<|channel>thought\\n13']" + ], + "text/html": [ + "

['<bos><|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,<turn|>\\n<|turn>model\\n<|channel>thought\\n<channel|>13<turn|>']
" + ] + }, + "metadata": {}, + "execution_count": 21 + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4-thinking\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CrSvZObor0lY" + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "e2pEuRb1r2Vg", + "outputId": "4c132d52-60a3-4024-8500-3a779f3dc9d3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The shortest answer is: **Rayleigh scattering.**\n", + "\n", + "Here is the more detailed explanation of how it works:\n", + "\n", + "### 1. 
Sunlight is a spectrum of colors\n", + "Although sunlight looks white, it is actually made up of all the colors of the rainbow (red, orange, yellow, green,\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "upcOlWe7A1vc", + "outputId": "f2c735b5-ddc0-4664-a256-72143545e918", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 23 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "id": "MKX_XKs_BNZR", + "outputId": "b12693a9-bca7-4e37-e9d4-1ceeed5b6907", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "As of my current knowledge, there is no official model called **Gemma-4**.\n", + "\n", + "It is possible you are thinking of one of the following:\n", + "\n", + "1. **Gemma (by Google):** This is a family of lightweight, open-weights models built from the same technology used to create the Gemini models. The current versions are **Gemma** and **Gemma 2**.\n", + "2. **GPT-4 (by OpenAI):** This is the powerful multimodal large language model that powers the paid version of ChatGPT.\n", + "3. 
**Llama (by Meta):** Meta has\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " use_cache = True,\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for VLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run!" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "ZV-CiKPrIFG0" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "FqfebeAdT073" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "id": "ZgcJIhJ0I_es" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "ddfff68b86604e0daa29ef5bc5b097c3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ad049bffb6ab409eab6268cb8b46e1d7", + "IPY_MODEL_1a7eb21cd8f14d0b98517dddd5a691c3", + "IPY_MODEL_8035803031e048be9cda8d78aaf83f1b" + ], + "layout": "IPY_MODEL_2c3ad97c23f840f5997f3e5554ef7b02" + } + }, + "ad049bffb6ab409eab6268cb8b46e1d7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c731d59a7ea449948fa2de671225ddc2", + "placeholder": "​", + "style": "IPY_MODEL_bff47b44ee1a49fbb0c69a24a7d1985b", + "value": 
"model.safetensors.index.json: " + } + }, + "1a7eb21cd8f14d0b98517dddd5a691c3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3753574c0ed14e6583ef6228f921efc3", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b897d2670a8e42ac89d526b7daf7d591", + "value": 1 + } + }, + "8035803031e048be9cda8d78aaf83f1b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e262ca340c0a4269a5f8571cfa85f1ce", + "placeholder": "​", + "style": "IPY_MODEL_6b98b989f90842e79b20bbd8007ed266", + "value": " 120k/? 
[00:00<00:00, 11.4MB/s]" + } + }, + "2c3ad97c23f840f5997f3e5554ef7b02": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c731d59a7ea449948fa2de671225ddc2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bff47b44ee1a49fbb0c69a24a7d1985b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3753574c0ed14e6583ef6228f921efc3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "b897d2670a8e42ac89d526b7daf7d591": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e262ca340c0a4269a5f8571cfa85f1ce": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"6b98b989f90842e79b20bbd8007ed266": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "03d61f85075f4ccab14b1d7cb41139f0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_76bb1448130f41bd8d36591733b2bc13", + "IPY_MODEL_ac0b4b138d3b4702880b6eff4ace812e", + "IPY_MODEL_7d8ce2a07d0d4b0d8c0ce2dd9bce1d8f" + ], + "layout": "IPY_MODEL_7deed7b9ad7e4d11b2e55e060295876f" + } + }, + "76bb1448130f41bd8d36591733b2bc13": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_54d6d94660dc4131acb117491bc67fe4", + "placeholder": "​", + "style": "IPY_MODEL_63a2d2156b0d4c9e9b62638d7b98a31a", + "value": "Download complete: 100%" + } + }, + "ac0b4b138d3b4702880b6eff4ace812e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e49a0432d7ff4865937f127bb42d5694", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_abf451cd86ee4736a2973e17957cb65a", + "value": 1 + } + }, + "7d8ce2a07d0d4b0d8c0ce2dd9bce1d8f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_61b85a135758473ca21478a99b294662", + "placeholder": "​", + "style": "IPY_MODEL_5b9428103bc0447b84830baba8cd0792", + "value": " 62.5G/62.5G [03:04<00:00, 284MB/s]" + } + }, + "7deed7b9ad7e4d11b2e55e060295876f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "54d6d94660dc4131acb117491bc67fe4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "63a2d2156b0d4c9e9b62638d7b98a31a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e49a0432d7ff4865937f127bb42d5694": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "abf451cd86ee4736a2973e17957cb65a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "61b85a135758473ca21478a99b294662": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5b9428103bc0447b84830baba8cd0792": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3af787f578db4dde81a8515af76e2d4a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_cfb0f9b35dad41ff89f6d4c9294fb141", + "IPY_MODEL_12741070e1b344c5aef74e1f4e149ef0", + "IPY_MODEL_e53e734a00254b589f8bf2c46d99a157" + ], + "layout": "IPY_MODEL_9515391e235540438acccbe364707b84" + } + }, + "cfb0f9b35dad41ff89f6d4c9294fb141": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6331c3b6d264458fa34668a0e5c29d6e", + "placeholder": "​", + "style": "IPY_MODEL_05727779b5eb4dcaaea43a91e7b173cd", + "value": "Fetching 2 files: 100%" + } + }, + "12741070e1b344c5aef74e1f4e149ef0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7ff031eac8a2489493c94585a02af25c", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2d505dcd97d049d6ba2ff96c835c0f56", + "value": 2 + } + }, + "e53e734a00254b589f8bf2c46d99a157": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e7d9e35834254739ba9c263eceec4ae3", + "placeholder": "​", + "style": "IPY_MODEL_b92921f4b5e44191a70aa7f297d3b26d", + "value": " 2/2 [03:04<00:00, 184.71s/it]" + } + }, + "9515391e235540438acccbe364707b84": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6331c3b6d264458fa34668a0e5c29d6e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "05727779b5eb4dcaaea43a91e7b173cd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7ff031eac8a2489493c94585a02af25c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2d505dcd97d049d6ba2ff96c835c0f56": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e7d9e35834254739ba9c263eceec4ae3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b92921f4b5e44191a70aa7f297d3b26d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "18e1bcd9e38b4de98b60ad764b20f9b4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c364867b67644ee7b4480907f43a6333", + "IPY_MODEL_4abe4c12f83242db9b5f48ffc732c967", + "IPY_MODEL_463e555166254ca295339a5b2377aa74" + ], + "layout": "IPY_MODEL_ac13c14fc8254d62b4bf2effba318efb" + } + }, + "c364867b67644ee7b4480907f43a6333": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_98c1b6ab11854572a5472858a08db06d", + "placeholder": "​", + "style": "IPY_MODEL_6ff843b22e7f48b6bb65be7de51d78a8", + "value": "Loading weights: 100%" + } + }, + "4abe4c12f83242db9b5f48ffc732c967": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_acf6196472d1429ea88f6a960f90cfcd", + "max": 1188, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_97742cb20ea3454fa324a3da91f7a23a", + "value": 1188 + } + }, + "463e555166254ca295339a5b2377aa74": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2100a1ea915747d9bdec9368a15e3efb", + "placeholder": "​", + "style": "IPY_MODEL_9ede245da88b4c288bf621cdba54872a", + "value": " 1188/1188 [02:41<00:00, 97.10it/s]" + } + }, + "ac13c14fc8254d62b4bf2effba318efb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "98c1b6ab11854572a5472858a08db06d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": 
null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6ff843b22e7f48b6bb65be7de51d78a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "acf6196472d1429ea88f6a960f90cfcd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "97742cb20ea3454fa324a3da91f7a23a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2100a1ea915747d9bdec9368a15e3efb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9ede245da88b4c288bf621cdba54872a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + 
}, + "90186299fc68489fa74be2f6b344fd1e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1b5095988f394b059850d8596cc07a5e", + "IPY_MODEL_5ee00d5de3554137a3833eaf6a6555c4", + "IPY_MODEL_c6426c66c4f043f2871092cebc0b9a5c" + ], + "layout": "IPY_MODEL_493a6fe8bb124802bed69d23de0c508e" + } + }, + "1b5095988f394b059850d8596cc07a5e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_336f065bc8ea4f2795b98604beda5c56", + "placeholder": "​", + "style": "IPY_MODEL_03635e8c58e44e609d578b894f41ab6c", + "value": "generation_config.json: 100%" + } + }, + "5ee00d5de3554137a3833eaf6a6555c4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8f5360f02f2743ab833120c040d0da58", + "max": 208, + "min": 0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_96bfe52406b44c5987543008a918c8b5", + "value": 208 + } + }, + "c6426c66c4f043f2871092cebc0b9a5c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_35c6063752e44868bf959336503a204b", + "placeholder": "​", + "style": "IPY_MODEL_2ee9305b680d40429a206b409dc427d8", + "value": " 208/208 [00:00<00:00, 27.5kB/s]" + } + }, + "493a6fe8bb124802bed69d23de0c508e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "336f065bc8ea4f2795b98604beda5c56": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "03635e8c58e44e609d578b894f41ab6c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8f5360f02f2743ab833120c040d0da58": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "96bfe52406b44c5987543008a918c8b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "35c6063752e44868bf959336503a204b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2ee9305b680d40429a206b409dc427d8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "15365e96b88b4592aa5a97edaf203faa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_01731ab574d7417982c3252be468048d", + "IPY_MODEL_09da0999a8a74a2daab528795edc9dfb", + "IPY_MODEL_fa232e3b8272477d8ad8afa420f0b966" + ], + "layout": "IPY_MODEL_ba2ebae221444c57a7d147298a8aaaa5" + } + }, + "01731ab574d7417982c3252be468048d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3eb88bb5f85d4176b63e462b8387d9d1", + "placeholder": "​", + "style": "IPY_MODEL_c75d7c8d4f5e4f8c81a516f417f6e356", + "value": "processor_config.json: " + } + }, + "09da0999a8a74a2daab528795edc9dfb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_10a2e0f49f934d649879f1379643b5c2", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_93d00755b1c74145a989942550cb1256", + "value": 1 + } + }, + "fa232e3b8272477d8ad8afa420f0b966": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_31414eee7ac24fb08c74df53771a46d1", + "placeholder": "​", + "style": "IPY_MODEL_c93d71f1f5e74beeb3650aaf22518a7d", + "value": " 1.69k/? 
[00:00<00:00, 198kB/s]" + } + }, + "ba2ebae221444c57a7d147298a8aaaa5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3eb88bb5f85d4176b63e462b8387d9d1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c75d7c8d4f5e4f8c81a516f417f6e356": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "10a2e0f49f934d649879f1379643b5c2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "93d00755b1c74145a989942550cb1256": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "31414eee7ac24fb08c74df53771a46d1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"c93d71f1f5e74beeb3650aaf22518a7d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f391afbe2b534ac4b046f3391d030b55": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a87b5ffeda5a4f63b9f167eb060aa960", + "IPY_MODEL_899a1113f94c4a12a97dd4a246ca34f5", + "IPY_MODEL_e80e0571b6a44a5ab5d8a308159d6599" + ], + "layout": "IPY_MODEL_0b611f6c3f0a4dad9a4541f53d5a1018" + } + }, + "a87b5ffeda5a4f63b9f167eb060aa960": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_90523f78a4bb48a2a9be260a85c7d4fc", + "placeholder": "​", + "style": "IPY_MODEL_fd1664aa2b75488dab5f33decb6e403b", + "value": "chat_template.jinja: " + } + }, + "899a1113f94c4a12a97dd4a246ca34f5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_298cc95bd32e44399756388b2e8d6738", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_43679efd9c104bbb828ac996d6c57d21", + "value": 1 + } + }, + "e80e0571b6a44a5ab5d8a308159d6599": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5ae95fe9d7e74f10b46d784039063c14", + "placeholder": "​", + "style": "IPY_MODEL_abe24447de574e13b6c4b0446d2e4c49", + "value": " 12.0k/? 
[00:00<00:00, 1.40MB/s]" + } + }, + "0b611f6c3f0a4dad9a4541f53d5a1018": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "90523f78a4bb48a2a9be260a85c7d4fc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fd1664aa2b75488dab5f33decb6e403b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "298cc95bd32e44399756388b2e8d6738": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "43679efd9c104bbb828ac996d6c57d21": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "5ae95fe9d7e74f10b46d784039063c14": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"abe24447de574e13b6c4b0446d2e4c49": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e99b1841328642c496534c2d85ab2197": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_df55839ea8174c3db3b208847b6a2edf", + "IPY_MODEL_37df8776bba04771bdf1b59ade53d7db", + "IPY_MODEL_86ed31f6ef22482aa68f56afa4f01855" + ], + "layout": "IPY_MODEL_5d93b70b84864b638595700334b4806b" + } + }, + "df55839ea8174c3db3b208847b6a2edf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_381310ad93e34b9ca5a5970b329288bc", + "placeholder": "​", + "style": "IPY_MODEL_0fb592680bd84a2b977d373c517ef948", + "value": "tokenizer_config.json: " + } + }, + "37df8776bba04771bdf1b59ade53d7db": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_169b5a0ec13a4463bf31de75863ae124", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2cfbfed9370949bba912b19f725d8dcd", + "value": 1 + } + }, + "86ed31f6ef22482aa68f56afa4f01855": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_83389fcc5d6043da807ae3b2bca6d278", + "placeholder": "​", + "style": "IPY_MODEL_f018320c97c346689b9a2cee3e129bda", + "value": " 15.0k/? 
[00:00<00:00, 1.63MB/s]" + } + }, + "5d93b70b84864b638595700334b4806b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "381310ad93e34b9ca5a5970b329288bc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fb592680bd84a2b977d373c517ef948": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "169b5a0ec13a4463bf31de75863ae124": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "2cfbfed9370949bba912b19f725d8dcd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "83389fcc5d6043da807ae3b2bca6d278": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"f018320c97c346689b9a2cee3e129bda": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9566c043613f4065a8acd0934a4fb91a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2f3aad401e4a40f7a86a70fbaaa237b3", + "IPY_MODEL_dd6bbe22e8b94ffa9e22e6723ee31164", + "IPY_MODEL_1fb5fe9069d94a85bf295c9f43cd6a2e" + ], + "layout": "IPY_MODEL_09853f9563db40dd8f991c0f11c343e2" + } + }, + "2f3aad401e4a40f7a86a70fbaaa237b3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_08361c6620a7421a92a237b88f74392b", + "placeholder": "​", + "style": "IPY_MODEL_8ff05374922c463fa0cafc3166936916", + "value": "tokenizer.json: 100%" + } + }, + "dd6bbe22e8b94ffa9e22e6723ee31164": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_481b213c55944148af3be4289be99b3c", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_39d2154a00014e7287a42d48959a39cd", + "value": 32169626 + } + }, + "1fb5fe9069d94a85bf295c9f43cd6a2e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_34602f08ab064669bdb72fd1a4699e14", + "placeholder": "​", + "style": "IPY_MODEL_289443a77521422bbbb3fb454c763b7d", + "value": " 32.2M/32.2M [00:00<00:00, 161MB/s]" + } + }, + "09853f9563db40dd8f991c0f11c343e2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "08361c6620a7421a92a237b88f74392b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8ff05374922c463fa0cafc3166936916": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "481b213c55944148af3be4289be99b3c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "39d2154a00014e7287a42d48959a39cd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"34602f08ab064669bdb72fd1a4699e14": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "289443a77521422bbbb3fb454c763b7d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "91d72aaeb61042e4947b071d3938d141": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a63c2648c298475dbbd524a7981cc2ad", + "IPY_MODEL_29adf88558b64e93b179cd1087032ae2", + "IPY_MODEL_807ca6b59c494877acd1d7672e2d0e42" + ], + "layout": "IPY_MODEL_4627ade20aea4966a15122cdd52f6a25" + } + }, + "a63c2648c298475dbbd524a7981cc2ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_73068a350b9a481d9363f24053235c4b", + "placeholder": "​", + "style": "IPY_MODEL_db04d1ca966a43b0be0bbd3ad4ddb515", + "value": "README.md: 100%" + } + }, + "29adf88558b64e93b179cd1087032ae2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_024c0c5a1c5542babf6cf130f0648b27", + "max": 982, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_14a012520c1340ee9264630bb456ce95", + "value": 982 + } + }, + "807ca6b59c494877acd1d7672e2d0e42": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2be7b0a20ea843af879c32c6d8707138", + "placeholder": "​", + "style": "IPY_MODEL_d60a95c9eb6d4ca5be81eb67ac34b87d", + "value": " 982/982 [00:00<00:00, 125kB/s]" + } + }, + "4627ade20aea4966a15122cdd52f6a25": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "73068a350b9a481d9363f24053235c4b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "db04d1ca966a43b0be0bbd3ad4ddb515": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "024c0c5a1c5542babf6cf130f0648b27": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "14a012520c1340ee9264630bb456ce95": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2be7b0a20ea843af879c32c6d8707138": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d60a95c9eb6d4ca5be81eb67ac34b87d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "68995941f8e649a1a5265b073c86f84c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f4b859550a47428ba4988a7c582443eb", + "IPY_MODEL_66402b5fde234614b92ea5786b108f5c", + "IPY_MODEL_e6e4248babc74ac9ae46c47fb3d2a371" + ], + "layout": "IPY_MODEL_4caa19e33d8b4be7a1d3b083b0c1948d" + } + }, + "f4b859550a47428ba4988a7c582443eb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f62d9af82c9146beab3ed27b85077af5", + "placeholder": "​", + "style": "IPY_MODEL_88f1deb88087491eaf91cc902772ab79", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "66402b5fde234614b92ea5786b108f5c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3c3b82e37f1249c29dcbc663dd95d4f8", + "max": 116531415, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d3bf74e15c984629aafd83ad8f5e0d92", + "value": 116531415 + } + }, + "e6e4248babc74ac9ae46c47fb3d2a371": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9371b6a9e05847cba1b74278e68955ad", + "placeholder": "​", + "style": "IPY_MODEL_502e2c08bc2941448e984a1f50401c72", + "value": " 117M/117M [00:01<00:00, 259MB/s]" + } + }, + "4caa19e33d8b4be7a1d3b083b0c1948d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f62d9af82c9146beab3ed27b85077af5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "88f1deb88087491eaf91cc902772ab79": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3c3b82e37f1249c29dcbc663dd95d4f8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d3bf74e15c984629aafd83ad8f5e0d92": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9371b6a9e05847cba1b74278e68955ad": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "502e2c08bc2941448e984a1f50401c72": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "59cbb4c555d848079284ffa4f63cdb04": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e0b731ea60c24ff98c0803c6072568b0", + "IPY_MODEL_a01e4ad102e74c869953e5303fb5da79", + "IPY_MODEL_956c0ca869d04bd084abff9e9ff8afa7" + ], + "layout": "IPY_MODEL_095b148b6c8d4201a4c631fd0903c11d" + } + }, + "e0b731ea60c24ff98c0803c6072568b0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1527b6baf1e24722aecbe33f6ffd1e52", + "placeholder": "​", + "style": "IPY_MODEL_ba42d40812c1453ab8208bce2693e79d", + "value": "Generating train split: 100%" + } + }, + "a01e4ad102e74c869953e5303fb5da79": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_1fd1672a2bb041e8be285cd8fb1a413e", + "max": 100000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5f1175f9e50e4183b99fc7106a9b39c7", + "value": 100000 + } + }, + "956c0ca869d04bd084abff9e9ff8afa7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_95800a9caeed4806a66b25ead152aed2", + "placeholder": "​", + "style": "IPY_MODEL_476c623468944e438825f5269b5acf04", + "value": " 100000/100000 [00:00<00:00, 144178.01 examples/s]" + } + }, + "095b148b6c8d4201a4c631fd0903c11d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1527b6baf1e24722aecbe33f6ffd1e52": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ba42d40812c1453ab8208bce2693e79d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1fd1672a2bb041e8be285cd8fb1a413e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5f1175f9e50e4183b99fc7106a9b39c7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "95800a9caeed4806a66b25ead152aed2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "476c623468944e438825f5269b5acf04": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6049cf47e5e041a4beea128794964783": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5ba4d1ced34a475f815ec6058655d74b", + "IPY_MODEL_7f640a553df64132a426f5b7feb782ad", + "IPY_MODEL_dd010cedf382401885dc4ce94b3147c8" + ], + "layout": "IPY_MODEL_5421003613f048069cb55182938d6791" + } + }, + 
"5ba4d1ced34a475f815ec6058655d74b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4828ea99e77c4902859515b982960f4a", + "placeholder": "​", + "style": "IPY_MODEL_49905865471146b8b91e88a4e60a6208", + "value": "Unsloth: Standardizing formats (num_proc=16): 100%" + } + }, + "7f640a553df64132a426f5b7feb782ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d987e0891a894c19891f2dff1a0aafba", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b9fc627b365d4a4fad300999cd193586", + "value": 3000 + } + }, + "dd010cedf382401885dc4ce94b3147c8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e68d04fef6354c2e874adb5e1673591c", + "placeholder": "​", + "style": 
"IPY_MODEL_5d59bfca1ac44464aa9d4c6df2a4dfc6", + "value": " 3000/3000 [00:01<00:00, 215.27 examples/s]" + } + }, + "5421003613f048069cb55182938d6791": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4828ea99e77c4902859515b982960f4a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, 
+ "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "49905865471146b8b91e88a4e60a6208": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d987e0891a894c19891f2dff1a0aafba": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b9fc627b365d4a4fad300999cd193586": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e68d04fef6354c2e874adb5e1673591c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "5d59bfca1ac44464aa9d4c6df2a4dfc6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e74375a06bf84083bf0198a59d0a8204": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d097e15ca15c4d2eb74520873c028809", + "IPY_MODEL_09205f37e867471cba0c6d9bc93d1fa6", + "IPY_MODEL_8ff9730f06e44b4f835a79f8aa948815" + ], + "layout": "IPY_MODEL_1984c66eaca44ac487b177b939d31122" + } + }, + "d097e15ca15c4d2eb74520873c028809": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_42355bb88a2246d4bb25e05c6d1e6aca", + "placeholder": "​", + "style": "IPY_MODEL_7369d3e3e2bf4530ab7f0999b393a2ea", + "value": "Map: 100%" + } + }, + "09205f37e867471cba0c6d9bc93d1fa6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3caf02c6f3524f889a4a22e2c09ea2ec", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3f97ac1599e5474f818d83971acc18e9", + "value": 3000 + } + }, + "8ff9730f06e44b4f835a79f8aa948815": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04e9b0514e284cf3aba462aca7906064", + "placeholder": "​", + "style": "IPY_MODEL_0bea037f647346fc9feff2f63659422d", + "value": " 3000/3000 [00:00<00:00, 10342.06 examples/s]" + } + }, + "1984c66eaca44ac487b177b939d31122": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": 
null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "42355bb88a2246d4bb25e05c6d1e6aca": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7369d3e3e2bf4530ab7f0999b393a2ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3caf02c6f3524f889a4a22e2c09ea2ec": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3f97ac1599e5474f818d83971acc18e9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"04e9b0514e284cf3aba462aca7906064": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0bea037f647346fc9feff2f63659422d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "872c4aa8a612431aaf11581cf8b7e966": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_cc0fdf131fe944ae8686ab413b48b40b", + "IPY_MODEL_ccf366fe1da44dad97716da52be0fa95", + "IPY_MODEL_559f5ce577684e7d8fbb3eeca3df7481" + ], + "layout": "IPY_MODEL_9ab84bc270d5426ea6503200962c0008" + } + }, + "cc0fdf131fe944ae8686ab413b48b40b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cdeb781f1d16460c8155bd7616790995", + "placeholder": "​", + "style": "IPY_MODEL_86b55c60b19b4f60b632c94ffdb3597d", + "value": "Unsloth: Tokenizing [\"text\"] (num_proc=16): 100%" + } + }, + "ccf366fe1da44dad97716da52be0fa95": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2025f98694a14209a17310de5e27f29f", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_22e6e92bdff246b5b4f8aa2240bcba77", + "value": 3000 + } + }, + "559f5ce577684e7d8fbb3eeca3df7481": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { +
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0d0d95b0a65643fdb1aac22b0fc22f25", + "placeholder": "​", + "style": "IPY_MODEL_bee9f63027394da28a6faa2cda06d291", + "value": " 3000/3000 [01:15<00:00, 41.22 examples/s]" + } + }, + "9ab84bc270d5426ea6503200962c0008": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cdeb781f1d16460c8155bd7616790995": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": 
"1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86b55c60b19b4f60b632c94ffdb3597d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2025f98694a14209a17310de5e27f29f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22e6e92bdff246b5b4f8aa2240bcba77": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0d0d95b0a65643fdb1aac22b0fc22f25": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bee9f63027394da28a6faa2cda06d291": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "600afd872d0b4dadb9677e9d59620d58": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_03e9c6947b0b4d6fa0af144ef565d031", + "IPY_MODEL_dc810e34048d4513978b2305648244d1", + "IPY_MODEL_1c2a909bce414df7b84adedc6926c93e" + ], + "layout": "IPY_MODEL_34b591f71b084110aadd8faa7b7114ec" + } + }, + "03e9c6947b0b4d6fa0af144ef565d031": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6e803de562a14c9abc53c8b443d491f7", + "placeholder": "​", + "style": "IPY_MODEL_9ba237b2fba6415c978bd4a5bcc770ea", + "value": "Map (num_proc=16): 100%" + } + }, + "dc810e34048d4513978b2305648244d1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c59fccf351e140b3b13a1e7ffcfcc831", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_78e8c0b69d874b80a2fbea663e69ba31", + "value": 3000 + } + }, + "1c2a909bce414df7b84adedc6926c93e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d50c59ac24704031adb090719c434fda", + "placeholder": "​", + "style": "IPY_MODEL_4b4f33ad34da4f6497253382c7844cda", + "value": " 3000/3000 [00:01<00:00, 3327.87 examples/s]" + } + }, + "34b591f71b084110aadd8faa7b7114ec": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6e803de562a14c9abc53c8b443d491f7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": 
null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9ba237b2fba6415c978bd4a5bcc770ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c59fccf351e140b3b13a1e7ffcfcc831": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "78e8c0b69d874b80a2fbea663e69ba31": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d50c59ac24704031adb090719c434fda": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4b4f33ad34da4f6497253382c7844cda": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6152711b99344dc6b6d141b128af930b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_13d590f781154376b848f9d756e8796c", + "IPY_MODEL_6724357188ef4e299617e98e225cb8de", + "IPY_MODEL_e6ce43521f604724b1f06544c905fe8f" + ], + "layout": "IPY_MODEL_9dab1d53f37740aba9d4305b49a7762f" + } + }, + "13d590f781154376b848f9d756e8796c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ce279b6352454b8cba47abe09ab90fbf", + "placeholder": "​", + "style": "IPY_MODEL_cf39c43b758d4dfc8a0381cf5f7a0a88", + "value": "Filter (num_proc=16): 100%" + } + }, + "6724357188ef4e299617e98e225cb8de": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_c8d2257696c249acba128829c980b7af", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_7455963b87ab44eebcf2ef2e2b210a76", + "value": 3000 + } + }, + "e6ce43521f604724b1f06544c905fe8f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fc174c4d0bce41b391202791257feb30", + "placeholder": "​", + "style": "IPY_MODEL_66340c02de25445d9c432e1c183e468a", + "value": " 3000/3000 [00:01<00:00, 1758.59 examples/s]" + } + }, + "9dab1d53f37740aba9d4305b49a7762f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + 
"overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ce279b6352454b8cba47abe09ab90fbf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cf39c43b758d4dfc8a0381cf5f7a0a88": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c8d2257696c249acba128829c980b7af": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7455963b87ab44eebcf2ef2e2b210a76": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fc174c4d0bce41b391202791257feb30": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66340c02de25445d9c432e1c183e468a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Vision.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Vision.ipynb new file mode 100644 index 0000000..2ebf928 --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(31B)-Vision.ipynb @@ -0,0 +1,6404 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2vQXvnUUsTzI" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab A100 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7j01DfVgsTzJ" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6dT42nHksTzJ" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K7fgQkATsTzK" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "vA7IKFdUsTzK" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "Mp4i13PHsTzK" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"GFOEZbP7ONMs" + }, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "QmUBVEnvCDJv", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 461, + "referenced_widgets": [ + "cca48201ab524349b98bd7dc3ec0a371", + "a74922ad802c4cfe96d3f7c9d0215877", + "30e429603fbb48469d80e5ca48305216", + "2bc496ecfb4e410ba3ba46451dd38cac", + "ddb467daf231485199b6820c1c87be59", + "6c30ba7b3fc84a63b7b8975e8b17991c", + "3b10e74e5c6b421092b314f71f7e9217", + "d2bc2f8ade924beca4c2db847efa6f5d", + "eed52dc4a67d4253983ee7b20918152d", + "642d63ab9f7c451fa96a0633c1a7f23c", + "72cbdfb8889a4da0bcc0a4e5d5a67ac2", + "26121d7e6555452ab0f5ac2a6681d864", + "e239143f8ec0425db6135a5d749c2c53", + "b86cb0f30f674656aff10b0701f9f26f", + "a7cf44db72fc47b8b8f7a084b7585656", + "acbd99a66aa846d78396dcf40ebc8bd7", + "35ae415e878e486fa0caad09b8408067", + "0b4dda8099954b0d8851c05e24ed0e35", + "cbd50ad058134a36bfcd386deac87fb8", + "1d572d34d67b4d6b832203a27f44f93c", + "b988bc006f284214883cba413dfb5fc3", + "1b43696ca662419ea0c970b05d89441d", + "69394dfeff2a4286bb7088f599bf4c0b", + "a4e7a791f92449978900effc00e5100a", + "49e5b9f4cea744fc8f38a8adabc0bd42", + "08785606a9034297aa16181720c51142", + "e6e357b71bb8474d82d98ce813aa2183", + "a36232c6617141a1b0fdfe0108550ae0", + "7563a57897d846f6ba41007b0c92135e", + "da65718025e64f83a198f99f8fa48c3b", + "96e509dc48aa456da20260357e60677b", + "63f6774afc314b52b1ba82f5335d67a3", + "b4357ad98d3b4113ae9dab45333b8ee4", + "b7eddc272b2f4b0db9d910b1fa53bae1", + "c25094678b464bd0b693dcfe333da14f", + "5332bd2cc65145c49a843bc835d51373", + "903d6cf82ea44919a1739b2fbae8f16a", + "5382cb936c2b46d59e1819622ee8ef39", + "bc3b830bad5c4d78b64406b4f19cc370", + "09a61bb08189489687754346b906b875", + "91bb3728a52042aeb9947caf54dda34e", + "1aadf052247c44d8bb0081cd68234fa8", + "1e5f5a2acd9b47ed972c98ef4428f03d", + "31cbc371336f456795651381a7e3ad0a", + "0dbb262cf30643659120f8cdb5b3171c", + "abd6e0b4b3f24f1cb818aab793d61477", + 
"b30a2b86573844d5bf4a6f43e5bd8823", + "6c98eca1409f4e1ba2e7865b2992f51e", + "bceaf428e3fc4bcea2318512a7c6ef4b", + "5cdef7c8d58c441daaaddb9f34bc1a7d", + "5c4901ca1dd44ee69bb5bb76b1cb3fb4", + "25768d586f0f4d5cb9c86d01b13e7fa1", + "f962bb7a63094308b1ec98cd84c90637", + "9a305de1de4a4fb797d14e996e802ca0", + "29a723a6b8b7436faa14620beff25597", + "87fa7a1186c04503994d3ca6781f2fa2", + "70dd1d3746a14a5cac5fb65edb8ac231", + "6144783cd98542e3ac0488d1f1c92c35", + "d605282333054e82a1d8507be830f02f", + "dfbb2f74208243238c1dc88dcc4d45e7", + "555c3a2ab47c4cb4b872fd067c15c600", + "e60657aaf58d4373a3e777352ab720c3", + "7c9085eed1be41b8b6d61f5eef968825", + "0357ffa5f3224994a2917f7f09affff3", + "062b5afb6d9f492aa62fd5547fc6d5eb", + "6e53b93ebdf045e9b94eedd1bd489b3d", + "d2a7e731aa05474299243a698d8f8e76", + "9c8ecc577ca6479aad38d1af6c783685", + "5cd3447374284aa6a9d62e5a72f5ff3c", + "b49a335de14944a5afb36a8ec263c6c8", + "94e9455af5d94b089a5ceb122446b782", + "3aae33b4c0ef45e597513265bc4b0681", + "2efb2666390a482b913a30109a7e2472", + "0ce0b6f052db48c28f7788e95df7c1f3", + "01917cd89aef4db0aa18675636f4e981", + "58ee20c84f774dceaa69985bd3c63afa", + "9d8eb917864e4be1b788f9cfcd64e182", + "a955958aff0444a4839aafca8566065c", + "867f014d40bd409fa6c845436b050fd3", + "dd08aebf4c9b44b692534426ea948cdb", + "6e0c925129ea45b6bce91769e084e9f3", + "e3a3e391d3114a94985fd3db6cc5ee1a", + "c224a759bea64ffbbf3b645a59cd79c5", + "e696b055c6934f1186078767877ff2a2", + "319292819e264a19a27685601c70f9b6", + "bb1795785a6a44ec9b17b31b1075cbb3", + "5536ee38467e4557bbb1209cb7b4e424", + "b9c49626bcbb4e8bb6d4a01a08fab4dd", + "4cfc78a661dd4837b26a78d7dba376b0", + "ee345697fcfc4468a50627bcff074655", + "3ed7a054b0104bd09132227f1ed413ba", + "59667e10734c42218476077f7741da15", + "e1d2d9a63d3f43dca2a8e097470d2c76", + "a743692b1eae456cad70ecf0d3752493", + "d6ad093be12e4a3e9b7c75160aeb2fb9", + "746d3b1ccba940e69fc9bf50eb1986c8", + "580577e9f9a5425f9ca10479f42b3d16", + "27b4eafed4584bcda6f88f1fb82f9212", + 
"d6caa192c711497b856f18df2f5d9a62" + ] + }, + "outputId": "a348cecd-dff0-4511-80fb-c1c3e1079230" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors.index.json: 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "cca48201ab524349b98bd7dc3ec0a371" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "26121d7e6555452ab0f5ac2a6681d864" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00\n", + "### Data Prep\n", + "We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. 
This is particularly useful for complex expressions.\n", + "\n", + "You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "LjY75GoYUCB8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "19f9e04dafa744d0bbafe658a8cd4479", + "f261579b58c94894a6c186255415e54b", + "d36d30757ab34602ab6411c282171802", + "0ac1ef374ce34494a9ade7b68c1fcb7a", + "588e76027ce9499e8e7683f00c8bd999", + "23bb3393ad0342aaa993c5109af01cbb", + "1b9c16155dec43db992c6140ea535912", + "5b938c6422c64286aed0276899c260b3", + "74fce0116e3b4cf4abde0af7563ebccb", + "a67007d24e9b4a11a35f955db150d486", + "f91df33daa9348e7abede2d855dc0ecb", + "717b21eed2084ebe9ddcf9a0da8c1c7f", + "1e690555c06540359b89b87c6ae62a9c", + "bddd2d47d2bd40fb9d68be94ebacc0d7", + "b875ef0b812d44cd8e1fd102b597149b", + "91c2f197fcf94b02bbbb780cced6c0de", + "8cd8dc2aafd04685b8cb3fb746f08005", + "f30f8e430cad427d9086bc99f6f146be", + "c048a0b91a5a4bf7803fb3d5ba994acd", + "6de4762ccf1a42e6b51f18880b37cd50", + "4d0707913c734e899854b548bc3364a7", + "bd20452eab634993b4a716a1cdd9a4ea", + "ae951bf9259b4054a23534387dc83f8d", + "647e5bcc9d3a4cf4a1e9889cd805c45e", + "a71f8702013a453c894debaa8640a215", + "04ff990ba34c43d99e872fec5f39f0a7", + "1f722501e2894a22b83ed42f482cdec7", + "f28bf1e32d7542479fe827a3723089c9", + "4c298b0109bb4734a876c69ef32e5f6b", + "51da57c93dd549b6981874d40f26049e", + "282c398a45784093a51432726ec260c5", + "b4d4af7fd215445282451eadc69f9bd5", + "b9825625d8f54c188698e0a8d5a03e0b", + "a469f167acc94d0a93fb8bf94fe5c13a", + "4153276526fc459ea16f69f29f82d8dd", + "2006a3d483f24f96b38021cb9a7e3af8", + "77b8d2488e044518add72302d4875797", + "ebb50b363f5d4fd49c35937434858d4d", + "22d14ea43aa6418786ff501f5e1e0cc1", + "1ca26cff3ae149d88a53bb1e165a299a", + "28eec1a723c9433baff472e352125935", + 
"15583a029d5e4a78a60f68c865d064b1", + "784a50c5430743d98d944aaed6c23479", + "81f8fcd62a8d4077830a3be2ed444cfb", + "e4a0c21c1bc341e589cce85b98ce2737", + "a14ce85211314a9ba56598ef14968b7c", + "116190e5f17946b88241a2b318522765", + "299a8904e4834fe38d81c804e0669a22", + "0a161c324ea94abf955d4824199d79bf", + "40b91e46da234c6bbef9adb542336f43", + "3a3ec79b99c8435ea208b9292b0d3bbb", + "9a55c6d9f4284fae92e4c4fd079c7905", + "9ef11b00c45d436794e9acad12715ec9", + "6d20f7eb80cf4d54a1c34e3d185ee227", + "aa2ede79b4464737a0d504e403af5791" + ] + }, + "outputId": "7654cb9e-7320-431b-d486-a0e92053facf" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/519 [00:00" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAUAAAAAyCAIAAACib5WDAAAYrUlEQVR4Ae3cebSuUx0HcCSFEBHKlNZCQi2zureMyTwrY2UeQlSaXNdQxjSSMltFSJF5bjCzDMuilUUUIjSKotLtc+83+z73eYd7znve877n3vXuP56zn/389m/av2FP75l10qRJswzKQAMDDcyYGphtxmS7P1wPgl1/9N4nqoZ77I/4TOvA3dU+bP/5z39mnXXWPtnSgGwfNGC4lf/+9799oD1kkjOnA1N6tD9kPUwHELbZZ5/9hRde+Oc//zkd0Ol9FgteffXVsR/apyfHTP6dCT3zzDP/+Mc/ZpttTPvImGauYxuh9BdffPHvf//7v/71r46RpCNPM5Zc7uijj37729/+ne98J9m4Y7Riwete9zpPKb1jJIOOo6qBjPinP/3pt7zlLddcc82///1vBjCqFDtHzhxnpsIraP/nP//5IossMvfcc3/5y18mHTfuWMZky0022YSKf/rTnwrJSHSMTd8//vGP55133qabbvr888/Dg9uOsQ06jpIG4sCvvPLKxhtvbNzFboS48SiRGwnamTADS25i5+abbz7PPPMYg85j2yyz8F7J/IYbbrjyyiu/8pWvwPmGN7xB/uwAp0HSy7xggw02uO2222688cbXv/71HeAZdOmBBpiQYoAuv/zy973vfccee+xf//pXa6gMYg8YGAaJkXj/WOub3HjVVVfxsb/85S+m0JasI2EyqVs4gNACmD8rI0Go+9/+9jcYllpqKalYZZCBR6LPUe2blHv77bdzp2OOOQatkUzlRonV2Yfh62MelDNImBxYsJxzzjlly66wbCrO8cRjyA3DSHCK6/POO6/JM+NQHwmqbvUl0RjhpFsSTRdPBnG6UovaIFdffXWQpk7TRdsXgJlnCk3XfEyMPPPMMw8++OA55pgj8XLkahUXIMmojxwbPCxj5Hi6hWG6dtwtQmMHD5GHKDWwTOLG1JBVNdmJAzNoCaSKZezUX375ZRqXMKeM0VDHqZf8D9F0esOSaCK3ZBrfG4pjgQojUYYYkU278DxE4N5L14kDE6mLGzBU0xXtWABT33e/+13PDTfc0DOq771O21MkbFJ6F91migqHp8Z0sTez8sorW94LK1qqnAeg2tL3urgc1TXlxCcATT+Vxixrv/SlL33xi18kcmymfJ0RK00cmFTkVFQoRUmLp0kpIR3STJw4EYB6bdSHpQJ94ZycJae971IolvEoLWGpFRUI
LS/NdtZbbz0w7R0YqqqNtsfciuJw2yPs/PPPb33umdfhIqnBFx3W1FgDa3wFjw1jqlLdrqdtwIW3kQxxI9HGFvxPMbfJ9la+ptEzzGhnDEa2OqbVrwB8AlBspqAqFYKAIReYPAvyAjPDVZo4sB0gCVZRIbCSFk8LSzJffPHFRx11VFXdnYlNiXBKAgoqBUmhWBYepSUsFchSMTYYlknOPvtswwNh+dSqAlWxUTCtMLfq3lk7S3388cfPPffcxx57zFM9cbAzbHoRHOe23J9++umXXnqpqsamOA2f2SN31fGCCy5Ya621jOl73vOeww47jDYwEyvXF8I//elPKto1NsWWRthEgTb2oHsbDPifYm6T7a1QSaNnJIr3/vjHP77++uvBBFv5Gj/0CUBTHw6HCRNO4J3nX3311Ysttli2poItmgHTJgQU9sZOZarKwpOROO200/785z+TiiRbb7219lxGIZvbEXvsscc555zzqU99yjavFqrvQBjIFR0///nP/+hHP1I/5ZRTHJozAght3N9///2sxwbghz70IWBarrvuOpaEpQ9/+MNrrrmmLo3G6uDXIY30izfjAb4Vb4z41FNP/dnPfrbffvtB+PGPf3y++ebbZpttxo8fr2MJHK26d9wOuSNl04TDDz/8d7/7nfqee+7ZMTYdyXjkkUf+5Cc/eec73/mLX/zC0Gy22Wa01CjCZI1PmvTkk0/+8pe/3GWXXfR985vfTN7vf//7H/vYx3JjIZzcddddAjTH/tWvfrXoooti8o1vfGNoBaA8jRcvEol23nnnueaaq7TXKm0Ggr1973vf+/3vf6+7Kclee+1lUsDfTj75ZPEI/o022shwE8ftl69//euXXnopKfQycG9605vKkGl817veteWWW7LJj370o0UDNQ7dw1lppZWeeuopA7HPPvuwE6xO1su0mqnxP6Zfw32epCI55wnHZ511ltDOJahVi+QmYwhd6txJF9qpdh96HRXAa6+9ttj/3HPPPfLIIwxF1MCAAeC6biyiwm4SSjzTgiVf0QVZJRdOEp4du/tkjKsApa6jXS5fP/GJTyDhvqtPv/nNb9T333//Wscg4WyChbH3tUa3oO19JSJLJjjHP1ZlVG7geBmTjXymhaplXZJedtll4XnxxRdPxaAo9L/ccsvtuOOOvOjZZ59997vf7RgcQKM+IQTv0yGHHLLDDjt89rOfbaUiCldCpfZE7uGHH+ZUpHjwwQchRBfMgQceqEVoMNxePR2/uf2irgswU5jqkFUNoGigkUNXA2DA6qGHHhpUnjXNuLyhsdgYoQQga2aNjUrQ2N8ydeJKHQpL5VTS4Gc+8xkmLgyLcwsvvLBP0pTrB4L9Cius4DoRJVbDvFeabVpIOAX3/x8gUfnhD38oOggWCy20EO8VbumRpnxdYIEFjAE7A3bSSSdpEZv5ueSPJXV0a0EdDOx33HGHp15VctV6SIgLBGSmKi67alx66aWdPIlQ5t4+1RhmUhnjKqqO6yxDoahUOsaDJX3vvPNOHoh/bH/kIx/BqrmMdvirmOmHxi655JI11ljD1SIC0hIM4pcn3QabRnMTWdr4Gvq3vvWtPPMb3/gG95Npq2pRhxM5OfzXv/61QSRObVAKG+bnSnmtMgbPMsssI4HDb0QwYHDNbO+77z5Ts1133dXMCLzuku26667LhUAi/Y53vKM6ZDpqXH/99QlIAzihgUYOkdPOjRWVSNGomSqH6sB4QVRU+9T/V8yVYgzUr732WiKZjJV2NxmoJucNJld8mGoCXGCGXkkYk9ZM3qTWP/zhD8zFPFY7nDBTrtF69NFHGZBZMeuha3nAJMrX6L1Kjma1szzzbXacpK2xClPqIoU1gtD+gQ98wDQMGPyeN910k8FgrCDDYWiZgIj9jJ6xYq+KVr1fBZNIe2KSoR900EHGxcycHbP18lVFAaPQKiu86KKL0pinzOYXGslykUW7GabwLUdZYpxwwgnU0qjzqAiMcN/U
EoItsc9kVdGS1yoD8Gi3LYxKYUCG5IfA8pXhielm8qQIrQxEbcgCTA+rrrqqugJDjcMI8slPfjKzLTbTSjP64seTwZhpM9Hw4zmmyjRrYMKIf7feeisWb7nllrvvvptapSkyyIfcmPzmEtXEG717mr+54suAiO21FIGcEi2e4RQXPBNiWQls1sDg5QHZmO9hAAZgeLCoM49ijg888MAqq6xy/vnnW+zla0FeKtphM4VOdi3tpRLRzBsF+wMOOAB+S0H7GWhFHPN5TFpiGVrYwq1gbwpgb+zEE08kCCTgC85qvTT2poI9xexup512MuWjRiZuZWsBabBkVMo0VSnMkMhFbqJtt912SadayjhGENKR1xIJTsq3bhJhH3roIWAGSBQzP4cWTpAq3Nuc5YknngDAWyDUvVAMTkrTkqeWVApMKto5pDqHNAomQcKrqT4qcPpK/1xI+vVKapAoqtSGDKRimiYKQ2JSAGGNw3RncpEdQqWmGZ80ogIbo0L6c5/73BFHHCGC+E0LrtI3zPf9OVXjZDMAYpKJqFBN11rIwCtwPG7cOLwSTJ38kTDcgzF+VjKkVY+O8gkYczEYxYGDRCSWJ0V3eVVfc/JvfvObjqYMJPwyfHZZ9t133+OOO+4LX/jChRdeKCTrizQSQV57winEEKHKQGC04MSg2uSQMfwswTrZls/uu+8OwCcABuZtb3ubEzJgCy64oBZ2Y+FtN4Wfy8Bl1AtOkMBqIte46u5raHkaIC701a9+1RqYYWULyg6F7UZGRmNOCriW+SeVxi35m0ZSiHHBE+YJbuDwSfMkIqyT4cRuK2GE7Ir99re/pTrxNMeHNAatOAgYNiNY80yYo5xQF2jgN+J55RJKNAOVinP7448/3tTPBuS2227LA8ULjEXnrtbZEtc3XTy1e9aGLI34YQZkwUMjh5GUQoQ5GOhQwKppplBRiTmZj5jS2xtjhzbV0IW/CtbPOjlTyK8SLeD4teZJzB1/5Wtp76xiGHQ0JJnpBcl73/te29rqJkue9G5OmE/qlMWksovAS9NefTIgr1ydXYqUWE1Lgcmr++gE4beClAl8xgYMlnQxR0qMkGyNkPavfe1r4KMKXaIB7amYkVqzmX86jfAcYhEjUoYIX8CqvZZddlkugSVukKmdVzzzWxaJQ5+owrxUI0vVYrFgL5ekitcUX8kiZsW1NEZF9vzV2ToZaSD3+IVazgYy4ltZyPb2Mr1WceZrTTk8SiHLkksu6WlSE/yegVdBSLt77CuuuKLX4MzAkctXMgYe2+G8NmSxDbsndCX4gm/kMORMKEwwQ6KpZhAqpXCYfbUsuEpjwPCDVc/Sq2eVqRkYT9R08803cxgHDMYvhQOvs846+E7kBlMtGGUrLMYsDkz1kzpUpBLChWoVr6h4mj8vv/zyZGZwBttWlrgL3iskNj8TNbTo++1vf/vee+9Ni741EuU1DokKfkpjtWJEfeKlxlWwNx7yjB0UacoK3EQDP5b6ZtGcVsfddtuNXOjapDFNxXkw58kiLRmIDzJKqNJKXTt+1PGWFqR1Zzp0GzwFUqXWUn2tdTQW7BV7VvLYphbSiTgmeygCFu+8WhCChJlicxgD2GvUmOz3wQ9+UAt+ggRR4RKYpCqW6bj33nvDKeB+61vfOuOMM6IcIth6hBM8zUAVbvMsygFGdlwhIcfmNem36MQngRt152omYqeffnpMUXuKqTu00SSVouUpE9aGLEJ5hge0zPtqHOaTdiUiN9XMa5Qnj6xCt5YV9sBNdug2NlxgVLCkVFt6Vw+LnuTxFJDQLo3xyYTMptmPMICBidCWWLXCSbQQGwyFAjYMjMPqhQuFSkbXjCuoRFAbV/kUljJ711FjYPI1z+DEm2mYwTCo2tGqwqDoFYcRLSdhVtRSqB3XnHVfccUVYEgKxpxKPRRzTiNva6mh1TLcIvbbtAs/w+rb2DGTQPOO4IkaZZ6wLSzyIoJEY056yJXNqhrdjEtNRWC06CKN
q8PmmWSlAieX81WYYNxa2hfTKKUpTNgz+riFsPYPGPLV/FlSZWMw4FZFPgDcOGQAHClzTnoI/005hDZ2CL6NZnyFBLBpXSFXs4G8msUIcJ7h0LNnZaqv8gEx2wzHdohAaMxsx2+//fb2QqRB2YbMykg4y3gYe0tKOPm2nQmBPCdv3BUV7oQBnOAntsXfDJhX1IOhykNa7IFRsdAen6+BFTzCpIMo2xumGPZpDJ4NNh1zrA1/8Ni/QUJ+CzbrMXt4hZ9CPV+9CjppDIeoR1EcxlpA0VcjkWF2NGJhKdBoiX0Ac6qZRWMwaLGCZYWezHerrbZq7Ii6+YJzWr7hq6M+sSZswJzb4KZOaREuiUxSLYTldeEwXz2LimQw+0BwCrLmqMJEbNT+XzKnWAY/DMSR0GT4iRMnYlhjwabiVYngk933gAPKaw0Mfqq2N2HyzOqik8DoosJgnEeYXefVXK/VkAGQ5+V27CHdnsPI1UYzIUcDZamCtyrz6mmRDLDkWVpqYKP3OtWBjahdEBZseaPCPjzFM2Yk/Fv8YCIiNeWGJK1KsZWozGUMokLoaMGpBvwFoUYUw0DpVb42pZ5GmHkFG9W3KZ/BZrQcIaDC5YRVkKw5dk/ewDgstRXkE5wk0p7kZiCrmANsciHiWNqZiEKlUYmYtuJMzgUpIcMJLcaMsW0kSCwfZDMVyQT/whaFOJFmcCGhxam7mYgAauamY45/SkcWDxIhgR8JkcgZkhbc8gHLBI4qLOpos127wsd4mumoRstadEmXT3niXIUUOWUJq0jAKdfpZXExfvx4yIFFRrGVk/skfCRL18YoJNz0UvSqUdSiBBUxlfI65cvkR7rYTlPyFTO5hVIbspAWak211Dkw+OlyCKaVZoJQLIjG6Kcp/zDgBJineo/LVAfuAeHIb0/CJlaVXEyn2lKt02NUWW0s9TL8soHYDLKVlttQafUpqOx4Sd1GCNFwgqgKV3//+99vxc558rNv4U+7osWc0MF1+JTzJVIhI6sJXpTwYchDwjqflSiZvevldBfyfJ0wYUKtY1MZccWLINExdB2/ueSESqxZI2O1xKUo9UYkNT0kf5qFwpn56k1TTsvjabFXocQNZLEmgZjsIZ1nRgfz4T+vVYBavdbd13Thh5JwkUJ7jdXgMa/BqlHwOnkYpjDTnsOCp1Ez6W7cmRYlwNmosdDt43MaB54i8v8fRQV57wqLlCXnuP0bQ6cOwxM1FfyFh9LSvhKdZlrlWAVwGy0j52tIxP60KFUS1ZagkuWYRdWBY0kylXZTO93t96iXFakkoFfQOiHnzElQWri616RQnGBDwT8f1m7TBSHtkMfNQrexoxZgOEyJIVKvE7JI5GlV5sYbsLiiPG9qPW7cONEk3T1rRa8gLHqQyeVeYFqgsrrRXUUxkREsTByEGAwQpIZt6K+FXGMX/Gi02LH9Llx6rRHySkDbGQIWeQGHVZXpcqgvQZpqRjsM9G9kM5rhRGNjCXBj+2i3TOPAo00MfsNsDGJtXSEXnfIcWi6htyuYIQnyRgdmH77KKg5gpSavll7FgUnHyhVrBFu49i3j5NpNOE2n9XXY7hk8Zte2HtAy740U/NC61w1nZqGX4pJZtaMW3duUpvYkavA9qaZNx1afwmojWh5i1dPGsiHUN91bIW/fHiVwJDe0nJIAromfV5P/Mu2vIhwKh001E2GH6MBVir2s99qBuysbFbMM2rcZZg2cZNVoZB0TbeXAjQjtJ/G97BSwmNVWW+0HP/iB1antJROwmJ0rfmBYoQWhe7yQJJPbwdpiiy28OpwEkHNXW6lZz3Nmy85axzYOk5QSDqc4zjTzi7QP61lDUsU/LDy9AR5JpKhxOEM48Owso8eFmth0t4jaNWVS1mYcw46ousOkbiFvj4cgyDkvtafiX89Ks7m96CajzOzkUIHB7w1sQY0bN841I7f2uLcVI4/1CQZPe/5LLLGEug0Y57pSNzxuiQlJcgsAedv9qmrHNqeO
1fPVHI3CUEooDkv/NSRV/NB2gLAwM6wKQpwTM61kj7/VuB06hz0TZFhSTxe4Dw48LOuZrgAAZGBWZcu365jbU2dPvNcUmjfa3zbLlYGdgbl/ixO+J8GKJtZmiSmZ51dxsjZIfHWCoovixqJ9Y3dI/TNxlipAiErq1V7qjWZaA2j1ikSrT521dx1hKzYQah+aWzn2EDkcIlgr9vrV3qfrI90Ql3HzEJj8NJ8b5ApKLT90g05zHPyTPdn2lFT90IL3urWS3+5Y9DoE4mN2m++55x5enVNZgQbPKUkXuAVmi9VPLJCRBFwCF4l05MNaIs5rnSb/1bE5Q4PW0dFAFD523ZvRzOjFhq01Jx9WuitLlpqOkdiGtSjkcTyEVJz0xmbKVX6nRz65+MH9+K09FT9b5b06ZqOlsKc7SDD+HwUk7soDCLnsqPN2MErpMqj0WAPRv3HhvWP2GKkPU+gRBkqjSKFykc0eZxiuFvABN5alMo0dzy3bcOUKR5DHUROMPWVaVyMkSb5t/mau6zKGnzG7deTQ0qzYRWsZdcKECUwhvQoVUujilrVr4QKEnze6pBH8bmXI5+qBKV0GlR5rgOsaXFsYBsJ+RI+pD5Uc5maskrjo2gAJXZDI/5SSytLeXVmS0nM9QIyAvJZIG8nJn25uaXda1vh10DKjaCDmZLhd+7Wi6foJZbf0MOOtgaUyyhUR7RtJuTJYrrPSSC3LDTWGtYaTeNFyRGQT2LV+u81W3QY1PXzCQCkmvT4BkHV5viQMzNfW6Cf/SBMkGM8CRpBCojQOKr3UQLzLuLij4lqru4Nu6RqjrhtYF4TqViToC56clCLNE0aJAZi5k8Hz0zxRw2Wv+G1TcubS/gFAdbXcFGzQOMY1YMTtR/h/YJZFthjVGcDY5HlWbHUhDPQDBS1nxVsqo82Febud5/woZ7RpDfD3VwPCtEVQTgT6y0l76jOwAxNM9OnNrAYhYSKHOu0VKlQPBaw9ksHXsaMBntzqhHksMDljO3CPNdizeNFjuQbkmmogk9PeZIimDAylceDAQ9HSAGaggTGqgRlvF3qMKnLA1kAD/dDAwIH7ofUBzYEGuqSBgQN3SZEDNAMN9EMDAwfuh9YHNAca6JIGBg7cJUUO0Aw00A8NDBy4H1of0BxooEsaGDhwlxQ5QDPQQD80MHDgfmh9QHOggS5pYODAXVLkAM1AA/3QwP8AGMg7qICuIqsAAAAASUVORK5CYII=\n", + "image/jpeg": 
"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAyAUADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iq1/f22mWE17eSiK2gUvJIQSFUdScdhWYPF+hGGxmW/Vkv8/ZCI3PnY5O35eeOfpzQBuUVk6DrsevJfywQukNrey2iyMQRN5ZAZl9t24f8AATWtQAUUVkxeJdJm1dtLiu99yHaI7Y2KeYo3GPzMbd4GSVzkAHjigDWorOTXdMk1afSlvIzqEEXnPb87wmcbgMcjPcU7S9ZsNagefT7gTxI5jZgrABgcEcgcg8H0oAm1HULbStNudQvJBHbW0TSyueyqMmquhawdb0/7WdOv7A+YyGG+iEcnHfAJ4PY5rnPibIZtF0zRQpb+2NVtbNwoBPl797nBPI2oc9euO9dqOlAC0Vn3Gt6fbal/Z0s5+2fZ2uhCsbMxiU4LcA98DHXJqbT9RtNW06G/sJ0uLWdd0cqdGHrQBl2HiJ9U8SX+nWVkXs9Obybq9eTaPOKhvLRcHdgEbiSMZHWt6uO8Cf8AIQ8Y/wDYfl/9Ew12NABWZea5bWWsWOmSRXJmvHKRyLCfKB2O+C/TOI24GT045rTrmfEssq634eaOyvJ0tr1p5nggLqiGCVMkj/adeBz3oA6aijtRQAUVU1LU7PSLI3d7N5cQZUGFLFmYgKqqASxJIAAGTTNL1ey1iGWSzkZvJlMMqSRtG8bjBKsrAEHBB5HIIPQ0AXqKr3t9a6db/aLydIIdyoXkOFBY4GT25IrCsvFE9341vNEGm3YtYbaKVLrygEJYyZbdu5VtoC4HUNntQB0tFYq+K9GbUxp4uyZmna2DeS/lGYDJj8zGzfgdM57da2qACiq1rqFneyTx21zFLJA5SVFYFkYEjBHUdDVnNABRVe+vbfTbKa8u5PLt4V3SPgnaO5OO1UJfE2jwaRbarJeotjdMqwTbWxIW+7jjJz29aANeiqP9rWraqmmozvctB9oZVU/u484Bb0ycgDqcH0NQw+INPutKudRs5HuYLZnWURRsXVk+8u0gHcPTrQBqUVVTUbOXS11KO4R7JofPWZTlTHjduHtjmud1Lxi1v4k0PT7GymvrTUN+65t0EiDHAw24D5Tkt146c0AdZWBa+JH/AOEsn8PahYm1nMTXFlMsm+O6iBAbHAKupPK88cgkVrTajZ297DZzXMcdxOCYo3YAvggHGevUVy2u/wDJUfCP/XpqP8oaAOyoooo
GTT7PxHH/AGlYW8xIcNHiOTcP4XZfLk29geea7L+woRr0+ppJhbqFYrq3ZAyTFD+7fnowBI9wRnoKuX1o13bOkUvkTlGWO4VAzRZGCVz3oA8xtLzTVh8PaPqF7bQ6FLf6kwWZ8RzmG5KwQ7jwV+bdg9fLA56V302gQnVdHu7XyYILAzfuY4wAwkXHGOBzz+NJP4V02bwqnh1IxHYpEsSgxpKcD1DqwJPOSRnknrzWjpunwaVpdpp1tu8i1hSCPccnaoAGT34FAHI+BZdL1wXer3Elpc6691J9oU4aS0COyRxhT8yAKB6ZJLc5pPHbNBrvhq50wtJ4gWeWOztSP3c8TKPOEhyNqgBTuGSCBgHOK3j4XsG8WJ4jYE3qQtCmERQAcZywUM3T+IkDJxWRrv8AyVLwj/156h/KGgDr5ZkggeaVgkaKWZj0AAyTUOn6haarYQ31jOk9rOu+OVOjD1FWaOlABRRRQAUUUUAFFFFABRRRQAUUUUAFZ+uWF3qej3FpY6lLp104BiuolDGNgQeh4IOMEdwT0rQooAyfD2hroOnyQG5e6uJ55Lm5uHUKZZXbJOBwAOAB2AArWoooAKKKKAOb0nQr/RfE2qzwSwS6Tqk32t0clZYJ9qq2OCGVgoPJG3Henz+BvDNxrY1qXRrWTUhKs4uGBLb1xg9e2B+VdDRQBzWl+C9MtGt7u7hFzqEchuGkZ28vz2JLSCPO0Nkn5tueBXSModCpzgjBwcUtFAGXD4c0iHSJtJWxiawmLNJbyZdGLHLZDE9Tz9ea1KKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigArmYdB1C78bHXtUlgFvZwyW2nW0JLFVcqXldiB8x2gbRwAOpNdNRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAH/2Q==\n" + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "dataset[2][\"image\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "lXjfJr4W6z8P", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "cb5eb325-8bd4-46f1-e3a9-f565d5bd289a" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'H ^ { \\\\prime } = \\\\beta N \\\\int d \\\\lambda \\\\biggl \\\\{ \\\\frac { 1 } { 2 \\\\beta ^ { 2 } N ^ { 2 } } \\\\partial _ { \\\\lambda } \\\\zeta ^ { \\\\dagger } \\\\partial _ { \\\\lambda } \\\\zeta + V ( \\\\lambda ) \\\\zeta ^ { \\\\dagger } \\\\zeta \\\\biggr \\\\} \\\\ .'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "dataset[2][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKHxfZua1CrS" + }, + "source": [ + "We 
can also render LaTeX directly in the browser!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "nPopsxAC1CrS", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "df37c129-7a34-43ce-962f-183f266bdb80" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/latex": "$\\displaystyle \\sigma ^ { \\mu } \\frac { \\lambda ^ { a } } { 2 } A _ { \\mu } ^ { a } .$" + }, + "metadata": {} + } + ], + "source": [ + "from IPython.display import display, Math, Latex\n", + "\n", + "latex = dataset[3][\"text\"]\n", + "display(Math(latex))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K9CBpiISFa6C" + }, + "source": [ + "Every vision fine-tuning sample should be formatted as a user turn (instruction plus image) followed by an assistant turn (the target text):\n", + "\n", + "```python\n", + "[\n", + "    {\n", + "        \"role\": \"user\",\n", + "        \"content\": [\n", + "            {\"type\": \"text\", \"text\": instruction},\n", + "            {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + "        ],\n", + "    },\n", + "    {\n", + "        \"role\": \"assistant\",\n", + "        \"content\": [\n", + "            {\"type\": \"text\", \"text\": sample[\"text\"]},\n", + "        ],\n", + "    },\n", + "]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "oPXzJZzHEgXe" + }, + "outputs": [], + "source": [ + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "def convert_to_conversation(sample):\n", + "    conversation = [\n", + "        {\n", + "            \"role\": \"user\",\n", + "            \"content\": [\n", + "                {\"type\": \"text\", \"text\": instruction},\n", + "                {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + "            ],\n", + "        },\n", + "        {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": sample[\"text\"]}]},\n", + "    ]\n", + "    return {\"messages\": conversation}\n", + "pass" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id":
"FY-9u-OD6_gE" + }, + "source": [ + "Let's convert the dataset into the \"correct\" format for finetuning:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "gFW2qXIr7Ezy" + }, + "outputs": [], + "source": [ + "converted_dataset = [convert_to_conversation(sample) for sample in dataset]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "The first example is now structured like below:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "gGFzmplrEy9I", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "df44e584-c376-4d2d-c483-4285428edc37" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'messages': [{'role': 'user',\n", + " 'content': [{'type': 'text',\n", + " 'text': 'Write the LaTeX representation for this image.'},\n", + " {'type': 'image',\n", + " 'image': }]},\n", + " {'role': 'assistant',\n", + " 'content': [{'type': 'text',\n", + " 'text': '{ \\\\frac { N } { M } } \\\\in { \\\\bf Z } , { \\\\frac { M } { P } } \\\\in { \\\\bf Z } , { \\\\frac { P } { Q } } \\\\in { \\\\bf Z }'}]}]}" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "converted_dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MsRPBIb0JJ6c" + }, + "source": [ + "Lets take the Gemma 4 instruction chat template and use it in our base model" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "exoDVEvmJN-6" + }, + "outputs": [], + "source": [ + "from unsloth import get_chat_template\n", + "\n", + "processor = get_chat_template(\n", + " processor,\n", + " \"gemma-4-thinking\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FecKS-dA82f5" + }, + "source": [ + "Before fine-tuning, let us evaluate the base model's performance. 
We do not expect strong results, as it has not encountered this chat template before." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "vcat4UxA81vr", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d5089c79-8359-4dbe-bd66-2b1253310113" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The LaTeX code for the equation in the image is:\n", + "\n", + "```latex\n", + "H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\zeta^\\dagger \\partial_\\lambda \\zeta + \\mathcal{V}(\\lambda) \\zeta^\\dagger \\zeta \\right\\} .\n", + "```\n", + "\n", + "**Rendered Equation:**\n", + "$$H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\zeta^\\dagger \\partial_\\\n" + ] + } + ], + "source": [ + "image = dataset[2][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FeAiMlQ71CrS" + }, + "source": [ + "You can see it's absolutely terrible! 
It doesn't follow instructions at all" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We run only 60 steps to keep this demo fast; for a full run, set `num_train_epochs=1` and `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!\n", + "\n", + "We use our new `UnslothVisionDataCollator`, which helps with the vision fine-tuning setup." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "95_Nn-89DhsL", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1bedc6bd-3573-43d6-b930-46f1312d0870" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = converted_dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 4,\n", + " max_grad_norm = 0.3,\n", + " warmup_ratio = 0.03,\n", + " max_steps = 60,\n", + " # num_train_epochs = 2, # Set this instead of max_steps for full training runs\n", + " learning_rate = 2e-4,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\", # For Weights and Biases or others\n", + "\n", + " # You MUST put 
the below items for vision finetuning:\n", + " remove_unused_columns = False,\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 2048,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "30eaeb52-6380-4bef-fe5f-5fab21c3c0d2" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.494 GB.\n", + "19.482 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "9842dd7e-0b1f-4f83-de92-982c021bc768" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 68,686 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 266,963,456 of 31,540,049,968 (0.85% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 06:15, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
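| Step | Training Loss |
|------|---------------|
| 1 | 3.049649 |
| 2 | 3.714972 |
| 3 | 3.310120 |
| 4 | 3.078652 |
| 5 | 2.219565 |
| 6 | 1.768622 |
| 7 | 1.029011 |
| 8 | 0.927911 |
| 9 | 0.854767 |
| 10 | 0.424313 |
| 11 | 0.466835 |
| 12 | 0.503272 |
| 13 | 0.467726 |
| 14 | 0.267040 |
| 15 | 0.208669 |
| 16 | 0.212328 |
| 17 | 0.198571 |
| 18 | 0.158870 |
| 19 | 0.196536 |
| 20 | 0.138079 |
| 21 | 0.152322 |
| 22 | 0.187750 |
| 23 | 0.286993 |
| 24 | 0.156409 |
| 25 | 0.143032 |
| 26 | 0.183632 |
| 27 | 0.215106 |
| 28 | 0.153457 |
| 29 | 0.236711 |
| 30 | 0.231810 |
| 31 | 0.184119 |
| 32 | 0.195166 |
| 33 | 0.162600 |
| 34 | 0.160010 |
| 35 | 0.181705 |
| 36 | 0.173868 |
| 37 | 0.118509 |
| 38 | 0.167147 |
| 39 | 0.144002 |
| 40 | 0.210267 |
| 41 | 0.188537 |
| 42 | 0.131262 |
| 43 | 0.186414 |
| 44 | 0.167026 |
| 45 | 0.114353 |
| 46 | 0.124952 |
| 47 | 0.200051 |
| 48 | 0.136990 |
| 49 | 0.224131 |
| 50 | 0.128734 |
| 51 | 0.124725 |
| 52 | 0.211169 |
| 53 | 0.180625 |
| 54 | 0.172517 |
| 55 | 0.232271 |
| 56 | 0.448664 |
| 57 | 0.129194 |
| 58 | 0.109717 |
| 59 | 0.223892 |
| 60 | 0.146485 |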

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "dc83f244-231c-483d-83d0-c2d0ebe620f4" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "410.1801 seconds used for training.\n", + "6.84 minutes used for training.\n", + "Peak reserved memory = 21.363 GB.\n", + "Peak reserved memory for training = 1.881 GB.\n", + "Peak reserved memory % of max memory = 54.092 %.\n", + "Peak reserved memory for training % of max memory = 4.763 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model! You can modify the instruction and input—just leave the output blank.\n", + "\n", + "We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "kR3gIAX-SM2q", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "46954df8-cab3-4a85-f4ad-176d7251c998" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "\\left[ \\left[ B _ { n } ^ { + } , b _ { 2 } ^ { - } \\right] , b _ { 2 } ^ { + } \\right] = n B _ { n } ^ { + } , \\quad \\left[ \\left[ B _ { n } ^ { - } , b _ { 2 } ^ { + } \\right] , b _ { 2 } ^ { - } \\right] = n B _ { n } ^ { - } .\n" + ] + } + ], + "source": [ + "image = dataset[10][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "upcOlWe7A1vc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4a815ae8-22d9-47ee-ad27-aee38c2b10e9" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "To load the LoRA adapters we just saved for inference, change `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "MKX_XKs_BNZR", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1ab27847-31a6-4565-d766-485d4c548dab" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<|channel>thought\n", + "The image contains a mathematical equation written in LaTeX notation:\n", + "\n", + "$$ D _ { \\mu } ^ { \\alpha \\beta } \\bar { A } _ { \\mu } ^ { \\alpha \\beta } = 0 , $$\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastVisionModel\n", + "\n", + " model, processor = FastVisionModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " load_in_4bit = True, # Set to False for 16bit LoRA\n", + " )\n", + "\n", + "sample = dataset[1]\n", + "image = sample[\"image\"].convert(\"RGB\")\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": sample[\"text\"],\n", + " },\n", + " {\n", + " \"type\": 
\"image\",\n", + " },\n", + " ],\n", + " },\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True)\n", + "_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account. You can create a personal access token at https://huggingface.co/settings/tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "# Select ONLY one of the options below; you do not need both.\n", + "\n", + "# Save locally to 16bit\n", + "if False: model.save_pretrained_merged(\"unsloth_finetune\", processor,)\n", + "\n", + "# To export and save to your Hugging Face account\n", + "if False: model.push_to_hub_merged(\"YOUR_USERNAME/unsloth_finetune\", processor, token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TSjNVDCYv-yr" + }, + "source": [ + "And we're done! If you have any questions about Unsloth, ask on our [Discord](https://discord.gg/unsloth) channel. Join it as well if you find bugs, want to keep up with the latest LLM news, need help, or want to join projects.\n", + "\n", + "Some other resources:\n", + "1. 
Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
Join Discord if you need help + ⭐️ Star us on GitHub ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "cca48201ab524349b98bd7dc3ec0a371": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a74922ad802c4cfe96d3f7c9d0215877", + "IPY_MODEL_30e429603fbb48469d80e5ca48305216", + "IPY_MODEL_2bc496ecfb4e410ba3ba46451dd38cac" + ], + "layout": "IPY_MODEL_ddb467daf231485199b6820c1c87be59" + } + }, + "a74922ad802c4cfe96d3f7c9d0215877": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6c30ba7b3fc84a63b7b8975e8b17991c", + "placeholder": "​", + "style": "IPY_MODEL_3b10e74e5c6b421092b314f71f7e9217", + "value": 
"model.safetensors.index.json: " + } + }, + "30e429603fbb48469d80e5ca48305216": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d2bc2f8ade924beca4c2db847efa6f5d", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_eed52dc4a67d4253983ee7b20918152d", + "value": 1 + } + }, + "2bc496ecfb4e410ba3ba46451dd38cac": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_642d63ab9f7c451fa96a0633c1a7f23c", + "placeholder": "​", + "style": "IPY_MODEL_72cbdfb8889a4da0bcc0a4e5d5a67ac2", + "value": " 120k/? 
[00:00<00:00, 11.1MB/s]" + } + }, + "ddb467daf231485199b6820c1c87be59": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6c30ba7b3fc84a63b7b8975e8b17991c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b10e74e5c6b421092b314f71f7e9217": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d2bc2f8ade924beca4c2db847efa6f5d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "eed52dc4a67d4253983ee7b20918152d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "642d63ab9f7c451fa96a0633c1a7f23c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"72cbdfb8889a4da0bcc0a4e5d5a67ac2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "26121d7e6555452ab0f5ac2a6681d864": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e239143f8ec0425db6135a5d749c2c53", + "IPY_MODEL_b86cb0f30f674656aff10b0701f9f26f", + "IPY_MODEL_a7cf44db72fc47b8b8f7a084b7585656" + ], + "layout": "IPY_MODEL_acbd99a66aa846d78396dcf40ebc8bd7" + } + }, + "e239143f8ec0425db6135a5d749c2c53": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_35ae415e878e486fa0caad09b8408067", + "placeholder": "​", + "style": "IPY_MODEL_0b4dda8099954b0d8851c05e24ed0e35", + "value": "Download complete: 100%" + } + }, + "b86cb0f30f674656aff10b0701f9f26f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cbd50ad058134a36bfcd386deac87fb8", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1d572d34d67b4d6b832203a27f44f93c", + "value": 1 + } + }, + "a7cf44db72fc47b8b8f7a084b7585656": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b988bc006f284214883cba413dfb5fc3", + "placeholder": "​", + "style": "IPY_MODEL_1b43696ca662419ea0c970b05d89441d", + "value": " 62.5G/62.5G [02:48<00:00, 396MB/s]" + } + }, + "acbd99a66aa846d78396dcf40ebc8bd7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "35ae415e878e486fa0caad09b8408067": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0b4dda8099954b0d8851c05e24ed0e35": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cbd50ad058134a36bfcd386deac87fb8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "1d572d34d67b4d6b832203a27f44f93c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b988bc006f284214883cba413dfb5fc3": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1b43696ca662419ea0c970b05d89441d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "69394dfeff2a4286bb7088f599bf4c0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a4e7a791f92449978900effc00e5100a", + "IPY_MODEL_49e5b9f4cea744fc8f38a8adabc0bd42", + "IPY_MODEL_08785606a9034297aa16181720c51142" + ], + "layout": "IPY_MODEL_e6e357b71bb8474d82d98ce813aa2183" + } + }, + "a4e7a791f92449978900effc00e5100a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a36232c6617141a1b0fdfe0108550ae0", + "placeholder": "​", + "style": "IPY_MODEL_7563a57897d846f6ba41007b0c92135e", + "value": "Fetching 2 files: 100%" + } + }, + "49e5b9f4cea744fc8f38a8adabc0bd42": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_da65718025e64f83a198f99f8fa48c3b", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_96e509dc48aa456da20260357e60677b", + "value": 2 + } + }, + "08785606a9034297aa16181720c51142": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_63f6774afc314b52b1ba82f5335d67a3", + "placeholder": "​", + "style": "IPY_MODEL_b4357ad98d3b4113ae9dab45333b8ee4", + "value": " 2/2 [02:48<00:00, 168.12s/it]" + } + }, + "e6e357b71bb8474d82d98ce813aa2183": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a36232c6617141a1b0fdfe0108550ae0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7563a57897d846f6ba41007b0c92135e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "da65718025e64f83a198f99f8fa48c3b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "96e509dc48aa456da20260357e60677b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "63f6774afc314b52b1ba82f5335d67a3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b4357ad98d3b4113ae9dab45333b8ee4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b7eddc272b2f4b0db9d910b1fa53bae1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c25094678b464bd0b693dcfe333da14f", + "IPY_MODEL_5332bd2cc65145c49a843bc835d51373", + "IPY_MODEL_903d6cf82ea44919a1739b2fbae8f16a" + ], + "layout": "IPY_MODEL_5382cb936c2b46d59e1819622ee8ef39" + } + }, + "c25094678b464bd0b693dcfe333da14f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_bc3b830bad5c4d78b64406b4f19cc370", + "placeholder": "​", + "style": "IPY_MODEL_09a61bb08189489687754346b906b875", + "value": "Loading weights: 100%" + } + }, + "5332bd2cc65145c49a843bc835d51373": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_91bb3728a52042aeb9947caf54dda34e", + "max": 1188, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1aadf052247c44d8bb0081cd68234fa8", + "value": 1188 + } + }, + "903d6cf82ea44919a1739b2fbae8f16a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1e5f5a2acd9b47ed972c98ef4428f03d", + "placeholder": "​", + "style": "IPY_MODEL_31cbc371336f456795651381a7e3ad0a", + "value": " 1188/1188 [02:43<00:00, 87.31it/s]" + } + }, + "5382cb936c2b46d59e1819622ee8ef39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bc3b830bad5c4d78b64406b4f19cc370": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": 
null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "09a61bb08189489687754346b906b875": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "91bb3728a52042aeb9947caf54dda34e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1aadf052247c44d8bb0081cd68234fa8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1e5f5a2acd9b47ed972c98ef4428f03d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "31cbc371336f456795651381a7e3ad0a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + 
}, + "0dbb262cf30643659120f8cdb5b3171c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_abd6e0b4b3f24f1cb818aab793d61477", + "IPY_MODEL_b30a2b86573844d5bf4a6f43e5bd8823", + "IPY_MODEL_6c98eca1409f4e1ba2e7865b2992f51e" + ], + "layout": "IPY_MODEL_bceaf428e3fc4bcea2318512a7c6ef4b" + } + }, + "abd6e0b4b3f24f1cb818aab793d61477": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5cdef7c8d58c441daaaddb9f34bc1a7d", + "placeholder": "​", + "style": "IPY_MODEL_5c4901ca1dd44ee69bb5bb76b1cb3fb4", + "value": "generation_config.json: 100%" + } + }, + "b30a2b86573844d5bf4a6f43e5bd8823": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_25768d586f0f4d5cb9c86d01b13e7fa1", + "max": 208, + "min": 0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_f962bb7a63094308b1ec98cd84c90637", + "value": 208 + } + }, + "6c98eca1409f4e1ba2e7865b2992f51e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9a305de1de4a4fb797d14e996e802ca0", + "placeholder": "​", + "style": "IPY_MODEL_29a723a6b8b7436faa14620beff25597", + "value": " 208/208 [00:00<00:00, 25.4kB/s]" + } + }, + "bceaf428e3fc4bcea2318512a7c6ef4b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "5cdef7c8d58c441daaaddb9f34bc1a7d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5c4901ca1dd44ee69bb5bb76b1cb3fb4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "25768d586f0f4d5cb9c86d01b13e7fa1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f962bb7a63094308b1ec98cd84c90637": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9a305de1de4a4fb797d14e996e802ca0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29a723a6b8b7436faa14620beff25597": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "87fa7a1186c04503994d3ca6781f2fa2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_70dd1d3746a14a5cac5fb65edb8ac231", + "IPY_MODEL_6144783cd98542e3ac0488d1f1c92c35", + "IPY_MODEL_d605282333054e82a1d8507be830f02f" + ], + "layout": "IPY_MODEL_dfbb2f74208243238c1dc88dcc4d45e7" + } + }, + "70dd1d3746a14a5cac5fb65edb8ac231": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_555c3a2ab47c4cb4b872fd067c15c600", + "placeholder": "​", + "style": "IPY_MODEL_e60657aaf58d4373a3e777352ab720c3", + "value": "processor_config.json: " + } + }, + "6144783cd98542e3ac0488d1f1c92c35": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7c9085eed1be41b8b6d61f5eef968825", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0357ffa5f3224994a2917f7f09affff3", + "value": 1 + } + }, + "d605282333054e82a1d8507be830f02f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_062b5afb6d9f492aa62fd5547fc6d5eb", + "placeholder": "​", + "style": "IPY_MODEL_6e53b93ebdf045e9b94eedd1bd489b3d", + "value": " 1.69k/? 
[00:00<00:00, 184kB/s]" + } + }, + "dfbb2f74208243238c1dc88dcc4d45e7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "555c3a2ab47c4cb4b872fd067c15c600": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e60657aaf58d4373a3e777352ab720c3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7c9085eed1be41b8b6d61f5eef968825": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "0357ffa5f3224994a2917f7f09affff3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "062b5afb6d9f492aa62fd5547fc6d5eb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"6e53b93ebdf045e9b94eedd1bd489b3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d2a7e731aa05474299243a698d8f8e76": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9c8ecc577ca6479aad38d1af6c783685", + "IPY_MODEL_5cd3447374284aa6a9d62e5a72f5ff3c", + "IPY_MODEL_b49a335de14944a5afb36a8ec263c6c8" + ], + "layout": "IPY_MODEL_94e9455af5d94b089a5ceb122446b782" + } + }, + "9c8ecc577ca6479aad38d1af6c783685": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3aae33b4c0ef45e597513265bc4b0681", + "placeholder": "​", + "style": "IPY_MODEL_2efb2666390a482b913a30109a7e2472", + "value": "chat_template.jinja: " + } + }, + "5cd3447374284aa6a9d62e5a72f5ff3c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0ce0b6f052db48c28f7788e95df7c1f3", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_01917cd89aef4db0aa18675636f4e981", + "value": 1 + } + }, + "b49a335de14944a5afb36a8ec263c6c8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_58ee20c84f774dceaa69985bd3c63afa", + "placeholder": "​", + "style": "IPY_MODEL_9d8eb917864e4be1b788f9cfcd64e182", + "value": " 12.0k/? 
[00:00<00:00, 1.44MB/s]" + } + }, + "94e9455af5d94b089a5ceb122446b782": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3aae33b4c0ef45e597513265bc4b0681": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2efb2666390a482b913a30109a7e2472": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0ce0b6f052db48c28f7788e95df7c1f3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "01917cd89aef4db0aa18675636f4e981": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "58ee20c84f774dceaa69985bd3c63afa": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"9d8eb917864e4be1b788f9cfcd64e182": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a955958aff0444a4839aafca8566065c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_867f014d40bd409fa6c845436b050fd3", + "IPY_MODEL_dd08aebf4c9b44b692534426ea948cdb", + "IPY_MODEL_6e0c925129ea45b6bce91769e084e9f3" + ], + "layout": "IPY_MODEL_e3a3e391d3114a94985fd3db6cc5ee1a" + } + }, + "867f014d40bd409fa6c845436b050fd3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c224a759bea64ffbbf3b645a59cd79c5", + "placeholder": "​", + "style": "IPY_MODEL_e696b055c6934f1186078767877ff2a2", + "value": "tokenizer_config.json: " + } + }, + "dd08aebf4c9b44b692534426ea948cdb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_319292819e264a19a27685601c70f9b6", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_bb1795785a6a44ec9b17b31b1075cbb3", + "value": 1 + } + }, + "6e0c925129ea45b6bce91769e084e9f3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5536ee38467e4557bbb1209cb7b4e424", + "placeholder": "​", + "style": "IPY_MODEL_b9c49626bcbb4e8bb6d4a01a08fab4dd", + "value": " 15.0k/? 
[00:00<00:00, 1.62MB/s]" + } + }, + "e3a3e391d3114a94985fd3db6cc5ee1a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c224a759bea64ffbbf3b645a59cd79c5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e696b055c6934f1186078767877ff2a2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "319292819e264a19a27685601c70f9b6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "bb1795785a6a44ec9b17b31b1075cbb3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "5536ee38467e4557bbb1209cb7b4e424": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"b9c49626bcbb4e8bb6d4a01a08fab4dd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4cfc78a661dd4837b26a78d7dba376b0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ee345697fcfc4468a50627bcff074655", + "IPY_MODEL_3ed7a054b0104bd09132227f1ed413ba", + "IPY_MODEL_59667e10734c42218476077f7741da15" + ], + "layout": "IPY_MODEL_e1d2d9a63d3f43dca2a8e097470d2c76" + } + }, + "ee345697fcfc4468a50627bcff074655": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a743692b1eae456cad70ecf0d3752493", + "placeholder": "​", + "style": "IPY_MODEL_d6ad093be12e4a3e9b7c75160aeb2fb9", + "value": "tokenizer.json: 100%" + } + }, + "3ed7a054b0104bd09132227f1ed413ba": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_746d3b1ccba940e69fc9bf50eb1986c8", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_580577e9f9a5425f9ca10479f42b3d16", + "value": 32169626 + } + }, + "59667e10734c42218476077f7741da15": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_27b4eafed4584bcda6f88f1fb82f9212", + "placeholder": "​", + "style": "IPY_MODEL_d6caa192c711497b856f18df2f5d9a62", + "value": " 32.2M/32.2M [00:00<00:00, 87.6MB/s]" + } + }, + "e1d2d9a63d3f43dca2a8e097470d2c76": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a743692b1eae456cad70ecf0d3752493": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d6ad093be12e4a3e9b7c75160aeb2fb9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "746d3b1ccba940e69fc9bf50eb1986c8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "580577e9f9a5425f9ca10479f42b3d16": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"27b4eafed4584bcda6f88f1fb82f9212": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d6caa192c711497b856f18df2f5d9a62": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "19f9e04dafa744d0bbafe658a8cd4479": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f261579b58c94894a6c186255415e54b", + "IPY_MODEL_d36d30757ab34602ab6411c282171802", + "IPY_MODEL_0ac1ef374ce34494a9ade7b68c1fcb7a" + ], + "layout": "IPY_MODEL_588e76027ce9499e8e7683f00c8bd999" + } + }, + "f261579b58c94894a6c186255415e54b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_23bb3393ad0342aaa993c5109af01cbb", + "placeholder": "​", + "style": "IPY_MODEL_1b9c16155dec43db992c6140ea535912", + "value": "README.md: 100%" + } + }, + "d36d30757ab34602ab6411c282171802": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5b938c6422c64286aed0276899c260b3", + "max": 519, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_74fce0116e3b4cf4abde0af7563ebccb", + "value": 519 + } + }, + "0ac1ef374ce34494a9ade7b68c1fcb7a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a67007d24e9b4a11a35f955db150d486", + "placeholder": "​", + "style": "IPY_MODEL_f91df33daa9348e7abede2d855dc0ecb", + "value": " 519/519 [00:00<00:00, 60.7kB/s]" + } + }, + "588e76027ce9499e8e7683f00c8bd999": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "23bb3393ad0342aaa993c5109af01cbb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1b9c16155dec43db992c6140ea535912": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5b938c6422c64286aed0276899c260b3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "74fce0116e3b4cf4abde0af7563ebccb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a67007d24e9b4a11a35f955db150d486": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f91df33daa9348e7abede2d855dc0ecb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "717b21eed2084ebe9ddcf9a0da8c1c7f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1e690555c06540359b89b87c6ae62a9c", + "IPY_MODEL_bddd2d47d2bd40fb9d68be94ebacc0d7", + "IPY_MODEL_b875ef0b812d44cd8e1fd102b597149b" + ], + "layout": "IPY_MODEL_91c2f197fcf94b02bbbb780cced6c0de" + } + }, + "1e690555c06540359b89b87c6ae62a9c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8cd8dc2aafd04685b8cb3fb746f08005", + "placeholder": "​", + "style": "IPY_MODEL_f30f8e430cad427d9086bc99f6f146be", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "bddd2d47d2bd40fb9d68be94ebacc0d7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c048a0b91a5a4bf7803fb3d5ba994acd", + "max": 343805431, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6de4762ccf1a42e6b51f18880b37cd50", + "value": 343805431 + } + }, + "b875ef0b812d44cd8e1fd102b597149b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4d0707913c734e899854b548bc3364a7", + "placeholder": "​", + "style": "IPY_MODEL_bd20452eab634993b4a716a1cdd9a4ea", + "value": " 344M/344M [00:02<00:00, 397MB/s]" + } + }, + "91c2f197fcf94b02bbbb780cced6c0de": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8cd8dc2aafd04685b8cb3fb746f08005": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f30f8e430cad427d9086bc99f6f146be": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c048a0b91a5a4bf7803fb3d5ba994acd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6de4762ccf1a42e6b51f18880b37cd50": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4d0707913c734e899854b548bc3364a7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bd20452eab634993b4a716a1cdd9a4ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ae951bf9259b4054a23534387dc83f8d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_647e5bcc9d3a4cf4a1e9889cd805c45e", + "IPY_MODEL_a71f8702013a453c894debaa8640a215", + "IPY_MODEL_04ff990ba34c43d99e872fec5f39f0a7" + ], + "layout": "IPY_MODEL_1f722501e2894a22b83ed42f482cdec7" + } + }, + "647e5bcc9d3a4cf4a1e9889cd805c45e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f28bf1e32d7542479fe827a3723089c9", + "placeholder": "​", + "style": "IPY_MODEL_4c298b0109bb4734a876c69ef32e5f6b", + "value": "data/test-00000-of-00001.parquet: 100%" + } + }, + "a71f8702013a453c894debaa8640a215": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_51da57c93dd549b6981874d40f26049e", + "max": 38205016, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_282c398a45784093a51432726ec260c5", + "value": 38205016 + } + }, + "04ff990ba34c43d99e872fec5f39f0a7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b4d4af7fd215445282451eadc69f9bd5", + "placeholder": "​", + "style": "IPY_MODEL_b9825625d8f54c188698e0a8d5a03e0b", + "value": " 38.2M/38.2M [00:00<00:00, 190MB/s]" + } + }, + "1f722501e2894a22b83ed42f482cdec7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f28bf1e32d7542479fe827a3723089c9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4c298b0109bb4734a876c69ef32e5f6b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "51da57c93dd549b6981874d40f26049e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "282c398a45784093a51432726ec260c5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b4d4af7fd215445282451eadc69f9bd5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b9825625d8f54c188698e0a8d5a03e0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a469f167acc94d0a93fb8bf94fe5c13a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4153276526fc459ea16f69f29f82d8dd", + "IPY_MODEL_2006a3d483f24f96b38021cb9a7e3af8", + "IPY_MODEL_77b8d2488e044518add72302d4875797" + ], + "layout": "IPY_MODEL_ebb50b363f5d4fd49c35937434858d4d" + } + }, + 
"4153276526fc459ea16f69f29f82d8dd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_22d14ea43aa6418786ff501f5e1e0cc1", + "placeholder": "​", + "style": "IPY_MODEL_1ca26cff3ae149d88a53bb1e165a299a", + "value": "Generating train split: 100%" + } + }, + "2006a3d483f24f96b38021cb9a7e3af8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_28eec1a723c9433baff472e352125935", + "max": 68686, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_15583a029d5e4a78a60f68c865d064b1", + "value": 68686 + } + }, + "77b8d2488e044518add72302d4875797": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_784a50c5430743d98d944aaed6c23479", + "placeholder": "​", + "style": 
"IPY_MODEL_81f8fcd62a8d4077830a3be2ed444cfb", + "value": " 68686/68686 [00:00<00:00, 123273.91 examples/s]" + } + }, + "ebb50b363f5d4fd49c35937434858d4d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22d14ea43aa6418786ff501f5e1e0cc1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1ca26cff3ae149d88a53bb1e165a299a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "28eec1a723c9433baff472e352125935": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "15583a029d5e4a78a60f68c865d064b1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "784a50c5430743d98d944aaed6c23479": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "81f8fcd62a8d4077830a3be2ed444cfb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e4a0c21c1bc341e589cce85b98ce2737": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a14ce85211314a9ba56598ef14968b7c", + "IPY_MODEL_116190e5f17946b88241a2b318522765", + "IPY_MODEL_299a8904e4834fe38d81c804e0669a22" + ], + "layout": "IPY_MODEL_0a161c324ea94abf955d4824199d79bf" + } + }, + "a14ce85211314a9ba56598ef14968b7c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_40b91e46da234c6bbef9adb542336f43", + "placeholder": "​", + "style": "IPY_MODEL_3a3ec79b99c8435ea208b9292b0d3bbb", + "value": "Generating test split: 100%" + } + }, + "116190e5f17946b88241a2b318522765": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9a55c6d9f4284fae92e4c4fd079c7905", + "max": 7632, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9ef11b00c45d436794e9acad12715ec9", + "value": 7632 + } + }, + "299a8904e4834fe38d81c804e0669a22": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6d20f7eb80cf4d54a1c34e3d185ee227", + "placeholder": "​", + "style": "IPY_MODEL_aa2ede79b4464737a0d504e403af5791", + "value": " 7632/7632 [00:00<00:00, 135563.13 examples/s]" + } + }, + "0a161c324ea94abf955d4824199d79bf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "40b91e46da234c6bbef9adb542336f43": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3a3ec79b99c8435ea208b9292b0d3bbb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9a55c6d9f4284fae92e4c4fd079c7905": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9ef11b00c45d436794e9acad12715ec9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"6d20f7eb80cf4d54a1c34e3d185ee227": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa2ede79b4464737a0d504e403af5791": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Audio.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Audio.ipynb new file mode 100644 index 0000000..c59e8d4 --- 
/dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Audio.ipynb @@ -0,0 +1,5745 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "4RQuEItXqjUN" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "STd3-4v9qjUO" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5r82a21yqjUP" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2pK4zPKsqjUP" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "csoGx5FzqjUQ" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "hhrWHOOFqjUQ" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 397, + "referenced_widgets": [ + "81fe811cdf2e40c7a020c6e1f097cd02", + "f6590b74c331415d8f0fcee104f97e58", + "31398a58983d41ed8d94842c07415e8c", + "f9e4ff751fc041e4b5be42b72e17b882", + "caea4099c7034f208fd13c63553ee560", + "83192d6972f94f9b9a15a739071788b4", + "4d6001fc4d6d4697a930679d653d27fb", + "3b9e9316b3ac4966ace4c5e1c4d67fef", + "f30c08ae79e44226b06b309a7ec669bf", + "c64809e43c084f7eb3e65a3ba91ee80f", + "fda067b1baf544eca44913a43c3f001b", + "45f04faa71274efa8317e471ab73df6a", + "13b6799823d64421a2ba3ef6c4961b39", + "7f5e953704d44cf7ba384302d214eb55", + "8976a2562ce148c38981cd0325eceeae", + "63024153abcd4f0e81cdff2c184ed280", + "db2f1572dd82426e9ba78ac3f4e28093", + "f2059b9eb34142fab10d1dae70057451", + "591782010deb4fa98f362ba97d42b9c3", + "72aa31f055ae42b08e2b6d6a8ac9ad70", + "f6d3ae159edc4736858f12c8783015ee", + "64aca0cc3a1e467090334c92643dfe7e", + "04d4131c40f14de2b114abeaea1d3826", + "68cc0f42113a4f27818d9cae45a94f12", + "2f65fe0251104228bdf8d75f2efd0654", + "669fcdcecc064750844f95e36a004ff8", + "01410e3b39df424aa026edb00683ce33", + "273b695a8b3d4634a22062ee99424813", + "a3c62f1549f84c30b1d0f84bedd9e5a4", + "b246f71f9b6648d09ef6928a7f0bbdd7", + "58cbb0c1da41492d9b46e5e83bfca52b", + "647449f158594e1a9b832e738bcae2af", + "e24bfd8d4e3348eba9569636f65a6b65", + "693e732ffcc44d8dbc47486387654633", + "b88e8cd127e943d0a684db44deed876f", + "a5b8bbbb241d48d78e1a3c46fef8bcc7", + "848c2eb0e73c4d1cbcb85f8446039006", + "a0eca54ca2d34b899bcef8ccfc6cb83c", + "b4a66cfed2e948aba00fd34091174cda", + "868be719d25c493188ee325aa0ea9f54", + "2c52a24ff42c4e52b053e72d6f07ea36", + "8de6249b9d684a22b8b60de8851bfae6", + "2f8c19159cc44fdbb5c9b7be4995e201", + "c622d436711b4e3d8f1939bbd8f85d00", 
+ "378f1be25c5b45eeb18a64fbfe0fb40c", + "de257ad63c5a4504832a924cc44fc485", + "b7ddcd119e1c42e8b880f021482f1b15", + "bfbcf756decf48e9989f5782adfefd9e", + "f839e2ad69694edabfa0b7a0f3b51041", + "87352f69737a40a58b026f1b283af684", + "9599429dce8a464ebe38e518669ee265", + "5ca635ba959a47b6860e5d94fb25cc8f", + "c39b230ccb804319888e8b57ed0b6634", + "2d7db58670d4478abab9e945eb503cf6", + "8c03e642a64b44499135db3eb2cccf8e", + "f4e6db57edf643719111255a7adf3407", + "904fac86bfbb4071b956a3dd03d95df3", + "bb9c7a2b598947de99356500943eb2e0", + "44727df2f31c4367a47f20308f9129d8", + "cc847310b49a496f953be34f1ba0a6ba", + "4f664b28437248548e0cd5e19a5f3fbe", + "e5fcaad65cc3426a939e53d288368eac", + "022e4158503b437b81d3004414ff32d2", + "315a245ef752448b801a33c5c01f74bc", + "2fdfd142d38b42b0852adc8c6992b63d", + "d332c968458f4e859f505b38c9c722fe", + "f80dd089c148441ca453d00ce380f91f", + "f6797c8912d841dca0a8adb6b273069e", + "b698bc4e75a446d3a27a9a206cb5d232", + "9cafc24fe0704dd6870d1c9d196bbbe8", + "2763f369d1344c5486b8c78e79073369", + "20d34e4cb7e34e7f84034253770a7930", + "7d86d845c7dd4f9b8a1d336d2360f4b7", + "2f23ce19221d4e37a8b1ce6e70feed67", + "f30d5c5c5e344a569e3a04486295d501", + "3f0de1b87f344d3a9d7e91d28b7366f0", + "bd004ca4b1114f7ba045cfeca32cf351" + ] + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:16:21.155888Z", + "iopub.status.busy": "2025-07-20T12:16:21.155077Z", + "iopub.status.idle": "2025-07-20T12:17:36.514669Z", + "shell.execute_reply": "2025-07-20T12:17:36.513831Z", + "shell.execute_reply.started": "2025-07-20T12:16:21.155861Z" + }, + "id": "-Xbb0cuLzwgf", + "outputId": "15906da4-6b34-4cae-bbb2-c5a43a732d40", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. 
Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n", + "Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00Let's Evaluate Gemma 4 Baseline Performance on German Transcription" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:46:26.982493Z", + "iopub.status.busy": "2025-07-20T12:46:26.982188Z", + "iopub.status.idle": "2025-07-20T12:46:28.166576Z", + "shell.execute_reply": "2025-07-20T12:46:28.165889Z", + "shell.execute_reply.started": "2025-07-20T12:46:26.982472Z" + }, + "id": "GHBGeJhYcorh", + "trusted": true, + "outputId": "a5640863-e23c-403b-c8fc-7e151376cef5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 145, + "referenced_widgets": [ + "377d9407a51a4afbb5e2ac3082ec7286", + "201602a3b2ab4cef81f050f8aa5bdee7", + "1667672dbc2649db83027b1b29e68a7f", + "825cb6dfd0744bd390edd6f0e7026ece", + "010105d818c94d9194d670e16e3df612", + "983e57d6d3c344f48d54c4c134dd6247", + "c91c8bffca8b4929946d93e0df229a83", + "65ffffa2ccd443deb7c5c5bdc448a852", + "66b30527c4594eda928fae52411673b8", + "998ab65506514bb2928732c805853ea8", + "945a62bbaf7e45a09e2b705c345cc615", + "5286fa400cf14de49f465f7a7a9d4b4f", + "d9cde02d9c94465aa5e48510be47dad7", + "3d2dff947c7d4a65b6a538929aa038a2", + "6a0cb366fcf044d696006c20872eb225", + "5a4ce86e944d4fac9e4834c8e78520e0", + "39dc7790b1f449b9afcab6f70adf06d4", + "71a3a19710ea445eb5af3ee5a3c72e7e", + "00e96565af7c42838df39ed1fe6e130d", + 
"f4f69ac8b98041d182ba596aff68e104", + "3220f7b1d6c846c5876ba68bb62770b1", + "02279527eb8940b3939da867aa5f4415", + "689827e4e746485898285218bb324ae7", + "d8430623a8b94ea5ad270db056ce70e8", + "7e0a033b1c18437a9610f23f70d5185a", + "a84862f54c724637be4ced2683fcc422", + "4e30bc04c974417689ae67031d69d14a", + "2817c7c341e442c3a4b2ad2daf075efe", + "3447c8ec5ee847c2bee5d297b2783dd7", + "1c8e63219be941c8b55c10245b79c3cb", + "8aa2a11e654148699c6403fbc83dfbbd", + "a314b22b91354a25a14101dccacc7aea", + "a57cd4d7142841249ec3d46a603e27f1", + "227ff7810c1e442e8ea8bd2eb7897bca", + "7651bdb7b49440189e09e316b2f96279", + "c54c874fdf7e4a11bee22f8f1aff1da2", + "4b56c3a68ba34df08c7b64e88fdbc7ef", + "8111e2f9278947459aae516bc0db01ef", + "3634546c0261453dba5223f64922f55e", + "0e0dae87b5d14e0483a43308f8d3fd87", + "dcccf14fcfd44f57a5e989bc14bc32af", + "21e9929cd39b4216b0b4682af9d72ed2", + "55fb4a3a90e9475bab2d7acee9f84a15", + "99c3fcf6bc24410fb68c3c09867b99db" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/540 [00:00" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 6 + } + ], + "source": [ + "from IPython.display import Audio, display\n", + "print(test_audio['text'])\n", + "Audio(test_audio['audio']['array'],rate = test_audio['audio']['sampling_rate'])" + ] + }, + { + "cell_type": "markdown", + "source": [ + "And the translation of the audio from German to English is:\n", + "\n", + "> I—I hold myself directly accountable. That much is, of course, clear: namely, that there are political interests involved in trade—in the exchange of goods—and that political influences are at play. The question is: that should not be the alternative." 
+ ], + "metadata": { + "id": "3XGomsRxl5d_" + } + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:18:05.249062Z", + "iopub.status.busy": "2025-07-20T12:18:05.248355Z", + "iopub.status.idle": "2025-07-20T12:18:37.319606Z", + "shell.execute_reply": "2025-07-20T12:18:37.318802Z", + "shell.execute_reply.started": "2025-07-20T12:18:05.249040Z" + }, + "id": "BJr_D4O9Z2Zh", + "outputId": "ef71193a-3564-4263-ad73-bf738ae7e9c9", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Sie sich direkt mich an und ist mir völlig klar, was politische Interessen gibt im Handel im Austausch mit Waren, dass es politische Einflüsse gibt. Die globale ist die Alternative soll es nicht sein.\n", + "\n" + ] + } + ], + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": test_audio['audio']['array']},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " }\n", + "]\n", + "\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Yc0nI6Gzcori" + }, + "source": [ + "

Baseline Model Performance: 32.43% Word Error Rate (WER) for this sample!
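
For readers who want to check a WER figure like the one above, here is a minimal sketch of the usual definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The helper name `word_error_rate` and the toy strings are assumptions made for illustration. The notebook does not show how its 32.43% figure was computed, and this sketch applies no case or punctuation normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    '''Return WER = word-level edit distance / number of reference words.'''
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError('reference must contain at least one word')

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# Toy strings for illustration only; they are not the notebook's transcript.
reference = 'das ist ein kleiner test'
hypothesis = 'das ist kein test'
print(f'WER = {word_error_rate(reference, hypothesis):.2%}')
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.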

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "You can finetune the vision and text and audio parts" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small amount of parameters!" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:46:48.481871Z", + "iopub.status.busy": "2025-07-20T12:46:48.481594Z", + "iopub.status.idle": "2025-07-20T12:46:55.013627Z", + "shell.execute_reply": "2025-07-20T12:46:55.012955Z", + "shell.execute_reply.started": "2025-07-20T12:46:48.481854Z" + }, + "id": "6bZsfBuZDeCL", + "trusted": true + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # False if not finetuning vision layers\n", + " finetune_language_layers = True, # False if not finetuning language layers\n", + " finetune_attention_modules = True, # False if not finetuning attention layers\n", + " finetune_mlp_modules = True, # False if not finetuning MLP layers\n", + "\n", + " r = 8, # The larger, the higher the accuracy, but might overfit\n", + " lora_alpha = 16, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + " use_rslora = False, # We support rank stabilized LoRA\n", + " loftq_config = None, # And LoftQ\n", + " target_modules = [\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + "\n", + " # Audio layers\n", + " \"post\", \"linear_start\", \"linear_end\",\n", + " \"embedding_projection\",\n", + " \"ffw_layer_1\", \"ffw_layer_2\",\n", + " \"output_proj\",\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data 
Prep\n", + "We adapt the `kadirnar/Emilia-DE-B000000` dataset for our German ASR task using Gemma 4 multi-modal chat format. Each audio-text pair is structured into a conversation with `system`, `user`, and `assistant` roles. The processor then converts this into the final training format:\n", + "\n", + "```\n", + "<|turn>system\n", + "You are an assistant that transcribes speech accurately.\n", + "<|turn>user\n", + "<|audio|>Please transcribe this audio.\n", + "<|turn>model\n", + "Ich, ich rechne direkt mich an." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:47:03.723745Z", + "iopub.status.busy": "2025-07-20T12:47:03.723405Z", + "iopub.status.idle": "2025-07-20T12:47:03.729197Z", + "shell.execute_reply": "2025-07-20T12:47:03.728434Z", + "shell.execute_reply.started": "2025-07-20T12:47:03.723714Z" + }, + "id": "o8caH7vlcorj", + "trusted": true + }, + "outputs": [], + "source": [ + "def format_intersection_data(samples: dict) -> dict[str, list]:\n", + " \"\"\"Format intersection dataset to match expected message format\"\"\"\n", + " formatted_samples = {\"messages\": []}\n", + " for idx in range(len(samples[\"audio\"])):\n", + " audio = samples[\"audio\"][idx][\"array\"]\n", + " label = str(samples[\"text\"][idx])\n", + "\n", + " message = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": audio},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\":[{\"type\": \"text\", \"text\": label}]\n", + " }\n", + " ]\n", + " formatted_samples[\"messages\"].append(message)\n", + " return formatted_samples" + ] + }, + { + 
"cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:47:08.489955Z", + "iopub.status.busy": "2025-07-20T12:47:08.489357Z", + "iopub.status.idle": "2025-07-20T12:47:09.221727Z", + "shell.execute_reply": "2025-07-20T12:47:09.221018Z", + "shell.execute_reply.started": "2025-07-20T12:47:08.489932Z" + }, + "id": "k7CQ3jvDcorj", + "trusted": true, + "outputId": "aeb84230-1e13-4ea1-8c68-8ff8052a8ed5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "d43bc9476e28487fb14ce0c595ec0413", + "84995d7865524e02989f85122ed00a4e", + "bcd254a4b3204c73ad5b7ccf1dcef3fd", + "b536e273e6914ef38cb9a76748da6c9e", + "e65976960e6c4f7bb3c3eafc429e3414", + "e27d4f411212460e82ea052d16891606", + "e20ed67e0b284c05aba64768e11e519d", + "b93e8d0586814af5812faa092da9d86b", + "a335bf7b2c4e44528dce6bfdc72c3a97", + "9596d3bec56a4d3eb7c7d85347f45db4", + "ebbbb62c1f9c44da8ac5587bec9bb243" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map (num_proc=4): 0%| | 0/3000 [00:00\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:48:17.004874Z", + "iopub.status.busy": "2025-07-20T12:48:17.004079Z", + "iopub.status.idle": "2025-07-20T12:48:17.279559Z", + "shell.execute_reply": "2025-07-20T12:48:17.278695Z", + "shell.execute_reply.started": "2025-07-20T12:48:17.004848Z" + }, + "id": "95_Nn-89DhsL", + "trusted": true, + "outputId": "f495ba58-6aed-4fc2-bb7a-6b41db9e5f5c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. 
Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "# Use UnslothVisionDataCollator which handles audio token alignment correctly\n", + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 8,\n", + " gradient_accumulation_steps = 1,\n", + " warmup_ratio = 0.03,\n", + " # num_train_epochs = 1, # Use for full training runs\n", + " max_steps = 60,\n", + " learning_rate = 5e-5,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\",\n", + " remove_unused_columns = False,\n", + "\n", + " # The below are a must for audio finetuning:\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 8192,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "trusted": true, + "outputId": "94dda14b-e31e-4767-a50b-77e10d404fdd", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. 
Max memory = 14.563 GB.\n", + "9.664 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:48:20.209164Z", + "iopub.status.busy": "2025-07-20T12:48:20.208832Z", + "iopub.status.idle": "2025-07-20T13:42:42.607026Z", + "shell.execute_reply": "2025-07-20T13:42:42.606099Z", + "shell.execute_reply.started": "2025-07-20T12:48:20.209142Z" + }, + "id": "yqxqAZ7KJ4oL", + "outputId": "1d493c13-629c-4dd5-804a-ab76d7ba4786", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 3,000 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 8 | Gradient accumulation steps = 1\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8\n", + " \"-____-\" Trainable parameters = 18,237,440 of 5,141,415,456 (0.35% trained)\n", + "Caching is incompatible with gradient checkpointing in Gemma4TextDecoderLayer. Setting `past_key_values=None`.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 01:52, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
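| Step | Training Loss |
|------|---------------|
| 1 | 16.656757 |
| 2 | 16.980976 |
| 3 | 16.712601 |
| 4 | 16.727814 |
| 5 | 16.962500 |
| 6 | 16.116722 |
| 7 | 14.512408 |
| 8 | 13.387623 |
| 9 | 13.035607 |
| 10 | 11.445036 |
| 11 | 11.030166 |
| 12 | 10.789420 |
| 13 | 10.289940 |
| 14 | 8.890602 |
| 15 | 8.548616 |
| 16 | 8.881482 |
| 17 | 8.596722 |
| 18 | 7.866756 |
| 19 | 7.271352 |
| 20 | 6.942740 |
| 21 | 6.609519 |
| 22 | 6.196516 |
| 23 | 6.201073 |
| 24 | 5.180562 |
| 25 | 5.545625 |
| 26 | 5.184031 |
| 27 | 4.980759 |
| 28 | 4.628974 |
| 29 | 4.494946 |
| 30 | 4.073771 |
| 31 | 3.741437 |
| 32 | 4.186295 |
| 33 | 4.365701 |
| 34 | 3.872540 |
| 35 | 4.338237 |
| 36 | 3.886693 |
| 37 | 3.677906 |
| 38 | 3.996698 |
| 39 | 3.182209 |
| 40 | 3.487222 |
| 41 | 3.710633 |
| 42 | 3.023618 |
| 43 | 3.892579 |
| 44 | 3.624817 |
| 45 | 3.996858 |
| 46 | 3.806535 |
| 47 | 3.086546 |
| 48 | 3.591533 |
| 49 | 3.202632 |
| 50 | 3.103097 |
| 51 | 3.445866 |
| 52 | 3.171514 |
| 53 | 3.542655 |
| 54 | 3.233638 |
| 55 | 3.718076 |
| 56 | 3.130233 |
| 57 | 3.508533 |
| 58 | 3.527344 |
| 59 | 3.706471 |
| 60 | 3.161814 |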

" + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "trusted": true, + "outputId": "c2a372bc-2110-438d-abb5-217aacacd66a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "149.1358 seconds used for training.\n", + "2.49 minutes used for training.\n", + "Peak reserved memory = 11.242 GB.\n", + "Peak reserved memory for training = 1.578 GB.\n", + "Peak reserved memory % of max memory = 77.196 %.\n", + "Peak reserved memory for training % of max memory = 10.836 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2025-07-20T13:57:34.004664Z", + "iopub.status.busy": "2025-07-20T13:57:34.004306Z", + "iopub.status.idle": "2025-07-20T13:57:59.332316Z", + "shell.execute_reply": "2025-07-20T13:57:59.331671Z", + "shell.execute_reply.started": "2025-07-20T13:57:34.004639Z" + }, + "id": "kR3gIAX-SM2q", + "outputId": "3cdee613-1d20-4c15-fe02-d3b0b5a95b2f", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Sie sich direkt mich an und ist mir völlig klar, was politische Interessen gibt im Handel im Austausch mit Waren, dass es politische Einflüsse gibt. Die globale ist die Alternative soll es nicht sein.\n", + "\n" + ] + } + ], + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": test_audio['audio']['array']},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " }\n", + "]\n", + "\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "upcOlWe7A1vc", + "trusted": true, + "outputId": "181b8fb3-c43a-46f5-cd1e-d05ad6b3c7ab", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "MKX_XKs_BNZR", + "trusted": true, + "outputId": "9dbc6e55-b0ab-42a3-992c-985b3bd336e6", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I am Gemma 4, a Large Language Model developed by Google DeepMind. 
I am an open weights model.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, processor = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(processor, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! The cell below saves the merged model to the folder `gemma-4`. Change `if False` to `if True` to run it!" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "iHjt_SMYsd3P", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4\", processor)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "If you want to upload / push to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "ZV-CiKPrIFG0", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", processor,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "Saving to `GGUF` for `llama.cpp` is now supported natively for all models! For now, you can convert to `Q8_0`, `F16` or `BF16` precision. `Q4_K_M` for 4bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "FqfebeAdT073", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " processor,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you instead want to push the GGUF to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "ZgcJIhJ0I_es", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " processor,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now, use the generated `.gguf` file (from the `gemma_4_finetune` save above) in llama.cpp. A `Q4_K_M` file is not produced yet, since only `Q8_0`, `BF16` and `F16` are supported for now.\n", + "\n", + "And we're done!
If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kaggle": { + "accelerator": "nvidiaTeslaT4", + "dataSources": [], + "dockerImageVersionId": 31040, + "isGpuEnabled": true, + "isInternetEnabled": true, + "language": "python", + "sourceType": "notebook" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "81fe811cdf2e40c7a020c6e1f097cd02": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f6590b74c331415d8f0fcee104f97e58", + "IPY_MODEL_31398a58983d41ed8d94842c07415e8c", + "IPY_MODEL_f9e4ff751fc041e4b5be42b72e17b882" + ], + "layout": "IPY_MODEL_caea4099c7034f208fd13c63553ee560" + } + }, + "f6590b74c331415d8f0fcee104f97e58": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": 
"HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_83192d6972f94f9b9a15a739071788b4", + "placeholder": "​", + "style": "IPY_MODEL_4d6001fc4d6d4697a930679d653d27fb", + "value": "model.safetensors: 100%" + } + }, + "31398a58983d41ed8d94842c07415e8c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3b9e9316b3ac4966ace4c5e1c4d67fef", + "max": 10246621918, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f30c08ae79e44226b06b309a7ec669bf", + "value": 10246621918 + } + }, + "f9e4ff751fc041e4b5be42b72e17b882": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c64809e43c084f7eb3e65a3ba91ee80f", + "placeholder": "​", + "style": "IPY_MODEL_fda067b1baf544eca44913a43c3f001b", + "value": " 10.2G/10.2G [01:58<00:00, 71.4MB/s]" + } + }, + "caea4099c7034f208fd13c63553ee560": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "83192d6972f94f9b9a15a739071788b4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4d6001fc4d6d4697a930679d653d27fb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3b9e9316b3ac4966ace4c5e1c4d67fef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f30c08ae79e44226b06b309a7ec669bf": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "c64809e43c084f7eb3e65a3ba91ee80f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fda067b1baf544eca44913a43c3f001b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "45f04faa71274efa8317e471ab73df6a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_13b6799823d64421a2ba3ef6c4961b39", + "IPY_MODEL_7f5e953704d44cf7ba384302d214eb55", + "IPY_MODEL_8976a2562ce148c38981cd0325eceeae" + ], + "layout": "IPY_MODEL_63024153abcd4f0e81cdff2c184ed280" + } + }, + "13b6799823d64421a2ba3ef6c4961b39": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_db2f1572dd82426e9ba78ac3f4e28093", + "placeholder": "​", + "style": "IPY_MODEL_f2059b9eb34142fab10d1dae70057451", + "value": "Loading weights: 100%" + } + }, + "7f5e953704d44cf7ba384302d214eb55": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_591782010deb4fa98f362ba97d42b9c3", + "max": 2011, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_72aa31f055ae42b08e2b6d6a8ac9ad70", + "value": 2011 + } + }, + "8976a2562ce148c38981cd0325eceeae": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f6d3ae159edc4736858f12c8783015ee", + "placeholder": "​", + "style": "IPY_MODEL_64aca0cc3a1e467090334c92643dfe7e", + "value": " 2011/2011 [00:43<00:00, 186.35it/s]" + } + }, + "63024153abcd4f0e81cdff2c184ed280": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + 
"overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "db2f1572dd82426e9ba78ac3f4e28093": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f2059b9eb34142fab10d1dae70057451": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "591782010deb4fa98f362ba97d42b9c3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "72aa31f055ae42b08e2b6d6a8ac9ad70": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f6d3ae159edc4736858f12c8783015ee": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "64aca0cc3a1e467090334c92643dfe7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "04d4131c40f14de2b114abeaea1d3826": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_68cc0f42113a4f27818d9cae45a94f12", + "IPY_MODEL_2f65fe0251104228bdf8d75f2efd0654", + "IPY_MODEL_669fcdcecc064750844f95e36a004ff8" + ], + "layout": "IPY_MODEL_01410e3b39df424aa026edb00683ce33" + } + }, + "68cc0f42113a4f27818d9cae45a94f12": 
{ + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_273b695a8b3d4634a22062ee99424813", + "placeholder": "​", + "style": "IPY_MODEL_a3c62f1549f84c30b1d0f84bedd9e5a4", + "value": "generation_config.json: 100%" + } + }, + "2f65fe0251104228bdf8d75f2efd0654": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b246f71f9b6648d09ef6928a7f0bbdd7", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_58cbb0c1da41492d9b46e5e83bfca52b", + "value": 208 + } + }, + "669fcdcecc064750844f95e36a004ff8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_647449f158594e1a9b832e738bcae2af", + "placeholder": "​", + "style": "IPY_MODEL_e24bfd8d4e3348eba9569636f65a6b65", + "value": " 
208/208 [00:00<00:00, 20.6kB/s]" + } + }, + "01410e3b39df424aa026edb00683ce33": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "273b695a8b3d4634a22062ee99424813": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a3c62f1549f84c30b1d0f84bedd9e5a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b246f71f9b6648d09ef6928a7f0bbdd7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "58cbb0c1da41492d9b46e5e83bfca52b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "647449f158594e1a9b832e738bcae2af": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"e24bfd8d4e3348eba9569636f65a6b65": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "693e732ffcc44d8dbc47486387654633": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b88e8cd127e943d0a684db44deed876f", + "IPY_MODEL_a5b8bbbb241d48d78e1a3c46fef8bcc7", + "IPY_MODEL_848c2eb0e73c4d1cbcb85f8446039006" + ], + "layout": "IPY_MODEL_a0eca54ca2d34b899bcef8ccfc6cb83c" + } + }, + "b88e8cd127e943d0a684db44deed876f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b4a66cfed2e948aba00fd34091174cda", + "placeholder": "​", + "style": "IPY_MODEL_868be719d25c493188ee325aa0ea9f54", + "value": "processor_config.json: " + } + }, + "a5b8bbbb241d48d78e1a3c46fef8bcc7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2c52a24ff42c4e52b053e72d6f07ea36", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8de6249b9d684a22b8b60de8851bfae6", + "value": 1 + } + }, + "848c2eb0e73c4d1cbcb85f8446039006": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2f8c19159cc44fdbb5c9b7be4995e201", + "placeholder": "​", + "style": "IPY_MODEL_c622d436711b4e3d8f1939bbd8f85d00", + "value": " 1.69k/? 
[00:00<00:00, 104kB/s]" + } + }, + "a0eca54ca2d34b899bcef8ccfc6cb83c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b4a66cfed2e948aba00fd34091174cda": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "868be719d25c493188ee325aa0ea9f54": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2c52a24ff42c4e52b053e72d6f07ea36": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "8de6249b9d684a22b8b60de8851bfae6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2f8c19159cc44fdbb5c9b7be4995e201": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"c622d436711b4e3d8f1939bbd8f85d00": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "378f1be25c5b45eeb18a64fbfe0fb40c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_de257ad63c5a4504832a924cc44fc485", + "IPY_MODEL_b7ddcd119e1c42e8b880f021482f1b15", + "IPY_MODEL_bfbcf756decf48e9989f5782adfefd9e" + ], + "layout": "IPY_MODEL_f839e2ad69694edabfa0b7a0f3b51041" + } + }, + "de257ad63c5a4504832a924cc44fc485": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_87352f69737a40a58b026f1b283af684", + "placeholder": "​", + "style": "IPY_MODEL_9599429dce8a464ebe38e518669ee265", + "value": "chat_template.jinja: " + } + }, + "b7ddcd119e1c42e8b880f021482f1b15": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5ca635ba959a47b6860e5d94fb25cc8f", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c39b230ccb804319888e8b57ed0b6634", + "value": 1 + } + }, + "bfbcf756decf48e9989f5782adfefd9e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2d7db58670d4478abab9e945eb503cf6", + "placeholder": "​", + "style": "IPY_MODEL_8c03e642a64b44499135db3eb2cccf8e", + "value": " 11.9k/? 
[00:00<00:00, 881kB/s]" + } + }, + "f839e2ad69694edabfa0b7a0f3b51041": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "87352f69737a40a58b026f1b283af684": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9599429dce8a464ebe38e518669ee265": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5ca635ba959a47b6860e5d94fb25cc8f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "c39b230ccb804319888e8b57ed0b6634": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2d7db58670d4478abab9e945eb503cf6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"8c03e642a64b44499135db3eb2cccf8e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f4e6db57edf643719111255a7adf3407": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_904fac86bfbb4071b956a3dd03d95df3", + "IPY_MODEL_bb9c7a2b598947de99356500943eb2e0", + "IPY_MODEL_44727df2f31c4367a47f20308f9129d8" + ], + "layout": "IPY_MODEL_cc847310b49a496f953be34f1ba0a6ba" + } + }, + "904fac86bfbb4071b956a3dd03d95df3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4f664b28437248548e0cd5e19a5f3fbe", + "placeholder": "​", + "style": "IPY_MODEL_e5fcaad65cc3426a939e53d288368eac", + "value": "tokenizer_config.json: " + } + }, + "bb9c7a2b598947de99356500943eb2e0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_022e4158503b437b81d3004414ff32d2", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_315a245ef752448b801a33c5c01f74bc", + "value": 1 + } + }, + "44727df2f31c4367a47f20308f9129d8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2fdfd142d38b42b0852adc8c6992b63d", + "placeholder": "​", + "style": "IPY_MODEL_d332c968458f4e859f505b38c9c722fe", + "value": " 14.9k/? 
[00:00<00:00, 1.27MB/s]" + } + }, + "cc847310b49a496f953be34f1ba0a6ba": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4f664b28437248548e0cd5e19a5f3fbe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e5fcaad65cc3426a939e53d288368eac": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "022e4158503b437b81d3004414ff32d2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "315a245ef752448b801a33c5c01f74bc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2fdfd142d38b42b0852adc8c6992b63d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"d332c968458f4e859f505b38c9c722fe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f80dd089c148441ca453d00ce380f91f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f6797c8912d841dca0a8adb6b273069e", + "IPY_MODEL_b698bc4e75a446d3a27a9a206cb5d232", + "IPY_MODEL_9cafc24fe0704dd6870d1c9d196bbbe8" + ], + "layout": "IPY_MODEL_2763f369d1344c5486b8c78e79073369" + } + }, + "f6797c8912d841dca0a8adb6b273069e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_20d34e4cb7e34e7f84034253770a7930", + "placeholder": "​", + "style": "IPY_MODEL_7d86d845c7dd4f9b8a1d336d2360f4b7", + "value": "tokenizer.json: 100%" + } + }, + "b698bc4e75a446d3a27a9a206cb5d232": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2f23ce19221d4e37a8b1ce6e70feed67", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f30d5c5c5e344a569e3a04486295d501", + "value": 32169626 + } + }, + "9cafc24fe0704dd6870d1c9d196bbbe8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3f0de1b87f344d3a9d7e91d28b7366f0", + "placeholder": "​", + "style": "IPY_MODEL_bd004ca4b1114f7ba045cfeca32cf351", + "value": " 32.2M/32.2M [00:00<00:00, 94.4MB/s]" + } + }, + "2763f369d1344c5486b8c78e79073369": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "20d34e4cb7e34e7f84034253770a7930": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7d86d845c7dd4f9b8a1d336d2360f4b7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2f23ce19221d4e37a8b1ce6e70feed67": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f30d5c5c5e344a569e3a04486295d501": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"3f0de1b87f344d3a9d7e91d28b7366f0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bd004ca4b1114f7ba045cfeca32cf351": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "377d9407a51a4afbb5e2ac3082ec7286": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_201602a3b2ab4cef81f050f8aa5bdee7", + "IPY_MODEL_1667672dbc2649db83027b1b29e68a7f", + "IPY_MODEL_825cb6dfd0744bd390edd6f0e7026ece" + ], + "layout": "IPY_MODEL_010105d818c94d9194d670e16e3df612" + } + }, + "201602a3b2ab4cef81f050f8aa5bdee7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_983e57d6d3c344f48d54c4c134dd6247", + "placeholder": "​", + "style": "IPY_MODEL_c91c8bffca8b4929946d93e0df229a83", + "value": "README.md: 100%" + } + }, + "1667672dbc2649db83027b1b29e68a7f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_65ffffa2ccd443deb7c5c5bdc448a852", + "max": 540, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_66b30527c4594eda928fae52411673b8", + "value": 540 + } + }, + "825cb6dfd0744bd390edd6f0e7026ece": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_998ab65506514bb2928732c805853ea8", + "placeholder": "​", + "style": "IPY_MODEL_945a62bbaf7e45a09e2b705c345cc615", + "value": " 540/540 [00:00<00:00, 58.4kB/s]" + } + }, + "010105d818c94d9194d670e16e3df612": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "983e57d6d3c344f48d54c4c134dd6247": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c91c8bffca8b4929946d93e0df229a83": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "65ffffa2ccd443deb7c5c5bdc448a852": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66b30527c4594eda928fae52411673b8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "998ab65506514bb2928732c805853ea8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "945a62bbaf7e45a09e2b705c345cc615": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5286fa400cf14de49f465f7a7a9d4b4f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d9cde02d9c94465aa5e48510be47dad7", + "IPY_MODEL_3d2dff947c7d4a65b6a538929aa038a2", + "IPY_MODEL_6a0cb366fcf044d696006c20872eb225" + ], + "layout": "IPY_MODEL_5a4ce86e944d4fac9e4834c8e78520e0" + } + }, + "d9cde02d9c94465aa5e48510be47dad7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_39dc7790b1f449b9afcab6f70adf06d4", + "placeholder": "​", + "style": "IPY_MODEL_71a3a19710ea445eb5af3ee5a3c72e7e", + "value": "data/train-00000-of-00002.parquet: 100%" + } + }, + "3d2dff947c7d4a65b6a538929aa038a2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_00e96565af7c42838df39ed1fe6e130d", + "max": 494804366, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f4f69ac8b98041d182ba596aff68e104", + "value": 494804366 + } + }, + "6a0cb366fcf044d696006c20872eb225": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3220f7b1d6c846c5876ba68bb62770b1", + "placeholder": "​", + "style": "IPY_MODEL_02279527eb8940b3939da867aa5f4415", + "value": " 495M/495M [00:04<00:00, 146MB/s]" + } + }, + "5a4ce86e944d4fac9e4834c8e78520e0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "39dc7790b1f449b9afcab6f70adf06d4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "71a3a19710ea445eb5af3ee5a3c72e7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "00e96565af7c42838df39ed1fe6e130d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f4f69ac8b98041d182ba596aff68e104": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3220f7b1d6c846c5876ba68bb62770b1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "02279527eb8940b3939da867aa5f4415": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "689827e4e746485898285218bb324ae7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d8430623a8b94ea5ad270db056ce70e8", + "IPY_MODEL_7e0a033b1c18437a9610f23f70d5185a", + "IPY_MODEL_a84862f54c724637be4ced2683fcc422" + ], + "layout": "IPY_MODEL_4e30bc04c974417689ae67031d69d14a" + } + }, + "d8430623a8b94ea5ad270db056ce70e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2817c7c341e442c3a4b2ad2daf075efe", + "placeholder": "​", + "style": "IPY_MODEL_3447c8ec5ee847c2bee5d297b2783dd7", + "value": "data/train-00001-of-00002.parquet: 100%" + } + }, + "7e0a033b1c18437a9610f23f70d5185a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_1c8e63219be941c8b55c10245b79c3cb", + "max": 502613920, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8aa2a11e654148699c6403fbc83dfbbd", + "value": 502613920 + } + }, + "a84862f54c724637be4ced2683fcc422": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a314b22b91354a25a14101dccacc7aea", + "placeholder": "​", + "style": "IPY_MODEL_a57cd4d7142841249ec3d46a603e27f1", + "value": " 503M/503M [00:12<00:00, 28.0MB/s]" + } + }, + "4e30bc04c974417689ae67031d69d14a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2817c7c341e442c3a4b2ad2daf075efe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3447c8ec5ee847c2bee5d297b2783dd7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1c8e63219be941c8b55c10245b79c3cb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8aa2a11e654148699c6403fbc83dfbbd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a314b22b91354a25a14101dccacc7aea": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a57cd4d7142841249ec3d46a603e27f1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "227ff7810c1e442e8ea8bd2eb7897bca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7651bdb7b49440189e09e316b2f96279", + "IPY_MODEL_c54c874fdf7e4a11bee22f8f1aff1da2", + "IPY_MODEL_4b56c3a68ba34df08c7b64e88fdbc7ef" + ], + "layout": "IPY_MODEL_8111e2f9278947459aae516bc0db01ef" + } + }, + 
"7651bdb7b49440189e09e316b2f96279": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3634546c0261453dba5223f64922f55e", + "placeholder": "​", + "style": "IPY_MODEL_0e0dae87b5d14e0483a43308f8d3fd87", + "value": "Generating train split: 100%" + } + }, + "c54c874fdf7e4a11bee22f8f1aff1da2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dcccf14fcfd44f57a5e989bc14bc32af", + "max": 12038, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_21e9929cd39b4216b0b4682af9d72ed2", + "value": 12038 + } + }, + "4b56c3a68ba34df08c7b64e88fdbc7ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_55fb4a3a90e9475bab2d7acee9f84a15", + "placeholder": "​", + "style": 
"IPY_MODEL_99c3fcf6bc24410fb68c3c09867b99db", + "value": " 12038/12038 [00:24<00:00, 470.80 examples/s]" + } + }, + "8111e2f9278947459aae516bc0db01ef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3634546c0261453dba5223f64922f55e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0e0dae87b5d14e0483a43308f8d3fd87": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "dcccf14fcfd44f57a5e989bc14bc32af": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "21e9929cd39b4216b0b4682af9d72ed2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "55fb4a3a90e9475bab2d7acee9f84a15": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "99c3fcf6bc24410fb68c3c09867b99db": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d43bc9476e28487fb14ce0c595ec0413": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_84995d7865524e02989f85122ed00a4e", + "IPY_MODEL_bcd254a4b3204c73ad5b7ccf1dcef3fd", + "IPY_MODEL_b536e273e6914ef38cb9a76748da6c9e" + ], + "layout": "IPY_MODEL_e65976960e6c4f7bb3c3eafc429e3414" + } + }, + "84995d7865524e02989f85122ed00a4e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e27d4f411212460e82ea052d16891606", + "placeholder": "​", + "style": "IPY_MODEL_e20ed67e0b284c05aba64768e11e519d", + "value": "Map (num_proc=4): 100%" + } + }, + "bcd254a4b3204c73ad5b7ccf1dcef3fd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b93e8d0586814af5812faa092da9d86b", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a335bf7b2c4e44528dce6bfdc72c3a97", + "value": 3000 + } + }, + "b536e273e6914ef38cb9a76748da6c9e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9596d3bec56a4d3eb7c7d85347f45db4", + "placeholder": "​", + "style": "IPY_MODEL_ebbbb62c1f9c44da8ac5587bec9bb243", + "value": " 3000/3000 [00:54<00:00, 76.02 examples/s]" + } + }, + "e65976960e6c4f7bb3c3eafc429e3414": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": 
null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e27d4f411212460e82ea052d16891606": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e20ed67e0b284c05aba64768e11e519d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b93e8d0586814af5812faa092da9d86b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a335bf7b2c4e44528dce6bfdc72c3a97": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"9596d3bec56a4d3eb7c7d85347f45db4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ebbbb62c1f9c44da8ac5587bec9bb243": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Text.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Text.ipynb new file mode 100644 index 0000000..6f60fc1 --- 
/dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Text.ipynb @@ -0,0 +1,7119 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Lu6trOyZlsLl" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y1CStbv9lsLl" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-dDBYtlOlsLm" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
Train models with no code needed. Run GGUF models on Mac, Windows & Linux.
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r8rhPwTKlsLm" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "6FRKDTtwlsLm" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "lBN09c1tUlSV" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 397, + "referenced_widgets": [ + "3d2921a997f043c781203236278c6444", + "33f7c35206e7496c8b68189c982f6c96", + "feb2769cdd91429186c2cad2d519e5e8", + "49109d6abdda4b5987afb03c8ca82133", + "c73912ed1a48494895d8d8d5953ba656", + "c9899707f4ed4df9afdf80b91a75b2fc", + "cbc932cc9d654db7afaadced26c8ef7a", + "2be52eb70e904219a3a2256a3d466dc1", + "3a8f32dbbda14e33a997a0a361c5042f", + "2f6b8dafccee4a7386efd09f13d3f30a", + "41eb3536d2b04d6aae65a11b87a09d3c", + "67c4746cc365404d80e4187c371b2503", + "b21842e752724217971c9f16ec028704", + "52985a9a27ca465da304a09d532165a8", + "e33733c0fad5429e9b8f017a62860bc3", + "0181be8f1f67481aa6d90a6d34b3d27b", + "0762c9ebd93843dd8c5a0bc0e44f21e6", + "5698ed0353e74431b3658499cbb6184c", + "0d48488f4d5a427a84527c913213fc17", + "518191a0c4994a398e788e31a1612e46", + "007fc851909b46b08b199ac4540a8391", + "8697d011c5814742b7e5872c26c3a840", + "e4db4b7388a24022b4166fb8398d84f6", + "1a99b2cc5a834557be169221d5e5e922", + "9f3c0e0b175d4d029a27b4425e190fdf", + "fab9219a5f4f4ba094c7e3b79eee978f", + "6b0b45c64c2f42bcad42097be582b582", + "479c805860804745ac0addaebca04c08", + "2af827637bc14b99966ac252e71ca1d3", + "e01ca3fe18bd4f0faef2bf4df4faa892", + "a37f76bac8d34ac982c7d6642c82d0ef", + "bd43efb495044372b169c927f8d0584e", + "58f143afe4ed43efa145bacb1d9bf542", + "a5be5bfba3ec4648aa6ccd34f2eed259", + "1b1b4f5f31a745ee98072cd87c99df89", + "dab210ff7ce2430caf1e2d4302f8565a", + "e2e043fe9da2485e9afd571d06bc27c7", + "fe08286d945446788aa1a44dac8f7263", + "0c82571976a040d3a0f6596cdca32b6e", + "d5d3bf30ac524d00b0cc3982fb66386f", + "84bbfab05a234b12ba5a2d38015c9f1a", + "fea38f649a9e47b4a6185d85a62669a0", + "0ef55274df7042b5ae47880fb72d82cc", + "68b984644256400f9024e9d37998bd2f", 
+ "3a0184a9cfef443385dd49f20f12cfc1", + "6535c0b518144e108ac22d4c523aef32", + "83f222f05c974fa191d0f4fd6186b1b5", + "70f3613a5f1f4661b8b9aff8295bf7bb", + "2c80fe8defd7468da10b89d8de1f96b1", + "af02af0771fe4f6e9d5f8123b6db676d", + "6fe995128adb4c08b07cee2a7de09454", + "653651f81f3349bdbee628cd37aafa91", + "172bc6be7ccb4efe80b6bf255f426e0a", + "dc4c8288184c4a69b1f19cdb5167e581", + "51884ea25fd94727889e54da2fe4b311", + "04f2d1fee6cf4450854a90afcca7352a", + "204b778594564833814db99c06ca4fbb", + "c914b297689a4b69b9d02b01fb0f4d78", + "843a0d013b794eaa8927086df2a38ae7", + "46c5ab9910614cb599b2694bb322658a", + "bb7b78efc82c43f78b8579e0027b5ff4", + "f0e4f49d60694bc5a603b3a37c7b6a0d", + "db61162e59c94b4fa10ff4a4c0c97343", + "0485e800a81f43b1943423f901bd1a4f", + "b55db2c45b4048cda35766d236e4e84c", + "26b5672854ef4849a6d3d6ec7429d87b", + "0e51754fe18440349b35bcc438b1bf02", + "be57bb04bde04a58bb37d91bfbf3b51c", + "05b6d19db7c447a9ae7d380f9d7765d8", + "aeb32719f54e42a4b08a53cd5363a0d5", + "fde969cdd088420e972b911db636ae2d", + "27af09daa2374b7f9ce779aa7192b178", + "d22ddd345a85405c90fbceb0c623522d", + "3f54b22697044f4cb7c4555d1e443eaf", + "3169b2afaf5c4bc8bfd94d87aaca04b0", + "5d6cb7d0eec540e4873f290ad0c51856", + "111000cfd1af4e2db1f2c2557cd7206a" + ] + }, + "id": "-Xbb0cuLzwgf", + "outputId": "9c607d3f-f9ab-4716-d225-f3e351d135a1" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n", + "Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9jGeSb9bWe0k", + "outputId": "fea060ef-c599-475e-b3fe-d6bf33d15a7c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The animal in the image is a **sloth**.\n", + "\n", + "Since you haven't provided any specific context (like a movie poster, a scene, or a specific source), I cannot tell you *which* films this specific sloth features in.\n", + "\n", + "**To help me answer your question, please provide more context, such as:**\n", + "\n", + "1. **A picture or a clip from a film.**\n", + "2. **The name of a movie you are thinking of.**\n", + "3. **Any other information you have about where you saw this animal.**\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eh0BzbZPWtRD" + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "R3ExuK8cWuT3", + "outputId": "f35f34d0-67fa-4693-ced3-1b5c3694db29" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "In emerald canopy, where sunlight streams,\n", + "And moss-draped branches weave their silent dreams,\n", + "There hangs a creature, slow and low,\n", + "A sloth, in nature's gentle flow.\n", + "\n", + "With fur the shade of bark and leaf,\n", + "A quiet grace, beyond belief.\n", + "He moves with purpose, soft and deep,\n", + "While the hurried world is fast asleep.\n", + "\n", + "A master of the languid pace,\n", + "A gentle smile upon his face.\n", + "He clings to limbs with steady hold,\n", + "A story in his stillness told.\n", + "\n", + "The world rushes by in streaks of light,\n", + "But he moves through amber\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wZrmFRZpZtGf" + }, + "source": [ + "# Gemma 4 can also hear!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 76 + }, + "id": "68crYajNZtw1", + "outputId": "6983d839-522c-4d43-82f3-1f13adf867ff" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "from IPython.display import Audio, display\n", + "Audio(\"https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "k3vrdoa0Z01X" + }, + "outputs": [], + "source": [ + "!wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O audio.mp3" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BJr_D4O9Z2Zh", + "outputId": "d3bd6949-cd9a-413a-ea31-03f8ae798f1b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The audio is about the belief that a nation should commit itself to achieving the goal of landing a man on the moon and returning him safely to the Earth before this decade is out.\n", + "\n" + ] + } + ], + "source": [ + "audio_file = \"audio.mp3\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio about?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L15JuAmmaOkB" + }, + "source": [ + "# Let's combine all 3 modalities together!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "is37bsDZaRwV", + "outputId": "ac8a7bb9-aea5-44b0-f6a6-b276dac72236" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The image is of a koala, and the audio is a speech about a nation committing to landing a man on the moon and returning him safely within a decade. The two are unrelated.\n", + "\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio and image about? \"\\\n", + " \"How are they related?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "For now you can select whether to finetune the vision and text parts. The audio part can also be finetuned, and we're working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small number of parameters!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6bZsfBuZDeCL", + "outputId": "0dfe237b-52ce-4215-b38b-de73ff7fb01b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Making `model.base_model.model.model.language_model` require gradients\n" + ] + } + ], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "LjY75GoYUCB8" + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZQkXuGYxbJ-e" + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 113, + "referenced_widgets": [ + "870cc718caea4b09a6ac7d43bcc34b4c", + "0dce530f23a24cddab63d607f3b94965", + "bfa5e47f4ec74dfc83cf4c9f72b7953f", + "c7b4dc08d4ae40af9a2ab2b78a1dad0f", + "c7e64de97ac6421499e2ad9f67bc419a", + "2807725d165b4877929693661171736c", + "d7d6863aea5a4db2bfdda6c708291dcc", + "74d4f2959b4047c89182c1ec63df08e2", + "184954dceb45488483a30405f1cf550c", + "6e9a6f1ec4e142558c91ed9f80398f00", + "4c07e50fbb6c42d7ba20b7f084d52b71", + "80cc55a66a5f4305bcf1eccd67b7693b", + "ddda884ec6c54903b9bd741e5ab0cd8d", + "eac214a702d14084965a8be6e1e0db6d", + "f1f42573c8564030b75d7af02a9f7f7e", + "c9f8619f01b04e5996e62f7842ba499b", + "621afa2e11564f148fe1b41d33b306fe", + "1fbb8fbc099f43ff85df3adb39b7f5c4", + "3db6e53661084022ac038dd2ca442cee", + "90c6842f23544dd4bb04f1e4680cedc5", + "7e853b0ed3b943e4aa31fa0d483d3013", + "23b50cb43c144d5fb129571c793b1059", + "59296292f4e2444589b75c2839758a6d", + "d8bb24a7e9db4b1a9087f6cc54e9d819", + "0911afbaf58c47cabfb0699552aaf1bb", + "48a152c98a374fdbbb661bd90e82c46e", + "e38f05f599024a47bf3dd607a39e05ad", + "ba075479987849dd888b0b1bc9f4add4", + "511d5c2a1d634ab1839d803830cc014e", + "b171c249f897489487353a4169875212", + "b47e4c66ac1a41bda658d59ec27920a9", + "cf60f97b2e7f4061a19b2106308c1595", + "ed59a95a95c743c6bd169a0c5da95f67" + ] + }, + "id": "Mkq4RvEq7FQr", + "outputId": "b60787f8-5ff1-4ac5-c4ca-725f3f5abf73" + }, + "outputs": [ + { + "output_type": 
"display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`''`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "30d566f7f7cd4f26a57fcd9df0310bdc", + "59291b76f1fb44adbdf509d43359bf71", + "6b5887f069914368ae613bb2be3c8d33", + "b246910de0dc431089632dd353ebeb29", + "c1229f0b7b6e49daaa72dc04d6c03df9", + "3a28dde28ae84bfd993354af8402df68", + "b812bec84a814eb79442ec2028d7ad3d", + "fb523ac744564bc4961e51b91a46b0a8", + "9795836b1c24448bacf121ecdcff86f6", + "6bc70efc138f4c0c8d619b32623966bb", + "f78de814eb3a445993231ab3855a3801" + ] + }, + "id": "1ahE8Ys37JDJ", + "outputId": "a8c460ca-41ba-4bb5-a6f7-139ddd663ee5" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "Let's see how the chat template did! Notice there is no `` token as the processor tokenizer will be adding one." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "id": "gGFzmplrEy9I", + "outputId": "eecc6464-06a8-4b9a-8694-05b5c5201abf" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up; for a full run, set `num_train_epochs=1` and `max_steps=None` instead."
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "96c5653090984af8843866b1ee7c35b1", + "7f32dee0a2ab4ed8bb99042ff5daf2da", + "d990a3318024413983e1e38369e6dfb1", + "97372097bc6442cd8e2cd518edffc2a1", + "c36c6d5aab3b49ebb00885dc6ef6efba", + "cf45f9592f0140449cec334a3607c4dc", + "96118ea74ed049b1877344e3fa1e1bb5", + "09ab14ae65364f67b19aa6e0f0421e5e", + "fc6de80b1d174253bf1c6419cd32ddae", + "4b347b13474545c4be6eb5992112ca5d", + "39574ff310c54895b267b9bf1dcddb2a" + ] + }, + "id": "95_Nn-89DhsL", + "outputId": "13e4e38f-cc4c-4456-86c0-211cb101542b" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=6): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dv1NBUozV78l" + }, + "source": [ + "Let's verify that the instruction part is masked. We print the 100th row again; notice that the sample has only a single `<bos>` token, as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "id": "LtsMVtlkUhja", + "outputId": "61386897-8d1f-4bd1-8c37-54cf5e26409f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4Kyjy__m9KY3" + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "id": "_rD6fl8EUxnG", + "outputId": "cc23b442-0129-4226-b7b8-56c39ef80d11" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "' In programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 21 + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "cellView": "form", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2ejIt2xSNKKp", + "outputId": "945b9e2e-c2bd-4ffd-906d-d7d7450437be" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "10.098 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "yqxqAZ7KJ4oL", + "outputId": "c4804f43-ad0d-48be-fcfc-a89558cc74b9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 2,991 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 12,668,928 of 5,135,846,944 (0.25% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 03:11, Epoch 0/1]\n", + "
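| Step | Training Loss |
|------|---------------|
| 1 | 11.235785 |
| 2 | 12.824577 |
| 3 | 12.463327 |
| 4 | 12.219341 |
| 5 | 12.442764 |
| 6 | 11.932236 |
| 7 | 12.602695 |
| 8 | 12.851998 |
| 9 | 10.863073 |
| 10 | 9.147686 |
| 11 | 9.397802 |
| 12 | 8.469434 |
| 13 | 8.135841 |
| 14 | 6.840250 |
| 15 | 7.372794 |
| 16 | 6.193974 |
| 17 | 6.062288 |
| 18 | 6.291557 |
| 19 | 6.233539 |
| 20 | 5.865170 |
| 21 | 5.440270 |
| 22 | 5.247680 |
| 23 | 5.856437 |
| 24 | 5.654868 |
| 25 | 5.209495 |
| 26 | 5.182337 |
| 27 | 5.069244 |
| 28 | 4.234691 |
| 29 | 4.857911 |
| 30 | 4.938684 |
| 31 | 4.718597 |
| 32 | 4.803062 |
| 33 | 4.649268 |
| 34 | 4.894489 |
| 35 | 4.491835 |
| 36 | 4.749773 |
| 37 | 4.720836 |
| 38 | 4.491292 |
| 39 | 4.462323 |
| 40 | 3.844747 |
| 41 | 4.891779 |
| 42 | 4.629851 |
| 43 | 4.111910 |
| 44 | 3.651132 |
| 45 | 4.298693 |
| 46 | 4.591220 |
| 47 | 4.157773 |
| 48 | 4.565719 |
| 49 | 3.915479 |
| 50 | 3.632185 |
| 51 | 4.593100 |
| 52 | 3.698324 |
| 53 | 4.600374 |
| 54 | 4.302113 |
| 55 | 3.800784 |
| 56 | 4.000387 |
| 57 | 4.510599 |
| 58 | 4.181003 |
| 59 | 3.902064 |
| 60 | 4.236151 |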

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "cellView": "form", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pCqnaKmlO1U9", + "outputId": "f19a554f-1476-4256-a1ad-211c9a800ede" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "222.9893 seconds used for training.\n", + "3.72 minutes used for training.\n", + "Peak reserved memory = 10.836 GB.\n", + "Peak reserved memory for training = 0.738 GB.\n", + "Peak reserved memory % of max memory = 74.408 %.\n", + "Peak reserved memory for training % of max memory = 5.068 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "kR3gIAX-SM2q", + "outputId": "c8379be3-ca9a-4791-920c-9b6cdd46511a" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\nThe next number in the sequence is **13**.\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones:\\n\\n* $1 + 1 = 2$\\n* $1 + 2 = 3$\\n* $2 + 3 = 5']" + ], + "text/html": [ + "

['<bos><|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,<turn|>\\n<|turn>model\\nThe next number in the sequence is **13**.\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones:\\n\\n* $1 + 1 = 2$\\n* $1 + 2 = 3$\\n* $2 + 3 = 5']
" + ] + }, + "metadata": {}, + "execution_count": 25 + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CrSvZObor0lY" + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e2pEuRb1r2Vg", + "outputId": "b7611dba-3cdc-455b-c201-cce7b36b2fb9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's a breakdown of why this happens:\n", + "\n", + "### 1. 
Sunlight is Made of Different Colors\n", + "\n", + "Sunlight, which appears white to us, is actually composed of a spectrum of different colors (the colors of the rainbow: red, orange\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "upcOlWe7A1vc", + "outputId": "3844523c-69ab-4501-dbe7-80fd84852cdf" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MKX_XKs_BNZR", + "outputId": "db79fc3b-4187-48fc-b26c-8e7725425121" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I am Gemma-4, a Large Language Model developed by Google DeepMind. 
I am an open weights model.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # The model you used for training\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Change `if False` to `if True` to let it run!" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "If you want to upload / push to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "id": "ZV-CiKPrIFG0" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "Saving to `GGUF` / `llama.cpp` is now supported natively for all models! For now, you can convert to `Q8_0`, `F16` or `BF16` precision. `Q4_K_M` for 4-bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "id": "FqfebeAdT073" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you want to push the GGUF to your Hugging Face account instead, change `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "id": "ZgcJIhJ0I_es" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now, use the exported `.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "3d2921a997f043c781203236278c6444": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_33f7c35206e7496c8b68189c982f6c96", + "IPY_MODEL_feb2769cdd91429186c2cad2d519e5e8", + "IPY_MODEL_49109d6abdda4b5987afb03c8ca82133" + ], + "layout": "IPY_MODEL_c73912ed1a48494895d8d8d5953ba656" + } + }, + "33f7c35206e7496c8b68189c982f6c96": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c9899707f4ed4df9afdf80b91a75b2fc", + "placeholder": "​", + "style": "IPY_MODEL_cbc932cc9d654db7afaadced26c8ef7a", + 
"value": "model.safetensors: 100%" + } + }, + "feb2769cdd91429186c2cad2d519e5e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2be52eb70e904219a3a2256a3d466dc1", + "max": 10246621918, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3a8f32dbbda14e33a997a0a361c5042f", + "value": 10246621918 + } + }, + "49109d6abdda4b5987afb03c8ca82133": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2f6b8dafccee4a7386efd09f13d3f30a", + "placeholder": "​", + "style": "IPY_MODEL_41eb3536d2b04d6aae65a11b87a09d3c", + "value": " 10.2G/10.2G [04:00<00:00, 137MB/s]" + } + }, + "c73912ed1a48494895d8d8d5953ba656": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, 
+ "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c9899707f4ed4df9afdf80b91a75b2fc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cbc932cc9d654db7afaadced26c8ef7a": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2be52eb70e904219a3a2256a3d466dc1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3a8f32dbbda14e33a997a0a361c5042f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2f6b8dafccee4a7386efd09f13d3f30a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "41eb3536d2b04d6aae65a11b87a09d3c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "67c4746cc365404d80e4187c371b2503": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b21842e752724217971c9f16ec028704", + "IPY_MODEL_52985a9a27ca465da304a09d532165a8", + "IPY_MODEL_e33733c0fad5429e9b8f017a62860bc3" + ], + "layout": "IPY_MODEL_0181be8f1f67481aa6d90a6d34b3d27b" + } + }, + "b21842e752724217971c9f16ec028704": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0762c9ebd93843dd8c5a0bc0e44f21e6", + "placeholder": "​", + "style": "IPY_MODEL_5698ed0353e74431b3658499cbb6184c", + "value": "Loading weights: 100%" + } + }, + "52985a9a27ca465da304a09d532165a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0d48488f4d5a427a84527c913213fc17", + "max": 2011, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_518191a0c4994a398e788e31a1612e46", + "value": 2011 + } + }, + "e33733c0fad5429e9b8f017a62860bc3": 
{ + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_007fc851909b46b08b199ac4540a8391", + "placeholder": "​", + "style": "IPY_MODEL_8697d011c5814742b7e5872c26c3a840", + "value": " 2011/2011 [00:42<00:00, 193.38it/s]" + } + }, + "0181be8f1f67481aa6d90a6d34b3d27b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0762c9ebd93843dd8c5a0bc0e44f21e6": { + "model_module": "@jupyter-widgets/base", + "model_name": 
"LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5698ed0353e74431b3658499cbb6184c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0d48488f4d5a427a84527c913213fc17": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "518191a0c4994a398e788e31a1612e46": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "007fc851909b46b08b199ac4540a8391": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8697d011c5814742b7e5872c26c3a840": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e4db4b7388a24022b4166fb8398d84f6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1a99b2cc5a834557be169221d5e5e922", + "IPY_MODEL_9f3c0e0b175d4d029a27b4425e190fdf", + "IPY_MODEL_fab9219a5f4f4ba094c7e3b79eee978f" + ], + "layout": "IPY_MODEL_6b0b45c64c2f42bcad42097be582b582" + } + }, + "1a99b2cc5a834557be169221d5e5e922": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_479c805860804745ac0addaebca04c08", + "placeholder": "​", + "style": "IPY_MODEL_2af827637bc14b99966ac252e71ca1d3", + "value": "generation_config.json: 100%" + } + }, + "9f3c0e0b175d4d029a27b4425e190fdf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e01ca3fe18bd4f0faef2bf4df4faa892", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a37f76bac8d34ac982c7d6642c82d0ef", + "value": 208 + } + }, + "fab9219a5f4f4ba094c7e3b79eee978f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bd43efb495044372b169c927f8d0584e", + "placeholder": "​", + "style": "IPY_MODEL_58f143afe4ed43efa145bacb1d9bf542", + "value": " 208/208 [00:00<00:00, 19.5kB/s]" + } + }, + "6b0b45c64c2f42bcad42097be582b582": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + 
"state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "479c805860804745ac0addaebca04c08": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": 
null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2af827637bc14b99966ac252e71ca1d3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e01ca3fe18bd4f0faef2bf4df4faa892": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "a37f76bac8d34ac982c7d6642c82d0ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bd43efb495044372b169c927f8d0584e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "58f143afe4ed43efa145bacb1d9bf542": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a5be5bfba3ec4648aa6ccd34f2eed259": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1b1b4f5f31a745ee98072cd87c99df89", + "IPY_MODEL_dab210ff7ce2430caf1e2d4302f8565a", + "IPY_MODEL_e2e043fe9da2485e9afd571d06bc27c7" + ], + "layout": "IPY_MODEL_fe08286d945446788aa1a44dac8f7263" + } + }, + "1b1b4f5f31a745ee98072cd87c99df89": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0c82571976a040d3a0f6596cdca32b6e", + "placeholder": "​", + "style": "IPY_MODEL_d5d3bf30ac524d00b0cc3982fb66386f", + "value": "processor_config.json: " + } + }, + "dab210ff7ce2430caf1e2d4302f8565a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_84bbfab05a234b12ba5a2d38015c9f1a", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fea38f649a9e47b4a6185d85a62669a0", + "value": 1 + } + }, + "e2e043fe9da2485e9afd571d06bc27c7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0ef55274df7042b5ae47880fb72d82cc", + "placeholder": "​", + "style": "IPY_MODEL_68b984644256400f9024e9d37998bd2f", + "value": " 1.69k/? 
[00:00<00:00, 115kB/s]" + } + }, + "fe08286d945446788aa1a44dac8f7263": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0c82571976a040d3a0f6596cdca32b6e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d5d3bf30ac524d00b0cc3982fb66386f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "84bbfab05a234b12ba5a2d38015c9f1a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "fea38f649a9e47b4a6185d85a62669a0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0ef55274df7042b5ae47880fb72d82cc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"68b984644256400f9024e9d37998bd2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3a0184a9cfef443385dd49f20f12cfc1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6535c0b518144e108ac22d4c523aef32", + "IPY_MODEL_83f222f05c974fa191d0f4fd6186b1b5", + "IPY_MODEL_70f3613a5f1f4661b8b9aff8295bf7bb" + ], + "layout": "IPY_MODEL_2c80fe8defd7468da10b89d8de1f96b1" + } + }, + "6535c0b518144e108ac22d4c523aef32": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_af02af0771fe4f6e9d5f8123b6db676d", + "placeholder": "​", + "style": "IPY_MODEL_6fe995128adb4c08b07cee2a7de09454", + "value": "chat_template.jinja: " + } + }, + "83f222f05c974fa191d0f4fd6186b1b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_653651f81f3349bdbee628cd37aafa91", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_172bc6be7ccb4efe80b6bf255f426e0a", + "value": 1 + } + }, + "70f3613a5f1f4661b8b9aff8295bf7bb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dc4c8288184c4a69b1f19cdb5167e581", + "placeholder": "​", + "style": "IPY_MODEL_51884ea25fd94727889e54da2fe4b311", + "value": " 11.9k/? 
[00:00<00:00, 923kB/s]" + } + }, + "2c80fe8defd7468da10b89d8de1f96b1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "af02af0771fe4f6e9d5f8123b6db676d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6fe995128adb4c08b07cee2a7de09454": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "653651f81f3349bdbee628cd37aafa91": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "172bc6be7ccb4efe80b6bf255f426e0a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dc4c8288184c4a69b1f19cdb5167e581": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"51884ea25fd94727889e54da2fe4b311": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "04f2d1fee6cf4450854a90afcca7352a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_204b778594564833814db99c06ca4fbb", + "IPY_MODEL_c914b297689a4b69b9d02b01fb0f4d78", + "IPY_MODEL_843a0d013b794eaa8927086df2a38ae7" + ], + "layout": "IPY_MODEL_46c5ab9910614cb599b2694bb322658a" + } + }, + "204b778594564833814db99c06ca4fbb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bb7b78efc82c43f78b8579e0027b5ff4", + "placeholder": "​", + "style": "IPY_MODEL_f0e4f49d60694bc5a603b3a37c7b6a0d", + "value": "tokenizer_config.json: " + } + }, + "c914b297689a4b69b9d02b01fb0f4d78": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_db61162e59c94b4fa10ff4a4c0c97343", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0485e800a81f43b1943423f901bd1a4f", + "value": 1 + } + }, + "843a0d013b794eaa8927086df2a38ae7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b55db2c45b4048cda35766d236e4e84c", + "placeholder": "​", + "style": "IPY_MODEL_26b5672854ef4849a6d3d6ec7429d87b", + "value": " 14.9k/? 
[00:00<00:00, 1.33MB/s]" + } + }, + "46c5ab9910614cb599b2694bb322658a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bb7b78efc82c43f78b8579e0027b5ff4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f0e4f49d60694bc5a603b3a37c7b6a0d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "db61162e59c94b4fa10ff4a4c0c97343": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "0485e800a81f43b1943423f901bd1a4f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b55db2c45b4048cda35766d236e4e84c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"26b5672854ef4849a6d3d6ec7429d87b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0e51754fe18440349b35bcc438b1bf02": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_be57bb04bde04a58bb37d91bfbf3b51c", + "IPY_MODEL_05b6d19db7c447a9ae7d380f9d7765d8", + "IPY_MODEL_aeb32719f54e42a4b08a53cd5363a0d5" + ], + "layout": "IPY_MODEL_fde969cdd088420e972b911db636ae2d" + } + }, + "be57bb04bde04a58bb37d91bfbf3b51c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_27af09daa2374b7f9ce779aa7192b178", + "placeholder": "​", + "style": "IPY_MODEL_d22ddd345a85405c90fbceb0c623522d", + "value": "tokenizer.json: 100%" + } + }, + "05b6d19db7c447a9ae7d380f9d7765d8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3f54b22697044f4cb7c4555d1e443eaf", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3169b2afaf5c4bc8bfd94d87aaca04b0", + "value": 32169626 + } + }, + "aeb32719f54e42a4b08a53cd5363a0d5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5d6cb7d0eec540e4873f290ad0c51856", + "placeholder": "​", + "style": "IPY_MODEL_111000cfd1af4e2db1f2c2557cd7206a", + "value": " 32.2M/32.2M [00:01<00:00, 161MB/s]" + } + }, + "fde969cdd088420e972b911db636ae2d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "27af09daa2374b7f9ce779aa7192b178": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d22ddd345a85405c90fbceb0c623522d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3f54b22697044f4cb7c4555d1e443eaf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3169b2afaf5c4bc8bfd94d87aaca04b0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"5d6cb7d0eec540e4873f290ad0c51856": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "111000cfd1af4e2db1f2c2557cd7206a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "870cc718caea4b09a6ac7d43bcc34b4c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_0dce530f23a24cddab63d607f3b94965", + "IPY_MODEL_bfa5e47f4ec74dfc83cf4c9f72b7953f", + "IPY_MODEL_c7b4dc08d4ae40af9a2ab2b78a1dad0f" + ], + "layout": "IPY_MODEL_c7e64de97ac6421499e2ad9f67bc419a" + } + }, + "0dce530f23a24cddab63d607f3b94965": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2807725d165b4877929693661171736c", + "placeholder": "​", + "style": "IPY_MODEL_d7d6863aea5a4db2bfdda6c708291dcc", + "value": "README.md: 100%" + } + }, + "bfa5e47f4ec74dfc83cf4c9f72b7953f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_74d4f2959b4047c89182c1ec63df08e2", + "max": 982, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_184954dceb45488483a30405f1cf550c", + "value": 982 + } + }, + "c7b4dc08d4ae40af9a2ab2b78a1dad0f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6e9a6f1ec4e142558c91ed9f80398f00", + "placeholder": "​", + "style": "IPY_MODEL_4c07e50fbb6c42d7ba20b7f084d52b71", + "value": " 982/982 [00:00<00:00, 83.5kB/s]" + } + }, + "c7e64de97ac6421499e2ad9f67bc419a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2807725d165b4877929693661171736c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d7d6863aea5a4db2bfdda6c708291dcc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "74d4f2959b4047c89182c1ec63df08e2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "184954dceb45488483a30405f1cf550c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "6e9a6f1ec4e142558c91ed9f80398f00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4c07e50fbb6c42d7ba20b7f084d52b71": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "80cc55a66a5f4305bcf1eccd67b7693b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ddda884ec6c54903b9bd741e5ab0cd8d", + "IPY_MODEL_eac214a702d14084965a8be6e1e0db6d", + "IPY_MODEL_f1f42573c8564030b75d7af02a9f7f7e" + ], + "layout": "IPY_MODEL_c9f8619f01b04e5996e62f7842ba499b" + } + }, + "ddda884ec6c54903b9bd741e5ab0cd8d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_621afa2e11564f148fe1b41d33b306fe", + "placeholder": "​", + "style": "IPY_MODEL_1fbb8fbc099f43ff85df3adb39b7f5c4", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "eac214a702d14084965a8be6e1e0db6d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3db6e53661084022ac038dd2ca442cee", + "max": 116531415, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_90c6842f23544dd4bb04f1e4680cedc5", + "value": 116531415 + } + }, + "f1f42573c8564030b75d7af02a9f7f7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7e853b0ed3b943e4aa31fa0d483d3013", + "placeholder": "​", + "style": "IPY_MODEL_23b50cb43c144d5fb129571c793b1059", + "value": " 117M/117M [00:02<00:00, 311MB/s]" + } + }, + "c9f8619f01b04e5996e62f7842ba499b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "621afa2e11564f148fe1b41d33b306fe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1fbb8fbc099f43ff85df3adb39b7f5c4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3db6e53661084022ac038dd2ca442cee": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "90c6842f23544dd4bb04f1e4680cedc5": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7e853b0ed3b943e4aa31fa0d483d3013": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "23b50cb43c144d5fb129571c793b1059": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "59296292f4e2444589b75c2839758a6d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d8bb24a7e9db4b1a9087f6cc54e9d819", + "IPY_MODEL_0911afbaf58c47cabfb0699552aaf1bb", + "IPY_MODEL_48a152c98a374fdbbb661bd90e82c46e" + ], + "layout": "IPY_MODEL_e38f05f599024a47bf3dd607a39e05ad" + } + }, + "d8bb24a7e9db4b1a9087f6cc54e9d819": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ba075479987849dd888b0b1bc9f4add4", + "placeholder": "​", + "style": "IPY_MODEL_511d5c2a1d634ab1839d803830cc014e", + "value": "Generating train split: 100%" + } + }, + "0911afbaf58c47cabfb0699552aaf1bb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_b171c249f897489487353a4169875212", + "max": 100000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b47e4c66ac1a41bda658d59ec27920a9", + "value": 100000 + } + }, + "48a152c98a374fdbbb661bd90e82c46e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cf60f97b2e7f4061a19b2106308c1595", + "placeholder": "​", + "style": "IPY_MODEL_ed59a95a95c743c6bd169a0c5da95f67", + "value": " 100000/100000 [00:02<00:00, 90920.64 examples/s]" + } + }, + "e38f05f599024a47bf3dd607a39e05ad": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ba075479987849dd888b0b1bc9f4add4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "511d5c2a1d634ab1839d803830cc014e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b171c249f897489487353a4169875212": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b47e4c66ac1a41bda658d59ec27920a9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cf60f97b2e7f4061a19b2106308c1595": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ed59a95a95c743c6bd169a0c5da95f67": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "881902dee94e49be9939fdaef8bbdac2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9bbb3dd8995e49e18f041f310b0d9a72", + "IPY_MODEL_12d669b8572a4cc8ba6afc9a2da30fa0", + "IPY_MODEL_a09c3a078bb945c48fee4b3b8037daf1" + ], + "layout": "IPY_MODEL_5151ccca2dfd4660ae814c2737391129" + } + }, + 
"9bbb3dd8995e49e18f041f310b0d9a72": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_74df6a21d36b43bdaabcc750e5474354", + "placeholder": "​", + "style": "IPY_MODEL_8acd8628e9f54e03b87ad0e1dff9619f", + "value": "Unsloth: Standardizing formats (num_proc=6): 100%" + } + }, + "12d669b8572a4cc8ba6afc9a2da30fa0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a4b2070844204cf68759dc5e9b667d8d", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6be139bc02564a30afe2a0ff6e1f2a83", + "value": 3000 + } + }, + "a09c3a078bb945c48fee4b3b8037daf1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c0c119b98e7146f893c0e6090aa06a19", + "placeholder": "​", + "style": 
"IPY_MODEL_ea6fc9d5e7b64b348cc3c3c9e8b7e4af", + "value": " 3000/3000 [00:01<00:00, 980.41 examples/s]" + } + }, + "5151ccca2dfd4660ae814c2737391129": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "74df6a21d36b43bdaabcc750e5474354": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, 
+ "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8acd8628e9f54e03b87ad0e1dff9619f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a4b2070844204cf68759dc5e9b667d8d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6be139bc02564a30afe2a0ff6e1f2a83": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "c0c119b98e7146f893c0e6090aa06a19": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "ea6fc9d5e7b64b348cc3c3c9e8b7e4af": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "30d566f7f7cd4f26a57fcd9df0310bdc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_59291b76f1fb44adbdf509d43359bf71", + "IPY_MODEL_6b5887f069914368ae613bb2be3c8d33", + "IPY_MODEL_b246910de0dc431089632dd353ebeb29" + ], + "layout": "IPY_MODEL_c1229f0b7b6e49daaa72dc04d6c03df9" + } + }, + "59291b76f1fb44adbdf509d43359bf71": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3a28dde28ae84bfd993354af8402df68", + "placeholder": "​", + "style": "IPY_MODEL_b812bec84a814eb79442ec2028d7ad3d", + "value": "Map: 100%" + } + }, + "6b5887f069914368ae613bb2be3c8d33": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fb523ac744564bc4961e51b91a46b0a8", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9795836b1c24448bacf121ecdcff86f6", + "value": 3000 + } + }, + "b246910de0dc431089632dd353ebeb29": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6bc70efc138f4c0c8d619b32623966bb", + "placeholder": "​", + "style": "IPY_MODEL_f78de814eb3a445993231ab3855a3801", + "value": " 3000/3000 [00:00<00:00, 10132.26 examples/s]" + } + }, + "c1229f0b7b6e49daaa72dc04d6c03df9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": 
null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3a28dde28ae84bfd993354af8402df68": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b812bec84a814eb79442ec2028d7ad3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fb523ac744564bc4961e51b91a46b0a8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9795836b1c24448bacf121ecdcff86f6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"6bc70efc138f4c0c8d619b32623966bb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f78de814eb3a445993231ab3855a3801": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "96c5653090984af8843866b1ee7c35b1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7f32dee0a2ab4ed8bb99042ff5daf2da", + "IPY_MODEL_d990a3318024413983e1e38369e6dfb1", + "IPY_MODEL_97372097bc6442cd8e2cd518edffc2a1" + ], + "layout": "IPY_MODEL_c36c6d5aab3b49ebb00885dc6ef6efba" + } + }, + "7f32dee0a2ab4ed8bb99042ff5daf2da": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cf45f9592f0140449cec334a3607c4dc", + "placeholder": "​", + "style": "IPY_MODEL_96118ea74ed049b1877344e3fa1e1bb5", + "value": "Unsloth: Tokenizing ["text"] (num_proc=6): 100%" + } + }, + "d990a3318024413983e1e38369e6dfb1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_09ab14ae65364f67b19aa6e0f0421e5e", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fc6de80b1d174253bf1c6419cd32ddae", + "value": 3000 + } + }, + "97372097bc6442cd8e2cd518edffc2a1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": 
[], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4b347b13474545c4be6eb5992112ca5d", + "placeholder": "​", + "style": "IPY_MODEL_39574ff310c54895b267b9bf1dcddb2a", + "value": " 3000/3000 [00:51<00:00, 70.97 examples/s]" + } + }, + "c36c6d5aab3b49ebb00885dc6ef6efba": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cf45f9592f0140449cec334a3607c4dc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + 
"_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "96118ea74ed049b1877344e3fa1e1bb5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "09ab14ae65364f67b19aa6e0f0421e5e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fc6de80b1d174253bf1c6419cd32ddae": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4b347b13474545c4be6eb5992112ca5d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "39574ff310c54895b267b9bf1dcddb2a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "31723fdf26d244e89793dd7b1a3bff02": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_85e6c38a1df843688653d8367d3de389", + "IPY_MODEL_f9c4032c491d4d5b89638ca095568e56", + "IPY_MODEL_52fe26d316a442298bca3c28f2946dca" + ], + "layout": "IPY_MODEL_625a05721ba7449bae90f00fb9559fe2" + } + }, + "85e6c38a1df843688653d8367d3de389": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", 
+ "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7970266ec39d421fa9b1d7827b815c8c", + "placeholder": "​", + "style": "IPY_MODEL_72ba9f0b55784a40b9af2b0d9a29485b", + "value": "Map (num_proc=6): 100%" + } + }, + "f9c4032c491d4d5b89638ca095568e56": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4b67bd9f67794c9c8ac5aa72b0e6c7be", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d8fc545583854c73a37467ab6c35a05e", + "value": 3000 + } + }, + "52fe26d316a442298bca3c28f2946dca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ba41ad69efdb47b193a4870c35bfab00", + "placeholder": "​", + "style": "IPY_MODEL_4982f785bf94489d86c4e80a6ee1946d", + "value": " 3000/3000 [00:02<00:00, 2492.96 examples/s]" + } + }, + "625a05721ba7449bae90f00fb9559fe2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7970266ec39d421fa9b1d7827b815c8c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "72ba9f0b55784a40b9af2b0d9a29485b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4b67bd9f67794c9c8ac5aa72b0e6c7be": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d8fc545583854c73a37467ab6c35a05e": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ba41ad69efdb47b193a4870c35bfab00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4982f785bf94489d86c4e80a6ee1946d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "16043e0b5c6c4a3f8a2c6bf80a836f7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_115e710c34b4476a9e60c59c880357f4", + "IPY_MODEL_9fd27298f3384d1aaea0e9ee9cccbd25", + "IPY_MODEL_310d507dbe244b498ba5a1ab0b79546d" + ], + "layout": "IPY_MODEL_ee02c1c324db4838aee30cc2c2e5e9f0" + } + }, + "115e710c34b4476a9e60c59c880357f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_20817f41ac254ed994a943c5a397d506", + "placeholder": "​", + "style": "IPY_MODEL_67b9724f2c8e48c5ae00eb545a1217dd", + "value": "Filter (num_proc=6): 100%" + } + }, + "9fd27298f3384d1aaea0e9ee9cccbd25": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_8a28a80231fb45d6a0e04dab8ab07df9", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_997f89d4cc0946279726b4c00f4b6b2f", + "value": 3000 + } + }, + "310d507dbe244b498ba5a1ab0b79546d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9b274c43ab604e338fe2e00449fc945b", + "placeholder": "​", + "style": "IPY_MODEL_2d4265df9bad4cd3bc136b7a1eaf3f34", + "value": " 3000/3000 [00:04<00:00, 451.55 examples/s]" + } + }, + "ee02c1c324db4838aee30cc2c2e5e9f0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + 
"overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "20817f41ac254ed994a943c5a397d506": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "67b9724f2c8e48c5ae00eb545a1217dd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8a28a80231fb45d6a0e04dab8ab07df9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "997f89d4cc0946279726b4c00f4b6b2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9b274c43ab604e338fe2e00449fc945b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2d4265df9bad4cd3bc136b7a1eaf3f34": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Vision.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Vision.ipynb new file mode 100644 index 0000000..0e0aa9d --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)-Vision.ipynb @@ -0,0 +1,5673 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2vQXvnUUsTzI" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on GitHub ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7j01DfVgsTzJ" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6dT42nHksTzJ" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K7fgQkATsTzK" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vA7IKFdUsTzK" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Mp4i13PHsTzK" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + 
"id": "GFOEZbP7ONMs" + }, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QmUBVEnvCDJv", + "outputId": "9920a66c-4176-44e9-d8a4-f6bbf2e37f48", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 397, + "referenced_widgets": [ + "9dbba5cc07984692a5cb9310c37294d1", + "3ac2b9eb631b49bab460d80e68135ab2", + "9a1b953f27b2419eb3302fee6ecbcc17", + "2a4ecf86f482417081d8645efdbe864d", + "b0f65d2867744352a1ff5fb2551cab67", + "a5b07bff219c4a33b0060c3ff3a12320", + "9d2f16071d4a429c8c9dbc588ec71b56", + "ec43f7e43c54441a8e13f42b8d21c306", + "7947ae62e3814e139e33e889e4753d24", + "aa18b4a0d9964a15a8699f461491a50f", + "4eca3234dd45475d8daa4d38ad818413", + "c45ec80780064eb698252b9f8268d18a", + "3008a3d0e4dc44a79304f6a398d914a7", + "c1e1699a15934316981527944b1acc1c", + "49a5c555942842ecb8c52d085c97a72c", + "a19a4d1df8db4cdc963d0374b0a3ead0", + "29be362f9fec42398fa8c358b0c9671c", + "008f1fd0286e4192adb7f8ac89f3191a", + "3af0e66e54f54a75906e08c9a6f2442e", + "c5bbad21d0eb4fd288dfc16c468be6da", + "07bec6bd94cd4c43b5c4f691969f9e4b", + "86f337af3a05478593337b916a108e5f", + "7edbbaac43b349be94f3a35f0c6d77d1", + "2a207602d4e04defa6294fd60cdcde4d", + "f3ef6cdedf1d49919d7fff94eb744b12", + "d6dd94e831ae4bb68380e6b1ec616c6a", + "ec11bc623d2d4f69bb514ffdac4bea7e", + "fcf6969b6ca04e74b57a1f0c5b857a3c", + "a63ad4bb24d34dfa9d7758bb17c76656", + "aa962a031a974a5c9c0995f01ee44d3d", + "86b32de59f744db8957f6830aa78882c", + "b6c332d8dbf149b38350f20728eb2d1b", + "ef8855bc019c406a8139203673af4a65", + "d1f0e4cdafd14357a9cb8dcac07f9da2", + "671fd02e35244c7ba8253a9d8a214bfa", + "b5c12d22f2c8436984be989f04dc1b82", + "329aef97734640fe858eb10cd898c1f0", + "96406b4153124e8ba124145e30bd74dd", + "8d320330f70248688d5acb3ec09a382c", + "6226c59b27fc4cf38d62ce5c7c69e375", + "344ea1b499e94bce91ae471cb39c6267", + "686d888d3aa24aa1acc943a78e7ab394", + "cb9060f87c9a44b18b1370dd8ccfec6a", + "5ee05bbd6830432a9f1a92c71e8f870f", + 
"acff987614e2454182fdd285f84cbfe2", + "b2bdbe39bdde4891958840f88e74ed7b", + "cd8bca8f57c84bfdaa8f76002d01ca50", + "e9c068d524664334b01fb6f1db19295e", + "8dac8e16ac9145ca9aae00456f0b9c74", + "09d2ce198cbd4148ad721aa825d62d6d", + "4789199cf5964285b8e550ad4fbed48e", + "3b8b062634424ea69075e48a06c6dc8f", + "a8637a53374d4a37b3b410e4408bf4ee", + "0ab3c43906724b08bcf869a5655bff30", + "aa515530e9494a28a24f90fdd9204a96", + "a60cd51dbd1d4404989c69a5b7e95f96", + "4f47d85c7a2c414e9cfb8f9e6e87df4c", + "e237c0a6a6814576a3100f68b329d4cb", + "a5705ad4707d4a3faacdd2d37eb63fbc", + "42bd62697c1649ec98f504930e56c396", + "0661e6df71fd4df88a185d9cadab56f0", + "e615ee8b03ce4bd494315a4ad2df8b9a", + "766ffc619cf04a378d7c1186c74f87ba", + "efadf09207c0427cb0d29f48bebe2ac7", + "d2a87fd0f3d04145a5bd9dd1ed75395b", + "d38821cad5ac4f62bdbbfed1bbd3ffb6", + "418feb293e1143c7bd3a057697e02b88", + "7ca38ff83b274ddf908d1250c613b590", + "526d7159b777407ea1c3fd34d44024df", + "28b52765c091458d8c411a5aa3d3c4ed", + "0501889c836b49bf90e4a9f3cca16b44", + "df2390fa32f0482a9b7d3fd10f4d19b0", + "67cfb5f857da465cae4575ce0e12449d", + "1dcda7554eb342cbb660bf8a78272ecc", + "1d23ab2873814345a58a8c12438196df", + "b2a50fc0c2a44c82a7015f11713c6181", + "8e47b9fa36ee4c128414a714d3ae9547" + ] + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n", + "Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00\n", + "### Data Prep\n", + "We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions.\n", + "\n", + "You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LjY75GoYUCB8", + "outputId": "e5477a1e-acac-46ce-8c00-bb48ace0df50", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "87fb532c204143a6957b498382749619", + "76e88b170834473aa4e41f8ecede6545", + "594a65dcffb442cb9a7d264e17d26bea", + "5eca7ef2432f43fc9ff2bb88a017c0e7", + "4a703d9591cf467ba73f98a0bad9b169", + "690042bf19924111809748d2f3fdf5e2", + "19dbff1fba52476aa320072b0c5ca0ad", + "446de647345c4d96a4f0bc20527a9239", + "24c3d3519e5e4203801f6db21b174d75", + "b630883c3e984eea892ecf0c23942c74", + "5f93397ffbde4cbf8c494f0e6495deab", + "f04af098beb745dd83eeebb4a129df16", + "16642b8f9da245e58595e8b59a898803", + "1b39506c9fa943f18c99e9a9ab29f2e1", + "5c14213926ab404a815a6a6fb0b10e5d", + "abcb4e51c128476b97a5cabcb51925ae", + "99c7aa8c8a96453dbcc5ed8ae8d55e39", + "94b4e20a08f741d89475873612d9f014", + "eb8b4370227e4969a1c8a2649f219b6a", + "0d7ed4045cb74d75a90e6ab2304fc9f5", + "575c74bd5eed4c0bba0a5741a32e8595", + "3f07f2d9645949f688ab4d577ae4d0e7", + "749cf884ca664718be7e2279d3609193", + "d820ee3337494b92869ff7c58133f785", + 
"a71b8168cc54466bb4980cfee5c36b43", + "0aa9fbdf33a043bdadb42e7c9ac60fe1", + "ac278d22afd8443ba318472ee187b9ce", + "7da52b5a12eb4e22939cab415d177fdf", + "f33bb1bb79af4929beee2fd8ac8e1762", + "4b17c47b96a54844bf5e197d0cf497a2", + "1f52b1d75d2b429b957f14dc74919e60", + "235f72da403d40138566ee21c219d3a3", + "b25d1876d9504dc2b0ab0cdce4dd1545", + "b57b52f00850496e8def6847da587a40", + "7dff2ef63de54684b7b61ae037d051b1", + "b15f4b8f28094b0cab0da27ed817e261", + "49baa628d8b4481ba6dac63b13e80cd6", + "7619a6b05a5643158620b12646757974", + "b3283fd17387482f9f127a8ff8089f23", + "08eece15f66b4367825ec48c7e3961cb", + "c49366b385fa41b083bfb208d54add54", + "d4e5d57774cf4c9caa352a1ada066fab", + "dfa1df1b47d647a3a4385b901ed58f57", + "6e3ec422ca96449b81f7358fe050b380", + "abdbeadfb7f24651967c24fedd7b0f2f", + "250587f17d6845ebbee21a51597382b2", + "2197baef18b84247a61bd69563cbb695", + "843fe7d70a2c4085beb423d355aeb472", + "a1fa450ee45f44dcbe67b73cfa56df3b", + "05811e6740f44784a58660daa042a371", + "fdfc87dd989c4f39948610991bd7b5d3", + "ec238c8a4a624904a8629adc272bdf15", + "e97dcacab83d4b968bae440040e1513b", + "46e1156b616243cabf185557950e5f07", + "896090ab3ee442f3b04339560948b222" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/519 [00:00" + ], + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAUAAAAAyCAIAAACib5WDAAAYrUlEQVR4Ae3cebSuUx0HcCSFEBHKlNZCQi2zureMyTwrY2UeQlSaXNdQxjSSMltFSJF5bjCzDMuilUUUIjSKotLtc+83+z73eYd7znve877n3vXuP56zn/389m/av2FP75l10qRJswzKQAMDDcyYGphtxmS7P1wPgl1/9N4nqoZ77I/4TOvA3dU+bP/5z39mnXXWPtnSgGwfNGC4lf/+9799oD1kkjOnA1N6tD9kPUwHELbZZ5/9hRde+Oc//zkd0Ol9FgteffXVsR/apyfHTP6dCT3zzDP/+Mc/ZpttTPvImGauYxuh9BdffPHvf//7v/71r46RpCNPM5Zc7uijj37729/+ne98J9m4Y7Riwete9zpPKb1jJIOOo6qBjPinP/3pt7zlLddcc82///1vBjCqFDtHzhxnpsIraP/nP//5IossMvfcc3/5y18mHTfuWMZky0022YSKf/rTnwrJSHSMTd8//vGP55133qabbvr888/Dg9uOsQ06jpIG4sCvvPLKxhtvbNzFboS48SiRGwnamTADS25i5+abbz7PPPMYg85j2yyz8F7J/IYbbrjyyiu/8pWvwPmGN7xB/uwAp0HSy7xggw02uO2222688cbXv/71HeAZdOmBBpiQYoAuv/zy973vfccee+xf//pXa6gMYg8YGAaJkXj/WOub3HjVVVfxsb/85S+m0JasI2EyqVs4gNACmD8rI0Go+9/+9jcYllpqKalYZZCBR6LPUe2blHv77bdzp2OOOQatkUzlRonV2Yfh62MelDNImBxYsJxzzjlly66wbCrO8cRjyA3DSHCK6/POO6/JM+NQHwmqbvUl0RjhpFsSTRdPBnG6UovaIFdffXWQpk7TRdsXgJlnCk3XfEyMPPPMMw8++OA55pgj8XLkahUXIMmojxwbPCxj5Hi6hWG6dtwtQmMHD5GHKDWwTOLG1JBVNdmJAzNoCaSKZezUX375ZRqXMKeM0VDHqZf8D9F0esOSaCK3ZBrfG4pjgQojUYYYkU278DxE4N5L14kDE6mLGzBU0xXtWABT33e/+13PDTfc0DOq771O21MkbFJ6F91migqHp8Z0sTez8sorW94LK1qqnAeg2tL3urgc1TXlxCcATT+Vxixrv/SlL33xi18kcmymfJ0RK00cmFTkVFQoRUmLp0kpIR3STJw4EYB6bdSHpQJ94ZycJae971IolvEoLWGpFRUILS/NdtZbbz0w7R0YqqqNtsfciuJw2yPs/PPPb33umdfhIqnBFx3W1FgDa3wFjw1jqlLdrqdtwIW3kQxxI9HGFvxPMbfJ9la+ptEzzGhnDEa2OqbVrwB8AlBspqAqFYKAIReYPAvyAjPDVZo4sB0gCVZRIbCSFk8LSzJffPHFRx11VFXdnYlNiXBKAgoqBUmhWBYepSUsFchSMTYYlknOPvtswwNh+dSqAlWxUTCtMLfq3lk7S3388cfPPffcxx57zFM9cbAzbHoRHOe23J9++umXXnqpqsamOA2f2SN31fGCCy5Ya621jOl73vOeww47jDYwEyvXF8I//elPKto1NsWWRthEgTb2oHsbDPifYm6T7a1QSaNnJIr3/vjHP77++uvBBFv5Gj/0CUBTHw6HCRNO4J3nX3311Ysttli2poItmgHTJgQU9sZOZarKwpOROO200/785z+TiiRbb7219lxGIZvbEXvsscc555zzqU99yjavFqrvQBjIFR0///nP/+hHP1I/5ZRTHJozAght3N9///2sxwbghz70IWBarrvuOpaEpQ9/+MNrrrmmLo3G6uDXIY30izfjAb4Vb4z41FNP/dnPfrbffvtB+PGPf3y++ebbZpttxo8fr2MJHK26d9wOuSNl04TDDz/8d7/7nfqee+7ZMTYdyXjkkUf+5Cc/eec73/mLX/zC0Gy22Wa01CjCZI1PmvTkk0/+8pe/3GWXXfR985v
fTN7vf//7H/vYx3JjIZzcddddAjTH/tWvfrXoooti8o1vfGNoBaA8jRcvEol23nnnueaaq7TXKm0Ggr1973vf+/3vf6+7Kclee+1lUsDfTj75ZPEI/o022shwE8ftl69//euXXnopKfQycG9605vKkGl817veteWWW7LJj370o0UDNQ7dw1lppZWeeuopA7HPPvuwE6xO1su0mqnxP6Zfw32epCI55wnHZ511ltDOJahVi+QmYwhd6txJF9qpdh96HRXAa6+9ttj/3HPPPfLIIwxF1MCAAeC6biyiwm4SSjzTgiVf0QVZJRdOEp4du/tkjKsApa6jXS5fP/GJTyDhvqtPv/nNb9T333//Wscg4WyChbH3tUa3oO19JSJLJjjHP1ZlVG7geBmTjXymhaplXZJedtll4XnxxRdPxaAo9L/ccsvtuOOOvOjZZ59997vf7RgcQKM+IQTv0yGHHLLDDjt89rOfbaUiCldCpfZE7uGHH+ZUpHjwwQchRBfMgQceqEVoMNxePR2/uf2irgswU5jqkFUNoGigkUNXA2DA6qGHHhpUnjXNuLyhsdgYoQQga2aNjUrQ2N8ydeJKHQpL5VTS4Gc+8xkmLgyLcwsvvLBP0pTrB4L9Cius4DoRJVbDvFeabVpIOAX3/x8gUfnhD38oOggWCy20EO8VbumRpnxdYIEFjAE7A3bSSSdpEZv5ueSPJXV0a0EdDOx33HGHp15VctV6SIgLBGSmKi67alx66aWdPIlQ5t4+1RhmUhnjKqqO6yxDoahUOsaDJX3vvPNOHoh/bH/kIx/BqrmMdvirmOmHxi655JI11ljD1SIC0hIM4pcn3QabRnMTWdr4Gvq3vvWtPPMb3/gG95Npq2pRhxM5OfzXv/61QSRObVAKG+bnSnmtMgbPMsssI4HDb0QwYHDNbO+77z5Ts1133dXMCLzuku26667LhUAi/Y53vKM6ZDpqXH/99QlIAzihgUYOkdPOjRWVSNGomSqH6sB4QVRU+9T/V8yVYgzUr732WiKZjJV2NxmoJucNJld8mGoCXGCGXkkYk9ZM3qTWP/zhD8zFPFY7nDBTrtF69NFHGZBZMeuha3nAJMrX6L1Kjma1szzzbXacpK2xClPqIoU1gtD+gQ98wDQMGPyeN910k8FgrCDDYWiZgIj9jJ6xYq+KVr1fBZNIe2KSoR900EHGxcycHbP18lVFAaPQKiu86KKL0pinzOYXGslykUW7GabwLUdZYpxwwgnU0qjzqAiMcN/UEoItsc9kVdGS1yoD8Gi3LYxKYUCG5IfA8pXhielm8qQIrQxEbcgCTA+rrrqqugJDjcMI8slPfjKzLTbTSjP64seTwZhpM9Hw4zmmyjRrYMKIf7feeisWb7nllrvvvptapSkyyIfcmPzmEtXEG717mr+54suAiO21FIGcEi2e4RQXPBNiWQls1sDg5QHZmO9hAAZgeLCoM49ijg888MAqq6xy/vnnW+zla0FeKtphM4VOdi3tpRLRzBsF+wMOOAB+S0H7GWhFHPN5TFpiGVrYwq1gbwpgb+zEE08kCCTgC85qvTT2poI9xexup512MuWjRiZuZWsBabBkVMo0VSnMkMhFbqJtt912SadayjhGENKR1xIJTsq3bhJhH3roIWAGSBQzP4cWTpAq3Nuc5YknngDAWyDUvVAMTkrTkqeWVApMKto5pDqHNAomQcKrqT4qcPpK/1xI+vVKapAoqtSGDKRimiYKQ2JSAGGNw3RncpEdQqWmGZ80ogIbo0L6c5/73BFHHCGC+E0LrtI3zPf9OVXjZDMAYpKJqFBN11rIwCtwPG7cOLwSTJ38kTDcgzF+VjKkVY+O8gkYczEYxYGDRCSWJ0V3eVVfc/JvfvObjqYMJPwyfHZZ9t133+OOO+4LX/jChRdeKCTrizQSQV57winEEKHKQGC04MSg2uSQMfwswTrZls/uu+8OwCcABuZtb3ubEzJgCy64oBZ2Y+FtN4Wfy8Bl1At
OkMBqIte46u5raHkaIC701a9+1RqYYWULyg6F7UZGRmNOCriW+SeVxi35m0ZSiHHBE+YJbuDwSfMkIqyT4cRuK2GE7Ir99re/pTrxNMeHNAatOAgYNiNY80yYo5xQF2jgN+J55RJKNAOVinP7448/3tTPBuS2227LA8ULjEXnrtbZEtc3XTy1e9aGLI34YQZkwUMjh5GUQoQ5GOhQwKppplBRiTmZj5jS2xtjhzbV0IW/CtbPOjlTyK8SLeD4teZJzB1/5Wtp76xiGHQ0JJnpBcl73/te29rqJkue9G5OmE/qlMWksovAS9NefTIgr1ydXYqUWE1Lgcmr++gE4beClAl8xgYMlnQxR0qMkGyNkPavfe1r4KMKXaIB7amYkVqzmX86jfAcYhEjUoYIX8CqvZZddlkugSVukKmdVzzzWxaJQ5+owrxUI0vVYrFgL5ekitcUX8kiZsW1NEZF9vzV2ToZaSD3+IVazgYy4ltZyPb2Mr1WceZrTTk8SiHLkksu6WlSE/yegVdBSLt77CuuuKLX4MzAkctXMgYe2+G8NmSxDbsndCX4gm/kMORMKEwwQ6KpZhAqpXCYfbUsuEpjwPCDVc/Sq2eVqRkYT9R08803cxgHDMYvhQOvs846+E7kBlMtGGUrLMYsDkz1kzpUpBLChWoVr6h4mj8vv/zyZGZwBttWlrgL3iskNj8TNbTo++1vf/vee+9Ni741EuU1DokKfkpjtWJEfeKlxlWwNx7yjB0UacoK3EQDP5b6ZtGcVsfddtuNXOjapDFNxXkw58kiLRmIDzJKqNJKXTt+1PGWFqR1Zzp0GzwFUqXWUn2tdTQW7BV7VvLYphbSiTgmeygCFu+8WhCChJlicxgD2GvUmOz3wQ9+UAt+ggRR4RKYpCqW6bj33nvDKeB+61vfOuOMM6IcIth6hBM8zUAVbvMsygFGdlwhIcfmNem36MQngRt152omYqeffnpMUXuKqTu00SSVouUpE9aGLEJ5hge0zPtqHOaTdiUiN9XMa5Qnj6xCt5YV9sBNdug2NlxgVLCkVFt6Vw+LnuTxFJDQLo3xyYTMptmPMICBidCWWLXCSbQQGwyFAjYMjMPqhQuFSkbXjCuoRFAbV/kUljJ711FjYPI1z+DEm2mYwTCo2tGqwqDoFYcRLSdhVtRSqB3XnHVfccUVYEgKxpxKPRRzTiNva6mh1TLcIvbbtAs/w+rb2DGTQPOO4IkaZZ6wLSzyIoJEY056yJXNqhrdjEtNRWC06CKNq8PmmWSlAieX81WYYNxa2hfTKKUpTNgz+riFsPYPGPLV/FlSZWMw4FZFPgDcOGQAHClzTnoI/005hDZ2CL6NZnyFBLBpXSFXs4G8msUIcJ7h0LNnZaqv8gEx2wzHdohAaMxsx2+//fb2QqRB2YbMykg4y3gYe0tKOPm2nQmBPCdv3BUV7oQBnOAntsXfDJhX1IOhykNa7IFRsdAen6+BFTzCpIMo2xumGPZpDJ4NNh1zrA1/8Ni/QUJ+CzbrMXt4hZ9CPV+9CjppDIeoR1EcxlpA0VcjkWF2NGJhKdBoiX0Ac6qZRWMwaLGCZYWezHerrbZq7Ii6+YJzWr7hq6M+sSZswJzb4KZOaREuiUxSLYTldeEwXz2LimQw+0BwCrLmqMJEbNT+XzKnWAY/DMSR0GT4iRMnYlhjwabiVYngk933gAPKaw0Mfqq2N2HyzOqik8DoosJgnEeYXefVXK/VkAGQ5+V27CHdnsPI1UYzIUcDZamCtyrz6mmRDLDkWVpqYKP3OtWBjahdEBZseaPCPjzFM2Yk/Fv8YCIiNeWGJK1KsZWozGUMokLoaMGpBvwFoUYUw0DpVb42pZ5GmHkFG9W3KZ/BZrQcIaDC5YRVkKw5dk/ewDgstRXkE5wk0p7kZiCrmANsciHiWNqZiEKlUYmYtuJMzgUpIcMJLcaMsW0kSCwfZDMVyQT/whaFOJFmcCGhxam7mYgAauamY45/SkcWDxIhgR8
JkcgZkhbc8gHLBI4qLOpos127wsd4mumoRstadEmXT3niXIUUOWUJq0jAKdfpZXExfvx4yIFFRrGVk/skfCRL18YoJNz0UvSqUdSiBBUxlfI65cvkR7rYTlPyFTO5hVIbspAWak211Dkw+OlyCKaVZoJQLIjG6Kcp/zDgBJineo/LVAfuAeHIb0/CJlaVXEyn2lKt02NUWW0s9TL8soHYDLKVlttQafUpqOx4Sd1GCNFwgqgKV3//+99vxc558rNv4U+7osWc0MF1+JTzJVIhI6sJXpTwYchDwjqflSiZvevldBfyfJ0wYUKtY1MZccWLINExdB2/ueSESqxZI2O1xKUo9UYkNT0kf5qFwpn56k1TTsvjabFXocQNZLEmgZjsIZ1nRgfz4T+vVYBavdbd13Thh5JwkUJ7jdXgMa/BqlHwOnkYpjDTnsOCp1Ez6W7cmRYlwNmosdDt43MaB54i8v8fRQV57wqLlCXnuP0bQ6cOwxM1FfyFh9LSvhKdZlrlWAVwGy0j52tIxP60KFUS1ZagkuWYRdWBY0kylXZTO93t96iXFakkoFfQOiHnzElQWri616RQnGBDwT8f1m7TBSHtkMfNQrexoxZgOEyJIVKvE7JI5GlV5sYbsLiiPG9qPW7cONEk3T1rRa8gLHqQyeVeYFqgsrrRXUUxkREsTByEGAwQpIZt6K+FXGMX/Gi02LH9Llx6rRHySkDbGQIWeQGHVZXpcqgvQZpqRjsM9G9kM5rhRGNjCXBj+2i3TOPAo00MfsNsDGJtXSEXnfIcWi6htyuYIQnyRgdmH77KKg5gpSavll7FgUnHyhVrBFu49i3j5NpNOE2n9XXY7hk8Zte2HtAy740U/NC61w1nZqGX4pJZtaMW3duUpvYkavA9qaZNx1afwmojWh5i1dPGsiHUN91bIW/fHiVwJDe0nJIAromfV5P/Mu2vIhwKh001E2GH6MBVir2s99qBuysbFbMM2rcZZg2cZNVoZB0TbeXAjQjtJ/G97BSwmNVWW+0HP/iB1antJROwmJ0rfmBYoQWhe7yQJJPbwdpiiy28OpwEkHNXW6lZz3Nmy85axzYOk5QSDqc4zjTzi7QP61lDUsU/LDy9AR5JpKhxOEM48Owso8eFmth0t4jaNWVS1mYcw46ousOkbiFvj4cgyDkvtafiX89Ks7m96CajzOzkUIHB7w1sQY0bN841I7f2uLcVI4/1CQZPe/5LLLGEug0Y57pSNzxuiQlJcgsAedv9qmrHNqeO1fPVHI3CUEooDkv/NSRV/NB2gLAwM6wKQpwTM61kj7/VuB06hz0TZFhSTxe4Dw48LOuZrgAAZGBWZcu365jbU2dPvNcUmjfa3zbLlYGdgbl/ixO+J8GKJtZmiSmZ51dxsjZIfHWCoovixqJ9Y3dI/TNxlipAiErq1V7qjWZaA2j1ikSrT521dx1hKzYQah+aWzn2EDkcIlgr9vrV3qfrI90Ql3HzEJj8NJ8b5ApKLT90g05zHPyTPdn2lFT90IL3urWS3+5Y9DoE4mN2m++55x5enVNZgQbPKUkXuAVmi9VPLJCRBFwCF4l05MNaIs5rnSb/1bE5Q4PW0dFAFD523ZvRzOjFhq01Jx9WuitLlpqOkdiGtSjkcTyEVJz0xmbKVX6nRz65+MH9+K09FT9b5b06ZqOlsKc7SDD+HwUk7soDCLnsqPN2MErpMqj0WAPRv3HhvWP2GKkPU+gRBkqjSKFykc0eZxiuFvABN5alMo0dzy3bcOUKR5DHUROMPWVaVyMkSb5t/mau6zKGnzG7deTQ0qzYRWsZdcKECUwhvQoVUujilrVr4QKEnze6pBH8bmXI5+qBKV0GlR5rgOsaXFsYBsJ+RI+pD5Uc5maskrjo2gAJXZDI/5SSytLeXVmS0nM9QIyAvJZIG8nJn25uaXda1vh10DKjaCDmZLhd+7Wi6foJZbf0MOOtgaUyyhUR7RtJuTJYrrPSSC3LDTWGtYaTeNFyRGQ
T2LV+u81W3QY1PXzCQCkmvT4BkHV5viQMzNfW6Cf/SBMkGM8CRpBCojQOKr3UQLzLuLij4lqru4Nu6RqjrhtYF4TqViToC56clCLNE0aJAZi5k8Hz0zxRw2Wv+G1TcubS/gFAdbXcFGzQOMY1YMTtR/h/YJZFthjVGcDY5HlWbHUhDPQDBS1nxVsqo82Febud5/woZ7RpDfD3VwPCtEVQTgT6y0l76jOwAxNM9OnNrAYhYSKHOu0VKlQPBaw9ksHXsaMBntzqhHksMDljO3CPNdizeNFjuQbkmmogk9PeZIimDAylceDAQ9HSAGaggTGqgRlvF3qMKnLA1kAD/dDAwIH7ofUBzYEGuqSBgQN3SZEDNAMN9EMDAwfuh9YHNAca6JIGBg7cJUUO0Aw00A8NDBy4H1of0BxooEsaGDhwlxQ5QDPQQD80MHDgfmh9QHOggS5pYODAXVLkAM1AA/3QwP8AGMg7qICuIqsAAAAASUVORK5CYII=\n", + "image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAyAUADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iq1/f22mWE17eSiK2gUvJIQSFUdScdhWYPF+hGGxmW/Vkv8/ZCI3PnY5O35eeOfpzQBuUVk6DrsevJfywQukNrey2iyMQRN5ZAZl9t24f8AATWtQAUUVkxeJdJm1dtLiu99yHaI7Y2KeYo3GPzMbd4GSVzkAHjigDWorOTXdMk1afSlvIzqEEXnPb87wmcbgMcjPcU7S9ZsNagefT7gTxI5jZgrABgcEcgcg8H0oAm1HULbStNudQvJBHbW0TSyueyqMmquhawdb0/7WdOv7A+YyGG+iEcnHfAJ4PY5rnPibIZtF0zRQpb+2NVtbNwoBPl797nBPI2oc9euO9dqOlAC0Vn3Gt6fbal/Z0s5+2fZ2uhCsbMxiU4LcA98DHXJqbT9RtNW06G/sJ0uLWdd0cqdGHrQBl2HiJ9U8SX+nWVkXs9Obybq9eTaPOKhvLRcHdgEbiSMZHWt6uO8Cf8AIQ8Y/wDYfl/9Ew12NABWZea5bWWsWOmSRXJmvHKRyLCfKB2O+C/TOI24GT045rTrmfEssq634eaOyvJ0tr1p5nggLqiGCVMkj/adeBz3oA6aijtRQAUVU1LU7PSLI3d7N5cQZUGFLFmYgKqqASxJIAAGTTNL1ey1iGWSzkZvJlMMqSRtG
8bjBKsrAEHBB5HIIPQ0AXqKr3t9a6db/aLydIIdyoXkOFBY4GT25IrCsvFE9341vNEGm3YtYbaKVLrygEJYyZbdu5VtoC4HUNntQB0tFYq+K9GbUxp4uyZmna2DeS/lGYDJj8zGzfgdM57da2qACiq1rqFneyTx21zFLJA5SVFYFkYEjBHUdDVnNABRVe+vbfTbKa8u5PLt4V3SPgnaO5OO1UJfE2jwaRbarJeotjdMqwTbWxIW+7jjJz29aANeiqP9rWraqmmozvctB9oZVU/u484Bb0ycgDqcH0NQw+INPutKudRs5HuYLZnWURRsXVk+8u0gHcPTrQBqUVVTUbOXS11KO4R7JofPWZTlTHjduHtjmud1Lxi1v4k0PT7GymvrTUN+65t0EiDHAw24D5Tkt146c0AdZWBa+JH/AOEsn8PahYm1nMTXFlMsm+O6iBAbHAKupPK88cgkVrTajZ297DZzXMcdxOCYo3YAvggHGevUVy2u/wDJUfCP/XpqP8oaAOyooooAKKKKACiiigDi/ilfrb+CrjTkuEiutWePT4dzAZ81wrnkjgKWJ7euKxNMtp3+Kmm6bc61Hfw6Jpsk0SpEkflyOREFwpOSEVuvIB/2q9NZFb7yg/UUBEByFUH1AoAr2Gn2ul2KWdjAsNvHnbGvQZJJ/Mkn8a8/fVvExkYi48QKCeAPD8fH/j9elUmB6CgDIj1KS08JtqV55xeC1aaXzofKc7VJOVGdp46Vw2gxx/8ACV+Hre11Y6nb2tjPeXUStGYLKVgoDgoB8zF5eHLHDMeOtejajp8Wp2TWk5YROylgpxkBg2PocYPsamit4IFdYYY4w7F2CKBuY9ScdSaAPJ9cldIJfiNpEaX11p2qzKY4Hz5toALdk4908wdcZJHBr0vw/pzaT4fsLF8GSGFRIR/E+MsfxYk/jWiqqowoA+gpelAHAa+x1P4xeFdOUhk060udRmTGR8w8pCR2wc4NdF4z1m48PeDdW1a0jWS4tbZ5I1YZG7HBI9BnJ+lYugw/2h8UvFGrNl1sobfTIXyCAdvmyAHHq6ZGeuc9q7ZlV1KsAykYIIyCKAPIdD8T6ZpnjDW7/VNfl1aWysobSGSNQ5lkJDSrGFGPmkaMKo7hh0XI1fBV3rGlweIdCOnww6mhOqafYTTEIsVxlhGWx/BJvUkcZ/OvQ0sbSMIEtoVCBQuIwNoXO0DjjGTj0zUhij8wy7F8zbt3Y5x1xn0oA5H4bLAfDdxP50kuoz3076n5qhXW63YdSoJACgKBg/dAPetS4u/FK3Mi2+kaTJCGPltJqcisy54JUQHB9sn61leBP+P/AMZf9h+X/wBEw1z3ivSlTxFGttqivq7alDqDXcoCHTLQYVlZ+6MRtVD94k8cE0AdbNqvii28vz9K0KLzHEab9YkXcx6AZg5PtVaLxJrdxftYw2vhuS8XdugXW3LjacN8vkZ4PB9K5jxvDc3Y1bVvKtZLXz4tJV5kZp4kZ0RzbgjaHLO3zc8ovPGBt6ZoWpJ47a+n0mK3062e5+yGK5UjMpBklZcbjI7AcZCqM8EnNAGwL7xac40bReOv/E2k4/8AIFU73xLremKrX9r4btVYEgz626AgdTzB0GR+dcnrtwNA8SS+PyJRp0d8+l38aJkSW21U345BKzhvc5x2q3eeEb1PCOlWml6LB9quYJRqNwkqQzRLMA0yJuXGWPy5I+ULwM4wAX/FWoXV3feGNK1Oay0yG8kmurm4imWQJ5O0xrHJIgAZiwO7AIwcetY+jardeHkuvFUt2kuk6vriW5e8IWV7UKIIplORk7lycglkG71r0mHTLabSbW0vLG3ZIo0HkOokVCFxgbuuOmaq6n4V0nWbqW41C3Nw0lo1oFkclI0bO4ovRWIOCw5wAKAMGa403VfiTdabrUtoxsbaI6fZXBH7xpA3mShW4cgKFGM7RnpurdmsTpepaprkYWRTp8USW6rg5hMrcH38wDpxiodX8HabrVjp1
leNK8Ni8bIWCu7bMYy7KWB+XkqQTk810NAHi+mwS69b+E0t9X+0ahqN6mt6jBbqn2e2Vf3p+RR8jbzGvJyxLZPp0d14zurv4eardPeWtrqNlff2beXNsS0cOZljaZeSQBG+8Z6Hr0rtLnQ7GfT7qzji+yx3Q/etaHyXb1O5cHPv71Fp/hrStKup57C0S38+3jt3ij4jKR5C/L0yA2M+gAoAXQbLRbTTIm0GKzFo8ahJbYqwkUdCXGd31JJ5Ncnq914oPi/w8X0jSllH2ny1XUpCrfuxnJ8njj2P4V1Hhzw1ZeGLS4t7IsRcTm4lJREBcgKcKiqo4UdAO571oy2VtPd291LCjT2+7ynI5TcMNj6igCGSGe/0WSC+RLeaeFkkEEhkCEgj5WIGevoK4H4ZC58Q+G9AvL2J1s9Is1gtVYFRLOq7GkI7hFGxT6lz/dNemU2ONIkCRoqKOgUYAoA47RNQg06+8a6nq8kVstvqIEkrN92BbeIpnk/3icepNZXg7U7mLxzqIurGTT7PxHH/AGlYW8xIcNHiOTcP4XZfLk29geea7L+woRr0+ppJhbqFYrq3ZAyTFD+7fnowBI9wRnoKuX1o13bOkUvkTlGWO4VAzRZGCVz3oA8xtLzTVh8PaPqF7bQ6FLf6kwWZ8RzmG5KwQ7jwV+bdg9fLA56V302gQnVdHu7XyYILAzfuY4wAwkXHGOBzz+NJP4V02bwqnh1IxHYpEsSgxpKcD1DqwJPOSRnknrzWjpunwaVpdpp1tu8i1hSCPccnaoAGT34FAHI+BZdL1wXer3Elpc6691J9oU4aS0COyRxhT8yAKB6ZJLc5pPHbNBrvhq50wtJ4gWeWOztSP3c8TKPOEhyNqgBTuGSCBgHOK3j4XsG8WJ4jYE3qQtCmERQAcZywUM3T+IkDJxWRrv8AyVLwj/156h/KGgDr5ZkggeaVgkaKWZj0AAyTUOn6haarYQ31jOk9rOu+OVOjD1FWaOlABRRRQAUUUUAFFFFABRRRQAUUUUAFZ+uWF3qej3FpY6lLp104BiuolDGNgQeh4IOMEdwT0rQooAyfD2hroOnyQG5e6uJ55Lm5uHUKZZXbJOBwAOAB2AArWoooAKKKKAOb0nQr/RfE2qzwSwS6Tqk32t0clZYJ9qq2OCGVgoPJG3Henz+BvDNxrY1qXRrWTUhKs4uGBLb1xg9e2B+VdDRQBzWl+C9MtGt7u7hFzqEchuGkZ28vz2JLSCPO0Nkn5tueBXSModCpzgjBwcUtFAGXD4c0iHSJtJWxiawmLNJbyZdGLHLZDE9Tz9ea1KKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigArmYdB1C78bHXtUlgFvZwyW2nW0JLFVcqXldiB8x2gbRwAOpNdNRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAH/2Q==\n" + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "dataset[2][\"image\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lXjfJr4W6z8P", + "outputId": "3d2c1834-b446-436e-81d9-0375434fd9a6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'H ^ { \\\\prime } = \\\\beta N \\\\int d 
\\\\lambda \\\\biggl \\\\{ \\\\frac { 1 } { 2 \\\\beta ^ { 2 } N ^ { 2 } } \\\\partial _ { \\\\lambda } \\\\zeta ^ { \\\\dagger } \\\\partial _ { \\\\lambda } \\\\zeta + V ( \\\\lambda ) \\\\zeta ^ { \\\\dagger } \\\\zeta \\\\biggr \\\\} \\\\ .'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "dataset[2][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKHxfZua1CrS" + }, + "source": [ + "We can also render LaTeX directly in the browser!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nPopsxAC1CrS", + "outputId": "ec105cce-33b7-4bc1-f4db-45de7e8befb3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/latex": "$\\displaystyle \\sigma ^ { \\mu } \\frac { \\lambda ^ { a } } { 2 } A _ { \\mu } ^ { a } .$" + }, + "metadata": {} + } + ], + "source": [ + "from IPython.display import display, Math, Latex\n", + "\n", + "latex = dataset[3][\"text\"]\n", + "display(Math(latex))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K9CBpiISFa6C" + }, + "source": [ + "To format the dataset, all vision fine-tuning tasks should follow this format:\n", + "\n", + "```python\n", + "[\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + "]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oPXzJZzHEgXe" + }, + "outputs": [], + "source": [ + "instruction = \"Write the LaTeX 
representation for this image.\"\n", + "\n", + "def convert_to_conversation(sample):\n", + "    conversation = [\n", + "        {\n", + "            \"role\": \"user\",\n", + "            \"content\": [\n", + "                {\"type\": \"text\", \"text\": instruction},\n", + "                {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + "            ],\n", + "        },\n", + "        {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": sample[\"text\"]}]},\n", + "    ]\n", + "    return {\"messages\": conversation}\n", + "pass" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FY-9u-OD6_gE" + }, + "source": [ + "Let's convert the dataset into the \"correct\" format for finetuning:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gFW2qXIr7Ezy" + }, + "outputs": [], + "source": [ + "converted_dataset = [convert_to_conversation(sample) for sample in dataset]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "The first example is now structured as shown below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gGFzmplrEy9I", + "outputId": "b38a50f0-7526-4be4-8a1c-97558df918b4", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'messages': [{'role': 'user',\n", + " 'content': [{'type': 'text',\n", + " 'text': 'Write the LaTeX representation for this image.'},\n", + " {'type': 'image',\n", + " 'image': }]},\n", + " {'role': 'assistant',\n", + " 'content': [{'type': 'text',\n", + " 'text': '{ \\\\frac { N } { M } } \\\\in { \\\\bf Z } , { \\\\frac { M } { P } } \\\\in { \\\\bf Z } , { \\\\frac { P } { Q } } \\\\in { \\\\bf Z }'}]}]}" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "converted_dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MsRPBIb0JJ6c" + }, + "source": [ + "Let's take the Gemma 4 instruction chat template and use it in our
base model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "exoDVEvmJN-6" + }, + "outputs": [], + "source": [ + "from unsloth import get_chat_template\n", + "\n", + "processor = get_chat_template(\n", + " processor,\n", + " \"gemma-4\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FecKS-dA82f5" + }, + "source": [ + "Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vcat4UxA81vr", + "outputId": "d033736e-e80f-490f-f189-6e09ea98817f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "```latex\n", + "H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\xi^\\dagger \\partial_\\lambda \\xi + V(\\lambda) \\xi^\\dagger \\xi \\right\\}.\n", + "```\n" + ] + } + ], + "source": [ + "image = dataset[2][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FeAiMlQ71CrS" + }, + "source": [ + "You can see it's 
close but not exact: the answer arrives wrapped in a code fence and writes xi where the label has zeta." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up; for a full run, set `num_train_epochs=1` and `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!\n", + "\n", + "We use our new `UnslothVisionDataCollator`, which will help in our vision finetuning setup." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "95_Nn-89DhsL", + "outputId": "13282f82-d3fe-438b-d0f8-e47f4582fb17", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = converted_dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 4,\n", + " max_grad_norm = 0.3,\n", + " warmup_ratio = 0.03,\n", + " max_steps = 60,\n", + " # num_train_epochs = 2, # Set this instead of max_steps for full training runs\n", + " learning_rate = 2e-4,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\", # For Weights and Biases or others\n", + 
"\n", + " # You MUST put the below items for vision finetuning:\n", + " remove_unused_columns = False,\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 2048,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "outputId": "54133bfb-63f2-4d2d-bbe1-cea95d077a83", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "10.307 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "outputId": "3d8a59ac-ec41-419e-f25f-ddee3f9dea89", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 68,686 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 59,719,680 of 5,182,897,696 (1.15% trained)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 03:09, Epoch 0/1]\n", + "
Step | Training Loss
1 | 13.119604
2 | 14.200603
3 | 14.389761
4 | 14.693236
5 | 14.118143
6 | 10.960572
7 | 8.097441
8 | 6.393641
9 | 4.937364
10 | 4.203208
11 | 3.456124
12 | 3.272719
13 | 2.971413
14 | 2.495770
15 | 2.417672
16 | 2.748857
17 | 2.540635
18 | 2.330746
19 | 2.280551
20 | 2.012815
21 | 2.011085
22 | 2.002847
23 | 1.918156
24 | 1.896314
25 | 1.834707
26 | 1.854128
27 | 1.651750
28 | 1.860624
29 | 1.869892
30 | 1.532480
31 | 1.548356
32 | 2.402746
33 | 1.781458
34 | 1.850275
35 | 1.509472
36 | 1.659287
37 | 1.609388
38 | 1.703836
39 | 1.961015
40 | 2.099207
41 | 1.570554
42 | 1.179479
43 | 1.736481
44 | 1.550761
45 | 1.548448
46 | 1.569731
47 | 1.550730
48 | 1.422475
49 | 1.556145
50 | 1.559651
51 | 1.485411
52 | 1.749113
53 | 1.348081
54 | 1.384408
55 | 1.508071
56 | 2.067807
57 | 1.268674
58 | 1.504525
59 | 1.869724
60 | 1.731446

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "outputId": "b505ad75-49e6-416d-8f9b-6673d1d98d64", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "217.1113 seconds used for training.\n", + "3.62 minutes used for training.\n", + "Peak reserved memory = 10.764 GB.\n", + "Peak reserved memory for training = 0.457 GB.\n", + "Peak reserved memory % of max memory = 73.913 %.\n", + "Peak reserved memory for training % of max memory = 3.138 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model! You can modify the instruction and input—just leave the output blank.\n", + "\n", + "We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kR3gIAX-SM2q", + "outputId": "1da42901-c48a-4396-f463-386ff4bf2685", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\\[\\left[[B_{n}^{\\pm}, b_{2}^{\\pm}\\right], b_{2}^{\\mp}\\right] = nB_{n}^{+}, \\quad \\left[[B_{n}^{-}, b_{2}^{\\pm}\\right], b_{2}^{\\mp}\\right] = nB_{n}^{-}.\\]\n" + ] + } + ], + "source": [ + "image = dataset[10][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "upcOlWe7A1vc", + "outputId": "32663620-c6cf-4ce5-8e16-12ac5c196988", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MKX_XKs_BNZR", + "outputId": "ac64eeb1-fb27-44c8-8a6d-198ad3daac3d", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The image shows a mathematical equation:\n", + "\n", + "$$D_{\\mu}^{\\alpha \\beta} \\bar{A}_{\\mu}^{\\alpha \\beta} = 0,$$\n", + "\n", + "This equation is a condition or a constraint within a specific physical theory, likely related to quantum field theory, particle physics, or general relativity, given the use of indices ($\\mu, \\alpha, \\beta$) and the notation $\\bar{A}$.\n", + "\n", + "**Interpretation of the terms (based on common physics notation):**\n", + "\n", + "* **$D_{\\mu}^{\\alpha \\beta}$:** This likely represents a covariant derivative (or a related tensor/operator) acting on some field\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastVisionModel\n", + "\n", + " model, processor = FastVisionModel.from_pretrained(\n", + " 
model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " load_in_4bit = True, # Set to False for 16bit LoRA\n", + " )\n", + "\n", + "sample = dataset[1]\n", + "image = sample[\"image\"].convert(\"RGB\")\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": sample[\"text\"],\n", + " },\n", + " {\n", + " \"type\": \"image\",\n", + " },\n", + " ],\n", + " },\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True)\n", + "_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for VLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "# Select ONLY 1 to save! 
 (Both not needed!)\n", + "\n", + "# Save locally to 16bit\n", + "if False: model.save_pretrained_merged(\"unsloth_finetune\", processor,)\n", + "\n", + "# To export and save to your Hugging Face account\n", + "if False: model.push_to_hub_merged(\"YOUR_USERNAME/unsloth_finetune\", processor, token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TSjNVDCYv-yr" + }, + "source": [ + "And we're done! If you have any questions about Unsloth, join our [Discord](https://discord.gg/unsloth) channel. It is also the place to report bugs, get help, find projects to join, and keep up with the latest LLM news.\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, continued pretraining, conversational finetuning and more in our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "9dbba5cc07984692a5cb9310c37294d1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3ac2b9eb631b49bab460d80e68135ab2", + "IPY_MODEL_9a1b953f27b2419eb3302fee6ecbcc17", + "IPY_MODEL_2a4ecf86f482417081d8645efdbe864d" + ], + "layout": "IPY_MODEL_b0f65d2867744352a1ff5fb2551cab67" + } + }, + "3ac2b9eb631b49bab460d80e68135ab2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a5b07bff219c4a33b0060c3ff3a12320", + "placeholder": "​", + "style": "IPY_MODEL_9d2f16071d4a429c8c9dbc588ec71b56", + "value": "model.safetensors: 100%" + } 
+ }, + "9a1b953f27b2419eb3302fee6ecbcc17": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ec43f7e43c54441a8e13f42b8d21c306", + "max": 10246621918, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_7947ae62e3814e139e33e889e4753d24", + "value": 10246621918 + } + }, + "2a4ecf86f482417081d8645efdbe864d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_aa18b4a0d9964a15a8699f461491a50f", + "placeholder": "​", + "style": "IPY_MODEL_4eca3234dd45475d8daa4d38ad818413", + "value": " 10.2G/10.2G [00:26<00:00, 526MB/s]" + } + }, + "b0f65d2867744352a1ff5fb2551cab67": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + 
"grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a5b07bff219c4a33b0060c3ff3a12320": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9d2f16071d4a429c8c9dbc588ec71b56": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ec43f7e43c54441a8e13f42b8d21c306": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7947ae62e3814e139e33e889e4753d24": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "aa18b4a0d9964a15a8699f461491a50f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4eca3234dd45475d8daa4d38ad818413": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c45ec80780064eb698252b9f8268d18a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3008a3d0e4dc44a79304f6a398d914a7", + "IPY_MODEL_c1e1699a15934316981527944b1acc1c", + "IPY_MODEL_49a5c555942842ecb8c52d085c97a72c" + ], + "layout": "IPY_MODEL_a19a4d1df8db4cdc963d0374b0a3ead0" + } + }, + "3008a3d0e4dc44a79304f6a398d914a7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_29be362f9fec42398fa8c358b0c9671c", + "placeholder": "​", + "style": "IPY_MODEL_008f1fd0286e4192adb7f8ac89f3191a", + "value": "Loading weights: 100%" + } + }, + "c1e1699a15934316981527944b1acc1c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3af0e66e54f54a75906e08c9a6f2442e", + "max": 2011, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c5bbad21d0eb4fd288dfc16c468be6da", + "value": 2011 + } + }, + "49a5c555942842ecb8c52d085c97a72c": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_07bec6bd94cd4c43b5c4f691969f9e4b", + "placeholder": "​", + "style": "IPY_MODEL_86f337af3a05478593337b916a108e5f", + "value": " 2011/2011 [00:04<00:00, 689.32it/s]" + } + }, + "a19a4d1df8db4cdc963d0374b0a3ead0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29be362f9fec42398fa8c358b0c9671c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "008f1fd0286e4192adb7f8ac89f3191a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3af0e66e54f54a75906e08c9a6f2442e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", 
+ "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c5bbad21d0eb4fd288dfc16c468be6da": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "07bec6bd94cd4c43b5c4f691969f9e4b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86f337af3a05478593337b916a108e5f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7edbbaac43b349be94f3a35f0c6d77d1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2a207602d4e04defa6294fd60cdcde4d", + "IPY_MODEL_f3ef6cdedf1d49919d7fff94eb744b12", + "IPY_MODEL_d6dd94e831ae4bb68380e6b1ec616c6a" + ], + "layout": "IPY_MODEL_ec11bc623d2d4f69bb514ffdac4bea7e" + } + }, + "2a207602d4e04defa6294fd60cdcde4d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fcf6969b6ca04e74b57a1f0c5b857a3c", + "placeholder": "​", + "style": "IPY_MODEL_a63ad4bb24d34dfa9d7758bb17c76656", + "value": "generation_config.json: 100%" + } + }, + "f3ef6cdedf1d49919d7fff94eb744b12": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_aa962a031a974a5c9c0995f01ee44d3d", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_86b32de59f744db8957f6830aa78882c", + "value": 208 + } + }, + "d6dd94e831ae4bb68380e6b1ec616c6a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b6c332d8dbf149b38350f20728eb2d1b", + "placeholder": "​", + "style": "IPY_MODEL_ef8855bc019c406a8139203673af4a65", + "value": " 208/208 [00:00<00:00, 26.5kB/s]" + } + }, + "ec11bc623d2d4f69bb514ffdac4bea7e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fcf6969b6ca04e74b57a1f0c5b857a3c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a63ad4bb24d34dfa9d7758bb17c76656": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "aa962a031a974a5c9c0995f01ee44d3d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "86b32de59f744db8957f6830aa78882c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b6c332d8dbf149b38350f20728eb2d1b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ef8855bc019c406a8139203673af4a65": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d1f0e4cdafd14357a9cb8dcac07f9da2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_671fd02e35244c7ba8253a9d8a214bfa", + "IPY_MODEL_b5c12d22f2c8436984be989f04dc1b82", + "IPY_MODEL_329aef97734640fe858eb10cd898c1f0" + ], + "layout": "IPY_MODEL_96406b4153124e8ba124145e30bd74dd" + } + }, + "671fd02e35244c7ba8253a9d8a214bfa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8d320330f70248688d5acb3ec09a382c", + "placeholder": "​", + "style": "IPY_MODEL_6226c59b27fc4cf38d62ce5c7c69e375", + "value": "processor_config.json: " + } + }, + "b5c12d22f2c8436984be989f04dc1b82": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_344ea1b499e94bce91ae471cb39c6267", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_686d888d3aa24aa1acc943a78e7ab394", + "value": 1 + } + }, + "329aef97734640fe858eb10cd898c1f0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cb9060f87c9a44b18b1370dd8ccfec6a", + "placeholder": "​", + "style": "IPY_MODEL_5ee05bbd6830432a9f1a92c71e8f870f", + "value": " 1.69k/? [00:00<00:00, 180kB/s]" + } + }, + "96406b4153124e8ba124145e30bd74dd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8d320330f70248688d5acb3ec09a382c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6226c59b27fc4cf38d62ce5c7c69e375": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "344ea1b499e94bce91ae471cb39c6267": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "686d888d3aa24aa1acc943a78e7ab394": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cb9060f87c9a44b18b1370dd8ccfec6a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5ee05bbd6830432a9f1a92c71e8f870f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "acff987614e2454182fdd285f84cbfe2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b2bdbe39bdde4891958840f88e74ed7b", + "IPY_MODEL_cd8bca8f57c84bfdaa8f76002d01ca50", + 
"IPY_MODEL_e9c068d524664334b01fb6f1db19295e" + ], + "layout": "IPY_MODEL_8dac8e16ac9145ca9aae00456f0b9c74" + } + }, + "b2bdbe39bdde4891958840f88e74ed7b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_09d2ce198cbd4148ad721aa825d62d6d", + "placeholder": "​", + "style": "IPY_MODEL_4789199cf5964285b8e550ad4fbed48e", + "value": "chat_template.jinja: " + } + }, + "cd8bca8f57c84bfdaa8f76002d01ca50": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3b8b062634424ea69075e48a06c6dc8f", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a8637a53374d4a37b3b410e4408bf4ee", + "value": 1 + } + }, + "e9c068d524664334b01fb6f1db19295e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_0ab3c43906724b08bcf869a5655bff30", + "placeholder": "​", + "style": "IPY_MODEL_aa515530e9494a28a24f90fdd9204a96", + "value": " 11.9k/? [00:00<00:00, 1.24MB/s]" + } + }, + "8dac8e16ac9145ca9aae00456f0b9c74": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "09d2ce198cbd4148ad721aa825d62d6d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4789199cf5964285b8e550ad4fbed48e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3b8b062634424ea69075e48a06c6dc8f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "a8637a53374d4a37b3b410e4408bf4ee": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0ab3c43906724b08bcf869a5655bff30": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": 
null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa515530e9494a28a24f90fdd9204a96": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a60cd51dbd1d4404989c69a5b7e95f96": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4f47d85c7a2c414e9cfb8f9e6e87df4c", + "IPY_MODEL_e237c0a6a6814576a3100f68b329d4cb", + "IPY_MODEL_a5705ad4707d4a3faacdd2d37eb63fbc" + ], + "layout": "IPY_MODEL_42bd62697c1649ec98f504930e56c396" + } + }, + "4f47d85c7a2c414e9cfb8f9e6e87df4c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0661e6df71fd4df88a185d9cadab56f0", + "placeholder": "​", + "style": "IPY_MODEL_e615ee8b03ce4bd494315a4ad2df8b9a", + "value": "tokenizer_config.json: " + } + }, + "e237c0a6a6814576a3100f68b329d4cb": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_766ffc619cf04a378d7c1186c74f87ba", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_efadf09207c0427cb0d29f48bebe2ac7", + "value": 1 + } + }, + "a5705ad4707d4a3faacdd2d37eb63fbc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d2a87fd0f3d04145a5bd9dd1ed75395b", + "placeholder": "​", + "style": "IPY_MODEL_d38821cad5ac4f62bdbbfed1bbd3ffb6", + "value": " 14.9k/? 
[00:00<00:00, 1.67MB/s]" + } + }, + "42bd62697c1649ec98f504930e56c396": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0661e6df71fd4df88a185d9cadab56f0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e615ee8b03ce4bd494315a4ad2df8b9a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "766ffc619cf04a378d7c1186c74f87ba": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "efadf09207c0427cb0d29f48bebe2ac7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d2a87fd0f3d04145a5bd9dd1ed75395b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"d38821cad5ac4f62bdbbfed1bbd3ffb6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "418feb293e1143c7bd3a057697e02b88": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7ca38ff83b274ddf908d1250c613b590", + "IPY_MODEL_526d7159b777407ea1c3fd34d44024df", + "IPY_MODEL_28b52765c091458d8c411a5aa3d3c4ed" + ], + "layout": "IPY_MODEL_0501889c836b49bf90e4a9f3cca16b44" + } + }, + "7ca38ff83b274ddf908d1250c613b590": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_df2390fa32f0482a9b7d3fd10f4d19b0", + "placeholder": "​", + "style": "IPY_MODEL_67cfb5f857da465cae4575ce0e12449d", + "value": "tokenizer.json: 100%" + } + }, + "526d7159b777407ea1c3fd34d44024df": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1dcda7554eb342cbb660bf8a78272ecc", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1d23ab2873814345a58a8c12438196df", + "value": 32169626 + } + }, + "28b52765c091458d8c411a5aa3d3c4ed": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b2a50fc0c2a44c82a7015f11713c6181", + "placeholder": "​", + "style": "IPY_MODEL_8e47b9fa36ee4c128414a714d3ae9547", + "value": " 32.2M/32.2M [00:00<00:00, 147MB/s]" + } + }, + "0501889c836b49bf90e4a9f3cca16b44": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "df2390fa32f0482a9b7d3fd10f4d19b0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "67cfb5f857da465cae4575ce0e12449d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1dcda7554eb342cbb660bf8a78272ecc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1d23ab2873814345a58a8c12438196df": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"b2a50fc0c2a44c82a7015f11713c6181": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8e47b9fa36ee4c128414a714d3ae9547": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "87fb532c204143a6957b498382749619": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_76e88b170834473aa4e41f8ecede6545", + "IPY_MODEL_594a65dcffb442cb9a7d264e17d26bea", + "IPY_MODEL_5eca7ef2432f43fc9ff2bb88a017c0e7" + ], + "layout": "IPY_MODEL_4a703d9591cf467ba73f98a0bad9b169" + } + }, + "76e88b170834473aa4e41f8ecede6545": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_690042bf19924111809748d2f3fdf5e2", + "placeholder": "​", + "style": "IPY_MODEL_19dbff1fba52476aa320072b0c5ca0ad", + "value": "README.md: 100%" + } + }, + "594a65dcffb442cb9a7d264e17d26bea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_446de647345c4d96a4f0bc20527a9239", + "max": 519, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_24c3d3519e5e4203801f6db21b174d75", + "value": 519 + } + }, + "5eca7ef2432f43fc9ff2bb88a017c0e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b630883c3e984eea892ecf0c23942c74", + "placeholder": "​", + "style": "IPY_MODEL_5f93397ffbde4cbf8c494f0e6495deab", + "value": " 519/519 [00:00<00:00, 62.8kB/s]" + } + }, + "4a703d9591cf467ba73f98a0bad9b169": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "690042bf19924111809748d2f3fdf5e2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "19dbff1fba52476aa320072b0c5ca0ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "446de647345c4d96a4f0bc20527a9239": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "24c3d3519e5e4203801f6db21b174d75": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b630883c3e984eea892ecf0c23942c74": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5f93397ffbde4cbf8c494f0e6495deab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f04af098beb745dd83eeebb4a129df16": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_16642b8f9da245e58595e8b59a898803", + "IPY_MODEL_1b39506c9fa943f18c99e9a9ab29f2e1", + "IPY_MODEL_5c14213926ab404a815a6a6fb0b10e5d" + ], + "layout": "IPY_MODEL_abcb4e51c128476b97a5cabcb51925ae" + } + }, + "16642b8f9da245e58595e8b59a898803": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_99c7aa8c8a96453dbcc5ed8ae8d55e39", + "placeholder": "​", + "style": "IPY_MODEL_94b4e20a08f741d89475873612d9f014", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "1b39506c9fa943f18c99e9a9ab29f2e1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_eb8b4370227e4969a1c8a2649f219b6a", + "max": 343805431, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0d7ed4045cb74d75a90e6ab2304fc9f5", + "value": 343805431 + } + }, + "5c14213926ab404a815a6a6fb0b10e5d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_575c74bd5eed4c0bba0a5741a32e8595", + "placeholder": "​", + "style": "IPY_MODEL_3f07f2d9645949f688ab4d577ae4d0e7", + "value": " 344M/344M [00:01<00:00, 571MB/s]" + } + }, + "abcb4e51c128476b97a5cabcb51925ae": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99c7aa8c8a96453dbcc5ed8ae8d55e39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "94b4e20a08f741d89475873612d9f014": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "eb8b4370227e4969a1c8a2649f219b6a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0d7ed4045cb74d75a90e6ab2304fc9f5": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "575c74bd5eed4c0bba0a5741a32e8595": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3f07f2d9645949f688ab4d577ae4d0e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "749cf884ca664718be7e2279d3609193": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d820ee3337494b92869ff7c58133f785", + "IPY_MODEL_a71b8168cc54466bb4980cfee5c36b43", + "IPY_MODEL_0aa9fbdf33a043bdadb42e7c9ac60fe1" + ], + "layout": "IPY_MODEL_ac278d22afd8443ba318472ee187b9ce" + } + }, + "d820ee3337494b92869ff7c58133f785": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7da52b5a12eb4e22939cab415d177fdf", + "placeholder": "​", + "style": "IPY_MODEL_f33bb1bb79af4929beee2fd8ac8e1762", + "value": "data/test-00000-of-00001.parquet: 100%" + } + }, + "a71b8168cc54466bb4980cfee5c36b43": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_4b17c47b96a54844bf5e197d0cf497a2", + "max": 38205016, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1f52b1d75d2b429b957f14dc74919e60", + "value": 38205016 + } + }, + "0aa9fbdf33a043bdadb42e7c9ac60fe1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_235f72da403d40138566ee21c219d3a3", + "placeholder": "​", + "style": "IPY_MODEL_b25d1876d9504dc2b0ab0cdce4dd1545", + "value": " 38.2M/38.2M [00:00<00:00, 191MB/s]" + } + }, + "ac278d22afd8443ba318472ee187b9ce": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7da52b5a12eb4e22939cab415d177fdf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f33bb1bb79af4929beee2fd8ac8e1762": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4b17c47b96a54844bf5e197d0cf497a2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1f52b1d75d2b429b957f14dc74919e60": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "235f72da403d40138566ee21c219d3a3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b25d1876d9504dc2b0ab0cdce4dd1545": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b57b52f00850496e8def6847da587a40": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7dff2ef63de54684b7b61ae037d051b1", + "IPY_MODEL_b15f4b8f28094b0cab0da27ed817e261", + "IPY_MODEL_49baa628d8b4481ba6dac63b13e80cd6" + ], + "layout": "IPY_MODEL_7619a6b05a5643158620b12646757974" + } + }, + 
"7dff2ef63de54684b7b61ae037d051b1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b3283fd17387482f9f127a8ff8089f23", + "placeholder": "​", + "style": "IPY_MODEL_08eece15f66b4367825ec48c7e3961cb", + "value": "Generating train split: 100%" + } + }, + "b15f4b8f28094b0cab0da27ed817e261": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c49366b385fa41b083bfb208d54add54", + "max": 68686, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d4e5d57774cf4c9caa352a1ada066fab", + "value": 68686 + } + }, + "49baa628d8b4481ba6dac63b13e80cd6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dfa1df1b47d647a3a4385b901ed58f57", + "placeholder": "​", + "style": 
"IPY_MODEL_6e3ec422ca96449b81f7358fe050b380", + "value": " 68686/68686 [00:00<00:00, 118058.07 examples/s]" + } + }, + "7619a6b05a5643158620b12646757974": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b3283fd17387482f9f127a8ff8089f23": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "08eece15f66b4367825ec48c7e3961cb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c49366b385fa41b083bfb208d54add54": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d4e5d57774cf4c9caa352a1ada066fab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dfa1df1b47d647a3a4385b901ed58f57": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "6e3ec422ca96449b81f7358fe050b380": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "abdbeadfb7f24651967c24fedd7b0f2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_250587f17d6845ebbee21a51597382b2", + "IPY_MODEL_2197baef18b84247a61bd69563cbb695", + "IPY_MODEL_843fe7d70a2c4085beb423d355aeb472" + ], + "layout": "IPY_MODEL_a1fa450ee45f44dcbe67b73cfa56df3b" + } + }, + "250587f17d6845ebbee21a51597382b2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_05811e6740f44784a58660daa042a371", + "placeholder": "​", + "style": "IPY_MODEL_fdfc87dd989c4f39948610991bd7b5d3", + "value": "Generating test split: 100%" + } + }, + "2197baef18b84247a61bd69563cbb695": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ec238c8a4a624904a8629adc272bdf15", + "max": 7632, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_e97dcacab83d4b968bae440040e1513b", + "value": 7632 + } + }, + "843fe7d70a2c4085beb423d355aeb472": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_46e1156b616243cabf185557950e5f07", + "placeholder": "​", + "style": "IPY_MODEL_896090ab3ee442f3b04339560948b222", + "value": " 7632/7632 [00:00<00:00, 103292.38 examples/s]" + } + }, + "a1fa450ee45f44dcbe67b73cfa56df3b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "05811e6740f44784a58660daa042a371": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fdfc87dd989c4f39948610991bd7b5d3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ec238c8a4a624904a8629adc272bdf15": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e97dcacab83d4b968bae440040e1513b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"46e1156b616243cabf185557950e5f07": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "896090ab3ee442f3b04339560948b222": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_GRPO.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_GRPO.ipynb new file mode 100644 index 0000000..a71ec61 --- 
/dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_GRPO.ipynb @@ -0,0 +1,5531 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; _numpy = f'numpy=={numpy.__version__}'; _pil = f'pillow=={PIL.__version__}'\n except: _numpy = \"numpy\"; _pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n _vllm, _triton = ('vllm==0.9.2', 'triton==3.2.0') if is_t4 else ('vllm==0.15.1', 'triton')\n !uv pip install -qqq --upgrade {_vllm} {_numpy} 
{_pil} torchvision bitsandbytes xformers unsloth\n !uv pip install -qqq {_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Goal: Make faster kernels with Reinforcement Learning\n", + "\n", + "Our goal is to make a faster matrix multiplication kernel by doing RL on Gemma 4 with Unsloth.\n", + "\n", + "\n", + "\n", + "You will learn how to:\n", + "1. Counteract **reward hacking** like cheating, caching, laziness.\n", + "2. Timing and correctness of kernels and time limits.\n", + "3. Making good **reward functions**\n", + "4. How to seriously do RL to make optimized kernels" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from unsloth import FastVisionModel\n", + "import torch\n", + "max_seq_length = 4096 # Can increase for longer reasoning traces\n", + "lora_rank = 32 # Larger rank = smarter, but slower\n", + "\n", + "gemma4_models = [\n", + " # Gemma-4 instruct models:\n", + " \"unsloth/gemma-4-E2B-it\",\n", + " \"unsloth/gemma-4-E4B-it\",\n", + " \"unsloth/gemma-4-31B-it\",\n", + " \"unsloth/gemma-4-26B-A4B-it\",\n", + " # Gemma-4 base models:\n", + " \"unsloth/gemma-4-E2B\",\n", + " \"unsloth/gemma-4-E4B\",\n", + " \"unsloth/gemma-4-31B\",\n", + " \"unsloth/gemma-4-26B-A4B\",\n", + "] # More models at https://huggingface.co/unsloth\n", + "\n", + "model, tokenizer = FastVisionModel.from_pretrained(\n", + " model_name = \"unsloth/gemma-4-E2B-it\",\n", + " max_seq_length = max_seq_length,\n", + " load_in_4bit = False, # False for LoRA 16bit\n", + " fast_inference = False, # Enable vllm fast inference\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We now add some small amount of LoRA weights to Gemma 4 so we only need to train those, instead of training on the full 
model." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "model = FastVisionModel.get_peft_model(\n", + " model,\n", + " r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128\n", + " target_modules = [\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + " ],\n", + " lora_alpha = lora_rank*2, # *2 speeds up training\n", + " use_gradient_checkpointing = \"unsloth\", # Reduces memory usage\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Optimized matrix multiplication\n", + "\n", + "Numpy has optimized matrix multiplication kernels for CPUs via BLAS optimized operations. For GPUs, one can use CUDA accelerated cuBLAS kernels which PyTorch calls under the hood.\n", + "\n", + "To generate some random matrices to do matrix multiplication, we can do the below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "def generate_random_matrices(seed = 3407, n = 256):\n", + " random_state = np.random.RandomState(seed)\n", + " n, k, m = random_state.randint(1, n+1, size = 3)\n", + " A = np.random.uniform(-10, 10, size = (n, k))\n", + " B = np.random.uniform(-10, 10, size = (k, m))\n", + " return A, A.tolist(), B, B.tolist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We shall generate a small matrix, and see the matrix multiplied output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-2.8313286 4.54613909 -7.95265309 6.53459836 2.87235103]\n", + " [ 7.0739631 3.76278879 9.31565599 -8.52884711 9.96832952]\n", + " [ 8.41214082 6.51136046 -3.79347975 -2.46773693 -2.32292989]\n", + " [ 3.91302932 4.98335304 -5.33855089 5.71057634 
-2.79871647]]\n", + "[[ 0.39218774 -9.6181377 -3.49736707]\n", + " [-0.33354865 -1.05626139 3.87231208]\n", + " [ 0.49494174 5.91863954 -6.83183693]\n", + " [ 5.1465162 -7.51648113 1.00445384]\n", + " [ 9.63213377 -4.92327556 3.323014 ]]\n", + "[[ 54.73441488 -87.89725072 97.94605887]\n", + " [ 58.25238906 -1.8467447 -49.25453031]\n", + " [ -35.82528794 -80.25394462 11.51225408]\n", + " [ -0.33785799 -103.64132345 38.51974367]]\n" + ] + } + ], + "source": [ + "A, A_list, B, B_list = generate_random_matrices(seed = 42, n = 5)\n", + "print(A)\n", + "print(B)\n", + "print(np.matmul(A, B))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can call a LLM to generate a simple matrix multiply kernel in Python only, and we can calculate the differences between the actual result and the kernel's result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def calculate_difference(pred, real):\n", + " if pred is None: return 5, 5\n", + " assert real is not None\n", + " import numpy as np\n", + " try:\n", + " difference = pred - real\n", + " except:\n", + " return 5, 5\n", + " amax_error = float(np.amax(difference))\n", + " mse_error = float(np.mean(np.square(difference)))\n", + " return amax_error, mse_error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Kernel generated by GPT-5\n", + "def matmul(A, B):\n", + " z, s = zip, sum\n", + " Bt = list(z(*B))\n", + " return [[s(a*b for a, b in z(row, col)) for col in Bt] for row in A]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We see the error below is very small, so that's good!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(7.105427357601002e-15, 4.6783406255758477e-29)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prediction = matmul(A_list, B_list)\n", + "calculate_difference(prediction, np.matmul(A, B))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Countering Reward Hacking\n", + "\n", + "The ultimate goal of RL is to maximize some reward (say speed, revenue, some metric).\n", + "\n", + "But RL can **cheat**. When the RL algorithm learns a trick or exploits something to increase the reward without actually doing the task, this is called \"Reward Hacking\".\n", + "\n", + "Some good examples are in https://en.wikipedia.org/wiki/Reward_hacking\n", + "\n", + "For matrix multiplication kernels, we might see the following issues:\n", + "\n", + "* Laziness: RL learns to use NumPy, Torch, or other libraries, which call optimized kernels.\n", + "* Caching: RL learns to cache the result of the output.\n", + "* Cheating: RL learns to find the actual output by inspecting Python global variables.\n", + "* Timer tampering: RL learns to edit the timing function so that it reports that no time has passed.\n", + "\n", + "And possibly more. We shall try to address each!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Countering Reward Hacking 1: Stop laziness\n", + "We can stop the RL algorithm from calling optimized code by checking whether the generated code imports any non-standard-library Python modules. 
We used GPT-5 to help generate this check `check_only_stdlib_imports`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#@title (Collapsible code)\n", + "import ast\n", + "import sys\n", + "import sysconfig\n", + "from pathlib import Path\n", + "\n", + "def _stdlib_names():\n", + " \"\"\"\n", + " Build a set of canonical stdlib top-level module/package names.\n", + " Uses sys.stdlib_module_names when available (3.10+), with a\n", + " filesystem fallback for older versions/edge cases.\n", + " \"\"\"\n", + " names = {m.lower() for m in getattr(sys, \"stdlib_module_names\", set())}\n", + " names |= {m.lower() for m in sys.builtin_module_names}\n", + " names.add(\"__future__\") # special-case\n", + "\n", + " # Fallback/augmentation: scan the stdlib directory\n", + " try:\n", + " stdlib_dir = Path(sysconfig.get_path(\"stdlib\"))\n", + " if stdlib_dir.exists():\n", + " for p in stdlib_dir.iterdir():\n", + " if p.name == \"site-packages\":\n", + " continue\n", + " if p.suffix == \".py\":\n", + " names.add(p.stem.lower())\n", + " elif p.is_dir() and (p / \"__init__.py\").exists():\n", + " names.add(p.name.lower())\n", + " except Exception:\n", + " # conservative fallback; the names set above will still work well\n", + " pass\n", + "\n", + " return names\n", + "\n", + "_STDLIB_SET = _stdlib_names()\n", + "\n", + "def check_only_stdlib_imports(code: str):\n", + " \"\"\"\n", + " Return (ok: bool, details: dict)\n", + "\n", + " ok == True -> all absolute imports are from the stdlib.\n", + " ok == False -> details['non_stdlib'] lists offending top-level modules.\n", + "\n", + " details includes:\n", + " - stdlib: sorted list of stdlib imports found\n", + " - non_stdlib: sorted list of non-stdlib imports found\n", + " - relative_imports: count of relative imports (always allowed here)\n", + " \"\"\"\n", + " try:\n", + " tree = ast.parse(code)\n", + " except SyntaxError as e:\n", + " return False, {\n", + " 
\"error\": f\"SyntaxError: {e}\",\n", + " \"stdlib\": [],\n", + " \"non_stdlib\": [],\n", + " \"relative_imports\": 0,\n", + " }\n", + "\n", + " abs_imports = set()\n", + " relative_count = 0\n", + "\n", + " class Visitor(ast.NodeVisitor):\n", + " def visit_Import(self, node: ast.Import):\n", + " for alias in node.names:\n", + " abs_imports.add(alias.name.split(\".\")[0])\n", + " def visit_ImportFrom(self, node: ast.ImportFrom):\n", + " nonlocal relative_count\n", + " if (node.level or 0) > 0:\n", + " # relative import\n", + " relative_count += 1\n", + " else:\n", + " if node.module:\n", + " abs_imports.add(node.module.split(\".\")[0])\n", + "\n", + " Visitor().visit(tree)\n", + "\n", + " stdlib_found = sorted(m for m in abs_imports if m.lower() in _STDLIB_SET)\n", + " non_stdlib = sorted(m for m in abs_imports if m.lower() not in _STDLIB_SET)\n", + "\n", + " return len(non_stdlib) == 0, {\n", + " \"stdlib\": stdlib_found,\n", + " \"non_stdlib\": non_stdlib,\n", + " \"relative_imports\": relative_count,\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, let's call `check_only_stdlib_imports` on a random piece of matrix multiplication code generated by GPT-5:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Only stdlib imports? 
False\n", + "{'stdlib': [], 'non_stdlib': ['numpy', 'torch'], 'relative_imports': 0}\n" + ] + } + ], + "source": [ + "sample = \"\"\"\n", + "def matmul(A, B):\n", + " import numpy as np\n", + " from torch import matmul\n", + " z, s = zip, sum\n", + " Bt = list(z(*B))\n", + " return [[s(a*b for a, b in z(row, col)) for col in Bt] for row in A]\n", + "\"\"\"\n", + "ok, info = check_only_stdlib_imports(sample)\n", + "print(\"Only stdlib imports?\", ok)\n", + "print(info)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Countering Reward Hacking 2: Stop cheating\n", + "We can stop the RL algorithm from using global or cached variables by restricting its `locals` and `globals`.\n", + "\n", + "We are also going to use `exec` to create the function, so we have to save the output to an empty dict.\n", + "\n", + "We also disallow global variable access." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output_function = {}\n", + "exec(sample, {}, output_function)\n", + "output_function[\"matmul\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We also disallow global variable access via `types.FunctionType(f.__code__, {})`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Success\n", + "name 'np' is not defined\n" + ] + } + ], + "source": [ + "import types\n", + "output_function[\"matmul\"] = types.FunctionType(output_function[\"matmul\"].__code__, {})\n", + "\n", + "def import_numpy():\n", + " np.matmul\n", + " print(\"Success\")\n", + "\n", + "import_numpy()\n", + "import_numpy = types.FunctionType(import_numpy.__code__, {})\n", + "try:\n", + " import_numpy()\n", + "except Exception as e:\n", + " 
print(str(e))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def create_locked_down_function(function):\n", + " output_function = {}\n", + " exec(function, {}, output_function)\n", + " new_matmul = output_function[\"matmul\"]\n", + " new_matmul = types.FunctionType(new_matmul.__code__, {})\n", + " return new_matmul" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Countering Reward Hacking 3: Stop caching\n", + "We can stop the RL algorithm from using cached data by overwriting a large dummy buffer before each trial, which wipes the CPU cache. We also have to benchmark carefully with multiple loops and trials.\n", + "\n", + "We also add a **timer** so the generated function cannot run in an endless loop." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os, gc, time, statistics\n", + "import signal\n", + "from contextlib import contextmanager\n", + "class TimeoutError(Exception): pass\n", + "\n", + "@contextmanager\n", + "def time_limit(seconds):\n", + " def _handler(signum, frame):\n", + " raise TimeoutError(f\"Timed out after {seconds}s\")\n", + " old = signal.signal(signal.SIGALRM, _handler)\n", + " signal.setitimer(signal.ITIMER_REAL, seconds)\n", + " try:\n", + " yield\n", + " finally:\n", + " signal.setitimer(signal.ITIMER_REAL, 0.0)\n", + " signal.signal(signal.SIGALRM, old)\n", + "\n", + "class Benchmarker:\n", + " def __init__(self, trials = 3, loops = 1, timeout = 30):\n", + " self.buffer = np.zeros(2 * 1024 * 1024 * 1024, dtype = np.uint8)\n", + " self.trials = trials\n", + " self.loops = loops\n", + " assert timeout > 0 # Cannot be 0: setitimer(0) disables the timer!\n", + " self.timeout = timeout\n", + " def thrash(self):\n", + " # Edit the buffer to wipe cache lines\n", + " self.buffer ^= 1\n", + " return int(self.buffer[::4096].sum())\n", + "\n", + " def benchmark(self, function, 
arguments):\n", + " assert len(arguments) == self.loops\n", + 
" samples = []\n", + " exceptions = []\n", + " timed_out = 0\n", + " for _ in range(self.trials):\n", + " gc.collect(); gc.disable(); self.thrash()\n", + " t_start = time.perf_counter_ns()\n", + " for i in range(self.loops):\n", + " try:\n", + " with time_limit(self.timeout):\n", + " function(*arguments[i])\n", + " except TimeoutError as e:\n", + " timed_out += 1\n", + " except Exception as e:\n", + " exceptions.append(str(e))\n", + " t_end = time.perf_counter_ns()\n", + " gc.enable()\n", + " samples.append((t_end - t_start) // max(1, self.loops))\n", + " return {\n", + " \"median_ns\": int(statistics.median(samples)),\n", + " \"mean_ns\": int(statistics.fmean(samples)),\n", + " \"stdev_ns\": int(statistics.pstdev(samples) if len(samples) > 1 else 0),\n", + " \"exceptions\" : exceptions,\n", + " \"timeouts\" : timed_out,\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example we use our matmul kernel we had, and benchmark it with a 10 second delay:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'median_ns': 70895404,\n", + " 'mean_ns': 70895404,\n", + " 'stdev_ns': 0,\n", + " 'exceptions': [],\n", + " 'timeouts': 0}" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "A, A_list, B, B_list = generate_random_matrices(seed = 0, n = 256)\n", + "Benchmarker(trials = 1, timeout = 10).benchmark(output_function[\"matmul\"], [(A_list, B_list)])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data & RL task setup\n", + "\n", + "We now have to create a prompt to the model for which it will do some task. 
For our matrix multiply example, we use the below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Create a new fast matrix multiplication function using only native Python code.\n", + "You are given a list of list of numbers.\n", + "Output your new function in backticks using the format below:\n", + "```python\n", + "def matmul(A, B):\n", + " return ...\n", + "```\n" + ] + } + ], + "source": [ + "prompt = \"\"\"\n", + "Create a new fast matrix multiplication function using only native Python code.\n", + "You are given a list of list of numbers.\n", + "Output your new function in backticks using the format below:\n", + "```python\n", + "def matmul(A, B):\n", + " return ...\n", + "```\n", + "\"\"\".strip()\n", + "print(prompt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, let's prompt Gemma 4 without RL and see how it goes:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "text = tokenizer.apply_chat_template(\n", + " [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " tokenize = False,\n", + " add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "print(\"=\" * 50)\n", + "print(\"BASE MODEL OUTPUT (before RL training):\")\n", + "print(\"=\" * 50)\n", + "\n", + "inputs = tokenizer(\n", + " text = text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "text_streamer = TextStreamer(tokenizer, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Reward functions\n", + "\n", + "We now design the `extract_function` function which simply extracts the 
function wrapped in 3 backticks.\n", + "\n", + "And 4 reward functions:\n", + "\n", + "1. `function_works`, which rewards the model if the generated code is a valid Python function.\n", + "2. `no_cheating`, which checks if the function imported other modules, and penalizes it if it did.\n", + "3. `correctness_check`, which checks if the kernel's output was correct - it shouldn't generate gibberish!\n", + "4. `speed_check`, which measures the performance relative to NumPy's `matmul` directly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "def matmul(A, B):\n", + " return ...\n" + ] + } + ], + "source": [ + "def extract_function(text):\n", + " if text.count(\"```\") >= 2:\n", + " first = text.find(\"```\") + 3\n", + " second = text.find(\"```\", first)\n", + " fx = text[first : second].strip()\n", + " fx = fx.removeprefix(\"python\\n\")\n", + " fx = fx[fx.find(\"def\"):]\n", + " if fx.startswith(\"def matmul(A, B):\"): return fx\n", + " return None\n", + "print(extract_function(prompt))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Below is our `function_works` reward function, which uses Python's `exec` but guards it by not allowing access to outside local and global variables. We can also call `check_only_stdlib_imports` first to catch syntax errors before even executing the function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(False,\n", + " {'error': \"SyntaxError: expected '(' (, line 1)\",\n", + " 'stdlib': [],\n", + " 'non_stdlib': [],\n", + " 'relative_imports': 0})" + ], + "text/html": [ + "
(False,\n",
+       " {'error': "SyntaxError: expected '(' (<unknown>, line 1)",\n",
+       "  'stdlib': [],\n",
+       "  'non_stdlib': [],\n",
+       "  'relative_imports': 0})
" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ok, info = check_only_stdlib_imports(\"def a\")\n", + "ok, info" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def function_works(completions, **kwargs):\n", + " scores = []\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " print(function)\n", + " if function is not None:\n", + " ok, info = check_only_stdlib_imports(function)\n", + " if function is None or \"error\" in info:\n", + " score = -2.0\n", + " else:\n", + " try:\n", + " new_matmul = create_locked_down_function(function)\n", + " score = 1.0\n", + " except:\n", + " score = -0.5\n", + " scores.append(score)\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`no_cheating` checks if the function cheated since it might have imported Numpy or Torch optimized code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def no_cheating(completions, **kwargs):\n", + " scores = []\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if function is not None:\n", + " ok, info = check_only_stdlib_imports(function)\n", + " else:\n", + " ok = False\n", + " scores.append(1.0 if ok else -20.0) # Penalize heavily!\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next `correctness_check` checks if the kernel was correct. We want to penalize if the absolute error is larger than 1, and if the mean squared error is somewhat bigger then machine epsilon.\n", + "\n", + "We have to execute the code now!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(2.220446049250313e-16)" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.finfo(np.float64).eps" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def correctness_check(completions, **kwargs):\n", + " scores = []\n", + " # Generate some random matrices of size less than 128\n", + " A, A_list, B, B_list = generate_random_matrices(seed = np.random.randint(10000), n = 128)\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if function is not None:\n", + " ok, info = check_only_stdlib_imports(function)\n", + " if function is None or \"error\" in info:\n", + " scores.append(0)\n", + " continue\n", + " try:\n", + " new_matmul = create_locked_down_function(function)\n", + " except:\n", + " scores.append(0)\n", + " continue\n", + " try:\n", + " pred = new_matmul(A_list.copy(), B_list.copy())\n", + " except:\n", + " # Failed!\n", + " scores.append(-2.0)\n", + " continue\n", + " true = np.matmul(A, B)\n", + " amax_error, mse_error = calculate_difference(pred, true)\n", + "\n", + " # Check correctness and score!\n", + " machine_epsilon = 100*np.finfo(np.float64).eps\n", + " if amax_error >= 3: score = -3.0\n", + " elif amax_error >= 2: score = -2.5\n", + " elif amax_error >= 1: score = -2.0\n", + " elif amax_error >= 0.5: score = -1.0\n", + " elif amax_error >= 100*machine_epsilon: score = 0.0\n", + " elif amax_error >= machine_epsilon: score = 1.0\n", + " else: score = 3.0\n", + "\n", + " if mse_error >= 3: score += -3.0\n", + " elif mse_error >= 2: score += -2.5\n", + " elif mse_error >= 1: score += -2.0\n", + " elif mse_error >= 0.5: score += -1.0\n", + " elif mse_error >= 100*machine_epsilon: score 
+= 0.0\n", + " elif mse_error >= machine_epsilon: score += 1.0\n", + " else: score += 3.0\n", + " scores.append(score)\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally our benchmarking function for `speed_check`! We shall limit the timer to 10 seconds and do 3 trials." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'median_ns': 205566,\n", + " 'mean_ns': 231173,\n", + " 'stdev_ns': 39247,\n", + " 'exceptions': [],\n", + " 'timeouts': 0}" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "A, A_list, B, B_list = generate_random_matrices(seed = 0, n = 256)\n", + "benchmarker = Benchmarker(trials = 3, timeout = 10)\n", + "numpy_results = benchmarker.benchmark(np.matmul, [(A, B)])\n", + "numpy_results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'median_ns': 84237,\n", + " 'mean_ns': 87442,\n", + " 'stdev_ns': 4538,\n", + " 'exceptions': [],\n", + " 'timeouts': 0}" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_matmul = create_locked_down_function(extract_function(prompt))\n", + "new_results = benchmarker.benchmark(new_matmul, [(A_list, B_list)])\n", + "new_results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can take the difference and do a negative sign for slower ones. 
If the ratio is less than 1 (i.e. the new kernel is faster), we shall invert it!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.02440329071548132" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "negative = -(new_results[\"median_ns\"] / numpy_results[\"median_ns\"]) / 100\n", + "positive = +(numpy_results[\"median_ns\"] / new_results[\"median_ns\"]) / 100\n", + "reward = negative if new_results[\"median_ns\"] >= numpy_results[\"median_ns\"] else positive\n", + "reward" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.333333333333333" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_results[\"median_ns\"] = 3\n", + "numpy_results[\"median_ns\"] = 1000\n", + "negative = -(new_results[\"median_ns\"] / numpy_results[\"median_ns\"]) / 100\n", + "positive = +(numpy_results[\"median_ns\"] / new_results[\"median_ns\"]) / 100\n", + "reward = negative if new_results[\"median_ns\"] >= numpy_results[\"median_ns\"] else positive\n", + "reward" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import gc\n", + "def speed_check(completions, **kwargs):\n", + " scores = []\n", + " # Generate some random matrices with each dimension at most 256\n", + " A, A_list, B, B_list = generate_random_matrices(seed = np.random.randint(10000), n = 256)\n", + " numpy_results = benchmarker.benchmark(np.matmul, [(A, B)])\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if function is not None:\n", + " ok, info = check_only_stdlib_imports(function)\n", + " if function is None or \"error\" in info:\n", + " scores.append(0)\n", + " continue\n", + 
" try:\n", + " 
new_matmul = create_locked_down_function(function)\n", + " except:\n", + " scores.append(0)\n", + " continue\n", + " new_results = benchmarker.benchmark(new_matmul, [(A_list.copy(), B_list.copy())])\n", + "\n", + " # Get score and clip to -10, 10\n", + " negative = -(new_results[\"median_ns\"] / numpy_results[\"median_ns\"]) / 100\n", + " positive = +(numpy_results[\"median_ns\"] / new_results[\"median_ns\"]) / 100\n", + " score = negative if new_results[\"median_ns\"] >= numpy_results[\"median_ns\"] else positive\n", + " if score >= 10: score = 10\n", + " if score <= -10: score = -10\n", + " scores.append(score)\n", + " # Free memory to counteract OOMs\n", + " gc.collect()\n", + " torch.cuda.empty_cache()\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We create the dataset which includes a replica of our prompt." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from datasets import Dataset\n", + "dataset = Dataset.from_list([{\"prompt\" : [{\"role\": \"user\", \"content\": prompt.strip()}], \"answer\" : 0}]*1000)\n", + "maximum_length = len(tokenizer.apply_chat_template([{\"role\":\"user\", \"content\":prompt.strip()}], add_generation_prompt = True, tokenize = True))\n", + "print(maximum_length)\n", + "dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Train the model\n", + "\n", + "Now set up GRPO Trainer and all configurations! We also support GSDP, GAPO, Dr GRPO and more! Go to our docs https://unsloth.ai/docs/ for more info!" 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Leave room for the prompt (plus 1 token safety margin)\n", + "max_completion_length = max_seq_length - (maximum_length + 1)\n", + "\n", + "from trl import GRPOConfig, GRPOTrainer\n", + "training_args = GRPOConfig(\n", + " temperature = 1.0,\n", + " top_p = 0.95,\n", + " top_k = 64,\n", + " learning_rate = 5e-5,\n", + " weight_decay = 0.001,\n", + " warmup_ratio = 0.1,\n", + " lr_scheduler_type = \"linear\",\n", + " optim = \"adamw_8bit\",\n", + " logging_steps = 1,\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 2, # Increase to 4 for smoother training\n", + " num_generations = 2, # Decrease if out of memory\n", + " max_completion_length = max_completion_length,\n", + " # num_train_epochs = 1, # Set to 1 for a full training run\n", + " max_steps = 100,\n", + " save_steps = 100,\n", + " report_to = \"none\", # Can use Weights & Biases, TrackIO\n", + " output_dir = \"outputs\",\n", + " epsilon = 0.2,\n", + " epsilon_high = 0.28, # one sided\n", + " delta = 1.5, # two sided\n", + " loss_type = 'bnpo',\n", + " mask_truncated_completions = True\n", + " # For optional training + evaluation\n", + " # fp16_full_eval = True,\n", + " # per_device_eval_batch_size = 4,\n", + " # eval_accumulation_steps = 1,\n", + " # eval_strategy = \"steps\",\n", + " # eval_steps = 1,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase!\n", + "\n", + "You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. 
Please be patient!\n", + "\n", + "| Step | Training Loss | reward | reward_std | completion_length | kl |\n", + "|------|---------------|-----------|------------|-------------------|----------|\n", + "| 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 |\n", + "| 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 |\n", + "| 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# For optional training + evaluation\n", + "# new_dataset = dataset.train_test_split(test_size = 0.01)\n", + "\n", + "trainer = GRPOTrainer(\n", + " model = model,\n", + " processing_class = tokenizer,\n", + " reward_funcs = [\n", + " function_works,\n", + " no_cheating,\n", + " correctness_check,\n", + " speed_check,\n", + " ],\n", + " args = training_args,\n", + " train_dataset = dataset,\n", + "\n", + " # For optional training + evaluation\n", + " # train_dataset = new_dataset[\"train\"],\n", + " # eval_dataset = new_dataset[\"test\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's train the model!\n", + "\n", + "**NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 199998}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 1,000 | Num Epochs = 1 | Total steps = 100\n", + "O^O/ \\_/ \\ Batch size per device = 2 | Gradient accumulation steps = 1\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2\n", + " \"-____-\" Trainable parameters = 1,990,656 of 20,916,747,840 (0.01% trained)\n", + "`generation_config` default values have been modified to match model-specific defaults: {'max_length': 131072}. If this is not desired, please set these values explicitly.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "def matmul(A, B):\n", + " \"\"\"\n", + " Fast matrix multiplication using only native Python code.\n", + " \n", + " Parameters\n", + " ----------\n", + " A : list of list of numbers\n", + " Left matrix of dimensions (m x p).\n", + " B : list of list of numbers\n", + " Right matrix of dimensions (p x n).\n", + " \n", + " Returns\n", + " -------\n", + " C : list of list of numbers\n", + " Resulting matrix of dimensions (m x n) such that C = A × B.\n", + " \"\"\"\n", + " # Transpose B to allow column access as rows.\n", + " Bt = list(zip(*B))\n", + " # Compute the dot product of each row from A with each column from B\n", + " return [[sum(a * b for a, b in zip(row, col))\n", + " for col in Bt]\n", + " for row in A]\n", + "def matmul(A, B):\n", + " return ...\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [100/100 1:36:19, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| Step | Training Loss | reward | reward_std | completions / mean_length | completions / min_length | completions / max_length | completions / clipped_ratio | completions / mean_terminated_length | completions / min_terminated_length | completions / max_terminated_length | kl | rewards / function_works / mean | rewards / function_works / std | rewards / no_cheating / mean | rewards / no_cheating / std | rewards / correctness_check / mean | rewards / correctness_check / std | rewards / speed_check / mean | rewards / speed_check / std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000000 | -0.960532 | 4.244743 | 536.000000 | 392.000000 | 680.000000 | 0.000000 | 536.000000 | 392.000000 | 680.000000 | 0.002798 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.960532 | 2.826324 |
| 2 | 0.000000 | -11.504601 | 14.842735 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000834 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -3.504601 | 4.956255 |
| 3 | 0.000000 | -1.232768 | 3.847022 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000691 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -2.232768 | 3.224046 |
| 4 | 0.000000 | -9.112391 | 18.225832 | 541.000000 | 364.000000 | 718.000000 | 0.500000 | 364.000000 | 364.000000 | 364.000000 | 0.004645 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.112391 | 1.573158 |
| 5 | 0.000000 | 1.982523 | 0.584465 | 503.000000 | 352.000000 | 654.000000 | 0.000000 | 503.000000 | 352.000000 | 654.000000 | 0.004241 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -4.017477 | 0.584465 |
| 6 | 0.000000 | -8.959490 | 18.442066 | 629.500000 | 541.000000 | 718.000000 | 0.500000 | 541.000000 | 541.000000 | 541.000000 | 0.002716 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.959490 | 1.356924 |
| 7 | 0.000000 | 5.517008 | 0.094176 | 440.500000 | 394.000000 | 487.000000 | 0.000000 | 440.500000 | 394.000000 | 487.000000 | 0.001769 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -0.482992 | 0.094176 |
| 8 | 0.000000 | -9.263465 | 18.012180 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000987 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.263465 | 1.786810 |
| 9 | 0.000000 | -13.000000 | 12.727922 | 586.000000 | 454.000000 | 718.000000 | 0.500000 | 454.000000 | 454.000000 | 454.000000 | 0.002943 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -5.000000 | 7.071068 |
| 10 | 0.000000 | -3.985678 | 0.000226 | 635.500000 | 553.000000 | 718.000000 | 0.500000 | 553.000000 | 553.000000 | 553.000000 | 0.001814 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.014322 | 0.000225 |
| 11 | 0.000000 | -8.366700 | 19.280397 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001235 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.366700 | 0.518593 |
| 12 | 0.000000 | -9.327222 | 17.922014 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000735 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.327222 | 1.876975 |
| 13 | 0.000000 | -12.989250 | 12.743125 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001106 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.010750 | 0.015203 |
| 14 | 0.000000 | 1.785522 | 2.598972 | 640.500000 | 563.000000 | 718.000000 | 0.500000 | 563.000000 | 563.000000 | 563.000000 | 0.003206 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 4.242640 | -1.214478 | 1.643669 |
| 15 | 0.000000 | -9.018981 | 18.357933 | 603.000000 | 488.000000 | 718.000000 | 0.500000 | 488.000000 | 488.000000 | 488.000000 | 0.006529 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.018981 | 1.441056 |
| 16 | 0.000000 | -3.985248 | 0.000232 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000625 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.014752 | 0.000231 |
| 17 | 0.000000 | -8.496688 | 19.096567 | 625.500000 | 533.000000 | 718.000000 | 0.500000 | 533.000000 | 533.000000 | 533.000000 | 0.003519 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.496688 | 0.702423 |
| 18 | 0.000000 | -12.985329 | 12.748671 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001027 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.014671 | 0.020748 |
| 19 | 0.000000 | -1.173172 | 3.936521 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001025 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -2.173172 | 3.134547 |
| 20 | 0.000000 | -2.000000 | 0.000000 | 391.000000 | 297.000000 | 485.000000 | 0.000000 | 391.000000 | 297.000000 | 485.000000 | 0.005021 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.000000 | -10.000000 | 0.000000 |
| 21 | 0.000000 | -0.387958 | 5.063533 | 593.500000 | 469.000000 | 718.000000 | 0.500000 | 469.000000 | 469.000000 | 469.000000 | 0.004264 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.387958 | 2.007535 |
| 22 | 0.000000 | -12.992634 | 12.738339 | 524.500000 | 331.000000 | 718.000000 | 0.500000 | 331.000000 | 331.000000 | 331.000000 | 0.005515 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.007366 | 0.010417 |
| 23 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000955 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 24 | 0.000000 | -12.989729 | 12.742447 | 635.000000 | 552.000000 | 718.000000 | 0.500000 | 552.000000 | 552.000000 | 552.000000 | 0.002888 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.010271 | 0.014526 |
| 25 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001271 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 26 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001055 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 27 | 0.000000 | -9.105782 | 18.235178 | 534.000000 | 350.000000 | 718.000000 | 0.500000 | 350.000000 | 350.000000 | 350.000000 | 0.021608 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.105782 | 1.563811 |
| 28 | 0.000000 | 2.792898 | 3.932606 | 645.500000 | 573.000000 | 718.000000 | 0.500000 | 573.000000 | 573.000000 | 573.000000 | 0.006835 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 4.242640 | -0.207102 | 0.310035 |
| 29 | 0.000000 | -3.970244 | 0.000759 | 616.500000 | 515.000000 | 718.000000 | 0.500000 | 515.000000 | 515.000000 | 515.000000 | 0.011646 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.029756 | 0.000759 |
| 30 | 0.000000 | -12.977889 | 12.759192 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001129 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.022111 | 0.031270 |
| 31 | 0.000000 | -12.983095 | 12.751829 | 586.500000 | 455.000000 | 718.000000 | 0.500000 | 455.000000 | 455.000000 | 455.000000 | 0.032435 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.016905 | 0.023908 |
| 32 | 0.000000 | -8.083347 | 19.681118 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001218 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.083347 | 0.117870 |
| 33 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001185 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 34 | 0.000000 | -4.000000 | 0.000000 | 577.500000 | 477.000000 | 678.000000 | 0.000000 | 577.500000 | 477.000000 | 678.000000 | 0.021551 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -10.000000 | 0.000000 |
| 35 | -0.000000 | 3.214173 | 0.016615 | 609.500000 | 577.000000 | 642.000000 | 0.000000 | 609.500000 | 577.000000 | 642.000000 | 0.004937 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -2.785827 | 0.016615 |
| 36 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001002 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 37 | 0.000000 | -8.405643 | 19.225323 | 691.000000 | 664.000000 | 718.000000 | 0.500000 | 664.000000 | 664.000000 | 664.000000 | 0.001766 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.405643 | 0.573666 |
| 38 | 0.000000 | 2.510188 | 0.017700 | 601.000000 | 541.000000 | 661.000000 | 0.000000 | 601.000000 | 541.000000 | 661.000000 | 0.006895 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -3.489812 | 0.017700 |
| 39 | 0.000000 | 1.143930 | 1.851457 | 676.000000 | 634.000000 | 718.000000 | 0.500000 | 634.000000 | 634.000000 | 634.000000 | 0.003330 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 4.242640 | -1.856070 | 2.391184 |
| 40 | 0.000000 | 0.305945 | 0.040185 | 385.500000 | 260.000000 | 511.000000 | 0.000000 | 385.500000 | 260.000000 | 511.000000 | 0.021996 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -5.694055 | 0.040185 |
| 41 | 0.000000 | -2.385927 | 0.019569 | 435.000000 | 378.000000 | 492.000000 | 0.000000 | 435.000000 | 378.000000 | 492.000000 | 0.004062 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -8.385927 | 0.019569 |
| 42 | 0.000000 | -3.964993 | 0.000042 | 625.000000 | 532.000000 | 718.000000 | 0.500000 | 532.000000 | 532.000000 | 532.000000 | 0.007571 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.035007 | 0.000042 |
| 43 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001561 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 44 | 0.000000 | -3.956534 | 0.000491 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001198 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.043466 | 0.000490 |
| 45 | 0.000000 | -3.973095 | 0.000793 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001338 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.026904 | 0.000793 |
| 46 | 0.000000 | -3.976156 | 0.033721 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001170 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -4.976156 | 7.104789 |
| 47 | 0.000000 | -0.779847 | 0.030023 | 598.000000 | 478.000000 | 718.000000 | 0.500000 | 478.000000 | 478.000000 | 478.000000 | 0.003763 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -6.779847 | 0.030023 |
| 48 | 0.000000 | -0.400116 | 5.054048 | 587.000000 | 544.000000 | 630.000000 | 0.000000 | 587.000000 | 544.000000 | 630.000000 | 0.002484 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.400116 | 2.017020 |
| 49 | 0.000000 | -0.343705 | 5.124783 | 487.000000 | 256.000000 | 718.000000 | 0.500000 | 256.000000 | 256.000000 | 256.000000 | 0.020560 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.343706 | 1.946285 |
| 50 | 0.000000 | 0.349097 | 6.115803 | 524.000000 | 330.000000 | 718.000000 | 0.500000 | 330.000000 | 330.000000 | 330.000000 | 0.011521 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -0.650903 | 0.955265 |
| 51 | 0.000000 | -3.959916 | 0.001149 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001324 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.040084 | 0.001149 |
| 52 | 0.000000 | -0.386721 | 5.073168 | 651.000000 | 584.000000 | 718.000000 | 0.500000 | 584.000000 | 584.000000 | 584.000000 | 0.004181 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.386721 | 1.997900 |
| 53 | 0.000000 | -12.981807 | 12.753652 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001321 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.018194 | 0.025730 |
| 54 | 0.000000 | -3.962446 | 0.002950 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001248 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.037554 | 0.002950 |
| 55 | 0.000000 | -8.976932 | 18.417400 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001200 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.976932 | 1.381590 |
| 56 | 0.000000 | -8.290108 | 19.388716 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000950 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.290108 | 0.410275 |
| 57 | 0.000000 | 1.558185 | 0.646650 | 337.500000 | 222.000000 | 453.000000 | 0.000000 | 337.500000 | 222.000000 | 453.000000 | 0.008259 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -4.441815 | 0.646650 |
| 58 | 0.000000 | -1.931802 | 2.792675 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001027 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 4.242640 | -4.931802 | 7.035316 |
| 59 | 0.000000 | -4.000000 | 0.000000 | 674.500000 | 631.000000 | 718.000000 | 0.500000 | 631.000000 | 631.000000 | 631.000000 | 0.003288 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -10.000000 | 0.000000 |
| 60 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001974 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 61 | 0.000000 | -8.848903 | 18.598459 | 673.500000 | 629.000000 | 718.000000 | 0.500000 | 629.000000 | 629.000000 | 629.000000 | 0.001706 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.848903 | 1.200530 |
| 62 | 0.000000 | 4.090808 | 0.014869 | 707.500000 | 697.000000 | 718.000000 | 0.500000 | 697.000000 | 697.000000 | 697.000000 | 0.000990 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -1.909192 | 0.014869 |
| 63 | 0.000000 | -11.091834 | 15.426476 | 678.000000 | 638.000000 | 718.000000 | 0.500000 | 638.000000 | 638.000000 | 638.000000 | 0.002370 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -3.091834 | 4.372514 |
| 64 | 0.000000 | 0.816241 | 6.788723 | 504.000000 | 398.000000 | 610.000000 | 0.000000 | 504.000000 | 398.000000 | 610.000000 | 0.009033 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -0.183759 | 0.282345 |
| 65 | 0.000000 | -12.971285 | 12.768532 | 639.500000 | 561.000000 | 718.000000 | 0.500000 | 561.000000 | 561.000000 | 561.000000 | 0.004788 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.028715 | 0.040609 |
| 66 | 0.000000 | -3.978978 | 0.000921 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001102 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | 0.021022 | 0.000921 |
| 67 | 0.000000 | 0.548499 | 6.408890 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000945 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -0.451501 | 0.662178 |
| 68 | 0.000000 | -9.647604 | 17.468925 | 570.500000 | 423.000000 | 718.000000 | 0.500000 | 423.000000 | 423.000000 | 423.000000 | 0.025197 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.647604 | 2.330064 |
| 69 | 0.000000 | -10.832811 | 15.792789 | 559.500000 | 401.000000 | 718.000000 | 0.500000 | 401.000000 | 401.000000 | 401.000000 | 0.038960 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -2.832811 | 4.006200 |
| 70 | 0.000000 | -22.000000 | 0.000000 | 690.500000 | 663.000000 | 718.000000 | 0.500000 | 663.000000 | 663.000000 | 663.000000 | 0.004275 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 71 | 0.000000 | -12.983641 | 12.751058 | 465.500000 | 213.000000 | 718.000000 | 0.500000 | 213.000000 | 213.000000 | 213.000000 | 0.048212 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.016360 | 0.023136 |
| 72 | 0.000000 | -12.985830 | 12.747961 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001176 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.014170 | 0.020039 |
| 73 | 0.000100 | 1.293883 | 0.501316 | 506.500000 | 295.000000 | 718.000000 | 0.500000 | 295.000000 | 295.000000 | 295.000000 | 0.086380 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -4.706117 | 0.501316 |
| 74 | 0.000000 | -12.637522 | 13.240543 | 587.000000 | 486.000000 | 688.000000 | 0.000000 | 587.000000 | 486.000000 | 688.000000 | 0.041948 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -4.637521 | 6.558445 |
| 75 | 0.000000 | -8.195321 | 19.522764 | 644.000000 | 570.000000 | 718.000000 | 0.500000 | 570.000000 | 570.000000 | 570.000000 | 0.018705 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.195321 | 0.276226 |
| 76 | 0.000000 | -9.506197 | 17.668905 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001101 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.506197 | 2.130084 |
| 77 | 0.000000 | -9.085239 | 18.264231 | 641.500000 | 565.000000 | 718.000000 | 0.500000 | 565.000000 | 565.000000 | 565.000000 | 0.038641 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.085240 | 1.534761 |
| 78 | 0.000000 | 4.289712 | 0.292143 | 683.000000 | 648.000000 | 718.000000 | 0.500000 | 648.000000 | 648.000000 | 648.000000 | 0.008802 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -1.710288 | 0.292143 |
| 79 | 0.000000 | -12.986875 | 12.746484 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001148 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.013125 | 0.018562 |
| 80 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001387 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 81 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000819 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 82 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001463 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 83 | 0.000000 | -13.000000 | 12.727922 | 662.500000 | 607.000000 | 718.000000 | 0.500000 | 607.000000 | 607.000000 | 607.000000 | 0.027296 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -5.000000 | 7.071068 |
| 84 | 0.000100 | -8.069077 | 19.701302 | 584.000000 | 450.000000 | 718.000000 | 0.500000 | 450.000000 | 450.000000 | 450.000000 | 0.104870 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.069076 | 0.097689 |
| 85 | 0.000200 | -9.363983 | 17.870026 | 569.000000 | 420.000000 | 718.000000 | 0.500000 | 420.000000 | 420.000000 | 420.000000 | 0.166438 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.363983 | 1.928963 |
| 86 | 0.000300 | -13.000000 | 12.727922 | 527.500000 | 337.000000 | 718.000000 | 0.500000 | 337.000000 | 337.000000 | 337.000000 | 0.278213 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -5.000000 | 7.071068 |
| 87 | 0.000300 | -0.112757 | 5.169596 | 457.000000 | 196.000000 | 718.000000 | 0.500000 | 196.000000 | 196.000000 | 196.000000 | 0.325931 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.112757 | 1.901471 |
| 88 | 0.000200 | -3.634885 | 0.447414 | 587.000000 | 456.000000 | 718.000000 | 0.500000 | 456.000000 | 456.000000 | 456.000000 | 0.199767 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -4.634885 | 6.623653 |
| 89 | 0.000400 | 0.871788 | 6.865792 | 508.000000 | 298.000000 | 718.000000 | 0.500000 | 298.000000 | 298.000000 | 298.000000 | 0.363610 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -0.128212 | 0.205277 |
| 90 | 0.000000 | -4.042986 | 0.094284 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001259 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -6.000000 | 0.000000 | -0.042986 | 0.094284 |
| 91 | 0.000000 | -12.983546 | 12.751191 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001957 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -3.000000 | 4.242640 | 0.016454 | 0.023269 |
| 92 | 0.000000 | -9.239710 | 18.045776 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001780 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -1.239710 | 1.753215 |
| 93 | 0.000300 | -0.628619 | 4.722605 | 554.000000 | 390.000000 | 718.000000 | 0.500000 | 390.000000 | 390.000000 | 390.000000 | 0.312774 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -1.628619 | 2.348463 |
| 94 | 0.000000 | -8.356527 | 19.294785 | 692.500000 | 667.000000 | 718.000000 | 0.500000 | 667.000000 | 667.000000 | 667.000000 | 0.015856 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.356527 | 0.504206 |
| 95 | 0.000000 | 0.819553 | 6.786077 | 710.000000 | 702.000000 | 718.000000 | 0.500000 | 702.000000 | 702.000000 | 702.000000 | 0.002237 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 7.071068 | -0.180447 | 0.284991 |
| 96 | 0.000000 | 5.888716 | 0.034997 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001013 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -0.111284 | 0.034997 |
| 97 | 0.000400 | 1.610486 | 0.819715 | 558.000000 | 398.000000 | 718.000000 | 0.500000 | 398.000000 | 398.000000 | 398.000000 | 0.391605 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 0.000000 | -4.389514 | 0.819715 |
| 98 | 0.000300 | -8.591236 | 18.962856 | 579.000000 | 440.000000 | 718.000000 | 0.500000 | 440.000000 | 440.000000 | 440.000000 | 0.310268 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -0.591236 | 0.836134 |
| 99 | 0.000000 | -22.000000 | 0.000000 | 718.000000 | 718.000000 | 718.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001404 | -2.000000 | 0.000000 | -20.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 100 | 0.000100 | -11.681906 | 14.591989 | 655.000000 | 592.000000 | 718.000000 | 0.500000 | 592.000000 | 592.000000 | 592.000000 | 0.089281 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | 2.000000 | 2.828427 | -3.681906 | 5.207002 |

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n", + "def matmul(A, B):\n", + " # ensure dimensions\n", + " m = len(A)\n", + " n = len(A[0])\n", + " p = len(B[0]) if B else 0\n", + " return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]\n", + "None\n", + "def matmul(A, B):\n", + " # A: m x k\n", + " # B: k x n\n", + " # returns m x n\n", + " ...\n", + "def matmul(A, B):\n", + " # A: r x p, B: p x c\n", + " r, p = len(A), len(A[0]) if A else 0\n", + " p2, c = len(B), len(B[0]) if B else 0\n", + " if r == 0 or c == 0 or p != p2:\n", + " raise ValueError(\"Incompatible dimensions for multiplication\")\n", + " # transpose B to improve locality\n", + " B_T = list(zip(*B)) # c x p\n", + " result = [[0] * c for _ in range(r)]\n", + " for i in range(r):\n", + " Ai = A[i]\n", + " Ri = result[i]\n", + " for j, Bj in enumerate(B_T):\n", + " s = 0\n", + " for k in range(p):\n", + " s += Ai[k] * Bj[k]\n", + " Ri[j] = s\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[float]]\n", + " Left‑hand matrix with shape m x k.\n", + " B : list[list[float]]\n", + " Right‑hand matrix with shape k x n.\n", + "\n", + " Returns\n", + " -------\n", + " list[list[float]]\n", + " Resulting matrix with shape m x n.\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If dimensions are incompatible.\n", + " \"\"\"\n", + " if not A or not B:\n", + " return []\n", + "\n", + " # Dimensions\n", + " rows_a, cols_a = len(A), len(A[0])\n", + " rows_b, cols_b = len(B), len(B[0])\n", + "\n", + " if cols_a != rows_b:\n", + " raise ValueError(\"Incompatible dimensions for multiplication\")\n", + "\n", + " # Convert B into columns for faster row‑by‑row multiplication\n", + " B_t = 
[list(col) for col in zip(*B)]\n", + "\n", + " # Compute each row of the result\n", + " result = [\n", + " [sum(a * b for a, b in zip(row_a, col_b)) for col_b in B_t]\n", + " for row_a in A\n", + " ]\n", + "\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"Compute the matrix product of A and B using pure Python.\"\"\"\n", + " # Quick sanity checks\n", + " m, n = len(A), len(A[0])\n", + " p, q = len(B), len(B[0])\n", + " assert n == p, \"Number of columns of A must equal number of rows of B\"\n", + " # Prepare the result matrix with zeros\n", + " result = [[0] * q for _ in range(m)]\n", + " # This variant loops over the outermost index that is likely to be cache-friendly\n", + " # and prefetches the row of B so that we access its elements in order.\n", + " for i in range(m):\n", + " row_a = A[i]\n", + " for k in range(n):\n", + " aik = row_a[k]\n", + " if aik: # skip the zero case to save work\n", + " row_b = B[k]\n", + " for j in range(q):\n", + " result[i][j] += aik * row_b[j]\n", + " return result\n", + "def matmul(A, B):\n", + " m = len(A)\n", + " if m==0: return []\n", + " n = len(A[0]) or 0\n", + " p = len(B[0]) if B else 0\n", + " # ensure B has n rows\n", + " # Use list comprehension summing product over k\n", + " # Compute B transposed for column-wise access: BT = list(zip(*B))\n", + " BT = list(zip(*B))\n", + " return [[sum(a*b for a,b in zip(row,col)) for col in BT] for row in A]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices where each matrix is represented as a list of lists\n", + " and the elements are integers or floats.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[int|float]]\n", + " Left‑hand matrix of size (m × p).\n", + " B : list[list[int|float]]\n", + " Right‑hand matrix of size (p × n).\n", + "\n", + " Returns\n", + " -------\n", + " list[list[int|float]]\n", + " The product matrix C = A @ B of size (m × n).\n", + " \"\"\"\n", + "\n", + " # Dimensions: A: m×p, B: p×n\n", + " 
m, p = len(A), len(A[0])\n", + " p2, n = len(B), len(B[0])\n", + " if p != p2:\n", + " raise ValueError(\"Inner dimensions must agree for matrix multiplication\")\n", + "\n", + " # Pre‑transpose B so that column access is contiguous\n", + " # This reduces random memory access during the dot product\n", + " B_T = [list(col) for col in zip(*B)]\n", + "\n", + " result = []\n", + " for i, row in enumerate(A):\n", + " # Compute each entry of the i‑th row of the result\n", + " result_row = []\n", + " for j, col in enumerate(B_T):\n", + " # use a local variable for speed\n", + " dot = 0\n", + " for a, b in zip(row, col):\n", + " dot += a * b\n", + " result_row.append(dot)\n", + " result.append(result_row)\n", + "\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using plain Python.\n", + " A: list of lists (m × n), B: list of lists (n × p).\n", + " Returns the product (m × p) as a new list of lists.\n", + " \"\"\"\n", + " if not A or not B:\n", + " return []\n", + "\n", + " m = len(A)\n", + " n = len(A[0])\n", + " p = len(B[0])\n", + "\n", + " # Quick compatibility check\n", + " assert len(B) == n, \"Incompatible matrix dimensions\"\n", + "\n", + " # Allocate result matrix\n", + " result = [[0] * p for _ in range(m)]\n", + "\n", + " # Standard triple‑loop multiplication, with a small speed‑up:\n", + " # pull outer indices, cache row and column values locally,\n", + " # and skip inner loop when the coefficient is zero.\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ri = result[i]\n", + " for k in range(n):\n", + " aik = Ai[k]\n", + " if aik: # skip zero entries\n", + " Bk = B[k]\n", + " for j in range(p):\n", + " Ri[j] += aik * Bk[j]\n", + " return result\n", + "def matmul(A, B):\n", + " # number of rows in A\n", + " m = len(A)\n", + " # number of columns in B\n", + " p = len(B[0]) if B else 0\n", + " # number of columns in A (used as number of rows in B)\n", + " k = len(A[0]) if A else 0\n", + "\n", + " # 
If shapes are incompatible, raise an error\n", + " if len(B) != k:\n", + " raise ValueError(\"Incompatible matrices: A.shape[1] != B.shape[0]\")\n", + "\n", + " # Result matrix initialised with zeros\n", + " result = [[0] * p for _ in range(m)]\n", + "\n", + " # Triple‑loop multiplication\n", + " for i in range(m):\n", + " for j in range(p):\n", + " # Compute the dot product of row i of A and column j of B\n", + " s = 0\n", + " for t in range(k):\n", + " s += A[i][t] * B[t][j]\n", + " result[i][j] = s\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " # Determine dimensions\n", + " m, n = len(A), len(A[0])\n", + " nB, p = len(B), len(B[0])\n", + " assert n == nB, \"Incompatible matrices\"\n", + " result = [[0]*p for _ in range(m)]\n", + " # Multiply\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " for k in range(n):\n", + " aik = Ai[k]\n", + " if aik:\n", + " Bk = B[k]\n", + " for j in range(p):\n", + " result[i][j] += aik * Bk[j]\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"Fast matrix product using only native Python.\n", + "\n", + " Args:\n", + " A (List[List[Number]]): left matrix (n × m)\n", + " B (List[List[Number]]): right matrix (m × p)\n", + "\n", + " Returns:\n", + " List[List[Number]]: the product A * B (n × p)\n", + "\n", + " Raises:\n", + " ValueError: if the matrices cannot be multiplied\n", + " \"\"\"\n", + " # Basic shape checks\n", + " if not A or not A[0] or not B or not B[0]:\n", + " raise ValueError(\"Input matrices must be non‑empty\")\n", + " n, m = len(A), len(A[0])\n", + " if m != len(B):\n", + " raise ValueError(\"Number of columns of A must equal number of rows of B\")\n", + "\n", + " # Transpose B once for better locality\n", + " B_T = list(zip(*B)) # now each element of B_T is a tuple representing a column\n", + "\n", + " # Compute the product\n", + " result = []\n", + " for row in A:\n", + " # dot(row, col) for each column\n", + " result.append([sum(a * b for a, b in zip(row, col)) for col in 
B_T])\n", + "\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " ...\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " k = len(B)\n", + " result = [[0]*m for _ in range(n)]\n", + " for i in range(n):\n", + " for j in range(m):\n", + " s = 0\n", + " for l in range(k):\n", + " s += A[i][l] * B[l][j]\n", + " result[i][j] = s\n", + " return result\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " m = len(A)\n", + " n = len(B[0])\n", + " p = len(B)\n", + " return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)] for i in range(m)]\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " import math\n", + " # Assume square matrices of same size and power of 2\n", + " n = len(A)\n", + " if n == 0:\n", + " return []\n", + " def add(X, Y):\n", + " return [[X[i][j] + Y[i][j] for j in range(n)] for i in range(n)]\n", + " def sub(X, Y):\n", + " return [[X[i][j] - Y[i][j] for j in range(n)] for i in range(n)]\n", + " def split(M):\n", + " k = n // 2\n", + " return [ [row[:k] for row in M[:k]],\n", + " [row[:k] for row in M[k:]],\n", + " [row[k:] for row in M[:k]],\n", + " [row[k:] for row in M[k:]] ]\n", + " def combine(A11, A12, A21, A22):\n", + " k = len(A11)\n", + " result = [ [0]* (k*2) for _ in range(k*2) ]\n", + " for i in range(k):\n", + " result[i][:k] = A11[i]\n", + " result[i][k:] = A12[i]\n", + " result[i+k][:k] = A21[i]\n", + " result[i+k][k:] = A22[i]\n", + " return result\n", + " def strassen(X, Y):\n", + " if n == 1:\n", + " return [[X[0][0]*Y[0][0]]]\n", + " a, b, c, d = split(X)\n", + " e, f, g, h = split(Y)\n", + " p1 = strassen(a, sub(f, h))\n", + " p2 = strassen(add(a, b), h)\n", + " p3 = strassen(add(c, d), e)\n", + " p4 = strassen(d, sub(g, e))\n", + " p5 = strassen(add(a, d), add(e, h))\n", + " p6 = strassen(sub(b, d), add(g, h))\n", + " p7 = strassen(sub(a, c), add(e, f))\n", + " c11 = add(sub(add(p5, p4), p2), p6)\n", + 
" c12 = add(p1, p2)\n", + " c21 = add(p3, p4)\n", + " c22 = sub(sub(add(p1, p5), p3), p7)\n", + " return combine(c11, c12, c21, c22)\n", + " return strassen(A, B)\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(A[0])\n", + " p = len(B[0])\n", + " assert len(B) == m\n", + " BT = list(zip(*B)) # transposed as tuples\n", + " C = [[sum(Ai[k] * BTj[k] for k in range(m)) for BTj in BT] for Ai in A]\n", + " return C\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " p = len(B)\n", + " # transpose B\n", + " B_T = list(map(list, zip(*B))) # list of columns\n", + " return [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " ...\n", + "def matmul(A, B):\n", + " \"\"\"Matrix multiplication using only native Python (no external libraries).\n", + "\n", + " Works for arbitrary sized matrices with compatible dimensions.\n", + " The algorithm transposes matrix B to enhance cache locality,\n", + " then uses a list‑comprehension to calculate the dot–product of\n", + " corresponding rows and columns.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[Number]]\n", + " Left matrix of shape (m, n).\n", + " B : list[list[Number]]\n", + " Right matrix of shape (n, p).\n", + "\n", + " Returns\n", + " -------\n", + " list[list[Number]]\n", + " Resulting product matrix of shape (m, p).\n", + " \"\"\"\n", + " # Transpose B for efficient column access\n", + " B_transposed = list(zip(*B)) # tuples, one per column of B\n", + " return [\n", + " [\n", + " # dot product of row from A and column from B\n", + " sum(a * b for a, b in zip(row_a, col_b))\n", + " for col_b in B_transposed\n", + " ]\n", + " for row_a in A\n", + " ]\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " # check dimension matches\n", + " nrows_a = len(A)\n", + " ncols_a = len(A[0]) if 
A else 0\n", + " nrows_b = len(B)\n", + " ncols_b = len(B[0]) if B else 0\n", + " if ncols_a != nrows_b:\n", + " raise ValueError(\"Incompatible dimensions\")\n", + " # transpose B for cache locality\n", + " BT = list(zip(*B)) # tuple of tuples used as rows\n", + " result = [[0]*ncols_b for _ in range(nrows_a)]\n", + " for i in range(nrows_a):\n", + " ai = A[i]\n", + " ri = result[i]\n", + " for j in range(ncols_b):\n", + " s = 0\n", + " bj = BT[j]\n", + " for k in range(ncols_a):\n", + " s += ai[k] * bj[k]\n", + " ri[j] = s\n", + " return result\n", + "def matmul(A, B):\n", + " B_T = list(zip(*B))\n", + " res = [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + " return res\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two square matrices A and B.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[float]]\n", + " First n × n matrix.\n", + " B : list[list[float]]\n", + " Second n × n matrix.\n", + "\n", + " Returns\n", + " -------\n", + " list[list[float]]\n", + " The product matrix C = A @ B.\n", + " \"\"\"\n", + " # Transpose B once for O(1) column access\n", + " B_T = list(zip(*B))\n", + "\n", + " # Compute C[i][j] = dot(A[i], B_T[j])\n", + " return [[sum(a * b for a, b in zip(row, col))\n", + " for col in B_T]\n", + " for row in A]\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using native Python lists.\n", + " `A` and `B` must be rectangular (i.e. 
all rows the same length).\n", + "\n", + " Returns a new matrix containing the product.\n", + " Raises ValueError if the inner dimensions do not match.\n", + " \"\"\"\n", + " if not A or not B:\n", + " return []\n", + "\n", + " n_rows_A, n_cols_A = len(A), len(A[0])\n", + " n_rows_B, n_cols_B = len(B), len(B[0])\n", + "\n", + " if n_cols_A != n_rows_B:\n", + " raise ValueError(\"cannot multiply: inner dimensions do not match\")\n", + "\n", + " # preallocate result matrix\n", + " result = [[0] * n_cols_B for _ in range(n_rows_A)]\n", + "\n", + " for i in range(n_rows_A):\n", + " # Local references for speed\n", + " row_a = A[i]\n", + " for k in range(n_cols_A):\n", + " aik = row_a[k]\n", + " if aik == 0:\n", + " continue # skip zero multiplications\n", + " row_b = B[k]\n", + " for j in range(n_cols_B):\n", + " result[i][j] += aik * row_b[j]\n", + "\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "None\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "None\n", + "None\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"Return the matrix product of A and B.\n", + "\n", + " A must be an `n×m` matrix, B an `m×p` matrix.\n", + " Matrices are represented as nested lists of numbers.\n", + " \"\"\"\n", + " if not A or not B:\n", + " return []\n", + "\n", + " # Transpose B once so we can iterate over rows efficiently\n", + " B_T = list(zip(*B))\n", + "\n", + " # Compute each entry using a dot‑product of corresponding rows\n", + " return [\n", + " [sum(a * b for a, b in zip(row_a, col_b)) for col_b in B_T]\n", + " for row_a in A\n", + " ]\n", + "None\n", + "def matmul(A, B):\n", + " # A is m x p, B is p x n\n", + " m = len(A)\n", + " p = len(A[0]) # find p\n", + " n = len(B[0])\n", + " # create result m x n\n", + " result = [... 
for i in ...]\n", + "def matmul(A, B):\n", + " m, p = len(A), len(A[0]) if A else 0\n", + " p2, n = len(B), len(B[0]) if B else 0\n", + " if p != p2: raise ValueError(\"Incompatible dimensions\")\n", + " # Precompute transpose of B for cache-friendly access\n", + " Bt = list(zip(*B))\n", + " return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "None\n", + "def matmul(A, B): return ...\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A (m×n) and B (n×p) given as lists of lists.\n", + " Uses only native Python code and is tuned for speed by transposing B.\n", + " \"\"\"\n", + " if not A:\n", + " return []\n", + "\n", + " n_rows_A, n_cols_A = len(A), len(A[0])\n", + " # Basic consistency check – assume all rows have equal length\n", + " # and B has compatible dimensions.\n", + " n_rows_B, n_cols_B = len(B), len(B[0]) if B else 0\n", + "\n", + " # Transpose B to access its columns as tuples (faster indexing)\n", + " B_T = list(zip(*B)) # shape: (p × n)\n", + "\n", + " result = []\n", + " for row_A in A: # iterate over rows of A\n", + " # Compute the dot product of row_A with each column of B\n", + " res_row = [sum(a * b for a, b in zip(row_A, col_B)) for col_B in B_T]\n", + " result.append(res_row)\n", + "\n", + " return result\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B (lists of lists) using native Python.\n", + " The matrices are assumed to be square and of compatible dimensions.\n", + " \"\"\"\n", + " n = len(A)\n", + " # Transpose B to improve cache locality for the inner sum\n", + " B_T = list(zip(*B)) # each element is a tuple\n", + " return [[sum(a * b for a, b in zip(row, col))\n", + " for col in B_T]\n", + " for row in A]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using only 
native Python code.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list of lists (m x k)\n", + " B : list of lists (k x n)\n", + "\n", + " Returns\n", + " -------\n", + " C : list of lists (m x n)\n", + " Product matrix such that C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the number of columns in A does not equal the number of rows in B.\n", + " \"\"\"\n", + " if not A or not B:\n", + " raise ValueError(\"Both matrices must be non‑empty.\")\n", + " m, k1 = len(A), len(A[0])\n", + " k2, n = len(B), len(B[0])\n", + " if k1 != k2:\n", + " raise ValueError(f\"Incompatible shapes: {m}x{k1} multiplied by {k2}x{n}\")\n", + " # Pre‑compute columns of B for faster access\n", + " B_cols = list(zip(*B)) # n tuples each of length k\n", + " # Compute the product\n", + " return [[sum(a * b for a, b in zip(row, col))\n", + " for col in B_cols] for row in A]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Fast matrix multiplication for plain Python objects.\n", + " A : list of m rows, each a list of n numbers\n", + " B : list of n rows, each a list of p numbers\n", + " Returns a new matrix of shape m × p.\n", + " \"\"\"\n", + " m = len(A)\n", + " if m == 0:\n", + " return []\n", + " n = len(A[0])\n", + " # Verify dimension compatibility\n", + " if len(B) != n or any(len(row) != n for row in A):\n", + " raise ValueError(\"Inner dimensions must agree.\")\n", + " p = len(B[0])\n", + "\n", + " # Allocate result matrix\n", + " result = [[0] * p for _ in range(m)]\n", + "\n", + " # Standard triple-loop, optimized for speed in pure Python\n", + " for i in range(m):\n", + " row_a = A[i]\n", + " row_res = result[i]\n", + " for k in range(n):\n", + " a_val = row_a[k]\n", + " row_b = B[k]\n", + " # Use local variables for performance\n", + " for j in range(p):\n", + " row_res[j] += a_val * row_b[j]\n", + "\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"Multiply two matrices A and B.\n", + 
"\n", + " A: list of m rows, each containing n elements.\n", + " B: list of n rows, each containing p elements.\n", + " Returns a new list of list representing the product matrix of shape (m, p).\n", + " \"\"\"\n", + " # Basic sanity checks\n", + " if not A or not B:\n", + " raise ValueError(\"Input matrices must be non‑empty.\")\n", + " m, n = len(A), len(A[0])\n", + " nB, p = len(B), len(B[0])\n", + " if n != nB:\n", + " raise ValueError(\"Number of columns of A must equal number of rows of B.\")\n", + " for row in A:\n", + " if len(row) != n:\n", + " raise ValueError(\"All rows of A must have the same length.\")\n", + " for row in B:\n", + " if len(row) != p:\n", + " raise ValueError(\"All rows of B must have the same length.\")\n", + "\n", + " # Allocate the result matrix (m x p) initialized to 0\n", + " result = [[0] * p for _ in range(m)]\n", + "\n", + " # Perform multiplication\n", + " for i in range(m):\n", + " rowA = A[i]\n", + " rowR = result[i]\n", + " for k in range(n):\n", + " aik = rowA[k]\n", + " if aik: # Skip work for zero multiplication\n", + " rowB = B[k]\n", + " for j in range(p):\n", + " rowR[j] += aik * rowB[j]\n", + " return result\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " # Validate dimensions\n", + " if not A or not B:\n", + " return []\n", + " if len(A[0]) != len(B):\n", + " raise ValueError(\"Matrix dimensions do not match for multiplication.\")\n", + " B_cols = list(zip(*B)) # transpose B\n", + " result = [[sum(a*b for a,b in zip(row, col)) for col in B_cols] for row in A]\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"Multiply matrices A × B (list‑of‑list format) using an optimized pure‑Python routine.\"\"\"\n", + " n, p = len(A), len(A[0]) # rows of A, columns of B (must match)\n", + " m = len(B[0]) # columns of B\n", + " # Result matrix initialized with zeros\n", + " result = [[0] * m for _ in range(n)]\n", + " for i in range(n):\n", + " rowA = A[i]\n", + " rowR = result[i]\n", + " for k in 
range(p):\n", + " aik = rowA[k]\n", + " if aik: # skip zero entries for a small extra speedup\n", + " rowBk = B[k]\n", + " for j in range(m):\n", + " rowR[j] += aik * rowBk[j]\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"Fast matrix multiplication using only native Python code.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[float]]\n", + " Left hand matrix of shape (n, m).\n", + " B : list[list[float]]\n", + " Right hand matrix of shape (m, q).\n", + "\n", + " Returns\n", + " -------\n", + " list[list[float]]\n", + " The product matrix of shape (n, q).\n", + "\n", + " Notes\n", + " -----\n", + " The routine pre-allocates the result matrix and uses local variable\n", + " bindings to reduce attribute look‑ups inside the innermost loop,\n", + " which gives a noticeable speed boost for large matrices.\n", + " \"\"\"\n", + " # Basic dimensions checking\n", + " n, m = len(A), len(A[0])\n", + " m2, q = len(B), len(B[0])\n", + " if m != m2:\n", + " raise ValueError(\"Number of columns in A must equal number of rows in B.\")\n", + "\n", + " # Pre‑allocate result matrix with zeros\n", + " result = [[0.0] * q for _ in range(n)]\n", + "\n", + " # Perform multiplication\n", + " for i in range(n):\n", + " rowA = A[i]\n", + " rowR = result[i]\n", + " for k in range(m):\n", + " aik = rowA[k]\n", + " rowB = B[k]\n", + " for j in range(q):\n", + " rowR[j] += aik * rowB[j]\n", + " return result\n", + "def matmul(A, B):\n", + " B_T = list(zip(*B)) # transpose B for inner product\n", + " return [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + "def matmul(A, B):\n", + " # A: m x n, B: n x p\n", + " m, n = len(A), len(A[0])\n", + " n2, p = len(B), len(B[0])\n", + " assert n==n2\n", + " # initialize result matrix\n", + " C = [[0]*p for _ in range(m)]\n", + " # transpose B for better locality\n", + " B_T = list(zip(*B))\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(n):\n", + " aik = 
Ai[k]\n", + " Bk = B_T[k]\n", + " for j in range(p):\n", + " Ci[j] += aik * Bk[j]\n", + " return C\n", + "def matmul(A, B):\n", + " \"\"\"Fast matrix multiplication using pure Python.\n", + "\n", + " Arguments:\n", + " A: List of lists, the left matrix of size m x n.\n", + " B: List of lists, the right matrix of size n x p.\n", + "\n", + " Returns a new matrix C of size m x p where C[i][j] = sum(A[i][k] * B[k][j] for k in range(n)).\n", + "\n", + " This implementation transposes B once to enable efficient column access\n", + " and uses nested list comprehensions together with the built‑in `sum`\n", + " function, which is implemented in C.\n", + " \"\"\"\n", + " # Verify dimensions\n", + " if not A or not B:\n", + " raise ValueError(\"Matrices cannot be empty\")\n", + " n = len(A[0])\n", + " if any(len(row) != n for row in A):\n", + " raise ValueError(\"All rows in A must have the same length\")\n", + " if len(B) != n:\n", + " raise ValueError(\"Number of columns in A must equal number of rows in B\")\n", + " p = len(B[0])\n", + " if any(len(row) != p for row in B):\n", + " raise ValueError(\"All rows in B must have the same length\")\n", + "\n", + " # Transpose B to get columns as rows\n", + " B_cols = list(zip(*B))\n", + "\n", + " # Compute product\n", + " return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]\n", + "def matmul(A, B):\n", + " BT = list(zip(*B))\n", + " return [[sum(a*b for a,b in zip(row, col)) for col in BT] for row in A]\n", + "def matmul(A, B):\n", + " \"\"\"Multiplies matrix A by matrix B using pure native Python.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[float]]\n", + " The first matrix of shape (l, m).\n", + " B : list[list[float]]\n", + " The second matrix of shape (m, n).\n", + "\n", + " Returns\n", + " -------\n", + " list[list[float]]\n", + " The product matrix of shape (l, n).\n", + " \"\"\"\n", + " # Pre‑get dimensions for speed\n", + " l = len(A) # Number of rows in A\n", + " m = 
len(A[0]) # Number of columns in A / rows in B\n", + " n = len(B[0]) # Number of columns in B\n", + "\n", + " # The result will have shape (l, n)\n", + " result = [[0.0] * n for _ in range(l)]\n", + "\n", + " for i in range(l):\n", + " Ai = A[i]\n", + " for k in range(m):\n", + " aik = Ai[k]\n", + " Bk = B[k]\n", + " # Unroll column loop to reduce attribute lookups\n", + " for j in range(n):\n", + " result[i][j] += aik * Bk[j]\n", + "\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A (n x m) and B (m x p) using only native Python.\n", + " Returns the resulting matrix as a list of lists.\n", + " \"\"\"\n", + " n = len(A)\n", + " m = len(A[0])\n", + " if len(B) != m:\n", + " raise ValueError(\"Number of columns in A must equal number of rows in B\")\n", + " p = len(B[0])\n", + "\n", + " # Pre‑allocate result matrix with zeros\n", + " result = [[0] * p for _ in range(n)]\n", + "\n", + " for i in range(n):\n", + " row_a = A[i]\n", + " for k in range(m):\n", + " aik = row_a[k] # A[i][k]\n", + " row_b = B[k] # The k-th row of B\n", + " for j in range(p):\n", + " result[i][j] += aik * row_b[j] # C[i][j] += A[i][k] * B[k][j]\n", + " return result\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "None\n", + "def matmul(A, B):\n", + " # assume A dims m x n, B n x p\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " # Should handle maybe rectangular matrices\n", + " ...\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " result = []\n", + " for i in range(len(A)):\n", + " res_row = []\n", + " for j in range(len(B[0])):\n", + " sum_val = 0\n", + " for k in range(len(B)):\n", + " sum_val += A[i][k] * B[k][j]\n", + " res_row.append(sum_val)\n", + " result.append(res_row)\n", + " return result\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices represented as 
lists of lists using only\n", + " standard Python code. Raises a ValueError if the matrices cannot\n", + " be multiplied.\n", + " \n", + " Parameters\n", + " ----------\n", + " A : List[List[Number]]\n", + " The left-hand-side matrix of shape (m, n).\n", + " B : List[List[Number]]\n", + " The right-hand-side matrix of shape (n, p).\n", + "\n", + " Returns\n", + " -------\n", + " C : List[List[Number]]\n", + " The product matrix of shape (m, p).\n", + " \"\"\"\n", + " if not A or not B:\n", + " raise ValueError(\"Empty matrices cannot be multiplied\")\n", + "\n", + " # Verify inner dimensions match\n", + " n = len(A[0])\n", + " for row in A:\n", + " if len(row) != n:\n", + " raise ValueError(\"All rows of A must have the same length\")\n", + " if len(B) != n:\n", + " raise ValueError(\"Number of columns in A must equal number of rows in B\")\n", + "\n", + " p = len(B[0])\n", + " for row in B:\n", + " if len(row) != p:\n", + " raise ValueError(\"All rows of B must have the same length\")\n", + "\n", + " # Transpose B so that columns can be accessed as tuples\n", + " B_T = list(zip(*B)) # Each element is a tuple of length n\n", + "\n", + " # Compute the product\n", + " C = []\n", + " for a_row in A:\n", + " c_row = []\n", + " for b_col in B_T:\n", + " dot_product = sum(x * y for x, y in zip(a_row, b_col))\n", + " c_row.append(dot_product)\n", + " C.append(c_row)\n", + "\n", + " return C\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B represented as nested lists.\n", + " \n", + " Parameters:\n", + " A (list[list[float]]): Matrix of size (m x n).\n", + " B (list[list[float]]): Matrix of size (n x p).\n", + " \n", + " Returns:\n", + " list[list[float]]: Resultant matrix of size (m x p).\n", + " \"\"\"\n", + " # Pre‑compute the columns of B\n", + " B_cols = list(zip(*B))\n", + " \n", + " # Compute the product row by row\n", + " return [\n", + " [\n", + " sum(a * b for a, b in zip(row, col))\n", + " for col in B_cols\n", + " ]\n", + " for row 
in A\n", + " ]\n", + "def matmul(A, B):\n", + " # Basic dimension check\n", + " if not A or not B:\n", + " return []\n", + " n_rows_a = len(A)\n", + " n_cols_a = len(A[0])\n", + " n_rows_b = len(B)\n", + " n_cols_b = len(B[0])\n", + " if n_cols_a != n_rows_b:\n", + " raise ValueError(\"Incompatible dimensions for matrix multiplication\")\n", + " result = [[0] * n_cols_b for _ in range(n_rows_a)]\n", + " for i in range(n_rows_a):\n", + " for k in range(n_cols_a):\n", + " aik = A[i][k]\n", + " if aik == 0:\n", + " continue\n", + " for j in range(n_cols_b):\n", + " result[i][j] += aik * B[k][j]\n", + " return result\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " # A: m x n, B: n x p\n", + " # returns m x p\n", + " # check sizes\n", + " m, n = len(A), len(A[0])\n", + " assert len(B) == n\n", + " p = len(B[0])\n", + " # Precompute transpose of B\n", + " Bt = list(zip(*B))\n", + " res = [[sum(a*b for a,b in zip(row, col)) for col in Bt] for row in A]\n", + " return res\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " \"\"\"Multiply two matrices A and B (list of lists) purely in plain Python.\"\"\"\n", + " m, n = len(A), len(A[0])\n", + " nB, p = len(B), len(B[0])\n", + " if n != nB:\n", + " raise ValueError(\"Inner matrix dimensions must agree.\")\n", + " # Pre‑allocate result matrix\n", + " C = [[0] * p for _ in range(m)]\n", + " # Perform multiplication in the ctr order: i, k, j\n", + " for i in range(m):\n", + " rowA = A[i]\n", + " rowC = C[i]\n", + " for k in range(n):\n", + " aik = rowA[k]\n", + " if aik: # skip zero multiplications\n", + " rowB = B[k]\n", + " for j in range(p):\n", + " rowC[j] += aik * rowB[j]\n", + " return C\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " B_T = list(zip(*B))\n", + " return [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + "def matmul(A, B):\n", + " 
return ...\n", + "None\n", + "def matmul(A, B):\n", + " ...\n", + "def matmul(A, B):\n", + " ...\n", + "def matmul(A, B):\n", + " ...\n", + "None\n", + "def matmul(A, B):\n", + " # assume A rows, B columns\n", + " m, k1 = len(A), len(A[0]) if A else 0\n", + " k2, n = len(B), len(B[0]) if B else 0\n", + " assert k1 == k2, \"Inner dimensions must match\"\n", + " B_T = list(zip(*B)) # transposed B\n", + " result = [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B (both given as lists of lists) using\n", + " only native Python constructs.\n", + "\n", + " The function first validates that the inner dimensions match (number of\n", + " columns in A must equal the number of rows in B). Then it computes the\n", + " product using a straightforward triple‑loop but expressed in a compact\n", + " list‑comprehension. The columns of B are accessed by transposing B\n", + " once via ``zip(*B)``, which avoids explicit indexing and gives good\n", + " cache locality in Python.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : List[List[Number]]\n", + " Left‑hand matrix of size m × n.\n", + "\n", + " B : List[List[Number]]\n", + " Right‑hand matrix of size n × p.\n", + "\n", + " Returns\n", + " -------\n", + " List[List[Number]]\n", + " The matrix product of A and B, which has shape m × p.\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the matrices cannot be multiplied due to incompatible dimensions.\n", + " \"\"\"\n", + " # Validate dimensions\n", + " if not A or not B:\n", + " raise ValueError(\"Matrices cannot be empty\")\n", + " n_cols_A = len(A[0])\n", + " n_rows_B = len(B)\n", + " if n_cols_A != n_rows_B:\n", + " raise ValueError(\"Inner dimensions must match: \"\n", + " f\"{n_cols_A} != {n_rows_B}\")\n", + "\n", + " # Transpose B once to make column access efficient\n", + " B_cols = list(zip(*B))\n", + "\n", + " # Compute 
product using list comprehensions\n", + " return [\n", + " [sum(a * b for a, b in zip(row, col)) for col in B_cols]\n", + " for row in A\n", + " ]\n", + "def matmul(A, B):\n", + " if not A or not B:\n", + " return []\n", + " m, p = len(A), len(A[0])\n", + " n = len(B[0])\n", + " # ensure inner dimension matches\n", + " result = [[0]*n for _ in range(m)]\n", + " for i in range(m):\n", + " for k in range(p):\n", + " aik = A[i][k]\n", + " for j in range(n):\n", + " result[i][j] += aik * B[k][j]\n", + " return result\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " p = len(B)\n", + " return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)] for i in range(n)]\n", + "def matmul(A, B):\n", + " # Basic check dims\n", + " n = len(A)\n", + " assert n > 0\n", + " m = len(A[0])\n", + " # B has size m x p\n", + " assert len(B) == m\n", + " p = len(B[0])\n", + " # Pre-allocate result\n", + " C = [[0]*p for _ in range(n)]\n", + " # Compute transpose of B for better locality\n", + " B_T = list(map(list, zip(*B)))\n", + " for i in range(n):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for j in range(p):\n", + " Bj = B_T[j]\n", + " s = 0\n", + " for k in range(m):\n", + " s += Ai[k] * Bj[k]\n", + " Ci[j] = s\n", + " return C\n", + "def matmul(A, B):\n", + " \"\"\"Return the product of two matrices in Theta(m^3/mn)*n^2 time,\n", + " optimizing cache usage if --precache-multiple was selected.\n", + " \"\"\"\n", + " _check_input(A, B)\n", + " m = len(A); n = len(A[0]); k = len(B[0]); # (m x n) * (n x k) => (m x k)\n", + " _use_a = (m / n <= 100.0 * n / k)\n", + " M_local_cache_avail_prefs = sympy.cache.get('M_local_cache_avail_prefs', 0)\n", + " M_local_cache_avail_nonp = sympy.cache.get('M_local_cache_avail_nonp', 0)\n", + "\n", + " if sympy.cache and M_local_cache_avail_nonp and _use_a:\n", + " local_cache_key_A = (m,n,1)\n", + " local_cache_key_B = (n,k,1)\n", + " if local_cache_key_A not in sympy.cache:\n", + " sympy.cache[local_cache_key_A] = 
[(r,c) for r in range(m) for c in range(n)]\n", + " if local_cache_key_B not in sympy.cache:\n", + " sympy.cache[local_cache_key_B] = [(r,c) for r in range(n) for c in range(k)]\n", + " A_local, B_local = sympy.cache[local_cache_key_A], sympy.cache[local_cache_key_B]\n", + " else:\n", + " A_local, B_local = None, None\n", + "\n", + " def fast_matmul(A_local, B_local, _):\n", + " result = [[sum(A_local[i][j] * B_local[j][l] for j in range(n))\n", + " for l in range(k)]\n", + " for i in range(m)]\n", + " return result\n", + "\n", + " if A_local and B_local:\n", + " return fast_matmul(A_local, B_local, k)\n", + " else:\n", + " return fast_matmul(A, B, k)\n", + "def matmul(A, B):\n", + " n, m = len(A), len(B)\n", + " assert all(len(row)==m for row in A) and all(len(row)==len(B[0]) for row in B)\n", + " # maybe compute B transposed\n", + " BT = list(zip(*B))\n", + " result = [[sum(a*b for a,b in zip(row, col)) for col in BT] for row in A]\n", + " return result\n", + "def matmul(A, B):\n", + " m = len(A)\n", + " n = len(A[0])\n", + " p = len(B[0])\n", + " # compute result matrix C(m x p)\n", + " result = [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]\n", + " return result\n", + "None\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " p = len(B)\n", + " result = [[0]*m for _ in range(n)]\n", + " for i in range(n):\n", + " Ai=A[i]\n", + " Ri=result[i]\n", + " for k in range(p):\n", + " aik=Ai[k]\n", + " Bk=B[k]\n", + " for j in range(m):\n", + " Ri[j]+=aik*Bk[j]\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A (n x m) and B (m x p) without using external libraries.\n", + " \"\"\"\n", + " n=len(A)\n", + " m=len(A[0])\n", + " p=len(B[0])\n", + " # Check compatibility\n", + " if len(B)!=m:\n", + " raise ValueError(\"Incompatible dimensions\")\n", + " # Transpose B for cache-friendly access\n", + " B_T=[list(col) for col in zip(*B)]\n", + " # Prepare 
result matrix\n", + " result=[[0]*p for _ in range(n)]\n", + " for i in range(n):\n", + " row=A[i]\n", + " # local assignments\n", + " res_row=result[i]\n", + " for j in range(p):\n", + " col=B_T[j]\n", + " s=0\n", + " for a, b in zip(row, col):\n", + " s += a*b\n", + " res_row[j]=s\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"Multiply two matrices A (m x n) and B (n x p) using pure Python.\n", + "\n", + " Args:\n", + " A: list of m lists, each of length n.\n", + " B: list of n lists, each of length p.\n", + "\n", + " Returns:\n", + " C: list of m lists, each of length p, the product.\n", + " \"\"\"\n", + " m = len(A); n = len(A[0]); p = len(B[0])\n", + " # Preallocate result\n", + " C = [[0]*p for _ in range(m)]\n", + " # transpose B for better cache (although Python's memory model)\n", + " B_T = [list(col) for col in zip(*B)]\n", + " # iterate over rows of A and rows of B_T\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(p):\n", + " s = 0\n", + " Bk = B_T[k]\n", + " for j in range(n):\n", + " s += Ai[j] * Bk[j]\n", + " Ci[k] = s\n", + " return C\n", + "def matmul(A, B):\n", + " # A: m x n, B: n x p\n", + " m, n = len(A), len(A[0])\n", + " n2, p = len(B), len(B[0])\n", + " assert n == n2\n", + " # Precompute columns of B\n", + " Bcols = list(zip(*B))\n", + " return [[sum(a*b for a,b in zip(row,c)) for c in Bcols] for row in A]\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices using pure Python.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : List[List[Number]]\n", + " The first matrix (m × n).\n", + " B : List[List[Number]]\n", + " The second matrix (n × p).\n", + "\n", + " Returns\n", + " -------\n", + " List[List[Number]]\n", + " The product matrix (m × p).\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the inner dimensions of A and B do not match.\n", + " \"\"\"\n", + " if not A or not B or not B[0]:\n", + " return []\n", + "\n", + " m = 
len(A)\n", + " n = len(A[0])\n", + " if len(B) != n:\n", + " raise ValueError(\"Inner dimensions must agree: A: {}, B: {}.\".format(n, len(B)))\n", + "\n", + " p = len(B[0])\n", + " # column count of B must be p\n", + " # Use list comprehensions for a compact, Python‑native implementation\n", + " return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]\n", + "def matmul(A, B):\n", + " ...\n", + "None\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " # matrix multiplication using only native Python\n", + " ...\n", + " return ...\n", + "def matmul(A, B):\n", + " n_rows = len(A)\n", + " n_cols = len(B[0])\n", + " k = len(B) # check match\n", + " if any(len(row)!=k for row in A):\n", + " raise ValueError(\"A dimensions don't match B\")\n", + " # transpose B for better cache\n", + " B_T = list(zip(*B)) # returns tuples but we can keep\n", + " result = [[0]*n_cols for _ in range(n_rows)]\n", + " for i in range(n_rows):\n", + " Ai = A[i]\n", + " Ri = result[i]\n", + " for j, Bj in enumerate(B_T):\n", + " s = 0\n", + " for a,b in zip(Ai, Bj):\n", + " s += a*b\n", + " Ri[j] = s\n", + " return result\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " p = len(B)\n", + " # assert p == len(A[0])? we can compute\n", + " result = [[0]*m for _ in range(n)]\n", + " for i in range(n):\n", + " ai = A[i]\n", + " for k in range(p):\n", + " aik = ai[k]\n", + " if aik:\n", + " bk = B[k]\n", + " for j in range(m):\n", + " result[i][j] += aik * bk[j]\n", + " return result\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " # Ensure both are lists of lists, etc. 
\n", + " return [[sum(a*b for a,b in zip(row,col)) for col in zip(*B)] for row in A]\n", + "None\n", + "None\n", + "def matmul(A, B): return ...\n", + "None\n", + "def matmul(A, B): return ...\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiplies two matrices A and B using pure Python lists.\n", + " Expects A to be m x n and B to be n x p.\n", + " Returns the resulting m x p matrix.\n", + " \"\"\"\n", + " return [[sum(a * b for a, b in zip(rowA, colB)) for colB in zip(*B)] for rowA in A]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using only native Python constructs.\n", + " \n", + " Parameters\n", + " ----------\n", + " A : list[list[Number]]\n", + " Left‑hand matrix.\n", + " B : list[list[Number]]\n", + " Right‑hand matrix.\n", + " \n", + " Returns\n", + " -------\n", + " list[list[Number]]\n", + " The product matrix A * B.\n", + " \"\"\"\n", + " # Transpose B to ease column access\n", + " Bt = list(zip(*B))\n", + " # Compute each entry as the dot product of a row of A and a column of B\n", + " return [[sum(a * b for a, b in zip(row, col))\n", + " for col in Bt] for row in A]\n", + "None\n", + "def matmul(A, B):\n", + " # assume A: m x n, B: n x p\n", + " n = len(A)\n", + " m = len(A[0])\n", + " p = len(B[0])\n", + " # precompute columns of B via zip\n", + " columns_B = list(zip(*B))\n", + " result = [[sum(a*b for a,b in zip(row, col)) for col in columns_B] for row in A]\n", + " return result\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using plain Python.\n", + " A : list of m rows, each a list of p numbers\n", + " B : list of p rows, each a list of n numbers\n", + " Returns a list of m rows, each a list of n numbers\n", + " \"\"\"\n", + " m, p = len(A), len(A[0]) # dimensions of A\n", + " assert p == len(B), \"Incompatible matrix dimensions\"\n", + " n = len(B[0]) # number of columns of B\n", + "\n", + " # initialise result matrix with zeros\n", + " C = [[0] * n for 
_ in range(m)]\n", + "\n", + " # iterate over the shared dimension first\n", + " for k in range(p):\n", + " B_row = B[k]\n", + " for i in range(m):\n", + " aik = A[i][k]\n", + " if aik: # skip multiplies by zero\n", + " C_row = C[i]\n", + " # the inner loop that does the real work\n", + " for j in range(n):\n", + " C_row[j] += aik * B_row[j]\n", + " return C\n", + "def matmul(A, B):\n", + " m = len(A); n = len(A[0]); p = len(B[0])\n", + " # validate B has proper shape\n", + " result = [[0]*p for _ in range(m)]\n", + " for i in range(m):\n", + " rowA = A[i]\n", + " row_res = result[i]\n", + " for k in range(n):\n", + " aik = rowA[k]\n", + " if aik:\n", + " rowBk = B[k]\n", + " for j in range(p):\n", + " row_res[j] += aik * rowBk[j]\n", + " return result\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " if not A or not B:\n", + " return []\n", + " n, m = len(A), len(A[0])\n", + " p, q = len(B), len(B[0])\n", + " if m != p:\n", + " raise ValueError(\"Inner matrix dimensions must agree.\")\n", + " # compute C with zeros\n", + " C = [[0]*q for _ in range(n)]\n", + " for i in range(n):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(m):\n", + " aik = Ai[k]\n", + " Bk = B[k]\n", + " for j in range(q):\n", + " Ci[j] += aik * Bk[j]\n", + " return C\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A (n × p) and B (p × m) using a cache‑friendly\n", + " block‑style algorithm that reduces Python overhead compared to a\n", + " straightforward triple loop.\n", + " \"\"\"\n", + " n = len(A) # rows of A\n", + " p = len(B) # shared dimension\n", + " m = len(B[0]) # columns of B\n", + "\n", + " # Result matrix initialised with zeros\n", + " C = [[0] * m for _ in range(n)]\n", + "\n", + " # Optimised loop ordering: i → k → j\n", + " for i in range(n):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(p):\n", + " aik = Ai[k]\n", + " if aik: # skip zero multiplications\n", + " Bk = B[k]\n", + " for j in range(m):\n", + " Ci[j] += aik * 
Bk[j]\n", + " return C\n", + "def matmul(A, B):\n", + " # ensure convertible shapes\n", + " m, n = len(A), len(A[0])\n", + " p = len(B[0])\n", + " # Precompute transpose of B to improve cache locality\n", + " BT = list(zip(*B))\n", + " result = [[0]*p for _ in range(m)]\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ri = result[i]\n", + " for j, Bj in enumerate(BT):\n", + " s = 0\n", + " for k in range(n):\n", + " s += Ai[k] * Bj[k]\n", + " Ri[j] = s\n", + " return result\n", + "None\n", + "def matmul(A, B): return ...\n", + "None\n", + "None\n", + "None\n", + "None\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " # ensure shape\n", + " if not A: return []\n", + " m, k = len(A), len(A[0])\n", + " k2, n = len(B), len(B[0])\n", + " assert k == k2\n", + " # transpose B\n", + " B_T = list(zip(*B))\n", + " # compute\n", + " return [[sum(a*b for a,b in zip(row, col)) for col in B_T] for row in A]\n", + "None\n", + "def matmul(A, B):\n", + " # A is an m x n matrix, B is an n x p matrix\n", + " m = len(A)\n", + " n = len(A[0]) if A else 0\n", + " p = len(B[0]) if B else 0\n", + "\n", + " # In case the dimensions are incompatible, raise an error\n", + " if n != len(B):\n", + " raise ValueError(\"Incompatible matrix dimensions: A has %d cols but B has %d rows.\" % (n, len(B)))\n", + "\n", + " # Prepare the result matrix filled with zeros\n", + " C = [[0] * p for _ in range(m)]\n", + "\n", + " # Perform the multiplication using a cache-friendly triple loop\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(n):\n", + " aik = Ai[k]\n", + " Bk = B[k]\n", + " for j in range(p):\n", + " Ci[j] += aik * Bk[j]\n", + " return C\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using plain Python lists.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[float]]\n", + " Left matrix of shape (m, n).\n", + " B : list[list[float]]\n", + " Right matrix of shape (n, p).\n", + "\n", + " 
Returns\n", + " -------\n", + " list[list[float]]\n", + " The product matrix of shape (m, p).\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the inner dimensions do not agree.\n", + " \"\"\"\n", + " # Validate input\n", + " if not A or not B:\n", + " return []\n", + "\n", + " m, n_A = len(A), len(A[0])\n", + " n_B, p = len(B), len(B[0])\n", + "\n", + " if n_A != n_B:\n", + " raise ValueError(\"Inner matrix dimensions must agree: \"\n", + " f\"{n_A} != {n_B}\")\n", + "\n", + " # Pre‑allocate result matrix with zeros\n", + " C = [[0.0 for _ in range(p)] for _ in range(m)]\n", + "\n", + " # Classic triple‑loop multiplication\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ci = C[i]\n", + " for k in range(n_A):\n", + " aik = Ai[k]\n", + " Bk = B[k]\n", + " # Unroll the inner loop over p\n", + " for j in range(p):\n", + " Ci[j] += aik * Bk[j]\n", + "\n", + " return C\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiplies two matrices A and B represented as lists of lists using\n", + " plain Python code. 
Handles generic rectangular matrices.\n", + " \"\"\"\n", + " # Basic size validation\n", + " if not A or not B or not A[0] or not B[0]:\n", + " raise ValueError(\"Matrices cannot be empty\")\n", + " rows_a = len(A)\n", + " cols_a = len(A[0])\n", + " rows_b = len(B)\n", + " cols_b = len(B[0])\n", + "\n", + " if cols_a != rows_b:\n", + " raise ValueError(\"Inner dimensions must match for multiplication\")\n", + "\n", + " # Compute the product using a straightforward triple loop\n", + " result = [\n", + " [sum(A[i][k] * B[k][j] for k in range(cols_a)) for j in range(cols_b)]\n", + " for i in range(rows_a)\n", + " ]\n", + "\n", + " return result\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices using a naïve algorithm implemented in pure Python.\n", + " \"\"\"\n", + " # Get dimensions\n", + " na = len(A)\n", + " ma = len(A[0]) if A else 0\n", + " nb = len(B)\n", + " mb = len(B[0]) if B else 0\n", + " \n", + " if ma != nb:\n", + " raise ValueError(\"Inner dimensions must match for multiplication.\")\n", + " \n", + " # Create result matrix\n", + " result = [[0 for _ in range(mb)] for _ in range(na)]\n", + " \n", + " # Standard triple‑loop multiplication\n", + " for i in range(na):\n", + " Ai = A[i]\n", + " for j in range(mb):\n", + " sum = 0\n", + " for k in range(ma):\n", + " sum += Ai[k] * B[k][j]\n", + " result[i][j] = sum\n", + " return result\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B and return the product.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : list[list[Number]]\n", + " Left matrix of shape (m, n).\n", + " B : list[list[Number]]\n", + " Right matrix of shape (n, p).\n", + "\n", + " Returns\n", + " -------\n", + " C : list[list[Number]]\n", + " Matrix product of shape (m, p).\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the input matrices cannot be multiplied due to incompatible shapes.\n", + " \"\"\"\n", + " # Validate 
inputs\n", + " if not A or not B:\n", + " return []\n", + "\n", + " # Dimensions\n", + " m, n = len(A), len(A[0]) if A else 0\n", + " n_b, p = len(B), len(B[0]) if B else 0\n", + "\n", + " if n != n_b:\n", + " raise ValueError(f\"Incompatible dimensions: A is {m}x{n}, B is {n_b}x{p}\")\n", + "\n", + " # Pre-allocate result matrix with zeros\n", + " C = [[0] * p for _ in range(m)]\n", + "\n", + " # Main multiplication loop\n", + " for i in range(m):\n", + " a_row = A[i]\n", + " c_row = C[i]\n", + " for k in range(n):\n", + " aik = a_row[k]\n", + " if aik == 0:\n", + " continue # Skip multiplication by zero\n", + " b_row_k = B[k]\n", + " # Unroll the inner loop for potential speed\n", + " for j in range(p):\n", + " c_row[j] += aik * b_row_k[j]\n", + " return C\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " return ...\n", + "def matmul(A, B):\n", + " \"\"\"Multiply two matrices using only native Python lists.\"\"\"\n", + " # Basic dimension checks\n", + " if not A or not B or not A[0] or not B[0]:\n", + " raise ValueError(\"Input matrices must be non-empty\")\n", + " if len(A[0]) != len(B):\n", + " raise ValueError(\"Number of columns in A must equal number of rows in B\")\n", + "\n", + " # Transpose B to get its columns efficiently\n", + " BT = list(zip(*B)) # each is a tuple: column of B\n", + "\n", + " # Compute the product\n", + " result = [[sum(a * b for a, b in zip(row, col)) for col in BT] for row in A]\n", + " return result\n", + "def matmul(A, B):\n", + " m = len(A)\n", + " p = len(A[0])\n", + " if p==0:\n", + " return [[] for _ in range(m)]\n", + " n = len(B[0])\n", + " # transpose B for cache\n", + " Bt = [[B[i][j] for i in range(p)] for j in range(n)]\n", + " res = [[0]*n for _ in range(m)]\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " Ri = res[i]\n", + " for k in range(p):\n", + " aik = Ai[k]\n", + " Bk = Bt # not right; we need to use Bt[j][k].\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " 
...\n", + "None\n", + "def matmul(A, B):\n", + " # A: m x n, B: n x p\n", + " m, n = len(A), len(A[0])\n", + " assert n == len(B), \"Incompatible matrices\"\n", + " p = len(B[0])\n", + " # pre-transpose B to speed up\n", + " B_T = list(map(list, zip(*B))) # p x n\n", + " return [[sum(a*b for a, b in zip(row, col)) for col in B_T] for row in A]\n", + "None\n", + "def matmul(A, B): return ...\n", + "def matmul(A, B):\n", + " \"\"\"Return `A * B` for two 2‑D lists `A` and `B` using pure Python.\"\"\"\n", + " # Ensure dimensions are compatible\n", + " if not A or not B or not B[0]:\n", + " return []\n", + "\n", + " # Transpose B once to obtain column access in O(1)\n", + " B_t = list(zip(*B))\n", + "\n", + " # Compute the matrix product using list comprehensions\n", + " return [\n", + " [sum(a * b for a, b in zip(row, col))\n", + " for col in B_t] # compute dot product of row with each column of B\n", + " for row in A\n", + " ]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Compute the matrix product C = A @ B with pure Python.\n", + "\n", + " Parameters\n", + " ----------\n", + " A : List[List[float]]\n", + " Left matrix of size m × p.\n", + " B : List[List[float]]\n", + " Right matrix of size p × n.\n", + "\n", + " Returns\n", + " -------\n", + " List[List[float]]\n", + " Resulting matrix of size m × n.\n", + " \"\"\"\n", + " m, pA = len(A), len(A[0])\n", + " pB, n = len(B), len(B[0])\n", + "\n", + " if pA != pB:\n", + " raise ValueError(\"Inner matrix dimensions must agree (got %d×%d and %d×%d) \"\n", + " % (m, pA, pB, n))\n", + "\n", + " # Allocate result matrix\n", + " C = [[0.0] * n for _ in range(m)]\n", + "\n", + " # Blocked matrix multiplication for better cache locality.\n", + " # Choose a block size that works well on most systems\n", + " block = 64\n", + "\n", + " for i0 in range(0, m, block):\n", + " i_max = min(i0 + block, m)\n", + " for k0 in range(0, pA, block):\n", + " k_max = min(k0 + block, pA)\n", + " for j0 in range(0, n, block):\n", + " j_max = 
min(j0 + block, n)\n", + "\n", + " for i in range(i0, i_max):\n", + " ai = A[i]\n", + " ci = C[i]\n", + " for k in range(k0, k_max):\n", + " aik = ai[k]\n", + " bk = B[k]\n", + " for j in range(j0, j_max):\n", + " ci[j] += aik * bk[j]\n", + " return C\n", + "None\n", + "def matmul(A, B):\n", + " if not A or not B:\n", + " return []\n", + " m = len(A)\n", + " n = len(A[0])\n", + " p = len(B[0])\n", + " # check compatibility: len(B)==n\n", + " # compute result matrix with nested loops\n", + " res = [[0]*p for _ in range(m)]\n", + " for i in range(m):\n", + " Ai = A[i]\n", + " for j in range(p):\n", + " s = 0\n", + " for k in range(n):\n", + " s += Ai[k]*B[k][j]\n", + " res[i][j] = s\n", + " return res\n", + "def matmul(A, B):\n", + " # check dimensions\n", + " ...\n", + " # compute product\n", + "def matmul(A, B):\n", + " # Extract dimensions\n", + " if not A or not B or not A[0] or not B[0]:\n", + " return [] # or raise ValueError...\n", + " # Determine dimensions\n", + " p = len(A[0]) # columns of A, rows of B\n", + " if p != len(B):\n", + " raise ValueError(\"Incompatible matrix sizes.\")\n", + " # Transpose B for efficient column access\n", + " BT = list(zip(*B))\n", + " # Perform multiplication\n", + " result = []\n", + " for row in A:\n", + " new_row = []\n", + " for col in BT:\n", + " # Dot product\n", + " sum_val = 0\n", + " for a, b in zip(row, col):\n", + " sum_val += a * b\n", + " new_row.append(sum_val)\n", + " result.append(new_row)\n", + " return result\n", + "def matmul(A, B):\n", + " # optimize: transpose B\n", + " BT = list(zip(*B)) # tuple of tuples (row of B^T)\n", + " return [[sum(a*b for a, b in zip(rowA, colB)) for colB in BT] for rowA in A]\n", + "def matmul(A, B):\n", + " n = len(A)\n", + " m = len(B[0])\n", + " p = len(B)\n", + " return [[sum(A[i][k]*B[k][j] for k in range(p)) for j in range(m)] for i in range(n)]\n", + "def matmul(A, B):\n", + " \"\"\"\n", + " Multiply two matrices A and B using pure Python.\n", + "\n", + " Parameters\n", + 
" ----------\n", + " A : list[list[float | int | complex]]\n", + " The first matrix of dimensions (m, p).\n", + " B : list[list[float | int | complex]]\n", + " The second matrix of dimensions (p, n).\n", + "\n", + " Returns\n", + " -------\n", + " list[list[float | int | complex]]\n", + " The product matrix of dimensions (m, n).\n", + "\n", + " Raises\n", + " ------\n", + " ValueError\n", + " If the inner dimensions do not match.\n", + " \"\"\"\n", + " # Basic sanity checks for empty/corrupt inputs\n", + " if not A or not B:\n", + " raise ValueError(\"Input matrices cannot be empty\")\n", + "\n", + " p = len(A[0])\n", + " for row in A:\n", + " if len(row) != p:\n", + " raise ValueError(\"All rows in A must have the same length\")\n", + " q = len(B)\n", + " for row in B:\n", + " if len(row) != len(B[0]):\n", + " raise ValueError(\"All rows in B must have the same length\")\n", + "\n", + " # Ensure the inner dimensions match\n", + " if p != q:\n", + " raise ValueError(\"Inner matrix dimensions must agree: len(A[0]) != len(B)\")\n", + "\n", + " # Transpose B to avoid repeated look‑ups in the inner loop\n", + " B_T = list(zip(*B)) # Each element is a tuple representing a column\n", + "\n", + " # Perform matrix multiplication\n", + " return [[sum(a * b for a, b in zip(row, col)) for col in B_T] for row in A]\n", + "def matmul(A, B):\n", + " \"\"\"Return the matrix product of A and B using native Python.\"\"\"\n", + " # Transpose B once to avoid repeated column lookups.\n", + " B_T = list(zip(*B))\n", + " # Compute each element of the product using a generator expression.\n", + " return [[sum(a * b for a, b in zip(row, col)) for col in B_T] for row in A]\n", + "None\n", + "None\n", + "None\n", + "None\n", + "def matmul(A, B):\n", + " if not A or not B: \n", + " return []\n", + " nrowA = len(A)\n", + " ncolA = len(A[0])\n", + " nrowB = len(B)\n", + " ncolB = len(B[0])\n", + " if ncolA != nrowB:\n", + " raise ValueError(\"Incompatible dimensions for matrix 
multiplication\")\n", + " # Precompute columns of B\n", + " colsB = list(zip(*B))\n", + " return [\n", + " [sum(a*b for a, b in zip(row, col)) for col in colsB]\n", + " for row in A\n", + " ]\n" + ] + }, + { + "data": { + "text/plain": [ + "TrainOutput(global_step=100, training_loss=3.209321222243488e-05, metrics={'train_runtime': 5865.3354, 'train_samples_per_second': 0.034, 'train_steps_per_second': 0.017, 'total_flos': 0.0, 'train_loss': 3.209321222243488e-05})" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we save the LoRA adapter we just trained with GRPO!" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Verify the LoRA is actually trained!" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from safetensors import safe_open\n", + "\n", + "tensors = {}\n", + "# Read the adapter from the directory it was saved to above\n", + "with safe_open(\"gemma_4_lora/adapter_model.safetensors\", framework = \"pt\") as f:\n", + " # Verify both A and B are non zero\n", + " for key in f.keys():\n", + " tensor = f.get_tensor(key)\n", + " # Fraction of entries that are exactly zero (1.0 means untrained)\n", + " zero_fraction = (tensor == 0).sum() / tensor.numel()\n", + " assert zero_fraction.item() < 1.0, f\"{key} is all zeros\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# Inference\n", + "Now let's try the model we just trained!" 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "text = tokenizer.apply_chat_template(\n", + " [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " tokenize = False,\n", + " add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "_ = model.generate(\n", + " **tokenizer(images = None, text = text, return_tensors = \"pt\").to(\"cuda\"),\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " max_new_tokens = 1024,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = False),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Merge to 16bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Merge to 4bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Just LoRA adapters\n", + "if False:\n", + " model.save_pretrained(\"gemma_4_lora\")\n", + " tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "if False:\n", + " model.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")\n", + " tokenizer.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. All quantization methods, such as `q4_k_m`, are available. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", + "\n", + "Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)):\n", + "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", + "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", + "* `q5_k_m` - Recommended. 
Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n", + "\n", + "[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Save to 8bit Q8_0\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer,)\n", + "# Remember to go to https://huggingface.co/settings/tokens for a token!\n", + "# And change HF_USERNAME to your username!\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to 16bit GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"f16\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"f16\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to q4_k_m GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to multiple GGUF options - much faster if you want multiple!\n", + "if False:\n", + " model.push_to_hub_gguf(\n", + " \"HF_USERNAME/gemma_4_finetune\", # Change HF_USERNAME to your username!\n", + " tokenizer,\n", + " quantization_method = [\"q4_k_m\", \"q8_0\", \"q5_k_m\",],\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.ipynb new file mode 100644 index 0000000..c0e3acf --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.ipynb @@ -0,0 +1,11898 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and how to save it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Goal: Make Gemma 4 play games with Reinforcement Learning\n", + "\n", + "Our goal is to make Gemma 4 play the 2048 game using a reinforcement learning method called [GRPO](https://arxiv.org/abs/2501.12948).\n", + "\n", + "We want the model to devise a strategy to play 2048, and we will run this strategy until we win or lose. We then reward the model if it created a good strategy (winning the game), and we'll penalize it (negative reward) if the strategy was a bad one.\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Installation\n", + "We'll be using [Unsloth](https://github.com/unslothai/unsloth) to do RL on Gemma 4. Unsloth saves 70% VRAM usage and makes reinforcement learning 2 to 6x faster!" 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "%%capture\n", + "import os, importlib.util\n", + "!pip install --upgrade -qqq uv\n", + "if importlib.util.find_spec(\"torch\") is None or \"COLAB_\" in \"\".join(os.environ.keys()):\n", + " try: import numpy, PIL; _numpy = f\"numpy=={numpy.__version__}\"; _pil = f\"pillow=={PIL.__version__}\"\n", + " except: _numpy = \"numpy\"; _pil = \"pillow\"\n", + " # Gemma 4 requires transformers >= 5.5.0 — do NOT pin to 4.x here\n", + " !uv pip install -qqq \\\n", + " \"torch>=2.8.0\" \"triton>=3.4.0\" {_numpy} {_pil} torchvision bitsandbytes \\\n", + " \"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo\" \\\n", + " \"unsloth[base] @ git+https://github.com/unslothai/unsloth\" \\\n", + " git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels\n", + "elif importlib.util.find_spec(\"unsloth\") is None:\n", + " !uv pip install -qqq unsloth\n", + "# Gemma 4 requires transformers >= 5.5.0\n", + "!uv pip install --upgrade --no-deps \"transformers>=5.5.0\" tokenizers \"trl>=0.28.0\" unsloth unsloth_zoo" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from unsloth import FastVisionModel\n", + "import torch\n", + "max_seq_length = 4096 # Can increase for longer reasoning traces\n", + "lora_rank = 32 # Larger rank = smarter, but slower\n", + "\n", + "gemma4_models = [\n", + " # Gemma-4 instruct models:\n", + " \"unsloth/gemma-4-E2B-it\",\n", + " \"unsloth/gemma-4-E4B-it\",\n", + " \"unsloth/gemma-4-31B-it\",\n", + " 
\"unsloth/gemma-4-26B-A4B-it\",\n", + " # Gemma-4 base models:\n", + " \"unsloth/gemma-4-E2B\",\n", + " \"unsloth/gemma-4-E4B\",\n", + " \"unsloth/gemma-4-31B\",\n", + " \"unsloth/gemma-4-26B-A4B\",\n", + "] # More models at https://huggingface.co/unsloth\n", + "\n", + "model, tokenizer = FastVisionModel.from_pretrained(\n", + " model_name = \"unsloth/gemma-4-E2B-it\",\n", + " max_seq_length = max_seq_length,\n", + " load_in_4bit = False, # False for LoRA 16bit\n", + " fast_inference = False, # Enable vllm fast inference\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To do efficient RL, we will use [LoRA](https://arxiv.org/abs/2106.09685), which allows us to only add 1 to 5% of extra weights to the model for finetuning purposes. This allows us to save memory usage by over 60%, and yet it retains good accuracy." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "model = FastVisionModel.get_peft_model(\n", + " model,\n", + " r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128\n", + " target_modules = [\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + " ],\n", + " lora_alpha = lora_rank*2, # *2 speeds up training\n", + " use_gradient_checkpointing = \"unsloth\", # Reduces memory usage\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2048 game\n", + "\n", + "We used GPT-5 to create a variant of the 2048 game. It should output the current game board state, and allow us to advance the game board state with 1 action (up, down, left, right)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#@title (Collapsible) 2048 Game Implementation\n", + "from dataclasses import dataclass, field\n", + "from typing import List, Tuple, Optional\n", + "import random\n", + "import copy\n", + "\n", + "def _compress_and_merge_row_left(row: List[int]) -> Tuple[List[int], int, bool]:\n", + " n = len(row)\n", + " tiles = [x for x in row if x != 0]\n", + " gained = 0\n", + " i = 0\n", + " merged = []\n", + " while i < len(tiles):\n", + " if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:\n", + " v = tiles[i] * 2\n", + " gained += v\n", + " merged.append(v)\n", + " i += 2\n", + " else:\n", + " merged.append(tiles[i])\n", + " i += 1\n", + " merged += [0] * (n - len(merged))\n", + " changed = merged != row\n", + " return merged, gained, changed\n", + "\n", + "def _move_left(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]:\n", + " changed_any = False\n", + " total_gain = 0\n", + " new_board = []\n", + " for row in board:\n", + " new_row, gained, changed = _compress_and_merge_row_left(row)\n", + " new_board.append(new_row)\n", + " total_gain += gained\n", + " changed_any = changed_any or changed\n", + " return new_board, total_gain, changed_any\n", + "\n", + "def _move_right(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]:\n", + " changed_any = False\n", + " total_gain = 0\n", + " new_board = []\n", + " for row in board:\n", + " rev = list(reversed(row))\n", + " new_rev, gained, changed = _compress_and_merge_row_left(rev)\n", + " new_row = list(reversed(new_rev))\n", + " new_board.append(new_row)\n", + " total_gain += gained\n", + " changed_any = changed_any or changed\n", + " return new_board, total_gain, changed_any\n", + "\n", + "def _transpose(board: List[List[int]]) -> List[List[int]]:\n", + " return [list(row) for row in zip(*board)]\n", + "\n", + "def _move_up(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]:\n", + " t 
= _transpose(board)\n", + " moved, gain, changed = _move_left(t)\n", + " return _transpose(moved), gain, changed\n", + "\n", + "def _move_down(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]:\n", + " t = _transpose(board)\n", + " moved, gain, changed = _move_right(t)\n", + " return _transpose(moved), gain, changed\n", + "\n", + "def _empty_cells(board: List[List[int]]) -> List[Tuple[int, int]]:\n", + " size = len(board)\n", + " return [(r, c) for r in range(size) for c in range(size) if board[r][c] == 0]\n", + "\n", + "def _can_move(board: List[List[int]]) -> bool:\n", + " if _empty_cells(board):\n", + " return True\n", + " size = len(board)\n", + " for r in range(size):\n", + " for c in range(size - 1):\n", + " if board[r][c] == board[r][c + 1]:\n", + " return True\n", + " for r in range(size - 1):\n", + " for c in range(size):\n", + " if board[r][c] == board[r + 1][c]:\n", + " return True\n", + " return False\n", + "\n", + "@dataclass\n", + "class GameBoard:\n", + " size: int\n", + " seed: Optional[int] = None\n", + " target: int = 2048\n", + " probability_fours: float = 0.10 # originally spawns (4) 10% of the time!\n", + " _rng: random.Random = field(init = False, repr = False)\n", + " _board: List[List[int]] = field(init = False, repr = False)\n", + " _score: int = field(default = 0, init = False, repr = False)\n", + " _state: str = field(default = \"ongoing\", init = False, repr = False)\n", + "\n", + " def __post_init__(self):\n", + " if self.size < 2:\n", + " raise ValueError(\"Board size must be at least 2.\")\n", + " self._rng = random.Random(self.seed)\n", + " self._board = [[0 for _ in range(self.size)] for _ in range(self.size)]\n", + " self._add_random_tile()\n", + " self._add_random_tile()\n", + " self._update_state_after_change()\n", + "\n", + " class _BoardView:\n", + " def __init__(self, game: \"GameBoard\"):\n", + " self._game = game\n", + " def __iter__(self):\n", + " return iter(self._game._board)\n", + " def __len__(self):\n", + " 
return len(self._game._board)\n", + " def __getitem__(self, idx):\n", + " return self._game._board[idx]\n", + " def __repr__(self) -> str:\n", + " return repr(self._game._board)\n", + " __str__ = __repr__\n", + " def do_action(self, key: str) -> None:\n", + " self._game.do_action(key)\n", + " def state(self) -> str:\n", + " return self._game.state()\n", + " def pretty(self, colors: bool = True, border: bool = True, dot_for_zero: bool = True) -> str:\n", + " return self._game._render_pretty(colors = colors, border = border, dot_for_zero = dot_for_zero)\n", + "\n", + " def board(self) -> \"_BoardView\":\n", + " return GameBoard._BoardView(self)\n", + " def state(self) -> str:\n", + " return self._state\n", + " def score(self) -> int:\n", + " return self._score\n", + " def do_action(self, key: str) -> None:\n", + " if self._state != \"ongoing\":\n", + " return\n", + " if not isinstance(key, str) or len(key) == 0:\n", + " self._state = \"failed\"\n", + " return\n", + " k = key.strip().lower()\n", + " if k == \"q\":\n", + " self._state = \"failed\"\n", + " return\n", + " move_map = {\"a\": _move_left, \"d\": _move_right, \"w\": _move_up, \"s\": _move_down}\n", + " if k not in move_map:\n", + " self._state = \"failed\"\n", + " return\n", + " mover = move_map[k]\n", + " new_board, gain, changed = mover(self._board)\n", + " if changed:\n", + " self._board = new_board\n", + " self._score += gain\n", + " self._add_random_tile()\n", + " self._update_state_after_change()\n", + " def _add_random_tile(self) -> bool:\n", + " empties = _empty_cells(self._board)\n", + " if not empties:\n", + " return False\n", + " r, c = self._rng.choice(empties)\n", + " self._board[r][c] = 4 if self._rng.random() < self.probability_fours else 2\n", + " return True\n", + " def _update_state_after_change(self) -> None:\n", + " if any(self.target in row for row in self._board):\n", + " self._state = \"success\"\n", + " return\n", + " if not _can_move(self._board):\n", + " self._state = \"failed\"\n", 
+ " return\n", + " self._state = \"ongoing\"\n", + " def _render_pretty(self, colors: bool = True, border: bool = True, dot_for_zero: bool = True) -> str:\n", + " \"\"\"\n", + " Pretty-print the board with colors that scale from 0 up to self.target.\n", + " Uses ANSI 256-color codes (works in most terminals). Set colors = False to disable.\n", + " \"\"\"\n", + " import math\n", + "\n", + " b = self._board\n", + " mx = max((max(row) for row in b), default = 0)\n", + " cell_w = max(3, len(str(mx)))\n", + "\n", + " RESET = \"\\x1b[0m\"\n", + "\n", + " # A smooth-ish gradient from cool → warm\n", + " # (blue/cyan/green → yellow/orange/red). Tweak or expand as you like.\n", + " GRAD = [33, 39, 45, 51, 50, 49, 48, 47, 46, 82, 118, 154, 190, 226, 220, 214, 208, 202, 196]\n", + " ZERO_FG = 239 # dim gray\n", + "\n", + " def color_code(v: int) -> str:\n", + " if not colors:\n", + " return \"\"\n", + " if v == 0:\n", + " return f\"\\x1b[38;5;{ZERO_FG}m\"\n", + " # Normalize by exponent relative to target: r in [0,1]\n", + " t = max(2, self.target) # safety; avoid log2(1)\n", + " # Guard: if v is not a power of two or is <1, handle gracefully\n", + " try:\n", + " r = max(0.0, min(1.0, math.log2(v) / math.log2(t)))\n", + " except ValueError:\n", + " r = 0.0\n", + " idx = int(round(r * (len(GRAD) - 1)))\n", + " return f\"\\x1b[38;5;{GRAD[idx]}m\"\n", + "\n", + " def fmt(v: int) -> str:\n", + " s = \".\" if (v == 0 and dot_for_zero) else str(v)\n", + " s = s.rjust(cell_w)\n", + " return color_code(v) + s + (RESET if colors else \"\")\n", + "\n", + " def hline(left: str, mid: str, right: str) -> str:\n", + " return left + mid.join(\"─\" * cell_w for _ in range(self.size)) + right\n", + "\n", + " rows = []\n", + " if border:\n", + " rows.append(hline(\"┌\", \"┬\", \"┐\"))\n", + " for r in range(self.size):\n", + " content = \"│\".join(fmt(v) for v in b[r])\n", + " rows.append((\"│\" + content + \"│\") if border else content)\n", + " if border:\n", + " rows.append(hline(\"└\" if r 
== self.size - 1 else \"├\",\n", + " \"┴\" if r == self.size - 1 else \"┼\",\n", + " \"┘\" if r == self.size - 1 else \"┤\"))\n", + " return \"\\n\".join(rows)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example let's create a board of size 5 X 5 and set the target to 8 instead of 2048.\n", + "\n", + "**[NOTE]** 2048 originally spawns a (4) 10% of the time! We can disable this for harder games. See [Wikipedia page](https://en.wikipedia.org/wiki/2048_(video_game)) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;48m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ ongoing\n" + ] + } + ], + "source": [ + "game = GameBoard(size = 5, seed = 42, target = 8, probability_fours = 0.10)\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "GameBoard(size=5, seed=42, target=8, probability_fours=0.1)" + ] + }, + "execution_count": 6, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "game" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll use WASD for the action space:\n", + "\n", + "```\n", + " W\n", + "A S D\n", + "```\n", + "Also `game.state()` will say `success` if we succeeded in getting the target!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ ongoing\n" + ] + } + ], + "source": [ + "game.do_action(\"A\")\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;48m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ ongoing\n" + ] + } + ], + "source": [ + "game.do_action(\"W\")\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ ongoing\n" + ] + } + ], + "source": [ + "game.do_action(\"D\")\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ ongoing\n" + ] + } + ], + "source": [ + "game.do_action(\"W\")\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;196m 8\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\n", + "├───┼───┼───┼───┼───┤\n", + 
"│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┴───┴───┘ success\n" + ] + } + ], + "source": [ + "game.do_action(\"D\")\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we do some other action that's not part of the action space, we will get an error, and the game will not accept anymore actions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───┬───┬───┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;190m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;48m 2\u001b[0m│\n", + "├───┼───┼───┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└───┴───┴───┘ failed\n" + ] + } + ], + "source": [ + "game = GameBoard(size = 3, seed = 42, target = 8, probability_fours = 0.10)\n", + "game.do_action(\"AA\") # Not in WASD\n", + "game.do_action(\"W\") # Doesn't do anything\n", + "game.do_action(\"A\") # Doesn't do anything\n", + "print(game.board().pretty(), game.state())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# RL Environment Setup\n", + "\n", + "We'll set up a function to accept some strategy that'll emit an action within `WASD` and check the game state.\n", + "\n", + "We'll also add a timer to only execute the strategy for 2 seconds maximum, otherwise it might never terminate!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Callable\n", + "from unsloth import execute_with_time_limit\n", + "\n", + "def _execute_strategy(strategy : Callable, game : GameBoard):\n", + " assert callable(strategy)\n", + "\n", + " steps = 0\n", + " while game.state() == \"ongoing\":\n", + " action = strategy(list(game.board()))\n", + " steps += 1\n", + " if type(action) is not str:\n", + " return steps, \"failed\"\n", + " game.do_action(action)\n", + " return steps, game.state()\n", + "\n", + "@execute_with_time_limit(2)\n", + "def execute_strategy(strategy : Callable, game : GameBoard):\n", + " return _execute_strategy(strategy, game)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's make a naive strategy that always plays `W`. We should expect it to fail: once `W` no longer changes the board, the game never ends and the strategy times out:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Timed out with error = Timed out after 2s\n" + ] + } + ], + "source": [ + "def always_move_up(board):\n", + " return \"W\"\n", + "\n", + "game = GameBoard(size = 8, seed = 42, target = 2048, probability_fours = 0.10)\n", + "try:\n", + " execute_strategy(always_move_up, game)\n", + "except TimeoutError as e:\n", + " print(f\"Timed out with error = {str(e)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To allow longer strategies for Gemma 4 Reinforcement Learning, we shall allow a 5 second timer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@execute_with_time_limit(5)\n", + "def execute_strategy(strategy : Callable, game : GameBoard):\n", + " return _execute_strategy(strategy, game)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Code Execution\n", + "\n", + "Before creating and executing a model-generated Python function, we first have to check that it does not access global variables or otherwise cheat. This is called `countering reward hacking` since we don't want the function to cheat.\n", + "\n", + "For example, the code below is fine, since it only imports Python standard library modules. We check this with `check_python_modules`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Only Python imports? True\n", + "{'stdlib': ['math', 'typing'], 'non_stdlib': [], 'relative_imports': 0}\n" + ] + } + ], + "source": [ + "from unsloth import check_python_modules\n", + "\n", + "sample = \"\"\"\n", + "def strategy(board):\n", + " import math\n", + " from typing import Callable\n", + " return \"W\"\n", + "\"\"\"\n", + "ok, info = check_python_modules(sample)\n", + "print(\"Only Python imports?\", ok)\n", + "print(info)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The code below imports `numpy`, so we should not allow it to execute:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Only Python imports? 
False\n", + "{'stdlib': [], 'non_stdlib': ['numpy'], 'relative_imports': 0}\n" + ] + } + ], + "source": [ + "sample = \"\"\"\n", + "def strategy(board):\n", + " from numpy import matmul\n", + " return \"W\"\n", + "\"\"\"\n", + "ok, info = check_python_modules(sample)\n", + "print(\"Only Python imports?\", ok)\n", + "print(info)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We also disallow global variable access, using Unsloth's `create_locked_down_function`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name 'np' is not defined\n" + ] + } + ], + "source": [ + "from unsloth import create_locked_down_function\n", + "function = \"\"\"\n", + "def import_numpy():\n", + " np.matmul\n", + " print(\"Success\")\n", + "\"\"\"\n", + "f = create_locked_down_function(function)\n", + "try:\n", + " f()\n", + "except Exception as e:\n", + " print(str(e))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "60\n" + ] + } + ], + "source": [ + "from unsloth import create_locked_down_function\n", + "function = \"\"\"\n", + "def add(a, b):\n", + " def adder(a):\n", + " return a + b\n", + " return adder(b) + b\n", + "\"\"\"\n", + "f = create_locked_down_function(function)\n", + "try:\n", + " print(f(10, 20))\n", + "except Exception as e:\n", + " print(str(e))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data & RL task setup\n", + "\n", + "We now have to create a prompt to tell the model to create a strategy for the 2048 game. You can adapt this prompt for a different RL task." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Create a new short 2048 strategy using only native Python code.\n", + "You are given a list of list of numbers for the current board state.\n", + "Output one action for \"W\", \"A\", \"S\", \"D\" on what is the optimal next step.\n", + "Output your new short function in backticks using the format below:\n", + "```python\n", + "def strategy(board):\n", + " return \"W\" # Example\n", + "```\n", + "All helper functions should be inside def strategy. Only output the short function `strategy`.\n" + ] + } + ], + "source": [ + "prompt = \"\"\"\n", + "Create a new short 2048 strategy using only native Python code.\n", + "You are given a list of list of numbers for the current board state.\n", + "Output one action for \"W\", \"A\", \"S\", \"D\" on what is the optimal next step.\n", + "Output your new short function in backticks using the format below:\n", + "```python\n", + "def strategy(board):\n", + " return \"W\" # Example\n", + "```\n", + "All helper functions should be inside def strategy. 
Only output the short function `strategy`.\n", + "\"\"\".strip()\n", + "print(prompt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, let's prompt Gemma 4 without RL and see how it goes:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "text = tokenizer.apply_chat_template(\n", + " [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " tokenize = False,\n", + " add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "print(\"=\" * 50)\n", + "print(\"BASE MODEL OUTPUT (before RL training):\")\n", + "print(\"=\" * 50)\n", + "\n", + "inputs = tokenizer(\n", + " text = text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "text_streamer = TextStreamer(tokenizer, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Reward functions\n", + "\n", + "We now design an `extract_function` helper, which simply extracts the function wrapped in triple backticks.\n", + "\n", + "We also define 3 reward functions:\n", + "\n", + "1. `function_works` which rewards the model if the strategy is a valid Python function.\n", + "2. `no_cheating` which checks if the function imported non-standard-library modules, and if it did, we penalize it.\n", + "3. `strategy_succeeds` which checks if the game strategy actually succeeds in attaining 2048 after running the auto-generated strategy." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "def strategy(board):\n", + " return \"W\" # Example\n" + ] + } + ], + "source": [ + "def extract_function(text):\n", + " if text.count(\"```\") >= 2:\n", + " first = text.find(\"```\") + 3\n", + " second = text.find(\"```\", first)\n", + " fx = text[first : second].strip()\n", + " fx = fx.removeprefix(\"python\\n\")\n", + " fx = fx[fx.find(\"def\"):]\n", + " if fx.startswith(\"def strategy(board):\"): return fx\n", + " return None\n", + "print(extract_function(prompt))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Below is our `function_works` reward function. It uses Python's `exec`, but guards the call so that local and global variables cannot leak into the executed code. We can also call `check_python_modules` first to catch errors before executing the function at all:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(False,\n", + " {'error': \"SyntaxError: expected '(' (<unknown>, line 1)\",\n", + " 'stdlib': [],\n", + " 'non_stdlib': [],\n", + " 'relative_imports': 0})" + ], + "text/html": [ + "
(False,\n",
+       " {'error': "SyntaxError: expected '(' (<unknown>, line 1)",\n",
+       "  'stdlib': [],\n",
+       "  'non_stdlib': [],\n",
+       "  'relative_imports': 0})
" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ok, info = check_python_modules(\"def a\")\n", + "ok, info" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def function_works(completions, **kwargs):\n", + " scores = []\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + " if function is None or \"error\" in info:\n", + " score = -2.0\n", + " else:\n", + " try:\n", + " new_strategy = create_locked_down_function(function)\n", + " score = 1.0\n", + " except Exception:\n", + " score = -0.5\n", + " scores.append(score)\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`no_cheating` checks whether the function cheated by importing NumPy or other modules:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def no_cheating(completions, **kwargs):\n", + " scores = []\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + " scores.append(1.0 if ok else -20.0) # Penalize heavily!\n", + " else:\n", + " scores.append(-1.0) # Failed creating function\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, `strategy_succeeds` checks whether the strategy actually allows the game to terminate. 
For example, a strategy that simply always returned \"W\" would fail once it hits the time limit of 10 seconds.\n", + "\n", + "We also add a global `PRINTER` counter to periodically print out the strategy and board state."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "global PRINTER\n", + "PRINTER = 0\n", + "def strategy_succeeds(completions, **kwargs):\n", + " global PRINTER\n", + " scores = []\n", + " # Generate a random game board with seed\n", + " seed = np.random.randint(10000)\n", + " for completion in completions:\n", + " printed = False\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + " if PRINTER % 5 == 0:\n", + " printed = True\n", + " print(function)\n", + " PRINTER += 1\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + " if function is None or \"error\" in info:\n", + " scores.append(0)\n", + " continue\n", + " try:\n", + " new_strategy = create_locked_down_function(function)\n", + " except Exception:\n", + " scores.append(0)\n", + " continue\n", + " try:\n", + " game = GameBoard(size = 6, seed = seed, target = 2048, probability_fours = 0.10)\n", + " steps, game_state = execute_strategy(new_strategy, game)\n", + " print(f\"Steps = {steps} State = {game_state}\")\n", + " if printed is False:\n", + " print(function)\n", + " print(game.board().pretty())\n", + " if game_state == \"success\":\n", + " scores.append(20.0) # Success - massively reward!\n", + " else:\n", + " scores.append(2.0) # Failed but function works!\n", + " except TimeoutError as e:\n", + " print(\"Timeout\")\n", + " scores.append(-1.0) # Failed with timeout\n", + " except Exception as e:\n", + " print(f\"Exception = {str(e)}\")\n", + " scores.append(-3.0) # Failed\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll now create the dataset, which consists of 1,000 copies of our prompt."
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from datasets import Dataset\n", + "dataset = Dataset.from_list([{\"prompt\" : [{\"role\": \"user\", \"content\": prompt.strip()}], \"answer\" : 0}]*1000)\n", + "maximum_length = len(tokenizer.apply_chat_template([{\"role\":\"user\", \"content\":prompt.strip()}], add_generation_prompt = True, tokenize = True))\n", + "print(maximum_length)\n", + "dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Train the model\n", + "\n", + "Now set up the GRPO Trainer and all configurations! We also support GSPO, GAPO, Dr GRPO and more! Go to the Unsloth [Reinforcement Learning Docs](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) for more options." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Leave room for the prompt (plus 1 token safety margin)\n", + "max_completion_length = max_seq_length - (maximum_length + 1)\n", + "\n", + "from trl import GRPOConfig, GRPOTrainer\n", + "training_args = GRPOConfig(\n", + " temperature = 1.0,\n", + " top_p = 0.95,\n", + " top_k = 64,\n", + " learning_rate = 5e-5,\n", + " weight_decay = 0.001,\n", + " warmup_ratio = 0.1,\n", + " lr_scheduler_type = \"linear\",\n", + " optim = \"adamw_8bit\",\n", + " logging_steps = 1,\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 2, # Increase to 4 for smoother training\n", + " num_generations = 2, # Decrease if out of memory\n", + " max_completion_length = max_completion_length,\n", + " # num_train_epochs = 1, # Set to 1 for a full training run\n", + " max_steps = 60,\n", + " save_steps = 100,\n", + " report_to = \"none\", # Can use Weights & Biases, TrackIO\n", + " output_dir = \"outputs\",\n", + " epsilon = 0.2,\n", + " epsilon_high = 0.28, # one sided\n", + " delta = 1.5, # two sided\n", + " loss_type = 'bnpo',\n", + " 
mask_truncated_completions = True\n", + " # For optional training + evaluation\n", + " # fp16_full_eval = True,\n", + " # per_device_eval_batch_size = 4,\n", + " # eval_accumulation_steps = 1,\n", + " # eval_strategy = \"steps\",\n", + " # eval_steps = 1,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase!\n", + "\n", + "You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient!\n", + "\n", + "| Step | Training Loss | reward | reward_std | completion_length | kl |\n", + "|------|---------------|-----------|------------|-------------------|----------|\n", + "| 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 |\n", + "| 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 |\n", + "| 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# For optional training + evaluation\n", + "# new_dataset = dataset.train_test_split(test_size = 0.01)\n", + "\n", + "trainer = GRPOTrainer(\n", + " model = model,\n", + " processing_class = tokenizer,\n", + " reward_funcs = [\n", + " function_works,\n", + " no_cheating,\n", + " strategy_succeeds,\n", + " ],\n", + " args = training_args,\n", + " train_dataset = dataset,\n", + "\n", + " # For optional training + evaluation\n", + " # train_dataset = new_dataset[\"train\"],\n", + " # eval_dataset = new_dataset[\"test\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's train the model!\n", + "\n", + "**NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 199998}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 1,000 | Num Epochs = 1 | Total steps = 600\n", + "O^O/ \\_/ \\ Batch size per device = 2 | Gradient accumulation steps = 1\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2\n", + " \"-____-\" Trainable parameters = 1,990,656 of 20,916,747,840 (0.01% trained)\n", + "`generation_config` default values have been modified to match model-specific defaults: {'max_length': 131072}. If this is not desired, please set these values explicitly.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [434/600 4:48:18 < 1:50:46, 0.02 it/s, Epoch 0.43/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
| Step | Training Loss | reward | reward_std | completions / mean_length | completions / min_length | completions / max_length | completions / clipped_ratio | completions / mean_terminated_length | completions / min_terminated_length | completions / max_terminated_length | kl | rewards / function_works / mean | rewards / function_works / std | rewards / no_cheating / mean | rewards / no_cheating / std | rewards / strategy_succeeds / mean | rewards / strategy_succeeds / std |
|------|---------------|--------|------------|---------------------------|--------------------------|--------------------------|-----------------------------|--------------------------------------|-------------------------------------|-------------------------------------|----|---------------------------------|--------------------------------|------------------------------|-----------------------------|------------------------------------|-----------------------------------|
| 1 | 0.000000 | 2.500000 | 2.121320 | 429.500000 | 427.000000 | 432.000000 | 0.000000 | 429.500000 | 427.000000 | 432.000000 | 0.002332 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 2 | 0.000000 | -1.250000 | 2.474874 | 519.000000 | 452.000000 | 586.000000 | 0.500000 | 452.000000 | 452.000000 | 452.000000 | 0.002923 | -1.250000 | 1.060660 | 0.000000 | 1.414214 | 0.000000 | 0.000000 |
| 3 | 0.000000 | 0.000000 | 1.414214 | 391.000000 | 346.000000 | 436.000000 | 0.000000 | 391.000000 | 346.000000 | 436.000000 | 0.003081 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 4 | 0.000000 | 1.000000 | 0.000000 | 317.000000 | 274.000000 | 360.000000 | 0.000000 | 317.000000 | 274.000000 | 360.000000 | 0.003505 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 5 | 0.000000 | 1.000000 | 0.000000 | 327.500000 | 230.000000 | 425.000000 | 0.000000 | 327.500000 | 230.000000 | 425.000000 | 0.009857 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 6 | 0.000000 | 1.000000 | 0.000000 | 413.500000 | 392.000000 | 435.000000 | 0.000000 | 413.500000 | 392.000000 | 435.000000 | 0.001093 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 7 | 0.000000 | -1.000000 | 2.828427 | 433.000000 | 280.000000 | 586.000000 | 0.500000 | 280.000000 | 280.000000 | 280.000000 | 0.005341 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 8 | 0.000000 | -1.000000 | 2.828427 | 505.500000 | 425.000000 | 586.000000 | 0.500000 | 425.000000 | 425.000000 | 425.000000 | 0.002836 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 9 | 0.000000 | 1.500000 | 3.535534 | 356.000000 | 250.000000 | 462.000000 | 0.000000 | 356.000000 | 250.000000 | 462.000000 | 0.005370 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -0.500000 | 3.535534 |
| 10 | 0.000000 | 1.000000 | 0.000000 | 345.500000 | 223.000000 | 468.000000 | 0.000000 | 345.500000 | 223.000000 | 468.000000 | 0.006166 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 11 | 0.000000 | 1.000000 | 0.000000 | 414.000000 | 392.000000 | 436.000000 | 0.000000 | 414.000000 | 392.000000 | 436.000000 | 0.002352 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 12 | 0.000000 | 0.000000 | 1.414214 | 387.000000 | 278.000000 | 496.000000 | 0.000000 | 387.000000 | 278.000000 | 496.000000 | 0.005558 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 13 | 0.000000 | 1.000000 | 0.000000 | 339.500000 | 314.000000 | 365.000000 | 0.000000 | 339.500000 | 314.000000 | 365.000000 | 0.001600 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 14 | 0.000000 | 1.000000 | 0.000000 | 412.500000 | 406.000000 | 419.000000 | 0.000000 | 412.500000 | 406.000000 | 419.000000 | 0.001724 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 15 | 0.000000 | -2.000000 | 1.414214 | 556.500000 | 527.000000 | 586.000000 | 0.500000 | 527.000000 | 527.000000 | 527.000000 | 0.002466 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 16 | 0.000000 | -1.000000 | 2.828427 | 520.500000 | 455.000000 | 586.000000 | 0.500000 | 455.000000 | 455.000000 | 455.000000 | 0.002554 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 17 | 0.000000 | -1.000000 | 2.828427 | 439.500000 | 293.000000 | 586.000000 | 0.500000 | 293.000000 | 293.000000 | 293.000000 | 0.007403 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 18 | 0.000000 | -10.500000 | 16.263456 | 472.000000 | 449.000000 | 495.000000 | 0.000000 | 472.000000 | 449.000000 | 495.000000 | 0.002674 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -0.500000 | 0.707107 |
| 19 | 0.000000 | 0.000000 | 1.414214 | 432.000000 | 348.000000 | 516.000000 | 0.000000 | 432.000000 | 348.000000 | 516.000000 | 0.010718 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 20 | 0.000000 | -2.000000 | 1.414214 | 458.000000 | 330.000000 | 586.000000 | 0.500000 | 330.000000 | 330.000000 | 330.000000 | 0.017924 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 21 | 0.000000 | -2.000000 | 1.414214 | 487.000000 | 388.000000 | 586.000000 | 0.500000 | 388.000000 | 388.000000 | 388.000000 | 0.022975 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 22 | 0.000000 | -1.000000 | 2.828427 | 506.000000 | 426.000000 | 586.000000 | 0.500000 | 426.000000 | 426.000000 | 426.000000 | 0.015738 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 23 | 0.000000 | 1.000000 | 0.000000 | 436.500000 | 382.000000 | 491.000000 | 0.000000 | 436.500000 | 382.000000 | 491.000000 | 0.035768 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 24 | 0.000000 | 0.000000 | 1.414214 | 365.500000 | 359.000000 | 372.000000 | 0.000000 | 365.500000 | 359.000000 | 372.000000 | 0.002027 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 25 | 0.000100 | -1.000000 | 2.828427 | 481.000000 | 376.000000 | 586.000000 | 0.500000 | 376.000000 | 376.000000 | 376.000000 | 0.092577 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 26 | 0.000100 | 1.000000 | 0.000000 | 490.500000 | 427.000000 | 554.000000 | 0.000000 | 490.500000 | 427.000000 | 554.000000 | 0.068258 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 27 | 0.000000 | 1.000000 | 0.000000 | 329.000000 | 296.000000 | 362.000000 | 0.000000 | 329.000000 | 296.000000 | 362.000000 | 0.043497 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 28 | 0.000300 | -10.500000 | 16.263456 | 350.500000 | 191.000000 | 510.000000 | 0.000000 | 350.500000 | 191.000000 | 510.000000 | 0.306294 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -0.500000 | 0.707107 |
| 29 | 0.000100 | -1.000000 | 2.828427 | 517.500000 | 449.000000 | 586.000000 | 0.500000 | 449.000000 | 449.000000 | 449.000000 | 0.139758 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 30 | 0.000100 | 0.000000 | 1.414214 | 426.500000 | 374.000000 | 479.000000 | 0.000000 | 426.500000 | 374.000000 | 479.000000 | 0.121626 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 31 | 0.000100 | 0.000000 | 1.414214 | 365.500000 | 343.000000 | 388.000000 | 0.000000 | 365.500000 | 343.000000 | 388.000000 | 0.061004 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 32 | 0.000200 | -11.500000 | 14.849242 | 404.000000 | 356.000000 | 452.000000 | 0.000000 | 404.000000 | 356.000000 | 452.000000 | 0.162440 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -1.500000 | 2.121320 |
| 33 | 0.000200 | 10.500000 | 16.263456 | 366.500000 | 317.000000 | 416.000000 | 0.000000 | 366.500000 | 317.000000 | 416.000000 | 0.212488 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 8.500000 | 16.263456 |
| 34 | 0.000400 | -2.000000 | 1.414214 | 493.500000 | 401.000000 | 586.000000 | 0.500000 | 401.000000 | 401.000000 | 401.000000 | 0.391830 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 35 | 0.000100 | 11.500000 | 14.849242 | 472.500000 | 439.000000 | 506.000000 | 0.000000 | 472.500000 | 439.000000 | 506.000000 | 0.137745 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 36 | 0.000400 | 1.000000 | 0.000000 | 427.500000 | 353.000000 | 502.000000 | 0.000000 | 427.500000 | 353.000000 | 502.000000 | 0.379335 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 37 | 0.000100 | 1.000000 | 0.000000 | 352.500000 | 327.000000 | 378.000000 | 0.000000 | 352.500000 | 327.000000 | 378.000000 | 0.134368 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 38 | 0.000000 | 0.000000 | 1.414214 | 364.000000 | 356.000000 | 372.000000 | 0.000000 | 364.000000 | 356.000000 | 372.000000 | 0.014164 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 39 | 0.000600 | 1.000000 | 0.000000 | 389.500000 | 303.000000 | 476.000000 | 0.000000 | 389.500000 | 303.000000 | 476.000000 | 0.604766 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 40 | 0.000600 | 0.000000 | 1.414214 | 352.000000 | 262.000000 | 442.000000 | 0.000000 | 352.000000 | 262.000000 | 442.000000 | 0.588745 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 41 | 0.000000 | 0.000000 | 1.414214 | 419.500000 | 405.000000 | 434.000000 | 0.000000 | 419.500000 | 405.000000 | 434.000000 | 0.048387 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 42 | 0.001000 | 0.500000 | 4.949748 | 396.500000 | 207.000000 | 586.000000 | 0.500000 | 207.000000 | 207.000000 | 207.000000 | 0.962396 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 1.000000 | 1.414214 |
| 43 | 0.000900 | -1.000000 | 2.828427 | 443.500000 | 301.000000 | 586.000000 | 0.500000 | 301.000000 | 301.000000 | 301.000000 | 0.862072 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 44 | 0.000700 | 1.000000 | 0.000000 | 453.500000 | 325.000000 | 582.000000 | 0.000000 | 453.500000 | 325.000000 | 582.000000 | 0.736291 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 45 | 0.000100 | 1.000000 | 0.000000 | 502.000000 | 483.000000 | 521.000000 | 0.000000 | 502.000000 | 483.000000 | 521.000000 | 0.081093 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 46 | 0.000500 | 1.000000 | 0.000000 | 404.000000 | 327.000000 | 481.000000 | 0.000000 | 404.000000 | 327.000000 | 481.000000 | 0.483280 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 47 | 0.000200 | 0.500000 | 4.949748 | 548.000000 | 510.000000 | 586.000000 | 0.500000 | 510.000000 | 510.000000 | 510.000000 | 0.195080 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 1.000000 | 1.414214 |
| 48 | 0.000900 | -1.000000 | 0.000000 | 383.000000 | 262.000000 | 504.000000 | 0.000000 | 383.000000 | 262.000000 | 504.000000 | 0.864801 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -3.000000 | 0.000000 |
| 49 | 0.000000 | 0.000000 | 1.414214 | 407.000000 | 395.000000 | 419.000000 | 0.000000 | 407.000000 | 395.000000 | 419.000000 | 0.039548 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 50 | 0.000400 | 0.000000 | 1.414214 | 393.500000 | 336.000000 | 451.000000 | 0.000000 | 393.500000 | 336.000000 | 451.000000 | 0.443114 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 51 | 0.001000 | -1.000000 | 2.828427 | 440.000000 | 294.000000 | 586.000000 | 0.500000 | 294.000000 | 294.000000 | 294.000000 | 1.009106 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 52 | 0.000100 | 0.000000 | 1.414214 | 411.500000 | 398.000000 | 425.000000 | 0.000000 | 411.500000 | 398.000000 | 425.000000 | 0.065966 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 53 | 0.000100 | 0.000000 | 1.414214 | 408.000000 | 392.000000 | 424.000000 | 0.000000 | 408.000000 | 392.000000 | 424.000000 | 0.085453 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 54 | 0.000600 | 0.000000 | 1.414214 | 393.000000 | 316.000000 | 470.000000 | 0.000000 | 393.000000 | 316.000000 | 470.000000 | 0.605383 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 55 | 0.000300 | 2.500000 | 2.121320 | 462.500000 | 411.000000 | 514.000000 | 0.000000 | 462.500000 | 411.000000 | 514.000000 | 0.336133 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 56 | 0.002300 | 1.000000 | 0.000000 | 236.500000 | 35.000000 | 438.000000 | 0.000000 | 236.500000 | 35.000000 | 438.000000 | 2.339079 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 57 | 0.000200 | 1.000000 | 0.000000 | 378.500000 | 350.000000 | 407.000000 | 0.000000 | 378.500000 | 350.000000 | 407.000000 | 0.230687 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 58 | 0.000300 | 1.000000 | 0.000000 | 362.000000 | 324.000000 | 400.000000 | 0.000000 | 362.000000 | 324.000000 | 400.000000 | 0.288407 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 59 | 0.000800 | 0.000000 | 1.414214 | 362.000000 | 273.000000 | 451.000000 | 0.000000 | 362.000000 | 273.000000 | 451.000000 | 0.842441 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 60 | 0.000300 | 2.500000 | 2.121320 | 377.500000 | 342.000000 | 413.000000 | 0.000000 | 377.500000 | 342.000000 | 413.000000 | 0.304021 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 61 | 0.002300 | 1.000000 | 0.000000 | 416.000000 | 378.000000 | 454.000000 | 0.000000 | 416.000000 | 378.000000 | 454.000000 | 2.297366 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 62 | 0.003600 | 0.000000 | 1.414214 | 388.500000 | 280.000000 | 497.000000 | 0.000000 | 388.500000 | 280.000000 | 497.000000 | 3.621714 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 63 | 0.001200 | 9.500000 | 17.677670 | 540.000000 | 494.000000 | 586.000000 | 0.500000 | 494.000000 | 494.000000 | 494.000000 | 1.174912 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 10.000000 | 14.142136 |
| 64 | 0.003000 | 0.000000 | 1.414214 | 356.500000 | 321.000000 | 392.000000 | 0.000000 | 356.500000 | 321.000000 | 392.000000 | 3.025201 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 65 | 0.001600 | -1.000000 | 2.828427 | 525.000000 | 464.000000 | 586.000000 | 0.500000 | 464.000000 | 464.000000 | 464.000000 | 1.621681 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 66 | 0.001400 | -2.000000 | 1.414214 | 479.500000 | 373.000000 | 586.000000 | 0.500000 | 373.000000 | 373.000000 | 373.000000 | 1.405418 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 67 | 0.000200 | 0.000000 | 1.414214 | 501.500000 | 469.000000 | 534.000000 | 0.000000 | 501.500000 | 469.000000 | 534.000000 | 0.201620 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 68 | 0.000600 | -2.000000 | 1.414214 | 504.500000 | 423.000000 | 586.000000 | 0.500000 | 423.000000 | 423.000000 | 423.000000 | 0.587288 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 69 | 0.000000 | 1.000000 | 0.000000 | 378.000000 | 370.000000 | 386.000000 | 0.000000 | 378.000000 | 370.000000 | 386.000000 | 0.016746 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 70 | 0.000500 | 1.000000 | 0.000000 | 453.500000 | 372.000000 | 535.000000 | 0.000000 | 453.500000 | 372.000000 | 535.000000 | 0.517199 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 71 | 0.000200 | -1.000000 | 2.828427 | 556.000000 | 526.000000 | 586.000000 | 0.500000 | 526.000000 | 526.000000 | 526.000000 | 0.162072 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 72 | 0.000300 | 0.000000 | 1.414214 | 359.500000 | 322.000000 | 397.000000 | 0.000000 | 359.500000 | 322.000000 | 397.000000 | 0.314493 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 73 | 0.001000 | 1.000000 | 0.000000 | 378.000000 | 252.000000 | 504.000000 | 0.000000 | 378.000000 | 252.000000 | 504.000000 | 0.983742 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 74 | 0.001000 | 1.000000 | 0.000000 | 324.500000 | 211.000000 | 438.000000 | 0.000000 | 324.500000 | 211.000000 | 438.000000 | 0.950915 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 75 | 0.000700 | -1.000000 | 2.828427 | 488.000000 | 390.000000 | 586.000000 | 0.500000 | 390.000000 | 390.000000 | 390.000000 | 0.661162 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 76 | 0.000100 | -1.000000 | 0.000000 | 327.500000 | 314.000000 | 341.000000 | 0.000000 | 327.500000 | 314.000000 | 341.000000 | 0.068757 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -3.000000 | 0.000000 |
| 77 | 0.001400 | 1.000000 | 0.000000 | 309.000000 | 176.000000 | 442.000000 | 0.000000 | 309.000000 | 176.000000 | 442.000000 | 1.429187 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
780.000000-1.0000000.000000414.500000406.000000423.0000000.000000414.500000406.000000423.0000000.0215601.0000000.0000001.0000000.000000-3.0000000.000000
790.0002002.5000002.121320444.500000406.000000483.0000000.000000444.500000406.000000483.0000000.2296701.0000000.0000001.0000000.0000000.5000002.121320
800.0001000.0000001.414214468.500000441.000000496.0000000.000000468.500000441.000000496.0000000.1382391.0000000.0000001.0000000.000000-2.0000001.414214
810.0002002.5000002.121320422.000000391.000000453.0000000.000000422.000000391.000000453.0000000.2130071.0000000.0000001.0000000.0000000.5000002.121320
820.0002001.0000000.000000411.000000376.000000446.0000000.000000411.000000376.000000446.0000000.1957461.0000000.0000001.0000000.000000-1.0000000.000000
830.000200-10.50000016.263456352.500000323.000000382.0000000.000000352.500000323.000000382.0000000.177328-0.5000002.121320-9.50000014.849242-0.5000000.707107
840.0005001.0000000.000000421.000000345.000000497.0000000.000000421.000000345.000000497.0000000.5336361.0000000.0000001.0000000.000000-1.0000000.000000
850.0000000.0000001.414214378.000000370.000000386.0000000.000000378.000000370.000000386.0000000.0306391.0000000.0000001.0000000.000000-2.0000001.414214
860.00010011.50000014.849242369.000000349.000000389.0000000.000000369.000000349.000000389.0000000.1138981.0000000.0000001.0000000.0000009.50000014.849242
870.0005001.0000000.000000399.000000327.000000471.0000000.000000399.000000327.000000471.0000000.5282621.0000000.0000001.0000000.000000-1.0000000.000000
880.0003000.0000001.414214315.000000281.000000349.0000000.000000315.000000281.000000349.0000000.2855101.0000000.0000001.0000000.000000-2.0000001.414214
890.0009000.0000001.414214394.500000270.000000519.0000000.000000394.500000270.000000519.0000000.9458251.0000000.0000001.0000000.000000-2.0000001.414214
900.0004001.0000000.000000385.500000339.000000432.0000000.000000385.500000339.000000432.0000000.3665691.0000000.0000001.0000000.000000-1.0000000.000000
910.0004001.0000000.000000319.500000272.000000367.0000000.000000319.500000272.000000367.0000000.4082001.0000000.0000001.0000000.000000-1.0000000.000000
920.0002002.5000002.121320399.500000375.000000424.0000000.000000399.500000375.000000424.0000000.1802631.0000000.0000001.0000000.0000000.5000002.121320
930.000900-1.0000000.000000319.000000228.000000410.0000000.000000319.000000228.000000410.0000000.8569831.0000000.0000001.0000000.000000-3.0000000.000000
940.0000000.0000001.414214364.500000362.000000367.0000000.000000364.500000362.000000367.0000000.0094201.0000000.0000001.0000000.000000-2.0000001.414214
950.0002002.5000002.121320431.000000402.000000460.0000000.000000431.000000402.000000460.0000000.2009821.0000000.0000001.0000000.0000000.5000002.121320
960.0002001.0000000.000000531.000000493.000000569.0000000.000000531.000000493.000000569.0000000.2307571.0000000.0000001.0000000.000000-1.0000000.000000
970.000800-1.0000002.828427488.000000390.000000586.0000000.500000390.000000390.000000390.0000000.789454-0.5000002.1213200.0000001.414214-0.5000000.707107
980.0007001.0000000.000000331.000000234.000000428.0000000.000000331.000000234.000000428.0000000.7361851.0000000.0000001.0000000.000000-1.0000000.000000
990.0003002.5000002.121320462.000000422.000000502.0000000.000000462.000000422.000000502.0000000.2865631.0000000.0000001.0000000.0000000.5000002.121320
1000.0000001.0000000.000000443.500000436.000000451.0000000.000000443.500000436.000000451.0000000.0223891.0000000.0000001.0000000.000000-1.0000000.000000
1010.0000001.0000000.000000417.500000413.000000422.0000000.000000417.500000413.000000422.0000000.0092631.0000000.0000001.0000000.000000-1.0000000.000000
1020.0005001.0000000.000000349.000000296.000000402.0000000.000000349.000000296.000000402.0000000.4790791.0000000.0000001.0000000.000000-1.0000000.000000
1030.0001000.0000001.414214349.000000336.000000362.0000000.000000349.000000336.000000362.0000000.0816611.0000000.0000001.0000000.000000-2.0000001.414214
1040.0000000.0000001.414214487.000000485.000000489.0000000.000000487.000000485.000000489.0000000.0055311.0000000.0000001.0000000.000000-2.0000001.414214
1050.0001000.0000001.414214386.000000371.000000401.0000000.000000386.000000371.000000401.0000000.0884441.0000000.0000001.0000000.000000-2.0000001.414214
1060.0000002.5000002.121320486.000000481.000000491.0000000.000000486.000000481.000000491.0000000.0094481.0000000.0000001.0000000.0000000.5000002.121320
1070.0008001.0000000.000000317.500000225.000000410.0000000.000000317.500000225.000000410.0000000.8098161.0000000.0000001.0000000.000000-1.0000000.000000
1080.0006001.0000000.000000385.000000304.000000466.0000000.000000385.000000304.000000466.0000000.6036381.0000000.0000001.0000000.000000-1.0000000.000000
1090.00040011.50000014.849242431.000000368.000000494.0000000.000000431.000000368.000000494.0000000.4433671.0000000.0000001.0000000.0000009.50000014.849242
1100.0003002.5000002.121320391.000000350.000000432.0000000.000000391.000000350.000000432.0000000.2686841.0000000.0000001.0000000.0000000.5000002.121320
1110.0002001.0000000.000000463.500000427.000000500.0000000.000000463.500000427.000000500.0000000.2269921.0000000.0000001.0000000.000000-1.0000000.000000
1120.00010011.50000014.849242441.500000425.000000458.0000000.000000441.500000425.000000458.0000000.0883751.0000000.0000001.0000000.0000009.50000014.849242
1130.000500-1.0000002.828427513.000000440.000000586.0000000.500000440.000000440.000000440.0000000.507206-0.5000002.1213200.0000001.414214-0.5000000.707107
1140.0005001.0000000.000000475.000000403.000000547.0000000.000000475.000000403.000000547.0000000.4767341.0000000.0000001.0000000.000000-1.0000000.000000
1150.0003001.0000000.000000472.000000427.000000517.0000000.000000472.000000427.000000517.0000000.3252481.0000000.0000001.0000000.000000-1.0000000.000000
1160.0001000.0000001.414214387.000000375.000000399.0000000.000000387.000000375.000000399.0000000.0529311.0000000.0000001.0000000.000000-2.0000001.414214
1170.000300-1.0000002.828427547.000000508.000000586.0000000.500000508.000000508.000000508.0000000.258358-0.5000002.1213200.0000001.414214-0.5000000.707107
1180.0001001.0000000.000000457.500000434.000000481.0000000.000000457.500000434.000000481.0000000.1492021.0000000.0000001.0000000.000000-1.0000000.000000
1190.0000001.0000000.000000411.000000407.000000415.0000000.000000411.000000407.000000415.0000000.0091521.0000000.0000001.0000000.000000-1.0000000.000000
1200.0001001.0000000.000000456.000000441.000000471.0000000.000000456.000000441.000000471.0000000.0834701.0000000.0000001.0000000.000000-1.0000000.000000
1210.0002001.0000000.000000448.500000414.000000483.0000000.000000448.500000414.000000483.0000000.2414331.0000000.0000001.0000000.000000-1.0000000.000000
1220.0005001.0000000.000000347.500000291.000000404.0000000.000000347.500000291.000000404.0000000.4966671.0000000.0000001.0000000.000000-1.0000000.000000
1230.000700-1.0000002.828427485.000000384.000000586.0000000.500000384.000000384.000000384.0000000.706807-0.5000002.1213200.0000001.414214-0.5000000.707107
1240.00000011.50000014.849242275.500000266.000000285.0000000.000000275.500000266.000000285.0000000.0477161.0000000.0000001.0000000.0000009.50000014.849242
1250.0001001.0000000.000000439.500000417.000000462.0000000.000000439.500000417.000000462.0000000.1391991.0000000.0000001.0000000.000000-1.0000000.000000
1260.000300-10.50000016.263456402.500000349.000000456.0000000.000000402.500000349.000000456.0000000.339013-0.5000002.121320-9.50000014.849242-0.5000000.707107
1270.000800-1.0000002.828427468.500000351.000000586.0000000.500000351.000000351.000000351.0000000.831395-0.5000002.1213200.0000001.414214-0.5000000.707107
1280.0000001.0000000.000000452.000000445.000000459.0000000.000000452.000000445.000000459.0000000.0236881.0000000.0000001.0000000.000000-1.0000000.000000
1290.0000001.0000000.000000481.000000469.000000493.0000000.000000481.000000469.000000493.0000000.0424061.0000000.0000001.0000000.000000-1.0000000.000000
1300.000700-1.0000002.828427480.000000374.000000586.0000000.500000374.000000374.000000374.0000000.701966-0.5000002.1213200.0000001.414214-0.5000000.707107
1310.0003001.0000000.000000365.500000325.000000406.0000000.000000365.500000325.000000406.0000000.3421121.0000000.0000001.0000000.000000-1.0000000.000000
1320.0002001.0000000.000000393.000000358.000000428.0000000.000000393.000000358.000000428.0000000.2373561.0000000.0000001.0000000.000000-1.0000000.000000
1330.0000001.0000000.000000454.500000443.000000466.0000000.000000454.500000443.000000466.0000000.0489991.0000000.0000001.0000000.000000-1.0000000.000000
1340.0002000.0000001.414214371.000000347.000000395.0000000.000000371.000000347.000000395.0000000.1542941.0000000.0000001.0000000.000000-2.0000001.414214
1350.0000000.0000001.414214538.500000529.000000548.0000000.000000538.500000529.000000548.0000000.0377931.0000000.0000001.0000000.000000-2.0000001.414214
1360.0002000.0000001.414214367.500000339.000000396.0000000.000000367.500000339.000000396.0000000.1558491.0000000.0000001.0000000.000000-2.0000001.414214
1370.000000-10.50000016.263456525.500000523.000000528.0000000.000000525.500000523.000000528.0000000.009064-0.5000002.121320-9.50000014.849242-0.5000000.707107
1380.0002000.0000001.414214456.000000419.000000493.0000000.000000456.000000419.000000493.0000000.2101611.0000000.0000001.0000000.000000-2.0000001.414214
1390.0000001.0000000.000000377.000000367.000000387.0000000.000000377.000000367.000000387.0000000.0374561.0000000.0000001.0000000.000000-1.0000000.000000
1400.0003001.0000000.000000399.500000351.000000448.0000000.000000399.500000351.000000448.0000000.3024181.0000000.0000001.0000000.000000-1.0000000.000000
1410.0008001.0000000.000000311.000000234.000000388.0000000.000000311.000000234.000000388.0000000.7590171.0000000.0000001.0000000.000000-1.0000000.000000
1420.0005000.7500000.353553453.000000382.000000524.0000000.000000453.000000382.000000524.0000000.4695940.2500001.0606601.0000000.000000-0.5000000.707107
1430.0007001.0000000.000000453.500000344.000000563.0000000.000000453.500000344.000000563.0000000.7319241.0000000.0000001.0000000.000000-1.0000000.000000
1440.00040010.50000016.263456402.000000344.000000460.0000000.000000402.000000344.000000460.0000000.4003661.0000000.0000001.0000000.0000008.50000016.263456
1450.0000001.0000000.000000457.500000446.000000469.0000000.000000457.500000446.000000469.0000000.0439441.0000000.0000001.0000000.000000-1.0000000.000000
1460.0008000.0000001.414214431.000000333.000000529.0000000.000000431.000000333.000000529.0000000.7867331.0000000.0000001.0000000.000000-2.0000001.414214
1470.000400-1.0000002.828427523.500000461.000000586.0000000.500000461.000000461.000000461.0000000.439689-0.5000002.1213200.0000001.414214-0.5000000.707107
1480.0001001.0000000.000000399.500000377.000000422.0000000.000000399.500000377.000000422.0000000.1437651.0000000.0000001.0000000.000000-1.0000000.000000
1490.0009001.0000000.000000321.500000237.000000406.0000000.000000321.500000237.000000406.0000000.8788361.0000000.0000001.0000000.000000-1.0000000.000000
1500.0001001.0000000.000000400.500000374.000000427.0000000.000000400.500000374.000000427.0000000.1433601.0000000.0000001.0000000.000000-1.0000000.000000
1510.0000001.0000000.000000332.500000328.000000337.0000000.000000332.500000328.000000337.0000000.0173931.0000000.0000001.0000000.000000-1.0000000.000000
1520.000100-1.0000002.828427557.000000528.000000586.0000000.500000528.000000528.000000528.0000000.117714-0.5000002.1213200.0000001.414214-0.5000000.707107
1530.0001001.0000000.000000390.000000376.000000404.0000000.000000390.000000376.000000404.0000000.0627301.0000000.0000001.0000000.000000-1.0000000.000000
1540.000000-3.0000000.000000586.000000586.000000586.0000001.0000000.0000000.0000000.0000000.006829-2.0000000.000000-1.0000000.0000000.0000000.000000
1550.0003001.0000000.000000431.500000397.000000466.0000000.000000431.500000397.000000466.0000000.2810811.0000000.0000001.0000000.000000-1.0000000.000000
1560.000500-2.0000001.414214513.500000441.000000586.0000000.500000441.000000441.000000441.0000000.498690-0.5000002.1213200.0000001.414214-1.5000002.121320
1570.0003001.0000000.000000530.000000485.000000575.0000000.000000530.000000485.000000575.0000000.3169011.0000000.0000001.0000000.000000-1.0000000.000000
1580.000800-1.0000002.828427466.500000347.000000586.0000000.500000347.000000347.000000347.0000000.753139-0.5000002.1213200.0000001.414214-0.5000000.707107
1590.000900-1.0000002.828427462.500000339.000000586.0000000.500000339.000000339.000000339.0000000.947311-0.5000002.1213200.0000001.414214-0.5000000.707107
1600.0003000.0000001.414214489.000000439.000000539.0000000.000000489.000000439.000000539.0000000.3256181.0000000.0000001.0000000.000000-2.0000001.414214
1610.00010011.50000014.849242485.500000470.000000501.0000000.000000485.500000470.000000501.0000000.0795201.0000000.0000001.0000000.0000009.50000014.849242
1620.000700-1.0000002.828427497.000000408.000000586.0000000.500000408.000000408.000000408.0000000.683351-0.5000002.1213200.0000001.414214-0.5000000.707107
1630.000800-1.0000002.828427461.500000337.000000586.0000000.500000337.000000337.000000337.0000000.841592-0.5000002.1213200.0000001.414214-0.5000000.707107
1640.0003002.2500002.474874420.000000388.000000452.0000000.000000420.000000388.000000452.0000000.2511960.2500001.0606601.0000000.0000001.0000001.414214
1650.0000002.5000002.121320412.500000408.000000417.0000000.000000412.500000408.000000417.0000000.0178201.0000000.0000001.0000000.0000000.5000002.121320
1660.000500-1.0000002.828427523.000000460.000000586.0000000.500000460.000000460.000000460.0000000.484418-0.5000002.1213200.0000001.414214-0.5000000.707107
1670.0000002.5000002.121320388.500000383.000000394.0000000.000000388.500000383.000000394.0000000.0141411.0000000.0000001.0000000.0000000.5000002.121320
1680.0001001.0000000.000000407.000000382.000000432.0000000.000000407.000000382.000000432.0000000.1199751.0000000.0000001.0000000.000000-1.0000000.000000
1690.0002000.7500000.353553445.500000418.000000473.0000000.000000445.500000418.000000473.0000000.1548820.2500001.0606601.0000000.000000-0.5000000.707107
1700.0005000.0000001.414214422.500000355.000000490.0000000.000000422.500000355.000000490.0000000.5359371.0000000.0000001.0000000.000000-2.0000001.414214
1710.000300-1.0000000.000000399.000000356.000000442.0000000.000000399.000000356.000000442.0000000.3383581.0000000.0000001.0000000.000000-3.0000000.000000
1720.0002002.5000002.121320479.000000444.000000514.0000000.000000479.000000444.000000514.0000000.2043281.0000000.0000001.0000000.0000000.5000002.121320
1730.0004001.0000000.000000316.500000270.000000363.0000000.000000316.500000270.000000363.0000000.3888641.0000000.0000001.0000000.000000-1.0000000.000000
1740.0001001.0000000.000000391.000000368.000000414.0000000.000000391.000000368.000000414.0000000.1405391.0000000.0000001.0000000.000000-1.0000000.000000
1750.0006001.0000000.000000236.000000179.000000293.0000000.000000236.000000179.000000293.0000000.5715251.0000000.0000001.0000000.000000-1.0000000.000000
1760.0000001.0000000.000000412.000000401.000000423.0000000.000000412.000000401.000000423.0000000.0498521.0000000.0000001.0000000.000000-1.0000000.000000
1770.000500-1.0000002.828427517.500000449.000000586.0000000.500000449.000000449.000000449.0000000.469758-0.5000002.1213200.0000001.414214-0.5000000.707107
1780.0004000.0000001.414214434.000000386.000000482.0000000.000000434.000000386.000000482.0000000.3696521.0000000.0000001.0000000.000000-2.0000001.414214
1790.0024001.0000000.000000231.00000043.000000419.0000000.000000231.00000043.000000419.0000002.3681101.0000000.0000001.0000000.000000-1.0000000.000000
1800.0006001.0000000.000000487.500000405.000000570.0000000.000000487.500000405.000000570.0000000.5708141.0000000.0000001.0000000.000000-1.0000000.000000
1810.0003001.0000000.000000428.000000373.000000483.0000000.000000428.000000373.000000483.0000000.3425841.0000000.0000001.0000000.000000-1.0000000.000000
1820.0000001.0000000.000000444.500000442.000000447.0000000.000000444.500000442.000000447.0000000.0110681.0000000.0000001.0000000.000000-1.0000000.000000
1830.0002001.0000000.000000414.500000384.000000445.0000000.000000414.500000384.000000445.0000000.2020641.0000000.0000001.0000000.000000-1.0000000.000000
1840.0005001.0000000.000000428.000000359.000000497.0000000.000000428.000000359.000000497.0000000.5235471.0000000.0000001.0000000.000000-1.0000000.000000
1850.001400-1.0000002.828427396.000000206.000000586.0000000.500000206.000000206.000000206.0000001.401757-0.5000002.1213200.0000001.414214-0.5000000.707107
1860.0002000.0000001.414214442.000000403.000000481.0000000.000000442.000000403.000000481.0000000.2322791.0000000.0000001.0000000.000000-2.0000001.414214
1870.0004001.0000000.000000462.500000407.000000518.0000000.000000462.500000407.000000518.0000000.3835841.0000000.0000001.0000000.000000-1.0000000.000000
1880.0000002.5000002.121320378.000000370.000000386.0000000.000000378.000000370.000000386.0000000.0344791.0000000.0000001.0000000.0000000.5000002.121320
1890.000700-1.0000002.828427484.500000383.000000586.0000000.500000383.000000383.000000383.0000000.655472-0.5000002.1213200.0000001.414214-0.5000000.707107
1900.000600-1.0000002.828427493.000000400.000000586.0000000.500000400.000000400.000000400.0000000.598795-0.5000002.1213200.0000001.414214-0.5000000.707107
1910.000700-1.0000002.828427488.500000391.000000586.0000000.500000391.000000391.000000391.0000000.695953-0.5000002.1213200.0000001.414214-0.5000000.707107
1920.000900-2.0000001.414214453.000000320.000000586.0000000.500000320.000000320.000000320.0000000.852964-0.5000002.1213200.0000001.414214-1.5000002.121320
1930.0008001.0000000.000000328.500000243.000000414.0000000.000000328.500000243.000000414.0000000.7908621.0000000.0000001.0000000.000000-1.0000000.000000
1940.000600-1.0000002.828427499.500000413.000000586.0000000.500000413.000000413.000000413.0000000.605018-0.5000002.1213200.0000001.414214-0.5000000.707107
1950.0001001.0000000.000000431.500000401.000000462.0000000.000000431.500000401.000000462.0000000.1463441.0000000.0000001.0000000.000000-1.0000000.000000
1960.0003001.0000000.000000438.000000387.000000489.0000000.000000438.000000387.000000489.0000000.3203501.0000000.0000001.0000000.000000-1.0000000.000000
1970.0004000.0000001.414214531.000000479.000000583.0000000.000000531.000000479.000000583.0000000.4007661.0000000.0000001.0000000.000000-2.0000001.414214
1980.0007001.0000000.000000483.000000391.000000575.0000000.000000483.000000391.000000575.0000000.6594271.0000000.0000001.0000000.000000-1.0000000.000000
1990.0000001.0000000.000000409.000000403.000000415.0000000.000000409.000000403.000000415.0000000.0180091.0000000.0000001.0000000.000000-1.0000000.000000
2000.0002000.0000001.414214420.000000387.000000453.0000000.000000420.000000387.000000453.0000000.2395961.0000000.0000001.0000000.000000-2.0000001.414214
2010.00050011.50000014.849242456.500000394.000000519.0000000.000000456.500000394.000000519.0000000.5262301.0000000.0000001.0000000.0000009.50000014.849242
2020.0001001.0000000.000000444.500000432.000000457.0000000.000000444.500000432.000000457.0000000.0587891.0000000.0000001.0000000.000000-1.0000000.000000
2030.000700-1.0000002.828427484.000000382.000000586.0000000.500000382.000000382.000000382.0000000.743711-0.5000002.1213200.0000001.414214-0.5000000.707107
2040.0000000.0000001.414214384.500000379.000000390.0000000.000000384.500000379.000000390.0000000.0226951.0000000.0000001.0000000.000000-2.0000001.414214
2050.000800-1.0000002.828427476.000000366.000000586.0000000.500000366.000000366.000000366.0000000.771927-0.5000002.1213200.0000001.414214-0.5000000.707107
2060.0003001.0000000.000000341.000000301.000000381.0000000.000000341.000000301.000000381.0000000.3050821.0000000.0000001.0000000.000000-1.0000000.000000
2070.0002001.0000000.000000402.500000372.000000433.0000000.000000402.500000372.000000433.0000000.2007211.0000000.0000001.0000000.000000-1.0000000.000000
2080.0001001.0000000.000000446.000000424.000000468.0000000.000000446.000000424.000000468.0000000.1469361.0000000.0000001.0000000.000000-1.0000000.000000
2090.00000010.50000016.263456408.000000404.000000412.0000000.000000408.000000404.000000412.0000000.0248301.0000000.0000001.0000000.0000008.50000016.263456
2100.0005002.5000002.121320438.500000376.000000501.0000000.000000438.500000376.000000501.0000000.4552751.0000000.0000001.0000000.0000000.5000002.121320
2110.0003000.7500000.353553403.000000358.000000448.0000000.000000403.000000358.000000448.0000000.3153900.2500001.0606601.0000000.000000-0.5000000.707107
2120.000600-1.0000002.828427491.000000396.000000586.0000000.500000396.000000396.000000396.0000000.645317-0.5000002.1213200.0000001.414214-0.5000000.707107
2130.0005001.0000000.000000379.500000321.000000438.0000000.000000379.500000321.000000438.0000000.5024831.0000000.0000001.0000000.000000-1.0000000.000000
2140.0001001.0000000.000000430.500000418.000000443.0000000.000000430.500000418.000000443.0000000.0567371.0000000.0000001.0000000.000000-1.0000000.000000
2150.0000001.0000000.000000483.000000483.000000483.0000000.000000483.000000483.000000483.0000000.0190191.0000000.0000001.0000000.000000-1.0000000.000000
2160.0003001.0000000.000000400.500000364.000000437.0000000.000000400.500000364.000000437.0000000.2614711.0000000.0000001.0000000.000000-1.0000000.000000
2170.000500-1.0000002.828427507.000000428.000000586.0000000.500000428.000000428.000000428.0000000.505626-0.5000002.1213200.0000001.414214-0.5000000.707107
2180.0000001.0000000.000000402.000000401.000000403.0000000.000000402.000000401.000000403.0000000.0145241.0000000.0000001.0000000.000000-1.0000000.000000
2190.000500-10.50000016.263456388.000000317.000000459.0000000.000000388.000000317.000000459.0000000.490439-0.5000002.121320-9.50000014.849242-0.5000000.707107
2200.0016002.5000002.121320246.50000088.000000405.0000000.000000246.50000088.000000405.0000001.6251331.0000000.0000001.0000000.0000000.5000002.121320
2210.000300-10.50000016.263456412.000000370.000000454.0000000.000000412.000000370.000000454.0000000.281538-0.5000002.121320-9.50000014.849242-0.5000000.707107
2220.0003001.0000000.000000494.500000447.000000542.0000000.000000494.500000447.000000542.0000000.3387691.0000000.0000001.0000000.000000-1.0000000.000000
2230.0011001.0000000.000000274.500000160.000000389.0000000.000000274.500000160.000000389.0000001.1390901.0000000.0000001.0000000.000000-1.0000000.000000
2240.0008000.0000001.414214404.500000315.000000494.0000000.000000404.500000315.000000494.0000000.7842281.0000000.0000001.0000000.000000-2.0000001.414214
2250.0003001.0000000.000000357.500000313.000000402.0000000.000000357.500000313.000000402.0000000.3250361.0000000.0000001.0000000.000000-1.0000000.000000
2260.001000-2.0000001.414214462.000000338.000000586.0000000.500000338.000000338.000000338.0000001.006946-0.5000002.1213200.0000001.414214-1.5000002.121320
2270.00050011.50000014.849242450.500000384.000000517.0000000.000000450.500000384.000000517.0000000.5084361.0000000.0000001.0000000.0000009.50000014.849242
2280.0000001.0000000.000000474.000000470.000000478.0000000.000000474.000000470.000000478.0000000.0137771.0000000.0000001.0000000.000000-1.0000000.000000
2290.0003001.0000000.000000476.000000436.000000516.0000000.000000476.000000436.000000516.0000000.2968171.0000000.0000001.0000000.000000-1.0000000.000000
2300.0003000.0000001.414214437.000000398.000000476.0000000.000000437.000000398.000000476.0000000.2650591.0000000.0000001.0000000.000000-2.0000001.414214
2310.000500-1.0000002.828427515.500000445.000000586.0000000.500000445.000000445.000000445.0000000.529493-0.5000002.1213200.0000001.414214-0.5000000.707107
2320.0000000.0000001.414214421.500000417.000000426.0000000.000000421.500000417.000000426.0000000.0184081.0000000.0000001.0000000.000000-2.0000001.414214
2330.0004000.0000001.414214502.000000441.000000563.0000000.000000502.000000441.000000563.0000000.4398871.0000000.0000001.0000000.000000-2.0000001.414214
2340.000000-1.0000000.000000479.000000476.000000482.0000000.000000479.000000476.000000482.0000000.0199291.0000000.0000001.0000000.000000-3.0000000.000000
2350.0001000.0000001.414214413.000000398.000000428.0000000.000000413.000000398.000000428.0000000.0765991.0000000.0000001.0000000.000000-2.0000001.414214
2360.0007001.0000000.000000255.500000195.000000316.0000000.000000255.500000195.000000316.0000000.6993681.0000000.0000001.0000000.000000-1.0000000.000000
2370.0006001.0000000.000000478.000000390.000000566.0000000.000000478.000000390.000000566.0000000.6039651.0000000.0000001.0000000.000000-1.0000000.000000
2380.0000002.5000002.121320417.000000408.000000426.0000000.000000417.000000408.000000426.0000000.0415141.0000000.0000001.0000000.0000000.5000002.121320
2390.000800-12.50000013.435029466.500000347.000000586.0000000.500000347.000000347.000000347.0000000.839215-2.0000000.000000-10.50000013.4350290.0000000.000000
2400.0003001.0000000.000000321.000000278.000000364.0000000.000000321.000000278.000000364.0000000.3220701.0000000.0000001.0000000.000000-1.0000000.000000
2410.0002002.5000002.121320431.500000396.000000467.0000000.000000431.500000396.000000467.0000000.2164661.0000000.0000001.0000000.0000000.5000002.121320
2420.00010011.50000014.849242465.500000447.000000484.0000000.000000465.500000447.000000484.0000000.1021691.0000000.0000001.0000000.0000009.50000014.849242
2430.0004002.5000002.121320396.500000344.000000449.0000000.000000396.500000344.000000449.0000000.3635391.0000000.0000001.0000000.0000000.5000002.121320
2440.0000001.0000000.000000452.500000447.000000458.0000000.000000452.500000447.000000458.0000000.0234961.0000000.0000001.0000000.000000-1.0000000.000000
2450.0005002.5000002.121320417.000000350.000000484.0000000.000000417.000000350.000000484.0000000.4809631.0000000.0000001.0000000.0000000.5000002.121320
2460.0005001.0000000.000000340.000000284.000000396.0000000.000000340.000000284.000000396.0000000.5467401.0000000.0000001.0000000.000000-1.0000000.000000
2470.002300-1.0000002.828427325.00000064.000000586.0000000.50000064.00000064.00000064.0000002.256215-0.5000002.1213200.0000001.414214-0.5000000.707107
2480.0001000.0000001.414214441.500000423.000000460.0000000.000000441.500000423.000000460.0000000.0850671.0000000.0000001.0000000.000000-2.0000001.414214
2490.0005001.0000000.000000235.000000197.000000273.0000000.000000235.000000197.000000273.0000000.5012711.0000000.0000001.0000000.000000-1.0000000.000000
2500.00030011.50000014.849242432.000000388.000000476.0000000.000000432.000000388.000000476.0000000.3288491.0000000.0000001.0000000.0000009.50000014.849242
2510.00010011.50000014.849242453.000000432.000000474.0000000.000000453.000000432.000000474.0000000.1212341.0000000.0000001.0000000.0000009.50000014.849242
2520.0001001.0000000.000000395.500000380.000000411.0000000.000000395.500000380.000000411.0000000.1132951.0000000.0000001.0000000.000000-1.0000000.000000
2530.000800-1.0000002.828427482.000000378.000000586.0000000.500000378.000000378.000000378.0000000.784589-0.5000002.1213200.0000001.414214-0.5000000.707107
2540.0012002.5000002.121320232.000000148.000000316.0000000.000000232.000000148.000000316.0000001.1580221.0000000.0000001.0000000.0000000.5000002.121320
2550.0003000.0000001.414214412.000000370.000000454.0000000.000000412.000000370.000000454.0000000.2716801.0000000.0000001.0000000.000000-2.0000001.414214
2560.0002001.0000000.000000397.000000376.000000418.0000000.000000397.000000376.000000418.0000000.1567231.0000000.0000001.0000000.000000-1.0000000.000000
2570.0001001.0000000.000000427.500000411.000000444.0000000.000000427.500000411.000000444.0000000.0833841.0000000.0000001.0000000.000000-1.0000000.000000
2580.0004001.0000000.000000510.000000456.000000564.0000000.000000510.000000456.000000564.0000000.3984731.0000000.0000001.0000000.000000-1.0000000.000000
| 259 | 0.000100 | 0.000000 | 1.414214 | 452.500000 | 440.000000 | 465.000000 | 0.000000 | 452.500000 | 440.000000 | 465.000000 | 0.080118 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 260 | 0.000200 | 2.500000 | 2.121320 | 419.500000 | 393.000000 | 446.000000 | 0.000000 | 419.500000 | 393.000000 | 446.000000 | 0.171759 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 261 | 0.000100 | 0.000000 | 1.414214 | 440.000000 | 419.000000 | 461.000000 | 0.000000 | 440.000000 | 419.000000 | 461.000000 | 0.133860 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 262 | 0.000700 | 1.000000 | 0.000000 | 456.000000 | 369.000000 | 543.000000 | 0.000000 | 456.000000 | 369.000000 | 543.000000 | 0.715380 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 263 | 0.000200 | 2.500000 | 2.121320 | 382.000000 | 361.000000 | 403.000000 | 0.000000 | 382.000000 | 361.000000 | 403.000000 | 0.153149 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 264 | 0.000500 | -1.000000 | 2.828427 | 516.500000 | 447.000000 | 586.000000 | 0.500000 | 447.000000 | 447.000000 | 447.000000 | 0.521685 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 265 | 0.000300 | 1.000000 | 0.000000 | 369.500000 | 336.000000 | 403.000000 | 0.000000 | 369.500000 | 336.000000 | 403.000000 | 0.260926 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 266 | 0.002200 | 1.000000 | 0.000000 | 202.500000 | 44.000000 | 361.000000 | 0.000000 | 202.500000 | 44.000000 | 361.000000 | 2.200879 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 267 | 0.000100 | 2.500000 | 2.121320 | 393.500000 | 383.000000 | 404.000000 | 0.000000 | 393.500000 | 383.000000 | 404.000000 | 0.057358 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 268 | 0.000200 | 2.500000 | 2.121320 | 411.000000 | 380.000000 | 442.000000 | 0.000000 | 411.000000 | 380.000000 | 442.000000 | 0.233103 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 269 | 0.000100 | 4.000000 | 0.000000 | 407.000000 | 386.000000 | 428.000000 | 0.000000 | 407.000000 | 386.000000 | 428.000000 | 0.109396 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 0.000000 |
| 270 | 0.000800 | 1.000000 | 0.000000 | 144.000000 | 106.000000 | 182.000000 | 0.000000 | 144.000000 | 106.000000 | 182.000000 | 0.806145 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 271 | 0.001100 | 9.500000 | 17.677670 | 445.500000 | 305.000000 | 586.000000 | 0.500000 | 305.000000 | 305.000000 | 305.000000 | 1.084759 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 10.000000 | 14.142136 |
| 272 | 0.001200 | -1.000000 | 2.828427 | 414.500000 | 243.000000 | 586.000000 | 0.500000 | 243.000000 | 243.000000 | 243.000000 | 1.162079 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 273 | 0.000600 | -1.000000 | 2.828427 | 501.500000 | 417.000000 | 586.000000 | 0.500000 | 417.000000 | 417.000000 | 417.000000 | 0.640193 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 274 | 0.001100 | 1.000000 | 0.000000 | 341.000000 | 236.000000 | 446.000000 | 0.000000 | 341.000000 | 236.000000 | 446.000000 | 1.096299 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 275 | 0.002300 | 11.500000 | 14.849242 | 221.500000 | 47.000000 | 396.000000 | 0.000000 | 221.500000 | 47.000000 | 396.000000 | 2.312115 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 276 | 0.002000 | 0.000000 | 1.414214 | 266.500000 | 67.000000 | 466.000000 | 0.000000 | 266.500000 | 67.000000 | 466.000000 | 2.003460 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 277 | 0.000700 | -1.000000 | 2.828427 | 490.500000 | 395.000000 | 586.000000 | 0.500000 | 395.000000 | 395.000000 | 395.000000 | 0.722101 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 278 | 0.000400 | 2.500000 | 2.121320 | 427.500000 | 375.000000 | 480.000000 | 0.000000 | 427.500000 | 375.000000 | 480.000000 | 0.443031 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 279 | 0.000600 | 2.500000 | 2.121320 | 490.000000 | 412.000000 | 568.000000 | 0.000000 | 490.000000 | 412.000000 | 568.000000 | 0.612022 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 280 | 0.002000 | 1.000000 | 0.000000 | 278.000000 | 80.000000 | 476.000000 | 0.000000 | 278.000000 | 80.000000 | 476.000000 | 2.013061 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 281 | 0.000900 | 1.000000 | 0.000000 | 130.500000 | 91.000000 | 170.000000 | 0.000000 | 130.500000 | 91.000000 | 170.000000 | 0.909233 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 282 | 0.002100 | 1.000000 | 0.000000 | 230.500000 | 63.000000 | 398.000000 | 0.000000 | 230.500000 | 63.000000 | 398.000000 | 2.070175 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 283 | 0.000100 | 1.000000 | 0.000000 | 471.500000 | 451.000000 | 492.000000 | 0.000000 | 471.500000 | 451.000000 | 492.000000 | 0.145457 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 284 | 0.002000 | 11.500000 | 14.849242 | 284.000000 | 87.000000 | 481.000000 | 0.000000 | 284.000000 | 87.000000 | 481.000000 | 2.015324 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 285 | 0.000100 | 1.000000 | 0.000000 | 491.000000 | 484.000000 | 498.000000 | 0.000000 | 491.000000 | 484.000000 | 498.000000 | 0.062271 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 286 | 0.000500 | 1.000000 | 0.000000 | 80.500000 | 65.000000 | 96.000000 | 0.000000 | 80.500000 | 65.000000 | 96.000000 | 0.503806 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 287 | 0.000600 | 11.500000 | 14.849242 | 128.500000 | 98.000000 | 159.000000 | 0.000000 | 128.500000 | 98.000000 | 159.000000 | 0.573384 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 288 | 0.000400 | 0.000000 | 1.414214 | 386.500000 | 331.000000 | 442.000000 | 0.000000 | 386.500000 | 331.000000 | 442.000000 | 0.413388 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 289 | 0.000800 | 1.000000 | 0.000000 | 199.500000 | 132.000000 | 267.000000 | 0.000000 | 199.500000 | 132.000000 | 267.000000 | 0.826352 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 290 | 0.000100 | 11.500000 | 14.849242 | 419.000000 | 405.000000 | 433.000000 | 0.000000 | 419.000000 | 405.000000 | 433.000000 | 0.101033 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 291 | 0.002100 | 1.000000 | 0.000000 | 170.500000 | 43.000000 | 298.000000 | 0.000000 | 170.500000 | 43.000000 | 298.000000 | 2.104151 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 292 | 0.000400 | 1.000000 | 0.000000 | 466.500000 | 418.000000 | 515.000000 | 0.000000 | 466.500000 | 418.000000 | 515.000000 | 0.369420 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 293 | 0.002100 | 0.000000 | 1.414214 | 237.500000 | 57.000000 | 418.000000 | 0.000000 | 237.500000 | 57.000000 | 418.000000 | 2.074670 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 294 | 0.001800 | 1.000000 | 0.000000 | 209.500000 | 66.000000 | 353.000000 | 0.000000 | 209.500000 | 66.000000 | 353.000000 | 1.807148 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 295 | 0.000600 | -1.000000 | 2.828427 | 500.000000 | 414.000000 | 586.000000 | 0.500000 | 414.000000 | 414.000000 | 414.000000 | 0.593227 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 296 | 0.002300 | 2.500000 | 2.121320 | 234.500000 | 42.000000 | 427.000000 | 0.000000 | 234.500000 | 42.000000 | 427.000000 | 2.258781 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 297 | 0.000200 | 11.500000 | 14.849242 | 435.000000 | 408.000000 | 462.000000 | 0.000000 | 435.000000 | 408.000000 | 462.000000 | 0.200652 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 298 | 0.002200 | 1.000000 | 0.000000 | 234.000000 | 49.000000 | 419.000000 | 0.000000 | 234.000000 | 49.000000 | 419.000000 | 2.187711 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 299 | 0.000200 | 1.000000 | 0.000000 | 140.000000 | 129.000000 | 151.000000 | 0.000000 | 140.000000 | 129.000000 | 151.000000 | 0.167949 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 300 | 0.000400 | 1.000000 | 0.000000 | 217.500000 | 194.000000 | 241.000000 | 0.000000 | 217.500000 | 194.000000 | 241.000000 | 0.363947 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 301 | 0.001400 | 1.000000 | 0.000000 | 299.500000 | 172.000000 | 427.000000 | 0.000000 | 299.500000 | 172.000000 | 427.000000 | 1.431852 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 302 | 0.000100 | 11.500000 | 14.849242 | 427.000000 | 415.000000 | 439.000000 | 0.000000 | 427.000000 | 415.000000 | 439.000000 | 0.077346 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 303 | 0.001700 | 1.000000 | 0.000000 | 231.500000 | 94.000000 | 369.000000 | 0.000000 | 231.500000 | 94.000000 | 369.000000 | 1.681231 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 304 | 0.000800 | 11.500000 | 14.849242 | 99.000000 | 69.000000 | 129.000000 | 0.000000 | 99.000000 | 69.000000 | 129.000000 | 0.831311 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 305 | 0.001900 | -1.000000 | 2.828427 | 346.000000 | 106.000000 | 586.000000 | 0.500000 | 106.000000 | 106.000000 | 106.000000 | 1.936602 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 306 | 0.000000 | 2.500000 | 2.121320 | 414.000000 | 410.000000 | 418.000000 | 0.000000 | 414.000000 | 410.000000 | 418.000000 | 0.047105 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 307 | 0.002100 | -1.000000 | 0.000000 | 267.000000 | 67.000000 | 467.000000 | 0.000000 | 267.000000 | 67.000000 | 467.000000 | 2.146735 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -3.000000 | 0.000000 |
| 308 | 0.001600 | 2.500000 | 2.121320 | 292.500000 | 136.000000 | 449.000000 | 0.000000 | 292.500000 | 136.000000 | 449.000000 | 1.578677 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 309 | 0.000400 | 0.000000 | 1.414214 | 459.000000 | 407.000000 | 511.000000 | 0.000000 | 459.000000 | 407.000000 | 511.000000 | 0.364220 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 310 | 0.002300 | 0.000000 | 1.414214 | 208.500000 | 38.000000 | 379.000000 | 0.000000 | 208.500000 | 38.000000 | 379.000000 | 2.287650 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 311 | 0.000100 | 2.500000 | 2.121320 | 412.000000 | 399.000000 | 425.000000 | 0.000000 | 412.000000 | 399.000000 | 425.000000 | 0.078418 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 312 | 0.000200 | 11.500000 | 14.849242 | 498.500000 | 473.000000 | 524.000000 | 0.000000 | 498.500000 | 473.000000 | 524.000000 | 0.157159 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 313 | 0.001000 | 0.000000 | 1.414214 | 282.500000 | 196.000000 | 369.000000 | 0.000000 | 282.500000 | 196.000000 | 369.000000 | 1.021777 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 314 | 0.002000 | 0.000000 | 1.414214 | 220.500000 | 62.000000 | 379.000000 | 0.000000 | 220.500000 | 62.000000 | 379.000000 | 2.018043 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 315 | 0.002200 | 11.250000 | 15.202796 | 293.500000 | 75.000000 | 512.000000 | 0.000000 | 293.500000 | 75.000000 | 512.000000 | 2.222140 | 0.250000 | 1.060660 | 1.000000 | 0.000000 | 10.000000 | 14.142136 |
| 316 | 0.000800 | 1.000000 | 0.000000 | 458.500000 | 363.000000 | 554.000000 | 0.000000 | 458.500000 | 363.000000 | 554.000000 | 0.787128 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 317 | 0.000300 | 1.000000 | 0.000000 | 47.500000 | 41.000000 | 54.000000 | 0.000000 | 47.500000 | 41.000000 | 54.000000 | 0.282122 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 318 | 0.000100 | 2.500000 | 2.121320 | 360.000000 | 353.000000 | 367.000000 | 0.000000 | 360.000000 | 353.000000 | 367.000000 | 0.063340 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 319 | 0.001100 | 1.000000 | 0.000000 | 383.000000 | 247.000000 | 519.000000 | 0.000000 | 383.000000 | 247.000000 | 519.000000 | 1.143669 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 320 | 0.001100 | 1.000000 | 0.000000 | 357.500000 | 245.000000 | 470.000000 | 0.000000 | 357.500000 | 245.000000 | 470.000000 | 1.146139 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 321 | 0.001600 | 1.000000 | 0.000000 | 318.500000 | 145.000000 | 492.000000 | 0.000000 | 318.500000 | 145.000000 | 492.000000 | 1.610479 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 322 | 0.001500 | 1.000000 | 0.000000 | 346.000000 | 202.000000 | 490.000000 | 0.000000 | 346.000000 | 202.000000 | 490.000000 | 1.458763 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 323 | 0.001000 | 2.500000 | 2.121320 | 359.000000 | 243.000000 | 475.000000 | 0.000000 | 359.000000 | 243.000000 | 475.000000 | 0.986962 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 324 | 0.002200 | -1.000000 | 2.828427 | 330.500000 | 75.000000 | 586.000000 | 0.500000 | 75.000000 | 75.000000 | 75.000000 | 2.177847 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 325 | 0.000800 | -1.000000 | 2.828427 | 469.000000 | 352.000000 | 586.000000 | 0.500000 | 352.000000 | 352.000000 | 352.000000 | 0.815542 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 326 | 0.001800 | 1.000000 | 0.000000 | 305.000000 | 141.000000 | 469.000000 | 0.000000 | 305.000000 | 141.000000 | 469.000000 | 1.770799 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 327 | 0.001700 | 1.000000 | 0.000000 | 223.000000 | 74.000000 | 372.000000 | 0.000000 | 223.000000 | 74.000000 | 372.000000 | 1.739502 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 328 | 0.000200 | 11.500000 | 14.849242 | 474.000000 | 444.000000 | 504.000000 | 0.000000 | 474.000000 | 444.000000 | 504.000000 | 0.226722 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 329 | 0.000200 | 1.000000 | 0.000000 | 484.000000 | 458.000000 | 510.000000 | 0.000000 | 484.000000 | 458.000000 | 510.000000 | 0.150867 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 330 | 0.000400 | 1.000000 | 0.000000 | 48.000000 | 40.000000 | 56.000000 | 0.000000 | 48.000000 | 40.000000 | 56.000000 | 0.400881 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 331 | 0.001700 | 1.000000 | 0.000000 | 211.500000 | 82.000000 | 341.000000 | 0.000000 | 211.500000 | 82.000000 | 341.000000 | 1.675232 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 332 | 0.000900 | 11.500000 | 14.849242 | 55.500000 | 36.000000 | 75.000000 | 0.000000 | 55.500000 | 36.000000 | 75.000000 | 0.855485 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 333 | 0.000400 | 1.000000 | 0.000000 | 465.500000 | 416.000000 | 515.000000 | 0.000000 | 465.500000 | 416.000000 | 515.000000 | 0.418844 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 334 | 0.001900 | 11.500000 | 14.849242 | 257.500000 | 94.000000 | 421.000000 | 0.000000 | 257.500000 | 94.000000 | 421.000000 | 1.853450 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 335 | 0.002000 | 1.000000 | 0.000000 | 248.500000 | 75.000000 | 422.000000 | 0.000000 | 248.500000 | 75.000000 | 422.000000 | 1.967951 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 336 | 0.001200 | 9.500000 | 17.677670 | 423.000000 | 260.000000 | 586.000000 | 0.500000 | 260.000000 | 260.000000 | 260.000000 | 1.236872 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 10.000000 | 14.142136 |
| 337 | 0.000200 | 1.000000 | 0.000000 | 212.000000 | 201.000000 | 223.000000 | 0.000000 | 212.000000 | 201.000000 | 223.000000 | 0.171621 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 338 | 0.001500 | 1.000000 | 0.000000 | 264.500000 | 112.000000 | 417.000000 | 0.000000 | 264.500000 | 112.000000 | 417.000000 | 1.531534 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 339 | 0.000400 | 11.500000 | 14.849242 | 67.000000 | 57.000000 | 77.000000 | 0.000000 | 67.000000 | 57.000000 | 77.000000 | 0.384807 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 340 | 0.001100 | 1.000000 | 0.000000 | 260.500000 | 174.000000 | 347.000000 | 0.000000 | 260.500000 | 174.000000 | 347.000000 | 1.065351 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 341 | 0.000200 | 2.500000 | 2.121320 | 466.500000 | 439.000000 | 494.000000 | 0.000000 | 466.500000 | 439.000000 | 494.000000 | 0.198640 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 342 | 0.001700 | 11.500000 | 14.849242 | 327.500000 | 164.000000 | 491.000000 | 0.000000 | 327.500000 | 164.000000 | 491.000000 | 1.679343 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 343 | 0.001800 | 1.000000 | 0.000000 | 205.500000 | 68.000000 | 343.000000 | 0.000000 | 205.500000 | 68.000000 | 343.000000 | 1.811192 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 344 | 0.001500 | 11.500000 | 14.849242 | 368.500000 | 212.000000 | 525.000000 | 0.000000 | 368.500000 | 212.000000 | 525.000000 | 1.495991 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 345 | 0.002000 | 1.000000 | 0.000000 | 246.000000 | 64.000000 | 428.000000 | 0.000000 | 246.000000 | 64.000000 | 428.000000 | 2.024541 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 346 | 0.000200 | 0.000000 | 1.414214 | 446.000000 | 409.000000 | 483.000000 | 0.000000 | 446.000000 | 409.000000 | 483.000000 | 0.227990 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 347 | 0.001700 | -1.000000 | 2.828427 | 356.000000 | 126.000000 | 586.000000 | 0.500000 | 126.000000 | 126.000000 | 126.000000 | 1.666988 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 348 | 0.000400 | 11.500000 | 14.849242 | 67.500000 | 56.000000 | 79.000000 | 0.000000 | 67.500000 | 56.000000 | 79.000000 | 0.350184 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 349 | 0.000800 | 11.500000 | 14.849242 | 350.500000 | 264.000000 | 437.000000 | 0.000000 | 350.500000 | 264.000000 | 437.000000 | 0.773372 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 350 | 0.002200 | 1.000000 | 0.000000 | 253.000000 | 49.000000 | 457.000000 | 0.000000 | 253.000000 | 49.000000 | 457.000000 | 2.230222 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 351 | 0.001600 | -10.500000 | 16.263456 | 264.500000 | 120.000000 | 409.000000 | 0.000000 | 264.500000 | 120.000000 | 409.000000 | 1.577635 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -0.500000 | 0.707107 |
| 352 | 0.000000 | 1.000000 | 0.000000 | 490.500000 | 487.000000 | 494.000000 | 0.000000 | 490.500000 | 487.000000 | 494.000000 | 0.040526 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 353 | 0.001200 | 11.500000 | 14.849242 | 161.000000 | 88.000000 | 234.000000 | 0.000000 | 161.000000 | 88.000000 | 234.000000 | 1.219482 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 354 | 0.000700 | -2.000000 | 1.414214 | 486.000000 | 386.000000 | 586.000000 | 0.500000 | 386.000000 | 386.000000 | 386.000000 | 0.681214 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -1.500000 | 2.121320 |
| 355 | 0.002100 | 1.000000 | 0.000000 | 237.000000 | 56.000000 | 418.000000 | 0.000000 | 237.000000 | 56.000000 | 418.000000 | 2.120075 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 356 | 0.000400 | 11.500000 | 14.849242 | 455.000000 | 406.000000 | 504.000000 | 0.000000 | 455.000000 | 406.000000 | 504.000000 | 0.382028 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 357 | 0.002300 | 11.500000 | 14.849242 | 205.500000 | 51.000000 | 360.000000 | 0.000000 | 205.500000 | 51.000000 | 360.000000 | 2.264366 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 358 | 0.000000 | -1.000000 | 2.828427 | 581.000000 | 576.000000 | 586.000000 | 0.500000 | 576.000000 | 576.000000 | 576.000000 | 0.037026 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 359 | 0.000400 | 1.000000 | 0.000000 | 157.500000 | 134.000000 | 181.000000 | 0.000000 | 157.500000 | 134.000000 | 181.000000 | 0.409169 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 360 | 0.001900 | 22.000000 | 0.000000 | 126.000000 | 47.000000 | 205.000000 | 0.000000 | 126.000000 | 47.000000 | 205.000000 | 1.942358 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 20.000000 | 0.000000 |
| 361 | 0.000700 | -1.000000 | 2.828427 | 494.500000 | 403.000000 | 586.000000 | 0.500000 | 403.000000 | 403.000000 | 403.000000 | 0.698322 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 362 | 0.000500 | 1.000000 | 0.000000 | 413.500000 | 355.000000 | 472.000000 | 0.000000 | 413.500000 | 355.000000 | 472.000000 | 0.479236 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 363 | 0.001700 | 13.000000 | 12.727922 | 197.500000 | 82.000000 | 313.000000 | 0.000000 | 197.500000 | 82.000000 | 313.000000 | 1.737523 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 11.000000 | 12.727922 |
| 364 | 0.000100 | 0.000000 | 1.414214 | 63.000000 | 61.000000 | 65.000000 | 0.000000 | 63.000000 | 61.000000 | 65.000000 | 0.130398 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 365 | 0.001700 | 1.000000 | 0.000000 | 137.500000 | 48.000000 | 227.000000 | 0.000000 | 137.500000 | 48.000000 | 227.000000 | 1.702812 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 366 | 0.000400 | 1.000000 | 0.000000 | 428.000000 | 377.000000 | 479.000000 | 0.000000 | 428.000000 | 377.000000 | 479.000000 | 0.389284 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 367 | 0.001900 | 1.000000 | 0.000000 | 230.000000 | 67.000000 | 393.000000 | 0.000000 | 230.000000 | 67.000000 | 393.000000 | 1.908630 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 368 | 0.001400 | 11.500000 | 14.849242 | 325.500000 | 183.000000 | 468.000000 | 0.000000 | 325.500000 | 183.000000 | 468.000000 | 1.446119 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 369 | 0.002000 | 1.000000 | 0.000000 | 230.000000 | 62.000000 | 398.000000 | 0.000000 | 230.000000 | 62.000000 | 398.000000 | 1.990488 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 370 | 0.000100 | 1.000000 | 0.000000 | 500.500000 | 491.000000 | 510.000000 | 0.000000 | 500.500000 | 491.000000 | 510.000000 | 0.070148 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 371 | 0.002000 | 0.750000 | 0.353553 | 272.000000 | 85.000000 | 459.000000 | 0.000000 | 272.000000 | 85.000000 | 459.000000 | 2.039691 | 0.250000 | 1.060660 | 1.000000 | 0.000000 | -0.500000 | 0.707107 |
| 372 | 0.000100 | 11.500000 | 14.849242 | 56.000000 | 52.000000 | 60.000000 | 0.000000 | 56.000000 | 52.000000 | 60.000000 | 0.134765 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 373 | 0.000100 | 1.000000 | 0.000000 | 81.500000 | 79.000000 | 84.000000 | 0.000000 | 81.500000 | 79.000000 | 84.000000 | 0.140218 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 374 | 0.002300 | 1.000000 | 0.000000 | 184.500000 | 32.000000 | 337.000000 | 0.000000 | 184.500000 | 32.000000 | 337.000000 | 2.253489 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 375 | 0.000200 | 1.000000 | 0.000000 | 370.500000 | 343.000000 | 398.000000 | 0.000000 | 370.500000 | 343.000000 | 398.000000 | 0.193932 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 376 | 0.001200 | 1.000000 | 0.000000 | 121.000000 | 71.000000 | 171.000000 | 0.000000 | 121.000000 | 71.000000 | 171.000000 | 1.160559 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 377 | 0.002300 | 1.000000 | 0.000000 | 219.000000 | 42.000000 | 396.000000 | 0.000000 | 219.000000 | 42.000000 | 396.000000 | 2.254508 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 378 | 0.002100 | 1.000000 | 0.000000 | 263.000000 | 67.000000 | 459.000000 | 0.000000 | 263.000000 | 67.000000 | 459.000000 | 2.118315 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 379 | 0.001800 | 11.500000 | 14.849242 | 206.500000 | 73.000000 | 340.000000 | 0.000000 | 206.500000 | 73.000000 | 340.000000 | 1.805867 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 380 | 0.000200 | -1.000000 | 2.828427 | 555.000000 | 524.000000 | 586.000000 | 0.500000 | 524.000000 | 524.000000 | 524.000000 | 0.201501 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 381 | 0.000600 | 2.500000 | 2.121320 | 489.500000 | 406.000000 | 573.000000 | 0.000000 | 489.500000 | 406.000000 | 573.000000 | 0.577239 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 382 | 0.000600 | 2.500000 | 2.121320 | 260.000000 | 210.000000 | 310.000000 | 0.000000 | 260.000000 | 210.000000 | 310.000000 | 0.592210 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 383 | 0.002100 | 1.000000 | 0.000000 | 187.000000 | 49.000000 | 325.000000 | 0.000000 | 187.000000 | 49.000000 | 325.000000 | 2.059804 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 384 | 0.001200 | 1.000000 | 0.000000 | 134.000000 | 70.000000 | 198.000000 | 0.000000 | 134.000000 | 70.000000 | 198.000000 | 1.242509 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 385 | 0.000200 | 0.000000 | 1.414214 | 441.000000 | 413.000000 | 469.000000 | 0.000000 | 441.000000 | 413.000000 | 469.000000 | 0.248932 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 386 | 0.002300 | -1.000000 | 2.828427 | 327.000000 | 68.000000 | 586.000000 | 0.500000 | 68.000000 | 68.000000 | 68.000000 | 2.298598 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |
| 387 | 0.000200 | 11.500000 | 14.849242 | 83.500000 | 77.000000 | 90.000000 | 0.000000 | 83.500000 | 77.000000 | 90.000000 | 0.195593 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 388 | 0.000200 | 22.000000 | 0.000000 | 72.500000 | 69.000000 | 76.000000 | 0.000000 | 72.500000 | 69.000000 | 76.000000 | 0.204935 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 20.000000 | 0.000000 |
| 389 | 0.000900 | 1.000000 | 0.000000 | 101.000000 | 71.000000 | 131.000000 | 0.000000 | 101.000000 | 71.000000 | 131.000000 | 0.858450 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 390 | 0.000200 | 1.000000 | 0.000000 | 470.500000 | 450.000000 | 491.000000 | 0.000000 | 470.500000 | 450.000000 | 491.000000 | 0.167832 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 391 | 0.000800 | 1.000000 | 0.000000 | 358.500000 | 282.000000 | 435.000000 | 0.000000 | 358.500000 | 282.000000 | 435.000000 | 0.765692 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 392 | 0.001200 | 22.000000 | 0.000000 | 73.000000 | 47.000000 | 99.000000 | 0.000000 | 73.000000 | 47.000000 | 99.000000 | 1.239406 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 20.000000 | 0.000000 |
| 393 | 0.002200 | 11.500000 | 14.849242 | 247.000000 | 67.000000 | 427.000000 | 0.000000 | 247.000000 | 67.000000 | 427.000000 | 2.186026 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 394 | 0.000800 | 0.500000 | 4.949748 | 484.500000 | 383.000000 | 586.000000 | 0.500000 | 383.000000 | 383.000000 | 383.000000 | 0.807172 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | 1.000000 | 1.414214 |
| 395 | 0.001200 | 1.000000 | 0.000000 | 175.500000 | 109.000000 | 242.000000 | 0.000000 | 175.500000 | 109.000000 | 242.000000 | 1.168004 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 396 | 0.001400 | 1.000000 | 0.000000 | 298.000000 | 180.000000 | 416.000000 | 0.000000 | 298.000000 | 180.000000 | 416.000000 | 1.357737 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 397 | 0.001000 | 22.000000 | 0.000000 | 121.500000 | 75.000000 | 168.000000 | 0.000000 | 121.500000 | 75.000000 | 168.000000 | 1.044050 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 20.000000 | 0.000000 |
| 398 | 0.001800 | 11.500000 | 14.849242 | 238.500000 | 85.000000 | 392.000000 | 0.000000 | 238.500000 | 85.000000 | 392.000000 | 1.849967 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 399 | 0.000100 | 11.500000 | 14.849242 | 199.000000 | 195.000000 | 203.000000 | 0.000000 | 199.000000 | 195.000000 | 203.000000 | 0.091787 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 400 | 0.001300 | 1.000000 | 0.000000 | 291.500000 | 180.000000 | 403.000000 | 0.000000 | 291.500000 | 180.000000 | 403.000000 | 1.274630 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 401 | 0.001100 | 0.000000 | 1.414214 | 257.000000 | 152.000000 | 362.000000 | 0.000000 | 257.000000 | 152.000000 | 362.000000 | 1.097028 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 402 | 0.001700 | 1.000000 | 0.000000 | 105.000000 | 43.000000 | 167.000000 | 0.000000 | 105.000000 | 43.000000 | 167.000000 | 1.690436 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 403 | 0.000600 | 11.500000 | 14.849242 | 169.000000 | 136.000000 | 202.000000 | 0.000000 | 169.000000 | 136.000000 | 202.000000 | 0.560683 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 404 | 0.001200 | 1.000000 | 0.000000 | 343.000000 | 208.000000 | 478.000000 | 0.000000 | 343.000000 | 208.000000 | 478.000000 | 1.239544 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 405 | 0.001600 | 2.500000 | 2.121320 | 284.000000 | 153.000000 | 415.000000 | 0.000000 | 284.000000 | 153.000000 | 415.000000 | 1.562173 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 406 | 0.001500 | 1.000000 | 0.000000 | 273.500000 | 159.000000 | 388.000000 | 0.000000 | 273.500000 | 159.000000 | 388.000000 | 1.451282 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 407 | 0.000200 | 1.000000 | 0.000000 | 118.000000 | 115.000000 | 121.000000 | 0.000000 | 118.000000 | 115.000000 | 121.000000 | 0.183480 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 408 | 0.001800 | 1.000000 | 0.000000 | 239.500000 | 97.000000 | 382.000000 | 0.000000 | 239.500000 | 97.000000 | 382.000000 | 1.769910 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 409 | 0.000200 | 11.500000 | 14.849242 | 70.500000 | 69.000000 | 72.000000 | 0.000000 | 70.500000 | 69.000000 | 72.000000 | 0.169371 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 410 | 0.002300 | 11.500000 | 14.849242 | 219.000000 | 59.000000 | 379.000000 | 0.000000 | 219.000000 | 59.000000 | 379.000000 | 2.271483 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 411 | 0.001700 | 1.000000 | 0.000000 | 268.500000 | 95.000000 | 442.000000 | 0.000000 | 268.500000 | 95.000000 | 442.000000 | 1.651164 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 412 | 0.002000 | 1.000000 | 0.000000 | 254.000000 | 70.000000 | 438.000000 | 0.000000 | 254.000000 | 70.000000 | 438.000000 | 1.957052 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 413 | 0.002100 | 11.500000 | 14.849242 | 241.500000 | 68.000000 | 415.000000 | 0.000000 | 241.500000 | 68.000000 | 415.000000 | 2.053868 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 414 | 0.000500 | 11.500000 | 14.849242 | 69.500000 | 54.000000 | 85.000000 | 0.000000 | 69.500000 | 54.000000 | 85.000000 | 0.531525 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 415 | 0.002200 | 1.000000 | 0.000000 | 288.000000 | 77.000000 | 499.000000 | 0.000000 | 288.000000 | 77.000000 | 499.000000 | 2.155822 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 416 | 0.001400 | 1.000000 | 29.698484 | 351.000000 | 196.000000 | 506.000000 | 0.000000 | 351.000000 | 196.000000 | 506.000000 | 1.368013 | 1.000000 | 0.000000 | -9.500000 | 14.849242 | 9.500000 | 14.849242 |
| 417 | 0.001000 | 1.000000 | 0.000000 | 306.500000 | 210.000000 | 403.000000 | 0.000000 | 306.500000 | 210.000000 | 403.000000 | 1.037190 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 418 | 0.000700 | 1.000000 | 0.000000 | 390.500000 | 308.000000 | 473.000000 | 0.000000 | 390.500000 | 308.000000 | 473.000000 | 0.680298 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 419 | 0.001800 | 1.000000 | 0.000000 | 363.000000 | 174.000000 | 552.000000 | 0.000000 | 363.000000 | 174.000000 | 552.000000 | 1.817593 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 420 | 0.001600 | 13.000000 | 12.727922 | 274.500000 | 133.000000 | 416.000000 | 0.000000 | 274.500000 | 133.000000 | 416.000000 | 1.600329 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 11.000000 | 12.727922 |
| 421 | 0.001500 | 0.000000 | 1.414214 | 294.000000 | 157.000000 | 431.000000 | 0.000000 | 294.000000 | 157.000000 | 431.000000 | 1.527992 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -2.000000 | 1.414214 |
| 422 | 0.002100 | 2.500000 | 2.121320 | 303.000000 | 94.000000 | 512.000000 | 0.000000 | 303.000000 | 94.000000 | 512.000000 | 2.079181 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.500000 | 2.121320 |
| 423 | 0.000700 | 1.000000 | 0.000000 | 314.000000 | 250.000000 | 378.000000 | 0.000000 | 314.000000 | 250.000000 | 378.000000 | 0.713397 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 424 | 0.002600 | 1.000000 | 0.000000 | 248.500000 | 33.000000 | 464.000000 | 0.000000 | 248.500000 | 33.000000 | 464.000000 | 2.623860 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 425 | 0.000300 | -10.500000 | 16.263456 | 296.500000 | 265.000000 | 328.000000 | 0.000000 | 296.500000 | 265.000000 | 328.000000 | 0.251168 | -0.500000 | 2.121320 | -9.500000 | 14.849242 | -0.500000 | 0.707107 |
| 426 | 0.000900 | 10.500000 | 16.263456 | 110.500000 | 72.000000 | 149.000000 | 0.000000 | 110.500000 | 72.000000 | 149.000000 | 0.876259 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 8.500000 | 16.263456 |
| 427 | 0.000100 | 11.500000 | 14.849242 | 160.000000 | 154.000000 | 166.000000 | 0.000000 | 160.000000 | 154.000000 | 166.000000 | 0.139985 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 428 | 0.000600 | 1.000000 | 0.000000 | 460.000000 | 378.000000 | 542.000000 | 0.000000 | 460.000000 | 378.000000 | 542.000000 | 0.649206 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | -1.000000 | 0.000000 |
| 429 | 0.001900 | 11.500000 | 14.849242 | 288.000000 | 111.000000 | 465.000000 | 0.000000 | 288.000000 | 111.000000 | 465.000000 | 1.934283 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 430 | 0.000300 | 11.500000 | 14.849242 | 401.500000 | 367.000000 | 436.000000 | 0.000000 | 401.500000 | 367.000000 | 436.000000 | 0.267479 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 9.500000 | 14.849242 |
| 431 | 0.001900 | 22.000000 | 0.000000 | 206.000000 | 66.000000 | 346.000000 | 0.000000 | 206.000000 | 66.000000 | 346.000000 | 1.897747 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 20.000000 | 0.000000 |
| 432 | 0.000500 | -1.000000 | 2.828427 | 515.500000 | 445.000000 | 586.000000 | 0.500000 | 445.000000 | 445.000000 | 445.000000 | 0.539420 | -0.500000 | 2.121320 | 0.000000 | 1.414214 | -0.500000 | 0.707107 |

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1;30;43mStreaming output truncated to the last 5000 lines.\u001b[0m\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;208m1024\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "Steps = 1582 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 
2\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Exception = name 'random' is not defined\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import copy\n", + " dirs = {'W':(-1,0),'S':(1,0),'A':(0,-1),'D':(0,1)}\n", + " best_move = None\n", + " best_score = -1\n", + " def move(b, d):\n", + " rows, cols = 4,4\n", + " def slide(row):\n", + " new = [x for x in row if x]\n", + " res = []\n", + " skip=False\n", + " i=0\n", + " while i < len(new):\n", + " if i+1best_score:\n", + " best_score=score\n", + " best_move=m\n", + " return best_move\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " # simple heuristic: prefer moves that increase score locally\n", + " best = None\n", + " best_score = -1\n", + " def simulate(board, move):\n", + " # copy board\n", + " import copy, math\n", + " b = copy.deepcopy(board)\n", + " # apply 
move logic simplified: just return new board\n", + " # without full game logic, just random for illustration\n", + " return b, random.randint(0, 4)\n", + " for m in moves:\n", + " _, s = simulate(board, m)\n", + " if s > best_score:\n", + " best_score = s\n", + " best = m\n", + " return best or random.choice(moves)\n", + "Steps = 1310 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import copy, itertools, random\n", + "\n", + " def merge(row):\n", + " nonlocal score\n", + " new = [x for x in row if x]\n", + " merged = []\n", + " i = 0\n", + " while i < len(new):\n", + " if i+1 < len(new) and 
new[i]==new[i+1]:\n", + " merged.append(new[i]*2)\n", + " score += new[i]*2\n", + " i += 2\n", + " else:\n", + " merged.append(new[i])\n", + " i += 1\n", + " merged += [0]*(len(row)-len(merged))\n", + " return merged\n", + "\n", + " def transform(b, dir):\n", + " new = copy.deepcopy(b)\n", + " if dir=='W':\n", + " for c in range(4):\n", + " col=[new[r][c] for r in range(4)]\n", + " col=merge(col)\n", + " for r in range(4): new[r][c]=col[r]\n", + " if dir=='S':\n", + " for c in range(4):\n", + " col=[new[r][c] for r in range(4)][::-1]\n", + " col=merge(col)\n", + " col=col[::-1]\n", + " for r in range(4): new[r][c]=col[r]\n", + " if dir=='A':\n", + " for r in range(4):\n", + " new[r]=merge(new[r])\n", + " if dir=='D':\n", + " for r in range(4):\n", + " new[r]=merge(new[r][::-1])[::-1]\n", + " return new\n", + "\n", + " def board_score(b):\n", + " return max(max(row) for row in b)\n", + "\n", + " best=None; best_score=-1\n", + " dirs='WASD'\n", + " for d in dirs:\n", + " new=transform(board,d)\n", + " if new!=board:\n", + " s=board_score(new)\n", + " if s>best_score:\n", + " best_score=s; best=d\n", + " return best if best else random.choice('WASD')\n", + "Timeout\n", + "Steps = 1129 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + 
"├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # Simple heuristic: always push left\n", + " return \"A\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " return \"W\" # simple fixed strategy\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " return \"W\" # Basic placeholder strategy: always move up\n", + "Timeout\n", + "Timeout\n", + "Steps = 1223 State = success\n", + "def strategy(board):\n", + " # Simple heuristic: always try to move up if possible, otherwise random legal move\n", + " from random import choice\n", + " moves = ['W', 'A', 'S', 'D']\n", + " return choice(moves)\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 
4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "def strategy(board):\n", + " \"\"\"Very simple 2048 strategy: always try to move up if tiles can combine, else down, left, right.\"\"\"\n", + " # helper to check if any move is possible in a given direction\n", + " def can_move(b, dir):\n", + " n = len(b)\n", + " # simulate slide in given direction, return True if any change\n", + " temp = [row[:] for row in b]\n", + " moved = False\n", + " if dir == 'W':\n", + " for c in range(n):\n", + " col = [temp[r][c] for r in range(n)]\n", + " merged, _ = slide(col)\n", + " if merged != col:\n", + " moved = True\n", + " for r in range(n):\n", + " temp[r][c] = merged[r]\n", + " elif dir == 'S':\n", + " for c in range(n):\n", + " col = [temp[r][c] for r in range(n)][::-1]\n", + " merged, _ = slide(col)\n", + " merged = merged[::-1]\n", + " if merged != [temp[r][c] for r in range(n)]:\n", + " moved = True\n", + " for r in range(n):\n", + " temp[n-1-r][c] = merged[r]\n", + " elif dir == 'A':\n", + " for r in range(n):\n", + " row = temp[r][:]\n", + " merged, _ = slide(row)\n", + " if merged != row:\n", + " moved = True\n", + " temp[r] = merged\n", + " elif dir == 'D':\n", + " for r in range(n):\n", + " row = temp[r][::-1]\n", + " merged, _ = slide(row)\n", + " merged 
= merged[::-1]\n", + " if merged != temp[r]:\n", + " moved = True\n", + " temp[r] = merged\n", + " return moved\n", + "\n", + " def slide(line):\n", + " \"\"\"Slide non-zeros left and merge.\"\"\"\n", + " new = [x for x in line if x != 0]\n", + " merged = []\n", + " skip = False\n", + " i = 0\n", + " while i < len(new):\n", + " if i+1 < len(new) and new[i] == new[i+1]:\n", + " merged.append(new[i]*2)\n", + " i += 2\n", + " else:\n", + " merged.append(new[i])\n", + " i += 1\n", + " merged += [0]*(len(line)-len(merged))\n", + " return merged, merged != line\n", + "\n", + " for d in ['W','S','A','D']:\n", + " if can_move(board, d):\n", + " return d\n", + " return 'W' # fallback\n", + "Steps = 205 State = failed\n", + "┌───┬───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "└───┴───┴───┴───┴───┴───┘\n", + 
"Timeout\n", + "Timeout\n", + "Steps = 988 State = failed\n", + "def strategy(board):\n", + " # Helper to simulate a move\n", + " def move(board, key):\n", + " def compress(row):\n", + " new = [x for x in row if x]\n", + " for i in range(len(new)-1):\n", + " if new[i]==new[i+1]:\n", + " new[i]*=2\n", + " new[i+1]=0\n", + " return [x for x in new if x]\n", + " def transpose(b):\n", + " return [list(i) for i in zip(*b)]\n", + " def reverse(b):\n", + " return [row[::-1] for row in b]\n", + " def slide_left(b):\n", + " return [compress(r) + [0]*(len(b[0])-len(compress(r))) for r in b]\n", + " def apply(b, k):\n", + " n=len(b)\n", + " if k=='W':return transpose(slide_left(transpose(b)))\n", + " if k=='S':return transpose(reverse(slide_left(reverse(transpose(b)))))\n", + " if k=='A':return slide_left(b)\n", + " if k=='D':return reverse(slide_left(reverse(b)))\n", + " return apply(board, key)\n", + " moves = \"WASD\"\n", + " for m in moves:\n", + " if move(board, m)!=board:\n", + " return m\n", + " return \"W\"\n", + "┌───┬───┬───┬───┬───┬───┐\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;214m512\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;214m512\u001b[0m│\u001b[38;5;154m128\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;154m128\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;226m256\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;118m 
64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├───┼───┼───┼───┼───┼───┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└───┴───┴───┴───┴───┴───┘\n", + "Timeout\n", + "def strategy(board):\n", + " return \"W\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Exception = list index out of range\n", + "Timeout\n", + "def strategy(board):\n", + " # Simple strategy: always move up unless the board is empty.\n", + " return \"W\"\n", + "Timeout\n", + "Timeout\n", + "Steps = 1095 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\",\"A\",\"S\",\"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1285 State = success\n", + "def strategy(board):\n", + " # Simple heuristic: try to move left if possible, otherwise random\n", + " from random import choice\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " return choice(moves)\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " import random\n", + " moves = \"WASD\"\n", + " return random.choice(moves)\n", + "Steps = 1181 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # Define move application\n", + " def move(mat, dir):\n", + " def compress(row):\n", + " new = [x for x in row if x != 0]\n", + " res = []\n", + " skip = False\n", + " for i in range(len(new)):\n", + " if skip:\n", + " skip = False\n", + " continue\n", + " if i+1 < len(new) and new[i] == new[i+1]:\n", + " res.append(new[i]*2)\n", + " skip = True\n", + " else:\n", + " res.append(new[i])\n", + " res += [0]*(len(row)-len(res))\n", + " return res\n", + " def transpose(m): return [list(row) for row in zip(*m)]\n", + " m = [r[:] for r in mat]\n", + " if dir == 'W': # up\n", + " m = transpose(m)\n", + " m = [compress(row) for row in m]\n", + " m = transpose(m)\n", + " elif dir == 'S': # 
down\n", + " m = transpose(m)\n", + " m = [compress(row[::-1])[::-1] for row in m]\n", + " m = transpose(m)\n", + " elif dir == 'A': # left\n", + " m = [compress(row) for row in m]\n", + " elif dir == 'D': # right\n", + " m = [compress(row[::-1])[::-1] for row in m]\n", + " return m\n", + "\n", + " # Evaluate board by total merge value\n", + " def score(mat):\n", + " return sum(sum(row) for row in mat)\n", + "\n", + " best_move = None\n", + " best_score = -1\n", + " for move_char in 'WASD':\n", + " new_board = move(board, move_char)\n", + " sc = score(new_board)\n", + " if sc > best_score:\n", + " best_score = sc\n", + " best_move = move_char\n", + " return best_move\n", + "Timeout\n", + "Timeout\n", + "Steps = 1264 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 
2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1157 State = success\n", + "def strategy(board):\n", + " import random\n", + " moves = ['W', 'A', 'S', 'D']\n", + " # Simple heuristic: choose a random valid move\n", + " return random.choice(moves)\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " return random.choice(moves)\n", + "Steps = 1490 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1072 State = failed\n", + "def strategy(board):\n", + " import copy, random\n", + " def move(b, d):\n", + " # simple 2048 move implementation\n", + " def slide(arr):\n", + " arr = [x for x in arr if x!=0]\n", + " res=[]\n", + " skip=False\n", + " for i in range(len(arr)):\n", + " if skip: skip=False; continue\n", + " if i+1 best_empty:\n", + " best_empty = e\n", + " best = m\n", + " return best if best else random.choice(moves)\n", + "Timeout\n", + "Timeout\n", + "Steps = 1372 State = success\n", + "def strategy(board):\n", + " import random\n", + " # choose a random valid move from 'W', 'A', 'S', 'D'\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + 
"│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1228 State = success\n", + "def strategy(board):\n", + " from random import choice\n", + " \n", + " # Possible moves\n", + " moves = ['W', 'A', 'S', 'D']\n", + " \n", + " # Simple heuristic: prefer moves that generate a merge or create a larger tile\n", + " def can_merge(b, move):\n", + " # Simulate a move and check if any tiles merge\n", + " # For brevity, just return False for this example\n", + " return False\n", + " \n", + " # Filter moves that could potentially merge\n", + " valid_moves = [m for m in moves if can_merge(board, m)]\n", + " if valid_moves:\n", + " return choice(valid_moves)\n", + " \n", + " # Fallback: random move\n", + " return choice(moves)\n", + 
"┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "def strategy(board):\n", + " import random\n", + " # Simple heuristic: try moves in random order until one changes the board\n", + " moves = ['W','A','S','D']\n", + " random.shuffle(moves)\n", + " def can_move(b, dir):\n", + " # simulate a move and check if board changes\n", + " def rotate(b, times):\n", + " for _ in range(times):\n", + " b = [list(row) for row in zip(*b[::-1])]\n", + " return b\n", + " n = len(b)\n", + " def merge_line(line):\n", + " new = [x for x in line if x]\n", + " merged = []\n", + " skip = False\n", + " for i in range(len(new)):\n", + " if skip: skip = False; continue\n", + " if i+1 < len(new) and new[i] == new[i+1]:\n", 
+ " merged.append(new[i]*2)\n", + " skip = True\n", + " else:\n", + " merged.append(new[i])\n", + " merged += [0]*(n-len(merged))\n", + " return merged\n", + " def move(b, dir):\n", + " if dir=='W':\n", + " b = rotate(b,3)\n", + " elif dir=='S':\n", + " b = rotate(b,1)\n", + " elif dir=='D':\n", + " b = rotate(b,2)\n", + " new_b = []\n", + " for row in b:\n", + " new_b.append(merge_line(row))\n", + " # rotate back\n", + " for _ in range((4-({'W':3,'S':1,'D':2,'A':0}[dir]))%4):\n", + " new_b = [list(row) for row in zip(*new_b[::-1])]\n", + " return new_b\n", + " return move(b, dir) != b\n", + " for m in moves:\n", + " if can_move(board, m):\n", + " return m\n", + " return random.choice(moves)\n", + "Steps = 1173 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 
8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1221 State = success\n", + "def strategy(board):\n", + " import random\n", + " # Convert board to 4x4 grid\n", + " size = len(board)\n", + " directions = [\"W\", \"A\", \"S\", \"D\"]\n", + " \n", + " def move_possible(d):\n", + " # Basic check: return True if at least one move in that direction changes board\n", + " rowc, colc = 0, 0\n", + " for r in range(size):\n", + " for c in range(size):\n", + " if board[r][c] == 0:\n", + " return True\n", + " return False\n", + " \n", + " # Randomly pick a direction that is valid (here we just return a random choice)\n", + " # In a real strategy you'd evaluate each move, but for brevity we pick random\n", + " return random.choice(directions)\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 
2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # A very simple strategy: always move up ('W') unless it would lose immediately.\n", + " # This is a placeholder; a more sophisticated strategy would evaluate moves.\n", + " return \"W\"\n", + "Timeout\n", + "Exception = name 'copy' is not defined\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # Simple heuristic: choose the move that yields the most empty tiles after simulation\n", + " import random, copy\n", + "\n", + " moves = ['W', 'A', 'S', 'D']\n", + " def simulate(b, m):\n", + " # naive clone and apply a single shift (no merging logic)\n", + " return b # placeholder, as full 2048 logic is complex\n", + "\n", + " best = moves[0]\n", + " best_empty = -1\n", + " for m in moves:\n", + " b_copy = copy.deepcopy(board)\n", + " # placeholder: pretend each move yields random empty count\n", + " empty = random.randint(0, 15) # dummy\n", + " if empty > best_empty:\n", + " best_empty = empty\n", + " best = m\n", + " return best\n", + "Steps = 1260 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Steps = 1500 State = failed\n", + "def strategy(board):\n", + " # Simple heuristic: try moves in order, pick first that changes board\n", + " def tilt(board, dir):\n", + " n = len(board)\n", + " def compress(row):\n", + " new = [x for x in row if x]\n", + " for i in range(len(new)-1):\n", + " if new[i] == new[i+1]:\n", + " new[i] *= 2\n", + " new[i+1] = 0\n", + " new = [x for x in new if x]\n", + " return new + [0]*(n-len(new))\n", + " result = [[0]*n for _ in range(n)]\n", + " for i in range(n):\n", + " for j in range(n):\n", + " result[i][j] = board[i][j]\n", + " if dir == \"W\":\n", + " for j in range(n):\n", + " col = [result[i][j] for i in range(n)]\n", + " newcol = compress(col)\n", + " for i in range(n): result[i][j] = newcol[i]\n", + " elif dir == \"S\":\n", + " for j in range(n):\n", + " col = [result[i][j] for i in range(n)][::-1]\n", + " newcol = compress(col)[::-1]\n", + " for i in range(n): result[i][j] = newcol[i]\n", + " elif dir == \"A\":\n", + " for i in range(n):\n", + " newrow = compress(result[i])\n", + " result[i] = newrow\n", + " elif dir == \"D\":\n", + " for i in range(n):\n", + " newrow = compress(result[i][::-1])[::-1]\n", + " result[i] = newrow\n", + " return result\n", + " moves = [\"W\",\"A\",\"S\",\"D\"]\n", + " for m in moves:\n", + " if tilt(board, m) != board:\n", + " return m\n", + " return moves[0]\n", + 
"┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;208m1024\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " # Simple Monte‑Carlo style: try all moves, pick the one that gives the highest\n", + " # immediate score (sum of merged tiles) in one step. 
\n", + " def simulate_move(b, dir):\n", + " import copy\n", + " b2 = copy.deepcopy(b)\n", + " size = len(b2)\n", + " score = 0\n", + "\n", + " def compress_line(line):\n", + " nonlocal score\n", + " new = [x for x in line if x != 0]\n", + " merged = []\n", + " i = 0\n", + " while i < len(new):\n", + " if i+1 < len(new) and new[i] == new[i+1]:\n", + " merged.append(new[i]*2)\n", + " score += new[i]*2\n", + " i += 2\n", + " else:\n", + " merged.append(new[i])\n", + " i += 1\n", + " merged += [0]*(size-len(merged))\n", + " return merged\n", + "\n", + " def move_left(mat):\n", + " for i in range(size):\n", + " mat[i] = compress_line(mat[i])\n", + "\n", + " def transpose(mat):\n", + " return [list(row) for row in zip(*mat)]\n", + "\n", + " if dir == \"W\":\n", + " move_left(transpose(b2))\n", + " transpose(b2)\n", + " elif dir == \"S\":\n", + " move_left(b2)\n", + " elif dir == \"A\":\n", + " move_left(b2)\n", + " elif dir == \"D\":\n", + " move_left(transpose(b2))\n", + " transpose(b2)\n", + "\n", + " return score\n", + "\n", + " best_dir = None\n", + " best_score = -1\n", + " for d in \"WASD\":\n", + " sc = simulate_move(board, d)\n", + " if sc > best_score:\n", + " best_score, best_dir = sc, d\n", + " return best_dir if best_dir else \"W\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import copy, random\n", + " def move(b, dir):\n", + " size=len(b)\n", + " def rotate(b):\n", + " return [list(row) for row in zip(*b[::-1])]\n", + " if dir=='W':\n", + " g=rotate(rotate(rotate(rotate(b))))\n", + " elif dir=='A':\n", + " g=rotate(rotate(b))\n", + " elif dir=='D':\n", + " g=rotate(b)\n", + " else:\n", + " g=b\n", + " # slide rows left\n", + " def slide(row):\n", + " new=[i for i in row if i!=0]\n", + " res=[]\n", + " i=0\n", + " while ibestscore:\n", + " bestscore=score;bestdir=d\n", + " return bestdir or random.choice(\"WASD\")\n", + "Timeout\n", + "Timeout\n", + "Steps = 1427 State = success\n", + "def 
strategy(board):\n", + " import random\n", + " return random.choice([\"W\",\"A\",\"S\",\"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Steps = 1150 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 
16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " # Simple strategy: always try to move up if possible, otherwise random\n", + " # Since we don't have a full engine, just return \"W\".\n", + " return \"W\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Steps = 1347 State = success\n", + "def strategy(board):\n", + " # Simple random move strategy\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 
2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " # Simple heuristic: try all moves and pick the one that results in the most merges\n", + " def rotate_cw(b): # rotate the board 90° clockwise\n", + " return [ [b[3-j][i] for j in range(4)] for i in range(4) ]\n", + "\n", + " def compress(b):\n", + " new_b = [[0]*4 for _ in range(4)]\n", + " score = 0\n", + " for i in range(4):\n", + " pos = 0\n", + " last = 0\n", + " for j in range(4):\n", + " val = b[i][j]\n", + " if val:\n", + " if val == last:\n", + " new_b[i][pos-1] *= 2\n", + " score += new_b[i][pos-1]\n", + " last = 0\n", + " else:\n", + " last = val\n", + " new_b[i][pos] = val\n", + " pos += 1\n", + " return new_b, score\n", + "\n", + " def move_left(b):\n", + " compressed, s = compress(b)\n", + " return compressed, s\n", + "\n", + " def move(board, dir):\n", + " b = [row[:] for row in board]\n", + " for _ in range(dir): # 0:W,1:D,2:S,3:A\n", + " b = rotate_cw(b)\n", + " new_b, score = move_left(b)\n", + " for _ in range(4-dir):\n", + " new_b = rotate_cw(new_b)\n", + " return new_b, score\n", + "\n", + " moves = ['W','A','S','D']\n", + " best_move = None\n", + " best_score = -1\n", + " for i, m in enumerate(moves):\n", + " _, s = move(board, i)\n", + " if s > best_score:\n", + " best_score, best_move = s, 
m\n", + " return best_move\n", + "Timeout\n", + "Steps = 1363 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\",\"A\",\"S\",\"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " \"\"\"\n", + " A very simple strategy for 2048: try moves in the order W, A, S, D\n", + " and return the first one that changes the board state.\n", + " \"\"\"\n", + " def can_move(b, dir):\n", + " # simulate a move and check if board changes\n", + " from copy import deepcopy\n", + " import numpy as np\n", + "\n", + " def move(b):\n", + " # inner helper to shift and merge once\n", + " def 
merge(arr):\n", + " result = []\n", + " skip = False\n", + " for i in range(len(arr)):\n", + " if skip:\n", + " skip = False\n", + " continue\n", + " if i+1 < len(arr) and arr[i] and arr[i] == arr[i+1]:\n", + " result.append(arr[i]*2)\n", + " skip = True\n", + " else:\n", + " result.append(arr[i])\n", + " return result + [0]*(len(arr)-len(result))\n", + "\n", + " n = len(b)\n", + " new_board = [[0]*n for _ in range(n)]\n", + " if dir == 'W':\n", + " for j in range(n):\n", + " col = [b[i][j] for i in range(n)]\n", + " moved = merge(col)\n", + " for i in range(n):\n", + " new_board[i][j] = moved[i]\n", + " elif dir == 'S':\n", + " for j in range(n):\n", + " col = [b[i][j] for i in range(n)][::-1]\n", + " moved = merge(col)\n", + " moved = moved[::-1]\n", + " for i in range(n):\n", + " new_board[i][j] = moved[i]\n", + " elif dir == 'A':\n", + " for i in range(n):\n", + " row = b[i]\n", + " moved = merge(row)\n", + " new_board[i] = moved\n", + " elif dir == 'D':\n", + " for i in range(n):\n", + " row = b[i][::-1]\n", + " moved = merge(row)\n", + " new_board[i] = moved[::-1]\n", + " return new_board\n", + "\n", + " new_board = move(b)\n", + " return new_board != b\n", + "\n", + " for d in ['W','A','S','D']:\n", + " if can_move(board, d):\n", + " return d\n", + " # if no move changes board, pick any (fallback)\n", + " return 'W'\n", + "Timeout\n", + "Steps = 1136 State = success\n", + "def strategy(board):\n", + " import random\n", + " # Simple heuristic: Random move that changes the board\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " \n", + " def board_after_move(b, m):\n", + " # simulate movement by creating a copy of board and applying move\n", + " # This is a very simplified implementation: in actual 2048 logic\n", + " # you'd need to merge tiles. 
For demo purposes, just return original.\n", + " return b # placeholder\n", + " \n", + " random.shuffle(moves)\n", + " for m in moves:\n", + " if board_after_move(board, m) != board:\n", + " return m\n", + " return moves[0]\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import random\n", + " # simple heuristic: pick a random valid move\n", + " # Define helper to check if move is valid (changes board)\n", + " def move_possible(b, move):\n", + " # create a copy and apply move, compare\n", + " import copy\n", + " tmp = copy.deepcopy(b)\n", + " # apply move on tmp\n", + " def move_board(bd, dir):\n", + " size = len(bd)\n", + " # 
helper to compress row/col\n", + " def compress(line):\n", + " # shift and merge\n", + " new = [v for v in line if v!=0]\n", + " res = []\n", + " skip = False\n", + " i = 0\n", + " while i < len(new):\n", + " if i+1 < len(new) and new[i]==new[i+1]:\n", + " res.append(new[i]*2)\n", + " i+=2\n", + " else:\n", + " res.append(new[i])\n", + " i+=1\n", + " res += [0]*(size-len(res))\n", + " return res\n", + " if dir==\"L\":\n", + " for i in range(size):\n", + " bd[i]=compress(bd[i])\n", + " elif dir==\"R\":\n", + " for i in range(size):\n", + " bd[i]=list(reversed(compress(list(reversed(bd[i])))))\n", + " elif dir==\"U\":\n", + " for j in range(size):\n", + " col=[bd[i][j] for i in range(size)]\n", + " col=compress(col)\n", + " for i in range(size):\n", + " bd[i][j]=col[i]\n", + " elif dir==\"D\":\n", + " for j in range(size):\n", + " col=[bd[i][j] for i in range(size)]\n", + " col=list(reversed(compress(list(reversed(col)))))\n", + " for i in range(size):\n", + " bd[i][j]=col[i]\n", + " move_board(tmp, move)\n", + " return tmp != b\n", + "\n", + " directions = [\"W\",\"A\",\"S\",\"D\"]\n", + " valid_moves = [m for m in directions if move_possible(board, m)]\n", + " return random.choice(valid_moves) if valid_moves else \"W\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Steps = 2185 State = failed\n", + "def strategy(board):\n", + " def can_move(board, dir):\n", + " def move(board, dir):\n", + " def compress(line):\n", + " new = [x for x in line if x]\n", + " merged = []\n", + " skip = False\n", + " i = 0\n", + " while i < len(new):\n", + " if i+1 < len(new) and new[i] == new[i+1]:\n", + " merged.append(new[i]*2)\n", + " i += 2\n", + " else:\n", + " merged.append(new[i])\n", + " i += 1\n", + " merged += [0]*(len(line)-len(merged))\n", + " return merged\n", + " n = len(board)\n", + " b = [row[:] for row in board]\n", + " if dir=='W':\n", + " for j in range(n):\n", + " col = [b[i][j] for i in range(n)]\n", + " merged = compress(col)\n", + " for i in range(n): b[i][j] 
= merged[i]\n", + " elif dir=='S':\n", + " for j in range(n):\n", + " col = [b[i][j] for i in range(n)][::-1]\n", + " merged = compress(col)\n", + " merged = merged[::-1]\n", + " for i in range(n): b[i][j] = merged[i]\n", + " elif dir=='A':\n", + " for i in range(n):\n", + " row = b[i]\n", + " merged = compress(row)\n", + " b[i] = merged\n", + " elif dir=='D':\n", + " for i in range(n):\n", + " row = b[i][::-1]\n", + " merged = compress(row)\n", + " b[i] = merged[::-1]\n", + " return b\n", + " moved = move(board, dir)\n", + " return moved != board\n", + " for d in \"WASD\":\n", + " if can_move(board, d):\n", + " return d\n", + " return \"W\"\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;208m1024\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;208m1024\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + 
"└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1127 State = success\n", + "def strategy(board):\n", + " import random\n", + " # Simple heuristic: if any move possible, pick a random legal one.\n", + " moves = ['W', 'A', 'S', 'D']\n", + " # Randomly shuffle and pick first that changes board (or return any if all invalid)\n", + " def can_move(board, move):\n", + " # Very simple check: ensure move is within bounds and not all cells zero\n", + " return any(any(row) for row in board)\n", + " random.shuffle(moves)\n", + " return moves[0]\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " # Simple heuristic: always try to move up (\"W\") if possible, otherwise down (\"S\")\n", + " # We'll 
simulate a single move to see if any tile merges or moves.\n", + " def can_move_up(b):\n", + " for col in range(4):\n", + " for row in range(1, 4):\n", + " if b[row][col] != 0 and (b[row-1][col] == 0 or b[row-1][col] == b[row][col]):\n", + " return True\n", + " return False\n", + " return \"W\" if can_move_up(board) else \"S\"\n", + "Timeout\n", + "Exception = list index out of range\n", + "Steps = 1218 State = failed\n", + "def strategy(board):\n", + " import copy, random\n", + "\n", + " dirs = {\"W\": (-1, 0), \"A\": (0, -1), \"S\": (1, 0), \"D\": (0, 1)}\n", + "\n", + " def slide_line(line):\n", + " new = [x for x in line if x != 0]\n", + " res = []\n", + " i = 0\n", + " while i < len(new):\n", + " if i + 1 < len(new) and new[i] == new[i + 1]:\n", + " res.append(new[i] * 2)\n", + " i += 2\n", + " else:\n", + " res.append(new[i])\n", + " i += 1\n", + " res += [0] * (len(line) - len(res))\n", + " return res\n", + "\n", + " def move(b, d):\n", + " n = len(b)\n", + " bx = [row[:] for row in b]\n", + " moved = False\n", + " for i in range(n):\n", + " if d[0] == -1 or d[0] == 1:\n", + " line = [bx[j][i] for j in range(n)]\n", + " if d[0] == 1: line.reverse()\n", + " new = slide_line(line)\n", + " if d[0] == 1: new.reverse()\n", + " for j in range(n):\n", + " if bx[j][i] != new[j]:\n", + " moved = True\n", + " bx[j][i] = new[j]\n", + " else:\n", + " line = bx[i][:]\n", + " if d[1] == 1: line.reverse()\n", + " new = slide_line(line)\n", + " if d[1] == 1: new.reverse()\n", + " if new != bx[i]:\n", + " moved = True\n", + " bx[i] = new\n", + " return bx, moved\n", + "\n", + " def score(b):\n", + " return sum(sum(row) for row in b)\n", + "\n", + " best = None\n", + " best_score = -1\n", + " for key, d in dirs.items():\n", + " new_board, moved = move(board, d)\n", + " if not moved: continue\n", + " s = score(new_board)\n", + " if s > best_score:\n", + " best_score = s\n", + " best = key\n", + " return best if best else random.choice(list(dirs.keys()))\n", + 
"┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;208m1024\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # Simple deterministic strategy: try to push upwards if possible otherwise right\n", + " # Check if any merge or move possible upward, else right\n", + " def can_move(board, drow, dcol):\n", + " for r in range(4):\n", + " for c in range(4):\n", + " val = board[r][c]\n", + " if val == 0: continue\n", + " nr, nc = r + drow, c + dcol\n", + " if 0 <= nr < 4 and 0 <= nc < 4:\n", + " nb = board[nr][nc]\n", + " if nb == 0 or nb == val: return True\n", + " return False\n", + "\n", + " if can_move(board, -1, 0):\n", + " return \"W\"\n", + " if can_move(board, 0, 1):\n", + " return \"D\"\n", + " if 
can_move(board, 1, 0):\n", + " return \"S\"\n", + " if can_move(board, 0, -1):\n", + " return \"A\"\n", + " # if no moves possible, just return \"W\"\n", + " return \"W\"\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " # Simple heuristic: always try to move 'W' (up) if possible,\n", + " # otherwise pick the first available move among A, S, D.\n", + " moves = ['W', 'A', 'S', 'D']\n", + " for m in moves:\n", + " if can_move(board, m):\n", + " return m\n", + " return 'W'\n", + "\n", + " def can_move(b, m):\n", + " # Check if a move in direction m is possible (board not empty)\n", + " # This is a minimal placeholder that always returns True\n", + " # for demonstration purposes.\n", + " return True\n", + "Exception = cannot access local variable 'can_move' where it is not associated with a value\n", + "Steps = 1376 State = success\n", + "def strategy(board):\n", + " import random\n", + " return random.choice([\"W\", \"A\", \"S\", \"D\"])\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Steps = 1393 State = success\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\",\"A\",\"S\",\"D\"]\n", + " # Simple heuristic: prefer moves that don't change board, otherwise random\n", + " def can_move(b, dir_):\n", + " # simulate a move and check if board changes\n", + " bcopy = [row[:] for row in b]\n", + " # placeholder for actual move logic\n", + " return True # assume any direction always possible for simplicity\n", + " \n", + " valid_moves = [m for m in moves if can_move(board, m)]\n", + " return random.choice(valid_moves)\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + 
"├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "def strategy(board):\n", + " # simulate a move in 2048\n", + " def move(b, dir):\n", + " def transpose(m): return [list(x) for x in zip(*m)]\n", + " def reverse(m): return [row[::-1] for row in m]\n", + " def compress(row):\n", + " new = [x for x in row if x]\n", + " merged = []\n", + " skip = False\n", + " for i in range(len(new)):\n", + " if skip: skip = False; continue\n", + " if i+1 < len(new) and new[i]==new[i+1]:\n", + " merged.append(new[i]*2)\n", + " skip = True\n", + " else:\n", + " merged.append(new[i])\n", + " return merged + [0]*(len(row)-len(merged))\n", + " def slide(m):\n", + " return [compress(row) for row in m]\n", + " m = [row[:] for row in b]\n", + " if dir==\"W\":\n", + " m = transpose(m)\n", + " m = slide(m)\n", + " m = transpose(m)\n", + " elif dir==\"S\":\n", + " m = transpose(m)\n", + " m = reverse(m)\n", + " m = slide(m)\n", + " m = reverse(m)\n", + " m = transpose(m)\n", + " elif dir==\"A\":\n", + " m = reverse(m)\n", + " m = slide(m)\n", + " m = reverse(m)\n", + " elif dir==\"D\":\n", + " m = slide(m)\n", + " return m\n", + " dirs = \"WASD\"\n", + " for d in dirs:\n", + " if move(board, d) != board:\n", + " return d\n", + " return dirs[0]\n", + "Timeout\n", + "Steps = 1343 State = success\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " # Randomly pick a move that actually changes the board state\n", + " board_str = str(board)\n", + " for _ in range(10):\n", + " mv = random.choice(moves)\n", + " # Very simple: pretend every move is valid\n", + " return mv\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;51m 
4\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "Steps = 1013 State = success\n", + "def strategy(board):\n", + " import random\n", + " # simple heuristic: try moves that combine tiles\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " best_move = random.choice(moves)\n", + " # Simple check: if any move leads to a merge, prioritize that\n", + " def can_merge(b, move):\n", + " def copy_board(b):\n", + " return [row[:] for row in b]\n", + " def slide(row):\n", + " tiles = [t for t in row if t]\n", + " new = []\n", + " skip = False\n", + " for i in range(len(tiles)):\n", + " if skip:\n", + " skip = False\n", + " continue\n", + " if i+1 < len(tiles) and tiles[i] == tiles[i+1]:\n", + " new.append(tiles[i]*2)\n", + " skip = True\n", + " else:\n", + " new.append(tiles[i])\n", + " return new + [0]*(len(row)-len(new))\n", 
+ " def transpose(b):\n", + " return [list(col) for col in zip(*b)]\n", + " b2 = copy_board(b)\n", + " if move == \"W\":\n", + " b2 = transpose(b2)\n", + " for i in range(4):\n", + " b2[i] = slide(b2[i])\n", + " b2 = transpose(b2)\n", + " elif move == \"S\":\n", + " b2 = transpose(b2)\n", + " for i in range(4):\n", + " b2[i] = list(reversed(slide(list(reversed(b2[i])))))\n", + " b2 = transpose(b2)\n", + " elif move == \"A\":\n", + " for i in range(4):\n", + " b2[i] = slide(b2[i])\n", + " elif move == \"D\":\n", + " for i in range(4):\n", + " b2[i] = list(reversed(slide(list(reversed(b2[i])))))\n", + " return b2 != b\n", + " # evaluate moves\n", + " for m in moves:\n", + " if can_merge(board, m):\n", + " best_move = m\n", + " break\n", + " return best_move\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\", \"A\", \"S\", \"D\"]\n", + " return random.choice(moves)\n", + "Steps = 1620 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;214m 512\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Steps = 1342 State = success\n", + "def strategy(board):\n", + " # Simple strategy: choose a random legal move from those that change the board\n", + " import random, copy\n", + " moves = ['W', 'A', 'S', 'D']\n", + " def apply_move(b, m):\n", + " def rotate(b, times):\n", + " for _ in range(times):\n", + " b = [list(row) for row in zip(*b[::-1])]\n", + " return b\n", + " def 
slide_and_merge(b):\n", + " new_b = []\n", + " for row in b:\n", + " tiles = [t for t in row if t != 0]\n", + " merged = []\n", + " skip = False\n", + " for i in range(len(tiles)):\n", + " if skip:\n", + " skip = False\n", + " continue\n", + " if i+1 < len(tiles) and tiles[i] == tiles[i+1]:\n", + " merged.append(tiles[i]*2)\n", + " skip = True\n", + " else:\n", + " merged.append(tiles[i])\n", + " merged += [0]*(4-len(merged))\n", + " new_b.append(merged)\n", + " return new_b\n", + " rot = {'W':0,'A':1,'S':2,'D':3}[m]\n", + " b_rot = rotate(b, rot)\n", + " b_new = slide_and_merge(b_rot)\n", + " b_final = rotate(b_new, (4-rot)%4)\n", + " return b_final\n", + " legal_moves = []\n", + " for m in moves:\n", + " if apply_move(copy.deepcopy(board), m) != board:\n", + " legal_moves.append(m)\n", + " return random.choice(legal_moves) if legal_moves else 'W'\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;118m 64\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;154m 128\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + 
"│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;49m 8\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Timeout\n", + "Timeout\n", + "def strategy(board):\n", + " import random\n", + " moves = [\"W\",\"A\",\"S\",\"D\"]\n", + " # Prefer moves that combine tiles or create empty spaces\n", + " def can_move(m):\n", + " n = len(board)\n", + " tmp = [row[:] for row in board]\n", + " def move_left(b):\n", + " changed = False\n", + " for i in range(n):\n", + " merged = [False]*n\n", + " for j in range(1,n):\n", + " if b[i][j]==0: continue\n", + " k=j\n", + " while k>0 and b[i][k-1]==0:\n", + " b[i][k-1]=b[i][k]; b[i][k]=0; k-=1; changed=True\n", + " if k>0 and b[i][k-1]==b[i][k]:\n", + " b[i][k-1]*=2; b[i][k]=0; changed=True; merged[k-1]=True\n", + " return changed\n", + " def rotate(b, times):\n", + " for _ in range(times):\n", + " b = [list(row) for row in zip(*b[::-1])]\n", + " return b\n", + " def revert(b, times):\n", + " for _ in range(times):\n", + " b = [list(row) for row in zip(*b)][::-1]\n", + " return b\n", + " for t in range(4):\n", + " tmp = rotate(tmp,1)\n", + " if move_left(tmp):\n", + " return True\n", + " return False\n", + " viable = [m for m in moves if can_move(m)]\n", + " return random.choice(viable) if viable else random.choice(moves)\n", + "Steps = 1201 State = success\n", + "┌────┬────┬────┬────┬────┬────┐\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m 
.\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;47m 16\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;49m 8\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;196m2048\u001b[0m│\u001b[38;5;226m 256\u001b[0m│\u001b[38;5;239m .\u001b[0m│\u001b[38;5;239m .\u001b[0m│\n", + "├────┼────┼────┼────┼────┼────┤\n", + "│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;46m 32\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\u001b[38;5;51m 4\u001b[0m│\u001b[38;5;45m 2\u001b[0m│\n", + "└────┴────┴────┴────┴────┴────┘\n", + "Exception = 'str' object does not support item assignment\n" + ] + } + ], + "source": [ + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And now with the LoRA we just trained with GRPO - we save the LoRA first!" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Verify LoRA is actually trained!" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "from safetensors import safe_open\n", + "\n", + "tensors = {}\n", + "# Read the adapter from the directory it was saved to above\n", + "with safe_open(\"gemma_4_lora/adapter_model.safetensors\", framework = \"pt\") as f:\n", + "    # Verify both A and B are non zero\n", + "    for key in f.keys():\n", + "        tensor = f.get_tensor(key)\n", + "        # Fraction of zero entries; 1.0 means the tensor is all zeros\n", + "        n_zeros = (tensor == 0).sum() / tensor.numel()\n", + "        assert n_zeros.item() != 1.0, f\"{key} is all zeros\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# Inference\n", + "Now let's try the model we just trained!"
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "text = tokenizer.apply_chat_template(\n", + "    [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + "    tokenize = False,\n", + "    add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "_ = model.generate(\n", + "    **tokenizer(images = None, text = text, return_tensors = \"pt\").to(\"cuda\"),\n", + "    temperature = 1.0, top_p = 0.95, top_k = 64,\n", + "    max_new_tokens = 1024,\n", + "    streamer = TextStreamer(tokenizer, skip_prompt = False),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! Get a personal access token at https://huggingface.co/settings/tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options."
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Merge to 16bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Merge to 4bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Just LoRA adapters\n", + "if False:\n", + " model.save_pretrained(\"gemma_4_lora\")\n", + " tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "if False:\n", + " model.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")\n", + " tokenizer.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", + "\n", + "Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)):\n", + "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", + "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", + "* `q5_k_m` - Recommended. 
Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n", + "\n", + "[**NEW**] To finetune and auto-export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Save to 8bit Q8_0\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer,)\n", + "# Remember to go to https://huggingface.co/settings/tokens for a token!\n", + "# And change HF_USERNAME to your username!\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to 16bit GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"f16\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"f16\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to q4_k_m GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to multiple GGUF options - much faster if you want multiple!\n", + "if False:\n", + "    model.push_to_hub_gguf(\n", + "        \"HF_USERNAME/gemma_4_finetune\", # Change HF_USERNAME to your username!\n", + "        tokenizer,\n", + "        quantization_method = [\"q4_k_m\", \"q8_0\", \"q5_k_m\",],\n", + "        token = \"YOUR_HF_TOKEN\",\n", + "    )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel!
If you find a bug, want to keep up with the latest LLM news, need help, or want to join projects, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, continued pretraining, conversational finetuning and more in our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.ipynb new file mode 100644 index 0000000..333786e --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.ipynb @@ -0,0 +1,10738 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "cP7xBi17919u" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hzPgFeIkZn9q" + }, + "source": [ + "# Goal: Make Gemma 4 solve Sudoku puzzles with Reinforcement Learning\n", + "\n", + "Our goal is to make Gemma 4 learn to solve Sudoku puzzles using reinforcement learning (GRPO).\n", + "The model will devise a strategy to fill in empty cells, and we'll reward it for correct placements\n", + "and completing valid puzzles.\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "31KIMLJLnHET" + }, + "source": [ + "# Installation\n", + "We'll be using [Unsloth](https://github.com/unslothai/unsloth) to do RL on Gemma 4. Unsloth saves 70% VRAM usage and makes reinforcement learning 2 to 6x faster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CGoDZwcunHEU" + }, + "outputs": [], + "source": [ + "%%capture\n", + "import os, importlib.util\n", + "!pip install --upgrade -qqq uv\n", + "if importlib.util.find_spec(\"torch\") is None or \"COLAB_\" in \"\".join(os.environ.keys()):\n", + " try: import numpy, PIL; _numpy = f\"numpy=={numpy.__version__}\"; _pil = f\"pillow=={PIL.__version__}\"\n", + " except: _numpy = \"numpy\"; _pil = \"pillow\"\n", + " # Gemma 4 requires transformers >= 5.5.0 — do NOT pin to 4.x here\n", + " !uv pip install -qqq \\\n", + " \"torch>=2.8.0\" \"triton>=3.4.0\" {_numpy} {_pil} torchvision bitsandbytes \\\n", + " \"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo\" \\\n", + " \"unsloth[base] @ git+https://github.com/unslothai/unsloth\" \\\n", + " git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels\n", + "elif importlib.util.find_spec(\"unsloth\") is None:\n", + " !uv pip install -qqq unsloth\n", + "# Gemma 4 requires transformers >= 5.5.0\n", + "!uv pip install --upgrade --no-deps \"transformers>=5.5.0\" tokenizers \"trl>=0.28.0\" unsloth unsloth_zoo" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-9Hbdy3yxkDe" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "liF15-XtmF26" + }, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 415, + "referenced_widgets": [ + "7f2f1d7c247f45bb93ff417cb186c2eb", + "535adc052cf742b08633c3e05a9eeb52", + "5c19e35c5f934891b5ce135843529a5b", + "cc934485e87748579a7b2176f8470dd0", + "dfdfcf6e8b564262bd28ff6eb2558190", + "101a1cf35bb24d79a6e8b4ce77eb5bf7", + 
"0b4ac68cb5ea43f8a2c196c7f24bb349", + "c5fa4165c9f44af8acb5a35d53269863", + "ce301e618b3e4a7f9c081174974f7548", + "2f67d427db03445ca24bc0f7a7a17cbd", + "c9dfc15b3f89476189e4d2347f504e2e", + "6c41947826904b1db57cc32055f0eccb", + "c688bd93590c4270acb23f423ca85946", + "fdf731aed6144145a41c873115a3af61", + "a98e3900409d4bcebf9ddc25441bf56a", + "a6dc10a4fb0d474e819b3ecfa656b66a", + "0c11138badda42f1b3681626bcbe6ed0", + "cb84d4ce92124f94b4d7897cd3ee7fd4", + "743aa5ba0a8c41019cd0cbdb97b779df", + "d627022e756048e497691196783e374b", + "f0820395ae644b50988896600368588b", + "8a12ba8d6b9744468b652d460ce10295", + "0d90074b1b56432a8a8af05cadd57eea", + "b44c516743f944559b21e963b6e559b9", + "e5e07f71eee54e828dcb7b9dc5305b56", + "ceca217e98384421934db2fd81c78b08", + "74783be6311840b58560b8ae249292c8", + "3d96500eae194cd3a6a9f92c39f8760e", + "6761c9faa81040e1866b73ad40ca5e02", + "022dbbd06beb434b8cde47f13d1d9f3b", + "157505960e3f4d04be6bdeae7119b4a8", + "ea70dad5f2aa48a2adab37c178f34baf", + "a0fab082e7364d3391d765cb785ed2db", + "9273d234d79247fabcb5a2ace2ec118b", + "75f48d69379947af82ddb10ce4fca453", + "e2c30cdce58440b8afb1665297661883", + "6ea2009e3bd442fba3e001bed472b00e", + "4e026a1dbae64137a45f7c2b114b3178", + "e29039073cc7474cb0fe704cc7a668b7", + "5f3cc4d26ee742bcb19dbfc9598080ee", + "996e80a09e254708923e6e76e92c2554", + "5fec6f60f4274b9e964c916197dd155e", + "98536a63b3c94c018a16a9997bda38cf", + "ae3e1b9c33964a94a405952e9120fd0a", + "1e6181166ca0420eb9dce94d9ca0eceb", + "2161c02b9b984a47aa4539b5e3625ed5", + "55e46a3096a4438d9663c3b5a0b75f34", + "9825b277c4ca49a891c01739f2feaae0", + "284e45208cdd471ea6c361e5c9599481", + "2ca2adf3b2924431aba041483ad76c0e", + "3e6969868d724975bb067c1f962cbec6", + "915b8f0bc15a416c8e31f163838db6b4", + "6421eac0ecd14d4e9a7debd117178381", + "fb0e65319f7a400da1457acf2a7f4b38", + "ab9ee6b2f00a476a84b634404c573cba", + "913104b57a984d08af47073b60b2dc09", + "e5c3875f5d7f43e1acb57da68a5810d6", + "8bba15c72cdf4b9e995af58424d2021e", + 
"a6b55e2f0fca456c9c9a2c8546e56595", + "a82f008a06a6408c8520831378f433ef", + "152d919539764660b1223a911e580fa3", + "29394f0ef562422993d43a78c93eb1f1", + "f5f6ac53523743bbad0d09a8476c917c", + "9805bc6b1149455ca9d6a09cf05992fe", + "72a14d17820e446eb22e6261b97ce3e4", + "31d9ffc210964cd5ab82dcf38cddbc5b", + "6dd46cd00a7c45c484187857a89c844f", + "6230a33eb2ad43cdbd8999a1dea7bd4c", + "9d12ca85c4cb4c6d83cc70f0f4f7c527", + "2eb4215e9ef84c9b8ad121f0f6f531d0", + "5693cc3536314347856a708cceeda383", + "9539188d02584eb985835b4986935ced", + "ef49681d55954ce9876d0d1ce5d10fc9", + "e28e70aa5f834335ad7ea21450a25d32", + "fdd2d273a3bb495e99ed707bc6783b5d", + "0dd65ad33c184ebaa5ccaef977f34e83", + "b34efc6222d3420e90c13f02b726e6b5" + ] + }, + "id": "DkIvEkIIkEyB", + "outputId": "63346ae4-bef3-44ae-b99c-51b48ae829c4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:unsloth_zoo.log:Unsloth: Patched trl.models.utils.disable_gradient_checkpointing with a no-op to preserve Unsloth gradient checkpointing across TRL generation passes.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.4.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n", + "Unsloth: QLoRA and full finetuning all not selected. 
Switching to 16bit LoRA.\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7f2f1d7c247f45bb93ff417cb186c2eb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00 0 ! Suggested 8, 16, 32, 64, 128\n", + " target_modules = [\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + " ],\n", + " lora_alpha = lora_rank*2, # *2 speeds up training\n", + " use_gradient_checkpointing = \"unsloth\", # Reduces memory usage\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N0QnO9_YJBOI" + }, + "source": [ + "# Sudoku Game Implementation\n", + "\n", + "We use GPT-5 to create a clean Sudoku solver environment. The strategy outputs \"row,col,value\" to fill cells." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "D9CI4jtgL5mw" + }, + "outputs": [], + "source": [ + "#@title Sudoku Game Implementation\n", + "from dataclasses import dataclass, field\n", + "from typing import List, Tuple, Optional\n", + "import random\n", + "import copy\n", + "\n", + "def _is_valid_placement(board: List[List[int]], row: int, col: int, num: int) -> bool:\n", + " \"\"\"Check if placing num at (row, col) is valid.\"\"\"\n", + " # Check row\n", + " if num in board[row]:\n", + " return False\n", + "\n", + " # Check column\n", + " if num in [board[r][col] for r in range(9)]:\n", + " return False\n", + "\n", + " # Check 3x3 box\n", + " box_row, box_col = 3 * (row // 3), 3 * (col // 3)\n", + " for r in range(box_row, box_row + 3):\n", + " for c in range(box_col, box_col + 3):\n", + " if board[r][c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + "def _solve_sudoku(board: List[List[int]]) -> bool:\n", + " \"\"\"Solve sudoku using backtracking (for puzzle generation).\"\"\"\n", + " for row in range(9):\n", + " for col in 
range(9):\n", + " if board[row][col] == 0:\n", + " for num in range(1, 10):\n", + " if _is_valid_placement(board, row, col, num):\n", + " board[row][col] = num\n", + " if _solve_sudoku(board):\n", + " return True\n", + " board[row][col] = 0\n", + " return False\n", + " return True\n", + "\n", + "def _generate_complete_board(rng: random.Random) -> List[List[int]]:\n", + " \"\"\"Generate a complete valid Sudoku board.\"\"\"\n", + " board = [[0 for _ in range(9)] for _ in range(9)]\n", + "\n", + " # Fill diagonal 3x3 boxes first (they don't affect each other)\n", + " for box in range(3):\n", + " nums = list(range(1, 10))\n", + " rng.shuffle(nums)\n", + " for i in range(3):\n", + " for j in range(3):\n", + " board[box * 3 + i][box * 3 + j] = nums[i * 3 + j]\n", + "\n", + " # Solve the rest\n", + " _solve_sudoku(board)\n", + " return board\n", + "\n", + "@dataclass\n", + "class SudokuGame:\n", + " difficulty: int = 40 # Number of cells to remove (20 = easy, 40 = medium, 50 = hard)\n", + " seed: Optional[int] = None\n", + " _rng: random.Random = field(init = False, repr = False)\n", + " _board: List[List[int]] = field(init = False, repr = False)\n", + " _solution: List[List[int]] = field(init = False, repr = False)\n", + " _initial_board: List[List[int]] = field(init = False, repr = False)\n", + " _moves: int = field(default = 0, init = False, repr = False)\n", + " _state: str = field(default = \"ongoing\", init = False, repr = False)\n", + "\n", + " def __post_init__(self):\n", + " self._rng = random.Random(self.seed)\n", + "\n", + " # Generate complete board\n", + " complete_board = _generate_complete_board(self._rng)\n", + " self._solution = copy.deepcopy(complete_board)\n", + "\n", + " # Remove cells to create puzzle\n", + " self._board = copy.deepcopy(complete_board)\n", + " cells = [(r, c) for r in range(9) for c in range(9)]\n", + " self._rng.shuffle(cells)\n", + "\n", + " for r, c in cells[:self.difficulty]:\n", + " self._board[r][c] = 0\n", + "\n", + " 
self._initial_board = copy.deepcopy(self._board)\n", + " self._update_state()\n", + "\n", + " def board(self) -> List[List[int]]:\n", + " \"\"\"Return current board state.\"\"\"\n", + " return [row[:] for row in self._board]\n", + "\n", + " def initial_board(self) -> List[List[int]]:\n", + " \"\"\"Return initial puzzle state.\"\"\"\n", + " return [row[:] for row in self._initial_board]\n", + "\n", + " def state(self) -> str:\n", + " \"\"\"Return game state: 'ongoing', 'success', or 'failed'.\"\"\"\n", + " return self._state\n", + "\n", + " def moves(self) -> int:\n", + " \"\"\"Return number of moves made.\"\"\"\n", + " return self._moves\n", + "\n", + " def place_number(self, row: int, col: int, num: int) -> bool:\n", + " \"\"\"Place a number on the board. Returns True if valid move.\"\"\"\n", + " # Validate input\n", + " if not (0 <= row < 9 and 0 <= col < 9):\n", + " self._state = \"failed\"\n", + " return False\n", + "\n", + " if not (1 <= num <= 9):\n", + " self._state = \"failed\"\n", + " return False\n", + "\n", + " # Can't modify initial cells\n", + " if self._initial_board[row][col] != 0:\n", + " self._state = \"failed\"\n", + " return False\n", + " if self._board[row][col] != 0:\n", + " self._state = \"failed\"\n", + " return False\n", + " # Check if placement is valid\n", + " if not _is_valid_placement(self._board, row, col, num):\n", + " self._state = \"failed\"\n", + " return False\n", + "\n", + " # Place number\n", + " self._board[row][col] = num\n", + " self._moves += 1\n", + " self._update_state()\n", + " return True\n", + "\n", + " def _update_state(self) -> None:\n", + " \"\"\"Update game state based on current board.\"\"\"\n", + " # Check if puzzle is complete\n", + " if all(self._board[r][c] != 0 for r in range(9) for c in range(9)):\n", + " # Verify solution is correct\n", + " if self._board == self._solution:\n", + " self._state = \"success\"\n", + " else:\n", + " self._state = \"failed\"\n", + " else:\n", + " self._state = \"ongoing\"\n", + 
"\n", + " def pretty(self, colors: bool = True) -> str:\n", + " \"\"\"Pretty print the Sudoku board.\"\"\"\n", + " RESET = \"\\x1b[0m\"\n", + " INITIAL = \"\\x1b[38;5;45m\" # Cyan for initial numbers\n", + " PLACED = \"\\x1b[38;5;226m\" # Yellow for placed numbers\n", + " EMPTY = \"\\x1b[38;5;239m\" # Gray for empty cells\n", + "\n", + " lines = []\n", + " lines.append(\"┌───────┬───────┬───────┐\")\n", + "\n", + " for row in range(9):\n", + " row_str = \"│ \"\n", + " for col in range(9):\n", + " num = self._board[row][col]\n", + "\n", + " if colors:\n", + " if num == 0:\n", + " row_str += f\"{EMPTY}.{RESET}\"\n", + " elif self._initial_board[row][col] != 0:\n", + " row_str += f\"{INITIAL}{num}{RESET}\"\n", + " else:\n", + " row_str += f\"{PLACED}{num}{RESET}\"\n", + " else:\n", + " row_str += str(num) if num != 0 else \".\"\n", + "\n", + " if col % 3 == 2:\n", + " row_str += \" │ \"\n", + " else:\n", + " row_str += \" \"\n", + "\n", + " lines.append(row_str.rstrip())\n", + "\n", + " if row == 8:\n", + " lines.append(\"└───────┴───────┴───────┘\")\n", + " elif row % 3 == 2:\n", + " lines.append(\"├───────┼───────┼───────┤\")\n", + "\n", + " return \"\\n\".join(lines)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BBPHTCLYFivc" + }, + "source": [ + "Test the Sudoku environment:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-M8kGaFRJ2ic", + "outputId": "5da676f8-7495-47ea-a570-74457cccd13c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Initial puzzle:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "State: ongoing, Moves: 0\n" + ] + } + ], + "source": [ + "# Create an easy puzzle\n", + "game = SudokuGame(difficulty = 30, seed = 42)\n", + "print(\"Initial puzzle:\")\n", + "print(game.pretty())\n", + "print(f\"\\nState: {game.state()}, Moves: {game.moves()}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zclUeNxosv4k", + "outputId": "113ab4e3-6c94-4f2d-e5bd-0ceb7736d71d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "SudokuGame(difficulty=30, seed=42)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "game" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "envzrXmjKRff" + }, + "source": [ + "Try making some moves:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "b-gSgthFI_wq", + "outputId": "e6a98078-9a31-4cad-ab1c-3b067c538930" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "After placing 7 at (0,1):\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m
\u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "State: ongoing, Moves: 1\n" + ] + } + ], + "source": [ + "# Make a valid move: place_number(row, col, num)\n", + "game.place_number(0, 1, 7)\n", + "print(\"\\nAfter placing 7 at (0,1):\")\n", + "print(game.pretty())\n", + "print(f\"State: {game.state()}, Moves: {game.moves()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gGL1X29Fy4n5" + }, + "source": [
+ "If we take an action that is not part of the action space, we will get an error, and the game will not accept any more actions." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VR6czU96cpxf" + }, + "source": [ + "# RL Environment Setup\n", + "\n", + "Execute strategies with time limits to prevent infinite loops." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tdgjnf-8z_kr" + }, + "outputs": [], + "source": [ + "from typing import Callable\n", + "from unsloth import execute_with_time_limit\n", + "\n", + "def _execute_strategy(strategy: Callable, game: SudokuGame):\n", + " \"\"\"Execute a strategy function on a Sudoku game.\"\"\"\n", + " assert callable(strategy)\n", + "\n", + " max_moves = 100\n", + " valid_moves = 0 # Track successful moves\n", + "\n", + " while game.state() == \"ongoing\" and valid_moves < max_moves:\n", + " try:\n", + " board = game.board()\n", + " initial = game.initial_board()\n", + " result = strategy(board, initial)\n", + "\n", + " # Validate result format\n", + " if not isinstance(result, (tuple, list)) or len(result) != 3:\n", + " # Invalid format = immediate fail, but return valid moves made\n", + " return valid_moves, \"failed\"\n", + "\n", + " row, col, num = result\n", + "\n", + " # Validate types\n", + " if not all(isinstance(x, int) for x in [row, col, num]):\n", + " return valid_moves, \"failed\"\n", + "\n", + " # Try to place number\n", + " success = game.place_number(row, col, num)\n", + "\n", + " if success:\n", + " valid_moves += 1 # Count this valid move\n", + " else:\n", + " # Invalid move = game fails, but return valid_moves made so far\n", + " return valid_moves, \"failed\"\n", + "\n", + " except Exception:\n", + " return valid_moves, \"failed\"\n", + "\n", + " if valid_moves >= max_moves and game.state() == \"ongoing\":\n", + " return valid_moves, \"failed\"\n", + "\n", + " return valid_moves, game.state()" + ] + }, + { + "cell_type": "markdown", + "metadata": 
{ + "id": "dkuHVdB09sgf" + }, + "source": [ + "To allow longer-running strategies during Reinforcement Learning, we allow a 10-second time limit." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SK-LfzsA9wbW" + }, + "outputs": [], + "source": [ + "@execute_with_time_limit(10)\n", + "def execute_strategy(strategy: Callable, game: SudokuGame):\n", + " \"\"\"Execute strategy with 10 second time limit.\"\"\"\n", + " return _execute_strategy(strategy, game)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p7t7XMulLkpy" + }, + "source": [ + "Test with a simple strategy:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "VvwUzwSlLif2", + "outputId": "a6cdd742-2c78-4e68-c698-ee2810588f96" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Moves: 1, State: failed\n" + ] + } + ], + "source": [ + "def simple_strategy(board, initial):\n", + " \"\"\"Simple strategy: fill first empty cell with 7.\"\"\"\n", + " for r in range(9):\n", + " for c in range(9):\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " return (r, c, 7)\n", + " return (0, 0, 7)\n", + "\n", + "game = SudokuGame(difficulty = 30, seed = 42)\n", + "try:\n", + " moves, state = execute_strategy(simple_strategy, game)\n", + " print(f\"Moves: {moves}, State: {state}\")\n", + "except TimeoutError as e:\n", + " print(f\"Timed out: {e}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "FWN9b9TAk89-", + "outputId": "760ff7f0-db9b-4d5f-b1a2-5542c7bd6843" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ 
\u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n" + ] + } + ], + "source": [ + "print(game.pretty())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tRhLV_bZMYxy" + }, + "source": [ + "# Code Execution\n", + "\n", + "Before we create and execute a new Python function, we first have to check that the function does not call other global variables or cheat. This is called `countering reward hacking`, since we don't want the function to cheat.\n", + "\n", + "For example, the code below is fine, since it only uses Python-level functions. We check it with `check_python_modules`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zz80kvg6M4BG", + "outputId": "60daace0-c1be-4038-ecd3-2620214e9193" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Safe Python code? 
True\n", + "{'stdlib': [], 'non_stdlib': [], 'relative_imports': 0}\n" + ] + } + ], + "source": [ + "from unsloth import check_python_modules, create_locked_down_function\n", + "\n", + "# Test safe code\n", + "sample = \"\"\"\n", + "def strategy(board, initial):\n", + " for r in range(9):\n", + " for c in range(9):\n", + " if board[r][c] == 0:\n", + " return (r, c, 1)\n", + " return (0, 0, 1)\n", + "\"\"\"\n", + "\n", + "ok, info = check_python_modules(sample)\n", + "print(\"Safe Python code?\", ok)\n", + "print(info)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bZzVWgKQ-VIg" + }, + "source": [ + "The code below imports `numpy`, so we should not allow it to execute:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Z89Jw1KB-Ux7", + "outputId": "cde92f58-02b0-4622-9a6d-9306900c3d06" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Safe Python code? False\n", + "{'stdlib': [], 'non_stdlib': ['numpy'], 'relative_imports': 0}\n" + ] + } + ], + "source": [ + "sample = \"\"\"\n", + "def strategy(board, initial):\n", + " import numpy as np\n", + " return (0, 0, 1)\n", + "\"\"\"\n", + "\n", + "ok, info = check_python_modules(sample)\n", + "print(\"Safe Python code?\", ok)\n", + "print(info)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8CzwCyXIPK04" + }, + "source": [ + "# Data & RL task setup\n", + "\n", + "Create the prompt that instructs the model to generate a Sudoku solving strategy. You can customize this prompt for a different RL task." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "B-2RRE4HMrQO", + "outputId": "cb3a9a79-0bf3-47d7-e3a8-e44592ed1f1b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Create a Sudoku solving strategy using only native Python built-in functions without any import statements.\n", + "You are given two lists of lists (9x9 grids):\n", + "- board: current state (0 means empty)\n", + "- initial: starting puzzle (0 means was empty, numbers are fixed)\n", + "\n", + "Return a tuple (row, col, number) for the next move.\n", + "- row: 0-8 (row index)\n", + "- col: 0-8 (column index)\n", + "- number: 1-9 (digit to place)\n", + "\n", + "Only place numbers in cells that are BOTH empty in initial AND empty in board (initial[row][col] == 0 AND board[row][col] == 0)\n", + "Use Sudoku rules: no duplicates in rows, columns, or 3x3 boxes.\n", + "Output your function in backticks:\n", + "```python\n", + "def strategy(board, initial):\n", + " # Your logic here\n", + " return (row, col, number)\n", + "```\n", + "All helper functions must be inside def strategy. 
Output only the function.\n" + ] + } + ], + "source": [ + "prompt = \"\"\"\n", + "Create a Sudoku solving strategy using only native Python built-in functions without any import statements.\n", + "You are given two lists of lists (9x9 grids):\n", + "- board: current state (0 means empty)\n", + "- initial: starting puzzle (0 means was empty, numbers are fixed)\n", + "\n", + "Return a tuple (row, col, number) for the next move.\n", + "- row: 0-8 (row index)\n", + "- col: 0-8 (column index)\n", + "- number: 1-9 (digit to place)\n", + "\n", + "Only place numbers in cells that are BOTH empty in initial AND empty in board (initial[row][col] == 0 AND board[row][col] == 0)\n", + "Use Sudoku rules: no duplicates in rows, columns, or 3x3 boxes.\n", + "Output your function in backticks:\n", + "```python\n", + "def strategy(board, initial):\n", + " # Your logic here\n", + " return (row, col, number)\n", + "```\n", + "All helper functions must be inside def strategy. Output only the function.\n", + "\"\"\".strip()\n", + "\n", + "print(prompt)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MIdudFUodN4i" + }, + "source": [ + "First, let's prompt the model without RL and see how it goes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9HJxrS76h3Ds", + "outputId": "056b4001-c877-4913-bd48-d8618146e7e0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==================================================\n", + "BASE MODEL OUTPUT (before RL training):\n", + "==================================================\n", + "This is a complex request. 
Implementing a full, robust Sudoku solver strategy using *only* native Python built-in functions (no imports like `collections.Counter` or complex data structures beyond standard lists/dicts) requires implementing the core logic of constraint checking and candidate generation.\n", + "\n", + "Since the goal is to find *a* valid next move, we will use a simple backtracking/constraint propagation approach:\n", + "1. Identify all empty cells.\n", + "2. For each empty cell, determine the set of valid numbers (1-9) that can be placed there without violating Sudoku rules based on the current `board`.\n", + "3.\n" + ] + } + ], + "source": [ + "text = tokenizer.apply_chat_template(\n", + " [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " tokenize = False,\n", + " add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "print(\"=\" * 50)\n", + "print(\"BASE MODEL OUTPUT (before RL training):\")\n", + "print(\"=\" * 50)\n", + "\n", + "inputs = tokenizer(\n", + " text = text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "text_streamer = TextStreamer(tokenizer, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iknaWZNudTNq" + }, + "source": [ + "# Reward functions\n", + "\n", + "We now define an `extract_function` helper, which simply extracts the function wrapped in triple backticks.\n", + "\n", + "We also define 3 reward functions:\n", + "\n", + "1. `function_works`, which rewards the model if the strategy is a valid Python function.\n", + "2. `no_cheating`, which checks whether the function imports other modules and penalizes it if it does.\n", + "3. `strategy_succeeds`, which checks whether the auto-generated strategy actually solves the Sudoku puzzle when run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8JJGXKdJ-Zl_" + }, + "outputs": [], + "source": [ + "def extract_function(text):\n", + " \"\"\"Extract Python function from markdown code blocks.\"\"\"\n", + " if text.count(\"```\") >= 2:\n", + " first = text.find(\"```\") + 3\n", + " second = text.find(\"```\", first)\n", + " fx = text[first:second].strip()\n", + " fx = fx.removeprefix(\"python\\n\")\n", + " fx = fx[fx.find(\"def\"):]\n", + " if fx.startswith(\"def strategy(board, initial):\"):\n", + " return fx\n", + " return None" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KLXEcf_HSJlI" + }, + "source": [ + "**Reward 1: Function Works**\n", + "\n", + "Checks if the generated code is valid Python and can be executed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "h3-B0IIsS56S" + }, + "outputs": [], + "source": [ + "def function_works(completions, **kwargs):\n", + " \"\"\"Reward for generating valid executable Python code.\"\"\"\n", + " scores = []\n", + " for completion in completions:\n", + " score = 0\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + "\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + "\n", + " if function is None or \"error\" in info:\n", + " score = -2.0 # Invalid function\n", + " else:\n", + " try:\n", + " new_strategy = create_locked_down_function(function)\n", + " score = 1.0 # Valid function\n", + " except:\n", + " score = -1.0 # Function has errors\n", + "\n", + " scores.append(score)\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SDyEfDGjM6_2" + }, + "source": [ + "**Reward 2: No Cheating**\n", + "\n", + "Penalizes functions that import external libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uGPkLhqjM-jK" + }, + "outputs": [], + "source": [ + "def no_cheating(completions, **kwargs):\n", + " \"\"\"Penalize use of external imports.\"\"\"\n", + " scores = []\n", + " for completion in completions:\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + "\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + " scores.append(1.0 if ok else -20.0) # Heavy penalty for cheating\n", + " else:\n", + " scores.append(-1.0) # Failed to create function\n", + "\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FRgHkfQ4M_d9" + }, + "source": [ + "**Reward 3: Strategy Succeeds**\n", + "\n", + "Rewards strategies that successfully solve Sudoku puzzles." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sSM7ya5aNFGh" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "global PRINTER\n", + "PRINTER = 0\n", + "\n", + "def strategy_succeeds(completions, **kwargs):\n", + " \"\"\"Reward valid moves even if strategy eventually fails.\"\"\"\n", + " global PRINTER\n", + " scores = []\n", + "\n", + " seed = np.random.randint(10000)\n", + " difficulty = 40\n", + " for completion in completions:\n", + " printed = False\n", + " response = completion[0][\"content\"]\n", + " function = extract_function(response)\n", + "\n", + " if PRINTER % 5 == 0:\n", + " printed = True\n", + " print(\"\\n\" + \"=\" * 60)\n", + " print(function)\n", + " print(\"=\" * 60)\n", + " PRINTER += 1\n", + "\n", + " if function is not None:\n", + " ok, info = check_python_modules(function)\n", + "\n", + " if function is None or \"error\" in info:\n", + " scores.append(0)\n", + " continue\n", + "\n", + " try:\n", + " new_strategy = create_locked_down_function(function)\n", + " except:\n", + " scores.append(0)\n", + " continue\n", + "\n", + " try:\n", + " 
game = SudokuGame(difficulty = difficulty, seed = seed)\n", + " valid_moves, game_state = execute_strategy(new_strategy, game)\n", + " if valid_moves == difficulty:\n", + " game_state = \"success\"\n", + "\n", + " print(f\"\\n Valid moves: {valid_moves}, Final state: {game_state}\")\n", + "\n", + " if not printed:\n", + " print(\"Strategy:\")\n", + " print(function[:200] + \"...\" if len(function) > 200 else function)\n", + "\n", + " print(\"\\nFinal board:\")\n", + " print(game.pretty())\n", + "\n", + " if game_state == \"success\":\n", + " scores.append(30.0) # Solved the puzzle!\n", + " elif valid_moves > 0:\n", + " # Reward based on valid moves made before failure\n", + " # Each valid move is worth 0.2 points\n", + " reward = valid_moves * 0.2\n", + " scores.append(reward)\n", + " else:\n", + " scores.append(-2.0) # Failed immediately with no valid moves\n", + "\n", + " except TimeoutError:\n", + " print(\"Timeout\")\n", + " scores.append(-1.0)\n", + " except Exception as e:\n", + " print(f\"Exception: {str(e)[:100]}\")\n", + " scores.append(-3.0)\n", + "\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCpSxtvSeAG_" + }, + "source": [ + "# Dataset Preparation\n", + "\n", + "Create the training dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Ldf6SjLHVPRv", + "outputId": "e6c0afa5-e90c-43e6-e1ff-2ca353360854" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Maximum prompt length: 830\n", + "\n", + "Dataset sample:\n", + "{'prompt': [{'content': 'Create a Sudoku solving strategy using only native Python built-in functions without any import statements.\\nYou are given two lists of lists (9x9 grids):\\n- board: current state (0 means empty)\\n- initial: starting puzzle (0 means was empty, numbers are fixed)\\n\\nReturn a tuple (row, col, number) for the next move.\\n- row: 0-8 (row index)\\n- col: 0-8 (column index)\\n- number: 1-9 (digit to place)\\n\\nOnly place numbers in cells that are BOTH empty in initial AND empty in board (initial[row][col] == 0 AND board[row][col] == 0)\\nUse Sudoku rules: no duplicates in rows, columns, or 3x3 boxes.\\nOutput your function in backticks:\\n```python\\ndef strategy(board, initial):\\n # Your logic here\\n return (row, col, number)\\n```\\nAll helper functions must be inside def strategy. Output only the function.', 'role': 'user'}], 'answer': 0}\n" + ] + } + ], + "source": [ + "from datasets import Dataset\n", + "\n", + "dataset = Dataset.from_list([\n", + " {\n", + " \"prompt\": [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " \"answer\": 0,\n", + " }\n", + "] * 1000)\n", + "\n", + "maximum_length = len(tokenizer.apply_chat_template(\n", + " [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + " add_generation_prompt = True\n", + "))\n", + "\n", + "print(f\"Maximum prompt length: {maximum_length}\")\n", + "print(\"\\nDataset sample:\")\n", + "print(dataset[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9-IOMhVg-2AM" + }, + "source": [ + "\n", + "### Train the model\n", + "\n", + "Now set up GRPO Trainer and all configurations! 
We also support GSPO, GAPO, Dr GRPO and more! Go to the Unsloth [Reinforcement Learning Docs](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) for more options." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ptqkXK2D4d6p", + "outputId": "7ffb2e77-3bb3-4af1-db5c-446e63fb0351" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n" + ] + } + ], + "source": [ + "# Leave room for the prompt (plus 1 token safety margin)\n", + "max_completion_length = max_seq_length - (maximum_length + 1)\n", + "\n", + "from trl import GRPOConfig, GRPOTrainer\n", + "training_args = GRPOConfig(\n", + " temperature = 1.0,\n", + " learning_rate = 5e-5,\n", + " weight_decay = 0.001,\n", + " warmup_ratio = 0.1,\n", + " lr_scheduler_type = \"linear\",\n", + " optim = \"adamw_8bit\",\n", + " logging_steps = 1,\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 2, # Increase to 4 for smoother training\n", + " num_generations = 2, # Decrease if out of memory\n", + " max_completion_length = max_completion_length,\n", + " # num_train_epochs = 1, # Set to 1 for a full training run\n", + " max_steps = 60,\n", + " save_steps = 100,\n", + " report_to = \"none\", # Can use Weights & Biases, TrackIO\n", + " output_dir = \"outputs\",\n", + " epsilon = 0.2,\n", + " epsilon_high = 0.28, # one sided\n", + " delta = 1.5, # two sided\n", + " loss_type = 'bnpo',\n", + " mask_truncated_completions = True\n", + " # For optional training + evaluation\n", + " # fp16_full_eval = True,\n", + " # per_device_eval_batch_size = 4,\n", + " # eval_accumulation_steps = 1,\n", + " # eval_strategy = \"steps\",\n", + " # eval_steps = 1,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r9Mv8UZO5hz-" + }, + "source": [ + "And let's run the 
trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase!\n", + "\n", + "You might have to wait 150 to 200 steps for any action. You'll probably get low reward for the first 100 steps. Please be patient!\n", + "\n", + "| Step | Training Loss | reward | reward_std | completion_length | kl |\n", + "|------|---------------|-----------|------------|-------------------|----------|\n", + "| 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 |\n", + "| 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 |\n", + "| 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vzOuSVCL_GA9" + }, + "outputs": [], + "source": [ + "# For optional training + evaluation\n", + "# new_dataset = dataset.train_test_split(test_size = 0.01)\n", + "\n", + "trainer = GRPOTrainer(\n", + " model = model,\n", + " processing_class = tokenizer,\n", + " reward_funcs = [\n", + " function_works,\n", + " no_cheating,\n", + " strategy_succeeds,\n", + " ],\n", + " args = training_args,\n", + " train_dataset = dataset,\n", + "\n", + " # For optional training + evaluation\n", + " # train_dataset = new_dataset[\"train\"],\n", + " # eval_dataset = new_dataset[\"test\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fQhtuwP4cf34" + }, + "source": [ + "And let's train the model!\n", + "\n", + "**NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "VGRxPdSCcfC3", + "outputId": "c90a4a3d-faa2-4785-d922-69e1d2fc9072" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 1,000 | Num Epochs = 1 | Total steps = 100\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 2\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 2 x 1) = 2\n", + " \"-____-\" Trainable parameters = 59,719,680 of 5,164,017,184 (1.16% trained)\n", + "Passing `generation_config` together with generation-related arguments=({'pad_token_id'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.\n", + "Caching is incompatible with gradient checkpointing in Gemma4TextDecoderLayer. 
Setting `past_key_values=None`.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " def get_candidates(r, c):\n", + " if board[r][c] != 0:\n", + " return set()\n", + "\n", + " used = set()\n", + "\n", + " # Check row\n", + " for col in range(9):\n", + " if board[r][col] != 0:\n", + " used.add(board[r][col])\n", + "\n", + " # Check column\n", + " for row in range(9):\n", + " if board[row][c] != 0:\n", + " used.add(board[row][c])\n", + "\n", + " # Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " current_r = start_row + i\n", + " current_c = start_col + j\n", + " if board[current_r][current_c] != 0:\n", + " used.add(board[current_r][current_c])\n", + "\n", + " all_digits = set([1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + " return all_digits - used\n", + "\n", + " # 1. Find all empty cells where a move can potentially be made\n", + " empty_cells = []\n", + " for r in range(9):\n", + " for c in range(9):\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " empty_cells.append((r, c))\n", + "\n", + " # 2. 
Iterate through empty cells and check for 'Naked Singles'\n", + " for r, c in empty_cells:\n", + " candidates = get_candidates(r, c)\n", + " \n", + " # If only one candidate exists, this is our move\n", + " if len(candidates) == 1:\n", + " number = list(candidates)[0]\n", + " return (r, c, number)\n", + "\n", + " # If no immediate naked single is found, return None (or an arbitrary valid empty cell if required, \n", + " # but the prompt implies returning a move if possible based on the strategy).\n", + " # For this constrained problem, if no single is found, we return None, though the prompt structure \n", + " # implies a move will be returned if the input is solvable by this simple strategy.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 40, Final state: success\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + "\n", + " # Che...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ 
\u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [ 60/60 ]\n", + "
| Step | Training Loss | reward | reward_std | completions / mean_length | completions / min_length | completions / max_length | completions / clipped_ratio | completions / mean_terminated_length | completions / min_terminated_length | completions / max_terminated_length | kl | rewards / function_works / mean | rewards / function_works / std | rewards / no_cheating / mean | rewards / no_cheating / std | rewards / strategy_succeeds / mean | rewards / strategy_succeeds / std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000000 | 20.600000 | 16.122034 | 501.000000 | 368.000000 | 634.000000 | 0.000000 | 501.000000 | 368.000000 | 634.000000 | 0.000006 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.600000 | 16.122034 |
| 2 | 0.000000 | 8.800000 | 0.000000 | 392.000000 | 380.000000 | 404.000000 | 0.000000 | 392.000000 | 380.000000 | 404.000000 | 0.000004 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 3 | 0.000000 | 8.400000 | 0.000000 | 338.500000 | 329.000000 | 348.000000 | 0.000000 | 338.500000 | 329.000000 | 348.000000 | 0.000042 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 4 | 0.000000 | 8.000000 | 0.000000 | 369.500000 | 360.000000 | 379.000000 | 0.000000 | 369.500000 | 360.000000 | 379.000000 | 0.000371 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.000000 |
| 5 | 0.000001 | 5.800000 | 3.676955 | 470.000000 | 429.000000 | 511.000000 | 0.000000 | 470.000000 | 429.000000 | 511.000000 | 0.001449 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 3.800000 | 3.676955 |
| 6 | 0.000002 | 20.400000 | 16.404879 | 576.500000 | 415.000000 | 738.000000 | 0.000000 | 576.500000 | 415.000000 | 738.000000 | 0.002273 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.400000 | 16.404877 |
| 7 | 0.000031 | 7.500000 | 1.555635 | 404.500000 | 349.000000 | 460.000000 | 0.000000 | 404.500000 | 349.000000 | 460.000000 | 0.030822 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 5.500000 | 1.555635 |
| 8 | 0.000007 | 20.600000 | 16.122034 | 906.000000 | 811.000000 | 1001.000000 | 0.000000 | 906.000000 | 811.000000 | 1001.000000 | 0.007096 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.600000 | 16.122034 |
| 9 | 0.000005 | 8.900000 | 0.424264 | 737.500000 | 634.000000 | 841.000000 | 0.000000 | 737.500000 | 634.000000 | 841.000000 | 0.006361 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.900000 | 0.424264 |
| 10 | 0.000015 | 8.700001 | 0.141421 | 649.500000 | 631.000000 | 668.000000 | 0.000000 | 649.500000 | 631.000000 | 668.000000 | 0.012028 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.700000 | 0.141422 |
| 11 | 0.000009 | 32.000000 | 0.000000 | 692.000000 | 650.000000 | 734.000000 | 0.000000 | 692.000000 | 650.000000 | 734.000000 | 0.008893 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 30.000000 | 0.000000 |
| 12 | 0.000015 | 6.500000 | 3.818376 | 543.000000 | 481.000000 | 605.000000 | 0.000000 | 543.000000 | 481.000000 | 605.000000 | 0.014879 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.500000 | 3.818377 |
| 13 | 0.000022 | 32.000000 | 0.000000 | 514.000000 | 494.000000 | 534.000000 | 0.000000 | 514.000000 | 494.000000 | 534.000000 | 0.021501 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 30.000000 | 0.000000 |
| 14 | 0.000022 | 32.000000 | 0.000000 | 679.500000 | 615.000000 | 744.000000 | 0.000000 | 679.500000 | 615.000000 | 744.000000 | 0.021587 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 30.000000 | 0.000000 |
| 15 | 0.000014 | 6.000000 | 2.828427 | 723.500000 | 612.000000 | 835.000000 | 0.000000 | 723.500000 | 612.000000 | 835.000000 | 0.014467 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 4.000000 | 2.828427 |
| 16 | 0.000043 | 20.000000 | 16.970562 | 575.500000 | 500.000000 | 651.000000 | 0.000000 | 575.500000 | 500.000000 | 651.000000 | 0.042646 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.000000 | 16.970562 |
| 17 | 0.000023 | 20.600000 | 16.122034 | 815.000000 | 779.000000 | 851.000000 | 0.000000 | 815.000000 | 779.000000 | 851.000000 | 0.022629 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.600000 | 16.122034 |
| 18 | 0.000014 | 20.600000 | 16.122034 | 702.000000 | 528.000000 | 876.000000 | 0.000000 | 702.000000 | 528.000000 | 876.000000 | 0.013669 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.600000 | 16.122034 |
| 19 | 0.000031 | 8.600000 | 0.000000 | 534.000000 | 530.000000 | 538.000000 | 0.000000 | 534.000000 | 530.000000 | 538.000000 | 0.030864 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.600000 | 0.000000 |
| 20 | 0.000020 | 9.400000 | 0.282843 | 600.500000 | 481.000000 | 720.000000 | 0.000000 | 600.500000 | 481.000000 | 720.000000 | 0.021724 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 7.400000 | 0.282843 |
| 21 | 0.000025 | 20.100000 | 16.829140 | 571.500000 | 564.000000 | 579.000000 | 0.000000 | 571.500000 | 564.000000 | 579.000000 | 0.024544 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.100000 | 16.829142 |
| 22 | 0.000022 | 32.000000 | 0.000000 | 772.000000 | 674.000000 | 870.000000 | 0.000000 | 772.000000 | 674.000000 | 870.000000 | 0.022414 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 30.000000 | 0.000000 |
| 23 | 0.000037 | 8.000000 | 0.000000 | 534.000000 | 512.000000 | 556.000000 | 0.000000 | 534.000000 | 512.000000 | 556.000000 | 0.036932 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.000000 |
| 24 | 0.000020 | 8.200000 | 0.000000 | 539.000000 | 492.000000 | 586.000000 | 0.000000 | 539.000000 | 492.000000 | 586.000000 | 0.020143 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 25 | 0.000043 | 8.000000 | 0.565685 | 703.000000 | 512.000000 | 894.000000 | 0.000000 | 703.000000 | 512.000000 | 894.000000 | 0.042991 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.565686 |
| 26 | 0.000039 | 8.400000 | 0.000000 | 509.000000 | 493.000000 | 525.000000 | 0.000000 | 509.000000 | 493.000000 | 525.000000 | 0.039460 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 27 | 0.000047 | 20.400000 | 16.404879 | 735.000000 | 490.000000 | 980.000000 | 0.000000 | 735.000000 | 490.000000 | 980.000000 | 0.047155 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.400000 | 16.404877 |
| 28 | 0.000039 | 8.000000 | 0.000000 | 504.000000 | 489.000000 | 519.000000 | 0.000000 | 504.000000 | 489.000000 | 519.000000 | 0.039000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.000000 |
| 29 | 0.000056 | 8.400000 | 0.000000 | 540.000000 | 519.000000 | 561.000000 | 0.000000 | 540.000000 | 519.000000 | 561.000000 | 0.056159 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 30 | 0.000046 | 8.400000 | 0.000000 | 534.000000 | 524.000000 | 544.000000 | 0.000000 | 534.000000 | 524.000000 | 544.000000 | 0.046086 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 31 | 0.000047 | 8.800000 | 0.000000 | 489.500000 | 480.000000 | 499.000000 | 0.000000 | 489.500000 | 480.000000 | 499.000000 | 0.046580 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 32 | 0.000040 | 8.200000 | 0.000000 | 525.500000 | 476.000000 | 575.000000 | 0.000000 | 525.500000 | 476.000000 | 575.000000 | 0.040487 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 33 | 0.000037 | 8.400000 | 0.000000 | 482.500000 | 450.000000 | 515.000000 | 0.000000 | 482.500000 | 450.000000 | 515.000000 | 0.036842 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 34 | 0.000058 | 8.800000 | 0.000000 | 516.500000 | 512.000000 | 521.000000 | 0.000000 | 516.500000 | 512.000000 | 521.000000 | 0.057750 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 35 | 0.000037 | 8.800000 | 0.000000 | 525.000000 | 514.000000 | 536.000000 | 0.000000 | 525.000000 | 514.000000 | 536.000000 | 0.037451 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 36 | 0.000065 | 4.100000 | 5.798275 | 874.000000 | 681.000000 | 1067.000000 | 0.000000 | 874.000000 | 681.000000 | 1067.000000 | 0.064973 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 2.100000 | 5.798275 |
| 37 | 0.000038 | 8.200000 | 0.000000 | 498.500000 | 498.000000 | 499.000000 | 0.000000 | 498.500000 | 498.000000 | 499.000000 | 0.038214 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 38 | 0.000059 | 8.400000 | 0.000000 | 480.000000 | 480.000000 | 480.000000 | 0.000000 | 480.000000 | 480.000000 | 480.000000 | 0.058645 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 39 | 0.000037 | 8.400000 | 0.000000 | 465.500000 | 454.000000 | 477.000000 | 0.000000 | 465.500000 | 454.000000 | 477.000000 | 0.036811 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 40 | 0.000076 | 8.400000 | 0.000000 | 523.500000 | 501.000000 | 546.000000 | 0.000000 | 523.500000 | 501.000000 | 546.000000 | 0.076175 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 41 | 0.000055 | 8.400000 | 0.000000 | 483.500000 | 466.000000 | 501.000000 | 0.000000 | 483.500000 | 466.000000 | 501.000000 | 0.055355 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 42 | 0.000089 | 20.600000 | 16.122034 | 717.000000 | 534.000000 | 900.000000 | 0.000000 | 717.000000 | 534.000000 | 900.000000 | 0.088732 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 18.600000 | 16.122034 |
| 43 | 0.000069 | 8.400000 | 0.000000 | 493.000000 | 475.000000 | 511.000000 | 0.000000 | 493.000000 | 475.000000 | 511.000000 | 0.069349 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 44 | 0.000078 | 8.000000 | 0.000000 | 530.500000 | 480.000000 | 581.000000 | 0.000000 | 530.500000 | 480.000000 | 581.000000 | 0.078002 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.000000 | 0.000000 |
| 45 | 0.000084 | 7.800000 | 0.000000 | 538.000000 | 508.000000 | 568.000000 | 0.000000 | 538.000000 | 508.000000 | 568.000000 | 0.083714 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 5.800000 | 0.000000 |
| 46 | 0.000075 | 8.600000 | 0.000000 | 519.500000 | 505.000000 | 534.000000 | 0.000000 | 519.500000 | 505.000000 | 534.000000 | 0.074936 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.600000 | 0.000000 |
| 47 | 0.000083 | 9.200000 | 0.000000 | 517.000000 | 502.000000 | 532.000000 | 0.000000 | 517.000000 | 502.000000 | 532.000000 | 0.082779 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 7.200000 | 0.000000 |
| 48 | 0.000078 | 8.400000 | 0.000000 | 458.000000 | 453.000000 | 463.000000 | 0.000000 | 458.000000 | 453.000000 | 463.000000 | 0.077523 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 49 | 0.000073 | 8.600000 | 0.000000 | 518.500000 | 513.000000 | 524.000000 | 0.000000 | 518.500000 | 513.000000 | 524.000000 | 0.072532 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.600000 | 0.000000 |
| 50 | 0.000074 | 9.600000 | 0.000000 | 525.500000 | 509.000000 | 542.000000 | 0.000000 | 525.500000 | 509.000000 | 542.000000 | 0.074391 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 7.600000 | 0.000000 |
| 51 | 0.000095 | 8.200000 | 0.000000 | 492.000000 | 479.000000 | 505.000000 | 0.000000 | 492.000000 | 479.000000 | 505.000000 | 0.095147 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 52 | 0.000078 | 8.200000 | 0.000000 | 618.000000 | 477.000000 | 759.000000 | 0.000000 | 618.000000 | 477.000000 | 759.000000 | 0.078401 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 53 | 0.000079 | 8.800000 | 0.000000 | 480.500000 | 469.000000 | 492.000000 | 0.000000 | 480.500000 | 469.000000 | 492.000000 | 0.078817 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 54 | 0.000082 | 9.200000 | 0.000000 | 721.500000 | 680.000000 | 763.000000 | 0.000000 | 721.500000 | 680.000000 | 763.000000 | 0.082042 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 7.200000 | 0.000000 |
| 55 | 0.000083 | 8.400000 | 0.000000 | 459.500000 | 425.000000 | 494.000000 | 0.000000 | 459.500000 | 425.000000 | 494.000000 | 0.083010 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.400000 | 0.000000 |
| 56 | 0.000063 | 8.200000 | 0.000000 | 572.000000 | 474.000000 | 670.000000 | 0.000000 | 572.000000 | 474.000000 | 670.000000 | 0.062790 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 57 | 0.000073 | 9.600000 | 0.000000 | 502.500000 | 469.000000 | 536.000000 | 0.000000 | 502.500000 | 469.000000 | 536.000000 | 0.072558 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 7.600000 | 0.000000 |
| 58 | 0.000072 | 8.200000 | 0.000000 | 655.000000 | 538.000000 | 772.000000 | 0.000000 | 655.000000 | 538.000000 | 772.000000 | 0.072410 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.200000 | 0.000000 |
| 59 | 0.000089 | 8.800000 | 0.000000 | 454.000000 | 435.000000 | 473.000000 | 0.000000 | 454.000000 | 435.000000 | 473.000000 | 0.088981 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 6.800000 | 0.000000 |
| 60 | 0.000095 | 4.300000 | 6.081119 | 641.000000 | 542.000000 | 740.000000 | 0.000000 | 641.000000 | 542.000000 | 740.000000 | 0.094931 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 2.300000 | 6.081119 |

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + "\n", + " # Che...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " rows = [0, 1, 2, 3, 4, 5, 6, 7, 8]\n", + " cols = [0, 1, 2, 3, 4, 5, 6, 7, 8]\n", + "\n", + " def is_safe(b, r, c, num):\n", + " # Check row\n", + " for j in range(9):\n", + " i...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m 
│\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for j in range(9):\n", + " if b[r][j] == num:\n", + " return False\n", + " \n", + " # Check column...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m 
\u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ 
\u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for j in range(9):\n", + " if b[r][j] == num:\n", + " return False\n", + " \n", + " # Check column\n", + " for i in range(9):\n", + " if b[i][c] == num:\n", + " return False\n", + " \n", + " # Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " if b[start_row + i][start_col + j] == num:\n", + " return False\n", + " \n", + " return True\n", + "\n", + " def find_next_move():\n", + " for r in range(9):\n", + " for c in range(9):\n", + " # Only consider placing numbers in cells that are empty in both initial and current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid(board, r, c, num):\n", + " return (r, c, num)\n", + " return None\n", + "\n", + " result = find_next_move()\n", + " if result:\n", + " return result\n", + " else:\n", + " # No immediate deterministic move found based on this simple strategy\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ 
\u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m 
│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + " \n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m 
\u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + "\n", + " # Che...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ 
\u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 6, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if b[r][col] == num:\n", + " return False\n", + " \n", + " # Check co...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m 
│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def get_possible_values(r, c):\n", + " if board[r][c] != 0:\n", + " return []\n", + "\n", + " used_values = set()\n", + "\n", + " # Check row\n", + " for col in range(9):\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ 
\u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " def get_possible_values(r, c):\n", + " if board[r][c] != 0:\n", + " return []\n", + "\n", + " used = set()\n", + "\n", + " # Check row\n", + " for col in range(9):\n", + " if board[r][col] != 0:\n", + " used.add(board[r][col])\n", + "\n", + " # Check column\n", + " for row in range(9):\n", + " if board[row][c] != 0:\n", + " used.add(board[row][c])\n", + "\n", + " # Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " current_row = start_row + i\n", + " current_col = start_col + j\n", + " if board[current_row][current_col] != 0:\n", + " used.add(board[current_row][current_col])\n", + "\n", + " possible = []\n", + " for num in range(1, 10):\n", + " if num not in used:\n", + " possible.append(num)\n", + " return 
possible\n", + "\n", + " # Find the first empty cell that is also empty in the initial board\n", + " for r in range(9):\n", + " for c in range(9):\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " possible_values = get_possible_values(r, c)\n", + " if possible_values:\n", + " # Return the first possible move found\n", + " return (r, c, possible_values[0])\n", + " \n", + " # If no move is found (puzzle might be full or unsolvable with this simple strategy)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ 
\u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the top-left coordinates (start_row, start_col) of the 3x3 box containing (r, c).\"\"\"\n", + " start_row = (r //...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 22, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_safe(b, r, c, num):\n", + " # Check 
row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + "\n", + " # Chec...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_safe(b, r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and b[r][col] == num:\n", + " return False\n", + "\n", + " # Chec...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " rows = 9\n", + " cols = 9\n", + "\n", + " def get_box_indices(r, c):\n", + " \"\"\"Returns the starting row and column indices of the 3x3 box containing (r, c).\"\"\"\n", + " start_row = ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m 
\u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " R = 9\n", + " C = 9\n", + "\n", + " def get_box_indices(r, c):\n", + " \"\"\"Returns the top-left coordinates of the 3x3 box containing (r, c).\"\"\"\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " return start_row, start_col\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " \"\"\"Checks if placing 'num' at (r, c) is valid based on current board state.\"\"\"\n", + " \n", + " # Check row\n", + " for col in range(C):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + " \n", + " # Check column\n", + " for row in range(R):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + " \n", + " # Check 3x3 box\n", + " start_row, start_col = get_box_indices(r, c)\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + " \n", + " return True\n", + "\n", + " # 1. Identify candidate cells (where both board and initial are 0)\n", + " empty_cells = []\n", + " for r in range(R):\n", + " for c in range(C):\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " empty_cells.append((r, c))\n", + "\n", + " # 2. Iterate through empty cells and test numbers 1-9\n", + " for r, c in empty_cells:\n", + " possible_numbers = []\n", + " \n", + " # Determine which numbers (1-9) are possible for this cell (r, c)\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " possible_numbers.append(num)\n", + "\n", + " # 3. 
Check for Naked Single (only one possibility)\n", + " if len(possible_numbers) == 1:\n", + " number = possible_numbers[0]\n", + " return (r, c, number)\n", + "\n", + " # If no immediate single solution is found, return None or an arbitrary empty move\n", + " # Based on the prompt, we must return a tuple (row, col, number). \n", + " # If no move is found, we return an arbitrary valid empty cell and 0, \n", + " # though ideally, a real solver would backtrack or use more complex heuristics.\n", + " # For this constrained requirement, we return the first empty cell found with 0.\n", + " for r, c in empty_cells:\n", + " return (r, c, 0) \n", + " \n", + " # Should not be reached if empty_cells is not empty, but defensive return\n", + " return (-1, -1, 0)\n", + "============================================================\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ 
\u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " # Helper function to check if placing 'num' at (r, c) is valid\n", + " def is_safe(b, r, c, num):\n", + " # 1. 
Check row\n", + " for j in range(9):\n", + " if j != c ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of possible valid numbers for cell (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return set()\n", + "\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m 
\u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " # Helper function to check if placing 'num' at (r, c) is valid based on the current board state\n", + " def is_valid(r, c, num, current_board):\n", + " # Check row\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m 
\u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + 
"└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " R = 9\n", + " C = 9\n", + "\n", + " def is_safe(r, c, num, current_board):\n", + " # Check row\n", + " for col in range(C):\n", + " if col != c and current_board[r][col] == num:...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_numbers(r, c):\n", + " \"\"\"Determines which numbers (1-9) can legally be placed at board[r][c].\"\"\"\n", + " if board[r][c] != 0:\n", + " return []\n", + "\n", + " used_numbers = []\n", + "\n", + " # Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] != 0:\n", + " used_numbers.append(board[r][col])\n", + "\n", + " # Check Column\n", + " for row in range(N):\n", + " if row != r and board[row][c] != 0:\n", + " used_numbers.append(board[row][c])\n", + "\n", + " # Check 3x3 Box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] != 0:\n", + " used_numbers.append(board[curr_r][curr_c])\n", + "\n", + " # Find numbers from 1 to 9 that are not in used_numbers\n", + " possible = []\n", + " for num in range(1, 10):\n", + " is_used = False\n", + " for used in used_numbers:\n", + " if num == used:\n", + " is_used = True\n", + " 
break\n", + " if not is_used:\n", + " possible.append(num)\n", + " \n", + " return possible\n", + "\n", + " # Iterate through all cells to find a Naked Single candidate\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in the current board AND were empty initially\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " \n", + " possible = get_possible_numbers(r, c)\n", + " \n", + " # If exactly one number can be placed, we found our move\n", + " if len(possible) == 1:\n", + " number = possible[0]\n", + " return (r, c, number)\n", + "\n", + " # If no Naked Single move is found, return None (or an arbitrary value, \n", + " # but None is clearer for \"no immediate move found\")\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 40, Final state: success\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ 
\u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(9):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m 
\u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 9, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_numbers(r, c):\n", + " if board[r][c] != 0:\n", + " return []\n", + "\n", + " used_in_row = set()\n", + " for val in board[r]:\n", + " if...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + 
"├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", 
+ "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + " \n", + " return True\n", + "\n", + " # Iterate through every cell to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_safe(r, c, num):\n", + " # Found the first valid move based on simple checking\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable by this simple check)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 40, Final state: success\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of numbers (1-9) that can legally be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ 
\u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 10, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Calculates the top-left coordinates of the 3x3 box containing (r, c).\"\"\"\n", + " start_row = (r // 3) * 3\n", + " star...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ 
\u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + " \n", + " # Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + " \n", + " # Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and 
board[curr_r][curr_c] == num:\n", + " return False\n", + " \n", + " return True\n", + "\n", + " # 1. Find all empty cells that need to be filled (board[r][c] == 0)\n", + " empty_cells = []\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " empty_cells.append((r, c))\n", + "\n", + " # 2. Iterate through empty cells and check for Naked Singles\n", + " for r, c in empty_cells:\n", + " possible_numbers = []\n", + " \n", + " # Determine which numbers (1-9) are possible for this cell (r, c)\n", + " for num in range(1, 10):\n", + " if is_safe(r, c, num):\n", + " possible_numbers.append(num)\n", + " \n", + " # If exactly one number is possible, we found a Naked Single\n", + " if len(possible_numbers) == 1:\n", + " return (r, c, possible_numbers[0])\n", + "\n", + " # If no immediate Naked Single is found, return None (or a default failure state)\n", + " # Since the prompt implies a move should be returned, if no single move is found, \n", + " # we return None, though in a real solver, backtracking or deeper heuristics would follow.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 40, Final state: success\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + 
" if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ 
\u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Calculates the set of valid candidates (1-9) for cell (r, c) based on current board state.\"\"\"\n", + " candidates = []\n", + "...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m 
\u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Determines possible candidates for cell (r, c) based on current board state.\"\"\"\n", + " if board[r][c] != 0:\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m 
│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. Check row\n", + " for col_idx in range(N):\n", + " if col_idx != c and board[r][col_idx] == num:\n", + " r...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + 
"│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Determines the set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return set() # Already filled\n", + "\n", + " possible = set(range(1, 10))\n", + "\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] != 0 and board[r][col] in possible:\n", + " possible.discard(board[r][col])\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] != 0 and board[row][c] in possible:\n", + " possible.discard(board[row][c])\n", + "\n", + " # 3. Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r, curr_c = start_row + i, start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] != 0 and board[curr_r][curr_c] in possible:\n", + " possible.discard(board[curr_r][curr_c])\n", + "\n", + " return possible\n", + "\n", + " # 1. 
Find all candidate empty cells (where initial and board are 0)\n", + " candidate_cells = []\n", + " for r in range(N):\n", + " for c in range(N):\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " candidate_cells.append((r, c))\n", + "\n", + " if not candidate_cells:\n", + " return None # No moves possible\n", + "\n", + " best_score = 10 # Max possible options is 9\n", + " best_move = None\n", + "\n", + " # 2. Evaluate each candidate cell\n", + " for r, c in candidate_cells:\n", + " possible_values = get_possible_values(r, c)\n", + " num_options = len(possible_values)\n", + "\n", + " # We prioritize cells with fewer options (MRV heuristic)\n", + " if num_options < best_score:\n", + " best_score = num_options\n", + " best_move = (r, c, list(possible_values)[0]) # Pick the first available option\n", + "\n", + " # If we find a cell with 0 options, it means the current board state is invalid\n", + " # based on the initial constraints, but we still return the best found so far.\n", + " if best_score == 0:\n", + " break\n", + "\n", + " if best_move:\n", + " return best_move\n", + " else:\n", + " # This case should ideally not be reached if candidate_cells is not empty,\n", + " # but as a fallback, return None.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 40, Final state: success\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m 
\u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def 
is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m 
│\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ 
\u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through every cell to find the first possible move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider placing a number if the cell is currently empty (0)\n", + " if board[r][c] == 0:\n", + " # Determine what numbers are allowed in this cell based on the initial puzzle constraints\n", + " # (Although the prompt implies we only place numbers in cells that are empty in BOTH,\n", + " # we must ensure we only test numbers 1-9)\n", + " \n", + " for num in range(1, 10):\n", + " # Check if placing 'num' at (r, c) is valid according to Sudoku rules\n", + " if is_safe(r, c, num):\n", + " # Found the first valid move\n", + " return (r, c, num)\n", + "\n", + " # If no move is found, return None (or handle as per specific requirements, \n", + " # but for this prompt, we assume a move is usually found or we return a default/failure state)\n", + " return (None, None, None)\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m 
│\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_valid_candidates(r, c):\n", + " \"\"\"Returns a set of numbers (1-9) that can legally be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ 
\u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m 
│\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ 
\u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through the board to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " # Try placing numbers 1 through 9\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable by this simple heuristic)\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m 
│\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m 
\u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 28, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # 1. Find the first empty cell (r, c) where both board and initial are 0\n", + " for r in range(N):\n", + " for c in range(N):\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " # Found an empty cell. Now find the smallest valid number.\n", + " for num in range(1, 10):\n", + " if is_safe(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or no empty cells matching criteria)\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m 
\u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " # Helper to check if a number 'num' is valid at (r, c) based on current 'board' state\n", + " def is_valid(r, c, num):\n", + " # 1. 
Check Row\n", + " for col i...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m 
\u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return Fa...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final 
state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ 
\u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check Column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 Box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (the board is full or unsolvable from this state)\n", + " # We return None or an arbitrary marker if no move exists.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m 
\u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m 
│\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row constraint\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m 
\u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m 
│\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable based on this simple search), return None or similar indication.\n", + " # Since the prompt implies a move *will* be found if the board isn't complete, we return None if stuck.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m 
│\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check Column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 Box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (puzzle is full or unsolvable from this state)\n", + " # In a real solver, this would indicate failure or completion.\n", + " # For this specific function signature, we might return None or a default if no move exists.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m 
\u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m 
│\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m 
\u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m 
│\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return set()\n", + "\n", + " possible = set(range(1, 10))\n", + "\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] != 0 and board[r][col] in possible:\n", + " possible.discard(board[r][col])\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] != 0 and board[row][c] in possible:\n", + " possible.discard(board[row][c])\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] != 0 and board[curr_r][curr_c] in possible:\n", + " possible.discard(board[curr_r][curr_c])\n", + "\n", + " return possible\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Check placement condition: Must be empty in both board and initial\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " \n", + " possible_values = get_possible_values(r, c)\n", + " \n", + " if possible_values:\n", + " # Found a valid move. Return the first one found (smallest number first for determinism)\n", + " # Since we only need *a* next move, we can sort and take the first.\n", + " \n", + " # To ensure a deterministic output, we sort the possibilities.\n", + " sorted_values = sorted(list(possible_values))\n", + " \n", + " return (r, c, sorted_values[0])\n", + "\n", + " # If no moves are found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1) # Sentinel for no move found\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 0, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m 
│\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m 
│\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable by this strategy)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m 
│\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m 
\u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m 
\u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_row = (r // 3) * 3\n", + " start_col = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_row + i\n", + " curr_c = start_col + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Condition: Must be empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m 
│\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the starting (top-left) coordinates (box_r, box_c) for the 3x3 box containing (r, c).\"\"\"\n", + " box_r = (r //...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m 
\u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m 
│\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Check placement constraints: must be empty in both current board and initial\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " # Try placing numbers 1 through 9\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable by this simple heuristic)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ 
\u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 29, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m 
│\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 29, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Find all candidate empty cells (where initial and board are both 0)\n", + " candidates = []\n", + " for r in range(N):\n", + " for c in range(N):\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " candidates.append((r, c))\n", + "\n", + " # Iterate through candidates and find the first valid move\n", + " for r, c in candidates:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (the board is full or stuck), return None or handle as necessary.\n", + " # Based on the prompt, we assume a move should be returned if possible.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m 
\u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m 
│\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m 
│\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m 
\u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m 
│\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m 
\u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m 
\u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m 
│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m 
│\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or stuck), return None or handle as appropriate.\n", + " # Since the prompt implies a move can be found if the state is solvable, \n", + " # we return None if nothing is found.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m 
\u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ 
\u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 36, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the starting (top-left) coordinates (r_start, c_start) of the 3x3 box containing (r, c).\"\"\"\n", + " r_start = ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m 
\u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m 
│\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return set()\n", + "\n", + " possible = set(range(1, 10))\n", + "\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] != 0 and board[r][col] in possible:\n", + " possible.discard(board[r][col])\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] != 0 and board[row][c] in possible:\n", + " possible.discard(board[row][c])\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] != 0 and board[curr_r][curr_c] in possible:\n", + " possible.discard(board[curr_r][curr_c])\n", + "\n", + " return possible\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers if the cell is empty in both board and initial\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " \n", + " possible_moves = get_possible_values(r, c)\n", + " \n", + " if possible_moves:\n", + " # Found a valid move. Return the first one found (smallest number first for determinism)\n", + " \n", + " # Since we cannot use sorted() easily without import, we iterate 1 to 9\n", + " for num in range(1, 10):\n", + " if num in possible_moves:\n", + " return (r, c, num)\n", + " \n", + " # If no move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m 
│\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for j in range(N):\n", + " if j != c and board[r][j] == num:\n", + " return F...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m 
\u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", 
+ "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m 
\u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or stuck), return None or handle as required.\n", + " # Based on the prompt, we assume a move exists if the puzzle isn't solved.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m 
\u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", 
+ "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 0, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of valid candidates for cell (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return set()\n", + "\n", + " c...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", 
+ "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " # Try placing numbers 1 through 9\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable by this heuristic)\n", + " return (-1, -1, -1) # Indicate no immediate move found\n", + "============================================================\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m 
│\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 35, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the (start_r, start_c) of the 3x3 box containing (r, c).\"\"\"\n", + " start_r = (r // 3) * 3\n", + " start_c = (...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 35, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m 
│\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 28, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m 
\u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Check placement constraints:\n", + " # 1. Must be empty in the current board\n", + " # 2. Must be empty in the initial puzzle (fixed numbers cannot be overwritten)\n", + " if board[r][c] == 0 and initial[r][c] == 0:\n", + " # Try placing numbers 1 through 9\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or blocked), return None or handle as appropriate.\n", + " # Based on the prompt, we assume a move should be found if the puzzle isn't solved.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 28, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + 
"├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m 
│\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m 
│\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a list of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m 
\u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no valid move is found (puzzle is solved or stuck), return None or handle as appropriate.\n", + " # Based on the prompt, we assume a move *can* be found if the puzzle isn't solved.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ 
\u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 38, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_possible_values(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ 
\u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 31, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m 
│\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 35, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the starting (top-left) coordinates (start_r, start_c) of the 3x3 box containing (r, c).\"\"\"\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " return start_r, start_c\n", + "\n", + " # 1. Identify all potential empty cells\n", + " potential_moves = []\n", + " for r in range(N):\n", + " for c in range(N):\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " potential_moves.append((r, c))\n", + "\n", + " # 2. 
Iterate through potential moves and check validity for numbers 1 through 9\n", + " for r, c in potential_moves:\n", + " for num in range(1, 10):\n", + " is_valid = True\n", + "\n", + " # Check Row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " is_valid = False\n", + " break\n", + " if not is_valid:\n", + " continue\n", + "\n", + " # Check Column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " is_valid = False\n", + " break\n", + " if not is_valid:\n", + " continue\n", + "\n", + " # Check 3x3 Box\n", + " start_r, start_c = get_box_coords(r, c)\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " is_valid = False\n", + " break\n", + " if not is_valid:\n", + " break\n", + " \n", + " if is_valid:\n", + " # Found the first valid move\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is solved or blocked), return None or handle as per requirement.\n", + " # Since the prompt implies a move should be returned if possible, we return None if stuck.\n", + " return None\n", + "============================================================\n", + "\n", + " Valid moves: 35, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m │ 
\u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m 
│\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m 
\u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 37, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m 
│\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 37, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m 
\u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the starting (r_start, c_start) coordinates of the 3x3 box containing (r, c).\"\"\"\n", + " box_r_start = (r // 3) * 3\n", + " box_c_start = (c // 3) * 3\n", + " return box_r_start, box_c_start\n", + "\n", + " def is_safe(r, c, num):\n", + " \"\"\"Checks if placing 'num' at (r, c) is valid according to Sudoku rules.\"\"\"\n", + "\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. 
Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. Check 3x3 box\n", + " r_start, c_start = get_box_coords(r, c)\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = r_start + i\n", + " curr_c = c_start + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Constraint: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_safe(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " retur...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m 
\u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m 
\u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ 
\u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m 
│\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m 
\u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_box_coords(r, c):\n", + " \"\"\"Returns the starting (top-left) coordinates (box_r, box_c) for the 3x3 box containing (r, c).\"\"\"\n", + " box_r = (r // 3) * 3\n", + " box_c = (c // 3) * 3\n", + " return box_r, box_c\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " \"\"\"Checks if placing 'num' at (r, c) is valid according to Sudoku rules.\"\"\"\n", + "\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. 
Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. Check 3x3 box\n", + " box_r, box_c = get_box_coords(r, c)\n", + " for i in range(box_r, box_r + 3):\n", + " for j in range(box_c, box_c + 3):\n", + " if (i != r or j != c) and board[i][j] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Rule: Only place numbers in cells that are BOTH empty in initial AND empty in board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is full or unsolvable from this state)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m 
\u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m 
│\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m 
│\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Iterate through all cells to find the first valid move\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " return (r, c, num)\n", + "\n", + " # If no move is found (puzzle is solved or stuck for this strategy iteration)\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 30, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m7\u001b[0m │\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m 
│\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " retur...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m 
\u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 32, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_safe(r, c, num):\n", + " # 1. 
Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m6\u001b[0m 
\u001b[38;5;226m3\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " retur...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + "============================================================\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " return False\n", + "\n", + " # 2. Check column\n", + " for row in range(N):\n", + " if row != r and board[row][c] == num:\n", + " return False\n", + "\n", + " # 3. 
Check 3x3 box\n", + " start_r = (r // 3) * 3\n", + " start_c = (c // 3) * 3\n", + " for i in range(3):\n", + " for j in range(3):\n", + " curr_r = start_r + i\n", + " curr_c = start_c + j\n", + " if (curr_r != r or curr_c != c) and board[curr_r][curr_c] == num:\n", + " return False\n", + "\n", + " return True\n", + "\n", + " # Find all valid cells to try\n", + " possible_moves = []\n", + " for r in range(N):\n", + " for c in range(N):\n", + " # Only consider cells that are empty in both the initial puzzle and the current board\n", + " if initial[r][c] == 0 and board[r][c] == 0:\n", + " for num in range(1, 10):\n", + " if is_valid_placement(r, c, num):\n", + " possible_moves.append((r, c, num))\n", + " # Optimization: Since we only need *one* next move, we can break early\n", + " # after finding the first valid move for this specific (r, c)\n", + " break\n", + "\n", + " if possible_moves:\n", + " # Return the first valid move found (based on iteration order: row-major)\n", + " return possible_moves[0]\n", + " else:\n", + " # If no immediate move is found, return None or handle as appropriate.\n", + " # Based on the prompt, we must return (row, col, number). If no move exists,\n", + " # we return a sentinel value, though typically in a solver, this means the puzzle is stuck\n", + " # or needs backtracking. 
We will return (-1, -1, -1) to signify no move found.\n", + " return (-1, -1, -1)\n", + "============================================================\n", + "\n", + " Valid moves: 34, Final state: failed\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m 
\u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 33, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " retur...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m 
\u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 40, Final state: success\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " retur...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m6\u001b[0m │ \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;226m7\u001b[0m │\n", + "│ 
\u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │ \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m9\u001b[0m │\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m1\u001b[0m │\n", + "│ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;226m8\u001b[0m │\n", + "│ \u001b[38;5;45m5\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │\n", + "│ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m6\u001b[0m 
\u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m4\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 29, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def is_valid_placement(r, c, num):\n", + " # 1. Check row\n", + " for col in range(N):\n", + " if col != c and board[r][col] == num:\n", + " re...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m 
\u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n", + "\n", + " Valid moves: 29, Final state: failed\n", + "Strategy:\n", + "def strategy(board, initial):\n", + " N = 9\n", + "\n", + " def get_candidates(r, c):\n", + " \"\"\"Returns a set of valid numbers (1-9) that can be placed at (r, c).\"\"\"\n", + " if board[r][c] != 0:\n", + " return ...\n", + "\n", + "Final board:\n", + "┌───────┬───────┬───────┐\n", + "│ \u001b[38;5;226m3\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m7\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m5\u001b[0m │ \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m9\u001b[0m │\n", + "│ \u001b[38;5;226m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;45m3\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m7\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;226m5\u001b[0m │\n", + "│ \u001b[38;5;226m5\u001b[0m \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m7\u001b[0m │ \u001b[38;5;226m1\u001b[0m \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m 
│\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m2\u001b[0m \u001b[38;5;226m3\u001b[0m \u001b[38;5;226m5\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m4\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m1\u001b[0m │\n", + "│ \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;45m4\u001b[0m │ \u001b[38;5;45m1\u001b[0m \u001b[38;5;226m2\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;45m9\u001b[0m \u001b[38;5;45m8\u001b[0m \u001b[38;5;45m6\u001b[0m │\n", + "│ \u001b[38;5;45m8\u001b[0m \u001b[38;5;226m9\u001b[0m \u001b[38;5;45m1\u001b[0m │ \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m3\u001b[0m \u001b[38;5;45m6\u001b[0m │ \u001b[38;5;226m4\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;45m2\u001b[0m │\n", + "├───────┼───────┼───────┤\n", + "│ \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;239m.\u001b[0m │ \u001b[38;5;226m6\u001b[0m \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m3\u001b[0m │ \u001b[38;5;45m2\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │\n", + "│ \u001b[38;5;226m6\u001b[0m \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m2\u001b[0m │ \u001b[38;5;226m8\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;226m1\u001b[0m │ \u001b[38;5;45m5\u001b[0m \u001b[38;5;45m9\u001b[0m \u001b[38;5;226m4\u001b[0m │\n", + "│ \u001b[38;5;45m4\u001b[0m \u001b[38;5;45m1\u001b[0m \u001b[38;5;45m8\u001b[0m │ \u001b[38;5;239m.\u001b[0m \u001b[38;5;226m5\u001b[0m \u001b[38;5;45m9\u001b[0m │ \u001b[38;5;45m6\u001b[0m \u001b[38;5;45m7\u001b[0m \u001b[38;5;239m.\u001b[0m │\n", + "└───────┴───────┴───────┘\n" + ] + } + ], + "source": [ + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mPCjD6Tdnqi8" + }, + "source": [ + "And now with the LoRA we just trained with GRPO - we first save the LoRA first!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MPjc98y_nrTF" + }, + "outputs": [], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eC0MBN8AnvJb" + }, + "source": [ + "Verify that the LoRA was actually trained!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fFBpVx6SnytW" + }, + "outputs": [], + "source": [ + "from safetensors import safe_open\n", + "\n", + "tensors = {}\n", + "# Read the adapter from the directory it was saved to in the cell above\n", + "with safe_open(\"gemma_4_lora/adapter_model.safetensors\", framework = \"pt\") as f:\n", + "    # Verify both A and B are non zero\n", + "    for key in f.keys():\n", + "        tensor = f.get_tensor(key)\n", + "        # Fraction of entries that are exactly zero (1.0 means the tensor is all zeros)\n", + "        zero_fraction = (tensor == 0).sum() / tensor.numel()\n", + "        assert zero_fraction.item() != 1.0" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tlaUdxC_VHpz" + }, + "source": [ + "\n", + "# Inference\n", + "Now let's try the model we just trained!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8BZZHOKiF9Ct" + }, + "outputs": [], + "source": [ + "text = tokenizer.apply_chat_template(\n", + "    [{\"role\": \"user\", \"content\": prompt.strip()}],\n", + "    tokenize = False,\n", + "    add_generation_prompt = True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "_ = model.generate(\n", + "    **tokenizer(images = None, text = text, return_tensors = \"pt\").to(\"cuda\"),\n", + "    temperature = 1.0,\n", + "    max_new_tokens = 512,\n", + "    streamer = TextStreamer(tokenizer, skip_prompt = False),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-NUEmHFSYNTp" + }, + "source": [ + "\n", + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. 
Use `push_to_hub_merged` to upload to your Hugging Face account! You can create a personal access token at https://huggingface.co/settings/tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NjXGTkp7YNtB" + }, + "outputs": [], + "source": [ + "# Merge to 16bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_16bit\", tokenizer, save_method = \"merged_16bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Merge to 4bit\n", + "if False: model.save_pretrained_merged(\"gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\",)\n", + "if False: model.push_to_hub_merged(\"HF_USERNAME/gemma_4_finetune_4bit\", tokenizer, save_method = \"merged_4bit\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Just LoRA adapters\n", + "if False:\n", + "    model.save_pretrained(\"gemma_4_lora\")\n", + "    tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "if False:\n", + "    model.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")\n", + "    tokenizer.push_to_hub(\"HF_USERNAME/gemma_4_lora\", token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_bBqsonjoD7Z" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "We now support saving to `GGUF` / `llama.cpp` natively! We clone `llama.cpp` and save to `q8_0` by default. We allow all methods, such as `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", + "\n", + "Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)):\n", + "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", + "* `q4_k_m` - Recommended. 
Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", + "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n", + "\n", + "[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TV0C6l7SoGDr" + }, + "outputs": [], + "source": [ + "# Save to 8bit Q8_0\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer,)\n", + "# Remember to go to https://huggingface.co/settings/tokens for a token!\n", + "# And change HF_USERNAME to your username!\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to 16bit GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"f16\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"f16\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to q4_k_m GGUF\n", + "if False: model.save_pretrained_gguf(\"gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\")\n", + "if False: model.push_to_hub_gguf(\"HF_USERNAME/gemma_4_finetune\", tokenizer, quantization_method = \"q4_k_m\", token = \"YOUR_HF_TOKEN\")\n", + "\n", + "# Save to multiple GGUF options - much faster if you want multiple!\n", + "if False:\n", + "    model.push_to_hub_gguf(\n", + "        \"HF_USERNAME/gemma_4_finetune\", # Change HF_USERNAME to your username!\n", + "        tokenizer,\n", + "        quantization_method = [\"q4_k_m\", \"q8_0\", \"q5_k_m\",],\n", + "        token = \"YOUR_HF_TOKEN\",\n", + "    )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tLtYuKEioJj9" + }, + "source": [ + "Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're 
done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM developments, feel free to join our [Discord](https://discord.gg/unsloth)!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, continued pretraining, conversational finetuning and more in our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "022dbbd06beb434b8cde47f13d1d9f3b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0b4ac68cb5ea43f8a2c196c7f24bb349": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0c11138badda42f1b3681626bcbe6ed0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0d90074b1b56432a8a8af05cadd57eea": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b44c516743f944559b21e963b6e559b9", + "IPY_MODEL_e5e07f71eee54e828dcb7b9dc5305b56", + 
"IPY_MODEL_ceca217e98384421934db2fd81c78b08" + ], + "layout": "IPY_MODEL_74783be6311840b58560b8ae249292c8" + } + }, + "0dd65ad33c184ebaa5ccaef977f34e83": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "101a1cf35bb24d79a6e8b4ce77eb5bf7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "152d919539764660b1223a911e580fa3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "157505960e3f4d04be6bdeae7119b4a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": 
"ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1e6181166ca0420eb9dce94d9ca0eceb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2161c02b9b984a47aa4539b5e3625ed5", + "IPY_MODEL_55e46a3096a4438d9663c3b5a0b75f34", + "IPY_MODEL_9825b277c4ca49a891c01739f2feaae0" + ], + "layout": "IPY_MODEL_284e45208cdd471ea6c361e5c9599481" + } + }, + "2161c02b9b984a47aa4539b5e3625ed5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2ca2adf3b2924431aba041483ad76c0e", + "placeholder": "​", + "style": "IPY_MODEL_3e6969868d724975bb067c1f962cbec6", + "value": "chat_template.jinja: " + } + }, + "284e45208cdd471ea6c361e5c9599481": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29394f0ef562422993d43a78c93eb1f1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2ca2adf3b2924431aba041483ad76c0e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, 
+ "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2eb4215e9ef84c9b8ad121f0f6f531d0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0dd65ad33c184ebaa5ccaef977f34e83", + "placeholder": "​", + "style": "IPY_MODEL_b34efc6222d3420e90c13f02b726e6b5", + "value": " 32.2M/32.2M [00:01<00:00, 160MB/s]" + } + }, + "2f67d427db03445ca24bc0f7a7a17cbd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": 
null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "31d9ffc210964cd5ab82dcf38cddbc5b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3d96500eae194cd3a6a9f92c39f8760e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, 
+ "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3e6969868d724975bb067c1f962cbec6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4e026a1dbae64137a45f7c2b114b3178": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + 
}, + "535adc052cf742b08633c3e05a9eeb52": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_101a1cf35bb24d79a6e8b4ce77eb5bf7", + "placeholder": "​", + "style": "IPY_MODEL_0b4ac68cb5ea43f8a2c196c7f24bb349", + "value": "model.safetensors: 100%" + } + }, + "55e46a3096a4438d9663c3b5a0b75f34": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_915b8f0bc15a416c8e31f163838db6b4", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6421eac0ecd14d4e9a7debd117178381", + "value": 1 + } + }, + "5693cc3536314347856a708cceeda383": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5c19e35c5f934891b5ce135843529a5b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c5fa4165c9f44af8acb5a35d53269863", + "max": 10246621918, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ce301e618b3e4a7f9c081174974f7548", + "value": 10246621918 + } + }, + "5f3cc4d26ee742bcb19dbfc9598080ee": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5fec6f60f4274b9e964c916197dd155e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + 
"model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "6230a33eb2ad43cdbd8999a1dea7bd4c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9539188d02584eb985835b4986935ced", + "placeholder": "​", + "style": "IPY_MODEL_ef49681d55954ce9876d0d1ce5d10fc9", + "value": "tokenizer.json: 100%" + } + }, + "6421eac0ecd14d4e9a7debd117178381": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "6761c9faa81040e1866b73ad40ca5e02": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + 
"6c41947826904b1db57cc32055f0eccb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c688bd93590c4270acb23f423ca85946", + "IPY_MODEL_fdf731aed6144145a41c873115a3af61", + "IPY_MODEL_a98e3900409d4bcebf9ddc25441bf56a" + ], + "layout": "IPY_MODEL_a6dc10a4fb0d474e819b3ecfa656b66a" + } + }, + "6dd46cd00a7c45c484187857a89c844f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6230a33eb2ad43cdbd8999a1dea7bd4c", + "IPY_MODEL_9d12ca85c4cb4c6d83cc70f0f4f7c527", + "IPY_MODEL_2eb4215e9ef84c9b8ad121f0f6f531d0" + ], + "layout": "IPY_MODEL_5693cc3536314347856a708cceeda383" + } + }, + "6ea2009e3bd442fba3e001bed472b00e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_98536a63b3c94c018a16a9997bda38cf", + "placeholder": "​", + "style": "IPY_MODEL_ae3e1b9c33964a94a405952e9120fd0a", + "value": " 
1.69k/? [00:00<00:00, 107kB/s]" + } + }, + "72a14d17820e446eb22e6261b97ce3e4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "743aa5ba0a8c41019cd0cbdb97b779df": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "74783be6311840b58560b8ae249292c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "75f48d69379947af82ddb10ce4fca453": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e29039073cc7474cb0fe704cc7a668b7", + "placeholder": "​", + "style": "IPY_MODEL_5f3cc4d26ee742bcb19dbfc9598080ee", + "value": "processor_config.json: " + } + }, + "7f2f1d7c247f45bb93ff417cb186c2eb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_535adc052cf742b08633c3e05a9eeb52", + "IPY_MODEL_5c19e35c5f934891b5ce135843529a5b", + "IPY_MODEL_cc934485e87748579a7b2176f8470dd0" + ], + "layout": "IPY_MODEL_dfdfcf6e8b564262bd28ff6eb2558190" + } + }, + "8a12ba8d6b9744468b652d460ce10295": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8bba15c72cdf4b9e995af58424d2021e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", 
+ "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f5f6ac53523743bbad0d09a8476c917c", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9805bc6b1149455ca9d6a09cf05992fe", + "value": 1 + } + }, + "913104b57a984d08af47073b60b2dc09": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e5c3875f5d7f43e1acb57da68a5810d6", + "IPY_MODEL_8bba15c72cdf4b9e995af58424d2021e", + "IPY_MODEL_a6b55e2f0fca456c9c9a2c8546e56595" + ], + "layout": "IPY_MODEL_a82f008a06a6408c8520831378f433ef" + } + }, + "915b8f0bc15a416c8e31f163838db6b4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + 
"min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "9273d234d79247fabcb5a2ace2ec118b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_75f48d69379947af82ddb10ce4fca453", + "IPY_MODEL_e2c30cdce58440b8afb1665297661883", + "IPY_MODEL_6ea2009e3bd442fba3e001bed472b00e" + ], + "layout": "IPY_MODEL_4e026a1dbae64137a45f7c2b114b3178" + } + }, + "9539188d02584eb985835b4986935ced": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9805bc6b1149455ca9d6a09cf05992fe": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9825b277c4ca49a891c01739f2feaae0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fb0e65319f7a400da1457acf2a7f4b38", + "placeholder": "​", + "style": "IPY_MODEL_ab9ee6b2f00a476a84b634404c573cba", + "value": " 16.3k/? 
[00:00<00:00, 922kB/s]" + } + }, + "98536a63b3c94c018a16a9997bda38cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "996e80a09e254708923e6e76e92c2554": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "9d12ca85c4cb4c6d83cc70f0f4f7c527": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e28e70aa5f834335ad7ea21450a25d32", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fdd2d273a3bb495e99ed707bc6783b5d", + "value": 32169626 + } + }, + "a0fab082e7364d3391d765cb785ed2db": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a6b55e2f0fca456c9c9a2c8546e56595": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_72a14d17820e446eb22e6261b97ce3e4", + "placeholder": "​", + "style": "IPY_MODEL_31d9ffc210964cd5ab82dcf38cddbc5b", + "value": " 19.4k/? [00:00<00:00, 1.76MB/s]" + } + }, + "a6dc10a4fb0d474e819b3ecfa656b66a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a82f008a06a6408c8520831378f433ef": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", 
+ "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a98e3900409d4bcebf9ddc25441bf56a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f0820395ae644b50988896600368588b", + "placeholder": "​", + "style": "IPY_MODEL_8a12ba8d6b9744468b652d460ce10295", + "value": " 1951/1951 [00:44<00:00, 214.19it/s]" + } + }, + "ab9ee6b2f00a476a84b634404c573cba": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "StyleView", + "description_width": "" + } + }, + "ae3e1b9c33964a94a405952e9120fd0a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b34efc6222d3420e90c13f02b726e6b5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b44c516743f944559b21e963b6e559b9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3d96500eae194cd3a6a9f92c39f8760e", + "placeholder": "​", + "style": "IPY_MODEL_6761c9faa81040e1866b73ad40ca5e02", + "value": "generation_config.json: 100%" + } + }, + "c5fa4165c9f44af8acb5a35d53269863": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c688bd93590c4270acb23f423ca85946": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0c11138badda42f1b3681626bcbe6ed0", + "placeholder": "​", + "style": "IPY_MODEL_cb84d4ce92124f94b4d7897cd3ee7fd4", + "value": "Loading weights: 100%" + } + }, + "c9dfc15b3f89476189e4d2347f504e2e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "StyleView", + "description_width": "" + } + }, + "cb84d4ce92124f94b4d7897cd3ee7fd4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cc934485e87748579a7b2176f8470dd0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2f67d427db03445ca24bc0f7a7a17cbd", + "placeholder": "​", + "style": "IPY_MODEL_c9dfc15b3f89476189e4d2347f504e2e", + "value": " 10.2G/10.2G [03:17<00:00, 176MB/s]" + } + }, + "ce301e618b3e4a7f9c081174974f7548": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ceca217e98384421934db2fd81c78b08": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ea70dad5f2aa48a2adab37c178f34baf", + "placeholder": "​", + "style": "IPY_MODEL_a0fab082e7364d3391d765cb785ed2db", + "value": " 208/208 [00:00<00:00, 22.2kB/s]" + } + }, + "d627022e756048e497691196783e374b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dfdfcf6e8b564262bd28ff6eb2558190": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": 
null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e28e70aa5f834335ad7ea21450a25d32": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e29039073cc7474cb0fe704cc7a668b7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e2c30cdce58440b8afb1665297661883": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_996e80a09e254708923e6e76e92c2554", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5fec6f60f4274b9e964c916197dd155e", + "value": 1 + } + }, + "e5c3875f5d7f43e1acb57da68a5810d6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_152d919539764660b1223a911e580fa3", + "placeholder": "​", + "style": "IPY_MODEL_29394f0ef562422993d43a78c93eb1f1", + "value": "tokenizer_config.json: " + } + }, + 
"e5e07f71eee54e828dcb7b9dc5305b56": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_022dbbd06beb434b8cde47f13d1d9f3b", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_157505960e3f4d04be6bdeae7119b4a8", + "value": 208 + } + }, + "ea70dad5f2aa48a2adab37c178f34baf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"ef49681d55954ce9876d0d1ce5d10fc9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f0820395ae644b50988896600368588b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f5f6ac53523743bbad0d09a8476c917c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "fb0e65319f7a400da1457acf2a7f4b38": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fdd2d273a3bb495e99ed707bc6783b5d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fdf731aed6144145a41c873115a3af61": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_743aa5ba0a8c41019cd0cbdb97b779df", + "max": 1951, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d627022e756048e497691196783e374b", + "value": 1951 + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Audio.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Audio.ipynb new file mode 100644 index 0000000..6cf604b --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Audio.ipynb @@ -0,0 +1,5742 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "4RQuEItXqjUN" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "STd3-4v9qjUO" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5r82a21yqjUP" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open-source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "
Unsloth Studio screenshots: train models with no code needed; run GGUF models on Mac, Windows & Linux.
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2pK4zPKsqjUP" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "csoGx5FzqjUQ" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "hhrWHOOFqjUQ" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380, + "referenced_widgets": [ + "f410af63026a45c99add96178f79d89f", + "53016ab2f4044590a70a539539943f0e", + "52e12698e0484947a5e653dd75eafc0d", + "5b837f4e901a4c65a785ea1aeda64d88", + "335e402420454301b91472c6d57e84b5", + "7c79674d29da4d109d08144b1b041aaf", + "c0a71411c1b5450b9251a77537c4b5f1", + "a3c2dacd2d304e9ca47ac75b78e16a2e", + "30f9fd3ac0e34bbe911c8fac4b1e088e", + "e9c8a5ce0e494c37ab92a83b7ff572ec", + "f4d821418b9346618b71b71ce3685691", + "9673ddb6186a481bad6e2f7dd5be0e38", + "3f4b5a7e87f74988b98e29b93245152b", + "d16a20ae3c7e4f64958e435a989010b7", + "ec254568f26e44e1b9b3e9f55591cb03", + "81aa18df41064d1191bd8b70168fc4c4", + "1959561121044bab931b0c508598d21b", + "a05deffd7ac045aa86f299d96937ff7e", + "20fb640986e24578b5e88203620fb234", + "1284d32779d044e78e12c204299a23d9", + "ad55ab8e488a4fae956acbacde8b949d", + "0796662db3e64a03b3bb63404d112058", + "73bcace4f1064ce79d527f08ce32e91d", + "7ca7f08acb594e3b9ec3b86d9ef64f04", + "12dc4d8f0bb248f4aa8d5b83a588a498", + "9af6ba7716214c70b314f4f20f992c2f", + "ac9af7fdb9504ea5a5cf61d6e7856880", + "8c0efbf9883d4c1c83f23b53c876d463", + "5082837d05184d30be04e233aacf10f0", + "52b5a37078e149fa86170c83a33f3cde", + "29255e9634df4d1ab06af95bdecee46e", + "4e562a778ee84f8e8e0bb2341510aafa", + "2acf420f9f95457f8b1c2fd99ca91399", + "c66064d73fcb4297bd184c0aad8c9992", + "070a79ac591a46e19d269baeea4fd6c5", + "1174f271c48649d288b8cd3f6ebe88c9", + "1a149c6b325c4e438b7eff7dfb4dd7b7", + "03680f63749843a99713a4d0d5a3841e", + "c011b819ad7043faac43f774e01e39d5", + "531b3795e2eb4690b4cc6a7c57254dbf", + "6dde1dd97f6a4c8eb74b4e6a541cf6e3", + "2100d97d1b024f0d94fbf17a8f7f8bd2", + "e5d0c6b85acd47a2b97d174f2cb05ac0", + "64c6e8b82a994d8c8d7bee2db8e11095", 
+ "79e4f519ce354778993398adfcaf8c0c", + "23d003381f774921b0bbba989628a953", + "ee768e79196645789673399538027f84", + "ebcc3520e6b448669e86c216cf447e85", + "be3c7b112e6c4c709534d37d905f8ca3", + "8fd629aab65847f8bfc8784ba4967053", + "71f692d49a834179977f7e46312a2382", + "f655e41b3b76407f803a3a7cac886331", + "a7f74450f28c4d89a7fa48a3bdcad0b2", + "ab73bb5ad7534739992e830528452a1b", + "2e1fb2e8a2d34d249daee3f4be63397a", + "2678b1294fc44508bbc031e1f8ebaa84", + "13a8e8e55fb1420f98dcf2a391fe31e2", + "a9ab6a8075554c88a353fb779bf3f11b", + "206bda5a7946468099699003ce99f361", + "fb3a74b6f88f4b09862baee03cee8302", + "29d110ef78ef46609421da0495e6467d", + "04a6f124897543549ad5aed50e480ed5", + "ba292ae9569840d396d87aefee0d3d26", + "45a413a7b6a3442a86e7def1d6697ac1", + "698fba63eb244ed2bb2ee63f24ec6f65", + "93d823e74553487ab278af9eee38e9c4", + "d500f1a39c7f40f4b2c9d963f504d213", + "9b955d354b75489cac2226fd4be5064c", + "8a24aba3bcde472298685e72cfb3427a", + "50a608a731a44709ba28d32b7f159b7d", + "6dedecd63e4f47218bc5802f7cb07dc4", + "6ff2024770cf401ebc44106f343b8f30", + "7db57d9dc2df4744a28fc079f48b2d0e", + "8b97315532964a4c84fb78073d738e3c", + "ddb90a7a730b4463bdaf6c8da724f8b0", + "45a1e46160aa4b9895d8583a8fae9890", + "a10943fa524b420ba8ff854665119e43" + ] + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:16:21.155888Z", + "iopub.status.busy": "2025-07-20T12:16:21.155077Z", + "iopub.status.idle": "2025-07-20T12:17:36.514669Z", + "shell.execute_reply": "2025-07-20T12:17:36.513831Z", + "shell.execute_reply.started": "2025-07-20T12:16:21.155861Z" + }, + "id": "-Xbb0cuLzwgf", + "outputId": "73a29692-4e1d-4de7-e822-f49a5e5fa7f8", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. 
Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/16.0G [00:00Let's Evaluate Gemma 4 Baseline Performance on German Transcription" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:46:26.982493Z", + "iopub.status.busy": "2025-07-20T12:46:26.982188Z", + "iopub.status.idle": "2025-07-20T12:46:28.166576Z", + "shell.execute_reply": "2025-07-20T12:46:28.165889Z", + "shell.execute_reply.started": "2025-07-20T12:46:26.982472Z" + }, + "id": "GHBGeJhYcorh", + "trusted": true, + "outputId": "11d0f8cc-50b4-40ae-b2b0-c2c0481d0e2b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 145, + "referenced_widgets": [ + "7ead793edcdc4542868bc35988087a41", + "9d9bc2085aa64e17a7cb03743fc13ce0", + "4dc9a224e9ed4fffb797c145520c2153", + "73018ba3aebd4a0a9c16150ecde428e7", + "0ab572241a14413381c618316769bea5", + "341d2ebe48d54ae58ef822a61a4ea55d", + "1c9ebc55fb584b5b89516e376ec00cfc", + "92db0e2905db4aa39552b7818b2282d3", + "6f730b26ae1447a9af2c7e3f8509f048", + "d26ca88568a94ef1beb9dd3102757a42", + "465de8b97846426cb7455bfbaaa66895", + "59102102d47c4d5b941afe38b1e327c5", + "ba1b1d94170b41ad82a5431e64854fd0", + "85c06e2df7cf426d888b2e885c9ff373", + "756f82f0ed2b4537ae922ac65de71e2c", + "6b92ebfc7af2452fb88700cd29549845", + "78b1fd5436694c7a883829a3373ba164", + "49285608dabb458bae5125ae9313a0b6", + "7d1762feec4a461284a762018372eef6", + "0504d2ea9d6b4c769b50be9e859c6b48", + "cc4fe7b5633d4932bb70b9ce0f46c169", + 
"e4717c8e5d234eba94fbc5e7c7369f65", + "7f2edc965f6d4418bfeaf01911425606", + "33f366b3b13d4a63b56ec57dd5f28836", + "2ee730d4eda94bbea4e0a4df58b7f84d", + "67f489c853be494b83456cca77bf2a0d", + "f8bf5e48834d418eb423737cf0a0c66d", + "7b63ed98dea64347bd8ffeb4e94c1e0c", + "c7b48c5df1e74383955e010b4f411854", + "aa7b6c960df84140b59c40a166279b23", + "215bf5b5eef844f9ad392e2b93db8da2", + "0cdf4bf1b09c481a95ab4c9351bf605c", + "d310fdb8b35d4173a1c2213118197d89", + "84d7352d2921443fa2b76d60bd2a51d1", + "5506b7e456b84c6890b12579d815e6ea", + "a68bd900917c4ec3bc7fced86a909643", + "4e4a231bbf20450fac8d6775cc6d1a1a", + "ca763ea2a041412280338cdf806dafc3", + "a8be70e3ed014df9bddd8f72dec5702e", + "97c34bf202644e82bed87aa2defa5ed7", + "92b1ae8f54b246b790ddd66493ddf7cc", + "5a90282515bf44aa88374f3a7f05df1e", + "904619a3cc624a25925cb66f6ab86564", + "8870940de0534e98b9189e7c168b80e1" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/540 [00:00" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 6 + } + ], + "source": [ + "from IPython.display import Audio, display\n", + "print(test_audio['text'])\n", + "Audio(test_audio['audio']['array'],rate = test_audio['audio']['sampling_rate'])" + ] + }, + { + "cell_type": "markdown", + "source": [ + "And the translation of the audio from German to English is:\n", + "\n", + "> I—I hold myself directly accountable. That much is, of course, clear: namely, that there are political interests involved in trade—in the exchange of goods—and that political influences are at play. The question is: that should not be the alternative." 
+ ], + "metadata": { + "id": "3XGomsRxl5d_" + } + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:18:05.249062Z", + "iopub.status.busy": "2025-07-20T12:18:05.248355Z", + "iopub.status.idle": "2025-07-20T12:18:37.319606Z", + "shell.execute_reply": "2025-07-20T12:18:37.318802Z", + "shell.execute_reply.started": "2025-07-20T12:18:05.249040Z" + }, + "id": "BJr_D4O9Z2Zh", + "outputId": "ab2cb578-03a5-4322-ea56-c9caa785c5a8", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Sie reden direkt mich an, und ist mir völlig klar, dass es politische Interessen gibt im Handel, im Austausch mit Waren, dass es politische Einflüsse gibt. Wie qual ist die Alternative soll es nicht sein?\n" + ] + } + ], + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": test_audio['audio']['array']},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " }\n", + "]\n", + "\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Yc0nI6Gzcori" + }, + "source": [ + "

Baseline Model Performance: 32.43% Word Error Rate (WER) for this sample!
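The notebook does not show how the WER figure was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), it could be reproduced as below. The lowercasing, the whitespace tokenization, the function name `word_error_rate`, and the sentence pair are all illustrative assumptions, not the notebook's actual evaluation code or test sample.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    '''Word-level Levenshtein distance divided by the reference length.'''
    # Assumption: lowercase and split on whitespace; real evaluations often
    # also strip punctuation, which changes the score.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError('reference must contain at least one word')

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentence pair, not the notebook's actual test sample.
sample_wer = word_error_rate('ich rechne direkt mich an', 'sie reden direkt mich an')
print(f'WER = {sample_wer:.2%}')
```

Two of the five reference words are substituted here, so the sketch reports a WER of 40%. A dedicated library would likely be the better choice for a real evaluation, since text normalization strongly affects the result.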

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "You can finetune the vision and text and audio parts" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small amount of parameters!" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:46:48.481871Z", + "iopub.status.busy": "2025-07-20T12:46:48.481594Z", + "iopub.status.idle": "2025-07-20T12:46:55.013627Z", + "shell.execute_reply": "2025-07-20T12:46:55.012955Z", + "shell.execute_reply.started": "2025-07-20T12:46:48.481854Z" + }, + "id": "6bZsfBuZDeCL", + "trusted": true + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # False if not finetuning vision layers\n", + " finetune_language_layers = True, # False if not finetuning language layers\n", + " finetune_attention_modules = True, # False if not finetuning attention layers\n", + " finetune_mlp_modules = True, # False if not finetuning MLP layers\n", + "\n", + " r = 8, # The larger, the higher the accuracy, but might overfit\n", + " lora_alpha = 16, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + " use_rslora = False, # We support rank stabilized LoRA\n", + " loftq_config = None, # And LoftQ\n", + " target_modules = [\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + "\n", + " # Audio layers\n", + " \"post\", \"linear_start\", \"linear_end\",\n", + " \"embedding_projection\",\n", + " \"ffw_layer_1\", \"ffw_layer_2\",\n", + " \"output_proj\",\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data 
Prep\n", + "We adapt the `kadirnar/Emilia-DE-B000000` dataset for our German ASR task using the Gemma 4 multimodal chat format. Each audio-text pair is structured into a conversation with `system`, `user`, and `assistant` roles. The processor then converts this into the final training format:\n", + "\n", + "```\n", + "<|turn>system\n", + "You are an assistant that transcribes speech accurately.\n", + "<|turn>user\n", + "<|audio|>Please transcribe this audio.\n", + "<|turn>model\n", + "Ich, ich rechne direkt mich an.\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:47:03.723745Z", + "iopub.status.busy": "2025-07-20T12:47:03.723405Z", + "iopub.status.idle": "2025-07-20T12:47:03.729197Z", + "shell.execute_reply": "2025-07-20T12:47:03.728434Z", + "shell.execute_reply.started": "2025-07-20T12:47:03.723714Z" + }, + "id": "o8caH7vlcorj", + "trusted": true + }, + "outputs": [], + "source": [ + "def format_intersection_data(samples: dict) -> dict[str, list]:\n", + " \"\"\"Format batched audio/text samples into the expected chat message format\"\"\"\n", + " formatted_samples = {\"messages\": []}\n", + " for idx in range(len(samples[\"audio\"])):\n", + " audio = samples[\"audio\"][idx][\"array\"]\n", + " label = str(samples[\"text\"][idx])\n", + "\n", + " message = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": audio},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\": [{\"type\": \"text\", \"text\": label}]\n", + " }\n", + " ]\n", + " formatted_samples[\"messages\"].append(message)\n", + " return formatted_samples" + ] + }, + { + 
"cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:47:08.489955Z", + "iopub.status.busy": "2025-07-20T12:47:08.489357Z", + "iopub.status.idle": "2025-07-20T12:47:09.221727Z", + "shell.execute_reply": "2025-07-20T12:47:09.221018Z", + "shell.execute_reply.started": "2025-07-20T12:47:08.489932Z" + }, + "id": "k7CQ3jvDcorj", + "trusted": true, + "outputId": "8aa74b98-32ea-4f20-a084-c8bb5b2d63e4", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "75e57ae8d4f94b3d9702a24a85b75f6e", + "a452f87440d34e2285dee4941cf558e2", + "d6d64f507469463ba1d678e966b78c28", + "9269fd899b71476faeffe5ca24ff3c23", + "d574cfbfa9f94ef7b1c037bc7884b8b5", + "cc44984cc78442ba9b73df356ad6e6b3", + "45cfbdff1db94eed80cbd6cf4ae19b3d", + "68f32e20479e4d61bda9adff2532ba70", + "c264561cd45b41109bcf3cacab8aa387", + "63351416559f4bfdb6c21adceaee17e8", + "5efb687bf68b45ca91797d17fbf88be4" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map (num_proc=4): 0%| | 0/3000 [00:00\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-20T12:48:17.004874Z", + "iopub.status.busy": "2025-07-20T12:48:17.004079Z", + "iopub.status.idle": "2025-07-20T12:48:17.279559Z", + "shell.execute_reply": "2025-07-20T12:48:17.278695Z", + "shell.execute_reply.started": "2025-07-20T12:48:17.004848Z" + }, + "id": "95_Nn-89DhsL", + "trusted": true, + "outputId": "4ab7d37a-efa4-43ab-b22d-e05c5ae7447e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. 
Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "# Use UnslothVisionDataCollator which handles audio token alignment correctly\n", + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 8,\n", + " gradient_accumulation_steps = 1,\n", + " warmup_ratio = 0.03,\n", + " # num_train_epochs = 1, # Use for full training runs\n", + " max_steps = 60,\n", + " learning_rate = 5e-5,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\",\n", + " remove_unused_columns = False,\n", + "\n", + " # The below are a must for audio finetuning:\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 8192,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "trusted": true, + "outputId": "1b81b661-b347-4ef8-bfa2-0e9211d96124", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. 
Max memory = 14.563 GB.\n", + "9.518 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "execution": { + "iopub.execute_input": "2025-07-20T12:48:20.209164Z", + "iopub.status.busy": "2025-07-20T12:48:20.208832Z", + "iopub.status.idle": "2025-07-20T13:42:42.607026Z", + "shell.execute_reply": "2025-07-20T13:42:42.606099Z", + "shell.execute_reply.started": "2025-07-20T12:48:20.209142Z" + }, + "id": "yqxqAZ7KJ4oL", + "outputId": "9e4aa7ed-a720-48e9-e35c-42f458023208", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 3,000 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 8 | Gradient accumulation steps = 1\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8\n", + " \"-____-\" Trainable parameters = 23,934,976 of 8,020,091,424 (0.30% trained)\n", + "Caching is incompatible with gradient checkpointing in Gemma4TextDecoderLayer. Setting `past_key_values=None`.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 02:44, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
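| Step | Training Loss |
|------|---------------|
| 1 | 17.025774 |
| 2 | 16.762154 |
| 3 | 16.299833 |
| 4 | 16.981674 |
| 5 | 16.810152 |
| 6 | 16.151384 |
| 7 | 14.198338 |
| 8 | 13.161330 |
| 9 | 11.982812 |
| 10 | 10.303127 |
| 11 | 9.856644 |
| 12 | 9.201151 |
| 13 | 8.912704 |
| 14 | 7.209361 |
| 15 | 6.705412 |
| 16 | 6.741477 |
| 17 | 6.189445 |
| 18 | 5.548326 |
| 19 | 4.765179 |
| 20 | 4.548351 |
| 21 | 4.192586 |
| 22 | 4.059164 |
| 23 | 4.241319 |
| 24 | 3.192038 |
| 25 | 3.979971 |
| 26 | 3.580244 |
| 27 | 3.562336 |
| 28 | 3.121466 |
| 29 | 3.284021 |
| 30 | 2.820877 |
| 31 | 2.781770 |
| 32 | 3.181130 |
| 33 | 3.296061 |
| 34 | 2.908674 |
| 35 | 3.456495 |
| 36 | 3.174899 |
| 37 | 2.976336 |
| 38 | 3.142226 |
| 39 | 2.514387 |
| 40 | 2.838711 |
| 41 | 3.000938 |
| 42 | 2.302069 |
| 43 | 3.237894 |
| 44 | 2.881329 |
| 45 | 3.214046 |
| 46 | 3.095760 |
| 47 | 2.509252 |
| 48 | 2.897425 |
| 49 | 2.668692 |
| 50 | 2.657610 |
| 51 | 2.777081 |
| 52 | 2.495918 |
| 53 | 2.955339 |
| 54 | 2.717725 |
| 55 | 3.013832 |
| 56 | 2.617149 |
| 57 | 2.862764 |
| 58 | 2.879027 |
| 59 | 3.030071 |
| 60 | 2.632949 |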
" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "trusted": true, + "outputId": "20983c94-9f93-4053-e1f2-271c51939b65", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "193.5248 seconds used for training.\n", + "3.23 minutes used for training.\n", + "Peak reserved memory = 11.139 GB.\n", + "Peak reserved memory for training = 1.621 GB.\n", + "Peak reserved memory % of max memory = 76.488 %.\n", + "Peak reserved memory for training % of max memory = 11.131 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2025-07-20T13:57:34.004664Z", + "iopub.status.busy": "2025-07-20T13:57:34.004306Z", + "iopub.status.idle": "2025-07-20T13:57:59.332316Z", + "shell.execute_reply": "2025-07-20T13:57:59.331671Z", + "shell.execute_reply.started": "2025-07-20T13:57:34.004639Z" + }, + "id": "kR3gIAX-SM2q", + "outputId": "ab8ef181-270a-42c5-ea08-25b9d092ca51", + "trusted": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Sie reden direkt mich an, und ist mir völlig klar, dass äh es politische Interessen gibt im Handel, im Austausch mit Waren, dass es politische Einflüsse gibt. Wie qual ist die Alternative soll es nicht sein?\n" + ] + } + ], + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"You are an assistant that transcribes speech accurately.\",\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"audio\", \"audio\": test_audio['audio']['array']},\n", + " {\"type\": \"text\", \"text\": \"Please transcribe this audio.\"}\n", + " ]\n", + " }\n", + "]\n", + "\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "upcOlWe7A1vc", + "trusted": true, + "outputId": "049b82a5-2741-4f7b-824b-cd36a7b445e0", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "MKX_XKs_BNZR", + "trusted": true, + "outputId": "0df68115-9684-45ca-ef64-fb04f3db1ec7", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I am Gemma 4, a Large Language Model developed by Google DeepMind. 
I am an open weights model.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, processor = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # The LoRA adapter folder saved after training\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(processor, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! The merged model is written to the folder passed to `save_pretrained_merged` (`gemma-4` in the cell below). Change `if False` to `if True` to run it!" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "iHjt_SMYsd3P", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4\", processor)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "To push the merged model to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "ZV-CiKPrIFG0", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", processor,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "Saving to `GGUF` for `llama.cpp` is now supported natively for all models! For now, you can convert to `Q8_0`, `F16` or `BF16` precision. `Q4_K_M` for 4bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "FqfebeAdT073", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " processor,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you instead want to push the GGUF to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "ZgcJIhJ0I_es", + "trusted": true + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " processor,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now use the exported `.gguf` file in llama.cpp. With the settings above it is a `Q8_0` export; `Q4_K_M` is not yet available.\n", + "\n", + "And we're done! 
If you have any questions about Unsloth, ask on our [Discord](https://discord.gg/unsloth)! Join it too if you find bugs, want to keep up with the latest LLM developments, need help, or want to join projects.\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, continued pretraining, conversational finetuning and more in our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [], + "machine_shape": "hm" + }, + "kaggle": { + "accelerator": "nvidiaTeslaT4", + "dataSources": [], + "dockerImageVersionId": 31040, + "isGpuEnabled": true, + "isInternetEnabled": true, + "language": "python", + "sourceType": "notebook" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "f410af63026a45c99add96178f79d89f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_53016ab2f4044590a70a539539943f0e", + "IPY_MODEL_52e12698e0484947a5e653dd75eafc0d", + "IPY_MODEL_5b837f4e901a4c65a785ea1aeda64d88" + ], + "layout": "IPY_MODEL_335e402420454301b91472c6d57e84b5" + } + }, + "53016ab2f4044590a70a539539943f0e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": 
"HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7c79674d29da4d109d08144b1b041aaf", + "placeholder": "​", + "style": "IPY_MODEL_c0a71411c1b5450b9251a77537c4b5f1", + "value": "model.safetensors: 100%" + } + }, + "52e12698e0484947a5e653dd75eafc0d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a3c2dacd2d304e9ca47ac75b78e16a2e", + "max": 15992595884, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_30f9fd3ac0e34bbe911c8fac4b1e088e", + "value": 15992595884 + } + }, + "5b837f4e901a4c65a785ea1aeda64d88": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e9c8a5ce0e494c37ab92a83b7ff572ec", + "placeholder": "​", + "style": "IPY_MODEL_f4d821418b9346618b71b71ce3685691", + "value": " 16.0G/16.0G [08:10<00:00, 82.8MB/s]" + } + }, + "335e402420454301b91472c6d57e84b5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7c79674d29da4d109d08144b1b041aaf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c0a71411c1b5450b9251a77537c4b5f1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a3c2dacd2d304e9ca47ac75b78e16a2e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "30f9fd3ac0e34bbe911c8fac4b1e088e": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e9c8a5ce0e494c37ab92a83b7ff572ec": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f4d821418b9346618b71b71ce3685691": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9673ddb6186a481bad6e2f7dd5be0e38": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3f4b5a7e87f74988b98e29b93245152b", + "IPY_MODEL_d16a20ae3c7e4f64958e435a989010b7", + "IPY_MODEL_ec254568f26e44e1b9b3e9f55591cb03" + ], + "layout": "IPY_MODEL_81aa18df41064d1191bd8b70168fc4c4" + } + }, + "3f4b5a7e87f74988b98e29b93245152b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1959561121044bab931b0c508598d21b", + "placeholder": "​", + "style": "IPY_MODEL_a05deffd7ac045aa86f299d96937ff7e", + "value": "Loading weights: 100%" + } + }, + "d16a20ae3c7e4f64958e435a989010b7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_20fb640986e24578b5e88203620fb234", + "max": 2130, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1284d32779d044e78e12c204299a23d9", + "value": 2130 + } + }, + "ec254568f26e44e1b9b3e9f55591cb03": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ad55ab8e488a4fae956acbacde8b949d", + "placeholder": "​", + "style": "IPY_MODEL_0796662db3e64a03b3bb63404d112058", + "value": " 2130/2130 [01:08<00:00, 407.41it/s]" + } + }, + "81aa18df41064d1191bd8b70168fc4c4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + 
"overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1959561121044bab931b0c508598d21b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a05deffd7ac045aa86f299d96937ff7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "20fb640986e24578b5e88203620fb234": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1284d32779d044e78e12c204299a23d9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ad55ab8e488a4fae956acbacde8b949d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0796662db3e64a03b3bb63404d112058": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "73bcace4f1064ce79d527f08ce32e91d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7ca7f08acb594e3b9ec3b86d9ef64f04", + "IPY_MODEL_12dc4d8f0bb248f4aa8d5b83a588a498", + "IPY_MODEL_9af6ba7716214c70b314f4f20f992c2f" + ], + "layout": "IPY_MODEL_ac9af7fdb9504ea5a5cf61d6e7856880" + } + }, + "7ca7f08acb594e3b9ec3b86d9ef64f04": 
{ + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8c0efbf9883d4c1c83f23b53c876d463", + "placeholder": "​", + "style": "IPY_MODEL_5082837d05184d30be04e233aacf10f0", + "value": "generation_config.json: 100%" + } + }, + "12dc4d8f0bb248f4aa8d5b83a588a498": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_52b5a37078e149fa86170c83a33f3cde", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_29255e9634df4d1ab06af95bdecee46e", + "value": 208 + } + }, + "9af6ba7716214c70b314f4f20f992c2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4e562a778ee84f8e8e0bb2341510aafa", + "placeholder": "​", + "style": "IPY_MODEL_2acf420f9f95457f8b1c2fd99ca91399", + "value": " 
208/208 [00:00<00:00, 21.3kB/s]" + } + }, + "ac9af7fdb9504ea5a5cf61d6e7856880": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8c0efbf9883d4c1c83f23b53c876d463": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5082837d05184d30be04e233aacf10f0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "52b5a37078e149fa86170c83a33f3cde": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29255e9634df4d1ab06af95bdecee46e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4e562a778ee84f8e8e0bb2341510aafa": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"2acf420f9f95457f8b1c2fd99ca91399": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c66064d73fcb4297bd184c0aad8c9992": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_070a79ac591a46e19d269baeea4fd6c5", + "IPY_MODEL_1174f271c48649d288b8cd3f6ebe88c9", + "IPY_MODEL_1a149c6b325c4e438b7eff7dfb4dd7b7" + ], + "layout": "IPY_MODEL_03680f63749843a99713a4d0d5a3841e" + } + }, + "070a79ac591a46e19d269baeea4fd6c5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c011b819ad7043faac43f774e01e39d5", + "placeholder": "​", + "style": "IPY_MODEL_531b3795e2eb4690b4cc6a7c57254dbf", + "value": "processor_config.json: " + } + }, + "1174f271c48649d288b8cd3f6ebe88c9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6dde1dd97f6a4c8eb74b4e6a541cf6e3", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2100d97d1b024f0d94fbf17a8f7f8bd2", + "value": 1 + } + }, + "1a149c6b325c4e438b7eff7dfb4dd7b7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e5d0c6b85acd47a2b97d174f2cb05ac0", + "placeholder": "​", + "style": "IPY_MODEL_64c6e8b82a994d8c8d7bee2db8e11095", + "value": " 1.69k/? 
[00:00<00:00, 114kB/s]" + } + }, + "03680f63749843a99713a4d0d5a3841e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c011b819ad7043faac43f774e01e39d5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "531b3795e2eb4690b4cc6a7c57254dbf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6dde1dd97f6a4c8eb74b4e6a541cf6e3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "2100d97d1b024f0d94fbf17a8f7f8bd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e5d0c6b85acd47a2b97d174f2cb05ac0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"64c6e8b82a994d8c8d7bee2db8e11095": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "79e4f519ce354778993398adfcaf8c0c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_23d003381f774921b0bbba989628a953", + "IPY_MODEL_ee768e79196645789673399538027f84", + "IPY_MODEL_ebcc3520e6b448669e86c216cf447e85" + ], + "layout": "IPY_MODEL_be3c7b112e6c4c709534d37d905f8ca3" + } + }, + "23d003381f774921b0bbba989628a953": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8fd629aab65847f8bfc8784ba4967053", + "placeholder": "​", + "style": "IPY_MODEL_71f692d49a834179977f7e46312a2382", + "value": "chat_template.jinja: " + } + }, + "ee768e79196645789673399538027f84": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f655e41b3b76407f803a3a7cac886331", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a7f74450f28c4d89a7fa48a3bdcad0b2", + "value": 1 + } + }, + "ebcc3520e6b448669e86c216cf447e85": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ab73bb5ad7534739992e830528452a1b", + "placeholder": "​", + "style": "IPY_MODEL_2e1fb2e8a2d34d249daee3f4be63397a", + "value": " 11.9k/? 
[00:00<00:00, 1.08MB/s]" + } + }, + "be3c7b112e6c4c709534d37d905f8ca3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8fd629aab65847f8bfc8784ba4967053": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "71f692d49a834179977f7e46312a2382": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f655e41b3b76407f803a3a7cac886331": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "a7f74450f28c4d89a7fa48a3bdcad0b2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ab73bb5ad7534739992e830528452a1b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"2e1fb2e8a2d34d249daee3f4be63397a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2678b1294fc44508bbc031e1f8ebaa84": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_13a8e8e55fb1420f98dcf2a391fe31e2", + "IPY_MODEL_a9ab6a8075554c88a353fb779bf3f11b", + "IPY_MODEL_206bda5a7946468099699003ce99f361" + ], + "layout": "IPY_MODEL_fb3a74b6f88f4b09862baee03cee8302" + } + }, + "13a8e8e55fb1420f98dcf2a391fe31e2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_29d110ef78ef46609421da0495e6467d", + "placeholder": "​", + "style": "IPY_MODEL_04a6f124897543549ad5aed50e480ed5", + "value": "tokenizer_config.json: " + } + }, + "a9ab6a8075554c88a353fb779bf3f11b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ba292ae9569840d396d87aefee0d3d26", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_45a413a7b6a3442a86e7def1d6697ac1", + "value": 1 + } + }, + "206bda5a7946468099699003ce99f361": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_698fba63eb244ed2bb2ee63f24ec6f65", + "placeholder": "​", + "style": "IPY_MODEL_93d823e74553487ab278af9eee38e9c4", + "value": " 14.9k/? 
[00:00<00:00, 1.05MB/s]" + } + }, + "fb3a74b6f88f4b09862baee03cee8302": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29d110ef78ef46609421da0495e6467d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "04a6f124897543549ad5aed50e480ed5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ba292ae9569840d396d87aefee0d3d26": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "45a413a7b6a3442a86e7def1d6697ac1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "698fba63eb244ed2bb2ee63f24ec6f65": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"93d823e74553487ab278af9eee38e9c4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d500f1a39c7f40f4b2c9d963f504d213": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9b955d354b75489cac2226fd4be5064c", + "IPY_MODEL_8a24aba3bcde472298685e72cfb3427a", + "IPY_MODEL_50a608a731a44709ba28d32b7f159b7d" + ], + "layout": "IPY_MODEL_6dedecd63e4f47218bc5802f7cb07dc4" + } + }, + "9b955d354b75489cac2226fd4be5064c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6ff2024770cf401ebc44106f343b8f30", + "placeholder": "​", + "style": "IPY_MODEL_7db57d9dc2df4744a28fc079f48b2d0e", + "value": "tokenizer.json: 100%" + } + }, + "8a24aba3bcde472298685e72cfb3427a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8b97315532964a4c84fb78073d738e3c", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ddb90a7a730b4463bdaf6c8da724f8b0", + "value": 32169626 + } + }, + "50a608a731a44709ba28d32b7f159b7d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_45a1e46160aa4b9895d8583a8fae9890", + "placeholder": "​", + "style": "IPY_MODEL_a10943fa524b420ba8ff854665119e43", + "value": " 32.2M/32.2M [00:00<00:00, 161MB/s]" + } + }, + "6dedecd63e4f47218bc5802f7cb07dc4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6ff2024770cf401ebc44106f343b8f30": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7db57d9dc2df4744a28fc079f48b2d0e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8b97315532964a4c84fb78073d738e3c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ddb90a7a730b4463bdaf6c8da724f8b0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"45a1e46160aa4b9895d8583a8fae9890": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a10943fa524b420ba8ff854665119e43": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7ead793edcdc4542868bc35988087a41": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9d9bc2085aa64e17a7cb03743fc13ce0", + "IPY_MODEL_4dc9a224e9ed4fffb797c145520c2153", + "IPY_MODEL_73018ba3aebd4a0a9c16150ecde428e7" + ], + "layout": "IPY_MODEL_0ab572241a14413381c618316769bea5" + } + }, + "9d9bc2085aa64e17a7cb03743fc13ce0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_341d2ebe48d54ae58ef822a61a4ea55d", + "placeholder": "​", + "style": "IPY_MODEL_1c9ebc55fb584b5b89516e376ec00cfc", + "value": "README.md: 100%" + } + }, + "4dc9a224e9ed4fffb797c145520c2153": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_92db0e2905db4aa39552b7818b2282d3", + "max": 540, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6f730b26ae1447a9af2c7e3f8509f048", + "value": 540 + } + }, + "73018ba3aebd4a0a9c16150ecde428e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d26ca88568a94ef1beb9dd3102757a42", + "placeholder": "​", + "style": "IPY_MODEL_465de8b97846426cb7455bfbaaa66895", + "value": " 540/540 [00:00<00:00, 52.6kB/s]" + } + }, + "0ab572241a14413381c618316769bea5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "341d2ebe48d54ae58ef822a61a4ea55d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1c9ebc55fb584b5b89516e376ec00cfc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "92db0e2905db4aa39552b7818b2282d3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6f730b26ae1447a9af2c7e3f8509f048": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d26ca88568a94ef1beb9dd3102757a42": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "465de8b97846426cb7455bfbaaa66895": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "59102102d47c4d5b941afe38b1e327c5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ba1b1d94170b41ad82a5431e64854fd0", + "IPY_MODEL_85c06e2df7cf426d888b2e885c9ff373", + "IPY_MODEL_756f82f0ed2b4537ae922ac65de71e2c" + ], + "layout": "IPY_MODEL_6b92ebfc7af2452fb88700cd29549845" + } + }, + "ba1b1d94170b41ad82a5431e64854fd0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_78b1fd5436694c7a883829a3373ba164", + "placeholder": "​", + "style": "IPY_MODEL_49285608dabb458bae5125ae9313a0b6", + "value": "data/train-00000-of-00002.parquet: 100%" + } + }, + "85c06e2df7cf426d888b2e885c9ff373": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7d1762feec4a461284a762018372eef6", + "max": 494804366, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0504d2ea9d6b4c769b50be9e859c6b48", + "value": 494804366 + } + }, + "756f82f0ed2b4537ae922ac65de71e2c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cc4fe7b5633d4932bb70b9ce0f46c169", + "placeholder": "​", + "style": "IPY_MODEL_e4717c8e5d234eba94fbc5e7c7369f65", + "value": " 495M/495M [00:03<00:00, 354MB/s]" + } + }, + "6b92ebfc7af2452fb88700cd29549845": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "78b1fd5436694c7a883829a3373ba164": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "49285608dabb458bae5125ae9313a0b6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7d1762feec4a461284a762018372eef6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0504d2ea9d6b4c769b50be9e859c6b48": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cc4fe7b5633d4932bb70b9ce0f46c169": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e4717c8e5d234eba94fbc5e7c7369f65": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7f2edc965f6d4418bfeaf01911425606": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_33f366b3b13d4a63b56ec57dd5f28836", + "IPY_MODEL_2ee730d4eda94bbea4e0a4df58b7f84d", + "IPY_MODEL_67f489c853be494b83456cca77bf2a0d" + ], + "layout": "IPY_MODEL_f8bf5e48834d418eb423737cf0a0c66d" + } + }, + "33f366b3b13d4a63b56ec57dd5f28836": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7b63ed98dea64347bd8ffeb4e94c1e0c", + "placeholder": "​", + "style": "IPY_MODEL_c7b48c5df1e74383955e010b4f411854", + "value": "data/train-00001-of-00002.parquet: 100%" + } + }, + "2ee730d4eda94bbea4e0a4df58b7f84d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_aa7b6c960df84140b59c40a166279b23", + "max": 502613920, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_215bf5b5eef844f9ad392e2b93db8da2", + "value": 502613920 + } + }, + "67f489c853be494b83456cca77bf2a0d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0cdf4bf1b09c481a95ab4c9351bf605c", + "placeholder": "​", + "style": "IPY_MODEL_d310fdb8b35d4173a1c2213118197d89", + "value": " 503M/503M [00:11<00:00, 40.7MB/s]" + } + }, + "f8bf5e48834d418eb423737cf0a0c66d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7b63ed98dea64347bd8ffeb4e94c1e0c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7b48c5df1e74383955e010b4f411854": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "aa7b6c960df84140b59c40a166279b23": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "215bf5b5eef844f9ad392e2b93db8da2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0cdf4bf1b09c481a95ab4c9351bf605c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d310fdb8b35d4173a1c2213118197d89": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "84d7352d2921443fa2b76d60bd2a51d1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5506b7e456b84c6890b12579d815e6ea", + "IPY_MODEL_a68bd900917c4ec3bc7fced86a909643", + "IPY_MODEL_4e4a231bbf20450fac8d6775cc6d1a1a" + ], + "layout": "IPY_MODEL_ca763ea2a041412280338cdf806dafc3" + } + }, + 
"5506b7e456b84c6890b12579d815e6ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a8be70e3ed014df9bddd8f72dec5702e", + "placeholder": "​", + "style": "IPY_MODEL_97c34bf202644e82bed87aa2defa5ed7", + "value": "Generating train split: 100%" + } + }, + "a68bd900917c4ec3bc7fced86a909643": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_92b1ae8f54b246b790ddd66493ddf7cc", + "max": 12038, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5a90282515bf44aa88374f3a7f05df1e", + "value": 12038 + } + }, + "4e4a231bbf20450fac8d6775cc6d1a1a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_904619a3cc624a25925cb66f6ab86564", + "placeholder": "​", + "style": 
"IPY_MODEL_8870940de0534e98b9189e7c168b80e1", + "value": " 12038/12038 [00:04<00:00, 2288.38 examples/s]" + } + }, + "ca763ea2a041412280338cdf806dafc3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a8be70e3ed014df9bddd8f72dec5702e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "97c34bf202644e82bed87aa2defa5ed7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "92b1ae8f54b246b790ddd66493ddf7cc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5a90282515bf44aa88374f3a7f05df1e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "904619a3cc624a25925cb66f6ab86564": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "8870940de0534e98b9189e7c168b80e1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "75e57ae8d4f94b3d9702a24a85b75f6e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a452f87440d34e2285dee4941cf558e2", + "IPY_MODEL_d6d64f507469463ba1d678e966b78c28", + "IPY_MODEL_9269fd899b71476faeffe5ca24ff3c23" + ], + "layout": "IPY_MODEL_d574cfbfa9f94ef7b1c037bc7884b8b5" + } + }, + "a452f87440d34e2285dee4941cf558e2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cc44984cc78442ba9b73df356ad6e6b3", + "placeholder": "​", + "style": "IPY_MODEL_45cfbdff1db94eed80cbd6cf4ae19b3d", + "value": "Map (num_proc=4): 100%" + } + }, + "d6d64f507469463ba1d678e966b78c28": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_68f32e20479e4d61bda9adff2532ba70", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c264561cd45b41109bcf3cacab8aa387", + "value": 3000 + } + }, + "9269fd899b71476faeffe5ca24ff3c23": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_63351416559f4bfdb6c21adceaee17e8", + "placeholder": "​", + "style": "IPY_MODEL_5efb687bf68b45ca91797d17fbf88be4", + "value": " 3000/3000 [00:35<00:00, 45.77 examples/s]" + } + }, + "d574cfbfa9f94ef7b1c037bc7884b8b5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": 
null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cc44984cc78442ba9b73df356ad6e6b3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "45cfbdff1db94eed80cbd6cf4ae19b3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "68f32e20479e4d61bda9adff2532ba70": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c264561cd45b41109bcf3cacab8aa387": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"63351416559f4bfdb6c21adceaee17e8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5efb687bf68b45ca91797d17fbf88be4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Text.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Text.ipynb new file mode 100644 index 0000000..fb7c293 --- 
/dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Text.ipynb @@ -0,0 +1,7100 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "3FgedoPTwNIy" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab L4 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kD8yWWE9wNIz" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dYFNULMnwNIz" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DQ6w6D0UwNIz" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "LSA1qFrKwNIz" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "lBN09c1tUlSV" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"TGMWlrRdzwgf" + }, + "source": [ + "### Unsloth\n", + "\n", + "`FastModel` supports loading nearly any model now! This includes Vision and Text models!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "-Xbb0cuLzwgf", + "outputId": "b1936459-858a-460f-d39c-fe4d41114da8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380, + "referenced_widgets": [ + "1aed93f4ca3b45b5b194aaaba43768ef", + "884a0c3267f743198b1cc2e51eabbd87", + "22ecfe66b78f4470877558859de40608", + "43caa59b7fa94ee39f6993007be26a6e", + "1a677ab08bec40cbaea059912e3f148d", + "7cc26327d62b4cd1b18a301df184e544", + "6cccb64fbc6c46afb29801e6c5f2b8d1", + "671a5973e5504d81a09e0ee30db56a85", + "8a379ddf368842b39c01e94b394b7dd1", + "d5e2df92e9de4dc5a274a545d4b732f1", + "02d78160f48b4145830fc3c61b7cffd1", + "1a60091e9bb54eb3af50e055b0fafd1f", + "e050c9d3e05643ef9657216f814962b1", + "9b0fceb9931a4bc5978fcae33a6ff811", + "2947239108e34a149ef13a84e0b00e3d", + "99e7b18e3811427aab73f1afc8bfb954", + "62aab9454a884fbea96de59255620f87", + "c9975a7e25d54a39a0c5547245fa6178", + "5f089afd43884d14b78a8238cb548f37", + "968e200f35b44909ba35df31d35e4bf7", + "1fd5bf7fbf6744d39f1de3de1b2194f4", + "ddb0c72cb0af468dae7916bdf6a06f24", + "8aee416b9f5e421aab44ff0eb87f5323", + "bc8f33128c8845a9bbf7766a3cf85db6", + "99c78071240d4a448b02a0f5201cc00c", + "5d3f6e5f5b7f46d59cfc1b0d319b5aca", + "63319aaee4cb4fb1b97972b62442815a", + "7d3fa936cd1d4ef189321a9a4df6d9a2", + "acafd4bfe22b47f895909f1f5e30803a", + "724487122781495d82fa21f066b03557", + "3717648cbcba4a76abfe9114906be386", + "0f4af67f55234ad88612edbad914041f", + "eb8e4d5a374b44f0be18fb7baec7051e", + "ca880e2479834f3c98a65a2258e787bd", + "9609898077d646f7a44c4233a9ba9932", + "817f1bdb1dac432c93d868bea15f31fa", + "f4104b3429a74af889509ab26e494fac", + "4d1588e4d6d04ec4885c38e161071463", + "db4d07a9db714f6f93d922882a410e4d", + "fb04cb323b934cde808ba8fba87e66bc", + "0e5806ad42b64bbda0f9f8e426e382c8", + 
"c56375c628ab44fd95f80fae39ebbd55", + "42bf4041d37540089e80a31afee69b12", + "2e77264ff5764d658fd0d8f03329a3e2", + "694f196a866f47b48024d21e4cb36610", + "8bf349f444c648a3b193d89b5ca48cc5", + "7382b653f8db451d809d65159b10e69f", + "924b0f616af04e0c9f8bb43e2ad4af11", + "409155dc6ebf4736ae18034fe1b53b26", + "0a3c799cf0f843d59681f4a7106ab74b", + "d6c1b7fb52e2481fa12ecc6735225717", + "397cbf2ac3fb49e5ad99e063423c4ef0", + "a5be697d40af4a3b890f9eb1864ee815", + "4fa8b5cfa38247549a3a21d99e7425d9", + "5a0f4872c23e411fa90876d6e8fc6724", + "d466fd0b0e6641769ac60088978f31ca", + "56b3b3f6e9344dc5bcf6f3fea4cefe40", + "275082e9212941399f35de978e507c5b", + "1f84016528874323b222908c7ac9c484", + "766f1105af5d4e298189d47e3288593a", + "76ac77bfac124cd3a16bb28385973b71", + "251468bed4f44c6296194fb80eda1f55", + "c3636672661443abb33d2a338c6ff82c", + "ebffe47f7cc14b67b32f4bd869571260", + "8f6ef19f400e4adc832e018838e8bd43", + "751202361c2046a6993d88bb454a21c7", + "4c07a504c4b84880a3971ef5863dbfab", + "4579e185e6db4a4fbda6d357918a30cd", + "984f24f050514ff9808c0cc0fbf25b55", + "9510f7509289400e9b3b628bc8fa8962", + "742783818d3c41e396cf4fdffa041d7c", + "e855f68299b840888c94c565ce347b29", + "e39e03b3cf5e40d2807391f4959d4e42", + "1ab5e182e89e46dfa26cbabfbc5b522a", + "b0776300bf984578981a4a2c88525ada", + "8a532284dde448e78079dac3b8a29ad9", + "0429d1b4a634476695d10a0eb8b4a6c9" + ] + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/16.0G [00:00" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "9jGeSb9bWe0k", + "outputId": "289205bd-b890-4659-8c82-8c7674befdfe", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I am sorry, but I cannot answer that question because the image you provided is of an animal, and I have no information about any films it might feature in.\n" + ] + } + ], + "source": [ + "sloth_link = \"https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"Which films does this animal feature in?\" }\n", + " ]\n", + "}]\n", + "# You might have to wait 1 minute for Unsloth's auto compiler\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eh0BzbZPWtRD" + }, + "source": [ + "Let's make a poem about sloths!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "R3ExuK8cWuT3", + "outputId": "d63b198a-00c6-49a5-d0e5-78cbd18d217f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "## The Gentle Pace\n", + "\n", + "In emerald woods, where moss hangs deep and low,\n", + "And dappled sunlight through the canopy does flow,\n", + "There moves a creature, draped in patient grace,\n", + "A living statue in this verdant space.\n", + "\n", + "The sloth, a marvel of the slow design,\n", + "A tapestry of stillness, truly divine.\n", + "His fur, a canvas where the lichen clings,\n", + "A quiet testament to what the stillness brings.\n", + "\n", + "He hangs suspended, a deliberate dream,\n", + "Within the slow, meandering forest stream\n", + "Of time itself, where urgency takes flight,\n", + "And moments linger bathed in amber light.\n", + "\n", + "His movements\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{ \"type\" : \"text\",\n", + " \"text\" : \"Write a poem about sloths.\" }]\n", + "}]\n", + "do_gemma_4_inference(messages)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wZrmFRZpZtGf" + }, + "source": [ + "# Gemma 4 can also hear!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "68crYajNZtw1", + "outputId": "e8ffe42f-79d8-473c-8b2b-8d4d05acc549", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 61 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "from IPython.display import Audio, display\n", + "Audio(\"https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "k3vrdoa0Z01X" + }, + "outputs": [], + "source": [ + "!wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O audio.mp3" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "BJr_D4O9Z2Zh", + "outputId": "d6517a81-1d77-41dc-b2dd-2e6ab1aa3a29", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "This audio is about the goal of landing a man on the moon and returning him safely to Earth before the decade is out.\n" + ] + } + ], + "source": [ + "audio_file = \"audio.mp3\"\n", + "\n", + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio about?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L15JuAmmaOkB" + }, + "source": [ + "# Let's combine all 3 modalities together!" 
 + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "is37bsDZaRwV", + "outputId": "37235929-0289-4ba6-c352-da6d389cb5a0", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "This audio clip is a famous excerpt from John F. Kennedy's 1962 speech to Congress. He was advocating for the United States to commit to landing a man on the Moon and bringing him back safely to Earth before the decade was out.\n", + "\n", + "The image is a picture of a sloth.\n", + "\n", + "There is no relationship between the audio and the image. The audio is a political speech, while the image is a photograph of an animal.\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\" : \"user\",\n", + " \"content\": [\n", + " { \"type\": \"audio\", \"audio\" : audio_file },\n", + " { \"type\": \"image\", \"image\" : sloth_link },\n", + " { \"type\": \"text\", \"text\" : \"What is this audio and image about? \"\\\n", + " \"How are they related?\" }\n", + " ]\n", + "}]\n", + "do_gemma_4_inference(messages, max_new_tokens = 256)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bw5XPyYFajyM" + }, + "source": [ + "# Let's finetune Gemma 4!\n", + "\n", + "For now, you can select whether to finetune the vision and text parts. The audio part can also be finetuned, and we're working to make it selectable as well!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SXd9bTZd1aaL" + }, + "source": [ + "We now add LoRA adapters so we only need to update a small number of parameters!" 
 + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "6bZsfBuZDeCL" + }, + "outputs": [], + "source": [ + "model = FastModel.get_peft_model(\n", + " model,\n", + " finetune_vision_layers = False, # Turn off for just text!\n", + " finetune_language_layers = True, # Should leave on!\n", + " finetune_attention_modules = True, # Attention good for GRPO\n", + " finetune_mlp_modules = True, # Should leave on always!\n", + "\n", + " r = 8, # Larger = higher accuracy, but might overfit\n", + " lora_alpha = 8, # Recommended alpha == r at least\n", + " lora_dropout = 0,\n", + " bias = \"none\",\n", + " random_state = 3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vITh0KVJ10qX" + }, + "source": [ + "\n", + "### Data Prep\n", + "We now use the `Gemma-4` format for conversation-style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi-turn conversations as shown below:\n", + "\n", + "```\n", + "<|turn>user\n", + "Hello\n", + "<|turn>model\n", + "Hey there!\n", + "```\n", + "We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "LjY75GoYUCB8" + }, + "outputs": [], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZQkXuGYxbJ-e" + }, + "source": [ + "We get the first 3000 rows of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "Mkq4RvEq7FQr", + "outputId": "e78c54dd-f931-4c00-a73d-e37f3eff19e8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 113, + "referenced_widgets": [ + "fabccf89d3904b678111fb12a8c7919f", + "496290c588ad4736906e89aa3f80b713", + "3cc36b868a834c19935e1613b114388a", + "6d5b8376f62f4b76b907312714bab547", + "13b8da7ffbd04a48a47e72d631825226", + "73d6158b09a9446d8ce764d75bcaa653", + "eec6c1a9c6d64f8ea02c0a3b35272cbe", + "0a8ea69380804347b38b95054f9a1687", + "dca6ee8c28a0441aba2fb7bdcb806246", + "4d8f3cd8f0ef4c86b987532df2c14a9b", + "1d4052c66e414ee89cb7454e3faac18c", + "e092b15159864c2eb45b15ff8dbcf659", + "d55e6ebfa3fb44d1b2e79c4848e83227", + "78baf58aa7c9410e9ebc74c04948f99c", + "cadca9302ab3450e92a1971c8a9528e7", + "ae7ddb3de6734906ac85729bf01709e7", + "d9b7ac1ff506461eaa9da424a8430c63", + "d0d9352121134f518ca4d9f1ae1d08a8", + "a0a01b6854194da0af7d2ddc96856945", + "a732bbffcaf34403869a6d86c1dd4871", + "10abfe1311914dbaa8f697a82fc567a3", + "7fbaf07e1b0e4fa0acea9a5d4b92559f", + "1503befd068143e992db7bcdca36557f", + "2fb84661a76e49f0912e36917adea52e", + "13d21d9087824744a6056f867dc4ff11", + "418e3b94a5a34882a6794e7092a89067", + "1d8d427198a946e8a2f364662da03b9e", + "76ee2c7cf99e44d7b403a328c5c85357", + "92ffae8e6c1447f8aea99ffae371f053", + "496535fa240f4cdebac514965d928b23", + "631c80e16bd24dd4861b5446e4fa3741", + "4856020cb14e471bab5619270d0d2dac", + "d2a6f7596d4343878a22f791613a3a34" + ] + } + }, + "outputs": [ + { + "output_type": 
"display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/982 [00:00` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "1ahE8Ys37JDJ", + "outputId": "13973ed3-779d-485e-92aa-33b6925f1737", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "fa62d625aa06430caf57fbe8ff872b43", + "34cb4d22bf9d41329551a1135ebfca63", + "9258980fb6954200823889434a3e0a9a", + "8f8f53d0aff74f288b505717d7e2d625", + "922fe3cc3d88403dbec0c94fae364f93", + "de17d7225f1741aaa9a87e5689c2ada8", + "485e804c2a2b45b3a442219a9e73626c", + "dd0dc8a4da3f4c5897f18bde720a91bc", + "49151c4db95340608bf237e85563fa2f", + "b5c0dae215dc437fba607f57f6720632", + "0ca6a8f22d404f5b808769600655e54d" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Map: 0%| | 0/3000 [00:00') for convo in convos]\n", + " return { \"text\" : texts, }\n", + "\n", + "dataset = dataset.map(formatting_prompts_func, batched = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "Let's see how the chat template did! Notice there is no `<bos>` token as the processor tokenizer will be adding one." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "gGFzmplrEy9I", + "outputId": "36882276-635b-45f5-f9d8-3764373dc35e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "dataset[100][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`."
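To see how far 60 steps falls short of a full epoch, the arithmetic can be sketched as below. The figures (2,991 examples, batch size 1 per device, 4 gradient accumulation steps, 1 GPU) are the ones printed by the training banner later in this notebook. The helper names are hypothetical, and rounding the last partial batch up is an assumption, so the trainer's own step count may differ by one.

```python
import math

# Hypothetical helpers for illustration; the trainer computes these internally.

def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def steps_per_epoch(num_examples: int, batch: int) -> int:
    """Optimizer steps for one pass over the data, rounding the last batch up."""
    return math.ceil(num_examples / batch)


batch = effective_batch_size(per_device=1, grad_accum=4)
seen_in_demo = 60 * batch                      # examples touched by the 60-step run
full_epoch_steps = steps_per_epoch(2_991, batch)

print(f"effective batch size: {batch}")
print(f"examples seen in 60 steps: {seen_in_demo} of 2,991")
print(f"steps for one full epoch: about {full_epoch_steps}")
```

The 60-step demo therefore covers only a small slice of the dataset, which is why a full run is recommended for anything beyond a smoke test.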
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "95_Nn-89DhsL", + "outputId": "87503724-3283-49e4-fec6-4b2c4e084fca", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "4f5e0be98f7b48b68f9e4d0b9bce31ee", + "89193bbcf50742e3a40c98f6332a51ce", + "6b1b3000713746a8a1e1309f3b319f3a", + "3f108d33c1d14c1ea82c1cf4c5aba5a3", + "e89d6c56927640ddbbf956a3b6f7c951", + "77c380ae4aa24032b80ecefced482997", + "3b4ebd09fa584e7a8a7f64841e4a3a22", + "bc6abe2b4d0b435c9ffa066c41641971", + "cd6e34ffab074155b6f6c84fad95b017", + "9372cb8b55424f679ed0a06820788cb5", + "573ce421fde44bb6b93677da04991e26" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Unsloth: Tokenizing [\"text\"] (num_proc=6): 0%| | 0/3000 [00:00user\\n\",\n", + " response_part = \"<|turn>model\\n\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dv1NBUozV78l" + }, + "source": [ + "Let's verify that the instruction part is masked. Let's print the 100th row again. Notice how the sample only has a single `<bos>` as expected!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "LtsMVtlkUhja", + "outputId": "d3126757-2a14-4c6d-99f8-98b3bf91be55", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 140 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'<|turn>user\\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?\\n<|turn>model\\nIn programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4Kyjy__m9KY3" + }, + "source": [ + "Now let's print the masked out example - you should see only the answer is present:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "_rD6fl8EUxnG", + "outputId": "e579a2b3-9df6-4c40-ef7c-98d7c9cd9d82", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 139 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "' In programming, the modulus operator is represented by the \\'%\\' symbol. It calculates the remainder when one number is divided by another. 
To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\\n\\n```python\\n# Calculate the modulus\\nModulus = a % b\\n\\nprint(\"Modulus of the given numbers is: \", Modulus)\\n```\\n\\nIn this code snippet, the variables \\'a\\' and \\'b\\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \\'%\\', we calculate the remainder when \\'a\\' is divided by \\'b\\'. The result is then stored in the variable \\'Modulus\\'. Finally, the modulus value is printed using the \\'print\\' statement.\\n\\nFor example, if \\'a\\' is 10 and \\'b\\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\\n\\n```\\nModulus of the given numbers is: 2\\n```\\n\\nThis means that the modulus of 10 and 4 is 2.\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 21 + } + ], + "source": [ + "tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "outputId": "2ecea13f-2faa-4c7d-e006-a0a1cb90612a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "9.891 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNP1Uidk9mrz" + }, + "source": [ + "# Let's train the model!\n", + "\n", + "To resume a training run, set `trainer.train(resume_from_checkpoint = True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "outputId": "180092e1-ae76-449a-9f2d-0dbc3894b8f8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 2,991 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 18,350,080 of 8,014,506,528 (0.23% trained)\n", + "Caching is incompatible with gradient checkpointing in Gemma4TextDecoderLayer. Setting `past_key_values=None`.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 05:40, Epoch 0/1]\n", + "
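| Step | Training Loss |
|------|---------------|
| 1 | 10.523080 |
| 2 | 10.053988 |
| 3 | 12.042601 |
| 4 | 9.989527 |
| 5 | 10.095121 |
| 6 | 10.694251 |
| 7 | 10.364755 |
| 8 | 9.754461 |
| 9 | 8.304264 |
| 10 | 7.318188 |
| 11 | 7.289935 |
| 12 | 6.898206 |
| 13 | 6.666481 |
| 14 | 5.392871 |
| 15 | 6.022815 |
| 16 | 4.724240 |
| 17 | 4.793351 |
| 18 | 5.101266 |
| 19 | 5.203894 |
| 20 | 5.107642 |
| 21 | 4.166076 |
| 22 | 4.096564 |
| 23 | 4.555800 |
| 24 | 4.716465 |
| 25 | 4.108744 |
| 26 | 3.947126 |
| 27 | 4.030560 |
| 28 | 3.109566 |
| 29 | 3.588513 |
| 30 | 3.787085 |
| 31 | 3.353788 |
| 32 | 3.781386 |
| 33 | 3.406949 |
| 34 | 3.755966 |
| 35 | 3.335477 |
| 36 | 3.656681 |
| 37 | 3.317236 |
| 38 | 3.059520 |
| 39 | 3.279643 |
| 40 | 2.585943 |
| 41 | 3.501766 |
| 42 | 3.417128 |
| 43 | 2.781741 |
| 44 | 2.492920 |
| 45 | 2.859325 |
| 46 | 3.464297 |
| 47 | 2.931827 |
| 48 | 3.242573 |
| 49 | 2.751030 |
| 50 | 2.390760 |
| 51 | 2.890718 |
| 52 | 2.199112 |
| 53 | 3.136880 |
| 54 | 2.664481 |
| 55 | 2.288600 |
| 56 | 2.728267 |
| 57 | 2.804231 |
| 58 | 2.806247 |
| 59 | 2.632231 |
| 60 | 2.837519 |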

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "outputId": "585888de-dbd8-4562-c6e5-a5f0dbf26dc7", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "368.1867 seconds used for training.\n", + "6.14 minutes used for training.\n", + "Peak reserved memory = 10.715 GB.\n", + "Peak reserved memory for training = 0.824 GB.\n", + "Peak reserved memory % of max memory = 73.577 %.\n", + "Peak reserved memory for training % of max memory = 5.658 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "kR3gIAX-SM2q", + "outputId": "42097560-b2b6-4d32-a9f8-160db24434cc", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['<|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,\\n<|turn>model\\n13, 21, 34, 55, 89, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.']" + ], + "text/html": [ + "

['<bos><|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,<turn|>\\n<|turn>model\\n13, 21, 34, 55, 89, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.<turn|>']
" + ] + }, + "metadata": {}, + "execution_count": 25 + } + ], + "source": [ + "from unsloth.chat_templates import get_chat_template\n", + "tokenizer = get_chat_template(\n", + " tokenizer,\n", + " chat_template = \"gemma-4\",\n", + ")\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\n", + " \"type\" : \"text\",\n", + " \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n", + " }]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + ")\n", + "tokenizer.batch_decode(outputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CrSvZObor0lY" + }, + "source": [ + " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "e2pEuRb1r2Vg", + "outputId": "a648ff4f-3762-44d5-eb8b-5045a5c01a50", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The sky appears blue primarily due to a phenomenon called **Rayleigh scattering**. Here's a breakdown of how this works:\n", + "\n", + "1. **Sunlight is composed of different wavelengths:** Sunlight, which comes from the sun, is made up of various colors, each with a different wavelength. 
Blue light has a shorter wavelength\n" + ] + } + ], + "source": [ + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 64, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "upcOlWe7A1vc", + "outputId": "b38635d3-0eae-40ed-93f1-771f04c205c0", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "tokenizer.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# tokenizer.push_to_hub(\"HF_ACCOUNT/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "id": "MKX_XKs_BNZR", + "outputId": "c6380f12-92c3-44da-ee71-c6895cb19a95", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I am Gemma 4, a Large Language Model developed by Google DeepMind. 
I am an open weights model.\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastModel\n", + " model, tokenizer = FastModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # The model you used for training\n", + " max_seq_length = 2048,\n", + " load_in_4bit = True,\n", + " )\n", + "\n", + "messages = [{\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-4?\",}]\n", + "}]\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt = True, # Must add for generation\n", + " return_tensors = \"pt\",\n", + " tokenize = True,\n", + " return_dict = True,\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "_ = model.generate(\n", + " **inputs,\n", + " max_new_tokens = 128, # Increase for longer outputs!\n", + " # Recommended Gemma-4 settings!\n", + " temperature = 1.0, top_p = 0.95, top_k = 64,\n", + " streamer = TextStreamer(tokenizer, skip_prompt = True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Change `if False` to `if True` to let it run!" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save finetune!\n", + " model.save_pretrained_merged(\"gemma-4-finetune\", tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z6O48DbNIAr0" + }, + "source": [ + "If you want to upload / push to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "id": "ZV-CiKPrIFG0" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload finetune\n", + " model.push_to_hub_merged(\n", + " \"HF_ACCOUNT/gemma-4-finetune\", tokenizer,\n", + " token = \"YOUR_HF_TOKEN\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TCv4vXHd61i7" + }, + "source": [ + "### GGUF / llama.cpp Conversion\n", + "We now natively support saving to `GGUF` / `llama.cpp` for all models! For now, you can convert to `Q8_0`, `F16` or `BF16` precision. `Q4_K_M` for 4-bit will come later!" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "id": "FqfebeAdT073" + }, + "outputs": [], + "source": [ + "if False: # Change to True to save to GGUF\n", + " model.save_pretrained_gguf(\n", + " \"gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q974YEVPI7JS" + }, + "source": [ + "Likewise, if you instead want to push the GGUF to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "id": "ZgcJIhJ0I_es" + }, + "outputs": [], + "source": [ + "if False: # Change to True to upload GGUF\n", + " model.push_to_hub_gguf(\n", + " \"HF_ACCOUNT/gemma_4_finetune\",\n", + " tokenizer,\n", + " quantization_method = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n", + " token = \"YOUR_HF_TOKEN\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnz9QOYTMvbH" + }, + "source": [ + "Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp.\n", + "\n", + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1aed93f4ca3b45b5b194aaaba43768ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_884a0c3267f743198b1cc2e51eabbd87", + "IPY_MODEL_22ecfe66b78f4470877558859de40608", + "IPY_MODEL_43caa59b7fa94ee39f6993007be26a6e" + ], + "layout": "IPY_MODEL_1a677ab08bec40cbaea059912e3f148d" + } + }, + "884a0c3267f743198b1cc2e51eabbd87": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7cc26327d62b4cd1b18a301df184e544", + "placeholder": "​", + "style": "IPY_MODEL_6cccb64fbc6c46afb29801e6c5f2b8d1", + "value": "model.safetensors: 100%" + 
} + }, + "22ecfe66b78f4470877558859de40608": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_671a5973e5504d81a09e0ee30db56a85", + "max": 15992595884, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8a379ddf368842b39c01e94b394b7dd1", + "value": 15992595884 + } + }, + "43caa59b7fa94ee39f6993007be26a6e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d5e2df92e9de4dc5a274a545d4b732f1", + "placeholder": "​", + "style": "IPY_MODEL_02d78160f48b4145830fc3c61b7cffd1", + "value": " 16.0G/16.0G [05:22<00:00, 116MB/s]" + } + }, + "1a677ab08bec40cbaea059912e3f148d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, 
+ "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7cc26327d62b4cd1b18a301df184e544": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cccb64fbc6c46afb29801e6c5f2b8d1": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "671a5973e5504d81a09e0ee30db56a85": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8a379ddf368842b39c01e94b394b7dd1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d5e2df92e9de4dc5a274a545d4b732f1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "02d78160f48b4145830fc3c61b7cffd1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1a60091e9bb54eb3af50e055b0fafd1f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e050c9d3e05643ef9657216f814962b1", + "IPY_MODEL_9b0fceb9931a4bc5978fcae33a6ff811", + "IPY_MODEL_2947239108e34a149ef13a84e0b00e3d" + ], + "layout": "IPY_MODEL_99e7b18e3811427aab73f1afc8bfb954" + } + }, + "e050c9d3e05643ef9657216f814962b1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_62aab9454a884fbea96de59255620f87", + "placeholder": "​", + "style": "IPY_MODEL_c9975a7e25d54a39a0c5547245fa6178", + "value": "Loading weights: 100%" + } + }, + "9b0fceb9931a4bc5978fcae33a6ff811": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5f089afd43884d14b78a8238cb548f37", + "max": 2130, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_968e200f35b44909ba35df31d35e4bf7", + "value": 2130 + } + }, + "2947239108e34a149ef13a84e0b00e3d": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1fd5bf7fbf6744d39f1de3de1b2194f4", + "placeholder": "​", + "style": "IPY_MODEL_ddb0c72cb0af468dae7916bdf6a06f24", + "value": " 2130/2130 [01:07<00:00, 407.11it/s]" + } + }, + "99e7b18e3811427aab73f1afc8bfb954": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "62aab9454a884fbea96de59255620f87": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c9975a7e25d54a39a0c5547245fa6178": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5f089afd43884d14b78a8238cb548f37": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", 
+ "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "968e200f35b44909ba35df31d35e4bf7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1fd5bf7fbf6744d39f1de3de1b2194f4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ddb0c72cb0af468dae7916bdf6a06f24": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8aee416b9f5e421aab44ff0eb87f5323": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_bc8f33128c8845a9bbf7766a3cf85db6", + "IPY_MODEL_99c78071240d4a448b02a0f5201cc00c", + "IPY_MODEL_5d3f6e5f5b7f46d59cfc1b0d319b5aca" + ], + "layout": "IPY_MODEL_63319aaee4cb4fb1b97972b62442815a" + } + }, + "bc8f33128c8845a9bbf7766a3cf85db6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7d3fa936cd1d4ef189321a9a4df6d9a2", + "placeholder": "​", + "style": "IPY_MODEL_acafd4bfe22b47f895909f1f5e30803a", + "value": "generation_config.json: 100%" + } + }, + "99c78071240d4a448b02a0f5201cc00c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_724487122781495d82fa21f066b03557", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3717648cbcba4a76abfe9114906be386", + "value": 208 + } + }, + "5d3f6e5f5b7f46d59cfc1b0d319b5aca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0f4af67f55234ad88612edbad914041f", + "placeholder": "​", + "style": "IPY_MODEL_eb8e4d5a374b44f0be18fb7baec7051e", + "value": " 208/208 [00:00<00:00, 22.2kB/s]" + } + }, + "63319aaee4cb4fb1b97972b62442815a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7d3fa936cd1d4ef189321a9a4df6d9a2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "acafd4bfe22b47f895909f1f5e30803a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "724487122781495d82fa21f066b03557": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "3717648cbcba4a76abfe9114906be386": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0f4af67f55234ad88612edbad914041f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eb8e4d5a374b44f0be18fb7baec7051e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ca880e2479834f3c98a65a2258e787bd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9609898077d646f7a44c4233a9ba9932", + "IPY_MODEL_817f1bdb1dac432c93d868bea15f31fa", + "IPY_MODEL_f4104b3429a74af889509ab26e494fac" + ], + "layout": "IPY_MODEL_4d1588e4d6d04ec4885c38e161071463" + } + }, + "9609898077d646f7a44c4233a9ba9932": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_db4d07a9db714f6f93d922882a410e4d", + "placeholder": "​", + "style": "IPY_MODEL_fb04cb323b934cde808ba8fba87e66bc", + "value": "processor_config.json: " + } + }, + "817f1bdb1dac432c93d868bea15f31fa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0e5806ad42b64bbda0f9f8e426e382c8", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c56375c628ab44fd95f80fae39ebbd55", + "value": 1 + } + }, + "f4104b3429a74af889509ab26e494fac": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_42bf4041d37540089e80a31afee69b12", + "placeholder": "​", + "style": "IPY_MODEL_2e77264ff5764d658fd0d8f03329a3e2", + "value": " 1.69k/? [00:00<00:00, 117kB/s]" + } + }, + "4d1588e4d6d04ec4885c38e161071463": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "db4d07a9db714f6f93d922882a410e4d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fb04cb323b934cde808ba8fba87e66bc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0e5806ad42b64bbda0f9f8e426e382c8": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "c56375c628ab44fd95f80fae39ebbd55": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "42bf4041d37540089e80a31afee69b12": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2e77264ff5764d658fd0d8f03329a3e2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "694f196a866f47b48024d21e4cb36610": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8bf349f444c648a3b193d89b5ca48cc5", + "IPY_MODEL_7382b653f8db451d809d65159b10e69f", + 
"IPY_MODEL_924b0f616af04e0c9f8bb43e2ad4af11" + ], + "layout": "IPY_MODEL_409155dc6ebf4736ae18034fe1b53b26" + } + }, + "8bf349f444c648a3b193d89b5ca48cc5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0a3c799cf0f843d59681f4a7106ab74b", + "placeholder": "​", + "style": "IPY_MODEL_d6c1b7fb52e2481fa12ecc6735225717", + "value": "chat_template.jinja: " + } + }, + "7382b653f8db451d809d65159b10e69f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_397cbf2ac3fb49e5ad99e063423c4ef0", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a5be697d40af4a3b890f9eb1864ee815", + "value": 1 + } + }, + "924b0f616af04e0c9f8bb43e2ad4af11": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_4fa8b5cfa38247549a3a21d99e7425d9", + "placeholder": "​", + "style": "IPY_MODEL_5a0f4872c23e411fa90876d6e8fc6724", + "value": " 11.9k/? [00:00<00:00, 1.03MB/s]" + } + }, + "409155dc6ebf4736ae18034fe1b53b26": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0a3c799cf0f843d59681f4a7106ab74b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d6c1b7fb52e2481fa12ecc6735225717": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "397cbf2ac3fb49e5ad99e063423c4ef0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "a5be697d40af4a3b890f9eb1864ee815": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4fa8b5cfa38247549a3a21d99e7425d9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": 
null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5a0f4872c23e411fa90876d6e8fc6724": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d466fd0b0e6641769ac60088978f31ca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_56b3b3f6e9344dc5bcf6f3fea4cefe40", + "IPY_MODEL_275082e9212941399f35de978e507c5b", + "IPY_MODEL_1f84016528874323b222908c7ac9c484" + ], + "layout": "IPY_MODEL_766f1105af5d4e298189d47e3288593a" + } + }, + "56b3b3f6e9344dc5bcf6f3fea4cefe40": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76ac77bfac124cd3a16bb28385973b71", + "placeholder": "​", + "style": "IPY_MODEL_251468bed4f44c6296194fb80eda1f55", + "value": "tokenizer_config.json: " + } + }, + "275082e9212941399f35de978e507c5b": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c3636672661443abb33d2a338c6ff82c", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ebffe47f7cc14b67b32f4bd869571260", + "value": 1 + } + }, + "1f84016528874323b222908c7ac9c484": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8f6ef19f400e4adc832e018838e8bd43", + "placeholder": "​", + "style": "IPY_MODEL_751202361c2046a6993d88bb454a21c7", + "value": " 14.9k/? 
[00:00<00:00, 619kB/s]" + } + }, + "766f1105af5d4e298189d47e3288593a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "76ac77bfac124cd3a16bb28385973b71": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "251468bed4f44c6296194fb80eda1f55": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c3636672661443abb33d2a338c6ff82c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "ebffe47f7cc14b67b32f4bd869571260": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8f6ef19f400e4adc832e018838e8bd43": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"751202361c2046a6993d88bb454a21c7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4c07a504c4b84880a3971ef5863dbfab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4579e185e6db4a4fbda6d357918a30cd", + "IPY_MODEL_984f24f050514ff9808c0cc0fbf25b55", + "IPY_MODEL_9510f7509289400e9b3b628bc8fa8962" + ], + "layout": "IPY_MODEL_742783818d3c41e396cf4fdffa041d7c" + } + }, + "4579e185e6db4a4fbda6d357918a30cd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e855f68299b840888c94c565ce347b29", + "placeholder": "​", + "style": "IPY_MODEL_e39e03b3cf5e40d2807391f4959d4e42", + "value": "tokenizer.json: 100%" + } + }, + "984f24f050514ff9808c0cc0fbf25b55": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1ab5e182e89e46dfa26cbabfbc5b522a", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b0776300bf984578981a4a2c88525ada", + "value": 32169626 + } + }, + "9510f7509289400e9b3b628bc8fa8962": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8a532284dde448e78079dac3b8a29ad9", + "placeholder": "​", + "style": "IPY_MODEL_0429d1b4a634476695d10a0eb8b4a6c9", + "value": " 32.2M/32.2M [00:00<00:00, 161MB/s]" + } + }, + "742783818d3c41e396cf4fdffa041d7c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e855f68299b840888c94c565ce347b29": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e39e03b3cf5e40d2807391f4959d4e42": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1ab5e182e89e46dfa26cbabfbc5b522a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b0776300bf984578981a4a2c88525ada": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"8a532284dde448e78079dac3b8a29ad9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0429d1b4a634476695d10a0eb8b4a6c9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fabccf89d3904b678111fb12a8c7919f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_496290c588ad4736906e89aa3f80b713", + "IPY_MODEL_3cc36b868a834c19935e1613b114388a", + "IPY_MODEL_6d5b8376f62f4b76b907312714bab547" + ], + "layout": "IPY_MODEL_13b8da7ffbd04a48a47e72d631825226" + } + }, + "496290c588ad4736906e89aa3f80b713": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_73d6158b09a9446d8ce764d75bcaa653", + "placeholder": "​", + "style": "IPY_MODEL_eec6c1a9c6d64f8ea02c0a3b35272cbe", + "value": "README.md: 100%" + } + }, + "3cc36b868a834c19935e1613b114388a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0a8ea69380804347b38b95054f9a1687", + "max": 982, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_dca6ee8c28a0441aba2fb7bdcb806246", + "value": 982 + } + }, + "6d5b8376f62f4b76b907312714bab547": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4d8f3cd8f0ef4c86b987532df2c14a9b", + "placeholder": "​", + "style": "IPY_MODEL_1d4052c66e414ee89cb7454e3faac18c", + "value": " 982/982 [00:00<00:00, 102kB/s]" + } + }, + "13b8da7ffbd04a48a47e72d631825226": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "73d6158b09a9446d8ce764d75bcaa653": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eec6c1a9c6d64f8ea02c0a3b35272cbe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0a8ea69380804347b38b95054f9a1687": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dca6ee8c28a0441aba2fb7bdcb806246": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4d8f3cd8f0ef4c86b987532df2c14a9b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1d4052c66e414ee89cb7454e3faac18c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e092b15159864c2eb45b15ff8dbcf659": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d55e6ebfa3fb44d1b2e79c4848e83227", + "IPY_MODEL_78baf58aa7c9410e9ebc74c04948f99c", + "IPY_MODEL_cadca9302ab3450e92a1971c8a9528e7" + ], + "layout": "IPY_MODEL_ae7ddb3de6734906ac85729bf01709e7" + } + }, + "d55e6ebfa3fb44d1b2e79c4848e83227": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d9b7ac1ff506461eaa9da424a8430c63", + "placeholder": "​", + "style": "IPY_MODEL_d0d9352121134f518ca4d9f1ae1d08a8", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "78baf58aa7c9410e9ebc74c04948f99c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a0a01b6854194da0af7d2ddc96856945", + "max": 116531415, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a732bbffcaf34403869a6d86c1dd4871", + "value": 116531415 + } + }, + "cadca9302ab3450e92a1971c8a9528e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_10abfe1311914dbaa8f697a82fc567a3", + "placeholder": "​", + "style": "IPY_MODEL_7fbaf07e1b0e4fa0acea9a5d4b92559f", + "value": " 117M/117M [00:01<00:00, 582MB/s]" + } + }, + "ae7ddb3de6734906ac85729bf01709e7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d9b7ac1ff506461eaa9da424a8430c63": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d0d9352121134f518ca4d9f1ae1d08a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a0a01b6854194da0af7d2ddc96856945": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a732bbffcaf34403869a6d86c1dd4871": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "10abfe1311914dbaa8f697a82fc567a3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7fbaf07e1b0e4fa0acea9a5d4b92559f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1503befd068143e992db7bcdca36557f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2fb84661a76e49f0912e36917adea52e", + "IPY_MODEL_13d21d9087824744a6056f867dc4ff11", + "IPY_MODEL_418e3b94a5a34882a6794e7092a89067" + ], + "layout": "IPY_MODEL_1d8d427198a946e8a2f364662da03b9e" + } + }, + "2fb84661a76e49f0912e36917adea52e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76ee2c7cf99e44d7b403a328c5c85357", + "placeholder": "​", + "style": "IPY_MODEL_92ffae8e6c1447f8aea99ffae371f053", + "value": "Generating train split: 100%" + } + }, + "13d21d9087824744a6056f867dc4ff11": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_496535fa240f4cdebac514965d928b23", + "max": 100000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_631c80e16bd24dd4861b5446e4fa3741", + "value": 100000 + } + }, + "418e3b94a5a34882a6794e7092a89067": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4856020cb14e471bab5619270d0d2dac", + "placeholder": "​", + "style": "IPY_MODEL_d2a6f7596d4343878a22f791613a3a34", + "value": " 100000/100000 [00:02<00:00, 45853.04 examples/s]" + } + }, + "1d8d427198a946e8a2f364662da03b9e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "76ee2c7cf99e44d7b403a328c5c85357": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "92ffae8e6c1447f8aea99ffae371f053": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "496535fa240f4cdebac514965d928b23": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "631c80e16bd24dd4861b5446e4fa3741": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4856020cb14e471bab5619270d0d2dac": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d2a6f7596d4343878a22f791613a3a34": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f36a1f82456f457dbad01cb6b9d017fd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e77faea50b004da7bdc6023d39d651e7", + "IPY_MODEL_5f79747fb52b4fbabe01594958a0b2dd", + "IPY_MODEL_85a88eee278242438c435778fdec141b" + ], + "layout": "IPY_MODEL_1a5b89f465d547a8a7d2549fbdbdeb1b" + } + }, + 
"e77faea50b004da7bdc6023d39d651e7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_af99ca2c778b41259998a8ac3f6026bb", + "placeholder": "​", + "style": "IPY_MODEL_79ab23a7e9dd4735aa91f862895191bd", + "value": "Unsloth: Standardizing formats (num_proc=6): 100%" + } + }, + "5f79747fb52b4fbabe01594958a0b2dd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ef5d97eb07744384a11a9b4b1ad85cca", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_524e543a3a79455885fd299bfaf789ac", + "value": 3000 + } + }, + "85a88eee278242438c435778fdec141b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b6180f25519d4334b257ea2801e8e071", + "placeholder": "​", + "style": 
"IPY_MODEL_9054b9d557404a5893f0cecd2d3c0f44", + "value": " 3000/3000 [00:01<00:00, 403.17 examples/s]" + } + }, + "1a5b89f465d547a8a7d2549fbdbdeb1b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "af99ca2c778b41259998a8ac3f6026bb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, 
+ "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "79ab23a7e9dd4735aa91f862895191bd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ef5d97eb07744384a11a9b4b1ad85cca": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "524e543a3a79455885fd299bfaf789ac": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b6180f25519d4334b257ea2801e8e071": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "9054b9d557404a5893f0cecd2d3c0f44": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fa62d625aa06430caf57fbe8ff872b43": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_34cb4d22bf9d41329551a1135ebfca63", + "IPY_MODEL_9258980fb6954200823889434a3e0a9a", + "IPY_MODEL_8f8f53d0aff74f288b505717d7e2d625" + ], + "layout": "IPY_MODEL_922fe3cc3d88403dbec0c94fae364f93" + } + }, + "34cb4d22bf9d41329551a1135ebfca63": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_de17d7225f1741aaa9a87e5689c2ada8", + "placeholder": "​", + "style": "IPY_MODEL_485e804c2a2b45b3a442219a9e73626c", + "value": "Map: 100%" + } + }, + "9258980fb6954200823889434a3e0a9a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dd0dc8a4da3f4c5897f18bde720a91bc", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_49151c4db95340608bf237e85563fa2f", + "value": 3000 + } + }, + "8f8f53d0aff74f288b505717d7e2d625": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b5c0dae215dc437fba607f57f6720632", + "placeholder": "​", + "style": "IPY_MODEL_0ca6a8f22d404f5b808769600655e54d", + "value": " 3000/3000 [00:00<00:00, 10098.84 examples/s]" + } + }, + "922fe3cc3d88403dbec0c94fae364f93": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": 
null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "de17d7225f1741aaa9a87e5689c2ada8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "485e804c2a2b45b3a442219a9e73626c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "dd0dc8a4da3f4c5897f18bde720a91bc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "49151c4db95340608bf237e85563fa2f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"b5c0dae215dc437fba607f57f6720632": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0ca6a8f22d404f5b808769600655e54d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4f5e0be98f7b48b68f9e4d0b9bce31ee": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_89193bbcf50742e3a40c98f6332a51ce", + "IPY_MODEL_6b1b3000713746a8a1e1309f3b319f3a", + "IPY_MODEL_3f108d33c1d14c1ea82c1cf4c5aba5a3" + ], + "layout": "IPY_MODEL_e89d6c56927640ddbbf956a3b6f7c951" + } + }, + "89193bbcf50742e3a40c98f6332a51ce": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_77c380ae4aa24032b80ecefced482997", + "placeholder": "​", + "style": "IPY_MODEL_3b4ebd09fa584e7a8a7f64841e4a3a22", + "value": "Unsloth: Tokenizing ["text"] (num_proc=6): 100%" + } + }, + "6b1b3000713746a8a1e1309f3b319f3a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bc6abe2b4d0b435c9ffa066c41641971", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_cd6e34ffab074155b6f6c84fad95b017", + "value": 3000 + } + }, + "3f108d33c1d14c1ea82c1cf4c5aba5a3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": 
[], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9372cb8b55424f679ed0a06820788cb5", + "placeholder": "​", + "style": "IPY_MODEL_573ce421fde44bb6b93677da04991e26", + "value": " 3000/3000 [00:44<00:00, 74.50 examples/s]" + } + }, + "e89d6c56927640ddbbf956a3b6f7c951": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "77c380ae4aa24032b80ecefced482997": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + 
"_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b4ebd09fa584e7a8a7f64841e4a3a22": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "bc6abe2b4d0b435c9ffa066c41641971": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cd6e34ffab074155b6f6c84fad95b017": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9372cb8b55424f679ed0a06820788cb5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "573ce421fde44bb6b93677da04991e26": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "89eb46be58fb46d4aca8d4677fea2153": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5a1d006bc4d2400ca95af83ebb1b91a4", + "IPY_MODEL_bc044e548e744a5eb8f91a0dda213319", + "IPY_MODEL_ebcff7a00369403c82946a1f202d7309" + ], + "layout": "IPY_MODEL_9197f25e359946ec865c2caac52f44d2" + } + }, + "5a1d006bc4d2400ca95af83ebb1b91a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", 
+ "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4cf29dfc85e84f8688c3f7b4b79c4a93", + "placeholder": "​", + "style": "IPY_MODEL_1654f87a636749118813f1bb4bdb9f66", + "value": "Map (num_proc=6): 100%" + } + }, + "bc044e548e744a5eb8f91a0dda213319": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e3e2640addfd43718e3726f1aff5e44d", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_631772c3210f494a8eb39af885b7f9d6", + "value": 3000 + } + }, + "ebcff7a00369403c82946a1f202d7309": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_813aa6ed1133462ba9f69532341767db", + "placeholder": "​", + "style": "IPY_MODEL_6cd4a68cddfc43f9a2d7462198647468", + "value": " 3000/3000 [00:02<00:00, 2030.22 examples/s]" + } + }, + "9197f25e359946ec865c2caac52f44d2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4cf29dfc85e84f8688c3f7b4b79c4a93": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1654f87a636749118813f1bb4bdb9f66": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e3e2640addfd43718e3726f1aff5e44d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "631772c3210f494a8eb39af885b7f9d6": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "813aa6ed1133462ba9f69532341767db": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cd4a68cddfc43f9a2d7462198647468": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3e5451c464e24d4d807ad3beefe977e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a44c8d07c6ab424e9a6fc02389f79d2c", + "IPY_MODEL_b3f1dfcd5af54cea900d4e441770dc1d", + "IPY_MODEL_328d934117dd45d99ba137ec7a85b320" + ], + "layout": "IPY_MODEL_28971c9285db4b86a478d96f98df6391" + } + }, + "a44c8d07c6ab424e9a6fc02389f79d2c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_338bccf244944b548ad33c657982d07f", + "placeholder": "​", + "style": "IPY_MODEL_d1b226a034d745b1a6e8bd5478aa558d", + "value": "Filter (num_proc=6): 100%" + } + }, + "b3f1dfcd5af54cea900d4e441770dc1d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_64e3e0aa4153493fb99329f5533bbe56", + "max": 3000, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_badc246c049d40f79271d3a45a98e2e1", + "value": 3000 + } + }, + "328d934117dd45d99ba137ec7a85b320": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4630936a67fe483eb6f2d5cadac8f7d8", + "placeholder": "​", + "style": "IPY_MODEL_b4a031e444ad48ddae94b76986595995", + "value": " 3000/3000 [00:04<00:00, 602.46 examples/s]" + } + }, + "28971c9285db4b86a478d96f98df6391": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + 
"overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "338bccf244944b548ad33c657982d07f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d1b226a034d745b1a6e8bd5478aa558d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "64e3e0aa4153493fb99329f5533bbe56": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "badc246c049d40f79271d3a45a98e2e1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4630936a67fe483eb6f2d5cadac8f7d8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b4a031e444ad48ddae94b76986595995": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Vision.ipynb b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Vision.ipynb new file mode 100644 index 0000000..b4ff3b6 --- /dev/null +++ b/tooling/fine-tuning/unsloth/notebooks/Gemma4_(E4B)-Vision.ipynb @@ -0,0 +1,5679 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2vQXvnUUsTzI" + }, + "source": [ + "To run this, press \"*Runtime*\" and press \"*Run all*\" on a Google Colab L4 instance!\n", + "
\n", + "\n", + "\n", + " Join Discord if you need help + ⭐ Star us on Github ⭐\n", + "
\n", + "\n", + "To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n", + "\n", + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7j01DfVgsTzJ" + }, + "source": [ + "### News" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6dT42nHksTzJ" + }, + "source": [ + "Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb)\n", + "\n", + "\n", + "\n", + "\n", + "
\"Unsloth
Train models — no code needed
\"Unsloth
Run GGUF models on Mac, Windows & Linux
\n", + "\n", + "Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)\n", + "\n", + "Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)\n", + "\n", + "New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)\n", + "\n", + "Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K7fgQkATsTzK" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "vA7IKFdUsTzK" + }, + "outputs": [], + "source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.34\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;" + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "Mp4i13PHsTzK" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"GFOEZbP7ONMs" + }, + "source": [ + "### Unsloth" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "QmUBVEnvCDJv", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380, + "referenced_widgets": [ + "99bff77887c742b188878c8375844c00", + "69e590ac68fc4967b08d7b03f8d1029c", + "4a9ba035d1384ebe9c80163ac262e4e8", + "b24ef9781b824950bec1ec219867eee7", + "3a6e1f1e42cc4ef199c7d2f727f1077f", + "56bb57f481f3478cb724c8ba1f2b6cbe", + "cc4c86caf7c0411887432e63c5f7390e", + "123af53e4ce64b43ac0039ca8fd8c093", + "cc4c9812b7f74cde97abf1e2296913ad", + "b1a460f7070744e38f66bef5c8113442", + "e827a7825283426f80d9a1e1f86019a9", + "e9caa0bb6c3f4e69b32d74144283097d", + "b2dc300bcb304343a6e719211bfa16b5", + "bbc1086d0d8d44a68e8896bba354f67a", + "a8259ae181f94a7dbe728e4fbea13b50", + "56e6d46ada2748f79cddbb862e547475", + "247d66307c12403aa152e6afaa3fb772", + "2e205311dd6f46b78d6c7f0fac475d27", + "260ff38c4cd848b2948ecaea9abb2de3", + "d22d7e4650e443fe9ce1b84520c3894e", + "16373d8ce24c4be095e520fc2f063a81", + "51ec6a2c40dd46d385befe3e4cb32dce", + "7f0f2a4a7eda4176b544a876ca46aa0c", + "ca6772730e004e2ea6941ddd5e8965be", + "2798c261bb4a449da37c73d428aa750d", + "45927d38b34747a8969ff764a170a379", + "cb22f737eb1b4d429c9c9518f570fc93", + "738c3ace1bf64104ae3befe93d700da7", + "b75576ddcc674375b650e80d32f9fa41", + "4c714057645546e5820d31b2526952ea", + "6b474dc0a25b4b13ab021bf2410444aa", + "1f6feb3ff4bf41bfa77798588dd1a047", + "27dbf6293941491f883256f5b4459d8f", + "4a974128152b4403b5b7e56dcef91cf1", + "d77aa9e72d2d4597982e3ce7c8a12747", + "6f591057c35645288199e5be61eef098", + "d177d754f6184f5181725a174bfa6f59", + "bfaaf16915b24273b45e96e6ca41d3ed", + "0c3660a4769a4962b6f73a8d7012678b", + "812dec41c5bc40f2998564edefc32f48", + "9ab59d54123049eaac41b2ca3f69e8dd", + "3a1d8ef8f52647c1a050fd1e0d67e274", + "4c79283275d142c5af38f80004cd92cf", + "69138bb51f074fac972a2a9a1b13492f", + "13540698510845829fd4792cce58ae6e", + "a4d69faae483435e8221c908875ecb24", + 
"7b3f7e7018794b49859332e2ddb7b2d9", + "80b79b9be9e6449c90e0128e87a08c7f", + "6e2a733507bf40c78f2e1af4793bc7ae", + "aa16b740ad28497f8096e7e12237674e", + "0cb1d77cf7fe4f14ae87abf2fbd5e60f", + "f047301402d945fe81700ec820533cac", + "d889d44b9db54983b4b3c401a44a71f0", + "7645f6c36b524741a6f6e0958bb0ef00", + "b48f0117b8574fdeb20b215a949eb975", + "ee5cfb4e1026483e8af39bce3177d2bb", + "20d8eb28f98a4c819a18f4babe0ecd62", + "1b1406ea56b34090b49af6d0704553c8", + "0dc0d388e53e4b59ae99b5bf4b9ace5b", + "c817149cda9d4f089e073bcbe5eca9fb", + "ec0cd0c6725d4ceeaeb9c20be5e3e7cd", + "db5f4fdbf18345ebb272a6dfadcf7d61", + "2f8c6139246e44c5a2699ef58403d450", + "0a943edc0c0d4910b9c222ec26b4040f", + "7ae7c6205f4c4717bd2678ce309d4406", + "1ca73957b7ab4f08a4fa7a7144b0adc2", + "c8e3c78a0a5a4ecdb68edfd34bf5ba57", + "6be09e28388f41c684674d4e7ade2177", + "853884f9eb58491db1f6605a940a12b8", + "1801a1c03c3448da87284353ac0e3cc2", + "89fdbc7a3ff54f618cd73e3e687050f4", + "7521c7d02e4c4dc1b21fe062055f6e81", + "cc7710b22b1a4e16a4cfb1e588bc7f58", + "58c01f28e9c94c59aa9636f33de6ecac", + "4eb1db5ea16d4c7bbd9feadad505c5f4", + "57dd8104c01c43259d3b7b7998ef3393", + "022cc22ac0d6416e87d32a400656882f" + ] + }, + "outputId": "dbb6527a-1f29-4b7d-b338-7568d7a7ae00" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", + "🦥 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.4.4: Fast Gemma4 patching. Transformers: 5.5.0.\n", + " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. 
FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors: 0%| | 0.00/16.0G [00:00\n", + "### Data Prep\n", + "We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions.\n", + "\n", + "You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "LjY75GoYUCB8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "5951a9e9c2e947c68add360a7dc5ce92", + "50b86bc5bc844b8eaad80cedd9fde48b", + "75f39de184674a5c9a1e5e91230d7c1b", + "473ad0b6598f4b7bb9a482fb43e5df9d", + "089308801ffd44d4a4910133252ac4ac", + "a699c8aa0ba1493f8acda292a7ed5baf", + "d97dc62af95d4724927c811b5d7e29ec", + "9fec8f47396045dd8db0098630bca7f9", + "decf15485d6b43b9bd9a8e2c0b716e05", + "ee418cb7012d410e9ac926454c0c07be", + "abd9f5c7c77646248fc205f392432130", + "4be36a52100f4428bdde5c677f97da30", + "9e7726767acd484ebae1afdd654a05a4", + "a31415c76913402189bc790b97a26c96", + "91d72c119af74d01ba75649e6a5373c0", + "ccae1e6a6d734ba8a10877f0e6f7ab62", + "45fba48641cf445fadc58d19b3f6c23e", + "50aef54f7eb74226a77cb441dfbe325c", + "c55cdf1c16094a1daf8d88c607fadde4", + "19a2453531bc4f6aaa93924719db78e6", + "caf7196689c34109b3533a1bb1848c8f", + "7ddcfb8760d54e9384cea9b194b54ff9", + "1a5cfa89e65b4d5eb4bad5b8384941d9", + "922e910c39484edab8e136d8cba096d6", + "a73435e48a8a455db79e8b0af1da8e30", + "d377ba70a4a34971b38dc3da00e14d8c", + "993adffff4014e6092d1ab58b7eb3785", + "76197bb01a7345398b587cac018af3c0", + 
"95014c93ccf84efb9920b90fab3280a4", + "6fd42b94fd344004874b5b30af23f735", + "0fedfdce9a5548e08523cd804bc52a89", + "791da49c566843a0931ff1fbf527ea15", + "ee6d087135cd45aeaecce5ead580e125", + "3507926c364e4d2492f44547e8be6026", + "5b63be53fc584f12bcf3f059c259fdda", + "9b3cc60d100c47a5bb6a5329ff154bdd", + "876479a04ab345f099c2302802fed738", + "93409d01a3de414baa8dfbacfd6fdb55", + "0fc1b9b7ebea4f2caf949042b361295d", + "5fb3a854d76f463bb11b56217da39fa5", + "3ca40ab02d8943bdb688392f971d2368", + "d83a9351bb084b6aa9f029163ef1347d", + "664e56863796493b8fb8978ac323ed58", + "74f0a74dc1474cd0b0ea13705665f903", + "49bae6aaa3fd4c2f9ae513c34a6224c4", + "1e420f9238664bfa8671c532c2a3cbc7", + "246458d1864d4bcebd0fd9ee8dffc9e9", + "3d93daa0c6e446e59b6fcd78a03c057e", + "72e9d3ea09714d1b868d70a85cc1732d", + "1ece85a4c59d4f52b737399c1f2751c7", + "a2e6128a027049a88094639640518b73", + "c9eab4588c46468f8a476936c7c69fa3", + "dedda5c9231042ba929049b41fbf1810", + "736b7e9afc8841ef81341d03934cb646", + "eefb235872ed4a858432d2abbc78e228" + ] + }, + "outputId": "ef8a166a-89f8-4d96-c179-f11766183203" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "README.md: 0%| | 0.00/519 [00:00" + ], + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAUAAAAAyCAIAAACib5WDAAAYrUlEQVR4Ae3cebSuUx0HcCSFEBHKlNZCQi2zureMyTwrY2UeQlSaXNdQxjSSMltFSJF5bjCzDMuilUUUIjSKotLtc+83+z73eYd7znve877n3vXuP56zn/389m/av2FP75l10qRJswzKQAMDDcyYGphtxmS7P1wPgl1/9N4nqoZ77I/4TOvA3dU+bP/5z39mnXXWPtnSgGwfNGC4lf/+9799oD1kkjOnA1N6tD9kPUwHELbZZ5/9hRde+Oc//zkd0Ol9FgteffXVsR/apyfHTP6dCT3zzDP/+Mc/ZpttTPvImGauYxuh9BdffPHvf//7v/71r46RpCNPM5Zc7uijj37729/+ne98J9m4Y7Riwete9zpPKb1jJIOOo6qBjPinP/3pt7zlLddcc82///1vBjCqFDtHzhxnpsIraP/nP//5IossMvfcc3/5y18mHTfuWMZky0022YSKf/rTnwrJSHSMTd8//vGP55133qabbvr888/Dg9uOsQ06jpIG4sCvvPLKxhtvbNzFboS48SiRGwnamTADS25i5+abbz7PPPMYg85j2yyz8F7J/IYbbrjyyiu/8pWvwPmGN7xB/uwAp0HSy7xggw02uO2222688cbXv/71HeAZdOmBBpiQYoAuv/zy973vfccee+xf//pXa6gMYg8YGAaJkXj/WOub3HjVVVfxsb/85S+m0JasI2EyqVs4gNACmD8rI0Go+9/+9jcYllpqKalYZZCBR6LPUe2blHv77bdzp2OOOQatkUzlRonV2Yfh62MelDNImBxYsJxzzjlly66wbCrO8cRjyA3DSHCK6/POO6/JM+NQHwmqbvUl0RjhpFsSTRdPBnG6UovaIFdffXWQpk7TRdsXgJlnCk3XfEyMPPPMMw8++OA55pgj8XLkahUXIMmojxwbPCxj5Hi6hWG6dtwtQmMHD5GHKDWwTOLG1JBVNdmJAzNoCaSKZezUX375ZRqXMKeM0VDHqZf8D9F0esOSaCK3ZBrfG4pjgQojUYYYkU278DxE4N5L14kDE6mLGzBU0xXtWABT33e/+13PDTfc0DOq771O21MkbFJ6F91migqHp8Z0sTez8sorW94LK1qqnAeg2tL3urgc1TXlxCcATT+Vxixrv/SlL33xi18kcmymfJ0RK00cmFTkVFQoRUmLp0kpIR3STJw4EYB6bdSHpQJ94ZycJae971IolvEoLWGpFRUILS/NdtZbbz0w7R0YqqqNtsfciuJw2yPs/PPPb33umdfhIqnBFx3W1FgDa3wFjw1jqlLdrqdtwIW3kQxxI9HGFvxPMbfJ9la+ptEzzGhnDEa2OqbVrwB8AlBspqAqFYKAIReYPAvyAjPDVZo4sB0gCVZRIbCSFk8LSzJffPHFRx11VFXdnYlNiXBKAgoqBUmhWBYepSUsFchSMTYYlknOPvtswwNh+dSqAlWxUTCtMLfq3lk7S3388cfPPffcxx57zFM9cbAzbHoRHOe23J9++umXXnqpqsamOA2f2SN31fGCCy5Ya621jOl73vOeww47jDYwEyvXF8I//elPKto1NsWWRthEgTb2oHsbDPifYm6T7a1QSaNnJIr3/vjHP77++uvBBFv5Gj/0CUBTHw6HCRNO4J3nX3311Ysttli2poItmgHTJgQU9sZOZarKwpOROO200/785z+TiiRbb7219lxGIZvbEXvsscc555zzqU99yjavFqrvQBjIFR0///nP/+hHP1I/5ZRTHJozAght3N9///2sxwbghz70IWBarrvuOpaEpQ9/+MNrrrmmLo3G6uDXIY30izfjAb4Vb4z41FNP/dnPfrbffvtB+PGPf3y++ebbZpttxo8fr2MJHK26d9wOuSNl04TDDz/8d7/7nfqee+7ZMTYdyXjkkUf+5Cc/eec73/mLX/zC0Gy22Wa01CjCZI1PmvTkk0/+8pe/3GWXXfR985v
fTN7vf//7H/vYx3JjIZzcddddAjTH/tWvfrXoooti8o1vfGNoBaA8jRcvEol23nnnueaaq7TXKm0Ggr1973vf+/3vf6+7Kclee+1lUsDfTj75ZPEI/o022shwE8ftl69//euXXnopKfQycG9605vKkGl817veteWWW7LJj370o0UDNQ7dw1lppZWeeuopA7HPPvuwE6xO1su0mqnxP6Zfw32epCI55wnHZ511ltDOJahVi+QmYwhd6txJF9qpdh96HRXAa6+9ttj/3HPPPfLIIwxF1MCAAeC6biyiwm4SSjzTgiVf0QVZJRdOEp4du/tkjKsApa6jXS5fP/GJTyDhvqtPv/nNb9T333//Wscg4WyChbH3tUa3oO19JSJLJjjHP1ZlVG7geBmTjXymhaplXZJedtll4XnxxRdPxaAo9L/ccsvtuOOOvOjZZ59997vf7RgcQKM+IQTv0yGHHLLDDjt89rOfbaUiCldCpfZE7uGHH+ZUpHjwwQchRBfMgQceqEVoMNxePR2/uf2irgswU5jqkFUNoGigkUNXA2DA6qGHHhpUnjXNuLyhsdgYoQQga2aNjUrQ2N8ydeJKHQpL5VTS4Gc+8xkmLgyLcwsvvLBP0pTrB4L9Cius4DoRJVbDvFeabVpIOAX3/x8gUfnhD38oOggWCy20EO8VbumRpnxdYIEFjAE7A3bSSSdpEZv5ueSPJXV0a0EdDOx33HGHp15VctV6SIgLBGSmKi67alx66aWdPIlQ5t4+1RhmUhnjKqqO6yxDoahUOsaDJX3vvPNOHoh/bH/kIx/BqrmMdvirmOmHxi655JI11ljD1SIC0hIM4pcn3QabRnMTWdr4Gvq3vvWtPPMb3/gG95Npq2pRhxM5OfzXv/61QSRObVAKG+bnSnmtMgbPMsssI4HDb0QwYHDNbO+77z5Ts1133dXMCLzuku26667LhUAi/Y53vKM6ZDpqXH/99QlIAzihgUYOkdPOjRWVSNGomSqH6sB4QVRU+9T/V8yVYgzUr732WiKZjJV2NxmoJucNJld8mGoCXGCGXkkYk9ZM3qTWP/zhD8zFPFY7nDBTrtF69NFHGZBZMeuha3nAJMrX6L1Kjma1szzzbXacpK2xClPqIoU1gtD+gQ98wDQMGPyeN910k8FgrCDDYWiZgIj9jJ6xYq+KVr1fBZNIe2KSoR900EHGxcycHbP18lVFAaPQKiu86KKL0pinzOYXGslykUW7GabwLUdZYpxwwgnU0qjzqAiMcN/UEoItsc9kVdGS1yoD8Gi3LYxKYUCG5IfA8pXhielm8qQIrQxEbcgCTA+rrrqqugJDjcMI8slPfjKzLTbTSjP64seTwZhpM9Hw4zmmyjRrYMKIf7feeisWb7nllrvvvptapSkyyIfcmPzmEtXEG717mr+54suAiO21FIGcEi2e4RQXPBNiWQls1sDg5QHZmO9hAAZgeLCoM49ijg888MAqq6xy/vnnW+zla0FeKtphM4VOdi3tpRLRzBsF+wMOOAB+S0H7GWhFHPN5TFpiGVrYwq1gbwpgb+zEE08kCCTgC85qvTT2poI9xexup512MuWjRiZuZWsBabBkVMo0VSnMkMhFbqJtt912SadayjhGENKR1xIJTsq3bhJhH3roIWAGSBQzP4cWTpAq3Nuc5YknngDAWyDUvVAMTkrTkqeWVApMKto5pDqHNAomQcKrqT4qcPpK/1xI+vVKapAoqtSGDKRimiYKQ2JSAGGNw3RncpEdQqWmGZ80ogIbo0L6c5/73BFHHCGC+E0LrtI3zPf9OVXjZDMAYpKJqFBN11rIwCtwPG7cOLwSTJ38kTDcgzF+VjKkVY+O8gkYczEYxYGDRCSWJ0V3eVVfc/JvfvObjqYMJPwyfHZZ9t133+OOO+4LX/jChRdeKCTrizQSQV57winEEKHKQGC04MSg2uSQMfwswTrZls/uu+8OwCcABuZtb3ubEzJgCy64oBZ2Y+FtN4Wfy8Bl1At
OkMBqIte46u5raHkaIC701a9+1RqYYWULyg6F7UZGRmNOCriW+SeVxi35m0ZSiHHBE+YJbuDwSfMkIqyT4cRuK2GE7Ir99re/pTrxNMeHNAatOAgYNiNY80yYo5xQF2jgN+J55RJKNAOVinP7448/3tTPBuS2227LA8ULjEXnrtbZEtc3XTy1e9aGLI34YQZkwUMjh5GUQoQ5GOhQwKppplBRiTmZj5jS2xtjhzbV0IW/CtbPOjlTyK8SLeD4teZJzB1/5Wtp76xiGHQ0JJnpBcl73/te29rqJkue9G5OmE/qlMWksovAS9NefTIgr1ydXYqUWE1Lgcmr++gE4beClAl8xgYMlnQxR0qMkGyNkPavfe1r4KMKXaIB7amYkVqzmX86jfAcYhEjUoYIX8CqvZZddlkugSVukKmdVzzzWxaJQ5+owrxUI0vVYrFgL5ekitcUX8kiZsW1NEZF9vzV2ToZaSD3+IVazgYy4ltZyPb2Mr1WceZrTTk8SiHLkksu6WlSE/yegVdBSLt77CuuuKLX4MzAkctXMgYe2+G8NmSxDbsndCX4gm/kMORMKEwwQ6KpZhAqpXCYfbUsuEpjwPCDVc/Sq2eVqRkYT9R08803cxgHDMYvhQOvs846+E7kBlMtGGUrLMYsDkz1kzpUpBLChWoVr6h4mj8vv/zyZGZwBttWlrgL3iskNj8TNbTo++1vf/vee+9Ni741EuU1DokKfkpjtWJEfeKlxlWwNx7yjB0UacoK3EQDP5b6ZtGcVsfddtuNXOjapDFNxXkw58kiLRmIDzJKqNJKXTt+1PGWFqR1Zzp0GzwFUqXWUn2tdTQW7BV7VvLYphbSiTgmeygCFu+8WhCChJlicxgD2GvUmOz3wQ9+UAt+ggRR4RKYpCqW6bj33nvDKeB+61vfOuOMM6IcIth6hBM8zUAVbvMsygFGdlwhIcfmNem36MQngRt152omYqeffnpMUXuKqTu00SSVouUpE9aGLEJ5hge0zPtqHOaTdiUiN9XMa5Qnj6xCt5YV9sBNdug2NlxgVLCkVFt6Vw+LnuTxFJDQLo3xyYTMptmPMICBidCWWLXCSbQQGwyFAjYMjMPqhQuFSkbXjCuoRFAbV/kUljJ711FjYPI1z+DEm2mYwTCo2tGqwqDoFYcRLSdhVtRSqB3XnHVfccUVYEgKxpxKPRRzTiNva6mh1TLcIvbbtAs/w+rb2DGTQPOO4IkaZZ6wLSzyIoJEY056yJXNqhrdjEtNRWC06CKNq8PmmWSlAieX81WYYNxa2hfTKKUpTNgz+riFsPYPGPLV/FlSZWMw4FZFPgDcOGQAHClzTnoI/005hDZ2CL6NZnyFBLBpXSFXs4G8msUIcJ7h0LNnZaqv8gEx2wzHdohAaMxsx2+//fb2QqRB2YbMykg4y3gYe0tKOPm2nQmBPCdv3BUV7oQBnOAntsXfDJhX1IOhykNa7IFRsdAen6+BFTzCpIMo2xumGPZpDJ4NNh1zrA1/8Ni/QUJ+CzbrMXt4hZ9CPV+9CjppDIeoR1EcxlpA0VcjkWF2NGJhKdBoiX0Ac6qZRWMwaLGCZYWezHerrbZq7Ii6+YJzWr7hq6M+sSZswJzb4KZOaREuiUxSLYTldeEwXz2LimQw+0BwCrLmqMJEbNT+XzKnWAY/DMSR0GT4iRMnYlhjwabiVYngk933gAPKaw0Mfqq2N2HyzOqik8DoosJgnEeYXefVXK/VkAGQ5+V27CHdnsPI1UYzIUcDZamCtyrz6mmRDLDkWVpqYKP3OtWBjahdEBZseaPCPjzFM2Yk/Fv8YCIiNeWGJK1KsZWozGUMokLoaMGpBvwFoUYUw0DpVb42pZ5GmHkFG9W3KZ/BZrQcIaDC5YRVkKw5dk/ewDgstRXkE5wk0p7kZiCrmANsciHiWNqZiEKlUYmYtuJMzgUpIcMJLcaMsW0kSCwfZDMVyQT/whaFOJFmcCGhxam7mYgAauamY45/SkcWDxIhgR8
JkcgZkhbc8gHLBI4qLOpos127wsd4mumoRstadEmXT3niXIUUOWUJq0jAKdfpZXExfvx4yIFFRrGVk/skfCRL18YoJNz0UvSqUdSiBBUxlfI65cvkR7rYTlPyFTO5hVIbspAWak211Dkw+OlyCKaVZoJQLIjG6Kcp/zDgBJineo/LVAfuAeHIb0/CJlaVXEyn2lKt02NUWW0s9TL8soHYDLKVlttQafUpqOx4Sd1GCNFwgqgKV3//+99vxc558rNv4U+7osWc0MF1+JTzJVIhI6sJXpTwYchDwjqflSiZvevldBfyfJ0wYUKtY1MZccWLINExdB2/ueSESqxZI2O1xKUo9UYkNT0kf5qFwpn56k1TTsvjabFXocQNZLEmgZjsIZ1nRgfz4T+vVYBavdbd13Thh5JwkUJ7jdXgMa/BqlHwOnkYpjDTnsOCp1Ez6W7cmRYlwNmosdDt43MaB54i8v8fRQV57wqLlCXnuP0bQ6cOwxM1FfyFh9LSvhKdZlrlWAVwGy0j52tIxP60KFUS1ZagkuWYRdWBY0kylXZTO93t96iXFakkoFfQOiHnzElQWri616RQnGBDwT8f1m7TBSHtkMfNQrexoxZgOEyJIVKvE7JI5GlV5sYbsLiiPG9qPW7cONEk3T1rRa8gLHqQyeVeYFqgsrrRXUUxkREsTByEGAwQpIZt6K+FXGMX/Gi02LH9Llx6rRHySkDbGQIWeQGHVZXpcqgvQZpqRjsM9G9kM5rhRGNjCXBj+2i3TOPAo00MfsNsDGJtXSEXnfIcWi6htyuYIQnyRgdmH77KKg5gpSavll7FgUnHyhVrBFu49i3j5NpNOE2n9XXY7hk8Zte2HtAy740U/NC61w1nZqGX4pJZtaMW3duUpvYkavA9qaZNx1afwmojWh5i1dPGsiHUN91bIW/fHiVwJDe0nJIAromfV5P/Mu2vIhwKh001E2GH6MBVir2s99qBuysbFbMM2rcZZg2cZNVoZB0TbeXAjQjtJ/G97BSwmNVWW+0HP/iB1antJROwmJ0rfmBYoQWhe7yQJJPbwdpiiy28OpwEkHNXW6lZz3Nmy85axzYOk5QSDqc4zjTzi7QP61lDUsU/LDy9AR5JpKhxOEM48Owso8eFmth0t4jaNWVS1mYcw46ousOkbiFvj4cgyDkvtafiX89Ks7m96CajzOzkUIHB7w1sQY0bN841I7f2uLcVI4/1CQZPe/5LLLGEug0Y57pSNzxuiQlJcgsAedv9qmrHNqeO1fPVHI3CUEooDkv/NSRV/NB2gLAwM6wKQpwTM61kj7/VuB06hz0TZFhSTxe4Dw48LOuZrgAAZGBWZcu365jbU2dPvNcUmjfa3zbLlYGdgbl/ixO+J8GKJtZmiSmZ51dxsjZIfHWCoovixqJ9Y3dI/TNxlipAiErq1V7qjWZaA2j1ikSrT521dx1hKzYQah+aWzn2EDkcIlgr9vrV3qfrI90Ql3HzEJj8NJ8b5ApKLT90g05zHPyTPdn2lFT90IL3urWS3+5Y9DoE4mN2m++55x5enVNZgQbPKUkXuAVmi9VPLJCRBFwCF4l05MNaIs5rnSb/1bE5Q4PW0dFAFD523ZvRzOjFhq01Jx9WuitLlpqOkdiGtSjkcTyEVJz0xmbKVX6nRz65+MH9+K09FT9b5b06ZqOlsKc7SDD+HwUk7soDCLnsqPN2MErpMqj0WAPRv3HhvWP2GKkPU+gRBkqjSKFykc0eZxiuFvABN5alMo0dzy3bcOUKR5DHUROMPWVaVyMkSb5t/mau6zKGnzG7deTQ0qzYRWsZdcKECUwhvQoVUujilrVr4QKEnze6pBH8bmXI5+qBKV0GlR5rgOsaXFsYBsJ+RI+pD5Uc5maskrjo2gAJXZDI/5SSytLeXVmS0nM9QIyAvJZIG8nJn25uaXda1vh10DKjaCDmZLhd+7Wi6foJZbf0MOOtgaUyyhUR7RtJuTJYrrPSSC3LDTWGtYaTeNFyRGQ
T2LV+u81W3QY1PXzCQCkmvT4BkHV5viQMzNfW6Cf/SBMkGM8CRpBCojQOKr3UQLzLuLij4lqru4Nu6RqjrhtYF4TqViToC56clCLNE0aJAZi5k8Hz0zxRw2Wv+G1TcubS/gFAdbXcFGzQOMY1YMTtR/h/YJZFthjVGcDY5HlWbHUhDPQDBS1nxVsqo82Febud5/woZ7RpDfD3VwPCtEVQTgT6y0l76jOwAxNM9OnNrAYhYSKHOu0VKlQPBaw9ksHXsaMBntzqhHksMDljO3CPNdizeNFjuQbkmmogk9PeZIimDAylceDAQ9HSAGaggTGqgRlvF3qMKnLA1kAD/dDAwIH7ofUBzYEGuqSBgQN3SZEDNAMN9EMDAwfuh9YHNAca6JIGBg7cJUUO0Aw00A8NDBy4H1of0BxooEsaGDhwlxQ5QDPQQD80MHDgfmh9QHOggS5pYODAXVLkAM1AA/3QwP8AGMg7qICuIqsAAAAASUVORK5CYII=\n", + "image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAyAUADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iq1/f22mWE17eSiK2gUvJIQSFUdScdhWYPF+hGGxmW/Vkv8/ZCI3PnY5O35eeOfpzQBuUVk6DrsevJfywQukNrey2iyMQRN5ZAZl9t24f8AATWtQAUUVkxeJdJm1dtLiu99yHaI7Y2KeYo3GPzMbd4GSVzkAHjigDWorOTXdMk1afSlvIzqEEXnPb87wmcbgMcjPcU7S9ZsNagefT7gTxI5jZgrABgcEcgcg8H0oAm1HULbStNudQvJBHbW0TSyueyqMmquhawdb0/7WdOv7A+YyGG+iEcnHfAJ4PY5rnPibIZtF0zRQpb+2NVtbNwoBPl797nBPI2oc9euO9dqOlAC0Vn3Gt6fbal/Z0s5+2fZ2uhCsbMxiU4LcA98DHXJqbT9RtNW06G/sJ0uLWdd0cqdGHrQBl2HiJ9U8SX+nWVkXs9Obybq9eTaPOKhvLRcHdgEbiSMZHWt6uO8Cf8AIQ8Y/wDYfl/9Ew12NABWZea5bWWsWOmSRXJmvHKRyLCfKB2O+C/TOI24GT045rTrmfEssq634eaOyvJ0tr1p5nggLqiGCVMkj/adeBz3oA6aijtRQAUVU1LU7PSLI3d7N5cQZUGFLFmYgKqqASxJIAAGTTNL1ey1iGWSzkZvJlMMqSRtG
8bjBKsrAEHBB5HIIPQ0AXqKr3t9a6db/aLydIIdyoXkOFBY4GT25IrCsvFE9341vNEGm3YtYbaKVLrygEJYyZbdu5VtoC4HUNntQB0tFYq+K9GbUxp4uyZmna2DeS/lGYDJj8zGzfgdM57da2qACiq1rqFneyTx21zFLJA5SVFYFkYEjBHUdDVnNABRVe+vbfTbKa8u5PLt4V3SPgnaO5OO1UJfE2jwaRbarJeotjdMqwTbWxIW+7jjJz29aANeiqP9rWraqmmozvctB9oZVU/u484Bb0ycgDqcH0NQw+INPutKudRs5HuYLZnWURRsXVk+8u0gHcPTrQBqUVVTUbOXS11KO4R7JofPWZTlTHjduHtjmud1Lxi1v4k0PT7GymvrTUN+65t0EiDHAw24D5Tkt146c0AdZWBa+JH/AOEsn8PahYm1nMTXFlMsm+O6iBAbHAKupPK88cgkVrTajZ297DZzXMcdxOCYo3YAvggHGevUVy2u/wDJUfCP/XpqP8oaAOyooooAKKKKACiiigDi/ilfrb+CrjTkuEiutWePT4dzAZ81wrnkjgKWJ7euKxNMtp3+Kmm6bc61Hfw6Jpsk0SpEkflyOREFwpOSEVuvIB/2q9NZFb7yg/UUBEByFUH1AoAr2Gn2ul2KWdjAsNvHnbGvQZJJ/Mkn8a8/fVvExkYi48QKCeAPD8fH/j9elUmB6CgDIj1KS08JtqV55xeC1aaXzofKc7VJOVGdp46Vw2gxx/8ACV+Hre11Y6nb2tjPeXUStGYLKVgoDgoB8zF5eHLHDMeOtejajp8Wp2TWk5YROylgpxkBg2PocYPsamit4IFdYYY4w7F2CKBuY9ScdSaAPJ9cldIJfiNpEaX11p2qzKY4Hz5toALdk4908wdcZJHBr0vw/pzaT4fsLF8GSGFRIR/E+MsfxYk/jWiqqowoA+gpelAHAa+x1P4xeFdOUhk060udRmTGR8w8pCR2wc4NdF4z1m48PeDdW1a0jWS4tbZ5I1YZG7HBI9BnJ+lYugw/2h8UvFGrNl1sobfTIXyCAdvmyAHHq6ZGeuc9q7ZlV1KsAykYIIyCKAPIdD8T6ZpnjDW7/VNfl1aWysobSGSNQ5lkJDSrGFGPmkaMKo7hh0XI1fBV3rGlweIdCOnww6mhOqafYTTEIsVxlhGWx/BJvUkcZ/OvQ0sbSMIEtoVCBQuIwNoXO0DjjGTj0zUhij8wy7F8zbt3Y5x1xn0oA5H4bLAfDdxP50kuoz3076n5qhXW63YdSoJACgKBg/dAPetS4u/FK3Mi2+kaTJCGPltJqcisy54JUQHB9sn61leBP+P/AMZf9h+X/wBEw1z3ivSlTxFGttqivq7alDqDXcoCHTLQYVlZ+6MRtVD94k8cE0AdbNqvii28vz9K0KLzHEab9YkXcx6AZg5PtVaLxJrdxftYw2vhuS8XdugXW3LjacN8vkZ4PB9K5jxvDc3Y1bVvKtZLXz4tJV5kZp4kZ0RzbgjaHLO3zc8ovPGBt6ZoWpJ47a+n0mK3062e5+yGK5UjMpBklZcbjI7AcZCqM8EnNAGwL7xac40bReOv/E2k4/8AIFU73xLremKrX9r4btVYEgz626AgdTzB0GR+dcnrtwNA8SS+PyJRp0d8+l38aJkSW21U345BKzhvc5x2q3eeEb1PCOlWml6LB9quYJRqNwkqQzRLMA0yJuXGWPy5I+ULwM4wAX/FWoXV3feGNK1Oay0yG8kmurm4imWQJ5O0xrHJIgAZiwO7AIwcetY+jardeHkuvFUt2kuk6vriW5e8IWV7UKIIplORk7lycglkG71r0mHTLabSbW0vLG3ZIo0HkOokVCFxgbuuOmaq6n4V0nWbqW41C3Nw0lo1oFkclI0bO4ovRWIOCw5wAKAMGa403VfiTdabrUtoxsbaI6fZXBH7xpA3mShW4cgKFGM7RnpurdmsTpepaprkYWRTp8USW6rg5hMrcH38wDpxiodX8HabrVjp1
leNK8Ni8bIWCu7bMYy7KWB+XkqQTk810NAHi+mwS69b+E0t9X+0ahqN6mt6jBbqn2e2Vf3p+RR8jbzGvJyxLZPp0d14zurv4eardPeWtrqNlff2beXNsS0cOZljaZeSQBG+8Z6Hr0rtLnQ7GfT7qzji+yx3Q/etaHyXb1O5cHPv71Fp/hrStKup57C0S38+3jt3ij4jKR5C/L0yA2M+gAoAXQbLRbTTIm0GKzFo8ahJbYqwkUdCXGd31JJ5Ncnq914oPi/w8X0jSllH2ny1XUpCrfuxnJ8njj2P4V1Hhzw1ZeGLS4t7IsRcTm4lJREBcgKcKiqo4UdAO571oy2VtPd291LCjT2+7ynI5TcMNj6igCGSGe/0WSC+RLeaeFkkEEhkCEgj5WIGevoK4H4ZC58Q+G9AvL2J1s9Is1gtVYFRLOq7GkI7hFGxT6lz/dNemU2ONIkCRoqKOgUYAoA47RNQg06+8a6nq8kVstvqIEkrN92BbeIpnk/3icepNZXg7U7mLxzqIurGTT7PxHH/AGlYW8xIcNHiOTcP4XZfLk29geea7L+woRr0+ppJhbqFYrq3ZAyTFD+7fnowBI9wRnoKuX1o13bOkUvkTlGWO4VAzRZGCVz3oA8xtLzTVh8PaPqF7bQ6FLf6kwWZ8RzmG5KwQ7jwV+bdg9fLA56V302gQnVdHu7XyYILAzfuY4wAwkXHGOBzz+NJP4V02bwqnh1IxHYpEsSgxpKcD1DqwJPOSRnknrzWjpunwaVpdpp1tu8i1hSCPccnaoAGT34FAHI+BZdL1wXer3Elpc6691J9oU4aS0COyRxhT8yAKB6ZJLc5pPHbNBrvhq50wtJ4gWeWOztSP3c8TKPOEhyNqgBTuGSCBgHOK3j4XsG8WJ4jYE3qQtCmERQAcZywUM3T+IkDJxWRrv8AyVLwj/156h/KGgDr5ZkggeaVgkaKWZj0AAyTUOn6haarYQ31jOk9rOu+OVOjD1FWaOlABRRRQAUUUUAFFFFABRRRQAUUUUAFZ+uWF3qej3FpY6lLp104BiuolDGNgQeh4IOMEdwT0rQooAyfD2hroOnyQG5e6uJ55Lm5uHUKZZXbJOBwAOAB2AArWoooAKKKKAOb0nQr/RfE2qzwSwS6Tqk32t0clZYJ9qq2OCGVgoPJG3Henz+BvDNxrY1qXRrWTUhKs4uGBLb1xg9e2B+VdDRQBzWl+C9MtGt7u7hFzqEchuGkZ28vz2JLSCPO0Nkn5tueBXSModCpzgjBwcUtFAGXD4c0iHSJtJWxiawmLNJbyZdGLHLZDE9Tz9ea1KKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigArmYdB1C78bHXtUlgFvZwyW2nW0JLFVcqXldiB8x2gbRwAOpNdNRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAH/2Q==\n" + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "dataset[2][\"image\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "lXjfJr4W6z8P", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "8cf867a2-2e4b-4d4c-9d4d-a790b17603df" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'H ^ { \\\\prime } = \\\\beta N \\\\int d 
\\\\lambda \\\\biggl \\\\{ \\\\frac { 1 } { 2 \\\\beta ^ { 2 } N ^ { 2 } } \\\\partial _ { \\\\lambda } \\\\zeta ^ { \\\\dagger } \\\\partial _ { \\\\lambda } \\\\zeta + V ( \\\\lambda ) \\\\zeta ^ { \\\\dagger } \\\\zeta \\\\biggr \\\\} \\\\ .'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "dataset[2][\"text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKHxfZua1CrS" + }, + "source": [ + "We can also render LaTeX directly in the browser!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "nPopsxAC1CrS", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "6a4ef07a-7a3f-4c40-8f94-f7c892a6c0fe" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/latex": "$\\displaystyle \\sigma ^ { \\mu } \\frac { \\lambda ^ { a } } { 2 } A _ { \\mu } ^ { a } .$" + }, + "metadata": {} + } + ], + "source": [ + "from IPython.display import display, Math, Latex\n", + "\n", + "latex = dataset[3][\"text\"]\n", + "display(Math(latex))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K9CBpiISFa6C" + }, + "source": [ + "To format the dataset, every vision fine-tuning example should follow this structure (a user turn with the instruction and image, then an assistant turn with the target text):\n", + "\n", + "```python\n", + "[\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\"type\": \"text\",\n", + " \"text\": sample[\"text\"]},\n", + " ],\n", + " },\n", + "]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "oPXzJZzHEgXe" + }, + "outputs": [], + "source": [ + "instruction = \"Write the LaTeX representation 
for this image.\"\n", + "\n", + "def convert_to_conversation(sample):\n", + " conversation = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": instruction},\n", + " {\"type\": \"image\", \"image\": sample[\"image\"]},\n", + " ],\n", + " },\n", + " {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": sample[\"text\"]}]},\n", + " ]\n", + " return {\"messages\": conversation}\n", + "pass" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FY-9u-OD6_gE" + }, + "source": [ + "Let's convert the dataset into the \"correct\" format for finetuning:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "gFW2qXIr7Ezy" + }, + "outputs": [], + "source": [ + "converted_dataset = [convert_to_conversation(sample) for sample in dataset]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ndDUB23CGAC5" + }, + "source": [ + "The first example is now structured as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "gGFzmplrEy9I", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0033ce5a-b6a2-4031-e333-febdc8ee37ff" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'messages': [{'role': 'user',\n", + " 'content': [{'type': 'text',\n", + " 'text': 'Write the LaTeX representation for this image.'},\n", + " {'type': 'image',\n", + " 'image': }]},\n", + " {'role': 'assistant',\n", + " 'content': [{'type': 'text',\n", + " 'text': '{ \\\\frac { N } { M } } \\\\in { \\\\bf Z } , { \\\\frac { M } { P } } \\\\in { \\\\bf Z } , { \\\\frac { P } { Q } } \\\\in { \\\\bf Z }'}]}]}" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "converted_dataset[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MsRPBIb0JJ6c" + }, + "source": [ + "Let's take the Gemma 4 instruction chat template and apply it to our base model." + ] + 
}, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "exoDVEvmJN-6" + }, + "outputs": [], + "source": [ + "from unsloth import get_chat_template\n", + "\n", + "processor = get_chat_template(\n", + " processor,\n", + " \"gemma-4\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FecKS-dA82f5" + }, + "source": [ + "Before fine-tuning, let us evaluate the base model's performance so we have a baseline to compare against after training." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "vcat4UxA81vr", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1184ad38-de99-49bd-d623-48b0892e8daf" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "```latex\n", + "H' = \\beta N \\int d\\lambda \\left\\{ \\frac{1}{2\\beta^2 N^2} \\partial_\\lambda \\zeta^\\dagger \\partial_\\lambda \\zeta + V(\\lambda) \\zeta^\\dagger \\zeta \\right\\}\n", + "```\n" + ] + } + ], + "source": [ + "image = dataset[2][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FeAiMlQ71CrS" + }, + "source": [ 
+ "The base model already recovers the formula, but not in the dataset's target format: it wraps the answer in a Markdown code fence and uses a different LaTeX style (`\\left\\{` and `H'` instead of the dataset's spaced tokens such as `\\biggl \\{` and `H ^ { \\prime }`). Fine-tuning should bring the output in line with the training data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "idAEIeSQ3xdS" + }, + "source": [ + "\n", + "### Train the model\n", + "Now let's train our model. We run only 60 steps to keep this demo fast; for a full run, set `num_train_epochs=1` and `max_steps=None` to remove the step cap. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!\n", + "\n", + "We use our new `UnslothVisionDataCollator`, which helps with our vision finetuning setup." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "95_Nn-89DhsL", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ecdc9c7a-7177-45a5-cdee-77f06ec9842f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "warmup_ratio is deprecated and will be removed in v5.2. 
Use `warmup_steps` instead.\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Model does not have a default image size - using 512\n" + ] + } + ], + "source": [ + "from unsloth.trainer import UnslothVisionDataCollator\n", + "from trl import SFTTrainer, SFTConfig\n", + "\n", + "trainer = SFTTrainer(\n", + " model = model,\n", + " train_dataset = converted_dataset,\n", + " processing_class = processor.tokenizer,\n", + " data_collator = UnslothVisionDataCollator(model, processor),\n", + " args = SFTConfig(\n", + " per_device_train_batch_size = 1,\n", + " gradient_accumulation_steps = 4,\n", + " max_grad_norm = 0.3,\n", + " warmup_ratio = 0.03,\n", + " max_steps = 60,\n", + " # num_train_epochs = 2, # Set this instead of max_steps for full training runs\n", + " learning_rate = 2e-4,\n", + " logging_steps = 1,\n", + " save_strategy = \"steps\",\n", + " optim = \"adamw_8bit\",\n", + " weight_decay = 0.001,\n", + " lr_scheduler_type = \"cosine\",\n", + " seed = 3407,\n", + " output_dir = \"outputs\",\n", + " report_to = \"none\", # For Weights and Biases or others\n", + "\n", + " # You 
MUST put the below items for vision finetuning:\n", + " remove_unused_columns = False,\n", + " dataset_text_field = \"\",\n", + " dataset_kwargs = {\"skip_prepare_dataset\": True},\n", + " max_length = 2048,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "cellView": "form", + "id": "2ejIt2xSNKKp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1e46dd33-1258-4978-abce-4a0cd70c28ab" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "GPU = Tesla T4. Max memory = 14.563 GB.\n", + "10.182 GB of memory reserved.\n" + ] + } + ], + "source": [ + "# @title Show current memory stats\n", + "gpu_stats = torch.cuda.get_device_properties(0)\n", + "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", + "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", + "print(f\"{start_gpu_memory} GB of memory reserved.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "yqxqAZ7KJ4oL", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "1611d77d-130b-4509-f49d-da4422a7a0e9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. 
Updated tokens: {'bos_token_id': 2}.\n", + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 68,686 | Num Epochs = 1 | Total steps = 60\n", + "O^O/ \\_/ \\ Batch size per device = 1 | Gradient accumulation steps = 4\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4\n", + " \"-____-\" Trainable parameters = 82,444,288 of 8,078,600,736 (1.02% trained)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 05:25, Epoch 0/1]\n", + "
Step  Training Loss
1     14.862482
2     15.983265
3     15.897610
4     16.153982
5     16.364323
6     12.335323
7     6.774734
8     5.239524
9     4.966103
10    3.271874
11    3.070101
12    2.938359
13    2.469718
14    2.040107
15    1.912474
16    2.391622
17    2.170793
18    1.760992
19    1.899228
20    1.493368
21    1.680966
22    1.567590
23    1.590823
24    1.522763
25    1.543306
26    1.648688
27    1.278674
28    1.528238
29    1.577930
30    1.366876
31    1.197964
32    1.834775
33    1.473532
34    1.572434
35    1.176874
36    1.155894
37    1.070827
38    1.249489
39    1.477776
40    1.618961
41    1.207805
42    0.868922
43    1.415766
44    1.279130
45    1.078420
46    1.076866
47    1.207423
48    1.021715
49    1.184945
50    1.114685
51    1.116125
52    1.270778
53    0.977663
54    1.013628
55    1.137954
56    1.791641
57    0.879617
58    1.045508
59    1.497848
60    1.182628

" + ] + }, + "metadata": {} + } + ], + "source": [ + "trainer_stats = trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "cellView": "form", + "id": "pCqnaKmlO1U9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "308b4484-426c-4d82-b8b2-4f40918425d9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "353.8392 seconds used for training.\n", + "5.9 minutes used for training.\n", + "Peak reserved memory = 10.777 GB.\n", + "Peak reserved memory for training = 0.595 GB.\n", + "Peak reserved memory % of max memory = 74.003 %.\n", + "Peak reserved memory for training % of max memory = 4.086 %.\n" + ] + } + ], + "source": [ + "# @title Show final memory and time stats\n", + "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", + "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", + "used_percentage = round(used_memory / max_memory * 100, 3)\n", + "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", + "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", + "print(\n", + " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", + ")\n", + "print(f\"Peak reserved memory = {used_memory} GB.\")\n", + "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", + "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", + "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ekOmTR1hSNcr" + }, + "source": [ + "\n", + "### Inference\n", + "Let's run the model! You can modify the instruction and input—just leave the output blank.\n", + "\n", + "We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "kR3gIAX-SM2q", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "79616e26-3a56-41d0-8306-06889f63b9fc" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\\[\\left[[B_n^{\\pm}, b_2^{\\mp}], b_2^{\\pm}\\right] = nB_n^{\\pm}, \\quad \\left[[B_n^{\\mp}, b_2^{\\pm}], b_2^{\\mp}\\right] = nB_n^{\\mp}.\\]\n" + ] + } + ], + "source": [ + "image = dataset[10][\"image\"]\n", + "instruction = \"Write the LaTeX representation for this image.\"\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": instruction}],\n", + " }\n", + "]\n", + "\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor, skip_prompt = True)\n", + "result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMuVrWbjAzhc" + }, + "source": [ + "\n", + "### Saving, loading finetuned models\n", + "To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage.\n", + "\n", + "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "upcOlWe7A1vc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "808a9a7d-0e26-4635-c918-41f0525310eb" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['gemma_4_lora/processor_config.json']" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "model.save_pretrained(\"gemma_4_lora\") # Local saving\n", + "processor.save_pretrained(\"gemma_4_lora\")\n", + "# model.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving\n", + "# processor.push_to_hub(\"your_name/gemma_4_lora\", token = \"YOUR_HF_TOKEN\") # Online saving" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEEcJ4qfC7Lp" + }, + "source": [ + "To load the LoRA adapters we just saved for inference, change `False` to `True`:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "MKX_XKs_BNZR", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5b7d5dfd-34b8-45e0-ef9e-1378d260fd7a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "This equation appears to be from a field of physics, likely **theoretical physics**, specifically **electrodynamics**, **gauge theories**, or **relativistic field theory**, given the use of Greek letters ($\\mu, \\alpha, \\beta$), tensor notation ($D^\\alpha_\\mu \\tilde{A}^\\alpha_\\mu$), and the structure of the equation.\n", + "\n", + "Here is a breakdown of the components and the likely physical meaning:\n", + "\n", + "### Breakdown of the Equation\n", + "\n", + "$$D^\\alpha_\\mu \\tilde{A}^\\alpha_\\mu = 0$$\n", + "\n", + "1. 
**$D^\\alpha_\\mu$ (Covariant Derivative or Differential\n" + ] + } + ], + "source": [ + "if False:\n", + " from unsloth import FastVisionModel\n", + "\n", + " model, processor = FastVisionModel.from_pretrained(\n", + " model_name = \"gemma_4_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", + " load_in_4bit = True, # Set to False for 16bit LoRA\n", + " )\n", + "\n", + "sample = dataset[1]\n", + "image = sample[\"image\"].convert(\"RGB\")\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": sample[\"text\"],\n", + " },\n", + " {\n", + " \"type\": \"image\",\n", + " },\n", + " ],\n", + " },\n", + "]\n", + "input_text = processor.apply_chat_template(messages, add_generation_prompt = True)\n", + "inputs = processor(\n", + " image,\n", + " input_text,\n", + " add_special_tokens = False,\n", + " return_tensors = \"pt\",\n", + ").to(\"cuda\")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True)\n", + "_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,\n", + " use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f422JgM9sdVT" + }, + "source": [ + "### Saving to float16 for vLLM\n", + "\n", + "We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can create a personal access token at https://huggingface.co/settings/tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "iHjt_SMYsd3P" + }, + "outputs": [], + "source": [ + "# Select ONLY 1 to save! 
(Both not needed!)\n", + "\n", + "# Save locally to 16bit\n", + "if False: model.save_pretrained_merged(\"unsloth_finetune\", processor,)\n", + "\n", + "# To export and save to your Hugging Face account\n", + "if False: model.push_to_hub_merged(\"YOUR_USERNAME/unsloth_finetune\", processor, token = \"YOUR_HF_TOKEN\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TSjNVDCYv-yr" + }, + "source": [ + "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", + "\n", + "Some other resources:\n", + "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", + "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", + "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", + "4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)!\n", + "\n", + "

\n", + " \n", + " \n", + " \n", + "\n", + " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", + "
\n", + "\n", + " This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "99bff77887c742b188878c8375844c00": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_69e590ac68fc4967b08d7b03f8d1029c", + "IPY_MODEL_4a9ba035d1384ebe9c80163ac262e4e8", + "IPY_MODEL_b24ef9781b824950bec1ec219867eee7" + ], + "layout": "IPY_MODEL_3a6e1f1e42cc4ef199c7d2f727f1077f" + } + }, + "69e590ac68fc4967b08d7b03f8d1029c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_56bb57f481f3478cb724c8ba1f2b6cbe", + "placeholder": "​", + "style": "IPY_MODEL_cc4c86caf7c0411887432e63c5f7390e", + "value": "model.safetensors: 100%" + } 
+ }, + "4a9ba035d1384ebe9c80163ac262e4e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_123af53e4ce64b43ac0039ca8fd8c093", + "max": 15992595884, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_cc4c9812b7f74cde97abf1e2296913ad", + "value": 15992595884 + } + }, + "b24ef9781b824950bec1ec219867eee7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b1a460f7070744e38f66bef5c8113442", + "placeholder": "​", + "style": "IPY_MODEL_e827a7825283426f80d9a1e1f86019a9", + "value": " 16.0G/16.0G [06:05<00:00, 102MB/s]" + } + }, + "3a6e1f1e42cc4ef199c7d2f727f1077f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + 
"grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "56bb57f481f3478cb724c8ba1f2b6cbe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cc4c86caf7c0411887432e63c5f7390e": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "123af53e4ce64b43ac0039ca8fd8c093": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cc4c9812b7f74cde97abf1e2296913ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b1a460f7070744e38f66bef5c8113442": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e827a7825283426f80d9a1e1f86019a9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e9caa0bb6c3f4e69b32d74144283097d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": 
"1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b2dc300bcb304343a6e719211bfa16b5", + "IPY_MODEL_bbc1086d0d8d44a68e8896bba354f67a", + "IPY_MODEL_a8259ae181f94a7dbe728e4fbea13b50" + ], + "layout": "IPY_MODEL_56e6d46ada2748f79cddbb862e547475" + } + }, + "b2dc300bcb304343a6e719211bfa16b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_247d66307c12403aa152e6afaa3fb772", + "placeholder": "​", + "style": "IPY_MODEL_2e205311dd6f46b78d6c7f0fac475d27", + "value": "Loading weights: 100%" + } + }, + "bbc1086d0d8d44a68e8896bba354f67a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_260ff38c4cd848b2948ecaea9abb2de3", + "max": 2130, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d22d7e4650e443fe9ce1b84520c3894e", + "value": 2130 + } + }, + "a8259ae181f94a7dbe728e4fbea13b50": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_16373d8ce24c4be095e520fc2f063a81", + "placeholder": "​", + "style": "IPY_MODEL_51ec6a2c40dd46d385befe3e4cb32dce", + "value": " 2130/2130 [01:08<00:00, 488.54it/s]" + } + }, + "56e6d46ada2748f79cddbb862e547475": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "247d66307c12403aa152e6afaa3fb772": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2e205311dd6f46b78d6c7f0fac475d27": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "260ff38c4cd848b2948ecaea9abb2de3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", 
+ "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d22d7e4650e443fe9ce1b84520c3894e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "16373d8ce24c4be095e520fc2f063a81": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "51ec6a2c40dd46d385befe3e4cb32dce": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7f0f2a4a7eda4176b544a876ca46aa0c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ca6772730e004e2ea6941ddd5e8965be", + "IPY_MODEL_2798c261bb4a449da37c73d428aa750d", + "IPY_MODEL_45927d38b34747a8969ff764a170a379" + ], + "layout": "IPY_MODEL_cb22f737eb1b4d429c9c9518f570fc93" + } + }, + "ca6772730e004e2ea6941ddd5e8965be": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_738c3ace1bf64104ae3befe93d700da7", + "placeholder": "​", + "style": "IPY_MODEL_b75576ddcc674375b650e80d32f9fa41", + "value": "generation_config.json: 100%" + } + }, + "2798c261bb4a449da37c73d428aa750d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4c714057645546e5820d31b2526952ea", + "max": 208, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_6b474dc0a25b4b13ab021bf2410444aa", + "value": 208 + } + }, + "45927d38b34747a8969ff764a170a379": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1f6feb3ff4bf41bfa77798588dd1a047", + "placeholder": "​", + "style": "IPY_MODEL_27dbf6293941491f883256f5b4459d8f", + "value": " 208/208 [00:00<00:00, 21.3kB/s]" + } + }, + "cb22f737eb1b4d429c9c9518f570fc93": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "738c3ace1bf64104ae3befe93d700da7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b75576ddcc674375b650e80d32f9fa41": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4c714057645546e5820d31b2526952ea": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "6b474dc0a25b4b13ab021bf2410444aa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1f6feb3ff4bf41bfa77798588dd1a047": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "27dbf6293941491f883256f5b4459d8f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4a974128152b4403b5b7e56dcef91cf1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d77aa9e72d2d4597982e3ce7c8a12747", + "IPY_MODEL_6f591057c35645288199e5be61eef098", + "IPY_MODEL_d177d754f6184f5181725a174bfa6f59" + ], + "layout": "IPY_MODEL_bfaaf16915b24273b45e96e6ca41d3ed" + } + }, + "d77aa9e72d2d4597982e3ce7c8a12747": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0c3660a4769a4962b6f73a8d7012678b", + "placeholder": "​", + "style": "IPY_MODEL_812dec41c5bc40f2998564edefc32f48", + "value": "processor_config.json: " + } + }, + "6f591057c35645288199e5be61eef098": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9ab59d54123049eaac41b2ca3f69e8dd", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3a1d8ef8f52647c1a050fd1e0d67e274", + "value": 1 + } + }, + "d177d754f6184f5181725a174bfa6f59": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4c79283275d142c5af38f80004cd92cf", + "placeholder": "​", + "style": "IPY_MODEL_69138bb51f074fac972a2a9a1b13492f", + "value": " 1.69k/? [00:00<00:00, 79.5kB/s]" + } + }, + "bfaaf16915b24273b45e96e6ca41d3ed": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0c3660a4769a4962b6f73a8d7012678b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "812dec41c5bc40f2998564edefc32f48": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9ab59d54123049eaac41b2ca3f69e8dd": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "3a1d8ef8f52647c1a050fd1e0d67e274": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4c79283275d142c5af38f80004cd92cf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "69138bb51f074fac972a2a9a1b13492f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "13540698510845829fd4792cce58ae6e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a4d69faae483435e8221c908875ecb24", + "IPY_MODEL_7b3f7e7018794b49859332e2ddb7b2d9", + 
"IPY_MODEL_80b79b9be9e6449c90e0128e87a08c7f" + ], + "layout": "IPY_MODEL_6e2a733507bf40c78f2e1af4793bc7ae" + } + }, + "a4d69faae483435e8221c908875ecb24": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_aa16b740ad28497f8096e7e12237674e", + "placeholder": "​", + "style": "IPY_MODEL_0cb1d77cf7fe4f14ae87abf2fbd5e60f", + "value": "chat_template.jinja: " + } + }, + "7b3f7e7018794b49859332e2ddb7b2d9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f047301402d945fe81700ec820533cac", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d889d44b9db54983b4b3c401a44a71f0", + "value": 1 + } + }, + "80b79b9be9e6449c90e0128e87a08c7f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_7645f6c36b524741a6f6e0958bb0ef00", + "placeholder": "​", + "style": "IPY_MODEL_b48f0117b8574fdeb20b215a949eb975", + "value": " 11.9k/? [00:00<00:00, 705kB/s]" + } + }, + "6e2a733507bf40c78f2e1af4793bc7ae": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa16b740ad28497f8096e7e12237674e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, 
+ "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0cb1d77cf7fe4f14ae87abf2fbd5e60f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f047301402d945fe81700ec820533cac": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "d889d44b9db54983b4b3c401a44a71f0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7645f6c36b524741a6f6e0958bb0ef00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": 
null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b48f0117b8574fdeb20b215a949eb975": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ee5cfb4e1026483e8af39bce3177d2bb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_20d8eb28f98a4c819a18f4babe0ecd62", + "IPY_MODEL_1b1406ea56b34090b49af6d0704553c8", + "IPY_MODEL_0dc0d388e53e4b59ae99b5bf4b9ace5b" + ], + "layout": "IPY_MODEL_c817149cda9d4f089e073bcbe5eca9fb" + } + }, + "20d8eb28f98a4c819a18f4babe0ecd62": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ec0cd0c6725d4ceeaeb9c20be5e3e7cd", + "placeholder": "​", + "style": "IPY_MODEL_db5f4fdbf18345ebb272a6dfadcf7d61", + "value": "tokenizer_config.json: " + } + }, + "1b1406ea56b34090b49af6d0704553c8": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2f8c6139246e44c5a2699ef58403d450", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0a943edc0c0d4910b9c222ec26b4040f", + "value": 1 + } + }, + "0dc0d388e53e4b59ae99b5bf4b9ace5b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7ae7c6205f4c4717bd2678ce309d4406", + "placeholder": "​", + "style": "IPY_MODEL_1ca73957b7ab4f08a4fa7a7144b0adc2", + "value": " 14.9k/? 
[00:00<00:00, 1.15MB/s]" + } + }, + "c817149cda9d4f089e073bcbe5eca9fb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec0cd0c6725d4ceeaeb9c20be5e3e7cd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + 
"grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "db5f4fdbf18345ebb272a6dfadcf7d61": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2f8c6139246e44c5a2699ef58403d450": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": "20px" + } + }, + "0a943edc0c0d4910b9c222ec26b4040f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7ae7c6205f4c4717bd2678ce309d4406": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"1ca73957b7ab4f08a4fa7a7144b0adc2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c8e3c78a0a5a4ecdb68edfd34bf5ba57": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6be09e28388f41c684674d4e7ade2177", + "IPY_MODEL_853884f9eb58491db1f6605a940a12b8", + "IPY_MODEL_1801a1c03c3448da87284353ac0e3cc2" + ], + "layout": "IPY_MODEL_89fdbc7a3ff54f618cd73e3e687050f4" + } + }, + "6be09e28388f41c684674d4e7ade2177": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7521c7d02e4c4dc1b21fe062055f6e81", + "placeholder": "​", + "style": "IPY_MODEL_cc7710b22b1a4e16a4cfb1e588bc7f58", + "value": "tokenizer.json: 100%" + } + }, + "853884f9eb58491db1f6605a940a12b8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_58c01f28e9c94c59aa9636f33de6ecac", + "max": 32169626, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4eb1db5ea16d4c7bbd9feadad505c5f4", + "value": 32169626 + } + }, + "1801a1c03c3448da87284353ac0e3cc2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_57dd8104c01c43259d3b7b7998ef3393", + "placeholder": "​", + "style": "IPY_MODEL_022cc22ac0d6416e87d32a400656882f", + "value": " 32.2M/32.2M [00:00<00:00, 161MB/s]" + } + }, + "89fdbc7a3ff54f618cd73e3e687050f4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7521c7d02e4c4dc1b21fe062055f6e81": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cc7710b22b1a4e16a4cfb1e588bc7f58": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "58c01f28e9c94c59aa9636f33de6ecac": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4eb1db5ea16d4c7bbd9feadad505c5f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"57dd8104c01c43259d3b7b7998ef3393": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "022cc22ac0d6416e87d32a400656882f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5951a9e9c2e947c68add360a7dc5ce92": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_50b86bc5bc844b8eaad80cedd9fde48b", + "IPY_MODEL_75f39de184674a5c9a1e5e91230d7c1b", + "IPY_MODEL_473ad0b6598f4b7bb9a482fb43e5df9d" + ], + "layout": "IPY_MODEL_089308801ffd44d4a4910133252ac4ac" + } + }, + "50b86bc5bc844b8eaad80cedd9fde48b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a699c8aa0ba1493f8acda292a7ed5baf", + "placeholder": "​", + "style": "IPY_MODEL_d97dc62af95d4724927c811b5d7e29ec", + "value": "README.md: 100%" + } + }, + "75f39de184674a5c9a1e5e91230d7c1b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9fec8f47396045dd8db0098630bca7f9", + "max": 519, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_decf15485d6b43b9bd9a8e2c0b716e05", + "value": 519 + } + }, + "473ad0b6598f4b7bb9a482fb43e5df9d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ee418cb7012d410e9ac926454c0c07be", + "placeholder": "​", + "style": "IPY_MODEL_abd9f5c7c77646248fc205f392432130", + "value": " 519/519 [00:00<00:00, 44.5kB/s]" + } + }, + "089308801ffd44d4a4910133252ac4ac": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a699c8aa0ba1493f8acda292a7ed5baf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d97dc62af95d4724927c811b5d7e29ec": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9fec8f47396045dd8db0098630bca7f9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "decf15485d6b43b9bd9a8e2c0b716e05": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ee418cb7012d410e9ac926454c0c07be": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "abd9f5c7c77646248fc205f392432130": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4be36a52100f4428bdde5c677f97da30": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9e7726767acd484ebae1afdd654a05a4", + "IPY_MODEL_a31415c76913402189bc790b97a26c96", + "IPY_MODEL_91d72c119af74d01ba75649e6a5373c0" + ], + "layout": "IPY_MODEL_ccae1e6a6d734ba8a10877f0e6f7ab62" + } + }, + "9e7726767acd484ebae1afdd654a05a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + 
"description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_45fba48641cf445fadc58d19b3f6c23e", + "placeholder": "​", + "style": "IPY_MODEL_50aef54f7eb74226a77cb441dfbe325c", + "value": "data/train-00000-of-00001.parquet: 100%" + } + }, + "a31415c76913402189bc790b97a26c96": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c55cdf1c16094a1daf8d88c607fadde4", + "max": 343805431, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_19a2453531bc4f6aaa93924719db78e6", + "value": 343805431 + } + }, + "91d72c119af74d01ba75649e6a5373c0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_caf7196689c34109b3533a1bb1848c8f", + "placeholder": "​", + "style": "IPY_MODEL_7ddcfb8760d54e9384cea9b194b54ff9", + "value": " 344M/344M [00:04<00:00, 95.2MB/s]" + } + }, + "ccae1e6a6d734ba8a10877f0e6f7ab62": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "45fba48641cf445fadc58d19b3f6c23e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "50aef54f7eb74226a77cb441dfbe325c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c55cdf1c16094a1daf8d88c607fadde4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "19a2453531bc4f6aaa93924719db78e6": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "caf7196689c34109b3533a1bb1848c8f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7ddcfb8760d54e9384cea9b194b54ff9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1a5cfa89e65b4d5eb4bad5b8384941d9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_922e910c39484edab8e136d8cba096d6", + "IPY_MODEL_a73435e48a8a455db79e8b0af1da8e30", + "IPY_MODEL_d377ba70a4a34971b38dc3da00e14d8c" + ], + "layout": "IPY_MODEL_993adffff4014e6092d1ab58b7eb3785" + } + }, + "922e910c39484edab8e136d8cba096d6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76197bb01a7345398b587cac018af3c0", + "placeholder": "​", + "style": "IPY_MODEL_95014c93ccf84efb9920b90fab3280a4", + "value": "data/test-00000-of-00001.parquet: 100%" + } + }, + "a73435e48a8a455db79e8b0af1da8e30": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + 
"layout": "IPY_MODEL_6fd42b94fd344004874b5b30af23f735", + "max": 38205016, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0fedfdce9a5548e08523cd804bc52a89", + "value": 38205016 + } + }, + "d377ba70a4a34971b38dc3da00e14d8c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_791da49c566843a0931ff1fbf527ea15", + "placeholder": "​", + "style": "IPY_MODEL_ee6d087135cd45aeaecce5ead580e125", + "value": " 38.2M/38.2M [00:00<00:00, 191MB/s]" + } + }, + "993adffff4014e6092d1ab58b7eb3785": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "76197bb01a7345398b587cac018af3c0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "95014c93ccf84efb9920b90fab3280a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6fd42b94fd344004874b5b30af23f735": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fedfdce9a5548e08523cd804bc52a89": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "791da49c566843a0931ff1fbf527ea15": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ee6d087135cd45aeaecce5ead580e125": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3507926c364e4d2492f44547e8be6026": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5b63be53fc584f12bcf3f059c259fdda", + "IPY_MODEL_9b3cc60d100c47a5bb6a5329ff154bdd", + "IPY_MODEL_876479a04ab345f099c2302802fed738" + ], + "layout": "IPY_MODEL_93409d01a3de414baa8dfbacfd6fdb55" + } + }, + 
"5b63be53fc584f12bcf3f059c259fdda": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0fc1b9b7ebea4f2caf949042b361295d", + "placeholder": "​", + "style": "IPY_MODEL_5fb3a854d76f463bb11b56217da39fa5", + "value": "Generating train split: 100%" + } + }, + "9b3cc60d100c47a5bb6a5329ff154bdd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3ca40ab02d8943bdb688392f971d2368", + "max": 68686, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d83a9351bb084b6aa9f029163ef1347d", + "value": 68686 + } + }, + "876479a04ab345f099c2302802fed738": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_664e56863796493b8fb8978ac323ed58", + "placeholder": "​", + "style": 
"IPY_MODEL_74f0a74dc1474cd0b0ea13705665f903", + "value": " 68686/68686 [00:01<00:00, 41332.68 examples/s]" + } + }, + "93409d01a3de414baa8dfbacfd6fdb55": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fc1b9b7ebea4f2caf949042b361295d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": 
null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5fb3a854d76f463bb11b56217da39fa5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3ca40ab02d8943bdb688392f971d2368": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d83a9351bb084b6aa9f029163ef1347d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "664e56863796493b8fb8978ac323ed58": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, 
+ "visibility": null, + "width": null + } + }, + "74f0a74dc1474cd0b0ea13705665f903": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "49bae6aaa3fd4c2f9ae513c34a6224c4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1e420f9238664bfa8671c532c2a3cbc7", + "IPY_MODEL_246458d1864d4bcebd0fd9ee8dffc9e9", + "IPY_MODEL_3d93daa0c6e446e59b6fcd78a03c057e" + ], + "layout": "IPY_MODEL_72e9d3ea09714d1b868d70a85cc1732d" + } + }, + "1e420f9238664bfa8671c532c2a3cbc7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1ece85a4c59d4f52b737399c1f2751c7", + "placeholder": "​", + "style": "IPY_MODEL_a2e6128a027049a88094639640518b73", + "value": "Generating test split: 100%" + } + }, + "246458d1864d4bcebd0fd9ee8dffc9e9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c9eab4588c46468f8a476936c7c69fa3", + "max": 7632, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_dedda5c9231042ba929049b41fbf1810", + "value": 7632 + } + }, + "3d93daa0c6e446e59b6fcd78a03c057e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_736b7e9afc8841ef81341d03934cb646", + "placeholder": "​", + "style": "IPY_MODEL_eefb235872ed4a858432d2abbc78e228", + "value": " 7632/7632 [00:01<00:00, 5407.71 examples/s]" + } + }, + "72e9d3ea09714d1b868d70a85cc1732d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": 
null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1ece85a4c59d4f52b737399c1f2751c7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a2e6128a027049a88094639640518b73": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c9eab4588c46468f8a476936c7c69fa3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dedda5c9231042ba929049b41fbf1810": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"736b7e9afc8841ef81341d03934cb646": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eefb235872ed4a858432d2abbc78e228": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Text.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Text.py new file mode 100644 index 
0000000..9a25fe5 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Text.py @@ -0,0 +1,512 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab A100 instance! +#
+# +# +# Join Discord if you need help + ⭐ Star us on Github ⭐ +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
Unsloth Studio Training UI (train models, no code needed) | Unsloth Studio Chat UI (run GGUF models on Mac, Windows & Linux)
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
+ +# In[3]: + + +from unsloth import FastModel +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-26B-A4B-it", + dtype = None, # None for auto detection + max_seq_length = 8192, # Choose any for long context! + load_in_4bit = True, # 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. We use Gemma 4's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + use_cache = True, + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), + ) + + +# # Gemma 4 can see images! +# +# Alt text + +# In[5]: + + +sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg" + +messages = [{ + "role" : "user", + "content": [ + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "Which films does this animal feature in?" 
} + ] +}] +# You might have to wait 1 minute for Unsloth's auto compiler +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# Let's make a poem about sloths! + +# In[6]: + + +messages = [{ + "role": "user", + "content": [{ "type" : "text", + "text" : "Write a poem about sloths." }] +}] +do_gemma_4_inference(messages) + + +# # Let's finetune Gemma 4! +# +# For now you can select whether to finetune the vision and text parts. The audio part can also be finetuned, and we're working to make it selectable as well! + +# We now add LoRA adapters so we only need to update a small number of parameters! + +# In[7]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # Turn off for just text! + finetune_language_layers = True, # Should leave on! + finetune_attention_modules = True, # Attention good for GRPO + finetune_mlp_modules = True, # Should leave on always! + + r = 8, # Larger = higher accuracy, but might overfit + lora_alpha = 8, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, +) + + +# +# ### Data Prep +# We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below: +# +# ``` +# <|turn>user +# Hello +# <|turn>model +# Hey there! +# ``` +# We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more. 
+ +# In[8]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4-thinking", +) + + +# We get the first 3000 rows of the dataset + +# In[9]: + + +from datasets import load_dataset +dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:3000]") + + +# We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes! + +# In[10]: + + +from unsloth.chat_templates import standardize_data_formats +dataset = standardize_data_formats(dataset) + + +# Let's see what row 100 looks like! + +# In[11]: + + +dataset[100] + + +# We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using `removeprefix('<bos>')` since we're finetuning. The Processor will add this token before training and the model expects only one. + +# In[12]: + + +def formatting_prompts_func(examples): + convos = examples["conversations"] + texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos] + return { "text" : texts, } + +dataset = dataset.map(formatting_prompts_func, batched = True) + + +# Let's see how the chat template did! Notice there is no `<bos>` token as the processor tokenizer will be adding one. + +# In[13]: + + +dataset[100]["text"] + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps` by setting `max_steps=None`. + +# In[14]: + + +from trl import SFTTrainer, SFTConfig +trainer = SFTTrainer( + model = model, + tokenizer = tokenizer, + train_dataset = dataset, + eval_dataset = None, # Can set up evaluation! + args = SFTConfig( + dataset_text_field = "text", + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, # Use GA to mimic batch size! 
+ warmup_steps = 5, + # num_train_epochs = 1, # Set this for 1 full training run. 
+ max_steps = 60, + learning_rate = 2e-4, # Reduce to 2e-5 for long training runs + logging_steps = 1, + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "linear", + seed = 3407, + report_to = "none", # Use TrackIO/WandB etc + ), +) + + +# We also use Unsloth's `train_on_responses_only` method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes! + +# In[15]: + + +from unsloth.chat_templates import train_on_responses_only +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) + + +# Let's verify that the instruction part is masked! Let's print the 100th row again. Notice how the sample only has a single `<bos>` as expected! + +# In[16]: + + +tokenizer.decode(trainer.train_dataset[100]["input_ids"]) + + +# Now let's print the masked out example - you should see only the answer is present: + +# In[17]: + + +tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ") + + +# In[18]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! 
+# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[19]: + + +trainer_stats = trainer.train() + + +# In[20]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." +) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[21]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4-thinking", +) +messages = [{ + "role": "user", + "content": [{ + "type" : "text", + "text" : "Continue the sequence: 1, 1, 2, 3, 5, 8,", + }] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") +outputs = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + use_cache = True, + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, +) +tokenizer.batch_decode(outputs) + + +# You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time! 
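To make the sampling settings used above (`temperature = 1.0, top_p = 0.95, top_k = 64`) more concrete, here is a minimal pure-Python sketch of how temperature scaling, top-k, and top-p (nucleus) filtering narrow a next-token distribution. It runs on a toy distribution and is an illustration only, not the code path `model.generate` uses. `filter_distribution` and the toy logits are hypothetical.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token: prob} after temperature scaling, top-k, then top-p filtering."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep only the top_k most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

toy_logits = {"blue": 4.0, "red": 2.0, "green": 1.0, "purple": -3.0}
filtered = filter_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.95)
print(filtered)
```

With a vocabulary this small, `top_k = 64` would keep every token, so the toy call uses `top_k = 3` to make the cut visible. The next token is then drawn at random from the surviving tokens according to the renormalized probabilities.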
+ +# In[22]: + + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "Why is the sky blue?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + use_cache = True, + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! + +# In[23]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# tokenizer.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, change `False` to `True`: + +# In[24]: + + +if False: + from unsloth import FastModel + model, tokenizer = FastModel.from_pretrained( + model_name = "gemma_4_lora", # The adapter folder saved after training + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! + # Recommended Gemma-4 settings! 
+ temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# ### Saving to float16 for vLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[25]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4-finetune", tokenizer) + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[26]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", tokenizer, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later! + +# In[27]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to push the GGUF to your Hugging Face account instead, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[28]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the exported GGUF file (in whichever of `Q8_0`, `BF16` or `F16` you chose above) in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. 
Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Vision.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Vision.py new file mode 100644 index 0000000..f843358 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(26B_A4B)-Vision.py @@ -0,0 +1,448 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab A100 instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
Unsloth Studio Training UI
Train models — no code needed
Unsloth Studio Chat UI
Run GGUF models on Mac, Windows & Linux
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth + +# In[3]: + + +from unsloth import FastVisionModel # FastLanguageModel for LLMs +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + 
"unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastVisionModel.from_pretrained( + "unsloth/gemma-4-26B-A4B-it", + load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA. + use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context +) + + +# We now add LoRA adapters for parameter efficient fine-tuning, allowing us to train only 1% of all model parameters efficiently. +# +# **[NEW]** We also support fine-tuning only the vision component, only the language component, or both. Additionally, you can choose to fine-tune the attention modules, the MLP layers, or both! + +# In[4]: + + +model = FastVisionModel.get_peft_model( + model, + finetune_vision_layers = True, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 32, # The larger, the higher the accuracy, but might overfit + lora_alpha = 32, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = "all-linear", # Optional now! Can specify a list if needed +) + + +# +# ### Data Prep +# We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions. +# +# You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR). 
+ +# In[5]: + + +from datasets import load_dataset +dataset = load_dataset("unsloth/LaTeX_OCR", split = "train") + + +# Let's take an overview of the dataset. We'll examine the image at index 2 and its corresponding caption. + +# In[6]: + + +dataset + + +# In[7]: + + +dataset[2]["image"] + + +# In[8]: + + +dataset[2]["text"] + + +# We can also render LaTeX directly in the browser! + +# In[9]: + + +from IPython.display import display, Math, Latex + +latex = dataset[3]["text"] +display(Math(latex)) + + +# To format the dataset, all vision fine-tuning tasks should follow this format: +# +# ```python +# [ +# { +# "role": "user", +# "content": [ +# {"type": "text", "text": instruction}, +# {"type": "image", "image": sample["image"]}, +# ], +# }, +# { +# "role": "assistant", +# "content": [ +# {"type": "text", "text": sample["text"]}, +# ], +# }, +# ] +# ``` + +# In[10]: + + +instruction = "Write the LaTeX representation for this image." + +def convert_to_conversation(sample): + conversation = [ + { + "role": "user", + "content": [ + {"type": "text", "text": instruction}, + {"type": "image", "image": sample["image"]}, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": sample["text"]}]}, + ] + return {"messages": conversation} +pass + + +# Let's convert the dataset into the "correct" format for finetuning: + +# In[11]: + + +converted_dataset = [convert_to_conversation(sample) for sample in dataset] + + +# The first example is now structured like below: + +# In[12]: + + +converted_dataset[0] + + +# Let's take the Gemma 4 instruction chat template and use it in our base model + +# In[13]: + + +from unsloth import get_chat_template + +processor = get_chat_template( + processor, + "gemma-4-thinking" +) + + +# Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before. 
+ +# In[14]: + + +image = dataset[2]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# You can see it's absolutely terrible! It doesn't follow instructions at all. + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning! +# +# We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup. 
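As a rough sanity check on the 60-step setting, the number of examples the trainer consumes is steps × per-device batch size × gradient-accumulation steps × number of devices. The sketch below assumes a single GPU and the batch size of 1 and accumulation of 4 that this script passes to `SFTConfig`. The dataset size is a hypothetical placeholder, since the row count of `unsloth/LaTeX_OCR` is not stated here, and both helper names are made up for illustration.

```python
def examples_seen(max_steps, per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of training examples consumed after max_steps optimizer steps."""
    return max_steps * per_device_batch_size * grad_accum_steps * num_devices

def fraction_of_epoch(seen, dataset_size):
    """Share of one full pass over the dataset covered by `seen` examples."""
    return seen / dataset_size

seen = examples_seen(max_steps=60, per_device_batch_size=1, grad_accum_steps=4)
hypothetical_dataset_size = 10_000  # placeholder, not the real row count
print(f"Examples seen: {seen}")
print(f"Share of one epoch: {fraction_of_epoch(seen, hypothetical_dataset_size):.1%}")
```

If the share comes out as a small fraction of the dataset, the 60-step run is best read as a smoke test of the pipeline rather than a full fine-tune.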
+ +# In[15]: + + +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = converted_dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, + max_grad_norm = 0.3, + warmup_ratio = 0.03, + max_steps = 60, + # num_train_epochs = 2, # Set this instead of max_steps for full training runs + learning_rate = 2e-4, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", # For Weights and Biases or others + + # You MUST put the below items for vision finetuning: + remove_unused_columns = False, + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 2048, + ) +) + + +# In[16]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# In[17]: + + +trainer_stats = trainer.train() + + +# In[18]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model! You can modify the instruction and input—just leave the output blank. +# +# We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`. + +# In[19]: + + +image = dataset[10]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] + +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) + +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
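The adapters are small because LoRA does not store a full update for a `d_out × d_in` weight matrix. It stores two low-rank factors of rank `r` instead, which adds `r × (d_in + d_out)` trainable parameters per adapted linear layer. Below is a minimal sketch of that count. The layer shape is hypothetical and is not Gemma 4's actual dimensions; the ranks 8 and 32 are the `r` values used in these scripts.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def full_param_count(d_in, d_out):
    """Parameters in the frozen weight matrix of the same layer (bias ignored)."""
    return d_in * d_out

d_in, d_out = 4096, 4096  # hypothetical layer shape, for illustration only
for rank in (8, 32):
    added = lora_param_count(d_in, d_out, rank)
    ratio = added / full_param_count(d_in, d_out)
    print(f"r={rank}: {added:,} adapter parameters ({ratio:.2%} of the layer)")
```

This is also why the saved adapter folder cannot be used on its own: it holds only the low-rank factors and needs the base model weights at load time.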
+ +# In[20]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[21]: + + +if False: + from unsloth import FastVisionModel + + model, processor = FastVisionModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + load_in_4bit = True, # Set to False for 16bit LoRA + ) + +sample = dataset[1] +image = sample["image"].convert("RGB") +messages = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": sample["text"], + }, + { + "type": "image", + }, + ], + }, +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True) +_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[22]: + + +# Select ONLY 1 to save! (Both not needed!) 
+ +# Save locally to 16bit +if False: model.save_pretrained_merged("unsloth_finetune", processor,) + +# To export and save to your Hugging Face account +if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", processor, token = "YOUR_HF_TOKEN") + + +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Text.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Text.py new file mode 100644 index 0000000..def34c6 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Text.py @@ -0,0 +1,513 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab A100 instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
+ +# In[3]: + + +from unsloth import FastModel +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-31B-it", + dtype = None, # None for auto detection + max_seq_length = 8192, # Choose any for long context! + load_in_4bit = True, # 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. We use Gemma 4's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + use_cache = True, + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), + ) + + +# # Gemma 4 can see images! +# +# Alt text + +# In[5]: + + +sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg" + +messages = [{ + "role" : "user", + "content": [ + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "Which films does this animal feature in?" 
} + ] +}] +# You might have to wait 1 minute for Unsloth's auto compiler +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# Let's make a poem about sloths! + +# In[6]: + + +messages = [{ + "role": "user", + "content": [{ "type" : "text", + "text" : "Write a poem about sloths." }] +}] +do_gemma_4_inference(messages) + + +# # Let's finetune Gemma 4! +# +# You can finetune the vision and text parts for now through selection - the audio part can also be finetuned - we're working to make it selectable as well! + +# We now add LoRA adapters so we only need to update a small amount of parameters! + +# In[7]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # Turn off for just text! + finetune_language_layers = True, # Should leave on! + finetune_attention_modules = True, # Attention good for GRPO + finetune_mlp_modules = True, # Should leave on always! + + r = 8, # Larger = higher accuracy, but might overfit + lora_alpha = 8, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, +) + + +# +# ### Data Prep +# We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below: +# +# ``` +# <|turn>user +# Hello +# <|turn>model +# Hey there! +# ``` +# We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more. 
+ +# In[8]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4-thinking", +) + + +# We get the first 3000 rows of the dataset + +# In[9]: + + +from datasets import load_dataset +dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:3000]") + + +# We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes! + +# In[10]: + + +from unsloth.chat_templates import standardize_data_formats +dataset = standardize_data_formats(dataset) + + +# Let's see what row 100 looks like! + +# In[11]: + + +dataset[100] + + +# We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one. + +# In[12]: + + +def formatting_prompts_func(examples): + convos = examples["conversations"] + texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos] + return { "text" : texts, } + +dataset = dataset.map(formatting_prompts_func, batched = True) + + +# Let's see how the chat template did! Notice there is no `<bos>` token as the processor tokenizer will be adding one. + +# In[13]: + + +dataset[100]["text"] + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`. + +# In[14]: + + +from trl import SFTTrainer, SFTConfig +trainer = SFTTrainer( + model = model, + tokenizer = tokenizer, + train_dataset = dataset, + eval_dataset = None, # Can set up evaluation! + args = SFTConfig( + dataset_text_field = "text", + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, # Use GA to mimic batch size! + warmup_steps = 5, + # num_train_epochs = 1, # Set this for 1 full training run. 
+ max_steps = 60, + learning_rate = 2e-4, # Reduce to 2e-5 for long training runs + logging_steps = 1, + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "linear", + seed = 3407, + report_to = "none", # Use TrackIO/WandB etc + ), +) + + +# We also use Unsloth's `train_on_responses_only` method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes! + +# In[15]: + + +from unsloth.chat_templates import train_on_responses_only +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) + + +# Let's verify that the instruction part is masked! Let's print the 100th row again. Notice how the sample only has a single `<bos>` as expected! + +# In[16]: + + +tokenizer.decode(trainer.train_dataset[100]["input_ids"]) + + +# Now let's print the masked out example - you should see only the answer is present: + +# In[17]: + + +tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ") + + +# In[18]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! 
+# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[19]: + + +trainer_stats = trainer.train() + + +# In[20]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." +) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[21]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4-thinking", +) +messages = [{ + "role": "user", + "content": [{ + "type" : "text", + "text" : "Continue the sequence: 1, 1, 2, 3, 5, 8,", + }] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") +outputs = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + use_cache = True, + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, +) +tokenizer.batch_decode(outputs) + + +# You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time! 
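To make the streaming idea concrete, here is a minimal sketch of what a streamer does during generation. It is not the `transformers` implementation: `PrintStreamer`, `toy_decode`, `fake_generate` and the tiny `VOCAB` are hypothetical names invented for illustration. The only thing assumed from the real library is the general shape of the interface, where `generate` calls `put(...)` once with the prompt, then once per new token, and finally `end()`.

```python
class PrintStreamer:
    """Toy streamer exposing put()/end(), the two hooks a generation loop calls."""

    def __init__(self, decode, skip_prompt=True):
        self.decode = decode            # function: list of token ids -> text
        self.skip_prompt = skip_prompt  # if True, the first put() (the prompt) is not printed
        self.next_is_prompt = True
        self.chunks = []                # everything printed so far, kept for inspection

    def put(self, token_ids):
        # The first call carries the prompt; drop it when skip_prompt is set.
        if self.skip_prompt and self.next_is_prompt:
            self.next_is_prompt = False
            return
        self.next_is_prompt = False
        text = self.decode(token_ids)
        self.chunks.append(text)
        print(text, end="", flush=True)  # emit immediately instead of waiting for the full output

    def end(self):
        print()                          # finish the line and reset for the next generation
        self.next_is_prompt = True


# Hypothetical vocabulary standing in for a real tokenizer.
VOCAB = {0: "Why", 1: " is", 2: " the", 3: " sky", 4: " blue", 5: "?",
         6: " Rayleigh", 7: " scattering", 8: "."}


def toy_decode(token_ids):
    return "".join(VOCAB[i] for i in token_ids)


def fake_generate(prompt_ids, new_ids, streamer):
    """Stand-in for a generation loop: push the prompt, then one token at a time."""
    streamer.put(prompt_ids)
    for tok in new_ids:
        streamer.put([tok])
    streamer.end()


streamer = PrintStreamer(toy_decode, skip_prompt=True)
fake_generate([0, 1, 2, 3, 4, 5], [6, 7, 8], streamer)
```

With `skip_prompt=True` only the newly generated tokens are printed, which is the behaviour the notebook relies on when it passes `skip_prompt = True` to `TextStreamer`.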
+ +# In[22]: + + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "Why is the sky blue?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + use_cache = True, + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! + +# In[23]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# tokenizer.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[24]: + + +if False: + from unsloth import FastModel + model, tokenizer = FastModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! 
+ use_cache = True, + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[25]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4-finetune", tokenizer) + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[26]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", tokenizer, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later! + +# In[27]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[28]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. 
Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Vision.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Vision.py new file mode 100644 index 0000000..6f46b05 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(31B)-Vision.py @@ -0,0 +1,448 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab A100 instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
Unsloth Studio Training UI
Train models — no code needed
Unsloth Studio Chat UI
Run GGUF models on Mac, Windows & Linux
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth + +# In[3]: + + +from unsloth import FastVisionModel # FastLanguageModel for LLMs +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + 
"unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastVisionModel.from_pretrained( + "unsloth/gemma-4-31B-it", + load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA. + use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context +) + + +# We now add LoRA adapters for parameter efficient fine-tuning, allowing us to train only 1% of all model parameters efficiently. +# +# **[NEW]** We also support fine-tuning only the vision component, only the language component, or both. Additionally, you can choose to fine-tune the attention modules, the MLP layers, or both! + +# In[4]: + + +model = FastVisionModel.get_peft_model( + model, + finetune_vision_layers = True, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 32, # The larger, the higher the accuracy, but might overfit + lora_alpha = 32, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = "all-linear", # Optional now! Can specify a list if needed +) + + +# +# ### Data Prep +# We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions. +# +# You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR). 
+ +# In[5]: + + +from datasets import load_dataset +dataset = load_dataset("unsloth/LaTeX_OCR", split = "train") + + +# Let's take an overview of the dataset. We'll examine the image at index 2 and its corresponding caption. + +# In[6]: + + +dataset + + +# In[7]: + + +dataset[2]["image"] + + +# In[8]: + + +dataset[2]["text"] + + +# We can also render LaTeX directly in the browser! + +# In[9]: + + +from IPython.display import display, Math, Latex + +latex = dataset[3]["text"] +display(Math(latex)) + + +# To format the dataset, all vision fine-tuning tasks should follow this format: +# +# ```python +# [ +# { +# "role": "user", +# "content": [ +# {"type": "text", "text": instruction}, +# {"type": "image", "image": sample["image"]}, +# ], +# }, +# { +# "role": "assistant", +# "content": [ +# {"type": "text", "text": sample["text"]}, +# ], +# }, +# ] +# ``` + +# In[10]: + + +instruction = "Write the LaTeX representation for this image." + +def convert_to_conversation(sample): + conversation = [ + { + "role": "user", + "content": [ + {"type": "text", "text": instruction}, + {"type": "image", "image": sample["image"]}, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": sample["text"]}]}, + ] + return {"messages": conversation} + + +# Let's convert the dataset into the "correct" format for finetuning: + +# In[11]: + + +converted_dataset = [convert_to_conversation(sample) for sample in dataset] + + +# The first example is now structured as shown below: + +# In[12]: + + +converted_dataset[0] + + +# Let's take the Gemma 4 instruction chat template and apply it to our processor + +# In[13]: + + +from unsloth import get_chat_template + +processor = get_chat_template( + processor, + "gemma-4-thinking" +) + + +# Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before.
+ +# In[14]: + + +image = dataset[2]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# You can see the output is poor! The model doesn't follow the instruction at all. + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning! +# +# We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup.
+ +# In[15]: + + +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = converted_dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, + max_grad_norm = 0.3, + warmup_ratio = 0.03, + max_steps = 60, + # num_train_epochs = 2, # Set this instead of max_steps for full training runs + learning_rate = 2e-4, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", # For Weights and Biases or others + + # You MUST put the below items for vision finetuning: + remove_unused_columns = False, + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 2048, + ) +) + + +# In[16]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# In[17]: + + +trainer_stats = trainer.train() + + +# In[18]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model! You can modify the instruction and input—just leave the output blank. +# +# We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`. + +# In[19]: + + +image = dataset[10]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] + +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) + +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
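As a rough illustration of why an adapter-only save is much smaller than the full model, the sketch below counts the extra parameters LoRA adds to a single linear layer. It uses the standard LoRA factorisation (two low-rank matrices of shapes `d_in x r` and `r x d_out`). The layer dimensions are hypothetical placeholders, not Gemma 4's actual sizes; only `r = 32` is taken from the `get_peft_model` call in this script.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by LoRA to one linear layer: A is (d_in x r), B is (r x d_out)."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only to make the arithmetic easy to follow.
d_in = d_out = 4096
r = 32  # rank used in the get_peft_model call of this script

adapter_params = lora_param_count(d_in, d_out, r)
full_params = d_in * d_out
ratio = adapter_params / full_params

print(f"LoRA parameters for this layer: {adapter_params:,}")
print(f"Frozen base weights in this layer: {full_params:,}")
print(f"Adapter size relative to the layer: {ratio:.2%}")
```

The adapter has to be loaded on top of the same base checkpoint it was trained against, which is why the merged 16-bit export is a separate step.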
+ +# In[20]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[21]: + + +if False: + from unsloth import FastVisionModel + + model, processor = FastVisionModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + load_in_4bit = True, # Set to False for 16bit LoRA + ) + +sample = dataset[1] +image = sample["image"].convert("RGB") +messages = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": sample["text"], + }, + { + "type": "image", + }, + ], + }, +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True) +_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[22]: + + +# Select ONLY 1 to save! (Both not needed!) 
+ +# Save locally to 16bit +if False: model.save_pretrained_merged("unsloth_finetune", processor,) + +# To export and save to your Hugging Face account +if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", processor, token = "YOUR_HF_TOKEN") + + +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Audio.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Audio.py new file mode 100644 index 0000000..a1bf768 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Audio.py @@ -0,0 +1,478 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
+ +# In[3]: + + +from unsloth import FastModel +import torch +from huggingface_hub import snapshot_download + +fourbit_models = [ + # Gemma 4 models + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B-it", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-E2B-it", + dtype = None, # None for auto detection + max_seq_length = 8192, # Choose any for long context! + load_in_4bit = False, # Set to True for 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. Gemma 4's recommended sampling settings are `temperature = 1.0, top_p = 0.95, top_k = 64`, but for this ASR example we use greedy decoding with `do_sample=False`. + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **processor.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + do_sample = False, + streamer = TextStreamer(processor, skip_prompt = True), + ) + + +#

Let's Evaluate Gemma 4 Baseline Performance on German Transcription

+ +# In[5]: + + +from datasets import load_dataset,Audio,concatenate_datasets + +dataset = load_dataset("kadirnar/Emilia-DE-B000000", split = "train") + +# Select a single audio sample to reserve for testing. +# This index is chosen from the full dataset before we create the smaller training split. +test_audio = dataset[7546] + +dataset = dataset.select(range(3000)) + +dataset = dataset.cast_column("audio", Audio(sampling_rate = 16000)) + + +# In[6]: + + +from IPython.display import Audio, display +print(test_audio['text']) +Audio(test_audio['audio']['array'],rate = test_audio['audio']['sampling_rate']) + + +# And the translation of the audio from German to English is: +# +# > I—I hold myself directly accountable. That much is, of course, clear: namely, that there are political interests involved in trade—in the exchange of goods—and that political influences are at play. The question is: that should not be the alternative. + +# In[7]: + + +messages = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": test_audio['audio']['array']}, + {"type": "text", "text": "Please transcribe this audio."} + ] + } +] + +do_gemma_4_inference(messages, max_new_tokens = 256) + + +#

Baseline Model Performance: 32.43% Word Error Rate (WER) for this sample!
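The notebook does not show how the WER figure was computed. A minimal sketch of the usual definition (word-level edit distance divided by the number of reference words) is given below. The function name `word_error_rate` and the two example strings are illustrative assumptions, and no text normalisation (casing, punctuation) is applied, so results may differ from whatever tooling produced the number above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up strings for illustration only (not the model's actual output).
reference = "ich rechne direkt mich an"
hypothesis = "ich rechne direkt an"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2%}")
```

Because WER is normalised by the reference length, a hypothesis with many inserted words can score above 100%.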

+ +# # Let's finetune Gemma 4! +# +# You can finetune the vision and text and audio parts + +# We now add LoRA adapters so we only need to update a small amount of parameters! + +# In[8]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 8, # The larger, the higher the accuracy, but might overfit + lora_alpha = 16, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = [ + "q_proj", "k_proj", "v_proj", "o_proj", + "gate_proj", "up_proj", "down_proj", + + # Audio layers + "post", "linear_start", "linear_end", + "embedding_projection", + "ffw_layer_1", "ffw_layer_2", + "output_proj", + ] +) + + +# +# ### Data Prep +# We adapt the `kadirnar/Emilia-DE-B000000` dataset for our German ASR task using Gemma 4 multi-modal chat format. Each audio-text pair is structured into a conversation with `system`, `user`, and `assistant` roles. The processor then converts this into the final training format: +# +# ``` +# <|turn>system +# You are an assistant that transcribes speech accurately. +# <|turn>user +# <|audio|>Please transcribe this audio. +# <|turn>model +# Ich, ich rechne direkt mich an. 
+# ``` + +# In[9]: + + +def format_intersection_data(samples: dict) -> dict[str, list]: + """Format a batch of audio/text samples to match the expected message format""" + formatted_samples = {"messages": []} + for idx in range(len(samples["audio"])): + audio = samples["audio"][idx]["array"] + label = str(samples["text"][idx]) + + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": audio}, + {"type": "text", "text": "Please transcribe this audio."} + ] + }, + { + "role": "assistant", + "content":[{"type": "text", "text": label}] + } + ] + formatted_samples["messages"].append(message) + return formatted_samples + + +# In[10]: + + +dataset = dataset.map(format_intersection_data, batched = True, batch_size = 4, num_proc = 4) + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`.
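To see how much of the data a 60-step run actually covers, the arithmetic can be sketched as follows. The inputs are taken from this script (3000 selected rows, `per_device_train_batch_size = 8`, `gradient_accumulation_steps = 1`, `max_steps = 60`). The helper name `training_coverage` and the single-GPU default are assumptions made for illustration.

```python
import math


def training_coverage(num_rows, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Return (effective batch size, samples seen in max_steps, optimizer steps per epoch)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = effective_batch * max_steps
    steps_per_epoch = math.ceil(num_rows / effective_batch)
    return effective_batch, samples_seen, steps_per_epoch


# Values from the audio script; num_devices=1 is an assumption (single GPU).
effective_batch, samples_seen, steps_per_epoch = training_coverage(
    num_rows=3000, per_device_batch=8, grad_accum=1, max_steps=60,
)
fraction_of_epoch = samples_seen / 3000

print(f"Effective batch size: {effective_batch}")
print(f"Samples seen in 60 steps: {samples_seen} ({fraction_of_epoch:.0%} of one epoch)")
print(f"Steps needed for one full epoch: {steps_per_epoch}")
```

Under these assumptions the quick run sees only a fraction of the 3000 rows, which is worth keeping in mind when comparing its WER against a full-epoch run.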
+ +# In[11]: + + +# Use UnslothVisionDataCollator which handles audio token alignment correctly +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 8, + gradient_accumulation_steps = 1, + warmup_ratio = 0.03, + # num_train_epochs = 1, # Use for full training runs + max_steps = 60, + learning_rate = 5e-5, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", + remove_unused_columns = False, + + # The below are a must for audio finetuning: + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 8192, + ) +) + + +# In[12]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! +# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[13]: + + +trainer_stats = trainer.train() + + +# In[14]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR. + +# In[15]: + + +messages = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": test_audio['audio']['array']}, + {"type": "text", "text": "Please transcribe this audio."} + ] + } +] + +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
+ +# In[16]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, change `if False` to `if True`: + +# In[17]: + + +if False: + from unsloth import FastModel + model, processor = FastModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = processor.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(processor, skip_prompt = True), +) + + +# ### Saving to float16 for vLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[18]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4-finetune", processor) # Folder name matches the text above + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[19]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", processor, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# Saving to `GGUF` / `llama.cpp` is now supported natively for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision.
`Q4_K_M` for 4bit will come later! + +# In[20]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + processor, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to push the GGUF to your Hugging Face account instead, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[21]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + processor, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the exported `.gguf` file in llama.cpp. With the settings above this is the `Q8_0` export; a `Q4_K_M` file is not produced until 4bit conversion is supported. +# +# And we're done! If you have any questions about Unsloth, join our [Discord](https://discord.gg/unsloth) channel. It is also the place to report bugs, follow the latest LLM news, get help, or join projects. +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Text.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Text.py new file mode 100644 index 0000000..8042ab5 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Text.py @@ -0,0 +1,556 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
+# [Images: Unsloth Studio Training UI (train models, no code needed); Unsloth Studio Chat UI (run GGUF models on Mac, Windows & Linux)]
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
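The installation cell above (kept commented out in this script export) picks an xformers pin from the installed torch version, but the logic is hard to read inside the escaped string. Unescaped, it is roughly the following sketch. The version table, the fallback pin, and the regex are copied from that cell; the function name `xformers_requirement` is hypothetical and exists only for this illustration.

```python
import re

# Copied from the installation cell: torch "major.minor" -> xformers version pin.
XFORMERS_PINS = {"2.10": "0.0.34", "2.9": "0.0.33.post1", "2.8": "0.0.32.post2"}
DEFAULT_PIN = "0.0.34"  # fallback used by the cell when the torch version is not listed


def xformers_requirement(torch_version: str) -> str:
    """Build the pip requirement string for a given torch version string.

    Hypothetical helper: the notebook does this inline rather than in a function.
    """
    match = re.match(r"[\d]{1,}\.[\d]{1,}", torch_version)
    if match is None:
        raise ValueError(f"Unrecognised torch version: {torch_version!r}")
    major_minor = match.group(0)  # e.g. "2.9" from "2.9.1+cu126"
    return "xformers==" + XFORMERS_PINS.get(major_minor, DEFAULT_PIN)


if __name__ == "__main__":
    for version in ("2.8.0", "2.9.1+cu126", "2.10.0", "2.7.1"):
        print(version, "->", xformers_requirement(version))
```

Note that an unlisted torch version silently falls back to the newest pin, so a mismatch between torch and xformers is possible on older environments.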
+ +# In[3]: + + +from unsloth import FastModel +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-E2B-it", + dtype = None, # None for auto detection + max_seq_length = 1024, # Choose any for long context! + load_in_4bit = False, # 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. We use Gemma 4's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True) + ) + + +# # Gemma 4 can see images! +# +# Alt text + +# In[5]: + + +sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg" + +messages = [{ + "role" : "user", + "content": [ + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "Which films does this animal feature in?" 
} + ] +}] +# You might have to wait 1 minute for Unsloth's auto compiler +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# Let's make a poem about sloths! + +# In[6]: + + +messages = [{ + "role": "user", + "content": [{ "type" : "text", + "text" : "Write a poem about sloths." }] +}] +do_gemma_4_inference(messages) + + +# # Gemma 4 can also hear! + +# In[7]: + + +from IPython.display import Audio, display +Audio("https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3") + + +# In[8]: + + +get_ipython().system('wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O audio.mp3') + + +# In[9]: + + +audio_file = "audio.mp3" + +messages = [{ + "role" : "user", + "content": [ + { "type": "audio", "audio" : audio_file }, + { "type": "text", "text" : "What is this audio about?" } + ] +}] +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# # Let's combine all 3 modalities together! + +# In[10]: + + +messages = [{ + "role" : "user", + "content": [ + { "type": "audio", "audio" : audio_file }, + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "What is this audio and image about? "\ + "How are they related?" } + ] +}] +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# # Let's finetune Gemma 4! +# +# You can finetune the vision and text parts for now through selection - the audio part can also be finetuned - we're working to make it selectable as well! + +# We now add LoRA adapters so we only need to update a small amount of parameters! + +# In[11]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # Turn off for just text! + finetune_language_layers = True, # Should leave on! + finetune_attention_modules = True, # Attention good for GRPO + finetune_mlp_modules = True, # Should leave on always! 
+ + r = 8, # Larger = higher accuracy, but might overfit + lora_alpha = 8, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, +) + + +# +# ### Data Prep +# We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi turn conversations like below: +# +# ``` +# <|turn>user +# Hello +# <|turn>model +# Hey there! +# ``` +# We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more. + +# In[12]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4", +) + + +# We get the first 3000 rows of the dataset + +# In[13]: + + +from datasets import load_dataset +dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:3000]") + + +# We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes! + +# In[14]: + + +from unsloth.chat_templates import standardize_data_formats +dataset = standardize_data_formats(dataset) + + +# Let's see what row 100 looks like! + +# In[15]: + + +dataset[100] + + +# We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one. + +# In[16]: + + +def formatting_prompts_func(examples): + convos = examples["conversations"] + texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos] + return { "text" : texts, } + +dataset = dataset.map(formatting_prompts_func, batched = True) + + +# Let's see how the chat template did!
Notice there is no `<bos>` token as the processor tokenizer will be adding one. + +# In[17]: + + +dataset[100]["text"] + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run and remove the `max_steps = 60` limit. + +# In[18]: + + +from trl import SFTTrainer, SFTConfig +trainer = SFTTrainer( + model = model, + tokenizer = tokenizer, + train_dataset = dataset, + eval_dataset = None, # Can set up evaluation! + args = SFTConfig( + dataset_text_field = "text", + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, # Use GA to mimic batch size! + warmup_steps = 5, + # num_train_epochs = 1, # Set this for 1 full training run. + max_steps = 60, + learning_rate = 2e-4, # Reduce to 2e-5 for long training runs + logging_steps = 1, + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "linear", + seed = 3407, + report_to = "none", # Use TrackIO/WandB etc + ), +) + + +# We also use Unsloth's `train_on_responses_only` function to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes! + +# In[19]: + + +from unsloth.chat_templates import train_on_responses_only +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) + + +# Let's verify that the instruction part is masked. We print row 100 again. Notice how the sample only has a single `<bos>` as expected!
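To make the masking step concrete, here is a toy sketch of response-only label masking. It is not Unsloth's implementation: the real `train_on_responses_only` matches the tokenized `instruction_part` and `response_part` strings, whereas this sketch assumes each marker is a single made-up token id. The only detail taken from the notebook is the ignore value `-100`, which the decoding cell below also checks for.

```python
IGNORE_INDEX = -100  # label value the loss skips; the notebook's decode cell tests for it

# Hypothetical single-token markers standing in for "<|turn>user\n" and "<|turn>model\n".
USER_MARKER = 1
MODEL_MARKER = 2


def mask_non_response(input_ids):
    """Return labels that keep only the tokens inside model (assistant) turns."""
    labels = []
    in_response = False
    for token_id in input_ids:
        if token_id == MODEL_MARKER:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker itself is not a training target
        elif token_id == USER_MARKER:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token_id if in_response else IGNORE_INDEX)
    return labels


if __name__ == "__main__":
    # Two-turn toy conversation: user tokens 10-12, model tokens 20-22.
    ids = [USER_MARKER, 10, 11, MODEL_MARKER, 20, 21, USER_MARKER, 12, MODEL_MARKER, 22]
    print(mask_non_response(ids))
```

Only positions whose label is not `-100` contribute to the loss, which is why the decoded labels in the cell that follows should show just the assistant's answer.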
+ +# In[20]: + + +tokenizer.decode(trainer.train_dataset[100]["input_ids"]) + + +# Now let's print the masked out example - you should see only the answer is present: + +# In[21]: + + +tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ") + + +# In[22]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! +# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[23]: + + +trainer_stats = trainer.train() + + +# In[24]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." +) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[25]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4", +) +messages = [{ + "role": "user", + "content": [{ + "type" : "text", + "text" : "Continue the sequence: 1, 1, 2, 3, 5, 8,", + }] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") +outputs = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, +) +tokenizer.batch_decode(outputs) + + +# You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time! + +# In[26]: + + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "Why is the sky blue?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
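The note above says only the LoRA adapters are saved. A short arithmetic sketch shows why such a save is small: for one linear layer with weight shape `(d_out, d_in)`, a rank-`r` adapter stores two factors of shape `(r, d_in)` and `(d_out, r)`. The rank `r = 8` is the value used earlier in this notebook; the layer dimensions below are illustrative assumptions, not Gemma 4's actual sizes, and the helper name is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank, d_in), B is (d_out, rank)."""
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen base weight matrix (bias ignored)."""
    return d_in * d_out


if __name__ == "__main__":
    d_in, d_out = 2048, 2048  # assumed layer size, for illustration only
    rank = 8                  # matches r = 8 in this notebook's get_peft_model call
    adapter = lora_param_count(d_in, d_out, rank)
    base = full_param_count(d_in, d_out)
    print(f"adapter params: {adapter}")
    print(f"base params:    {base}")
    print(f"adapter / base: {adapter / base:.2%}")
```

Because the base weights are not part of the adapter checkpoint, loading it later still requires the original model, which is what the merged 16bit and GGUF exports further down avoid.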
+ +# In[27]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# tokenizer.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[28]: + + +if False: + from unsloth import FastModel + model, tokenizer = FastModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[29]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4-finetune", tokenizer) + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[30]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", tokenizer, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now for all models! 
For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later! + +# In[31]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to push the GGUF to your Hugging Face account instead, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[32]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the exported `.gguf` file in llama.cpp. With the settings above this is the `Q8_0` export; a `Q4_K_M` file is not produced until 4bit conversion is supported. +# +# And we're done! If you have any questions about Unsloth, join our [Discord](https://discord.gg/unsloth) channel. It is also the place to report bugs, follow the latest LLM news, get help, or join projects. +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Vision.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Vision.py new file mode 100644 index 0000000..0139528 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)-Vision.py @@ -0,0 +1,448 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[ ]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[ ]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth + +# In[ ]: + + +from unsloth import FastVisionModel # FastLanguageModel for LLMs +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + 
"unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastVisionModel.from_pretrained( + "unsloth/gemma-4-E2B-it", + load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA. + use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context +) + + +# We now add LoRA adapters for parameter efficient fine-tuning, allowing us to train only 1% of all model parameters efficiently. +# +# **[NEW]** We also support fine-tuning only the vision component, only the language component, or both. Additionally, you can choose to fine-tune the attention modules, the MLP layers, or both! + +# In[ ]: + + +model = FastVisionModel.get_peft_model( + model, + finetune_vision_layers = True, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 32, # The larger, the higher the accuracy, but might overfit + lora_alpha = 32, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = "all-linear", # Optional now! Can specify a list if needed +) + + +# +# ### Data Prep +# We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions. +# +# You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR). 
+ +# In[ ]: + + +from datasets import load_dataset +dataset = load_dataset("unsloth/LaTeX_OCR", split = "train") + + +# Let's take an overview of the dataset. We'll examine the image at index 2 and its corresponding caption. + +# In[ ]: + + +dataset + + +# In[ ]: + + +dataset[2]["image"] + + +# In[ ]: + + +dataset[2]["text"] + + +# We can also render LaTeX directly in the browser! + +# In[ ]: + + +from IPython.display import display, Math, Latex + +latex = dataset[3]["text"] +display(Math(latex)) + + +# To format the dataset, all vision fine-tuning tasks should follow this format: a user turn with the instruction and image, then an assistant turn with the expected answer: +# +# ```python +# [ +# { +# "role": "user", +# "content": [ +# {"type": "text", "text": instruction}, +# {"type": "image", "image": sample["image"]}, +# ], +# }, +# { +# "role": "assistant", +# "content": [ +# {"type": "text", "text": sample["text"]}, +# ], +# }, +# ] +# ``` + +# In[ ]: + + +instruction = "Write the LaTeX representation for this image." + +def convert_to_conversation(sample): + conversation = [ + { + "role": "user", + "content": [ + {"type": "text", "text": instruction}, + {"type": "image", "image": sample["image"]}, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": sample["text"]}]}, + ] + return {"messages": conversation} +pass + + +# Let's convert the dataset into the "correct" format for finetuning: + +# In[ ]: + + +converted_dataset = [convert_to_conversation(sample) for sample in dataset] + + +# The first example is now structured like below: + +# In[ ]: + + +converted_dataset[0] + + +# Let's take the Gemma 4 instruction chat template and use it in our base model + +# In[ ]: + + +from unsloth import get_chat_template + +processor = get_chat_template( + processor, + "gemma-4" +) + + +# Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before.
+ +# In[ ]: + + +image = dataset[2]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# You can see the output is poor: the model does not follow the instruction at all. + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run and remove the `max_steps = 60` limit. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning! +# +# We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup.
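To put the 60-step run in perspective, a rough arithmetic sketch follows. The batch size of 1 and gradient accumulation of 4 are the values this notebook passes to `SFTConfig`; the dataset size of 3000 is an illustrative assumption (the size of the LaTeX OCR split is not stated here), a single GPU is assumed, and both helper names are hypothetical. The trainer's own step accounting may differ slightly, so treat the epoch figure as approximate.

```python
import math


def samples_seen(per_device_batch: int, grad_accum: int, max_steps: int, num_devices: int = 1) -> int:
    """Training examples consumed after max_steps optimizer steps."""
    return per_device_batch * grad_accum * num_devices * max_steps


def steps_per_epoch(dataset_size: int, per_device_batch: int, grad_accum: int, num_devices: int = 1) -> int:
    """Approximate optimizer steps needed for one full pass over the dataset."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(dataset_size / effective_batch)


if __name__ == "__main__":
    seen = samples_seen(per_device_batch=1, grad_accum=4, max_steps=60)
    full = steps_per_epoch(dataset_size=3000, per_device_batch=1, grad_accum=4)  # 3000 is assumed
    print(f"examples seen in the 60-step demo run: {seen}")
    print(f"approximate steps for one epoch over 3000 examples: {full}")
```

Under these assumptions the demo run covers only a small slice of the data, which is why the text suggests switching to `num_train_epochs` for a real fine-tune.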
+ +# In[ ]: + + +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = converted_dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, + max_grad_norm = 0.3, + warmup_ratio = 0.03, + max_steps = 60, + # num_train_epochs = 2, # Set this instead of max_steps for full training runs + learning_rate = 2e-4, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", # For Weights and Biases or others + + # You MUST put the below items for vision finetuning: + remove_unused_columns = False, + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 2048, + ) +) + + +# In[ ]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# In[ ]: + + +trainer_stats = trainer.train() + + +# In[ ]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model! You can modify the instruction and input—just leave the output blank. +# +# We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`. + +# In[ ]: + + +image = dataset[10]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] + +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) + +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
+ +# In[ ]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[ ]: + + +if False: + from unsloth import FastVisionModel + + model, processor = FastVisionModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + load_in_4bit = True, # Set to False for 16bit LoRA + ) + +sample = dataset[1] +image = sample["image"].convert("RGB") +messages = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": sample["text"], + }, + { + "type": "image", + }, + ], + }, +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True) +_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[ ]: + + +# Select ONLY 1 to save! (Both not needed!) 
+ +# Save locally to 16bit +if False: model.save_pretrained_merged("unsloth_finetune", processor,) + +# To export and save to your Hugging Face account +if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", processor, token = "YOUR_HF_TOKEN") + + +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_GRPO.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_GRPO.py new file mode 100644 index 0000000..d6f7171 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_GRPO.py @@ -0,0 +1,911 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# # ### Installation +# +# # In[ ]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[ ]: +# +# +# #@title Colab Extra Install { display-mode: "form" } +# get_ipython().run_line_magic('%capture', '') +# import os +# get_ipython().system('pip install --upgrade -qqq uv') +# if "COLAB_" not in "".join(os.environ.keys()): +# # If you're not in Colab, just use pip install! 
+# get_ipython().system('pip install unsloth vllm') +# else: +# try: import numpy, PIL; _numpy = f'numpy=={numpy.__version__}'; _pil = f'pillow=={PIL.__version__}' +# except: _numpy = "numpy"; _pil = "pillow" +# try: import subprocess; is_t4 = "Tesla T4" in str(subprocess.check_output(["nvidia-smi"])) +# except: is_t4 = False +# _vllm, _triton = ('vllm==0.9.2', 'triton==3.2.0') if is_t4 else ('vllm==0.15.1', 'triton') +# get_ipython().system('uv pip install -qqq --upgrade {_vllm} {_numpy} {_pil} torchvision bitsandbytes xformers unsloth') +# get_ipython().system('uv pip install -qqq {_triton}') +# get_ipython().system('uv pip install transformers==4.56.2') +# get_ipython().system('uv pip install --no-deps trl==0.22.2') +# +# +# # ### Unsloth + +# # Goal: Make faster kernels with Reinforcement Learning +# +# Our goal is to make a faster matrix multiplication kernel by doing RL on Gemma 4 with Unsloth. +# +# +# +# You will learn how to: +# 1. Counteract **reward hacking** like cheating, caching, laziness. +# 2. Timing and correctness of kernels and time limits. +# 3. Making good **reward functions** +# 4. 
How to seriously do RL to make optimized kernels + +# In[ ]: + + +from unsloth import FastVisionModel +import torch +max_seq_length = 4096 # Can increase for longer reasoning traces +lora_rank = 32 # Larger rank = smarter, but slower + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastVisionModel.from_pretrained( + model_name = "unsloth/gemma-4-E2B-it", + max_seq_length = max_seq_length, + load_in_4bit = False, # False for LoRA 16bit + fast_inference = False, # Enable vllm fast inference +) + + +# We now add some small amount of LoRA weights to Gemma 4 so we only need to train those, instead of training on the full model. + +# In[ ]: + + +model = FastVisionModel.get_peft_model( + model, + r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 + target_modules = [ + "q_proj", "k_proj", "v_proj", "o_proj", + "gate_proj", "up_proj", "down_proj", + ], + lora_alpha = lora_rank*2, # *2 speeds up training + use_gradient_checkpointing = "unsloth", # Reduces memory usage + random_state = 3407, +) + + +# # Optimized matrix multiplication +# +# Numpy has optimized matrix multiplication kernels for CPUs via BLAS optimized operations. For GPUs, one can use CUDA accelerated cuBLAS kernels which PyTorch calls under the hood. 
+# +# To generate some random matrices to do matrix multiplication, we can do the below: + +# In[ ]: + + +import numpy as np +def generate_random_matrices(seed = 3407, n = 256): + random_state = np.random.RandomState(seed) + n, k, m = random_state.randint(1, n+1, size = 3) + # Draw values from the seeded generator so the seed controls shapes and contents + A = random_state.uniform(-10, 10, size = (n, k)) + B = random_state.uniform(-10, 10, size = (k, m)) + return A, A.tolist(), B, B.tolist() + + +# We shall generate a small matrix, and see the matrix multiplied output + +# In[ ]: + + +A, A_list, B, B_list = generate_random_matrices(seed = 42, n = 5) +print(A) +print(B) +print(np.matmul(A, B)) + + +# We can call an LLM to generate a simple matrix multiply kernel in Python only, and we can calculate the differences between the actual result and the kernel's result + +# In[ ]: + + +def calculate_difference(pred, real): + if pred is None: return 5, 5 + assert real is not None + import numpy as np + try: + difference = pred - real + except: + return 5, 5 + # Use the absolute difference so large negative errors are not hidden + amax_error = float(np.amax(np.abs(difference))) + mse_error = float(np.mean(np.square(difference))) + return amax_error, mse_error + + +# In[ ]: + + +# Kernel generated by GPT-5 +def matmul(A, B): + z, s = zip, sum + Bt = list(z(*B)) + return [[s(a*b for a, b in z(row, col)) for col in Bt] for row in A] + + +# We see the error below is very small, so that's good! + +# In[ ]: + + +prediction = matmul(A_list, B_list) +calculate_difference(prediction, np.matmul(A, B)) + + +# # Countering Reward Hacking +# +# The ultimate goal of RL is to maximize some reward (say speed, revenue, some metric). +# +# But RL can **cheat**. When the RL algorithm learns a trick or exploits something to increase the reward, without actually doing the task at hand, this is called "Reward Hacking". +# +# Some good examples are in https://en.wikipedia.org/wiki/Reward_hacking +# +# For matrix multiplication kernels, we might see the following issues: +# +# * Laziness: RL learns to use Numpy, Torch, other libraries, which call optimized kernels.
+# * Caching: RL learns to cache the result of the output +# * Cheating: RL learns to find the actual output by inspecting Python global variables +# * RL learns to edit the timing function to make it output 0 time as passed. +# +# And possibly more. We shall try to address each! + +# # Countering Reward Hacking 1: Stop laziness +# We can stop the RL algorithm from calling optimized code by inspecting if the generated code imports other non standard Python libraries. We used GPT-5 to help generate this check `check_only_stdlib_imports`: + +# In[ ]: + + +#@title (Collapsible code) +import ast +import sys +import sysconfig +from pathlib import Path + +def _stdlib_names(): + """ + Build a set of canonical stdlib top-level module/package names. + Uses sys.stdlib_module_names when available (3.10+), with a + filesystem fallback for older versions/edge cases. + """ + names = {m.lower() for m in getattr(sys, "stdlib_module_names", set())} + names |= {m.lower() for m in sys.builtin_module_names} + names.add("__future__") # special-case + + # Fallback/augmentation: scan the stdlib directory + try: + stdlib_dir = Path(sysconfig.get_path("stdlib")) + if stdlib_dir.exists(): + for p in stdlib_dir.iterdir(): + if p.name == "site-packages": + continue + if p.suffix == ".py": + names.add(p.stem.lower()) + elif p.is_dir() and (p / "__init__.py").exists(): + names.add(p.name.lower()) + except Exception: + # conservative fallback; the names set above will still work well + pass + + return names + +_STDLIB_SET = _stdlib_names() + +def check_only_stdlib_imports(code: str): + """ + Return (ok: bool, details: dict) + + ok == True -> all absolute imports are from the stdlib. + ok == False -> details['non_stdlib'] lists offending top-level modules. 
+ + details includes: + - stdlib: sorted list of stdlib imports found + - non_stdlib: sorted list of non-stdlib imports found + - relative_imports: count of relative imports (always allowed here) + """ + try: + tree = ast.parse(code) + except SyntaxError as e: + return False, { + "error": f"SyntaxError: {e}", + "stdlib": [], + "non_stdlib": [], + "relative_imports": 0, + } + + abs_imports = set() + relative_count = 0 + + class Visitor(ast.NodeVisitor): + def visit_Import(self, node: ast.Import): + for alias in node.names: + abs_imports.add(alias.name.split(".")[0]) + def visit_ImportFrom(self, node: ast.ImportFrom): + nonlocal relative_count + if (node.level or 0) > 0: + # relative import + relative_count += 1 + else: + if node.module: + abs_imports.add(node.module.split(".")[0]) + + Visitor().visit(tree) + + stdlib_found = sorted(m for m in abs_imports if m.lower() in _STDLIB_SET) + non_stdlib = sorted(m for m in abs_imports if m.lower() not in _STDLIB_SET) + + return len(non_stdlib) == 0, { + "stdlib": stdlib_found, + "non_stdlib": non_stdlib, + "relative_imports": relative_count, + } + + +# For example, let's call `check_only_stdlib_imports` on a random piece of matrix multiplication code generated by GPT-5: + +# In[ ]: + + +sample = """ +def matmul(A, B): + import numpy as np + from torch import matmul + z, s = zip, sum + Bt = list(z(*B)) + return [[s(a*b for a, b in z(row, col)) for col in Bt] for row in A] +""" +ok, info = check_only_stdlib_imports(sample) +print("Only stdlib imports?", ok) +print(info) + + +# # Countering Reward Hacking 2: Stop cheating +# We can stop the RL algorithm from using global or cached variables by restricting it's `locals` and `globals`. +# +# We are also going to use `exec` to create the function, so we have to save the output to an empty dict. +# +# We also disallow global variable access. 
+ +# In[ ]: + + +output_function = {} +exec(sample, {}, output_function) +output_function["matmul"] + + +# We also disallow global variable access via `types.FunctionType(f.__code__, {})` + +# In[ ]: + + +import types +output_function["matmul"] = types.FunctionType(output_function["matmul"].__code__, {}) + +def import_numpy(): + np.matmul + print("Success") + +import_numpy() +import_numpy = types.FunctionType(import_numpy.__code__, {}) +try: + import_numpy() +except Exception as e: + print(str(e)) + + +# In[ ]: + + +def create_locked_down_function(function): + output_function = {} + exec(function, {}, output_function) + new_matmul = output_function["matmul"] + new_matmul = types.FunctionType(new_matmul.__code__, {}) + return new_matmul + + +# # Countering Reward Hacking 3: Stop caching +# We can stop the RL algorithm from using cached data by wiping the cache with a large fake matrix. We also have to benchmark carefully with multiple loops and turns. +# +# We also add a **timer** to not make the algorithm go in an endless loop. + +# In[ ]: + + +import os, gc, time, statistics +import signal +from contextlib import contextmanager +class TimeoutError(Exception): pass + +@contextmanager +def time_limit(seconds): + def _handler(signum, frame): + raise TimeoutError(f"Timed out after {seconds}s") + old = signal.signal(signal.SIGALRM, _handler) + signal.setitimer(signal.ITIMER_REAL, seconds) + try: + yield + finally: + signal.setitimer(signal.ITIMER_REAL, 0.0) + signal.signal(signal.SIGALRM, old) + +class Benchmarker: + def __init__(self, trials = 3, loops = 1, timeout = 30): + self.buffer = np.zeros(2 * 1024 * 1024 * 1024, dtype = np.uint8) + self.trials = trials + self.loops = loops + assert timeout > 0 # Cannot be 0 since it won't work! 
+ self.timeout = timeout + def thrash(self): + # Edit the buffer to wipe cache lines + self.buffer ^= 1 + return int(self.buffer[::4096].sum()) + + def benchmark(self, function, arguments): + assert len(arguments) == self.loops + samples = [] + exceptions = [] + timed_out = 0 + for _ in range(self.trials): + gc.collect(); gc.disable(); self.thrash() + t_start = time.perf_counter_ns() + for i in range(self.loops): + try: + with time_limit(self.timeout): + function(*arguments[i]) + except TimeoutError as e: + timed_out += 1 + except Exception as e: + exceptions.append(str(e)) + t_end = time.perf_counter_ns() + gc.enable() + samples.append((t_end - t_start) // max(1, self.loops)) + return { + "median_ns": int(statistics.median(samples)), + "mean_ns": int(statistics.fmean(samples)), + "stdev_ns": int(statistics.pstdev(samples) if len(samples) > 1 else 0), + "exceptions" : exceptions, + "timeouts" : timed_out, + } + + +# For example we use our matmul kernel we had, and benchmark it with a 10 second delay: + +# In[ ]: + + +A, A_list, B, B_list = generate_random_matrices(seed = 0, n = 256) +Benchmarker(trials = 1, timeout = 10).benchmark(output_function["matmul"], [(A_list, B_list)]) + + +# # Data & RL task setup +# +# We now have to create a prompt to the model for which it will do some task. For our matrix multiply example, we use the below: + +# In[ ]: + + +prompt = """ +Create a new fast matrix multiplication function using only native Python code. +You are given a list of list of numbers. +Output your new function in backticks using the format below: +```python +def matmul(A, B): + return ... 
+``` +""".strip() +print(prompt) + + +# First, let's prompt Gemma 4 without RL and see how it goes: + +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer +print("=" * 50) +print("BASE MODEL OUTPUT (before RL training):") +print("=" * 50) + +inputs = tokenizer( + text = text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +text_streamer = TextStreamer(tokenizer, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# # Reward functions +# +# We now design the `extract_function` function which simply extracts the function wrapped in 3 backticks. +# +# And 4 reward functions: +# +# 1. `function_works` which rewards the model if the strategy is a valid Python function. +# 2. `no_cheating` which checks if the function imported other modules, and if it did, we penalize it. +# 3. `correctness_check` which checks if the kernel was correct or wrong - it shouldn't generate gibberish! +# 4. `speed_check` checks the performance relative to Numpy matmul directly. + +# In[ ]: + + +def extract_function(text): + if text.count("```") >= 2: + first = text.find("```") + 3 + second = text.find("```", first) + fx = text[first : second].strip() + fx = fx.removeprefix("python\n") + fx = fx[fx.find("def"):] + if fx.startswith("def matmul(A, B):"): return fx + return None +print(extract_function(prompt)) + + +# Below is our `function_works` reward function which uses Python's `exec` but guarded by not allowing leakage of local and global variables. 
We can also use `check_only_stdlib_imports` first to check if there are errors before even executing the function: + +# In[ ]: + + +ok, info = check_only_stdlib_imports("def a") +ok, info + + +# In[ ]: + + +def function_works(completions, **kwargs): + scores = [] + for completion in completions: + score = 0 + response = completion[0]["content"] + function = extract_function(response) + print(function) + if function is not None: + ok, info = check_only_stdlib_imports(function) + if function is None or "error" in info: + score = -2.0 + else: + try: + new_matmul = create_locked_down_function(function) + score = 1.0 + except: + score = -0.5 + scores.append(score) + return scores + + +# `no_cheating` checks if the function cheated since it might have imported Numpy or Torch optimized code. + +# In[ ]: + + +def no_cheating(completions, **kwargs): + scores = [] + for completion in completions: + score = 0 + response = completion[0]["content"] + function = extract_function(response) + if function is not None: + ok, info = check_only_stdlib_imports(function) + else: + ok = False + scores.append(1.0 if ok else -20.0) # Penalize heavily! + return scores + + +# Next `correctness_check` checks if the kernel was correct. We want to penalize if the absolute error is larger than 1, and if the mean squared error is somewhat bigger than machine epsilon. +# +# We have to execute the code now!
+ +# In[ ]: + + +np.finfo(np.float64).eps + + +# In[ ]: + + +def correctness_check(completions, **kwargs): + scores = [] + # Generate some random matrices of size less than 128 + A, A_list, B, B_list = generate_random_matrices(seed = np.random.randint(10000), n = 128) + for completion in completions: + score = 0 + response = completion[0]["content"] + function = extract_function(response) + if function is not None: + ok, info = check_only_stdlib_imports(function) + if function is None or "error" in info: + scores.append(0) + continue + try: + new_matmul = create_locked_down_function(function) + except: + scores.append(0) + continue + try: + pred = new_matmul(A_list.copy(), B_list.copy()) + except: + # Failed! + scores.append(-2.0) + continue + true = np.matmul(A, B) + amax_error, mse_error = calculate_difference(pred, true) + + # Check correctness and score! + machine_epsilon = 100*np.finfo(np.float64).eps + if amax_error >= 3: score = -3.0 + elif amax_error >= 2: score = -2.5 + elif amax_error >= 1: score = -2.0 + elif amax_error >= 0.5: score = -1.0 + elif amax_error >= 100*machine_epsilon: score = 0.0 + elif amax_error >= machine_epsilon: score = 1.0 + else: score = 3.0 + + if mse_error >= 3: score += -3.0 + elif mse_error >= 2: score += -2.5 + elif mse_error >= 1: score += -2.0 + elif mse_error >= 0.5: score += -1.0 + elif mse_error >= 100*machine_epsilon: score += 0.0 + elif mse_error >= machine_epsilon: score += 1.0 + else: score += 3.0 + scores.append(score) + return scores + + +# Finally our benchmarking function for `speed_check`! We shall limit the timer to 10 seconds and do 3 trials. 
+ +# In[ ]: + + +A, A_list, B, B_list = generate_random_matrices(seed = 0, n = 256) +benchmarker = Benchmarker(trials = 3, timeout = 10) +numpy_results = benchmarker.benchmark(np.matmul, [(A, B)]) +numpy_results + + +# In[ ]: + + +new_matmul = create_locked_down_function(extract_function(prompt)) +new_results = benchmarker.benchmark(new_matmul, [(A_list, B_list)]) +new_results + + +# We use the ratio of the two timings, with a negative sign when the kernel is slower than Numpy. If the kernel is faster (the ratio is less than 1), we invert the ratio so the reward grows with the speedup! + +# In[ ]: + + +negative = -(new_results["median_ns"] / numpy_results["median_ns"]) / 100 +positive = +(numpy_results["median_ns"] / new_results["median_ns"]) / 100 +reward = negative if new_results["median_ns"] >= numpy_results["median_ns"] else positive +reward + + +# In[ ]: + + +new_results["median_ns"] = 3 +numpy_results["median_ns"] = 1000 +negative = -(new_results["median_ns"] / numpy_results["median_ns"]) / 100 +positive = +(numpy_results["median_ns"] / new_results["median_ns"]) / 100 +reward = negative if new_results["median_ns"] >= numpy_results["median_ns"] else positive +reward + + +# In[ ]: + + +import gc +def speed_check(completions, **kwargs): + scores = [] + # Generate some random matrices with dimensions of at most 256 + A, A_list, B, B_list = generate_random_matrices(seed = np.random.randint(10000), n = 256) + numpy_results = benchmarker.benchmark(np.matmul, [(A, B)]) + for completion in completions: + score = 0 + response = completion[0]["content"] + function = extract_function(response) + if function is not None: + ok, info = check_only_stdlib_imports(function) + if function is None or "error" in info: + scores.append(0) + continue + try: + new_matmul = create_locked_down_function(function) + except: + scores.append(0) + continue + new_results = benchmarker.benchmark(new_matmul, [(A_list.copy(), B_list.copy())]) + + # Get score and clip to -10, 10 + negative = -(new_results["median_ns"] / numpy_results["median_ns"]) / 100 + positive =
+(numpy_results["median_ns"] / new_results["median_ns"]) / 100 + score = negative if new_results["median_ns"] >= numpy_results["median_ns"] else positive + if score >= 10: score = 10 + if score <= -10: score = -10 + scores.append(score) + # Free memory to counteract OOMs + gc.collect() + torch.cuda.empty_cache() + return scores + + +# We create the dataset which includes a replica of our prompt. + +# In[ ]: + + +from datasets import Dataset +dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0}]*1000) +maximum_length = len(tokenizer.apply_chat_template([{"role":"user", "content":prompt.strip()}], add_generation_prompt = True, tokenize = True)) +print(maximum_length) +dataset[0] + + +# +# ### Train the model +# +# Now set up GRPO Trainer and all configurations! We also support GSDP, GAPO, Dr GRPO and more! Go to our docs https://unsloth.ai/docs/ for more info! + +# In[ ]: + + +# Leave room for the prompt (plus 1 token safety margin) +max_completion_length = max_seq_length - (maximum_length + 1) + +from trl import GRPOConfig, GRPOTrainer +training_args = GRPOConfig( + temperature = 1.0, + top_p = 0.95, + top_k = 64, + learning_rate = 5e-5, + weight_decay = 0.001, + warmup_ratio = 0.1, + lr_scheduler_type = "linear", + optim = "adamw_8bit", + logging_steps = 1, + per_device_train_batch_size = 1, + gradient_accumulation_steps = 2, # Increase to 4 for smoother training + num_generations = 2, # Decrease if out of memory + max_completion_length = max_completion_length, + # num_train_epochs = 1, # Set to 1 for a full training run + max_steps = 100, + save_steps = 100, + report_to = "none", # Can use Weights & Biases, TrackIO + output_dir = "outputs", + epsilon = 0.2, + epsilon_high = 0.28, # one sided + delta = 1.5, # two sided + loss_type = 'bnpo', + mask_truncated_completions = True + # For optional training + evaluation + # fp16_full_eval = True, + # per_device_eval_batch_size = 4, + # eval_accumulation_steps = 1, + # 
eval_strategy = "steps", + # eval_steps = 1, +) + + +# And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase! +# +# You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient! +# +# | Step | Training Loss | reward | reward_std | completion_length | kl | +# |------|---------------|-----------|------------|-------------------|----------| +# | 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 | +# | 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 | +# | 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 | + +# In[ ]: + + +# For optional training + evaluation +# new_dataset = dataset.train_test_split(test_size = 0.01) + +trainer = GRPOTrainer( + model = model, + processing_class = tokenizer, + reward_funcs = [ + function_works, + no_cheating, + correctness_check, + speed_check, + ], + args = training_args, + train_dataset = dataset, + + # For optional training + evaluation + # train_dataset = new_dataset["train"], + # eval_dataset = new_dataset["test"], +) + + +# And let's train the model! +# +# **NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster! + +# In[ ]: + + +trainer.train() + + +# And now with the LoRA we just trained with GRPO - we first save the LoRA! + +# In[ ]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") + + +# Verify LoRA is actually trained! + +# In[ ]: + + +from safetensors import safe_open + +tensors = {} +# Read the adapter from the same directory we saved it to above +with safe_open("gemma_4_lora/adapter_model.safetensors", framework = "pt") as f: + # Verify both A and B are non zero + for key in f.keys(): + tensor = f.get_tensor(key) + zero_fraction = (tensor == 0).sum() / tensor.numel() + # A fraction of 1.0 means the tensor is entirely zero, ie untrained + assert(zero_fraction.item() != 1.0) + + +# +# # Inference +# Now let's try the model we just trained!
+ +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer + +_ = model.generate( + **tokenizer(images = None, text = text, return_tensors = "pt").to("cuda"), + temperature = 1.0, top_p = 0.95, top_k = 64, + max_new_tokens = 1024, + streamer = TextStreamer(tokenizer, skip_prompt = False), +) + + +# +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[ ]: + + +# Merge to 16bit +if False: model.save_pretrained_merged("gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit", token = "YOUR_HF_TOKEN") + +# Merge to 4bit +if False: model.save_pretrained_merged("gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit", token = "YOUR_HF_TOKEN") + +# Just LoRA adapters +if False: + model.save_pretrained("gemma_4_lora") + tokenizer.save_pretrained("gemma_4_lora") +if False: + model.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + tokenizer.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF. 
+# +# Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)): +# * `q8_0` - Fast conversion. High resource use, but generally acceptable. +# * `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K. +# * `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K. +# +# [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) + +# In[ ]: + + +# Save to 8bit Q8_0 +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer,) +# Remember to go to https://huggingface.co/settings/tokens for a token! +# And change hf to your username! +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, token = "YOUR_HF_TOKEN") + +# Save to 16bit GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "f16") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "f16", token = "YOUR_HF_TOKEN") + +# Save to q4_k_m GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "q4_k_m") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "q4_k_m", token = "YOUR_HF_TOKEN") + +# Save to multiple GGUF options - much faster if you want multiple! +if False: + model.push_to_hub_gguf( + "HF_USERNAME/gemma_4_finetune", # Change hf to your username! + tokenizer, + quantization_method = ["q4_k_m", "q8_0", "q5_k_m",], + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.py new file mode 100644 index 0000000..da4b133 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_2048_Game.py @@ -0,0 +1,913 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# +# Join Discord if you need help + ⭐ Star us on Github ⭐ +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# # Goal: Make Gemma 4 play games with Reinforcement Learning +# +# Our goal is to make Gemma 4 play the 2048 game with reinforcement learning, or a variant of it called [GRPO](https://arxiv.org/abs/2501.12948). +# +# We want the model to devise a strategy to play 2048, and we will run this strategy until we win or lose. We then reward the model if it created a good strategy (winning the game), and we'll penalize it (negative reward) if the strategy was a bad one. +# +# + +# # Installation +# We'll be using [Unsloth](https://github.com/unslothai/unsloth) to do RL on Gemma 4. Unsloth saves 70% VRAM usage and makes reinforcement learning 2 to 6x faster! 
+ +# In[ ]: + + +get_ipython().run_cell_magic('capture', '', 'import os, importlib.util\n!pip install --upgrade -qqq uv\nif importlib.util.find_spec("torch") is None or "COLAB_" in "".join(os.environ.keys()):\n try: import numpy, PIL; _numpy = f"numpy=={numpy.__version__}"; _pil = f"pillow=={PIL.__version__}"\n except: _numpy = "numpy"; _pil = "pillow"\n # Gemma 4 requires transformers >= 5.5.0 — do NOT pin to 4.x here\n !uv pip install -qqq \\\n "torch>=2.8.0" "triton>=3.4.0" {_numpy} {_pil} torchvision bitsandbytes \\\n "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \\\n "unsloth[base] @ git+https://github.com/unslothai/unsloth" \\\n git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels\nelif importlib.util.find_spec("unsloth") is None:\n !uv pip install -qqq unsloth\n# Gemma 4 requires transformers >= 5.5.0\n!uv pip install --upgrade --no-deps "transformers>=5.5.0" tokenizers "trl>=0.28.0" unsloth unsloth_zoo\n') + + +# In[ ]: + + +get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') + + +# ### Unsloth + +# In[ ]: + + +from unsloth import FastVisionModel +import torch +max_seq_length = 4096 # Can increase for longer reasoning traces +lora_rank = 32 # Larger rank = smarter, but slower + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastVisionModel.from_pretrained( + model_name = "unsloth/gemma-4-E2B-it", + max_seq_length = max_seq_length, + load_in_4bit = False, # False for LoRA 16bit + fast_inference = False, # Enable vllm fast inference +) + + +# To do efficient RL, we will use 
[LoRA](https://arxiv.org/abs/2106.09685), which allows us to only add 1 to 5% of extra weights to the model for finetuning purposes. This allows us to save memory usage by over 60%, and yet it retains good accuracy. + +# In[ ]: + + +model = FastVisionModel.get_peft_model( + model, + r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 + target_modules = [ + "q_proj", "k_proj", "v_proj", "o_proj", + "gate_proj", "up_proj", "down_proj", + ], + lora_alpha = lora_rank*2, # *2 speeds up training + use_gradient_checkpointing = "unsloth", # Reduces memory usage + random_state = 3407, +) + + +# # 2048 game +# +# We used GPT-5 to create a variant of the 2048 game. It should output the current game board state, and allow us to advance the game board state with 1 action (up, down, left, right). + +# In[ ]: + + +#@title (Collapsible) 2048 Game Implementation +from dataclasses import dataclass, field +from typing import List, Tuple, Optional +import random +import copy + +def _compress_and_merge_row_left(row: List[int]) -> Tuple[List[int], int, bool]: + n = len(row) + tiles = [x for x in row if x != 0] + gained = 0 + i = 0 + merged = [] + while i < len(tiles): + if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]: + v = tiles[i] * 2 + gained += v + merged.append(v) + i += 2 + else: + merged.append(tiles[i]) + i += 1 + merged += [0] * (n - len(merged)) + changed = merged != row + return merged, gained, changed + +def _move_left(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]: + changed_any = False + total_gain = 0 + new_board = [] + for row in board: + new_row, gained, changed = _compress_and_merge_row_left(row) + new_board.append(new_row) + total_gain += gained + changed_any = changed_any or changed + return new_board, total_gain, changed_any + +def _move_right(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]: + changed_any = False + total_gain = 0 + new_board = [] + for row in board: + rev = list(reversed(row)) + new_rev, gained, 
changed = _compress_and_merge_row_left(rev) + new_row = list(reversed(new_rev)) + new_board.append(new_row) + total_gain += gained + changed_any = changed_any or changed + return new_board, total_gain, changed_any + +def _transpose(board: List[List[int]]) -> List[List[int]]: + return [list(row) for row in zip(*board)] + +def _move_up(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]: + t = _transpose(board) + moved, gain, changed = _move_left(t) + return _transpose(moved), gain, changed + +def _move_down(board: List[List[int]]) -> Tuple[List[List[int]], int, bool]: + t = _transpose(board) + moved, gain, changed = _move_right(t) + return _transpose(moved), gain, changed + +def _empty_cells(board: List[List[int]]) -> List[Tuple[int, int]]: + size = len(board) + return [(r, c) for r in range(size) for c in range(size) if board[r][c] == 0] + +def _can_move(board: List[List[int]]) -> bool: + if _empty_cells(board): + return True + size = len(board) + for r in range(size): + for c in range(size - 1): + if board[r][c] == board[r][c + 1]: + return True + for r in range(size - 1): + for c in range(size): + if board[r][c] == board[r + 1][c]: + return True + return False + +@dataclass +class GameBoard: + size: int + seed: Optional[int] = None + target: int = 2048 + probability_fours: float = 0.10 # originally spawns (4) 10% of the time! 
+ _rng: random.Random = field(init = False, repr = False) + _board: List[List[int]] = field(init = False, repr = False) + _score: int = field(default = 0, init = False, repr = False) + _state: str = field(default = "ongoing", init = False, repr = False) + + def __post_init__(self): + if self.size < 2: + raise ValueError("Board size must be at least 2.") + self._rng = random.Random(self.seed) + self._board = [[0 for _ in range(self.size)] for _ in range(self.size)] + self._add_random_tile() + self._add_random_tile() + self._update_state_after_change() + + class _BoardView: + def __init__(self, game: "GameBoard"): + self._game = game + def __iter__(self): + return iter(self._game._board) + def __len__(self): + return len(self._game._board) + def __getitem__(self, idx): + return self._game._board[idx] + def __repr__(self) -> str: + return repr(self._game._board) + __str__ = __repr__ + def do_action(self, key: str) -> None: + self._game.do_action(key) + def state(self) -> str: + return self._game.state() + def pretty(self, colors: bool = True, border: bool = True, dot_for_zero: bool = True) -> str: + return self._game._render_pretty(colors = colors, border = border, dot_for_zero = dot_for_zero) + + def board(self) -> "_BoardView": + return GameBoard._BoardView(self) + def state(self) -> str: + return self._state + def score(self) -> int: + return self._score + def do_action(self, key: str) -> None: + if self._state != "ongoing": + return + if not isinstance(key, str) or len(key) == 0: + self._state = "failed" + return + k = key.strip().lower() + if k == "q": + self._state = "failed" + return + move_map = {"a": _move_left, "d": _move_right, "w": _move_up, "s": _move_down} + if k not in move_map: + self._state = "failed" + return + mover = move_map[k] + new_board, gain, changed = mover(self._board) + if changed: + self._board = new_board + self._score += gain + self._add_random_tile() + self._update_state_after_change() + def _add_random_tile(self) -> bool: + empties = 
_empty_cells(self._board) + if not empties: + return False + r, c = self._rng.choice(empties) + self._board[r][c] = 4 if self._rng.random() < self.probability_fours else 2 + return True + def _update_state_after_change(self) -> None: + if any(self.target in row for row in self._board): + self._state = "success" + return + if not _can_move(self._board): + self._state = "failed" + return + self._state = "ongoing" + def _render_pretty(self, colors: bool = True, border: bool = True, dot_for_zero: bool = True) -> str: + """ + Pretty-print the board with colors that scale from 0 up to self.target. + Uses ANSI 256-color codes (works in most terminals). Set colors = False to disable. + """ + import math + + b = self._board + mx = max((max(row) for row in b), default = 0) + cell_w = max(3, len(str(mx))) + + RESET = "\x1b[0m" + + # A smooth-ish gradient from cool → warm + # (blue/cyan/green → yellow/orange/red). Tweak or expand as you like. + GRAD = [33, 39, 45, 51, 50, 49, 48, 47, 46, 82, 118, 154, 190, 226, 220, 214, 208, 202, 196] + ZERO_FG = 239 # dim gray + + def color_code(v: int) -> str: + if not colors: + return "" + if v == 0: + return f"\x1b[38;5;{ZERO_FG}m" + # Normalize by exponent relative to target: r in [0,1] + t = max(2, self.target) # safety; avoid log2(1) + # Guard: if v is not a power of two or is <1, handle gracefully + try: + r = max(0.0, min(1.0, math.log2(v) / math.log2(t))) + except ValueError: + r = 0.0 + idx = int(round(r * (len(GRAD) - 1))) + return f"\x1b[38;5;{GRAD[idx]}m" + + def fmt(v: int) -> str: + s = "." 
if (v == 0 and dot_for_zero) else str(v) + s = s.rjust(cell_w) + return color_code(v) + s + (RESET if colors else "") + + def hline(left: str, mid: str, right: str) -> str: + return left + mid.join("─" * cell_w for _ in range(self.size)) + right + + rows = [] + if border: + rows.append(hline("┌", "┬", "┐")) + for r in range(self.size): + content = "│".join(fmt(v) for v in b[r]) + rows.append(("│" + content + "│") if border else content) + if border: + rows.append(hline("└" if r == self.size - 1 else "├", + "┴" if r == self.size - 1 else "┼", + "┘" if r == self.size - 1 else "┤")) + return "\n".join(rows) + + +# For example let's create a board of size 5 X 5 and set the target to 8 instead of 2048. +# +# **[NOTE]** 2048 originally spawns a (4) 10% of the time! We can disable this for harder games. See [Wikipedia page](https://en.wikipedia.org/wiki/2048_(video_game)) for more details. + +# In[ ]: + + +game = GameBoard(size = 5, seed = 42, target = 8, probability_fours = 0.10) +print(game.board().pretty(), game.state()) + + +# In[ ]: + + +game + + +# We'll use WASD for the action space: +# +# ``` +# W +# A S D +# ``` +# Also `game.state()` will say `success` if we succeeded in getting the target! + +# In[ ]: + + +game.do_action("A") +print(game.board().pretty(), game.state()) + + +# In[ ]: + + +game.do_action("W") +print(game.board().pretty(), game.state()) + + +# In[ ]: + + +game.do_action("D") +print(game.board().pretty(), game.state()) + + +# In[ ]: + + +game.do_action("W") +print(game.board().pretty(), game.state()) + + +# In[ ]: + + +game.do_action("D") +print(game.board().pretty(), game.state()) + + +# If we do some other action that's not part of the action space, we will get an error, and the game will not accept anymore actions. 
+
+# In[ ]:
+
+
+game = GameBoard(size = 3, seed = 42, target = 8, probability_fours = 0.10)
+game.do_action("AA") # Not in WASD, so the game state becomes "failed"
+game.do_action("W") # Doesn't do anything
+game.do_action("A") # Doesn't do anything
+print(game.board().pretty(), game.state())
+
+
+# # RL Environment Setup
+#
+# We'll set up a function that accepts a strategy, repeatedly asks it for an action within `WASD`, and checks the game state.
+#
+# We'll also add a timer to only execute the strategy for 2 seconds maximum, otherwise it might never terminate!
+
+# In[ ]:
+
+
+from typing import Callable
+from unsloth import execute_with_time_limit
+
+def _execute_strategy(strategy : Callable, game : GameBoard):
+    assert callable(strategy)
+
+    steps = 0
+    # Keep asking the strategy for moves until the game is won or lost
+    while game.state() == "ongoing":
+        action = strategy(list(game.board()))
+        steps += 1
+        if type(action) is not str:
+            return steps, "failed"
+        game.do_action(action)
+    return steps, game.state()
+
+@execute_with_time_limit(2)
+def execute_strategy(strategy : Callable, game : GameBoard):
+    return _execute_strategy(strategy, game)
+
+
+# Let's make a generic strategy to just hit `W`. We should expect this generic strategy to fail:
+
+# In[ ]:
+
+
+def always_move_up(board):
+    return "W"
+
+game = GameBoard(size = 8, seed = 42, target = 2048, probability_fours = 0.10)
+try:
+    execute_strategy(always_move_up, game)
+except TimeoutError as e:
+    print(f"Timed out with error = {str(e)}")
+
+
+# To allow longer strategies for Gemma 4 Reinforcement Learning, we shall allow a 5 second timer.
+
+# In[ ]:
+
+
+@execute_with_time_limit(5)
+def execute_strategy(strategy : Callable, game : GameBoard):
+    return _execute_strategy(strategy, game)
+
+
+# # Code Execution
+#
+# Before we create and execute a new Python function, we first have to check that it does not access global variables or otherwise cheat. This is called `countering reward hacking` since we don't want the function to cheat.
+# +# For example the below piece of code is fine, since it only imports Python level functions. We use `check_python_modules`: + +# In[ ]: + + +from unsloth import check_python_modules + +sample = """ +def strategy(board): + import math + from typing import Callable + return "W" +""" +ok, info = check_python_modules(sample) +print("Only Python imports?", ok) +print(info) + + +# For the below piece of code, since we import `numpy`, we should not allow the execution: + +# In[ ]: + + +sample = """ +def strategy(board): + from numpy import matmul + return "W" +""" +ok, info = check_python_modules(sample) +print("Only Python imports?", ok) +print(info) + + +# We also disallow global variable access. We'll use Unsloth's `create_locked_down_function` function + +# In[ ]: + + +from unsloth import create_locked_down_function +function = """ +def import_numpy(): + np.matmul + print("Success") +""" +f = create_locked_down_function(function) +try: + f() +except Exception as e: + print(str(e)) + + +# In[ ]: + + +from unsloth import create_locked_down_function +function = """ +def add(a, b): + def adder(a): + return a + b + return adder(b) + b +""" +f = create_locked_down_function(function) +try: + print(f(10, 20)) +except Exception as e: + print(str(e)) + + +# # Data & RL task setup +# +# We now have to create a prompt to tell the model to create a strategy for the 2048 game. You can customize this to some other task for another RL task. + +# In[ ]: + + +prompt = """ +Create a new short 2048 strategy using only native Python code. +You are given a list of list of numbers for the current board state. +Output one action for "W", "A", "S", "D" on what is the optimal next step. +Output your new short function in backticks using the format below: +```python +def strategy(board): + return "W" # Example +``` +All helper functions should be inside def strategy. Only output the short function `strategy`. 
+""".strip() +print(prompt) + + +# First, let's prompt Gemma 4 without RL and see how it goes: + +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer +print("=" * 50) +print("BASE MODEL OUTPUT (before RL training):") +print("=" * 50) + +inputs = tokenizer( + text = text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +text_streamer = TextStreamer(tokenizer, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# # Reward functions +# +# We now design a `extract_function` function which simply extracts the function wrapped in 3 back ticks. +# +# And 3 reward functions: +# +# 1. `function_works` which rewards the model if the strategy is a valid Python function. +# 2. `no_cheating` which checks if the function imported other modules, and if it did, we penalize it. +# 3. `strategy_succeeds` which checks if the game strategy actually succeeds in attaining 2048 after running the auto-generated strategy. + +# In[ ]: + + +def extract_function(text): + if text.count("```") >= 2: + first = text.find("```") + 3 + second = text.find("```", first) + fx = text[first : second].strip() + fx = fx.removeprefix("python\n") + fx = fx[fx.find("def"):] + if fx.startswith("def strategy(board):"): return fx + return None +print(extract_function(prompt)) + + +# Below is our `function_works` reward function which uses Python's `exec` but guarded by not allowing leakage of local and global variables. 
+# We can also use `check_python_modules` first to check if there are errors before even executing the function:
+
+# In[ ]:
+
+
+ok, info = check_python_modules("def a")
+ok, info
+
+
+# In[ ]:
+
+
+def function_works(completions, **kwargs):
+    scores = []
+    for completion in completions:
+        score = 0
+        response = completion[0]["content"]
+        function = extract_function(response)
+        if function is not None:
+            ok, info = check_python_modules(function)
+        # No function extracted, or the extracted code failed to parse
+        if function is None or "error" in info:
+            score = -2.0
+        else:
+            try:
+                new_strategy = create_locked_down_function(function)
+                score = 1.0
+            except Exception:
+                score = -0.5
+        scores.append(score)
+    return scores
+
+
+# `no_cheating` checks whether the function cheated, since it might have imported NumPy or other non-standard modules:
+
+# In[ ]:
+
+
+def no_cheating(completions, **kwargs):
+    scores = []
+    for completion in completions:
+        score = 0
+        response = completion[0]["content"]
+        function = extract_function(response)
+        if function is not None:
+            ok, info = check_python_modules(function)
+            scores.append(1.0 if ok else -20.0) # Penalize heavily!
+        else:
+            scores.append(-1.0) # Failed creating function
+    return scores
+
+
+# Next `strategy_succeeds` checks if the strategy actually allows the game to terminate. Imagine if the strategy simply returned "W": the game would never finish, and the run would fail once the 5 second time limit is hit.
+#
+# We also add a global `PRINTER` to print out the strategy and board state.
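All of the reward functions here share one calling convention: they receive a list of completions, where each completion is a list of chat messages whose first entry holds the generated text, and they return one score per completion. The sketch below illustrates that shape with a toy reward. The name `has_strategy_def` and the fake completions are made up for illustration and are not part of the notebook.

```python
def has_strategy_def(completions, **kwargs):
    """Toy reward: +1.0 if the reply contains `def strategy(board):`, else -1.0."""
    scores = []
    for completion in completions:
        # Each completion is a list of messages; the reply text is in the first one
        response = completion[0]["content"]
        scores.append(1.0 if "def strategy(board):" in response else -1.0)
    return scores

fake_completions = [
    [{"role": "assistant", "content": "```python\ndef strategy(board):\n    return \"W\"\n```"}],
    [{"role": "assistant", "content": "I would press W."}],
]
print(has_strategy_def(fake_completions))  # [1.0, -1.0]
```

Calling a reward function on hand-written completions like this is a cheap way to check its scoring before spending GPU time on a training run.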
+ +# In[ ]: + + +import numpy as np +global PRINTER +PRINTER = 0 +def strategy_succeeds(completions, **kwargs): + global PRINTER + scores = [] + # Generate a random game board with seed + seed = np.random.randint(10000) + for completion in completions: + printed = False + score = 0 + response = completion[0]["content"] + function = extract_function(response) + if PRINTER % 5 == 0: + printed = True + print(function) + PRINTER += 1 + if function is not None: + ok, info = check_python_modules(function) + if function is None or "error" in info: + scores.append(0) + continue + try: + new_strategy = create_locked_down_function(function) + except: + scores.append(0) + continue + try: + game = GameBoard(size = 6, seed = seed, target = 2048, probability_fours = 0.10) + steps, game_state = execute_strategy(new_strategy, game) + print(f"Steps = {steps} State = {game_state}") + if printed is False: + print(function) + print(game.board().pretty()) + if game_state == "success": + scores.append(20.0) # Success - massively reward! + else: + scores.append(2.0) # Failed but function works! + except TimeoutError as e: + print("Timeout") + scores.append(-1.0) # Failed with timeout + except Exception as e: + print(f"Exception = {str(e)}") + scores.append(-3.0) # Failed + return scores + + +# We'll now create the dataset which includes a replica of our prompt. + +# In[ ]: + + +from datasets import Dataset +dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0}]*1000) +maximum_length = len(tokenizer.apply_chat_template([{"role":"user", "content":prompt.strip()}], add_generation_prompt = True, tokenize = True)) +print(maximum_length) +dataset[0] + + +# +# ### Train the model +# +# Now set up GRPO Trainer and all configurations! We also support GSPO, GAPO, Dr GRPO and more! Go the Unsloth [Reinforcement Learning Docs](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) for more options. 
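Before configuring the trainer, it may help to see roughly how the three reward signals are used. GRPO compares completions sampled for the same prompt: the per-function rewards are added up for each completion, and each total is then scored relative to the rest of its group. The sketch below is a simplified illustration in plain Python, not TRL's code. The helper name `group_relative_advantages`, the epsilon value, and the use of the population standard deviation are assumptions, and the trainer's real normalization depends on options such as `loss_type`.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps = 1e-4):
    """Center and scale rewards within one group of completions (simplified)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One row per completion: [function_works, no_cheating, strategy_succeeds]
per_function_rewards = [
    [ 1.0,  1.0,  2.0],  # valid function, game lost
    [ 1.0,  1.0, 20.0],  # valid function, target reached
    [-2.0, -1.0,  0.0],  # no function could be extracted
    [ 1.0,  1.0, -1.0],  # valid function, strategy timed out
]
totals = [sum(row) for row in per_function_rewards]
print(totals)  # [4.0, 22.0, -3.0, 1.0]
print(group_relative_advantages(totals))
```

When every completion in a group earns the same total, all advantages are zero and that group contributes no learning signal, which is one reason the reward can stay flat early in training.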
+ +# In[ ]: + + +# Leave room for the prompt (plus 1 token safety margin) +max_completion_length = max_seq_length - (maximum_length + 1) + +from trl import GRPOConfig, GRPOTrainer +training_args = GRPOConfig( + temperature = 1.0, + top_p = 0.95, + top_k = 64, + learning_rate = 5e-5, + weight_decay = 0.001, + warmup_ratio = 0.1, + lr_scheduler_type = "linear", + optim = "adamw_8bit", + logging_steps = 1, + per_device_train_batch_size = 1, + gradient_accumulation_steps = 2, # Increase to 4 for smoother training + num_generations = 2, # Decrease if out of memory + max_completion_length = max_completion_length, + # num_train_epochs = 1, # Set to 1 for a full training run + max_steps = 60, + save_steps = 100, + report_to = "none", # Can use Weights & Biases, TrackIO + output_dir = "outputs", + epsilon = 0.2, + epsilon_high = 0.28, # one sided + delta = 1.5, # two sided + loss_type = 'bnpo', + mask_truncated_completions = True + # For optional training + evaluation + # fp16_full_eval = True, + # per_device_eval_batch_size = 4, + # eval_accumulation_steps = 1, + # eval_strategy = "steps", + # eval_steps = 1, +) + + +# And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase! +# +# You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient! 
+#
+# | Step | Training Loss | reward    | reward_std | completion_length | kl       |
+# |------|---------------|-----------|------------|-------------------|----------|
+# | 1    | 0.000000      | 0.125000  | 0.000000   | 200.000000        | 0.000000 |
+# | 2    | 0.000000      | 0.072375  | 0.248112   | 200.000000        | 0.000000 |
+# | 3    | 0.000000      | -0.079000 | 0.163776   | 182.500000        | 0.000005 |
+
+# In[ ]:
+
+
+# For optional training + evaluation
+# new_dataset = dataset.train_test_split(test_size = 0.01)
+
+trainer = GRPOTrainer(
+    model = model,
+    processing_class = tokenizer,
+    reward_funcs = [
+        function_works,
+        no_cheating,
+        strategy_succeeds,
+    ],
+    args = training_args,
+    train_dataset = dataset,
+
+    # For optional training + evaluation
+    # train_dataset = new_dataset["train"],
+    # eval_dataset = new_dataset["test"],
+)
+
+
+# And let's train the model!
+#
+# **NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster!
+
+# In[ ]:
+
+
+trainer.train()
+
+
+# Now that the LoRA has been trained with GRPO, we first save the adapter:
+
+# In[ ]:
+
+
+model.save_pretrained("gemma_4_lora") # Local saving
+tokenizer.save_pretrained("gemma_4_lora")
+
+
+# Verify LoRA is actually trained!
+
+# In[ ]:
+
+
+from safetensors import safe_open
+
+tensors = {}
+# Read the adapter back from the directory we just saved it to
+with safe_open("gemma_4_lora/adapter_model.safetensors", framework = "pt") as f:
+    # Verify that neither the A nor the B matrices are entirely zero
+    for key in f.keys():
+        tensor = f.get_tensor(key)
+        n_zeros = (tensor == 0).sum().item()
+        assert n_zeros != tensor.numel(), f"{key} is all zeros"
+
+
+#
+# # Inference
+# Now let's try the model we just trained!
+ +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer + +_ = model.generate( + **tokenizer(images = None, text = text, return_tensors = "pt").to("cuda"), + temperature = 1.0, top_p = 0.95, top_k = 64, + max_new_tokens = 1024, + streamer = TextStreamer(tokenizer, skip_prompt = False), +) + + +# +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[ ]: + + +# Merge to 16bit +if False: model.save_pretrained_merged("gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit", token = "YOUR_HF_TOKEN") + +# Merge to 4bit +if False: model.save_pretrained_merged("gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit", token = "YOUR_HF_TOKEN") + +# Just LoRA adapters +if False: + model.save_pretrained("gemma_4_lora") + tokenizer.save_pretrained("gemma_4_lora") +if False: + model.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + tokenizer.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF. 
+# +# Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)): +# * `q8_0` - Fast conversion. High resource use, but generally acceptable. +# * `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K. +# * `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K. +# +# [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) + +# In[ ]: + + +# Save to 8bit Q8_0 +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer,) +# Remember to go to https://huggingface.co/settings/tokens for a token! +# And change hf to your username! +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, token = "YOUR_HF_TOKEN") + +# Save to 16bit GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "f16") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "f16", token = "YOUR_HF_TOKEN") + +# Save to q4_k_m GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "q4_k_m") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "q4_k_m", token = "YOUR_HF_TOKEN") + +# Save to multiple GGUF options - much faster if you want multiple! +if False: + model.push_to_hub_gguf( + "HF_USERNAME/gemma_4_finetune", # Change hf to your username! + tokenizer, + quantization_method = ["q4_k_m", "q8_0", "q5_k_m",], + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.py new file mode 100644 index 0000000..2acce9f --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E2B)_Reinforcement_Learning_Sudoku_Game.py @@ -0,0 +1,897 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# # Goal: Make Gemma 4 solve Sudoku puzzles with Reinforcement Learning +# +# Our goal is to make Gemma 4 learn to solve Sudoku puzzles using reinforcement learning (GRPO). +# The model will devise a strategy to fill in empty cells, and we'll reward it for correct placements +# and completing valid puzzles. +# +# + +# # Installation +# We'll be using [Unsloth](https://github.com/unslothai/unsloth) to do RL on Gemma 4. Unsloth saves 70% VRAM usage and makes reinforcement learning 2 to 6x faster. + +# In[ ]: + + +get_ipython().run_cell_magic('capture', '', 'import os, importlib.util\n!pip install --upgrade -qqq uv\nif importlib.util.find_spec("torch") is None or "COLAB_" in "".join(os.environ.keys()):\n try: import numpy, PIL; _numpy = f"numpy=={numpy.__version__}"; _pil = f"pillow=={PIL.__version__}"\n except: _numpy = "numpy"; _pil = "pillow"\n # Gemma 4 requires transformers >= 5.5.0 — do NOT pin to 4.x here\n !uv pip install -qqq \\\n "torch>=2.8.0" "triton>=3.4.0" {_numpy} {_pil} torchvision bitsandbytes \\\n "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \\\n "unsloth[base] @ git+https://github.com/unslothai/unsloth" \\\n git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels\nelif importlib.util.find_spec("unsloth") is None:\n !uv pip install -qqq unsloth\n# Gemma 4 requires transformers >= 5.5.0\n!uv pip install --upgrade --no-deps "transformers>=5.5.0" tokenizers "trl>=0.28.0" unsloth unsloth_zoo\n') + + +# In[ ]: + + +get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 
vision/audio\n') + + +# ### Unsloth + +# In[ ]: + + +from unsloth import FastVisionModel +import torch +max_seq_length = 4096 # Can increase for longer reasoning traces +lora_rank = 32 # Larger rank = smarter, but slower + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastVisionModel.from_pretrained( + model_name = "unsloth/gemma-4-E2B-it", + max_seq_length = max_seq_length, + load_in_4bit = False, # False for LoRA 16bit + fast_inference = False, # Enable vllm fast inference +) + + +# To do efficient RL, we will use [LoRA](https://arxiv.org/abs/2106.09685), which allows us to only add 1 to 5% of extra weights to the model for finetuning purposes. This allows us to save memory usage by over 60%, and yet it retains good accuracy. + +# In[ ]: + + +model = FastVisionModel.get_peft_model( + model, + r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 + target_modules = [ + "q_proj", "k_proj", "v_proj", "o_proj", + "gate_proj", "up_proj", "down_proj", + ], + lora_alpha = lora_rank*2, # *2 speeds up training + use_gradient_checkpointing = "unsloth", # Reduces memory usage + random_state = 3407, +) + + +# # Sudoku Game Implementation +# +# We use GPT-5 to create a clean Sudoku solver environment. The strategy outputs "row,col,value" to fill cells. 
+ +# In[ ]: + + +#@title Sudoku Game Implementation +from dataclasses import dataclass, field +from typing import List, Tuple, Optional +import random +import copy + +def _is_valid_placement(board: List[List[int]], row: int, col: int, num: int) -> bool: + """Check if placing num at (row, col) is valid.""" + # Check row + if num in board[row]: + return False + + # Check column + if num in [board[r][col] for r in range(9)]: + return False + + # Check 3x3 box + box_row, box_col = 3 * (row // 3), 3 * (col // 3) + for r in range(box_row, box_row + 3): + for c in range(box_col, box_col + 3): + if board[r][c] == num: + return False + + return True + +def _solve_sudoku(board: List[List[int]]) -> bool: + """Solve sudoku using backtracking (for puzzle generation).""" + for row in range(9): + for col in range(9): + if board[row][col] == 0: + for num in range(1, 10): + if _is_valid_placement(board, row, col, num): + board[row][col] = num + if _solve_sudoku(board): + return True + board[row][col] = 0 + return False + return True + +def _generate_complete_board(rng: random.Random) -> List[List[int]]: + """Generate a complete valid Sudoku board.""" + board = [[0 for _ in range(9)] for _ in range(9)] + + # Fill diagonal 3x3 boxes first (they don't affect each other) + for box in range(3): + nums = list(range(1, 10)) + rng.shuffle(nums) + for i in range(3): + for j in range(3): + board[box * 3 + i][box * 3 + j] = nums[i * 3 + j] + + # Solve the rest + _solve_sudoku(board) + return board + +@dataclass +class SudokuGame: + difficulty: int = 40 # Number of cells to remove (20 = easy, 40 = medium, 50 = hard) + seed: Optional[int] = None + _rng: random.Random = field(init = False, repr = False) + _board: List[List[int]] = field(init = False, repr = False) + _solution: List[List[int]] = field(init = False, repr = False) + _initial_board: List[List[int]] = field(init = False, repr = False) + _moves: int = field(default = 0, init = False, repr = False) + _state: str = field(default = 
"ongoing", init = False, repr = False) + + def __post_init__(self): + self._rng = random.Random(self.seed) + + # Generate complete board + complete_board = _generate_complete_board(self._rng) + self._solution = copy.deepcopy(complete_board) + + # Remove cells to create puzzle + self._board = copy.deepcopy(complete_board) + cells = [(r, c) for r in range(9) for c in range(9)] + self._rng.shuffle(cells) + + for r, c in cells[:self.difficulty]: + self._board[r][c] = 0 + + self._initial_board = copy.deepcopy(self._board) + self._update_state() + + def board(self) -> List[List[int]]: + """Return current board state.""" + return [row[:] for row in self._board] + + def initial_board(self) -> List[List[int]]: + """Return initial puzzle state.""" + return [row[:] for row in self._initial_board] + + def state(self) -> str: + """Return game state: 'ongoing', 'success', or 'failed'.""" + return self._state + + def moves(self) -> int: + """Return number of moves made.""" + return self._moves + + def place_number(self, row: int, col: int, num: int) -> bool: + """Place a number on the board. 
Returns True if valid move.""" + # Validate input + if not (0 <= row < 9 and 0 <= col < 9): + self._state = "failed" + return False + + if not (1 <= num <= 9): + self._state = "failed" + return False + + # Can't modify initial cells + if self._initial_board[row][col] != 0: + self._state = "failed" + return False + if self._board[row][col] != 0: + self._state = "failed" + return False + # Check if placement is valid + if not _is_valid_placement(self._board, row, col, num): + self._state = "failed" + return False + + # Place number + self._board[row][col] = num + self._moves += 1 + self._update_state() + return True + + def _update_state(self) -> None: + """Update game state based on current board.""" + # Check if puzzle is complete + if all(self._board[r][c] != 0 for r in range(9) for c in range(9)): + # Verify solution is correct + if self._board == self._solution: + self._state = "success" + else: + self._state = "failed" + else: + self._state = "ongoing" + + def pretty(self, colors: bool = True) -> str: + """Pretty print the Sudoku board.""" + RESET = "\x1b[0m" + INITIAL = "\x1b[38;5;45m" # Cyan for initial numbers + PLACED = "\x1b[38;5;226m" # Yellow for placed numbers + EMPTY = "\x1b[38;5;239m" # Gray for empty cells + + lines = [] + lines.append("┌───────┬───────┬───────┐") + + for row in range(9): + row_str = "│ " + for col in range(9): + num = self._board[row][col] + + if colors: + if num == 0: + row_str += f"{EMPTY}.{RESET}" + elif self._initial_board[row][col] != 0: + row_str += f"{INITIAL}{num}{RESET}" + else: + row_str += f"{PLACED}{num}{RESET}" + else: + row_str += str(num) if num != 0 else "." 
+ + if col % 3 == 2: + row_str += " │ " + else: + row_str += " " + + lines.append(row_str.rstrip()) + + if row == 8: + lines.append("└───────┴───────┴───────┘") + elif row % 3 == 2: + lines.append("├───────┼───────┼───────┤") + + return "\n".join(lines) + + +# Test the Sudoku environment: + +# In[ ]: + + +# Create an easy puzzle +game = SudokuGame(difficulty = 30, seed = 42) +print("Initial puzzle:") +print(game.pretty()) +print(f"\nState: {game.state()}, Moves: {game.moves()}") + + +# In[ ]: + + +game + + +# Try making some moves: + +# In[ ]: + + +# Make a valid move +game.place_number(0, 1, 7) +print("\nAfter placing 7 at (row 0, col 1):") +print(game.pretty()) +print(f"State: {game.state()}, Moves: {game.moves()}") + + +# If we do some other action that's not part of the action space, we will get an error, and the game will not accept any more actions. + +# # RL Environment Setup +# +# Execute strategies with time limits to prevent infinite loops. + +# In[ ]: + + +from typing import Callable +from unsloth import execute_with_time_limit + +def _execute_strategy(strategy: Callable, game: SudokuGame): + """Execute a strategy function on a Sudoku game.""" + assert callable(strategy) + + max_moves = 100 + valid_moves = 0 # Track successful moves + + while game.state() == "ongoing" and valid_moves < max_moves: + try: + board = game.board() + initial = game.initial_board() + result = strategy(board, initial) + + # Validate result format + if not isinstance(result, (tuple, list)) or len(result) != 3: + # Invalid format = immediate fail, but return valid moves made + return valid_moves, "failed" + + row, col, num = result + + # Validate types + if not all(isinstance(x, int) for x in [row, col, num]): + return valid_moves, "failed" + + # Try to place number + success = game.place_number(row, col, num) + + if success: + valid_moves += 1 # Count this valid move + else: + # Invalid move = game fails, but return valid_moves made so far + return valid_moves, "failed" + + except
Exception: + return valid_moves, "failed" + + if valid_moves >= max_moves and game.state() == "ongoing": + return valid_moves, "failed" + + return valid_moves, game.state() + + +# To allow longer strategies for Reinforcement Learning, we shall allow a 10 second timer. + +# In[ ]: + + +@execute_with_time_limit(10) +def execute_strategy(strategy: Callable, game: SudokuGame): + """Execute strategy with 10 second time limit.""" + return _execute_strategy(strategy, game) + + +# Test with a simple strategy: + +# In[ ]: + + +def simple_strategy(board, initial): + """Simple strategy: fill first empty cell with 7.""" + for r in range(9): + for c in range(9): + if board[r][c] == 0 and initial[r][c] == 0: + return (r, c, 7) + return (0, 0, 7) + +game = SudokuGame(difficulty = 30, seed = 42) +try: + moves, state = execute_strategy(simple_strategy, game) + print(f"Moves: {moves}, State: {state}") +except TimeoutError as e: + print(f"Timed out: {e}") + + +# In[ ]: + + +print(game.pretty()) + + +# # Code Execution +# +# To execute and create a new Python function, we first have to check that the function does not call other global variables or cheat. This is called `countering reward hacking` since we don't want the function to cheat. +# +# For example, the piece of code below is fine, since it only imports Python level functions.
We use `check_python_modules`: + +# In[ ]: + + +from unsloth import check_python_modules, create_locked_down_function + +# Test safe code +sample = """ +def strategy(board, initial): + for r in range(9): + for c in range(9): + if board[r][c] == 0: + return (r, c, 1) + return (0, 0, 1) +""" + +ok, info = check_python_modules(sample) +print("Safe Python code?", ok) +print(info) + + +# For the below piece of code, since we import `numpy`, we should not allow the execution: + +# In[ ]: + + +sample = """ +def strategy(board, initial): + import numpy as np + return (0, 0, 1) +""" + +ok, info = check_python_modules(sample) +print("Safe Python code?", ok) +print(info) + + +# # Data & RL task setup +# +# Create the prompt that instructs the model to generate a Sudoku solving strategy. You can customize this to some other task for another RL task. + +# In[ ]: + + +prompt = """ +Create a Sudoku solving strategy using only native Python built-in functions without any import statements. +You are given two lists of lists (9x9 grids): +- board: current state (0 means empty) +- initial: starting puzzle (0 means was empty, numbers are fixed) + +Return a tuple (row, col, number) for the next move. +- row: 0-8 (row index) +- col: 0-8 (column index) +- number: 1-9 (digit to place) + +Only place numbers in cells that are BOTH empty in initial AND empty in board (initial[row][col] == 0 AND board[row][col] == 0) +Use Sudoku rules: no duplicates in rows, columns, or 3x3 boxes. +Output your function in backticks: +```python +def strategy(board, initial): + # Your logic here + return (row, col, number) +``` +All helper functions must be inside def strategy. Output only the function. 
+""".strip() + +print(prompt) + + +# First, let's prompt the model without RL and see how it goes: + +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer +print("=" * 50) +print("BASE MODEL OUTPUT (before RL training):") +print("=" * 50) + +inputs = tokenizer( + text = text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +text_streamer = TextStreamer(tokenizer, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# # Reward functions +# +# We now design a `extract_function` function which simply extracts the function wrapped in 3 back ticks. +# +# And 3 reward functions: +# +# 1. `function_works` which rewards the model if the strategy is a valid Python function. +# 2. `no_cheating` which checks if the function imported other modules, and if it did, we penalize it. +# 3. `strategy_succeeds` which checks if the game strategy actually succeeds in attaining Sudoku after running the auto-generated strategy. + +# In[ ]: + + +def extract_function(text): + """Extract Python function from markdown code blocks.""" + if text.count("```") >= 2: + first = text.find("```") + 3 + second = text.find("```", first) + fx = text[first:second].strip() + fx = fx.removeprefix("python\n") + fx = fx[fx.find("def"):] + if fx.startswith("def strategy(board, initial):"): + return fx + return None + + +# **Reward 1: Function Works** +# +# Checks if the generated code is valid Python and can be executed. 
+ +# In[ ]: + + +def function_works(completions, **kwargs): + """Reward for generating valid executable Python code.""" + scores = [] + for completion in completions: + score = 0 + response = completion[0]["content"] + function = extract_function(response) + + if function is not None: + ok, info = check_python_modules(function) + + if function is None or "error" in info: + score = -2.0 # Invalid function + else: + try: + new_strategy = create_locked_down_function(function) + score = 1.0 # Valid function + except: + score = -1.0 # Function has errors + + scores.append(score) + return scores + + +# **Reward 2: No Cheating** +# +# Penalizes functions that import external libraries. + +# In[ ]: + + +def no_cheating(completions, **kwargs): + """Penalize use of external imports.""" + scores = [] + for completion in completions: + response = completion[0]["content"] + function = extract_function(response) + + if function is not None: + ok, info = check_python_modules(function) + scores.append(1.0 if ok else -20.0) # Heavy penalty for cheating + else: + scores.append(-1.0) # Failed to create function + + return scores + + +# **Reward 3: Strategy Succeeds** +# +# Rewards strategies that successfully solve Sudoku puzzles. 
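As a rough sketch of the scoring rule that the `strategy_succeeds` cell applies to a finished game (30 for a solved puzzle, 0.2 per valid move for partial progress, -2 when the strategy fails immediately), the mapping can be written as a small pure function. The name `shaped_reward` is hypothetical and not part of the notebook; the timeout and exception penalties handled in the actual cell are left out here.

```python
def shaped_reward(game_state: str, valid_moves: int) -> float:
    """Map a finished game to a scalar reward (illustrative helper, not from the notebook)."""
    if game_state == "success":
        return 30.0                  # Solved the puzzle
    if valid_moves > 0:
        return valid_moves * 0.2     # Partial credit: 0.2 per valid move
    return -2.0                      # Failed with no valid moves at all

# Compare a few outcomes side by side
for state, moves in [("success", 40), ("failed", 5), ("failed", 0)]:
    print(state, moves, shaped_reward(state, moves))
```

Keeping the partial-credit term small relative to the success bonus means a strategy cannot out-earn a full solve by merely placing many valid numbers.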
+ +# In[ ]: + + +import numpy as np + +global PRINTER +PRINTER = 0 + +def strategy_succeeds(completions, **kwargs): + """Reward valid moves even if strategy eventually fails.""" + global PRINTER + scores = [] + + seed = np.random.randint(10000) + difficulty = 40 + for completion in completions: + printed = False + response = completion[0]["content"] + function = extract_function(response) + + if PRINTER % 5 == 0: + printed = True + print("\n" + "=" * 60) + print(function) + print("=" * 60) + PRINTER += 1 + + if function is not None: + ok, info = check_python_modules(function) + + if function is None or "error" in info: + scores.append(0) + continue + + try: + new_strategy = create_locked_down_function(function) + except: + scores.append(0) + continue + + try: + game = SudokuGame(difficulty = difficulty, seed = seed) + valid_moves, game_state = execute_strategy(new_strategy, game) + if valid_moves == difficulty: + game_state = "success" + + print(f"\n Valid moves: {valid_moves}, Final state: {game_state}") + + if not printed: + print("Strategy:") + print(function[:200] + "..." if len(function) > 200 else function) + + print("\nFinal board:") + print(game.pretty()) + + if game_state == "success": + scores.append(30.0) # Solved the puzzle! + elif valid_moves > 0: + # Reward based on valid moves made before failure + # Each valid move is worth 0.2 points + reward = valid_moves * 0.2 + scores.append(reward) + else: + scores.append(-2.0) # Failed immediately with no valid moves + + except TimeoutError: + print("Timeout") + scores.append(-1.0) + except Exception as e: + print(f"Exception: {str(e)[:100]}") + scores.append(-3.0) + + return scores + + +# # Dataset Preparation +# +# Create the training dataset. 
+ +# In[ ]: + + +from datasets import Dataset + +dataset = Dataset.from_list([ + { + "prompt": [{"role": "user", "content": prompt.strip()}], + "answer": 0, + } +] * 1000) + +maximum_length = len(tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + add_generation_prompt = True +)) + +print(f"Maximum prompt length: {maximum_length}") +print("\nDataset sample:") +print(dataset[0]) + + +# +# ### Train the model +# +# Now set up GRPO Trainer and all configurations! We also support GSPO, GAPO, Dr GRPO and more! Go the Unsloth [Reinforcement Learning Docs](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) for more options. + +# In[ ]: + + +# Leave room for the prompt (plus 1 token safety margin) +max_completion_length = max_seq_length - (maximum_length + 1) + +from trl import GRPOConfig, GRPOTrainer +training_args = GRPOConfig( + temperature = 1.0, + learning_rate = 5e-5, + weight_decay = 0.001, + warmup_ratio = 0.1, + lr_scheduler_type = "linear", + optim = "adamw_8bit", + logging_steps = 1, + per_device_train_batch_size = 1, + gradient_accumulation_steps = 2, # Increase to 4 for smoother training + num_generations = 2, # Decrease if out of memory + max_completion_length = max_completion_length, + # num_train_epochs = 1, # Set to 1 for a full training run + max_steps = 60, + save_steps = 100, + report_to = "none", # Can use Weights & Biases, TrackIO + output_dir = "outputs", + epsilon = 0.2, + epsilon_high = 0.28, # one sided + delta = 1.5, # two sided + loss_type = 'bnpo', + mask_truncated_completions = True + # For optional training + evaluation + # fp16_full_eval = True, + # per_device_eval_batch_size = 4, + # eval_accumulation_steps = 1, + # eval_strategy = "steps", + # eval_steps = 1, +) + + +# And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase! +# +# You might have to wait 150 to 200 steps for any action. 
You'll probably get low reward for the first 100 steps. Please be patient! +# +# | Step | Training Loss | reward | reward_std | completion_length | kl | +# |------|---------------|-----------|------------|-------------------|----------| +# | 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 | +# | 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 | +# | 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 | + +# In[ ]: + + +# For optional training + evaluation +# new_dataset = dataset.train_test_split(test_size = 0.01) + +trainer = GRPOTrainer( + model = model, + processing_class = tokenizer, + reward_funcs = [ + function_works, + no_cheating, + strategy_succeeds, + ], + args = training_args, + train_dataset = dataset, + + # For optional training + evaluation + # train_dataset = new_dataset["train"], + # eval_dataset = new_dataset["test"], +) + + +# And let's train the model! +# +# **NOTE** A T4 free GPU might take 5 minutes for one generation sadly since it's an old GPU - A100 or H100 will be much faster! + +# In[ ]: + + +trainer.train() + + +# Now that the LoRA is trained with GRPO, we first save the LoRA adapters! + +# In[ ]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") + + +# Verify LoRA is actually trained! + +# In[ ]: + + +from safetensors import safe_open + +tensors = {} +# Read the adapter from the same directory we saved it to above +with safe_open("gemma_4_lora/adapter_model.safetensors", framework = "pt") as f: + # Verify both A and B are non zero + for key in f.keys(): + tensor = f.get_tensor(key) + zero_fraction = (tensor == 0).sum() / tensor.numel() # Fraction of entries that are exactly 0 + assert zero_fraction.item() < 1.0, f"{key} is all zeros" + + +# +# # Inference +# Now let's try the model we just trained!
+ +# In[ ]: + + +text = tokenizer.apply_chat_template( + [{"role": "user", "content": prompt.strip()}], + tokenize = False, + add_generation_prompt = True, +) + +from transformers import TextStreamer + +_ = model.generate( + **tokenizer(images = None,text = text, return_tensors = "pt").to("cuda"), + temperature = 1.0, + max_new_tokens = 512, + streamer = TextStreamer(tokenizer, skip_prompt = False), +) + + +# +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[ ]: + + +# Merge to 16bit +if False: model.save_pretrained_merged("gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_16bit", tokenizer, save_method = "merged_16bit", token = "YOUR_HF_TOKEN") + +# Merge to 4bit +if False: model.save_pretrained_merged("gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit",) +if False: model.push_to_hub_merged("HF_USERNAME/gemma_4_finetune_4bit", tokenizer, save_method = "merged_4bit", token = "YOUR_HF_TOKEN") + +# Just LoRA adapters +if False: + model.save_pretrained("gemma_4_lora") + tokenizer.save_pretrained("gemma_4_lora") +if False: + model.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + tokenizer.push_to_hub("HF_USERNAME/gemma_4_lora", token = "YOUR_HF_TOKEN") + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF. 
+# +# Some supported quant methods (full list on our [docs page](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf)): +# * `q8_0` - Fast conversion. High resource use, but generally acceptable. +# * `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K. +# * `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K. +# +# [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) + +# In[ ]: + + +# Save to 8bit Q8_0 +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer,) +# Remember to go to https://huggingface.co/settings/tokens for a token! +# And change hf to your username! +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, token = "YOUR_HF_TOKEN") + +# Save to 16bit GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "f16") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "f16", token = "YOUR_HF_TOKEN") + +# Save to q4_k_m GGUF +if False: model.save_pretrained_gguf("gemma_4_finetune", tokenizer, quantization_method = "q4_k_m") +if False: model.push_to_hub_gguf("HF_USERNAME/gemma_4_finetune", tokenizer, quantization_method = "q4_k_m", token = "YOUR_HF_TOKEN") + +# Save to multiple GGUF options - much faster if you want multiple! +if False: + model.push_to_hub_gguf( + "HF_USERNAME/gemma_4_finetune", # Change hf to your username! + tokenizer, + quantization_method = ["q4_k_m", "q8_0", "q5_k_m",], + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma_4_finetune.Q8_0.gguf` file or `gemma_4_finetune.Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! 
If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Audio.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Audio.py new file mode 100644 index 0000000..8fe447a --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Audio.py @@ -0,0 +1,478 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance! +#
+# +# +# Join Discord if you need help + ⭐ Star us on Github ⭐ +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
# [Screenshots: Unsloth Studio Training UI (train models, no code needed) and Unsloth Studio Chat UI (run GGUF models on Mac, Windows & Linux)]
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
+ +# In[3]: + + +from unsloth import FastModel +import torch +from huggingface_hub import snapshot_download + +fourbit_models = [ + # Gemma 4 models + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B-it", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-E4B-it", + dtype = None, # None for auto detection + max_seq_length = 8192, # Choose any for long context! + load_in_4bit = True, # 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. We use Gemma 4's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR. + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **processor.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + do_sample = False, + streamer = TextStreamer(processor, skip_prompt = True), + ) + + +#

# ### Let's Evaluate Gemma 4 Baseline Performance on German Transcription

+ +# In[5]: + + +from datasets import load_dataset,Audio,concatenate_datasets + +dataset = load_dataset("kadirnar/Emilia-DE-B000000", split = "train") + +# Select a single audio sample to reserve for testing. +# This index is chosen from the full dataset before we create the smaller training split. +test_audio = dataset[7546] + +dataset = dataset.select(range(3000)) + +dataset = dataset.cast_column("audio", Audio(sampling_rate = 16000)) + + +# In[6]: + + +from IPython.display import Audio, display +print(test_audio['text']) +Audio(test_audio['audio']['array'],rate = test_audio['audio']['sampling_rate']) + + +# And the translation of the audio from German to English is: +# +# > I—I hold myself directly accountable. That much is, of course, clear: namely, that there are political interests involved in trade—in the exchange of goods—and that political influences are at play. The question is: that should not be the alternative. + +# In[7]: + + +messages = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": test_audio['audio']['array']}, + {"type": "text", "text": "Please transcribe this audio."} + ] + } +] + +do_gemma_4_inference(messages, max_new_tokens = 256) + + +#

# ### Baseline Model Performance: 32.43% Word Error Rate (WER) for this sample!
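The script reports a WER figure but does not show how it was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words) and plain lowercase, whitespace tokenization, WER can be computed as below. The function name `word_error_rate` and the example sentences are illustrative, not from the notebook. Reported figures usually involve extra text normalization (punctuation, casing rules), so this sketch will not necessarily reproduce 32.43%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

# Illustrative sentences only (not the notebook's audio sample)
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

To score the model, the reference would be the dataset's `text` field and the hypothesis the decoded model output, with the same normalization applied to both.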

+ +# # Let's finetune Gemma 4! +# +# You can finetune the vision and text and audio parts + +# We now add LoRA adapters so we only need to update a small amount of parameters! + +# In[8]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 8, # The larger, the higher the accuracy, but might overfit + lora_alpha = 16, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = [ + "q_proj", "k_proj", "v_proj", "o_proj", + "gate_proj", "up_proj", "down_proj", + + # Audio layers + "post", "linear_start", "linear_end", + "embedding_projection", + "ffw_layer_1", "ffw_layer_2", + "output_proj", + ] +) + + +# +# ### Data Prep +# We adapt the `kadirnar/Emilia-DE-B000000` dataset for our German ASR task using Gemma 4 multi-modal chat format. Each audio-text pair is structured into a conversation with `system`, `user`, and `assistant` roles. The processor then converts this into the final training format: +# +# ``` +# <|turn>system +# You are an assistant that transcribes speech accurately. +# <|turn>user +# <|audio|>Please transcribe this audio. +# <|turn>model +# Ich, ich rechne direkt mich an. 
+ +# In[9]: + + +def format_intersection_data(samples: dict) -> dict[str, list]: + """Format intersection dataset to match expected message format""" + formatted_samples = {"messages": []} + for idx in range(len(samples["audio"])): + audio = samples["audio"][idx]["array"] + label = str(samples["text"][idx]) + + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": audio}, + {"type": "text", "text": "Please transcribe this audio."} + ] + }, + { + "role": "assistant", + "content":[{"type": "text", "text": label}] + } + ] + formatted_samples["messages"].append(message) + return formatted_samples + + +# In[10]: + + +dataset = dataset.map(format_intersection_data, batched = True, batch_size = 4, num_proc = 4) + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. 
+ +# In[11]: + + +# Use UnslothVisionDataCollator which handles audio token alignment correctly +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 8, + gradient_accumulation_steps = 1, + warmup_ratio = 0.03, + # num_train_epochs = 1, # Use for full training runs + max_steps = 60, + learning_rate = 5e-5, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", + remove_unused_columns = False, + + # The below are a must for audio finetuning: + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 8192, + ) +) + + +# In[12]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! +# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[13]: + + +trainer_stats = trainer.train() + + +# In[14]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR. + +# In[15]: + + +messages = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": test_audio['audio']['array']}, + {"type": "text", "text": "Please transcribe this audio."} + ] + } +] + +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
+ +# In[16]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[17]: + + +if False: + from unsloth import FastModel + model, processor = FastModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = processor.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(processor, skip_prompt = True), +) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[18]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4", processor) + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[19]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", processor, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. 
`Q4_K_M` for 4bit will come later! + +# In[20]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + processor, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[21]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + processor, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Text.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Text.py new file mode 100644 index 0000000..b68f835 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Text.py @@ -0,0 +1,557 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab L4 instance! +#
+# +# +# Join Discord if you need help + ⭐ Star us on Github ⭐ +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
Unsloth Studio Training UI
Train models — no code needed
Unsloth Studio Chat UI
Run GGUF models on Mac, Windows & Linux
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth +# +# `FastModel` supports loading nearly any model now! This includes Vision and Text models! 
+ +# In[3]: + + +from unsloth import FastModel +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + "unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, tokenizer = FastModel.from_pretrained( + model_name = "unsloth/gemma-4-E4B-it", + dtype = None, # None for auto detection + max_seq_length = 1024, # Choose any for long context! + load_in_4bit = True, # 4 bit quantization to reduce memory + full_finetuning = False, # [NEW!] We have full finetuning now! + # token = "YOUR_HF_TOKEN", # HF Token for gated models +) + + +# # Gemma 4 can process Text, Vision and Audio! +# +# Let's first experience how Gemma 4 can handle multimodal inputs. We use Gemma 4's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[4]: + + +from transformers import TextStreamer +# Helper function for inference +def do_gemma_4_inference(messages, max_new_tokens = 128): + _ = model.generate( + **tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + tokenize = True, + return_dict = True, + return_tensors = "pt", + ).to("cuda"), + max_new_tokens = max_new_tokens, + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), + use_cache = True + ) + + +# # Gemma 4 can see images! +# +# Alt text + +# In[5]: + + +sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg" + +messages = [{ + "role" : "user", + "content": [ + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "Which films does this animal feature in?" 
} + ] +}] +# You might have to wait 1 minute for Unsloth's auto compiler +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# Let's make a poem about sloths! + +# In[6]: + + +messages = [{ + "role": "user", + "content": [{ "type" : "text", + "text" : "Write a poem about sloths." }] +}] +do_gemma_4_inference(messages) + + +# # Gemma 4 can also hear! + +# In[7]: + + +from IPython.display import Audio, display +Audio("https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3") + + +# In[8]: + + +get_ipython().system('wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O audio.mp3') + + +# In[9]: + + +audio_file = "audio.mp3" + +messages = [{ + "role" : "user", + "content": [ + { "type": "audio", "audio" : audio_file }, + { "type": "text", "text" : "What is this audio about?" } + ] +}] +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# # Let's combine all 3 modalities together! + +# In[10]: + + +messages = [{ + "role" : "user", + "content": [ + { "type": "audio", "audio" : audio_file }, + { "type": "image", "image" : sloth_link }, + { "type": "text", "text" : "What is this audio and image about? "\ + "How are they related?" } + ] +}] +do_gemma_4_inference(messages, max_new_tokens = 256) + + +# # Let's finetune Gemma 4! +# +# You can finetune the vision and text parts for now through selection - the audio part can also be finetuned - we're working to make it selectable as well! + +# We now add LoRA adapters so we only need to update a small amount of parameters! + +# In[11]: + + +model = FastModel.get_peft_model( + model, + finetune_vision_layers = False, # Turn off for just text! + finetune_language_layers = True, # Should leave on! + finetune_attention_modules = True, # Attention good for GRPO + finetune_mlp_modules = True, # Should leave on always! 
+ + r = 8, # Larger = higher accuracy, but might overfit + lora_alpha = 8, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, +) + + +# +# ### Data Prep +# We now use the `Gemma-4` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-4 renders multi-turn conversations like below: +# +# ``` +# <|turn>user +# Hello +# <|turn>model +# Hey there! +# ``` +# We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3, gemma-4` and more. + +# In[12]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4", +) + + +# We get the first 3000 rows of the dataset + +# In[13]: + + +from datasets import load_dataset +dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:3000]") + + +# We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes! + +# In[14]: + + +from unsloth.chat_templates import standardize_data_formats +dataset = standardize_data_formats(dataset) + + +# Let's see what row 100 looks like! + +# In[15]: + + +dataset[100] + + +# We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one. + +# In[16]: + + +def formatting_prompts_func(examples): + convos = examples["conversations"] + texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos] + return { "text" : texts, } + +dataset = dataset.map(formatting_prompts_func, batched = True) + + +# Let's see how the chat template did!
Notice there is no `<bos>` token as the processor tokenizer will be adding one. + +# In[17]: + + +dataset[100]["text"] + + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and remove the step limit with `max_steps=None`. + +# In[18]: + + +from trl import SFTTrainer, SFTConfig +trainer = SFTTrainer( + model = model, + tokenizer = tokenizer, + train_dataset = dataset, + eval_dataset = None, # Can set up evaluation! + args = SFTConfig( + dataset_text_field = "text", + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, # Use GA to mimic batch size! + warmup_steps = 5, + # num_train_epochs = 1, # Set this for 1 full training run. + max_steps = 60, + learning_rate = 2e-4, # Reduce to 2e-5 for long training runs + logging_steps = 1, + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "linear", + seed = 3407, + report_to = "none", # Use TrackIO/WandB etc + ), +) + + +# We also use Unsloth's `train_on_responses_only` function to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes! + +# In[19]: + + +from unsloth.chat_templates import train_on_responses_only +trainer = train_on_responses_only( + trainer, + instruction_part = "<|turn>user\n", + response_part = "<|turn>model\n", +) + + +# Let's verify masking the instruction part is done! Let's print the 100th row again. Notice how the sample only has a single `<bos>` as expected!
+ +# In[20]: + + +tokenizer.decode(trainer.train_dataset[100]["input_ids"]) + + +# Now let's print the masked out example - you should see only the answer is present: + +# In[21]: + + +tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ") + + +# In[22]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# # Let's train the model! +# +# To resume a training run, set `trainer.train(resume_from_checkpoint = True)` + +# In[23]: + + +trainer_stats = trainer.train() + + +# In[24]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." +) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model via Unsloth native inference! 
According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` + +# In[25]: + + +from unsloth.chat_templates import get_chat_template +tokenizer = get_chat_template( + tokenizer, + chat_template = "gemma-4", +) +messages = [{ + "role": "user", + "content": [{ + "type" : "text", + "text" : "Continue the sequence: 1, 1, 2, 3, 5, 8,", + }] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") +outputs = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, +) +tokenizer.batch_decode(outputs) + + +# You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time! + +# In[26]: + + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "Why is the sky blue?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 64, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
+ +# In[27]: + + +model.save_pretrained("gemma_4_lora") # Local saving +tokenizer.save_pretrained("gemma_4_lora") +# model.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# tokenizer.push_to_hub("HF_ACCOUNT/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`: + +# In[28]: + + +if False: + from unsloth import FastModel + model, tokenizer = FastModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + max_seq_length = 2048, + load_in_4bit = True, + ) + +messages = [{ + "role": "user", + "content": [{"type" : "text", "text" : "What is Gemma-4?",}] +}] +inputs = tokenizer.apply_chat_template( + messages, + add_generation_prompt = True, # Must add for generation + return_tensors = "pt", + tokenize = True, + return_dict = True, +).to("cuda") + +from transformers import TextStreamer +_ = model.generate( + **inputs, + max_new_tokens = 128, # Increase for longer outputs! + # Recommended Gemma-4 settings! + temperature = 1.0, top_p = 0.95, top_k = 64, + streamer = TextStreamer(tokenizer, skip_prompt = True), +) + + +# ### Saving to float16 for VLLM +# +# We also support saving to `float16` directly for deployment! We save it in the folder `gemma-4-finetune`. Set `if False` to `if True` to let it run! + +# In[29]: + + +if False: # Change to True to save finetune! + model.save_pretrained_merged("gemma-4-finetune", tokenizer) + + +# If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[30]: + + +if False: # Change to True to upload finetune + model.push_to_hub_merged( + "HF_ACCOUNT/gemma-4-finetune", tokenizer, + token = "YOUR_HF_TOKEN" + ) + + +# ### GGUF / llama.cpp Conversion +# To save to `GGUF` / `llama.cpp`, we support it natively now for all models! 
For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later! + +# In[31]: + + +if False: # Change to True to save to GGUF + model.save_pretrained_gguf( + "gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # For now only Q8_0, BF16, F16 supported + ) + + +# Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location! + +# In[32]: + + +if False: # Change to True to upload GGUF + model.push_to_hub_gguf( + "HF_ACCOUNT/gemma_4_finetune", + tokenizer, + quantization_method = "Q8_0", # Only Q8_0, BF16, F16 supported + token = "YOUR_HF_TOKEN", + ) + + +# Now, use the `gemma-4-finetune.gguf` file or `gemma-4-finetune-Q4_K_M.gguf` file in llama.cpp. +# +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# +# +# +# Join Discord if you need help + ⭐️ Star us on Github ⭐️ +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Vision.py b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Vision.py new file mode 100644 index 0000000..007a399 --- /dev/null +++ b/tooling/fine-tuning/unsloth/python_scripts/Gemma4_(E4B)-Vision.py @@ -0,0 +1,448 @@ +#!/usr/bin/env python +# coding: utf-8 + +# To run this, press "*Runtime*" and press "*Run all*" on a Google Colab L4 instance! +#
+# +# +# Join Discord if you need help + ⭐ Star us on Github ⭐ +#
+# +# To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). +# +# You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & how to save it + +# ### News + +# Introducing **Unsloth Studio** - a new open source, no-code web UI to train and run LLMs. [Blog](https://unsloth.ai/docs/new/studio) • [Notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) +# +# +# +# +#
+# +# Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe) +# +# Ultra Long-Context Reinforcement Learning is here with 7x more context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context) +# +# New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning) +# +# Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks). + +# # ### Installation +# +# # In[1]: +# +# +# get_ipython().run_cell_magic('capture', '', 'import os, re\nif "COLAB_" not in "".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r\'[\\d]{1,}\\.[\\d]{1,}\', str(torch.__version__)).group(0)\n xformers = \'xformers==\' + {\'2.10\':\'0.0.34\',\'2.9\':\'0.0.33.post1\',\'2.8\':\'0.0.32.post2\'}.get(v, "0.0.34")\n !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install --no-deps transformers==5.5.0\n!pip install torchcodec\nimport torch; torch._dynamo.config.recompile_limit = 64;\n') +# +# +# # In[2]: +# +# +# get_ipython().run_cell_magic('capture', '', '!pip install --no-deps --upgrade timm # For Gemma 4 vision/audio\n') +# +# +# # ### Unsloth + +# In[3]: + + +from unsloth import FastVisionModel # FastLanguageModel for LLMs +import torch + +gemma4_models = [ + # Gemma-4 instruct models: + "unsloth/gemma-4-E2B-it", + "unsloth/gemma-4-E4B-it", + "unsloth/gemma-4-31B-it", + "unsloth/gemma-4-26B-A4B-it", + # Gemma-4 base models: + "unsloth/gemma-4-E2B", + 
"unsloth/gemma-4-E4B", + "unsloth/gemma-4-31B", + "unsloth/gemma-4-26B-A4B", +] # More models at https://huggingface.co/unsloth + +model, processor = FastVisionModel.from_pretrained( + "unsloth/gemma-4-E4B-it", + load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA. + use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context +) + + +# We now add LoRA adapters for parameter efficient fine-tuning, allowing us to train only 1% of all model parameters efficiently. +# +# **[NEW]** We also support fine-tuning only the vision component, only the language component, or both. Additionally, you can choose to fine-tune the attention modules, the MLP layers, or both! + +# In[4]: + + +model = FastVisionModel.get_peft_model( + model, + finetune_vision_layers = True, # False if not finetuning vision layers + finetune_language_layers = True, # False if not finetuning language layers + finetune_attention_modules = True, # False if not finetuning attention layers + finetune_mlp_modules = True, # False if not finetuning MLP layers + + r = 32, # The larger, the higher the accuracy, but might overfit + lora_alpha = 32, # Recommended alpha == r at least + lora_dropout = 0, + bias = "none", + random_state = 3407, + use_rslora = False, # We support rank stabilized LoRA + loftq_config = None, # And LoftQ + target_modules = "all-linear", # Optional now! Can specify a list if needed +) + + +# +# ### Data Prep +# We'll use a sampled dataset of handwritten math formulas. The objective is to convert these images into a computer-readable format—specifically LaTeX—so they can be rendered. This is particularly useful for complex expressions. +# +# You can access the dataset [here](https://huggingface.co/datasets/unsloth/LaTeX_OCR). The full dataset is [here](https://huggingface.co/datasets/linxy/LaTeX_OCR). 
+ +# In[5]: + + +from datasets import load_dataset +dataset = load_dataset("unsloth/LaTeX_OCR", split = "train") + + +# Let's take an overview of the dataset. We'll examine the sample at index 2 and its corresponding caption. + +# In[6]: + + +dataset + + +# In[7]: + + +dataset[2]["image"] + + +# In[8]: + + +dataset[2]["text"] + + +# We can also render LaTeX directly in the browser! + +# In[9]: + + +from IPython.display import display, Math, Latex + +latex = dataset[3]["text"] +display(Math(latex)) + + +# To format the dataset, all vision fine-tuning tasks should follow this format: +# +# ```python +# [ +# { +# "role": "user", +# "content": [ +# {"type": "text", "text": instruction}, +# {"type": "image", "image": sample["image"]}, +# ], +# }, +# { +# "role": "assistant", +# "content": [ +# {"type": "text", "text": sample["text"]}, +# ], +# }, +# ] +# ``` + +# In[10]: + + +instruction = "Write the LaTeX representation for this image." + +def convert_to_conversation(sample): + conversation = [ + { + "role": "user", + "content": [ + {"type": "text", "text": instruction}, + {"type": "image", "image": sample["image"]}, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": sample["text"]}]}, + ] + return {"messages": conversation} +pass + + +# Let's convert the dataset into the "correct" format for finetuning: + +# In[11]: + + +converted_dataset = [convert_to_conversation(sample) for sample in dataset] + + +# The first example is now structured like below: + +# In[12]: + + +converted_dataset[0] + + +# Let's take the Gemma 4 instruction chat template and use it in our base model + +# In[13]: + + +from unsloth import get_chat_template + +processor = get_chat_template( + processor, + "gemma-4" +) + + +# Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before.
+ +# In[14]: + + +image = dataset[2]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# You can see it's absolutely terrible! It doesn't follow instructions at all. + +# +# ### Train the model +# Now let's train our model. We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and set `max_steps=None` to remove the step cap. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning! +# +# We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup. 
+ +# In[15]: + + +from unsloth.trainer import UnslothVisionDataCollator +from trl import SFTTrainer, SFTConfig + +trainer = SFTTrainer( + model = model, + train_dataset = converted_dataset, + processing_class = processor.tokenizer, + data_collator = UnslothVisionDataCollator(model, processor), + args = SFTConfig( + per_device_train_batch_size = 1, + gradient_accumulation_steps = 4, + max_grad_norm = 0.3, + warmup_ratio = 0.03, + max_steps = 60, + # num_train_epochs = 2, # Set this instead of max_steps for full training runs + learning_rate = 2e-4, + logging_steps = 1, + save_strategy = "steps", + optim = "adamw_8bit", + weight_decay = 0.001, + lr_scheduler_type = "cosine", + seed = 3407, + output_dir = "outputs", + report_to = "none", # For Weights and Biases or others + + # You MUST put the below items for vision finetuning: + remove_unused_columns = False, + dataset_text_field = "", + dataset_kwargs = {"skip_prepare_dataset": True}, + max_length = 2048, + ) +) + + +# In[16]: + + +# @title Show current memory stats +gpu_stats = torch.cuda.get_device_properties(0) +start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) +print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") +print(f"{start_gpu_memory} GB of memory reserved.") + + +# In[17]: + + +trainer_stats = trainer.train() + + +# In[18]: + + +# @title Show final memory and time stats +used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) +used_memory_for_lora = round(used_memory - start_gpu_memory, 3) +used_percentage = round(used_memory / max_memory * 100, 3) +lora_percentage = round(used_memory_for_lora / max_memory * 100, 3) +print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.") +print( + f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training." 
+) +print(f"Peak reserved memory = {used_memory} GB.") +print(f"Peak reserved memory for training = {used_memory_for_lora} GB.") +print(f"Peak reserved memory % of max memory = {used_percentage} %.") +print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.") + + +# +# ### Inference +# Let's run the model! You can modify the instruction and input—just leave the output blank. +# +# We'll use the best hyperparameters for inference on Gemma: `top_p=0.95`, `top_k=64`, and `temperature=1.0`. + +# In[19]: + + +image = dataset[10]["image"] +instruction = "Write the LaTeX representation for this image." + +messages = [ + { + "role": "user", + "content": [{"type": "image"}, {"type": "text", "text": instruction}], + } +] + +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) + +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor, skip_prompt = True) +result = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# +# ### Saving, loading finetuned models +# To save the final model as LoRA adapters, use Hugging Face’s `push_to_hub` for online saving, or `save_pretrained` for local storage. +# +# **[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down! 
+ +# In[20]: + + +model.save_pretrained("gemma_4_lora") # Local saving +processor.save_pretrained("gemma_4_lora") +# model.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving +# processor.push_to_hub("your_name/gemma_4_lora", token = "YOUR_HF_TOKEN") # Online saving + + +# Now if you want to load the LoRA adapters we just saved for inference, change `False` to `True`: + +# In[21]: + + +if False: + from unsloth import FastVisionModel + + model, processor = FastVisionModel.from_pretrained( + model_name = "gemma_4_lora", # YOUR MODEL YOU USED FOR TRAINING + load_in_4bit = True, # Set to False for 16bit LoRA + ) + +sample = dataset[1] +image = sample["image"].convert("RGB") +messages = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": instruction, # Prompt with the instruction; sample["text"] is the ground-truth LaTeX + }, + { + "type": "image", + }, + ], + }, +] +input_text = processor.apply_chat_template(messages, add_generation_prompt = True) +inputs = processor( + image, + input_text, + add_special_tokens = False, + return_tensors = "pt", +).to("cuda") + +from transformers import TextStreamer + +text_streamer = TextStreamer(processor.tokenizer, skip_prompt = True) +_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, + use_cache = True, temperature = 1.0, top_p = 0.95, top_k = 64) + + +# ### Saving to float16 for vLLM +# +# We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://unsloth.ai/docs/basics/inference-and-deployment) for more deployment options. + +# In[22]: + + +# Select ONLY 1 to save! (Both not needed!) 
+ +# Save locally to 16bit +if False: model.save_pretrained_merged("unsloth_finetune", processor,) + +# To export and save to your Hugging Face account +if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", processor, token = "YOUR_HF_TOKEN") + + +# And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord! +# +# Some other resources: +# 1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) +# 2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) +# 3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) +# 4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://unsloth.ai/docs/get-started/unsloth-notebooks)! +# +#
+# +# This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme). diff --git a/tooling/gemma-family/codegemma.md b/tooling/gemma-family/codegemma.md new file mode 100644 index 0000000..cbf1933 --- /dev/null +++ b/tooling/gemma-family/codegemma.md @@ -0,0 +1,88 @@ +# CodeGemma + +Code completion / generation with native **fill-in-the-middle (FIM)** support. Built on **Gemma 1** — still the most recent generation as of April 2026. No CodeGemma 2/3/4 release. + +## What it is + +Gemma 1 fine-tuned on code. Trained with 80–90% FIM rate, 50/50 split between PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) formats. Designed for IDE autocomplete more than chat. + +## Sizes + +- **2B pretrained** — fast completion +- **7B pretrained** — higher quality completion + FIM +- **7B instruction-tuned** — code chat + +Versioned point releases exist (2B 1.1, 7B-IT 1.1). + +## Model card + +- https://ai.google.dev/gemma/docs/codegemma/model_card +- HF: https://huggingface.co/google/codegemma-7b +- Tech report: https://arxiv.org/abs/2406.11409 + +## FIM tokens + +``` +<|fim_prefix|> prefix-of-completion marker +<|fim_suffix|> cursor/insertion-point marker +<|fim_middle|> generation trigger +<|file_separator|> multi-file boundary +``` + +### PSM (Prefix-Suffix-Middle) template + +``` +<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|> +``` + +Example: + +```python +prompt = ( + "<|fim_prefix|>import datetime\n" + "def calculate_age(birth_year):\n" + " current_year = datetime.date.today().year\n" + " <|fim_suffix|>\n" + " return age<|fim_middle|>" +) +``` + +The model generates the middle chunk and halts. + +### Multi-file context + +Prepend referenced files separated by `<|file_separator|>`, then the target file in FIM format. 
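The multi-file layout described above can be sketched as a small prompt builder. This is a minimal sketch rather than CodeGemma reference code: `build_psm_prompt` and its parameter names are hypothetical, and placing a `<|file_separator|>` after the last context file (right before the FIM block) is an assumption about the layout that should be checked against the model card.

```python
from typing import Sequence

# FIM control tokens as listed in the "FIM tokens" section.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
FILE_SEPARATOR = "<|file_separator|>"


def build_psm_prompt(prefix: str, suffix: str, context_files: Sequence[str] = ()) -> str:
    """Assemble a PSM prompt, optionally preceded by reference files.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fim_block = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    # Assumption: each context file, including the last one, is followed
    # by a file separator before the target file's FIM block begins.
    context = "".join(f"{source}{FILE_SEPARATOR}" for source in context_files)
    return context + fim_block


example = build_psm_prompt(
    prefix="def area(r):\n    return ",
    suffix="\n",
    context_files=["PI = 3.14159\n"],
)
print(example)
```

The returned string can be passed to the tokenizer exactly like the single-file prompt; with no context files the helper reduces to the plain PSM template.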
+ +## Minimum invocation + +```python +from transformers import AutoTokenizer, AutoModelForCausalLM +import torch + +model_id = "google/codegemma-7b" +tokenizer = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained( + model_id, torch_dtype=torch.bfloat16, device_map="auto" +) + +prompt = "<|fim_prefix|>def fib(n):\n if n <= 1:\n return n\n <|fim_suffix|>\n return a<|fim_middle|>" +inputs = tokenizer(prompt, return_tensors="pt").to("cuda") +out = model.generate(**inputs, max_new_tokens=128) +print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)) +``` + +## Ollama + +`ollama pull codegemma:7b` or `codegemma:2b`. Ollama wraps the FIM tokens for you when you use its completion API with prefix/suffix. + +## When to choose it over base Gemma 4 + +- You need **IDE-grade FIM autocomplete** — CodeGemma was trained for it, base Gemma 4 was not. +- You want a **2B code model** — base Gemma 4 skips this size (E2B is multimodal, not code-specialized). +- You want **Ollama-native FIM** that tools like `continue.dev` can talk to. + +Base Gemma 4 31B still beats CodeGemma 7B on LiveCodeBench, so for **agentic coding** (plan, write, execute) Gemma 4 or `qwen3-coder:30b` wins. CodeGemma is the inline-cursor-assistant niche. + +## Homelab fit + +Steel141 already has qwen3-coder:30b and qwen3-coder-next:79.7B — those are stronger than CodeGemma 7B. Only reason to pull CodeGemma is if you want a tiny 2B FIM model for a latency-sensitive editor integration on a Pi or on pve197 alongside the vision stack. diff --git a/tooling/gemma-family/datagemma.md b/tooling/gemma-family/datagemma.md new file mode 100644 index 0000000..1f2f7a4 --- /dev/null +++ b/tooling/gemma-family/datagemma.md @@ -0,0 +1,76 @@ +# DataGemma + +LLM grounding with Google **Data Commons** — a public knowledge graph of 240B+ statistical data points (economics, health, demographics, science). Built on **Gemma 2 27B**. No Gemma 3 or 4 generation yet. 
+ +## What it is + +Two flavors: + +- **DataGemma RIG** (Retrieval-Interleaved Generation): Model is fine-tuned to emit inline Data Commons queries wrapped around its own claims. Outputs look like `The population of Sunnyvale is [__DC__("population of Sunnyvale") --> "152,200"]`. An external resolver substitutes the real stat. +- **DataGemma RAG** (Retrieval-Augmented Generation): Standard RAG pipeline — query Data Commons, inject results into context, generate. + +## Sizes + +- **27B instruct** only (`datagemma-rig-27b-it`, `datagemma-rag-27b-it`). + +## Model cards + +- https://ai.google.dev/gemma/docs/datagemma +- DeepMind: https://deepmind.google/models/gemma/datagemma/ +- HF RIG: https://huggingface.co/google/datagemma-rig-27b-it +- HF RAG: https://huggingface.co/google/datagemma-rag-27b-it +- Paper: https://docs.datacommons.org/papers/DataGemma-FullPaper.pdf + +## Performance claim + +Baseline Gemma 2 factuality on the 101-query statistical eval: **5–17%**. DataGemma RIG: **~58%**. The improvement is narrow (statistical claims only) but real. + +## Prompt format + +No special template. Plain natural-language input. The difference is in the **training** and the **output format**. + +**RIG output example:** +``` +Sunnyvale has [__DC__("total population of Sunnyvale CA") --> "152,200"] +residents as of 2020, with a median age of [__DC__("median age of +Sunnyvale CA") --> "34.8"]. +``` + +Post-processing: regex out the `[__DC__("...") --> "..."]` blocks and either (a) replace with resolved Data Commons values, or (b) render as inline citations. + +**RAG flow:** query Data Commons first, inject tabular context, then prompt normally. 
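The RIG post-processing step described above (regex out the `[__DC__("...") --> "..."]` blocks, then substitute resolved values) can be sketched as follows. This is a minimal sketch under stated assumptions: `extract_dc_claims`, `resolve_rig`, and the `lookup` callable are hypothetical names, and `lookup` stands in for whatever Data Commons client you use; no real Data Commons API call is shown.

```python
import re
from typing import Callable, List, Optional, Tuple

# Matches one inline annotation: [__DC__("query") --> "model value"]
DC_BLOCK = re.compile(r'\[__DC__\("(?P<query>[^"]*)"\)\s*-->\s*"(?P<value>[^"]*)"\]')


def _clean(query: str) -> str:
    # Queries can wrap across lines in generated text; collapse whitespace.
    return " ".join(query.split())


def extract_dc_claims(text: str) -> List[Tuple[str, str]]:
    """Return (query, model_value) pairs for every annotation in the text."""
    return [(_clean(m["query"]), m["value"]) for m in DC_BLOCK.finditer(text)]


def resolve_rig(text: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Replace each annotation with the resolved value.

    `lookup` is a hypothetical resolver (query -> value or None). When it
    returns None, the model's own value is kept.
    """
    def substitute(match: re.Match) -> str:
        resolved = lookup(_clean(match["query"]))
        return resolved if resolved is not None else match["value"]

    return DC_BLOCK.sub(substitute, text)


sample = (
    'Sunnyvale has [__DC__("total population of Sunnyvale CA") --> "152,200"] '
    'residents, with a median age of [__DC__("median age of\nSunnyvale CA") --> "34.8"].'
)
print(extract_dc_claims(sample))
print(resolve_rig(sample, lambda query: None))
```

Keeping extraction separate from substitution makes it easy to render the same annotations as inline citations instead of replacing them.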
+ +## Minimum invocation — RIG + +```python +from transformers import AutoTokenizer, AutoModelForCausalLM +import torch + +model_id = "google/datagemma-rig-27b-it" +tokenizer = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained( + model_id, device_map="auto", torch_dtype=torch.bfloat16 +) + +prompt = "What are the demographic trends in Sunnyvale, California?" +inputs = tokenizer(prompt, return_tensors="pt").to("cuda") +out = model.generate(**inputs, max_new_tokens=1024) +print(tokenizer.batch_decode( + out[:, inputs["input_ids"].shape[1]:], + skip_special_tokens=True +)[0]) +``` + +Then run a resolver that extracts each `[__DC__(q) --> ""]` and hits the Data Commons API. + +## When to choose it over base Gemma 4 + +- You're building a **statistics-grounded assistant** (government data, public health, economic indicators) and need low hallucination on numbers. +- You're okay with a **27B model** — DataGemma only ships at this size. +- Your domain overlaps Data Commons coverage (US-heavy, but growing internationally). + +Base Gemma 4 + a conventional RAG pipeline can do the same thing if you bring your own retriever. DataGemma's value is the **trained inline-citation behavior** (RIG) — Gemma 4 won't emit that format without prompting gymnastics. + +## Homelab fit + +Low. No current Seth project leans on statistical grounding. Niche for a news-summary use case (POS-Automation daily print) if Seth ever wants "US inflation was X% as of Y" kind of interjections — but then a simple Data Commons API call from the script is cheaper than running a 27B model. diff --git a/tooling/gemma-family/dolphingemma.md b/tooling/gemma-family/dolphingemma.md new file mode 100644 index 0000000..419252d --- /dev/null +++ b/tooling/gemma-family/dolphingemma.md @@ -0,0 +1,44 @@ +# DolphinGemma + +Marine biology / dolphin vocalization model. Developed with the Wild Dolphin Project (WDP) and Georgia Tech. Announced April 2025. 
+ +## Status + +**Not publicly released as of April 2026.** DeepMind's page states "DolphinGemma is currently in development. On release, it will be openly available." No weights on Hugging Face, Kaggle, or Google AI for Developers. Google's 2025 post anticipated a summer 2025 open-source release; that slipped. + +If you see a `dolphingemma-*` tag somewhere, it is either community-named (not Google) or a leaked checkpoint. Verify the uploader is `google/` on HF. + +## What it is (from announcement material) + +- **Audio-in, audio-out** model. +- Trained on tens of thousands of hours of Atlantic spotted dolphin vocalizations. +- Predicts the next sound in a sequence (same training objective as an LLM, just in the audio token domain). +- **~400M parameters** — small enough to run on a Pixel phone in the field. +- Intended to plug into the CHAT (Cetacean Hearing Augmentation Telemetry) system to accelerate real-time pattern recognition during dolphin interactions. + +## Base generation + +Announced as "built on Google's open Gemma series." Google has not disclosed which generation. Given the mid-2025 timing and 400M size, most likely Gemma 3-era tech, but **this is an educated guess**, not confirmed. + +## Model card + +- DeepMind: https://deepmind.google/models/gemma/dolphingemma/ +- Blog: https://blog.google/innovation-and-ai/products/dolphingemma/ + +No model card on ai.google.dev yet (expected once released). + +## Prompt format + +Not published. The audio-token I/O format will depend on the tokenizer Google picked (e.g., SoundStream, Whisper-style, or a custom cetacean-phoneme tokenizer). Wait for release. + +## Minimum invocation + +Not possible. No weights available. + +## When to choose it + +If and when it ships: marine biology research, specifically Atlantic spotted dolphins. Fine-tunable for other cetacean species per Google. + +## Homelab fit + +Zero for normal use. 
If it ships and Seth wants a novelty "run the model on a cheap Pi and watch it hallucinate dolphin whistles" project, it's a candidate for the 400M-parameter slot on seth-pi. Until then, nothing to deploy. diff --git a/tooling/gemma-family/embeddinggemma.md b/tooling/gemma-family/embeddinggemma.md new file mode 100644 index 0000000..7cb6475 --- /dev/null +++ b/tooling/gemma-family/embeddinggemma.md @@ -0,0 +1,93 @@ +# EmbeddingGemma + +On-device text embedding model. Released **September 2025**. Built on **Gemma 3 with T5Gemma initialization**. No Gemma 4 generation yet. + +## What it is + +A **308M-parameter** open embedding model. Trained on 100+ languages. State-of-the-art on MTEB for its size class. Uses **Matryoshka Representation Learning (MRL)** — one model produces embeddings at 768, 512, 256, or 128 dimensions by truncation + renormalization, with graceful quality degradation. + +## Sizes + +- **308M** — only size. + +## Model card + +- https://ai.google.dev/gemma/docs/embeddinggemma/model_card +- HF: https://huggingface.co/google/embeddinggemma-300m +- HF blog: https://huggingface.co/blog/embeddinggemma +- DeepMind: https://deepmind.google/models/gemma/embeddinggemma/ +- Paper: https://arxiv.org/html/2509.20354v2 + +## Prompt format + +EmbeddingGemma uses **task-prefixed inputs** — you prepend a task descriptor to each string before embedding. + +### Query prompts + +``` +task: {task description} | query: {your query} +``` + +Default task description: `search result`. + +Example: `task: search result | query: what is the capital of France?` + +### Document prompts + +``` +title: {title or "none"} | text: {document text} +``` + +Providing a real title improves retrieval; use `none` if unavailable. 
+ +Example: `title: Eiffel Tower | text: The Eiffel Tower is a wrought-iron lattice tower...` + +## Minimum invocation + +### Sentence-Transformers (easy path) + +```python +from sentence_transformers import SentenceTransformer + +model = SentenceTransformer("google/embeddinggemma-300m") + +query = "Which planet is known as the Red Planet?" +documents = [ + "Mars, known for its reddish appearance, is often referred to as the Red Planet.", + "Venus is often called Earth's twin due to its similar size.", +] + +q_emb = model.encode_query(query) +d_emb = model.encode_document(documents) + +print(model.similarity(q_emb, d_emb)) +``` + +The `encode_query` / `encode_document` methods apply the task prefixes automatically. + +### Shorter embeddings (MRL) + +```python +# Reuses `model` and `documents` from the snippet above. +# convert_to_tensor=True returns a torch tensor, which the slicing and .norm() below require. +emb_768 = model.encode(documents, convert_to_tensor=True) # full: shape (n_docs, 768) +emb_256 = emb_768[:, :256] # truncate +emb_256 = emb_256 / emb_256.norm(dim=-1, keepdim=True) # renormalize +``` + +## Gotcha + +**Activations do not support `float16`.** Use `bfloat16` or `float32`. This is explicit in the model card. + +## When to choose it over base Gemma 4 + +Always, when you want embeddings. Base Gemma 4 is a generative decoder — not trained as an embedding model. EmbeddingGemma is the correct tool for retrieval, clustering, semantic search, RAG. + +Its main competitor is `nomic-embed-text` (already in Seth's pantry). EmbeddingGemma's MRL and multilingual coverage (100+ languages vs. nomic's English-focused training) are the differentiators. + +## Homelab fit + +**Highest-impact variant for Seth right now, along with TranslateGemma.** + +- **Family history agent:** 100+ language support + 128d embeddings = tight, multilingual indices over scanned documents, letters, census records. MRL lets you serve fast 128d approximate search and fall back to 768d for reranking. +- **SearXNG / SethSearch:** drop-in upgrade from nomic-embed-text for the semantic-search layer. Bigger model but better quality. 
+- **Mortdecai memory:** use 308M EmbeddingGemma for long-term memory over chat logs. Small enough to run alongside the big mortdecai qwen35 models on pve197 or steel141 without resource contention. +- **Gemma-cookbook already has a tutorial** (`tutorials_RAG_EmbeddingGemma.ipynb` in the corpus) — skip straight to working code. diff --git a/tooling/gemma-family/index.md b/tooling/gemma-family/index.md new file mode 100644 index 0000000..753af47 --- /dev/null +++ b/tooling/gemma-family/index.md @@ -0,0 +1,55 @@ +# Gemma family index (as of April 2026) + +Specialized sister models Google has released alongside base Gemma. Base Gemma 4 instruct/base variants are **not** listed here — they live in the main corpus at `/home/claude/bin/gemma4-research/`. + +## Summary table + +| Variant | Base gen | Sizes | Canonical use case | HF URL | +|---|---|---|---|---| +| **ShieldGemma** | Gemma 2 | 2B, 9B, 27B | Text safety classification (4 harm types) | [google/shieldgemma-2b](https://huggingface.co/google/shieldgemma-2b) | +| **ShieldGemma 2** | Gemma 3 | 4B | Image safety classification (3 categories) | [google/shieldgemma-2-4b-it](https://huggingface.co/google/shieldgemma-2-4b-it) | +| **CodeGemma** | Gemma 1 | 2B, 7B, 7B-IT | Code completion with FIM tokens | [google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | +| **PaliGemma** | Gemma 1 | 3B | Vision-language (task-prefix prompting) | [google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448) | +| **PaliGemma 2** | Gemma 2 | 3B, 10B, 28B | Vision-language, multi-resolution | [google/paligemma2-3b-pt-448](https://huggingface.co/google/paligemma2-3b-pt-448) | +| **RecurrentGemma** | Gemma 1 | 2B, 9B | Griffin architecture, long-context throughput | [google/recurrentgemma-9b](https://huggingface.co/google/recurrentgemma-9b) | +| **DataGemma (RIG/RAG)** | Gemma 2 | 27B | Statistical grounding via Google Data Commons | 
[google/datagemma-rig-27b-it](https://huggingface.co/google/datagemma-rig-27b-it) | +| **MedGemma 1.5** | Gemma 3 | 4B multimodal | Medical text + image comprehension (non-clinical) | [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) | +| **TxGemma** | Gemma 2 | 2B, 9B, 27B | Therapeutics/drug-discovery prediction | [google/txgemma-27b-predict](https://huggingface.co/google/txgemma-27b-predict) | +| **DolphinGemma** | Gemma (unstated) | ~400M | Marine biology / dolphin vocalization | *Not released as of April 2026* | +| **SignGemma** | Gemma 3-era | small on-device | ASL → English translation | *Limited preview only; no public weights as of April 2026* | +| **TranslateGemma** | Gemma 3 | 4B, 12B, 27B | 55-language text + image translation | [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) | +| **EmbeddingGemma** | Gemma 3 (T5Gemma init) | 308M | On-device text embeddings, MRL (768/512/256/128) | [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) | +| **T5Gemma / T5Gemma 2** | Gemma 2 / Gemma 3 | small → 4B-4B | Encoder-decoder for summarization, translation | [google/t5gemma-2-4b-4b](https://huggingface.co/google/t5gemma-2-4b-4b) | +| **FunctionGemma** | Gemma 3 | 270M | Function-calling specialist | [google/functiongemma-270m](https://huggingface.co/google/functiongemma-270m) | +| **VaultGemma** | Gemma 3 | 1B | Differential-privacy-trained LLM | [google/vaultgemma-1b](https://huggingface.co/google/vaultgemma-1b) | +| **Gemma-APS** | Gemma 2 | 2B, 7B | Abstractive proposition segmentation | — | +| **Gemma Scope / Scope 2** | Gemma 2/3 | SAE suite | Mechanistic interpretability | [google/gemma-scope](https://huggingface.co/google/gemma-scope) | + +## Gemma 4 generation status + +**As of 2026-04-18, no specialized sister model has been re-based to Gemma 4.** Every variant in the table above is built on Gemma 1, 2, or 3. 
The newest specialized releases (TranslateGemma, Jan 2026; T5Gemma 2, Dec 2025) still sit on Gemma 3. This is normal for Google's cadence — sisters lag the base release by 3–6 months. Expect a MedGemma-on-Gemma-4, ShieldGemma-3-on-Gemma-4, and PaliGemma 3 over summer/fall 2026. + +## Per-variant files + +- `shieldgemma.md` — covers both ShieldGemma (text) and ShieldGemma 2 (image) +- `codegemma.md` +- `paligemma.md` — covers both PaliGemma and PaliGemma 2 +- `recurrentgemma.md` +- `datagemma.md` +- `medgemma.md` +- `txgemma.md` +- `dolphingemma.md` +- `signgemma.md` +- `translategemma.md` +- `embeddinggemma.md` +- `other-variants.md` — T5Gemma, FunctionGemma, VaultGemma, Gemma-APS, Gemma Scope + +## Picking a variant for homelab use + +Short read — see individual files for depth. + +- **Minecraft agent (Mortdecai):** consider `FunctionGemma` (270M) as a fast-path tool-router in front of the big `mortdecai:*` models. Today's setup uses the base `qwen35`/`mortdecai` tool calling, but FunctionGemma's 270M size makes it cheap enough to run as a gateway classifier. +- **AI music video gen / visualizer:** `PaliGemma 2` for detailed captioning of reference frames; `ShieldGemma 2` to pre-filter generated output before publishing. Base Gemma 4 vision (tested in existing corpus) handles the "describe this image" job fine — reach for PaliGemma 2 when you need spatial grounding (detect/segment task prefixes). +- **Family history agent:** `EmbeddingGemma` (308M) is the immediate win — small, multilingual, 100+ languages, MRL to 128d for tight indices. Pair with `TranslateGemma` if sources are in German/Polish/etc. For ingest of old scanned documents, `PaliGemma 2` + `TranslateGemma` handles image-embedded text translation. +- **General safety pass for anything going public:** `ShieldGemma 2` for images, `ShieldGemma` (Gemma 2-based) for text. Both run comfortably on pve197's CT 105. 
+- **Skip for homelab:** MedGemma (disclaimer-laden, not clinical-grade, niche), TxGemma (drug discovery, highly specialist), DolphinGemma (not released), SignGemma (limited preview, no weights). diff --git a/tooling/gemma-family/medgemma.md b/tooling/gemma-family/medgemma.md new file mode 100644 index 0000000..81515eb --- /dev/null +++ b/tooling/gemma-family/medgemma.md @@ -0,0 +1,72 @@ +# MedGemma + +Medical-domain variant for text + image comprehension. Current release is **MedGemma 1.5** (Jan 13, 2026), built on **Gemma 3**. **No Gemma 4 generation.** + +## What it is + +Gemma 3 fine-tuned on de-identified medical corpora — clinical notes, radiology images, dermatology images, histopathology, etc. The multimodal variants use a SigLIP image encoder trained specifically on medical imagery (not the base SigLIP). + +## Sizes + +**MedGemma 1.5** (current): **4B multimodal IT only**. Previous 27B variants were in MedGemma 1; 1.5 currently ships 4B only with improvements in medical reasoning, records interpretation, and image interpretation. + +**MedGemma 1** (prior): 4B multimodal, 27B text-only, 27B multimodal. + +## Model card + +- https://developers.google.com/health-ai-developer-foundations/medgemma/model-card +- DeepMind: https://deepmind.google/models/gemma/medgemma/ +- Repo: https://github.com/google-health/medgemma +- Tech report: https://arxiv.org/abs/2507.05201 + +## Intended use + +"A starting point that enables more efficient development of downstream healthcare applications involving medical text and images." **Developer tool, not a clinical product.** + +### Disclaimer (near-verbatim from model card) + +> The outputs generated by MedGemma are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications. All outputs require independent verification and clinical correlation. 
+ +Terms of use are governed by **Health AI Developer Foundations** — a separate license from base Gemma's. Read it before shipping anything. + +## Prompt format + +Standard Gemma 3 chat template. Content messages accept `{"type": "image"}` and `{"type": "text"}`. + +## Minimum invocation + +```python +from transformers import pipeline +from PIL import Image +import requests, torch + +pipe = pipeline( + "image-text-to-text", + model="google/medgemma-1.5-4b-it", + torch_dtype=torch.bfloat16, + device="cuda", +) + +img_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png" +image = Image.open(requests.get(img_url, stream=True).raw) + +messages = [{"role": "user", "content": [ + {"type": "image", "image": image}, + {"type": "text", "text": "Describe this chest X-ray. What anatomical structures are visible?"}, +]}] + +out = pipe(text=messages, max_new_tokens=512) +print(out[0]["generated_text"][-1]["content"]) +``` + +## When to choose it over base Gemma 4 + +- You're building **healthcare dev tools** (medical image triage assistant, doctor-facing records summarizer, clinician education) and want the SigLIP-medical image encoder. +- You can accept the Health AI Developer Foundations license and embed the disclaimers. +- You need **medical-vocabulary fluency** (SNOMED, ICD, RxNorm) that base Gemma 4 doesn't have at the 4B size. + +Use base Gemma 4 otherwise — including for health-adjacent content that isn't clinical (fitness logs, nutrition, sleep data). + +## Homelab fit + +Zero. Seth is not running medical apps. Noted for completeness only. diff --git a/tooling/gemma-family/other-variants.md b/tooling/gemma-family/other-variants.md new file mode 100644 index 0000000..be7ce83 --- /dev/null +++ b/tooling/gemma-family/other-variants.md @@ -0,0 +1,74 @@ +# Other Gemma variants + +Smaller / more specialized sisters that don't warrant a full file each. All on Gemma 2 or Gemma 3. 
**None on Gemma 4 as of April 2026.** + +## T5Gemma / T5Gemma 2 + +**Encoder-decoder** Gemma, built by adapting decoder-only Gemma weights into a T5-style encoder-decoder via UL2 or PrefixLM pretraining. + +- **T5Gemma** (Jul 2025): Gemma 2-based. Sizes include 2B-2B, 9B-2B, 9B-9B plus new T5-sized small/base/large/XL models. +- **T5Gemma 2** (Dec 2025): Gemma 3-based. Sizes: 270M-270M, 1B-1B, 4B-4B. Multimodal (128K context). + +### When to pick it + +- **Summarization, translation, QA** where the encoder's separate bidirectional attention buys quality. +- Anywhere a decoder-only Gemma feels wasteful for "read input, compress into short output" tasks. + +HF: https://huggingface.co/google/t5gemma-2-4b-4b +Blog: https://developers.googleblog.com/en/t5gemma/ + +## FunctionGemma + +**270M tool/function-calling specialist.** Gemma 3-based. Released Dec 2025. + +Trained to emit structured function calls given a tool catalog. Not a generalist chat model — feed it a user message + tool schemas and it picks the right tool. Tiny enough to run as a pre-router in front of a larger model. + +### When to pick it + +- **Minecraft agent (Mortdecai):** plausibly interesting — use it as a 270M gateway that classifies intent and picks one of the Mortdecai tools, then hands off to the bigger `mortdecai:*` model for reasoning. Latency/cost savings if the tool decision is hot-path. +- Any agent where tool-selection volume is high and model call cost matters. + +HF: search `google/functiongemma-270m`. + +## VaultGemma + +**1B Gemma 3 trained with differential privacy.** Released Sep 2025. + +The point is the training process (DP-SGD with rigorous privacy budget) more than the weights per se. Useful as a reference checkpoint or for deployments where "model cannot have memorized training data" is a hard requirement. + +### When to pick it + +- Niche. You almost never need DP-trained weights unless you're in regulated space. 
+ +## Gemma-APS + +**Abstractive Proposition Segmentation.** 2B and 7B on Gemma 2. Oct 2024. + +Takes a passage, splits it into atomic propositions (self-contained factual statements). Useful for fact-checking, citation mapping, and as a preprocessing step for RAG indexing. + +### When to pick it + +- Building a **fact-verification pipeline** where you need to decompose generated text into checkable claims. +- **Family history** — could decompose narrative biographical text into timestamped facts for structured storage. + +## Gemma Scope / Gemma Scope 2 + +Sparse autoencoder (SAE) suites for **mechanistic interpretability** research. Gemma Scope on Gemma 2, Gemma Scope 2 on Gemma 3 (Dec 2025). + +Not models you deploy for product work. Tools for "which neurons activate on what" research. + +HF: https://huggingface.co/google/gemma-scope + +### When to pick it + +- Interpretability research only. Not a homelab deployment candidate. + +## Summary of homelab relevance + +| Variant | Homelab fit | +|---|---| +| T5Gemma 2 4B-4B | Moderate — summarization for the news-briefing printer | +| FunctionGemma 270M | **High — tool-router for Mortdecai** | +| VaultGemma | None | +| Gemma-APS | Low-moderate — niche preprocessing step | +| Gemma Scope | None (research tool) | diff --git a/tooling/gemma-family/paligemma.md b/tooling/gemma-family/paligemma.md new file mode 100644 index 0000000..6de23c2 --- /dev/null +++ b/tooling/gemma-family/paligemma.md @@ -0,0 +1,80 @@ +# PaliGemma / PaliGemma 2 + +Vision-language model combining a **SigLIP** image encoder with a Gemma text decoder. Separate product line from base Gemma 4's built-in vision. Still on Gemma 2 as of April 2026 — **no PaliGemma 3 or PaliGemma-on-Gemma-4 yet.** + +## What it is + +- **PaliGemma** (May 2024): Gemma 1 + SigLIP-So400m/14. Sizes: 3B only. Built for task-prefix prompting (`caption`, `detect`, `segment`, `ocr`). +- **PaliGemma 2** (Dec 2024): Gemma 2 + SigLIP-So400m/14. Sizes: 3B, 10B, 28B. 
Each available at three resolutions: 224x224, 448x448, 896x896. +- **PaliGemma 2 mix** (Feb 2025): task-mixed instruction-tuned variant — works better out-of-the-box on ad-hoc VQA without per-task fine-tuning. + +## Sizes (PaliGemma 2) + +| Text decoder | Image encoder | Total | Resolutions | +|---|---|---|---| +| Gemma 2 2B | SigLIP-So400m | ~3B | 224 / 448 / 896 | +| Gemma 2 9B | SigLIP-So400m | ~10B | 224 / 448 / 896 | +| Gemma 2 27B | SigLIP-So400m | ~28B | 224 / 448 / 896 | + +## Model cards + +- PaliGemma 2: https://ai.google.dev/gemma/docs/paligemma/model-card-2 +- DeepMind: https://deepmind.google/models/gemma/paligemma-2/ +- HF blog: https://huggingface.co/blog/paligemma2 + +## Prompt format + +PaliGemma uses **task-prefix** prompting, not chat turns. Format: + +``` +{task} {args} +``` + +Known task prefixes (not exhaustive; Google under-documents the full list): + +| Prefix | Purpose | Example | +|---|---|---| +| `caption {lang}` | Image captioning | `caption en` | +| `ocr` | Read all text in image | `ocr` | +| `answer en {q}` | VQA | `answer en what color is the car?` | +| `detect {obj}` | Object detection (bounding boxes) | `detect cat ; dog` | +| `segment {obj}` | Segmentation masks | `segment person` | + +For `detect` and `segment`, output uses custom location (`<locNNNN>`) and segmentation (`<segNNN>`) tokens. You need the PaliGemma postprocessing routines to convert them to pixel coords.
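
The location-token conversion can be sketched in a few lines. This is a minimal sketch, not the official postprocessing routine: it assumes the commonly documented PaliGemma layout, where each box is four `<locNNNN>` tokens in `y_min, x_min, y_max, x_max` order, each value is binned to 0-1023 relative to the image size, the label follows the fourth token, and detections are separated by `;`. The function name `parse_detections` and the sample string are hypothetical; check the token layout against the model card before relying on it.

```python
import re

# Four <locNNNN> tokens followed by a label that runs up to the next ';' or '<'.
_BOX = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)")


def parse_detections(decoded: str, width: int, height: int):
    """Convert PaliGemma-style `detect` output into pixel-space boxes.

    Assumption: token order is (y_min, x_min, y_max, x_max) with 1024 bins per axis.
    Returns a list of dicts with a label and an (x_min, y_min, x_max, y_max) box.
    """
    boxes = []
    for y0, x0, y1, x1, label in _BOX.findall(decoded):
        boxes.append({
            "label": label.strip(),
            "box": (
                int(x0) / 1024 * width,
                int(y0) / 1024 * height,
                int(x1) / 1024 * width,
                int(y1) / 1024 * height,
            ),
        })
    return boxes


# Hypothetical decoded output for the prompt "detect cat ; dog" on a 448x448 image.
sample = "<loc0256><loc0128><loc0768><loc0512> cat ; <loc0000><loc0512><loc1023><loc1023> dog"
for det in parse_detections(sample, width=448, height=448):
    print(det)
```

Segmentation output additionally carries `<segNNN>` tokens that need the mask decoder shipped with the PaliGemma reference code; this sketch ignores them.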
+ +## Minimum invocation — PaliGemma 2 + +```python +from transformers import AutoProcessor, PaliGemmaForConditionalGeneration +from PIL import Image +import requests, torch + +model_id = "google/paligemma2-3b-mix-448" +model = PaliGemmaForConditionalGeneration.from_pretrained( + model_id, torch_dtype=torch.bfloat16 +).to("cuda") +processor = AutoProcessor.from_pretrained(model_id) + +image = Image.open(requests.get( + "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png", + stream=True +).raw).convert("RGB") + +prompt = "caption en" +# Keyword arguments avoid depending on the processor's positional order (images vs. text). +inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda") +out = model.generate(**inputs, max_new_tokens=200) +gen = processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True) +print(gen) +``` + +## When to choose it over base Gemma 4 vision + +- You need **structured spatial output** — bounding boxes, segmentation masks. Base Gemma 4 vision returns freeform text; PaliGemma 2 returns grid-aligned location tokens. +- You're doing **pure VQA or captioning at scale** and want a smaller, faster, task-specialized 3B model (vs. Gemma 4 E4B at 4B-effective). +- You're **fine-tuning** for a narrow vision task — PaliGemma 2 is explicitly designed to be easy to fine-tune; Google ships LoRA recipes. + +Use base Gemma 4 for **conversational multimodal** (back-and-forth with images + text reasoning). PaliGemma is the "turn image into structured text" workhorse. + +## Homelab fit + +For `ai-visualizer` (CT 167, pve197 with V100): PaliGemma 2 3B-448 is a great caption-and-ground step when producing SDXL prompts from reference images. Already tested: base Gemma 4 E4B handles "describe this image" at ~25 tok/s on pve197. PaliGemma 2 would add `detect`/`segment` for spatial control (e.g., "put the character in the upper-left quadrant of the generated scene").
diff --git a/tooling/gemma-family/recurrentgemma.md b/tooling/gemma-family/recurrentgemma.md new file mode 100644 index 0000000..fa6c020 --- /dev/null +++ b/tooling/gemma-family/recurrentgemma.md @@ -0,0 +1,67 @@ +# RecurrentGemma + +Griffin-architecture sibling. Built on **Gemma 1**. No Gemma 2/3/4 generation — the line has effectively stalled, with long-context Transformer variants (Gemma 4 with 256K context) overtaking the memory-efficiency argument. + +## What it is + +Gated linear recurrences + local sliding-window attention, replacing full self-attention. Fixed-size hidden state → **O(1) memory per token generated**, no KV cache growth. Inference stays fast and cheap as context lengthens. + +## Sizes + +- **2B** pretrained + instruct +- **9B** pretrained + instruct + +Only two sizes. No 27B. Griffin scaling beyond 9B is an open research question and Google didn't ship it. + +## Model card + +- https://ai.google.dev/gemma/docs/recurrentgemma/model_card +- DeepMind: https://deepmind.google/models/gemma/recurrentgemma/ +- Paper: https://arxiv.org/abs/2404.07839 +- Repo: https://github.com/google-deepmind/recurrentgemma + +## Architecture highlights + +- **Griffin block:** alternates two residual recurrent blocks with a local MQA attention block. +- **State size:** fixed — independent of sequence length. +- **Sliding window:** local attention only, not global. +- **Trade-off:** loses some needle-in-haystack precision vs. a full-attention Transformer, gains memory flatness. + +## Prompt format + +Standard Gemma turn format — same `<start_of_turn>user … <end_of_turn>` as Gemma 1 IT. No RecurrentGemma-specific tokens.
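
The turn format can be sketched as a small helper. This is a minimal sketch assuming the Gemma 1-style instruction-tuned markers `<start_of_turn>` and `<end_of_turn>`; the function name `format_gemma_turn` is hypothetical and not part of any library.

```python
def format_gemma_turn(user_message: str) -> str:
    """Wrap one user message in Gemma 1-style turn markers.

    The string ends with an open model turn so generation continues from there.
    The BOS token is left to the tokenizer, which normally prepends it.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


print(format_gemma_turn("Write a haiku about memory."))
```

When the checkpoint ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from `transformers` should produce an equivalent string and is usually the safer route.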
+ +## Minimum invocation + +```python +from transformers import AutoTokenizer, AutoModelForCausalLM +import torch + +model_id = "google/recurrentgemma-9b-it" +tokenizer = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained( + model_id, torch_dtype=torch.bfloat16, device_map="auto" +) + +prompt = "<start_of_turn>user\nWrite a haiku about memory.<end_of_turn>\n<start_of_turn>model\n" +inputs = tokenizer(prompt, return_tensors="pt").to(model.device) +out = model.generate(**inputs, max_new_tokens=100) +print(tokenizer.decode(out[0], skip_special_tokens=True)) +``` + +## When to choose it over base Gemma 4 + +Honestly: **rarely, in April 2026.** + +The original pitch was "long-context generation without KV blowup." Gemma 4 now ships with 256K context on the 26B/31B and 128K on the edge models, with efficient attention implementations. The gap RecurrentGemma was filling has narrowed. + +Reasonable residual cases: +- **Extremely memory-constrained hardware** (Jetson Nano tier) where even quantized Gemma 4 E2B KV cache is the limiting factor on sequence length. +- **Streaming-generation workloads** where latency-per-token must stay constant as output length grows into the tens of thousands of tokens. +- **Research interest** in recurrent LLMs. + +For typical homelab use, skip. The V100 on pve197 has 32GB VRAM; Gemma 4 31B at Q4 fits with room for generous context. + +## Homelab fit + +Not a strong candidate for any current Seth project. Note for file: if a CPU-only streaming-transcript use case ever comes up (e.g., running on seth-pi for always-on audio processing), RecurrentGemma 2B could reappear in scope. diff --git a/tooling/gemma-family/shieldgemma.md b/tooling/gemma-family/shieldgemma.md new file mode 100644 index 0000000..83dd4f9 --- /dev/null +++ b/tooling/gemma-family/shieldgemma.md @@ -0,0 +1,89 @@ +# ShieldGemma / ShieldGemma 2 + +Safety classifiers. Two separate product lines now: **ShieldGemma** (text, built on Gemma 2) and **ShieldGemma 2** (images, built on Gemma 3).
There is no Gemma 4 generation yet. + +## What it is + +- **ShieldGemma (text):** LLM-as-a-judge safety classifier. Takes a prompt + optional model response + a policy, emits `Yes`/`No` (yes = violates policy). Four harm types. +- **ShieldGemma 2 (image):** Image classifier. Takes a PIL image, emits probabilities across three image-safety categories. Image-only — does not accept text. + +## Sizes + +- ShieldGemma: **2B, 9B, 27B** — all instruction-tuned. +- ShieldGemma 2: **4B** — only size available. + +## Model cards + +- Text: https://ai.google.dev/gemma/docs/shieldgemma/model_card +- Image: https://huggingface.co/google/shieldgemma-2-4b-it +- DeepMind: https://deepmind.google/models/gemma/shieldgemma-2/ + +## Safety categories + +**ShieldGemma (text):** +1. Sexually explicit content +2. Dangerous content +3. Hate speech +4. Harassment + +**ShieldGemma 2 (image):** +1. Sexually explicit content +2. Dangerous content +3. Violence / gore + +Note the image model dropped "hate" and "harassment" (hard to define visually) and added "violence/gore" (a visual primitive). + +## Text prompt format (ShieldGemma) + +Five-component structure: + +``` + + +user + + + +model + + + +Our safety principle is defined in the below: +* + +Does the human prompt/response violate the above principle? Your answer must start with 'Yes' or 'No'. +``` + +The model outputs one token: `Yes` (violates) or `No` (safe). Softmax the logits on those two tokens for a calibrated score. 
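
The scoring step can be sketched as a two-way softmax. This is a minimal sketch in pure Python, assuming you have already pulled the raw next-token logits for the `Yes` and `No` tokens from the model's final position; the function name `violation_probability` and the example logit values are hypothetical.

```python
import math


def violation_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the Yes/No logits; returns P(Yes), i.e. P(violates policy)."""
    # Subtract the larger logit before exponentiating to avoid overflow.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


# Hypothetical logits read from the model output for one policy.
score = violation_probability(yes_logit=2.0, no_logit=0.5)
print(f"P(violation) = {score:.3f}")
```

Running one prompt per policy and comparing each score against its own threshold gives per-category control over how strict the filter is.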
+ +## Minimum invocation — ShieldGemma 2 (image) + +```python +from transformers import AutoProcessor, ShieldGemma2ForImageClassification +from PIL import Image +import torch + +model_id = "google/shieldgemma-2-4b-it" +model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval() +processor = AutoProcessor.from_pretrained(model_id) + +image = Image.open("input.jpg") +inputs = processor(images=[image], return_tensors="pt") + +with torch.inference_mode(): + out = model(**inputs) + +print(out.probabilities) # tensor of per-category "Yes" probabilities +``` + +## When to choose it over base Gemma 4 + +- You need a **calibrated safety score**, not a free-form "is this safe?" answer from the chat model. ShieldGemma emits Yes/No token logits — easy to threshold. +- You want **policy-by-policy classification** (e.g., run each category separately with different thresholds). +- You're running a moderation pipeline and need **a small, fast, purpose-trained classifier** rather than a general chat model reasoning about safety. + +Use base Gemma 4 for "explain *why* this is unsafe" narrative output. ShieldGemma is the yes/no stamp. + +## Homelab fit + +Pre-filter for `ai-visualizer` (CT 167, pve197) before publishing generated images. ShieldGemma 2 4B at Q4 fits comfortably on the Tesla V100-PCIE-32GB alongside SDXL. diff --git a/tooling/gemma-family/signgemma.md b/tooling/gemma-family/signgemma.md new file mode 100644 index 0000000..e64d4fa --- /dev/null +++ b/tooling/gemma-family/signgemma.md @@ -0,0 +1,43 @@ +# SignGemma + +ASL (American Sign Language) → English translation model. Announced at Google I/O 2025. + +## Status + +**Limited preview only. No open weights as of April 2026.** Google published an interest form at I/O 2025; access has been gated to language-service providers, accessibility researchers, and members of the Deaf community. Participants receive a TensorFlow Lite package and sample integration code. 
+ +There is no public Hugging Face entry under `google/signgemma*`. The original plan was general availability by end-of-2025, which slipped. No updated timeline announced as of April 2026. + +## What it is (from announcement material) + +- **Video-in, text-out** on-device model. +- Best performance on **ASL → English**; training includes other sign languages for future expansion. +- Uses a **vision transformer** to analyze hand shapes, facial expressions, and motion, followed by a compact language model that produces English output. +- Sized for **smartphones and laptops** — on-device real-time translation is the design goal. + +## Base generation + +Google states it is "part of the Gemma family" and "built on the Gemini Nano framework." Likely Gemma 3-era image/video encoder on a small Gemma 3 text decoder — **not confirmed**, and the "Gemini Nano framework" language suggests it may use Gemini-not-Gemma internals despite the name. Verify at release. + +## Model card + +- LinkedIn announcement: https://www.linkedin.com/posts/googledeepmind_signgemma-is-our-most-advanced-model-for-activity-7342957078249955329-JwJJ +- Slator coverage: https://slator.com/google-invites-feedback-for-signgemma-a-new-ai-sign-language-translation-model/ + +No public model card yet. + +## Prompt format + +Not published. + +## Minimum invocation + +Not possible. No weights available. + +## When to choose it + +On release: accessibility apps, live captioning for Deaf users, sign-language learning tools. + +## Homelab fit + +Zero for typical homelab use. If Seth ever wants to pilot a real-time captioning overlay for video streams this could matter — but not buildable until Google ships weights. diff --git a/tooling/gemma-family/translategemma.md b/tooling/gemma-family/translategemma.md new file mode 100644 index 0000000..a9b0d29 --- /dev/null +++ b/tooling/gemma-family/translategemma.md @@ -0,0 +1,105 @@ +# TranslateGemma + +Multilingual text + image translation. 
Released **January 15, 2026**. Built on **Gemma 3** (not Gemma 4, despite being the newest variant at time of writing). + +## What it is + +Gemma 3 fine-tuned for translation across **55 languages**, using a two-stage distillation from Gemini. Retains Gemma 3's multimodal capability — can translate text embedded in images. + +## Sizes + +- **4B IT** +- **12B IT** +- **27B IT** + +Google's headline claim: the 12B beats Gemma 3 27B baseline translation quality with less than half the parameters. + +## Model card + +- HF: https://huggingface.co/google/translategemma-4b-it +- Blog: https://blog.google/innovation-and-ai/technology/developers-tools/translategemma/ +- InfoQ: https://www.infoq.com/news/2026/01/google-translategemma-models/ + +## Supported languages + +55 languages via ISO 639-1 codes (`en`, `de`, `es`, `fr`, `pl`, `ja`, `zh`, `ar`, `hi`, etc.) plus regional variants (`en-US`, `en-GB`, `pt-BR`, `pt-PT`, `de-DE`, `de-AT`, `de-CH`, `zh-CN`, `zh-TW`, etc.). + +## Prompt format + +**Strict chat-template format.** Content list must contain exactly **one entry**, with mandatory `source_lang_code` and `target_lang_code`. + +### Text translation + +```python +messages = [{ + "role": "user", + "content": [{ + "type": "text", + "source_lang_code": "cs", + "target_lang_code": "de-DE", + "text": "V nejhorším případě i k prasknutí čočky.", + }], +}] +``` + +### Image translation (translates text inside the image) + +```python +messages = [{ + "role": "user", + "content": [{ + "type": "image", + "source_lang_code": "ja", + "target_lang_code": "en", + "url": "https://example.com/japanese-sign.jpg", + }], +}] +``` + +Only `"text"` and `"image"` types are supported. Only `user` and `assistant` roles. Image input is normalized to 896×896 (256 vision tokens). 
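
Because the contract is strict, it can help to build messages through a small validating helper. The sketch below is illustrative only: `build_translation_message` is a hypothetical name, and its checks simply mirror the rules stated above (one content entry, `text` or `image` type, both language codes present). It does not check the codes against the supported-language list.

```python
from typing import Optional


def build_translation_message(
    source_lang: str,
    target_lang: str,
    *,
    text: Optional[str] = None,
    url: Optional[str] = None,
) -> list:
    """Build a single-entry user message in the TranslateGemma content shape."""
    if not source_lang or not target_lang:
        raise ValueError("source_lang and target_lang are both mandatory")
    # Exactly one payload: either text to translate or an image URL.
    if (text is None) == (url is None):
        raise ValueError("pass exactly one of text= or url=")

    entry = {
        "type": "text" if text is not None else "image",
        "source_lang_code": source_lang,
        "target_lang_code": target_lang,
    }
    if text is not None:
        entry["text"] = text
    else:
        entry["url"] = url
    return [{"role": "user", "content": [entry]}]


messages = build_translation_message(
    "pl", "en", text="Dziadek mieszkał w Warszawie przed wojną."
)
print(messages)
```

The returned list can be passed as the `text=messages` argument of an `image-text-to-text` pipeline.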
+ +## Minimum invocation + +```python +from transformers import pipeline +import torch + +pipe = pipeline( + "image-text-to-text", + model="google/translategemma-4b-it", + device="cuda", + dtype=torch.bfloat16, +) + +messages = [{ + "role": "user", + "content": [{ + "type": "text", + "source_lang_code": "pl", + "target_lang_code": "en", + "text": "Dziadek mieszkał w Warszawie przed wojną.", + }], +}] + +out = pipe(text=messages, max_new_tokens=200) +print(out[0]["generated_text"][-1]["content"]) +``` + +## Performance + +- **WMT24++ across 55 languages:** MetricX 5.32, COMET 81.6. +- Context window: 2K tokens (short — this is a translation model, not a long-doc summarizer). + +## When to choose it over base Gemma 4 + +- You want **translation quality > general Gemma 4** at equivalent size, with the strict prompt contract making it easy to drop into a pipeline. +- You need **image-text translation** (street signs, menus, old documents) as a first-class task. +- You care about the 55-language coverage and regionalized variants. + +Base Gemma 4 31B *can* translate — fine for casual use. TranslateGemma wins for production pipelines and when you care about metric-validated quality. + +## Homelab fit + +**Strong fit for family history agent.** If source documents are in German, Polish, Hungarian, Yiddish, or any of the 55 supported languages, TranslateGemma 4B on pve197 (GPU-backed) becomes the translation leg of an ingest pipeline: OCR → TranslateGemma → Gemma 4 for reasoning. The 4B size fits alongside the other models on the V100. + +Also useful for SearchXNG (if Seth ever wants to auto-translate non-English search results) and the news-summary print system (translate foreign-language feeds before summarization). diff --git a/tooling/gemma-family/txgemma.md b/tooling/gemma-family/txgemma.md new file mode 100644 index 0000000..8bdc54b --- /dev/null +++ b/tooling/gemma-family/txgemma.md @@ -0,0 +1,63 @@ +# TxGemma + +Therapeutic-development / drug-discovery variant. 
Built on **Gemma 2**. No Gemma 3 or 4 generation yet. + +## What it is + +Gemma 2 fine-tuned on 7M examples curated from the **Therapeutics Data Commons (TDC)** — predictive tasks across small molecules, proteins, nucleic acids, diseases, and cell lines. Beats or matches state-of-the-art on 50 of 66 TDC tasks; beats specialist models on 26 of them. + +## Sizes + +- **2B predict** — prediction-only, narrow prompt format. +- **9B predict** + **9B chat** — prediction plus conversational reasoning. +- **27B predict** + **27B chat** — same, larger. + +## Model card + +- https://developers.google.com/health-ai-developer-foundations/txgemma/model-card +- DeepMind: https://deepmind.google/models/gemma/txgemma/ +- Paper: https://deepmind.google/research/publications/153799/ + +## Prompting modes + +**Prediction mode** (all sizes): structured TDC-format prompt with instruction + context + question + optional few-shot. Output is a short prediction (sometimes a single token or a float). + +**Conversational mode** (9B, 27B): chat-template interactions, can explain reasoning behind predictions. + +## Minimum invocation — prediction + +```python +from transformers import pipeline + +pipe = pipeline( + "text-generation", + model="google/txgemma-27b-predict", + device="cuda", +) + +prompt = ( + "Instructions: Predict whether the molecule can penetrate the blood-brain barrier.\n" + "Context: Blood-brain barrier penetration is an important property for CNS drugs.\n" + "Question: Given the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, " + "predict BBB penetration. Answer with 'Yes' or 'No'.\n" + "Answer:" +) + +out = pipe(prompt, max_new_tokens=8) +print(out[0]["generated_text"]) +``` + +## License + +Health AI Developer Foundations — same terms as MedGemma. Non-clinical, research-use. + +## When to choose it over base Gemma 4 + +- You're doing **drug-discovery research** and need TDC-format predictions out of the box. 
+- You want **SMILES-aware reasoning** without a custom cheminformatics stack. + +Almost never chosen for general-purpose work. TxGemma's value is the training data, not the base model. + +## Homelab fit + +Zero. Noted for completeness. diff --git a/tooling/google-official/README.md b/tooling/google-official/README.md new file mode 100644 index 0000000..64020ec --- /dev/null +++ b/tooling/google-official/README.md @@ -0,0 +1,226 @@ +# Google-official Gemma tooling (as of 2026-04-18) + +Downloaded corpus of canonical Google / Google-DeepMind Gemma tooling. This +directory mirrors only **upstream-authored** material — no third-party forks, +no community ports, no Ollama-specific content (that lives in +`../../CORPUS_ollama_variants.md`). + +Reach for this directory when you need to verify what the canonical code/docs +actually say (prompt tokens, API shapes, supported variants) versus what a +third-party wrapper claims they say. + +## Top-line findings (flag for cross-check with rest of corpus) + +1. **Canonical JAX/Flax library (`google-deepmind/gemma`) has first-class + Gemma 4 support today** — `gm.nn.Gemma4_E4B()`, + `gm.ckpts.CheckpointPath.GEMMA4_E4B_IT`, and the unified `ChatSampler` / + `ToolSampler` API explicitly lists "2, 3, 3n, 4" as supported. This is the + least-friction Python path if you want the actual reference behavior. +2. **`google/gemma_pytorch` has NO Gemma 4 support** as of last push + (2025-05-30). `scripts/run.py` validates variant in + `['2b', '2b-v2', '7b', '9b', '27b', '1b']`; `scripts/run_multimodal.py` in + `['4b', '12b', '27b_v3']` (all Gemma 3). If someone tells you to "use + the official PyTorch repo" for Gemma 4, they're wrong — it's stale. +3. **`google/gemma.cpp` README says Gemma 2-3 + PaliGemma 2 only** (no Gemma 4 + yet), but the repo is actively pushed and explicitly notes active work + happens on the `dev` branch. Worth rechecking `dev` for Gemma 4 support. +4. 
**Gemma 4 uses a NEW prompt-token syntax** distinct from Gemma 1/2/3: + - Gemma 1/2/3: `<start_of_turn>` / `<end_of_turn>` (symmetric angle brackets) + - Gemma 4: `<|turn>` / `<turn|>` (asymmetric pipe-brackets) + - Plus Gemma-4-new: `<|tool>`/`<tool|>`, `<|tool_call>`/`<tool_call|>`, + `<|tool_response>`/`<tool_response|>`, `<|think|>`, + `<|channel>`/`<channel|>`, `<|image>`/`<image|>`, `<|audio>`/`<audio|>`, + string delimiter `<|"|>`. + - Roles are named directly: `system`, `user`, `model` (no role brackets). + This directly contradicts any chat template built against Gemma 3 tokens. + `CORPUS_tool_calling_format.md` already captures the tool tokens correctly + but does NOT yet document the turn-token change or the thinking tokens. +5. **`gemma.cpp` ships an HTTP API server (`gemma_api_server`) that speaks + the Google Gemini API protocol** (`POST /v1beta/models/{model}:generateContent`, + SSE streaming, session management). This is a canonical Google-built + alternative to Ollama that implements the *real* Gemini REST API locally. + See `gemma-cpp/API_SERVER_README.md`. +6. **Tool use was NOT a trained capability in Gemma 1/2/3** — the DeepMind + `colabs/tool_use.ipynb` explicitly disclaims: *"The Gemma 1, 2 and 3 models + were not specifically trained for tool use. This is more a proof-of-concept + than an officially supported feature."* Gemma 4 is notably absent from that + caveat; the cookbook and blog confirm Gemma 4 has **native function + calling** as a first-class trained capability. +7. **No Gemma 4 technical-report PDF exists yet.** All conventional URLs + (`storage.googleapis.com/deepmind-media/gemma/Gemma4Report.pdf`, + `goo.gle/gemma4report`) return 404/redirect-to-google.com, and the + DeepMind repo README explicitly says "Gemma 4 (Coming soon)". Current + most-authoritative scientific document for the family is the Gemma 3 + technical report (arXiv:2503.19786), downloaded here. +8.
**Cookbook ships a Gemma-4-specific agentic reference app** + (`apps/Gemma_4_HDP_Agentic_Security/`) demonstrating how to cryptographically + gate Gemma 4's native function calls with Ed25519-signed delegation tokens + (IETF draft `draft-helixar-hdp-agentic-delegation-00`). A more + production-shaped pattern than the toy `tool_use.ipynb`. + +## File index + +### `deepmind-gemma/` — JAX/Flax reference (the primary Python library) +Upstream: https://github.com/google-deepmind/gemma (`main`, pushed 2026-04-17). + +| File | What | Why keep | +|------|------|----------| +| `README.md` | PyPI `gemma` package entry point | Shows canonical `gm.nn.Gemma4_E4B()` API, `ChatSampler` multi-turn/multi-modal example | +| `example_multimodal.py` | Image-captioning fine-tune (Kauldron config) | Canonical end-to-end SFT example; docstring shows exact `user / / ` interleave for Gemma 3 | +| `example_lora.py` | LoRA fine-tuning recipe | Reach for this if doing PEFT against a Gemma 4 checkpoint | +| `example_dpo.py` | Direct Preference Optimization recipe | Reference for preference-alignment post-training | +| `example_classification.py` | Classification fine-tune | Shows Gemma as a feature extractor | +| `example_sharding.py` | Multi-device sharding | Reference for running >E4B on multi-GPU/TPU | +| `colab_tool_use.ipynb` | Tool-use demo (`ToolSampler`) | Important caveat inside: "not specifically trained for tool use" for Gemma 1/2/3; shows the `gm.tools.Tool` base class API | +| `colab_sampling.ipynb` | Basic inference / chat notebook | Starter-grade canonical sampling example | + +Other scripts in the repo (not downloaded, cherry-picked above): `seq2seq.py`, `npo.py`, colabs for `quantization_aware_training`, `sharding`, `tokenizer`, `multimodal`, `finetuning`, `lora_finetuning`, `lora_sampling`. Fetch directly from https://github.com/google-deepmind/gemma/tree/main when needed. 
+ +### `gemma-pytorch/` — PyTorch reference (STALE for Gemma 4) +Upstream: https://github.com/google/gemma_pytorch (`main`, pushed 2025-05-30). + +| File | What | Why keep | +|------|------|----------| +| `README.md` | Entry-point docs | Only documents up through Gemma 3; no Gemma 4 | +| `run.py` | Text-only inference entry point | Variant whitelist `['2b','2b-v2','7b','9b','27b','1b']` — Gemma 1/2 only | +| `run_multimodal.py` | Multimodal inference entry point | Variant whitelist `['4b','12b','27b_v3']` — Gemma 3 only. Shows exact interleaved `user\n`, image, `text, \nmodel` pattern | +| `run_xla.py` | TPU/XLA inference | Reference for running Gemma 3 on TPU | + +**Do not reach for this repo for Gemma 4 work** until it's updated. Use the +DeepMind JAX lib, Hugging Face `transformers`, or gemma.cpp instead. + +### `gemma-cpp/` — C++ reference inference +Upstream: https://github.com/google/gemma.cpp (`main`, pushed 2026-04-17; active dev on `dev` branch). + +| File | What | Why keep | +|------|------|----------| +| `README.md` | Project overview, build instructions | States "Gemma 2-3 + PaliGemma 2" in features; Gemma 4 status unclear from `main` — check `dev` branch | +| `API_SERVER_README.md` | HTTP API server that speaks Gemini API protocol | **Most interesting find** — canonical drop-in for apps written against the Gemini API, runs locally. `POST /v1beta/models/:generateContent`, SSE streaming, session KV-cache | +| `examples_README.md` | Pointer to `hello_world` / `simplified_gemma` minimal embedding examples | Starting point for embedding gemma.cpp into your own C++ binary | + +### `cookbook/` — Official recipes and end-to-end apps +Upstream: https://github.com/google-gemma/cookbook (`main`, pushed 2026-04-17). +**Note:** `google-gemini/gemma-cookbook` now 301-redirects here; use the +`google-gemma/cookbook` URL going forward. + +| File | What | Why keep | +|------|------|----------| +| `README.md` | Cookbook index | Authoritative list of Gemma variants incl. 
Gemma 4 (E2B / E4B / 26B A4B / 31B), the ecosystem (FunctionGemma, MedGemma, PaliGemma 2, RecurrentGemma, ShieldGemma 2, T5Gemma, TranslateGemma, TxGemma, VaultGemma, EmbeddingGemma) | +| `tutorials_RAG_EmbeddingGemma.ipynb` | RAG with EmbeddingGemma | Currently the only notebook in `tutorials/` — reflects the "latest tested" tier | +| `docs_gemma_chat.ipynb` | Chatbot with Gemma on Keras | Documents the `__START_TURN_USER__ = "<start_of_turn>user\n"` / `__END_TURN__ = "<end_of_turn>\n"` format explicitly; Gemma 2 example, but the class is the canonical illustration of the Gemma 1/2/3 chat template | +| `apps_Gemma4_HDP_AgenticSecurity_README.md` | README for the HDP agentic-security reference app | Gemma-4-specific demo; real production pattern for gating native function calls | +| `apps_Gemma4_HDP_hdp_middleware.py` | Drop-in middleware (`HDPMiddleware.gate()`) | Wraps any Gemma 4 tool executor with Ed25519-signed HDT verification | +| `apps_Gemma4_HDP_AgenticSecurity.ipynb` | Walkthrough notebook | End-to-end: load Gemma 4, issue tokens, gate function calls | + +Other cookbook content worth noting (not downloaded — fetch on demand): +- `docs/capabilities/thinking.ipynb` (438 KB) — Gemma 4 thinking-mode notebook +- `docs/capabilities/audio.ipynb` — audio-input capability +- `docs/functiongemma/{finetuning-with-functiongemma,full-function-calling-sequence-with-functiongemma,function-calling-with-hf}.ipynb` — **FunctionGemma** is a separate fine-tune on the Gemma 3 270M IT checkpoint specifically for function calling; distinct from Gemma 4's native function calling +- `docs/core/pytorch_gemma.ipynb`, `keras_inference.ipynb`, `huggingface_*.ipynb` — framework-specific recipes +- `docs/integrations/langchain.ipynb` — LangChain integration +- `experiments/{MedGemma,TxGemma}/` and `experiments/[T5Gemma]Example.ipynb`, `[VaultGemma]FineTuning_Inference_Huggingface.ipynb`, etc.
— domain-specific Gemma variants + +### `docs/` — Canonical ai.google.dev pages (HTML cached) +Verified URLs below; HTML snapshots saved for verbatim preservation. + +| File | Source URL | +|------|-----------| +| `ai-google-dev_core.html` | https://ai.google.dev/gemma/docs/core — Gemma 4 overview | +| `ai-google-dev_model_card_4.html` | https://ai.google.dev/gemma/docs/core/model_card_4 — Gemma 4 model card | +| `ai-google-dev_prompt_formatting_gemma4.html` | https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 — **Gemma 4 prompt tokens (new `<\|turn>`/`<turn\|>` syntax)** | +| `ai-google-dev_function_calling_gemma4.html` | https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 — **Gemma 4 native function calling spec** | +| `ai-google-dev_formatting.html` | https://ai.google.dev/gemma/docs/formatting — Gemma 1/2/3 prompt format (`<start_of_turn>`/`<end_of_turn>`) | +| `blog_announcement.html` | https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ — Gemma 4 launch blog, 2026-04-02 | + +Other canonical doc URLs (verified to exist, not snapshotted here — visit +directly): +- https://ai.google.dev/gemma/docs — top-level Gemma hub +- https://ai.google.dev/gemma/docs/releases — release history +- https://ai.google.dev/gemma/docs/functiongemma — FunctionGemma variant +- https://ai.google.dev/gemma/docs/core/deploy_to_cloud_run_from_ai_studio — AI Studio → Cloud Run +- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-gemma — Vertex AI +- https://aistudio.google.com — AI Studio +- https://gemma-llm.readthedocs.io — DeepMind JAX lib docs +- https://www.kaggle.com/models/google/gemma-4 — Gemma 4 on Kaggle +- https://huggingface.co/collections/google/gemma-4 — Gemma 4 on HF + +### `tech-report/` +| File | What | Source | +|------|------|--------| +| `Gemma3Report.pdf` | **Gemma 3 Technical Report** (arXiv:2503.19786, 2025-03-12) | https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf | + +No Gemma 4 technical report
exists yet. Probed paths that return 404: +- `Gemma4Report.pdf`, `gemma4-report.pdf`, `Gemma4Report_v1.pdf` under + `storage.googleapis.com/deepmind-media/gemma/` +- `goo.gle/gemma4report` (not configured — redirects to google.com) + +DeepMind repo README line: **"Gemma 4 (Coming soon)"**. The Gemma 3 report +remains the most-authoritative Google-DeepMind scientific document for the +family and is the correct citation for architecture fundamentals (Grouped-Query +Attention with post-norm/pre-norm RMSNorm, 5:1 local/global attention layer +interleave, 1024-token local sliding window, RoPE base 1M on global / 10k on +local, SigLIP 400M vision encoder at 896×896 shared across 4B/12B/27B and +frozen during training, SentencePiece tokenizer with 262k vocab shared with +Gemini 2.0, knowledge distillation during pre-training, QAT checkpoints via +5k-step fine-tune for int4/SFP8). Per-variant parameter counts for Gemma 3: +1B = 698M non-embedding + 302M embedding, 4B = 3209M + 675M, 12B = 10759M + +1012M, 27B = 25600M + 1416M. + +## Canonical Gemma 4 prompt format (verified 2026-04-18) + +**Source:** https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 and +https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 + +Note the `<|turn>` / `` are asymmetric — opening has the pipe on the +left, closing has the pipe on the right. Same for all paired delimiters. + +``` +<|turn>system +<|think|> (optional — activates thinking mode) +<|tool>declaration:FUNCTION_NAME{description:<|"|>...<|"|>,parameters:{properties:{...},required:[...]}} +You are a helpful assistant. +<|turn>user +What's the weather in Tokyo? +<|turn>model +<|channel>thought +...internal reasoning... +<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>} +<|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>} +The current weather in Tokyo is 15 degrees and sunny. 
+``` + +Recommended sampling (per model card, verified): +`temperature=1.0, top_p=0.95, top_k=64`. Tokenizer vocab = **262k** (same as +Gemini 2.0). **BOS token required** — prepend `[BOS]` / set `add_bos=True`. + +**Gemma 1/2/3 prompt format (different — for reference):** +``` +<start_of_turn>user +[message]<end_of_turn> +<start_of_turn>model +[response]<end_of_turn> +``` +Gemma 1/2/3 have no trained tool-use or thinking tokens. PT models end with +`<eos>`; IT models end with `<end_of_turn>`. + +## Gemma 4 variants (canonical spec from model card) + +| Variant | Params | Active | Context | Multimodal | +|---------|--------|--------|---------|------------| +| Gemma 4 E2B | 2.3B effective (5.1B w/ embeddings), 35 layers | — | 128K | text+image+audio (30s max) | +| Gemma 4 E4B | 4.5B effective (8B w/ embeddings), 42 layers | — | 128K | text+image+audio (30s max) | +| Gemma 4 26B A4B | 25.2B total (MoE), 30 layers | 3.8B | 256K | text+image | +| Gemma 4 31B | 30.7B dense, 60 layers | — | 256K | text+image | + +All variants: Apache 2.0, base + instruction-tuned (`-it`), 140+ languages, +native function calling, native structured JSON output. Vision encoder = 150M +(E2B/E4B) or 550M (26B/31B). Image resolution token budgets: 70, 140, 280, +560, 1120. Released 2026-04-02. + +## Fetched using + +All files fetched via `curl -sL` from `raw.githubusercontent.com` on +2026-04-18. Repos enumerated via the GitHub API +(`https://api.github.com/repos/<owner>/<repo>/contents/<path>`). Google docs +pages fetched via WebFetch tool. No GitHub auth needed for public raw files +(unauthenticated rate limit = 60 req/hr, sufficient for this task). diff --git a/tooling/google-official/cookbook/README.md b/tooling/google-official/cookbook/README.md new file mode 100644 index 0000000..e8bd787 --- /dev/null +++ b/tooling/google-official/cookbook/README.md @@ -0,0 +1,80 @@ + +# Welcome to the Gemma Cookbook +This is a collection of guides and examples for [Google Gemma](https://ai.google.dev/gemma/).
+ +> **Disclaimer:** Gemma is a family of developer-focused models built by Google DeepMind. This cookbook is a collection of guides and examples for Google Gemma. Please keep in mind that Gemma is an open model and can hallucinate as you build on examples in this cookbook. + +## Repository Structure +* [**Tutorials**](tutorials/): The latest tested notebooks for Gemma models and variants. +* [**Apps**](apps/): Full-stack demos and complex end-to-end use cases. +* [**Experiments**](experiments/): Research-focused model notebooks, including [TxGemma](experiments/TxGemma) and [MedGemma](experiments/MedGemma). +* [**Responsible**](responsible/): Notebooks for responsible AI development. +* [**Docs**](docs/): Core documentation, capabilities, and technical guides. +* [**Archive**](.archive/): All older notebooks and historical examples. + +## Get started with the Gemma models +Gemma is a family of lightweight, generative artificial intelligence (AI) open models, built from the same research and technology used to create the Gemini models. The Gemma model family includes: +* Gemma\ + The core models of the Gemma family. + * [Gemma](https://ai.google.dev/gemma/docs/core/model_card)\ + For a variety of text generation tasks and can be further tuned for specific use cases + * [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2)\ + Higher-performing and more efficient, available in 2B, 9B, 27B parameter sizes + * [Gemma 3](https://ai.google.dev/gemma/docs/core/model_card_3)\ + Longer context window and handling text and image input, available in 1B, 4B, 12B, and 27B parameter sizes + * [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n/model_card) \ + Designed for efficient execution on low-resource devices. 
Handling text, image, video, and audio input, available in E2B and E4B parameter sizes + * [Gemma 4](https://ai.google.dev/gemma/docs/core/model_card_4)\ + Well-suited for reasoning, agentic workflows, coding, and multimodal understanding, available in E2B, E4B, 26B A4B, and 31B parameter sizes. +* Gemma variants + * [CodeGemma](https://ai.google.dev/gemma/docs/codegemma)\ + Fine-tuned for a variety of coding tasks + * [DataGemma](https://ai.google.dev/gemma/docs/datagemma)\ + Fine-tuned for using Data Commons to address AI hallucinations + * [FunctionGemma](https://ai.google.dev/gemma/docs/functiongemma)\ + Fine-tuned on Gemma 3 270M IT checkpoint for function calling + * [MedGemma](https://developers.google.com/health-ai-developer-foundations/medgemma) + The MedGemma collection contains Google's most capable open models for medical text and image comprehension, built on Gemma 3. Developers can use MedGemma to accelerate building healthcare-based AI applications. MedGemma comes in two variants: a 4B multimodal version and a 27B text-only version. 
+ * [PaliGemma](https://ai.google.dev/gemma/docs/paligemma/model-card)\ + Vision Language Model\ + For a deeper analysis of images and provide useful insights + * [PaliGemma 2](https://ai.google.dev/gemma/docs/paligemma/model-card-2)\ + VLM which incorporates the capabilities of the Gemma 2 models + * [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma)\ + Based on [Griffin](https://arxiv.org/abs/2402.19427) architecture\ + For a variety of text generation tasks + * [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma/model_card)\ + Fine-tuned for evaluating the safety of text prompt input and text output responses against a set of defined safety policies + * [ShieldGemma 2](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)\ + Fine-tuned on Gemma 3 4B IT checkpoint for image safety classification + * [T5Gemma](https://deepmind.google/models/gemma/t5gemma)\ + A collection of encoder-decoder models that provide a strong quality-inference efficiency tradeoff + * [TranslateGemma](https://huggingface.co/collections/google/translategemma)\ + A collection of open models designed to handle translation tasks across 55 languages + * [TxGemma](https://deepmind.google/models/gemma/txgemma)\ + A collection of open models designed to improve the efficiency of therapeutic development + * [VaultGemma](https://deepmind.google/models/gemma/vaultgemma)\ + An open model trained from the ground up using differential privacy to prevent memorization and leaking of training data examples + +You can find the Gemma models on the Hugging Face Hub, Kaggle, Google Cloud Vertex AI Model Garden, and [ai.nvidia.com](https://ai.nvidia.com).
+ +## Additional Resources +* [MedGemma on Google-Health](https://github.com/Google-Health/medgemma/tree/main/notebooks) : Google-Health has additional notebooks for using MedGemma +* [Gemma on Google Cloud](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/open-models) : GCP open models has additional notebooks for using Gemma + +## Get help +Ask a Gemma cookbook-related question on the [developer forum](https://discuss.ai.google.dev/c/gemma/10), or open an [issue](https://github.com/google-gemini/gemma-cookbook/issues) on GitHub. + +## Wish list +If you want to see additional cookbooks implemented for specific features/integrations, please open a new issue with [“Feature Request” template](https://github.com/google-gemini/gemma-cookbook/issues/new?template=feature_request.yml). + +If you want to make contributions to the Gemma Cookbook project, you are welcome to pick any idea in the [“Wish List”](https://github.com/google-gemini/gemma-cookbook/labels/wishlist) and implement it. + +## Contributing +Contributions are always welcome. Please read [contributing](https://github.com/google-gemini/gemma-cookbook/blob/main/CONTRIBUTING.md) before implementation. + +Thank you for developing with Gemma! We’re excited to see what you create. + +## Translation of this repository +* [Traditional Chinese](https://github.com/doggy8088/gemma-cookbook) +* [Simplified Chinese](https://github.com/xiaoxiong1006/gemma-cookbook) diff --git a/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity.ipynb b/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity.ipynb new file mode 100644 index 0000000..6b3d473 --- /dev/null +++ b/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity.ipynb @@ -0,0 +1,526 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "colab-badge" + }, + "source": [ + "\n", + " \n", + "
\n", + " Run in Google Colab\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "byline" + }, + "source": [ + "# Securing Gemma 4 Agentic Workflows with HDP\n", + "\n", + "**Author:** Asiri Dalugoda, Helixar Limited ([@asiridalugoda](https://github.com/asiridalugoda)) | [helixar.ai](https://helixar.ai)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gpu-instructions" + }, + "source": [ + "## Before you begin\n", + "\n", + "This notebook requires a GPU runtime. To enable GPU in Colab:\n", + "1. Go to **Runtime → Change runtime type**\n", + "2. Set **Hardware accelerator** to **GPU** (T4 is sufficient for E4B)\n", + "3. Click **Save**\n", + "\n", + "You will also need a **Hugging Face token** to download Gemma 4 (gated model):\n", + "1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)\n", + "2. Create a token with **Read** access\n", + "3. Accept the Gemma 4 model license at [huggingface.co/google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it)\n", + "4. 
Run the cell below to authenticate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hf-login" + }, + "outputs": [], + "source": [ + "from huggingface_hub import notebook_login\n", + "notebook_login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "overview" + }, + "source": [ + "# Securing Gemma 4 Agentic Workflows with HDP\n", + "\n", + "**Human Delegation Provenance (HDP)** is an open protocol that adds a cryptographic chain-of-custody to AI agent function calls — ensuring every tool invocation can be traced back to an authorized human principal.\n", + "\n", + "This notebook demonstrates how to integrate HDP with Gemma 4's native function-calling capability to:\n", + "\n", + "- **Verify** that Gemma 4's function calls were authorized by a human principal before execution\n", + "- **Classify** actions by irreversibility (read-only → irreversible → physical actuation)\n", + "- **Block** unauthorized or out-of-scope tool calls at the middleware layer\n", + "- **Audit** every decision with a pre-execution log\n", + "\n", + "This is particularly relevant for Gemma 4 deployments on edge devices (Jetson Nano, Raspberry Pi) where the model may be directing physical actuators offline with no out-of-band authorization check.\n", + "\n", + "**References:**\n", + "- HDP IETF draft: [draft-helixar-hdp-agentic-delegation-00](https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/)\n", + "- HDP-P (physical AI agents): [DOI 10.5281/ZENODO.19332440](https://doi.org/10.5281/ZENODO.19332440)\n", + "- Helixar: [helixar.ai](https://helixar.ai)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b3600ee25c8e" + }, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7a80251f52b3" + }, + "outputs": [], + "source": [ + "!pip install -q transformers torch cryptography" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": { + "id": "ed80fe18f255" + }, + "outputs": [], + "source": [ + "# Download the middleware\n", + "!wget -q https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/apps/Gemma_4_HDP_Agentic_Security/hdp_middleware.py\n", + "\n", + "from hdp_middleware import (\n", + " HDPDelegationToken,\n", + " HDPMiddleware,\n", + " IrreversibilityClass,\n", + " DEFAULT_TOOL_CLASS_MAP,\n", + ")\n", + "from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e88bdc7b7265" + }, + "source": [ + "## 1. Load Gemma 4\n", + "\n", + "We use the 4B Effective model for this demo. For production agentic deployments, the 26B MoE or 31B Dense models are recommended." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1e4e7779806d" + }, + "outputs": [], + "source": [ + "from transformers import pipeline\n", + "\n", + "# For edge/robotics use cases: swap to google/gemma-4-E2B-it\n", + "MODEL_ID = \"google/gemma-4-E4B-it\"\n", + "\n", + "pipe = pipeline(\n", + " \"text-generation\",\n", + " model=MODEL_ID,\n", + " device_map=\"auto\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d91e36cfb0b2" + }, + "source": [ + "## 2. Define Tools\n", + "\n", + "Gemma 4 uses structured JSON function-calling. We define a tool set spanning different IrreversibilityClasses to demonstrate the middleware's classification behaviour." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1becdb52e7f8" + }, + "outputs": [], + "source": [ + "TOOLS = [\n", + " {\n", + " \"name\": \"get_weather\",\n", + " \"description\": \"Get the current weather for a location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\"type\": \"string\", \"description\": \"City name\"}\n", + " },\n", + " \"required\": [\"location\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"send_email\",\n", + " \"description\": \"Send an email to a recipient.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"to\": {\"type\": \"string\"},\n", + " \"subject\": {\"type\": \"string\"},\n", + " \"body\": {\"type\": \"string\"}\n", + " },\n", + " \"required\": [\"to\", \"subject\", \"body\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"delete_file\",\n", + " \"description\": \"Permanently delete a file by path.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"path\": {\"type\": \"string\"}\n", + " },\n", + " \"required\": [\"path\"]\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"actuate_robot_arm\",\n", + " \"description\": \"Command a robot arm to move to a target position.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"joint_angles\": {\"type\": \"array\", \"items\": {\"type\": \"number\"}},\n", + " \"force_limit_n\": {\"type\": \"number\"}\n", + " },\n", + " \"required\": [\"joint_angles\"]\n", + " }\n", + " }\n", + "]\n", + "\n", + "# Tools indexed by name for lookup\n", + "TOOL_REGISTRY = {t[\"name\"]: t for t in TOOLS}\n", + "print(f\"Registered {len(TOOLS)} tools\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "722948b00a92" + }, + "source": [ + "## 3. 
Issue an HDP Delegation Token\n", + "\n", + "The human principal generates an Ed25519 keypair and issues an HDT that specifies:\n", + "- Which tools the agent is permitted to call\n", + "- The maximum IrreversibilityClass the agent can act on\n", + "- The token's lifetime" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b0622c68dfa5" + }, + "outputs": [], + "source": [ + "# Human principal generates their signing keypair\n", + "# In production: loaded from secure key storage (HSM, OS keychain, etc.)\n", + "principal_private_key = Ed25519PrivateKey.generate()\n", + "principal_public_key = principal_private_key.public_key()\n", + "\n", + "# Issue an HDT authorizing the Gemma 4 agent to call weather queries\n", + "# and send emails (Class 0 and Class 2), but NOT delete files or actuate hardware\n", + "token = HDPDelegationToken.issue(\n", + " principal_id=\"alice@example.com\",\n", + " agent_id=\"gemma4-agent-01\",\n", + " scope=[\"get_weather\", \"send_email\"],\n", + " max_class=IrreversibilityClass.CLASS_2,\n", + " ttl_seconds=3600,\n", + " private_key=principal_private_key,\n", + ")\n", + "\n", + "print(json.dumps(token.to_dict(), indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e206f950f4bc" + }, + "source": [ + "## 4. Initialise the HDP Middleware\n", + "\n", + "The middleware takes the principal's **public key** only — it verifies but cannot issue tokens." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e24676f528bf" + }, + "outputs": [], + "source": [ + "audit_log = []\n", + "\n", + "# Confirmation callback for Class 2 (irreversible) actions.\n", + "# In production: this would invoke a push notification, SMS OTP,\n", + "# or hardware confirmation device to the human principal.\n", + "def require_human_confirmation(tool_name: str, parameters: dict) -> bool:\n", + " print(f\"\\n⚠️ Class 2 action requested: {tool_name}\")\n", + " print(f\" Parameters: {json.dumps(parameters, indent=4)}\")\n", + " response = input(\" Confirm? [y/N]: \").strip().lower()\n", + " return response == \"y\"\n", + "\n", + "middleware = HDPMiddleware(\n", + " public_key=principal_public_key,\n", + " tool_class_map=DEFAULT_TOOL_CLASS_MAP,\n", + " confirmation_callback=require_human_confirmation,\n", + " audit_log=audit_log,\n", + ")\n", + "\n", + "print(\"HDP middleware initialised.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "72d56542eba0" + }, + "source": [ + "## 5. Gemma 4 Function Call → HDP Gate → Tool Execution\n", + "\n", + "This is the core integration pattern. Every function call Gemma 4 generates is passed through `middleware.gate()` before being forwarded to tool execution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "da20bc191e71" + }, + "outputs": [], + "source": [ + "# Simulated Gemma 4 function call outputs\n", + "# In production these come from parsing Gemma 4's structured JSON output\n", + "gemma_function_calls = [\n", + " # ✅ Should ALLOW — Class 0, in scope\n", + " {\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}},\n", + "\n", + " # ⚠️ Should CONFIRM then ALLOW — Class 2, in scope\n", + " {\"name\": \"send_email\", \"parameters\": {\n", + " \"to\": \"bob@example.com\",\n", + " \"subject\": \"Weekly report\",\n", + " \"body\": \"Please find attached.\"\n", + " }},\n", + "\n", + " # ❌ Should BLOCK — Class 2, NOT in HDT scope\n", + " {\"name\": \"delete_file\", \"parameters\": {\"path\": \"/data/important.csv\"}},\n", + "\n", + " # ❌ Should BLOCK — Class 3, physical actuation\n", + " {\"name\": \"actuate_robot_arm\", \"parameters\": {\n", + " \"joint_angles\": [0.0, -1.57, 0.0, -1.57, 0.0, 0.0],\n", + " \"force_limit_n\": 50.0\n", + " }},\n", + "]\n", + "\n", + "print(\"=\" * 60)\n", + "print(\"HDP VERIFICATION RESULTS\")\n", + "print(\"=\" * 60)\n", + "\n", + "for call in gemma_function_calls:\n", + " result = middleware.gate(call, token)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "be0d0dd05bce" + }, + "source": [ + "## 6. Audit Log\n", + "\n", + "Every decision is logged pre-execution. This is the HDP audit trail — a cryptographically linked record of what was authorized, by whom, and when." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e6dbab6d88d1" + }, + "outputs": [], + "source": [ + "print(\"\\nAUDIT LOG\")\n", + "print(\"-\" * 60)\n", + "for i, entry in enumerate(audit_log):\n", + " status = \"✅ ALLOWED\" if entry.allowed else \"❌ BLOCKED\"\n", + " print(f\"{i+1}. 
{status} | {entry.tool_name} | {entry.action_class.name} | {entry.reason}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bcadcb7040db" + }, + "source": [ + "## 7. Token Expiry and Scope Violation Demo\n", + "\n", + "Demonstrate that expired tokens and out-of-scope calls are blocked regardless of the action class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "deb2e3b6b20e" + }, + "outputs": [], + "source": [ + "import time\n", + "\n", + "# Issue a token that's already expired\n", + "expired_token = HDPDelegationToken.issue(\n", + " principal_id=\"alice@example.com\",\n", + " agent_id=\"gemma4-agent-01\",\n", + " scope=[\"get_weather\"],\n", + " max_class=IrreversibilityClass.CLASS_0,\n", + " ttl_seconds=-1, # expired immediately\n", + " private_key=principal_private_key,\n", + ")\n", + "\n", + "print(\"Testing expired token:\")\n", + "middleware.gate({\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}}, expired_token)\n", + "\n", + "print(\"\\nTesting call outside HDT scope:\")\n", + "middleware.gate({\"name\": \"delete_file\", \"parameters\": {\"path\": \"/etc/passwd\"}}, token)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b8f4acddb6fa" + }, + "source": [ + "## 8. Edge / Robotics Deployment (HDP-P)\n", + "\n", + "For Gemma 4 E2B/E4B running on Jetson Nano or Raspberry Pi and directing physical actuators, use the HDP-P extension. The key additions are:\n", + "\n", + "- **Embodiment context** — bind the token to a specific hardware ID\n", + "- **Policy attestation** — hash the deployed model weights into the token\n", + "- **Fleet delegation constraints** — prevent lateral movement across robot fleet\n", + "- **Pre-execution logging** — write audit records *before* actuator commands are issued\n", + "\n", + "See the [HDP-P specification](https://doi.org/10.5281/ZENODO.19332440) for the full EDT extension structure." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fcf7b451d175" + }, + "outputs": [], + "source": [ + "# Minimal HDP-P Embodied Delegation Token (EDT) extension example\n", + "# This shows how to attach physical constraints to an HDT\n", + "\n", + "hdp_p_extension = {\n", + " \"hdp-p\": {\n", + " \"version\": \"0.1\",\n", + " \"embodiment\": {\n", + " \"type\": \"mobile\",\n", + " \"platform\": \"raspberry-pi-5\",\n", + " \"hardware_id\": \"rpi-serial-XXXX\", # TPM-attested in production\n", + " \"workspace\": \"lab-zone-a\"\n", + " },\n", + " \"action_scope\": {\n", + " \"permitted_actions\": [\"move_base\", \"read_sensor\"],\n", + " \"excluded_zones\": [\"human-workspace\"],\n", + " \"force_limit_n\": 10.0,\n", + " \"max_velocity_ms\": 0.5\n", + " },\n", + " \"irreversibility\": {\n", + " \"max_class\": 1, # Class 1 max for this token\n", + " \"class2_requires_confirmation\": True,\n", + " \"class3_prohibited\": True\n", + " },\n", + " \"policy_attestation\": {\n", + " \"policy_hash\": \"sha256:abc123...\", # SHA-256 of deployed model weights\n", + " \"training_run_id\": \"gemma4-e2b-it\",\n", + " \"sim_validated\": True\n", + " },\n", + " \"delegation_scope\": {\n", + " \"fleet_delegation_permitted\": False, # No lateral movement\n", + " \"max_delegation_depth\": 0\n", + " }\n", + " }\n", + "}\n", + "\n", + "print(\"HDP-P EDT extension structure:\")\n", + "print(json.dumps(hdp_p_extension, indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b0af7c701dfc" + }, + "source": [ + "## Summary\n", + "\n", + "| Layer | What it solves | Tool |\n", + "|---|---|---|\n", + "| Gemma 4 function calling | Model generates structured tool calls | `pipeline(\"text-generation\")` |\n", + "| HDP middleware | Was this call authorized by a human? | `HDPMiddleware.gate()` |\n", + "| HDP-P EDT extension | Is this physical action within delegated bounds? 
| `hdp_p_extension` |\n", + "| Audit log | Pre-execution record of every decision | `audit_log` |\n", + "\n", + "The full HDP specification (IETF draft), HDP-P companion paper, TypeScript SDK, and Python bindings are available at:\n", + "\n", + "- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/\n", + "- **HDP-P paper:** https://doi.org/10.5281/ZENODO.19332440\n", + "- **GitHub:** https://github.com/Helixar-AI\n", + "- **Site:** https://helixar.ai" + ] + } + ], + "metadata": { + "colab": { + "name": "Gemma_4_HDP_Agentic_Security.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity_README.md b/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity_README.md new file mode 100644 index 0000000..8811f85 --- /dev/null +++ b/tooling/google-official/cookbook/apps_Gemma4_HDP_AgenticSecurity_README.md @@ -0,0 +1,75 @@ +# Gemma 4 + HDP: Securing Agentic Function Calls + +This example demonstrates how to integrate the **Human Delegation Provenance (HDP)** protocol with **Gemma 4's native function-calling** to cryptographically verify that every tool invocation was authorized by a human principal before execution. + +## The problem + +Gemma 4 is purpose-built for agentic workflows. Its native function-calling lets it autonomously call tools and APIs across multi-step plans — on anything from a cloud workstation to a Raspberry Pi running a robot offline. + +This creates a gap: when Gemma 4 generates a function call, there is no verifiable record that a human principal authorized that specific action. An injected prompt, a compromised system prompt, or a lateral pivot from another agent can trigger function calls that are indistinguishable from legitimate requests at the tool interface. + +HDP closes this gap. 
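
The gap described above can be illustrated with a toy, standard-library-only sketch. It is not the HDP protocol and not the API of `hdp_middleware.py`: `ToyGrant` and `toy_gate` are hypothetical names introduced only for this illustration, and the grant carries no signature (real HDTs are Ed25519-signed). It shows only that a legitimate call and an injected call look alike at the tool interface until something checks them against a human-issued scope and expiry.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyGrant:
    """Hypothetical, unsigned stand-in for a delegation token."""
    principal: str
    scope: frozenset
    expires_at: float


def toy_gate(call: dict, grant: ToyGrant, now: Optional[float] = None) -> Tuple[bool, str]:
    """Allow a call only if the grant is still valid and names the tool."""
    now = time.time() if now is None else now
    if now > grant.expires_at:
        return False, "grant expired"
    if call["name"] not in grant.scope:
        return False, "tool not in delegated scope"
    return True, "in scope"


# One call the principal intended, one produced by (say) an injected prompt.
legit = {"name": "send_email", "parameters": {"to": "bob@example.com"}}
injected = {"name": "delete_file", "parameters": {"path": "/data/important.csv"}}

# At the tool interface both are plain dicts with the same shape.
same_shape = legit.keys() == injected.keys()

grant = ToyGrant(
    principal="alice@example.com",
    scope=frozenset({"get_weather", "send_email"}),
    expires_at=time.time() + 3600,
)
decisions = {c["name"]: toy_gate(c, grant) for c in (legit, injected)}
```

Nothing here proves who issued the grant; binding the grant to a principal's key is the part a signature adds, which this sketch deliberately leaves out.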
+ +## What HDP does + +HDP (IETF draft: `draft-helixar-hdp-agentic-delegation-00`) provides: + +- **Ed25519-signed Delegation Tokens (HDTs)** issued by a human principal +- **Scope constraints** — which tools the agent is permitted to call +- **Irreversibility classification** (Class 0–3) — from read-only to physical actuation +- **Pre-execution verification** — the middleware gate runs *before* any tool executes +- **Audit log** — a tamper-evident record of every authorization decision + +For Gemma 4 on **edge devices directing physical actuators** (Jetson Nano, Raspberry Pi + robot arm), the HDP-P companion specification adds embodiment constraints, policy attestation, and fleet delegation controls. + +## Files + +| File | Description | +|---|---| +| `Gemma_4_HDP_Agentic_Security.ipynb` | Full walkthrough notebook — load Gemma 4, issue tokens, gate function calls | +| `hdp_middleware.py` | Drop-in middleware — `HDPMiddleware.gate()` wraps any Gemma 4 tool executor | + +## Quick start + +```python +from hdp_middleware import HDPDelegationToken, HDPMiddleware, IrreversibilityClass +from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey + +# Human principal issues a delegation token +private_key = Ed25519PrivateKey.generate() +token = HDPDelegationToken.issue( + principal_id="alice@example.com", + agent_id="gemma4-agent-01", + scope=["get_weather", "send_email"], + max_class=IrreversibilityClass.CLASS_2, + ttl_seconds=3600, + private_key=private_key, +) + +# Middleware verifies every Gemma 4 function call before execution +middleware = HDPMiddleware(public_key=private_key.public_key()) + +# A function call as parsed from Gemma 4's output +function_call = { + "name": "send_email", + "parameters": {"to": "bob@example.com", "subject": "Weekly report", "body": "Please find attached."}, +} +result = middleware.gate(function_call, token) + +if result.allowed: + execute_tool(function_call) # execute_tool: your own tool dispatcher, not part of hdp_middleware +``` + +## Irreversibility classes + +| Class | Definition | Authorization | +|---|---|---| +| 0 | Fully reversible — reads, queries | HDT sufficient | +| 1 | Reversible with
effort — writes, moves | HDT sufficient | +| 2 | Irreversible — send, delete, publish | HDT + principal confirmation | +| 3 | Irreversible + potentially harmful — physical actuation | Dual-principal required (HDP-P) | + +## References + +- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/ +- **Zenodo DOI:** https://doi.org/10.5281/zenodo.19332023 +- **HDP-P (physical AI):** https://doi.org/10.5281/ZENODO.19332440 +- **Helixar:** https://helixar.ai diff --git a/tooling/google-official/cookbook/apps_Gemma4_HDP_hdp_middleware.py b/tooling/google-official/cookbook/apps_Gemma4_HDP_hdp_middleware.py new file mode 100644 index 0000000..33b4442 --- /dev/null +++ b/tooling/google-official/cookbook/apps_Gemma4_HDP_hdp_middleware.py @@ -0,0 +1,390 @@ +""" +HDP (Human Delegation Provenance) middleware for Gemma 4 function calling. + +Intercepts Gemma 4 function call outputs and verifies that a valid HDP +Delegation Token (HDT) authorizes the requested action before forwarding +to the tool execution layer. 
+ +Reference: draft-helixar-hdp-agentic-delegation-00 + https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/ + DOI: 10.5281/zenodo.19332023 + +For physical AI agents (robots, edge devices), see HDP-P: + DOI: 10.5281/ZENODO.19332440 +""" + +import json +import time +import base64 +import hashlib +import hmac +from dataclasses import dataclass, field +from enum import IntEnum +from typing import Optional, Callable, Any +from cryptography.hazmat.primitives.asymmetric.ed25519 import ( + Ed25519PrivateKey, + Ed25519PublicKey, +) +from cryptography.hazmat.primitives.serialization import ( + Encoding, + PublicFormat, + PrivateFormat, + NoEncryption, +) +from cryptography.exceptions import InvalidSignature + + +# --------------------------------------------------------------------------- +# Irreversibility Classes (HDP-P §4.2) +# --------------------------------------------------------------------------- + +class IrreversibilityClass(IntEnum): + """ + Classification of physical action reversibility (HDP-P §4.2). + + For digital-only Gemma 4 deployments, all tool calls are Class 0 or 1. + For edge/robotics deployments (Jetson Nano, Raspberry Pi + actuators), + Class 2 and 3 require explicit pre-execution confirmation. + """ + CLASS_0 = 0 # Fully reversible — read-only, query, observe + CLASS_1 = 1 # Reversible with effort — write, create, move + CLASS_2 = 2 # Irreversible under normal conditions — delete, send, publish + CLASS_3 = 3 # Irreversible and potentially harmful — physical actuation + + +# Default tool → irreversibility class mapping. +# Deployments should override this for their specific tool set. 
+DEFAULT_TOOL_CLASS_MAP: dict[str, IrreversibilityClass] = { + # Class 0 — safe reads + "get_weather": IrreversibilityClass.CLASS_0, + "search_web": IrreversibilityClass.CLASS_0, + "read_file": IrreversibilityClass.CLASS_0, + "query_database": IrreversibilityClass.CLASS_0, + # Class 1 — reversible writes + "write_file": IrreversibilityClass.CLASS_1, + "create_record": IrreversibilityClass.CLASS_1, + "move_object": IrreversibilityClass.CLASS_1, + # Class 2 — irreversible digital actions + "send_email": IrreversibilityClass.CLASS_2, + "delete_file": IrreversibilityClass.CLASS_2, + "publish_post": IrreversibilityClass.CLASS_2, + "execute_transaction": IrreversibilityClass.CLASS_2, + # Class 3 — physical actuation (HDP-P scope) + "actuate_robot_arm": IrreversibilityClass.CLASS_3, + "command_vehicle": IrreversibilityClass.CLASS_3, + "dispense_fluid": IrreversibilityClass.CLASS_3, + "apply_force": IrreversibilityClass.CLASS_3, +} + + +# --------------------------------------------------------------------------- +# HDP Delegation Token (HDT) +# --------------------------------------------------------------------------- + +@dataclass +class HDPDelegationToken: + """ + Simplified HDT structure derived from draft-helixar-hdp-agentic-delegation-00. + + In production, HDTs are JOSE/JWT tokens signed with Ed25519. + This implementation provides the core claims structure and verification logic. 
+ + Claims: + iss — issuer (human principal identifier) + sub — subject (agent being delegated to) + iat — issued at (unix timestamp) + exp — expiry (unix timestamp) + scope — list of permitted tool names or wildcard patterns + max_irreversibility_class — ceiling on action class (0–3) + delegation_depth — remaining delegation hops permitted + nonce — replay-attack prevention + """ + iss: str + sub: str + iat: int + exp: int + scope: list[str] + max_irreversibility_class: IrreversibilityClass + delegation_depth: int = 1 + nonce: str = "" + _signature: bytes = field(default=b"", repr=False) + _public_key: Optional[Ed25519PublicKey] = field(default=None, repr=False) + + @classmethod + def issue( + cls, + principal_id: str, + agent_id: str, + scope: list[str], + max_class: IrreversibilityClass, + ttl_seconds: int = 3600, + delegation_depth: int = 1, + private_key: Optional[Ed25519PrivateKey] = None, + ) -> "HDPDelegationToken": + """ + Issue a new HDT signed by the human principal's Ed25519 private key. + + Args: + principal_id: Human principal identifier (e.g. "alice@example.com") + agent_id: Agent being delegated to (e.g. "gemma4-agent-01") + scope: List of permitted tool names. Use ["*"] for unrestricted. + max_class: Maximum IrreversibilityClass this token permits. + ttl_seconds: Token lifetime in seconds. + delegation_depth: How many times this token can be re-delegated. + private_key: Ed25519 private key for signing. Generated if None. 
+ """ + now = int(time.time()) + # Random nonce: a hash of (principal_id, now) repeats within the same second. + import secrets + nonce = secrets.token_urlsafe(16) + + token = cls( + iss=principal_id, + sub=agent_id, + iat=now, + exp=now + ttl_seconds, + scope=scope, + max_irreversibility_class=max_class, + delegation_depth=delegation_depth, + nonce=nonce, + ) + + if private_key is None: + private_key = Ed25519PrivateKey.generate() + + token._public_key = private_key.public_key() + token._signature = private_key.sign(token._canonical_bytes()) + return token + + def _canonical_bytes(self) -> bytes: + """Deterministic serialisation for signing/verification.""" + payload = { + "iss": self.iss, + "sub": self.sub, + "iat": self.iat, + "exp": self.exp, + "scope": sorted(self.scope), + "max_irreversibility_class": int(self.max_irreversibility_class), + "delegation_depth": self.delegation_depth, + "nonce": self.nonce, + } + return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode() + + def verify(self, public_key: Ed25519PublicKey) -> bool: + """Verify the token's Ed25519 signature.""" + try: + public_key.verify(self._signature, self._canonical_bytes()) + return True + except InvalidSignature: + return False + + def is_expired(self) -> bool: + return int(time.time()) > self.exp + + def permits_tool(self, tool_name: str) -> bool: + """Check whether this token's scope covers the requested tool.""" + if "*" in self.scope: + return True + return tool_name in self.scope + + def permits_class(self, action_class: IrreversibilityClass) -> bool: + return action_class <= self.max_irreversibility_class + + def to_dict(self) -> dict: + return { + "iss": self.iss, + "sub": self.sub, + "iat": self.iat, + "exp": self.exp, + "scope": self.scope, + "max_irreversibility_class": int(self.max_irreversibility_class), + "delegation_depth": self.delegation_depth, + "nonce": self.nonce, + } + + +# --------------------------------------------------------------------------- +# Verification result +# 
--------------------------------------------------------------------------- + +@dataclass +class VerificationResult: + allowed: bool + reason: str + tool_name: str + action_class: IrreversibilityClass + token_iss: Optional[str] = None + requires_confirmation: bool = False + + def __str__(self) -> str: + status = "ALLOWED" if self.allowed else "BLOCKED" + conf = " [CONFIRMATION REQUIRED]" if self.requires_confirmation else "" + return ( + f"[HDP] {status}{conf} — tool={self.tool_name} " + f"class={self.action_class.name} reason={self.reason}" + ) + + +# --------------------------------------------------------------------------- +# HDP Middleware +# --------------------------------------------------------------------------- + +class HDPMiddleware: + """ + HDP verification gate for Gemma 4 function calls. + + Sits between Gemma 4's function-call output and the tool execution layer. + For each function call Gemma 4 generates, this middleware: + + 1. Parses the tool name from the function call. + 2. Looks up its IrreversibilityClass. + 3. Verifies the attached HDT (signature, expiry, scope, class ceiling). + 4. For Class 2 actions, invokes the confirmation callback. + 5. Blocks Class 3 actions unless explicitly pre-authorized with + dual verification (HDP-P §5.4). + 6. Logs all decisions before forwarding or blocking. 
+ + Usage: + middleware = HDPMiddleware( + public_key=principal_public_key, + tool_class_map=DEFAULT_TOOL_CLASS_MAP, + confirmation_callback=my_confirmation_fn, + ) + + # Wrap your tool executor: + result = middleware.gate( + function_call=gemma_output, # {"name": "...", "parameters": {...}} + token=hdp_token, + ) + + if result.allowed: + output = execute_tool(gemma_output) + """ + + def __init__( + self, + public_key: Ed25519PublicKey, + tool_class_map: Optional[dict[str, IrreversibilityClass]] = None, + confirmation_callback: Optional[Callable[[str, dict], bool]] = None, + default_class: IrreversibilityClass = IrreversibilityClass.CLASS_1, + audit_log: Optional[list] = None, + ): + """ + Args: + public_key: Principal's Ed25519 public key for HDT verification. + tool_class_map: Mapping of tool names to IrreversibilityClass. + Defaults to DEFAULT_TOOL_CLASS_MAP. + confirmation_callback: Called for Class 2 actions. Receives + (tool_name, parameters) and returns bool. + If None, Class 2 actions are blocked. + default_class: Class assigned to unknown tools. Defaults to CLASS_1. + audit_log: Optional list to append VerificationResult records to. + """ + self.public_key = public_key + self.tool_class_map = tool_class_map if tool_class_map is not None else DEFAULT_TOOL_CLASS_MAP + self.confirmation_callback = confirmation_callback + self.default_class = default_class + self.audit_log = audit_log if audit_log is not None else [] + + def classify(self, tool_name: str) -> IrreversibilityClass: + """Return the IrreversibilityClass for a tool name.""" + return self.tool_class_map.get(tool_name, self.default_class) + + def gate( + self, + function_call: dict, + token: Optional[HDPDelegationToken], + ) -> VerificationResult: + """ + Main verification gate. Call this for every Gemma 4 function call. + + Args: + function_call: Gemma 4 function call dict: + {"name": "tool_name", "parameters": {...}} + token: HDPDelegationToken issued by the human principal (None is always blocked). 
+ + Returns: + VerificationResult — check .allowed before executing the tool. + """ + tool_name = function_call.get("name", "") + parameters = function_call.get("parameters", {}) + action_class = self.classify(tool_name) + + def _block(reason: str) -> VerificationResult: + result = VerificationResult( + allowed=False, + reason=reason, + tool_name=tool_name, + action_class=action_class, + token_iss=token.iss if token else None, + ) + self.audit_log.append(result) + print(result) + return result + + def _allow(reason: str, requires_confirmation: bool = False) -> VerificationResult: + result = VerificationResult( + allowed=True, + reason=reason, + tool_name=tool_name, + action_class=action_class, + token_iss=token.iss, + requires_confirmation=requires_confirmation, + ) + self.audit_log.append(result) + print(result) + return result + + # ── 1. Token presence ─────────────────────────────────────────────── + if token is None: + return _block("no HDT present") + + # ── 2. Expiry ─────────────────────────────────────────────────────── + if token.is_expired(): + return _block("HDT expired") + + # ── 3. Signature ──────────────────────────────────────────────────── + if not token.verify(self.public_key): + return _block("HDT signature invalid") + + # ── 4. Scope ──────────────────────────────────────────────────────── + if not token.permits_tool(tool_name): + return _block(f"tool '{tool_name}' not in HDT scope") + + # ── 5. Irreversibility class ceiling ──────────────────────────────── + if not token.permits_class(action_class): + return _block( + f"action class {action_class.name} exceeds HDT ceiling " + f"{token.max_irreversibility_class.name}" + ) + + # ── 6. 
Class 3 — always blocked without explicit dual verification ── + if action_class == IrreversibilityClass.CLASS_3: + # In production: implement dual-principal confirmation (HDP-P §5.4) + return _block( + "Class 3 physical action requires dual-principal confirmation " + "(HDP-P §5.4) — not implemented in this middleware instance" + ) + + # ── 7. Class 2 — confirmation callback required ───────────────────── + if action_class == IrreversibilityClass.CLASS_2: + if self.confirmation_callback is None: + return _block( + "Class 2 action requires confirmation callback — " + "none configured" + ) + confirmed = self.confirmation_callback(tool_name, parameters) + if not confirmed: + return _block("Class 2 action — confirmation denied by principal") + return _allow("Class 2 confirmed by principal", requires_confirmation=True) + + # ── 8. Class 0 / 1 — allow ───────────────────────────────────────── + return _allow("HDT valid, scope and class verified") + + def gate_batch( + self, + function_calls: list[dict], + token: HDPDelegationToken, + ) -> list[VerificationResult]: + """Verify a list of function calls. Returns one result per call.""" + return [self.gate(fc, token) for fc in function_calls] diff --git a/tooling/google-official/cookbook/docs_gemma_chat.ipynb b/tooling/google-official/cookbook/docs_gemma_chat.ipynb new file mode 100644 index 0000000..1eb5aa4 --- /dev/null +++ b/tooling/google-official/cookbook/docs_gemma_chat.ipynb @@ -0,0 +1,1008 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "G3MMAcssHTML" + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tce3stUlHN0L" + }, + "source": [ + "##### Copyright 2024 Google LLC." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "tuOe1ymfHZPu" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4qxv4Sn9b8CE" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " View on ai.google.dev\n", + " \n", + " Run in Google Colab\n", + " \n", + " Run in Kaggle\n", + " \n", + " Open in Vertex AI\n", + " \n", + " View source on GitHub\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "402c3d8a" + }, + "source": [ + "# Building a chatbot with Gemma" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b686fd95" + }, + "source": [ + "Large Language Models (LLMs) such as Gemma excel at generating informative responses, making them ideal for building virtual assistants and chatbots.\n", + "\n", + "Conventionally, LLMs operate in a stateless manner, meaning they lack an inherent memory to store past conversations. Each prompt or question is processed independently, disregarding prior interactions. However, a crucial aspect of natural conversation is the ability to retain context from prior interactions. To overcome this limitation and let an LLM maintain conversation context, you must explicitly include the relevant information, such as the conversation history (or the pertinent parts of it), in each new prompt presented to the LLM.\n", + "\n", + "This tutorial shows you how to develop a chatbot using the instruction-tuned model variant of Gemma." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29732090" + }, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QQ6W7NzRe1VM" + }, + "source": [ + "### Gemma setup\n", + "\n", + "To complete this tutorial, you'll first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:\n", + "\n", + "* Get access to Gemma on kaggle.com.\n", + "* Select a Colab runtime with sufficient resources to run\n", + " the Gemma 2B model.\n", + "* Generate and configure a Kaggle username and API key.\n", + "\n", + "After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_gN-IVRC3dQe" + }, + "source": [ + "### Set environment variables\n", + "\n", + "Set environment variables for `KAGGLE_USERNAME` and `KAGGLE_KEY`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "DrBoa_Urw9Vx" + }, + "outputs": [], + "source": [ + "import os\n", + "from google.colab import userdata\n", + "\n", + "# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env\n", + "# vars as appropriate for your system.\n", + "os.environ[\"KAGGLE_USERNAME\"] = userdata.get('KAGGLE_USERNAME')\n", + "os.environ[\"KAGGLE_KEY\"] = userdata.get('KAGGLE_KEY')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z9oy3QUmXtSd" + }, + "source": [ + "### Install dependencies\n", + "\n", + "Install Keras and KerasNLP." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a973dd7a" + }, + "outputs": [], + "source": [ + "# Install Keras 3 last. See https://keras.io/getting_started/ for more details.\n", + "!pip install -q tensorflow-cpu\n", + "!pip install -q -U keras-nlp tensorflow-hub\n", + "!pip install -q -U \"keras>=3\"\n", + "!pip install -q -U tensorflow-text" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Wme8666dUPVR" + }, + "source": [ + "### Select a backend\n", + "\n", + "Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use. [Keras 3](https://keras.io/keras_3){:.external} lets you choose the backend: TensorFlow, JAX, or PyTorch. All three will work for this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "67d12d2d" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Select JAX as the backend\n", + "os.environ[\"KERAS_BACKEND\"] = \"jax\"\n", + "\n", + "# Pre-allocate 100% of TPU memory to minimize memory fragmentation\n", + "os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"] = \"1.0\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ajm_SGWTUjVd" + }, + "source": [ + "### Import packages\n", + "\n", + "Import Keras and KerasNLP." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3lyn9FxPUok8" + }, + "outputs": [], + "source": [ + "import keras\n", + "import keras_nlp\n", + "\n", + "# for reproducibility\n", + "keras.utils.set_random_seed(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "39dc9d5b" + }, + "source": [ + "### Instantiate the model\n", + "\n", + "KerasNLP provides implementations of many popular [model architectures](https://keras.io/api/keras_nlp/models/){:.external}. In this tutorial, you'll instantiate the model using `GemmaCausalLM`, an end-to-end Gemma model for causal language modeling. A causal language model predicts the next token based on previous tokens.\n", + "\n", + "Instantiate the model using the `from_preset` method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c86dc8fe" + }, + "outputs": [], + "source": [ + "gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset(\"gemma2_instruct_2b_en\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tcCv0BSdVFv9" + }, + "source": [ + "The `GemmaCausalLM.from_preset()` function instantiates the model from a preset architecture and weights. In the code above, the string `\"gemma2_instruct_2b_en\"` specifies the preset for the Gemma 2 model with 2 billion parameters. 
Gemma models with [7B, 9B, and 27B parameters](https://ai.google.com/gemma/docs/get_started#models-list) are also available. You can find the preset strings for Gemma models in their **Model Variation** listings on [Kaggle](https://www.kaggle.com/models/google/gemma).\n", + "\n", + "Note: To run the larger models in Colab, you need access to the premium GPUs available in paid plans. Alternatively, you can run inference using Kaggle notebooks or Google Cloud projects." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bLNx8AoeVe-a" + }, + "source": [ + "Use the `summary` method to get more info about the model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3MorieIpVksu" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Preprocessor: \"gemma_causal_lm_preprocessor\"\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1mPreprocessor: \"gemma_causal_lm_preprocessor\"\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+              "┃ Tokenizer (type)                                                                                Vocab # ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+              "│ gemma_tokenizer (GemmaTokenizer)                   │                                             256,000 │\n",
+              "└────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘\n",
+              "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mTokenizer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Vocab #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│ gemma_tokenizer (\u001b[38;5;33mGemmaTokenizer\u001b[0m) │ \u001b[38;5;34m256,000\u001b[0m │\n", + "└────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Model: \"gemma_causal_lm\"\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1mModel: \"gemma_causal_lm\"\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+              "┃ Layer (type)                   Output Shape                       Param #  Connected to               ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+              "│ padding_mask (InputLayer)     │ (None, None)              │               0 │ -                          │\n",
+              "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n",
+              "│ token_ids (InputLayer)        │ (None, None)              │               0 │ -                          │\n",
+              "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n",
+              "│ gemma_backbone                │ (None, None, 2304)        │   2,614,341,888 │ padding_mask[0][0],        │\n",
+              "│ (GemmaBackbone)               │                           │                 │ token_ids[0][0]            │\n",
+              "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n",
+              "│ token_embedding               │ (None, None, 256000)      │     589,824,000 │ gemma_backbone[0][0]       │\n",
+              "│ (ReversibleEmbedding)         │                           │                 │                            │\n",
+              "└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘\n",
+              "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mConnected to \u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│ padding_mask (\u001b[38;5;33mInputLayer\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │ - │\n", + "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n", + "│ token_ids (\u001b[38;5;33mInputLayer\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │ - │\n", + "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n", + "│ gemma_backbone │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m2304\u001b[0m) │ \u001b[38;5;34m2,614,341,888\u001b[0m │ padding_mask[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m0\u001b[0m], │\n", + "│ (\u001b[38;5;33mGemmaBackbone\u001b[0m) │ │ │ token_ids[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m0\u001b[0m] │\n", + "├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤\n", + "│ token_embedding │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m256000\u001b[0m) │ \u001b[38;5;34m589,824,000\u001b[0m │ gemma_backbone[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m0\u001b[0m] │\n", + "│ (\u001b[38;5;33mReversibleEmbedding\u001b[0m) │ │ │ │\n", + "└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘\n" + ] + }, + "metadata": {}, + 
"output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
 Total params: 2,614,341,888 (9.74 GB)\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m2,614,341,888\u001b[0m (9.74 GB)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
 Trainable params: 2,614,341,888 (9.74 GB)\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m2,614,341,888\u001b[0m (9.74 GB)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
 Non-trainable params: 0 (0.00 B)\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "gemma_lm.summary()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ArZPOzFpVp6S" + }, + "source": [ + "As you can see from the summary, the model has 2.6 billion trainable parameters.\n", + "\n", + "Note: For purposes of naming the model (\"2B\"), the embedding layer is not counted toward the number of parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1WpS39TBYql9" + }, + "source": [ + "### Define formatting helper functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3-obTC1jZGpZ" + }, + "outputs": [], + "source": [ + "from IPython.display import Markdown\n", + "import textwrap\n", + "\n", + "def display_chat(prompt, text):\n", + " formatted_prompt = \"🙋‍♂️
\" + prompt + \"
\"\n", + " text = text.replace('•', ' *')\n", + " text = textwrap.indent(text, '> ', predicate=lambda _: True)\n", + " formatted_text = \"🤖\\n\\n\" + text + \"\\n\"\n", + " return Markdown(formatted_prompt+formatted_text)\n", + "\n", + "def to_markdown(text):\n", + " text = text.replace('•', ' *')\n", + " return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5ca54e8c" + }, + "source": [ + "## Building the chatbot\n", + "\n", + "The Gemma instruction-tuned model `gemma2_instruct_2b_en` is fine-tuned to understand the following turn tokens:\n", + "\n", + "```\n", + "<start_of_turn>user\\n ... <end_of_turn>\\n\n", + "<start_of_turn>model\\n ... <end_of_turn>\\n\n", + "```\n", + "\n", + "This tutorial uses these tokens to build the chatbot. Refer to [Formatting and system instructions](https://ai.google.dev/gemma/docs/formatting) for more information on Gemma control tokens.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9583dfd1" + }, + "source": [ + "### Create a chat helper to manage the conversation state" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e4e9a187" + }, + "outputs": [], + "source": [ + "class ChatState():\n", + " \"\"\"\n", + " Manages the conversation history for a turn-based chatbot.\n", + " Follows the turn-based conversation guidelines for the Gemma family of models\n", + " documented at https://ai.google.dev/gemma/docs/formatting\n", + " \"\"\"\n", + "\n", + " __START_TURN_USER__ = \"<start_of_turn>user\\n\"\n", + " __START_TURN_MODEL__ = \"<start_of_turn>model\\n\"\n", + " __END_TURN__ = \"<end_of_turn>\\n\"\n", + "\n", + " def __init__(self, model, system=\"\"):\n", + " \"\"\"\n", + " Initializes the chat state.\n", + "\n", + " Args:\n", + " model: The language model to use for generating responses.\n", + " system: (Optional) System instructions or bot description.\n", + " \"\"\"\n", + " self.model = model\n", + " self.system = system\n", + " self.history = []\n", + "\n", + " def 
add_to_history_as_user(self, message):\n", + " \"\"\"\n", + " Adds a user message to the history with start/end turn markers.\n", + " \"\"\"\n", + " self.history.append(self.__START_TURN_USER__ + message + self.__END_TURN__)\n", + "\n", + " def add_to_history_as_model(self, message):\n", + " \"\"\"\n", + " Adds a model response to the history with start/end turn markers.\n", + " \"\"\"\n", + " self.history.append(self.__START_TURN_MODEL__ + message)\n", + "\n", + " def get_history(self):\n", + " \"\"\"\n", + " Returns the entire chat history as a single string.\n", + " \"\"\"\n", + " return \"\".join([*self.history])\n", + "\n", + " def get_full_prompt(self):\n", + " \"\"\"\n", + " Builds the prompt for the language model, including history and system description.\n", + " \"\"\"\n", + " prompt = self.get_history() + self.__START_TURN_MODEL__\n", + " if len(self.system)>0:\n", + " prompt = self.system + \"\\n\" + prompt\n", + " return prompt\n", + "\n", + " def send_message(self, message):\n", + " \"\"\"\n", + " Handles sending a user message and getting a model response.\n", + "\n", + " Args:\n", + " message: The user's message.\n", + "\n", + " Returns:\n", + " The model's response.\n", + " \"\"\"\n", + " self.add_to_history_as_user(message)\n", + " prompt = self.get_full_prompt()\n", + " response = self.model.generate(prompt, max_length=2048)\n", + " result = response.replace(prompt, \"\") # Extract only the new response\n", + " self.add_to_history_as_model(result)\n", + " return result\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9hmJS4h4ZmiP" + }, + "source": [ + "### Chat with the model\n", + "\n", + "Start chatting with the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b1913181" + }, + "outputs": [ + { + "data": { + "text/markdown": [ + "🙋‍♂️
Tell me, in a few words, how to compute all prime numbers up to 1000?
🤖\n", + "\n", + "> **Sieve of Eratosthenes.** \n", + "> \n", + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chat = ChatState(gemma_lm)\n", + "message = \"Tell me, in a few words, how to compute all prime numbers up to 1000?\"\n", + "display_chat(message, chat.send_message(message))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ODKxUPP2Zuqy" + }, + "source": [ + "Continue the conversation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7448005b" + }, + "outputs": [ + { + "data": { + "text/markdown": [ + "🙋‍♂️
Now in Python! No numpy, please!
🤖\n", + "\n", + "> ```python\n", + "> def sieve_of_eratosthenes(n):\n", + "> \"\"\"Returns a list of prime numbers up to n.\"\"\"\n", + "> primes = [True] * (n + 1)\n", + "> primes[0] = primes[1] = False\n", + "> for i in range(2, int(n**0.5) + 1):\n", + "> if primes[i]:\n", + "> for j in range(i * i, n + 1, i):\n", + "> primes[j] = False\n", + "> return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "> \n", + "> primes = sieve_of_eratosthenes(1000)\n", + "> print(primes)\n", + "> ```\n", + "> \n", + "> **Explanation:**\n", + "> \n", + "> 1. **Initialization:**\n", + "> - `primes = [True] * (n + 1)`: Creates a list `primes` of boolean values, initially assuming all numbers are prime.\n", + "> - `primes[0] = primes[1] = False`: Sets 0 and 1 as non-prime.\n", + "> \n", + "> 2. **Iteration:**\n", + "> - `for i in range(2, int(n**0.5) + 1):`: Iterates from 2 to the square root of `n`. We only need to check up to the square root because any composite number must have a prime factor less than or equal to its square root.\n", + "> - `if primes[i]:`: If `i` is marked as prime:\n", + "> - `for j in range(i * i, n + 1, i):`: Marks all multiples of `i` as non-prime.\n", + "> \n", + "> 3. **Result:**\n", + "> - `return [i for i, is_prime in enumerate(primes) if is_prime]`: Creates a list of indices where `primes[i]` is True, representing the prime numbers.\n", + "> \n", + "> \n", + "> Let me know if you'd like a more detailed explanation of any part! \n", + "> \n", + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "message = \"Now in Python! No numpy, please!\"\n", + "display_chat(message, chat.send_message(message))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0973ff54" + }, + "outputs": [ + { + "data": { + "text/markdown": [ + "🙋‍♂️
Thank you, it works! Can you explain the code in French?
🤖\n", + "\n", + "> Bien sûr ! Voici une explication du code en français :\n", + "> \n", + "> ```python\n", + "> def sieve_of_eratosthenes(n):\n", + "> \"\"\"Retourne une liste de nombres premiers jusqu'à n.\"\"\"\n", + "> primes = [True] * (n + 1)\n", + "> primes[0] = primes[1] = False\n", + "> for i in range(2, int(n**0.5) + 1):\n", + "> if primes[i]:\n", + "> for j in range(i * i, n + 1, i):\n", + "> primes[j] = False\n", + "> return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "> \n", + "> primes = sieve_of_eratosthenes(1000)\n", + "> print(primes)\n", + "> ```\n", + "> \n", + "> **Explication:**\n", + "> \n", + "> 1. **Initialisation:**\n", + "> - `primes = [True] * (n + 1)`: Crée une liste `primes` de valeurs booléennes, initialement supposant que tous les nombres sont premiers.\n", + "> - `primes[0] = primes[1] = False`: Définit 0 et 1 comme non-premiers.\n", + "> \n", + "> 2. **Itération:**\n", + "> - `for i in range(2, int(n**0.5) + 1):`: Itère de 2 jusqu'à la racine carrée de `n`. Nous ne devons vérifier que jusqu'à la racine carrée car tout nombre composite doit avoir un facteur premier inférieur ou égal à sa racine carrée.\n", + "> - `if primes[i]:`: Si `i` est considéré comme premier:\n", + "> - `for j in range(i * i, n + 1, i):`: Marquer tous les multiples de `i` comme non-premiers.\n", + "> \n", + "> 3. **Resultat:**\n", + "> - `return [i for i, is_prime in enumerate(primes) if is_prime]`: Crée une liste des indices où `primes[i]` est vrai, représentant les nombres premiers.\n", + "> \n", + "> \n", + "> N'hésitez pas à me demander si vous avez besoin d'une explication plus détaillée de quelque chose! \n", + "> \n", + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "message = \"Thank you, it works! 
Can you explain the code in French?\"\n", + "display_chat(message, chat.send_message(message))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a0c51f42" + }, + "outputs": [ + { + "data": { + "text/markdown": [ + "🙋‍♂️
Great! Now add those explanations as comments in the code.
🤖\n", + "\n", + "> ```python\n", + "> def sieve_of_eratosthenes(n):\n", + "> \"\"\"Retourne une liste de nombres premiers jusqu'à n.\"\"\"\n", + "> # Initialise une liste de boolean avec True pour tous les nombres de 0 à n\n", + "> primes = [True] * (n + 1)\n", + "> # Définit 0 et 1 comme non-premiers\n", + "> primes[0] = primes[1] = False\n", + "> # Itère de 2 à la racine carrée de n\n", + "> for i in range(2, int(n**0.5) + 1):\n", + "> # Si i est considéré comme premier\n", + "> if primes[i]:\n", + "> # Itère sur tous les multiples de i\n", + "> for j in range(i * i, n + 1, i):\n", + "> # Définit les multiples de i comme non-premiers\n", + "> primes[j] = False\n", + "> # Retourne la liste des indices des nombres premiers\n", + "> return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "> \n", + "> primes = sieve_of_eratosthenes(1000)\n", + "> print(primes)\n", + "> ```\n", + "> \n", + "> **Explication:**\n", + "> \n", + "> * **Initialisation:**\n", + "> * `primes = [True] * (n + 1)`: Crée une liste `primes` de valeurs booléennes, initialement supposant que tous les nombres sont premiers.\n", + "> * `primes[0] = primes[1] = False`: Définit 0 et 1 comme non-premiers.\n", + "> * **Itération:**\n", + "> * `for i in range(2, int(n**0.5) + 1):`: Itère de 2 jusqu'à la racine carrée de `n`. Nous ne devons vérifier que jusqu'à la racine carrée car tout nombre composite doit avoir un facteur premier inférieur ou égal à sa racine carrée.\n", + "> * `if primes[i]:`: Si `i` est considéré comme premier:\n", + "> * `for j in range(i * i, n + 1, i):`: Marquer tous les multiples de `i` comme non-premiers.\n", + "> * **Resultat:**\n", + "> * `return [i for i, is_prime in enumerate(primes) if is_prime]`: Crée une liste des indices où `primes[i]` est vrai, représentant les nombres premiers. 
\n", + "> \n", + "> \n", + "> \n", + "> \n", + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "message = \"Great! Now add those explanations as comments in the code.\"\n", + "display_chat(message, chat.send_message(message))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "51a33627" + }, + "source": [ + "Test the generated response by running the generated code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "221c0817" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]\n" + ] + } + ], + "source": [ + "def sieve_of_eratosthenes(n):\n", + " \"\"\"Retourne une liste de nombres premiers jusqu'à n.\"\"\"\n", + " # Initialise une liste de boolean avec True pour tous les nombres de 0 à n\n", + " primes = [True] * (n + 1)\n", + " # Définit 0 et 1 comme non-premiers\n", + " primes[0] = primes[1] = False\n", + " # Itère de 2 à la racine carrée de n\n", + " for i in range(2, int(n**0.5) + 1):\n", + " # Si i est considéré comme premier\n", + " if primes[i]:\n", + " # Itère sur tous les multiples 
de i\n", + " for j in range(i * i, n + 1, i):\n", + " # Définit les multiples de i comme non-premiers\n", + " primes[j] = False\n", + " # Retourne la liste des indices des nombres premiers\n", + " return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "\n", + "primes = sieve_of_eratosthenes(1000)\n", + "print(primes)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1c8ece6c" + }, + "source": [ + "Use the `get_history` method to see how all the context was retained by the `Chat` class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e48f4ca1" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "user\n", + "Tell me, in a few words, how to compute all prime numbers up to 1000?\n", + "model\n", + "**Sieve of Eratosthenes.** \n", + "user\n", + "Now in Python! No numpy, please!\n", + "model\n", + "```python\n", + "def sieve_of_eratosthenes(n):\n", + " \"\"\"Returns a list of prime numbers up to n.\"\"\"\n", + " primes = [True] * (n + 1)\n", + " primes[0] = primes[1] = False\n", + " for i in range(2, int(n**0.5) + 1):\n", + " if primes[i]:\n", + " for j in range(i * i, n + 1, i):\n", + " primes[j] = False\n", + " return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "\n", + "primes = sieve_of_eratosthenes(1000)\n", + "print(primes)\n", + "```\n", + "\n", + "**Explanation:**\n", + "\n", + "1. **Initialization:**\n", + " - `primes = [True] * (n + 1)`: Creates a list `primes` of boolean values, initially assuming all numbers are prime.\n", + " - `primes[0] = primes[1] = False`: Sets 0 and 1 as non-prime.\n", + "\n", + "2. **Iteration:**\n", + " - `for i in range(2, int(n**0.5) + 1):`: Iterates from 2 to the square root of `n`. 
We only need to check up to the square root because any composite number must have a prime factor less than or equal to its square root.\n", + " - `if primes[i]:`: If `i` is marked as prime:\n", + " - `for j in range(i * i, n + 1, i):`: Marks all multiples of `i` as non-prime.\n", + "\n", + "3. **Result:**\n", + " - `return [i for i, is_prime in enumerate(primes) if is_prime]`: Creates a list of indices where `primes[i]` is True, representing the prime numbers.\n", + "\n", + "\n", + "Let me know if you'd like a more detailed explanation of any part! \n", + "user\n", + "Thank you, it works! Can you explain the code in French?\n", + "model\n", + "Bien sûr ! Voici une explication du code en français :\n", + "\n", + "```python\n", + "def sieve_of_eratosthenes(n):\n", + " \"\"\"Retourne une liste de nombres premiers jusqu'à n.\"\"\"\n", + " primes = [True] * (n + 1)\n", + " primes[0] = primes[1] = False\n", + " for i in range(2, int(n**0.5) + 1):\n", + " if primes[i]:\n", + " for j in range(i * i, n + 1, i):\n", + " primes[j] = False\n", + " return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "\n", + "primes = sieve_of_eratosthenes(1000)\n", + "print(primes)\n", + "```\n", + "\n", + "**Explication:**\n", + "\n", + "1. **Initialisation:**\n", + " - `primes = [True] * (n + 1)`: Crée une liste `primes` de valeurs booléennes, initialement supposant que tous les nombres sont premiers.\n", + " - `primes[0] = primes[1] = False`: Définit 0 et 1 comme non-premiers.\n", + "\n", + "2. **Itération:**\n", + " - `for i in range(2, int(n**0.5) + 1):`: Itère de 2 jusqu'à la racine carrée de `n`. Nous ne devons vérifier que jusqu'à la racine carrée car tout nombre composite doit avoir un facteur premier inférieur ou égal à sa racine carrée.\n", + " - `if primes[i]:`: Si `i` est considéré comme premier:\n", + " - `for j in range(i * i, n + 1, i):`: Marquer tous les multiples de `i` comme non-premiers.\n", + "\n", + "3. 
**Resultat:**\n", + " - `return [i for i, is_prime in enumerate(primes) if is_prime]`: Crée une liste des indices où `primes[i]` est vrai, représentant les nombres premiers.\n", + "\n", + "\n", + "N'hésitez pas à me demander si vous avez besoin d'une explication plus détaillée de quelque chose! \n", + "user\n", + "Great! Now add those explanations as comments in the code.\n", + "model\n", + "```python\n", + "def sieve_of_eratosthenes(n):\n", + " \"\"\"Retourne une liste de nombres premiers jusqu'à n.\"\"\"\n", + " # Initialise une liste de boolean avec True pour tous les nombres de 0 à n\n", + " primes = [True] * (n + 1)\n", + " # Définit 0 et 1 comme non-premiers\n", + " primes[0] = primes[1] = False\n", + " # Itère de 2 à la racine carrée de n\n", + " for i in range(2, int(n**0.5) + 1):\n", + " # Si i est considéré comme premier\n", + " if primes[i]:\n", + " # Itère sur tous les multiples de i\n", + " for j in range(i * i, n + 1, i):\n", + " # Définit les multiples de i comme non-premiers\n", + " primes[j] = False\n", + " # Retourne la liste des indices des nombres premiers\n", + " return [i for i, is_prime in enumerate(primes) if is_prime]\n", + "\n", + "primes = sieve_of_eratosthenes(1000)\n", + "print(primes)\n", + "```\n", + "\n", + "**Explication:**\n", + "\n", + "* **Initialisation:**\n", + " * `primes = [True] * (n + 1)`: Crée une liste `primes` de valeurs booléennes, initialement supposant que tous les nombres sont premiers.\n", + " * `primes[0] = primes[1] = False`: Définit 0 et 1 comme non-premiers.\n", + "* **Itération:**\n", + " * `for i in range(2, int(n**0.5) + 1):`: Itère de 2 jusqu'à la racine carrée de `n`. 
Nous ne devons vérifier que jusqu'à la racine carrée car tout nombre composite doit avoir un facteur premier inférieur ou égal à sa racine carrée.\n", + " * `if primes[i]:`: Si `i` est considéré comme premier:\n", + " * `for j in range(i * i, n + 1, i):`: Marquer tous les multiples de `i` comme non-premiers.\n", + "* **Resultat:**\n", + " * `return [i for i, is_prime in enumerate(primes) if is_prime]`: Crée une liste des indices où `primes[i]` est vrai, représentant les nombres premiers. \n", + "\n", + "\n", + "\n", + "\n" + ] + } + ], + "source": [ + "print(chat.get_history())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9693c66f" + }, + "source": [ + "## Summary and further reading\n", + "\n", + "In this tutorial, you learned how to chat with the Gemma 2B Instruction tuned model using Keras on JAX.\n", + "\n", + "Check out these guides and tutorials to learn more about Gemma:\n", + "\n", + "* [Get started with Keras Gemma](https://ai.google.dev/gemma/docs/get_started).\n", + "* [Finetune the Gemma model on GPU](https://ai.google.dev/gemma/docs/lora_tuning).\n", + "* Learn about [Gemma integration with Vertex AI](https://ai.google.dev/gemma/docs/integrations/vertex)\n", + "* Learn how to [use Gemma models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma){:.external}.\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "gemma_chat.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb b/tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb new file mode 100644 index 0000000..70a8e27 --- /dev/null +++ b/tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb @@ -0,0 +1,925 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "-u7xRR3DeFXz" + }, + 
"source": [ + "##### Copyright 2026 Google LLC." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "oed1Dh9SeIlD" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A0UbyyBOeKmV" + }, + "source": [ + "# RAG with EmbeddingGemma\n", + "\n", + "\n", + " \n", + "
\n", + " Run in Google Colab\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ND35JUp9ecq2" + }, + "source": [ + "EmbeddingGemma is a lightweight, open embedding model designed for fast, high-quality retrieval on everyday devices like mobile phones. At only 308 million parameters, it's efficient enough to run advanced AI techniques, such as Retrieval Augmented Generation (RAG), directly on your local machine with no internet connection required.\n", + "\n", + "## Setup\n", + "\n", + "Before starting this tutorial, complete the following steps:\n", + "\n", + "* Get access to EmbeddingGemma by logging into [Hugging Face](https://huggingface.co/google/embeddinggemma-300M) and selecting **Acknowledge license** for a Gemma model.\n", + "* Select a Colab runtime with sufficient resources to run\n", + " the Gemma model size you want to run. [Learn more](https://ai.google.dev/gemma/docs/core#sizes).\n", + "* Generate a Hugging Face [Access Token](https://huggingface.co/docs/hub/en/security-tokens#how-to-manage-user-access-token) and use it to login from Colab.\n", + "\n", + "This notebook will run on an NVIDIA T4 GPU." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SZ8cw1nPf-NV" + }, + "source": [ + "### Install Python packages\n", + "\n", + "Install the libraries required for running the EmbeddingGemma model and generating embeddings. Sentence Transformers is a Python framework for text and image embeddings. For more information, see the [Sentence Transformers](https://www.sbert.net/) documentation." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "daXx6O20Q7M0" + }, + "outputs": [], + "source": [ + "!pip install -q -U sentence-transformers transformers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kYiTsNFSjGJH" + }, + "source": [ + "After you have accepted the license, you need a valid Hugging Face Token to access the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eLagJ9aff9Ks" + }, + "outputs": [], + "source": [ + "# Login into Hugging Face Hub\n", + "from huggingface_hub import login\n", + "login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IiDcW_rmHBfx" + }, + "source": [ + "### Load language model\n", + "\n", + "You will use Gemma 4 E2B to generate responses." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "HX2JFDQI-vg8" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c0b54b8b91da46fdb7ba8fd3aecb5002", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4291694230e74608a2808adde451bd0f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/10.2G [00:00\n", + " ```python\n", + " query_embedding = model.encode(\n", + " \"How do I use prompts with this model?\",\n", + " prompt_name=\"Retrieval-query\"\n", + " )\n", + " ```\n", + "\n", + "* **For Documents:** Use `prompt_name=\"Retrieval-document\"`. To further improve document embeddings, you can also include a title by using the `prompt` argument directly:
\n", + " * **With a title:**
\n", + " ```python\n", + " doc_embedding = model.encode(\n", + " \"The document text...\",\n", + " prompt=\"title: Using Prompts in RAG | text: \"\n", + " )\n", + " ```\n", + " * **Without a title:**
\n", + " ```python\n", + " doc_embedding = model.encode(\n", + " \"The document text...\",\n", + " prompt=\"title: none | text: \"\n", + " )\n", + " ```\n", + "\n", + "### Further Reading\n", + "\n", + "* For details on all available EmbeddingGemma prompts, see the [model card](http://ai.google.dev/gemma/docs/embeddinggemma/model_card#prompt_instructions).\n", + "* For general information on prompt templates, see the [Sentence Transformer documentation](https://sbert.net/examples/sentence_transformer/applications/computing-embeddings/README.html#prompt-templates).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "Y5hVNF3F-qZ7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Available tasks:\n", + " query: \"task: search result | query: \"\n", + " document: \"title: none | text: \"\n", + " BitextMining: \"task: search result | query: \"\n", + " Clustering: \"task: clustering | query: \"\n", + " Classification: \"task: classification | query: \"\n", + " InstructionRetrieval: \"task: code retrieval | query: \"\n", + " MultilabelClassification: \"task: classification | query: \"\n", + " PairClassification: \"task: sentence similarity | query: \"\n", + " Reranking: \"task: search result | query: \"\n", + " Retrieval: \"task: search result | query: \"\n", + " Retrieval-query: \"task: search result | query: \"\n", + " Retrieval-document: \"title: none | text: \"\n", + " STS: \"task: sentence similarity | query: \"\n", + " Summarization: \"task: summarization | query: \"\n" + ] + } + ], + "source": [ + "print(\"Available tasks:\")\n", + "for name, prefix in model.prompts.items():\n", + " print(f\" {name}: \\\"{prefix}\\\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eIfWZ_z3xDZq" + }, + "source": [ + "## Simple RAG example\n", + "\n", + "Retrieval is the task of finding the most relevant pieces of information from a large collection (a database, a set of documents, a 
website) based on the meaning of a query, not just keywords.\n", + "\n", + "Imagine you work for a company, and you need to find information from the internal employee handbook, which is stored as a collection of hundreds of documents." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "cellView": "form", + "id": "fbaiy-CXRAs7" + }, + "outputs": [], + "source": [ + "#@title Corp knowledge base\n", + "corp_knowledge_base = [\n", + " {\n", + " \"category\": \"HR & Leave Policies\",\n", + " \"documents\": [\n", + " {\n", + " \"title\": \"Procedure for Unscheduled Absence\",\n", + " \"content\": \"In the event of an illness or emergency preventing you from working, please notify both your direct manager and the HR department via email by 9:30 AM JST. The subject line should be 'Sick Leave - [Your Name]'. If the absence extends beyond two consecutive days, a doctor's certificate (診断書) will be required upon your return.\"\n", + " },\n", + " {\n", + " \"title\": \"Annual Leave Policy\",\n", + " \"content\": \"Full-time employees are granted 10 days of annual paid leave in their first year. This leave is granted six months after the date of joining and increases each year based on length of service. For example, an employee in their third year of service is entitled to 14 days per year. For a detailed breakdown, please refer to the attached 'Annual Leave Accrual Table'.\"\n", + " },\n", + " ]\n", + " },\n", + " {\n", + " \"category\": \"IT & Security\",\n", + " \"documents\": [\n", + " {\n", + " \"title\": \"Account Password Management\",\n", + " \"content\": \"If you have forgotten your password or your account is locked, please use the self-service reset portal at https://reset.ourcompany. You will be prompted to answer your pre-configured security questions. For security reasons, the IT Help Desk cannot reset passwords over the phone or email. 
If you have not set up your security questions, please visit the IT support desk on the 12th floor of the Shibuya office with your employee ID card.\"\n", + " },\n", + " {\n", + " \"title\": \"Software Procurement Process\",\n", + " \"content\": \"All requests for new software must be submitted through the 'IT Service Desk' portal under the 'Software Request' category. Please include a business justification for the request. All software licenses require approval from your department head before procurement can begin. Please note that standard productivity software is pre-approved and does not require this process.\"\n", + " },\n", + " ]\n", + " },\n", + " {\n", + " \"category\": \"Finance & Expenses\",\n", + " \"documents\": [\n", + " {\n", + " \"title\": \"Expense Reimbursement Policy\",\n", + " \"content\": \"To ensure timely processing, all expense claims for a given month must be submitted for approval no later than the 5th business day of the following month. For example, all expenses incurred in July must be submitted by the 5th business day of August. Submissions after this deadline may be processed in the next payment cycle.\"\n", + " },\n", + " {\n", + " \"title\": \"Business Trip Expense Guidelines\",\n", + " \"content\": \"Travel expenses for business trips will, as a rule, be reimbursed based on the actual cost of the most logical and economical route. Please submit a travel expense application in advance when using the Shinkansen or airplanes. Taxis are permitted only when public transportation is unavailable or when transporting heavy equipment. Receipts are mandatory.\"\n", + " },\n", + " ]\n", + " },\n", + " {\n", + " \"category\": \"Office & Facilities\",\n", + " \"documents\": [\n", + " {\n", + " \"title\": \"Conference Room Booking Instructions\",\n", + " \"content\": \"All conference rooms in the Shibuya office can be reserved through your Calendar App. 
Create a new meeting invitation, add the attendees, and then use the 'Room Finder' feature to select an available room. Please be sure to select the correct floor. For meetings with more than 10 people, please book the 'Sakura' or 'Fuji' rooms on the 14th floor.\"\n", + " },\n", + " {\n", + " \"title\": \"Mail and Delivery Policy\",\n", + " \"content\": \"The company's mail services are intended for business-related correspondence only. For security and liability reasons, employees are kindly requested to refrain from having personal parcels or mail delivered to the Shibuya office address. The front desk will not be able to accept or hold personal deliveries.\"\n", + " },\n", + " ]\n", + " },\n", + "]\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fvecfoko--hL" + }, + "source": [ + "Now imagine you have a question like the one below." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "wN-WHf26J89m" + }, + "outputs": [], + "source": [ + "question = \"How do I reset my password?\" # @param [\"How many days of annual paid leave do I get?\", \"How do I reset my password?\", \"What travel expenses can be reimbursed for a business trip?\", \"Can I receive personal packages at the office?\"] {type:\"string\", allow-input: true}\n", + "\n", + "# Define a minimum confidence threshold for a match to be considered valid\n", + "similarity_threshold = 0.4 # @param {\"type\":\"slider\",\"min\":0,\"max\":1,\"step\":0.1}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2CSeSmF7OuMB" + }, + "source": [ + "Search the corporate knowledge base for the most relevant document." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "NngqWUxOyrLS" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Step 1: Finding the best category...\n", + "['HR & Leave Policies', 'IT & Security', 'Finance & Expenses', 'Office & Facilities']\n", + "tensor([[0.5063, 0.5937, 0.5076, 0.4221]])\n", + " `-> ✅ Category Found: 'IT & Security' (Score: 0.59)\n", + "\n", + "Step 2: Finding the best document in that category...\n", + "['Account Password Management', 'Software Procurement Process']\n", + "tensor([[0.5829, 0.1531]])\n", + " `-> ✅ Document Found: 'Account Password Management' (Score: 0.58)\n" + ] + } + ], + "source": [ + "# --- Helper Functions for Semantic Search ---\n", + "\n", + "def _calculate_best_match(similarities):\n", + " print(similarities)\n", + " if similarities is None or similarities.nelement() == 0:\n", + " return None, 0.0\n", + "\n", + " # Find the index and value of the highest score\n", + " best_index = similarities.argmax().item()\n", + " best_score = similarities[0, best_index].item()\n", + "\n", + " return best_index, best_score\n", + "\n", + "def find_best_category(model, query, candidates):\n", + " \"\"\"\n", + " Finds the most relevant category from a list of candidates.\n", + "\n", + " Args:\n", + " model: The SentenceTransformer model.\n", + " query: The user's query string.\n", + " candidates: A list of category name strings.\n", + "\n", + " Returns:\n", + " A tuple containing the index of the best category and its similarity score.\n", + " \"\"\"\n", + " if not candidates:\n", + " return None, 0.0\n", + "\n", + " # Encode the query and candidate categories for classification\n", + " query_embedding = model.encode(query, prompt_name=\"Classification\")\n", + " candidate_embeddings = model.encode(candidates, prompt_name=\"Classification\")\n", + "\n", + " print(candidates)\n", + " return _calculate_best_match(model.similarity(query_embedding, 
candidate_embeddings))\n", + "\n", + "def find_best_doc(model, query, candidates):\n", + " \"\"\"\n", + " Finds the most relevant document from a list of candidates.\n", + "\n", + " Args:\n", + " model: The SentenceTransformer model.\n", + " query: The user's query string.\n", + " candidates: A list of document dictionaries, each with 'title' and 'content'.\n", + "\n", + " Returns:\n", + " A tuple containing the index of the best document and its similarity score.\n", + " \"\"\"\n", + " if not candidates:\n", + " return None, 0.0\n", + "\n", + " # Encode the query for retrieval\n", + " query_embedding = model.encode(query, prompt_name=\"Retrieval-query\")\n", + "\n", + " # Encode the document for similarity check\n", + " doc_texts = [\n", + " f\"title: {doc.get('title', 'none')} | text: {doc.get('content', '')}\"\n", + " for doc in candidates\n", + " ]\n", + " candidate_embeddings = model.encode(doc_texts)\n", + "\n", + " print([doc['title'] for doc in candidates])\n", + "\n", + " # Calculate cosine similarity\n", + " return _calculate_best_match(model.similarity(query_embedding, candidate_embeddings))\n", + "\n", + "# --- Main Search Logic ---\n", + "\n", + "# In your application, `best_document` would result from a search.\n", + "# We initialize it to None to ensure it always exists.\n", + "best_document = None\n", + "\n", + "# 1. Find the most relevant category\n", + "print(\"Step 1: Finding the best category...\")\n", + "categories = [item[\"category\"] for item in corp_knowledge_base]\n", + "best_category_index, category_score = find_best_category(\n", + " model, question, categories\n", + ")\n", + "\n", + "# Check if the category score meets the threshold\n", + "if category_score < similarity_threshold:\n", + " print(f\" `-> 🤷 No relevant category found. 
The highest score was only {category_score:.2f}.\")\n", + "else:\n", + " best_category = corp_knowledge_base[best_category_index]\n", + " print(f\" `-> ✅ Category Found: '{best_category['category']}' (Score: {category_score:.2f})\")\n", + "\n", + " # 2. Find the most relevant document ONLY if a good category was found\n", + " print(\"\\nStep 2: Finding the best document in that category...\")\n", + " best_document_index, document_score = find_best_doc(\n", + " model, question, best_category[\"documents\"]\n", + " )\n", + "\n", + " # Check if the document score meets the threshold\n", + " if document_score < similarity_threshold:\n", + " print(f\" `-> 🤷 No relevant document found. The highest score was only {document_score:.2f}.\")\n", + " else:\n", + " best_document = best_category[\"documents\"][best_document_index]\n", + " # 3. Display the final successful result\n", + " print(f\" `-> ✅ Document Found: '{best_document['title']}' (Score: {document_score:.2f})\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zK9T5rRGAMDw" + }, + "source": [ + "Next, generate the answer with the retrieved context" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "FrwKySpMASpt" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Question🙋‍♂️: How do I reset my password?\n", + "Using document: Account Password Management\n", + "Answer🤖: Please use the self-service reset portal at https://reset.ourcompany. You will be prompted to answer your pre-configured security questions.\n" + ] + } + ], + "source": [ + "from transformers import GenerationConfig\n", + "MODEL_ID = \"google/gemma-4-E2B-it\"\n", + "config = GenerationConfig.from_pretrained(MODEL_ID)\n", + "config.max_new_tokens = 512\n", + "\n", + "qa_prompt_template = \"\"\"Answer the following QUESTION based only on the CONTEXT provided. 
If the answer cannot be found in the CONTEXT, write \"I don't know.\"\n", + "\n", + "---\n", + "CONTEXT:\n", + "{context}\n", + "---\n", + "QUESTION:\n", + "{question}\n", + "\"\"\"\n", + "\n", + "# First, check if a valid document was found before proceeding.\n", + "if best_document and \"content\" in best_document:\n", + " # If the document exists and has a \"content\" key, generate the answer.\n", + " context = best_document[\"content\"]\n", + "\n", + " prompt = qa_prompt_template.format(context=context, question=question)\n", + "\n", + " messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"type\": \"text\", \"text\": prompt}],\n", + " },\n", + " ]\n", + "\n", + " print(\"Question🙋‍♂️: \" + question)\n", + " # This part assumes your pipeline and response parsing logic are correct\n", + " answer = pipeline(messages, generation_config=config)[0][\"generated_text\"][1][\"content\"]\n", + " print(\"Using document: \" + best_document[\"title\"])\n", + " print(\"Answer🤖: \" + answer)\n", + "\n", + "else:\n", + " # If best_document is None or doesn't have content, give a direct response.\n", + " print(\"Question🙋‍♂️: \" + question)\n", + " print(\"Answer🤖: I'm sorry, I could not find a relevant document to answer that question.\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h4J4pFA3IK1d" + }, + "source": [ + "## Summary and next steps\n", + "\n", + "You have now learned how to build a practical RAG system with EmbeddingGemma.\n", + "\n", + "Explore what more you can do with EmbeddingGemma:\n", + "\n", + "* [Generate embeddings with Sentence Transformers](https://ai.google.dev/gemma/docs/embeddinggemma/inference-embeddinggemma-with-sentence-transformers)\n", + "* [Fine-tune EmbeddingGemma](https://ai.google.dev/gemma/docs/embeddinggemma/fine-tuning-embeddinggemma-with-sentence-transformers)\n", + "* [Mood Palette Generator](https://huggingface.co/spaces/google/mood-palette), an interactive application using EmbeddingGemma" 
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "name": "RAG_with_EmbeddingGemma.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/google-official/deepmind-gemma/README.md b/tooling/google-official/deepmind-gemma/README.md new file mode 100644 index 0000000..6d50c44 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/README.md @@ -0,0 +1,99 @@ +# Gemma + +[![Unittests](https://github.com/google-deepmind/gemma/actions/workflows/pytest_and_autopublish.yml/badge.svg)](https://github.com/google-deepmind/gemma/actions/workflows/pytest_and_autopublish.yml) +[![PyPI version](https://badge.fury.io/py/gemma.svg)](https://badge.fury.io/py/gemma) +[![Documentation Status](https://readthedocs.org/projects/gemma-llm/badge/?version=latest)](https://gemma-llm.readthedocs.io/en/latest/?badge=latest) + +[Gemma](https://ai.google.dev/gemma) is a family of open-weights Large Language +Models (LLMs) by [Google DeepMind](https://deepmind.google/), based on Gemini +research and technology. + +This repository contains the implementation of the +[`gemma`](https://pypi.org/project/gemma/) PyPI package, a +[JAX](https://github.com/jax-ml/jax) library for using and fine-tuning Gemma. + +For examples and use cases, see our +[documentation](https://gemma-llm.readthedocs.io/). Please +report issues and feedback in +[our GitHub](https://github.com/google-deepmind/gemma/issues). + +### Installation + +1. Install JAX for CPU, GPU or TPU. Follow the instructions on + [the JAX website](https://jax.readthedocs.io/en/latest/installation.html). +1. 
Run + + ```sh + pip install gemma + ``` + +### Examples + +Here is a minimal example of a multi-turn, multi-modal conversation with +Gemma: + +```python +from gemma import gm + +# Model and parameters (Gemma 4) +model = gm.nn.Gemma4_E4B() +params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA4_E4B_IT) + +# Example of multi-turn conversation +sampler = gm.text.ChatSampler( + model=model, + params=params, + multi_turn=True, +) + +prompt = """Which of the 2 images do you prefer ? + +Image 1: <|image|> +Image 2: <|image|> + +Write your answer as a poem.""" +out0 = sampler.chat(prompt, images=[image1, image2]) + +out1 = sampler.chat('What about the other image ?') +``` + +The same `ChatSampler` API works with all Gemma versions (2, 3, 3n, 4). + +Our documentation contains various Colabs and tutorials, including: + +* [Sampling](https://gemma-llm.readthedocs.io/en/latest/colab_sampling.html) +* [Multi-modal](https://gemma-llm.readthedocs.io/en/latest/colab_multimodal.html) +* [Fine-tuning](https://gemma-llm.readthedocs.io/en/latest/colab_finetuning.html) +* [LoRA](https://gemma-llm.readthedocs.io/en/latest/colab_lora_sampling.html) +* ... + +Additionally, our +[examples/](https://github.com/google-deepmind/gemma/tree/main/examples) folder +contains additional scripts to fine-tune and sample with Gemma. + +### Learn more about Gemma + +* To use this library: [Gemma documentation](https://gemma-llm.readthedocs.io/) +* Technical reports for metrics and model capabilities: + * [Gemma 1](https://goo.gle/GemmaReport) + * [Gemma 2](https://goo.gle/gemma2report) + * [Gemma 3](https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf) + * Gemma 4 (Coming soon) +* Other Gemma implementations and documentation in the + [Gemma ecosystem](https://ai.google.dev/gemma/docs) + +### Downloading the models + +To download the model weights, see +[our documentation](https://gemma-llm.readthedocs.io/en/latest/checkpoints.html). 
+ +### System Requirements + +Gemma can run on a CPU, GPU and TPU. For GPU, we recommend 8GB+ RAM on GPU for +the 2B checkpoint and 24GB+ RAM on GPU for the 7B checkpoint. + +### Contributing + +We welcome contributions! Please read our [Contributing Guidelines](./CONTRIBUTING.md) before submitting a pull request. + +*This is not an official Google product.* diff --git a/tooling/google-official/deepmind-gemma/colab_sampling.ipynb b/tooling/google-official/deepmind-gemma/colab_sampling.ipynb new file mode 100644 index 0000000..94bf66e --- /dev/null +++ b/tooling/google-official/deepmind-gemma/colab_sampling.ipynb @@ -0,0 +1,547 @@ +{ + "cells": [ + { + "metadata": { + "id": "My2AZpV_RuTs" + }, + "cell_type": "markdown", + "source": [ + "# Sampling\n", + "\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-deepmind/gemma/blob/main/colabs/sampling.ipynb)\n", + "\n", + "Example of how to load a Gemma model and run inference on it.\n", + "\n", + "The Gemma library has 3 ways to prompt a model:\n", + "\n", + "* `gm.text.ChatSampler`: Easiest to use, simply talk to the model and get an answer. Supports multi-turn conversations out-of-the-box.\n", + "* `gm.text.Sampler`: Lower level, but gives more control. The chat state has to be manually handled for multi-turn.\n", + "* `model.apply`: Directly calls the model, only predicts a single token." 
+ ] + }, + { + "metadata": { + "id": "94CVV9ZxKVDO" + }, + "cell_type": "code", + "source": [ + "!pip install -q gemma" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "executionInfo": { + "elapsed": 2610, + "status": "ok", + "timestamp": 1741517119876, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "PXBc1hRKRuTt" + }, + "cell_type": "code", + "source": [ + "# Common imports\n", + "import os\n", + "import jax\n", + "import jax.numpy as jnp\n", + "\n", + "# Gemma imports\n", + "from gemma import gm" + ], + "outputs": [], + "execution_count": 1 + }, + { + "metadata": { + "id": "i_DEVehe3v5Y" + }, + "cell_type": "markdown", + "source": [ + "By default, Jax do not utilize the full GPU memory, but this can be overwritten. See [GPU memory allocation](https://docs.jax.dev/en/latest/gpu_memory_allocation.html):" + ] + }, + { + "metadata": { + "id": "AaK17GWo3v5Y" + }, + "cell_type": "code", + "source": [ + "os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"]=\"1.00\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "id": "XzEj3PYvX1Sm" + }, + "cell_type": "markdown", + "source": [ + "Load the model and the params. Here we load the instruction-tuned version of the model." + ] + }, + { + "metadata": { + "id": "ox1CAuffKJtj" + }, + "cell_type": "code", + "source": [ + "model = gm.nn.Gemma3_4B()\n", + "\n", + "params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "id": "iRjnbNREdugN" + }, + "cell_type": "markdown", + "source": [ + "## Multi-turns conversations\n", + "\n", + "The easiest way to chat with Gemma is to use the `gm.text.ChatSampler`. 
It hides the boilerplate of the conversation cache, as well as the `<start_of_turn>` / `<end_of_turn>` tokens used to format the conversation.\n", + "\n", + "Here, we set `multi_turn=True` when creating `gm.text.ChatSampler` (by default, the `ChatSampler` starts a new conversation every time).\n", + "\n", + "In multi-turn mode, you can erase the previous conversation state by passing `sampler.chat(..., multi_turn=False)`." + ] + }, + { + "metadata": { + "executionInfo": { + "elapsed": 18237, + "status": "ok", + "timestamp": 1741517159587, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "bGSE6aTYdxqt", + "outputId": "5d428ab7-f9bf-4898-a921-cd57353ff364" + }, + "cell_type": "code", + "source": [ + "sampler = gm.text.ChatSampler(\n", + " model=model,\n", + " params=params,\n", + " multi_turn=True,\n", + " print_stream=True, # Print output as it is generated.\n", + ")\n", + "\n", + "turn0 = sampler.chat('Share one metaphor linking \"shadow\" and \"laughter\".')" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Okay, here's a metaphor linking \"shadow\" and \"laughter,\" aiming for a slightly evocative and layered feel:\n", + "\n", + "**\"Laughter is the fleeting shadow of joy, dancing across a face that’s often hidden in the long shadow of sorrow.\"**\n", + "\n", + "---\n", + "\n", + "**Here's a breakdown of why this works:**\n", + "\n", + "* **\"Shadow\"** represents sadness, pain, or a past experience that lingers. 
It’s not necessarily a dark shadow, but a persistent presence.\n", + "* **\"Laughter\"** is presented as a brief, bright appearance – a momentary flash of happiness.\n", + "* **\"Dancing across a face that’s often hidden\"** emphasizes that the joy isn't constant, and the underlying sadness is still there, obscuring it.\n", + "\n", + "---\n", + "\n", + "Would you like me to:\n", + "\n", + "* Try a different type of metaphor?\n", + "* Expand on this one with a short story snippet?\n" + ] + } + ], + "execution_count": 3 + }, + { + "metadata": { + "executionInfo": { + "elapsed": 7822, + "status": "ok", + "timestamp": 1741517170597, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "MIZqgSSRfmRS", + "outputId": "575490ff-3d07-48f6-cb39-5fac3553e24a" + }, + "cell_type": "code", + "source": [ + "turn1 = sampler.chat('Expand it in a haiku.')" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Okay, here’s a haiku based on the metaphor:\n", + "\n", + "Shadow stretches long,\n", + "Laughter’s brief, bright, dancing grace,\n", + "Joy hides in the dark. \n", + "\n", + "---\n", + "\n", + "Would you like me to try another haiku, or perhaps a different poetic form?\n" + ] + } + ], + "execution_count": 4 + }, + { + "metadata": { + "id": "uOQHd6eFd67c" + }, + "cell_type": "markdown", + "source": [ + "Note: By default (`multi_turn=False`), the conversation state is reset everytime, but you can still continue the previous conversation by passing `sampler.chat(..., multi_turn=True)`\n", + "\n", + "By default, greedy decoding is used. 
You can pass a custom `sampling=` method as kwargs:\n", + "\n", + "* `gm.text.Greedy()`: (default) Greedy decoding\n", + "* `gm.text.RandomSampling()`: Simple random sampling with temperature, for more variety" + ] + }, + { + "metadata": { + "id": "AH0eWFWJaiNk" + }, + "cell_type": "markdown", + "source": [ + "## Sample a prompt\n", + "\n", + "For more control, we also provide a `gm.text.Sampler`, which still performs efficient sampling (with kv-caching, early stopping,...).\n", + "\n", + "Prompting the sampler requires correctly formatting the prompt with the `<start_of_turn>` / `<end_of_turn>` tokens (see the custom token section doc on [tokenizer](https://gemma-llm.readthedocs.io/en/latest/tokenizer.html))." + ] + }, + { + "metadata": { + "executionInfo": { + "elapsed": 14339, + "status": "ok", + "timestamp": 1741277003042, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "6R5J42EiZtkC", + "outputId": "e309909f-b627-4057-94c0-524e66dd3ea2" + }, + "cell_type": "code", + "source": [ + "sampler = gm.text.Sampler(\n", + " model=model,\n", + " params=params,\n", + ")\n", + "\n", + "prompt = \"\"\"<start_of_turn>user\n", + "Give me a list of inspirational quotes.<end_of_turn>\n", + "<start_of_turn>model\n", + "\"\"\"\n", + "\n", + "out = sampler.sample(prompt, max_new_tokens=1000)\n", + "print(out)" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Okay, here's a list of inspirational quotes, categorized a little to give you a variety:\n", + "\n", + "**On Perseverance & Resilience:**\n", + "\n", + "* “The only way to do great work is to love what you do.” – Steve Jobs\n", + "* “Fall seven times, stand up eight.” – Japanese Proverb\n", + "* “The difference between ordinary and extraordinary is that little extra.” – Jimmy Johnson\n", + "* “Success is not final, failure is not fatal: It is the courage to continue that counts.” – Winston Churchill\n", + "* “Don’t watch the clock; do what it does. 
Keep going.” – Sam Levenson\n", + "* “When the going gets tough, the tough get going.” – Theodore Roosevelt\n", + "\n", + "\n", + "**On Self-Love & Confidence:**\n", + "\n", + "* “You are enough.” – Brené Brown\n", + "* “Believe you can and you’re halfway there.” – Theodore Roosevelt\n", + "* “You must be the change you wish to see in the world.” – Mahatma Gandhi\n", + "* “The best is yet to come.” – Frank Sinatra\n", + "* “Be the energy you want to attract.” – Tony Gaskins\n", + "* “Don’t be defined by your past. Define your future.” – Unknown\n", + "\n", + "\n", + "**On Dreams & Goals:**\n", + "\n", + "* “If you can dream it, you can do it.” – Walt Disney\n", + "* “The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt\n", + "* “Shoot for the moon. Even if you miss, you’ll land among the stars.” – Les Brown\n", + "* “Start where you are. Use what you have. Do what you can.” – Arthur Ashe\n", + "* “Life begins at the end of your comfort zone.” – Unknown\n", + "\n", + "\n", + "**On Happiness & Perspective:**\n", + "\n", + "* “Happiness is not something readymade. It comes from your own actions.” – Dalai Lama\n", + "* “It’s not the triumph that matters, it’s the effort.” – Winston Churchill\n", + "* “Don’t wait for the perfect moment, take the moment and make it perfect.” – Oscar Wilde\n", + "* “Be present. Be grateful. 
Be you.” – Unknown\n", + "* “The only way out is through.” – Robert Frost\n", + "\n", + "\n", + "\n", + "**Short & Powerful:**\n", + "\n", + "* “Be the change.” – Mahatma Gandhi\n", + "* “Just breathe.”\n", + "* “Keep going.”\n", + "* “You got this.”\n", + "* “Dream big.”\n", + "\n", + "---\n", + "\n", + "**Resources for More Quotes:**\n", + "\n", + "* **BrainyQuote:** [https://www.brainyquote.com/](https://www.brainyquote.com/)\n", + "* **Goodreads:** [https://www.goodreads.com/quotes](https://www.goodreads.com/quotes)\n", + "* **Quote Garden:** [https://quotegarden.com/](https://quotegarden.com/)\n", + "\n", + "To help me give you even more relevant quotes, could you tell me:\n", + "\n", + "* **What kind of inspiration are you looking for?** (e.g., motivation for work, overcoming challenges, self-love, etc.)\n", + "* **Is there a particular theme or topic you'd like quotes about?**\n" + ] + } + ], + "execution_count": null + }, + { + "metadata": { + "id": "TPf01271RuTs" + }, + "cell_type": "markdown", + "source": [ + "## Use the model directly\n", + "\n", + "Here's an example of predicting a single token, directly calling the model.\n", + "\n", + "The model input expects encoded tokens. For this, we first need to encode the prompt with our tokenizer. See our [tokenizer](https://gemma-llm.readthedocs.io/en/latest/tokenizer.html) documentation for more information on using the tokenizer." + ] + }, + { + "metadata": { + "id": "mvCAQCDXZ0D3" + }, + "cell_type": "code", + "source": [ + "tokenizer = gm.text.Gemma3Tokenizer()" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "id": "kH534DHohG67" + }, + "cell_type": "markdown", + "source": [ + "Note: When encoding the prompt, don't forget to add the beginning-of-sequence (BOS) token with `add_bos=True`. All prompts fed to the model should start with this token." 
+ ] + }, + { + "metadata": { + "id": "T1GC0OPRhHGc" + }, + "cell_type": "code", + "source": [ + "prompt = tokenizer.encode('One word to describe Paris: \\n\\n', add_bos=True)\n", + "prompt = jnp.asarray(prompt)" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "id": "hHqyFnskZ5SC" + }, + "cell_type": "markdown", + "source": [ + "We then can call the model, and get the predicted logits." + ] + }, + { + "metadata": { + "colab": { + "height": 35 + }, + "executionInfo": { + "elapsed": 7194, + "status": "ok", + "timestamp": 1741277771722, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "G3Kbo9hgRuTt", + "outputId": "3b0e468e-c55c-466a-ece9-ade04926612d" + }, + "cell_type": "code", + "source": [ + "# Run the model\n", + "out = model.apply(\n", + " {'params': params},\n", + " tokens=prompt,\n", + " return_last_only=True, # Only predict the last token\n", + ")\n", + "\n", + "\n", + "# Sample a token from the predicted logits\n", + "next_token = jax.random.categorical(\n", + " jax.random.key(1),\n", + " out.logits\n", + ")\n", + "tokenizer.decode(next_token)" + ], + "outputs": [ + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Romantic'" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": null + }, + { + "metadata": { + "id": "TNb88k3EF5gu" + }, + "cell_type": "markdown", + "source": [ + "You can also display the next token probability." 
+ ] + }, + { + "metadata": { + "colab": { + "height": 542 + }, + "executionInfo": { + "elapsed": 104, + "status": "ok", + "timestamp": 1741277773577, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "mIkOdE9dF45s", + "outputId": "ab610392-44f5-44cc-9370-bd881bc9136b" + }, + "cell_type": "code", + "source": [ + "tokenizer.plot_logits(out.logits)" + ], + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "\n", + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "execution_count": null + }, + { + "metadata": { + "id": "3yeUUP8Rh5mU" + }, + "cell_type": "markdown", + "source": [ + "## Next steps\n", + "\n", + "* See our [multimodal](https://gemma-llm.readthedocs.io/en/latest/multimodal.html) example to query the model with images.\n", + "* See our [finetuning](https://gemma-llm.readthedocs.io/en/latest/finetuning.html) example to train Gemma on your custom task.\n", + "* See our [tool use](https://gemma-llm.readthedocs.io/en/latest/tool_use.html) tutorial to extend Gemma with external tools.\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "last_runtime": {}, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/google-official/deepmind-gemma/colab_tool_use.ipynb b/tooling/google-official/deepmind-gemma/colab_tool_use.ipynb new file mode 100644 index 0000000..933dfe0 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/colab_tool_use.ipynb @@ -0,0 +1,568 @@ +{ + "cells": [ + { + "metadata": { + "id": "-KkvqLgjiIdD" + }, + "cell_type": "markdown", + "source": [ + "# Tool Use\n", + "\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-deepmind/gemma/blob/main/colabs/tool_use.ipynb)\n", + "\n", + "Demo to show how to use tool-use with Gemma library.\n", + "\n", + "Note: The Gemma 1, 2 and 3 models were not specifically trained for tool use. This is more a proof-of-concept than an officially supported feature." 
+ ] + }, + { + "metadata": { + "id": "gcNRfVEnj4aq" + }, + "cell_type": "code", + "source": [ + "!pip install -q gemma" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "executionInfo": { + "elapsed": 2221, + "status": "ok", + "timestamp": 1749202985345, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "k1ZAgLg1j9NT" + }, + "cell_type": "code", + "source": [ + "# Common imports\n", + "import os\n", + "import datetime\n", + "\n", + "# Gemma imports\n", + "from gemma import gm" + ], + "outputs": [], + "execution_count": 3 + }, + { + "metadata": { + "id": "139lZszJj_CC" + }, + "cell_type": "markdown", + "source": [ + "By default, Jax does not utilize the full GPU memory, but this can be overwritten. See [GPU memory allocation](https://docs.jax.dev/en/latest/gpu_memory_allocation.html):" + ] + }, + { + "metadata": { + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1749138071985, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "VtlWWLIYj_LJ" + }, + "cell_type": "code", + "source": [ + "os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"]=\"1.00\"" + ], + "outputs": [], + "execution_count": 2 + }, + { + "metadata": { + "id": "31JPZb5RkD_p" + }, + "cell_type": "markdown", + "source": [ + "Load the model and the params." 
+ ] + }, + { + "metadata": { + "executionInfo": { + "elapsed": 39057, + "status": "ok", + "timestamp": 1749203024713, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "RsAo6k4_kEJS", + "outputId": "e10afb5c-6c81-42e8-e590-a39ea4ef3bf7" + }, + "cell_type": "code", + "source": [ + "model = gm.nn.Gemma3_4B()\n", + "\n", + "params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)" + ], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:2025-06-06 02:43:16,896:jax._src.xla_bridge:749: Unable to initialize backend 'pathways': Could not initialize backend 'pathways'\n", + "INFO:2025-06-06 02:43:16,897:jax._src.xla_bridge:749: Unable to initialize backend 'proxy': INVALID_ARGUMENT: IFRT proxy server address must be '://' (e.g., 'grpc://localhost'), but got \n", + "INFO:2025-06-06 02:43:16,900:jax._src.xla_bridge:749: Unable to initialize backend 'mlcr': Could not initialize backend 'mlcr'\n", + "INFO:2025-06-06 02:43:16,901:jax._src.xla_bridge:749: Unable to initialize backend 'sliceme': Could not initialize backend 'sliceme'\n" + ] + } + ], + "execution_count": 4 + }, + { + "metadata": { + "id": "p108c5yIlYH7" + }, + "cell_type": "markdown", + "source": [ + "## Using existing tools\n", + "\n", + "If you're familiar with the [sampling](https://gemma-llm.readthedocs.io/en/latest/sampling.html) tutorial, tool use differs in two ways:\n", + "\n", + "1. Using the `gm.text.ToolSampler` rather than the `gm.text.ChatSampler`.\n", + "2. 
Passing the `tools=` you want to use to the sampler.\n", + "\n", + "For example:" + ] + }, + { + "metadata": { + "colab": { + "height": 594 + }, + "executionInfo": { + "elapsed": 50615, + "status": "ok", + "timestamp": 1749138791069, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "iRCV5h8BlVX6", + "outputId": "b3b5d83d-8a8b-4982-fc8f-d409fb8b38a9" + }, + "cell_type": "code", + "source": [ + "sampler = gm.text.ToolSampler(\n", + " model=model,\n", + " params=params,\n", + " tools=[\n", + " gm.tools.Calculator(),\n", + " gm.tools.FileExplorer(),\n", + " ],\n", + " print_stream=True,\n", + ")\n", + "\n", + "output = sampler.chat('I have a serie `Sn+1 = cos(Sn) * 2`. Using the calculator, compute the steps 0-4 for S0 = 3')" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: I need to compute S0, S1, S2, S3, and S4 using the given recursive formula Sn+1 = cos(Sn) * 2 and S0 = 3.\n", + "Let's start with S0 = 3.\n", + "S1 = cos(S0) * 2 = cos(3) * 2\n", + "S2 = cos(S1) * 2 = cos(cos(3) * 2) * 2\n", + "S3 = cos(S2) * 2 = cos(cos(cos(3) * 2) * 2) * 2\n", + "S4 = cos(S3) * 2 = cos(cos(cos(cos(3) * 2) * 2)) * 2\n", + "\n", + "I will use the calculator to compute these values.\n", + "{\"tool_name\": \"calculator\", \"expression\": \"cos(3) * 2\"}\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: -1.9799849932008908]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: Now I need to compute S1 = cos(S0) * 2 = cos(3) * 2 = -1.9799849932008908 * 2\n", + "{\"tool_name\": \"calculator\", \"expression\": \"-1.9799849932008908 * 2\"}\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: -3.9599699864017817]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: Now I need to compute S2 = cos(S1) * 2 = cos(-3.9599699864017817) * 2\n", + "{\"tool_name\": \"calculator\", \"expression\": \"cos(-3.9599699864017817) * 2\"}\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: -1.3668134299076982]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: Now I need to compute S3 = cos(S2) * 2 = cos(-1.3668134299076982) * 2\n", + "{\"tool_name\": \"calculator\", \"expression\": \"cos(-1.3668134299076982) * 2\"}\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: 0.4051424976130353]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: Now I need to compute S4 = cos(S3) * 2 = cos(0.4051424976130353) * 2\n", + "{\"tool_name\": \"calculator\", \"expression\": \"cos(0.4051424976130353) * 2\"}\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: 1.8380924822033438]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The values are: S0 = 3, S1 = -3.9599699864017817, S2 = -1.3668134299076982, S3 = 0.4051424976130353, S4 = 1.8380924822033438" + ] + } + ], + "execution_count": 10 + }, + { + "metadata": { + "id": "FAI54F-Blkan" + }, + "cell_type": "markdown", + "source": [ + "Note: Only the final model answer is returned. You can access the conversation history, including all intermediates tool calls and output through `sampler.turns` property." + ] + }, + { + "metadata": { + "id": "D0_IIS1Nlfuw" + }, + "cell_type": "markdown", + "source": [ + "## Creating your own tool\n", + "\n", + "To create your own tool, you can inherit from the `gm.tools.Tool` class. You should provide:\n", + "\n", + "* A description & example, so the model knows how to use your tool\n", + "* Implement the `call` method. The `call` function can take arbitrary `**kwargs`, but the name of the args should match the ones defined in `tool_kwargs` and `tool_kwargs_doc`" + ] + }, + { + "metadata": { + "executionInfo": { + "elapsed": 55, + "status": "ok", + "timestamp": 1749203934196, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "XqmQcfdI0oEl" + }, + "cell_type": "code", + "source": [ + "class DateTime(gm.tools.Tool):\n", + " \"\"\"Tool to access the current date.\"\"\"\n", + "\n", + " DESCRIPTION = 'Access the current date, time,...'\n", + " EXAMPLE = gm.tools.Example(\n", + " query='Which day of the week are we today ?',\n", + " thought='The `datetime.strptime` uses %a for day of the week',\n", + " tool_kwargs={'format': '%a'},\n", + " tool_kwargs_doc={'format': ''},\n", + " result='Sat',\n", + " answer='Today is Saturday.',\n", + " )\n", + "\n", + " def call(self, format: str) -> str:\n", + " dt = datetime.datetime.now()\n", + " return dt.strftime(format)\n" + ], + "outputs": [], + "execution_count": 7 + }, + { + 
"metadata": { + "id": "sSxYhXPuuXYp" + }, + "cell_type": "markdown", + "source": [ + "The tool can then be used in the sampler:" + ] + }, + { + "metadata": { + "colab": { + "height": 118 + }, + "executionInfo": { + "elapsed": 2156, + "status": "ok", + "timestamp": 1749204833094, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -120 + }, + "id": "9S8xB2B-0cbW", + "outputId": "fccc0e89-e922-4184-8b77-800041cdd77e" + }, + "cell_type": "code", + "source": [ + "sampler = gm.text.ToolSampler(\n", + " model=model,\n", + " params=params,\n", + " tools=[\n", + " DateTime(),\n", + " ],\n", + " print_stream=True,\n", + ")\n", + "\n", + "output = sampler.chat('Which date are we today ?')" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Thought: I need to get the current date.\n", + "{\"tool_name\": \"datetime\", \"format\": \"%Y-%m-%d\"}\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Tool result: 2025-06-06]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Today is June 6th, 2025." + ] + } + ], + "execution_count": 9 + }, + { + "metadata": { + "id": "esIpCjhxzHmf" + }, + "cell_type": "markdown", + "source": [ + "## Next steps\n", + "\n", + "* See our [multimodal](https://gemma-llm.readthedocs.io/en/latest/multimodal.html) example to query the model with images.\n", + "* See our [finetuning](https://gemma-llm.readthedocs.io/en/latest/finetuning.html) example to train Gemma on your custom task.\n" + ] + } + ], + "metadata": { + "colab": { + "last_runtime": {}, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tooling/google-official/deepmind-gemma/example_classification.py b/tooling/google-official/deepmind-gemma/example_classification.py new file mode 100644 index 0000000..4ffdbc7 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/example_classification.py @@ -0,0 +1,130 @@ +# Copyright 2026 DeepMind Technologies Limited. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Example config for finetuning Gemma for a classification task. + +* Input: A text to classify. +* Output: A classification label. The pre-trained Gemma model is trained to + predict one world among 256.000. 
Here, we're finetuning to predict only 2 + tokens among the 256.000 available. + +Train locally with: + +```sh +python -m kauldron.main \ + --cfg=examples/classification.py \ + --cfg.workdir=/tmp/kauldron_oss/workdir +``` + +""" + +from kauldron import konfig + +# pylint: disable=g-import-not-at-top +with konfig.imports(): + from gemma import gm + from kauldron import kd + import optax +# pylint: enable=g-import-not-at-top + + +def get_config(): + """Get the default hyperparameter configuration.""" + return kd.train.Trainer( + seed=42, + # Dataset + train_ds=_make_dataset(training=True), + # Model definition + model=gm.nn.Gemma3_4B( + tokens="batch.sentence", + return_last_only=True, + ), + # Load the weights from the pretrained checkpoint + init_transform=gm.ckpts.LoadCheckpoint( + path=gm.ckpts.CheckpointPath.GEMMA3_4B_IT, + ), + # Training + num_train_steps=10_000, + train_losses={ + "xentropy": kd.losses.SoftmaxCrossEntropyWithIntLabels( + logits="preds.logits", + labels="batch.label", + ), + }, + optimizer=optax.adafactor(learning_rate=1e-4), + checkpointer=kd.ckpts.Checkpointer( + save_interval_steps=500, + ), + # Evaluation + evals={ + "test": kd.evals.Evaluator( + run=kd.evals.EveryNSteps(1000), + ds=_make_dataset(training=False), + ), + }, + ) + + +def _make_dataset(training: bool) -> kd.data.Pipeline: + # Dict key names from the dataset + _INPUT_FIELD = "sentence" # pylint: disable=invalid-name + _LABEL_FIELD = "label" # pylint: disable=invalid-name + + tokenizer = gm.text.Gemma3Tokenizer() + + return kd.data.py.Tfds( + name="glue/cola", + split="train" if training else "validation", + shuffle=True if training else False, + num_epochs=None if training else 1, + batch_size=8, + transforms=[ + # Process the input text + # TFDS datasets return `bytes`, so convert them to `str` + gm.data.DecodeBytes(key=_INPUT_FIELD), + gm.data.FormatText( + key=_INPUT_FIELD, + template="""<start_of_turn>user + Please classify whether the following sentence is grammatically correct, please 
answer only with Yes or No. + Sentence: {text}<end_of_turn> + <start_of_turn>model""", + ), + gm.data.Tokenize( + key=_INPUT_FIELD, + tokenizer=tokenizer, + add_bos=True, + ), + gm.data.Pad( + key=_INPUT_FIELD, + max_length=128, + ), + # Process the label + gm.data.MapInts( + key=_LABEL_FIELD, + # Rather than predicting the tokens 0 and 1, we use the + # tokens 1294 and 3553, which respectively correspond to "No" and + # "Yes". We do this because those tokens already contain semantic + # information, so even zero-shot prediction without any + # finetuning performs better than random. + old_to_new={ + 0: 1294, # Token -> "No" + 1: 3553, # Token -> "Yes" + }, + ), + kd.data.Rearrange( + key=_LABEL_FIELD, + pattern="... -> ... 1", # For shape compatibility with the loss. + ), + ], + ) diff --git a/tooling/google-official/deepmind-gemma/example_dpo.py b/tooling/google-official/deepmind-gemma/example_dpo.py new file mode 100644 index 0000000..2d2565a --- /dev/null +++ b/tooling/google-official/deepmind-gemma/example_dpo.py @@ -0,0 +1,122 @@ +# Copyright 2026 DeepMind Technologies Limited. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""DPO Example. + +DPO works by running two answers (one preferred and one rejected) through both +the reference model and the model to finetune. Then the DPO loss is used to +increase the likelihood of generating the preferred answer. 
+ +Implementation wise, this is done by: + +* Wrapping the model inside a `gm.nn.AnchoredPolicy` (which runs both the + model and the reference frozen model) +* Using the `gm.ckpts.AnchoredPolicyLoader` to restore the weights, so the + weights are correctly mapped to inside `gm.nn.AnchoredPolicy`. + + +Train locally with: + +```sh +python -m kauldron.main \ + --cfg=examples/dpo.py \ + --cfg.workdir=/tmp/kauldron_oss/workdir +``` + +""" + +from kauldron import konfig + +# pylint: disable=g-import-not-at-top +with konfig.imports(): + from gemma import gm + from kauldron import kd + import optax +# pylint: enable=g-import-not-at-top + + +def get_config(): + """Get the default hyperparameter configuration.""" + return kd.train.Trainer( + seed=42, + # Dataset + train_ds=_make_dataset(training=True), + # Model definition + model=gm.nn.AnchoredPolicy( + policy=gm.nn.Gemma3_4B(tokens="batch.tokens", text_only=True), + ), + # Load the weights from the pretrained checkpoint + init_transform=gm.ckpts.AnchoredPolicyLoader( + policy=gm.ckpts.LoadCheckpoint( + path=gm.ckpts.CheckpointPath.GEMMA3_4B_IT, + ), + ), + # Training + num_train_steps=10_000, + train_losses={ + "dpo": gm.losses.DpoLoss( + tokens="batch.targets", + sequence_mask="batch.mask", + policy_logits="preds.policy.logits", + anchor_logits="preds.anchor.logits", + ), + }, + optimizer=optax.adafactor(learning_rate=1e-4), + checkpointer=kd.ckpts.Checkpointer( + save_interval_steps=500, + ), + # Evaluation + evals={ + # "test": kd.evals.Evaluator( + # run=kd.evals.EveryNSteps(1000), + # ds=_make_dataset(training=False), + # ), + }, + ) + + +def _make_dataset(training: bool) -> kd.data.Pipeline: + # TODO(epot): !!!! 
+ max_length = 512 + batch_size = 16 + + tokenizer = gm.text.Gemma3Tokenizer() + + return kd.data.py.HuggingFace( + path="argilla/distilabel-math-preference-dpo", + split="train", + shuffle=True if training else False, + num_epochs=None if training else 1, + batch_size=batch_size, + transforms=[ + # Only keep the fields we need. + kd.data.Elements( + keep=["instruction", "chosen_response", "rejected_response"] + ), + # Create the model inputs and loss mask. + gm.data.ContrastiveTask( + in_prompt="instruction", + in_chosen="chosen_response", + in_rejected="rejected_response", + out_tokens="tokens", + out_targets="targets", + out_mask="mask", + tokenizer=tokenizer, + # Padding parameters + max_length=max_length, + # TODO(epot): Run stats (how many examples are we dropping?) + truncate=True, + ), + ], + ) diff --git a/tooling/google-official/deepmind-gemma/example_lora.py b/tooling/google-official/deepmind-gemma/example_lora.py new file mode 100644 index 0000000..736b0e1 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/example_lora.py @@ -0,0 +1,154 @@ +# Copyright 2026 DeepMind Technologies Limited. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Example of Gemma finetuning using LoRA. + +This example is based on the `seq2seq.py` example. See the +docstring of that file for more details. + +The changes to use LoRA are: + +* `model`: Use `gm.nn.LoRA()` wrapper to add `LoRA` adapters to the + model. 
+* `init_transform`: Use `gm.ckpts.SkipLoRA()` wrapper to only restore the + non-LoRA weights. +* `optimizer`: Use `kd.optim.partial_updates` wrapper to only train the LoRA + weights. + +Train locally with: + +```sh +python -m kauldron.main \ + --cfg=examples/lora.py \ + --cfg.workdir=/tmp/kauldron_oss/workdir +``` + +""" + +from kauldron import konfig + +# pylint: disable=g-import-not-at-top +with konfig.imports(): + from gemma import gm + from kauldron import kd + import optax +# pylint: enable=g-import-not-at-top + + +def get_config(): + batch_size = 16 + max_length = 512 + + return kd.train.Trainer( + seed=42, + # Dataset + train_ds=_make_dataset( + training=True, + batch_size=batch_size, + max_length=max_length, + ), + # Model definition + model=gm.nn.LoRA( + rank=4, + model=gm.nn.Gemma3_4B( + tokens="batch.input", + # TODO(epot): At the moment, LoRA fine-tuning with multimodal + # is not supported. Will be fixed soon. + text_only=True, + ), + ), + # Load the weights from the pretrained checkpoint + # Use `SkipLoRA` as the original checkpoint does not contain the LoRA + # weights. + init_transform=gm.ckpts.SkipLoRA( + wrapped=gm.ckpts.LoadCheckpoint( + path=gm.ckpts.CheckpointPath.GEMMA3_4B_IT, + ) + ), + # Training + num_train_steps=10_000, + train_losses={ + "xentropy": kd.losses.SoftmaxCrossEntropyWithIntLabels( + logits="preds.logits", + labels="batch.target", + mask="batch.loss_mask", + ), + }, + # TODO(epot): Add gradient accumulation. + optimizer=kd.optim.partial_updates( + optax.adafactor(learning_rate=0.005), + # We only optimize the LoRA weights. The rest of the model is frozen. + mask=kd.optim.select("lora"), + ), + checkpointer=kd.ckpts.Checkpointer( + save_interval_steps=500, + ), + # Evaluation + evals={ + "test": kd.evals.Evaluator( + run=kd.evals.EveryNSteps(1000), + ds=_make_dataset( + training=False, + batch_size=batch_size, + max_length=max_length, + ), + ), + # The sampler evaluator runs inference on a few prompts from the + # test set.
+ "sampling": gm.evals.SamplerEvaluator( + run=kd.evals.EveryNSteps(1000), + max_new_tokens=150, # Sampling parameters + num_batches=1, # Only predict a single example (batch_size=None) + ds=_make_dataset(training=False, sampling=True), + ), + }, + ) + + +def _make_dataset( + *, + training: bool, + sampling: bool = False, + batch_size: int | None = None, + max_length: int | None = None, +): + tokenizer = gm.text.Gemma3Tokenizer() + + return kd.data.py.Tfds( + name="mtnt/en-fr", + split="train" if training else "test", + shuffle=True if training else False, + num_epochs=None if training else 1, + batch_size=None if sampling else batch_size, + num_workers=4, + transforms=[ + # Create the model inputs/targets/loss_mask. + gm.data.Seq2SeqTask( + # Select which field from the dataset to use. + # https://www.tensorflow.org/datasets/catalog/mtnt + in_prompt="src", + in_response="dst", + # Output batch is {"input": ..., "target": ..., "loss_mask": ...} + out_input="input", + out_target="target", + out_target_mask="loss_mask", + tokenizer=tokenizer, + # Padding parameters + max_length=None if sampling else max_length, + # In this dataset, ~1% of examples are longer than 512 tokens. + truncate=True, + sampling=sampling, + ), + ], + ) diff --git a/tooling/google-official/deepmind-gemma/example_multimodal.py b/tooling/google-official/deepmind-gemma/example_multimodal.py new file mode 100644 index 0000000..d4341f2 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/example_multimodal.py @@ -0,0 +1,164 @@ +# Copyright 2026 DeepMind Technologies Limited. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Example of Gemma finetuning for an image captioning task. + +Example: + +Prompt: + +``` +user + +model +``` + +Target: + +``` +A diagram showing a circuit with a battery, lamp, and switch. +``` + +Here, the prompt only contains the `` to indicate an image +is inserted. + +Train locally with: + +```sh +python -m kauldron.main \ + --cfg=examples/multimodal.py \ + --cfg.workdir=/tmp/kauldron_oss/workdir +``` + +""" + +from kauldron import konfig + +# pylint: disable=g-import-not-at-top +with konfig.imports(): + import jax.numpy as jnp + from gemma import gm + from kauldron import kd + import optax +# pylint: enable=g-import-not-at-top + + +def get_config(): + batch_size = 32 + max_length = 200 + + return kd.train.Trainer( + seed=42, + # Dataset + train_ds=_make_dataset( + training=True, + batch_size=batch_size, + max_length=max_length, + ), + # Model definition + model=gm.nn.Gemma3_4B( + tokens="batch.input", + images="batch.image", + ), + # Load the weights from the pretrained checkpoint + init_transform=gm.ckpts.LoadCheckpoint( + path=gm.ckpts.CheckpointPath.GEMMA3_4B_IT, + ), + # Training + num_train_steps=10_000, + train_losses={ + "xentropy": kd.losses.SoftmaxCrossEntropyWithIntLabels( + logits="preds.logits", + labels="batch.target", + mask="batch.loss_mask", + ), + }, + train_summaries={ + "image": kd.summaries.ShowImages(images="batch.image", num_images=5), + }, + optimizer=optax.adafactor(learning_rate=1e-3), + checkpointer=kd.ckpts.Checkpointer( + save_interval_steps=500, + ), + # Evaluation + evals={ + "test": kd.evals.Evaluator( + 
run=kd.evals.EveryNSteps(1000), + ds=_make_dataset( + training=False, + batch_size=4, + max_length=max_length, + ), + ), + # The sampler evaluator runs inference on a few prompts from the + # test set. + "sampling": gm.evals.SamplerEvaluator( + run=kd.evals.EveryNSteps(1000), + max_new_tokens=50, # Sampling parameters + num_batches=3, + ds=_make_dataset(training=False, sampling=True), + summaries={ + "image": kd.summaries.ShowImages( + images="batch.image", num_images=5 + ), + }, + ), + }, + ) + + +def _make_dataset( + *, + training: bool, + sampling: bool = False, + batch_size: int | None = None, + max_length: int | None = None, +): + tokenizer = gm.text.Gemma3Tokenizer() + + return kd.data.py.Tfds( + name="ai2dcaption", + split="llava_15" if training else "test", + shuffle=True if training else False, + num_epochs=None if training else 1, + batch_size=None if sampling else batch_size, + num_workers=4, + transforms=[ + # Only keep the fields we need. See fields at: + # https://www.tensorflow.org/datasets/catalog/ai2dcaption + kd.data.Elements(keep=["image", "caption"]), + # Create a new constant field + kd.data.AddConstants({"prompt": ""}), + # Create the model inputs/targets/loss_mask. + gm.data.Seq2SeqTask( + # Select which field from the dataset to use. + in_prompt="prompt", + in_response="caption", + # Output batch is {"input": ..., "target": ..., "loss_mask": ...} + out_input="input", + out_target="target", + out_target_mask="loss_mask", + tokenizer=tokenizer, + # Padding parameters + max_length=None if sampling else max_length, + # In this dataset, ~1% of examples are longer than 512 tokens. + truncate=True, + sampling=sampling, + ), + kd.data.py.Resize(key="image", size=(800, 800)), + # TODO(epot): Make the `num_images` dimension optional + kd.data.Rearrange(key="image", pattern="... h w c -> ... 
1 h w c"), + kd.data.Cast(key="image", dtype=jnp.uint8), + ], + ) diff --git a/tooling/google-official/deepmind-gemma/example_sharding.py b/tooling/google-official/deepmind-gemma/example_sharding.py new file mode 100644 index 0000000..33f3d77 --- /dev/null +++ b/tooling/google-official/deepmind-gemma/example_sharding.py @@ -0,0 +1,133 @@ +# Copyright 2026 DeepMind Technologies Limited. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Example of Gemma finetuning for a prompt -> response task. + +This is a fork of the seq2seq example, but with sharding. 
+The only difference is the `sharding=kd.sharding.ShardingStrategy()` + +Train locally with: + +```sh +python -m kauldron.main \ + --cfg=examples/sharding.py \ + --cfg.workdir=/tmp/kauldron_oss/workdir +``` + +""" + +from kauldron import konfig + +# pylint: disable=g-import-not-at-top +with konfig.imports(): + from gemma import gm + from kauldron import kd + import optax +# pylint: enable=g-import-not-at-top + + +def get_config(): + batch_size = 16 + max_length = 512 + + return kd.train.Trainer( + seed=42, + # Dataset + train_ds=_make_dataset( + training=True, + batch_size=batch_size, + max_length=max_length, + ), + # Model definition + model=gm.nn.Gemma3_4B( + tokens="batch.input", + ), + sharding=kd.sharding.ShardingStrategy( + params=kd.sharding.FSDPSharding(), + ), + # Load the weights from the pretrained checkpoint + init_transform=gm.ckpts.LoadCheckpoint( + path=gm.ckpts.CheckpointPath.GEMMA3_4B_IT, + ), + # Training + num_train_steps=10_000, + train_losses={ + "xentropy": kd.losses.SoftmaxCrossEntropyWithIntLabels( + logits="preds.logits", + labels="batch.target", + mask="batch.loss_mask", + ), + }, + optimizer=optax.adafactor(learning_rate=1e-3), + checkpointer=kd.ckpts.Checkpointer( + save_interval_steps=500, + ), + # Evaluation + evals={ + "test": kd.evals.Evaluator( + run=kd.evals.EveryNSteps(1000), + ds=_make_dataset( + training=False, + batch_size=batch_size, + max_length=max_length, + ), + ), + # The sampler evaluator run inference on a few prompts from the + # test set. 
+ "sampling": gm.evals.SamplerEvaluator( + run=kd.evals.EveryNSteps(1000), + max_new_tokens=50, # Sampling parameters + num_batches=1, # Only predict a single example (batch_size=None) + ds=_make_dataset(training=False, sampling=True), + ), + }, + ) + + +def _make_dataset( + *, + training: bool, + sampling: bool = False, + batch_size: int | None = None, + max_length: int | None = None, +): + tokenizer = gm.text.Gemma3Tokenizer() + + return kd.data.py.Tfds( + name="mtnt/en-fr", + split="train" if training else "test", + shuffle=True if training else False, + num_epochs=None if training else 1, + batch_size=None if sampling else batch_size, + num_workers=4, + transforms=[ + # Create the model inputs/targets/loss_mask. + gm.data.Seq2SeqTask( + # Select which field from the dataset to use. + # https://www.tensorflow.org/datasets/catalog/mtnt + in_prompt="src", + in_response="dst", + # Output batch is {"input": ..., "target": ..., "loss_mask": ...} + out_input="input", + out_target="target", + out_target_mask="loss_mask", + tokenizer=tokenizer, + # Padding parameters + max_length=None if sampling else max_length, + # In this dataset, ~1% of examples are longer than 512 tokens. + truncate=True, + sampling=sampling, + ), + ], + ) diff --git a/tooling/google-official/docs/ai-google-dev_core.html b/tooling/google-official/docs/ai-google-dev_core.html new file mode 100644 index 0000000..55991c1 --- /dev/null +++ b/tooling/google-official/docs/ai-google-dev_core.html @@ -0,0 +1,4739 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Gemma 4 model overview  |  Google AI for Developers + + + + + + + + + + + + + + + + + + + + + + +

+ Gemma 4 model overview + + +


Gemma is a family of generative artificial intelligence models and you can +use them in a wide variety of generation tasks, including question answering, +summarization, and reasoning. Gemma models are provided with open weights and +permit responsible +commercial use, +allowing you to tune and deploy them in your own projects and applications.

+ +

The Gemma 4 model family spans three distinct architectures tailored to specific +hardware requirements:

+ +
    +
  • Small Sizes: 2B and 4B effective parameter models built for +ultra-mobile, edge, and browser deployment (e.g., Pixel, Chrome).
  • +
  • Dense: A powerful 31B parameter dense model that bridges the gap between +server-grade performance and local execution.
  • +
  • Mixture-of-Experts: A highly efficient 26B MoE model designed for +high-throughput, advanced reasoning.
  • +
+ +

You can download Gemma 4 models from +Kaggle and +Hugging Face. +For more technical details on Gemma 4, see the +Model Card. +Earlier versions of Gemma core models are also available for download. For more +information, see Previous Gemma models.

+ +


+ +

Capabilities

+ +
    +
  • Reasoning: All models in the family are designed as highly capable +reasoners, with configurable thinking +modes.
  • +
  • Extended Multimodalities: Processes Text, +Image with variable aspect ratio +and resolution support (all models), +Video, and +Audio (featured natively on the E2B and +E4B models).
  • +
  • Increased Context Window: Small models feature a 128K context window, +while the medium models support 256K.
  • +
  • Enhanced Coding & Agentic Capabilities: Achieves notable improvements in +coding benchmarks alongside built-in function-calling +support, powering +highly capable autonomous agents.
  • +
  • Native System Prompt Support: Gemma 4 introduces built-in support for +the system role, enabling more structured and controllable conversations.
  • +
+ +

Parameter sizes and quantization

+ +

Gemma 4 models are available in four parameter sizes: E2B, E4B, 31B, and 26B A4B. +The models can be used at their default precision (16-bit) or at a lower +precision using quantization. The different sizes and precisions represent a set +of trade-offs for your AI application. Models with higher parameter counts and bit +counts (higher precision) are generally more capable, but are more expensive to +run in terms of processing cycles, memory cost, and power consumption. Models +with lower parameter counts and bit counts (lower precision) are less capable, +but may be sufficient for your AI task.

+ +

Gemma 4 Inference Memory Requirements

+ +

The following table details the approximate GPU or TPU memory requirements for +running inference with each size of the Gemma 4 model versions.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
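| Parameters | BF16 (16-bit) | SFP8 (8-bit) | Q4_0 (4-bit) |
|------------|---------------|--------------|--------------|
| Gemma 4 E2B | 9.6 GB | 4.6 GB | 3.2 GB |
| Gemma 4 E4B | 15 GB | 7.5 GB | 5 GB |
| Gemma 4 31B | 58.3 GB | 30.4 GB | 17.4 GB |
| Gemma 4 26B A4B | 48 GB | 25 GB | 15.6 GB |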
+ +

Table 1. Approximate GPU or TPU memory required to load Gemma 4 models based +on parameter count and quantization level.

+ +

Key Considerations for Memory Planning

+ +
    +
  • Efficient Architecture (E2B and E4B): The "E" stands for "effective" +parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to +maximize parameter efficiency in on-device deployments. Rather than adding +more layers to the model, PLE gives each decoder layer its own small +embedding for every token. These embedding tables are large but only used +for quick lookups, which is why the total memory required to load static +weights is higher than the effective parameter count suggests.
  • +
  • The MoE Architecture (26B A4B): The 26B is a Mixture of Experts +model. While it only activates 4 billion parameters per token during +generation, all 26 billion parameters must be loaded into memory to +maintain fast routing and inference speeds. This is why its baseline memory +requirement is much closer to a dense 26B model than a 4B model.
  • +
  • Base Weights Only: The estimates in the preceding table only account +for the memory required to load the static model weights. They don't include +the additional VRAM needed for supporting software or the context window.
  • +
  • Context Window (KV Cache): Memory consumption will increase dynamically +based on the total number of tokens in your prompt and the generated +response. Larger context windows require significantly more VRAM on top of +the base model weights.
  • +
  • Fine-Tuning Overhead: Memory requirements for fine-tuning Gemma models +are drastically higher than for standard inference. Your exact footprint +will depend heavily on the development framework, batch size, and whether +you are using full-precision tuning versus a Parameter-Efficient Fine-Tuning +(PEFT) method like Low-Rank Adaptation (LoRA).
  • +
+ +
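The memory considerations above can be turned into rough arithmetic. The following is a minimal sketch, not an official sizing tool: `weight_memory_gb` assumes every parameter is stored at the same bit width, and `kv_cache_memory_gb` uses the common keys-plus-values cache estimate. The function names are made up for this illustration, and the layer, head, and dimension values in the usage lines are hypothetical placeholders, not published Gemma 4 hyperparameters. Results from this arithmetic may differ from Table 1, which reflects the actual checkpoints.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the static weights in decimal gigabytes.

    Assumes every parameter is stored at the same bit width, which is a
    simplification (real checkpoints can mix precisions).
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


def kv_cache_memory_gb(
    num_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> float:
    """Approximate KV-cache size in decimal gigabytes for one sequence.

    The leading factor of 2 accounts for caching both keys and values.
    """
    bytes_total = (
        2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value
    )
    return bytes_total / 1e9


if __name__ == "__main__":
    # A dense 31B-parameter model at 16 bits per parameter.
    print(weight_memory_gb(31e9, 16))
    # An MoE model must load all 26B parameters, even if only 4B are active.
    print(weight_memory_gb(26e9, 4))
    # HYPOTHETICAL architecture values, chosen only to exercise the formula.
    print(kv_cache_memory_gb(num_tokens=1000, num_layers=32,
                             num_kv_heads=8, head_dim=128))
```

The point of the second function is that cache memory grows linearly with the number of tokens, which is why long contexts add significant VRAM on top of the weights.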

Previous Gemma models

+ +

You can work with previous generations of Gemma models, which are also +available from Kaggle and +Hugging Face. +For more technical details about previous Gemma models, see the following +model card pages:

+ + + +

Ready to start building? +Get started +with Gemma models!

+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tooling/google-official/docs/ai-google-dev_formatting.html b/tooling/google-official/docs/ai-google-dev_formatting.html new file mode 100644 index 0000000..a131848 --- /dev/null +++ b/tooling/google-official/docs/ai-google-dev_formatting.html @@ -0,0 +1,4632 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Gemma formatting and system instructions  |  Google AI for Developers + + + + + + + + + + + + + + + + + + +

+ Gemma formatting and system instructions + + +


Gemma instruction-tuned (IT) models are trained with a specific formatter that +annotates all instruction tuning examples with extra information, both at +training and inference time. The formatter has two purposes:

+ +
    +
  1. Indicating roles in a conversation, such as the system, user, or +assistant roles.
  2. +
  3. Delineating turns in a conversation, especially in a multi-turn +conversation.
  4. +
+ +

Below, we specify the control tokens used by Gemma and their use cases. Note +that the control tokens are reserved in and specific to our tokenizer.

+ +
    +
  • Token to indicate a user turn: user
  • +
  • Token to indicate a model turn: model
  • +
  • Token to indicate the beginning of dialogue turn: <start_of_turn>
  • +
  • Token to indicate the end of dialogue turn: <end_of_turn>
  • +
+ +

Here's an example dialogue:

+
<start_of_turn>user
+knock knock<end_of_turn>
+<start_of_turn>model
+who is there<end_of_turn>
+<start_of_turn>user
+Gemma<end_of_turn>
+<start_of_turn>model
+Gemma who?<end_of_turn>
+
+

The token "<end_of_turn>\n" is the turn separator, and the prompt prefix is +"<start_of_turn>model\n". This means that if you'd like to prompt the model +with a question like, "What is Cramer's Rule?", you should instead feed the +model as follows:

+
"<start_of_turn>user
+What is Cramer's Rule?<end_of_turn>
+<start_of_turn>model"
+
+

Note that if you want to finetune the pretrained Gemma models with your own +data, you can use any such schema for control tokens, as long as it's consistent +between your training and inference use cases.
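As an illustration of the turn format described above, here is a minimal sketch that assembles a prompt string from a list of turns. The helper name `format_gemma_prompt` and the `(role, text)` tuple shape are assumptions made for this example. In practice, the tokenizer's chat template would normally do this for you.

```python
START = "<start_of_turn>"
END = "<end_of_turn>"


def format_gemma_prompt(turns):
    """Build a prompt from (role, text) pairs, ending with the model prefix.

    Each turn is wrapped as "<start_of_turn>{role}\n{text}<end_of_turn>\n",
    and the prompt prefix "<start_of_turn>model\n" is appended at the end so
    the model continues with its own turn.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{START}{role}\n{text}{END}\n")
    parts.append(f"{START}model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_gemma_prompt([("user", "What is Cramer's Rule?")]))
```

Rejecting unknown roles early is a deliberate choice in this sketch: a silently misformatted turn is harder to debug than an exception.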

+ +

System instructions

+ +

Gemma's instruction-tuned models are designed to work with only two roles: +user and model. Therefore, the system role or a system turn is not +supported.

+ +

Instead of using a separate system role, provide system-level instructions +directly within the initial user prompt. The model's instruction-following +capabilities allow Gemma to interpret the instructions effectively. For example:

+
<start_of_turn>user
+Only reply like a pirate.
+
+What is the answer to life the universe and everything?<end_of_turn>
+<start_of_turn>model
+Arrr, 'tis 42,<end_of_turn>
+
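For models that follow the two-role format described on this page, an application that receives OpenAI-style messages may need to merge any system message into the first user turn. The sketch below shows one way to do that. The function name and the blank-line separator are assumptions for illustration, chosen to match the pirate example above.

```python
def fold_system_into_first_user(messages):
    """Return a copy of `messages` with system content merged into the first user turn.

    System messages are removed from the list and their text is prepended
    to the first user message, separated by a blank line.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system_parts:
        for m in rest:
            if m["role"] == "user":
                m["content"] = "\n\n".join(system_parts + [m["content"]])
                break
        else:
            # No user turn exists, so there is nowhere to put the instructions.
            raise ValueError("no user turn to attach the system instructions to")
    return rest


if __name__ == "__main__":
    folded = fold_system_into_first_user([
        {"role": "system", "content": "Only reply like a pirate."},
        {"role": "user",
         "content": "What is the answer to life the universe and everything?"},
    ])
    print(folded[0]["content"])
```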
+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tooling/google-official/docs/ai-google-dev_function_calling_gemma4.html b/tooling/google-official/docs/ai-google-dev_function_calling_gemma4.html new file mode 100644 index 0000000..dea6afa --- /dev/null +++ b/tooling/google-official/docs/ai-google-dev_function_calling_gemma4.html @@ -0,0 +1,5193 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Function calling with Gemma 4  |  Google AI for Developers + + + + + + + + + + + + + + + + + + +

+ Function calling with Gemma 4 + + +


When using a generative artificial intelligence (AI) model such as Gemma, you +may want to use the model to operate programming interfaces in order to complete +tasks or answer questions. Instructing a model by defining a programming +interface and then making a request that uses that interface is called function +calling.

+ +
+
+ +

This guide shows the process of using Gemma 4 within the Hugging Face ecosystem.

+ +

This notebook runs on a T4 GPU.

+ +

Install Python packages

+ +

Install the Hugging Face libraries required for running the Gemma model and making requests.

+ +
# Install PyTorch & other libraries
+pip install torch accelerate
+
+# Install the transformers library
+pip install transformers
+ +

Load Model

+ +

Use the transformers library to create an instance of a processor and model using the AutoProcessor and AutoModelForMultimodalLM classes, as shown in the following code example:

+
MODEL_ID = "google/gemma-4-E2B-it" # @param ["google/gemma-4-E2B-it","google/gemma-4-E4B-it", "google/gemma-4-31B-it", "google/gemma-4-26B-A4B-it"]
+
+from transformers import AutoProcessor, AutoModelForMultimodalLM
+
+model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, dtype="auto", device_map="auto")
+processor = AutoProcessor.from_pretrained(MODEL_ID)
+
+
+
+ +

Passing Tools

+ +

You can pass tools to the model using the apply_chat_template() function via the tools argument. There are two methods for defining these tools:

+ +
    +
  • JSON schema: You can manually construct a JSON dictionary defining the function name, description, and parameters (including types and required fields).
  • +
  • Raw Python Functions: You can pass actual Python functions. The system automatically generates the required JSON schema by parsing the function's type hints, arguments, and docstrings. For best results, docstrings should adhere to the Google Python Style Guide.
  • +
+ +

Below is the example with the JSON schema.

+
from transformers import TextStreamer
+
+weather_function_schema = {
+    "type": "function",
+    "function": {
+        "name": "get_current_temperature",
+        "description": "Gets the current temperature for a given location.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city name, e.g. San Francisco",
+                },
+            },
+            "required": ["location"],
+        },
+    }
+}
+
+message = [
+    {
+        "role": "system", "content": "You are a helpful assistant."
+    },
+    {
+        "role": "user", "content": "What's the temperature in London?"
+    }
+]
+
+text = processor.apply_chat_template(message, tools=[weather_function_schema], tokenize=False, add_generation_prompt=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+streamer = TextStreamer(processor)
+outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=64)
+
+
+<bos><|turn>system
+You are a helpful assistant.<|tool>declaration:get_current_temperature{description:<|"|>Gets the current temperature for a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city name, e.g. San Francisco<|"|>,type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
+<|turn>user
+What's the temperature in London?<turn|>
+<|turn>model
+<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|><|tool_response>
+
+ +

And the same example with the raw Python function.

+
from transformers.utils import get_json_schema
+
+def get_current_temperature(location: str):
+    """
+    Gets the current temperature for a given location.
+
+    Args:
+        location: The city name, e.g. San Francisco
+    """
+    return "15°C"
+
+message = [
+    {
+        "role": "user", "content": "What's the temperature in London?"
+    }
+]
+
+text = processor.apply_chat_template(message, tools=[get_json_schema(get_current_temperature)], tokenize=False, add_generation_prompt=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+streamer = TextStreamer(processor)
+outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=256)
+
+
+<bos><|turn>system
+<|tool>declaration:get_current_temperature{description:<|"|>Gets the current temperature for a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city name, e.g. San Francisco<|"|>,type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
+<|turn>user
+What's the temperature in London?<turn|>
+<|turn>model
+<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|><|tool_response>
+
+ +
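To make the "raw Python function" path less opaque, the following toy sketch shows how a schema of the same shape can be derived from a function's signature and docstring using only the standard library. It is not the implementation behind `get_json_schema` in transformers. The helper `sketch_json_schema` is hypothetical, handles only a few scalar types, and does not parse per-argument descriptions from the `Args:` section.

```python
import inspect

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_json_schema(func):
    """Derive a tool schema from a function's type hints and docstring summary."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    # Use the text before the first blank line as the description.
    description = doc.split("\n\n")[0].strip()

    properties, required = {}, []
    for name, param in signature.parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_current_temperature(location: str):
    """
    Gets the current temperature for a given location.

    Args:
        location: The city name, e.g. San Francisco
    """
    return "15°C"


if __name__ == "__main__":
    print(sketch_json_schema(get_current_temperature))
```

Parameters with default values are left out of `required`, which mirrors how optional arguments are usually expressed in JSON Schema.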

Full function calling sequence

+ +

This section demonstrates a three-stage cycle for connecting the model to external tools: the Model's Turn to generate function call objects, the Developer's Turn to parse and execute code (such as a weather API), and the Final Response where the model uses the tool's output to answer the user.

+ +

Model's Turn

+ +

Given the user prompt "Hey, what's the weather in Tokyo right now?" and the tool [get_current_weather], Gemma generates a function call object as follows.

+
# Define a function that our model can use.
+def get_current_weather(location: str, unit: str = "celsius"):
+    """
+    Gets the current weather in a given location.
+
+    Args:
+        location: The city and state, e.g. "San Francisco, CA" or "Tokyo, JP"
+        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
+
+    Returns:
+        temperature: The current temperature in the given location
+        weather: The current weather in the given location
+    """
+    return {"temperature": 15, "weather": "sunny"}
+
+prompt = "Hey, what's the weather in Tokyo right now?"
+tools = [get_current_weather]
+
+message = [
+    {
+        "role": "system", "content": "You are a helpful assistant."
+    },
+    {
+        "role": "user", "content": prompt
+    },
+]
+
+text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=128)
+generated_tokens = out[0][len(inputs["input_ids"][0]):]
+output = processor.decode(generated_tokens, skip_special_tokens=False)
+
+print(f"Prompt: {prompt}")
+print(f"Tools: {tools}")
+print(f"Output: {output}")
+
+
+Prompt: Hey, what's the weather in Tokyo right now?
+Tools: [<function get_current_weather at 0x7cef824ece00>]
+Output: <|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|><|tool_response>
+
+ +

Developer's Turn

+ +

Your application should parse the model's response to extract the function name and arguments, then append a message with the assistant role that carries both tool_calls and tool_responses.

+ +
+
+
import re
+import json
+
+def extract_tool_calls(text):
+    def cast(v):
+        try: return int(v)
+        except ValueError:
+            try: return float(v)
+            except ValueError: return {'true': True, 'false': False}.get(v.lower(), v.strip("'\""))
+
+    return [{
+        "name": name,
+        "arguments": {
+            k: cast((v1 or v2).strip())
+            for k, v1, v2 in re.findall(r'(\w+):(?:<\|"\|>(.*?)<\|"\|>|([^,}]*))', args)
+        }
+    } for name, args in re.findall(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", text, re.DOTALL)]
+
+calls = extract_tool_calls(output)
+if calls:
+    # Call the function and get the result
+    #####################################
+    # WARNING: This is a demonstration. #
+    #####################################
+    # Using globals() to call functions dynamically can be dangerous in
+    # production. In a real application, you should implement a secure way to
+    # map function names to actual function calls, such as a predefined
+    # dictionary of allowed tools and their implementations.
+    results = [
+        {"name": c['name'], "response": globals()[c['name']](**c['arguments'])}
+        for c in calls
+    ]
+
+    message.append({
+        "role": "assistant",
+        "tool_calls": [
+            {"function": call} for call in calls
+        ],
+        "tool_responses": results
+    })
+    print(json.dumps(message[-1], indent=2))
+
+
+{
+  "role": "assistant",
+  "tool_calls": [
+    {
+      "function": {
+        "name": "get_current_weather",
+        "arguments": {
+          "location": "Tokyo, JP"
+        }
+      }
+    }
+  ],
+  "tool_responses": [
+    {
+      "name": "get_current_weather",
+      "response": {
+        "temperature": 15,
+        "weather": "sunny"
+      }
+    }
+  ]
+}
+
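The warning in the snippet above recommends a predefined dictionary of allowed tools instead of globals(). A minimal sketch of that idea follows. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, not part of Transformers, and the weather function is a stub standing in for the one defined earlier in this guide (its `unit` default is an assumption).

```python
# Hypothetical allow-list dispatcher; names here are illustrative, not a library API.

def get_current_weather(location, unit="celsius"):
    # Stub implementation standing in for the tool defined earlier in the guide.
    return {"temperature": 15, "weather": "sunny"}

# Only functions listed here can ever be invoked by model output.
TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
}

def run_tool_calls(calls, registry=TOOL_REGISTRY):
    """Execute parsed tool calls against an explicit allow-list."""
    results = []
    for call in calls:
        func = registry.get(call["name"])
        if func is None:
            # Report unknown tools back to the model instead of executing anything.
            response = {"error": f"unknown tool: {call['name']}"}
        else:
            try:
                response = func(**call["arguments"])
            except TypeError as exc:
                # Raised when the model supplies unexpected or missing arguments.
                response = {"error": str(exc)}
        results.append({"name": call["name"], "response": response})
    return results

calls = [
    {"name": "get_current_weather", "arguments": {"location": "Tokyo, JP"}},
    {"name": "delete_everything", "arguments": {}},
]
results = run_tool_calls(calls)
print(results)
```

The returned list already has the `{"name": ..., "response": ...}` shape used for tool_responses, so it can replace the `globals()` comprehension directly. Returning an error dict rather than raising is a design choice: it lets the model see the failure and recover on its next turn.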
+ +
+
+
"tool_responses": [
+  {
+    "name": function_name,
+    "response": function_response
+  }
+]
+
+

In case of multiple independent requests:

+
"tool_responses": [
+  {
+    "name": function_name_1,
+    "response": function_response_1
+  },
+  {
+    "name": function_name_2,
+    "response": function_response_2
+  }
+]
+
+

Final Response

+ +

Finally, Gemma reads the tool response and replies to the user.

+
text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=128)
+generated_tokens = out[0][len(inputs["input_ids"][0]):]
+output = processor.decode(generated_tokens, skip_special_tokens=True)
+print(f"Output: {output}")
+message[-1]["content"] = output
+
+
+Output: The current weather in Tokyo is 15 degrees and sunny.
+
+ +

You can see the full chat history below.

+
# full history
+print(json.dumps(message, indent=2))
+
+print("-"*80)
+output = processor.decode(out[0], skip_special_tokens=False)
+print(f"Output: {output}")
+
+
+[
+  {
+    "role": "system",
+    "content": "You are a helpful assistant."
+  },
+  {
+    "role": "user",
+    "content": "Hey, what's the weather in Tokyo right now?"
+  },
+  {
+    "role": "assistant",
+    "tool_calls": [
+      {
+        "function": {
+          "name": "get_current_weather",
+          "arguments": {
+            "location": "Tokyo, JP"
+          }
+        }
+      }
+    ],
+    "tool_responses": [
+      {
+        "name": "get_current_weather",
+        "response": {
+          "temperature": 15,
+          "weather": "sunny"
+        }
+      }
+    ],
+    "content": "The current weather in Tokyo is 15 degrees and sunny."
+  }
+]
+--------------------------------------------------------------------------------
+Output: <bos><|turn>system
+You are a helpful assistant.<|tool>declaration:get_current_weather{description:<|"|>Gets the current weather in a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city and state, e.g. "San Francisco, CA" or "Tokyo, JP"<|"|>,type:<|"|>STRING<|"|>},unit:{description:<|"|>The unit to return the temperature in.<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>],type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
+<|turn>user
+Hey, what's the weather in Tokyo right now?<turn|>
+<|turn>model
+<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|><|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>The current weather in Tokyo is 15 degrees and sunny.<turn|>
+
+ +

Function calling with Thinking

+ +

With thinking enabled, the model reasons internally before emitting a tool call, which improves function-calling accuracy: it decides more precisely when to trigger a tool and how to fill in its parameters.

+
prompt = "Hey, I'm in Seoul. Is it good for running now?"
+message = [
+    {
+        "role": "system", "content": "You are a helpful assistant."
+    },
+    {
+        "role": "user", "content": prompt
+    },
+]
+
+text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True, enable_thinking=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+input_len = inputs["input_ids"].shape[-1]
+
+out = model.generate(**inputs, max_new_tokens=1024)
+output = processor.decode(out[0][input_len:], skip_special_tokens=False)
+result = processor.parse_response(output)
+
+for key, value in result.items():
+  if key == "role":
+    print(f"Role: {value}")
+  elif key == "thinking":
+    print(f"\n=== Thoughts ===\n{value}")
+  elif key == "content":
+    print(f"\n=== Answer ===\n{value}")
+  elif key == "tool_calls":
+    print(f"\n=== Tool Calls ===\n{value}")
+  else:
+    print(f"\n{key}: {value}...\n")
+
+
+Role: assistant
+
+=== Thoughts ===
+
+1.  **Analyze the Request:** The user is asking if it's "good for running now" in "Seoul".
+
+2.  **Identify Necessary Information:** To determine if it's good for running, I need current weather information (temperature, precipitation, etc.) for Seoul.
+
+3.  **Examine Available Tools:** The available tool is `get_current_weather(location, unit)`.
+
+4.  **Determine Tool Arguments:**
+    *   `location`: The user specified "Seoul".
+    *   `unit`: The user did not specify a unit (Celsius or Fahrenheit).
+
+5.  **Formulate the Tool Call:** I need to call `get_current_weather` with the location. Since the user didn't specify a unit, I can either omit it (if the tool defaults are acceptable) or choose a common one. However, the tool definition requires `location` but `unit` is optional.
+
+6.  **Construct the Response Strategy:**
+    *   Call the tool to get the weather data for Seoul.
+    *   Once the data is received, I can advise the user on whether it's suitable for running.
+
+7.  **Generate Tool Call:**
+
+    ```json
+    {
+      "toolSpec": {
+        "name": "get_current_weather",
+        "args": {
+          "location": "Seoul"
+        }
+      }
+    }
+    ```
+    (Self-correction: The `unit` parameter is optional in the definition, so just providing the location is sufficient to proceed.)
+
+8.  **Final Output Generation:** Present the tool call to the user/system.
+
+=== Tool Calls ===
+[{'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': {'location': 'Seoul'} } }]
+
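The parsed result above already contains the tool calls as structured data, so re-running a regular expression over the raw text is optional. The sketch below flattens that structure into the `{"name", "arguments"}` dicts used elsewhere in this guide. It assumes the result dict has exactly the shape printed above; `tool_calls_from_parsed` is a hypothetical helper, not a Transformers function.

```python
def tool_calls_from_parsed(result):
    """Flatten parse_response-style tool calls into {"name", "arguments"} dicts."""
    return [
        {
            "name": tc["function"]["name"],
            "arguments": tc["function"].get("arguments", {}),
        }
        for tc in result.get("tool_calls", [])
        if tc.get("type") == "function"
    ]

# Sample input copied from the printed output above.
result = {
    "role": "assistant",
    "thinking": "...",
    "tool_calls": [
        {"type": "function",
         "function": {"name": "get_current_weather", "arguments": {"location": "Seoul"}}}
    ],
}
calls = tool_calls_from_parsed(result)
print(calls)
```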
+ +

Process the tool call and get the final answer.

+
calls = extract_tool_calls(output)
+if calls:
+    # Call the function and get the result
+    #####################################
+    # WARNING: This is a demonstration. #
+    #####################################
+    # Using globals() to call functions dynamically can be dangerous in
+    # production. In a real application, you should implement a secure way to
+    # map function names to actual function calls, such as a predefined
+    # dictionary of allowed tools and their implementations.
+    results = [
+        {"name": c['name'], "response": globals()[c['name']](**c['arguments'])}
+        for c in calls
+    ]
+
+    message.append({
+        "role": "assistant",
+        "tool_calls": [
+            {"function": call} for call in calls
+        ],
+        "tool_responses": results
+    })
+
+text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=128)
+generated_tokens = out[0][len(inputs["input_ids"][0]):]
+output = processor.decode(generated_tokens, skip_special_tokens=True)
+print(f"Output: {output}")
+message[-1]["content"] = output
+
+print("-"*80)
+print("Full History")
+print("-"*80)
+print(json.dumps(message, indent=2))
+
+
+Output: The current weather in Seoul is 15 degrees Celsius and sunny. That sounds like great weather for a run!
+--------------------------------------------------------------------------------
+Full History
+--------------------------------------------------------------------------------
+[
+  {
+    "role": "system",
+    "content": "You are a helpful assistant."
+  },
+  {
+    "role": "user",
+    "content": "Hey, I'm in Seoul. Is it good for running now?"
+  },
+  {
+    "role": "assistant",
+    "tool_calls": [
+      {
+        "function": {
+          "name": "get_current_weather",
+          "arguments": {
+            "location": "Seoul"
+          }
+        }
+      }
+    ],
+    "tool_responses": [
+      {
+        "name": "get_current_weather",
+        "response": {
+          "temperature": 15,
+          "weather": "sunny"
+        }
+      }
+    ],
+    "content": "The current weather in Seoul is 15 degrees Celsius and sunny. That sounds like great weather for a run!"
+  }
+]
+
+ +

Important Caveat: Automatic vs. Manual Schemas

+ +

When relying on automatic conversion from Python functions to JSON schema, the generated output may not always meet specific expectations regarding complex parameters.

+ +

If a function uses a custom object (like a Config class) as an argument, the automatic converter may describe it simply as a generic "object" without detailing its internal properties.

+ +

In these cases, manually defining the JSON schema is preferred to ensure nested properties (such as theme or font_size within a config object) are explicitly defined for the model.

+
import json
+from transformers.utils import get_json_schema
+
+class Config:
+    def __init__(self):
+        self.theme = "light"
+        self.font_size = 14
+
+def update_config(config: Config):
+    """
+    Updates the configuration of the system.
+
+    Args:
+        config: A Config object
+
+    Returns:
+        True if the configuration was successfully updated, False otherwise.
+    """
+
+update_config_schema = {
+    "type": "function",
+    "function": {
+        "name": "update_config",
+        "description": "Updates the configuration of the system.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "config": {
+                    "type": "object",
+                    "description": "A Config object",
+                    "properties": {"theme": {"type": "string"}, "font_size": {"type": "number"} },
+                    },
+                },
+            "required": ["config"],
+            },
+        },
+    }
+
+print(f"--- [Automatic] ---")
+print(json.dumps(get_json_schema(update_config), indent=2))
+
+print(f"\n--- [Manual Schemas] ---")
+print(json.dumps(update_config_schema, indent=2))
+
+
+--- [Automatic] ---
+{
+  "type": "function",
+  "function": {
+    "name": "update_config",
+    "description": "Updates the configuration of the system.",
+    "parameters": {
+      "type": "object",
+      "properties": {
+        "config": {
+          "type": "object",
+          "description": "A Config object"
+        }
+      },
+      "required": [
+        "config"
+      ]
+    }
+  }
+}
+
+--- [Manual Schemas] ---
+{
+  "type": "function",
+  "function": {
+    "name": "update_config",
+    "description": "Updates the configuration of the system.",
+    "parameters": {
+      "type": "object",
+      "properties": {
+        "config": {
+          "type": "object",
+          "description": "A Config object",
+          "properties": {
+            "theme": {
+              "type": "string"
+            },
+            "font_size": {
+              "type": "number"
+            }
+          }
+        }
+      },
+      "required": [
+        "config"
+      ]
+    }
+  }
+}
+
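One way to avoid hand-writing the nested properties is to derive them from a dataclass. The sketch below is an illustration under stated assumptions, not how `get_json_schema` works: `dataclass_to_schema` and the type mapping are hypothetical, it handles only flat dataclasses with primitive fields, and it emits "integer" for `int` fields where the manual schema above uses "number" (both are valid JSON Schema types).

```python
import dataclasses
import json

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclasses.dataclass
class Config:
    theme: str = "light"
    font_size: int = 14

def dataclass_to_schema(cls, description=""):
    """Describe a flat dataclass as a JSON Schema object with explicit properties."""
    properties = {
        f.name: {"type": _JSON_TYPES.get(f.type, "string")}
        for f in dataclasses.fields(cls)
    }
    schema = {"type": "object", "properties": properties}
    if description:
        schema["description"] = description
    return schema

update_config_schema = {
    "type": "function",
    "function": {
        "name": "update_config",
        "description": "Updates the configuration of the system.",
        "parameters": {
            "type": "object",
            "properties": {"config": dataclass_to_schema(Config, "A Config object")},
            "required": ["config"],
        },
    },
}
print(json.dumps(update_config_schema, indent=2))
```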
+ +

Summary and next steps

+ +

You have seen how to build an application that calls functions with Gemma 4. The workflow is a four-stage cycle:

+ +
  1. Define Tools: Create the functions your model can use, specifying arguments and descriptions (e.g., a weather lookup function).
  2. Model's Turn: The model receives the user's prompt and a list of available tools, and returns a structured function call instead of plain text.
  3. Developer's Turn: The developer parses this output using regular expressions to extract function names and arguments, executes the actual Python code, and appends the results to the chat history as tool_calls and tool_responses on an assistant-role message.
  4. Final Response: The model processes the tool's execution result to generate a final, natural language answer for the user.
+ +
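The four stages can be sketched as a single driver function. To stay runnable without model weights, the sketch replaces `apply_chat_template` and `model.generate` with a hypothetical `fake_generate` stub, and its parser handles only `<|"|>`-quoted string arguments. All names are illustrative.

```python
import re

def get_current_weather(location, unit="celsius"):
    # Stage 1, define tools: a stub tool like the one in this guide.
    return {"temperature": 15, "weather": "sunny"}

TOOLS = {"get_current_weather": get_current_weather}

def fake_generate(messages):
    """Hypothetical stand-in for apply_chat_template + model.generate."""
    last = messages[-1]
    if last["role"] == "assistant" and "tool_responses" in last:
        r = last["tool_responses"][0]["response"]
        return f"It is {r['temperature']} degrees and {r['weather']}."
    return '<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|>'

def parse_calls(text):
    # Simplified parser: handles only <|"|>-quoted string arguments.
    calls = []
    pattern = r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>"
    for name, args in re.findall(pattern, text, re.DOTALL):
        arguments = dict(re.findall(r'(\w+):<\|"\|>(.*?)<\|"\|>', args))
        calls.append({"name": name, "arguments": arguments})
    return calls

def run_turn(messages):
    output = fake_generate(messages)              # Stage 2: model's turn
    calls = parse_calls(output)
    if not calls:
        messages.append({"role": "assistant", "content": output})
        return output
    results = [                                   # Stage 3: developer's turn
        {"name": c["name"], "response": TOOLS[c["name"]](**c["arguments"])}
        for c in calls
    ]
    messages.append({
        "role": "assistant",
        "tool_calls": [{"function": c} for c in calls],
        "tool_responses": results,
    })
    final = fake_generate(messages)               # Stage 4: final response
    messages[-1]["content"] = final
    return final

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
answer = run_turn(messages)
print(answer)
```

In a real application, `fake_generate` would be replaced by the template, generate, and decode calls shown earlier in this guide; the control flow stays the same.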

Check out the following documentation for further reading.

\ No newline at end of file
diff --git a/tooling/google-official/docs/ai-google-dev_model_card_4.html b/tooling/google-official/docs/ai-google-dev_model_card_4.html
new file mode 100644
index 0000000..c459b9a
--- /dev/null
+++ b/tooling/google-official/docs/ai-google-dev_model_card_4.html
@@ -0,0 +1,5283 @@
+Gemma 4 model card | Google AI for Developers

Gemma 4 model card

+
+ + + + + + +
+ + + + +

+ +

+ + + +

Gemma 4 Banner

+ +

Hugging Face | GitHub | Launch Blog | Documentation
License: Apache 2.0 | Authors: Google DeepMind

+ +

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

+ +

Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: E2B, E4B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

+ +

Gemma 4 introduces key capability and architectural advancements:

+ +
  • Reasoning – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
  • Extended Multimodalities – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models).
  • Diverse & Efficient Architectures – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment.
  • Optimized for On-Device – Smaller models are specifically designed for efficient local execution on laptops and mobile devices.
  • Increased Context Window – The small models feature a 128K context window, while the medium models support 256K.
  • Enhanced Coding & Agentic Capabilities – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents.
  • Native System Prompt Support – Gemma 4 introduces native support for the system role, enabling more structured and controllable conversations.
+ +

Models Overview

+ +

Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

+ +

The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE).

+ +
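The difference between a local sliding-window layer and a global layer comes down to which earlier positions a token may attend to. The toy sketch below builds both kinds of causal mask in plain Python. The four-layer pattern and the window of 3 are hypothetical: the card states only that local and global layers are interleaved and that the final layer is global, not the exact ratio.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] = True when query position i may attend to key position j.

    window=None gives full (global) causal attention; an integer restricts
    each query to the most recent `window` positions, itself included.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Hypothetical layer pattern for illustration: three local layers, then a global one.
layer_windows = [3, 3, 3, None]
masks = [causal_mask(6, w) for w in layer_windows]

local_attended = sum(masks[0][5])     # keys visible to the last token in a local layer
global_attended = sum(masks[-1][5])   # keys visible to the last token in the global layer
print(local_attended, global_attended)
```

In a local layer the number of visible keys is capped by the window no matter how long the sequence is, which is where the memory saving comes from; the global layers restore access to the full context.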

Dense Models

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Property | E2B | E4B | 31B Dense |
|----------|-----|-----|-----------|
| Total Parameters | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B |
| Layers | 35 | 42 | 60 |
| Sliding Window | 512 tokens | 512 tokens | 1024 tokens |
| Context Length | 128K tokens | 128K tokens | 256K tokens |
| Vocabulary Size | 262K | 262K | 262K |
| Supported Modalities | Text, Image, Audio | Text, Image, Audio | Text, Image |
| Vision Encoder Parameters | ~150M | ~150M | ~550M |
| Audio Encoder Parameters | ~300M | ~300M | No Audio |
+ +

The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.

+ +

Mixture-of-Experts (MoE) Model

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Property | 26B A4B MoE |
|----------|-------------|
| Total Parameters | 25.2B |
| Active Parameters | 3.8B |
| Layers | 30 |
| Sliding Window | 1024 tokens |
| Context Length | 256K tokens |
| Vocabulary Size | 262K |
| Expert Count | 8 active / 128 total and 1 shared |
| Supported Modalities | Text, Image |
| Vision Encoder Parameters | ~550M |
+ +

The "A" in 26B A4B stands for "active parameters", in contrast to the total number of parameters the model contains. By activating only a roughly 4B-parameter subset (3.8B) during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model, since it runs almost as fast as a 4B-parameter model.

+ +
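A quick back-of-the-envelope calculation makes the trade-off concrete. The parameter counts come from the table above; the 2 bytes per parameter figure assumes bfloat16 weights and ignores the KV cache, activations, and the vision encoder, so treat the memory number as a rough lower bound rather than a measured requirement.

```python
TOTAL_PARAMS = 25.2e9       # from the MoE table
ACTIVE_PARAMS = 3.8e9       # from the MoE table
BYTES_PER_PARAM_BF16 = 2    # assumption: weights stored in bfloat16

# Compute cost tracks the active parameters.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Memory tracks the total, because every expert must be resident to be routable.
weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9

print(f"Active fraction: {active_fraction:.1%}")
print(f"Approximate weight memory at bf16: {weight_memory_gb:.1f} GB")
```

So the model does roughly 15% of the per-token weight computation of a dense model of the same total size, while still needing on the order of 50 GB for unquantized weights.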

Benchmark Results

+ +

These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
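| Benchmark | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) |
|-----------|-------------|-----------------|-------------|-------------|------------------------|
| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% |
| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% |
| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% |
| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 |
| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% |
| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% |
| HLE no tools | 19.5% | 8.7% | - | - | - |
| HLE with search | 26.5% | 17.2% | - | - | - |
| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% |
| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% |
| Vision | | | | | |
| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% |
| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 |
| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% |
| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - |
| Audio | | | | | |
| CoVoST | - | - | 35.54 | 33.47 | - |
| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - |
| Long Context | | | | | |
| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% |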
+ +

Core Capabilities

+ +

Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include:

+ +
  • Thinking – Built-in reasoning mode that lets the model think step-by-step before answering.
  • Long Context – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B).
  • Image Understanding – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions.
  • Video Understanding – Analyze video by processing sequences of frames.
  • Interleaved Multimodal Input – Freely mix text and images in any order within a single prompt.
  • Function Calling – Native support for structured tool use, enabling agentic workflows.
  • Coding – Code generation, completion, and correction.
  • Multilingual – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
  • Audio (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages.
+ +

Best Practices

+ +

For the best performance, use these configurations and best practices:

+ +

1. Sampling Parameters

+ +

Use the following standardized sampling configuration across all use cases:

+ +
  • temperature=1.0
  • top_p=0.95
  • top_k=64
+ +
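To make the three knobs concrete, here is a small pure-Python sampler that applies temperature, then top-k, then top-p to a list of logits. It is a teaching sketch under assumptions, not the code any inference library runs: `sample_token` is a hypothetical name, and the filter order shown is one common choice that individual libraries may vary.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=64, top_p=0.95, rng=random):
    """Sample one token index using temperature, then top-k, then top-p filtering."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top-k: keep only the k most probable tokens.
    probs = probs[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw one index; random.choices normalises the weights itself.
    weights = [p for p, _ in kept]
    indices = [i for _, i in kept]
    return rng.choices(indices, weights=weights, k=1)[0]

# A sharply peaked distribution: the first token alone exceeds top_p.
print(sample_token([10.0, 0.0, 0.0, 0.0]))
```

With Transformers, the same three values are normally passed straight to `model.generate` together with `do_sample=True`, so a hand-written sampler like this is only needed for illustration or custom decoding loops.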

2. Thinking Mode Configuration

+ +

Unlike Gemma 3, the models use the standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:

+ +
  • Trigger Thinking: Thinking is enabled by including the <|think|> token at the start of the system prompt. To disable thinking, remove the token.
  • Standard Generation: When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: <|channel>thought\n[Internal reasoning]<channel|>
  • Disabled Thinking Behavior: For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: <|channel>thought\n<channel|>[Final answer]
+ +
+

Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.

+
+ +
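If you work with raw text rather than a library's chat template, the control tokens above can be handled with a few lines of string processing. The sketch below is a minimal illustration: `build_system_prompt` and `split_thought` are hypothetical helpers, and placing `<|think|>` directly before the instructions with no separator is an assumption, since the card says only that the token goes at the start of the system prompt.

```python
import re

THINK_TOKEN = "<|think|>"
_THOUGHT_RE = re.compile(r"<\|channel>thought\n(.*?)<channel\|>", re.DOTALL)

def build_system_prompt(instructions, thinking=True):
    """Prefix the system prompt with the thinking trigger when requested."""
    return f"{THINK_TOKEN}{instructions}" if thinking else instructions

def split_thought(raw_output):
    """Return (thought, answer); thought is '' when the block is empty or absent."""
    match = _THOUGHT_RE.search(raw_output)
    if match is None:
        return "", raw_output.strip()
    answer = raw_output[:match.start()] + raw_output[match.end():]
    return match.group(1).strip(), answer.strip()

raw = "<|channel>thought\nCheck the units first.<channel|>It is 15 degrees."
thought, answer = split_thought(raw)
print(thought)
print(answer)
```

The same function also covers the disabled-thinking case on the larger models, where the thought block is present but empty.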

3. Multi-Turn Conversations

+ +
  • No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.
+ +
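A small helper can enforce this rule before each new turn. The sketch assumes assistant messages store their reasoning under a "thinking" key, which matches the parsed-response dict printed in the function-calling guide but is otherwise an assumption about your own history format; `strip_thinking` is a hypothetical name.

```python
def strip_thinking(messages):
    """Return a copy of the chat history without model thoughts."""
    cleaned = []
    for message in messages:
        message = dict(message)   # shallow copy; the caller's history is left untouched
        if message.get("role") == "assistant":
            message.pop("thinking", None)
        cleaned.append(message)
    return cleaned

history = [
    {"role": "user", "content": "Is it good for running now?"},
    {"role": "assistant", "thinking": "Need the weather first.", "content": "Yes, it is sunny."},
]
next_turn_history = strip_thinking(history)
print(next_turn_history)
```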

4. Modality order

+ +
  • For optimal performance with multimodal inputs, place image and/or audio content before the text in your prompt.
+ +
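For a single-question prompt, the ordering advice can be applied mechanically by moving media parts ahead of text parts. The sketch assumes the list-of-parts message format with a "type" field, as used by Transformers chat templates; `media_first` is a hypothetical helper and the URL is a placeholder. It should not be used on deliberately interleaved prompts, where the order of parts carries meaning.

```python
def media_first(parts):
    """Reorder content parts so image/audio entries precede text entries."""
    media = [p for p in parts if p.get("type") != "text"]
    text = [p for p in parts if p.get("type") == "text"]
    return media + text

message = {
    "role": "user",
    "content": media_first([
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image", "url": "https://example.com/cat.png"},  # placeholder URL
    ]),
}
print([part["type"] for part in message["content"]])
```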

5. Variable Image Resolution

+ +

Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding.

+ +
  • The supported token budgets are: 70, 140, 280, 560, and 1120.
    • Use lower budgets for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail.
    • Use higher budgets for tasks like OCR, document parsing, or reading small text.
+ +
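Because only five budgets are supported, application code usually needs to snap an arbitrary request onto that set. The sketch below rounds down to the nearest supported value. The task-to-budget table is a hypothetical reading of the guidance above, and how the chosen budget is passed to a given processor or runtime is not covered here, since the card does not specify it.

```python
SUPPORTED_BUDGETS = (70, 140, 280, 560, 1120)

def pick_budget(requested):
    """Round a requested visual token budget down to the nearest supported value."""
    eligible = [b for b in SUPPORTED_BUDGETS if b <= requested]
    return max(eligible) if eligible else SUPPORTED_BUDGETS[0]

# Hypothetical defaults following the guidance above; tune for your workload.
TASK_BUDGETS = {
    "classification": 70,
    "captioning": 140,
    "video": 70,
    "ocr": 1120,
    "document": 1120,
}

print(pick_budget(300), pick_budget(10), pick_budget(5000))
```

Rounding down rather than up keeps the compute cost at or below what the caller asked for, which is the safer default when processing many video frames.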

6. Audio

+ +

Use the following prompt structures for audio processing:

+ +
  • Automatic Speech Recognition (ASR)
+
Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.
+
+Follow these specific instructions for formatting the answer:
+*   Only output the transcription, with no newlines.
+*   When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.
+
+
  • Automatic Speech Translation (AST)
+
Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}.
+When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}.
+
+
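The two prompt structures can be wrapped in small template functions, and the AST answer format is regular enough to parse back into its two parts. The function names below are hypothetical, and the parser assumes the model followed the requested format exactly (transcription, one newline, then the target-language label).

```python
ASR_TEMPLATE = (
    "Transcribe the following speech segment in {language} into {language} text.\n"
    "\n"
    "Follow these specific instructions for formatting the answer:\n"
    "*   Only output the transcription, with no newlines.\n"
    "*   When transcribing numbers, write the digits, i.e. write 1.7 and not "
    "one point seven, and write 3 instead of three."
)

AST_TEMPLATE = (
    "Transcribe the following speech segment in {source}, then translate it into {target}.\n"
    "When formatting the answer, first output the transcription in {source}, then one "
    "newline, then output the string '{target}: ', then the translation in {target}."
)

def asr_prompt(language):
    return ASR_TEMPLATE.format(language=language)

def ast_prompt(source, target):
    return AST_TEMPLATE.format(source=source, target=target)

def parse_ast_output(text, target_language):
    """Split an AST answer into (transcription, translation)."""
    marker = f"\n{target_language}: "
    transcription, sep, translation = text.partition(marker)
    if not sep:
        raise ValueError("output does not follow the requested AST format")
    return transcription.strip(), translation.strip()

print(ast_prompt("Spanish", "English"))
print(parse_ast_output("Hola mundo\nEnglish: Hello world", "English"))
```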

7. Audio and Video Length

+ +

All models support image inputs and can process videos as frames, whereas only the E2B and E4B models also support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds, assuming the frames are processed at one frame per second.

+ +
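These limits translate directly into preprocessing logic: longer audio has to be split into windows, and video has to be sampled and capped. The sketch below shows one way to compute the windows and frame timestamps. The helper names are hypothetical, and splitting audio into back-to-back windows (with no overlap) is a simplifying assumption rather than a recommendation from the card.

```python
MAX_AUDIO_SECONDS = 30
MAX_VIDEO_SECONDS = 60
FRAMES_PER_SECOND = 1

def audio_chunks(duration_seconds, limit=MAX_AUDIO_SECONDS):
    """Split a clip into consecutive (start, end) windows no longer than the limit."""
    chunks, start = [], 0.0
    while start < duration_seconds:
        end = min(start + limit, duration_seconds)
        chunks.append((start, end))
        start = end
    return chunks

def frame_timestamps(duration_seconds, fps=FRAMES_PER_SECOND, limit=MAX_VIDEO_SECONDS):
    """Timestamps (in seconds) of frames to sample, capped at the video limit."""
    usable = min(duration_seconds, limit)
    return [i / fps for i in range(int(usable * fps))]

print(audio_chunks(70))
print(len(frame_timestamps(90)))
```

At one frame per second the 60-second cap means at most 60 frames per request, which pairs naturally with a low visual token budget per frame.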

Model Data

+ +

Data used for model training and how the data was processed.

+ +

Training Dataset

+ +

Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components:

+ +
  • Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
  • Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
  • Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
  • Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
+ +

The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats.

+ +

Data Preprocessing

+ +

Here are the key data cleaning and filtering methods applied to the training data:

+ +
  • CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
  • Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
  • Additional methods: Filtering based on content quality and safety in line with our policies.
+ +

Ethics and Safety

+ +

As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models.

+ +

Evaluation Approach

+ +

Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. These evaluations align with Google's AI principles, as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including:

+ +
  • Content related to child sexual abuse material and exploitation
  • Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm)
  • Sexually explicit content
  • Hate speech (e.g., dehumanizing members of protected groups)
  • Harassment (e.g., encouraging violence against people)
+ +

Evaluation Results

+ +

Across all areas of safety testing, we saw major improvements in every category of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models on safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations and showed significant improvements over previous Gemma models' performance.

+ +

Usage and Limitations

+ +

These models have certain limitations that users should be aware of.

+ +

Intended Usage

+ +

Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.

+ +
  • Content Creation and Communication
    • Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
    • Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
    • Text Summarization: Generate concise summaries of a text corpus, research papers, or reports.
    • Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications.
    • Audio Processing and Interaction: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions.
  • Research and Education
    • Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field.
    • Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
    • Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.
+ +

Limitations

+ +
  • Training Data
    • The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
    • The scope of the training dataset determines the subject areas the model can handle effectively.
  • Context and Task Complexity
    • Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
    • A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
  • Language Ambiguity and Nuance
    • Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
  • Factual Accuracy
    • Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
  • Common Sense
    • Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.
+ +

Ethical Considerations and Risks

+ +

The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

+ +
  • Bias and Fairness
    • VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases.
  • Misinformation and Misuse
    • VLMs can be misused to generate text that is false, misleading, or harmful.
    • Guidelines are provided for responsible use with the model, see the Responsible Generative AI Toolkit.
  • Transparency and Accountability
    • This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
    • A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem.
+ +

Risks identified and mitigations:

+ +
    +
  • Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
  • Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided.
  • Privacy violations: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
  • Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases.
+ +

Benefits

+ +

At the time of release, this family of models provides high-performance open +vision-language model implementations designed from the ground up for +responsible AI development compared to similarly sized models.

+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tooling/google-official/docs/ai-google-dev_prompt_formatting_gemma4.html b/tooling/google-official/docs/ai-google-dev_prompt_formatting_gemma4.html new file mode 100644 index 0000000..3cb3c53 --- /dev/null +++ b/tooling/google-official/docs/ai-google-dev_prompt_formatting_gemma4.html @@ -0,0 +1,4873 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Gemma 4 Prompt Formatting  |  Google AI for Developers + + + + + + + + + + + + + + + + + + +

+ Gemma 4 Prompt Formatting + + +


Starting with Gemma 4, we introduce new control tokens. For Gemma 3 and lower, +see the previous document.

+ +

The following sections specify the control tokens used by Gemma 4 and their use +cases. Note that the control tokens are reserved in and specific to our +tokenizer.

+ +
    +
  • Token to indicate a system instruction: system
  • Token to indicate a user turn: user
  • Token to indicate a model turn: model
  • Token to indicate the beginning of a dialogue turn: <|turn>
  • Token to indicate the end of a dialogue turn: <turn|>
+ +

Here's an example dialogue:

+
<|turn>system
+You are a helpful assistant.<turn|>
+<|turn>user
+Hello.<turn|>
+
+
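The turn structure above can be sketched as a small string builder. This is a minimal illustration, not an official API: `format_turn`, `build_prompt`, and the `add_generation_prompt` flag are hypothetical names, and the newline placed after each closing `<turn|>` and after the opening `<|turn>model` is an assumption based on the examples in this document. Anything the tokenizer adds on its own is out of scope here.

```python
# Sketch only: helper names are hypothetical, not part of any Gemma library.
TURN_START = "<|turn>"
TURN_END = "<turn|>"
ROLES = {"system", "user", "model"}


def format_turn(role: str, content: str) -> str:
    """Wrap one message in the Gemma 4 turn delimiters."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return f"{TURN_START}{role}\n{content}{TURN_END}\n"


def build_prompt(messages: list[tuple[str, str]],
                 add_generation_prompt: bool = True) -> str:
    """Concatenate turns and optionally open a model turn for generation."""
    prompt = "".join(format_turn(role, content) for role, content in messages)
    if add_generation_prompt:
        # Assumption: generation starts right after the role line.
        prompt += f"{TURN_START}model\n"
    return prompt


prompt = build_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Hello."),
])
print(prompt)
```

In practice the chat template shipped with the tokenizer should be preferred; a hand-rolled builder like this is mainly useful for inspecting or testing the raw format.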

Multi-modalities

Multimodal Token          Purpose
<|image> <image|>         Indicate image embeddings
<|audio> <audio|>         Indicate audio embeddings
<|image|> <|audio|>       Special placeholder tokens
+ +

We use two special placeholder tokens (<|image|> and <|audio|>) to specify +where image and audio tokens should be inserted. After tokenization, these +tokens are replaced by the actual soft embeddings inside the model.

+ +

Here is an example dialogue:

+
prompt = """<|turn>user
+Describe this image: <|image|>
+
+And translate these audio:
+
+a. <|audio|>
+b. <|audio|><turn|>
+<|turn>model"""
+
+
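Since each placeholder marks where one media item is inserted, it can help to check a prompt before sending it. The sketch below assumes one placeholder per image or audio clip, which is how the example above reads but is not stated as a rule; `check_placeholders` is a hypothetical helper.

```python
# Sketch only: assumes exactly one placeholder per media item.
IMAGE_PLACEHOLDER = "<|image|>"
AUDIO_PLACEHOLDER = "<|audio|>"


def check_placeholders(prompt: str, num_images: int, num_audio: int) -> None:
    """Raise ValueError if placeholder counts differ from the media supplied."""
    found_images = prompt.count(IMAGE_PLACEHOLDER)
    found_audio = prompt.count(AUDIO_PLACEHOLDER)
    if found_images != num_images or found_audio != num_audio:
        raise ValueError(
            f"prompt has {found_images} image and {found_audio} audio "
            f"placeholders, but {num_images} images and {num_audio} "
            f"audio clips were supplied"
        )


prompt = (
    "<|turn>user\n"
    "Describe this image: <|image|>\n\n"
    "And translate these audio:\n\n"
    "a. <|audio|>\n"
    "b. <|audio|><turn|>\n"
    "<|turn>model"
)
check_placeholders(prompt, num_images=1, num_audio=2)  # passes silently
```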

Agentic and Reasoning Control Tokens

+ +

To support agentic workflows, Gemma uses specialized control tokens that +delineate internal reasoning (thinking) from external actions (function +calling). These tokens allow the model to process complex logic before providing +a final response or interacting with outside tools.

+ +

Function Calling

+ +

Gemma 4 is trained on six special tokens to manage the "tool use" lifecycle.

Token Pair                             Purpose
<|tool> <tool|>                        Defines a tool
<|tool_call> <tool_call|>              Indicates a model's request to use a tool.
<|tool_response> <tool_response|>      Provides a tool's execution result back to the model.
+ +

Delimiter for String Values: <|"|>

+ +

A single token, <|"|>, is used as a delimiter for all string values +within the structured data blocks.

+ +
    +
  • Purpose: This token ensures that any special characters (such as {, }, ,, or quotes) inside a string are treated as literal text and not as part of the data structure's underlying syntax.
  • Usage: All string literals in your function declarations, calls, and responses must be enclosed using this token (e.g., key:<|"|>string value<|"|>).
+ +
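The delimiter rule can be sketched as a serializer for a flat set of fields. This is an assumption-laden illustration: `format_value` and `format_fields` are hypothetical helpers, only strings and plain numbers are handled because those are the only value types shown in this document, and nested structures or booleans are deliberately rejected rather than guessed at.

```python
# Sketch only: covers the value types that appear in this document's examples.
STRING_DELIM = '<|"|>'


def format_value(value) -> str:
    """Render one value; strings are wrapped in the string delimiter token."""
    if isinstance(value, bool):
        # bool is a subclass of int; its wire format is not shown here.
        raise TypeError("boolean formatting is not covered by this sketch")
    if isinstance(value, str):
        return f"{STRING_DELIM}{value}{STRING_DELIM}"
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def format_fields(fields: dict) -> str:
    """Render a flat dict as {key:value,key:value}."""
    body = ",".join(f"{key}:{format_value(value)}" for key, value in fields.items())
    return "{" + body + "}"


print(format_fields({"location": "London"}))
```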

Thinking Mode

+ +

To activate thinking mode, include the <|think|> control token within the +system instruction.

Control Token             Purpose
<|think|>                 Activates thinking mode
<|channel> <channel|>     Indicates a model's internal process.
+ +

Here is an example dialogue:

+
<|turn>system
+<|think|><turn|>
+<|turn>user
+What is the water formula?<turn|>
+<|turn>model
+<|channel>thought
+...
+<channel|>The most common interpretation of "the water formula" refers...<turn|>
+
+

Thinking mode is designed to be enabled at the conversation level. This should +be consolidated into a single system turn alongside your other system +instructions, such as tool definitions.

+ +
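A minimal sketch of consolidating the thinking token, the instructions, and any tool definitions into one system turn. `build_system_turn` is a hypothetical helper, and tool declarations are passed in already rendered because the body of a declaration is not spelled out in this document.

```python
# Sketch only: tool declaration strings are supplied by the caller as-is.
def build_system_turn(instructions: str = "",
                      tool_declarations: tuple[str, ...] = (),
                      thinking: bool = False) -> str:
    """Build a single system turn holding all conversation-level settings."""
    parts = []
    if thinking:
        parts.append("<|think|>")  # enables thinking for the conversation
    parts.append(instructions)
    for declaration in tool_declarations:
        parts.append(f"<|tool>{declaration}<tool|>")
    return "<|turn>system\n" + "".join(parts) + "<turn|>\n"


system_turn = build_system_turn(
    instructions="You are a helpful assistant.",
    tool_declarations=("declaration:get_current_weather{...}",),  # body elided
    thinking=True,
)
print(system_turn)
```

Keeping everything in one builder makes it harder to emit a second system turn by accident when thinking is toggled.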

Reasoning and Function Calling Example

+ +

In an agentic turn, the model may "think" privately before deciding to call a +function. The lifecycle follows this sequence:

+ +
    +
  1. User Inquiry: The user asks a question.
  2. Internal Reasoning: The model thinks privately in the thought channel.
  3. Tool Request: The model halts generation to request a tool call.
  4. Execution & Injection: The application executes the tool and appends the response.
  5. Final Response: The model reads the response and generates the final answer.
+ +

The following example demonstrates a model using a weather tool:

+
<|turn>system
+<|think|>You are a helpful assistant.<|tool>declaration:get_current_weather{...}<tool|><turn|>
+<|turn>user
+What's the temperature in London?<turn|>
+<|turn>model
+<|channel>thought
+...
+<channel|><|tool_call>call:get_current_weather{location:<|"|>London<|"|>}<tool_call|><|tool_response>
+
+

Your application should parse the model's response to extract the function name +and arguments, execute the function, and then append the tool_calls and +tool_responses to the chat history under the assistant role.

+
<|turn>model
+<|tool_call>call:get_current_weather{location:<|"|>London<|"|>}<tool_call|><|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
+
+
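The parse-and-execute step can be sketched with two regular expressions. This is a rough illustration under stated assumptions: `parse_tool_call`, `TOOLS`, and the stub tool are hypothetical, only flat arguments (delimited strings and bare numbers) are handled, and a production parser would need to cover whatever nested structures the model can emit.

```python
import re

# Sketch only: handles one flat tool call with string or numeric arguments.
TOOL_CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)
FIELD_RE = re.compile(r'(\w+):(?:<\|"\|>(.*?)<\|"\|>|([^,{}]+))', re.DOTALL)


def parse_bare(text: str):
    """Convert an undelimited value to int or float when possible."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_tool_call(model_output: str):
    """Return (function_name, arguments) or None if no call is present."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    arguments = {}
    for field in FIELD_RE.finditer(body):
        key, string_value, bare_value = field.groups()
        # string_value is None only when the bare-value branch matched.
        arguments[key] = string_value if string_value is not None else parse_bare(bare_value)
    return name, arguments


def get_current_weather(location: str) -> dict:
    # Stub tool: a real application would query a weather service here.
    return {"temperature": 15, "weather": "sunny"}


TOOLS = {"get_current_weather": get_current_weather}

output = '<|tool_call>call:get_current_weather{location:<|"|>London<|"|>}<tool_call|><|tool_response>'
name, arguments = parse_tool_call(output)
result = TOOLS[name](**arguments)
print(name, arguments, result)
```

The result dictionary is what the application would then render inside the `<|tool_response>` block before letting the model continue.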

Finally, Gemma reads the tool response and replies to the user.

+
The temperature in London is 15 degrees and it is sunny.<turn|>
+
+

Here is the complete JSON chat history for this example:

+
[
+  {
+    "role": "system",
+    "content": "You are a helpful assistant."
+  },
+  {
+    "role": "user",
+    "content": "What's the temperature in London?"
+  },
+  {
+    "role": "assistant",
+    "tool_calls": [
+      {
+        "function": {
+          "name": "get_current_weather",
+          "arguments": {
+            "location": "London"
+          }
+        }
+      }
+    ],
+    "tool_responses": [
+      {
+        "name": "get_current_weather",
+        "response": {
+          "temperature": 15,
+          "weather": "sunny"
+        }
+      }
+    ],
+    "content": "The temperature in London is 15 degrees and it is sunny."
+  }
+]
+
+

Managing Thought Context Between Turns

+ +

Properly managing the model's generated thoughts is critical for maintaining +performance across multi-turn conversations.

+ +
    +
  • Standard Multi-Turn Conversations: You must remove (strip) the model's generated thoughts from the previous turn before passing the conversation history back to the model for the next turn. If you want to disable thinking mode mid-conversation, you can remove the <|think|> token when you strip the previous thoughts.
  • Function Calling (Exception): If a single model turn involves function or tool calls, thoughts must NOT be removed between the function calls.
+ +
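Stripping thoughts from completed turns can be sketched as follows. The helper names and the list-of-dicts history shape are assumptions for illustration; the regular expression relies only on the `<|channel>thought ... <channel|>` layout shown in the examples in this document.

```python
import re

# Sketch only: removes thought channels from finished model turns.
THOUGHT_RE = re.compile(r"<\|channel>thought\n.*?<channel\|>", re.DOTALL)


def strip_thoughts(model_text: str) -> str:
    """Remove every thought channel block from a model turn's text."""
    return THOUGHT_RE.sub("", model_text)


def strip_history(history: list[dict]) -> list[dict]:
    """Return a copy of the history with thoughts removed from model turns.

    Call this only on completed turns. Per the exception above, a model turn
    that is still in the middle of tool calls should keep its thoughts.
    """
    cleaned = []
    for message in history:
        if message["role"] == "model":
            message = {**message, "content": strip_thoughts(message["content"])}
        cleaned.append(message)
    return cleaned


history = [
    {"role": "user", "content": "What is the water formula?"},
    {"role": "model", "content": "<|channel>thought\n...\n<channel|>Water is H2O."},
]
print(strip_history(history))
```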

Agentic Workflows and Long-Running Tasks

+ +

Because raw thoughts are stripped between standard turns, developers building +long-running agents may want to retain reasoning context to prevent the model +from entering cyclical reasoning loops.

+ +
    +
  • Summarizing Thoughts: A highly recommended inference technique is to extract, summarize, and feed the model's previous thoughts back into the context window as standard text.
  • Formatting Constraints: Because Gemma 4 was not explicitly trained with raw thoughts included in the prompt (outside of the specific tool-call scenario mentioned above), there is no strict or specific format expected by the model for these injected thoughts. You have the flexibility to format summarized reasoning in whatever way best suits your specific agentic architecture.
+ +

Integration Notes

+ +
    +
  • Internal State: The <|channel> and <channel|> tokens are typically used for Chain-of-Thought (CoT) processing. In standard user-facing applications, this content is usually hidden from the end-user.
  • Tool Loop: The tool_call and tool_response tokens facilitate a "handshake" between the model and your application environment. The application intercepts the tool_call, executes the underlying code, and feeds the result back to the model within the tool_response tokens.
  • Model Behavior: Larger models (e.g., gemma-4-26B-A4B-it, gemma-4-31B-it) may occasionally generate a thought channel even when thinking mode is explicitly turned off. To stabilize model behavior in these edge cases, consider adding an empty thinking token to the prompt.
+ +

Tip: Fine-Tuning Big Models with No-Thinking Datasets

+ +

When fine-tuning larger Gemma models with a dataset that does not include +thinking, you can achieve better results by adding the empty channel to your +training prompts:

+
<|turn>model
+<|channel>thought
+<channel|>
+
+
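As a rough sketch, a no-thinking dataset could be rendered with the empty channel like this. `format_training_example` is a hypothetical helper, and placing the answer immediately after `<channel|>` follows the thinking-mode example earlier in this document rather than a documented training recipe.

```python
# Sketch only: prefixes each target with an empty thought channel.
EMPTY_THOUGHT = "<|channel>thought\n<channel|>"


def format_training_example(user_text: str, answer: str) -> str:
    """Render one user/model pair with an empty thought channel."""
    return (
        f"<|turn>user\n{user_text}<turn|>\n"
        f"<|turn>model\n{EMPTY_THOUGHT}{answer}<turn|>\n"
    )


print(format_training_example("Hello.", "Hi there!"))
```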

Tip: Adaptive Thought Efficiency using System Instructions

+ +

While "thinking" in Gemma 4 is officially supported as an ON or OFF boolean +feature, the model has exceptionally strong instruction-following capabilities +that allow you to modulate its thinking behavior dynamically.

+ +

Rather than relying on a hardcoded framework parameter for "high" or "low" +thinking, you can use System Instructions (SI) to guide the model into a reduced +thinking mode. By explicitly instructing the model to think efficiently or at a +lower depth (a concept we refer to as a "LOW" thinking instruction), you can +achieve adaptive thought efficiency.

+ +
    +
  • Reduced Cost: Testing has shown that applying a "LOW" thinking System Instruction can reduce the number of thinking tokens generated by approximately 20%.
  • Proof of Concept: Because this behavior is a byproduct of the model's instructability rather than a specifically trained feature, there is no single "perfect" prompt. The "LOW" instruction is a proof of concept.
  • Customization: We highly encourage developers to experiment with their own custom System Instructions. You can tune the depth, length, and style of the model's thinking process to balance latency, cost, and output quality for your specific use cases.
+ + + + +
+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tooling/google-official/docs/blog_announcement.html b/tooling/google-official/docs/blog_announcement.html new file mode 100644 index 0000000..404e76b --- /dev/null +++ b/tooling/google-official/docs/blog_announcement.html @@ -0,0 +1,7466 @@ + + + + + + + + + + + + Gemma 4: Our most capable open models to date + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Gemma 4: Byte for byte, the most capable open models
+

Open model performance vs size on Arena.ai’s chat arena as of 4/1.


These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. See additional benchmarks in our model card.

+ + + + + diff --git a/tooling/google-official/gemma-cpp/API_SERVER_README.md b/tooling/google-official/gemma-cpp/API_SERVER_README.md new file mode 100644 index 0000000..f7af504 --- /dev/null +++ b/tooling/google-official/gemma-cpp/API_SERVER_README.md @@ -0,0 +1,250 @@ +# Gemma.cpp API Server + +This is an HTTP API server for gemma.cpp that implements the Google API protocol, allowing you to interact with Gemma models through REST API endpoints compatible with the Google API format. + +## Features + +- **API-compatible**: Implements Google API endpoints +- **Unified client/server**: Single codebase supports both local and public API modes +- **Text generation**: Support for `generateContent` endpoint +- **Streaming support**: Server-Sent Events (SSE) for `streamGenerateContent` +- **Model management**: Support for `/v1beta/models` endpoint +- **Session management**: Maintains conversation context with KV cache +- **JSON responses**: All responses in Google API format +- **Error handling**: Proper HTTP status codes and error messages + +## Building + +The API server is built alongside the main gemma.cpp project: + +```bash +# Configure the build +cmake -B build -DCMAKE_BUILD_TYPE=Release + +# Build the API server and client +cmake --build build --target gemma_api_server gemma_api_client -j 8 +``` + +The binaries will be created at: +- `build/gemma_api_server` - Local API server +- `build/gemma_api_client` - Unified client for both local and public APIs + +## Usage + +### Starting the Local API Server + +```bash +./build/gemma_api_server \ + --tokenizer path/to/tokenizer.spm \ + --weights path/to/model.sbs \ + --port 8080 +``` + +**Required arguments:** +- `--tokenizer`: Path to the tokenizer file (`.spm`) +- `--weights`: Path to the model weights file (`.sbs`) + +**Optional arguments:** +- `--port`: Port to listen on (default: 8080) +- `--model`: Model name for API endpoints (default: gemma3-4b) + +### Using the Unified Client + +#### With Local Server +```bash +# 
Interactive chat with local server +./build/gemma_api_client --interactive 1 --host localhost --port 8080 + +# Single prompt with local server +./build/gemma_api_client --prompt "Hello, how are you?" +``` + +#### With Public Google API +```bash +# Set API key and use public API +export GOOGLE_API_KEY="your-api-key-here" +./build/gemma_api_client --interactive 1 + +# Or pass API key directly +./build/gemma_api_client --api_key "your-api-key" --interactive 1 +``` + +## API Endpoints + +The server implements Google API endpoints: + +### 1. Generate Content - `POST /v1beta/models/gemma3-4b:generateContent` + +Generate a response for given content (non-streaming). + +**Request:** +```json +{ + "contents": [ + { + "parts": [ + {"text": "Why is the sky blue?"} + ] + } + ], + "generationConfig": { + "temperature": 0.9, + "topK": 1, + "maxOutputTokens": 1024 + } +} +``` + +**Response:** +```json +{ + "candidates": [ + { + "content": { + "parts": [ + {"text": "The sky appears blue because..."} + ], + "role": "model" + }, + "finishReason": "STOP", + "index": 0 + } + ], + "promptFeedback": { + "safetyRatings": [] + }, + "usageMetadata": { + "promptTokenCount": 5, + "candidatesTokenCount": 25, + "totalTokenCount": 30 + } +} +``` + +### 2. Stream Generate Content - `POST /v1beta/models/gemma3-4b:streamGenerateContent` + +Generate a response with Server-Sent Events (SSE) streaming. + +**Request:** Same as above + +**Response:** Stream of SSE events: +``` +data: {"candidates":[{"content":{"parts":[{"text":"The"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}} + +data: {"candidates":[{"content":{"parts":[{"text":" sky"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}} + +data: [DONE] +``` + +### 3. List Models - `GET /v1beta/models` + +List available models. 
+ +**Response:** +```json +{ + "models": [ + { + "name": "models/gemma3-4b", + "displayName": "Gemma3 4B", + "description": "Gemma3 4B model running locally" + } + ] +} +``` + +## Example Usage + +### Using curl with Local Server + +```bash +# Generate content (non-streaming) +curl -X POST http://localhost:8080/v1beta/models/gemma3-4b:generateContent \ + -H "Content-Type: application/json" \ + -d '{ + "contents": [{"parts": [{"text": "Hello, how are you?"}]}], + "generationConfig": {"temperature": 0.9, "topK": 1, "maxOutputTokens": 1024} + }' + +# Stream generate content (SSE) +curl -X POST http://localhost:8080/v1beta/models/gemma3-4b:streamGenerateContent \ + -H "Content-Type: application/json" \ + -d '{ + "contents": [{"parts": [{"text": "Tell me a story"}]}], + "generationConfig": {"temperature": 0.9, "topK": 1, "maxOutputTokens": 1024} + }' + +# List models +curl http://localhost:8080/v1beta/models +``` + +### Multi-turn Conversation with curl + +```bash +# First message +curl -X POST http://localhost:8080/v1beta/models/gemma3-4b:generateContent \ + -H "Content-Type: application/json" \ + -d '{ + "contents": [ + {"parts": [{"text": "Hi, my name is Alice"}]} + ] + }' + +# Follow-up message with conversation history +curl -X POST http://localhost:8080/v1beta/models/gemma3-4b:generateContent \ + -H "Content-Type: application/json" \ + -d '{ + "contents": [ + {"parts": [{"text": "Hi, my name is Alice"}]}, + {"parts": [{"text": "Hello Alice! 
Nice to meet you."}]}, + {"parts": [{"text": "What is my name?"}]} + ] + }' +``` + +### Using Python + +```python +import requests + +# Generate content +response = requests.post('http://localhost:8080/v1beta/models/gemma3-4b:generateContent', + json={ + 'contents': [{'parts': [{'text': 'Explain quantum computing in simple terms'}]}], + 'generationConfig': { + 'temperature': 0.9, + 'topK': 1, + 'maxOutputTokens': 1024 + } + } +) + +result = response.json() +if 'candidates' in result and result['candidates']: + text = result['candidates'][0]['content']['parts'][0]['text'] + print(text) +``` + +## Configuration Options + +The Google API supports various generation configuration options: + +- **temperature**: Controls randomness (0.0 to 2.0, default: 1.0) +- **topK**: Top-K sampling parameter (default: 1) +- **maxOutputTokens**: Maximum number of tokens to generate (default: 8192) + +## Key Features + +- **Unified Implementation**: Same codebase handles both local server and public API +- **Session Management**: Maintains conversation context using KV cache +- **Streaming Support**: Real-time token generation via Server-Sent Events +- **Error Handling**: Comprehensive error responses and HTTP status codes +- **Memory Efficient**: Optimized token processing and caching + +## Compatibility + +This implementation is compatible with: +- Google API format and endpoints +- Standard HTTP clients (curl, browsers, Python requests, etc.) +- Server-Sent Events (SSE) for streaming responses +- JSON request/response format diff --git a/tooling/google-official/gemma-cpp/README.md b/tooling/google-official/gemma-cpp/README.md new file mode 100644 index 0000000..6294920 --- /dev/null +++ b/tooling/google-official/gemma-cpp/README.md @@ -0,0 +1,532 @@ +# gemma.cpp + +gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma +foundation models from Google. + +For additional information about Gemma, see +[ai.google.dev/gemma](https://ai.google.dev/gemma). 
Model weights, including +gemma.cpp specific artifacts, are +[available on kaggle](https://www.kaggle.com/models/google/gemma-2). + +## Who is this project for? + +Modern LLM inference engines are sophisticated systems, often with bespoke +capabilities extending beyond traditional neural network runtimes. With this +comes opportunities for research and innovation through co-design of high level +algorithms and low-level computation. However, there is a gap between +deployment-oriented C++ inference runtimes, which are not designed for +experimentation, and Python-centric ML research frameworks, which abstract away +low-level computation through compilation. + +gemma.cpp provides a minimalist implementation of Gemma-2, Gemma-3, and +PaliGemma-2 models, focusing on simplicity and directness rather than full +generality. This is inspired by vertically-integrated model implementations such +as [ggml](https://github.com/ggerganov/ggml), +[llama.c](https://github.com/karpathy/llama2.c), and +[llama.rs](https://github.com/srush/llama2.rs). + +gemma.cpp targets experimentation and research use cases. It is intended to be +straightforward to embed in other projects with minimal dependencies and also +easily modifiable with a small ~2K LoC core implementation (along with ~4K LoC +of supporting utilities). We use the [Google +Highway](https://github.com/google/highway) Library to take advantage of +portable SIMD for CPU inference. + +For production-oriented edge deployments we recommend standard deployment +pathways using Python frameworks like JAX, Keras, PyTorch, and Transformers +([all model variations here](https://www.kaggle.com/models/google/gemma)). + +## Contributing + +Community contributions large and small are welcome. See +[DEVELOPERS.md](https://github.com/google/gemma.cpp/blob/main/DEVELOPERS.md) +for additional notes contributing developers and [join the discord by following +this invite link](https://discord.gg/H5jCBAWxAe). 
This project follows +[Google's Open Source Community +Guidelines](https://opensource.google.com/conduct/). + +> [!NOTE] Active development is currently done on the `dev` branch. Please open +> pull requests targeting `dev` branch instead of `main`, which is intended to +> be more stable. + +## What's inside? + +- LLM + + - CPU-only inference for: Gemma 2-3, PaliGemma 2. + - Sampling with TopK and temperature. + - Backward pass (VJP) and Adam optimizer for Gemma research. + +- Optimizations + + - Mixed-precision (fp8, bf16, fp32, fp64 bit) GEMM: + - Designed for BF16 instructions, can efficiently emulate them. + - Automatic runtime autotuning 7 parameters per matrix shape. + - Weight compression integrated directly into GEMM: + - Custom fp8 format with 2..3 mantissa bits; tensor scaling. + - Also bf16, f32 and non-uniform 4-bit (NUQ); easy to add new formats. + +- Infrastructure + + - SIMD: single implementation via Highway. Chooses ISA at runtime. + - Tensor parallelism: CCX-aware, multi-socket thread pool. + - Disk I/O: memory map or parallel read (heuristic with user override). + - Custom format with forward/backward-compatible metadata serialization. + - Model conversion from Safetensors, not yet open sourced. + - Portability: Linux, Windows/OS X supported. CMake/Bazel. 'Any' CPU. + +- Frontends + + - C++ APIs with streaming for single query and batched inference. + - Basic interactive command-line app. + - Basic Python bindings (pybind11). + +## Quick Start + +### System requirements + +Before starting, you should have installed: + +- [CMake](https://cmake.org/) +- [Clang C++ compiler](https://clang.llvm.org/get_started.html), supporting at + least C++17. +- `tar` for extracting archives from Kaggle. + +Building natively on Windows requires the Visual Studio 2022 Build Tools with the +optional Clang/LLVM C++ frontend (`clang-cl`). 
This can be installed from the +command line with +[`winget`](https://learn.microsoft.com/en-us/windows/package-manager/winget/): + +```sh +winget install --id Kitware.CMake +winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--passive --wait --add Microsoft.VisualStudio.Workload.VCTools;installRecommended --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset" +``` + +### Step 1: Obtain model weights and tokenizer from Kaggle or Hugging Face Hub + +Visit the +[Kaggle page for Gemma-2](https://www.kaggle.com/models/google/gemma-2/gemmaCpp) +and select `Model Variations |> Gemma C++`. + +On this tab, the `Variation` dropdown includes the options below. Note bfloat16 +weights are higher fidelity, while 8-bit switched floating point weights enable +faster inference. In general, we recommend starting with the `-sfp` checkpoints. + +> [!NOTE] **Important**: We strongly recommend starting off with the +> `gemma2-2b-it-sfp` model to get up and running. + +Gemma 2 models are named `gemma2-2b-it` for 2B and `9b-it` or `27b-it`. See the +`ModelPrefix` function in `configs.cc`. + +### Step 2: Extract Files + +After filling out the consent form, the download should proceed to retrieve a +tar archive file `archive.tar.gz`. Extract files from `archive.tar.gz` (this can +take a few minutes): + +``` +tar -xf archive.tar.gz +``` + +This should produce a file containing model weights such as `2b-it-sfp.sbs` and +a tokenizer file (`tokenizer.spm`). You may want to move these files to a +convenient directory location (e.g. the `build/` directory in this repo). + +### Step 3: Build + +The build system uses [CMake](https://cmake.org/). To build the gemma inference +runtime, create a build directory and generate the build files using `cmake` +from the top-level project directory. 
Note that if you previously ran `cmake` and are +re-running with a different setting, be sure to delete all files in the `build/` +directory with `rm -rf build/*`. + +#### Unix-like Platforms +```sh +cmake -B build +``` + +After running `cmake`, you can enter the `build/` directory and run `make` to +build the `./gemma` executable: + +```sh +# Configure `build` directory +cmake --preset make + +# Build project using make +cmake --build --preset make -j [number of parallel threads to use] +``` + +Replace `[number of parallel threads to use]` with a number - the number of +cores available on your system is a reasonable heuristic. For example, `make -j4 +gemma` will build using 4 threads. If the `nproc` command is available, you can +use `make -j$(nproc) gemma` as a reasonable default for the number of threads. + +If you aren't sure of the right value for the `-j` flag, you can simply run +`make gemma` instead and it should still build the `./gemma` executable. + +> [!NOTE] +> Windows Subsystem for Linux (WSL) users should set the number of +> parallel threads to 1. Using a larger number may result in errors. + +If the build is successful, you should now have a `gemma` executable in the +`build/` directory. + +#### Windows + +```sh +# Configure `build` directory +cmake --preset windows + +# Build project using Visual Studio Build Tools +cmake --build --preset windows -j [number of parallel threads to use] +``` + +If the build is successful, you should now have a `gemma.exe` executable in the +`build/` directory. + +#### Bazel + +```sh +bazel build -c opt --cxxopt=-std=c++20 :gemma +``` + +If the build is successful, you should now have a `gemma` executable in the +`bazel-bin/` directory. + +#### Make + +If you prefer Makefiles, @jart has made one available here: + +https://github.com/jart/gemma3/blob/main/Makefile + +### Step 4: Run + +You can now run `gemma` from inside the `build/` directory. 
+ +`gemma` has the following required arguments: + +Argument | Description | Example value +------------- | ---------------------------- | --------------- +`--weights` | The compressed weights file. | `2b-it-sfp.sbs` +`--tokenizer` | The tokenizer file. | `tokenizer.spm` + +Example invocation for the following configuration: + +- weights file `gemma2-2b-it-sfp.sbs` (Gemma2 2B instruction-tuned model, + 8-bit switched floating point). +- Tokenizer file `tokenizer.spm` (can omit for single-format weights files + created after 2025-05-06, or output by migrate_weights.cc). + +```sh +./gemma \ +--tokenizer tokenizer.spm --weights gemma2-2b-it-sfp.sbs +``` + +### PaliGemma Vision-Language Model + +This repository includes a version of the PaliGemma 2 VLM +([paper](https://arxiv.org/abs/2412.03555)). We provide a C++ implementation of +the PaliGemma 2 model here. + +To use the version of PaliGemma included in this repository, build the gemma +binary as noted above in Step 3. Download the compressed weights and tokenizer +from +[Kaggle](https://www.kaggle.com/models/google/paligemma-2/gemmaCpp/paligemma2-3b-mix-224) +and run the binary as follows: + +```sh +./gemma \ +--tokenizer paligemma_tokenizer.model \ +--weights paligemma2-3b-mix-224-sfp.sbs \ +--image_file paligemma/testdata/image.ppm +``` + +Note that the image reading code is very basic to avoid depending on an image +processing library for now. We currently only support reading binary PPMs (P6). +So use a tool like `convert` to first convert your images into that format, e.g. + +`convert image.jpeg -resize 224x224^ image.ppm` + +(As the image will be resized for processing anyway, we can already resize at +this stage for slightly faster loading.) + +The interaction with the image (using the mix-224 checkpoint) may then look +something like this: + +``` +> Describe the image briefly +A large building with two towers in the middle of a city. +> What type of building is it? +church +> What color is the church? 
+gray +> caption image +A large building with two towers stands tall on the water's edge. The building +has a brown roof and a window on the side. A tree stands in front of the +building, and a flag waves proudly from its top. The water is calm and blue, +reflecting the sky above. A bridge crosses the water, and a red and white boat +rests on its surface. The building has a window on the side, and a flag on top. +A tall tree stands in front of the building, and a window on the building is +visible from the water. The water is green, and the sky is blue. +``` + +### Migrating to single-file format + +There is now a new format for the weights file, which is a single file that +allows to contain the tokenizer (and the model type) directly. A tool to migrate +from the multi-file format to the single-file format is available. + +```sh +io/migrate_weights \ + --tokenizer .../tokenizer.spm --weights .../gemma2-2b-it-sfp.sbs \ + --output_weights .../gemma2-2b-it-sfp-single.sbs +``` + +After migration, you can omit the tokenizer argument like this: + +```sh +./gemma --weights .../gemma2-2b-it-sfp-single.sbs +``` + +### Troubleshooting and FAQs + +**Problems building in Windows / Visual Studio** + +Currently if you're using Windows, we recommend building in WSL (Windows +Subsystem for Linux). We are exploring options to enable other build +configurations, see issues for active discussion. + +**Model does not respond to instructions and produces strange output** + +A common issue is that you are using a pre-trained model, which is not +instruction-tuned and thus does not respond to instructions. Make sure you are +using an instruction-tuned model (`gemma2-2b-it-sfp`) and not a pre-trained +model (any model with a `-pt` suffix). + +**What sequence lengths are supported?** + +See `max_seq_len` in `configs.cc` and `InferenceArgs.seq_len`. For the Gemma 3 +models larger than 1B, this is typically 32K but 128K would also work given +enough RAM. 
Note that long sequences will be slow due to the quadratic cost of +attention. + +**How do I convert my fine-tune to a `.sbs` compressed model file?** + +For PaliGemma 2 checkpoints, you can use python/convert_from_safetensors.py to +convert from safetensors format (tested with building via bazel). For an adapter +model, you will likely need to call merge_and_unload() to convert the adapter +model to a single-file format before converting it. + +Here is how to use it using a bazel build of the compression library assuming +locally installed (venv) torch, numpy, safetensors, absl-py, etc.: + +```sh +bazel build //compression/python:compression +BAZEL_OUTPUT_DIR="${PWD}/bazel-bin/compression" +python3 -c "import site; print(site.getsitepackages())" +# Use your sites-packages file here: +ln -s $BAZEL_OUTPUT_DIR [...]/site-packages/compression +python3 python/convert_from_safetensors.py --load_path [...].safetensors.index.json +``` + +**What are some easy ways to make the model run faster?** + +1. Make sure you are using the 8-bit switched floating point `-sfp` models. + These are half the size of bf16 and thus use less memory bandwidth and cache + space. +2. Due to auto-tuning, the second and especially third query will be faster. +3. If you're on a laptop, make sure power mode is set to maximize performance + and saving mode is **off**. For most laptops, the power saving modes get + activated automatically if the computer is not plugged in. +4. Close other unused cpu-intensive applications. +5. On macs, anecdotally we observe a "warm-up" ramp-up in speed as performance + cores get engaged. + +We're also working on algorithmic and optimization approaches for faster +inference, stay tuned. + +## Usage + +`gemma` has different usage modes, controlled by the verbosity flag. + +All usage modes are currently interactive, triggering text generation upon +newline input. 
+ +| Verbosity | Usage mode | Details | +| --------------- | ---------- | --------------------------------------------- | +| `--verbosity 0` | Minimal | Only prints generation output. Suitable as a CLI tool. | +| `--verbosity 1` | Default | Standard user-facing terminal UI. | +| `--verbosity 2` | Detailed | Shows additional developer and debug info. | + +### Interactive Terminal App + +By default, verbosity is set to 1, bringing up a terminal-based interactive +interface when `gemma` is invoked: + +```sh +$ ./gemma [...] + __ _ ___ _ __ ___ _ __ ___ __ _ ___ _ __ _ __ + / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \ +| (_| | __/ | | | | | | | | | | (_| || (__| |_) | |_) | + \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/ + __/ | | | | | + |___/ |_| |_| + +... + +*Usage* + Enter an instruction and press enter (%C reset conversation, %Q quits). + +*Examples* + - Write an email to grandma thanking her for the cookies. + - What are some historical attractions to visit around Massachusetts? + - Compute the nth fibonacci number in javascript. + - Write a standup comedy bit about WebGPU programming. + +> What are some outdoorsy places to visit around Boston? + +[ Reading prompt ] ..................... + + +**Boston Harbor and Islands:** + +* **Boston Harbor Islands National and State Park:** Explore pristine beaches, wildlife, and maritime history. +* **Charles River Esplanade:** Enjoy scenic views of the harbor and city skyline. +* **Boston Harbor Cruise Company:** Take a relaxing harbor cruise and admire the city from a different perspective. +* **Seaport Village:** Visit a charming waterfront area with shops, restaurants, and a seaport museum. + +**Forest and Nature:** + +* **Forest Park:** Hike through a scenic forest with diverse wildlife. +* **Quabbin Reservoir:** Enjoy boating, fishing, and hiking in a scenic setting. +* **Mount Forest:** Explore a mountain with breathtaking views of the city and surrounding landscape. + +... 
+``` + +### Usage as a Command Line Tool + +To use the `gemma` executable as a command line tool, it may be useful to +create an alias for gemma.cpp with arguments fully specified: + +```sh +alias gemma2b="~/gemma.cpp/build/gemma -- --tokenizer ~/gemma.cpp/build/tokenizer.spm --weights ~/gemma.cpp/build/gemma2-2b-it-sfp.sbs --verbosity 0" +``` + +Replace the above paths with your own paths to the model and tokenizer +from the download. + +Here is an example of prompting `gemma` with a truncated input +file (using a `gemma2b` alias as defined above): + +```sh +cat configs.h | tail -n 35 | tr '\n' ' ' | xargs -0 echo "What does this C++ code do: " | gemma2b +``` + +> [!NOTE] +> CLI usage of gemma.cpp is experimental and should take context length +> limitations into account. + +The output of the above command should look like: + +```sh +[ Reading prompt ] [...] +This C++ code snippet defines a set of **constants** used in a large language model (LLM) implementation, likely related to the **attention mechanism**. + +Let's break down the code: +[...] +``` + +### Incorporating gemma.cpp as a Library in your Project + +The easiest way to incorporate gemma.cpp in your own project is to pull in +gemma.cpp and dependencies using `FetchContent`. You can add the following to +your CMakeLists.txt: + +``` +include(FetchContent) + +FetchContent_Declare(sentencepiece GIT_REPOSITORY https://github.com/google/sentencepiece GIT_TAG 53de76561cfc149d3c01037f0595669ad32a5e7c) +FetchContent_MakeAvailable(sentencepiece) + +FetchContent_Declare(gemma GIT_REPOSITORY https://github.com/google/gemma.cpp GIT_TAG origin/main) +FetchContent_MakeAvailable(gemma) + +FetchContent_Declare(highway GIT_REPOSITORY https://github.com/google/highway.git GIT_TAG 2a16a50ff61071bb25ddef0ce35d92b0e2b9c579) +FetchContent_MakeAvailable(highway) +``` + +Note that for the gemma.cpp `GIT_TAG`, you may replace `origin/main` with a specific +commit hash if you would like to pin the library version.
+ +After your executable is defined (substitute your executable name for +`[Executable Name]` below): + +``` +target_link_libraries([Executable Name] libgemma hwy hwy_contrib sentencepiece) +FetchContent_GetProperties(gemma) +FetchContent_GetProperties(sentencepiece) +target_include_directories([Executable Name] PRIVATE ${gemma_SOURCE_DIR}) +target_include_directories([Executable Name] PRIVATE ${sentencepiece_SOURCE_DIR}) +``` + +### Building gemma.cpp as a Library + +gemma.cpp can also be used as a library dependency in your own project. The +shared library artifact can be built by modifying the make invocation to build +the `libgemma` target instead of `gemma`. + +> [!NOTE] +> If you are using gemma.cpp in your own project with the `FetchContent` steps +> in the previous section, building the library is done automatically by `cmake` +> and this section can be skipped. + +First, run `cmake`: + +```sh +cmake -B build +``` + +Then, run `make` with the `libgemma` target: + +```sh +cd build +make -j [number of parallel threads to use] libgemma +``` + +If this is successful, you should now have a `libgemma` library file in the +`build/` directory. On Unix platforms, the filename is `libgemma.a`. + +## Independent Projects Using gemma.cpp + +Some independent projects using gemma.cpp: + +- [gemma-cpp-python - Python bindings](https://github.com/namtranase/gemma-cpp-python) +- [lua-cgemma - Lua bindings](https://github.com/ufownl/lua-cgemma) +- [Godot engine demo project](https://github.com/Rliop913/Gemma-godot-demo-project) + +If you would like to have your project included, feel free to get in touch or +submit a PR with a `README.md` edit. + +## Acknowledgements and Contacts + +gemma.cpp was started in fall 2023 by +[Austin Huang](mailto:austinvhuang@google.com) and +[Jan Wassenberg](mailto:janwas@google.com), and subsequently released February +2024 thanks to contributions from Phil Culliton, Paul Chang, and Dan Zheng. 
+ +Griffin support was implemented in April 2024 thanks to contributions by Andrey +Mikhaylov, Eugene Kliuchnikov, Jan Wassenberg, Jyrki Alakuijala, Lode +Vandevenne, Luca Versari, Martin Bruse, Phil Culliton, Sami Boukortt, Thomas +Fischbacher and Zoltan Szabadka. It was removed in 2025-09. + +Gemma-2 support was implemented in June/July 2024 with the help of several +people. + +PaliGemma support was implemented in September 2024 with contributions from +Daniel Keysers. + +[Jan Wassenberg](mailto:janwas@google.com) has continued to contribute many +improvements, including major gains in efficiency, since the initial release. + +This is not an officially supported Google product. diff --git a/tooling/google-official/gemma-cpp/examples_README.md b/tooling/google-official/gemma-cpp/examples_README.md new file mode 100644 index 0000000..87eb54d --- /dev/null +++ b/tooling/google-official/gemma-cpp/examples_README.md @@ -0,0 +1,7 @@ +# Examples + +In this directory are some simple examples illustrating usage of `gemma.cpp` as +a library beyond the interactive `gemma` app implemented in `run.cc`. + +- `hello_world/` - minimal/template project for using `gemma.cpp` as a library. + It sets up the model state and generates text for a single hard coded prompt. diff --git a/tooling/google-official/gemma-pytorch/README.md b/tooling/google-official/gemma-pytorch/README.md new file mode 100644 index 0000000..20344c5 --- /dev/null +++ b/tooling/google-official/gemma-pytorch/README.md @@ -0,0 +1,186 @@ +# Gemma in PyTorch + +**Gemma** is a family of lightweight, state-of-the-art open models built from research and technology used to create Google Gemini models. They include both text-only and multimodal decoder-only large language models, with open weights, pre-trained variants, and instruction-tuned variants.
For more details, please check out the following links: + + * [Gemma on Google AI](https://ai.google.dev/gemma) + * [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma-3) + * [Gemma on Vertex AI Model Garden](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/gemma3) + +This is the official PyTorch implementation of Gemma models. We provide model and inference implementations using both PyTorch and PyTorch/XLA, and support running inference on CPU, GPU and TPU. + +## Updates + + * [March 12th, 2025 🔥] Support Gemma v3. You can find the checkpoints [on Kaggle](https://www.kaggle.com/models/google/gemma-3/pytorch) and [Hugging Face](https://huggingface.co/models?other=gemma_torch) + + * [June 26th, 2024] Support Gemma v2. You can find the checkpoints [on Kaggle](https://www.kaggle.com/models/google/gemma-2/pytorch) and Hugging Face + + * [April 9th, 2024] Support CodeGemma. You can find the checkpoints [on Kaggle](https://www.kaggle.com/models/google/codegemma/pytorch) and [Hugging Face](https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11) + + * [April 5, 2024] Support Gemma v1.1. You can find the v1.1 checkpoints [on Kaggle](https://www.kaggle.com/models/google/gemma/frameworks/pyTorch) and [Hugging Face](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b). + +## Download Gemma model checkpoint + +You can find the model checkpoints on Kaggle: + +- [Gemma 3](https://www.kaggle.com/models/google/gemma-3/pyTorch) +- [Gemma 2](https://www.kaggle.com/models/google/gemma-2/pyTorch) +- [Gemma](https://www.kaggle.com/models/google/gemma/pyTorch) + +Alternatively, you can find the model checkpoints on the Hugging Face Hub [here](https://huggingface.co/models?other=gemma_torch). To download the models, go to the model repository of the model of interest and click the `Files and versions` tab, and download the model and tokenizer files.
For programmatic downloading, if you have `huggingface_hub` installed, you can also run: + +``` +huggingface-cli download google/gemma-3-4b-it-pytorch +``` + +The following model sizes are available: + +- **Gemma 3**: + - **Text only**: 1b + - **Multimodal**: 4b, 12b, 27b_v3 +- **Gemma 2**: + - **Text only**: 2b-v2, 9b, 27b +- **Gemma**: + - **Text only**: 2b, 7b + + +Note that you can choose between the 1B, 4B, 12B, and 27B variants. + +``` +VARIANT=<1b, 2b, 2b-v2, 4b, 7b, 9b, 12b, 27b, 27b_v3> +CKPT_PATH=<Insert ckpt path here> +``` + +## Try it free on Colab + +Follow the steps at +[https://ai.google.dev/gemma/docs/pytorch_gemma](https://ai.google.dev/gemma/docs/pytorch_gemma). + +## Try it out with PyTorch + +Prerequisite: make sure you have set up Docker permissions properly as a non-root user. + +```bash +sudo usermod -aG docker $USER +newgrp docker +``` + +### Build the docker image. + +```bash +DOCKER_URI=gemma:${USER} + +docker build -f docker/Dockerfile ./ -t ${DOCKER_URI} +``` + +### Run Gemma inference on CPU. + +> NOTE: This is a multimodal example. Use a multimodal variant. + +```bash +docker run -t --rm \ + -v ${CKPT_PATH}:/tmp/ckpt \ + ${DOCKER_URI} \ + python scripts/run_multimodal.py \ + --ckpt=/tmp/ckpt \ + --variant="${VARIANT}" \ + # add `--quant` for the int8 quantized model. +``` + +### Run Gemma inference on GPU. + +> NOTE: This is a multimodal example. Use a multimodal variant. + +```bash +docker run -t --rm \ + --gpus all \ + -v ${CKPT_PATH}:/tmp/ckpt \ + ${DOCKER_URI} \ + python scripts/run_multimodal.py \ + --device=cuda \ + --ckpt=/tmp/ckpt \ + --variant="${VARIANT}" + # add `--quant` for the int8 quantized model. +``` + +## Try it out with PyTorch/XLA + +### Build the docker image (CPU, TPU). + +```bash +DOCKER_URI=gemma_xla:${USER} + +docker build -f docker/xla.Dockerfile ./ -t ${DOCKER_URI} +``` + +### Build the docker image (GPU).
+ +```bash +DOCKER_URI=gemma_xla_gpu:${USER} + +docker build -f docker/xla_gpu.Dockerfile ./ -t ${DOCKER_URI} +``` + +### Run Gemma inference on CPU. + +> NOTE: This is a multimodal example. Use a multimodal variant. + +```bash +docker run -t --rm \ + --shm-size 4gb \ + -e PJRT_DEVICE=CPU \ + -v ${CKPT_PATH}:/tmp/ckpt \ + ${DOCKER_URI} \ + python scripts/run_xla.py \ + --ckpt=/tmp/ckpt \ + --variant="${VARIANT}" \ + # add `--quant` for the int8 quantized model. +``` + +### Run Gemma inference on TPU. + +Note: be sure to use the docker container built from `xla.Dockerfile`. + +```bash +docker run -t --rm \ + --shm-size 4gb \ + -e PJRT_DEVICE=TPU \ + -v ${CKPT_PATH}:/tmp/ckpt \ + ${DOCKER_URI} \ + python scripts/run_xla.py \ + --ckpt=/tmp/ckpt \ + --variant="${VARIANT}" \ + # add `--quant` for the int8 quantized model. +``` + +### Run Gemma inference on GPU. + +Note: be sure to use the docker container built from `xla_gpu.Dockerfile`. + +```bash +docker run -t --rm --privileged \ + --shm-size=16g --net=host --gpus all \ + -e USE_CUDA=1 \ + -e PJRT_DEVICE=CUDA \ + -v ${CKPT_PATH}:/tmp/ckpt \ + ${DOCKER_URI} \ + python scripts/run_xla.py \ + --ckpt=/tmp/ckpt \ + --variant="${VARIANT}" \ + # add `--quant` for the int8 quantized model. +``` + +### Tokenizer Notes + +99 unused tokens are reserved in the pretrained tokenizer model to assist with more efficient training/fine-tuning. Unused tokens are in the string format of `<unused[0-97]>` with token id range of `[7-104]`. + +``` +"<unused0>": 7, +"<unused1>": 8, +"<unused2>": 9, +... +"<unused97>": 104, +``` + +## Disclaimer + +This is not an officially supported Google product. diff --git a/tooling/google-official/gemma-pytorch/run.py b/tooling/google-official/gemma-pytorch/run.py new file mode 100644 index 0000000..e1f93e5 --- /dev/null +++ b/tooling/google-official/gemma-pytorch/run.py @@ -0,0 +1,107 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import contextlib +import random + +import numpy as np +import torch +from absl import app, flags + +from gemma import config +from gemma import model as gemma_model + +# Define flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('ckpt', None, 'Path to the checkpoint file.', required=True) +flags.DEFINE_string('variant', '4b', 'Model variant.') +flags.DEFINE_string('device', 'cpu', 'Device to run the model on.') +flags.DEFINE_integer('output_len', 10, 'Length of the output sequence.') +flags.DEFINE_integer('seed', 12345, 'Random seed.') +flags.DEFINE_boolean('quant', False, 'Whether to use quantization.') +flags.DEFINE_string('prompt', 'What are large language models?', 'Input prompt for the model.') + +# Define valid text only model variants +_VALID_MODEL_VARIANTS = ['2b', '2b-v2', '7b', '9b', '27b', '1b'] + +# Define valid devices +_VALID_DEVICES = ['cpu', 'cuda'] + +# Validator function for the 'variant' flag +def validate_variant(variant): + if variant not in _VALID_MODEL_VARIANTS: + raise ValueError(f'Invalid variant: {variant}. Valid variants are: {_VALID_MODEL_VARIANTS}') + return True + +# Validator function for the 'device' flag +def validate_device(device): + if device not in _VALID_DEVICES: + raise ValueError(f'Invalid device: {device}. 
Valid devices are: {_VALID_DEVICES}') + return True + +# Register the validator for the 'variant' flag +flags.register_validator('variant', validate_variant, message='Invalid model variant.') + +# Register the validator for the 'device' flag +flags.register_validator('device', validate_device, message='Invalid device.') + +@contextlib.contextmanager +def _set_default_tensor_type(dtype: torch.dtype): + """Sets the default torch dtype to the given dtype.""" + torch.set_default_dtype(dtype) + yield + torch.set_default_dtype(torch.float) + +def main(_): + # Construct the model config. + model_config = config.get_model_config(FLAGS.variant) + model_config.dtype = "float32" + model_config.quant = FLAGS.quant + + # Seed random. + random.seed(FLAGS.seed) + np.random.seed(FLAGS.seed) + torch.manual_seed(FLAGS.seed) + + # Create the model and load the weights. + device = torch.device(FLAGS.device) + with _set_default_tensor_type(model_config.get_dtype()): + model = gemma_model.GemmaForCausalLM(model_config) + model.load_weights(FLAGS.ckpt) + model = model.to(device).eval() + print("Model loading done") + + # Generate the response. + result = model.generate(FLAGS.prompt, device, output_len=FLAGS.output_len) + + # Print the prompts and results. + print('======================================') + print(f'PROMPT: {FLAGS.prompt}') + print(f'RESULT: {result}') + print('======================================') + +if __name__ == "__main__": + app.run(main) + + +# How to run this script: + +# Example command (replace with your actual paths and values): +# python scripts/run.py --device=cpu --ckpt=/path/to/your/pytorch_checkpoint/model.ckpt --output_len=2 --prompt="The name of the capital of Italy is" +# Important: +# - Replace '/path/to/your/pytorch_checkpoint/model.ckpt' with the actual path to your checkpoint file. +# - Choose the correct --variant (model size). +# - Use --device=cuda if you have a GPU; otherwise, use --device=cpu. 
\ No newline at end of file diff --git a/tooling/google-official/gemma-pytorch/run_multimodal.py b/tooling/google-official/gemma-pytorch/run_multimodal.py new file mode 100644 index 0000000..231e340 --- /dev/null +++ b/tooling/google-official/gemma-pytorch/run_multimodal.py @@ -0,0 +1,197 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import contextlib +import random + +from absl import app +from absl import flags +import numpy as np +from PIL import Image +import torch + +from gemma import config +from gemma import gemma3_model + +# Define flags +FLAGS = flags.FLAGS + +_CKPT = flags.DEFINE_string( + 'ckpt', None, 'Path to the checkpoint file.', required=True +) +_VARIANT = flags.DEFINE_string('variant', '4b', 'Model variant.') +_DEVICE = flags.DEFINE_string('device', 'cpu', 'Device to run the model on.') +_OUTPUT_LEN = flags.DEFINE_integer( + 'output_len', 10, 'Length of the output sequence.' +) +_SEED = flags.DEFINE_integer('seed', 12345, 'Random seed.') +_QUANT = flags.DEFINE_boolean('quant', False, 'Whether to use quantization.') + +# Define valid multimodal model variants +_VALID_MODEL_VARIANTS = ['4b', '12b', '27b_v3'] + +# Define valid devices +_VALID_DEVICES = ['cpu', 'cuda'] + + +# Validator function for the 'variant' flag +def validate_variant(variant): + if variant not in _VALID_MODEL_VARIANTS: + raise ValueError( + f'Invalid variant: {variant}. 
Valid variants are:' + f' {_VALID_MODEL_VARIANTS}' + ) + return True + + +# Validator function for the 'device' flag +def validate_device(device): + if device not in _VALID_DEVICES: + raise ValueError( + f'Invalid device: {device}. Valid devices are: {_VALID_DEVICES}' + ) + return True + + +# Register the validator for the 'variant' flag +flags.register_validator( + 'variant', validate_variant, message='Invalid model variant.' +) + +# Register the validator for the 'device' flag +flags.register_validator('device', validate_device, message='Invalid device.') + + +@contextlib.contextmanager +def _set_default_tensor_type(dtype: torch.dtype): + """Sets the default torch dtype to the given dtype.""" + torch.set_default_dtype(dtype) + yield + torch.set_default_dtype(torch.float) + + +def main(_): + # Construct the model config. + model_config = config.get_model_config(_VARIANT.value) + model_config.dtype = 'float32' + model_config.quant = _QUANT.value + image_paths = {"cow_in_beach": "scripts/images/cow_in_beach.jpg", + "lilly": "scripts/images/lilly.jpg", + "sunflower": "scripts/images/sunflower.JPG", + 'golden_test_image': ( + 'scripts/images/test_image.jpg' + ), + } + + image = {} + for key in image_paths: + try: + image[key] = Image.open(image_paths[key]) # Open local file + image[key].show() + except IOError as e: + print(f"Error loading image: {e}") + exit() + + # Seed random. + random.seed(_SEED.value) + np.random.seed(_SEED.value) + torch.manual_seed(_SEED.value) + + # Create the model and load the weights. + device = torch.device(_DEVICE.value) + with _set_default_tensor_type(model_config.get_dtype()): + model = gemma3_model.Gemma3ForMultimodalLM(model_config) + model.load_state_dict(torch.load(_CKPT.value)['model_state_dict']) + # model.load_weights(_CKPT.value) + model = model.to(device).eval() + print('Model loading done') + + # Generate text only. 
+ result = model.generate( + [ + [ + '<start_of_turn>user The capital of Italy' + ' is?<end_of_turn>\n<start_of_turn>model' + ], + [ + '<start_of_turn>user What is your' + ' purpose?<end_of_turn>\n<start_of_turn>model' + ], + ], + device, + output_len=_OUTPUT_LEN.value, + ) + + # Print the results. + print('======================================') + print(f'Text only RESULT: {result}') + print('======================================') + + # Generate golden Gemax test image. + result = model.generate( + [[ + '<start_of_turn>user\n', + image['golden_test_image'], + 'Caption this image. <end_of_turn>\n<start_of_turn>model', + ]], + device, + output_len=_OUTPUT_LEN.value, + ) + + # Print the result. + print('======================================') + print(f'Golden test image RESULT: {result}') + print('======================================') + + # Generate text and image. + result = model.generate( + [[ + '<start_of_turn>user\n', + image['cow_in_beach'], + ( + 'The name of the animal in the image is' + ' <end_of_turn>\n<start_of_turn>model' + ), + ]], + device, + output_len=_OUTPUT_LEN.value, + ) + + # Print the result. + print('======================================') + print(f'Single image RESULT: {result}') + print('======================================') + + # Generate interleave text and multiple images. + result = model.generate( + [[ + '<start_of_turn>user\nThis image', + image['lilly'], + 'and this image', + image['sunflower'], + 'are similar because? <end_of_turn>\n<start_of_turn>model', + ]], + device, + output_len=_OUTPUT_LEN.value, + ) + + # Print the result. + print('======================================') + print(f'Interleave images RESULT: {result}') + print('======================================') + + +if __name__ == '__main__': + app.run(main) diff --git a/tooling/google-official/gemma-pytorch/run_xla.py b/tooling/google-official/gemma-pytorch/run_xla.py new file mode 100644 index 0000000..1881e63 --- /dev/null +++ b/tooling/google-official/gemma-pytorch/run_xla.py @@ -0,0 +1,267 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import contextlib +import os +import random +import socket +import sys +from typing import List, Union + +import numpy as np +import torch +import torch.multiprocessing + +from gemma.config import GemmaConfig, get_model_config +from gemma.model_xla import GemmaForCausalLM +from gemma.tokenizer import Tokenizer +import gemma.xla_model_parallel as xla_model_parallel + +USE_CUDA = os.environ.get('USE_CUDA', False) +if not USE_CUDA: + import torch_xla.core.xla_model as xm + import torch_xla.distributed.xla_multiprocessing as xmp +else: + # Choose an available port. 
+ with contextlib.closing(socket.socket(socket.AF_INET, + socket.SOCK_STREAM)) as s: + s.bind(('', 0)) + s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) + MASTER_PORT = str(s.getsockname()[1]) + + +@contextlib.contextmanager +def _set_default_tensor_type(dtype: torch.dtype): + """Sets the default torch dtype to the given dtype.""" + torch.set_default_dtype(dtype) + yield + torch.set_default_dtype(torch.float) + + +def generate( + i: int, + model_config: GemmaConfig, + ckpt_path: str, + prompts: List[str], + output_lens: List[int], + temperatures: Union[List[float], None], + top_ps: List[float], + top_ks: List[int], + seed: int +): + random.seed(seed) + np.random.seed(seed) + torch.manual_seed(seed) + if USE_CUDA: + os.environ['MASTER_ADDR'] = '127.0.0.1' + os.environ['MASTER_PORT'] = MASTER_PORT + if not torch.distributed.is_initialized(): + torch.distributed.init_process_group( + "nccl", + rank=int(os.environ.get("RANK", 0)), + world_size=int(os.environ.get("WORLD_SIZE", 1))) + xla_model_parallel.set_g_group() + local_rank = int(os.environ.get("LOCAL_RANK", 0)) + device = torch.device("cuda", local_rank) + torch.cuda.set_device(local_rank) + else: + device = xm.xla_device() + xm.set_rng_state(seed, device) + + rank = xla_model_parallel.get_model_parallel_rank() + world_size = xla_model_parallel.get_model_parallel_world_size() + if rank > 0: + sys.stdout = open(os.devnull, 'w') + + # build, load and compile model. + with _set_default_tensor_type(model_config.get_dtype()): + model = GemmaForCausalLM(model_config, world_size, rank, device) + model.load_weights(ckpt_path) + model = model.to(device).eval() + + # create tokenizer. 
+ tokenizer = Tokenizer(model_config.tokenizer) + + prompt_tokens = [tokenizer.encode(prompt) for prompt in prompts] + min_prompt_len = min(len(p) for p in prompt_tokens) + + batch_size = len(prompts) + if temperatures is not None: + assert batch_size == len(temperatures) + assert batch_size == len(top_ps) + assert batch_size == len(top_ks) + max_seq_len = max([len(p) + o for p, o in zip(prompt_tokens, output_lens)]) + assert max_seq_len <= model_config.max_position_embeddings + if model_config.num_key_value_heads < world_size: + assert world_size % model_config.num_key_value_heads == 0 + n_local_heads = 1 + else: + assert model_config.num_key_value_heads % world_size == 0 + n_local_heads = model_config.num_key_value_heads // world_size + + # build KV caches + kv_caches = [] + for _ in range(model_config.num_hidden_layers): + k_cache = torch.zeros( + size=(batch_size, max_seq_len, n_local_heads, + model_config.head_dim), + dtype=model_config.get_dtype(), + device=device, + ) + v_cache = torch.zeros( + size=(batch_size, max_seq_len, n_local_heads, + model_config.head_dim), + dtype=model_config.get_dtype(), + device=device, + ) + kv_caches.append((k_cache, v_cache)) + + # prepare inputs + token_ids_tensor = torch.full((batch_size, max_seq_len), + tokenizer.pad_id, + dtype=torch.int64) + input_token_ids_tensor = torch.full((batch_size, min_prompt_len), + tokenizer.pad_id, + dtype=torch.int64) + for i, p in enumerate(prompt_tokens): + token_ids_tensor[i, :len(p)] = torch.tensor(p) + input_token_ids_tensor[i, :min_prompt_len] = torch.tensor( + p[:min_prompt_len]) + token_ids_tensor = token_ids_tensor.to(device) + prompt_mask_tensor = token_ids_tensor != tokenizer.pad_id + input_token_ids_tensor = input_token_ids_tensor.to(device) + input_positions_tensor = torch.arange(0, min_prompt_len, + dtype=torch.int64).to(device) + mask_tensor = torch.full((1, 1, max_seq_len, max_seq_len), + -2.3819763e38).to(torch.float) + mask_tensor = torch.triu(mask_tensor, 
diagonal=1).to(device) + curr_mask_tensor = mask_tensor.index_select(2, input_positions_tensor) + output_positions_tensor = torch.LongTensor([min_prompt_len - 1]).to(device) + temperatures_tensor = None if not temperatures else torch.FloatTensor(temperatures).to(device) + top_ps_tensor = torch.FloatTensor(top_ps).to(device) + top_ks_tensor = torch.LongTensor(top_ks).to(device) + output_index = torch.tensor(min_prompt_len, dtype=torch.int64).to(device) + if not USE_CUDA: + xm.mark_step() + + # Prefill up to min_prompt_len tokens, then treat other prefill as decode and ignore output. + for i in range(max_seq_len - min_prompt_len): + next_token_ids, _ = model( + input_token_ids=input_token_ids_tensor, + input_positions=input_positions_tensor, + kv_write_indices=None, + kv_caches=kv_caches, + mask=curr_mask_tensor, + output_positions=output_positions_tensor, + temperatures=temperatures_tensor, + top_ps=top_ps_tensor, + top_ks=top_ks_tensor, + ) + curr_prompt_mask = prompt_mask_tensor.index_select( + 1, output_index).squeeze(dim=1) + curr_token_ids = token_ids_tensor.index_select( + 1, output_index).squeeze(dim=1) + output_token_ids = torch.where(curr_prompt_mask, curr_token_ids, + next_token_ids).unsqueeze(dim=1) + token_ids_tensor.index_copy_(1, output_index, output_token_ids) + + input_token_ids_tensor = output_token_ids + input_positions_tensor = output_index.unsqueeze(dim=-1) + curr_mask_tensor = mask_tensor.index_select(2, input_positions_tensor) + output_positions_tensor = torch.tensor(0, dtype=torch.int64).to(device) + output_index = output_index + 1 + if not USE_CUDA: + xm.mark_step() + + # Detokenization. 
+ token_ids = token_ids_tensor.tolist() + results = [] + for i, tokens in enumerate(token_ids): + trimmed_output = tokens[len(prompt_tokens[i]):len(prompt_tokens[i]) + + output_lens[i]] + if tokenizer.eos_id in trimmed_output: + eos_index = trimmed_output.index(tokenizer.eos_id) + trimmed_output = trimmed_output[:eos_index] + results.append(tokenizer.decode(trimmed_output)) + + for prompt, result in zip(prompts, results): + print('======================================') + print(f'PROMPT: {prompt}') + print(f'RESULT: {result}') + print('======================================') + + +def main(args): + model_config = get_model_config(args.variant) + model_config.quant = args.quant + + prompts = [args.prompt] + n = len(prompts) + output_lengths = [args.output_len] * n + temperatures = [0.95] * n + top_ps = [1.0] * n + top_ks = [100] * n + + if USE_CUDA: + os.environ['MASTER_ADDR'] = '127.0.0.1' + os.environ['MASTER_PORT'] = MASTER_PORT + if not torch.distributed.is_initialized(): + torch.distributed.init_process_group( + "nccl", + rank=int(os.environ.get("RANK", 0)), + world_size=int(os.environ.get("WORLD_SIZE", 1))) + xla_model_parallel.set_g_group() + torch.multiprocessing.spawn( + generate, + args=( + model_config, + args.ckpt, + prompts, + output_lengths, + temperatures, + top_ps, + top_ks, + args.seed, + ), + ) + else: + xmp.spawn( + generate, + args=( + model_config, + args.ckpt, + prompts, + output_lengths, + temperatures, + top_ps, + top_ks, + args.seed, + ), + ) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument("--ckpt", type=str, required=True) + parser.add_argument("--variant", + type=str, + default="2b", + choices=["2b", "2b-v2", "7b", "9b", "27b"]) + parser.add_argument("--output_len", type=int, default=4) + parser.add_argument("--seed", type=int, default=12345) + parser.add_argument("--quant", action='store_true') + parser.add_argument("--prompt", type=str, default="The meaning of life is") + args = 
parser.parse_args() + + main(args) diff --git a/tooling/google-official/tech-report/Gemma3Report.pdf b/tooling/google-official/tech-report/Gemma3Report.pdf new file mode 100644 index 0000000..2339e44 Binary files /dev/null and b/tooling/google-official/tech-report/Gemma3Report.pdf differ diff --git a/tooling/huggingface/README.md b/tooling/huggingface/README.md new file mode 100644 index 0000000..300e9f4 --- /dev/null +++ b/tooling/huggingface/README.md @@ -0,0 +1,161 @@ +# Gemma 4 — Hugging Face Canonical Tooling + +Downloaded April 2026. First-party Google/HF content only. No weights, no third-party fine-tunes. + +## What's here + +### `model-cards/` +Verbatim `README.md` from every `google/gemma-4-*` repo (raw endpoint, ungated). Plus the chat template and tokenizer config for two representative variants (31B-it and E4B-it). All eight model cards have identical body text; they differ only in the `pipeline_tag:` YAML frontmatter and size-specific tables. + +| File | What it demonstrates | +|------|----------------------| +| `gemma-4-31B-it-README.md` | Flagship dense (33B) instruction-tuned. Full "how to use" from Google+HF. | +| `gemma-4-31B-README.md` | Base (pretrained) variant of the above. | +| `gemma-4-26B-A4B-it-README.md` | MoE (26B params, 4B active) instruction-tuned. The "A4B" = 4B active. | +| `gemma-4-26B-A4B-README.md` | Base MoE. | +| `gemma-4-E4B-it-README.md` | Edge-sized 8B instruction-tuned. Multimodal including audio. | +| `gemma-4-E4B-README.md` | Base E4B. | +| `gemma-4-E2B-it-README.md` | Smallest (5B) instruction-tuned, mobile-targeted. | +| `gemma-4-E2B-README.md` | Base E2B. | +| `gemma-4-31B-it-chat_template.jinja` | **Canonical chat template.** 16KB Jinja — handles system/user/model/tool roles, thinking channel, tool calls, image/audio/video tokens. | +| `gemma-4-E4B-it-chat_template.jinja` | Near-identical to 31B's (131-byte difference — likely one whitespace-sensitive thing around audio handling). 
| +| `gemma-4-31B-it-tokenizer_config.json` | **Special-token inventory + `response_schema` regex machinery.** See "New capabilities" below. | +| `gemma-4-E4B-it-tokenizer_config.json` | Same shape. | + +### `transformers/` +Files under `src/transformers/models/gemma4/` on `huggingface/transformers@main`. Full files for small ones; outlines (signatures + first 12 lines per class/def) for the two large ones. + +| File | Lines | What | +|------|-------|------| +| `__init__.py` | 33 | Module exports | +| `configuration_gemma4.py` | 352 | `Gemma4Config`, `Gemma4TextConfig`, `Gemma4AudioConfig`, `Gemma4VisionConfig` — all hyperparams | +| `processing_gemma4.py` | 366 | `Gemma4Processor` — the thing `AutoProcessor.from_pretrained` returns. Includes `parse_response()` | +| `feature_extraction_gemma4.py` | 298 | Audio feature extraction (mel spec, padding) | +| `image_processing_gemma4.py` | 220 | Tensor-backed image preprocessing | +| `image_processing_pil_gemma4.py` | 278 | PIL-backed variant (slower fallback) | +| `video_processing_gemma4.py` | 237 | Frame sampling + stitching to image tokens | +| `modeling_gemma4-OUTLINE.py` | 723 | Outline of the 2657-line modeling file (43 classes: attention, MoE, audio encoder, vision tower, all LM heads) | +| `modular_gemma4-OUTLINE.py` | 563 | Outline of the modular source file — shows Gemma4 **inherits from Gemma3n classes** (RMSNorm, attention blocks etc.) confirming the 3n→4 lineage | + +Full files: https://github.com/huggingface/transformers/tree/main/src/transformers/models/gemma4 + +### `recipes/` +From `huggingface/huggingface-gemma-recipes` — the canonical HF recipe repo. The only Gemma 4-specific recipe as of April 2026 is one notebook; the rest is Gemma 3n which is architecturally the parent of Gemma 4. + +| File | What | +|------|------| +| `notebooks/Gemma4_E2B-Multimodal.ipynb` | **The one first-party Gemma-4 recipe.** Original ipynb. 
36 cells: image, video, audio, function calling, object detection with `box_2d`, any-to-any pipeline, captioning. | +| `notebooks/Gemma4_E2B-Multimodal-extracted.py` | Same notebook flattened to readable .py for grep/diff. | +| `scripts/ft_gemma3n_image_trl.py` | TRL SFT fine-tune of Gemma 3n on images. Direct precursor to Gemma 4 SFT. | +| `scripts/ft_gemma3n_image_vt.py` | Vision+text fine-tune without TRL (pure Transformers Trainer). | +| `scripts/ft_gemma3n_audio_vt.py` | Audio+text fine-tune. | +| `scripts/gemma3n_fine_tuning_on_all_modalities.py` | All-modalities SFT script — template for full Gemma-4 all-modal SFT. | +| `scripts/carla_vlm_gemma.py` | CARLA driving sim VLM example using Gemma. | + +### `trl/` +**Empty as of April 2026.** Searched `huggingface/trl/examples/scripts` — only `sft_gemma3.py` and `sft_vlm_gemma3.py` exist, no gemma4 yet. The gemma-recipes repo's `ft_gemma3n_image_trl.py` is the closest first-party TRL pattern; it is saved under `recipes/scripts/` above. + +### `peft/` +**Empty as of April 2026.** `huggingface/peft/examples` has no gemma-specific directory. The canonical HF PEFT guide for Gemma is the blog post `gemma-peft.md`, saved under `blog/` below. It covers Gemma 1 but the LoRA target-module patterns apply unchanged to Gemma 4 (same `q_proj/k_proj/v_proj/o_proj` naming). + +### `blog/` +| File | What | +|------|------| +| `gemma4-blog.md` | **"Welcome Gemma 4: Frontier multimodal intelligence on device"** — the HF launch blog. 764 lines. Authored by merve. Covers architecture, capabilities, transformers usage, HF Inference API, llama.cpp/MLX quantization, thinking mode examples. | +| `gemma-peft-blog.md` | "Fine-Tuning Gemma Models in Hugging Face" — the PEFT/LoRA recipe blog (gemma-agnostic, target modules unchanged for Gemma 4). | + +### `spaces/` +The official HF-run interactive demo Spaces. + +| File | What | +|------|------| +| `huggingface-projects_gemma-4-31b-it-app.py` | Official 31B demo (Gradio 6 chat + multimodal). 
| +| `huggingface-projects_gemma-4-e4b-it-app.py` | Official E4B demo. **More illustrative** — shows the full multimodal+thinking pattern in ~320 lines. | +| `*-requirements.txt` | Pinned deps. **`transformers==5.5.4`** (as of 2026-04-18) — that's the minimum version for Gemma 4 in transformers main line. | + +--- + +## New capabilities the HF integration exposes that weren't in the existing corpus + +1. **`AutoModelForMultimodalLM`** — new transformers AutoClass, not `AutoModelForCausalLM`. Required to get any-to-any routing (text+image+audio+video in, text out). The corpus's `CORPUS_capabilities.md` should note this. + +2. **`processor.parse_response(text) -> dict`** — built into `Gemma4Processor`. Returns `{thinking, content, tool_calls}` parsed from raw decoded output. Driven by regexes declared in `tokenizer_config.json` under `response_schema` (new HF feature using `x-regex`, `x-regex-iterator`, and a custom `x-parser: gemma4-tool-call`). **You no longer need to hand-roll tool-call regex parsing** if you use the HF processor — this is the HF-canonical replacement for the manual parsing done in `CORPUS_tool_calling_format.md`. + +3. **`enable_thinking=True`** — a kwarg to `processor.apply_chat_template()`. When set, injects `<|think|>` at the top of the system turn. **This is how you turn reasoning mode on** through the HF API. Not documented in the existing corpus. + +4. **`load_audio_from_video=True`** — another `apply_chat_template` kwarg. Pulls the audio track out of a video URL and feeds it as audio tokens alongside sampled frames. Only relevant for E2B/E4B which have audio; the notebook comment explicitly calls this out. + +5. **`pipeline("any-to-any", model=...)`** — a new HF pipeline task registered for Gemma 4. Accepts the chat-style messages list directly. Easiest one-liner for multimodal inference. + +6. **Object detection via `box_2d` JSON** — prompting with "What's the bounding box for the X?" 
returns `[{"box_2d": [ymin, xmin, ymax, xmax], "label": "..."}]` in a 1000x1000 normalized coordinate frame, with images resized to multiples of 48 pixels. This is a Gemma-4-specific convention the notebook demonstrates. Corpus doesn't cover this. + +7. **Thinking delimiters are `<|channel>thought...<channel|>`**, not `<think>...</think>` like some other open-weights models. The Space app explicitly strips these to pass to Gradio 6's `reasoning_tags` for collapsible thinking UI. + +8. **Breaking change in role/turn markers vs Gemma 3**: Gemma 3 used `<start_of_turn>user ... <end_of_turn>`. Gemma 4 uses `<|turn>user\n ... <turn|>`. Tokenizer config: + - `sot_token`: `<|turn>` (start of turn) + - `eot_token`: `<turn|>` (end of turn) + - Role after `<|turn>` can be `system`, `user`, `model`, or `tool`. + - `enable_thinking` injects a `<|think|>` marker into the first system turn. + Anything in the homelab that hard-codes `<start_of_turn>` for Gemma needs to branch on family version. Worth adding to `GOTCHAS.md`. + +--- + +## Canonical chat template format + +**Source of truth:** the two `.jinja` files in `model-cards/`. 
Use them directly — **do not reimplement.** The tokenizer loads them automatically: + +```python +from transformers import AutoProcessor +processor = AutoProcessor.from_pretrained("google/gemma-4-E4B-it") +inputs = processor.apply_chat_template( + messages, + tools=[WEATHER_TOOL], # optional; OpenAI-style tool schema + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, # turns on reasoning, injects <|think|> + load_audio_from_video=False, # only for video inputs +) +output = model.generate(**inputs, max_new_tokens=1000) +generated = processor.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True) +result = processor.parse_response(generated) +# → {"thinking": "...", "content": "...", "tool_calls": [...]} +``` + +### Wire format that the template produces + +``` +<|turn>system +<|think|> +{system prompt here if any} +<|tool>declaration:get_weather{city:{type:<|"|>STRING<|"|>,description:<|"|>...<|"|>}} + +<|turn>user +{user text} +<|image|> ← placeholder for each image +<|audio|> ← placeholder for each audio +<|video|> ← placeholder for each video + +<|turn>model +<|channel>thought +{reasoning text} + +<|tool_call>call:get_weather{city:<|"|>London<|"|>} +<|tool_response>response:get_weather{temperature:15} +{final content} + +``` + +Every Gemma-4-specific token appears in `tokenizer_config.json`. The `apply_chat_template` call + the `response_schema` + `parse_response()` round-trip means **homelab code should never hand-emit these tokens** — always go through the processor. 
+ +--- + +## Source URLs (first-party only) + +- Model collection: https://huggingface.co/collections/google/gemma-4 +- transformers gemma4 dir: https://github.com/huggingface/transformers/tree/main/src/transformers/models/gemma4 +- Recipes repo: https://github.com/huggingface/huggingface-gemma-recipes +- Launch blog: https://huggingface.co/blog/gemma4 +- Official 31B Space: https://huggingface.co/spaces/huggingface-projects/gemma-4-31b-it +- Official E4B Space: https://huggingface.co/spaces/huggingface-projects/gemma-4-e4b-it diff --git a/tooling/huggingface/blog/gemma-peft-blog.md b/tooling/huggingface/blog/gemma-peft-blog.md new file mode 100644 index 0000000..a7493fd --- /dev/null +++ b/tooling/huggingface/blog/gemma-peft-blog.md @@ -0,0 +1,207 @@ +--- +title: Fine-Tuning Gemma Models in Hugging Face +thumbnail: /blog/assets/gemma-peft/thumbnail.png +authors: +- user: svaibhav + guest: true +- user: alanwaketan + guest: true +- user: ybelkada +- user: ArthurZ +--- + +# Fine-Tuning Gemma Models in Hugging Face + +We recently announced that [Gemma](https://huggingface.co/blog/gemma), the open weights language model from Google Deepmind, is available for the broader open-source community via Hugging Face. It’s available in 2 billion and 7 billion parameter sizes with pretrained and instruction-tuned flavors. It’s available on Hugging Face, supported in TGI, and easily accessible for deployment and fine-tuning in the Vertex Model Garden and Google Kubernetes Engine. + +
+Gemma Deploy +
+ + + +The Gemma family of models also happens to be well suited for prototyping and experimentation using the free GPU resource available via Colab. In this post we will briefly review how you can do [Parameter-Efficient Fine-Tuning (PEFT)](https://huggingface.co/blog/peft) for Gemma models, using the Hugging Face Transformers and PEFT libraries on GPUs and Cloud TPUs for anyone who wants to fine-tune Gemma models on their own dataset. + + + +## Why PEFT? + +The default (full weight) training for language models, even for modest sizes, tends to be memory and compute-intensive. On one hand, it can be prohibitive for users relying on openly available compute platforms for learning and experimentation, such as Colab or Kaggle. On the other hand, and even for enterprise users, the cost of adapting these models for different domains is an important metric to optimize. PEFT, or parameter-efficient fine-tuning, is a popular technique to accomplish this at low cost. + +## PyTorch on GPU and TPU + +Gemma models in Hugging Face `transformers` are optimized for both PyTorch and PyTorch/XLA. This enables both TPU and GPU users to access and experiment with Gemma models as needed. Together with the Gemma release, we have also improved the [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/) experience for PyTorch/XLA in Hugging Face. This [FSDP via SPMD](https://github.com/pytorch/xla/issues/6379) integration also allows other Hugging Face models to take advantage of TPU acceleration via PyTorch/XLA. In this post, we will focus on PEFT, and more specifically on Low-Rank Adaptation (LoRA), for Gemma models. For a more comprehensive set of LoRA techniques, we encourage readers to review [Scaling Down to Scale Up, from Lialin et al.](https://arxiv.org/pdf/2303.15647.pdf) and [this excellent post](https://pytorch.org/blog/finetune-llms/) by Belkada et al. 
+ +## Low-Rank Adaptation for Large Language Models + +Low-Rank Adaptation (LoRA) is one of the parameter-efficient fine-tuning techniques for large language models (LLMs). It addresses just a fraction of the total number of model parameters to be fine-tuned, by freezing the original model and only training adapter layers that are decomposed into low-rank matrices. The [PEFT library](https://github.com/huggingface/peft) provides an easy abstraction that allows users to select the model layers where adapter weights should be applied. + +```python +from peft import LoraConfig + +lora_config = LoraConfig( + r=8, + target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"], + task_type="CAUSAL_LM", +) +``` + +In this snippet, we refer to all `nn.Linear` layers as the target layers to be adapted. + +In the following example, we will leverage [QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes), from [Dettmers et al.](https://arxiv.org/abs/2305.14314), in order to quantize the base model in 4-bit precision for a more memory efficient fine-tuning protocol. The model can be loaded with QLoRA by first installing the `bitsandbytes` library on your environment, and then passing a `BitsAndBytesConfig` object to `from_pretrained` when loading the model. + +## Before we begin + +In order to access Gemma model artifacts, users are required to accept [the consent form](https://huggingface.co/google/gemma-7b-it). +Now let’s get started with the implementation. + +## Learning to quote + +Assuming that you have submitted the consent form, you can access the model artifacts from the [Hugging Face Hub](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b). + +We start by downloading the model and the tokenizer. We also include a `BitsAndBytesConfig` for weight only quantization. 
+ +```python +import torch +import os +from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig + +model_id = "google/gemma-2b" +bnb_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_quant_type="nf4", + bnb_4bit_compute_dtype=torch.bfloat16 +) + +tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN']) +model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0}, token=os.environ['HF_TOKEN']) +``` + +Now we test the model before starting the finetuning, using a famous quote: + + +```python +text = "Quote: Imagination is more" +device = "cuda:0" +inputs = tokenizer(text, return_tensors="pt").to(device) + +outputs = model.generate(**inputs, max_new_tokens=20) +print(tokenizer.decode(outputs[0], skip_special_tokens=True)) +``` + +The model does a reasonable completion with some extra tokens: +``` +Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world. + +-Albert Einstein + +I +``` + +But this is not exactly the format we would love the answer to be. Let’s see if we can use fine-tuning to teach the model to generate the answer in the following format. + +``` +Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world. + +Author: Albert Einstein +``` + +To begin with, let's select an English quotes dataset [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes). 
+ +```python +from datasets import load_dataset + +data = load_dataset("Abirate/english_quotes") +data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True) +``` + +Now let’s finetune this model using the LoRA config stated above: + +```python +import transformers +from trl import SFTTrainer + +def formatting_func(example): + text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}" + return [text] + +trainer = SFTTrainer( + model=model, + train_dataset=data["train"], + args=transformers.TrainingArguments( + per_device_train_batch_size=1, + gradient_accumulation_steps=4, + warmup_steps=2, + max_steps=10, + learning_rate=2e-4, + fp16=True, + logging_steps=1, + output_dir="outputs", + optim="paged_adamw_8bit" + ), + peft_config=lora_config, + formatting_func=formatting_func, +) +trainer.train() +``` + +Finally, we are ready to test the model once more with the same prompt we have used earlier: + +```python +text = "Quote: Imagination is" +device = "cuda:0" +inputs = tokenizer(text, return_tensors="pt").to(device) + +outputs = model.generate(**inputs, max_new_tokens=20) +print(tokenizer.decode(outputs[0], skip_special_tokens=True)) +``` + +This time we get the response in the format we like: + + +``` +Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world. +Author: Albert Einstein +``` + + +## Accelerate with FSDP via SPMD on TPU + +As mentioned earlier, Hugging Face `transformers` now supports PyTorch/XLA’s latest FSDP implementation. This can greatly accelerate the fine-tuning speed. To enable that, one just needs to add a FSDP config to the `transformers.Trainer`: + +```python +from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments + +# Set up the FSDP config. To enable FSDP via SPMD, set xla_fsdp_v2 to True. 
+fsdp_config = { + "fsdp_transformer_layer_cls_to_wrap": ["GemmaDecoderLayer"], + "xla": True, + "xla_fsdp_v2": True, + "xla_fsdp_grad_ckpt": True +} + +# Finally, set up the trainer and train the model. +trainer = Trainer( + model=model, + train_dataset=data, + args=TrainingArguments( + per_device_train_batch_size=64, # This is actually the global batch size for SPMD. + num_train_epochs=100, + max_steps=-1, + output_dir="./output", + optim="adafactor", + logging_steps=1, + dataloader_drop_last = True, # Required for SPMD. + fsdp="full_shard", + fsdp_config=fsdp_config, + ), + data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False), +) +trainer.train() +``` + +## Next Steps + +We walked through this simple example adapted from the source notebook to illustrate the LoRA finetuning method applied to Gemma models. The full colab for GPU can be found [here](https://huggingface.co/google/gemma-7b/blob/main/examples/notebook_sft_peft.ipynb), and the full script for TPU can be found [here](https://huggingface.co/google/gemma-7b/blob/main/examples/example_fsdp.py). We are excited about the endless possibilities for research and learning thanks to this recent addition to our open source ecosystem. We encourage users to also visit the [Gemma documentation](https://huggingface.co/docs/transformers/v4.38.0/en/model_doc/gemma), as well as our [launch blog](https://huggingface.co/blog/gemma) for more examples to train, finetune and deploy Gemma models. 
+ + diff --git a/tooling/huggingface/blog/gemma4-blog.md b/tooling/huggingface/blog/gemma4-blog.md new file mode 100644 index 0000000..66ea09c --- /dev/null +++ b/tooling/huggingface/blog/gemma4-blog.md @@ -0,0 +1,764 @@ +--- +title: "Welcome Gemma 4: Frontier multimodal intelligence on device" +thumbnail: /blog/assets/gemma4/thumbnail.png +authors: +- user: merve +- user: pcuenq +- user: sergiopaniego +- user: burtenshaw +- user: Steveeeeeeen +- user: alvarobartt +- user: SaylorTwift +--- + +# Welcome Gemma 4: Frontier multimodal intelligence on device + +The Gemma 4 family of multimodal models by Google DeepMind is out on Hugging Face, with support for your favorite agents, inference engines, and fine-tuning libraries 🤗 + +These models are the real deal: truly open with Apache 2 licenses, high quality with pareto frontier arena scores, multimodal including audio, and sizes you can use _everywhere_ including on-device. Gemma 4 builds on advances from previous families and makes them click together. In our tests with pre-release checkpoints we have been impressed by their capabilities, to the extent that we struggled to find good fine-tuning examples because they are _so good_ out of the box. + +We collaborated with Google and the community to make them available everywhere: transformers, llama.cpp, MLX, WebGPU, Rust; you name it. This blog post will show you how to build with [your favorite tools](https://huggingface.co/collections/google/gemma-4) so let us know what you think! 
+ +## Table of Contents + +- [What is New with Gemma 4?](#what-is-new-with-gemma-4) +- [Overview of Capabilities and Architecture](#overview-of-capabilities-and-architecture) + - [Architecture at a Glance](#architecture-at-a-glance) + - [Per-Layer Embeddings (PLE)](#per-layer-embeddings-ple) + - [Shared KV Cache](#shared-kv-cache) +- [Multimodal Capabilities](#multimodal-capabilities) +- [Deploy Anywhere](#deploy-anywhere) + - [transformers](#transformers) + - [Llama.cpp](#llamacpp) + - [Plug in to your local agent](#Plug-in-your-local-agent) + - [transformers.js](#transformersjs) + - [MLX](#mlx) + - [Mistral.rs](#mistralrs) +- [Fine-tuning & Demos](#fine-tuning--demos) + - [Fine-tuning with TRL](#fine-tuning-with-trl) + - [Fine-tuning with TRL on Vertex AI](#fine-tuning-with-trl-on-vertex-ai) + - [Fine-tuning with Unsloth Studio](#fine-tuning-with-unsloth-studio) +- [Try Gemma 4](#try-gemma-4) +- [Benchmark Results](#benchmark-results) +- [Acknowledgements](#acknowledgements) + +# What is new with Gemma 4? + +Similar to Gemma-3n, Gemma 4 supports image, text, and audio inputs, and generates text responses. The text decoder is based on the Gemma model with support for long context windows. The image encoder is similar to the one from Gemma 3 but with two crucial improvements: variable aspect ratios, and configurable number of image token inputs to find your sweet spot between speed, memory, and quality. All models support images (or video) and text inputs, while the small variants (E2B and E4B) support audio as well. 
+ +Gemma 4 comes in four sizes, all base and instruction fine-tuned: + +| Model | Parameter Size | Context Window | Checkpoints | +| :---- | :---- | :---- | :---- | +| Gemma 4 E2B | 2.3B effective, 5.1B with embeddings | 128k | [base](https://huggingface.co/google/gemma-4-E2B), [IT](https://huggingface.co/google/gemma-4-E2B-it) | +| Gemma 4 E4B | 4.5B effective, 8B with embeddings | 128k | [base](https://huggingface.co/google/gemma-4-E4B), [IT](https://huggingface.co/google/gemma-4-E4B-it) | +| Gemma 4 31B | 31B dense model | 256K | [base](https://huggingface.co/google/gemma-4-31B), [IT](https://huggingface.co/google/gemma-4-31B-it) | +| Gemma 4 26B A4B | mixture-of-experts with 4B activated/26B total parameters | 256K | [base](https://huggingface.co/google/gemma-4-26B-A4B), [IT](https://huggingface.co/google/gemma-4-26B-A4B-it) | + +## Overview of Capabilities and Architecture + +Gemma 4 leverages several architecture components used in previous Gemma versions and other open models, and leaves out complex or inconclusive features such as Altup. The combination is a mix designed to be highly compatible across libraries and devices, that can efficiently support long context and agentic use cases, whilst being ideal for quantization. + +As shown in the benchmarks above, this feature mix (combined with the training data and recipe) enables the 31B dense model to achieve an estimated LMArena score (text only) of 1452, while the 26B MoE reaches 1441 with just 4B active parameters 🤯. As we'll see, multimodal operation is comparatively as good as text generation, at least in informal and subjective tests. + +These are the main architecture characteristics in Gemma 4: + +* Alternating **local sliding-window** and **global full-context** attention layers. Smaller dense models use sliding windows of 512 tokens while larger models use 1024 tokens. +* **Dual RoPE** configurations: standard RoPE for sliding layers, pruned RoPE for global layers, to enable longer context. 
+* **Per-Layer Embeddings (PLE)**: a second embedding table that feeds a small residual signal into every decoder layer. +* **Shared KV Cache**: the last N layers of the model reuse key-value states from earlier layers, eliminating redundant KV projections. +* **Vision encoder**: uses learned 2D positions and multidimensional RoPE. Preserves the original aspect ratios and can encode images to a few different token budgets (70, 140, 280, 560, 1120). +* **Audio encoder**: USM-style conformer with the same base architecture as the one in Gemma-3n. + +#### Per-Layer Embeddings (PLE) + +One of the most distinctive features in smaller Gemma 4 models is Per-Layer Embeddings (PLE), which was introduced previously in Gemma-3n. In a standard transformer, each token gets a single embedding vector at input, and the same initial representation is what the residual stream builds on across all layers, forcing the embedding to frontload everything the model might need. PLE adds a parallel, lower-dimensional conditioning pathway alongside the main residual stream. For each token, it produces a small dedicated vector for every layer by combining two signals: a token-identity component (from an embedding lookup) and a context-aware component (from a learned projection of the main embeddings). Each decoder layer then uses its corresponding vector to modulate the hidden states via a lightweight residual block after attention and feed-forward. This gives each layer its own channel to receive token-specific information only when it becomes relevant, rather than requiring everything to be packed into a single upfront embedding. Because the PLE dimension is much smaller than the main hidden size, this adds meaningful per-layer specialization at modest parameter cost. For multimodal inputs (images, audio, video), PLE is computed before soft tokens are merged into the embedding sequence — since PLE relies on token IDs that are lost once multimodal features replace the placeholders. 
Multimodal positions use the pad token ID, effectively receiving neutral per-layer signals. + +#### Shared KV Cache + +The **shared KV cache** is an efficiency optimization that reduces both compute and memory during inference. The last `num_kv_shared_layers` layers of the model don't compute their own key and value projections. Instead, they **reuse** the K and V tensors from the last non-shared layer of the same attention type (sliding or full). + +In practice, this has a minimal impact on quality while being much more efficient (in terms of both memory and compute) for long context generation and on-device use. + +## Multimodal Capabilities + +We saw in our tests that Gemma 4 supports comprehensive multimodal capabilities out of the box. We don't know what was the training mix, but we had success using it for tasks such as OCR, speech-to-text, object detection, or pointing. It also supports text-only and multimodal function calling, reasoning, code completion and correction. + +Here, we show a few inference examples across different model sizes. You can run them conveniently with [this notebook](https://github.com/huggingface/huggingface-gemma-recipes/blob/main/notebooks/Gemma4_(E2B)-Multimodal.ipynb). We encourage you to try the demos and share them below this blog! + +### Object Detection and Pointing + +### GUI detection + +We test Gemma 4 on GUI element detection and pointing across different sizes, with the following image and text prompt: "What's the bounding box for the "view recipe" element in the image?" + +![Image](https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/food_resized.png) + +With this prompt, the model natively responds in JSON format with the detected bounding boxes - no need for specific instructions or grammar-constrained generation. We found the coordinates refer to an image size of 1000x1000, relative to the input dimensions. + +We visualize the outputs below for your convenience. 
We parse the bounding boxes from the returned JSON: ```json\n[\n {"box_2d": [171, 75, 245, 308], "label": "view recipe element"}\n]\n``` + +| E2B | E4B | +| :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e2b.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e4b.png) | +| 26B A4B | 31B | +| ![26B A4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/26b.png) | ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/31b.png) | + +### Object Detection + +We also test the models on everyday objects: here we ask them to detect the bike and compare the outputs of the different models. As in the previous case, we parse the bounding box from the JSON and translate it to image-space coordinates. + +| E2B | E4B | 26B A4B | 31B | +| :---- | :---- | :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/e2b_bike.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/bike_e4b.png) | ![26B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/bike_26b.png) | ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/31b_bike.png) | + +### Multimodal Thinking and Function Calling + +We asked Gemma 4 to write HTML code to reconstruct a page we made with Gemini 3. The code for this is below. We enable thinking and let each model generate up to 4000 new tokens, to be safe. + +| Gemini Generated Website (Reference) | Gemini Reproduced Image | +| :---- | :---- | +| ![Reference](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/landing_page.png) | ![Gemini reproduction](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_repro_gemini.png) | + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + { + "type": "image", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/landing_page.png", + }, + {"type": "text", "text": "Write HTML code for this page."}, + ], + } +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=4000) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
+ +| E2B | E4B | +| :---- | :---- | +| ![E2B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_e2b.png) | ![E4B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_e4b.png) | + +| 31B | 26B/A4B | +| :---- | :---- | +| ![31B](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_31b.png) | ![MoE](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/g4-blog/page_26b.png) | + +### Video Understanding + +Smaller Gemma 4 models can take in videos with audio, while larger ones can take in videos without audio. Although the models are not explicitly post-trained on videos, they can understand videos both with and without audio. The models are particularly strong on audio. + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "video", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4"}, + {"type": "text", "text": "What is happening in the video? What is the song about?"}, + ], + }, +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + load_audio_from_video=True, # disable this for the larger models (no audio input) +).to(model.device) +output = model.generate(**inputs, max_new_tokens=200) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B (hallucinated for the audio) | Based on the video provided, here is an analysis: **What is happening in this video?** The video captures a live musical performance, likely at a large outdoor festival or concert. * **Setting:** The scene is a large stage set up outdoors, indicated by the extensive lighting rig, large speakers, and the audience visible in the foreground. * **Performers:** There are several musicians on stage. The focal point seems to be a lead singer/guitarist in a blue shirt and white pants who is actively performing, playing an acoustic or electric guitar. There are also other musicians, including drummers and bassists, visible in the background. * **Atmosphere:** The atmosphere is energetic and dramatic, characterized by bright stage lights, intense blue and white spotlights, and a significant amount of stage fog/smoke, which adds to the concert mood. * **Audience:** The foreground is filled with the backs of audience members, showing they are close to the stage and engaged in watching the performance. **What is the song about?** The lyrics provided below the video are: > "I don't care 'bout street > From what I cherish > Could it's time to protest | +| E4B ✅ | The video captures a live performance of a band on a stage with bright lights and smoke effects, surrounded by an audience. The band performs and sings to the crowd. The song appears to be about the struggles and disillusionment of modern life, specifically the feeling of being stuck and the difficulties in finding one's place in a chaotic and overwhelming world. Lyrics mention themes of the "concrete street," "broken promises," "storm of the day," and the difficulty of finding oneself amidst societal pressures. | +| 26/A4B (no audio) ✅ | This video shows a live music performance at an outdoor concert or festival at night. A solo male artist, dressed in a blue shirt and white pants, is the central focus. 
He is playing an acoustic guitar and singing into a microphone on a large, lit stage. The performance is accompanied by a band, including a drummer and other musicians visible in the background and on large video screens. The stage is filled with atmospheric blue lighting, smoke, and bright spotlights. The video captures different perspectives, including shots from the crowd looking up at the stage and close-up views of the performers projected onto large screens. The crowd is visible in the foreground, watching the show. One of the screens also displays the "Estrella Damm" brand name. | +| 31B (no audio) ✅ | A musician is performing on stage during a concert, playing an acoustic guitar and singing into a microphone. He is wearing a blue shirt and white pants. The stage is filled with atmospheric smoke and illuminated by bright blue and white lights. Other band members, including a drummer and a keyboardist, are also visible on stage. The video shows the musician from the perspective of the audience, with some shots focusing on him and others showing the large screen on the side of the stage, which displays close-ups of the performer and the crowd. | + +### Captioning + +We have tested all models on captioning. All checkpoints perform very well and accurately capture nuance in complex scenarios. +Here's the image we use, with the prompt "Write single detailed caption for this image." + +![image](https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png) + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png"}, + {"type": "text", "text": "Write single detailed caption for this image."}, + ], + }, +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=512) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B ✅ | A medium shot captures a weathered seagull perched atop a stone pedestal in what appears to be a bustling European square, with a grand, classical-style building featuring ornate columns and architectural details dominating the right side of the frame. In the background, a low, light-colored building stretches across the middle ground, flanked by some green foliage, and hints of other structures, including a terracotta-roofed building, are visible. In the foreground, part of a modern bus stop or shelter is visible, with signage indicating routes, and the overall scene is bathed in bright daylight under a pale sky. | +| E4B ✅ | A solitary seagull perches atop a grey stone pedestal in a bustling European piazza, with the historical grandeur of a classical building dominating the right side of the frame, contrasting with the more modern, low-rise architecture in the background under a bright, hazy sky. | +| 26B/A4B ✅ | A full shot shows a white and brown seagull perched on a gray pillar in Venice, Italy. The seagull faces left, its beak pointed towards the edge of the frame. Behind it is a light gray, two-story building with several dark rectangular windows. To the right of the seagull is a large, white building with columns and intricate carvings. The sky is a clear, light blue. The lighting is bright and even, suggesting it is daytime. | +| 31B ✅ | A high-angle, medium shot shows a white and gray seagull perched on a gray concrete pylon in Venice, Italy. The seagull is facing left, its body angled slightly towards the camera. Its white head, neck, and chest contrast with its gray wings and back. The seagull's orange beak is slightly open. The pylon is a light gray concrete cylinder with a weathered texture. In the background, there are several buildings. To the left is a long, low, white building with a series of small, rectangular windows. In the center is a cluster of green trees. 
To the right is a red building with a tiled roof and several small, rectangular windows. Further right is a large, ornate, white stone building with several columns and sculptures. The sky is a pale blue, and the light is bright and clear. In the lower-left corner, there is a yellow bus with the word "Ferrovia" printed on it in black letters. There are several people walking in the background, but they are small and out of focus. | + +### Audio Question Answering + +These models are trained to answer questions about speech in audio. Music and non-speech sounds were not part of the training data. + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "url": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3"}, + {"type": "text", "text": "Can you describe this audio in detail?"}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = model.generate( + **inputs, + max_new_tokens=1000, + do_sample=False, +) + +print(processor.decode(output[0], skip_special_tokens=True)) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B | This audio is a personal reflection. The speaker is talking about their final farewell address to the nation, which they delivered in Chicago. They express gratitude for the conversations they've had with the American people, noting that despite not having met them face-to-face or even greeted them, these interactions in various settings like living rooms, schools, farms, factory floors, diners, and military outposts have been what has kept them going. | +| E4B | The audio is a speech excerpt where a speaker is delivering a farewell address to the nation from Chicago. The speaker reflects on their time in office, expressing gratitude for the conversations they had with the American people across various settings like living rooms, schools, farms, factories, diners, and military outposts. The tone is reflective and appreciative, highlighting the importance of these interactions in their political journey. | + +Here is an example if you want to do transcription: + +
+Inference code + +```py +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "url": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3"}, + {"type": "text", "text": "Transcribe the audio?"}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = model.generate( + **inputs, + max_new_tokens=1000, + do_sample=False, +) + +print(processor.decode(output[0], skip_special_tokens=True)) +``` + +
+ +| Model | Output | +| :---- | :---- | +| E2B | This week I traveled to Chicago to deliver my final farewell address to the nation following in the tradition of presidents before me It was an opportunity to say thank you whether we've seen eye to eye or rarely agreed at all my conversations with you the American people in living rooms and schools at farms and on factory floors at diners and on distant military outposts all these conversations are what have kept me honest | +| E4B | This week I traveled to Chicago to deliver my final farewell address to the nation following in the tradition of presidents before me. It was an opportunity to say thank you. Whether we've seen eye to eye or rarely agreed at all, my conversations with you, the American people, in living rooms and schools, at farms and on factory floors, at diners and on distant military outposts, all these conversations are what have kept me honest. | + +### Multimodal Function Calling + +We test the model by asking to get the weather in the place shown in the image. + +
+Inference code + +```py +import re +WEATHER_TOOL = { + "type": "function", + "function": { + "name": "get_weather", + "description": "Gets the current weather for a specific location.", + "parameters": { + "type": "object", + "properties": { + "city": {"type": "string", "description": "The city name"}, + }, + "required": ["city"], + }, + }, +} +tools = [WEATHER_TOOL] +messages = [ + {"role": "user", "content": [ + {"type": "text", "text": "What is the city in this image? Check the weather there right now."}, + + {"type": "image", "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg"}, + ]}, +] +inputs = processor.apply_chat_template( + messages, + tools=[WEATHER_TOOL], + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=1000) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) +``` + +
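If you work with the raw decoded string rather than the parsed `result`, the native tool call that ends each output in the table that follows (for example `call:get_weather{city:<|"|>Bangkok<|"|>}`) can be turned into a function name and arguments. The sketch below is a hedged illustration, not the library's own parser: `parse_tool_call` is a hypothetical helper, and it assumes flat `key:value` arguments with optional `<|"|>` string delimiters, which is all that the examples here show.

```python
import re

STRING_DELIM = '<|"|>'  # string delimiter as it appears in the E4B output


def parse_tool_call(text: str):
    """Return (name, args) for the first `call:NAME{...}` in text, or None."""
    match = re.search(r"call:(\w+)\{(.*?)\}", text, flags=re.DOTALL)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    args = {}
    # Assumption: arguments are flat and their values contain no commas.
    for pair in filter(None, (part.strip() for part in body.split(","))):
        key, _, value = pair.partition(":")
        # Values are kept as strings; no type conversion is attempted.
        args[key.strip()] = value.strip().replace(STRING_DELIM, "")
    return name, args


print(parse_tool_call('<|tool_call>call:get_weather{city:<|"|>Bangkok<|"|>}'))
print(parse_tool_call("...external information.)call:get_weather{city:Bangkok}"))
```

Nested objects, lists, or string values containing commas would need a real tokenizer rather than a regular expression, so prefer the parsed response whenever it is available.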
+ +| Model | Output | +| :---- | :---- | +| E2B | **Analyze the Request:** The user has provided an image and asked two questions:\n * "What is the city in this image?" * "Check the weather there right now."\n\n2. **Analyze the Image:** The image displays a large, ornate, tall structure, characteristic of temple architecture, likely in Southeast Asia (given the style). It appears to be a famous landmark. 3. **Identify the Landmark/Location (Internal Knowledge/Visual Cues):** The structure strongly resembles the chedis/stupas found in places like Bangkok, Thailand (e.g., Wat Arun or similar structures).\n\n4. **Determine the City:** Based on the visual evidence, the location is highly likely to be Bangkok, Thailand.\n\n5. **Check Available Tools:** The available tool is `get_weather(city: STRING)`. 6. **Formulate the Plan:**\n * Step 1: Identify the city from the image (Bangkok).\n * Step 2: Use the `get_weather` tool with the identified city.\n\n7. **Construct the Tool Call:** `get_weather(city="Bangkok")`\n\n8. **Generate Response:** The model should first state the identified city and then call the tool to get the weather. (Self-correction: The model *must* call the tool in the correct format if it needs external information.)call:get_weather{city:Bangkok} | +| E4B | '1. **Analyze the Request:** The user has provided an image and asked two things:\n a. Identify the city in the image.\n b. Check the weather in that city right now.\n\n2. **Analyze the Image:** The image shows a very distinctive, ornate tower/spire structure, characteristic of Thai architecture (specifically, a chedi or prang).\n\n3. **Determine City Identification (Internal Knowledge/Visual Recognition):** Based on the architectural style, the structure strongly resembles prominent temples in Thailand, such as Wat Arun in Bangkok.\n\n4. **Evaluate Tool Availability:** I have one available tool: `get_weather(city: STRING)`. This tool requires a city name as input.\n\n5. **Formulate a Plan:**\n a. 
I need the city name to use the `get_weather` tool.\n b. I must first identify the city from the image.\n c. Since my internal knowledge suggests Bangkok, I will proceed with that assumption, but I must acknowledge that I cannot *definitively* prove the location from the image alone.\n\n6. **Execute Step 1 (City Identification):** Assume the city is Bangkok based on the visual evidence of the architecture.\n\n7. **Execute Step 2 (Tool Call Generation):** Use the `get_weather` tool with "Bangkok" as the `city` argument.\n\n8. **Construct the Response:** Generate the appropriate tool call in JSON format.', 'answer': '<\|tool_call>call:get_weather{city:<\|"\|>Bangkok<\|"\|>}' | +| 31B | The user wants to know the city in the image and the current weather there. 1. **Analyze the image:** The image shows ornate, spire-like structures (prangs) characteristic of Thai temple architecture. Specifically, these look like the prangs at Wat Phra Kaew (Temple of the Emerald Buddha) or Wat Pho in Bangkok, Thailand. The style is very distinctive of Bangkok's royal temple complexes. 2. **Identify the city:** Based on the architectural style, the city is Bangkok. 3. **Find the weather:** I need to call the `get_weather` tool with `city="Bangkok"`.call:get_weather{city:Bangkok} | +| 26/A4B | The image shows several tall, ornate towers (prangs) typical of Thai Buddhist temple architecture. Specifically, these look like the structures within the Grand Palace or Wat Phra Kaew in Bangkok, Thailand. The style is unmistakably Thai. I'll confirm this by looking at the specific towers. These are very iconic.\n2. **Confirm the location:** The image shows elements of the Grand Palace complex in Bangkok.\n3. 
**Get weather information:** Once the city is identified as Bangkok, I will use the `get_weather` tool with `city='Bangkok'`.call:get_weather{city:Bangkok} | + +# Deploy Anywhere + +Gemma 4 comes with day-0 support for many open-source inference engines, and is ideal for tool calling and agents! We also release ONNX checkpoints that can run on many hardware backends, allowing use cases on edge devices or in browser! + +## transformers + +Gemma 4 comes with first-class transformers support from the get-go 🤗. This integration allows using the model with other libraries like bitsandbytes, PEFT and TRL. Make sure to install the latest version of transformers. + +```bash +pip install -U transformers +``` + +The easiest way to infer with the small Gemma 4 models is through the `any-to-any` pipeline. You can initialize it as follows. + +```py +from transformers import pipeline +pipe = pipeline("any-to-any", model="google/gemma-4-e2b-it") +``` + +You can then pass in images and text as follows. + +```python +messages = [ + { + "role": "user", + "content": [ + { + "type": "image", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg", + }, + {"type": "text", "text": "Do you have travel advice going to here?"}, + ], + } +] +output = pipe(messages, max_new_tokens=100, return_full_text=False) +output[0]["generated_text"] +# Based on the image, which appears to show a magnificent, ornate **Buddhist temple or pagoda**, likely in Southeast Asia (such as Thailand, Myanmar, or Cambodia), here is some general travel advice.. +``` + +When inferring with videos, you can include the audio track using the `load_audio_from_video` argument. 
+ +```python +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] +pipe(messages, load_audio_from_video=True) +``` + +Going a level lower, you can load Gemma 4 using the `AutoModelForMultimodalLM` class, which is especially useful for fine-tuning. The built-in chat template takes care of formatting the inputs correctly; use it rather than building the prompt manually, which invites subtle mistakes. + +
+Inference code + +```python +from transformers import AutoModelForMultimodalLM, AutoProcessor +model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-E2B-it", device_map="auto") +processor = AutoProcessor.from_pretrained("google/gemma-4-E2B-it") +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + add_generation_prompt=True, + return_dict=True, + return_tensors="pt" +).to(model.device) + +generated_ids = model.generate(**inputs, max_new_tokens=128) +# Drop the prompt tokens so that only newly generated tokens are decoded +generated_ids_trimmed = [ + out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids) +] +output_text = processor.batch_decode( + generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False +) +print(output_text) +``` + +
+ +## Llama.cpp + +Gemma 4 models come with image+text support in llama.cpp from the get-go! This unlocks using Gemma 4 with all of your favorite local apps, such as llama-server, LM Studio, and Jan, as well as coding agents like Pi, across many backends such as Metal and CUDA. + +You can install llama.cpp as follows. + +```bash +brew install llama.cpp # macOS +winget install llama.cpp # Windows +``` + +You can then start a server compatible with the OpenAI API. To choose a precision, append a quantization scheme to the model name (for example, `:Q4_K_M`). + +```bash +llama-server -hf ggml-org/gemma-4-E2B-it-GGUF +``` + +Check out [this link](https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF?local-app=llama.cpp) for more options on combining llama.cpp with different coding agents and local apps. Find all the GGUF checkpoints [in this collection](https://huggingface.co/collections/ggml-org/gemma-4). + +## Plug in your local agent + +We worked on making sure the new models work locally with agents like **openclaw, hermes, pi, and open code**. All thanks to llama.cpp! Run the following to try Gemma 4 right away. 
+ +First, start your local server: + +``` +llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M +``` + +For **hermes:** + +```shell +hermes model +``` + +For **openclaw:** + +```shell +openclaw onboard +``` + +For **pi** define a `~/.pi/agent/models.json`: + +```json +{ + "providers": { + "llama-cpp": { + "baseUrl": "http://localhost:8080/v1", + "api": "openai-completions", + "apiKey": "none", + "models": [ + { + "id": "ggml-org-gemma-4-26b-4b-gguf" + } + ] + } + } +} +``` + +For **open code** define a `~/.config/opencode/opencode.json`: + +```json +{ + "$schema": "https://opencode.ai/config.json", + "provider": { + "llama.cpp": { + "npm": "@ai-sdk/openai-compatible", + "name": "llama-server (local)", + "options": { + "baseURL": "http://127.0.0.1:8080/v1" + }, + "models": { + "gemma-4-26b-4b-it": { + "name": "Gemma 4 (local)", + "limit": { + "context": 128000, + "output": 8192 + } + } + } + } + } +} +``` + +## transformers.js + +transformers.js enables running Gemma 4 right inside browser. You can check out the model card to see text-only, image & text, audio & text inference in detail [here](https://huggingface.co/onnx-community/gemma-4-E2B-it-ONNX#transformersjs-javascript). We also shipped a demo for you to test the model [here](https://huggingface.co/spaces/webml-community/Gemma-4-WebGPU). + +## MLX + +Full multimodal support of Gemma 4 is available using the open-source [`mlx-vlm` library](https://github.com/Blaizzy/mlx-vlm). Here's how to ask the model to describe an image: + +```shell +pip install -U mlx-vlm +``` + +```shell +mlx_vlm.generate \ +--model google/gemma-4-E4B-it \ +--image https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg \ +--prompt "Describe this image in detail" +``` + +mlx-vlm supports TurboQuant, which delivers the same accuracy as the uncompressed baseline while using ~4x less active memory and running a lot faster end-to-end. 
This makes long-context inference practical on Apple Silicon without sacrificing quality. Use it like this: + +```shell +mlx_vlm.generate \ +--model "mlx-community/gemma-4-26b-a4b-it-4bit" \ +--prompt "Your prompt here" \ +--kv-bits 3.5 \ +--kv-quant-scheme turboquant +``` + +For audio examples and more details, please check [the MLX collection](https://hf.co/mlx-community/gemma-4). + +### Mistral.rs + +[mistral.rs](https://github.com/EricLBuehler/mistral.rs) is a Rust-native inference engine with day-0 Gemma 4 support across all modalities (text, image, video, audio) and builtin tool-calling and agentic functionality. Install mistral.rs: + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh # Linux/macOS + +irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex # Windows +``` + +You can then start an OpenAI-compatible HTTP server: + +```bash +mistralrs serve mistralrs-community/gemma-4-E4B-it-UQFF --from-uqff 8 +``` + +Or, use interactive mode: + +``` +mistralrs run -m google/gemma-4-E4B-it --isq 8 --image image.png -i "Describe this image in detail." + +mistralrs run -m google/gemma-4-E4B-it --isq 8 --audio audio.mp3 -i "Transcribe this fully." +``` + +Find all models [here](https://huggingface.co/mistralrs-community/models). Please, follow [the instructions](https://huggingface.co/mistralrs-community/gemma-4-E2B-it-UQFF#install) in the model cards for installation and inference guidelines. + +## Fine-tuning for all + +Gemma 4 models are ideal for fine-tuning in your favorite tools and platforms and at any budget. + +## Fine-tuning with TRL + +Gemma 4 is fully supported for fine-tuning with TRL. To celebrate, TRL has been upgraded with support for multimodal tool responses when interacting with environments, meaning models can now receive images back from tools during training, not just text. 
+ +To showcase this, we've built an example training script where Gemma 4 learns to drive in the CARLA simulator. The model sees the road through a camera, decides what to do and learns from the outcome. After training, it consistently changes lanes to avoid pedestrians. The same approach works for any task where a model needs to see and act: robotics, web browsing, or other interactive environments. + +Get started: + +```shell +# pip install git+https://github.com/huggingface/trl.git + +python examples/scripts/openenv/carla_vlm_gemma.py \ + --env-urls https://sergiopaniego-carla-env.hf.space \ + https://sergiopaniego-carla-env-2.hf.space \ + --model google/gemma-4-E2B-it +``` + +Find the example [here](https://github.com/huggingface/huggingface-gemma-recipes/blob/main/scripts/carla_vlm_gemma.py). + +### Fine-tuning with TRL on Vertex AI + +Additionally, we have prepared an example on how to fine-tune Gemma 4 with TRL on Vertex AI using SFT, to showcase how to extend the function calling capabilities, whilst freezing both the vision and audio towers. The examples include how to build a custom Docker container with latest Transformers, TRL, etc. with CUDA support on Google Cloud, and how to run it via Vertex AI Serverless Training Jobs. 
+ +```python +# pip install google-cloud-aiplatform --upgrade --quiet +from google.cloud import aiplatform + +# The empty strings below are placeholders; fill in your own values. +aiplatform.init( + project="", + location="", + staging_bucket="", +) + +job = aiplatform.CustomContainerTrainingJob( + display_name="gemma-4-fine-tuning", + container_uri="", + command=["python", "/gcs/gemma-4-fine-tuning/train.py"], +) + +job = job.submit( + replica_count=1, + machine_type="a3-highgpu-1g", + accelerator_type="NVIDIA_H100_80GB", + accelerator_count=1, + base_output_dir="/output-dir", + environment_variables={ + "MODEL_ID": "google/gemma-4-E2B-it", + "HF_TOKEN": "", # placeholder for your Hugging Face token + }, + boot_disk_size_gb=500, +) +``` + +You can find the complete example in the "Hugging Face on Google Cloud" docs at https://hf.co/docs/google-cloud/examples/vertex-ai-notebooks-fine-tune-gemma-4. + +## Fine-tuning with Unsloth Studio + +If you want to fine-tune and run a Gemma 4 model in a UI, try out [Unsloth Studio](https://unsloth.ai/docs/new/studio). It runs locally or on Google Colab. First, install and start the app: + +```shell +# install unsloth studio on macOS, Linux, WSL +curl -fsSL https://unsloth.ai/install.sh | sh + +# install unsloth studio on Windows +irm https://unsloth.ai/install.ps1 | iex + +# launch unsloth studio +unsloth studio -H 0.0.0.0 -p 8888 +# Search for a Gemma 4 model like google/gemma-4-E2B-it +``` + +Then select any of the Gemma 4 models from the hub. + +![Unsloth Studio](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gemma4/unsloth.png) + +## Try Gemma 4 + +We have shipped demos for you to try different Gemma 4 models. 
We include demos based on transformers implementation for [E4B](https://huggingface.co/spaces/huggingface-projects/gemma-4-e4b-it), [26B/A4B](https://huggingface.co/spaces/huggingface-projects/gemma-4-26b-a4b-it), and dense [31B](https://huggingface.co/spaces/huggingface-projects/gemma-4-31b-it) models, as well as a [WebGPU](https://huggingface.co/spaces/webml-community/Gemma-4-WebGPU) demo with transformers.js 🚀 + + + + +## Benchmark Results + +Gemma 4 models demonstrate exceptional performance across diverse benchmarks, from reasoning and coding to vision and long-context tasks. The graph below shows model performance vs size, with Gemma 4 models forming an impressive Pareto frontier: + +
+[Figure: Gemma 4 Performance vs Size] +[Figure: Gemma 4 Arena Elo Score Comparison] +Source: Google (blog.google)
+ +Here are detailed benchmark results for the instruction-tuned models: + +| Benchmark | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +|-----------|-------------|-----------------|-------------|-------------|------------------------| +| **Reasoning & Knowledge** | +| MMLU Pro | [85.2%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-31B-it) | [82.6%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-26B-A4B-it) | [69.4%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-E4B-it) | [60.0%](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?eval_result=google/gemma-4-E2B-it) | 67.6% | +| AIME 2026 no tools | [89.2%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-31B-it) | [88.3%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-26B-A4B-it) | [42.5%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-E4B-it) | [37.5%](https://huggingface.co/datasets/MathArena/aime_2026?eval_result=google/gemma-4-E2B-it) | 20.8% | +| GPQA Diamond | [84.3%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-31B-it) | [82.3%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-26B-A4B-it) | [58.6%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-E4B-it) | [43.4%](https://huggingface.co/datasets/Idavidrein/gpqa?eval_result=google/gemma-4-E2B-it) | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Coding** | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| HLE no tools | [19.5%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-31B-it) | 
[8.7%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-26B-A4B-it) | - | - | - | +| HLE with search | [26.5%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-31B-it) | [17.2%](https://huggingface.co/datasets/cais/hle?eval_result=google/gemma-4-26B-A4B-it) | - | - | - | +| **Vision** | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (edit distance) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## Acknowledgements + +Landing Gemma-4 in the open-source ecosystem took a lot of effort from many people and not only the authors of this blog post. In no particular order, we thank many people from the open-source team: Gemma 4 transformers integration is owed to Cyril, Raushan, Eustache, Arthur, Lysandre. We thank Joshua for transformers.js integration and demo, Eric for mistral.rs integration, Son for Llama.cpp, Prince for MLX integration, Quentin, Albert and Kashif for TRL, Adarsh for SGLang transformers backend and Toshihiro for building the demos. + +This work wouldn't have been possible without Google's extensive contribution with the model artefact, but also their significant effort contributing the model to transformers in an effort to standardize it. The open-source ecosystem is now more complete, with a very capable, freely-licensed, open-source model.
diff --git a/tooling/huggingface/model-cards/gemma-4-26B-A4B-README.md b/tooling/huggingface/model-cards/gemma-4-26B-A4B-README.md new file mode 100644 index 0000000..8466914 --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-26B-A4B-README.md @@ -0,0 +1,514 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: image-text-to-text +--- +
+ +
+ + +

+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
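As a rough illustration of the "effective" naming, the gap between the two figures in the **Total Parameters** row can be read as the approximate size of the embedding tables. The sketch below is only arithmetic on the numbers quoted in the Dense Models table. The names `PARAMS_B`, `embedding_params_b`, and `embedding_share` are hypothetical, and treating the whole difference as embedding parameters is an assumption drawn from the "with embeddings" wording, not a figure stated in the card.

```python
# Parameter counts in billions, copied from the Dense Models table above.
PARAMS_B = {
    "E2B": {"effective": 2.3, "with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "with_embeddings": 8.0},
}


def embedding_params_b(model: str) -> float:
    """Approximate embedding-table size in billions (assumption: total minus effective)."""
    counts = PARAMS_B[model]
    return round(counts["with_embeddings"] - counts["effective"], 1)


def embedding_share(model: str) -> float:
    """Fraction of the total parameter count attributed to embeddings under that assumption."""
    counts = PARAMS_B[model]
    return embedding_params_b(model) / counts["with_embeddings"]


for name in PARAMS_B:
    print(f"{name}: ~{embedding_params_b(name)}B embedding parameters, "
          f"{embedding_share(name):.0%} of the total")
```

Under this reading, roughly half of each small model's stored parameters sit in lookup tables rather than in the compute path, which is consistent with the card's explanation of why the effective count is much smaller than the total.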
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
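To make the thinking toggle concrete, here is a minimal sketch that builds the same messages and keyword arguments as the snippet above, with `enable_thinking` as the only thing that varies. The helpers `build_messages` and `chat_template_kwargs` are hypothetical names introduced for illustration. The sketch does not load a model, so the actual `apply_chat_template` call is left as a comment.

```python
from typing import Any, Dict, List


def build_messages(system_prompt: str, user_prompt: str) -> List[Dict[str, str]]:
    """Build a two-turn chat using the system and user roles shown above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def chat_template_kwargs(enable_thinking: bool) -> Dict[str, Any]:
    """Keyword arguments from the snippet above; only enable_thinking changes."""
    return {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": enable_thinking,
    }


messages = build_messages(
    "You are a helpful assistant.",
    "Write a short joke about saving RAM.",
)
kwargs = chat_template_kwargs(enable_thinking=True)

# With a processor loaded as shown earlier, the call would look like:
# text = processor.apply_chat_template(messages, **kwargs)
print(kwargs)
```

Keeping the flag in one place makes it easier to run the same prompt with and without reasoning and compare the parsed outputs.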
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision torchcodec librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Audio Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as frames whereas the E2B and E4B models also support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds assuming the images are processed at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, which includes web documents, code, images, audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. 
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and exploration of de-biasing techniques are encouraged during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md b/tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md new file mode 100644 index 0000000..3949a79 --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md @@ -0,0 +1,513 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: image-text-to-text +--- + +
+ +
+ + +

+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
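To put the **Sliding Window** and **Context Length** rows of the Dense Models table side by side, the sketch below counts how many keys a single query can attend to in a local layer versus a global layer at full context. It is a simplified model, not the Gemma 4 implementation. It assumes that "K" means 1024 tokens, that the window includes the current token, and that attention is causal. The names `DENSE_MODELS`, `attended_keys`, and `global_to_local_ratio` are hypothetical.

```python
from typing import Optional

K = 1024  # assumption: the table's "K" is read as 1024 tokens

# (sliding window, context length) in tokens, from the Dense Models table.
DENSE_MODELS = {
    "E2B": (512, 128 * K),
    "E4B": (512, 128 * K),
    "31B Dense": (1024, 256 * K),
}


def attended_keys(position: int, window: Optional[int]) -> int:
    """Keys visible to the query at `position` (0-based) under causal attention.

    window=None models a global layer; an integer models a sliding-window layer
    whose window is assumed to include the current token.
    """
    seen = position + 1
    return seen if window is None else min(seen, window)


def global_to_local_ratio(model: str) -> float:
    """Ratio of keys seen by a global layer vs. a local layer at the last position."""
    window, context = DENSE_MODELS[model]
    last = context - 1
    return attended_keys(last, None) / attended_keys(last, window)


for name in DENSE_MODELS:
    print(f"{name}: a global layer sees {global_to_local_ratio(name):.0f}x "
          f"more keys than a local layer at full context")
```

The ratio gives a sense of why interleaving local layers keeps memory and compute low at long context, while the global layers retain access to the whole sequence.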
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
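
The generation call above passes only `max_new_tokens`. As a hedged sketch, the sampling values this card recommends under Best Practices (`temperature=1.0`, `top_p=0.95`, `top_k=64`) could be collected once and reused. `SAMPLING_KWARGS` and `generation_kwargs` are illustrative names. Setting `do_sample=True` is an assumption about how to make `generate()` sample rather than decode greedily; the checkpoint's own generation config may already do this.

```python
# Recommended sampling values from the "Best Practices" section of this card.
SAMPLING_KWARGS = {
    "do_sample": True,  # assumption: enables sampling in generate()
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}


def generation_kwargs(max_new_tokens: int = 1024, **overrides) -> dict:
    """Merge the recommended sampling values with per-call overrides."""
    return {"max_new_tokens": max_new_tokens, **SAMPLING_KWARGS, **overrides}


# Usage with the `model` and `inputs` objects from the snippet above:
# outputs = model.generate(**inputs, **generation_kwargs())
```

Keeping the values in one place makes it harder for different call sites to drift away from the recommended configuration.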
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision torchcodec librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-26B-A4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process video as sequences of frames; the E2B and E4B models also support audio inputs. Audio inputs can be at most 30 seconds long. Video inputs can be at most 60 seconds long, assuming frames are processed at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary.
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning and symbolic representation, and address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated and human evaluations was conducted to help improve model safety.
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +Across all areas of safety testing, we saw major improvements in every category of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models on safety while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the models' capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the models produced minimal policy violations and showed significant improvements over previous Gemma models. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: Developers are encouraged to perform continuous monitoring (using evaluation metrics and human review) and to explore de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides open vision-language model implementations that are high-performance relative to similarly sized models and designed from the ground up for responsible AI development. diff --git a/tooling/huggingface/model-cards/gemma-4-31B-README.md b/tooling/huggingface/model-cards/gemma-4-31B-README.md new file mode 100644 index 0000000..f7130fd --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-31B-README.md @@ -0,0 +1,513 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: image-text-to-text +--- + +
+ +
+ + +

+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
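
When scoping which dense variant fits a task, the Dense Models table can be encoded as a small lookup. This is a sketch only: `DenseSpec`, `DENSE_MODELS`, and `candidates` are illustrative names, and context length is kept in the table's own "K tokens" units because the card does not say whether K means 1,000 or 1,024.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseSpec:
    layers: int
    sliding_window: int    # tokens
    context_k: int         # context length in "K tokens", as listed in the table
    modalities: frozenset  # supported input modalities


# Values copied from the Dense Models table.
DENSE_MODELS = {
    "E2B": DenseSpec(35, 512, 128, frozenset({"text", "image", "audio"})),
    "E4B": DenseSpec(42, 512, 128, frozenset({"text", "image", "audio"})),
    "31B": DenseSpec(60, 1024, 256, frozenset({"text", "image"})),
}


def candidates(modality: str, min_context_k: int = 0) -> list[str]:
    """List dense variants that accept `modality` and offer enough context."""
    return [
        name
        for name, spec in DENSE_MODELS.items()
        if modality in spec.modalities and spec.context_k >= min_context_k
    ]


print(candidates("audio"))       # variants with an audio encoder
print(candidates("image", 256))  # variants with a 256K context window
```

The lookup makes the main trade-off in the table explicit: audio input is limited to the small variants, while the 256K context window is limited to the larger one.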
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
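
For multi-turn use, this card's Best Practices section says that earlier model turns should carry only the final response, with no thinking content. The sketch below shows one way to maintain such a history. It assumes the parsed response is a dict with a `"content"` key for the final answer and an optional `"thinking"` key; those key names are hypothetical, so check the actual `parse_response` output before relying on them.

```python
def append_assistant_turn(messages: list[dict], parsed: dict) -> list[dict]:
    """Return a new history with only the model's final answer appended.

    `parsed` is a hypothetical dict: "content" holds the final answer and
    "thinking" (if present) holds reasoning that is deliberately dropped.
    """
    return messages + [{"role": "assistant", "content": parsed["content"]}]


history = [{"role": "user", "content": "Write a short joke about saving RAM."}]
parsed = {"thinking": "Consider puns on memory...", "content": "I forgot the punchline."}

history = append_assistant_turn(history, parsed)
history.append({"role": "user", "content": "Now make it about disk space."})
```

Returning a new list instead of mutating the input keeps earlier snapshots of the conversation intact, which helps when retrying a turn.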
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision torchcodec librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
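The Best Practices section below limits video to 60 seconds at one frame per second and lists the supported visual token budgets. A rough sketch of how those two numbers interact is shown here. It assumes every sampled frame consumes exactly the chosen budget and ignores text and special tokens, so treat the result as an estimate only; `estimate_video_tokens` is a hypothetical helper.

```python
# Values taken from this card: supported visual token budgets, the 60-second
# video limit, and the stated sampling rate of one frame per second.
TOKEN_BUDGETS = (70, 140, 280, 560, 1120)
MAX_VIDEO_SECONDS = 60
FRAMES_PER_SECOND = 1


def estimate_video_tokens(duration_s: float, budget: int) -> int:
    """Estimate visual tokens for a clip (assumption: one budget per frame)."""
    if budget not in TOKEN_BUDGETS:
        raise ValueError(f"unsupported token budget: {budget}")
    if not 0 <= duration_s <= MAX_VIDEO_SECONDS:
        raise ValueError(f"duration must be between 0 and {MAX_VIDEO_SECONDS} s")
    frames = int(duration_s * FRAMES_PER_SECOND)
    return frames * budget


for budget in TOKEN_BUDGETS:
    print(budget, estimate_video_tokens(60, budget))
```

Under these assumptions a full-length clip ranges from 4,200 visual tokens at the lowest budget to 67,200 at the highest, which illustrates why the card suggests lower budgets for video understanding.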
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as sequences of frames; the E2B and E4B models also support audio inputs. Audio inputs are limited to 30 seconds. Video inputs are limited to 60 seconds, assuming frames are sampled at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. 
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-31B-it-README.md b/tooling/huggingface/model-cards/gemma-4-31B-it-README.md new file mode 100644 index 0000000..f7130fd --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-31B-it-README.md @@ -0,0 +1,513 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: image-text-to-text +--- + +
+ +
+ + +

+ Hugging Face | + GitHub | + Launch Blog | + Documentation +
+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
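The gap between the "effective" and "with embeddings" counts in the Dense Models table can be worked out directly. The sketch below uses only the figures from that table and assumes, following the explanation above, that the whole difference is attributable to embedding tables; `embedding_share` is a hypothetical helper name.

```python
# Parameter counts in billions, copied from the Dense Models table.
MODELS = {
    "E2B": {"effective": 2.3, "with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "with_embeddings": 8.0},
}


def embedding_share(name: str) -> tuple[float, float]:
    """Return (embedding parameters in billions, fraction of the total)."""
    counts = MODELS[name]
    embedding = counts["with_embeddings"] - counts["effective"]
    return round(embedding, 1), round(embedding / counts["with_embeddings"], 3)


for name in MODELS:
    embedding_b, share = embedding_share(name)
    print(f"{name}: ~{embedding_b}B embedding parameters ({share:.1%} of total)")
```

On these numbers, roughly half of the E2B parameter count and a bit under half of the E4B count sit in lookup tables rather than in the layers that do per-token compute.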
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
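The `generate` calls in this card do not pass sampling arguments explicitly. A minimal sketch of supplying the values recommended under Best Practices (`temperature=1.0`, `top_p=0.95`, `top_k=64`) follows. `do_sample`, `temperature`, `top_p`, `top_k`, and `max_new_tokens` are standard `generate` keyword arguments in Transformers; the helper name `generation_kwargs` is hypothetical, and whether the checkpoint's own generation config already sets these values is not stated in the card.

```python
# Sampling values recommended in the "Sampling Parameters" section of this card.
RECOMMENDED_SAMPLING = {
    "do_sample": True,   # sampling must be enabled for the values below to apply
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}


def generation_kwargs(max_new_tokens: int = 1024, **overrides) -> dict:
    """Merge the recommended sampling values with per-call overrides."""
    kwargs = dict(RECOMMENDED_SAMPLING, max_new_tokens=max_new_tokens)
    kwargs.update(overrides)
    return kwargs


# Intended use with the objects from the example above (not executed here):
# outputs = model.generate(**inputs, **generation_kwargs())
print(generation_kwargs())
```

Keeping the values in one dictionary makes it easy to apply the same configuration across text, image, audio, and video calls.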
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
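Audio inputs are limited to 30 seconds (see "Audio and Video Length" below), so longer recordings need to be split before they are sent to the model. The sketch below computes chunk boundaries in samples. The 16 kHz default sample rate and the helper name `chunk_bounds` are assumptions for illustration and are not taken from the card.

```python
MAX_AUDIO_SECONDS = 30  # maximum audio length stated in this card


def chunk_bounds(num_samples: int, sample_rate: int = 16_000,
                 max_seconds: int = MAX_AUDIO_SECONDS) -> list[tuple[int, int]]:
    """Return (start, end) sample indices for chunks of at most max_seconds."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and length must be > 0")
    chunk = sample_rate * max_seconds
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]


# A 70-second recording at the assumed 16 kHz rate splits into three chunks.
for start, end in chunk_bounds(16_000 * 70):
    print(start, end, (end - start) / 16_000)
```

Fixed boundaries can cut a word in half, so a real pipeline would likely split on silence or overlap adjacent chunks; that refinement is left out here.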
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
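The "Variable Image Resolution" guidance below recommends lower token budgets for coarse tasks and higher budgets for fine-grained ones. A small sketch of encoding that guidance as a lookup follows. The task labels, the choice of the extreme values 70 and 1120, and the middle default of 280 are illustrative assumptions; the card only distinguishes "lower" from "higher". How the budget is passed to the processor is not shown because the card does not specify it.

```python
# Supported visual token budgets listed in this card.
SUPPORTED_BUDGETS = (70, 140, 280, 560, 1120)

# Task groupings follow the card's guidance; the label strings are hypothetical.
LOW_DETAIL_TASKS = {"classification", "captioning", "video"}
HIGH_DETAIL_TASKS = {"ocr", "document_parsing", "small_text"}


def suggest_budget(task: str) -> int:
    """Suggest a visual token budget for a task label (illustrative mapping)."""
    if task in LOW_DETAIL_TASKS:
        return SUPPORTED_BUDGETS[0]
    if task in HIGH_DETAIL_TASKS:
        return SUPPORTED_BUDGETS[-1]
    # Unknown tasks fall back to the middle of the supported range.
    return SUPPORTED_BUDGETS[len(SUPPORTED_BUDGETS) // 2]


for task in ("captioning", "ocr", "chart_qa"):
    print(task, suggest_budget(task))
```

In practice the right budget is best found by measuring quality and latency on the actual task rather than by a fixed table.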
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision torchcodec librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-31B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as sequences of frames; the E2B and E4B models also support audio inputs. Audio inputs are limited to 30 seconds. Video inputs are limited to 60 seconds, assuming frames are sampled at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. 
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-31B-it-chat_template.jinja b/tooling/huggingface/model-cards/gemma-4-31B-it-chat_template.jinja new file mode 100644 index 0000000..98da08e --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-31B-it-chat_template.jinja @@ -0,0 +1,347 @@ +{%- macro format_parameters(properties, required) -%} + {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%} + {%- set ns = namespace(found_first=false) -%} + {%- for key, value in properties | dictsort -%} + {%- set add_comma = false -%} + {%- if key not in standard_keys -%} + {%- if ns.found_first %},{% endif -%} + {%- set ns.found_first = true -%} + {{ key }}:{ + {%- if value['description'] -%} + description:<|"|>{{ value['description'] }}<|"|> + {%- set add_comma = true -%} + {%- endif -%} + {%- if value['type'] | upper == 'STRING' -%} + {%- if value['enum'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + enum:{{ format_argument(value['enum']) }} + {%- endif -%} + {%- elif value['type'] | upper == 'ARRAY' -%} + {%- if value['items'] is mapping and value['items'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + items:{ + {%- set ns_items = namespace(found_first=false) -%} + {%- for item_key, item_value in value['items'] | dictsort -%} + {%- if item_value is not none -%} + {%- if ns_items.found_first %},{% endif -%} + {%- set 
ns_items.found_first = true -%} + {%- if item_key == 'properties' -%} + properties:{ + {%- if item_value is mapping -%} + {{- format_parameters(item_value, value['items']['required'] | default([])) -}} + {%- endif -%} + } + {%- elif item_key == 'required' -%} + required:[ + {%- for req_item in item_value -%} + <|"|>{{- req_item -}}<|"|> + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + ] + {%- elif item_key == 'type' -%} + {%- if item_value is string -%} + type:{{ format_argument(item_value | upper) }} + {%- else -%} + type:{{ format_argument(item_value | map('upper') | list) }} + {%- endif -%} + {%- else -%} + {{ item_key }}:{{ format_argument(item_value) }} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + } + {%- endif -%} + {%- endif -%} + {%- if value['nullable'] %} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + nullable:true + {%- endif -%} + {%- if value['type'] | upper == 'OBJECT' -%} + {%- if value['properties'] is defined and value['properties'] is mapping -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + properties:{ + {{- format_parameters(value['properties'], value['required'] | default([])) -}} + } + {%- elif value is mapping -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + properties:{ + {{- format_parameters(value, value['required'] | default([])) -}} + } + {%- endif -%} + {%- if value['required'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + required:[ + {%- for item in value['required'] | default([]) -%} + <|"|>{{- item -}}<|"|> + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + ] + {%- endif -%} + {%- endif -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + type:<|"|>{{ value['type'] | upper }}<|"|>} + {%- endif -%} + {%- endfor -%} +{%- endmacro -%} +{%- macro format_function_declaration(tool_data) -%} + declaration:{{- tool_data['function']['name'] 
-}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|> + {%- set params = tool_data['function']['parameters'] -%} + {%- if params -%} + ,parameters:{ + {%- if params['properties'] -%} + properties:{ {{- format_parameters(params['properties'], params['required']) -}} }, + {%- endif -%} + {%- if params['required'] -%} + required:[ + {%- for item in params['required'] -%} + <|"|>{{- item -}}<|"|> + {{- ',' if not loop.last -}} + {%- endfor -%} + ], + {%- endif -%} + {%- if params['type'] -%} + type:<|"|>{{- params['type'] | upper -}}<|"|>} + {%- endif -%} + {%- endif -%} + {%- if 'response' in tool_data['function'] -%} + {%- set response_declaration = tool_data['function']['response'] -%} + ,response:{ + {%- if response_declaration['description'] -%} + description:<|"|>{{- response_declaration['description'] -}}<|"|>, + {%- endif -%} + {%- if response_declaration['type'] | upper == 'OBJECT' -%} + type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>} + {%- endif -%} + {%- endif -%} + } +{%- endmacro -%} +{%- macro format_argument(argument, escape_keys=True) -%} + {%- if argument is string -%} + {{- '<|"|>' + argument + '<|"|>' -}} + {%- elif argument is boolean -%} + {{- 'true' if argument else 'false' -}} + {%- elif argument is mapping -%} + {{- '{' -}} + {%- set ns = namespace(found_first=false) -%} + {%- for key, value in argument | dictsort -%} + {%- if ns.found_first %},{% endif -%} + {%- set ns.found_first = true -%} + {%- if escape_keys -%} + {{- '<|"|>' + key + '<|"|>' -}} + {%- else -%} + {{- key -}} + {%- endif -%} + :{{- format_argument(value, escape_keys=escape_keys) -}} + {%- endfor -%} + {{- '}' -}} + {%- elif argument is sequence -%} + {{- '[' -}} + {%- for item in argument -%} + {{- format_argument(item, escape_keys=escape_keys) -}} + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + {{- ']' -}} + {%- else -%} + {{- argument -}} + {%- endif -%} +{%- endmacro -%} +{%- macro strip_thinking(text) -%} + {%- set ns = 
namespace(result='') -%} + {%- for part in text.split('') -%} + {%- if '<|channel>' in part -%} + {%- set ns.result = ns.result + part.split('<|channel>')[0] -%} + {%- else -%} + {%- set ns.result = ns.result + part -%} + {%- endif -%} + {%- endfor -%} + {{- ns.result | trim -}} +{%- endmacro -%} + +{%- macro format_tool_response_block(tool_name, response) -%} + {{- '<|tool_response>' -}} + {%- if response is mapping -%} + {{- 'response:' + tool_name + '{' -}} + {%- for key, value in response | dictsort -%} + {{- key -}}:{{- format_argument(value, escape_keys=False) -}} + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + {{- '}' -}} + {%- else -%} + {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}} + {%- endif -%} + {{- '' -}} +{%- endmacro -%} + +{%- set ns = namespace(prev_message_type=None) -%} +{%- set loop_messages = messages -%} +{{- bos_token -}} +{#- Handle System/Tool Definitions Block -#} +{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%} + {{- '<|turn>system\n' -}} + + {#- Inject Thinking token at the very top of the FIRST system turn -#} + {%- if enable_thinking is defined and enable_thinking -%} + {{- '<|think|>\n' -}} + {%- set ns.prev_message_type = 'think' -%} + {%- endif -%} + + {%- if messages[0]['role'] in ['system', 'developer'] -%} + {{- messages[0]['content'] | trim -}} + {%- set loop_messages = messages[1:] -%} + {%- endif -%} + + {%- if tools -%} + {%- for tool in tools %} + {{- '<|tool>' -}} + {{- format_function_declaration(tool) | trim -}} + {{- '' -}} + {%- endfor %} + {%- set ns.prev_message_type = 'tool' -%} + {%- endif -%} + + {{- '\n' -}} +{%- endif %} + +{#- Pre-scan: find last user message index for reasoning guard -#} +{%- set ns_turn = namespace(last_user_idx=-1) -%} +{%- for i in range(loop_messages | length) -%} + {%- if loop_messages[i]['role'] == 'user' -%} + {%- set ns_turn.last_user_idx = i -%} + {%- 
endif -%} +{%- endfor -%} + +{#- Loop through messages -#} +{%- for message in loop_messages -%} + {%- if message['role'] != 'tool' -%} + {%- set ns.prev_message_type = None -%} + {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%} + {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#} + {%- set prev_nt = namespace(role=None, found=false) -%} + {%- if loop.index0 > 0 -%} + {%- for j in range(loop.index0 - 1, -1, -1) -%} + {%- if not prev_nt.found -%} + {%- if loop_messages[j]['role'] != 'tool' -%} + {%- set prev_nt.role = loop_messages[j]['role'] -%} + {%- set prev_nt.found = true -%} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%} + {%- if not continue_same_model_turn -%} + {{- '<|turn>' + role + '\n' }} + {%- endif -%} + + {#- Render reasoning/reasoning_content as thinking channel -#} + {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%} + {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%} + {{- '<|channel>thought\n' + thinking_text + '\n' -}} + {%- endif -%} + + {%- if message['tool_calls'] -%} + {%- for tool_call in message['tool_calls'] -%} + {%- set function = tool_call['function'] -%} + {{- '<|tool_call>call:' + function['name'] + '{' -}} + {%- if function['arguments'] is mapping -%} + {%- set ns_args = namespace(found_first=false) -%} + {%- for key, value in function['arguments'] | dictsort -%} + {%- if ns_args.found_first %},{% endif -%} + {%- set ns_args.found_first = true -%} + {{- key -}}:{{- format_argument(value, escape_keys=False) -}} + {%- endfor -%} + {%- elif function['arguments'] is string -%} + {{- function['arguments'] -}} + {%- endif -%} + {{- '}' -}} + {%- endfor -%} + {%- set ns.prev_message_type = 'tool_call' -%} + {%- endif -%} + + {%- set ns_tr_out = 
namespace(flag=false) -%} + {%- if message.get('tool_responses') -%} + {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#} + {%- for tool_response in message['tool_responses'] -%} + {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}} + {%- set ns_tr_out.flag = true -%} + {%- set ns.prev_message_type = 'tool_response' -%} + {%- endfor -%} + {%- elif message.get('tool_calls') -%} + {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#} + {%- set ns_tool_scan = namespace(stopped=false) -%} + {%- for k in range(loop.index0 + 1, loop_messages | length) -%} + {%- if ns_tool_scan.stopped -%} + {%- elif loop_messages[k]['role'] != 'tool' -%} + {%- set ns_tool_scan.stopped = true -%} + {%- else -%} + {%- set follow = loop_messages[k] -%} + {#- Resolve tool_call_id to function name -#} + {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%} + {%- for tc in message['tool_calls'] -%} + {%- if tc.get('id') == follow.get('tool_call_id') -%} + {%- set ns_tname.name = tc['function']['name'] -%} + {%- endif -%} + {%- endfor -%} + {#- Handle content as string or content-parts array -#} + {%- set tool_body = follow.get('content') -%} + {%- if tool_body is string -%} + {{- format_tool_response_block(ns_tname.name, tool_body) -}} + {%- elif tool_body is sequence and tool_body is not string -%} + {%- set ns_txt = namespace(s='') -%} + {%- for part in tool_body -%} + {%- if part.get('type') == 'text' -%} + {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%} + {%- endif -%} + {%- endfor -%} + {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}} + {%- else -%} + {{- format_tool_response_block(ns_tname.name, tool_body) -}} + {%- endif -%} + {%- set ns_tr_out.flag = true -%} + {%- set ns.prev_message_type = 'tool_response' -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + + {%- if message['content'] is string -%} + {%- if role == 
'model' -%} + {{- strip_thinking(message['content']) -}} + {%- else -%} + {{- message['content'] | trim -}} + {%- endif -%} + {%- elif message['content'] is sequence -%} + {%- for item in message['content'] -%} + {%- if item['type'] == 'text' -%} + {%- if role == 'model' -%} + {{- strip_thinking(item['text']) -}} + {%- else -%} + {{- item['text'] | trim -}} + {%- endif -%} + {%- elif item['type'] == 'image' -%} + {{- '<|image|>' -}} + {%- set ns.prev_message_type = 'image' -%} + {%- elif item['type'] == 'audio' -%} + {{- '<|audio|>' -}} + {%- set ns.prev_message_type = 'audio' -%} + {%- elif item['type'] == 'video' -%} + {{- '<|video|>' -}} + {%- set ns.prev_message_type = 'video' -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + + {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%} + {{- '<|tool_response>' -}} + {%- elif not (ns_tr_out.flag and not message.get('content')) -%} + {{- '\n' -}} + {%- endif -%} + {%- endif -%} +{%- endfor -%} + +{%- if add_generation_prompt -%} + {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%} + {{- '<|turn>model\n' -}} + {%- if not enable_thinking | default(false) -%} + {{- '<|channel>thought\n' -}} + {%- endif -%} + {%- endif -%} +{%- endif -%} diff --git a/tooling/huggingface/model-cards/gemma-4-31B-it-tokenizer_config.json b/tooling/huggingface/model-cards/gemma-4-31B-it-tokenizer_config.json new file mode 100644 index 0000000..375b25d --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-31B-it-tokenizer_config.json @@ -0,0 +1,74 @@ +{ + "audio_token": "<|audio|>", + "backend": "tokenizers", + "boa_token": "<|audio>", + "boi_token": "<|image>", + "bos_token": "", + "eoa_token": "", + "eoc_token": "", + "eoi_token": "", + "eos_token": "", + "eot_token": "", + "escape_token": "<|\"|>", + "etc_token": "", + "etd_token": "", + "etr_token": "", + "extra_special_tokens": [ + "<|video|>" + ], + "image_token": "<|image|>", + "mask_token": "", + "model_max_length": 
1000000000000000019884624838656, + "pad_token": "", + "padding_side": "left", + "processor_class": "Gemma4Processor", + "response_schema": { + "type": "object", + "properties": { + "role": { + "const": "assistant" + }, + "thinking": { + "type": "string" + }, + "content": { + "type": "string" + }, + "tool_calls": { + "x-regex-iterator": "<\\|tool_call>(.*?)", + "type": "array", + "items": { + "type": "object", + "properties": { + "type": { + "const": "function" + }, + "function": { + "type": "object", + "x-regex": "call\\:(?P\\w+)(?P\\{.*\\})", + "properties": { + "name": { + "type": "string" + }, + "arguments": { + "type": "object", + "x-parser": "gemma4-tool-call", + "additionalProperties": {} + } + } + } + } + } + } + }, + "x-regex": "(\\<\\|channel\\>thought\\n(?P.*?)\\)?(?P\\<\\|tool_call\\>.*\\)?(?P(?:(?!\\)(?!\\<\\|tool_response\\>).)+)?(?:\\|\\<\\|tool_response\\>)?" + }, + "soc_token": "<|channel>", + "sot_token": "<|turn>", + "stc_token": "<|tool_call>", + "std_token": "<|tool>", + "str_token": "<|tool_response>", + "think_token": "<|think|>", + "tokenizer_class": "GemmaTokenizer", + "unk_token": "" +} diff --git a/tooling/huggingface/model-cards/gemma-4-E2B-README.md b/tooling/huggingface/model-cards/gemma-4-E2B-README.md new file mode 100644 index 0000000..67061ce --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E2B-README.md @@ -0,0 +1,516 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: any-to-any +--- + +
+ +
+ + +

+ Hugging Face | + GitHub | + Launch Blog | + Documentation +
+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
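
As a rough illustration of the "effective" versus total parameter counts described above, the sketch below takes the figures from the Dense Models table and computes the gap between them. It assumes the whole difference between "effective" and "with embeddings" is attributable to embedding tables, which the table implies but does not state outright. The dictionary and the `embedding_share` helper are hypothetical names used only for this example.

```python
# Parameter counts in billions, copied from the Dense Models table above.
DENSE_SMALL = {
    "E2B": {"effective": 2.3, "with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "with_embeddings": 8.0},
}


def embedding_share(effective: float, with_embeddings: float) -> tuple[float, float]:
    """Return (extra parameters in billions, fraction of the total they make up).

    Assumption: the gap between the two published figures is entirely
    embedding tables (hypothetical helper, not part of any Gemma tooling).
    """
    extra = with_embeddings - effective
    return extra, extra / with_embeddings


for name, params in DENSE_SMALL.items():
    extra, share = embedding_share(params["effective"], params["with_embeddings"])
    print(f"{name}: ~{extra:.1f}B outside the effective count ({share:.0%} of total)")
```

Under that assumption, roughly 2.8B of E2B's 5.1B parameters (about 55%) and roughly 3.5B of E4B's 8B (about 44%) sit outside the effective count.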
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
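
The load, generate, and parse steps above can be folded into one helper so that `enable_thinking` is toggled per call. This is a minimal sketch, not part of the model card: `generate_reply` is a hypothetical name, and it uses only the processor and model calls that already appear in the snippet above, on the assumption that they behave as shown there.

```python
from typing import Any


def generate_reply(processor: Any, model: Any, messages: list[dict],
                   enable_thinking: bool = False,
                   max_new_tokens: int = 1024) -> Any:
    """Run one chat turn and return whatever processor.parse_response yields.

    Hypothetical wrapper around the calls used in the snippet above.
    """
    # Render the chat template to a prompt string; enable_thinking toggles reasoning mode.
    text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )
    inputs = processor(text=text, return_tensors="pt").to(model.device)
    input_len = inputs["input_ids"].shape[-1]

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens; special tokens are kept so the parser can see them.
    response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
    return processor.parse_response(response)
```

With the `processor` and `model` objects loaded earlier, a call such as `generate_reply(processor, model, messages, enable_thinking=True)` should run the same steps as the snippet above with reasoning enabled.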
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
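
All three multimodal snippets above put the media entry before the text entry, which matches the modality-order advice in the Best Practices section. A small helper can enforce that ordering. This is a sketch only: `build_user_message` is a hypothetical name, the dictionary keys are copied from the snippets above, and the relative order of images, audio, and video is an arbitrary choice made here.

```python
def build_user_message(text, images=(), audio=(), videos=()):
    """Build a user turn with all media entries placed before the text entry."""
    content = []
    # Keys mirror the model card snippets: "url" for images,
    # "audio" for audio clips, "video" for videos.
    content += [{"type": "image", "url": url} for url in images]
    content += [{"type": "audio", "audio": url} for url in audio]
    content += [{"type": "video", "video": url} for url in videos]
    # Text always goes last so the media precedes it in the prompt.
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


messages = [
    build_user_message(
        "What is shown in this image?",
        images=["https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"],
    )
]
```

The resulting `messages` list can be passed to `processor.apply_chat_template` exactly as in the snippets above.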
+ + + + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as frames, whereas the E2B and E4B models also support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds, assuming the video is processed at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, which includes web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary.
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: Developers are encouraged to perform continuous monitoring (using evaluation metrics and human review) and to explore de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides open vision-language model implementations that are designed from the ground up for responsible AI development and deliver high performance compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-E2B-it-README.md b/tooling/huggingface/model-cards/gemma-4-E2B-it-README.md new file mode 100644 index 0000000..45209d4 --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E2B-it-README.md @@ -0,0 +1,513 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: any-to-any +--- + +
+ +
+ + +

+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
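
To make the "effective" versus "with embeddings" distinction concrete, the figures in the Dense Models table can be turned into a rough embedding share. This is back-of-the-envelope arithmetic, not an official breakdown: it assumes the gap between the two published numbers is made up entirely of embedding tables, and `embedding_share` is a hypothetical helper name.

```python
# Parameter counts in billions, copied from the Dense Models table above.
DENSE_PARAMS = {
    "E2B": {"effective": 2.3, "with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "with_embeddings": 8.0},
}


def embedding_share(effective, with_embeddings):
    """Return (embedding params in billions, their fraction of the total)."""
    # Assumption: everything beyond the effective count is embedding tables.
    embedding = with_embeddings - effective
    return embedding, embedding / with_embeddings


for name, counts in DENSE_PARAMS.items():
    embedding, share = embedding_share(**counts)
    print(f"{name}: ~{embedding:.1f}B embedding parameters ({share:.0%} of total)")
```

Under that assumption the embedding tables account for about 2.8B parameters in E2B and 3.5B in E4B, which is why the totals sit well above the effective counts.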
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
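
The `generate` call above does not set any sampling options. The Best Practices section of this card recommends `temperature=1.0`, `top_p=0.95`, and `top_k=64`, so one way to apply them is to collect them into keyword arguments. This is a sketch under assumptions: `generation_kwargs` is a hypothetical helper, and `do_sample=True` is added here because Transformers only applies these sampling settings when sampling is enabled; the checkpoint's own generation config may already set it.

```python
# Sampling values recommended in the Best Practices section of this card.
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 64}


def generation_kwargs(max_new_tokens=1024, **overrides):
    """Build keyword arguments for model.generate with the recommended sampling."""
    kwargs = {
        "do_sample": True,  # assumption: sampling must be on for the values to apply
        "max_new_tokens": max_new_tokens,
        **RECOMMENDED_SAMPLING,
    }
    # Let callers override individual settings for experiments.
    kwargs.update(overrides)
    return kwargs


# Usage with the objects from the snippet above:
# outputs = model.generate(**inputs, **generation_kwargs())
```

Keeping the values in one dictionary makes it easy to reuse the same configuration for the text, audio, image, and video snippets.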
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
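
The audio snippet above hardcodes its transcription prompt. The Best Practices section of this card gives prompt structures for speech recognition and speech translation, and those can be filled in programmatically. This is a minimal sketch: the function names `asr_prompt` and `ast_prompt` are hypothetical, and the template text is copied from that section with the placeholders renamed to lowercase for `str.format`.

```python
ASR_TEMPLATE = (
    "Transcribe the following speech segment in {language} into {language} text.\n"
    "\n"
    "Follow these specific instructions for formatting the answer:\n"
    "* Only output the transcription, with no newlines.\n"
    "* When transcribing numbers, write the digits, i.e. write 1.7 and not "
    "one point seven, and write 3 instead of three."
)

AST_TEMPLATE = (
    "Transcribe the following speech segment in {source}, then translate it "
    "into {target}.\n"
    "When formatting the answer, first output the transcription in {source}, "
    "then one newline, then output the string '{target}: ', then the "
    "translation in {target}."
)


def asr_prompt(language):
    """Speech recognition prompt for a single language."""
    return ASR_TEMPLATE.format(language=language)


def ast_prompt(source, target):
    """Speech translation prompt from a source to a target language."""
    return AST_TEMPLATE.format(source=source, target=target)
```

The returned string would go in the `"text"` entry of the user message, after the audio entry.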
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E2B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as frames, whereas the E2B and E4B models also support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds, assuming the video is processed at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, which includes web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary.
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: Developers are encouraged to perform continuous monitoring (using evaluation metrics and human review) and to explore de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides open vision-language model implementations that are high-performing compared to similarly sized models and designed from the ground up for responsible AI development. diff --git a/tooling/huggingface/model-cards/gemma-4-E4B-README.md b/tooling/huggingface/model-cards/gemma-4-E4B-README.md new file mode 100644 index 0000000..fb28cf6 --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E4B-README.md @@ -0,0 +1,513 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: any-to-any +--- + +
+ +
+ + +

+ Hugging Face | + GitHub | + Launch Blog | + Documentation +
+ License: Apache 2.0 | Authors: Google DeepMind +

+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
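
The gap between effective and total parameters can be read straight off the Dense Models table. The sketch below is plain arithmetic on those published figures. The dictionary and function names are illustrative, and attributing the whole gap to embedding tables is an assumption based on the "with embeddings" wording, since the card gives no finer breakdown.

```python
# Parameter counts in billions, copied from the Dense Models table above.
DENSE_MODELS = {
    "E2B": {"effective": 2.3, "total_with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "total_with_embeddings": 8.0},
}


def embedding_share(name: str) -> tuple[float, float]:
    """Return (gap in billions, gap as a fraction of the total)."""
    row = DENSE_MODELS[name]
    gap = row["total_with_embeddings"] - row["effective"]
    return round(gap, 1), round(gap / row["total_with_embeddings"], 2)


for name in DENSE_MODELS:
    gap_b, fraction = embedding_share(name)
    print(f"{name}: ~{gap_b}B outside the effective count ({fraction:.0%} of total)")
```

Under that assumption, roughly half of the E2B total sits in lookup tables rather than in the compute path, which is consistent with the card's point that the effective count is the better guide to inference cost.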
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters" in contrast to the total number of parameters the model contains. By only activating a 4B subset of parameters during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it an excellent choice for fast inference compared to the dense 31B model since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
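
The `generate` call above relies on whatever decoding defaults ship with the checkpoint. To apply the sampling values this card recommends under Best Practices explicitly, one option is to pass them as keyword arguments. The helper below is a minimal sketch: `recommended_sampling_kwargs` is a hypothetical name, while `do_sample`, `temperature`, `top_p`, `top_k`, and `max_new_tokens` are standard Transformers generation arguments.

```python
def recommended_sampling_kwargs(max_new_tokens: int = 1024) -> dict:
    """Keyword arguments for `model.generate` using the card's sampling values."""
    return {
        "do_sample": True,  # sampling must be enabled for the values below to apply
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
        "max_new_tokens": max_new_tokens,
    }


# Usage with the `model` and `inputs` objects from the snippet above:
# outputs = model.generate(**inputs, **recommended_sampling_kwargs())
print(recommended_sampling_kwargs())
```

Keeping the values in one helper makes it easier to use the same configuration for text, image, audio, and video calls.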
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
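
The card's Best Practices section caps audio input at 30 seconds, so longer recordings need to be split before they reach the processor. The sketch below only computes segment boundaries. `split_audio_spans` is a hypothetical helper, and cutting on fixed boundaries (rather than on silence) is a simplifying assumption that can split words in half.

```python
MAX_AUDIO_SECONDS = 30.0  # limit stated in the card's Best Practices section


def split_audio_spans(total_seconds: float, max_seconds: float = MAX_AUDIO_SECONDS):
    """Return (start, end) pairs in seconds, each at most max_seconds long."""
    if total_seconds <= 0 or max_seconds <= 0:
        raise ValueError("durations must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


print(split_audio_spans(75.0))
```

With a waveform loaded at sample rate `sr` (for example via `librosa.load`), each span maps to the slice `waveform[int(start * sr):int(end * sr)]`, and each slice can be transcribed with its own request.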
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
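
Two figures from the Best Practices section below determine how much context a video consumes: the 60-second limit at one frame per second, and the per-image visual token budget (70, 140, 280, 560, or 1120). The sketch below multiplies them out. It is a rough upper bound under stated assumptions: `video_visual_tokens` is a hypothetical helper, every frame is assumed to use the full budget, sampling is fixed at one frame per second, and text or control tokens are not counted.

```python
import math

SUPPORTED_BUDGETS = (70, 140, 280, 560, 1120)  # visual token budgets listed in the card
MAX_VIDEO_FRAMES = 60  # 60 seconds at one frame per second


def video_visual_tokens(duration_s: float, budget: int) -> int:
    """Estimate visual tokens for a clip sampled at one frame per second."""
    if budget not in SUPPORTED_BUDGETS:
        raise ValueError(f"budget must be one of {SUPPORTED_BUDGETS}")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    frames = min(math.ceil(duration_s), MAX_VIDEO_FRAMES)
    return frames * budget


for budget in SUPPORTED_BUDGETS:
    print(budget, video_visual_tokens(60, budget))
```

By this estimate a full 60-frame clip costs 4,200 visual tokens at the lowest budget and 67,200 at the highest, roughly half of the 128K context of the E2B and E4B models, which fits the card's advice to prefer lower budgets for video.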
+ + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as frames, whereas only the E2B and E4B models also support audio inputs. Audio input is limited to a maximum length of 30 seconds. Video input is limited to a maximum of 60 seconds, assuming frames are sampled at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data encompassing a wide range of domains and modalities, which includes web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary.
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-E4B-it-README.md b/tooling/huggingface/model-cards/gemma-4-E4B-it-README.md new file mode 100644 index 0000000..e8abaaf --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E4B-it-README.md @@ -0,0 +1,515 @@ +--- +library_name: transformers +license: apache-2.0 +license_link: https://ai.google.dev/gemma/docs/gemma_4_license +pipeline_tag: any-to-any +--- + +
+License: Apache 2.0 | Authors: Google DeepMind
+ +Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. + +Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. + +Gemma 4 introduces key **capability and architectural advancements**: + +* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. + +* **Extended Multimodalities** – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models). + +* **Diverse & Efficient Architectures** – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment. + +* **Optimized for On-Device** – Smaller models are specifically designed for efficient local execution on laptops and mobile devices. + +* **Increased Context Window** – The small models feature a 128K context window, while the medium models support 256K. + +* **Enhanced Coding & Agentic Capabilities** – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents. + +* **Native System Prompt Support** – Gemma 4 introduces native support for the `system` role, enabling more structured and controllable conversations. 
+ +## **Models Overview** + +Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding. + +The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE). + +### Dense Models + +| Property | E2B | E4B | 31B Dense | +| :---- | :---- | :---- | :---- | +| **Total Parameters** | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 30.7B | +| **Layers** | 35 | 42 | 60 | +| **Sliding Window** | 512 tokens | 512 tokens | 1024 tokens | +| **Context Length** | 128K tokens | 128K tokens | 256K tokens | +| **Vocabulary Size** | 262K | 262K | 262K | +| **Supported Modalities** | Text, Image, Audio | Text, Image, Audio | Text, Image | +| **Vision Encoder Parameters** | *~150M* | *~150M* | *~550M* | +| **Audio Encoder Parameters** | *~300M* | *~300M* | No Audio | + +The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. 
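The gap between the "effective" and "with embeddings" counts can be made concrete with a little arithmetic. The sketch below uses only the figures from the Dense Models table and assumes that the whole difference between the two counts is attributable to embedding tables (the card does not break this down further). The names `DENSE_PARAMS` and `embedding_share` are illustrative, not part of any Gemma tooling.

```python
# Parameter counts in billions, copied from the Dense Models table above.
DENSE_PARAMS = {
    "E2B": {"effective": 2.3, "with_embeddings": 5.1},
    "E4B": {"effective": 4.5, "with_embeddings": 8.0},
}


def embedding_share(effective, with_embeddings):
    """Return (embedding_params_in_billions, fraction_of_total).

    Assumption: the difference between the two published counts is
    entirely embedding-table parameters.
    """
    embedding_params = round(with_embeddings - effective, 1)
    fraction = round(embedding_params / with_embeddings, 2)
    return embedding_params, fraction


for name, counts in DENSE_PARAMS.items():
    emb, share = embedding_share(counts["effective"], counts["with_embeddings"])
    print(f"{name}: ~{emb}B in embedding tables, ~{share:.0%} of the total")
```

Under that assumption, roughly half of the E2B total and a bit under half of the E4B total sits in lookup tables rather than in the decoder layers, which is consistent with the card's explanation of why the effective count is much smaller than the total.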
+ +### Mixture-of-Experts (MoE) Model + +| Property | 26B A4B MoE | +| :---- | :---- | +| **Total Parameters** | 25.2B | +| **Active Parameters** | 3.8B | +| **Layers** | 30 | +| **Sliding Window** | 1024 tokens | +| **Context Length** | 256K tokens | +| **Vocabulary Size** | 262K | +| **Expert Count** | 8 active / 128 total and 1 shared | +| **Supported Modalities** | Text, Image | +| **Vision Encoder Parameters** | *~550M* | + +The "A" in 26B A4B stands for "active parameters", as opposed to the total number of parameters the model contains. Because only about 4B parameters (3.8B per the table above) are activated during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest. This makes it a strong choice for fast inference compared to the dense 31B model, since it runs almost as fast as a 4B-parameter model. + +## **Benchmark Results** + +These models were evaluated against a large collection of datasets and metrics covering different aspects of text generation. The evaluation results in the table are for instruction-tuned models. 
+ +| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) | +| :---- | :---- | :---- | :---- | :---- | :---- | +| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% | +| AIME 2026 no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% | +| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% | +| Codeforces ELO | 2150 | 1718 | 940 | 633 | 110 | +| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% | +| Tau2 (average over 3) | 76.9% | 68.2% | 42.2% | 24.5% | 16.2% | +| HLE no tools | 19.5% | 8.7% | - | - | - | +| HLE with search | 26.5% | 17.2% | - | - | - | +| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% | 19.3% | +| MMMLU | 88.4% | 86.3% | 76.6% | 67.4% | 70.7% | +| **Vision** | | | | | | +| MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% | +| OmniDocBench 1.5 (average edit distance, lower is better) | 0.131 | 0.149 | 0.181 | 0.290 | 0.365 | +| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% | 46.0% | +| MedXPertQA MM | 61.3% | 58.1% | 28.7% | 23.5% | - | +| **Audio** | | | | | | +| CoVoST | - | - | 35.54 | 33.47 | - | +| FLEURS (lower is better) | - | - | 0.08 | 0.09 | - | +| **Long Context** | | | | | | +| MRCR v2 8 needle 128k (average) | 66.4% | 44.1% | 25.4% | 19.1% | 13.5% | + +## **Core Capabilities** + +Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include: + +* **Thinking** – Built-in reasoning mode that lets the model think step-by-step before answering. +* **Long Context** – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B). +* **Image Understanding** – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions. +* **Video Understanding** – Analyze video by processing sequences of frames. 
+* **Interleaved Multimodal Input** – Freely mix text and images in any order within a single prompt. +* **Function Calling** – Native support for structured tool use, enabling agentic workflows. +* **Coding** – Code generation, completion, and correction. +* **Multilingual** – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages. +* **Audio** (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages. + + +## Getting Started + +You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment: + +`pip install -U transformers torch accelerate` + +Once you have everything installed, you can proceed to load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForCausalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForCausalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output: + +```python +# Prompt +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short joke about saving RAM."}, +] + +# Process input +text = processor.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False +) +inputs = processor(text=text, return_tensors="pt").to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=1024) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +To enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output. 
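The text-only flow above can be wrapped in one helper so that thinking is toggled per call. This is a minimal sketch, not an official API: `generate_reply` is a hypothetical name, and it only chains the calls already shown in this card (`apply_chat_template`, `generate`, `decode`, `parse_response`). The `processor` and `model` objects are passed in, so the block itself imports nothing.

```python
def generate_reply(processor, model, messages, enable_thinking=False, max_new_tokens=1024):
    """Render the chat template, generate, and return the parsed response.

    Hypothetical helper: it mirrors the Getting Started snippet and only
    exposes `enable_thinking` and `max_new_tokens` as arguments.
    """
    # Render the prompt text; the template handles the thinking switch.
    text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )
    inputs = processor(text=text, return_tensors="pt").to(model.device)
    input_len = inputs["input_ids"].shape[-1]

    # Generate, then decode only the newly produced tokens.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

    # Let the processor separate any thinking output from the final answer.
    return processor.parse_response(response)
```

Assuming `processor` and `model` were loaded as in the snippet above, `generate_reply(processor, model, messages, enable_thinking=True)` runs the same request with reasoning enabled. The structure of the returned value is whatever `parse_response` produces.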
+ +Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text: + +
+Code for processing Audio + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt: + + +```python +# Prompt - add audio before text +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"}, + {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."}, + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ +
+Code for processing Images + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages: + + +`pip install -U transformers torch torchvision accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt: + + +```python +# Prompt - add image before text +messages = [ + { + "role": "user", "content": [ + {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"}, + {"type": "text", "text": "What is shown in this image?"} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + +
+Code for processing Videos + +Instead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages: + +`pip install -U transformers torch torchvision librosa accelerate` + +You can then load the model with the code below: + +```python +from transformers import AutoProcessor, AutoModelForMultimodalLM + +MODEL_ID = "google/gemma-4-E4B-it" + +# Load model +processor = AutoProcessor.from_pretrained(MODEL_ID) +model = AutoModelForMultimodalLM.from_pretrained( + MODEL_ID, + dtype="auto", + device_map="auto" +) +``` + +Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt: + + +```python +# Prompt - add video before text +messages = [ + { + 'role': 'user', + 'content': [ + {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"}, + {'type': 'text', 'text': 'Describe this video.'} + ] + } +] + +# Process input +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) +input_len = inputs["input_ids"].shape[-1] + +# Generate output +outputs = model.generate(**inputs, max_new_tokens=512) +response = processor.decode(outputs[0][input_len:], skip_special_tokens=False) + +# Parse output +processor.parse_response(response) +``` + +
+ + + +## **Best Practices** + +For the best performance, use these configurations and best practices: + +### 1. Sampling Parameters + +Use the following standardized sampling configuration across all use cases: + +* `temperature=1.0` +* `top_p=0.95` +* `top_k=64` + +### 2. Thinking Mode Configuration + +Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens: + +* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token. +* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure: + `<|channel>thought\n`**[Internal reasoning]**`` +* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block: + `<|channel>thought\n`**[Final answer]** + +> [!Note] +> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you. + +### 3. Multi-Turn Conversations + +* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins. + +### 4. Modality order + +* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. + +### 5. Variable Image Resolution + +Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding. 
+ +* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**. + * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail. + * Use *higher budgets* for tasks like OCR, document parsing, or reading small text. + +### 6. Audio + +Use the following prompt structures for audio processing: + +* **Automatic Speech Recognition (ASR)** + +```text +Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text. + +Follow these specific instructions for formatting the answer: +* Only output the transcription, with no newlines. +* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three. +``` + +* **Automatic Speech Translation (AST)** + +```text +Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}. +When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}. +``` + +### 7. Audio and Video Length + +All models support image inputs and can process videos as frames, whereas only the E2B and E4B models also support audio inputs. Audio inputs are limited to 30 seconds. Video inputs are limited to 60 seconds, assuming frames are sampled at one frame per second. + +## **Model Data** + +Data used for model training and how the data was processed. + +### **Training Dataset** + +Our pre-training dataset is a large-scale, diverse collection of data spanning a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components: + +* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. 
The training dataset includes content in over 140 languages. +* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. +* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. +* **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks. + +The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. + +### **Data Preprocessing** + +Here are the key data cleaning and filtering methods applied to the training data: + +* **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. +* **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. +* **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf). + +## **Ethics and Safety** + +As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models. + +### **Evaluation Approach** + +Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. 
These evaluations align with [Google’s AI principles](https://ai.google/principles/), as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including: + +* Content related to child sexual abuse material and exploitation +* Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm) +* Sexually explicit content +* Hate speech (e.g., dehumanizing members of protected groups) +* Harassment (e.g., encouraging violence against people) + +### **Evaluation Results** + +For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance. + +## **Usage and Limitations** + +These models have certain limitations that users should be aware of. + +### **Intended Usage** + +Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. + +* **Content Creation and Communication** + * **Text Generation**: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. + * **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications. 
+ * **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports. + * **Image Data Extraction**: These models can be used to extract, interpret, and summarize visual data for text communications. + * **Audio Processing and Interaction**: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions. +* **Research and Education** + * **Natural Language Processing (NLP) and VLM Research**: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. + * **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. + * **Knowledge Exploration**: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. + +### **Limitations** + +* **Training Data** + * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. + * The scope of the training dataset determines the subject areas the model can handle effectively. +* **Context and Task Complexity** + * Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. + * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). +* **Language Ambiguity and Nuance** + * Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. +* **Factual Accuracy** + * Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. 
+* **Common Sense** + * Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. + +### **Ethical Considerations and Risks** + +The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: + +* **Bias and Fairness** + * VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases. +* **Misinformation and Misuse** + * VLMs can be misused to generate text that is false, misleading, or harmful. + * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). +* **Transparency and Accountability** + * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. + * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. + +**Risks identified and mitigations**: + +* **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. +* **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. +* **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. 
Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. +* **Perpetuation of biases**: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. + +### **Benefits** + +At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. diff --git a/tooling/huggingface/model-cards/gemma-4-E4B-it-chat_template.jinja b/tooling/huggingface/model-cards/gemma-4-E4B-it-chat_template.jinja new file mode 100644 index 0000000..07e50e6 --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E4B-it-chat_template.jinja @@ -0,0 +1,344 @@ +{%- macro format_parameters(properties, required) -%} + {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%} + {%- set ns = namespace(found_first=false) -%} + {%- for key, value in properties | dictsort -%} + {%- set add_comma = false -%} + {%- if key not in standard_keys -%} + {%- if ns.found_first %},{% endif -%} + {%- set ns.found_first = true -%} + {{ key }}:{ + {%- if value['description'] -%} + description:<|"|>{{ value['description'] }}<|"|> + {%- set add_comma = true -%} + {%- endif -%} + {%- if value['type'] | upper == 'STRING' -%} + {%- if value['enum'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + enum:{{ format_argument(value['enum']) }} + {%- endif -%} + {%- elif value['type'] | upper == 'ARRAY' -%} + {%- if value['items'] is mapping and value['items'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + items:{ + {%- set ns_items = namespace(found_first=false) -%} + {%- for item_key, item_value in value['items'] | dictsort -%} + {%- if item_value is not none -%} + {%- if ns_items.found_first %},{% endif -%} + {%- set 
ns_items.found_first = true -%} + {%- if item_key == 'properties' -%} + properties:{ + {%- if item_value is mapping -%} + {{- format_parameters(item_value, value['items']['required'] | default([])) -}} + {%- endif -%} + } + {%- elif item_key == 'required' -%} + required:[ + {%- for req_item in item_value -%} + <|"|>{{- req_item -}}<|"|> + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + ] + {%- elif item_key == 'type' -%} + {%- if item_value is string -%} + type:{{ format_argument(item_value | upper) }} + {%- else -%} + type:{{ format_argument(item_value | map('upper') | list) }} + {%- endif -%} + {%- else -%} + {{ item_key }}:{{ format_argument(item_value) }} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + } + {%- endif -%} + {%- endif -%} + {%- if value['nullable'] %} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + nullable:true + {%- endif -%} + {%- if value['type'] | upper == 'OBJECT' -%} + {%- if value['properties'] is defined and value['properties'] is mapping -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + properties:{ + {{- format_parameters(value['properties'], value['required'] | default([])) -}} + } + {%- elif value is mapping -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + properties:{ + {{- format_parameters(value, value['required'] | default([])) -}} + } + {%- endif -%} + {%- if value['required'] -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + required:[ + {%- for item in value['required'] | default([]) -%} + <|"|>{{- item -}}<|"|> + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + ] + {%- endif -%} + {%- endif -%} + {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%} + type:<|"|>{{ value['type'] | upper }}<|"|>} + {%- endif -%} + {%- endfor -%} +{%- endmacro -%} +{%- macro format_function_declaration(tool_data) -%} + declaration:{{- tool_data['function']['name'] 
-}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|> + {%- set params = tool_data['function']['parameters'] -%} + {%- if params -%} + ,parameters:{ + {%- if params['properties'] -%} + properties:{ {{- format_parameters(params['properties'], params['required']) -}} }, + {%- endif -%} + {%- if params['required'] -%} + required:[ + {%- for item in params['required'] -%} + <|"|>{{- item -}}<|"|> + {{- ',' if not loop.last -}} + {%- endfor -%} + ], + {%- endif -%} + {%- if params['type'] -%} + type:<|"|>{{- params['type'] | upper -}}<|"|>} + {%- endif -%} + {%- endif -%} + {%- if 'response' in tool_data['function'] -%} + {%- set response_declaration = tool_data['function']['response'] -%} + ,response:{ + {%- if response_declaration['description'] -%} + description:<|"|>{{- response_declaration['description'] -}}<|"|>, + {%- endif -%} + {%- if response_declaration['type'] | upper == 'OBJECT' -%} + type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>} + {%- endif -%} + {%- endif -%} + } +{%- endmacro -%} +{%- macro format_argument(argument, escape_keys=True) -%} + {%- if argument is string -%} + {{- '<|"|>' + argument + '<|"|>' -}} + {%- elif argument is boolean -%} + {{- 'true' if argument else 'false' -}} + {%- elif argument is mapping -%} + {{- '{' -}} + {%- set ns = namespace(found_first=false) -%} + {%- for key, value in argument | dictsort -%} + {%- if ns.found_first %},{% endif -%} + {%- set ns.found_first = true -%} + {%- if escape_keys -%} + {{- '<|"|>' + key + '<|"|>' -}} + {%- else -%} + {{- key -}} + {%- endif -%} + :{{- format_argument(value, escape_keys=escape_keys) -}} + {%- endfor -%} + {{- '}' -}} + {%- elif argument is sequence -%} + {{- '[' -}} + {%- for item in argument -%} + {{- format_argument(item, escape_keys=escape_keys) -}} + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + {{- ']' -}} + {%- else -%} + {{- argument -}} + {%- endif -%} +{%- endmacro -%} +{%- macro strip_thinking(text) -%} + {%- set ns = 
namespace(result='') -%} + {%- for part in text.split('<channel|>') -%} + {%- if '<|channel>' in part -%} + {%- set ns.result = ns.result + part.split('<|channel>')[0] -%} + {%- else -%} + {%- set ns.result = ns.result + part -%} + {%- endif -%} + {%- endfor -%} + {{- ns.result | trim -}} +{%- endmacro -%} + +{%- macro format_tool_response_block(tool_name, response) -%} + {{- '<|tool_response>' -}} + {%- if response is mapping -%} + {{- 'response:' + tool_name + '{' -}} + {%- for key, value in response | dictsort -%} + {{- key -}}:{{- format_argument(value, escape_keys=False) -}} + {%- if not loop.last %},{% endif -%} + {%- endfor -%} + {{- '}' -}} + {%- else -%} + {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}} + {%- endif -%} + {{- '<tool_response|>' -}} +{%- endmacro -%} + +{%- set ns = namespace(prev_message_type=None) -%} +{%- set loop_messages = messages -%} +{{- bos_token -}} +{#- Handle System/Tool Definitions Block -#} +{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%} + {{- '<|turn>system\n' -}} + + {#- Inject Thinking token at the very top of the FIRST system turn -#} + {%- if enable_thinking is defined and enable_thinking -%} + {{- '<|think|>\n' -}} + {%- set ns.prev_message_type = 'think' -%} + {%- endif -%} + + {%- if messages[0]['role'] in ['system', 'developer'] -%} + {{- messages[0]['content'] | trim -}} + {%- set loop_messages = messages[1:] -%} + {%- endif -%} + + {%- if tools -%} + {%- for tool in tools %} + {{- '<|tool>' -}} + {{- format_function_declaration(tool) | trim -}} + {{- '<tool|>' -}} + {%- endfor %} + {%- set ns.prev_message_type = 'tool' -%} + {%- endif -%} + + {{- '<turn|>\n' -}} +{%- endif %} + +{#- Pre-scan: find last user message index for reasoning guard -#} +{%- set ns_turn = namespace(last_user_idx=-1) -%} +{%- for i in range(loop_messages | length) -%} + {%- if loop_messages[i]['role'] == 'user' -%} + {%- set ns_turn.last_user_idx = i -%} + {%- 
endif -%} +{%- endfor -%} + +{#- Loop through messages -#} +{%- for message in loop_messages -%} + {%- if message['role'] != 'tool' -%} + {%- set ns.prev_message_type = None -%} + {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%} + {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#} + {%- set prev_nt = namespace(role=None, found=false) -%} + {%- if loop.index0 > 0 -%} + {%- for j in range(loop.index0 - 1, -1, -1) -%} + {%- if not prev_nt.found -%} + {%- if loop_messages[j]['role'] != 'tool' -%} + {%- set prev_nt.role = loop_messages[j]['role'] -%} + {%- set prev_nt.found = true -%} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%} + {%- if not continue_same_model_turn -%} + {{- '<|turn>' + role + '\n' }} + {%- endif -%} + + {#- Render reasoning/reasoning_content as thinking channel -#} + {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%} + {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%} + {{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}} + {%- endif -%} + + {%- if message['tool_calls'] -%} + {%- for tool_call in message['tool_calls'] -%} + {%- set function = tool_call['function'] -%} + {{- '<|tool_call>call:' + function['name'] + '{' -}} + {%- if function['arguments'] is mapping -%} + {%- set ns_args = namespace(found_first=false) -%} + {%- for key, value in function['arguments'] | dictsort -%} + {%- if ns_args.found_first %},{% endif -%} + {%- set ns_args.found_first = true -%} + {{- key -}}:{{- format_argument(value, escape_keys=False) -}} + {%- endfor -%} + {%- elif function['arguments'] is string -%} + {{- function['arguments'] -}} + {%- endif -%} + {{- '}<tool_call|>' -}} + {%- endfor -%} + {%- set ns.prev_message_type = 'tool_call' -%} + {%- endif -%} + + {%- set ns_tr_out = 
namespace(flag=false) -%} + {%- if message.get('tool_responses') -%} + {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#} + {%- for tool_response in message['tool_responses'] -%} + {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}} + {%- set ns_tr_out.flag = true -%} + {%- set ns.prev_message_type = 'tool_response' -%} + {%- endfor -%} + {%- elif message.get('tool_calls') -%} + {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#} + {%- set ns_tool_scan = namespace(stopped=false) -%} + {%- for k in range(loop.index0 + 1, loop_messages | length) -%} + {%- if ns_tool_scan.stopped -%} + {%- elif loop_messages[k]['role'] != 'tool' -%} + {%- set ns_tool_scan.stopped = true -%} + {%- else -%} + {%- set follow = loop_messages[k] -%} + {#- Resolve tool_call_id to function name -#} + {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%} + {%- for tc in message['tool_calls'] -%} + {%- if tc.get('id') == follow.get('tool_call_id') -%} + {%- set ns_tname.name = tc['function']['name'] -%} + {%- endif -%} + {%- endfor -%} + {#- Handle content as string or content-parts array -#} + {%- set tool_body = follow.get('content') -%} + {%- if tool_body is string -%} + {{- format_tool_response_block(ns_tname.name, tool_body) -}} + {%- elif tool_body is sequence and tool_body is not string -%} + {%- set ns_txt = namespace(s='') -%} + {%- for part in tool_body -%} + {%- if part.get('type') == 'text' -%} + {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%} + {%- endif -%} + {%- endfor -%} + {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}} + {%- else -%} + {{- format_tool_response_block(ns_tname.name, tool_body) -}} + {%- endif -%} + {%- set ns_tr_out.flag = true -%} + {%- set ns.prev_message_type = 'tool_response' -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + + {%- if message['content'] is string -%} + {%- if role == 
'model' -%} + {{- strip_thinking(message['content']) -}} + {%- else -%} + {{- message['content'] | trim -}} + {%- endif -%} + {%- elif message['content'] is sequence -%} + {%- for item in message['content'] -%} + {%- if item['type'] == 'text' -%} + {%- if role == 'model' -%} + {{- strip_thinking(item['text']) -}} + {%- else -%} + {{- item['text'] | trim -}} + {%- endif -%} + {%- elif item['type'] == 'image' -%} + {{- '<|image|>' -}} + {%- set ns.prev_message_type = 'image' -%} + {%- elif item['type'] == 'audio' -%} + {{- '<|audio|>' -}} + {%- set ns.prev_message_type = 'audio' -%} + {%- elif item['type'] == 'video' -%} + {{- '<|video|>' -}} + {%- set ns.prev_message_type = 'video' -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + + {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%} + {{- '<|tool_response>' -}} + {%- elif not (ns_tr_out.flag and not message.get('content')) -%} + {{- '<turn|>\n' -}} + {%- endif -%} + {%- endif -%} +{%- endfor -%} + +{%- if add_generation_prompt -%} + {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%} + {{- '<|turn>model\n' -}} + {%- endif -%} +{%- endif -%} \ No newline at end of file diff --git a/tooling/huggingface/model-cards/gemma-4-E4B-it-tokenizer_config.json b/tooling/huggingface/model-cards/gemma-4-E4B-it-tokenizer_config.json new file mode 100644 index 0000000..375b25d --- /dev/null +++ b/tooling/huggingface/model-cards/gemma-4-E4B-it-tokenizer_config.json @@ -0,0 +1,74 @@ +{ + "audio_token": "<|audio|>", + "backend": "tokenizers", + "boa_token": "<|audio>", + "boi_token": "<|image>", + "bos_token": "", + "eoa_token": "<audio|>", + "eoc_token": "<channel|>", + "eoi_token": "<image|>", + "eos_token": "", + "eot_token": "<turn|>", + "escape_token": "<|\"|>", + "etc_token": "<tool_call|>", + "etd_token": "<tool|>", + "etr_token": "<tool_response|>", + "extra_special_tokens": [ + "<|video|>" + ], + "image_token": "<|image|>", + "mask_token": "", + "model_max_length": 1000000000000000019884624838656, + "pad_token": "", + "padding_side": 
"left", + "processor_class": "Gemma4Processor", + "response_schema": { + "type": "object", + "properties": { + "role": { + "const": "assistant" + }, + "thinking": { + "type": "string" + }, + "content": { + "type": "string" + }, + "tool_calls": { + "x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>", + "type": "array", + "items": { + "type": "object", + "properties": { + "type": { + "const": "function" + }, + "function": { + "type": "object", + "x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})", + "properties": { + "name": { + "type": "string" + }, + "arguments": { + "type": "object", + "x-parser": "gemma4-tool-call", + "additionalProperties": {} + } + } + } + } + } + } + }, + "x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?" + }, + "soc_token": "<|channel>", + "sot_token": "<|turn>", + "stc_token": "<|tool_call>", + "std_token": "<|tool>", + "str_token": "<|tool_response>", + "think_token": "<|think|>", + "tokenizer_class": "GemmaTokenizer", + "unk_token": "" +} diff --git a/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal-extracted.py b/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal-extracted.py new file mode 100644 index 0000000..355ba01 --- /dev/null +++ b/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal-extracted.py @@ -0,0 +1,389 @@ +# Gemma4_(E2B)-Multimodal.ipynb — extracted cells +# Source: https://github.com/huggingface/huggingface-gemma-recipes/blob/main/notebooks/Gemma4_(E2B)-Multimodal.ipynb + +# ===== CELL 0 (markdown) ===== +# This notebook has vibe test examples to test image, text, audio capabilities of Gemma-4 model. To get started, let's install latest stable release of transformers. + +# ===== CELL 1 (code) ===== +!pip install -U transformers + +# ===== CELL 2 (markdown) ===== +# We can load model into `AutoModelForMultimodalLM` to make use of all capabilities. 
+ +# ===== CELL 3 (code) ===== +import torch +from PIL import Image + +from transformers import AutoModelForMultimodalLM, AutoProcessor +#model_list = ["google/gemma-4-26B-A4B-it", "google/gemma-4-E4B-it", +# "google/gemma-4-E2B-it", "google/gemma-4-31B-it"] +model_id = "google/gemma-4-E2B-it" +model = AutoModelForMultimodalLM.from_pretrained(model_id, device_map="auto") +processor = AutoProcessor.from_pretrained(model_id) + +# ===== CELL 4 (markdown) ===== +# ## Code completion + +# ===== CELL 5 (markdown) ===== +# We give Gemma-4 a website screenshot to reproduce the code. + +# ===== CELL 6 (code) ===== +messages = [ + { + "role": "user", + "content": [ + { + "type": "image", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/landing_page.png", + }, + {"type": "text", "text": "Write HTML code for this page."}, + ], + } +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) + +output = model.generate(**inputs, max_new_tokens=4000) + +# ===== CELL 7 (code) ===== +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) + +print(result["content"]) + +# ===== CELL 8 (markdown) ===== +# ## Video Inference + +# ===== CELL 9 (markdown) ===== +# We test Gemma-4 on video understanding. If you want to run this example with larger models which don't take audio input, disable `load_audio_from_video`. + +# ===== CELL 10 (code) ===== +messages = [ + { + "role": "user", + "content": [ + {"type": "video", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4"}, + {"type": "text", "text": "What is happening in the video? 
What is the song about?"}, + ], + }, +] +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + load_audio_from_video=True, +).to(model.device) +output = model.generate(**inputs, max_new_tokens=200) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) + +# ===== CELL 11 (code) ===== +print(result["content"]) + +# ===== CELL 12 (markdown) ===== +# ## Multimodal Function Calling + +# ===== CELL 13 (code) ===== +import re + +WEATHER_TOOL = { + "type": "function", + "function": { + "name": "get_weather", + "description": "Gets the current weather for a specific location.", + "parameters": { + "type": "object", + "properties": { + "city": {"type": "string", "description": "The city name"}, + }, + "required": ["city"], + }, + }, +} +tools = [WEATHER_TOOL] + +messages = [ + {"role": "user", "content": [ + {"type": "image", "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg"}, + {"type": "text", "text": "What is the city in this image? 
Check the weather there right now."}, + ]}, +] + +inputs = processor.apply_chat_template( + messages, + tools=[WEATHER_TOOL], + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, + enable_thinking=True, +).to(model.device) + +# ===== CELL 14 (code) ===== +output = model.generate(**inputs, max_new_tokens=1000) + +# ===== CELL 15 (code) ===== +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) + +# ===== CELL 16 (code) ===== +print(result["content"]) + +# ===== CELL 17 (markdown) ===== +# # Any-to-any inference + +# ===== CELL 18 (markdown) ===== +# We can also run the model with `any-to-any` pipeline. + +# ===== CELL 19 (code) ===== +from transformers import pipeline + +pipe = pipeline("any-to-any", model="google/gemma-4-e2b-it") + +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] + +# ===== CELL 20 (code) ===== +pipe(messages)#, load_audio_from_video=True) + +# ===== CELL 21 (code) ===== +messages = [ + { + "role": "user", + "content": [ + { + "type": "video", + "image": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4", + }, + {"type": "text", "text": "What is happening in this video?"}, + ], + } +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + add_generation_prompt=True, + return_dict=True, + return_tensors="pt" +) +inputs = inputs.to(model.device) + +generated_ids = model.generate(**inputs, max_new_tokens=128) +generated_ids_trimmed = [ + out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids) +] +output_text = processor.batch_decode( + generated_ids_trimmed, skip_special_tokens=True, 
clean_up_tokenization_spaces=False +) +print(output_text) + +# ===== CELL 22 (markdown) ===== +# # Object detection and pointing + +# ===== CELL 23 (code) ===== +import re +import torch +from transformers.image_utils import load_image +from PIL import Image +import matplotlib.pyplot as plt +import matplotlib.patches as patches +import json + +# ===== CELL 24 (code) ===== +image_url = "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png" +image = load_image(image_url) + +# ===== CELL 25 (code) ===== +def resize_to_48_multiple(image): + w, h = image.size + new_w = (w // 48) * 48 + new_h = (h // 48) * 48 + return image.crop((0, 0, new_w, new_h)) + +# ===== CELL 26 (code) ===== +def inputs_for_object_detection(image, what_object): + messages = [ + { + "role": "user", "content": [ + {"type": "image", "image": image}, + {"type": "text", "text": f"What's the bounding box for the {what_object} in the image?"} + ] + } + ] + + inputs = processor.apply_chat_template( + messages, + tokenize=True, + add_generation_prompt=True, + return_dict=True, + return_tensors="pt", + enable_thinking=False, + ) + + return inputs.to(model.device) + +# ===== CELL 27 (code) ===== +def extract_json(text: str): + text = text.strip() + + text = re.sub(r"^```(?:json)?\s*", "", text) + text = re.sub(r"\s*```$", "", text) + + # Try direct parse first + try: + return json.loads(text) + except json.JSONDecodeError: + pass + + # Fallback: extract first JSON object or array + match = re.search(r'(\{.*\}|\[.*\])', text, re.DOTALL) + if match: + candidate = match.group(1) + return json.loads(candidate) + + raise ValueError("No valid JSON found") + +# ===== CELL 28 (code) ===== +def detect_object(image_url, what_object): + image = load_image(image_url) + image = resize_to_48_multiple(image) + inputs = inputs_for_object_detection(image, what_object) + input_len = inputs["input_ids"].shape[-1] + generated_outputs = model.generate(**inputs, max_new_tokens=1000, do_sample=False) + 
generated = processor.decode(generated_outputs[0, input_len:]) + parsed_json = extract_json(generated)[0] + return parsed_json + +# ===== CELL 29 (code) ===== +def draw_pascal_voc_boxes(i, image, box, label, resize_shape=(1000,1000)): + dpi = 72 + width, height = image.size + fig, ax = plt.subplots(1, figsize=[width/dpi, height/dpi], tight_layout={'pad':0}) + + ax.imshow(image) + + ymin, xmin, ymax, xmax = box + re_h, re_w = resize_shape if resize_shape is not None else (height, width) + xmin = (xmin / re_w) * width + ymin = (ymin/ re_h) * height + xmax = (xmax / re_w) * width + ymax = (ymax/ re_h) * height + + w = xmax - xmin + h = ymax - ymin + + rect = patches.Rectangle( + (xmin, ymin), + w, + h, + linewidth=10, + edgecolor="green", + facecolor="none" + ) + ax.add_patch(rect) + + if label is not None: + ax.text(xmin, ymin-25, label, fontsize=24, bbox=dict(facecolor="yellow", alpha=0.5)) + + plt.axis("off") + plt.savefig(f"boxes_{i}.png") + plt.close(fig) + display(fig) + +# ===== CELL 30 (code) ===== +def display_detected_object(image_url, what_object): + image = load_image(image_url) + image = resize_to_48_multiple(image) + detection = detect_object(image_url, what_object) + box = detection["box_2d"] + label = detection.get("label", f"{what_object}") + draw_pascal_voc_boxes("1000", image, box, label) + +# ===== CELL 31 (code) ===== +display_detected_object("https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png", "bike") + +# ===== CELL 32 (markdown) ===== +# ## Captioning + +# ===== CELL 33 (code) ===== +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "url": "https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png"}, + {"type": "text", "text": "Write single detailed caption for this image."}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = 
model.generate(**inputs, max_new_tokens=512) +input_len = inputs.input_ids.shape[-1] +generated_text_ids = output[0][input_len:] +generated_text = processor.decode(generated_text_ids, skip_special_tokens=True) +result = processor.parse_response(generated_text) +print(result["content"]) + +# ===== CELL 34 (markdown) ===== +# ## Audio Understanding + +# ===== CELL 35 (code) ===== +messages = [ + { + "role": "user", + "content": [ + {"type": "audio", "url": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3"}, + {"type": "text", "text": "Can you describe this audio in detail?"}, + ], + }, +] + +inputs = processor.apply_chat_template( + messages, + tokenize=True, + return_dict=True, + return_tensors="pt", + add_generation_prompt=True, +).to(model.device) + +output = model.generate( + **inputs, + max_new_tokens=1000, + do_sample=False, +) + +print(processor.decode(output[0], skip_special_tokens=True)) + diff --git a/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal.ipynb b/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal.ipynb new file mode 100644 index 0000000..ec9243d --- /dev/null +++ b/tooling/huggingface/recipes/notebooks/Gemma4_E2B-Multimodal.ipynb @@ -0,0 +1,595 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "This notebook has vibe test examples to test image, text, audio capabilities of Gemma-4 model. To get started, let's install latest stable release of transformers." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "!pip install -U transformers" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "We can load model into `AutoModelForMultimodalLM` to make use of all capabilities." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import torch\n", + "from PIL import Image\n", + "\n", + "from transformers import AutoModelForMultimodalLM, AutoProcessor\n", + "#model_list = [\"google/gemma-4-26B-A4B-it\", \"google/gemma-4-E4B-it\",\n", + "# \"google/gemma-4-E2B-it\", \"google/gemma-4-31B-it\"]\n", + "model_id = \"google/gemma-4-E2B-it\"\n", + "model = AutoModelForMultimodalLM.from_pretrained(model_id, device_map=\"auto\")\n", + "processor = AutoProcessor.from_pretrained(model_id)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Code completion" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We give Gemma-4 a website screenshot to reproduce the code." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"image\",\n", + " \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/landing_page.png\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"Write HTML code for this page.\"},\n", + " ],\n", + " }\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " enable_thinking=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(**inputs, max_new_tokens=4000)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n", + "\n", + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Video Inference" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We test Gemma-4 on video understanding. If you want to run this example with larger models which don't take audio input, disable `load_audio_from_video`." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"video\", \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4\"},\n", + " {\"type\": \"text\", \"text\": \"What is happening in the video? 
What is the song about?\"},\n", + " ],\n", + " },\n", + "]\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " load_audio_from_video=True,\n", + ").to(model.device)\n", + "output = model.generate(**inputs, max_new_tokens=200)\n", + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Multimodal Function Calling" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import re\n", + "\n", + "WEATHER_TOOL = {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"get_weather\",\n", + " \"description\": \"Gets the current weather for a specific location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"city\": {\"type\": \"string\", \"description\": \"The city name\"},\n", + " },\n", + " \"required\": [\"city\"],\n", + " },\n", + " },\n", + "}\n", + "tools = [WEATHER_TOOL]\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": [\n", + " {\"type\": \"image\", \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/thailand.jpg\"},\n", + " {\"type\": \"text\", \"text\": \"What is the city in this image? 
Check the weather there right now.\"},\n", + " ]},\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tools=[WEATHER_TOOL],\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + " enable_thinking=True,\n", + ").to(model.device)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "output = model.generate(**inputs, max_new_tokens=1000)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Any-to-any inference" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "We can also run the model with `any-to-any` pipeline." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from transformers import pipeline\n", + "\n", + "pipe = pipeline(\"any-to-any\", model=\"google/gemma-4-e2b-it\")\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"video\",\n", + " \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"What is happening in this video?\"},\n", + " ],\n", + " }\n", + "]\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "pipe(messages)#, load_audio_from_video=True)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"video\",\n", + " \"image\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/rockets.mp4\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"What is happening in this video?\"},\n", + " ],\n", + " }\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " add_generation_prompt=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\"\n", + ")\n", + "inputs = inputs.to(model.device)\n", + "\n", + "generated_ids = model.generate(**inputs, max_new_tokens=128)\n", + "generated_ids_trimmed = [\n", + " out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n", + "]\n", + "output_text = processor.batch_decode(\n", + " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", + ")\n", + "print(output_text)\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Object detection and pointing" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import 
re\n", + "import torch\n", + "from transformers.image_utils import load_image\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.patches as patches\n", + "import json" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "image_url = \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png\"\n", + "image = load_image(image_url)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def resize_to_48_multiple(image):\n", + " w, h = image.size\n", + " new_w = (w // 48) * 48\n", + " new_h = (h // 48) * 48\n", + " return image.crop((0, 0, new_w, new_h))" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def inputs_for_object_detection(image, what_object):\n", + " messages = [\n", + " {\n", + " \"role\": \"user\", \"content\": [\n", + " {\"type\": \"image\", \"image\": image},\n", + " {\"type\": \"text\", \"text\": f\"What's the bounding box for the {what_object} in the image?\"}\n", + " ]\n", + " }\n", + " ]\n", + "\n", + " inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " add_generation_prompt=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " enable_thinking=False,\n", + " )\n", + "\n", + " return inputs.to(model.device)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def extract_json(text: str):\n", + " text = text.strip()\n", + "\n", + " text = re.sub(r\"^```(?:json)?\\s*\", \"\", text)\n", + " text = re.sub(r\"\\s*```$\", \"\", text)\n", + "\n", + " # Try direct parse first\n", + " try:\n", + " return json.loads(text)\n", + " except json.JSONDecodeError:\n", + " pass\n", + "\n", + " # Fallback: extract first JSON object or array\n", + " match = re.search(r'(\\{.*\\}|\\[.*\\])', text, 
re.DOTALL)\n", + " if match:\n", + " candidate = match.group(1)\n", + " return json.loads(candidate)\n", + "\n", + " raise ValueError(\"No valid JSON found\")" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def detect_object(image_url, what_object):\n", + " image = load_image(image_url)\n", + " image = resize_to_48_multiple(image)\n", + " inputs = inputs_for_object_detection(image, what_object)\n", + " input_len = inputs[\"input_ids\"].shape[-1]\n", + " generated_outputs = model.generate(**inputs, max_new_tokens=1000, do_sample=False)\n", + " generated = processor.decode(generated_outputs[0, input_len:])\n", + " parsed_json = extract_json(generated)[0]\n", + " return parsed_json" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def draw_pascal_voc_boxes(i, image, box, label, resize_shape=(1000,1000)):\n", + " dpi = 72\n", + " width, height = image.size\n", + " fig, ax = plt.subplots(1, figsize=[width/dpi, height/dpi], tight_layout={'pad':0})\n", + "\n", + " ax.imshow(image)\n", + "\n", + " ymin, xmin, ymax, xmax = box\n", + " re_h, re_w = resize_shape if resize_shape is not None else (height, width)\n", + " xmin = (xmin / re_w) * width\n", + " ymin = (ymin/ re_h) * height\n", + " xmax = (xmax / re_w) * width\n", + " ymax = (ymax/ re_h) * height\n", + "\n", + " w = xmax - xmin\n", + " h = ymax - ymin\n", + "\n", + " rect = patches.Rectangle(\n", + " (xmin, ymin),\n", + " w,\n", + " h,\n", + " linewidth=10,\n", + " edgecolor=\"green\",\n", + " facecolor=\"none\"\n", + " )\n", + " ax.add_patch(rect)\n", + "\n", + " if label is not None:\n", + " ax.text(xmin, ymin-25, label, fontsize=24, bbox=dict(facecolor=\"yellow\", alpha=0.5))\n", + "\n", + " plt.axis(\"off\")\n", + " plt.savefig(f\"boxes_{i}.png\")\n", + " plt.close(fig)\n", + " display(fig)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": 
"code", + "source": [ + "def display_detected_object(image_url, what_object):\n", + " image = load_image(image_url)\n", + " image = resize_to_48_multiple(image)\n", + " detection = detect_object(image_url, what_object)\n", + " box = detection[\"box_2d\"]\n", + " label = detection.get(\"label\", f\"{what_object}\")\n", + " draw_pascal_voc_boxes(\"1000\", image, box, label)" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "display_detected_object(\"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bike.png\", \"bike\")" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "##\u00a0Captioning" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"image\", \"url\": \"https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/bird.png\"},\n", + " {\"type\": \"text\", \"text\": \"Write single detailed caption for this image.\"},\n", + " ],\n", + " },\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(**inputs, max_new_tokens=512)\n", + "input_len = inputs.input_ids.shape[-1]\n", + "generated_text_ids = output[0][input_len:]\n", + "generated_text = processor.decode(generated_text_ids, skip_special_tokens=True)\n", + "result = processor.parse_response(generated_text)\n", + "print(result[\"content\"])" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Audio Understanding" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": 
\"audio\", \"url\": \"https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama_first_45_secs.mp3\"},\n", + " {\"type\": \"text\", \"text\": \"Can you describe this audio in detail?\"},\n", + " ],\n", + " },\n", + "]\n", + "\n", + "inputs = processor.apply_chat_template(\n", + " messages,\n", + " tokenize=True,\n", + " return_dict=True,\n", + " return_tensors=\"pt\",\n", + " add_generation_prompt=True,\n", + ").to(model.device)\n", + "\n", + "output = model.generate(\n", + " **inputs,\n", + " max_new_tokens=1000,\n", + " do_sample=False,\n", + ")\n", + "\n", + "print(processor.decode(output[0], skip_special_tokens=True))\n" + ], + "metadata": {}, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/tooling/huggingface/recipes/scripts/carla_vlm_gemma.py b/tooling/huggingface/recipes/scripts/carla_vlm_gemma.py new file mode 100644 index 0000000..422ddb4 --- /dev/null +++ b/tooling/huggingface/recipes/scripts/carla_vlm_gemma.py @@ -0,0 +1,302 @@ +# Copyright 2020-2026 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# /// script +# dependencies = [ +# "trl", +# "openenv-carla-env @ git+https://huggingface.co/spaces/sergiopaniego/carla_env", +# ] +# /// + + +""" +GRPO training with OpenEnv's CARLA environment for VLMs (Vision Language Models). 
+ +This script uses `environment_factory` with multimodal tool responses: each tool action +returns a camera image from the vehicle alongside the text scene description, allowing the +VLM to see the driving scene visually after each action. + +The CARLA environment simulates an emergency driving scenario where pedestrians are ahead +and the model must learn to observe the scene and take the correct action (e.g., swerve +to an empty lane) to minimize casualties. + +Setup: +```sh +pip install "openenv-carla-env @ git+https://huggingface.co/spaces/sergiopaniego/carla_env" +``` + +Usage (requires at least 2 CARLA Spaces, each supports only 1 concurrent connection): +```sh +python examples/scripts/openenv/carla_vlm.py \ + --env-urls https://server1.hf.space https://server2.hf.space +``` +""" + +import argparse +import base64 +from io import BytesIO + +from carla_env import CarlaAction, CarlaEnv +from datasets import Dataset +from PIL import Image + +from trl import GRPOConfig, GRPOTrainer + + +def parse_args(): + parser = argparse.ArgumentParser(description="Run GRPO VLM training with CARLA environment.") + parser.add_argument("--model", type=str, default="google/gemma-4-E2B-it") + parser.add_argument( + "--env-urls", + type=str, + nargs="+", + required=True, + help="URLs for CARLA environment servers. At least 2 required (1 Space = 1 connection).", + ) + parser.add_argument("--dataset-size", type=int, default=1000) + parser.add_argument("--max-completion-length", type=int, default=3072) + parser.add_argument("--per-device-train-batch-size", type=int, default=None, help="Defaults to len(env-urls).") + parser.add_argument("--gradient-accumulation-steps", type=int, default=4) + parser.add_argument("--max-steps", type=int, default=100) + parser.add_argument("--image-size", type=int, default=256, help="Resize camera images to this size. 
0 to disable.") + parser.add_argument("--trackio-space-id", type=str, default=None, help="Trackio Space ID for logging.") + parser.add_argument("--use-lora", action="store_true", help="Use LoRA for memory-efficient training.") + parser.add_argument("--lora-r", type=int, default=128, help="LoRA rank.") + parser.add_argument("--lora-alpha", type=int, default=256, help="LoRA alpha.") + parser.add_argument( + "--lora-target-modules", + type=str, + default="llm-only", + help="LoRA target modules. Use 'llm-only' to skip vision encoder, 'all-linear' for all.", + ) + parser.add_argument( + "--learning-rate", type=float, default=5e-6, help="Learning rate. Default 5e-6 (good for LoRA r=128)." + ) + parser.add_argument("--hub-model-id", type=str, default=None) + parser.add_argument("--hub-private-repo", action="store_true", help="Make the Hub repo private.") + parser.add_argument("--run-name", type=str, default=None) + parser.add_argument("--report-to", type=str, default="trackio", help="Logging backend: wandb, trackio, none.") + return parser.parse_args() + + +SIM_TICKS = 10 + + +def reward_func(completions, environments, **kwargs): + rewards = [] + for i, (comp, env) in enumerate(zip(completions, environments, strict=False)): + # Advance to episode end to capture rubric_reward + try: + final = env._advance_until_done() + if final and final.observation.rubric_reward: + env.reward = final.observation.rubric_reward + except Exception as e: + print(f"[WARN] _advance_until_done failed for gen={i}: {e}") + r = env.reward + tools = [ + msg["tool_calls"][0]["function"]["name"] for msg in comp if isinstance(msg, dict) and msg.get("tool_calls") + ] + rewards.append(r) + print(f"[DEBUG reward] gen={i} tools={tools} env_reward={env.reward} total={r}") + return rewards + + +def main(): + args = parse_args() + env_url_iter = iter(args.env_urls) + image_size = args.image_size + + prompt = """\ +You control an autonomous vehicle in an emergency. 
There are pedestrians ahead and you must \ +decide what to do immediately. + +You will see a camera image from the vehicle after each action. Use the visual information +along with the scene description to decide your next action. + +You have the following tools available: +- `observe`: Advance time and get a new observation of the scene with a camera image. +- `emergency_stop`: Apply maximum braking to stop the vehicle. +- `lane_change(direction)`: Change lane to the left or right. Direction must be "left" or "right". + +Make one tool call at a time, wait for the result, then decide your next action. +Observe the scene first, then decide the best course of action to minimize harm. +Consider all available actions - sometimes avoiding the obstacle by changing lanes \ +is safer than stopping in its path.""" + + dataset = Dataset.from_dict({"prompt": [[{"role": "user", "content": prompt}] for _ in range(args.dataset_size)]}) + + class CarlaVLMEnv: + def __init__(self): + self.url = next(env_url_iter) + self.client = CarlaEnv(base_url=self.url, connect_timeout_s=30, message_timeout_s=120) + self.reward = 0.0 + + @staticmethod + def _describe(obs) -> str: + parts = [] + parts.append(f"Speed: {obs.speed_kmh:.1f} km/h.") + if obs.nearby_actors: + for actor in obs.nearby_actors: + parts.append(f"- {actor.get('type', 'actor')} at {actor.get('distance', '?')}m") + else: + parts.append("No nearby actors detected.") + if obs.collision_detected: + parts.append(f"COLLISION detected with {obs.collided_with or 'unknown'}!") + return "\n".join(parts) + + @staticmethod + def _decode_image(camera_image_b64, target_size): + """Decode base64 JPEG image and optionally resize.""" + img_bytes = base64.b64decode(camera_image_b64) + img = Image.open(BytesIO(img_bytes)) + if target_size > 0: + img.thumbnail((target_size, target_size), Image.LANCZOS) + return img + + def _format_multimodal(self, obs) -> list: + """Format observation as multimodal content blocks (camera image + text).""" + 
content = [] + if obs.camera_image is not None: + img = self._decode_image(obs.camera_image, image_size) + content.append({"type": "image", "image": img}) + content.append({"type": "text", "text": self._describe(obs)}) + return content + + def _advance(self, ticks: int = SIM_TICKS): + result = None + for _ in range(ticks): + result = self.client.step(CarlaAction(action_type="observe")) + if result.done: + break + return result + + def _advance_until_done(self, max_ticks: int = 50): + """Advance the simulation until the episode ends.""" + result = None + for _ in range(max_ticks): + result = self.client.step(CarlaAction(action_type="observe")) + if result.done: + break + return result + + def _advance_and_capture(self, ticks: int = SIM_TICKS): + """Advance the simulation, then capture an image of the current state.""" + result = self._advance(ticks) + capture_result = self.client.step(CarlaAction(action_type="capture_image")) + result.observation.camera_image = capture_result.observation.camera_image + return result + + def reset(self, **kwargs) -> str | None: + for attempt in range(3): + try: + result = self.client.reset(scenario_name="trolley_micro_escape_exists") + self.reward = 0.0 + return self._describe(result.observation) + except Exception as e: + if attempt == 2: + raise + print(f"[WARN] reset failed (attempt {attempt + 1}/3): {e}. Reconnecting...") + self.client = CarlaEnv(base_url=self.url, connect_timeout_s=30, message_timeout_s=120) + + def observe(self) -> list: + """ + Get the current scene with a camera image and description. + + Returns: + The camera image and scene description with vehicle state and nearby actors. + """ + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + return self._format_multimodal(result.observation) + + def emergency_stop(self) -> list: + """ + Apply maximum braking to stop the vehicle. + + Returns: + The camera image and scene description after braking. 
+ """ + self.client.step(CarlaAction(action_type="emergency_stop")) + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + print(f"[DEBUG env] emergency_stop: done={result.done}, reward={self.reward}") + return self._format_multimodal(result.observation) + + def lane_change(self, direction: str) -> list: + """ + Change lane to avoid obstacles. + + Args: + direction: Direction to change lane, either "left" or "right". + + Returns: + The camera image and scene description after changing lane. + """ + self.client.step(CarlaAction(action_type="lane_change", lane_direction=direction)) + result = self._advance_and_capture() + self.reward = result.observation.rubric_reward or 0.0 + print(f"[DEBUG env] lane_change({direction}): done={result.done}, reward={self.reward}") + return self._format_multimodal(result.observation) + + peft_config = None + if args.use_lora: + from peft import LoraConfig + + if args.lora_target_modules == "llm-only": + target_modules = "all-linear" + exclude_modules = ["vision_tower", "multi_modal_projector"] + else: + target_modules = args.lora_target_modules + exclude_modules = None + + peft_config = LoraConfig( + r=args.lora_r, + lora_alpha=args.lora_alpha, + target_modules=target_modules, + exclude_modules=exclude_modules, + task_type="CAUSAL_LM", + ) + + trainer = GRPOTrainer( + model=args.model, + train_dataset=dataset, + reward_funcs=reward_func, + peft_config=peft_config, + args=GRPOConfig( + chat_template_kwargs={"enable_thinking": False}, + log_completions=True, + logging_steps=2, + num_completions_to_print=1, + max_completion_length=args.max_completion_length, + per_device_train_batch_size=args.per_device_train_batch_size or len(args.env_urls), + steps_per_generation=1, + num_generations=len(args.env_urls), + max_tool_calling_iterations=10, + learning_rate=args.learning_rate, + gradient_accumulation_steps=args.gradient_accumulation_steps, + max_steps=args.max_steps, + push_to_hub=args.hub_model_id 
is not None, + hub_model_id=args.hub_model_id, + hub_private_repo=args.hub_private_repo, + run_name=args.run_name, + report_to=args.report_to, + trackio_space_id=args.trackio_space_id, + ), + environment_factory=CarlaVLMEnv, + ) + trainer.train() + + +if __name__ == "__main__": + main() diff --git a/tooling/huggingface/recipes/scripts/ft_gemma3n_audio_vt.py b/tooling/huggingface/recipes/scripts/ft_gemma3n_audio_vt.py new file mode 100644 index 0000000..684ca0c --- /dev/null +++ b/tooling/huggingface/recipes/scripts/ft_gemma3n_audio_vt.py @@ -0,0 +1,184 @@ +import os + +os.environ["TRANSFORMERS_VERBOSITY"] = "error" +os.environ["TOKENIZERS_PARALLELISM"] = "false" + +import random +from functools import partial + +import torch +from datasets import load_dataset +from matplotlib import pyplot as plt +from torch.utils.data import DataLoader +from tqdm import tqdm +from transformers import Gemma3nForConditionalGeneration, Gemma3nProcessor + + +def collate_fn(examples, processor): + messages = list() + for sample in examples: + audio = sample["audio"]["array"] + label = str(sample["text"]) + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": audio}, + {"type": "text", "text": "Please transcribe this audio."}, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": label}]}, + ] + messages.append(message) + + batch = processor.apply_chat_template( + messages, + add_generation_prompt=False, + tokenize=True, + return_dict=True, + return_tensors="pt", + ) + + labels = batch["input_ids"].clone() # Clone input IDs for labels + # Mask the tokens that we do not want to include in the loss computation + # -100 is ignored during categorical cross entropy loss computation + labels[labels == processor.tokenizer.pad_token_id] = -100 + labels[labels == processor.tokenizer.audio_token_id] = -100 + 
labels[labels == processor.tokenizer.image_token_id] = -100 + labels[labels == processor.tokenizer.boi_token_id] = -100 + labels[labels == processor.tokenizer.eoi_token_id] = -100 + + batch["labels"] = labels + + return batch + + +def freeze_layers(model): + for name, param in model.named_parameters(): + if "attn" in name: + param.requires_grad = True + else: + param.requires_grad = False + return model + + +def run_inference(val_dataset, processor, model, fname): + # infer before training + val_sample = random.choice(val_dataset) + audio = val_sample["audio"]["array"] + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant that transcribes speech accurately.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "audio", "audio": audio}, + {"type": "text", "text": "Please transcribe this audio."}, + ], + }, + ] + inputs = processor.apply_chat_template( + message, + add_generation_prompt=True, + tokenize=True, + return_dict=True, + return_tensors="pt", + ).to(model.device, dtype=torch.bfloat16) + input_len = inputs["input_ids"].shape[-1] + with torch.no_grad(): + generation = model.generate(**inputs, max_new_tokens=100, disable_compile=True) + generation = generation[0][input_len:] + + decoded = processor.decode(generation, skip_special_tokens=True) + + print(f"Audio transcription: {decoded}") + print(f"Label: {val_sample['text']}") + + +def main(): + model_id = "google/gemma-3n-E2B-it" + processor = Gemma3nProcessor.from_pretrained(model_id) + + # Load and split the dataset. 
+ ds_full = load_dataset("AdrienB134/Emilia-dataset-french-split", split="fr") + split_ds = ds_full.train_test_split(test_size=0.1, seed=42) + train_dataset = split_ds["train"].select(range(10000)) + val_dataset = split_ds["test"].select(range(100)) + + # create data loader + partial_collate_fn = partial(collate_fn, processor=processor) + train_dataloader = DataLoader( + train_dataset, + batch_size=1, + shuffle=True, + num_workers=8, + drop_last=True, + collate_fn=partial_collate_fn, + pin_memory=True, + ) + val_dataloader = DataLoader( + val_dataset, + batch_size=1, + shuffle=False, + num_workers=8, + drop_last=True, + collate_fn=partial_collate_fn, + ) + + # load the model and optimizer + model = Gemma3nForConditionalGeneration.from_pretrained(model_id).to( + "cuda", dtype=torch.bfloat16 + ) + + run_inference(val_dataset, processor, model, "pred_before.png") + + model = freeze_layers(model) + + params_to_train = filter(lambda p: p.requires_grad, model.parameters()) + optimizer = torch.optim.AdamW(params_to_train, lr=1e-5) + + # Start Training + accumulation_steps = 8 + for idx, batch in tqdm(enumerate(train_dataloader)): + outputs = model(**batch.to(model.device, dtype=torch.bfloat16)) + loss = outputs.loss / accumulation_steps + if idx % 100 == 0: + val_loss = 0.0 + with torch.no_grad(): + count = 0 + for val_batch in tqdm(val_dataloader, desc="Validation"): + val_loss = ( + val_loss + + model(**val_batch.to(model.device, dtype=torch.bfloat16)).loss + ) + count = count + 1 + val_loss = val_loss / count + print( + f"Iter: {idx} Loss: {loss.item():.4f} Val Loss: {val_loss.item():.4f}" + ) + run_inference(val_dataset, processor, model, f"infer_{idx}.png") + + loss.backward() + if (idx + 1) % accumulation_steps == 0: + optimizer.step() + optimizer.zero_grad() + + +if __name__ == "__main__": + main() diff --git a/tooling/huggingface/recipes/scripts/ft_gemma3n_image_trl.py b/tooling/huggingface/recipes/scripts/ft_gemma3n_image_trl.py new file mode 100644 index 0000000..3df53ae --- /dev/null 
+++ b/tooling/huggingface/recipes/scripts/ft_gemma3n_image_trl.py @@ -0,0 +1,352 @@ +""" +Train Gemma-3n on various vision-language datasets including intersection-dataset. + +For Gemma-3n with intersection dataset: +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + sft_vlm_gemma3n.py \ + --dataset_name ariG23498/intersection-dataset \ + --model_name_or_path google/gemma-3n-E2B-it \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 1 \ + --output_dir gemma-3n-E2B-it-trl-sft-intersection \ + --bf16 \ + --torch_dtype bfloat16 \ + --use_peft \ + --lora_target_modules all-linear \ + --attn_implementation eager + +Train Gemma-3n on the HuggingFaceH4/llava-instruct-mix-vsft dataset (single-image). + +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + sft_vlm_gemma3n.py \ + --dataset_name HuggingFaceH4/llava-instruct-mix-vsft \ + --model_name_or_path google/gemma-3-4b-it \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 1 \ + --output_dir gemma-3-4b-it-trl-sft-llava-instruct-mix-vsft \ + --bf16 \ + --torch_dtype bfloat16 \ + --use_peft \ + --lora_target_modules all-linear \ + --attn_implementation eager + +Train Gemma-3n on the FanqingM/MMIU-Benchmark dataset (multi-image). 
+ +accelerate launch \ + --config_file examples/accelerate_configs/deepspeed_zero3.yaml \ + sft_vlm_gemma3n.py \ + --dataset_name FanqingM/MMIU-Benchmark \ + --dataset_train_split test \ + --model_name_or_path google/gemma-3-4b-it \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 1 \ + --output_dir gemma-3-4b-it-trl-sft-MMIU-Benchmark \ + --bf16 \ + --torch_dtype bfloat16 \ + --use_peft \ + --lora_target_modules all-linear \ + --attn_implementation eager +""" + +import io +import os +import zipfile + +import torch +from datasets import DatasetDict, load_dataset +from huggingface_hub import hf_hub_download, list_repo_files +from PIL import Image +from transformers import (AutoModelForImageTextToText, AutoProcessor, + Gemma3nForConditionalGeneration) +from trl import (ModelConfig, ScriptArguments, SFTConfig, SFTTrainer, + TrlParser, get_kbit_device_map, get_quantization_config) + + +def my_get_peft_config(model_args: ModelConfig): + """A version of get_peft_config that handles comma-separated target modules""" + if model_args.use_peft is False: + return None + + # Import here to avoid issues if PEFT is not available + try: + from peft import LoraConfig + except ImportError: + raise ValueError( + "You need to have PEFT library installed in your environment, make sure to install `peft`. " + "Make sure to run `pip install -U peft`." 
+ ) + + # Fix the target_modules to be a list if it's a comma-separated string + target_modules = model_args.lora_target_modules + if isinstance(target_modules, str) and target_modules != "all-linear": + # Convert comma-separated string to list + target_modules = [module.strip() for module in target_modules.split(",")] + + peft_config = LoraConfig( + task_type=model_args.lora_task_type, + r=model_args.lora_r, + target_modules=target_modules, + lora_alpha=model_args.lora_alpha, + lora_dropout=model_args.lora_dropout, + bias="none", + use_rslora=model_args.use_rslora, + use_dora=model_args.use_dora, + modules_to_save=model_args.lora_modules_to_save, + ) + + return peft_config + + +# For intersection dataset processing +def format_intersection_data(samples: dict) -> dict[str, list]: + """Format intersection dataset to match expected message format""" + formatted_samples = {"messages": []} + for idx in range(len(samples["image"])): + image = samples["image"][idx].convert("RGB") + label = str(samples["label"][idx]) + + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant with great geometry skills.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "image", "image": image}, + { + "type": "text", + "text": "How many intersection points are there in the image?", + }, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": label}]}, + ] + formatted_samples["messages"].append(message) + return formatted_samples + + +# For multi-image example +def process_vision_info(messages: list[dict]) -> list[Image.Image]: + image_inputs = [] + for msg in messages: + content = msg.get("content", []) + if not isinstance(content, list): + content = [content] + + for element in content: + if isinstance(element, dict) and ( + "image" in element or element.get("type") == "image" + ): + if "image" in element: + image = element["image"] + else: + image = element + if image is not None: + # Handle dictionary with 
bytes + if isinstance(image, dict) and "bytes" in image: + pil_image = Image.open(io.BytesIO(image["bytes"])) + image_inputs.append(pil_image.convert("RGB")) + # Handle PIL Image objects + elif hasattr(image, "convert"): + image_inputs.append(image.convert("RGB")) + return image_inputs + + +def format_data(samples: dict) -> dict[str, list]: + formatted_samples = {"messages": []} + for cont in range(len(samples["question"])): + images = [] + for img_path in samples["input_image_path"][cont]: + try: + with open(img_path, "rb") as f: + img_bytes = f.read() + image = Image.open(io.BytesIO(img_bytes)).convert("RGB") + images.append({"type": "image", "image": image}) + except Exception as e: + print(f"Error processing image {img_path}: {e}") + continue + + formatted_samples["messages"].append( + [ + { + "role": "system", + "content": [{"type": "text", "text": samples["context"][cont]}], + }, + { + "role": "user", + "content": images + + [{"type": "text", "text": samples["question"][cont]}], + }, + { + "role": "assistant", + "content": [{"type": "text", "text": samples["output"][cont]}], + }, + ] + ) + return formatted_samples + + +# For multi-image example +def prepare_dataset( + dataset: DatasetDict, dataset_name: str, dataset_train_split: str +) -> DatasetDict: + all_files = list_repo_files(dataset_name, repo_type="dataset") + zip_files = [f for f in all_files if f.endswith(".zip")] + + for zip_filename in zip_files: + zip_path = hf_hub_download( + repo_id=dataset_name, filename=zip_filename, repo_type="dataset" + ) + extract_folder = zip_filename.replace(".zip", "") + os.makedirs(extract_folder, exist_ok=True) + + with zipfile.ZipFile(zip_path, "r") as zip_ref: + zip_ref.extractall(extract_folder) + + dataset = dataset.map(format_data, batched=True, batch_size=4, num_proc=16) + return dataset + + +def main(): + parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig)) + script_args, training_args, model_args = parser.parse_args_and_config() + 
training_args.gradient_checkpointing_kwargs = dict(use_reentrant=False) + training_args.remove_unused_columns = False + training_args.dataset_kwargs = {"skip_prepare_dataset": True} + + ################ + # Model, Tokenizer & Processor + ################ + torch_dtype = ( + model_args.torch_dtype + if model_args.torch_dtype in ["auto", None] + else getattr(torch, model_args.torch_dtype) + ) + quantization_config = get_quantization_config(model_args) + model_kwargs = dict( + revision=model_args.model_revision, + attn_implementation=model_args.attn_implementation, + torch_dtype=torch_dtype, + device_map=get_kbit_device_map() if quantization_config is not None else None, + quantization_config=quantization_config, + ) + processor = AutoProcessor.from_pretrained( + model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code + ) + processor.tokenizer.padding_side = "right" + + # Use appropriate model class based on model name + if "gemma-3n" in model_args.model_name_or_path.lower(): + model = Gemma3nForConditionalGeneration.from_pretrained( + model_args.model_name_or_path, + trust_remote_code=model_args.trust_remote_code, + **model_kwargs, + ) + else: + model = AutoModelForImageTextToText.from_pretrained( + model_args.model_name_or_path, + trust_remote_code=model_args.trust_remote_code, + **model_kwargs, + ) + + def collate_fn(examples): + texts = [] + images_list = [] + + for example in examples: + # Apply chat template to get text + text = processor.apply_chat_template( + example["messages"], tokenize=False, add_generation_prompt=False + ).strip() + texts.append(text) + + # Extract images + if "images" in example: # single-image case + images = [img.convert("RGB") for img in example["images"]] + else: # multi-image case or intersection dataset + images = process_vision_info(example["messages"]) + images_list.append(images) + + # Tokenize the texts and process the images + batch = processor( + text=texts, images=images_list, return_tensors="pt", 
padding=True + ) + + # The labels are the input_ids, and we mask the padding tokens in the loss computation + labels = batch["input_ids"].clone() + + # Mask tokens for Gemma3n model + if "gemma-3n" in model_args.model_name_or_path.lower(): + # Use Gemma3n specific token masking + labels[labels == processor.tokenizer.pad_token_id] = -100 + if hasattr(processor.tokenizer, "image_token_id"): + labels[labels == processor.tokenizer.image_token_id] = -100 + if hasattr(processor.tokenizer, "boi_token_id"): + labels[labels == processor.tokenizer.boi_token_id] = -100 + if hasattr(processor.tokenizer, "eoi_token_id"): + labels[labels == processor.tokenizer.eoi_token_id] = -100 + else: + # Original masking for other models + image_token_id = [ + processor.tokenizer.convert_tokens_to_ids( + processor.tokenizer.special_tokens_map["boi_token"] + ) + ] + labels[labels == processor.tokenizer.pad_token_id] = -100 + labels[labels == image_token_id] = -100 + labels[labels == 262144] = -100 + + batch["labels"] = labels + return batch + + ################ + # Dataset + ################ + dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config) + + # Handle different dataset formats + if script_args.dataset_name == "FanqingM/MMIU-Benchmark": + dataset = prepare_dataset( + dataset, script_args.dataset_name, script_args.dataset_train_split + ) + elif script_args.dataset_name == "ariG23498/intersection-dataset": + # Format intersection dataset + dataset = dataset.map( + format_intersection_data, batched=True, batch_size=4, num_proc=4 + ) + + ################ + # Training + ################ + trainer = SFTTrainer( + model=model, + args=training_args, + data_collator=collate_fn, + train_dataset=dataset[script_args.dataset_train_split], + eval_dataset=dataset[script_args.dataset_test_split] + if training_args.eval_strategy != "no" + else None, + processing_class=processor.tokenizer, + peft_config=my_get_peft_config(model_args), + ) + + trainer.train() + + # Save and 
push to hub + trainer.save_model(training_args.output_dir) + if training_args.push_to_hub: + trainer.push_to_hub(dataset_name=script_args.dataset_name) + if trainer.accelerator.is_main_process: + processor.push_to_hub(training_args.hub_model_id) + + +if __name__ == "__main__": + main() diff --git a/tooling/huggingface/recipes/scripts/ft_gemma3n_image_vt.py b/tooling/huggingface/recipes/scripts/ft_gemma3n_image_vt.py new file mode 100644 index 0000000..fd20a2d --- /dev/null +++ b/tooling/huggingface/recipes/scripts/ft_gemma3n_image_vt.py @@ -0,0 +1,186 @@ +import os + +os.environ["TRANSFORMERS_VERBOSITY"] = "error" +os.environ["TOKENIZERS_PARALLELISM"] = "false" + +import random +from functools import partial + +import torch +from datasets import load_dataset +from matplotlib import pyplot as plt +from torch.utils.data import DataLoader +from tqdm import tqdm +from transformers import Gemma3nForConditionalGeneration, Gemma3nProcessor + + +def collate_fn(examples, processor): + messages = list() + for sample in examples: + image = sample["image"].convert("RGB") + label = str(sample["label"]) + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant with great geometry skills.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "image", "image": image}, + { + "type": "text", + "text": "How many intersection points are there in the image?", + }, + ], + }, + {"role": "assistant", "content": [{"type": "text", "text": label}]}, + ] + messages.append(message) + + batch = processor.apply_chat_template( + messages, + add_generation_prompt=False, + tokenize=True, + return_dict=True, + return_tensors="pt", + ) + + labels = batch["input_ids"].clone() # Clone input IDs for labels + # Mask the tokens that we do not want to include in the loss computation + # -100 is ignored during categorical cross entropy loss computation + labels[labels == processor.tokenizer.pad_token_id] = -100 + labels[labels == 
processor.tokenizer.image_token_id] = -100 + labels[labels == processor.tokenizer.boi_token_id] = -100 + labels[labels == processor.tokenizer.eoi_token_id] = -100 + + batch["labels"] = labels + + return batch + + +def freeze_layers(model): + for name, param in model.named_parameters(): + if "attn" in name: + param.requires_grad = True + else: + param.requires_grad = False + return model + + +def run_inference(val_dataset, processor, model, fname): + # run inference on one random validation sample and save the prediction plot + val_sample = random.choice(val_dataset) + image = val_sample["image"].convert("RGB") + message = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "You are an assistant with great geometry skills.", + } + ], + }, + { + "role": "user", + "content": [ + {"type": "image", "image": image}, + { + "type": "text", + "text": "How many intersection points are there in the image?", + }, + ], + }, + ] + inputs = processor.apply_chat_template( + message, + add_generation_prompt=True, + tokenize=True, + return_dict=True, + return_tensors="pt", + ).to(model.device, dtype=torch.bfloat16) + input_len = inputs["input_ids"].shape[-1] + with torch.no_grad(): + generation = model.generate(**inputs, max_new_tokens=10, disable_compile=True) + generation = generation[0][input_len:] + + decoded = processor.decode(generation, skip_special_tokens=True) + + plt.imshow(image) + plt.axis("off") + plt.title(f"Pred: {decoded}") + plt.savefig(f"outputs_fine_tune/{fname}") # save before show(); show() may clear the figure + plt.show() + + +def main(): + model_id = "google/gemma-3n-E2B-it" + processor = Gemma3nProcessor.from_pretrained(model_id) + + # load the dataset + dataset_id = "ariG23498/intersection-dataset" + train_dataset = load_dataset(dataset_id, split="train") + val_dataset = load_dataset(dataset_id, split="validation") + + # create data loader + partial_collate_fn = partial(collate_fn, processor=processor) + train_dataloader = DataLoader( + train_dataset, + batch_size=2, + shuffle=True, + num_workers=8, + drop_last=True, + 
collate_fn=partial_collate_fn, + pin_memory=True, + ) + val_dataloader = DataLoader( + val_dataset, + batch_size=2, + shuffle=False, + num_workers=8, + drop_last=True, + collate_fn=partial_collate_fn, + ) + + # load the model and optimizer + model = Gemma3nForConditionalGeneration.from_pretrained(model_id).to("cuda") + + run_inference(val_dataset, processor, model, "pred_before.png") + + model = freeze_layers(model) + + params_to_train = filter(lambda p: p.requires_grad, model.parameters()) + optimizer = torch.optim.AdamW(params_to_train, lr=1e-5) + + # Start Training + accumulation_steps = 8 + for idx, batch in tqdm(enumerate(train_dataloader)): + outputs = model(**batch.to(model.device)) + loss = outputs.loss / accumulation_steps + if idx % 50 == 0: + val_loss = 0.0 + with torch.no_grad(): + count = 0 + for val_batch in val_dataloader: + val_loss = val_loss + model(**val_batch.to(model.device)).loss + count = count + 1 + val_loss = val_loss / count + print( + f"Iter: {idx} Loss: {loss.item():.4f} Val Loss: {val_loss.item():.4f}" + ) + run_inference(val_dataset, processor, model, f"infer_{idx}.png") + + loss.backward() + if (idx + 1) % accumulation_steps == 0: + optimizer.step() + optimizer.zero_grad() + + +if __name__ == "__main__": + main() diff --git a/tooling/huggingface/recipes/scripts/gemma3n_fine_tuning_on_all_modalities.py b/tooling/huggingface/recipes/scripts/gemma3n_fine_tuning_on_all_modalities.py new file mode 100644 index 0000000..54036ad --- /dev/null +++ b/tooling/huggingface/recipes/scripts/gemma3n_fine_tuning_on_all_modalities.py @@ -0,0 +1,425 @@ +# -*- coding: utf-8 -*- +"""Gemma3n Fine-tuning on All Modalities.ipynb + +Automatically generated by Colab. + +Original file is located at + https://colab.research.google.com/drive/1iEZUJuvKJpGU8t50BqfkiCQmGkaR6gd4 + +# Fine-tune Gemma3n on FineVideo + +In this notebook, we will see how to fine-tune Gemma3n on videos that include audio. 
+Using all three modalities is very costly compute-wise, so keep in mind that this is an educational tutorial to fit the model in 40GB VRAM. +""" + +!pip install -U -q timm transformers trl peft datasets + +import io +import os +import zipfile + +import torch +from datasets import load_dataset +from PIL import Image +from transformers import AutoProcessor, Gemma3nForConditionalGeneration + +from trl import ( + SFTConfig, + SFTTrainer, +) + +"""## Download videos and preprocessing + +FineVideo is a quite large dataset, we don't need a ton of examples, so we stream the dataset, check the duration and download the videos shorter than 30 secs. +""" + +from datasets import load_dataset +import json +import os + +dataset = load_dataset("HuggingFaceFV/finevideo", split="train", streaming=True) + + +os.makedirs("videos", exist_ok=True) +os.makedirs("metadata", exist_ok=True) + +for idx, sample in enumerate(dataset): + data = sample["json"] + duration = data.get("duration_seconds", 0) + if duration < 30: + video_filename = f"videos/sample_{idx}.mp4" + with open(video_filename, 'wb') as video_file: + video_file.write(sample['mp4']) + + json_filename = f"metadata/sample_{idx}.json" + with open(json_filename, 'w') as json_file: + json.dump(sample['json'], json_file) + +print(f"Number of items in content/videos: {len(os.listdir('videos'))}") + +"""In FineVideo some frames are dark so we downsample 6 frames and if we can't get meaningful videos we remove them.""" + +import cv2 +from PIL import Image +import numpy as np + +def is_dark(frame, threshold=10): + return np.max(frame) < threshold # all pixels are very close to 0 + +def downsample_video(video_path): + vidcap = cv2.VideoCapture(video_path) + total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT)) + fps = vidcap.get(cv2.CAP_PROP_FPS) + + frames = [] + + # Generate 8 evenly spaced indices, skip first and last + full_indices = np.linspace(0, total_frames - 1, 8, dtype=int)[1:-1] + + for i in full_indices: + found_valid = 
False + for offset in [0, -1, 1, -2, 2]: # Try nearby frames if original is dark + candidate_idx = i + offset + if 0 <= candidate_idx < total_frames: + vidcap.set(cv2.CAP_PROP_POS_FRAMES, candidate_idx) + success, image = vidcap.read() + if success: + if not is_dark(image): + image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + pil_image = Image.fromarray(image) + timestamp = round(candidate_idx / fps, 2) + frames.append((pil_image, timestamp)) + found_valid = True + break + if not found_valid: + print(f"Warning: Could not find non-dark frame near index {i}") + + # If fewer than 6 frames were found, try to top off (up to 8) by scanning more frames + if len(frames) < 6: + print("Trying to top off with additional non-dark frames...") + idx = 0 + while len(frames) < 8 and idx < total_frames: + vidcap.set(cv2.CAP_PROP_POS_FRAMES, idx) + success, image = vidcap.read() + if success and not is_dark(image): + image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + pil_image = Image.fromarray(image) + timestamp = round(idx / fps, 2) + # Avoid adding duplicate timestamps + if not any(ts == timestamp for _, ts in frames): + frames.append((pil_image, timestamp)) + idx += 1 + + vidcap.release() # release only after the top-off scan, which still reads from the capture + + return frames[:8] # Return at most 8 frames + +import os +import glob + +def remove_dark_videos(video_dir, metadata_dir, audio_dir): + """ + Remove videos (and their metadata/audio files) when fewer than 6 non-dark frames can be sampled. 
+ """ + video_paths = glob.glob(os.path.join(video_dir, "*.mp4")) + + for video_path in video_paths: + filename = os.path.basename(video_path) + base_name = os.path.splitext(filename)[0] + + frames = downsample_video(video_path) + if len(frames) < 6: + try: + os.remove(video_path) + print(f"Deleted: {video_path}") + except Exception as e: + print(f"Failed to delete {video_path}: {e}") + + metadata_path = os.path.join(metadata_dir, f"{base_name}.json") + if os.path.exists(metadata_path): + os.remove(metadata_path) + + # Remove audio + audio_path = os.path.join(audio_dir, f"{base_name}.wav") + if os.path.exists(audio_path): + os.remove(audio_path) + +remove_dark_videos( + video_dir="videos", + metadata_dir="metadata", + audio_dir="audios" + ) + +"""Gemma-3n accepts video (image frames) and audio separately, so we strip audio from video.""" + +import os +import subprocess + +video_dir = "videos" +audio_dir = "audios" +os.makedirs(audio_dir, exist_ok=True) + +for filename in os.listdir(video_dir): + if not filename.endswith(".mp4"): + continue + + idx = filename.split("_")[1].split(".")[0] + video_path = os.path.join(video_dir, filename) + audio_path = os.path.join(audio_dir, f"sample_{idx}.wav") + + subprocess.run([ + "ffmpeg", "-i", video_path, + "-q:a", "0", "-map", "a", + audio_path, + "-y" + ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) + +"""Construct a new dataset with audio, video, metadata (video categories). This dataset is very cool, it has some questions and answers, captions and more so get creative if you have the GPU VRAM to do so. 
Here we solve an easier task for educational purposes.""" + +from datasets import Dataset +import json + +def gen(): + meta_dir = "metadata" + for filename in os.listdir(meta_dir): + if not filename.endswith(".json"): + continue + + idx = filename.split("_")[1].split(".")[0] + if os.path.exists(f"videos/sample_{idx}.mp4"): + video_filename = f"sample_{idx}.mp4" + audio_filename = f"sample_{idx}.wav" + json_path = os.path.join(meta_dir, filename) + + with open(json_path, "r") as f: + metadata = json.load(f) + + + yield { + "video": video_filename, + "audio": audio_filename, + "content_parent_category": metadata["content_parent_category"], + "sample_index": int(idx) + } + else: + pass + +dataset = Dataset.from_generator(gen) + +"""We will speed-up and downsample the audios to save space during training.""" + +import torchaudio +from torchaudio.transforms import Resample +import os +import torch + +def preprocess_audio(audio_path, target_sample_rate=16000, max_duration_sec=5, speedup_factor=1.25): + waveform, sample_rate = torchaudio.load(audio_path) + + if waveform.shape[0] > 1: + waveform = waveform.mean(dim=0, keepdim=True) + + if sample_rate != target_sample_rate: + resampler = Resample(orig_freq=sample_rate, new_freq=target_sample_rate) + waveform = resampler(waveform) + sample_rate = target_sample_rate + + if speedup_factor > 1.0: + indices = torch.arange(0, waveform.shape[1], step=speedup_factor).long() + if indices[-1] >= waveform.shape[1]: + indices = indices[:-1] + waveform = waveform[:, indices] + + max_length = int(target_sample_rate * max_duration_sec) + if waveform.shape[1] > max_length: + waveform = waveform[:, :max_length] + + torchaudio.save(audio_path, waveform, sample_rate) + +for file_name in os.listdir("audios"): + if file_name.lower().endswith(".wav"): + audio_path = os.path.join("audios", file_name) + preprocess_audio(audio_path) + +dataset = dataset.train_test_split(test_size=0.10, seed=42) + +"""### Load the model + +Make sure you have your 
Hugging Face token in your Colab secrets. +""" + +model = Gemma3nForConditionalGeneration.from_pretrained( + "google/gemma-3n-E2B-it", torch_dtype=torch.bfloat16, +) +processor = AutoProcessor.from_pretrained( + "google/gemma-3n-E2B-it", +) +processor.tokenizer.padding_side = "right" + +processor.tokenizer.all_special_ids + +"""Write our dataset collator. We will train the model to predict the category of a video, which is an easy task. You can do much more: for instance, FineVideo has a QnA section, so you can train this model for open-ended QnA if you have a lot of VRAM and patience. Open-ended tasks are harder to work with, and this notebook is meant as an educational example of feeding different modalities. + +In the collator we also downsample videos to 6 frames, using the helper written above. For better results you need more frames. +""" + +def collate_fn(examples): + video_path = examples[0]["video"] + audio_path = examples[0]["audio"] + sample_idx = video_path.split("_")[1].split(".")[0]  # index from "sample_<idx>.mp4"; the original read a stale global `filename` + frames = downsample_video(f"videos/{video_path}") + + text = "Based on the video, predict the category of it." 
+ message = [ + { + "role": "user", + "content": [ + {"type": "text", "text": text} + ], + }, + ] + # this is how video inference should be formatted in Gemma3n + for frame in frames: + image, timestamp = frame + message[0]["content"].append({"type": "text", "text": f"Frame {timestamp}:"}) + timestamp = str(timestamp).replace(".", "_") + image.save(f"image_idx_{sample_idx}_{timestamp}.png") + message[0]["content"].append({"type": "image", "url": f"image_idx_{sample_idx}_{timestamp}.png"}) + + message[0]["content"].append({"type": "audio", "audio": f"audios/{audio_path}"}) + message.append({"role": "assistant", "content": [{"type": "text", "text": examples[0]["content_parent_category"]}]}) + inputs = processor.apply_chat_template( + message, + add_generation_prompt=False, + tokenize=True, + return_dict=True, + return_tensors="pt", + padding=True, + ).to(model.device) + + labels = inputs["input_ids"].clone() + special_token_ids = processor.tokenizer.all_special_ids + + special_token_ids_tensor = torch.tensor(special_token_ids, device=labels.device) + mask = torch.isin(labels, special_token_ids_tensor) + labels[mask] = -100 + + inputs["labels"] = labels + if torch.all(inputs["pixel_values"] == 0): + print("Frames are dark") + + return inputs + +"""## Training + +We do LoRA fine-tuning again to save up on space. 
+""" + +from peft import LoraConfig +peft_config = LoraConfig( + task_type="CAUSAL_LM", + r=16, + target_modules="all-linear", + lora_alpha=32, + lora_dropout=0.05, + bias="none", + use_rslora=False, + use_dora=False, + modules_to_save=None +) + +model.gradient_checkpointing_disable() + +model.config.use_cache = False + +training_args = SFTConfig( + output_dir="/content/gemma-3n-finevideo", + eval_strategy='epoch', + per_device_train_batch_size=1, + per_device_eval_batch_size=1, + gradient_accumulation_steps=4, + gradient_checkpointing=False, + learning_rate=1e-05, + num_train_epochs=3.0, + logging_steps=10, + save_steps=100, + bf16=True, + report_to=["tensorboard"], + dataset_kwargs={'skip_prepare_dataset': True}, + remove_unused_columns=False, + max_seq_length=None, + push_to_hub=True, + dataloader_pin_memory=False, +) + +trainer = SFTTrainer( + model=model, + args=training_args, + data_collator=collate_fn, + train_dataset=dataset["train"], + eval_dataset=dataset["test"] if training_args.eval_strategy != "no" else None, + processing_class=processor.tokenizer, + peft_config=peft_config, +) + +trainer.train() + +"""Test the model with a video of snowboarding.""" + +!wget https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/IMG_8137.mp4 + +model = trainer.model # trainer has the adapter + +"""Strip audio and downsample video.""" + +audio_path = "/content/test_audio.wav" +subprocess.run([ + "ffmpeg", "-i", "/content/IMG_8137.mp4", + "-q:a", "0", "-map", "a", + f"{audio_path}", + "-y" + ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) + +frames = downsample_video("/content/IMG_8137.mp4") + +# repeat the chat template +text = "Based on the video, predict the category of it." 
+message = [ + { + "role": "user", + "content": [ + {"type": "text", "text": text} + ], + }, +] +for frame in frames: + image, timestamp = frame + message[0]["content"].append({"type": "text", "text": f"Frame {timestamp}:"}) + timestamp = str(timestamp).replace(".", "_") + image.save(f"test_frame_{timestamp}.png") + message[0]["content"].append({"type": "image", "url": f"test_frame_{timestamp}.png"}) + +message[0]["content"].append({"type": "audio", "audio": f"{audio_path}"}) + +message + +inputs = processor.apply_chat_template( + message, + add_generation_prompt=True, + tokenize=True, + return_dict=True, + return_tensors="pt", + padding=True, +).to(model.device).to(model.dtype) + +input_len = inputs["input_ids"].shape[-1] + +with torch.inference_mode(): + generation = model.generate(**inputs, max_new_tokens=100, do_sample=False) + generation = generation[0][input_len:] + +decoded = processor.decode(generation, skip_special_tokens=True) +print(decoded) + +"""Thanks a lot for reading! Keep training the model further with more data or unfreeze the layers for better performance 💗""" + diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-README.md b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-README.md new file mode 100644 index 0000000..177aac3 --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-README.md @@ -0,0 +1,13 @@ +--- +title: Gemma 4 31B It +emoji: 🚀 +colorFrom: blue +colorTo: green +sdk: gradio +sdk_version: 6.12.0 +python_version: "3.12.12" +app_file: app.py +pinned: false +--- + +Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-app.py b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-app.py new file mode 100644 index 0000000..f08f04e --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-app.py @@ -0,0 +1,303 @@ +import 
os +from collections.abc import Iterator +from threading import Thread + +import gradio as gr +import spaces +import torch +from transformers import AutoModelForMultimodalLM, AutoProcessor, BatchFeature +from transformers.generation.streamers import TextIteratorStreamer + +MODEL_ID = "google/gemma-4-31b-it" + +processor = AutoProcessor.from_pretrained(MODEL_ID, use_fast=False) +model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, device_map="auto", dtype=torch.bfloat16) + +IMAGE_FILE_TYPES = (".jpg", ".jpeg", ".png", ".webp") +VIDEO_FILE_TYPES = (".mp4", ".mov", ".avi", ".webm") +MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "10_000")) + +THINKING_START = "<|channel>" +THINKING_END = "" + +# Special tokens to strip from decoded output (keeping thinking delimiters +# so that Gradio's reasoning_tags can find them on the frontend). +_KEEP_TOKENS = {THINKING_START, THINKING_END} +_STRIP_TOKENS = sorted( + (t for t in processor.tokenizer.all_special_tokens if t not in _KEEP_TOKENS), + key=len, + reverse=True, # longest first to avoid partial matches +) + + +def _strip_special_tokens(text: str) -> str: + for tok in _STRIP_TOKENS: + text = text.replace(tok, "") + return text + + +def _classify_file(path: str) -> str | None: + """Return media type string for a file path, or None if unsupported.""" + lower = path.lower() + if lower.endswith(IMAGE_FILE_TYPES): + return "image" + if lower.endswith(VIDEO_FILE_TYPES): + return "video" + return None + + +def process_new_user_message(message: dict) -> list[dict]: + """Build content list from the new user message with URL-based media references.""" + content: list[dict] = [] + for path in message.get("files", []): + kind = _classify_file(path) + if kind: + content.append({"type": kind, "url": path}) + content.append({"type": "text", "text": message.get("text", "")}) + return content + + +def process_history(history: list[dict]) -> list[dict]: + """Walk Gradio 6 history and build message list with URL-based media 
references.""" + messages: list[dict] = [] + + for item in history: + if item["role"] == "assistant": + text_parts = [p["text"] for p in item["content"] if p.get("type") == "text"] + messages.append( + { + "role": "assistant", + "content": [{"type": "text", "text": " ".join(text_parts)}], + } + ) + else: + user_content: list[dict] = [] + for part in item["content"]: + if part.get("type") == "text": + user_content.append({"type": "text", "text": part["text"]}) + elif part.get("type") == "file": + filepath = part["file"]["path"] + kind = _classify_file(filepath) + if kind: + user_content.append({"type": kind, "url": filepath}) + if user_content: + messages.append({"role": "user", "content": user_content}) + + return messages + + +@spaces.GPU(duration=180) +@torch.inference_mode() +def _generate_on_gpu(inputs: BatchFeature, max_new_tokens: int, thinking: bool) -> Iterator[str]: + inputs = inputs.to(device=model.device, dtype=torch.bfloat16) + + streamer = TextIteratorStreamer( + processor, + timeout=30.0, + skip_prompt=True, + skip_special_tokens=not thinking, + ) + generate_kwargs = { + **inputs, + "streamer": streamer, + "max_new_tokens": max_new_tokens, + "disable_compile": True, + } + + exception_holder: list[Exception] = [] + + def _generate() -> None: + try: + model.generate(**generate_kwargs) + except Exception as e: # noqa: BLE001 + exception_holder.append(e) + + thread = Thread(target=_generate) + thread.start() + + chunks: list[str] = [] + for text in streamer: + chunks.append(text) + accumulated = "".join(chunks) + if thinking: + yield _strip_special_tokens(accumulated) + else: + yield accumulated + + thread.join() + if exception_holder: + msg = f"Generation failed: {exception_holder[0]}" + raise gr.Error(msg) + + +def validate_input(message: dict) -> dict: + has_text = bool(message.get("text", "").strip()) + has_files = bool(message.get("files")) + if not (has_text or has_files): + return gr.validate(has_text, "Please enter a message or upload a file.") + 
+ files = message.get("files", []) + kinds = [_classify_file(f) for f in files] + kinds = [k for k in kinds if k is not None] + unique_kinds = set(kinds) + + if len(unique_kinds) > 1: + return gr.validate(False, "Please upload only one type of media (images or video) at a time.") + if kinds.count("video") > 1: + return gr.validate(False, "Only one video file can be uploaded at a time.") + + return gr.validate(True, "") + + +def _has_media_type(messages: list[dict], media_type: str) -> bool: + """Check if any message contains a content entry of the given media type.""" + return any( + c.get("type") == media_type for m in messages for c in (m["content"] if isinstance(m["content"], list) else []) + ) + + +def generate( + message: dict, + history: list[dict], + thinking: bool = False, + max_new_tokens: int = 1024, + max_soft_tokens: int = 280, + system_prompt: str = "", +) -> Iterator[str]: + + messages: list[dict] = [] + if system_prompt: + messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]}) + + messages.extend(process_history(history)) + messages.append({"role": "user", "content": process_new_user_message(message)}) + + template_kwargs: dict = { + "tokenize": True, + "return_dict": True, + "return_tensors": "pt", + "add_generation_prompt": True, + "processor_kwargs": {"images_kwargs": {"max_soft_tokens": max_soft_tokens}}, + } + if _has_media_type(messages, "video"): + template_kwargs["load_audio_from_video"] = False + if thinking: + template_kwargs["enable_thinking"] = True + + inputs = processor.apply_chat_template(messages, **template_kwargs) + + n_tokens = inputs["input_ids"].shape[1] + if n_tokens > MAX_INPUT_TOKENS: + msg = f"Input too long ({n_tokens} tokens). Maximum is {MAX_INPUT_TOKENS} tokens." 
+ raise gr.Error(msg) + + yield from _generate_on_gpu(inputs=inputs, max_new_tokens=max_new_tokens, thinking=thinking) + + +examples = [ + # --- Text-only examples --- + [ + { + "text": "What is the capital of France?", + "files": [], + } + ], + [ + { + "text": "What is the water formula?", + "files": [], + } + ], + [ + { + "text": "Explain quantum entanglement in simple terms.", + "files": [], + } + ], + [ + { + "text": "I want to do a car wash that is 50 meters away, should I walk or drive?", + "files": [], + } + ], + [ + { + "text": "Write a poem about beer with 4 stanzas. Format the title as an H2 markdown heading and bold the first line of each stanza.", + "files": [], + } + ], + # --- Single-image examples --- + [ + { + "text": "Describe this image.", + "files": ["https://news.bbc.co.uk/media/images/38107000/jpg/_38107299_ronaldogoal_ap_300.jpg"], + } + ], + [ + { + "text": "What is the city in this image? Describe what you see.", + "files": ["https://imgmd.net/images/v1/guia/1698673/rio-de-janeiro-4-c.jpg"], + } + ], + # --- Multi-image examples --- + [ + { + "text": "What are the key similarities between these three images?", + "files": [ + "https://news.bbc.co.uk/media/images/38107000/jpg/_38107299_ronaldogoal_ap_300.jpg", + "https://ogimg.infoglobo.com.br/in/12547538-502-0e0/FT1086A/94-8705-14.jpg", + "https://amazonasatual.com.br/wp-content/uploads/2021/01/Pele.jpg", + ], + } + ], + # --- Video examples --- + [ + { + "text": "What is happening in this video?", + "files": ["https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4"], + } + ], +] + +demo = gr.ChatInterface( + fn=generate, + validator=validate_input, + chatbot=gr.Chatbot( + scale=1, + latex_delimiters=[ + {"left": "$$", "right": "$$", "display": True}, + {"left": "$", "right": "$", "display": False}, + {"left": "\\(", "right": "\\)", "display": False}, + {"left": "\\[", "right": "\\]", "display": True}, + ], + reasoning_tags=[(THINKING_START, THINKING_END)], + ), + 
textbox=gr.MultimodalTextbox( + sources=["upload"], + file_types=[*IMAGE_FILE_TYPES, *VIDEO_FILE_TYPES], + file_count="multiple", + autofocus=True, + ), + multimodal=True, + additional_inputs=[ + gr.Checkbox(label="Thinking", value=False), + gr.Slider(label="Max New Tokens", minimum=100, maximum=4000, step=10, value=2000), + gr.Dropdown( + label="Image Token Budget", + info="Higher values preserve more visual detail (useful for OCR/documents). Lower values are faster.", + choices=[70, 140, 280, 560, 1120], + value=280, + ), + gr.Textbox(label="System Prompt", value=""), + ], + additional_inputs_accordion=gr.Accordion("Settings", open=True), + stop_btn=False, + title="Gemma 4 31B It", + examples=examples, + run_examples_on_click=False, + cache_examples=False, + delete_cache=(1800, 1800), +) + +if __name__ == "__main__": + demo.launch(css_paths="style.css", max_file_size="20mb") diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-requirements.txt b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-requirements.txt new file mode 100644 index 0000000..fbbfd8c --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-31b-it-requirements.txt @@ -0,0 +1,362 @@ +# This file was autogenerated by uv via the following command: +# uv export --no-hashes --no-dev --group hf-spaces --no-emit-package typer-slim --no-emit-package spaces -o requirements.txt +accelerate==1.13.0 + # via gemma-4-31b-it +aiohappyeyeballs==2.6.1 + # via aiohttp +aiohttp==3.13.5 + # via fsspec +aiosignal==1.4.0 + # via aiohttp +annotated-doc==0.0.4 + # via + # fastapi + # typer +annotated-types==0.7.0 + # via pydantic +anyio==4.13.0 + # via + # gradio + # httpx + # mcp + # sse-starlette + # starlette +attrs==26.1.0 + # via + # aiohttp + # jsonschema + # referencing +audioop-lts==0.2.2 ; python_full_version >= '3.13' + # via gradio +brotli==1.2.0 + # via gradio +certifi==2026.2.25 + # via + # httpcore + # httpx + # requests +cffi==2.0.0 ; 
platform_python_implementation != 'PyPy' + # via cryptography +charset-normalizer==3.4.7 + # via requests +click==8.3.2 + # via + # typer + # uvicorn +colorama==0.4.6 ; sys_platform == 'win32' + # via + # click + # tqdm +cryptography==46.0.7 + # via pyjwt +datasets==4.8.4 +dill==0.4.1 + # via + # datasets + # multiprocess +fastapi==0.136.0 + # via gradio +filelock==3.28.0 + # via + # datasets + # huggingface-hub + # torch +frozenlist==1.8.0 + # via + # aiohttp + # aiosignal +fsspec==2026.2.0 + # via + # datasets + # gradio-client + # huggingface-hub + # torch +gradio==6.12.0 + # via + # gemma-4-31b-it + # spaces +gradio-client==2.4.1 + # via + # gradio + # hf-gradio +groovy==0.1.2 + # via gradio +h11==0.16.0 + # via + # httpcore + # uvicorn +hf-gradio==0.4.0 + # via gradio +hf-xet==1.4.3 ; platform_machine == 'AMD64' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64' + # via huggingface-hub +httpcore==1.0.9 + # via httpx +httpx==0.28.1 + # via + # datasets + # gradio + # gradio-client + # huggingface-hub + # mcp + # safehttpx + # spaces +httpx-sse==0.4.3 + # via mcp +huggingface-hub==1.11.0 + # via + # accelerate + # datasets + # gradio + # gradio-client + # tokenizers + # transformers +idna==3.11 + # via + # anyio + # httpx + # requests + # yarl +jinja2==3.1.6 + # via + # gradio + # torch +jsonschema==4.26.0 + # via mcp +jsonschema-specifications==2025.9.1 + # via jsonschema +markdown-it-py==4.0.0 + # via rich +markupsafe==3.0.3 + # via + # gradio + # jinja2 +mcp==1.27.0 + # via gradio +mdurl==0.1.2 + # via markdown-it-py +mpmath==1.3.0 + # via sympy +multidict==6.7.1 + # via + # aiohttp + # yarl +multiprocess==0.70.19 + # via datasets +networkx==3.6.1 + # via torch +numpy==2.4.4 + # via + # accelerate + # datasets + # gradio + # pandas + # torchvision + # transformers +nvidia-cublas-cu12==12.8.4.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cudnn-cu12 + # 
nvidia-cusolver-cu12 + # torch +nvidia-cuda-cupti-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cuda-nvrtc-cu12==12.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cuda-runtime-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cudnn-cu12==9.10.2.21 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cufft-cu12==11.3.3.83 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cufile-cu12==1.13.1.3 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-curand-cu12==10.3.9.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cusolver-cu12==11.7.3.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cusparse-cu12==12.5.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cusolver-cu12 + # torch +nvidia-cusparselt-cu12==0.7.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nccl-cu12==2.27.5 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nvjitlink-cu12==12.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cufft-cu12 + # nvidia-cusolver-cu12 + # nvidia-cusparse-cu12 + # torch +nvidia-nvshmem-cu12==3.3.20 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nvtx-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +orjson==3.11.8 + # via gradio +packaging==26.1 + # via + # accelerate + # datasets + # gradio + # gradio-client + # huggingface-hub + # spaces + # transformers +pandas==3.0.2 + # via + # datasets + # gradio +pillow==12.2.0 + # via + # gradio + # torchvision +propcache==0.4.1 + # via + # aiohttp + # yarl +psutil==5.9.8 + # via + # accelerate + # spaces +pyarrow==23.0.1 + # via datasets +pycparser==3.0 ; 
implementation_name != 'PyPy' and platform_python_implementation != 'PyPy' + # via cffi +pydantic==2.12.5 + # via + # fastapi + # gradio + # mcp + # pydantic-settings + # spaces +pydantic-core==2.41.5 + # via pydantic +pydantic-settings==2.13.1 + # via mcp +pydub==0.25.1 + # via gradio +pygments==2.20.0 + # via rich +pyjwt==2.12.1 + # via mcp +python-dateutil==2.9.0.post0 + # via pandas +python-dotenv==1.2.2 + # via pydantic-settings +python-multipart==0.0.26 + # via + # gradio + # mcp +pytz==2026.1.post1 + # via gradio +pywin32==311 ; sys_platform == 'win32' + # via mcp +pyyaml==6.0.3 + # via + # accelerate + # datasets + # gradio + # huggingface-hub + # transformers +referencing==0.37.0 + # via + # jsonschema + # jsonschema-specifications +regex==2026.4.4 + # via transformers +requests==2.33.1 + # via + # datasets + # spaces +rich==15.0.0 + # via typer +rpds-py==0.30.0 + # via + # jsonschema + # referencing +safehttpx==0.1.7 + # via gradio +safetensors==0.7.0 + # via + # accelerate + # transformers +semantic-version==2.10.0 + # via gradio +setuptools==82.0.1 + # via torch +shellingham==1.5.4 + # via typer +six==1.17.0 + # via python-dateutil +sse-starlette==3.3.4 + # via mcp +starlette==1.0.0 + # via + # fastapi + # gradio + # mcp + # sse-starlette +sympy==1.14.0 + # via torch +tokenizers==0.22.2 + # via transformers +tomlkit==0.14.0 + # via gradio +torch==2.9.1 + # via + # accelerate + # gemma-4-31b-it + # torchvision +torchcodec==0.9.1 + # via gemma-4-31b-it +torchvision==0.24.1 + # via gemma-4-31b-it +tqdm==4.67.3 + # via + # datasets + # huggingface-hub + # transformers +transformers==5.5.4 + # via gemma-4-31b-it +triton==3.5.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +typer==0.24.1 + # via + # gradio + # hf-gradio + # huggingface-hub + # transformers +typing-extensions==4.15.0 + # via + # aiosignal + # anyio + # fastapi + # gradio + # gradio-client + # huggingface-hub + # mcp + # pydantic + # pydantic-core + # referencing + # 
spaces + # starlette + # torch + # typing-inspection +typing-inspection==0.4.2 + # via + # fastapi + # mcp + # pydantic + # pydantic-settings +tzdata==2026.1 ; sys_platform == 'emscripten' or sys_platform == 'win32' + # via pandas +urllib3==2.6.3 + # via requests +uvicorn==0.44.0 + # via + # gradio + # mcp +xxhash==3.6.0 + # via datasets +yarl==1.23.0 + # via aiohttp diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-README.md b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-README.md new file mode 100644 index 0000000..603b039 --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-README.md @@ -0,0 +1,13 @@ +--- +title: Gemma 4 E4B It +emoji: 🚀 +colorFrom: blue +colorTo: green +sdk: gradio +sdk_version: 6.12.0 +python_version: "3.12.12" +app_file: app.py +pinned: false +--- + +Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-app.py b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-app.py new file mode 100644 index 0000000..4faa110 --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-app.py @@ -0,0 +1,322 @@ +import os +from collections.abc import Iterator +from threading import Thread + +import gradio as gr +import spaces +import torch +from transformers import AutoModelForMultimodalLM, AutoProcessor, BatchFeature +from transformers.generation.streamers import TextIteratorStreamer + +MODEL_ID = "google/gemma-4-e4b-it" + +processor = AutoProcessor.from_pretrained(MODEL_ID, use_fast=False) +model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, device_map="auto", dtype=torch.bfloat16) + +IMAGE_FILE_TYPES = (".jpg", ".jpeg", ".png", ".webp") +AUDIO_FILE_TYPES = (".wav", ".mp3", ".flac", ".ogg") +VIDEO_FILE_TYPES = (".mp4", ".mov", ".avi", ".webm") +MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "10_000")) + +THINKING_START 
= "<|channel>" +THINKING_END = "" + +# Special tokens to strip from decoded output (keeping thinking delimiters +# so that Gradio's reasoning_tags can find them on the frontend). +_KEEP_TOKENS = {THINKING_START, THINKING_END} +_STRIP_TOKENS = sorted( + (t for t in processor.tokenizer.all_special_tokens if t not in _KEEP_TOKENS), + key=len, + reverse=True, # longest first to avoid partial matches +) + + +def _strip_special_tokens(text: str) -> str: + for tok in _STRIP_TOKENS: + text = text.replace(tok, "") + return text + + +def _classify_file(path: str) -> str | None: + """Return media type string for a file path, or None if unsupported.""" + lower = path.lower() + if lower.endswith(IMAGE_FILE_TYPES): + return "image" + if lower.endswith(AUDIO_FILE_TYPES): + return "audio" + if lower.endswith(VIDEO_FILE_TYPES): + return "video" + return None + + +def process_new_user_message(message: dict) -> list[dict]: + """Build content list from the new user message with URL-based media references.""" + content: list[dict] = [] + for path in message.get("files", []): + kind = _classify_file(path) + if kind: + content.append({"type": kind, "url": path}) + content.append({"type": "text", "text": message.get("text", "")}) + return content + + +def process_history(history: list[dict]) -> list[dict]: + """Walk Gradio 6 history and build message list with URL-based media references.""" + messages: list[dict] = [] + + for item in history: + if item["role"] == "assistant": + text_parts = [p["text"] for p in item["content"] if p.get("type") == "text"] + messages.append( + { + "role": "assistant", + "content": [{"type": "text", "text": " ".join(text_parts)}], + } + ) + else: + user_content: list[dict] = [] + for part in item["content"]: + if part.get("type") == "text": + user_content.append({"type": "text", "text": part["text"]}) + elif part.get("type") == "file": + filepath = part["file"]["path"] + kind = _classify_file(filepath) + if kind: + user_content.append({"type": kind, "url": 
filepath}) + if user_content: + messages.append({"role": "user", "content": user_content}) + + return messages + + +@spaces.GPU(duration=120) +@torch.inference_mode() +def _generate_on_gpu(inputs: BatchFeature, max_new_tokens: int, thinking: bool) -> Iterator[str]: + inputs = inputs.to(device=model.device, dtype=torch.bfloat16) + + streamer = TextIteratorStreamer( + processor, + timeout=30.0, + skip_prompt=True, + skip_special_tokens=not thinking, + ) + generate_kwargs = { + **inputs, + "streamer": streamer, + "max_new_tokens": max_new_tokens, + "disable_compile": True, + } + + exception_holder: list[Exception] = [] + + def _generate() -> None: + try: + model.generate(**generate_kwargs) + except Exception as e: # noqa: BLE001 + exception_holder.append(e) + + thread = Thread(target=_generate) + thread.start() + + chunks: list[str] = [] + for text in streamer: + chunks.append(text) + accumulated = "".join(chunks) + if thinking: + yield _strip_special_tokens(accumulated) + else: + yield accumulated + + thread.join() + if exception_holder: + msg = f"Generation failed: {exception_holder[0]}" + raise gr.Error(msg) + + +# FBT003 is suppressed below: gr.validate API takes bool as first positional arg. 
+def validate_input(message: dict) -> dict: + has_text = bool(message.get("text", "").strip()) + has_files = bool(message.get("files")) + if not (has_text or has_files): + return gr.validate(False, "Please enter a message or upload a file.") # noqa: FBT003 + + files = message.get("files", []) + kinds = [_classify_file(f) for f in files] + kinds = [k for k in kinds if k is not None] + unique_kinds = set(kinds) + + if len(unique_kinds) > 1: + return gr.validate(False, "Please upload only one type of media (images, audio, or video) at a time.") # noqa: FBT003 + if kinds.count("audio") > 1: + return gr.validate(False, "Only one audio file can be uploaded at a time.") # noqa: FBT003 + if kinds.count("video") > 1: + return gr.validate(False, "Only one video file can be uploaded at a time.") # noqa: FBT003 + + return gr.validate(True, "") # noqa: FBT003 + + +def _has_media_type(messages: list[dict], media_type: str) -> bool: + """Check if any message contains a content entry of the given media type.""" + return any(c.get("type") == media_type for m in messages for c in m["content"]) + + +def generate( + message: dict, + history: list[dict], + thinking: bool = False, + max_new_tokens: int = 1024, + max_soft_tokens: int = 280, + system_prompt: str = "", +) -> Iterator[str]: + messages: list[dict] = [] + if system_prompt: + messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]}) + + messages.extend(process_history(history)) + messages.append({"role": "user", "content": process_new_user_message(message)}) + + template_kwargs: dict = { + "tokenize": True, + "return_dict": True, + "return_tensors": "pt", + "add_generation_prompt": True, + "load_audio_from_video": _has_media_type(messages, "video"), + "processor_kwargs": {"images_kwargs": {"max_soft_tokens": max_soft_tokens}}, + } + if thinking: + template_kwargs["enable_thinking"] = True + + inputs = processor.apply_chat_template(messages, **template_kwargs) + + n_tokens = 
inputs["input_ids"].shape[1] + if n_tokens > MAX_INPUT_TOKENS: + msg = f"Input too long ({n_tokens} tokens). Maximum is {MAX_INPUT_TOKENS} tokens." + raise gr.Error(msg) + + yield from _generate_on_gpu(inputs=inputs, max_new_tokens=max_new_tokens, thinking=thinking) + + +examples = [ + # --- Text-only examples --- + [ + { + "text": "What is the capital of France?", + "files": [], + } + ], + [ + { + "text": "What is the water formula?", + "files": [], + } + ], + [ + { + "text": "Explain quantum entanglement in simple terms.", + "files": [], + } + ], + [ + { + "text": "I want to do a car wash that is 50 meters away, should I walk or drive?", + "files": [], + } + ], + [ + { + "text": "Write a poem about beer with 4 stanzas. Format the title as an H2 markdown heading and bold the first line of each stanza.", + "files": [], + } + ], + # --- Single-image examples --- + [ + { + "text": "Describe this image.", + "files": ["https://news.bbc.co.uk/media/images/38107000/jpg/_38107299_ronaldogoal_ap_300.jpg"], + } + ], + [ + { + "text": "What is the city in this image? 
Describe what you see.", + "files": ["https://imgmd.net/images/v1/guia/1698673/rio-de-janeiro-4-c.jpg"], + } + ], + # --- Multi-image examples --- + [ + { + "text": "What are the key similarities between these three images?", + "files": [ + "https://news.bbc.co.uk/media/images/38107000/jpg/_38107299_ronaldogoal_ap_300.jpg", + "https://ogimg.infoglobo.com.br/in/12547538-502-0e0/FT1086A/94-8705-14.jpg", + "https://amazonasatual.com.br/wp-content/uploads/2021/01/Pele.jpg", + ], + } + ], + # --- Audio examples --- + [ + { + "text": "Transcribe the audio.", + "files": [ + "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3" + ], + } + ], + [ + { + "text": "Translate to Dutch.", + "files": [ + "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3" + ], + } + ], + # --- Video examples --- + [ + { + "text": "What is happening in this video?", + "files": ["https://huggingface.co/datasets/merve/vlm_test_images/resolve/main/concert.mp4"], + } + ], +] + +demo = gr.ChatInterface( + fn=generate, + validator=validate_input, + chatbot=gr.Chatbot( + scale=1, + latex_delimiters=[ + {"left": "$$", "right": "$$", "display": True}, + {"left": "$", "right": "$", "display": False}, + {"left": "\\(", "right": "\\)", "display": False}, + {"left": "\\[", "right": "\\]", "display": True}, + ], + reasoning_tags=[(THINKING_START, THINKING_END)], + ), + textbox=gr.MultimodalTextbox( + sources=["upload", "microphone"], + file_types=[*IMAGE_FILE_TYPES, *AUDIO_FILE_TYPES, *VIDEO_FILE_TYPES], + file_count="multiple", + autofocus=True, + ), + multimodal=True, + additional_inputs=[ + gr.Checkbox(label="Thinking", value=False), + gr.Slider(label="Max New Tokens", minimum=100, maximum=4000, step=10, value=2000), + gr.Dropdown( + label="Image Token Budget", + info="Higher values preserve more visual detail (useful for OCR/documents). 
Lower values are faster.", + choices=[70, 140, 280, 560, 1120], + value=280, + ), + gr.Textbox(label="System Prompt", value=""), + ], + additional_inputs_accordion=gr.Accordion("Settings", open=True), + stop_btn=False, + title="Gemma 4 E4B It", + examples=examples, + run_examples_on_click=False, + cache_examples=False, + delete_cache=(1800, 1800), +) + +if __name__ == "__main__": + demo.launch(css_paths="style.css", max_file_size="20MB") diff --git a/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-requirements.txt b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-requirements.txt new file mode 100644 index 0000000..d734b3c --- /dev/null +++ b/tooling/huggingface/spaces/huggingface-projects_gemma-4-e4b-it-requirements.txt @@ -0,0 +1,362 @@ +# This file was autogenerated by uv via the following command: +# uv export --no-hashes --no-dev --group hf-spaces --no-emit-package typer-slim --no-emit-package spaces -o requirements.txt +accelerate==1.13.0 + # via gemma-4-e4b-it +aiohappyeyeballs==2.6.1 + # via aiohttp +aiohttp==3.13.5 + # via fsspec +aiosignal==1.4.0 + # via aiohttp +annotated-doc==0.0.4 + # via + # fastapi + # typer +annotated-types==0.7.0 + # via pydantic +anyio==4.13.0 + # via + # gradio + # httpx + # mcp + # sse-starlette + # starlette +attrs==26.1.0 + # via + # aiohttp + # jsonschema + # referencing +audioop-lts==0.2.2 ; python_full_version >= '3.13' + # via gradio +brotli==1.2.0 + # via gradio +certifi==2026.2.25 + # via + # httpcore + # httpx + # requests +cffi==2.0.0 ; platform_python_implementation != 'PyPy' + # via cryptography +charset-normalizer==3.4.7 + # via requests +click==8.3.2 + # via + # typer + # uvicorn +colorama==0.4.6 ; sys_platform == 'win32' + # via + # click + # tqdm +cryptography==46.0.7 + # via pyjwt +datasets==4.8.4 +dill==0.4.1 + # via + # datasets + # multiprocess +fastapi==0.136.0 + # via gradio +filelock==3.28.0 + # via + # datasets + # huggingface-hub + # torch +frozenlist==1.8.0 + # via + # 
aiohttp + # aiosignal +fsspec==2026.2.0 + # via + # datasets + # gradio-client + # huggingface-hub + # torch +gradio==6.12.0 + # via + # gemma-4-e4b-it + # spaces +gradio-client==2.4.1 + # via + # gradio + # hf-gradio +groovy==0.1.2 + # via gradio +h11==0.16.0 + # via + # httpcore + # uvicorn +hf-gradio==0.4.0 + # via gradio +hf-xet==1.4.3 ; platform_machine == 'AMD64' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64' + # via huggingface-hub +httpcore==1.0.9 + # via httpx +httpx==0.28.1 + # via + # datasets + # gradio + # gradio-client + # huggingface-hub + # mcp + # safehttpx + # spaces +httpx-sse==0.4.3 + # via mcp +huggingface-hub==1.11.0 + # via + # accelerate + # datasets + # gradio + # gradio-client + # tokenizers + # transformers +idna==3.11 + # via + # anyio + # httpx + # requests + # yarl +jinja2==3.1.6 + # via + # gradio + # torch +jsonschema==4.26.0 + # via mcp +jsonschema-specifications==2025.9.1 + # via jsonschema +markdown-it-py==4.0.0 + # via rich +markupsafe==3.0.3 + # via + # gradio + # jinja2 +mcp==1.27.0 + # via gradio +mdurl==0.1.2 + # via markdown-it-py +mpmath==1.3.0 + # via sympy +multidict==6.7.1 + # via + # aiohttp + # yarl +multiprocess==0.70.19 + # via datasets +networkx==3.6.1 + # via torch +numpy==2.4.4 + # via + # accelerate + # datasets + # gradio + # pandas + # torchvision + # transformers +nvidia-cublas-cu12==12.8.4.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cudnn-cu12 + # nvidia-cusolver-cu12 + # torch +nvidia-cuda-cupti-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cuda-nvrtc-cu12==12.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cuda-runtime-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cudnn-cu12==9.10.2.21 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch 
+nvidia-cufft-cu12==11.3.3.83 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cufile-cu12==1.13.1.3 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-curand-cu12==10.3.9.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cusolver-cu12==11.7.3.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-cusparse-cu12==12.5.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cusolver-cu12 + # torch +nvidia-cusparselt-cu12==0.7.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nccl-cu12==2.27.5 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nvjitlink-cu12==12.8.93 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via + # nvidia-cufft-cu12 + # nvidia-cusolver-cu12 + # nvidia-cusparse-cu12 + # torch +nvidia-nvshmem-cu12==3.3.20 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +nvidia-nvtx-cu12==12.8.90 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +orjson==3.11.8 + # via gradio +packaging==26.1 + # via + # accelerate + # datasets + # gradio + # gradio-client + # huggingface-hub + # spaces + # transformers +pandas==3.0.2 + # via + # datasets + # gradio +pillow==12.2.0 + # via + # gradio + # torchvision +propcache==0.4.1 + # via + # aiohttp + # yarl +psutil==5.9.8 + # via + # accelerate + # spaces +pyarrow==23.0.1 + # via datasets +pycparser==3.0 ; implementation_name != 'PyPy' and platform_python_implementation != 'PyPy' + # via cffi +pydantic==2.12.5 + # via + # fastapi + # gradio + # mcp + # pydantic-settings + # spaces +pydantic-core==2.41.5 + # via pydantic +pydantic-settings==2.13.1 + # via mcp +pydub==0.25.1 + # via gradio +pygments==2.20.0 + # via rich +pyjwt==2.12.1 + # via mcp +python-dateutil==2.9.0.post0 + # via pandas +python-dotenv==1.2.2 + # via pydantic-settings 
+python-multipart==0.0.26 + # via + # gradio + # mcp +pytz==2026.1.post1 + # via gradio +pywin32==311 ; sys_platform == 'win32' + # via mcp +pyyaml==6.0.3 + # via + # accelerate + # datasets + # gradio + # huggingface-hub + # transformers +referencing==0.37.0 + # via + # jsonschema + # jsonschema-specifications +regex==2026.4.4 + # via transformers +requests==2.33.1 + # via + # datasets + # spaces +rich==15.0.0 + # via typer +rpds-py==0.30.0 + # via + # jsonschema + # referencing +safehttpx==0.1.7 + # via gradio +safetensors==0.7.0 + # via + # accelerate + # transformers +semantic-version==2.10.0 + # via gradio +setuptools==82.0.1 + # via torch +shellingham==1.5.4 + # via typer +six==1.17.0 + # via python-dateutil +sse-starlette==3.3.4 + # via mcp +starlette==1.0.0 + # via + # fastapi + # gradio + # mcp + # sse-starlette +sympy==1.14.0 + # via torch +tokenizers==0.22.2 + # via transformers +tomlkit==0.14.0 + # via gradio +torch==2.9.1 + # via + # accelerate + # gemma-4-e4b-it + # torchvision +torchcodec==0.9.1 + # via gemma-4-e4b-it +torchvision==0.24.1 + # via gemma-4-e4b-it +tqdm==4.67.3 + # via + # datasets + # huggingface-hub + # transformers +transformers==5.5.4 + # via gemma-4-e4b-it +triton==3.5.1 ; platform_machine == 'x86_64' and sys_platform == 'linux' + # via torch +typer==0.24.1 + # via + # gradio + # hf-gradio + # huggingface-hub + # transformers +typing-extensions==4.15.0 + # via + # aiosignal + # anyio + # fastapi + # gradio + # gradio-client + # huggingface-hub + # mcp + # pydantic + # pydantic-core + # referencing + # spaces + # starlette + # torch + # typing-inspection +typing-inspection==0.4.2 + # via + # fastapi + # mcp + # pydantic + # pydantic-settings +tzdata==2026.1 ; sys_platform == 'emscripten' or sys_platform == 'win32' + # via pandas +urllib3==2.6.3 + # via requests +uvicorn==0.44.0 + # via + # gradio + # mcp +xxhash==3.6.0 + # via datasets +yarl==1.23.0 + # via aiohttp diff --git a/tooling/huggingface/transformers/__init__.py 
b/tooling/huggingface/transformers/__init__.py new file mode 100644 index 0000000..d108443 --- /dev/null +++ b/tooling/huggingface/transformers/__init__.py @@ -0,0 +1,33 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import TYPE_CHECKING + +from ...utils import _LazyModule +from ...utils.import_utils import define_import_structure + + +if TYPE_CHECKING: + from .configuration_gemma4 import * + from .feature_extraction_gemma4 import * + from .image_processing_gemma4 import * + from .image_processing_pil_gemma4 import * + from .modeling_gemma4 import * + from .processing_gemma4 import * + from .video_processing_gemma4 import * +else: + import sys + + _file = globals()["__file__"] + sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__) diff --git a/tooling/huggingface/transformers/configuration_gemma4.py b/tooling/huggingface/transformers/configuration_gemma4.py new file mode 100644 index 0000000..7ae940d --- /dev/null +++ b/tooling/huggingface/transformers/configuration_gemma4.py @@ -0,0 +1,352 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Any, Literal + +from huggingface_hub.dataclasses import strict + +from ...configuration_utils import PreTrainedConfig +from ...utils import auto_docstring, logging +from ...utils.type_validators import interval + + +logger = logging.get_logger(__name__) + + +@auto_docstring(checkpoint="google/gemma-4-e2b-it") +@strict +class Gemma4AudioConfig(PreTrainedConfig): + r""" + subsampling_conv_channels (`list[int]`, defaults to `[128, 32]`): + Channel sizes for the convolutional layers in the Sub-sample Convolution Projection. + residual_weight (`float`, defaults to `0.5`): + Scaling applied to hidden_states prior to combining with the residual in the feedforward. + attention_chunk_size (`int`, defaults to `12`): + The sub-sequence size for attention processing. + attention_context_left (`int`, defaults to `13`): + The leftward context size for the attention chunk. + attention_context_right (`int`, defaults to `0`): + The rightward context size for the attention chunk. + attention_logit_cap (`float`, defaults to `50.0`): + Cap applied to attention weights. + attention_invalid_logits_value (`float`, defaults to `1e-9`): + Value to use for invalid logits in attention. + use_clipped_linears (`bool`, defaults to `True`): + If true, apply clipping to the Linear layers, drawing bounds from the model checkpoint. + gradient_clipping (`float`, defaults to `1e10`): + Clipping value used to stabilize extremely large gradient values. 
+ output_proj_dims (`int`, defaults to `1536`): + Dimension of the final linear projection from `hidden_size` to the model's output. + """ + + model_type = "gemma4_audio" + + hidden_size: int = 1024 + num_hidden_layers: int = 12 + num_attention_heads: int = 8 + hidden_act: str = "silu" + + # subsampling parameters + subsampling_conv_channels: list[int] | tuple[int, int] = (128, 32) + + # conformer parameters + conv_kernel_size: int = 5 + residual_weight: float = 0.5 + attention_chunk_size: int = 12 + attention_context_left: int = 13 + attention_context_right: int = 0 + attention_logit_cap: float = 50.0 + attention_invalid_logits_value: float = -1.0e9 + + use_clipped_linears: bool = True + rms_norm_eps: float = 1e-6 + gradient_clipping: float = 1e10 + output_proj_dims: int = 1536 + initializer_range: float = interval(min=0.0, max=1.0)(default=0.02) + + def __post_init__(self, **kwargs): + # JSON serialization converts tuples to lists, convert back + if isinstance(self.subsampling_conv_channels, tuple): + self.subsampling_conv_channels = list(self.subsampling_conv_channels) + super().__post_init__(**kwargs) + + +@auto_docstring(checkpoint="google/gemma-4-e2b-it") +@strict +class Gemma4TextConfig(PreTrainedConfig): + r""" + use_bidirectional_attention (`str`, *optional*): + Controls bidirectional attention behavior. When set to `"vision"`, vision tokens + attend bidirectionally while text tokens use causal attention. When set to `"all"`, + all tokens use bidirectional attention. + vocab_size_per_layer_input (`int`, defaults to 262144): + Vocabulary size for the per-layer input embeddings (PLE). Used by models with + per-layer residual streams where a smaller embedding is added at each decoder layer. + hidden_size_per_layer_input (`int`, defaults to 256): + Per-layer hidden dimension for the PLE system. 
The actual embedding weight has shape + `[vocab_size_per_layer_input, num_hidden_layers * hidden_size_per_layer_input]` + because all layers are packed into a single table. See the [Gemma4](https://huggingface.co/docs/transformers/main/en/model_doc/gemma4#per-layer-embeddings-ple) docs + for a description of the full PLE pipeline. + num_global_key_value_heads (`int`, *optional*): + Number of key-value heads for global (full) attention layers. If `None`, defaults + to `num_key_value_heads`. + global_head_dim (`int`, defaults to 512): + Dimension of each attention head in global (full) attention layers. + attention_k_eq_v (`bool`, defaults to `False`): + Whether keys and values share the same projection weights. When `True`, the key + projection output is reused as the value projection. + num_kv_shared_layers (`int`, defaults to 0): + Number of consecutive decoder layers that share the same key-value projections. + A value of 0 means no sharing (each layer has independent KV projections). + enable_moe_block (`bool`, defaults to `False`): + Whether to enable Mixture-of-Experts (MoE) blocks in the decoder layers. When + `True`, eligible layers will use a sparse MoE feed-forward network. + use_double_wide_mlp (`bool`, defaults to `False`): + Whether to use a double-width MLP with fused gate and up projections. + top_k_experts (`int`, *optional*): + Number of experts activated per token in MoE layers. Only used when + `enable_moe_block=True`. + moe_intermediate_size (`int`, *optional*): + Intermediate (hidden) size of each expert's feed-forward network in MoE layers. + Only used when `enable_moe_block=True`. 
+ """ + + model_type = "gemma4_text" + keys_to_ignore_at_inference = ["past_key_values"] + base_model_tp_plan = { + "layers.*.self_attn.q_proj": "colwise", + "layers.*.self_attn.k_proj": "colwise", + "layers.*.self_attn.v_proj": "colwise", + "layers.*.self_attn.q_norm": "replicated_with_grad_allreduce", + "layers.*.self_attn.k_norm": "replicated_with_grad_allreduce", + "layers.*.self_attn.o_proj": "rowwise", + "layers.*.mlp.gate_proj": "colwise", + "layers.*.mlp.up_proj": "colwise", + "layers.*.mlp.down_proj": "rowwise", + "layers.*.experts.gate_up_proj": "packed_colwise", + "layers.*.experts.down_proj": "rowwise", + "layers.*.experts": "moe_tp_experts", + } + base_model_pp_plan = { + "embed_tokens": (["input_ids"], ["inputs_embeds"]), + "layers": (["hidden_states", "attention_mask"], ["hidden_states"]), + "norm": (["hidden_states"], ["hidden_states"]), + } + + vocab_size: int = 262_144 + hidden_size: int = 2304 + intermediate_size: int = 9216 + num_hidden_layers: int = 30 + num_attention_heads: int = 8 + num_key_value_heads: int = 4 + head_dim: int = 256 + hidden_activation: str = "gelu_pytorch_tanh" + max_position_embeddings: int = 131_072 + initializer_range: float = 0.02 + rms_norm_eps: float = 1e-6 + use_cache: bool = True + pad_token_id: int | None = 0 + eos_token_id: int | list[int] | None = 1 + bos_token_id: int | None = 2 + tie_word_embeddings: bool = True + rope_parameters: dict | None = None + attention_bias: bool = False + attention_dropout: int | float | None = 0.0 + sliding_window: int = 512 + layer_types: list[str] | None = None + final_logit_softcapping: float | None = None + use_bidirectional_attention: Literal["all", "vision"] | None = None + vocab_size_per_layer_input: int = 262_144 + hidden_size_per_layer_input: int = 256 + num_global_key_value_heads: int | None = None + global_head_dim: int = 512 + attention_k_eq_v: bool = False + num_kv_shared_layers: int = 0 + enable_moe_block: bool = False + use_double_wide_mlp: bool = False + num_experts: 
int | None = None + top_k_experts: int | None = None + moe_intermediate_size: int | None = None + + def __post_init__(self, **kwargs): + if self.use_bidirectional_attention == "all": + self.sliding_window = (self.sliding_window // 2) + 1 # due to fa we set exclusive bounds + + if self.layer_types is None: + sliding_window_pattern = 6 # by default 5:1 + self.layer_types = [ + "sliding_attention" if bool((i + 1) % sliding_window_pattern) else "full_attention" + for i in range(self.num_hidden_layers) + ] + + if self.layer_types and (last_layer_type := self.layer_types[-1]) != "full_attention": + logger.warning( + f"Last layer must use `full_attention`, but got `{last_layer_type}`. Forcing last layer to `full_attention`." + ) + self.layer_types[-1] = "full_attention" + + default_rope_params: dict[Literal["full_attention", "sliding_attention"] : dict[str, Any]] = { + "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0}, + "full_attention": {"rope_type": "proportional", "partial_rotary_factor": 0.25, "rope_theta": 1_000_000.0}, + } + if self.rope_parameters is None: + self.rope_parameters = default_rope_params + + super().__post_init__(**kwargs) + + def convert_rope_params_to_dict(self, **kwargs): + # No need to handle BC for new models, because they have no old-format `rope_scaling` + return kwargs + + +@auto_docstring(checkpoint="google/gemma-4-e2b-it") +@strict +class Gemma4VisionConfig(PreTrainedConfig): + r""" + pooling_kernel_size (`int`, *optional*): + Spatial pooling kernel size applied after patchification. + position_embedding_size (`int`, defaults to 10240): + Maximum number of position embeddings for the vision encoder. Controls the size of + the learned 2D position embedding table used by the patch embedder. + use_clipped_linears (`bool`, defaults to `False`): + Whether to use weight-clipped linear layers. When enabled, linear layer weights are + clamped to a fixed range during the forward pass to improve numerical stability. 
+ standardize (`bool`, defaults to `False`): + If true, applies a bias and scale to the soft tokens returned from the pooler. + """ + + model_type = "gemma4_vision" + base_model_tp_plan = { + "encoder.layers.*.self_attn.q_proj": "colwise", + "encoder.layers.*.self_attn.k_proj": "colwise", + "encoder.layers.*.self_attn.v_proj": "colwise", + "encoder.layers.*.self_attn.q_norm": "replicated_with_grad_allreduce", + "encoder.layers.*.self_attn.k_norm": "replicated_with_grad_allreduce", + "encoder.layers.*.self_attn.o_proj": "rowwise", + "encoder.layers.*.mlp.gate_proj": "colwise", + "encoder.layers.*.mlp.up_proj": "colwise", + "encoder.layers.*.mlp.down_proj": "rowwise", + } + default_theta = 100.0 + + hidden_size: int = 768 + intermediate_size: int = 3072 + num_hidden_layers: int = 16 + num_attention_heads: int = 12 + num_key_value_heads: int = 12 + head_dim: int = 64 + hidden_activation: str = "gelu_pytorch_tanh" + rms_norm_eps: float = 1e-6 + max_position_embeddings: int = 131_072 + attention_bias: bool | None = False + attention_dropout: float | None = 0.0 + rope_parameters: dict | None = None + pooling_kernel_size: int = 3 + patch_size: int = 16 + position_embedding_size: int = 10 * 1024 + use_clipped_linears: bool = False + standardize: bool = False + initializer_range: float = 0.02 + + def __post_init__(self, **kwargs): + if self.rope_parameters is None: + self.rope_parameters = {"rope_type": "default", "rope_theta": 100.0} + + super().__post_init__(**kwargs) + + +@auto_docstring(checkpoint="google/gemma-4-e2b-it") +@strict +class Gemma4Config(PreTrainedConfig): + r""" + boi_token_id (`int`, *optional*, defaults to 255999): + The begin-of-image token index to wrap the image prompt. + eoi_token_id (`int`, *optional*, defaults to 258882): + The end-of-image token index to wrap the image prompt. + boa_token_id (`int`, *optional*, defaults to 256000): + The begin-of-audio token index to wrap the audio prompt. 
+ eoa_token_index (`int`, *optional*, defaults to 258883): + The end-of-audio token index to wrap the audio prompt. + + Example: + + ```python + >>> from transformers import ( + >>> Gemma4AudioConfig, + >>> Gemma4Config, + >>> Gemma4ForConditionalGeneration, + >>> Gemma4TextConfig, + >>> Gemma4VisionConfig, + >>> ) + + >>> # Initializing a Gemma 4 Audio config. + >>> audio_config = Gemma4AudioConfig() + + >>> # Initializing a Gemma 4 Text config. + >>> text_config = Gemma4TextConfig() + + >>> # Initializing a Gemma 4 vision config. + >>> vision_config = Gemma4VisionConfig() + + >>> # Initializing a Gemma 4 config similar to google/gemma-4-e2b-it + >>> configuration = Gemma4Config(text_config, vision_config, audio_config) + + >>> # Initializing a model from the google/gemma-4-e2b-it configuration + >>> model = Gemma4ForConditionalGeneration(configuration) + + >>> # Accessing the model configuration + >>> configuration = model.config + ```""" + + model_type = "gemma4" + sub_configs = { + "text_config": Gemma4TextConfig, + "vision_config": Gemma4VisionConfig, + "audio_config": Gemma4AudioConfig, + } + + text_config: Gemma4TextConfig | dict[str, Any] | None = None + vision_config: Gemma4VisionConfig | dict[str, Any] | None = None + audio_config: Gemma4AudioConfig | dict[str, Any] | None = None + boi_token_id: int | None = 255_999 + eoi_token_id: int | None = 258_882 + image_token_id: int | None = 258_880 + video_token_id: int | None = 258_884 + boa_token_id: int | None = 256_000 + eoa_token_index: int | None = 258_883 + audio_token_id: int | None = 258_881 + initializer_range: float | None = 0.02 + tie_word_embeddings: bool = True + + def __post_init__(self, **kwargs): + if self.text_config is None: + self.text_config = Gemma4TextConfig() + logger.info("text_config is None. 
Using default Gemma4TextConfig.") + elif isinstance(self.text_config, dict): + self.text_config = Gemma4TextConfig(**self.text_config) + + if self.vision_config is None: + logger.info("vision_config is None. Gemma4Model.vision_tower will not be initialized.") + if isinstance(self.vision_config, dict): + self.vision_config = Gemma4VisionConfig(**self.vision_config) + + if self.audio_config is None: + logger.info("audio_config is None. Gemma4Model.audio_tower will not be initialized.") + if isinstance(self.audio_config, dict): + self.audio_config = Gemma4AudioConfig(**self.audio_config) + + super().__post_init__(**kwargs) + + +__all__ = ["Gemma4AudioConfig", "Gemma4Config", "Gemma4TextConfig", "Gemma4VisionConfig"] diff --git a/tooling/huggingface/transformers/feature_extraction_gemma4.py b/tooling/huggingface/transformers/feature_extraction_gemma4.py new file mode 100644 index 0000000..38382e8 --- /dev/null +++ b/tooling/huggingface/transformers/feature_extraction_gemma4.py @@ -0,0 +1,298 @@ +# Copyright 2026 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import math +import warnings +from collections.abc import Sequence + +import numpy as np + +from ...audio_utils import mel_filter_bank, window_function +from ...feature_extraction_sequence_utils import SequenceFeatureExtractor +from ...feature_extraction_utils import BatchFeature +from ...utils import PaddingStrategy, TensorType, logging + + +logger = logging.get_logger(__name__) + + +def _unfold(array: np.ndarray, dimension: int, size: int, step: int) -> np.ndarray: + """A basic NumPy equivalent of PyTorch's unfold for 2D arrays along the last dim.""" + if array.ndim != 2: + raise ValueError("This unfold implementation currently supports 2D arrays (batch, time).") + if dimension != -1 and dimension != array.ndim - 1: + raise ValueError("This unfold implementation only supports unfolding the last dimension.") + + batch_size, original_length = array.shape + num_frames = (original_length - size) // step + 1 + + if num_frames <= 0: + return np.zeros((batch_size, 0, size), dtype=array.dtype) + + output_shape = (batch_size, num_frames, size) + output_strides = (array.strides[0], array.strides[1] * step, array.strides[1]) + + return np.lib.stride_tricks.as_strided(array, shape=output_shape, strides=output_strides) + + +class Gemma4AudioFeatureExtractor(SequenceFeatureExtractor): + """An audio feature extractor for Universal Speech Models https://huggingface.co/papers/2303.01037. + + Args: + feature_size (`int`, *optional*, defaults to 128): + The feature dimension of the extracted features. + sampling_rate (`int`, *optional*, defaults to 16000): + The sampling rate at which the audio files should be digitized, expressed in hertz (Hz). + padding_value (`float`, *optional*, defaults to 0.0): + Padding value used to pad the audio. Should correspond to silence. + return_attention_mask (`bool`, *optional*, defaults to `True`): + Whether to return the attention mask for the generated MEL spectrograms. 
+ frame_length_ms (`float`, *optional*, defaults to 20.0): + The length of a frame in milliseconds. + hop_length_ms (`float`, *optional*, defaults to 10.0): + Length of the overlapping windows for the STFT used to obtain the Mel Frequency coefficients. + min_frequency (`float`, *optional*, defaults to 0.0): + The minimum frequency (in Hz) for the Mel filterbank. + max_frequency (`float`, *optional*, defaults to 8000.0): + The maximum frequency (in Hz) for the Mel filterbank. + preemphasis (`float`, *optional*, defaults to 0.0): + The preemphasis coefficient. + preemphasis_htk_flavor (`bool`, *optional*, defaults to `True`): + Whether to use HTK-style preemphasis. + fft_overdrive (`bool`, *optional*, defaults to `False`): + Whether to use FFT overdrive. + dither (`float`, *optional*, defaults to 0.0): + Adds dithering. In other words, adds a small Gaussian noise to each frame. + E.g. use 0.0001 to add dithering with a normal distribution centered + around 0.0 with standard deviation 0.0001 (assuming [-1,+1] range of raw_speech). + The value 0.0 means no dithering. + Dithering has similar effect as `spectrogram(mel_floor=...)`. It reduces + the high log_mel_fbank values for signals with hard-zero sections, + when VAD cutoff is present in the signal. + input_scale_factor (`float`, *optional*, defaults to 1.0): + Scaling factor applied to the input waveform. + mel_floor (`float`, *optional*, defaults to 0.001): + Minimum value for Mel spectrograms to avoid log(0). + per_bin_mean (`Optional[Sequence[float]]`, *optional*): + Mean values for per-bin normalization. + per_bin_stddev (`Optional[Sequence[float]]`, *optional*): + Standard deviation values for per-bin normalization. 
+ """ + + model_input_names = ["input_features", "input_features_mask"] + + def __init__( + self, + feature_size: int = 128, + sampling_rate: int = 16_000, + padding_value: float = 0.0, + return_attention_mask: bool = True, + frame_length_ms: float = 20.0, + hop_length_ms: float = 10.0, + min_frequency: float = 0.0, + max_frequency: float = 8000.0, + preemphasis: float = 0.0, + preemphasis_htk_flavor: bool = True, + fft_overdrive: bool = False, + dither: float = 0.0, + input_scale_factor: float = 1.0, + mel_floor: float = 1e-3, + per_bin_mean: Sequence[float] | None = None, + per_bin_stddev: Sequence[float] | None = None, + **kwargs, + ): + super().__init__( + feature_size=feature_size, + sampling_rate=sampling_rate, + padding_value=padding_value, + return_attention_mask=return_attention_mask, + **kwargs, + ) + + self.min_frequency = min_frequency + self.max_frequency = max_frequency + self.preemphasis = preemphasis + self.preemphasis_htk_flavor = preemphasis_htk_flavor + self.fft_overdrive = fft_overdrive + self.dither = dither + self.input_scale_factor = input_scale_factor + self.frame_length = int(round(sampling_rate * frame_length_ms / 1000.0)) + self.hop_length = int(round(sampling_rate * hop_length_ms / 1000.0)) + self.mel_floor = np.array(mel_floor, dtype=np.float64) + + fft_length = 2 ** math.ceil(math.log2(self.frame_length)) + if self.fft_overdrive: + fft_length *= 2 + self.fft_length = fft_length + + # Use periodic Hann window, matching sl.STFT default (signal.hann_window) + # For even frame_length: window[n] = 0.5 - 0.5 * cos(2*pi*n / frame_length) + self.window = window_function(self.frame_length).astype(np.float32) + + # Use HuggingFace's mel_filter_bank for compatibility. + # Suppress the expected warning about all-zero upper mel filters; + # with fft_length=512 (257 bins) and 128 mel filters the uppermost + # triangular filter falls between frequency bins, which is harmless. 
+ with warnings.catch_warnings(): + warnings.simplefilter("ignore") + self.mel_filters = mel_filter_bank( + num_frequency_bins=self.fft_length // 2 + 1, + num_mel_filters=feature_size, + min_frequency=min_frequency, + max_frequency=max_frequency, + sampling_rate=self.sampling_rate, + norm=None, + mel_scale="htk", + ) + + if per_bin_mean is not None: + self.per_bin_mean = np.array(per_bin_mean).reshape(1, 1, feature_size) + else: + self.per_bin_mean = None + + if per_bin_stddev is not None: + self.per_bin_stddev = np.array(per_bin_stddev).reshape(1, 1, feature_size) + else: + self.per_bin_stddev = None + + def _extract_spectrogram(self, waveform: np.ndarray, attention_mask: np.ndarray) -> tuple[np.ndarray, np.ndarray]: + """""" + if waveform.ndim == 1: # If single waveform, add batch dimension + waveform = np.expand_dims(waveform, axis=0) + + if self.dither > 0.0: + waveform = waveform + self.dither * np.random.randn(*waveform.shape).astype(waveform.dtype) + + if self.input_scale_factor != 1.0: + waveform = waveform * self.input_scale_factor + + # Semicausal time padding: prepend frame_length // 2 zeros so that the + # first STFT frame is centered at t=0, matching sl.STFT(time_padding='semicausal'). 
+ pad_left = self.frame_length // 2 + waveform = np.pad(waveform, ((0, 0), (pad_left, 0)), mode="constant") + attention_mask = np.pad(attention_mask, (pad_left, 0), mode="constant", constant_values=0) + + frame_size_for_unfold = self.frame_length + 1 + + # NumPy equivalent of unfold for [B, NumFrames, frame_size_for_unfold] + frames_to_process = _unfold(waveform, dimension=-1, size=frame_size_for_unfold, step=self.hop_length) + + if self.preemphasis > 0.0: + if self.preemphasis_htk_flavor: + first_in_frame = frames_to_process[..., :1] * (1.0 - self.preemphasis) + rest_in_frame = frames_to_process[..., 1:-1] - self.preemphasis * frames_to_process[..., :-2] + frames = np.concatenate([first_in_frame, rest_in_frame], axis=-1) + else: + frames = frames_to_process[..., 1:] - self.preemphasis * frames_to_process[..., :-1] + else: + frames = frames_to_process[..., :-1] + + # Apply window, then RFFT. np.fft.rfft with n=fft_length implicitly + # right-pads frames to fft_length. + frames = frames * self.window # Broadcasting window + stft = np.fft.rfft(frames, n=self.fft_length, axis=-1) + + magnitude_spec = np.abs(stft) + + mel_spec = np.matmul(magnitude_spec, self.mel_filters) + log_mel_spec = np.log(mel_spec + self.mel_floor) + + if self.per_bin_mean is not None: + log_mel_spec = log_mel_spec - self.per_bin_mean # Broadcasting + + if self.per_bin_stddev is not None: + log_mel_spec = log_mel_spec / self.per_bin_stddev # Broadcasting + + mel_spectrogram = log_mel_spec.squeeze(0) + num_mel_frames = mel_spectrogram.shape[0] + + # Build a frame-aware mask: a mel frame is valid only when every sample + # in its analysis window [i*hop, i*hop + frame_size - 1] is real audio. + # We check this by looking at the last sample of each frame's window. 
+ frame_end_indices = np.arange(num_mel_frames) * self.hop_length + frame_size_for_unfold - 1 + mask = attention_mask[frame_end_indices].astype(bool) + return mel_spectrogram, mask + + def __call__( + self, + raw_speech: np.ndarray | list[float] | list[np.ndarray] | list[list[float]], + padding: bool | str | PaddingStrategy = "longest", + max_length: int | None = 480_000, + truncation: bool = True, + pad_to_multiple_of: int | None = 128, + return_tensors: str | TensorType | None = None, + return_attention_mask: bool | None = True, + **kwargs, + ) -> BatchFeature: + """Creates a batch of MEL spectrograms from the provided raw speech. + + This implementation uses a different algorithm for windowing and preemphasis compared to the built-in + `transformers.audio_utils.spectrogram()` function that _will_ result in different outputs. Consider this + carefully when selecting an audio feature extractor, especially with pre-trained models. + + Args: + raw_speech: + The audio for which MEL spectrograms are created. + padding (`Union[bool, str, PaddingStrategy]`, *optional*, defaults to `"longest"`): + The padding strategy to use for batches of audio with different lengths. + max_length (`int`, *optional*, defaults to 480000): + If provided, defines the maximum length of the audio to allow. Audio longer than this will be + truncated if `truncation=True`. + truncation (`bool`, *optional*, defaults to `True`): + Whether or not to truncate audio above `max_length`. + pad_to_multiple_of (`int`, *optional*, defaults to 128): + When padding, pad to a multiple of this value. The default value is defined for optimal TPU support. + return_tensors (`Union[str, TensorType]`, *optional*, defaults to `None`): + The type of tensors to return (e.g., NumPy, or Torch). + return_attention_mask (`bool`, *optional*, defaults to `True`): + Whether to return the attention mask for the generated MEL spectrograms. 
+ """ + + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + is_batched_sequence = isinstance(raw_speech, Sequence) and isinstance(raw_speech[0], (np.ndarray, Sequence)) + is_batched = is_batched_numpy or is_batched_sequence + + if is_batched: + raw_speech = [np.asarray([rs]).T for rs in raw_speech] + elif not is_batched and not isinstance(raw_speech, np.ndarray): + raw_speech = np.asarray(raw_speech) + + if not is_batched: # always return a batch + raw_speech = [np.asarray([raw_speech])] + + batched_speech = self.pad( + BatchFeature({"input_features": raw_speech}), + padding=padding, + max_length=max_length, + truncation=truncation, + pad_to_multiple_of=pad_to_multiple_of, + return_attention_mask=return_attention_mask, + ) + + prepared_speech = [] + prepared_speech_mask = [] + for speech, mask in zip(batched_speech.input_features, batched_speech.attention_mask): + speech, mask = self._extract_spectrogram(speech.T, mask) + prepared_speech.append(speech.astype(np.float32)) + prepared_speech_mask.append(mask) + + prepared_speech = [speech * mask[..., None] for speech, mask in zip(prepared_speech, prepared_speech_mask)] + + return BatchFeature( + {"input_features": prepared_speech, "input_features_mask": prepared_speech_mask}, + tensor_type=return_tensors, + ) + + +__all__ = ["Gemma4AudioFeatureExtractor"] diff --git a/tooling/huggingface/transformers/image_processing_gemma4.py b/tooling/huggingface/transformers/image_processing_gemma4.py new file mode 100644 index 0000000..88510d0 --- /dev/null +++ b/tooling/huggingface/transformers/image_processing_gemma4.py @@ -0,0 +1,220 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import torch +from torchvision.transforms.v2 import functional as F + +from ...image_processing_backends import TorchvisionBackend +from ...image_processing_utils import BatchFeature +from ...image_utils import ImageInput, PILImageResampling +from ...processing_utils import ImagesKwargs, Unpack +from ...utils import TensorType, auto_docstring, logging +from .image_processing_pil_gemma4 import _SUPPORTED_SOFT_TOKENS, get_aspect_ratio_preserving_size + + +logger = logging.get_logger(__name__) + + +# Copied from transformers.models.siglip2.image_processing_siglip2.convert_image_to_patches +def convert_image_to_patches(image: "torch.Tensor", patch_size: int) -> "torch.Tensor": + """ + Convert 3D tensor image of shape (num_channels, image_height, image_width) into 2D tensor of patches of shape + (num_patches_height * num_patches_width, patch_size * patch_size * num_channels). + """ + num_channels, image_height, image_width = image.shape + num_patches_height = image_height // patch_size + num_patches_width = image_width // patch_size + patched_image = image.reshape(num_channels, num_patches_height, patch_size, num_patches_width, patch_size) + patched_image = patched_image.permute(1, 3, 2, 4, 0) + patched_image = patched_image.reshape(num_patches_height * num_patches_width, -1) + return patched_image + + +# Adopted from Siglip2 (mask -> position ids) +def pad_along_first_dim( + image: "torch.Tensor", positions: "torch.Tensor", target_length: int +) -> tuple["torch.Tensor", "torch.Tensor"]: + """ + Pad the tensor along the first dimension. 
+ """ + current_length = image.shape[0] + padding_length = target_length - current_length + if padding_length > 0: + padding = [0, 0] * (image.ndim - 1) + [0, padding_length] + pos_padding = (0, 0, 0, padding_length) + image = torch.nn.functional.pad(image, padding, mode="constant", value=0) + positions = torch.nn.functional.pad(positions, pos_padding, mode="constant", value=-1) + return image, positions + + +class Gemma4ImageProcessorKwargs(ImagesKwargs, total=False): + """ + patch_size (`int`, *optional*): + Size of each image patch in pixels. + max_soft_tokens (`int`, *optional*): + Maximum number of soft (vision) tokens per image. + Must be one of {70, 140, 280, 560, 1120}. + pooling_kernel_size (`int`, *optional*): + Spatial pooling kernel size applied after patchification. + """ + + patch_size: int + max_soft_tokens: int + pooling_kernel_size: int + + +@auto_docstring(custom_intro="Constructs a Gemma4 image processor.") +class Gemma4ImageProcessor(TorchvisionBackend): + resample = PILImageResampling.BICUBIC + image_mean = [0.0, 0.0, 0.0] + image_std = [1.0, 1.0, 1.0] + size = None + default_to_square = True + do_convert_rgb = True + do_resize = True + do_rescale = True + do_normalize = False + patch_size = 16 + max_soft_tokens = 280 + pooling_kernel_size = 3 + valid_kwargs = Gemma4ImageProcessorKwargs + model_input_names = ["pixel_values", "image_position_ids", "num_soft_tokens_per_image"] + + def __init__(self, **kwargs: Unpack[Gemma4ImageProcessorKwargs]): + super().__init__(**kwargs) + + if self.max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {self.max_soft_tokens}.") + + def _validate_preprocess_kwargs(self, **kwargs): + # Gemma4 uses aspect_ratio_preserving_resize driven by patch_size, + # max_soft_tokens, and pooling_kernel_size — not the standard `size` + # parameter. Temporarily disable do_resize so the base validation + # doesn't require `size` to be set. 
+ kwargs["do_resize"] = False + super()._validate_preprocess_kwargs(**kwargs) + + def aspect_ratio_preserving_resize( + self, + image: torch.Tensor, + patch_size: int, + max_patches: int, + pooling_kernel_size: int, + resample: F.InterpolationMode, + ) -> torch.Tensor: + height, width = image.shape[-2], image.shape[-1] + target_height, target_width = get_aspect_ratio_preserving_size( + height=height, + width=width, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + ) + + if target_height == height and target_width == width: + return image + + return F.resize( + image, + size=[target_height, target_width], + interpolation=resample, + antialias=True, + ) + + def preprocess( + self, + images: ImageInput, + **kwargs: Unpack[Gemma4ImageProcessorKwargs], + ) -> BatchFeature: + return super().preprocess(images, **kwargs) + + def _preprocess( + self, + images: list["torch.Tensor"], + do_resize: bool, + resample: "PILImageResampling | F.InterpolationMode | int | None", + do_rescale: bool, + rescale_factor: float, + do_normalize: bool, + image_mean: float | list[float] | None, + image_std: float | list[float] | None, + return_tensors: str | TensorType | None, + patch_size: int | None = None, + max_soft_tokens: int | None = None, + pooling_kernel_size: int | None = None, + **kwargs, + ) -> BatchFeature: + if max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {max_soft_tokens}.") + + # Compute max_patches from max_soft_tokens and pooling_kernel_size + max_patches = max_soft_tokens * pooling_kernel_size**2 + + # Process each image individually: resize, rescale/normalize, patchify, pad. + # Images have different aspect ratios and thus different resized dimensions, + # so patchification and padding must happen per-image before stacking. 
+ pixel_values = [] + position_ids = [] + num_soft_tokens_per_image = [] + + for image in images: + # Step 1: Aspect-ratio-preserving resize + if do_resize: + image = self.aspect_ratio_preserving_resize( + image=image, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + resample=resample, + ) + + # Step 2: Rescale pixel values (typically to [0, 1]) and optionally identity normalize + image = self.rescale_and_normalize(image, do_rescale, rescale_factor, do_normalize, image_mean, image_std) + + # Step 3: Patchify the image + # (num_channels, height, width) -> (num_patches, patch_size * patch_size * num_channels) + patch_height = image.shape[-2] // patch_size + patch_width = image.shape[-1] // patch_size + patches = convert_image_to_patches(image, patch_size) + num_soft_tokens_per_image.append(patches.shape[0] // pooling_kernel_size**2) + + # Step 5: Compute position IDs + device = image.device + patch_grid = torch.meshgrid( + torch.arange(patch_width, device=device), + torch.arange(patch_height, device=device), + indexing="xy", + ) + stacked_grid = torch.stack(patch_grid, dim=-1) + real_positions = stacked_grid.reshape(patches.shape[0], 2) + + # Step 6. 
Pad patches and positions to `max_patches` + patches, positions = pad_along_first_dim(patches, real_positions, max_patches) + pixel_values.append(patches) + position_ids.append(positions) + + # Stack into batch tensors + pixel_values = torch.stack(pixel_values, dim=0) # (batch, max_patches, patch_pixels) + position_ids = torch.stack(position_ids, dim=0) # (batch, max_patches, 2) + + data = { + "pixel_values": pixel_values, + "image_position_ids": position_ids, + "num_soft_tokens_per_image": num_soft_tokens_per_image, + } + return BatchFeature(data=data, tensor_type=return_tensors) + + +__all__ = ["Gemma4ImageProcessor"] diff --git a/tooling/huggingface/transformers/image_processing_pil_gemma4.py b/tooling/huggingface/transformers/image_processing_pil_gemma4.py new file mode 100644 index 0000000..d58f6a4 --- /dev/null +++ b/tooling/huggingface/transformers/image_processing_pil_gemma4.py @@ -0,0 +1,278 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import math + +import numpy as np + +from ...image_processing_backends import PilBackend +from ...image_processing_utils import BatchFeature +from ...image_transforms import resize +from ...image_utils import ImageInput +from ...processing_utils import ImagesKwargs, Unpack +from ...utils import TensorType, auto_docstring, is_vision_available, logging + + +if is_vision_available(): + from ...image_utils import PILImageResampling + + +logger = logging.get_logger(__name__) + +_SUPPORTED_SOFT_TOKENS = (70, 140, 280, 560, 1120) + + +def get_aspect_ratio_preserving_size( + height: int, + width: int, + patch_size: int, + max_patches: int, + pooling_kernel_size: int, +) -> tuple[int, int]: + """ + Image is resized to preserve aspect ratio so it fits within the patch budget. + Target dimensions are the largest that: + 1) Produce at most `max_patches` patches when patchified with `patch_size` + 2) Have height and width divisible by `pooling_kernel_size * patch_size` + """ + total_px = height * width + target_px = max_patches * (patch_size**2) + factor = math.sqrt(target_px / total_px) + ideal_height = factor * height + ideal_width = factor * width + side_mult = pooling_kernel_size * patch_size + + # Round down to nearest multiple of side_mult + target_height = int(math.floor(ideal_height / side_mult)) * side_mult + target_width = int(math.floor(ideal_width / side_mult)) * side_mult + + # Handle edge cases where one or both dimensions round to 0 + if target_height == 0 and target_width == 0: + raise ValueError( + "Attempting to resize to a 0 x 0 image. Resized height should be divisible by " + f"`pooling_kernel_size * patch_size`={pooling_kernel_size * patch_size}." 
+ ) + + max_side_length = (max_patches // pooling_kernel_size**2) * side_mult + if target_height == 0: + target_height = side_mult + target_width = min( + int(math.floor(width / height)) * side_mult, + max_side_length, + ) + elif target_width == 0: + target_width = side_mult + target_height = min( + int(math.floor(height / width)) * side_mult, + max_side_length, + ) + + if target_height * target_width > target_px: + raise ValueError( + f"Resizing [{height}x{width}] to [{target_height}x{target_width}] " + f"but this exceeds {max_patches} patches with patch_size {patch_size}" + ) + + return target_height, target_width + + +# Copied from transformers.models.siglip2.image_processing_pil_siglip2.convert_image_to_patches +def convert_image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray: + """ + Convert 3D array image of shape (num_channels, image_height, image_width) into 2D array of patches of shape + (num_patches_height * num_patches_width, patch_size * patch_size * num_channels). + """ + num_channels, image_height, image_width = image.shape + num_patches_height = image_height // patch_size + num_patches_width = image_width // patch_size + patched_image = image.reshape(num_channels, num_patches_height, patch_size, num_patches_width, patch_size) + patched_image = patched_image.transpose(1, 3, 2, 4, 0) + patched_image = patched_image.reshape(num_patches_height * num_patches_width, -1) + return patched_image + + +# Adopted from Siglip2 (mask -> position ids) +def pad_along_first_dim(image: np.ndarray, positions: np.ndarray, target_length: int) -> tuple[np.ndarray, np.ndarray]: + """ + Pad the image along the first dimension. 
+ """ + current_length = image.shape[0] + padding_length = target_length - current_length + if padding_length > 0: + paddings = [(0, padding_length)] + [(0, 0)] * (image.ndim - 1) + pos_paddings = [(0, padding_length), (0, 0)] + image = np.pad(image, paddings, mode="constant", constant_values=0) + positions = np.pad(positions, pos_paddings, mode="constant", constant_values=-1) + return image, positions + + +class Gemma4ImageProcessorKwargs(ImagesKwargs, total=False): + """ + patch_size (`int`, *optional*): + Size of each image patch in pixels. + max_soft_tokens (`int`, *optional*): + Maximum number of soft (vision) tokens per image. + Must be one of {70, 140, 280, 560, 1120}. + pooling_kernel_size (`int`, *optional*): + Spatial pooling kernel size applied after patchification. + """ + + patch_size: int + max_soft_tokens: int + pooling_kernel_size: int + + +@auto_docstring(custom_intro="Constructs a Gemma4 image processor.") +class Gemma4ImageProcessorPil(PilBackend): + valid_kwargs = Gemma4ImageProcessorKwargs + model_input_names = ["pixel_values", "image_position_ids", "num_soft_tokens_per_image"] + + do_resize = True + resample = PILImageResampling.BICUBIC + do_rescale = True + rescale_factor = 1 / 255 + do_normalize = False + image_mean = [0.0, 0.0, 0.0] + image_std = [1.0, 1.0, 1.0] + do_convert_rgb = True + patch_size = 16 + max_soft_tokens = 280 + pooling_kernel_size = 3 + + def __init__(self, **kwargs: Unpack[Gemma4ImageProcessorKwargs]) -> None: + super().__init__(**kwargs) + + if self.max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {self.max_soft_tokens}.") + + def _validate_preprocess_kwargs(self, **kwargs): + # Gemma4 uses aspect_ratio_preserving_resize driven by patch_size, + # max_soft_tokens, and pooling_kernel_size — not the standard `size` + # parameter. Temporarily disable do_resize so the base validation + # doesn't require `size` to be set. 
+ kwargs["do_resize"] = False + super()._validate_preprocess_kwargs(**kwargs) + + @auto_docstring + def preprocess( + self, + images: ImageInput, + **kwargs: Unpack[Gemma4ImageProcessorKwargs], + ) -> BatchFeature: + return super().preprocess(images, **kwargs) + + def aspect_ratio_preserving_resize( + self, + image: np.ndarray, + patch_size: int, + max_patches: int, + pooling_kernel_size: int, + resample: PILImageResampling, + ) -> np.ndarray: + height, width = image.shape[-2], image.shape[-1] + target_height, target_width = get_aspect_ratio_preserving_size( + height=height, + width=width, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + ) + + if target_height == height and target_width == width: + return image + + return resize( + image, + size=(target_height, target_width), + resample=resample, + ) + + def _preprocess( + self, + images: list[np.ndarray], + do_resize: bool, + resample: "PILImageResampling | int | None", + do_rescale: bool, + rescale_factor: float, + do_normalize: bool, + image_mean: float | list[float] | None, + image_std: float | list[float] | None, + return_tensors: str | TensorType | None, + max_soft_tokens: int | None = None, + patch_size: int | None = None, + pooling_kernel_size: int | None = None, + **kwargs, + ) -> BatchFeature: + if max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {max_soft_tokens}.") + + # Compute max_patches from max_soft_tokens and pooling_kernel_size + max_patches = max_soft_tokens * pooling_kernel_size**2 + + # Process each image individually: resize, rescale/normalize, patchify, pad. + # Images have different aspect ratios and thus different resized dimensions, + # so patchification and padding must happen per-image before stacking. 
+ pixel_values = [] + position_ids = [] + num_soft_tokens_per_image = [] + + for image in images: + # Step 1: Aspect-ratio-preserving resize + if do_resize: + image = self.aspect_ratio_preserving_resize( + image=image, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + resample=resample, + ) + + # Step 2: Rescale pixel values from [0, 255] to [0, 1] + if do_rescale: + image = self.rescale(image=image, scale=rescale_factor) + + # Step 3: Identity normalization because Gemma4 was trained with pixels in [0, 1] + if do_normalize: + image = self.normalize(image=image, mean=image_mean, std=image_std) + + # Step 4: Patchify the image + # image is (C, H, W) numpy array; add batch dimension for reshape + # (num_channels, height, width) -> (num_patches, patch_size * patch_size * num_channels) + patches = convert_image_to_patches(image, patch_size) + num_soft_tokens_per_image.append(patches.shape[0] // pooling_kernel_size**2) + + # Step 5: Compute position IDs + patch_height = image.shape[-2] // patch_size + patch_width = image.shape[-1] // patch_size + grid_x, grid_y = np.meshgrid(np.arange(patch_width), np.arange(patch_height), indexing="xy") + real_positions = np.stack([grid_x, grid_y], axis=-1).reshape(patches.shape[0], 2) + + patches, positions = pad_along_first_dim(patches, real_positions, max_patches) + + pixel_values.append(patches) + position_ids.append(positions) + + # Stack into batch arrays and convert to tensors + pixel_values = np.stack(pixel_values, axis=0) # (batch, max_patches, patch_pixels) + position_ids = np.stack(position_ids, axis=0) # (batch, max_patches, 2) + + data = { + "pixel_values": pixel_values, + "image_position_ids": position_ids, + "num_soft_tokens_per_image": num_soft_tokens_per_image, + } + return BatchFeature(data=data, tensor_type=return_tensors) + + +__all__ = ["Gemma4ImageProcessorPil"] diff --git a/tooling/huggingface/transformers/modeling_gemma4-OUTLINE.py 
b/tooling/huggingface/transformers/modeling_gemma4-OUTLINE.py new file mode 100644 index 0000000..c1d7eb9 --- /dev/null +++ b/tooling/huggingface/transformers/modeling_gemma4-OUTLINE.py @@ -0,0 +1,723 @@ +# === HEADER (license + imports) === +# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# This file was automatically generated from src/transformers/models/gemma4/modular_gemma4.py. +# Do NOT edit this file manually as any edits will be overwritten by the generation of +# the file from the modular. If any change should be done, please apply the change to the +# modular_gemma4.py file directly. One of our CI enforces this. +# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +from collections.abc import Callable +from dataclasses import dataclass +from functools import cached_property +from typing import Optional + +import torch +from torch import nn +from torch.nn import functional as F + +from ... 
import initialization as init +from ...activations import ACT2FN +from ...cache_utils import Cache, DynamicCache +from ...configuration_utils import PreTrainedConfig +from ...generation import GenerationMixin +from ...integrations import use_experts_implementation, use_kernelized_func +from ...masking_utils import ( + create_bidirectional_mask, + create_causal_mask, + create_masks_for_generate, + create_sliding_window_causal_mask, +) +from ...modeling_flash_attention_utils import FlashAttentionKwargs +from ...modeling_layers import GradientCheckpointingLayer +from ...modeling_outputs import BaseModelOutputWithPast, BaseModelOutputWithPooling, CausalLMOutputWithPast +from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update +from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel +from ...processing_utils import Unpack +from ...utils import ( + ModelOutput, + TransformersKwargs, + auto_docstring, + can_return_tuple, + is_accelerate_available, + torch_compilable_check, +) +from ...utils.generic import maybe_autocast, merge_with_config_defaults +from ...utils.output_capturing import OutputRecorder, capture_outputs +from ..auto.modeling_auto import AutoModel +from .configuration_gemma4 import Gemma4AudioConfig, Gemma4Config, Gemma4TextConfig, Gemma4VisionConfig + + +if is_accelerate_available(): + from accelerate.hooks import add_hook_to_module + + +@dataclass +@auto_docstring( + custom_intro=""" + Base class for Gemma4 outputs, with hidden states and attentions. + """ +) +class Gemma4ModelOutputWithPast(BaseModelOutputWithPast): + r""" + past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). 
+ + Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see + `past_key_values` input) to speed up sequential decoding. + image_hidden_states (`torch.FloatTensor`, *optional*): + +# === CLASS/FUNCTION OUTLINE (signatures + short body) === +@dataclass +@auto_docstring( + custom_intro=""" + Base class for Gemma4 outputs, with hidden states and attentions. + """ +) +class Gemma4ModelOutputWithPast(BaseModelOutputWithPast): + r""" + past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). + + Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see + `past_key_values` input) to speed up sequential decoding. + image_hidden_states (`torch.FloatTensor`, *optional*): + ... + +@dataclass +@auto_docstring( + custom_intro=""" + Base class for Gemma4 causal language model (or autoregressive) outputs. + """ +) +class Gemma4CausalLMOutputWithPast(ModelOutput): + r""" + loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided): + Language modeling loss (for next-token prediction). + logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.text_config.vocab_size)`): + Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). + past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). + ... + +@dataclass +@auto_docstring +class Gemma4AudioModelOutput(BaseModelOutputWithPooling): + r""" + attention_mask (`torch.BoolTensor`, *optional*): + A torch.BoolTensor of shape `(batch_size, num_frames)`. 
True for valid positions, False for padding. + """ + + attention_mask: torch.BoolTensor | None = None + + +class Gemma4ClippableLinear(nn.Module): + def __init__( + self, + ... + +class Gemma4RMSNorm(nn.Module): + def __init__(self, dim: int, eps: float = 1e-6, with_scale: bool = True): + super().__init__() + self.eps = eps + self.with_scale = with_scale + + if self.with_scale: + self.weight = nn.Parameter(torch.ones(dim), requires_grad=True) + + def _norm(self, hidden_states: torch.Tensor): + mean_squared = hidden_states.pow(2).mean(-1, keepdim=True) + self.eps + # Use torch.pow() (over torch.sqrt() or torch.rsqrt()) to address compiler differences between Torch and JAX + return hidden_states * torch.pow(mean_squared, -0.5) + + ... + +class Gemma4AudioRelPositionalEncoding(nn.Module): + """Sinusoidal relative positional encoding for the audio encoder. + + Produces position embeddings of shape [1, 2*context_size - 1, hidden_size] with + concatenated [sin..., cos...] layout matching the original Gemma4 convention. + """ + + inv_timescales: torch.Tensor + + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.hidden_size = config.hidden_size + self.context_size = ( + config.attention_chunk_size + config.attention_context_left - 1 + config.attention_context_right + ... + +class Gemma4AudioAttention(nn.Module): + """Chunked local attention with relative position bias""" + + def __init__(self, config: Gemma4AudioConfig, layer_idx: int): + super().__init__() + self.config = config + self.layer_idx = layer_idx + self.attention_logits_soft_cap = config.attention_logit_cap + self.head_dim = config.hidden_size // config.num_attention_heads + self.num_heads = config.num_attention_heads + + self.q_scale = (self.head_dim**-0.5) / math.log(2) + self.k_scale = math.log(1 + math.e) / math.log(2) + + ... 
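For readers skimming the outline, the `_norm` step of `Gemma4RMSNorm` shown above can be sketched in plain Python. This is an illustration under stated assumptions, not the upstream code: the helper name `rms_norm` is hypothetical, and because the outline elides the class's `forward`, multiplying by `weight` directly is an assumption about how the learned scale is applied.

```python
def rms_norm(values, eps=1e-6, weight=None):
    """Normalize one vector by its root mean square, mirroring `_norm` above.

    `rms_norm` is a hypothetical helper name used only for illustration.
    """
    # Mean of squares over the last (here: only) dimension, plus eps.
    mean_squared = sum(v * v for v in values) / len(values) + eps
    # The outline uses pow(mean_squared, -0.5) instead of sqrt/rsqrt.
    inv_rms = mean_squared ** -0.5
    normed = [v * inv_rms for v in values]
    if weight is None:  # corresponds to with_scale=False
        return normed
    # Assumption: the learned scale is applied by plain multiplication.
    return [n * w for n, w in zip(normed, weight)]


normalized = rms_norm([3.0, 4.0], eps=0.0)
# After normalization the mean of squares is (up to rounding) 1.
mean_sq_after = sum(v * v for v in normalized) / len(normalized)
```

With `with_scale=False` (as used by `Gemma4TextRouter` in this outline) only the normalization is applied and no weight parameter exists.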
+ +class Gemma4AudioSubSampleConvProjectionLayer(nn.Module): + def __init__(self, in_channels, out_channels, norm_eps): + super().__init__() + self.conv = nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=(3, 3), + stride=(2, 2), + padding=1, + bias=False, + ) + self.norm = nn.LayerNorm(out_channels, eps=norm_eps, elementwise_affine=True, bias=False) + self.act = nn.ReLU() + + ... + +class Gemma4AudioSubSampleConvProjection(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.layer0 = Gemma4AudioSubSampleConvProjectionLayer( + in_channels=1, + out_channels=config.subsampling_conv_channels[0], + norm_eps=config.rms_norm_eps, + ) + self.layer1 = Gemma4AudioSubSampleConvProjectionLayer( + in_channels=config.subsampling_conv_channels[0], + out_channels=config.subsampling_conv_channels[1], + norm_eps=config.rms_norm_eps, + ) + proj_input_dim = (config.subsampling_conv_channels[0] // 4) * config.subsampling_conv_channels[1] + ... + +class Gemma4AudioFeedForward(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.config = config + + self.ffw_layer_1 = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size * 4) + self.ffw_layer_2 = Gemma4ClippableLinear(config, config.hidden_size * 4, config.hidden_size) + + self.pre_layer_norm = Gemma4RMSNorm(config.hidden_size) + self.post_layer_norm = Gemma4RMSNorm(config.hidden_size) + self.act_fn = ACT2FN[config.hidden_act] + + self.gradient_clipping = config.gradient_clipping + self.post_layer_scale = config.residual_weight + ... + +class Gemma4AudioCausalConv1d(nn.Conv1d): + # def __init__( + # self, + # in_channels: int, + # out_channels: int, + # kernel_size: int, + # # cache_key: str, + # stride: int = 1, + # dilation: int = 1, + # bias: bool = True, + # ): + # super().__init__(in_channels, out_channels, kernel_size, stride=stride, dilation=dilation, bias=bias) + # self.cache_key = cache_key + + ... 
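Each `Gemma4AudioSubSampleConvProjectionLayer` above uses a 3x3 convolution with stride 2 and padding 1, so stacking `layer0` and `layer1` shrinks each spatial axis by roughly a factor of four. The sketch below applies the standard PyTorch `Conv2d` output-size formula to one axis. The helper name `conv_out_len` and the input length of 100 are illustrative assumptions, and the outline's elided `forward` may apply additional padding or masking that is not modelled here.

```python
def conv_out_len(n, kernel_size=3, stride=2, padding=1, dilation=1):
    """Output length along one axis for a Conv2d with the settings above.

    Hypothetical helper; implements the formula from the PyTorch Conv2d docs:
    floor((n + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1).
    """
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


frames_in = 100  # illustrative value, not taken from any config
after_layer0 = conv_out_len(frames_in)     # first stride-2 convolution
after_layer1 = conv_out_len(after_layer0)  # second stride-2 convolution
```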
+ +class Gemma4AudioLightConv1d(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.config = config + + self.linear_start = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size * 2) + self.linear_end = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size) + self.depthwise_conv1d = Gemma4AudioCausalConv1d( + in_channels=config.hidden_size, + out_channels=config.hidden_size, + kernel_size=config.conv_kernel_size, + groups=config.hidden_size, + bias=False, + ) + ... + +class Gemma4AudioLayer(nn.Module): + def __init__(self, config: Gemma4AudioConfig, layer_idx: int): + super().__init__() + self.config = config + + self.feed_forward1 = Gemma4AudioFeedForward(config) + self.feed_forward2 = Gemma4AudioFeedForward(config) + self.self_attn = Gemma4AudioAttention(config, layer_idx) + self.lconv1d = Gemma4AudioLightConv1d(config) + + self.norm_pre_attn = Gemma4RMSNorm(config.hidden_size) + self.norm_post_attn = Gemma4RMSNorm(config.hidden_size) + self.norm_out = Gemma4RMSNorm(config.hidden_size) + + ... + +class Gemma4VisionPatchEmbedder(nn.Module): + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.patch_size = config.patch_size + self.position_embedding_size = config.position_embedding_size + + self.input_proj = nn.Linear(3 * self.patch_size**2, self.hidden_size, bias=False) + self.position_embedding_table = nn.Parameter(torch.ones(2, self.position_embedding_size, self.hidden_size)) + + def _position_embeddings(self, pixel_position_ids: torch.Tensor, padding_positions: torch.Tensor) -> torch.Tensor: + """Prepare patch positions map for matmul with position embedding table.""" + # Expanding and permute patch positions to (batch_size, num_patches, 2, position_embedding_size) for matmul. + ... 
+ +class Gemma4VisionPooler(nn.Module): + """Scaling and optional spatial pooling for vision encodings""" + + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.hidden_size = config.hidden_size + self.root_hidden_size = self.hidden_size**0.5 + + def _avg_pool_by_positions( + self, hidden_states: torch.Tensor, pixel_position_ids: torch.Tensor, length: int + ) -> tuple[torch.Tensor, torch.Tensor]: + """ + 2D spatial pooling according to patch positions. + Pools the input tokens by averaging patches within a `k^2` grid, where `k` is determined by the ratio between + ... + +class Gemma4VisionMLP(nn.Module): + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.intermediate_size = config.intermediate_size + self.gate_proj = Gemma4ClippableLinear(config, self.hidden_size, self.intermediate_size) + self.up_proj = Gemma4ClippableLinear(config, self.hidden_size, self.intermediate_size) + self.down_proj = Gemma4ClippableLinear(config, self.intermediate_size, self.hidden_size) + self.act_fn = ACT2FN[config.hidden_activation] + + def forward(self, x): + down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x)) + return down_proj + ... + +class Gemma4VisionRotaryEmbedding(nn.Module): + inv_freq: torch.Tensor # fix linting for `register_buffer` + + def __init__(self, config: Gemma4VisionConfig, device=None): + super().__init__() + self.max_seq_len_cached = config.max_position_embeddings + self.original_max_seq_len = config.max_position_embeddings + + self.config = config + + self.rope_type = self.config.rope_parameters["rope_type"] + rope_init_fn: Callable = self.compute_default_rope_parameters + if self.rope_type != "default": + rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type] + ... 
+ +def rotate_half(x): + """Rotates half the hidden dims of the input.""" + x1 = x[..., : x.shape[-1] // 2] + x2 = x[..., x.shape[-1] // 2 :] + return torch.cat((-x2, x1), dim=-1) + + +def apply_rotary_pos_emb(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor, unsqueeze_dim: int = 1): + """Applies Rotary Position Embedding to the query and key tensors. + + Args: + x (`torch.Tensor`): The tensor to embed. + cos (`torch.Tensor`): The cosine part of the rotary embedding. + sin (`torch.Tensor`): The sine part of the rotary embedding. + ... + +def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor: + """ + This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch, + num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim) + """ + batch, num_key_value_heads, slen, head_dim = hidden_states.shape + if n_rep == 1: + return hidden_states + hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim) + return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim) + + +def eager_attention_forward( + module: nn.Module, + ... + +def apply_multidimensional_rope( + x: torch.Tensor, + cos: torch.Tensor, + sin: torch.Tensor, + position_ids: torch.Tensor, + unsqueeze_dim: int = 2, +) -> torch.Tensor: + """Applies multidimensional RoPE to inputs. + + Args: + x (`torch.Tensor`): The tensor to embed. + cos (`torch.Tensor`): The cosine part of the rotary embedding. + sin (`torch.Tensor`): The sine part of the rotary embedding. + position_ids (`torch.Tensor`, *optional*): + ... 
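The two small helpers above, `rotate_half` and `repeat_kv`, are easy to check by hand. The following is a list-based sketch of what they do to the last dimension and to the head dimension respectively. It is not the torch implementation, and the string labels standing in for per-head tensors are purely illustrative.

```python
def rotate_half(x):
    """List analogue of the torch helper: split in half, return [-x2, x1]."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [-v for v in x2] + x1


def repeat_kv(heads, n_rep):
    """List analogue of repeat_interleave along the head axis.

    Each key/value head is repeated `n_rep` times consecutively so that
    `num_key_value_heads` entries become `num_attention_heads` entries.
    """
    if n_rep == 1:
        return heads
    return [head for head in heads for _ in range(n_rep)]


rotated = rotate_half([1, 2, 3, 4])      # [-3, -4, 1, 2]
expanded = repeat_kv(["kv0", "kv1"], 2)  # ['kv0', 'kv0', 'kv1', 'kv1']
```

The interleaved order matters: query head `i` attends with key/value head `i // n_rep`, which is why the repeats are consecutive rather than tiled.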
+ +@use_kernelized_func(apply_rotary_pos_emb) +class Gemma4VisionAttention(nn.Module): + """Multi-headed attention from 'Attention Is All You Need' paper""" + + def __init__(self, config: Gemma4VisionConfig, layer_idx: int): + super().__init__() + self.layer_type = config.layer_types[layer_idx] if hasattr(config, "layer_types") else None + self.config = config + self.layer_idx = layer_idx + self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads) + self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads + self.scaling = 1.0 + self.attention_dropout = self.config.attention_dropout + self.is_causal = False + ... + +class Gemma4VisionEncoderLayer(GradientCheckpointingLayer): + def __init__(self, config: Gemma4VisionConfig, layer_idx: int): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.layer_idx = layer_idx + self.self_attn = Gemma4VisionAttention(config=config, layer_idx=layer_idx) + self.mlp = Gemma4VisionMLP(config) + self.input_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.post_attention_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.pre_feedforward_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.post_feedforward_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + + def forward( + ... + +class Gemma4VisionEncoder(nn.Module): + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.config = config + self.num_layers = config.num_hidden_layers + self.rotary_emb = Gemma4VisionRotaryEmbedding(config) + self.layers = nn.ModuleList( + [Gemma4VisionEncoderLayer(config=config, layer_idx=i) for i in range(self.num_layers)] + ) + + def forward( + self, + inputs_embeds: torch.Tensor, + attention_mask: torch.Tensor, + ... 
+ +class Gemma4TextMLP(nn.Module): + def __init__(self, config: Gemma4TextConfig, layer_idx: int): + super().__init__() + first_kv_shared_layer_idx = config.num_hidden_layers - config.num_kv_shared_layers + is_kv_shared_layer = layer_idx >= first_kv_shared_layer_idx > 0 + use_double_wide_mlp = config.use_double_wide_mlp and is_kv_shared_layer + self.config = config + self.hidden_size = config.hidden_size + self.intermediate_size = config.intermediate_size * (2 if use_double_wide_mlp else 1) + self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False) + self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False) + self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False) + self.act_fn = ACT2FN[config.hidden_activation] + + ... + +class Gemma4TextRotaryEmbedding(nn.Module): + inv_freq: torch.Tensor # fix linting for `register_buffer` + + def __init__(self, config: Gemma4TextConfig, device=None, layer_type=None): + super().__init__() + self.max_seq_len_cached = config.max_position_embeddings + self.original_max_seq_len = config.max_position_embeddings + + self.config = config + self.layer_types = set(config.layer_types) + self.rope_init_fns: dict[str, Callable[..., tuple[torch.Tensor, float]]] = {} + self.rope_type: dict[str, str] = {} + + for layer_type in self.layer_types: + ... 
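The chained comparison in `Gemma4TextMLP.__init__` above (`layer_idx >= first_kv_shared_layer_idx > 0`) is easy to misread. The sketch below reproduces that logic as a standalone function so the per-layer MLP width can be tabulated. The function name `mlp_width` and the config values (6 layers, 2 shared, width 100) are hypothetical and chosen only for illustration.

```python
def mlp_width(layer_idx, num_hidden_layers, num_kv_shared_layers,
              intermediate_size, use_double_wide_mlp):
    """Intermediate size for one layer, following the logic in Gemma4TextMLP."""
    first_kv_shared_layer_idx = num_hidden_layers - num_kv_shared_layers
    # Chained comparison: true only if the layer is at or past the first
    # shared layer AND that first shared index is strictly positive.
    is_kv_shared_layer = layer_idx >= first_kv_shared_layer_idx > 0
    use_double = use_double_wide_mlp and is_kv_shared_layer
    return intermediate_size * (2 if use_double else 1)


# Hypothetical config: 6 layers, the last 2 share KV, base width 100.
widths = [mlp_width(i, 6, 2, 100, True) for i in range(6)]
# widths == [100, 100, 100, 100, 200, 200]

# Edge case: if every layer were "shared", the first shared index is 0,
# the `> 0` guard fails, and no layer is widened.
all_shared = [mlp_width(i, 6, 6, 100, True) for i in range(6)]
```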
+ +@use_kernelized_func(apply_rotary_pos_emb) +class Gemma4TextAttention(nn.Module): + """Multi-headed attention from 'Attention Is All You Need' paper""" + + def __init__(self, config: Gemma4TextConfig, layer_idx: int): + super().__init__() + self.layer_type = config.layer_types[layer_idx] if hasattr(config, "layer_types") else None + self.config = config + self.layer_idx = layer_idx + self.is_sliding = self.layer_type == "sliding_attention" + self.sliding_window = config.sliding_window if self.is_sliding else None + + self.head_dim = config.global_head_dim if not self.is_sliding and config.global_head_dim else config.head_dim + self.use_alternative_attention = config.attention_k_eq_v and not self.is_sliding + ... + +@use_experts_implementation +class Gemma4TextExperts(nn.Module): + """Collection of expert weights stored as 3D tensors.""" + + def __init__(self, config: Gemma4TextConfig): + super().__init__() + self.num_experts = config.num_experts + self.hidden_dim = config.hidden_size + self.intermediate_dim = config.moe_intermediate_size + self.gate_up_proj = nn.Parameter(torch.empty(self.num_experts, 2 * self.intermediate_dim, self.hidden_dim)) + self.down_proj = nn.Parameter(torch.empty(self.num_experts, self.hidden_dim, self.intermediate_dim)) + self.act_fn = ACT2FN[config.hidden_activation] + + def forward( + ... + +class Gemma4TextRouter(nn.Module): + def __init__(self, config: Gemma4TextConfig): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.scalar_root_size = self.hidden_size**-0.5 + self.eps = config.rms_norm_eps + + self.norm = Gemma4RMSNorm(self.hidden_size, eps=self.eps, with_scale=False) + self.proj = nn.Linear(config.hidden_size, config.num_experts, bias=False) + self.scale = nn.Parameter(torch.ones(self.hidden_size)) + self.per_expert_scale = nn.Parameter(torch.ones(config.num_experts)) + + def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: + ... 
+ +class Gemma4TextDecoderLayer(GradientCheckpointingLayer): + def __init__(self, config: Gemma4TextConfig | Gemma4VisionConfig, layer_idx: int): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.layer_idx = layer_idx + self.self_attn = Gemma4TextAttention(config=config, layer_idx=layer_idx) + self.mlp = Gemma4TextMLP(config, layer_idx) + self.input_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.post_attention_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.pre_feedforward_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.post_feedforward_layernorm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + self.register_buffer("layer_scalar", torch.ones(1)) + + ... + +class Gemma4TextScaledWordEmbedding(nn.Embedding): + """ + This module overrides nn.Embeddings' forward by multiplying with embeddings scale. + """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int, embed_scale: float = 1.0): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.scalar_embed_scale = embed_scale + self.register_buffer("embed_scale", torch.tensor(embed_scale), persistent=False) + + def forward(self, input_ids: torch.Tensor): + return super().forward(input_ids) * self.embed_scale.to(self.weight.dtype) + + + ... + +@auto_docstring +class Gemma4PreTrainedModel(PreTrainedModel): + config: Gemma4Config + base_model_prefix = "model" + supports_gradient_checkpointing = True + _no_split_modules = ["Gemma4TextDecoderLayer", "Gemma4VisionEncoderLayer", "Gemma4AudioLayer"] + _skip_keys_device_placement = ["past_key_values", "shared_kv_states"] + _supports_flash_attn = True + _supports_sdpa = True + _supports_flex_attn = True + + _can_compile_fullgraph = True + _supports_attention_backend = True + _can_record_outputs = None # override + ... 
+ +@auto_docstring(custom_intro="The base Gemma 4 language model without a language modeling head.") +class Gemma4TextModel(Gemma4PreTrainedModel): + config: Gemma4TextConfig + input_modalities = ("text",) + _can_record_outputs = { + "router_logits": OutputRecorder(Gemma4TextRouter, index=0), + "hidden_states": Gemma4TextDecoderLayer, + "attentions": Gemma4TextAttention, + } + + def __init__(self, config: Gemma4TextConfig): + super().__init__(config) + self.padding_idx = config.pad_token_id + self.vocab_size = config.vocab_size + ... + +@auto_docstring(custom_intro="The base Gemma 4 language model with a language modeling head.") +class Gemma4ForCausalLM(Gemma4PreTrainedModel, GenerationMixin): + _tied_weights_keys = {"lm_head.weight": "model.embed_tokens.weight"} + _tp_plan = {"lm_head": "colwise_gather_output"} + _pp_plan = {"lm_head": (["hidden_states"], ["logits"])} + config: Gemma4TextConfig + base_model_prefix = "model" + + def __init__(self, config: Gemma4TextConfig): + super().__init__(config) + self.model = Gemma4TextModel(config) + self.vocab_size = config.vocab_size + self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) + # Grab the ones from the child + ... + +def sliding_window_mask_function(sliding_window: tuple[int, int]) -> Callable: + """ + This creates uni/bidirectional attention mask with sliding window. + """ + + def inner_mask(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool: + left_window_size, right_window_size = sliding_window + + dist = q_idx - kv_idx + left_mask = (dist >= 0) & (dist < left_window_size) + right_mask = (dist < 0) & (-dist < right_window_size) + return left_mask | right_mask + + return inner_mask + ... 
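To see what `sliding_window_mask_function` above produces, the predicate can be evaluated over a small grid of query and key positions. This is a pure-Python sketch of the same comparison logic. The wrapper name `sliding_window_mask` and the sequence length of 4 are assumptions made for illustration, and the real function returns a per-index callable consumed by the masking utilities rather than a dense matrix.

```python
def sliding_window_mask(seq_len, left_window_size, right_window_size):
    """Dense boolean mask built from the same predicate as `inner_mask`."""
    rows = []
    for q_idx in range(seq_len):
        row = []
        for kv_idx in range(seq_len):
            dist = q_idx - kv_idx
            # Past (and current) tokens within the left window.
            left_mask = (dist >= 0) and (dist < left_window_size)
            # Future tokens within the right window.
            right_mask = (dist < 0) and (-dist < right_window_size)
            row.append(left_mask or right_mask)
        rows.append(row)
    return rows


def render(mask):
    """Draw the mask with 'x' for attend and '.' for masked."""
    return ["".join("x" if m else "." for m in row) for row in mask]


causal_window = render(sliding_window_mask(4, 2, 1))
# ['x...', 'xx..', '.xx.', '..xx']
bidirectional = render(sliding_window_mask(4, 2, 2))
# ['xx..', 'xxx.', '.xxx', '..xx']
```

Note that both comparisons are strict on the window size, so a right window of 1 admits no future tokens and the mask is causal.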
+ +class Gemma4AudioModel(Gemma4PreTrainedModel): + """An audio encoder based on the [Universal Speech Model](https://huggingface.co/papers/2303.01037) architecture.""" + + config: Gemma4AudioConfig + main_input_name = "input_features" + base_model_prefix = "model.audio_tower" # prefix for Gemma4ForConditionalGeneration saved checkpoints, required for Gemma4AudioModel.from_pretrained() + _can_record_outputs = { + "hidden_states": Gemma4AudioLayer, + "attentions": Gemma4AudioAttention, + } + + def __init__(self, config: Gemma4AudioConfig): + super().__init__(config) + self.config = config + ... + +class Gemma4VisionModel(Gemma4PreTrainedModel): + """The Gemma 4 Vision Encoder.""" + + config = Gemma4VisionConfig + _can_record_outputs = { + "hidden_states": Gemma4VisionEncoderLayer, + "attentions": Gemma4VisionAttention, + } + + def __init__(self, config: Gemma4VisionConfig): + super().__init__(config) + self.patch_embedder = Gemma4VisionPatchEmbedder(config) + self.encoder = Gemma4VisionEncoder(config) + self.pooler = Gemma4VisionPooler(config) + ... + +class Gemma4MultimodalEmbedder(nn.Module): + """Embeds token ids or soft tokens for multimodal content into language model space.""" + + def __init__( + self, + multimodal_config: Gemma4AudioConfig | Gemma4VisionConfig, + text_config: Gemma4TextConfig, + ): + super().__init__() + + self.multimodal_hidden_size = getattr(multimodal_config, "output_proj_dims", multimodal_config.hidden_size) + self.eps = multimodal_config.rms_norm_eps + self.text_hidden_size = text_config.hidden_size + self.embedding_projection = nn.Linear(self.multimodal_hidden_size, self.text_hidden_size, bias=False) + ... + +def token_type_ids_mask_function( + token_type_ids: torch.Tensor | None, + image_group_ids: torch.Tensor | None, +) -> Callable | None: + """ + This function adds the correct offsets to the `q_idx` and `kv_idx` as the torch API can only accept lengths, + not start and end indices. 
+ """ + # Do not return an additional mask in this case + if token_type_ids is None: + return None + + def inner_mask(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool: + seq_length = image_group_ids.shape[-1] + ... + +def create_causal_mask_mapping( + config: PreTrainedConfig, + inputs_embeds: torch.Tensor, + attention_mask: torch.Tensor | None, + past_key_values: Cache | None, + position_ids: torch.Tensor | None, + mm_token_type_ids: torch.Tensor | None = None, + pixel_values: torch.FloatTensor | None = None, + is_training: bool = False, + is_first_iteration: bool | None = None, + **kwargs, +) -> dict: + """ + Overwrites the base `create_masks_for_generate` with `token_type_ids` masking to create the causal mask mapping + ... + +@auto_docstring( + custom_intro=""" + The base Gemma 4 model comprising a vision backbone, an audio backbone, and a language model without a + language modeling head. + """ +) +class Gemma4Model(Gemma4PreTrainedModel): + # we are filtering the logits/labels so we shouldn't divide the loss based on num_items_in_batch + accepts_loss_kwargs = False + + def __init__(self, config: Gemma4Config): + super().__init__(config) + self.vocab_size = config.text_config.vocab_size + + ... + +@auto_docstring( + custom_intro=""" + The base Gemma 4 model comprising a vision backbone, an audio backbone, a language model, and a language modeling + head. + """ +) +class Gemma4ForConditionalGeneration(Gemma4PreTrainedModel, GenerationMixin): + _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"} + accepts_loss_kwargs = False + base_model_prefix = "model" + + def __init__(self, config: Gemma4Config): + super().__init__(config) + self.model = Gemma4Model(config) + ... 
+ diff --git a/tooling/huggingface/transformers/modular_gemma4-OUTLINE.py b/tooling/huggingface/transformers/modular_gemma4-OUTLINE.py new file mode 100644 index 0000000..4ea92d6 --- /dev/null +++ b/tooling/huggingface/transformers/modular_gemma4-OUTLINE.py @@ -0,0 +1,563 @@ +# === HEADER (license + imports) === +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +from collections.abc import Callable +from dataclasses import dataclass +from functools import cached_property + +import torch +from torch import nn +from torch.nn import functional as F + +from ... 
import initialization as init +from ...activations import ACT2FN +from ...cache_utils import Cache, DynamicCache +from ...configuration_utils import PreTrainedConfig +from ...integrations import use_kernelized_func +from ...masking_utils import ( + create_bidirectional_mask, + create_causal_mask, + create_masks_for_generate, + create_sliding_window_causal_mask, +) +from ...modeling_flash_attention_utils import FlashAttentionKwargs +from ...modeling_outputs import BaseModelOutputWithPast, BaseModelOutputWithPooling +from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update +from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel +from ...processing_utils import Unpack +from ...utils import ( + TransformersKwargs, + auto_docstring, + can_return_tuple, + is_accelerate_available, + logging, + torch_compilable_check, +) +from ...utils.generic import maybe_autocast, merge_with_config_defaults +from ...utils.output_capturing import OutputRecorder, capture_outputs +from ..auto.modeling_auto import AutoModel +from ..gemma3.modeling_gemma3 import ( + Gemma3Attention, + Gemma3DecoderLayer, + Gemma3ForCausalLM, + Gemma3MLP, + Gemma3RotaryEmbedding, + Gemma3TextModel, + Gemma3TextScaledWordEmbedding, +) +from ..gemma3n.modeling_gemma3n import ( + Gemma3nCausalLMOutputWithPast, + Gemma3nForConditionalGeneration, + Gemma3nModel, + Gemma3nModelOutputWithPast, + Gemma3nMultimodalEmbedder, + Gemma3nPreTrainedModel, + Gemma3nRMSNorm, + apply_rotary_pos_emb, + eager_attention_forward, +) +from ..llama.modeling_llama import LlamaRotaryEmbedding +from ..mixtral.modeling_mixtral import MixtralExperts +from ..moonshine_streaming.modeling_moonshine_streaming import sliding_window_mask_function +from .configuration_gemma4 import Gemma4AudioConfig, Gemma4Config, Gemma4TextConfig, Gemma4VisionConfig + + +if is_accelerate_available(): + pass + + + +# === CLASS/FUNCTION OUTLINE (signatures + short body) === +class 
Gemma4ModelOutputWithPast(Gemma3nModelOutputWithPast): + pass + + +class Gemma4CausalLMOutputWithPast(Gemma3nCausalLMOutputWithPast): + pass + + +@dataclass +@auto_docstring +class Gemma4AudioModelOutput(BaseModelOutputWithPooling): + r""" + attention_mask (`torch.BoolTensor`, *optional*): + A torch.BoolTensor of shape `(batch_size, num_frames)`. True for valid positions, False for padding. + ... + +class Gemma4ClippableLinear(nn.Module): + def __init__( + self, + config: Gemma4VisionConfig | Gemma4AudioConfig, + in_features: int, + out_features: int, + ) -> None: + super().__init__() + self.use_clipped_linears = config.use_clipped_linears + self.linear = nn.Linear(in_features, out_features, bias=False) + + if self.use_clipped_linears: + self.register_buffer("input_min", torch.tensor(-float("inf"))) + self.register_buffer("input_max", torch.tensor(float("inf"))) + ... + +class Gemma4RMSNorm(Gemma3nRMSNorm): + pass + + +class Gemma4AudioRelPositionalEncoding(nn.Module): + """Sinusoidal relative positional encoding for the audio encoder. + + Produces position embeddings of shape [1, 2*context_size - 1, hidden_size] with + concatenated [sin..., cos...] layout matching the original Gemma4 convention. + """ + + inv_timescales: torch.Tensor + + def __init__(self, config: Gemma4AudioConfig): + ... + +class Gemma4AudioAttention(nn.Module): + """Chunked local attention with relative position bias""" + + def __init__(self, config: Gemma4AudioConfig, layer_idx: int): + super().__init__() + self.config = config + self.layer_idx = layer_idx + self.attention_logits_soft_cap = config.attention_logit_cap + self.head_dim = config.hidden_size // config.num_attention_heads + self.num_heads = config.num_attention_heads + + self.q_scale = (self.head_dim**-0.5) / math.log(2) + self.k_scale = math.log(1 + math.e) / math.log(2) + + ... 
+ +class Gemma4AudioSubSampleConvProjectionLayer(nn.Module): + def __init__(self, in_channels, out_channels, norm_eps): + super().__init__() + self.conv = nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=(3, 3), + stride=(2, 2), + padding=1, + bias=False, + ) + self.norm = nn.LayerNorm(out_channels, eps=norm_eps, elementwise_affine=True, bias=False) + self.act = nn.ReLU() + + ... + +class Gemma4AudioSubSampleConvProjection(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.layer0 = Gemma4AudioSubSampleConvProjectionLayer( + in_channels=1, + out_channels=config.subsampling_conv_channels[0], + norm_eps=config.rms_norm_eps, + ) + self.layer1 = Gemma4AudioSubSampleConvProjectionLayer( + in_channels=config.subsampling_conv_channels[0], + out_channels=config.subsampling_conv_channels[1], + norm_eps=config.rms_norm_eps, + ) + proj_input_dim = (config.subsampling_conv_channels[0] // 4) * config.subsampling_conv_channels[1] + ... + +class Gemma4AudioFeedForward(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.config = config + + self.ffw_layer_1 = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size * 4) + self.ffw_layer_2 = Gemma4ClippableLinear(config, config.hidden_size * 4, config.hidden_size) + + self.pre_layer_norm = Gemma4RMSNorm(config.hidden_size) + self.post_layer_norm = Gemma4RMSNorm(config.hidden_size) + self.act_fn = ACT2FN[config.hidden_act] + + self.gradient_clipping = config.gradient_clipping + self.post_layer_scale = config.residual_weight + ... + +class Gemma4AudioCausalConv1d(nn.Conv1d): + # def __init__( + # self, + # in_channels: int, + # out_channels: int, + # kernel_size: int, + # # cache_key: str, + # stride: int = 1, + # dilation: int = 1, + # bias: bool = True, + # ): + # super().__init__(in_channels, out_channels, kernel_size, stride=stride, dilation=dilation, bias=bias) + # self.cache_key = cache_key + + ... 
+ +class Gemma4AudioLightConv1d(nn.Module): + def __init__(self, config: Gemma4AudioConfig): + super().__init__() + self.config = config + + self.linear_start = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size * 2) + self.linear_end = Gemma4ClippableLinear(config, config.hidden_size, config.hidden_size) + self.depthwise_conv1d = Gemma4AudioCausalConv1d( + in_channels=config.hidden_size, + out_channels=config.hidden_size, + kernel_size=config.conv_kernel_size, + groups=config.hidden_size, + bias=False, + ) + ... + +class Gemma4AudioLayer(nn.Module): + def __init__(self, config: Gemma4AudioConfig, layer_idx: int): + super().__init__() + self.config = config + + self.feed_forward1 = Gemma4AudioFeedForward(config) + self.feed_forward2 = Gemma4AudioFeedForward(config) + self.self_attn = Gemma4AudioAttention(config, layer_idx) + self.lconv1d = Gemma4AudioLightConv1d(config) + + self.norm_pre_attn = Gemma4RMSNorm(config.hidden_size) + self.norm_post_attn = Gemma4RMSNorm(config.hidden_size) + self.norm_out = Gemma4RMSNorm(config.hidden_size) + + ... + +class Gemma4VisionPatchEmbedder(nn.Module): + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.patch_size = config.patch_size + self.position_embedding_size = config.position_embedding_size + + self.input_proj = nn.Linear(3 * self.patch_size**2, self.hidden_size, bias=False) + self.position_embedding_table = nn.Parameter(torch.ones(2, self.position_embedding_size, self.hidden_size)) + + def _position_embeddings(self, pixel_position_ids: torch.Tensor, padding_positions: torch.Tensor) -> torch.Tensor: + """Prepare patch positions map for matmul with position embedding table.""" + # Expand and permute patch positions to (batch_size, num_patches, 2, position_embedding_size) for matmul. + ... 
+ +class Gemma4VisionPooler(nn.Module): + """Scaling and optional spatial pooling for vision encodings""" + + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.hidden_size = config.hidden_size + self.root_hidden_size = self.hidden_size**0.5 + + def _avg_pool_by_positions( + self, hidden_states: torch.Tensor, pixel_position_ids: torch.Tensor, length: int + ) -> tuple[torch.Tensor, torch.Tensor]: + """ + 2D spatial pooling according to patch positions. + Pools the input tokens by averaging patches within a `k^2` grid, where `k` is determined by the ratio between + ... + +class Gemma4VisionMLP(Gemma3MLP): + def __init__(self, config: Gemma4VisionConfig): + super().__init__(self, config) + self.gate_proj = Gemma4ClippableLinear(config, self.hidden_size, self.intermediate_size) + self.up_proj = Gemma4ClippableLinear(config, self.hidden_size, self.intermediate_size) + self.down_proj = Gemma4ClippableLinear(config, self.intermediate_size, self.hidden_size) + + +def apply_multidimensional_rope( + x: torch.Tensor, + cos: torch.Tensor, + sin: torch.Tensor, + position_ids: torch.Tensor, + unsqueeze_dim: int = 2, + ... + +class Gemma4VisionRotaryEmbedding(LlamaRotaryEmbedding): + @staticmethod + def compute_default_rope_parameters( + config: Gemma4VisionConfig | None = None, + device: torch.device | None = None, + seq_len: int | None = None, + ) -> tuple["torch.Tensor", float]: + """ + Computes the inverse frequencies according to the original RoPE implementation + Args: + config ([`~transformers.PreTrainedConfig`]): + The model configuration. + device (`torch.device`): + The device to use for initialization of the inverse frequencies. + ... 
+ +class Gemma4VisionAttention(Gemma3Attention): + def __init__(self, config: Gemma4VisionConfig, layer_idx: int): + super().__init__(self, config, layer_idx) + del self.attn_logit_softcapping + del self.sliding_window + del self.is_sliding + self.scaling = 1.0 + self.is_causal = False + self.k_proj = Gemma4ClippableLinear(config, config.hidden_size, config.num_key_value_heads * self.head_dim) + self.q_proj = Gemma4ClippableLinear(config, config.hidden_size, config.num_attention_heads * self.head_dim) + self.v_proj = Gemma4ClippableLinear(config, config.hidden_size, config.num_key_value_heads * self.head_dim) + self.o_proj = Gemma4ClippableLinear(config, config.num_attention_heads * self.head_dim, config.hidden_size) + self.v_norm = Gemma4RMSNorm(self.head_dim, eps=config.rms_norm_eps, with_scale=False) + + ... + +class Gemma4VisionEncoderLayer(Gemma3DecoderLayer): + def __init__(self, config: Gemma4VisionConfig, layer_idx: int): + super().__init__(self, config, layer_idx) + self.self_attn = Gemma4VisionAttention(config=config, layer_idx=layer_idx) + self.mlp = Gemma4VisionMLP(config) + + def forward( + self, + hidden_states: torch.Tensor, + position_embeddings: torch.Tensor = None, + attention_mask: torch.Tensor | None = None, + position_ids: torch.LongTensor | None = None, + **kwargs: Unpack[TransformersKwargs], + ) -> tuple[torch.FloatTensor, tuple[torch.FloatTensor, torch.FloatTensor] | None]: + ... + +class Gemma4VisionEncoder(nn.Module): + def __init__(self, config: Gemma4VisionConfig): + super().__init__() + self.config = config + self.num_layers = config.num_hidden_layers + self.rotary_emb = Gemma4VisionRotaryEmbedding(config) + self.layers = nn.ModuleList( + [Gemma4VisionEncoderLayer(config=config, layer_idx=i) for i in range(self.num_layers)] + ) + + def forward( + self, + inputs_embeds: torch.Tensor, + attention_mask: torch.Tensor, + ... 
+ +class Gemma4TextMLP(Gemma3MLP): + def __init__(self, config: Gemma4TextConfig, layer_idx: int): + first_kv_shared_layer_idx = config.num_hidden_layers - config.num_kv_shared_layers + is_kv_shared_layer = layer_idx >= first_kv_shared_layer_idx > 0 + use_double_wide_mlp = config.use_double_wide_mlp and is_kv_shared_layer + super().__init__() + self.intermediate_size = config.intermediate_size * (2 if use_double_wide_mlp else 1) + + +class Gemma4TextRotaryEmbedding(Gemma3RotaryEmbedding): + def __init__(self, config: Gemma4TextConfig, device=None, layer_type=None): + nn.Module.__init__(self) + self.max_seq_len_cached = config.max_position_embeddings + self.original_max_seq_len = config.max_position_embeddings + ... + +@use_kernelized_func(apply_rotary_pos_emb) +class Gemma4TextAttention(nn.Module): + """Multi-headed attention from 'Attention Is All You Need' paper""" + + def __init__(self, config: Gemma4TextConfig, layer_idx: int): + super().__init__() + self.layer_type = config.layer_types[layer_idx] if hasattr(config, "layer_types") else None + self.config = config + self.layer_idx = layer_idx + self.is_sliding = self.layer_type == "sliding_attention" + self.sliding_window = config.sliding_window if self.is_sliding else None + + self.head_dim = config.global_head_dim if not self.is_sliding and config.global_head_dim else config.head_dim + self.use_alternative_attention = config.attention_k_eq_v and not self.is_sliding + ... + +class Gemma4TextExperts(MixtralExperts): + def __init__(self, config: Gemma4TextConfig): + super().__init__() + self.num_experts = config.num_experts + self.intermediate_dim = config.moe_intermediate_size + self.act_fn = ACT2FN[config.hidden_activation] + + +class Gemma4TextRouter(nn.Module): + def __init__(self, config: Gemma4TextConfig): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size + self.scalar_root_size = self.hidden_size**-0.5 + ... 
+ +class Gemma4TextDecoderLayer(Gemma3DecoderLayer): + def __init__(self, config: Gemma4TextConfig | Gemma4VisionConfig, layer_idx: int): + super().__init__(config, layer_idx) + self.self_attn = Gemma4TextAttention(config=config, layer_idx=layer_idx) + self.mlp = Gemma4TextMLP(config, layer_idx) + self.register_buffer("layer_scalar", torch.ones(1)) + + self.hidden_size_per_layer_input = config.hidden_size_per_layer_input + if self.hidden_size_per_layer_input: + self.act_fn = ACT2FN[config.hidden_activation] + self.per_layer_input_gate = nn.Linear(self.hidden_size, self.hidden_size_per_layer_input, bias=False) + self.per_layer_projection = nn.Linear(self.hidden_size_per_layer_input, self.hidden_size, bias=False) + self.post_per_layer_input_norm = Gemma4RMSNorm(self.hidden_size, eps=config.rms_norm_eps) + + ... + +class Gemma4TextScaledWordEmbedding(Gemma3TextScaledWordEmbedding): + pass + + +# ---- Model Classes ---- + + +class Gemma4PreTrainedModel(Gemma3nPreTrainedModel): + _no_split_modules = ["Gemma4TextDecoderLayer", "Gemma4VisionEncoderLayer", "Gemma4AudioLayer"] + input_modalities = ("image", "text", "video", "audio") + _can_record_outputs = None # override + _skip_keys_device_placement = ["past_key_values", "shared_kv_states"] + + @torch.no_grad() + ... + +@auto_docstring(custom_intro="The base Gemma 4 language model without a language modeling head.") +class Gemma4TextModel(Gemma3TextModel): + config: Gemma4TextConfig + _can_record_outputs = { + "router_logits": OutputRecorder(Gemma4TextRouter, index=0), + "hidden_states": Gemma4TextDecoderLayer, + "attentions": Gemma4TextAttention, + } + + def __init__(self, config: Gemma4TextConfig): + super().__init__(config) + self.layers = nn.ModuleList( + [Gemma4TextDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)] + ) + ... 
+ +@auto_docstring(custom_intro="The base Gemma 4 language model with a language modeling head.") +class Gemma4ForCausalLM(Gemma3ForCausalLM): + base_model_prefix = "model" + + def __init__(self, config: Gemma4TextConfig): + super().__init__(config) + # Grab the ones from the child + self._keys_to_ignore_on_load_unexpected = [ + f"model.{name}" for name in self.model._keys_to_ignore_on_load_unexpected + ] + + +class Gemma4AudioModel(Gemma4PreTrainedModel): + """An audio encoder based on the [Universal Speech Model](https://huggingface.co/papers/2303.01037) architecture.""" + ... + +class Gemma4VisionModel(Gemma4PreTrainedModel): + """The Gemma 4 Vision Encoder.""" + + config = Gemma4VisionConfig + _can_record_outputs = { + "hidden_states": Gemma4VisionEncoderLayer, + "attentions": Gemma4VisionAttention, + } + + def __init__(self, config: Gemma4VisionConfig): + super().__init__(config) + self.patch_embedder = Gemma4VisionPatchEmbedder(config) + self.encoder = Gemma4VisionEncoder(config) + self.pooler = Gemma4VisionPooler(config) + ... + +class Gemma4MultimodalEmbedder(Gemma3nMultimodalEmbedder): + def __init__( + self, + multimodal_config: Gemma4AudioConfig | Gemma4VisionConfig, + text_config: Gemma4TextConfig, + ): + # Audio tower may use a different output dimension (output_proj_dims) than the + # internal hidden_size. Use the tower-specific dimension if specified. + super().__init__(multimodal_config, text_config) + del self.embedding + del self.hard_embedding_norm + del self.soft_embedding_norm + del self.vocab_offset + del self.vocab_size + ... + +def token_type_ids_mask_function( + token_type_ids: torch.Tensor | None, + image_group_ids: torch.Tensor | None, +) -> Callable | None: + """ + This function adds the correct offsets to the `q_idx` and `kv_idx` as the torch API can only accept lengths, + not start and end indices. 
+ """ + # Do not return an additional mask in this case + if token_type_ids is None: + return None + + def inner_mask(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool: + seq_length = image_group_ids.shape[-1] + ... + +def create_causal_mask_mapping( + config: PreTrainedConfig, + inputs_embeds: torch.Tensor, + attention_mask: torch.Tensor | None, + past_key_values: Cache | None, + position_ids: torch.Tensor | None, + mm_token_type_ids: torch.Tensor | None = None, + pixel_values: torch.FloatTensor | None = None, + is_training: bool = False, + is_first_iteration: bool | None = None, + **kwargs, +) -> dict: + """ + Overwrites the base `create_masks_for_generate` with `token_type_ids` masking to create the causal mask mapping + ... + +@auto_docstring( + custom_intro=""" + The base Gemma 4 model comprising a vision backbone, an audio backbone, and a language model without a + language modeling head. + """ +) +class Gemma4Model(Gemma3nModel): + def __init__(self, config: Gemma4Config): + super().__init__(config) + del self.vision_tower + del self.embed_vision + self.vision_tower = AutoModel.from_config(config.vision_config) if config.vision_config is not None else None + self.embed_vision = ( + Gemma4MultimodalEmbedder(config.vision_config, config.text_config) + ... + +@auto_docstring( + custom_intro=""" + The base Gemma 4 model comprising a vision backbone, an audio backbone, a language model, and a language modeling + head. + """ +) +class Gemma4ForConditionalGeneration(Gemma3nForConditionalGeneration): + base_model_prefix = "model" + + def __init__(self, config: Gemma4Config): + super().__init__(config) + # Grab the ones from the child + self._keys_to_ignore_on_load_unexpected = [ + f"model.{name}" for name in self.model._keys_to_ignore_on_load_unexpected + ... 
+ diff --git a/tooling/huggingface/transformers/processing_gemma4.py b/tooling/huggingface/transformers/processing_gemma4.py new file mode 100644 index 0000000..d688250 --- /dev/null +++ b/tooling/huggingface/transformers/processing_gemma4.py @@ -0,0 +1,366 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import re + +import numpy as np + +from ...audio_utils import AudioInput +from ...image_processing_utils import BatchFeature +from ...image_utils import ImageInput, make_nested_list_of_images +from ...processing_utils import MultiModalData, ProcessingKwargs, ProcessorMixin, Unpack +from ...tokenization_utils_base import PreTokenizedInput, TextInput +from ...utils import auto_docstring, is_vision_available, logging +from ...utils.import_utils import requires +from ...video_utils import VideoInput + + +if is_vision_available(): + from .image_processing_pil_gemma4 import Gemma4ImageProcessorKwargs, get_aspect_ratio_preserving_size + + +logger = logging.get_logger(__name__) + + +class Gemma4ProcessorKwargs(ProcessingKwargs, total=False): + images_kwargs: Gemma4ImageProcessorKwargs + _defaults = { + "text_kwargs": { + "padding": True, + "return_mm_token_type_ids": True, + }, + "images_kwargs": { + "do_convert_rgb": True, + }, + "audio_kwargs": {}, + "videos_kwargs": {"return_metadata": True}, + } + + +@auto_docstring +@requires(backends=("vision",)) +class Gemma4Processor(ProcessorMixin): + def __init__( + 
self, + feature_extractor, + image_processor, + tokenizer, + video_processor, + chat_template=None, + image_seq_length: int = 280, + audio_seq_length: int = 750, + audio_ms_per_token: int = 40, + **kwargs, + ): + r""" + image_seq_length (`int`, *optional*, defaults to 280): + The number of soft tokens per image used for placeholder expansion. + audio_seq_length (`int`, *optional*, defaults to 750): + The maximum number of audio soft tokens per audio segment. Serves as an + upper-bound cap when dynamic audio token counts are computed. + audio_ms_per_token (`int`, *optional*, defaults to 40): + Milliseconds of audio per output soft token. Used to dynamically compute + the number of audio placeholder tokens as ``ceil(duration_ms / audio_ms_per_token)``. + The default of 40 comes from the SSCP convolution's 4× time reduction on 10ms frames. + """ + self.image_seq_length = image_seq_length + self.image_token_id = tokenizer.image_token_id + self.boi_token = tokenizer.boi_token + self.eoi_token = tokenizer.eoi_token + self.image_token = tokenizer.image_token + + # FIXME: add the token to config and ask Ryan to re-upload + tokenizer.add_special_tokens({"additional_special_tokens": ["<|video|>"]}) + self.video_token = "<|video|>" + self.video_token_id = tokenizer.convert_tokens_to_ids(self.video_token) + + # Audio token handling, mirroring the vision pattern. + # audio_seq_length serves as the maximum cap on the number of audio soft tokens + # any single audio segment can produce. With dynamic audio tokens, the actual + # number of placeholders inserted per audio is computed from the audio duration. + self.audio_seq_length = audio_seq_length + # Milliseconds of audio per output soft token. The default of 40 comes from the + # SSCP convolution's 4× time reduction applied to 10ms mel spectrogram frames. 
+ self.audio_ms_per_token = audio_ms_per_token + self.audio_token_id = getattr(tokenizer, "audio_token_id", None) + self.audio_token = getattr(tokenizer, "audio_token", None) + self.boa_token = getattr(tokenizer, "boa_token", None) + self.eoa_token = getattr(tokenizer, "eoa_token", None) + + super().__init__( + feature_extractor=feature_extractor, + image_processor=image_processor, + tokenizer=tokenizer, + video_processor=video_processor, + chat_template=chat_template, + **kwargs, + ) + + @auto_docstring + def __call__( + self, + images: ImageInput | None = None, + text: TextInput | PreTokenizedInput | list[TextInput] | list[PreTokenizedInput] = None, + audio: AudioInput | None = None, + videos: VideoInput | None = None, + **kwargs: Unpack[Gemma4ProcessorKwargs], + ) -> BatchFeature: + if text is None and images is None and audio is None and videos is None: + raise ValueError("Provide at least one of `text`, `images`, `audio`, or `videos`.") + + output_kwargs = self._merge_kwargs( + Gemma4ProcessorKwargs, + tokenizer_init_kwargs=self.tokenizer.init_kwargs, + **kwargs, + ) + + if isinstance(text, str): + text = [text] + elif not isinstance(text, list) and not isinstance(text[0], str): + raise TypeError("Invalid input text. Please provide a string, or a list of strings") + + image_inputs = {} + if images is not None: + images = self.image_processor.fetch_images(images) + batched_images = make_nested_list_of_images(images) + image_inputs = self.image_processor(images, **output_kwargs["images_kwargs"]) + + num_soft_tokens = image_inputs.pop("num_soft_tokens_per_image") + + # Create empty text to be replaced with placeholders + if not text: + text = [" ".join([self.image_token] * len(images)) for images in batched_images] + + if len(batched_images) != len(text): + raise ValueError( + f"Received inconsistently sized batches of images ({len(batched_images)}) and text ({len(text)})." 
+ ) + + replacements = [f"{self.boi_token}{self.image_token * n}{self.eoi_token}" for n in num_soft_tokens] + replacements_iter = iter(replacements) + + # Expand image_token placeholders to per-image soft token sequences. + # re.sub never re-scans replaced text, so it is safe + pattern = re.escape(self.image_token) + text = [re.sub(pattern, lambda _: next(replacements_iter), prompt) for prompt in text] + + # Process video inputs in same way + video_inputs = {} + if videos is not None: + video_inputs = self.video_processor(videos=videos, **output_kwargs["videos_kwargs"]) + num_video_tokens = video_inputs.pop("num_soft_tokens_per_video") + + # If user has not requested video metadata, pop it so it isn't returned + if not kwargs.get("return_metadata"): + video_metadata = video_inputs.pop("video_metadata") + else: + video_metadata = video_inputs["video_metadata"] + + video_replacements = [] + for metadata, n_tokens in zip(video_metadata, num_video_tokens): + if metadata.fps is None: + logger.warning_once( + "Gemma 4 requires frame timestamps to construct prompts, but the `fps` of the input video " + "could not be inferred. Probably `video_metadata` was missing from inputs and you passed " + "pre-sampled frames. Defaulting to `fps=24`. Please provide `video_metadata` for more " + "accurate results." 
+ ) + metadata.fps = 24 if metadata.fps is None else metadata.fps + # mm:ss format for timestamps + timestamp_str = [ + f"{int(seconds // 60):02d}:{int(seconds % 60):02d}" for seconds in metadata.timestamps + ] + video_replacements.append( + " ".join( + [f"{t} {self.boi_token}{self.video_token * n_tokens}{self.eoi_token}" for t in timestamp_str] + ) + ) + + video_replacements = iter(video_replacements) + pattern = re.escape(self.video_token) + text = [re.sub(pattern, lambda _: next(video_replacements), prompt) for prompt in text] + + # Process audio inputs + audio_inputs = {} + if audio is not None: + if self.audio_token is None or self.boa_token is None or self.eoa_token is None: + raise ValueError( + "Audio inputs were provided, but the tokenizer does not have an `audio_token` defined." + ) + + # Normalize audio input to list of waveforms + if isinstance(audio, np.ndarray) and audio.ndim == 1: + audio = [audio] + + # TODO: Add tests for audio-only processor inputs. + if not text: + text = [self.audio_token] * len(audio) + + # Dynamic audio token expansion without padding: + # * Extract audio features with feature extractor; + # * Compute precise per-audio token counts from the waveform duration; + # * Generate full audio token sequence for each computed audio length; + # * Expand text prompts with full audio token sequences. 
+ audio_kwargs = output_kwargs.get("audio_kwargs", {}) + audio_inputs = self.feature_extractor(audio, **audio_kwargs) + sampling_rate = self.feature_extractor.sampling_rate + num_audio_tokens = [self._compute_audio_num_tokens(a, sampling_rate) for a in audio] + replacements = [f"{self.boa_token}{self.audio_token * n}{self.eoa_token}" for n in num_audio_tokens] + replacements_iter = iter(replacements) + audio_pattern = re.escape(self.audio_token) + text = [re.sub(audio_pattern, lambda _: next(replacements_iter), prompt) for prompt in text] + + return_tensors = output_kwargs["text_kwargs"].pop("return_tensors", None) + return_mm_token_type_ids = output_kwargs["text_kwargs"].pop("return_mm_token_type_ids", False) + text_inputs = self.tokenizer(text=text, **output_kwargs["text_kwargs"]) + + # Check special tokens for all active modalities + active_modalities = [] + if images is not None: + active_modalities.append("image") + if videos is not None: + active_modalities.append("video") + if audio is not None: + active_modalities.append("audio") + if active_modalities: + self._check_special_mm_tokens(text, text_inputs, modalities=active_modalities) + + if return_mm_token_type_ids: + text_inputs["mm_token_type_ids"] = self.create_mm_token_type_ids(text_inputs["input_ids"]) + + return BatchFeature( + data={**text_inputs, **image_inputs, **audio_inputs, **video_inputs}, + tensor_type=return_tensors, + ) + + def _compute_audio_num_tokens(self, audio_waveform, sampling_rate: int) -> int: + """Compute the number of audio soft tokens for a single waveform. + + Replicates the exact sequence-length arithmetic of the audio encoder + so that the processor inserts the correct number of placeholder tokens. + The computation mirrors: + + 1. Mel framing via ``_unfold`` in ``Gemma4AudioFeatureExtractor`` + 2. 
Two ``Conv2d`` subsampling layers in ``Gemma4AudioSubSampleConvProjection`` + (each: kernel=3, stride=2, semicausal padding top=1, bottom=1) + + The result is capped at ``self.audio_seq_length`` (the configured maximum). + + Args: + audio_waveform: A 1-D numpy array or list containing the raw audio samples. + sampling_rate: The sampling rate of the audio waveform in Hz. + + Returns: + The number of audio soft tokens to insert as placeholders. + """ + num_samples = len(audio_waveform) + + # Step 1: Mel frames (matches feature_extraction_gemma4.py _unfold) + frame_length = int(round(sampling_rate * 20.0 / 1000.0)) # 320 @ 16kHz + hop_length = int(round(sampling_rate * 10.0 / 1000.0)) # 160 @ 16kHz + frame_size_for_unfold = frame_length + 1 # 321 + + # The feature extractor prepends (frame_length // 2) zero samples as + # semicausal time-padding before the unfold. We must include this to + # match the actual number of mel frames it produces. + pad_left = frame_length // 2 # 160 @ 16kHz + padded_samples = num_samples + pad_left + num_mel_frames = (padded_samples - frame_size_for_unfold) // hop_length + 1 + + if num_mel_frames <= 0: + return 0 + + # Step 2: Two SSCP conv layers (kernel=3, stride=2, semicausal pad top=1, bottom=1) + # Each layer: T_out = (T_in + pad_top + pad_bottom - kernel) // stride + 1 + t = num_mel_frames + for _ in range(2): + t_padded = t + 2 # pad_top=1, pad_bottom=1 + t = (t_padded - 3) // 2 + 1 + + # Cap at the configured maximum + return min(t, self.audio_seq_length) + + def _get_num_multimodal_tokens(self, image_sizes=None, audio_lengths=None, **kwargs): + """ + Computes the number of placeholder tokens needed for multimodal inputs with the given sizes. + + Args: + image_sizes (`list[list[int]]`, *optional*): + The input sizes formatted as (height, width) per each image. + audio_lengths (`list[int]`, *optional*): + The lengths of audio inputs in number of samples. Used to dynamically + compute per-audio token counts. 
+ + Returns: + `MultiModalData`: A `MultiModalData` object holding number of tokens per each of the provided + input modalities, along with other useful data. + """ + + images_kwargs = Gemma4ProcessorKwargs._defaults.get("images_kwargs", {}) + images_kwargs.update(kwargs) + patch_size = images_kwargs.get("patch_size", None) or self.image_processor.patch_size + pooling_kernel_size = ( + images_kwargs.get("pooling_kernel_size", None) or self.image_processor.pooling_kernel_size + ) + max_soft_tokens = images_kwargs.get("max_soft_tokens", None) or self.image_processor.max_soft_tokens + + max_patches = max_soft_tokens * pooling_kernel_size**2 + + vision_data = {} + if image_sizes is not None: + num_image_tokens = [] + for image_size in image_sizes: + target_h, target_w = get_aspect_ratio_preserving_size( + height=image_size[0], + width=image_size[1], + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + ) + patch_height = target_h // patch_size + patch_width = target_w // patch_size + num_image_tokens.append(patch_height * patch_width // pooling_kernel_size**2) + + num_image_patches = [1] * len(image_sizes) + vision_data.update({"num_image_tokens": num_image_tokens, "num_image_patches": num_image_patches}) + + if audio_lengths is not None: + # Dynamically compute per-audio token counts from sample lengths. + # audio_lengths are in number of samples; assume default sampling rate. 
+ sampling_rate = getattr(self.feature_extractor, "sampling_rate", 16_000) + num_audio_tokens = [ + self._compute_audio_num_tokens(np.zeros(length), sampling_rate) for length in audio_lengths + ] + vision_data.update({"num_audio_tokens": num_audio_tokens}) + + return MultiModalData(**vision_data) + + @property + def model_input_names(self): + model_input_names = super().model_input_names + model_input_names = [ + name + for name in model_input_names + if name not in ["num_soft_tokens_per_image", "num_soft_tokens_per_video"] + ] + + # Include audio feature extractor input names if available + if self.feature_extractor is not None: + feature_extractor_input_names = self.feature_extractor.model_input_names + model_input_names.extend([name for name in feature_extractor_input_names if name not in model_input_names]) + + return model_input_names + ["mm_token_type_ids"] + + +__all__ = ["Gemma4Processor"] diff --git a/tooling/huggingface/transformers/video_processing_gemma4.py b/tooling/huggingface/transformers/video_processing_gemma4.py new file mode 100644 index 0000000..d867d31 --- /dev/null +++ b/tooling/huggingface/transformers/video_processing_gemma4.py @@ -0,0 +1,237 @@ +# Copyright 2026 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + +import torch + +from ...image_processing_utils import BatchFeature +from ...processing_utils import Unpack, VideosKwargs +from ...utils import ( + TensorType, + add_start_docstrings, + is_torch_available, + is_torchvision_available, + is_torchvision_v2_available, + is_vision_available, + logging, +) +from ...video_processing_utils import BASE_VIDEO_PROCESSOR_DOCSTRING, BaseVideoProcessor +from ...video_utils import VideoInput +from .image_processing_gemma4 import _SUPPORTED_SOFT_TOKENS, get_aspect_ratio_preserving_size + + +if is_vision_available(): + from ...image_utils import PILImageResampling + +if is_torch_available(): + import torch + +if is_torchvision_v2_available(): + from torchvision.transforms.v2 import functional as F +elif is_torchvision_available(): + from torchvision.transforms import functional as F + + +logger = logging.get_logger(__name__) + + +class Gemma4VideoProcessorKwargs(VideosKwargs, total=False): + """ + patch_size (`int`, *optional*): + Size of each image patch in pixels. + max_soft_tokens (`int`, *optional*): + Maximum number of soft (vision) tokens per video frame. + Must be one of {70, 140, 280, 560, 1120}. + pooling_kernel_size (`int`, *optional*): + Spatial pooling kernel size applied after patchification. + """ + + patch_size: int + max_soft_tokens: int + pooling_kernel_size: int + + +def convert_video_to_patches(video: "torch.Tensor", patch_size: int) -> "torch.Tensor": + """ + Convert 4D tensor video of shape (num_frames, num_channels, height, width) into 3D tensor of patches of shape + (num_frames, num_patches_height * num_patches_width, patch_size * patch_size * num_channels). 
+ """ + num_frames, num_channels, height, width = video.shape + num_patches_height = height // patch_size + num_patches_width = width // patch_size + patched_video = video.reshape( + num_frames, num_channels, num_patches_height, patch_size, num_patches_width, patch_size + ) + patched_video = patched_video.permute(0, 2, 4, 3, 5, 1) + patched_video = patched_video.reshape(num_frames, num_patches_height * num_patches_width, -1) + return patched_video + + +def pad_to_max_patches( + video: "torch.Tensor", positions: "torch.Tensor", target_length: int +) -> tuple["torch.Tensor", "torch.Tensor"]: + """ + Pad the video along to max number of patches + """ + current_length = video.shape[1] + padding_length = target_length - current_length + if padding_length > 0: + padding = [0, 0, 0, padding_length, 0, 0] + pos_padding = (0, 0, 0, padding_length, 0, 0) + video = torch.nn.functional.pad(video, padding, mode="constant", value=0) + positions = torch.nn.functional.pad(positions, pos_padding, mode="constant", value=-1) + return video, positions + + +@add_start_docstrings( + "Constructs a Gemma4 video processor that samples frames from videos for use with the Gemma4 model.", + BASE_VIDEO_PROCESSOR_DOCSTRING, +) +class Gemma4VideoProcessor(BaseVideoProcessor): + resample = PILImageResampling.BICUBIC + image_mean = [0.0, 0.0, 0.0] + image_std = [1.0, 1.0, 1.0] + size = None + default_to_square = True + do_convert_rgb = True + do_resize = True + do_rescale = True + do_normalize = True + num_frames = 32 + do_sample_frames = True + patch_size = 16 + max_soft_tokens = 70 + pooling_kernel_size = 3 + valid_kwargs = Gemma4VideoProcessorKwargs + model_input_names = ["pixel_values_videos", "video_position_ids"] + + def __init__(self, **kwargs: Unpack[Gemma4VideoProcessorKwargs]): + super().__init__(**kwargs) + + if self.max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {self.max_soft_tokens}.") + + def 
_validate_preprocess_kwargs(self, **kwargs): + # Gemma4 uses aspect_ratio_preserving_resize driven by patch_size, + # max_soft_tokens, and pooling_kernel_size — not the standard `size` + # parameter. Temporarily disable do_resize so the base validation + # doesn't require `size` to be set. + kwargs["do_resize"] = False + super()._validate_preprocess_kwargs(**kwargs) + + def aspect_ratio_preserving_resize( + self, + video: torch.Tensor, + patch_size: int, + max_patches: int, + pooling_kernel_size: int, + resample: F.InterpolationMode, + ) -> torch.Tensor: + height, width = video.shape[-2], video.shape[-1] + target_height, target_width = get_aspect_ratio_preserving_size( + height=height, + width=width, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + ) + + if target_height == height and target_width == width: + return video + + return F.resize( + video, + size=[target_height, target_width], + interpolation=resample, + antialias=True, + ) + + def preprocess( + self, + videos: VideoInput, + **kwargs: Unpack[Gemma4VideoProcessorKwargs], + ) -> BatchFeature: + return super().preprocess(videos, **kwargs) + + def _preprocess( + self, + videos: list["torch.Tensor"], + do_resize: bool, + resample: "F.InterpolationMode | int | None", + do_rescale: bool, + rescale_factor: float, + do_normalize: bool, + image_mean: float | list[float] | None, + image_std: float | list[float] | None, + return_tensors: str | TensorType | None, + patch_size: int | None = None, + max_soft_tokens: int | None = None, + pooling_kernel_size: int | None = None, + **kwargs, + ) -> BatchFeature: + if max_soft_tokens not in _SUPPORTED_SOFT_TOKENS: + raise ValueError(f"`max_soft_tokens` must be one of {_SUPPORTED_SOFT_TOKENS}, got {max_soft_tokens}.") + + max_patches = max_soft_tokens * pooling_kernel_size**2 + + pixel_values = [] + position_ids = [] + num_soft_tokens_per_video = [] + num_frames = 1 + + for video in videos: + if do_resize: + video = 
self.aspect_ratio_preserving_resize( + video=video, + patch_size=patch_size, + max_patches=max_patches, + pooling_kernel_size=pooling_kernel_size, + resample=resample, + ) + + video = self.rescale_and_normalize(video, do_rescale, rescale_factor, do_normalize, image_mean, image_std) + + num_frames = video.shape[0] + patch_height = video.shape[-2] // patch_size + patch_width = video.shape[-1] // patch_size + patches = convert_video_to_patches(video, patch_size) + num_soft_tokens_per_video.append(patches.shape[1] // pooling_kernel_size**2) + + device = video.device + patch_grid = torch.meshgrid( + torch.arange(patch_width, device=device), + torch.arange(patch_height, device=device), + indexing="xy", + ) + stacked_grid = torch.stack(patch_grid, dim=-1) + real_positions = stacked_grid.reshape(patches.shape[1], 2) + real_positions = real_positions[None, ...].repeat(num_frames, 1, 1) + + patches, positions = pad_to_max_patches(patches, real_positions, max_patches) + pixel_values.append(patches) + position_ids.append(positions) + + # Stack into batch tensors + pixel_values = torch.stack(pixel_values, dim=0) # (num_videos, num_frames, max_patches, patch_pixels) + position_ids = torch.stack(position_ids, dim=0) # (num_videos, num_frames, max_patches, 2) + + data = { + "pixel_values_videos": pixel_values, + "video_position_ids": position_ids, + "num_soft_tokens_per_video": num_soft_tokens_per_video, + } + return BatchFeature(data=data, tensor_type=return_tensors) + + +__all__ = ["Gemma4VideoProcessor"] diff --git a/tooling/inference-frameworks/README.md b/tooling/inference-frameworks/README.md new file mode 100644 index 0000000..c3f98f0 --- /dev/null +++ b/tooling/inference-frameworks/README.md @@ -0,0 +1,71 @@ +# Gemma 4 — Inference Framework Support Matrix + +> Non-Ollama frameworks. Ollama is covered separately in the parent research corpus. +> Verified against upstream repos, model cards, and docs on **2026-04-18**. 
+ +## Summary table + +| # | Framework | Gemma 4 support | Vision | Audio | Tool calling | Quantization options | Canonical run command | +|---|---|---|---|---|---|---|---| +| 1 | **vLLM** | Native, upstream merged — `gemma4.py` (text) + `gemma4_mm.py` (multimodal). Registered in `registry.py` as `Gemma4ForCausalLM` and `Gemma4ForConditionalGeneration`. | Yes (all sizes) | Yes (E2B/E4B) | Yes — OpenAI-compatible `/v1/chat/completions` with `tools=[...]` | AWQ, GPTQ, FP8, NVFP4 (via `--quantization modelopt`), BF16 | `vllm serve google/gemma-4-31b-it --tensor-parallel-size 2` | +| 2 | **llama.cpp / GGUF** | Native — `Gemma4Model` + `Gemma4VisionAudioModel` registered in `convert_hf_to_gguf.py` (lines 7666 & 7791). Distinct `GEMMA4V` + `GEMMA4A` projector types. Official GGUFs published at `ggml-org/gemma-4-*-GGUF`. | Yes (all, via mmproj) | Yes (E-series, via mmproj) | Yes — `llama-server` exposes OpenAI-compatible tools API | Q4_K_M, Q8_0, BF16 published officially; full quant menu via self-convert | `llama-server -hf ggml-org/gemma-4-E4B-it-GGUF` | +| 3 | **Apple MLX** | Native in `mlx-lm` (text, `gemma4.py` + `gemma4_text.py`) and `mlx-vlm` (multimodal, `mlx_vlm/models/gemma4/` with `audio.py`, `vision.py`, `language.py`, `processing_gemma4.py`) | Yes (mlx-vlm) | Yes (mlx-vlm) | Community; no first-party tools wrapper | 4bit, 8bit, bf16 via MLX quantize | `mlx_vlm.generate --model mlx-community/gemma-4-E4B-it-8bit --image URL --prompt "..."` | +| 4 | **Keras / keras-hub** | Native, full modular impl: `keras_hub/src/models/gemma4/` with `attention`, `audio_encoder`, `vision_encoder`, `decoder_block`, `moe`, `causal_lm`, etc. 8 presets (base + instruct × 2B/4B/26B_a4b/31B). 
| Yes | Yes | No (it's a training library, not an inference server) | Via Keras mixed-precision; no canonical GGUF/AWQ path | `keras_hub.models.Gemma4CausalLM.from_preset("gemma4_instruct_4b")` | +| 5 | **HF Text Generation Inference (TGI)** | **No native support.** Supported-models page stops at Gemma 3 / Gemma 3 Text. No open or merged PRs for "gemma4" (verified). Will fall back to unoptimized `AutoModelForCausalLM` path. | Fallback only, no vision kernels | No | Fallback only | Whatever HF transformers exposes on the fallback path | `text-generation-launcher --model-id google/gemma-4-31b-it` (degraded) | +| 6 | **TensorRT-LLM / NVIDIA NIM** | **Not in the 2026-04 support matrix.** Matrix lists `Gemma3ForCausalLM`/`Gemma3ForConditionalGeneration` but no Gemma 4 entry. GitHub issue #12764 tracks broken runtime on DGX Spark/GB10. NVIDIA's own `nvidia/Gemma-4-31B-IT-NVFP4` card tells users to run it on **vLLM**, not TRT-LLM. | N/A | N/A | N/A | NVFP4 export exists but runtime is broken; use the NVFP4 weights in vLLM instead | Avoid — use `vllm serve nvidia/Gemma-4-31B-IT-NVFP4 --quantization modelopt` | +| 7 | **Gemini API (AI Studio)** | Hosted. Model IDs: `gemma-4-31b-it`, `gemma-4-26b-a4b-it`. E-series NOT exposed (on-device only). | Yes (via `inlineData` parts) | No (Gemini API strips the audio path) | Yes — same `tools=[...]` schema as Gemini models | N/A (Google-managed) | `curl .../v1beta/models/gemma-4-26b-a4b-it:generateContent -d @payload.json` | +| 8 | **Vertex AI Model Garden** | One-click deploy. Model card: `console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma4`. Publisher ID format `google/gemma4@gemma-4-31b-it`. 26B-A4B is offered fully managed & serverless; 31B requires self-provisioned GPU endpoint. 
| Yes (via endpoint backend — vLLM under the hood) | Yes for E-series variants deployed that way | Yes (endpoint inherits from backing runtime) | Depends on backing image (vLLM/SAX) — BF16, FP8, AWQ selectable at deploy time | `model_garden.OpenModel("google/gemma4@gemma-4-31b-it").deploy()` | + +## Production-readiness ranking + +1. **vLLM** — most complete, most optimized, only runtime with first-party NVFP4 support and tested multimodal (image+audio+video). +2. **llama.cpp / GGUF** — best for local CPU + small GPU, only framework with audio mmproj shipping as a downloadable file for E-series, official quants published under `ggml-org/*` (the llama.cpp maintainers' org, not Google). +3. **Gemini API / Vertex AI** — if you don't want to self-host; Vertex gives you the managed-endpoint exit path with vLLM under the hood. +4. **Apple MLX** — production-ready on Apple Silicon only; `mlx-vlm` is community-maintained but actively updated. +5. **Keras-hub** — reference/training, not an inference server. +6. **TGI** — usable as a *fallback* only; no optimized path yet. +7. **TensorRT-LLM** — **avoid for Gemma 4.** NVIDIA themselves point at vLLM. + +## Capabilities beyond Ollama + +- **Native audio input** — Ollama does **not** currently expose the E2B/E4B audio tower. Three frameworks do: + - **llama.cpp** with the `mmproj-...-E4B-it-*.gguf` projector (`VisionProjectorType.GEMMA4A`), + - **vLLM** via `gemma4_mm.py` (`input_features_padded`, `input_features_mask`), + - **MLX** via `mlx_vlm/models/gemma4/audio.py`. + If Seth ever wants the speech-transcription path, llama.cpp with the E4B mmproj is the shortest route from where he already is. +- **Video with interleaved audio** — vLLM's `gemma4_mm.py` decomposes videos into up to 32 timestamped frames; with E-series models it also loads the audio track (`load_audio_from_video=True`). Ollama has no video path at all. +- **NVFP4 on Blackwell** — vLLM only. `nvidia/Gemma-4-31B-IT-NVFP4` reports ~0.3 pp accuracy loss vs BF16 on GPQA Diamond / MMLU Pro. 
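The frame limit quoted for video can be turned into a rough context-budget estimate. The sketch below is illustrative only: the helper names are hypothetical, and the constants are assumed to match the `Gemma4VideoProcessor` defaults collected elsewhere in this corpus (`num_frames = 32`, `max_soft_tokens = 70`, `pooling_kernel_size = 3`, `patch_size = 16`). Timestamp text tokens between frames and the audio track are not counted.

```python
"""Rough per-video soft-token budget for Gemma 4 (illustrative sketch)."""

# Assumed defaults, copied from the Gemma4VideoProcessor class attributes.
NUM_FRAMES = 32
MAX_SOFT_TOKENS = 70
POOLING_KERNEL_SIZE = 3
PATCH_SIZE = 16


def max_patches_per_frame(max_soft_tokens: int = MAX_SOFT_TOKENS,
                          pooling_kernel_size: int = POOLING_KERNEL_SIZE) -> int:
    """Upper bound on patches per frame: max_soft_tokens * pooling_kernel_size**2."""
    return max_soft_tokens * pooling_kernel_size ** 2


def soft_tokens_for_frame(height: int, width: int,
                          patch_size: int = PATCH_SIZE,
                          pooling_kernel_size: int = POOLING_KERNEL_SIZE) -> int:
    """Soft tokens for one already-resized frame: patch count // pooling area."""
    num_patches = (height // patch_size) * (width // patch_size)
    return num_patches // pooling_kernel_size ** 2


def video_soft_token_budget(num_frames: int = NUM_FRAMES,
                            max_soft_tokens: int = MAX_SOFT_TOKENS) -> int:
    """Worst-case vision soft tokens for one video (timestamps and audio excluded)."""
    return num_frames * max_soft_tokens


if __name__ == "__main__":
    print(max_patches_per_frame())          # 70 * 9
    print(soft_tokens_for_frame(336, 480))  # (21 * 30) // 9
    print(video_soft_token_budget())        # 32 * 70
```

Under these assumed defaults a full 32-frame video costs 2,240 vision soft tokens before timestamps and audio, which is worth reserving when picking a context length for a video-capable runtime.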
+ +## Framework to avoid + +**TensorRT-LLM.** It is not in the upstream support matrix as of 2026-04, has a known runtime bug on DGX Spark/GB10 (issue #12764), and NVIDIA's own NVFP4 checkpoint directs users to vLLM. Revisit only after a future TRT-LLM release lists `Gemma4ForCausalLM` in the support matrix. + +## Files in this directory + +``` +inference-frameworks/ +├── README.md — this file +├── run_commands.sh — canonical one-liners per framework +└── snippets/ + ├── llamacpp_convert_gemma4_excerpt.py — Gemma4Model + Gemma4VisionAudioModel from convert_hf_to_gguf.py (lines 7666-7840) + ├── vllm_gemma4_head_80.py — gemma4.py header (imports, config deref) + ├── vllm_gemma4_mm_head_80.py — gemma4_mm.py header (multimodal docstring lists image/audio/video) + ├── vllm_registry_excerpt.txt — registry.py Gemma4 registrations + ├── mlx_gemma4_head_100.py — mlx-lm gemma4.py (text) first 100 lines + ├── mlx_vlm_gemma4_head_60.py — mlx-vlm gemma4/gemma4.py (multimodal) first 60 lines + ├── keras_hub_gemma4.py — canonical keras-hub example + preset list + ├── gemini_api_gemma4.sh — canonical curl example + └── gemini_api_gemma4.py — canonical google-genai Python SDK example +``` + +## Notable upstream references + +- vLLM Gemma 4 model class: `vllm-project/vllm:vllm/model_executor/models/gemma4.py` and `gemma4_mm.py` +- llama.cpp HF → GGUF converter: `ggml-org/llama.cpp:convert_hf_to_gguf.py` lines 7666-7840 +- Official `ggml-org` GGUF repos (verified live; published by the llama.cpp maintainers, not by Google): `ggml-org/gemma-4-{E2B,E4B,31B,26b-a4b}-it-GGUF` — all ship mmproj projector files +- HF blog: huggingface.co/blog/gemma4 — shows `AutoModelForMultimodalLM` is the canonical transformers entry point +- NVIDIA NVFP4 checkpoint: `nvidia/Gemma-4-31B-IT-NVFP4` — runtime=vLLM, not TRT-LLM +- Gemini API doc: ai.google.dev/gemma/docs/core/gemma_on_gemini_api +- Vertex AI Model Garden: console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma4 +- TGI supported-models list (confirming *absence* of Gemma 4): 
huggingface.co/docs/text-generation-inference/supported_models +- TRT-LLM support matrix (confirming *absence*): nvidia.github.io/TensorRT-LLM/reference/support-matrix.html diff --git a/tooling/inference-frameworks/run_commands.sh b/tooling/inference-frameworks/run_commands.sh new file mode 100644 index 0000000..f14bd21 --- /dev/null +++ b/tooling/inference-frameworks/run_commands.sh @@ -0,0 +1,70 @@ +#!/usr/bin/env bash +# Canonical one-liners to serve Gemma 4 across inference frameworks. +# Verified against upstream repos / model cards on 2026-04-18. +# Not meant to be executed as a script — each block is a standalone example. + +### 1. vLLM — full multimodal (text + vision + audio + video) ### +# Text-only 31B dense: +vllm serve google/gemma-4-31b-it --tensor-parallel-size 2 +# Multimodal E4B (vision + audio): +vllm serve google/gemma-4-E4B-it --limit-mm-per-prompt image=4,audio=1 +# NVFP4-quantized 31B on Blackwell/H100 (NVIDIA's official quant): +vllm serve nvidia/Gemma-4-31B-IT-NVFP4 --quantization modelopt --tensor-parallel-size 8 + +### 2. llama.cpp — official ggml-org GGUFs ### +# Text-only via -hf shortcut (auto-download, default = Q4_K_M if multiple present): +llama-server -hf ggml-org/gemma-4-E4B-it-GGUF +# Choose a specific quant: +llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M +# Vision (+ audio for E-series) — add --mmproj pointing to the projector: +llama-server -hf ggml-org/gemma-4-E4B-it-GGUF \ + --mmproj ggml-org/gemma-4-E4B-it-GGUF/mmproj-gemma-4-E4B-it-Q8_0.gguf +# Convert a new HF checkpoint to GGUF yourself: +python convert_hf_to_gguf.py /path/to/google/gemma-4-31b-it --outfile gemma-4-31b.gguf + +### 3. 
Apple MLX — text via mlx-lm, multimodal via mlx-vlm (community) ### +# Text generation (mlx-lm, first-party Apple): +mlx_lm.generate --model mlx-community/gemma-4-E4B-it-4bit --prompt "Hello" +# Vision/audio (mlx-vlm, Prince Canuma / community): +mlx_vlm.generate --model mlx-community/gemma-4-E4B-it-8bit \ + --image https://example.com/cat.jpg --prompt "Describe this image." + +### 4. Keras / keras-hub — reference implementation, training-focused ### +# python: +# import keras_hub +# model = keras_hub.models.Gemma4CausalLM.from_preset("gemma4_instruct_4b") +# model.generate("Hello", max_length=128) +# Presets: gemma4_{2b,4b,26b_a4b,31b} and gemma4_instruct_{...} + +### 5. Text Generation Inference (TGI) — NO native Gemma 4 support as of 2026-04-18 ### +# Upstream supported_models list stops at Gemma 3 / Gemma 3 Text. +# Fallback: TGI will try AutoModelForCausalLM without optimized kernels — +# expect degraded throughput and no guarantee of vision/audio paths. +text-generation-launcher --model-id google/gemma-4-31b-it # unoptimized fallback + +### 6. TensorRT-LLM — NOT supported ### +# Support matrix (2026-04) lists Gemma2 and Gemma3{ForCausalLM,ForConditionalGeneration} +# but NOT Gemma4. NVIDIA's own nvidia/Gemma-4-31B-IT-NVFP4 card points users to vLLM. +# Issue #12764 tracks DGX Spark runtime skew. Avoid for production Gemma 4. + +### 7. Gemini API (Google AI Studio) — hosted Gemma 4 ### +curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-26b-a4b-it:generateContent" \ + -H 'Content-Type: application/json' \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -X POST \ + -d '{"contents":[{"parts":[{"text":"Your prompt here"}]}]}' +# Python SDK (google-genai): +# from google import genai +# client = genai.Client() +# resp = client.models.generate_content(model="gemma-4-26b-a4b-it", contents="Hi") +# print(resp.text) +# Hosted model IDs: gemma-4-31b-it, gemma-4-26b-a4b-it + +### 8. 
Vertex AI Model Garden — one-click deploy ### +# Console: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma4 +# CLI (new model-garden command): +gcloud ai model-garden models list | grep gemma-4 +# Python SDK (vertex-ai-model-garden): +# from google.cloud.aiplatform import model_garden +# model = model_garden.OpenModel("google/gemma4@gemma-4-31b-it") +# endpoint = model.deploy() # spins up Vertex endpoint with backing GPUs diff --git a/tooling/inference-frameworks/snippets/gemini_api_gemma4.py b/tooling/inference-frameworks/snippets/gemini_api_gemma4.py new file mode 100644 index 0000000..5c173bb --- /dev/null +++ b/tooling/inference-frameworks/snippets/gemini_api_gemma4.py @@ -0,0 +1,26 @@ +"""Canonical Gemma 4 call via the google-genai Python SDK (Gemini API). + +Source: https://ai.google.dev/gemma/docs/core/gemma_on_gemini_api + +Install: pip install google-genai +Env: GEMINI_API_KEY=... (from https://aistudio.google.com/apikey) + +Hosted model IDs (2026-04): + - gemma-4-31b-it + - gemma-4-26b-a4b-it + +The E-series (E2B, E4B) is NOT exposed via the Gemini API — those are +on-device-only checkpoints. For them you must self-host (Ollama, +llama.cpp, vLLM, MLX). +""" + +from google import genai + +client = genai.Client() # picks up GEMINI_API_KEY from env + +response = client.models.generate_content( + model="gemma-4-26b-a4b-it", + contents="Write a haiku about inference framework fragmentation.", +) + +print(response.text) diff --git a/tooling/inference-frameworks/snippets/gemini_api_gemma4.sh b/tooling/inference-frameworks/snippets/gemini_api_gemma4.sh new file mode 100644 index 0000000..a9253a8 --- /dev/null +++ b/tooling/inference-frameworks/snippets/gemini_api_gemma4.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +# Canonical Gemma 4 call via the Gemini API (Google AI Studio). 
+# Source: https://ai.google.dev/gemma/docs/core/gemma_on_gemini_api +# Hosted model IDs (2026-04): gemma-4-31b-it, gemma-4-26b-a4b-it +# Note: hosted variants are the big ones only; on-device E2B/E4B are NOT served on the Gemini API. + +export GEMINI_API_KEY="..." # from https://aistudio.google.com/apikey + +curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-26b-a4b-it:generateContent" \ + -H 'Content-Type: application/json' \ + -H "x-goog-api-key: ${GEMINI_API_KEY}" \ + -X POST \ + -d '{ + "contents": [{ + "parts": [{"text": "Write a haiku about inference framework fragmentation."}] + }] + }' diff --git a/tooling/inference-frameworks/snippets/keras_hub_gemma4.py b/tooling/inference-frameworks/snippets/keras_hub_gemma4.py new file mode 100644 index 0000000..1f7e9e3 --- /dev/null +++ b/tooling/inference-frameworks/snippets/keras_hub_gemma4.py @@ -0,0 +1,30 @@ +"""Canonical Keras / keras-hub example for Gemma 4. + +Source: keras-team/keras-hub — keras_hub/src/models/gemma4/ +Requires: pip install keras-hub keras[jax] (or keras[torch] / keras[tensorflow]) + +Presets (verified 2026-04-18 from gemma4_presets.py): + gemma4_2b gemma4_instruct_2b + gemma4_4b gemma4_instruct_4b + gemma4_26b_a4b gemma4_instruct_26b_a4b + gemma4_31b gemma4_instruct_31b + +Keras-hub is the reference implementation maintained by the Keras team +(Google). It ships all components modularly — see the directory listing: +gemma4_attention, gemma4_audio_encoder, gemma4_vision_encoder, +gemma4_moe, gemma4_decoder_block, gemma4_causal_lm, etc. This makes it +the most legible path to *read* the architecture, but it is a +training/fine-tuning tool — not a production inference server. 
+""" + +import keras_hub + +# Text causal LM +model = keras_hub.models.Gemma4CausalLM.from_preset("gemma4_instruct_4b") +print(model.generate("Write a haiku about JAX.", max_length=128)) + +# For multimodal (vision/audio) use the backbone + preprocessors directly: +# backbone = keras_hub.models.Gemma4Backbone.from_preset("gemma4_instruct_4b") +# preproc = keras_hub.models.Gemma4CausalLMPreprocessor.from_preset("gemma4_instruct_4b") +# Vision and audio encoders are in separate modules (gemma4_vision_encoder, +# gemma4_audio_encoder) and are wired by the backbone when the preset includes them. diff --git a/tooling/inference-frameworks/snippets/llamacpp_convert_gemma4_excerpt.py b/tooling/inference-frameworks/snippets/llamacpp_convert_gemma4_excerpt.py new file mode 100644 index 0000000..e34d0e6 --- /dev/null +++ b/tooling/inference-frameworks/snippets/llamacpp_convert_gemma4_excerpt.py @@ -0,0 +1,175 @@ +@ModelBase.register("Gemma4ForConditionalGeneration") +class Gemma4Model(Gemma3Model): + model_arch = gguf.MODEL_ARCH.GEMMA4 + + def norm_shift(self, name: str) -> float: + del name # unused + return 0.0 + + def set_vocab(self): + vocab = gguf.LlamaHfVocab(self.dir_model) + tokens = [] + scores = [] + toktypes = [] + visible_tokens = {"<|channel>", "<channel|>", "<|tool_call>", "<tool_call|>", "<|tool_response>", "<tool_response|>", "<|\"|>"} + + for text, score, toktype in vocab.all_tokens(): + tokens.append(text) + scores.append(score) + text_str = text.decode() + if text_str in visible_tokens: + # always render these tokens, so that the chat parser can read them + toktypes.append(gguf.TokenType.USER_DEFINED) + logger.info(f"Token '{text_str}' is set to USER_DEFINED") + else: + toktypes.append(toktype) + + assert len(tokens) == vocab.vocab_size + + self.gguf_writer.add_tokenizer_model("gemma4") + self.gguf_writer.add_token_list(tokens) + self.gguf_writer.add_token_scores(scores) + self.gguf_writer.add_token_types(toktypes) + + special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=True) + 
special_vocab.add_to_gguf(self.gguf_writer) + self.gguf_writer.add_add_space_prefix(False) + self.gguf_writer.add_add_bos_token(True) + + def set_gguf_parameters(self): + super().set_gguf_parameters() + + num_kv_shared_layers = self.hparams["num_kv_shared_layers"] + self.gguf_writer.add_shared_kv_layers(num_kv_shared_layers) + + # per-layer embedding is optional + n_pl_embd = self.hparams.get("hidden_size_per_layer_input") or 0 + self.gguf_writer.add_embedding_length_per_layer_input(n_pl_embd) + + swa_layers = [t == "sliding_attention" for t in self.hparams["layer_types"]] + self.gguf_writer.add_sliding_window_pattern(swa_layers) + + head_dim_full = self.hparams["global_head_dim"] + head_dim_swa = self.hparams["head_dim"] + # correct the head dim for global/swa layers + self.gguf_writer.add_key_length(head_dim_full) + self.gguf_writer.add_value_length(head_dim_full) + self.gguf_writer.add_key_length_swa(head_dim_swa) + self.gguf_writer.add_value_length_swa(head_dim_swa) + + expert_intermediate_size = self.find_hparam(["expert_intermediate_size", "moe_intermediate_size"]) + if expert_intermediate_size is not None: + self.gguf_writer.add_expert_feed_forward_length(expert_intermediate_size) + + # if use_double_wide_mlp is set, we need to adjust the value for kv shared layers + use_double_wide_mlp = self.hparams.get("use_double_wide_mlp", False) + first_kv_shared_layer_idx = self.block_count - num_kv_shared_layers + if use_double_wide_mlp: + n_ff = self.hparams["intermediate_size"] + n_ff_arr = [n_ff if il < first_kv_shared_layer_idx else n_ff * 2 for il in range(self.block_count)] + self.gguf_writer.add_feed_forward_length(n_ff_arr) + + # handle num_global_key_value_heads + num_key_value_heads_full = self.hparams.get("num_global_key_value_heads") + num_key_value_heads_swa = self.hparams.get("num_key_value_heads") + if num_key_value_heads_full is not None and num_key_value_heads_swa is not None: + value_arr = [num_key_value_heads_swa if is_swa else 
num_key_value_heads_full for is_swa in swa_layers] + self.gguf_writer.add_head_count_kv(value_arr) + + # handle n_rot differently for global vs swa layers + partial_rotary_factor_swa = self.hparams.get("partial_rotary_factor", 1.0) + n_rot_full = int(head_dim_full) # "proportional" is used, see generate_extra_tensors + n_rot_swa = int(head_dim_swa * partial_rotary_factor_swa) + self.gguf_writer.add_rope_dimension_count(n_rot_full) + self.gguf_writer.add_rope_dimension_count_swa(n_rot_swa) + + def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]: + # full layer uses "proportional" rope with partial_rotary_factor=0.25 + # the expected ordering is cc000000ss000000 (c = cos, s = sin, 0 = unrotated), + # but ggml neox only supports ccss000000000000, and we cannot rearrange the head because that will break use_alternative_attention + # solution is to set specific freq_factors for the unrotated dims + + # IMPORTANT: this ROPE_FREQS tensor is ONLY used by the full_attention layers + rope_params_full = self.hparams["rope_parameters"]["full_attention"] + assert rope_params_full["rope_type"] == "proportional" + head_dim_full = (self.hparams["global_head_dim"]) + partial_rotary_factor_full = rope_params_full["partial_rotary_factor"] + n_rot_full = int(head_dim_full * partial_rotary_factor_full / 2) + n_unrot_full = int(head_dim_full / 2) - n_rot_full + values = [1.0] * n_rot_full + [1e30] * n_unrot_full + rope_freqs_full = torch.tensor(values, dtype=torch.float32) + yield (self.format_tensor_name(gguf.MODEL_TENSOR.ROPE_FREQS), rope_freqs_full) + + def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]: + if name.endswith("per_dim_scale") or name.endswith("layer_scalar"): + name = name + ".weight" + + if "language_model." 
not in name and "rope_freqs" not in name: + return # skip non-language model tensors + + name = name.replace("language_model.", "") + if name.endswith("router.scale"): + name = self.format_tensor_name(gguf.MODEL_TENSOR.FFN_GATE_INP, bid, ".scale") + yield (name, data_torch) + return + if ".per_expert_scale" in name: + # convert per-expert scale to FFN down scale + name = self.format_tensor_name(gguf.MODEL_TENSOR.FFN_DOWN_EXP, bid, ".scale") + yield (name, data_torch) + return + if ".experts." in name and not name.endswith(".weight"): + name += ".weight" + + yield from super().modify_tensors(data_torch, name, bid) + + +@ModelBase.register("Gemma4ForConditionalGeneration") +class Gemma4VisionAudioModel(MmprojModel): + has_audio_encoder = True + has_vision_encoder = True + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + assert self.hparams_vision is not None + self.hparams_vision["image_size"] = 224 # unused, but set to avoid error + + # remap audio hparams + if self.hparams_audio: + self.hparams_audio["feat_in"] = self.hparams_audio.get("input_feat_size", 128) + self.hparams_audio["intermediate_size"] = self.hparams_audio["hidden_size"] * 4 + else: + self.has_audio_encoder = False + + def set_gguf_parameters(self): + super().set_gguf_parameters() + + # vision params + self.gguf_writer.add_clip_vision_projector_type(gguf.VisionProjectorType.GEMMA4V) + self.gguf_writer.add_vision_attention_layernorm_eps(self.hparams.get("layer_norm_eps", 1e-6)) + + # audio params + if self.hparams_audio: + self.gguf_writer.add_clip_audio_projector_type(gguf.VisionProjectorType.GEMMA4A) + self.gguf_writer.add_audio_num_mel_bins(self.hparams_audio["feat_in"]) + self.gguf_writer.add_audio_attention_layernorm_eps(1e-5) + + def is_audio_tensor(self, name: str) -> bool: + return "audio_tower" in name or "embed_audio" in name + + def tensor_force_quant(self, name, new_name, bid, n_dims): + if self.is_audio_tensor(name): + if ".conv" in name or "_conv" in name and 
".weight" in name: + return gguf.GGMLQuantizationType.F32 + if "position_embedding_table" in name: + return gguf.GGMLQuantizationType.F32 + return super().tensor_force_quant(name, new_name, bid, n_dims) + + def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]: + del bid # unused + + if name.startswith("model.language_model."): + return # skip + + if len(data_torch.shape) == 0: + # convert scalar tensors (input/output_mix/max) to 1D tensors + data_torch = data_torch.unsqueeze(0) diff --git a/tooling/inference-frameworks/snippets/mlx_gemma4_head_100.py b/tooling/inference-frameworks/snippets/mlx_gemma4_head_100.py new file mode 100644 index 0000000..2489537 --- /dev/null +++ b/tooling/inference-frameworks/snippets/mlx_gemma4_head_100.py @@ -0,0 +1,92 @@ +# Copyright © 2025 Apple Inc. + +from dataclasses import dataclass +from typing import Optional + +import mlx.core as mx +import mlx.nn as nn +from mlx.utils import tree_flatten, tree_unflatten + +from . 
import gemma4_text +from .base import BaseModelArgs + + +@dataclass +class ModelArgs(BaseModelArgs): + model_type: str = "gemma4" + text_config: dict = None + vocab_size: int = 262144 + + def __post_init__(self): + if self.text_config is None: + self.text_config = {} + self.text_config["vocab_size"] = self.vocab_size + self.text_config["num_attention_heads"] = self.text_config.get( + "num_attention_heads", 8 + ) + self.text_config["num_key_value_heads"] = self.text_config.get( + "num_key_value_heads", 1 + ) + + +class Model(nn.Module): + def __init__(self, args: ModelArgs): + super().__init__() + self.args = args + self.model_type = args.model_type + self.language_model = gemma4_text.Model( + gemma4_text.ModelArgs.from_dict(args.text_config) + ) + + def __call__( + self, + inputs: mx.array, + cache=None, + input_embeddings: Optional[mx.array] = None, + per_layer_inputs: Optional[mx.array] = None, + ): + return self.language_model( + inputs, + cache=cache, + input_embeddings=input_embeddings, + per_layer_inputs=per_layer_inputs, + ) + + def sanitize(self, weights): + new_weights = {} + for k, v in weights.items(): + starts_w_model = k.startswith("model.") + + k = k.removeprefix("model.") + if k.startswith( + ( + "vision_tower", + "multi_modal_projector", + "audio_tower", + "embed_audio", + "embed_vision", + ) + ): + continue + + if not starts_w_model: + new_weights[k] = v + continue + + if k.startswith("language_model"): + k = k.replace("language_model.", "language_model.model.") + + new_weights[k] = v + + return self.language_model.sanitize(new_weights) + + @property + def layers(self): + return self.language_model.layers + + @property + def quant_predicate(self): + return self.language_model.quant_predicate + + def make_cache(self): + return self.language_model.make_cache() diff --git a/tooling/inference-frameworks/snippets/mlx_vlm_gemma4_head_60.py b/tooling/inference-frameworks/snippets/mlx_vlm_gemma4_head_60.py new file mode 100644 index 0000000..0ef80a0 --- 
/dev/null +++ b/tooling/inference-frameworks/snippets/mlx_vlm_gemma4_head_60.py @@ -0,0 +1,60 @@ +from typing import Optional + +import mlx.core as mx +import mlx.nn as nn + +from ..base import InputEmbeddingsFeatures +from .audio import AudioEncoder +from .config import ModelConfig +from .language import LanguageModel, RMSNormNoScale +from .vision import VisionModel + + +def masked_scatter(input_tensor, mask, source): + mask_flat = mask.flatten().astype(mx.int32) + indices = mx.cumsum(mask_flat) - 1 + aligned = source.flatten()[indices % source.size] + return mx.where(mask_flat, aligned, input_tensor.flatten()).reshape( + input_tensor.shape + ) + + +class MultimodalEmbedder(nn.Module): + """Projects soft tokens from vision/audio into language model space.""" + + def __init__(self, embedding_dim: int, text_hidden_size: int, eps: float = 1e-6): + super().__init__() + self.embedding_projection = nn.Linear( + embedding_dim, text_hidden_size, bias=False + ) + self.embedding_pre_projection_norm = RMSNormNoScale(embedding_dim, eps=eps) + + def __call__(self, inputs_embeds: mx.array) -> mx.array: + normed = self.embedding_pre_projection_norm(inputs_embeds) + return self.embedding_projection(normed) + + +class Model(nn.Module): + def __init__(self, config: ModelConfig): + super().__init__() + self.model_type = config.model_type + self.config = config + + # Text + self.language_model = LanguageModel(config.text_config) + self.vocab_size = config.text_config.vocab_size + + # Vision + self.vision_tower = VisionModel(config.vision_config) + self.embed_vision = MultimodalEmbedder( + embedding_dim=config.vision_config.hidden_size, + text_hidden_size=config.text_config.hidden_size, + eps=config.vision_config.rms_norm_eps, + ) + + # Audio + if config.audio_config is not None: + self.audio_tower = AudioEncoder(config.audio_config) + audio_output_dim = ( + config.audio_config.output_proj_dims or config.audio_config.hidden_size + ) diff --git 
a/tooling/inference-frameworks/snippets/vllm_gemma4_head_80.py b/tooling/inference-frameworks/snippets/vllm_gemma4_head_80.py new file mode 100644 index 0000000..f8ffa64 --- /dev/null +++ b/tooling/inference-frameworks/snippets/vllm_gemma4_head_80.py @@ -0,0 +1,90 @@ +# SPDX-License-Identifier: Apache-2.0 +# SPDX-FileCopyrightText: Copyright contributors to the vLLM project +# Copyright 2025 The vLLM team. +# Copyright 2025 Google Inc. HuggingFace Inc. team. All rights reserved. +# +# + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+"""Gemma 4 model implementation for vLLM.""" + +from collections.abc import Iterable +from dataclasses import replace +from itertools import islice + +import regex as re +import torch +from torch import nn + +from vllm.compilation.decorators import support_torch_compile +from vllm.config import CacheConfig, VllmConfig +from vllm.distributed import ( + get_pp_group, + get_tensor_model_parallel_rank, + get_tensor_model_parallel_world_size, +) +from vllm.forward_context import get_forward_context +from vllm.logger import init_logger +from vllm.model_executor.layers.activation import GeluAndMul +from vllm.model_executor.layers.attention import Attention +from vllm.model_executor.layers.fused_moe import FusedMoE, GateLinear +from vllm.model_executor.layers.layernorm import RMSNorm +from vllm.model_executor.layers.linear import ( + ColumnParallelLinear, + MergedColumnParallelLinear, + QKVParallelLinear, + ReplicatedLinear, + RowParallelLinear, +) +from vllm.model_executor.layers.logits_processor import LogitsProcessor +from vllm.model_executor.layers.quantization import QuantizationConfig +from vllm.model_executor.layers.rotary_embedding import get_rope +from vllm.model_executor.layers.vocab_parallel_embedding import ( + ParallelLMHead, + VocabParallelEmbedding, +) +from vllm.model_executor.model_loader.weight_utils import ( + default_weight_loader, + maybe_remap_kv_scale_name, +) +from vllm.sequence import IntermediateTensors +from vllm.v1.attention.backends.utils import KVSharingFastPrefillMetadata + +from .interfaces import ( + EagleModelMixin, + MixtureOfExperts, + SupportsEagle3, + SupportsLoRA, + SupportsPP, +) +from .utils import ( + AutoWeightsLoader, + WeightsMapper, + extract_layer_index, + is_pp_missing_parameter, + make_layers, + maybe_prefix, +) + +logger = init_logger(__name__) + + +def _get_text_config(config): + """Dereference text_config if config is a nested Gemma4Config. 
+ + Gemma4 checkpoints use architectures=["Gemma4ForConditionalGeneration"] + which yields a Gemma4Config with nested text_config. This function + transparently returns the text config regardless of nesting. + """ + if hasattr(config, "text_config"): + return config.text_config diff --git a/tooling/inference-frameworks/snippets/vllm_gemma4_mm_head_80.py b/tooling/inference-frameworks/snippets/vllm_gemma4_mm_head_80.py new file mode 100644 index 0000000..6ee45dd --- /dev/null +++ b/tooling/inference-frameworks/snippets/vllm_gemma4_mm_head_80.py @@ -0,0 +1,80 @@ +# SPDX-License-Identifier: Apache-2.0 +# SPDX-FileCopyrightText: Copyright contributors to the vLLM project +"""Gemma 4 multimodal model (image + audio + video support). + +Adds vision tower, audio tower, and multimodal embedders on top of the +text-only Gemma4ForCausalLM. The vision/audio encoders are loaded via +AutoModel.from_config and run in eager mode while the language model uses +the vLLM-optimized path. + +Video support: Gemma4 does **not** have a native video tower. Videos are +decomposed into timestamped image frames (up to 32 frames at 70 soft tokens +each) and fed through the same vision tower as regular images. The +processor inserts ``mm:ss`` timestamps between frames so the model can +reason about temporal order. 
+""" + +import math +from collections.abc import Iterable, Mapping, Sequence +from typing import Annotated, Any, Literal + +import numpy as np +import torch +from PIL import Image as PILImage +from torch import nn +from transformers import AutoModel, BatchFeature +from transformers.models.gemma4 import ( + Gemma4Config, + Gemma4Processor, + Gemma4VisionConfig, +) +from transformers.models.gemma4.configuration_gemma4 import ( + Gemma4AudioConfig, + Gemma4TextConfig, +) + +from vllm.config import VllmConfig +from vllm.config.multimodal import BaseDummyOptions, VideoDummyOptions +from vllm.inputs import MultiModalDataDict +from vllm.logger import init_logger +from vllm.model_executor.layers.layernorm import RMSNorm +from vllm.model_executor.layers.linear import ReplicatedLinear +from vllm.model_executor.models.gemma4 import Gemma4ForCausalLM +from vllm.model_executor.models.module_mapping import MultiModelKeys +from vllm.multimodal import MULTIMODAL_REGISTRY +from vllm.multimodal.inputs import ( + MultiModalFieldConfig, + MultiModalKwargsItems, + VideoItem, +) +from vllm.multimodal.parse import ( + AudioProcessorItems, + ImageProcessorItems, + MultiModalDataItems, + MultiModalDataParser, +) +from vllm.multimodal.processing import BaseDummyInputsBuilder +from vllm.multimodal.processing.processor import ( + BaseMultiModalProcessor, + BaseProcessingInfo, + PromptReplacement, + PromptUpdate, + PromptUpdateDetails, +) +from vllm.sequence import IntermediateTensors +from vllm.utils.tensor_schema import TensorSchema, TensorShape + +from .interfaces import ( + MultiModalEmbeddings, + SupportsEagle3, + SupportsLoRA, + SupportsMultiModal, + SupportsPP, +) +from .utils import ( + AutoWeightsLoader, + WeightsMapper, + init_vllm_registered_model, + maybe_prefix, +) + diff --git a/tooling/inference-frameworks/snippets/vllm_registry_excerpt.txt b/tooling/inference-frameworks/snippets/vllm_registry_excerpt.txt new file mode 100644 index 0000000..54fb140 --- /dev/null +++ 
b/tooling/inference-frameworks/snippets/vllm_registry_excerpt.txt @@ -0,0 +1,16 @@ +# Source: vllm-project/vllm main branch, vllm/model_executor/models/registry.py +# Verified 2026-04-18 via GitHub API. + +# Line 99 (text-only Gemma 4 CausalLM): +"Gemma4ForCausalLM": ("gemma4", "Gemma4ForCausalLM"), + +# Line 230 (multimodal Gemma 4: vision + audio + video): +"Gemma4ForConditionalGeneration": ("gemma4_mm", "Gemma4ForConditionalGeneration"), + +# The second (_mm) registration maps Gemma4ForConditionalGeneration -> gemma4_mm.Gemma4ForConditionalGeneration, +# which wires in: +# - vision_tower (pixel_values, pixel_position_ids) +# - audio_tower (input_features_padded, input_features_mask) [E2B/E4B only] +# - video path (pixel_values_videos, decomposed to frames, up to 32 frames @ 70 soft tokens) +# +# vLLM dispatches based on whether the HF config has audio_config populated.