# Training Data Analysis: merged_training_v06.jsonl

**Date:** 2026-03-26
**Source:** `/home/claude/bin/Mincecraft-AI-model/data/processed/merged_training_v06.jsonl`
**Examples:** 7,256 total
**Training script:** `training/scripts/train_lora.py` (Unsloth + TRL SFTTrainer)

---

## Dataset Structure

| Metric | Value |
|--------|-------|
| Total examples | 7,256 |
| Parse errors | 0 |
| Message format | "conversations" (7,254) + "text" (2) |
| Multi-turn (>3 msgs) | 4,006 (55.2%) |
| Has system prompt | 100% |
| Has /no_think prefix | 3,459 (47.7%) |
| Avg system prompt length | 21,358 chars (~5,300 tokens) |
| System prompts >20K chars | 6,526 (89.9%) |
| JSON responses | 3,248 (44.8%) |
| tool_call responses | 4,006 (55.2%) |
| Contains `` tags | 52 |
| Tool role messages | 12,155 |
| Contaminated JSON (tool_call inside) | 84 |
| Empty commands arrays | 387 |

## System Prompt Variants

| Variant | Count |
|---------|-------|
| script_writer (no /no_think) | 3,559 |
| /no_think + paper_server | 2,942 |
| /no_think + other | 338 |
| paper_server (no /no_think) | 236 |
| /no_think + server_admin | 179 |
| god_persona | 6 |

## JSON Response Schema Variants

| Key Combination | Count |
|-----------------|-------|
| {commands, risk_level} | 1,112 |
| {commands, reasoning, risk_level} | 911 |
| {commands, message, risk_level} | 495 |
| {commands, message, reasoning} | 338 |
| {commands, message, reasoning, risk_level} | 243 |
| {commands, reasoning} | 149 |

## Tool Usage in Training Data

| Tool | Calls |
|------|-------|
| rcon.execute | 7,220 |
| script.validate | 1,496 |
| script.write | 1,493 |
| script.execute | 1,485 |
| journal.read | 110 |
| world.player_info | 70 |
| journal.write | 33 |
| world.scan_area | 29 |
| minecraft.wiki_lookup | 28 |
| world.nearby_entities | 25 |
| Others (14 tools) | <25 each |

---

## Six Compounding Failure Modes

### 1. System Prompt Truncation (The Killer)

`max_seq_len = 2048 tokens`. Average system prompt = ~5,300 tokens.
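
As a rough check on those two figures, the overflow can be worked out directly. This is a minimal sketch that only reuses numbers reported in this analysis; the variable names are illustrative and are not taken from `train_lora.py`.

```python
# Figures reported in this analysis (not recomputed from the dataset).
MAX_SEQ_LEN = 2048          # tokens, from the training config
AVG_SYSTEM_CHARS = 21_358   # average system prompt length in characters
AVG_SYSTEM_TOKENS = 5_300   # approximate token count reported for that length

# Implied characters per token for these prompts.
chars_per_token = AVG_SYSTEM_CHARS / AVG_SYSTEM_TOKENS

# How many training windows the average system prompt would span.
overflow_ratio = AVG_SYSTEM_TOKENS / MAX_SEQ_LEN

# Tokens of the average system prompt that cannot fit in one window.
tokens_cut = AVG_SYSTEM_TOKENS - MAX_SEQ_LEN

print(f"{chars_per_token:.2f} chars/token")
print(f"{overflow_ratio:.2f}x the window")
print(f"{tokens_cut} tokens beyond the window")
```
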
**90% of examples have system prompts roughly 2.5x the length of the entire sequence.**

With packing enabled, the trainer stuffs multiple examples into each 2048-token window. The system prompt alone doesn't fit, so the model trained on truncated system prompts with **no user input and no assistant response in most examples**. It learned system prompt fragments, not task behavior.

### 2. Mixed Response Paradigm

- 44.8% of examples teach: return clean JSON `{"commands": [...]}`.
- 55.2% of examples teach: emit `{"name": "rcon.execute",...}`.

No clear signal distinguishes when to use each format. The system prompts differ, but they get truncated (failure mode 1), so the model never sees the disambiguation.

### 3. Inconsistent JSON Schema

Six different key combinations appear across the JSON responses. No single schema dominates enough to become the learned default.

### 4. Contaminated Examples

- 84 examples have `tool_call` strings inside JSON responses (pipeline leakage)
- 387 examples have empty `commands: []` (teaches that returning nothing is acceptable)
- 2 raw `text` format entries with literal `<|im_start|>` tokens

### 5. Tool Role Incompatibility

4,006 examples use custom tool names (`rcon.execute`, `script.validate`) that aren't in Qwen's pretrained vocabulary. The model needs to learn these from scratch, but with truncated sequences it never sees enough context.

### 6. `/no_think` Misuse

`/no_think` is a Qwen inference-time directive, not a trainable behavior. Including it in 48% of the training data wastes tokens and doesn't transfer to the fine-tuned model's behavior (confirmed by probes).

---

## Training Pipeline Details

### Data Flow

```
Raw sources (seed, tool, audit, self_play, distilled, etc.)
→ merge_datasets.py (normalize, dedup, 95/5 split)
→ merged_training_v06.jsonl
→ train_lora.py
    ├─ load_seed_dataset() → conversations format
    ├─ load_tool_dataset() → messages/text format
    └─ formatting_func() → tokenizer.apply_chat_template()
→ SFTTrainer (Unsloth, QLoRA 4-bit)
→ LoRA merge → GGUF → quantize → Ollama
```

### Training Config

- **Base models:** Qwen3-8B, Qwen3.5-9B, Qwen3.5-14B
- **Method:** QLoRA (4-bit base + FP16 LoRA)
- **LoRA:** rank 16-64, alpha 32-128, targets q/k/v/o/gate/up/down_proj
- **Batch:** 2 x 4 grad accum = effective 8
- **LR:** 2e-4, cosine schedule, 0.1 warmup
- **Epochs:** 1 (default)
- **Packing:** enabled
- **max_seq_len:** 2048

### What the Model Actually Trained On

Due to truncation + packing, most training examples were reduced to:

```
[truncated system prompt fragment][truncated system prompt fragment][truncated...]
```

The model spent compute learning to predict system prompt text, not learning the user→assistant mapping.
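
One way to catch this before training is to estimate, for each example, whether anything beyond the system prompt would survive a cut at `max_seq_len`. The sketch below is an illustration under stated assumptions, not the project's code. It assumes each record has a `conversations` list of `{"role", "content"}` messages (the per-message field names are a guess), that truncation drops text from the right, and that 4 characters per token is a good enough approximation. `estimate_tokens`, `audit_example`, and `audit_jsonl_lines` are hypothetical helpers.

```python
import json
from typing import Iterable

MAX_SEQ_LEN = 2048      # from the training config above
CHARS_PER_TOKEN = 4.0   # rough heuristic; a real tokenizer would be more accurate


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (heuristic, not a tokenizer)."""
    return int(len(text) / CHARS_PER_TOKEN)


def audit_example(example: dict, max_seq_len: int = MAX_SEQ_LEN) -> dict:
    """Report how much of one example fits inside a single training window.

    Assumes example["conversations"] is a list of {"role", "content"} dicts;
    that message schema is an assumption made for illustration.
    """
    messages = example.get("conversations", [])
    system_tokens = sum(
        estimate_tokens(m.get("content", ""))
        for m in messages
        if m.get("role") == "system"
    )
    other_tokens = sum(
        estimate_tokens(m.get("content", ""))
        for m in messages
        if m.get("role") != "system"
    )
    return {
        "system_tokens": system_tokens,
        "other_tokens": other_tokens,
        # True when the system prompt alone fills the window, so no user or
        # assistant text would remain after right-side truncation.
        "system_fills_window": system_tokens >= max_seq_len,
        "fits_entirely": system_tokens + other_tokens <= max_seq_len,
    }


def audit_jsonl_lines(lines: Iterable[str], max_seq_len: int = MAX_SEQ_LEN) -> dict:
    """Aggregate audit_example over JSONL lines (one JSON object per line)."""
    total = fills = fits = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        report = audit_example(json.loads(line), max_seq_len)
        total += 1
        fills += int(report["system_fills_window"])
        fits += int(report["fits_entirely"])
    return {"total": total, "system_fills_window": fills, "fits_entirely": fits}


if __name__ == "__main__":
    # Made-up sample records, only to show the call shape.
    sample = [
        json.dumps({"conversations": [
            {"role": "system", "content": "x" * 21_358},
            {"role": "user", "content": "set time to day"},
        ]}),
        json.dumps({"conversations": [
            {"role": "system", "content": "You are a server admin."},
            {"role": "user", "content": "set time to day"},
        ]}),
    ]
    print(audit_jsonl_lines(sample))
```

To run it against a real file, pass an open file handle as `lines`. Replacing `estimate_tokens` with a count from the actual tokenizer after `apply_chat_template()` would give exact numbers, at the cost of loading the model's tokenizer.
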