Add knowledge corpus: 14 command references, server context, and TF-IDF search index (Phase 1.3)

- knowledge/mc-commands/commands.json: 14 MC commands with JE syntax, args, examples, common errors, 1.21 version notes
- knowledge/server-context/servers.json: all 4 servers (mc1, shrink, paper-ai, paper-dev) with full config
- knowledge/build_index.py: TF-IDF indexer + search function (19 docs, 725 terms)
- Command syntax validated live on dev server via RCON (12/13 passed; the 1 failure was a false negative, already in target state)
- PLAN.md: mark Phase 1.3 complete
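
For readers unfamiliar with the indexing approach named above, the following is a minimal sketch of TF-IDF retrieval with title boosting. It is not the contents of `knowledge/build_index.py`: the function names, the `TITLE_BOOST` weight, the smoothed IDF formula, the document shape (`title`/`body`), and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

TITLE_BOOST = 3  # assumed weight: each title token counts this many times


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and underscores."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_index(docs):
    """Build L2-normalized TF-IDF vectors for docs shaped {"title", "body"}."""
    term_counts = []
    for doc in docs:
        counts = Counter(tokenize(doc["body"]))
        for tok in tokenize(doc["title"]):
            counts[tok] += TITLE_BOOST  # title boosting
        term_counts.append(counts)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    # Smoothed IDF (one common variant; the real indexer may differ).
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    vectors = []
    for counts in term_counts:
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors, idf


def search(query, docs, vectors, idf, top_k=3):
    """Return up to top_k (score, title) pairs ranked by cosine similarity."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values())) or 1.0

    scored = []
    for doc, vec in zip(docs, vectors):
        score = sum(w * vec.get(t, 0.0) for t, w in q_vec.items()) / q_norm
        if score > 0:
            scored.append((score, doc["title"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample documents, not taken from the real corpus.
SAMPLE_DOCS = [
    {"title": "/give", "body": "Gives an item to a player. Syntax: give <targets> <item> [<count>]"},
    {"title": "/weather", "body": "Sets the weather: clear, rain, or thunder."},
    {"title": "/worldborder", "body": "Manages the world border size and center."},
]

vectors, idf = build_index(SAMPLE_DOCS)
for score, title in search("how do I give a player an item", SAMPLE_DOCS, vectors, idf):
    print(f"{score:.3f}  {title}")
```

Adding the title tokens to the term counts with an extra weight is one simple way to make a query such as "give" rank the `/give` document above documents that only mention the word in passing.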
2026-03-18 02:01:12 -04:00
parent 827850b8d7
commit 77efac0283
5 changed files with 2825 additions and 10 deletions
+9 -10
@@ -119,16 +119,15 @@ These projects informed the plan but solve different problems:
 - [x] Seed 31 examples from repair code, prayer logs, sudo logs, and session history (`data/processed/seed_dataset.jsonl`)
 #### 1.3 Knowledge Corpus
-- [ ] Scrape Minecraft Wiki command reference pages for 1.21.x syntax
-  - Target: `/give`, `/effect`, `/tp`, `/execute`, `/worldborder`, `/weather`, `/gamemode`, `/enchant`, `/fill`, `/setblock`, `/clone`, `/scoreboard`, `/data`, `/function`
-  - Store as structured JSON (command, syntax, parameters, examples, version notes)
-- [ ] Extract and chunk local server context:
-  - `server.properties` from mc1 and shrink-world
-  - Datapack definitions (shrinkborder, morespawns)
-  - Player list and UUID mappings
-  - RCON connection parameters (sanitized)
-- [ ] Index knowledge corpus for RAG retrieval (simple TF-IDF or embedding-based)
-- [ ] Validate: query the index with sample questions, spot-check relevance
+- [x] Scrape Minecraft Wiki command reference pages for 1.21.x syntax (14 commands in `knowledge/mc-commands/commands.json`)
+  - Includes JE syntax, arguments, examples, version notes, and common errors per command
+  - Commands validated live on dev server (Paper 1.21.11) -- 12/13 passed, 1 false negative (already in target state)
+- [x] Extract and chunk local server context (`knowledge/server-context/servers.json`)
+  - All 4 servers (mc1, shrink-world, paper-ai, paper-dev) with ports, RCON, settings, plugins
+  - Player list with UUIDs, infrastructure details, version-specific notes
+- [x] Index knowledge corpus for RAG retrieval (`knowledge/build_index.py` -- TF-IDF with title boosting)
+  - 19 documents indexed, 725 unique terms
+- [x] Validated with 6 test queries -- all return relevant top results
 #### 1.4 Baseline Assistant (No Fine-Tuning)
 - [ ] Build prompt-only assistant using `qwen3-coder` (via Ollama at 192.168.0.179)