# Codex Tasks — Structural Analysis & Tooling

**Created:** 2026-04-03
**Purpose:** Three Codex agents build structural tools and analysis from the VIBECODE-THEORY paper series (papers 001-008 + allegorical directory). These complement the Gemini research swarm by providing machine-readable structure, cross-reference maps, and integration tooling.

**Protocol:** Each agent claims ONE task by writing its identifier into the `Claimed by` field, then works autonomously. When done, the agent writes its output to the specified location and sets the task's `Status` to `DONE`.

---

## Task C1: Cross-Reference Graph
**Status:** DONE
**Claimed by:** Codex-GPT5
**Output:** `tools/cross-references/`

Parse all 8 papers and 8 allegory files. Extract every cross-reference between documents — explicit ("Paper 006's theological thread," "as established in Paper 007") and implicit (shared concepts, terms introduced in one paper and used in another).

**Deliverables:**

1. **`graph.json`** — Structured JSON graph:
```json
{
  "nodes": [
    {"id": "007", "title": "The Ratchet", "concepts_introduced": ["biological ratchet", "infrastructure threshold", ...]}
  ],
  "edges": [
    {"source": "008", "target": "007", "type": "extends", "context": "extends ratchet mechanism with a direction — toward unification"},
    {"source": "008", "target": "003", "type": "addresses", "context": "responds to falsifiability concern"}
  ]
}
```

2. **`graph.mermaid`** — Mermaid diagram showing paper relationships. Use directional edges labeled with relationship type (extends, refutes, addresses, introduces concept used by).

3. **`dangling_threads.md`** — List of concepts, questions, or claims that are raised in one paper but never resolved or revisited. These are candidates for Paper 009+. For each: which paper raised it, what the open question is, and which (if any) later papers partially address it.

4. **`concept_flow.md`** — For each major concept (dependency chain, ratchet, infrastructure threshold, cognitive preference shift, automation spiral, knowledge unification, etc.), trace its lifecycle: where introduced, where challenged, where revised, where it currently stands.
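
One way to keep `graph.mermaid` in sync with `graph.json` is to generate the diagram from the same data. The sketch below is illustrative only: the function name `graph_to_mermaid`, the `P` prefix on node ids, and the sample data are assumptions, not part of the delivered tooling. Titles not given in this document are omitted rather than guessed.

```python
def graph_to_mermaid(graph):
    """Render a paper-level graph dict as Mermaid flowchart source."""
    lines = ["graph TD"]
    for node in graph.get("nodes", []):
        node_id = node["id"]
        title = node.get("title")
        # Fall back to the bare id when no title is recorded.
        label = f"{node_id}: {title}" if title else node_id
        label = label.replace('"', "'")  # keep the quoted label well-formed
        # The "P" prefix is an illustrative choice so ids read as P007, P008, ...
        lines.append(f'    P{node_id}["{label}"]')
    for edge in graph.get("edges", []):
        # Directional edge labeled with the relationship type.
        lines.append(
            f'    P{edge["source"]} -->|{edge["type"]}| P{edge["target"]}'
        )
    return "\n".join(lines)


# Hypothetical sample mirroring the graph.json shape shown above.
SAMPLE_GRAPH = {
    "nodes": [
        {"id": "007", "title": "The Ratchet"},
        {"id": "008"},  # title omitted here rather than guessed
        {"id": "003"},
    ],
    "edges": [
        {"source": "008", "target": "007", "type": "extends"},
        {"source": "008", "target": "003", "type": "addresses"},
    ],
}

print(graph_to_mermaid(SAMPLE_GRAPH))
```

Generating the diagram from the JSON means the two deliverables cannot drift apart when new papers are added.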

**How to extract:** Read each paper. Look for:
- Explicit references: "Paper N," "as established in," "the series has," "prior papers"
- Section headers like "Relationship to Prior Papers" (most papers have one)
- Shared terminology across papers
- Open questions sections (most papers end with these)
- The summary of key ideas by session in `HANDOFF.md`
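
The explicit-reference step lends itself to a simple regex pass. This is a minimal sketch under stated assumptions: the helper name `explicit_references`, the 40-character context window, and the placeholder edge type `unclassified` are all hypothetical. Deciding whether an edge is "extends", "refutes", or "addresses" still requires reading the surrounding text, and ranges such as "papers 001-008" would only yield their first number with this pattern.

```python
import re

# Matches "Paper 007", "paper 007", "Papers 007"; three-digit ids are assumed.
PAPER_REF = re.compile(r"\b[Pp]apers?\s+(\d{3})\b")


def explicit_references(source_id, text):
    """Return one candidate edge per distinct paper mentioned in `text`."""
    edges = []
    seen = set()
    for match in PAPER_REF.finditer(text):
        target = match.group(1)
        if target == source_id or target in seen:
            continue  # skip self-references and repeated mentions
        seen.add(target)
        # Keep a little surrounding text so a reviewer can classify the edge.
        start = max(0, match.start() - 40)
        end = min(len(text), match.end() + 40)
        context = " ".join(text[start:end].split())
        edges.append({
            "source": source_id,
            "target": target,
            "type": "unclassified",  # placeholder, not one of the final edge types
            "context": context,
        })
    return edges


sample = (
    "As established in Paper 007, the ratchet has a direction. "
    "Paper 006's theological thread returns here, and Paper 008 restates it."
)
for edge in explicit_references("008", sample):
    print(edge["source"], "->", edge["target"], "|", edge["context"])
```

Implicit references (shared concepts and terminology) are not covered by this pattern and would need the concept index from Task C2 or manual review.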

---

## Task C2: Concept Index & Glossary Generator
**Status:** DONE
**Claimed by:** Codex-GPT5
**Output:** `tools/concept-index/`

Build an automated glossary of every named concept, framework, and thesis in the series.

**Deliverables:**

1. **`index.json`** — Structured concept index:
```json
{
  "concepts": [
    {
      "name": "The Biological Ratchet",
      "aliases": ["neural pruning argument", "dependency ratchet", "physiological argument"],
      "introduced_in": "007",
      "definition": "Dependencies don't reverse because the organism physically adapts...",
      "revised_in": [],
      "challenged_in": ["003"],
      "referenced_in": ["008"],
      "status": "active",
      "related_concepts": ["cognitive preference shift", "infrastructure threshold"]
    }
  ]
}
```

2. **`glossary.md`** — Human-readable glossary sorted alphabetically. For each concept: one-paragraph definition drawn from the papers, paper of origin, current status (active/superseded/open question).

3. **`concept_map.mermaid`** — Mermaid diagram showing concept relationships (which concepts depend on, extend, or contradict which other concepts). Separate from the paper-level graph in C1 — this is concept-to-concept, not paper-to-paper.

4. **`build_index.py`** — The Python script that generates all of the above from the paper files. Should be re-runnable as new papers are added. Read the markdown files, extract concepts by pattern matching (bold terms, section headers, named frameworks), cross-reference, and output structured data.
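
As a rough illustration of how `glossary.md` could be derived from `index.json`, the sketch below renders concepts alphabetically. The function names, the heading layout, and the choice to ignore a leading "The" when sorting are assumptions, and the second sample entry uses a placeholder definition rather than text from the papers.

```python
def sort_key(concept):
    """Sort by name, ignoring case and a leading 'The ' (an assumed convention)."""
    name = concept["name"].lower()
    return name[4:] if name.startswith("the ") else name


def render_glossary(index):
    """Render an index dict (the index.json shape) as glossary markdown."""
    lines = ["# Glossary", ""]
    for concept in sorted(index["concepts"], key=sort_key):
        lines.append(f"## {concept['name']}")
        lines.append("")
        lines.append(concept["definition"])
        lines.append("")
        lines.append(f"- Introduced in: Paper {concept['introduced_in']}")
        lines.append(f"- Status: {concept.get('status', 'unknown')}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample; the second definition is a placeholder.
SAMPLE_INDEX = {
    "concepts": [
        {
            "name": "Infrastructure Threshold",
            "introduced_in": "007",
            "definition": "(placeholder definition)",
            "status": "active",
        },
        {
            "name": "The Biological Ratchet",
            "introduced_in": "007",
            "definition": "Dependencies don't reverse because the organism physically adapts...",
            "status": "active",
        },
    ]
}

print(render_glossary(SAMPLE_INDEX))
```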

**Extraction heuristics:**
- Bold terms on first use often indicate named concepts
- Section headers are often concept names
- Table rows in papers 007 and 008 define mappings
- "Relationship to Prior Papers" sections link concepts across papers
- The HANDOFF.md "Key Ideas" sections are a good seed list
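
The first two heuristics can be approximated with two regular expressions. The sketch below is a hedged illustration, not the actual `build_index.py`: the name `candidate_concepts` and the case-insensitive first-use deduplication are assumptions, and bold labels such as field names will show up as false positives that need filtering.

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
HEADER = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def candidate_concepts(markdown):
    """Return (term, kind) pairs, keeping only the first use of each term."""
    found = []
    seen = set()
    for kind, pattern in (("header", HEADER), ("bold", BOLD)):
        for match in pattern.finditer(markdown):
            term = match.group(1).strip()
            key = term.lower()  # treat differently cased repeats as one term
            if key not in seen:
                seen.add(key)
                found.append((term, kind))
    return found


sample_md = (
    "## The Ratchet\n\n"
    "The **biological ratchet** is introduced here. Later, the "
    "**biological ratchet** returns alongside the **Infrastructure Threshold**.\n"
)
for term, kind in candidate_concepts(sample_md):
    print(f"{kind:6} {term}")
```

The output is a candidate list only; table rows and "Relationship to Prior Papers" sections would need their own passes.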

---

## Task C3: Research Integrator
**Status:** DONE
**Claimed by:** Codex-GPT5
**Output:** `tools/integrator/`

Build a tool that processes the Gemini research output files in `research/` and produces a unified research digest. **Note:** The research files may not all exist yet, because the Gemini agents are still running. Build the tool so that it works on whatever files exist at runtime and can be re-run once all 6 research files are complete.

**Deliverables:**

1. **`integrate.py`** — Python script that:
   - Reads all `research/*.md` files
   - Extracts all named scholars/authors mentioned across files
   - Deduplicates scholars appearing in multiple research files and consolidates what each research file says about them
   - Extracts all book/paper titles and builds a unified bibliography
   - Identifies contradictions (where one research file's evidence conflicts with another's)
   - Maps research findings to the open questions from Paper 008's "Open Questions for Paper 009" section
   - Outputs structured results

2. **`digest.md`** — Generated output (from running integrate.py on whatever research files exist):
   - **Scholars by frequency** — who appears most across the research, suggesting central importance
   - **Unified bibliography** — every source mentioned, deduplicated, sorted by relevance
   - **Contradiction report** — where research files disagree or present conflicting evidence
   - **Paper 009 coverage map** — which open questions from 008 got the most supporting material, which got the least (research gaps)
   - **Strongest challenges** — the most threatening counterarguments found across all research files

3. **`009_outline_suggestion.md`** — Auto-generated suggested outline for Paper 009 based on:
   - Which open questions have the most research material
   - Which new themes emerged from the research that weren't in the original open questions
   - Which counterarguments are strong enough to require direct engagement
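
The "scholars by frequency" step could be sketched as follows. This is a deliberately naive illustration: the regex treats any two capitalized words (with an optional middle initial) as a name, so it will over-match, and the function name, file names, and scholar names in the sample are all hypothetical placeholders.

```python
import re
from collections import defaultdict

# Naive pattern: "First Last" or "First M. Last". Expect false positives.
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+\b")


def scholars_by_frequency(files):
    """Map each candidate name to the files mentioning it, most-cited first."""
    mentions = defaultdict(set)
    for filename, text in files.items():
        for name in NAME.findall(text):
            # A set deduplicates repeated mentions within one file.
            mentions[name].add(filename)
    ranked = [(name, sorted(found_in)) for name, found_in in mentions.items()]
    # Sort by number of files (descending), then alphabetically for stability.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


# Placeholder research texts; names are invented for illustration.
sample_files = {
    "a.md": "Jane Doe argues the opposite. See also John Q. Smith.",
    "b.md": "According to Jane Doe, adaptation persists.",
}
for name, found_in in scholars_by_frequency(sample_files):
    print(f"{name}: {len(found_in)} file(s) {found_in}")
```

Counting files rather than raw mentions keeps one long discussion in a single research file from outweighing a scholar who recurs across several.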

**Design notes:**
- Parse markdown with regex or a lightweight parser — don't require a markdown AST library
- Be generous with extraction — false positives are better than missed findings
- The script should work with 1 research file or all 6
- Print progress to stdout so the user can see what it found
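
The last two notes can be satisfied with a small loader that tolerates a missing or partially filled directory. The sketch below assumes a hypothetical helper named `load_research`; the actual `integrate.py` may be organized differently.

```python
from pathlib import Path


def load_research(directory):
    """Read every *.md file in `directory`, reporting progress to stdout."""
    root = Path(directory)
    # A missing directory is treated the same as an empty one.
    paths = sorted(root.glob("*.md")) if root.is_dir() else []
    print(f"Found {len(paths)} research file(s) in {root}")
    texts = {}
    for path in paths:
        text = path.read_text(encoding="utf-8")
        print(f"  read {path.name}: {len(text.splitlines())} lines")
        texts[path.name] = text
    return texts


# Works whether research/ holds zero files, one file, or all six.
texts = load_research("research")
```

Because the loader returns an empty dict instead of raising when nothing is found, later stages can run unchanged on whatever exists at the time.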