commit 81cedfc789a3d33f0ba4bc963884f3f89f207d13
Author: Mortdecai
Date:   Fri Apr 10 01:06:56 2026 -0400

    docs: AI Hell design spec

    Passive horror webapp — AI-generated hellscape with infinite escalation.
    SDXL Turbo + XTTS v2 on V100, WebGL shader compositor frontend.
    Based on claude-avatar infrastructure.

    Co-Authored-By: Claude Opus 4.6 (1M context)

diff --git a/docs/superpowers/specs/2026-04-10-ai-hell-design.md b/docs/superpowers/specs/2026-04-10-ai-hell-design.md
new file mode 100644
index 0000000..f1594de
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-10-ai-hell-design.md
@@ -0,0 +1,265 @@
+# AI Hell — Design Spec
+
+## Overview
+
+Passive horror webapp. You open it, it slowly destroys your comfort. AI-generated imagery shifts between abstract hellscape and recognizable-but-wrong forms (faces, rooms, shapes). Audio: ambient dread soundscape with AI-voiced whispers cloned from random non-voice audio samples. Escalation is infinite — never peaks, always worse, with randomized timing to prevent habituation. No avatar, no interaction, no UI. You just watch.
+
+Based on the claude-avatar project infrastructure (FastAPI, WebSocket, SDXL Turbo, XTTS v2) but replaces the 3D avatar with a fullscreen 2D shader compositor.
+
+## Architecture
+
+```
+Viewer opens hell.sethpc.xyz
+          |
+          v
++-------------------+
+|  FastAPI Server   |
+| (escalation brain)|----> WebSocket: phase updates, audio clips, asset URLs
++---------+---------+
+          |
+    +-----+-------+
+    |             |
+    v             v
++--------+   +-----------+
+|  SDXL  |   |  XTTS v2  |
+| Turbo  |   | (voices)  |
+|  ~5GB  |   |  ~1.5GB   |
++--------+   +-----------+
+    |             |
+    v             v
+Asset Pool    Audio Pool
+(on disk)     (on disk)
+    |             |
+    v             v
++-----------------------------------+
+|          Browser (WebGL)          |
+| - Shader distortion layer         |
+| - Asset compositor (blend/morph)  |
+| - Web Audio (ambient + whispers)  |
+| - Escalation renderer             |
++-----------------------------------+
+```
+
+Server streams assets and commands, not frames. Frontend composites at 60fps.
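The "commands, not frames" split can be sketched in Python, the server's language. This is a minimal sketch and not the claude-avatar code. `send` stands in for the WebSocket send call. `stream_asset_commands` and the transition identifiers are hypothetical names. The 5-15 s gap is the early asset-swap range from the Randomized Timing section. Each swap costs one short JSON message, and the browser fetches the PNG itself and does all the compositing.

```python
import asyncio
import json
import random
from typing import Awaitable, Callable

# `send` stands in for the real WebSocket send call (hypothetical wiring).
Send = Callable[[str], Awaitable[None]]

# Hypothetical identifiers for the four transition modes in this spec.
TRANSITIONS = ["crossfade", "dissolve", "glitch_cut", "melt"]


async def stream_asset_commands(
    send: Send,
    pool: list[tuple[str, float]],      # (asset URL, severity) pairs on disk
    rng: random.Random,
    n_swaps: int,
    gap_range: tuple[float, float] = (5.0, 15.0),  # early swap gap, seconds
    time_scale: float = 1.0,            # 0.0 makes tests run instantly
) -> None:
    """Push small JSON commands; the browser loads the image and composites it."""
    for _ in range(n_swaps):
        url, severity = rng.choice(pool)
        await send(json.dumps({
            "type": "asset",
            "url": url,
            "severity": severity,
            "transition": rng.choice(TRANSITIONS),
        }))
        # Randomized gap between swaps instead of a fixed frame clock.
        await asyncio.sleep(rng.uniform(*gap_range) * time_scale)
```

In a FastAPI `/stream` handler, `send` could be `websocket.send_text` after `await websocket.accept()`. Each asset swap then goes over the wire as a short JSON string rather than a rendered frame.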
+
+### Container
+
+Same pattern as claude-avatar: LXC on pve197, V100 32GB GPU passthrough, Caddy reverse proxy at `hell.sethpc.xyz`. Can reuse CT 167 or create a new CT for isolation.
+
+### VRAM Budget
+
+~6.5GB total (SDXL 5GB + XTTS 1.5GB). No MuseTalk/LivePortrait needed.
+
+## Escalation Engine
+
+Server-side brain. Continuous intensity value on a logarithmic (natural-log) curve:
+
+```
+intensity = log(1 + elapsed_seconds * rate)
+```
+
+Default `rate = 0.05`. Inverting gives `elapsed_seconds = (e^intensity - 1) / rate`, so at this rate intensity reaches 1.0 at ~35s, 2.0 at ~130s, 3.0 at ~6.5min, and 4.0 at ~18min. Tunable via config.
+
+Fast early escalation (the first minute is dramatic), then a slower creep that never stops. At 5 minutes intensity is ~2.8 and at 30 minutes ~4.5: very different experiences, but neither has peaked.
+
+### What Intensity Controls
+
+| Parameter | Low (0-1) | Medium (1-3) | High (3+) |
+|-----------|-----------|--------------|-----------|
+| Image content | Abstract textures, dark gradients | Faces emerge, distorted architecture | Body horror, impossible geometry, rapid cycling |
+| Morph speed | Slow crossfades (5-10s) | Moderate blending (2-4s) | Fast cuts, stuttering, strobe |
+| Shader severity | Subtle chromatic aberration, slight warping | Visible glitch, color bleeding, pulse | Screen tearing, melt, inversion, tremor |
+| Audio | Low drone, silence gaps | Whispers fade in, dissonant tones | Layered voices, rising pitch, sudden stops |
+| Voice frequency | Rare (every 60s+) | Occasional (every 20-30s) | Frequent, overlapping, direct address |
+| Surprise events | None | Rare (face flash, audio spike) | Unpredictable timing, fake UI glitches |
+
+### Randomized Timing
+
+The escalation floor only goes up, but delivery is stochastic:
+
+- **Asset swaps:** random intervals within a range that shrinks with intensity (early: 5-15s gaps, later: 0.5-4s)
+- **Silence gaps:** random-length pauses where nothing happens — then something does
+- **Cluster bursts:** occasionally stack multiple events close together, then go quiet
+- **Voice timing:** Poisson process (exponentially distributed gaps); the mean interval decreases with intensity
+- **Fake calm:** occasionally intensity *presentation* drops for 10-30s before spiking — the "it stopped... wait" effect
+
+Predictable timing kills horror. The randomness prevents habituation.
+
+### Phase Updates
+
+Server pushes over WebSocket:
+
+```json
+{"type": "phase", "intensity": 2.4, "params": {
+  "morph_speed": 0.35,
+  "shader_severity": 0.6,
+  "palette": "crimson_void"
+}}
+```
+
+Frontend interpolates between current and target params smoothly, unless the server sends a deliberate hard scare.
+
+## Asset Generation Pipeline
+
+### Startup Batch
+
+- Server generates 30-50 initial images across severity tiers (mild, medium, extreme)
+- SDXL prompts curated per tier: abstract textures, distorted faces, body horror, impossible spaces
+- 10-20 voice clips from random non-voice source samples via XTTS
+- Assets tagged with `severity: float` for the escalation engine to select from
+
+### Background Generation (Continuous)
+
+- Worker thread generates new assets every 10-30s while the server runs
+- Severity of new assets biased toward what connected viewers currently need
+- Old assets rotated out to prevent disk bloat (cap: ~200 images, ~50 audio clips)
+- Prompts randomized from a curated horror prompt pool + procedural combination
+
+### Asset Format
+
+- Images: 512x512 PNG (SDXL Turbo native res), served via static endpoint
+- Audio: WAV clips, 2-10s each, served via static endpoint or pushed over WebSocket
+- Ambient drones: pre-baked looping tracks bundled with frontend (not AI-generated)
+
+## Frontend Rendering
+
+Fullscreen WebGL shader canvas. No 3D scene — 2D compositor with shader effects.
+
+### Layer Stack (bottom to top)
+
+1. **Base layer** — current AI-generated image, fills viewport
+2. **Blend layer** — next image, crossfading in (or hard-cutting at high intensity)
+3. **Morphing layer** — WebGL shader that warps/distorts the composited image
+4. **Overlay layer** — procedural effects (fog, particles, vignette, scanlines)
+5. **Flash layer** — momentary full-screen events (face flash, white-out, inversion)
+
+### Shader Effects
+
+| Effect | Description |
+|--------|-------------|
+| Chromatic aberration | RGB channel split — subtle wrongness early, extreme separation later |
+| Mesh warp | Sine-wave displacement — gentle breathing early, violent convulsion later |
+| Melt | Downward pixel smear — faces drip |
+| Glitch | Horizontal scanline offset blocks — digital corruption |
+| Pulse | Brightness oscillation synced to ambient drone |
+| Color shift | Palette rotation — desaturates then pushes into reds/greens |
+| Noise | Film grain -> TV static progression |
+| Inversion | Brief negative-image flashes |
+
+### Transition Modes
+
+- Slow crossfade (low intensity)
+- Dissolve through black
+- Glitch-cut (hard swap with artifacting)
+- Melt-morph (one image melts into the next via displacement map)
+
+### UI
+
+None. Black background. Cursor hidden after 3s idle. No title bar hint. Fullscreen API requested on first click; the same click resumes the Web Audio context, since browsers keep audio suspended until the user interacts with the page.
+
+## Audio System
+
+Three layers mixed with the Web Audio API.
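On the server, the whisper layer comes down to two decisions: when each clip fires, and with what playback parameters. The browser only applies them. The Python sketch below rests on several assumptions. It uses the natural-log intensity curve with the default `rate = 0.05`. The mean-gap mapping is an illustrative choice, not a value fixed by this spec: 60 s at intensity 0, 30 s at 1 and 20 s at 2, with a hypothetical 3 s floor. It roughly matches the Voice frequency row of the intensity table. The pan, volume and reverb ranges are likewise illustrative. Re-reading the mean at every event only approximates a Poisson process with a rising rate; an exact version would use thinning.

```python
import math
import random

RATE = 0.05  # default escalation rate from this spec


def intensity(elapsed_s: float, rate: float = RATE) -> float:
    """Escalation curve: intensity = ln(1 + elapsed_s * rate)."""
    return math.log1p(elapsed_s * rate)


def mean_whisper_gap(level: float) -> float:
    """Hypothetical mapping: 60 s at intensity 0, 30 s at 1, 20 s at 2, 3 s floor."""
    return max(3.0, 60.0 / (1.0 + level))


def schedule_whispers(clips: list[str], duration_s: float,
                      rng: random.Random) -> list[tuple[float, dict]]:
    """Plan whisper events for one viewer session lasting duration_s seconds.

    Gaps are drawn from an exponential distribution (Poisson-style arrivals).
    The mean is re-read from the current intensity at each event, which only
    approximates a Poisson process whose rate rises over time.
    """
    events: list[tuple[float, dict]] = []
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / mean_whisper_gap(intensity(t)))
        if t >= duration_s:
            return events
        events.append((t, {
            "type": "whisper",
            "url": rng.choice(clips),
            "pan": round(rng.uniform(-1.0, 1.0), 2),     # stereo position
            "volume": round(rng.uniform(0.05, 0.6), 2),  # some barely audible
            "reverb": round(rng.uniform(0.3, 1.0), 2),
        }))
```

Each `(time, message)` pair has the same shape as the `whisper` message in the API section. A per-viewer task would sleep until each time and then send the message.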
+
+### Layer 1 — Ambient Drone (client-side)
+
+- 2-3 pre-baked dark ambient loops bundled with the frontend
+- Crossfade between them as intensity rises
+- Web Audio gain/filter nodes shift tone: low-pass opens up, reverb increases, pitch drops
+- Always playing — the foundation
+
+### Layer 2 — Whisper Pool (server-pushed)
+
+- XTTS clones random *non-voice* audio samples as voice sources: dog barks, machinery, wind, static, instruments, household objects, reversed audio
+- Speaks horror phrases through these cloned "voices" — results range from "almost human but wrong" to "this should not be talking"
+- Pre-generated at startup from a `samples/` directory (just drop WAVs in to customize)
+- Server pushes clip references over WebSocket at random intervals
+- Client randomizes stereo pan, volume (some barely audible), and reverb amount per clip
+- Content: fragments of sentences, numbers, names, nonsense syllables, distorted laughter
+- At high intensity: clips overlap, stack, pan rapidly
+
+### Layer 3 — Direct Address (server-triggered, real-time)
+
+- Server generates XTTS audio on the fly for "it sees you" moments
+- Phrases: "you're still here", "don't leave", "I can see you", "why"
+- Sparse at low intensity, more frequent later
+- Played dry (no reverb) — cuts through the ambient, feels close and intimate
+- These are the moments that actually scare people
+
+### Audio Escalation
+
+- **Low:** quiet drone, occasional barely-audible whisper
+- **Medium:** drone louder, whispers clearly audible, first direct address
+- **High:** layered voices, drone distorted, direct address frequent, sudden silences followed by spikes
+
+## Project Structure
+
+```
+ai-hell/
+  server/
+    main.py              # FastAPI app, WebSocket, static serving
+    config.py            # Intensity params, timing ranges
+    escalation.py        # Escalation engine, phase calculator, randomized timing
+    asset_generator.py   # SDXL wrapper (repurposed from claude-avatar face_generator.py)
+    voice_generator.py   # XTTS wrapper (repurposed from claude-avatar voice.py)
+    asset_pool.py        # Pool management, severity tagging, rotation
+    prompts.py           # Horror prompt library + procedural combiner
+  frontend/
+    index.html           # Fullscreen WebGL compositor
+    shaders/             # GLSL fragment shaders for distortion effects
+    ambient/             # Pre-baked drone loops (2-3 tracks)
+  samples/               # Random audio files for XTTS voice cloning source
+  requirements.txt
+  IDEA.md
+  SESSION.md
+  DECISIONS.md
+  CLAUDE.md
+```
+
+### Reused from claude-avatar
+
+FastAPI skeleton, WebSocket streaming, SDXL Turbo loading/inference, XTTS loading/inference, config pattern, systemd service pattern, Caddy reverse proxy pattern.
+
+### Dropped
+
+Three.js 3D scene, GLB model, morph targets, MuseTalk, LivePortrait, animation loop.
+
+### New
+
+Escalation engine, asset pool manager, shader compositor, horror prompt system, multi-layer Web Audio mixer.
+
+## API
+
+### WebSocket (`/stream`)
+
+Server -> Client:
+```json
+{"type": "phase", "intensity": 2.4, "params": {"morph_speed": 0.35, "shader_severity": 0.6, "palette": "crimson_void"}}
+{"type": "asset", "url": "/assets/img_0042.png", "severity": 1.8, "transition": "melt"}
+{"type": "whisper", "url": "/assets/audio/whisper_017.wav", "pan": -0.3, "volume": 0.4, "reverb": 0.7}
+{"type": "address", "audio": "", "text": "you're still here"}
+{"type": "scare", "effect": "face_flash", "duration_ms": 150}
+```
+
+Client -> Server:
+```json
+{"type": "ping"}
+```
+
+### REST
+
+```
+GET /status
+  -> {"intensity": 2.4, "connected_clients": 1, "asset_pool_size": 87, "audio_pool_size": 23}
+
+POST /reset
+  -> {"status": "ok"}  # restart escalation for this session
+```
+
+## Future Path (Out of scope for v1)
+
+- Webcam integration — detect viewer's face, use it in the horror
+- Multi-viewer awareness — "there are others here"
+- Mobile haptics — vibration API timed to scares
+- Viewer voice cloning — mic input -> XTTS clones them back at them
+- Seasonal themes — different horror palettes/prompts