ai-hell/docs/superpowers/specs/2026-04-10-ai-hell-design.md
Mortdecai 81cedfc789 docs: AI Hell design spec
Passive horror webapp — AI-generated hellscape with infinite escalation.
SDXL Turbo + XTTS v2 on V100, WebGL shader compositor frontend.
Based on claude-avatar infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:06:56 -04:00

AI Hell — Design Spec

Overview

Passive horror webapp. You open it, it slowly destroys your comfort. AI-generated imagery shifts between abstract hellscape and recognizable-but-wrong forms (faces, rooms, shapes). Audio: ambient dread soundscape with AI-voiced whispers cloned from random non-voice audio samples. Escalation is infinite — never peaks, always worse, with randomized timing to prevent habituation. No avatar, no interaction, no UI. You just watch.

Based on the claude-avatar project infrastructure (FastAPI, WebSocket, SDXL Turbo, XTTS v2) but replaces the 3D avatar with a fullscreen 2D shader compositor.

Architecture

Viewer opens hell.sethpc.xyz
        |
        v
+-------------------+
|  FastAPI Server    |
|  (escalation brain)|----> WebSocket: phase updates, audio clips, asset URLs
+---------+---------+
          |
    +-----+------+
    |             |
    v             v
+--------+  +-----------+
| SDXL   |  | XTTS v2   |
| Turbo  |  | (voices)  |
| ~5GB   |  | ~1.5GB    |
+--------+  +-----------+
    |             |
    v             v
Asset Pool      Audio Pool
(on disk)       (on disk)
    |             |
    v             v
+-----------------------------------+
|  Browser (WebGL)                   |
|  - Shader distortion layer         |
|  - Asset compositor (blend/morph)  |
|  - Web Audio (ambient + whispers)  |
|  - Escalation renderer             |
+-----------------------------------+

Server streams assets and commands, not frames. Frontend composites at 60fps.

Container

Same pattern as claude-avatar: LXC on pve197, V100 32GB GPU passthrough, Caddy reverse proxy at hell.sethpc.xyz. Can reuse CT 167 or create a new CT for isolation.

VRAM Budget

~6.5GB total (SDXL 5GB + XTTS 1.5GB). No MuseTalk/LivePortrait needed.

Escalation Engine

Server-side brain. Continuous intensity value on a logarithmic curve:

intensity = log(1 + elapsed_seconds * rate)

Default rate = 0.05, with log taken as the natural logarithm. At this rate: intensity 1.0 at ~34s, 2.0 at ~130s, 3.0 at ~6.5min, 4.0 at ~18min. Tunable via config.

Fast early escalation (first minute is dramatic), then slower creep that never stops. 5 minutes and 30 minutes are very different experiences, but neither has peaked.
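
As a minimal sketch of the curve above, assuming `log` means the natural logarithm: the function names `intensity` and `seconds_to_reach` are illustrative and not taken from the codebase. The inverse is handy for checking when a given threshold is crossed at a chosen rate.

```python
import math

DEFAULT_RATE = 0.05  # default from the spec; tunable via config


def intensity(elapsed_seconds: float, rate: float = DEFAULT_RATE) -> float:
    """Unbounded, monotonically increasing intensity (natural log assumed)."""
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(max(0.0, elapsed_seconds) * rate)


def seconds_to_reach(target: float, rate: float = DEFAULT_RATE) -> float:
    """Inverse of intensity(): elapsed seconds at which `target` is reached."""
    # expm1(y) computes exp(y) - 1, the exact inverse of log1p.
    return math.expm1(target) / rate


if __name__ == "__main__":
    for level in (1.0, 2.0, 3.0, 4.0):
        print(f"intensity {level:.1f} after {seconds_to_reach(level):7.1f} s")
```

Because the curve has no upper bound, downstream parameters that live in a fixed range (shader severity, volume) need their own clamping.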

What Intensity Controls

| Parameter | Low (0-1) | Medium (1-3) | High (3+) |
|---|---|---|---|
| Image content | Abstract textures, dark gradients | Faces emerge, distorted architecture | Body horror, impossible geometry, rapid cycling |
| Morph speed | Slow crossfades (5-10s) | Moderate blending (2-4s) | Fast cuts, stuttering, strobe |
| Shader severity | Subtle chromatic aberration, slight warping | Visible glitch, color bleeding, pulse | Screen tearing, melt, inversion, tremor |
| Audio | Low drone, silence gaps | Whispers fade in, dissonant tones | Layered voices, rising pitch, sudden stops |
| Voice frequency | Rare (every 60s+) | Occasional (every 20-30s) | Frequent, overlapping, direct address |
| Surprise events | None | Rare (face flash, audio spike) | Unpredictable timing, fake UI glitches |
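
The banding in the table could be expressed as a small lookup. This is a sketch only: the spec does not say which band owns the boundary values 1 and 3, so treating them as the start of the higher band is an assumption, and `tier` and `CROSSFADE_SECONDS` are hypothetical names.

```python
def tier(intensity: float) -> str:
    """Map a continuous intensity to the Low / Medium / High bands."""
    # Assumption: boundaries belong to the higher band (1.0 -> medium, 3.0 -> high).
    if intensity < 1.0:
        return "low"
    if intensity < 3.0:
        return "medium"
    return "high"


# Crossfade durations in seconds, taken from the "Morph speed" row.
# The High band uses fast cuts, so no duration range is given for it.
CROSSFADE_SECONDS = {
    "low": (5.0, 10.0),
    "medium": (2.0, 4.0),
    "high": None,
}

if __name__ == "__main__":
    for value in (0.4, 1.0, 2.4, 3.0, 5.2):
        band = tier(value)
        print(value, band, CROSSFADE_SECONDS[band])
```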

Randomized Timing

The escalation floor only goes up, but delivery is stochastic:

  • Asset swaps: random intervals within a range that shrinks with intensity (early: 5-15s gaps, later: 0.5-4s)
  • Silence gaps: random-length pauses where nothing happens — then something does
  • Cluster bursts: occasionally stack multiple events close together, then go quiet
  • Voice timing: Poisson process (exponentially distributed gaps), mean interval decreases with intensity
  • Fake calm: the presented intensity occasionally drops for 10-30s before spiking, producing the "it stopped... wait" effect

Predictable timing kills horror. The randomness prevents habituation.
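
The swap and voice timing rules above might be sampled as follows. The gap ranges (5-15s early, 0.5-4s later) come from the list. The linear shrink up to intensity 4.0, the `60 / (1 + intensity)` mean, and the 5s floor are assumptions chosen only to illustrate the shape.

```python
import random


def swap_gap_range(intensity: float) -> tuple[float, float]:
    """Interpolate the asset-swap gap range from 5-15 s down to 0.5-4 s."""
    # Assumption: the range shrinks linearly until intensity 4.0, then holds.
    t = min(max(intensity / 4.0, 0.0), 1.0)
    low = 5.0 + (0.5 - 5.0) * t
    high = 15.0 + (4.0 - 15.0) * t
    return low, high


def next_swap_gap(intensity: float, rng: random.Random) -> float:
    """Seconds until the next asset swap, uniform within the current range."""
    return rng.uniform(*swap_gap_range(intensity))


def next_voice_gap(intensity: float, rng: random.Random) -> float:
    """Seconds until the next whisper, modelled as Poisson arrivals."""
    # Hypothetical mapping: about 60 s at intensity 0, about 20 s at intensity 2.
    mean_gap = max(5.0, 60.0 / (1.0 + intensity))
    # Exponential gaps between events are what make the arrivals Poisson.
    return rng.expovariate(1.0 / mean_gap)


if __name__ == "__main__":
    rng = random.Random()
    for level in (0.0, 2.0, 4.0):
        print(level, round(next_swap_gap(level, rng), 2),
              round(next_voice_gap(level, rng), 2))
```

Passing the `random.Random` instance in explicitly keeps the timing reproducible under a fixed seed, which helps when tuning.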

Phase Updates

Server pushes over WebSocket:

{"type": "phase", "intensity": 2.4, "params": {
  "morph_speed": 0.35,
  "shader_severity": 0.6,
  "palette": "crimson_void"
}}

Frontend interpolates between current and target params smoothly, unless the server sends a deliberate hard scare.
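
The interpolation rule can be sketched as exponential smoothing toward the target, with a snap for hard scares. The real frontend would do this in JavaScript on each animation frame. Python is used here only to show the logic, and `smooth_params`, `time_constant`, and the `hard` flag are hypothetical names.

```python
import math


def smooth_params(current: dict, target: dict, dt: float,
                  time_constant: float = 1.5, hard: bool = False) -> dict:
    """Move numeric params toward `target`; snap on a hard scare."""
    # Fraction of the remaining distance covered in `dt` seconds.
    alpha = 1.0 - math.exp(-dt / time_constant)
    out = {}
    for key, goal in target.items():
        now = current.get(key, goal)
        numeric = isinstance(goal, (int, float)) and isinstance(now, (int, float))
        if hard or not numeric:
            # Hard scares and non-numeric values (e.g. "palette") switch at once.
            out[key] = goal
        else:
            out[key] = now + (goal - now) * alpha
    return out


if __name__ == "__main__":
    state = {"morph_speed": 0.0, "shader_severity": 0.2, "palette": "ash"}
    goal = {"morph_speed": 0.35, "shader_severity": 0.6, "palette": "crimson_void"}
    for _ in range(3):
        state = smooth_params(state, goal, dt=1 / 60)
        print(state)
```

Deriving `alpha` from `dt` keeps the blend speed independent of frame rate.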

Asset Generation Pipeline

Startup Batch

  • Server generates 30-50 initial images across severity tiers (mild, medium, extreme)
  • SDXL prompts curated per tier: abstract textures, distorted faces, body horror, impossible spaces
  • 10-20 voice clips from random non-voice source samples via XTTS
  • Each asset is tagged with a severity float that the escalation engine uses to select assets matching the current intensity

Background Generation (Continuous)

  • Worker thread generates new assets every 10-30s while the server runs
  • Severity of new assets biased toward what connected viewers currently need
  • Old assets rotated out to prevent disk bloat (cap: ~200 images, ~50 audio clips)
  • Prompts randomized from a curated horror prompt pool + procedural combination
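
A minimal sketch of severity tagging, selection, and rotation, assuming oldest-first eviction and a "pick near the current intensity" rule. Neither is specified above. `Asset`, `AssetPool`, and the `window` parameter are hypothetical names, not the contents of asset_pool.py.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    url: str
    severity: float


class AssetPool:
    def __init__(self, cap: int = 200):
        # deque(maxlen=cap) silently drops the oldest entry once the cap is hit.
        # A real pool would also delete the evicted file from disk.
        self._assets = deque(maxlen=cap)

    def add(self, asset: Asset) -> None:
        self._assets.append(asset)

    def __len__(self) -> int:
        return len(self._assets)

    def pick(self, intensity: float, rng: random.Random,
             window: float = 0.5) -> Asset:
        """Random asset within `window` of intensity, else the closest one."""
        if not self._assets:
            raise LookupError("asset pool is empty")
        near = [a for a in self._assets
                if abs(a.severity - intensity) <= window]
        if near:
            return rng.choice(near)
        return min(self._assets, key=lambda a: abs(a.severity - intensity))


if __name__ == "__main__":
    pool = AssetPool(cap=3)
    for i in range(5):
        pool.add(Asset(url=f"/assets/img_{i:04d}.png", severity=float(i)))
    print(len(pool), pool.pick(3.1, random.Random()))
```

The fallback to the closest asset matters because intensity is unbounded while generated severities may not be.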

Asset Format

  • Images: 512x512 PNG (SDXL Turbo native res), served via static endpoint
  • Audio: WAV clips, 2-10s each, served via static endpoint or pushed over WebSocket
  • Ambient drones: pre-baked looping tracks bundled with frontend (not AI-generated)

Frontend Rendering

Fullscreen WebGL shader canvas. No 3D scene — 2D compositor with shader effects.

Layer Stack (bottom to top)

  1. Base layer — current AI-generated image, fills viewport
  2. Blend layer — next image, crossfading in (or hard-cutting at high intensity)
  3. Morphing layer — WebGL shader that warps/distorts the composited image
  4. Overlay layer — procedural effects (fog, particles, vignette, scanlines)
  5. Flash layer — momentary full-screen events (face flash, white-out, inversion)

Shader Effects

| Effect | Description |
|---|---|
| Chromatic aberration | RGB channel split: subtle wrongness early, extreme separation later |
| Mesh warp | Sin-wave displacement: gentle breathing early, violent convulsion later |
| Melt | Downward pixel smear: faces drip |
| Glitch | Horizontal scanline offset blocks: digital corruption |
| Pulse | Brightness oscillation synced to ambient drone |
| Color shift | Palette rotation: desaturates, then pushes into reds/greens |
| Noise | Film grain -> TV static progression |
| Inversion | Brief negative-image flashes |

Transition Modes

  • Slow crossfade (low intensity)
  • Dissolve through black
  • Glitch-cut (hard swap with artifacting)
  • Melt-morph (one image melts into the next via displacement map)
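
One way the server might choose among these modes is a weighted draw that favours the crossfade early and the harsher cuts later. Only "slow crossfade at low intensity" comes from the list. The weights, the identifiers `crossfade`, `dissolve_black`, and `glitch_cut`, and the function names are assumptions (`melt` matches the `transition` value used in the asset message).

```python
import random


def transition_weights(intensity: float) -> dict[str, float]:
    """Hypothetical weights: calm modes fade out as intensity rises."""
    harsh = min(max(intensity / 4.0, 0.0), 1.0)  # 0 = calm, 1 = harsh
    return {
        "crossfade": 1.0 - harsh,
        "dissolve_black": 0.5,   # available at every intensity
        "glitch_cut": harsh,
        "melt": harsh * 0.75,
    }


def choose_transition(intensity: float, rng: random.Random) -> str:
    """Draw one transition mode according to the current weights."""
    weights = transition_weights(intensity)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    for level in (0.0, 2.0, 4.0):
        print(level, [choose_transition(level, rng) for _ in range(5)])
```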

UI

None. Black background. Cursor hidden after 3s idle. No title bar hint. Fullscreen API requested on first click.

Audio System

Three layers mixed in Web Audio API.

Layer 1 — Ambient Drone (client-side)

  • 2-3 pre-baked dark ambient loops bundled with the frontend
  • Crossfade between them as intensity rises
  • Web Audio gain/filter nodes shift tone: low-pass opens up, reverb increases, pitch drops
  • Always playing — the foundation

Layer 2 — Whisper Pool (server-pushed)

  • XTTS clones random non-voice audio samples as voice sources: dog barks, machinery, wind, static, instruments, household objects, reversed audio
  • Speaks horror phrases through these cloned "voices" — results range from "almost human but wrong" to "this should not be talking"
  • Pre-generated at startup from a samples/ directory (just drop WAVs in to customize)
  • Server pushes clip references over WebSocket at random intervals
  • Client plays each clip with randomized stereo pan, volume (some barely audible), and reverb amount, using the values carried in the whisper message
  • Content: fragments of sentences, numbers, names, nonsense syllables, distorted laughter
  • At high intensity: clips overlap, stack, pan rapidly
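
A sketch of how the server might build a whisper message with randomized playback values, following the field names in the WebSocket API. The value ranges and the rule that the volume ceiling rises with intensity are assumptions, and `whisper_message` is a hypothetical helper.

```python
import json
import random


def whisper_message(url: str, intensity: float, rng: random.Random) -> dict:
    """Build a `whisper` message with randomized pan, volume, and reverb."""
    # Assumption: the loudest possible whisper grows with intensity,
    # while the floor stays low so some clips remain barely audible.
    ceiling = min(1.0, 0.3 + 0.2 * intensity)
    return {
        "type": "whisper",
        "url": url,
        "pan": round(rng.uniform(-1.0, 1.0), 2),      # -1 = left, 1 = right
        "volume": round(rng.uniform(0.05, ceiling), 2),
        "reverb": round(rng.uniform(0.2, 1.0), 2),
    }


if __name__ == "__main__":
    msg = whisper_message("/assets/audio/whisper_017.wav", 2.4, random.Random())
    print(json.dumps(msg))
```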

Layer 3 — Direct Address (server-triggered, real-time)

  • Server generates XTTS audio on the fly for "it sees you" moments
  • Phrases: "you're still here", "don't leave", "I can see you", "why"
  • Sparse at low intensity, more frequent later
  • Played dry (no reverb) — cuts through the ambient, feels close and intimate
  • These are the moments that actually scare people
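
Since direct-address audio is generated on the fly, it travels inline rather than by URL. The sketch below packs WAV bytes into the `address` message shape from the API section. The XTTS call itself is left out, so silence stands in for synthesized speech, and the 24 kHz mono 16-bit format and helper names are assumptions.

```python
import base64
import io
import json
import wave


def wav_bytes(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def address_message(text: str, wav: bytes) -> str:
    """Serialize an `address` message with the clip embedded as base64."""
    return json.dumps({
        "type": "address",
        "audio": base64.b64encode(wav).decode("ascii"),
        "text": text,
    })


if __name__ == "__main__":
    silence = b"\x00\x00" * 12000    # placeholder for synthesized speech
    payload = address_message("you're still here", wav_bytes(silence))
    print(len(payload), "characters")
```

Base64 adds roughly a third to the payload size, which is acceptable for clips a few seconds long but is one reason the whisper pool uses URLs instead.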

Audio Escalation

  • Low: quiet drone, occasional barely-audible whisper
  • Medium: drone louder, whispers clearly audible, first direct address
  • High: layered voices, drone distorted, direct address frequent, sudden silences followed by spikes

Project Structure

ai-hell/
  server/
    main.py              # FastAPI app, WebSocket, static serving
    config.py            # Intensity params, timing ranges
    escalation.py        # Escalation engine, phase calculator, randomized timing
    asset_generator.py   # SDXL wrapper (repurposed from claude-avatar face_generator.py)
    voice_generator.py   # XTTS wrapper (repurposed from claude-avatar voice.py)
    asset_pool.py        # Pool management, severity tagging, rotation
    prompts.py           # Horror prompt library + procedural combiner
  frontend/
    index.html           # Fullscreen WebGL compositor
    shaders/             # GLSL fragment shaders for distortion effects
    ambient/             # Pre-baked drone loops (2-3 tracks)
  samples/               # Random audio files for XTTS voice cloning source
  requirements.txt
  IDEA.md
  SESSION.md
  DECISIONS.md
  CLAUDE.md

Reused from claude-avatar

FastAPI skeleton, WebSocket streaming, SDXL Turbo loading/inference, XTTS loading/inference, config pattern, systemd service pattern, Caddy reverse proxy pattern.

Dropped

Three.js 3D scene, GLB model, morph targets, MuseTalk, LivePortrait, animation loop.

New

Escalation engine, asset pool manager, shader compositor, horror prompt system, multi-layer Web Audio mixer.

API

WebSocket (/stream)

Server -> Client:

{"type": "phase", "intensity": 2.4, "params": {"morph_speed": 0.35, "shader_severity": 0.6, "palette": "crimson_void"}}
{"type": "asset", "url": "/assets/img_0042.png", "severity": 1.8, "transition": "melt"}
{"type": "whisper", "url": "/assets/audio/whisper_017.wav", "pan": -0.3, "volume": 0.4, "reverb": 0.7}
{"type": "address", "audio": "<base64 wav>", "text": "you're still here"}
{"type": "scare", "effect": "face_flash", "duration_ms": 150}

Client -> Server:

{"type": "ping"}
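
The message shapes above can be checked before sending or after receiving. This is a sketch of one possible validator: the required-field sets are read directly from the examples, while `REQUIRED_FIELDS` and `decode` are hypothetical names.

```python
import json

# Fields each message type must carry, per the examples in this section.
REQUIRED_FIELDS = {
    "phase": {"intensity", "params"},
    "asset": {"url", "severity", "transition"},
    "whisper": {"url", "pan", "volume", "reverb"},
    "address": {"audio", "text"},
    "scare": {"effect", "duration_ms"},
    "ping": set(),
}


def decode(raw: str) -> dict:
    """Parse one WebSocket text frame and verify its shape."""
    msg = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    kind = msg.get("type")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown message type: {kind!r}")
    missing = REQUIRED_FIELDS[kind] - msg.keys()
    if missing:
        raise ValueError(f"{kind} message missing fields: {sorted(missing)}")
    return msg


if __name__ == "__main__":
    print(decode('{"type": "scare", "effect": "face_flash", "duration_ms": 150}'))
```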

REST

GET /status
  -> {"intensity": 2.4, "connected_clients": 1, "asset_pool_size": 87, "audio_pool_size": 23}

POST /reset
  -> {"status": "ok"}  # restart escalation for this session
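
The state behind these two endpoints might look like the sketch below, which uses only the standard library and leaves the FastAPI routing out. It assumes a natural-log curve and one escalation clock per object. How that maps to "this session" is not specified, and the `Escalation` class is hypothetical.

```python
import math
import time


class Escalation:
    def __init__(self, rate: float = 0.05, clock=time.monotonic):
        self._rate = rate
        self._clock = clock          # injectable so tests can control time
        self._started = clock()

    def intensity(self) -> float:
        elapsed = self._clock() - self._started
        return math.log1p(elapsed * self._rate)

    def reset(self) -> dict:
        """Body for POST /reset: restart the curve from zero."""
        self._started = self._clock()
        return {"status": "ok"}

    def status(self, connected_clients: int, asset_pool_size: int,
               audio_pool_size: int) -> dict:
        """Body for GET /status, matching the example payload's keys."""
        return {
            "intensity": round(self.intensity(), 2),
            "connected_clients": connected_clients,
            "asset_pool_size": asset_pool_size,
            "audio_pool_size": audio_pool_size,
        }


if __name__ == "__main__":
    engine = Escalation()
    print(engine.status(connected_clients=1, asset_pool_size=87,
                        audio_pool_size=23))
    print(engine.reset())
```

Using `time.monotonic` rather than wall-clock time keeps the curve from jumping if the system clock is adjusted.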

Future Path (Out of scope for v1)

  • Webcam integration — detect viewer's face, use it in the horror
  • Multi-viewer awareness — "there are others here"
  • Mobile haptics — vibration API timed to scares
  • Viewer voice cloning: mic input -> XTTS clones the viewer's voice and speaks back to them in it
  • Seasonal themes — different horror palettes/prompts