Mortdecai 81cedfc789 docs: AI Hell design spec

Passive horror webapp — AI-generated hellscape with infinite escalation.
SDXL Turbo + XTTS v2 on V100, WebGL shader compositor frontend.
Based on claude-avatar infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-10 01:06:56 -04:00

AI Hell — Design Spec

Overview

Passive horror webapp. You open it, it slowly destroys your comfort. AI-generated imagery shifts between abstract hellscape and recognizable-but-wrong forms (faces, rooms, shapes). Audio: ambient dread soundscape with AI-voiced whispers cloned from random non-voice audio samples. Escalation is infinite — never peaks, always worse, with randomized timing to prevent habituation. No avatar, no interaction, no UI. You just watch.

Based on the claude-avatar project infrastructure (FastAPI, WebSocket, SDXL Turbo, XTTS v2) but replaces the 3D avatar with a fullscreen 2D shader compositor.

Architecture

Viewer opens hell.sethpc.xyz
        |
        v
+-------------------+
|  FastAPI Server    |
|  (escalation brain)|----> WebSocket: phase updates, audio clips, asset URLs
+---------+---------+
          |
    +-----+------+
    |             |
    v             v
+--------+  +-----------+
| SDXL   |  | XTTS v2   |
| Turbo  |  | (voices)  |
| ~5GB   |  | ~1.5GB    |
+--------+  +-----------+
    |             |
    v             v
Asset Pool      Audio Pool
(on disk)       (on disk)
    |             |
    v             v
+-----------------------------------+
|  Browser (WebGL)                   |
|  - Shader distortion layer         |
|  - Asset compositor (blend/morph)  |
|  - Web Audio (ambient + whispers)  |
|  - Escalation renderer             |
+-----------------------------------+

Server streams assets and commands, not frames. Frontend composites at 60fps.

Container

Same pattern as claude-avatar: LXC on pve197, V100 32GB GPU passthrough, Caddy reverse proxy at hell.sethpc.xyz. Can reuse CT 167 or create a new CT for isolation.

VRAM Budget

~6.5GB total (SDXL 5GB + XTTS 1.5GB). No MuseTalk/LivePortrait needed.

Escalation Engine

Server-side brain. Continuous intensity value on a logarithmic curve:

intensity = log(1 + elapsed_seconds * rate)

Default rate = 0.05, with log taken as the natural logarithm. At this rate: intensity 1.0 at ~34s, 2.0 at ~128s, 3.0 at ~6.4min, 4.0 at ~18min. Tunable via config.
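A minimal sketch of the curve and its inverse, assuming the natural logarithm. The function names `intensity` and `seconds_to_reach` are illustrative, not taken from the codebase.

```python
import math

RATE = 0.05  # default rate from the spec; tunable via config


def intensity(elapsed_seconds: float, rate: float = RATE) -> float:
    """Natural-log escalation curve: fast early, slower later, unbounded."""
    return math.log(1.0 + elapsed_seconds * rate)


def seconds_to_reach(target: float, rate: float = RATE) -> float:
    """Invert the curve: elapsed time at which intensity equals target."""
    return (math.exp(target) - 1.0) / rate


if __name__ == "__main__":
    # Print the elapsed time for each whole-number intensity level.
    for level in (1.0, 2.0, 3.0, 4.0):
        print(f"intensity {level:.1f} at ~{seconds_to_reach(level):.0f}s")
```

Because the curve has no upper bound, any clamping (for example of shader parameters) would have to happen where intensity is mapped to presentation values, not here.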

Fast early escalation (first minute is dramatic), then slower creep that never stops. 5 minutes and 30 minutes are very different experiences, but neither has peaked.

What Intensity Controls

Parameter	Low (0-1)	Medium (1-3)	High (3+)
Image content	Abstract textures, dark gradients	Faces emerge, distorted architecture	Body horror, impossible geometry, rapid cycling
Morph speed	Slow crossfades (5-10s)	Moderate blending (2-4s)	Fast cuts, stuttering, strobe
Shader severity	Subtle chromatic aberration, slight warping	Visible glitch, color bleeding, pulse	Screen tearing, melt, inversion, tremor
Audio	Low drone, silence gaps	Whispers fade in, dissonant tones	Layered voices, rising pitch, sudden stops
Voice frequency	Rare (every 60s+)	Occasional (every 20-30s)	Frequent, overlapping, direct address
Surprise events	None	Rare (face flash, audio spike)	Unpredictable timing, fake UI glitches
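The three columns above can be treated as tiers keyed on intensity. A small sketch follows; the boundary handling is an assumption (the table's ranges touch at 1 and 3, so exactly 1.0 is counted as medium and exactly 3.0 as high), and `tier_for` is a hypothetical helper name.

```python
def tier_for(intensity: float) -> str:
    """Map a continuous intensity to the table's Low / Medium / High columns.

    Assumption: boundaries belong to the higher tier.
    """
    if intensity < 1.0:
        return "low"
    if intensity < 3.0:
        return "medium"
    return "high"
```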

Randomized Timing

The escalation floor only goes up, but delivery is stochastic:

Asset swaps: random intervals within a range that shrinks with intensity (early: 5-15s gaps, later: 0.5-4s)
Silence gaps: random-length pauses where nothing happens — then something does
Cluster bursts: occasionally stack multiple events close together, then go quiet
Voice timing: a Poisson process (exponentially distributed gaps) whose mean interval decreases with intensity
Fake calm: occasionally the presented intensity drops for 10-30s before spiking, while the underlying floor keeps rising (the "it stopped... wait" effect)

Predictable timing kills horror. The randomness prevents habituation.
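A rough sketch of two of these samplers. Only the 5-15s and 0.5-4s gap ranges come from the spec. The linear interpolation, the `full_at`, `base_mean` and `floor_mean` values, and all function names are assumptions for illustration.

```python
import random


def swap_gap_range(intensity, early=(5.0, 15.0), late=(0.5, 4.0), full_at=3.0):
    """Shrink the (min, max) swap gap from the early range to the late range.

    Assumption: linear interpolation that reaches the late range at full_at.
    """
    t = min(max(intensity / full_at, 0.0), 1.0)
    lo = early[0] + (late[0] - early[0]) * t
    hi = early[1] + (late[1] - early[1]) * t
    return lo, hi


def next_swap_gap(intensity, rng):
    """Seconds until the next asset swap, uniform within the current range."""
    lo, hi = swap_gap_range(intensity)
    return rng.uniform(lo, hi)


def next_voice_gap(intensity, rng, base_mean=60.0, floor_mean=5.0):
    """Seconds until the next voice clip.

    Exponential gaps give Poisson-process arrivals; the mean gap shrinks
    as intensity rises but never drops below floor_mean.
    """
    mean = max(base_mean / (1.0 + intensity), floor_mean)
    return rng.expovariate(1.0 / mean)


if __name__ == "__main__":
    rng = random.Random()
    print(next_swap_gap(0.5, rng), next_voice_gap(0.5, rng))
```

Passing the `random.Random` instance in explicitly keeps the timing reproducible under a fixed seed, which helps when tuning the ranges.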

Phase Updates

Server pushes over WebSocket:

{"type": "phase", "intensity": 2.4, "params": {
  "morph_speed": 0.35,
  "shader_severity": 0.6,
  "palette": "crimson_void"
}}

Frontend interpolates between current and target params smoothly, unless the server sends a deliberate hard scare.
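The smoothing rule could look like the sketch below. It is written in Python to match the server code, so the frontend would need an equivalent in JavaScript. The exponential easing, the `time_constant` value, the `hard` flag, and the choice to switch non-numeric params such as `palette` immediately are all assumptions.

```python
import math


def step_params(current, target, dt, time_constant=1.5, hard=False):
    """Advance params one frame toward the target.

    Numeric values ease exponentially; anything else switches at once.
    With hard=True (a deliberate scare) the target is applied immediately.
    """
    if hard:
        return dict(target)
    # Fraction of the remaining distance covered in dt seconds.
    alpha = 1.0 - math.exp(-dt / time_constant)
    out = {}
    for key, goal in target.items():
        now = current.get(key, goal)
        if isinstance(goal, (int, float)) and isinstance(now, (int, float)):
            out[key] = now + (goal - now) * alpha
        else:
            out[key] = goal
    return out
```

Deriving `alpha` from `dt` keeps the easing speed independent of frame rate, so a dropped frame does not slow the transition.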

Asset Generation Pipeline

Startup Batch

Server generates 30-50 initial images across severity tiers (mild, medium, extreme)
SDXL prompts curated per tier: abstract textures, distorted faces, body horror, impossible spaces
10-20 voice clips from random non-voice source samples via XTTS
Each asset is tagged with a severity float that the escalation engine uses for selection

Background Generation (Continuous)

Worker thread generates new assets every 10-30s while the server runs
Severity of new assets biased toward what connected viewers currently need
Old assets rotated out to prevent disk bloat (cap: ~200 images, ~50 audio clips)
Prompts randomized from a curated horror prompt pool + procedural combination
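One way the pool's severity tagging and rotation might fit together is sketched below. The `Asset` and `AssetPool` names, the `window` selection rule, and the nearest-match fallback are hypothetical. A real pool would also delete rotated files from disk, which is omitted here.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    url: str
    severity: float


class AssetPool:
    def __init__(self, cap=200):
        # A bounded deque drops the oldest entry when a new one is appended.
        self._items = deque(maxlen=cap)

    def __len__(self):
        return len(self._items)

    def add(self, asset):
        self._items.append(asset)

    def pick(self, intensity, rng, window=0.5):
        """Pick randomly among assets whose severity is near the intensity.

        Falls back to the single nearest asset if none are within window.
        """
        if not self._items:
            raise LookupError("asset pool is empty")
        near = [a for a in self._items if abs(a.severity - intensity) <= window]
        if near:
            return rng.choice(near)
        return min(self._items, key=lambda a: abs(a.severity - intensity))


if __name__ == "__main__":
    pool = AssetPool(cap=200)
    pool.add(Asset("/assets/img_0042.png", 1.8))
    print(pool.pick(2.0, random.Random()))
```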

Asset Format

Images: 512x512 PNG (SDXL Turbo native res), served via static endpoint
Audio: WAV clips, 2-10s each, served via static endpoint or pushed over WebSocket
Ambient drones: pre-baked looping tracks bundled with frontend (not AI-generated)

Frontend Rendering

Fullscreen WebGL shader canvas. No 3D scene — 2D compositor with shader effects.

Layer Stack (bottom to top)

Base layer — current AI-generated image, fills viewport
Blend layer — next image, crossfading in (or hard-cutting at high intensity)
Morphing layer — WebGL shader that warps/distorts the composited image
Overlay layer — procedural effects (fog, particles, vignette, scanlines)
Flash layer — momentary full-screen events (face flash, white-out, inversion)

Shader Effects

Effect	Description
Chromatic aberration	RGB channel split — subtle wrongness early, extreme separation later
Mesh warp	Sin-wave displacement — gentle breathing early, violent convulsion later
Melt	Downward pixel smear — faces drip
Glitch	Horizontal scanline offset blocks — digital corruption
Pulse	Brightness oscillation synced to ambient drone
Color shift	Palette rotation — desaturates then pushes into reds/greens
Noise	Film grain -> TV static progression
Inversion	Brief negative-image flashes

Transition Modes

Slow crossfade (low intensity)
Dissolve through black
Glitch-cut (hard swap with artifacting)
Melt-morph (one image melts into the next via displacement map)

UI

None. Black background. Cursor hidden after 3s idle. No title bar hint. Fullscreen API requested on first click.

Audio System

Three layers mixed in the Web Audio API.

Layer 1 — Ambient Drone (client-side)

2-3 pre-baked dark ambient loops bundled with the frontend
Crossfade between them as intensity rises
Web Audio gain/filter nodes shift tone: low-pass opens up, reverb increases, pitch drops
Always playing — the foundation
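For the crossfade between loops, an equal-power curve keeps perceived loudness roughly constant. The sketch below only computes per-loop gain values, which the client would apply to its gain nodes. The mapping from intensity to position (`full_at`) and the name `drone_gains` are assumptions.

```python
import math


def drone_gains(intensity, n_loops=3, full_at=4.0):
    """Equal-power gains for n_loops drone tracks (n_loops >= 2).

    Intensity 0 plays only the first loop; at full_at and beyond only the
    last loop plays. In between, two adjacent loops are blended.
    """
    pos = min(max(intensity / full_at, 0.0), 1.0) * (n_loops - 1)
    i = min(int(pos), n_loops - 2)  # index of the outgoing loop
    frac = pos - i                  # 0..1 progress toward the next loop
    gains = [0.0] * n_loops
    gains[i] = math.cos(frac * math.pi / 2)
    gains[i + 1] = math.sin(frac * math.pi / 2)
    return gains
```

The cosine and sine pair keeps the sum of squared gains at 1, which avoids the volume dip a linear crossfade produces at the midpoint.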

Layer 2 — Whisper Pool (server-pushed)

XTTS clones random non-voice audio samples as voice sources: dog barks, machinery, wind, static, instruments, household objects, reversed audio
Speaks horror phrases through these cloned "voices" — results range from "almost human but wrong" to "this should not be talking"
Pre-generated at startup from a samples/ directory (just drop WAVs in to customize)
Server pushes clip references over WebSocket at random intervals
Client plays each clip with randomized stereo pan, volume (some barely audible), and reverb amount
Content: fragments of sentences, numbers, names, nonsense syllables, distorted laughter
At high intensity: clips overlap, stack, pan rapidly
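A sketch of the randomized playback parameters as a message builder. The field names match the `whisper` message in the API section. The value ranges and the intensity-dependent volume ceiling are assumptions, and the spec leaves open whether the randomization happens on the server or the client.

```python
import json
import random


def whisper_message(clip_url, intensity, rng):
    """Build a whisper message with randomized pan, volume and reverb."""
    pan = round(rng.uniform(-1.0, 1.0), 2)
    # Assumption: the loudest possible whisper grows with intensity, while
    # the low end stays near-inaudible so some clips are barely heard.
    ceiling = min(0.3 + 0.2 * intensity, 1.0)
    volume = round(rng.uniform(0.05, ceiling), 2)
    reverb = round(rng.uniform(0.2, 1.0), 2)
    return {
        "type": "whisper",
        "url": clip_url,
        "pan": pan,
        "volume": volume,
        "reverb": reverb,
    }


if __name__ == "__main__":
    msg = whisper_message("/assets/audio/whisper_017.wav", 2.4, random.Random())
    print(json.dumps(msg))
```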

Layer 3 — Direct Address (server-triggered, real-time)

Server generates XTTS audio on the fly for "it sees you" moments
Phrases: "you're still here", "don't leave", "I can see you", "why"
Sparse at low intensity, more frequent later
Played dry (no reverb) — cuts through the ambient, feels close and intimate
These are the moments that actually scare people
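Since direct-address audio is generated on the fly, it is sent inline as base64 (see the `address` message in the API section) instead of by URL. A minimal packaging sketch follows. `silent_wav` is a stand-in for the real XTTS output, and the 24 kHz mono 16-bit format is an assumption.

```python
import base64
import io
import wave


def silent_wav(seconds=0.1, sample_rate=24000):
    """Stand-in for synthesized speech: a short silent mono 16-bit WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * round(seconds * sample_rate))
    return buf.getvalue()


def address_message(wav_bytes, text):
    """Wrap WAV bytes as an inline base64 payload for the WebSocket."""
    return {
        "type": "address",
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
        "text": text,
    }


if __name__ == "__main__":
    msg = address_message(silent_wav(), "you're still here")
    print(len(msg["audio"]), "base64 characters")
```

Base64 inflates the payload by about a third, which is acceptable for short phrases but is one reason the larger whisper pool is served by URL.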

Audio Escalation

Low: quiet drone, occasional barely-audible whisper
Medium: drone louder, whispers clearly audible, first direct address
High: layered voices, drone distorted, direct address frequent, sudden silences followed by spikes

Project Structure

ai-hell/
  server/
    main.py              # FastAPI app, WebSocket, static serving
    config.py            # Intensity params, timing ranges
    escalation.py        # Escalation engine, phase calculator, randomized timing
    asset_generator.py   # SDXL wrapper (repurposed from claude-avatar face_generator.py)
    voice_generator.py   # XTTS wrapper (repurposed from claude-avatar voice.py)
    asset_pool.py        # Pool management, severity tagging, rotation
    prompts.py           # Horror prompt library + procedural combiner
  frontend/
    index.html           # Fullscreen WebGL compositor
    shaders/             # GLSL fragment shaders for distortion effects
    ambient/             # Pre-baked drone loops (2-3 tracks)
  samples/               # Random audio files for XTTS voice cloning source
  requirements.txt
  IDEA.md
  SESSION.md
  DECISIONS.md
  CLAUDE.md

Reused from claude-avatar

FastAPI skeleton, WebSocket streaming, SDXL Turbo loading/inference, XTTS loading/inference, config pattern, systemd service pattern, Caddy reverse proxy pattern.

Dropped

Three.js 3D scene, GLB model, morph targets, MuseTalk, LivePortrait, animation loop.

New

Escalation engine, asset pool manager, shader compositor, horror prompt system, multi-layer Web Audio mixer.

API

WebSocket (`/stream`)

Server -> Client:

{"type": "phase", "intensity": 2.4, "params": {"morph_speed": 0.35, "shader_severity": 0.6, "palette": "crimson_void"}}
{"type": "asset", "url": "/assets/img_0042.png", "severity": 1.8, "transition": "melt"}
{"type": "whisper", "url": "/assets/audio/whisper_017.wav", "pan": -0.3, "volume": 0.4, "reverb": 0.7}
{"type": "address", "audio": "<base64 wav>", "text": "you're still here"}
{"type": "scare", "effect": "face_flash", "duration_ms": 150}

Client -> Server:

{"type": "ping"}
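A small validation sketch for the message shapes above, usable on either end of the socket. The required-field table is read off the examples in this section. `parse_message` is a hypothetical helper, and treating every listed field as mandatory is an assumption.

```python
import json

# Required top-level fields per message type, taken from the examples.
REQUIRED_FIELDS = {
    "phase": {"intensity", "params"},
    "asset": {"url", "severity", "transition"},
    "whisper": {"url", "pan", "volume", "reverb"},
    "address": {"audio", "text"},
    "scare": {"effect", "duration_ms"},
    "ping": set(),
}


def parse_message(raw):
    """Decode one JSON message and check its type and required fields."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    kind = msg.get("type")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown message type: {kind!r}")
    missing = REQUIRED_FIELDS[kind] - set(msg)
    if missing:
        raise ValueError(f"{kind} message missing fields: {sorted(missing)}")
    return msg
```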

REST

GET /status
  -> {"intensity": 2.4, "connected_clients": 1, "asset_pool_size": 87, "audio_pool_size": 23}

POST /reset
  -> {"status": "ok"}  # restart escalation for this session
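The state behind these two endpoints can be sketched without the web framework. `EscalationSession` is a hypothetical class, the injectable `clock` exists only to make it testable, and rounding intensity to two decimals is an assumption.

```python
import math
import time


class EscalationSession:
    """Tracks one session's start time and derives intensity from it."""

    def __init__(self, rate=0.05, clock=time.monotonic):
        self._rate = rate
        self._clock = clock
        self._started = clock()

    def intensity(self):
        elapsed = self._clock() - self._started
        return math.log(1.0 + elapsed * self._rate)

    def reset(self):
        """Restart escalation; body of the POST /reset response."""
        self._started = self._clock()
        return {"status": "ok"}

    def status(self, connected_clients, asset_pool_size, audio_pool_size):
        """Body of the GET /status response."""
        return {
            "intensity": round(self.intensity(), 2),
            "connected_clients": connected_clients,
            "asset_pool_size": asset_pool_size,
            "audio_pool_size": audio_pool_size,
        }
```

Using a monotonic clock means a system time adjustment cannot make intensity jump or go negative mid-session.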

Future Path (Out of scope for v1)

Webcam integration — detect viewer's face, use it in the horror
Multi-viewer awareness — "there are others here"
Mobile haptics — vibration API timed to scares
Viewer voice cloning — mic input -> XTTS clones them back at them
Seasonal themes — different horror palettes/prompts
