
# Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling

A mathematical proof that unweighted average task completion time is a biased
statistic that incentivizes cherry-picking easy work, and that any scheduling
advantage it appears to reveal is an artifact of the metric — not a reflection
of genuine throughput or service quality.

---

## 1. Introduction

Many organizations measure task-execution performance by **unweighted mean
completion time**: the average number of hours (or days) between task
submission and task resolution, counting each task equally regardless of
size or priority.

This paper proves that this metric is not merely imprecise but structurally
biased. It can be improved by reordering work without doing any additional
work (Theorem 1), while a properly weighted alternative is completely
immune to scheduling manipulation (Theorem 2). When combined with a
priority system, the metric actively contradicts the organization's own
priority classifications (Theorem 8).

The argument proceeds in four parts:

- **Part I** (Sections 2–4) establishes the mathematical foundation:
  the unweighted mean is gameable by Shortest Processing Time (SPT)
  scheduling, the work-weighted mean is schedule-invariant, and the
  resulting service-quality consequences are provably negative.

- **Part II** (Sections 5–6) extends the model to priority-classified
  tasks, proves the metric becomes adversarial to the priority system,
  and proposes weighted alternatives with a worked IT service desk example.

- **Part III** (Sections 7–9) examines organizational dynamics: what
  happens when the metric is reported to clients (information asymmetry),
  what happens to team members who understand its flaws (psychological
  harm), and what a single informed manager can do about it (constrained
  optimization with game-theoretic stability analysis).

- **Part IV** (Sections 10–12) presents honest counterarguments, situates
  the work in existing literature, and concludes.

The core results build on Smith's (1956) foundational scheduling theory [1],
extended through game theory [9, 10], organizational measurement theory
[18, 19], and psychology [11–17] to trace a complete chain from a
mathematical proof about a specific metric to organizational outcomes.

---

# Part I: Mathematical Foundation

## 2. Definitions

Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n$.

A **schedule** $\sigma$ is a permutation of $\{1, 2, \ldots, n\}$ assigning
tasks to execution order on a single executor.

The **completion time** of task $\sigma(k)$ under schedule $\sigma$ is:

$$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$$

The **unweighted mean completion time** is:

$$\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}$$

The **work-weighted mean completion time** is:

$$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}$$
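
The three definitions above translate directly into code. The following is a minimal sketch, assuming a schedule is represented as a list of processing times already in execution order; the function names and the sample values `[2, 3, 5]` are illustrative and do not come from the text.

```python
from itertools import accumulate


def completion_times(times):
    """Completion time of each task when run back-to-back in the given order."""
    return list(accumulate(times))


def unweighted_mean(times):
    """Mean completion time with every task counted equally."""
    c = completion_times(times)
    return sum(c) / len(c)


def work_weighted_mean(times):
    """Mean completion time with each task weighted by its processing time."""
    c = completion_times(times)
    return sum(p * ci for p, ci in zip(times, c)) / sum(times)


order = [2, 3, 5]  # hypothetical processing times, in schedule order
print(completion_times(order))    # [2, 5, 10]
print(unweighted_mean(order))     # (2 + 5 + 10) / 3
print(work_weighted_mean(order))  # (2*2 + 3*5 + 5*10) / 10 = 6.9
```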

---

## 3. Core Results

### 3.1 The Unweighted Mean Is Gameable

**Theorem 1** (Smith, 1956 [1])**.** The schedule that minimizes
$\bar{C}(\sigma)$ is Shortest Processing Time first (SPT): sort tasks so
that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.

**Proof (exchange argument [1, 2]).**

Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy
$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$
be the start time of task $i$.

| | Task $i$ finishes | Task $j$ finishes | Sum |
|---|---|---|---|
| **Before swap** ($i$ then $j$) | $t + p_i$ | $t + p_i + p_j$ | $2t + 2p_i + p_j$ |
| **After swap** ($j$ then $i$) | $t + p_j$ | $t + p_j + p_i$ | $2t + p_i + 2p_j$ |

Swapping the pair reduces the sum of completion times by:

$$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$

Every swap of a longer-before-shorter adjacent pair strictly reduces the
total. Any non-SPT schedule contains such a pair, and repeated swaps
terminate in an SPT order. Therefore SPT minimizes $\bar{C}(\sigma)$,
uniquely up to the order of tasks with equal processing times.
$\blacksquare$
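
For small instances, Theorem 1 can be checked by brute force. The sketch below enumerates every ordering and returns the ones with the smallest total completion time; the processing times are hypothetical, and exhaustive enumeration is only practical for a handful of tasks.

```python
from itertools import accumulate, permutations


def total_completion(times):
    """Sum of completion times for tasks run in the given order."""
    return sum(accumulate(times))


def best_orders(times):
    """All distinct orderings that achieve the minimal total completion time."""
    totals = {perm: total_completion(perm) for perm in set(permutations(times))}
    best = min(totals.values())
    return sorted(perm for perm, total in totals.items() if total == best)


times = [4, 1, 7, 3]  # hypothetical processing times
print(best_orders(times))  # [(1, 3, 4, 7)], the ascending (SPT) order
```

Minimizing the total is equivalent to minimizing the mean, since $n$ is fixed.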

### 3.2 The Work-Weighted Mean Is Schedule-Invariant

**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$
is the same for every schedule $\sigma$.

**Proof.**

Expand the numerator:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$

Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum
counts every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:

$$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$

For any pair $(a, b)$ with $a \ne b$, exactly one of
$\{b \preceq_\sigma a\}$ or $\{a \prec_\sigma b\}$ holds. The diagonal
terms ($a = b$) contribute $p_a^2$ regardless of order. Therefore:

$$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$

Together with the complementary sum, the two off-diagonal sums cover all
ordered pairs $(a, b)$ with $a \ne b$:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$

The right-hand side is schedule-independent. By symmetry of $p_a p_b$,
both off-diagonal sums are equal:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$

Therefore:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2$$

This expression contains no reference to $\sigma$. Since the denominator
$\sum p_a$ is also schedule-independent:

$$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum p_a^2}{\sum p_a}$$

is **constant across all schedules**. $\blacksquare$
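
The invariance can also be confirmed numerically. This sketch evaluates the work-weighted mean for every permutation of a small task set and compares it with the closed form from the proof. It uses exact rational arithmetic to avoid floating-point noise, and assumes integer processing times purely for convenience.

```python
from fractions import Fraction
from itertools import accumulate, permutations


def work_weighted_mean(times):
    """Work-weighted mean completion time as an exact fraction."""
    c = accumulate(times)
    return Fraction(sum(p * ci for p, ci in zip(times, c)), sum(times))


def closed_form(times):
    """((sum p)^2 + sum p^2) / (2 * sum p), the schedule-free expression."""
    total = sum(times)
    squares = sum(p * p for p in times)
    return Fraction(total * total + squares, 2 * total)


times = [1, 5, 10]  # hypothetical integer processing times
values = {work_weighted_mean(perm) for perm in permutations(times)}
print(len(values) == 1)                # True: one value across all 6 orders
print(values == {closed_form(times)})  # True: it matches the closed form
```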

This is an instance of the conservation laws in scheduling identified by
Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to
measuring how long a unit of *work* waits rather than how long a *task*
waits — the unweighted statistic counts completions rather than work,
which is why it is gameable. (See also Little [3, 4] for the
queueing-theoretic context, with the caveat that Little's Law applies
directly only to steady-state systems, not to the batch case analyzed
here.)

### 3.3 Illustrative Example

Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours.

| Schedule | $C_A$ | $C_B$ | Unweighted mean | Work-weighted mean |
|----------|-------|-------|-----------------|-------------------|
| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 |
| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 |

SPT appears **4.5 hours better** on the unweighted metric but provides
**zero improvement** on the work-weighted metric. The apparent advantage
exists only because the unweighted statistic lets a 1-hour task "vote"
equally with a 10-hour task.

---

## 4. Consequences for Service Quality

### 4.1 Starvation of Large Tasks

**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes
unweighted mean completion time necessarily maximizes the completion time
of the largest task.

**Proof.** SPT places the largest task last. Its completion time equals
the total processing time $\sum p_i$, which is the maximum possible
completion time for any individual task. Under any schedule that does not
place the largest task last, that task completes strictly earlier.
$\blacksquare$

This creates a **starvation incentive**: rational agents optimizing the
unweighted statistic will indefinitely defer large tasks in favor of small
ones. Austin [18] identified this general pattern — that incomplete
measurement creates incentives to optimize the measured dimension at the
expense of unmeasured ones — in the context of organizational performance
management. Theorem 3 provides the specific mechanism for task scheduling.

### 4.2 Maximum Completion Time for the Largest Task

**Theorem 4 (SPT Maximizes Completion Time of the Largest Task).**
SPT assigns the maximum possible completion time ($\sum p_i$) to the
largest task. The only schedules that share this property are those that
also place the largest task last.

**Proof.** SPT sorts tasks in ascending order of $p_i$, placing the largest
task $p_{\max}$ in the last position. The last task in any schedule has
completion time $\sum_{i=1}^{n} p_i$, which is the maximum any individual
task can receive. Under any schedule that does not place $p_{\max}$ last,
it completes strictly before $\sum p_i$. $\blacksquare$

**Corollary 4.1.** A team optimizing unweighted mean completion time will
systematically deliver the worst experience to clients with the most
complex needs. This is not a side effect — it is the *mechanism* by which
the metric improves.

**Note on slowdown ratios.** SPT actually *compresses* slowdown ratios
($S_i = C_i / p_i$) because larger tasks in later positions have large
denominators that absorb the accumulated sum. For example, with tasks
$[1, 5, 10]$: SPT gives slowdowns $[1, 1.2, 1.6]$ (low variance) while
LPT gives $[1, 3, 16]$ (high variance). SPT's harm to large-task clients
is not visible in the slowdown ratio — it is visible in **absolute
completion time**. This distinction is important: the scheduling fairness
literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through
slowdown-based measures, which can obscure the absolute-delay burden
proved below.

### 4.3 Delay Concentration

**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT,
the largest task bears the maximum possible absolute delay, strictly more
than under any schedule that does not place it last.

**Proof.** Define absolute delay as $\Delta_i = C_i - p_i$ (time spent
waiting, independent of own size). Under SPT, the largest task is in
position $n$ with:

$$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$

This is the sum of all other tasks' processing times — the maximum possible
delay for any single task. Under any schedule where the largest task is not
last, its delay is strictly less. Meanwhile, SPT gives the smallest task
zero delay ($\Delta_1^{\text{SPT}} = 0$). The entire queuing burden is
shifted from small tasks to large tasks. $\blacksquare$

SPT minimizes *total* delay (good for aggregate efficiency) by
concentrating delay onto the tasks best able to absorb it in slowdown-ratio
terms. But in absolute terms — hours spent waiting — the largest task bears
the full weight.
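
The contrast between slowdown ratio and absolute delay can be reproduced with the $[1, 5, 10]$ task set from the note in Section 4.2. This is a small sketch; the helper names are illustrative.

```python
from itertools import accumulate


def slowdowns(times):
    """Slowdown ratio C_i / p_i for each task, in schedule order."""
    return [c / p for p, c in zip(times, accumulate(times))]


def delays(times):
    """Absolute delay C_i - p_i (time spent waiting) for each task."""
    return [c - p for p, c in zip(times, accumulate(times))]


spt = [1, 5, 10]  # shortest first
lpt = [10, 5, 1]  # longest first

print(slowdowns(spt), delays(spt))  # [1.0, 1.2, 1.6] [0, 1, 6]
print(slowdowns(lpt), delays(lpt))  # [1.0, 3.0, 16.0] [0, 10, 15]
```

Under SPT the 10-hour task has the mildest-looking slowdown profile (1.6) yet waits 6 hours, the sum of all other processing times. Under LPT it waits 0 hours.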

### 4.4 Throughput Invariance

**Theorem 6 (Throughput Invariance).** Total work completed over any time
horizon $T$ is identical under all scheduling policies.

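**Proof.** The executor processes work at a fixed rate and is never idle
while tasks remain. Over any horizon $T \ge \sum p_i$, the total work done
is exactly $\sum p_i$ regardless of order; over a shorter horizon the
executor is busy for all of $T$, so the work done is again independent of
order. For the steady-state case with ongoing arrivals and a persistent
backlog, the long-run throughput is determined by the service rate $\mu$
and is independent of scheduling: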

$$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$

$\blacksquare$

**Corollary 6.1.** A team that switches from any scheduling policy to SPT
will observe an improvement in unweighted mean completion time with **zero
change in actual throughput**. The metric improves. The output does not.

### 4.5 The Aged-Task Abandonment Incentive

Theorems 3–5 show that SPT deprioritizes large tasks. But the metric
creates a second, more destructive incentive: **completing old tasks is
actively punished**.

**Theorem 6.1 (Aged-Task Penalty).** Completing a single task with
completion time $C_{\text{old}}$ increases the running mean by more than
completing $C_{\text{old}}$ tasks with completion time 1 each.

**Proof.** Let the team have completed $m$ tasks with running sum
$S = \sum_{i=1}^{m} C_i$ and running mean $\bar{C} = S/m$.

**Case 1:** Complete one task with completion time $C_{\text{old}}$:

$$\bar{C}_1 = \frac{S + C_{\text{old}}}{m + 1}$$

**Case 2:** Complete $C_{\text{old}}$ tasks each with completion time 1:

$$\bar{C}_2 = \frac{S + C_{\text{old}}}{m + C_{\text{old}}}$$

Both cases add the same value ($C_{\text{old}}$) to the numerator. But
Case 2 adds $C_{\text{old}}$ completions to the denominator, while Case 1
adds only 1. Therefore:

$$\bar{C}_1 - \bar{C}_2 = \frac{S + C_{\text{old}}}{m + 1} - \frac{S + C_{\text{old}}}{m + C_{\text{old}}} = (S + C_{\text{old}}) \cdot \frac{C_{\text{old}} - 1}{(m+1)(m + C_{\text{old}})}$$

For $C_{\text{old}} > 1$, this difference is strictly positive: the old
task produces a **worse average** than the equivalent volume of fresh
work. $\blacksquare$

**Example.** A team has completed 100 tasks with a running mean of 2 days
($S = 200$). They can either:

- Complete one 26-day-old task: $\bar{C} = 226/101 = 2.24$ days
- Complete 26 tasks at 1 day each: $\bar{C} = 226/126 = 1.79$ days

Both options resolve the same 26 days of total wait, yet the metric rates
the second option better (1.79 vs 2.24).
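
The arithmetic in this example can be reproduced in a few lines. A sketch, where `mean_after` is an illustrative helper that updates a running mean with a batch of newly completed tasks:

```python
def mean_after(running_sum, count, new_completion_times):
    """Running mean after adding a batch of newly completed tasks."""
    total = running_sum + sum(new_completion_times)
    return total / (count + len(new_completion_times))


# 100 completed tasks with a running mean of 2 days, so S = 200.
one_old = mean_after(200, 100, [26])         # one 26-day-old task
many_fresh = mean_after(200, 100, [1] * 26)  # 26 tasks at 1 day each

print(round(one_old, 2), round(many_fresh, 2))  # 2.24 1.79
```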

**Corollary 6.2 (Abandonment Incentive).** Under the unweighted mean,
the rational response to an aged task is not to deprioritize it (SPT,
Theorem 3) but to **remove it from the system entirely** — close it as
"won't fix," transfer it to another team, or let it expire. This removes
the task from both numerator and denominator, protecting the average.

This goes beyond starvation. Theorems 3–5 prove that the metric
*delays* large and old tasks. Theorem 6.1 proves that the metric
*punishes completion of them* — meaning the incentive is not merely to
defer but to abandon. A metric that penalizes resolving the hardest
problems is not measuring performance; it is measuring avoidance.

### 4.6 The Compound Effect

Combining Theorems 4, 5, 6, and 6.1:

| Measure | Effect of optimizing unweighted mean |
|---------|--------------------------------------|
| Throughput (work/time) | No change (Theorem 6) |
| Delay for small tasks | Minimized — approaches zero (SPT) |
| Delay for large tasks | **Maximized** — bears all queuing burden (Theorem 5) |
| Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) |
| Incentive for aged tasks | **Abandon rather than complete** (Theorem 6.1) |

The net effect on perceived quality is negative because:

1. **Loss aversion is asymmetric** [8]. A client whose 100-hour task is
   deprioritized experiences a large, salient negative. A client whose
   1-hour task is expedited experiences a small, often unnoticed positive.

2. **High-effort tasks correlate with high-value clients.** Large tasks
   are disproportionately likely to come from major clients, complex
   contracts, or critical business needs.

3. **Starvation compounds.** In a continuous system (Theorem 3), large
   tasks may be **indefinitely deferred** as new small tasks keep arriving.

**Theorem 7 (The Core Result).** For a team processing tasks of non-uniform
size, adopting unweighted mean completion time as a performance metric:

(a) provides **zero productivity gain** (Theorem 6),
(b) **assigns the maximum possible completion time** to the largest task
    (Theorem 4), and
(c) **concentrates all queuing delay** onto the largest tasks while
    eliminating delay for the smallest (Theorem 5).

This is not a tradeoff. The metric creates a pure transfer of service
quality from high-effort clients to low-effort clients, with no net work
gained. $\blacksquare$

---

# Part II: Priority Systems

## 5. Breakdown Under Priority Classification

The preceding sections proved that unweighted mean completion time is
biased when tasks vary in size. We now show that introducing a **priority
system** — as virtually all real teams use — causes the metric to become
not merely biased but **actively adversarial** to the organization's stated
goals.

### 5.1 Extended Model: Tasks With Priority

Let each task $i$ have processing time $p_i$ and a priority class
$q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and
4 is the lowest (cosmetic/enhancement). Assign priority weights:

$$w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}$$

The specific weights are illustrative; the results hold for any strictly
decreasing weight function. The key property is that priority is assigned
by **business impact**, not by task size.

### 5.2 The Metric Contradicts the Priority System

**Theorem 8 (Priority-Size Inversion).** When priority is independent of
task size, the schedule that minimizes unweighted mean completion time
(SPT) will, in expectation, complete low-priority tasks before
high-priority tasks of greater size.

**Proof.** SPT orders tasks by $p_i$ ascending, regardless of $q_i$.
Consider two tasks:

- Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage)
- Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix)

SPT schedules B before A. The unweighted mean for this pair:

$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$

The metric declares SPT nearly **twice as good** — despite completing a
cosmetic fix while a server outage burns.

In general, when $q_i$ is statistically independent of $p_i$, SPT's
ordering has **zero correlation** with priority. In practice, Critical
tasks (outages, security incidents, data loss) often require more work
than Low tasks, so the metric is plausibly **anti-correlated** with the
priority system. $\blacksquare$

### 5.3 Information Destruction

The unweighted mean reduces a three-dimensional task $(p_i, q_i, C_i)$ to
a one-dimensional signal ($C_i$), then averages uniformly. This discards
priority entirely and implicitly inverts size.

**Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual
information between the schedule's implicit priority ranking (position)
and the actual priority assignment $q_i$. For SPT:

$$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$

**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and
$q_i$ are independent, knowing a task's position in the SPT schedule
provides zero information about its priority. $\blacksquare$

**Corollary 9.1.** A team that optimizes unweighted mean completion time
is operating a scheduling system that carries zero information about its
own priority classification. The priority field in their ticketing system
is, with respect to execution order, decorative.

This is an instance of what Austin [18] calls the fundamental problem of
incomplete measurement: when the measurement system captures only a subset
of the relevant dimensions, optimizing the measurement systematically
degrades the unmeasured dimensions.

### 5.4 Priority-Weighted Delay Cost

Define the **priority-weighted delay cost** of a schedule:

$$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$

**Theorem 10 (SPT and Priority-Weighted Delay Cost).** The optimal
schedule for minimizing $D(\sigma)$ is WSJF: order by $w(q_i)/p_i$
descending [1, 5]. SPT's ordering — by $1/p_i$ descending — ignores
priority entirely and produces higher $D$ than priority-respecting
alternatives when priority is correlated with task size.

**Proof.** By the exchange argument, if task $i$ is scheduled immediately
before task $j$, swapping the two reduces $D$ by:

$$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$

The swap improves $D$ when $w(q_j)/p_j > w(q_i)/p_i$ but $j$ is
scheduled after $i$. Therefore the optimal order is decreasing
$w(q_i)/p_i$ — the WSJF rule. SPT corresponds to WSJF only when
$w(q_i) = \text{const}$ (all tasks have equal priority).

**Example.** Critical ($w = 8$, $p = 3$) and Low ($w = 1$, $p = 2$):

- SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$
- WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$

SPT incurs 45% more priority-weighted delay. In practice, Critical tasks
tend to be larger (outages, security incidents), making the divergence
systematic. $\blacksquare$
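
The two-task example can be checked mechanically. In this sketch a task is a `(weight, processing_time)` pair; the `spt` and `wsjf` helpers are illustrative names for the two sort rules discussed above.

```python
from itertools import accumulate


def weighted_delay_cost(tasks):
    """D = sum of w_i * C_i for (weight, processing_time) pairs in schedule order."""
    completions = accumulate(p for _, p in tasks)
    return sum(w * c for (w, _), c in zip(tasks, completions))


def spt(tasks):
    """Shortest processing time first; ignores the weight."""
    return sorted(tasks, key=lambda t: t[1])


def wsjf(tasks):
    """Largest weight-to-processing-time ratio first."""
    return sorted(tasks, key=lambda t: t[0] / t[1], reverse=True)


tasks = [(8, 3), (1, 2)]  # Critical (w=8, p=3) and Low (w=1, p=2)
print(weighted_delay_cost(spt(tasks)))   # 42
print(weighted_delay_cost(wsjf(tasks)))  # 29
```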

---

## 6. Proposed Solutions

### 6.1 Priority-Weighted Metrics

Replace unweighted mean completion time with the **Priority-Weighted
Completion Score (PWCS)**:

$$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$

This is the priority-weighted mean slowdown ratio. It measures how long
each task waited relative to its size, weighted by how much that task
mattered. Lower is better.

**Properties:**

1. **Priority-respecting.** Delays to Critical tasks cost 8x more than
   delays to Low tasks.
2. **Size-fair.** Uses slowdown ratio $C_i / p_i$, so large tasks are not
   penalized for being large.
3. **Not gameable by SPT.** Reordering by processing time does not
   systematically improve the score.
4. **Reduces to the unweighted mean when tasks are uniform.** If all tasks
   share one priority and one processing time $p$, PWCS equals
   $\bar{C}(\sigma)/p$.
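
A direct implementation of the PWCS formula is sketched below, using the illustrative weights from Section 5.1. Tasks are `(priority_class, processing_time)` pairs in schedule order, and the two-task input reuses the Critical and Low tasks from the Section 5.4 example.

```python
from itertools import accumulate

WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # illustrative weights from Section 5.1


def pwcs(tasks):
    """Priority-weighted mean slowdown for tasks in schedule order."""
    completions = accumulate(p for _, p in tasks)
    weighted = sum(WEIGHTS[q] * c / p for (q, p), c in zip(tasks, completions))
    return weighted / sum(WEIGHTS[q] for q, _ in tasks)


critical_first = [(1, 3), (4, 2)]  # Critical (p=3) then Low (p=2)
low_first = [(4, 2), (1, 3)]       # the SPT order for the same two tasks

print(round(pwcs(critical_first), 3))  # 1.167
print(round(pwcs(low_first), 3))       # 1.593
```

For this pair, the SPT order scores worse on PWCS, which is consistent with property 3. A single example is not a proof of that property.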

### 6.2 Optimal Policy: WSJF

**Theorem 11.** The schedule minimizing the priority-weighted completion
time $\text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i)$ processes
tasks in order of decreasing $w(q_i)/p_i$ — the **Weighted Shortest Job
First (WSJF)** rule [1, 5].

**Proof.** By the exchange argument (as in Theorem 10), the swap of
adjacent tasks $i, j$ improves PWCT when $w(q_j)/p_j > w(q_i)/p_i$ but
$j$ is scheduled after $i$. The optimal order is therefore decreasing
$w(q_i)/p_i$. $\blacksquare$

Within a priority class, this reduces to SPT (shortest first). Across
classes, a Critical 4-hour task ($w/p = 2.0$) beats a Low 1-hour task
($w/p = 1.0$).

**Practical caveat.** Pure WSJF can place tiny Low-priority tasks ahead
of large Critical tasks (a 15-minute Low task has $w/p = 1/0.25 = 4.0$,
beating a 6-hour Critical at $w/p = 8/6 = 1.33$). In practice, this is
mitigated by enforcing **strict priority-class ordering** and applying
WSJF only *within* each class.
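
The difference between pure WSJF and the class-first variant comes down to the sort key. The sketch below uses the two tasks from the caveat; the task labels are hypothetical, and tasks are `(label, priority_class, hours)` tuples.

```python
WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # illustrative weights from Section 5.1


def pure_wsjf(tasks):
    """Order by w/p descending, across all priority classes."""
    return sorted(tasks, key=lambda t: WEIGHTS[t[1]] / t[2], reverse=True)


def class_first_wsjf(tasks):
    """Strict priority-class order, then shortest first within a class.

    Within one class the weight is constant, so w/p descending is the
    same as p ascending.
    """
    return sorted(tasks, key=lambda t: (t[1], t[2]))


tasks = [("critical-6h", 1, 6.0), ("low-15min", 4, 0.25)]  # hypothetical labels
print([t[0] for t in pure_wsjf(tasks)])         # ['low-15min', 'critical-6h']
print([t[0] for t in class_first_wsjf(tasks)])  # ['critical-6h', 'low-15min']
```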

### 6.3 Applied Example: IT Service Desk

Consider an IT team with the following ticket queue:

| Ticket | Priority | Type | Est. Hours |
|--------|----------|------|-----------|
| T1 | P1 (Critical) | Email server down | 6 |
| T2 | P2 (High) | VPN failing for remote team | 4 |
| T3 | P3 (Medium) | New employee laptop setup | 2 |
| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 |
| T5 | P3 (Medium) | Install software license | 1 |
| T6 | P1 (Critical) | Database backup failing | 3 |
| T7 | P2 (High) | Printer fleet offline | 2 |
| T8 | P4 (Low) | Archive old shared drive folder | 0.25 |

**SPT order** (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1

| Pos | Ticket | Priority | Hours | Completion | Slowdown |
|-----|--------|----------|-------|------------|----------|
| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 |
| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |

**Practical WSJF** (priority-class-first, SPT within class):

| Pos | Ticket | Priority | Hours | Completion |
|-----|--------|----------|-------|------------|
| 1 | T6 (backups) | P1 Crit | 3 | 3 |
| 2 | T1 (email) | P1 Crit | 6 | 9 |
| 3 | T7 (printers) | P2 High | 2 | 11 |
| 4 | T2 (VPN) | P2 High | 4 | 15 |
| 5 | T5 (software) | P3 Med | 1 | 16 |
| 6 | T3 (laptop) | P3 Med | 2 | 18 |
| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |

**Comparison:**

| Metric | SPT | Practical WSJF | Winner |
|--------|-----|----------------|--------|
| Unweighted mean completion | **6.56 hrs** | 13.63 hrs | SPT |
| P1 mean time to resolution | 13.75 hrs | **6 hrs** | WSJF |
| P2 mean time to resolution | **9.25 hrs** | 13 hrs | SPT |
| Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
| Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
| Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |

The aggregate priority-weighted completion times are nearly identical
(PWCT: 10.20 hrs for SPT vs 10.17 hrs for Practical WSJF) because
aggregation hides distributional damage.
The real difference is in the **per-priority-class** breakdown: the email
server is down for 18.75 hours under SPT versus 9 hours under WSJF. The
database backups fail for 8.75 hours versus 3.

The unweighted metric confidently reports SPT as **more than twice as
efficient** (6.56 vs 13.63), rewarding the team that updated desktop
wallpaper while the email server was on fire.
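
The figures in this section can be recomputed from the ticket table. The sketch below encodes each ticket as `(priority_class, hours)`, derives both orders by sorting, and reports the aggregate and per-class means. The dictionary layout and helper names are illustrative; the weights are those of Section 5.1.

```python
from itertools import accumulate

WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}
TICKETS = {  # ticket id -> (priority class, estimated hours)
    "T1": (1, 6), "T2": (2, 4), "T3": (3, 2), "T4": (4, 0.5),
    "T5": (3, 1), "T6": (1, 3), "T7": (2, 2), "T8": (4, 0.25),
}


def completions(order):
    """Map each ticket to its completion time under the given order."""
    hours = [TICKETS[t][1] for t in order]
    return dict(zip(order, accumulate(hours)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def report(order):
    c = completions(order)
    total_weight = sum(WEIGHTS[q] for q, _ in TICKETS.values())
    return {
        "unweighted_mean": mean(c.values()),
        "p1_mean": mean(c[t] for t in order if TICKETS[t][0] == 1),
        "p2_mean": mean(c[t] for t in order if TICKETS[t][0] == 2),
        "pwct": sum(WEIGHTS[TICKETS[t][0]] * c[t] for t in order) / total_weight,
    }


# SPT: shortest first (the stable sort keeps T3 ahead of T7 at 2 hours each).
spt_order = sorted(TICKETS, key=lambda t: TICKETS[t][1])
# Practical WSJF: sort by (priority class, hours).
class_first_order = sorted(TICKETS, key=lambda t: TICKETS[t])

print(spt_order, report(spt_order))
print(class_first_order, report(class_first_order))
```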

### 6.4 Recommended Metric Suite

Even priority-weighted aggregate metrics can fail to distinguish good from
bad schedules, because aggregation hides distributional damage. No single
metric suffices. A complete measurement system should track:

| Metric | What it measures | Formula |
|--------|-----------------|---------|
| **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ |
| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ for $q = 1$ |
| **Throughput** | Raw work capacity | Work-hours completed / calendar time |
| **Aging violations** | Starvation prevention | Tasks exceeding SLA by priority |
| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ for $q \le 2$ |

The key insight: **per-priority-class metrics** expose scheduling failures
that aggregate metrics hide.
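
A per-class report along these lines could be computed as follows. This is a sketch under stated assumptions: the SLA thresholds in `SLA_HOURS` are hypothetical values chosen for illustration, and the input reuses the SPT completion times from Section 6.3 as `(priority_class, completion_hours)` pairs.

```python
from collections import defaultdict

SLA_HOURS = {1: 4, 2: 8, 3: 24, 4: 72}  # hypothetical per-class SLA thresholds

# (priority class, completion hours), taken from the SPT table in Section 6.3
RESOLVED = [
    (4, 0.25), (4, 0.75), (3, 1.75), (3, 3.75),
    (2, 5.75), (1, 8.75), (2, 12.75), (1, 18.75),
]


def per_class_report(resolved):
    """Mean, max, and SLA-violation count of completion time per priority class."""
    by_class = defaultdict(list)
    for priority, completion in resolved:
        by_class[priority].append(completion)
    return {
        q: {
            "mean": sum(c) / len(c),
            "max": max(c),
            "sla_violations": sum(1 for x in c if x > SLA_HOURS[q]),
        }
        for q, c in sorted(by_class.items())
    }


for priority, stats in per_class_report(RESOLVED).items():
    print(priority, stats)
```

With these assumed thresholds, both P1 tickets and one P2 ticket breach their SLA under SPT, a failure that the 6.56-hour aggregate mean does not reveal.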

---

# Part III: Organizational Dynamics

## 7. When the Metric Is the Product

Sections 2–6 assume that client satisfaction is a function of *experienced
service quality*. But there exists a scenario in which this assumption
fails and the entire argument collapses.

### 7.1 The Self-Referential Metric

Suppose the provider reports the unweighted mean directly to the client
— on a dashboard, in an SLA report, on a marketing page — and the
client's satisfaction is derived primarily from *that number*:

$$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$

Under this model, SPT genuinely maximizes client satisfaction (Theorem 1).
Throughput is unchanged (Theorem 6). The business outcome improves: same
work done, happier client.

**Every theorem in this paper remains mathematically correct. But the
conclusion inverts.** The metric is no longer a proxy that can be gamed —
it *is* the service quality, because the client has agreed to evaluate
quality by the aggregate number.

### 7.2 The Economics

This creates a coherent, stable equilibrium:

| Actor | Behavior | Outcome |
|-------|----------|---------|
| Provider | Optimizes unweighted mean (SPT) | Metric improves, no extra work |
| Client | Reads dashboard, sees low average | Reports satisfaction |
| Management | Sees satisfied client + good metric | Rewards team |

The provider extracts satisfaction at zero marginal cost, by optimizing a
number the client has accepted as a proxy for quality.

### 7.3 The Fragility

This equilibrium is stable only as long as the client never inspects their
own experience. It breaks when:

1. **The client checks their own ticket.** A CTO whose email server was
   down for 18.75 hours will not be reassured by "Average resolution:
   6.56 hours." The clients most likely to inspect are exactly the ones
   receiving the worst service (Theorem 4).

2. **A competitor offers per-ticket SLAs.** "P1 resolved within 4 hours"
   beats "average resolution under 7 hours" for any client with critical
   needs.

3. **The team internalizes the metric.** If the team believes the metric
   reflects real performance, they lose the ability to recognize when
   critical work is neglected. The metric becomes an epistemic hazard.

### 7.4 The General Pattern

This pattern — proxy replaces quality, proxy is optimized, quality
diverges, system is stable until tested by reality — recurs across domains.
Muller [19] documents it extensively as "metric fixation"; Campbell [24]
formalized the corrupting effect of using indicators as targets.

| Domain | Proxy metric | Underlying quality | Divergence |
|--------|-------------|-------------------|------------|
| IT support | Avg. resolution time | Critical system uptime | Server down 18.75 hrs, avg says 6.56 |
| Education | Test scores | Actual learning | Teaching to the test |
| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission |
| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability |
| Software | Velocity (story points) | Product quality | Point inflation, features half-finished |

### 7.5 Information Asymmetry

Model the system as a game between provider (P) and client (C). P observes
individual $\{C_i\}$ and chooses $\sigma$; C observes only
$\bar{C}(\sigma)$. This is a **moral hazard** problem [10]: P's optimal
strategy is to minimize the observable signal regardless of the
unobservable distribution.

The equilibrium is a **pooling equilibrium** [9]: P's reported metric
looks identical regardless of the underlying priority-weighted performance.
It is stable until C obtains access to individual $C_i$ values — via a
customer portal, a competitor's transparency, or a sufficiently painful
incident.

### 7.6 The Uncomfortable Conclusion

The honest answer to "does optimizing the unweighted mean hurt the
business?" is: **not necessarily, as long as the client never looks behind
the number**. The honest answer to "is this sustainable?" is: it is
exactly as sustainable as any system in which the seller knows more than
the buyer — stable for extended periods, then rapid collapse when the
asymmetry is punctured.

---

## 8. The Psychological Cost of Knowing

Section 7 modeled the provider as a unitary actor. But teams are composed
of individuals. When a team member understands the proof — when they
*know* the metric is synthetic, that the dashboard is theater, that the
email server is still down while they close wallpaper tickets — a new cost
appears that the equilibrium model omitted.

### 8.1 The Hidden Variable: Team Awareness

| Actor | Observes individual $C_i$ | Observes $\bar{C}$ | Understands the proof |
|-------|--------------------------|--------------------|-----------------------|
| Management | Possibly | Yes | Varies |
| Team member | **Yes** | Yes | **Yes** (in this scenario) |
| Client | No | Yes | No |

The team member has full information. They see the ticket queue. They know
the email server has been down since 7 AM. They know they are closing a
wallpaper ticket because it improves the number. And they know *why*.

### 8.2 Cognitive Dissonance Under Full Information

Cognitive dissonance [11] arises when an individual holds contradictory
cognitions. Without understanding *why*, the contradiction can be
rationalized: "management knows best." Understanding the proof removes
the ambiguity. The team member now holds:

- **Cognition A:** "I am a competent professional. My job is to solve
  important problems."
- **Cognition B:** "I am closing a wallpaper ticket while the email
  server is down, because the metric is mathematically biased (Theorem 1),
  the reordering produces zero throughput gain (Theorem 6), and the only
  beneficiary is the dashboard (Section 7). I can prove this."

The dissonance is now *load-bearing*. The available resolutions — abandon
professional identity, reject the proof, advocate for change, or leave —
each impose costs that did not exist before.

### 8.3 Self-Determination Theory: Three Needs Violated

Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs
predicting intrinsic motivation:

**Autonomy.** The metric constrains choices in a way the team member
knows is mathematically suboptimal. A worker who understands the process
is provably counterproductive cannot feel autonomous following it.

**Competence.** The metric rewards *apparent* effectiveness (low $\bar{C}$)
while being invariant to *actual* effectiveness (Theorem 6). Genuine
competence — fixing the email server first — is *punished* by the metric.

**Relatedness.** The team member knows the client's email server is down.
They could help. They are instead updating wallpaper — not because it
helps anyone, but because it helps a number. The connection between work
and human impact has been severed, and the team member can see the severed
ends.

### 8.4 Moral Injury

Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing
to prevent, bearing witness to, or learning about acts that transgress
deeply held moral beliefs" [17]. It has since been extended to business
settings [25]. The key distinction from burnout: **burnout is exhaustion
from doing too much. Moral injury is damage from doing the wrong thing.**

A team member who knows the email server is down, knows they should fix
it, closes a wallpaper ticket instead, and does so because the metric
requires it, is experiencing the structural conditions for moral injury.

### 8.5 Learned Helplessness and Metric Fatalism

Seligman's learned helplessness [14, 15] describes how exposure to
uncontrollable negative outcomes leads to passivity. The sequence:

1. The team member understands the proof: the metric is flawed.
2. They advocate for change.
3. The proposal is rejected ("the numbers are good, don't rock the boat").
4. They repeat steps 2–3 with decreasing conviction.
5. Terminal state: "The metric is what it is. I'll just close tickets."

This is not laziness. It is the rational response to a system that
punishes correct behavior and rewards incorrect behavior, when the
individual lacks power to change the system.

### 8.6 The Adversarial Selection Spiral

Combining Section 7's equilibrium with the turnover dynamic:

1. Organization adopts unweighted mean. Metric looks good (SPT).
2. Aware, competent team members experience psychological costs (8.2–8.5).
3. Those members leave. Replaced by members who do not understand the
   metric's flaws or do not care.
4. The metric continues to look good — it always does under SPT,
   regardless of team competence (Corollary 6.1).
5. Actual service quality degrades, but the metric cannot detect this
   (Corollary 9.1).
6. Return to step 2 for any remaining or newly hired members who see the
   flaw.

The metric selects *against* the people who would improve the system and
*for* the people who will not challenge it. The system stabilizes at a
lower level of competence, invisible to its own measurement apparatus.

### 8.7 The Complete Cost Model

| Section 7 (visible) | Section 8 (hidden) |
|---------------------|---------------------|
| Client satisfied (good number) | Team dissatisfied (bad reality) |
| Throughput unchanged | Discretionary effort withdrawn |
| Metric improves | Competent members leave |
| Business economy stable | Institutional competence degrades |

These operate on different timescales: the equilibrium is visible
quarterly; the competence degradation is visible over years. The complete
model is: **the metric works, and it is destructive, and the destruction
is invisible to the metric.** The metric is fresh paint on corroded rebar.

---

## 9. Manager Internalization: The Actionable Solution

Sections 2–6 say reject the metric. Section 7 says the metric works
(for the business). Section 8 says it destroys the team. In practice,
most managers cannot unilaterally change the metric. The best solution is
company-wide metric reform. The *actionable* solution is what a single
informed manager can do right now.

### 9.1 The Strategy

A manager who understands the proof can **internalize the metric's
limitations without propagating them to the team**:

1. **Schedule primarily by priority.** The team works critical tasks first.
2. **Tactically interleave small tasks.** When a small low-priority task
   can be completed without materially delaying high-priority work, do it.
   Not because the metric demands it, but because it also needs to get
   done and costs almost nothing.
3. **Never reveal the metric as the motivation.** "Knock out this quick
   one while we wait for the vendor callback on the P1" — not "we need
   to bring our average down." The team's intrinsic motivation remains
   intact (Section 8). The manager absorbs the metric-management burden.

### 9.2 Formalization

The manager's problem is a constrained optimization:

$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$
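For a handful of tasks, the constrained problem above can be solved by exhaustive search. The sketch below is illustrative only: the task names, durations, priority weights (standing in for $w(q_i)$), and the 6.5-hour target are all assumptions, and enumerating permutations is only practical for very small queues.

```python
from itertools import permutations

# Hypothetical tasks: (name, processing_hours, priority_weight).
# The weights stand in for w(q_i); every value here is illustrative.
TASKS = [
    ("email-server", 4.0, 20.0),
    ("vpn-outage", 3.0, 10.0),
    ("laptop-setup", 2.0, 3.0),
    ("wallpaper", 0.5, 1.0),
]


def evaluate(order):
    """Return (priority-weighted sum of completion times, unweighted mean)."""
    clock = 0.0
    weighted_sum = 0.0
    completion_sum = 0.0
    for _name, hours, weight in order:
        clock += hours  # completion time of this task
        weighted_sum += weight * clock
        completion_sum += clock
    return weighted_sum, completion_sum / len(order)


def best_schedule(tasks, mean_target):
    """Exhaustive search: cheapest weighted delay among orders meeting the target."""
    feasible = []
    for order in permutations(tasks):
        weighted_sum, mean = evaluate(order)
        if mean <= mean_target:
            feasible.append((weighted_sum, mean, [name for name, _, _ in order]))
    return min(feasible, key=lambda item: item[0]) if feasible else None


unconstrained = best_schedule(TASKS, float("inf"))
constrained = best_schedule(TASKS, 6.5)
print(unconstrained)
print(constrained)
```

With these assumed numbers, the unconstrained optimum runs email, VPN, wallpaper, laptop (weighted delay 186, mean 7.0 hours). Imposing the 6.5-hour target moves the wallpaper ticket ahead of the VPN outage (weighted delay 188, mean 6.375 hours). This is the "tactically interleave small tasks" move of Section 9.1 at a small priority-weighted cost.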

**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** A manager
who uses SPT *within* each priority class and priority ordering *between*
classes will produce a metric close to the SPT-optimal value — the gap
arises only from between-class inversions.

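**Proof sketch.** Within each priority class, SPT is free (all tasks have
equal priority), so the only deviation from global SPT is the between-class
ordering. Swapping an adjacent out-of-order pair changes the unweighted sum
$\sum_i C_i$ by exactly $p_{\text{large}} - p_{\text{small}}$, so the total
gap is the sum of that difference over the cross-class pairs in which the
higher-priority task is the larger one:

$$\sum_i C_i(\sigma) - \sum_i C_i(\text{SPT}) = \sum_{\text{inverted pairs}} \left(p_{\text{large}} - p_{\text{small}}\right)$$

The number of such pairs is at most the number of cross-class task pairs.
In practice, the gap is typically within 10–20% of SPT-optimal, though it
widens when high-priority tasks tend to be the largest ones. $\blacksquare$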
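The inversion accounting can be checked numerically. The sketch below compares global SPT with priority-then-SPT ordering on a hypothetical six-task queue and confirms that the gap in $\sum_i C_i$ equals the sum of $p_{\text{large}} - p_{\text{small}}$ over cross-class pairs in which the more urgent task is the larger one. The task list and class labels are assumptions chosen for illustration; the size of the gap depends entirely on the data.

```python
from itertools import combinations

# Hypothetical tasks: (processing_hours, priority_class); class 1 is most urgent.
TASKS = [(1.0, 1), (2.0, 1), (0.5, 2), (3.0, 2), (1.5, 3), (6.0, 3)]


def total_completion_time(order):
    """Sum of completion times for tasks run back to back in the given order."""
    clock = 0.0
    total = 0.0
    for hours, _cls in order:
        clock += hours
        total += clock
    return total


def global_spt(tasks):
    """Shortest task first, ignoring priority."""
    return sorted(tasks, key=lambda t: t[0])


def priority_then_spt(tasks):
    """Priority order between classes, SPT within each class."""
    return sorted(tasks, key=lambda t: (t[1], t[0]))


def cross_class_inversion_cost(tasks):
    """Sum of (larger - smaller) over pairs where the more urgent task is larger."""
    cost = 0.0
    for a, b in combinations(tasks, 2):
        urgent, other = (a, b) if a[1] < b[1] else (b, a)
        if urgent[1] != other[1] and urgent[0] > other[0]:
            cost += urgent[0] - other[0]
    return cost


spt_total = total_completion_time(global_spt(TASKS))
priority_total = total_completion_time(priority_then_spt(TASKS))
gap = priority_total - spt_total
print(spt_total, priority_total, gap, cross_class_inversion_cost(TASKS))
```

For this queue the two totals are 32 and 36 hours, and the gap of 4 hours matches the inversion cost exactly.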

### 9.3 The Manager as Information Barrier

| Layer | Sees metric | Sees priorities | Sees proof |
|-------|-----------|----------------|------------|
| Organization | Yes | Nominally | No |
| Manager | Yes | Yes | **Yes** |
| Team | No (shielded) | Yes | Irrelevant |
| Client | Yes (dashboard) | Via SLA | No |

The manager is the only actor holding all three pieces of information.
This is not manipulation — they are doing the right work in the right
order, and the metric happens to be acceptable because within-class SPT
is free.

### 9.4 The Competitive Breakdown

This strategy fails when the metric becomes **competitive between teams**.

**Case 1: Cooperative** — Teams measured for parity, not ranking. Each
manager independently uses the internalization strategy. The metric is
decorative but harmless. This is a **coordination game** with a stable
cooperative equilibrium.

**Case 2: Competitive** — Teams ranked by $\bar{C}$. This is a
**prisoner's dilemma**:

| | Team B: Priority-first | Team B: SPT |
|---|---|---|
| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) |
| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) |

The Nash equilibrium is (SPT, SPT). The internalization strategy is a
cooperative equilibrium that is **not stable under competition**.
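The equilibrium claim can be verified mechanically once the qualitative table is given numbers. The payoffs below are hypothetical ordinal rankings (4 = best, 1 = worst) chosen to match the table under a ranking regime, where "looking good" dominates each team's payoff; they are an assumption of this sketch, not measured values.

```python
from itertools import product

STRATEGIES = ("priority_first", "spt")

# Hypothetical ordinal payoffs (Team A, Team B); higher is better.
PAYOFFS = {
    ("priority_first", "priority_first"): (3, 3),  # good work on both teams
    ("priority_first", "spt"): (1, 4),             # A looks bad, B looks good
    ("spt", "priority_first"): (4, 1),             # A looks good, B looks bad
    ("spt", "spt"): (2, 2),                        # both look good, wrong work
}


def pure_nash_equilibria(payoffs, strategies):
    """Return strategy pairs where neither team gains by deviating alone."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        a_payoff, b_payoff = payoffs[(a, b)]
        a_can_improve = any(payoffs[(alt, b)][0] > a_payoff for alt in strategies)
        b_can_improve = any(payoffs[(a, alt)][1] > b_payoff for alt in strategies)
        if not a_can_improve and not b_can_improve:
            equilibria.append((a, b))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFFS, STRATEGIES)
print(equilibria)
```

Under these assumed payoffs, SPT is a dominant strategy for each team, so (SPT, SPT) is the only pure-strategy equilibrium even though both teams prefer (Priority-first, Priority-first).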

### 9.5 Scope

| Condition | Viability |
|-----------|-----------|
| Metric used for health-check / parity | **Viable** |
| Metric visible but not ranked | **Viable** |
| Metric ranked across teams | **Fragile** — requires all managers to cooperate |
| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates |
| Metric reform possible at org level | **Unnecessary** — fix the metric instead |

**The best solution is company-wide. The actionable solution is a manager
who understands this proof, shields their team from the metric, schedules
by priority, and uses SPT only within priority classes to keep the number
reasonable.**

---

# Part IV: Assessment

## 10. Devil's Advocate

Intellectual honesty requires acknowledging where the argument has limits.

### 10.1 Simplicity Has Real Value

**Argument.** The unweighted mean requires no priority weights, no
task-size estimates, no calibration.

**Assessment: True.** But the unweighted metric does not avoid assumptions;
it *hides* them by implicitly setting every weight and every size to 1.
A known-imprecise estimate of task size is still more informative than
the implicit assumption that all sizes are equal.

### 10.2 Minimizing the Number of People Waiting

**Argument.** SPT minimizes total person-hours spent waiting. If each
task represents one client, this is optimal.

**Assessment: Mathematically correct.** If you run a DMV and every
person's time is equally valuable, SPT is the right policy. It breaks
down when tasks are not 1:1 with clients, waiting cost is not uniform,
or the metric is used to evaluate teams rather than serve a literal queue.
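The optimality claim in this counterargument is easy to confirm by brute force on a small queue. The sketch below assumes five hypothetical task durations, all available at time zero on a single server, and searches every ordering for the one with the least total waiting.

```python
from itertools import permutations

# Hypothetical task durations in hours, listed in arrival (FIFO) order.
DURATIONS = [3.0, 0.5, 8.0, 1.0, 2.0]


def total_waiting_hours(order):
    """Sum of completion times when tasks run back to back in this order."""
    clock = 0.0
    total = 0.0
    for hours in order:
        clock += hours
        total += clock
    return total


# Exhaustive search over all 120 orderings of the five tasks.
best_order = min(permutations(DURATIONS), key=total_waiting_hours)
fifo_total = total_waiting_hours(DURATIONS)
spt_total = total_waiting_hours(sorted(DURATIONS))
print(best_order, spt_total, fifo_total)
```

The search returns the ascending (SPT) order, with 26.5 total person-hours of waiting against 45.0 under FIFO for this assumed queue.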

### 10.3 SPT as a Triage Heuristic

**Argument.** When task sizes cluster tightly, SPT approximates FIFO
and the unweighted mean approximates the weighted mean.

**Assessment: Correct.** The coefficient of variation $CV = \sigma_p / \bar{p}$ determines distortion severity:

| $CV$ | Task size distribution | Distortion |
|------|----------------------|------------|
| < 0.3 | Tight (call center) | Negligible |
| 0.3 – 1.0 | Moderate (mixed IT) | Moderate |
| > 1.0 | Wide (typical IT queue) | Severe |

A typical IT desk spans 15 minutes to 40+ hours ($CV > 2$). The
distortion is not an edge case — it is the default.
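A team can estimate where its own queue falls in the table with a few lines of code. The sketch below uses the population standard deviation and two hypothetical samples of task durations; the sample values are invented for illustration, and the band boundaries are taken from the table above.

```python
from statistics import mean, pstdev


def coefficient_of_variation(durations):
    """CV = population standard deviation divided by the mean."""
    return pstdev(durations) / mean(durations)


def distortion_band(cv):
    """Map a CV value to the distortion bands used in Section 10.3."""
    if cv < 0.3:
        return "negligible"
    if cv <= 1.0:
        return "moderate"
    return "severe"


# Hypothetical task durations in hours.
CALL_CENTER = [0.20, 0.25, 0.22, 0.30, 0.24, 0.27]
IT_DESK = [0.25, 0.25, 0.5, 0.5, 1.0, 1.0, 2.0, 40.0]

for label, sample in (("call center", CALL_CENTER), ("IT desk", IT_DESK)):
    cv = coefficient_of_variation(sample)
    print(f"{label}: CV = {cv:.2f} -> {distortion_band(cv)}")
```

A single 40-hour ticket among sub-two-hour tickets is enough to push the assumed IT sample above $CV = 2$.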

### 10.4 Gaming Requires Malice

**Argument.** The theorems show the metric *can* be gamed, not that it
*will* be gamed.

**Assessment: This is the strongest counterargument.** If the metric is
purely informational and never influences behavior, the gaming incentive
is absent. However, any metric reported to management, tied to OKRs, or
discussed in retrospectives will influence behavior. This is Goodhart's
Law [6, 7] — and it applies to well-intentioned teams as reliably as to
cynical ones. The drift happens organically: completing three easy tickets
"feels productive" while the metric validates the feeling.

### 10.5 When the Unweighted Mean Is Defensible

The metric is defensible **only when all four conditions hold**:

1. Task sizes are approximately uniform ($CV < 0.3$)
2. No priority differentiation (all tasks equally important)
3. Each task represents exactly one client
4. The metric is not used to evaluate, reward, or direct behavior

These conditions are rarely met in the systems where the metric is most
commonly used.

---

## 11. Related Work

This paper sits at the intersection of several literatures that have not
previously been connected.

### 11.1 Scheduling Theory and Fairness

Smith [1] established the SPT optimality result and the WSJF rule in 1956.
Conway, Maxwell, and Miller [2] provided the comprehensive textbook
treatment. The fairness of size-based scheduling policies has been debated
in computer systems scheduling: Bansal and Harchol-Balter [22] investigated
SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness
classifications against Processor-Sharing; Angel, Bampis, and Pascual [21]
measured SPT schedule quality against fair optimality criteria.

This prior work analyzes fairness in CPU and server scheduling. The present
paper applies the same mathematical results to *organizational task
management*, where the "scheduler" is a human team, the "jobs" are client
requests with business-impact priorities, and the "objective function" is
a management metric. The mechanism is identical; the consequences differ
because organizational scheduling has priority systems, client
relationships, and psychological costs that CPU scheduling does not.

### 11.2 Measurement Dysfunction

Austin [18] proved that incomplete measurement — measuring only a subset
of relevant dimensions — creates incentives to optimize the measured
dimensions at the expense of unmeasured ones, and that this effect is not
merely possible but *inevitable* when measurement is tied to rewards. His
information-asymmetry framing closely parallels Section 7. The present
paper provides the specific mathematical mechanism (Theorems 1–2) for the
case of task scheduling, and extends the argument through psychology
(Section 8) to trace the complete chain of organizational harm.

Muller [19] documented "metric fixation" across education, healthcare,
policing, and finance, providing extensive empirical evidence for the
patterns theorized in Section 7.4. Campbell [24] formalized the corrupting
effect of using indicators as targets, complementing Goodhart's original
observation [6] and Strathern's generalization [7].

Bevan and Hood [26] empirically documented gaming behaviors in the English
public health system — including the exact patterns of "hitting the target
and missing the point" described in our Section 5.2.

### 11.3 Psychological Costs of Metric Dysfunction

The application of moral injury (Shay [16], Litz et al. [17]) to business
settings has recent precedent: a 2024 *Journal of Business Ethics* study
[25] explicitly extended the construct to for-profit workplaces, finding
structural conditions similar to those described in Section 8.4. Moore
et al. [27] analyzed moral *disengagement*, the cognitive restructuring
that enables unethical behavior under organizational pressure. The present
paper addresses the complementary phenomenon: the harm to individuals who
*refuse* to disengage.

### 11.4 What Is Novel

The individual components — SPT optimality, Goodhart's Law, measurement
dysfunction, moral injury — all have precedent. The contributions of this
paper are:

1. **The conservation law (Theorem 2) used prescriptively** — as a
   constructive argument that work-weighted completion time *cannot* be
   gamed, rather than as a theoretical scheduling result.

2. **The specific proof that priority classes make the metric algebraically
   adversarial** (Theorems 8–9) — not merely empirically bad but
   structurally contradictory, with zero mutual information between the
   schedule and the priority system.

3. **The integrated chain** from mathematical proof through information
   asymmetry through psychological harm through adversarial selection
   spiral — tracing a single metric from Smith (1956) to organizational
   hollowing.

4. **The manager internalization strategy** (Section 9) with formal
   game-theoretic analysis of its stability and breakdown conditions
   under inter-team competition.

5. **The application of scheduling theory to organizational management
   critique** — proving that a commonly used team metric has specific,
   quantifiable pathologies rather than arguing from anecdote or
   general principle.

---

## 12. Conclusion

The unweighted average completion time is a **biased statistic** that:

1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
   completion time which is schedule-invariant (Theorem 2).
2. **Incentivizes starvation** of large tasks (Theorem 3).
3. **Punishes completion of aged tasks**, incentivizing abandonment
   over resolution (Theorem 6.1).
4. **Degrades client satisfaction** with zero compensating productivity
   gain (Theorem 7).
5. **Actively contradicts priority systems** by carrying zero information
   about business-impact classification (Theorem 9).
6. **Ignores priority entirely** in its scheduling recommendation,
   producing suboptimal priority-weighted delay whenever priority and
   size are not perfectly inversely correlated (Theorem 10).

A metric that can be improved by reordering work — without doing any
additional work — is measuring the scheduling policy, not the system's
capacity. When combined with a priority system, it recommends the schedule
that inflicts the most damage on the highest-priority work.
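The contrast between the two metrics can be reproduced on any small queue. The sketch below assumes the work-weighted mean takes the form $\sum_i p_i C_i / \sum_i p_i$ (the definition itself is not restated in this section) and uses four hypothetical task durations. It evaluates both metrics over every possible ordering.

```python
from itertools import permutations

# Hypothetical task durations in hours.
DURATIONS = [8.0, 3.0, 2.0, 0.5]


def completion_times(order):
    """Completion time of each task when run back to back in this order."""
    clock = 0.0
    times = []
    for hours in order:
        clock += hours
        times.append(clock)
    return times


def unweighted_mean(order):
    times = completion_times(order)
    return sum(times) / len(times)


def work_weighted_mean(order):
    """Assumed form: sum(p_i * C_i) / sum(p_i)."""
    times = completion_times(order)
    return sum(p * c for p, c in zip(order, times)) / sum(order)


unweighted = {unweighted_mean(o) for o in permutations(DURATIONS)}
weighted = {work_weighted_mean(o) for o in permutations(DURATIONS)}
print(min(unweighted), max(unweighted), len(weighted))
```

Across the 24 orderings of this assumed queue, the unweighted mean ranges from 5.5 to 11.375 hours while the work-weighted mean takes a single value: the same work, reordered, moves one number and not the other.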

When the metric is reported to clients, it creates an information asymmetry
(Section 7) whose business equilibrium is profitable but fragile. When
team members understand its flaws, it violates their intrinsic motivation
and selects for the departure of the most competent people (Section 8).
A single informed manager can partially mitigate these effects through
constrained optimization (Section 9), but this cooperative strategy is
not stable under inter-team competition.

The unweighted mean is defensible only under narrow conditions
(Section 10.5): uniform task sizes, no priorities, one-to-one client-task
mapping, and no behavioral influence. These conditions are rarely met.

**Unweighted average completion time is not a fair or accurate measurement
of task execution performance. Its adoption as a team metric will
rationally produce starvation of complex work, violation of stated
priorities, inequitable client outcomes, and the illusion of productivity
where none exists.**

The best solution is organizational metric reform. The actionable solution
is a manager who understands this proof.

---

## References

### Scheduling Theory

[1] Smith, W. E. (1956). Various optimizers for single-stage production.
*Naval Research Logistics Quarterly*, 3(1–2), 59–66.
doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106)

> Origin of the SPT optimality result (Theorem 1), the weighted completion
> time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job
> pairwise interchange (exchange argument) proof technique used throughout.

[2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of
Scheduling*. Addison-Wesley.

> Standard textbook treatment of single-machine scheduling theory,
> extending Smith's results.

[3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW.
*Operations Research*, 9(3), 383–387.
doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383)

> First rigorous proof of Little's Law. Referenced in Section 3.2 for
> queueing-theoretic context.

[4] Little, J. D. C. (2011). Little's Law as viewed on its 50th
anniversary. *Operations Research*, 59(3), 536–549.
doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941)

> Retrospective discussing scope, limitations, and common misapplications.

[5] Reinertsen, D. G. (2009). *The Principles of Product Development
Flow: Second Generation Lean Product Development*. Celeritas Publishing.
ISBN: 978-0-9844512-0-8.

> Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts.
> Mathematical foundation is Smith (1956) [1].

### Measurement and Incentives

[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K.
experience. In *Monetary Theory and Practice* (pp. 91–121). Macmillan.

> Source of Goodhart's Law: "Any observed statistical regularity will tend
> to collapse once pressure is placed upon it for control purposes."

[7] Strathern, M. (1997). 'Improving ratings': Audit in the British
university system. *European Review*, 5(3), 305–321.
doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4)

> Generalized Goodhart's Law: "When a measure becomes a target, it ceases
> to be a good measure."

### Behavioral Economics

[8] Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of
decision under risk. *Econometrica*, 47(2), 263–292.
doi:[10.2307/1914185](https://doi.org/10.2307/1914185)

> Established loss aversion. Referenced in Section 4.5.

### Game Theory and Contract Theory

[9] Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty
and the market mechanism. *The Quarterly Journal of Economics*, 84(3),
488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431)

> Information asymmetry and adverse selection. The pooling equilibrium in
> Section 7.5 is structurally analogous.

[10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell
Journal of Economics*, 10(1), 74–91.
doi:[10.2307/3003320](https://doi.org/10.2307/3003320)

> Formal treatment of moral hazard. The metric-reporting scenario in
> Section 7.5 is a moral hazard problem.

### Psychology

[11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford
University Press. ISBN: 978-0-8047-0131-0.

> Foundational theory. Referenced in Section 8.2.

[12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and
Self-Determination in Human Behavior*. Plenum Press.
ISBN: 978-0-306-42022-1.

> Original treatment of Self-Determination Theory. Referenced in
> Section 8.3.

[13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and
the facilitation of intrinsic motivation, social development, and
well-being. *American Psychologist*, 55(1), 68–78.
doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68)

> SDT overview linking need satisfaction to intrinsic motivation and
> well-being.

[14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape
traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9.
doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514)

> Original demonstration of learned helplessness. Referenced in
> Section 8.5.

[15] Seligman, M. E. P. (1975). *Helplessness: On Depression,
Development, and Death*. W. H. Freeman. ISBN: 978-0-7167-0752-3.

> Extended treatment connecting learned helplessness to human depression
> and institutional behavior.

[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the Undoing
of Character*. Atheneum / Simon & Schuster. ISBN: 978-0-689-12182-3.

> Introduced the concept of moral injury. Referenced in Section 8.4.

[17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P.,
Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war
veterans: A preliminary model and intervention strategy. *Clinical
Psychology Review*, 29(8), 695–706.
doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003)

> Formalized moral injury as a clinical construct. Definition quoted in
> Section 8.4.

### Organizational Measurement

[18] Austin, R. D. (1996). *Measuring and Managing Performance in
Organizations*. Dorset House. ISBN: 978-0-932633-36-1.

> Proved that incomplete measurement creates inevitable incentives to
> optimize measured dimensions at the expense of unmeasured ones. The
> information-asymmetry framing closely parallels Section 7. The single
> most important predecessor to this paper's argument.

[19] Muller, J. Z. (2018). *The Tyranny of Metrics*. Princeton University
Press. ISBN: 978-0-691-17495-2.

> Comprehensive treatment of "metric fixation" across education,
> healthcare, policing, and finance. Extensive empirical evidence for the
> patterns theorized in Section 7.4.

### Scheduling Fairness

[20] Shanthikumar, J. G., & Yao, D. D. (1992). Multiclass queueing
systems: Polymatroidal structure and optimal scheduling control.
*Operations Research*, 40(S2), S293–S299.

> Conservation laws in scheduling. The schedule-invariance of
> work-weighted completion time (Theorem 2) is an instance of these
> conservation laws.

[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT
schedules for fair optimality criteria? *Annals of Operations Research*,
159(1), 53–64. doi:[10.1007/s10479-007-0267-0](https://doi.org/10.1007/s10479-007-0267-0)

> Directly measures SPT schedule quality against fairness criteria.
> Closest predecessor in scheduling theory to Section 4's fairness
> analysis.

[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT
scheduling: Investigating unfairness. *ACM SIGMETRICS Performance
Evaluation Review*, 29(1), 279–290.
doi:[10.1145/384268.378792](https://doi.org/10.1145/384268.378792)

> Investigates the belief that SRPT unfairly penalizes large jobs in
> computer scheduling. Argues unfairness is smaller than believed but
> acknowledges the core tension.

[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling
policies with respect to unfairness in an M/GI/1. *ACM SIGMETRICS
Performance Evaluation Review*, 31(1), 238–249.

> Formalizes fairness definitions for scheduling policies by comparison
> to Processor-Sharing.

### Additional References

[24] Campbell, D. T. (1979). Assessing the impact of planned social
change. *Evaluation and Program Planning*, 2(1), 67–90.
doi:[10.1016/0149-7189(79)90048-X](https://doi.org/10.1016/0149-7189(79)90048-X)

> Campbell's Law: "The more any quantitative social indicator is used for
> social decision-making, the more subject it will be to corruption
> pressures and the more apt it will be to distort and corrupt the social
> processes it is intended to monitor." Complements Goodhart's Law [6].

[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study
of moral injury in business settings. *Journal of Business Ethics*.
doi:[10.1007/s10551-024-05615-0](https://doi.org/10.1007/s10551-024-05615-0)

> Extends moral injury to for-profit workplaces. Validates Section 8.4's
> application of Shay/Litz beyond military and healthcare settings.

[26] Bevan, G., & Hood, C. (2006). What's measured is what matters:
Targets and gaming in the English public health care system. *Public
Administration*, 84(3), 517–538.
doi:[10.1111/j.1467-9299.2006.00600.x](https://doi.org/10.1111/j.1467-9299.2006.00600.x)

> Empirically documents gaming behaviors including "hitting the target
> and missing the point." Provides real-world evidence for Section 5.2's
> priority-metric contradiction.

[27] Moore, C., Detert, J. R., Treviño, L. K., Baker, V. L., & Mayer,
D. M. (2012). Why employees do bad things: Moral disengagement and
unethical organizational behavior. *Personnel Psychology*, 65(1), 1–48.
doi:[10.1111/j.1744-6570.2011.01237.x](https://doi.org/10.1111/j.1744-6570.2011.01237.x)

> Analyzes moral *disengagement* — the cognitive restructuring enabling
> unethical behavior. Section 8 addresses the complementary phenomenon:
> harm to individuals who *refuse* to disengage.

---

*This proof was developed conversationally and formalized on 2026-03-28.*