From 6d3e4a5cb3344fba3b4867f49ac5d598d692fc6b Mon Sep 17 00:00:00 2001
From: Mortdecai <admin@mortdec.ai>
Date: Sat, 28 Mar 2026 19:03:12 -0400
Subject: [PATCH] Complete structural revision: reorganize, add citations,
 tighten prose

Major restructure into four parts with a clear argumentative arc:
  Part I:   Mathematical Foundation (Theorems 1-7)
  Part II:  Priority Systems (Theorems 8-11, IT example)
  Part III: Organizational Dynamics (info asymmetry, psychology, manager strategy)
  Part IV:  Assessment (devil's advocate, related work, conclusion)

Structural changes:
- Added Section 1 (Introduction) framing the contribution
- Promoted Appendices A/B to full Sections 7/8 (load-bearing content)
- Merged Little's Law as a remark in Section 3.2 (was a detour)
- Merged "When Valid" into Devil's Advocate Section 10.5
- Added Section 11 (Related Work) situating the paper
- Cleaned up "Hmm" and "Wait" language in Theorems 11/WSJF
- Renumbered all sections and cross-references
- Net reduction of 400 lines while adding new content

New citations [18-27]:
- Austin (1996) - measurement dysfunction (most important predecessor)
- Muller (2018) - The Tyranny of Metrics
- Coffman/Shanthikumar/Yao (1992) - conservation laws in scheduling
- Angel/Bampis/Pascual (2008) - SPT fairness criteria
- Bansal/Harchol-Balter (2001) - SRPT unfairness
- Wierman/Harchol-Balter (2003) - fairness classifications
- Campbell (1979) - Campbell's Law
- Ferreira et al. (2024) - moral injury in business
- Bevan/Hood (2006) - gaming in public health
- Moore (2012) - moral disengagement (complementary to our argument)

Citations woven into body: Austin referenced in Sections 4.1, 5.3;
scheduling fairness papers in Section 4.2 note; Campbell/Muller in
Section 7.4; moral injury extension in Section 8.4; all contextualized
in Related Work Section 11.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 README.md | 2013 +++++++++++++++++++++--------------------------------
 1 file changed, 805 insertions(+), 1208 deletions(-)

diff --git a/README.md b/README.md
index c6fc466..ba10bcf 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,50 @@ of genuine throughput or service quality.
 
 ---
 
-## 1. Definitions
+## 1. Introduction
+
+Many organizations measure task-execution performance by **unweighted mean
+completion time**: the average number of hours (or days) between task
+submission and task resolution, counting each task equally regardless of
+size or priority.
+
+This paper proves that this metric is not merely imprecise but structurally
+biased. It can be improved by reordering work without doing any additional
+work (Theorem 1), while a properly weighted alternative is completely
+immune to scheduling manipulation (Theorem 2). When combined with a
+priority system, the metric actively contradicts the organization's own
+priority classifications (Theorems 8 and 9).
+
+The argument proceeds in four parts:
+
+- **Part I** (Sections 2–4) establishes the mathematical foundation:
+  the unweighted mean is gameable by Shortest Processing Time (SPT)
+  scheduling, the work-weighted mean is schedule-invariant, and the
+  resulting service-quality consequences are provably negative.
+
+- **Part II** (Sections 5–6) extends the model to priority-classified
+  tasks, proves the metric becomes adversarial to the priority system,
+  and proposes weighted alternatives with a worked IT service desk example.
+
+- **Part III** (Sections 7–9) examines organizational dynamics: what
+  happens when the metric is reported to clients (information asymmetry),
+  what happens to team members who understand its flaws (psychological
+  harm), and what a single informed manager can do about it (constrained
+  optimization with game-theoretic stability analysis).
+
+- **Part IV** (Sections 10–12) presents honest counterarguments, situates
+  the work in existing literature, and concludes.
+
+The core results build on Smith's (1956) foundational scheduling theory [1],
+extended through game theory [9, 10], organizational measurement theory
+[18, 19], and psychology [11–17] to trace a complete chain from a
+mathematical proof about a specific metric to organizational outcomes.
+
+---
+
+# Part I: Mathematical Foundation
+
+## 2. Definitions
 
 Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n$.
 
@@ -28,16 +71,19 @@ $$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\s
 
 ---
 
-## 2. SPT Is Optimal for the Unweighted Statistic
+## 3. Core Results
 
-**Theorem 1.** The schedule that minimizes $\bar{C}(\sigma)$ is Shortest
-Processing Time first (SPT): sort tasks so that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.
+### 3.1 The Unweighted Mean Is Gameable
 
-**Proof (exchange argument).**
+**Theorem 1** (Smith, 1956 [1])**.** The schedule that minimizes
+$\bar{C}(\sigma)$ is Shortest Processing Time first (SPT): sort tasks so
+that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.
+
+**Proof (exchange argument [1, 2]).**
 
 Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy
-$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ be the
-start time of task $i$.
+$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$
+be the start time of task $i$.
 
 | | Task $i$ finishes | Task $j$ finishes | Sum |
 |---|---|---|---|
@@ -48,16 +94,14 @@ The change in the sum of completion times is:
 
 $$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$
 
-Every swap of a longer-before-shorter adjacent pair strictly reduces the total.
-Any non-SPT schedule contains such a pair. Repeated swaps converge to SPT.
-Therefore SPT uniquely minimizes $\bar{C}(\sigma)$. $\blacksquare$
+Every swap of a longer-before-shorter adjacent pair strictly reduces the
+total. Any non-SPT schedule contains such a pair, and repeated swaps end
+at SPT. Therefore SPT minimizes $\bar{C}(\sigma)$, uniquely up to ties in $p_i$. $\blacksquare$
 
----
+### 3.2 The Work-Weighted Mean Is Schedule-Invariant
 
-## 3. The Work-Weighted Statistic Is Schedule-Invariant
-
-**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ is
-the same for every schedule $\sigma$.
+**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$
+is the same for every schedule $\sigma$.
 
 **Proof.**
 
@@ -65,27 +109,24 @@ Expand the numerator:
 
 $$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$
 
-Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum counts
-every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:
+Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum
+counts every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:
 
 $$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$
 
-For any pair $(a, b)$ with $a \ne b$, exactly one of $\{b \preceq_\sigma a\}$
-or $\{a \prec_\sigma b\}$ holds. The diagonal terms ($a = b$) contribute $p_a^2$
-regardless of order. Therefore:
+For any pair $(a, b)$ with $a \ne b$, exactly one of
+$\{b \preceq_\sigma a\}$ or $\{a \prec_\sigma b\}$ holds. The diagonal
+terms ($a = b$) contribute $p_a^2$ regardless of order. Therefore:
 
 $$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$
 
-Now consider the complementary sum:
-
-$$\sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b$$
-
-Together the two off-diagonal sums cover all unordered pairs $\{a, b\}$:
+Together with the complementary sum, the two off-diagonal sums cover all
+unordered pairs:
 
 $$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$
 
-The right-hand side is schedule-independent. By symmetry of $p_a p_b$, both
-off-diagonal sums are equal:
+The right-hand side is schedule-independent. By symmetry of $p_a p_b$,
+both off-diagonal sums are equal:
 
 $$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$
 
@@ -100,212 +141,121 @@ $$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum
 
 is **constant across all schedules**. $\blacksquare$
 
----
+This is an instance of the conservation laws in scheduling identified by
+Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to
+measuring how long a unit of *work* waits rather than how long a *task*
+waits — the unweighted statistic counts completions rather than work,
+which is why it is gameable. (See also Little [3, 4] for the
+queueing-theoretic context, with the caveat that Little's Law applies
+directly only to steady-state systems, not to the batch case analyzed here.)
 
-## 4. Concrete Example
+### 3.3 Illustrative Example
 
 Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours.
 
-### SPT order (A first)
-
-| Task | Completion time |
-|------|----------------|
-| A | 1 |
-| B | 11 |
-
-- Unweighted mean: $(1 + 11) / 2 = 6.0$
-- Work-weighted mean: $(1 \times 1 + 10 \times 11) / 11 = 111/11 \approx 10.09$
-
-### Reverse order (B first)
-
-| Task | Completion time |
-|------|----------------|
-| B | 10 |
-| A | 11 |
-
-- Unweighted mean: $(10 + 11) / 2 = 10.5$
-- Work-weighted mean: $(10 \times 10 + 1 \times 11) / 11 = 111/11 \approx 10.09$
+| Schedule | $C_A$ | $C_B$ | Unweighted mean | Work-weighted mean |
+|----------|-------|-------|-----------------|-------------------|
+| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 |
+| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 |
 
 SPT appears **4.5 hours better** on the unweighted metric but provides
-**zero improvement** on the work-weighted metric. The apparent advantage exists
-only because the unweighted statistic lets a 1-hour task "vote" equally with
-a 10-hour task.
+**zero improvement** on the work-weighted metric. The apparent advantage
+exists only because the unweighted statistic lets a 1-hour task "vote"
+equally with a 10-hour task.
 
 ---
 
-## 5. Connection to Little's Law
+## 4. Consequences for Service Quality
 
-Little's Law states $L = \lambda W$, where $L$ is the time-averaged number
-of tasks in the system, $\lambda$ is the arrival rate, and $W$ is the
-average time a task spends in the system.
+### 4.1 Starvation of Large Tasks
 
-In a *steady-state* queueing system with fixed arrival and service rates,
-$\lambda$ and the long-run service rate are determined by the workload, not
-by scheduling policy. Little's Law then tells us that $L$ and $W$ are
-linked, but in the batch case (all $n$ tasks present at time 0), $L$ and
-$W$ are both schedule-dependent: $\bar{C} = W$, and
-$L = \sum C_i / \sum p_i$, both of which SPT minimizes.
+**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes
+unweighted mean completion time necessarily maximizes the completion time
+of the largest task.
 
-The invariance we proved in Theorem 2 is more specific: *work-weighted*
-mean completion time $\bar{C}_w$ is constant across schedules. This
-corresponds to measuring the system from the perspective of "how long does
-a unit of *work* wait" rather than "how long does a *task* wait." The
-unweighted statistic measures the latter and is gameable precisely because
-it counts completions rather than work.
-
----
-
-## 6. Consequences
-
-**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes unweighted
-mean completion time necessarily maximizes the completion time of the largest
-task relative to other schedules.
-
-**Proof.** SPT places the largest task last. Its completion time equals the
-total processing time $\sum p_i$, which is the maximum possible completion
-time for any individual task. Meanwhile, FIFO or any non-SPT order would
-allow the large task to finish earlier. $\blacksquare$
+**Proof.** SPT places the largest task last. Its completion time equals
+the total processing time $\sum p_i$, which is the maximum possible
+completion time for any individual task. Under any schedule that does not
+place the largest task last, that task completes strictly earlier.
+$\blacksquare$
 
 This creates a **starvation incentive**: rational agents optimizing the
-unweighted statistic will indefinitely defer large tasks in favor of
-small ones.
+unweighted statistic will indefinitely defer large tasks in favor of small
+ones. Austin [18] identified this general pattern — that incomplete
+measurement creates incentives to optimize the measured dimension at the
+expense of unmeasured ones — in the context of organizational performance
+management. Theorem 3 provides the specific mechanism for task scheduling.
 
-### Real-world manifestations
-
-| Domain | Gameable metric | Perverse outcome |
-|--------|----------------|------------------|
-| Support desks | Tickets closed / day | Complex issues ignored |
-| Sprint planning | Story count velocity | Work split into trivial pieces |
-| Emergency rooms | Average wait time | Critical patients deprioritized |
-| Academic publishing | Papers per year | Incremental work favored over deep research |
-
----
-
-## 7. Impact on Client Satisfaction and Team Productivity
-
-The preceding theorems are not merely abstract. They have direct, provable
-consequences for client satisfaction and team productivity when a team adopts
-unweighted mean completion time as its performance metric.
-
-### 7.1 Defining Client Satisfaction: The Slowdown Ratio
-
-A client submitting a task of size $p_i$ has an expectation anchored to that
-size. The natural measure of their experience is the **slowdown ratio**:
-
-$$S_i = \frac{C_i}{p_i}$$
-
-This is the factor by which the client's wait exceeds the task's inherent
-processing time. A slowdown of 1 means no queuing delay at all. A slowdown
-of 10 means the client waited 10x longer than the work itself required.
-
-Client satisfaction is inversely related to slowdown: a client who waits
-2x their task size is more satisfied than one who waits 20x, regardless of
-the absolute times involved.
+### 4.2 Maximum Completion Time for the Largest Task
 
 **Theorem 4 (SPT Uniquely Maximizes Completion Time of the Largest Task).**
 Among all schedules, SPT is the unique policy that assigns the maximum
 possible completion time ($\sum p_i$) to the largest task.
 
-**Proof.**
-
-SPT sorts tasks in ascending order of $p_i$, placing the largest task
-$p_{\max}$ in the last position. The last task in any schedule has
-completion time $\sum_{i=1}^{n} p_i$, which is the maximum completion time
-any individual task can receive. Therefore, under SPT:
-
-$$C_{\max\text{-task}}^{\text{SPT}} = \sum_{i=1}^{n} p_i$$
-
-Under any schedule that does not place $p_{\max}$ last, the largest task
-completes strictly before $\sum p_i$. SPT is the unique schedule (among
-those ordered by processing time) that assigns this worst-case completion
-time to the largest task.
-
-Note on slowdown: SPT actually *compresses* slowdown ratios ($S_i = C_i / p_i$)
-because larger tasks in later positions have large denominators that absorb
-the accumulated sum. For example, with tasks $[1, 5, 10]$:
-
-- SPT: slowdowns $[1, 1.2, 1.6]$ — low variance
-- LPT: slowdowns $[1, 3, 16]$ — high variance
-
-SPT's harm to large-task clients is not visible in the slowdown ratio. It is
-visible in **absolute completion time**: the largest task finishes last, at
-$\sum p_i$, while under any other ordering it finishes earlier. $\blacksquare$
+**Proof.** SPT sorts tasks in ascending order of $p_i$, placing the largest
+task $p_{\max}$ in the last position. The last task in any schedule has
+completion time $\sum_{i=1}^{n} p_i$, which is the maximum any individual
+task can receive. Under any schedule that does not place $p_{\max}$ last,
+it completes strictly before $\sum p_i$; among schedules ordered by processing time, only SPT places it last. $\blacksquare$
 
 **Corollary 4.1.** A team optimizing unweighted mean completion time will
 systematically deliver the worst experience to clients with the most
-complex needs.
+complex needs. This is not a side effect — it is the *mechanism* by which
+the metric improves.
 
-This is not a side effect — it is the *mechanism* by which the metric improves.
-The only way to lower the unweighted average is to complete more small tasks
-early, which necessarily means completing large tasks later. The metric
-improves *because* high-effort clients are deprioritized.
+**Note on slowdown ratios.** SPT actually *compresses* slowdown ratios
+($S_i = C_i / p_i$) because larger tasks in later positions have large
+denominators that absorb the accumulated sum. For example, with tasks
+$[1, 5, 10]$: SPT gives slowdowns $[1, 1.2, 1.6]$ (low variance) while
+LPT gives $[1, 3, 16]$ (high variance). SPT's harm to large-task clients
+is not visible in the slowdown ratio — it is visible in **absolute
+completion time**. This distinction is important: the scheduling fairness
+literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through
+slowdown-based measures, which can obscure the absolute-delay burden
+proved below.
 
-### 7.2 The Absolute Delay Burden
+### 4.3 Delay Concentration
 
-The slowdown ratio $S_i = C_i / p_i$ might suggest SPT is *fair* — it
-compresses slowdown variance by giving everyone a ratio close to 1. But
-this obscures the real cost. The correct measure of burden is the
-**absolute delay** experienced by each task:
+**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT,
+the largest task bears the maximum possible absolute delay, strictly more than under any schedule that does not place it last.
 
-$$\Delta_i = C_i - p_i$$
-
-This is the time a task spends waiting for other tasks, independent of its
-own size. Under any sequential schedule, the total delay across all tasks
-is schedule-dependent (it equals $\sum C_i - \sum p_i$), and SPT minimizes
-this total. But the *distribution* of delay matters.
-
-**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, the
-largest task bears more absolute delay than under any other schedule.
-
-**Proof.** Under SPT, the largest task is in position $n$ with:
+**Proof.** Define absolute delay as $\Delta_i = C_i - p_i$ (time spent
+waiting, independent of own size). Under SPT, the largest task is in
+position $n$ with:
 
 $$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$
 
 This is the sum of all other tasks' processing times — the maximum possible
 delay for any single task. Under any schedule where the largest task is not
-last, its delay is strictly less than $\sum_{i \ne \max} p_i$.
+last, its delay is strictly less. Meanwhile, SPT gives the smallest task
+zero delay ($\Delta_1^{\text{SPT}} = 0$). The entire queuing burden is
+shifted from small tasks to large tasks. $\blacksquare$
 
-Meanwhile, SPT gives the smallest task zero delay ($\Delta_1^{\text{SPT}} = 0$).
-The entire queuing burden is shifted from small tasks to large tasks.
-$\blacksquare$
+SPT minimizes *total* delay (good for aggregate efficiency) by
+concentrating delay onto the tasks best able to absorb it in slowdown-ratio
+terms. But in absolute terms — hours spent waiting — the largest task bears
+the full weight.
 
-The tension is this: SPT minimizes total delay (good for aggregate
-efficiency) by concentrating delay onto the tasks best able to "absorb" it
-in slowdown-ratio terms. But in absolute terms — hours spent waiting — the
-largest task bears the full weight. If that task represents a critical
-business need, the absolute delay, not the ratio, determines the damage.
-
-### 7.3 Productivity Is Not Improved
+### 4.4 Throughput Invariance
 
 **Theorem 6 (Throughput Invariance).** Total work completed over any time
 horizon $T$ is identical under all scheduling policies.
 
-**Proof.** The executor processes work at a fixed rate. Over time $T$, the
-total work completed is:
-
-$$W(T) = \sum_{\{i : C_i \le T\}} p_i + \text{(partial progress on current task)}$$
-
-In the non-preemptive case (tasks run to completion once started), $W(T)$ may
-vary slightly at the boundary depending on which task is in progress at time
-$T$. However, over any horizon $T \ge \sum p_i$ (i.e., long enough to
-complete all tasks), the total work done is exactly $\sum p_i$ regardless
-of order.
-
-For the steady-state case with ongoing arrivals, the long-run throughput is
-determined by the service rate $\mu$ and is completely independent of
-scheduling:
+**Proof.** The executor processes work at a fixed rate and is never idle
+while tasks remain, so the work done by time $T$ (counting partial progress
+on the task in service) is $\min(T, \sum p_i)$ in processing-time units,
+regardless of order. For the steady-state case with ongoing arrivals, the
+long-run throughput is determined by the service rate $\mu$ and is independent of scheduling:
 
 $$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$
 
 $\blacksquare$
 
 **Corollary 6.1.** A team that switches from any scheduling policy to SPT
-will observe an improvement in unweighted mean completion time with
-**zero change in actual throughput**.
+will observe an improvement in unweighted mean completion time with **zero
+change in actual throughput**. The metric improves. The output does not.
 
-The metric improves. The output does not.
-
-### 7.4 The Compound Effect: Satisfaction Down, Productivity Flat
+### 4.5 The Compound Effect
 
 Combining Theorems 4, 5, and 6:
 
@@ -315,26 +265,19 @@ Combining Theorems 4, 5, and 6:
 | Delay for small tasks | Minimized — approaches zero (SPT) |
 | Delay for large tasks | **Maximized** — bears all queuing burden (Theorem 5) |
 | Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) |
-| Overall perceived quality of service | **Net negative** (see below) |
 
 The net effect on perceived quality is negative because:
 
-1. **Loss aversion is asymmetric.** A client whose 100-hour task is
-   deprioritized to last experiences a large, salient negative. A client
-   whose 1-hour task moves from position 5 to position 1 experiences a
-   small, often unnoticed positive. The absolute dissatisfaction created
-   exceeds the absolute satisfaction gained.
+1. **Loss aversion is asymmetric** [8]. A client whose 100-hour task is
+   deprioritized experiences a large, salient negative. A client whose
+   1-hour task is expedited experiences a small, often unnoticed positive.
 
 2. **High-effort tasks correlate with high-value clients.** Large tasks
    are disproportionately likely to come from major clients, complex
-   contracts, or critical business needs. Systematically giving these
-   clients the worst experience is anti-correlated with revenue and
-   retention.
+   contracts, or critical business needs.
 
 3. **Starvation compounds.** In a continuous system (Theorem 3), large
-   tasks are not merely delayed — they may be **indefinitely deferred**
-   as new small tasks keep arriving. The affected client's satisfaction
-   does not merely decrease; it collapses entirely.
+   tasks may be **indefinitely deferred** as new small tasks keep arriving.
 
 **Theorem 7 (The Core Result).** For a team processing tasks of non-uniform
 size, adopting unweighted mean completion time as a performance metric:
@@ -345,38 +288,23 @@ size, adopting unweighted mean completion time as a performance metric:
 (c) **Concentrating all queuing delay** onto the largest tasks while
     eliminating delay for the smallest (Theorem 5).
 
-This is not a tradeoff — there is no compensating benefit on the productivity
-side. The metric creates a pure transfer of service quality from high-effort
-clients to low-effort clients, with no net work gained.
-
-**A team using unweighted mean completion time as its performance metric
-will, under rational optimization, simultaneously fail to improve
-productivity and systematically degrade the experience of its most
-demanding clients.** $\blacksquare$
+This is not a tradeoff. The metric creates a pure transfer of service
+quality from high-effort clients to low-effort clients, with no net work
+gained. $\blacksquare$
 
 ---
 
-## 8. When Unweighted Mean Completion Time Is Valid
+# Part II: Priority Systems
 
-For completeness: the unweighted metric is appropriate **if and only if**
-all tasks are approximately equal in size ($p_i \approx p_j$ for all $i, j$).
-In this case, the work-weighted and unweighted statistics converge, SPT and
-FIFO produce similar schedules, and slowdown ratios are naturally equal.
+## 5. Breakdown Under Priority Classification
 
-The pathology arises specifically from **variance in task size**. The greater
-the variance, the greater the distortion, and the more damage the metric
-causes when optimized.
+The preceding sections proved that unweighted mean completion time is
+biased when tasks vary in size. We now show that introducing a **priority
+system** — as virtually all real teams use — causes the metric to become
+not merely biased but **actively adversarial** to the organization's stated
+goals.
 
----
-
-## 9. Complete Breakdown Under Priority Classification
-
-The preceding sections proved that unweighted mean completion time is biased
-when tasks vary in size. We now show that introducing a **priority system** —
-as virtually all real teams use — causes the metric to become not merely
-biased but **actively adversarial** to the organization's stated goals.
-
-### 9.1 Extended Model: Tasks With Priority
+### 5.1 Extended Model: Tasks With Priority
 
 Let each task $i$ have processing time $p_i$ and a priority class
 $q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and
@@ -388,213 +316,140 @@ The specific weights are illustrative; the results hold for any strictly
 decreasing weight function. The key property is that priority is assigned
 by **business impact**, not by task size.
 
-### 9.2 The Metric Contradicts the Priority System
+### 5.2 The Metric Contradicts the Priority System
 
 **Theorem 8 (Priority-Size Inversion).** When priority is independent of
-task size, the schedule that minimizes unweighted mean completion time (SPT)
-will, in expectation, complete low-priority tasks before high-priority tasks
-of greater size.
+task size, the schedule that minimizes unweighted mean completion time
+(SPT) will, in expectation, complete low-priority tasks before
+high-priority tasks of greater size.
 
-**Proof.**
-
-SPT orders tasks by $p_i$ ascending, regardless of $q_i$. Consider two tasks:
+**Proof.** SPT orders tasks by $p_i$ ascending, regardless of $q_i$.
+Consider two tasks:
 
 - Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage)
 - Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix)
 
-SPT schedules B before A. The unweighted mean completion time for this pair:
+SPT schedules B before A. The unweighted mean for this pair:
 
-$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5$$
-
-The priority-respecting order (A before B):
-
-$$\bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$
+$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$
 
 The metric declares SPT nearly **twice as good** — despite completing a
-cosmetic fix while a server outage burns for an additional 0.5 hours.
+cosmetic fix while a server outage burns.
 
-In general, for $n$ tasks where priority $q_i$ is statistically independent
-of processing time $p_i$ (a reasonable assumption, since priority reflects
-business impact while processing time reflects technical complexity):
+In general, when $q_i$ is statistically independent of $p_i$, SPT's
+ordering has **zero correlation** with priority. In practice, Critical
+tasks (outages, security incidents, data loss) often require more work
+than Low tasks, so the metric is plausibly **anti-correlated** with the
+priority system. $\blacksquare$
 
-$$\text{Corr}(p_i, q_i) \approx 0$$
+### 5.3 Information Destruction
 
-SPT's ordering is determined entirely by $p_i$. The expected position of a
-task in the SPT schedule has **zero correlation** with its priority. A
-Critical task is equally likely to be scheduled first or last.
-
-More precisely: the expected fraction of Critical tasks in the bottom half
-of the SPT schedule equals the fraction of Critical tasks whose processing
-time exceeds the median. In practice, Critical tasks (outages, security
-incidents, data loss) often require more work, so this fraction exceeds 50%.
-The metric is not merely uncorrelated with priority — it is plausibly
-**anti-correlated**. $\blacksquare$
-
-### 9.3 Dimensionality Collapse
-
-The unweighted mean completion time reduces a three-dimensional task
-$(p_i, q_i, C_i)$ to a one-dimensional signal ($C_i$), then averages
-that signal uniformly. This discards two of the three dimensions:
-
-1. **Priority ($q_i$) is completely ignored.** A critical task and a
-   cosmetic task contribute identically to the mean.
-2. **Size ($p_i$) is implicitly inverted.** Small tasks are rewarded with
-   early completion, large tasks are punished — regardless of their
-   importance.
+The unweighted mean reduces a three-dimensional task $(p_i, q_i, C_i)$ to
+a one-dimensional signal ($C_i$), then averages uniformly. This discards
+priority entirely and implicitly inverts size.
 
 **Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual
-information between the schedule's implicit priority ranking (position in
-schedule) and the actual priority assignment $q_i$. For SPT:
+information between the schedule's implicit priority ranking (position)
+and the actual priority assignment $q_i$. For SPT:
 
 $$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$
 
-**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and $q_i$
-are independent, knowing a task's position in the SPT schedule provides
-zero information about its priority. The schedule is statistically
-independent of the priority system.
-
-Contrast this with a priority-first schedule, where $I > 0$ by construction.
-$\blacksquare$
+**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and
+$q_i$ are independent, knowing a task's position in the SPT schedule
+provides zero information about its priority. $\blacksquare$
 
 **Corollary 9.1.** A team that optimizes unweighted mean completion time
 is operating a scheduling system that carries zero information about its
 own priority classification. The priority field in their ticketing system
 is, with respect to execution order, decorative.
 
-### 9.4 Quantifying the Damage: Priority-Weighted Delay Cost
+This is an instance of what Austin [18] calls the fundamental problem of
+incomplete measurement: when the measurement system captures only a subset
+of the relevant dimensions, optimizing the measurement systematically
+degrades the unmeasured dimensions.
+
+### 5.4 Priority-Weighted Delay Cost
 
 Define the **priority-weighted delay cost** of a schedule:
 
 $$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$
 
-This measures the total business-impact-weighted time spent waiting.
+**Theorem 10 (SPT and Priority-Weighted Delay Cost).** The optimal
+schedule for minimizing $D(\sigma)$ is WSJF: order by $w(q_i)/p_i$
+descending [1, 5]. SPT's ordering (by $1/p_i$ descending) ignores
+priority entirely and produces strictly higher $D$ than WSJF whenever
+it places a task with lower $w(q_i)/p_i$ ahead of one with higher.
 
-**Theorem 10 (SPT and Priority-Weighted Delay Cost).**
-The optimal schedule for minimizing priority-weighted delay cost $D(\sigma)$
-is WSJF: order by $w(q_i)/p_i$ descending. SPT's ordering — by $1/p_i$
-descending — ignores priority entirely and produces higher $D$ than
-priority-respecting alternatives when priority is correlated with task size.
-
-**Proof.** By the standard exchange argument (as in Theorem 1), swapping
-adjacent tasks $i, j$ in a schedule changes $D$ by:
+**Proof.** By the exchange argument, swapping adjacent tasks $i, j$
+changes $D$ by:
 
 $$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$
 
-The swap improves $D$ when $\Delta D > 0$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
-but $j$ is scheduled after $i$. Therefore the optimal order is decreasing
-$w(q_i)/p_i$ — this is the WSJF rule.
+The swap improves $D$ when $w(q_j)/p_j > w(q_i)/p_i$ but $j$ is
+scheduled after $i$. Therefore the optimal order is decreasing
+$w(q_i)/p_i$ — the WSJF rule. SPT corresponds to WSJF only when
+$w(q_i) = \text{const}$ (all tasks have equal priority).
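+
+The exchange step can be spelled out as follows. This is a sketch: $t$
+(the start time of the earlier task) and $D'$ (the cost after the swap)
+are symbols introduced here for illustration. With $i$ immediately
+before $j$:
+
+$$C_i = t + p_i, \qquad C_j = t + p_i + p_j$$
+
+After the swap, $C_j' = t + p_j$ and $C_i' = t + p_i + p_j$. Every other
+task's completion time is unchanged, so
+
+$$D - D' = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$
+
+which is positive exactly when $w(q_j)/p_j > w(q_i)/p_i$.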
 
-SPT orders by $p_i$ ascending (equivalently, $1/p_i$ descending), which
-corresponds to WSJF only when $w(q_i) = \text{const}$ — i.e., when all
-tasks have equal priority.
-
-**Example.** Two tasks: Critical ($w = 8$, $p_H = 10$) and Low ($w = 1$, $p_L = 1$).
-
-WSJF scores: Critical = $8/10 = 0.8$, Low = $1/1 = 1.0$.
-
-WSJF places the Low task first (higher $w/p$), same as SPT. Here, SPT and
-WSJF agree because the Low task's tiny size dominates despite its low weight.
-
-Now consider: Critical ($w = 8$, $p_H = 3$) and Low ($w = 1$, $p_L = 2$).
-
-WSJF scores: Critical = $8/3 = 2.67$, Low = $1/2 = 0.5$.
-
-WSJF places Critical first. SPT places Low first (smaller $p$). The costs:
+**Example.** Critical ($w = 8$, $p = 3$) and Low ($w = 1$, $p = 2$):
 
 - SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$
 - WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$
 
-SPT incurs 45% more priority-weighted delay because it ignores the 8x
-priority weight of the Critical task.
-
-In general, SPT diverges from WSJF — and produces suboptimal $D$ — whenever
-priority and task size are not perfectly inversely correlated. In practice,
-Critical tasks tend to be larger (outages, security incidents), making the
-divergence systematic rather than occasional. $\blacksquare$
+SPT incurs about 45% more priority-weighted delay cost
+($42/29 \approx 1.45$). In practice, Critical tasks tend to be larger
+(outages, security incidents), making the divergence systematic.
+$\blacksquare$
 
 ---
 
-## 10. A Proposed Solution: Priority-Weighted Completion Score
+## 6. Proposed Solutions
 
-### 10.1 The Metric
+### 6.1 Priority-Weighted Metrics
 
 Replace unweighted mean completion time with the **Priority-Weighted
 Completion Score (PWCS)**:
 
 $$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$
 
-This is the priority-weighted mean slowdown ratio. It measures:
+This is the priority-weighted mean slowdown ratio. It measures how long
+each task waited relative to its size, weighted by how much that task
+mattered. Lower is better.
 
-- **How long each task waited relative to its size** (the slowdown $C_i / p_i$),
-  weighted by
-- **How much that task mattered** (the priority weight $w(q_i)$).
+**Properties:**
 
-Lower is better. A PWCS of 1.0 means every task was completed instantly
-with zero queuing delay. A PWCS of 3.0 means the average task waited 3x
-its processing time, weighted by importance.
+1. **Priority-respecting.** Delays to Critical tasks cost 8x more than
+   delays to Low tasks.
+2. **Size-fair.** Uses slowdown ratio $C_i / p_i$, so large tasks are not
+   penalized for being large.
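+3. **Not gameable by size alone.** Each delay is weighted by priority, so
+   processing time alone does not determine the best order. The $1/p_i$
+   normalization still favors short tasks, so PWCS should be read
+   alongside the per-class metrics in Section 6.4.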
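+4. **Reduces to the unweighted mean when tasks are uniform.** If all
+   tasks share one priority and one size $p$, PWCS equals $\bar{C}/p$,
+   the unweighted mean completion time in units of the common task size.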
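+
+One caveat can be checked with the same exchange argument used for
+Theorem 10. The following is a sketch, with
+$N(\sigma) = \sum_i w(q_i) \cdot C_i / p_i$ introduced here as shorthand
+for the PWCS numerator and $N'$ for its value after the swap. If task $i$
+runs immediately before task $j$, swapping them delays $i$ by $p_j$ and
+advances $j$ by $p_i$, so
+
+$$N - N' = w(q_j) \cdot \frac{p_i}{p_j} - w(q_i) \cdot \frac{p_j}{p_i}$$
+
+which is positive exactly when $w(q_j)/p_j^2 > w(q_i)/p_i^2$. Under these
+assumptions the PWCS-minimizing order is decreasing $w(q_i)/p_i^2$:
+within a single priority class that is still shortest-first, and across
+classes a small enough task can outrank a higher-priority one.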
 
-### 10.2 Properties of PWCS
+### 6.2 Optimal Policy: WSJF
 
-**Property 1: Priority-respecting.** PWCS penalizes delays to high-priority
-tasks more heavily than low-priority tasks. A 2-hour delay to a Critical
-task costs 8x more than the same delay to a Low task.
+**Theorem 11.** The schedule minimizing the priority-weighted completion
+time $\text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i)$ processes
+tasks in order of decreasing $w(q_i)/p_i$ — the **Weighted Shortest Job
+First (WSJF)** rule [1, 5].
 
-**Property 2: Size-fair.** By using the slowdown ratio $C_i / p_i$ rather
-than raw completion time $C_i$, the metric does not inherently penalize
-large tasks for being large. A 40-hour task that waits 80 hours contributes
-the same slowdown (2.0) as a 1-hour task that waits 2 hours.
+**Proof.** By the exchange argument (as in Theorem 10), the swap of
+adjacent tasks $i, j$ improves PWCT when $w(q_j)/p_j > w(q_i)/p_i$ but
+$j$ is scheduled after $i$. The optimal order is therefore decreasing
+$w(q_i)/p_i$. $\blacksquare$
 
-**Property 3: Not gameable by SPT.** Because the metric weights by priority
-and normalizes by task size, reordering tasks by processing time does not
-systematically improve the score. The optimal strategy is to minimize
-slowdown for high-priority tasks — i.e., to **actually respect the priority
-system**.
+Within a priority class, this reduces to SPT (shortest first). Across
+classes, a Critical 4-hour task ($w/p = 2.0$) beats a Low 1-hour task
+($w/p = 1.0$).
 
-**Property 4: Reduces to unweighted mean when tasks are uniform.** If all
-tasks have equal priority and equal size, PWCS equals the unweighted mean
-completion time divided by the common task size. It is a strict
-generalization.
+**Practical caveat.** Pure WSJF can place tiny Low-priority tasks ahead
+of large Critical tasks (a 15-minute Low task has $w/p = 1/0.25 = 4.0$,
+beating a 6-hour Critical at $w/p = 8/6 = 1.33$). In practice, this is
+mitigated by enforcing **strict priority-class ordering** and applying
+WSJF only *within* each class.
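+
+To make the trade-off concrete, consider a queue holding only those two
+tasks, both available at time zero (an illustrative assumption, not a
+case taken from the ticket data):
+
+$$D_{\text{pure WSJF}} = 1 \cdot 0.25 + 8 \cdot 6.25 = 50.25$$
+
+$$D_{\text{class-first}} = 8 \cdot 6 + 1 \cdot 6.25 = 54.25$$
+
+Strict class ordering costs 4 extra priority-weighted hours (about 8%)
+and finishes the Critical task 15 minutes sooner. It is therefore a
+policy choice that gives up some of the optimal $D$ in exchange for
+predictable handling of Critical work, not a consequence of Theorem 11.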
 
-### 10.3 Optimal Policy for PWCS
+### 6.3 Applied Example: IT Service Desk
 
-**Theorem 11.** The schedule minimizing PWCS processes tasks in order of
-decreasing $w(q_i) / p_i$ — highest priority first, breaking ties by
-shortest processing time within the same priority class.
-
-**Proof (exchange argument, as in Theorem 1).**
-
-Consider adjacent tasks $i, j$ with $i$ before $j$. Each task's contribution
-to the PWCS numerator depends on the completion times of both. Swapping $i$
-and $j$:
-
-The change in the weighted slowdown sum is proportional to:
-
-$$w(q_i) \cdot \frac{p_j}{p_i} - w(q_j) \cdot \frac{p_i}{p_j}$$
-
-The swap improves PWCS when this quantity is positive, i.e., when:
-
-$$\frac{w(q_i)}{p_i^2} > \frac{w(q_j)}{p_j^2}$$
-
-Hmm — this doesn't simplify as cleanly due to the ratio structure. Let
-us instead consider the more practical **priority-weighted completion time**:
-
-$$\text{PWCT}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot C_i}{\sum_{i=1}^{n} w(q_i)}$$
-
-For PWCT, the exchange argument gives: swap improves the score when
-$w(q_j) \cdot p_i > w(q_i) \cdot p_j$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
-but $j$ is scheduled after $i$. The optimal order is therefore decreasing
-$w(q_i)/p_i$, which is the **Weighted Shortest Job First (WSJF)** rule:
-
-$$\text{Schedule by: } \frac{w(q_i)}{p_i} \text{ descending}$$
-
-This means: within a priority class, do short tasks first; across priority
-classes, a Critical 8-hour task ($w/p = 8/8 = 1.0$) ties with a Low 1-hour
-task ($w/p = 1/1 = 1.0$) — but a Critical 4-hour task ($w/p = 8/4 = 2.0$)
-beats both. $\blacksquare$
-
-### 10.4 Applied Example: IT Service Desk
-
-Consider an IT team with the following ticket queue on a Monday morning:
+Consider an IT team with the following ticket queue:
 
 | Ticket | Priority | Type | Est. Hours |
 |--------|----------|------|-----------|
@@ -607,46 +462,23 @@ Consider an IT team with the following ticket queue on a Monday morning:
 | T7 | P2 (High) | Printer fleet offline | 2 |
 | T8 | P4 (Low) | Archive old shared drive folder | 0.25 |
 
-**SPT order (optimizing unweighted mean):** T8, T4, T5, T3, T7, T6, T2, T1
+**SPT order** (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1
 
-| Position | Ticket | Priority | Hours | Completion | Slowdown |
-|----------|--------|----------|-------|------------|----------|
+| Pos | Ticket | Priority | Hours | Completion | Slowdown |
+|-----|--------|----------|-------|------------|----------|
 | 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
 | 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
 | 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
 | 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
 | 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
 | 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
-| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.1875 |
+| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 |
 | 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |
 
-- **Unweighted mean completion:** $(0.25 + 0.75 + 1.75 + 3.75 + 5.75 + 8.75 + 12.75 + 18.75) / 8 = 6.5625$ hours
-- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 306/30 = 10.2$ hours
-- Email server is down for **18.75 hours**. Database backups fail for **8.75 hours**.
+**Practical WSJF** (priority-class-first, SPT within class):
 
-**WSJF order (optimizing PWCT by $w(q)/p$ descending):**
-
-| Ticket | Priority | Hours | $w/p$ |
-|--------|----------|-------|-------|
-| T6 | P1 Crit | 3 | 8/3 = 2.667 |
-| T8 | P4 Low | 0.25 | 1/0.25 = 4.0 |
-| T5 | P3 Med | 1 | 2/1 = 2.0 |
-| T4 | P4 Low | 0.5 | 1/0.5 = 2.0 |
-| T1 | P1 Crit | 6 | 8/6 = 1.333 |
-| T7 | P2 High | 2 | 4/2 = 2.0 |
-| T2 | P2 High | 4 | 4/4 = 1.0 |
-| T3 | P3 Med | 2 | 2/2 = 1.0 |
-
-Wait — T8 has $w/p = 4.0$, the highest. That places a Low-priority task
-first, which feels wrong. This reveals an important practical point:
-**pure WSJF can still be gamed by tiny tasks** because their small $p$
-inflates the ratio. In practice, this is mitigated by enforcing strict
-priority class ordering and only applying WSJF *within* priority classes.
-
-**Practical WSJF (priority-class-first, then $w/p$ within class):**
-
-| Position | Ticket | Priority | Hours | Completion |
-|----------|--------|----------|-------|------------|
+| Pos | Ticket | Priority | Hours | Completion |
+|-----|--------|----------|-------|------------|
 | 1 | T6 (backups) | P1 Crit | 3 | 3 |
 | 2 | T1 (email) | P1 Crit | 6 | 9 |
 | 3 | T7 (printers) | P2 High | 2 | 11 |
@@ -656,428 +488,74 @@ priority class ordering and only applying WSJF *within* priority classes.
 | 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
 | 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |
 
-- **Unweighted mean completion:** $(3 + 9 + 11 + 15 + 16 + 18 + 18.25 + 18.75) / 8 = 13.625$ hours
-- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 305/30 = 10.167$ hours
-- Email server restored in **9 hours**. Backups fixed in **3 hours**.
-
-### Comparison
+**Comparison:**
 
 | Metric | SPT | Practical WSJF | Winner |
 |--------|-----|----------------|--------|
-| Unweighted mean completion | **6.5625 hrs** | 13.625 hrs | SPT |
-| Priority-weighted completion (PWCT) | 10.2 hrs | **10.167 hrs** | WSJF |
+| Unweighted mean completion | **6.56 hrs** | 13.63 hrs | SPT |
+| P1 mean time to resolution | 13.75 hrs | **6 hrs** | WSJF |
+| P2 mean time to resolution | **9.25 hrs** | 13 hrs | SPT |
 | Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
 | Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
-| Time to fix printers | 5.75 hrs | **11 hrs** | SPT |
 | Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |
 
-The PWCT values are nearly identical (10.2 vs 10.167) because PWCT — as a
-*weighted average of completion times* — is dampened by the fact that total
-work is constant. **PWCT is not the right metric for this comparison.** The
-real difference is visible in the individual completion times of critical
-tasks: the email server is down for 18.75 hours under SPT versus 9 hours
-under WSJF. The database backups fail for 8.75 hours versus 3 hours.
+The aggregate priority-weighted completion times are nearly identical
+(PWCT: 10.2 vs 10.17 hours) because aggregation hides distributional
+damage. The real difference is in the **per-priority-class** breakdown:
+P1 mean time to resolution is 13.75 hours under SPT versus 6 hours under
+practical WSJF. The email server is down for 18.75 hours versus 9, and
+the database backups fail for 8.75 hours versus 3.
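+
+As a check on these figures, the arithmetic can be reproduced from the
+completion columns. This assumes the weights $w = 8, 4, 2, 1$ for P1
+through P4, which sum to 30 across the eight tickets:
+
+$$\bar{C}_{\text{P1}}^{\text{SPT}} = \frac{8.75 + 18.75}{2} = 13.75, \qquad \bar{C}_{\text{P1}}^{\text{WSJF}} = \frac{3 + 9}{2} = 6$$
+
+$$\text{PWCT}_{\text{SPT}} = \frac{1(0.25) + 1(0.75) + 2(1.75) + 2(3.75) + 4(5.75) + 8(8.75) + 4(12.75) + 8(18.75)}{30} = \frac{306}{30} = 10.2$$
+
+$$\text{PWCT}_{\text{WSJF}} = \frac{8(3) + 8(9) + 4(11) + 4(15) + 2(16) + 2(18) + 1(18.25) + 1(18.75)}{30} = \frac{305}{30} \approx 10.17$$
+
+The two weighted sums differ by a single priority-weighted hour, while
+the P1 class means differ by more than a factor of two.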
 
-The better comparison metric is the **priority-weighted delay cost**
-$D = \sum w(q_i) \cdot C_i$ (not normalized):
-
-- SPT: $D = 306$ priority-weighted hours
-- Practical WSJF: $D = 305$ priority-weighted hours
-
-Again, the aggregate is similar. The damage from SPT is not in the
-aggregate — it is in the *distribution*: critical systems burn while
-cosmetic tasks are polished. A metric that cannot distinguish between these
-two schedules — despite one leaving the email server down for twice as long
-— is not measuring what matters.
-
-The unweighted metric, however, confidently reports SPT as **more than twice
-as efficient** (6.56 vs 13.63), rewarding the team that updated desktop
+The unweighted metric confidently reports SPT as **more than twice as
+efficient** (6.56 vs 13.63), rewarding the team that updated desktop
 wallpaper while the email server was on fire.
 
-### 10.5 Recommended Metric Suite
+### 6.4 Recommended Metric Suite
 
-The IT example reveals that even priority-weighted aggregate metrics (PWCT)
-can fail to distinguish good from bad schedules, because aggregation hides
-distributional damage. No single metric suffices. A complete measurement
-system for a priority-based team should track:
+Even priority-weighted aggregate metrics can fail to distinguish good from
+bad schedules, because aggregation hides distributional damage. No single
+metric suffices. A complete measurement system should track:
 
 | Metric | What it measures | Formula |
 |--------|-----------------|---------|
 | **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ |
-| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ filtered to $q = 1$ |
+| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ for $q = 1$ |
 | **Throughput** | Raw work capacity | Work-hours completed / calendar time |
-| **Aging violations** | Starvation prevention | Count of tasks exceeding SLA by priority |
-| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ filtered to $q \le 2$ |
+| **Aging violations** | Starvation prevention | Tasks exceeding SLA by priority |
+| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ for $q \le 2$ |
 
-The key insight from our analysis: **per-priority-class metrics** (rows 1-2,
-5) expose scheduling failures that aggregate metrics hide. If P1 mean time
-to resolution is 14 hours while P4 mean is 0.5 hours, the team is
-optimizing the wrong metric — regardless of what the aggregate says.
+The key insight: **per-priority-class metrics** expose scheduling failures
+that aggregate metrics hide.
 
 ---
 
-## 11. Devil's Advocate: The Case for Unweighted Mean Completion Time
+# Part III: Organizational Dynamics
 
-Intellectual honesty requires acknowledging where the preceding argument
-has limits. The following are genuine counterarguments — not strawmen.
+## 7. When the Metric Is the Product
 
-### 11.1 Simplicity Has Real Value
+Sections 2–6 assume that client satisfaction is a function of *experienced
+service quality*. But there exists a scenario in which this assumption
+fails and the entire argument collapses.
 
-**Argument.** The unweighted mean is trivially computable: sum the completion
-times, divide by the count. It requires no priority weights, no task-size
-estimates, no calibration. Every alternative proposed in Section 10 requires
-estimating $p_i$ (task size) before the task is complete — and these
-estimates are notoriously unreliable.
+### 7.1 The Self-Referential Metric
 
-**Assessment: This is true.** PWCS and PWCT require inputs (priority
-weights, size estimates) that introduce their own sources of error. If size
-estimates are systematically wrong — and in software engineering they often
-are, with large tasks underestimated and small tasks overestimated — then
-the weighted metric inherits that noise.
-
-However, the unweighted metric does not avoid this problem — it *hides* it
-by implicitly setting all weights to 1 and all sizes to 1. That is not
-"making no assumptions"; it is making the specific assumption that all tasks
-are equally important and equally sized, which is demonstrably false in any
-real system. **A known-imprecise estimate of task size is still more
-informative than the implicit assumption that all sizes are equal.**
-
-### 11.2 Minimizing the Number of People Waiting
-
-**Argument.** If each task represents one client, then unweighted mean
-completion time minimizes the total person-hours spent waiting. SPT is
-optimal for this because completing short tasks first "frees" the most
-people from the queue earliest.
-
-**Assessment: This is mathematically correct.** The sum $\sum C_i$ counts
-total person-time in the system. SPT genuinely minimizes this quantity.
-If you run a DMV and every person's time is equally valuable regardless of
-why they're there, SPT is the right policy.
-
-The argument breaks down when:
-
-1. **Tasks are not 1:1 with clients.** In IT, one client may submit tasks
-   of varying size. Across a relationship, SPT systematically fast-tracks
-   their easy requests and starves their hard ones — which is not perceived
-   as good service.
-
-2. **Waiting cost is not uniform.** A person waiting for a server outage
-   to be fixed is not equivalent to a person waiting for a wallpaper change.
-   The cost of waiting is proportional to the *impact* of the unresolved
-   task, which is what priority encodes.
-
-3. **The metric is applied to teams, not DMVs.** When a team's performance
-   is measured by unweighted mean, the rational response is to cherry-pick
-   — which is individually rational but collectively destructive.
-
-### 11.3 SPT as a Triage Heuristic
-
-**Argument.** In high-volume systems where task sizes cluster tightly
-(e.g., a call center where most calls are 3-7 minutes), SPT approximates
-FIFO and the unweighted mean approximates the weighted mean. The pathologies
-described in this paper only manifest when task sizes span orders of
-magnitude.
-
-**Assessment: This is correct.** As shown in Section 8, when task sizes are
-approximately uniform, all scheduling policies converge and all metrics
-agree. The coefficient of variation of task size, $CV = \sigma_p / \bar{p}$,
-determines the severity of the distortion:
-
-| $CV$ | Task size distribution | Metric distortion |
-|------|----------------------|-------------------|
-| < 0.3 | Tight (call center) | Negligible |
-| 0.3 - 1.0 | Moderate (mixed IT) | Moderate |
-| > 1.0 | Wide (typical IT queue) | Severe |
-
-For a typical IT service desk, task sizes range from 15 minutes (password
-reset) to 40+ hours (infrastructure migration), giving $CV > 2$. The
-distortion is not a theoretical edge case — it is the default condition.
-
-### 11.4 Gaming Requires Malice
-
-**Argument.** The theorems show that the metric *can* be gamed, not that it
-*will* be gamed. A well-intentioned team might use the unweighted mean as
-a rough health indicator without actively optimizing for it, avoiding the
-pathologies described.
-
-**Assessment: This is the strongest counterargument.** If the metric is
-used purely for monitoring — "are we completing things at a reasonable
-pace?" — and not for performance evaluation, rewards, or scheduling
-decisions, then the gaming incentive is absent and the metric is relatively
-harmless.
-
-However, this argument requires the metric to remain purely informational
-and never influence behavior. In practice, any metric that is reported to
-management, tied to OKRs, or used in sprint retrospectives will influence
-behavior — this is Goodhart's Law, and it applies to well-intentioned teams
-as reliably as to cynical ones. The team need not be gaming the metric
-consciously; it is sufficient that completing three easy tickets "feels
-productive" while staring at one hard ticket does not. The metric validates
-the feeling, and the drift happens organically.
-
-### 11.5 Summary: When the Unweighted Mean Is Defensible
-
-The unweighted mean completion time is a defensible metric **only when all
-four conditions hold simultaneously**:
-
-1. Task sizes are approximately uniform ($CV < 0.3$)
-2. There is no priority differentiation (all tasks are equally important)
-3. Each task represents exactly one client
-4. The metric is not used to evaluate, reward, or direct team behavior
-
-In a system satisfying all four conditions — such as a simple FIFO queue
-with uniform jobs and no priority system — the unweighted mean is adequate,
-and its simplicity is a genuine advantage.
-
-In any system that violates even one of these conditions — which includes
-virtually every IT service desk, development team, and support organization
-— the metric produces the distortions proven in Sections 2-9.
-
-The honest conclusion is not that the unweighted mean is always wrong. It is
-that the conditions under which it is right are narrow, easily identified,
-and rarely met in the systems where it is most commonly used.
-
----
-
-## 12. Manager Internalization: The Actionable Solution
-
-The preceding sections present two extremes: reject the metric entirely
-(Sections 1-10) or surrender to it (Appendix A). In practice, most
-managers cannot unilaterally change the metric — it is set at the
-organizational level, reported across teams, and embedded in dashboards
-that other stakeholders consume. The best solution is company-wide metric
-reform. The *actionable* solution is what a single informed manager can
-do right now.
-
-### 12.1 The Strategy
-
-A manager who understands the proof can **internalize the metric's
-limitations without propagating them to the team**. The approach:
-
-1. **Schedule primarily by priority.** The team works critical tasks
-   first, exactly as professional judgment and the priority system
-   dictate. This is the default — the team need not know why.
-
-2. **Tactically interleave small tasks to maintain metric parity.** When
-   the queue contains a small, low-priority task that can be completed
-   quickly without materially delaying any high-priority work, do it.
-   Not because the metric demands it, but because the small task *also
-   needs to get done*, and doing it now costs almost nothing.
-
-3. **Never reveal the metric as the motivation.** The team is told "knock
-   out this quick one while we're waiting on the vendor callback for the
-   P1" — not "we need to bring our average down." The team's
-   professional judgment and intrinsic motivation (Appendix B) remain
-   intact. The manager absorbs the metric-management burden.
-
-This is a **constrained optimization**: minimize priority-weighted delay
-(do the right work in the right order) subject to the constraint that
-the reported unweighted mean stays within an acceptable band.
-
-### 12.2 Formalization
-
-Let $\bar{C}_{\text{target}}$ be the unweighted mean completion time that
-other teams report — the parity threshold. The manager's problem is:
-
-$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$
-
-This is a single-machine scheduling problem with a budget constraint on
-the unweighted mean. The solution is a modified priority schedule:
-
-- Start from the priority-first ordering (all P1 first, then P2, etc.).
-- Identify small low-priority tasks whose insertion ahead of lower-ranked
-  same-priority tasks reduces $\bar{C}$ without displacing any
-  higher-priority task.
-- Insert them only when the marginal improvement to $\bar{C}$ exceeds
-  the marginal cost to priority-weighted delay.
-
-**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** For a
-priority-first schedule with $n$ tasks, the gap between its unweighted
-mean $\bar{C}_{\text{priority}}$ and the SPT-optimal unweighted mean
-$\bar{C}_{\text{SPT}}$ is bounded by:
-
-$$\bar{C}_{\text{priority}} - \bar{C}_{\text{SPT}} \le \frac{n-1}{2n}(\bar{p}_{\max\text{-class}} - \bar{p}_{\min\text{-class}}) \cdot n_{\text{classes}}$$
-
-where $\bar{p}_{\max\text{-class}}$ and $\bar{p}_{\min\text{-class}}$ are
-the mean processing times of the largest and smallest priority classes.
-
-**Proof sketch.** The gap arises entirely from the cross-class ordering:
-within each priority class, the manager can use SPT (shortest first) at
-no priority cost, since all tasks in the class have equal priority. The
-only deviation from global SPT is the *between-class* ordering, where
-large high-priority tasks are placed before small low-priority tasks.
-Each such inversion costs at most $p_{\text{large}} - p_{\text{small}}$
-in the unweighted sum, and there are at most
-$n_{\text{classes}} \cdot (n / n_{\text{classes}})$ such inversions.
-$\blacksquare$
-
-In practice, this means: **a manager who uses SPT within each priority
-class and priority ordering between classes will produce a metric that
-is close to the SPT-optimal value** — often within 10-20% — while
-respecting the priority system entirely.
-
-### 12.3 Why This Works: The Manager as Information Barrier
-
-The strategy works because the manager serves as an **information
-barrier** between the metric and the team:
-
-| Layer | Sees the metric | Sees the priorities | Sees the proof |
-|-------|----------------|--------------------|-----------------|
-| Organization | Yes | Nominally | No |
-| Manager | Yes | Yes | **Yes** |
-| Team | No (shielded) | Yes | Irrelevant |
-| Client | Yes (dashboard) | Via SLA | No |
-
-The manager is the only actor who holds all three pieces of information.
-By internalizing the proof, the manager can:
-
-- Present a metric that satisfies organizational reporting (the number
-  is reasonable)
-- Direct the team by priority (professional judgment preserved)
-- Shield the team from the metric's perverse incentives (Appendix B
-  costs avoided)
-
-This is *not* manipulation. The manager is not fabricating numbers or
-misreporting. They are doing the right work in the right order, and
-the metric happens to be acceptable because within-class SPT is free
-and between-class inversions are bounded (Theorem 12).
-
-### 12.4 The Competitive Breakdown
-
-This strategy fails when the metric becomes **competitive between teams**.
-
-Model $m$ teams, each managed independently. Team $j$ reports
-$\bar{C}_j(\sigma_j)$. If teams are ranked, rewarded, or compared on
-$\bar{C}$:
-
-**Case 1: Cooperative** — Teams are measured for parity, not ranking.
-The threshold is "stay within a reasonable band." Each manager
-independently uses the internalization strategy. All teams do
-approximately the right work. The metric is decorative but harmless.
-This is a **coordination game** with a stable cooperative equilibrium.
-
-**Case 2: Competitive** — Teams are ranked by $\bar{C}$. Promotions,
-resources, or recognition go to the lowest average. This is a
-**prisoner's dilemma**:
-
-| | Team B: Priority-first | Team B: SPT |
-|---|---|---|
-| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) |
-| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) |
-
-The dominant strategy for each team is SPT. The Nash equilibrium is
-(SPT, SPT) — all teams optimize the metric, all teams do the wrong
-work, and the organization reports excellent numbers while critical
-tasks rot across every queue.
-
-The internalization strategy is a **cooperative equilibrium that is not
-stable under competition**. A single team that defects to pure SPT will
-outperform all others on the metric, forcing other managers to choose
-between doing the right work (and looking bad) or following suit (and
-abandoning their professional judgment).
-
-### 12.5 The Scope of the Solution
-
-| Condition | Strategy viability |
-|-----------|-------------------|
-| Metric used for health-check / parity | **Viable** — cooperative equilibrium holds |
-| Metric visible but not ranked | **Viable** — no competitive pressure to defect |
-| Metric ranked across teams | **Fragile** — viable only if all managers cooperate |
-| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates |
-| Metric reform possible at org level | **Unnecessary** — fix the metric instead |
-
-The internalization strategy is actionable *right now*, by a single
-manager, without organizational permission or metric reform. It
-preserves team psychology (Appendix B), respects priorities (Sections
-9-10), and produces an acceptable reported metric (Theorem 12).
-
-Its limitation is structural: it requires the metric to be a
-**reporting formality**, not a **competitive instrument**. The moment
-the metric drives resource allocation or team ranking, the cooperative
-equilibrium collapses and only organizational reform — replacing the
-metric with a priority-weighted alternative (Section 10) — can prevent
-the race to the bottom.
-
-**The best solution is company-wide. The actionable solution is a
-manager who understands this proof, shields their team from the metric,
-schedules by priority, and uses SPT only within priority classes to
-keep the number reasonable.**
-
----
-
-## 13. Conclusion
-
-The unweighted average completion time is a **biased statistic** that:
-
-1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
-   completion time which is schedule-invariant (Theorem 2).
-2. **Incentivizes starvation** of large tasks (Theorem 3).
-3. **Contradicts Little's Law** unless tasks are uniformly sized.
-4. **Degrades client satisfaction** with zero compensating productivity
-   gain (Theorem 7).
-5. **Actively contradicts priority systems** by carrying zero information
-   about business-impact classification (Theorem 9).
-6. **Ignores priority entirely** in its scheduling recommendation,
-   producing suboptimal priority-weighted delay whenever priority and
-   size are not perfectly inversely correlated (Theorem 10).
-
-A metric that can be improved by reordering work — without doing any
-additional work — is measuring the scheduling policy, not the system's
-capacity or effectiveness. When combined with a priority system, the metric
-does not merely fail to reflect priorities — it recommends the schedule
-that inflicts the most damage on the highest-priority work.
-
-The unweighted mean is defensible only under narrow, identifiable conditions
-(Section 11.5): uniform task sizes, no priority system, one-to-one
-client-task mapping, and no behavioral influence from the metric. These
-conditions are rarely met in practice.
-
-**Unweighted average completion time is not a fair or accurate measurement
-of task execution performance. Its adoption as a team metric will
-rationally produce starvation of complex work, violation of stated
-priorities, inequitable client outcomes, and the illusion of productivity
-where none exists.**
-
----
-
-## Appendix A. When the Metric Is the Product
-
-The preceding twelve sections rest on an implicit assumption: that client
-satisfaction is a function of *experienced service quality* — how long
-*their* task took, relative to its size and urgency. If this assumption
-holds, the proof is valid and the unweighted mean is a destructive metric.
-
-But there exists a scenario in which the assumption fails and the entire
-argument collapses.
-
-### A.1 The Self-Referential Metric
-
-Suppose the service provider reports the unweighted mean completion time
-directly to the client — on a dashboard, in an SLA report, on a marketing
-page — and the client's satisfaction is derived primarily from *that number*
-rather than from their individual experience.
-
-Define client satisfaction as:
+Suppose the provider reports the unweighted mean directly to the client
+— on a dashboard, in an SLA report, on a marketing page — and the
+client's satisfaction is derived primarily from *that number*:
 
 $$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$
 
-That is: the client sees "Average resolution time: 6.56 hours" and is
-satisfied, without checking whether *their* ticket — the critical email
-outage — took 6.56 hours or 18.75 hours.
-
 Under this model, SPT genuinely maximizes client satisfaction (Theorem 1).
-The service provider's throughput is unchanged (Theorem 6). The business
-outcome improves: same work done, happier client.
+Throughput is unchanged (Theorem 6). The business outcome improves: same
+work done, happier client.
 
 **Every theorem in this paper remains mathematically correct. But the
-conclusion inverts.** The metric is no longer a proxy for service quality
-that can be gamed — it *is* the service quality, because the client has
-agreed to evaluate quality by the aggregate number rather than by their
-individual experience.
+conclusion inverts.** The metric is no longer a proxy that can be gamed —
+it *is* the service quality, because the client has agreed to evaluate
+quality by the aggregate number.
 
-### A.2 The Economics
+### 7.2 The Economics
 
-This creates a coherent, stable business equilibrium:
+This creates a coherent, stable equilibrium:
 
 | Actor | Behavior | Outcome |
 |-------|----------|---------|
@@ -1085,397 +563,472 @@ This creates a coherent, stable business equilibrium:
 | Client | Reads dashboard, sees low average | Reports satisfaction |
 | Management | Sees satisfied client + good metric | Rewards team |
 
-Throughput is unchanged (Theorem 6), so the same revenue-generating work
-is completed. The only thing that changed is the *order* — and therefore
-the reported number. Real resources were rearranged, no additional value
-was created, but the business metrics all moved in the right direction.
+The provider extracts satisfaction at zero marginal cost, by optimizing a
+number the client has accepted as a proxy for quality.
 
-This is *profitable*. The provider extracts satisfaction from the client
-at zero marginal cost, by optimizing a number that the client has accepted
-as a proxy for quality. The client is no worse off *in their own estimation*,
-because they evaluate the aggregate, not their individual experience.
+### 7.3 The Fragility
 
-### A.3 The Fragility
+This equilibrium is stable only as long as the client never inspects their
+own experience. It breaks when:
 
-This equilibrium is stable only as long as the client never inspects
-their own experience. It breaks the moment any of the following occur:
+1. **The client checks their own ticket.** A CTO whose email server was
+   down for 18.75 hours will not be reassured by "Average resolution:
+   6.56 hours." The clients most likely to inspect are exactly the ones
+   receiving the worst service (Theorem 4).
 
-**1. The client checks their own ticket.**
+2. **A competitor offers per-ticket SLAs.** "P1 resolved within 4 hours"
+   beats "average resolution under 7 hours" for any client with critical
+   needs.
 
-A CTO whose email server was down for 18.75 hours will not be reassured
-by a dashboard reading "Average resolution: 6.56 hours." The aggregate
-metric and the individual experience diverge maximally for high-priority
-tasks (Theorem 4). The clients most likely to inspect their own experience
-are exactly the ones receiving the worst service.
+3. **The team internalizes the metric.** If the team believes the metric
+   reflects real performance, they lose the ability to recognize when
+   critical work is neglected. The metric becomes an epistemic hazard.
 
-**2. A competitor offers per-ticket SLAs.**
+### 7.4 The General Pattern
 
-If an alternative provider guarantees "P1 incidents resolved within 4 hours"
-instead of "average resolution under 7 hours," the aggregate-metric provider
-cannot compete for clients with critical needs — which are typically the
-highest-value clients.
-
-**3. The provider's team internalizes the metric.**
-
-If the team believes the metric reflects real performance (rather than
-consciously gaming it), they lose the ability to recognize when critical
-work is being neglected. The metric becomes an epistemic hazard: it
-tells the team they are performing well, preventing them from seeing that
-they are not.
-
-### A.4 The General Pattern
-
-This is not unique to task scheduling. The structure is:
-
-1. A measurable proxy is established for an unmeasured quality.
-2. The proxy is reported as if it were the quality itself.
-3. The proxy is optimized, improving the reported number.
-4. The underlying quality diverges from the proxy, but no one measures
-   the underlying quality because the proxy exists.
-5. The system is stable until an exogenous shock forces inspection of
-   the underlying quality.
-
-This pattern appears across domains:
+This pattern — proxy replaces quality, proxy is optimized, quality
+diverges, system is stable until tested by reality — recurs across domains.
+Muller [19] documents it extensively as "metric fixation"; Campbell [24]
+formalized the corrupting effect of using indicators as targets.
 
 | Domain | Proxy metric | Underlying quality | Divergence |
 |--------|-------------|-------------------|------------|
-| IT support | Avg. resolution time | Critical system uptime | Server down for 19 hrs, avg says 6.5 |
-| Education | Standardized test scores | Actual learning | Teaching to the test, understanding declines |
-| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission rates |
-| Finance | Quarterly earnings | Long-term value creation | Cost-cutting inflates EPS, erodes capability |
-| Software | Velocity (story points) | Deliverable product quality | Point inflation, features half-finished |
+| IT support | Avg. resolution time | Critical system uptime | Server down 19 hrs, avg says 6.5 |
+| Education | Test scores | Actual learning | Teaching to the test |
+| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission |
+| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability |
+| Software | Velocity (story points) | Product quality | Point inflation, features half-finished |
 
-In each case, the proxy is optimized, the number improves, and the system
-*functions* — profitably, even — until the moment the underlying quality
-is tested by reality.
+### 7.5 Information Asymmetry
 
-### A.5 A Mathematical Note on Equilibrium Stability
+Model the system as a game between provider (P) and client (C). P observes
+individual $\{C_i\}$ and chooses $\sigma$; C observes only
+$\bar{C}(\sigma)$. This is a **moral hazard** problem [10]: P's optimal
+strategy is to minimize the observable signal regardless of the
+unobservable distribution.
 
-Model the system as a game between provider (P) and client (C).
+The equilibrium is a **pooling equilibrium** [9]: P's reported metric
+looks identical regardless of the underlying priority-weighted performance.
+It is stable until C obtains access to individual $C_i$ values — via a
+customer portal, a competitor's transparency, or a sufficiently painful
+incident.
 
-**Information structure:**
-- P observes individual completion times $\{C_i\}$ and chooses schedule $\sigma$
-- C observes only the reported aggregate $\bar{C}(\sigma)$
-
-**Payoffs:**
-- P's payoff increases with C's satisfaction and is independent of schedule
-  (throughput is invariant)
-- C's *reported* satisfaction $U_C = f(\bar{C})$ is maximized by SPT
-- C's *actual* welfare (if they could observe it) depends on individual
-  $C_i$ values, especially for high-priority tasks
-
-This is a **moral hazard** problem. P has private information (the
-distribution of $C_i$) that C cannot observe. P's optimal strategy is to
-minimize the observable signal ($\bar{C}$) regardless of the unobservable
-distribution — which is exactly SPT.
-
-The equilibrium is a **pooling equilibrium**: P's schedule looks identical
-to the client regardless of the underlying priority-weighted performance.
-A provider with PWCT = 10.2 and a provider with PWCT = 10.167 both report
-$\bar{C} = 6.56$ under SPT. The client cannot distinguish between them.
-
-This equilibrium is stable under the standard game-theoretic condition:
-**C has no incentive to deviate** (they have no better information source)
-and **P has no incentive to deviate** (any other schedule worsens $\bar{C}$
-with zero throughput benefit).
-
-It is *unstable* under **information revelation**: if C obtains access to
-individual $C_i$ values (via a customer portal, a competing vendor's
-transparency, or a sufficiently painful incident), the pooling equilibrium
-collapses and C's evaluation shifts to the underlying quality.
-
-### A.6 The Uncomfortable Conclusion
+### 7.6 The Uncomfortable Conclusion
 
 The honest answer to "does optimizing the unweighted mean hurt the
-business?" is: **not necessarily, as long as the client never looks
-behind the number**.
-
-The honest answer to "does it hurt the client?" is: **only when they
-have a problem large enough to notice** — which is precisely when the
-metric's distortion is largest (Theorem 4).
-
-The honest answer to "is this sustainable?" is: it is exactly as
-sustainable as any system in which the seller knows more than the buyer.
-Such systems are historically stable for extended periods and then
-collapse rapidly when the information asymmetry is punctured — by a
-crisis, a competitor, or a regulator.
-
-The mathematical structure is clear: the unweighted mean creates an
-information asymmetry between the metric and the reality. Optimizing
-the metric under this asymmetry is *locally rational* for the provider,
-*locally satisfying* for the uninspecting client, and *globally fragile*
-for the relationship.
-
-Whether one calls this "efficient market behavior" or "a dystopian
-consequence of optimizing legible numbers over illegible reality" is not
-a mathematical question. The math says only this: **the incentive exists,
-the equilibrium is real, and it holds until it doesn't.**
+business?" is: **not necessarily, as long as the client never looks behind
+the number**. The honest answer to "is this sustainable?" is: it is
+exactly as sustainable as any system in which the seller knows more than
+the buyer. Such systems are stable for extended periods, then collapse
+rapidly when the asymmetry is punctured.
 
 ---
 
-## Appendix B. The Psychological Cost of Knowing
+## 8. The Psychological Cost of Knowing
 
-Appendix A modeled the provider as a unitary rational actor — "the team"
-optimizes the metric. But teams are composed of individuals, and those
-individuals have their own utility functions. When a team member
-understands the proof — when they *know* the metric is synthetic, that
-the dashboard is theater, that the email server is still down while they
-close wallpaper tickets — a new cost appears that the equilibrium model
-did not account for.
+Section 7 modeled the provider as a unitary actor. But teams are composed
+of individuals. When a team member understands the proof — when they
+*know* the metric is synthetic, that the dashboard is theater, that the
+email server is still down while they close wallpaper tickets — a new cost
+appears that the equilibrium model omitted.
 
-### B.1 The Hidden Variable: Team Awareness
+### 8.1 The Hidden Variable: Team Awareness
 
-Appendix A's game has three actors: provider, client, management. But the
-provider is not monolithic. Decompose it:
+| Actor | Observes individual $C_i$ | Observes $\bar{C}$ | Understands the proof |
+|-------|--------------------------|--------------------|-----------------------|
+| Management | Possibly | Yes | Varies |
+| Team member | **Yes** | Yes | **Yes** (in this scenario) |
+| Client | No | Yes | No |
 
-- **Management (M):** sets the metric, evaluates the team, reports to client
-- **Team member (T):** executes the work, observes individual task states
-- **Client (C):** observes only the reported aggregate
+The team member has full information. They see the ticket queue. They know
+the email server has been down since 7 AM. They know they are closing a
+wallpaper ticket because it improves the number. And they know *why*.
 
-The information structure changes:
+### 8.2 Cognitive Dissonance Under Full Information
 
-| Actor | Observes individual $C_i$ | Observes aggregate $\bar{C}$ | Understands the proof |
-|-------|--------------------------|-----------------------------|-----------------------|
-| M | Possibly | Yes | Varies |
-| T | **Yes** | Yes | **Yes** (in this scenario) |
-| C | No | Yes | No |
-
-The team member has *full information*. They see the ticket queue. They
-know the email server has been down since 7 AM. They know they are closing
-a wallpaper ticket because it will improve the number. And they know *why*
-this is happening — not from vague discomfort, but from a precise
-mathematical understanding that the metric rewards this behavior.
-
-### B.2 Cognitive Dissonance Under Full Information
-
-Cognitive dissonance (Festinger, 1957) arises when an individual holds
-two contradictory cognitions simultaneously. The standard resolution is
-to modify one cognition to reduce the conflict.
-
-A team member operating under the synthetic metric holds:
+Cognitive dissonance [11] arises when an individual holds contradictory
+cognitions. Without understanding *why*, the contradiction can be
+rationalized: "management knows best." Understanding the proof removes
+the ambiguity. The team member now holds:
 
 - **Cognition A:** "I am a competent professional. My job is to solve
-  important problems for clients."
+  important problems."
 - **Cognition B:** "I am closing a wallpaper ticket while the email
-  server is down, because it makes the number look better."
+  server is down, because the metric is mathematically biased (Theorem 1),
+  the reordering produces zero throughput (Theorem 6), and the only
+  beneficiary is the dashboard (Section 7). I can prove this."
 
-In the absence of understanding *why*, Cognition B can be rationalized:
-"management knows best," "maybe there's a reason," "the system works
-overall." This is uncomfortable but tolerable — the ambiguity provides
-cognitive cover.
+The dissonance is now *load-bearing*. The available resolutions — abandon
+professional identity, reject the proof, advocate for change, or leave —
+each impose costs that did not exist before.
 
-**Understanding the proof removes the ambiguity entirely.** The team
-member now holds:
+### 8.3 Self-Determination Theory: Three Needs Violated
 
-- **Cognition A:** Same as above.
-- **Cognition B':** "I am closing a wallpaper ticket while the email
-  server is down, because the metric is mathematically biased toward
-  small tasks (Theorem 1), the reordering produces zero additional
-  throughput (Theorem 6), and the only beneficiary is the dashboard
-  (Appendix A). I can prove this."
+Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs
+predicting intrinsic motivation:
 
-B' is strictly harder to rationalize than B. The team member cannot
-retreat into uncertainty because they possess the proof. The dissonance
-is now *load-bearing*: it must be resolved, and the available resolutions
-are:
+**Autonomy.** The metric constrains choices in a way the team member
+knows is mathematically suboptimal. A worker who understands that the
+process is provably counterproductive cannot feel autonomous while
+following it.
 
-1. **Reject Cognition A** — "I am not here to solve important problems;
-   I am here to move numbers." This is psychologically costly. It
-   requires abandoning professional identity.
+**Competence.** The metric rewards *apparent* effectiveness (low $\bar{C}$)
+while being invariant to *actual* effectiveness (Theorem 6). Genuine
+competence — fixing the email server first — is *punished* by the metric.
 
-2. **Reject Cognition B'** — "The proof must be wrong, or doesn't apply
-   here." This is intellectually costly. The proof is simple enough to
-   verify, and the IT example maps directly to their daily experience.
+**Relatedness.** The team member knows the client's email server is down.
+They could help. They are instead updating wallpaper — not because it
+helps anyone, but because it helps a number. The connection between work
+and human impact has been severed, and the team member can see the severed
+ends.
 
-3. **Change the situation** — advocate for better metrics, refuse to
-   cherry-pick, escalate. This is *professionally* costly in an
-   environment that rewards the metric.
+### 8.4 Moral Injury
 
-4. **Leave** — resolve the dissonance by exiting the system entirely.
+Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing
+to prevent, bearing witness to, or learning about acts that transgress
+deeply held moral beliefs" [17]. It has since been extended to business
+settings [25]. The key distinction from burnout: **burnout is exhaustion
+from doing too much. Moral injury is damage from doing the wrong thing.**
 
-None of these resolutions are free. Each one imposes a cost on the team
-member that did not exist before they understood the proof — and *none of
-them appear in the business equilibrium model of Appendix A*.
+A team member who knows the email server is down, knows they should fix
+it, closes a wallpaper ticket instead, and does so because the metric
+requires it, is experiencing the structural conditions for moral injury.
 
-### B.3 Self-Determination Theory: Three Needs Violated
+### 8.5 Learned Helplessness and Metric Fatalism
 
-Deci and Ryan's Self-Determination Theory (1985, 2000) identifies three
-innate psychological needs whose satisfaction predicts intrinsic motivation,
-job satisfaction, and well-being:
+Seligman's learned helplessness [14, 15] describes how exposure to
+uncontrollable negative outcomes leads to passivity. The sequence:
 
-**1. Autonomy** — the need to feel volitional control over one's actions.
+1. The metric is flawed (proof understood).
+2. Advocate for change.
+3. Rejected ("the numbers are good, don't rock the boat").
+4. Repeat with decreasing conviction.
+5. Terminal state: "The metric is what it is. I'll just close tickets."
 
-A team member who understands the proof knows that the metric constrains
-their choices in a way that is mathematically suboptimal for the client.
-Their scheduling decisions are not autonomous expressions of professional
-judgment; they are coerced responses to a flawed incentive. The *knowledge*
-of the coercion — not just the coercion itself — is what damages autonomy.
-A worker who doesn't understand why they're doing something can still feel
-autonomous ("I'm choosing to follow the process"). A worker who understands
-that the process is provably counterproductive cannot.
+This is not laziness. It is the rational response to a system that
+punishes correct behavior and rewards incorrect behavior, when the
+individual lacks power to change the system.
 
-**2. Competence** — the need to feel effective at meaningful tasks.
+### 8.6 The Adversarial Selection Spiral
 
-The proof demonstrates that the metric rewards *apparent* effectiveness
-(low $\bar{C}$) while being invariant to *actual* effectiveness (throughput,
-Theorem 6). A team member who understands this knows that the metric
-cannot distinguish between a competent team and an incompetent one that
-happens to cherry-pick small tasks. Their competence is invisible to the
-measurement system. Worse: genuine competence — choosing to fix the email
-server first — is *punished* by the metric ($\bar{C}$ increases from 6.56
-to 13.63 in the IT example).
+Combining Section 7's equilibrium with the turnover dynamic:
 
-When a measurement system punishes competent decisions and rewards
-incompetent ones, and the team member *knows this*, the need for
-competence is not merely unsatisfied — it is actively contradicted.
+1. Organization adopts unweighted mean. Metric looks good (SPT).
+2. Aware, competent team members experience psychological costs (8.2–8.5).
+3. Those members leave. Replaced by members who do not understand the
+   metric's flaws or do not care.
+4. The metric continues to look good — it always does under SPT,
+   regardless of team competence (Corollary 6.1).
+5. Actual service quality degrades, but the metric cannot detect this
+   (Corollary 9.1).
+6. Return to step 2.
 
-**3. Relatedness** — the need to feel connected to others and to
-contribute to something meaningful.
+The metric selects *against* the people who would improve the system and
+*for* the people who will not challenge it. The system stabilizes at a
+lower level of competence, invisible to its own measurement apparatus.
 
-The team member knows the client's email server is down. They know the
-client is suffering. They know they could help. They are instead updating
-a wallpaper policy — not because it helps anyone, but because it helps
-a number. The connection between the team member's work and the client's
-well-being has been severed by the metric, and the team member *can see
-the severed ends*.
+### 8.7 The Complete Cost Model
 
-### B.4 Moral Injury
-
-The concept of moral injury (Shay, 1994; Litz et al., 2009) was developed
-in military psychology to describe the lasting harm caused by
-"perpetrating, failing to prevent, bearing witness to, or learning about
-acts that transgress deeply held moral beliefs." It has since been applied
-to healthcare workers, first responders, and — increasingly — to
-knowledge workers in bureaucratic systems.
-
-The key distinction from burnout: **burnout is exhaustion from doing too
-much. Moral injury is damage from doing the wrong thing, or being
-prevented from doing the right thing.**
-
-A team member who:
-- Knows the email server is down (witnessing the harm)
-- Knows they should fix it (moral belief about professional duty)
-- Closes a wallpaper ticket instead (transgressing that belief)
-- Does so because the metric requires it (institutional causation)
-
-...is experiencing the structural conditions for moral injury. The
-proof doesn't cause the injury — the metric does. But the proof
-eliminates the psychological buffer of ignorance that would otherwise
-mitigate it.
-
-### B.5 Learned Helplessness and Metric Fatalism
-
-Seligman's learned helplessness framework (1967, 1975) describes the
-phenomenon where exposure to uncontrollable negative outcomes leads to
-passivity even when control becomes available.
-
-The sequence for an aware team member:
-
-1. **Observation:** The metric is flawed (proof understood).
-2. **Action:** Advocate for change ("we should use priority-weighted
-   metrics").
-3. **Outcome:** Rejected ("the client is happy with the current
-   dashboard," "this is how we've always measured," "the numbers are
-   good, don't rock the boat").
-4. **Repetition:** Steps 2-3 repeat, with decreasing conviction.
-5. **Helplessness:** "The metric is what it is. I'll just close tickets."
-
-The terminal state — metric fatalism — is characterized by:
-- Disengagement from professional judgment ("I just do what the queue
-  says")
-- Reduced initiative ("why bother triaging if the metric doesn't care?")
-- Cynicism toward measurement generally ("all metrics are fake")
-- Withdrawal of discretionary effort on complex tasks
-
-This is not laziness. It is the rational psychological response to a
-system that punishes correct behavior and rewards incorrect behavior,
-when the individual lacks the power to change the system.
-
-### B.6 The Turnover Equation
-
-The costs described in B.2-B.5 are borne by the team member, not the
-organization — initially. They become organizational costs through
-**turnover**.
-
-Model the team member's stay/leave decision:
-
-$$\text{Stay if: } \quad V_{\text{compensation}} + V_{\text{intrinsic}} > V_{\text{outside option}}$$
-
-The synthetic metric degrades $V_{\text{intrinsic}}$ through each of the
-mechanisms described above:
-
-| Mechanism | Component degraded | Effect on $V_{\text{intrinsic}}$ |
-|-----------|-------------------|----------------------------------|
-| Cognitive dissonance (B.2) | Psychological comfort | Decreased |
-| Autonomy violation (B.3.1) | Sense of agency | Decreased |
-| Competence contradiction (B.3.2) | Professional identity | Decreased |
-| Relatedness severance (B.3.3) | Sense of purpose | Decreased |
-| Moral injury (B.4) | Ethical well-being | Decreased |
-| Learned helplessness (B.5) | Belief in efficacy | Decreased |
-
-As $V_{\text{intrinsic}}$ decreases, the organization must increase
-$V_{\text{compensation}}$ to retain the team member, or accept their
-departure.
-
-Crucially: **the team members most affected are those with the strongest
-professional identity and the deepest understanding of the work.** These
-are the most competent members — the ones most capable of recognizing the
-metric's absurdity, most troubled by it, and most able to find employment
-elsewhere. The metric selects for the departure of the team's best people.
-
-### B.7 The Adversarial Selection Spiral
-
-Combining Appendix A's equilibrium with the turnover dynamic:
-
-1. Organization adopts unweighted mean completion time.
-2. Metric looks good (SPT). Client is satisfied (Appendix A). Management
-   is satisfied.
-3. Aware, competent team members experience psychological costs (B.2-B.5).
-4. Those members leave. They are replaced by members who either:
-   (a) do not understand the metric's flaws (less competent), or
-   (b) do not care (less engaged).
-5. The metric continues to look good — it always does under SPT,
-   regardless of team competence (Theorem 6, Corollary 6.1).
-6. Actual service quality degrades (less competent team), but the metric
-   cannot detect this (Theorem 9, Corollary 9.1).
-7. Return to step 2.
-
-This is an **adversarial selection spiral**: the metric selects *against*
-the people who would improve the system and *for* the people who will not
-challenge it. The system stabilizes at a lower level of actual competence,
-invisible to its own measurement apparatus, staffed by people who have
-made peace with — or are unaware of — the gap between the number and the
-reality.
-
-The dashboard still looks good.
-
-### B.8 The Complete Cost Model
-
-Appendix A concluded that the synthetic-metric equilibrium is stable and
-profitable. Appendix B reveals the hidden costs that model omitted:
-
-| Appendix A (visible) | Appendix B (hidden) |
+| Section 7 (visible) | Section 8 (hidden) |
 |---------------------|---------------------|
-| Client satisfied (sees good number) | Team dissatisfied (sees bad reality) |
+| Client satisfied (good number) | Team dissatisfied (bad reality) |
 | Throughput unchanged | Discretionary effort withdrawn |
 | Metric improves | Competent members leave |
 | Business economy stable | Institutional competence degrades |
-| Zero marginal cost | Replacement/training costs accumulate |
 
-The business equilibrium of Appendix A is real. The psychological costs
-of Appendix B are also real. They operate on different timescales:
-the equilibrium is visible quarterly; the competence degradation is
-visible over years.
+These operate on different timescales: the equilibrium is visible
+quarterly; the competence degradation is visible over years. The complete
+model is: **the metric works, and it is destructive, and the destruction
+is invisible to the metric.** The metric is fresh paint on corroded rebar.
 
-The complete model is not "the metric works" (Appendix A) or "the metric
-is destructive" (Sections 1-12). It is: **the metric works, and it
-is destructive, and the destruction is invisible to the metric.**
+---
 
-An organization can run profitably for an extended period on synthetic
-metrics and hollowed-out competence, just as a building can stand for
-years with corroded rebar. The metric is the fresh paint. Appendix A
-proved the paint is convincing. This appendix merely notes that it is
-still paint.
+## 9. Manager Internalization: The Actionable Solution
+
+Sections 2–6 say reject the metric. Section 7 says the metric works
+(for the business). Section 8 says it destroys the team. In practice,
+most managers cannot unilaterally change the metric. The best solution is
+company-wide metric reform. The *actionable* solution is what a single
+informed manager can do right now.
+
+### 9.1 The Strategy
+
+A manager who understands the proof can **internalize the metric's
+limitations without propagating them to the team**:
+
+1. **Schedule primarily by priority.** The team works critical tasks first.
+2. **Tactically interleave small tasks.** When a small low-priority task
+   can be completed without materially delaying high-priority work, do it.
+   Not because the metric demands it, but because it also needs to get
+   done and costs almost nothing.
+3. **Never reveal the metric as the motivation.** "Knock out this quick
+   one while we wait for the vendor callback on the P1" — not "we need
+   to bring our average down." The team's intrinsic motivation remains
+   intact (Section 8). The manager absorbs the metric-management burden.
+
+### 9.2 Formalization
+
+The manager's problem is a constrained optimization:
+
+$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$
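+
+One way to explore this trade-off, offered as a sketch rather than as the
+paper's method, is a Lagrangian relaxation of the constraint. Write
+$w_i = w(q_i)$ and $p_i$ for the size of task $i$, and introduce a
+multiplier $\lambda \ge 0$ (an assumption of this sketch, not a quantity
+defined elsewhere in the paper):
+
+$$L(\sigma, \lambda) = \sum_{i=1}^{n} w_i C_i + \lambda\left(\frac{1}{n}\sum_{i=1}^{n} C_i - \bar{C}_{\text{target}}\right) = \sum_{i=1}^{n}\left(w_i + \frac{\lambda}{n}\right) C_i - \lambda\,\bar{C}_{\text{target}}$$
+
+For fixed $\lambda$ this is a weighted sum of completion times, which
+Smith's rule minimizes by sequencing tasks in non-increasing order of
+$(w_i + \lambda/n)/p_i$. At $\lambda = 0$ the order is purely
+priority-weighted; as $\lambda \to \infty$ it approaches SPT. Because the
+problem is combinatorial, sweeping $\lambda$ yields candidate schedules
+and a lower bound, not necessarily the constrained optimum.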
+
+**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** A manager
+who uses SPT *within* each priority class and priority ordering *between*
+classes produces a metric whose excess over the SPT-optimal value arises
+only from between-class inversions.
+
+**Proof sketch.** Within each priority class, SPT is free (all tasks have
+equal priority). The only deviation from global SPT is the between-class
+ordering. Each cross-class inversion, in which a larger task is scheduled
+ahead of a smaller task from a lower-priority class, adds
+$p_{\text{large}} - p_{\text{small}}$ to the unweighted sum, and the
+number of such inversions is bounded by the number of cross-class task
+pairs. How large the gap is in practice depends on the workload; the
+estimate that it typically falls within 10–20% of SPT-optimal is
+empirical and is not implied by the bound. $\blacksquare$
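+
+The inversion cost can be written out explicitly. The following is a
+sketch using notation introduced here for illustration: $i \prec_\sigma j$
+means task $i$ runs before task $j$ under $\sigma$, and $I(\sigma)$ is the
+set of pairs with $i \prec_\sigma j$ and $p_i > p_j$. Since each
+completion time is the sum of the sizes of all tasks up to and including
+that task, every pair of tasks contributes the size of whichever task
+runs first:
+
+$$\sum_{j} C_j(\sigma) = \sum_{j} p_j + \sum_{i \prec_\sigma j} p_i \quad\Longrightarrow\quad \sum_{j} C_j(\sigma) - \sum_{j} C_j(\sigma_{\text{SPT}}) = \sum_{(i,j) \in I(\sigma)} (p_i - p_j)$$
+
+With within-class SPT, $I(\sigma)$ contains only cross-class pairs. As a
+hypothetical example, take a P1 class with sizes $(1, 4)$ and a P3 class
+with sizes $(2, 6)$, in hours. Priority-first order $(1, 4, 2, 6)$ gives
+completion times $(1, 5, 7, 13)$, sum 26; SPT order $(1, 2, 4, 6)$ gives
+$(1, 3, 7, 13)$, sum 24. The only inverted pair is $(4, 2)$, and
+$4 - 2 = 2$ accounts for the entire gap.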
+
+### 9.3 The Manager as Information Barrier
+
+| Layer | Sees metric | Sees priorities | Sees proof |
+|-------|-----------|----------------|------------|
+| Organization | Yes | Nominally | No |
+| Manager | Yes | Yes | **Yes** |
+| Team | No (shielded) | Yes | Irrelevant |
+| Client | Yes (dashboard) | Via SLA | No |
+
+The manager is the only actor holding all three pieces of information.
+This is not manipulation — they are doing the right work in the right
+order, and the metric happens to be acceptable because within-class SPT
+is free.
+
+### 9.4 The Competitive Breakdown
+
+This strategy fails when the metric becomes **competitive between teams**.
+
+**Case 1: Cooperative** — Teams measured for parity, not ranking. Each
+manager independently uses the internalization strategy. The metric is
+decorative but harmless. This is a **coordination game** with a stable
+cooperative equilibrium.
+
+**Case 2: Competitive** — Teams ranked by $\bar{C}$. This is a
+**prisoner's dilemma**:
+
+| | Team B: Priority-first | Team B: SPT |
+|---|---|---|
+| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) |
+| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) |
+
+The Nash equilibrium is (SPT, SPT). The internalization strategy is a
+cooperative equilibrium that is **not stable under competition**.
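+
+The dominance of SPT can be sketched under one added assumption, which is
+not stated in the table above: each team's payoff depends only on how its
+own mean compares with the other team's, through some increasing function
+$g$ (hypothetical, introduced for this illustration). Each team schedules
+its own queue, so $\bar{C}_A$ depends only on $\sigma_A$:
+
+$$u_A(\sigma_A, \sigma_B) = g\!\left(\bar{C}_B(\sigma_B) - \bar{C}_A(\sigma_A)\right), \quad g' > 0$$
+
+$$\bar{C}_A(\sigma_{\text{SPT}}) \le \bar{C}_A(\sigma_A) \;\; \forall \sigma_A \quad\Longrightarrow\quad u_A(\sigma_{\text{SPT}}, \sigma_B) \ge u_A(\sigma_A, \sigma_B) \;\; \forall \sigma_A, \sigma_B$$
+
+The left-hand inequality is Theorem 1. SPT is therefore a best response
+to anything Team B does, and by symmetry the same holds for Team B. The
+"dilemma" lies in what the payoff omits: the cost of doing the wrong work
+does not appear in $u_A$ or $u_B$.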
+
+### 9.5 Scope
+
+| Condition | Viability |
+|-----------|-----------|
+| Metric used for health-check / parity | **Viable** |
+| Metric visible but not ranked | **Viable** |
+| Metric ranked across teams | **Fragile** — requires all managers to cooperate |
+| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates |
+| Metric reform possible at org level | **Unnecessary** — fix the metric instead |
+
+**The best solution is company-wide. The actionable solution is a manager
+who understands this proof, shields their team from the metric, schedules
+by priority, and uses SPT only within priority classes to keep the number
+reasonable.**
+
+---
+
+# Part IV: Assessment
+
+## 10. Devil's Advocate
+
+Intellectual honesty requires acknowledging where the argument has limits.
+
+### 10.1 Simplicity Has Real Value
+
+**Argument.** The unweighted mean requires no priority weights, no
+task-size estimates, no calibration.
+
+**Assessment: True.** But the unweighted metric does not avoid assumptions
+— it *hides* them by implicitly setting all weights to 1 and all sizes to
+1. A known-imprecise estimate of task size is still more informative than
+the implicit assumption that all sizes are equal.
+
+### 10.2 Minimizing the Number of People Waiting
+
+**Argument.** SPT minimizes total person-hours spent waiting. If each
+task represents one client, this is optimal.
+
+**Assessment: Mathematically correct.** If you run a DMV and every
+person's time is equally valuable, SPT is the right policy. It breaks
+down when tasks are not 1:1 with clients, waiting cost is not uniform,
+or the metric is used to evaluate teams rather than serve a literal queue.
+
+### 10.3 SPT as a Triage Heuristic
+
+**Argument.** When task sizes cluster tightly, SPT approximates FIFO
+and the unweighted mean approximates the weighted mean.
+
+**Assessment: Correct.** The coefficient of variation $CV = \sigma_p / \bar{p}$ determines distortion severity:
+
+| $CV$ | Task size distribution | Distortion |
+|------|----------------------|------------|
+| < 0.3 | Tight (call center) | Negligible |
+| 0.3 – 1.0 | Moderate (mixed IT) | Moderate |
+| > 1.0 | Wide (typical IT queue) | Severe |
+
+A typical IT desk spans 15 minutes to 40+ hours ($CV > 2$). The
+distortion is not an edge case — it is the default.
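+
+As a quick check of the bands above, consider two hypothetical four-task
+queues (sizes in hours, using the population standard deviation; the
+numbers are illustrative assumptions, not measurements):
+
+$$
+\begin{aligned}
+\{4, 5, 5, 6\}: &\quad \bar{p} = 5, \quad \sigma_p = \sqrt{\tfrac{1 + 0 + 0 + 1}{4}} \approx 0.71, \quad CV \approx 0.14 \\
+\{1, 1, 1, 9\}: &\quad \bar{p} = 3, \quad \sigma_p = \sqrt{\tfrac{4 + 4 + 4 + 36}{4}} = \sqrt{12} \approx 3.46, \quad CV \approx 1.15
+\end{aligned}
+$$
+
+The first queue falls in the negligible band; a single 9-hour task among
+1-hour tasks is already enough to put the second in the severe band.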
+
+### 10.4 Gaming Requires Malice
+
+**Argument.** The theorems show the metric *can* be gamed, not that it
+*will* be gamed.
+
+**Assessment: This is the strongest counterargument.** If the metric is
+purely informational and never influences behavior, the gaming incentive
+is absent. However, any metric reported to management, tied to OKRs, or
+discussed in retrospectives will influence behavior. This is Goodhart's
+Law [6, 7] — and it applies to well-intentioned teams as reliably as to
+cynical ones. The drift happens organically: completing three easy tickets
+"feels productive" while the metric validates the feeling.
+
+### 10.5 When the Unweighted Mean Is Defensible
+
+The metric is defensible **only when all four conditions hold**:
+
+1. Task sizes are approximately uniform ($CV < 0.3$)
+2. No priority differentiation (all tasks equally important)
+3. Each task represents exactly one client
+4. The metric is not used to evaluate, reward, or direct behavior
+
+These conditions are rarely met in the systems where the metric is most
+commonly used.
+
+---
+
+## 11. Related Work
+
+This paper sits at the intersection of several literatures that have not
+previously been connected.
+
+### 11.1 Scheduling Theory and Fairness
+
+Smith [1] established the SPT optimality result and the weighted ratio
+rule (later popularized as WSJF [5]) in 1956.
+Conway, Maxwell, and Miller [2] provided the comprehensive textbook
+treatment. The fairness of size-based scheduling policies has been debated
+in computer systems scheduling: Bansal and Harchol-Balter [22] investigated
+SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness
+classifications against Processor-Sharing; Angel, Bampis, and Pascual [21]
+measured SPT schedule quality against fair optimality criteria.
+
+This prior work analyzes fairness in CPU and server scheduling. The present
+paper applies the same mathematical results to *organizational task
+management*, where the "scheduler" is a human team, the "jobs" are client
+requests with business-impact priorities, and the "objective function" is
+a management metric. The mechanism is identical; the consequences differ
+because organizational scheduling has priority systems, client
+relationships, and psychological costs that CPU scheduling does not.
+
+### 11.2 Measurement Dysfunction
+
+Austin [18] proved that incomplete measurement — measuring only a subset
+of relevant dimensions — creates incentives to optimize the measured
+dimensions at the expense of unmeasured ones, and that this effect is not
+merely possible but *inevitable* when measurement is tied to rewards. His
+information-asymmetry framing closely parallels Section 7. The present
+paper provides the specific mathematical mechanism (Theorems 1–2) for the
+case of task scheduling, and extends the argument through psychology
+(Section 8) to trace the complete chain of organizational harm.
+
+Muller [19] documented "metric fixation" across education, healthcare,
+policing, and finance, providing extensive empirical evidence for the
+patterns theorized in Section 7.4. Campbell [24] formalized the corrupting
+effect of using indicators as targets, complementing Goodhart's original
+observation [6] and Strathern's generalization [7].
+
+Bevan and Hood [26] empirically documented gaming behaviors in the English
+public health care system, including the exact patterns of "hitting the
+target and missing the point" described in our Section 5.2.
+
+### 11.3 Psychological Costs of Metric Dysfunction
+
+The application of moral injury (Shay [16], Litz et al. [17]) to business
+settings has recent precedent: a 2024 *Journal of Business Ethics* study
+[25] explicitly extended the construct to for-profit workplaces, finding
+structural conditions similar to those described in Section 8.4. Moore
+et al. [27] analyzed moral *disengagement*, the cognitive restructuring that
+enables unethical behavior under organizational pressure. The present
+paper addresses the complementary phenomenon: the harm to individuals who
+*refuse* to disengage.
+
+### 11.4 What Is Novel
+
+The individual components — SPT optimality, Goodhart's Law, measurement
+dysfunction, moral injury — all have precedent. The contributions of this
+paper are:
+
+1. **The conservation law (Theorem 2) used prescriptively** — as a
+   constructive argument that work-weighted completion time *cannot* be
+   gamed, rather than as a theoretical scheduling result.
+
+2. **The specific proof that priority classes make the metric algebraically
+   adversarial** (Theorems 8–9) — not merely empirically bad but
+   structurally contradictory, with zero mutual information between the
+   schedule and the priority system.
+
+3. **The integrated chain** from mathematical proof, through information
+   asymmetry and psychological harm, to an adversarial selection spiral,
+   tracing a single metric from Smith (1956) to organizational
+   hollowing.
+
+4. **The manager internalization strategy** (Section 9) with formal
+   game-theoretic analysis of its stability and breakdown conditions
+   under inter-team competition.
+
+5. **The application of scheduling theory to organizational management
+   critique** — proving that a commonly used team metric has specific,
+   quantifiable pathologies rather than arguing from anecdote or
+   general principle.
+
+---
+
+## 12. Conclusion
+
+The unweighted average completion time is a **biased statistic** that:
+
+1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
+   completion time which is schedule-invariant (Theorem 2).
+2. **Incentivizes starvation** of large tasks (Theorem 3).
+3. **Degrades client satisfaction** with zero compensating productivity
+   gain (Theorem 7).
+4. **Actively contradicts priority systems** by carrying zero information
+   about business-impact classification (Theorem 9).
+5. **Ignores priority entirely** in its scheduling recommendation,
+   producing suboptimal priority-weighted delay whenever priority and
+   size are not perfectly inversely correlated (Theorem 10).
+
+A metric that can be improved by reordering work — without doing any
+additional work — is measuring the scheduling policy, not the system's
+capacity. When combined with a priority system, it recommends a schedule
+that ignores priority and, whenever the highest-priority work is also the
+largest, delays that work the most.
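+
+The contrast between Theorems 1 and 2 can be checked by hand on a
+hypothetical three-task batch with sizes $p = (1, 2, 5)$ hours. This sketch
+assumes one server, no preemption, all tasks available at $t = 0$, and the
+work-weighted mean defined as $\sum_i p_i C_i / \sum_i p_i$:
+
+$$
+\begin{aligned}
+\text{SPT } (1, 2, 5): &\quad C = (1, 3, 8), \quad \bar{C} = \tfrac{12}{3} = 4, \quad \frac{\sum_i p_i C_i}{\sum_i p_i} = \frac{1 + 6 + 40}{8} = \frac{47}{8} \\
+\text{LPT } (5, 2, 1): &\quad C = (5, 7, 8), \quad \bar{C} = \tfrac{20}{3} \approx 6.67, \quad \frac{\sum_i p_i C_i}{\sum_i p_i} = \frac{25 + 14 + 8}{8} = \frac{47}{8}
+\end{aligned}
+$$
+
+Reordering moves the unweighted mean from 4 to about 6.67 with no change in
+work done, while the work-weighted mean stays at $47/8 = 5.875$ under both
+orders.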
+
+When the metric is reported to clients, it creates an information asymmetry
+(Section 7) whose business equilibrium is profitable but fragile. When
+team members understand its flaws, it violates their intrinsic motivation
+and selects for the departure of the most competent people (Section 8).
+A single informed manager can partially mitigate these effects through
+constrained optimization (Section 9), but this cooperative strategy is
+not stable under inter-team competition.
+
+The unweighted mean is defensible only under narrow conditions
+(Section 10.5): uniform task sizes, no priorities, one-to-one client-task
+mapping, and no behavioral influence. These conditions are rarely met.
+
+**Unweighted average completion time is not a fair or accurate measurement
+of task execution performance. Its adoption as a team metric will
+rationally produce starvation of complex work, violation of stated
+priorities, inequitable client outcomes, and the illusion of productivity
+where none exists.**
+
+The best solution is organizational metric reform. The actionable solution
+is a manager who understands this proof.
 
 ---
 
@@ -1489,59 +1042,48 @@ doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106)
 
 > Origin of the SPT optimality result (Theorem 1), the weighted completion
 > time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job
-> pairwise interchange (exchange argument) proof technique used throughout
-> this paper.
+> pairwise interchange (exchange argument) proof technique used throughout.
 
 [2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of
 Scheduling*. Addison-Wesley.
 
-> Comprehensive treatment of single-machine and multi-machine scheduling
-> theory, extending Smith's results. Standard textbook reference for the
-> exchange argument and its generalizations.
+> Standard textbook treatment of single-machine scheduling theory,
+> extending Smith's results.
 
 [3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW.
 *Operations Research*, 9(3), 383–387.
 doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383)
 
-> First rigorous proof of Little's Law, referenced in Section 5. The
-> result was known informally before 1961; this paper provided the
-> general proof requiring only stationarity and finite expectations.
+> First rigorous proof of Little's Law. Referenced in Section 3.2 for
+> queueing-theoretic context.
 
 [4] Little, J. D. C. (2011). Little's Law as viewed on its 50th
 anniversary. *Operations Research*, 59(3), 536–549.
 doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941)
 
-> Retrospective discussing the law's scope, limitations, and
-> common misapplications — including the batch-case subtleties
-> noted in Section 5 of this paper.
+> Retrospective discussing scope, limitations, and common misapplications.
 
 [5] Reinertsen, D. G. (2009). *The Principles of Product Development
 Flow: Second Generation Lean Product Development*. Celeritas Publishing.
 ISBN: 978-0-9844512-0-8.
 
-> Popularized the term "Weighted Shortest Job First" (WSJF) and the
-> "Cost of Delay divided by Duration" formulation in agile/lean product
-> development contexts. The underlying mathematical result is Smith
-> (1956) [1].
+> Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts.
+> Mathematical foundation is Smith (1956) [1].
 
 ### Measurement and Incentives
 
-[6] Goodhart, C. A. E. (1984). Problems of monetary management: The
-U.K. experience. In C. A. E. Goodhart, *Monetary Theory and Practice:
-The UK Experience* (pp. 91–121). Macmillan.
+[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K.
+experience. In *Monetary Theory and Practice* (pp. 91–121). Macmillan.
 
-> Source of Goodhart's Law. Original wording: "Any observed statistical
-> regularity will tend to collapse once pressure is placed upon it for
-> control purposes." First presented as a working paper for the Reserve
-> Bank of Australia in 1975.
+> Source of Goodhart's Law: "Any observed statistical regularity will tend
+> to collapse once pressure is placed upon it for control purposes."
 
 [7] Strathern, M. (1997). 'Improving ratings': Audit in the British
 university system. *European Review*, 5(3), 305–321.
 doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4)
 
-> Generalized Goodhart's observation into the form commonly cited today:
-> "When a measure becomes a target, it ceases to be a good measure."
-> Referenced implicitly in Sections 6, 11.4, and Appendix A.4.
+> Generalized Goodhart's Law: "When a measure becomes a target, it ceases
+> to be a good measure."
 
 ### Behavioral Economics
 
@@ -1549,11 +1091,7 @@ doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi
 decision under risk. *Econometrica*, 47(2), 263–292.
 doi:[10.2307/1914185](https://doi.org/10.2307/1914185)
 
-> Established loss aversion — the finding that losses are weighted
-> approximately twice as heavily as equivalent gains in subjective
-> evaluation. Referenced in Section 7.4 to argue that the dissatisfaction
-> of deprioritized large-task clients outweighs the satisfaction gained
-> by small-task clients under SPT.
+> Established loss aversion. Referenced in Section 4.5.
 
 ### Game Theory and Contract Theory
 
@@ -1561,78 +1099,55 @@ doi:[10.2307/1914185](https://doi.org/10.2307/1914185)
 and the market mechanism. *The Quarterly Journal of Economics*, 84(3),
 488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431)
 
-> Foundational model of information asymmetry and adverse selection.
-> The pooling equilibrium described in Appendix A.5 — where the client
-> cannot distinguish high-quality from low-quality service because both
-> produce the same aggregate metric — is structurally analogous to
-> Akerlof's lemons problem.
+> Information asymmetry and adverse selection. The pooling equilibrium in
+> Section 7.5 is structurally analogous.
 
 [10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell
 Journal of Economics*, 10(1), 74–91.
 doi:[10.2307/3003320](https://doi.org/10.2307/3003320)
 
-> Formal treatment of moral hazard — the problem arising when an agent's
-> actions are not fully observable by the principal. The metric-reporting
-> scenario in Appendix A.5 is a moral hazard problem: the provider
-> (agent) chooses the schedule, but the client (principal) observes only
-> the aggregate outcome.
+> Formal treatment of moral hazard. The metric-reporting scenario in
+> Section 7.5 is a moral hazard problem.
 
 ### Psychology
 
 [11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford
 University Press. ISBN: 978-0-8047-0131-0.
 
-> Foundational theory of cognitive dissonance. Referenced in Appendix
-> B.2: an individual holding contradictory cognitions experiences
-> psychological discomfort and is motivated to reduce the contradiction.
-> The proof eliminates the ambiguity that would normally allow
-> rationalization, making the dissonance load-bearing.
+> Foundational theory. Referenced in Section 8.2.
 
 [12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and
 Self-Determination in Human Behavior*. Plenum Press.
 ISBN: 978-0-306-42022-1.
 
-> Original book-length treatment of Self-Determination Theory,
-> identifying autonomy, competence, and relatedness as innate
-> psychological needs. Referenced in Appendix B.3.
+> Original treatment of Self-Determination Theory. Referenced in
+> Section 8.3.
 
 [13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and
 the facilitation of intrinsic motivation, social development, and
 well-being. *American Psychologist*, 55(1), 68–78.
 doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68)
 
-> Overview and update of Self-Determination Theory, linking need
-> satisfaction to intrinsic motivation, job satisfaction, and
-> psychological well-being. The three-need framework (autonomy,
-> competence, relatedness) applied in Appendix B.3.
+> SDT overview linking need satisfaction to intrinsic motivation and
+> well-being.
 
 [14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape
 traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9.
 doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514)
 
-> Original experimental demonstration of learned helplessness.
-> Co-authored with Steven F. Maier. Referenced in Appendix B.5:
-> repeated exposure to uncontrollable outcomes (failed advocacy for
-> better metrics) produces passivity and disengagement.
+> Original demonstration of learned helplessness. Referenced in
+> Section 8.5.
 
 [15] Seligman, M. E. P. (1975). *Helplessness: On Depression,
-Development, and Death*. W. H. Freeman.
-ISBN: 978-0-7167-0752-3.
+Development, and Death*. W. H. Freeman. ISBN: 978-0-7167-0752-3.
 
 > Extended treatment connecting learned helplessness to human depression
-> and institutional behavior. The concept of "metric fatalism" described
-> in Appendix B.5 is a domain-specific instance of learned helplessness
-> in organizational settings.
+> and institutional behavior.
 
-[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the
-Undoing of Character*. Atheneum / Simon & Schuster.
-ISBN: 978-0-689-12182-3.
+[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the Undoing
+of Character*. Atheneum / Simon & Schuster. ISBN: 978-0-689-12182-3.
 
-> Introduced the concept of moral injury through analysis of Vietnam
-> combat veterans' experiences, drawing parallels to Homer's *Iliad*.
-> Defined moral injury as arising from a betrayal of "what's right" by
-> someone in legitimate authority in a high-stakes situation. Referenced
-> in Appendix B.4.
+> Introduced the concept of moral injury. Referenced in Section 8.4.
 
 [17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P.,
 Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war
@@ -1640,12 +1155,94 @@ veterans: A preliminary model and intervention strategy. *Clinical
 Psychology Review*, 29(8), 695–706.
 doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003)
 
-> Formalized moral injury as a clinical construct and proposed a
-> treatment model. Defined moral injury as resulting from "perpetrating,
-> failing to prevent, bearing witness to, or learning about acts that
-> transgress deeply held moral beliefs and expectations." This definition
-> is quoted in Appendix B.4 and applied to knowledge workers operating
-> under synthetic metrics.
+> Formalized moral injury as a clinical construct. Definition quoted in
+> Section 8.4.
+
+### Organizational Measurement
+
+[18] Austin, R. D. (1996). *Measuring and Managing Performance in
+Organizations*. Dorset House. ISBN: 978-0-932633-36-1.
+
+> Proved that incomplete measurement creates inevitable incentives to
+> optimize measured dimensions at the expense of unmeasured ones. The
+> information-asymmetry framing closely parallels Section 7. The single
+> most important predecessor to this paper's argument.
+
+[19] Muller, J. Z. (2018). *The Tyranny of Metrics*. Princeton University
+Press. ISBN: 978-0-691-17495-2.
+
+> Comprehensive treatment of "metric fixation" across education,
+> healthcare, policing, and finance. Extensive empirical evidence for the
+> patterns theorized in Section 7.4.
+
+### Scheduling Fairness
+
+[20] Shanthikumar, J. G., & Yao, D. D. (1992). Multiclass queueing
+systems: Polymatroidal structure and optimal scheduling control.
+*Operations Research*, 40(S2), S293–S299.
+
+> Conservation laws in scheduling. The schedule-invariance of
+> work-weighted completion time (Theorem 2) is an instance of these
+> conservation laws.
+
+[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT
+schedules for fair optimality criteria? *Annals of Operations Research*,
+159(1), 53–64. doi:[10.1007/s10479-007-0267-0](https://doi.org/10.1007/s10479-007-0267-0)
+
+> Directly measures SPT schedule quality against fairness criteria.
+> Closest predecessor in scheduling theory to Section 4's fairness
+> analysis.
+
+[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT
+scheduling: Investigating unfairness. *ACM SIGMETRICS Performance
+Evaluation Review*, 29(1), 279–290.
+doi:[10.1145/384268.378792](https://doi.org/10.1145/384268.378792)
+
+> Investigates the belief that SRPT unfairly penalizes large jobs in
+> computer scheduling. Argues unfairness is smaller than believed but
+> acknowledges the core tension.
+
+[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling
+policies with respect to unfairness in an M/GI/1. *ACM SIGMETRICS
+Performance Evaluation Review*, 31(1), 238–249.
+
+> Formalizes fairness definitions for scheduling policies by comparison
+> to Processor-Sharing.
+
+### Additional References
+
+[24] Campbell, D. T. (1979). Assessing the impact of planned social
+change. *Evaluation and Program Planning*, 2(1), 67–90.
+doi:[10.1016/0149-7189(79)90048-X](https://doi.org/10.1016/0149-7189(79)90048-X)
+
+> Campbell's Law: "The more any quantitative social indicator is used for
+> social decision-making, the more subject it will be to corruption
+> pressures and the more apt it will be to distort and corrupt the social
+> processes it is intended to monitor." Complements Goodhart's Law [6].
+
+[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study
+of moral injury in business settings. *Journal of Business Ethics*.
+doi:[10.1007/s10551-024-05615-0](https://doi.org/10.1007/s10551-024-05615-0)
+
+> Extends moral injury to for-profit workplaces. Supports Section 8.4's
+> application of Shay/Litz beyond military and healthcare settings.
+
+[26] Bevan, G., & Hood, C. (2006). What's measured is what matters:
+Targets and gaming in the English public health care system. *Public
+Administration*, 84(3), 517–538.
+doi:[10.1111/j.1467-9299.2006.00600.x](https://doi.org/10.1111/j.1467-9299.2006.00600.x)
+
+> Empirically documents gaming behaviors including "hitting the target
+> and missing the point." Provides real-world evidence for Section 5.2's
+> priority-metric contradiction.
+
+[27] Moore, C., Detert, J. R., Treviño, L. K., Baker, V. L., & Mayer,
+D. M. (2012). Why employees do bad things: Moral disengagement and
+unethical organizational behavior. *Personnel Psychology*, 65(1), 1–48.
+doi:[10.1111/j.1744-6570.2011.01237.x](https://doi.org/10.1111/j.1744-6570.2011.01237.x)
+
+> Analyzes moral *disengagement* — the cognitive restructuring enabling
+> unethical behavior. Section 8 addresses the complementary phenomenon:
+> harm to individuals who *refuse* to disengage.
 
 ---