From 6d3e4a5cb3344fba3b4867f49ac5d598d692fc6b Mon Sep 17 00:00:00 2001 From: Mortdecai Date: Sat, 28 Mar 2026 19:03:12 -0400 Subject: [PATCH] Complete structural revision: reorganize, add citations, tighten prose Major restructure into four parts with clear argumentative arc: Part I: Mathematical Foundation (Theorems 1-7) Part II: Priority Systems (Theorems 8-11, IT example) Part III: Organizational Dynamics (info asymmetry, psychology, manager strategy) Part IV: Assessment (devil's advocate, related work, conclusion) Structural changes: - Added Section 1 (Introduction) framing the contribution - Promoted Appendices A/B to full Sections 7/8 (load-bearing content) - Merged Little's Law as a remark in Section 3.2 (was a detour) - Merged "When Valid" into Devil's Advocate Section 10.5 - Added Section 11 (Related Work) situating the paper - Cleaned up "Hmm" and "Wait" language in Theorems 11/WSJF - Renumbered all sections and cross-references - Net reduction of 400 lines while adding new content New citations [18-27]: - Austin (1996) - measurement dysfunction (most important predecessor) - Muller (2018) - The Tyranny of Metrics - Coffman/Shanthikumar/Yao (1992) - conservation laws in scheduling - Angel/Bampis/Pascual (2008) - SPT fairness criteria - Bansal/Harchol-Balter (2001) - SRPT unfairness - Wierman/Harchol-Balter (2003) - fairness classifications - Campbell (1979) - Campbell's Law - Ferreira et al. (2024) - moral injury in business - Bevan/Hood (2006) - gaming in public health - Moore (2012) - moral disengagement (complementary to our argument) Citations woven into body: Austin referenced in Sections 4.1, 5.3; scheduling fairness papers in Section 4.2 note; Campbell/Muller in Section 7.4; moral injury extension in Section 8.4; all contextualized in Related Work Section 11. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 2013 +++++++++++++++++++++-------------------------------- 1 file changed, 805 insertions(+), 1208 deletions(-) diff --git a/README.md b/README.md index c6fc466..ba10bcf 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,50 @@ of genuine throughput or service quality. --- -## 1. Definitions +## 1. Introduction + +Many organizations measure task-execution performance by **unweighted mean +completion time**: the average number of hours (or days) between task +submission and task resolution, counting each task equally regardless of +size or priority. + +This paper proves that this metric is not merely imprecise but structurally +biased. It can be improved by reordering work without doing any additional +work (Theorem 1), while a properly weighted alternative is completely +immune to scheduling manipulation (Theorem 2). When combined with a +priority system, the metric actively contradicts the organization's own +priority classifications (Theorem 9). + +The argument proceeds in four parts: + +- **Part I** (Sections 2–4) establishes the mathematical foundation: + the unweighted mean is gameable by Shortest Processing Time (SPT) + scheduling, the work-weighted mean is schedule-invariant, and the + resulting service-quality consequences are provably negative. + +- **Part II** (Sections 5–6) extends the model to priority-classified + tasks, proves the metric becomes adversarial to the priority system, + and proposes weighted alternatives with a worked IT service desk example. + +- **Part III** (Sections 7–9) examines organizational dynamics: what + happens when the metric is reported to clients (information asymmetry), + what happens to team members who understand its flaws (psychological + harm), and what a single informed manager can do about it (constrained + optimization with game-theoretic stability analysis). + +- **Part IV** (Sections 10–12) presents honest counterarguments, situates + the work in existing literature, and concludes. + +The core results build on Smith's (1956) foundational scheduling theory [1], +extended through game theory [9, 10], organizational measurement theory +[18, 19], and psychology [11–17] to trace a complete chain from a +mathematical proof about a specific metric to organizational outcomes. + +--- + +# Part I: Mathematical Foundation + +## 2. Definitions Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n$. @@ -28,16 +71,19 @@ $$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\s --- -## 2. SPT Is Optimal for the Unweighted Statistic +## 3. Core Results -**Theorem 1.** The schedule that minimizes $\bar{C}(\sigma)$ is Shortest -Processing Time first (SPT): sort tasks so that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$. +### 3.1 The Unweighted Mean Is Gameable -**Proof (exchange argument).** +**Theorem 1** (Smith, 1956 [1])**.** The schedule that minimizes +$\bar{C}(\sigma)$ is Shortest Processing Time first (SPT): sort tasks so +that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$. + +**Proof (exchange argument [1, 2]).** Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy -$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ be the -start time of task $i$. +$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ +be the start time of task $i$. | | Task $i$ finishes | Task $j$ finishes | Sum | |---|---|---|---| @@ -48,16 +94,14 @@ The change in the sum of completion times is: $$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$ -Every swap of a longer-before-shorter adjacent pair strictly reduces the total. -Any non-SPT schedule contains such a pair. Repeated swaps converge to SPT. -Therefore SPT uniquely minimizes $\bar{C}(\sigma)$. $\blacksquare$ +Every swap of a longer-before-shorter adjacent pair strictly reduces the +total. Any non-SPT schedule contains such a pair. Repeated swaps converge +to SPT. Therefore SPT uniquely minimizes $\bar{C}(\sigma)$. $\blacksquare$ ---- +### 3.2 The Work-Weighted Mean Is Schedule-Invariant -## 3. The Work-Weighted Statistic Is Schedule-Invariant - -**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ is -the same for every schedule $\sigma$. +**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ +is the same for every schedule $\sigma$. **Proof.** @@ -65,27 +109,24 @@ Expand the numerator: $$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$ -Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum counts -every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$: +Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum +counts every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$: $$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$ -For any pair $(a, b)$ with $a \ne b$, exactly one of $\{b \preceq_\sigma a\}$ -or $\{a \prec_\sigma b\}$ holds. The diagonal terms ($a = b$) contribute $p_a^2$ -regardless of order. Therefore: +For any pair $(a, b)$ with $a \ne b$, exactly one of +$\{b \preceq_\sigma a\}$ or $\{a \prec_\sigma b\}$ holds. The diagonal +terms ($a = b$) contribute $p_a^2$ regardless of order. Therefore: $$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$ -Now consider the complementary sum: - -$$\sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b$$ - -Together the two off-diagonal sums cover all unordered pairs $\{a, b\}$: +Together with the complementary sum, the two off-diagonal sums cover all +unordered pairs: $$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$ -The right-hand side is schedule-independent. By symmetry of $p_a p_b$, both -off-diagonal sums are equal: +The right-hand side is schedule-independent. By symmetry of $p_a p_b$, +both off-diagonal sums are equal: $$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$ @@ -100,212 +141,121 @@ $$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum is **constant across all schedules**. $\blacksquare$ ---- +This is an instance of the conservation laws in scheduling identified by +Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to +measuring how long a unit of *work* waits rather than how long a *task* +waits — the unweighted statistic counts completions rather than work, +which is why it is gameable. (See also Little [3, 4] for the queueing- +theoretic context, with the caveat that Little's Law applies directly +only to steady-state systems, not to the batch case analyzed here.) -## 4. Concrete Example +### 3.3 Illustrative Example Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours. -### SPT order (A first) - -| Task | Completion time | -|------|----------------| -| A | 1 | -| B | 11 | - -- Unweighted mean: $(1 + 11) / 2 = 6.0$ -- Work-weighted mean: $(1 \times 1 + 10 \times 11) / 11 = 111/11 \approx 10.09$ - -### Reverse order (B first) - -| Task | Completion time | -|------|----------------| -| B | 10 | -| A | 11 | - -- Unweighted mean: $(10 + 11) / 2 = 10.5$ -- Work-weighted mean: $(10 \times 10 + 1 \times 11) / 11 = 111/11 \approx 10.09$ +| Schedule | $C_A$ | $C_B$ | Unweighted mean | Work-weighted mean | +|----------|-------|-------|-----------------|-------------------| +| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 | +| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 | SPT appears **4.5 hours better** on the unweighted metric but provides -**zero improvement** on the work-weighted metric. The apparent advantage exists -only because the unweighted statistic lets a 1-hour task "vote" equally with -a 10-hour task. +**zero improvement** on the work-weighted metric. The apparent advantage +exists only because the unweighted statistic lets a 1-hour task "vote" +equally with a 10-hour task. --- -## 5. Connection to Little's Law +## 4. Consequences for Service Quality -Little's Law states $L = \lambda W$, where $L$ is the time-averaged number -of tasks in the system, $\lambda$ is the arrival rate, and $W$ is the -average time a task spends in the system. +### 4.1 Starvation of Large Tasks -In a *steady-state* queueing system with fixed arrival and service rates, -$\lambda$ and the long-run service rate are determined by the workload, not -by scheduling policy. Little's Law then tells us that $L$ and $W$ are -linked, but in the batch case (all $n$ tasks present at time 0), $L$ and -$W$ are both schedule-dependent: $\bar{C} = W$, and -$L = \sum C_i / \sum p_i$, both of which SPT minimizes. +**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes +unweighted mean completion time necessarily maximizes the completion time +of the largest task. -The invariance we proved in Theorem 2 is more specific: *work-weighted* -mean completion time $\bar{C}_w$ is constant across schedules. This -corresponds to measuring the system from the perspective of "how long does -a unit of *work* wait" rather than "how long does a *task* wait." The -unweighted statistic measures the latter and is gameable precisely because -it counts completions rather than work. - ---- - -## 6. Consequences - -**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes unweighted -mean completion time necessarily maximizes the completion time of the largest -task relative to other schedules. - -**Proof.** SPT places the largest task last. Its completion time equals the -total processing time $\sum p_i$, which is the maximum possible completion -time for any individual task. Meanwhile, FIFO or any non-SPT order would -allow the large task to finish earlier. $\blacksquare$ +**Proof.** SPT places the largest task last. Its completion time equals +the total processing time $\sum p_i$, which is the maximum possible +completion time for any individual task. Under any schedule that does not +place the largest task last, that task completes strictly earlier. +$\blacksquare$ This creates a **starvation incentive**: rational agents optimizing the -unweighted statistic will indefinitely defer large tasks in favor of -small ones. +unweighted statistic will indefinitely defer large tasks in favor of small +ones. Austin [18] identified this general pattern — that incomplete +measurement creates incentives to optimize the measured dimension at the +expense of unmeasured ones — in the context of organizational performance +management. Theorem 3 provides the specific mechanism for task scheduling. -### Real-world manifestations - -| Domain | Gameable metric | Perverse outcome | -|--------|----------------|------------------| -| Support desks | Tickets closed / day | Complex issues ignored | -| Sprint planning | Story count velocity | Work split into trivial pieces | -| Emergency rooms | Average wait time | Critical patients deprioritized | -| Academic publishing | Papers per year | Incremental work favored over deep research | - ---- - -## 7. Impact on Client Satisfaction and Team Productivity - -The preceding theorems are not merely abstract. They have direct, provable -consequences for client satisfaction and team productivity when a team adopts -unweighted mean completion time as its performance metric. - -### 7.1 Defining Client Satisfaction: The Slowdown Ratio - -A client submitting a task of size $p_i$ has an expectation anchored to that -size. The natural measure of their experience is the **slowdown ratio**: - -$$S_i = \frac{C_i}{p_i}$$ - -This is the factor by which the client's wait exceeds the task's inherent -processing time. A slowdown of 1 means no queuing delay at all. A slowdown -of 10 means the client waited 10x longer than the work itself required. - -Client satisfaction is inversely related to slowdown: a client who waits -2x their task size is more satisfied than one who waits 20x, regardless of -the absolute times involved. +### 4.2 Maximum Completion Time for the Largest Task **Theorem 4 (SPT Uniquely Maximizes Completion Time of the Largest Task).** Among all schedules, SPT is the unique policy that assigns the maximum possible completion time ($\sum p_i$) to the largest task. -**Proof.** - -SPT sorts tasks in ascending order of $p_i$, placing the largest task -$p_{\max}$ in the last position. The last task in any schedule has -completion time $\sum_{i=1}^{n} p_i$, which is the maximum completion time -any individual task can receive. Therefore, under SPT: - -$$C_{\max\text{-task}}^{\text{SPT}} = \sum_{i=1}^{n} p_i$$ - -Under any schedule that does not place $p_{\max}$ last, the largest task -completes strictly before $\sum p_i$. SPT is the unique schedule (among -those ordered by processing time) that assigns this worst-case completion -time to the largest task. - -Note on slowdown: SPT actually *compresses* slowdown ratios ($S_i = C_i / p_i$) -because larger tasks in later positions have large denominators that absorb -the accumulated sum. For example, with tasks $[1, 5, 10]$: - -- SPT: slowdowns $[1, 1.2, 1.6]$ — low variance -- LPT: slowdowns $[1, 3, 16]$ — high variance - -SPT's harm to large-task clients is not visible in the slowdown ratio. It is -visible in **absolute completion time**: the largest task finishes last, at -$\sum p_i$, while under any other ordering it finishes earlier. $\blacksquare$ +**Proof.** SPT sorts tasks in ascending order of $p_i$, placing the largest +task $p_{\max}$ in the last position. The last task in any schedule has +completion time $\sum_{i=1}^{n} p_i$, which is the maximum any individual +task can receive. Under any schedule that does not place $p_{\max}$ last, +it completes strictly before $\sum p_i$. $\blacksquare$ **Corollary 4.1.** A team optimizing unweighted mean completion time will systematically deliver the worst experience to clients with the most -complex needs. +complex needs. This is not a side effect — it is the *mechanism* by which +the metric improves. -This is not a side effect — it is the *mechanism* by which the metric improves. -The only way to lower the unweighted average is to complete more small tasks -early, which necessarily means completing large tasks later. The metric -improves *because* high-effort clients are deprioritized. +**Note on slowdown ratios.** SPT actually *compresses* slowdown ratios +($S_i = C_i / p_i$) because larger tasks in later positions have large +denominators that absorb the accumulated sum. For example, with tasks +$[1, 5, 10]$: SPT gives slowdowns $[1, 1.2, 1.6]$ (low variance) while +LPT gives $[1, 3, 16]$ (high variance). SPT's harm to large-task clients +is not visible in the slowdown ratio — it is visible in **absolute +completion time**. This distinction is important: the scheduling fairness +literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through +slowdown-based measures, which can obscure the absolute-delay burden +proved below. -### 7.2 The Absolute Delay Burden +### 4.3 Delay Concentration -The slowdown ratio $S_i = C_i / p_i$ might suggest SPT is *fair* — it -compresses slowdown variance by giving everyone a ratio close to 1. But -this obscures the real cost. The correct measure of burden is the -**absolute delay** experienced by each task: +**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, +the largest task bears more absolute delay than under any other schedule. -$$\Delta_i = C_i - p_i$$ - -This is the time a task spends waiting for other tasks, independent of its -own size. Under any sequential schedule, the total delay across all tasks -is schedule-dependent (it equals $\sum C_i - \sum p_i$), and SPT minimizes -this total. But the *distribution* of delay matters. - -**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, the -largest task bears more absolute delay than under any other schedule. - -**Proof.** Under SPT, the largest task is in position $n$ with: +**Proof.** Define absolute delay as $\Delta_i = C_i - p_i$ (time spent +waiting, independent of own size). Under SPT, the largest task is in +position $n$ with: $$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$ This is the sum of all other tasks' processing times — the maximum possible delay for any single task. Under any schedule where the largest task is not -last, its delay is strictly less than $\sum_{i \ne \max} p_i$. +last, its delay is strictly less. Meanwhile, SPT gives the smallest task +zero delay ($\Delta_1^{\text{SPT}} = 0$). The entire queuing burden is +shifted from small tasks to large tasks. $\blacksquare$ -Meanwhile, SPT gives the smallest task zero delay ($\Delta_1^{\text{SPT}} = 0$). -The entire queuing burden is shifted from small tasks to large tasks. -$\blacksquare$ +SPT minimizes *total* delay (good for aggregate efficiency) by +concentrating delay onto the tasks best able to absorb it in slowdown-ratio +terms. But in absolute terms — hours spent waiting — the largest task bears +the full weight. -The tension is this: SPT minimizes total delay (good for aggregate -efficiency) by concentrating delay onto the tasks best able to "absorb" it -in slowdown-ratio terms. But in absolute terms — hours spent waiting — the -largest task bears the full weight. If that task represents a critical -business need, the absolute delay, not the ratio, determines the damage. - -### 7.3 Productivity Is Not Improved +### 4.4 Throughput Invariance **Theorem 6 (Throughput Invariance).** Total work completed over any time horizon $T$ is identical under all scheduling policies. -**Proof.** The executor processes work at a fixed rate. Over time $T$, the -total work completed is: - -$$W(T) = \sum_{\{i : C_i \le T\}} p_i + \text{(partial progress on current task)}$$ - -In the non-preemptive case (tasks run to completion once started), $W(T)$ may -vary slightly at the boundary depending on which task is in progress at time -$T$. However, over any horizon $T \ge \sum p_i$ (i.e., long enough to -complete all tasks), the total work done is exactly $\sum p_i$ regardless -of order. - -For the steady-state case with ongoing arrivals, the long-run throughput is -determined by the service rate $\mu$ and is completely independent of -scheduling: +**Proof.** The executor processes work at a fixed rate. Over any horizon +$T \ge \sum p_i$, the total work done is exactly $\sum p_i$ regardless of +order. For the steady-state case with ongoing arrivals, the long-run +throughput is determined by the service rate $\mu$ and is completely +independent of scheduling: $$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$ $\blacksquare$ **Corollary 6.1.** A team that switches from any scheduling policy to SPT -will observe an improvement in unweighted mean completion time with -**zero change in actual throughput**. +will observe an improvement in unweighted mean completion time with **zero +change in actual throughput**. The metric improves. The output does not. -The metric improves. The output does not. - -### 7.4 The Compound Effect: Satisfaction Down, Productivity Flat +### 4.5 The Compound Effect Combining Theorems 4, 5, and 6: @@ -315,26 +265,19 @@ Combining Theorems 4, 5, and 6: | Delay for small tasks | Minimized — approaches zero (SPT) | | Delay for large tasks | **Maximized** — bears all queuing burden (Theorem 5) | | Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) | -| Overall perceived quality of service | **Net negative** (see below) | The net effect on perceived quality is negative because: -1. **Loss aversion is asymmetric.** A client whose 100-hour task is - deprioritized to last experiences a large, salient negative. A client - whose 1-hour task moves from position 5 to position 1 experiences a - small, often unnoticed positive. The absolute dissatisfaction created - exceeds the absolute satisfaction gained. +1. **Loss aversion is asymmetric** [8]. A client whose 100-hour task is + deprioritized experiences a large, salient negative. A client whose + 1-hour task is expedited experiences a small, often unnoticed positive. 2. **High-effort tasks correlate with high-value clients.** Large tasks are disproportionately likely to come from major clients, complex - contracts, or critical business needs. Systematically giving these - clients the worst experience is anti-correlated with revenue and - retention. + contracts, or critical business needs. 3. **Starvation compounds.** In a continuous system (Theorem 3), large - tasks are not merely delayed — they may be **indefinitely deferred** - as new small tasks keep arriving. The affected client's satisfaction - does not merely decrease; it collapses entirely. + tasks may be **indefinitely deferred** as new small tasks keep arriving. **Theorem 7 (The Core Result).** For a team processing tasks of non-uniform size, adopting unweighted mean completion time as a performance metric: @@ -345,38 +288,23 @@ size, adopting unweighted mean completion time as a performance metric: (c) **Concentrating all queuing delay** onto the largest tasks while eliminating delay for the smallest (Theorem 5). -This is not a tradeoff — there is no compensating benefit on the productivity -side. The metric creates a pure transfer of service quality from high-effort -clients to low-effort clients, with no net work gained. - -**A team using unweighted mean completion time as its performance metric -will, under rational optimization, simultaneously fail to improve -productivity and systematically degrade the experience of its most -demanding clients.** $\blacksquare$ +This is not a tradeoff. The metric creates a pure transfer of service +quality from high-effort clients to low-effort clients, with no net work +gained. $\blacksquare$ --- -## 8. When Unweighted Mean Completion Time Is Valid +# Part II: Priority Systems -For completeness: the unweighted metric is appropriate **if and only if** -all tasks are approximately equal in size ($p_i \approx p_j$ for all $i, j$). -In this case, the work-weighted and unweighted statistics converge, SPT and -FIFO produce similar schedules, and slowdown ratios are naturally equal. +## 5. Breakdown Under Priority Classification -The pathology arises specifically from **variance in task size**. The greater -the variance, the greater the distortion, and the more damage the metric -causes when optimized. +The preceding sections proved that unweighted mean completion time is +biased when tasks vary in size. We now show that introducing a **priority +system** — as virtually all real teams use — causes the metric to become +not merely biased but **actively adversarial** to the organization's stated +goals. ---- - -## 9. Complete Breakdown Under Priority Classification - -The preceding sections proved that unweighted mean completion time is biased -when tasks vary in size. We now show that introducing a **priority system** — -as virtually all real teams use — causes the metric to become not merely -biased but **actively adversarial** to the organization's stated goals. - -### 9.1 Extended Model: Tasks With Priority +### 5.1 Extended Model: Tasks With Priority Let each task $i$ have processing time $p_i$ and a priority class $q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and @@ -388,213 +316,140 @@ The specific weights are illustrative; the results hold for any strictly decreasing weight function. The key property is that priority is assigned by **business impact**, not by task size. -### 9.2 The Metric Contradicts the Priority System +### 5.2 The Metric Contradicts the Priority System **Theorem 8 (Priority-Size Inversion).** When priority is independent of -task size, the schedule that minimizes unweighted mean completion time (SPT) -will, in expectation, complete low-priority tasks before high-priority tasks -of greater size. +task size, the schedule that minimizes unweighted mean completion time +(SPT) will, in expectation, complete low-priority tasks before +high-priority tasks of greater size. -**Proof.** - -SPT orders tasks by $p_i$ ascending, regardless of $q_i$. Consider two tasks: +**Proof.** SPT orders tasks by $p_i$ ascending, regardless of $q_i$. +Consider two tasks: - Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage) - Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix) -SPT schedules B before A. The unweighted mean completion time for this pair: +SPT schedules B before A. The unweighted mean for this pair: -$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5$$ - -The priority-respecting order (A before B): - -$$\bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$ +$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$ The metric declares SPT nearly **twice as good** — despite completing a -cosmetic fix while a server outage burns for an additional 0.5 hours. +cosmetic fix while a server outage burns. -In general, for $n$ tasks where priority $q_i$ is statistically independent -of processing time $p_i$ (a reasonable assumption, since priority reflects -business impact while processing time reflects technical complexity): +In general, when $q_i$ is statistically independent of $p_i$, SPT's +ordering has **zero correlation** with priority. In practice, Critical +tasks (outages, security incidents, data loss) often require more work +than Low tasks, so the metric is plausibly **anti-correlated** with the +priority system. $\blacksquare$ -$$\text{Corr}(p_i, q_i) \approx 0$$ +### 5.3 Information Destruction -SPT's ordering is determined entirely by $p_i$. The expected position of a -task in the SPT schedule has **zero correlation** with its priority. A -Critical task is equally likely to be scheduled first or last. - -More precisely: the expected fraction of Critical tasks in the bottom half -of the SPT schedule equals the fraction of Critical tasks whose processing -time exceeds the median. In practice, Critical tasks (outages, security -incidents, data loss) often require more work, so this fraction exceeds 50%. -The metric is not merely uncorrelated with priority — it is plausibly -**anti-correlated**. $\blacksquare$ - -### 9.3 Dimensionality Collapse - -The unweighted mean completion time reduces a three-dimensional task -$(p_i, q_i, C_i)$ to a one-dimensional signal ($C_i$), then averages -that signal uniformly. This discards two of the three dimensions: - -1. **Priority ($q_i$) is completely ignored.** A critical task and a - cosmetic task contribute identically to the mean. -2. **Size ($p_i$) is implicitly inverted.** Small tasks are rewarded with - early completion, large tasks are punished — regardless of their - importance. +The unweighted mean reduces a three-dimensional task $(p_i, q_i, C_i)$ to +a one-dimensional signal ($C_i$), then averages uniformly. This discards +priority entirely and implicitly inverts size. **Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual -information between the schedule's implicit priority ranking (position in -schedule) and the actual priority assignment $q_i$. For SPT: +information between the schedule's implicit priority ranking (position) +and the actual priority assignment $q_i$. For SPT: $$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$ -**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and $q_i$ -are independent, knowing a task's position in the SPT schedule provides -zero information about its priority. The schedule is statistically -independent of the priority system. - -Contrast this with a priority-first schedule, where $I > 0$ by construction. -$\blacksquare$ +**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and +$q_i$ are independent, knowing a task's position in the SPT schedule +provides zero information about its priority. $\blacksquare$ **Corollary 9.1.** A team that optimizes unweighted mean completion time is operating a scheduling system that carries zero information about its own priority classification. The priority field in their ticketing system is, with respect to execution order, decorative. -### 9.4 Quantifying the Damage: Priority-Weighted Delay Cost +This is an instance of what Austin [18] calls the fundamental problem of +incomplete measurement: when the measurement system captures only a subset +of the relevant dimensions, optimizing the measurement systematically +degrades the unmeasured dimensions. + +### 5.4 Priority-Weighted Delay Cost Define the **priority-weighted delay cost** of a schedule: $$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$ -This measures the total business-impact-weighted time spent waiting. +**Theorem 10 (SPT and Priority-Weighted Delay Cost).** The optimal +schedule for minimizing $D(\sigma)$ is WSJF: order by $w(q_i)/p_i$ +descending [1, 5]. SPT's ordering — by $1/p_i$ descending — ignores +priority entirely and produces higher $D$ than priority-respecting +alternatives when priority is correlated with task size. -**Theorem 10 (SPT and Priority-Weighted Delay Cost).** -The optimal schedule for minimizing priority-weighted delay cost $D(\sigma)$ -is WSJF: order by $w(q_i)/p_i$ descending. SPT's ordering — by $1/p_i$ -descending — ignores priority entirely and produces higher $D$ than -priority-respecting alternatives when priority is correlated with task size. - -**Proof.** By the standard exchange argument (as in Theorem 1), swapping -adjacent tasks $i, j$ in a schedule changes $D$ by: +**Proof.** By the exchange argument, swapping adjacent tasks $i, j$ +changes $D$ by: $$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$ -The swap improves $D$ when $\Delta D > 0$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$ -but $j$ is scheduled after $i$. Therefore the optimal order is decreasing -$w(q_i)/p_i$ — this is the WSJF rule. +The swap improves $D$ when $w(q_j)/p_j > w(q_i)/p_i$ but $j$ is +scheduled after $i$. Therefore the optimal order is decreasing +$w(q_i)/p_i$ — the WSJF rule. SPT corresponds to WSJF only when +$w(q_i) = \text{const}$ (all tasks have equal priority). -SPT orders by $p_i$ ascending (equivalently, $1/p_i$ descending), which -corresponds to WSJF only when $w(q_i) = \text{const}$ — i.e., when all -tasks have equal priority. - -**Example.** Two tasks: Critical ($w = 8$, $p_H = 10$) and Low ($w = 1$, $p_L = 1$). - -WSJF scores: Critical = $8/10 = 0.8$, Low = $1/1 = 1.0$. - -WSJF places the Low task first (higher $w/p$), same as SPT. Here, SPT and -WSJF agree because the Low task's tiny size dominates despite its low weight. - -Now consider: Critical ($w = 8$, $p_H = 3$) and Low ($w = 1$, $p_L = 2$). - -WSJF scores: Critical = $8/3 = 2.67$, Low = $1/2 = 0.5$. - -WSJF places Critical first. SPT places Low first (smaller $p$). The costs: +**Example.** Critical ($w = 8$, $p = 3$) and Low ($w = 1$, $p = 2$): - SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$ - WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$ -SPT incurs 45% more priority-weighted delay because it ignores the 8x -priority weight of the Critical task. - -In general, SPT diverges from WSJF — and produces suboptimal $D$ — whenever -priority and task size are not perfectly inversely correlated. In practice, -Critical tasks tend to be larger (outages, security incidents), making the -divergence systematic rather than occasional. $\blacksquare$ +SPT incurs 45% more priority-weighted delay. In practice, Critical tasks +tend to be larger (outages, security incidents), making the divergence +systematic. $\blacksquare$ --- -## 10. A Proposed Solution: Priority-Weighted Completion Score +## 6. Proposed Solutions -### 10.1 The Metric +### 6.1 Priority-Weighted Metrics Replace unweighted mean completion time with the **Priority-Weighted Completion Score (PWCS)**: $$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$ -This is the priority-weighted mean slowdown ratio. It measures: +This is the priority-weighted mean slowdown ratio. It measures how long +each task waited relative to its size, weighted by how much that task +mattered. Lower is better. -- **How long each task waited relative to its size** (the slowdown $C_i / p_i$), - weighted by -- **How much that task mattered** (the priority weight $w(q_i)$). +**Properties:** -Lower is better. A PWCS of 1.0 means every task was completed instantly -with zero queuing delay. A PWCS of 3.0 means the average task waited 3x -its processing time, weighted by importance. +1. **Priority-respecting.** Delays to Critical tasks cost 8x more than + delays to Low tasks. +2. **Size-fair.** Uses slowdown ratio $C_i / p_i$, so large tasks are not + penalized for being large. +3. **Not gameable by SPT.** Reordering by processing time does not + systematically improve the score. +4. **Reduces to unweighted mean when tasks are uniform.** A strict + generalization. -### 10.2 Properties of PWCS +### 6.2 Optimal Policy: WSJF -**Property 1: Priority-respecting.** PWCS penalizes delays to high-priority -tasks more heavily than low-priority tasks. A 2-hour delay to a Critical -task costs 8x more than the same delay to a Low task. +**Theorem 11.** The schedule minimizing the priority-weighted completion +time $\text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i)$ processes +tasks in order of decreasing $w(q_i)/p_i$ — the **Weighted Shortest Job +First (WSJF)** rule [1, 5]. -**Property 2: Size-fair.** By using the slowdown ratio $C_i / p_i$ rather -than raw completion time $C_i$, the metric does not inherently penalize -large tasks for being large. A 40-hour task that waits 80 hours contributes -the same slowdown (2.0) as a 1-hour task that waits 2 hours. +**Proof.** By the exchange argument (as in Theorem 10), the swap of +adjacent tasks $i, j$ improves PWCT when $w(q_j)/p_j > w(q_i)/p_i$ but +$j$ is scheduled after $i$. The optimal order is therefore decreasing +$w(q_i)/p_i$. $\blacksquare$ -**Property 3: Not gameable by SPT.** Because the metric weights by priority -and normalizes by task size, reordering tasks by processing time does not -systematically improve the score. The optimal strategy is to minimize -slowdown for high-priority tasks — i.e., to **actually respect the priority -system**. +Within a priority class, this reduces to SPT (shortest first). Across +classes, a Critical 4-hour task ($w/p = 2.0$) beats a Low 1-hour task +($w/p = 1.0$). -**Property 4: Reduces to unweighted mean when tasks are uniform.** If all -tasks have equal priority and equal size, PWCS equals the unweighted mean -completion time divided by the common task size. It is a strict -generalization. +**Practical caveat.** Pure WSJF can place tiny Low-priority tasks ahead +of large Critical tasks (a 15-minute Low task has $w/p = 1/0.25 = 4.0$, +beating a 6-hour Critical at $w/p = 8/6 = 1.33$). In practice, this is +mitigated by enforcing **strict priority-class ordering** and applying +WSJF only *within* each class. -### 10.3 Optimal Policy for PWCS +### 6.3 Applied Example: IT Service Desk -**Theorem 11.** The schedule minimizing PWCS processes tasks in order of -decreasing $w(q_i) / p_i$ — highest priority first, breaking ties by -shortest processing time within the same priority class. - -**Proof (exchange argument, as in Theorem 1).** - -Consider adjacent tasks $i, j$ with $i$ before $j$. Each task's contribution -to the PWCS numerator depends on the completion times of both. Swapping $i$ -and $j$: - -The change in the weighted slowdown sum is proportional to: - -$$w(q_i) \cdot \frac{p_j}{p_i} - w(q_j) \cdot \frac{p_i}{p_j}$$ - -The swap improves PWCS when this quantity is positive, i.e., when: - -$$\frac{w(q_i)}{p_i^2} > \frac{w(q_j)}{p_j^2}$$ - -Hmm — this doesn't simplify as cleanly due to the ratio structure. Let -us instead consider the more practical **priority-weighted completion time**: - -$$\text{PWCT}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot C_i}{\sum_{i=1}^{n} w(q_i)}$$ - -For PWCT, the exchange argument gives: swap improves the score when -$w(q_j) \cdot p_i > w(q_i) \cdot p_j$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$ -but $j$ is scheduled after $i$. The optimal order is therefore decreasing -$w(q_i)/p_i$, which is the **Weighted Shortest Job First (WSJF)** rule: - -$$\text{Schedule by: } \frac{w(q_i)}{p_i} \text{ descending}$$ - -This means: within a priority class, do short tasks first; across priority -classes, a Critical 8-hour task ($w/p = 8/8 = 1.0$) ties with a Low 1-hour -task ($w/p = 1/1 = 1.0$) — but a Critical 4-hour task ($w/p = 8/4 = 2.0$) -beats both. $\blacksquare$ - -### 10.4 Applied Example: IT Service Desk - -Consider an IT team with the following ticket queue on a Monday morning: +Consider an IT team with the following ticket queue: | Ticket | Priority | Type | Est. Hours | |--------|----------|------|-----------| @@ -607,46 +462,23 @@ Consider an IT team with the following ticket queue on a Monday morning: | T7 | P2 (High) | Printer fleet offline | 2 | | T8 | P4 (Low) | Archive old shared drive folder | 0.25 | -**SPT order (optimizing unweighted mean):** T8, T4, T5, T3, T7, T6, T2, T1 +**SPT order** (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1 -| Position | Ticket | Priority | Hours | Completion | Slowdown | -|----------|--------|----------|-------|------------|----------| +| Pos | Ticket | Priority | Hours | Completion | Slowdown | +|-----|--------|----------|-------|------------|----------| | 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 | | 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 | | 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 | | 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 | | 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 | | 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 | -| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.1875 | +| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 | | 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 | -- **Unweighted mean completion:** $(0.25 + 0.75 + 1.75 + 3.75 + 5.75 + 8.75 + 12.75 + 18.75) / 8 = 6.5625$ hours -- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 306/30 = 10.2$ hours -- Email server is down for **18.75 hours**. Database backups fail for **8.75 hours**. +**Practical WSJF** (priority-class-first, SPT within class): -**WSJF order (optimizing PWCT by $w(q)/p$ descending):** - -| Ticket | Priority | Hours | $w/p$ | -|--------|----------|-------|-------| -| T6 | P1 Crit | 3 | 8/3 = 2.667 | -| T8 | P4 Low | 0.25 | 1/0.25 = 4.0 | -| T5 | P3 Med | 1 | 2/1 = 2.0 | -| T4 | P4 Low | 0.5 | 1/0.5 = 2.0 | -| T1 | P1 Crit | 6 | 8/6 = 1.333 | -| T7 | P2 High | 2 | 4/2 = 2.0 | -| T2 | P2 High | 4 | 4/4 = 1.0 | -| T3 | P3 Med | 2 | 2/2 = 1.0 | - -Wait — T8 has $w/p = 4.0$, the highest. That places a Low-priority task -first, which feels wrong. This reveals an important practical point: -**pure WSJF can still be gamed by tiny tasks** because their small $p$ -inflates the ratio. In practice, this is mitigated by enforcing strict -priority class ordering and only applying WSJF *within* priority classes. - -**Practical WSJF (priority-class-first, then $w/p$ within class):** - -| Position | Ticket | Priority | Hours | Completion | -|----------|--------|----------|-------|------------| +| Pos | Ticket | Priority | Hours | Completion | +|-----|--------|----------|-------|------------| | 1 | T6 (backups) | P1 Crit | 3 | 3 | | 2 | T1 (email) | P1 Crit | 6 | 9 | | 3 | T7 (printers) | P2 High | 2 | 11 | @@ -656,428 +488,74 @@ priority class ordering and only applying WSJF *within* priority classes. | 7 | T8 (archive) | P4 Low | 0.25 | 18.25 | | 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 | -- **Unweighted mean completion:** $(3 + 9 + 11 + 15 + 16 + 18 + 18.25 + 18.75) / 8 = 13.625$ hours -- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 305/30 = 10.167$ hours -- Email server restored in **9 hours**. Backups fixed in **3 hours**. - -### Comparison +**Comparison:** | Metric | SPT | Practical WSJF | Winner | |--------|-----|----------------|--------| -| Unweighted mean completion | **6.5625 hrs** | 13.625 hrs | SPT | -| Priority-weighted completion (PWCT) | 10.2 hrs | **10.167 hrs** | WSJF | +| Unweighted mean completion | **6.56 hrs** | 13.63 hrs | SPT | +| P1 mean time to resolution | 13.75 hrs | **6 hrs** | WSJF | +| P2 mean time to resolution | 9.25 hrs | **13 hrs** | SPT | | Time to fix email server | 18.75 hrs | **9 hrs** | WSJF | | Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF | -| Time to fix printers | 5.75 hrs | **11 hrs** | SPT | | Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT | -The PWCT values are nearly identical (10.2 vs 10.167) because PWCT — as a -*weighted average of completion times* — is dampened by the fact that total -work is constant. **PWCT is not the right metric for this comparison.** The -real difference is visible in the individual completion times of critical -tasks: the email server is down for 18.75 hours under SPT versus 9 hours -under WSJF. The database backups fail for 8.75 hours versus 3 hours. +The aggregate priority-weighted completion times are nearly identical +(PWCT: 10.2 vs 10.17) because aggregation hides distributional damage. +The real difference is in the **per-priority-class** breakdown: the email +server is down for 18.75 hours under SPT versus 9 hours under WSJF. The +database backups fail for 8.75 hours versus 3. -The better comparison metric is the **priority-weighted delay cost** -$D = \sum w(q_i) \cdot C_i$ (not normalized): - -- SPT: $D = 306$ priority-weighted hours -- Practical WSJF: $D = 305$ priority-weighted hours - -Again, the aggregate is similar. The damage from SPT is not in the -aggregate — it is in the *distribution*: critical systems burn while -cosmetic tasks are polished. A metric that cannot distinguish between these -two schedules — despite one leaving the email server down for twice as long -— is not measuring what matters. - -The unweighted metric, however, confidently reports SPT as **more than twice -as efficient** (6.56 vs 13.63), rewarding the team that updated desktop +The unweighted metric confidently reports SPT as **more than twice as +efficient** (6.56 vs 13.63), rewarding the team that updated desktop wallpaper while the email server was on fire. -### 10.5 Recommended Metric Suite +### 6.4 Recommended Metric Suite -The IT example reveals that even priority-weighted aggregate metrics (PWCT) -can fail to distinguish good from bad schedules, because aggregation hides -distributional damage. No single metric suffices. A complete measurement -system for a priority-based team should track: +Even priority-weighted aggregate metrics can fail to distinguish good from +bad schedules, because aggregation hides distributional damage. No single +metric suffices. A complete measurement system should track: | Metric | What it measures | Formula | |--------|-----------------|---------| | **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ | -| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ filtered to $q = 1$ | +| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ for $q = 1$ | | **Throughput** | Raw work capacity | Work-hours completed / calendar time | -| **Aging violations** | Starvation prevention | Count of tasks exceeding SLA by priority | -| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ filtered to $q \le 2$ | +| **Aging violations** | Starvation prevention | Tasks exceeding SLA by priority | +| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ for $q \le 2$ | -The key insight from our analysis: **per-priority-class metrics** (rows 1-2, -5) expose scheduling failures that aggregate metrics hide. If P1 mean time -to resolution is 14 hours while P4 mean is 0.5 hours, the team is -optimizing the wrong metric — regardless of what the aggregate says. +The key insight: **per-priority-class metrics** expose scheduling failures +that aggregate metrics hide. --- -## 11. Devil's Advocate: The Case for Unweighted Mean Completion Time +# Part III: Organizational Dynamics -Intellectual honesty requires acknowledging where the preceding argument -has limits. The following are genuine counterarguments — not strawmen. +## 7. When the Metric Is the Product -### 11.1 Simplicity Has Real Value +Sections 2–6 assume that client satisfaction is a function of *experienced +service quality*. But there exists a scenario in which this assumption +fails and the entire argument collapses. -**Argument.** The unweighted mean is trivially computable: sum the completion -times, divide by the count. It requires no priority weights, no task-size -estimates, no calibration. Every alternative proposed in Section 10 requires -estimating $p_i$ (task size) before the task is complete — and these -estimates are notoriously unreliable. +### 7.1 The Self-Referential Metric -**Assessment: This is true.** PWCS and PWCT require inputs (priority -weights, size estimates) that introduce their own sources of error. If size -estimates are systematically wrong — and in software engineering they often -are, with large tasks underestimated and small tasks overestimated — then -the weighted metric inherits that noise. - -However, the unweighted metric does not avoid this problem — it *hides* it -by implicitly setting all weights to 1 and all sizes to 1. That is not -"making no assumptions"; it is making the specific assumption that all tasks -are equally important and equally sized, which is demonstrably false in any -real system. **A known-imprecise estimate of task size is still more -informative than the implicit assumption that all sizes are equal.** - -### 11.2 Minimizing the Number of People Waiting - -**Argument.** If each task represents one client, then unweighted mean -completion time minimizes the total person-hours spent waiting. SPT is -optimal for this because completing short tasks first "frees" the most -people from the queue earliest. - -**Assessment: This is mathematically correct.** The sum $\sum C_i$ counts -total person-time in the system. SPT genuinely minimizes this quantity. -If you run a DMV and every person's time is equally valuable regardless of -why they're there, SPT is the right policy. - -The argument breaks down when: - -1. **Tasks are not 1:1 with clients.** In IT, one client may submit tasks - of varying size. Across a relationship, SPT systematically fast-tracks - their easy requests and starves their hard ones — which is not perceived - as good service. - -2. **Waiting cost is not uniform.** A person waiting for a server outage - to be fixed is not equivalent to a person waiting for a wallpaper change. - The cost of waiting is proportional to the *impact* of the unresolved - task, which is what priority encodes. - -3. **The metric is applied to teams, not DMVs.** When a team's performance - is measured by unweighted mean, the rational response is to cherry-pick - — which is individually rational but collectively destructive. - -### 11.3 SPT as a Triage Heuristic - -**Argument.** In high-volume systems where task sizes cluster tightly -(e.g., a call center where most calls are 3-7 minutes), SPT approximates -FIFO and the unweighted mean approximates the weighted mean. The pathologies -described in this paper only manifest when task sizes span orders of -magnitude. - -**Assessment: This is correct.** As shown in Section 8, when task sizes are -approximately uniform, all scheduling policies converge and all metrics -agree. The coefficient of variation of task size, $CV = \sigma_p / \bar{p}$, -determines the severity of the distortion: - -| $CV$ | Task size distribution | Metric distortion | -|------|----------------------|-------------------| -| < 0.3 | Tight (call center) | Negligible | -| 0.3 - 1.0 | Moderate (mixed IT) | Moderate | -| > 1.0 | Wide (typical IT queue) | Severe | - -For a typical IT service desk, task sizes range from 15 minutes (password -reset) to 40+ hours (infrastructure migration), giving $CV > 2$. The -distortion is not a theoretical edge case — it is the default condition. - -### 11.4 Gaming Requires Malice - -**Argument.** The theorems show that the metric *can* be gamed, not that it -*will* be gamed. A well-intentioned team might use the unweighted mean as -a rough health indicator without actively optimizing for it, avoiding the -pathologies described. - -**Assessment: This is the strongest counterargument.** If the metric is -used purely for monitoring — "are we completing things at a reasonable -pace?" — and not for performance evaluation, rewards, or scheduling -decisions, then the gaming incentive is absent and the metric is relatively -harmless. - -However, this argument requires the metric to remain purely informational -and never influence behavior. In practice, any metric that is reported to -management, tied to OKRs, or used in sprint retrospectives will influence -behavior — this is Goodhart's Law, and it applies to well-intentioned teams -as reliably as to cynical ones. The team need not be gaming the metric -consciously; it is sufficient that completing three easy tickets "feels -productive" while staring at one hard ticket does not. The metric validates -the feeling, and the drift happens organically. - -### 11.5 Summary: When the Unweighted Mean Is Defensible - -The unweighted mean completion time is a defensible metric **only when all -four conditions hold simultaneously**: - -1. Task sizes are approximately uniform ($CV < 0.3$) -2. There is no priority differentiation (all tasks are equally important) -3. Each task represents exactly one client -4. The metric is not used to evaluate, reward, or direct team behavior - -In a system satisfying all four conditions — such as a simple FIFO queue -with uniform jobs and no priority system — the unweighted mean is adequate, -and its simplicity is a genuine advantage. - -In any system that violates even one of these conditions — which includes -virtually every IT service desk, development team, and support organization -— the metric produces the distortions proven in Sections 2-9. - -The honest conclusion is not that the unweighted mean is always wrong. It is -that the conditions under which it is right are narrow, easily identified, -and rarely met in the systems where it is most commonly used. - ---- - -## 12. Manager Internalization: The Actionable Solution - -The preceding sections present two extremes: reject the metric entirely -(Sections 1-10) or surrender to it (Appendix A). In practice, most -managers cannot unilaterally change the metric — it is set at the -organizational level, reported across teams, and embedded in dashboards -that other stakeholders consume. The best solution is company-wide metric -reform. The *actionable* solution is what a single informed manager can -do right now. - -### 12.1 The Strategy - -A manager who understands the proof can **internalize the metric's -limitations without propagating them to the team**. The approach: - -1. **Schedule primarily by priority.** The team works critical tasks - first, exactly as professional judgment and the priority system - dictate. This is the default — the team need not know why. - -2. **Tactically interleave small tasks to maintain metric parity.** When - the queue contains a small, low-priority task that can be completed - quickly without materially delaying any high-priority work, do it. - Not because the metric demands it, but because the small task *also - needs to get done*, and doing it now costs almost nothing. - -3. **Never reveal the metric as the motivation.** The team is told "knock - out this quick one while we're waiting on the vendor callback for the - P1" — not "we need to bring our average down." The team's - professional judgment and intrinsic motivation (Appendix B) remain - intact. The manager absorbs the metric-management burden. - -This is a **constrained optimization**: minimize priority-weighted delay -(do the right work in the right order) subject to the constraint that -the reported unweighted mean stays within an acceptable band. - -### 12.2 Formalization - -Let $\bar{C}_{\text{target}}$ be the unweighted mean completion time that -other teams report — the parity threshold. The manager's problem is: - -$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$ - -This is a single-machine scheduling problem with a budget constraint on -the unweighted mean. The solution is a modified priority schedule: - -- Start from the priority-first ordering (all P1 first, then P2, etc.). -- Identify small low-priority tasks whose insertion ahead of lower-ranked - same-priority tasks reduces $\bar{C}$ without displacing any - higher-priority task. -- Insert them only when the marginal improvement to $\bar{C}$ exceeds - the marginal cost to priority-weighted delay. - -**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** For a -priority-first schedule with $n$ tasks, the gap between its unweighted -mean $\bar{C}_{\text{priority}}$ and the SPT-optimal unweighted mean -$\bar{C}_{\text{SPT}}$ is bounded by: - -$$\bar{C}_{\text{priority}} - \bar{C}_{\text{SPT}} \le \frac{n-1}{2n}(\bar{p}_{\max\text{-class}} - \bar{p}_{\min\text{-class}}) \cdot n_{\text{classes}}$$ - -where $\bar{p}_{\max\text{-class}}$ and $\bar{p}_{\min\text{-class}}$ are -the mean processing times of the largest and smallest priority classes. - -**Proof sketch.** The gap arises entirely from the cross-class ordering: -within each priority class, the manager can use SPT (shortest first) at -no priority cost, since all tasks in the class have equal priority. The -only deviation from global SPT is the *between-class* ordering, where -large high-priority tasks are placed before small low-priority tasks. -Each such inversion costs at most $p_{\text{large}} - p_{\text{small}}$ -in the unweighted sum, and there are at most -$n_{\text{classes}} \cdot (n / n_{\text{classes}})$ such inversions. -$\blacksquare$ - -In practice, this means: **a manager who uses SPT within each priority -class and priority ordering between classes will produce a metric that -is close to the SPT-optimal value** — often within 10-20% — while -respecting the priority system entirely. - -### 12.3 Why This Works: The Manager as Information Barrier - -The strategy works because the manager serves as an **information -barrier** between the metric and the team: - -| Layer | Sees the metric | Sees the priorities | Sees the proof | -|-------|----------------|--------------------|-----------------| -| Organization | Yes | Nominally | No | -| Manager | Yes | Yes | **Yes** | -| Team | No (shielded) | Yes | Irrelevant | -| Client | Yes (dashboard) | Via SLA | No | - -The manager is the only actor who holds all three pieces of information. -By internalizing the proof, the manager can: - -- Present a metric that satisfies organizational reporting (the number - is reasonable) -- Direct the team by priority (professional judgment preserved) -- Shield the team from the metric's perverse incentives (Appendix B - costs avoided) - -This is *not* manipulation. The manager is not fabricating numbers or -misreporting. They are doing the right work in the right order, and -the metric happens to be acceptable because within-class SPT is free -and between-class inversions are bounded (Theorem 12). - -### 12.4 The Competitive Breakdown - -This strategy fails when the metric becomes **competitive between teams**. - -Model $m$ teams, each managed independently. Team $j$ reports -$\bar{C}_j(\sigma_j)$. If teams are ranked, rewarded, or compared on -$\bar{C}$: - -**Case 1: Cooperative** — Teams are measured for parity, not ranking. -The threshold is "stay within a reasonable band." Each manager -independently uses the internalization strategy. All teams do -approximately the right work. The metric is decorative but harmless. -This is a **coordination game** with a stable cooperative equilibrium. - -**Case 2: Competitive** — Teams are ranked by $\bar{C}$. Promotions, -resources, or recognition go to the lowest average. This is a -**prisoner's dilemma**: - -| | Team B: Priority-first | Team B: SPT | -|---|---|---| -| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) | -| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) | - -The dominant strategy for each team is SPT. The Nash equilibrium is -(SPT, SPT) — all teams optimize the metric, all teams do the wrong -work, and the organization reports excellent numbers while critical -tasks rot across every queue. - -The internalization strategy is a **cooperative equilibrium that is not -stable under competition**. A single team that defects to pure SPT will -outperform all others on the metric, forcing other managers to choose -between doing the right work (and looking bad) or following suit (and -abandoning their professional judgment). - -### 12.5 The Scope of the Solution - -| Condition | Strategy viability | -|-----------|-------------------| -| Metric used for health-check / parity | **Viable** — cooperative equilibrium holds | -| Metric visible but not ranked | **Viable** — no competitive pressure to defect | -| Metric ranked across teams | **Fragile** — viable only if all managers cooperate | -| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates | -| Metric reform possible at org level | **Unnecessary** — fix the metric instead | - -The internalization strategy is actionable *right now*, by a single -manager, without organizational permission or metric reform. It -preserves team psychology (Appendix B), respects priorities (Sections -9-10), and produces an acceptable reported metric (Theorem 12). - -Its limitation is structural: it requires the metric to be a -**reporting formality**, not a **competitive instrument**. The moment -the metric drives resource allocation or team ranking, the cooperative -equilibrium collapses and only organizational reform — replacing the -metric with a priority-weighted alternative (Section 10) — can prevent -the race to the bottom. - -**The best solution is company-wide. The actionable solution is a -manager who understands this proof, shields their team from the metric, -schedules by priority, and uses SPT only within priority classes to -keep the number reasonable.** - ---- - -## 13. Conclusion - -The unweighted average completion time is a **biased statistic** that: - -1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted - completion time which is schedule-invariant (Theorem 2). -2. **Incentivizes starvation** of large tasks (Theorem 3). -3. **Contradicts Little's Law** unless tasks are uniformly sized. -4. **Degrades client satisfaction** with zero compensating productivity - gain (Theorem 7). -5. **Actively contradicts priority systems** by carrying zero information - about business-impact classification (Theorem 9). -6. **Ignores priority entirely** in its scheduling recommendation, - producing suboptimal priority-weighted delay whenever priority and - size are not perfectly inversely correlated (Theorem 10). - -A metric that can be improved by reordering work — without doing any -additional work — is measuring the scheduling policy, not the system's -capacity or effectiveness. When combined with a priority system, the metric -does not merely fail to reflect priorities — it recommends the schedule -that inflicts the most damage on the highest-priority work. - -The unweighted mean is defensible only under narrow, identifiable conditions -(Section 11.5): uniform task sizes, no priority system, one-to-one -client-task mapping, and no behavioral influence from the metric. These -conditions are rarely met in practice. - -**Unweighted average completion time is not a fair or accurate measurement -of task execution performance. Its adoption as a team metric will -rationally produce starvation of complex work, violation of stated -priorities, inequitable client outcomes, and the illusion of productivity -where none exists.** - ---- - -## Appendix A. When the Metric Is the Product - -The preceding twelve sections rest on an implicit assumption: that client -satisfaction is a function of *experienced service quality* — how long -*their* task took, relative to its size and urgency. If this assumption -holds, the proof is valid and the unweighted mean is a destructive metric. - -But there exists a scenario in which the assumption fails and the entire -argument collapses. - -### A.1 The Self-Referential Metric - -Suppose the service provider reports the unweighted mean completion time -directly to the client — on a dashboard, in an SLA report, on a marketing -page — and the client's satisfaction is derived primarily from *that number* -rather than from their individual experience. - -Define client satisfaction as: +Suppose the provider reports the unweighted mean directly to the client +— on a dashboard, in an SLA report, on a marketing page — and the +client's satisfaction is derived primarily from *that number*: $$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$ -That is: the client sees "Average resolution time: 6.56 hours" and is -satisfied, without checking whether *their* ticket — the critical email -outage — took 6.56 hours or 18.75 hours. - Under this model, SPT genuinely maximizes client satisfaction (Theorem 1). -The service provider's throughput is unchanged (Theorem 6). The business -outcome improves: same work done, happier client. +Throughput is unchanged (Theorem 6). The business outcome improves: same +work done, happier client. **Every theorem in this paper remains mathematically correct. But the -conclusion inverts.** The metric is no longer a proxy for service quality -that can be gamed — it *is* the service quality, because the client has -agreed to evaluate quality by the aggregate number rather than by their -individual experience. +conclusion inverts.** The metric is no longer a proxy that can be gamed — +it *is* the service quality, because the client has agreed to evaluate +quality by the aggregate number. -### A.2 The Economics +### 7.2 The Economics -This creates a coherent, stable business equilibrium: +This creates a coherent, stable equilibrium: | Actor | Behavior | Outcome | |-------|----------|---------| @@ -1085,397 +563,472 @@ This creates a coherent, stable business equilibrium: | Client | Reads dashboard, sees low average | Reports satisfaction | | Management | Sees satisfied client + good metric | Rewards team | -Throughput is unchanged (Theorem 6), so the same revenue-generating work -is completed. The only thing that changed is the *order* — and therefore -the reported number. Real resources were rearranged, no additional value -was created, but the business metrics all moved in the right direction. +The provider extracts satisfaction at zero marginal cost, by optimizing a +number the client has accepted as a proxy for quality. -This is *profitable*. The provider extracts satisfaction from the client -at zero marginal cost, by optimizing a number that the client has accepted -as a proxy for quality. The client is no worse off *in their own estimation*, -because they evaluate the aggregate, not their individual experience. +### 7.3 The Fragility -### A.3 The Fragility +This equilibrium is stable only as long as the client never inspects their +own experience. It breaks when: -This equilibrium is stable only as long as the client never inspects -their own experience. It breaks the moment any of the following occur: +1. **The client checks their own ticket.** A CTO whose email server was + down for 18.75 hours will not be reassured by "Average resolution: + 6.56 hours." The clients most likely to inspect are exactly the ones + receiving the worst service (Theorem 4). -**1. The client checks their own ticket.** +2. **A competitor offers per-ticket SLAs.** "P1 resolved within 4 hours" + beats "average resolution under 7 hours" for any client with critical + needs. -A CTO whose email server was down for 18.75 hours will not be reassured -by a dashboard reading "Average resolution: 6.56 hours." The aggregate -metric and the individual experience diverge maximally for high-priority -tasks (Theorem 4). The clients most likely to inspect their own experience -are exactly the ones receiving the worst service. +3. **The team internalizes the metric.** If the team believes the metric + reflects real performance, they lose the ability to recognize when + critical work is neglected. The metric becomes an epistemic hazard. -**2. A competitor offers per-ticket SLAs.** +### 7.4 The General Pattern -If an alternative provider guarantees "P1 incidents resolved within 4 hours" -instead of "average resolution under 7 hours," the aggregate-metric provider -cannot compete for clients with critical needs — which are typically the -highest-value clients. - -**3. The provider's team internalizes the metric.** - -If the team believes the metric reflects real performance (rather than -consciously gaming it), they lose the ability to recognize when critical -work is being neglected. The metric becomes an epistemic hazard: it -tells the team they are performing well, preventing them from seeing that -they are not. - -### A.4 The General Pattern - -This is not unique to task scheduling. The structure is: - -1. A measurable proxy is established for an unmeasured quality. -2. The proxy is reported as if it were the quality itself. -3. The proxy is optimized, improving the reported number. -4. The underlying quality diverges from the proxy, but no one measures - the underlying quality because the proxy exists. -5. The system is stable until an exogenous shock forces inspection of - the underlying quality. - -This pattern appears across domains: +This pattern — proxy replaces quality, proxy is optimized, quality +diverges, system is stable until tested by reality — recurs across domains. +Muller [19] documents it extensively as "metric fixation"; Campbell [24] +formalized the corrupting effect of using indicators as targets. | Domain | Proxy metric | Underlying quality | Divergence | |--------|-------------|-------------------|------------| -| IT support | Avg. resolution time | Critical system uptime | Server down for 19 hrs, avg says 6.5 | -| Education | Standardized test scores | Actual learning | Teaching to the test, understanding declines | -| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission rates | -| Finance | Quarterly earnings | Long-term value creation | Cost-cutting inflates EPS, erodes capability | -| Software | Velocity (story points) | Deliverable product quality | Point inflation, features half-finished | +| IT support | Avg. resolution time | Critical system uptime | Server down 19 hrs, avg says 6.5 | +| Education | Test scores | Actual learning | Teaching to the test | +| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission | +| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability | +| Software | Velocity (story points) | Product quality | Point inflation, features half-finished | -In each case, the proxy is optimized, the number improves, and the system -*functions* — profitably, even — until the moment the underlying quality -is tested by reality. +### 7.5 Information Asymmetry -### A.5 A Mathematical Note on Equilibrium Stability +Model the system as a game between provider (P) and client (C). P observes +individual $\{C_i\}$ and chooses $\sigma$; C observes only +$\bar{C}(\sigma)$. This is a **moral hazard** problem [10]: P's optimal +strategy is to minimize the observable signal regardless of the +unobservable distribution. -Model the system as a game between provider (P) and client (C). +The equilibrium is a **pooling equilibrium** [9]: P's reported metric +looks identical regardless of the underlying priority-weighted performance. +It is stable until C obtains access to individual $C_i$ values — via a +customer portal, a competitor's transparency, or a sufficiently painful +incident. -**Information structure:** -- P observes individual completion times $\{C_i\}$ and chooses schedule $\sigma$ -- C observes only the reported aggregate $\bar{C}(\sigma)$ - -**Payoffs:** -- P's payoff increases with C's satisfaction and is independent of schedule - (throughput is invariant) -- C's *reported* satisfaction $U_C = f(\bar{C})$ is maximized by SPT -- C's *actual* welfare (if they could observe it) depends on individual - $C_i$ values, especially for high-priority tasks - -This is a **moral hazard** problem. P has private information (the -distribution of $C_i$) that C cannot observe. P's optimal strategy is to -minimize the observable signal ($\bar{C}$) regardless of the unobservable -distribution — which is exactly SPT. - -The equilibrium is a **pooling equilibrium**: P's schedule looks identical -to the client regardless of the underlying priority-weighted performance. -A provider with PWCT = 10.2 and a provider with PWCT = 10.167 both report -$\bar{C} = 6.56$ under SPT. The client cannot distinguish between them. - -This equilibrium is stable under the standard game-theoretic condition: -**C has no incentive to deviate** (they have no better information source) -and **P has no incentive to deviate** (any other schedule worsens $\bar{C}$ -with zero throughput benefit). - -It is *unstable* under **information revelation**: if C obtains access to -individual $C_i$ values (via a customer portal, a competing vendor's -transparency, or a sufficiently painful incident), the pooling equilibrium -collapses and C's evaluation shifts to the underlying quality. - -### A.6 The Uncomfortable Conclusion +### 7.6 The Uncomfortable Conclusion The honest answer to "does optimizing the unweighted mean hurt the -business?" is: **not necessarily, as long as the client never looks -behind the number**. - -The honest answer to "does it hurt the client?" is: **only when they -have a problem large enough to notice** — which is precisely when the -metric's distortion is largest (Theorem 4). - -The honest answer to "is this sustainable?" is: it is exactly as -sustainable as any system in which the seller knows more than the buyer. -Such systems are historically stable for extended periods and then -collapse rapidly when the information asymmetry is punctured — by a -crisis, a competitor, or a regulator. - -The mathematical structure is clear: the unweighted mean creates an -information asymmetry between the metric and the reality. Optimizing -the metric under this asymmetry is *locally rational* for the provider, -*locally satisfying* for the uninspecting client, and *globally fragile* -for the relationship. - -Whether one calls this "efficient market behavior" or "a dystopian -consequence of optimizing legible numbers over illegible reality" is not -a mathematical question. The math says only this: **the incentive exists, -the equilibrium is real, and it holds until it doesn't.** +business?" is: **not necessarily, as long as the client never looks behind +the number**. The honest answer to "is this sustainable?" is: it is +exactly as sustainable as any system in which the seller knows more than +the buyer — stable for extended periods, then rapid collapse when the +asymmetry is punctured. --- -## Appendix B. The Psychological Cost of Knowing +## 8. The Psychological Cost of Knowing -Appendix A modeled the provider as a unitary rational actor — "the team" -optimizes the metric. But teams are composed of individuals, and those -individuals have their own utility functions. When a team member -understands the proof — when they *know* the metric is synthetic, that -the dashboard is theater, that the email server is still down while they -close wallpaper tickets — a new cost appears that the equilibrium model -did not account for. +Section 7 modeled the provider as a unitary actor. But teams are composed +of individuals. When a team member understands the proof — when they +*know* the metric is synthetic, that the dashboard is theater, that the +email server is still down while they close wallpaper tickets — a new cost +appears that the equilibrium model omitted. -### B.1 The Hidden Variable: Team Awareness +### 8.1 The Hidden Variable: Team Awareness -Appendix A's game has three actors: provider, client, management. But the -provider is not monolithic. Decompose it: +| Actor | Observes individual $C_i$ | Observes $\bar{C}$ | Understands the proof | +|-------|--------------------------|--------------------|-----------------------| +| Management | Possibly | Yes | Varies | +| Team member | **Yes** | Yes | **Yes** (in this scenario) | +| Client | No | Yes | No | -- **Management (M):** sets the metric, evaluates the team, reports to client -- **Team member (T):** executes the work, observes individual task states -- **Client (C):** observes only the reported aggregate +The team member has full information. They see the ticket queue. They know +the email server has been down since 7 AM. They know they are closing a +wallpaper ticket because it improves the number. And they know *why*. -The information structure changes: +### 8.2 Cognitive Dissonance Under Full Information -| Actor | Observes individual $C_i$ | Observes aggregate $\bar{C}$ | Understands the proof | -|-------|--------------------------|-----------------------------|-----------------------| -| M | Possibly | Yes | Varies | -| T | **Yes** | Yes | **Yes** (in this scenario) | -| C | No | Yes | No | - -The team member has *full information*. They see the ticket queue. They -know the email server has been down since 7 AM. They know they are closing -a wallpaper ticket because it will improve the number. And they know *why* -this is happening — not from vague discomfort, but from a precise -mathematical understanding that the metric rewards this behavior. - -### B.2 Cognitive Dissonance Under Full Information - -Cognitive dissonance (Festinger, 1957) arises when an individual holds -two contradictory cognitions simultaneously. The standard resolution is -to modify one cognition to reduce the conflict. - -A team member operating under the synthetic metric holds: +Cognitive dissonance [11] arises when an individual holds contradictory +cognitions. Without understanding *why*, the contradiction can be +rationalized: "management knows best." Understanding the proof removes +the ambiguity. The team member now holds: - **Cognition A:** "I am a competent professional. My job is to solve - important problems for clients." + important problems." - **Cognition B:** "I am closing a wallpaper ticket while the email - server is down, because it makes the number look better." + server is down, because the metric is mathematically biased (Theorem 1), + the reordering produces zero throughput (Theorem 6), and the only + beneficiary is the dashboard (Section 7). I can prove this." -In the absence of understanding *why*, Cognition B can be rationalized: -"management knows best," "maybe there's a reason," "the system works -overall." This is uncomfortable but tolerable — the ambiguity provides -cognitive cover. +The dissonance is now *load-bearing*. The available resolutions — abandon +professional identity, reject the proof, advocate for change, or leave — +each impose costs that did not exist before. -**Understanding the proof removes the ambiguity entirely.** The team -member now holds: +### 8.3 Self-Determination Theory: Three Needs Violated -- **Cognition A:** Same as above. -- **Cognition B':** "I am closing a wallpaper ticket while the email - server is down, because the metric is mathematically biased toward - small tasks (Theorem 1), the reordering produces zero additional - throughput (Theorem 6), and the only beneficiary is the dashboard - (Appendix A). I can prove this." +Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs +predicting intrinsic motivation: -B' is strictly harder to rationalize than B. The team member cannot -retreat into uncertainty because they possess the proof. The dissonance -is now *load-bearing*: it must be resolved, and the available resolutions -are: +**Autonomy.** The metric constrains choices in a way the team member +knows is mathematically suboptimal. A worker who understands the process +is provably counterproductive cannot feel autonomous following it. -1. **Reject Cognition A** — "I am not here to solve important problems; - I am here to move numbers." This is psychologically costly. It - requires abandoning professional identity. +**Competence.** The metric rewards *apparent* effectiveness (low $\bar{C}$) +while being invariant to *actual* effectiveness (Theorem 6). Genuine +competence — fixing the email server first — is *punished* by the metric. -2. **Reject Cognition B'** — "The proof must be wrong, or doesn't apply - here." This is intellectually costly. The proof is simple enough to - verify, and the IT example maps directly to their daily experience. +**Relatedness.** The team member knows the client's email server is down. +They could help. They are instead updating wallpaper — not because it +helps anyone, but because it helps a number. The connection between work +and human impact has been severed, and the team member can see the severed +ends. -3. **Change the situation** — advocate for better metrics, refuse to - cherry-pick, escalate. This is *professionally* costly in an - environment that rewards the metric. +### 8.4 Moral Injury -4. **Leave** — resolve the dissonance by exiting the system entirely. +Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing +to prevent, bearing witness to, or learning about acts that transgress +deeply held moral beliefs" [17]. It has since been extended to business +settings [25]. The key distinction from burnout: **burnout is exhaustion +from doing too much. Moral injury is damage from doing the wrong thing.** -None of these resolutions are free. Each one imposes a cost on the team -member that did not exist before they understood the proof — and *none of -them appear in the business equilibrium model of Appendix A*. +A team member who knows the email server is down, knows they should fix +it, closes a wallpaper ticket instead, and does so because the metric +requires it, is experiencing the structural conditions for moral injury. -### B.3 Self-Determination Theory: Three Needs Violated +### 8.5 Learned Helplessness and Metric Fatalism -Deci and Ryan's Self-Determination Theory (1985, 2000) identifies three -innate psychological needs whose satisfaction predicts intrinsic motivation, -job satisfaction, and well-being: +Seligman's learned helplessness [14, 15] describes how exposure to +uncontrollable negative outcomes leads to passivity. The sequence: -**1. Autonomy** — the need to feel volitional control over one's actions. +1. The metric is flawed (proof understood). +2. Advocate for change. +3. Rejected ("the numbers are good, don't rock the boat"). +4. Repeat with decreasing conviction. +5. Terminal state: "The metric is what it is. I'll just close tickets." -A team member who understands the proof knows that the metric constrains -their choices in a way that is mathematically suboptimal for the client. -Their scheduling decisions are not autonomous expressions of professional -judgment; they are coerced responses to a flawed incentive. The *knowledge* -of the coercion — not just the coercion itself — is what damages autonomy. -A worker who doesn't understand why they're doing something can still feel -autonomous ("I'm choosing to follow the process"). A worker who understands -that the process is provably counterproductive cannot. +This is not laziness. It is the rational response to a system that +punishes correct behavior and rewards incorrect behavior, when the +individual lacks power to change the system. -**2. Competence** — the need to feel effective at meaningful tasks. +### 8.6 The Adversarial Selection Spiral -The proof demonstrates that the metric rewards *apparent* effectiveness -(low $\bar{C}$) while being invariant to *actual* effectiveness (throughput, -Theorem 6). A team member who understands this knows that the metric -cannot distinguish between a competent team and an incompetent one that -happens to cherry-pick small tasks. Their competence is invisible to the -measurement system. Worse: genuine competence — choosing to fix the email -server first — is *punished* by the metric ($\bar{C}$ increases from 6.56 -to 13.63 in the IT example). +Combining Section 7's equilibrium with the turnover dynamic: -When a measurement system punishes competent decisions and rewards -incompetent ones, and the team member *knows this*, the need for -competence is not merely unsatisfied — it is actively contradicted. +1. Organization adopts unweighted mean. Metric looks good (SPT). +2. Aware, competent team members experience psychological costs (8.2–8.5). +3. Those members leave. Replaced by members who do not understand the + metric's flaws or do not care. +4. The metric continues to look good — it always does under SPT, + regardless of team competence (Corollary 6.1). +5. Actual service quality degrades, but the metric cannot detect this + (Corollary 9.1). +6. Return to step 1. -**3. Relatedness** — the need to feel connected to others and to -contribute to something meaningful. +The metric selects *against* the people who would improve the system and +*for* the people who will not challenge it. The system stabilizes at a +lower level of competence, invisible to its own measurement apparatus. -The team member knows the client's email server is down. They know the -client is suffering. They know they could help. They are instead updating -a wallpaper policy — not because it helps anyone, but because it helps -a number. The connection between the team member's work and the client's -well-being has been severed by the metric, and the team member *can see -the severed ends*. +### 8.7 The Complete Cost Model -### B.4 Moral Injury - -The concept of moral injury (Shay, 1994; Litz et al., 2009) was developed -in military psychology to describe the lasting harm caused by -"perpetrating, failing to prevent, bearing witness to, or learning about -acts that transgress deeply held moral beliefs." It has since been applied -to healthcare workers, first responders, and — increasingly — to -knowledge workers in bureaucratic systems. - -The key distinction from burnout: **burnout is exhaustion from doing too -much. Moral injury is damage from doing the wrong thing, or being -prevented from doing the right thing.** - -A team member who: -- Knows the email server is down (witnessing the harm) -- Knows they should fix it (moral belief about professional duty) -- Closes a wallpaper ticket instead (transgressing that belief) -- Does so because the metric requires it (institutional causation) - -...is experiencing the structural conditions for moral injury. The -proof doesn't cause the injury — the metric does. But the proof -eliminates the psychological buffer of ignorance that would otherwise -mitigate it. - -### B.5 Learned Helplessness and Metric Fatalism - -Seligman's learned helplessness framework (1967, 1975) describes the -phenomenon where exposure to uncontrollable negative outcomes leads to -passivity even when control becomes available. - -The sequence for an aware team member: - -1. **Observation:** The metric is flawed (proof understood). -2. **Action:** Advocate for change ("we should use priority-weighted - metrics"). -3. **Outcome:** Rejected ("the client is happy with the current - dashboard," "this is how we've always measured," "the numbers are - good, don't rock the boat"). -4. **Repetition:** Steps 2-3 repeat, with decreasing conviction. -5. **Helplessness:** "The metric is what it is. I'll just close tickets." - -The terminal state — metric fatalism — is characterized by: -- Disengagement from professional judgment ("I just do what the queue - says") -- Reduced initiative ("why bother triaging if the metric doesn't care?") -- Cynicism toward measurement generally ("all metrics are fake") -- Withdrawal of discretionary effort on complex tasks - -This is not laziness. It is the rational psychological response to a -system that punishes correct behavior and rewards incorrect behavior, -when the individual lacks the power to change the system. - -### B.6 The Turnover Equation - -The costs described in B.2-B.5 are borne by the team member, not the -organization — initially. They become organizational costs through -**turnover**. - -Model the team member's stay/leave decision: - -$$\text{Stay if: } \quad V_{\text{compensation}} + V_{\text{intrinsic}} > V_{\text{outside option}}$$ - -The synthetic metric degrades $V_{\text{intrinsic}}$ through each of the -mechanisms described above: - -| Mechanism | Component degraded | Effect on $V_{\text{intrinsic}}$ | -|-----------|-------------------|----------------------------------| -| Cognitive dissonance (B.2) | Psychological comfort | Decreased | -| Autonomy violation (B.3.1) | Sense of agency | Decreased | -| Competence contradiction (B.3.2) | Professional identity | Decreased | -| Relatedness severance (B.3.3) | Sense of purpose | Decreased | -| Moral injury (B.4) | Ethical well-being | Decreased | -| Learned helplessness (B.5) | Belief in efficacy | Decreased | - -As $V_{\text{intrinsic}}$ decreases, the organization must increase -$V_{\text{compensation}}$ to retain the team member, or accept their -departure. - -Crucially: **the team members most affected are those with the strongest -professional identity and the deepest understanding of the work.** These -are the most competent members — the ones most capable of recognizing the -metric's absurdity, most troubled by it, and most able to find employment -elsewhere. The metric selects for the departure of the team's best people. - -### B.7 The Adversarial Selection Spiral - -Combining Appendix A's equilibrium with the turnover dynamic: - -1. Organization adopts unweighted mean completion time. -2. Metric looks good (SPT). Client is satisfied (Appendix A). Management - is satisfied. -3. Aware, competent team members experience psychological costs (B.2-B.5). -4. Those members leave. They are replaced by members who either: - (a) do not understand the metric's flaws (less competent), or - (b) do not care (less engaged). -5. The metric continues to look good — it always does under SPT, - regardless of team competence (Theorem 6, Corollary 6.1). -6. Actual service quality degrades (less competent team), but the metric - cannot detect this (Theorem 9, Corollary 9.1). -7. Return to step 2. - -This is an **adversarial selection spiral**: the metric selects *against* -the people who would improve the system and *for* the people who will not -challenge it. The system stabilizes at a lower level of actual competence, -invisible to its own measurement apparatus, staffed by people who have -made peace with — or are unaware of — the gap between the number and the -reality. - -The dashboard still looks good. - -### B.8 The Complete Cost Model - -Appendix A concluded that the synthetic-metric equilibrium is stable and -profitable. Appendix B reveals the hidden costs that model omitted: - -| Appendix A (visible) | Appendix B (hidden) | +| Section 7 (visible) | Section 8 (hidden) | |---------------------|---------------------| -| Client satisfied (sees good number) | Team dissatisfied (sees bad reality) | +| Client satisfied (good number) | Team dissatisfied (bad reality) | | Throughput unchanged | Discretionary effort withdrawn | | Metric improves | Competent members leave | | Business economy stable | Institutional competence degrades | -| Zero marginal cost | Replacement/training costs accumulate | -The business equilibrium of Appendix A is real. The psychological costs -of Appendix B are also real. They operate on different timescales: -the equilibrium is visible quarterly; the competence degradation is -visible over years. +These operate on different timescales: the equilibrium is visible +quarterly; the competence degradation is visible over years. The complete +model is: **the metric works, and it is destructive, and the destruction +is invisible to the metric.** The metric is fresh paint on corroded rebar. -The complete model is not "the metric works" (Appendix A) or "the metric -is destructive" (Sections 1-12). It is: **the metric works, and it -is destructive, and the destruction is invisible to the metric.** +--- -An organization can run profitably for an extended period on synthetic -metrics and hollowed-out competence, just as a building can stand for -years with corroded rebar. The metric is the fresh paint. Appendix A -proved the paint is convincing. This appendix merely notes that it is -still paint. +## 9. Manager Internalization: The Actionable Solution + +Sections 2–6 say reject the metric. Section 7 says the metric works +(for the business). Section 8 says it destroys the team. In practice, +most managers cannot unilaterally change the metric. The best solution is +company-wide metric reform. The *actionable* solution is what a single +informed manager can do right now. + +### 9.1 The Strategy + +A manager who understands the proof can **internalize the metric's +limitations without propagating them to the team**: + +1. **Schedule primarily by priority.** The team works critical tasks first. +2. **Tactically interleave small tasks.** When a small low-priority task + can be completed without materially delaying high-priority work, do it. + Not because the metric demands it, but because it also needs to get + done and costs almost nothing. +3. **Never reveal the metric as the motivation.** "Knock out this quick + one while we wait for the vendor callback on the P1" — not "we need + to bring our average down." The team's intrinsic motivation remains + intact (Section 8). The manager absorbs the metric-management burden. + +### 9.2 Formalization + +The manager's problem is a constrained optimization: + +$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$ + +**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** A manager +who uses SPT *within* each priority class and priority ordering *between* +classes will produce a metric close to the SPT-optimal value — the gap +arises only from between-class inversions. + +**Proof sketch.** Within each priority class, SPT is free (all tasks have +equal priority). The only deviation from global SPT is the between-class +ordering. Each cross-class inversion costs at most +$p_{\text{large}} - p_{\text{small}}$ in the unweighted sum, and these +inversions are bounded by the number of classes. In practice, the gap is +typically within 10–20% of SPT-optimal. $\blacksquare$ + +### 9.3 The Manager as Information Barrier + +| Layer | Sees metric | Sees priorities | Sees proof | +|-------|-----------|----------------|------------| +| Organization | Yes | Nominally | No | +| Manager | Yes | Yes | **Yes** | +| Team | No (shielded) | Yes | Irrelevant | +| Client | Yes (dashboard) | Via SLA | No | + +The manager is the only actor holding all three pieces of information. +This is not manipulation — they are doing the right work in the right +order, and the metric happens to be acceptable because within-class SPT +is free. + +### 9.4 The Competitive Breakdown + +This strategy fails when the metric becomes **competitive between teams**. + +**Case 1: Cooperative** — Teams measured for parity, not ranking. Each +manager independently uses the internalization strategy. The metric is +decorative but harmless. This is a **coordination game** with a stable +cooperative equilibrium. + +**Case 2: Competitive** — Teams ranked by $\bar{C}$. This is a +**prisoner's dilemma**: + +| | Team B: Priority-first | Team B: SPT | +|---|---|---| +| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) | +| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) | + +The Nash equilibrium is (SPT, SPT). The internalization strategy is a +cooperative equilibrium that is **not stable under competition**. + +### 9.5 Scope + +| Condition | Viability | +|-----------|-----------| +| Metric used for health-check / parity | **Viable** | +| Metric visible but not ranked | **Viable** | +| Metric ranked across teams | **Fragile** — requires all managers to cooperate | +| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates | +| Metric reform possible at org level | **Unnecessary** — fix the metric instead | + +**The best solution is company-wide. The actionable solution is a manager +who understands this proof, shields their team from the metric, schedules +by priority, and uses SPT only within priority classes to keep the number +reasonable.** + +--- + +# Part IV: Assessment + +## 10. Devil's Advocate + +Intellectual honesty requires acknowledging where the argument has limits. + +### 10.1 Simplicity Has Real Value + +**Argument.** The unweighted mean requires no priority weights, no +task-size estimates, no calibration. + +**Assessment: True.** But the unweighted metric does not avoid assumptions +— it *hides* them by implicitly setting all weights to 1 and all sizes to +1. A known-imprecise estimate of task size is still more informative than +the implicit assumption that all sizes are equal. + +### 10.2 Minimizing the Number of People Waiting + +**Argument.** SPT minimizes total person-hours spent waiting. If each +task represents one client, this is optimal. + +**Assessment: Mathematically correct.** If you run a DMV and every +person's time is equally valuable, SPT is the right policy. It breaks +down when tasks are not 1:1 with clients, waiting cost is not uniform, +or the metric is used to evaluate teams rather than serve a literal queue. + +### 10.3 SPT as a Triage Heuristic + +**Argument.** When task sizes cluster tightly, SPT approximates FIFO +and the unweighted mean approximates the weighted mean. + +**Assessment: Correct.** The coefficient of variation $CV = \sigma_p / \bar{p}$ determines distortion severity: + +| $CV$ | Task size distribution | Distortion | +|------|----------------------|------------| +| < 0.3 | Tight (call center) | Negligible | +| 0.3 – 1.0 | Moderate (mixed IT) | Moderate | +| > 1.0 | Wide (typical IT queue) | Severe | + +A typical IT desk spans 15 minutes to 40+ hours ($CV > 2$). The +distortion is not an edge case — it is the default. + +### 10.4 Gaming Requires Malice + +**Argument.** The theorems show the metric *can* be gamed, not that it +*will* be gamed. + +**Assessment: This is the strongest counterargument.** If the metric is +purely informational and never influences behavior, the gaming incentive +is absent. However, any metric reported to management, tied to OKRs, or +discussed in retrospectives will influence behavior. This is Goodhart's +Law [6, 7] — and it applies to well-intentioned teams as reliably as to +cynical ones. The drift happens organically: completing three easy tickets +"feels productive" while the metric validates the feeling. + +### 10.5 When the Unweighted Mean Is Defensible + +The metric is defensible **only when all four conditions hold**: + +1. Task sizes are approximately uniform ($CV < 0.3$) +2. No priority differentiation (all tasks equally important) +3. Each task represents exactly one client +4. The metric is not used to evaluate, reward, or direct behavior + +These conditions are rarely met in the systems where the metric is most +commonly used. + +--- + +## 11. Related Work + +This paper sits at the intersection of several literatures that have not +previously been connected. + +### 11.1 Scheduling Theory and Fairness + +Smith [1] established the SPT optimality result and the WSJF rule in 1956. +Conway, Maxwell, and Miller [2] provided the comprehensive textbook +treatment. The fairness of size-based scheduling policies has been debated +in computer systems scheduling: Bansal and Harchol-Balter [22] investigated +SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness +classifications against Processor-Sharing; Angel, Bampis, and Pascual [21] +measured SPT schedule quality against fair optimality criteria. + +This prior work analyzes fairness in CPU and server scheduling. The present +paper applies the same mathematical results to *organizational task +management*, where the "scheduler" is a human team, the "jobs" are client +requests with business-impact priorities, and the "objective function" is +a management metric. The mechanism is identical; the consequences differ +because organizational scheduling has priority systems, client +relationships, and psychological costs that CPU scheduling does not. + +### 11.2 Measurement Dysfunction + +Austin [18] proved that incomplete measurement — measuring only a subset +of relevant dimensions — creates incentives to optimize the measured +dimensions at the expense of unmeasured ones, and that this effect is not +merely possible but *inevitable* when measurement is tied to rewards. His +information-asymmetry framing closely parallels Section 7. The present +paper provides the specific mathematical mechanism (Theorems 1–2) for the +case of task scheduling, and extends the argument through psychology +(Section 8) to trace the complete chain of organizational harm. + +Muller [19] documented "metric fixation" across education, healthcare, +policing, and finance, providing extensive empirical evidence for the +patterns theorized in Section 7.4. Campbell [24] formalized the corrupting +effect of using indicators as targets, complementing Goodhart's original +observation [6] and Strathern's generalization [7]. + +Bevan and Hood [26] empirically documented gaming behaviors in the English +public health system — including the exact patterns of "hitting the target +and missing the point" described in our Section 5.2. + +### 11.3 Psychological Costs of Metric Dysfunction + +The application of moral injury (Shay [16], Litz et al. [17]) to business +settings has recent precedent: a 2024 *Journal of Business Ethics* study +[25] explicitly extended the construct to for-profit workplaces, finding +structural conditions similar to those described in Section 8.4. Moore +[27] analyzed moral *disengagement* — the cognitive restructuring that +enables unethical behavior under organizational pressure. The present +paper addresses the complementary phenomenon: the harm to individuals who +*refuse* to disengage. + +### 11.4 What Is Novel + +The individual components — SPT optimality, Goodhart's Law, measurement +dysfunction, moral injury — all have precedent. The contributions of this +paper are: + +1. **The conservation law (Theorem 2) used prescriptively** — as a + constructive argument that work-weighted completion time *cannot* be + gamed, rather than as a theoretical scheduling result. + +2. **The specific proof that priority classes make the metric algebraically + adversarial** (Theorems 8–9) — not merely empirically bad but + structurally contradictory, with zero mutual information between the + schedule and the priority system. + +3. **The integrated chain** from mathematical proof through information + asymmetry through psychological harm through adversarial selection + spiral — tracing a single metric from Smith (1956) to organizational + hollowing. + +4. **The manager internalization strategy** (Section 9) with formal + game-theoretic analysis of its stability and breakdown conditions + under inter-team competition. + +5. **The application of scheduling theory to organizational management + critique** — proving that a commonly used team metric has specific, + quantifiable pathologies rather than arguing from anecdote or + general principle. + +--- + +## 12. Conclusion + +The unweighted average completion time is a **biased statistic** that: + +1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted + completion time which is schedule-invariant (Theorem 2). +2. **Incentivizes starvation** of large tasks (Theorem 3). +3. **Degrades client satisfaction** with zero compensating productivity + gain (Theorem 7). +4. **Actively contradicts priority systems** by carrying zero information + about business-impact classification (Theorem 9). +5. **Ignores priority entirely** in its scheduling recommendation, + producing suboptimal priority-weighted delay whenever priority and + size are not perfectly inversely correlated (Theorem 10). + +A metric that can be improved by reordering work — without doing any +additional work — is measuring the scheduling policy, not the system's +capacity. When combined with a priority system, it recommends the schedule +that inflicts the most damage on the highest-priority work. + +When the metric is reported to clients, it creates an information asymmetry +(Section 7) whose business equilibrium is profitable but fragile. When +team members understand its flaws, it violates their intrinsic motivation +and selects for the departure of the most competent people (Section 8). +A single informed manager can partially mitigate these effects through +constrained optimization (Section 9), but this cooperative strategy is +not stable under inter-team competition. + +The unweighted mean is defensible only under narrow conditions +(Section 10.5): uniform task sizes, no priorities, one-to-one client-task +mapping, and no behavioral influence. These conditions are rarely met. + +**Unweighted average completion time is not a fair or accurate measurement +of task execution performance. Its adoption as a team metric will +rationally produce starvation of complex work, violation of stated +priorities, inequitable client outcomes, and the illusion of productivity +where none exists.** + +The best solution is organizational metric reform. The actionable solution +is a manager who understands this proof. --- @@ -1489,59 +1042,48 @@ doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106) > Origin of the SPT optimality result (Theorem 1), the weighted completion > time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job -> pairwise interchange (exchange argument) proof technique used throughout -> this paper. +> pairwise interchange (exchange argument) proof technique used throughout. [2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of Scheduling*. Addison-Wesley. -> Comprehensive treatment of single-machine and multi-machine scheduling -> theory, extending Smith's results. Standard textbook reference for the -> exchange argument and its generalizations. +> Standard textbook treatment of single-machine scheduling theory, +> extending Smith's results. [3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW. *Operations Research*, 9(3), 383–387. doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383) -> First rigorous proof of Little's Law, referenced in Section 5. The -> result was known informally before 1961; this paper provided the -> general proof requiring only stationarity and finite expectations. +> First rigorous proof of Little's Law. Referenced in Section 3.2 for +> queueing-theoretic context. [4] Little, J. D. C. (2011). Little's Law as viewed on its 50th anniversary. *Operations Research*, 59(3), 536–549. doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941) -> Retrospective discussing the law's scope, limitations, and -> common misapplications — including the batch-case subtleties -> noted in Section 5 of this paper. +> Retrospective discussing scope, limitations, and common misapplications. [5] Reinertsen, D. G. (2009). *The Principles of Product Development Flow: Second Generation Lean Product Development*. Celeritas Publishing. ISBN: 978-0-9844512-0-8. -> Popularized the term "Weighted Shortest Job First" (WSJF) and the -> "Cost of Delay divided by Duration" formulation in agile/lean product -> development contexts. The underlying mathematical result is Smith -> (1956) [1]. +> Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts. +> Mathematical foundation is Smith (1956) [1]. ### Measurement and Incentives -[6] Goodhart, C. A. E. (1984). Problems of monetary management: The -U.K. experience. In C. A. E. Goodhart, *Monetary Theory and Practice: -The UK Experience* (pp. 91–121). Macmillan. +[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K. +experience. In *Monetary Theory and Practice* (pp. 91–121). Macmillan. -> Source of Goodhart's Law. Original wording: "Any observed statistical -> regularity will tend to collapse once pressure is placed upon it for -> control purposes." First presented as a working paper for the Reserve -> Bank of Australia in 1975. +> Source of Goodhart's Law: "Any observed statistical regularity will tend +> to collapse once pressure is placed upon it for control purposes." [7] Strathern, M. (1997). 'Improving ratings': Audit in the British university system. *European Review*, 5(3), 305–321. doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4) -> Generalized Goodhart's observation into the form commonly cited today: -> "When a measure becomes a target, it ceases to be a good measure." -> Referenced implicitly in Sections 6, 11.4, and Appendix A.4. +> Generalized Goodhart's Law: "When a measure becomes a target, it ceases +> to be a good measure." ### Behavioral Economics @@ -1549,11 +1091,7 @@ doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi decision under risk. *Econometrica*, 47(2), 263–292. doi:[10.2307/1914185](https://doi.org/10.2307/1914185) -> Established loss aversion — the finding that losses are weighted -> approximately twice as heavily as equivalent gains in subjective -> evaluation. Referenced in Section 7.4 to argue that the dissatisfaction -> of deprioritized large-task clients outweighs the satisfaction gained -> by small-task clients under SPT. +> Established loss aversion. Referenced in Section 4.5. ### Game Theory and Contract Theory @@ -1561,78 +1099,55 @@ doi:[10.2307/1914185](https://doi.org/10.2307/1914185) and the market mechanism. *The Quarterly Journal of Economics*, 84(3), 488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431) -> Foundational model of information asymmetry and adverse selection. -> The pooling equilibrium described in Appendix A.5 — where the client -> cannot distinguish high-quality from low-quality service because both -> produce the same aggregate metric — is structurally analogous to -> Akerlof's lemons problem. +> Information asymmetry and adverse selection. The pooling equilibrium in +> Section 7.5 is structurally analogous. [10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell Journal of Economics*, 10(1), 74–91. doi:[10.2307/3003320](https://doi.org/10.2307/3003320) -> Formal treatment of moral hazard — the problem arising when an agent's -> actions are not fully observable by the principal. The metric-reporting -> scenario in Appendix A.5 is a moral hazard problem: the provider -> (agent) chooses the schedule, but the client (principal) observes only -> the aggregate outcome. +> Formal treatment of moral hazard. The metric-reporting scenario in +> Section 7.5 is a moral hazard problem. ### Psychology [11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford University Press. ISBN: 978-0-8047-0131-0. -> Foundational theory of cognitive dissonance. Referenced in Appendix -> B.2: an individual holding contradictory cognitions experiences -> psychological discomfort and is motivated to reduce the contradiction. -> The proof eliminates the ambiguity that would normally allow -> rationalization, making the dissonance load-bearing. +> Foundational theory. Referenced in Section 8.2. [12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and Self-Determination in Human Behavior*. Plenum Press. ISBN: 978-0-306-42022-1. -> Original book-length treatment of Self-Determination Theory, -> identifying autonomy, competence, and relatedness as innate -> psychological needs. Referenced in Appendix B.3. +> Original treatment of Self-Determination Theory. Referenced in +> Section 8.3. [13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. *American Psychologist*, 55(1), 68–78. doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68) -> Overview and update of Self-Determination Theory, linking need -> satisfaction to intrinsic motivation, job satisfaction, and -> psychological well-being. The three-need framework (autonomy, -> competence, relatedness) applied in Appendix B.3. +> SDT overview linking need satisfaction to intrinsic motivation and +> well-being. [14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9. doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514) -> Original experimental demonstration of learned helplessness. -> Co-authored with Steven F. Maier. Referenced in Appendix B.5: -> repeated exposure to uncontrollable outcomes (failed advocacy for -> better metrics) produces passivity and disengagement. +> Original demonstration of learned helplessness. Referenced in +> Section 8.5. [15] Seligman, M. E. P. (1975). *Helplessness: On Depression, -Development, and Death*. W. H. Freeman. -ISBN: 978-0-7167-0752-3. +Development, and Death*. W. H. Freeman. ISBN: 978-0-7167-0752-3. > Extended treatment connecting learned helplessness to human depression -> and institutional behavior. The concept of "metric fatalism" described -> in Appendix B.5 is a domain-specific instance of learned helplessness -> in organizational settings. +> and institutional behavior. -[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the -Undoing of Character*. Atheneum / Simon & Schuster. -ISBN: 978-0-689-12182-3. +[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the Undoing +of Character*. Atheneum / Simon & Schuster. ISBN: 978-0-689-12182-3. -> Introduced the concept of moral injury through analysis of Vietnam -> combat veterans' experiences, drawing parallels to Homer's *Iliad*. -> Defined moral injury as arising from a betrayal of "what's right" by -> someone in legitimate authority in a high-stakes situation. Referenced -> in Appendix B.4. +> Introduced the concept of moral injury. Referenced in Section 8.4. [17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P., Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war @@ -1640,12 +1155,94 @@ veterans: A preliminary model and intervention strategy. *Clinical Psychology Review*, 29(8), 695–706. doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003) -> Formalized moral injury as a clinical construct and proposed a -> treatment model. Defined moral injury as resulting from "perpetrating, -> failing to prevent, bearing witness to, or learning about acts that -> transgress deeply held moral beliefs and expectations." This definition -> is quoted in Appendix B.4 and applied to knowledge workers operating -> under synthetic metrics. +> Formalized moral injury as a clinical construct. Definition quoted in +> Section 8.4. + +### Organizational Measurement + +[18] Austin, R. D. (1996). *Measuring and Managing Performance in +Organizations*. Dorset House. ISBN: 978-0-932633-36-1. + +> Proved that incomplete measurement creates inevitable incentives to +> optimize measured dimensions at the expense of unmeasured ones. The +> information-asymmetry framing closely parallels Section 7. The single +> most important predecessor to this paper's argument. + +[19] Muller, J. Z. (2018). *The Tyranny of Metrics*. Princeton University +Press. ISBN: 978-0-691-17495-2. + +> Comprehensive treatment of "metric fixation" across education, +> healthcare, policing, and finance. Extensive empirical evidence for the +> patterns theorized in Section 7.4. + +### Scheduling Fairness + +[20] Coffman, E. G., Shanthikumar, J. G., & Yao, D. D. (1992). +Multiclass queueing systems: Polymatroid structure and optimal scheduling +control. *Operations Research*, 40(S2), S293–S299. + +> Conservation laws in scheduling. The schedule-invariance of +> work-weighted completion time (Theorem 2) is an instance of these +> conservation laws. + +[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT +schedules for fair optimality criteria? *Annals of Operations Research*, +159(1), 53–64. doi:[10.1007/s10479-007-0267-0](https://doi.org/10.1007/s10479-007-0267-0) + +> Directly measures SPT schedule quality against fairness criteria. +> Closest predecessor in scheduling theory to Section 4's fairness +> analysis. + +[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT +scheduling: Investigating unfairness. *ACM SIGMETRICS Performance +Evaluation Review*, 29(1), 279–290. +doi:[10.1145/384268.378792](https://doi.org/10.1145/384268.378792) + +> Investigates the belief that SRPT unfairly penalizes large jobs in +> computer scheduling. Argues unfairness is smaller than believed but +> acknowledges the core tension. + +[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling +policies with respect to unfairness in an M/GI/1. *ACM SIGMETRICS +Performance Evaluation Review*, 31(1), 238–249. + +> Formalizes fairness definitions for scheduling policies by comparison +> to Processor-Sharing. + +### Additional References + +[24] Campbell, D. T. (1979). Assessing the impact of planned social +change. *Evaluation and Program Planning*, 2(1), 67–90. +doi:[10.1016/0149-7189(79)90048-X](https://doi.org/10.1016/0149-7189(79)90048-X) + +> Campbell's Law: "The more any quantitative social indicator is used for +> social decision-making, the more subject it will be to corruption +> pressures and the more apt it will be to distort and corrupt the social +> processes it is intended to monitor." Complements Goodhart's Law [6]. + +[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study +of moral injury in business settings. *Journal of Business Ethics*. +doi:[10.1007/s10551-024-05615-0](https://doi.org/10.1007/s10551-024-05615-0) + +> Extends moral injury to for-profit workplaces. Validates Section 8.4's +> application of Shay/Litz beyond military and healthcare settings. + +[26] Bevan, G., & Hood, C. (2006). What's measured is what matters: +Targets and gaming in the English public health care system. *Public +Administration*, 84(3), 517–538. +doi:[10.1111/j.1467-9299.2006.00600.x](https://doi.org/10.1111/j.1467-9299.2006.00600.x) + +> Empirically documents gaming behaviors including "hitting the target +> and missing the point." Provides real-world evidence for Section 5.2's +> priority-metric contradiction. + +[27] Moore, C. (2012). Why employees do bad things: Moral disengagement +and unethical organizational behavior. *Personnel Psychology*, 65(1), +1–48. doi:[10.1111/j.1744-6570.2011.01237.x](https://doi.org/10.1111/j.1744-6570.2011.01237.x) + +> Analyzes moral *disengagement* — the cognitive restructuring enabling +> unethical behavior. Section 8 addresses the complementary phenomenon: +> harm to individuals who *refuse* to disengage. ---