diff --git a/.backup/README.md.v1 b/.backup/README.md.v1 new file mode 100644 index 0000000..ba10bcf --- /dev/null +++ b/.backup/README.md.v1 @@ -0,0 +1,1249 @@ +# Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling + +A mathematical proof that unweighted average task completion time is a biased +statistic that incentivizes cherry-picking easy work, and that any scheduling +advantage it appears to reveal is an artifact of the metric, not a reflection +of genuine throughput or service quality. + +--- + +## 1. Introduction + +Many organizations measure task-execution performance by **unweighted mean +completion time**: the average number of hours (or days) between task +submission and task resolution, counting each task equally regardless of +size or priority. + +This paper proves that this metric is not merely imprecise but structurally +biased. It can be improved by reordering work without doing any additional +work (Theorem 1), while a properly weighted alternative is completely +immune to scheduling manipulation (Theorem 2). When combined with a +priority system, the metric actively contradicts the organization's own +priority classifications (Theorem 8). + +The argument proceeds in four parts: + +- **Part I** (Sections 2–4) establishes the mathematical foundation: +  the unweighted mean is gameable by Shortest Processing Time (SPT) +  scheduling, the work-weighted mean is schedule-invariant, and the +  resulting service-quality consequences are provably negative. + +- **Part II** (Sections 5–6) extends the model to priority-classified +  tasks, proves the metric becomes adversarial to the priority system, +  and proposes weighted alternatives with a worked IT service desk example. 
+ +- **Part III** (Sections 7–9) examines organizational dynamics: what + happens when the metric is reported to clients (information asymmetry), + what happens to team members who understand its flaws (psychological + harm), and what a single informed manager can do about it (constrained + optimization with game-theoretic stability analysis). + +- **Part IV** (Sections 10–12) presents honest counterarguments, situates + the work in existing literature, and concludes. + +The core results build on Smith's (1956) foundational scheduling theory [1], +extended through game theory [9, 10], organizational measurement theory +[18, 19], and psychology [11–17] to trace a complete chain from a +mathematical proof about a specific metric to organizational outcomes. + +--- + +# Part I: Mathematical Foundation + +## 2. Definitions + +Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n$. + +A **schedule** $\sigma$ is a permutation of $\{1, 2, \ldots, n\}$ assigning +tasks to execution order on a single executor. + +The **completion time** of task $\sigma(k)$ under schedule $\sigma$ is: + +$$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$$ + +The **unweighted mean completion time** is: + +$$\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}$$ + +The **work-weighted mean completion time** is: + +$$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}$$ + +--- + +## 3. Core Results + +### 3.1 The Unweighted Mean Is Gameable + +**Theorem 1** (Smith, 1956 [1])**.** The schedule that minimizes +$\bar{C}(\sigma)$ is Shortest Processing Time first (SPT): sort tasks so +that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$. + +**Proof (exchange argument [1, 2]).** + +Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy +$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ +be the start time of task $i$. 
+ +| | Task $i$ finishes | Task $j$ finishes | Sum | +|---|---|---|---| +| **Before swap** ($i$ then $j$) | $t + p_i$ | $t + p_i + p_j$ | $2t + 2p_i + p_j$ | +| **After swap** ($j$ then $i$) | $t + p_j$ | $t + p_j + p_i$ | $2t + p_i + 2p_j$ | + +The swap reduces the sum of completion times by: + +$$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$ + +Every swap of a longer-before-shorter adjacent pair strictly reduces the +total. Any non-SPT schedule contains such a pair, and repeated swaps +terminate in an SPT order. Therefore SPT minimizes $\bar{C}(\sigma)$, +uniquely up to the order of tasks with equal processing times. $\blacksquare$ + +### 3.2 The Work-Weighted Mean Is Schedule-Invariant + +**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ +is the same for every schedule $\sigma$. + +**Proof.** + +Expand the numerator: + +$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$ + +Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum +counts every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$: + +$$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$ + +For any pair $(a, b)$ with $a \ne b$, exactly one of +$\{b \prec_\sigma a\}$ or $\{a \prec_\sigma b\}$ holds. The diagonal +terms ($a = b$) contribute $p_a^2$ regardless of order. Therefore: + +$$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$ + +Together with the complementary sum, the two off-diagonal sums cover every +ordered pair $(a, b)$ with $a \ne b$ exactly once: + +$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$ + +The right-hand side is schedule-independent. 
By symmetry of $p_a p_b$, +both off-diagonal sums are equal: + +$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$ + +Therefore: + +$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2$$ + +This expression contains no reference to $\sigma$. Since the denominator +$\sum p_a$ is also schedule-independent: + +$$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum p_a^2}{\sum p_a}$$ + +is **constant across all schedules**. $\blacksquare$ + +This is an instance of the conservation laws in scheduling identified by +Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to +measuring how long a unit of *work* waits rather than how long a *task* +waits — the unweighted statistic counts completions rather than work, +which is why it is gameable. (See also Little [3, 4] for the queueing- +theoretic context, with the caveat that Little's Law applies directly +only to steady-state systems, not to the batch case analyzed here.) + +### 3.3 Illustrative Example + +Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours. + +| Schedule | $C_A$ | $C_B$ | Unweighted mean | Work-weighted mean | +|----------|-------|-------|-----------------|-------------------| +| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 | +| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 | + +SPT appears **4.5 hours better** on the unweighted metric but provides +**zero improvement** on the work-weighted metric. The apparent advantage +exists only because the unweighted statistic lets a 1-hour task "vote" +equally with a 10-hour task. + +--- + +## 4. Consequences for Service Quality + +### 4.1 Starvation of Large Tasks + +**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes +unweighted mean completion time necessarily maximizes the completion time +of the largest task. 
+ +**Proof.** SPT places the largest task last. Its completion time equals +the total processing time $\sum p_i$, which is the maximum possible +completion time for any individual task. Under any schedule that does not +place the largest task last, that task completes strictly earlier. +$\blacksquare$ + +This creates a **starvation incentive**: rational agents optimizing the +unweighted statistic will indefinitely defer large tasks in favor of small +ones. Austin [18] identified this general pattern (that incomplete +measurement creates incentives to optimize the measured dimension at the +expense of unmeasured ones) in the context of organizational performance +management. Theorem 3 provides the specific mechanism for task scheduling. + +### 4.2 Maximum Completion Time for the Largest Task + +**Theorem 4 (SPT Maximizes Completion Time of the Largest Task).** +SPT assigns the maximum possible completion time ($\sum p_i$) to the +largest task, and a schedule does so only if it places that task last. + +**Proof.** SPT sorts tasks in ascending order of $p_i$, placing the largest +task (processing time $p_{\max}$) in the last position. The last task in any +schedule has completion time $\sum_{i=1}^{n} p_i$, which is the maximum any +individual task can receive. Under any schedule that does not place the +largest task last, it completes strictly before $\sum p_i$. $\blacksquare$ + +**Corollary 4.1.** A team optimizing unweighted mean completion time will +systematically deliver the worst experience to clients with the most +complex needs. This is not a side effect; it is the *mechanism* by which +the metric improves. + +**Note on slowdown ratios.** SPT actually *compresses* slowdown ratios +($S_i = C_i / p_i$) because larger tasks in later positions have large +denominators that absorb the accumulated sum. For example, with tasks +$[1, 5, 10]$: SPT gives slowdowns $[1, 1.2, 1.6]$ (low variance) while +LPT gives $[1, 3, 16]$ (high variance). 
SPT's harm to large-task clients +is not visible in the slowdown ratio; it is visible in **absolute +completion time**. This distinction is important: the scheduling fairness +literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through +slowdown-based measures, which can obscure the absolute-delay burden +proved below. + +### 4.3 Delay Concentration + +**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, +the largest task bears at least as much absolute delay as under any other +schedule, and strictly more than under any schedule that does not place it +last. + +**Proof.** Define absolute delay as $\Delta_i = C_i - p_i$ (time spent +waiting, independent of own size). Index the tasks in SPT order, so that +the largest task is in position $n$ with: + +$$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$ + +This is the sum of all other tasks' processing times, the maximum possible +delay for that task. Under any schedule where the largest task is not +last, its delay is strictly less. Meanwhile, SPT gives the smallest task +zero delay ($\Delta_1^{\text{SPT}} = 0$). The entire queuing burden is +shifted from small tasks to large tasks. $\blacksquare$ + +SPT minimizes *total* delay (good for aggregate efficiency) by +concentrating delay onto the tasks best able to absorb it in slowdown-ratio +terms. But in absolute terms (hours spent waiting) the largest task bears +the full weight. + +### 4.4 Throughput Invariance + +**Theorem 6 (Throughput Invariance).** Total work completed over any time +horizon $T$ is identical under all scheduling policies. + +**Proof.** The executor processes work at a fixed rate. Over any horizon +$T \ge \sum p_i$, the total work done is exactly $\sum p_i$ regardless of +order. 
For the steady-state case with ongoing arrivals, the long-run +throughput is determined by the service rate $\mu$ and is completely +independent of scheduling: + +$$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$ + +$\blacksquare$ + +**Corollary 6.1.** A team that switches from any scheduling policy to SPT +will observe an improvement in unweighted mean completion time with **zero +change in actual throughput**. The metric improves. The output does not. + +### 4.5 The Compound Effect + +Combining Theorems 4, 5, and 6: + +| Measure | Effect of optimizing unweighted mean | +|---------|--------------------------------------| +| Throughput (work/time) | No change (Theorem 6) | +| Delay for small tasks | Minimized — approaches zero (SPT) | +| Delay for large tasks | **Maximized** — bears all queuing burden (Theorem 5) | +| Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) | + +The net effect on perceived quality is negative because: + +1. **Loss aversion is asymmetric** [8]. A client whose 100-hour task is + deprioritized experiences a large, salient negative. A client whose + 1-hour task is expedited experiences a small, often unnoticed positive. + +2. **High-effort tasks correlate with high-value clients.** Large tasks + are disproportionately likely to come from major clients, complex + contracts, or critical business needs. + +3. **Starvation compounds.** In a continuous system (Theorem 3), large + tasks may be **indefinitely deferred** as new small tasks keep arriving. + +**Theorem 7 (The Core Result).** For a team processing tasks of non-uniform +size, adopting unweighted mean completion time as a performance metric: + +(a) Provides **zero productivity gain** (Theorem 6), while +(b) **Assigning the maximum possible completion time** to the largest task + (Theorem 4), and +(c) **Concentrating all queuing delay** onto the largest tasks while + eliminating delay for the smallest (Theorem 5). 
+ +This is not a tradeoff. The metric creates a pure transfer of service +quality from high-effort clients to low-effort clients, with no net work +gained. $\blacksquare$ + +--- + +# Part II: Priority Systems + +## 5. Breakdown Under Priority Classification + +The preceding sections proved that unweighted mean completion time is +biased when tasks vary in size. We now show that introducing a **priority +system** — as virtually all real teams use — causes the metric to become +not merely biased but **actively adversarial** to the organization's stated +goals. + +### 5.1 Extended Model: Tasks With Priority + +Let each task $i$ have processing time $p_i$ and a priority class +$q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and +4 is the lowest (cosmetic/enhancement). Assign priority weights: + +$$w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}$$ + +The specific weights are illustrative; the results hold for any strictly +decreasing weight function. The key property is that priority is assigned +by **business impact**, not by task size. + +### 5.2 The Metric Contradicts the Priority System + +**Theorem 8 (Priority-Size Inversion).** When priority is independent of +task size, the schedule that minimizes unweighted mean completion time +(SPT) will, in expectation, complete low-priority tasks before +high-priority tasks of greater size. + +**Proof.** SPT orders tasks by $p_i$ ascending, regardless of $q_i$. +Consider two tasks: + +- Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage) +- Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix) + +SPT schedules B before A. 
The unweighted mean for this pair: + +$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$ + +The metric declares SPT nearly **twice as good** — despite completing a +cosmetic fix while a server outage burns. + +In general, when $q_i$ is statistically independent of $p_i$, SPT's +ordering has **zero correlation** with priority. In practice, Critical +tasks (outages, security incidents, data loss) often require more work +than Low tasks, so the metric is plausibly **anti-correlated** with the +priority system. $\blacksquare$ + +### 5.3 Information Destruction + +The unweighted mean reduces a three-dimensional task $(p_i, q_i, C_i)$ to +a one-dimensional signal ($C_i$), then averages uniformly. This discards +priority entirely and implicitly inverts size. + +**Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual +information between the schedule's implicit priority ranking (position) +and the actual priority assignment $q_i$. For SPT: + +$$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$ + +**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and +$q_i$ are independent, knowing a task's position in the SPT schedule +provides zero information about its priority. $\blacksquare$ + +**Corollary 9.1.** A team that optimizes unweighted mean completion time +is operating a scheduling system that carries zero information about its +own priority classification. The priority field in their ticketing system +is, with respect to execution order, decorative. + +This is an instance of what Austin [18] calls the fundamental problem of +incomplete measurement: when the measurement system captures only a subset +of the relevant dimensions, optimizing the measurement systematically +degrades the unmeasured dimensions. 
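The two-task comparison in Section 5.2 can be reproduced with a short sketch. The Python below is illustrative only: the `Task` tuple layout and the helper names are assumptions made for this example, not part of any scheduling library.

```python
from typing import List, Tuple

# (name, processing hours, priority class); class 1 = Critical, 4 = Low
Task = Tuple[str, float, int]


def completion_times(order: List[Task]) -> List[float]:
    """Cumulative finish time of each task on a single executor."""
    finished, clock = [], 0.0
    for _, hours, _ in order:
        clock += hours
        finished.append(clock)
    return finished


def unweighted_mean(order: List[Task]) -> float:
    """Unweighted mean completion time: every task counts equally."""
    times = completion_times(order)
    return sum(times) / len(times)


tasks = [("A: server outage", 40.0, 1), ("B: cosmetic UI fix", 0.5, 4)]

# SPT looks only at processing time; priority-first looks only at class.
spt = sorted(tasks, key=lambda t: t[1])
priority_first = sorted(tasks, key=lambda t: t[2])

spt_mean = unweighted_mean(spt)                  # (0.5 + 40.5) / 2
priority_mean = unweighted_mean(priority_first)  # (40 + 40.5) / 2

print(f"SPT: {spt_mean}, priority-first: {priority_mean}")
```

Sorting by hours puts the cosmetic fix first and scores 20.5; sorting by priority class puts the outage first and scores 40.25, matching the figures in the proof of Theorem 8.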
+ +### 5.4 Priority-Weighted Delay Cost + +Define the **priority-weighted delay cost** of a schedule: + +$$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$ + +**Theorem 10 (SPT and Priority-Weighted Delay Cost).** The optimal +schedule for minimizing $D(\sigma)$ is Weighted Shortest Job First (WSJF): +order by $w(q_i)/p_i$ descending [1, 5]. SPT's ordering (by $1/p_i$ +descending) ignores priority entirely and produces higher $D$ than +priority-respecting alternatives when priority is correlated with task size. + +**Proof.** By the exchange argument, swapping adjacent tasks $i, j$, with +$i$ scheduled immediately before $j$, reduces $D$ by: + +$$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$ + +The swap improves $D$ exactly when $\Delta D > 0$, that is, when +$w(q_j)/p_j > w(q_i)/p_i$ while $j$ is scheduled after $i$. Therefore the +optimal order is decreasing $w(q_i)/p_i$, which is the WSJF rule. SPT +corresponds to WSJF only when $w(q_i) = \text{const}$ (all tasks have equal +priority). + +**Example.** Critical ($w = 8$, $p = 3$) and Low ($w = 1$, $p = 2$): + +- SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$ +- WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$ + +SPT incurs 45% more priority-weighted delay. In practice, Critical tasks +tend to be larger (outages, security incidents), making the divergence +systematic. $\blacksquare$ + +--- + +## 6. Proposed Solutions + +### 6.1 Priority-Weighted Metrics + +Replace unweighted mean completion time with the **Priority-Weighted +Completion Score (PWCS)**: + +$$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$ + +This is the priority-weighted mean slowdown ratio. It measures how long +each task waited relative to its size, weighted by how much that task +mattered. Lower is better. + +**Properties:** + +1. **Priority-respecting.** Delays to Critical tasks cost 8 times as much +   as delays to Low tasks. +2. **Size-fair.** Uses slowdown ratio $C_i / p_i$, so large tasks are not +   penalized for being large. +3. 
**Not gameable by SPT.** Reordering by processing time does not + systematically improve the score. +4. **Reduces to unweighted mean when tasks are uniform.** A strict + generalization. + +### 6.2 Optimal Policy: WSJF + +**Theorem 11.** The schedule minimizing the priority-weighted completion +time $\text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i)$ processes +tasks in order of decreasing $w(q_i)/p_i$ — the **Weighted Shortest Job +First (WSJF)** rule [1, 5]. + +**Proof.** By the exchange argument (as in Theorem 10), the swap of +adjacent tasks $i, j$ improves PWCT when $w(q_j)/p_j > w(q_i)/p_i$ but +$j$ is scheduled after $i$. The optimal order is therefore decreasing +$w(q_i)/p_i$. $\blacksquare$ + +Within a priority class, this reduces to SPT (shortest first). Across +classes, a Critical 4-hour task ($w/p = 2.0$) beats a Low 1-hour task +($w/p = 1.0$). + +**Practical caveat.** Pure WSJF can place tiny Low-priority tasks ahead +of large Critical tasks (a 15-minute Low task has $w/p = 1/0.25 = 4.0$, +beating a 6-hour Critical at $w/p = 8/6 = 1.33$). In practice, this is +mitigated by enforcing **strict priority-class ordering** and applying +WSJF only *within* each class. + +### 6.3 Applied Example: IT Service Desk + +Consider an IT team with the following ticket queue: + +| Ticket | Priority | Type | Est. 
Hours | +|--------|----------|------|-----------| +| T1 | P1 (Critical) | Email server down | 6 | +| T2 | P2 (High) | VPN failing for remote team | 4 | +| T3 | P3 (Medium) | New employee laptop setup | 2 | +| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 | +| T5 | P3 (Medium) | Install software license | 1 | +| T6 | P1 (Critical) | Database backup failing | 3 | +| T7 | P2 (High) | Printer fleet offline | 2 | +| T8 | P4 (Low) | Archive old shared drive folder | 0.25 | + +**SPT order** (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1 + +| Pos | Ticket | Priority | Hours | Completion | Slowdown | +|-----|--------|----------|-------|------------|----------| +| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 | +| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 | +| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 | +| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 | +| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 | +| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 | +| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 | +| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 | + +**Practical WSJF** (priority-class-first, SPT within class): + +| Pos | Ticket | Priority | Hours | Completion | +|-----|--------|----------|-------|------------| +| 1 | T6 (backups) | P1 Crit | 3 | 3 | +| 2 | T1 (email) | P1 Crit | 6 | 9 | +| 3 | T7 (printers) | P2 High | 2 | 11 | +| 4 | T2 (VPN) | P2 High | 4 | 15 | +| 5 | T5 (software) | P3 Med | 1 | 16 | +| 6 | T3 (laptop) | P3 Med | 2 | 18 | +| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 | +| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 | + +**Comparison:** + +| Metric | SPT | Practical WSJF | Winner | +|--------|-----|----------------|--------| +| Unweighted mean completion | **6.56 hrs** | 13.63 hrs | SPT | +| P1 mean time to resolution | 13.75 hrs | **6 hrs** | WSJF | +| P2 mean time to resolution | **9.25 hrs** | 13 hrs | SPT | +| Time to fix email server | 18.75 hrs | **9 hrs** | WSJF | +| Time to fix database backups | 8.75 hrs | **3 
hrs** | WSJF | +| Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT | + +The aggregate priority-weighted completion times are nearly identical +(PWCT: 10.2 vs 10.17) because aggregation hides distributional damage. +The real difference is in the **per-priority-class** breakdown: the email +server is down for 18.75 hours under SPT versus 9 hours under WSJF. The +database backups fail for 8.75 hours versus 3. + +The unweighted metric confidently reports SPT as **more than twice as +efficient** (6.56 vs 13.63), rewarding the team that updated desktop +wallpaper while the email server was on fire. + +### 6.4 Recommended Metric Suite + +Even priority-weighted aggregate metrics can fail to distinguish good from +bad schedules, because aggregation hides distributional damage. No single +metric suffices. A complete measurement system should track: + +| Metric | What it measures | Formula | +|--------|-----------------|---------| +| **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ | +| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ for $q = 1$ | +| **Throughput** | Raw work capacity | Work-hours completed / calendar time | +| **Aging violations** | Starvation prevention | Tasks exceeding SLA by priority | +| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ for $q \le 2$ | + +The key insight: **per-priority-class metrics** expose scheduling failures +that aggregate metrics hide. + +--- + +# Part III: Organizational Dynamics + +## 7. When the Metric Is the Product + +Sections 2–6 assume that client satisfaction is a function of *experienced +service quality*. But there exists a scenario in which this assumption +fails and the entire argument collapses. 
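The per-class metrics recommended in Section 6.4 can be computed directly from an ordered ticket list. The following is a minimal Python sketch using the Section 6.3 queue; the names `QUEUE`, `completions`, `mean_by_class`, and `max_completion` are hypothetical helpers introduced for illustration, and a single executor with accurate hour estimates is assumed.

```python
from collections import defaultdict

# (ticket, priority class, estimated hours), copied from the Section 6.3 queue
QUEUE = [
    ("T1", 1, 6.0), ("T2", 2, 4.0), ("T3", 3, 2.0), ("T4", 4, 0.5),
    ("T5", 3, 1.0), ("T6", 1, 3.0), ("T7", 2, 2.0), ("T8", 4, 0.25),
]


def completions(order):
    """Map ticket -> (priority class, completion time) on one executor."""
    clock, done = 0.0, {}
    for ticket, prio, hours in order:
        clock += hours
        done[ticket] = (prio, clock)
    return done


def mean_by_class(done):
    """Mean completion time per priority class."""
    buckets = defaultdict(list)
    for prio, finish in done.values():
        buckets[prio].append(finish)
    return {prio: sum(v) / len(v) for prio, v in sorted(buckets.items())}


def max_completion(done, cutoff=2):
    """Worst completion time among tickets with class <= cutoff (P1/P2)."""
    return max(finish for prio, finish in done.values() if prio <= cutoff)


# SPT: shortest first. Class-first: priority class, then shortest first.
spt = completions(sorted(QUEUE, key=lambda t: t[2]))
class_first = completions(sorted(QUEUE, key=lambda t: (t[1], t[2])))

spt_by_class = mean_by_class(spt)
class_first_by_class = mean_by_class(class_first)

for name, done in (("SPT", spt), ("class-first", class_first)):
    print(name, mean_by_class(done), "max P1/P2:", max_completion(done))
```

Under these assumptions the sketch gives a P1 mean of 13.75 hours for SPT and 6 hours for class-first ordering, the same values as the Section 6.3 comparison table.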
+ +### 7.1 The Self-Referential Metric + +Suppose the provider reports the unweighted mean directly to the client +— on a dashboard, in an SLA report, on a marketing page — and the +client's satisfaction is derived primarily from *that number*: + +$$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$ + +Under this model, SPT genuinely maximizes client satisfaction (Theorem 1). +Throughput is unchanged (Theorem 6). The business outcome improves: same +work done, happier client. + +**Every theorem in this paper remains mathematically correct. But the +conclusion inverts.** The metric is no longer a proxy that can be gamed — +it *is* the service quality, because the client has agreed to evaluate +quality by the aggregate number. + +### 7.2 The Economics + +This creates a coherent, stable equilibrium: + +| Actor | Behavior | Outcome | +|-------|----------|---------| +| Provider | Optimizes unweighted mean (SPT) | Metric improves, no extra work | +| Client | Reads dashboard, sees low average | Reports satisfaction | +| Management | Sees satisfied client + good metric | Rewards team | + +The provider extracts satisfaction at zero marginal cost, by optimizing a +number the client has accepted as a proxy for quality. + +### 7.3 The Fragility + +This equilibrium is stable only as long as the client never inspects their +own experience. It breaks when: + +1. **The client checks their own ticket.** A CTO whose email server was + down for 18.75 hours will not be reassured by "Average resolution: + 6.56 hours." The clients most likely to inspect are exactly the ones + receiving the worst service (Theorem 4). + +2. **A competitor offers per-ticket SLAs.** "P1 resolved within 4 hours" + beats "average resolution under 7 hours" for any client with critical + needs. + +3. **The team internalizes the metric.** If the team believes the metric + reflects real performance, they lose the ability to recognize when + critical work is neglected. 
The metric becomes an epistemic hazard. + +### 7.4 The General Pattern + +This pattern — proxy replaces quality, proxy is optimized, quality +diverges, system is stable until tested by reality — recurs across domains. +Muller [19] documents it extensively as "metric fixation"; Campbell [24] +formalized the corrupting effect of using indicators as targets. + +| Domain | Proxy metric | Underlying quality | Divergence | +|--------|-------------|-------------------|------------| +| IT support | Avg. resolution time | Critical system uptime | Server down 19 hrs, avg says 6.5 | +| Education | Test scores | Actual learning | Teaching to the test | +| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission | +| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability | +| Software | Velocity (story points) | Product quality | Point inflation, features half-finished | + +### 7.5 Information Asymmetry + +Model the system as a game between provider (P) and client (C). P observes +individual $\{C_i\}$ and chooses $\sigma$; C observes only +$\bar{C}(\sigma)$. This is a **moral hazard** problem [10]: P's optimal +strategy is to minimize the observable signal regardless of the +unobservable distribution. + +The equilibrium is a **pooling equilibrium** [9]: P's reported metric +looks identical regardless of the underlying priority-weighted performance. +It is stable until C obtains access to individual $C_i$ values — via a +customer portal, a competitor's transparency, or a sufficiently painful +incident. + +### 7.6 The Uncomfortable Conclusion + +The honest answer to "does optimizing the unweighted mean hurt the +business?" is: **not necessarily, as long as the client never looks behind +the number**. The honest answer to "is this sustainable?" 
is: it is +exactly as sustainable as any system in which the seller knows more than +the buyer: stable for extended periods, then rapid collapse when the +asymmetry is punctured. + +--- + +## 8. The Psychological Cost of Knowing + +Section 7 modeled the provider as a unitary actor. But teams are composed +of individuals. When a team member understands the proof, when they +*know* the metric is synthetic, that the dashboard is theater, that the +email server is still down while they close wallpaper tickets, a new cost +appears that the equilibrium model omitted. + +### 8.1 The Hidden Variable: Team Awareness + +| Actor | Observes individual $C_i$ | Observes $\bar{C}$ | Understands the proof | +|-------|--------------------------|--------------------|-----------------------| +| Management | Possibly | Yes | Varies | +| Team member | **Yes** | Yes | **Yes** (in this scenario) | +| Client | No | Yes | No | + +The team member has full information. They see the ticket queue. They know +the email server has been down since 7 AM. They know they are closing a +wallpaper ticket because it improves the number. And they know *why*. + +### 8.2 Cognitive Dissonance Under Full Information + +Cognitive dissonance [11] arises when an individual holds contradictory +cognitions. Without understanding *why*, the contradiction can be +rationalized: "management knows best." Understanding the proof removes +the ambiguity. The team member now holds: + +- **Cognition A:** "I am a competent professional. My job is to solve +  important problems." +- **Cognition B:** "I am closing a wallpaper ticket while the email +  server is down, because the metric is mathematically biased (Theorem 1), +  the reordering produces zero throughput gain (Theorem 6), and the only +  beneficiary is the dashboard (Section 7). I can prove this." + +The dissonance is now *load-bearing*. 
The available resolutions — abandon +professional identity, reject the proof, advocate for change, or leave — +each impose costs that did not exist before. + +### 8.3 Self-Determination Theory: Three Needs Violated + +Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs +predicting intrinsic motivation: + +**Autonomy.** The metric constrains choices in a way the team member +knows is mathematically suboptimal. A worker who understands the process +is provably counterproductive cannot feel autonomous following it. + +**Competence.** The metric rewards *apparent* effectiveness (low $\bar{C}$) +while being invariant to *actual* effectiveness (Theorem 6). Genuine +competence — fixing the email server first — is *punished* by the metric. + +**Relatedness.** The team member knows the client's email server is down. +They could help. They are instead updating wallpaper — not because it +helps anyone, but because it helps a number. The connection between work +and human impact has been severed, and the team member can see the severed +ends. + +### 8.4 Moral Injury + +Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing +to prevent, bearing witness to, or learning about acts that transgress +deeply held moral beliefs" [17]. It has since been extended to business +settings [25]. The key distinction from burnout: **burnout is exhaustion +from doing too much. Moral injury is damage from doing the wrong thing.** + +A team member who knows the email server is down, knows they should fix +it, closes a wallpaper ticket instead, and does so because the metric +requires it, is experiencing the structural conditions for moral injury. + +### 8.5 Learned Helplessness and Metric Fatalism + +Seligman's learned helplessness [14, 15] describes how exposure to +uncontrollable negative outcomes leads to passivity. The sequence: + +1. The metric is flawed (proof understood). +2. Advocate for change. +3. 
Rejected ("the numbers are good, don't rock the boat"). +4. Repeat with decreasing conviction. +5. Terminal state: "The metric is what it is. I'll just close tickets." + +This is not laziness. It is the rational response to a system that +punishes correct behavior and rewards incorrect behavior, when the +individual lacks power to change the system. + +### 8.6 The Adversarial Selection Spiral + +Combining Section 7's equilibrium with the turnover dynamic: + +1. Organization adopts unweighted mean. Metric looks good (SPT). +2. Aware, competent team members experience psychological costs (8.2–8.5). +3. Those members leave. Replaced by members who do not understand the + metric's flaws or do not care. +4. The metric continues to look good — it always does under SPT, + regardless of team competence (Corollary 6.1). +5. Actual service quality degrades, but the metric cannot detect this + (Corollary 9.1). +6. Return to step 1. + +The metric selects *against* the people who would improve the system and +*for* the people who will not challenge it. The system stabilizes at a +lower level of competence, invisible to its own measurement apparatus. + +### 8.7 The Complete Cost Model + +| Section 7 (visible) | Section 8 (hidden) | +|---------------------|---------------------| +| Client satisfied (good number) | Team dissatisfied (bad reality) | +| Throughput unchanged | Discretionary effort withdrawn | +| Metric improves | Competent members leave | +| Business economy stable | Institutional competence degrades | + +These operate on different timescales: the equilibrium is visible +quarterly; the competence degradation is visible over years. The complete +model is: **the metric works, and it is destructive, and the destruction +is invisible to the metric.** The metric is fresh paint on corroded rebar. + +--- + +## 9. Manager Internalization: The Actionable Solution + +Sections 2–6 say reject the metric. Section 7 says the metric works +(for the business). 
Section 8 says it destroys the team. In practice, +most managers cannot unilaterally change the metric. The best solution is +company-wide metric reform. The *actionable* solution is what a single +informed manager can do right now. + +### 9.1 The Strategy + +A manager who understands the proof can **internalize the metric's +limitations without propagating them to the team**: + +1. **Schedule primarily by priority.** The team works critical tasks first. +2. **Tactically interleave small tasks.** When a small low-priority task + can be completed without materially delaying high-priority work, do it. + Not because the metric demands it, but because it also needs to get + done and costs almost nothing. +3. **Never reveal the metric as the motivation.** Say "Knock out this quick + one while we wait for the vendor callback on the P1," not "we need + to bring our average down." The team's intrinsic motivation remains + intact (Section 8). The manager absorbs the metric-management burden. + +### 9.2 Formalization + +The manager's problem is a constrained optimization: + +$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$ + +**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** A manager +who uses SPT *within* each priority class and priority ordering *between* +classes will produce a metric close to the SPT-optimal value; the gap +arises only from between-class inversions. + +**Proof sketch.** Within each priority class, SPT is free (all tasks have +equal priority). The only deviation from global SPT is the between-class +ordering. Each inverted pair (a larger task scheduled before a smaller +one) adds exactly $p_{\text{large}} - p_{\text{small}}$ to the unweighted +sum, and under this policy every inverted pair spans two different +priority classes, so the gap equals the sum of these differences over +cross-class inverted pairs only. In practice, the gap is +typically within 10–20% of SPT-optimal. 
$\blacksquare$ + +### 9.3 The Manager as Information Barrier + +| Layer | Sees metric | Sees priorities | Sees proof | +|-------|-----------|----------------|------------| +| Organization | Yes | Nominally | No | +| Manager | Yes | Yes | **Yes** | +| Team | No (shielded) | Yes | Irrelevant | +| Client | Yes (dashboard) | Via SLA | No | + +The manager is the only actor holding all three pieces of information. +This is not manipulation — they are doing the right work in the right +order, and the metric happens to be acceptable because within-class SPT +is free. + +### 9.4 The Competitive Breakdown + +This strategy fails when the metric becomes **competitive between teams**. + +**Case 1: Cooperative** — Teams measured for parity, not ranking. Each +manager independently uses the internalization strategy. The metric is +decorative but harmless. This is a **coordination game** with a stable +cooperative equilibrium. + +**Case 2: Competitive** — Teams ranked by $\bar{C}$. This is a +**prisoner's dilemma**: + +| | Team B: Priority-first | Team B: SPT | +|---|---|---| +| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) | +| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) | + +The Nash equilibrium is (SPT, SPT). The internalization strategy is a +cooperative equilibrium that is **not stable under competition**. + +### 9.5 Scope + +| Condition | Viability | +|-----------|-----------| +| Metric used for health-check / parity | **Viable** | +| Metric visible but not ranked | **Viable** | +| Metric ranked across teams | **Fragile** — requires all managers to cooperate | +| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates | +| Metric reform possible at org level | **Unnecessary** — fix the metric instead | + +**The best solution is company-wide. 
The actionable solution is a manager +who understands this proof, shields their team from the metric, schedules +by priority, and uses SPT only within priority classes to keep the number +reasonable.** + +--- + +# Part IV: Assessment + +## 10. Devil's Advocate + +Intellectual honesty requires acknowledging where the argument has limits. + +### 10.1 Simplicity Has Real Value + +**Argument.** The unweighted mean requires no priority weights, no +task-size estimates, no calibration. + +**Assessment: True.** But the unweighted metric does not avoid assumptions +— it *hides* them by implicitly setting all weights to 1 and all sizes to +1. A known-imprecise estimate of task size is still more informative than +the implicit assumption that all sizes are equal. + +### 10.2 Minimizing the Number of People Waiting + +**Argument.** SPT minimizes total person-hours spent waiting. If each +task represents one client, this is optimal. + +**Assessment: Mathematically correct.** If you run a DMV and every +person's time is equally valuable, SPT is the right policy. It breaks +down when tasks are not 1:1 with clients, waiting cost is not uniform, +or the metric is used to evaluate teams rather than serve a literal queue. + +### 10.3 SPT as a Triage Heuristic + +**Argument.** When task sizes cluster tightly, SPT approximates FIFO +and the unweighted mean approximates the weighted mean. + +**Assessment: Correct.** The coefficient of variation $CV = \sigma_p / \bar{p}$ determines distortion severity: + +| $CV$ | Task size distribution | Distortion | +|------|----------------------|------------| +| < 0.3 | Tight (call center) | Negligible | +| 0.3 – 1.0 | Moderate (mixed IT) | Moderate | +| > 1.0 | Wide (typical IT queue) | Severe | + +A typical IT desk spans 15 minutes to 40+ hours ($CV > 2$). The +distortion is not an edge case — it is the default. + +### 10.4 Gaming Requires Malice + +**Argument.** The theorems show the metric *can* be gamed, not that it +*will* be gamed. 
+ +**Assessment: This is the strongest counterargument.** If the metric is +purely informational and never influences behavior, the gaming incentive +is absent. However, any metric reported to management, tied to OKRs, or +discussed in retrospectives will influence behavior. This is Goodhart's +Law [6, 7] — and it applies to well-intentioned teams as reliably as to +cynical ones. The drift happens organically: completing three easy tickets +"feels productive" while the metric validates the feeling. + +### 10.5 When the Unweighted Mean Is Defensible + +The metric is defensible **only when all four conditions hold**: + +1. Task sizes are approximately uniform ($CV < 0.3$) +2. No priority differentiation (all tasks equally important) +3. Each task represents exactly one client +4. The metric is not used to evaluate, reward, or direct behavior + +These conditions are rarely met in the systems where the metric is most +commonly used. + +--- + +## 11. Related Work + +This paper sits at the intersection of several literatures that have not +previously been connected. + +### 11.1 Scheduling Theory and Fairness + +Smith [1] established the SPT optimality result and the WSJF rule in 1956. +Conway, Maxwell, and Miller [2] provided the comprehensive textbook +treatment. The fairness of size-based scheduling policies has been debated +in computer systems scheduling: Bansal and Harchol-Balter [22] investigated +SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness +classifications against Processor-Sharing; Angel, Bampis, and Pascual [21] +measured SPT schedule quality against fair optimality criteria. + +This prior work analyzes fairness in CPU and server scheduling. The present +paper applies the same mathematical results to *organizational task +management*, where the "scheduler" is a human team, the "jobs" are client +requests with business-impact priorities, and the "objective function" is +a management metric. 
The mechanism is identical; the consequences differ +because organizational scheduling has priority systems, client +relationships, and psychological costs that CPU scheduling does not. + +### 11.2 Measurement Dysfunction + +Austin [18] proved that incomplete measurement — measuring only a subset +of relevant dimensions — creates incentives to optimize the measured +dimensions at the expense of unmeasured ones, and that this effect is not +merely possible but *inevitable* when measurement is tied to rewards. His +information-asymmetry framing closely parallels Section 7. The present +paper provides the specific mathematical mechanism (Theorems 1–2) for the +case of task scheduling, and extends the argument through psychology +(Section 8) to trace the complete chain of organizational harm. + +Muller [19] documented "metric fixation" across education, healthcare, +policing, and finance, providing extensive empirical evidence for the +patterns theorized in Section 7.4. Campbell [24] formalized the corrupting +effect of using indicators as targets, complementing Goodhart's original +observation [6] and Strathern's generalization [7]. + +Bevan and Hood [26] empirically documented gaming behaviors in the English +public health system — including the exact patterns of "hitting the target +and missing the point" described in our Section 5.2. + +### 11.3 Psychological Costs of Metric Dysfunction + +The application of moral injury (Shay [16], Litz et al. [17]) to business +settings has recent precedent: a 2024 *Journal of Business Ethics* study +[25] explicitly extended the construct to for-profit workplaces, finding +structural conditions similar to those described in Section 8.4. Moore +[27] analyzed moral *disengagement* — the cognitive restructuring that +enables unethical behavior under organizational pressure. The present +paper addresses the complementary phenomenon: the harm to individuals who +*refuse* to disengage. 
+ +### 11.4 What Is Novel + +The individual components — SPT optimality, Goodhart's Law, measurement +dysfunction, moral injury — all have precedent. The contributions of this +paper are: + +1. **The conservation law (Theorem 2) used prescriptively** — as a + constructive argument that work-weighted completion time *cannot* be + gamed, rather than as a theoretical scheduling result. + +2. **The specific proof that priority classes make the metric algebraically + adversarial** (Theorems 8–9) — not merely empirically bad but + structurally contradictory, with zero mutual information between the + schedule and the priority system. + +3. **The integrated chain** from mathematical proof through information + asymmetry through psychological harm through adversarial selection + spiral — tracing a single metric from Smith (1956) to organizational + hollowing. + +4. **The manager internalization strategy** (Section 9) with formal + game-theoretic analysis of its stability and breakdown conditions + under inter-team competition. + +5. **The application of scheduling theory to organizational management + critique** — proving that a commonly used team metric has specific, + quantifiable pathologies rather than arguing from anecdote or + general principle. + +--- + +## 12. Conclusion + +The unweighted average completion time is a **biased statistic** that: + +1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted + completion time which is schedule-invariant (Theorem 2). +2. **Incentivizes starvation** of large tasks (Theorem 3). +3. **Degrades client satisfaction** with zero compensating productivity + gain (Theorem 7). +4. **Actively contradicts priority systems** by carrying zero information + about business-impact classification (Theorem 9). +5. **Ignores priority entirely** in its scheduling recommendation, + producing suboptimal priority-weighted delay whenever priority and + size are not perfectly inversely correlated (Theorem 10). 
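
Item 1 of the list above can be checked numerically. The following is a minimal sketch, assuming a single executor and a hypothetical set of four processing times (`sizes` is illustrative, not data from the paper). It evaluates both means from Section 2 over every possible schedule.

```python
from itertools import permutations


def completion_times(order):
    """Cumulative completion times for tasks run back to back in `order`."""
    elapsed, times = 0, []
    for p in order:
        elapsed += p
        times.append(elapsed)
    return times


def unweighted_mean(order):
    """Mean completion time, counting every task equally."""
    c = completion_times(order)
    return sum(c) / len(c)


def work_weighted_mean(order):
    """Mean completion time, weighting each task by its processing time."""
    c = completion_times(order)
    return sum(p * ci for p, ci in zip(order, c)) / sum(order)


sizes = [1, 2, 4, 8]  # hypothetical processing times, in hours

schedules = list(permutations(sizes))
best = min(schedules, key=unweighted_mean)    # schedule with the lowest unweighted mean
worst = max(schedules, key=unweighted_mean)   # schedule with the highest unweighted mean
weighted_values = {work_weighted_mean(s) for s in schedules}

print("best schedule:", best, unweighted_mean(best))
print("worst schedule:", worst, unweighted_mean(worst))
print("distinct work-weighted means:", len(weighted_values))
```

For these sizes the unweighted mean ranges from 6.5 hours (shortest first) to 12.25 hours (longest first), while the work-weighted mean is 155/15, about 10.33 hours, under all 24 schedules. The same work is done in every case; only the unweighted number moves.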
+ +A metric that can be improved by reordering work — without doing any +additional work — is measuring the scheduling policy, not the system's +capacity. When combined with a priority system, it recommends the schedule +that inflicts the most damage on the highest-priority work. + +When the metric is reported to clients, it creates an information asymmetry +(Section 7) whose business equilibrium is profitable but fragile. When +team members understand its flaws, it violates their intrinsic motivation +and selects for the departure of the most competent people (Section 8). +A single informed manager can partially mitigate these effects through +constrained optimization (Section 9), but this cooperative strategy is +not stable under inter-team competition. + +The unweighted mean is defensible only under narrow conditions +(Section 10.5): uniform task sizes, no priorities, one-to-one client-task +mapping, and no behavioral influence. These conditions are rarely met. + +**Unweighted average completion time is not a fair or accurate measurement +of task execution performance. Its adoption as a team metric will +rationally produce starvation of complex work, violation of stated +priorities, inequitable client outcomes, and the illusion of productivity +where none exists.** + +The best solution is organizational metric reform. The actionable solution +is a manager who understands this proof. + +--- + +## References + +### Scheduling Theory + +[1] Smith, W. E. (1956). Various optimizers for single-stage production. +*Naval Research Logistics Quarterly*, 3(1–2), 59–66. +doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106) + +> Origin of the SPT optimality result (Theorem 1), the weighted completion +> time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job +> pairwise interchange (exchange argument) proof technique used throughout. + +[2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of +Scheduling*. Addison-Wesley. 
+ +> Standard textbook treatment of single-machine scheduling theory, +> extending Smith's results. + +[3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW. +*Operations Research*, 9(3), 383–387. +doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383) + +> First rigorous proof of Little's Law. Referenced in Section 3.2 for +> queueing-theoretic context. + +[4] Little, J. D. C. (2011). Little's Law as viewed on its 50th +anniversary. *Operations Research*, 59(3), 536–549. +doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941) + +> Retrospective discussing scope, limitations, and common misapplications. + +[5] Reinertsen, D. G. (2009). *The Principles of Product Development +Flow: Second Generation Lean Product Development*. Celeritas Publishing. +ISBN: 978-0-9844512-0-8. + +> Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts. +> Mathematical foundation is Smith (1956) [1]. + +### Measurement and Incentives + +[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K. +experience. In *Monetary Theory and Practice* (pp. 91–121). Macmillan. + +> Source of Goodhart's Law: "Any observed statistical regularity will tend +> to collapse once pressure is placed upon it for control purposes." + +[7] Strathern, M. (1997). 'Improving ratings': Audit in the British +university system. *European Review*, 5(3), 305–321. +doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4) + +> Generalized Goodhart's Law: "When a measure becomes a target, it ceases +> to be a good measure." + +### Behavioral Economics + +[8] Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of +decision under risk. *Econometrica*, 47(2), 263–292. +doi:[10.2307/1914185](https://doi.org/10.2307/1914185) + +> Established loss aversion. Referenced in Section 4.5. + +### Game Theory and Contract Theory + +[9] Akerlof, G. A. (1970). 
The market for "lemons": Quality uncertainty +and the market mechanism. *The Quarterly Journal of Economics*, 84(3), +488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431) + +> Information asymmetry and adverse selection. The pooling equilibrium in +> Section 7.5 is structurally analogous. + +[10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell +Journal of Economics*, 10(1), 74–91. +doi:[10.2307/3003320](https://doi.org/10.2307/3003320) + +> Formal treatment of moral hazard. The metric-reporting scenario in +> Section 7.5 is a moral hazard problem. + +### Psychology + +[11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford +University Press. ISBN: 978-0-8047-0131-0. + +> Foundational theory. Referenced in Section 8.2. + +[12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and +Self-Determination in Human Behavior*. Plenum Press. +ISBN: 978-0-306-42022-1. + +> Original treatment of Self-Determination Theory. Referenced in +> Section 8.3. + +[13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and +the facilitation of intrinsic motivation, social development, and +well-being. *American Psychologist*, 55(1), 68–78. +doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68) + +> SDT overview linking need satisfaction to intrinsic motivation and +> well-being. + +[14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape +traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9. +doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514) + +> Original demonstration of learned helplessness. Referenced in +> Section 8.5. + +[15] Seligman, M. E. P. (1975). *Helplessness: On Depression, +Development, and Death*. W. H. Freeman. ISBN: 978-0-7167-0752-3. + +> Extended treatment connecting learned helplessness to human depression +> and institutional behavior. + +[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the Undoing +of Character*. Atheneum / Simon & Schuster. 
ISBN: 978-0-689-12182-3. + +> Introduced the concept of moral injury. Referenced in Section 8.4. + +[17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P., +Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war +veterans: A preliminary model and intervention strategy. *Clinical +Psychology Review*, 29(8), 695–706. +doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003) + +> Formalized moral injury as a clinical construct. Definition quoted in +> Section 8.4. + +### Organizational Measurement + +[18] Austin, R. D. (1996). *Measuring and Managing Performance in +Organizations*. Dorset House. ISBN: 978-0-932633-36-1. + +> Proved that incomplete measurement creates inevitable incentives to +> optimize measured dimensions at the expense of unmeasured ones. The +> information-asymmetry framing closely parallels Section 7. The single +> most important predecessor to this paper's argument. + +[19] Muller, J. Z. (2018). *The Tyranny of Metrics*. Princeton University +Press. ISBN: 978-0-691-17495-2. + +> Comprehensive treatment of "metric fixation" across education, +> healthcare, policing, and finance. Extensive empirical evidence for the +> patterns theorized in Section 7.4. + +### Scheduling Fairness + +[20] Coffman, E. G., Shanthikumar, J. G., & Yao, D. D. (1992). +Multiclass queueing systems: Polymatroid structure and optimal scheduling +control. *Operations Research*, 40(S2), S293–S299. + +> Conservation laws in scheduling. The schedule-invariance of +> work-weighted completion time (Theorem 2) is an instance of these +> conservation laws. + +[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT +schedules for fair optimality criteria? *Annals of Operations Research*, +159(1), 53–64. doi:[10.1007/s10479-007-0267-0](https://doi.org/10.1007/s10479-007-0267-0) + +> Directly measures SPT schedule quality against fairness criteria. +> Closest predecessor in scheduling theory to Section 4's fairness +> analysis. 
+ +[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT +scheduling: Investigating unfairness. *ACM SIGMETRICS Performance +Evaluation Review*, 29(1), 279–290. +doi:[10.1145/384268.378792](https://doi.org/10.1145/384268.378792) + +> Investigates the belief that SRPT unfairly penalizes large jobs in +> computer scheduling. Argues unfairness is smaller than believed but +> acknowledges the core tension. + +[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling +policies with respect to unfairness in an M/GI/1. *ACM SIGMETRICS +Performance Evaluation Review*, 31(1), 238–249. + +> Formalizes fairness definitions for scheduling policies by comparison +> to Processor-Sharing. + +### Additional References + +[24] Campbell, D. T. (1979). Assessing the impact of planned social +change. *Evaluation and Program Planning*, 2(1), 67–90. +doi:[10.1016/0149-7189(79)90048-X](https://doi.org/10.1016/0149-7189(79)90048-X) + +> Campbell's Law: "The more any quantitative social indicator is used for +> social decision-making, the more subject it will be to corruption +> pressures and the more apt it will be to distort and corrupt the social +> processes it is intended to monitor." Complements Goodhart's Law [6]. + +[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study +of moral injury in business settings. *Journal of Business Ethics*. +doi:[10.1007/s10551-024-05615-0](https://doi.org/10.1007/s10551-024-05615-0) + +> Extends moral injury to for-profit workplaces. Validates Section 8.4's +> application of Shay/Litz beyond military and healthcare settings. + +[26] Bevan, G., & Hood, C. (2006). What's measured is what matters: +Targets and gaming in the English public health care system. *Public +Administration*, 84(3), 517–538. +doi:[10.1111/j.1467-9299.2006.00600.x](https://doi.org/10.1111/j.1467-9299.2006.00600.x) + +> Empirically documents gaming behaviors including "hitting the target +> and missing the point." 
Provides real-world evidence for Section 5.2's +> priority-metric contradiction. + +[27] Moore, C. (2012). Why employees do bad things: Moral disengagement +and unethical organizational behavior. *Personnel Psychology*, 65(1), +1–48. doi:[10.1111/j.1744-6570.2011.01237.x](https://doi.org/10.1111/j.1744-6570.2011.01237.x) + +> Analyzes moral *disengagement* — the cognitive restructuring enabling +> unethical behavior. Section 8 addresses the complementary phenomenon: +> harm to individuals who *refuse* to disengage. + +--- + +*This proof was developed conversationally and formalized on 2026-03-28.* diff --git a/README.md b/README.md index ba10bcf..9d7d4a6 100644 --- a/README.md +++ b/README.md @@ -255,9 +255,63 @@ $\blacksquare$ will observe an improvement in unweighted mean completion time with **zero change in actual throughput**. The metric improves. The output does not. -### 4.5 The Compound Effect +### 4.5 The Aged-Task Abandonment Incentive -Combining Theorems 4, 5, and 6: +Theorems 3–5 show that SPT deprioritizes large tasks. But the metric +creates a second, more destructive incentive: **completing old tasks is +actively punished**. + +**Theorem 6.1 (Aged-Task Penalty).** Completing a single task with +completion time $C_{\text{old}}$ increases the running mean by more than +completing $C_{\text{old}}$ tasks with completion time 1 each. + +**Proof.** Let the team have completed $m$ tasks with running sum +$S = \sum_{i=1}^{m} C_i$ and running mean $\bar{C} = S/m$. + +**Case 1:** Complete one task with completion time $C_{\text{old}}$: + +$$\bar{C}_1 = \frac{S + C_{\text{old}}}{m + 1}$$ + +**Case 2:** Complete $C_{\text{old}}$ tasks each with completion time 1: + +$$\bar{C}_2 = \frac{S + C_{\text{old}}}{m + C_{\text{old}}}$$ + +Both cases add the same value ($C_{\text{old}}$) to the numerator. But +Case 2 adds $C_{\text{old}}$ completions to the denominator, while Case 1 +adds only 1. 
Therefore: + +$$\bar{C}_1 - \bar{C}_2 = \frac{S + C_{\text{old}}}{m + 1} - \frac{S + C_{\text{old}}}{m + C_{\text{old}}} = (S + C_{\text{old}}) \cdot \frac{C_{\text{old}} - 1}{(m+1)(m + C_{\text{old}})}$$ + +For $C_{\text{old}} > 1$, this difference is strictly positive: the old +task produces a **worse average** than the equivalent volume of fresh +work. $\blacksquare$ + +**Example.** A team has completed 100 tasks with a running mean of 2 days +($S = 200$). They can either: + +- Complete one 26-day-old task: $\bar{C} = 226/101 \approx 2.24$ days +- Complete 26 tasks at 1 day each: $\bar{C} = 226/126 \approx 1.79$ days + +Both options resolve the same 26 days of total wait. The metric +nevertheless rates the second option as better (1.79 vs 2.24). + +**Corollary 6.2 (Abandonment Incentive).** Under the unweighted mean, +the rational response to an aged task is not to deprioritize it (SPT, +Theorem 3) but to **remove it from the system entirely**: close it as +"won't fix," transfer it to another team, or let it expire. This removes +the task from both numerator and denominator, protecting the average. + +This goes beyond starvation. Theorems 3–5 prove that the metric +*delays* large and old tasks. Theorem 6.1 proves that the metric +*punishes completion of them*, so the incentive is not merely to +defer but to abandon. A metric that penalizes resolving the hardest +problems is not measuring performance; it is measuring avoidance. 
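
The arithmetic in the example and the closed-form gap from the proof of Theorem 6.1 can be reproduced in a few lines. This is a sketch using the example's own numbers; the helper name `mean_after` is introduced here for illustration only.

```python
def mean_after(running_sum, count, added_times):
    """Running mean after recording additional completion times."""
    return (running_sum + sum(added_times)) / (count + len(added_times))


S, m = 200, 100   # 100 completed tasks with a running mean of 2 days
c_old = 26        # age, in days, of the old task

one_old = mean_after(S, m, [c_old])          # Case 1: close the single aged task
many_fresh = mean_after(S, m, [1] * c_old)   # Case 2: close 26 one-day tasks
abandoned = mean_after(S, m, [])             # Corollary 6.2: the task never enters the mean

# Closed-form difference from the proof of Theorem 6.1.
gap = (S + c_old) * (c_old - 1) / ((m + 1) * (m + c_old))

print(round(one_old, 2), round(many_fresh, 2), abandoned)
print(abs((one_old - many_fresh) - gap) < 1e-12)
```

Closing the aged task moves the mean from 2.0 to about 2.24 days, closing 26 fresh tasks moves it to about 1.79, and abandoning the aged task leaves it at 2.0. Of the two ways to deal with the aged task, only abandonment keeps the number from getting worse.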
+ +--- + +### 4.6 The Compound Effect + +Combining Theorems 4, 5, 6, and 6.1: | Measure | Effect of optimizing unweighted mean | |---------|--------------------------------------| @@ -265,6 +319,7 @@ Combining Theorems 4, 5, and 6: | Delay for small tasks | Minimized — approaches zero (SPT) | | Delay for large tasks | **Maximized** — bears all queuing burden (Theorem 5) | | Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) | +| Incentive for aged tasks | **Abandon rather than complete** (Theorem 6.1) | The net effect on perceived quality is negative because: @@ -996,11 +1051,13 @@ The unweighted average completion time is a **biased statistic** that: 1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted completion time which is schedule-invariant (Theorem 2). 2. **Incentivizes starvation** of large tasks (Theorem 3). -3. **Degrades client satisfaction** with zero compensating productivity +3. **Punishes completion of aged tasks**, incentivizing abandonment + over resolution (Theorem 6.1). +4. **Degrades client satisfaction** with zero compensating productivity gain (Theorem 7). -4. **Actively contradicts priority systems** by carrying zero information +5. **Actively contradicts priority systems** by carrying zero information about business-impact classification (Theorem 9). -5. **Ignores priority entirely** in its scheduling recommendation, +6. **Ignores priority entirely** in its scheduling recommendation, producing suboptimal priority-weighted delay whenever priority and size are not perfectly inversely correlated (Theorem 10).