Mortdecai 6d3e4a5cb3 Complete structural revision: reorganize, add citations, tighten prose
Major restructure into four parts with clear argumentative arc:
  Part I:   Mathematical Foundation (Theorems 1-7)
  Part II:  Priority Systems (Theorems 8-11, IT example)
  Part III: Organizational Dynamics (info asymmetry, psychology, manager strategy)
  Part IV:  Assessment (devil's advocate, related work, conclusion)

Structural changes:
- Added Section 1 (Introduction) framing the contribution
- Promoted Appendices A/B to full Sections 7/8 (load-bearing content)
- Merged Little's Law as a remark in Section 3.2 (was a detour)
- Merged "When Valid" into Devil's Advocate Section 10.5
- Added Section 11 (Related Work) situating the paper
- Cleaned up "Hmm" and "Wait" language in Theorems 11/WSJF
- Renumbered all sections and cross-references
- Net reduction of 400 lines while adding new content

New citations [18-27]:
- Austin (1996) - measurement dysfunction (most important predecessor)
- Muller (2018) - The Tyranny of Metrics
- Coffman/Shanthikumar/Yao (1992) - conservation laws in scheduling
- Angel/Bampis/Pascual (2008) - SPT fairness criteria
- Bansal/Harchol-Balter (2001) - SRPT unfairness
- Wierman/Harchol-Balter (2003) - fairness classifications
- Campbell (1979) - Campbell's Law
- Ferreira et al. (2024) - moral injury in business
- Bevan/Hood (2006) - gaming in public health
- Moore (2012) - moral disengagement (complementary to our argument)

Citations woven into body: Austin referenced in Sections 4.1, 5.3;
scheduling fairness papers in Section 4.2 note; Campbell/Muller in
Section 7.4; moral injury extension in Section 8.4; all contextualized
in Related Work Section 11.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:03:12 -04:00


Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling

A mathematical proof that unweighted average task completion time is a biased statistic that incentivizes cherry-picking easy work, and that any scheduling advantage it appears to reveal is an artifact of the metric — not a reflection of genuine throughput or service quality.


1. Introduction

Many organizations measure task-execution performance by unweighted mean completion time: the average number of hours (or days) between task submission and task resolution, counting each task equally regardless of size or priority.

This paper proves that this metric is not merely imprecise but structurally biased. It can be improved by reordering work without doing any additional work (Theorem 1), while a properly weighted alternative is completely immune to scheduling manipulation (Theorem 2). When combined with a priority system, the metric actively contradicts the organization's own priority classifications (Theorem 9).

The argument proceeds in four parts:

  • Part I (Sections 2-4) establishes the mathematical foundation: the unweighted mean is gameable by Shortest Processing Time (SPT) scheduling, the work-weighted mean is schedule-invariant, and the resulting service-quality consequences are provably negative.

  • Part II (Sections 5-6) extends the model to priority-classified tasks, proves the metric becomes adversarial to the priority system, and proposes weighted alternatives with a worked IT service desk example.

  • Part III (Sections 7-9) examines organizational dynamics: what happens when the metric is reported to clients (information asymmetry), what happens to team members who understand its flaws (psychological harm), and what a single informed manager can do about it (constrained optimization with game-theoretic stability analysis).

  • Part IV (Sections 10-12) presents honest counterarguments, situates the work in existing literature, and concludes.

The core results build on Smith's (1956) foundational scheduling theory [1], extended through game theory [9, 10], organizational measurement theory [18, 19], and psychology [11-17] to trace a complete chain from a mathematical proof about a specific metric to organizational outcomes.


Part I: Mathematical Foundation

2. Definitions

Let there be n tasks with processing times p_1, p_2, \ldots, p_n.

A schedule \sigma is a permutation of \{1, 2, \ldots, n\} assigning tasks to execution order on a single executor.

The completion time of task \sigma(k) under schedule \sigma is:

C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}

The unweighted mean completion time is:

\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}

The work-weighted mean completion time is:

\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}
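The definitions above translate directly into a few lines of Python. This is a minimal sketch: the function names and the three-task instance are illustrative assumptions, not part of the original model.

```python
from itertools import accumulate

def completion_times(p, order):
    """Map each task index to its completion time when tasks run back-to-back in `order`."""
    finish = accumulate(p[i] for i in order)
    return dict(zip(order, finish))

def unweighted_mean(p, order):
    """C-bar: every task counts equally."""
    C = completion_times(p, order)
    return sum(C.values()) / len(p)

def work_weighted_mean(p, order):
    """C-bar_w: each completion time is weighted by the task's processing time."""
    C = completion_times(p, order)
    return sum(p[i] * C[i] for i in order) / sum(p)

# Hypothetical instance: three tasks taking 3, 1 and 2 hours.
p = [3, 1, 2]
submitted = [0, 1, 2]        # run in index order
shortest_first = [1, 2, 0]   # run in ascending order of processing time

print(unweighted_mean(p, submitted), unweighted_mean(p, shortest_first))
print(work_weighted_mean(p, submitted), work_weighted_mean(p, shortest_first))
```

On this instance the unweighted mean drops from 13/3 to 10/3 hours when the order changes, while the work-weighted mean stays at 25/6 hours.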

3. Core Results

3.1 The Unweighted Mean Is Gameable

Theorem 1 (Smith, 1956 [1]). The schedule that minimizes \bar{C}(\sigma) is Shortest Processing Time first (SPT): sort tasks so that p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}.

Proof (exchange argument [1, 2]).

Consider any schedule \sigma in which two adjacent tasks i, j satisfy p_i > p_j with task i scheduled immediately before task j. Let t be the start time of task i.

| | Task i finishes | Task j finishes | Sum |
|---|---|---|---|
| Before swap (i then j) | t + p_i | t + p_i + p_j | 2t + 2p_i + p_j |
| After swap (j then i) | t + p_j + p_i | t + p_j | 2t + p_i + 2p_j |

The swap reduces the sum of completion times by (before minus after):

(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0

Every swap of a longer-before-shorter adjacent pair strictly reduces the total. Any non-SPT schedule contains such a pair, and each swap removes one inversion, so repeated swaps terminate at an SPT order. Therefore SPT minimizes \bar{C}(\sigma), uniquely up to the order of tasks with equal processing times. \blacksquare
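The exchange argument can be replayed mechanically. The sketch below (hypothetical processing times, illustrative function names) performs the adjacent swaps, checks that each one lowers the total by exactly p_i - p_j, and compares the end result with a brute-force search over all schedules.

```python
from itertools import accumulate, permutations

def total_completion(p, order):
    """Sum of completion times for tasks run back-to-back in `order`."""
    return sum(accumulate(p[i] for i in order))

def exchange_to_spt(p):
    """Repeatedly swap adjacent longer-before-shorter pairs, checking each gain."""
    order = list(range(len(p)))
    total = total_completion(p, order)
    swapped = True
    while swapped:
        swapped = False
        for k in range(len(order) - 1):
            i, j = order[k], order[k + 1]
            if p[i] > p[j]:
                order[k], order[k + 1] = j, i
                new_total = total_completion(p, order)
                # The proof predicts a reduction of exactly p_i - p_j.
                assert total - new_total == p[i] - p[j]
                total = new_total
                swapped = True
    return order, total

p = [4, 1, 3, 2]  # hypothetical processing times (integers keep the check exact)
order, total = exchange_to_spt(p)
brute_force = min(total_completion(p, o) for o in permutations(range(len(p))))
print(order, total, brute_force)  # [1, 3, 2, 0] 20 20
```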

3.2 The Work-Weighted Mean Is Schedule-Invariant

Theorem 2. The work-weighted mean completion time \bar{C}_w(\sigma) is the same for every schedule \sigma.

Proof.

Expand the numerator:

\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}

Reindex by letting a = \sigma(k) and b = \sigma(j). The double sum counts every ordered pair (a, b) where b is scheduled no later than a:

= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b

For any pair (a, b) with a \ne b, exactly one of \{b \preceq_\sigma a\} or \{a \prec_\sigma b\} holds. The diagonal terms (a = b) contribute p_a^2 regardless of order. Therefore:

\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b

Together with the complementary sum, the two off-diagonal sums cover all unordered pairs:

\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b

The right-hand side is schedule-independent. By symmetry of p_a p_b, both off-diagonal sums are equal:

\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b

Therefore:

\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2

This expression contains no reference to \sigma. Since the denominator \sum p_a is also schedule-independent:

\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum p_a^2}{\sum p_a}

is constant across all schedules. \blacksquare
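As a sanity check on the closed form, the following sketch enumerates every schedule of a small hypothetical instance and confirms that the work-weighted mean takes a single value equal to the formula. Exact rational arithmetic avoids floating-point noise; the task sizes are an assumption made for illustration.

```python
from fractions import Fraction
from itertools import accumulate, permutations

def work_weighted_mean(p, order):
    """Work-weighted mean completion time as an exact fraction."""
    finish = accumulate(p[i] for i in order)
    numerator = sum(p[i] * c for i, c in zip(order, finish))
    return Fraction(numerator, sum(p))

def closed_form(p):
    """((sum p)^2 + sum p^2) / (2 * sum p), the schedule-free expression from the proof."""
    s = sum(p)
    sq = sum(x * x for x in p)
    return Fraction(s * s + sq, 2 * s)

p = [1, 5, 10, 2]  # hypothetical processing times
values = {work_weighted_mean(p, o) for o in permutations(range(len(p)))}
print(values, closed_form(p))
```

All 24 schedules collapse to the single value 227/18.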

This is an instance of the conservation laws in scheduling identified by Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to measuring how long a unit of work waits rather than how long a task waits. The unweighted statistic counts completions rather than work, which is why it is gameable. (See also Little [3, 4] for the queueing-theoretic context, with the caveat that Little's Law applies directly only to steady-state systems, not to the batch case analyzed here.)

3.3 Illustrative Example

Two tasks: A with p_A = 1 hour, B with p_B = 10 hours.

| Schedule | C_A | C_B | Unweighted mean | Work-weighted mean |
|---|---|---|---|---|
| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 |
| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 |

SPT appears 4.5 hours better on the unweighted metric but provides zero improvement on the work-weighted metric. The apparent advantage exists only because the unweighted statistic lets a 1-hour task "vote" equally with a 10-hour task.


4. Consequences for Service Quality

4.1 Starvation of Large Tasks

Theorem 3 (Metric Bias). Any scheduling policy that minimizes unweighted mean completion time necessarily maximizes the completion time of the largest task.

Proof. SPT places the largest task last. Its completion time equals the total processing time \sum p_i, which is the maximum possible completion time for any individual task. Under any schedule that does not place the largest task last, that task completes strictly earlier. \blacksquare

This creates a starvation incentive: rational agents optimizing the unweighted statistic will indefinitely defer large tasks in favor of small ones. Austin [18] identified this general pattern — that incomplete measurement creates incentives to optimize the measured dimension at the expense of unmeasured ones — in the context of organizational performance management. Theorem 3 provides the specific mechanism for task scheduling.

4.2 Maximum Completion Time for the Largest Task

Theorem 4 (SPT Maximizes Completion Time of the Largest Task). SPT assigns the largest task the maximum possible completion time (\sum p_i). A schedule does so if and only if it places the largest task last, which SPT always does.

Proof. SPT sorts tasks in ascending order of p_i, placing the largest task p_{\max} in the last position. The last task in any schedule has completion time \sum_{i=1}^{n} p_i, which is the maximum any individual task can receive. Under any schedule that does not place p_{\max} last, it completes strictly before \sum p_i. \blacksquare

Corollary 4.1. A team optimizing unweighted mean completion time will systematically deliver the worst experience to clients with the most complex needs. This is not a side effect — it is the mechanism by which the metric improves.

Note on slowdown ratios. SPT actually compresses slowdown ratios (S_i = C_i / p_i) because larger tasks in later positions have large denominators that absorb the accumulated sum. For example, with tasks [1, 5, 10]: SPT gives slowdowns [1, 1.2, 1.6] (low variance) while LPT gives [1, 3, 16] (high variance). SPT's harm to large-task clients is not visible in the slowdown ratio — it is visible in absolute completion time. This distinction is important: the scheduling fairness literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through slowdown-based measures, which can obscure the absolute-delay burden proved below.
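The slowdown figures in the note can be reproduced with a short sketch (the helper name is illustrative):

```python
from itertools import accumulate

def slowdowns(p_in_order):
    """Slowdown S_i = C_i / p_i for tasks listed in execution order."""
    return [c / p for p, c in zip(p_in_order, accumulate(p_in_order))]

spt = [1, 5, 10]   # shortest first
lpt = [10, 5, 1]   # longest first

print(slowdowns(spt))  # [1.0, 1.2, 1.6]
print(slowdowns(lpt))  # [1.0, 3.0, 16.0]
```

Each list is given in execution order, so the last entry of the SPT list belongs to the 10-hour task and the last entry of the LPT list belongs to the 1-hour task.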

4.3 Delay Concentration

Theorem 5 (SPT Concentrates Delay on the Largest Task). Under SPT, the largest task bears at least as much absolute delay as under any other schedule, and strictly more than under any schedule that does not place it last.

Proof. Define absolute delay as \Delta_i = C_i - p_i (time spent waiting, independent of own size). Under SPT, the largest task is in position n with:

\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i

This is the sum of all other tasks' processing times, which is the maximum possible delay for the largest task. Under any schedule where the largest task is not last, its delay is strictly less. Meanwhile, SPT gives the smallest task zero delay (\Delta_1^{\text{SPT}} = 0). The entire queuing burden is shifted from small tasks to large tasks. \blacksquare

SPT minimizes total delay (good for aggregate efficiency) by concentrating delay onto the tasks best able to absorb it in slowdown-ratio terms. But in absolute terms — hours spent waiting — the largest task bears the full weight.
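To make the absolute-delay claim concrete, this sketch enumerates every schedule of the tasks [1, 5, 10] and records how long the largest task waits. The helper name is an assumption for illustration.

```python
from itertools import permutations

def delay_of(task, p, order):
    """Absolute delay of `task`: total processing time of everything scheduled before it."""
    return sum(p[i] for i in order[:order.index(task)])

p = [1, 5, 10]
largest = p.index(max(p))
delays = {order: delay_of(largest, p, order) for order in permutations(range(len(p)))}
worst = max(delays.values())
worst_orders = sorted(order for order, d in delays.items() if d == worst)
print(worst, worst_orders)  # 6 [(0, 1, 2), (1, 0, 2)]
```

The largest task waits the full 6 hours exactly in the schedules that place it last, and SPT, (0, 1, 2), is one of them.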

4.4 Throughput Invariance

Theorem 6 (Throughput Invariance). Total work completed over any time horizon T is identical under all scheduling policies.

Proof. The executor processes work at a fixed rate. Over any horizon T \ge \sum p_i, the total work done is exactly \sum p_i regardless of order. For the steady-state case with ongoing arrivals, the long-run throughput is determined by the service rate \mu and is completely independent of scheduling:

\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma

\blacksquare

Corollary 6.1. A team that switches from any scheduling policy to SPT will observe an improvement in unweighted mean completion time with zero change in actual throughput. The metric improves. The output does not.
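A small simulation illustrates the gap between work done and tickets closed. It is a sketch under the assumptions of the model (single executor, no idle time); the function names and horizons are hypothetical.

```python
from itertools import permutations

def work_done_by(T, p_in_order):
    """Hours of work finished by time T, counting partial progress on the running task."""
    done, clock = 0, 0
    for p in p_in_order:
        done += max(0, min(p, T - clock))
        clock += p
    return done

def tasks_closed_by(T, p_in_order):
    """Number of tasks fully completed by time T."""
    clock, closed = 0, 0
    for p in p_in_order:
        clock += p
        closed += clock <= T
    return closed

p = [1, 5, 10]
for T in (0, 3, 7, 16, 20):
    amounts = {work_done_by(T, order) for order in permutations(p)}
    assert amounts == {min(T, sum(p))}  # same work under every order

print(tasks_closed_by(7, [1, 5, 10]), tasks_closed_by(7, [10, 5, 1]))  # 2 0
```

After 7 hours both orders have delivered 7 hours of work, yet shortest-first reports two closed tasks and longest-first reports none.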

4.5 The Compound Effect

Combining Theorems 4, 5, and 6:

| Measure | Effect of optimizing unweighted mean |
|---|---|
| Throughput (work/time) | No change (Theorem 6) |
| Delay for small tasks | Minimized: approaches zero (SPT) |
| Delay for large tasks | Maximized: bears all queuing burden (Theorem 5) |
| Completion time of largest task | Maximum possible: \sum p_i (Theorem 4) |

The net effect on perceived quality is negative because:

  1. Loss aversion is asymmetric [8]. A client whose 100-hour task is deprioritized experiences a large, salient negative. A client whose 1-hour task is expedited experiences a small, often unnoticed positive.

  2. High-effort tasks correlate with high-value clients. Large tasks are disproportionately likely to come from major clients, complex contracts, or critical business needs.

  3. Starvation compounds. In a continuous system (Theorem 3), large tasks may be indefinitely deferred as new small tasks keep arriving.

Theorem 7 (The Core Result). For a team processing tasks of non-uniform size, adopting unweighted mean completion time as a performance metric:

(a) provides zero productivity gain (Theorem 6);
(b) assigns the maximum possible completion time to the largest task (Theorem 4); and
(c) concentrates all queuing delay onto the largest tasks while eliminating delay for the smallest (Theorem 5).

This is not a tradeoff. The metric creates a pure transfer of service quality from high-effort clients to low-effort clients, with no net work gained. \blacksquare


Part II: Priority Systems

5. Breakdown Under Priority Classification

The preceding sections proved that unweighted mean completion time is biased when tasks vary in size. We now show that introducing a priority system — as virtually all real teams use — causes the metric to become not merely biased but actively adversarial to the organization's stated goals.

5.1 Extended Model: Tasks With Priority

Let each task i have processing time p_i and a priority class q_i \in \{1, 2, 3, 4\} where 1 is the highest priority (critical) and 4 is the lowest (cosmetic/enhancement). Assign priority weights:

w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}

The specific weights are illustrative; the results hold for any strictly decreasing weight function. The key property is that priority is assigned by business impact, not by task size.

5.2 The Metric Contradicts the Priority System

Theorem 8 (Priority-Size Inversion). When priority is independent of task size, the schedule that minimizes unweighted mean completion time (SPT) will, in expectation, complete low-priority tasks before high-priority tasks of greater size.

Proof. SPT orders tasks by p_i ascending, regardless of q_i. Consider two tasks:

  • Task A: p_A = 40 hours, q_A = 1 (Critical — e.g., server outage)
  • Task B: p_B = 0.5 hours, q_B = 4 (Low — e.g., cosmetic UI fix)

SPT schedules B before A. The unweighted mean for this pair:

\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25

The metric declares SPT nearly twice as good — despite completing a cosmetic fix while a server outage burns.

In general, when q_i is statistically independent of p_i, SPT's ordering has zero correlation with priority. In practice, Critical tasks (outages, security incidents, data loss) often require more work than Low tasks, so the metric is plausibly anti-correlated with the priority system. \blacksquare

5.3 Information Destruction

The unweighted mean reduces a three-dimensional task (p_i, q_i, C_i) to a one-dimensional signal (C_i), then averages uniformly. This discards priority entirely and implicitly inverts size.

Theorem 9 (Information Destruction). Let I(\sigma) be the mutual information between the schedule's implicit priority ranking (position) and the actual priority assignment q_i. For SPT:

I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i

Proof. SPT assigns positions based solely on p_i. When p_i and q_i are independent, knowing a task's position in the SPT schedule provides zero information about its priority. \blacksquare

Corollary 9.1. A team that optimizes unweighted mean completion time is operating a scheduling system that carries zero information about its own priority classification. The priority field in their ticketing system is, with respect to execution order, decorative.

This is an instance of what Austin [18] calls the fundamental problem of incomplete measurement: when the measurement system captures only a subset of the relevant dimensions, optimizing the measurement systematically degrades the unmeasured dimensions.

5.4 Priority-Weighted Delay Cost

Define the priority-weighted delay cost of a schedule:

D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i

Theorem 10 (SPT and Priority-Weighted Delay Cost). The optimal schedule for minimizing D(\sigma) is WSJF: order by w(q_i)/p_i descending [1, 5]. SPT's ordering — by 1/p_i descending — ignores priority entirely and produces higher D than priority-respecting alternatives when priority is correlated with task size.

Proof. By the exchange argument, if task i is scheduled immediately before task j, swapping them reduces D by:

\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j

The swap improves D when w(q_j)/p_j > w(q_i)/p_i but j is scheduled after i. Therefore the optimal order is decreasing w(q_i)/p_i — the WSJF rule. SPT corresponds to WSJF only when w(q_i) = \text{const} (all tasks have equal priority).

Example. Critical (w = 8, p = 3) and Low (w = 1, p = 2):

  • SPT (Low first): D = 1 \cdot 2 + 8 \cdot 5 = 42
  • WSJF (Critical first): D = 8 \cdot 3 + 1 \cdot 5 = 29

SPT incurs 45% more priority-weighted delay. In practice, Critical tasks tend to be larger (outages, security incidents), making the divergence systematic. \blacksquare
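The two-task example can be checked with a short sketch. The weight table follows Section 5.1; the function names and the (hours, class) tuple layout are illustrative assumptions.

```python
from itertools import accumulate

WEIGHT = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

def weighted_delay_cost(tasks):
    """D = sum of w(q_i) * C_i for (hours, priority_class) pairs in execution order."""
    finish = accumulate(p for p, _ in tasks)
    return sum(WEIGHT[q] * c for (_, q), c in zip(tasks, finish))

def spt(tasks):
    """Shortest processing time first; priority is ignored."""
    return sorted(tasks, key=lambda t: t[0])

def wsjf(tasks):
    """Descending w(q)/p, as in Theorem 10."""
    return sorted(tasks, key=lambda t: WEIGHT[t[1]] / t[0], reverse=True)

tasks = [(3, 1), (2, 4)]  # Critical 3-hour task, Low 2-hour task
print(weighted_delay_cost(spt(tasks)), weighted_delay_cost(wsjf(tasks)))  # 42 29
```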


6. Proposed Solutions

6.1 Priority-Weighted Metrics

Replace unweighted mean completion time with the Priority-Weighted Completion Score (PWCS):

\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}

This is the priority-weighted mean slowdown ratio. It measures how long each task waited relative to its size, weighted by how much that task mattered. Lower is better.

Properties:

  1. Priority-respecting. Delays to Critical tasks cost 8x more than delays to Low tasks.
  2. Size-fair. Uses slowdown ratio C_i / p_i, so large tasks are not penalized for being large.
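  3. Only partially resistant to SPT. PWCS is a weighted sum of completion times with weights w(q_i)/p_i, so by the exchange argument of Theorem 10 it is minimized by ordering on w(q_i)/p_i^2 descending. Priority now enters the optimal order, but size still does: within a priority class the optimum is SPT, and a sufficiently small Low task can still outrank a large Critical one.
  4. Consistent with the unweighted mean when tasks are uniform. If all tasks share one size p and one priority, PWCS equals \bar{C}/p, the unweighted mean up to a constant factor.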
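A minimal sketch of the PWCS computation follows, applied to the two-task example from Theorem 10 (Critical with p = 3, Low with p = 2). The function name and tuple layout are assumptions; exact fractions are used so the values can be compared without rounding.

```python
from fractions import Fraction
from itertools import accumulate

WEIGHT = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

def pwcs(tasks):
    """Priority-weighted mean slowdown for (hours, priority_class) pairs in execution order."""
    finish = accumulate(p for p, _ in tasks)
    weighted = sum(
        WEIGHT[q] * Fraction(c) / Fraction(p)
        for (p, q), c in zip(tasks, finish)
    )
    return weighted / sum(WEIGHT[q] for _, q in tasks)

critical_first = [(3, 1), (2, 4)]
low_first = [(2, 4), (3, 1)]
print(pwcs(critical_first), pwcs(low_first))  # 7/6 43/27
```

On this pair the score is lower (better) when the Critical task runs first, roughly 1.17 versus 1.59.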

6.2 Optimal Policy: WSJF

Theorem 11. The schedule minimizing the priority-weighted completion time \text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i) processes tasks in order of decreasing w(q_i)/p_i — the Weighted Shortest Job First (WSJF) rule [1, 5].

Proof. By the exchange argument (as in Theorem 10), the swap of adjacent tasks i, j improves PWCT when w(q_j)/p_j > w(q_i)/p_i but j is scheduled after i. The optimal order is therefore decreasing w(q_i)/p_i. \blacksquare

Within a priority class, this reduces to SPT (shortest first). Across classes, a Critical 4-hour task (w/p = 2.0) beats a Low 1-hour task (w/p = 1.0).

Practical caveat. Pure WSJF can place tiny Low-priority tasks ahead of large Critical tasks (a 15-minute Low task has w/p = 1/0.25 = 4.0, beating a 6-hour Critical at w/p = 8/6 = 1.33). In practice, this is mitigated by enforcing strict priority-class ordering and applying WSJF only within each class.
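The caveat is easy to see in code. The sketch below contrasts pure WSJF with the class-first variant on the two tasks from the paragraph above; the dictionary layout and function names are illustrative assumptions.

```python
WEIGHT = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

def pure_wsjf(tasks):
    """Order by w(q)/p descending, across all priority classes."""
    return sorted(tasks, key=lambda t: WEIGHT[t["q"]] / t["p"], reverse=True)

def class_first(tasks):
    """Strict priority-class order, shortest task first within each class."""
    return sorted(tasks, key=lambda t: (t["q"], t["p"]))

tasks = [
    {"name": "critical-6h", "p": 6.0, "q": 1},
    {"name": "low-15min", "p": 0.25, "q": 4},
]

print([t["name"] for t in pure_wsjf(tasks)])    # ['low-15min', 'critical-6h']
print([t["name"] for t in class_first(tasks)])  # ['critical-6h', 'low-15min']
```

Sorting on the pair (class, hours) is equivalent to WSJF within a class because the weight is constant there.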

6.3 Applied Example: IT Service Desk

Consider an IT team with the following ticket queue:

| Ticket | Priority | Type | Est. Hours |
|---|---|---|---|
| T1 | P1 (Critical) | Email server down | 6 |
| T2 | P2 (High) | VPN failing for remote team | 4 |
| T3 | P3 (Medium) | New employee laptop setup | 2 |
| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 |
| T5 | P3 (Medium) | Install software license | 1 |
| T6 | P1 (Critical) | Database backup failing | 3 |
| T7 | P2 (High) | Printer fleet offline | 2 |
| T8 | P4 (Low) | Archive old shared drive folder | 0.25 |

SPT order (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1

| Pos | Ticket | Priority | Hours | Completion (hrs) | Slowdown |
|---|---|---|---|---|---|
| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 |
| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |

Practical WSJF (priority-class-first, SPT within class):

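| Pos | Ticket | Priority | Hours | Completion (hrs) |
|---|---|---|---|---|
| 1 | T6 (backups) | P1 Crit | 3 | 3 |
| 2 | T1 (email) | P1 Crit | 6 | 9 |
| 3 | T7 (printers) | P2 High | 2 | 11 |
| 4 | T2 (VPN) | P2 High | 4 | 15 |
| 5 | T5 (software) | P3 Med | 1 | 16 |
| 6 | T3 (laptop) | P3 Med | 2 | 18 |
| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |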

Comparison:

| Metric | SPT | Practical WSJF | Winner |
|---|---|---|---|
| Unweighted mean completion | 6.56 hrs | 13.63 hrs | SPT |
| P1 mean time to resolution | 13.75 hrs | 6 hrs | WSJF |
| P2 mean time to resolution | 9.25 hrs | 13 hrs | SPT |
| Time to fix email server | 18.75 hrs | 9 hrs | WSJF |
| Time to fix database backups | 8.75 hrs | 3 hrs | WSJF |
| Time to update wallpaper | 0.75 hrs | 18.75 hrs | SPT |

The aggregate priority-weighted completion times are nearly identical (PWCT: 10.20 hrs under SPT vs 10.17 hrs under practical WSJF) because aggregation hides distributional damage. The real difference is in the per-priority-class breakdown: the email server is down for 18.75 hours under SPT versus 9 hours under WSJF. The database backups fail for 8.75 hours versus 3.

The unweighted metric confidently reports SPT as more than twice as efficient (6.56 vs 13.63), rewarding the team that updated desktop wallpaper while the email server was on fire.
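The figures in this section can be recomputed from the ticket table. The sketch below assumes the weights of Section 5.1 and uses illustrative function names; ties in SPT (T3 and T7, both 2 hours) are broken by ticket order, matching the table.

```python
from itertools import accumulate

WEIGHT = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1
TICKETS = [  # (id, priority class, estimated hours) from the ticket table
    ("T1", 1, 6), ("T2", 2, 4), ("T3", 3, 2), ("T4", 4, 0.5),
    ("T5", 3, 1), ("T6", 1, 3), ("T7", 2, 2), ("T8", 4, 0.25),
]

def completions(schedule):
    """Completion time of each ticket when the schedule runs back-to-back."""
    finish = accumulate(hours for _, _, hours in schedule)
    return {tid: c for (tid, _, _), c in zip(schedule, finish)}

def unweighted_mean(schedule):
    C = completions(schedule)
    return sum(C.values()) / len(C)

def pwct(schedule):
    """Priority-weighted mean completion time."""
    C = completions(schedule)
    total_weight = sum(WEIGHT[q] for _, q, _ in schedule)
    return sum(WEIGHT[q] * C[tid] for tid, q, _ in schedule) / total_weight

spt = sorted(TICKETS, key=lambda t: t[2])                  # stable sort keeps T3 before T7
class_first = sorted(TICKETS, key=lambda t: (t[1], t[2]))  # practical WSJF

for name, schedule in (("SPT", spt), ("Practical WSJF", class_first)):
    C = completions(schedule)
    print(name, round(unweighted_mean(schedule), 2), round(pwct(schedule), 2), C["T1"], C["T6"])
```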

As the PWCT comparison shows, even priority-weighted aggregate metrics can fail to distinguish good from bad schedules. No single metric suffices. A complete measurement system should track:

| Metric | What it measures | Formula |
|---|---|---|
| Mean completion by priority class | Per-class responsiveness | \bar{C} filtered by q |
| P1 mean time to resolution | Critical incident response | \bar{C} for q = 1 |
| Throughput | Raw work capacity | Work-hours completed / calendar time |
| Aging violations | Starvation prevention | Tasks exceeding SLA by priority |
| Max completion time (P1/P2) | Worst-case critical response | \max(C_i) for q \le 2 |

The key insight: per-priority-class metrics expose scheduling failures that aggregate metrics hide.
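A per-class report along these lines might be sketched as follows. The SLA thresholds are hypothetical values chosen for illustration, since the source does not specify any, and the input records are the SPT completion times from the table in this section.

```python
from collections import defaultdict

# Hypothetical SLA targets in hours per priority class (not from the source).
SLA_HOURS = {1: 4, 2: 8, 3: 24, 4: 72}

def per_class_report(records):
    """records: iterable of (priority_class, completion_hours) pairs."""
    by_class = defaultdict(list)
    for q, c in records:
        by_class[q].append(c)
    return {
        q: {
            "mean": sum(cs) / len(cs),
            "max": max(cs),
            "sla_violations": sum(c > SLA_HOURS[q] for c in cs),
        }
        for q, cs in sorted(by_class.items())
    }

# Completion times under SPT, taken from the SPT table.
spt_records = [(4, 0.25), (4, 0.75), (3, 1.75), (3, 3.75),
               (2, 5.75), (1, 8.75), (2, 12.75), (1, 18.75)]
report = per_class_report(spt_records)
for q, row in report.items():
    print(f"P{q}", row)
```

Under these assumed thresholds, both P1 tickets and one P2 ticket breach their targets under SPT, which the aggregate averages do not reveal.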


Part III: Organizational Dynamics

7. When the Metric Is the Product

Sections 2-6 assume that client satisfaction is a function of experienced service quality. But there exists a scenario in which this assumption fails and the entire argument collapses.

7.1 The Self-Referential Metric

Suppose the provider reports the unweighted mean directly to the client — on a dashboard, in an SLA report, on a marketing page — and the client's satisfaction is derived primarily from that number:

U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0

Under this model, SPT genuinely maximizes client satisfaction (Theorem 1). Throughput is unchanged (Theorem 6). The business outcome improves: same work done, happier client.

Every theorem in this paper remains mathematically correct. But the conclusion inverts. The metric is no longer a proxy that can be gamed — it is the service quality, because the client has agreed to evaluate quality by the aggregate number.

7.2 The Economics

This creates a coherent, stable equilibrium:

| Actor | Behavior | Outcome |
|---|---|---|
| Provider | Optimizes unweighted mean (SPT) | Metric improves, no extra work |
| Client | Reads dashboard, sees low average | Reports satisfaction |
| Management | Sees satisfied client + good metric | Rewards team |

The provider extracts satisfaction at zero marginal cost, by optimizing a number the client has accepted as a proxy for quality.

7.3 The Fragility

This equilibrium is stable only as long as the client never inspects their own experience. It breaks when:

  1. The client checks their own ticket. A CTO whose email server was down for 18.75 hours will not be reassured by "Average resolution: 6.56 hours." The clients most likely to inspect are exactly the ones receiving the worst service (Theorem 4).

  2. A competitor offers per-ticket SLAs. "P1 resolved within 4 hours" beats "average resolution under 7 hours" for any client with critical needs.

  3. The team internalizes the metric. If the team believes the metric reflects real performance, they lose the ability to recognize when critical work is neglected. The metric becomes an epistemic hazard.

7.4 The General Pattern

This pattern — proxy replaces quality, proxy is optimized, quality diverges, system is stable until tested by reality — recurs across domains. Muller [19] documents it extensively as "metric fixation"; Campbell [24] formalized the corrupting effect of using indicators as targets.

| Domain | Proxy metric | Underlying quality | Divergence |
|---|---|---|---|
| IT support | Avg. resolution time | Critical system uptime | Server down 19 hrs, avg says 6.5 |
| Education | Test scores | Actual learning | Teaching to the test |
| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission |
| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability |
| Software | Velocity (story points) | Product quality | Point inflation, features half-finished |

7.5 Information Asymmetry

Model the system as a game between provider (P) and client (C). P observes individual \{C_i\} and chooses \sigma; C observes only \bar{C}(\sigma). This is a moral hazard problem [10]: P's optimal strategy is to minimize the observable signal regardless of the unobservable distribution.

The equilibrium is a pooling equilibrium [9]: P's reported metric looks identical regardless of the underlying priority-weighted performance. It is stable until C obtains access to individual C_i values — via a customer portal, a competitor's transparency, or a sufficiently painful incident.

7.6 The Uncomfortable Conclusion

The honest answer to "does optimizing the unweighted mean hurt the business?" is: not necessarily, as long as the client never looks behind the number. The honest answer to "is this sustainable?" is: it is exactly as sustainable as any system in which the seller knows more than the buyer — stable for extended periods, then rapid collapse when the asymmetry is punctured.


8. The Psychological Cost of Knowing

Section 7 modeled the provider as a unitary actor. But teams are composed of individuals. When a team member understands the proof — when they know the metric is synthetic, that the dashboard is theater, that the email server is still down while they close wallpaper tickets — a new cost appears that the equilibrium model omitted.

8.1 The Hidden Variable: Team Awareness

| Actor | Observes individual C_i | Observes \bar{C} | Understands the proof |
|---|---|---|---|
| Management | Possibly | Yes | Varies |
| Team member | Yes | Yes | Yes (in this scenario) |
| Client | No | Yes | No |

The team member has full information. They see the ticket queue. They know the email server has been down since 7 AM. They know they are closing a wallpaper ticket because it improves the number. And they know why.

8.2 Cognitive Dissonance Under Full Information

Cognitive dissonance [11] arises when an individual holds contradictory cognitions. Without understanding why, the contradiction can be rationalized: "management knows best." Understanding the proof removes the ambiguity. The team member now holds:

  • Cognition A: "I am a competent professional. My job is to solve important problems."
  • Cognition B: "I am closing a wallpaper ticket while the email server is down, because the metric is mathematically biased (Theorem 1), the reordering produces zero throughput (Theorem 6), and the only beneficiary is the dashboard (Section 7). I can prove this."

The dissonance is now load-bearing. The available resolutions — abandon professional identity, reject the proof, advocate for change, or leave — each impose costs that did not exist before.

8.3 Self-Determination Theory: Three Needs Violated

Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs predicting intrinsic motivation:

Autonomy. The metric constrains choices in a way the team member knows is mathematically suboptimal. A worker who understands the process is provably counterproductive cannot feel autonomous following it.

Competence. The metric rewards apparent effectiveness (low \bar{C}) while being invariant to actual effectiveness (Theorem 6). Genuine competence — fixing the email server first — is punished by the metric.

Relatedness. The team member knows the client's email server is down. They could help. They are instead updating wallpaper — not because it helps anyone, but because it helps a number. The connection between work and human impact has been severed, and the team member can see the severed ends.

8.4 Moral Injury

Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing to prevent, bearing witness to, or learning about acts that transgress deeply held moral beliefs" [17]. It has since been extended to business settings [25]. The key distinction from burnout: burnout is exhaustion from doing too much. Moral injury is damage from doing the wrong thing.

A team member who knows the email server is down, knows they should fix it, closes a wallpaper ticket instead, and does so because the metric requires it, is experiencing the structural conditions for moral injury.

8.5 Learned Helplessness and Metric Fatalism

Seligman's learned helplessness [14, 15] describes how exposure to uncontrollable negative outcomes leads to passivity. The sequence:

  1. The metric is flawed (proof understood).
  2. Advocate for change.
  3. Rejected ("the numbers are good, don't rock the boat").
  4. Repeat with decreasing conviction.
  5. Terminal state: "The metric is what it is. I'll just close tickets."

This is not laziness. It is the rational response to a system that punishes correct behavior and rewards incorrect behavior, when the individual lacks power to change the system.

8.6 The Adversarial Selection Spiral

Combining Section 7's equilibrium with the turnover dynamic:

  1. Organization adopts unweighted mean. Metric looks good (SPT).
  2. Aware, competent team members experience psychological costs (Sections 8.2-8.5).
  3. Those members leave. Replaced by members who do not understand the metric's flaws or do not care.
  4. The metric continues to look good — it always does under SPT, regardless of team competence (Corollary 6.1).
  5. Actual service quality degrades, but the metric cannot detect this (Corollary 9.1).
  6. Return to step 1.

The metric selects against the people who would improve the system and for the people who will not challenge it. The system stabilizes at a lower level of competence, invisible to its own measurement apparatus.

8.7 The Complete Cost Model

| Section 7 (visible) | Section 8 (hidden) |
|---|---|
| Client satisfied (good number) | Team dissatisfied (bad reality) |
| Throughput unchanged | Discretionary effort withdrawn |
| Metric improves | Competent members leave |
| Business economy stable | Institutional competence degrades |

These operate on different timescales: the equilibrium is visible quarterly; the competence degradation is visible over years. The complete model is: the metric works, and it is destructive, and the destruction is invisible to the metric. The metric is fresh paint on corroded rebar.


9. Manager Internalization: The Actionable Solution

Sections 2-6 say reject the metric. Section 7 says the metric works (for the business). Section 8 says it destroys the team. In practice, most managers cannot unilaterally change the metric. The best solution is company-wide metric reform. The actionable solution is what a single informed manager can do right now.

9.1 The Strategy

A manager who understands the proof can internalize the metric's limitations without propagating them to the team:

  1. Schedule primarily by priority. The team works critical tasks first.
  2. Tactically interleave small tasks. When a small low-priority task can be completed without materially delaying high-priority work, do it. Not because the metric demands it, but because it also needs to get done and costs almost nothing.
  3. Never reveal the metric as the motivation. "Knock out this quick one while we wait for the vendor callback on the P1" — not "we need to bring our average down." The team's intrinsic motivation remains intact (Section 8). The manager absorbs the metric-management burden.

9.2 Formalization

The manager's problem is a constrained optimization:

\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}

Theorem 12 (Bounded Metric Cost of Priority Scheduling). A manager who uses SPT within each priority class and priority ordering between classes will produce a metric close to the SPT-optimal value — the gap arises only from between-class inversions.

Proof sketch. Within each priority class, SPT is free (all tasks have equal priority). The only deviation from global SPT is the between-class ordering. Each cross-class inversion costs at most p_{\text{large}} - p_{\text{small}} in the unweighted sum, and these inversions are bounded by the number of classes. In practice, the gap is typically within 10–20% of SPT-optimal. \blacksquare
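The comparison in Theorem 12 can be sketched numerically. The following is a minimal illustration, not part of the proof: the queue, its priority classes, and its durations are hypothetical, and the size of the gap it reports depends entirely on how priority and task size happen to be correlated in the chosen data.

```python
from itertools import accumulate

def mean_completion(durations):
    """Unweighted mean completion time for tasks run back to back in the given order."""
    completions = list(accumulate(durations))
    return sum(completions) / len(completions)

# Hypothetical queue (an assumption for illustration): (priority class, hours).
# Class 1 is the most urgent.
tasks = [(1, 2.0), (1, 1.0), (2, 3.0), (2, 0.5), (3, 4.0), (3, 0.25)]

# Global SPT: ignore priority, shortest task first (the metric-optimal schedule).
global_spt = sorted(tasks, key=lambda t: t[1])

# Manager's strategy: priority order between classes, SPT within each class.
priority_then_spt = sorted(tasks, key=lambda t: (t[0], t[1]))

# Same priority order, but longest-first within each class, for contrast.
priority_then_lpt = sorted(tasks, key=lambda t: (t[0], -t[1]))

spt_mean = mean_completion([d for _, d in global_spt])
pri_spt_mean = mean_completion([d for _, d in priority_then_spt])
pri_lpt_mean = mean_completion([d for _, d in priority_then_lpt])

gap = (pri_spt_mean - spt_mean) / spt_mean
print(f"global SPT mean:        {spt_mean:.2f} h")
print(f"priority + SPT mean:    {pri_spt_mean:.2f} h (gap {gap:.0%})")
print(f"priority + LPT mean:    {pri_lpt_mean:.2f} h")
```

The within-class SPT ordering never costs the manager anything in priority terms, so the only metric penalty relative to global SPT comes from respecting the class boundaries. Rerunning the sketch with real queue data is the only way to learn what the gap is for a given team.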

9.3 The Manager as Information Barrier

| Layer | Sees metric | Sees priorities | Sees proof |
|---|---|---|---|
| Organization | Yes | Nominally | No |
| Manager | Yes | Yes | Yes |
| Team | No (shielded) | Yes | Irrelevant |
| Client | Yes (dashboard) | Via SLA | No |

The manager is the only actor holding all three pieces of information. This is not manipulation — they are doing the right work in the right order, and the metric happens to be acceptable because within-class SPT is free.

9.4 The Competitive Breakdown

This strategy fails when the metric becomes competitive between teams.

Case 1: Cooperative — Teams measured for parity, not ranking. Each manager independently uses the internalization strategy. The metric is decorative but harmless. This is a coordination game with a stable cooperative equilibrium.

Case 2: Competitive — Teams ranked by \bar{C}. This is a prisoner's dilemma:

| | Team B: Priority-first | Team B: SPT |
|---|---|---|
| Team A: Priority-first | (Good work, Good work) | (A looks bad, B looks good) |
| Team A: SPT | (A looks good, B looks bad) | (Both look good, both do wrong work) |

The Nash equilibrium is (SPT, SPT). The internalization strategy is a cooperative equilibrium that is not stable under competition.
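The equilibrium claim can be checked mechanically once the qualitative outcomes are given numbers. The sketch below assumes hypothetical ordinal payoffs (higher means the team looks better on the ranked metric); only their ordering matters, and the specific values are not taken from the text.

```python
from itertools import product

STRATEGIES = ("priority_first", "spt")

# Hypothetical ordinal payoffs (an assumption): PAYOFF[(a, b)] = (payoff to A, payoff to B).
# 3 = looks good while the rival looks bad, 2 = good work on both sides,
# 1 = both look good but do the wrong work, 0 = looks bad while the rival looks good.
PAYOFF = {
    ("priority_first", "priority_first"): (2, 2),
    ("priority_first", "spt"): (0, 3),
    ("spt", "priority_first"): (3, 0),
    ("spt", "spt"): (1, 1),
}

def is_nash(a, b):
    """True if neither team can improve its own payoff by deviating unilaterally."""
    a_cannot_improve = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in STRATEGIES)
    b_cannot_improve = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in STRATEGIES)
    return a_cannot_improve and b_cannot_improve

equilibria = [(a, b) for a, b in product(STRATEGIES, repeat=2) if is_nash(a, b)]
print(equilibria)
```

Under these assumed payoffs SPT is a dominant strategy for each team, which is what makes the mutually better (priority-first, priority-first) outcome unstable once teams are ranked.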

9.5 Scope

| Condition | Viability |
|---|---|
| Metric used for health-check / parity | Viable |
| Metric visible but not ranked | Viable |
| Metric ranked across teams | Fragile: requires all managers to cooperate |
| Metric tied to compensation / resources | Not viable: prisoner's dilemma dominates |
| Metric reform possible at org level | Unnecessary: fix the metric instead |

The best solution is company-wide. The actionable solution is a manager who understands this proof, shields their team from the metric, schedules by priority, and uses SPT only within priority classes to keep the number reasonable.


Part IV: Assessment

10. Devil's Advocate

Intellectual honesty requires acknowledging where the argument has limits.

10.1 Simplicity Has Real Value

Argument. The unweighted mean requires no priority weights, no task-size estimates, no calibration.

Assessment: True. But the unweighted metric does not avoid assumptions; it hides them by implicitly setting all weights to 1 and all sizes to 1. A known-imprecise estimate of task size is still more informative than the implicit assumption that all sizes are equal.

10.2 Minimizing the Number of People Waiting

Argument. SPT minimizes total person-hours spent waiting. If each task represents one client, this is optimal.

Assessment: Mathematically correct. If you run a DMV and every person's time is equally valuable, SPT is the right policy. It breaks down when tasks are not 1:1 with clients, waiting cost is not uniform, or the metric is used to evaluate teams rather than serve a literal queue.

10.3 SPT as a Triage Heuristic

Argument. When task sizes cluster tightly, SPT approximates FIFO and the unweighted mean approximates the weighted mean.

Assessment: Correct. The coefficient of variation CV = \sigma_p / \bar{p} determines distortion severity:

| CV | Task size distribution | Distortion |
|---|---|---|
| < 0.3 | Tight (call center) | Negligible |
| 0.3–1.0 | Moderate (mixed IT) | Moderate |
| > 1.0 | Wide (typical IT queue) | Severe |

A typical IT desk spans 15 minutes to 40+ hours (CV > 2). The distortion is not an edge case — it is the default.
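A team can estimate where its own queue falls by computing the CV from historical task durations. The sketch below is illustrative only: both sample queues are invented, the band boundaries come from the table above, and the population standard deviation is assumed for \sigma_p.

```python
from statistics import mean, pstdev

def coefficient_of_variation(durations):
    """CV = population standard deviation of task sizes divided by their mean."""
    return pstdev(durations) / mean(durations)

def distortion_band(cv):
    """Map a CV onto the distortion bands used in the table above."""
    if cv < 0.3:
        return "negligible"
    if cv <= 1.0:
        return "moderate"
    return "severe"

# Hypothetical samples (assumptions for illustration).
call_center_minutes = [5, 6, 5, 7, 6, 5]
it_queue_hours = [0.25, 0.25, 0.5, 0.5, 1, 1, 2, 40]

for name, sample in [("call center", call_center_minutes), ("IT queue", it_queue_hours)]:
    cv = coefficient_of_variation(sample)
    print(f"{name}: CV = {cv:.2f} -> {distortion_band(cv)}")
```

A single 40-hour task among sub-hour tickets is enough to push the CV above 2 in this sample, which is why a wide spread is the normal state of a mixed IT queue rather than an outlier.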

10.4 Gaming Requires Malice

Argument. The theorems show the metric can be gamed, not that it will be gamed.

Assessment: This is the strongest counterargument. If the metric is purely informational and never influences behavior, the gaming incentive is absent. However, any metric reported to management, tied to OKRs, or discussed in retrospectives will influence behavior. This is Goodhart's Law [6, 7] — and it applies to well-intentioned teams as reliably as to cynical ones. The drift happens organically: completing three easy tickets "feels productive" while the metric validates the feeling.

10.5 When the Unweighted Mean Is Defensible

The metric is defensible only when all four conditions hold:

  1. Task sizes are approximately uniform (CV < 0.3)
  2. No priority differentiation (all tasks equally important)
  3. Each task represents exactly one client
  4. The metric is not used to evaluate, reward, or direct behavior

These conditions are rarely met in the systems where the metric is most commonly used.


11. Related Work

This paper sits at the intersection of several literatures that have not previously been connected.

11.1 Scheduling Theory and Fairness

Smith [1] established the SPT optimality result and the WSJF rule in 1956. Conway, Maxwell, and Miller [2] provided the comprehensive textbook treatment. The fairness of size-based scheduling policies has been debated in computer systems scheduling: Bansal and Harchol-Balter [22] investigated SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness classifications against Processor-Sharing; Angel, Bampis, and Pascual [21] measured SPT schedule quality against fair optimality criteria.

This prior work analyzes fairness in CPU and server scheduling. The present paper applies the same mathematical results to organizational task management, where the "scheduler" is a human team, the "jobs" are client requests with business-impact priorities, and the "objective function" is a management metric. The mechanism is identical; the consequences differ because organizational scheduling has priority systems, client relationships, and psychological costs that CPU scheduling does not.

11.2 Measurement Dysfunction

Austin [18] proved that incomplete measurement (measuring only a subset of relevant dimensions) creates incentives to optimize the measured dimensions at the expense of unmeasured ones, and that this effect is not merely possible but inevitable when measurement is tied to rewards. His information-asymmetry framing closely parallels Section 7. The present paper provides the specific mathematical mechanism (Theorems 1–2) for the case of task scheduling, and extends the argument through psychology (Section 8) to trace the complete chain of organizational harm.

Muller [19] documented "metric fixation" across education, healthcare, policing, and finance, providing extensive empirical evidence for the patterns theorized in Section 7.4. Campbell [24] formalized the corrupting effect of using indicators as targets, complementing Goodhart's original observation [6] and Strathern's generalization [7].

Bevan and Hood [26] empirically documented gaming behaviors in the English public health system — including the exact patterns of "hitting the target and missing the point" described in our Section 5.2.

11.3 Psychological Costs of Metric Dysfunction

The application of moral injury (Shay [16], Litz et al. [17]) to business settings has recent precedent: a 2024 Journal of Business Ethics study [25] explicitly extended the construct to for-profit workplaces, finding structural conditions similar to those described in Section 8.4. Moore [27] analyzed moral disengagement — the cognitive restructuring that enables unethical behavior under organizational pressure. The present paper addresses the complementary phenomenon: the harm to individuals who refuse to disengage.

11.4 What Is Novel

The individual components — SPT optimality, Goodhart's Law, measurement dysfunction, moral injury — all have precedent. The contributions of this paper are:

  1. The conservation law (Theorem 2) used prescriptively — as a constructive argument that work-weighted completion time cannot be gamed, rather than as a theoretical scheduling result.

  2. The specific proof that priority classes make the metric algebraically adversarial (Theorems 8–9): not merely empirically bad but structurally contradictory, with zero mutual information between the schedule and the priority system.

  3. The integrated chain from mathematical proof through information asymmetry through psychological harm through adversarial selection spiral — tracing a single metric from Smith (1956) to organizational hollowing.

  4. The manager internalization strategy (Section 9) with formal game-theoretic analysis of its stability and breakdown conditions under inter-team competition.

  5. The application of scheduling theory to organizational management critique — proving that a commonly used team metric has specific, quantifiable pathologies rather than arguing from anecdote or general principle.


12. Conclusion

The unweighted average completion time is a biased statistic that:

  1. Can be gamed by scheduling policy (Theorem 1), unlike work-weighted completion time which is schedule-invariant (Theorem 2).
  2. Incentivizes starvation of large tasks (Theorem 3).
  3. Degrades client satisfaction with zero compensating productivity gain (Theorem 7).
  4. Actively contradicts priority systems by carrying zero information about business-impact classification (Theorem 9).
  5. Ignores priority entirely in its scheduling recommendation, producing suboptimal priority-weighted delay whenever priority and size are not perfectly inversely correlated (Theorem 10).

A metric that can be improved by reordering work — without doing any additional work — is measuring the scheduling policy, not the system's capacity. When combined with a priority system, it recommends the schedule that inflicts the most damage on the highest-priority work.
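The contrast between a gameable and a schedule-invariant statistic can be checked by brute force on a small queue. The sketch below assumes the work-weighted mean takes the form \sum_i p_i C_i / \sum_i p_i on a single server with no idle time; that form is an assumption made here for illustration, and the task durations are hypothetical.

```python
from itertools import accumulate, permutations

def unweighted_mean(durations):
    """Mean of completion times C_i for tasks run back to back in the given order."""
    completions = list(accumulate(durations))
    return sum(completions) / len(completions)

def work_weighted_mean(durations):
    """Completion times weighted by task size: sum(p_i * C_i) / sum(p_i) (assumed form)."""
    completions = accumulate(durations)
    return sum(p * c for p, c in zip(durations, completions)) / sum(durations)

# Hypothetical task sizes in hours (an assumption for illustration).
durations = (1, 2, 4, 8)

# Evaluate both statistics over every possible ordering of the same work.
unweighted_values = {unweighted_mean(order) for order in permutations(durations)}
weighted_values = {work_weighted_mean(order) for order in permutations(durations)}

print(f"unweighted mean ranges from {min(unweighted_values)} to {max(unweighted_values)}")
print(f"work-weighted mean takes {len(weighted_values)} distinct value(s)")
```

The same 15 hours of work are completed in every ordering, yet the unweighted mean nearly doubles between its best and worst schedule, while the work-weighted statistic does not move. Reordering changes what the first number reports without changing what the team delivered.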

When the metric is reported to clients, it creates an information asymmetry (Section 7) whose business equilibrium is profitable but fragile. When team members understand its flaws, it violates their intrinsic motivation and selects for the departure of the most competent people (Section 8). A single informed manager can partially mitigate these effects through constrained optimization (Section 9), but this cooperative strategy is not stable under inter-team competition.

The unweighted mean is defensible only under narrow conditions (Section 10.5): uniform task sizes, no priorities, one-to-one client-task mapping, and no behavioral influence. These conditions are rarely met.

Unweighted average completion time is not a fair or accurate measurement of task execution performance. Its adoption as a team metric will rationally produce starvation of complex work, violation of stated priorities, inequitable client outcomes, and the illusion of productivity where none exists.

The best solution is organizational metric reform. The actionable solution is a manager who understands this proof.


References

Scheduling Theory

[1] Smith, W. E. (1956). Various optimizers for single-stage production. Naval Research Logistics Quarterly, 3(1–2), 59–66. doi:10.1002/nav.3800030106

Origin of the SPT optimality result (Theorem 1), the weighted completion time rule w_i/p_i descending (WSJF, Theorem 11), and the adjacent-job pairwise interchange (exchange argument) proof technique used throughout.

[2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). Theory of Scheduling. Addison-Wesley.

Standard textbook treatment of single-machine scheduling theory, extending Smith's results.

[3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW. Operations Research, 9(3), 383–387. doi:10.1287/opre.9.3.383

First rigorous proof of Little's Law. Referenced in Section 3.2 for queueing-theoretic context.

[4] Little, J. D. C. (2011). Little's Law as viewed on its 50th anniversary. Operations Research, 59(3), 536–549. doi:10.1287/opre.1110.0941

Retrospective discussing scope, limitations, and common misapplications.

[5] Reinertsen, D. G. (2009). The Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing. ISBN: 978-0-9844512-0-8.

Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts. Mathematical foundation is Smith (1956) [1].

Measurement and Incentives

[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K. experience. In Monetary Theory and Practice (pp. 91–121). Macmillan.

Source of Goodhart's Law: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."

[7] Strathern, M. (1997). 'Improving ratings': Audit in the British university system. European Review, 5(3), 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4

Generalized Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

Behavioral Economics

[8] Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292. doi:10.2307/1914185

Established loss aversion. Referenced in Section 4.5.

Game Theory and Contract Theory

[9] Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500. doi:10.2307/1879431

Information asymmetry and adverse selection. The pooling equilibrium in Section 7.5 is structurally analogous.

[10] Hölmstrom, B. (1979). Moral hazard and observability. The Bell Journal of Economics, 10(1), 74–91. doi:10.2307/3003320

Formal treatment of moral hazard. The metric-reporting scenario in Section 7.5 is a moral hazard problem.

Psychology

[11] Festinger, L. (1957). A Theory of Cognitive Dissonance. Stanford University Press. ISBN: 978-0-8047-0131-0.

Foundational theory. Referenced in Section 8.2.

[12] Deci, E. L., & Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Plenum Press. ISBN: 978-0-306-42022-1.

Original treatment of Self-Determination Theory. Referenced in Section 8.3.

[13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. doi:10.1037/0003-066X.55.1.68

SDT overview linking need satisfaction to intrinsic motivation and well-being.

[14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74(1), 1–9. doi:10.1037/h0024514

Original demonstration of learned helplessness. Referenced in Section 8.5.

[15] Seligman, M. E. P. (1975). Helplessness: On Depression, Development, and Death. W. H. Freeman. ISBN: 978-0-7167-0752-3.

Extended treatment connecting learned helplessness to human depression and institutional behavior.

[16] Shay, J. (1994). Achilles in Vietnam: Combat Trauma and the Undoing of Character. Atheneum / Simon & Schuster. ISBN: 978-0-689-12182-3.

Introduced the concept of moral injury. Referenced in Section 8.4.

[17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P., Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war veterans: A preliminary model and intervention strategy. Clinical Psychology Review, 29(8), 695–706. doi:10.1016/j.cpr.2009.07.003

Formalized moral injury as a clinical construct. Definition quoted in Section 8.4.

Organizational Measurement

[18] Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House. ISBN: 978-0-932633-36-1.

Proved that incomplete measurement creates inevitable incentives to optimize measured dimensions at the expense of unmeasured ones. The information-asymmetry framing closely parallels Section 7. The single most important predecessor to this paper's argument.

[19] Muller, J. Z. (2018). The Tyranny of Metrics. Princeton University Press. ISBN: 978-0-691-17495-2.

Comprehensive treatment of "metric fixation" across education, healthcare, policing, and finance. Extensive empirical evidence for the patterns theorized in Section 7.4.

Scheduling Fairness

[20] Coffman, E. G., Shanthikumar, J. G., & Yao, D. D. (1992). Multiclass queueing systems: Polymatroid structure and optimal scheduling control. Operations Research, 40(S2), S293–S299.

Conservation laws in scheduling. The schedule-invariance of work-weighted completion time (Theorem 2) is an instance of these conservation laws.

[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT schedules for fair optimality criteria? Annals of Operations Research, 159(1), 53–64. doi:10.1007/s10479-007-0267-0

Directly measures SPT schedule quality against fairness criteria. Closest predecessor in scheduling theory to Section 4's fairness analysis.

[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT scheduling: Investigating unfairness. ACM SIGMETRICS Performance Evaluation Review, 29(1), 279–290. doi:10.1145/384268.378792

Investigates the belief that SRPT unfairly penalizes large jobs in computer scheduling. Argues unfairness is smaller than believed but acknowledges the core tension.

[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling policies with respect to unfairness in an M/GI/1. ACM SIGMETRICS Performance Evaluation Review, 31(1), 238–249.

Formalizes fairness definitions for scheduling policies by comparison to Processor-Sharing.

Additional References

[24] Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. doi:10.1016/0149-7189(79)90048-X

Campbell's Law: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." Complements Goodhart's Law [6].

[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study of moral injury in business settings. Journal of Business Ethics. doi:10.1007/s10551-024-05615-0

Extends moral injury to for-profit workplaces. Validates Section 8.4's application of Shay/Litz beyond military and healthcare settings.

[26] Bevan, G., & Hood, C. (2006). What's measured is what matters: Targets and gaming in the English public health care system. Public Administration, 84(3), 517–538. doi:10.1111/j.1467-9299.2006.00600.x

Empirically documents gaming behaviors including "hitting the target and missing the point." Provides real-world evidence for Section 5.2's priority-metric contradiction.

[27] Moore, C. (2012). Why employees do bad things: Moral disengagement and unethical organizational behavior. Personnel Psychology, 65(1), 1–48. doi:10.1111/j.1744-6570.2011.01237.x

Analyzes moral disengagement — the cognitive restructuring enabling unethical behavior. Section 8 addresses the complementary phenomenon: harm to individuals who refuse to disengage.


This proof was developed conversationally and formalized on 2026-03-28.