Formalizes the actionable middle ground: a manager who understands the proof can schedule primarily by priority while tactically interleaving small tasks to maintain metric parity with other teams. Key contributions: - Constrained optimization formulation (minimize priority-weighted delay subject to unweighted mean staying in acceptable band) - Theorem 12: bounded metric cost of priority scheduling (within-class SPT is free, between-class inversions are bounded) - Manager as information barrier (shields team from metric's perverse incentives, preserving intrinsic motivation per Appendix B) - Competitive breakdown as prisoner's dilemma: cooperative equilibrium is stable when metric is a health-check, collapses when metric is ranked or tied to compensation - Scope table: viable for parity/health-check, fragile under ranking, not viable under compensation linkage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling
|
||
|
||
A mathematical proof that unweighted average task completion time is a biased
statistic that incentivizes cherry-picking easy work, and that any scheduling
advantage it appears to reveal is an artifact of the metric, not a reflection
of genuine throughput or service quality.
|
||
|
||
---
|
||
|
||
## 1. Definitions
|
||
|
||
Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n$.
|
||
|
||
A **schedule** $\sigma$ is a permutation of $\{1, 2, \ldots, n\}$ assigning
|
||
tasks to execution order on a single executor.
|
||
|
||
The **completion time** of task $\sigma(k)$ under schedule $\sigma$ is:
|
||
|
||
$$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$$
|
||
|
||
The **unweighted mean completion time** is:
|
||
|
||
$$\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}$$
|
||
|
||
The **work-weighted mean completion time** is:
|
||
|
||
$$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}$$
|
||
|
||
---
|
||
|
||
## 2. SPT Is Optimal for the Unweighted Statistic
|
||
|
||
**Theorem 1.** The schedule that minimizes $\bar{C}(\sigma)$ is Shortest
|
||
Processing Time first (SPT): sort tasks so that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.
|
||
|
||
**Proof (exchange argument).**
|
||
|
||
Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy
|
||
$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ be the
|
||
start time of task $i$.
|
||
|
||
| | Task $i$ finishes | Task $j$ finishes | Sum |
|---|---|---|---|
| **Before swap** ($i$ then $j$) | $t + p_i$ | $t + p_i + p_j$ | $2t + 2p_i + p_j$ |
| **After swap** ($j$ then $i$) | $t + p_j + p_i$ | $t + p_j$ | $2t + p_i + 2p_j$ |
|
||
|
||
The swap reduces the sum of completion times by:
|
||
|
||
$$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$
|
||
|
||
Every swap of a longer-before-shorter adjacent pair strictly reduces the total.
Any non-SPT schedule contains such a pair, and each swap removes one inversion,
so repeated swaps terminate in an SPT order. Therefore SPT minimizes
$\bar{C}(\sigma)$, uniquely up to the order of tasks with equal processing
times. $\blacksquare$
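The exchange argument can be sanity-checked by brute force on a small instance. The sketch below is illustrative only: the processing times are hypothetical, and `mean_completion_time` is a helper name introduced here, not part of the original text.

```python
from itertools import permutations


def mean_completion_time(order):
    """Unweighted mean completion time when tasks run back to back in `order`."""
    elapsed = 0.0
    total = 0.0
    for p in order:
        elapsed += p       # completion time of the task that just finished
        total += elapsed
    return total / len(order)


# Hypothetical processing times (hours); distinct, so the optimum is unique.
times = [4, 1, 3, 2, 6]

# Exhaustive search over all n! orders versus the sorted (SPT) order.
best_order = min(permutations(times), key=mean_completion_time)
spt_order = tuple(sorted(times))

print(best_order, mean_completion_time(best_order))
```

On this instance the exhaustive minimum coincides with the sorted order, as Theorem 1 predicts.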
|
||
|
||
---
|
||
|
||
## 3. The Work-Weighted Statistic Is Schedule-Invariant
|
||
|
||
**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ is
|
||
the same for every schedule $\sigma$.
|
||
|
||
**Proof.**
|
||
|
||
Expand the numerator:
|
||
|
||
$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$
|
||
|
||
Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum counts
|
||
every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:
|
||
|
||
$$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$
|
||
|
||
For any pair $(a, b)$ with $a \ne b$, exactly one of $\{b \preceq_\sigma a\}$
|
||
or $\{a \prec_\sigma b\}$ holds. The diagonal terms ($a = b$) contribute $p_a^2$
|
||
regardless of order. Therefore:
|
||
|
||
$$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$
|
||
|
||
Now consider the complementary sum:
|
||
|
||
$$\sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b$$
|
||
|
||
Together the two off-diagonal sums cover every ordered pair $(a, b)$ with $a \ne b$ exactly once:
|
||
|
||
$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$
|
||
|
||
The right-hand side is schedule-independent. By symmetry of $p_a p_b$, both
|
||
off-diagonal sums are equal:
|
||
|
||
$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$
|
||
|
||
Therefore:
|
||
|
||
$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2$$
|
||
|
||
This expression contains no reference to $\sigma$. Since the denominator
|
||
$\sum p_a$ is also schedule-independent:
|
||
|
||
$$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum p_a^2}{\sum p_a}$$
|
||
|
||
is **constant across all schedules**. $\blacksquare$
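A short numerical check of the invariance, assuming a hypothetical set of processing times: every permutation should give the same work-weighted mean, and that value should match the closed form above. The function names are introduced here for illustration.

```python
from itertools import permutations


def work_weighted_mean(order):
    """Work-weighted mean completion time for tasks run in the given order."""
    elapsed = 0.0
    numerator = 0.0
    for p in order:
        elapsed += p
        numerator += p * elapsed   # p_k * C_k
    return numerator / sum(order)


def closed_form(times):
    """Schedule-independent value from Theorem 2."""
    s = sum(times)
    return (0.5 * s ** 2 + 0.5 * sum(p * p for p in times)) / s


# Hypothetical processing times (hours).
times = [4, 1, 3, 2, 6]

# Collect the metric over all 120 orders; rounding guards against float noise.
values = {round(work_weighted_mean(order), 9) for order in permutations(times)}
print(values, closed_form(times))
```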
|
||
|
||
---
|
||
|
||
## 4. Concrete Example
|
||
|
||
Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours.
|
||
|
||
### SPT order (A first)
|
||
|
||
| Task | Completion time (hours) |
|------|-------------------------|
| A | 1 |
| B | 11 |
|
||
|
||
- Unweighted mean: $(1 + 11) / 2 = 6.0$
|
||
- Work-weighted mean: $(1 \times 1 + 10 \times 11) / 11 = 111/11 \approx 10.09$
|
||
|
||
### Reverse order (B first)
|
||
|
||
| Task | Completion time (hours) |
|------|-------------------------|
| B | 10 |
| A | 11 |
|
||
|
||
- Unweighted mean: $(10 + 11) / 2 = 10.5$
|
||
- Work-weighted mean: $(10 \times 10 + 1 \times 11) / 11 = 111/11 \approx 10.09$
|
||
|
||
SPT appears **4.5 hours better** on the unweighted metric but provides
|
||
**zero improvement** on the work-weighted metric. The apparent advantage exists
|
||
only because the unweighted statistic lets a 1-hour task "vote" equally with
|
||
a 10-hour task.
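As a cross-check, the closed form from Theorem 2 can be evaluated directly for this example. It reproduces the value obtained under both orders:

$$\bar{C}_w = \frac{\frac{1}{2}(1 + 10)^2 + \frac{1}{2}(1^2 + 10^2)}{1 + 10} = \frac{60.5 + 50.5}{11} = \frac{111}{11} \approx 10.09$$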
|
||
|
||
---
|
||
|
||
## 5. Connection to Little's Law
|
||
|
||
Little's Law states $L = \lambda W$, where $L$ is the time-averaged number
|
||
of tasks in the system, $\lambda$ is the arrival rate, and $W$ is the
|
||
average time a task spends in the system.
|
||
|
||
In a *steady-state* queueing system with fixed arrival and service rates,
|
||
$\lambda$ and the long-run service rate are determined by the workload, not
|
||
by scheduling policy. Little's Law then tells us that $L$ and $W$ are
|
||
linked, but in the batch case (all $n$ tasks present at time 0), $L$ and
|
||
$W$ are both schedule-dependent: $\bar{C} = W$, and
|
||
$L = \sum C_i / \sum p_i$, both of which SPT minimizes.
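The batch-case identities can be checked numerically. This is a minimal sketch under one assumption not stated in the text: the effective arrival rate is taken as $\lambda = n / \sum p_i$ (tasks per unit time over the makespan), which is what makes $L = \lambda W$ hold for a batch. `batch_little` is a hypothetical helper name.

```python
def batch_little(order):
    """Return (L, lam, W) for a batch of tasks run back to back in `order`."""
    elapsed = 0.0
    completions = []
    for p in order:
        elapsed += p
        completions.append(elapsed)
    makespan = elapsed
    n = len(order)
    W = sum(completions) / n            # mean time in system (equals C-bar)
    lam = n / makespan                  # assumed effective arrival rate
    L = sum(completions) / makespan     # time-averaged number of tasks in system
    return L, lam, W


# The two-task example from Section 4, in both orders.
L_spt, lam_spt, W_spt = batch_little([1, 10])
L_rev, lam_rev, W_rev = batch_little([10, 1])

print(L_spt, lam_spt * W_spt)
print(L_rev, lam_rev * W_rev)
```

Both $L$ and $W$ are smaller under SPT, while $\lambda$ is the same for either order.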
|
||
|
||
The invariance we proved in Theorem 2 is more specific: *work-weighted*
|
||
mean completion time $\bar{C}_w$ is constant across schedules. This
|
||
corresponds to measuring the system from the perspective of "how long does
|
||
a unit of *work* wait" rather than "how long does a *task* wait." The
|
||
unweighted statistic measures the latter and is gameable precisely because
|
||
it counts completions rather than work.
|
||
|
||
---
|
||
|
||
## 6. Consequences
|
||
|
||
**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes unweighted
|
||
mean completion time necessarily maximizes the completion time of the largest
|
||
task relative to other schedules.
|
||
|
||
**Proof.** SPT places the largest task last. Its completion time equals the
total processing time $\sum p_i$, which is the maximum possible completion
time for any individual task. Any order that does not place the largest task
last lets it finish strictly earlier. $\blacksquare$
|
||
|
||
This creates a **starvation incentive**: when new small tasks keep arriving,
rational agents optimizing the unweighted statistic will keep deferring
large tasks in favor of small ones.
|
||
|
||
### Real-world manifestations
|
||
|
||
| Domain | Gameable metric | Perverse outcome |
|--------|----------------|------------------|
| Support desks | Tickets closed / day | Complex issues ignored |
| Sprint planning | Story count velocity | Work split into trivial pieces |
| Emergency rooms | Average wait time | Critical patients deprioritized |
| Academic publishing | Papers per year | Incremental work favored over deep research |
|
||
|
||
---
|
||
|
||
## 7. Impact on Client Satisfaction and Team Productivity
|
||
|
||
The preceding theorems are not merely abstract. They have direct, provable
|
||
consequences for client satisfaction and team productivity when a team adopts
|
||
unweighted mean completion time as its performance metric.
|
||
|
||
### 7.1 Defining Client Satisfaction: The Slowdown Ratio
|
||
|
||
A client submitting a task of size $p_i$ has an expectation anchored to that
|
||
size. The natural measure of their experience is the **slowdown ratio**:
|
||
|
||
$$S_i = \frac{C_i}{p_i}$$
|
||
|
||
This is the factor by which the client's wait exceeds the task's inherent
|
||
processing time. A slowdown of 1 means no queuing delay at all. A slowdown
|
||
of 10 means the client waited 10x longer than the work itself required.
|
||
|
||
Client satisfaction is inversely related to slowdown: a client who waits
|
||
2x their task size is more satisfied than one who waits 20x, regardless of
|
||
the absolute times involved.
|
||
|
||
**Theorem 4 (SPT Maximizes Completion Time of the Largest Task).**
SPT always assigns the maximum possible completion time ($\sum p_i$) to the
largest task.

**Proof.**

SPT sorts tasks in ascending order of $p_i$, placing the largest task
$p_{\max}$ in the last position. The last task in any schedule has
completion time $\sum_{i=1}^{n} p_i$, which is the maximum completion time
any individual task can receive. Therefore, under SPT:

$$C_{\max\text{-task}}^{\text{SPT}} = \sum_{i=1}^{n} p_i$$

Under any schedule that does not place $p_{\max}$ last, the largest task
completes strictly before $\sum p_i$. SPT is not the only schedule that puts
the largest task last, but it is the one that a team optimizing $\bar{C}$
is driven toward (Theorem 1).
|
||
|
||
Note on slowdown: SPT actually *compresses* slowdown ratios ($S_i = C_i / p_i$)
|
||
because larger tasks in later positions have large denominators that absorb
|
||
the accumulated sum. For example, with tasks $[1, 5, 10]$:
|
||
|
||
- SPT: slowdowns $[1, 1.2, 1.6]$ — low variance
|
||
- LPT: slowdowns $[1, 3, 16]$ — high variance
|
||
|
||
SPT's harm to large-task clients is not visible in the slowdown ratio. It is
visible in **absolute completion time**: the largest task finishes last, at
$\sum p_i$, while under any ordering that does not place it last it finishes
earlier. $\blacksquare$
|
||
|
||
**Corollary 4.1.** A team optimizing unweighted mean completion time will
|
||
systematically deliver the worst experience to clients with the most
|
||
complex needs.
|
||
|
||
This is not a side effect — it is the *mechanism* by which the metric improves.
|
||
The only way to lower the unweighted average is to complete more small tasks
|
||
early, which necessarily means completing large tasks later. The metric
|
||
improves *because* high-effort clients are deprioritized.
|
||
|
||
### 7.2 The Absolute Delay Burden
|
||
|
||
The slowdown ratio $S_i = C_i / p_i$ might suggest SPT is *fair* — it
|
||
compresses slowdown variance by giving everyone a ratio close to 1. But
|
||
this obscures the real cost. The correct measure of burden is the
|
||
**absolute delay** experienced by each task:
|
||
|
||
$$\Delta_i = C_i - p_i$$
|
||
|
||
This is the time a task spends waiting for other tasks, independent of its
|
||
own size. Under any sequential schedule, the total delay across all tasks
|
||
is schedule-dependent (it equals $\sum C_i - \sum p_i$), and SPT minimizes
|
||
this total. But the *distribution* of delay matters.
|
||
|
||
**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, the
largest task bears at least as much absolute delay as under any other
schedule, and strictly more than under any schedule that does not place it
last.
|
||
|
||
**Proof.** Under SPT, the largest task is in position $n$ with:
|
||
|
||
$$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$
|
||
|
||
This is the sum of all other tasks' processing times — the maximum possible
|
||
delay for any single task. Under any schedule where the largest task is not
|
||
last, its delay is strictly less than $\sum_{i \ne \max} p_i$.
|
||
|
||
Meanwhile, SPT gives the smallest task zero delay ($\Delta_1^{\text{SPT}} = 0$).
The queuing burden is shifted from small tasks to large tasks.
|
||
$\blacksquare$
|
||
|
||
The tension is this: SPT minimizes total delay (good for aggregate
|
||
efficiency) by concentrating delay onto the tasks best able to "absorb" it
|
||
in slowdown-ratio terms. But in absolute terms — hours spent waiting — the
|
||
largest task bears the full weight. If that task represents a critical
|
||
business need, the absolute delay, not the ratio, determines the damage.
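The contrast between the two views can be made concrete with the $[1, 5, 10]$ example from Section 7.1. The sketch below computes both the slowdown ratio $S_i$ and the absolute delay $\Delta_i$ for each task; the helper names are introduced here for illustration.

```python
def completion_times(order):
    """Completion time of each task when run back to back in `order`."""
    elapsed = 0
    out = []
    for p in order:
        elapsed += p
        out.append(elapsed)
    return out


def slowdowns(order):
    """S_i = C_i / p_i, listed in schedule order."""
    return [c / p for c, p in zip(completion_times(order), order)]


def delays(order):
    """Delta_i = C_i - p_i, listed in schedule order."""
    return [c - p for c, p in zip(completion_times(order), order)]


spt = [1, 5, 10]   # shortest first
lpt = [10, 5, 1]   # longest first

for name, order in (("SPT", spt), ("LPT", lpt)):
    print(name, "slowdowns:", slowdowns(order), "delays:", delays(order))
```

Under SPT the 10-hour task has a modest slowdown (1.6) but waits 6 hours, the sum of all other tasks; under LPT it waits 0 hours.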
|
||
|
||
### 7.3 Productivity Is Not Improved
|
||
|
||
**Theorem 6 (Throughput Invariance).** Total work completed over any time
|
||
horizon $T$ is identical under all scheduling policies.
|
||
|
||
**Proof.** The executor processes work at a fixed rate. Over time $T$, the
|
||
total work completed is:
|
||
|
||
$$W(T) = \sum_{\{i : C_i \le T\}} p_i + \text{(partial progress on current task)}$$
|
||
|
||
In the non-preemptive case (tasks run to completion once started), $W(T)$ may
|
||
vary slightly at the boundary depending on which task is in progress at time
|
||
$T$. However, over any horizon $T \ge \sum p_i$ (i.e., long enough to
|
||
complete all tasks), the total work done is exactly $\sum p_i$ regardless
|
||
of order.
|
||
|
||
For the steady-state case with ongoing arrivals, the long-run throughput is
|
||
determined by the service rate $\mu$ and is completely independent of
|
||
scheduling:
|
||
|
||
$$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$
|
||
|
||
$\blacksquare$
|
||
|
||
**Corollary 6.1.** A team that switches from any scheduling policy to SPT
|
||
will observe an improvement in unweighted mean completion time with
|
||
**zero change in actual throughput**.
|
||
|
||
The metric improves. The output does not.
|
||
|
||
### 7.4 The Compound Effect: Satisfaction Down, Productivity Flat
|
||
|
||
Combining Theorems 4, 5, and 6:
|
||
|
||
| Measure | Effect of optimizing unweighted mean |
|---------|--------------------------------------|
| Throughput (work/time) | No change (Theorem 6) |
| Delay for small tasks | Minimized: approaches zero (SPT) |
| Delay for large tasks | **Maximized**: the largest task waits for every other task (Theorem 5) |
| Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) |
| Overall perceived quality of service | **Net negative** (see below) |
|
||
|
||
The net effect on perceived quality is negative because:
|
||
|
||
1. **Loss aversion is asymmetric.** A client whose 100-hour task is
|
||
deprioritized to last experiences a large, salient negative. A client
|
||
whose 1-hour task moves from position 5 to position 1 experiences a
|
||
small, often unnoticed positive. The absolute dissatisfaction created
|
||
exceeds the absolute satisfaction gained.
|
||
|
||
2. **High-effort tasks correlate with high-value clients.** Large tasks
|
||
are disproportionately likely to come from major clients, complex
|
||
contracts, or critical business needs. Systematically giving these
|
||
clients the worst experience is anti-correlated with revenue and
|
||
retention.
|
||
|
||
3. **Starvation compounds.** In a continuous system (Theorem 3), large
|
||
tasks are not merely delayed — they may be **indefinitely deferred**
|
||
as new small tasks keep arriving. The affected client's satisfaction
|
||
does not merely decrease; it collapses entirely.
|
||
|
||
**Theorem 7 (The Core Result).** For a team processing tasks of non-uniform
size, adopting unweighted mean completion time as a performance metric:

(a) provides **zero productivity gain** (Theorem 6),
(b) **assigns the maximum possible completion time** to the largest task
(Theorem 4), and
(c) **concentrates queuing delay** onto the largest tasks while
eliminating delay for the smallest (Theorem 5).
|
||
|
||
This is not a tradeoff — there is no compensating benefit on the productivity
|
||
side. The metric creates a pure transfer of service quality from high-effort
|
||
clients to low-effort clients, with no net work gained.
|
||
|
||
**A team using unweighted mean completion time as its performance metric
|
||
will, under rational optimization, simultaneously fail to improve
|
||
productivity and systematically degrade the experience of its most
|
||
demanding clients.** $\blacksquare$
|
||
|
||
---
|
||
|
||
## 8. When Unweighted Mean Completion Time Is Valid
|
||
|
||
For completeness: the unweighted metric is appropriate when all tasks are
approximately equal in size ($p_i \approx p_j$ for all $i, j$). In this case,
the work-weighted and unweighted statistics converge, every order is close to
an SPT order, and the set of slowdown ratios is nearly the same under every
schedule.
|
||
|
||
The pathology arises specifically from **variance in task size**. The greater
|
||
the variance, the greater the distortion, and the more damage the metric
|
||
causes when optimized.
|
||
|
||
---
|
||
|
||
## 9. Complete Breakdown Under Priority Classification
|
||
|
||
The preceding sections proved that unweighted mean completion time is biased
|
||
when tasks vary in size. We now show that introducing a **priority system** —
|
||
as virtually all real teams use — causes the metric to become not merely
|
||
biased but **actively adversarial** to the organization's stated goals.
|
||
|
||
### 9.1 Extended Model: Tasks With Priority
|
||
|
||
Let each task $i$ have processing time $p_i$ and a priority class
|
||
$q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and
|
||
4 is the lowest (cosmetic/enhancement). Assign priority weights:
|
||
|
||
$$w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}$$
|
||
|
||
The specific weights are illustrative; the results hold for any strictly
|
||
decreasing weight function. The key property is that priority is assigned
|
||
by **business impact**, not by task size.
|
||
|
||
### 9.2 The Metric Contradicts the Priority System
|
||
|
||
**Theorem 8 (Priority-Size Inversion).** When priority is independent of
|
||
task size, the schedule that minimizes unweighted mean completion time (SPT)
|
||
will, in expectation, complete low-priority tasks before high-priority tasks
|
||
of greater size.
|
||
|
||
**Proof.**
|
||
|
||
SPT orders tasks by $p_i$ ascending, regardless of $q_i$. Consider two tasks:
|
||
|
||
- Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage)
|
||
- Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix)
|
||
|
||
SPT schedules B before A. The unweighted mean completion time for this pair:
|
||
|
||
$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5$$
|
||
|
||
The priority-respecting order (A before B):
|
||
|
||
$$\bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$
|
||
|
||
The metric declares SPT nearly **twice as good** — despite completing a
|
||
cosmetic fix while a server outage burns for an additional 0.5 hours.
|
||
|
||
In general, for $n$ tasks where priority $q_i$ is statistically independent
|
||
of processing time $p_i$ (a reasonable assumption, since priority reflects
|
||
business impact while processing time reflects technical complexity):
|
||
|
||
$$\text{Corr}(p_i, q_i) \approx 0$$
|
||
|
||
SPT's ordering is determined entirely by $p_i$. The expected position of a
|
||
task in the SPT schedule has **zero correlation** with its priority. A
|
||
Critical task is equally likely to be scheduled first or last.
|
||
|
||
More precisely: the expected fraction of Critical tasks in the bottom half
|
||
of the SPT schedule equals the fraction of Critical tasks whose processing
|
||
time exceeds the median. In practice, Critical tasks (outages, security
|
||
incidents, data loss) often require more work, so this fraction exceeds 50%.
|
||
The metric is not merely uncorrelated with priority — it is plausibly
|
||
**anti-correlated**. $\blacksquare$
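The independence claim can be illustrated with a small simulation. This is a sketch under stated assumptions: the workload is hypothetical (sizes uniform on 0.25 to 40 hours, priorities uniform over the four classes, drawn independently), and Pearson correlation between SPT position and priority class is used as a simple proxy for "information about priority".

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spt_position_priority_correlation(n_tasks, seed=0):
    """Correlation between a task's SPT position and its priority class."""
    rng = random.Random(seed)
    # Hypothetical workload: size and priority are drawn independently.
    tasks = [(rng.uniform(0.25, 40.0), rng.choice([1, 2, 3, 4]))
             for _ in range(n_tasks)]
    spt = sorted(tasks, key=lambda task: task[0])   # order by size only
    positions = list(range(n_tasks))
    priorities = [q for _, q in spt]
    return pearson(positions, priorities)


r = spt_position_priority_correlation(10_000)
print(f"correlation between SPT position and priority: {r:+.4f}")
```

With independent draws the coefficient stays close to zero; making Critical tasks systematically larger in the generator would push Critical work toward the end of the SPT order.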
|
||
|
||
### 9.3 Dimensionality Collapse
|
||
|
||
The unweighted mean completion time reduces a three-dimensional task
|
||
$(p_i, q_i, C_i)$ to a one-dimensional signal ($C_i$), then averages
|
||
that signal uniformly. This discards two of the three dimensions:
|
||
|
||
1. **Priority ($q_i$) is completely ignored.** A critical task and a
|
||
cosmetic task contribute identically to the mean.
|
||
2. **Size ($p_i$) is implicitly inverted.** Small tasks are rewarded with
|
||
early completion, large tasks are punished — regardless of their
|
||
importance.
|
||
|
||
**Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual
|
||
information between the schedule's implicit priority ranking (position in
|
||
schedule) and the actual priority assignment $q_i$. For SPT:
|
||
|
||
$$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$
|
||
|
||
**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and $q_i$
|
||
are independent, knowing a task's position in the SPT schedule provides
|
||
zero information about its priority. The schedule is statistically
|
||
independent of the priority system.
|
||
|
||
Contrast this with a priority-first schedule, where $I > 0$ by construction.
|
||
$\blacksquare$
|
||
|
||
**Corollary 9.1.** A team that optimizes unweighted mean completion time
|
||
is operating a scheduling system that carries zero information about its
|
||
own priority classification. The priority field in their ticketing system
|
||
is, with respect to execution order, decorative.
|
||
|
||
### 9.4 Quantifying the Damage: Priority-Weighted Delay Cost
|
||
|
||
Define the **priority-weighted delay cost** of a schedule:
|
||
|
||
$$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$
|
||
|
||
This measures the total business-impact-weighted time spent waiting.
|
||
|
||
**Theorem 10 (SPT and Priority-Weighted Delay Cost).**
The optimal schedule for minimizing priority-weighted delay cost $D(\sigma)$
is WSJF: order by $w(q_i)/p_i$ descending. SPT's ordering, by $1/p_i$
descending, ignores priority entirely and produces strictly higher $D$
whenever it places a task ahead of one with a strictly larger $w(q_i)/p_i$.
|
||
|
||
**Proof.** By the standard exchange argument (as in Theorem 1), take adjacent
tasks $i, j$ with $i$ scheduled immediately before $j$. Swapping them reduces
$D$ by:

$$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$

The swap improves $D$ when $\Delta D > 0$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
but $j$ is scheduled after $i$. Therefore the optimal order is decreasing
$w(q_i)/p_i$, which is the WSJF rule.
|
||
|
||
SPT orders by $p_i$ ascending (equivalently, $1/p_i$ descending), which
|
||
corresponds to WSJF only when $w(q_i) = \text{const}$ — i.e., when all
|
||
tasks have equal priority.
|
||
|
||
**Example.** Two tasks: Critical ($w = 8$, $p_H = 10$) and Low ($w = 1$, $p_L = 1$).
|
||
|
||
WSJF scores: Critical = $8/10 = 0.8$, Low = $1/1 = 1.0$.
|
||
|
||
WSJF places the Low task first (higher $w/p$), same as SPT. Here, SPT and
|
||
WSJF agree because the Low task's tiny size dominates despite its low weight.
|
||
|
||
Now consider: Critical ($w = 8$, $p_H = 3$) and Low ($w = 1$, $p_L = 2$).
|
||
|
||
WSJF scores: Critical = $8/3 = 2.67$, Low = $1/2 = 0.5$.
|
||
|
||
WSJF places Critical first. SPT places Low first (smaller $p$). The costs:
|
||
|
||
- SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$
|
||
- WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$
|
||
|
||
SPT incurs 45% more priority-weighted delay because it ignores the 8x
|
||
priority weight of the Critical task.
|
||
|
||
In general, SPT is guaranteed to match WSJF when every higher-weight task is
strictly shorter than every lower-weight task (or when all weights are equal).
Outside those cases the two orders can diverge, as in the second example, and
SPT then produces suboptimal $D$. In practice, Critical tasks tend to be
larger (outages, security incidents), making the divergence systematic rather
than occasional. $\blacksquare$
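The two-task example and the optimality of the $w/p$ rule can be checked directly. A minimal sketch: tasks are represented as hypothetical `(weight, hours)` pairs, the first pair list is the Critical/Low example above, and the larger instance is made up for the brute-force comparison.

```python
from itertools import permutations


def weighted_delay_cost(order):
    """D = sum of w * C for (weight, processing_time) tasks run in `order`."""
    elapsed = 0.0
    cost = 0.0
    for w, p in order:
        elapsed += p
        cost += w * elapsed
    return cost


def spt(tasks):
    """Shortest processing time first; ignores weights."""
    return sorted(tasks, key=lambda task: task[1])


def wsjf(tasks):
    """Weighted shortest job first: w / p descending."""
    return sorted(tasks, key=lambda task: task[0] / task[1], reverse=True)


# Critical (w=8, p=3) and Low (w=1, p=2) from the example above.
pair = [(8, 3), (1, 2)]
print(weighted_delay_cost(spt(pair)), weighted_delay_cost(wsjf(pair)))

# Hypothetical larger instance: compare WSJF against exhaustive search.
tasks = [(8, 3), (1, 2), (4, 5), (2, 1), (8, 6)]
best = min(weighted_delay_cost(order) for order in permutations(tasks))
print(best, weighted_delay_cost(wsjf(tasks)), weighted_delay_cost(spt(tasks)))
```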
|
||
|
||
---
|
||
|
||
## 10. A Proposed Solution: Priority-Weighted Completion Score
|
||
|
||
### 10.1 The Metric
|
||
|
||
Replace unweighted mean completion time with the **Priority-Weighted
|
||
Completion Score (PWCS)**:
|
||
|
||
$$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$
|
||
|
||
This is the priority-weighted mean slowdown ratio. It measures:
|
||
|
||
- **How long each task waited relative to its size** (the slowdown $C_i / p_i$),
|
||
weighted by
|
||
- **How much that task mattered** (the priority weight $w(q_i)$).
|
||
|
||
Lower is better. A PWCS of 1.0 means every task finished with zero queuing
delay (attainable on a single executor only for a single task). A PWCS of 3.0
means the priority-weighted average task took 3x its processing time to
complete.
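As a concrete reading of the definition, the sketch below computes PWCS for a schedule given as `(weight, hours)` pairs in execution order. The representation and the function name `pwcs` are assumptions made for illustration; the two-task instance reuses the Critical/Low pair from Section 9.4.

```python
def pwcs(order):
    """Priority-weighted mean slowdown for (weight, processing_time) tasks.

    `order` lists the tasks in execution order on a single executor.
    """
    elapsed = 0.0
    weighted_slowdown = 0.0
    for w, p in order:
        elapsed += p                           # completion time C_i
        weighted_slowdown += w * (elapsed / p)  # w(q_i) * C_i / p_i
    return weighted_slowdown / sum(w for w, _ in order)


critical_first = [(8, 3), (1, 2)]   # Critical (w=8, 3h), then Low (w=1, 2h)
low_first = [(1, 2), (8, 3)]

print(round(pwcs(critical_first), 4))
print(round(pwcs(low_first), 4))
```

Running the Critical task first gives the lower (better) score on this pair.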
|
||
|
||
### 10.2 Properties of PWCS
|
||
|
||
**Property 1: Priority-respecting.** PWCS penalizes delays to high-priority
tasks more heavily than low-priority tasks. For tasks of equal size, a 2-hour
delay to a Critical task costs 8x as much as the same delay to a Low task.
|
||
|
||
**Property 2: Size-fair.** By using the slowdown ratio $C_i / p_i$ rather
than raw completion time $C_i$, the metric does not inherently penalize
large tasks for being large. A 40-hour task that completes at hour 80
contributes the same slowdown (2.0) as a 1-hour task that completes at hour 2.
|
||
|
||
**Property 3: Not gameable by SPT alone.** Because the metric weights by
priority, ordering tasks purely by processing time is not optimal: a given
slowdown on a high-priority task costs more than the same slowdown on a
low-priority one. Normalizing by task size still favors short tasks, however,
so the score rewards **respecting the priority system** only to the extent
that the weights outweigh the size effect (Section 10.3 makes this precise).
|
||
|
||
**Property 4: Reduces to unweighted mean when tasks are uniform.** If all
|
||
tasks have equal priority and equal size, PWCS equals the unweighted mean
|
||
completion time divided by the common task size. It is a strict
|
||
generalization.
|
||
|
||
### 10.3 Optimal Policy for PWCS
|
||
|
||
**Theorem 11.** The schedule minimizing PWCS processes tasks in order of
decreasing $w(q_i) / p_i^2$. The schedule minimizing the closely related
**priority-weighted completion time** (PWCT, defined below) processes tasks in
order of decreasing $w(q_i) / p_i$, which within a priority class means
shortest processing time first.

**Proof (exchange argument, as in Theorem 1).**

Consider adjacent tasks $i, j$ with $i$ immediately before $j$, starting at
time $t$. Swapping them leaves every other completion time unchanged, moves
$C_i$ from $t + p_i$ to $t + p_i + p_j$, and moves $C_j$ from $t + p_i + p_j$
to $t + p_j$. The weighted slowdown sum $\sum_k w(q_k) \, C_k / p_k$ therefore
changes by:

$$w(q_i) \cdot \frac{p_j}{p_i} - w(q_j) \cdot \frac{p_i}{p_j}$$

The swap improves PWCS when this quantity is negative, i.e., when:

$$\frac{w(q_j)}{p_j^2} > \frac{w(q_i)}{p_i^2}$$

so the PWCS-optimal order is decreasing $w(q_i)/p_i^2$. The squared
denominator rewards short tasks even more strongly than SPT-style ratios do,
so we instead consider the more practical **priority-weighted completion time**:

$$\text{PWCT}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot C_i}{\sum_{i=1}^{n} w(q_i)}$$

For PWCT, the exchange argument gives: the swap improves the score when
$w(q_j) \cdot p_i > w(q_i) \cdot p_j$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
but $j$ is scheduled after $i$. The optimal order is therefore decreasing
$w(q_i)/p_i$, which is the **Weighted Shortest Job First (WSJF)** rule:

$$\text{Schedule by: } \frac{w(q_i)}{p_i} \text{ descending}$$

This means: within a priority class, do short tasks first; across priority
classes, a Critical 8-hour task ($w/p = 8/8 = 1.0$) ties with a Low 1-hour
task ($w/p = 1/1 = 1.0$), but a Critical 4-hour task ($w/p = 8/4 = 2.0$)
beats both. $\blacksquare$
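The difference between the two exchange conditions (one comparing $w/p^2$, the other $w/p$) shows up on the three tasks just mentioned. The sketch below brute-forces both metrics; tasks are hypothetical `(weight, hours)` pairs and the helper names are introduced here.

```python
from itertools import permutations


def weighted_sum(order, weight_of):
    """Sum of weight_of(w, p) * C over (w, p) tasks run in the given order."""
    elapsed = 0.0
    total = 0.0
    for w, p in order:
        elapsed += p
        total += weight_of(w, p) * elapsed
    return total


def pwcs(order):
    """Priority-weighted mean slowdown: each task contributes (w / p) * C."""
    return weighted_sum(order, lambda w, p: w / p) / sum(w for w, _ in order)


def pwct(order):
    """Priority-weighted mean completion time: each task contributes w * C."""
    return weighted_sum(order, lambda w, p: w) / sum(w for w, _ in order)


# Critical 8-hour, Low 1-hour, Critical 4-hour (weights 8, 1, 8).
tasks = [(8, 8), (1, 1), (8, 4)]

best_pwcs = min(permutations(tasks), key=pwcs)
best_pwct = min(permutations(tasks), key=pwct)
by_w_over_p2 = tuple(sorted(tasks, key=lambda t: t[0] / t[1] ** 2, reverse=True))
by_w_over_p = tuple(sorted(tasks, key=lambda t: t[0] / t[1], reverse=True))

print("PWCS optimum:", best_pwcs)
print("PWCT optimum:", best_pwct)
```

On this instance the PWCS optimum runs the Low 1-hour task first, while the PWCT optimum starts with the Critical 4-hour task, which is the practical reason for preferring PWCT here.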
|
||
|
||
### 10.4 Applied Example: IT Service Desk
|
||
|
||
Consider an IT team with the following ticket queue on a Monday morning:
|
||
|
||
| Ticket | Priority | Type | Est. Hours |
|--------|----------|------|-----------|
| T1 | P1 (Critical) | Email server down | 6 |
| T2 | P2 (High) | VPN failing for remote team | 4 |
| T3 | P3 (Medium) | New employee laptop setup | 2 |
| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 |
| T5 | P3 (Medium) | Install software license | 1 |
| T6 | P1 (Critical) | Database backup failing | 3 |
| T7 | P2 (High) | Printer fleet offline | 2 |
| T8 | P4 (Low) | Archive old shared drive folder | 0.25 |
|
||
|
||
**SPT order (optimizing unweighted mean):** T8, T4, T5, T3, T7, T6, T2, T1
|
||
|
||
| Position | Ticket | Priority | Hours | Completion | Slowdown |
|----------|--------|----------|-------|------------|----------|
| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.1875 |
| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |
|
||
|
||
- **Unweighted mean completion:** $(0.25 + 0.75 + 1.75 + 3.75 + 5.75 + 8.75 + 12.75 + 18.75) / 8 = 6.5625$ hours
|
||
- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 306/30 = 10.2$ hours
|
||
- Email server is down for **18.75 hours**. Database backups fail for **8.75 hours**.
|
||
|
||
**WSJF order (optimizing PWCT by $w(q)/p$ descending):**
|
||
|
||
| Ticket | Priority | Hours | $w/p$ |
|--------|----------|-------|-------|
| T8 | P4 Low | 0.25 | 1/0.25 = 4.0 |
| T6 | P1 Crit | 3 | 8/3 = 2.667 |
| T5 | P3 Med | 1 | 2/1 = 2.0 |
| T4 | P4 Low | 0.5 | 1/0.5 = 2.0 |
| T7 | P2 High | 2 | 4/2 = 2.0 |
| T1 | P1 Crit | 6 | 8/6 = 1.333 |
| T2 | P2 High | 4 | 4/4 = 1.0 |
| T3 | P3 Med | 2 | 2/2 = 1.0 |

T8 has $w/p = 4.0$, the highest, so pure WSJF places a Low-priority task
first, which feels wrong. This reveals an important practical point:
**pure WSJF can still be gamed by tiny tasks** because their small $p$
inflates the ratio. In practice, this is mitigated by enforcing strict
priority class ordering and only applying WSJF *within* priority classes.
|
||
|
||
**Practical WSJF (priority-class-first, then $w/p$ within class):**
|
||
|
||
| Position | Ticket | Priority | Hours | Completion |
|----------|--------|----------|-------|------------|
| 1 | T6 (backups) | P1 Crit | 3 | 3 |
| 2 | T1 (email) | P1 Crit | 6 | 9 |
| 3 | T7 (printers) | P2 High | 2 | 11 |
| 4 | T2 (VPN) | P2 High | 4 | 15 |
| 5 | T5 (software) | P3 Med | 1 | 16 |
| 6 | T3 (laptop) | P3 Med | 2 | 18 |
| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |
|
||
|
||
- **Unweighted mean completion:** $(3 + 9 + 11 + 15 + 16 + 18 + 18.25 + 18.75) / 8 = 13.625$ hours
|
||
- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 305/30 = 10.167$ hours
|
||
- Email server restored in **9 hours**. Backups fixed in **3 hours**.
|
||
|
||
### Comparison
|
||
|
||
| Metric | SPT | Practical WSJF | Winner |
|--------|-----|----------------|--------|
| Unweighted mean completion | **6.5625 hrs** | 13.625 hrs | SPT |
| Priority-weighted completion (PWCT) | 10.2 hrs | **10.167 hrs** | WSJF |
| Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
| Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
| Time to fix printers | **5.75 hrs** | 11 hrs | SPT |
| Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |
|
||
|
||
The PWCT values are nearly identical (10.2 vs 10.167) because PWCT — as a
|
||
*weighted average of completion times* — is dampened by the fact that total
|
||
work is constant. **PWCT is not the right metric for this comparison.** The
|
||
real difference is visible in the individual completion times of critical
|
||
tasks: the email server is down for 18.75 hours under SPT versus 9 hours
|
||
under WSJF. The database backups fail for 8.75 hours versus 3 hours.
|
||
|
||
The better comparison metric is the **priority-weighted delay cost**
|
||
$D = \sum w(q_i) \cdot C_i$ (not normalized):
|
||
|
||
- SPT: $D = 306$ priority-weighted hours
|
||
- Practical WSJF: $D = 305$ priority-weighted hours
|
||
|
||
Again, the aggregate is similar. The damage from SPT is not in the
|
||
aggregate — it is in the *distribution*: critical systems burn while
|
||
cosmetic tasks are polished. A metric that cannot distinguish between these
|
||
two schedules — despite one leaving the email server down for twice as long
|
||
— is not measuring what matters.
|
||
|
||
The unweighted metric, however, confidently reports SPT as **more than twice
|
||
as efficient** (6.56 vs 13.63), rewarding the team that updated desktop
|
||
wallpaper while the email server was on fire.
|
||
|
||
### 10.5 Recommended Metric Suite

The IT example reveals that even priority-weighted aggregate metrics (PWCT)
can fail to distinguish good from bad schedules, because aggregation hides
distributional damage. No single metric suffices. A complete measurement
system for a priority-based team should track:

| Metric | What it measures | Formula |
|--------|-----------------|---------|
| **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ |
| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ filtered to $q = 1$ |
| **Throughput** | Raw work capacity | Work-hours completed / calendar time |
| **Aging violations** | Starvation prevention | Count of tasks exceeding SLA by priority |
| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ filtered to $q \le 2$ |

The key insight from our analysis: **per-priority-class metrics** (rows 1-2,
5) expose scheduling failures that aggregate metrics hide. If P1 mean time
to resolution is 14 hours while P4 mean is 0.5 hours, the team is
optimizing the wrong metric, regardless of what the aggregate says.

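
As a rough illustration of rows 1, 2, 4, and 5 of the table, the sketch below groups completion times by priority class. The completion times are those of the SPT schedule from the IT example. The SLA limits and the function names are hypothetical, chosen only for the demonstration. With this data the P1 mean comes out at 13.75 hours against 0.5 hours for P4.

```python
from collections import defaultdict

# (priority class q, completion time in hours) under SPT in the IT example
SPT_COMPLETIONS = [(4, 0.25), (4, 0.75), (3, 1.75), (3, 3.75),
                   (2, 5.75), (1, 8.75), (2, 12.75), (1, 18.75)]

# Hypothetical SLA limits in hours per priority class (not from the source)
SLA_HOURS = {1: 4.0, 2: 8.0, 3: 24.0, 4: 72.0}


def per_class_metrics(completions):
    """Return {q: (mean completion, max completion)} for each priority class."""
    by_class = defaultdict(list)
    for q, c in completions:
        by_class[q].append(c)
    return {q: (sum(cs) / len(cs), max(cs)) for q, cs in sorted(by_class.items())}


def aging_violations(completions, sla_hours):
    """Count tasks per class whose completion time exceeds the class SLA."""
    counts = defaultdict(int)
    for q, c in completions:
        if c > sla_hours[q]:
            counts[q] += 1
    return dict(counts)


metrics = per_class_metrics(SPT_COMPLETIONS)
violations = aging_violations(SPT_COMPLETIONS, SLA_HOURS)
for q, (mean_c, max_c) in metrics.items():
    print(f"P{q}: mean={mean_c:.2f} h  max={max_c:.2f} h  "
          f"SLA breaches={violations.get(q, 0)}")
```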

---

## 11. Devil's Advocate: The Case for Unweighted Mean Completion Time

Intellectual honesty requires acknowledging where the preceding argument
has limits. The following are genuine counterarguments, not strawmen.

### 11.1 Simplicity Has Real Value

**Argument.** The unweighted mean is trivially computable: sum the completion
times, divide by the count. It requires no priority weights, no task-size
estimates, no calibration. Every alternative proposed in Section 10 requires
estimating $p_i$ (task size) before the task is complete, and these
estimates are notoriously unreliable.

**Assessment: This is true.** PWCS and PWCT require inputs (priority
weights, size estimates) that introduce their own sources of error. If size
estimates are systematically wrong (and in software engineering they often
are, with large tasks underestimated and small tasks overestimated), then
the weighted metric inherits that noise.

However, the unweighted metric does not avoid this problem; it *hides* it
by implicitly setting all weights to 1 and all sizes to 1. That is not
"making no assumptions"; it is making the specific assumption that all tasks
are equally important and equally sized, which is demonstrably false in any
real system. **A known-imprecise estimate of task size is still more
informative than the implicit assumption that all sizes are equal.**

### 11.2 Minimizing the Number of People Waiting

**Argument.** If each task represents one client, then unweighted mean
completion time minimizes the total person-hours spent waiting. SPT is
optimal for this because completing short tasks first "frees" the most
people from the queue earliest.

**Assessment: This is mathematically correct.** The sum $\sum C_i$ counts
total person-time in the system. SPT genuinely minimizes this quantity.
If you run a DMV and every person's time is equally valuable regardless of
why they're there, SPT is the right policy.

The argument breaks down when:

1. **Tasks are not 1:1 with clients.** In IT, one client may submit tasks
   of varying size. Across a relationship, SPT systematically fast-tracks
   their easy requests and starves their hard ones, which is not perceived
   as good service.

2. **Waiting cost is not uniform.** A person waiting for a server outage
   to be fixed is not equivalent to a person waiting for a wallpaper change.
   The cost of waiting is proportional to the *impact* of the unresolved
   task, which is what priority encodes.

3. **The metric is applied to teams, not DMVs.** When a team's performance
   is measured by unweighted mean, the rational response is to cherry-pick,
   which is individually rational but collectively destructive.

### 11.3 SPT as a Triage Heuristic

**Argument.** In high-volume systems where task sizes cluster tightly
(e.g., a call center where most calls are 3-7 minutes), SPT approximates
FIFO and the unweighted mean approximates the weighted mean. The pathologies
described in this paper only manifest when task sizes span orders of
magnitude.

**Assessment: This is correct.** As shown in Section 8, when task sizes are
approximately uniform, all scheduling policies converge and all metrics
agree. The coefficient of variation of task size, $CV = \sigma_p / \bar{p}$,
determines the severity of the distortion:

| $CV$ | Task size distribution | Metric distortion |
|------|----------------------|-------------------|
| < 0.3 | Tight (call center) | Negligible |
| 0.3 - 1.0 | Moderate (mixed IT) | Moderate |
| > 1.0 | Wide (typical IT queue) | Severe |

For a typical IT service desk, task sizes range from 15 minutes (password
reset) to 40+ hours (infrastructure migration), giving $CV > 2$. The
distortion is not a theoretical edge case; it is the default condition.

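
To see where a given queue falls in the table, the coefficient of variation can be computed directly from the task sizes. The sketch below assumes the population standard deviation for $\sigma_p$ (the text does not say which estimator is meant) and uses two illustrative queues: the eight tickets of the IT example, and a hypothetical queue of nine 15-minute password resets plus one 40-hour migration.

```python
import statistics


def coefficient_of_variation(sizes):
    """CV = sigma_p / p_bar, using the population standard deviation (an assumption)."""
    return statistics.pstdev(sizes) / statistics.fmean(sizes)


def distortion_band(cv):
    """Map a CV value onto the three bands of the table above."""
    if cv < 0.3:
        return "Negligible"
    if cv <= 1.0:
        return "Moderate"
    return "Severe"


# Hours for T1..T8 in the IT example
it_example = [6.0, 4.0, 2.0, 0.5, 1.0, 3.0, 2.0, 0.25]
# Hypothetical queue: nine password resets and one infrastructure migration
wide_queue = [0.25] * 9 + [40.0]

for name, sizes in [("IT example", it_example), ("Wide queue", wide_queue)]:
    cv = coefficient_of_variation(sizes)
    print(f"{name}: CV={cv:.2f} -> {distortion_band(cv)}")
```

Under these assumptions the eight-ticket example lands in the moderate band (CV of roughly 0.77), while the hypothetical wide queue exceeds 2.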
### 11.4 Gaming Requires Malice

**Argument.** The theorems show that the metric *can* be gamed, not that it
*will* be gamed. A well-intentioned team might use the unweighted mean as
a rough health indicator without actively optimizing for it, avoiding the
pathologies described.

**Assessment: This is the strongest counterargument.** If the metric is
used purely for monitoring ("are we completing things at a reasonable
pace?") and not for performance evaluation, rewards, or scheduling
decisions, then the gaming incentive is absent and the metric is relatively
harmless.

However, this argument requires the metric to remain purely informational
and never influence behavior. In practice, any metric that is reported to
management, tied to OKRs, or used in sprint retrospectives will influence
behavior. This is Goodhart's Law, and it applies to well-intentioned teams
as reliably as to cynical ones. The team need not be gaming the metric
consciously; it is sufficient that completing three easy tickets "feels
productive" while staring at one hard ticket does not. The metric validates
the feeling, and the drift happens organically.

### 11.5 Summary: When the Unweighted Mean Is Defensible

The unweighted mean completion time is a defensible metric **only when all
four conditions hold simultaneously**:

1. Task sizes are approximately uniform ($CV < 0.3$)
2. There is no priority differentiation (all tasks are equally important)
3. Each task represents exactly one client
4. The metric is not used to evaluate, reward, or direct team behavior

In a system satisfying all four conditions (such as a simple FIFO queue
with uniform jobs and no priority system), the unweighted mean is adequate,
and its simplicity is a genuine advantage.

In any system that violates even one of these conditions, which includes
virtually every IT service desk, development team, and support organization,
the metric produces the distortions proven in Sections 2-9.

The honest conclusion is not that the unweighted mean is always wrong. It is
that the conditions under which it is right are narrow, easily identified,
and rarely met in the systems where it is most commonly used.

---

## 12. Manager Internalization: The Actionable Solution

The preceding sections present two extremes: reject the metric entirely
(Sections 1-10) or surrender to it (Appendix A). In practice, most
managers cannot unilaterally change the metric: it is set at the
organizational level, reported across teams, and embedded in dashboards
that other stakeholders consume. The best solution is company-wide metric
reform. The *actionable* solution is what a single informed manager can
do right now.

### 12.1 The Strategy

A manager who understands the proof can **internalize the metric's
limitations without propagating them to the team**. The approach:

1. **Schedule primarily by priority.** The team works critical tasks
   first, exactly as professional judgment and the priority system
   dictate. This is the default; the team need not know why.

2. **Tactically interleave small tasks to maintain metric parity.** When
   the queue contains a small, low-priority task that can be completed
   quickly without materially delaying any high-priority work, do it.
   Not because the metric demands it, but because the small task *also
   needs to get done*, and doing it now costs almost nothing.

3. **Never reveal the metric as the motivation.** The team is told "knock
   out this quick one while we're waiting on the vendor callback for the
   P1", not "we need to bring our average down." The team's
   professional judgment and intrinsic motivation (Appendix B) remain
   intact. The manager absorbs the metric-management burden.

This is a **constrained optimization**: minimize priority-weighted delay
(do the right work in the right order) subject to the constraint that
the reported unweighted mean stays within an acceptable band.

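
For a queue as small as the eight-ticket IT example, the constrained optimization described above can be explored by brute force. The sketch below is only an illustration: it assumes the 8/4/2/1 priority weights from that example, a hypothetical acceptable band of 8.0 hours for the unweighted mean, and exhaustive search over all orderings, which is feasible for eight tasks but not for realistic queue lengths.

```python
from itertools import permutations

# (priority weight w, hours p) for the eight tickets of the IT example
TASKS = {
    "T1": (8, 6.0), "T2": (4, 4.0), "T3": (2, 2.0), "T4": (1, 0.5),
    "T5": (2, 1.0), "T6": (8, 3.0), "T7": (4, 2.0), "T8": (1, 0.25),
}


def evaluate(order):
    """Return (unweighted mean completion, priority-weighted delay) for an order."""
    clock = sum_c = sum_wc = 0.0
    for ticket in order:
        w, p = TASKS[ticket]
        clock += p
        sum_c += clock
        sum_wc += w * clock
    return sum_c / len(order), sum_wc


def best_schedule(target_mean):
    """Exhaustively find the lowest weighted delay whose mean fits the band."""
    best = None
    for order in permutations(TASKS):
        mean, delay = evaluate(order)
        if mean <= target_mean and (best is None or delay < best[0]):
            best = (delay, mean, order)
    return best  # None if no ordering satisfies the band


TARGET_MEAN = 8.0  # hypothetical parity threshold in hours
delay, mean, order = best_schedule(TARGET_MEAN)
print(order, f"mean={mean:.4f}", f"weighted delay={delay:.2f}")
```

Any band at or above the SPT mean (6.5625 hours here) is feasible, because SPT itself satisfies it. A tighter band has no solution, in which case `best_schedule` returns `None`.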
### 12.2 Formalization

Let $\bar{C}_{\text{target}}$ be the unweighted mean completion time that
other teams report, i.e. the parity threshold. The manager's problem is:

$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$

This is a single-machine scheduling problem with a budget constraint on
the unweighted mean. The solution is a modified priority schedule:

- Start from the priority-first ordering (all P1 first, then P2, etc.).
- Identify small low-priority tasks whose insertion ahead of lower-ranked
  same-priority tasks reduces $\bar{C}$ without displacing any
  higher-priority task.
- Insert them only when the marginal improvement to $\bar{C}$ exceeds
  the marginal cost to priority-weighted delay.

**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** For a
priority-first schedule with $n$ tasks, the gap between its unweighted
mean $\bar{C}_{\text{priority}}$ and the SPT-optimal unweighted mean
$\bar{C}_{\text{SPT}}$ is bounded by:

$$\bar{C}_{\text{priority}} - \bar{C}_{\text{SPT}} \le \frac{n-1}{2n}(\bar{p}_{\max\text{-class}} - \bar{p}_{\min\text{-class}}) \cdot n_{\text{classes}}$$

where $\bar{p}_{\max\text{-class}}$ and $\bar{p}_{\min\text{-class}}$ are
the largest and smallest per-class mean processing times, and
$n_{\text{classes}}$ is the number of priority classes.

**Proof sketch.** The gap arises entirely from the cross-class ordering:
within each priority class, the manager can use SPT (shortest first) at
no priority cost, since all tasks in the class have equal priority. The
only deviation from global SPT is the *between-class* ordering, where
large high-priority tasks are placed before small low-priority tasks.
Each such inversion costs at most $p_{\text{large}} - p_{\text{small}}$
in the unweighted sum, and there are at most
$n_{\text{classes}} \cdot (n / n_{\text{classes}})$ such inversions.
$\blacksquare$

In practice, this means: **a manager who uses SPT within each priority
class and priority ordering between classes will produce a metric whose
gap to the SPT-optimal value is bounded**, while respecting the priority
system entirely. By Theorem 12 the bound shrinks as the per-class mean
task sizes converge. It is widest when the highest-priority tasks are also
the largest, as in the IT example, where this schedule reports 13.625 hours
against 6.5625 hours for SPT.

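
As a sanity check rather than a proof, the bound can be evaluated on the IT example. The sketch below assumes that $\bar{p}_{\max\text{-class}}$ and $\bar{p}_{\min\text{-class}}$ denote the largest and smallest per-class mean processing times, and it tests only this single instance.

```python
# Processing times (hours) per priority class in the IT example
CLASSES = {1: [3.0, 6.0], 2: [2.0, 4.0], 3: [1.0, 2.0], 4: [0.25, 0.5]}


def mean_completion(sizes):
    """Unweighted mean completion time when tasks run in the given order."""
    clock = total = 0.0
    for p in sizes:
        clock += p
        total += clock
    return total / len(sizes)


# Priority order between classes, SPT within each class
priority_first = [p for q in sorted(CLASSES) for p in sorted(CLASSES[q])]
spt = sorted(priority_first)

gap = mean_completion(priority_first) - mean_completion(spt)

# Right-hand side of Theorem 12 under the stated reading of the class means
n = len(spt)
class_means = [sum(v) / len(v) for v in CLASSES.values()]
bound = (n - 1) / (2 * n) * (max(class_means) - min(class_means)) * len(CLASSES)

print(f"gap={gap:.4f} h  bound={bound:.4f} h  holds={gap <= bound}")
```

For this instance the gap is 7.0625 hours against a bound of 7.21875 hours, so the inequality holds, though only narrowly.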
### 12.3 Why This Works: The Manager as Information Barrier

The strategy works because the manager serves as an **information
barrier** between the metric and the team:

| Layer | Sees the metric | Sees the priorities | Sees the proof |
|-------|----------------|--------------------|-----------------|
| Organization | Yes | Nominally | No |
| Manager | Yes | Yes | **Yes** |
| Team | No (shielded) | Yes | Irrelevant |
| Client | Yes (dashboard) | Via SLA | No |

The manager is the only actor who holds all three pieces of information.
By internalizing the proof, the manager can:

- Present a metric that satisfies organizational reporting (the number
  is reasonable)
- Direct the team by priority (professional judgment preserved)
- Shield the team from the metric's perverse incentives (Appendix B
  costs avoided)

This is *not* manipulation. The manager is not fabricating numbers or
misreporting. They are doing the right work in the right order, and
the metric happens to be acceptable because within-class SPT is free
and between-class inversions are bounded (Theorem 12).

### 12.4 The Competitive Breakdown

This strategy fails when the metric becomes **competitive between teams**.

Model $m$ teams, each managed independently. Team $j$ reports
$\bar{C}_j(\sigma_j)$. Two cases arise, depending on how $\bar{C}$ is
used:

**Case 1: Cooperative.** Teams are measured for parity, not ranking.
The threshold is "stay within a reasonable band." Each manager
independently uses the internalization strategy. All teams do
approximately the right work. The metric is decorative but harmless.
This is a **coordination game** with a stable cooperative equilibrium.

**Case 2: Competitive.** Teams are ranked by $\bar{C}$. Promotions,
resources, or recognition go to the lowest average. This is a
**prisoner's dilemma**:

| | Team B: Priority-first | Team B: SPT |
|---|---|---|
| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) |
| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) |

The dominant strategy for each team is SPT. The Nash equilibrium is
(SPT, SPT): all teams optimize the metric, all teams do the wrong
work, and the organization reports excellent numbers while critical
tasks rot across every queue.

The internalization strategy is a **cooperative equilibrium that is not
stable under competition**. A single team that defects to pure SPT will
outperform all others on the metric, forcing other managers to choose
between doing the right work (and looking bad) or following suit (and
abandoning their professional judgment).

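
The dominance argument can be checked mechanically once the qualitative payoffs are given numbers. The sketch below is a toy model under loud assumptions: two teams facing the IT example queue, a reward that depends only on rank (1 for the strictly lower mean, 0.5 for a tie, 0 otherwise), and no term for the cost of doing the wrong work, which is exactly what a ranking on $\bar{C}$ leaves out.

```python
from itertools import product

# Reported unweighted means (hours) from the IT example
REPORTED_MEAN = {"priority-first": 13.625, "SPT": 6.5625}


def ranking_payoff(mine, theirs):
    """Assumed reward: 1 for the strictly lower mean, 0.5 for a tie, 0 otherwise."""
    if REPORTED_MEAN[mine] < REPORTED_MEAN[theirs]:
        return 1.0
    if REPORTED_MEAN[mine] > REPORTED_MEAN[theirs]:
        return 0.0
    return 0.5


def is_strictly_dominant(strategy):
    """True if `strategy` beats every alternative against every opponent choice."""
    return all(
        ranking_payoff(strategy, opp) > ranking_payoff(alt, opp)
        for alt in REPORTED_MEAN if alt != strategy
        for opp in REPORTED_MEAN
    )


def nash_equilibria():
    """Pure-strategy profiles where neither team gains by switching alone."""
    found = []
    for a, b in product(REPORTED_MEAN, repeat=2):
        a_ok = all(ranking_payoff(a, b) >= ranking_payoff(x, b) for x in REPORTED_MEAN)
        b_ok = all(ranking_payoff(b, a) >= ranking_payoff(x, a) for x in REPORTED_MEAN)
        if a_ok and b_ok:
            found.append((a, b))
    return found


print("SPT strictly dominant:", is_strictly_dominant("SPT"))
print("Nash equilibria:", nash_equilibria())
```

In this toy model SPT is strictly dominant and (SPT, SPT) is the only pure-strategy equilibrium. Because the assumed reward sees only the ranking, mutual SPT and mutual priority-first pay the same to each team; the loss from mutual SPT shows up in the work itself, not in the payoff table.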
### 12.5 The Scope of the Solution

| Condition | Strategy viability |
|-----------|-------------------|
| Metric used for health-check / parity | **Viable**: cooperative equilibrium holds |
| Metric visible but not ranked | **Viable**: no competitive pressure to defect |
| Metric ranked across teams | **Fragile**: viable only if all managers cooperate |
| Metric tied to compensation / resources | **Not viable**: prisoner's dilemma dominates |
| Metric reform possible at org level | **Unnecessary**: fix the metric instead |

The internalization strategy is actionable *right now*, by a single
manager, without organizational permission or metric reform. It
preserves team psychology (Appendix B), respects priorities (Sections
9-10), and produces an acceptable reported metric (Theorem 12).

Its limitation is structural: it requires the metric to be a
**reporting formality**, not a **competitive instrument**. The moment
the metric drives resource allocation or team ranking, the cooperative
equilibrium collapses and only organizational reform (replacing the
metric with a priority-weighted alternative, Section 10) can prevent
the race to the bottom.

**The best solution is company-wide. The actionable solution is a
manager who understands this proof, shields their team from the metric,
schedules by priority, and uses SPT only within priority classes to
keep the number reasonable.**

---

## 13. Conclusion

The unweighted average completion time is a **biased statistic** that:

1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
   completion time which is schedule-invariant (Theorem 2).
2. **Incentivizes starvation** of large tasks (Theorem 3).
3. **Contradicts Little's Law** unless tasks are uniformly sized.
4. **Degrades client satisfaction** with zero compensating productivity
   gain (Theorem 7).
5. **Actively contradicts priority systems** by carrying zero information
   about business-impact classification (Theorem 9).
6. **Ignores priority entirely** in its scheduling recommendation,
   producing suboptimal priority-weighted delay whenever priority and
   size are not perfectly inversely correlated (Theorem 10).

A metric that can be improved by reordering work, without doing any
additional work, is measuring the scheduling policy, not the system's
capacity or effectiveness. When combined with a priority system, the metric
does not merely fail to reflect priorities; it recommends the schedule
that inflicts the most damage on the highest-priority work.

The unweighted mean is defensible only under narrow, identifiable conditions
(Section 11.5): uniform task sizes, no priority system, one-to-one
client-task mapping, and no behavioral influence from the metric. These
conditions are rarely met in practice.

**Unweighted average completion time is not a fair or accurate measurement
of task execution performance. Its adoption as a team metric will
rationally produce starvation of complex work, violation of stated
priorities, inequitable client outcomes, and the illusion of productivity
where none exists.**

---

## Appendix A. When the Metric Is the Product

The preceding twelve sections rest on an implicit assumption: that client
satisfaction is a function of *experienced service quality*, meaning how
long *their* task took, relative to its size and urgency. If this assumption
holds, the proof is valid and the unweighted mean is a destructive metric.

But there exists a scenario in which the assumption fails and the entire
argument collapses.

### A.1 The Self-Referential Metric

Suppose the service provider reports the unweighted mean completion time
directly to the client (on a dashboard, in an SLA report, on a marketing
page) and the client's satisfaction is derived primarily from *that number*
rather than from their individual experience.

Define client satisfaction as:

$$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$

That is: the client sees "Average resolution time: 6.56 hours" and is
satisfied, without checking whether *their* ticket, the critical email
outage, took 6.56 hours or 18.75 hours.

Under this model, SPT genuinely maximizes client satisfaction (Theorem 1).
The service provider's throughput is unchanged (Theorem 6). The business
outcome improves: same work done, happier client.

**Every theorem in this paper remains mathematically correct. But the
conclusion inverts.** The metric is no longer a proxy for service quality
that can be gamed; it *is* the service quality, because the client has
agreed to evaluate quality by the aggregate number rather than by their
individual experience.

### A.2 The Economics

This creates a coherent, stable business equilibrium:

| Actor | Behavior | Outcome |
|-------|----------|---------|
| Provider | Optimizes unweighted mean (SPT) | Metric improves, no extra work |
| Client | Reads dashboard, sees low average | Reports satisfaction |
| Management | Sees satisfied client + good metric | Rewards team |

Throughput is unchanged (Theorem 6), so the same revenue-generating work
is completed. The only thing that changed is the *order*, and therefore
the reported number. Real resources were rearranged, no additional value
was created, but the business metrics all moved in the right direction.

This is *profitable*. The provider extracts satisfaction from the client
at zero marginal cost, by optimizing a number that the client has accepted
as a proxy for quality. The client is no worse off *in their own estimation*,
because they evaluate the aggregate, not their individual experience.

### A.3 The Fragility

This equilibrium is stable only as long as the client never inspects
their own experience. It breaks the moment any of the following occur:

**1. The client checks their own ticket.**

A CTO whose email server was down for 18.75 hours will not be reassured
by a dashboard reading "Average resolution: 6.56 hours." The aggregate
metric and the individual experience diverge maximally for high-priority
tasks (Theorem 4). The clients most likely to inspect their own experience
are exactly the ones receiving the worst service.

**2. A competitor offers per-ticket SLAs.**

If an alternative provider guarantees "P1 incidents resolved within 4 hours"
instead of "average resolution under 7 hours," the aggregate-metric provider
cannot compete for clients with critical needs, which are typically the
highest-value clients.

**3. The provider's team internalizes the metric.**

If the team believes the metric reflects real performance (rather than
consciously gaming it), they lose the ability to recognize when critical
work is being neglected. The metric becomes an epistemic hazard: it
tells the team they are performing well, preventing them from seeing that
they are not.

### A.4 The General Pattern

This is not unique to task scheduling. The structure is:

1. A measurable proxy is established for an unmeasured quality.
2. The proxy is reported as if it were the quality itself.
3. The proxy is optimized, improving the reported number.
4. The underlying quality diverges from the proxy, but no one measures
   the underlying quality because the proxy exists.
5. The system is stable until an exogenous shock forces inspection of
   the underlying quality.

This pattern appears across domains:

| Domain | Proxy metric | Underlying quality | Divergence |
|--------|-------------|-------------------|------------|
| IT support | Avg. resolution time | Critical system uptime | Server down for 19 hrs, avg says 6.5 |
| Education | Standardized test scores | Actual learning | Teaching to the test, understanding declines |
| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission rates |
| Finance | Quarterly earnings | Long-term value creation | Cost-cutting inflates EPS, erodes capability |
| Software | Velocity (story points) | Deliverable product quality | Point inflation, features half-finished |

In each case, the proxy is optimized, the number improves, and the system
*functions*, profitably even, until the moment the underlying quality
is tested by reality.

### A.5 A Mathematical Note on Equilibrium Stability

Model the system as a game between provider (P) and client (C).

**Information structure:**
- P observes individual completion times $\{C_i\}$ and chooses schedule $\sigma$
- C observes only the reported aggregate $\bar{C}(\sigma)$

**Payoffs:**
- P's payoff increases with C's satisfaction and is independent of schedule
  (throughput is invariant)
- C's *reported* satisfaction $U_C = f(\bar{C})$ is maximized by SPT
- C's *actual* welfare (if they could observe it) depends on individual
  $C_i$ values, especially for high-priority tasks

This is a **moral hazard** problem. P has private information (the
distribution of $C_i$) that C cannot observe. P's optimal strategy is to
minimize the observable signal ($\bar{C}$) regardless of the unobservable
distribution, which is exactly SPT.

The equilibrium is a **pooling equilibrium**: P's reported number looks
identical to the client regardless of the underlying priority-weighted
performance. Under SPT, $\bar{C}$ depends only on the task sizes, so two
providers whose queues contain the same task sizes both report
$\bar{C} = 6.56$, whether the PWCT behind that number is 10.2 or something
quite different. The client cannot distinguish between them.

This equilibrium is stable under the standard game-theoretic condition:
**C has no incentive to deviate** (they have no better information source)
and **P has no incentive to deviate** (any other schedule worsens $\bar{C}$
with zero throughput benefit).

It is *unstable* under **information revelation**: if C obtains access to
individual $C_i$ values (via a customer portal, a competing vendor's
transparency, or a sufficiently painful incident), the pooling equilibrium
collapses and C's evaluation shifts to the underlying quality.

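
The pooling effect can be made concrete with the task sizes of the IT example. The sketch below assumes two hypothetical providers who both run SPT on queues with identical task sizes but different priority assignments: the first uses the weights from the IT example, the second an invented assignment in which the smallest tasks happen to be the most critical.

```python
# Task sizes (hours) of the IT example, already in SPT order
SIZES = [0.25, 0.5, 1.0, 2.0, 2.0, 3.0, 4.0, 6.0]


def spt_report(weights):
    """Return (reported unweighted mean, PWCT) for an SPT run with these weights."""
    clock = sum_c = sum_wc = 0.0
    for p, w in zip(SIZES, weights):
        clock += p
        sum_c += clock
        sum_wc += w * clock
    return sum_c / len(SIZES), sum_wc / sum(weights)


# Provider 1: priority weights of the IT example, listed in SPT order
provider_1 = [1, 1, 2, 2, 4, 8, 4, 8]
# Provider 2: hypothetical queue where the smallest tasks are the most critical
provider_2 = [8, 8, 4, 4, 2, 2, 1, 1]

for name, weights in [("Provider 1", provider_1), ("Provider 2", provider_2)]:
    mean, pwct = spt_report(weights)
    print(f"{name}: reported mean={mean:.4f} h  PWCT={pwct:.3f} h")
```

Both providers report the same unweighted mean of 6.5625 hours, while the priority-weighted figure differs by more than a factor of three (10.2 against roughly 3.02). A client who sees only the first number has no way to tell the two apart.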
### A.6 The Uncomfortable Conclusion

The honest answer to "does optimizing the unweighted mean hurt the
business?" is: **not necessarily, as long as the client never looks
behind the number**.

The honest answer to "does it hurt the client?" is: **only when they
have a problem large enough to notice**, which is precisely when the
metric's distortion is largest (Theorem 4).

The honest answer to "is this sustainable?" is: it is exactly as
sustainable as any system in which the seller knows more than the buyer.
Such systems are historically stable for extended periods and then
collapse rapidly when the information asymmetry is punctured by a
crisis, a competitor, or a regulator.

The mathematical structure is clear: the unweighted mean creates an
information asymmetry between the metric and the reality. Optimizing
the metric under this asymmetry is *locally rational* for the provider,
*locally satisfying* for the uninspecting client, and *globally fragile*
for the relationship.

Whether one calls this "efficient market behavior" or "a dystopian
consequence of optimizing legible numbers over illegible reality" is not
a mathematical question. The math says only this: **the incentive exists,
the equilibrium is real, and it holds until it doesn't.**

---

## Appendix B. The Psychological Cost of Knowing

Appendix A modeled the provider as a unitary rational actor: "the team"
optimizes the metric. But teams are composed of individuals, and those
individuals have their own utility functions. When a team member
understands the proof (when they *know* the metric is synthetic, that
the dashboard is theater, that the email server is still down while they
close wallpaper tickets), a new cost appears that the equilibrium model
did not account for.

### B.1 The Hidden Variable: Team Awareness

Appendix A's game has three actors: provider, client, management. But the
provider is not monolithic. Decompose it:

- **Management (M):** sets the metric, evaluates the team, reports to client
- **Team member (T):** executes the work, observes individual task states
- **Client (C):** observes only the reported aggregate

The information structure changes:

| Actor | Observes individual $C_i$ | Observes aggregate $\bar{C}$ | Understands the proof |
|-------|--------------------------|-----------------------------|-----------------------|
| M | Possibly | Yes | Varies |
| T | **Yes** | Yes | **Yes** (in this scenario) |
| C | No | Yes | No |

The team member has *full information*. They see the ticket queue. They
know the email server has been down since 7 AM. They know they are closing
a wallpaper ticket because it will improve the number. And they know *why*
this is happening, not from vague discomfort, but from a precise
mathematical understanding that the metric rewards this behavior.

### B.2 Cognitive Dissonance Under Full Information

Cognitive dissonance (Festinger, 1957) arises when an individual holds
two contradictory cognitions simultaneously. The standard resolution is
to modify one cognition to reduce the conflict.

A team member operating under the synthetic metric holds:

- **Cognition A:** "I am a competent professional. My job is to solve
  important problems for clients."
- **Cognition B:** "I am closing a wallpaper ticket while the email
  server is down, because it makes the number look better."

In the absence of understanding *why*, Cognition B can be rationalized:
"management knows best," "maybe there's a reason," "the system works
overall." This is uncomfortable but tolerable; the ambiguity provides
cognitive cover.

**Understanding the proof removes the ambiguity entirely.** The team
member now holds:

- **Cognition A:** Same as above.
- **Cognition B':** "I am closing a wallpaper ticket while the email
  server is down, because the metric is mathematically biased toward
  small tasks (Theorem 1), the reordering produces zero additional
  throughput (Theorem 6), and the only beneficiary is the dashboard
  (Appendix A). I can prove this."

B' is strictly harder to rationalize than B. The team member cannot
retreat into uncertainty because they possess the proof. The dissonance
is now *load-bearing*: it must be resolved, and the available resolutions
are:

1. **Reject Cognition A**: "I am not here to solve important problems;
   I am here to move numbers." This is psychologically costly. It
   requires abandoning professional identity.

2. **Reject Cognition B'**: "The proof must be wrong, or doesn't apply
   here." This is intellectually costly. The proof is simple enough to
   verify, and the IT example maps directly to their daily experience.

3. **Change the situation**: advocate for better metrics, refuse to
   cherry-pick, escalate. This is *professionally* costly in an
   environment that rewards the metric.

4. **Leave**: resolve the dissonance by exiting the system entirely.

None of these resolutions is free. Each one imposes a cost on the team
member that did not exist before they understood the proof, and *none of
them appear in the business equilibrium model of Appendix A*.

### B.3 Self-Determination Theory: Three Needs Violated

Deci and Ryan's Self-Determination Theory (1985, 2000) identifies three
innate psychological needs whose satisfaction predicts intrinsic motivation,
job satisfaction, and well-being:

**1. Autonomy:** the need to feel volitional control over one's actions.

A team member who understands the proof knows that the metric constrains
their choices in a way that is mathematically suboptimal for the client.
Their scheduling decisions are not autonomous expressions of professional
judgment; they are coerced responses to a flawed incentive. The *knowledge*
of the coercion, not just the coercion itself, is what damages autonomy.
A worker who doesn't understand why they're doing something can still feel
autonomous ("I'm choosing to follow the process"). A worker who understands
that the process is provably counterproductive cannot.

**2. Competence:** the need to feel effective at meaningful tasks.

The proof demonstrates that the metric rewards *apparent* effectiveness
(low $\bar{C}$) while being invariant to *actual* effectiveness (throughput,
Theorem 6). A team member who understands this knows that the metric
cannot distinguish between a competent team and an incompetent one that
happens to cherry-pick small tasks. Their competence is invisible to the
measurement system. Worse: genuine competence (choosing to fix the email
server first) is *punished* by the metric ($\bar{C}$ increases from 6.56
to 13.63 in the IT example).

When a measurement system punishes competent decisions and rewards
incompetent ones, and the team member *knows this*, the need for
competence is not merely unsatisfied; it is actively contradicted.

**3. Relatedness:** the need to feel connected to others and to
contribute to something meaningful.

The team member knows the client's email server is down. They know the
client is suffering. They know they could help. They are instead updating
a wallpaper policy, not because it helps anyone, but because it helps
a number. The connection between the team member's work and the client's
well-being has been severed by the metric, and the team member *can see
the severed ends*.

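As a minimal sketch of the competence argument above, the following Python
snippet computes the unweighted mean completion time $\bar{C}$ from Section 1
for two orderings of the same four tasks. The processing times are
hypothetical (they are not the IT example's data), and the helper names are
illustrative only. Under these assumptions the mean shifts with the ordering
while the time at which the whole batch is finished does not.

```python
from itertools import accumulate


def completion_times(processing_times):
    """Completion time of each task when run back to back in the given order."""
    return list(accumulate(processing_times))


def unweighted_mean_completion(processing_times):
    """Unweighted mean of the completion times for the given order."""
    times = completion_times(processing_times)
    return sum(times) / len(times)


# Hypothetical processing times in hours (an assumption for illustration).
tasks = [8, 1, 4, 2]

spt_order = sorted(tasks)                # shortest task first
lpt_order = sorted(tasks, reverse=True)  # longest task first

spt_mean = unweighted_mean_completion(spt_order)
lpt_mean = unweighted_mean_completion(lpt_order)

# The last completion time is the same in both orders: same work, same finish.
spt_finish = completion_times(spt_order)[-1]
lpt_finish = completion_times(lpt_order)[-1]

print(f"mean completion, shortest first: {spt_mean}")
print(f"mean completion, longest first:  {lpt_mean}")
print(f"batch finished at: {spt_finish} vs {lpt_finish}")
```

With these numbers the metric nearly doubles when the long task goes first,
even though the team delivers exactly the same work by the same time.
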
### B.4 Moral Injury

The concept of moral injury (Shay, 1994; Litz et al., 2009) was developed
in military psychology to describe the lasting harm caused by
"perpetrating, failing to prevent, bearing witness to, or learning about
acts that transgress deeply held moral beliefs and expectations" (Litz et
al., 2009). It has since been applied to healthcare workers, first
responders, and, increasingly, to knowledge workers in bureaucratic systems.

The key distinction from burnout: **burnout is exhaustion from doing too
much. Moral injury is damage from doing the wrong thing, or being
prevented from doing the right thing.**

A team member who:
- Knows the email server is down (witnessing the harm)
- Knows they should fix it (moral belief about professional duty)
- Closes a wallpaper ticket instead (transgressing that belief)
- Does so because the metric requires it (institutional causation)

...is experiencing the structural conditions for moral injury. The
proof doesn't cause the injury; the metric does. But the proof
eliminates the psychological buffer of ignorance that would otherwise
mitigate it.

### B.5 Learned Helplessness and Metric Fatalism

Seligman's learned helplessness framework (1967, 1975) describes the
phenomenon where exposure to uncontrollable negative outcomes leads to
passivity even when control becomes available.

The sequence for an aware team member:

1. **Observation:** The metric is flawed (proof understood).
2. **Action:** Advocate for change ("we should use priority-weighted
   metrics").
3. **Outcome:** Rejected ("the client is happy with the current
   dashboard," "this is how we've always measured," "the numbers are
   good, don't rock the boat").
4. **Repetition:** Steps 2-3 repeat, with decreasing conviction.
5. **Helplessness:** "The metric is what it is. I'll just close tickets."

The terminal state, metric fatalism, is characterized by:
- Disengagement from professional judgment ("I just do what the queue
  says")
- Reduced initiative ("why bother triaging if the metric doesn't care?")
- Cynicism toward measurement generally ("all metrics are fake")
- Withdrawal of discretionary effort on complex tasks

This is not laziness. It is the rational psychological response to a
system that punishes correct behavior and rewards incorrect behavior,
when the individual lacks the power to change the system.

### B.6 The Turnover Equation

The costs described in B.2-B.5 are borne by the team member, not the
organization, at least initially. They become organizational costs through
**turnover**.

Model the team member's stay/leave decision:

$$\text{Stay if: } \quad V_{\text{compensation}} + V_{\text{intrinsic}} > V_{\text{outside option}}$$

The synthetic metric degrades $V_{\text{intrinsic}}$ through each of the
mechanisms described above:

| Mechanism | Component degraded | Effect on $V_{\text{intrinsic}}$ |
|-----------|-------------------|----------------------------------|
| Cognitive dissonance (B.2) | Psychological comfort | Decreased |
| Autonomy violation (B.3.1) | Sense of agency | Decreased |
| Competence contradiction (B.3.2) | Professional identity | Decreased |
| Relatedness severance (B.3.3) | Sense of purpose | Decreased |
| Moral injury (B.4) | Ethical well-being | Decreased |
| Learned helplessness (B.5) | Belief in efficacy | Decreased |

As $V_{\text{intrinsic}}$ decreases, the organization must increase
$V_{\text{compensation}}$ to retain the team member, or accept their
departure.

Crucially: **the team members most affected are those with the strongest
professional identity and the deepest understanding of the work.** These
are the most competent members: the ones most capable of recognizing the
metric's absurdity, most troubled by it, and most able to find employment
elsewhere. The metric selects for the departure of the team's best people.

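The stay/leave inequality above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the three values are
treated as numbers on a common, arbitrary scale, the figures are invented,
and the function names are hypothetical. Rearranging the inequality gives the
compensation threshold $V_{\text{outside option}} - V_{\text{intrinsic}}$,
which rises one-for-one as intrinsic value falls.

```python
def stays(v_compensation, v_intrinsic, v_outside_option):
    """Stay/leave rule from B.6: stay only if the left side is strictly larger."""
    return v_compensation + v_intrinsic > v_outside_option


def compensation_threshold(v_intrinsic, v_outside_option):
    """Compensation must strictly exceed this value for the member to stay."""
    return v_outside_option - v_intrinsic


# Invented values on an arbitrary common scale (assumptions for illustration).
v_compensation = 80
v_outside_option = 100

before = stays(v_compensation, v_intrinsic=30, v_outside_option=v_outside_option)
after = stays(v_compensation, v_intrinsic=10, v_outside_option=v_outside_option)

threshold_before = compensation_threshold(30, v_outside_option)
threshold_after = compensation_threshold(10, v_outside_option)

print(f"stays before intrinsic value erodes: {before}")
print(f"stays after intrinsic value erodes:  {after}")
print(f"compensation threshold: {threshold_before} -> {threshold_after}")
```

In this toy case the same salary retains the team member before the erosion
and fails to afterwards; nothing about the outside option has changed.
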
### B.7 The Adversarial Selection Spiral

Combining Appendix A's equilibrium with the turnover dynamic:

1. Organization adopts unweighted mean completion time.
2. Metric looks good (SPT). Client is satisfied (Appendix A). Management
   is satisfied.
3. Aware, competent team members experience psychological costs (B.2-B.5).
4. Those members leave. They are replaced by members who either:
   (a) do not understand the metric's flaws (less competent), or
   (b) do not care (less engaged).
5. The metric continues to look good; it always does under SPT,
   regardless of team competence (Theorem 6, Corollary 6.1).
6. Actual service quality degrades (less competent team), but the metric
   cannot detect this (Theorem 9, Corollary 9.1).
7. Return to step 2.

This is an **adversarial selection spiral**: the metric selects *against*
the people who would improve the system and *for* the people who will not
challenge it. The system stabilizes at a lower level of actual competence,
invisible to its own measurement apparatus, staffed by people who have
made peace with, or are unaware of, the gap between the number and the
reality.

The dashboard still looks good.

### B.8 The Complete Cost Model

Appendix A concluded that the synthetic-metric equilibrium is stable and
profitable. Appendix B reveals the hidden costs that model omitted:

| Appendix A (visible) | Appendix B (hidden) |
|---------------------|---------------------|
| Client satisfied (sees good number) | Team dissatisfied (sees bad reality) |
| Throughput unchanged | Discretionary effort withdrawn |
| Metric improves | Competent members leave |
| Business economy stable | Institutional competence degrades |
| Zero marginal cost | Replacement/training costs accumulate |

The business equilibrium of Appendix A is real. The psychological costs
of Appendix B are also real. They operate on different timescales:
the equilibrium is visible quarterly; the competence degradation is
visible over years.

The complete model is not "the metric works" (Appendix A) or "the metric
is destructive" (Sections 1-12). It is: **the metric works, and it
is destructive, and the destruction is invisible to the metric.**

An organization can run profitably for an extended period on synthetic
metrics and hollowed-out competence, just as a building can stand for
years with corroded rebar. The metric is the fresh paint. Appendix A
proved the paint is convincing. This appendix merely notes that it is
still paint.

---

## References

### Scheduling Theory

[1] Smith, W. E. (1956). Various optimizers for single-stage production.
*Naval Research Logistics Quarterly*, 3(1–2), 59–66.
doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106)

> Origin of the SPT optimality result (Theorem 1), the weighted completion
> time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job
> pairwise interchange (exchange argument) proof technique used throughout
> this paper.

[2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of
Scheduling*. Addison-Wesley.

> Comprehensive treatment of single-machine and multi-machine scheduling
> theory, extending Smith's results. Standard textbook reference for the
> exchange argument and its generalizations.

[3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW.
*Operations Research*, 9(3), 383–387.
doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383)

> First rigorous proof of Little's Law, referenced in Section 5. The
> result was known informally before 1961; this paper provided the
> general proof requiring only stationarity and finite expectations.

[4] Little, J. D. C. (2011). Little's Law as viewed on its 50th
anniversary. *Operations Research*, 59(3), 536–549.
doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941)

> Retrospective discussing the law's scope, limitations, and
> common misapplications, including the batch-case subtleties
> noted in Section 5 of this paper.

[5] Reinertsen, D. G. (2009). *The Principles of Product Development
Flow: Second Generation Lean Product Development*. Celeritas Publishing.
ISBN: 978-1-935401-00-1.

> Popularized the term "Weighted Shortest Job First" (WSJF) and the
> "Cost of Delay divided by Duration" formulation in agile/lean product
> development contexts. The underlying mathematical result is Smith
> (1956) [1].

### Measurement and Incentives

[6] Goodhart, C. A. E. (1984). Problems of monetary management: The
U.K. experience. In C. A. E. Goodhart, *Monetary Theory and Practice:
The UK Experience* (pp. 91–121). Macmillan.

> Source of Goodhart's Law. Original wording: "Any observed statistical
> regularity will tend to collapse once pressure is placed upon it for
> control purposes." First presented as a working paper for the Reserve
> Bank of Australia in 1975.

[7] Strathern, M. (1997). 'Improving ratings': Audit in the British
university system. *European Review*, 5(3), 305–321.
doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4)

> Generalized Goodhart's observation into the form commonly cited today:
> "When a measure becomes a target, it ceases to be a good measure."
> Referenced implicitly in Sections 6, 11.4, and Appendix A.4.

### Behavioral Economics

[8] Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of
decision under risk. *Econometrica*, 47(2), 263–292.
doi:[10.2307/1914185](https://doi.org/10.2307/1914185)

> Established loss aversion: the finding that losses are weighted
> approximately twice as heavily as equivalent gains in subjective
> evaluation. Referenced in Section 7.4 to argue that the dissatisfaction
> of deprioritized large-task clients outweighs the satisfaction gained
> by small-task clients under SPT.

### Game Theory and Contract Theory

[9] Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty
and the market mechanism. *The Quarterly Journal of Economics*, 84(3),
488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431)

> Foundational model of information asymmetry and adverse selection.
> The pooling equilibrium described in Appendix A.5 (where the client
> cannot distinguish high-quality from low-quality service because both
> produce the same aggregate metric) is structurally analogous to
> Akerlof's lemons problem.

[10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell
Journal of Economics*, 10(1), 74–91.
doi:[10.2307/3003320](https://doi.org/10.2307/3003320)

> Formal treatment of moral hazard: the problem arising when an agent's
> actions are not fully observable by the principal. The metric-reporting
> scenario in Appendix A.5 is a moral hazard problem: the provider
> (agent) chooses the schedule, but the client (principal) observes only
> the aggregate outcome.

### Psychology

[11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford
University Press. ISBN: 978-0-8047-0131-0.

> Foundational theory of cognitive dissonance. Referenced in Appendix
> B.2: an individual holding contradictory cognitions experiences
> psychological discomfort and is motivated to reduce the contradiction.
> The proof eliminates the ambiguity that would normally allow
> rationalization, making the dissonance load-bearing.

[12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and
Self-Determination in Human Behavior*. Plenum Press.
ISBN: 978-0-306-42022-1.

> Original book-length treatment of Self-Determination Theory,
> identifying autonomy, competence, and relatedness as innate
> psychological needs. Referenced in Appendix B.3.

[13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and
the facilitation of intrinsic motivation, social development, and
well-being. *American Psychologist*, 55(1), 68–78.
doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68)

> Overview and update of Self-Determination Theory, linking need
> satisfaction to intrinsic motivation, job satisfaction, and
> psychological well-being. The three-need framework (autonomy,
> competence, relatedness) applied in Appendix B.3.

[14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape
traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9.
doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514)

> Original experimental demonstration of learned helplessness.
> Co-authored with Steven F. Maier. Referenced in Appendix B.5:
> repeated exposure to uncontrollable outcomes (failed advocacy for
> better metrics) produces passivity and disengagement.

[15] Seligman, M. E. P. (1975). *Helplessness: On Depression,
Development, and Death*. W. H. Freeman.
ISBN: 978-0-7167-0752-3.

> Extended treatment connecting learned helplessness to human depression
> and institutional behavior. The concept of "metric fatalism" described
> in Appendix B.5 is a domain-specific instance of learned helplessness
> in organizational settings.

[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the
Undoing of Character*. Atheneum / Simon & Schuster.
ISBN: 978-0-689-12182-3.

> Introduced the concept of moral injury through analysis of Vietnam
> combat veterans' experiences, drawing parallels to Homer's *Iliad*.
> Defined moral injury as arising from a betrayal of "what's right" by
> someone in legitimate authority in a high-stakes situation. Referenced
> in Appendix B.4.

[17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P.,
Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war
veterans: A preliminary model and intervention strategy. *Clinical
Psychology Review*, 29(8), 695–706.
doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003)

> Formalized moral injury as a clinical construct and proposed a
> treatment model. Defined moral injury as resulting from "perpetrating,
> failing to prevent, bearing witness to, or learning about acts that
> transgress deeply held moral beliefs and expectations." This definition
> is quoted in Appendix B.4 and applied to knowledge workers operating
> under synthetic metrics.

---

*This proof was developed conversationally and formalized on 2026-03-28.*