Add priority system breakdown, IT example, and devil's advocate

Sections 9-11: Prove that unweighted mean completion time becomes adversarial under priority classification (Theorems 8-10), propose PWCT/WSJF as alternatives with a worked IT service desk example, and present honest counterarguments establishing the narrow conditions under which the unweighted metric remains defensible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -375,7 +375,461 @@ causes when optimized.

---

## 9. Complete Breakdown Under Priority Classification

The preceding sections proved that unweighted mean completion time is biased
when tasks vary in size. We now show that introducing a **priority system**,
as virtually all real teams use, causes the metric to become not merely
biased but **actively adversarial** to the organization's stated goals.

### 9.1 Extended Model: Tasks With Priority

Let each task $i$ have processing time $p_i$ and a priority class
$q_i \in \{1, 2, 3, 4\}$, where 1 is the highest priority (critical) and
4 is the lowest (cosmetic/enhancement). Assign priority weights:

$$w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}$$

The specific weights are illustrative; the qualitative results hold for any
weight function that is strictly decreasing in $q$, with numeric thresholds
such as the ratio $w(1)/w(4) = 8$ changing accordingly. The key property is
that priority is assigned by **business impact**, not by task size.

### 9.2 The Metric Contradicts the Priority System

**Theorem 8 (Priority-Size Inversion).** The schedule that minimizes
unweighted mean completion time (SPT) completes every low-priority task
before any larger high-priority task. When priority is independent of task
size, a task's position in the SPT schedule is therefore independent of its
priority.

**Proof.**

SPT orders tasks by $p_i$ ascending, regardless of $q_i$. Consider two tasks:

- Task A: $p_A = 40$ hours, $q_A = 1$ (Critical, e.g., server outage)
- Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low, e.g., cosmetic UI fix)

SPT schedules B before A. The unweighted mean completion time for this pair:

$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5$$

The priority-respecting order (A before B):

$$\bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$

The metric declares SPT nearly **twice as good**, even though SPT completes a
cosmetic fix first while the server outage burns for an additional 0.5 hours.

In general, for $n$ tasks where priority $q_i$ is statistically independent
of processing time $p_i$ (a reasonable first assumption, since priority
reflects business impact while processing time reflects technical complexity):

$$\text{Corr}(p_i, q_i) \approx 0$$

SPT's ordering is determined entirely by $p_i$, so a task's position in the
SPT schedule has **zero correlation** with its priority. A Critical task is
as likely to be scheduled last as first.

More precisely: the expected fraction of Critical tasks in the bottom half
of the SPT schedule equals the fraction of Critical tasks whose processing
time exceeds the median. Under exact independence that fraction is 50%. In
practice, Critical tasks (outages, security incidents, data loss) often
require more work, which breaks independence in the unfavorable direction and
pushes the fraction above 50%. Position in the SPT schedule is then not
merely uncorrelated with priority; it is plausibly
**anti-correlated**. $\blacksquare$
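
The two-task arithmetic in the proof can be reproduced with a few lines of Python. This is a minimal sketch, assuming both tasks are available at time 0 and run back-to-back on one worker; the helper name `mean_completion` is ours, not part of the paper.

```python
from itertools import accumulate

def mean_completion(durations):
    """Unweighted mean completion time for tasks run back-to-back in the given order."""
    completions = list(accumulate(durations))
    return sum(completions) / len(completions)

# Task A: 40 h Critical outage; Task B: 0.5 h Low cosmetic fix.
spt_order = [0.5, 40.0]        # B then A (shortest first)
priority_order = [40.0, 0.5]   # A then B (Critical first)

spt_mean = mean_completion(spt_order)
priority_mean = mean_completion(priority_order)
print(spt_mean, priority_mean)  # 20.5 40.25
```

The function never sees a priority field, which is the point of the theorem: the score depends only on the order of durations.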

### 9.3 Dimensionality Collapse

The unweighted mean completion time reduces a three-dimensional task
description $(p_i, q_i, C_i)$ to a one-dimensional signal ($C_i$), then
averages that signal uniformly. This discards two of the three dimensions:

1. **Priority ($q_i$) is completely ignored.** A critical task and a
   cosmetic task contribute identically to the mean.
2. **Size ($p_i$) is implicitly inverted.** Small tasks are rewarded with
   early completion and large tasks are punished, regardless of their
   importance.

**Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual
information between a task's position in schedule $\sigma$ (the schedule's
implicit priority ranking) and its actual priority assignment $q_i$. For SPT:

$$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$

**Proof.** SPT assigns positions based solely on $p_i$, so position is a
function of the processing times alone. When $p_i$ and $q_i$ are independent,
position is independent of $q_i$, and the mutual information between
independent variables is zero: knowing a task's position in the SPT schedule
provides no information about its priority.

Contrast this with a priority-first schedule, where position is determined by
$q_i$ and therefore $I > 0$ whenever more than one priority class is present.
$\blacksquare$

**Corollary 9.1.** A team that optimizes unweighted mean completion time
is operating a scheduling system that carries zero information about its
own priority classification. The priority field in their ticketing system
is, with respect to execution order, decorative.
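
To make the information-theoretic claim concrete, the sketch below computes a plug-in mutual information (in bits) between the half of the schedule a task lands in and its priority. The four tasks are hypothetical and deliberately balanced so that size and priority are unrelated; coarsening position to "early" or "late" is our simplification, not part of the theorem.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equal-length label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical tasks as (processing_time, priority): each priority class
# has one short and one long task, so size says nothing about priority.
tasks = [(1, 1), (2, 4), (3, 1), (4, 4)]

def half_labels(order):
    """Label each scheduled task by the half of the schedule it lands in."""
    return ["early" if i < len(order) // 2 else "late" for i in range(len(order))]

spt = sorted(tasks, key=lambda t: t[0])             # shortest first
priority_first = sorted(tasks, key=lambda t: t[1])  # Critical first

mi_spt = mutual_information(half_labels(spt), [q for _, q in spt])
mi_priority = mutual_information(half_labels(priority_first),
                                 [q for _, q in priority_first])
print(mi_spt, mi_priority)  # 0.0 1.0
```

On this toy instance the SPT schedule carries zero bits about priority, while the priority-first schedule carries one full bit (position determines the class exactly).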

### 9.4 Quantifying the Damage: Priority-Weighted Delay Cost

Define the **priority-weighted delay cost** of a schedule:

$$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$

This measures the total business-impact-weighted time that tasks spend in the
system.

**Theorem 10 (SPT Maximizes Priority-Weighted Delay in the Worst Case).**
When high-priority tasks are larger than low-priority tasks, but by less than
the ratio of their weights, SPT produces the highest priority-weighted delay
cost among all schedules.

**Proof.** Consider the worst case: all Critical ($q = 1$) tasks have
processing time $p_H$ and all Low ($q = 4$) tasks have processing time
$p_L$, with $p_H > p_L$. Let there be $n_H$ Critical tasks and $n_L$ Low
tasks, $n = n_H + n_L$.

SPT places all $n_L$ Low tasks first, then all $n_H$ Critical tasks.

The priority-weighted delay cost under SPT:

$$D_{\text{SPT}} = w(4) \sum_{k=1}^{n_L} k \cdot p_L + w(1) \sum_{k=1}^{n_H} (n_L \cdot p_L + k \cdot p_H)$$

$$= 1 \cdot \frac{n_L(n_L+1)}{2} p_L + 8 \left( n_H \cdot n_L \cdot p_L + \frac{n_H(n_H+1)}{2} p_H \right)$$

Under priority-first scheduling (all Critical tasks first):

$$D_{\text{priority}} = w(1) \sum_{k=1}^{n_H} k \cdot p_H + w(4) \sum_{k=1}^{n_L} (n_H \cdot p_H + k \cdot p_L)$$

$$= 8 \cdot \frac{n_H(n_H+1)}{2} p_H + 1 \cdot \left( n_L \cdot n_H \cdot p_H + \frac{n_L(n_L+1)}{2} p_L \right)$$

The within-class terms are identical in both expressions and cancel in the
difference $D_{\text{SPT}} - D_{\text{priority}}$. Only the cross-terms remain:

- SPT charges $w(1) \cdot n_H \cdot n_L \cdot p_L = 8 \cdot n_H \cdot n_L \cdot p_L$
  for Critical tasks waiting behind Low tasks.
- Priority-first charges $w(4) \cdot n_L \cdot n_H \cdot p_H = n_L \cdot n_H \cdot p_H$
  for Low tasks waiting behind Critical tasks.

Therefore:

$$D_{\text{SPT}} - D_{\text{priority}} = w(1) \cdot n_H \cdot n_L \cdot p_L - w(4) \cdot n_L \cdot n_H \cdot p_H = n_H \cdot n_L \cdot (8 p_L - p_H)$$

Two regimes follow:

- When $p_L < p_H < 8 p_L$, the difference is positive: SPT wins on the
  unweighted metric but loses on the priority-weighted metric. Every Low task
  has a smaller ratio $w/p$ than every Critical task ($1/p_L < 8/p_H$), so SPT
  orders tasks by *increasing* $w/p$. That is the exact reverse of the order
  that minimizes $D$ (decreasing $w/p$, derived in Section 10.3), so by the
  same exchange argument SPT attains the maximum of $D$ over all schedules.
- When $p_H > 8 p_L$, the difference is negative: the Critical tasks are so
  large that running the Low tasks first costs less weighted delay than making
  them wait, and SPT wins on both metrics. The unweighted metric is then right
  only by coincidence, since it never consults the weights. At $p_H = 8 p_L$
  the two schedules tie on $D$.

**Whenever high-priority tasks are larger than low-priority tasks by less than
the weight ratio $w(1)/w(4)$, the unweighted metric recommends the schedule
that inflicts the most business-impact-weighted delay.** $\blacksquare$
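
The cross-term algebra is easy to check numerically. The sketch below is an illustration under the two-class assumptions of the proof, with hypothetical sizes: it evaluates $D$ for the SPT and priority-first orders, compares the gap with $n_H \cdot n_L \cdot (8 p_L - p_H)$, and brute-forces every ordering of a five-task instance. The function names are ours.

```python
from itertools import accumulate, permutations

W_CRITICAL, W_LOW = 8, 1  # w(1) and w(4) from Section 9.1

def weighted_delay(order):
    """D = sum of w_i * C_i for (weight, duration) tasks run in the given order."""
    completions = accumulate(p for _, p in order)
    return sum(w * c for (w, _), c in zip(order, completions))

def compare(n_h, n_l, p_h, p_l):
    """Return (D_SPT, D_priority) for the two-class instance in the proof."""
    critical = [(W_CRITICAL, p_h)] * n_h
    low = [(W_LOW, p_l)] * n_l
    return weighted_delay(low + critical), weighted_delay(critical + low)

# Regime p_L < p_H < 8 p_L: SPT costs more, by n_H * n_L * (8 p_L - p_H).
d_spt, d_prio = compare(n_h=2, n_l=3, p_h=4, p_l=1)
gap = d_spt - d_prio

# Brute force over every ordering of the same five tasks.
tasks = [(W_CRITICAL, 4)] * 2 + [(W_LOW, 1)] * 3
worst = max(weighted_delay(list(perm)) for perm in permutations(tasks))

# Regime p_H > 8 p_L: the sign flips and SPT has the lower weighted delay.
d_spt_big, d_prio_big = compare(n_h=2, n_l=3, p_h=10, p_l=1)
print(gap, worst == d_spt, d_spt_big < d_prio_big)  # 24 True True
```

A brute-force search is only feasible for tiny instances; it is used here as an independent check on the closed form, not as a scheduling method.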

---

## 10. A Proposed Solution: Priority-Weighted Completion Score

### 10.1 The Metric

Replace unweighted mean completion time with the **Priority-Weighted
Completion Score (PWCS)**:

$$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$

This is the priority-weighted mean slowdown ratio. It measures:

- **How long each task took relative to its size** (the slowdown $C_i / p_i$),
  weighted by
- **How much that task mattered** (the priority weight $w(q_i)$).

Lower is better. A PWCS of 1.0 means every task started the moment it arrived
and incurred zero queuing delay. A PWCS of 3.0 means that, weighted by
importance, the average task took three times its processing time to
complete, i.e., it spent twice its processing time waiting.
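
As a minimal sketch of the definition, assuming all tasks arrive at time 0 and run back-to-back on a single worker (so $C_i$ is a running total of durations), PWCS can be computed as follows. The `WEIGHTS` table mirrors Section 9.1; the function name and sample tasks are hypothetical.

```python
from itertools import accumulate

WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 9.1

def pwcs(schedule):
    """Priority-weighted mean slowdown for (priority, duration) tasks.

    Assumes all tasks arrive at time 0 and run back-to-back on one worker,
    so each completion time is the running total of durations.
    """
    completions = accumulate(p for _, p in schedule)
    weighted = sum(WEIGHTS[q] * c / p for (q, p), c in zip(schedule, completions))
    return weighted / sum(WEIGHTS[q] for q, _ in schedule)

single = pwcs([(1, 6.0)])          # one task, no queueing -> slowdown 1.0
pair = pwcs([(1, 2.0), (4, 1.0)])  # Critical 2 h, then Low 1 h
print(single, round(pair, 4))      # 1.0 1.2222
```

In the second call the Low task has slowdown 3.0 (it finishes at hour 3 for 1 hour of work), but its weight of 1 against the Critical task's 8 keeps the score near 1.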

### 10.2 Properties of PWCS

**Property 1: Priority-respecting.** PWCS penalizes delays to high-priority
tasks more heavily than delays to low-priority tasks. A 2-hour delay to a
Critical task costs 8 times as much as the same delay to a Low task of the
same size.

**Property 2: Size-fair.** By using the slowdown ratio $C_i / p_i$ rather
than raw completion time $C_i$, the metric does not inherently penalize
large tasks for being large. A 40-hour task that completes at hour 80
contributes the same slowdown (2.0) as a 1-hour task that completes at hour 2.

**Property 3: Not optimized by SPT alone.** Because the metric weights by
priority, ordering tasks purely by processing time is generally not optimal:
a good schedule must also minimize slowdown for high-priority tasks, i.e.,
**actually respect the priority system**. Size normalization does not remove
the pull toward short tasks, however. A delay of fixed length raises a short
task's slowdown more than a long task's, so tiny tasks still gravitate to the
front of the queue.

**Property 4: Reduces to unweighted mean when tasks are uniform.** If all
tasks have equal priority and equal size $p$, PWCS equals the unweighted mean
completion time divided by $p$. In that sense it generalizes the unweighted
mean.

### 10.3 Optimal Policy for PWCS and PWCT

**Theorem 11.** The schedule minimizing priority-weighted completion time
(PWCT, defined below) processes tasks in order of decreasing $w(q_i) / p_i$.
Within a priority class this is shortest-processing-time first; across
classes, a short low-priority task can outrank a long high-priority one.

**Proof (exchange argument, as in Theorem 1).**

Consider adjacent tasks $i, j$ with $i$ before $j$. Swapping them leaves
every other completion time unchanged, delays $i$ by $p_j$, and advances $j$
by $p_i$.

For PWCS, the swap changes the weighted slowdown sum (the numerator) by:

$$w(q_i) \cdot \frac{p_j}{p_i} - w(q_j) \cdot \frac{p_i}{p_j}$$

The swap improves PWCS when this quantity is negative, i.e., when:

$$\frac{w(q_j)}{p_j^2} > \frac{w(q_i)}{p_i^2}$$

PWCS is therefore minimized by ordering tasks by decreasing $w(q_i)/p_i^2$.
Because size enters squared, short tasks are favored even more strongly than
under a $w/p$ ordering, and with equal priorities the rule reduces to SPT.
That reintroduces the size bias the metric was meant to remove. We therefore
adopt the more practical **priority-weighted completion time**:

$$\text{PWCT}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot C_i}{\sum_{i=1}^{n} w(q_i)}$$

For PWCT, the same swap changes the numerator by
$w(q_i) \cdot p_j - w(q_j) \cdot p_i$, so the swap improves the score when
$w(q_j) \cdot p_i > w(q_i) \cdot p_j$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
but $j$ is scheduled after $i$. The optimal order is therefore decreasing
$w(q_i)/p_i$, which is the **Weighted Shortest Job First (WSJF)** rule:

$$\text{Schedule by: } \frac{w(q_i)}{p_i} \text{ descending}$$

This means: within a priority class, do short tasks first; across priority
classes, a Critical 8-hour task ($w/p = 8/8 = 1.0$) ties with a Low 1-hour
task ($w/p = 1/1 = 1.0$), but a Critical 4-hour task ($w/p = 8/4 = 2.0$)
beats both. $\blacksquare$
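
The WSJF rule is a one-line sort. The sketch below applies it to a small hypothetical queue and confirms by exhaustive search that no other ordering achieves a lower PWCT. It assumes a single worker with all tasks available at time 0; names such as `wsjf` and `queue` are illustrative.

```python
from itertools import accumulate, permutations

WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 9.1

def pwct(schedule):
    """Priority-weighted mean completion time for (priority, duration) tasks."""
    completions = accumulate(p for _, p in schedule)
    total = sum(WEIGHTS[q] * c for (q, _), c in zip(schedule, completions))
    return total / sum(WEIGHTS[q] for q, _ in schedule)

def wsjf(tasks):
    """Order tasks by w(q)/p descending (Weighted Shortest Job First)."""
    return sorted(tasks, key=lambda t: WEIGHTS[t[0]] / t[1], reverse=True)

# Hypothetical mixed queue: (priority, hours).
queue = [(4, 1.0), (1, 8.0), (1, 4.0), (3, 0.5), (2, 3.0)]

best_by_rule = pwct(wsjf(queue))
best_by_search = min(pwct(list(perm)) for perm in permutations(queue))
print(abs(best_by_rule - best_by_search) < 1e-9)  # True
```

The queue includes the tie from the text (a Critical 8-hour task and a Low 1-hour task, both with $w/p = 1.0$); either relative order of tied tasks yields the same PWCT.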

### 10.4 Applied Example: IT Service Desk

Consider an IT team with the following ticket queue on a Monday morning:

| Ticket | Priority | Type | Est. Hours |
|--------|----------|------|-----------|
| T1 | P1 (Critical) | Email server down | 6 |
| T2 | P2 (High) | VPN failing for remote team | 4 |
| T3 | P3 (Medium) | New employee laptop setup | 2 |
| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 |
| T5 | P3 (Medium) | Install software license | 1 |
| T6 | P1 (Critical) | Database backup failing | 3 |
| T7 | P2 (High) | Printer fleet offline | 2 |
| T8 | P4 (Low) | Archive old shared drive folder | 0.25 |

**SPT order (optimizing unweighted mean):** T8, T4, T5, T3, T7, T6, T2, T1

| Position | Ticket | Priority | Hours | Completion | Slowdown |
|----------|--------|----------|-------|------------|----------|
| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.1875 |
| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |

- **Unweighted mean completion:** $(0.25 + 0.75 + 1.75 + 3.75 + 5.75 + 8.75 + 12.75 + 18.75) / 8 = 6.5625$ hours
- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 306 / 30 = 10.2$ hours
- Email server is down for **18.75 hours**. Database backups fail for **8.75 hours**.

**WSJF order (optimizing PWCT by $w(q)/p$ descending):**

| Ticket | Priority | Hours | $w/p$ |
|--------|----------|-------|-------|
| T8 | P4 Low | 0.25 | 1/0.25 = 4.0 |
| T6 | P1 Crit | 3 | 8/3 = 2.667 |
| T5 | P3 Med | 1 | 2/1 = 2.0 |
| T4 | P4 Low | 0.5 | 1/0.5 = 2.0 |
| T7 | P2 High | 2 | 4/2 = 2.0 |
| T1 | P1 Crit | 6 | 8/6 = 1.333 |
| T2 | P2 High | 4 | 4/4 = 1.0 |
| T3 | P3 Med | 2 | 2/2 = 1.0 |

T8 has $w/p = 4.0$, the highest ratio, which places a Low-priority task
first. This reveals an important practical point:
**pure WSJF can still be gamed by tiny tasks** because their small $p$
inflates the ratio. Pure WSJF does attain the minimum PWCT for this queue
($273 / 30 = 9.1$ hours), but it leaves the email server down for 12.75
hours. In practice, the tiny-task problem is mitigated by enforcing strict
priority class ordering and applying WSJF only *within* priority classes, at
some cost in PWCT.

**Practical WSJF (priority-class-first, then $w/p$ within class):**

| Position | Ticket | Priority | Hours | Completion |
|----------|--------|----------|-------|------------|
| 1 | T6 (backups) | P1 Crit | 3 | 3 |
| 2 | T1 (email) | P1 Crit | 6 | 9 |
| 3 | T7 (printers) | P2 High | 2 | 11 |
| 4 | T2 (VPN) | P2 High | 4 | 15 |
| 5 | T5 (software) | P3 Med | 1 | 16 |
| 6 | T3 (laptop) | P3 Med | 2 | 18 |
| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |

- **Unweighted mean completion:** $(3 + 9 + 11 + 15 + 16 + 18 + 18.25 + 18.75) / 8 = 13.625$ hours
- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 305 / 30 \approx 10.167$ hours
- Email server restored in **9 hours**. Backups fixed in **3 hours**.

### Comparison

| Metric | SPT | Practical WSJF | Winner |
|--------|-----|----------------|--------|
| Unweighted mean completion | **6.5625 hrs** | 13.625 hrs | SPT |
| Priority-weighted completion (PWCT) | 10.2 hrs | **10.167 hrs** | WSJF (narrowly) |
| Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
| Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
| Time to fix printers | **5.75 hrs** | 11 hrs | SPT |
| Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |

SPT wins the unweighted metric by completing wallpaper policies and folder
archives first. Practical WSJF restores both Critical services in less than
half the time. Its PWCT advantage is narrow (10.167 vs 10.2 hours) because
strict class ordering makes every P2-P4 ticket wait behind nine hours of
Critical work.

The unweighted metric would report that the SPT team is **more than twice
as efficient** (6.56 vs 13.63), when in reality the SPT team left a critical
email outage burning for 18.75 working hours (more than two eight-hour
business days) while updating desktop wallpaper.
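
The figures in this example can be recomputed directly from the ticket table. The following is a sketch under the same assumptions as the tables (one worker, all tickets open at time 0, estimates taken as exact); the dictionary layout and function names are ours.

```python
from itertools import accumulate

WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}
# ticket -> (priority class, estimated hours), copied from the ticket table
TICKETS = {"T1": (1, 6), "T2": (2, 4), "T3": (3, 2), "T4": (4, 0.5),
           "T5": (3, 1), "T6": (1, 3), "T7": (2, 2), "T8": (4, 0.25)}

def completions(order):
    """Map ticket -> completion time when tickets run back-to-back in `order`."""
    return dict(zip(order, accumulate(TICKETS[t][1] for t in order)))

def unweighted_mean(order):
    c = completions(order)
    return sum(c.values()) / len(c)

def pwct(order):
    c = completions(order)
    total_weight = sum(WEIGHTS[TICKETS[t][0]] for t in order)
    return sum(WEIGHTS[TICKETS[t][0]] * c[t] for t in order) / total_weight

# Three policies: shortest first, class first (short first within class),
# and pure WSJF (w/p descending, ignoring class boundaries).
spt = sorted(TICKETS, key=lambda t: TICKETS[t][1])
class_first = sorted(TICKETS, key=lambda t: (TICKETS[t][0], TICKETS[t][1]))
pure_wsjf = sorted(TICKETS, key=lambda t: -WEIGHTS[TICKETS[t][0]] / TICKETS[t][1])

for name, order in [("SPT", spt), ("class-first", class_first), ("pure WSJF", pure_wsjf)]:
    print(f"{name:12s} mean={unweighted_mean(order):.4f} "
          f"PWCT={pwct(order):.4f} T1 done at {completions(order)['T1']}")
```

Running all three policies side by side makes the trade-off visible: each one is best on a different column, so the choice of metric is itself a policy decision.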

### 10.5 Recommended Metric Suite

No single metric suffices. A complete measurement system for a priority-based
team should track:

| Metric | What it measures | Formula |
|--------|-----------------|---------|
| **PWCT** | Business-impact-weighted responsiveness | $\sum w(q_i) C_i / \sum w(q_i)$ |
| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ filtered to $q = 1$ |
| **Throughput** | Raw work capacity | Work-hours completed / calendar time |
| **Aging violations** | Starvation prevention | Count of tasks exceeding SLA, by priority |
| **Slowdown by priority class** | Equity across task types | Mean slowdown $\bar{S}$, with $S_i = C_i / p_i$, grouped by $q$ |
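
Two of the per-class rows in the table can be sketched in a few lines. The records below are hypothetical `(priority, processing hours, completion hour)` tuples, not data from the paper, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed tickets: (priority class, processing hours, completion hour).
done = [(1, 3, 3), (1, 6, 9), (2, 2, 11), (2, 4, 15), (3, 1, 16), (4, 0.5, 18.75)]

def p1_mean_time_to_resolution(records):
    """Mean completion time over Critical (q = 1) tickets only."""
    return mean(c for q, _, c in records if q == 1)

def slowdown_by_class(records):
    """Mean slowdown C/p for each priority class, keyed by class."""
    groups = defaultdict(list)
    for q, p, c in records:
        groups[q].append(c / p)
    return {q: mean(s) for q, s in sorted(groups.items())}

p1_mttr = p1_mean_time_to_resolution(done)
per_class = slowdown_by_class(done)
print(p1_mttr, per_class)
```

Reporting slowdown per class rather than as one pooled number keeps a large P4 slowdown from masking, or being masked by, the P1 figure.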

---

## 11. Devil's Advocate: The Case for Unweighted Mean Completion Time

Intellectual honesty requires acknowledging where the preceding argument
has limits. The following are genuine counterarguments, not strawmen.

### 11.1 Simplicity Has Real Value

**Argument.** The unweighted mean is trivially computable: sum the completion
times, divide by the count. It requires no priority weights, no task-size
estimates, no calibration. Every alternative proposed in Section 10 requires
priority weights, and PWCS and WSJF scheduling additionally require
estimating $p_i$ (task size) before the task is complete. These estimates
are notoriously unreliable.

**Assessment: This is true.** PWCS and PWCT require inputs (priority
weights and, for PWCS and WSJF ordering, size estimates) that introduce their
own sources of error. If size estimates are systematically wrong (and in
software engineering they often are, with large tasks underestimated and
small tasks overestimated), then the weighted metric inherits that noise.

However, the unweighted metric does not avoid this problem; it *hides* it
by implicitly setting all weights to 1 and all sizes to 1. That is not
"making no assumptions"; it is making the specific assumption that all tasks
are equally important and equally sized, which is demonstrably false in any
real system. **A known-imprecise estimate of task size is still more
informative than the implicit assumption that all sizes are equal.**

### 11.2 Minimizing the Number of People Waiting

**Argument.** If each task represents one client, then minimizing unweighted
mean completion time minimizes the total person-hours spent waiting. SPT is
optimal for this because completing short tasks first "frees" the most
people from the queue earliest.

**Assessment: This is mathematically correct.** The sum $\sum C_i$ counts
total person-time in the system. SPT genuinely minimizes this quantity.
If you run a DMV and every person's time is equally valuable regardless of
why they're there, SPT is the right policy.

The argument breaks down when:

1. **Tasks are not 1:1 with clients.** In IT, one client may submit tasks
   of varying size. Across a relationship, SPT systematically fast-tracks
   their easy requests and starves their hard ones, which is not perceived
   as good service.

2. **Waiting cost is not uniform.** A person waiting for a server outage
   to be fixed is not equivalent to a person waiting for a wallpaper change.
   The cost of waiting is proportional to the *impact* of the unresolved
   task, which is what priority encodes.

3. **The metric is applied to teams, not DMVs.** When a team's performance
   is measured by unweighted mean, the rational response is to cherry-pick,
   which is individually rational but collectively destructive.

### 11.3 SPT as a Triage Heuristic

**Argument.** In high-volume systems where task sizes cluster tightly
(e.g., a call center where most calls are 3-7 minutes), SPT approximates
FIFO and the unweighted mean approximates the weighted mean. The pathologies
described in this paper only manifest when task sizes span orders of
magnitude.

**Assessment: This is correct.** As shown in Section 8, when task sizes are
approximately uniform, all scheduling policies converge and all metrics
agree. The coefficient of variation of task size, $CV = \sigma_p / \bar{p}$,
determines the severity of the distortion:

| $CV$ | Task size distribution | Metric distortion |
|------|----------------------|-------------------|
| < 0.3 | Tight (call center) | Negligible |
| 0.3 - 1.0 | Moderate (mixed IT) | Moderate |
| > 1.0 | Wide (typical IT queue) | Severe |

For a typical IT service desk, task sizes range from 15 minutes (password
reset) to 40+ hours (infrastructure migration), a spread of more than two
orders of magnitude. A queue dominated by small tickets with a few such
large ones can push $CV$ above 2. The distortion is not a theoretical edge
case; it is the default condition.
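
A team can check which row of the table it falls in from its own ticket history. The sketch below uses the population standard deviation and two hypothetical samples; the band boundaries are the ones in the table above, and the function names are ours.

```python
from statistics import mean, pstdev

def coefficient_of_variation(sizes):
    """CV = population standard deviation of task sizes divided by their mean."""
    return pstdev(sizes) / mean(sizes)

def distortion_band(cv):
    """Map a CV value to the qualitative bands used in the table."""
    if cv < 0.3:
        return "negligible"
    return "moderate" if cv <= 1.0 else "severe"

# Hypothetical samples, not measured data.
call_center = [3, 4, 5, 5, 6, 7]                     # call lengths in minutes
service_desk = [0.25, 0.25, 0.5, 0.5, 1, 1, 2, 40]   # ticket sizes in hours

cv_calls = coefficient_of_variation(call_center)
cv_desk = coefficient_of_variation(service_desk)
print(distortion_band(cv_calls), distortion_band(cv_desk))  # negligible severe
```

In the second sample a single 40-hour migration among small tickets is enough to put $CV$ above 2, which illustrates how a heavy tail, rather than the typical ticket, drives the distortion.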

### 11.4 Gaming Requires Malice

**Argument.** The theorems show that the metric *can* be gamed, not that it
*will* be gamed. A well-intentioned team might use the unweighted mean as
a rough health indicator without actively optimizing for it, avoiding the
pathologies described.

**Assessment: This is the strongest counterargument.** If the metric is
used purely for monitoring ("are we completing things at a reasonable
pace?") and not for performance evaluation, rewards, or scheduling
decisions, then the gaming incentive is absent and the metric is relatively
harmless.

However, this argument requires the metric to remain purely informational
and never influence behavior. In practice, any metric that is reported to
management, tied to OKRs, or used in sprint retrospectives will influence
behavior. This is Goodhart's Law, and it applies to well-intentioned teams
as reliably as to cynical ones. The team need not be gaming the metric
consciously; it is sufficient that completing three easy tickets "feels
productive" while staring at one hard ticket does not. The metric validates
the feeling, and the drift happens organically.

### 11.5 Summary: When the Unweighted Mean Is Defensible

The unweighted mean completion time is a defensible metric **only when all
four conditions hold simultaneously**:

1. Task sizes are approximately uniform ($CV < 0.3$)
2. There is no priority differentiation (all tasks are equally important)
3. Each task represents exactly one client
4. The metric is not used to evaluate, reward, or direct team behavior

In a system satisfying all four conditions, such as a simple FIFO queue
with uniform jobs and no priority system, the unweighted mean is adequate,
and its simplicity is a genuine advantage.

In any system that violates even one of these conditions, which includes
virtually every IT service desk, development team, and support organization,
the metric produces the distortions proven in Sections 2-9.

The honest conclusion is not that the unweighted mean is always wrong. It is
that the conditions under which it is right are narrow, easily identified,
and rarely met in the systems where it is most commonly used.

---

## 12. Conclusion

The unweighted average completion time is a **biased statistic** that:
@@ -385,16 +839,27 @@ The unweighted average completion time is a **biased statistic** that:
3. **Contradicts Little's Law** unless tasks are uniformly sized.
4. **Degrades client satisfaction** with zero compensating productivity
   gain (Theorem 7).
5. **Actively contradicts priority systems** by carrying zero information
   about business-impact classification (Theorem 9).
6. **Can maximize priority-weighted delay** when high-priority tasks are
   larger than low-priority tasks, but by less than the ratio of their
   weights (Theorem 10).

A metric that can be improved by reordering work, without doing any
additional work, is measuring the scheduling policy, not the system's
capacity or effectiveness. When combined with a priority system, the metric
does not merely fail to reflect priorities: it can recommend the schedule
that inflicts the most damage on the highest-priority work.

The unweighted mean is defensible only under narrow, identifiable conditions
(Section 11.5): uniform task sizes, no priority system, one-to-one
client-task mapping, and no behavioral influence from the metric. These
conditions are rarely met in practice.

**Unweighted average completion time is not a fair or accurate measurement
of task execution performance. Its adoption as a team metric will
rationally produce starvation of complex work, violation of stated
priorities, inequitable client outcomes, and the illusion of productivity
where none exists.**

---