Proof: unweighted avg completion time is a biased metric
Mathematical proof that unweighted average task completion time is gameable by scheduling policy (SPT), while work-weighted completion time is schedule-invariant. Demonstrates that SPT's apparent advantage is an artifact of the metric, not genuine throughput improvement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,196 @@
# Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling

A mathematical proof that unweighted average task completion time is a biased
statistic that incentivizes cherry-picking easy work, and that the scheduling
advantage it appears to reveal is an artifact of the metric, not a reflection
of genuine throughput or service quality.

---

## 1. Definitions

Let there be **n** tasks with processing times $p_1, p_2, \ldots, p_n > 0$, all
available at time 0.

A **schedule** $\sigma$ is a permutation of $\{1, 2, \ldots, n\}$ giving the
order in which tasks run on a single executor, one at a time, without
preemption or idle time.

The **completion time** of task $\sigma(k)$, the $k$-th task to run under
schedule $\sigma$, is:

$$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$$

The **unweighted mean completion time** is:

$$\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}$$

The **work-weighted mean completion time** is:

$$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}$$
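The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proof: the function names are hypothetical, tasks are indexed from 0 rather than 1, and processing times are assumed to be integers so that `Fraction` keeps the means exact.

```python
from fractions import Fraction


def completion_times(p, order):
    """Map each task index to its completion time when tasks run in `order`."""
    finished_at = {}
    elapsed = 0
    for i in order:
        elapsed += p[i]          # task i runs right after its predecessors
        finished_at[i] = elapsed
    return finished_at


def unweighted_mean(p, order):
    """Mean completion time with every task counted once."""
    c = completion_times(p, order)
    return Fraction(sum(c.values()), len(order))


def work_weighted_mean(p, order):
    """Mean completion time with each task weighted by its processing time."""
    c = completion_times(p, order)
    return Fraction(sum(p[i] * c[i] for i in order), sum(p[i] for i in order))


p = [3, 1, 2]        # illustrative processing times (assumed integers)
order = [1, 2, 0]    # run task 1, then task 2, then task 0

print(completion_times(p, order))    # {1: 1, 2: 3, 0: 6}
print(unweighted_mean(p, order))     # 10/3
print(work_weighted_mean(p, order))  # 25/6
```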

---

## 2. SPT Is Optimal for the Unweighted Statistic

**Theorem 1.** The schedule that minimizes $\bar{C}(\sigma)$ is Shortest
Processing Time first (SPT): sort tasks so that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.

**Proof (exchange argument).**

Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy
$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$ be the
start time of task $i$. Swapping $i$ and $j$ leaves the completion time of
every other task unchanged, because the pair still occupies the interval from
$t$ to $t + p_i + p_j$.

| | Task $i$ finishes | Task $j$ finishes | Sum |
|---|---|---|---|
| **Before swap** ($i$ then $j$) | $t + p_i$ | $t + p_i + p_j$ | $2t + 2p_i + p_j$ |
| **After swap** ($j$ then $i$) | $t + p_j + p_i$ | $t + p_j$ | $2t + p_i + 2p_j$ |

The swap reduces the sum of completion times by:

$$(2t + 2p_i + p_j) - (2t + p_i + 2p_j) = p_i - p_j > 0$$

Every swap of a longer-before-shorter adjacent pair strictly reduces the total,
so no schedule containing such a pair can be optimal. Any non-SPT schedule
contains such a pair, and there are only finitely many schedules, so a
minimizer exists and must be an SPT order. SPT orders differ only in how they
arrange tasks of equal length, so they all have the same total. Therefore SPT
minimizes $\bar{C}(\sigma)$, uniquely up to the order of tasks with equal
processing times. $\blacksquare$
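As a sanity check on the exchange argument, a brute-force search over every permutation of a small instance should land on the SPT total. The sketch below assumes 0-based task indices and an illustrative instance with two equal-length tasks; the helper names are hypothetical. It also shows that the minimizer is unique only up to the order of tied tasks.

```python
from itertools import permutations


def total_completion_time(p, order):
    """Sum of completion times when tasks run in `order`."""
    elapsed = 0
    total = 0
    for i in order:
        elapsed += p[i]
        total += elapsed
    return total


def spt_order(p):
    """Task indices sorted by non-decreasing processing time."""
    return sorted(range(len(p)), key=lambda i: p[i])


def best_orders(p):
    """Exhaustively find the minimum total and every order achieving it."""
    orders = list(permutations(range(len(p))))
    best = min(total_completion_time(p, o) for o in orders)
    return best, [o for o in orders if total_completion_time(p, o) == best]


p = [4, 1, 3, 1]     # illustrative instance; tasks 1 and 3 tie
best, argmins = best_orders(p)

print(best)                                              # 17
print(best == total_completion_time(p, spt_order(p)))    # True
print(argmins)   # [(1, 3, 2, 0), (3, 1, 2, 0)]: both are SPT orders
```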

---

## 3. The Work-Weighted Statistic Is Schedule-Invariant

**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$ is
the same for every schedule $\sigma$.

**Proof.**

Expand the numerator:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$

Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. Write
$b \preceq_\sigma a$ when $b$ is scheduled no later than $a$ (allowing $a = b$)
and $b \prec_\sigma a$ when $b$ is scheduled strictly earlier. The double sum
ranges over every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:

$$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$

For any pair $(a, b)$ with $a \ne b$, exactly one of $\{b \prec_\sigma a\}$
or $\{a \prec_\sigma b\}$ holds. The diagonal terms ($a = b$) contribute $p_a^2$
regardless of order. Therefore:

$$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$

Now consider the complementary off-diagonal sum:

$$\sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b$$

Together the two off-diagonal sums cover every ordered pair $(a, b)$ with
$a \ne b$ exactly once:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$

The right-hand side is schedule-independent. Exchanging the names $a$ and $b$
turns one off-diagonal sum into the other, and $p_a p_b = p_b p_a$, so the two
sums are equal and each is half of the total:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$

Using $\left(\sum_a p_a\right)^2 = \sum_a p_a^2 + \sum_{a \ne b} p_a \, p_b$, this gives:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2$$

This expression contains no reference to $\sigma$. Since the denominator
$\sum_a p_a$ is also schedule-independent:

$$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2}{\sum_a p_a}$$

is **constant across all schedules**. $\blacksquare$
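The closed form can be checked numerically by evaluating the work-weighted mean for every permutation and comparing the results with the formula. This is a sketch under the same assumptions as the definitions (single executor, all tasks available at time 0); the function names are hypothetical and integer processing times are assumed so that `Fraction` stays exact.

```python
from fractions import Fraction
from itertools import permutations


def work_weighted_mean(p, order):
    """Work-weighted mean completion time for the given run order."""
    elapsed = 0
    weighted = 0
    for i in order:
        elapsed += p[i]
        weighted += p[i] * elapsed
    return Fraction(weighted, sum(p))


def closed_form(p):
    """((sum p)^2 + sum p^2) / (2 * sum p), the value from Theorem 2."""
    total = sum(p)
    return Fraction(total * total + sum(x * x for x in p), 2 * total)


p = [2, 7, 1, 5]     # illustrative instance
values = {work_weighted_mean(p, o) for o in permutations(range(len(p)))}

print(values)            # {Fraction(152, 15)}: one value across all 24 orders
print(closed_form(p))    # 152/15
```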

---

## 4. Concrete Example

Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours.

### SPT order (A first)

| Task | Completion time (hours) |
|------|-------------------------|
| A | 1 |
| B | 11 |

- Unweighted mean: $(1 + 11) / 2 = 6.0$
- Work-weighted mean: $(1 \times 1 + 10 \times 11) / 11 = 111/11 \approx 10.09$

### Reverse order (B first)

| Task | Completion time (hours) |
|------|-------------------------|
| B | 10 |
| A | 11 |

- Unweighted mean: $(10 + 11) / 2 = 10.5$
- Work-weighted mean: $(10 \times 10 + 1 \times 11) / 11 = 111/11 \approx 10.09$

SPT appears **4.5 hours better** on the unweighted metric but provides
**zero improvement** on the work-weighted metric. The apparent advantage exists
only because the unweighted statistic lets a 1-hour task "vote" equally with
a 10-hour task.
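
---

## 5. Connection to Little's Law

Little's Law states $L = \lambda W$, where $L$ is the average number of tasks
in the system, $\lambda$ is the arrival rate, and $W$ is the average time a
task spends in the system.

Little's Law holds under every scheduling policy, and it counts *tasks*. For a
stable system the arrival rate $\lambda$ is fixed by the workload, but $L$ is
not: SPT clears short tasks quickly, which lowers the average number of tasks
present, and $W = L / \lambda$ falls with it. SPT therefore does not violate
Little's Law. It lowers a per-task average by changing which tasks are left
waiting.

The quantity that is **schedule-invariant** is the one measured in *work*.
Under any policy that never idles while tasks are waiting, the executor removes
work at the same rate, so the backlog of unfinished work at each moment is the
same for every order. The work-weighted statistic of Section 3 plays the
analogous role for a fixed batch, because it weights each task by the quantity
being served.

SPT looks better only because the unweighted statistic counts
*completions* rather than *work*, systematically underweighting large tasks.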

---

## 6. Consequences

**Theorem 3 (Metric Bias).** Any schedule that minimizes unweighted mean
completion time gives the largest task the maximum completion time it can have
under any schedule.

**Proof.** By Theorem 1, a minimizing schedule is an SPT order, which places a
largest task last. Its completion time equals the total processing time
$\sum p_i$, which is the maximum possible completion time for any individual
task. Any order that runs the largest task earlier, for example a FIFO order in
which it was not submitted last, lets it finish sooner. $\blacksquare$

This creates a **starvation incentive**. In the fixed batch above the largest
task is merely last, but when new small tasks keep arriving, a rational agent
optimizing the unweighted statistic keeps deferring large tasks in favor of
small ones, potentially indefinitely.
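The deferral effect can be illustrated with a small simulation. Note that it steps outside the batch model of Section 1 by letting tasks arrive over time, and that the workload (one 10-hour task plus a 1-hour task arriving every hour for 20 hours), the `simulate` helper, and the two pick rules are all hypothetical choices made for illustration. Under the non-preemptive SPT rule the large task waits until the stream of small tasks ends, so lengthening the stream pushes its completion time out by the same amount.

```python
def simulate(tasks, pick):
    """Run tasks non-preemptively on one executor.

    tasks: list of (name, arrival_time, processing_time) tuples
    pick:  function that chooses the next task from the waiting list
    Returns a dict mapping task name to completion time.
    """
    pending = sorted(tasks, key=lambda t: t[1])   # not yet arrived
    waiting, done, now = [], {}, 0
    while pending or waiting:
        # Admit every task that has arrived by the current time.
        while pending and pending[0][1] <= now:
            waiting.append(pending.pop(0))
        if not waiting:
            now = pending[0][1]                   # idle until the next arrival
            continue
        task = pick(waiting)
        waiting.remove(task)
        now += task[2]
        done[task[0]] = now
    return done


def spt(waiting):
    """Shortest processing time among the tasks currently waiting."""
    return min(waiting, key=lambda t: t[2])


def fifo(waiting):
    """Earliest arrival among the tasks currently waiting."""
    return min(waiting, key=lambda t: t[1])


# Hypothetical workload: one big task at t=0 and a small task every hour.
tasks = [("big", 0, 10)] + [(f"small{k}", k, 1) for k in range(20)]

print(simulate(tasks, spt)["big"])    # 30: runs only after all 20 small tasks
print(simulate(tasks, fifo)["big"])   # 10: runs first
```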

### Real-world manifestations

| Domain | Gameable metric | Perverse outcome |
|--------|----------------|------------------|
| Support desks | Tickets closed / day | Complex issues ignored |
| Sprint planning | Story count velocity | Work split into trivial pieces |
| Emergency rooms | Average wait time | Critical patients deprioritized |
| Academic publishing | Papers per year | Incremental work favored over deep research |

---

## 7. Conclusion

The unweighted average completion time is a **biased statistic** that:

1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
   completion time, which is schedule-invariant (Theorem 2).
2. **Incentivizes starvation** of large tasks (Theorem 3).
3. **Counts completions rather than work**, so it underweights large tasks
   unless all tasks are the same size (Section 5).

A metric that can be improved by reordering work, without doing any
additional work, is measuring the scheduling policy, not the system's
capacity or effectiveness.

**Unweighted average completion time is not a fair or accurate measurement
of task execution performance.**

---

*This proof was developed conversationally and formalized on 2026-03-28.*