Fix mathematical errors in Theorems 4, 5, 10 and IT example

Corrections:

- Theorem 4: Restated from "maximizes slowdown inequality" (wrong) to "uniquely assigns max completion time to largest task" (correct). SPT actually compresses slowdown variance; harm is in absolute delay.
- Theorem 5: Completely rewritten. Old claim that LPT minimizes slowdown variance was backwards (verified: tasks [1,5,10] give SPT var=0.06, LPT var=44.2, both population variances). New theorem correctly states SPT concentrates absolute delay on the largest task.
- Theorem 10: Removed leftover draft language ("Wait"), corrected cross-term analysis. Old claim that SPT is Pareto-dominated when p_H > 8p_L was wrong (verified: n_H=2, n_L=2, p_H=10, p_L=1 gives D_SPT=275 < D_pri=283). Replaced with correct WSJF exchange argument.
- IT example: Fixed PWCT arithmetic (9.225→10.2, 6.633→10.167). Added honest discussion that aggregate PWCT fails to distinguish schedules; per-priority-class metrics are needed.
- Section 5: Added caveat that Little's Law batch-case application is not straightforward; clarified what Theorem 2 actually proves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:18:31 -04:00
parent 3edc5d33b2
commit 574eca5b27
1 changed files with 133 additions and 122 deletions
@@ -135,17 +135,23 @@ a 10-hour task.

 ## 5. Connection to Little's Law

-Little's Law states $L = \lambda W$, where $L$ is the average number of tasks
-in the system, $\lambda$ is the arrival rate, and $W$ is the average time a
-task spends in the system.
+Little's Law states $L = \lambda W$, where $L$ is the time-averaged number
+of tasks in the system, $\lambda$ is the arrival rate, and $W$ is the
+average time a task spends in the system.

-For a stable system, $L$ and $\lambda$ are determined by arrival and service
-rates — not by scheduling policy. Therefore $W = L / \lambda$ is
-**schedule-invariant** when measured correctly (i.e., weighted by the quantity
-being served).
+In a *steady-state* queueing system with fixed arrival and service rates,
+$\lambda$ and the long-run service rate are determined by the workload, not
+by scheduling policy. Little's Law then tells us that $L$ and $W$ are
+linked, but in the batch case (all $n$ tasks present at time 0), $L$ and
+$W$ are both schedule-dependent: $\bar{C} = W$, and
+$L = \sum C_i / \sum p_i$, both of which SPT minimizes.

-SPT appears to violate this only because the unweighted statistic counts
-*completions* rather than *work*, systematically underweighting large tasks.
+The invariance we proved in Theorem 2 is more specific: *work-weighted*
+mean completion time $\bar{C}_w$ is constant across schedules. This
+corresponds to measuring the system from the perspective of "how long does
+a unit of *work* wait" rather than "how long does a *task* wait." The
+unweighted statistic measures the latter and is gameable precisely because
+it counts completions rather than work.

 ---
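
The batch-case quantities in Section 5 can be checked numerically. The following is a minimal sketch, not part of the commit: the helper names (`completion_times`, `batch_stats`) and the task sizes `[1, 5, 10]` are illustrative assumptions. It computes $W = \bar{C}$, $L = \sum C_i / \sum p_i$, and the work-weighted mean $\bar{C}_w = \sum p_i C_i / \sum p_i$ for every ordering.

```python
from itertools import permutations


def completion_times(order):
    """Completion times when tasks run back to back, all present at time 0."""
    times, clock = [], 0.0
    for p in order:
        clock += p
        times.append(clock)
    return times


def batch_stats(order):
    """Return (W, L, work-weighted mean completion time) for one ordering."""
    c = completion_times(order)
    total_work = sum(order)
    mean_completion = sum(c) / len(c)      # W: unweighted mean completion time
    avg_in_system = sum(c) / total_work    # L: time-averaged tasks in system over the makespan
    work_weighted = sum(p * ci for p, ci in zip(order, c)) / total_work
    return mean_completion, avg_in_system, work_weighted


tasks = [1, 5, 10]  # hypothetical processing times
results = {order: batch_stats(order) for order in permutations(tasks)}
spt = results[(1, 5, 10)]
lpt = results[(10, 5, 1)]

for order, (w, l, cw) in sorted(results.items()):
    print(order, f"W={w:.3f}", f"L={l:.4f}", f"C_w={cw:.4f}")
```

Under these assumptions, $W$ and $L$ change with the ordering and are smallest for SPT, while the work-weighted mean is identical for all six orderings, which is the invariance attributed to Theorem 2.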

@@ -196,44 +202,34 @@ Client satisfaction is inversely related to slowdown: a client who waits
 2x their task size is more satisfied than one who waits 20x, regardless of
 the absolute times involved.

-**Theorem 4 (SPT Maximizes Slowdown Inequality).** Among all schedules,
-SPT maximizes the difference between the maximum and minimum slowdown ratios.
+**Theorem 4 (SPT Uniquely Maximizes Completion Time of the Largest Task).**
+Among all policies that order tasks by processing time, SPT is the unique
+policy that assigns the maximum possible completion time ($\sum p_i$) to the largest task.

 **Proof.**

-Under any schedule $\sigma$, the task in position $k$ has completion time
-$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$ and slowdown:
+SPT sorts tasks in ascending order of $p_i$, placing the largest task
+$p_{\max}$ in the last position. The last task in any schedule has
+completion time $\sum_{i=1}^{n} p_i$, which is the maximum completion time
+any individual task can receive. Therefore, under SPT:

-$$S_{\sigma(k)} = \frac{\sum_{j=1}^{k} p_{\sigma(j)}}{p_{\sigma(k)}}$$
+$$C_{\max\text{-task}}^{\text{SPT}} = \sum_{i=1}^{n} p_i$$

-Under SPT, the last task (position $n$) is the largest, $p_{\max}$, with:
+Under any schedule that does not place $p_{\max}$ last, the largest task
+completes strictly before $\sum p_i$. SPT is the unique schedule (among
+those ordered by processing time) that assigns this worst-case completion
+time to the largest task.

-$$S_n^{\text{SPT}} = \frac{\sum_{i=1}^{n} p_i}{p_{\max}}$$
+Note on slowdown: SPT actually *compresses* slowdown ratios ($S_i = C_i / p_i$)
+because larger tasks in later positions have large denominators that absorb
+the accumulated sum. For example, with tasks $[1, 5, 10]$:

-The first task is the smallest, $p_{\min}$, with:
+- SPT: slowdowns $[1, 1.2, 1.6]$ — low variance
+- LPT: slowdowns $[1, 3, 16]$ — high variance

-$$S_1^{\text{SPT}} = \frac{p_{\min}}{p_{\min}} = 1$$
-
-The slowdown range under SPT is:
-
-$$\Delta S^{\text{SPT}} = \frac{\sum p_i}{p_{\max}} - 1$$
-
-Now consider the reverse schedule (Longest Processing Time first, LPT).
-The largest task goes first with slowdown 1. The smallest task goes last:
-
-$$S_n^{\text{LPT}} = \frac{\sum p_i}{p_{\min}}, \quad S_1^{\text{LPT}} = 1$$
-
-While LPT has a larger maximum slowdown, its minimum is also 1. The critical
-difference is *which clients* suffer. Under SPT, the client with the
-**largest task** — typically the most complex, highest-stakes, or most
-commercially significant request — receives the worst experience. Under LPT,
-the client with the smallest task suffers most, but their absolute wait is
-bounded by $\sum p_i$, the same total for both schedules.
-
-More precisely: under SPT, the client with the largest task has completion
-time $\sum p_i$ (the maximum possible), while under any other schedule, that
-client finishes strictly earlier. SPT **uniquely minimizes the satisfaction
-of the highest-effort client**. $\blacksquare$
+SPT's harm to large-task clients is not visible in the slowdown ratio. It is
+visible in **absolute completion time**: the largest task finishes last, at
+$\sum p_i$, while under any ordering that does not place it last it finishes earlier. $\blacksquare$

 **Corollary 4.1.** A team optimizing unweighted mean completion time will
 systematically deliver the worst experience to clients with the most
@@ -244,42 +240,40 @@ The only way to lower the unweighted average is to complete more small tasks
 early, which necessarily means completing large tasks later. The metric
 improves *because* high-effort clients are deprioritized.

-### 7.2 The Fairness Benchmark: Proportional Slowdown
+### 7.2 The Absolute Delay Burden

-A **fair** schedule is one where all clients experience equal slowdown:
+The slowdown ratio $S_i = C_i / p_i$ might suggest SPT is *fair* — it
+compresses slowdown variance by giving everyone a ratio close to 1. But
+this obscures the real cost. The correct measure of burden is the
+**absolute delay** experienced by each task:

-$$S_i = S_j \quad \forall \, i, j$$
+$$\Delta_i = C_i - p_i$$

-This means every client waits the same multiple of their task's inherent
-processing time. A 1-hour task might wait 2 hours; a 10-hour task waits 20
-hours. The ratio is the same.
+This is the time a task spends waiting for other tasks, independent of its
+own size. Under any sequential schedule, the total delay across all tasks
+is schedule-dependent (it equals $\sum C_i - \sum p_i$), and SPT minimizes
+this total. But the *distribution* of delay matters.

-**Theorem 5 (Proportional Scheduling).** The unique schedule achieving equal
-slowdown for all tasks is to order tasks so that each task's completion time
-is proportional to its processing time:
+**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT, the
+largest task bears the maximum absolute delay any schedule can impose on it.

-$$C_i = S \cdot p_i \quad \text{where } S = \frac{\sum p_i}{\sum p_i} \cdot \frac{\sum_{j} p_j}{p_i} \text{ ... }$$
+**Proof.** Under SPT, the largest task is in position $n$ with:

-In general, equal slowdown is not achievable with sequential scheduling
-(it requires parallel or proportional-share scheduling). However, the
-schedule that **minimizes slowdown variance** among sequential schedules is
-**Longest Processing Time first (LPT)** — the exact opposite of SPT.
+$$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$

-**Proof sketch.** Under LPT, large tasks go first and receive slowdown
-close to 1. Small tasks go last and accumulate more slowdown, but their
-absolute wait is still bounded. The variance in slowdown ratios is minimized
-because the tasks with the largest denominator ($p_i$) also have the
-largest numerator ($C_i$), keeping the ratios compressed.
+This is the sum of all other tasks' processing times — the maximum possible
+delay for any single task. Under any schedule where the largest task is not
+last, its delay is strictly less than $\sum_{i \ne \max} p_i$.

-Under SPT, the opposite occurs: tasks with the smallest denominator get the
-smallest numerator, and tasks with the largest denominator get the largest
-numerator, maximizing the spread.
+Meanwhile, SPT gives the smallest task zero delay ($\Delta_1^{\text{SPT}} = 0$).
+Each task waits only for the smaller tasks ahead of it, so delay grows with task size.
+$\blacksquare$

-Formally, for any two schedules $\sigma_1$ (SPT) and $\sigma_2$ (LPT):
-
-$$\text{Var}(S^{\text{SPT}}) \ge \text{Var}(S^{\text{LPT}})$$
-
-with equality only when all $p_i$ are equal. $\blacksquare$
+The tension is this: SPT minimizes total delay (good for aggregate
+efficiency) by concentrating delay onto the tasks best able to "absorb" it
+in slowdown-ratio terms. But in absolute terms — hours spent waiting — the
+largest task bears the full weight. If that task represents a critical
+business need, the absolute delay, not the ratio, determines the damage.
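
The contrast between slowdown ratio and absolute delay can be reproduced with a short script. This is an illustrative sketch rather than code from the commit: `schedule_metrics` is a hypothetical helper, and the task sizes `[1, 5, 10]` are the ones used in the Theorem 4 note. Variance here means population variance.

```python
from statistics import pvariance


def schedule_metrics(order):
    """Return (slowdowns, delays) for tasks run back to back in the given order."""
    clock = 0.0
    slowdowns, delays = [], []
    for p in order:
        delays.append(clock)         # Delta_i = C_i - p_i: time spent waiting
        clock += p                   # clock is now the completion time C_i
        slowdowns.append(clock / p)  # S_i = C_i / p_i
    return slowdowns, delays


spt_slow, spt_delay = schedule_metrics([1, 5, 10])   # shortest first
lpt_slow, lpt_delay = schedule_metrics([10, 5, 1])   # longest first

print("SPT slowdowns", spt_slow, "variance", round(pvariance(spt_slow), 3))
print("LPT slowdowns", lpt_slow, "variance", round(pvariance(lpt_slow), 3))
print("Delay of the size-10 task: SPT", spt_delay[-1], "LPT", lpt_delay[0])
print("Total delay: SPT", sum(spt_delay), "LPT", sum(lpt_delay))
```

For this input, SPT has the smaller slowdown variance and the smaller total delay, yet the size-10 task waits 6 time units under SPT and 0 under LPT.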

 ### 7.3 Productivity Is Not Improved

@@ -318,9 +312,9 @@ Combining Theorems 4, 5, and 6:
 | Measure | Effect of optimizing unweighted mean |
 |---------|--------------------------------------|
 | Throughput (work/time) | No change (Theorem 6) |
-| Client satisfaction for small tasks | Improves |
-| Client satisfaction for large tasks | **Worsens maximally** (Theorem 4) |
-| Satisfaction equity across clients | **Worsens maximally** (Theorem 5) |
+| Delay for small tasks | Minimized — approaches zero (SPT) |
+| Delay for large tasks | **Maximized**: the largest task waits behind every other task (Theorem 5) |
+| Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) |
 | Overall perceived quality of service | **Net negative** (see below) |

 The net effect on perceived quality is negative because:
@@ -346,10 +340,10 @@ The net effect on perceived quality is negative because:
 size, adopting unweighted mean completion time as a performance metric:

 (a) Provides **zero productivity gain** (Theorem 6), while
-(b) **Maximally degrading satisfaction** for clients with the largest tasks
+(b) **Assigning the maximum possible completion time** to the largest task
    (Theorem 4), and
-(c) **Maximally increasing inequality** in service quality across clients
-    (Theorem 5).
+(c) **Concentrating queuing delay** onto the largest tasks while
+    eliminating delay for the smallest (Theorem 5).

 This is not a tradeoff — there is no compensating benefit on the productivity
 side. The metric creates a pure transfer of service quality from high-effort
@@ -475,54 +469,48 @@ $$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$

 This measures the total business-impact-weighted time spent waiting.

-**Theorem 10 (SPT Maximizes Priority-Weighted Delay in the Worst Case).**
-Among all schedules, SPT produces the highest priority-weighted delay cost
-when high-priority tasks are large and low-priority tasks are small.
+**Theorem 10 (SPT and Priority-Weighted Delay Cost).**
+The optimal schedule for minimizing priority-weighted delay cost $D(\sigma)$
+is WSJF: order by $w(q_i)/p_i$ descending. SPT orders by $1/p_i$ descending,
+which ignores priority entirely, and it produces higher $D$ than WSJF
+whenever its order is not also a valid WSJF order.

-**Proof.** Consider the worst case: all Critical ($q = 1$) tasks have
-processing time $p_H$ and all Low ($q = 4$) tasks have processing time
-$p_L$, with $p_H > p_L$. Let there be $n_H$ critical tasks and $n_L$ low
-tasks, $n = n_H + n_L$.
+**Proof.** By the standard exchange argument (as in Theorem 1), if task $i$
+runs immediately before task $j$, swapping the two reduces $D$ by:

-SPT places all $n_L$ low tasks first, then all $n_H$ critical tasks.
+$$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$

-The priority-weighted delay cost under SPT:
+The swap improves $D$ when $\Delta D > 0$, i.e., when $w(q_j)/p_j > w(q_i)/p_i$
+but $j$ is scheduled after $i$. Therefore the optimal order is decreasing
+$w(q_i)/p_i$ — this is the WSJF rule.

-$$D_{\text{SPT}} = w(4) \sum_{k=1}^{n_L} k \cdot p_L + w(1) \sum_{k=1}^{n_H} (n_L \cdot p_L + k \cdot p_H)$$
+SPT orders by $p_i$ ascending (equivalently, $1/p_i$ descending), which
+corresponds to WSJF only when $w(q_i) = \text{const}$ — i.e., when all
+tasks have equal priority.

-$$= 1 \cdot \frac{n_L(n_L+1)}{2} p_L + 8 \left( n_H \cdot n_L \cdot p_L + \frac{n_H(n_H+1)}{2} p_H \right)$$
+**Example.** Two tasks: Critical ($w = 8$, $p_H = 10$) and Low ($w = 1$, $p_L = 1$).

-Under priority-first scheduling (all Critical tasks first):
+WSJF scores: Critical = $8/10 = 0.8$, Low = $1/1 = 1.0$.

-$$D_{\text{priority}} = w(1) \sum_{k=1}^{n_H} k \cdot p_H + w(4) \sum_{k=1}^{n_L} (n_H \cdot p_H + k \cdot p_L)$$
+WSJF places the Low task first (higher $w/p$), same as SPT. Here, SPT and
+WSJF agree because the Low task's tiny size dominates despite its low weight.

-$$= 8 \cdot \frac{n_H(n_H+1)}{2} p_H + 1 \cdot \left( n_L \cdot n_H \cdot p_H + \frac{n_L(n_L+1)}{2} p_L \right)$$
+Now consider: Critical ($w = 8$, $p_H = 3$) and Low ($w = 1$, $p_L = 2$).

-The difference $D_{\text{SPT}} - D_{\text{priority}}$ simplifies. The critical
-cross-terms are:
+WSJF scores: Critical = $8/3 = 2.67$, Low = $1/2 = 0.5$.

- SPT charges $8 \cdot n_H \cdot n_L \cdot p_L$ for Critical tasks waiting
-  behind Low tasks.
- Priority charges $1 \cdot n_L \cdot n_H \cdot p_H$ for Low tasks waiting
-  behind Critical tasks.
+WSJF places Critical first. SPT places Low first (smaller $p$). The costs:

-Since $w(1) = 8$ and $w(4) = 1$:
+- SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$
+- WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$

-$$D_{\text{SPT}} - D_{\text{priority}} = n_H \cdot n_L \cdot (8 p_L - p_H) + n_H \cdot n_L \cdot (p_H - 8 p_L)$$
+SPT incurs 45% more priority-weighted delay because it ignores the 8x
+priority weight of the Critical task.

-Wait — let me compute this more carefully. The cross-term in SPT is the
-cost of all Critical tasks being delayed by all Low tasks:
-
-$$\Delta_{\text{cross}} = w(1) \cdot n_H \cdot n_L \cdot p_L - w(4) \cdot n_L \cdot n_H \cdot p_H$$
-$$= n_H \cdot n_L \cdot (8 p_L - p_H)$$
-
-When $p_H > 8 p_L$, the priority-first schedule wins on *both* the
-priority-weighted metric and unweighted metric — SPT is Pareto-dominated.
-When $p_L < p_H \le 8 p_L$, SPT wins on the unweighted metric but loses
-on the priority-weighted metric. In either case:
-
-**The unweighted metric recommends the schedule that inflicts the most
-business-impact-weighted delay whenever large tasks are high-priority.** $\blacksquare$
+In general, SPT diverges from WSJF, and produces suboptimal $D$, whenever
+some larger task has a strictly higher $w(q_i)/p_i$ ratio than a smaller one.
+In practice, Critical tasks tend to be larger (outages, security incidents),
+so the divergence can be systematic rather than occasional. $\blacksquare$

 ---
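
The two worked examples in Theorem 10 can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and each task is assumed to be a `(weight, processing_time)` pair with the weights 8 (Critical) and 1 (Low) used in the examples.

```python
from itertools import permutations


def weighted_delay(order):
    """D = sum of w * C over (weight, processing_time) pairs run in order."""
    clock, total = 0.0, 0.0
    for weight, p in order:
        clock += p
        total += weight * clock
    return total


def spt_order(tasks):
    """Shortest processing time first; priority weight is ignored."""
    return sorted(tasks, key=lambda t: t[1])


def wsjf_order(tasks):
    """Highest weight-to-size ratio first."""
    return sorted(tasks, key=lambda t: t[0] / t[1], reverse=True)


# Second example: Critical (w=8, p=3) and Low (w=1, p=2).
tasks = [(8, 3), (1, 2)]
d_spt = weighted_delay(spt_order(tasks))
d_wsjf = weighted_delay(wsjf_order(tasks))
d_best = min(weighted_delay(order) for order in permutations(tasks))
print(d_spt, d_wsjf, d_best)

# First example: Critical (w=8, p=10) and Low (w=1, p=1); both rules agree.
agree = [(8, 10), (1, 1)]
same_order = spt_order(agree) == wsjf_order(agree)
print(same_order)
```

The exhaustive minimum over all orderings matches the WSJF cost, which is what the exchange argument predicts. Enumerating permutations is only practical for small queues.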

@@ -633,7 +621,7 @@ Consider an IT team with the following ticket queue on a Monday morning:
 | 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |

 - **Unweighted mean completion:** $(0.25 + 0.75 + 1.75 + 3.75 + 5.75 + 8.75 + 12.75 + 18.75) / 8 = 6.5625$ hours
- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 9.225$ hours
+- **PWCT:** $(1 \cdot 0.25 + 1 \cdot 0.75 + 2 \cdot 1.75 + 2 \cdot 3.75 + 4 \cdot 5.75 + 8 \cdot 8.75 + 4 \cdot 12.75 + 8 \cdot 18.75) / 30 = 306/30 = 10.2$ hours
 - Email server is down for **18.75 hours**. Database backups fail for **8.75 hours**.

 **WSJF order (optimizing PWCT by $w(q)/p$ descending):**
@@ -669,7 +657,7 @@ priority class ordering and only applying WSJF *within* priority classes.
 | 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |

 - **Unweighted mean completion:** $(3 + 9 + 11 + 15 + 16 + 18 + 18.25 + 18.75) / 8 = 13.625$ hours
- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 6.633$ hours
+- **PWCT:** $(8 \cdot 3 + 8 \cdot 9 + 4 \cdot 11 + 4 \cdot 15 + 2 \cdot 16 + 2 \cdot 18 + 1 \cdot 18.25 + 1 \cdot 18.75) / 30 = 305/30 = 10.167$ hours
 - Email server restored in **9 hours**. Backups fixed in **3 hours**.

 ### Comparison
@@ -677,32 +665,54 @@ priority class ordering and only applying WSJF *within* priority classes.
 | Metric | SPT | Practical WSJF | Winner |
 |--------|-----|----------------|--------|
 | Unweighted mean completion | **6.5625 hrs** | 13.625 hrs | SPT |
-| Priority-weighted completion (PWCT) | 9.225 hrs | **6.633 hrs** | WSJF |
+| Priority-weighted completion (PWCT) | 10.2 hrs | **10.167 hrs** | WSJF |
 | Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
 | Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
 | Time to fix printers | **5.75 hrs** | 11 hrs | SPT |
 | Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |

-SPT wins the unweighted metric by completing wallpaper policies and folder
-archives first. WSJF wins every metric that accounts for business impact.
+The PWCT values are nearly identical (10.2 vs 10.167) because PWCT — as a
+*weighted average of completion times* — is dampened by the fact that total
+work is constant. **PWCT is not the right metric for this comparison.** The
+real difference is visible in the individual completion times of critical
+tasks: the email server is down for 18.75 hours under SPT versus 9 hours
+under WSJF. The database backups fail for 8.75 hours versus 3 hours.

-The unweighted metric would report that the SPT team is **more than twice
-as efficient** (6.56 vs 13.63), when in reality the SPT team left a critical
-email outage burning for nearly an entire business day while updating desktop
-wallpaper.
+The unnormalized **priority-weighted delay cost** $D = \sum w(q_i) \cdot C_i$,
+which is PWCT multiplied by $\sum w(q_i) = 30$, tells the same story:
+
+- SPT: $D = 306$ priority-weighted hours
+- Practical WSJF: $D = 305$ priority-weighted hours
+
+Again, the aggregate is similar. The damage from SPT is not in the
+aggregate — it is in the *distribution*: critical systems burn while
+cosmetic tasks are polished. A metric that cannot distinguish between these
+two schedules — despite one leaving the email server down for twice as long
+— is not measuring what matters.
+
+The unweighted metric, however, confidently reports SPT as **more than twice
+as efficient** (6.56 vs 13.63), rewarding the team that updated desktop
+wallpaper while the email server was on fire.

 ### 10.5 Recommended Metric Suite

-No single metric suffices. A complete measurement system for a priority-based
-team should track:
+The IT example reveals that even priority-weighted aggregate metrics (PWCT)
+can fail to distinguish good from bad schedules, because aggregation hides
+distributional damage. No single metric suffices. A complete measurement
+system for a priority-based team should track:

 | Metric | What it measures | Formula |
 |--------|-----------------|---------|
-| **PWCT** | Business-impact-weighted responsiveness | $\sum w(q_i) C_i / \sum w(q_i)$ |
+| **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ |
 | **P1 mean time to resolution** | Critical incident response | $\bar{C}$ filtered to $q = 1$ |
 | **Throughput** | Raw work capacity | Work-hours completed / calendar time |
 | **Aging violations** | Starvation prevention | Count of tasks exceeding SLA by priority |
-| **Slowdown by priority class** | Equity across task types | $\bar{S}$ grouped by $q$ |
+| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ filtered to $q \le 2$ |
+
+The key insight from our analysis: **per-priority-class metrics** (rows 1-2,
+5) expose scheduling failures that aggregate metrics hide. If P1 mean time
+to resolution is 14 hours while P4 mean is 0.5 hours, the team is
+optimizing the wrong metric — regardless of what the aggregate says.

 ---
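
The aggregate and per-class figures for the IT example can be recomputed from the weights and completion times that appear in the two PWCT formulas. This is a hedged sketch: the helper names are hypothetical, and classes are keyed by priority weight (8 for P1, 1 for P4) because only those two labels are visible in the excerpt.

```python
from collections import defaultdict

# (priority weight, completion time in hours), copied from the PWCT formulas.
spt_rows = [(1, 0.25), (1, 0.75), (2, 1.75), (2, 3.75),
            (4, 5.75), (8, 8.75), (4, 12.75), (8, 18.75)]
wsjf_rows = [(8, 3), (8, 9), (4, 11), (4, 15),
             (2, 16), (2, 18), (1, 18.25), (1, 18.75)]


def unweighted_mean(rows):
    return sum(c for _, c in rows) / len(rows)


def pwct(rows):
    """Priority-weighted mean completion time."""
    return sum(w * c for w, c in rows) / sum(w for w, _ in rows)


def mean_by_weight(rows):
    """Mean completion time per priority weight (a stand-in for priority class)."""
    groups = defaultdict(list)
    for w, c in rows:
        groups[w].append(c)
    return {w: sum(cs) / len(cs) for w, cs in sorted(groups.items(), reverse=True)}


for name, rows in (("SPT", spt_rows), ("Practical WSJF", wsjf_rows)):
    print(name, unweighted_mean(rows), round(pwct(rows), 3), mean_by_weight(rows))
```

The per-weight means separate the two schedules clearly (13.75 versus 6 hours for the weight-8 class), while the PWCT values differ by only about 0.03 hours.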

@@ -841,8 +851,9 @@ The unweighted average completion time is a **biased statistic** that:
   gain (Theorem 7).
 5. **Actively contradicts priority systems** by carrying zero information
   about business-impact classification (Theorem 9).
-6. **Maximizes priority-weighted delay** in the most common real-world
-   scenario where high-priority tasks are large (Theorem 10).
+6. **Ignores priority entirely** in its scheduling recommendation,
+   producing suboptimal priority-weighted delay whenever a larger task
+   has a higher weight-to-size ratio than a smaller one (Theorem 10).

 A metric that can be improved by reordering work — without doing any
 additional work — is measuring the scheduling policy, not the system's