Hoare Partition, one of the simplest and most beautiful algorithms

Jun 17, 2024 · 4 min read · Algorithms ·

Share on:

Tony Hoare invented QuickSort in his 1961 paper, QuickSort. At the time of its publication, the best comparison-based sorting algorithm was merge sort. Merge sort divides an unordered array into two equally sized subarrays, sorts each subarray, and then merge the two subarrays to produce a sorted array. Merge sort is simple to understand. However, quick sort is just as simple as merge sort but more elegant. In quicksort, there is no requirement for the two subarrays to be of equal size, and there is no merging step. The beauty of quicksort doesn't stop here. In my opinion, the profound beauty of quicksort is its partitioning process, which
splits an unordered array into two unordered subarrays with the guarantee every element in one subarray is smaller than any element in the other subarray.

 1// partition a[low, high] to a left part and right part such that,
 2// all numbers in the left are less than or equal to numbers in the right part.
 3// Return the index of the last element of the left part.
 4func partition(a []int, low int, high int) int {
 5  // optional: pick a random element as the split value.
 6  var randomPivot = rand.Int()%(high-low+1) + low
 7  a[low], a[randomPivot] = a[randomPivot], a[low]
 8
 9  i, j, pivot := low-1, high+1, a[low]
10  for {
11    for i++; a[i] < pivot; i++ {
12    }
13    for j--; a[j] > pivot; j-- {
14    }
15    if i >= j {
16      return j
17    }
18    a[i], a[j] = a[j], a[i]
19  }
20}

The partitioning code is clean and short; however, its correctness is not immediately obvious.
Why is there no boundary check of i and j? Why return j instead of i? Why is the outer loop guaranteed to exit? Answering these questions helps one appreciate the beauty of Hoare's ingenious invention. In the following sections, I will prove an invariant to demonstrate the correctness of quicksort.

Invariant. At the beginning of each outer loop, $i<j$ and $A[low, i] \le pivot \le A[j, high]$.

Prove by induction: Let k be the round of outer loop. Let $i_{k}$, $j_{k}$ denote the value of $i$ and $j$ at the beginning of $k$-th round. When k=0, $i_0$=low-1, $j_0$=high+1, thus $A[low, i] \le pivot \le A[j, high]$ because both ranges are empty. Assume the invariant holds for k, we will prove it holds for k+1.

The inner loop at line 11 increases i as long as A[i] is smaller than pivot. The inner loop at line 13 decreases j as long as A[j] is larger than pivot. Therefore, $A[i_k+1, i_{k+1}-1] < pivot < A[j_{k+1}+1, j_{k}-1]$ and $A[i_{k+1}] \ge pivot \ge A[j_{k+1}]$. Line 18 then swaps $A[i_{k+1}]$ and $A[j_{k+1}]$ so we can extend the inequality to $A[i_k+1, i_{k+1}-1] \le pivot \le A[j_{k+1}+1, j_{k}-1]$. Combining it with the the induction assumption, we have $A[low, i_{k+1}] \le pivot \le A[j_{k+1}, high]$. Note that the condition at line 15 is false according to the induction assumption that there is a k+1 round.

With the invariant proved, let's answer the questions asked previously.

Q1. Why is there no boundary check of i and j?
Because $pivot \le A[j_{k}, high]$, the condition of $A[i] < pivot$ must not hold for indexes in $[j_{k}, high]$.
This means the index i must not go beyond high.

Q2. Why return j instead of i?
According to the invariant, at the line 15 of $(k+1)$-th round, $A[i_{k+1}] \ge pivot \ge A[j_{k+1}]$. Since we return the last index of the smaller part, we should return the index that is less or equal to the pivot, which is $j_{k+1}$.

Q3. Why the outer loop guaranteed to exit?
In each inner loop, i must increase at least 1 and j must decrease by at least 1. Therefore, the condition $i \ge j$ does not hold after at most (high-low+1)/2 loops.

QuickSort has been one of the best comparison-based sorting algorithms since Tony Hoare invented it in 1961. I hope you enjoy the beauty of the core of quicksort, the partitioning process, as much as I do. Hoare is a remarkable computer scientist who also introduced the Communicating Sequential Processes (CSP) in 1978. CSP is a language that describe concurrency, and the Go programming language follows the CSP model.