Hoare Partition, one of the simplest and most beautiful algorithms
Tony Hoare invented QuickSort in 1961. At the time of its publication, the best comparison-based sorting algorithm was merge sort. Merge sort divides an unordered array into two equally sized subarrays, sorts each subarray, and then merge the two subarrays to produce a sorted array. Merge sort is simple to understand. However, quicksort is just as simple as merge sort but more elegant. In quicksort, there is no requirement for the two subarrays to be of equal size, and there is no merging step. The beauty of quicksort doesn't stop here. In my opinion, the profound beauty of quicksort is its partitioning process, which splits an unordered array into two unordered subarrays with the guarantee every element in one subarray is smaller than every element in the other subarray.
1// partition a[low, high] to a left part and right part such that,
2// numbers in the left are less than or equal to numbers in the right part.
3// Return the index of the last element of the left part.
4// Both low and high indexes are inclusive.
5func partition(a []int, low int, high int) int {
6 // optional: pick a random element as the split value.
7 var randomPivot = rand.Int()%(high-low+1) + low
8 a[low], a[randomPivot] = a[randomPivot], a[low]
9
10 i, j, pivot := low-1, high+1, a[low]
11 for {
12 for i++; a[i] < pivot; i++ {
13 }
14 for j--; a[j] > pivot; j-- {
15 }
16 if i >= j {
17 return j
18 }
19 a[i], a[j] = a[j], a[i]
20 }
21}
The partitioning code is clean and short; however, its correctness is not immediately obvious.
Why is there no boundary check of i
and j
? Why return j
instead of i
?
Why is the outer loop guaranteed to exit? Answering these questions helps one appreciate the beauty of
Hoare's ingenious invention. In the following sections, I will prove an invariant to demonstrate
the correctness of quicksort.
Invariant. At the beginning of each outer loop, $i<j$ and $A[low, i] \le pivot \le A[j, high]$.
Prove by induction: Let k be the round of outer loop. Let $i_{k}$, $j_{k}$ denote the value of $i$ and $j$ at the beginning of $k$-th round. When k=0, $i_0$=low-1, $j_0$=high+1, thus $A[low, i] \le pivot \le A[j, high]$ because both ranges are empty. Assume the invariant holds for k, we will prove it holds for k+1.
The inner loop at line 11 increases i as long as A[i] is smaller than pivot. The inner loop at line 13 decreases j as long as A[j] is larger than pivot. Therefore, $A[i_k+1, i_{k+1}-1] < pivot < A[j_{k+1}+1, j_{k}-1]$ and $A[i_{k+1}] \ge pivot \ge A[j_{k+1}]$. Line 18 then swaps $A[i_{k+1}]$ and $A[j_{k+1}]$ so we can extend the inequality to $A[i_k+1, i_{k+1}-1] \le pivot \le A[j_{k+1}+1, j_{k}-1]$. Combining it with the the induction assumption, we have $A[low, i_{k+1}] \le pivot \le A[j_{k+1}, high]$. Note that the condition at line 15 is false according to the induction assumption that there is a k+1 round.
With the invariant proved, let's answer the questions asked previously.
Q1. Why is there no boundary check of i
and j
?
Because $pivot \le A[j_{k}, high]$, the condition of $A[i] < pivot$ must not hold for indexes in $[j_{k}, high]$.
This means the index i must not go beyond high.
Q2. Why return j
instead of i
?
According to the invariant, at the line 15 of $(k+1)$-th round, $A[i_{k+1}] \ge pivot \ge A[j_{k+1}]$.
Since we return the last index of the smaller part, we should return the index that is less or equal
to the pivot, which is $j_{k+1}$.
Q3. Why the outer loop guaranteed to exit?
In each inner loop, i must increase at least 1 and j must decrease by at least 1. Therefore, the condition
$i \ge j$ does not hold after at most (high-low+1)/2 loops.
I hope you enjoy the beauty of the core of quicksort, the partitioning process, as much as I do. Hoare is a remarkable computer scientist who also introduced the Communicating Sequential Processes (CSP) in 1978. CSP is a language that describe concurrency, and the Go programming language follows the CSP model.