CSC 2300 Data Structures & Algorithms March 23, 2007 Chapter 7. Sorting
Today – Sorting Quicksort – Algorithm Pivot Analysis Worst Case Best Case Average Case
Quicksort – Algorithm 1. If the number of elements in S is 0 or 1, then return. 2. Pick any element v in S. This is called the pivot. 3. Partition S – {v} into two disjoint groups: S1 = { x ∈ S – {v} | x ≤ v } and S2 = { x ∈ S – {v} | x ≥ v }. 4. Return { quicksort(S1) followed by v followed by quicksort(S2) }.
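The four steps can be sketched directly in Python. This is a minimal illustration, not an in-place implementation: the function name `quicksort`, the choice of the middle element as v, and the decision to send elements equal to v into S2 are all assumptions made for the example (the algorithm allows ties to go to either group).

```python
def quicksort(s):
    """Return a sorted copy of list s, following the four steps literally."""
    # Step 1: zero or one element is already sorted.
    if len(s) <= 1:
        return list(s)
    # Step 2: pick any element as the pivot v (here: the middle one, an assumption).
    mid = len(s) // 2
    v = s[mid]
    rest = s[:mid] + s[mid + 1:]          # S - {v}
    # Step 3: partition S - {v}; ties are sent to S2 in this sketch.
    s1 = [x for x in rest if x < v]
    s2 = [x for x in rest if x >= v]
    # Step 4: quicksort(S1), then v, then quicksort(S2).
    return quicksort(s1) + [v] + quicksort(s2)
```

Building new lists keeps the sketch close to the set notation; practical versions partition the array in place.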
Quicksort – Example
Quicksort – Partition Strategy Example. Input: 8, 1, 4, 9, 6, 3, 5, 2, 7, 0. Say 6 is chosen as pivot. [Figure: array snapshots with the positions of i, j, and the pivot marked.]
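One common in-place partition strategy can be sketched as follows. It is an assumption that this matches the figure: the pivot is first swapped to the last position, i scans right past elements smaller than the pivot, j scans left past elements larger than the pivot, out-of-place pairs are swapped, and when i and j cross the pivot is swapped into position i. The name `partition` and its signature are hypothetical.

```python
def partition(a, pivot_index):
    """Partition list a in place around a[pivot_index]; return the pivot's final index."""
    last = len(a) - 1
    # Move the pivot out of the way, to the last position.
    a[pivot_index], a[last] = a[last], a[pivot_index]
    pivot = a[last]
    i, j = 0, last - 1
    while True:
        # i moves right past elements smaller than the pivot.
        while i < last and a[i] < pivot:
            i += 1
        # j moves left past elements larger than the pivot.
        while j > 0 and a[j] > pivot:
            j -= 1
        if i >= j:                 # i and j have crossed: partitioning is done
            break
        a[i], a[j] = a[j], a[i]    # both elements are on the wrong side: swap them
        i += 1
        j -= 1
    # Put the pivot between the two groups.
    a[i], a[last] = a[last], a[i]
    return i
```

On the input 8, 1, 4, 9, 6, 3, 5, 2, 7, 0 with pivot 6 (index 4), this sketch leaves the array as 2, 1, 4, 5, 0, 3, 6, 8, 7, 9 and returns index 6.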
Choices of Pivot Four suggestions: First element of array; Larger of first two distinct elements of array; Middle element of array; Randomly. What do you think about these choices? All bad choices. Why?
Good Choice of Pivot Best choice: median of array. Disadvantage? Practical choice: Median of Three. What is it? Median of left, right, and center elements. Example: 8, 1, 4, 9, 6, 3, 5, 2, 7, 0. Median of 8, 6, and 0, which is 6.
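Example Example: 8, 1, 4, 9, 6, 3, 5, 2, 7, 0. Pivot = Median of 8, 6, and 0. What should the new array look like? Recall what we have done: [Figure: array with i, j, and pivot marked.] Can we do better? [Figure: array with i, pivot, and j marked.] Where should we move the pivot? [Figure: array with i, j, and pivot marked.]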
Median-of-Three Code
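A minimal Python sketch of median-of-three selection, not necessarily the code shown in lecture. It assumes a common textbook convention: the left, center, and right elements are sorted in place, and the pivot (their median) is then parked at position right - 1 so that partitioning only has to scan between left + 1 and right - 2. The name `median3` and its signature are hypothetical.

```python
def median3(a, left, right):
    """Order a[left], a[center], a[right]; park the median at a[right - 1] and return it."""
    center = (left + right) // 2
    # Three compare-and-swap steps sort the three sampled elements in place.
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    # a[left] <= a[center] <= a[right] now holds; hide the pivot at right - 1.
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]
```

On 8, 1, 4, 9, 6, 3, 5, 2, 7, 0 with left = 0 and right = 9, the sketch returns 6 and leaves the array as 0, 1, 4, 9, 7, 3, 5, 2, 6, 8.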
Quicksort – Analysis Quicksort is recursive. We thus get a recurrence formula: T(0) = T(1) = 1, T(N) = T(i) + T(N – i – 1) + cN, where i denotes the number of elements in S1. What value of i gives worst case? What value of i gives best case?
Worst Case Analysis We have i = 0, always. What does that say about the pivot? It is always the smallest element. Recurrence becomes T(N) = T(0) + T(N – 1) + cN. Ignore T(0), which is a constant, and get T(N) = T(N – 1) + cN. Hence T(N – 1) = T(N – 2) + c(N – 1), T(N – 2) = T(N – 3) + c(N – 2), … T(2) = T(1) + c(2). Adding these equations, we get T(N) = T(1) + c ∑_{i=2}^{N} i = 1 + c [ N(N+1)/2 – 1 ] = O(N^2).
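As a sanity check, the simplified worst-case recurrence can be iterated numerically and compared with the closed form 1 + c [ N(N+1)/2 – 1 ]. The function name `worst_case_T` and the default c = 1 are assumptions made for this sketch.

```python
def worst_case_T(n, c=1):
    """Iterate T(N) = T(N - 1) + cN starting from T(1) = 1 (T(0) is ignored)."""
    t = 1                      # T(1)
    for k in range(2, n + 1):
        t += c * k             # T(k) = T(k - 1) + c*k
    return t

# Compare with the closed form for a few sizes.
for n in (1, 2, 10, 100):
    assert worst_case_T(n, c=3) == 1 + 3 * (n * (n + 1) // 2 - 1)
```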
Best Case Analysis We have i = N/2, always. What does that say about the pivot? It is always the median. To simplify, assume N is a power of 2 and that both subarrays have exactly N/2 elements. Recurrence becomes T(N) = T(N/2) + T(N/2) + cN = 2 T(N/2) + cN. Do you remember how to solve this recurrence? Divide by N to get T(N)/N = T(N/2)/(N/2) + c. Thus, T(N/2)/(N/2) = T(N/4)/(N/4) + c, T(N/4)/(N/4) = T(N/8)/(N/8) + c, … T(2)/2 = T(1)/1 + c. Adding these log N equations (log base 2), we get T(N)/N = T(1)/1 + c log N, and so T(N) = N + c N log N = O(N log N).
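The best-case solution can be checked the same way for powers of two. This is a small sketch; the name `best_case_T` and the default c = 1 are assumptions.

```python
def best_case_T(n, c=1):
    """Evaluate T(N) = 2 T(N/2) + cN with T(1) = 1, for N a power of 2."""
    if n == 1:
        return 1
    return 2 * best_case_T(n // 2, c) + c * n

# For N = 2^k the closed form is N + c * N * k, where k = log2(N).
for k in range(11):
    n = 2 ** k
    assert best_case_T(n, c=3) == n + 3 * n * k
```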
Average Case Analysis Always much harder than worst and best cases. What can we assume about the pivot? Assume that each of the sizes for S1 is equally likely and thus has probability 1/N. The average value of T(i) is thus (1/N) ∑_{j=0}^{N-1} T(j). What can we say about the value of T(N – i – 1)? It has the same average. Recurrence becomes T(N) = (2/N) ∑_{j=0}^{N-1} T(j) + cN. Does this recurrence look familiar? We saw it when we did an internal path length analysis in Chapter 4 (Trees).
Average Case Analysis Recurrence: T(N) = (2/N) ∑_{j=0}^{N-1} T(j) + cN. How can we solve this recurrence? Divide by N? No, multiply by N! We get this recurrence: N T(N) = 2 ∑_{j=0}^{N-1} T(j) + cN^2. How do we get rid of the ∑ T(j)? We use the same recurrence for N – 1: (N – 1) T(N – 1) = 2 ∑_{j=0}^{N-2} T(j) + c(N – 1)^2. Subtracting one recurrence from the other, we get N T(N) – (N – 1) T(N – 1) = 2 T(N – 1) + c(2N – 1). Rearranging and dropping the insignificant –c on the right, we get N T(N) = (N+1) T(N – 1) + 2cN.
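The subtraction step can be verified with exact arithmetic. The sketch below tabulates the full-history recurrence with T(0) = T(1) = 1 and checks that N T(N) – (N+1) T(N – 1) = c(2N – 1) before the –c is dropped. The relation is checked for N ≥ 3 only, because the averaged recurrence is applied here from N = 2 onward; that starting point, the name `average_T_table`, and the default c = 1 are assumptions.

```python
from fractions import Fraction

def average_T_table(n_max, c=1):
    """Return [T(0), ..., T(n_max)] for T(N) = (2/N) * sum_{j<N} T(j) + cN, N >= 2."""
    table = [Fraction(1), Fraction(1)]     # T(0) = T(1) = 1
    prefix = Fraction(2)                   # running sum T(0) + ... + T(N - 1)
    for n in range(2, n_max + 1):
        t_n = Fraction(2, n) * prefix + c * n
        table.append(t_n)
        prefix += t_n
    return table

T = average_T_table(40, c=3)
for n in range(3, 41):
    # Exact form of the subtracted recurrence.
    assert n * T[n] - (n + 1) * T[n - 1] == 3 * (2 * n - 1)
```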
Recurrence Recurrence: N T(N) = (N+1) T(N – 1) + 2cN. How can we solve this recurrence? Divide by N? Divide by N+1? No, divide by N(N+1)! We get this recurrence: T(N)/(N+1) = T(N – 1)/N + 2c/(N+1). What to do now? We can telescope: T(N – 1)/N = T(N – 2)/(N – 1) + 2c/N, T(N – 2)/(N – 1) = T(N – 3)/(N – 2) + 2c/(N – 1), … T(2)/3 = T(1)/2 + 2c/3. Adding these equations, we get this solution: T(N)/(N+1) = T(1)/2 + 2c ∑_{i=3}^{N+1} (1/i). What does ∑ (1/i) equal? Approximately ln(N+1) + γ – 3/2, where γ ≈ 0.577 is Euler's constant. So T(N)/(N+1) = O(log N), and we get T(N) = O(N log N).
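The telescoped solution can be confirmed by iterating N T(N) = (N+1) T(N – 1) + 2cN exactly and comparing with (N+1) [ T(1)/2 + 2c ∑_{i=3}^{N+1} 1/i ]. The names `average_case_T` and `closed_form` and the base value T(1) = 1 taken from the analysis slide are assumptions of this sketch.

```python
from fractions import Fraction

def average_case_T(n, c=1):
    """Iterate N*T(N) = (N + 1)*T(N - 1) + 2cN from T(1) = 1, in exact arithmetic."""
    t = Fraction(1)                                  # T(1)
    for k in range(2, n + 1):
        t = ((k + 1) * t + 2 * c * k) / k
    return t

def closed_form(n, c=1):
    """(N + 1) * (T(1)/2 + 2c * sum_{i=3}^{N+1} 1/i), the telescoped solution."""
    harmonic_tail = sum(Fraction(1, i) for i in range(3, n + 2))
    return (n + 1) * (Fraction(1, 2) + 2 * c * harmonic_tail)

for n in range(1, 61):
    assert average_case_T(n, c=2) == closed_form(n, c=2)
```

Because the harmonic sum grows like ln N, the closed form grows like 2cN ln N, which is the O(N log N) bound.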