Median/Order Statistics Algorithms
- Minimum and maximum
- Selection in expected linear time
- Selection in worst-case linear time

Minimum and Maximum
- How many comparisons are sufficient to find the minimum (or the maximum)?
- How many comparisons are sufficient to find both the minimum AND the maximum?
- Show that $n + \lceil \log_2 n \rceil - 2$ comparisons are sufficient to find the second minimum (and the minimum)

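One way to see the second-minimum bound is a knockout tournament: $n - 1$ comparisons find the minimum, and the second minimum must be one of the at most $\lceil \log_2 n \rceil$ items that lost directly to it. The sketch below is only an illustration under that assumption; the function name `min_and_second_min` and the comparison counter are not from the slides.

```python
def min_and_second_min(items):
    """Return (minimum, second minimum, comparisons used) via a knockout tournament."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    comparisons = 0
    # Each player is (value, list of values that lost directly to it).
    players = [(x, []) for x in items]
    while len(players) > 1:
        next_round = []
        for a in range(0, len(players) - 1, 2):
            (x, x_beaten), (y, y_beaten) = players[a], players[a + 1]
            comparisons += 1
            if x <= y:
                x_beaten.append(y)
                next_round.append((x, x_beaten))
            else:
                y_beaten.append(x)
                next_round.append((y, y_beaten))
        if len(players) % 2 == 1:
            next_round.append(players[-1])  # odd player out advances without a comparison
        players = next_round
    minimum, beaten = players[0]
    # The second minimum can only have lost to the overall minimum.
    second = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v < second:
            second = v
    return minimum, second, comparisons


print(min_and_second_min([17, 12, 6, 23, 19, 8, 5, 10]))  # (5, 6, 9)
```

For these 8 items the count is 9, which matches $n + \lceil \log_2 n \rceil - 2 = 8 + 3 - 2$.
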
Median Problem
- How quickly can we find the median (or, in general, the kth smallest element) of an unsorted list of numbers?
- Two approaches
  - Quicksort partition algorithm: expected $\Theta(n)$ time, but $\Omega(n^2)$ time in the worst case
  - Deterministic algorithm: $\Theta(n)$ time in the worst case

Quicksort Approach
int Select(int A[], int k, int low, int high)
- Choose a pivot item
- Determine the rank of the pivot element in the current partition
  - Compare all items to this pivot element
- If the pivot is the kth item, return the pivot
- Else update low and high and recurse on the partition that contains the kth item

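The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the slides' own code: it assumes a uniformly random pivot, treats k as a 1-based rank in the whole array, and replaces the recursion with a loop in which `low` and `high` are local variables.

```python
import random


def select(A, k):
    """Return the k-th smallest item of A (k = 1 is the minimum). Reorders A in place."""
    if not 1 <= k <= len(A):
        raise ValueError("k must be between 1 and len(A)")
    low, high = 0, len(A) - 1
    while True:
        # Choose a pivot at random (an assumption) and park it at the end.
        p = random.randint(low, high)
        A[p], A[high] = A[high], A[p]
        pivot = A[high]
        # Compare every other item in A[low..high] to the pivot.
        store = low
        for i in range(low, high):
            if A[i] < pivot:
                A[i], A[store] = A[store], A[i]
                store += 1
        A[store], A[high] = A[high], A[store]
        rank = store + 1  # 1-based rank of the pivot in the whole array
        if rank == k:
            return A[store]
        if k < rank:
            high = store - 1  # kth item is left of the pivot
        else:
            low = store + 1   # kth item is right of the pivot


print(select([17, 12, 6, 23, 19, 8, 5, 10], 5))  # 12
```

Because the pivot is random, the sequence of partitions differs from run to run, but the returned item does not.
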
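Example: k = 5
Each row shows the current partition, the rank of the pivot after partitioning, and the updated low and high.

partition                  rank  low  high
17 12  6 23 19  8  5 10            1    8    (start)
 6  8  5 10 17 12 23 19      4     5    8    (pivot 10: rank 4 < k, continue right)
17 12 19 23                  7     5    6    (pivot 19: rank 7 > k, continue left)
12 17                        5               (pivot 12: rank = k, found)
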
Probabilistic Analysis
- Assume each of the $n!$ permutations is equally likely
- Modify the earlier indicator-variable analysis of quicksort to handle this k-selection problem
- What is the probability that the ith smallest item is compared to the jth smallest item?
  - If k is contained in (i..j)? $2/(j-i+1)$
  - If $k \le i$? $2/(j-k+1)$
  - If $k \ge j$? $2/(k-i+1)$

Cases where (i..j) does not contain k
- Case $k \ge j$:
  $\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{2}{k-i+1} = \sum_{i=1}^{k-1} (k-i)\,\frac{2}{k-i+1} = \sum_{i=1}^{k-1} \frac{2i}{i+1}$ [replace $k-i$ with $i$] $= 2 \sum_{i=1}^{k-1} \frac{i}{i+1} \le 2(k-1)$
- Case $k \le i$:
  $\sum_{j=k+1}^{n} \sum_{i=k}^{j-1} \frac{2}{j-k+1} = \sum_{j=k+1}^{n} (j-k)\,\frac{2}{j-k+1} = \sum_{j=1}^{n-k} \frac{2j}{j+1}$ [replace $j-k$ with $j$ and change bounds] $= 2 \sum_{j=1}^{n-k} \frac{j}{j+1} \le 2(n-k)$
- Total for both cases is $\le 2(k-1) + 2(n-k) = 2n-2$

Case where (i..j) contains k
- At most 1 interval of size 3 contains k: $i=k-1$, $j=k+1$
- At most 2 intervals of size 4 contain k: $i=k-1$, $j=k+2$ and $i=k-2$, $j=k+1$
- In general, at most $q-2$ intervals of size $q$ contain k
- Thus we get $\sum_{q=3}^{n} (q-2)\,\frac{2}{q} \le \sum_{q=3}^{n} 2 = 2(n-2)$
- Summing all cases, the expected number of comparisons is at most $(2n-2) + 2(n-2) = 4n-6$, which is less than $4n$

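As a sanity check on the bound, the three comparison probabilities can be summed exactly over all pairs $i < j$. The helper below is a hypothetical illustration (the name `expected_comparisons` is not from the slides) and assumes the same model as the analysis: distinct items and a uniformly random pivot order.

```python
from fractions import Fraction


def expected_comparisons(n, k):
    """Exact expected number of comparisons when selecting rank k out of n items."""
    total = Fraction(0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if j <= k:        # case k >= j
                total += Fraction(2, k - i + 1)
            elif i >= k:      # case k <= i
                total += Fraction(2, j - k + 1)
            else:             # case i < k < j: (i..j) contains k
                total += Fraction(2, j - i + 1)
    return total


print(expected_comparisons(3, 2))  # 8/3
```

For $n = 3$, $k = 2$ the value $8/3$ agrees with a direct count: two comparisons against the first pivot, plus one more comparison with probability $2/3$ when that pivot is not the median.
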
Best case, Worst case
- Best-case running time?
- What happens in the worst case?
  - Pivot element chosen is always what?
  - This leads to comparing all possible pairs
  - This leads to $\Theta(n^2)$ comparisons

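One concrete way to trigger the worst case, assuming a fixed rule that always picks the last item of the current range as the pivot: on already sorted input with k = 1, every pivot is the largest remaining item, so the range shrinks by only one item per round. The function `select_last_pivot` is an illustrative name, not something from the slides.

```python
def select_last_pivot(A, k):
    """Quickselect with the last item as pivot. Returns (kth smallest, comparisons)."""
    low, high = 0, len(A) - 1
    comparisons = 0
    while True:
        pivot = A[high]
        store = low
        for i in range(low, high):
            comparisons += 1  # one comparison of A[i] against the pivot
            if A[i] < pivot:
                A[i], A[store] = A[store], A[i]
                store += 1
        A[store], A[high] = A[high], A[store]
        rank = store + 1
        if rank == k:
            return A[store], comparisons
        if k < rank:
            high = store - 1
        else:
            low = store + 1


print(select_last_pivot(list(range(1, 9)), 1))  # (1, 28)
```

With 8 sorted items the count is 7 + 6 + ... + 1 = 28, that is $n(n-1)/2$: every pair is compared.
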
Deterministic O(n) approach
- Need to guarantee a good pivot element while doing only O(n) work to find the pivot element
int Select(int A[], int k, int low, int high)
- Choosing the pivot element
  - Divide the items into groups of 5
  - For each group of 5, find that group’s median
  - Use the median of the medians as the pivot element
- Determine the rank of the pivot element
  - Compare some remaining items directly to the median of the medians
- Update low and high and recurse on the partition that contains the kth item (or return the kth item if it is the pivot)

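A compact sketch of the median-of-medians selection described above. It makes simplifying assumptions that differ from the slide's signature: it works on list copies instead of updating low and high in place, it compares every item to the pivot rather than only the undetermined ones, and it sorts each group of 5 to find its median. The name `mom_select` is illustrative.

```python
def mom_select(items, k):
    """Return the k-th smallest item (k = 1 is the minimum) in worst-case linear time."""
    if len(items) <= 5:
        return sorted(items)[k - 1]
    # Divide into groups of 5 and take each group's median.
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Use the median of the medians as the pivot (found recursively).
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    # Determine the pivot's rank by partitioning around it.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k <= len(smaller):
        return mom_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return mom_select(larger, k - len(smaller) - len(equal))


print(mom_select([17, 12, 6, 23, 19, 8, 5, 10], 5))  # 12
```
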
Guarantees on the pivot element
- The median of medians is guaranteed to be smaller than all the red-colored items
  - Why?
  - How many red items are there? About $3n/10$, ignoring non-perfect division issues
- Likewise, the median of medians is guaranteed to be larger than all the blue-colored items
- Thus the rank of the median of medians is in the range: roughly $3n/10$ to $7n/10$
- What elements do we need to compare to the pivot to determine its rank?
  - How many of these are there?

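The guarantee can be checked numerically. The sketch below is a hypothetical helper (`pivot_rank_bounds` is not from the slides) that computes the median of medians by sorting, purely for brevity, and counts how many items are at most and at least the pivot. It counts the pivot itself on both sides and assumes n is a multiple of 5 so every group is full.

```python
def pivot_rank_bounds(items):
    """Return (pivot, number of items <= pivot, number of items >= pivot)."""
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = sorted(g[(len(g) - 1) // 2] for g in groups)
    pivot = medians[(len(medians) - 1) // 2]  # median of the group medians
    at_most = sum(1 for x in items if x <= pivot)
    at_least = sum(1 for x in items if x >= pivot)
    return pivot, at_most, at_least


# 100 distinct values in a scrambled order (37 and 101 are coprime).
data = [(i * 37) % 101 for i in range(100)]
pivot, at_most, at_least = pivot_rank_bounds(data)
print(at_most >= 3 * len(data) // 10, at_least >= 3 * len(data) // 10)  # True True
```

Both counts are at least $3n/10$, so neither side of the partition can hold more than about $7n/10$ items.
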
Analysis of number of comparisons
int Select(int A[], int k, int low, int high)
- Choosing the pivot element
  - For each group of 5, find that group’s median: $c_1 \cdot n/5$ comparisons ($c_1$ for the median of 5)
  - Find the median of the medians: recurse on a problem of size $n/5$
- Compare remaining items directly to the median: $c_2 n$ comparisons
- Recurse on the correct partition: recurse on a problem of size at most $7n/10$
- $T(n) = T(7n/10) + T(n/5) + c_1 n/5 + c_2 n$

Solving the recurrence relation
- $T(n) = T(7n/10) + T(n/5) + O(n)$
- Key observation: $7/10 + 1/5 = 9/10 < 1$
- Prove $T(n) \le cn$ for some constant $c$ by induction on $n$, writing the $O(n)$ term as at most $dn$
  - $T(n) \le 7cn/10 + cn/5 + dn = 9cn/10 + dn$
  - Need $9cn/10 + dn \le cn$
  - Thus $cn/10 \ge dn$, i.e. $c \ge 10d$

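The induction result can be spot-checked by evaluating the recurrence directly. This sketch makes assumptions not in the slides: $d = 1$, subproblem sizes rounded down, and $T(0) = 0$ as the base case. With $d = 1$ the proof predicts $T(n) \le 10n$.

```python
from functools import lru_cache

D = 1  # assumed constant in the linear term d*n


@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(7n/10) + T(n/5) + d*n with sizes rounded down and T(0) = 0."""
    if n == 0:
        return 0
    return T(7 * n // 10) + T(n // 5) + D * n


# c = 10*d should work as the constant in T(n) <= c*n.
print(all(T(n) <= 10 * D * n for n in range(1, 5001)))  # True
```
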