1 Selection --Medians and Order Statistics (Chap. 9) The ith order statistic of n elements S={a 1, a 2,…, a n } : ith smallest elements Also called selection problem Minimum and maximum Median, lower median, upper median Selection in expected/average linear time Selection in worst-case linear time
2 O(nlg n) Algorithm Suppose n elements are sorted by an O(nlg n) algorithm, e.g., MERGE-SORT –Minimum: the first element –Maximum: the last element –The ith order statistic: the ith element. –Median: If n is odd, then ((n+1)/2)th element. If n is even, –then ( (n+1)/2 )th element, lower median –then ( (n+1)/2 )th element, upper median All selections can be done in O(1), so total: O(nlg n). Can we do better?
3 Selection in Expected Linear Time O(n) Select ith element A divide-and-conquer algorithm RANDOMIZED- SELECT Similar to quicksort, partition the input array recursively Unlike quicksort, which works on both sides of the partition, just work on one side of the partition. –Called prune-and-search, prune one side, just search the other side). (Please review or read quicksort in chapter 7.)
4 RANDOMIZED-SELECT(A,p,r,i) 1.if p=r then return A[p] 2.q RANDOMIZED-PARTITION(A,p,r) 3.//the q holds for A[p,q-1] A[q] A[q+1,r] 4.k q-p+1 5.if i=k then return A[q] 6. else if i<k 7. then return RANDOMIZED-SELECT(A,p,q-1,i) 8. else return RANDOMIZED-SELECT(A,q+1,r,i-k)
5 Analysis of RANDOMIZED-SELECT Worst-case running time (n 2 ), why??? it may be unlucky and always partition into A[q], an empty side and a side with remaining elements. So every partitioning of m elements will take (m) time, and m=n,n-1,…,2. Thus total is (n)+ (n-1)+…+ (2)= (n(n+1)/2-1) = (n 2 ). Moreover, no particular input elicits the worst-case behavior, Because of “randomness”. But in average, it is good. By using probabilistic analysis/random variable, it can be proven that the expected running time is O(n). (ref. to page 187). Can we do better, such that O(n) in worst case??
6 Selection in worst case linear time O(n) Select the ith smallest element of S={a 1, a 2,…, a n } Use so called prune-and-search technique: –Let x S, and partition S into three subsets –S 1 ={a j | a j x} –If | S 1 |>i, search ith smallest element in S 1 recursively, (prune S 2 and S 3 away) –Else If | S 1 |+| S 2 |>i, then return x (the ith smallest element) –Else search (i-(| S 1 |+| S 2 |))th in S 3 recursively, (prune S 1 and S 2 away) The question is how to select x such that S 1 and S 3 are nearly equal. ?
7 The Way to Select x Divide elements into n/5 groups of 5 elements each. Find the median of each group Find the median of the medians At least (3n/10)-6 elements >x At least (3n/10)-6 elements <x Because each of 1/2 n/5 -2 groups contributes 3 elements which are x
8 SELECT ith Element in n Elements) 1.Divide n elements into n/5 groups of 5 elements. 2.Find the median of each group. 3.Use SELECT recursively to find the median x of the above n/5 medians. 4.Partition n elements around x into S 1, S 2, and S 3. 5.If |S 1 |>i, search ith smallest element in S 1 recursively, Else If |S 1 |+|S 2 |>i, then return x (the ith smallest element) Else search (i-(|S 1 |+|S 2 |))th in S 3 recursively,
9 Analysis of SELECT (cont.) Steps 1,2,4 take O(n), Step 3 takes T( n/5 ). Let us see step 5: –At least half of medians in step 2 are x, thus at least 1/2 n/5 -2 groups contribute 3 elements which are x. i.e, 3( 1/2 n/5 -2) (3n/10)-6. –Similarly, the number of elements x is also at least (3n/10)-6. –Thus, |S 1 | is at most (7n/10)+6, similarly for |S 3 |. –Thus SELECT in step 5 is called recursively on at most (7n/10)+6 elements. Recurrence is: –T(n)= O(1) if n< some value (i.e. 140) – T( n/5 )+T(7n/10+6)+O(n) if n the value (i.e, 140)
10 Solve recurrence by substitution Suppose T(n) cn, for some c. T(n) c n/5 + c(7n/10+6) + an – cn/5+ c + 7/10cn+6c + an – = 9/10cn+an+7c – =cn+(-cn/10+an+7c) –Which is at most cn if -cn/10+an+7c<0. –i.e., c 10a(n/(n-70)) when n>70. –So select n=140, and then c 20a. –Note: n may not be 140, any integer >70 is OK.
11 Summary Bucket sort, counting sort, radix sort: –Their running times, –Modifications The ith order statistic of n elements S={a 1, a 2,…, a n } : ith smallest elements: –Minimum and maximum. –Median, lower median, upper median Selection in expected/average linear time –Worst case running time –Prune-and-search Selection in worst-case linear time: –Why group size 5?