Published by Margaret Gibson. Modified over 9 years ago.
1 Algorithms CSCI 235, Fall 2015
Lecture 19: Order Statistics II
2 Finding the Median
Last time, we showed that we can find the kth order statistic (i.e. the kth smallest element) in Θ(n) time for constant k, by repeatedly finding the minimum and discarding it.
How long will it take to find the median using this strategy? Note that the position of the median (n/2) increases as n increases.
T(n) = ?
Conclusion: This method does not work as well for finding the median. Larger constant values of k take longer to find (although the order of growth is the same), but the position of the median, n/2, is not a constant.
Can we do better?
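To make the cost of the repeated-minimum strategy concrete, here is a minimal Python sketch that counts comparisons. The function name `kth_smallest_by_repeated_min` is made up for illustration and is not from the lecture. Each pass scans what is left of the array, so finding the kth element costs (n-1) + (n-2) + ... + (n-k) = k(n-1) - k(k-1)/2 comparisons. For k = n/2 that is roughly 3n^2/8, which grows quadratically.

```python
def kth_smallest_by_repeated_min(values, k):
    """Return (k-th smallest element, comparisons used); k is 1-based."""
    remaining = list(values)          # work on a copy, leave the input alone
    comparisons = 0
    smallest = None
    for _ in range(k):
        # Linear scan for the minimum of what is left.
        min_index = 0
        for j in range(1, len(remaining)):
            comparisons += 1
            if remaining[j] < remaining[min_index]:
                min_index = j
        # Discard the minimum; the k-th one discarded is the answer.
        smallest = remaining.pop(min_index)
    return smallest, comparisons


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
print(kth_smallest_by_repeated_min(A, 3))   # (6, 24): 9 + 8 + 7 comparisons
```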
3 Randomized-Select
Randomized-Select(A, lo, hi, i)    {Find the ith order statistic between lo and hi}
    if lo = hi
        then return A[lo]
    split ← Randomized-Partition(A, lo, hi)
    length ← (split - lo) + 1
    if i <= length
        then return Randomized-Select(A, lo, split, i)
        else return Randomized-Select(A, split+1, hi, i-length)
Idea: Partition the array as in Quick-sort. Recursively search the appropriate partition for the ith element.
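The pseudocode can be sketched in Python as follows. The slide does not show Randomized-Partition, so this sketch assumes a Hoare-style partition around a randomly chosen pivot, which returns a split index with lo <= split < hi and every element of A[lo..split] no larger than every element of A[split+1..hi]. That assumption matches the recursion on (lo, split) above. Indices here are 0-based, while i remains a 1-based rank.

```python
import random


def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around a[lo]; return split with lo <= split < hi."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > pivot:
            j -= 1
        i += 1
        while a[i] < pivot:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_partition(a, lo, hi):
    """Move a random element to the front, then partition around it."""
    r = random.randint(lo, hi)        # randint includes both endpoints
    a[lo], a[r] = a[r], a[lo]
    return hoare_partition(a, lo, hi)


def randomized_select(a, lo, hi, i):
    """Return the i-th smallest (1-based) element of a[lo..hi]; reorders a."""
    if lo == hi:
        return a[lo]
    split = randomized_partition(a, lo, hi)
    length = split - lo + 1           # size of the low part
    if i <= length:
        return randomized_select(a, lo, split, i)
    return randomized_select(a, split + 1, hi, i - length)


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
print(randomized_select(list(A), 0, len(A) - 1, 3))   # 6
```

The pivot choice is random, but the returned value is not: only the running time depends on the random choices.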
4 Example
index:   1   2   3   4   5   6   7   8   9  10
A:      17   6  34  18   9   5  11  22  28   2
Find the 3rd order statistic: Randomized-Select(A, 1, 10, 3)
5 Running time of Randomized-Select
Worst case: As with QuickSort, we can get unlucky and partition the array into two pieces of size 1 and n-1, with the ith statistic in the larger side.
    T(n) = T(n-1) + n = Θ(n^2)    (the n term is the cost of partition)
A good case: Partition into two equal parts:
    T(n) = T(n/2) + n    (We will work this one out in class.)
Average case: Can show that T(n) <= cn, so T(n) = O(n).
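One way the good case might be worked out is by unrolling the recurrence. This is a sketch under two assumptions that the slide does not state: n is a power of 2, and T(1) = 1.

```latex
\begin{align*}
T(n) &= n + T(n/2) \\
     &= n + \frac{n}{2} + T(n/4) \\
     &= n + \frac{n}{2} + \frac{n}{4} + \dots + 2 + T(1) \\
     &= (2n - 2) + 1 = 2n - 1 = \Theta(n).
\end{align*}
```

Under the same base case, the worst-case recurrence T(n) = T(n-1) + n unrolls to n + (n-1) + ... + 2 + 1 = n(n+1)/2, which is the Θ(n^2) bound.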
6 Selection in worst-case linear time
To make a selection in worst-case linear time, we want to use an algorithm that guarantees a good split when we partition. To do this, we use the "median of median of c" algorithm.
To start, we pick c, an integer constant >= 1.
We write our input array, A, as a 2-D array with c rows and n/c columns. (If n/c is not an integer, we can pad the array with large numbers that won't change the result.)
Sort the columns of this new 2-D array.
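A minimal sketch of the layout step, assuming each column is a consecutive group of c elements of A and that the padding value is positive infinity. Both the helper name `to_columns` and the choice of `math.inf` are illustrative assumptions. Padding with values larger than every real element leaves the ranks 1..n unchanged.

```python
import math


def to_columns(a, c, pad=math.inf):
    """Return the columns of the c-row layout, padding the last column."""
    columns = [list(a[start:start + c]) for start in range(0, len(a), c)]
    if columns and len(columns[-1]) < c:
        # Fill the short last column so every column has exactly c rows.
        columns[-1] += [pad] * (c - len(columns[-1]))
    return columns


print(to_columns([3, 1, 2, 9, 8, 7, 5], 3))
# [[3, 1, 2], [9, 8, 7], [5, inf, inf]]
```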
7 Example
A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
n = 25
Choose c = 5
Sort each column (each column is a consecutive group of five elements of A): B[1..c, 1..n/c] = B[1..5, 1..5]
     2   3   0   6  24
     5  19   7  10  25
    17  37  15  30  26    <- median row
    43  42  51  62  28
    91  72  63  73  29
After sorting, the median row contains the median of each column.
Sorting the columns takes Θ(c^2 (n/c)) = Θ(n) time.
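The column sort and the median row for this example can be checked with a short Python sketch. The helper names are hypothetical. The sketch takes each column to be a consecutive group of c elements of A, and for a short last column it takes the lower median instead of padding.

```python
def sorted_columns(a, c):
    """Split a into consecutive groups of c items and sort each group."""
    return [sorted(a[start:start + c]) for start in range(0, len(a), c)]


def median_row(columns):
    """Return the middle element of each already-sorted column."""
    return [column[(len(column) - 1) // 2] for column in columns]


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
columns = sorted_columns(A, 5)
print(columns[0])            # [2, 5, 17, 43, 91]
print(median_row(columns))   # [17, 37, 15, 30, 26]
```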
8 Median-of-median-of-c continued
We now call the Median-of-median-of-c algorithm again, on the single median row of B, with the same value of c as before.
Write the median row as B' = [17, 37, 15, 30, 26]
Write B' as a 2-D array, with c = 5 rows and n/c = 1 column, and sort the column: [15, 17, 26, 30, 37]
The value at the middle row is mm, the median of medians. Here mm = 26. We use this as our pivot for the partition.
9 Showing that it gives a good split
We can show that at least 1/4 of the elements are less than mm and at least 1/4 of the elements are greater than mm by imagining that the columns of B are sorted by the value of each median. (Note: we only imagine it; we don't actually do it.)
In the example: at least 1/4 are less than 26, and at least 1/4 are greater than 26.
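The claim can be checked on the example array with a small sketch. The function name `split_counts` is made up for illustration, and the sketch assumes distinct elements and columns formed from consecutive groups of c. It counts the elements that the column structure alone guarantees to be below mm, alongside the actual counts on each side.

```python
def split_counts(a, c, mm):
    """Return (guaranteed_below, actual_below, actual_above) for pivot mm."""
    guaranteed_below = 0
    for start in range(0, len(a), c):
        column = sorted(a[start:start + c])
        mid = (len(column) - 1) // 2
        if column[mid] <= mm:
            # Everything at or above the median row is <= this column's
            # median, which is <= mm; only mm itself is not strictly less.
            guaranteed_below += sum(1 for x in column[:mid + 1] if x < mm)
    actual_below = sum(1 for x in a if x < mm)
    actual_above = sum(1 for x in a if x > mm)
    return guaranteed_below, actual_below, actual_above


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
print(split_counts(A, 5, 26))   # (8, 12, 12); n/4 = 6.25
```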
10 Partitioning
Partition A using mm = 26 as the pivot. Use a partition that keeps mm in the high part of the partition:
"low" = 2, 5, 17, 3, 19, 0, 7, 15, 6, 10, 24, 25 (12 items)
"high" = 26, 43, 91, 37, 42, 72, 51, 63, 30, 62, 73, 28, 29 (13 items)
If the number of items in the low part of the partition is k, and the order statistic we are looking for is given by i, then:
    if i <= k, iterate the entire procedure on the lower partition
    if i > k, iterate on the higher partition (looking for the (i - k)th element).
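Putting the steps together, the whole procedure might be sketched as below. This is an illustration under stated assumptions, not the lecture's code. It differs from the slide in three ways:

- It builds new lists instead of partitioning in place.
- It takes the lower median of a short last column instead of padding.
- It uses a three-way split (less, equal, greater) rather than keeping mm in the high part, so the recursion still shrinks when there are duplicates or when mm is the smallest element.

It also requires c >= 2, because with c = 1 the median row is the whole array and the recursion would never shrink.

```python
def select(a, i, c=5):
    """Return the i-th smallest (1-based) element of list a."""
    if c < 2:
        raise ValueError("this sketch needs c >= 2")
    if len(a) <= c:
        return sorted(a)[i - 1]          # a single column: just sort it

    # Median of each column (consecutive groups of c items).
    medians = []
    for start in range(0, len(a), c):
        column = sorted(a[start:start + c])
        medians.append(column[(len(column) - 1) // 2])

    # Median of medians, found by the same procedure on the median row.
    mm = select(medians, (len(medians) + 1) // 2, c)

    low = [x for x in a if x < mm]
    equal = [x for x in a if x == mm]
    high = [x for x in a if x > mm]

    k = len(low)
    if i <= k:
        return select(low, i, c)
    if i <= k + len(equal):
        return mm
    return select(high, i - k - len(equal), c)


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
print(select(A, 13))   # 26, the median of the 25 elements
```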
11 Running time
T(n) = Θ(n) + T(n/c) + T(3n/4) + Θ(n)
    Θ(n): cost of sorting columns
    T(n/c): cost of finding m-of-m-of-c on the median row of B
    T(3n/4): worst-case split
    Θ(n): cost of partition
T(n) = T(n/c) + T(3n/4) + Θ(n)
We can show that T(n) = Θ(n) for c >= 5.
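One way to see where the condition c >= 5 comes from is the substitution method. As a sketch, assume the Θ(n) term is at most an for some constant a (a symbol introduced here, not on the slide), and guess T(n) <= dn for a constant d to be chosen:

```latex
\begin{align*}
T(n) &\le T(n/c) + T(3n/4) + an \\
     &\le \frac{dn}{c} + \frac{3dn}{4} + an \\
     &= dn - \left(\frac{1}{4} - \frac{1}{c}\right) dn + an \\
     &\le dn \quad \text{whenever } d \ge \frac{a}{\frac{1}{4} - \frac{1}{c}}.
\end{align*}
```

Such a d exists only if 1/4 - 1/c > 0, that is, c > 4, so the smallest integer that works is c = 5 (where d >= 20a suffices). The matching lower bound is immediate, since the partition alone costs Θ(n).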
12 Benefits of M-of-M-of-c
Good order statistic algorithm.
Can use this with other algorithms. For example, we can use it with QuickSort to guarantee a good split and an n lg n order of growth.
The linear time is not the result of constraining the problem (as we did with counting-sort). It is a comparison-based method!