Medians and Order Statistics

Presentation on theme: "Medians and Order Statistics"— Presentation transcript:

1 Medians and Order Statistics

2 The i-th order statistic of a set of n elements is the i-th smallest element.
For example, the minimum of a set of elements is the first order statistic (i = 1), and the maximum is the n-th order statistic (i = n). A median, informally, is the "halfway point" of the set. When n is odd, the median is unique and occurs at i = (n + 1)/2; when n is even, there are two medians, at i = n/2 and i = n/2 + 1.

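3 We formally specify the selection problem as follows:
Input: a set A of n distinct numbers and an integer i, with 1 ≤ i ≤ n.
Output: the element x of A that is larger than exactly i - 1 other elements of A.
Solution: We can solve the selection problem in O(n lg n) time: sort the numbers using heapsort or merge sort, then simply index the i-th element in the output array. Faster algorithms exist, as the following slides show.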
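As a baseline, the sort-then-index solution can be sketched in a few lines of Python. The function name `select_by_sorting` is made up for this example, and Python's built-in `sorted` stands in for heapsort or merge sort; it also runs in O(n lg n) time.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy first."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    # sorted() returns a new list in ascending order; position i - 1 holds
    # the i-th order statistic.
    return sorted(values)[i - 1]


data = [7, 2, 9, 4, 1]
print(select_by_sorting(data, 1))  # minimum
print(select_by_sorting(data, 3))  # median of five elements
print(select_by_sorting(data, 5))  # maximum
```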

4 Finding Minimum and maximum value
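We can find the minimum with n - 1 comparisons: scan the elements once and keep track of the smallest seen so far. The maximum can be found with n - 1 comparisons as well. This is optimal, because every element except the minimum must lose at least one comparison.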
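A minimal sketch of the single-scan minimum, with a counter added purely to make the number of comparisons visible. The function name `minimum` and the returned tuple are choices made for this example.

```python
def minimum(values):
    """Return (smallest element, number of element comparisons used)."""
    if not values:
        raise ValueError("values must be non-empty")
    smallest = values[0]
    comparisons = 0
    for x in values[1:]:
        comparisons += 1          # one comparison per remaining element
        if x < smallest:
            smallest = x
    return smallest, comparisons


print(minimum([7, 2, 9, 4, 1]))  # five elements, so four comparisons
```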

5 Simultaneous minimum and maximum
Find both the minimum and the maximum of a set of n elements. We can find the minimum and the maximum independently, using n - 1 comparisons for each, for a total of 2n - 2 comparisons; the complexity is Θ(n). We can do better by processing the elements in pairs: compare the two elements of each pair with each other first, then compare the smaller with the current minimum and the larger with the current maximum. This costs 3 comparisons for every 2 elements, for a total of at most 3⌊n/2⌋ comparisons.
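The pairwise scheme can be sketched as follows. The name `min_max` and the comparison counter are illustrative additions; the initialization (one element if n is odd, one compared pair if n is even) is one common way to set up the running minimum and maximum.

```python
def min_max(values):
    """Return (minimum, maximum, comparisons) using the pairwise scheme."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element starts as both minimum and maximum.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for k in range(start, n, 2):
        a, b = values[k], values[k + 1]
        comparisons += 3          # pair vs. pair, smaller vs. lo, larger vs. hi
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_max([7, 2, 9, 4, 1]))  # 5 elements: 3 * floor(5/2) = 6 comparisons
print(min_max([3, 8, 5, 6]))     # 4 elements: 3 * 4/2 - 2 = 4 comparisons
```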

6 Selection in expected linear time
The following code for RANDOMIZED-SELECT returns the i-th smallest element of the array A[p..r].
RANDOMIZED-SELECT(A, p, r, i)
    if p = r
        then return A[p]    // stopping condition
    q ← RANDOMIZED-PARTITION(A, p, r)    // afterwards A[p..q-1] ≤ A[q] ≤ A[q+1..r]
    k ← q - p + 1    // A[q] is the k-th smallest element of A[p..r]
    if i = k
        then return A[q]
    else if i < k
        then return RANDOMIZED-SELECT(A, p, q-1, i)
    else return RANDOMIZED-SELECT(A, q+1, r, i-k)
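A runnable Python sketch of the same procedure, under a few assumptions: lists are 0-indexed (so the whole list is `p = 0`, `r = len(a) - 1`), the rank `i` stays 1-indexed as on the slide, and the partition step is a Lomuto-style partition around a randomly chosen pivot. The slide does not show RANDOMIZED-PARTITION itself, so that part is filled in here for illustration.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] in place around a random pivot; return its index."""
    pivot_index = random.randint(p, r)      # inclusive on both ends
    a[pivot_index], a[r] = a[r], a[pivot_index]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest element (1-indexed) of a[p..r]."""
    if p == r:
        return a[p]                         # stopping condition
    q = randomized_partition(a, p, r)       # a[p..q-1] <= a[q] <= a[q+1..r]
    k = q - p + 1                           # rank of a[q] within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)


data = [7, 2, 9, 4, 1, 8, 3]
work = list(data)                           # the call reorders its input
print(randomized_select(work, 0, len(work) - 1, 4))  # median of 7 elements
```

Because the partition rearranges the list in place, the sketch passes a copy when the original order matters.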

7

8 Selection in worst-case linear time
The worst-case running time of RANDOMIZED-SELECT is Θ(n²), although its expected running time is linear. We now look at a selection algorithm whose running time is O(n) in the worst case. The idea is to guarantee a good split when partitioning the array.

9 Guaranteeing a Good Split
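Does RANDOMIZED-SELECT guarantee a good split? No: a random pivot can turn out to be the smallest or largest element. Idea: recursively find the median of medians. Divide the elements into groups of 5. Find the median of each group of five. Recursively find the median x of those medians. Partition around x. If x ends up as the k-th smallest element, it has k - 1 elements to its left and n - k elements to its right. Then make a decision: return x if i = k, recurse on the left side if i < k, or recurse on the right side for the (i - k)-th smallest element if i > k.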
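The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the in-place version: it builds new lists instead of calling PARTITION, it sorts each group of at most 5 elements to find its median, and it uses a three-way split so that duplicate values are handled as well (the slides assume distinct elements). The function name `select` is chosen for this example.

```python
def select(values, i):
    """Return the i-th smallest element (1-indexed) via median of medians."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(values) <= 5:
        return sorted(values)[i - 1]        # constant-size base case

    # Step 1: divide into groups of 5 (the last group may be smaller).
    groups = [values[j:j + 5] for j in range(0, len(values), 5)]
    # Step 2: take the (lower) median of each group.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the median of the medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 4: partition around x.
    smaller = [v for v in values if v < x]
    equal = [v for v in values if v == x]
    larger = [v for v in values if v > x]

    # Step 5: decide which part contains the i-th smallest element.
    k = len(smaller)
    if i <= k:
        return select(smaller, i)
    elif i <= k + len(equal):
        return x
    else:
        return select(larger, i - k - len(equal))


data = list(range(100, 0, -1))              # 100, 99, ..., 1
print(select(data, 1), select(data, 50), select(data, 100))
```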

10 Median of medians
(Figure: the median of medians x is labeled.)

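11 Interesting Results
At least half of the group medians are greater than or equal to the median of medians x. Each of those groups contributes at least 3 of its 5 elements that are greater than x, except for the group containing x itself and the last group, which may have fewer than 5 elements. The number of elements greater than x is therefore at least 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6. By the same argument, at least 3n/10 - 6 elements are smaller than x. This means the recursive call works on at most 7n/10 + 6 elements.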

12 The Recurrence Equation
T(n) = T(n/5) + T(7n/10 + 6) + O(n)
T(n/5): time to find the median of medians.
T(7n/10 + 6): the remaining part of the problem.
O(n): dividing into groups, finding the group medians, and partitioning around the median of medians.
This turns out to be linear: T(n) = O(n).
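One way to see why the recurrence is linear is the substitution method, sketched below. The constants a (bounding the O(n) term by an) and c (the claimed bound T(n) ≤ cn) are introduced here for the argument, and the first term is written with a ceiling, ⌈n/5⌉, which is at most n/5 + 1. The block assumes the amsmath package for `align*`.

```latex
\begin{align*}
T(n) &\le c\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn + \left(-cn/10 + 7c + an\right) \\
     &\le cn \qquad \text{whenever } -cn/10 + 7c + an \le 0 .
\end{align*}
```

The last condition is equivalent to c ≥ 10a · n/(n - 70) for n > 70. For n ≥ 140 we have n/(n - 70) ≤ 2, so any c ≥ 20a works, with smaller inputs handled as a constant-time base case.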

13 Summary
Finding the i-th order statistic by sorting takes O(n lg n) time. It is not necessary to sort everything. We can leverage the PARTITION method from QUICKSORT. We can guarantee a good split by finding the median of medians.

