Presentation is loading. Please wait.

Presentation is loading. Please wait.

Order Statistics.

Similar presentations


Presentation on theme: "Order Statistics."— Presentation transcript:

1 Order Statistics

2 Order Statistic ith order statistic: ith smallest element of a set of n elements. Minimum: first order statistic. Maximum: nth order statistic. Median: “half-way point” of the set. Unique, when n is odd – occurs at i = (n+1)/2. Two medians when n is even. Lower median, at i = n/2. Upper median, at i = n/2+1. For consistency, “median” will refer to the lower median.

3 Selection Problem Selection problem:
Input: A set A of n distinct numbers and a number i, with 1 i  n. Ouput: the element x  A that is larger than exactly i – 1 other elements of A. Can be solved in O(n lg n) time. How? We will study linear-time algorithms. For the special cases when i = 1 and i = n. For the general problem.

4 Minimum (Maximum) No. of comparisons: n – 1. Can we do better?
Minimum (A) 1. min  A[1] 2. for i  2 to length[A] do if min > A[i] then min  A[i] 5. return min Maximum can be determined similarly. T(n) = (n). No. of comparisons: n – 1. Can we do better?

5 Minimum (Maximum) No. of comparisons: n – 1.
Minimum (A) 1. min  A[1] 2. for i  2 to length[A] do if min > A[i] then min  A[i] 5. return min Maximum can be determined similarly. T(n) = (n). No. of comparisons: n – 1. Can we do better? Why not? Minimum(A) has worst-case optimal # of comparisons.

6 Selection Problem Selection problem:
Input: A set A of n distinct numbers and a number i, with 1 i  n. Ouput: the element x  A that is larger than exactly i – 1 other elements of A. Seems harder than min or max, but has the same asymptotic running time. QuickMedian: expected linear time (randomized) BFPRT73: linear time (deterministic) Key idea: geometric series sum to linear

7 Randomized Quicksort: review
Quicksort(A, p, r) if p < r then q := Rnd-Partition(A, p, r); Quicksort(A, p, q – 1); Quicksort(A, q + 1, r) fi Rnd-Partition(A, p, r) i := Random(p, r); A[r]  A[i]; x, i := A[r], p – 1; for j := p to r – 1 do if A[j]  x then i := i + 1; A[i]  A[j] fi od; A[i + 1]  A[r]; return i + 1 A[p..r] 5 A[p..q – 1] A[q+1..r] Partition 5  5  5

8 Selection in Expected Linear Time
Key idea: Quicksort, but recur only on list containing i Exploit two abilities of Randomized-Partition (RP). RP returns the index k in the sorted order of a randomly chosen element (pivot). If the order statistic i = k, then we are done. Else reduce the problem size using its other ability. RP rearranges all other elements around the pivot. If i < k, selection can be narrowed down to A[1..k – 1]. Else, select the (i – k)th element from A[k+1..n].

9 Randomized-Select Randomized-Select(A, p, r, i) // select ith order statistic. 1. if p = r then return A[p] 3. q  Randomized-Partition(A, p, r) 4. k  q – p + 1 5. if i = k then return A[q] 7. elseif i < k then return Randomized-Select(A, p, q – 1, i) else return Randomized-Select(A, q+1, r, i – k)

10 Randomized-Select Example
8 1 5 3 4 Goal: Find 3rd smallest element

11 Randomized-Select Example
8 1 5 3 4

12 Randomized-Select Example
8 1 5 3 4 1 3 8 5 4

13 Randomized-Select Example
8 1 5 3 4 1 3 8 5 4 8 5 4

14 Randomized-Select Example
8 1 5 3 4 1 3 8 5 4 8 5 4

15 Randomized-Select Example
8 1 5 3 4 1 3 8 5 4 4 5 8

16 Randomized-Select Example
8 1 5 3 4 1 3 8 5 4 4 5 8 4

17 Analysis Worst-case Complexity: Average-case Complexity:
(n2) – As we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray. (T(n) = T(n-1) + (n) ) Average-case Complexity: (n) – Intuition: Because the pivot is chosen at random, we expect that we get rid of half of the list each time we choose a random pivot q. Why (n) and not (n lg n)?

18 Analysis Average-case Complexity - more intuition
If we get rid of 10% of the list each time we choose a random pivot q... T(n) = T(9/10 n) + (n)

19 Analysis Average-case Complexity - more intuition
If we get rid of 10% of the list each time we choose a random pivot q... T(n) = T(9/10 n) + (n) Let a  1 and b > 1 ,T(n) = a T(n/b) + c nk, n  0 1. If a > bk, then T(n) = ( nlog_b a ). 2. If a = bk, then T(n) = ( nk lg n ). 3. If a < bk, then T(n) = ( nk ).

20 Analysis Average-case Complexity - more intuition T(n) = (n)
If we get rid of 10% of the list each time we choose a random pivot q... T(n) = T(9/10 n) + (n) T(n) = (n) Let a  1 and b > 1 ,T(n) = a T(n/b) + c nk, n  0 1. If a > bk, then T(n) = ( nlog_b a ). 2. If a = bk, then T(n) = ( nk lg n ). 3. If a < bk, then T(n) = ( nk ).

21 Selection in Worst-Case Linear Time
Algorithm Select: Like RandomizedSelect, finds the desired element by recursively partitioning the input array. Unlike RandomizedSelect, is deterministic. Uses a variant of the deterministic Partition routine. Partition is told which element to use as the pivot. Achieves linear-time complexity in the worst case by Guaranteeing that the split is always “good” at each Partition. How can a good split be guaranteed?

22 Choosing a Pivot Median-of-Medians:
Divide the n elements into n/5 groups. n/5 groups contain 5 elements each. 1 group may contain n mod 5 < 5 elements. Determine the median of each of the groups. Recursively find the median x of the n/5 medians. n/5 groups of 5 elements each. n/5th group of n mod 5 elements.

23 Example

24 Algorithm Select Determine the median-of-medians x (on previous slide.) Partition input array around x (Partition from Quicksort). Let k be the index of x that Partition returns. If k = i, then return x. Else if i < k, apply Select recursively to A[1..k–1] to find the ith smallest element. Else if i > k, apply Select recursively to A[k+1..n] to find the (i – k)th smallest element.

25 Worst-case Split Arrows point from larger to smaller elements.
n/5 groups of 5 elements each. Elements < x n/5th group of n mod 5 elements. Median-of-medians, x Elements > x

26 Worst-case Split Assumption: Elements are distinct. Why?
At least  n/5 /2 groups have 3 of their 5 elements ≥ x. Ignore the last group if it has fewer than 5 elements. Hence, the no. of elements ≥ x is at least 3(n–4)/10. Likewise, the no. of elements ≤ x is at least 3(n–4)/10. Thus, in the worst case, Select is called recursively on at most (7n+12)/10 elements.

27 Recurrence for worst-case running time
T(Select)  T(Median-of-medians) T(Partition) T(recursive call to select) T(n)  O(n) + T(n/5) + O(n) + T((7n+12)/10) = T(n/5) + T(7n/10+1.2) + O(n) Base: for n  24, assume we just use Insertionsort. So T(n)  24n for all n  24. T(Median-of-medians) T(Partition) T(recursive call)

28 Solving the recurrence
Base: for all n  24, T(n)  24n For n > 24, T(n) ≤ an+ T(n/5) + T(7n/10+1.2) We want to find c>0 so for all n>0 T(n) ≤ cn. Base implies c ≥ 24 T(n) ≤ an+ T(n/5) + T(7n/10+1.2) ?≤ an+ c n/5 + c 7n/ c = cn – (c n/10 – an – 1.2c) = cn – ((c/20 – a)n + (n/20 – 1.2)c) ≤ cn, as long as c ≥ 20a. So, c = max(24, 20a) works

29 Conclusions We can find the ith largest in an unordered list in Θ(n) worst-case time Let us do Quicksort in worst-case Θ(n lg n) . That constant, 20× partition cost, was high; use RandomizedSelect (aka QuickMedian) in practice.


Download ppt "Order Statistics."

Similar presentations


Ads by Google