Chapter 9: Selection of Order Statistics

1 Chapter 9: Selection of Order Statistics
What is an order statistic? min, max, median, ith smallest, etc.
Selection means finding a particular order statistic
Selection by sorting: T(n) = Ω(n lg n) for comparison sorts
Partition allows selection in linear time

2 Min, Max and Median order statistics
In a set of n elements, the ith order statistic = ith smallest element
Larger than exactly i-1 other elements
min is the 1st order statistic; max is the nth order statistic
parity of a set is whether n is even or odd
median is roughly halfway between min and max in sorted order
unique for an odd parity set: ith smallest with i = (n+1)/2
regardless of parity:
lower median means ith smallest with i = ⌊(n+1)/2⌋
upper median means ith smallest with i = ⌈(n+1)/2⌉
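The definitions above can be illustrated with a small sketch that finds order statistics by sorting. The function names are hypothetical, and sorting is used only for clarity, not because it is the fastest approach.

```python
def order_statistic(values, i):
    """Return the ith smallest element (1-indexed) of values by sorting."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def lower_median(values):
    """ith smallest with i = floor((n + 1) / 2)."""
    n = len(values)
    return order_statistic(values, (n + 1) // 2)


def upper_median(values):
    """ith smallest with i = ceil((n + 1) / 2), computed as (n + 2) // 2."""
    n = len(values)
    return order_statistic(values, (n + 2) // 2)
```

For an odd-sized set the two medians coincide; for an even-sized set they are the two middle elements.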

3 The selection problem
Find the ith order statistic in a set of n (distinct) elements A = <a_1, a_2, ..., a_n>
(i.e. find x ∈ A such that x is larger than exactly i-1 other elements of A)
The selection problem can be solved by sorting, with T(n) = Ω(n lg n) for comparison sorts
Since min and max can be found in linear time, expect that any order statistic can be found in linear time.
Analyze the deterministic algorithm SELECT, which finds the ith order statistic with a linear worst-case runtime.
Analyze RANDOMIZED-SELECT, which finds the ith order statistic by randomized partition and has a linear expected runtime.

4 Select by partition pseudocode
Select-by-Partition(A,p,r,i)    % argument i specifies which order statistic
  if p = r then return A[p]    % single element is ith smallest by default
  q ← Partition(A,p,r)    % get upper and lower sub-arrays
  k ← q - p + 1    % number of elements in lower sub-array including pivot
  if i = k then
    return A[q]    % pivot is the ith smallest element
  else if i < k then
    return Select-by-Partition(A,p,q-1,i)
  else
    return Select-by-Partition(A,q+1,r,i-k)
Note: the index of the ith order statistic changes in the upper sub-array
With favorable splits, T(n) = O(n)
Why not O(n lg n) as in quicksort?
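A minimal Python sketch of the pseudocode above. It assumes 0-indexed lists with inclusive bounds p and r, and a Lomuto-style partition that uses the last element as the pivot; the slide does not say which Partition variant is intended, so that choice is an assumption.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around pivot a[r]; return the pivot's final index.

    Assumption: Lomuto scheme with the last element as pivot.
    """
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def select_by_partition(a, p, r, i):
    """Return the ith smallest (1-indexed) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]              # single element is ith smallest by default
    q = partition(a, p, r)
    k = q - p + 1                # size of lower sub-array including the pivot
    if i == k:
        return a[q]              # pivot is the ith smallest element
    if i < k:
        return select_by_partition(a, p, q - 1, i)
    return select_by_partition(a, q + 1, r, i - k)   # rank shifts by k
```

Only one of the two sub-arrays is ever searched, which is what separates this from quicksort, where both sides are processed.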

5 Selection algorithm with worst-case runtime = O(n)
Possible to design a deterministic selection algorithm that has a linear worst-case runtime.
Make the pivot an input parameter of Partition.
Preprocess before calling Partition to determine a good choice of pivot.

6 SELECT by partition with preprocessing: T(n)=O(n)
Step 1: Divide the n elements into groups of 5 elements each, with at most one group of fewer than 5: cost = Θ(n)
Step 2: Use insertion sort to find the median of each subgroup: cost = constant (cost of sorting 5 elements) x number of subgroups = Θ(n)
Step 3: Use SELECT to find the median of the medians: cost = T(⌈n/5⌉). The median of the group that may contain fewer than 5 is included.
Step 4: Partition the input array with pivot = median of medians. Calculate k, one more than the number of elements < pivot (the rank of the pivot): cost = Θ(n) + constant. If k = i return pivot.
Step 5: If the pivot is not the ith smallest element, get an upper bound on runtime by assuming the ith smallest element is in the larger sub-array: cost ≤ T(7n/10 + 6) (to be explained)
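The five steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the slide's exact procedure: it builds new lists instead of partitioning in place, uses the built-in sorted() on each group of at most 5 where the slide uses insertion sort, and assumes distinct elements. The name select is a stand-in for SELECT.

```python
def select(items, i):
    """Return the ith smallest (1-indexed) of distinct items via median of medians."""
    if len(items) <= 5:
        return sorted(items)[i - 1]
    # Steps 1-2: groups of 5 (last group may be smaller); lower median of each group.
    groups = [sorted(items[j:j + 5]) for j in range(0, len(items), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the lower median of the medians.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around the pivot; k is the rank of the pivot.
    lower = [x for x in items if x < pivot]
    upper = [x for x in items if x > pivot]
    k = len(lower) + 1
    if i == k:
        return pivot
    # Step 5: recurse into the side that contains the ith smallest element.
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

With duplicate values the rank bookkeeping in step 5 would need an extra case for elements equal to the pivot, which this sketch omits.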

7 Diagram to help explain cost of Step 5
Dots represent elements of the input.
Subgroups of 5 occupy columns.
Arrows point from larger to smaller elements.
Medians are white.
x marks the (lower) median of medians.
Shaded area shows elements larger than x.
3 out of 5 are shaded if the subgroup is full and does not contain x.

8 Rationale for this diagram
Odd number in full groups so that the median is unique
Total number 28 so that the partial group also has a unique median
Choose the lower median of medians so that we are sure that every element in the shaded area is > x

9 Upper bound on lower sub array
At least 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ (3n/10) - 6 elements in full groups in the shaded region (the -2 discounts the group containing x and the possibly partial group)
# elements > x is at least (3n/10) - 6
# elements < x is at most n - ((3n/10) - 6) = (7n/10) + 6
Choose x as pivot
Upper bound on lower sub-array = (7n/10) + 6
[Figure labels: full group; shaded region contains elements > x]
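The counting bound on this slide can be checked numerically. The sketch below uses hypothetical function names and integer arithmetic only: it evaluates 3(⌈(1/2)⌈n/5⌉⌉ - 2) and compares it with (3n/10) - 6 after multiplying both sides by 10.

```python
def guaranteed_larger(n):
    """Count from the slide: 3 * (ceil(ceil(n/5) / 2) - 2)."""
    groups = -(-n // 5)            # ceil(n / 5) in integer arithmetic
    half = -(-groups // 2)         # ceil(groups / 2)
    return 3 * (half - 2)


def bound_holds(n):
    """Check 3(ceil(ceil(n/5)/2) - 2) >= 3n/10 - 6, scaled by 10 to stay in integers."""
    return 10 * guaranteed_larger(n) >= 3 * n - 60
```

For the 28-element diagram, guaranteed_larger(28) gives 3, while (3 * 28 / 10) - 6 = 2.4, so the bound holds there with a little room.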

10 By a similar argument, the upper bound on the upper sub-array = (7n/10) + 6
Worst case described by T(n) ≤ T(⌈n/5⌉) + T(⌈7n/10 + 6⌉) + Θ(n)
(first term: step 3; second term: step 5; Θ(n) term: steps 1, 2, 4)

12 Show by substitution that
T(n) = T(⌈n/5⌉) + T(⌈7n/10 + 6⌉) + Θ(n) has asymptotic solution T(n) = O(n).
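One way the substitution could go, as a sketch. The constant a > 0 is a name introduced here for a bound an on the Θ(n) term, and the guess is T(m) ≤ cm for all m < n.

```latex
\begin{align*}
T(n) &\le c\lceil n/5 \rceil + c\lceil 7n/10 + 6 \rceil + an \\
     &\le c(n/5 + 1) + c(7n/10 + 7) + an \\
     &= 9cn/10 + 8c + an \\
     &= cn - (cn/10 - 8c - an) \\
     &\le cn \quad \text{whenever } cn/10 - 8c - an \ge 0
       \iff c \ge 10a \cdot \frac{n}{n - 80} \quad (n > 80).
\end{align*}
```

For n ≥ 160 the factor n/(n - 80) is at most 2, so any c ≥ 20a works there. Smaller inputs are assumed to take constant time, so c can be enlarged to cover them.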

13 What recurrences describe (a) and (b)?
Homework Assignment 16: due 3/22/19
Ex p 223:
(a) Show that SELECT with groups of 7 has a linear worst-case runtime
(b) Show that SELECT with groups of 3 does not have a linear worst-case runtime.
What recurrences describe (a) and (b)?

14 Randomized-Select lets us analyze the runtime for the average case
Randomized-Select(A,p,r,i)
  if p = r then return A[p]
  q ← Randomized-Partition(A,p,r)
  k ← q - p + 1
  if i = k then
    return A[q]    (pivot is the ith smallest element)
  else if i < k then
    return Randomized-Select(A,p,q-1,i)
  else
    return Randomized-Select(A,q+1,r,i-k)
As in Randomized-Quicksort, Randomized-Partition chooses a pivot at random from the array elements between p and r
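A minimal Python sketch of Randomized-Select, assuming 0-indexed lists with inclusive bounds and a Lomuto-style partition after swapping a uniformly random element into the last position. The partition details are an assumption; the slide only states that the pivot is chosen at random.

```python
import random


def randomized_partition(a, p, r):
    """Swap a random element of a[p..r] into position r, then partition around it."""
    s = random.randint(p, r)       # inclusive on both ends
    a[s], a[r] = a[r], a[s]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the ith smallest (1-indexed) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1
    if i == k:
        return a[q]                # pivot is the ith smallest element
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

The returned value does not depend on the random choices; only the running time does.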

15 Upper bound on the expected value of T(n) for Randomized-Select
Calls to Randomized-Partition create upper and lower sub-arrays
Include the pivot in the lower sub-array A[p..q]
Define indicator random variables:
A_k = event where sub-array A[p..q] has exactly k elements
X_k = I{A_k}, 1 ≤ k ≤ n
All possible values of k are equally likely: E[X_k] = 1/n

16 If the lower sub-array (including the pivot) has k elements and the pivot is not the ith smallest, Randomized-Select will be called with an array of size k-1 or an array of size n-k
Randomized-Select(A,p,r,i)
  if p = r then return A[p]
  q ← Randomized-Partition(A,p,r)
  k ← q - p + 1
  if i = k then
    return A[q]    (pivot is the ith smallest element)
  else if i < k then
    return Randomized-Select(A,p,q-1,i)
  else
    return Randomized-Select(A,q+1,r,i-k)

17 Assume that the pivot is not the ith smallest and that the ith smallest is in the larger sub-array (ensures an upper bound on E[T(n)])
T(n) ≤ Σ_{k=1}^{n} X_k T(max(k-1, n-k)) + O(n)    (randomized recurrence)
T(n) = T(n-1) + O(n) when the lower sub-array has 1 element
T(n) = T(n-2) + O(n) when the lower sub-array has 2 elements
...
T(n) = T(n-2) + O(n) when the lower sub-array has n-1 elements
T(n) = T(n-1) + O(n) when the lower sub-array has n elements
For even n, max(k-1, n-k) varies from n/2 to n-1, and each value occurs exactly twice in the sum over k from 1 to n

18 E[T(n)] ≤ Σ_{k=1}^{n} E[X_k T(max(k-1, n-k))] + O(n)
(linearity of expectation)
E[T(n)] ≤ Σ_{k=1}^{n} E[X_k] E[T(max(k-1, n-k))] + O(n)    (expected value of a product of independent random variables)
E[T(n)] ≤ (1/n) Σ_{k=1}^{n} E[T(max(k-1, n-k))] + O(n)    (using E[X_k] = 1/n)

19 To avoid floors and ceilings, assume n is even.
E[T(n)] ≤ (1/n) Σ_{k=1}^{n} E[T(max(k-1, n-k))] + O(n)
if k > n/2, max(k-1, n-k) = k-1
if k ≤ n/2, max(k-1, n-k) = n-k
Each term from T(n/2) to T(n-1) occurs exactly twice
E[T(n)] ≤ (2/n) Σ_{k=n/2}^{n-1} E[T(k)] + O(n)    (using the redundancy of T's)
E[T(n)] ≤ (2/n) { Σ_{k=1}^{n-1} E[T(k)] - Σ_{k=1}^{n/2-1} E[T(k)] } + O(n)
Getting set up to use the arithmetic sum

20 Apply substitution method: assume E[T(k)] = O(k)
Then there exists c > 0 such that E[T(k)] ≤ ck
E[T(n)] ≤ (2c/n) { Σ_{k=1}^{n-1} k - Σ_{k=1}^{n/2-1} k } + dn, where dn with d > 0 bounds the O(n) term
Now use the arithmetic sum
After much algebra (text p219): E[T(n)] ≤ cn - (cn/4 - c/2 - dn)
Complete the proof by finding constraints on c and n
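A sketch of the algebra for even n, using the arithmetic sum Σ_{k=1}^{m} k = m(m+1)/2 on the difference of the sums from k = 1 to n-1 and from k = 1 to n/2-1. For even n the exact result has -c/2; relaxing it to +c/2 gives the form quoted from the text, which presumably also has to cover odd n with floors.

```latex
\begin{align*}
\sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k
  &= \frac{n(n-1)}{2} - \frac{n(n-2)}{8} = \frac{n(3n-2)}{8}, \\
E[T(n)] &\le \frac{2c}{n} \cdot \frac{n(3n-2)}{8} + dn
         = \frac{3cn}{4} - \frac{c}{2} + dn \\
        &\le \frac{3cn}{4} + \frac{c}{2} + dn
         = cn - \left( \frac{cn}{4} - \frac{c}{2} - dn \right).
\end{align*}
```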

21 E[T(n)] ≤ cn - (cn/4 - c/2 - dn) ≤ cn
The second inequality requires -(cn/4 - c/2 - dn) ≤ 0
Multiply by 4: -cn + 2c + 4dn ≤ 0
n(4d - c) + 2c ≤ 0
Requires 4d - c < 0, so c > 4d (constraint on c)
Choose c = 5d: -nd + 10d ≤ 0
Divide by d > 0: n ≥ 10 (constraint on n)
Choose n0 = 10
There exists c = 5d such that 0 ≤ E[T(n)] ≤ cn for all n ≥ n0
Therefore E[T(n)] = O(n) by definition
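As a numerical sanity check on the choice c = 5d, the sketch below tabulates the recurrence with equality in place of the inequality. The equality, the base case T(1) = 0, the default d = 1, and the function name are all assumptions made for illustration.

```python
def tabulate_recurrence(n_max, d=1.0):
    """Tabulate T(n) = (1/n) * sum_{k=1..n} T(max(k-1, n-k)) + d*n with T(0) = T(1) = 0."""
    t = [0.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        total = sum(t[max(k - 1, n - k)] for k in range(1, n + 1))
        t[n] = total / n + d * n
    return t


def within_linear_bound(n_max, d=1.0):
    """Check T(n) <= c*n with c = 5d for every n from 1 to n_max."""
    t = tabulate_recurrence(n_max, d)
    return all(t[n] <= 5 * d * n for n in range(1, n_max + 1))
```

This does not replace the substitution proof; it only shows the tabulated values staying under 5dn for the range tested.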

22 Homework Assignment 17: due 3/25/19
Given a “black box” code that finds the median in linear time, worst case:
1) Write a version of Partition (Good-Partition) that uses the median for the pivot.
2) Write a version of Quicksort by Partition (Good-Quicksort) that runs in O(n lg n), worst case. Write a recurrence relation for the runtime of Good-Quicksort and show that it has solution T(n) = O(n lg n).
3) Write a version of Select by Partition (Good-Select) that runs in O(n), worst case. Write a recurrence relation for the runtime of Good-Select and show that it has solution T(n) = O(n).

23 Quiz #4 3/29/19 Chapters 7 and 9 Homework assignments 14-17

