Chapter 9: Selection of Order Statistics

Slides:



Advertisements
Similar presentations
©2001 by Charles E. Leiserson Introduction to AlgorithmsDay 9 L6.1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 6 Prof. Erik Demaine.
Advertisements

Comp 122, Spring 2004 Order Statistics. order - 2 Lin / Devi Comp 122 Order Statistic i th order statistic: i th smallest element of a set of n elements.
1 More Sorting; Searching Dan Barrish-Flood. 2 Bucket Sort Put keys into n buckets, then sort each bucket, then concatenate. If keys are uniformly distributed.
Order Statistics(Selection Problem) A more interesting problem is selection:  finding the i th smallest element of a set We will show: –A practical randomized.
CS 3343: Analysis of Algorithms Lecture 14: Order Statistics.
Medians and Order Statistics
1 Selection --Medians and Order Statistics (Chap. 9) The ith order statistic of n elements S={a 1, a 2,…, a n } : ith smallest elements Also called selection.
Introduction to Algorithms
Introduction to Algorithms Jiafen Liu Sept
Data Structures and Algorithms (AT70.02) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: CLRS “Intro.
Spring 2015 Lecture 5: QuickSort & Selection
Analysis of Algorithms CS 477/677 Randomizing Quicksort Instructor: George Bebis (Appendix C.2, Appendix C.3) (Chapter 5, Chapter 7)
Median/Order Statistics Algorithms
Probabilistic (Average-Case) Analysis and Randomized Algorithms Two different but similar analyses –Probabilistic analysis of a deterministic algorithm.
Ch. 7 - QuickSort Quick but not Guaranteed. Ch.7 - QuickSort Another Divide-and-Conquer sorting algorithm… As it turns out, MERGESORT and HEAPSORT, although.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Median, order statistics. Problem Find the i-th smallest of n elements.  i=1: minimum  i=n: maximum  i= or i= : median Sol: sort and index the i-th.
Selection: Find the ith number
David Luebke 1 8/17/2015 CS 332: Algorithms Linear-Time Sorting Continued Medians and Order Statistics.
Ch. 8 & 9 – Linear Sorting and Order Statistics What do you trade for speed?
Order Statistics The ith order statistic in a set of n elements is the ith smallest element The minimum is thus the 1st order statistic The maximum is.
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Introduction to Algorithms Jiafen Liu Sept
Chapter 9: Selection Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
Order Statistics ● The ith order statistic in a set of n elements is the ith smallest element ● The minimum is thus the 1st order statistic ● The maximum.
CS 361 – Chapters 8-9 Sorting algorithms –Selection, insertion, bubble, “swap” –Merge, quick, stooge –Counting, bucket, radix How to select the n-th largest/smallest.
Order Statistics(Selection Problem)
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
1 Medians and Order Statistics CLRS Chapter 9. upper median lower median The lower median is the -th order statistic The upper median.
COSC 3101A - Design and Analysis of Algorithms 4 Quicksort Medians and Order Statistics Many of these slides are taken from Monica Nicolescu, Univ. of.
CSC317 1 Quicksort on average run time We’ll prove that average run time with random pivots for any input array is O(n log n) Randomness is in choosing.
David Luebke 1 6/26/2016 CS 332: Algorithms Linear-Time Sorting Continued Medians and Order Statistics.
David Luebke 1 7/2/2016 CS 332: Algorithms Linear-Time Sorting: Review + Bucket Sort Medians and Order Statistics.
Chapter 9: Selection of Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
Analysis of Algorithms CS 477/677
Order Statistics.
Order Statistics Comp 122, Spring 2004.
Introduction to Algorithms Prof. Charles E. Leiserson
Linear-Time Sorting Continued Medians and Order Statistics
Randomized Algorithms
Order Statistics(Selection Problem)
Quick Sort (11.2) CSE 2011 Winter November 2018.
Dr. Yingwu Zhu Chapter 9, p Linear Time Selection Dr. Yingwu Zhu Chapter 9, p
Randomized Algorithms
Data Structures Review Session
Medians and Order Statistics
Topic: Divide and Conquer
Lecture No 6 Advance Analysis of Institute of Southern Punjab Multan
CS 3343: Analysis of Algorithms
Order Statistics Comp 550, Spring 2015.
CS 583 Analysis of Algorithms
CS 3343: Analysis of Algorithms
Order Statistics Def: Let A be an ordered set containing n elements. The i-th order statistic is the i-th smallest element. Minimum: 1st order statistic.
Algorithms: the big picture
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Ack: Several slides from Prof. Jim Anderson’s COMP 202 notes.
Chapter 9: Medians and Order Statistics
Assignment 1: due 1/9/19 Geometric sum: Prove by induction on integers that Give a structured proof using the technique if S(n-1) then S(n). Include the.
Topic: Divide and Conquer
Algorithms CSCI 235, Spring 2019 Lecture 20 Order Statistics II
Order Statistics Comp 122, Spring 2004.
Chapter 8: Overview Comparison sorts: algorithms that sort sequences by comparing the value of elements Prove that the number of comparison required to.
CS200: Algorithm Analysis
CS 583 Analysis of Algorithms
The Selection Problem.
Quicksort and Randomized Algs
Quicksort Quick sort Correctness of partition - loop invariant
CS200: Algorithm Analysis
Medians and Order Statistics
Presentation transcript:

Chapter 9: Selection of Order Statistics What are an order statistic? min, max, median, ith smallest, etc. Selection means finding a particular order statistic Selection by sorting T(n) = W(nlgn) Partition allows selection in linear time

Min, Max and Median order statistics In a set of n elements the ith order statistic = ith smallest element Larger then exactly i-1 other elements min is 1st order statistic; max is the nth order statistic parity of a set is whether n is even or odd median is roughly half way between min and max unique for an odd parity set ith smallest with i = (n+1)/2 regardless of parity lower median means ith smallest with i = (n+1)/2 upper median means ith smallest with i = (n+1)/2

The selection problem Find the ith order statistic in set of n (distinct) elements A=<a1, a2,...,an> (i.e. find x  A such that x is larger than exactly i –1 other elements of A) Selection problem can be solve with T(n)=W(nlgn) by sorting Since min and max can be found in linear time, expect that any order statistic can be found in linear time. Analyze deterministic algorithm, SELECT, that finds the ith order statistic with worst-case runtime that is linear. Analyze RANDOMIZED-SELECT that finds the ith order statistic by randomized partition that has a linear expected runtime.

Select by partition pseudocode Select-by-Partition(A,p,r,i) %argument i specifies which order statistic 1 if p=r then return A[p] %single element is ith smallest by default 2 q  Partition(A,p,r) %get upper and lower sub-arrays 3 k  q – p + 1 %number of elements in lower sub-array including pivot 4 if i = k then 5 return A[q] %pivot is the ith smallest element 6 else 7 if i < k then return Select-by-Partition(A,p,q-1,i) 8 else 9 return Select-by-Partition(A,q+1,r,i - k) Note: index of ith order statistic changed in upper sub-array With favorable splits, T(n) = O(n) Why not O(nlg(n)) as in quicksort?

Selection algorithm with worst-case runtime = O(n) Possible to design a deterministic selection algorithm that has a linear worst-case runtime. Make the pivot an input parameter. Process before calling partition to determine a good choice for pivot.

SELECT by partition with preprocessing: T(n)=O(n) Step 1: Divide n-elements into groups of 5 elements each and at most one with less than 5: cost = Q(n) Step 2: Use insertion sort to find median of each subgroup: cost = constant (cost of sorting 5 elements) x number of subgroups = Q(n) Step 3: Use SELECT to find the median of the medians: cost = T(ceiling(n/5)) The median of the group that may contain less than 5 is included. Step 4: Partition the input array with pivot = median of medians. Calculate k, the number of elements < pivot: cost = Q(n) + constant. If k=i return pivot. Step 5: If pivot is not the ith smallest element, get upper bound on runtime by assuming the ith smallest element is in the larger sub-array: cost < T(7n/10 + 6) (to be explained)

Diagram to help explain cost of Step 5 Dots represent elements of input. Subgroups of 5 occupy columns Arrows point from larger to smaller elements. Medians are white. x marks (lower) median of medians. Shaded area shows elements larger than x 3 out of 5 are shaded if subgroup is full and does not contain x

Rationale for this diagram Odd number in full groups so that median is unique Total number 28 so that partial group also has unique median Choose lower median of medians so that we are sure that every element in shaded area is >x

Upper bound on lower sub array At least 3(ceiling((ceiling(n/5)/2)-2) > (3n/10)-6 elements in full groups in shaded region # elements > x is at least (3n/10)-6 # elements < x is at most n-((3n/10)-6)= (7n/10)+6 Choose x as pivot Upper bound on lower sub-array = (7n/10)+6 full group Shaded region contains elements > x

By similar argument, Upper bound on upper sub array = (7n/10)+6 Worst case described by T(n) < T(ceiling(n/5)) + T(ceiling(7n/10+6)) + Q(n) step 3 step 5 steps 1,2,4

SELECT by partition with preprocessing: T(n)=O(n) Step 1: Divide n-elements into groups of 5 elements each and at most one with less than 5: cost = Q(n) Step 2: Use insertion sort to find median of each subgroup: cost = constant (cost of sorting 5 elements) x number of subgroups = Q(n) Step 3: Use SELECT to find the median of the medians: cost = T(ceiling(n/5)) The median of the group that may contain less than 5 is included. Step 4: Partition the input array with pivot = median of medians. Calculate k, the number of elements < pivot: cost = Q(n) + constant. If k=i return pivot. Step 5: If pivot is not the ith smallest element, get upper bound on runtime by assuming the ith smallest element is in the larger sub-array: cost < T(7n/10 + 6) (to be explained)

Show by substitution that T(n) = T(ceiling(n/5)) + T(ceiling(7n/10+6)) + Q(n) has asymptotic solution T(n) = O(n).

What recurrences describe (a) and (b)? Homework Assignment 16: due 3/22/19 Ex 9.3-1 p 223: (a) Show that SELECT with groups of 7 has a linear worst-case runtime (b) Show that SELECT with groups of 3 does not have a worst-case linear runtime. What recurrences describe (a) and (b)?

Randomized-Select lets us analyze the runtime for the average case Randomized-Select(A,p,r,i) 1 if p=r then return A[p] 2 q  Randomized-Partition(A,p,r) 3 k  q – p + 1 4 if i = k then 5 return A[q] (pivot is the ith smallest element) 6 else 7 if i < k then return Randomized-Select(A,p,q-1,i) 8 else 9 return Randomized-Select(A,q+1,r,i –k) As in Randomized-Quicksort, Randomized-Partition chooses a pivot at random from array elements between p and r

Upper bound on the expected value of T(n) for Randomized-Select Calls to Randomized-Partition creates upper and lower sub-arrays Include the pivot in lower sub-array A(p..q) Define indicator random variables: Ak = event where sub-array A[p...q] has exactly k elements. Xk = I{Ak} 1 < k < n All possibilities values of k are equally likely. E[Xk] = 1/n

If the lower sub-array (including the pivot) has k element and if the pivot is not the ith smallest, randomized select will be called with an array of size k-1 or an array of size n-k Randomized-Select(A,p,r,i) 1 if p=r then return A[p] 2 q  Randomized-Partition(A,p,r) 3 k  q – p + 1 4 if i = k then 5 return A[q] (pivot is the ith smallest element) 6 else 7 if i < k then return Randomized-Select(A,p,q-1,i) 8 else 9 return Randomized-Select(A,q+1,r,i –k)

Assume that pivot in not the ith smallest and that ith smallest is in larger sub-array (ensures an upper bound on E(T(n)) T(n) < {Xk T(max(k-1,n-k))} + O(n) randomized recurrence T(n) = T(n-1) + O(n) when lower sub-array has 1 element T(n) = T(n-2) + O(n) when lower sub-array has 2 element . T(n) = T(n-2) + O(n) when lower sub-array has n-1 element T(n) = T(n-1) + O(n) when lower sub-array has n element For even n, max(k-1,n-k) varies from n/2 to n-1, and each value occurs exactly twice in the sum over k from 1 to n

E[T(n)] < { E[Xk T(max(k-1,n-k))] } + O(n) (linearity of expected values) E[T(n)] < { E[Xk] E[ T(max(k-1,n-k))] } + O(n) (expected value of independent of random variables) E[T(n)] < (1/n) E[ T(max(k-1,n-k))] + O(n) (using E[Xk] = 1/n)

To avoid floors and ceiling, assume n is even. E[T(n)] < (1/n) E[ T(max(k-1,n-k))] + O(n) if k > n/2, max(k-1,n-k) = k-1 if k < n/2, max(k-1,n-k) = n-k To avoid floors and ceiling, assume n is even. Each term from T(n/2) to T(n-1) occurs exactly twice E[T(n)] < (2/n) E[ T(k)] + O(n) (using the redundancy of T’s) E[T(n)] < (2/n) { E[ T(k)] - E[ T(k)] } + O(n) Getting setup to use the arithmetic sum

Apply substitution method: assume E[T(k)] = O(k) Then exist c > 0 such that E[T(k)] < ck E[T(n)] < (2c/n) { k - k} + dn d>0 Now use arithmetic sum After much algebra (text p219) E[T(n)] < cn – (cn/4 – c/2 – dn) Complete the proof by finding constraints on c and n

E[T(n)] < cn – (cn/4 – c/2 – dn) < cn Multiply by 4 -cn + 2c + 4dn < 0 n(4d-c) +2c < 0 4d-c < 0 c>4d constraint on c Choose c=5d -nd+10d < 0 Divide by d>0 n>10 constraint on n Choose n0=10 Exist c=5d such that 0<E[T(n)]<cn for all n>n0 Therefore E[T(n)]=O(n) by definition

Homework Assignment 17: due 3/25/19 Given a “black box” code that the finds the median in linear time, worst case: 1) Write a version of Partition (Good-Partition) that uses the median for the pivot. 2) Write a version of Quicksort by Partition (Good-Quicksort) that runs in O(nlgn), worst case. Write a recurrence relation for the runtime of Good-Quicksort and show that it has solution T(n)=O(nlgn). 3) Write a version of Select by Partition (Good-Select) that runs in O(n), worst case. Write a recurrence relation for the runtime of Good-Select and show that it has solution T(n)=O(n).

Quiz #4 3/29/19 Chapters 7 and 9 Homework assignments 14-17