The Selection Problem.

Presentation transcript:

The Selection Problem

Median and Order Statistics. In this section, we will study algorithms for finding the ith smallest element in a set of n elements. We will again use divide-and-conquer algorithms.

The Selection Problem. Input: a set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n. Output: the element x ∈ A that is larger than exactly i – 1 other elements of A; that is, x is the ith smallest element. i = 1 ⇒ minimum; i = n ⇒ maximum.

The Selection Problem (cont). A simple solution: sort A, then return A[i]. This is Θ(n lg n).
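
The sort-then-index baseline can be sketched in Python as follows. The function name `select_by_sorting` is an illustrative assumption, and `i` is 1-based to match the slides.

```python
def select_by_sorting(a, i):
    """Return the i-th smallest element of a (1-based i) by sorting a copy."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    # Sorting dominates the cost; indexing afterwards is constant time.
    return sorted(a)[i - 1]


print(select_by_sorting([7, 2, 9, 4, 1], 2))  # prints 2
```

This is the yardstick the later algorithms are meant to beat: they avoid fully sorting the input.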

Minimum and Maximum. Finding the minimum or the maximum takes n – 1 comparisons (Θ(n)). This is the best we can do: it is optimal with respect to the number of comparisons.
MINIMUM(A)
    min ← A[1]
    for i ← 2 to length(A)
        if min > A[i]
            min ← A[i]
    return min
MAXIMUM(A)
    max ← A[1]
    for i ← 2 to length(A)
        if max < A[i]
            max ← A[i]
    return max

Minimum and Maximum (cont). Simultaneous minimum and maximum: the obvious solution takes 2(n – 1) comparisons, but we can do better, namely at most 3⌊n/2⌋ comparisons. The algorithm: if n is odd, set max and min to the first element. If n is even, compare the first two elements and set max and min accordingly. Process the remaining elements in pairs: find the larger and the smaller of the pair, compare the larger of the pair with the current max, and compare the smaller of the pair with the current min.

Minimum and Maximum (cont). Total number of comparisons: if n is odd, 3(n – 1)/2 = 3⌊n/2⌋ comparisons. If n is even, 1 initial comparison and 3(n – 2)/2 further comparisons, for a total of 3n/2 – 2 comparisons. In either case, the total number of comparisons is at most 3⌊n/2⌋.
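
The pairwise scheme can be sketched in Python with an explicit comparison counter, so the totals can be checked against the counts above. The name `min_max` and the returned counter are illustrative assumptions, not part of the slides.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons) by processing elements in pairs."""
    n = len(a)
    if n == 0:
        raise ValueError("empty input")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element initialises both min and max for free.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first two elements.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        x, y = a[j], a[j + 1]
        # Three comparisons per pair: within the pair, against min, against max.
        comparisons += 3
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_max([3, 1, 4, 1, 5, 9, 2]))  # n = 7: prints (1, 9, 9)
print(min_max([3, 1, 4, 1, 5, 9]))     # n = 6: prints (1, 9, 7)
```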

Selection in Expected Linear Time. Goal: select the ith smallest element from A[p..r]. Partition into A[p..q-1] and A[q+1..r] around the pivot A[q]. If the pivot is the ith smallest element, return A[q]. If the ith smallest element is in A[p..q-1], recurse on A[p..q-1]; otherwise recurse on A[q+1..r].

Selection in Expected Linear Time (cont)
Randomized-Select(A, p, r, i)
    if p = r
        return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q - p + 1    // number of elements in the low side of the partition + pivot
    if i = k         // the pivot value is the answer
        return A[q]
    else if i < k
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i-k)

Revised Algorithm. Here i is treated as an absolute index into A (p ≤ i ≤ r) rather than a rank within A[p..r], so no offset k is needed.
Randomized-Select(A, p, r, i)
    if p = r
        return A[p]
    q ← Randomized-Partition(A, p, r)
    if i = q         // the pivot value is the answer
        return A[q]
    else if i < q
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i)
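
A runnable Python sketch of the rank-based Randomized-Select follows. It assumes 0-based array indices with a 1-based rank `i`, and a Lomuto-style `randomized_partition`; the slides do not show the partition routine, so that part is an assumption.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    s = random.randint(p, r)          # random pivot position, inclusive bounds
    a[s], a[r] = a[r], a[s]           # move the pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two sides
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                     # size of the low side plus the pivot
    if i == k:
        return a[q]                   # the pivot value is the answer
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)


data = [7, 2, 9, 4, 1, 8]
print(randomized_select(list(data), 0, len(data) - 1, 3))  # prints 4
```

Passing a copy (`list(data)`) keeps the caller's list intact, since the partition step rearranges elements.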

Analysis of Selection Algorithm. Worst-case running time is Θ(n²). Partition takes Θ(n). If we always partition around the largest remaining element, we reduce the partition size by only one element each time. What is the best case?

Analysis of Selection Algorithm (cont). Average case (i.e., expected running time for Randomized-Select): the average-case running time is Θ(n). The time required is the random variable T(n). We want an upper bound on E[T(n)]. In Randomized-Partition, all elements are equally likely to be the pivot.

Analysis of Selection Algorithm (cont). So, for each k such that 1 ≤ k ≤ n, the subarray A[p..q] has k elements (all ≤ the pivot) with probability 1/n. For k = 1, 2, …, n we define indicator random variables X_k, where X_k = I{the subarray A[p..q] has exactly k elements}. So, E[X_k] = 1/n.

Analysis of Selection Algorithm (cont). When we choose the pivot element (which ends up in A[q]), we do not know what will happen next. Do we return with the ith element (k = i)? Do we recurse on A[p..q-1]? Do we recurse on A[q+1..r]? The decision depends on i in relation to k. We will find the upper bound on the average case by assuming that the ith element is always in the larger partition.

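Analysis of Selection Algorithm (cont). Now, X_k = 1 for just one value of k, and 0 for all others. When X_k = 1, the two subarrays have sizes k – 1 and n – k. Hence the recurrence:
T(n) ≤ Σ_{k=1}^{n} X_k · (T(max(k – 1, n – k)) + O(n)) = Σ_{k=1}^{n} X_k · T(max(k – 1, n – k)) + O(n)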

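Analysis of Selection Algorithm (cont). Taking the expected values, and using the independence of X_k and T(max(k – 1, n – k)):
E[T(n)] ≤ Σ_{k=1}^{n} E[X_k] · E[T(max(k – 1, n – k))] + O(n) = (1/n) Σ_{k=1}^{n} E[T(max(k – 1, n – k))] + O(n)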

Analysis of Selection Algorithm (cont). Looking at the expression max(k – 1, n – k): if n is even, each term from T(⌈n/2⌉) up to T(n – 1) appears twice in the summation. If n is odd, each term from T(⌈n/2⌉) up to T(n – 1) appears twice and T(⌊n/2⌋) appears once in the summation.

Analysis of Selection Algorithm (cont). Thus we have
E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n–1} E[T(k)] + O(n)
We use substitution to solve the recurrence. Note: T(n) = Θ(1) for n less than some constant. Assume that E[T(n)] ≤ cn for some constant c that satisfies the initial conditions of the recurrence.

Analysis of Selection Algorithm (cont) Using this inductive hypothesis

Analysis of Selection Algorithm (cont)
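
A sketch of the substitution step, written out in LaTeX. It assumes the recurrence E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n–1} E[T(k)] + an, where the constant a bounds the O(n) term, and the inductive hypothesis E[T(k)] ≤ ck for k < n. This follows the standard textbook derivation and is a reconstruction, not the slide's own algebra.

```latex
\begin{align*}
E[T(n)] &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} ck + an \\
&= \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2\rfloor-1} k\right) + an \\
&= \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(\lfloor n/2\rfloor-1)\lfloor n/2\rfloor}{2}\right) + an \\
&\le \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(n/2-2)(n/2-1)}{2}\right) + an \\
&= \frac{c}{n}\left(\frac{3n^2}{4} + \frac{n}{2} - 2\right) + an \\
&= c\left(\frac{3n}{4} + \frac{1}{2} - \frac{2}{n}\right) + an \\
&\le \frac{3cn}{4} + \frac{c}{2} + an \\
&= cn - \left(\frac{cn}{4} - \frac{c}{2} - an\right)
\end{align*}
```

The final line isolates the quantity cn/4 – c/2 – an, which must be non-negative for the bound E[T(n)] ≤ cn to go through.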

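Analysis of Selection Algorithm (cont). To complete the proof, we need to show that for sufficiently large n, the last expression, cn – (cn/4 – c/2 – an), is at most cn, i.e., that cn/4 – c/2 – an ≥ 0. Adding c/2 to both sides and factoring out n gives n(c/4 – a) ≥ c/2. As long as we choose the constant c so that c/4 – a > 0 (i.e., c > 4a), we can divide both sides by c/4 – a, giving n ≥ (c/2)/(c/4 – a) = 2c/(c – 4a).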

Analysis of Selection Algorithm (cont). Thus, if we assume that T(n) = Θ(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n).

Selection in Worst-Case Linear Time. The “Median of Medians” algorithm guarantees a good split when the array is partitioned. Partition is modified so that the pivot now becomes an input parameter. The algorithm: if n = 1, return A[1].

Selection in Worst-Case Linear Time (cont). Step 1: Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group of the remaining (n mod 5) elements. Step 2: Find the median of each of the ⌈n/5⌉ groups by using insertion sort to sort the group and then picking its median (the 3rd element of a full group). Step 3: Use Select recursively to find the median x of the ⌈n/5⌉ medians found in step 2. If there is an even number of medians, choose the lower median.

Selection in Worst-Case Linear Time (cont). Step 4: Partition the input array around the “median of medians” x using the modified version of Partition. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n – k elements on the high side of the partition. Step 5: If i = k, then return x. Otherwise, use Select recursively to find the ith smallest element on the low side if i < k, or the (i – k)th smallest element on the high side if i > k.
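
The algorithm described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: elements are distinct (as in the problem statement), Python's built-in `sorted` stands in for insertion sort on each group, and the partition builds new lists rather than rearranging the array in place. The name `select` and the 1-based rank `i` are illustrative choices.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of a list of distinct values."""
    if len(a) == 1:
        return a[0]
    # Split into groups of 5 (the last group may be shorter) and sort each one.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    # Take the (lower) median of every group: the 3rd element of a full group.
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursively find the lower median of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Partition around x; with distinct inputs x itself is the only element left out.
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k = len(low) + 1                  # x is the k-th smallest element
    if i == k:
        return x
    if i < k:
        return select(low, i)
    return select(high, i - k)


print(select([12, 3, 44, 7, 19, 25, 1, 8, 30, 16, 5], 6))  # prints 12
```

Building `low` and `high` as new lists costs extra memory but keeps the sketch short; an in-place partition with the pivot passed as a parameter, as the slides describe, avoids that.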

Selection in Worst-Case Linear Time (cont). Example of “Median of Medians” on input array A[1..125]:
Step 1: 25 groups of 5
Step 2: We get 25 medians
Step 3 (recursive call on the 25 medians):
    Step 1: Using the 25 medians, we get 5 groups of 5
    Step 2: We get 5 medians
    Step 3 (recursive call on the 5 medians):
        Step 1: Using the 5 medians, we get 1 group of 5
        Step 2: We get 1 median
Step 4: Partition A around the median

Analyzing “Median of Medians” The following diagram might be helpful:

Analyzing “Median of Medians” (cont). First, we need to put a lower bound on how many elements are greater than x (the pivot). How many of the medians are greater than or equal to x? At least half of the medians from the ⌈n/5⌉ groups. Why “at least half”? Because x is the median of the medians, at least ⌈(1/2)⌈n/5⌉⌉ medians are greater than or equal to x.

Analyzing “Median of Medians” (cont). Each of these medians contributes at least 3 elements greater than x, except for two groups: the group that contains x, which contributes only 2 elements greater than x, and the group that has fewer than 5 elements. So the total number of elements > x is at least 3(⌈(1/2)⌈n/5⌉⌉ – 2) ≥ 3n/10 – 6, where the – 2 accounts for the two discarded groups.

Analyzing “Median of Medians” (cont). Similarly, there are at least 3n/10 – 6 elements smaller than x. Thus, in the worst case, for Step 5 Select is called recursively on the largest partition. The largest partition has at most n – (3n/10 – 6) = 7n/10 + 6 elements (the size of the array minus the number of elements in the smaller partition).

Analyzing “Median of Medians” (cont). Developing the recurrence: Step 1 takes Θ(n) time. Step 2 takes Θ(n) time: Θ(n) calls to Insertion Sort on sets of size Θ(1). Step 3 takes T(⌈n/5⌉) time. Step 4 takes Θ(n) time. Step 5 takes at most T(7n/10 + 6) time.

Analyzing “Median of Medians” (cont). So the recurrence is
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + Θ(n)
Now use substitution to solve. Assume T(n) ≤ cn for some suitably large constant c and all n > ???. Also pick a constant a such that the function described by the Θ(n) term is bounded above by an for all n > 0.

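Analyzing “Median of Medians” (cont)
T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an    (comes from removing the ⌈ ⌉)
     = 9cn/10 + 7c + an
     = cn + (–cn/10 + 7c + an)
Which is at most cn if –cn/10 + 7c + an ≤ 0, i.e., if c ≥ 10a(n/(n – 70)) when n > 70. If n = 70, then this inequality is undefined.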

Analyzing “Median of Medians” (cont). We assume that n ≥ 71, so n/(n – 70) ≤ 71. Choosing c ≥ 710a will satisfy the inequality on the previous slide. You could choose any constant > 70 to be the base case constant. Thus, the selection problem can be solved in the worst case in linear time.

Review of Sorts. Review of sorts seen so far. Insertion Sort: easy to code; fast on small inputs (less than ~50); fast on nearly sorted inputs; stable; Θ(n) best case (sorted list); Θ(n²) average case; Θ(n²) worst case (reverse sorted list).

Stable Sorts Stable means that numbers with the same value appear in the output array in the same order as they do in the input array. That is, ties between two numbers are broken by the rule that whichever number appears first in the input array appears first in the output array. Normally, the property of stability is important only when satellite data are carried around with the element being sorted.

An example of stable sorting on playing cards. When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in the opposite order in the sorted output.
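
Python's built-in `sorted` is documented as stable, so it can illustrate the card example. The particular cards below are made up for illustration, with the suit playing the role of satellite data.

```python
# Each card is (rank, suit); the two 5s differ only in their satellite data.
cards = [("5", "hearts"), ("3", "clubs"), ("5", "spades"), ("2", "diamonds")]

# Sort by rank only. A stable sort keeps equal-rank cards in input order.
by_rank = sorted(cards, key=lambda card: int(card[0]))

print(by_rank)
# prints [('2', 'diamonds'), ('3', 'clubs'), ('5', 'hearts'), ('5', 'spades')]
```

The 5 of hearts stays ahead of the 5 of spades because it appeared first in the input.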

Review of Sorts (cont). MergeSort: divide-and-conquer algorithm; doesn’t sort in place; requires extra memory as a function of n; stable; Θ(n lg n) best case; Θ(n lg n) average case; Θ(n lg n) worst case.

Review of Sorts (cont). QuickSort: divide-and-conquer algorithm; no merge step needed; small constants; fast in practice; not stable; Θ(n lg n) best case; Θ(n lg n) average case; Θ(n²) worst case.

Review of Sorts (cont). Several of these algorithms sort in Θ(n lg n) time: MergeSort in the worst case, QuickSort on average. On some input, each of these algorithms takes Ω(n lg n) time. The sorted order they determine is based only on comparisons between the input elements. They are called comparison sorts.

Review of Sorts (cont). Other techniques for sorting exist, such as linear sorting, which is not based on comparisons and usually comes with some restrictions or assumptions on the input elements. Linear sorting techniques include Counting Sort, Radix Sort, and Bucket Sort.

Lower Bounds for Sorting. In general, assuming unique inputs, comparison sorts are expressed in terms of comparisons of the form a_i ≤ a_j. The comparisons a_i < a_j, a_i ≤ a_j, a_i ≥ a_j, and a_i > a_j are equivalent in what they tell us about the order of a_i and a_j. What is the best we can do on the worst-case type of input? What is the best worst-case running time?

The Decision-Tree Model. Input: a1, a2, a3. Number of possible outputs = 3! = 6. Each possible output is a leaf.
1:2
    ≤ 2:3
        ≤ ⟨1,2,3⟩
        > 1:3
            ≤ ⟨1,3,2⟩
            > ⟨3,1,2⟩
    > 1:3
        ≤ ⟨2,1,3⟩
        > 2:3
            ≤ ⟨2,3,1⟩
            > ⟨3,2,1⟩

Analysis of Decision-Tree Model. The worst-case number of comparisons is equal to the height of the decision tree. A lower bound on the worst-case running time is therefore a lower bound on the height of the decision tree. Note that the number of leaves in the decision tree is ≥ n!, where n is the number of elements in the input sequence.

Theorem 8.1. Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case. Proof: Consider a decision tree of height h that sorts n elements. Since there are n! permutations of n elements, each permutation representing a distinct sorted order, the tree must have at least n! leaves.

Theorem 8.1 (cont). A binary tree of height h has at most 2^h leaves, so n! ≤ 2^h and therefore h ≥ lg(n!) = Ω(n lg n) (by equation 3.18). The best possible worst-case running time for comparison sorts is thus Θ(n lg n). Mergesort, which is O(n lg n), is asymptotically optimal.
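
To make the bound concrete, the short Python sketch below computes the smallest tree height h with 2^h ≥ n! and prints it next to n lg n. The helper name `min_comparisons` is an illustrative assumption, and the value is a lower bound on worst-case comparisons rather than a count achieved by any particular algorithm.

```python
import math


def min_comparisons(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!)), using exact integers."""
    # For any integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10, 100):
    print(n, min_comparisons(n), round(n * math.log2(n), 1))
```

For n = 3 the bound is 3, which matches the height of the three-element decision tree.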

Sorting in Linear Time. How can we do better? CountingSort, RadixSort, BucketSort.