The Selection Problem

2 Median and Order Statistics
In this section, we will study algorithms for finding the ith smallest element in a set of n elements. We will again use divide-and-conquer algorithms.

3 The Selection Problem
Input: A set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n
Output: The element x ∈ A that is larger than exactly i – 1 other elements of A
– x is the ith smallest element
i = 1 ⇒ minimum
i = n ⇒ maximum

4 The Selection Problem (cont)
A simple solution
– Sort A
– Return A[i]
– This is Θ(n lg n)

5 Minimum and Maximum
Finding the minimum and maximum
– Takes n – 1 comparisons (Θ(n))
– This is the best we can do and is optimal with respect to the number of comparisons

MINIMUM(A)
  min ← A[1]
  for i ← 2 to length(A)
    if min > A[i]
      min ← A[i]
  return min

MAXIMUM(A)
  max ← A[1]
  for i ← 2 to length(A)
    if max < A[i]
      max ← A[i]
  return max

6 Minimum and Maximum (cont)
Simultaneous minimum and maximum
– Obvious solution is 2(n – 1) comparisons
– But we can do better – namely at most 3⌊n/2⌋ comparisons
– The algorithm
  If n is odd, set max and min to the first element
  If n is even, compare the first two elements and set max, min
  Process the remaining elements in pairs
    Find the larger and the smaller of the pair
    Compare the larger of the pair with the current max
    And the smaller of the pair with the current min

7 Minimum and Maximum (cont)
– Total number of comparisons
  If n is odd: 3(n – 1)/2 comparisons
  If n is even:
    – 1 initial comparison
    – And 3(n – 2)/2 comparisons
    – For a total of 3n/2 – 2 comparisons
  In either case, the total number of comparisons is at most 3⌊n/2⌋
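The pairwise scheme above can be sketched as runnable code. This is a minimal sketch (the function name `min_max` is our own); each pair costs 3 comparisons: one inside the pair, one against the current min, one against the current max.

```python
def min_max(a):
    """Find (min, max) of a nonempty list with at most 3*floor(n/2) comparisons."""
    n = len(a)
    if n % 2:                      # odd n: first element seeds both min and max
        lo = hi = a[0]
        i = 1
    else:                          # even n: one comparison seeds min and max
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        i = 2
    while i < n:                   # process the remaining elements in pairs
        x, y = a[i], a[i + 1]
        if x < y:                  # 1 comparison within the pair
            small, big = x, y
        else:
            small, big = y, x
        if small < lo:             # 1 comparison against the current min
            lo = small
        if big > hi:               # 1 comparison against the current max
            hi = big
        i += 2
    return lo, hi
```

For n = 8 this does 1 + 3·3 = 10 = 3⌊8/2⌋ – 2 comparisons, matching the count on the slide.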

8 Selection in Expected Linear Time
Goal: Select the ith smallest element from A[p..r]
Partition into A[p..q–1] and A[q+1..r]
– If the ith smallest element is A[q], then return A[q]
– If the ith smallest element is in A[p..q–1], then recurse on A[p..q–1]
– Else recurse on A[q+1..r]

9 Selection in Expected Linear Time (cont)
Randomized-Select(A, p, r, i)
1   if p = r
2     return A[p]
3   q ← Randomized-Partition(A, p, r)
4   k ← q – p + 1   // number of elements in the low side of the partition, plus the pivot
5   if i = k        // the pivot value is the answer
6     return A[q]
7   else if i < k
8     return Randomized-Select(A, p, q–1, i)
9   else
10    return Randomized-Select(A, q+1, r, i–k)
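As a runnable counterpart to the pseudocode, here is a sketch of Randomized-Select in Python (iterative rather than recursive, which is equivalent since only one side is ever recursed into; the Lomuto-style `partition` helper is an assumption, standing in for Randomized-Partition):

```python
import random

def randomized_select(a, i):
    """Return the i-th smallest element (1-indexed) of list a. Mutates a."""
    def partition(lo, hi):
        # Move a random pivot to the end, then Lomuto-partition around it
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        j = lo
        for k in range(lo, hi):
            if a[k] < pivot:
                a[k], a[j] = a[j], a[k]
                j += 1
        a[j], a[hi] = a[hi], a[j]
        return j                  # final index of the pivot

    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        q = partition(lo, hi)
        k = q - lo + 1            # rank of the pivot within a[lo..hi]
        if i == k:                # the pivot value is the answer
            return a[q]
        elif i < k:               # answer is on the low side
            hi = q - 1
        else:                     # answer is on the high side
            i -= k
            lo = q + 1
```

Note the same `i – k` adjustment as line 10 of the pseudocode: after discarding the low side, ranks shift down by k.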


11 Analysis of Selection Algorithm
Worst-case running time is Θ(n²)
– Partition takes Θ(n)
– If we always partition around the largest remaining element, we reduce the partition size by one element each time
What is the best case?

12 Analysis of Selection Algorithm (cont)
Average Case
– Average-case running time is Θ(n)
– The time required is the random variable T(n)
– We want an upper bound on E[T(n)]
– In Randomized-Partition, all elements are equally likely to be the pivot

13 Analysis of Selection Algorithm (cont)
– So, for each k such that 1 ≤ k ≤ n, the subarray A[p..q] has k elements, all ≤ the pivot, with probability 1/n
– For k = 1, 2, …, n we define indicator random variables X_k where
  X_k = I{the subarray A[p..q] has exactly k elements}
– So, E[X_k] = 1/n

14 Analysis of Selection Algorithm (cont)
– When we choose the pivot element (which ends up in A[q]), we do not know what will happen next
  Do we return with the ith element (k = i)?
  Do we recurse on A[p..q–1]?
  Do we recurse on A[q+1..r]?
– The decision depends on i in relation to k
– We will find the upper bound on the average case by assuming that the ith element is always in the larger partition

15 Analysis of Selection Algorithm (cont)
– Now, X_k = 1 for just one value of k, and 0 for all others
– When X_k = 1, the two subarrays have sizes k – 1 and n – k
– Hence the recurrence:
  T(n) ≤ Σ (k=1 to n) X_k · (T(max(k – 1, n – k)) + O(n))

16 Analysis of Selection Algorithm (cont)
– Taking the expected values:
  E[T(n)] ≤ E[ Σ (k=1 to n) X_k · (T(max(k – 1, n – k)) + O(n)) ]
          = Σ (k=1 to n) E[X_k · T(max(k – 1, n – k))] + O(n)
          = Σ (k=1 to n) E[X_k] · E[T(max(k – 1, n – k))] + O(n)   (X_k is independent of T(max(k – 1, n – k)))
          = Σ (k=1 to n) (1/n) · E[T(max(k – 1, n – k))] + O(n)

17 Analysis of Selection Algorithm (cont)
– Looking at the expression max(k – 1, n – k):
  max(k – 1, n – k) = k – 1 if k > ⌈n/2⌉, and n – k if k ≤ ⌈n/2⌉
  If n is even, each term from T(⌈n/2⌉) up to T(n – 1) appears exactly twice in the summation
  If n is odd, each of those terms appears twice and T(⌊n/2⌋) appears once in the summation

18 Analysis of Selection Algorithm (cont)
– Thus we have
  E[T(n)] ≤ (2/n) Σ (k=⌊n/2⌋ to n–1) E[T(k)] + O(n)
– We use substitution to solve the recurrence
– Note: T(n) = O(1) for n less than some constant
– Assume that T(n) ≤ cn for some constant c that satisfies the initial conditions of the recurrence

19 Analysis of Selection Algorithm (cont)
– Using this inductive hypothesis (with a the constant hidden in the O(n) term):
  E[T(n)] ≤ (2/n) Σ (k=⌊n/2⌋ to n–1) ck + an
          = (2c/n) [ Σ (k=1 to n–1) k – Σ (k=1 to ⌊n/2⌋–1) k ] + an
          = (2c/n) [ (n – 1)n/2 – (⌊n/2⌋ – 1)⌊n/2⌋/2 ] + an

20 Analysis of Selection Algorithm (cont)
          ≤ (2c/n) [ (n – 1)n/2 – (n/2 – 2)(n/2 – 1)/2 ] + an
          = c(3n/4 + 1/2 – 2/n) + an
          ≤ 3cn/4 + c/2 + an
          = cn – (cn/4 – c/2 – an)

21 Analysis of Selection Algorithm (cont)
– To complete the proof, we need to show that for sufficiently large n, this last expression is at most cn, i.e.
  cn/4 – c/2 – an ≥ 0, or equivalently n(c/4 – a) ≥ c/2
  As long as we choose the constant c so that c/4 – a > 0 (i.e., c > 4a), we can divide both sides by c/4 – a, giving n ≥ (c/2)/(c/4 – a) = 2c/(c – 4a)

22 Analysis of Selection Algorithm (cont)
– Thus, if we assume that T(n) = O(1) for n < 2c/(c – 4a), we have T(n) = O(n)

23 Selection in Worst-Case Linear Time
The “Median of Medians” algorithm
It guarantees a good split when the array is partitioned
– Partition is modified so that the pivot now becomes an input parameter
The algorithm:
– If n = 1, return A[n]

24 Selection in Worst-Case Linear Time (cont)
1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group of (n mod 5) elements
2. Find the median of each of the ⌈n/5⌉ groups by using insertion sort to sort each group and then picking the 3rd element of each group of 5
3. Use Select recursively to find the median x of the ⌈n/5⌉ medians found in step 2
   – If there is an even number of medians, choose the lower median

25 Selection in Worst-Case Linear Time (cont)
4. Partition the input array around the “median of medians” x using the modified version of Partition. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n – k elements on the high side of the partition
5. If i = k, then return x. Otherwise, use Select recursively to find the ith smallest element on the low side if i < k, or the (i – k)th smallest element on the high side if i > k

26 Selection in Worst-Case Linear Time (cont)
Example of “Median of Medians”
– Input array A[1..125]
– Step 1: 25 groups of 5
– Step 2: We get 25 medians
– Step 3:
  Step 1: Using the 25 medians, we get 5 groups of 5
  Step 2: We get 5 medians
  Step 3:
    Step 1: Using the 5 medians, we get 1 group of 5
    Step 2: We get 1 median
– Step 4: Partition A around the median
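Steps 1–5 can be sketched as code. This is a minimal, non-in-place sketch of Select (it assumes distinct elements, as in the problem statement, and uses list filtering instead of the modified Partition for clarity):

```python
def select(a, i):
    """Return the i-th smallest (1-indexed) element of list a in worst-case O(n)."""
    if len(a) == 1:
        return a[0]
    # Steps 1-2: split into groups of <= 5, sort each, take its (lower) median
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the median of the medians (lower median if even)
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (filtering stands in for the modified Partition)
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k = len(low) + 1          # x is the k-th smallest element
    # Step 5: return x, or recurse into exactly one side
    if i == k:
        return x
    elif i < k:
        return select(low, i)
    else:
        return select(high, i - k)
```

An in-place version would reuse the Partition routine with x as the supplied pivot; the recursion structure and the k bookkeeping are the same.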

27 Analyzing “Median of Medians”
– The following diagram might be helpful: (diagram of the groups of 5 around x, not reproduced here)

28 Analyzing “Median of Medians” (cont)
– First, we need to put a lower bound on how many elements are greater than x (the pivot)
– How many of the medians are greater than x?
  At least half of the medians from the ⌈n/5⌉ groups
  – Why “at least half”? Because x is the median of the medians, at least ⌈⌈n/5⌉/2⌉ of the medians are ≥ x

29 Analyzing “Median of Medians” (cont)
– Each of these medians contributes at least 3 elements greater than x, except for two discarded groups:
  The group that contains x – it contributes only 2 elements greater than x
  The group that has fewer than 5 elements
– So the total number of elements > x is at least:
  3(⌈(1/2)⌈n/5⌉⌉ – 2) ≥ 3n/10 – 6

30 Analyzing “Median of Medians” (cont)
– Similarly, there are at least 3n/10 – 6 elements smaller than x
– Thus, in the worst case, for Step 5, Select is called recursively on the larger partition
  The larger partition has at most 7n/10 + 6 elements
  The size of the array minus the number of elements in the smaller partition: n – (3n/10 – 6) = 7n/10 + 6

31 Analyzing “Median of Medians” (cont)
– Developing the recurrence:
  Step 1 takes Θ(n) time
  Step 2 takes Θ(n) time – Θ(n) calls to Insertion Sort on sets of size Θ(1)
  Step 3 takes T(⌈n/5⌉) time
  Step 4 takes Θ(n) time
  Step 5 takes at most T(7n/10 + 6) time

32 Analyzing “Median of Medians” (cont)
– So the recurrence is
  T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
– Now use substitution to solve
  Assume T(n) ≤ cn for some suitably large constant c and all n > ???
  Also pick a constant a such that the function described by the O(n) term is bounded above by an for all n > 0

33 Analyzing “Median of Medians” (cont)
  T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
       ≤ cn/5 + c + 7cn/10 + 6c + an   (the extra c comes from removing the ⌈ ⌉)
       = 9cn/10 + 7c + an
       = cn + (–cn/10 + 7c + an)
  Which is at most cn if
  –cn/10 + 7c + an ≤ 0, i.e., c ≥ 10a · n/(n – 70) when n > 70
  If n = 70, then this inequality is undefined

34 Analyzing “Median of Medians” (cont)
– We assume that n ≥ 71, so n/(n – 70) ≤ 71
– Choosing c ≥ 710a will satisfy the inequality on the previous slide
– You could choose any constant > 70 to be the base-case constant
Thus, the selection problem can be solved in the worst case in linear time

35 Review of Sorts
Review of sorts seen so far
– Insertion Sort
  Easy to code
  Fast on small inputs (fewer than ~50 elements)
  Fast on nearly sorted inputs
  Stable
  Θ(n) best case (sorted list)
  Θ(n²) average case
  Θ(n²) worst case (reverse-sorted list)

36 Review of Sorts (cont)
Stable means that numbers with the same value appear in the output array in the same order as they do in the input array. That is, ties between two numbers are broken by the rule that whichever number appears first in the input array appears first in the output array. Normally, the property of stability is important only when satellite data are carried around with the element being sorted.
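A small illustration of why stability matters for satellite data (the record values here are made up for the example; Python's built-in sort is documented to be stable):

```python
# Each record carries satellite data (a name) alongside its sort key.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

# A stable sort on the key keeps equal-key records in input order:
# "alice" stays before "dave" (both key 1), "carol" before "bob" (both key 2).
by_key = sorted(records, key=lambda r: r[1])
```

With an unstable sort, `by_key` could legally order "bob" before "carol", scrambling the satellite data within a key group.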

37 Review of Sorts (cont)
– MergeSort
  Divide-and-conquer algorithm
  Doesn’t sort in place
  Requires extra memory as a function of n
  Stable
  Θ(n lg n) best case
  Θ(n lg n) average case
  Θ(n lg n) worst case

38 Review of Sorts (cont)
– QuickSort
  Divide-and-conquer algorithm – no merge step needed
  Small constants
  Fast in practice
  Not stable
  Θ(n lg n) best case
  Θ(n lg n) average case
  Θ(n²) worst case

39 Review of Sorts (cont)
Several of these algorithms sort in Θ(n lg n) time
– MergeSort in the worst case
– QuickSort on average
On some input, each of these algorithms takes Ω(n lg n) time
The sorted order they determine is based only on comparisons between the input elements
They are called comparison sorts

40 Review of Sorts (cont)
Other techniques for sorting exist, such as linear sorting, which is not based on comparisons
Usually with some restrictions or assumptions on the input elements
Linear sorting techniques include:
– Counting Sort
– Radix Sort
– Bucket Sort
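As a concrete instance of a non-comparison sort, here is a sketch of Counting Sort (assuming, as the slide notes such techniques require, that keys are integers in a known range 0..k):

```python
def counting_sort(a, k):
    """Sort a list of integers in the range 0..k in O(n + k) time, stably."""
    count = [0] * (k + 1)
    for v in a:                    # histogram: count[v] = occurrences of v
        count[v] += 1
    for i in range(1, k + 1):      # prefix sums: count[i] = # of keys <= i
        count[i] += count[i - 1]
    out = [0] * len(a)
    for v in reversed(a):          # backwards scan keeps the sort stable
        count[v] -= 1
        out[count[v]] = v
    return out
```

No element is ever compared with another; the Ω(n lg n) lower bound for comparison sorts therefore does not apply, at the cost of the range assumption on the keys.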

41 Lower Bounds for Sorting
In general, assuming unique inputs, comparison sorts are expressed in terms of comparisons
– The comparisons a_i < a_j, a_i ≤ a_j, a_i ≥ a_j, and a_i > a_j are all equivalent in terms of what they tell us about the relative order of a_i and a_j
What is the best we can do on the worst-case type of input?
What is the best worst-case running time?

42 The Decision-Tree Model
n = 3, input: ⟨a1, a2, a3⟩
(Decision tree for 3 elements: internal nodes are the comparisons 1:2, 2:3, 1:3; the leaves are the orderings ⟨1,2,3⟩, ⟨2,1,3⟩, ⟨1,3,2⟩, ⟨3,1,2⟩, ⟨2,3,1⟩, ⟨3,2,1⟩)
# possible outputs = 3! = 6
Each possible output is a leaf

43 Analysis of the Decision-Tree Model
The worst-case number of comparisons equals the height of the decision tree
A lower bound on the worst-case running time is therefore a lower bound on the height of the decision tree
Note that the number of leaves in the decision tree is ≥ n!, where n = the number of elements in the input sequence

44 Theorem 8.1
Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case
Proof:
– Consider a decision tree of height h that sorts n elements
– Since there are n! permutations of n elements, each permutation representing a distinct sorted order, the tree must have at least n! leaves

45 Theorem 8.1 (cont)
– A binary tree of height h has at most 2^h leaves
– Therefore n! ≤ 2^h, so h ≥ lg(n!) = Ω(n lg n) (by equation 3.18)
The best possible worst-case running time for comparison sorts is thus Ω(n lg n)
MergeSort, which is O(n lg n), is asymptotically optimal
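The step lg(n!) = Ω(n lg n) can also be checked directly, without equation 3.18, by keeping only the largest half of the factors of n!:

```latex
\lg(n!) \;=\; \sum_{k=1}^{n} \lg k
        \;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \lg k
        \;\ge\; \frac{n}{2}\,\lg\frac{n}{2}
        \;=\; \Omega(n \lg n)
```

Each of the roughly n/2 retained factors is at least n/2, which gives the bound.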

46 Sorting in Linear Time
How can we do better?
– CountingSort
– RadixSort
– BucketSort