Presentation transcript:

Order Statistics The ith order statistic in a set of n elements is the ith smallest element. The minimum is thus the 1st order statistic, and the maximum is the nth order statistic. The median is (roughly) the n/2-th order statistic. – If n is odd, the median is unique, at i = (n + 1)/2. If n is even, there are two medians: the lower median at i = ⌊(n + 1)/2⌋ and the upper median at i = ⌈(n + 1)/2⌉ (for example, with n = 6 the lower median is the 3rd smallest element and the upper median the 4th). How can we compute order statistics, and what is the running time?

Order Statistics The selection problem can be specified formally as follows: Input: a set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n. Output: the element x ∈ A that is larger than exactly i − 1 other elements of A. The selection problem can be solved in O(n lg n) time, since we can sort the numbers using heapsort or merge sort and then simply index the ith element of the output array. There are faster algorithms, however.
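For concreteness, here is a minimal Python sketch of the sort-then-index baseline (the function name select_by_sorting and the 1-based rank convention are my own, chosen to match the problem statement above):

import random  # used by the later sketches as well

def select_by_sorting(a, i):
    """Return the i-th smallest element of a (1 <= i <= len(a)) by sorting first."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    # Sorting dominates the cost: O(n lg n); the lookup afterwards is O(1).
    return sorted(a)[i - 1]

# Example: the 3rd order statistic of {7, 1, 5, 9, 3} is 5.
print(select_by_sorting([7, 1, 5, 9, 3], 3))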

Minimum and Maximum

MINIMUM(A)
  min ← A[1]
  for i ← 2 to length[A]
    do if min > A[i]
      then min ← A[i]
  return min

The algorithm makes exactly n − 1 comparisons. Finding the maximum can, of course, be accomplished with n − 1 comparisons as well.

Finding Order Statistics: The Selection Problem A more interesting problem is selection: finding the ith smallest element of a set. We will show: – A practical randomized algorithm with O(n) expected running time – A cool algorithm, of theoretical interest only, with O(n) worst-case running time

Randomized Selection Key idea: use partition() from quicksort – But we only need to examine one subarray – This saving shows up in the expected running time: O(n) We will again use a slightly different partition: q = RandomizedPartition(A, p, r), which rearranges A[p..r] so that elements ≤ A[q] occupy positions p..q−1 and elements ≥ A[q] occupy positions q+1..r.
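The slides use RandomizedPartition without showing it. Below is a minimal Python sketch of one common choice, a Lomuto-style partition around a uniformly random pivot (the function name, the 0-based inclusive indices p and r, and the Lomuto scheme itself are my assumptions, not taken from the slides):

import random

def randomized_partition(a, p, r):
    """Partition a[p..r] (0-based, inclusive) around a randomly chosen pivot.

    Returns the pivot's final index q; afterwards a[p..q-1] <= a[q] <= a[q+1..r].
    """
    # Move a uniformly random element into the last slot, then Lomuto-partition.
    k = random.randint(p, r)
    a[k], a[r] = a[r], a[k]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1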

Randomized Selection

RandomizedSelect(A, p, r, i)
  if (p == r) then return A[p]
  q = RandomizedPartition(A, p, r)
  k = q - p + 1
  if (i == k) then return A[q]
  if (i < k)
    then return RandomizedSelect(A, p, q-1, i)
    else return RandomizedSelect(A, q+1, r, i-k)

(In the partition diagram, elements ≤ A[q] occupy positions p..q−1 and elements ≥ A[q] occupy positions q+1..r; k is the rank of the pivot A[q] within A[p..r].)
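Putting the pieces together, a runnable Python rendering of the pseudocode above (it reuses the randomized_partition sketch from the previous slide; the 0-based bounds and the sample data are illustrative only):

def randomized_select(a, p, r, i):
    """Return the i-th smallest element of a[p..r]; i is a 1-based rank within that subarray."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                      # rank of the pivot a[q] within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)

# Example: the 4th smallest element of [9, 2, 7, 4, 6, 1] is 6.
data = [9, 2, 7, 4, 6, 1]
print(randomized_select(data, 0, len(data) - 1, 4))

Note that, unlike quicksort, only one of the two sides is ever examined after partitioning; this is exactly the saving the previous slide points to.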

Randomized Selection Analyzing RandomizedSelect(): – Worst case: the partition always splits 0 : n−1, so T(n) = T(n − 1) + O(n) = O(n²) (an arithmetic series). No better than sorting! – “Best” case: suppose every partition is a 9 : 1 split, so T(n) = T(9n/10) + O(n) = O(n) (Master Theorem, case 3). Better than sorting! What if this had been a 99 : 1 split?
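One can also verify the 9 : 1 claim without the Master Theorem by unrolling the recurrence; writing the O(n) term as cn for some constant c (my notation, not from the slides), the per-level costs form a geometric series:

\[
T(n) \le cn + \tfrac{9}{10}cn + \bigl(\tfrac{9}{10}\bigr)^{2} cn + \cdots
     \le cn \sum_{j=0}^{\infty} \bigl(\tfrac{9}{10}\bigr)^{j} = 10\,cn = O(n).
\]

A 99 : 1 split behaves the same way: the ratio becomes 99/100 and the constant becomes 100, so the bound is still linear.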

Randomized Selection The time required by RANDOMIZED-SELECT on an input array A[p..r] of n elements is a random variable that we denote by T(n). We obtain an upper bound on E[T(n)] as follows: RANDOMIZED-PARTITION is equally likely to return any element as the pivot, so for each k with 1 ≤ k ≤ n, the subarray A[p..q] has exactly k elements with probability 1/n. For k = 1, 2, ..., n, we define indicator random variables X_k = I{the subarray A[p..q] has exactly k elements}, and so we have E[X_k] = 1/n.
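Written out, the step this slide relies on is the standard fact that the expectation of an indicator random variable equals the probability of the event it indicates:

\[
E[X_k] = \Pr\{\text{the subarray } A[p..q] \text{ has exactly } k \text{ elements}\} = \frac{1}{n},
\qquad k = 1, 2, \ldots, n.
\]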

Randomized Selection When we call RANDOMIZED-SELECT and choose A[q] as the pivot element, we do not know whether we will: – terminate immediately with the correct answer, – recurse on the subarray A[p..q − 1], or – recurse on the subarray A[q + 1..r]. This decision depends on where the ith smallest element falls relative to A[q]. Assuming that T(n) is monotonically increasing, we can bound the time needed for the recursive call by the time needed for the recursive call on the largest possible input.

Randomized Selection We assume, to obtain an upper bound, that the ith element is always on the side of the partition with the greater number of elements. For a given call of RANDOMIZED-SELECT, the indicator random variable X_k has the value 1 for exactly one value of k, and it is 0 for all other k. When X_k = 1, the two subarrays on which we might recurse have sizes k − 1 and n − k.

Randomized Selection Hence we obtain a recurrence for the average case (for an upper bound, we assume the ith element always falls in the larger side of the partition), and we will show that T(n) = O(n) by substitution.
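The recurrence itself appeared as an image in the original slide; restated here (this is the bound from CLRS Section 9.2 that the surrounding text describes):

\[
T(n) \;\le\; \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr).
\]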

Randomized Selection Taking expected values of both sides, we have:
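The chain of steps, reconstructed from the CLRS argument the slide summarizes (it uses linearity of expectation and the fact that X_k is independent of the running time of the recursive call):

\begin{align*}
E[T(n)] &\le E\Bigl[\sum_{k=1}^{n} X_k \cdot \bigl(T(\max(k-1,\, n-k)) + O(n)\bigr)\Bigr] \\
        &= \sum_{k=1}^{n} E[X_k] \cdot E[T(\max(k-1,\, n-k))] + O(n) \\
        &= \sum_{k=1}^{n} \frac{1}{n}\, E[T(\max(k-1,\, n-k))] + O(n).
\end{align*}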

Randomized Selection Let us consider the expression max(k − 1, n − k): it equals k − 1 if k > ⌈n/2⌉ and equals n − k if k ≤ ⌈n/2⌉. If n is even, each term from T(⌈n/2⌉) up to T(n − 1) appears exactly twice in the summation, and if n is odd, all these terms appear twice and T(⌊n/2⌋) appears once in addition. Thus, we have:
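The resulting bound, again restating the inequality from CLRS that the slide's lost image contained:

\[
E[T(n)] \;\le\; \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] \;+\; O(n).
\]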

Randomized Selection We solve the recurrence by substitution. Assume that E[T(n)] ≤ cn for some constant c that satisfies the initial conditions of the recurrence, and that T(n) = O(1) for n less than some constant. We also pick a constant a such that the function described by the O(n) term above (the non-recursive component of the running time of the algorithm) is bounded from above by an for all n > 0.

Randomized Selection Using this inductive hypothesis, we have:
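A condensed reconstruction of the lost algebra, following CLRS Section 9.2: substitute E[T(k)] ≤ ck into the summation, evaluate the arithmetic series, and regroup so that the excess over cn is visible:

\begin{align*}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an
         = \frac{2c}{n}\Bigl(\sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k\Bigr) + an \\
        &= \frac{2c}{n}\Bigl(\frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2}\Bigr) + an
         \le \frac{2c}{n}\Bigl(\frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2}\Bigr) + an \\
        &= \frac{3cn}{4} + \frac{c}{2} - \frac{2c}{n} + an
         \;\le\; cn - \Bigl(\frac{cn}{4} - \frac{c}{2} - an\Bigr).
\end{align*}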

Randomized Selection In order to complete the proof, we need to show that, for sufficiently large n, this last expression is at most cn or, equivalently, that cn/4 − c/2 − an ≥ 0. If we add c/2 to both sides and factor out n, we get n(c/4 − a) ≥ c/2. As long as we choose the constant c so that c/4 − a > 0, i.e., c > 4a, we can divide both sides by c/4 − a, giving n ≥ (c/2)/(c/4 − a) = 2c/(c − 4a).

Randomized Selection Thus, if we assume that T(n) = O(1) for n < 2c/(c − 4a), we have E[T(n)] = O(n). We conclude that any order statistic, and in particular the median, can be determined on average in linear time.

The End