Chapter 9: Medians and Order Statistics

Slides:



Advertisements
Similar presentations
Comp 122, Spring 2004 Order Statistics. order - 2 Lin / Devi Comp 122 Order Statistic i th order statistic: i th smallest element of a set of n elements.
Advertisements

Quicksort Lecture 3 Prof. Dr. Aydın Öztürk. Quicksort.
Order Statistics(Selection Problem) A more interesting problem is selection:  finding the i th smallest element of a set We will show: –A practical randomized.
CS 3343: Analysis of Algorithms Lecture 14: Order Statistics.
Medians and Order Statistics
1 Selection --Medians and Order Statistics (Chap. 9) The ith order statistic of n elements S={a 1, a 2,…, a n } : ith smallest elements Also called selection.
Introduction to Algorithms
Introduction to Algorithms Jiafen Liu Sept
Data Structures and Algorithms (AT70.02) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: CLRS “Intro.
Median Finding, Order Statistics & Quick Sort
Spring 2015 Lecture 5: QuickSort & Selection
Median/Order Statistics Algorithms
CS38 Introduction to Algorithms Lecture 7 April 22, 2014.
1 Introduction to Randomized Algorithms Md. Aashikur Rahman Azim.
CSC 2300 Data Structures & Algorithms March 27, 2007 Chapter 7. Sorting.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Median, order statistics. Problem Find the i-th smallest of n elements.  i=1: minimum  i=n: maximum  i= or i= : median Sol: sort and index the i-th.
CSC 2300 Data Structures & Algorithms March 20, 2007 Chapter 7. Sorting.
Tirgul 4 Order Statistics Heaps minimum/maximum Selection Overview
Order Statistics The ith order statistic in a set of n elements is the ith smallest element The minimum is thus the 1st order statistic The maximum is.
Merge Sort. What Is Sorting? To arrange a collection of items in some specified order. Numerical order Lexicographical order Input: sequence of numbers.
Order Statistics. Order statistics Given an input of n values and an integer i, we wish to find the i’th largest value. There are i-1 elements smaller.
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Chapter 9: Selection Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
Order Statistics ● The ith order statistic in a set of n elements is the ith smallest element ● The minimum is thus the 1st order statistic ● The maximum.
Order Statistics David Kauchak cs302 Spring 2012.
Order Statistics(Selection Problem)
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
1 Medians and Order Statistics CLRS Chapter 9. upper median lower median The lower median is the -th order statistic The upper median.
1 Algorithms CSCI 235, Fall 2015 Lecture 19 Order Statistics II.
COSC 3101A - Design and Analysis of Algorithms 4 Quicksort Medians and Order Statistics Many of these slides are taken from Monica Nicolescu, Univ. of.
CSC317 1 Quicksort on average run time We’ll prove that average run time with random pivots for any input array is O(n log n) Randomness is in choosing.
Randomized Quicksort (8.4.2/7.4.2) Randomized Quicksort –i = Random(p, r) –swap A[p]  A[i] –partition A(p, r) Average analysis = Expected runtime –solving.
David Luebke 1 6/26/2016 CS 332: Algorithms Linear-Time Sorting Continued Medians and Order Statistics.
1 Chapter 7 Quicksort. 2 About this lecture Introduce Quicksort Running time of Quicksort – Worst-Case – Average-Case.
Chapter 9: Selection of Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
Order Statistics.
Order Statistics Comp 122, Spring 2004.
Introduction to Algorithms Prof. Charles E. Leiserson
Introduction to Algorithms (2nd edition)
Linear-Time Sorting Continued Medians and Order Statistics
Randomized Algorithms
Lecture 8 Randomized Algorithms
Order Statistics(Selection Problem)
Chapter 3 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Dr. Yingwu Zhu Chapter 9, p Linear Time Selection Dr. Yingwu Zhu Chapter 9, p
Randomized Algorithms
Data Structures Review Session
Medians and Order Statistics
CS 3343: Analysis of Algorithms
Order Statistics Comp 550, Spring 2015.
Chapter 4.
CS 3343: Analysis of Algorithms
Order Statistics Def: Let A be an ordered set containing n elements. The i-th order statistic is the i-th smallest element. Minimum: 1st order statistic.
CSE 373 Data Structures and Algorithms
Ch. 2: Getting Started.
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Chapter 7 Quicksort.
Algorithms CSCI 235, Spring 2019 Lecture 20 Order Statistics II
Order Statistics Comp 122, Spring 2004.
Lecture 15, Winter 2019 Closest Pair, Multiplication
Chapter 9: Selection of Order Statistics
CS 583 Analysis of Algorithms
The Selection Problem.
Quicksort and Randomized Algs
Algorithms CSCI 235, Spring 2019 Lecture 19 Order Statistics
Design and Analysis of Algorithms
CS200: Algorithm Analysis
Medians and Order Statistics
Presentation transcript:

Chapter 9: Medians and Order Statistics

kth smallest element  kth order statistics About this lecture Finding max, min in an unsorted array (upper bound and lower bound) Finding both max and min (upper bound) Selecting the kth smallest element kth smallest element  kth order statistics

Finding Maximum in unsorted array

Finding Maximum (Method I) Let S denote the input set of n items To find the maximum of S, we can: Step 1: Set max = item 1 Step 2: for k = 2, 3, …, n if (item k is larger than max) Update max = item k; Step 3: return max; # comparisons = n – 1

Finding Maximum (Method II) Define a function Find-Max as follows: Find-Max(R, k) /* R is a set with k items */ 1. if (k  2) return maximum of R; 2. Partition items of R into pairs; 3. Delete smaller item from R in each pair; 4. return Find-Max(R, k - ); Calling Find-Max(S, n) gives the maximum of S

Finding Maximum (Method II) Let T(n) = # comparisons for Find-Max with problem size n So, T(n) = T(n - n/2) + n/2 for n  3 T(2) = 1 Solving the recurrence (by substitution), we get T(n) = n - 1

Lower Bound Question: Can we find the maximum using fewer than n – 1 comparisons? Answer: No ! Every element except the winner must drop at least one match So, we need to ensure n-1 items not max  at least n – 1 comparisons are needed

Finding Both Max and Min in unsorted array

Finding Both Max and Min Can we find both max and min quickly? Solution 1: First, find max with n – 1 comparisons Then, find min with n – 1 comparisons  Total = 2n – 2 comparisons Is there a better solution ??

Finding Both Max and Min Better Solution: (Case 1: if n is even) First, partition items into n/2 pairs; … Next, compare items within each pair; … = larger = smaller

Finding Both Max and Min Then, max = Find-Max in larger items min = Find-Min in smaller items … Find-Max Find-Min # comparisons = 3n/2 – 2

Finding Both Max and Min Better Solution: (Case 2: if n is odd) We find max and min of first n - 1 items; if (last item is larger than max) Update max = last item; if (last item is smaller than min ) Update min = last item; # comparisons = 3(n-1)/2

Finding Both Max and Min Conclusion: To find both max and min: if n is odd: 3(n-1)/2 comparisons if n is even: 3n/2 – 2 comparisons Combining: at most 3n/2 comparisons  better than finding max and min separately

Selecting kth smallest item in unsorted array

Selection in Expected Linear Time Randomized-Select(A, p, r, i) if p==r return A[p] q = Randomized-Partition(A, p, r) k = q – p + 1 if i ==k //the pivot value is the answer return A[q] else if i < k return Randomized-Partition(A, p, q-1, i) 6. else return Randomized-Partition(A, q+1, r, i-k)

Running Time Worst case: T(n) = O(n) + T(n–1) = O(n2) Average case: E(n) = O(n) + 1/n ∑1≤k≤n E(max{k-1, n-k}) = O(n) + 2/n ∑n/2≤k≤n-1 E(k) = O(n) (Prove it using substitution method by yourself.)  

Selection in Linear Time In next slides, we describe a recursive call Select(S, k) which supports finding the kth smallest element in S Recursion is used for two purposes: (1) selecting a good pivot (as in Quicksort) (2) solving a smaller sub-problem

Select(S, k) /* First, find a good pivot */ If |S|less than a small number then use insertion sort to return the answer Else Partition S into |S|/5 groups, each group has five items (one group may have fewer items); 2. Sort each group separately; 3. Collect median of each group into S’; 4. Find median m of S’: m = Select(S’,|S|/5/2);

5. Let q = # items of S smaller than m; 6. If (k == q + 1) return m; /* Partition with pivot */ 7. Else partition S into X and Y X = {items smaller than m} Y = {items larger than m} /* Next, form a sub-problem */ 8. If (k  q + 1) return Select(X, k) 9. Else return Select(Y, k–(q+1));

Copyright © The McGraw-Hill Companies, Inc Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Running Time In our selection algorithm, we chose m, which is the median of medians, to be a pivot and partition S into two sets X and Y In fact, if we choose any other item as the pivot, the algorithm is still correct Why don’t we just pick an arbitrary pivot so that we can save some time ??

Running Time A closer look reviews that the worst-case running time depends on |X| and |Y| Precisely, if T(|S|) denote the worst-case running time of the algorithm on S, then T(|S|) = T(|S|/5) + Q(|S|) + max {T(|X|),T(|Y|) }

Running Time Later, we show that if we choose m, the “median of medians”, as the pivot, both |X| and |Y| will be at most 7|S|/10 + 6 Consequently, T(n) = T(n /5) + Q(n) + T(7n/10 + 6)  T(n) = Q(n) (obtained by substitution)

Substitution if

Median of Medians Let’s begin with n/5 sorted groups, each has 5 items (one group may have fewer) … = larger = smaller = median

Median of Medians Then, we obtain the median of medians, m = m Groups with median smaller than m Groups with median larger than m = m

Median of Medians The number of items with value greater than m is at least 3(n/5/2 – 2) two groups may not have 3 ‘crossed’ items each full group has 3 ‘crossed’ items min # of groups  number of items: at least 3n/10 – 6

Median of Medians Previous page implies that at most 7n/10 + 6 items are smaller than m For large enough n (say, n  140) 7n/10 + 6  3n/4 |X| is at most 3n/4 for large enough n

Median of Medians Similarly, we can show that at most 7n/10 + 6 items are larger than m |Y| is at most 3n/4 for large enough n Conclusion: The “median of medians” helps us control the worst-case size of the sub-problem  without it, the algorithm runs in Q(n2) time in the worst-case

Homework Problem: 9-1 (Due: Nov. 2) Practice at home: 9.1-1, 9.3-7, 9.3-9