Dr. Yingwu Zhu Chapter 9, p 213-220 Linear Time Selection Dr. Yingwu Zhu Chapter 9, p 213-220.

Slides:



Advertisements
Similar presentations
©2001 by Charles E. Leiserson Introduction to AlgorithmsDay 9 L6.1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 6 Prof. Erik Demaine.
Advertisements

Comp 122, Spring 2004 Order Statistics. order - 2 Lin / Devi Comp 122 Order Statistic i th order statistic: i th smallest element of a set of n elements.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 6.
1 More Sorting; Searching Dan Barrish-Flood. 2 Bucket Sort Put keys into n buckets, then sort each bucket, then concatenate. If keys are uniformly distributed.
CS 3343: Analysis of Algorithms Lecture 14: Order Statistics.
Medians and Order Statistics
1 Selection --Medians and Order Statistics (Chap. 9) The ith order statistic of n elements S={a 1, a 2,…, a n } : ith smallest elements Also called selection.
Introduction to Algorithms
Introduction to Algorithms Jiafen Liu Sept
1 Today’s Material Medians & Order Statistics – Ch. 9.
Data Structures and Algorithms (AT70.02) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: CLRS “Intro.
Median Finding, Order Statistics & Quick Sort
Algorithm Design Techniques: Divide and Conquer. Introduction Divide and conquer – algorithm makes at least two recursive calls Subproblems should not.
Spring 2015 Lecture 5: QuickSort & Selection
Median/Order Statistics Algorithms
CS38 Introduction to Algorithms Lecture 7 April 22, 2014.
CSC 2300 Data Structures & Algorithms March 27, 2007 Chapter 7. Sorting.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Chapter 4: Divide and Conquer The Design and Analysis of Algorithms.
Princeton University COS 423 Theory of Algorithms Spring 2002 Kevin Wayne Linear Time Selection These lecture slides are adapted from CLRS 10.3.
Median, order statistics. Problem Find the i-th smallest of n elements.  i=1: minimum  i=n: maximum  i= or i= : median Sol: sort and index the i-th.
Selection: Find the ith number
Analysis of Algorithms CS 477/677
CSC 2300 Data Structures & Algorithms March 20, 2007 Chapter 7. Sorting.
10/15/2002CSE More on Sorting CSE Algorithms Sorting-related topics 1.Lower bound on comparison sorting 2.Beating the lower bound 3.Finding.
Order Statistics The ith order statistic in a set of n elements is the ith smallest element The minimum is thus the 1st order statistic The maximum is.
© The McGraw-Hill Companies, Inc., Chapter 6 Prune-and-Search Strategy.
Order Statistics. Order statistics Given an input of n values and an integer i, we wish to find the i’th largest value. There are i-1 elements smaller.
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Analysis of Algorithms CS 477/677
Chapter 9: Selection Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
1 Prune-and-Search Method 2012/10/30. A simple example: Binary search sorted sequence : (search 9) step 1  step 2  step 3  Binary search.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
1 Medians and Order Statistics CLRS Chapter 9. upper median lower median The lower median is the -th order statistic The upper median.
1 Algorithms CSCI 235, Fall 2015 Lecture 19 Order Statistics II.
Young CS 331 D&A of Algo. Topic: Divide and Conquer1 Divide-and-Conquer General idea: Divide a problem into subprograms of the same kind; solve subprograms.
CSC317 1 Quicksort on average run time We’ll prove that average run time with random pivots for any input array is O(n log n) Randomness is in choosing.
Randomized Quicksort (8.4.2/7.4.2) Randomized Quicksort –i = Random(p, r) –swap A[p]  A[i] –partition A(p, r) Average analysis = Expected runtime –solving.
Chapter 9: Selection of Order Statistics What are an order statistic? min, max median, i th smallest, etc. Selection means finding a particular order statistic.
Analysis of Algorithms CS 477/677
Order Statistics.
Order Statistics Comp 122, Spring 2004.
Introduction to Algorithms Prof. Charles E. Leiserson
Randomized Algorithms
Sorting We have actually seen already two efficient ways to sort:
Order Statistics(Selection Problem)
Quick Sort (11.2) CSE 2011 Winter November 2018.
Randomized Algorithms
Medians and Order Statistics
Topic: Divide and Conquer
Lecture No 6 Advance Analysis of Institute of Southern Punjab Multan
CS 3343: Analysis of Algorithms
Order Statistics Comp 550, Spring 2015.
Order Statistics Def: Let A be an ordered set containing n elements. The i-th order statistic is the i-th smallest element. Minimum: 1st order statistic.
CSE 373 Data Structures and Algorithms
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Chapter 9: Medians and Order Statistics
Topic: Divide and Conquer
Algorithms CSCI 235, Spring 2019 Lecture 20 Order Statistics II
Order Statistics Comp 122, Spring 2004.
Lecture 15, Winter 2019 Closest Pair, Multiplication
Chapter 9: Selection of Order Statistics
The Selection Problem.
Quicksort and Randomized Algs
Design and Analysis of Algorithms
Richard Anderson Lecture 14 Divide and Conquer
CS200: Algorithm Analysis
Analysis of Algorithms CS 477/677
Medians and Order Statistics
Presentation transcript:

Dr. Yingwu Zhu Chapter 9, p 213-220 Linear Time Selection Dr. Yingwu Zhu Chapter 9, p 213-220

Terminology Order statistic The i-th order statistic of a set of n elements is the i-th smallest element Median n is odd, (n+1)/2 –th smallest n is even, floor((n+1)/2), usually lower median Selection problem Given a set of n elements and i such that 1<=i<=n Find the i-th smallest element

Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? 1 2 3 Chazelle: Amazingly enough, the road costs nothing to built. It is the driveways that cost money. Driveway cost is proportional to its distance to road. (Obviously, they will be perpendicular.) 4 5 6 7 8 9 10 11 12

Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 Decreases total cost by (n2-n1)  2 3 4  5 6 7 8 9 10 11 12

Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 2 3 4 5 6 7 8 9 10 11 12

Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 2 3 4 5 6 7 8 9 10 11 12

Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? Solution: put street at median of y coordinates. 1 2 3 4 5 6 7 8 9 10 11 12

Order Statistics Given N linearly ordered elements, find ith smallest element. Min: i = 1 Max: i = N Median: i = (N+1) / 2 and (N+1) / 2 O(N) for min or max. O(N log N) comparisons by sorting. O(N log i) with heaps. Can we do in worst-case O(N) comparisons? Surprisingly, yes. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) Assumption to make presentation cleaner. All items have distinct values.

Randomized Select Similar to quicksort, but throw away useless "half" at each iteration. Select ith smallest element from A[L..H]. if (L == H) return A[L] x  random_partition(A, L, H) k  x-L+1 if (i == k) return x else if (i < k) return Rand_Select(A, L, x-1, i) else if (i > k) return Select(A, x+1, H, i-k) Rand_Select (A, L, H, i) x = pivot position

Summary The expected running time: O(n) The worst-case running time: O(n2) However, because it is randomized, no particular input elicits the worst-case behavior

The worst-case linear time selection: partition Divide N elements into N/5 groups of 5 elements each, plus extra. 22 45 14 28 29 44 39 09 23 10 52 50 05 06 38 11 15 03 26 37 53 25 54 40 02 12 16 30 19 55 13 41 48 01 18 43 17 47 46 24 20 33 32 31 34 04 07 51 49 35 27 21 08 36 N = 54

The worst-case linear time selection: partition Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. 22 45 14 28 29 44 39 09 23 10 52 50 05 06 38 11 15 03 26 37 53 25 54 40 02 12 16 30 19 55 13 41 48 01 18 43 17 47 46 24 20 33 32 31 34 04 07 51 49 35 27 21 08 36 22 45 29 28 14 10 44 39 23 09 06 52 50 38 05 11 37 26 15 03 25 54 53 40 02 16 53 30 19 12 13 48 41 18 01 24 47 46 43 17 31 34 33 32 20 07 51 49 35 04

The worst-case linear time selection: partition Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. Find x = "median of medians" by Select() on N/5 medians. 14 09 05 03 02 12 01 17 20 04 36 32 is also median the algorithm could return 22 10 06 11 25 16 13 24 31 07 27 28 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

The worst-case linear time selection: partition Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. Find x = "median of medians" by Select() on N/5 medians. Partition the array using x 14 09 05 03 02 12 01 17 20 04 36 32 is also median the algorithm could return 22 10 06 11 25 16 13 24 31 07 27 28 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

The worst-case linear time selection: partition Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. Find x = "median of medians" by Select() on N/5 medians. Partition the array using x , follow 3 cases in Rand_Select() 14 09 05 03 02 12 01 17 20 04 36 32 is also median the algorithm could return 22 10 06 11 25 16 13 24 31 07 27 28 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians  x at least  N / 5 / 2 = N / 10 medians  x 14 09 05 03 02 12 01 17 20 04 36 CLR Instructors manual median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians  x at least  N / 5 / 2 = N / 10 medians  x At least 3 N / 10 elements  x. 14 09 05 03 02 12 01 17 20 04 36 median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians  x at least  N / 5 / 2 = N / 10 medians  x At least 3 N / 10 elements  x. At least 3 N / 10 elements  x. 14 09 05 03 02 12 01 17 20 04 36 median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51

Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians  x at least  N / 5 / 2 = N / 10 medians  x At least 3 N / 10 elements  x. At least 3 N / 10 elements  x.  Select() called recursively (Case 2 or 3) with at most N - 3 N / 10 elements. C(N) = # comparisons on a file of size N. Now, solve recurrence. Apply master theorem? Assume N is a power of 2? Assume C(N) is monotone non-decreasing? would be nice to assume C(N) is a power of 5 and a power of 10 to eliminate floors, but unfortunately not many such integers :) CLRS assumes that C(N) is monotone. Theirs is (I think). Ours is not.

Selection Analysis Analysis of selection recurrence. T(N) = # comparisons on a file of size  N. T(N) is monotone, but C(N) is not! Claim: T(N)  20cN. Base case: N < 50. Inductive hypothesis: assume true for 1, 2, . . . , N-1. Induction step: for N  50, we have: For n  50, 3 N / 10  N / 4.

Linear Time Selection Postmortem Practical considerations. Constant (currently) too large to be useful. Practical variant: choose random partition element. O(N) expected running time ala quicksort. Open problem: guaranteed O(N) with better constant. Quicksort. Worst case O(N log N) if always partition on median. Justifies practical variants: median-of-3, median-of-5.