Princeton University COS 423 Theory of Algorithms Spring 2002 Kevin Wayne Linear Time Selection These lecture slides are adapted from CLRS 10.3.
2 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways?
3 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n n 1 = nodes on top. n n 2 = nodes on bottom Decreases total cost by (n 2 -n 1 ) 1210
4 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n n 1 = nodes on top. n n 2 = nodes on bottom
5 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n n 1 = nodes on top. n n 2 = nodes on bottom
6 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? Solution: put street at median of y coordinates
7 Order Statistics Given N linearly ordered elements, find i th smallest element. n Min: i = 1 n Max: i = N n Median: i = (N+1) / 2 and (N+1) / 2 n O(N) for min or max. n O(N log N) comparisons by sorting. n O(N log i) with heaps. Can we do in worst-case O(N) comparisons? n Surprisingly, yes. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) Assumption to make presentation cleaner. n All items have distinct values.
8 Select Similar to quicksort, but throw away useless "half" at each iteration. n Select i th smallest element from a 1, a 2,..., a N. x Partition(N, a 1, a 2,..., a N ) k rank(x) if (i == k) return x else if (i < k) b[] all items of a[] less than x return Select(i th, k-1, b 1, b 2,..., b k-1 ) else if (i > k) c[] all items of a[] greater than x return Select((i-k) th, N-k, c 1, c 2,..., c N-k ) Select (i th, N, a 1, a 2,..., a N ) x = partition element Want to choose x so that x is (roughly) the ith largest.
9 Partition Partition(). n Divide N elements into N/5 groups of 5 elements each, plus extra N = 54
Partition Partition(). n Divide N elements into N/5 groups of 5 elements each, plus extra. n Brute force sort each of the 5-element groups
11 28 Partition Partition(). n Divide N elements into N/5 groups of 5 elements each, plus extra. n Brute force sort each of the 5-element groups. Find x = "median of medians" by Select() on N/5 medians
12 Select if (N is small) use mergesort Divide a[] into groups of 5, and let m 1, m 2,..., m N/5 be list of medians. x Select(N/10, m 1, m 2,..., m N/5 ) k rank(x) if (i == k) // Case 1 return x else if (i < k) // Case 2 b[] all items of a[] less than x return Select(i th, k-1, b 1, b 2,..., b k-1 ) else if (i > k) // Case 3 c[] all items of a[] greater than x return Select((i-k) th, N-k, c 1, c 2,..., c N-k ) Select (i th, N, a 1, a 2,..., a N ) median of medians
median of medians Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. n At least 1/2 of 5 element medians x – at least N / 5 / 2 = N / 10 medians x
median of medians Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. n At least 1/2 of 5 element medians x – at least N / 5 / 2 = N / 10 medians x n At least 3 N / 10 elements x.
Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. n At least 1/2 of 5 element medians x – at least N / 5 / 2 = N / 10 medians x n At least 3 N / 10 elements x. n At least 3 N / 10 elements x. median of medians
16 Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. n At least 1/2 of 5 element medians x – at least N / 5 / 2 = N / 10 medians x n At least 3 N / 10 elements x. n At least 3 N / 10 elements x. Select() called recursively (Case 2 or 3) with at most N - 3 N / 10 elements. C(N) = # comparisons on a file of size N. Now, solve recurrence. n Apply master theorem? n Assume N is a power of 2? n Assume C(N) is monotone non-decreasing?
17 Selection Analysis Analysis of selection recurrence. n T(N) = # comparisons on a file of size N. n T(N) is monotone, but C(N) is not! Claim: T(N) 20cN. n Base case: N < 50. n Inductive hypothesis: assume true for 1, 2,..., N-1. n Induction step: for N 50, we have: For n 50, 3 N / 10 N / 4.
18 Linear Time Selection Postmortem Practical considerations. n Constant (currently) too large to be useful. n Practical variant: choose random partition element. – O(N) expected running time ala quicksort. n Open problem: guaranteed O(N) with better constant. Quicksort. n Worst case O(N log N) if always partition on median. n Justifies practical variants: median-of-3, median-of-5.