The Selection Problem.


Median and Order Statistics
In this section, we study algorithms for finding the ith smallest element in a set of n elements. We will again use divide-and-conquer algorithms.

The Selection Problem
Input: A set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n
Output: The element x ∈ A that is larger than exactly i – 1 other elements of A
x is the ith smallest element
i = 1 → minimum
i = n → maximum

The Selection Problem (cont)
A simple solution:
Sort A
Return A[i]
This takes Θ(n lg n) time with a comparison sort such as merge sort

Minimum and Maximum
Finding the minimum or the maximum
Each takes n – 1 comparisons, which is Θ(n)
This is the best we can do and is optimal with respect to the number of comparisons

MINIMUM(A)
    min ← A[1]
    for i ← 2 to length(A)
        if min > A[i]
            min ← A[i]
    return min

MAXIMUM(A)
    max ← A[1]
    for i ← 2 to length(A)
        if max < A[i]
            max ← A[i]
    return max

Minimum and Maximum (cont)
Simultaneous minimum and maximum
Obvious solution is 2(n – 1) comparisons
But we can do better, namely at most 3⌊n/2⌋ comparisons
The algorithm:
If n is odd, set max and min to the first element
If n is even, compare the first two elements and set max and min accordingly
Process the remaining elements in pairs
Find the larger and the smaller of the pair
Compare the larger of the pair with the current max
And the smaller of the pair with the current min

Minimum and Maximum (cont)
Total number of comparisons
If n is odd: 3⌊n/2⌋ comparisons
If n is even: 1 initial comparison and 3(n – 2)/2 further comparisons, for a total of 3n/2 – 2 comparisons
In either case, the total number of comparisons is at most 3⌊n/2⌋
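The pairwise scheme can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `min_and_max` and the explicit comparison counter are assumptions added so the count can be checked against the 3⌊n/2⌋ bound.

```python
def min_and_max(values):
    """Return (minimum, maximum, comparisons) for a non-empty sequence."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element seeds both min and max, no comparison needed.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison orders the first two elements.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    # Process the remaining elements in pairs, 3 comparisons per pair.
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 3
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

For example, `min_and_max([3, 1, 4, 1, 5, 9, 2, 6])` returns `(1, 9, 10)`, and 10 = 3·8/2 – 2 matches the even-n count.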

Selection in Expected Linear Time
Goal: Select the ith smallest element from A[p..r].
Partition A[p..r] around a pivot into A[p..q-1] and A[q+1..r], with the pivot ending up in A[q]
If the pivot is the ith smallest element, return A[q]
If the ith smallest element is in A[p..q-1], recurse on A[p..q-1]
Otherwise, recurse on A[q+1..r]

Selection in Expected Linear Time (cont)
Randomized-Select(A, p, r, i)
    if p = r
        return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q - p + 1    // number of elements in the low side of the partition + pivot
    if i = k         // the pivot value is the answer
        return A[q]
    else if i < k
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i-k)
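A minimal Python sketch of Randomized-Select follows. It assumes 0-based inclusive indices p and r and a 1-based rank i, and it uses a Lomuto-style partition with a randomly chosen pivot. The slides do not show Randomized-Partition, so that helper is an assumption.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's final index."""
    s = random.randint(p, r)
    a[s], a[r] = a[r], a[s]          # move the random pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # put the pivot between the two sides
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based rank) element of a[p..r]; rearranges a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                    # size of the low side plus the pivot
    if i == k:
        return a[q]                  # the pivot is the answer
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

A typical call is `randomized_select(data, 0, len(data) - 1, i)`. Note that the list is rearranged as a side effect, so pass a copy if the original order matters.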

Revised Algorithm (here i is treated as an absolute index into A, with p ≤ i ≤ r, so no offset k is needed)
Randomized-Select(A, p, r, i)
    if p = r
        return A[p]
    q ← Randomized-Partition(A, p, r)
    if i = q         // the pivot value is the answer
        return A[q]
    else if i < q
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i)

Analysis of Selection Algorithm
Worst-case running time is Θ(n²)
Partition takes Θ(n)
If we always partition around the largest remaining element, we reduce the partition size by only one element each time
What is the best case?

Analysis of Selection Algorithm (cont)
Average case (i.e. expected running time for Randomized-Select)
Average-case running time is Θ(n)
The time required is the random variable T(n)
We want an upper bound on E[T(n)]
In Randomized-Partition, all elements are equally likely to be the pivot

So, for each k such that 1 ≤ k ≤ n, the subarray A[p..q] has k elements (all ≤ the pivot) with probability 1/n
For k = 1, 2, …, n we define indicator random variables X_k, where X_k = I{the subarray A[p..q] has exactly k elements}
So, E[X_k] = 1/n

When we choose the pivot element (which ends up in A[q]) we do not know what will happen next Do we return with the ith element (k = i)? Do we recurse on A[p..q-1]? Do we recurse on A[q+1..r]? Decision depends on i in relation to k We will find the upper-bound on the average case by assuming that the ith element is always in the larger partition

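Now, X_k = 1 for just one value of k, and 0 for all others
When X_k = 1, the two subarrays have sizes k – 1 and n – k
Hence the recurrence:
\[ T(n) \le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) = \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \]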

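Taking the expected values (X_k is independent of T(max(k – 1, n – k))):
\[ E[T(n)] \le \sum_{k=1}^{n} E[X_k] \cdot E[T(\max(k-1,\, n-k))] + O(n) = \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1,\, n-k))] + O(n) \]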

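Looking at the expression max(k – 1, n – k): it equals k – 1 if k > ⌈n/2⌉ and n – k if k ≤ ⌈n/2⌉
If n is even, each term from T(⌈n/2⌉) up to T(n – 1) appears exactly twice in the summation
If n is odd, each of these terms appears twice and T(⌊n/2⌋) appears once in the summation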

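Thus we have
\[ E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) \]
We use substitution to solve the recurrence
Note: T(n) = Θ(1) for n less than some constant
Assume that E[T(n)] ≤ cn for some constant c that satisfies the initial conditions of the recurrence
Also pick a constant a such that the O(n) term is bounded above by an for all n > 0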

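Using this inductive hypothesis (and an as the bound on the O(n) term):
\[ E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \]
\[ = \frac{2c}{n} \left( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right) + an \]
\[ = \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \]
\[ \le \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \]
\[ = c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \]
\[ \le \frac{3cn}{4} + \frac{c}{2} + an = cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right) \]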

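To complete the proof, we need to show that for sufficiently large n, the last expression, cn – (cn/4 – c/2 – an), is at most cn, i.e. cn/4 – c/2 – an ≥ 0
Adding c/2 to both sides and factoring out n gives n(c/4 – a) ≥ c/2
As long as we choose the constant c so that c/4 – a > 0 (i.e., c > 4a), we can divide both sides by c/4 – a, giving n ≥ (c/2)/(c/4 – a) = 2c/(c – 4a)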

Thus, if we assume that T(n) = O(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n)
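The bound can be sanity-checked numerically. The sketch below tabulates the recurrence with equality, T(n) = (2/n) Σ_{k=⌊n/2⌋}^{n-1} T(k) + an, under assumptions that are not in the slides: T(0) = 0, a = 1, and the illustrative choice c = 8 (so c > 4a and 2c/(c – 4a) = 4). The function name is hypothetical.

```python
def expected_time_bound(n_max, a=1.0):
    """Tabulate T(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} T(k) + a*n, with T(0) = 0."""
    t = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # t[n // 2 : n] holds T(floor(n/2)), ..., T(n-1)
        t[n] = (2.0 / n) * sum(t[n // 2 : n]) + a * n
    return t


table = expected_time_bound(500)
# With a = 1 and c = 8, the substitution proof predicts T(n) <= 8n for every n >= 1.
worst_ratio = max(table[n] / n for n in range(1, 501))
```

Here `worst_ratio` stays below 8, consistent with the linear bound; the small cases n < 4 act as the base cases of the induction.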

Selection in Worst-Case Linear Time
“Median of Medians” algorithm
It guarantees a good split when the array is partitioned
Partition is modified so that the pivot becomes an input parameter
The algorithm (Select):
If n = 1, return A[1]

Selection in Worst-Case Linear Time (cont)
1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group made up of the remaining n mod 5 elements
2. Find the median of each of the ⌈n/5⌉ groups by using insertion sort to sort each group and then picking its middle element (the 3rd element of a full group)
3. Use Select recursively to find the median x of the ⌈n/5⌉ medians found in step 2. If there is an even number of medians, choose the lower median

4. Partition the input array around the “median of medians” x using the modified version of Partition. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n – k elements on the high side of the partition
5. If i = k, then return x. Otherwise, use Select recursively to find the ith smallest element on the low side if i < k, or the (i – k)th smallest element on the high side if i > k
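The five steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input is a non-empty list of distinct numbers, `sorted()` stands in for insertion sort on each group, and the partition step builds new lists instead of rearranging the array in place with a modified Partition.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of a non-empty list of distinct numbers."""
    n = len(a)
    if n == 1:
        return a[0]
    # Steps 1-2: split into groups of at most 5 and take each group's (lower) median.
    groups = [sorted(a[j:j + 5]) for j in range(0, n, 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the lower median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (out of place, for clarity).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k = len(low) + 1                 # x is the k-th smallest element
    # Step 5: return x or recurse on the side that contains the answer.
    if i == k:
        return x
    if i < k:
        return select(low, i)
    return select(high, i - k)
```

For example, `select([9, 1, 8, 2, 7, 3, 6, 4, 5], 5)` returns the median, 5. An in-place variant would avoid the extra lists but follows the same five steps.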

Example of “Median of Medians”
Input array A[1..125]
Step 1: 25 groups of 5
Step 2: We get 25 medians
Step 3: Recursive call on the 25 medians
    Step 1: Using the 25 medians, we get 5 groups of 5
    Step 2: We get 5 medians
    Step 3: Recursive call on the 5 medians
        Step 1: Using the 5 medians, we get 1 group of 5
        Step 2: We get 1 median
Step 4: Partition A around the median

Analyzing “Median of Medians”
The following diagram might be helpful:

Analyzing “Median of Medians” (cont)
First, we need to put a lower bound on how many elements are greater than x (the pivot)
How many of the medians are greater than or equal to x?
At least half of the medians from the ⌈n/5⌉ groups
Why “at least half”? Because x is the median of those medians
So at least ⌈(1/2)⌈n/5⌉⌉ medians are greater than or equal to x

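Each of these medians contributes at least 3 elements greater than x, except for two groups:
The group that contains x (it contributes only 2 elements greater than x)
The group that has fewer than 5 elements
So the total number of elements > x is at least:
\[ 3\left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right) \ge \frac{3n}{10} - 6 \]
The – 2 accounts for the two discarded groups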

Similarly, there are at least 3n/10 – 6 elements smaller than x
Thus, in the worst case, Select is called recursively in Step 5 on the largest partition
The largest partition has at most n – (3n/10 – 6) = 7n/10 + 6 elements (the size of the array minus the number of elements in the smaller partition)
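The split guarantee can be checked empirically. The sketch below uses hypothetical helper names and assumes distinct inputs and lower medians. It computes the median of medians directly by sorting, which is enough for checking the counts, and reports how many elements fall on each side.

```python
def median_of_medians(a):
    """Lower median of the group medians, with groups of at most 5 elements."""
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = sorted(g[(len(g) - 1) // 2] for g in groups)
    return medians[(len(medians) - 1) // 2]


def split_sizes(a):
    """Return (number of elements smaller than x, number larger than x) for pivot x."""
    x = median_of_medians(a)
    smaller = sum(1 for v in a if v < x)
    larger = sum(1 for v in a if v > x)
    return smaller, larger
```

For any list of n distinct numbers, both counts should be at least 3n/10 – 6, so neither side can exceed 7n/10 + 6 elements.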

Developing the recurrence:
Step 1 takes O(n) time
Step 2 takes O(n) time: O(n) calls to Insertion Sort on sets of size O(1)
Step 3 takes T(⌈n/5⌉) time
Step 4 takes O(n) time
Step 5 takes at most T(7n/10 + 6) time

So the recurrence is
\[ T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) \]
Now use substitution to solve
Assume T(n) ≤ cn for some suitably large constant c and all n > ???
Also pick a constant a such that the function described by the O(n) term is bounded above by an for all n > 0

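\[ T(n) \le c \lceil n/5 \rceil + c(7n/10 + 6) + an \]
\[ \le cn/5 + c + 7cn/10 + 6c + an \]
(the extra + c comes from removing the ⌈ ⌉)
\[ = 9cn/10 + 7c + an = cn + (-cn/10 + 7c + an) \]
Which is at most cn if
\[ -cn/10 + 7c + an \le 0 \iff c \ge 10a \cdot \frac{n}{n - 70} \quad \text{when } n > 70 \]
If n = 70, then this inequality is undefined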

We assume that n ≥ 71, so n/(n – 70) ≤ 71
Choosing c ≥ 710a will satisfy the inequality on the previous slide
You could choose any constant > 70 to be the base case constant
Thus, the selection problem can be solved in linear time in the worst case
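As a quick check of the constant (a worked instance, taking c = 710a exactly), substituting into the condition –cn/10 + 7c + an ≤ 0 gives:

\[ -\frac{cn}{10} + 7c + an = -71an + 4970a + an = -70a\,(n - 71) \le 0 \quad \text{for all } n \ge 71 \]

so the inductive step T(n) ≤ cn goes through for every n ≥ 71, and the condition is tight exactly at n = 71.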

Review of Sorts
Review of sorts seen so far
Insertion Sort
Easy to code
Fast on small inputs (less than ~50)
Fast on nearly sorted inputs
Stable
Θ(n) best case (sorted list)
Θ(n²) average case
Θ(n²) worst case (reverse sorted list)

Stable Sorts Stable means that numbers with the same value appear in the output array in the same order as they do in the input array. That is, ties between two numbers are broken by the rule that whichever number appears first in the input array appears first in the output array. Normally, the property of stability is important only when satellite data are carried around with the element being sorted.

An example of stable sorting on playing cards
When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in the opposite order in the sorted output.
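Stability is easy to observe in Python, whose built-in `sorted()` is guaranteed to be stable. The hand of cards below is made up for illustration; the slides do not specify the actual cards beyond the two 5s.

```python
# Hypothetical hand: two 5s, with the 5 of hearts appearing before the 5 of clubs.
cards = [("5", "hearts"), ("3", "spades"), ("5", "clubs"), ("2", "diamonds")]

# Sort by rank only; the suit is satellite data carried along with the key.
by_rank = sorted(cards, key=lambda card: int(card[0]))

# by_rank == [("2", "diamonds"), ("3", "spades"), ("5", "hearts"), ("5", "clubs")]
```

Because the sort is stable, the 5 of hearts still precedes the 5 of clubs in the output.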

Review of Sorts (cont)
MergeSort
Divide and Conquer algorithm
Doesn’t sort in place
Requires memory as a function of n
Stable
Θ(n lg n) best case
Θ(n lg n) average case
Θ(n lg n) worst case

Review of Sorts (cont)
QuickSort
Divide and Conquer algorithm
No merge step needed
Small constants
Fast in practice
Not stable
Θ(n lg n) best case
Θ(n lg n) average case
Θ(n²) worst case

Review of Sorts (cont)
Several of these algorithms sort in O(n lg n) time
MergeSort in the worst case
QuickSort on average
For each of these algorithms, there are inputs that force Ω(n lg n) time
The sorted order they determine is based only on comparisons between the input elements
They are called comparison sorts

Review of Sorts (cont)
Other techniques for sorting exist, such as Linear Sorting, which is not based on comparisons
Usually with some restrictions or assumptions on input elements
Linear Sorting techniques include: Counting Sort, Radix Sort, Bucket Sort

Lower Bounds for Sorting
In general, assuming unique inputs, comparison sorts are expressed in terms of a_i ≤ a_j comparisons
The comparisons a_i < a_j, a_i ≤ a_j, a_i ≥ a_j and a_i > a_j are equivalent in learning about the relative order of a_i and a_j
What is the best we can do on the worst-case type of input?
What is the best worst-case running time?

The Decision-Tree Model
input: a1,a2,a3 # possible outputs = 3! = 6 Each possible output is a leaf 1:2  > 2:3 1:3  >  > 1,2,3  1:3 2,1,3  2:3  >  > 1,3,2  3,1,2  2,3,1  3,2,1 

Analysis of Decision-Tree Model
The worst-case number of comparisons is equal to the height of the decision tree
A lower bound on the worst-case running time is a lower bound on the height of the decision tree
Note that the number of leaves in the decision tree is ≥ n!, where n = number of elements in the input sequence

Theorem 8.1
Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case
Proof: Consider a decision tree of height h that sorts n elements
Since there are n! permutations of n elements, each permutation representing a distinct sorted order, the tree must have at least n! leaves

Theorem 8.1 (cont)
A binary tree of height h has at most 2^h leaves
So n! ≤ 2^h, and taking logarithms gives h ≥ lg(n!) = Ω(n lg n), by equation 3.18
The best possible worst-case running time for comparison sorts is thus Θ(n lg n)
Mergesort, which is O(n lg n), is asymptotically optimal
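The bound h ≥ lg(n!) can be evaluated exactly for small n. The sketch below, with a hypothetical function name, computes the smallest h satisfying 2^h ≥ n! using integer arithmetic only.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!)), computed without floating point."""
    # For a positive integer x, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()
```

For n = 3 this gives 3, matching the height of the three-element decision tree, and for n = 5 it gives 7.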

Sorting in Linear Time
How can we do better?
CountingSort
RadixSort
BucketSort
