Order Statistics
Order statistics Given an input of n values and an integer i, we wish to find the i’th smallest value, called the i’th order statistic. There are i-1 elements smaller than the i’th order statistic (assuming distinct values). The minimum element is the 1st order statistic.
Order statistics The maximum element is the n’th order statistic. When the values are already sorted, finding the i’th order statistic is trivial: O(1) using direct access. But sorting the elements in advance with a comparison sort takes Ω(n log n) time in the worst case.
Selection Our goal is to find an order statistic without sorting the elements, in the hope of improving the running time. If we want the minimum or the maximum element, a linear scan will do. However, this idea cannot be easily extended to an arbitrary order statistic.
Tournaments In a basketball tournament involving n teams, we form a complete binary tree with n leaves. Each internal node represents an elimination game, so n-1 games are played in total. Each level has half as many nodes as the level below it. Assuming the better team always wins its game, the best team wins all its games and can be found as the winner of the last game.
Tournaments can be used for finding the minimum or the maximum. But could they be extended to select any order statistic? The tournament algorithm:
- Can be run in parallel.
- Is fair (every team reaches each round after the same number of games).
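Tournaments To select the second best team in the tournament, we only need to compare the log n teams that lost directly to the best team, because the second best team can only have lost to the best one. We can compare these teams recursively using another tournament. The total number of comparisons is therefore about n + log n: n - 1 for the first tournament and log n - 1 for the second.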
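The second-best idea can be sketched in Java as follows. This is a minimal illustration, not the slides' own code: the class name `Tournament`, the method `secondBest`, and the `lostTo` bookkeeping are assumptions, and the final step scans the champion's direct losers linearly instead of running a second tournament (the number of comparisons is the same).

```java
import java.util.ArrayList;
import java.util.List;

class Tournament {
    // Returns the second-largest value; assumes values.length >= 2.
    static int secondBest(int[] values) {
        List<Integer> round = new ArrayList<>();          // indices still in play
        List<List<Integer>> lostTo = new ArrayList<>();   // lostTo.get(i): values beaten directly by team i
        for (int i = 0; i < values.length; i++) {
            round.add(i);
            lostTo.add(new ArrayList<>());
        }
        while (round.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int j = 0; j + 1 < round.size(); j += 2) {
                int p = round.get(j), q = round.get(j + 1);
                int winner = values[p] >= values[q] ? p : q;
                int loser = (winner == p) ? q : p;
                lostTo.get(winner).add(values[loser]);    // remember who lost to whom
                next.add(winner);
            }
            if (round.size() % 2 == 1) {
                next.add(round.get(round.size() - 1));    // odd team out gets a bye
            }
            round = next;
        }
        // The runner-up must be among the teams that lost directly to the champion.
        List<Integer> candidates = lostTo.get(round.get(0));
        int best = candidates.get(0);
        for (int v : candidates) {
            best = Math.max(best, v);
        }
        return best;
    }
}
```

For example, `Tournament.secondBest(new int[]{3, 9, 4, 7, 1, 8, 2, 6})` examines only the three values that lost to 9.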
HeapSelect The tournament algorithm is like a binary heap, and finding the second minimum is like removing the minimal element from a binary heap and reading the new minimum. For any other k we use:
int heapSelect(int[] values, int k) {
    Heap heap = buildHeap(values);   // O(n)
    for (int i = 1; i < k; i++)
        heap.removeMin();            // O(log n) each
    return heap.minElement();
}
Heap Select The running time is O(n + k log n), which is linear for any k = O(n / log n). But this algorithm is not linear for finding the median element (k = n/2), which is of common interest: in that case it takes O(n log n).
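A runnable sketch of heap select, assuming `java.util.PriorityQueue` stands in for the slides' abstract `Heap` type. The class name `HeapSelect` is illustrative. Constructing the queue from a collection heapifies it in one pass (linear time in OpenJDK, to the best of our knowledge), which plays the role of `buildHeap`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HeapSelect {
    // Returns the k-th smallest value; k is 1-based, 1 <= k <= values.length.
    static int heapSelect(int[] values, int k) {
        List<Integer> boxed = new ArrayList<>(values.length);
        for (int v : values) {
            boxed.add(v);
        }
        // Building the min-heap from a whole collection corresponds to buildHeap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(boxed);
        for (int i = 1; i < k; i++) {
            heap.poll();          // removeMin, O(log n) each
        }
        return heap.peek();       // minElement
    }
}
```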
Quick Select We could use quick sort to first sort the elements and then select the k’th element according to its location in the sorted values:
int quickSelect(int[] values, int k) {
    quickSort(values);
    return values[k - 1];   // k is 1-based
}
Quick Select An inline version of this algorithm would look like this:
quickSelect(int[] values, int k) {
    pick x in values
    partition values into L1 < x, L2 = x, L3 > x
    quicksort(L1)
    quicksort(L3)
    concatenate L1, L2, L3
    return kth element in concatenation
}
Quick Select But if k is at most the length of L1, we will always return some object in L1, and it doesn't matter whether we call quicksort on L3. Similarly, if k is greater than the combined lengths of L1 and L2, we will always return some object in L3, and it doesn't matter whether we call quicksort on L1. In either case, we can save some time by making only one of the two recursive calls. If we find that the element to be returned is in L2, we can immediately return x without making either recursive call.
Quick Select
quickSelect(int[] values, int k) {
    pick x in values
    partition values into L1 < x, L2 = x, L3 > x
    if (k <= length(L1)) {
        quicksort(L1)
        return kth element in L1
    } else if (k > length(L1) + length(L2)) {
        quicksort(L3)
        return (k - length(L1) - length(L2))th element in L3
    } else
        return x
}
Recursive final version
quickSelect(int[] values, int k) {
    pick x in values
    partition values into L1 < x, L2 = x, L3 > x
    if (k <= length(L1)) {
        return quickSelect(L1, k)
    } else if (k > length(L1) + length(L2)) {
        return quickSelect(L3, k - length(L1) - length(L2))
    } else
        return x
}
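The recursive version can be made runnable roughly as follows. This is a sketch under a few assumptions that the slides leave open: the pivot x is picked uniformly at random, k is 1-based, and L1 and L3 are materialized as lists, while L2 is represented only by its length. The class name `QuickSelect` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class QuickSelect {
    private static final Random RNG = new Random();

    // Returns the k-th smallest value; k is 1-based, 1 <= k <= values.length.
    static int quickSelect(int[] values, int k) {
        List<Integer> list = new ArrayList<>(values.length);
        for (int v : values) {
            list.add(v);
        }
        return quickSelect(list, k);
    }

    static int quickSelect(List<Integer> values, int k) {
        int x = values.get(RNG.nextInt(values.size()));   // pick x in values
        List<Integer> l1 = new ArrayList<>();             // elements < x
        List<Integer> l3 = new ArrayList<>();             // elements > x
        int l2Size = 0;                                   // number of elements == x
        for (int v : values) {
            if (v < x) l1.add(v);
            else if (v > x) l3.add(v);
            else l2Size++;
        }
        if (k <= l1.size()) {
            return quickSelect(l1, k);
        } else if (k > l1.size() + l2Size) {
            return quickSelect(l3, k - l1.size() - l2Size);
        } else {
            return x;
        }
    }
}
```

Because x itself always lands in L2, both L1 and L3 are strictly smaller than the input, so the recursion terminates.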
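Time analysis If the partition always splits the values into 2 equal subarrays: T(n) = T(n/2) + O(n) = O(n). Worst case, when every partition is a bad split (one side holds all elements but the pivot): T(n) = T(n-1) + O(n) = O(n^2). Average case: O(n) in expectation when the pivot is chosen uniformly at random.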
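Worst case O(n) algorithm
- Divide the input elements into groups of 5 elements each.
- Find the median of each group.
- Use select recursively to find the median of medians.
- Partition the input using the median of medians as the pivot element.
- Recurse into the side of the partition that contains the requested order statistic, as in quick select.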
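The steps above can be sketched in Java as follows. The class name `MedianOfMedians`, the 1-based k, the direct sort for inputs of at most 5 elements, and the choice of the lower median for an incomplete group are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MedianOfMedians {
    // Returns the k-th smallest value; k is 1-based, 1 <= k <= values.size().
    static int select(List<Integer> values, int k) {
        if (values.size() <= 5) {                      // small input: sort directly
            List<Integer> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            return sorted.get(k - 1);
        }
        // Divide into groups of 5 and take the median of each group.
        List<Integer> medians = new ArrayList<>();
        for (int i = 0; i < values.size(); i += 5) {
            List<Integer> group =
                new ArrayList<>(values.subList(i, Math.min(i + 5, values.size())));
            Collections.sort(group);                   // constant work per group
            medians.add(group.get((group.size() - 1) / 2));
        }
        // Recursively find the median of medians and use it as the pivot.
        int x = select(medians, (medians.size() + 1) / 2);
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        int equal = 0;
        for (int v : values) {
            if (v < x) smaller.add(v);
            else if (v > x) larger.add(v);
            else equal++;
        }
        // Recurse only into the side that contains the k-th smallest value.
        if (k <= smaller.size()) return select(smaller, k);
        if (k > smaller.size() + equal) return select(larger, k - smaller.size() - equal);
        return x;
    }
}
```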
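Time analysis The number of elements greater than x (the median of medians) is at least 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6: at least half of the ⌈n/5⌉ group medians are at least x, and each of their groups, except the group of x itself and the possibly incomplete last group, contributes 3 elements greater than x. By symmetry, the same bound holds for the number of elements smaller than x.
Time analysis The recursive call after the partition therefore receives at most 7n/10 + 6 elements, so T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n), which solves to T(n) = O(n).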
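The linear bound can be checked by substitution. The derivation below is a sketch following the standard textbook treatment: it assumes the recurrence T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + an, where the constant a (the cost of grouping and partitioning per element) and the constant c in the guess T(n) ≤ cn are symbols introduced here, not taken from the slides.

```latex
\begin{align*}
T(n) &\le T(\lceil n/5 \rceil) + T(7n/10 + 6) + an \\
     &\le c\,\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le c\,(n/5 + 1) + 7cn/10 + 6c + an \\
     &= cn - \left(cn/10 - 7c - an\right) \\
     &\le cn \qquad \text{whenever } cn/10 - 7c - an \ge 0 .
\end{align*}
```

The last condition is equivalent to c ≥ 10a · n/(n - 70) for n > 70. For n ≥ 140 we have n/(n - 70) ≤ 2, so any c ≥ 20a works, and smaller inputs are covered by treating them as a constant-size base case.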
Exercise Given an array of n elements, describe an algorithm that efficiently determines whether some number in the array appears more than n/3 times.
An inefficient solution Sort the array, then scan it for a run of equal values longer than n/3. This takes O(n log n) time.
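An efficient solution The only elements that can appear more than n/3 times are the order statistics n/3 and 2n/3, because in sorted order a run of more than n/3 equal values must cover at least one of these two positions. Find both of these elements using the select algorithm, then count the instances of each of them in the array. The total time is O(n).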
Example n=12 n/3=4 2n/3=8
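A possible Java sketch of this solution. Several details are assumptions made here rather than taken from the slides: the two positions are rounded up to ⌈n/3⌉ and ⌈2n/3⌉ so that they are valid for every n ≥ 1, and `select` is a randomized quick select (expected linear time) standing in for the worst-case linear select. The names `FrequentElement` and `hasFrequent` are illustrative.

```java
import java.util.Random;

class FrequentElement {
    private static final Random RNG = new Random();

    // True if some value appears more than n/3 times in values.
    static boolean hasFrequent(int[] values) {
        int n = values.length;
        if (n == 0) return false;
        int first = select(values, (n + 2) / 3);        // order statistic ceil(n/3)
        int second = select(values, (2 * n + 2) / 3);   // order statistic ceil(2n/3)
        return 3L * count(values, first) > n || 3L * count(values, second) > n;
    }

    // k-th smallest value (k is 1-based), computed on a copy with three-way partitioning.
    static int select(int[] values, int k) {
        int[] a = values.clone();
        int lo = 0, hi = a.length - 1, target = k - 1;
        while (true) {
            int x = a[lo + RNG.nextInt(hi - lo + 1)];
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {                           // a[lo..lt-1] < x, a[gt+1..hi] > x
                if (a[i] < x) swap(a, lt++, i++);
                else if (a[i] > x) swap(a, i, gt--);
                else i++;
            }
            if (target < lt) hi = lt - 1;
            else if (target > gt) lo = gt + 1;
            else return x;                              // target falls in the run equal to x
        }
    }

    private static int count(int[] values, int x) {
        int c = 0;
        for (int v : values) {
            if (v == x) c++;
        }
        return c;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

With n = 12, the two candidates are the 4th and 8th smallest values, matching the example above.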
Exercise Given two sorted arrays A and B of size n each, find the median of the 2n elements of the union of A and B. The median of an array of even size is the average of the two elements in the middle of the sorted collection.
An inefficient solution Using merge, we combine both arrays into a single sorted array and return the median of the new array. This takes O(n) time.
An efficient solution Let a be the median of A and let b be the median of B. Assuming a ≤ b (otherwise swap the roles of A and B), recursively call the algorithm with the upper half of A and the lower half of B. Base case: if |A| = 1 and |B| = 1, return (a+b)/2.
Proof Let c be the median of the union of A and B; we claim that a ≤ c ≤ b. A has n/2 elements smaller than a. Since a ≤ b, B has at most n/2 elements smaller than a. So in the union there are at most n elements smaller than a and, symmetrically, at most n elements greater than b.
Proof The median of the merge of the upper half of A and the lower half of B is the same as the median of A union B, since a ≤ c ≤ b.
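Proof Since we removed exactly n elements, and these are n/2 of the smallest and n/2 of the largest elements in the union, the median stays in place. Time analysis: each call does O(1) work and halves both arrays, so T(n) = T(n/2) + O(1) = O(log n).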
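The halving idea can be sketched in Java as follows. The slides leave the rounding unspecified, so this sketch makes its own choices, all of which are assumptions: it discards (n-1)/2 elements from each array per step (keeping the middle elements), adds a base case for n = 2 so the recursion terminates, and passes offsets instead of copying subarrays. The class name `TwoArrayMedian` is illustrative.

```java
class TwoArrayMedian {
    // Median of the union of two sorted arrays of equal, non-zero length.
    static double median(int[] a, int[] b) {
        return median(a, 0, b, 0, a.length);
    }

    // Considers the windows a[aLo .. aLo+n-1] and b[bLo .. bLo+n-1].
    private static double median(int[] a, int aLo, int[] b, int bLo, int n) {
        if (n == 1) {
            return ((long) a[aLo] + b[bLo]) / 2.0;
        }
        if (n == 2) {
            // The middle two of the four elements are max of the firsts and min of the seconds.
            return ((long) Math.max(a[aLo], b[bLo]) + Math.min(a[aLo + 1], b[bLo + 1])) / 2.0;
        }
        int mid = n / 2;
        // Compare the two medians via twice their value, to stay in integer arithmetic.
        long twiceMedA = (n % 2 == 1) ? 2L * a[aLo + mid] : (long) a[aLo + mid - 1] + a[aLo + mid];
        long twiceMedB = (n % 2 == 1) ? 2L * b[bLo + mid] : (long) b[bLo + mid - 1] + b[bLo + mid];
        int drop = (n - 1) / 2;   // elements discarded from each array
        if (twiceMedA <= twiceMedB) {
            // Discard the smallest of A and the largest of B.
            return median(a, aLo + drop, b, bLo, n - drop);
        } else {
            // Discard the largest of A and the smallest of B.
            return median(a, aLo, b, bLo + drop, n - drop);
        }
    }
}
```

Each call removes the same number of elements from below and above the overall median, so the median of the remaining elements is unchanged, and the window size shrinks by about half per step.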