Order Statistics

Order statistics Given an input of n values and an integer i, we wish to find the i’th smallest value, called the i’th order statistic. There are i-1 elements smaller than the i’th order statistic (assuming distinct values). The minimal element is order statistic 1.

Order statistics The maximum element is the n’th order statistic. Finding the i’th order statistic when the values are sorted is trivial, O(1) using direct access, but sorting the elements in advance with a comparison sort takes Ω(n log n) time in the worst case.

Selection Our goal is to find an order statistic without sorting the elements, and to see whether this improves the running time. If we want to find the minimal or maximal element, a linear scan will do. However, this idea cannot be easily extended to an arbitrary order statistic.

Tournaments In a basketball tournament involving n teams, we form a complete binary tree with n leaves. Each internal node represents an elimination game, and each level has half as many nodes as the level below it. Assuming the better team always wins its game, the best team wins all its games and can be found as the winner of the last game.

Tournaments

Tournaments can be used for finding the minimum or maximum. But could they be enhanced to select any order statistic? The tournament algorithm:
– Can be run in parallel.
– Is fair (every team reaches each round after the same number of games).

Tournaments To select the second best team in the tournament, we only need to compare the log n teams that lost directly to the best team. We can compare these teams using another, smaller tournament. The total number of games is therefore about n + log n.
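The idea above can be sketched in Java as follows. This is a minimal illustration, not the slides' own code: the class name `Tournament` and method `secondBest` are hypothetical, the input is assumed to hold at least two values, and the final pass over the champion's direct losers is a plain linear scan rather than a second tournament.

```java
import java.util.ArrayList;
import java.util.List;

class Tournament {
    /** Returns the second largest value; assumes values.length >= 2. */
    static int secondBest(int[] values) {
        int n = values.length;
        // beaten.get(i) collects the values that lost a game directly to values[i]
        List<List<Integer>> beaten = new ArrayList<>();
        List<Integer> alive = new ArrayList<>();   // indices still in the tournament
        for (int i = 0; i < n; i++) {
            beaten.add(new ArrayList<>());
            alive.add(i);
        }
        while (alive.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int j = 0; j + 1 < alive.size(); j += 2) {
                int a = alive.get(j), b = alive.get(j + 1);
                int winner = values[a] >= values[b] ? a : b;
                int loser = (winner == a) ? b : a;
                beaten.get(winner).add(values[loser]);
                next.add(winner);
            }
            // With an odd number of players, the last one advances without a game
            if (alive.size() % 2 == 1) next.add(alive.get(alive.size() - 1));
            alive = next;
        }
        int champion = alive.get(0);
        // The runner-up must have lost to the champion, so scan only those losers
        int second = Integer.MIN_VALUE;
        for (int v : beaten.get(champion)) second = Math.max(second, v);
        return second;
    }
}
```

The champion plays at most about log n games, so the last loop looks at only that many candidates.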

HeapSelect The tournament algorithm is like a binary heap, and finding the second minimum is like removing the minimal element from a binary heap and reading the new minimum. For any other k we use:
    int heapSelect(int[] values, int k) {
        Heap heap = buildHeap(values);
        for (int i = 1; i < k; i++)
            heap.removeMin();
        return heap.minElement();
    }

Heap Select The time is O(n + k log n), which is linear for any k = O(n/log n). But this algorithm is not linear for finding the median element (k = n/2), which is of common interest.
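A runnable version of heap select might look like the sketch below. It swaps the slides' abstract `Heap` for `java.util.PriorityQueue` (a min-heap by default); the class name `HeapSelect` is hypothetical, and k is assumed to be 1-based with 1 <= k <= n. The O(n) build step relies on the queue being constructed from a collection in one go, which OpenJDK does with a heapify pass; treat that as an implementation detail rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HeapSelect {
    /** Returns the k'th smallest value (k is 1-based); assumes 1 <= k <= values.length. */
    static int heapSelect(int[] values, int k) {
        List<Integer> boxed = new ArrayList<>(values.length);
        for (int v : values) boxed.add(v);
        // Build the min-heap from the whole collection at once
        PriorityQueue<Integer> heap = new PriorityQueue<>(boxed);
        // Remove the k-1 smallest elements, O(log n) each
        for (int i = 1; i < k; i++) heap.poll();
        // The minimum of what remains is the k'th smallest overall
        return heap.peek();
    }
}
```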

Quick Select We could use quick sort to first sort the elements and then select the k’th element according to its location in the sorted values:
    int quickSelect(int[] values, int k) {
        quickSort(values);
        return values[k - 1];   // k is 1-based
    }

Quick Select An inline version of this algorithm would look like this:
    quickSelect(int[] values, int k) {
        pick x in values
        partition values into L1 (< x), L2 (= x), L3 (> x)
        quicksort(L1)
        quicksort(L3)
        concatenate L1, L2, L3
        return kth element in concatenation
    }

Quick Select But if k is at most the length of L1, we will always return some object in L1, and it doesn't matter whether we call quicksort on L3. Similarly, if k is greater than the combined lengths of L1 and L2, we will always return some object in L3, and it doesn't matter whether we call quicksort on L1. In either case, we can save some time by making only one of the two recursive calls. If we find that the element to be returned is in L2, we can immediately return x without making either recursive call.

Quick Select
    quickSelect(int[] values, int k) {
        pick x in values
        partition values into L1 (< x), L2 (= x), L3 (> x)
        if (k <= length(L1)) {
            quicksort(L1)
            return kth element in L1
        } else if (k > length(L1) + length(L2)) {
            quicksort(L3)
            return (k - length(L1) - length(L2))th element in L3
        } else
            return x
    }

Recursive final version
    quickSelect(int[] values, int k) {
        pick x in values
        partition values into L1 (< x), L2 (= x), L3 (> x)
        if (k <= length(L1)) {
            return quickSelect(L1, k)
        } else if (k > length(L1) + length(L2)) {
            return quickSelect(L3, k - length(L1) - length(L2))
        } else
            return x
    }
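The recursive pseudocode translates fairly directly into Java. The sketch below is one possible rendering under a few assumptions that are not in the slides: the pivot is picked uniformly at random, the three parts are built as separate lists rather than in place, and k is 1-based with 1 <= k <= n. The class name `QuickSelect` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class QuickSelect {
    private static final Random RNG = new Random();

    /** Returns the k'th smallest value (k is 1-based); assumes 1 <= k <= values.size(). */
    static int quickSelect(List<Integer> values, int k) {
        // "pick x in values": a random pivot is an assumption made here
        int x = values.get(RNG.nextInt(values.size()));
        List<Integer> l1 = new ArrayList<>();   // elements < x
        List<Integer> l2 = new ArrayList<>();   // elements = x
        List<Integer> l3 = new ArrayList<>();   // elements > x
        for (int v : values) {
            if (v < x) l1.add(v);
            else if (v == x) l2.add(v);
            else l3.add(v);
        }
        if (k <= l1.size()) {
            return quickSelect(l1, k);
        } else if (k > l1.size() + l2.size()) {
            // Skip the |L1| + |L2| elements known to be smaller than everything in L3
            return quickSelect(l3, k - l1.size() - l2.size());
        } else {
            return x;
        }
    }
}
```

Because L2 always contains at least the pivot, each recursive call works on a strictly smaller list, so the recursion terminates.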

Time analysis If the partition always splits the values into two equal subarrays: T(n) = T(n/2) + O(n) = O(n). The worst case is that every partition has a bad split: T(n) = T(n-1) + O(n) = O(n^2). Average case: ?

Worst case O(n) algorithm
1. Divide the input elements into groups of 5 elements each.
2. Find the median of each group.
3. Use select recursively to find the median of medians.
4. Partition the input using the median of medians as the pivot element.
5. Recurse into the side of the partition that contains the requested order statistic.
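The steps above can be sketched in Java as shown below. This is an illustrative, list-based version rather than an optimized in-place one; the class name `MedianOfMedians`, the cutoff of 5 elements for sorting directly, and the 1-based k are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MedianOfMedians {
    /** Returns the k'th smallest value (k is 1-based); assumes 1 <= k <= values.size(). */
    static int select(List<Integer> values, int k) {
        if (values.size() <= 5) {                       // small input: sort directly
            List<Integer> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            return sorted.get(k - 1);
        }
        // Divide into groups of (at most) 5 and take the median of each group
        List<Integer> medians = new ArrayList<>();
        for (int i = 0; i < values.size(); i += 5) {
            int end = Math.min(i + 5, values.size());
            List<Integer> group = new ArrayList<>(values.subList(i, end));
            Collections.sort(group);
            medians.add(group.get((group.size() - 1) / 2));
        }
        // Find the median of medians recursively
        int x = select(medians, (medians.size() + 1) / 2);
        // Partition the input around x
        List<Integer> l1 = new ArrayList<>();           // elements < x
        List<Integer> l2 = new ArrayList<>();           // elements = x
        List<Integer> l3 = new ArrayList<>();           // elements > x
        for (int v : values) {
            if (v < x) l1.add(v);
            else if (v == x) l2.add(v);
            else l3.add(v);
        }
        // Recurse only into the part that holds the k'th smallest element
        if (k <= l1.size()) return select(l1, k);
        if (k > l1.size() + l2.size()) return select(l3, k - l1.size() - l2.size());
        return x;
    }
}
```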

Time analysis The number of elements greater than x (the median of medians) is at least 3(⌈(1/2)⌈n/5⌉⌉ − 2) ≥ 3n/10 − 6.

Time analysis
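One standard way to finish the analysis is sketched below. It is the usual textbook argument for groups of 5, given here under the assumption that at least 3n/10 − 6 elements fall on each side of the pivot, so the recursive call after partitioning sees at most 7n/10 + 6 elements. The constants a, c and the threshold n ≥ 140 belong to this particular derivation.

```latex
\begin{align*}
T(n) &\le T\!\left(\left\lceil \tfrac{n}{5} \right\rceil\right)
        + T\!\left(\tfrac{7n}{10} + 6\right) + an
        && \text{(median of medians, recursive select, partition)}\\
\intertext{Guess $T(n) \le cn$ and substitute:}
T(n) &\le c\left(\tfrac{n}{5} + 1\right) + c\left(\tfrac{7n}{10} + 6\right) + an \\
     &= \tfrac{9cn}{10} + 7c + an \\
     &\le cn \quad \text{whenever } \tfrac{cn}{10} \ge 7c + an,
        \text{ e.g. } c \ge 20a \text{ and } n \ge 140 .
\end{align*}
```

Hence T(n) = O(n) in the worst case: the two recursive calls together cover only 1/5 + 7/10 < 1 of the input.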

Exercise Given an array of n elements, describe an algorithm that efficiently determines whether one of the numbers in the array appears more than n/3 times.

An inefficient solution Sort the array. Then check for a sequence of size greater than n/3.

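An efficient solution The only elements that can appear more than n/3 times are order statistics n/3 and 2n/3 (rounded up), because in sorted order such an element occupies a run of more than n/3 consecutive positions, and that run must cover one of these two positions. Find both of these elements using the select algorithm, then count the instances of each of them in the array. The total time is O(n).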
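A possible Java rendering of this solution is sketched below. For brevity it uses a randomized in-place quickselect as the select routine (expected, not worst-case, linear time); the worst-case linear select could be substituted. The class name `FrequentElement`, its method names, and the rounding of the two positions up to ceil(n/3) and ceil(2n/3) are choices made for this example.

```java
import java.util.Random;

class FrequentElement {
    private static final Random RNG = new Random();

    /** k'th smallest (1-based) by randomized quickselect; reorders the array a. */
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            // Move a random pivot to the end, then partition a[lo..hi] around it
            int p = lo + RNG.nextInt(hi - lo + 1);
            int pivot = a[p];
            a[p] = a[hi];
            a[hi] = pivot;
            int store = lo;
            for (int i = lo; i < hi; i++) {
                if (a[i] < pivot) {
                    int t = a[i]; a[i] = a[store]; a[store] = t;
                    store++;
                }
            }
            a[hi] = a[store];
            a[store] = pivot;            // pivot now sits at its sorted position
            if (store == k - 1) return pivot;
            if (store < k - 1) lo = store + 1; else hi = store - 1;
        }
    }

    /** Counts how many entries of a equal v. */
    static int count(int[] a, int v) {
        int c = 0;
        for (int x : a) if (x == v) c++;
        return c;
    }

    /** True if some value occurs more than n/3 times. */
    static boolean hasFrequent(int[] values) {
        int n = values.length;
        if (n == 0) return false;
        int[] copy = values.clone();                 // select reorders its input
        int c1 = select(copy, (n + 2) / 3);          // order statistic ceil(n/3)
        int c2 = select(copy, (2 * n + 2) / 3);      // order statistic ceil(2n/3)
        return 3 * count(values, c1) > n || 3 * count(values, c2) > n;
    }
}
```

Comparing `3 * count` with `n` avoids integer division, so "more than n/3" is tested exactly.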

An efficient solution

Example n=12 n/3=4 2n/3=8

Exercise Given two sorted arrays A and B of size n each, find the median of the 2n elements of the union of A and B. The median of an array of even size is the average of the two elements in the middle of the sorted collection.

An inefficient solution Using merge, we unify both arrays into a single array, and return the median of the new array.

An efficient solution Let a be the median of A and let b be the median of B. Assuming a < b, recursively call the algorithm with the upper half of A and the lower half of B (the case a > b is symmetric, and if a = b that value is the median). Base case: if |A| = 1 and |B| = 1, return (a+b)/2.
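The recursion can be sketched in Java as follows. To stay correct for every n, this version deviates slightly from a plain "keep half" rule: when the current size is even it keeps one extra element on each side, and it handles size 2 as a second base case. Those details, the class name `TwoArrayMedian`, and the helper names are assumptions made for the example; both arrays are assumed to be sorted, non-empty, and of equal length, with values small enough that sums do not overflow.

```java
class TwoArrayMedian {
    /** Median of the sorted segment s[lo .. lo+n-1]. */
    private static double medianOf(int[] s, int lo, int n) {
        int mid = lo + n / 2;
        return (n % 2 == 1) ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
    }

    /** Median of the union of a[aLo .. aLo+n-1] and b[bLo .. bLo+n-1]. */
    private static double medianRec(int[] a, int aLo, int[] b, int bLo, int n) {
        if (n == 1) return (a[aLo] + b[bLo]) / 2.0;
        if (n == 2) {
            // Four elements: the middle two are max of the firsts and min of the seconds
            return (Math.max(a[aLo], b[bLo]) + Math.min(a[aLo + 1], b[bLo + 1])) / 2.0;
        }
        double ma = medianOf(a, aLo, n);
        double mb = medianOf(b, bLo, n);
        if (ma == mb) return ma;
        // Number of elements discarded from each array; equal counts keep the median fixed
        int drop = (n % 2 == 0) ? n / 2 - 1 : n / 2;
        if (ma < mb) {
            // Keep the upper part of A and the lower part of B
            return medianRec(a, aLo + drop, b, bLo, n - drop);
        }
        // Symmetric case: keep the lower part of A and the upper part of B
        return medianRec(a, aLo, b, bLo + drop, n - drop);
    }

    /** Median of the union of two sorted arrays of equal, non-zero length. */
    static double median(int[] a, int[] b) {
        return medianRec(a, 0, b, 0, a.length);
    }
}
```

Each call does constant work and roughly halves n, so the number of calls is logarithmic in n.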

Proof Assume a < b. The median c of the union of A and B satisfies a ≤ c ≤ b:
– A has exactly n/2 elements smaller than a.
– B has at most n/2 elements smaller than a.
– So in the union there are at most n elements smaller than a.
– Likewise, in the union there are at most n elements greater than b.

Proof The median of the merge of the upper half of A and the lower half of B is the same as the median of the union of A and B.

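Proof Since we removed exactly n elements, and these are the n/2 smallest and the n/2 largest elements of the union, the median stays in place. Time analysis: each call does O(1) work on the sorted arrays and halves the problem size, so T(n) = T(n/2) + O(1) = O(log n).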