Order Statistics David Kauchak cs302 Spring 2012.

Medians
The median of a set of numbers is the number such that half of the numbers are larger and half are smaller. How might we calculate the median of a set? Sort the numbers, then pick the middle (n/2-th) element:
A = [50, 12, 1, 97, 30]
A = [1, 12, 30, 50, 97]
Runtime? Θ(n log n)
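The sort-then-index approach on the slide can be sketched in a few lines of Python (a minimal illustration; the slides themselves use pseudocode):

```python
def median_by_sorting(a):
    """Return the median by sorting first: Theta(n log n)."""
    s = sorted(a)
    return s[len(s) // 2]  # middle element (upper median for even n)

A = [50, 12, 1, 97, 30]
print(median_by_sorting(A))  # 30
```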

Selection
More general problem: find the k-th smallest element in an array, i.e. the element with exactly k-1 elements smaller than it. This is known as the "selection" problem; we can use it to find the median if we want. Can we solve it in a similar way? Yes: sort the data and take the k-th element, in Θ(n log n).
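Sorting solves general selection the same way it solved the median; a quick sketch (the median is just the special case k = ⌈n/2⌉):

```python
def select_by_sorting(a, k):
    """Return the k-th smallest element (k = 1 is the minimum): Theta(n log n)."""
    return sorted(a)[k - 1]

A = [50, 12, 1, 97, 30]
print(select_by_sorting(A, 2))  # 12
```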

Can we do better? Are we doing more work than we need to? To get the k-th element (or the median) by sorting, we're computing every order statistic at once; we just want the one! Often when you find yourself doing more work than you need to, there is a faster way (though not always).

Selection problem. Our tools:
divide and conquer
sorting algorithms
other functions: merge, partition

Partition
Partition takes Θ(n) time and performs a related operation: given a pivot element A[q], Partition can be seen as dividing the array into three sets:
< A[q]
= A[q]
> A[q]
Ideas?
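For concreteness, here is one standard way to realize this operation, a Lomuto-style partition in Python (a sketch; the course's own Partition may differ in details such as pivot choice):

```python
def partition(A, p, r):
    """Partition A[p..r] (inclusive) around the pivot A[r].

    Returns the final index q of the pivot; afterwards
    A[p..q-1] <= A[q] and A[q+1..r] > A[q]. Runs in Theta(n).
    """
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]   # grow the "small" region
    A[i + 1], A[r] = A[r], A[i + 1]   # place the pivot
    return i + 1

A = [50, 12, 1, 97, 30]
q = partition(A, 0, len(A) - 1)
print(q, A)  # 2 [12, 1, 30, 97, 50]
```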

An example We’re looking for the 5 th smallest If we called partition, what would be the in three sets? < 5: = 5: > 5:

An example We’re looking for the 5 th smallest < 5: = 5: > 5: Does this help us?

An example We’re looking for the 5 th smallest < 5: = 5: > 5: We know the 5 th smallest has to be in this set

Selection: divide and conquer
Call Partition, decide which of the three sets contains the answer we're looking for, and recurse. Like binary search on unsorted data.

Selection(A, k, p, r)
  q = Partition(A, p, r)
  relq = q - p + 1
  if k == relq
    return A[q]
  else if k < relq
    return Selection(A, k, p, q-1)
  else
    return Selection(A, k-relq, q+1, r)

Partition returns the absolute index q; relq is the index relative to the current window start p. As we recurse into the right set, we update the k that we're looking for because we move up the lower end.

Selection(A, 3, 1, 8): Partition returns q = 6, so relq = 6 - 1 + 1 = 6. Since k = 3 < 6, recurse on the left set: Selection(A, 3, 1, 5). At each call, we discard part of the array.

Selection(A, 3, 1, 5): Partition returns q = 2, so relq = 2 - 1 + 1 = 2. Since k = 3 > 2, recurse on the right set with k = 3 - 2 = 1: Selection(A, 1, 3, 5).

Selection(A, 1, 3, 5): Partition returns q = 5, so relq = 5 - 3 + 1 = 3. Since k = 1 < 3, recurse on the left set: Selection(A, 1, 3, 4).

Selection(A, 1, 3, 4): Partition returns q = 3, so relq = 3 - 3 + 1 = 1. Since k = 1 = relq, return A[3].
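The recursion traced above can be implemented directly; a minimal Python sketch using a Lomuto-style partition (note the slides' pseudocode is 1-indexed, while Python arrays are 0-indexed):

```python
def selection(A, k, p, r):
    """Return the k-th smallest element of A[p..r] (k = 1 is the minimum)."""
    # Lomuto partition of A[p..r] around the pivot A[r]
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]

    q = i + 1            # absolute index of the pivot
    relq = q - p + 1     # rank of the pivot within the current window
    if k == relq:
        return A[q]
    elif k < relq:
        return selection(A, k, p, q - 1)
    else:
        return selection(A, k - relq, q + 1, r)

A = [3, 7, 1, 8, 6, 4, 5, 2]
print(selection(A, 3, 0, len(A) - 1))  # 3
```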

Running time of Selection? Best case? Each call to Partition throws away half the data. Recurrence: T(n) = T(n/2) + Θ(n) = O(n)

Running time of Selection? Worst case? Each call to Partition only reduces our search by 1. Recurrence: T(n) = T(n-1) + Θ(n) = O(n^2)

Running time of Selection? Worst case? Each call to Partition only reduces our search by 1. When does this happen? Sorted data, reverse-sorted data, and others…

How can randomness help us? Pick the pivot at random:

RSelection(A, k, p, r)
  q = RPartition(A, p, r)
  … then proceed exactly as in Selection …

where RPartition swaps a randomly chosen element of A[p..r] into the pivot position before partitioning.
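The randomized version needs only one extra line before partitioning; a sketch, again assuming a Lomuto-style partition (function and variable names here are illustrative, not the slides'):

```python
import random

def rselection(A, k, p, r):
    """Randomized select: choose the pivot uniformly at random."""
    if p == r:
        return A[p]
    s = random.randint(p, r)      # random pivot position
    A[s], A[r] = A[r], A[s]       # move it into the pivot slot

    pivot, i = A[r], p - 1        # Lomuto partition, as before
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]

    q = i + 1
    relq = q - p + 1
    if k == relq:
        return A[q]
    elif k < relq:
        return rselection(A, k, p, q - 1)
    else:
        return rselection(A, k - relq, q + 1, r)
```

Whatever pivots the random choices produce, the answer is the same; only the running time varies.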

Running time of RSelection? Best case: O(n). Worst case: still O(n^2); as with Quicksort, we can get unlucky. Average case?

Average case Depends on how much data we throw away at each step

Average case
We'll call a partition "good" if the pivot falls between the 25th and 75th percentile. A "good" partition throws away at least a quarter of the data; put another way, each side of the partition contains at least 25% of the data. What is the probability of a "good" partition? Half of the elements lie within this range and half outside, so a 50% chance.
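The 50% claim is easy to sanity-check by counting ranks directly (a throwaway check, not part of the algorithm):

```python
# A uniformly random pivot among n elements has rank 1..n, each with
# probability 1/n; "good" means the rank lands in the 25th-75th percentile.
n = 1000
good = sum(1 for rank in range(1, n + 1) if n / 4 <= rank <= 3 * n / 4)
print(good / n)  # 0.501 -- about a 1/2 chance
```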

Average case
Recall that, like Quicksort, we can absorb the cost of a number of "bad" partitions.

Average case
On average, how many times will Partition need to be called before we get a good partition? Let E be the expected number of calls. Recurrence: half the time we get a good partition on the first try, and half the time we have to try again: E = 1 + (1/2)E, so E = 2.

Mathematicians and beer An infinite number of mathematicians walk into a bar. The first one orders a beer. The second orders half a beer. The third, a quarter of a beer. The bartender says "You're all idiots", and pours two beers.

Average case
Another look: let p be the probability of success and X the number of calls required. X is geometrically distributed, so E[X] = 1/p; with p = 1/2, that is 2 calls on average.
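A quick simulation backs up E[X] = 1/p (an illustrative sketch; the helper name and the seeded generator are ours, not the slides'):

```python
import random

def calls_until_good(p, rng):
    """Number of independent trials until the first success (geometric)."""
    calls = 1
    while rng.random() >= p:   # each trial succeeds with probability p
        calls += 1
    return calls

rng = random.Random(0)         # seeded for reproducibility
trials = 100_000
avg = sum(calls_until_good(0.5, rng) for _ in range(trials)) / trials
print(avg)  # close to 1/p = 2
```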

Average case
If on average we can get a "good" partition every other time, what is the recurrence? Recall that the pivot of a "good" partition falls between the 25th and 75th percentiles, so we throw away at least ¼ of the data. Rolling in the cost of the "bad" partitions (an expected constant number per good one): T(n) ≤ T(3n/4) + Θ(n)

Which is?

a = 1, b = 4/3, f(n) = n. Case 1: Θ(n). Average case running time!
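The same Θ(n) bound can also be checked without the master method, by unrolling T(n) ≤ T(3n/4) + cn into a geometric series:

```latex
T(n) \;\le\; cn + c\tfrac{3}{4}n + c\left(\tfrac{3}{4}\right)^{2} n + \cdots
     \;\le\; cn \sum_{i=0}^{\infty}\left(\tfrac{3}{4}\right)^{i}
     \;=\; 4cn \;=\; \Theta(n).
```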

An aside… Notice a trend? Θ(n)

a = 1, b = 1/p, f(n) = f(n). Case 1: Θ(f(n))