Order Statistics Sorted


Order Statistics Sorted Find the key that is smaller than exactly k of the n keys

Order Statistics Statistics: methods for combining a large amount of data (such as the scores of the whole class on a homework) into a single number or a small set of numbers that gives a representative value of the data. The phrase "order statistics" refers to statistical methods that depend only on the ordering of the data and not on its numerical values. The average of the data, while easy to compute and very important as an estimate of a central value, is NOT an order statistic.

Order Statistics Mode (the most commonly occurring value) is also not an order statistic, since it does not depend on ordering. The most efficient methods for computing the mode in a comparison-based model involve sorting algorithms. Median: the most commonly used order statistic, the value in the middle position in the sorted order of the values. The median can be obtained easily in O(n log n) time via sorting. Is it possible to do better? Concept of robustness of estimation.

Randomized Algorithms An algorithm that uses random "bits" to guide its computation so as to achieve good "average case" performance. Formally, the algorithm's performance will be a random variable. The "worst case" is typically so unlikely to occur that it can be ignored.

Randomized Algorithms A randomized algorithm has access to a source of independent, unbiased random bits (in practice, pseudo-random numbers) and is allowed to use these random bits to influence its computation. [Diagram: the input and the random bits feed into the algorithm, which produces the output.]

Randomized Algorithms Las Vegas algorithms: a randomized algorithm that always outputs the correct answer, but has a small probability of taking a long time to execute. Monte Carlo algorithms: sometimes we want the algorithm to always complete quickly, but allow a small probability of error. Any Las Vegas algorithm can be converted into a Monte Carlo algorithm by outputting an arbitrary, possibly incorrect answer if it fails to complete within a specified time.

Randomized Quick Sort In traditional Quick Sort, we always pick the first element as the pivot for partitioning. The worst-case runtime is O(n^2), while the expected runtime is O(n log n) when averaged over the set of all inputs. Therefore, some inputs are bound to have a long runtime, e.g., an inversely sorted list.

Randomized Quick Sort In randomized Quick Sort, we pick an element uniformly at random as the pivot for partitioning. The expected runtime on any input is O(n log n). The bound holds even when pivots land far from the median, for example when every split is as unbalanced as 90% to 10%.
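
The random pivot choice can be sketched as follows. This is a minimal illustration rather than the slides' own code: the function name `randomized_quicksort` and the `rng` parameter are assumptions, and it builds new lists instead of partitioning in place, which keeps the code short at the cost of extra memory.

```python
import random

def randomized_quicksort(items, rng=random):
    """Return a new sorted list, choosing each pivot uniformly at random."""
    if len(items) <= 1:
        return list(items)
    # The only source of randomness: which element becomes the pivot.
    pivot = items[rng.randrange(len(items))]
    # Three-way partition, so copies of the pivot are never recursed on.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return randomized_quicksort(smaller, rng) + equal + randomized_quicksort(larger, rng)
```

Because the randomness lives in the pivot choice and not in the data, an inversely sorted list is no worse in expectation than any other input.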

Randomized Algorithms: Motivating Example Problem: find an 'a' in an array of n elements, given that half are 'a's and the other half are 'b's. Solution: look at each element of the array in turn, which requires n/2 operations if the array is ordered with the 'b's first followed by the 'a's. Checking in the reverse order, or checking every second element, has a similar drawback.

Randomized Algorithms: Motivating Example With any strategy that checks in a fixed order, i.e., a deterministic algorithm, we cannot guarantee that the algorithm will complete quickly for all possible inputs. On the other hand, if we check array elements at random, then we will quickly find an 'a' with high probability, whatever the input.
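
A minimal sketch of random probing, in both a Las Vegas and a Monte Carlo flavour. The function names and the `max_probes` budget are illustrative assumptions, not part of the slides. Each probe hits an 'a' with probability 1/2, so the Las Vegas version needs 2 probes in expectation, and the Monte Carlo version returns a wrong index with probability (1/2)^max_probes.

```python
import random

def find_a_las_vegas(arr, rng=random):
    """Probe random positions until an 'a' is found; the answer is always correct."""
    probes = 0
    while True:
        i = rng.randrange(len(arr))
        probes += 1
        if arr[i] == 'a':
            return i, probes

def find_a_monte_carlo(arr, max_probes, rng=random):
    """Probe at most max_probes positions; the returned index may be wrong."""
    i = 0  # arbitrary answer if the budget is zero
    for _ in range(max_probes):
        i = rng.randrange(len(arr))
        if arr[i] == 'a':
            return i
    return i  # budget exhausted: output the last (possibly incorrect) guess
```

The second function is the first one with a probe budget, which is the Las Vegas to Monte Carlo conversion described earlier.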

Order Statistics The ith order statistic in a set of n elements is the ith smallest element. The minimum is thus the 1st order statistic. The maximum is the nth order statistic. The median is the n/2 order statistic (the (n+1)/2 order statistic when n is odd). If n is even, there are 2 medians, at positions n/2 and n/2 + 1. How can we calculate order statistics? What is the running time?

Selection problem Given a list of n items and a number k between 1 and n, find the item that would be kth if we sorted the list. The median is the special case for which k = n/2. We'll see two algorithms: a randomized one based on quicksort ("quickselect") and a deterministic one. The randomized one is easier to understand and better in practice, so we'll do it first. Let's warm up with some cases of selection that don't have much to do with medians (because k is very far from n/2).

Selection problem: 2nd best search If k=1, the selection problem is trivial: just select the minimum element. As usual, we maintain a value x that is the minimum seen so far and compare it against each successive value, updating it when something smaller is seen.
    min(L) {
        x = L[1]
        for (i = 2; i <= n; i++)
            if (L[i] < x) x = L[i]
        return x
    }
What if you want to select the second best?

Selection problem: 2nd best search One possibility: follow the same general strategy, but modify min(L) to keep two values, the best and second best seen so far. Compare each new value against the second best to tell whether it is in the top two. If it is, a second comparison against the best tells us whether it is the best or the second best.
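
The two-value scan can be sketched as below. The function name `second_smallest` and the returned comparison counter are illustrative additions, included so the comparison counts can be checked by hand. "Best" is taken to mean smallest, as in min(L).

```python
def second_smallest(values):
    """Return (second smallest value, number of comparisons used)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    comparisons = 1  # one comparison orders the first two values
    if values[0] <= values[1]:
        best, second = values[0], values[1]
    else:
        best, second = values[1], values[0]
    for x in values[2:]:
        comparisons += 1            # always compare against the second best
        if x < second:              # x is among the two smallest so far
            comparisons += 1        # decide whether x is best or second best
            if x < best:
                best, second = x, best
            else:
                second = x
    return second, comparisons
```

On a decreasing list of 5 values this uses 1 + 2·3 = 7 comparisons, and on an increasing list only 4.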

Selection problem: 2nd best search Some interesting behavior shows up when we try to analyze it. Worst case: the list may be sorted in decreasing order, so each of the n-2 iterations of the loop performs 2 comparisons. Together with the one comparison that orders the first two values, the total is then 2n-3 comparisons. Average case (assuming any permutation of L is equally likely): the first comparison in each iteration still always happens, but the second only happens when L[i] is one of the two smallest values among the first i. Each of the first i values is equally likely to be one of these two, so this is true with probability 2/i. The total expected number of times we make the second comparison is the sum (for i from 3 to n) of 2/i, which is 2 ln n + O(1).

Selection problem: 2nd best search Conclusion The sum (for i from 1 to n) of 1/i, known as the harmonic series, is ln n + O(1) (this can be proved using calculus, by comparing the sum to a similar integral). Therefore the total expected number of comparisons overall is n + O(log n). This small increase over the n-1 comparisons needed to find the minimum gives us hope that we can perform selection faster than sorting.
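
The average-case claim can be checked exactly for small n by enumerating every permutation. This is a sketch under the assumption that the scan spends one comparison ordering the first two values, then one or two comparisons per remaining value. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def count_comparisons(values):
    """Comparisons used by the two-value scan on one input order."""
    best, second = sorted(values[:2])   # counted as one comparison
    count = 1
    for x in values[2:]:
        count += 1                      # compare x against the second best
        if x < second:
            count += 1                  # compare x against the best
            best, second = (x, best) if x < best else (best, x)
    return count

def exact_average(n):
    """Average comparison count over all n! orderings of distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_comparisons(p) for p in perms), len(perms))

def predicted_average(n):
    """(n - 1) unconditional comparisons plus the sum of 2/i for i = 3..n."""
    return (n - 1) + sum(Fraction(2, i) for i in range(3, n + 1))
```

For n = 6 both functions give 69/10, i.e. 6.9 comparisons on average against 2n - 3 = 9 in the worst case.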

Linear-Time Median Selection Random-Select(S, i), where n = |S|:
1. If |S| = 1 then return S.
2. Choose a random element y uniformly from S.
3. Compare all elements of S to y. Let S1 = {x ≤ y} and S2 = {x > y}.
4. If |S1| = n then
4.1 If i = n return {y} else S1 = S1 – {y}
5. If |S1| ≥ i then return Random-Select(S1, i) else return Random-Select(S2, i - |S1|)
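
A runnable sketch of randomized selection in the spirit of Random-Select. It is not a line-by-line transcription: it is iterative, returns the element itself instead of a one-element set, and uses a three-way split (smaller, equal, larger) so that duplicates need no special case. The name `random_select` and the `rng` parameter are assumptions.

```python
import random

def random_select(items, i, rng=random):
    """Return the i-th smallest element of items (i is 1-based)."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    items = list(items)
    while True:
        pivot = items[rng.randrange(len(items))]
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if i <= len(smaller):
            items = smaller                      # answer lies below the pivot
        elif i <= len(smaller) + equal_count:
            return pivot                         # the pivot is the answer
        else:
            i -= len(smaller) + equal_count      # skip everything <= pivot
            items = [x for x in items if x > pivot]
```

Each round discards at least the pivot, so the loop always terminates. With random pivots the expected total work is linear in the input size.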

Linear-Time Median Selection Given a "black box" O(n) median algorithm, what can we do? ith order statistic:
Find median x
Partition input around x
if (i ≤ (n+1)/2) recursively find ith element of first half
else find (i - (n+1)/2)th element in second half
T(n) = T(n/2) + O(n) = O(n)
Can you think of an application to sorting?
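
The recurrence can be unrolled to see why it solves to O(n). This is a sketch assuming that finding the median and partitioning together cost at most cn for some constant c (a symbol introduced here for illustration):

```latex
\begin{aligned}
T(n) &\le cn + T(n/2) \\
     &\le cn + \frac{cn}{2} + T(n/4) \\
     &\le cn\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right) \\
     &\le 2cn = O(n).
\end{aligned}
```

The work shrinks geometrically from level to level, so the top level dominates the total.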

Linear-Time Median Selection Worst-case O(n lg n) quicksort:
Find median x and partition around it
Recursively quicksort two halves
T(n) = 2T(n/2) + O(n) = O(n lg n)
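
A sketch of quicksort with the median as pivot. The `median` argument plays the role of the black box. Its default, `statistics.median_low` from the Python standard library, is only a stand-in: it sorts internally, so it does not run in O(n). The O(n lg n) worst-case bound needs a genuinely linear-time median routine in its place. The function name `median_quicksort` is an assumption.

```python
from statistics import median_low

def median_quicksort(items, median=median_low):
    """Sort by always partitioning around the median returned by `median`."""
    if len(items) <= 1:
        return list(items)
    x = median(items)  # black-box median; placeholder default is not O(n)
    smaller = [v for v in items if v < x]
    equal = [v for v in items if v == x]
    larger = [v for v in items if v > x]
    # Each side holds at most about half the input, giving T(n) = 2T(n/2) + O(n).
    return median_quicksort(smaller, median) + equal + median_quicksort(larger, median)
```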