Ch. 8 & 9 – Linear Sorting and Order Statistics What do you trade for speed?

Ch.8 – Linear Sorting Sorting by Comparisons Up to this point, every sorting algorithm we examined sorted a set by comparing its elements. All of them ran in O(n lg n), Θ(n lg n), or O(n^2) time. One can ask: can we sort a set S, whose elements come from a totally ordered universe, in time O(|S|)? The answer, as we might expect, is “yes, but…” First, the negative result: sorting by comparisons takes Ω(n lg n) time in the worst case.

Ch.8 – Linear Sorting [figure-only slide; the image is not captured in this transcript]

Ch.8 – Linear Sorting The main ideas are:
1. Every time you want to determine the relative order of two elements, you must make a comparison (a decision).
2. The input (n elements) can arrive in any order.
3. There are n! ways in which n distinct elements can be arranged.
4. A “sort” is equivalent to finding, by a sequence of comparisons between pairs of elements, the permutation of the input that leaves the input set ordered.
5. Each such permutation corresponds to a leaf of the “binary decision tree” generated by the comparisons.
6. The binary decision tree therefore has at least n! leaves.

Ch.8 – Linear Sorting Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case. Proof: by the previous discussion, the binary decision tree has at least n! leaves; call its height h. Since a binary tree of height h cannot have more than 2^h leaves, we have n! ≤ 2^h. Taking the logarithm (base 2) of both sides gives h ≥ lg(n!) = Ω(n lg n). This means that there is at least ONE root-to-leaf path of length Ω(n lg n), i.e., some input forces that many comparisons. Corollary 8.2: HEAPSORT and MERGESORT are asymptotically optimal comparison sorts.
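For completeness, the step lg(n!) = Ω(n lg n) follows from a routine estimate not spelled out on the slide; a minimal derivation:

\[
\lg(n!) \;=\; \sum_{j=1}^{n} \lg j \;\ge\; \sum_{j=\lceil n/2 \rceil}^{n} \lg j \;\ge\; \frac{n}{2}\,\lg\frac{n}{2} \;=\; \Omega(n \lg n).
\]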

Ch.8 – Linear Sorting Sorting NOT by Comparisons How do we do it? We must make some further assumptions. For example, it is not enough to assume that “the set to be sorted is a set of integers”; more specifically, we assume the integers fall in some bounded range, say [1..k], where k = O(n). This is the idea behind Counting Sort. How do we use it?

Ch.8 – Linear Sorting [two figure-only slides, presumably the COUNTINGSORT pseudocode and a worked example; the images are not captured in this transcript]
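Since the pseudocode slides did not survive the transcript, here is a minimal Python sketch of counting sort under the assumptions above (integer keys in the range 1..k); the function and variable names are illustrative, not taken from the slides.

def counting_sort(A, k):
    """Stably sort a list A of integers drawn from the range 1..k."""
    n = len(A)
    C = [0] * (k + 1)          # C[v] holds counts, then prefix sums
    B = [0] * n                # output array
    for v in A:                # counting pass: tally each key
        C[v] += 1
    for v in range(1, k + 1):  # prefix-sum pass: C[v] = number of keys <= v
        C[v] += C[v - 1]
    for v in reversed(A):      # placement pass, right to left, to keep the sort stable
        C[v] -= 1
        B[C[v]] = v
    return B

# Example: counting_sort([4, 1, 3, 4, 2], 4) returns [1, 2, 3, 4, 4]

The initialization of C, the counting pass, the prefix-sum pass, and the placement pass correspond to the four loops whose costs are tallied on the next slide.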

Ch.8 – Linear Sorting Counting Sort: Time Complexity How much time? The initialization loop (l. 2-3) takes Θ(k) time. The counting loop (l. 4-5) takes Θ(n). The prefix-sum loop (l. 7-8) takes Θ(k). The final placement loop takes Θ(n). The overall time is therefore Θ(k + n), and the assumption k = O(n) makes the overall time Θ(n).

Ch.8 – Linear Sorting Radix Sort – or Hollerith’s Sort What else can we use? Assume all the integers to be sorted have d or fewer digits, and pad those with fewer than d digits with leading 0s, for uniformity. Picture each integer punched on a card with 80 or so vertical columns, each column with room for 10 (or more) distinct holes (one for each digit 0..9), using columns 1..d to store the integer. Take a deck of such cards and sort it.

Ch.8 – Linear Sorting Radix Sort [figure-only slide, presumably a worked example of sorting digit by digit; the image is not captured in this transcript]

Ch.8 – Linear Sorting Radix Sort Lemma 8.3: Given n d-digit numbers in which each digit can take up to k possible values, RADIXSORT correctly sorts these numbers in Θ(d(n + k)) time if the stable sort it uses takes Θ(n + k) time. Proof: correctness follows by induction on the column being sorted. Sort the last (least significant) column using a stable sort. Assume the set is sorted on the last i columns; it remains sorted on the last i+1 columns after a stable sort on the (i+1)st column from the right (see the corresponding exercise in the text). Running COUNTINGSORT on each digit yields the stated time bound.
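As a concrete illustration of the lemma, here is a hedged Python sketch of least-significant-digit radix sort that runs a stable counting sort on each digit; the function name, the base parameter, and the variable names are illustrative, not from the slides.

def radix_sort(A, d, base=10):
    """Sort non-negative integers with at most d digits in the given base,
    processing digits from least to most significant."""
    for exp in range(d):
        digit = lambda x: (x // base**exp) % base   # current digit of x
        C = [0] * base
        for x in A:                 # counting pass over the current digit
            C[digit(x)] += 1
        for v in range(1, base):    # prefix sums
            C[v] += C[v - 1]
        B = [0] * len(A)
        for x in reversed(A):       # stable placement, right to left
            C[digit(x)] -= 1
            B[C[digit(x)]] = x
        A = B                       # now sorted on the last exp+1 digits
    return A

# Example: radix_sort([329, 457, 657, 839, 436, 720, 355], 3)
# returns [329, 355, 436, 457, 657, 720, 839]

Each of the d passes costs Θ(n + k) with k = base, giving the Θ(d(n + k)) total of Lemma 8.3.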

Ch.8 – Linear Sorting Bucket Sort Assume the n input values are drawn independently and uniformly at random from [0, 1). Divide the interval [0, 1) into n equal-sized subintervals (buckets) and distribute the n input numbers into the buckets. Sort the numbers in each bucket (with a sort of your choice). Go through the buckets in order, listing their contents.

Ch.8 – Linear Sorting Bucket Sort [figure-only slide, presumably the BUCKETSORT pseudocode; the image is not captured in this transcript]
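Here is a minimal Python sketch of bucket sort under the uniformity assumption above, using insertion sort within each bucket as the following slides do; the function names are illustrative, not from the slides.

def insertion_sort(B):
    """Sort a small list B in place."""
    for i in range(1, len(B)):
        key = B[i]
        j = i - 1
        while j >= 0 and B[j] > key:
            B[j + 1] = B[j]
            j -= 1
        B[j + 1] = key

def bucket_sort(A):
    """Sort a list A of floats assumed uniform in [0, 1)."""
    n = len(A)
    buckets = [[] for _ in range(n)]
    for x in A:                        # x lands in bucket floor(n * x)
        buckets[int(n * x)].append(x)
    for B in buckets:                  # buckets have expected constant size
        insertion_sort(B)
    return [x for B in buckets for x in B]   # concatenate buckets in order

# Example: bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21])
# returns [0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]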

Ch.8 – Linear Sorting Bucket Sort We observe that both of the for loops (l. 3-4 and l. 5-6) take O(n) time. What remains is to analyze the cost of the n calls to INSERTIONSORT on l. 8.

Ch.8 – Linear Sorting Bucket Sort Let n_i be the random variable denoting the number of elements in bucket B[i]. Since INSERTIONSORT runs in quadratic time, the running time of BUCKETSORT is T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i^2). Taking expectations and using linearity of expectation, E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i^2]). We will show that E[n_i^2] = 2 − 1/n, for i = 0, 1, ..., n − 1. (By the uniform distribution, the expected number of items is the same in every bucket.)

Ch.8 – Linear Sorting Bucket Sort Define the indicator random variables X_ij = I{A[j] falls in bucket i}, for i = 0, …, n − 1 and j = 1, …, n. Thus n_i = Σ_{j=1}^{n} X_ij. To compute E[n_i^2], we square the sum, expand, and regroup: E[n_i^2] = E[(Σ_{j=1}^{n} X_ij)^2] = Σ_{j=1}^{n} E[X_ij^2] + Σ_{1≤j≤n} Σ_{k≠j} E[X_ij X_ik].

Ch.8 – Linear Sorting Bucket Sort We now compute the two sums. The indicator random variable X_ij takes the value 1 with probability 1/n and the value 0 with probability 1 − 1/n; the same is true of X_ij^2, so E[X_ij^2] = 1^2·(1/n) + 0^2·(1 − 1/n) = 1/n. For k ≠ j, X_ij and X_ik are independent, which gives E[X_ij X_ik] = E[X_ij]·E[X_ik] = (1/n)·(1/n) = 1/n^2.

Ch.8 – Linear Sorting Bucket Sort Substituting into the expansion two slides back: E[n_i^2] = n·(1/n) + n(n − 1)·(1/n^2) = 1 + (n − 1)/n = 2 − 1/n. It follows that E[T(n)] = Θ(n) + n·O(2 − 1/n) = Θ(n). Note: the same result holds for any input distribution under which the expected sum of the squared bucket sizes is linear in the total number of elements.

Ch.9 – Order Statistics Question: what is the time complexity of extracting the i-th smallest element of a totally ordered set of n elements? Answer: we first have to find an algorithm… We can find a minimum or a maximum in deterministic Θ(n) time (how?). To find the i-th element, we can partition the set into two parts, as in QUICKSORT: if the pivot is the i-th element, we are done; if not, we can decide which side of the pivot contains the i-th element and partition that subset further.

Ch.9 – Order Statistics The algorithm: we use randomization to offset the probability of receiving the worst possible input sequence.
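The pseudocode on this slide is an image that did not survive the transcript; here is a hedged Python sketch of randomized select following the description above (a random pivot, a QUICKSORT-style partition, recursion into only one side). The function name and the 1-based rank convention are assumptions, not from the slides.

import random

def randomized_select(A, p, r, i):
    """Return the i-th smallest element (1-based) of A[p..r], assuming distinct keys."""
    if p == r:
        return A[p]
    q = random.randint(p, r)            # randomized partition: pick a random pivot
    A[q], A[r] = A[r], A[q]
    pivot, store = A[r], p
    for j in range(p, r):               # Lomuto-style partition around the pivot
        if A[j] <= pivot:
            A[store], A[j] = A[j], A[store]
            store += 1
    A[store], A[r] = A[r], A[store]     # pivot ends up at index store
    k = store - p + 1                   # number of elements in A[p..store]
    if i == k:                          # the pivot is the i-th smallest
        return A[store]
    if i < k:                           # recurse on the left subarray only
        return randomized_select(A, p, store - 1, i)
    return randomized_select(A, store + 1, r, i - k)   # or on the right subarray only

# Example: randomized_select([7, 2, 9, 4, 1], 0, 4, 3) returns 4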

Ch.9 – Order Statistics Note that the recursive call is made on only one side of the partition; that is what will allow us to conclude an expected time complexity that is linear in n. Obviously, the worst case is Θ(n^2). Let T(n) be the random variable that denotes the running time of the algorithm on an input array A[p..r] of n elements. We obtain an upper bound on E[T(n)].
1. RANDOMIZED-PARTITION is equally likely to return any element of the array as the pivot.
2. Therefore, for each k ∈ {1, …, n}, the subarray A[p..q] has exactly k elements (all ≤ the pivot) with probability 1/n.

Ch.9 – Order Statistics
3. For k ∈ {1, …, n}, define the indicator random variable X_k = I{the subarray A[p..q] has exactly k elements}.
4. If the elements of A are distinct, E[X_k] = 1/n.
5. When RANDOMIZED-SELECT chooses A[q] as the pivot element, we do not know a priori whether we will terminate immediately, recurse on the left subarray A[p..q−1], or recurse on the right subarray A[q+1..r].
6. The decision depends on where the i-th smallest element falls relative to A[q]. Since we are looking for an upper bound, we assume the desired element always lies in the larger of the two subarrays, and we take T(n) to be monotonically increasing.

Ch.9 – Order Statistics
7. For a given call of RANDOMIZED-SELECT, the indicator random variable X_k has the value 1 for exactly one value of k and the value 0 for all other values of k.
8. When X_k = 1, the two subarrays available for the recursion have sizes k − 1 and n − k.
9. The recurrence is therefore T(n) ≤ Σ_{k=1}^{n} X_k · (T(max(k − 1, n − k)) + O(n)).
10. Taking expected values, and using the independence of X_k and T(max(k − 1, n − k)): E[T(n)] ≤ Σ_{k=1}^{n} E[X_k] · E[T(max(k − 1, n − k))] + O(n) = Σ_{k=1}^{n} (1/n) · E[T(max(k − 1, n − k))] + O(n).

Ch.9 – Order Statistics We now need to simplify the last expression.
11. Observe that max(k − 1, n − k) = k − 1 if k > ⌈n/2⌉, and n − k otherwise.
12. If n is even, each term from T(⌈n/2⌉) up to T(n − 1) appears exactly twice in the summation;
13. if n is odd, each term from T(⌈n/2⌉) up to T(n − 1) appears exactly twice, along with one appearance of T(⌊n/2⌋).
This leads to the estimate E[T(n)] ≤ (2/n) · Σ_{k=⌊n/2⌋}^{n-1} E[T(k)] + O(n). The rest of the proof applies the substitution method, proving by induction that E[T(n)] ≤ cn for some constant c > 0 and all sufficiently large n.
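A brief sketch of the substitution step (filling in algebra the slide only alludes to), assuming the inductive hypothesis E[T(k)] ≤ ck for smaller k and writing an for the O(n) partitioning cost; the constants a and c are illustrative:

\[
E[T(n)] \;\le\; \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} ck \;+\; an
\;\le\; \frac{2c}{n}\cdot\frac{3n^{2}}{8} \;+\; an
\;=\; \frac{3cn}{4} + an \;\le\; cn \quad\text{whenever } c \ge 4a,
\]

using the bound \(\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le 3n^{2}/8\).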

Ch.9 – Order Statistics Note: with a more complex algorithm, one can compute order statistics in deterministic Θ(n) time. See the text.