David Luebke, 7/2/2015: Linear-Time Sorting Algorithms

Sorting So Far
- Insertion sort:
  - Easy to code
  - Fast on small inputs (less than ~50 elements)
  - Fast on nearly-sorted inputs
  - O(n^2) worst case
  - O(n^2) average case (equally likely inputs)
  - O(n^2) reverse-sorted case

Sorting So Far
- Merge sort:
  - Divide-and-conquer:
    - Split array in half
    - Recursively sort subarrays
    - Linear-time merge step
  - O(n lg n) worst case
  - Doesn't sort in place

Sorting So Far
- Heap sort:
  - Uses the very useful heap data structure
    - Complete binary tree
    - Heap property: parent key > children's keys
  - O(n lg n) worst case
  - Sorts in place
  - Fair amount of shuffling memory around

Sorting So Far
- Quick sort:
  - Divide-and-conquer:
    - Partition array into two subarrays, recursively sort
    - All of first subarray < all of second subarray
    - No merge step needed!
  - O(n lg n) average case
  - Fast in practice
  - O(n^2) worst case
    - Naive implementation: worst case on sorted input
    - Address this with randomized quicksort

How Fast Can We Sort?
- We will prove a lower bound, then beat it
  - How do you suppose we'll beat it?
- First, an observation: all of the sorting algorithms so far are comparison sorts
  - The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements
  - Theorem: all comparison sorts are Ω(n lg n)
    - A comparison sort must do Ω(n) comparisons (why?)
    - What about the gap between Ω(n) and Ω(n lg n)?

Decision Trees
- Decision trees provide an abstraction of comparison sorts
  - A decision tree represents the comparisons made by a comparison sort; everything else is ignored
  - (Draw examples on board)
- What do the leaves represent?
- How many leaves must there be?

Decision Trees
- Decision trees can model comparison sorts. For a given algorithm:
  - One tree for each n
  - Tree paths are all possible execution traces
  - What's the longest path in a decision tree for insertion sort? For merge sort?
- What is the asymptotic height of any decision tree for sorting n elements?
- Answer: Ω(n lg n) (now let's prove it…)

Lower Bound For Comparison Sorting
- Theorem: any decision tree that sorts n elements has height Ω(n lg n)
- What's the minimum # of leaves?
- What's the maximum # of leaves of a binary tree of height h?
- Clearly the minimum # of leaves is less than or equal to the maximum # of leaves

Lower Bound For Comparison Sorting
- So we have: n! ≤ 2^h
- Taking logarithms: lg(n!) ≤ h
- Stirling's approximation tells us: n! > (n/e)^n
- Thus: h ≥ lg((n/e)^n) = n lg n - n lg e = Ω(n lg n)
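As a quick numerical sanity check (not from the slides; the function name is mine), we can compute the minimum decision-tree height ⌈lg(n!)⌉ for small n and see that it already exceeds n for n ≥ 3, while staying below n lg n:

```python
import math

def min_height(n):
    """Minimum height of a binary tree with at least n! leaves: ceil(lg(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (2, 4, 8, 16):
    print(n, min_height(n), n * math.log2(n))
```

For n = 4 this gives ⌈lg 24⌉ = 5 comparisons in the worst case, versus n lg n = 8; the two grow at the same Θ(n lg n) rate.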

Lower Bound For Comparison Sorting
- So we have h ≥ lg(n!) = Ω(n lg n)
- Thus the minimum height of a decision tree is Ω(n lg n)

Lower Bound For Comparison Sorts
- Thus the time to comparison sort n elements is Ω(n lg n)
- Corollary: heapsort and merge sort are asymptotically optimal comparison sorts
- But the name of this lecture is "Sorting in linear time"!
  - How can we do better than Ω(n lg n)?

Sorting In Linear Time
- Counting sort
  - No comparisons between elements!
  - But… depends on an assumption about the numbers being sorted
    - We assume numbers are in the range 1..k
  - The algorithm:
    - Input: A[1..n], where A[j] ∈ {1, 2, 3, …, k}
    - Output: B[1..n], sorted (notice: not sorting in place)
    - Also: array C[1..k] for auxiliary storage

Counting Sort

CountingSort(A, B, k)
  for i = 1 to k
    C[i] = 0;
  for j = 1 to n
    C[A[j]] += 1;
  for i = 2 to k
    C[i] = C[i] + C[i-1];
  for j = n downto 1
    B[C[A[j]]] = A[j];
    C[A[j]] -= 1;

Work through example: A = { }, k = 4
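A direct transcription of this pseudocode into Python might look like the sketch below (0-indexed output array, returning a new list; the function name is my choice, not the slides'):

```python
def counting_sort(a, k):
    """Stable counting sort of integers drawn from 1..k; returns a new list."""
    c = [0] * (k + 1)               # C[1..k]; index 0 is unused
    for x in a:                     # count occurrences of each key
        c[x] += 1
    for i in range(2, k + 1):       # prefix sums: c[i] = # of elements <= i
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):           # right-to-left scan keeps the sort stable
        b[c[x] - 1] = x             # c[x] is x's 1-based position in output
        c[x] -= 1
    return b
```

For instance, `counting_sort([4, 1, 3, 4, 3], 4)` returns `[1, 3, 3, 4, 4]`.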

Counting Sort

CountingSort(A, B, k)
  for i = 1 to k            // takes time O(k)
    C[i] = 0;
  for j = 1 to n            // takes time O(n)
    C[A[j]] += 1;
  for i = 2 to k            // takes time O(k)
    C[i] = C[i] + C[i-1];
  for j = n downto 1        // takes time O(n)
    B[C[A[j]]] = A[j];
    C[A[j]] -= 1;

What will be the running time?

Counting Sort
- Total time: O(n + k)
  - Usually, k = O(n)
  - Thus counting sort runs in O(n) time
- But sorting is Ω(n lg n)!
  - No contradiction: this is not a comparison sort (in fact, there are no comparisons at all!)
  - Notice that this algorithm is stable

Counting Sort
- Cool! Why don't we always use counting sort?
- Because it depends on the range k of the elements
- Could we use counting sort to sort 32-bit integers? Why or why not?
- Answer: no, k is too large (2^32 = 4,294,967,296)

Radix Sort
- Intuitively, you might sort on the most significant digit, then the second msd, etc.
- Problem: lots of intermediate piles of cards (read: scratch arrays) to keep track of
- Key idea: sort the least significant digit first

RadixSort(A, d)
  for i = 1 to d
    StableSort(A) on digit i

- Example: Fig 9.3
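One way to sketch RadixSort in Python, using a stable counting sort on each digit (base 10 here for readability; the names and the base are my choices, not the slides'):

```python
def radix_sort(a, d, base=10):
    """Sort non-negative ints of at most d base-`base` digits, LSD first."""
    for i in range(d):
        def digit(x):
            return (x // base ** i) % base   # the ith least significant digit
        c = [0] * base
        for x in a:                          # count keys on this digit
            c[digit(x)] += 1
        for b in range(1, base):             # prefix sums
            c[b] += c[b - 1]
        out = [0] * len(a)
        for x in reversed(a):                # right-to-left keeps stability
            c[digit(x)] -= 1
            out[c[digit(x)]] = x
        a = out
    return a
```

Stability of the per-digit sort is what makes the least-significant-digit-first order come out right, per the inductive argument on the next slide.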

Radix Sort
- Can we prove it will work?
- Sketch of an inductive argument (induction on the number of passes):
  - Assume the lower-order digits {j : j < i} are sorted
  - Show that sorting the next digit i leaves the array correctly sorted
    - If two digits at position i are different, ordering the numbers by that digit is correct (lower-order digits are irrelevant)
    - If they are the same, the numbers are already sorted on the lower-order digits; since we use a stable sort, they stay in the right order

Radix Sort
- What sort will we use to sort on digits?
- Counting sort is the obvious choice:
  - Sort n numbers on digits that range from 1..k
  - Time: O(n + k)
- Each pass over n numbers with d digits takes time O(n + k), so total time is O(dn + dk)
  - When d is constant and k = O(n), this takes O(n) time
- How many bits in a computer word?

Radix Sort
- Problem: sort 1 million 64-bit numbers
  - Treat them as four-digit radix-2^16 numbers
  - We can sort in just four passes with radix sort!
- Compares well with a typical O(n lg n) comparison sort
  - which requires approximately lg n = 20 operations per number being sorted
- So why would we ever use anything but radix sort?
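The four-pass idea can be sketched directly: treat each 64-bit number as four 16-bit "digits" and make one stable counting-sort pass per digit (a sketch under that framing; the function name is mine):

```python
def radix_sort_u64(a):
    """Sort 64-bit unsigned integers in four stable passes on 16-bit digits."""
    for shift in (0, 16, 32, 48):            # least significant digit first
        c = [0] * (1 << 16)
        for x in a:                          # count 16-bit digit values
            c[(x >> shift) & 0xFFFF] += 1
        total = 0
        for d in range(1 << 16):             # exclusive prefix sums
            c[d], total = total, total + c[d]
        out = [0] * len(a)
        for x in a:                          # forward scan stays stable
            d = (x >> shift) & 0xFFFF
            out[c[d]] = x
            c[d] += 1
        a = out
    return a
```

Here k = 2^16 per pass, so each pass is O(n + 2^16) and the whole sort is four such passes, matching the O(dn + dk) bound above.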

Radix Sort
- In general, radix sort based on counting sort is
  - Fast
  - Asymptotically fast (i.e., O(n))
  - Simple to code
  - A good choice

Summary: Radix Sort
- Radix sort:
  - Assumption: input has d digits, each ranging from 0 to k
  - Basic idea:
    - Sort elements by digit, starting with the least significant
    - Use a stable sort (like counting sort) for each stage
  - Each pass over n numbers with d digits takes time O(n + k), so total time is O(dn + dk)
    - When d is constant and k = O(n), this takes O(n) time
  - Fast! Stable! Simple!

Bucket Sort
- Bucket sort
  - Assumption: input is n reals from [0, 1)
  - Basic idea:
    - Create n linked lists (buckets) to divide the interval [0, 1) into subintervals of size 1/n
    - Add each input element to the appropriate bucket, and sort the buckets with insertion sort
  - Uniform input distribution → O(1) expected bucket size
    - Therefore the expected total time is O(n)
  - These ideas will return when we study hash tables
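A sketch of this in Python, with ordinary lists as the buckets and insertion sort inside each bucket (assumes a non-empty input of floats in [0, 1); the function name is mine):

```python
def bucket_sort(a):
    """Sort n reals in [0, 1); expected O(n) for uniformly distributed input."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)    # bucket i holds [i/n, (i+1)/n)
    out = []
    for b in buckets:
        for i in range(1, len(b)):       # insertion sort each small bucket
            key, j = b[i], i - 1
            while j >= 0 and b[j] > key:
                b[j + 1] = b[j]
                j -= 1
            b[j + 1] = key
        out.extend(b)
    return out
```

Insertion sort is quadratic in the bucket size, but with expected O(1) elements per bucket that cost stays constant per bucket, giving the O(n) expected total.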

Order Statistics
- The ith order statistic in a set of n elements is the ith smallest element
- The minimum is thus the 1st order statistic
- The maximum is (duh) the nth order statistic
- The median is the n/2 order statistic
  - If n is even, there are 2 medians
- How can we calculate order statistics?
- What is the running time?

Order Statistics
- How many comparisons are needed to find the minimum element in a set? The maximum?
- Can we find the minimum and maximum with less than twice the cost?
- Yes:
  - Walk through the elements in pairs
    - Compare each element in the pair to the other
    - Compare the larger to the maximum, the smaller to the minimum
  - Total cost: 3 comparisons per 2 elements = O(3n/2)
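The pairwise trick can be sketched like this (roughly 3 comparisons per pair of elements; assumes a non-empty input, and the name is mine):

```python
def min_and_max(a):
    """Find (min, max) of a non-empty list with about 3n/2 comparisons."""
    n = len(a)
    if n % 2:                                       # odd n: seed with a[0]
        lo = hi = a[0]
        start = 1
    else:                                           # even n: one comparison
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n, 2):
        x, y = a[i], a[i + 1]
        small, big = (x, y) if x < y else (y, x)    # 1st comparison
        if small < lo:                              # 2nd comparison
            lo = small
        if big > hi:                                # 3rd comparison
            hi = big
    return lo, hi
```

The naive approach compares every element against both the running minimum and maximum (2 comparisons per element); comparing the pair first lets each element face only one of the two.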

Finding Order Statistics: The Selection Problem
- A more interesting problem is selection: finding the ith smallest element of a set
- We will show:
  - A practical randomized algorithm with O(n) expected running time
  - A cool algorithm of theoretical interest only with O(n) worst-case running time

Randomized Selection
- Key idea: use Partition() from quicksort
  - But we only need to examine one subarray
  - This savings shows up in the running time: O(n)
- We will again use a slightly different partition than the book's:
  q = RandomizedPartition(A, p, r)
  (afterwards, every element of A[p..q] is ≤ A[q] and every element of A[q..r] is ≥ A[q])

Randomized Selection

RandomizedSelect(A, p, r, i)
  if (p == r) then return A[p];
  q = RandomizedPartition(A, p, r);
  k = q - p + 1;
  if (i == k) then return A[q];        // not in book
  if (i < k) then
    return RandomizedSelect(A, p, q-1, i);
  else
    return RandomizedSelect(A, q+1, r, i-k);

(k is the rank of A[q] within A[p..r]: the k elements A[p..q] are ≤ A[q], and the rest are ≥ A[q])
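A Python sketch of this algorithm, with an explicit randomized partition and the tail recursion turned into a loop (function names are mine; i is 1-indexed as in the pseudocode):

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's index."""
    j = random.randint(p, r)
    a[j], a[r] = a[r], a[j]              # move random pivot to the end
    pivot, s = a[r], p
    for t in range(p, r):
        if a[t] <= pivot:
            a[s], a[t] = a[t], a[s]
            s += 1
    a[s], a[r] = a[r], a[s]              # place pivot at its final position
    return s

def randomized_select(a, i):
    """Return the ith smallest element (1-indexed) in expected O(n) time."""
    a = list(a)                          # work on a copy
    p, r = 0, len(a) - 1
    while True:
        if p == r:
            return a[p]
        q = randomized_partition(a, p, r)
        k = q - p + 1                    # rank of the pivot within a[p..r]
        if i == k:
            return a[q]
        if i < k:
            r = q - 1                    # keep only the left subarray
        else:
            p, i = q + 1, i - k          # keep only the right subarray
```

Unlike quicksort, only one side of each partition is examined, which is exactly where the expected O(n) bound analyzed on the next slides comes from.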


Randomized Selection
- Average case
  - For an upper bound, assume the ith element always falls in the larger side of the partition; averaging over the (equally likely) pivot positions gives
    T(n) ≤ (2/n) Σ_{k=n/2}^{n-1} T(k) + Θ(n)
    (each term from the larger half appears at most twice in the average)
  - Let's show that T(n) = O(n) by substitution

Randomized Selection
- Assume T(k) ≤ ck for sufficiently large c:
  T(n) ≤ (2/n) Σ_{k=n/2}^{n-1} ck + Θ(n)                          (the recurrence we started with; substitute T(k) ≤ ck)
       = (2c/n) [ Σ_{k=1}^{n-1} k − Σ_{k=1}^{n/2-1} k ] + Θ(n)    ("split" the sum into two full ranges)
       = (2c/n) [ (n-1)n/2 − (n/2 − 1)(n/2)/2 ] + Θ(n)            (expand the arithmetic series)
       = c(n-1) − (c/2)(n/2 − 1) + Θ(n)                           (multiply it out)

Randomized Selection (continued)
- Assume T(k) ≤ ck for sufficiently large c:
  T(n) ≤ c(n-1) − (c/2)(n/2 − 1) + Θ(n)    (the recurrence so far)
       = cn − c − cn/4 + c/2 + Θ(n)        (multiply it out)
       = cn − (cn/4 + c/2 − Θ(n))          (rearrange the arithmetic)
       ≤ cn  for c large enough that cn/4 dominates the Θ(n) term
  (which is what we set out to prove: T(n) = O(n))

Worst-Case Linear-Time Selection
- The randomized algorithm works well in practice
- What follows is a worst-case linear-time algorithm, really of theoretical interest only
- Basic idea:
  - Generate a good partitioning element
  - Call this element x

Worst-Case Linear-Time Selection
- The algorithm in words:
  1. Divide the n elements into groups of 5
  2. Find the median of each group (How? How long?)
  3. Use Select() recursively to find the median x of the ⌈n/5⌉ medians
  4. Partition the n elements around x; let k = rank(x)
  5. if (i == k) then return x;
     if (i < k) then use Select() recursively to find the ith smallest element in the first partition;
     if (i > k) then use Select() recursively to find the (i-k)th smallest element in the last partition
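The five steps above can be sketched in Python as follows (a sketch, not the book's in-place version: it partitions with list comprehensions and handles duplicates via an "equal" bucket; the names are mine):

```python
def select(a, i):
    """Worst-case O(n) selection of the ith smallest (1-indexed) element."""
    if len(a) <= 5:                      # base case: just sort the tiny input
        return sorted(a)[i - 1]
    # Steps 1-2: split into groups of 5 and take each group's median
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
               for j in range(0, len(a), 5)]
    # Step 3: recursively find the median x of the medians
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (len(lo) elements fall below x)
    lo = [y for y in a if y < x]
    eq = [y for y in a if y == x]
    hi = [y for y in a if y > x]
    # Step 5: recurse only into the side containing the ith smallest
    if i <= len(lo):
        return select(lo, i)
    if i <= len(lo) + len(eq):
        return x
    return select(hi, i - len(lo) - len(eq))
```

The comprehensions cost Θ(n) per call, matching the Θ(n) partitioning term in the recurrence analyzed two slides below.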

Worst-Case Linear-Time Selection
- (Sketch the situation on the board)
- How many of the 5-element medians are ≤ x?
  - At least half of the medians: ⌈⌈n/5⌉/2⌉ = ⌈n/10⌉
- How many elements are ≤ x?
  - At least 3⌈n/10⌉ elements (each such median brings two more elements of its group ≤ x)
- For large n, 3⌈n/10⌉ ≥ n/4 (How large?)
- So at least n/4 elements are ≤ x
- Similarly: at least n/4 elements are ≥ x

Worst-Case Linear-Time Selection
- Thus after partitioning around x, step 5 will call Select() on at most 3n/4 elements
- The recurrence is therefore:
  T(n) ≤ T(⌈n/5⌉) + T(3n/4) + Θ(n)
       ≤ cn/5 + c + 3cn/4 + Θ(n)     (substitute T(n) ≤ cn; ⌈n/5⌉ ≤ n/5 + 1)
       = (19/20)cn + c + Θ(n)        (combine fractions)
       = cn − (cn/20 − c − Θ(n))     (express in desired form)
       ≤ cn  for sufficiently large c (what we set out to prove)

Worst-Case Linear-Time Selection
- Intuitively:
  - The work at each level is a constant fraction (19/20) smaller
    - Geometric progression!
  - Thus the O(n) work at the root dominates

Linear-Time Median Selection
- Given a "black box" O(n) median algorithm, what can we do?
  - ith order statistic:
    - Find the median x
    - Partition the input around x
    - if (i ≤ (n+1)/2), recursively find the ith element of the first half
    - else find the (i - (n+1)/2)th element of the second half
    - T(n) = T(n/2) + O(n) = O(n)
  - Can you think of an application to sorting?

Linear-Time Median Selection
- Worst-case O(n lg n) quicksort
  - Find the median x and partition around it
  - Recursively quicksort the two halves
  - T(n) = 2T(n/2) + O(n) = O(n lg n)