CSE373: Data Structures & Algorithms Lecture 20: Beyond Comparison Sorting Dan Grossman Fall 2013.

Slides:



Advertisements
Similar presentations
Analysis of Algorithms CS 477/677 Linear Sorting Instructor: George Bebis ( Chapter 8 )
Advertisements

Non-Comparison Based Sorting
Sorting Comparison-based algorithm review –You should know most of the algorithms –We will concentrate on their analyses –Special emphasis: Heapsort Lower.
MS 101: Algorithms Instructor Neelima Gupta
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2010.
Lower bound for sorting, radix sort COMP171 Fall 2005.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
CSCE 3110 Data Structures & Algorithm Analysis
CSE 332 Data Abstractions: Sorting It All Out Kate Deibel Summer 2012 July 16, 2012CSE 332 Data Abstractions, Summer
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter 24 Sorting.
1 Sorting Problem: Given a sequence of elements, find a permutation such that the resulting sequence is sorted in some order. We have already seen: –Insertion.
CSE332: Data Abstractions Lecture 12: Introduction to Sorting Tyler Robison Summer
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Tyler Robison Summer
Lower bound for sorting, radix sort COMP171 Fall 2006.
Lecture 5: Linear Time Sorting Shang-Hua Teng. Sorting Input: Array A[1...n], of elements in arbitrary order; array size n Output: Array A[1...n] of the.
CS 253: Algorithms Chapter 8 Sorting in Linear Time Credit: Dr. George Bebis.
Sorting Heapsort Quick review of basic sorting methods Lower bounds for comparison-based methods Non-comparison based sorting.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Comp 122, Spring 2004 Lower Bounds & Sorting in Linear Time.
Lecture 5: Master Theorem and Linear Time Sorting
Divide and Conquer Sorting
CSE 326: Data Structures Sorting Ben Lerner Summer 2007.
Analysis of Algorithms CS 477/677
1 Today’s Material Lower Bounds on Comparison-based Sorting Linear-Time Sorting Algorithms –Counting Sort –Radix Sort.
DAST 2005 Week 4 – Some Helpful Material Randomized Quick Sort & Lower bound & General remarks…
© 2006 Pearson Addison-Wesley. All rights reserved10 A-1 Chapter 10 Algorithm Efficiency and Sorting.
Lower Bounds for Comparison-Based Sorting Algorithms (Ch. 8)
Computer Algorithms Lecture 11 Sorting in Linear Time Ch. 8
Data Structure & Algorithm Lecture 7 – Linear Sort JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
Sorting in Linear Time Lower bound for comparison-based sorting
CSE 373 Data Structures Lecture 15
CSE373: Data Structure & Algorithms Lecture 22: Beyond Comparison Sorting Aaron Bauer Winter 2014.
CSE373: Data Structure & Algorithms Lecture 20: More Sorting Lauren Milne Summer 2015.
David Luebke 1 10/13/2015 CS 332: Algorithms Linear-Time Sorting Algorithms.
CSC 41/513: Intro to Algorithms Linear-Time Sorting Algorithms.
Introduction to Algorithms Jiafen Liu Sept
Sorting Fun1 Chapter 4: Sorting     29  9.
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2012.
Analysis of Algorithms CS 477/677
CSE373: Data Structure & Algorithms Lecture 21: More Comparison Sorting Aaron Bauer Winter 2014.
Mudasser Naseer 1 11/5/2015 CSC 201: Design and Analysis of Algorithms Lecture # 8 Some Examples of Recursion Linear-Time Sorting Algorithms.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Linda Shapiro Winter 2015.
CSE373: Data Structure & Algorithms Lecture 21: More Sorting and Other Classes of Algorithms Lauren Milne Summer 2015.
Sorting.
COSC 3101A - Design and Analysis of Algorithms 6 Lower Bounds for Sorting Counting / Radix / Bucket Sort Many of these slides are taken from Monica Nicolescu,
Quick sort, lower bound on sorting, bucket sort, radix sort, comparison of algorithms, code, … Sorting: part 2.
Data Structures Haim Kaplan & Uri Zwick December 2013 Sorting 1.
CS6045: Advanced Algorithms Sorting Algorithms. Heap Data Structure A heap (nearly complete binary tree) can be stored as an array A –Root of tree is.
CSE332: Data Abstractions Lecture 12: Introduction to Sorting Dan Grossman Spring 2010.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Nicki Dell Spring 2014.
Sorting Lower Bounds n Beating Them. Recap Divide and Conquer –Know how to break a problem into smaller problems, such that –Given a solution to the smaller.
CS6045: Advanced Algorithms Sorting Algorithms. Sorting So Far Insertion sort: –Easy to code –Fast on small inputs (less than ~50 elements) –Fast on nearly-sorted.
1 Chapter 8-1: Lower Bound of Comparison Sorts. 2 About this lecture Lower bound of any comparison sorting algorithm – applies to insertion sort, selection.
CSE373: Data Structures & Algorithms
Lower Bounds & Sorting in Linear Time
Chapter 11 Sorting Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and Mount.
May 26th –non-comparison sorting
CSE373: Data Structure & Algorithms Lecture 23: More Sorting and Other Classes of Algorithms Catie Baker Spring 2015.
May 17th – Comparison Sorts
CSE373: Data Structures & Algorithms
CSE332: Data Abstractions Lecture 12: Introduction to Sorting
Introduction to Algorithms
Instructor: Lilian de Greef Quarter: Summer 2017
Ch8: Sorting in Linear Time Ming-Te Chi
CSE373: Data Structure & Algorithms Lecture 22: More Sorting
Lower Bounds & Sorting in Linear Time
CSE373: Data Structure & Algorithms Lecture 21: Comparison Sorting
Linear-Time Sorting Algorithms
CSE 326: Data Structures Sorting
Lower bound for sorting, radix sort
Presentation transcript:

CSE373: Data Structures & Algorithms Lecture 20: Beyond Comparison Sorting Dan Grossman Fall 2013

The Big Picture Surprising amount of juicy computer science: 2-3 lectures… Fall 20132CSE373: Data Structures & Algorithms Simple algorithms: O(n 2 ) Fancier algorithms: O(n log n) Comparison lower bound:  (n log n) Specialized algorithms: O(n) Handling huge data sets Insertion sort Selection sort Shell sort … Heap sort Merge sort Quick sort (avg) … Bucket sort Radix sort External sorting

How Fast Can We Sort? Heapsort & mergesort have O(n log n) worst-case running time Quicksort has O(n log n) average-case running time These bounds are all tight, actually  (n log n) So maybe we need to dream up another algorithm with a lower asymptotic complexity, such as O(n) or O(n log log n) –Instead: we know that this is impossible Assuming our comparison model: The only operation an algorithm can perform on data items is a 2-element comparison Fall 20133CSE373: Data Structures & Algorithms

A General View of Sorting Assume we have n elements to sort –For simplicity, assume none are equal (no duplicates) How many permutations of the elements (possible orderings)? Example, n=3 a[0]<a[1]<a[2]a[0]<a[2]<a[1]a[1]<a[0]<a[2] a[1]<a[2]<a[0]a[2]<a[0]<a[1]a[2]<a[1]<a[0] In general, n choices for least element, n-1 for next, n-2 for next, … –n(n-1)(n-2)…(2)(1) = n! possible orderings Fall 20134CSE373: Data Structures & Algorithms

Counting Comparisons So every sorting algorithm has to “find” the right answer among the n! possible answers –Starts “knowing nothing”, “anything is possible” –Gains information with each comparison –Intuition: Each comparison can at best eliminate half the remaining possibilities –Must narrow answer down to a single possibility What we can show: Any sorting algorithm must do at least (1/2)n log n – (1/2)n (which is  (n log n)) comparisons –Otherwise there are at least two permutations among the n! possible that cannot yet be distinguished, so the algorithm would have to guess and could be wrong [incorrect algorithm] Fall 20135CSE373: Data Structures & Algorithms

Optional: Counting Comparisons Don’t know what the algorithm is, but it cannot make progress without doing comparisons –Eventually does a first comparison “is a < b ?" –Can use the result to decide what second comparison to do –Etc.: comparison k can be chosen based on first k-1 results Can represent this process as a decision tree –Nodes contain “set of remaining possibilities” Root: None of the n! options yet eliminated –Edges are “answers from a comparison” –The algorithm does not actually build the tree; it’s what our proof uses to represent “the most the algorithm could know so far” as the algorithm progresses Fall 20136CSE373: Data Structures & Algorithms

Optional: One Decision Tree for n=3 Fall 20137CSE373: Data Structures & Algorithms a < b < c, b < c < a, a < c < b, c < a < b, b < a < c, c < b < a a < b < c a < c < b c < a < b b < a < c b < c < a c < b < a a < b < c a < c < b c < a < b a < b < c a < c < b b < a < c b < c < a c < b < a b < c < a b < a < c a < b a > b a > ca < c b < c b > c b < c b > c c < a c > a The leaves contain all the possible orderings of a, b, c A different algorithm would lead to a different tree

Optional: Example if a < c < b Fall 20138CSE373: Data Structures & Algorithms a < b < c, b < c < a, a < c < b, c < a < b, b < a < c, c < b < a a < b < c a < c < b c < a < b b < a < c b < c < a c < b < a a < b < c a < c < b c < a < b a < b < c a < c < b b < a < c b < c < a c < b < a b < c < a b < a < c a < b a > b a > ca < c b < c b > c b < c b > c c < a c > a possible orders actual order

Optional: What the Decision Tree Tells Us A binary tree because each comparison has 2 outcomes –(We assume no duplicate elements) –(Would have 1 outcome if algorithm asks redundant questions) Because any data is possible, any algorithm needs to ask enough questions to produce all n! answers –Each answer is a different leaf –So the tree must be big enough to have n! leaves –Running any algorithm on any input will at best correspond to a root-to-leaf path in some decision tree with n! leaves –So no algorithm can have worst-case running time better than the height of a tree with n! leaves Worst-case number-of-comparisons for an algorithm is an input leading to a longest path in algorithm’s decision tree Fall 20139CSE373: Data Structures & Algorithms

Optional: Where are we Proven: No comparison sort can have worst-case running time better than the height of a binary tree with n! leaves –A comparison sort could be worse than this height, but it cannot be better Now: a binary tree with n! leaves has height  (n log n) –Height could be more, but cannot be less –Factorial function grows very quickly Conclusion: Comparison sorting is  (n log n) –An amazing computer-science result: proves all the clever programming in the world cannot comparison-sort in linear time Fall CSE373: Data Structures & Algorithms

Optional: Height lower bound The height of a binary tree with L leaves is at least log 2 L So the height of our decision tree, h: h  log 2 (n!) property of binary trees = log 2 (n*(n-1)*(n-2)…(2)(1)) definition of factorial = log 2 n + log 2 (n-1) + … + log 2 1 property of logarithms  log 2 n + log 2 (n-1) + … + log 2 (n/2) drop smaller terms (  0)  log 2 (n/2) + log 2 (n/2) + … + log 2 (n/2) shrink terms to log 2 (n/2) = (n/2) log 2 (n/2) arithmetic = (n/2)( log 2 n - log 2 2) property of logarithms = (1/2)n log 2 n – (1/2)n arithmetic “=“  (n log n) Fall CSE373: Data Structures & Algorithms

The Big Picture Surprising amount of juicy computer science: 2-3 lectures… Fall CSE373: Data Structures & Algorithms Simple algorithms: O(n 2 ) Fancier algorithms: O(n log n) Comparison lower bound:  (n log n) Specialized algorithms: O(n) Handling huge data sets Insertion sort Selection sort Shell sort … Heap sort Merge sort Quick sort (avg) … Bucket sort Radix sort External sorting How??? Change the model – assume more than “compare(a,b)”

BucketSort (a.k.a. BinSort) If all values to be sorted are known to be integers between 1 and K (or any small range): –Create an array of size K –Put each element in its proper bucket (a.k.a. bin) –If data is only integers, no need to store more than a count of how times that bucket has been used Output result via linear pass through array of buckets Fall CSE373: Data Structures & Algorithms count array Example: K=5 input (5,1,3,4,3,2,1,1,5,4,5) output: 1,1,1,2,3,3,4,4,5,5,5

Analyzing Bucket Sort Overall: O(n+K) –Linear in n, but also linear in K –  (n log n) lower bound does not apply because this is not a comparison sort Good when K is smaller (or not much larger) than n –We don’t spend time doing comparisons of duplicates Bad when K is much larger than n –Wasted space; wasted time during linear O(K) pass For data in addition to integer keys, use list at each bucket Fall CSE373: Data Structures & Algorithms

Bucket Sort with Data Most real lists aren’t just keys; we have data Each bucket is a list (say, linked list) To add to a bucket, insert in O(1) (at beginning, or keep pointer to last element) count array Example: Movie ratings; scale 1-5;1=bad, 5=excellent Input= 5: Casablanca 3: Harry Potter movies 5: Star Wars Original Trilogy 1: Rocky V Rocky V Harry Potter CasablancaStar Wars Result: 1: Rocky V, 3: Harry Potter, 5: Casablanca, 5: Star Wars Easy to keep ‘stable’; Casablanca still before Star Wars Fall CSE373: Data Structures & Algorithms

Radix sort Radix = “the base of a number system” –Examples will use 10 because we are used to that –In implementations use larger numbers For example, for ASCII strings, might use 128 Idea: –Bucket sort on one digit at a time Number of buckets = radix Starting with least significant digit Keeping sort stable –Do one pass per digit –Invariant: After k passes (digits), the last k digits are sorted Aside: Origins go back to the 1890 U.S. census Fall CSE373: Data Structures & Algorithms

Example Radix = 10 Input: Fall CSE373: Data Structures & Algorithms First pass: bucket sort by ones digit Order now:

Example Fall CSE373: Data Structures & Algorithms Second pass: stable bucket sort by tens digit Order now: Radix = 10 Order was:

Example Fall CSE373: Data Structures & Algorithms Third pass: stable bucket sort by 100s digit Order now: Radix = Order was:

Analysis Input size: n Number of buckets = Radix: B Number of passes = “Digits”: P Work per pass is 1 bucket sort: O(B+n) Total work is O(P(B+n)) Compared to comparison sorts, sometimes a win, but often not –Example: Strings of English letters up to length 15 Run-time proportional to: 15*(52 + n) This is less than n log n only if n > 33,000 Of course, cross-over point depends on constant factors of the implementations –And radix sort can have poor locality properties Fall CSE373: Data Structures & Algorithms

Sorting massive data Need sorting algorithms that minimize disk/tape access time: –Quicksort and Heapsort both jump all over the array, leading to expensive random disk accesses –Mergesort scans linearly through arrays, leading to (relatively) efficient sequential disk access Mergesort is the basis of massive sorting Mergesort can leverage multiple disks 21 CSE373: Data Structures & Algorithms Fall 2013

Last Slide on Sorting Simple O(n 2 ) sorts can be fastest for small n –Selection sort, Insertion sort (latter linear for mostly-sorted) –Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts –Heap sort, in-place but not stable nor parallelizable –Merge sort, not in place but stable and works as external sort –Quick sort, in place but not stable and O(n 2 ) in worst-case Often fastest, but depends on costs of comparisons/copies  (n log n) is worst-case and average lower-bound for sorting by comparisons Non-comparison sorts –Bucket sort good for small number of possible key values –Radix sort uses fewer buckets and more phases Best way to sort? It depends! Fall CSE373: Data Structures & Algorithms