CSE373: Data Structure & Algorithms Lecture 21: More Comparison Sorting Aaron Bauer Winter 2014.



The main problem, stated carefully
For now, assume we have n comparable elements in an array and we want to rearrange them to be in increasing order
Input:
  – An array A of data records
  – A key value in each data record
  – A comparison function (consistent and total)
Effect:
  – Reorganize the elements of A such that for any i and j, if i < j then A[i] ≤ A[j]
  – (Also, A must have exactly the same data it started with)
  – Could also sort in reverse order, of course
An algorithm doing this is a comparison sort
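The "Effect" condition can be checked mechanically. The sketch below is illustrative only: the class and method names (`SortedCheck`, `isSorted`) are hypothetical, and it assumes the comparison function is supplied as a `java.util.Comparator`. Because the comparator is assumed consistent and total, checking adjacent pairs is enough to establish the condition for every i < j.

```java
import java.util.Comparator;

class SortedCheck {
    // Returns true if a[i] <= a[j] under cmp for all i < j.
    // Only adjacent pairs are compared; transitivity of a total order
    // extends the result to every pair.
    static <T> boolean isSorted(T[] a, Comparator<? super T> cmp) {
        for (int i = 1; i < a.length; i++) {
            if (cmp.compare(a[i - 1], a[i]) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

This checks only the ordering half of the postcondition; it does not verify that the array still holds exactly the data it started with.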

Sorting: The Big Picture
Surprising amount of neat stuff to say about sorting:
Simple algorithms: O(n^2)
  – Insertion sort
  – Selection sort
  – Shell sort
  – …
Fancier algorithms: O(n log n)
  – Heap sort
  – Merge sort
  – Quick sort (avg)
  – …
Comparison lower bound: Ω(n log n)
Specialized algorithms: O(n)
  – Bucket sort
  – Radix sort
Handling huge data sets
  – External sorting

Mergesort Analysis
Having defined an algorithm and argued it is correct, we should analyze its running time and space:
To sort n elements, we:
  – Return immediately if n=1
  – Else do 2 subproblems of size n/2 and then an O(n) merge
Recurrence relation:
  T(1) = c_1
  T(n) = 2T(n/2) + c_2 n
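For reference, here is a minimal merge sort sketch whose structure mirrors the recurrence: two recursive calls on halves, then a linear merge. It is an assumption-laden illustration, not the course's implementation: it sorts `int` arrays, copies the halves into temporary arrays for simplicity, and the class name `MergeSortSketch` is made up.

```java
import java.util.Arrays;

class MergeSortSketch {
    static void mergeSort(int[] a) {
        if (a.length <= 1) {
            return; // base case: T(1) is constant work
        }
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        mergeSort(left);      // first subproblem of size about n/2
        mergeSort(right);     // second subproblem of size about n/2
        merge(left, right, a); // O(n) merge
    }

    // Merges two sorted arrays into out, which must have room for both.
    static void merge(int[] left, int[] right, int[] out) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // Taking from the left on ties keeps equal keys in their original order.
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            out[k++] = left[i++];
        }
        while (j < right.length) {
            out[k++] = right[j++];
        }
    }
}
```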

One of the recurrence classics…
For simplicity let constants be 1 – no effect on asymptotic answer

$$
\begin{aligned}
T(1) &= 1 \\
T(n) &= 2T(n/2) + n \\
     &= 2(2T(n/4) + n/2) + n \\
     &= 4T(n/4) + 2n \\
     &= 4(2T(n/8) + n/4) + 2n \\
     &= 8T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^k T(n/2^k) + kn
\end{aligned}
$$

So total is $2^k T(n/2^k) + kn$ where $n/2^k = 1$, i.e., $\log n = k$
That is, $2^{\log n} T(1) + n \log n = n + n \log n = O(n \log n)$
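The closed form n + n log n can be sanity-checked numerically for powers of two. The sketch below evaluates the recurrence directly and compares it with the closed form; the class and method names are hypothetical, and it assumes n is a power of two so that n/2 is exact at every level.

```java
class MergeRecurrence {
    // Evaluates T(1) = 1, T(n) = 2T(n/2) + n directly.
    static long t(long n) {
        return (n <= 1) ? 1 : 2 * t(n / 2) + n;
    }

    // Closed form n + n * log2(n), valid when n is a power of two.
    static long closedForm(long n) {
        int log2 = 63 - Long.numberOfLeadingZeros(n); // exact log2 for powers of two
        return n + n * log2;
    }
}
```

For example, with n = 8 both give 32: the recurrence yields 2(2(2·1 + 2) + 4) + 8, and the closed form yields 8 + 8·3.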

Or more intuitively…
This recurrence is so common you just “know” it’s O(n log n)
Merge sort is relatively easy to intuit (best, worst, and average):
  – The recursion “tree” will have log n height
  – At each level we do a total amount of merging equal to n

Quicksort
Also uses divide-and-conquer
  – Recursively chop into two pieces
  – Instead of doing all the work as we merge together, we will do all the work as we recursively split into halves
  – Unlike merge sort, does not need auxiliary space
O(n log n) on average, but O(n^2) worst-case
Faster than merge sort in practice?
  – Often believed so
  – Does fewer copies and more comparisons, so it depends on the relative cost of these two operations!

Quicksort Overview
1. Pick a pivot element
2. Partition all the data into:
   A. The elements less than the pivot
   B. The pivot
   C. The elements greater than the pivot
3. Recursively sort A and C
4. The answer is, “as simple as A, B, C”
(Alas, there are some details lurking in this algorithm)
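The four steps can be sketched almost literally with lists. This version is deliberately not in place, so it is only a conceptual illustration. Two choices are assumptions made here, not taken from the slides: the first element is used as the pivot, and group B holds every element equal to the pivot so that duplicates are handled. `QuicksortABC` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class QuicksortABC {
    // Returns a new sorted list; the input list is not modified.
    static List<Integer> quicksort(List<Integer> s) {
        if (s.size() <= 1) {
            return new ArrayList<>(s);
        }
        int pivot = s.get(0);                      // step 1: pick a pivot (assumed rule)
        List<Integer> less = new ArrayList<>();    // A: elements less than the pivot
        List<Integer> equal = new ArrayList<>();   // B: the pivot and any duplicates of it
        List<Integer> greater = new ArrayList<>(); // C: elements greater than the pivot
        for (int x : s) {                          // step 2: partition
            if (x < pivot) {
                less.add(x);
            } else if (x > pivot) {
                greater.add(x);
            } else {
                equal.add(x);
            }
        }
        List<Integer> result = quicksort(less);    // step 3: recursively sort A
        result.addAll(equal);                      // step 4: A, then B, then C
        result.addAll(quicksort(greater));
        return result;
    }
}
```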

Think in Terms of Sets
[Figure, from Weiss: select a pivot value from S; partition S into S1 and S2 around the pivot; Quicksort(S1) and Quicksort(S2); presto, S is sorted]

Example, Showing Recursion
[Figure: an example array is repeatedly divided around pivots down to 1-element pieces (Divide), then the sorted pieces are combined back up (Conquer)]

Details
Have not yet explained:
How to pick the pivot element
  – Any choice is correct: data will end up sorted
  – But as analysis will show, want the two partitions to be about equal in size
How to implement partitioning
  – In linear time
  – In place

Pivots
Best pivot?
  – Median
  – Halve each time
Worst pivot?
  – Greatest/least element
  – Problem of size n - 1
  – O(n^2)

Potential pivot rules
While sorting arr from lo (inclusive) to hi (exclusive)…
Pick arr[lo] or arr[hi-1]
  – Fast, but worst-case occurs with mostly sorted input
Pick random element in the range
  – Does as well as any technique, but (pseudo)random number generation can be slow
  – Still probably the most elegant approach
Median of 3, e.g., arr[lo], arr[hi-1], arr[(hi+lo)/2]
  – Common heuristic that tends to work well
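The random and median-of-3 rules might be sketched as follows. Both methods return an index into `arr` rather than a value, which is a design choice made here so the pivot can later be swapped into position. The names are hypothetical, and the middle index is computed as `lo + (hi - lo) / 2`, which equals `(hi + lo) / 2` whenever the sum does not overflow.

```java
import java.util.concurrent.ThreadLocalRandom;

class PivotRules {
    // Picks a uniformly random index in [lo, hi).
    static int randomPivot(int lo, int hi) {
        return ThreadLocalRandom.current().nextInt(lo, hi);
    }

    // Returns whichever of lo, mid, hi-1 holds the median of the three values.
    static int medianOfThree(int[] arr, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int a = arr[lo], b = arr[mid], c = arr[hi - 1];
        if ((a <= b && b <= c) || (c <= b && b <= a)) {
            return mid;      // b is between a and c
        }
        if ((b <= a && a <= c) || (c <= a && a <= b)) {
            return lo;       // a is between b and c
        }
        return hi - 1;       // otherwise c is the median
    }
}
```

As an illustration with made-up data, for `{8, 1, 4, 9, 0, 3, 5, 2, 7, 6}` with lo = 0 and hi = 10 the three candidates are 8, 3, and 6, so the method returns index 9 (value 6).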

Partitioning
Conceptually simple, but hardest part to code up correctly
  – After picking pivot, need to partition in linear time in place
One approach (there are slightly fancier ones):
1. Swap pivot with arr[lo]
2. Use two fingers i and j, starting at lo+1 and hi-1
3. while (i < j)
     if (arr[j] > pivot) j--
     else if (arr[i] < pivot) i++
     else swap arr[i] with arr[j]
4. Swap pivot with arr[i] *
* skip step 4 if pivot ends up being least element
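Here is one runnable variant of the two-finger idea. It deliberately differs from the numbered steps in a few places, all of which are assumptions made for this sketch: the loop runs while `i <= j`, elements equal to the pivot are kept on the left side, and both fingers advance after a swap. These changes let the loop terminate when duplicates are present, and they make the final swap with `arr[j]` always safe, including when the pivot is the least element (the swap is then a no-op). `PartitionSketch` is a hypothetical name.

```java
class PartitionSketch {
    // Partitions arr[lo, hi) around the value at pivotIndex.
    // Returns the pivot's final index p, with arr[lo..p-1] <= arr[p] < arr[p+1..hi-1].
    static int partition(int[] arr, int lo, int hi, int pivotIndex) {
        swap(arr, lo, pivotIndex);  // move the pivot out of the way
        int pivot = arr[lo];
        int i = lo + 1;             // everything left of i is <= pivot
        int j = hi - 1;             // everything right of j is > pivot
        while (i <= j) {
            if (arr[j] > pivot) {
                j--;
            } else if (arr[i] <= pivot) {
                i++;
            } else {
                // arr[i] > pivot and arr[j] <= pivot: both are on the wrong side
                swap(arr, i, j);
                i++;
                j--;
            }
        }
        swap(arr, lo, j);           // j is the last position holding a value <= pivot
        return j;
    }

    static void swap(int[] arr, int x, int y) {
        int tmp = arr[x];
        arr[x] = arr[y];
        arr[y] = tmp;
    }
}
```

One consequence of keeping equal elements on the left: an input made entirely of duplicates produces maximally unbalanced partitions. Fancier schemes avoid this.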

Example
Step one: pick pivot as median of 3
  – lo = 0, hi = 10
Step two: move pivot to the lo position
[Figure: array contents before and after each step]

Example
Now partition in place
[Figure: array contents after each step: move fingers, swap, move fingers, move pivot]
Often have more than one swap during partition – this is a short example

Analysis
Best-case: Pivot is always the median
  T(0)=T(1)=1
  T(n)=2T(n/2) + n  -- linear-time partition
  Same recurrence as mergesort: O(n log n)
Worst-case: Pivot is always smallest or largest element
  T(0)=T(1)=1
  T(n) = 1T(n-1) + n
  Basically same recurrence as selection sort: O(n^2)
Average-case (e.g., with random pivot)
  – O(n log n), not responsible for proof (in text)
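The worst-case recurrence can be expanded the same way as the mergesort one. Taking T(1) = 1 as stated, each step peels off one linear term:

```latex
\begin{aligned}
T(n) &= T(n-1) + n \\
     &= T(n-2) + (n-1) + n \\
     &= T(n-3) + (n-2) + (n-1) + n \\
     &\;\;\vdots \\
     &= T(1) + 2 + 3 + \dots + n \\
     &= 1 + 2 + 3 + \dots + n \\
     &= \frac{n(n+1)}{2} = O(n^2)
\end{aligned}
```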

Cutoffs
For small n, all that recursion tends to cost more than doing a quadratic sort
  – Remember asymptotic complexity is for large n
Common engineering technique: switch algorithm below a cutoff
  – Reasonable rule of thumb: use insertion sort for n < 10
Notes:
  – Could also use a cutoff for merge sort
  – Cutoffs are also the norm with parallel algorithms
      Switch to sequential algorithm
  – None of this affects asymptotic complexity

Cutoff skeleton

    void quicksort(int[] arr, int lo, int hi) {
      if (hi - lo < CUTOFF)
        insertionSort(arr, lo, hi);
      else
        …
    }

Notice how this cuts out the vast majority of the recursive calls
  – Think of the recursive calls to quicksort as a tree
  – Trims out the bottom layers of the tree
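One way the skeleton might be completed is sketched below. Everything beyond the skeleton's own signature and cutoff test is an assumption for illustration: the value `CUTOFF = 10` follows the rule of thumb, the pivot is chosen by median of 3, and the partition keeps elements equal to the pivot on the left. The class name is hypothetical.

```java
class QuicksortWithCutoff {
    static final int CUTOFF = 10; // assumed value, per the n < 10 rule of thumb

    static void quicksort(int[] arr) {
        quicksort(arr, 0, arr.length);
    }

    // Sorts arr[lo, hi): lo inclusive, hi exclusive.
    static void quicksort(int[] arr, int lo, int hi) {
        if (hi - lo < CUTOFF) {
            insertionSort(arr, lo, hi);
        } else {
            int p = partition(arr, lo, hi);
            quicksort(arr, lo, p);      // elements left of the pivot
            quicksort(arr, p + 1, hi);  // elements right of the pivot
        }
    }

    static void insertionSort(int[] arr, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int x = arr[i];
            int j = i - 1;
            while (j >= lo && arr[j] > x) {
                arr[j + 1] = arr[j]; // shift larger elements right
                j--;
            }
            arr[j + 1] = x;
        }
    }

    // Median-of-3 pivot, then a two-finger in-place partition.
    static int partition(int[] arr, int lo, int hi) {
        swap(arr, lo, medianOfThree(arr, lo, hi));
        int pivot = arr[lo];
        int i = lo + 1;
        int j = hi - 1;
        while (i <= j) {
            if (arr[j] > pivot) {
                j--;
            } else if (arr[i] <= pivot) {
                i++;
            } else {
                swap(arr, i, j);
                i++;
                j--;
            }
        }
        swap(arr, lo, j); // put the pivot between the two sides
        return j;
    }

    static int medianOfThree(int[] arr, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int a = arr[lo], b = arr[mid], c = arr[hi - 1];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
        return hi - 1;
    }

    static void swap(int[] arr, int x, int y) {
        int tmp = arr[x];
        arr[x] = arr[y];
        arr[y] = tmp;
    }
}
```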

Visualizations

How Fast Can We Sort?
Heapsort & mergesort have O(n log n) worst-case running time
Quicksort has O(n log n) average-case running time
These bounds are all tight, actually Θ(n log n)
So maybe we need to dream up another algorithm with a lower asymptotic complexity, such as O(n) or O(n log log n)
  – Instead: we know that this is impossible
  – Assuming our comparison model: The only operation an algorithm can perform on data items is a 2-element comparison

A General View of Sorting
Assume we have n elements to sort
  – For simplicity, assume none are equal (no duplicates)
How many permutations of the elements (possible orderings)?
Example, n=3
  a[0]<a[1]<a[2]    a[0]<a[2]<a[1]    a[1]<a[0]<a[2]
  a[1]<a[2]<a[0]    a[2]<a[0]<a[1]    a[2]<a[1]<a[0]
In general, n choices for least element, n-1 for next, n-2 for next, …
  – n(n-1)(n-2)…(2)(1) = n! possible orderings
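The n! count can be confirmed for small n by enumerating the orderings. The sketch below generates every permutation of the indices 0..n-1 by swapping each remaining element into the current position, which mirrors the "n choices, then n-1, then n-2" argument. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Orderings {
    // Returns all permutations of the indices 0..n-1.
    static List<int[]> permutations(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        List<int[]> out = new ArrayList<>();
        permute(a, 0, out);
        return out;
    }

    // Fixes position k to each remaining candidate in turn, then recurses.
    static void permute(int[] a, int k, List<int[]> out) {
        if (k == a.length) {
            out.add(a.clone());
            return;
        }
        for (int i = k; i < a.length; i++) {
            swap(a, k, i);        // choose a[i] for position k
            permute(a, k + 1, out);
            swap(a, k, i);        // undo the choice
        }
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}
```

For n = 3 this yields 6 orderings, and for n = 4 it yields 24.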

Counting Comparisons
So every sorting algorithm has to “find” the right answer among the n! possible answers
  – Starts “knowing nothing”, “anything is possible”
  – Gains information with each comparison
  – Intuition: Each comparison can at best eliminate half the remaining possibilities
  – Must narrow answer down to a single possibility
What we can show: Any sorting algorithm must do at least (1/2)n log n – (1/2)n (which is Ω(n log n)) comparisons
  – Otherwise there are at least two permutations among the n! possible that cannot yet be distinguished, so the algorithm would have to guess and could be wrong [incorrect algorithm]

Optional: Counting Comparisons
Don’t know what the algorithm is, but it cannot make progress without doing comparisons
  – Eventually does a first comparison “is a < b ?"
  – Can use the result to decide what second comparison to do
  – Etc.: comparison k can be chosen based on first k-1 results
Can represent this process as a decision tree
  – Nodes contain “set of remaining possibilities”
      Root: None of the n! options yet eliminated
  – Edges are “answers from a comparison”
  – The algorithm does not actually build the tree; it’s what our proof uses to represent “the most the algorithm could know so far” as the algorithm progresses

Optional: One Decision Tree for n=3
[Figure: decision tree. Each node lists the orderings still possible; each edge is the answer to one comparison.]
  root: a<b<c, b<c<a, a<c<b, c<a<b, b<a<c, c<b<a
    a < b: a<b<c, a<c<b, c<a<b
      a < c: a<b<c, a<c<b
        b < c: a<b<c
        b > c: a<c<b
      a > c: c<a<b
    a > b: b<a<c, b<c<a, c<b<a
      b < c: b<a<c, b<c<a
        c < a: b<c<a
        c > a: b<a<c
      b > c: c<b<a
The leaves contain all the possible orderings of a, b, c
A different algorithm would lead to a different tree

Optional: Example if a < c < b
[Figure: the same decision tree with the path for the actual order a < c < b highlighted. The possible orders shrink along the path: all six at the root; then a<b<c, a<c<b, c<a<b after a < b; then a<b<c, a<c<b after a < c; then a<c<b after b > c.]
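A decision tree for n=3 corresponds directly to nested comparisons in code: each `if` is an internal node and each `return` is a leaf. The sketch below encodes one such tree, with a < b asked first, then a < c or b < c depending on the answer. The class and method names are hypothetical, and it assumes the three values are distinct.

```java
class DecisionTree3 {
    // Returns the ordering of three distinct values using at most 3 comparisons.
    static String order(int a, int b, int c) {
        if (a < b) {
            if (a < c) {
                // remaining possibilities: a<b<c, a<c<b
                return (b < c) ? "a<b<c" : "a<c<b";
            } else {
                return "c<a<b";
            }
        } else {
            if (b < c) {
                // remaining possibilities: b<a<c, b<c<a
                return (c < a) ? "b<c<a" : "b<a<c";
            } else {
                return "c<b<a";
            }
        }
    }
}
```

With a = 1, b = 3, c = 2 the calls follow the path a < b, a < c, b > c and reach the leaf a<c<b after three comparisons. Two of the six leaves are reached after only two comparisons, but the worst case is three.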

Optional: What the Decision Tree Tells Us
A binary tree because each comparison has 2 outcomes
  – (We assume no duplicate elements)
  – (Would have 1 outcome if algorithm asks redundant questions)
Because any data is possible, any algorithm needs to ask enough questions to produce all n! answers
  – Each answer is a different leaf
  – So the tree must be big enough to have n! leaves
  – Running any algorithm on any input will at best correspond to a root-to-leaf path in some decision tree with n! leaves
  – So no algorithm can have worst-case running time better than the height of a tree with n! leaves
Worst-case number-of-comparisons for an algorithm is an input leading to a longest path in algorithm’s decision tree

Optional: Where are we
Proven: No comparison sort can have worst-case running time better than the height of a binary tree with n! leaves
  – A comparison sort could be worse than this height, but it cannot be better
Now: a binary tree with n! leaves has height Ω(n log n)
  – Height could be more, but cannot be less
  – Factorial function grows very quickly
Conclusion: Comparison sorting is Ω(n log n)
  – An amazing computer-science result: proves all the clever programming in the world cannot comparison-sort in linear time

Optional: Height lower bound
The height of a binary tree with L leaves is at least $\log_2 L$
So the height of our decision tree, h:

$$
\begin{aligned}
h &\ge \log_2 (n!) && \text{property of binary trees} \\
  &= \log_2 \bigl(n (n-1)(n-2) \cdots (2)(1)\bigr) && \text{definition of factorial} \\
  &= \log_2 n + \log_2 (n-1) + \dots + \log_2 1 && \text{property of logarithms} \\
  &\ge \log_2 n + \log_2 (n-1) + \dots + \log_2 (n/2) && \text{drop smaller terms } (\ge 0) \\
  &\ge \log_2 (n/2) + \log_2 (n/2) + \dots + \log_2 (n/2) && \text{shrink terms to } \log_2 (n/2) \\
  &= (n/2) \log_2 (n/2) && \text{arithmetic} \\
  &= (n/2)(\log_2 n - \log_2 2) && \text{property of logarithms} \\
  &= (1/2)\, n \log_2 n - (1/2)\, n && \text{arithmetic} \\
  &\text{“=”}\ \Omega(n \log n)
\end{aligned}
$$
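To see how loose the closed-form bound is, the exact quantity ⌈log2(n!)⌉ can be computed for small n and compared with (1/2)n log2 n – (1/2)n. The sketch below uses `java.math.BigInteger` for the factorial; the class and method names are hypothetical.

```java
import java.math.BigInteger;

class ComparisonLowerBound {
    // ceil(log2(n!)): the minimum height of a binary tree with n! leaves.
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int k = 2; k <= n; k++) {
            factorial = factorial.multiply(BigInteger.valueOf(k));
        }
        // For any integer x >= 1, ceil(log2(x)) equals the bit length of x - 1.
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    // The weaker closed-form bound from the derivation: (1/2) n log2 n - (1/2) n.
    static double closedFormBound(int n) {
        double log2n = Math.log(n) / Math.log(2);
        return 0.5 * n * log2n - 0.5 * n;
    }
}
```

For n = 3 the exact bound is 3 comparisons, which matches the height of a three-comparison decision tree for three elements. For n = 4 it is 5, while the closed form gives only 2: the closed form is valid but far from tight for small n.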