CS221: Algorithms and Data Structures
Lecture #4: Sorting Things Out
Steve Wolfman, 2009W1

Learning Goals

After this unit, you should be able to:
- Describe and apply various sorting algorithms; compare and contrast their tradeoffs.
- State differences in performance on large versus small data sets for various sorting algorithms.
- Analyze the complexity of these sorting algorithms.
- Explain the difference between the complexity of a problem (sorting) and the complexity of a particular algorithm for solving that problem.
- Manipulate data using various sorting algorithms (irrespective of any implementation).

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

Categorizing Sorting Algorithms

- Computational complexity
  - Average-case behaviour: why do we care?
  - Worst/best-case behaviour: why do we care? How often do we re-sort sorted, reverse-sorted, or “almost” sorted (k swaps from sorted, where k << n) lists?
- Stability: what happens to elements with identical keys? (Illustrated in the code sketch below.)
- Memory usage: how much extra memory is used?

Notes:

- Computational complexity (worst, average, and best number of comparisons for several typical test cases; see below) is measured in terms of the size of the list (n). Typically, a good average number of comparisons is O(n log n) and a bad one is O(n^2). Note that asymptotic analysis tells us nothing about behaviour on small lists, and worst-case behaviour is probably more important than average-case: for example, "plain-vanilla" quicksort requires O(n^2) comparisons on already-sorted arrays, a case that is very important in practice. Sorting algorithms that use only an abstract key-comparison operation always need at least O(n log n) comparisons on average, while algorithms that exploit the structure of the key space cannot sort faster than O(n log k), where k is the size of the key space. Also note that the number of comparisons is just a convenient theoretical metric; in reality both moves and comparisons matter, and on short keys the cost of a move is comparable to the cost of a comparison (even if pointers are used).
- Stability: stable sorting algorithms maintain the relative order of records with equal keys. If all keys are distinct, this distinction is meaningless; but if there are equal keys, a sorting algorithm is stable if, whenever two records R and S have the same key and R appears before S in the original list, R appears before S in the sorted list.
- Memory usage (and use of other computer resources): one large class of algorithms sorts "in place". These are generally slower than algorithms that use additional memory, since extra memory can be used for mapping the key space, and most fast stable algorithms use additional memory. With 4 GB or more of RAM and virtual memory standard in all major OSes, the old emphasis on algorithms that require no additional memory should be abandoned: the speedup achievable with a small amount of extra memory is considerable. Moreover, by sorting pointers instead of records, additional space of size N becomes only N pointers; since records are usually several times larger than a pointer (4 bytes on 32-bit CPUs), this makes such methods much more practical than they look from purely theoretical considerations.
- The difference between worst-case and average behaviour matters: quicksort is efficient only on average, and its worst case is n^2, while heapsort has the interesting property that its worst case is not much different from its average case.
- Behaviour on practically important data sets (completely sorted, inversely sorted, and “almost sorted” — 1 to K permutations, where K is less than N/10) has tremendous practical implications, which textbooks other than Knuth often fail to address or address incorrectly.
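To make the stability distinction concrete, here is a small C++ sketch (my illustration, not from the slides) that sorts records by grade; std::stable_sort guarantees that students with equal grades keep their original relative order, while std::sort makes no such promise:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Student {
    std::string name;
    int grade;
};

int main() {
    std::vector<Student> v = {{"Ana", 90}, {"Bob", 85}, {"Cam", 90}, {"Dee", 85}};
    auto byGrade = [](const Student& a, const Student& b) { return a.grade < b.grade; };

    // Stable: Bob stays before Dee (both 85), Ana before Cam (both 90).
    std::stable_sort(v.begin(), v.end(), byGrade);

    // std::sort(v.begin(), v.end(), byGrade) would also order by grade,
    // but ties could come out in any order.
    for (const auto& s : v) std::cout << s.name << " " << s.grade << "\n";
}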

Comparing our “PQSort” Algorithms: Computational Complexity

- Selection Sort: always makes n passes with a “triangular” shape. Best/worst/average case: Θ(n^2).
- Insertion Sort: always makes n passes, but if we’re lucky and search for the maximum from the right, only constant work is needed on each pass. Best case: Θ(n); worst/average case: Θ(n^2).
- Heap Sort: always makes n passes needing O(lg n) on each pass. Best/worst/average case: Θ(n lg n).

Note: best cases assume distinct elements. With identical elements, Heap Sort can get Θ(n) performance.

Insertion Sort Best Case

[Figure: animation of insertion sort on the already-sorted array 1 2 3 ... 14; on each pass, the boundary of the “PQ” region advances one slot.]

If we search from the right: constant time per pass!
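To make this concrete, here is a minimal insertion sort sketch (my illustration, in the standard left-scanning formulation rather than the slides’ PQ framing) showing why the best case is linear: on already-sorted input the inner loop exits immediately, so each of the n passes does constant work.

// A minimal sketch (not the slides' code): on sorted input the while
// condition fails immediately, so each of the n passes is O(1).
void insertionSort(int x[], int n) {
    for (int i = 1; i < n; i++) {
        int key = x[i];
        int j = i - 1;
        while (j >= 0 && x[j] > key) {  // 0 iterations if x is already sorted
            x[j + 1] = x[j];
            j--;
        }
        x[j + 1] = key;
    }
}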

Comparing our “PQSort” Algorithms: Stability and Memory

- Stability:
  - Selection: easily made stable (when building from the right, prefer the rightmost of identical “biggest” keys).
  - Insertion: easily made stable (when building from the right, find the leftmost slot for a new element).
  - Heap: unstable!
- Memory use: all three are essentially “in-place” algorithms with small O(1) extra space requirements.
- Cache access: not detailed in 221, but... algorithms that don’t “jump around” tend to perform better in modern memory systems. Which of these “jumps around”?

Comparison of Growth...

[Figure: plot comparing the growth of n, n lg n, and n^2, shown up to n = 100 with the line T(n) = 100 marked.]

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

MergeSort

Mergesort belongs to a class of algorithms known as “divide and conquer” algorithms (your recursion sense should be tingling here...). The input is repeatedly split in half, and the algorithm is applied recursively to each half until the base case is reached.

MergeSort Algorithm

- If the array has 0 or 1 elements, it’s sorted. Else...
- Split the array into two halves.
- Sort each half recursively (i.e., using mergesort).
- Merge the sorted halves to produce one sorted result:
  - Consider the two halves to be queues.
  - Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result. (A sketch of this queue-based merge follows.)
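The queue framing above maps almost directly onto code. Here is my sketch of the merge step alone, using std::queue to mirror the description; the slides’ own in-place array implementation appears a few slides below.

#include <queue>
#include <vector>

// A sketch of merge exactly as described above: treat the two sorted
// halves as queues and repeatedly dequeue the smaller front.
std::vector<int> mergeQueues(std::queue<int> a, std::queue<int> b) {
    std::vector<int> result;
    while (!a.empty() || !b.empty()) {
        // Take from a if b is empty, or if both have elements and a's front is smaller.
        if (b.empty() || (!a.empty() && a.front() < b.front())) {
            result.push_back(a.front());
            a.pop();
        } else {
            result.push_back(b.front());
            b.pop();
        }
    }
    return result;
}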

MergeSort Performance Analysis

- If the array has 0 or 1 elements, it’s sorted. → T(1) = 1
- Else...
  - Split the array into two halves.
  - Sort each half recursively (i.e., using mergesort). → 2T(n/2)
  - Merge the sorted halves to produce one sorted result (treat the halves as queues; repeatedly dequeue the smaller front into the result). → n

MergeSort Performance Analysis

T(1) = 1
T(n) = 2T(n/2) + n
     = 4T(n/4) + 2(n/2) + n
     = 8T(n/8) + 4(n/4) + 2(n/2) + n
     = 8T(n/8) + n + n + n
     = 8T(n/8) + 3n
     = 2^i T(n/2^i) + i·n.   Let i = lg n:
T(n) = n·T(1) + n lg n = n + n lg n ∈ Θ(n lg n)

We ignored floors/ceilings. To prove performance formally, we’d use this as a guess and prove it with floors/ceilings by induction.
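As a cross-check (my addition; the slides rely on the unrolling plus induction), the master theorem gives the same bound. With

T(n) = a\,T(n/b) + f(n), \qquad a = 2,\; b = 2,\; f(n) = n,

we have f(n) = \Theta(n^{\log_b a}) = \Theta(n), so case 2 of the master theorem applies and

T(n) = \Theta\!\left(n^{\log_b a} \lg n\right) = \Theta(n \lg n).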

Consider the following array of integers (the * and ** mark the two merge steps traced below):

3  -4  7  5  9  6  2  1

Mergesort:

void msort(int x[], int lo, int hi, int tmp[]) {
    if (lo >= hi) return;
    int mid = (lo + hi) / 2;
    msort(x, lo, mid, tmp);        // sort the left half
    msort(x, mid + 1, hi, tmp);    // sort the right half
    merge(x, lo, mid, hi, tmp);    // merge the two sorted halves
}

void mergesort(int x[], int n) {
    int* tmp = new int[n];
    msort(x, 0, n - 1, tmp);
    delete[] tmp;
}

void merge(int x[], int lo, int mid, int hi, int tmp[]) {
    int a = lo, b = mid + 1;
    // The first for loop merges the two sorted halves into tmp.
    for (int k = lo; k <= hi; k++) {
        if (a <= mid && (b > hi || x[a] < x[b]))
            tmp[k] = x[a++];
        else
            tmp[k] = x[b++];
    }
    // The second for loop just puts everything back into x.
    for (int k = lo; k <= hi; k++)
        x[k] = tmp[k];
}

merge(x, 0, 0, 1, tmp);   // step *
x:    3  -4   7   5   9   6   2   1
tmp: -4   3
x:   -4   3   7   5   9   6   2   1

merge(x, 4, 5, 7, tmp);   // step **
x:   -4   3   5   7   6   9   1   2
tmp:                  1   2   6   9
x:   -4   3   5   7   1   2   6   9

merge(x, 0, 3, 7, tmp);   // will be the final step
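A minimal driver for the slides’ code (the main function is my addition; it assumes the msort, merge, and mergesort definitions above are compiled together with it):

#include <iostream>

// Assumes mergesort(int[], int) from the previous slides is compiled
// in the same translation unit.
void mergesort(int x[], int n);

int main() {
    int x[] = {3, -4, 7, 5, 9, 6, 2, 1};
    int n = sizeof(x) / sizeof(x[0]);
    mergesort(x, n);
    for (int i = 0; i < n; i++)
        std::cout << x[i] << " ";   // prints: -4 1 2 3 5 6 7 9
    std::cout << "\n";
}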

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

QuickSort

In practice, one of the fastest sorting algorithms is Quicksort, developed in 1961 by C.A.R. Hoare.

- Comparison-based: examines elements by comparing them to other elements.
- Divide-and-conquer: divides into “halves” (that may be very unequal) and recursively sorts.

QuickSort Algorithm

- Pick a pivot.
- Reorder the list so that all elements < pivot are on the left, while all elements >= pivot are on the right.
- Recursively sort each side.

Note that this algorithm will terminate: each iteration places at least one element in its final spot (loop invariant). Are we missing a base case?

Partitioning

The act of splitting up an array according to the pivot is called partitioning. Consider the following:

-4  1  -3  2   |  3  |  5  4  7
left partition   pivot   right partition

QuickSort Visually

[Figure: animation of quicksort; each pivot P splits its subarray into two partitions, and the pivots recursively partition the array until it is sorted.]

QuickSort (by Jon Bentley):

void qsort(int x[], int lo, int hi) {
    int i, p;
    if (lo >= hi) return;
    p = lo;                        // p marks the end of the "< pivot" region
    for (i = lo + 1; i <= hi; i++)
        if (x[i] < x[lo])          // x[lo] is the pivot
            swap(x[++p], x[i]);
    swap(x[lo], x[p]);             // put the pivot in its final spot
    qsort(x, lo, p - 1);
    qsort(x, p + 1, hi);
}

void quicksort(int x[], int n) {
    qsort(x, 0, n - 1);
}

QuickSort Example (using Bentley’s Algorithm)

2  -4  6  1  5  -3  3  7
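Tracing the first partition by hand (my working, derived from the code above): the pivot is x[lo] = 2, and p marks the end of the "less than pivot" region.

start:                        2 -4  6  1  5 -3  3  7    p = 0
i=1: -4 < 2, swap x[1],x[1]:  (no visible change)       p = 1
i=2:  6 >= 2, no swap
i=3:  1 < 2, swap x[2],x[3]:  2 -4  1  6  5 -3  3  7    p = 2
i=4:  5 >= 2, no swap
i=5: -3 < 2, swap x[3],x[5]:  2 -4  1 -3  5  6  3  7    p = 3
i=6, i=7: 3 and 7 are >= 2, no swaps
final swap x[0],x[3]:        -3 -4  1  2  5  6  3  7

The pivot 2 is now in its final spot (index 3); quicksort recurses on (-3 -4 1) and (5 6 3 7).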

QuickSort: Complexity

- Recall that QuickSort is comparison-based, so the operations are comparisons.
- In our partitioning task, we compared each element to the pivot; thus each partitioning step costs about N comparisons.
- As with MergeSort, if each partition is about half (or any constant fraction of) the size of the array, complexity is Θ(n lg n).
- In the worst case, however, we end up with a partition with a 1 and n-1 split.

QuickSort Visually: Worst Case

[Figure: each pivot P peels off only one element, so the recursion degenerates into a chain of depth n.]

QuickSort: Worst Case

In the overall worst case, this happens at every step, giving N comparisons in the first step, N-1 in the second, N-2 in the third, and so on:

N + (N-1) + (N-2) + ... + 1 = N(N+1)/2

...or approximately n^2.

QuickSort: Average Case (Intuition)

- Clearly pivot choice is important: it has a direct impact on the performance of the sort.
- Hence, QuickSort is fragile, or at least “attackable”.
- So how do we pick a good pivot?

QuickSort: Average Case (Intuition)

- Let’s assume that pivot choice is random.
- Half the time, the pivot will be from the centre half of the array.
- Thus, at worst the split will be n/4 and 3n/4.

QuickSort: Average Case (Intuition)

- We can apply this to the notion of a good split.
- Every “good” split yields two partitions of size n/4 and 3n/4; that is, it divides N by 4/3.
- Hence, we make up to log_{4/3}(N) good splits, and the expected number of partitioning levels is at most 2 · log_{4/3}(N), which is O(log N). (Recall: 2^3 = 8 means log_2 8 = 3.)
- Given N comparisons at each partitioning level, we have Θ(N log N).
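The analysis above assumes a random pivot; here is one way to enforce that assumption in code. This variant of the earlier qsort is my sketch, not the slides’ code: it swaps a randomly chosen element into position lo before partitioning, which makes the "attackable" O(n^2) case vanishingly unlikely on any fixed input, sorted or not.

#include <cstdlib>   // std::rand
#include <utility>   // std::swap

// A sketch (not the slides' version): randomized pivot selection.
void qsortRandom(int x[], int lo, int hi) {
    if (lo >= hi) return;
    std::swap(x[lo], x[lo + std::rand() % (hi - lo + 1)]);  // random pivot
    int p = lo;                        // Bentley-style partition, as before
    for (int i = lo + 1; i <= hi; i++)
        if (x[i] < x[lo])
            std::swap(x[++p], x[i]);
    std::swap(x[lo], x[p]);
    qsortRandom(x, lo, p - 1);
    qsortRandom(x, p + 1, hi);
}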

Quicksort Complexity: How does it compare?

N         Insertion Sort        Quicksort
10,000    4.1777 sec            0.05 sec
20,000    20.52 sec             0.11 sec
300,000   4666 sec (1.25 hrs)   2.15 sec
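The absolute numbers above come from old hardware; to reproduce the comparison yourself, a minimal timing harness (my sketch, not from the slides) might look like this:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// Assumes quicksort(int[], int) from the earlier slide is compiled in.
void quicksort(int x[], int n);

// My sketch of a timing harness; sortFn is any in-place array sort.
double timeSort(void (*sortFn)(int[], int), int n) {
    std::vector<int> v(n);
    for (int& e : v) e = std::rand();          // random input
    auto start = std::chrono::steady_clock::now();
    sortFn(v.data(), n);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();                    // seconds
}

int main() {
    for (int n : {10000, 20000, 300000})
        std::cout << "n=" << n << ": " << timeSort(quicksort, n) << " s\n";
}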

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?

Complexity:
- Best case: Insert < Quick, Merge, Heap < Select
- Average case: Quick, Merge, Heap < Insert, Select
- Worst case: Merge, Heap < Quick, Insert, Select
- Usually on “real” data: Quick < Merge < Heap < Insert/Select
- On very short lists: quadratic sorts may have an advantage (so, some quick/merge implementations “bottom out” to these as base cases; see the sketch below).

Some details depend on implementation! (E.g., an initial check whether the last element of the left sublist is less than the first of the right can make merge’s best case linear.)
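A sketch of the “bottom out” idea (my illustration, not the slides’ code): quicksort that hands small subarrays to insertion sort. The cutoff of 16 is a placeholder; real libraries tune this value empirically.

#include <utility>  // std::swap

// Range insertion sort for the base case (adapted from the earlier sketch).
void insertionSortRange(int x[], int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        int key = x[i];
        int j = i - 1;
        while (j >= lo && x[j] > key) { x[j + 1] = x[j]; j--; }
        x[j + 1] = key;
    }
}

const int CUTOFF = 16;  // placeholder value, not from the slides

// A sketch: recurse with quicksort, but below the cutoff let insertion
// sort finish, since quadratic sorts are fast on very short lists.
void hybridSort(int x[], int lo, int hi) {
    if (hi - lo + 1 <= CUTOFF) {
        insertionSortRange(x, lo, hi);
        return;
    }
    int p = lo;  // Bentley-style partition around pivot x[lo], as before
    for (int i = lo + 1; i <= hi; i++)
        if (x[i] < x[lo])
            std::swap(x[++p], x[i]);
    std::swap(x[lo], x[p]);
    hybridSort(x, lo, p - 1);
    hybridSort(x, p + 1, hi);
}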

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?

Stability:
- Easily made stable: Insert, Select, Merge (prefer the “left” of the two sorted sublists on ties).
- Unstable: Heap.
- Challenging to make stable: Quick.

Memory use: Insert, Select, Heap, Quick < Merge

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

Complexity of Sorting Using Comparisons as a Problem

Each comparison is a “choice point” in the algorithm: you can do one thing if the comparison is true and another if false. So, the whole algorithm is like a binary tree...

[Figure: a binary decision tree. The root asks “x < y?”; each internal node asks another comparison (“a < b?”, “a < d?”, “c < d?”, “z < c?”, ...), branching on yes/no; each leaf is labeled “sorted!”.]

Complexity of Sorting Using Comparisons as a Problem

The algorithm spits out a (possibly different) sorted list at each leaf. What’s the maximum number of leaves?

[Same decision-tree figure as above.]

Complexity of Sorting Using Comparisons as a Problem

There are n! possible permutations of a sorted list (i.e., input orders for a given set of input elements). How deep must the tree be to distinguish those input orderings?

[Same decision-tree figure as above.]

Complexity of Sorting Using Comparisons as a Problem

If the tree is not at least lg(n!) deep, then there’s some pair of orderings I could feed the algorithm which the algorithm does not distinguish. So, it must not successfully sort one of those two orderings.

[Same decision-tree figure as above.]
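Why is lg(n!) itself Θ(n lg n)? A quick bound (my addition, not on the slides):

\lg(n!) = \sum_{i=1}^{n} \lg i \;\le\; n \lg n,
\qquad
\lg(n!) \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \lg i \;\ge\; \frac{n}{2} \lg \frac{n}{2},

so \lg(n!) \in \Theta(n \lg n), and the required tree depth — hence the number of comparisons on some input — is \Omega(n \lg n).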

Complexity of Sorting Using Comparisons as a Problem

QED: The complexity of sorting using comparisons is Θ(n lg n) in the worst case, regardless of algorithm! In general, we can lower-bound but not upper-bound the complexity of problems. (Why not? Because I can give as crappy an algorithm as I please to solve any problem.)

Today’s Outline

- Categorizing/Comparing Sorting Algorithms
  - PQSorts as examples
- MergeSort
- QuickSort
- More Comparisons
- Complexity of Sorting

Learning Goals

After this unit, you should be able to:
- Describe and apply various sorting algorithms; compare and contrast their tradeoffs.
- State differences in performance on large versus small data sets for various sorting algorithms.
- Analyze the complexity of these sorting algorithms.
- Explain the difference between the complexity of a problem (sorting) and the complexity of a particular algorithm for solving that problem.
- Manipulate data using various sorting algorithms (irrespective of any implementation).

To Do

- Continue Programming Project #2 (released soon)
- Read: Epp Section 9.5 and KW Sections 10.1, 10.4, and 10.7-10.10

Coming Up

- Steve out of town (October 20-24)
- Midterm Exam! (October 21st)
- Special Midterm-Related Event... treat it like a second midterm exam (October 23rd)
- Programming Project #2 due (October 28th)