Divide and Conquer Sorting


Divide and Conquer Sorting (Data Structures)

Insertion Sort What if the first k elements of the array are already sorted? 4, 7, 12, 5, 19, 16 We can shift the tail of the sorted region down, insert the next element into its proper position, and get k+1 sorted elements: 4, 5, 7, 12, 19, 16
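A minimal sketch of this idea in Python (the function name and layout are illustrative, not from the slides):

    def insertion_sort(a):
        """Sort list a in place. After the k-th iteration, a[0..k] is sorted."""
        for k in range(1, len(a)):
            x = a[k]                     # next element to insert
            i = k - 1
            while i >= 0 and a[i] > x:   # shift the sorted tail to the right
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = x                 # drop x into its proper position
        return a

    # insertion_sort([4, 7, 12, 5, 19, 16]) -> [4, 5, 7, 12, 16, 19]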

“Divide and Conquer” Very important strategy in computer science: Divide the problem into smaller parts Independently solve the parts Combine the solutions to get the overall solution Idea 1: Divide the array into two halves, recursively sort the left and right halves, then merge the two halves -> known as Mergesort Idea 2: Partition the array into small items and large items, then recursively sort the two sets -> known as Quicksort

Mergesort Divide the array in two at the midpoint Conquer each side in turn (by recursively sorting) Merge the two halves together Example array: 8 2 9 4 5 3 1 6

Mergesort Example
Divide:              8 2 9 4 | 5 3 1 6
Divide:              8 2 | 9 4 | 5 3 | 1 6
Divide (1 element):  8 | 2 | 9 | 4 | 5 | 3 | 1 | 6
Merge:               2 8 | 4 9 | 3 5 | 1 6
Merge:               2 4 8 9 | 1 3 5 6
Merge:               1 2 3 4 5 6 8 9

Auxiliary Array The merging requires an auxiliary array. Merging the sorted halves 2 4 8 9 and 1 3 5 6: the smaller front element is copied over at each step, so the auxiliary array fills in as 1, then 1 2, then 1 2 3 4 5, and so on until both halves are exhausted.

Merging Pointers i and j scan the two input halves while target marks the next write position in the auxiliary array. [Diagrams: the normal case; the case where the left half completes first; the case where the right half completes first.]

Merging
Merge(A[], T[] : integer array, left, right : integer) : {
  mid, i, j, k, l, target : integer;
  mid := (right + left)/2;
  i := left; j := mid + 1; target := left;
  while i <= mid and j <= right do
    if A[i] < A[j] then
      T[target] := A[i]; i := i + 1;
    else
      T[target] := A[j]; j := j + 1;
    target := target + 1;
  if i > mid then                    // left half completed first: the right
    for k := left to target-1 do     // remainder is already in place, so just
      A[k] := T[k];                  // copy the merged prefix back
  if j > right then                  // right half completed first: slide the
    k := mid; l := right;            // leftover left-half elements to the end,
    while k >= i do
      A[l] := A[k]; k := k-1; l := l-1;
    for k := left to target-1 do     // then copy the merged prefix back
      A[k] := T[k];
}

Recursive Mergesort
Mergesort(A[], T[] : integer array, left, right : integer) : {
  if left < right then
    mid := (left + right)/2;
    Mergesort(A,T,left,mid);
    Mergesort(A,T,mid+1,right);
    Merge(A,T,left,right);
}
MainMergesort(A[1..n]: integer array, n : integer) : {
  T[1..n]: integer array;
  Mergesort(A,T,1,n);
}
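A compact runnable rendering of the same scheme in Python (a sketch: the names and the copy-back strategy are mine, not the slides'):

    def merge_sort(a, t=None, left=0, right=None):
        """Sort a[left..right] in place, merging through auxiliary array t."""
        if right is None:
            right = len(a) - 1
            t = [0] * len(a)
        if left < right:
            mid = (left + right) // 2
            merge_sort(a, t, left, mid)
            merge_sort(a, t, mid + 1, right)
            # merge the two sorted halves into t, then copy back
            i, j = left, mid + 1
            for k in range(left, right + 1):
                if j > right or (i <= mid and a[i] <= a[j]):
                    t[k] = a[i]; i += 1
                else:
                    t[k] = a[j]; j += 1
            a[left:right + 1] = t[left:right + 1]
        return a

    # merge_sort([8, 2, 9, 4, 5, 3, 1, 6]) -> [1, 2, 3, 4, 5, 6, 8, 9]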

Iterative Mergesort Instead of recursing, merge runs of length 1, then 2, then 4, then 8, and so on, alternating between the array and the auxiliary array. If the last pass leaves the result in the auxiliary array, one final copy back is needed.

Iterative pseudocode
Sort(array A of length N)
  Let m = 2, let B be a temp array of length N
  While m < N
    For i = 1 to N in increments of m
      merge A[i..i+m/2-1] and A[i+m/2..i+m-1] into B[i..i+m-1]
    Swap roles of A and B
    m = m*2
  If needed, copy B back to A
(As written this assumes N is a power of two; otherwise the ranges must be clipped at N.)
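For comparison, a runnable bottom-up version in Python (a sketch using 0-based, half-open ranges to dodge the off-by-one issues above; names are mine):

    def merge_sort_iter(a):
        """Bottom-up mergesort: merge runs of width 1, 2, 4, ... through a temp array."""
        n = len(a)
        src, dst = a[:], [0] * n
        width = 1
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j = lo, mid
                for k in range(lo, hi):          # merge src[lo:mid] and src[mid:hi]
                    if j >= hi or (i < mid and src[i] <= src[j]):
                        dst[k] = src[i]; i += 1
                    else:
                        dst[k] = src[j]; j += 1
            src, dst = dst, src                  # swap roles of the two arrays
            width *= 2
        return src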

Mergesort Analysis Let T(N) be the running time for an array of N elements Mergesort divides the array in half and calls itself on the two halves; after returning, it merges the two halves using a temporary array Each recursive call takes T(N/2) and merging takes O(N)

Mergesort Recurrence Relation The recurrence relation for T(N) is: T(1) ≤ c (base case: a 1-element array takes constant time) T(N) ≤ 2T(N/2) + dN (sorting N elements takes the time to sort the left half, plus the time to sort the right half, plus O(N) time to merge the two halves) T(N) = O(N log N)
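Unrolling the recurrence shows where the bound comes from:
T(N) ≤ 2T(N/2) + dN ≤ 4T(N/4) + 2dN ≤ ... ≤ 2^k T(N/2^k) + k·dN
With k = log2 N the first term is 2^k T(1) ≤ cN, so T(N) ≤ cN + dN log2 N = O(N log N).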

Properties of Mergesort Not in-place Requires an auxiliary array Very few comparisons Iterative Mergesort reduces copying.

Quicksort Quicksort uses a divide and conquer strategy, but does not require the O(N) extra space that MergeSort does Partition array into left and right sub-arrays the elements in left sub-array are all less than pivot elements in right sub-array are all greater than pivot Recursively sort left and right sub-arrays Concatenate left and right sub-arrays in O(1) time

“Four easy steps” To sort an array S: If the number of elements in S is 0 or 1, then return; the array is sorted. Pick an element v in S; this is the pivot value. Partition S - {v} into two disjoint subsets, S1 = {all values x ≤ v} and S2 = {all values x ≥ v}. Return QuickSort(S1), then v, then QuickSort(S2)
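A direct Python rendering of these four steps (a sketch, not the in-place version developed on the later slides; ties go to S1 here, and the pivot choice is deliberately naive):

    def quick_sort(s):
        """Out-of-place quicksort following the four steps above."""
        if len(s) <= 1:
            return s                       # step 1: 0 or 1 element is already sorted
        v = s[0]                           # step 2: pick a pivot (naively, the first element)
        s1 = [x for x in s[1:] if x <= v]  # step 3: partition S - {v}
        s2 = [x for x in s[1:] if x > v]
        return quick_sort(s1) + [v] + quick_sort(s2)  # step 4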

The steps of QuickSort
Start: S = 81 31 57 43 13 75 92 65 26
Select the pivot value (65) and partition S: S1 = 31 43 13 26 57, S2 = 81 92 75
QuickSort(S1) and QuickSort(S2): S1 = 13 26 31 43 57, S2 = 75 81 92
Then S = S1, pivot, S2 = 13 26 31 43 57 65 75 81 92. Presto! S is sorted. [Weiss]

Details, details “The algorithm so far lacks quite a few of the details” Picking the pivot want a value that will cause |S1| and |S2| to be non-zero, and close to equal in size if possible Implementing the actual partitioning Dealing with cases where the element equals the pivot

Alternative Pivot Rules Choose A[left] Fast, but too biased; enables the worst case Choose A[random], left < random < right Completely unbiased Will cause a relatively even split, but slow Median of three, A[left], A[right], A[(left+right)/2] The standard; tends to be unbiased, and does a little sorting on the side

Quicksort Partitioning Need to partition the array into left and right sub-arrays: the elements in the left sub-array are ≤ pivot, the elements in the right sub-array are ≥ pivot How do the elements get to the correct partition? Choose an element from the array as the pivot Make one pass through the rest of the array and swap as needed to put elements in partitions

Example
Indices: 0 1 2 3 4 5 6 7 8 9
Start:   8 1 4 9 0 3 5 2 7 6
Choose the pivot as the median of three: A[left] = 8, A[center] = 0, A[right] = 6, so the pivot is 6. Place the smallest of the three at the left and the largest at the right, and park the pivot next to last:
         0 1 4 9 7 3 5 2 6 8    (i starts at the left, j next to the parked pivot)

Partitioning is done In-Place One implementation (there are others): median3 finds the pivot and sorts A[left], A[center], A[right] Swap the pivot with the next-to-last element Set pointers i and j to the start and end of the array Increment i until you hit an element A[i] > pivot Decrement j until you hit an element A[j] < pivot Swap A[i] and A[j] Repeat until i and j cross Finally, swap the pivot (= A[N-2]) with A[i]
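A hedged sketch of this scheme in Python (0-based indices; assumes the range holds at least three elements, as the cutoff on the later slide guarantees; names are mine):

    def median3(a, left, right):
        """Sort a[left], a[center], a[right]; park the pivot at a[right-1]."""
        center = (left + right) // 2
        if a[center] < a[left]:  a[left], a[center] = a[center], a[left]
        if a[right] < a[left]:   a[left], a[right] = a[right], a[left]
        if a[right] < a[center]: a[center], a[right] = a[right], a[center]
        a[center], a[right - 1] = a[right - 1], a[center]   # hide pivot next to last
        return a[right - 1]

    def partition(a, left, right):
        """Partition a[left..right]; return the pivot's final index."""
        pivot = median3(a, left, right)
        i, j = left, right - 1
        while True:
            i += 1
            while a[i] < pivot: i += 1   # safe: the parked pivot acts as a sentinel
            j -= 1
            while a[j] > pivot: j -= 1   # safe: a[left] <= pivot acts as a sentinel
            if i >= j:
                break                    # i and j have crossed
            a[i], a[j] = a[j], a[i]
        a[i], a[right - 1] = a[right - 1], a[i]  # restore pivot between the halves
        return i

On the slide's array, partition([8,1,4,9,0,3,5,2,7,6], 0, 9) reproduces the traces on the next two slides and returns index 6.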

Example
Start:  0 1 4 9 7 3 5 2 6 8    (pivot 6 parked next to last)
Move i to the right until A[i] is larger than the pivot: i stops at the 9
Move j to the left until A[j] is smaller than the pivot: j stops at the 2
Swap:   0 1 4 2 7 3 5 9 6 8

Example
        0 1 4 2 7 3 5 9 6 8    i stops at the 7, j stops at the 5
Swap:   0 1 4 2 5 3 7 9 6 8    i stops at the 7, j stops at the 3: i and j have crossed
Swap the pivot with A[i]:
        0 1 4 2 5 3 6 9 7 8
        S1 < pivot | pivot | S2 > pivot

Recursive Quicksort
Quicksort(A[]: integer array, left, right : integer): {
  pivotindex : integer;
  if left + CUTOFF <= right then
    pivot := median3(A,left,right);
    pivotindex := Partition(A,left,right-1,pivot);
    Quicksort(A, left, pivotindex - 1);
    Quicksort(A, pivotindex + 1, right);
  else
    Insertionsort(A,left,right);
}
Don’t use quicksort for small arrays; CUTOFF = 10 is reasonable.

Quicksort Best Case Performance Algorithm always chooses best pivot and splits sub-arrays in half at each recursion T(0) = T(1) = O(1) constant time if 0 or 1 element For N > 1, 2 recursive calls plus linear time for partitioning T(N) = 2T(N/2) + O(N) Same recurrence relation as Mergesort T(N) = O(N log N)

Quicksort Worst Case Performance Algorithm always chooses the worst pivot: one sub-array is empty at each recursion T(N) ≤ a for N ≤ C T(N) ≤ T(N-1) + bN ≤ T(N-2) + b(N-1) + bN ≤ ... ≤ T(C) + b(C+1) + ... + bN ≤ a + b(C + (C+1) + (C+2) + ... + N) The sum is quadratic in N, so T(N) = O(N^2) Fortunately, average case performance is O(N log N) (see text for proof)

Properties of Quicksort No iterative version (without using a stack) Pure quicksort is not good for small arrays “In-place”, but uses auxiliary storage because of the recursive calls O(n log n) average case performance, but O(n^2) worst case performance

Folklore “Quicksort is the best in-memory sorting algorithm.” Mergesort and Quicksort make different tradeoffs regarding the cost of comparison and the cost of a swap

Features of Sorting Algorithms In-place Sorted items occupy the same space as the original items. (No copying required, only O(1) extra space if any.) Stable Items in input with the same value end up in the same order as when they began.
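To illustrate stability: Python's built-in sort is stable, so records with equal keys keep their input order (the tag field here just marks original positions):

    # Records: (key, tag). A stable sort keeps equal keys in input order.
    records = [(3, 'a'), (1, 'b'), (3, 'c'), (1, 'd')]
    print(sorted(records, key=lambda r: r[0]))
    # -> [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]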

How fast can we sort? Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time Can we do any better? No, if the basic action is a comparison.

Sorting Model Recall our basic assumption: we can only compare two elements at a time we can only reduce the possible solution space by half each time we make a comparison Suppose you are given N elements Assume no duplicates How many possible orderings can you get? Example: a, b, c (N = 3)

Permutations How many possible orderings can you get? Example: a, b, c (N = 3) (a b c), (a c b), (b a c), (b c a), (c a b), (c b a) 6 orderings = 3·2·1 = 3! (i.e., “3 factorial”) All the possible permutations of a set of 3 elements For N elements: N choices for the first position, (N-1) choices for the second position, ..., (2) choices, (1) choice N·(N-1)·(N-2)···(2)·(1) = N! possible orderings

Decision Tree [Tree diagram: the root holds all six orderings of a, b, c; each comparison (a vs b, a vs c, b vs c) splits the remaining orderings, until each leaf holds a single ordering.] The leaves contain all the possible orderings of a, b, c

Decision Trees A Decision Tree is a Binary Tree such that: Each node = a set of orderings, i.e., the remaining solution space Each edge = 1 comparison Each leaf = 1 unique ordering How many leaves for N distinct elements? N!, i.e., a leaf for each possible ordering Only 1 leaf has the ordering that is the desired correctly sorted arrangement

Decision Tree Example [The same tree, with the path for the actual input order highlighted: each comparison picks an edge, until the leaf with the one correct ordering is reached.]

Decision Trees and Sorting Every sorting algorithm corresponds to a decision tree Finds the correct leaf by choosing edges to follow, i.e., by making comparisons Each decision reduces the possible solution space by one half Run time ≥ maximum number of comparisons, and the maximum number of comparisons is the length of the longest path in the decision tree, i.e., the height of the tree

Lower bound on Height A binary tree of height h has at most how many leaves? L ≤ 2^h The decision tree has how many leaves: L = N! A binary tree with L leaves has height at least: h ≥ log2 L So the decision tree has height: h ≥ log2(N!)

log(N!) is (NlogN) select just the first N/2 terms each of the selected terms is  logN/2

Ω(N log N) Run time of any comparison-based sorting algorithm is Ω(N log N) Can we do better if we don’t use comparisons?

BucketSort (aka BinSort) If all values to be sorted are known to be between 1 and K, create an array count of size K, increment counts while traversing the input, and finally output the result. Example K = 5. Input = (5,1,3,4,3,2,1,1,5,4,5) count array: value 1 2 3 4 5, count 3 1 2 2 3 Running time to sort n items?
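A minimal BinSort sketch in Python, assuming integer keys in 1..K (the function name is illustrative):

    def bucket_sort(a, k):
        """Counting/bin sort for values in 1..k: O(n + k) time, O(k) extra space."""
        count = [0] * (k + 1)
        for x in a:                 # one pass over the input to count occurrences
            count[x] += 1
        out = []
        for v in range(1, k + 1):   # output each value count[v] times, in order
            out.extend([v] * count[v])
        return out

    # bucket_sort([5,1,3,4,3,2,1,1,5,4,5], 5) -> [1,1,1,2,3,3,4,4,5,5,5]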

BucketSort Complexity: O(n+K) Case 1: K is a constant BinSort is linear time Case 2: K is variable Not simply linear time Case 3: K is constant but large (e.g. 2^32) ??? Impractical! Linear time sounds great. How to fix???

Fixing impracticality: RadixSort Radix = “The base of a number system” We’ll use 10 for convenience, but could be anything Idea: BucketSort on each digit, least significant to most significant (lsd to msd)

Radix Sort Example (1st pass) Input data: 721 3 123 537 67 478 38 9 Bucket sort by 1’s digit: bucket 1: 721; bucket 3: 3, 123; bucket 7: 537, 67; bucket 8: 478, 38; bucket 9: 9 After 1st pass: 721 3 123 537 67 478 38 9 This example uses B=10 and base 10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.

Radix Sort Example (2nd pass) After 1st pass: 721 3 123 537 67 478 38 9 Bucket sort by 10’s digit: bucket 0: 03, 09; bucket 2: 721, 123; bucket 3: 537, 38; bucket 6: 67; bucket 7: 478 After 2nd pass: 3 9 721 123 537 38 67 478

Radix Sort Example (3rd pass) After 2nd pass: 3 9 721 123 537 38 67 478 Bucket sort by 100’s digit: bucket 0: 003, 009, 038, 067; bucket 1: 123; bucket 4: 478; bucket 5: 537; bucket 7: 721 After 3rd pass: 3 9 38 67 123 478 537 721 Invariant: after k passes the low-order k digits are sorted.
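Putting the passes together, a hedged LSD radix sort sketch in Python (base 10 to match the example; as the slides note, a real implementation would pick a larger base):

    def radix_sort(a, base=10):
        """LSD radix sort for non-negative integers.
        Invariant: after pass k, numbers are sorted by their low-order k digits."""
        if not a:
            return a
        passes = len(str(max(a)))                # number of digit positions
        for k in range(passes):
            buckets = [[] for _ in range(base)]
            for x in a:                          # stable bucket sort on digit k
                buckets[(x // base**k) % base].append(x)
            a = [x for b in buckets for x in b]  # read the buckets back in order
        return a

    # radix_sort([721, 3, 123, 537, 67, 478, 38, 9]) -> [3, 9, 38, 67, 123, 478, 537, 721]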

RadixSort: Your Turn Input: 126, 328, 636, 341, 416, 131, 328 BucketSort on the lsd (buckets 0-9), then BucketSort on the next-higher digit, then BucketSort on the msd.

Radixsort: Complexity How many passes? One per digit position, d How much work per pass? O(n + B) Total time? O(d(n + B)) Conclusion? In practice RadixSort is only good for a large number of elements with relatively small values Hard on the cache compared to MergeSort/QuickSort

Summary of sorting Sorting choices: O(N^2) – Bubblesort, Insertion Sort O(N log N) average case running time: Heapsort: In-place, not stable Mergesort: O(N) extra space, stable Quicksort: claimed fastest in practice, but O(N^2) worst case; needs extra storage for recursion; not stable O(N) – Radix Sort: fast and stable; not comparison based; not in-place