CSE 326: Data Structures: Sorting

CSE 326: Data Structures: Sorting Lecture 14: Friday, Feb 7, 2003

QuickSort
Pick a “pivot”. Divide the list into two lists:
one with the elements less than or equal to the pivot value,
one with the elements greater than the pivot.
Sort each sub-problem recursively.
The answer is the concatenation of the two solutions.

QuickSort: Array-Based Version
Pick pivot: 7                                                      7 2 8 3 5 9 6
Partition with cursors, < scanning from the left, > from the right:  7 2 8 3 5 9 6
2 goes to less-than; the < cursor advances:                        7 2 8 3 5 9 6

QuickSort Partition (cont’d)
6 and 8 are on the wrong sides; swap the less-than/greater-than pair:   7 2 6 3 5 9 8
3 and 5 go to less-than, 9 to greater-than; the cursors cross:          7 2 6 3 5 9 8
Partition done.                                                         7 2 6 3 5 9 8

QuickSort Partition (cont’d)
Put the pivot into its final position:   5 2 6 3 7 9 8
Recursively sort each side:              2 3 5 6 7 8 9

QuickSort

procedure quickSort(Array A, int N) {
  quickSortRecursive(A, 0, N-1);
}

procedure quickSortRecursive(Array A, int left, int right) {
  if (left >= right) return;   // 0 or 1 elements: already sorted
  int pivot = choosePivot(A, left, right);
  /* partition A s.t.:
     A[left], A[left+1], …, A[i]    ≤ pivot
     A[i+1],  A[i+2],  …, A[right]  ≥ pivot */
  quickSortRecursive(A, left, i);
  quickSortRecursive(A, i+1, right);
}

QuickSort

/* partition A s.t.:
   A[left], A[left+1], …, A[i]    ≤ pivot
   A[i+1],  A[i+2],  …, A[right]  ≥ pivot */

/* TO DO IN CLASS IN THE NEXT 5 MINUTES.
   CONSTRAINT: TIME = O(right-left+1) */

QuickSort: The Partition
My code (neither better nor worse than yours):

i = left; j = right;
repeat {
  while (i<j && A[i] <= pivot) i++;
  while (j>i && A[j] >= pivot) j--;
  if (i<j) swap(A[i], A[j]);
  else break;
}
quickSortRecursive(A, left, i);
quickSortRecursive(A, i+1, right);

Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot.

QuickSort: The Partition
Running time: T = O(right-left+1). Why?
Clever optimization: get rid of the tests i<j and j>i!

QuickSort: The Partition

i = left; j = right;
repeat {
  while (A[i] < pivot) i++;
  while (A[j] > pivot) j--;
  if (i<j) { swap(A[i], A[j]); i++; j--; }
  else break;
}

The bounds tests are unnecessary because a sentinel exists: some A[k] ≥ pivot stops the i scan, and some A[k] ≤ pivot stops the j scan (the pivot itself qualifies for both).
Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot.
Why do we need i++, j--? Because otherwise, if A[i] = A[j] = pivot, neither scan advances and we loop forever.

QuickSort: The Partition
At the end: A[left] … A[j] ≤ pivot and A[i] … A[right] ≥ pivot.
Q: What about the elements in between?
A: They are = pivot! All elements A[j+1], …, A[i-1] are = pivot, so they are already in place.

quickSortRecursive(A, left, j);
quickSortRecursive(A, i, right);

QuickSort: The Partition
Variation (this is like in the book, more or less):

i = left-1; j = right+1;
repeat {
  repeat i++; until (A[i] ≥ pivot);
  repeat j--; until (A[j] ≤ pivot);
  if (i<j) swap(A[i], A[j]);
  else break;
}
quickSortRecursive(A, left, j);
quickSortRecursive(A, j+1, right);

Again a sentinel A[k] ≥ pivot (and one ≤ pivot) keeps each scan in bounds.
Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot.
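
For concreteness, here is a runnable C sketch of this variation, assuming integer keys and the simplistic choice pivot = A[left] (the next slides discuss better pivot choices):

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static void quickSortRecursive(int A[], int left, int right) {
    if (left >= right) return;            /* 0 or 1 elements: done */
    int pivot = A[left];                  /* simplistic pivot choice */
    int i = left - 1, j = right + 1;
    for (;;) {
        do { i++; } while (A[i] < pivot); /* pivot acts as sentinel */
        do { j--; } while (A[j] > pivot);
        if (i >= j) break;
        swap(&A[i], &A[j]);
    }
    quickSortRecursive(A, left, j);       /* A[left..j]    <= pivot */
    quickSortRecursive(A, j + 1, right);  /* A[j+1..right] >= pivot */
}

int main(void) {
    int A[] = {7, 2, 8, 3, 5, 9, 6};      /* the array from the earlier slides */
    int N = sizeof A / sizeof A[0];
    quickSortRecursive(A, 0, N - 1);
    for (int k = 0; k < N; k++) printf("%d ", A[k]);
    printf("\n");
    return 0;
}

Recursing on (left, j) and (j+1, right) guarantees progress even when many elements equal the pivot; on the example array this prints 2 3 5 6 7 8 9.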

Analyzing QuickSort
Picking pivot: constant time (will discuss later).
Partitioning: linear time.
Recursion: suppose there are i elements ≤ pivot:
T(1) = b
T(N) = T(i) + T(N-i) + cN
Can’t solve this yet: it depends on i.

QuickSort Worst Case
Pivot is always the smallest element, so i=1:
T(N) = T(i) + T(N-i) + cN
T(N) = T(N-1) + cN + b
     = T(N-2) + cN + c(N-1) + b + b
     = T(N-3) + cN + c(N-1) + c(N-2) + b + b + b
     = . . .
     = cN + c(N-1) + . . . + c·2 + c·1 + b + b + . . . + b
     = O(N²)

QuickSort Best Case
Pivot is always the median:
T(N) = T(i) + T(N-i) + cN
T(N) = 2T(N/2) + cN
     = 4T(N/4) + cN + cN
     = 8T(N/8) + cN + cN + cN
     = . . .
     = 2^(log N) T(1) + cN log N
T(N) = O(N log N)

Choosing the Right Pivot
pivot = A[left]? I don’t recommend it. Why? It leads to N² time on already-sorted arrays.
Randomly chosen pivot? Very good, but the random number generator is slow.
“Median-of-3” rule: pivot = median(A[left], A[middle], A[last]). Computing the median of three is a bit messy; read the book.
Much easier: “average-of-3”, pivot = (A[left] + A[middle] + A[last])/3. But it’s a bad idea. Why? The average doesn’t work for strings!
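
The lecture does not show the internals of choosePivot; one plausible C sketch of the median-of-3 rule (medianOf3 is my name, not the lecture's) is:

/* Return the middle value of A[left], A[mid], A[right]. */
static int medianOf3(const int A[], int left, int right) {
    int mid = left + (right - left) / 2;
    int a = A[left], b = A[mid], c = A[right];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}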

QuickSort Average Case
Suppose the pivot is picked at random. All of the following cases are equally likely:
Pivot is the smallest value in the list: i=1
Pivot is the 2nd smallest value in the list: i=2
Pivot is the 3rd smallest value in the list: i=3
…
Pivot is the largest value in the list: i=N-1
The same is true if the pivot is, e.g., always the first element but the input itself is perfectly random.

QuickSort Avg Case, cont.
Expected running time:
T(N) = 1/N (T(1)+T(N-1) + T(2)+T(N-2) + … + T(N-1)+T(1)) + cN
     = 2/N (T(1) + T(2) + … + T(N-1)) + cN
N T(N) = 2T(1) + 2T(2) + . . . + 2T(N-2) + 2T(N-1) + cN²
(N-1) T(N-1) = 2T(1) + 2T(2) + . . . + 2T(N-2) + c(N-1)²
Subtracting:
N T(N) - (N-1) T(N-1) = 2T(N-1) + 2cN - c
N T(N) = (N+1) T(N-1) + 2cN - c
T(N)/(N+1) = T(N-1)/N + 2c/(N+1) - c/(N(N+1))
T(N)/(N+1) = T(0)/1 + 2c(1/(N+1) + 1/N + … + 1/2) - c(1/(N(N+1)) + … + 1/(1·2)) = O(log N)
(The sums are harmonic series, hence O(log N).)
T(N) = O(N log N)

Detour: Computing the Median
The median of A[1], A[2], …, A[N] is some A[k] such that:
there exist N/2 elements ≤ A[k], and
there exist N/2 elements ≥ A[k].
Think of it as the perfect pivot!
Very important in applications:
median income vs. average income,
median grade vs. average grade.
To compute: sort A[1], …, A[N], then median = A[N/2]. Time O(N log N).
Can we do it in O(N) time?

Detour: Computing the Median

int medianRecursive(Array A, int left, int right) {
  if (left == right) return A[left];
  . . . partition . . .
  if (N/2 ≤ j) return medianRecursive(A, left, j);
  if (N/2 ≥ i) return medianRecursive(A, i, right);
  return pivot;
}

int median(Array A, int N) { return medianRecursive(A, 0, N-1); }

Why does this work? After partitioning, A[left] … A[j] ≤ pivot and A[i] … A[right] ≥ pivot, so only one side can contain the median; if neither side does, the median equals the pivot.
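
A runnable C version of this quickselect idea, using the Hoare-style partition from the earlier sketch; k is the 0-based rank being searched for, and all names here are mine:

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static int selectRecursive(int A[], int left, int right, int k) {
    if (left == right) return A[left];
    int pivot = A[left];
    int i = left - 1, j = right + 1;
    for (;;) {                              /* same partition loop as before */
        do { i++; } while (A[i] < pivot);
        do { j--; } while (A[j] > pivot);
        if (i >= j) break;
        swap(&A[i], &A[j]);
    }
    if (k <= j) return selectRecursive(A, left, j, k);  /* rank k is on the left */
    return selectRecursive(A, j + 1, right, k);         /* rank k is on the right */
}

int median(int A[], int N) { return selectRecursive(A, 0, N - 1, N / 2); }

Unlike quicksort, only one side is ever recursed into, which is what makes the average case linear.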

Detour: Computing the Median
Best case running time:
T(N) = T(N/2) + cN
     = T(N/4) + cN(1 + 1/2)
     = T(N/8) + cN(1 + 1/2 + 1/4)
     = . . .
     = T(1) + cN(1 + 1/2 + 1/4 + … + 1/2^k)
     = O(N)
Worst case = O(N²). Average case = O(N).
Question: how can you compute the median in O(N) worst-case time?
Note: it’s tricky. Choose the pivot as follows. For each group of five elements, compute the median:
M[0] = median(A[0], A[1], …, A[4])
M[1] = median(A[5], A[6], …, A[9])
…
M[N/5-1] = median(A[N-5], …, A[N-1])
Now do: pivot = medianRecursive(M), then continue as before.
The trick is that at least 3N/10 elements are ≥ the pivot (and, similarly, at least 3N/10 are ≤ the pivot). Hence, when the algorithm recurses on one side of the partition, the input array has size at most 7N/10. Putting it together:
T(N) = T(N/5) + T(7N/10) + cN
The key here is that 1/5 + 7/10 < 1, so the recursive calls are on problem instances whose combined size is a constant fraction of the original. This observation, exploited correctly in the math, gives T(N) = O(N).
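
A quick substitution check (my addition, not on the slide) that this recurrence really solves to O(N): guess T(n) ≤ dn for all n < N, then
T(N) ≤ d(N/5) + d(7N/10) + cN = (9/10)dN + cN ≤ dN whenever d ≥ 10c.
The per-level work shrinks geometrically, so T(N) = O(N).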

Back to Sorting
Naïve sorting algorithms: bubble sort, insertion sort, selection sort. Time = O(N²).
Clever sorting algorithms: merge sort, heap sort, quick sort. Time = O(N log N).
I want to sort in O(N)! Is this possible?

Could We Do Better?
For any possible correct sorting-by-comparison algorithm, what is the lowest worst-case time?
Imagine how the comparisons that would be performed by the best possible sorting algorithm form a decision tree…
The worst-case running time cannot be less than the depth of this tree!

Decision tree to sort list A,B,C
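
The slide's figure is not reproduced in the transcript; one such decision tree (each internal node is a comparison, each of the 3! = 6 leaves a fully determined order) looks like this:

                    A < B?
                yes/      \no
              B < C?      A < C?
            yes/   \no  yes/   \no
            ABC   A<C?  BAC   B<C?
               yes/  \no    yes/  \no
               ACB  CAB     BCA  CBA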

Max depth of the decision tree
How many permutations are there of N numbers?
How many leaves does the tree have?
What’s the shallowest tree with a given number of leaves?
What is, therefore, the worst-case number of comparisons performed by the best possible sorting algorithm?

Max depth of the decision tree
How many permutations are there of N numbers? N!
How many leaves does the tree have? N!
What’s the shallowest tree with a given number of leaves? Depth log(number of leaves), here log(N!).
What is, therefore, the worst-case number of comparisons performed by the best possible sorting algorithm? log(N!)

Stirling’s approximation:
N! ≈ sqrt(2πN) (N/e)^N

Stirling’s Approximation Redux
Taking logs: log(N!) = N log N - N log e + O(log N) = Ω(N log N).
So any comparison-based sorting algorithm performs Ω(N log N) comparisons in the worst case.
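
A tiny C program (my addition) that checks numerically how close log2(N!) is to N log2 N:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* lgamma(N+1) = ln(N!), so divide by ln 2 to get log2(N!) */
    for (int N = 4; N <= 4096; N *= 4) {
        double log2Fact = lgamma(N + 1.0) / log(2.0);
        printf("N=%5d  log2(N!)=%10.1f  N*log2(N)=%10.1f\n",
               N, log2Fact, N * log2((double)N));
    }
    return 0;
}

Compile with -lm; the two columns grow at the same rate, as Stirling predicts.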

Why is QuickSort Faster than Merge Sort?
Quicksort typically performs more comparisons than Mergesort, because partitions are not always perfectly balanced:
Mergesort: n log n comparisons
Quicksort: 1.38 n log n comparisons on average
Quicksort performs many fewer copies, because on average half of the elements are already on the correct side of the partition, while Mergesort copies every element when merging:
Mergesort: 2n log n copies (using a “temp array”), n log n copies (using an “alternating array”)
Quicksort: (n/2) log n copies on average

Bucket Sort
Now let’s sort in O(N).
Assume: A[0], A[1], …, A[N-1] ∈ {0, 1, …, M-1}, where M is not too big.

Bucket Sort

Queue bucketSort(Array A, int N) {
  for k = 0 to M-1
    Q[k] = new Queue;
  for j = 0 to N-1
    Q[A[j]].enqueue(A[j]);
  Result = new Queue;
  for k = 0 to M-1
    Result = Result.append(Q[k]);
  return Result;
}
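
A runnable C sketch of the same idea for plain integer keys (my simplification, not the lecture's code: since the keys carry no extra data, each queue collapses to a counter):

#include <stdlib.h>

void bucketSort(int A[], int N, int M) {
    /* one bucket per possible key value 0..M-1 */
    int *count = calloc(M, sizeof *count);
    for (int j = 0; j < N; j++)
        count[A[j]]++;                 /* "enqueue" A[j] into its bucket */
    int out = 0;
    for (int k = 0; k < M; k++)        /* concatenate the buckets */
        while (count[k]-- > 0)
            A[out++] = k;
    free(count);
}

The queue-based version still matters for radix sort below: queues keep equal keys in their original order (stability), which bare counters do not preserve once records carry more than the key.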

Bucket Sort
Running time: O(M+N).
What about the theorem that says sorting takes Ω(N log N)?
Bucket sort does not rely on comparisons, so the lower bound does not apply. But it only works in a very restrictive setting.

Radix Sort
I still want to sort in time O(N).
Now A[0], A[1], …, A[N-1] are strings (very common in practice).
Each string is c[d-1] c[d-2] … c[1] c[0], where c[0], c[1], …, c[d-1] ∈ {0, 1, …, M-1}, e.g. M = 128.

Radix Sort
What is the size of the input? Each of A[0], …, A[N-1] has d characters, so Size = dN.

Radix Sort

radixSort(Array A, int N) {
  for k = 0 to d-1
    A = bucketSort(A, on character position k)   // least significant position first
}

Running time: T = O(d(M+N)) = O(dN) = O(Size)
This works because bucket sort is stable: it never reorders elements that agree on the current character position.
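
A runnable C sketch of this loop for fixed-length strings, with the per-position bucket sort done as a stable counting pass (assumptions: d-character strings over M = 128 possible characters; the names are mine):

#include <stdlib.h>
#include <string.h>

void radixSort(char **A, int N, int d) {
    enum { M = 128 };
    char **tmp = malloc(N * sizeof *tmp);
    for (int k = d - 1; k >= 0; k--) {     /* rightmost (least significant) first */
        int count[M + 1] = {0};
        for (int j = 0; j < N; j++)
            count[(unsigned char)A[j][k] + 1]++;
        for (int c = 0; c < M; c++)        /* prefix sums: start of each bucket */
            count[c + 1] += count[c];
        for (int j = 0; j < N; j++)        /* stable distribution into tmp */
            tmp[count[(unsigned char)A[j][k]]++] = A[j];
        memcpy(A, tmp, N * sizeof *tmp);
    }
    free(tmp);
}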

Radix Sort
Variable-length strings: improve Radix Sort so that it still sorts in time O(Size)!

Radix Sort
Suppose we want to sort N distinct numbers. Represent them in decimal: we need d = log N digits.
Hence RadixSort takes time O(dN) = O(N log N).
No conflict with the theory here!