CSE 326: Data Structures: Sorting
Lecture 14: Friday, Feb 7, 2003
QuickSort
Pick a “pivot”. Divide the list into two lists:
- One less-than-or-equal-to the pivot value
- One greater than the pivot
Sort each sub-problem recursively.
The answer is the concatenation of the two solutions.
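Below is a minimal Python sketch of this list-based description. It assumes the pivot is the first element. The pivot is set aside and placed between the two sorted halves, so each recursive call works on a strictly shorter list even when every element is ≤ the pivot. The name `quicksort_list` is illustrative.

```python
def quicksort_list(xs):
    """Return a sorted copy of xs: pick a pivot, split, sort each part, concatenate."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]                  # assumption: first element is the pivot
    smaller = [x for x in rest if x <= pivot]    # less-than-or-equal-to the pivot
    larger = [x for x in rest if x > pivot]      # greater than the pivot
    return quicksort_list(smaller) + [pivot] + quicksort_list(larger)


print(quicksort_list([7, 2, 8, 3, 5, 9, 6]))  # [2, 3, 5, 6, 7, 8, 9]
```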
QuickSort: Array-Based Version
Pick pivot: 7
Partition with cursors < and >: 7 2 8 3 5 9 6
2 goes to less-than: 7 2 8 3 5 9 6
QuickSort Partition (cont’d)
6, 8 swap less/greater-than: 7 2 6 3 5 9 8
3, 5 less-than; 9 greater-than: 7 2 6 3 5 9 8
Partition done: 7 2 6 3 5 9 8
QuickSort Partition (cont’d)
Put pivot into final position: 5 2 6 3 7 9 8
Recursively sort each side: 2 3 5 6 7 8 9
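The walk-through above can be sketched in Python. As in the example, the sketch assumes the pivot is the first element `a[left]` and that it is swapped into its final position once the cursors cross. The names `partition_first` and `quicksort` are illustrative and do not come from the lecture.

```python
def partition_first(a, left, right):
    """Partition a[left..right] around pivot a[left]; return the pivot's final index."""
    pivot = a[left]
    i, j = left + 1, right               # the "<" and ">" cursors
    while True:
        while i <= j and a[i] <= pivot:  # skip elements that already belong on the left
            i += 1
        while i <= j and a[j] > pivot:   # skip elements that already belong on the right
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]      # both are on the wrong side: swap them
        else:
            break
    a[left], a[j] = a[j], a[left]        # put the pivot into its final position
    return j


def quicksort(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    p = partition_first(a, left, right)
    quicksort(a, left, p - 1)
    quicksort(a, p + 1, right)


a = [7, 2, 8, 3, 5, 9, 6]
print(partition_first(a, 0, len(a) - 1), a)  # 4 [5, 2, 6, 3, 7, 9, 8]
quicksort(a)
print(a)  # [2, 3, 5, 6, 7, 8, 9]
```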
QuickSort
procedure quickSort(Array A, int N) {
  quickSortRecursive(A, 0, N-1);
}
procedure quickSortRecursive(Array A, int left, int right) {
  if (left >= right) return;
  int pivot = choosePivot(A, left, right);
  /* partition A s.t.:
     A[left], A[left+1], …, A[i] ≤ pivot
     A[i+1], A[i+2], …, A[right] ≥ pivot */
  quickSortRecursive(A, left, i);
  quickSortRecursive(A, i+1, right);
}
QuickSort
/* partition A s.t.:
   A[left], A[left+1], …, A[i] ≤ pivot
   A[i+1], A[i+2], …, A[right] ≥ pivot */
/* TO DO IN CLASS IN THE NEXT 5 MINUTES.
   CONSTRAINT: TIME = O(right-left+1) */
QuickSort: The Partition
My code (no better or worse than yours):
i = left; j = right;
repeat {
  while (i<j && A[i] <= pivot) i++;
  while (j>i && A[j] >= pivot) j--;
  if (i<j) swap(A[i], A[j]);
  else break;
}
if (A[i] > pivot) i--;   /* now A[left..i] ≤ pivot, as the partition spec requires */
quickSortRecursive(A, left, i);
quickSortRecursive(A, i+1, right);
Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot.
QuickSort: The Partition
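Running time: T = O(right-left+1). Why? i only moves right and j only moves left, and they stop when they meet, so the loops take at most right-left+1 steps in total.
Clever optimization: get rid of the tests i<j and j>i!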
QuickSort: The Partition
i = left; j = right;
while (i <= j) {
  while (A[i] < pivot) i++;
  while (A[j] > pivot) j--;
  if (i <= j) {swap(A[i], A[j]); i++; j--;}
}
Why do we need i++, j--?
There exists a sentinel: some A[k] ≥ pivot at or after i, and some A[k] ≤ pivot at or before j, so the inner loops stop without bounds tests.
Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot.
We need i++, j-- because otherwise, if A[i] = A[j] = pivot, we loop forever.
QuickSort: The Partition
At the end: A[left] … A[j] ≤ pivot and A[i] … A[right] ≥ pivot.
Q: What are the elements in between?
A: They are = pivot! All elements A[j+1], …, A[i-1] are = pivot!
quickSortRecursive(A, left, j);
quickSortRecursive(A, i, right);
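Here is a Python sketch of the sentinel-based partition together with the two recursive calls. The pivot value is assumed to be the middle element `a[(left + right) // 2]`, which stands in for `choosePivot`. The outer `while i <= j` test remains, but the inner loops rely only on sentinels. When both cursors stop on the same element (which must then equal the pivot), swapping it with itself and stepping past it keeps both recursive calls strictly smaller.

```python
def quicksort_sentinel(a, left=0, right=None):
    """In-place quicksort whose inner loops have no i<j / j>i tests."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    pivot = a[(left + right) // 2]  # assumption: middle element as the pivot value
    i, j = left, right
    while i <= j:
        while a[i] < pivot:   # stops at an element >= pivot (a sentinel always exists)
            i += 1
        while a[j] > pivot:   # stops at an element <= pivot
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1            # step past the swapped pair so that keys equal
            j -= 1            # to the pivot cannot make the loop spin forever
    # Now a[left..j] <= pivot, a[j+1..i-1] == pivot, a[i..right] >= pivot.
    quicksort_sentinel(a, left, j)
    quicksort_sentinel(a, i, right)


data = [7, 2, 8, 3, 5, 9, 6]
quicksort_sentinel(data)
print(data)  # [2, 3, 5, 6, 7, 8, 9]
```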
QuickSort: The Partition
Variation (this is like in the book, more or less):
i = left-1; j = right+1;
repeat {
  repeat i++; until (A[i] ≥ pivot);
  repeat j--; until (A[j] ≤ pivot);
  if (i<j) swap(A[i], A[j]);
  else break;
}
quickSortRecursive(A, left, j);
quickSortRecursive(A, j+1, right);
There exists A[k] ≥ pivot (and an A[k] ≤ pivot), so the inner loops need no bounds tests.
Invariant: A[left] … A[i-1] ≤ pivot and A[j+1] … A[right] ≥ pivot. At the end, A[left] … A[j] ≤ pivot ≤ A[j+1] … A[right].
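A Python sketch of the repeat/until variation (Hoare's original partition scheme) follows. It again assumes the middle element is the pivot. The function returns the final `j`, and the recursion splits the range into `a[left..j]` and `a[j+1..right]`, the two parts this partition guarantees. With this pivot choice `j` always ends strictly below `right`, so both calls shrink.

```python
def hoare_partition(a, left, right):
    """Return j such that a[left..j] <= pivot <= a[j+1..right]."""
    pivot = a[(left + right) // 2]  # assumption: middle element as the pivot value
    i, j = left - 1, right + 1
    while True:
        i += 1
        while a[i] < pivot:      # repeat i++ until a[i] >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:      # repeat j-- until a[j] <= pivot
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort_hoare(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    p = hoare_partition(a, left, right)
    quicksort_hoare(a, left, p)
    quicksort_hoare(a, p + 1, right)


data = [7, 2, 8, 3, 5, 9, 6]
quicksort_hoare(data)
print(data)  # [2, 3, 5, 6, 7, 8, 9]
```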
Analyzing QuickSort
Picking pivot: constant time (will discuss later)
Partitioning: linear time
Recursion: suppose there are i elements ≤ pivot:
$T(1) = b$
$T(N) = T(i) + T(N-i) + cN$
Can’t solve: it depends on i.
QuickSort Worst Case
Pivot is always the smallest element, so i = 1:
$$
\begin{aligned}
T(N) &= T(1) + T(N-1) + cN = T(N-1) + cN + b \\
     &= T(N-2) + cN + c(N-1) + b + b \\
     &= T(N-3) + cN + c(N-1) + c(N-2) + b + b + b \\
     &= \cdots \\
     &= cN + c(N-1) + \cdots + c\cdot 2 + c\cdot 1 + b + b + \cdots + b = O(N^2)
\end{aligned}
$$
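The Python sketch below makes the quadratic behavior concrete. It counts pivot comparisons for a simple quicksort that always uses the first element as pivot, run on already sorted input, where that pivot is always the smallest element. It charges one comparison per non-pivot element in each partition step. That charging rule is an assumption of this sketch, and under it the total is (N-1) + (N-2) + ⋯ + 1 = N(N-1)/2. An explicit stack replaces recursion because the recursion depth on sorted input is about N.

```python
def count_comparisons(a):
    """Comparisons charged by a first-element-pivot quicksort on the sequence a."""
    comparisons = 0
    stack = [list(a)]
    while stack:
        xs = stack.pop()
        if len(xs) <= 1:
            continue
        pivot, rest = xs[0], xs[1:]
        comparisons += len(rest)          # one comparison per non-pivot element
        stack.append([x for x in rest if x <= pivot])
        stack.append([x for x in rest if x > pivot])
    return comparisons


for n in (10, 100, 1000):
    print(n, count_comparisons(range(n)), n * (n - 1) // 2)
# 10 45 45
# 100 4950 4950
# 1000 499500 499500
```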
QuickSort Best Case
Pivot is always the median, so i = N/2:
$$
\begin{aligned}
T(N) &= T(i) + T(N-i) + cN \\
T(N) &= 2T(N/2) + cN \\
T(N) &= 4T(N/4) + cN + cN \\
T(N) &= 8T(N/8) + cN + cN + cN \\
     &\;\;\vdots \\
T(N) &= 2^{\log N}\,T(1) + cN\log N \\
T(N) &= O(N \log N)
\end{aligned}
$$
Choosing the Right Pivot
- pivot = A[left]: I don’t recommend that. Why?
- Randomly choose pivot: very good, but the random number generator is slow.
- “Median-of-3” rule: pivot = Median(A[left], A[middle], A[right]). Computing the median: a bit messy, read the book.
- Much easier: “average-of-3”: pivot = (A[left] + A[middle] + A[right])/3. But it’s a bad idea. Why??
Answers:
- pivot = A[left] leads to N² time on already sorted arrays.
- Average: doesn’t work for strings!
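Here is one way to compute the median-of-3 with at most three comparisons, as a Python sketch. It returns the index of the median of `a[left]`, `a[middle]`, `a[right]`, where `middle = (left + right) // 2`. That rounding is an assumption, since the slide does not fix how the middle is chosen. Because it only compares elements, it also works for strings, unlike average-of-3.

```python
def median_of_three(a, left, right):
    """Return the index of the median of a[left], a[middle], a[right]."""
    mid = (left + right) // 2
    x, y, z = a[left], a[mid], a[right]
    if x <= y:
        if y <= z:
            return mid                     # x <= y <= z
        return right if x <= z else left  # y is the largest: median is max(x, z)
    # here y < x
    if x <= z:
        return left                        # y < x <= z
    return right if y <= z else mid        # x is the largest: median is max(y, z)


print(median_of_three([7, 2, 8, 3, 5, 9, 6], 0, 6))  # 6 (a[6] = 6 is the median of 7, 3, 6)
```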
QuickSort Average Case
Suppose the pivot is picked at random. All the following cases are equally likely:
- Pivot is the smallest value in the list: i = 1
- Pivot is the 2nd smallest value in the list: i = 2
- Pivot is the 3rd smallest value in the list: i = 3
- …
- Pivot is the largest value in the list: i = N-1
The same is true if the pivot is, e.g., always the first element, but the input itself is a uniformly random permutation.
QuickSort Avg Case, cont.
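Expected running time:
$$
\begin{aligned}
T(N) &= \tfrac{1}{N}\bigl(T(1)+T(N-1) + T(2)+T(N-2) + \cdots + T(N-1)+T(1)\bigr) + cN \\
     &= \tfrac{2}{N}\bigl(T(1) + T(2) + \cdots + T(N-1)\bigr) + cN \\
N\,T(N) &= 2T(1) + 2T(2) + \cdots + 2T(N-2) + 2T(N-1) + cN^2 \\
(N-1)\,T(N-1) &= 2T(1) + 2T(2) + \cdots + 2T(N-2) + c(N-1)^2 \\
N\,T(N) - (N-1)\,T(N-1) &= 2T(N-1) + 2cN - c \\
N\,T(N) &= (N+1)\,T(N-1) + 2cN - c \\
\frac{T(N)}{N+1} &= \frac{T(N-1)}{N} + \frac{2c}{N+1} - \frac{c}{N(N+1)} \\
\frac{T(N)}{N+1} &= \frac{T(2)}{3} + 2c\Bigl(\frac{1}{N+1} + \frac{1}{N} + \cdots + \frac{1}{4}\Bigr) - c\Bigl(\frac{1}{N(N+1)} + \cdots + \frac{1}{3\cdot 4}\Bigr) = O(\log N) \\
T(N) &= O(N \log N)
\end{aligned}
$$
The last step uses the fact that the harmonic sum 1 + 1/2 + ⋯ + 1/N is O(log N).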
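The recurrence can also be checked numerically. The Python sketch below uses a slightly different cost model from the slide, stated here as an assumption: a partition of N elements costs exactly N-1 comparisons, and the pivot is excluded from both sides, so C(N) = (N-1) + (2/N)(C(0) + ⋯ + C(N-1)). With exact fractions, this recurrence matches the known closed form C(N) = 2(N+1)H_N - 4N, where H_N = 1 + 1/2 + ⋯ + 1/N. Since H_N ≈ ln N, C(N) grows like 2N ln N ≈ 1.39 N log₂ N.

```python
import math
from fractions import Fraction


def expected_comparisons(n_max):
    """Exact C[n] = (n-1) + (2/n) * (C[0] + ... + C[n-1]) for n = 0 .. n_max."""
    C = [Fraction(0)] * (n_max + 1)
    prefix = Fraction(0)              # running sum C[0] + ... + C[n-1]
    for n in range(1, n_max + 1):
        prefix += C[n - 1]
        C[n] = (n - 1) + Fraction(2, n) * prefix
    return C


C = expected_comparisons(1000)
for n in (10, 100, 1000):
    print(n, float(C[n]), float(C[n]) / (n * math.log2(n)))
```

The printed ratio C(N)/(N log₂ N) approaches 2 ln 2 ≈ 1.386 only slowly, because of the lower-order -4N term.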
Detour: Computing the Median
The median of A[1], A[2], …, A[N] is some A[k] s.t.:
- There exist at least N/2 elements ≤ A[k]
- There exist at least N/2 elements ≥ A[k]
Think of it as the perfect pivot!
Very important in applications:
- Median income vs. average income
- Median grade vs. average grade
To compute: sort A[1], …, A[N], then median = A[N/2]. Time O(N log N).
Can we do it in O(N) time?
Detour: Computing the Median
int medianRecursive(Array A, int left, int right) {
  if (left == right) return A[left];
  . . . Partition . . .
  if (N/2 ≤ j) return medianRecursive(A, left, j);
  if (N/2 ≥ i) return medianRecursive(A, i, right);
  return pivot;
}
int median(Array A, int N) {
  return medianRecursive(A, 0, N-1);
}
Why? After partitioning: A[left] … A[j] ≤ pivot and A[i] … A[right] ≥ pivot.
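Below is a Python sketch of this selection idea, often called quickselect. It differs from the slide in a few ways. It is written as a loop instead of recursion, and it takes the target position `k` as a parameter instead of fixing it at N/2. It also assumes the middle element as pivot, combined with the sentinel partition from the earlier slides. `select` and `median` are illustrative names.

```python
def select(a, k, left=0, right=None):
    """Return the value that would sit at index k if a were sorted (reorders a in place)."""
    if right is None:
        right = len(a) - 1
    while left < right:
        pivot = a[(left + right) // 2]  # assumption: middle element as the pivot value
        i, j = left, right
        while i <= j:                   # same two-cursor partition as in quicksort
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # a[left..j] <= pivot, a[j+1..i-1] == pivot, a[i..right] >= pivot
        if k <= j:
            right = j                   # the answer lies in the left part
        elif k >= i:
            left = i                    # the answer lies in the right part
        else:
            return a[k]                 # k is in the block of elements equal to the pivot
    return a[k]


def median(a):
    return select(list(a), len(a) // 2)


print(median([7, 2, 8, 3, 5, 9, 6]))  # 6
```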
Detour: Computing the Median
Best case running time:
$$
\begin{aligned}
T(N) &= T(N/2) + cN \\
     &= T(N/4) + cN\,(1 + 1/2) \\
     &= T(N/8) + cN\,(1 + 1/2 + 1/4) \\
     &= \cdots \\
     &= T(1) + cN\,(1 + 1/2 + 1/4 + \cdots + 1/2^k) = O(N)
\end{aligned}
$$
Worst case = O(N²)
Average case = O(N)
Question: how can you compute the median in O(N) worst-case time?
Note: it’s tricky. Choose the pivot as follows. For each group of five elements, compute the median:
M[0] = median(A[0], A[1], …, A[4])
M[1] = median(A[5], A[6], …, A[9])
…
M[N/5-1] = median(…)
Now do: pivot = medianRecursive(M), where medianRecursive itself uses this same pivot rule. Then continue as before.
The trick is that at least half of the N/5 group medians are ≥ the pivot, and each of those has at least 3 elements in its group (counting itself) that are ≥ it. So at least 3N/10 elements are ≥ the pivot, and similarly at least 3N/10 are ≤ the pivot. Hence, when medianRecursive is called on one side, its input has size at most 7N/10.
Putting it together:
$$T(N) = T(N/5) + T(7N/10) + cN$$
The key here is that 1/5 + 7/10 < 1, so the two recursive calls together work on fewer elements than the original problem. Used correctly in the math, this observation gives T(N) = O(N).
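The following is a Python sketch of median-of-medians selection. It assumes list inputs. Instead of the in-place partition, it splits into three lists (`smaller`, `equal`, `larger`) so that duplicate keys are handled simply. This costs extra memory but keeps the O(N) worst-case bound, since `smaller` and `larger` each hold at most about 7N/10 elements. `mom_select` is an illustrative name, and `k` is a 0-based rank.

```python
def mom_select(a, k):
    """Return the k-th smallest (0-based) element of the list a; worst case O(N)."""
    if len(a) <= 5:
        return sorted(a)[k]                      # small input: constant work
    medians = []
    for g in range(0, len(a), 5):
        group = sorted(a[g:g + 5])               # at most 5 elements: constant time
        medians.append(group[len(group) // 2])   # M[g // 5]
    pivot = mom_select(medians, len(medians) // 2)  # median of the medians, recursively
    smaller = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]
    if k < len(smaller):
        return mom_select(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return mom_select(larger, k - len(smaller) - len(equal))


print(mom_select([7, 2, 8, 3, 5, 9, 6], 3))  # 6, the median
```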
Back to Sorting
Naïve sorting algorithms: bubble sort, insertion sort, selection sort. Time = O(N²)
Clever sorting algorithms: merge sort, heap sort, quicksort. Time = O(N log N) (for quicksort, on average)
I want to sort in O(N)! Is this possible?
Could We Do Better?
For any possible correct sorting-by-comparison algorithm, what is the lowest worst-case time?
Imagine how the comparisons that would be performed by the best possible sorting algorithm form a decision tree…
Worst-case running time cannot be less than the depth of this tree!
Decision tree to sort list A,B,C
Max depth of the decision tree
How many permutations are there of N numbers? N!
How many leaves does the tree have? At least N! (one per permutation)
What’s the shallowest tree with a given number of leaves? A complete binary tree; with N! leaves its depth is log(N!)
What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm? log(N!)
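A small Python check of the bound follows. It computes ⌈log₂(N!)⌉ exactly with integer arithmetic, using the fact that `(m - 1).bit_length()` equals ⌈log₂ m⌉ for an integer m ≥ 1. For N = 3 it gives 3, the depth of the decision tree for A, B, C. The function name is illustrative.

```python
import math


def comparison_lower_bound(n):
    """ceil(log2(n!)): no comparison sort can do better in the worst case."""
    leaves = math.factorial(n)            # the decision tree needs one leaf per permutation
    return (leaves - 1).bit_length()      # exact ceil(log2(leaves)) for leaves >= 1


for n in (3, 10, 100, 1000):
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```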
Stirling’s approximation
Stirling’s Approximation Redux
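For reference, here is Stirling's approximation and what it gives for the decision-tree bound (logarithms base 2). The last line is a cruder bound that needs no approximation: for even n, the n/2 largest factors of n! are each at least n/2.

```latex
n! = \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + O\bigl(\tfrac{1}{n}\bigr)\Bigr)

\log_2(n!) = n\log_2 n - n\log_2 e + \tfrac{1}{2}\log_2(2\pi n) + O\bigl(\tfrac{1}{n}\bigr) = \Theta(n \log n)

\log_2(n!) \ge \log_2\Bigl(\bigl(\tfrac{n}{2}\bigr)^{n/2}\Bigr) = \tfrac{n}{2}\log_2\tfrac{n}{2}
```

So any comparison sort needs Ω(N log N) comparisons in the worst case.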
Why is QuickSort Faster than Merge Sort?
Quicksort typically performs more comparisons than mergesort, because partitions are not always perfectly balanced:
- Mergesort: n log n comparisons
- Quicksort: 1.38 n log n comparisons on average
Quicksort performs many fewer copies, because on average half of the elements are already on the correct side of the partition, while mergesort copies every element when merging:
- Mergesort: 2n log n copies (using a “temp array”), n log n copies (using an “alternating array”)
- Quicksort: (n/2) log n copies on average
Bucket Sort
Now let’s sort in O(N)!
Assume: A[0], A[1], …, A[N-1] ∈ {0, 1, …, M-1}, where M is not too big
Bucket Sort
Queue bucketSort(Array A, int N) {
  for k = 0 to M-1
    Q[k] = new Queue;
  for j = 0 to N-1
    Q[A[j]].enqueue(A[j]);
  Result = new Queue;
  for k = 0 to M-1
    Result = Result.append(Q[k]);
  return Result;
}
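Here is the pseudocode above as a runnable Python sketch. It assumes integer keys in `range(m)` and uses `collections.deque` for the queues. Keys go in and come out in FIFO order, so equal keys keep their input order, and radix sort relies on that property.

```python
from collections import deque


def bucket_sort(a, m):
    """Sort a list of integers from range(m) with one FIFO queue per key."""
    buckets = [deque() for _ in range(m)]  # Q[0], ..., Q[M-1]
    for x in a:
        buckets[x].append(x)               # Q[A[j]].enqueue(A[j])
    result = []
    for q in buckets:                      # append Q[0], Q[1], ..., Q[M-1] in order
        result.extend(q)
    return result


print(bucket_sort([3, 0, 2, 3, 1], 4))  # [0, 1, 2, 3, 3]
```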
Bucket Sort
Running time: O(M+N)
What about the theorem that says sorting takes Ω(N log N)??
Bucket sort does not rely on comparisons, so that theorem does not apply. It only works in a very restrictive setting.
Radix Sort
I still want to sort in time O(N)
A[0], A[1], …, A[N-1] are strings (very common in practice)
Each string is $c_{d-1} c_{d-2} \dots c_1 c_0$, where $c_0, c_1, \dots, c_{d-1} \in \{0, 1, \dots, M-1\}$
M = 128
Radix Sort
What is the size of the input? Size = dN: N strings, each with d characters $c_{d-1} c_{d-2} \dots c_0$
Radix Sort
Array radixSort(Array A, int N) {
  for k = 0 to d-1
    A = bucketSort(A, on position k);   /* must be stable; position 0 is c_0 */
  return A;
}
Running time: T = O(d(M+N)) = O(dN) = O(Size), when M = O(N)
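A Python sketch of this loop for fixed-length ASCII strings follows, so M = 128 as on the earlier slide. Each pass is a stable bucket sort on one character, starting from c₀, which is the last character of the Python string. The function name and the fixed-length check are assumptions of this sketch.

```python
def radix_sort(strings, m=128):
    """LSD radix sort for equal-length ASCII strings: one stable bucket pass per position."""
    if not strings:
        return []
    d = len(strings[0])
    if any(len(s) != d for s in strings):
        raise ValueError("this sketch assumes all strings have the same length d")
    result = list(strings)
    for pos in range(d - 1, -1, -1):        # pos = d-1 holds c_0, pos = 0 holds c_{d-1}
        buckets = [[] for _ in range(m)]
        for s in result:
            buckets[ord(s[pos])].append(s)  # appending keeps the earlier order: stable
        result = [s for b in buckets for s in b]
    return result


print(radix_sort(["cab", "abc", "bca", "aaa", "cba"]))
# ['aaa', 'abc', 'bca', 'cab', 'cba']
```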
Radix Sort
Variable-length strings: [Figure: strings A[0], A[1], A[2], A[3], A[4] of different lengths]
Improve radix sort to still sort in time O(Size)!
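The following Python sketch shows one possible approach. It buckets the strings by length, then runs the passes from the longest position down, adding a string to the active list only when the pass reaches its last character. A missing character is treated as smaller than every real one, so newly added strings go in front. This sketch still scans all M buckets on every pass, so its cost is O(Size + L·M) for longest length L rather than strictly O(Size). Removing that scan needs an extra step that is not shown here.

```python
def radix_sort_varlen(strings, m=128):
    """LSD radix sort for ASCII strings of different lengths (a prefix sorts first)."""
    if not strings:
        return []
    max_len = max(len(s) for s in strings)
    by_length = [[] for _ in range(max_len + 1)]
    for s in strings:                     # bucket the strings by their length
        by_length[len(s)].append(s)
    active = []                           # strings longer than p, sorted on positions p+1..
    for p in range(max_len - 1, -1, -1):
        # Strings of length exactly p+1 have nothing after position p, which counts as
        # smaller than any character, so they go in front of the already-active strings.
        active = by_length[p + 1] + active
        buckets = [[] for _ in range(m)]
        for s in active:
            buckets[ord(s[p])].append(s)  # stable bucket sort on position p
        active = [s for b in buckets for s in b]
    return by_length[0] + active          # empty strings come first


print(radix_sort_varlen(["b", "ab", "abc", "", "a", "ba"]))
```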
Radix Sort
Suppose we want to sort N distinct numbers.
Represent them in decimal: we need d = log N digits, since with fewer than log₁₀ N digits there are not enough distinct values.
Hence radix sort takes time O(dN) = O(N log N). No conflict with the theory here.