Published by Lynne Tucker. Modified over 9 years ago.
1
Sorting and Lower Bounds. 15-211 Fundamental Data Structures and Algorithms. Peter Lee. February 25, 2003.
2
Announcements: Quiz #2 is available today and open until Wednesday midnight. Midterm exam is next Tuesday (March 4, 2003), in class, with a review session in Thursday’s class. Homework #4 is out; you should finish Part 1 this week! Reading: Chapter 8.
3
Recap
4
Naïve sorting algorithms: Bubble sort. Insertion sort. [Figure: step-by-step traces of both algorithms on the example array 105 47 13 99 30 222.]
5
Heapsort: Build heap, O(N). DeleteMin until empty, O(N log N). Total worst case: O(N log N).
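The two phases can be sketched in a few lines. This is a minimal illustration that assumes Python's standard `heapq` module as the heap; the function name `heapsort` is chosen here for illustration and is not from the lecture.

```python
import heapq


def heapsort(items):
    """Return a sorted copy of items using a binary min-heap."""
    heap = list(items)
    heapq.heapify(heap)            # build heap in O(N)
    # N DeleteMin operations, each O(log N)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heapsort([105, 47, 13, 99, 30, 222]))
```

Unlike the classic array-based heapsort, this sketch returns a new list rather than sorting in place.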
6
Shellsort: Example with sequence 3, 1.
105 47 13 99 30 222
99 47 13 105 30 222  (gap 3)
99 30 13 105 47 222  (gap 3)
99 30 13 105 47 222  (gap 3)
30 99 13 105 47 222  (gap 1)
30 13 99 105 47 222  (gap 1)
...
Several inverted pairs fixed in one exchange.
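A minimal sketch of Shellsort with the gap sequence 3, 1 from the slide. The function name and the `gaps` parameter are assumptions for illustration; each pass is a gapped insertion sort, so intermediate states can differ slightly from an exchange-by-exchange trace.

```python
def shellsort(a, gaps=(3, 1)):
    """Sort list a in place with gapped insertion sorts; the last gap must be 1."""
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            # Shift larger elements that are `gap` positions apart.
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x


data = [105, 47, 13, 99, 30, 222]
shellsort(data)
print(data)
```

Moving 99 ahead of 105 in the gap-3 pass also puts it ahead of nothing else it was inverted with here, but in general one long-distance move can remove several inversions at once, which is the point of the larger gaps.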
7
Divide-and-conquer
9
Analysis of recursive sorting: Suppose it takes time T(N) to sort N elements. Suppose also it takes time N to combine the two sorted arrays. Then T(1) = 1 and T(N) = 2T(N/2) + N for N > 1. Solving for T gives the running time for the recursive sorting algorithm.
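As a quick sanity check, the recurrence can be evaluated directly and compared with the closed form N log2 N + N, which holds when N is a power of two. This is only an illustrative sketch; the function name `T` mirrors the slide's notation.

```python
def T(n):
    """Evaluate T(1) = 1, T(N) = 2T(N/2) + N for N a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for k in range(6):
    n = 2 ** k
    closed_form = n * k + n        # N log2 N + N with log2 N = k
    print(n, T(n), closed_form)
```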
10
Divide-and-Conquer Theorem. Theorem: Let a, b, c ≥ 0. The recurrence relation T(1) = b, T(N) = aT(N/c) + bN, for any N which is a power of c, has upper-bound solutions: T(N) = O(N) if a < c; T(N) = O(N log N) if a = c; T(N) = $O(N^{\log_c a})$ if a > c. For recursive sorting, a = 2, b = 1, c = 2.
11
Exact solutions: It is sometimes possible to derive closed-form solutions to recurrence relations. Several methods exist for doing this, including the telescoping-sum method and the repeated-substitution method.
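As an illustration of the telescoping-sum method, here is a worked derivation for the sorting recurrence T(1) = 1, T(N) = 2T(N/2) + N, assuming N is a power of two. Divide both sides by N and write the equation for N, N/2, N/4, and so on:

```latex
\begin{align*}
\frac{T(N)}{N}     &= \frac{T(N/2)}{N/2} + 1 \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + 1 \\
                   &\;\;\vdots \\
\frac{T(2)}{2}     &= \frac{T(1)}{1} + 1
\end{align*}
% Adding the \log_2 N equations, the intermediate terms cancel:
\[
\frac{T(N)}{N} = \frac{T(1)}{1} + \log_2 N = 1 + \log_2 N
\quad\Longrightarrow\quad
T(N) = N \log_2 N + N = O(N \log N).
\]
```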
12
Mergesort: Mergesort is the most basic recursive sorting algorithm. Divide the array in halves A and B. Recursively mergesort each half. Combine A and B by successively looking at the first elements of A and B and moving the smaller one to the result array. Note: Be careful to avoid creating lots of result arrays.
13
Mergesort: Use simple indexes to perform the split. Use a single extra array to hold each intermediate result. [Figure: array with L and R index markers.]
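A minimal sketch of mergesort that splits with indexes and reuses one scratch array for every merge. The names `merge_sort`, `_sort`, `_merge`, and `tmp` are chosen here for illustration and are not from the lecture.

```python
def merge_sort(a):
    """Sort list a in place using one shared scratch list."""
    tmp = [None] * len(a)              # the single extra array
    _sort(a, tmp, 0, len(a) - 1)


def _sort(a, tmp, left, right):
    if left >= right:                  # 0 or 1 element: already sorted
        return
    mid = (left + right) // 2          # split by index, no copying
    _sort(a, tmp, left, mid)
    _sort(a, tmp, mid + 1, right)
    _merge(a, tmp, left, mid, right)


def _merge(a, tmp, left, mid, right):
    i, j, k = left, mid + 1, left
    while i <= mid and j <= right:
        if a[i] <= a[j]:               # <= keeps equal items in input order
            tmp[k] = a[i]
            i += 1
        else:
            tmp[k] = a[j]
            j += 1
        k += 1
    while i <= mid:                    # copy whatever remains of the left half
        tmp[k] = a[i]
        i += 1
        k += 1
    while j <= right:                  # copy whatever remains of the right half
        tmp[k] = a[j]
        j += 1
        k += 1
    a[left:right + 1] = tmp[left:right + 1]


data = [105, 47, 13, 99, 30, 222]
merge_sort(data)
print(data)
```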
14
Analysis of mergesort: Mergesort generates almost exactly the same recurrence relation shown before: T(1) = 1 and T(N) = 2T(N/2) + N - 1 for N > 1. Thus, mergesort is O(N log N).
15
Comparison-based sorting: Recall that these are all examples of comparison-based sorting algorithms. Items are stored in an array. They can be moved around in the array. Any two array elements can be compared. A comparison has 3 possible outcomes: <, =, >.
16
Non-comparison-based sorting: If we can do more than just compare pairs of elements, we can sometimes sort more quickly. Two simple examples are bucket sort and radix sort.
17
Bucket Sort
18
Bucket sort: Beyond being able to compare pairs of elements, we impose two restrictions: all elements are non-negative integers, and all elements are less than a predetermined maximum value.
19
Bucket sort: [Figure: the input 1 3 3 1 2 distributed into buckets labeled 1, 2, 3.]
20
Bucket sort characteristics: Runs in O(N) time. Easy to implement each bucket as a linked list. Is stable: if two elements A and B are equal with respect to sorting, and they appear in the input in order (A, B), then they remain in the same order in the output.
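A minimal sketch of bucket sort under the stated restrictions (non-negative integer keys below a known maximum). The function name, the `max_value` parameter, and the optional `key` function are assumptions for illustration; Python lists stand in for the linked lists. With M possible key values the work is O(N + M), which is O(N) when M is at most proportional to N.

```python
def bucket_sort(items, max_value, key=lambda x: x):
    """Return items sorted by key(item), where 0 <= key(item) < max_value."""
    buckets = [[] for _ in range(max_value)]   # one bucket per possible key
    for item in items:
        k = key(item)
        if not 0 <= k < max_value:
            raise ValueError("key out of range: %r" % (k,))
        buckets[k].append(item)                # appending keeps input order: stable
    return [item for bucket in buckets for item in bucket]


print(bucket_sort([1, 3, 3, 1, 2], max_value=4))
```

Stability shows up when items carry extra data: sorting `(1, 'a'), (3, 'b'), (3, 'c'), (1, 'd')` by the number keeps `'a'` before `'d'` and `'b'` before `'c'`.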
21
Radix Sort
22
Radix sort: Another sorting algorithm that goes beyond comparison is radix sort. Example on 3-bit numbers, sorting on one bit per step, least significant bit first:
Input:               2 0 5 1 7 3 4 6  (010 000 101 001 111 011 100 110)
After low bit:       2 0 4 6 5 1 7 3  (010 000 100 110 101 001 111 011)
After middle bit:    0 4 5 1 2 6 7 3  (000 100 101 001 010 110 111 011)
After high bit:      0 1 2 3 4 5 6 7  (000 001 010 011 100 101 110 111)
Each sorting step must be stable.
23
Radix sort characteristics: Each sorting step can be performed via bucket sort, and is thus O(N). If the numbers are all b bits long, then there are b sorting steps. Hence, radix sort is O(bN). Also, radix sort can be implemented in-place (just like quicksort).
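A minimal sketch of least-significant-bit-first radix sort on b-bit non-negative integers. Each step is a two-bucket bucket sort on one bit. The function name and the `bits` parameter are assumptions for illustration, and this version builds new lists rather than working in place.

```python
def radix_sort(nums, bits):
    """Sort non-negative integers that fit in `bits` bits, low bit first."""
    nums = list(nums)
    for b in range(bits):                      # b sorting steps
        zeros = [x for x in nums if not (x >> b) & 1]
        ones = [x for x in nums if (x >> b) & 1]
        nums = zeros + ones                    # stable: order within each bucket is kept
    return nums


print(radix_sort([2, 0, 5, 1, 7, 3, 4, 6], bits=3))
```

Calling it with `bits=1` or `bits=2` on the same input gives the partially sorted orders after the first and second steps.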
24
Not just for binary numbers: Radix sort can be used for decimal numbers and alphanumeric strings.
Input:               032 224 016 015 031 169 123 252
After ones digit:    031 032 252 123 224 015 016 169
After tens digit:    015 016 123 224 031 032 252 169
After hundreds:      015 016 031 032 123 169 224 252
25
Why comparison-based? Bucket and radix sort run in O(N) and O(bN) time, which can be much faster than any comparison-based sorting algorithm. Unfortunately, we can’t always live with the restrictions imposed by these algorithms. In such cases, comparison-based sorting algorithms give us general solutions.
26
Back to Quick Sort
27
Review: Quicksort algorithm. If array A has 1 (or 0) elements, then done. Choose a pivot element x from A. Divide A-{x} into two arrays: B = {y ∈ A | y ≤ x} and C = {y ∈ A | y ≥ x}. Quicksort arrays B and C. Result is B + {x} + C.
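The steps above translate almost directly into code. This is a minimal, not-in-place sketch; picking the middle element as the pivot and sending elements equal to the pivot into B are choices made for this illustration only.

```python
def quicksort(a):
    """Return a sorted copy of list a."""
    if len(a) <= 1:                        # 1 (or 0) elements: done
        return list(a)
    mid = len(a) // 2
    x = a[mid]                             # pivot (middle element, an assumption)
    rest = a[:mid] + a[mid + 1:]           # A - {x}
    b = [y for y in rest if y <= x]
    c = [y for y in rest if y > x]
    return quicksort(b) + [x] + quicksort(c)


print(quicksort([85, 24, 63, 50, 17, 31, 96, 45]))
```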
28
Implementation issues: Quick sort can be very fast in practice, but this depends on careful coding. Three major issues: 1. doing quicksort in-place; 2. picking the right pivot; 3. avoiding quicksort on small arrays.
29
1. Doing quicksort in place
85 24 63 50 17 31 96 45
85 24 63 45 17 31 96 50  (pivot 50 swapped to the end; L and R start scanning)
31 24 63 45 17 85 96 50  (85 and 31 exchanged; L and R continue)
30
1. Doing quicksort in place
31 24 63 45 17 85 96 50  (L and R continue)
31 24 17 45 63 85 96 50  (63 and 17 exchanged; R and L have crossed)
31 24 17 45 50 85 96 63  (pivot 50 swapped into its final position)
31
2. Picking the pivot: In real life, inputs to a sorting routine are often partially sorted. (Why does this happen?) So, picking the first or last element to be the pivot is usually a bad choice. One common strategy is to pick the middle element; this is an OK strategy.
32
2. Picking the pivot: A more sophisticated approach is to use random sampling (think about opinion polls). For example, the median-of-three strategy: take the median of the first, middle, and last elements to be the pivot.
33
3. Avoiding small arrays: While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays. For small enough arrays, a simpler method such as insertion sort works better. The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements.
34
Putting it all together
85 24 63 50 17 31 96 45
85 24 63 45 17 31 96 50  (pivot 50 swapped to the end; L and R start scanning)
31 24 63 45 17 85 96 50  (85 and 31 exchanged; L and R continue)
35
Putting it all together
31 24 63 45 17 85 96 50  (L and R continue)
31 24 17 45 63 85 96 50  (63 and 17 exchanged; R and L have crossed)
31 24 17 45 50 85 96 63  (pivot 50 swapped into its final position)
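The three implementation ideas (in-place partitioning, median-of-three pivot, insertion sort below a cutoff) can be combined as in the sketch below. All names and the cutoff value of 10 are assumptions for illustration. This sketch stops both L and R on elements equal to the pivot, which is only one of the possible ways to handle ties.

```python
CUTOFF = 10  # assumed value; the slides suggest somewhere between 10 and 30


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place; used for small subarrays."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median_of_three(a, lo, hi):
    """Index of the median of the first, middle, and last elements."""
    mid = (lo + hi) // 2
    return sorted((lo, mid, hi), key=lambda i: a[i])[1]


def partition(a, lo, hi, p):
    """Partition a[lo..hi] around a[p]; return the pivot's final index."""
    a[p], a[hi] = a[hi], a[p]              # park the pivot at the right end
    pivot = a[hi]
    left, right = lo, hi - 1
    while True:
        while left <= right and a[left] < pivot:     # L stops at an element >= pivot
            left += 1
        while right >= left and a[right] > pivot:    # R stops at an element <= pivot
            right -= 1
        if left >= right:                  # pointers met or crossed
            break
        a[left], a[right] = a[right], a[left]
        left += 1
        right -= 1
    a[left], a[hi] = a[hi], a[left]        # move the pivot between the two parts
    return left


def quicksort(a, lo=0, hi=None):
    """Sort list a in place."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:              # small subarray: use insertion sort
        insertion_sort(a, lo, hi)
        return
    p = partition(a, lo, hi, median_of_three(a, lo, hi))
    quicksort(a, lo, p - 1)
    quicksort(a, p + 1, hi)
```

On the slide's array 85 24 63 50 17 31 96 45, `median_of_three` selects the 50 (median of 85, 50, 45), and `partition` around it produces 31 24 17 45 50 85 96 63.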
36
A complication! What should happen if we encounter an element that is equal to the pivot? Four possibilities: (1) L stops, R keeps going; (2) R stops, L keeps going; (3) L and R stop; (4) L and R keep going.
37
Quiz Break
38
Red-green quiz: What should happen if we encounter an element that is equal to the pivot? Four possibilities: (1) L stops, R keeps going; (2) R stops, L keeps going; (3) L and R stop; (4) L and R keep going. Explain why your choice is the only reasonable one.
39
Quick Sort Analysis
40
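Worst-case behavior: [Figure: quicksort on 105 47 13 17 30 222 5 19 where each pivot (5, then 13, 17, 19, ...) is the smallest remaining element, so each level removes only the pivot and leaves one subarray that is one element smaller.]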
41
Best-case analysis: In the best case, the pivot is always the median element. In that case, the splits are always “down the middle”. Hence, same behavior as mergesort. That is, O(N log N).
42
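Average-case analysis: Consider the quicksort tree. [Figure: quicksort tree with root 105 47 13 17 30 222 5 19, split around pivot 19 into 5 17 13 and 47 30 222 105, which are split in turn around pivots 13 and 47, and so on down to single elements.]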
43
Average-case analysis: The time spent at each level of the tree is O(N). So, on average, how many levels? That is, what is the expected height of the tree? If on average there are O(log N) levels, then quicksort is O(N log N) on average.
44
Expected height of qsort tree: Assume that the pivot is chosen randomly. When is a pivot “good”? “Bad”? [Figure: the elements in sorted order, 5 13 17 19 30 47 105 222.] Probability of a good pivot is 0.5. After a good pivot, each child is at most 3/4 the size of its parent.
45
Expected height of qsort tree: So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the k-th child is $N(3/4)(3/4)\cdots(3/4)$ (k times) $= N(3/4)^k$. But on average, only half of the pivots will be good, so after k levels the size is about $N(3/4)^{k/2}$, which reaches 1 when $k = 2\log_{4/3} N = O(\log N)$.
46
Summary of quicksort: A fast sorting algorithm in practice. Can be implemented in-place. But is $O(N^2)$ in the worst case. O(N log N) average-case performance.
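To make the gap between the worst case and the typical case concrete, the sketch below counts pivot comparisons for a deliberately naive quicksort that always uses the first element as the pivot. The helper name `count_comparisons` is hypothetical, and an explicit stack replaces recursion so that the worst-case input does not hit Python's recursion limit.

```python
import random


def count_comparisons(a):
    """Count element-vs-pivot comparisons for first-element-pivot quicksort."""
    total = 0
    stack = [list(a)]
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        x, rest = part[0], part[1:]
        total += len(rest)                 # every other element meets the pivot once
        stack.append([y for y in rest if y < x])
        stack.append([y for y in rest if y >= x])
    return total


n = 1000
already_sorted = list(range(n))
shuffled = list(range(n))
random.seed(0)
random.shuffle(shuffled)
print("sorted input:  ", count_comparisons(already_sorted))   # N(N-1)/2
print("shuffled input:", count_comparisons(shuffled))
```

On already-sorted input every pivot is the smallest remaining element, so the count is exactly N(N-1)/2; on shuffled input it is far smaller.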
47
Lower Bound for the Sorting Problem
48
How fast can we sort? We have seen several sorting algorithms with O(N log N) running time. In fact, Ω(N log N) is a general lower bound for the comparison-based sorting problem. A proof appears in Weiss. Informally…
49
Upper and lower bounds: T(N) = O(f(N)) and T(N) = Ω(g(N)). [Figure: plot against N showing T(N) lying between the lower curve d g(N) and the upper curve c f(N).]
50
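Decision tree for sorting: [Figure: decision tree for sorting three elements a, b, c. The root holds all six orderings (a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a); each internal node branches on one comparison (a<b or b<a, a<c or c<a, b<c or c<b) and keeps only the orderings consistent with the answers so far; each leaf is a single ordering.] N! leaves. So, the tree has height at least log(N!). log(N!) = Ω(N log N).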
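The bound can be checked numerically. A binary tree with N! leaves has height at least the ceiling of log2(N!), and since N! ≥ (N/2)^(N/2), that height is at least (N/2) log2(N/2). The helper name `min_comparisons` is chosen here for illustration.

```python
import math


def min_comparisons(n):
    """Smallest possible height of a binary decision tree with n! leaves."""
    # For an integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)) exactly.
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10, 100):
    lower = (n / 2) * math.log2(n / 2)
    upper = n * math.log2(n)
    print(n, min_comparisons(n), round(lower, 1), round(upper, 1))
```

For three elements the result is 3, matching the depth of the tree for a, b, c.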
51
Summary on sorting bound: If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N). A decision tree is a representation of the possible comparisons required to solve a problem.
52
External Sorting
53
External sorting: In many real-world situations, the amount of data to be sorted is much more than can be stored in memory. So, it is important in some cases to use algorithms that work well when sorting data stored externally. See tomorrow’s recitation…
54
World’s Fastest Sorters
55
Sorting competitions: There are several world-wide sorting competitions. Unix CoSort has achieved 1GB in under one minute, on a single Alpha. Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations. Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine.