Sorting and Lower bounds 15-211 Fundamental Data Structures and Algorithms Ananda Guna January 27, 2005.

1 Sorting and Lower bounds 15-211 Fundamental Data Structures and Algorithms Ananda Guna January 27, 2005

2 Recap

3 Sorting Comparison  We can categorize sorting algorithms into two major classes:  Fast sorts – O(N log₂ N)  Slow sorts – O(N²)  Slow sorts are easy to code and sufficient when the amount of data is small

4 Basic Sorting Algorithms  Bubble Sort  repeatedly swap adjacent out-of-order pairs until no more swaps occur  Insertion Sort  insert a[i] into the sorted prefix a[0…i-1]  Advantages  Simple to implement  Good for small data sets  Disadvantages  O(n²) algorithms
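The insertion step described above can be sketched in Java (a minimal sketch; the class and method names are illustrative, not from the slides):

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // Insert a[i] into the already-sorted prefix a[0..i-1], for each i.
    public static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i];
            int j = i;
            // Shift larger elements right to open a slot for v.
            while (j > 0 && a[j - 1] > v) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {30, 13, 105, 47, 5};
        insertionSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Despite the O(n²) worst case, the inner loop does no work at all on already-sorted input, which is why insertion sort is a good choice for small or nearly-sorted arrays.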

5 Recursive Sorting Algorithms  QuickSort  Average case – O(n log n)  Worst Case – O(n 2 )  Merge Sort  All cases O(n log n)  Extra Memory – O(n)

6 Comparison Chart  (chart: running times of bubble, insertion, merge, and quicksort on four input classes: almost sorted, in reverse order, random order, all equal)

7 Analysis of recursive sorting  Suppose it takes time T(N) to sort N elements.  Suppose also it takes time N to combine the two sorted arrays.  Then:  T(1) = 1  T(N) = 2T(N/2) + N, for N>1  Solving for T gives the running time for the recursive sorting algorithm.
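Unrolling the recurrence above (for N a power of two) gives T(N) = N log₂ N + N, which the following sketch checks numerically; the class name and the closed form are my own working, not from the slides:

```java
public class RecurrenceCheck {
    // T(1) = 1; T(N) = 2*T(N/2) + N, for N > 1 a power of two.
    static long t(long n) {
        return n == 1 ? 1 : 2 * t(n / 2) + n;
    }

    // Closed form obtained by unrolling: T(N) = N*log2(N) + N.
    static long closedForm(long n) {
        long log2 = 63 - Long.numberOfLeadingZeros(n);
        return n * log2 + n;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= (1 << 10); n *= 2) {
            assert t(n) == closedForm(n);
            System.out.println("T(" + n + ") = " + t(n));
        }
    }
}
```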

8 QuickSort Example  Sort the following using qsort

9 Quicksort implementation

10 Implementation issues  Quicksort can be very fast in practice, but this depends on careful coding  Three major issues:  1. dividing the array in-place  2. picking the right pivot  3. avoiding quicksort on small arrays

11 2. Picking the pivot  In real life, inputs to a sorting routine are often not completely random  So, picking the first or last element to be the pivot is usually a bad choice  One common strategy is to pick the middle element  this is an OK strategy

12 2. Picking the pivot  A more sophisticated approach is to use random sampling  think about opinion polls  For example, the median-of-three strategy:  take the median of the first, middle, and last elements to be the pivot
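The median-of-three strategy above can be sketched as follows (a sketch; the helper and its name are illustrative):

```java
public class MedianOfThree {
    // Return the index of the median of A[left], A[mid], A[right],
    // where mid is the middle position of the range.
    public static int medianOfThree(int[] A, int left, int right) {
        int mid = (left + right) / 2;
        int a = A[left], b = A[mid], c = A[right];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return left;
        return right;
    }

    public static void main(String[] args) {
        int[] A = {105, 47, 13, 17, 30, 222, 5, 19};
        // first = 105, middle = 17, last = 19 -> the median is 19
        System.out.println(medianOfThree(A, 0, A.length - 1)); // prints 7
    }
}
```

Using this index as the pivot defeats the sorted and reverse-sorted inputs that make a first-element pivot degenerate.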

13 3. Avoiding small arrays  While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays  For small enough arrays, a simpler method such as insertion sort works better  The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements

14 A complication!  What should happen if we encounter an element that is equal to the pivot?  Four possibilities:  L stops, R keeps going  R stops, L keeps going  L and R stop  L and R keep going

15 A complication!  What should happen if we encounter an element that is equal to the pivot?  Four possibilities:  L stops, R keeps going (right list longer)  R stops, L keeps going (left list longer)  L and R stop (lists equal)  L and R keep going (left list longer)

16 Quick Sort  Algorithm  Partitioning Step  Choose a pivot element, say v = A[j]  Determine its final position j in the sorted array: A[i] ≤ v for all i < j, and A[i] ≥ v for all i > j  Recursive Step  Perform the above step on the left subarray and the right subarray  An early look at quicksort code (incomplete) void quicksort(int[] A, int left, int right) { if (right > left) { int p = partition(A, left, right); quicksort(A, left, p - 1); quicksort(A, p + 1, right); } }

17 Quick Sort Code ctd.. // Suppose that the pivot is p // partition(): rearrange A[left..right] into 2 sublists // S1 = { x ∈ A | x ≤ p } and S2 = { x ∈ A | x ≥ p } int partition(int[] A, int left, int right) { if (A[left] > A[right]) swap(A, left, right); int pivot = A[left]; int i = left; int j = right + 1; do { do ++i; while (A[i] < pivot); do --j; while (A[j] > pivot); if (i < j) { swap(A, i, j); } } while (i < j); swap(A, left, j); return j; // j is the position of the pivot after rearrangement }
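The fragments on these two slides can be assembled into one self-contained, runnable class (a sketch; the class name and the array-based swap helper are mine, since Java passes arguments by value and cannot swap two plain ints in a helper):

```java
import java.util.Arrays;

public class QuickSortDemo {
    static void swap(int[] A, int i, int j) {
        int t = A[i]; A[i] = A[j]; A[j] = t;
    }

    // Rearrange A[left..right] so elements <= pivot precede it and
    // elements >= pivot follow it; return the pivot's final position.
    static int partition(int[] A, int left, int right) {
        if (A[left] > A[right]) swap(A, left, right); // sentinels at both ends
        int pivot = A[left];
        int i = left, j = right + 1;
        do {
            do { ++i; } while (A[i] < pivot);
            do { --j; } while (A[j] > pivot);
            if (i < j) swap(A, i, j);
        } while (i < j);
        swap(A, left, j); // place the pivot at position j
        return j;
    }

    public static void quicksort(int[] A, int left, int right) {
        if (right > left) {
            int p = partition(A, left, right);
            quicksort(A, left, p - 1);
            quicksort(A, p + 1, right);
        }
    }

    public static void main(String[] args) {
        int[] A = {105, 47, 13, 17, 30, 222, 5, 19};
        quicksort(A, 0, A.length - 1);
        System.out.println(Arrays.toString(A));
    }
}
```

Note that the recursion and the partition happen entirely inside the one array: this is the in-place division that slide 10 lists as the first implementation issue.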

18 Quick Sort Analysis

19 Worst-case behavior  (diagram: starting from 105 47 13 17 30 222 5 19, each partition around the smallest remaining element peels off only one element per level)  If we always pick the smallest (or largest) possible pivot, then quicksort takes O(n²) steps

20 Best-case analysis  In the best case, the pivot is always the median element.  In that case, the splits are always “down the middle”.  Hence, same behavior as mergesort.  That is, O(N log N).

21 Average-case analysis  Consider the quicksort tree:  (diagram: quicksort tree for 105 47 13 17 30 222 5 19, with pivot 19 at the root and subtrees recursively partitioning the smaller and larger elements)

22 Average-case analysis  At each level of the tree, there are fewer than N nodes.  So, time spent at each level is O(N).  On average, how many levels?  That is, what is the expected height of the tree?  If on average there are O(log N) levels, then quicksort is O(N log N) on average.

23 Expected height of qsort tree  Assume that pivot is chosen randomly.  And that ½ the pivots are good, and ½ are bad.  Which elements in the list below are “good” pivots? 5 13 17 19 30 47 105 222

24 Expected height of qsort tree  Assume that pivot is chosen randomly.  And that ½ the pivots are good, and ½ are bad.  When is a pivot “good”? “Bad”? 5 13 17 19 30 47 105 222 Probability of a good pivot is 0.5. After a good pivot, each partition is at most 3/4 the size of the original array.

25 Expected height of qsort tree  So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is:  N(3/4)(3/4) … (3/4) (k times)  = N(3/4)^k  But on average, only half of the pivots will be good, so the kth child has size N(3/4)^(k/2)

26 Expected height of qsort tree  But, if the kth child is a leaf, then N(3/4)^(k/2) = 1  Thus, the expected height is k = 2·log_{4/3} N = O(log N)
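Solving the leaf condition on this slide explicitly, under the same good-pivot assumption:

```latex
N\left(\tfrac{3}{4}\right)^{k/2} = 1
\;\Longrightarrow\; \left(\tfrac{4}{3}\right)^{k/2} = N
\;\Longrightarrow\; \tfrac{k}{2} = \log_{4/3} N
\;\Longrightarrow\; k = 2\log_{4/3} N = O(\log N).
```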

27 Summary of quicksort  A fast sorting algorithm in practice.  Can be implemented in-place.  But is O(N 2 ) in the worst case.  O(N log N) average-case performance.

28 Shell Sort

29 Shellsort  Shellsort, like bubble sort and insertion sort, is based on performing exchanges on inverted pairs.  Start by picking a decrement sequence h_k, h_{k-1}, …, h_1, where h_1 = 1 and h_i > h_{i-1} for each i.  Start with h_k and exchange each pair of inverted array elements that are h_k elements apart.  Continue with h_{k-1}, …, h_1.
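The gap-based exchange procedure above can be sketched in Java with the simple h_k = N/2, h_{k-1} = h_k/2 decrement sequence discussed two slides later (a sketch; the class name is illustrative):

```java
import java.util.Arrays;

public class ShellSortDemo {
    public static void shellsort(int[] a) {
        // Decrement sequence: N/2, N/4, ..., 1.
        for (int h = a.length / 2; h >= 1; h /= 2) {
            // Gapped insertion sort: fix inverted pairs h elements apart.
            for (int i = h; i < a.length; i++) {
                int v = a[i];
                int j = i;
                while (j >= h && a[j - h] > v) {
                    a[j] = a[j - h];
                    j -= h;
                }
                a[j] = v;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {105, 47, 13, 99, 30, 222};
        shellsort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Because the final pass uses gap 1, the last pass is ordinary insertion sort; the earlier, larger gaps simply leave it very little work to do.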

30 Shellsort  Example with sequence 3, 1. 105 47 13 99 30 222 → 99 47 13 105 30 222 → 99 30 13 105 47 222 → 30 99 13 105 47 222 → 30 13 99 105 47 222 → … Several inverted pairs fixed in one exchange.

31 Shellsort characteristics  The running time for shellsort depends on the decrement sequence chosen.  h_k = N/2, h_{k-1} = h_k/2:  Worst-case O(N²).  Let h_k = 2^i − 1, for the largest 2^i − 1 < N; h_{k-1} = 2^(i−1) − 1.  Example: 15, 7, 3, 1.  Worst-case O(N^(3/2)).  Other sequences achieve O(N^(4/3)).

32 Non-Comparison based Sorting

33 Non-comparison-based sorting  If we can do more than just compare pairs of elements, we can sometimes sort more quickly  Two simple examples are bucket sort and radix sort

34 Bucket Sort

35 Bucket sort  In addition to comparing pairs of elements, we require these additional restrictions:  all elements are non-negative integers  all elements are less than a predetermined maximum value  Elements are usually keys paired with other data

36 Bucket sort  (diagram: input 1 3 3 1 2 distributed into buckets labeled 1, 2, 3)

37 Bucket sort characteristics  Runs in O(N) time.  Easy to implement each bucket as a linked list.  Is stable:  If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
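The linked-list buckets described above can be sketched as follows (a sketch; I use `ArrayDeque` queues as the per-bucket lists, and the class name is illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class BucketSortDemo {
    // Assumes all elements are non-negative integers in [0, maxValue].
    public static int[] bucketSort(int[] a, int maxValue) {
        List<Deque<Integer>> buckets = new ArrayList<>();
        for (int k = 0; k <= maxValue; k++) buckets.add(new ArrayDeque<>());
        // Distribute: appending at the tail keeps equal keys in input order,
        // which is exactly the stability property described above.
        for (int x : a) buckets.get(x).addLast(x);
        // Collect the buckets in order.
        int[] out = new int[a.length];
        int i = 0;
        for (Deque<Integer> b : buckets)
            while (!b.isEmpty()) out[i++] = b.removeFirst();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bucketSort(new int[]{1, 3, 3, 1, 2}, 3)));
    }
}
```

Each element is touched a constant number of times, so the whole sort is O(N + maxValue), i.e. O(N) when the maximum value is fixed in advance.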

38 Work area

39 Radix Sort

40 Radix sort  If your integers are in a larger range, then do a bucket sort on each digit  Start by sorting on the low-order digit using a STABLE bucket sort.  Then sort on the next-lowest digit, and so on

41 Radix sort  Example: the 3-bit values 2 0 5 1 7 3 4 6 (binary 010 000 101 001 111 011 100 110) are sorted bit by bit, low-order bit first, into 0 1 2 3 4 5 6 7. Each sorting step must be stable.

42 Radix sort characteristics  Each sorting step can be performed via bucket sort, and is thus O(N).  If the numbers are all b bits long, then there are b sorting steps.  Hence, radix sort is O(bN).  Also, radix sort can be implemented in-place (just like quicksort).
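The b passes described above can be sketched for binary digits, where each stable bucket-sort pass needs only two buckets (a sketch; this version copies into a scratch array rather than working in place, and the class name is illustrative):

```java
import java.util.Arrays;

public class RadixSortDemo {
    // LSD radix sort on b-bit non-negative integers: one stable
    // two-bucket pass (zeros then ones) per bit, low-order bit first.
    public static int[] radixSort(int[] a, int bits) {
        int[] cur = a.clone();
        for (int b = 0; b < bits; b++) {
            int[] next = new int[cur.length];
            int i = 0;
            // Copying in input order keeps each pass stable.
            for (int x : cur) if (((x >> b) & 1) == 0) next[i++] = x; // bucket 0
            for (int x : cur) if (((x >> b) & 1) == 1) next[i++] = x; // bucket 1
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        // The 3-bit example from the previous slide.
        System.out.println(Arrays.toString(radixSort(new int[]{2, 0, 5, 1, 7, 3, 4, 6}, 3)));
    }
}
```

Each pass is O(N) and there are b passes, matching the O(bN) bound on the slide.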

43 Not just for binary numbers  Radix sort can be used for decimal numbers and alphanumeric strings. 032 224 016 015 031 169 123 252 → (by ones digit) 031 032 252 123 224 015 016 169 → (by tens digit) 015 016 123 224 031 032 252 169 → (by hundreds digit) 015 016 031 032 123 169 224 252

44 Why comparison-based?  Bucket and radix sort are much faster than any comparison-based sorting algorithm  Unfortunately, we can’t always live with the restrictions imposed by these algorithms  In such cases, comparison-based sorting algorithms give us general solutions

45 Lower Bound for the Sorting Problem

46 How fast can we sort?  We have seen several sorting algorithms with O(N log N) running time.  In fact, Ω(N log N) is a general lower bound for the sorting problem.  A proof appears in Weiss.  Informally…

47 Upper and lower bounds  (graph: beyond some point, T(N) lies below c·f(N) and above d·g(N))  T(N) = O(f(N))  T(N) = Ω(g(N))

48 Decision tree for sorting  (diagram: binary tree of comparisons a<b, b<c, a<c, …, whose leaves are the 3! = 6 orderings a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a)  N! leaves. So, tree has height log(N!).  log(N!) = Θ(N log N).

49 Summary on sorting bound  If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).  A decision tree is a representation of the possible comparisons required to solve a problem.
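The height bound on the previous slide uses the standard estimate for log(N!): half of the factors of N! are at least N/2, so

```latex
\log(N!) \;\ge\; \log\!\left(\left(\tfrac{N}{2}\right)^{N/2}\right)
\;=\; \tfrac{N}{2}\,\log\tfrac{N}{2}
\;=\; \Omega(N \log N).
```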

50 Quickselect – finding median
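The slide title mentions quickselect but the transcript contains no details; a sketch of the standard idea, reusing the partition scheme from slide 17 (all names here are my own, not from the deck), is:

```java
public class QuickselectDemo {
    static void swap(int[] A, int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    // Same partition scheme as the quicksort slides.
    static int partition(int[] A, int left, int right) {
        if (A[left] > A[right]) swap(A, left, right);
        int pivot = A[left];
        int i = left, j = right + 1;
        do {
            do { ++i; } while (A[i] < pivot);
            do { --j; } while (A[j] > pivot);
            if (i < j) swap(A, i, j);
        } while (i < j);
        swap(A, left, j);
        return j;
    }

    // Return the k-th smallest element (0-based) of A. Unlike quicksort,
    // only one side of each partition is visited, giving O(N) expected time.
    public static int quickselect(int[] A, int k) {
        int left = 0, right = A.length - 1;
        while (left < right) {
            int p = partition(A, left, right);
            if (p == k) return A[p];
            if (k < p) right = p - 1; else left = p + 1;
        }
        return A[left];
    }

    public static void main(String[] args) {
        int[] A = {105, 47, 13, 17, 30, 222, 5, 19};
        System.out.println(quickselect(A, A.length / 2)); // a median
    }
}
```

Choosing k = N/2 finds a median without fully sorting the array, which is the application the slide title names.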

51 World’s Fastest Sorters

52 Sorting competitions  There are several world-wide sorting competitions  Unix CoSort has achieved 1GB in under one minute, on a single Alpha  http://www.cosort.com  Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations  http://now.cs.berkeley.edu/  Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine

