Intro. to Data Structures, Chapter 7: Sorting
Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University

Chapter 7 Sorting
Sorting is a very useful and frequently used operation, so it requires fast algorithms.
Simple algorithms sort in $O(N^2)$ time.
More complicated algorithms sort in $O(N \log N)$ time.
Any general-purpose sorting algorithm requires $\Omega(N \log N)$ comparisons.

Sorting
What is covered in this chapter:
Sorting arrays of integers
Comparison-based sorting: the main operations are compare and swap
We assume that the entire sort can be done in main memory

Sorting Algorithms
Insertion Sort
Shellsort
Heapsort
Mergesort
Quicksort
Bucket Sort

Insertion Sort
Sorts N elements in N-1 passes (pass 1 to pass N-1)
In pass p:
– insertion sort ensures that the elements in positions 0 through p are in sorted order
– the elements in positions 0 through p-1 are already in sorted order
– the element in position p is moved left until its correct place is found

Insertion Sort
The element in position p is saved in tmp.
All larger elements prior to position p are moved one spot to the right.
Then tmp is placed in the correct spot.

public static void insertionSort( Comparable [ ] a )
{
    int j;

    for( int p = 1; p < a.length; p++ )
    {
        Comparable tmp = a[ p ];
        // Shift larger elements one spot to the right
        for( j = p; j > 0 && tmp.compareTo( a[ j - 1 ] ) < 0; j-- )
            a[ j ] = a[ j - 1 ];
        a[ j ] = tmp;   // Place tmp in the correct spot
    }
}

Analysis
Insertion sort has nested loops, and each loop can have N iterations, so insertion sort is $O(N^2)$.
The inner loop can be executed at most p+1 times for each value of p.
Summing over all p = 1 to N-1, the inner loop can be executed at most $\sum_{i=2}^{N} i = 2 + 3 + \dots + N = \Theta(N^2)$ times.
Input in reverse order achieves this bound.

Analysis
If the input is presorted, the running time is $O(N)$ because the test in the inner for loop always fails immediately.
The average case is $\Theta(N^2)$.

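To make the best and worst cases concrete, the sketch below instruments insertion sort so that it counts how many element shifts the inner loop performs. The class name `InsertionSortShifts` and the method `countShifts` are illustrative names that do not appear in the slides, and `int` arrays are used instead of `Comparable` only for brevity.

```java
import java.util.Arrays;

class InsertionSortShifts {
    // Sorts a in place and returns the number of shifts done by the inner loop.
    static long countShifts(int[] a) {
        long shifts = 0;
        for (int p = 1; p < a.length; p++) {
            int tmp = a[p];
            int j;
            for (j = p; j > 0 && tmp < a[j - 1]; j--) {
                a[j] = a[j - 1];
                shifts++;
            }
            a[j] = tmp;
        }
        return shifts;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6};
        int[] reversed = {6, 5, 4, 3, 2, 1};
        System.out.println(countShifts(sorted));    // presorted input: no shifts
        System.out.println(countShifts(reversed));  // reverse order: N(N-1)/2 shifts
        System.out.println(Arrays.toString(reversed));
    }
}
```

For N elements in reverse order, every element is shifted past all the elements before it, which gives N(N-1)/2 shifts in total; a presorted input gives none.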
Shellsort
Sorts N elements in t passes.
Each pass k has an associated value $h_k$.
The t passes use a sequence $h_1, h_2, \dots, h_t$ (called the increment sequence).
The first pass uses $h_t$ and the last pass uses $h_1$.
$h_t > \dots > h_2 > h_1$ and $h_1 = 1$.
In each pass, all elements spaced $h_k$ apart are sorted.

Shellsort
A sequence that has been sorted using $h_k$ is said to be $h_k$-sorted.
An $h_k$-sorted sequence that is then $h_{k-1}$-sorted remains $h_k$-sorted.
An $h_k$-sort performs an insertion sort on $h_k$ independent subarrays.

Shellsort
Any increment sequence works, as long as $h_1 = 1$.
Some choices are better than others.
A popular (but poor) increment sequence is 1, 2, 4, 8, ..., N/2, that is, $h_t = \lfloor N/2 \rfloor$ and $h_k = \lfloor h_{k+1}/2 \rfloor$.

Shellsort vs. Insertion Sort
The last pass of Shellsort performs an insertion sort on the whole array ($h_1$-sort).
But Shellsort is better than insertion sort, because its insertion sorts run on arrays that the earlier passes have already left nearly sorted.

public static void shellsort( Comparable [ ] a )
{
    int j;

    // Increments N/2, N/4, ..., 1
    for( int gap = a.length / 2; gap > 0; gap /= 2 )
        // Insertion sort on elements spaced gap apart
        for( int i = gap; i < a.length; i++ )
        {
            Comparable tmp = a[ i ];
            for( j = i; j >= gap && tmp.compareTo( a[ j - gap ] ) < 0; j -= gap )
                a[ j ] = a[ j - gap ];
            a[ j ] = tmp;
        }
}

A Bad Case of Shellsort
N is a power of 2.
All the increments are even, except the last increment, which is 1.
The N/2 largest numbers are in the even positions and the N/2 smallest numbers are in the odd positions.

A Bad Case of Shellsort
No sorting is performed until the last pass.
The i-th smallest number ($i \le N/2$) is in position 2i-1.
Restoring the i-th element to its correct place requires moving it i-1 spaces.
Restoring the N/2 smallest numbers requires $\sum_{i=1}^{N/2} (i-1) = \Omega(N^2)$ work.

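The bad case can be checked with a small experiment. The sketch below builds such an input for N = 16 and runs the even-increment passes; the class and method names (`ShellsortBadCase`, `gapPass`, `badInput`) are made up for this illustration, and the slide's 1-based positions are mapped to 0-based array indices (odd positions become even indices).

```java
import java.util.Arrays;

class ShellsortBadCase {
    // One pass of Shellsort: an insertion sort on elements spaced gap apart.
    static void gapPass(int[] a, int gap) {
        for (int i = gap; i < a.length; i++) {
            int tmp = a[i];
            int j;
            for (j = i; j >= gap && tmp < a[j - gap]; j -= gap)
                a[j] = a[j - gap];
            a[j] = tmp;
        }
    }

    // Builds the bad input for n a power of 2 (0-based indices):
    // even indices hold the n/2 smallest values in order,
    // odd indices hold the n/2 largest values in order.
    static int[] badInput(int n) {
        int[] a = new int[n];
        for (int k = 0; k < n / 2; k++) {
            a[2 * k] = k + 1;
            a[2 * k + 1] = n / 2 + k + 1;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = badInput(16);
        int[] before = a.clone();
        for (int gap = 8; gap > 1; gap /= 2)
            gapPass(a, gap);
        System.out.println(Arrays.equals(a, before)); // true: even gaps move nothing
        gapPass(a, 1);                                // the last pass does all the work
        System.out.println(Arrays.toString(a));
    }
}
```

Every even increment only compares elements of the same parity, and those are already in order, so all the work is left to the final 1-sort.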
Worst-Case Analysis
A pass with increment $h_k$ consists of $h_k$ insertion sorts of about $N/h_k$ elements.
Since insertion sort is quadratic, the total cost of a pass is $O(h_k (N/h_k)^2) = O(N^2/h_k)$.
Summing over all passes gives $O\left(\sum_{k=1}^{t} N^2/h_k\right) = O\left(N^2 \sum_{k=1}^{t} 1/h_k\right) = O(N^2)$, because the increments 1, 2, 4, ... form a geometric series and so $\sum_{k=1}^{t} 1/h_k < 2$.

Hibbard’s Increments
Increment sequence: 1, 3, 7, 15, ..., $2^k - 1$
$h_{k+1} = 2h_k + 1$
Consecutive increments have no common factors.
The worst-case running time of Shellsort using Hibbard’s increments is $\Theta(N^{3/2})$.

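As a rough sketch of how the increment sequence could be plugged into Shellsort, the code below walks Hibbard's increments from the largest one below the array length down to 1. The class name `HibbardShellsort` and the use of `int` arrays are choices made for this example, not something taken from the slides.

```java
import java.util.Arrays;

class HibbardShellsort {
    static void shellsort(int[] a) {
        // Find the largest increment of the form 2^k - 1 below a.length
        // (the increment stays at 1 for very small arrays).
        int gap = 1;
        while (2 * gap + 1 < a.length)
            gap = 2 * gap + 1;

        // Walk the sequence downward: ..., 15, 7, 3, 1.
        // Since h(k+1) = 2*h(k) + 1, the previous increment is (gap - 1) / 2.
        for (; gap >= 1; gap = (gap - 1) / 2) {
            for (int i = gap; i < a.length; i++) {
                int tmp = a[i];
                int j;
                for (j = i; j >= gap && tmp < a[j - gap]; j -= gap)
                    a[j] = a[j - gap];
                a[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35, 17, 95, 28, 58, 41, 75, 15};
        shellsort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

The loop ends after the 1-sort because (1 - 1) / 2 is 0.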
Analysis of Hibbard’s Increments
The input of an $h_k$-sort is already $h_{k+1}$-sorted and $h_{k+2}$-sorted (e.g., the input of a 3-sort is already 7-sorted and 15-sorted).
Let i be the distance between two elements. If i is expressible as a linear combination (with nonnegative integer coefficients) of $h_{k+1}$ and $h_{k+2}$, then $a[p-i] \le a[p]$.
For example, 52 = 1*7 + 3*15, so $a[100] \le a[152]$ because $a[100] \le a[107] \le a[122] \le a[137] \le a[152]$.

Analysis of Hibbard’s Increments
All integers $\ge (h_{k+1} - 1)(h_{k+2} - 1) = 8h_k^2 + 4h_k$ can be expressed as a linear combination of $h_{k+1}$ and $h_{k+2}$.
Proof (using $h_{k+2} = 2h_{k+1} + 1$):
$i = x\,h_{k+1} + y\,h_{k+2}$
$i + 1 = x\,h_{k+1} + y\,(2h_{k+1} + 1) + 1$
$i + 1 = x\,h_{k+1} + y\,(2h_{k+1} + 1) - 2h_{k+1} + (2h_{k+1} + 1)$
$i + 1 = (x - 2)\,h_{k+1} + (y + 1)\,h_{k+2}$

Analysis of Hibbard’s Increments
So $a[p-i] \le a[p]$ if $i \ge 8h_k^2 + 4h_k$.
In each pass, a[p] is never moved further left than a[p-i], that is, by more than $8h_k^2 + 4h_k$ positions.
Because each step of the innermost for loop moves an element $h_k$ positions, the loop is executed at most $8h_k + 4 = O(h_k)$ times for each position.
So each pass has $O(N h_k)$ running time.

Analysis of Hibbard’s Increments
For $h_k > N^{1/2}$, use the bound $O(N^2/h_k)$.
For $h_k \le N^{1/2}$, use the bound $O(N h_k)$.
About half of the increments satisfy $h_k < N^{1/2}$.
The total time is $O\left(\sum_{k=1}^{t/2} N h_k + \sum_{k=t/2+1}^{t} N^2/h_k\right) = O(N h_{t/2}) + O(N^2/h_{t/2}) = O(N^{3/2})$, since both sums are geometric series and $h_{t/2} = \Theta(\sqrt{N})$.

Sedgewick’s Increments
Sedgewick’s increments are {1, 5, 19, 41, 109, ...}, whose terms have the form $9 \cdot 4^i - 9 \cdot 2^i + 1$ or $4^i - 3 \cdot 2^i + 1$.
$O(N^{4/3})$ worst-case time and $O(N^{7/6})$ average time (the average-case bound is conjectured).

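The two formulas interleave, which is easy to miss when reading them side by side. The sketch below generates the increments below a given limit by evaluating both formulas and sorting the results. The class name `SedgewickIncrements` and the method `below` are hypothetical, and starting the second formula at i = 2 is an assumption made here because i = 0 and i = 1 give non-positive values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SedgewickIncrements {
    // Returns, in increasing order, all increments smaller than limit.
    static List<Long> below(int limit) {
        List<Long> incs = new ArrayList<>();

        // Terms 9*4^i - 9*2^i + 1 for i = 0, 1, 2, ...
        for (long p2 = 1; ; p2 *= 2) {      // p2 = 2^i, so p2 * p2 = 4^i
            long h = 9 * p2 * p2 - 9 * p2 + 1;
            if (h >= limit) break;
            incs.add(h);
        }

        // Terms 4^i - 3*2^i + 1 for i = 2, 3, 4, ...
        for (long p2 = 4; ; p2 *= 2) {
            long h = p2 * p2 - 3 * p2 + 1;
            if (h >= limit) break;
            incs.add(h);
        }

        Collections.sort(incs);
        return incs;
    }

    public static void main(String[] args) {
        System.out.println(below(1000)); // [1, 5, 19, 41, 109, 209, 505, 929]
    }
}
```

A Shellsort using these increments would walk the returned list from its last element down to its first.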
Heapsort
Build a binary heap of N elements and then perform N deleteMin operations.
Building the heap takes $O(N)$ time and the N deleteMin operations take $O(N \log N)$ time.
The total running time is $O(N \log N)$.

Heapsort
The sorted elements, which are taken out of the heap, can be placed in another array.
To avoid using an extra array for the result, store each element taken out of the heap in the position just vacated at the end of the heap (the heap shrinks by one with each deletion).
To get the result in increasing order, use a max-heap instead.

[Figure: the heap after one, two, and three deleteMax operations]

public static void heapsort( Comparable [ ] a )
{
    // Build the max-heap, starting at the last internal node
    // (starting at a.length / 2 - 1 also keeps an empty array safe)
    for( int i = a.length / 2 - 1; i >= 0; i-- )
        percDown( a, i, a.length );
    // deleteMax: move the maximum to the end, then restore the heap
    for( int i = a.length - 1; i > 0; i-- )
    {
        swapReferences( a, 0, i );
        percDown( a, 0, i );
    }
}

// Index of the left child; the array begins at index 0
private static int leftChild( int i )
{
    return 2 * i + 1;
}

public static final void swapReferences( Object [ ] a, int index1, int index2 )
{
    Object tmp = a[ index1 ];
    a[ index1 ] = a[ index2 ];
    a[ index2 ] = tmp;
}

private static void percDown( Comparable [ ] a, int i, int n )
{
    int child;
    Comparable tmp;

    for( tmp = a[ i ]; leftChild( i ) < n; i = child )
    {
        child = leftChild( i );
        // Pick the larger of the two children
        if( child != n - 1 && a[ child ].compareTo( a[ child + 1 ] ) < 0 )
            child++;
        if( tmp.compareTo( a[ child ] ) < 0 )
            a[ i ] = a[ child ];
        else
            break;
    }
    a[ i ] = tmp;
}

Analysis
Building the heap uses at most 2N comparisons.
The deleteMax operations use at most $2N \log N - O(N)$ comparisons in total.
So heapsort uses at most $2N \log N - O(N)$ comparisons.

Analysis
The worst case and the average case are only slightly different.
The average number of comparisons is $2N \log N - O(N \log \log N)$.

Mergesort
The fundamental operation is merging two sorted lists.
Because the lists are sorted, this can be done in one pass through the input, if the output is put in a third list.

Mergesort
The merge routine takes two input arrays A and B, an output array C, and three counters, Actr, Bctr, and Cctr.
The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced.
When either input array is exhausted, the remaining items of the other array are copied to C.

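The three-counter merge described above can be sketched directly. The class name `MergeDemo` is invented for this example and `int` arrays are used for brevity; the counter names follow the slide.

```java
import java.util.Arrays;

class MergeDemo {
    // Merges two sorted arrays a and b into a new sorted array c.
    static int[] merge(int[] a, int[] b) {
        int[] c = new int[a.length + b.length];
        int actr = 0, bctr = 0, cctr = 0;

        // Copy the smaller front element and advance the matching counters.
        while (actr < a.length && bctr < b.length) {
            if (a[actr] <= b[bctr])
                c[cctr++] = a[actr++];
            else
                c[cctr++] = b[bctr++];
        }

        // One input is exhausted; copy the rest of the other.
        while (actr < a.length)
            c[cctr++] = a[actr++];
        while (bctr < b.length)
            c[cctr++] = b[bctr++];

        return c;
    }

    public static void main(String[] args) {
        int[] a = {1, 13, 24, 26};
        int[] b = {2, 15, 27, 38};
        System.out.println(Arrays.toString(merge(a, b)));
        // [1, 2, 13, 15, 24, 26, 27, 38]
    }
}
```

Each comparison places one element into C, which is why the merge is linear in the total number of elements.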
[Figure: step-by-step merge of arrays A and B into C, showing the counters Actr, Bctr, and Cctr]

Mergesort
If N > 1, recursively mergesort the first half and the second half, then merge the two sorted halves.
If N = 1, there is only one element to sort: this is the base case.

[Figure: Mergesort: divide and conquer]

Analysis
Merging two sorted lists is linear, because at most N-1 comparisons are made.
For N = 1, the time to mergesort is constant.
Otherwise, the time to mergesort N numbers is the time to do two recursive mergesorts of size N/2, plus the linear time to merge:
$T(1) = 1$, $T(N) = 2T(N/2) + N$
Solving the recurrence gives $T(N) = N \log N + N = O(N \log N)$.

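One way to arrive at the closed form $T(N) = N \log N + N$ is to telescope the recurrence. The derivation below assumes that N is a power of 2, takes $T(1) = 1$, and uses $T(N) = 2T(N/2) + N$ as the recurrence for two half-size sorts plus a linear merge; these constants are simplifying assumptions.

```latex
\begin{align*}
T(N) &= 2\,T(N/2) + N \\
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + 1 && \text{(divide both sides by } N\text{)} \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + 1 \\
&\;\;\vdots \\
\frac{T(2)}{2} &= \frac{T(1)}{1} + 1 \\[4pt]
\frac{T(N)}{N} &= \frac{T(1)}{1} + \log N && \text{(add all } \log N \text{ equations; middle terms cancel)} \\
T(N) &= N \log N + N
\end{align*}
```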
public static void mergeSort( Comparable [ ] a )
{
    Comparable [ ] tmpArray = new Comparable[ a.length ];
    mergeSort( a, tmpArray, 0, a.length - 1 );
}

private static void mergeSort( Comparable [ ] a, Comparable [ ] tmpArray,
                               int left, int right )
{
    if( left < right )
    {
        int center = ( left + right ) / 2;
        mergeSort( a, tmpArray, left, center );         // Sort the first half
        mergeSort( a, tmpArray, center + 1, right );    // Sort the second half
        merge( a, tmpArray, left, center + 1, right );  // Merge the two halves
    }
}

private static void merge( Comparable [ ] a, Comparable [ ] tmpArray,
                           int leftPos, int rightPos, int rightEnd )
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;

    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd )
        if( a[ leftPos ].compareTo( a[ rightPos ] ) <= 0 )
            tmpArray[ tmpPos++ ] = a[ leftPos++ ];
        else
            tmpArray[ tmpPos++ ] = a[ rightPos++ ];

    while( leftPos <= leftEnd )    // Copy rest of first half
        tmpArray[ tmpPos++ ] = a[ leftPos++ ];

    while( rightPos <= rightEnd )  // Copy rest of right half
        tmpArray[ tmpPos++ ] = a[ rightPos++ ];

    // Copy tmpArray back
    for( int i = 0; i < numElements; i++, rightEnd-- )
        a[ rightEnd ] = tmpArray[ rightEnd ];
}

Quicksort
A divide-and-conquer recursive algorithm:
1. If the number of elements in S is 0 or 1, then return.
2. Pick any element v in S. This is called the pivot.
3. Partition the remaining elements in S (S - {v}) into two disjoint groups, $S_1$ and $S_2$: $S_1$ contains elements $\le v$, and $S_2$ contains elements $\ge v$.
4. Return {quicksort($S_1$), v, quicksort($S_2$)}.

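The four steps can be followed almost literally with lists, at the cost of extra memory. The sketch below is only an illustration of the recursive structure, not the in-place algorithm; the class name `SimpleQuicksort`, the choice of the middle element as pivot, and the rule for placing elements equal to the pivot are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleQuicksort {
    static List<Integer> quicksort(List<Integer> s) {
        // Step 1: zero or one element is already sorted.
        if (s.size() <= 1)
            return new ArrayList<>(s);

        // Step 2: pick any element as the pivot (here, the middle one).
        int pivotIndex = s.size() / 2;
        int v = s.get(pivotIndex);

        // Step 3: partition S - {v} into s1 (elements <= v) and s2 (elements >= v).
        List<Integer> s1 = new ArrayList<>();
        List<Integer> s2 = new ArrayList<>();
        for (int k = 0; k < s.size(); k++) {
            if (k == pivotIndex) continue;                // skip the pivot itself
            int x = s.get(k);
            if (x < v) s1.add(x);
            else if (x > v) s2.add(x);
            else if (s1.size() <= s2.size()) s1.add(x);   // ties go to the smaller group
            else s2.add(x);
        }

        // Step 4: quicksort(s1), then v, then quicksort(s2).
        List<Integer> result = quicksort(s1);
        result.add(v);
        result.addAll(quicksort(s2));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quicksort(Arrays.asList(13, 81, 92, 43, 65, 31, 57, 26, 75, 0)));
    }
}
```

Because the pivot is removed before partitioning, both groups are strictly smaller than S, so the recursion always terminates.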
[Figure: quicksort example showing the three stages: select pivot, partition, quicksort]

Quicksort vs. Mergesort
In quicksort, the subproblems need not be of equal size.
Quicksort is faster because the partitioning step can be performed in place and very efficiently.

Picking the Pivot
Use the first element as the pivot
– Bad choice
– If the input is presorted or in reverse order, the pivot gives a poor partition because all the elements go into either $S_1$ or $S_2$
– If the input is presorted, quicksort will take quadratic time to do nothing useful
Use the larger of the first two distinct elements
– Also bad, for the same reasons

Picking the Pivot
Choose the pivot randomly
– Generally safe
– Generating random numbers is expensive
– Does not reduce the average running time
Median-of-Three Partitioning
– The best choice would be the median of the array
– A good estimate is the median of the left, right, and center elements

Partitioning Strategy
1. Swap the pivot with the last element.
2. i starts at the first element and j starts at the next-to-last element.
3. Move i right, skipping over elements smaller than the pivot. Move j left, skipping over elements larger than the pivot. Both i and j stop when they encounter an element equal to the pivot.
4. When i and j have stopped, if i is to the left of j, swap their elements.
5. Repeat steps 3 and 4 until i and j cross.
6. Swap the pivot with the element at position i.

[Figure: partitioning example showing the positions of i, j, and the pivot at each step]

Small Arrays
For very small arrays ($N \le 20$), quicksort does not perform as well as insertion sort.
Because quicksort is recursive, these cases occur frequently.
So, for small subarrays, use a sorting algorithm that is efficient on small inputs, such as insertion sort.
A good cutoff is N = 10.

public static void quicksort( Comparable [ ] a )
{
    quicksort( a, 0, a.length - 1 );
}

private static void quicksort( Comparable [ ] a, int left, int right )
{
    if( left + CUTOFF <= right )
    {
        Comparable pivot = median3( a, left, right );

        // Begin partitioning
        int i = left, j = right - 1;
        for( ; ; )
        {
            while( a[ ++i ].compareTo( pivot ) < 0 ) { }
            while( a[ --j ].compareTo( pivot ) > 0 ) { }
            if( i < j )
                swapReferences( a, i, j );
            else
                break;
        }

        swapReferences( a, i, right - 1 );   // Restore pivot

        quicksort( a, left, i - 1 );    // Sort small elements
        quicksort( a, i + 1, right );   // Sort large elements
    }
    else  // Do an insertion sort on the subarray
        insertionSort( a, left, right );
}

private static Comparable median3( Comparable [ ] a, int left, int right )
{
    int center = ( left + right ) / 2;
    // Order a[ left ], a[ center ], and a[ right ]
    if( a[ center ].compareTo( a[ left ] ) < 0 )
        swapReferences( a, left, center );
    if( a[ right ].compareTo( a[ left ] ) < 0 )
        swapReferences( a, left, right );
    if( a[ right ].compareTo( a[ center ] ) < 0 )
        swapReferences( a, center, right );

    // Place pivot at position right - 1
    swapReferences( a, center, right - 1 );
    return a[ right - 1 ];
}

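The quicksort routine relies on a constant `CUTOFF` and on a three-argument `insertionSort( a, left, right )`, neither of which is shown on the slides. A minimal sketch of what they might look like follows; the class name `QuicksortHelpers` is hypothetical, and the value 10 is an assumption based on the cutoff suggested earlier.

```java
import java.util.Arrays;

class QuicksortHelpers {
    // Assumed value, following the suggested cutoff of about 10 elements.
    static final int CUTOFF = 10;

    // Insertion sort restricted to the subarray a[left..right], inclusive.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void insertionSort(Comparable[] a, int left, int right) {
        for (int p = left + 1; p <= right; p++) {
            Comparable tmp = a[p];
            int j;
            // Shift larger elements right, but never past index left.
            for (j = p; j > left && tmp.compareTo(a[j - 1]) < 0; j--)
                a[j] = a[j - 1];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        Integer[] a = {9, 5, 3, 8, 1, 7};
        insertionSort(a, 1, 4);                  // sorts only indices 1 through 4
        System.out.println(Arrays.toString(a));  // [9, 1, 3, 5, 8, 7]
    }
}
```

The only difference from the whole-array version is that the loops are bounded by `left` and `right` instead of 0 and `a.length - 1`.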
Analysis
$T(N) = T(i) + T(N - i - 1) + cN$, where $i = |S_1|$
Worst case (the pivot is always the smallest element, so i = 0):
$T(N) = T(N - 1) + cN$
$T(N) = T(1) + c \sum_{i=2}^{N} i = O(N^2)$

Analysis
Best case (the pivot is always in the middle):
$T(N) = 2T(N/2) + cN$
$T(N) = cN \log N + N = O(N \log N)$
Average case (each size of $S_1$ is equally likely):
$T(N) = \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN$
$T(N) = O(N \log N)$

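The step from the average-case recurrence to $O(N \log N)$ is not obvious. A standard way to fill it in, sketched here under the assumption that the recurrence is $T(N) = \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN$, is to remove the sum by subtracting the equation for N-1 and then telescope:

```latex
\begin{align*}
N\,T(N) &= 2 \sum_{j=0}^{N-1} T(j) + cN^2 \\
(N-1)\,T(N-1) &= 2 \sum_{j=0}^{N-2} T(j) + c(N-1)^2 \\
N\,T(N) - (N-1)\,T(N-1) &= 2\,T(N-1) + 2cN - c && \text{(subtract)} \\
N\,T(N) &= (N+1)\,T(N-1) + 2cN && \text{(drop the insignificant } -c\text{)} \\
\frac{T(N)}{N+1} &= \frac{T(N-1)}{N} + \frac{2c}{N+1} && \text{(divide by } N(N+1)\text{)} \\
\frac{T(N)}{N+1} &= \frac{T(1)}{2} + 2c \sum_{i=3}^{N+1} \frac{1}{i} = O(\log N) && \text{(telescope)} \\
T(N) &= O(N \log N)
\end{align*}
```

The sum $\sum_{i=3}^{N+1} 1/i$ is a partial harmonic sum, which grows like $\ln N$.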
Bucket Sort
Sort N integers in the range 1 to M.
Use M buckets, one bucket for each integer i.
Bucket i stores how many times i appears in the input. Initially, all buckets are empty.
Read the input and increment the corresponding buckets.
Finally, scan the buckets and print the sorted list.

Bucket Sort
Input: 3, 1, 3, 5, 8, 7, 4, 2, 9, 5, 4, 10, 4
Output: 1, 2, 3, 3, 4, 4, 4, 5, 5, 7, 8, 9, 10

Bucket Sort
The input $A_1, A_2, \dots, A_N$ consists of positive integers smaller than M.
Keep an array count[ ] of size M, initialized to all 0s.
When $A_i$ is read, increment count[$A_i$] by 1.
After all the input is read, scan the count array, printing out the sorted list.
This algorithm takes $O(M + N)$ time.

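The counting scheme can be sketched in a few lines. The class name `BucketSortDemo` and the method signature are illustrative only, and the sketch returns the sorted values in a new array instead of printing them. It assumes every input value is a positive integer smaller than `m`.

```java
import java.util.Arrays;

class BucketSortDemo {
    // Sorts positive integers smaller than m using an array of m counters.
    static int[] bucketSort(int[] input, int m) {
        int[] count = new int[m];        // Java initializes the counters to 0

        // Reading phase: count[x] records how many times x appears. O(N)
        for (int x : input)
            count[x]++;

        // Scanning phase: emit each value as many times as it was counted. O(M + N)
        int[] sorted = new int[input.length];
        int k = 0;
        for (int v = 1; v < m; v++)
            for (int c = 0; c < count[v]; c++)
                sorted[k++] = v;
        return sorted;
    }

    public static void main(String[] args) {
        int[] input = {4, 1, 3, 4, 2, 1};
        System.out.println(Arrays.toString(bucketSort(input, 5)));
        // [1, 1, 2, 3, 4, 4]
    }
}
```

No element is ever compared with another, which is why this method is not subject to the lower bound for comparison-based sorting.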