CS 146: Data Structures and Algorithms July 9 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak
Computer Science Dept. Summer 2015: July 9 CS 146: Data Structures and Algorithms © R. Mak

Insertion Sort
One of the simplest and most intuitive sorting algorithms.
It is the way you would manually sort a deck of cards.
Make N–1 passes over the list of data.
At the start of pass p, for p = 1 through N–1, the data in positions 0 through p–1 are already sorted.

Insertion Sort
The inner for loop terminates quickly if the tmp value does not need to be inserted too far into the sorted part.
The entire sort finishes quickly if the data is nearly sorted: O(N).

    public static <AnyType extends Comparable<? super AnyType>>
    void insertionSort(AnyType[] a) {
        int j;

        for (int p = 1; p < a.length; p++) {
            AnyType tmp = a[p];

            // Does this value belong in the sorted part?
            for (j = p; j > 0 && tmp.compareTo(a[j-1]) < 0; j--) {
                // Slide values in the sorted part of the list one to the
                // right to make room for a new member of the sorted part.
                a[j] = a[j-1];
            }

            a[j] = tmp;
        }
    }

Shellsort
Like insertion sort, except we compare values that are h elements apart in the list.
h diminishes after completing a pass, for example, 5, 3, and 1.
The final value of h must be 1, so the final pass is a regular insertion sort.
The previous passes get the array “nearly sorted” quickly.

Shellsort, cont’d
After each pass with increment h_k, the array is said to be h_k-sorted: a[i] ≤ a[i + h_k] for every valid index i.
Examples: 5-sorted, 3-sorted, etc.

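To make the notion of an h-sorted array concrete, here is a minimal sketch that checks the property directly. It assumes int data, and the class and method names are illustrative only, not part of the course code.

```java
class HSortedCheck {
    // Returns true if a[i] <= a[i + h] for every valid i,
    // that is, if the array is h-sorted.
    static boolean isHSorted(int[] a, int h) {
        for (int i = 0; i + h < a.length; i++) {
            if (a[i] > a[i + h]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] data = {35, 17, 11, 28, 12, 41, 75, 15, 96, 58, 81, 94, 95};
        System.out.println(isHSorted(data, 5));  // prints true
        System.out.println(isHSorted(data, 1));  // prints false (35 > 17)
    }
}
```

An array that is 1-sorted is simply sorted, which is why the final Shellsort pass must use h = 1.
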
Shellsort
Use the (suboptimal) sequence for h which starts at half the list length and is halved for each subsequent pass.

    public static <AnyType extends Comparable<? super AnyType>>
    void shellsort(AnyType[] a) {
        int j;

        // h starts at half the list length and is halved after each pass.
        for (int h = a.length/2; h > 0; h /= 2) {
            // Insertion sort on elements that are h apart.
            for (int i = h; i < a.length; i++) {
                AnyType tmp = a[i];

                for (j = i; j >= h && tmp.compareTo(a[j-h]) < 0; j -= h) {
                    a[j] = a[j-h];
                }

                a[j] = tmp;
            }
        }
    }

Insertion Sort vs. Shellsort
Insertion sort is slow because it swaps only adjacent values.
A value may have to travel a long way through the array during a pass, one element at a time, to arrive at its proper place in the sorted part of the array.

Insertion Sort vs. Shellsort, cont’d
Shellsort is able to move a value a longer distance (h) without making the value travel through the intervening values.
Early passes with large h make it easier for later passes with smaller h to sort.
The final pass, with h = 1, is a simple insertion sort.
Choosing a good increment sequence for h can produce a 25% speedup of the sort.

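As one illustration of a different increment sequence, the sketch below uses Knuth's increments h = 3h + 1 (1, 4, 13, 40, ...) instead of repeated halving. The slides do not say which sequence gives the quoted speedup, so this choice is an assumption made only for illustration. It sorts int data, and the class name is hypothetical.

```java
class ShellsortKnuth {
    static void sort(int[] a) {
        // Find the largest increment in 1, 4, 13, 40, ... below a.length / 3.
        int h = 1;
        while (h < a.length / 3) {
            h = 3 * h + 1;
        }

        // Integer division by 3 walks the sequence back down: 40, 13, 4, 1, 0.
        for (; h > 0; h /= 3) {
            // Insertion sort on elements that are h apart.
            for (int i = h; i < a.length; i++) {
                int tmp = a[i];
                int j;
                for (j = i; j >= h && tmp < a[j - h]; j -= h) {
                    a[j] = a[j - h];
                }
                a[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35, 17, 95, 28, 58, 41, 75, 15};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

Only the outer loop differs from the halving version; the inner two loops are the same h-apart insertion sort.
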
Heapsort
Heapsort is based on using a priority queue, which we implement as a binary heap, which we implement using an underlying array.
To sort N values into increasing order:
    Build a min heap: O(N) time.
    Do N deletions to get the values in order.
    Each deletion takes O(log N) time, so the total is O(N log N) time.

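Heapsort, cont’d
But where to put the sorted values?
Each deletion shrinks the heap by one, which frees the last slot of the heap's portion of the underlying array. Put each deleted value into that freed slot.
With a min heap, this leaves the array in decreasing order. To end up in increasing order using only the one array, use a max heap instead.
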
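A minimal in-place sketch of this idea, assuming int data and a 0-based array. It uses a max heap rather than a min heap, so that moving each deleted maximum into the slot freed at the end of the heap leaves the array in increasing order. The class and method names are illustrative only.

```java
class HeapsortSketch {
    static void heapsort(int[] a) {
        int n = a.length;

        // Build a max heap in O(N) by percolating down every non-leaf node.
        for (int i = n / 2 - 1; i >= 0; i--) {
            percolateDown(a, i, n);
        }

        // Repeatedly delete the maximum: swap it into the slot just past
        // the shrinking heap, then restore heap order among the rest.
        for (int end = n - 1; end > 0; end--) {
            int max = a[0];
            a[0] = a[end];
            a[end] = max;
            percolateDown(a, 0, end);
        }
    }

    // Restores heap order for the subtree rooted at i within a[0..n-1].
    static void percolateDown(int[] a, int i, int n) {
        int tmp = a[i];
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;                       // left child
            if (child + 1 < n && a[child + 1] > a[child]) {
                child++;                                 // right child is larger
            }
            if (a[child] > tmp) {
                a[i] = a[child];
                i = child;
            } else {
                break;
            }
        }
        a[i] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {31, 41, 59, 26, 53, 58, 97};
        heapsort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```
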
Mergesort
Divide and conquer!
Divide: Split the list of values into two halves. Recursively sort each of the two halves.
Conquer: Merge the two sorted sublists back into a single sorted list.
Mergesort uses nearly the optimal number of comparisons.

Mergesort

    public static <AnyType extends Comparable<? super AnyType>>
    void mergeSort(AnyType[] a) {
        // One temporary array is shared by all the recursive calls.
        AnyType[] tmpArray = (AnyType[]) new Comparable[a.length];
        mergeSort(a, tmpArray, 0, a.length - 1);
    }

    private static <AnyType extends Comparable<? super AnyType>>
    void mergeSort(AnyType[] a, AnyType[] tmpArray, int left, int right) {
        if (left < right) {
            int center = (left + right)/2;
            mergeSort(a, tmpArray, left, center);       // sort the first half
            mergeSort(a, tmpArray, center+1, right);    // sort the second half
            merge(a, tmpArray, left, center+1, right);  // merge the two halves
        }
    }

Mergesort

    private static <AnyType extends Comparable<? super AnyType>>
    void merge(AnyType[] a, AnyType[] tmpArray,
               int leftPos, int rightPos, int rightEnd) {
        int leftEnd = rightPos - 1;
        int tmpPos = leftPos;
        int numElements = rightEnd - leftPos + 1;

        // Do the merge.
        while (leftPos <= leftEnd && rightPos <= rightEnd) {
            if (a[leftPos].compareTo(a[rightPos]) <= 0) {
                tmpArray[tmpPos++] = a[leftPos++];
            } else {
                tmpArray[tmpPos++] = a[rightPos++];
            }
        }

        // Copy the rest of the first half.
        while (leftPos <= leftEnd) {
            tmpArray[tmpPos++] = a[leftPos++];
        }

        // Copy the rest of the second half.
        while (rightPos <= rightEnd) {
            tmpArray[tmpPos++] = a[rightPos++];
        }

        // Copy from the temporary array back into the original.
        for (int i = 0; i < numElements; i++, rightEnd--) {
            a[rightEnd] = tmpArray[rightEnd];
        }
    }

Analysis of Mergesort
How long does it take mergesort to run?
Let T(N) be the time to sort N values.
It takes constant time 1 if N = 1.
It takes T(N/2) to sort each half, plus N to do the merge.
Therefore, we have a recurrence relation:

$$T(N) = \begin{cases} 1 & \text{if } N = 1 \\ 2T(N/2) + N & \text{if } N > 1 \end{cases}$$

Analysis of Mergesort
Solve:

$$T(N) = \begin{cases} 1 & \text{if } N = 1 \\ 2T(N/2) + N & \text{if } N > 1 \end{cases}$$

Assume N is a power of 2. Divide both sides by N:

$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + 1$$

Telescope: since the equation is valid for any N that’s a power of 2, successively replace N by N/2:

$$\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + 1$$

$$\frac{T(N/4)}{N/4} = \frac{T(N/8)}{N/8} + 1$$

$$\vdots$$

$$\frac{T(2)}{2} = \frac{T(1)}{1} + 1$$

Add these equations together, and many convenient cancellations will occur.

Analysis of Mergesort
After the cancellations, the sum of the telescoped equations is

$$\frac{T(N)}{N} = \frac{T(1)}{1} + \log N$$

since there are log N 1’s on the right-hand side (log base 2). Multiply through by N, using T(1) = 1:

$$T(N) = N \log N + N$$

And so mergesort runs in O(N log N) time.

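As a sanity check on the analysis, the sketch below evaluates the recurrence T(1) = 1, T(N) = 2T(N/2) + N directly and compares it with N log N + N for powers of 2. The class and method names are illustrative only.

```java
class MergesortRecurrence {
    // Evaluates T(1) = 1, T(N) = 2T(N/2) + N for N a power of 2.
    static long t(long n) {
        return (n == 1) ? 1 : 2 * t(n / 2) + n;
    }

    // Closed form N log N + N, with log taken base 2.
    static long closedForm(long n) {
        // For a power of 2, the number of trailing zero bits is exactly log2(n).
        long log2 = Long.numberOfTrailingZeros(n);
        return n * log2 + n;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            System.out.println(n + ": " + t(n) + " vs " + closedForm(n));
        }
    }
}
```

For example, T(8) = 2T(4) + 8 = 2(12) + 8 = 32, and 8 log 8 + 8 = 24 + 8 = 32.
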
Mergesort for Linked Lists
Mergesort does not rely on random access to the values in the list.
Therefore, it is well-suited for sorting linked lists.

Mergesort for Linked Lists, cont’d
How do we split a linked list into two sublists?
Splitting it at the midpoint is not efficient, because finding the midpoint requires walking the list.
Idea: Iterate down the list and assign the nodes alternately to the two sublists.
Merging two sorted sublists should be easy.

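The alternating split and the merge can be sketched as follows. This is a minimal illustration, assuming a hand-rolled singly linked node holding an int; the Node class and all method names are hypothetical, not part of the course code.

```java
class ListMergesort {
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    static Node sort(Node head) {
        if (head == null || head.next == null) {
            return head;                       // 0 or 1 nodes: already sorted
        }

        // Split: deal the nodes alternately onto two sublists.
        // Pushing onto the front reverses each sublist, which is harmless
        // here because the sublists are about to be sorted anyway
        // (it does mean this sketch is not a stable sort).
        Node a = null, b = null;
        boolean toA = true;
        while (head != null) {
            Node next = head.next;
            if (toA) { head.next = a; a = head; }
            else     { head.next = b; b = head; }
            toA = !toA;
            head = next;
        }

        return merge(sort(a), sort(b));
    }

    // Merges two sorted lists by relinking nodes; no extra nodes are
    // allocated apart from one temporary dummy header.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0, null);
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) { tail.next = a; a = a.next; }
            else                    { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;       // append whatever is left
        return dummy.next;
    }

    static Node fromArray(int... values) {
        Node head = null;
        for (int k = values.length - 1; k >= 0; k--) {
            head = new Node(values[k], head);
        }
        return head;
    }

    static String format(Node head) {
        StringBuilder sb = new StringBuilder("[");
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.value);
            if (n.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(format(sort(fromArray(5, 2, 9, 1, 5, 6))));
    }
}
```

Unlike the array version, no temporary array is needed: the merge only changes next links.
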
Break

Partitioning a List of Values
Are there better ways to partition (split) a list of values other than down the middle?

Partitioning a List of Values, cont’d
Pick an arbitrary “pivot value” in the list.
Move all the values less than the pivot value into one sublist.
Move all the values greater than the pivot value into the other sublist.
Now the pivot value is in its “final resting place”: it’s in the correct position for the sorted list.
Recursively sort the two sublists. The pivot value doesn’t move.
Challenge: Find a good pivot value.

Source: Mark Allen Weiss, Data Structures and Algorithms in Java. (c) 2006 Pearson Education, Inc. All rights reserved.

Partition a List Using a Pivot
Given a list, pick an element to be the pivot.
There are various strategies to pick the pivot. The simplest is to pick the first element of the list.
First get the chosen pivot value “out of the way” by swapping it with the value currently at the right end.

Partition a List Using a Pivot, cont’d
Goal: Move all values < pivot to the left part of the list and all values > pivot to the right part of the list.

Partition a List Using a Pivot, cont’d
Set index i to the left end of the list and index j to one from the right end.
While i < j:
    Move i right, skipping over values < pivot. Stop i when it reaches a value ≥ pivot.
    Move j left, skipping over values > pivot. Stop j when it reaches a value ≤ pivot.
    After both i and j have stopped, swap the values at i and j if i < j.

Partition a List Using a Pivot, cont’d
[Figure: step-by-step partitioning of an example list, showing the positions of i and j at each step.]
    Move j.
    Swap.
    Move i and j.
    Swap.
    Move i and j. They’ve crossed!
    Swap the pivot with the i-th element.
Now the list is properly partitioned for quicksort!

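The partitioning steps can be sketched as a small self-contained method. This is an illustration only: it assumes int data and the first-element pivot, and the class and method names are hypothetical. The example values in main are made up for the demonstration and are not the ones from the slide's figure.

```java
class PartitionDemo {
    // Partitions a[left..right] around the value initially at a[left].
    // Returns the final index of the pivot.
    static int partition(int[] a, int left, int right) {
        swap(a, left, right);              // get the pivot out of the way
        int pivot = a[right];

        int i = left - 1;
        int j = right;
        while (i < j) {
            // Move i right; it stops at a value >= pivot
            // (at the latest at a[right], which holds the pivot).
            do { i++; } while (a[i] < pivot);
            // Move j left; it stops at a value <= pivot or runs off the left end.
            do { j--; } while (j >= left && a[j] > pivot);
            if (i < j) {
                swap(a, i, j);
            }
        }

        swap(a, i, right);                 // put the pivot in its final place
        return i;
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 4, 9, 0, 3, 5, 2, 7, 8};
        int p = partition(a, 0, a.length - 1);
        System.out.println(p);                               // prints 6
        System.out.println(java.util.Arrays.toString(a));    // prints [2, 1, 4, 5, 0, 3, 6, 8, 7, 9]
    }
}
```

After the call, every value to the left of index 6 is less than the pivot 6 and every value to the right is greater, but neither side is sorted yet.
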
Sorting Statistics

    public class Stats {
        long moves;
        long compares;
        long time;
        ...
    }

Quicksort
A fast divide-and-conquer sorting algorithm.
A very tight and highly optimized inner loop.
Looks like magic in animation.
Average running time is O(N log N).
Worst-case running time is O(N²). The worst case can be made very unlikely to occur.
Basic idea:
    Partition the list using a pivot.
    Recursively sort the two sublists.
Sounds like mergesort, but does not require merging or a temporary array.
One of the most elegant and useful algorithms in computer science.

Quicksort Pivot Strategy
Quicksort is a fragile algorithm!
    It is sensitive to picking a good pivot.
    Attempts to improve the algorithm can break it.
Simplest pivot strategy: Pick the first element of the list.
    This is the worst strategy if the list is already sorted: the running time is O(N²).

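The effect of the first-element pivot on sorted input can be seen by counting comparisons. The sketch below is a simplified, self-contained quicksort on int data written for this illustration; the class name and the counting convention (one count per index step) are assumptions, not the course's demo program.

```java
class PivotFirstWorstCase {
    static long compares;

    // Sorts a in place and returns the number of index steps counted.
    static long countCompares(int[] a) {
        compares = 0;
        quicksort(a, 0, a.length - 1);
        return compares;
    }

    static void quicksort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        swap(a, left, right);          // first element is the pivot; park it at the right end
        int pivot = a[right];

        int i = left - 1;
        int j = right;
        while (i < j) {
            do { i++; compares++; } while (a[i] < pivot);
            do { j--; compares++; } while (j >= left && a[j] > pivot);
            if (i < j) {
                swap(a, i, j);
            }
        }
        swap(a, i, right);             // restore the pivot's position

        quicksort(a, left, i - 1);
        quicksort(a, i + 1, right);
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) {
            sorted[k] = k;
        }

        // Fisher-Yates shuffle with a fixed seed so runs are repeatable.
        int[] shuffled = sorted.clone();
        java.util.Random rng = new java.util.Random(42);
        for (int k = n - 1; k > 0; k--) {
            swap(shuffled, k, rng.nextInt(k + 1));
        }

        System.out.println("sorted input:   " + countCompares(sorted));
        System.out.println("shuffled input: " + countCompares(shuffled));
    }
}
```

On sorted input every partition puts the pivot at the left end, so each recursive call shrinks the sublist by only one element and the count grows roughly as N²/2. On shuffled input the count stays close to N log N.
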
First Element Pivot Strategy

    public interface PivotStrategy {
        public Integer choosePivot(Integer[] a, int left, int right, Stats stats);
    }

    public class PivotFirst implements PivotStrategy {
        public Integer choosePivot(Integer[] a, int left, int right, Stats stats) {
            // The pivot is the first element. Swap it with the rightmost element.
            Utilities.swapReferences(a, left, right);
            stats.moves += 2;

            return a[right];
        }
    }

Demo

Median-of-Three Pivot Strategy
A good pivot value would be the median value of the list.
The median of a list of unsorted numbers is nontrivial to compute.
Compromise: Examine the two values at the ends of the list and the value at the middle position of the list.
Choose the value that’s in between the other two.

Median-of-Three Pivot Strategy, cont’d

    public class PivotMedianOfThree implements PivotStrategy {
        public Integer choosePivot(Integer[] a, int left, int right, Stats stats) {
            int center = (left + right)/2;

            // Order the left, center, and right elements.
            if (a[center].compareTo(a[left]) < 0) {
                Utilities.swapReferences(a, left, center);
                stats.moves += 2;
            }
            if (a[right].compareTo(a[left]) < 0) {
                Utilities.swapReferences(a, left, right);
                stats.moves += 2;
            }
            if (a[right].compareTo(a[center]) < 0) {
                Utilities.swapReferences(a, center, right);
                stats.moves += 2;
            }
            stats.compares += 3;

            // The pivot is the center element. Swap it with the rightmost element.
            Utilities.swapReferences(a, center, right);
            stats.moves += 2;

            return a[right];
        }
    }

Quicksort Recursion

    private Stats quicksort(Integer[] a, int left, int right) {
        Stats stats = new Stats();

        if (left <= right) {
            Integer pivot = pivotStrategy.choosePivot(a, left, right, stats);
            int p = partition(a, left, right, pivot, stats);

            Stats stats1 = quicksort(a, left, p-1);   // Sort small elements
            Stats stats2 = quicksort(a, p+1, right);  // Sort large elements

            stats.moves    += (stats1.moves    + stats2.moves);
            stats.compares += (stats1.compares + stats2.compares);
        }

        return stats;
    }

Quicksort Partitioning

    private int partition(Integer[] a, int left, int right,
                          Integer pivot, Stats stats) {
        int i = left-1;
        int j = right;

        while (i < j) {
            // Move i to the right.
            do {
                i++;
                stats.compares++;
            } while ((i <= right) && a[i].compareTo(pivot) < 0);

            // Move j to the left.
            do {
                j--;
                stats.compares++;
            } while ((j >= left) && a[j].compareTo(pivot) > 0);

            // Swap.
            if (i < j) {
                Utilities.swapReferences(a, i, j);
                stats.moves += 2;
            }
        }

        // Restore the pivot’s position.
        Utilities.swapReferences(a, i, right);
        stats.moves += 2;

        return i;
    }

Mergesort vs. Quicksort
In the standard Java library:
    Mergesort is used to sort arrays of object types.
        It uses the lowest number of comparisons.
        Comparing objects can be slow in Java for objects that implement the Comparable interface.
    Quicksort is used to sort arrays of primitive types.
In the standard C++ library:
    Quicksort is used for the generic sort.
        Copying large objects can be expensive.
        Comparing objects can be cheap if the compiler can generate optimized code to do comparisons inline.

Quicksort
Quicksort doesn’t do well for very short lists.
When a sublist becomes too small, use another algorithm, such as insertion sort, to sort the sublist.
The textbook uses a cutoff of size 10 for a sublist.

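A minimal sketch of the cutoff idea, assuming int data, a median-of-three pivot, and the cutoff of 10 mentioned above. The class name, the CUTOFF constant, and the helper names are illustrative only and this is not the textbook's code.

```java
class QuicksortWithCutoff {
    static final int CUTOFF = 10;   // sublists of this size or smaller use insertion sort

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    static void quicksort(int[] a, int left, int right) {
        if (right - left + 1 <= CUTOFF) {
            insertionSort(a, left, right);
            return;
        }

        // Median of three: order a[left], a[center], a[right],
        // then park the median at the right end as the pivot.
        int center = left + (right - left) / 2;
        if (a[center] < a[left])  swap(a, left, center);
        if (a[right] < a[left])   swap(a, left, right);
        if (a[right] < a[center]) swap(a, center, right);
        swap(a, center, right);
        int pivot = a[right];

        int i = left - 1;
        int j = right;
        while (i < j) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (j >= left && a[j] > pivot);
            if (i < j) {
                swap(a, i, j);
            }
        }
        swap(a, i, right);          // restore the pivot's position

        quicksort(a, left, i - 1);
        quicksort(a, i + 1, right);
    }

    // Insertion sort restricted to a[left..right].
    static void insertionSort(int[] a, int left, int right) {
        for (int p = left + 1; p <= right; p++) {
            int tmp = a[p];
            int j;
            for (j = p; j > left && tmp < a[j - 1]; j--) {
                a[j] = a[j - 1];
            }
            a[j] = tmp;
        }
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {8, 1, 4, 9, 0, 3, 5, 2, 7, 6, 13, 11, 15, 12, 10, 14};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The cutoff also removes the need to special-case sublists of one or two elements in the pivot selection, since those never reach the partitioning code.
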
Sorting Animations
omparisonSort.html
omparisonSort.html