Week 14 - Monday
What did we talk about last time? Heaps, priority queues, and heapsort.
Heapsort pros: best, worst, and average case running time of O(n log n); in-place; good for arrays. Cons: not adaptive; not stable.
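Heapsort works in two phases. First, make the array have the heap property:
1. Let i be the index of the parent of the last node (index (n - 2) / 2 for a 0-based array of length n)
2. Bubble the value at index i down if needed
3. Decrement i
4. If i is not less than zero, go to Step 2
Then, repeatedly move the largest value in the heap to the end of the array:
1. Let pos be the index of the last element in the array
2. Swap index 0 with index pos
3. Bubble down index 0, treating only indices 0 through pos - 1 as the heap
4. Decrement pos
5. If pos is greater than zero, go to Step 2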
Heapsort is a clever algorithm that uses part of the array to store the heap and the rest to store the (growing) sorted array. Even though a priority queue uses both bubble up and bubble down to manage the heap, heapsort only needs bubble down. You don't need bubble up because nothing is added to the heap; values are only removed.
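Here is a minimal Java sketch of the two heapsort phases, assuming an int array and a 0-based max-heap where the children of index i sit at 2i + 1 and 2i + 2. The class and method names (HeapSort, bubbleDown, heapsort) are hypothetical. Notice that bubble down is the only heap operation it needs.

```java
import java.util.Arrays;

// Minimal heapsort sketch (hypothetical names), using a 0-based max-heap:
// the children of index i are at 2i + 1 and 2i + 2.
class HeapSort {
    // Bubble the value at index i down, treating only indices 0..size-1 as the heap.
    static void bubbleDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return; // heap property holds at i
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    static void heapsort(int[] a) {
        int n = a.length;
        // Phase 1: from the parent of the last node down to index 0, bubble down.
        for (int i = (n - 2) / 2; i >= 0; i--) {
            bubbleDown(a, i, n);
        }
        // Phase 2: swap the max into position pos, then restore the heap in 0..pos-1.
        for (int pos = n - 1; pos > 0; pos--) {
            int tmp = a[0];
            a[0] = a[pos];
            a[pos] = tmp;
            bubbleDown(a, 0, pos);
        }
    }

    public static void main(String[] args) {
        int[] data = {58, 7, 91, 34, 18, 45, 24};
        heapsort(data);
        System.out.println(Arrays.toString(data)); // [7, 18, 24, 34, 45, 58, 91]
    }
}
```

The sorted region grows from the right end of the array while the heap shrinks on the left, which is why no extra array is needed.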
Timsort is a recently developed sorting algorithm used as the default sort in Python. It is also used to sort non-primitive (object) arrays in Java. It's a hybrid sort, combining elements of merge sort and insertion sort. Features: worst case and average case running time of O(n log n); best case running time of O(n), for example on data that is already sorted; stable; adaptive; not in-place.
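Java's own library gives an easy way to see stability in action. The Javadoc for `Arrays.sort` on object arrays guarantees a stable sort, and OpenJDK implements it with TimSort (primitive arrays use a different algorithm). The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Demonstrates that Arrays.sort on an object array is stable (hypothetical class name).
class StableObjectSort {
    // Sort by length only; words of equal length must keep their original order.
    static String[] sortByLength(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "fig", "plum", "kiwi", "yam", "date"};
        System.out.println(Arrays.toString(sortByLength(words))); // [fig, yam, pear, plum, kiwi, date]
    }
}
```

Because the comparator looks only at length, an unstable sort would be free to shuffle "pear", "plum", "kiwi", and "date"; the stable sort keeps them in their input order.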
Timsort handles short stretches of data much like when we insertion sorted arrays of length 10 or smaller. It also looks for "runs" of data of two kinds:
Non-decreasing: 34, 45, 58, 58, 91
Strictly decreasing: 85, 67, 24, 18, 7
These runs are already sorted (or only need a reversal). Decreasing runs must be strictly decreasing so that reversing them never reorders equal values, which keeps the sort stable. If a run is shorter than a minimum run length determined by the algorithm, the next few values are added to it and the extended run is sorted with insertion sort. Finally, the sorted runs are merged together. When merging two runs, the algorithm can use a specially tuned galloping mode: it copies values in bulk from one run when it detects that it won't need anything from the other run for a while.
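To make runs, run extension, and merging concrete, here is a heavily simplified Java sketch in the spirit of Timsort. It is not real Timsort. Assumptions: a fixed, hypothetical `MIN_RUN` of 32 (real Timsort computes its minimum run length from n), plain insertion sort for extending runs, and simple pairwise merging with no galloping mode and none of Timsort's rules for deciding which runs to merge. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified Timsort-style sketch (NOT real Timsort): run detection, run extension
// with insertion sort, and stable merging. No galloping, no merge-stack invariants.
class SimpleTimsort {
    // Hypothetical fixed minimum run length; real Timsort computes one from n.
    static final int MIN_RUN = 32;

    static void sort(int[] a) {
        int n = a.length;
        List<int[]> runs = new ArrayList<>(); // each run is {start, end}, end exclusive
        int start = 0;
        while (start < n) {
            int end = start + 1;
            if (end < n && a[end] < a[start]) {
                // Strictly decreasing run: find its end, then reverse it in place.
                while (end < n && a[end] < a[end - 1]) end++;
                reverse(a, start, end);
            } else {
                // Non-decreasing run.
                while (end < n && a[end] >= a[end - 1]) end++;
            }
            // Too short? Pull in the next few values and insertion sort the whole run.
            int forced = Math.min(n, start + MIN_RUN);
            if (end < forced) {
                insertionSort(a, start, forced);
                end = forced;
            }
            runs.add(new int[] {start, end});
            start = end;
        }
        // Merge neighboring runs pairwise until a single sorted run remains.
        while (runs.size() > 1) {
            List<int[]> next = new ArrayList<>();
            for (int i = 0; i + 1 < runs.size(); i += 2) {
                int lo = runs.get(i)[0], mid = runs.get(i)[1], hi = runs.get(i + 1)[1];
                merge(a, lo, mid, hi);
                next.add(new int[] {lo, hi});
            }
            if (runs.size() % 2 == 1) next.add(runs.get(runs.size() - 1));
            runs = next;
        }
    }

    static void reverse(int[] a, int lo, int hi) {
        for (int i = lo, j = hi - 1; i < j; i++, j--) {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    // Stable insertion sort of a[lo..hi).
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int key = a[i];
            int j = i;
            while (j > lo && a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    // Stable merge of sorted ranges a[lo..mid) and a[mid..hi).
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            // On ties, take from the left run so equal values keep their order.
            if (left[i] <= a[j]) a[k++] = left[i++];
            else a[k++] = a[j++];
        }
        while (i < left.length) a[k++] = left[i++];
        // Leftover right-run values are already in their final positions.
    }

    public static void main(String[] args) {
        int[] data = {34, 45, 58, 58, 91, 85, 67, 24, 18, 7};
        sort(data);
        System.out.println(Arrays.toString(data)); // [7, 18, 24, 34, 45, 58, 58, 67, 85, 91]
    }
}
```

With plain ints, stability is not observable, but each step (strict decreasing runs, stable insertion sort, left-biased merge) is the stable choice.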
Implementing Timsort in class might be useful, but it has a lot of special cases. It was developed both from a theoretical perspective and with a lot of testing. If you want to know more, read here: Development/Algorithms/Timsort-Sorting-Algorithm/
Understanding how sorts work can be challenging, and it is not obvious how running time is affected by different algorithms and data sets. To help, there are many good visualizations of sorting algorithms in action: risonSort.html
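Let's focus on an unusual sort that lets us (potentially) get better performance than O(n log n). But, I thought O(n log n) was the best possible! It is, but only for comparison-based sorts, which learn about the order of the data only by comparing pairs of elements. Counting sort never compares elements to each other, so that lower bound doesn't apply to it.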
Use counting sort when you know that your data is in a narrow range, such as the numbers between 1 and 10 or even between 1 and 100. As long as the range of possible values is roughly the length of your list or smaller, counting sort can do well. Example: 150 students with integer grades between 1 and 100. Counting sort doesn't work for sorting double or String values.
Make an array with enough elements to hold a count for every possible value in your range. If you need 1 – 100, make an array with length 100, where value v is counted at index v - 1. Sweep through your original list of numbers; when you see a particular value, increment the count at the corresponding index of the values array. To get your final sorted list, sweep through the values array and, for every entry with count k > 0, output the value that index represents k times.
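A minimal Java sketch of counting sort, assuming int values known to lie in a range [min, max] that the caller supplies. The names (CountingSort, countingSort, min, max) are hypothetical, and instead of printing it writes the values into a new array. Value v is counted at index v - min, so for the range 1 – 100 that is index v - 1.

```java
import java.util.Arrays;

// Minimal counting sort sketch (hypothetical names) for ints known to lie in [min, max].
class CountingSort {
    static int[] countingSort(int[] data, int min, int max) {
        int[] counts = new int[max - min + 1]; // one slot per possible value
        for (int v : data) {
            counts[v - min]++; // value v is tallied at index v - min
        }
        int[] result = new int[data.length];
        int k = 0;
        for (int index = 0; index < counts.length; index++) {
            // Write the value for this index once for every time it was seen.
            for (int c = 0; c < counts[index]; c++) {
                result[k++] = index + min;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] grades = {7, 3, 10, 3, 1, 7, 7};
        System.out.println(Arrays.toString(countingSort(grades, 1, 10))); // [1, 3, 3, 7, 7, 7, 10]
    }
}
```

A value outside [min, max] would index past the counts array and throw an exception, which is the sketch's way of enforcing the "known narrow range" requirement.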
We know our values will be in the range [1, 10].
[Figure: the example array, the values array, and the resulting sorted array]
It takes O(n) time to scan through the original array. But now we also have to take into account the number of values we expect, so let's say we have m possible values. It takes O(m) time to scan back through the values array, plus O(n) additional writes to fill in the original array.
Time: O(n + m)
When m is at most a constant times n, as in the grades example, this is O(n), which beats O(n log n).
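For the grades example (n = 150 students, m = 100 possible grades), a rough step count, ignoring constant factors, compares like this:

```latex
\begin{aligned}
n + m &= 150 + 100 = 250 \\
n \log_2 n &= 150 \log_2 150 \approx 150 \times 7.23 \approx 1084
\end{aligned}
```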
Radix sort "generalizes" counting sort somewhat. Instead of looking at the value as a whole, we look at individual digits (or even individual characters). For decimal numbers, we only need 10 buckets (0 – 9). First, we bucket everything based on the least significant digit, then the second least significant digit, and so on. The book discusses MSD and LSD string sorts, which are similar.
Radix sort pros: best, worst, and average case running time of O(nk), where k is the number of digits; stable for the least significant digit (LSD) version; simple implementation. Cons: requires a fixed number of digits to be checked; can be unstable for the most significant digit (MSD) version, depending on the implementation; works poorly for floating point and non-digit based keys; not in-place.
For integers, make 10 buckets (0 – 9), each containing a queue. Starting with the 1’s place and going up one place each time, until every digit of the largest value has been used:
1. Enqueue each item into the bucket whose number matches the item's digit in the current place
2. Starting with bucket 0 and going up to bucket 9, dequeue all the items back into the original array
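A minimal Java sketch of LSD radix sort with ten queues, assuming non-negative int values (negative numbers would need extra handling). The class and method names are hypothetical, and `ArrayDeque` plays the role of each bucket's queue. Because each bucket is first-in, first-out, items with the same digit keep their relative order from the previous pass, which is what makes the LSD version stable and lets each pass build on the last.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal LSD radix sort sketch (hypothetical names); assumes non-negative ints.
class RadixSort {
    static void radixSort(int[] a) {
        // Ten buckets, one per decimal digit, each holding a FIFO queue.
        List<ArrayDeque<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) buckets.add(new ArrayDeque<>());

        int max = 0;
        for (int v : a) max = Math.max(max, v);

        // Start at the 1's place; stop once every digit of the largest value is used.
        // place is a long so multiplying by 10 cannot overflow.
        for (long place = 1; max / place > 0; place *= 10) {
            for (int v : a) {
                int digit = (int) ((v / place) % 10);
                buckets.get(digit).addLast(v); // enqueue
            }
            int k = 0;
            for (ArrayDeque<Integer> bucket : buckets) {
                while (!bucket.isEmpty()) a[k++] = bucket.removeFirst(); // dequeue
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(data);
        System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```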
Next time: Tries
Work on Project 4. Work on Assignment 7, due when you return from Thanksgiving. Read section 5.2. Office hours are canceled today because of a visiting faculty candidate.