Computer Science 112
Fundamentals of Programming II
Finding Faster Algorithms

Bubble Sort Strategy
Compare the first two items and, if they are out of order, exchange them.
Repeat this process for the second and third items, and so on.
At the end of this pass, the largest item will have bubbled down to the end of the list.
Repeat the process for the unsorted portion of the list until only one item remains.

Formalize the Strategy
    set n to the length of the list
    while n > 1
        bubble the elements from position 0 to position n - 1
        decrement n

Refine the Strategy
    set n to the length of the list
    while n > 1
        for each position i from 1 to n - 1
            if the elements at i and i - 1 are out of order
                swap them
        decrement n

Implement bubbleSort
    # swap(lyst, i, j) is a helper, assumed to be defined elsewhere,
    # that exchanges the items at positions i and j.
    def bubbleSort(lyst):
        n = len(lyst)
        while n > 1:                        # Do n - 1 bubbles
            for i in range(1, n):           # Start each bubble at position 1
                if lyst[i] < lyst[i - 1]:
                    swap(lyst, i, i - 1)    # Swap if needed
            n -= 1
Analysis: How many iterations does the outer loop perform? How many iterations does the inner loop perform?
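
One way to explore the analysis questions is to instrument the sort with counters. The sketch below is only an illustration: `bubbleSortCounting` and the `swap` helper are names assumed here, not part of the slides.

```python
def swap(lyst, i, j):
    """Exchange the items at positions i and j (assumed helper)."""
    lyst[i], lyst[j] = lyst[j], lyst[i]


def bubbleSortCounting(lyst):
    """Bubble sort that returns (outer iterations, comparisons)."""
    n = len(lyst)
    outer = 0
    comparisons = 0
    while n > 1:
        outer += 1
        for i in range(1, n):
            comparisons += 1              # One comparison per inner iteration
            if lyst[i] < lyst[i - 1]:
                swap(lyst, i, i - 1)
        n -= 1
    return outer, comparisons


data = [5, 3, 8, 1, 9, 2]
outer, comparisons = bubbleSortCounting(data)
print(data, outer, comparisons)
```

For a list of length n, the outer loop runs n - 1 times and the inner loop runs (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 times in total, which is why this version is O(n^2) regardless of the data.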

Improving bubbleSort
    def bubbleSort(lyst):
        n = len(lyst)
        while n > 1:
            isSorted = True                 # Assume sorted until a swap occurs
            for i in range(1, n):           # Start at 1 so that i - 1 is valid
                if lyst[i] < lyst[i - 1]:
                    swap(lyst, i, i - 1)
                    isSorted = False
            if isSorted:                    # No swaps in this pass: done
                break
            n -= 1
Analysis: Best, worst, average cases?
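
To get a feel for the best and worst cases, the early-exit version can be run on an already sorted list and on a reversed list while counting comparisons. This is a sketch under the assumption that comparisons are the unit of work; `bubbleSortEarlyExit` is a name chosen here for illustration.

```python
def bubbleSortEarlyExit(lyst):
    """Bubble sort with an early exit; returns the number of comparisons."""
    n = len(lyst)
    comparisons = 0
    while n > 1:
        isSorted = True
        for i in range(1, n):
            comparisons += 1
            if lyst[i] < lyst[i - 1]:
                lyst[i], lyst[i - 1] = lyst[i - 1], lyst[i]
                isSorted = False
        if isSorted:                      # A pass with no swaps ends the sort
            break
        n -= 1
    return comparisons


size = 8
best = bubbleSortEarlyExit(list(range(size)))           # Already sorted
worst = bubbleSortEarlyExit(list(range(size, 0, -1)))   # Reverse order
print(best, worst)
```

On sorted input a single pass of n - 1 comparisons suffices, so the best case is linear; on reversed input every pass swaps, so the count is still n(n - 1)/2.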

Example: Exponentiation
Recursive definition:
    b^n = 1, when n = 0
    b^n = b * b^(n-1), otherwise

    def ourPow(base, expo):
        if expo == 0:
            return 1
        else:
            return base * ourPow(base, expo - 1)
What is the best case performance? Worst case? Average case?

Faster Exponentiation
Recursive definition:
    b^n = 1, when n = 0
    b^n = b * b^(n-1), when n is odd
    b^n = (b^(n/2))^2, when n is even

    def fastPow(base, expo):
        if expo == 0:
            return 1
        elif expo % 2 == 1:                       # Odd exponent
            return base * fastPow(base, expo - 1)
        else:                                     # Even exponent: halve it
            result = fastPow(base, expo // 2)
            return result * result
What is the best case performance? Worst case? Average case?
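
A quick way to compare the two exponentiation strategies is to count recursive calls. The sketch below mirrors the two recursive definitions but returns a (result, calls) pair; the names `ourPowCalls` and `fastPowCalls` are assumptions made for this illustration.

```python
def ourPowCalls(base, expo):
    """Linear version: returns (base ** expo, number of calls)."""
    if expo == 0:
        return 1, 1
    result, calls = ourPowCalls(base, expo - 1)
    return base * result, calls + 1


def fastPowCalls(base, expo):
    """Halving version: returns (base ** expo, number of calls)."""
    if expo == 0:
        return 1, 1
    elif expo % 2 == 1:
        result, calls = fastPowCalls(base, expo - 1)
        return base * result, calls + 1
    else:
        result, calls = fastPowCalls(base, expo // 2)
        return result * result, calls + 1


for expo in (10, 100, 200):
    print(expo, ourPowCalls(2, expo)[1], fastPowCalls(2, expo)[1])
```

The linear version always makes expo + 1 calls. The halving version makes at most about 2 * log2(expo) + 2 calls, because an odd exponent becomes even after one step and an even exponent is then halved.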

The Fibonacci Series
    fib(n) = 1, when n = 1 or n = 2
    fib(n) = fib(n - 1) + fib(n - 2), otherwise

    def fib(n):
        if n == 1 or n == 2:
            return 1
        else:
            return fib(n - 1) + fib(n - 2)

Tracing fib(5) with a Call Tree
[Figure: call tree for fib(5); the nodes shown include fib(5), fib(4), fib(3), fib(2), and fib(1)]

Work Done – Function Calls
[Figure: call tree for fib(5); the nodes shown include fib(5), fib(4), fib(3), fib(2), and fib(1)]
The number of function calls is somewhere between 1^n and 2^n.
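
The growth in work can be checked by counting calls directly. This sketch follows the recursive definition of fib but returns a (value, calls) pair; `fibCalls` is a name assumed here for illustration.

```python
def fibCalls(n):
    """Return (fib(n), number of calls needed to compute it)."""
    if n == 1 or n == 2:
        return 1, 1
    a, callsA = fibCalls(n - 1)
    b, callsB = fibCalls(n - 2)
    return a + b, callsA + callsB + 1     # Count this call as well


for n in (5, 10, 20):
    value, calls = fibCalls(n)
    print(n, value, calls)
```

The call count works out to 2 * fib(n) - 1, so it grows at the same exponential rate as the Fibonacci numbers themselves.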

Memoization
    def fib(n):
        if n == 1 or n == 2:
            return 1
        else:
            return fib(n - 1) + fib(n - 2)
Intermediate values returned by the function can be memoized, or saved in a cache, for subsequent access.
Then they don't have to be recomputed!

Memoization
    def fib(n):
        cache = dict()

        def fastFib(n):
            if n == 1 or n == 2:
                return 1
            elif n in cache:              # Already computed: reuse it
                return cache[n]
            else:
                value = fastFib(n - 1) + fastFib(n - 2)
                cache[n] = value          # Save for subsequent access
                return value

        return fastFib(n)
The cache is a dictionary whose keys are the arguments of fib and whose values are the values of fib at those keys.
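
As an alternative to a hand-written cache, Python's standard library offers `functools.lru_cache`, which memoizes a function by its arguments. This is a different technique from the nested-function version on the slide (the cache here persists across top-level calls); `cachedFib` is a name assumed for the sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)                  # Unbounded cache keyed by argument
def cachedFib(n):
    if n == 1 or n == 2:
        return 1
    return cachedFib(n - 1) + cachedFib(n - 2)


print(cachedFib(50))
print(cachedFib.cache_info())             # Reports cache hits and misses
```

Each distinct argument is computed only once, so the number of cache misses is linear in n rather than exponential.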

Improving on n^2 Sorting
Selection sort uses a linear method within a linear method, so it's an O(n^2) method.
Find a way of using a linear method with a method that's better than linear.

A Hint from Binary Search
Binary search is better than linear, because we divide the problem size by 2 on each step.
Find a way of dividing the size of the sorting problem by 2 on each step, even though each step will itself be linear.
This should produce an O(n log n) algorithm.
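
The halving idea can be seen by counting the steps of a binary search. The function below is a minimal sketch, not taken from the slides; `binarySearchSteps` is an assumed name, and it expects a list that is already sorted.

```python
def binarySearchSteps(target, sortedLyst):
    """Return (index of target or -1, number of loop iterations)."""
    left, right = 0, len(sortedLyst) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if sortedLyst[mid] == target:
            return mid, steps
        elif sortedLyst[mid] < target:
            left = mid + 1                # Discard the left half
        else:
            right = mid - 1               # Discard the right half
    return -1, steps


data = list(range(1024))
print(binarySearchSteps(1023, data))
print(binarySearchSteps(-1, data))
```

With 1024 items, no search needs more than 11 steps, which is log2(1024) + 1.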

Quick Sort
Select a pivot element (say, the element at the midpoint).
Shift all of the smaller values to the left of the pivot, and all of the larger values to the right of the pivot (the linear part).
Sort the values to the left and to the right of the pivot (ideally, done log n times).

Trace of Quick Sort
[Figure: trace of quick sort on a sample list, with the pivot marked at each step]
Step 1: select the pivot (at the midpoint)
Step 2: shift the data
Step 3: sort to the left of the pivot
Step 4: sort to the right of the pivot

Design of Quick Sort: First Cut
    quickSort(lyst, left, right)
        if left < right
            pivotPosition = partition(lyst, left, right)
            quickSort(lyst, left, pivotPosition - 1)
            quickSort(lyst, pivotPosition + 1, right)

    partition(lyst, left, right)
        pivotValue = lyst[(left + right) // 2]
        shift smaller values to left of pivotValue
        shift larger values to right of pivotValue
        return pivotPosition
This version selects the midpoint element as the pivot.
The position of the pivot might change during the shifting of data.

Implementation of Partition
    def partition(lyst, left, right):
        # Find the pivot and exchange it with the last item
        middle = (left + right) // 2
        pivot = lyst[middle]
        lyst[middle] = lyst[right]
        lyst[right] = pivot
        # Set boundary point to first position
        boundary = left
        # Move items less than pivot to the left
        for index in range(left, right):
            if lyst[index] < pivot:
                swap(lyst, index, boundary)
                boundary += 1
        # Exchange the pivot item and the boundary item
        swap(lyst, right, boundary)
        return boundary
The number of comparisons required to shift values in each sublist is equal to the size of the sublist.
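
To see what a single partition step does, it can be run on a small list. The sketch repeats the partition function so that it is self-contained; the `swap` helper and the sample data are assumptions made for this example.

```python
def swap(lyst, i, j):
    """Exchange the items at positions i and j (assumed helper)."""
    lyst[i], lyst[j] = lyst[j], lyst[i]


def partition(lyst, left, right):
    # Find the pivot and exchange it with the last item
    middle = (left + right) // 2
    pivot = lyst[middle]
    lyst[middle] = lyst[right]
    lyst[right] = pivot
    # Move items less than the pivot to the left of the boundary
    boundary = left
    for index in range(left, right):
        if lyst[index] < pivot:
            swap(lyst, index, boundary)
            boundary += 1
    # Put the pivot at the boundary and report its position
    swap(lyst, right, boundary)
    return boundary


data = [12, 19, 17, 18, 14, 11, 15, 13, 16]
position = partition(data, 0, len(data) - 1)
print(position, data)
```

After the call, every item to the left of the returned position is smaller than the pivot and every item to its right is at least as large, although neither side is sorted yet.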

    def quickSort(lyst):

        def recurse(left, right):
            if left < right:
                pivotPosition = partition(lyst, left, right)
                recurse(left, pivotPosition - 1)
                recurse(pivotPosition + 1, right)

        def partition(lyst, left, right):
            # Find the pivot and exchange it with the last item
            middle = (left + right) // 2
            pivot = lyst[middle]
            lyst[middle] = lyst[right]
            lyst[right] = pivot
            # Set boundary point to first position
            boundary = left
            # Move items less than pivot to the left
            for index in range(left, right):
                if lyst[index] < pivot:
                    swap(lyst, index, boundary)
                    boundary += 1
            # Exchange the pivot item and the boundary item
            swap(lyst, right, boundary)
            return boundary

        recurse(0, len(lyst) - 1)

Complexity Analysis
The number of comparisons in the top-level call is n.
The sum of the comparisons in the two recursive calls is also n.
The sum of the comparisons in the four recursive calls beneath these is also n, etc.
Thus, the total number of comparisons equals n * the number of times the list must be subdivided.

How Many Times Must the Array Be Subdivided?
It depends on the data and on the choice of the pivot element.
Ideally, when the pivot is the median on each call, the list is subdivided log_2 n times.
Best-case behavior is O(n log n).

Call Tree For a Best Case
We select the midpoint element as the pivot.
The median element happens to be at the midpoint on each call.
But the list was already sorted!

Worst Case
What if the value at the midpoint is near the largest value on each call?
Or near the smallest value on each call?
Then there will be approximately n subdivisions, and quick sort will degenerate to O(n^2).

Call Tree For a Worst Case
We select the first element as the pivot.
The smallest element happens to be the first one on each call.
n subdivisions!
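
The best and worst cases can be compared by counting comparisons on an already sorted list, once with the midpoint pivot and once with the first-element pivot. This is a sketch only: `quickSortCounting` and its `choosePivot` parameter are hypothetical names, and the partition step swaps the chosen pivot to the end, as in the implementation shown earlier.

```python
def quickSortCounting(lyst, choosePivot):
    """Sort lyst in place and return the number of item comparisons.

    choosePivot(left, right) is a hypothetical hook that returns the
    index of the pivot within lyst[left..right].
    """
    comparisons = 0

    def partition(left, right):
        nonlocal comparisons
        p = choosePivot(left, right)
        lyst[p], lyst[right] = lyst[right], lyst[p]   # Pivot to the end
        pivot = lyst[right]
        boundary = left
        for index in range(left, right):
            comparisons += 1
            if lyst[index] < pivot:
                lyst[index], lyst[boundary] = lyst[boundary], lyst[index]
                boundary += 1
        lyst[right], lyst[boundary] = lyst[boundary], lyst[right]
        return boundary

    def recurse(left, right):
        if left < right:
            position = partition(left, right)
            recurse(left, position - 1)
            recurse(position + 1, right)

    recurse(0, len(lyst) - 1)
    return comparisons


n = 127
midpoint = quickSortCounting(list(range(n)), lambda left, right: (left + right) // 2)
first = quickSortCounting(list(range(n)), lambda left, right: left)
print(midpoint, first)
```

On sorted input the midpoint is always the median, so the comparison count stays close to n log2 n. The first-element pivot is always the smallest item, so each call removes only one item and the count is n(n - 1)/2.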

Other Methods of Selecting the Pivot Element
Pick a random element
Pick the median of the first three elements
Pick the median of the first, middle, and last elements
Pick the median element - not!! Finding the median is itself an O(n) algorithm
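
Two of these pivot choices can be sketched as small index-selection functions. The names `randomIndex` and `medianOfThreeIndex` are assumptions for illustration; each returns the position of the chosen pivot within lyst[left..right].

```python
import random


def randomIndex(lyst, left, right):
    """Pick a random position between left and right, inclusive."""
    return random.randint(left, right)


def medianOfThreeIndex(lyst, left, right):
    """Return the index of the median of the first, middle, and last items."""
    middle = (left + right) // 2
    candidates = [(lyst[left], left),
                  (lyst[middle], middle),
                  (lyst[right], right)]
    candidates.sort()                     # Orders the three by item value
    return candidates[1][1]               # Index of the middle value


data = [9, 1, 5, 7, 3]
print(medianOfThreeIndex(data, 0, len(data) - 1))
print(randomIndex(data, 0, len(data) - 1))
```

Looking at three items costs only a constant amount of work per call, unlike finding the true median, and it avoids the worst case on input that is already sorted or reverse sorted.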

For Friday
Working with the Array Data Structure
Chapter 4