Algorithm Design and Analysis
Focus: developing algorithms abstractly, independent of programming language, data types, etc.
- Think of a stack or queue: not specific to C++, but able to be implemented when needed.
- Addressed in depth during COSC 320.
Develop mathematical tools to analyze the costs of these algorithms:
- Time
- Space
Big-O Notation
Big-O notation: a way to formally quantify the complexity (time and/or space cost) of a function.
Definition: a function f is said to be "big-oh" of a function g, written f = O(g), if there exist constants C, k > 0 such that |f(x)| ≤ C|g(x)| for all x > k.
In words: eventually, some constant multiple of g outgrows f.
Here the functions will typically denote the runtime of an algorithm, and the argument will denote the size of the input.
Big-O captures the growth of the algorithm's runtime with respect to the size of the input given.
E.g. how does sorting 100 items compare to sorting 1,000? 10,000? 1,000,000?
Examples
- x² + x + 1 is O(x³), but it is also O(x²).
- log(x) is O(x).
- x² is not O(x).
- A function can also be O(1), meaning it does not grow faster than a constant: it is upper-bounded for all x!
"Linear" vs. "quadratic" vs. "cubic": polynomials grow according to their highest power.
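To see how the definition's constants work on the first example, one possible (not unique) choice of witnesses is C = 3 and k = 1:
  x² + x + 1 ≤ x² + x² + x² = 3x²  for all x > 1,
so x² + x + 1 = O(x²) with C = 3, k = 1.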
Big-O Captures "Growth"
Note: constants don't matter.
Note: adding functions with "smaller" growth rates doesn't matter.
If f₁ is O(g₁) and f₂ is O(g₂), then f₁ + f₂ is O(max(g₁, g₂)).
E.g. if f₁ is O(n²) and f₂ is O(n), then f₁ + f₂ is O(n²).
Algorithmically: if we run two algorithms back to back, each with its own growth rate, the total time cost is asymptotically only the larger of the two.
Put another way: adding the "same" or "less" runtime to an algorithm is asymptotically "free"!
Example: Searching a Sorted List
We have seen one method already: "linear search".
Given: a list (or array, or similar structure) of data, with the requirement that it is already sorted in either increasing or decreasing order.
Algorithm: start at the beginning; while the item is not found, scan through the list.
- If we find the item, return "true". Otherwise, return "false" once we reach the end.
Runtime: if there are n items in the list, the total time cost is O(n).
- It won't be exactly n, because we also need to manage temporary variables, function calls, etc.
- These things only add a constant cost, which grows more slowly than n!
Method One: Linear Search
Idea: check each element in the array; stop if you find the one you want.
Given array: [1, 15, 6, 40, 35, 99]. Target: 40.
Scan: i = 0, i = 1, i = 2, i = 3 → Found! Returns i = 3.
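A minimal C++ sketch of linear search (the function name and signature are my own illustration, not from the slides):

// Returns the index of target in arr (length n), or -1 if it is absent.
int linearSearch(const int arr[], int n, int target) {
    for (int i = 0; i < n; i++) {   // examine each element in turn: O(n) worst case
        if (arr[i] == target) {
            return i;               // found: stop scanning early
        }
    }
    return -1;                      // reached the end without a match
}

On the slide's array, linearSearch(arr, 6, 40) returns 3.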
Better Method: Binary Search
Given: a sorted array (increasing, for definiteness) and a target x.
Algorithm: use three variables, "left", "right", and "middle".
- Start "left" at index 0 and "right" at index length - 1.
- While "left" <= "right":
  - Set "middle" to (left + right) / 2.
  - If array[middle] == x, return true.
  - Else if array[middle] < x, set "left" to middle + 1.
  - Else, set "right" to middle - 1.
- If the loop ends without finding x, return false.
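A C++ sketch of the algorithm above (again, names and signature are illustrative):

// Returns true if target occurs in arr (length n), which must be sorted increasing.
bool binarySearch(const int arr[], int n, int target) {
    int left = 0;
    int right = n - 1;
    while (left <= right) {                      // search range [left, right] is nonempty
        int middle = left + (right - left) / 2;  // same as (left + right) / 2, but avoids overflow
        if (arr[middle] == target) {
            return true;                         // found
        } else if (arr[middle] < target) {
            left = middle + 1;                   // target can only be in the upper half
        } else {
            right = middle - 1;                  // target can only be in the lower half
        }
    }
    return false;                                // range became empty: target is absent
}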
Example
First: the array gets sorted! Sorted array: [1, 5, 16, 40, 55, 99].
Target: 55. (The "Bottom", "Middle", and "Top" markers in the slide's figure track the shrinking search range.)
What about target: 50? What about target: 60?
Cost of Binary Search
On each iteration of the "while" loop, half of the search range is discarded!
If we start with 100 elements, in the worst case the search space evolves (approximately) as: 100 → 50 → 25 → 13 → 7 → 4 → 2 → 1.
Only 8 iterations of the while loop!
In general, for an array of size n, it will take k iterations, where k is the largest integer such that n/2^k ≥ 1.
Simplified, k will be about log₂(n)!
Cost of sorting: about n·log(n) operations. How? We will see later.
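Where the log comes from, in one line of algebra:
  n/2^k ≥ 1  ⟺  2^k ≤ n  ⟺  k ≤ log₂(n),
so the largest such integer k is ⌊log₂(n)⌋.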
Linear vs. Binary Search
Both are correct.
Binary search requires sorting, which takes extra time, but that only needs to happen once!
For multiple queries, then, compare n vs. log(n) time per query!
SORTING (Slowly)
Algorithm: Bubble Sort
Given: an unsorted array or list.
Do
  Set swap flag to false.
  For count = 0 through length - 2
    If array[count] is greater than array[count + 1]
      Swap the contents of array[count] and array[count + 1].
      Set swap flag to true.
While any elements have been swapped.
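A C++ sketch of this pseudocode (function name is illustrative):

#include <utility>  // for std::swap

// Sorts arr (length n) into increasing order.
void bubbleSort(int arr[], int n) {
    bool swapped;
    do {
        swapped = false;
        for (int count = 0; count < n - 1; count++) {   // count = 0 through length - 2
            if (arr[count] > arr[count + 1]) {
                std::swap(arr[count], arr[count + 1]);  // bubble the larger element rightward
                swapped = true;
            }
        }
    } while (swapped);  // a full pass with no swaps means the array is sorted
}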
Bubble Sort Time Complexity
Each element only moves one space toward its proper position on each iteration of the for-loop.
In the worst case, the for-loop will run about n times for each element, meaning a cost of O(n·n) = O(n²) operations.
Selection Sort
A (slightly) better approach is selection sort.
Algorithm:
- Set "position" = 0.
- Find the smallest element from "position" to the end of the array.
- Swap that smallest element with the one at "position".
- Increment "position" and repeat until "position" = length.
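A C++ sketch of selection sort (names are illustrative):

#include <utility>  // for std::swap

// Sorts arr (length n) into increasing order.
void selectionSort(int arr[], int n) {
    for (int position = 0; position < n; position++) {
        int smallest = position;                    // index of the smallest element seen so far
        for (int i = position + 1; i < n; i++) {    // scan from "position" to the end: O(n - position)
            if (arr[i] < arr[smallest]) {
                smallest = i;
            }
        }
        std::swap(arr[position], arr[smallest]);    // place it at "position"
    }
}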
Selection Sort
Total complexity: O(n²).
Proof: the time to find the smallest of n elements is O(n).
- First loop: O(n)
- Second loop: O(n - 1)
- Third loop: O(n - 2)
- ...
In general: 1 + 2 + ⋯ + n = n(n + 1)/2 = (n² + n)/2, which is O(n²).
Recursion
How do we achieve the n·log(n) complexity promised?
We need a new tool: recursive algorithms, i.e. an algorithm which calls itself.
Example recursive computation: the Fibonacci sequence.
- The n'th term is defined as the sum of the two previous ones.
- The first is 1; the second is 1.
- Formally: f(n) = f(n - 1) + f(n - 2), with f(0) = 1 and f(1) = 1.
Recursion
In C++, a recursive solution to compute the n'th Fibonacci number:

int fib(int n) {
    if (n == 1 || n == 0) {
        return 1;                    // base cases: f(0) = 1 and f(1) = 1
    }
    return fib(n - 1) + fib(n - 2);  // recursive case: sum of the two previous terms
}
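A quick sanity check (my own, not from the slides): with these base cases the sequence runs 1, 1, 2, 3, 5, 8, 13, ...

#include <iostream>

int main() {
    std::cout << fib(6);  // prints 13, the 7th term of 1, 1, 2, 3, 5, 8, 13
    return 0;
}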