Analysis
CS 367 – Introduction to Data Structures
Overview
Why study data structures?
– the biggest reason for a computer is to store, analyze, and retrieve data
– data structures are used to store data
– the type of data structure chosen affects the amount of data that can be stored, retrieval speed, and speed of analysis
Data Structures
Data structure selection involves tradeoffs
– speed vs. capacity vs. flexibility vs. complexity
Consider an array versus a vector (see the sketch below)
– which is faster?
– which has a larger storage capacity?
– which has more flexibility in data placement?
– which is more complex?
– which one is better?
Must always consider which characteristics are most important for the data being handled
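A minimal Java sketch (not from the original slides) contrasting a plain array with java.util.ArrayList, the closest standard analogue of a vector; it only illustrates the fixed-capacity vs. growable tradeoff, not performance:

```java
import java.util.ArrayList;

public class ArrayVsVector {
    public static void main(String[] args) {
        // A plain array: simple and fast, but its capacity is fixed at creation.
        int[] fixed = new int[3];
        fixed[0] = 10;
        // fixed[3] = 40;   // would throw ArrayIndexOutOfBoundsException

        // An ArrayList (vector-like): grows as needed, at the cost of extra
        // bookkeeping and occasional internal re-allocation.
        ArrayList<Integer> growable = new ArrayList<>();
        growable.add(10);
        growable.add(20);
        growable.add(30);
        growable.add(40);   // no fixed capacity to worry about

        System.out.println(fixed.length);     // 3
        System.out.println(growable.size());  // 4
    }
}
```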
Algorithms
Always do some kind of work on data
– at least sort or search through it
Many different routes to the same destination
– consider 10 different people writing the same program
Picking the best route requires analysis
– most concerned with how long an algorithm will take
Analysis
Computational complexity
– how much effort and cost is associated with a specific algorithm
How do we measure computational complexity?
– run the algorithm on a set of data and time it?
  results depend on the speed of the computer used (a PC vs. a Cray supercomputer)
  results depend on the language the algorithm is written in (C vs. Java vs. Basic)
Analysis
Need a mathematical technique to analyze algorithms
– needs to be free of any real-time units (ms)
– should relate a logical time unit to data size
  t = cn, where t is logical time, c is a constant, and n is the data size
  notice that time does not need to be given in seconds
Several different techniques have been suggested
– we will only consider a few of them
Big-O Notation
Formal, mathematical definition
f(n) is O(g(n)) if there exist positive numbers c and N such that f(n) ≤ cg(n) for all n ≥ N.
– Wow! That probably makes no sense at all
Now let's put it into English
– we want to find some function g(n) that, scaled by a constant c, is eventually greater than or equal to the original function f(n)
  for small values of n, cg(n) may not be greater than or equal to f(n)
  however, once n reaches a certain size (N), cg(n) will always be greater than or equal to f(n)
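The same definition written symbolically (a restatement added here, not part of the original slide):

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ N > 0 \ \text{such that}\ f(n) \le c\,g(n) \ \text{for all } n \ge N
```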
Big-O Notation
Still confused? Don't feel bad, let's see an example
– consider the following function: f(n) = n² + 2n + 35
– now consider this next function: g(n) = n²
– by the previous definition, we need to find a c and an N that satisfy the following:
  cg(n) ≥ f(n) for all n ≥ N
Big-O Notation
So let's find a c and an N
– if c = 2, then for what values of n is cg(n) ≥ f(n)?
  solve the following inequality for n (worked out below): cn² ≥ n² + 2n + 35
  n ≥ 7 (this means N = 7 in the previous definition)
It is clear that there are an infinite number of c and N possibilities
– what's important is that g(n) is the same for all of them
– but there's an infinite number of possible g(n), too: g(n) = n², n³, n⁴, …
– pick the smallest g(n) that works
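A short worked derivation of the n ≥ 7 bound for c = 2 (added here; the slide skips the algebra):

```latex
2n^2 \ge n^2 + 2n + 35
\;\Longleftrightarrow\; n^2 - 2n - 35 \ge 0
\;\Longleftrightarrow\; (n - 7)(n + 5) \ge 0
\;\Longleftrightarrow\; n \ge 7 \quad (\text{for } n > 0)
```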
Big-O Notation
The previous slide showed that g(n) = n²
– hence, our Big-O notation is O(n²) for the function f(n) = n² + 2n + 35
Notice that all but the first term of f(n) were discarded to find g(n)
– this works because for large n, the term with the largest exponent dominates
– consider n = 1, n = 5, n = 100 (compare the values below)
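A quick comparison (added for illustration) of the leading term n² against the discarded terms 2n + 35 at those values:
– n = 1: n² = 1, while 2n + 35 = 37
– n = 5: n² = 25, while 2n + 35 = 45
– n = 100: n² = 10,000, while 2n + 35 = 235 (the n² term now dominates f(n) = 10,235)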
Analysis
So how does all this math help algorithm analysis?
– consider the following piece of code:
  for(i = sum = 0; i < n; i++)
      sum += a[i];
– before the loop starts: 2 assignments (i and sum)
– 2 assignments per loop iteration (sum and i)
– the equation relating time to n is: f(n) = 2 + 2n
– computational complexity: O(n)
Analysis
Consider another example:
  int rowsum[n];
  int firstcolsum;
  for(i = firstcolsum = 0; i < n; i++) {
      firstcolsum += a2d[i][0];
      for(j = rowsum[i] = 0; j < n; j++)
          rowsum[i] += a2d[i][j];
  }
– before the loop starts: 2 assignments
– inside the outer loop: 4 assignments per iteration
– inside the inner loop: 2 assignments per iteration
– equation: f(n) = 2 + 4n + 2n·n = 2 + 4n + 2n²
– computational complexity: O(n²)
Analysis
Analysis can get much more sophisticated
Let's compare 2 array search algorithms
– brute force
  check the first element, then the second, the third, etc.
– binary search
  requires a sorted array
  start at the center and divide the array into two halves
  if the key is less than the center element, search the lower half
  if the key is greater than the center element, search the upper half
  repeat until the number is found (or the range is empty)
Brute Force Analysis
Code
  int bruteforce(int[] array, int key) {
      for(int i = 0; i < array.length; i++) {
          if(array[i] == key)
              return i;
      }
      return -1;
  }
Analysis
– best case: iterations = 1
– worst case: iterations = n
– average case: iterations = n/2
– average computational complexity: O(n)
Binary Search Analysis
Code
  int binarySearch(int[] array, int key) {
      int lo = 0, hi = array.length - 1, mid;
      while(lo <= hi) {
          mid = (lo + hi) / 2;
          if(key < array[mid])
              hi = mid - 1;
          else if(key > array[mid])
              lo = mid + 1;
          else
              return mid;
      }
      return -1;
  }
Binary Search Analysis
Analysis
– best case: iterations = 1
– worst case: iterations = log₂ n
– average case: a more involved equation
  don't worry about how this is derived; see the book if you want the details
  it should be clear that calculating complexity is not always easy or straightforward
– average computational complexity: O(log₂ n)
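A small Java sketch (not part of the original slides) that counts loop iterations for both searches on a sorted array, to make the n vs. log₂ n difference concrete; the instrumented methods mirror the code on the previous slides:

```java
public class SearchCounts {
    static int count;  // iterations used by the most recent search

    static int bruteForce(int[] a, int key) {
        count = 0;
        for (int i = 0; i < a.length; i++) {
            count++;
            if (a[i] == key) return i;
        }
        return -1;
    }

    static int binarySearch(int[] a, int key) {
        count = 0;
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            count++;
            int mid = (lo + hi) / 2;
            if (key < a[mid])      hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else                   return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;   // already sorted

        bruteForce(a, n - 1);                   // worst case for brute force
        System.out.println("brute force:   " + count + " iterations");  // ~n

        binarySearch(a, n - 1);
        System.out.println("binary search: " + count + " iterations");  // ~log2(n)
    }
}
```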
Big-Ω Notation
Formal, mathematical definition
f(n) is Ω(g(n)) if there exist positive numbers c and N such that f(n) ≥ cg(n) for all n ≥ N.
– the only real difference between Big-Ω and Big-O is that Big-O is an upper bound on a function and Big-Ω is a lower bound
– all the descriptions of Big-O apply – just reverse the "≤" sign in the Big-O definition to a "≥" sign for Big-Ω
Big-Θ Notation
Formal, mathematical definition
f(n) is Θ(g(n)) if there exist positive numbers c₁, c₂, and N such that c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ N.
Again, we consider the following function
– f(n) = n² + 2n + 35
We have shown that this is O(n²)
It is easy to show that it is also Ω(n²)
– if c = 1 and g(n) = n², then cg(n) ≤ f(n) for all n
If c₁ = 1, c₂ = 2, and N = 7, we satisfy the above definition, and the computational complexity (using Big-Θ) becomes Θ(n²)
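The Θ(n²) claim written out as the sandwich inequality (added for clarity), using c₁ = 1, c₂ = 2, N = 7:

```latex
1 \cdot n^2 \;\le\; n^2 + 2n + 35 \;\le\; 2 \cdot n^2 \qquad \text{for all } n \ge 7
```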
Words of Warning
There are many other, more sophisticated methods of analyzing computational complexity
Those shown here do not always paint a complete picture
However, these techniques can provide good insight into an algorithm