SIGCSE Tradeoffs, intuition analysis, understanding big-Oh aka O-notation Owen Astrachan
SIGCSE Analysis l Vocabulary to discuss performance and to reason about alternative algorithms and implementations It’s faster! It’s more elegant! It’s safer! It’s cooler! l Use mathematics to analyze the algorithm, l Implementation is another matter cache, compiler optimizations, OS, memory,…
SIGCSE What do we need? l Need empirical tests and mathematical tools Compare by running 30 seconds vs. 3 seconds, 5 hours vs. 2 minutes Two weeks to implement code l We need a vocabulary to discuss tradeoffs
SIGCSE Analyzing Algorithms l Three solutions to online problem sort1 Sort strings, scan looking for runs Insert into Set, count each unique string Use map of (String,Integer) to process l We want to discuss trade-offs of these solutions Ease to develop, debug, verify Runtime efficiency Vocabulary for discussion
SIGCSE What is big-Oh about? l Intuition: avoid details when they don’t matter, and they don’t matter when input size (N) is big enough For polynomials, use only leading term, ignore coefficients: linear, quadratic y = 3x y = 6x-2 y = 15x + 44 y = x 2 y = x 2 -6x+9 y = 3x 2 +4x
SIGCSE O-notation, family of functions first family is O(n), the second is O(n 2 ) Intuition: family of curves, same shape More formally: O(f(n)) is an upper- bound, when n is large enough the expression cf(n) is larger Intuition: linear function: double input, double time, quadratic function: double input, quadruple the time
SIGCSE Reasoning about algorithms l We have an O(n) algorithm, For 5,000 elements takes 3.2 seconds For 10,000 elements takes 6.4 seconds For 15,000 elements takes ….? l We have an O(n 2 ) algorithm For 5,000 elements takes 2.4 seconds For 10,000 elements takes 9.6 seconds For 15,000 elements takes …?
SIGCSE More on O-notation, big-Oh l Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm Compare algorithms in the limit 20N hours v. N 2 microseconds: which is better?
SIGCSE More formal definition O-notation is an upper-bound, this means that N is O(N), but it is also O(N 2 ) ; we try to provide tight bounds. Formally: A function g(N) is O(f(N)) if there exist constants c and n such that g(N) n cf(N) g(N) x = n
SIGCSE Big-Oh calculations from code l Search for element in array: What is complexity (using O-notation)? If array doubles, what happens to time? for(int k=0; k < a.length; k++) { if (a[k].equals(target)) return true; } return false; l Best case? Average case? Worst case?
SIGCSE Measures of complexity l Worst case Good upper-bound on behavior Never get worse than this l Average case What does average mean? Averaged over all inputs? Assuming uniformly distributed random data?
SIGCSE Some helpful mathematics l … + N N(N+1)/2 = N 2 /2 + N/2 is O(N 2 ) l N + N + N + …. + N (total of N times) N*N = N 2 which is O(N 2 ) l … + 2 N 2 N+1 – 1 = 2 x 2 N – 1 which is O(2 N )
SIGCSE instructions/sec, runtimes N O(log N)O(N)O(N log N)O(N 2 ) , , min 100, hr 1,000, day 1,000,000, min18.3 hr318 centuries
SIGCSE Multiplying and adding big-Oh l Suppose we do a linear search then we do another one What is the complexity? If we do 100 linear searches? If we do n searches on an array of size n?
SIGCSE Multiplying and adding l Binary search followed by linear search? What are big-Oh complexities? Sum? 50 binary searches? N searches? l What is the number of elements in the list (1,2,2,3,3,3)? What about (1,2,2, …, n,n,…,n)? How can we reason about this?
SIGCSE Reasoning about big-O l Given an n-list: (1,2,2,3,3,3, …n,n,…n) If we remove all n’s, how many left? If we remove all larger than n/2?
SIGCSE Reasoning about big-O l Given an n-list: (1,2,2,3,3,3, …n,n,…n) If we remove every other element? If we remove all larger than n/1000? Remove all larger than square root n?
SIGCSE Money matters while (n != 0) { count++; n = n/2; } l Penny on a statement each time executed What’s the cost?
SIGCSE Money matters void stuff(int n){ for(int k=0; k < n; k++) System.out.println(k); } while (n != 0) { stuff(n); n = n/2; } l Is a penny always the right cost/statement? What’s the cost ?
SIGCSE Find k-th largest l Given an array of values, find k-th largest 0 th largest is the smallest How can I do this? l Do it the easy way… l Do it the fast way …
SIGCSE Easy way, complexity? public int find(int[] list, int index) { Arrays.sort(list); return list[index]; }
SIGCSE Fast way, complexity? public int find(int[] list, int index) { return findHelper(list,index, 0, list.length-1); }
SIGCSE Fast way, complexity? private int findHelper(int[] list, int index, int first, int last) { int lastIndex = first; int pivot = list[first]; for(int k=first+1; k <= last; k++){ if (list[k] <= pivot){ lastIndex++; swap(list,lastIndex,k); } swap(list,lastIndex,first);
SIGCSE Continued… if (lastIndex == index) return list[lastIndex]; else if (index < lastIndex ) return findHelper(list,index, first,lastIndex-1); else return findHelper(list,index, lastIndex+1,last);
SIGCSE Recurrences int length(ListNode list) { if (0 == list) return 0; else return 1+length(list.getNext()); } l What is complexity? justification? l T(n) = time of length for an n-node list T(n) = T(n-1) + 1 T(0) = O(1)
SIGCSE Recognizing Recurrences l T must be explicitly identified l n is measure of size of input/parameter T(n) time to run on an array of size n T(n) = T(n/2) + O(1) binary search O( log n ) T(n) = T(n-1) + O(1) sequential search O( n )
SIGCSE Recurrences T(n) = 2T(n/2) + O(1) tree traversal O( n ) T(n) = 2T(n/2) + O(n) quicksort O( n log n) T(n) = T(n-1) + O(n) selection sort O( n 2 )
SIGCSE Compute b n, b is BigInteger l What is complexity using BigInteger.pow? What method does it use? How do we analyze it? l Can we do exponentiation ourselves? Is there a reason to do so? What techniques will we use?
SIGCSE Correctness and Complexity? public BigInteger pow(BigInteger b, int expo) { if (expo == 0) { return BigInteger.ONE; } BigInteger half = pow(b,expo/2); half = half.multiply(half); if (expo % 2 == 0) return half; else return half.multiply(b); }
SIGCSE Correctness and Complexity? public BigInteger pow(BigInteger b, int expo) { BigInteger result =BigInteger.ONE; while (expo != 0){ if (expo % 2 != 0){ result = result.multiply(b); } expo = expo/2; b = b.multiply(b); } return result; }
SIGCSE Solve and Analyze l Given N words (e.g., from a file) What are 20 most frequently occurring? What are k most frequently occurring? l Proposals? Tradeoffs in efficiency Tradeoffs in implementation