CSE 326: Data Structures Lecture #6 Asymptotic Analysis Ashish Sabharwal Summer Quarter 2001
First a Reminder! Don’t forget to turn-in your graded quizes with your homework!!! Policy: If you don’t, you are responsible for doing ALL problems as hw
Analysis of Algorithms Efficiency measure how long the program runs time complexity how much memory it uses space complexity For today, we’ll focus on time complexity only Why analyze at all? Confidence: algorithm will work well in practice Insight : alternative, better algorithms
Time Complexity We count number of abstract steps Not physical runtime in seconds Not every machine instruction What is one abstract step? count = count + 1 y = a*x3 + b*x + c if (n 2 sqrt(m) || n 0.5 sqrt(m)) …
Asymptotic Analysis Complexity as a function of input size n T(n) = 4n + 5 T(n) = 0.5 n log n - 2n + 7 T(n) = 2n + n3 + 3n What happens as n grows?
Why do we care? Most algorithms are fast for small n Time difference too small to be noticeable External things dominate (OS, disk I/O, …) n is typically large in practice Databases, internet, graphics, … Time difference really shows up as n grows!
Rates of Growth Suppose we can execute 1010 ops / sec n=? T(n)=? 104s = 2.8 hrs 1018s = 30 billion years
Obtaining Asymptotic Bounds Eliminate low order terms 4n + 5 4n 0.5 n log n - 2n + 7 0.5 n log n 2n + n3 + 3n 2n Eliminate coefficients 4n n 0.5 n log n n log n n log n2 = 2 n log n n log n We didn’t get very precise in our analysis of the UWID info finder; why? Didn’t know the machine we’d use. Is this always true? Do you buy that coefficients and low order terms don’t matter? When might they matter? (Linked list memory usage)
Race Against Time! Race # 1 2 3 4 5 6 7 T1(n) n3 + 2n2 n0.1 82log n mn3 T2(n) 100n2 + 1000 log n 2n + 10 log n n! 1000n15 3n7 + 7n 2mn Which is faster?
Race 1 n3 + 2n2 vs. 100n2 + 1000
Race 2 n0.1 vs. log n Well, log n looked good out of the starting gate and indeed kept on looking good until about n^17 at which point n^0.1 passed it up forever. Moral of the story? N^epsilon beats log n for any eps > 0. BUT, which one of these is really better?
Race 3 n + 100n0.1 vs. 2n + 10 log n Notice that these just look like n and 2n once we get way out. That’s because the larger terms dominate. So, the left is less, but not asymptotically less. It’s a TIE!
Race 4 5n5 vs. n! N! is BIG!!!
Race 5 n-152n/100 vs. 1000n15 No matter how you put it, any exponential beats any polynomial. It doesn’t even take that long here (~250 input size)
Race 6 82log(n) vs. 3n7 + 7n We can reduce the left hand term to n^6, so they’re both polynomial and it’s an open and shut case.
Race Against Time! (2) Race # 1 2 3 4 5 6 7 T1(n) n3 + 2n2 n0.1 82log n mn3 T2(n) 100n2 + 1000 log n 2n + 10 log n n! 1000n15 3n7 + 7n 2mn Which is faster? T2 : O(n2) T2 : O(log n) Tie: O(n) T1 : O(n5) T2 : O(n15) T1 : O(n6) It depends! Welcome, everyone, to the Silicon Downs. I’m getting race results as we stand here. Let’s start with the first race. I’ll have the first row bet on race #1. Raise your hand if you bet on function #1 (the jockey is n^0.1) So on. Show the race slides after each race.
Typical Growth Rates constant: O(1) logarithmic: O(log n) (logkn, log n2 O(log n)) poly-log: O(logk n) linear: O(n) log-linear: O(n log n) superlinear: O(n1+c) (c is a constant > 0) quadratic: O(n2) cubic: O(n3) polynomial: O(nk) (k is a constant) exponential: O(cn) (c is a constant > 1)
Terminology T(n) O(f(n)) T(n) (f(n)) T(n) (f(n)) constants c and n0 s.t. T(n) c f(n) n n0 1, log n, n, 100n O(n) T(n) (f(n)) constants c and n0 s.t. T(n) c f(n) n n0 n/10, n2, 100 . 2n, n3 log n (n) T(n) (f(n)) T(n) O(f(n)) and T(n) (f(n)) n+4, 2n, 100n, 0.01 n + log n (n)
Terminology (2) T(n) o(f(n)) T(n) (f(n)) T(n) O(f(n)) and T(n) (f(n)) 1, log n, n0.99 o(n) T(n) (f(n)) T(n) (f(n)) and T(n) (f(n)) n1.01, n2, 100 . 2n, n3 log n (n)
Terminology (3) Roughly speaking, the correspondence is O = = o < >
Types of Analysis Three orthogonal axes: bound flavor analysis case upper bound (O, o) lower bound (, ) asymptotically tight () analysis case worst case (adversary) average case best case “common” case analysis quality loose bound (most true analyses) tight bound (no better bound which is asymptotically different) We already discussed the bound flavor. All of these can be applied to any analysis case. For example, we’ll later prove that sorting in the worst case takes at least n log n time. That’s a lower bound on a worst case. Average case is hard! What does “average” mean. For example, what’s the average case for searching an unordered list (as precise as possible, not asymptotic). WRONG! It’s about n, not 1/2 n. Why? You have to search the whole thing if the elt is not there. Note there’s two senses of tight. I’ll try to avoid the terminology “asymptotically tight” and stick with the lower def’n of tight. O(inf) is not tight!
Analyzing Code General guidelines Simple C++ operations - constant time consecutive stmts - sum of times per stmt conditionals - sum of branches and condition loops - sum over iterations function calls - cost of function body Alright, I want to do some code examples, but first, how will we examine code? These are just rules of thumb, not the way you always do it. Sometimes, for ex, it’s important to keep track of which branch of a conditional is followed when (like when analyzing recursive functions).
Simple loops sum = 0 for i = 1 to n do for j = 1 to n do sum = sum + 1 There’s a little twist here. J goes from I to N, not 1 to N. So, let’s do the sums inside is constant. Next loop is sum I to N of 1 which equals N - I + 1 Outer loop is sum 1 to N of N - I + 1 That’s the same as sum N to 1 of I or N(N+1)/2 or O(N^2)
Simple loops (2) sum = 0 for i = 1 to n do for j = i to n do sum = sum + 1 There’s a little twist here. J goes from I to N, not 1 to N. So, let’s do the sums inside is constant. Next loop is sum I to N of 1 which equals N - I + 1 Outer loop is sum 1 to N of N - I + 1 That’s the same as sum N to 1 of I or N(N+1)/2 or O(N^2)
Conditionals and While Loop if C then S1 else S2 Loops while C do S OK, so this isn’t exactly an example. Just reiterating the rule. Time <= time of C plus max of S1 and S2 <= time of C plus S1 plus S2 time <= sum of times of iterations often #of iterations * time of S (or worst time of S)
Recursion Recursion Almost always yields a recurrence Recursive max Example: Factorial fac(n) if n = 0 return 1 else return n * fac(n - 1) T(0) = 1 T(n) c + T(n - 1) if n > 0
Example: Factorial Analysis by simple calculation T(n) T(n) c + c + T(n - 2) (by substitution) T(n) c + c + c + T(n - 3) (by substitution, again) T(n) kc + T(n - k) (extrapolating 0 < k n) T(n) nc + T(0) = nc + b (setting k = n) T(n)
Example: Mergesort Mergesort algorithm If list has 1 element, return Otherwise split list in half, sort first half, sort second half, merge together T(1) = 1 T(n) 2T(n/2) + cn if n > 1 This is the same sort of analysis as last slide. Here’s a function defined in terms of itself. WORK THROUGH Answer: O(n log n) Generally, then, the strategy is to keep expanding these things out until you see a pattern. Then, write the general form. Finally, sub in for the series bounds to make T(?) come out to a known value and solve all the series. Tip: Look for powers/multiples of the numbers that appear in the original equation. Splitting and merging Sorting the two halves recursively
Example: Mergesort (2) Analysis by simple calculation T(n) ? T(n) 2T(n/2) + cn 2(2T(n/4) + c(n/2)) + cn = 4T(n/4) + cn + cn 4(2T(n/8) + c(n/4)) + cn + cn = 8T(n/8) + cn + cn + cn 2kT(n/2k) + kcn (extrapolating 1 < k n) nT(1) + cn log n (for 2k = n or k = log n) T(n) ?
Example: Mergesort (3) Analysis by induction Guess the answer! T(n) an log n + b a,b constants, not known yet Verify base case T(1) b Verify inductive step T(n) 2T(n/2) + cn = 2(an/2 log n/2 + b) + cn = an log n/2 + 2b + cn = (an log n + b) – (an – b – cn) an log n + b for a > c b 1
Summary Determine what characterizes a problem’s input size Express how much resources (time, memory, etc.) an algorithm requires as a function of input size using O(•), (•), (•) worst case best case average case common case