CSE 326: Data Structures Asymptotic Analysis Ben Lerner Summer 2007 Lecture 2
Office Hours, etc. Ben Lerner Tue 10:45-11:45 CSE 212 (except today) Ioannis Giotis Mon 10:45-11:45 CSE 220 Or by appointment. (Comments? Conflicts?) TODO: Subscribe to the mailing list if you haven’t!
Comparing Two Algorithms GOAL: Sort a list of names as fast as possible “Who cares how I do it, I’ll add more memory!” “I’ll use C++ instead of Java – wicked fast!” “I’ll buy a faster CPU” “Ooh look, the –O4 flag!” “Can’t I just get the data pre-sorted??” Why bother?
Comparing Two Algorithms What we want: a rough estimate that ignores details – really, independent of details. Coding tricks, CPU speed, compiler optimizations, … These would help any algorithm equally. Raw measured running time alone is not a good enough measure – it depends on all of these details.
Big-O Analysis Ignores “details” What details? CPU speed Programming language used Amount of memory Compiler Order of input Size of input … sorta.
Analysis of Algorithms We talked last time about efficiency; let’s refine this further. Efficiency measures: how long the program runs (time complexity) and how much memory it uses (space complexity). Why analyze at all? Decide which algorithm to implement before actually coding it. Given code, get a sense for where the bottlenecks are (or will be) without actually measuring. Insight: suggests alternative, better algorithms. Confidence: the algorithm will work well in practice – and gives your boss a reason to pay you right away!
Asymptotic Analysis One “detail” we won’t ignore – problem size, the number of input elements. Complexity as a function of input size n: T(n) = 4n + 5 T(n) = 0.5 n log n - 2n + 7 T(n) = 2^n + n^3 + 3n What happens as n grows? Asymptotic = performance as n → ∞
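A minimal sketch (not from the original slides; the range of n and the output format are just for illustration) that plugs growing values of n into the three example functions, showing how the 2^n term quickly dominates:

#include <cmath>
#include <cstdio>

int main() {
    for (int n = 4; n <= 64; n *= 2) {
        double t1 = 4.0 * n + 5;                                        // T(n) = 4n + 5
        double t2 = 0.5 * n * std::log2((double)n) - 2.0 * n + 7;       // T(n) = 0.5 n log n - 2n + 7
        double t3 = std::pow(2.0, n) + std::pow((double)n, 3) + 3.0 * n; // T(n) = 2^n + n^3 + 3n
        std::printf("n=%2d  4n+5=%8.0f  0.5n*log(n)-2n+7=%8.1f  2^n+n^3+3n=%22.0f\n",
                    n, t1, t2, t3);
    }
    return 0;
}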
Why Asymptotic Analysis? Most algorithms are fast for small n Time difference too small to be noticeable External things dominate (OS, disk I/O, …) BUT n is often large in practice Databases, internet, graphics, … Time difference really shows up as n grows!
Exercise - Searching 2 3 5 16 37 50 73 75 126 bool ArrayFind( int array[], int n, int key ) { // Insert your algorithm here } What algorithm would you choose to implement this code snippet? Ultimately we want to analyze algorithms, so let’s generate an algorithm to try out. The point of these “Hannah takes a break” slides is to encourage students to come up with the answers themselves. What I say should be minimal, and there really shouldn’t be any point in the lecture when I present the “right answer”, because that encourages students not to say anything if they come to expect it. Hopefully, students will pick linear and binary search.
Analyzing Code Basic Java operations – constant time. Consecutive statements – sum of times. Conditionals – test plus the larger branch. Loops – sum over all iterations. Function calls – cost of the function body. Recursive functions – solve a recurrence relation. Best case / worst case (what are these?) Okay, now that we have some algorithms to serve as examples, let’s analyze them! Here are some hints before we begin – see the sketch below. Let’s try it!
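A minimal sketch (not from the slides; the function and the exact counting convention are hypothetical) showing how the rules above turn code into a running-time expression:

int sumFirstK( int array[], int n, int k ) {
    int total = 0;                          // basic operation: constant time
    for( int i = 0; i < k && i < n; i++ ) { // loop: sum the body cost over all k iterations
        if( array[i] > 0 )                  // conditional: test plus the larger branch
            total += array[i];              // basic operation: constant time
    }
    return total;                           // basic operation: constant time
}
// Consecutive statements: add the pieces up.
// Roughly T(k) = c1 + c2*k for small constants c1, c2 – linear in k.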
Linear Search Analysis bool LinearArrayFind( int array[], int n, int key ) { for( int i = 0; i < n; i++ ) { if( array[i] == key ) // Found it! return true; } return false; } T(n) = ? (here we are looking for exact runtimes) Best case: the key is at array[0] – a small constant number of operations, T(n) = 4 under the counting convention used in these slides. Worst case: the key is absent (or last) – T(n) = 3n + 2, i.e. linear. The exact constants depend on which operations you count; note also that the average case depends on whether the key is found at all.
Binary Search Analysis bool BinArrayFind( int array[], int low, int high, int key ) { // The subarray is empty if( low > high ) return false; // Search this subarray recursively int mid = (high + low) / 2; if( key == array[mid] ) { return true; } else if( key < array[mid] ) { return BinArrayFind( array, low, mid-1, key ); } else { return BinArrayFind( array, mid+1, high, key ); } } Best case: 4 – the key is at array[mid] on the first call. Worst case: log n? Uh-oh – we don’t know how to calculate the runtime exactly yet (although CSE 142/143 should have taught us that it’s O(log n)). Let’s go to the next slide; we’ll come back and fill in these numbers later. It turns out the recursion gives T(n) = 4 log_2 n + 4 in the worst case (and most of the time).
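A minimal test driver (not part of the original slides; it assumes the LinearArrayFind and BinArrayFind functions above are declared in scope):

#include <cstdio>

int main() {
    int a[] = { 2, 3, 5, 16, 37, 50, 73, 75, 126 };  // the sorted array from the exercise slide
    int n = 9;
    std::printf("linear find 37: %d\n", LinearArrayFind(a, n, 37));      // expect 1 (true)
    std::printf("binary find 37: %d\n", BinArrayFind(a, 0, n - 1, 37));  // expect 1 (true)
    std::printf("linear find 40: %d\n", LinearArrayFind(a, n, 40));      // expect 0 (false)
    std::printf("binary find 40: %d\n", BinArrayFind(a, 0, n - 1, 40));  // expect 0 (false)
    return 0;
}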
Solving Recurrence Relations For a problem of size n, the time is the time to get ready for the recursive call (4) plus the time for that call: T(n) = 4 + T(n/2); T(1) = 4 (we ignore floors for simplicity). 1. Determine the recurrence relation. What is the base case (or cases)? 2. “Expand” the original relation to find an equivalent general expression in terms of the number of expansions. By repeated substitution, until we see the pattern: T(n) = 4 + T(n/2) = 4 + (4 + T(n/4)) = 4 + (4 + (4 + T(n/8))) = 4*3 + T(n/2^3) = … = 4k + T(n/2^k) after k expansions. 3. Find a closed-form expression by setting the number of expansions to a value which reduces the problem to a base case. Since the base case is n = 1, we need k such that n/2^k = 1; multiplying by 2^k gives n = 2^k, so k = log_2 n (all logs are base 2). Setting k = log_2 n: T(n) = 4 log_2 n + T(1) = 4 log_2 n + 4.
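A quick sanity check (not from the slides): compute T(n) directly from the recurrence T(n) = 4 + T(n/2), T(1) = 4, and compare it with the closed form 4 log_2 n + 4 for powers of two, where the closed form is exact:

#include <cmath>
#include <cstdio>

int T(int n) {
    if (n == 1) return 4;   // base case
    return 4 + T(n / 2);    // one expansion of the recurrence
}

int main() {
    for (int n = 1; n <= 64; n *= 2) {
        std::printf("n=%2d  recurrence=%d  closed form=%g\n",
                    n, T(n), 4 * std::log2((double)n) + 4);
    }
    return 0;
}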
Linear Search vs Binary Search
             Linear search      Binary search
Best case:   4 (key at [0])     4 (key at [mid])
Worst case:  3n + 2             4 log n + 4
So … which algorithm is better? What tradeoffs can you make? Some students will probably say “binary search, because it’s O(log n), whereas linear search is O(n)”. But the point of this discussion is that big-O notation obscures some important factors (like constants!) and we really don’t know the input size. To make a meaningful comparison, we need more information: (1) what our priorities are (runtime? memory footprint?), (2) what the input size is (or, even better, what the input is!), (3) our platform/machine – we saw on the earlier chart that architecture makes a difference, (4) other factors. Big-O notation gives us a way to compare algorithms without this information, but the cost is a loss of precision. Which is better depends on constants, input size, good/bad input for the algorithm, the machine, …
Fast Computer vs. Slow Computer (Chart: the y-axis is time – lower is better; the Pentium IV was the newer/faster machine.) With the same algorithm, the faster machine wins!
Fast Computer vs. Smart Programmer (round 1) With different algorithms, constants matter! Does linear search beat out binary search?
Fast Computer vs. Smart Programmer (round 2) Binary search wins out – eventually, the better algorithm wins!
Asymptotic Analysis Asymptotic analysis looks at the order of the running time of the algorithm – a valuable tool when the input gets “large”. It ignores the effects of different machines or different implementations of the same algorithm. Intuitively, to find the asymptotic runtime, throw away the constants and low-order terms: Linear search is T(n) = 3n + 2, which is O(n). Binary search is T(n) = 4 log_2 n + 4, which is O(log n). The point of all those pretty pictures is that, while constants matter, they don’t matter as much as the order for “sufficiently large” input (where “large” depends, of course, on the constants). Log bases don’t matter either – more on that in a second. Remember: the fastest algorithm has the slowest-growing function for its runtime.
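A small worked check (not on the original slide; the constants c and n_0 below are just one valid choice) that the two claims above fit the upcoming definition of O:

\[
3n + 2 \le 5n \ \text{ for all } n \ge 1, \quad\text{so } 3n+2 \in O(n) \text{ with } c = 5,\ n_0 = 1.
\]
\[
4\log_2 n + 4 \le 8\log_2 n \ \text{ for all } n \ge 2, \quad\text{so } 4\log_2 n + 4 \in O(\log n) \text{ with } c = 8,\ n_0 = 2.
\]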
Asymptotic Analysis Eliminate low-order terms: 0.5 n log n + 2n + 7 → 0.5 n log n; n^3 + 2^n + 3n → 2^n. Eliminate coefficients: 4n → n; 0.5 n log n → n log n; n log(n^2) = 2 n log n → n log n. Do you buy that coefficients and low-order terms don’t matter? When might they matter? (Linked-list memory usage.) We didn’t get very precise in our analysis of the UWID info finder; why? We didn’t know the machine we’d use. Any base-x log is equivalent to a base-2 log within a constant factor: log_x B = log_2 B / log_2 x (more generally, log_A B = log_x B / log_x A).
Properties of Logs We will assume logs are base 2 unless explicitly stated otherwise. log(AB) = log A + log B (proof sketch below). Similarly, log(A/B) = log A – log B. log(A^B) = B log A. Any base-k log is equivalent to a base-2 log, up to a constant factor.
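A short proof sketch filling in the slide’s empty “Proof:” line (all logs base 2):

\[
\text{Let } x = \log_2 A \text{ and } y = \log_2 B, \text{ so } A = 2^x \text{ and } B = 2^y.
\]
\[
AB = 2^x \cdot 2^y = 2^{x+y} \quad\Rightarrow\quad \log_2(AB) = x + y = \log_2 A + \log_2 B.
\]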
Order Notation: Intuition Why do we eliminate these constants, etc.? This is not the whole picture! f(n) = n^3 + 2n^2 g(n) = 100n^2 + 1000 Although not yet apparent, as n gets “sufficiently large”, f(n) will be “greater than or equal to” g(n).
Definition of Order Notation Upper bound: T(n) = O(f(n)) (“Big-O”) – there exist positive constants c and n’ such that T(n) ≤ c f(n) for all n ≥ n’. Lower bound: T(n) = Ω(g(n)) (“Omega”) – there exist positive constants c and n’ such that T(n) ≥ c g(n) for all n ≥ n’. Tight bound: T(n) = Θ(f(n)) (“Theta”) – when both hold: T(n) = O(f(n)) and T(n) = Ω(f(n)). We’ll use some specific terminology to describe asymptotic behavior. There are some analogies here that you might find useful.
Definition of Order Notation Back to our two functions f and g from before. O( f(n) ) is a set (or class) of functions: g(n) ∈ O( f(n) ) iff there exist positive constants c and n0 such that g(n) ≤ c f(n) for all n ≥ n0. Example: 100n^2 + 1000 ≤ 5 (n^3 + 2n^2) for all n ≥ 19, so g(n) ∈ O( f(n) ).
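A quick numeric check (not from the slides) of the inequality on this slide – it holds from n = 19 onward and fails at n = 18:

#include <cstdio>

int main() {
    for (long long n = 15; n <= 30; n++) {
        long long g  = 100 * n * n + 1000;            // g(n) = 100n^2 + 1000
        long long cf = 5 * (n * n * n + 2 * n * n);   // c * f(n) with c = 5
        std::printf("n=%lld  g(n)=%lld  5*f(n)=%lld  %s\n",
                    n, g, cf, g <= cf ? "g <= 5f" : "g >  5f");
    }
    return 0;
}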
Notation Notes Sometimes you’ll see g(n) = O( f(n) ). This is equivalent to g(n) ∈ O( f(n) ). But note that the reverse, O( f(n) ) = g(n), is MEANINGLESS.
Order Notation: Example 100n^2 + 1000 ≤ 5 (n^3 + 2n^2) for all n ≥ 19, so g(n) ∈ O( f(n) ). Wait – the picture seems to show the crossover point at around n = 100, not 19? What happened? The point is that big-O captures the relationship between f(n) and g(n) (the fact that f(n) is eventually “greater than or equal to” g(n)); it doesn’t tell you exactly where the crossover point is. Remember, in big-O notation the constants on the two functions don’t really matter. If we pick c = 1, the crossover happens around n = 100.
Big-O: Common Names constant: O(1) logarithmic: O(log n) (log_k n, log n^2 ∈ O(log n)) linear: O(n) log-linear: O(n log n) quadratic: O(n^2) cubic: O(n^3) polynomial: O(n^k) (k is a constant) exponential: O(c^n) (c is a constant > 1)
Meet the Family O( f(n) ) is the set of all functions asymptotically less than or equal to f(n). o( f(n) ) is the set of all functions asymptotically strictly less than f(n). Ω( f(n) ) is the set of all functions asymptotically greater than or equal to f(n). ω( f(n) ) is the set of all functions asymptotically strictly greater than f(n). Θ( f(n) ) is the set of all functions asymptotically equal to f(n). (Θ is pronounced “theta”.)
Meet the Family, Formally g(n) ∈ O( f(n) ) iff there exist c and n0 such that g(n) ≤ c f(n) for all n ≥ n0. g(n) ∈ o( f(n) ) iff for every c there exists an n0 such that g(n) < c f(n) for all n ≥ n0 (equivalent to: lim as n → ∞ of g(n)/f(n) = 0). g(n) ∈ Ω( f(n) ) iff there exist c and n0 such that g(n) ≥ c f(n) for all n ≥ n0. g(n) ∈ ω( f(n) ) iff for every c there exists an n0 such that g(n) > c f(n) for all n ≥ n0 (equivalent to: lim as n → ∞ of g(n)/f(n) = ∞). g(n) ∈ Θ( f(n) ) iff g(n) ∈ O( f(n) ) and g(n) ∈ Ω( f(n) ). Note how they all look suspiciously like copy-and-paste definitions – make sure students notice that the only differences are the relation (less-than vs. less-than-or-equal, etc.) and whether it’s “there exists a c” or “for all c”.
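A small worked example (not on the slide) using the limit characterizations above:

\[
\lim_{n\to\infty} \frac{n}{n^2} = \lim_{n\to\infty} \frac{1}{n} = 0
\quad\Rightarrow\quad n \in o(n^2),
\]
\[
\lim_{n\to\infty} \frac{n^2}{n} = \lim_{n\to\infty} n = \infty
\quad\Rightarrow\quad n^2 \in \omega(n).
\]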
Big-Omega et al. Intuitively
Asymptotic Notation   Mathematical Relation
O                     ≤
Ω                     ≥
Θ                     =
o                     <
ω                     >
In fact, it’s not just the intuitive chart – it’s the chart of definitions! Notice how similar the formal definitions all were; they only differed in the relations highlighted in blue.
Pros and Cons of Asymptotic Analysis Let’s do a think-pair-share: have the students (in teams of 2 or 3) come up with some of the pros and cons of asymptotic analysis, then come together and share as a group. Some points I hope to get out: Pros: quick-and-dirty comparison of algorithms; separates the algorithm from the architecture. Cons: less precise (we lose the constants); separating the algorithm from the architecture can hurt (graphics, etc.); doesn’t capture implementation complexity.
Perspective: Kinds of Analysis Running time may depend on the actual input data, not just the length of the input. Distinguish: worst case (your worst enemy is choosing the input), best case, average case (assumes some probabilistic distribution of inputs), amortized (average time over many operations).
Types of Analysis Two orthogonal axes: bound flavor and analysis case. Bound flavor: upper bound (O, o), lower bound (Ω, ω), asymptotically tight (Θ). Analysis case: worst case (adversary), average case, best case, “amortized”. Any bound flavor can be applied to any analysis case. For example, we’ll later prove that sorting in the worst case takes at least n log n time – that’s a lower bound on a worst case. Average case is hard! What does “average” mean? For example, what’s the average case for searching an unordered list (as precisely as possible, not asymptotically)? The tempting answer of n/2 is wrong – it’s about n, not n/2. Why? You have to search the whole thing if the element is not there (see the worked example below). Note there are two senses of “tight”; I’ll try to avoid the terminology “asymptotically tight” and stick with the lower definition of tight. O(∞) is not tight!
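A small worked example (not on the slide; it assumes that when the key is present it is equally likely to be at any of the n positions) backing up the note above:

\[
E[\text{comparisons} \mid \text{key present}] \;=\; \sum_{i=1}^{n} \frac{i}{n} \;=\; \frac{n+1}{2},
\qquad
E[\text{comparisons} \mid \text{key absent}] \;=\; n.
\]

If unsuccessful searches occur a constant fraction of the time, the overall average is pulled above n/2 toward n, but either way it is Θ(n).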
Show 16 n^3 log_8(10n^2) + 100 n^2 = O(n^3 log n) by eliminating low-order terms and constant coefficients:
16 n^3 log_8(10n^2) + 100 n^2
→ 16 n^3 log_8(10n^2)              (drop the low-order term 100 n^2)
→ n^3 log_8(10n^2)                 (drop the constant coefficient 16)
= n^3 (log_8 10 + log_8(n^2))      (log of a product)
= n^3 log_8 10 + n^3 log_8(n^2)
→ n^3 log_8(n^2)                   (drop the low-order term n^3 log_8 10)
= 2 n^3 log_8 n                    (log of a power)
→ n^3 log_8 n                      (drop the coefficient 2)
= n^3 log_8(2) log n = n^3 log(n) / 3   (change of base to base 2)
→ n^3 log n                        (drop the coefficient 1/3)
So 16 n^3 log_8(10n^2) + 100 n^2 = O(n^3 log n).