CSE 326: Data Structures Introduction & Part One: Complexity Henry Kautz Autumn Quarter 2002

Overview of the Quarter
Part One: Complexity
– inductive proofs of program correctness
– empirical and asymptotic complexity
– order of magnitude notation; logs & series
– analyzing recursive programs
Part Two: List-like data structures
Part Three: Sorting
Part Four: Search Trees
Part Five: Hash Tables
Part Six: Heaps and Union/Find
Part Seven: Graph Algorithms
Part Eight: Advanced Topics

Material for Part One
Weiss Chapters 1 and 2
Additional material:
– Graphical analysis
– Amortized analysis
– Stretchy arrays
Any questions on course organization?

Program Analysis
Correctness
– Testing
– Proofs of correctness
Efficiency
– How to define?
– Asymptotic complexity: how running time scales as a function of the size of the input

Proving Programs Correct
Often takes the form of an inductive proof
Example: summing an array

int sum(int v[], int n) {
  if (n==0) return 0;
  else return v[n-1]+sum(v,n-1);
}

What are the parts of an inductive proof?

Inductive Proof of Correctness

int sum(int v[], int n) {
  if (n==0) return 0;
  else return v[n-1]+sum(v,n-1);
}

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n.
Basis Step: The program is correct for n=0; it returns 0.
Inductive Hypothesis (n=k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n=k+1): sum(v,k+1) returns v[k]+sum(v,k), which is the sum of the first k+1 elements of v.

Proof by Contradiction
Assume the negation of the goal, then show this leads to a contradiction
Example: there is no program that solves the “halting problem”
– i.e., no program that determines whether any other program runs forever or not (Alan Turing, 1937)

Program NonConformist (Program P)
  If ( HALT(P) = “never halts” ) Then
    Halt
  Else
    Do While (1 > 0)
      Print “Hello!”
    End While
  End If
End Program

Does NonConformist(NonConformist) halt?
Yes? That means HALT(NonConformist) = “never halts”
No? That means HALT(NonConformist) = “halts”
Contradiction!

Defining Efficiency Asymptotic Complexity - how running time scales as function of size of input Why is this a reasonable definition?

Defining Efficiency Asymptotic Complexity - how running time scales as function of size of input Why is this a reasonable definition? –Many kinds of small problems can be solved in practice by almost any approach E.g., exhaustive enumeration of possible solutions Want to focus efficiency concerns on larger problems –Definition is independent of any possible advances in computer technology

Technology-Dependent Efficiency
Drum computers: popular technology of the early 1960s
Transistors were too costly to use for RAM, so memory was kept on a revolving magnetic drum
An efficient program scattered its instructions around the drum so that the next instruction to execute was under the read head just when it was needed
– Minimized the number of full revolutions of the drum during execution

The Apocalyptic Laptop
Seth Lloyd, SCIENCE, 31 Aug 2000
Speed is limited by energy: E = mc^2, so 1 kg of mass-energy is about 25 million megawatt-hours
Quantum mechanics: switching time is at least π·ħ / (2 × Energy), where ħ is Planck’s constant h divided by 2π
Result: about 5.4 × 10^50 operations per second

[Chart comparing how large a problem each machine can solve: the Ultimate Laptop running for 1 second vs. 1 year, against a 1000 MIPS machine running for 1 day vs. running since the Big Bang]

Defining Efficiency Asymptotic Complexity - how running time scales as function of size of input What is “size”? –Often: length (in characters) of input –Sometimes: value of input (if input is a number) Which inputs? –Worst case Advantages / disadvantages ? –Best case Why?

Average Case Analysis
More realistic analysis, first attempt:
– Assume inputs are randomly distributed according to some “realistic” distribution D
– Compute expected running time
– Drawbacks: often hard to define realistic random distributions; usually hard to perform the math

Amortized Analysis Instead of a single input, consider a sequence of inputs Choose worst possible sequence Determine average running time on this sequence Advantages –Often less pessimistic than simple worst-case analysis –Guaranteed results - no assumed distribution –Usually mathematically easier than average case analysis

Comparing Runtimes Program A is asymptotically less efficient than program B iff the runtime of A dominates the runtime of B, as the size of the input goes to infinity Note: RunTime can be “worst case”, “best case”, “average case”, “amortized case”

Which Function Dominates?
n^3 + 2n^2 vs. 100n^2 + 1000
n^0.1 vs. log n
n + 100n^0.1 vs. 2n + 10 log n
5n^5 vs. n!
n^(-15)·2^n / 100 vs. 1000n^15
8^(2 log n) vs. 3n^7 + 7n

Race I: n^3 + 2n^2 vs. 100n^2 + 1000

Race II: n^0.1 vs. log n

Race III: n + 100n^0.1 vs. 2n + 10 log n

Race IV: 5n^5 vs. n!

Race V: n^(-15)·2^n / 100 vs. 1000n^15

Race VI: 8^(2 log n) vs. 3n^7 + 7n
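
The finish line, worked out (the race results were revealed on slides that did not survive transcription; these follow from the dominance rules below): n^3 + 2n^2 dominates 100n^2 + 1000 (higher power of n); n^0.1 dominates log n (any positive power of n beats a logarithm); n + 100n^0.1 and 2n + 10 log n tie, since both are Θ(n); n! dominates 5n^5; n^(-15)·2^n / 100 dominates 1000n^15 (an exponential beats any polynomial, whatever the constants); and 3n^7 + 7n dominates 8^(2 log n), because 8^(2 log n) = 2^(6 log n) = n^6.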

Order of Magnitude Notation (big O) Asymptotic Complexity - how running time scales as function of size of input –We usually only care about order of magnitude of scaling Why?

Order of Magnitude Notation (big O)
Asymptotic complexity: how running time scales as a function of the size of the input
– We usually only care about the order of magnitude of the scaling. Why?
– As we saw, some functions overwhelm other functions. So if the running time is a sum of terms, we can drop the dominated terms.
– “True” constant factors depend on details of the compiler and hardware, so we might as well make the constant factor 1.

Eliminate low-order terms. Eliminate constant coefficients.
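
A quick illustration of both rules, worked out for concreteness: 3n^2 + 10·n·log n + 5 → drop the dominated terms (10·n·log n and 5) → 3n^2 → drop the constant coefficient → O(n^2).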

Common Names
(from slowest growth to fastest)
constant: O(1)
logarithmic: O(log n)
linear: O(n)
log-linear: O(n log n)
quadratic: O(n^2)
exponential: O(c^n) (c is a constant > 1)
Umbrella terms: superlinear: O(n^c) (c is a constant > 1); polynomial: O(n^c) (c is a constant > 0)

Summary Proofs by induction and contradiction Asymptotic complexity Worst case, best case, average case, and amortized asymptotic complexity Dominance of functions Order of magnitude notation Next: –Part One: Complexity, continued –Read Chapters 1 and 2

Part One: Complexity, continued Friday, October 4, 2002

Determining the Complexity of an Algorithm Empirical measurement Formal analysis (i.e. proofs) Question: what are likely advantages and drawbacks of each approach?

Determining the Complexity of an Algorithm
Empirical measurement
Formal analysis (i.e., proofs)
Question: what are the likely advantages and drawbacks of each approach?
– Empirical: pro: discover whether constant factors are significant; con: may be running on the “wrong” inputs
– Formal: pro: no interference from implementation/hardware details; con: can make a mistake in a proof!
In theory, theory is the same as practice, but not in practice.

Measuring Empirical Complexity: Linear vs. Binary Search
Task: find an item in a sorted array of length N

                        Linear Search    Binary Search
Time to find one item:
Time to find N items:

Binary search algorithm: see the C code on the next slide.

My C Code

void lfind(int x, int a[], int n) {
  for (i=0; i<n; i++)
    if (a[i] == x) return;
}

void bfind(int x, int a[], int n) {
  m = n / 2;
  if (x == a[m]) return;
  if (x < a[m]) bfind(x, a, m);
  else bfind(x, &a[m+1], n-m-1);
}

Timing loop: fill the array, then search for each element in turn:

for (i=0; i<n; i++) a[i] = i;
for (i=0; i<n; i++) lfind(i,a,n);   /* or bfind(i,a,n) */

Graphical Analysis

[Log/log plot of running time vs. N: the lfind curve has slope ≈ 2, the bfind curve has slope ≈ 1]

Property of Log/Log Plots
On a linear plot, a linear function is a straight line
On a log/log plot, any polynomial function is a straight line!
– The slope Δy/Δx is the same as the exponent
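
Why the exponent becomes the slope, in one line of algebra: if T(n) = c·n^k, then log T(n) = log c + k·log n, so plotting log T(n) against log n gives a straight line with slope k and intercept log c.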

[Plot: the bfind curve again, slope ≈ 1]
Why does O(n log n) look like a straight line?
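
Answer, spelled out: for T(n) = n·log n, log T(n) = log n + log(log n). The log log n term grows so slowly that over any realistic range of n the curve is nearly straight, with slope just slightly above 1.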

Summary Empirical and formal analyses of runtime scaling are both important techniques in algorithm development Large data sets may be required to gain an accurate empirical picture Log/log plots provide a fast and simple visual tool for estimating the exponent of a polynomial function

Formal Asymptotic Analysis In order to prove complexity results, we must make the notion of “order of magnitude” more precise Asymptotic bounds on runtime –Upper bound –Lower bound

Definition of Order Notation
Upper bound: T(n) = O(f(n))   “Big-O”
  There exist constants c and n' such that T(n) ≤ c·f(n) for all n ≥ n'
Lower bound: T(n) = Ω(g(n))   “Omega”
  There exist constants c and n' such that T(n) ≥ c·g(n) for all n ≥ n'
Tight bound: T(n) = Θ(f(n))   “Theta”
  When both hold: T(n) = O(f(n)) and T(n) = Ω(f(n))

Example: Upper Bound
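
The worked example on this slide was an image and did not survive transcription; here is a representative derivation in the same style. Claim: n^2 + 100n = O(n^2). Pick c = 2 and n' = 100: for all n ≥ 100 we have 100n ≤ n·n = n^2, so n^2 + 100n ≤ n^2 + n^2 = 2n^2, and the definition is satisfied.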

Using a Different Pair of Constants
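
This slide's equations were also lost; presumably it made the standard point that the witnesses are not unique. For the same claim, c = 101 and n' = 1 also work: for all n ≥ 1, 100n ≤ 100n^2, so n^2 + 100n ≤ 101n^2. Any pair of constants satisfying the definition proves the bound.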

Example: Lower Bound
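
A matching lower-bound example, reconstructed in the same spirit: claim n^2 + 100n = Ω(n^2). Pick c = 1 and n' = 0: n^2 + 100n ≥ n^2 for all n ≥ 0. Together with the upper bound, this gives n^2 + 100n = Θ(n^2).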

Conventions of Order Notation
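
This slide's bullet list was lost; the conventions it most plausibly stated are the standard ones: inside O(·), drop constant coefficients and low-order terms (write O(n), not O(3n + 2)); omit the base of logarithms, since log_a n and log_b n differ only by a constant factor; and read the “=” in T(n) = O(f(n)) as membership in a class of functions, not as equality.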

Upper/Lower vs. Worst/Best
Worst case upper bound is f(n)
– Guarantee that run time is no more than c·f(n)
Best case upper bound is f(n)
– If you are lucky, run time is no more than c·f(n)
Worst case lower bound is g(n)
– If you are unlucky, run time is at least c·g(n)
Best case lower bound is g(n)
– Guarantee that run time is at least c·g(n)

Analyzing Code
primitive operations
consecutive statements
function calls
conditionals
loops
recursive functions

Conditionals
Conditional: if C then S1 else S2
Suppose you are doing an O( ) analysis?
Suppose you are doing an Ω( ) analysis?

Conditionals
Conditional: if C then S1 else S2
For an O( ) analysis: Time(C) + Max(Time(S1), Time(S2)), or Time(C) + Time(S1) + Time(S2), or …
For an Ω( ) analysis: Time(C) + Min(Time(S1), Time(S2)), or just Time(C), or …

Nested Loops for i = 1 to n do for j = 1 to n do sum = sum + 1

Nested Loops
for i = 1 to n do for j = 1 to n do sum = sum + 1
The inner statement executes exactly n × n times, so the running time is Θ(n^2).

Nested Dependent Loops for i = 1 to n do for j = i to n do sum = sum + 1

Nested Dependent Loops
for i = 1 to n do for j = i to n do sum = sum + 1
Now the inner loop runs n − i + 1 times for each i, so the total count is n + (n−1) + … + 1, a sum we still need to evaluate.

Summary Formal definition of order of magnitude notation Proving upper and lower asymptotic bounds on a function Formal analysis of conditionals and simple loops Next: –Analyzing complex loops –Mathematical series –Analyzing recursive functions

Part One: Complexity, Continued Monday, October 7, 2002

Today’s Material Running time of nested dependent loops Mathematical series Formal analysis of linear search Formal analysis of binary search Solving recursive equations Stretchy arrays and the Stack ADT Amortized analysis

Nested Dependent Loops for i = 1 to n do for j = i to n do sum = sum + 1

Nested Dependent Loops
for i = 1 to n do for j = i to n do sum = sum + 1
Total number of executions of the inner statement: Σ_{i=1..n} (n − i + 1) = n + (n−1) + … + 1 = S(n), the arithmetic series.

Arithmetic Series
S(N) = 1 + 2 + … + N
Note that: S(1) = 1, S(2) = 3, S(3) = 6, S(4) = 10, …
Hypothesis: S(N) = N(N+1)/2
Prove by induction
– Base case: for N = 1, S(N) = 1(2)/2 = 1
– Assume true for N = k
– Suppose N = k+1. S(k+1) = S(k) + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k/2 + 1) = (k+1)(k+2)/2.

Other Important Series
Sum of squares: Σ_{i=1..n} i^2 = n(n+1)(2n+1)/6 ≈ n^3/3
Sum of exponents: Σ_{i=1..n} i^k ≈ n^(k+1)/|k+1| for k ≠ −1
Geometric series: Σ_{i=0..n} A^i = (A^(n+1) − 1)/(A − 1)
Novel series:
– Reduce to a known series, or prove inductively

Nested Dependent Loops
for i = 1 to n do for j = i to n do sum = sum + 1
Applying the arithmetic series: total = Σ_{i=1..n} (n − i + 1) = S(n) = n(n+1)/2, so the running time is Θ(n^2).

Linear Search Analysis

void lfind(int x, int a[], int n) {
  for (i=0; i<n; i++)
    if (a[i] == x) return;
}

Best case, tight analysis: Θ(1): x is found at the first position
Worst case, tight analysis: Θ(n): x is at the last position, or not present

Iterated Linear Search Analysis

for (i=0; i<n; i++) a[i] = i;
for (i=0; i<n; i++) lfind(i,a,n);

Easy worst-case upper bound: n calls, each O(n), so O(n^2)
Worst-case tight analysis: ?

Iterated Linear Search Analysis

for (i=0; i<n; i++) a[i] = i;
for (i=0; i<n; i++) lfind(i,a,n);

Easy worst-case upper bound: O(n^2)
Worst-case tight analysis: Θ(n^2)
– Just multiplying the worst case by n does not justify the answer, since each time lfind is called i is specified. But the call for element i scans i+1 cells, and Σ_{i=0..n−1} (i+1) = n(n+1)/2 = Θ(n^2).

Analyzing Recursive Programs
1. Express the running time T(n) as a recursive equation
2. Solve the recursive equation
For an upper-bound analysis, you can optionally simplify the equation to something larger
For a lower-bound analysis, you can optionally simplify the equation to something smaller

Binary Search

void bfind(int x, int a[], int n) {
  m = n / 2;
  if (x == a[m]) return;
  if (x < a[m]) bfind(x, a, m);
  else bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound?

Binary Search

void bfind(int x, int a[], int n) {
  m = n / 2;
  if (x == a[m]) return;
  if (x < a[m]) bfind(x, a, m);
  else bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound? Trick question: there isn't one. As written, bfind has no base case, so when x is not in the array the recursion never terminates (n reaches 0 and the calls repeat forever).

Binary Search

void bfind(int x, int a[], int n) {
  m = n / 2;
  if (n <= 1) return;
  if (x == a[m]) return;
  if (x < a[m]) bfind(x, a, m);
  else bfind(x, &a[m+1], n-m-1);
}

Okay, let’s prove it is O(log n)…

Binary Search
Introduce some constants…
b = time needed for the base case
c = time needed to get ready to do a recursive call
The running time is thus:
T(1) ≤ b
T(n) ≤ T(n/2) + c for n > 1

void bfind(int x, int a[], int n) {
  m = n / 2;
  if (n <= 1) return;
  if (x == a[m]) return;
  if (x < a[m]) bfind(x, a, m);
  else bfind(x, &a[m+1], n-m-1);
}

Binary Search Analysis
One sub-problem, half as large
Equation: T(1) ≤ b; T(n) ≤ T(n/2) + c for n > 1
Solution:
T(n) ≤ T(n/2) + c              write equation
     ≤ T(n/4) + c + c          expand
     ≤ T(n/8) + c + c + c
     ≤ T(n/2^k) + kc           inductive leap
     ≤ T(1) + c log n          select value for k: k = log n
     ≤ b + c log n = O(log n)  simplify

Solving Recursive Equations by Repeated Substitution Somewhat “informal”, but intuitively clear and straightforward

Solving Recursive Equations by Telescoping Create a set of equations, take their sum
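
The example equations on this slide were lost; a sketch of telescoping on the same recurrence, T(n) = T(n/2) + c, assuming n is a power of 2:

T(n)   = T(n/2) + c
T(n/2) = T(n/4) + c
T(n/4) = T(n/8) + c
  …
T(2)   = T(1)   + c

Summing all the equations, every intermediate T(n/2^i) appears once on each side and cancels, leaving T(n) = T(1) + c·log n.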

Solving Recursive Equations by Induction
Repeated substitution and telescoping construct the solution.
If you know the closed-form solution, you can validate it by ordinary induction.
For the induction, it may be convenient to increase n by a multiple (2n) rather than by n+1.

Inductive Proof
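
The proof on this slide was an image; a reconstruction of what it most likely showed, for the claim T(n) ≤ b + c·log n when T(1) ≤ b and T(n) ≤ T(n/2) + c:
Base case: T(1) ≤ b = b + c·log 1.
Inductive step (n → 2n): assume T(n) ≤ b + c·log n. Then T(2n) ≤ T(n) + c ≤ b + c·log n + c = b + c·(log n + 1) = b + c·log 2n.
Note how doubling n, rather than stepping to n+1, makes the recurrence line up with the hypothesis.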

Example: Sum of Integer Queue

sum_queue(Q){
  if (Q.length() == 0) return 0;
  else return Q.dequeue() + sum_queue(Q);
}

– One subproblem
– Linear reduction in size (decrease by 1)
Equation: T(0) = b; T(n) = c + T(n−1) for n > 0
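
Solving by the same repeated substitution as before: T(n) = c + T(n−1) = c + c + T(n−2) = … = kc + T(n−k) = nc + T(0) = nc + b, so summing the queue is O(n).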

Lower Bound Analysis: Recursive Fibonacci

int Fib(n){
  if (n == 0 or n == 1) return 1;
  else return Fib(n - 1) + Fib(n - 2);
}

Lower bound analysis: show T(n) = Ω(2^(n/2))
Instead of =, the equations will use ≥: T(n) ≥ some expression
We simplify the math by throwing out terms on the right-hand side

Analysis by Repeated Substitution
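
The derivation itself was an image; reconstructed along the lines the slide title indicates (b for the base case, c per call, and using T(n−1) ≥ T(n−2)):

T(n) ≥ T(n−1) + T(n−2) + c
     ≥ 2·T(n−2) + c
     ≥ 2·(2·T(n−4) + c) + c = 4·T(n−4) + 3c
     ≥ 2^k · T(n−2k) + (2^k − 1)·c     inductive leap

Choosing k = n/2 reaches the base case: T(n) ≥ 2^(n/2)·b, so T(n) = Ω(2^(n/2)): exponential time.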

Learning from Analysis
To avoid recursive calls:
– store all basis values in a table
– each time you calculate an answer, store it in the table
– before performing any calculation for a value n, check if a valid answer for n is in the table; if so, return it
Memoization – a form of dynamic programming
How much time does the memoized version take? (See the sketch below.)
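
A minimal C sketch of a memoized Fib, following the table scheme the slide describes (the table name and size cap are illustrative, not from the slide):

#include <stdio.h>

#define MAXN 1000
static long long memo[MAXN];          /* 0 marks "not yet computed" */

long long fib(int n) {
    if (n == 0 || n == 1) return 1;   /* basis values */
    if (memo[n] != 0) return memo[n]; /* valid answer already in table */
    memo[n] = fib(n - 1) + fib(n - 2);/* compute once, store in table */
    return memo[n];                   /* (values overflow long long past n ≈ 90) */
}

int main(void) {
    printf("%lld\n", fib(40));        /* fast; the unmemoized version makes
                                         hundreds of millions of calls */
    return 0;
}

Each fib(k) is computed at most once and every other call is a table lookup, so the memoized version runs in O(n) time and O(n) space.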

Amortized Analysis Consider any sequence of operations applied to a data structure Some operations may be fast, others slow Goal: show that the average time per operation is still good

Stack Abstract Data Type
Stack operations: push, pop, is_empty
Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped
What is the biggest problem with an array implementation?
[Diagram: a stack holding the letters A through E, with F being pushed]

Stretchy Stack Implementation

int[] data;
int maxsize;
int top;

Push(e){
  if (top == maxsize){
    temp = new int[2*maxsize];                   /* stretch: double the array */
    for (i=0; i<maxsize; i++) temp[i] = data[i]; /* copy old contents */
    data = temp;
    maxsize = 2*maxsize;
  }
  data[++top] = e;                               /* store e whether or not we stretched */
}

Best case Push = O(1)
Worst case Push = O(n)

Stretchy Stack Amortized Analysis
Consider a sequence of n operations: push(3); push(19); push(2); …
What is the max number of stretches?
What is the total time?
– Say a regular push takes time a, and stretching an array containing k elements takes time bk.
Amortized time = ?

Stretchy Stack Amortized Analysis
Consider a sequence of n operations: push(3); push(19); push(2); …
What is the max number of stretches? log n
What is the total time?
– Say a regular push takes time a, and stretching an array containing k elements takes time bk.
Amortized time = ?

Geometric Series
1 + 2 + 4 + … + 2^k = Σ_{i=0..k} 2^i = 2^(k+1) − 1
So the total cost of all the stretches, b(1 + 2 + 4 + … + n/2), is less than bn.

Stretchy Stack Amortized Analysis
Consider a sequence of n operations: push(3); push(19); push(2); …
Max number of stretches: log n
Total time: n regular pushes cost an; the stretches cost b(1 + 2 + 4 + … + n/2) < bn by the geometric series; total ≤ an + bn = (a+b)n = O(n)
Amortized time = O(n) / n = O(1) per operation

Surprise In an asymptotic sense, there is no overhead in using stretchy arrays rather than regular arrays!