CSC401 – Analysis of Algorithms Chapter 1 Algorithm Analysis Objectives: Introduce algorithm and algorithm analysis Discuss algorithm analysis methodologies Introduce pseudo code of algorithms Asymptotic notion of algorithm efficiency Mathematics foundation for algorithm analysis Amortization analysis techniques
2 What is an algorithm? An algorithm is a step-by-step procedure for solving a problem in a finite amount of time. (Figure: Input → Algorithm → Output) What is algorithm analysis? Two aspects: Running time – How much time is taken to complete the algorithm execution? Storage requirement – How much memory is required to execute the program?
3 Running Time Most algorithms transform input objects into output objects. The running time of an algorithm typically grows with the input size. Average case time is often difficult to determine. We focus on the worst case running time. –Easier to analyze –Crucial to applications such as games, finance and robotics
4 How Is An Algorithm Analyzed? Two methodologies –Experimental analysis Method –Write a program implementing the algorithm –Run the program with inputs of varying size and composition –Use a method like System.currentTimeMillis() to get an accurate measure of the actual running time –Plot the results Limitations –It is necessary to implement the algorithm, which may be difficult –Results may not be indicative of the running time on other inputs not included in the experiment. –In order to compare two algorithms, the same hardware and software environments must be used
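The experimental method can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name TimingExperiment, the helper timeOneRun, the random inputs, and the chosen input sizes are all hypothetical and not part of the course material. System.currentTimeMillis() is used because the slide names it; System.nanoTime() is generally the better choice for measuring elapsed time.

```java
import java.util.Random;

// Hypothetical harness for the experimental method; all names are illustrative.
final class TimingExperiment {
    // Sink that keeps the result "used" so the measured work is less likely to be optimized away.
    static volatile int sink;

    // Algorithm under test: maximum of a[0..n-1], assuming n >= 1.
    static int arrayMax(int[] a, int n) {
        int currentMax = a[0];
        for (int i = 1; i < n; i++) {
            if (a[i] > currentMax) {
                currentMax = a[i];
            }
        }
        return currentMax;
    }

    // Runs arrayMax once on a random input of size n and returns the elapsed milliseconds.
    static long timeOneRun(int n, long seed) {
        int[] input = new Random(seed).ints(n).toArray();
        long start = System.currentTimeMillis();
        sink = arrayMax(input, n);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Inputs of varying size; the printed pairs (n, time) are what one would plot.
        for (int n = 1_000_000; n <= 8_000_000; n *= 2) {
            System.out.println(n + "\t" + timeOneRun(n, 42L) + " ms");
        }
    }
}
```

The measured times depend on the machine, the JVM, and warm-up effects, which is exactly the limitation listed on the slide.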
5 How Is An Algorithm Analyzed? –Theoretical Analysis Method –Uses a high-level description of the algorithm instead of an implementation –Characterizes running time as a function of the input size, n. –Takes into account all possible inputs –Allows us to evaluate the speed of an algorithm independent of the hardware/software environment Characteristics –A description language -- Pseudo code –Mathematics
6 Pseudocode High-level description of an algorithm More structured than English prose Less detailed than a program Preferred notation for describing algorithms Hides program design issues
Example: find max element of an array
Algorithm arrayMax(A, n)
  Input array A of n integers
  Output maximum element of A
  currentMax ← A[0]
  for i ← 1 to n - 1 do
    if A[i] > currentMax then
      currentMax ← A[i]
  return currentMax
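For comparison with the pseudocode, a direct Java transcription of arrayMax might look like the sketch below. The class name ArrayMaxExample is hypothetical, and the method assumes 1 <= n <= a.length, which the pseudocode leaves implicit.

```java
// Hypothetical wrapper class; only arrayMax corresponds to the pseudocode.
final class ArrayMaxExample {
    // Returns the maximum of a[0..n-1]; assumes 1 <= n <= a.length.
    static int arrayMax(int[] a, int n) {
        int currentMax = a[0];
        for (int i = 1; i <= n - 1; i++) {
            if (a[i] > currentMax) {
                currentMax = a[i];
            }
        }
        return currentMax;
    }

    public static void main(String[] args) {
        int[] a = {3, 9, 2, 7};
        System.out.println(arrayMax(a, a.length)); // prints 9
    }
}
```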
7 Pseudocode Details Control flow –if … then … [else …] –switch … –while … do … –repeat … until … –for … do … –Indentation replaces braces Method declaration Algorithm method (arg [, arg…]) Input … Output … Array indexing: A[i] Method call var.method (arg [, arg…]) Return value return expression Expressions ← Assignment (like = in Java) = Equality testing (like == in Java) n^2 Superscripts and other mathematical formatting allowed
8 The Random Access Machine (RAM) Model A CPU A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character Memory cells are numbered and accessing any cell in memory takes unit time.
9 Primitive Operations Basic computations performed by an algorithm Identifiable in pseudocode Largely independent from the programming language Exact definition not important (we will see why later) Assumed to take a constant amount of time in the RAM model Examples: –Evaluating an expression –Assigning a value to a variable –Performing an arithmetic operation –Comparing two numbers –Indexing into an array –Following an object reference –Calling a method –Returning from a method
10 Counting Primitive Operations By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size
Algorithm arrayMax(A, n)              # operations
  currentMax ← A[0]                   2
  for i ← 1 to n - 1 do               1 + n
    if A[i] > currentMax then         2(n - 1)
      currentMax ← A[i]               2(n - 1)
    { increment counter i }           2(n - 1)
  return currentMax                   1
                               Total  7n - 2
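The count can be checked mechanically by instrumenting the algorithm. The sketch below charges operations the way the table does (2 for an index plus an assignment or comparison, 1 per loop test, 2 per counter increment). The class OperationCounter and the method countOps are hypothetical names, and the charging scheme is one reading of the slide, not the only possible one.

```java
// Hypothetical instrumented version of arrayMax; the charges mirror the slide's table.
final class OperationCounter {
    // Returns the number of primitive operations charged for arrayMax on a[0..n-1] (n >= 1).
    static long countOps(int[] a, int n) {
        long ops = 2;                 // currentMax <- A[0]: one index, one assignment
        int currentMax = a[0];
        ops += 1;                     // i <- 1
        int i = 1;
        while (true) {
            ops += 1;                 // loop test, executed n times in total
            if (i > n - 1) {
                break;
            }
            ops += 2;                 // A[i] > currentMax: one index, one comparison
            if (a[i] > currentMax) {
                ops += 2;             // currentMax <- A[i]: one index, one assignment
                currentMax = a[i];
            }
            ops += 2;                 // increment counter i: one addition, one assignment
            i++;
        }
        ops += 1;                     // return currentMax
        return ops;
    }
}
```

On an increasing array every comparison succeeds, so the count reaches the worst case 7n - 2; on a decreasing array the assignment inside the if never runs and the count drops to 5n.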
11 Estimating Running Time Algorithm arrayMax executes 7n - 2 primitive operations in the worst case. Define: a = Time taken by the fastest primitive operation b = Time taken by the slowest primitive operation Let T(n) be the worst-case time of arrayMax. Then a(7n - 2) <= T(n) <= b(7n - 2) Hence, the running time T(n) is bounded by two linear functions
12 Growth Rate of Running Time Changing the hardware/software environment –Affects T(n) by a constant factor, but –Does not alter the growth rate of T(n) The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax Growth rates of functions: –Linear n –Quadratic n^2 –Cubic n^3 In a log-log chart, the slope of the line corresponds to the growth rate of the function
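The log-log remark can be made concrete with a small numerical sketch: the slope between the points for n and 2n is (log f(2n) - log f(n)) / log 2, which approaches the exponent of the dominant term. The class GrowthRate and the method logLogSlope are hypothetical names chosen for this illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper that estimates the slope of f on a log-log chart.
final class GrowthRate {
    // Slope between (log n, log f(n)) and (log 2n, log f(2n)); assumes f(n) > 0 and f(2n) > 0.
    static double logLogSlope(DoubleUnaryOperator f, double n) {
        double rise = Math.log(f.applyAsDouble(2 * n)) - Math.log(f.applyAsDouble(n));
        double run = Math.log(2.0);
        return rise / run;
    }

    public static void main(String[] args) {
        System.out.println(logLogSlope(x -> 7 * x - 2, 1000));     // close to 1 (linear)
        System.out.println(logLogSlope(x -> 5 * x * x, 1000));     // close to 2 (quadratic)
        System.out.println(logLogSlope(x -> x * x * x, 1000));     // close to 3 (cubic)
    }
}
```

Constant factors cancel in the difference of logarithms, which is why they do not change the slope.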
13 Constant Factors The growth rate is not affected by –constant factors or –lower-order terms Examples –10^2 n + 10^5 is a linear function –10^5 n^2 + 10^8 n is a quadratic function
14 Big-Oh Notation Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n_0 such that f(n) <= cg(n) for n >= n_0 Example: 2n + 10 is O(n) –2n + 10 <= cn –(c - 2) n >= 10 –n >= 10/(c - 2) –Pick c = 3 and n_0 = 10 Example: the function n^2 is not O(n) –n^2 <= cn –n <= c –The above inequality cannot be satisfied since c must be a constant
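A chosen pair (c, n_0) can be sanity-checked numerically. The sketch below tests f(n) <= c g(n) over a finite range only, so it can refute a bad choice of constants but cannot replace the algebraic argument above. The class BigOhCheck and the method holdsOnRange are hypothetical names.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical finite check of the big-Oh inequality; not a proof.
final class BigOhCheck {
    // Returns true if f(n) <= c * g(n) for every integer n with n0 <= n <= limit.
    static boolean holdsOnRange(LongUnaryOperator f, LongUnaryOperator g,
                                long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f.applyAsLong(n) > c * g.applyAsLong(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2n + 10 <= 3n holds from n_0 = 10 onward, but not from n = 9.
        System.out.println(holdsOnRange(n -> 2 * n + 10, n -> n, 3, 10, 100_000)); // true
        System.out.println(holdsOnRange(n -> 2 * n + 10, n -> n, 3, 9, 100_000));  // false
        // n^2 <= cn fails as soon as n exceeds c, whatever constant c is tried.
        System.out.println(holdsOnRange(n -> n * n, n -> n, 1000, 1, 2000));       // false
    }
}
```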
15 More Big-Oh Examples 7n - 2 is O(n): need c > 0 and n_0 >= 1 such that 7n - 2 <= cn for n >= n_0; this is true for c = 7 and n_0 = 1 3n^3 + 20n^2 + 5 is O(n^3): need c > 0 and n_0 >= 1 such that 3n^3 + 20n^2 + 5 <= cn^3 for n >= n_0; this is true for c = 4 and n_0 = 21 3 log n + log log n is O(log n): need c > 0 and n_0 >= 1 such that 3 log n + log log n <= c log n for n >= n_0; this is true for c = 4 and n_0 = 2
16 Big-Oh and Growth Rate The big-Oh notation gives an upper bound on the growth rate of a function The statement "f(n) is O(g(n))" means that the growth rate of f(n) is no more than the growth rate of g(n) We can use the big-Oh notation to rank functions according to their growth rate
                   f(n) is O(g(n))   g(n) is O(f(n))
g(n) grows more    Yes               No
f(n) grows more    No                Yes
Same growth        Yes               Yes
17 Big-Oh Rules If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e., 1. Drop lower-order terms 2. Drop constant factors Use the smallest possible class of functions –Say "2n is O(n)" instead of "2n is O(n^2)" Use the simplest expression of the class –Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
18 Asymptotic Algorithm Analysis The asymptotic analysis of an algorithm determines the running time in big-Oh notation To perform the asymptotic analysis –We find the worst-case number of primitive operations executed as a function of the input size –We express this function with big-Oh notation Example: –We determine that algorithm arrayMax executes at most 7n - 2 primitive operations –We say that algorithm arrayMax "runs in O(n) time" Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations
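19 Summations
Summation: ∑_{i=s}^{t} f(i) = f(s) + f(s+1) + … + f(t)
Geometric summation: ∑_{i=0}^{n} a^i = (a^(n+1) - 1)/(a - 1), for a != 1
Natural summation: ∑_{i=1}^{n} i = n(n + 1)/2
Harmonic number: H_n = ∑_{i=1}^{n} 1/i
Split summation: ∑_{i=1}^{n} f(i) = ∑_{i=1}^{k} f(i) + ∑_{i=k+1}^{n} f(i)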
20 Logarithms –properties of logarithms: log_b(xy) = log_b x + log_b y log_b(x/y) = log_b x - log_b y log_b(x^a) = a log_b x log_b a = log_x a / log_x b Exponents –properties of exponentials: a^(b+c) = a^b a^c a^(bc) = (a^b)^c a^b / a^c = a^(b-c) b = a^(log_a b) b^c = a^(c*log_a b) Floor and ceiling functions –⌊x⌋: the largest integer less than or equal to x –⌈x⌉: the smallest integer greater than or equal to x Logarithms and Exponents
21 By Example -- counterexample Contrapositive –Principle: To justify "if p is true, then q is true", show "if q is not true, then p is not true". –Example: To justify "if ab is odd, then a is odd or b is even", assume "'a is odd or b is even' is not true", then "a is even and b is odd", then "ab is even", then "'ab is odd' is not true" Contradiction –Principle: To justify "p is true", show "if p is not true, then there exists a contradiction". –Example: To justify "if ab is odd, then a is odd or b is even", let "ab is odd", and assume "'a is odd or b is even' is not true", then "a is even and b is odd", thus "ab is even", which contradicts "ab is odd". Proof Techniques
22 Induction -- To justify S(n) for n>=n0 –Principle Base cases: Justify S(n) is true for n0<=n<=n1 Assumption: Assume S(n) is true for n=N>=n1; or Assume S(n) is true for n0<=n<=N Induction: Justify S(n) is true for n=N+1 –Example: Question: Fibonacci sequence is defined as F(1)=1, F(2)=1, and F(n)=F(n-1)+F(n-2) for n>2. Justify F(n)<2^n for n>=1 Proof by induction where n0=1. Let n1=2. –Base cases: For n0<=n<=n1, n=1 or n=2. If n=1, F(1)=1<2=2^1. If n=2, F(2)=1<4=2^2. It holds. –Assumption: Assume F(n)<2^n for 1<=n<=N, where N>=2. –Induction: For n=N+1, F(N+1)=F(N)+F(N-1)<2^N+2^(N-1)<2*2^N=2^(N+1). It holds. Proof by Induction
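The claim F(n) < 2^n can also be spot-checked numerically for small n. This is only a sanity check alongside the induction proof, not a substitute for it. The class FibonacciBound and the method boundHolds are hypothetical names, and the check is limited to n <= 62 so that 2^n fits in a Java long.

```java
// Hypothetical numeric spot check of F(n) < 2^n; the proof itself is the induction above.
final class FibonacciBound {
    // Returns true if F(n) < 2^n for every n with 1 <= n <= maxN; assumes 1 <= maxN <= 62.
    static boolean boundHolds(int maxN) {
        long[] f = new long[Math.max(maxN, 2) + 1];
        f[1] = 1;                       // F(1) = 1
        f[2] = 1;                       // F(2) = 1
        for (int n = 3; n <= maxN; n++) {
            f[n] = f[n - 1] + f[n - 2]; // F(n) = F(n-1) + F(n-2)
        }
        for (int n = 1; n <= maxN; n++) {
            if (f[n] >= (1L << n)) {    // 1L << n is 2^n
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundHolds(62)); // prints true
    }
}
```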
23 Proof by Loop Invariants Principle: –To prove a statement S about a loop is correct, define S in terms of a series of smaller statements S0, S1, …, Sk, where The initial claim S0 is true before the loop begins If Si-1 is true before iteration i begins, then show that Si is true after iteration i is over The final statement Sk implies that S is true Example: Consider the algorithm arrayMax –Statement S: max is the maximum number when finished –A series of smaller statements: Si: max is the maximum in the first i+1 elements of the array S0 is true before the loop If Si is true, then it is easy to show that Si+1 is true S=Sn-1 is also true
Algorithm arrayMax(A, n)
  max ← A[0]
  for i ← 1 to n - 1 do
    if A[i] > max then
      max ← A[i]
  return max
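The invariant Si can be checked at run time as a debugging aid. In the sketch below, the class LoopInvariantDemo and the helper maxOfPrefix are hypothetical; the helper recomputes the maximum of the first i+1 elements independently, which makes the checked version quadratic, so it only illustrates the statements S0, S1, …, Sn-1 and is not how arrayMax would be written in practice.

```java
// Hypothetical run-time check of the loop invariant for arrayMax.
final class LoopInvariantDemo {
    // Independent reference: maximum of a[0..last].
    static int maxOfPrefix(int[] a, int last) {
        int best = a[0];
        for (int k = 1; k <= last; k++) {
            best = Math.max(best, a[k]);
        }
        return best;
    }

    // arrayMax with the invariant S_i verified after each iteration; assumes 1 <= n <= a.length.
    static int arrayMaxChecked(int[] a, int n) {
        int max = a[0];
        if (max != maxOfPrefix(a, 0)) {            // S_0 holds before the loop
            throw new IllegalStateException("S0 violated");
        }
        for (int i = 1; i <= n - 1; i++) {
            if (a[i] > max) {
                max = a[i];
            }
            if (max != maxOfPrefix(a, i)) {        // S_i holds after iteration i
                throw new IllegalStateException("S" + i + " violated");
            }
        }
        return max;                                // S_{n-1} is the statement S
    }
}
```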
24 Sample space: the set of all possible outcomes from some experiment Probability space: a sample space S together with a probability function that maps subsets of S to real numbers between 0 and 1 Event: Each subset A of S is called an event Properties of the probability function Pr –Pr(∅)=0 –Pr(S)=1 –0<=Pr(A)<=1 for any subset A of S –If A and B are subsets of S and A∩B=∅, then Pr(A∪B)=Pr(A)+Pr(B) Independence –A and B are independent if Pr(A∩B)=Pr(A)Pr(B) –A_1, A_2, …, A_n are mutually independent if Pr(A_1∩A_2∩…∩A_n)=Pr(A_1)Pr(A_2)…Pr(A_n), and the same product rule holds for every subcollection of these events Basic Probability
25 Conditional probability –The conditional probability that A occurs, given B, is defined as: Pr(A|B)=Pr(A∩B)/Pr(B), assuming Pr(B)>0 Random variables –Intuitively, variables whose values depend on the outcomes of some experiment –Formally, a function X that maps outcomes from some sample space S to real numbers –Indicator random variable: a variable that maps outcomes to either 0 or 1 Expectation: a random variable takes random values –Intuitively, the average value of a random variable –Formally, the expected value of a random variable X is defined as E(X) = ∑_x x Pr(X=x) –Properties: Linearity: E(X+Y) = E(X) + E(Y) Independence: If X and Y are independent, that is, Pr(X=x|Y=y)=Pr(X=x), then E(XY)=E(X)E(Y) Basic Probability
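The two expectation properties can be illustrated by enumerating a small sample space. The sketch below uses two fair six-sided dice, an example chosen here and not taken from the slides; the class DiceExpectation and its methods are hypothetical names.

```java
// Hypothetical worked example: expectations for two independent fair dice X and Y.
final class DiceExpectation {
    // E(X) = sum over x of x * Pr(X = x), with Pr(X = x) = 1/6 for x = 1..6.
    static double expectedSingle() {
        int total = 0;
        for (int x = 1; x <= 6; x++) {
            total += x;
        }
        return total / 6.0;
    }

    // E(X + Y), computed over all 36 equally likely outcomes (x, y).
    static double expectedSum() {
        int total = 0;
        for (int x = 1; x <= 6; x++) {
            for (int y = 1; y <= 6; y++) {
                total += x + y;
            }
        }
        return total / 36.0;
    }

    // E(XY), computed over the same 36 outcomes.
    static double expectedProduct() {
        int total = 0;
        for (int x = 1; x <= 6; x++) {
            for (int y = 1; y <= 6; y++) {
                total += x * y;
            }
        }
        return total / 36.0;
    }
}
```

Here E(X) = 3.5, E(X + Y) = 7 = E(X) + E(Y) by linearity, and E(XY) = 12.25 = E(X)E(Y) because the two dice are independent.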
26 Computing Prefix Averages We further illustrate asymptotic analysis with two algorithms for prefix averages The i-th prefix average of an array X is the average of the first (i + 1) elements of X: A[i] = (X[0] + X[1] + … + X[i])/(i + 1) Computing the array A of prefix averages of another array X has applications to financial analysis
27 Prefix Averages (Quadratic) The following algorithm computes prefix averages in quadratic time by applying the definition
Algorithm prefixAverages1(X, n)            #operations
  Input array X of n integers
  Output array A of prefix averages of X
  A ← new array of n integers              n
  for i ← 0 to n - 1 do                    n
    s ← X[0]                               n
    for j ← 1 to i do                      1 + 2 + … + (n - 1)
      s ← s + X[j]                         1 + 2 + … + (n - 1)
    A[i] ← s / (i + 1)                     n
  return A                                 1
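A Java transcription of prefixAverages1 might look like the sketch below. Two choices here are assumptions rather than part of the slide: the class name PrefixAveragesQuadratic, and the use of a double[] result (the pseudocode allocates an array of integers, but averages are generally not integers).

```java
// Hypothetical Java version of prefixAverages1; returns doubles so averages are not truncated.
final class PrefixAveragesQuadratic {
    // Computes a[i] = (x[0] + ... + x[i]) / (i + 1) for 0 <= i < n by re-summing each prefix.
    static double[] prefixAverages1(int[] x, int n) {
        double[] a = new double[n];
        for (int i = 0; i <= n - 1; i++) {
            long s = x[0];
            for (int j = 1; j <= i; j++) {   // inner loop runs i times: 0 + 1 + ... + (n - 1) in total
                s += x[j];
            }
            a[i] = (double) s / (i + 1);
        }
        return a;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        System.out.println(java.util.Arrays.toString(prefixAverages1(x, x.length)));
        // prints [1.0, 1.5, 2.0, 2.5]
    }
}
```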
28 Arithmetic Progression The running time of prefixAverages1 is O(1 + 2 + … + n) The sum of the first n integers is n(n + 1)/2 –There is a simple visual proof of this fact Thus, algorithm prefixAverages1 runs in O(n^2) time
29 Prefix Averages (Linear) The following algorithm computes prefix averages in linear time by keeping a running sum
Algorithm prefixAverages2(X, n)            #operations
  Input array X of n integers
  Output array A of prefix averages of X
  A ← new array of n integers              n
  s ← 0                                    1
  for i ← 0 to n - 1 do                    n
    s ← s + X[i]                           n
    A[i] ← s / (i + 1)                     n
  return A                                 1
Algorithm prefixAverages2 runs in O(n) time
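The running-sum version translates to Java along the same lines. As an assumption made for this sketch, the result is a double[] rather than the integer array of the pseudocode, and the class name PrefixAveragesLinear is hypothetical.

```java
// Hypothetical Java version of prefixAverages2; one pass with a running sum.
final class PrefixAveragesLinear {
    // Computes a[i] = (x[0] + ... + x[i]) / (i + 1) for 0 <= i < n in O(n) time.
    static double[] prefixAverages2(int[] x, int n) {
        double[] a = new double[n];
        long s = 0;                          // running sum of x[0..i]
        for (int i = 0; i <= n - 1; i++) {
            s += x[i];
            a[i] = (double) s / (i + 1);
        }
        return a;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        System.out.println(java.util.Arrays.toString(prefixAverages2(x, x.length)));
        // prints [1.0, 1.5, 2.0, 2.5]
    }
}
```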
30 Amortization Amortization –A typical data structure supports a wide variety of operations for accessing and updating the elements –Each operation takes a varying amount of running time –Rather than focusing on each operation individually –Consider the interactions between all the operations by studying the running time of a series of these operations –Average the operations' running time Amortized running time –The amortized running time of an operation within a series of operations is defined as the worst-case running time of the series of operations divided by the number of operations –Some operations may have a much higher actual running time than their amortized running time, while others have a much lower one
31 The Clearable Table Data Structure The clearable table –An ADT storing a table of elements that are accessed by their index in the table –Two methods: add(e) -- add an element e to the next available cell in the table clear() -- empty the table by removing all elements Consider a series of n operations (add and clear) performed on a clearable table S –Each add takes O(1) –Each clear takes O(n) –Thus, a naive bound for the series is O(n^2), obtained by charging every one of the n operations the O(n) worst-case cost of a clear
32 Amortization Analysis Theorem: –A series of n operations on an initially empty clearable table implemented with an array takes O(n) time Proof: –Let M_0, M_1, …, M_{n-1} be the series of operations performed on S, where k operations are clear –Let M_{i_0}, M_{i_1}, …, M_{i_{k-1}} be the k clear operations within the series, and the others be the (n-k) add operations –Define i_{-1} = -1; M_{i_j} takes time i_j - i_{j-1}, because at most i_j - i_{j-1} - 1 elements are added by add operations between M_{i_{j-1}} and M_{i_j} –The total time of the series is: (n-k) + ∑_{j=0}^{k-1} (i_j - i_{j-1}) = n - k + (i_{k-1} - i_{-1}) <= 2n - k Total time is O(n) Amortized time is O(1)
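The bound can be observed on a small instrumented implementation. In the sketch below, the class ClearableTable, the fixed capacity, and the cost model are assumptions made for illustration: an add is charged 1 unit and a clear is charged 1 plus the number of elements removed, which matches the i_j - i_{j-1} charge used in the proof.

```java
// Hypothetical array-based clearable table that tallies the cost of its operations.
final class ClearableTable {
    private final int[] cells;
    private int size = 0;
    private long totalCost = 0;

    ClearableTable(int capacity) {
        cells = new int[capacity];
    }

    // Adds e to the next available cell; charged 1 unit. Assumes the table is not full.
    void add(int e) {
        cells[size++] = e;
        totalCost += 1;
    }

    // Removes all elements; charged 1 unit plus 1 unit per element removed (assumed cost model).
    void clear() {
        for (int i = 0; i < size; i++) {
            cells[i] = 0;
        }
        totalCost += 1 + size;
        size = 0;
    }

    int size() {
        return size;
    }

    long getTotalCost() {
        return totalCost;
    }
}
```

For the series add, add, add, clear, add, clear, clear (n = 7, k = 3) the charges are 1, 1, 1, 4, 1, 2, 1, for a total of 11 = 2n - k, which is within the 2n bound.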
33 Accounting Method The method –Use a scheme of credits and debits: each operation pays an amount of cyber-dollars –Some operations overpay --> credits –Some operations underpay --> debits –Keep the balance at any time at least 0 Example: the clearable table –Each operation pays two cyber-dollars –add always overpays one dollar -- one credit –clear may underpay a variety of dollars: the underpaid amount equals the number of add operations since the last clear, minus 2 –Thus, the balance is at least 0 –So the total cost of n operations is at most 2n cyber-dollars -- some credits may remain unused
34 Potential Functions Based on energy model –associate a value with the structure, Φ, representing the current energy state –each operation contributes to Φ a certain amount of energy t' and consumes a varying amount of energy t –Φ_0 -- the initial energy, Φ_i -- the energy after the i-th operation –for the i-th operation t_i -- the actual running time t'_i -- the amortized running time t_i = t'_i + Φ_{i-1} - Φ_i –Overall: T=∑(t_i), T'=∑(t'_i) –The total actual time T = T' + Φ_0 - Φ_n –As long as Φ_0 <= Φ_n, T<=T'
35 Potential Functions Example -- The clearable table –the current energy state Φ is defined as the number of elements in the table, thus Φ>=0 –each operation contributes to Φ t'=2 –Φ_0 = 0 –Φ_i = Φ_{i-1} + 1, if the i-th operation is add add: t = t' + Φ_{i-1} - Φ_i = 2 - 1 = 1 –Φ_i = 0, if the i-th operation is clear clear: t = t' + Φ_{i-1} - Φ_i = 2 + Φ_{i-1} –Overall: T=∑(t), T'=∑(t') –The total actual time T = T' + Φ_0 - Φ_n –Because Φ_0 = 0 <= Φ_n, T<=T' =2n
36 Extendable Array ADT: Extendable array is an array with extendable size. One of its methods is add, which adds an element to the array. If the array is not full, the element is added to the first available cell. When the array is full, the add method performs the following –Allocate a new array with double the size –Copy elements from the old array to the new array –Add the element to the first available cell in the new array –Replace the old array with the new array Question: What is the amortized time of add? –Two situations of add: add -- O(1) Extend and add -- Θ(n) –Amortization analysis: Each add deposits 3 dollars and spends 1 dollar Each extension spends k dollars to grow from size k to 2k (copy) Amortized time of add is O(1)
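The doubling strategy can be sketched in Java with a cost counter to observe the amortized bound. The class ExtendableArray, its initial capacity of 1, and the cost model (1 unit per element written or copied) are assumptions made for this illustration; java.util.ArrayList provides the same behaviour in practice, with its own growth policy.

```java
// Hypothetical extendable array of ints that doubles its capacity when full.
final class ExtendableArray {
    private int[] data = new int[1];   // assumed initial capacity of 1
    private int size = 0;
    private long totalCost = 0;        // 1 unit per element written or copied

    void add(int e) {
        if (size == data.length) {
            // Extend: allocate double the size and copy the old elements (costs size units).
            int[] bigger = new int[2 * data.length];
            for (int i = 0; i < size; i++) {
                bigger[i] = data[i];
            }
            totalCost += size;
            data = bigger;
        }
        data[size++] = e;              // the add itself costs 1 unit
        totalCost += 1;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return data[i];
    }

    int size() {
        return size;
    }

    long getTotalCost() {
        return totalCost;
    }
}
```

With 1000 adds starting from capacity 1, extensions copy 1 + 2 + 4 + … + 512 = 1023 elements, so the total charge is 2023, under the 3 dollars per add that the accounting argument budgets.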
37 Relatives of Big-Oh big-Omega –f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n_0 >= 1 such that f(n) >= cg(n) for n >= n_0 big-Theta –f(n) is Θ(g(n)) if there are constants c' > 0 and c'' > 0 and an integer constant n_0 >= 1 such that c'g(n) <= f(n) <= c''g(n) for n >= n_0 little-oh –f(n) is o(g(n)) if, for any constant c > 0, there is an integer constant n_0 >= 0 such that f(n) <= cg(n) for n >= n_0 little-omega –f(n) is ω(g(n)) if, for any constant c > 0, there is an integer constant n_0 >= 0 such that f(n) >= cg(n) for n >= n_0
38 Intuition for Asymptotic Notation Big-Oh –f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n) big-Omega –f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n) big-Theta –f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n) little-oh –f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n) little-omega –f(n) is ω(g(n)) if f(n) is asymptotically strictly greater than g(n)
39 Example Uses of the Relatives of Big-Oh 5n^2 is Ω(n^2): f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n_0 >= 1 such that f(n) >= cg(n) for n >= n_0; let c = 5 and n_0 = 1 5n^2 is Ω(n): f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n_0 >= 1 such that f(n) >= cg(n) for n >= n_0; let c = 1 and n_0 = 1 5n^2 is ω(n): f(n) is ω(g(n)) if, for any constant c > 0, there is an integer constant n_0 >= 0 such that f(n) >= cg(n) for n >= n_0; need 5n_0^2 >= cn_0; given c, the n_0 that satisfies this is n_0 >= c/5 >= 0