Complexity Analysis. Text is mainly from Chapter 2 of Drozdek.
Computational complexity: developed by Juris Hartmanis & Richard E. Stearns. Compares the efficiency of algorithms. Assesses the degree of difficulty of an algorithm: how much effort is needed to APPLY it, an estimate of the resources it requires, how costly it is.
What is the COST of an algorithm??? Time and space.
How can you practically compare 2 algorithms??? Run them on the same machine. Implement them using the same language. Use the same input data. Time units alone, such as nanoseconds or microseconds, should not be used…WHY??? Instead, logical units expressing the relation between the size of the file ‘n’ and the amount of time ‘t’ required to process it should be used.
Problems with this approach: Effort (one program may be better written than the other). The choice of tests may favor one of the algorithms. We want a method in which we compare two algorithms WITHOUT executing them. The method should be independent of hardware/compiler/operating system.
What do we need????? We require algorithm analysis!!! Estimate the performance of an algorithm through the number of operations required to process an input of a certain size. This requires a function expressing the relation between n & t, called the time complexity function T(n). For calculating T(n) we need to compute the total number of program steps (a step can be an executable statement or a meaningful program segment).
Calculating T(n): A program step is a syntactically/semantically meaningful segment of a program. A step DOES NOT correspond to a definite time unit. A step count tells us how the run time for a program changes with a change in data size. Calculate the total number of steps/executable statements in a program: find the frequency of each statement and sum them up. Don’t count comments and declarations.
Analysis of ‘for’ Loop
1: for (i=0;i<n;++i)
2:     sum++;
3: cout << sum;
You can see…. that statement 1 executes n+1 times (the condition always executes one more time than the loop body itself), statement 2 executes n times, and statement 3 executes 1 time. T(n) = ???
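One way to check this count is to instrument the loop with a step counter; a minimal, runnable C++ sketch (the variables n and steps are illustrative, not from the slides):

#include <iostream>
using namespace std;

int main() {
    int n = 5, sum = 0, steps = 0;
    int i = 0;
    while (true) {
        ++steps;              // statement 1: the loop test runs n+1 times
        if (!(i < n)) break;
        ++steps;              // statement 2: sum++ runs n times
        sum++;
        ++i;
    }
    ++steps;                  // statement 3: cout runs 1 time
    cout << sum << '\n';
    cout << "steps = " << steps << '\n';   // prints 12, i.e. (n+1)+n+1 = 2n+2
    return 0;
}

So T(n) = (n+1) + n + 1 = 2n+2.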
Nested ‘for’ loops
for (i=0;i<n;++i) {
    for (j=0;j<m;++j)
        sum++;
}
The statement sum++ executes n*m times. So in a nested for loop, if the loop variables are independent, then: the total number of times a statement executes = outer loop times * inner loop times.
Problems with T(n): T(n) is difficult to calculate. T(n) is also not very meaningful, as step size is not exactly defined. T(n) is usually very complicated, so we need an approximation of T(n)…close to T(n). This measure of efficiency, or approximation of T(n), is called ASYMPTOTIC COMPLEXITY or ASYMPTOTIC ALGORITHM ANALYSIS. Asymptotic complexity studies the efficiency of an algorithm as the input size becomes large.
Example: If T(n) = 7n+100, what is T(n) for different values of n???

n        T(n)       Comment
1        107        contributing factor is 100
5        135        contributing factors are 7n and 100
10       170
100      800        contribution of 100 is small
1000     7100       contributing factor is 7n
10000    70100
10⁶      7000100

What is the contributing factor???? When approximating T(n) we can IGNORE the 100 term for very large values of n and say that T(n) can be approximated by 7n.
Example 2: T(n) = n² + 100n + log₁₀n + 1000

n       T(n)             n²               100n              log₁₀n       1000
1       1,101            1 (0.1%)         100 (9.1%)        0 (0%)       1,000 (90.8%)
10      2,101            100 (4.8%)       1,000 (47.6%)     1 (0.05%)    1,000 (47.6%)
100     21,002           10,000 (47.6%)   10,000 (47.6%)    2 (0.01%)    1,000 (4.76%)
10⁵     10,010,001,005   10¹⁰ (99.9%)     10⁷ (0.099%)      5 (0.0%)     1,000 (0.00%)

When approximating T(n) we can IGNORE the last 3 terms and say that T(n) can be approximated by n².
Big-Oh or Big-O. Definition: f(n) is O(g(n)) if there exist positive numbers c & N such that f(n) <= c·g(n) for all n >= N. g(n) is called the upper bound on f(n), OR: f(n) grows at most as large as g(n). Example: T(n) = n² + 3n + 4, and n² + 3n + 4 <= 2n² for all n > 10. (What is f(n) and what is g(n)????) (What is c???? & What is N????) So we can say that T(n) is O(n²), OR T(n) is in the order of n². T(n) is bounded above by a positive real multiple of n².
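One workable choice of constants (among many) is c = 2 and N = 10; a quick check in math form:

\begin{aligned}
f(n) &= n^2 + 3n + 4, \qquad g(n) = n^2, \\
3n + 4 &\le n^2 \quad \text{for all } n \ge 10 \quad (\text{e.g. } 3\cdot 10 + 4 = 34 \le 100), \\
\text{so } f(n) &= n^2 + 3n + 4 \;\le\; n^2 + n^2 \;=\; 2n^2 \;=\; c\,g(n) \quad \text{for all } n \ge N.
\end{aligned}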
What does it mean???? An example… For three constants c1, c2, c3 (it doesn’t matter what the values are) there will be an n beyond which the program with complexity c3·n will be faster than the one with complexity c1·n² + c2·n. This value of n is called the break-even point. If the break-even point is 0, then c3·n will always be faster.
COMPLEXITY. Space complexity is the amount of memory a program needs to run to completion. Time complexity is the amount of computer time a program needs to run to completion. If f(n) = a_m·n^m + a_(m-1)·n^(m-1) + … + a_1·n + a_0, then f(n) = O(n^m).
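A one-line justification of the polynomial rule (a sketch, assuming n >= 1):

\[
f(n) \;=\; \sum_{i=0}^{m} a_i n^i \;\le\; \Big(\sum_{i=0}^{m} |a_i|\Big)\, n^m \quad \text{for } n \ge 1,
\]

so the definition of Big-O is satisfied with c = |a_m| + … + |a_1| + |a_0| and N = 1, giving f(n) = O(n^m).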
FACTS ABOUT BIG-O: Gives us a means for comparing algorithms. It tells us about the growth rate of an algorithm as input size becomes large. It’s also called the asymptotic growth rate, asymptotic order, or simply the order of the function. ASYMPTOTIC NOTATION.
Properties of Big-Oh: If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)). If f(n) is O(h(n)) and g(n) is O(h(n)), then f(n) + g(n) is O(h(n)). a·n^k is O(n^k) where a is a constant. n^k is O(n^(k+j)) for any positive j. If f(n) = c·g(n), then f(n) is O(g(n)). The function log_a(n) is O(log_b(n)) for all positive a & b ≠ 1, so we can just write O(log n) (without the base).
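The last property follows from the change-of-base identity (a short derivation, not on the slide):

\[
\log_a n \;=\; \frac{\log_b n}{\log_b a} \;=\; c \cdot \log_b n,
\qquad c = \frac{1}{\log_b a} \ \text{(a positive constant)},
\]

so log_a(n) is O(log_b(n)) (and vice versa), which is why the base can be dropped.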
Growth rates
This is O(1):
    sum++;
This is O(n):
    for (i=0;i<n;++i)
        sum++;
This is O(n²):
    for (i=0;i<n;++i)
        for (j=0;j<n;++j)
            sum++;
Growth rates. Which growth rate is better??? Add n!, nⁿ, and 2ⁿ to the graph and see where they fit. Here we have: constant, linear, logarithmic, quadratic, cubic. Constant is certainly what I would like my program to be!!! (but this is just wishful thinking). Graph from Adam Drozdek’s book.
Comparison of Growth Rates (if one operation takes 10⁻¹¹ seconds)

N     log₂N   N·log₂N   N²       N³          2ᴺ
2     1       2         4        8           4
8     3       24        64       512         256
64    6       384       4,096    262,144     about 5 years
128   7       896       16,384   2,097,152   approx. 6 billion years, 600,000 times more than the age of the universe

REF: Nell Dale
EXAMPLE OF O(log n)
sum = 0;
for (i=1;i<n;i=i*2)
    sum = sum+1;

i      sum
1      1
2      1+1=2
4      2+1=3
8      3+1=4
…      …
n/2    log₂n
n      exit loop… sum is not incremented further; the loop will not execute for i=n

The value of sum is log₂n at the end of the loop (for n a power of 2).
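The claim is easy to confirm by running the loop for several powers of two; a small runnable sketch (the loop bounds are illustrative):

#include <iostream>
#include <cmath>
using namespace std;

int main() {
    for (int n = 2; n <= 1024; n *= 2) {
        int sum = 0;
        for (int i = 1; i < n; i = i * 2)   // the loop from the slide
            sum = sum + 1;
        cout << "n = " << n << "  sum = " << sum
             << "  log2(n) = " << (int)log2(n) << '\n';   // sum equals log2(n)
    }
    return 0;
}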
EXAMPLE OF O(log n)…suppose you are coloring the table: take half, the other half is discarded; take half of what remains, discard the rest; and so on. You didn’t color all the cells…..every time you discarded half the cells.
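The classic algorithm with exactly this discard-half pattern is binary search; a minimal sketch on a sorted array (added for illustration, not from the slides):

#include <iostream>
using namespace std;

// returns the index of key in the sorted array arr[0..n-1], or -1 if absent;
// each iteration discards half of the remaining range, so it runs O(log n) times
int binarySearch(const int arr[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] == key) return mid;
        if (arr[mid] < key) lo = mid + 1;   // discard the left half
        else                hi = mid - 1;   // discard the right half
    }
    return -1;
}

int main() {
    int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
    cout << binarySearch(a, 8, 11) << '\n';   // prints 4
    return 0;
}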
EXAMPLE OF O(n)…suppose you are coloring the table: you will color all the cells one by one.
EXAMPLE OF O(n²)…suppose you are coloring the table: you will color each cell n times.
EXAMPLE OF O(2ⁿ)…suppose you are coloring the table. The number of times you color each cell doubles as you move along: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048.
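The doubling pattern can be reproduced in code with a loop whose work doubles per cell; a small sketch (the 12-cell count mirrors the slide):

#include <iostream>
using namespace std;

int main() {
    const int n = 12;                       // 12 cells, as on the slide
    long total = 0;
    for (int cell = 0; cell < n; ++cell) {
        long colorings = 1L << cell;        // cell i is colored 2^i times
        total += colorings;
        cout << colorings << ' ';
    }
    cout << "\ntotal = " << total << '\n';  // 4095 = 2^12 - 1, i.e. O(2^n)
    return 0;
}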
SOME MATH. Normally these formulas are very handy: 1 + 2 + 3 + … + n = n(n+1)/2, and 1 + 2 + 4 + … + 2^k = 2^(k+1) − 1. If f(n) is a polynomial of degree m, then f(n) is O(n^m).
SOME RULES
Rule 1: for (i=0;i<n;i=i+k). Anything inside the loop will run approximately n/k times.
Rule 2: for (i=n;i>0;i=i-k). Anything inside the loop will run approximately n/k times.
Rule 3: for (i=1;i<n;i=i*k). Anything inside the loop will run approximately log_k(n) times.
Rule 4: If the loop variables are independent, then the total number of times a statement inside a nested loop is executed is equal to the product of the times the individual loops run, e.g.
for (i=0;i<n;++i)
    for (j=0;j<m;++j)
A statement inside the above nested loop will run n*m times.
Some rules…(contd.)
Rule 5: for(i=1;i<=n;++i) for (j=1;j<=i;++j). The above nested loop runs approximately (1/2)n(n+1) times; the variable j depends upon the value of i.
Rule 6: for(i=1;i<=n;i=i*2) for (j=1;j<=i;++j). The statements in the above nested loop run approximately 2n−1 times (1 + 2 + 4 + … + n, for n a power of 2).
Rule 7: for(i=0;i<n;i=i+1) for (j=i;j<n;++j). The statements in the above nested loop run approximately n(n+1)/2 times.
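These rule-of-thumb counts are easy to confirm by instrumenting the loops; a runnable sketch for rules 5 and 7 (counter names are illustrative):

#include <iostream>
using namespace std;

int main() {
    int n = 10;

    long count5 = 0;                        // rule 5: inner bound depends on i
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= i; ++j)
            ++count5;
    cout << count5 << " == " << n * (n + 1) / 2 << '\n';   // 55 == 55

    long count7 = 0;                        // rule 7: inner loop starts at i
    for (int i = 0; i < n; ++i)
        for (int j = i; j < n; ++j)
            ++count7;
    cout << count7 << " == " << n * (n + 1) / 2 << '\n';   // 55 == 55

    return 0;
}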
SELECTION SORT
for (i=0;i<n;++i)
{
    maxIndex = FindMaxIndex(arr,i,n-1);
    swap(arr[i],arr[maxIndex]);
}
//FindMaxIndex(int arr[],int startIndex,int endIndex)
//finds the maximum item in the partial array
//from start index to end index
What is the complexity of the above???
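The body of FindMaxIndex is not shown on the slide; a plausible implementation consistent with the comment above (an assumption, not the original code):

// finds the index of the maximum item in arr[startIndex..endIndex],
// matching the signature given in the slide's comment
int FindMaxIndex(int arr[], int startIndex, int endIndex) {
    int maxIndex = startIndex;
    for (int k = startIndex + 1; k <= endIndex; ++k)
        if (arr[k] > arr[maxIndex])
            maxIndex = k;
    return maxIndex;
}

Each call scans arr[i..n-1] and makes n−1−i comparisons, so the total is (n−1) + (n−2) + … + 1 + 0 = n(n−1)/2 comparisons, i.e. O(n²).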
BUBBLE SORT
bool done = false;
for(i = 1; (i < n) && !done; i++) //repeat a pass of bubble sort
{
    done = true;
    for (j=0; j<n-i; j++) //inner loop swaps consecutive items
    {
        if (arr[j+1] < arr[j])
        {
            swap(arr[j+1],arr[j]);
            done = false; //a swap was made and so sorting continues
        } //end of if
    } //end of inner for
} //end of outer for
What is the complexity of the above???
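A small driver for the pass-based version above, counting comparisons to watch the early exit (the array contents are illustrative):

#include <iostream>
#include <utility>
using namespace std;

int main() {
    int arr[] = {5, 1, 4, 2, 8};
    int n = 5;
    long comparisons = 0;

    bool done = false;
    for (int i = 1; (i < n) && !done; i++) {   // one bubble-sort pass per iteration
        done = true;
        for (int j = 0; j < n - i; j++) {
            ++comparisons;
            if (arr[j + 1] < arr[j]) {
                swap(arr[j + 1], arr[j]);
                done = false;                  // a swap was made; keep sorting
            }
        }
    }

    for (int k = 0; k < n; ++k) cout << arr[k] << ' ';
    cout << "\ncomparisons = " << comparisons << '\n';   // 9 for this input
    return 0;
}

On already-sorted input the flag stops the sort after one pass (n−1 comparisons, O(n) best case); in the worst case the count is n(n−1)/2, i.e. O(n²).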