Basic Concepts
2018 Fall, Pusan National University, Ki-Joune Li
Data Structures: the Refrigerator Problem
If we have only one item in the refrigerator, there is no problem. If we have several items, it does matter: we need some organization or structure.
A data structure is how we place data in memory.
What is a good data structure?
What is a good placement in the refrigerator? One that makes it easy to cook (and to put things away).
What is a good data structure? One that is easy (and efficient) to use.
It depends on what you want to do with the data. There are two different aspects: functions and internal implementation.
Two Viewpoints and Abstract Data Types
Example: suppose we have 1,000,000 integer values. What is the difference between an array and a stack?
ADT: we need a clear separation between the interface of each function, e.g. getElement(i), putElement(i,v), number(), and the implementation (data structure + algorithm).
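To make this separation concrete, here is a minimal sketch in C (the type name and helper names besides getElement, putElement and number are illustrative, not from the lecture): the user only sees the interface, while the array behind it stays hidden and could be replaced without changing any calling code.

#include <stdio.h>
#include <stdlib.h>

typedef struct IntCollection IntCollection;        /* opaque type: users never see its fields */

/* Interface */
IntCollection *createCollection(int n);            /* collection of n integers   */
int  getElement(IntCollection *c, int i);          /* read the i-th value        */
void putElement(IntCollection *c, int i, int v);   /* store v at position i      */
int  number(IntCollection *c);                     /* how many values are stored */

/* One possible implementation: a plain array. It could be replaced
   (e.g. by a linked list) without changing the interface above. */
struct IntCollection { int *data; int size; };

IntCollection *createCollection(int n)
{
    IntCollection *c = malloc(sizeof(IntCollection));
    c->data = malloc(sizeof(int) * n);             /* error checking omitted in this sketch */
    c->size = n;
    return c;
}
int  getElement(IntCollection *c, int i)        { return c->data[i]; }
void putElement(IntCollection *c, int i, int v) { c->data[i] = v; }
int  number(IntCollection *c)                   { return c->size; }

int main(void)
{
    IntCollection *c = createCollection(3);
    putElement(c, 0, 7); putElement(c, 1, 8); putElement(c, 2, 9);
    printf("%d values, first = %d\n", number(c), getElement(c, 0));   /* 3 values, first = 7 */
    return 0;
}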
Abstract Data Type and Object-Oriented Programming
Abstraction (or encapsulation): hiding the internal details, that is, the implementation (the internal mechanism and process), and providing only the interfaces.
Abstract data type: once it has been implemented, the internal structures are hidden and only the interface is provided to the users.
Algorithms
An algorithm is a sequence of instructions satisfying:
Input
Output: at least one output
Definiteness: clear, unambiguous instructions
Finiteness: it terminates after a finite number of steps
Effectiveness: every instruction is basic enough to be carried out
An algorithm is an abstract description of a program and can easily be converted into one.
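As a small illustration (my own example, not from the slides), the familiar "find the maximum of n values" algorithm converted into a C program; the comments map the code to the criteria above.

#include <stdio.h>

int findMax(int n, int *a)               /* input: n > 0 values a[0..n-1]            */
{
    int max = a[0];                      /* definiteness: start with the first value */
    for (int i = 1; i < n; i++)          /* finiteness: exactly n-1 iterations       */
        if (a[i] > max) max = a[i];      /* effectiveness: a simple comparison       */
    return max;                          /* output: at least one output              */
}

int main(void)
{
    int a[5] = {3, 9, 2, 7, 5};
    printf("%d\n", findMax(5, a));       /* prints 9 */
    return 0;
}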
Performance Analysis
What is a good algorithm? Correctness, good documentation and readable code, proper structure, and efficiency.
How do we measure the efficiency of an algorithm?
Space complexity: the amount of memory it needs to run.
Time complexity: the amount of time (mostly CPU time) it needs to run.
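One rough, empirical way to look at time complexity (a sketch of mine, assuming the standard clock() function from <time.h> is precise enough for this purpose) is to time the same function on growing inputs. The measured seconds depend on the machine and the compiler, which is exactly why the asymptotic notation introduced below is preferred.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Illustrative only: sum an array of n integers and report elapsed CPU time. */
long long sumArray(int n, int *a)
{
    long long sum = 0;
    for (int i = 0; i < n; i++) sum += a[i];
    return sum;
}

int main(void)
{
    for (int n = 1000000; n <= 8000000; n *= 2) {
        int *a = malloc(sizeof(int) * (size_t)n);
        if (a == NULL) return 1;                       /* out of memory */
        for (int i = 0; i < n; i++) a[i] = i % 10;

        clock_t start = clock();
        long long s = sumArray(n, a);
        clock_t end = clock();

        printf("n = %8d  sum = %lld  cpu = %.4f s\n",
               n, s, (double)(end - start) / CLOCKS_PER_SEC);
        free(a);
    }
    return 0;
}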
Space Complexity Notation
Space complexity f(n): a function of the input size n.
How should the constants be determined? Is it meaningful to count the number of bytes?

int sumAB(int a, int b)
{
    int sum;
    sum = a + b;
    return sum;
}
f(n) = c1 (c1: a constant)

int sumArray(int n, int *a)   /* iterative version */
{
    int sum = 0;
    for (int i = 0; i < n; i++) sum += a[i];
    return sum;
}
f(n) = c2 (should the array a[n] itself be counted?)

int sumArray(int n, int *a)   /* recursive version */
{
    if (n <= 0) return 0;
    return a[n-1] + sumArray(n-1, a);
}
f(n) = c3 · n (why? each recursive call adds a stack frame of constant size, and the recursion goes n levels deep)

Is it really meaningful to determine these constants?
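To see where the c3 · n for the recursive version comes from (an illustration of mine, not from the slides), one can print the address of a local variable at each recursion level: every call occupies its own stack frame, so the space grows linearly with the recursion depth n.

#include <stdio.h>

int sumArray(int n, int *a)
{
    int sum;                                        /* one local variable per stack frame     */
    printf("n = %d, frame at %p\n", n, (void *)&sum);
    if (n <= 0) return 0;
    sum = a[n-1] + sumArray(n-1, a);                /* n frames are alive at the deepest point */
    return sum;
}

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    printf("sum = %d\n", sumArray(4, a));           /* prints 10 after five frame addresses   */
    return 0;
}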
Time Complexity Notation
Time complexity f(n): a function of the input size n.

int sumAB(int a, int b)
{
    int sum;
    sum = a + b;
    return sum;
}
f(n) = 2 + α

int sumArray(int n, int *a)   /* iterative version */
{
    int sum = 0;
    for (int i = 0; i < n; i++) sum += a[i];
    return sum;
}
f(n) = 3n + 1 + α

int sumArray(int n, int *a)   /* recursive version */
{
    if (n <= 0) return 0;
    return a[n-1] + sumArray(n-1, a);
}
f(n) = (2 + α) n

(α: a constant, e.g. the cost of a function call)
Is it really meaningful to determine these constants?
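The counts above are obtained by counting executed "steps" by hand. A quick sanity check (a sketch using a counting convention I chose for illustration: one step for the initialization, three per loop iteration, one for the return) is to instrument the code with a global counter. Different conventions give slightly different constants, which is part of why the exact constants are not very meaningful.

#include <stdio.h>

static int count = 0;                        /* global step counter */

int sumArray(int n, int *a)
{
    int sum = 0;          count++;           /* initialization            */
    for (int i = 0; i < n; i++) {
        sum += a[i];      count += 3;        /* test, addition, increment */
    }
    count++;                                 /* return                    */
    return sum;
}

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    sumArray(4, a);
    printf("executed steps: %d\n", count);   /* prints 14, i.e. 3n + 2 for n = 4 */
    return 0;
}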
Asymptotic Notation
The exact running time (or step count) depends on the machine and on the implementation (the program), so it is not a good measure.
A more general (but less exact) notation: asymptotic notation.
Big-O, O(n): upper bound
Big-Omega, Ω(n): lower bound
Big-Theta, Θ(n): more precise than Big-O and Big-Omega
Big-O notation is the most popular one.
Big-O Notation
Definition (f of n is big-O of g of n): f(n) ∈ O(g(n)) iff there exist constants c and n0 (c, n0 > 0) such that f(n) ≤ c·g(n) for all n ≥ n0.
Example: 3n + 2 = O(n) and 3n + 2 = O(n^2); for the first, 3n + 2 ≤ 4n for all n ≥ 2, so c = 4 and n0 = 2.
Time complexity of the following algorithm: f(n) = O(n).

int sumArray(int n, int *a)
{
    if (n <= 0) return 0;
    return a[n-1] + sumArray(n-1, a);
}
Big-O Notation: Some Properties
Classification: O(1): constant, O(n): linear, O(n^2): quadratic, O(2^n): exponential.
Polynomial functions: if f(n) = am n^m + am-1 n^(m-1) + … + a1 n + a0 with all ai > 0, then f(n) ∈ O(n^m), since f(n) ≤ (am + … + a1 + a0) n^m for all n ≥ 1 (take c = am + … + a0 and n0 = 1).
Big-O is determined by the highest-order term; only that term is of our concern.
Big-O is useful for describing an upper bound on the time complexity: when only an upper bound is known, Big-O notation applies, and in most cases it is not easy to find the exact f(n). This is why Big-O is the most widely used notation.
Big-Omega Notation
Definition (f of n is big-Omega of g of n): f(n) ∈ Ω(g(n)) iff there exist constants c and n0 (c, n0 > 0) such that f(n) ≥ c·g(n) for all n ≥ n0.
Example: 3n + 2 = Ω(n), 3n + 2 = Ω(1), 3n^2 + 2 = Ω(n).
Time complexity of the following algorithm: f(n) = Ω(n).
If f(n) = am n^m + am-1 n^(m-1) + … + a1 n + a0 with all ai > 0, then f(n) ∈ Ω(n^m): Big-Omega is also determined by the highest-order term.
Big-Omega notation is useful for describing the lower bound.

int sumArray(int n, int *a)
{
    if (n <= 0) return 0;
    return a[n-1] + sumArray(n-1, a);
}
Big-Theta Notation
Definition (f of n is big-Theta of g of n): f(n) ∈ Θ(g(n)) iff there exist constants c1, c2 and n0 (c1, c2, n0 > 0) such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
Example: 3n + 2 = Θ(n), but 3n + 2 ≠ Θ(1) and 3n^2 + 2 ≠ Θ(n).
Time complexity of the following algorithm: f(n) = Θ(n).
If f(n) = am n^m + am-1 n^(m-1) + … + a1 n + a0 with all ai > 0, then f(n) ∈ Θ(n^m): Big-Theta is also determined by the highest-order term.
Big-Theta is possible only if f(n) = O(g(n)) and f(n) = Ω(g(n)), i.e. the lower bound and the upper bound are the same. It is very precise, but it is not always easy to find such a g(n).

int sumArray(int n, int *a)
{
    if (n <= 0) return 0;
    return a[n-1] + sumArray(n-1, a);
}
Complexity Analysis: Worst-Case and Average-Case
Worst-case analysis: the time complexity for the worst case. Example: linear search, f(n) = n.
Average-case analysis: the average time complexity f(n) = p1 f1(n) + p2 f2(n) + … + pk fk(n), where pi is the probability of the i-th case. It is not easy to find the pi, so in most cases only worst-case analysis is done.
Why not best-case analysis?
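A linear-search sketch (my own illustration, not code from the slides): in the worst case the value is not present and all n elements are examined, giving f(n) = n; in the best case it is found at the first position after a single comparison, which is why the best case tells us little.

int linearSearch(int v, int n, int *a)
{
    for (int i = 0; i < n; i++)      /* worst case: all n elements examined */
        if (a[i] == v) return i;     /* best case: found at i = 0           */
    return -1;                       /* not found                           */
}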
Example: Worst-Case Time Complexity of Binary Search
Big-O: O(n^2) and O(log n) are both valid upper bounds (Big-O may be loose).
Big-Omega: Ω(1) and Ω(log n) are both valid lower bounds.
Big-Theta: Θ(log n).
Each call halves the search range, so the worst case makes about log2 n recursive calls.

int BinarySearch(int v, int *a, int lower, int upper)
// search v in the sorted array a[lower], a[lower+1], …, a[upper]
{
    if (lower <= upper) {
        int m = (lower + upper) / 2;
        if (v > a[m]) return BinarySearch(v, a, m+1, upper);
        else if (v < a[m]) return BinarySearch(v, a, lower, m-1);
        else /* v == a[m] */ return m;
    }
    return -1; // not found
}
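A small usage sketch (the array values are chosen only for illustration), assuming the BinarySearch function above is in the same file:

#include <stdio.h>

int BinarySearch(int v, int *a, int lower, int upper);   /* defined above */

int main(void)
{
    int a[8] = {2, 3, 5, 7, 11, 13, 17, 19};    /* must already be sorted */
    printf("%d\n", BinarySearch(11, a, 0, 7));  /* prints 4               */
    printf("%d\n", BinarySearch(4,  a, 0, 7));  /* prints -1 (not found)  */
    return 0;
}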
Comparison: O(1), O(log n), O(n), O(n^2), O(2^n)
(growth-rate comparison graph omitted)
In general:
An O(1) algorithm is almost impossible (for non-trivial problems).
An O(log n) algorithm is excellent.
An O(n) algorithm is very good.
An O(n^2) algorithm is not bad.
An O(n^3) algorithm is acceptable.
An O(2^n) algorithm is not useful.
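For a rough sense of these growth rates (plain arithmetic, rounded):

n        log2 n    n^2          2^n
10       ≈ 3.3     100          1,024
100      ≈ 6.6     10,000       ≈ 1.3 × 10^30
1,000    ≈ 10      1,000,000    ≈ 1.1 × 10^301

Even a much faster machine cannot compensate for the last column, which is why O(2^n) algorithms are not useful in practice.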