Algorithm Analysis

We want to quantify the behavior of an algorithm. This is useful for comparing the efficiency of two algorithms on the same problem.

Observations:
- A program (algorithm) consumes resources: time and space.
- The amount of resources consumed is directly related to the size of the input.

Say we have the following table:

Input size    Time
I1            T1
I2            T2
…             …
How can we derive the function T = f(I)?

Problems:
- There are too many parameters to learn this function.
- It depends on:
  - the machine on which the program is run
  - the compiler used
  - the programming language
  - the programmer who writes the code

Solution: Imagine that our algorithm runs on an “algorithm machine” that accepts pseudo-code.

Assumptions:
- READs and WRITEs take constant time.
- Arithmetic operations take constant time.
- Logical operations take constant time.
Even though we made various assumptions, … it is still complicated. Instead, we quantify our algorithm on the worst-case input. This is called “worst-case analysis”.

There is also “average-case analysis”:
- It requires a probability distribution over the set of inputs, which is usually unknown.
- It is not studied in this course.
Input size: not always easy to determine, and problem dependent. Some examples:
- Graph-theoretic problem: the number of vertices, V, and the number of edges, E.
- Matrix multiplication: the number of rows and columns of the input matrices.
- Sorting: the number of elements, n.
We do not need to find exactly what T = f(I) is. It is enough to bound it by some simpler function g:
    T(I) = O(g(I))
For example, the time complexity of mergesort on n elements is:
    $T(n) = O(n \log n)$
What does it mean? The running time of mergesort is no worse than a constant times n log n, for all $n \ge n_0$.
Growth of Functions

The analysis of the complexity of an algorithm is linked to the problem of the growth of functions. Let f(n) be a function of a positive integer n. The dominant term of f(n) determines the behavior of f(n) as $n \to \infty$.

For example, let:
    $f(n) = 2n^3 + 3n^2 + 4n + 1$
The dominant term of f(n) is $2n^3$. This means that as n becomes large ($n \to \infty$):
- $2n^3$ dominates the behavior of f(n).
- The other terms’ contributions become much less significant.
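As a quick numerical check of the dominant-term idea, the following Python sketch compares f(n) with its dominant term for growing n. The function names `f`, `dominant`, and `ratio` are made up for illustration and are not part of the slides.

```python
def f(n):
    """The example polynomial f(n) = 2n^3 + 3n^2 + 4n + 1."""
    return 2 * n**3 + 3 * n**2 + 4 * n + 1


def dominant(n):
    """The dominant term 2n^3 on its own."""
    return 2 * n**3


def ratio(n):
    """How many times larger f(n) is than its dominant term."""
    return f(n) / dominant(n)


if __name__ == "__main__":
    # The ratio shrinks toward 1 as n grows: the lower-order
    # terms contribute less and less.
    for n in (1, 10, 100, 1000, 10000):
        print(n, ratio(n))
```

For n = 1 the ratio is 5 (the lower-order terms matter a lot); by n = 10000 it is within 0.1% of 1.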
Example 2: […] The term […] dominates the behavior of f(n) as $n \to \infty$.

Example 3: […]

The rate of growth describes how a function behaves as $n \to \infty$. It is determined by the function's dominant term. The big-Oh notation is a shorthand way of expressing this.
The relationship:
    $f(n) = O(n^2)$
is interpreted as: f(n) grows no faster than $n^2$ as n becomes large ($n \to \infty$).
- The dominant term of f(n) does not grow faster than $n^2$.
- It can grow as fast as $n^2$, in which case $O(n^2)$ is the most accurate description. (Another notation is used for this case; it is not discussed here.)
- $O(n^2)$ is also true for many other functions […]. We could make up as many such functions as we wish.

In general, we want the description that is the most accurate possible (the smallest).
Problem: find the most accurate description of any function in terms of the big-Oh notation.

A formal definition of
    $f(n) = O(g(n))$
is that the inequality
    $f(n) \le c \cdot g(n)$
holds for all $n \ge n_0$, where:
- $n_0$ and c are positive constants;
- f(n) and g(n) are functions mapping nonnegative integers to real numbers.

Informally: “f(n) is order g(n)”.
Graphically: [Figure: f(n) and c * g(n) plotted against the input size n, with n0 marked on the horizontal axis.]
Example: $f(n) = 7n^2 + 0.5n + 6$ and $g(n) = n^2$. Then f(n) is $O(n^2)$, with $c = 10$ and $n_0 = 2$.
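The constants in this example can be sanity-checked numerically. The sketch below tests the inequality f(n) <= c * g(n) over a finite range only, so it is an illustration rather than a proof; the helper name `holds` and the cutoff `upto` are assumptions made for this example.

```python
def f(n):
    """f(n) = 7n^2 + 0.5n + 6 from the example."""
    return 7 * n**2 + 0.5 * n + 6


def g(n):
    """g(n) = n^2 from the example."""
    return n**2


def holds(c, n0, upto=10_000):
    """Check f(n) <= c * g(n) for every integer n in [n0, upto]."""
    return all(f(n) <= c * g(n) for n in range(n0, upto + 1))


if __name__ == "__main__":
    print(holds(10, 2))  # the constants from the example
    print(holds(10, 1))  # fails at n = 1, since f(1) = 13.5 > 10
```

The actual argument is short: for n >= 2 we have 0.5n <= n^2 and 6 <= 2n^2, so f(n) <= 7n^2 + n^2 + 2n^2 = 10n^2.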
In general, if
    $f(n) = a_0 + a_1 n + \dots + a_{d-1} n^{d-1} + a_d n^d$
then f(n) is $O(n^d)$.

We will see other functions too, for example O(log n), O(n log n), etc.

Having defined the big-Oh notation, now …

Seven functions that often appear in algorithm analysis:

Name           Function
Constant       1
Logarithmic    log n
Linear         n
N-Log-N        n log n
Quadratic      $n^2$
Cubic          $n^3$
Exponential    $2^n$
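To get a feel for how differently these seven functions grow, the following Python sketch evaluates each of them at a given n. It assumes base-2 logarithms, which the slides do not specify; the names `GROWTH_FUNCTIONS` and `growth_row` are illustrative only.

```python
import math

# (name, function) pairs in the order listed above.
# Assumption: "log" means log base 2.
GROWTH_FUNCTIONS = [
    ("constant", lambda n: 1),
    ("logarithmic", lambda n: math.log2(n)),
    ("linear", lambda n: n),
    ("n-log-n", lambda n: n * math.log2(n)),
    ("quadratic", lambda n: n**2),
    ("cubic", lambda n: n**3),
    ("exponential", lambda n: 2**n),
]


def growth_row(n):
    """Values of the seven functions at n, in the order above."""
    return [fn(n) for _, fn in GROWTH_FUNCTIONS]


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, growth_row(n))
```

Doubling n from 16 to 32 doubles the linear value but squares the exponential one, which is why the exponential column quickly becomes impractical.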
Other examples

What is the smallest big-Oh complexity associated with algorithms whose running time is given by the following functions?
1. f(n) = 8 + n/2 + $n^4/10^4$ : $O(n^4)$
2. f(n) = log4 n + n : O(n)
3. f(n) = log2 n + n : O(n)
4. f(n) = $n^2$ - n - n : $O(n^2)$
5. f(n) = $n^2$/log2 n : O($n^2$/log2 n)
6. f(n) = log2/3 n + log n2 : O(log n)
Analysis of Examples

Given a list of n elements, find the minimum (or maximum). Then T(n) = O(n): we look at all elements to determine the minimum (maximum).

Given n points in the plane, find the closest pair of points. In this case $T(n) = O(n^2)$. Why? A brute-force algorithm looks at all pairs of points, and there are on the order of $n^2$ of them.
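The brute-force closest-pair idea can be sketched in Python as follows. This is one possible reading of "look at all pairs", not code from the slides; the function name `closest_pair` and the tuple representation of points are assumptions. It uses `math.dist`, available in Python 3.8 and later.

```python
import math
from itertools import combinations


def closest_pair(points):
    """Return (distance, p, q) for the closest pair of points.

    points is a list of (x, y) tuples; returns None when there are
    fewer than two points.
    """
    best = None
    # combinations yields n(n-1)/2 unordered pairs, which is O(n^2).
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best


if __name__ == "__main__":
    print(closest_pair([(0, 0), (5, 5), (1, 0), (10, 10)]))
```

Each unordered pair is examined exactly once, so the work is n(n-1)/2 distance computations, whose dominant term is n^2/2.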
Given n points in the plane, determine whether any three points lie on a straight line. In this case $T(n) = O(n^3)$. Why? A brute-force algorithm searches all triplets of points, and there are on the order of $n^3$ of them.
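A brute-force collinearity check might look like the sketch below. It assumes integer coordinates so that the zero test is exact, and the function name `has_collinear_triple` is hypothetical.

```python
from itertools import combinations


def has_collinear_triple(points):
    """Return True if any three of the given (x, y) points are collinear.

    Assumes integer coordinates, so the comparison with 0 is exact.
    """
    # combinations yields n(n-1)(n-2)/6 triples, which is O(n^3).
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        # The cross product of (p2 - p1) and (p3 - p1) is zero
        # exactly when the three points lie on one line.
        if (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) == 0:
            return True
    return False


if __name__ == "__main__":
    print(has_collinear_triple([(0, 0), (1, 1), (2, 2), (0, 1)]))
    print(has_collinear_triple([(0, 0), (1, 0), (0, 1), (1, 1)]))
```

The cross-product test avoids division, so vertical lines need no special case.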
Maximum Contiguous Subsequence (MCS) Problem

Given a sequence S of n integers:
    $S = a_1, a_2, a_3, \dots, a_{n-1}, a_n$
a contiguous subsequence is:
    $a_i, a_{i+1}, \dots, a_{j-1}, a_j$, where $1 \le i \le j \le n$.

The problem: determine a contiguous subsequence such that the sum
    $a_i + a_{i+1} + \dots + a_{j-1} + a_j \ge 0$
is maximal.

Some examples:
    S = -1, -2, -3, -4, -5, -6
The MCS is empty; it has value 0 by definition.
For the sequence -1, 2, 3, -3, 2, an MCS is 2, 3, whose value is 2 + 3 = 5.

Note: there may be more than one MCS. For example, -1, 1, -1, 1, -1, 1 has six MCSs, each with value 1.
An $O(n^2)$ Algorithm for MCS

Search problems have an associated search space. We need to figure out how large the search space is. For the MCS problem: how many subsequences need to be examined?

For example, take:
    -1, 2, 3, -3, 2
The subsequences that begin with -1 are:
    -1
    -1, 2
    -1, 2, 3
    -1, 2, 3, -3
    -1, 2, 3, -3, 2
The ones beginning with 2 are:
    2
    2, 3
    2, 3, -3
    2, 3, -3, 2
Those beginning with 3 are:
    3
    3, -3
    3, -3, 2
The ones beginning with -3 are:
    -3
    -3, 2
and beginning with the final 2, just one:
    2
Then, including the empty sequence, a total of 16 subsequences are examined.

In general, given
    $a_1, a_2, a_3, \dots, a_{n-1}, a_n$
we have n subsequences beginning with $a_1$:
    $a_1$
    $a_1, a_2$
    $a_1, a_2, a_3$
    …
then n-1 beginning with $a_2$:
    $a_2$
    $a_2, a_3$
    …
    $a_2, a_3, \dots, a_{n-1}, a_n$
and so on. Then there are two subsequences beginning with $a_{n-1}$:
    $a_{n-1}$
    $a_{n-1}, a_n$
and, finally, one beginning with $a_n$:
    $a_n$

Total number of possible subsequences (the final 1 counts the empty subsequence):
    $1 + 2 + \dots + (n-1) + n + 1 = n(n+1)/2 + 1$

Analysis: the dominant term is $n^2/2$, hence the search space is $O(n^2)$. A “brute-force” algorithm follows…
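The count can be confirmed by enumerating every pair of start and end positions. The sketch below does this directly; the function name `count_contiguous_subsequences` is an illustrative assumption.

```python
def count_contiguous_subsequences(n):
    """Count the contiguous subsequences of a sequence of length n.

    Every pair (i, j) with 1 <= i <= j <= n gives one subsequence,
    and the empty subsequence adds one more.
    """
    count = 1  # the empty subsequence
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            count += 1
    return count


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, count_contiguous_subsequences(n), n * (n + 1) // 2 + 1)
```

For n = 5 the function returns 16, matching the hand count for -1, 2, 3, -3, 2.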
Algorithm MCSBruteForce
Input: A sequence $a_1, a_2, a_3, \dots, a_{n-1}, a_n$.
Output: value, start and end of the MCS.

    maxSum ← 0
    for i = 1 to n do
        Set sum ← 0
        for j = i to n do
            sum ← sum + a_j
            if (sum > maxSum)
                maxSum ← sum
                start ← i
                end ← j
    Print start, end, maxSum and STOP.
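A direct Python transcription of MCSBruteForce could look like the sketch below. It returns the result instead of printing it, and it reports `None` for start and end when the MCS is empty; that convention is an assumption, since the pseudo-code leaves start and end unset in that case. The function name `mcs_brute_force` is illustrative.

```python
def mcs_brute_force(a):
    """Return (max_sum, start, end) for the sequence a.

    start and end are 1-based and inclusive, as in the pseudo-code.
    They are None when the best subsequence is the empty one (value 0).
    """
    max_sum, start, end = 0, None, None
    n = len(a)
    for i in range(n):            # every start position
        total = 0
        for j in range(i, n):     # every end position at or after i
            total += a[j]         # running sum of a[i..j]
            if total > max_sum:
                max_sum, start, end = total, i + 1, j + 1
    return max_sum, start, end


if __name__ == "__main__":
    print(mcs_brute_force([-1, 2, 3, -3, 2]))
    print(mcs_brute_force([-1, -2, -3, -4, -5, -6]))
```

Because the comparison is strict, ties are resolved in favor of the first subsequence found.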
Improved MCS Algorithm

Think of avoiding looking at all the subsequences. We introduce the following notion. Given:
    $a_i, a_{i+1}, \dots, a_k, a_{k+1}, \dots, a_j$    (1)
the subsequence
    $a_i, a_{i+1}, \dots, a_k$
is a prefix of (1), where $i \le k \le j$. The prefix sum is:
    $a_i + a_{i+1} + \dots + a_k$

Observation: In an MCS, no prefix sum can be negative. Otherwise, dropping that prefix would leave a subsequence with a larger sum.
In the previous example, -1, 2, 3, -3, 2, we exclude:
    -1
    -1, 2
    -1, 2, 3
    -1, 2, 3, -3
    -1, 2, 3, -3, 2
and
    -3
    -3, 2
as possible candidates.
In general:
- If ever sum < 0, skip over index positions i+1, …, j.
- Also, if sum ≥ 0 always for a starting position i, then none of the positions i+1, …, n is a candidate start position, since $a_i$ and all following prefix sums are non-negative.

So the only time we need to consider a new starting position is when the sum becomes negative, and then all the index positions i+1, …, j can be skipped. The improved MCS algorithm inspects each $a_i$ just once. The algorithm follows…
Algorithm MCSImproved

    Set i ← 1; Set start ← end ← 1
    Set maxSum ← sum ← 0
    for j = 1 to n do
        sum ← sum + a_j
        if (sum > maxSum)
            maxSum ← sum
            start ← i
            end ← j
        if (sum < 0)
            i ← j + 1
            sum ← 0
    Print start, end, maxSum and STOP.
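The single-pass algorithm translates to Python almost line by line. In the sketch below, the function name `mcs_improved` is illustrative, and one detail deliberately differs from the pseudo-code: start and end are reported as `None` (rather than 1) when the MCS is empty, which is an assumption made to keep the empty case distinguishable.

```python
def mcs_improved(a):
    """Return (max_sum, start, end) for the sequence a in one pass.

    start and end are 1-based and inclusive; they are None when the
    best subsequence is the empty one (value 0).
    """
    max_sum, start, end = 0, None, None
    i = 0        # 0-based candidate start position
    total = 0    # sum of a[i..j]
    for j, x in enumerate(a):
        total += x
        if total > max_sum:
            max_sum, start, end = total, i + 1, j + 1
        if total < 0:
            # A negative prefix can never begin an MCS:
            # restart just after position j.
            i = j + 1
            total = 0
    return max_sum, start, end


if __name__ == "__main__":
    print(mcs_improved([-1, 2, 3, -3, 2]))
    print(mcs_improved([4, -5, 6]))
```

Each element is added to the running sum exactly once, so the loop body runs n times.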
Analysis of the Algorithms

Algorithm MCSBruteForce:
- The outer loop is executed n times.
- For each i, the inner loop is executed n - i + 1 times.
- Thus, the total number of times the inner loop is executed is:
    $\sum_{i=1}^{n} (n - i + 1) = n + (n-1) + \dots + 1 = n(n+1)/2 = O(n^2)$

Algorithm MCSImproved:
- It has a single for loop, which visits all n elements. Hence, $T(n) = O(n)$.
What is the big-Oh complexity of the following algorithm? What is the value of Sum after the execution of this algorithm with the values n = 5, m = 10, and p = 6?

    Sum = 2;
    for (i = 0; i < n; i++) {
        for (j = 0; j <= m; j++) {
            for (k = 1; k < p; k++) {
                Sum++;
            }
        }
    }
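One way to check an answer to this question is to translate the loops into Python and count. The function name `triple_loop_sum` is made up for illustration. The loop bounds mirror the original: i takes n values, j takes m + 1 values (because of the ≤), and k takes p - 1 values.

```python
def triple_loop_sum(n, m, p):
    """Simulate the triple loop and return the final value of Sum."""
    total = 2
    for i in range(n):                # i = 0 .. n-1
        for j in range(m + 1):        # j = 0 .. m (inclusive)
            for k in range(1, p):     # k = 1 .. p-1
                total += 1
    return total


if __name__ == "__main__":
    print(triple_loop_sum(5, 10, 6))
```

The innermost statement runs n(m+1)(p-1) times, so the final value is 2 + n(m+1)(p-1) and the running time is O(n·m·p). For n = 5, m = 10, p = 6 that gives 2 + 5·11·5 = 277.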
What is the big-Oh complexity of the following algorithm? What is the value of Sum after the execution of this algorithm with the value n = 8?

    Sum = -5;
    for (i = 0; i < n; i++) {
        for (j = 0; 2*j < i; j++) {
            Sum++;
        }
    }
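As with the previous exercise, a small Python simulation can be used to check an answer. The function name `half_triangle_sum` is hypothetical. The inner condition 2*j < i is kept as an explicit while loop so that it matches the original line for line.

```python
def half_triangle_sum(n):
    """Simulate the double loop and return the final value of Sum."""
    total = -5
    for i in range(n):        # i = 0 .. n-1
        j = 0
        while 2 * j < i:      # runs ceil(i / 2) times
            total += 1
            j += 1
    return total


if __name__ == "__main__":
    print(half_triangle_sum(8))
```

For a given i the inner loop runs ceil(i/2) times, so the total is roughly n^2/4 increments and the running time is O(n^2). For n = 8 the inner loop runs 0 + 1 + 1 + 2 + 2 + 3 + 3 + 4 = 16 times, giving Sum = -5 + 16 = 11.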