Dynamic Programming
Dynamic programming is a technique for solving problems with a recursive structure having the following characteristics:
1. Optimal substructure (principle of optimality): an optimal solution to a problem can be decomposed into optimal solutions to subproblems. This is the defining characteristic of problems solvable by DP; not every problem has this property.
2. A small number of subproblems: the total number of sub-instances to be solved is small.
3. Overlapping subproblems: during the computation, the same sub-instances are referred to over and over again.
Illustration
[Figure: a path from u to v passing through intermediate vertices x and y.]
If u → … → v is a shortest path between u and v, and x and y lie on it, then the subpath x → … → y is a shortest path between x and y.
A Negative Example
Consider the problem of finding shortest paths in graphs that allow negative edge weights. In the graph on vertices a, b, c, observe that a-b-c is a shortest path between a and c, of length -2. However, unlike in the positive-weight shortest path problem, the subpaths of this shortest path are not necessarily shortest paths: the shortest path between a and b is a-c-b (of length -1), not a-b (of length 1). The principle of optimality fails to hold for this problem.
The Airplane Problem
You live in a country with n+1 cities, all equally spaced along a line and numbered from 0 to n. You are in city n and want to get to city 0.
1. You can only travel by plane.
2. You can take only one plane per day. The fare is determined by the distance you fly (start city # minus stop city #): flying distance i costs air_fare[i].
3. Spending the night in city i costs hotel_cost[i].
Goal: minimize the total cost of getting to city 0. Ignore how many days it takes.
Example (n=4)
[Table of hotel_cost[i] and air_fare[i] for i = 0, …, 4; the values were lost in extraction.]
A Divide and Conquer Solution
function mincost(i: integer): integer;
var k: integer;
begin
  if i = 0 then return(0)
  else return( min_{0 ≤ k ≤ i-1} [air_fare[i-k] + hotel_cost[k] + mincost(k)] )
end;
/* mincost(n) is the answer */
Let T(n) be the time required to solve a problem of size n. Then
T(n) = Σ_{i=0}^{n-1} T(i) + O(n), T(0) = O(1).
Subtracting, T(n) - T(n-1) = T(n-1) + c, so T(n) = 2T(n-1) + c for n > 1, giving
T(n) = O(2^n) !!!
Discussion
A DP Solution
var mincost: array[0..n] of integer;
begin
  mincost[0] := 0;
  for i := 1 to n do
    mincost[i] := min_{0 ≤ k ≤ i-1} { air_fare[i-k] + hotel_cost[k] + mincost[k] }
end;
/* the answer is in mincost[n] */
This solution requires time Θ(n²).
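The bottom-up recurrence above can be sketched in Python (the slides use Pascal-style pseudocode; the function name and the convention hotel_cost[0] = 0 are my own assumptions):

```python
def min_total_cost(n, air_fare, hotel_cost):
    # mincost[i] = cheapest cost to travel from city i to city 0.
    # air_fare[d] = fare for a flight of distance d (air_fare[0] unused).
    # hotel_cost[0] should be 0: no night is spent once city 0 is reached.
    mincost = [0] * (n + 1)
    for i in range(1, n + 1):
        # try every stopover city k < i: fly distance i-k, sleep in city k
        mincost[i] = min(air_fare[i - k] + hotel_cost[k] + mincost[k]
                         for k in range(i))
    return mincost[n]
```

For example, with n = 3, air_fare = [0, 3, 7, 20] and hotel_cost = [0, 2, 1, 0], the optimal plan flies 3 → 2 → 0 at cost 3 + 1 + 7 = 11, cheaper than the direct flight of cost 20.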
Observations
Original problem: go from city n to city 0.
Generalized problem: go from city i to city 0.
Another generalized problem: go from city n to city i.
Note: a more general problem still is "go from city i to city j (i ≥ j)", but this idea leads to a less efficient solution.
The method of calculating one solution from others is exactly the same as in the recursive version (except that function calls are replaced by array references). Usually we get the idea for this method by thinking recursively; however, the final solution is iterative.
– Divide and conquer: top-down.
– Dynamic programming: bottom-up.
Matrix Multiplication
With the standard matrix multiplication method, how many scalar multiplications are needed to compute AB, where A and B are of dimensions p × q and q × r?
1. pqr
2. pq²r
(The answer is pqr.)
Chained Matrix Multiplication
We need to compute the product M = A_1 A_2 ⋯ A_n of matrices A_1, …, A_n. By associativity of multiplication (i.e., (AB)C = A(BC)) we can compute M in various orders; use parentheses to describe the order. The matrix-chain multiplication problem asks: what is the minimum cost of computing M?
Example
A is 10 × 100, B is 100 × 10, and C is 10 × 100.
(AB)C takes 10·100·10 + 10·10·100 = 20,000 scalar multiplications.
How many operations for A(BC)? 100·10·100 + 10·100·100 = 200,000.
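The two orders can be checked with a few lines of Python (variable names are illustrative):

```python
# Cost of multiplying a p*q matrix by a q*r matrix: p*q*r scalar multiplications.
p, q, r, s = 10, 100, 10, 100   # A: p x q, B: q x r, C: r x s

cost_AB_C = p*q*r + p*r*s   # (AB) costs 10*100*10, then (AB)C costs 10*10*100
cost_A_BC = q*r*s + p*q*s   # (BC) costs 100*10*100, then A(BC) costs 10*100*100
print(cost_AB_C, cost_A_BC)  # 20000 200000
```

A factor-of-ten gap between orders, on just three matrices, is what motivates optimizing the parenthesization.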
Problem Formulation A product of matrices is fully parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products. For example, for the product ABCD, there are five possible ways to fully parenthesize the product: (A(B(CD))), (A((BC)D)), ((A(BC))D), ((AB)(CD)), (((AB)C)D).
Problem Formulation
Now the matrix-chain multiplication problem is equivalent to: given a list p = (p_0, p_1, …, p_n) of positive integers, compute an optimal-cost full parenthesization of the chain (A_1, A_2, …, A_n), where for each i the i-th matrix is of dimension p_{i-1} × p_i and cost is measured by the number of scalar multiplications.
Exhaustive Search
Is brute-force search efficient for this problem? Let P(n) be the number of distinct full parenthesizations of n matrices. Then
P(1) = 1,  P(n) = Σ_{k=1}^{n-1} P(k)·P(n-k) for n ≥ 2.
Solving this, we obtain P(n) = C(n - 1), where C(n) is the n-th Catalan number:
C(n) = (1/(n+1))·binom(2n, n) = Ω(4^n / n^{3/2}).
Thus the brute-force search is too inefficient.
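The recurrence and the closed form can be checked against each other in a few lines of Python (a sketch; the function names are my own):

```python
from math import comb

def P(n):
    # Number of distinct full parenthesizations of a chain of n matrices.
    if n == 1:
        return 1
    return sum(P(k) * P(n - k) for k in range(1, n))

def catalan(n):
    # n-th Catalan number: binom(2n, n) / (n + 1).
    return comb(2 * n, n) // (n + 1)

print([P(n) for n in range(1, 7)])            # [1, 1, 2, 5, 14, 42]
print([catalan(n - 1) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```

The values 1, 1, 2, 5, 14, 42, … grow like 4^n, confirming that exhaustive search is hopeless even for modest n.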
DP Approach: O(n³)
We need to cut the chain A_1, …, A_n somewhere in the middle, i.e., pick some k, 1 ≤ k ≤ n-1, and compute B(k) = A_1 ⋯ A_k, C(k) = A_{k+1} ⋯ A_n, and then B(k)·C(k). Now, if we already know the optimal cost of computing B(k) and C(k) for every k, 1 ≤ k ≤ n-1, then we can compute the optimal cost for the entire product by finding a k that minimizes
"the optimal cost of computing B(k)" + "the optimal cost of computing C(k)" + p_0·p_k·p_n.
Use the same idea to compute the minimum costs for all the subproducts. This results in a bottom-up computation of optimal costs.
Generalization
Original problem: compute the cost for A_1 A_2 ⋯ A_n.
Generalized problem: compute the cost for A_i A_{i+1} ⋯ A_j.
1. For any k such that i ≤ k < j, divide the product into two products: (A_i A_{i+1} ⋯ A_k)(A_{k+1} A_{k+2} ⋯ A_j). Let m[i, j] be the optimal cost of computing A_i A_{i+1} ⋯ A_j. Then this grouping has cost m[i, k] + m[k+1, j] + p_{i-1}·p_k·p_j.
2. Then choose the minimum over all such k.
For each i, j with 1 ≤ i ≤ j ≤ n, let s[i, j] be the smallest cut point that provides the optimal cost m[i, j].
Problem Ordering
To get a DP solution, order the problems by chain length:
m[1,1], m[2,2], …, m[n-1,n-1], m[n,n]   (length 1)
m[1,2], m[2,3], …, m[n-1,n]             (length 2)
  …
m[1,n-1], m[2,n]                        (length n-1)
m[1,n]                                  (length n)
1. For i = 1 to n, set m[i, i] = 0.
2. For ℓ = 2 to n, compute the s- and m-values for all length-ℓ subchains (A_i, …, A_j).
Computing Size ℓ Chains
The Parenthesization
Once all the entries of s have been computed, we can print the parenthesization that provides the optimal cost:
Print-Chain(i, j)
/* print the parenthesization for A_i ⋯ A_j */
if i = j then print("A_i")
else
  print("(")
  Print-Chain(i, s[i, j])
  Print-Chain(s[i, j] + 1, j)
  print(")")
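The ordering of subproblems and the Print-Chain procedure fit together as in this Python sketch (function names are my own; the second test instance is the classic example from CLRS):

```python
def matrix_chain_order(p):
    """Optimal costs m[i][j] and cut points s[i][j] for the chain
    A_1..A_n, where A_i has dimensions p[i-1] x p[i]."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # length of the subchain
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):           # cut between A_k and A_{k+1}
                cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if cost < m[i][j]:
                    m[i][j], s[i][j] = cost, k
    return m, s

def chain_string(s, i, j):
    # Print-Chain, returning the parenthesization as a string.
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + chain_string(s, i, k) + chain_string(s, k + 1, j) + ")"
```

For p = (30, 35, 15, 5, 10, 20, 25), this yields cost 15125 and the grouping ((A1(A2A3))((A4A5)A6)).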
Example
M1 M2 M3 M4, with dimensions (consistent with the costs below) 10 × 20, 20 × 50, 50 × 1, 1 × 100.
m[1,1] = 0, m[2,2] = 0, m[3,3] = 0, m[4,4] = 0
m[1,2] = 10000, m[2,3] = 1000, m[3,4] = 5000
m[1,3] = 1200, m[2,4] = 3000
m[1,4] = 2200
For example, to compute m[1,4], choose the best of
M1 (M2 M3 M4), (M1 M2)(M3 M4), (M1 M2 M3) M4;
the optimal grouping is ((M1 (M2 M3)) M4).
Characteristics
1. Optimal substructure: if s[1, n] = k, then an optimal full parenthesization contains optimal full parenthesizations of (A_1 ⋯ A_k) and (A_{k+1} ⋯ A_n).
2. A small number of subproblems: the number of subproblems is the number of pairs (i, j) with 1 ≤ i ≤ j ≤ n, which is n(n+1)/2.
3. Overlapping subproblems: m[i, j'] and m[i', j] are referred to during the computation of m[i, j], for every i < i' ≤ j and i ≤ j' < j.
Because of the last property, computing an m-entry by recursive calls takes exponentially many steps.
Longest Common Subsequence
A sequence Z = ⟨z_1, …, z_k⟩ is a subsequence of a sequence X = ⟨x_1, …, x_m⟩ if Z can be generated by striking out some (or no) elements of X. For example, ⟨B, C, B⟩ is a subsequence of ⟨A, B, C, A, B⟩.
The longest common subsequence (LCS) problem is the problem of finding, for two given sequences X = ⟨x_1, …, x_m⟩ and Y = ⟨y_1, …, y_n⟩, a maximum-length common subsequence of X and Y.
Brute-force search for an LCS requires exponentially many steps, because if m ≥ n there are 2^m candidate subsequences of X.
Example
The Optimal Substructure of LCS
For a sequence Z = ⟨z_1, …, z_k⟩ and i, 1 ≤ i ≤ k, let Z_i denote the length-i prefix of Z, i.e., Z_i = ⟨z_1, …, z_i⟩.
Theorem: Let X = ⟨x_1, …, x_m⟩ and Y = ⟨y_1, …, y_n⟩.
1. If x_m = y_n, then an LCS of X_{m-1} and Y_{n-1} followed by x_m (= y_n) is an LCS of X and Y.
2. If x_m ≠ y_n, then an LCS of X and Y is either an LCS of X_{m-1} and Y or an LCS of X and Y_{n-1}.
Proof of the Theorem
1. Suppose x_m and y_n are the same symbol, say a. Take an LCS Z of X and Y. The generation of Z must use either x_m or y_n; otherwise, appending a to Z would yield a longer common subsequence. If necessary, modify the production of Z from X (from Y) so that its last element is x_m (y_n). Then Z is a common subsequence W of X_{m-1} and Y_{n-1} followed by a. By the maximality of Z, W must be an LCS of X_{m-1} and Y_{n-1}.
2. If x_m ≠ y_n, then for any LCS Z of X and Y, the generation of Z cannot use both x_m and y_n. So Z is either an LCS of X_{m-1} and Y or an LCS of X and Y_{n-1}.
Computation Strategy
If x_m = y_n, then append x_m to an LCS of X_{m-1} and Y_{n-1}. Otherwise, take the longer of an LCS of X and Y_{n-1} and an LCS of X_{m-1} and Y. Let c[i, j] be the length of an LCS of X_i and Y_j. We get the recurrence:
c[i, j] = 0                            if i = 0 or j = 0,
c[i, j] = c[i-1, j-1] + 1              if i, j > 0 and x_i = y_j,
c[i, j] = max(c[i, j-1], c[i-1, j])    if i, j > 0 and x_i ≠ y_j.
Let b[i, j] record the choice made for (X_i, Y_j). With the b-table we can reconstruct an LCS.
Example: here the numeric entries are c-values and the arrows are b-values. [Table lost in extraction.]
The Other 2 Characteristics of DP
A small number of subproblems: there are only (m+1)(n+1) entries in c (and in b).
Overlapping subproblems: c[i, j] may eventually be referenced in the process of computing c[i', j'] for any i' ≥ i and j' ≥ j.
Time (and space) complexity of the algorithm: O(mn).
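The c-table recurrence and the reconstruction step can be sketched in Python (a sketch; instead of a separate b-table, the traceback recomputes each choice from c on the fly, which gives the same result):

```python
def lcs(X, Y):
    """Length of an LCS of X and Y, plus one LCS, via bottom-up DP."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Walk back through the table to reconstruct an LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return c[m][n], "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # (4, 'BCBA')
```

Both tables use (m+1)(n+1) entries, matching the complexity claim above.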
Optimal Polygon Triangulation
A polygon is a closed collection of line segments (called sides) in the plane. A point joining two sides is a vertex. The line segment between two nonadjacent vertices is a chord. A polygon is convex if every chord is either on the boundary or in the interior of the polygon. A polygon is represented by listing its vertices ⟨v_0, v_1, …, v_{n-1}⟩ in counterclockwise order; such a list represents the polygon shown in the figure at right.
Triangulation A triangulation of a polygon is a set T of chords of the polygon that divides the polygon into disjoint triangles. Every triangulation of an n-vertex polygon has n – 3 chords and divides the polygon into n – 2 triangles.
Optimal Triangulation
The weight of a triangle v_i v_j v_k, denoted w(v_i v_j v_k), is |v_i v_j| + |v_j v_k| + |v_k v_i|, where |v_i v_j| is the Euclidean distance between v_i and v_j. The optimal (polygon) triangulation problem is the problem of finding a triangulation of a convex polygon that minimizes the sum of the weights of the triangles in the triangulation. Given a triangulation with chords ℓ_1, …, ℓ_{n-3}, the weight sum can be rewritten as:
Σ (triangle weights) = Σ_{i=0}^{n-1} |v_i v_{i+1}| + 2·Σ_{k=1}^{n-3} |ℓ_k|,
since each side belongs to exactly one triangle and each chord to exactly two. Note that v_n = v_0. Hence minimizing the weight sum is equivalent to minimizing the total chord length.
Observation
For each i, j with 0 ≤ i < j ≤ n, where either i > 0 or j < n, the polygon P_ij consists of the j - i consecutive sides v_i v_{i+1}, …, v_{j-1} v_j and one more line v_j v_i, which is a side if (i, j) = (0, n-1) or (1, n), and a chord otherwise. Let t[i, j] be the sum of the chord lengths in an optimal triangulation of P_ij. The ultimate goal is to compute t[0, n-1].
The DP Approach
Idea: in any triangulation of P_ij, there is a unique triangle that has v_i v_j as a side. So try each k, i < k < j, and cut the polygon by the lines v_i v_k and v_j v_k (one of these lines can be a side, but not both), thereby generating one triangle and the rest, which consists of one or two smaller polygons.
The Key Step
Then the sums of the chord lengths are:
1. k = i+1: t[i+1, j] + |v_{i+1} v_j|.
2. k = j-1: t[i, j-1] + |v_i v_{j-1}|.
3. i+1 < k < j-1: t[i, k] + t[k, j] + |v_i v_k| + |v_j v_k|.
– Pick a k that provides the smallest value.
The Algorithm
Let t(i, j) be the optimal cost of a triangulation of P_ij. Observe that t(i, i) = t(i, i+1) = 0 and, for j > i+1,
t(i, j) = min_{i < k < j} [ t(i, k) + t(k, j) + c(i, k) + c(k, j) ],
where c(a, b) = |v_a v_b| if v_a v_b is a chord and 0 if it is a side. Now a dynamic programming algorithm is straightforward: just compute the t(i, j) from small intervals (j - i) to large intervals, for all i, j; then t(0, n-1) is the solution.
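A minimal Python sketch of this algorithm, under the convention that a line between consecutive vertices (a side) costs nothing (function names are my own):

```python
from math import dist

def optimal_triangulation(v):
    """Minimum total chord length for triangulating the convex
    polygon with vertices v[0..n-1] in counterclockwise order."""
    n = len(v)

    def c(a, b):
        # Sides of the original polygon contribute no chord length.
        if (b - a) % n in (1, n - 1):
            return 0.0
        return dist(v[a], v[b])

    t = [[0.0] * n for _ in range(n)]
    for length in range(2, n):            # interval length j - i
        for i in range(0, n - length):
            j = i + length
            # unique triangle v_i v_k v_j on the side v_i v_j
            t[i][j] = min(t[i][k] + t[k][j] + c(i, k) + c(k, j)
                          for k in range(i + 1, j))
    return t[0][n - 1]
```

For a unit square, any triangulation uses exactly one diagonal, so the optimum is √2.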
Optimal Binary Search Tree
Given a sorted list of n keys and the probabilities of accessing the keys (and the leaves), find the binary search tree with the smallest expected cost. Here, "leaves" are fictitious nodes added to account for unsuccessful searches. If p[i] is the probability of accessing node i, and q[i] is the probability of an unsuccessful search falling between node i and node i+1, the cost of a tree is:
Σ_i p[i]·level(node i) + Σ_i q[i]·(level(leaf i) - 1).
Example
The Idea
Let c(i, j) = minimum cost of a tree consisting of nodes i, …, j. In calculating c(i, j), we must find the best choice for the root. How do we determine it?
– Try all possibilities k, i ≤ k ≤ j.
Clearly, the two subtrees should be optimal trees for their respective sets of nodes, so the cost of this tree is related to c(i, k-1) and c(k+1, j). However, the two subtrees have been "pushed down" one level.
The tree does not have to be balanced to be optimal.
Analysis of the algorithm:
– The body of the loop takes time O(n) (finding the minimum of ≤ n elements), and it is evaluated O(n²) times.
– Total: O(n³).
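The idea above can be sketched in Python using the standard CLRS-style tables (a sketch, not the slides' exact algorithm: e[i][j] is the expected cost of an optimal subtree on keys i..j, and "pushing both subtrees down one level" adds the total probability weight w[i][j]):

```python
def optimal_bst(p, q):
    """Expected search cost of an optimal BST.

    p[1..n]: access probabilities of the n keys (p[0] unused);
    q[0..n]: probabilities of the n+1 unsuccessful-search gaps.
    """
    n = len(p) - 1
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        e[i][i - 1] = w[i][i - 1] = q[i - 1]   # empty subtree: dummy leaf only
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # try every key k_r as the root; both subtrees get pushed
            # down one level, which adds w[i][j] to their combined cost
            e[i][j] = min(e[i][r - 1] + e[r + 1][j] + w[i][j]
                          for r in range(i, j + 1))
    return e[1][n]
```

On the classic CLRS instance (p = .15, .10, .05, .10, .20; q = .05, .10, .05, .05, .05, .10) the optimal expected cost is 2.75, and the optimal tree is noticeably unbalanced.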
CFG Parsing: CYK Algorithm
Problem: given a context-free grammar G in Chomsky Normal Form and a word w, decide whether w is in L(G), the language generated by G.
Example: for the grammar G with productions S → AA, A → BB | AA, B → 1, we have L(G) = {1^(2n) : n > 1}; for instance, 1^6 ∈ L(G) by the derivation S ⟹ AA ⟹ AAA ⟹* BBBBBB ⟹* 111111.
Dynamic programming solution: exercise.
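One possible shape of the exercise's solution, as a hedged Python sketch (the table and rule representations are my own choices): V[i][l] holds the set of variables that derive the length-l substring of w starting at position i, filled from shorter substrings to longer ones.

```python
def cyk(word, start, unary, binary):
    """CYK membership test for a CNF grammar.

    unary:  dict mapping a terminal to the set of variables A with A -> terminal
    binary: list of rules (A, B, C) meaning A -> BC
    """
    n = len(word)
    if n == 0:
        return False
    # V[i][l] = variables deriving word[i:i+l]
    V = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        V[i][1] = set(unary.get(ch, set()))
    for l in range(2, n + 1):              # substring length
        for i in range(n - l + 1):         # start position
            for split in range(1, l):      # word[i:i+split] | word[i+split:i+l]
                for (A, B, C) in binary:
                    if B in V[i][split] and C in V[i + split][l - split]:
                        V[i][l].add(A)
    return start in V[0][n]

# The grammar from the slide: S -> AA, A -> BB | AA, B -> 1
unary = {"1": {"B"}}
binary = [("S", "A", "A"), ("A", "B", "B"), ("A", "A", "A")]
print(cyk("111111", "S", unary, binary))  # True: 6 ones, n = 3 > 1
print(cyk("11", "S", unary, binary))      # False: 2 ones means n = 1
```

The triple loop over lengths, start positions, and split points gives O(n³) table updates per rule, the usual CYK running time.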