Dynamic Programming.

1 Dynamic Programming

2 The basic idea is drawn from the intuition behind divide and conquer:
One implicitly explores the space of all possible solutions by decomposing the problem into a series of subproblems, and then building up correct solutions to larger and larger subproblems.

3 The term Dynamic Programming comes from Control Theory, not computer science.
Here "programming" refers to the use of tables (arrays) to construct a solution. The technique is used extensively in Operations Research, which is taught in the Math department.

4 The Main Idea
In dynamic programming we usually reduce time by increasing the amount of space. We solve the problem by solving subproblems of increasing size and saving each optimal solution, usually in a table. The table is then used for finding the optimal solution to larger problems. Time is saved since each subproblem is solved only once.

5 When is Dynamic Programming used?
Used for problems in which an optimal solution for the original problem can be found from optimal solutions to subproblems of the original problem.
Usually, a recursive algorithm can solve the problem, but the algorithm computes the optimal solution to the same subproblem more than once and is therefore slow. The two examples (Fibonacci and binomial coefficient) have such a recursive algorithm.
Dynamic programming reduces the time by computing the optimal solution of a subproblem only once and saving its value. The saved value is then used whenever the same subproblem needs to be solved.

6 Principle of Optimality (Optimal Substructure)
The principle of optimality applies to a problem (not an algorithm). A large number of optimization problems satisfy this principle.
Principle of optimality: Given an optimal sequence of decisions or choices, each subsequence must also be optimal.

7 Principle of optimality - shortest path problem
Problem: Given a graph G and vertices s and t, find a shortest path in G from s to t.
Theorem: A subpath P’ (from s’ to t’) of a shortest path P is a shortest path from s’ to t’ in the subgraph G’ induced by P’. Subpaths are paths that start or end at an intermediate vertex of P.
Proof: If P’ were not a shortest path from s’ to t’ in G’, we could replace the subpath from s’ to t’ in P with the shortest path in G’ from s’ to t’. The result would be a shorter path from s to t than P. This contradicts our assumption that P is a shortest path from s to t.

8 Principle of optimality
P = {(a,b), (b,c), (c,d), (d,e)}, P’ = {(c,d), (d,e)}
[Figure: weighted graph G on vertices a to f; G’ is the subgraph containing P’.]
P’ must be a shortest path from c to e in G’, otherwise P cannot be a shortest path from a to e in G.

9 Principle of optimality - MST problem (minimum spanning tree)
Problem: Given an undirected connected graph G, find a minimum spanning tree.
Theorem: Any subtree T’ of an MST T of G is an MST of the subgraph G’ of G induced by the vertices of the subtree.
Proof: If T’ is not an MST of G’, we can replace the edges of T’ in T with the edges of an MST of G’. This would result in a lower-cost spanning tree, contradicting our assumption that T is an MST of G.

10 Principle of optimality
T = {(c,d), (d,f), (d,e), (a,b), (b,c)}, T’ = {(c,d), (d,f), (d,e)}
[Figure: weighted graph G on vertices a to f; G’ is the subgraph induced by the vertices of T’.]
T’ must be an MST of G’, otherwise T cannot be an MST of G.

11 A problem that does not satisfy the Principle of Optimality
Problem: What is the longest simple route between City A and City B? Simple = never visit the same spot twice.
The longest simple route from A to B (solid line) has city C as an intermediate city. It does not consist of the longest simple route from A to C plus the longest simple route from C to B.
[Figure: cities A, B, C, D with three marked routes: longest A to B, longest A to C, longest C to B.]

12 Longest Common Subsequence (LCS)

13 Longest Common Subsequence (LCS)
Problem: Given sequences x[1..m] and y[1..n], find a longest common subsequence of both.
Example: for x = ABCBDAB and y = BDCABA, BCA is a common subsequence, and BCBA and BDAB are two LCSs.

14 LCS
Brute force solution
Writing a recurrence equation
The dynamic programming solution
Application of algorithm

15 Brute force solution
Solution: For every subsequence of x, check whether it is a subsequence of y.
Analysis: Each subsequence of x corresponds to a subset of the indices {1, 2, …, m} of x, so there are $2^m$ subsequences of x. Each check takes O(n) time, since we scan y for the first element, then continue scanning for the second element, and so on. The worst-case running time is $O(n 2^m)$.

16 Writing the recurrence equation
Let Xi denote the ith prefix x[1..i] of x[1..m], and let X0 denote the empty prefix.
We will first compute the length of an LCS of Xm and Yn, LenLCS(m, n), and then use information saved during the computation to find the actual subsequence.
We need a recursive formula for computing LenLCS(i, j).

17 Writing the recurrence equation
Let Zk denote an LCS of Xi and Yj.
If Xi and Yj end with the same character xi = yj, the LCS must include that character. If it did not, we could get a longer LCS by adding the common character.
If Xi and Yj do not end with the same character, there are two possibilities: either the LCS does not end with xi, or it does not end with yj.

18 Xi and Yj end with xi = yj
Xi = x1 x2 … xi-1 xi
Yj = y1 y2 … yj-1 yj, with yj = xi
Zk = z1 z2 … zk-1 zk, with zk = yj = xi
Zk is Zk-1 followed by zk = yj = xi, where Zk-1 is an LCS of Xi-1 and Yj-1, and
LenLCS(i, j) = LenLCS(i-1, j-1) + 1

19 Xi and Yj end with xi ≠ yj
Xi = x1 x2 … xi-1 xi, Yj = y1 y2 … yj-1 yj, Zk = z1 z2 … zk-1 zk
If zk ≠ xi, then Zk is an LCS of Xi-1 and Yj.
If zk ≠ yj, then Zk is an LCS of Xi and Yj-1.
LenLCS(i, j) = max{LenLCS(i, j-1), LenLCS(i-1, j)}

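20 The recurrence equation
LenLCS(i, j) = 0 if i = 0 or j = 0
LenLCS(i, j) = LenLCS(i-1, j-1) + 1 if i, j > 0 and xi = yj
LenLCS(i, j) = max{LenLCS(i, j-1), LenLCS(i-1, j)} if i, j > 0 and xi ≠ yj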

21 The dynamic programming solution
Initialize the first row and the first column of the matrix LenLCS to 0.
Calculate LenLCS(1, j) for j = 1, …, n, then LenLCS(2, j) for j = 1, …, n, and so on.
Also store in a table an arrow pointing to the array element that was used in the computation.
It is easy to see that the computation is O(mn).

22 Example
To find an LCS, follow the arrows; each diagonal arrow corresponds to a member of the LCS.

23 LCS-Length(X, Y)
m ← length[X]
n ← length[Y]
for i ← 1 to m do c[i, 0] ← 0
for j ← 0 to n do c[0, j] ← 0

24 LCS-Length(X, Y) cont.
for i ← 1 to m do
    for j ← 1 to n do
        if xi = yj
            c[i, j] ← c[i-1, j-1] + 1
            b[i, j] ← “D”
        else if c[i-1, j] ≥ c[i, j-1]
            c[i, j] ← c[i-1, j]
            b[i, j] ← “U”
        else
            c[i, j] ← c[i, j-1]
            b[i, j] ← “L”
return c and b
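The table-filling procedure above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the function name `lcs` is hypothetical, strings are 0-indexed, and instead of storing the arrow table b the sketch re-derives each arrow from c while walking back.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Walk back from c[m][n]; every diagonal step contributes one character.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # "U" arrow
        else:
            j -= 1  # "L" arrow
    return "".join(reversed(out))
```

For x = ABCBDAB and y = BDCABA the result has length 4; which of the equally long subsequences is returned depends on the tie-breaking rule.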

25 Questions
How do we find the LCS? Do we need b?
Do we need the whole c (for finding the longest length)?

26 Fibonacci's Series

27 Fibonacci's Series
Definition: S0 = 0, S1 = 1, Sn = Sn-1 + Sn-2 for n > 1
0, 1, 1, 2, 3, 5, 8, 13, 21, …
Applying the recursive definition we get:
fib(n)
    if n ≤ 1
        return n
    else return (fib(n-1) + fib(n-2))

28 Analysis using Substitution of Recursive Fibonacci
Let T(n) be the number of additions done by fib(n).
T(n) = T(n-1) + T(n-2) + 1 for n ≥ 2
T(1) = T(0) = 0, so T(2) = 1
$T(n) \in O(2^n)$, $T(n) \in \Omega(2^{n/2})$
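To make the count concrete, here is a small sketch that runs the recursive definition and counts one addition per non-base call. The helper name `fib_count` is hypothetical and exists only for this illustration.

```python
def fib_count(n: int) -> tuple[int, int]:
    """Return (fib(n), number of additions performed by the naive recursion)."""
    if n <= 1:
        return n, 0  # base cases perform no additions
    a, ta = fib_count(n - 1)
    b, tb = fib_count(n - 2)
    return a + b, ta + tb + 1  # one addition at this call


print(fib_count(5))   # (5, 7)
print(fib_count(10))  # (55, 88)
```

The count grows as fast as the Fibonacci numbers themselves, which is why the naive recursion is exponential.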

29 What does the Execution Tree look like?
Fib(5) = Fib(4) + Fib(3)
    Fib(4) = Fib(3) + Fib(2)
        Fib(3) = Fib(2) + Fib(1)
            Fib(2) = Fib(1) + Fib(0)
        Fib(2) = Fib(1) + Fib(0)
    Fib(3) = Fib(2) + Fib(1)
        Fib(2) = Fib(1) + Fib(0)

30 Dynamic Programming Solution for Fibonacci
Builds a table with the first n Fibonacci numbers.
fib(n)
    A[0] = 0
    A[1] = 1
    for i ← 2 to n
        do A[i] = A[i-1] + A[i-2]
    return A
Is there a recurrence equation? What is the run time? What are the space requirements? If we only need the nth number, can we save space?
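A possible Python rendering of the table-based procedure, together with one way to keep only the last two values when just the nth number is needed. Both function names are hypothetical; this is a sketch, not the slides' reference solution.

```python
def fib_table(n: int) -> list[int]:
    """Return the table [S0, S1, ..., Sn], filled bottom-up."""
    a = [0] * (n + 1)
    if n >= 1:
        a[1] = 1
    for i in range(2, n + 1):
        a[i] = a[i - 1] + a[i - 2]
    return a


def fib_last(n: int) -> int:
    """Return Sn while storing only two values at a time."""
    prev, cur = 0, 1  # S0, S1
    for _ in range(n):
        prev, cur = cur, prev + cur
    return prev


print(fib_table(8))  # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(fib_last(8))   # 21
```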

31 Binomial Coefficient

32 The Binomial Coefficient

33 The recursive algorithm
binomialCoef(n, k)
    if k = 0 or k = n
        then return 1
        else return (binomialCoef(n-1, k-1) + binomialCoef(n-1, k))

34 The Call Tree
[Figure: call tree of binomialCoef(n, k). The root (n, k) calls (n-1, k-1) and (n-1, k); the next level holds (n-2, k-2), (n-2, k-1), (n-2, k-1), (n-2, k); the level below holds eight calls on n-3, ranging from (n-3, k-3) to (n-3, k), with repeated subproblems.]

35 Dynamic Solution
Use a matrix B of n+1 rows and k+1 columns, where B[i, j] holds the binomial coefficient "i choose j".
Establish a recursive property. Rewrite in terms of matrix B:
B[i, j] = B[i-1, j-1] + B[i-1, j] if 0 < j < i
B[i, j] = 1 if j = 0 or j = i
Solve all “smaller instances of the problem” in a bottom-up fashion by computing the rows in B in sequence, starting with the first row.

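36 The B Matrix
[Figure: matrix B with rows i = 0..n and columns j = 0..k; the first rows are 1; 1 1; 1 2 1. Entry B[i, j] is computed from B[i-1, j-1] and B[i-1, j] in the row above.]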

37 Compute B[4,2] (4 choose 2)
Row 0: B[0,0] = 1
Row 1: B[1,0] = 1, B[1,1] = 1
Row 2: B[2,0] = 1, B[2,1] = B[1,0] + B[1,1] = 2, B[2,2] = 1
Row 3: B[3,0] = 1, B[3,1] = B[2,0] + B[2,1] = 3, B[3,2] = B[2,1] + B[2,2] = 3
Row 4: B[4,0] = 1, B[4,1] = B[3,0] + B[3,1] = 4, B[4,2] = B[3,1] + B[3,2] = 6

38 Dynamic Program
bin(n, k)
    for i = 0 to n // every row
        for j = 0 to minimum(i, k)
            if j == 0 or j == i // column 0 or diagonal
                then B[i, j] = 1
                else B[i, j] = B[i-1, j-1] + B[i-1, j]
    return B[n, k]
What is the run time? How much space does it take? If we only need the last value, can we save space?
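The bottom-up procedure can be sketched in Python as below, assuming 0 ≤ k ≤ n. The second function is one possible way to keep a single row, updating it from right to left so that each entry still sees the values of the previous row; both names are hypothetical.

```python
def binomial(n: int, k: int) -> int:
    """Fill B row by row; only columns 0..min(i, k) of row i are computed."""
    b = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            if j == 0 or j == i:  # column 0 or diagonal
                b[i][j] = 1
            else:
                b[i][j] = b[i - 1][j - 1] + b[i - 1][j]
    return b[n][k]


def binomial_one_row(n: int, k: int) -> int:
    """Same recurrence with O(k) space: row[j] is overwritten in place."""
    row = [1] + [0] * k
    for i in range(1, n + 1):
        # Right-to-left, so row[j - 1] still belongs to row i - 1.
        for j in range(min(i, k), 0, -1):
            row[j] += row[j - 1]
    return row[k]


print(binomial(4, 2))          # 6
print(binomial_one_row(4, 2))  # 6
```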

39 Dynamic programming
All values in column 0 are 1.
All values in the first k+1 diagonal cells are 1.
The condition j ≠ i and 0 < j ≤ min{i, k} ensures we only compute B[i, j] for j < i and only in the first k+1 columns.
Elements above the diagonal (B[i, j] for j > i) are not computed, since the binomial coefficient is undefined for j > i.

40 Number of iterations

41 All pairs shortest path
Floyd’s Algorithm All pairs shortest path

42 All pairs shortest path
The problem: find the shortest path between every pair of vertices of a graph (expensive using a brute-force approach).
The graph: may contain negative edges but no negative cycles.
A representation: a weight matrix W where W(i,j) = 0 if i = j, W(i,j) = ∞ if there is no edge between i and j, and W(i,j) = “weight of edge” otherwise.
Note: we have shown that the principle of optimality applies to shortest path problems.

43 The weight matrix and the graph
[Figure: weighted directed graph on vertices v1 to v5, shown next to its adjacency (weight) matrix.]

44 The subproblems
How can we define the shortest distance di,j in terms of “smaller” problems?
One way is to restrict the paths to only include vertices from a restricted subset. Initially, the subset is empty. Then it is incrementally increased until it includes all the vertices.

45 The subproblems
Let D(k)[i,j] = weight of a shortest path from vi to vj using only vertices from {v1, v2, …, vk} as intermediate vertices in the path.
D(0) = W
D(n) = D, which is the goal matrix.
How do we compute D(k) from D(k-1)?

46 The Recursive Definition:
Case 1: A shortest path from vi to vj restricted to using only vertices from {v1, v2, …, vk} as intermediate vertices does not use vk. Then D(k)[i,j] = D(k-1)[i,j].
Case 2: A shortest path from vi to vj restricted to using only vertices from {v1, v2, …, vk} as intermediate vertices does use vk. Then D(k)[i,j] = D(k-1)[i,k] + D(k-1)[k,j].
[Figure: a path from vi to vj through vk. The whole path is a shortest path using intermediate vertices from {v1, …, vk}; each half, vi to vk and vk to vj, is a shortest path using intermediate vertices from {v1, …, vk-1}.]

47 The recursive definition
Since D(k)[i,j] = D(k-1)[i,j] or D(k)[i,j] = D(k-1)[i,k] + D(k-1)[k,j], we conclude:
D(k)[i,j] = min{ D(k-1)[i,j], D(k-1)[i,k] + D(k-1)[k,j] }

48 The pointer array P
Used to enable finding a shortest path.
Initially the array contains 0.
Each time a shorter path from i to j is found, the k that provided the minimum is saved (the highest-index node on the path from i to j).
To print the intermediate nodes on the shortest path, a recursive procedure that prints the shortest paths from i to k and from k to j can be used.

49 Floyd's Algorithm Using n+1 D matrices
Floyd
// Computes shortest distance between all pairs of nodes,
// and saves P to enable finding shortest paths
D0 ← W // initialize D array to W
P ← 0 // initialize P array to [0]
for k ← 1 to n
    do for i ← 1 to n
        do for j ← 1 to n
            if (Dk-1[i, j] > Dk-1[i, k] + Dk-1[k, j])
                then Dk[i, j] ← Dk-1[i, k] + Dk-1[k, j]
                     P[i, j] ← k
                else Dk[i, j] ← Dk-1[i, j]

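50 Example
[Figure: directed graph on vertices 1, 2, 3 with edges 1→2 (weight 4), 1→3 (weight 5), 2→1 (weight 2), 3→2 (weight -3).]
W = D0 =
        1    2    3
   1    0    4    5
   2    2    0    ∞
   3    ∞   -3    0
P =
        1    2    3
   1    0    0    0
   2    0    0    0
   3    0    0    0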

51 k = 1 Vertex 1 can be intermediate node
D0 =
        1    2    3
   1    0    4    5
   2    2    0    ∞
   3    ∞   -3    0
D1[2,3] = min( D0[2,3], D0[2,1]+D0[1,3] ) = min(∞, 7) = 7
D1[3,2] = min( D0[3,2], D0[3,1]+D0[1,2] ) = min(-3, ∞) = -3
D1 =
        1    2    3
   1    0    4    5
   2    2    0    7
   3    ∞   -3    0
P =
        1    2    3
   1    0    0    0
   2    0    0    1
   3    0    0    0

52 k = 2 Vertices 1, 2 can be intermediate
D1 =
        1    2    3
   1    0    4    5
   2    2    0    7
   3    ∞   -3    0
D2[1,3] = min( D1[1,3], D1[1,2]+D1[2,3] ) = min(5, 4+7) = 5
D2[3,1] = min( D1[3,1], D1[3,2]+D1[2,1] ) = min(∞, -3+2) = -1
D2 =
        1    2    3
   1    0    4    5
   2    2    0    7
   3   -1   -3    0
P =
        1    2    3
   1    0    0    0
   2    0    0    1
   3    2    0    0

53 k = 3 Vertices 1, 2, 3 can be intermediate
D2 =
        1    2    3
   1    0    4    5
   2    2    0    7
   3   -1   -3    0
D3[1,2] = min( D2[1,2], D2[1,3]+D2[3,2] ) = min(4, 5+(-3)) = 2
D3[2,1] = min( D2[2,1], D2[2,3]+D2[3,1] ) = min(2, 7+(-1)) = 2
D3 =
        1    2    3
   1    0    2    5
   2    2    0    7
   3   -1   -3    0
P =
        1    2    3
   1    0    3    0
   2    0    0    1
   3    2    0    0

54 Floyd's Algorithm: Using 2 D matrices
Floyd
D ← W // initialize D array to W
P ← 0 // initialize P array to [0]
for k ← 1 to n // Computing D’ from D
    do for i ← 1 to n
        do for j ← 1 to n
            if (D[i, j] > D[i, k] + D[k, j])
                then D’[i, j] ← D[i, k] + D[k, j]
                     P[i, j] ← k
                else D’[i, j] ← D[i, j]
    Move D’ to D.

55 Can we use only one D matrix?
D(k)[i,j] depends only on D(k-1)[i,j] and on elements in the kth column and kth row of the distance matrix.
We will show that the kth row and the kth column of the distance matrix are unchanged when Dk is computed.
This means D can be calculated in place.

56 The main diagonal values
Before we show that the kth row and column of D remain unchanged, we show that the main diagonal remains 0:
D(k)[j,j] = min{ D(k-1)[j,j], D(k-1)[j,k] + D(k-1)[k,j] }
          = min{ 0, D(k-1)[j,k] + D(k-1)[k,j] }
          = 0
Based on which assumption?

57 The kth column
The kth column of Dk is equal to the kth column of Dk-1.
Intuitively true: a path from i to k will not become shorter by adding k to the allowed subset of intermediate vertices.
For all i, D(k)[i,k] = min{ D(k-1)[i,k], D(k-1)[i,k] + D(k-1)[k,k] }
                     = min{ D(k-1)[i,k], D(k-1)[i,k] + 0 }
                     = D(k-1)[i,k]

58 The kth row
The kth row of Dk is equal to the kth row of Dk-1.
For all j, D(k)[k,j] = min{ D(k-1)[k,j], D(k-1)[k,k] + D(k-1)[k,j] }
                     = min{ D(k-1)[k,j], 0 + D(k-1)[k,j] }
                     = D(k-1)[k,j]

59 Floyd's Algorithm using a single D
Floyd
D ← W // initialize D array to W
P ← 0 // initialize P array to [0]
for k ← 1 to n
    do for i ← 1 to n
        do for j ← 1 to n
            if (D[i, j] > D[i, k] + D[k, j])
                then { D[i, j] ← D[i, k] + D[k, j]
                       P[i, j] ← k }
Time complexity: $T(n) = n \cdot n \cdot n = n^3$
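A minimal Python sketch of the single-matrix version, assuming the weight matrix is a list of lists with `math.inf` for missing edges. Lists are 0-indexed, but P stores 1-based vertex numbers so that 0 can keep its meaning of "no intermediate vertex"; the function name `floyd` is hypothetical.

```python
import math


def floyd(w: list[list[float]]) -> tuple[list[list[float]], list[list[int]]]:
    """Return (D, P): all-pairs shortest distances and the pointer array."""
    n = len(w)
    d = [row[:] for row in w]          # D starts as a copy of W
    p = [[0] * n for _ in range(n)]    # 0 = no intermediate vertex
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    p[i][j] = k + 1    # store the 1-based vertex number
    return d, p


INF = math.inf
# Three-vertex example: edges 1->2 (4), 1->3 (5), 2->1 (2), 3->2 (-3).
W = [[0, 4, 5],
     [2, 0, INF],
     [INF, -3, 0]]
D, P = floyd(W)
print(D)  # [[0, 2, 5], [2, 0, 7], [-1, -3, 0]]
print(P)  # [[0, 3, 0], [0, 0, 1], [2, 0, 0]]
```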

60 Printing intermediate nodes on shortest path from q to r
path(index q, r)
    if (P[q, r] != 0)
        path(q, P[q, r])
        println(“v” + P[q, r])
        path(P[q, r], r)
        return
    else return // no intermediate nodes
Before calling path, check that D[q, r] < ∞ and print node q; after the call to path, print node r.
[Figure: the three-vertex example graph with its final P matrix, whose nonzero entries are P[1,2] = 3, P[2,3] = 1, P[3,1] = 2.]
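The recursive procedure can be sketched in Python so that it returns the intermediate vertices as a list instead of printing them. The function name `intermediates` is hypothetical, vertices are numbered from 1, and the matrix P below is assumed to be the final P of the three-vertex example (nonzero entries P[1,2] = 3, P[2,3] = 1, P[3,1] = 2).

```python
def intermediates(p: list[list[int]], q: int, r: int) -> list[int]:
    """Return the intermediate vertices on the shortest path from q to r."""
    k = p[q - 1][r - 1]
    if k == 0:
        return []  # no intermediate vertices
    # Everything before k, then k itself, then everything after k.
    return intermediates(p, q, k) + [k] + intermediates(p, k, r)


P = [[0, 3, 0],
     [0, 0, 1],
     [2, 0, 0]]
print([1] + intermediates(P, 1, 2) + [2])  # [1, 3, 2]
print([3] + intermediates(P, 3, 1) + [1])  # [3, 2, 1]
```

As on the slide, a caller would first check that the distance from q to r is finite before asking for the path.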

61 The graph in the Floyd example

62 The final distance matrix D and P
[Table not recovered: the final matrix D, with entries such as 5(1).]
The values in parentheses are the nonzero P values.

63 The call tree for Path(1, 4)
Path(1, 4), with P(1, 4) = 6
    Path(1, 6): P(1, 6) = 0
    Print v6
    Path(6, 4), with P(6, 4) = 3
        Path(6, 3): P(6, 3) = 0
        Print v3
        Path(3, 4): P(3, 4) = 0
The intermediate nodes on the shortest path from 1 to 4 are v6, v3. The shortest path is v1, v6, v3, v4.

64 An alternative computation of P
Initialization: P[i, j] = i if there is an edge from i to j, to indicate that i is the last node before j on the shortest path from i to j. P[i, j] is initialized to NIL if there is no edge from i to j.
Update: P[i, j] = P[k, j] if D[i, j] > D[i, k] + D[k, j]. This guarantees that P[i, j] contains the last node before j on the shortest path from i to j.

65 findPath(i, j)
    if P[i, j] = NIL return "No path"
    create empty stack S
    push j on stack S
    k = j // k is the last node on the path
    while P[i, k] != i // push node before k on stack
        push P[i, k] on stack S
        k = P[i, k] // node before k is new last node
    push i on stack S
    return S
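The alternative bookkeeping can be sketched as follows, assuming 0-indexed vertices, `None` in place of NIL, and `math.inf` for missing edges. The names `floyd_pred` and `find_path` are hypothetical, and the `i == j` shortcut is an addition of this sketch that the slide does not cover.

```python
import math


def floyd_pred(w):
    """Return (D, pred); pred[i][j] is the last vertex before j on the i->j path."""
    n = len(w)
    d = [row[:] for row in w]
    pred = [[i if i != j and w[i][j] != math.inf else None for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    pred[i][j] = pred[k][j]  # the path now ends like the k->j path
    return d, pred


def find_path(pred, i, j):
    """Return the vertices of the shortest i->j path, or None if there is none."""
    if i == j:
        return [i]  # assumption of this sketch: trivial path
    if pred[i][j] is None:
        return None
    stack = [j]
    k = j
    while pred[i][k] != i:  # walk backwards from j towards i
        stack.append(pred[i][k])
        k = pred[i][k]
    stack.append(i)
    return stack[::-1]  # popping the stack yields the path in forward order


W = [[0, 4, 5], [2, 0, math.inf], [math.inf, -3, 0]]
D, PRED = floyd_pred(W)
print(find_path(PRED, 0, 1))  # [0, 2, 1]
```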

66 Optimal Binary Search Tree

67 Optimal Binary Search Trees
Problem: Given a sequence K = k1 < k2 < ··· < kn of n sorted keys, with a search probability pi for each key ki, we want to build a binary search tree (BST) with minimum expected search cost.
Actual cost = number of items examined. For key ki, cost = depthT(ki) + 1, where depthT(ki) = depth of ki in BST T.
The goal is to minimize the expected search time for a given probability distribution.

68 Optimal Binary Search Tree
Average number of comparisons
Index i    1     2     3     4
Key        A     B     C     D
P(i)       0.1   0.2   0.4   0.3

69 Optimal Binary Search Tree
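Let T be a binary search tree that contains Ki, Ki+1, …, Kj for some 1 ≤ i ≤ j ≤ n. We define the cost of average search time as
$C(T) = \sum_{m=i}^{j} p_m \, (\mathrm{depth}_T(K_m) + 1)$
Recurrence relation: Let TL and TR be the left and right subtrees of T. Then
$C(T) = C(T_L) + C(T_R) + \sum_{m=i}^{j} p_m$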

70 Optimal Binary Search Tree
Principle of optimality applied to optimal binary search trees:
Let T be a binary search tree that has minimum cost among all trees containing keys Ki, Ki+1, …, Kj, and let Km be the key at the root of T (so i ≤ m ≤ j). Then TL, the left subtree of T, is a binary search tree that has minimum cost among all trees containing keys Ki, …, Km-1; and TR, the right subtree of T, is a binary search tree that has minimum cost among all trees containing keys Km+1, …, Kj.

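71 Expected Search Cost
$E[\text{search cost in } T] = \sum_{i=1}^{n} (\mathrm{depth}_T(k_i) + 1) \, p_i = 1 + \sum_{i=1}^{n} \mathrm{depth}_T(k_i) \, p_i$
The last step holds because the sum of probabilities is 1.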

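72 Example
Consider 5 keys with these search probabilities: p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[Figure: BST with root k2; its children are k1 and k4; the children of k4 are k3 and k5.]
i        depthT(ki)    depthT(ki)·pi
1        1             0.25
2        0             0
3        2             0.1
4        1             0.2
5        2             0.6
Total                  1.15
Therefore, E[search cost] = 2.15.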

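73 Example
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[Figure: BST with root k2; its children are k1 and k5; k4 is the left child of k5 and k3 is the left child of k4.]
i        depthT(ki)    depthT(ki)·pi
1        1             0.25
2        0             0
3        3             0.15
4        2             0.4
5        1             0.3
Total                  1.10
Therefore, E[search cost] = 2.10. This tree turns out to be optimal for this set of keys.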

74 Example
Observations:
Optimal BST may not have smallest height.
Optimal BST may not have highest-probability key at root.
Build by exhaustive checking?
Construct each n-node BST. For each, assign keys and compute expected search cost.
But there are $\Omega(4^n / n^{3/2})$ different BSTs with n nodes.

75 Optimal Substructure
One of the keys in ki, …, kj, say kr, where i ≤ r ≤ j, must be the root of an optimal subtree for these keys. The left subtree of kr contains ki, …, kr-1. The right subtree of kr contains kr+1, …, kj.
To find an optimal BST:
Examine all candidate roots kr, for i ≤ r ≤ j.
Determine all optimal BSTs containing ki, …, kr-1 and containing kr+1, …, kj.
[Figure: root kr with left subtree ki … kr-1 and right subtree kr+1 … kj.]

76 Recursive Solution
Find an optimal BST for ki, …, kj, where i ≥ 1, j ≤ n, j ≥ i-1. When j = i-1, the tree is empty.
Define A[i, j] = expected search cost of an optimal BST for ki, …, kj.
If j = i-1, then A[i, j] = 0.
If j ≥ i: select a root kr, for some i ≤ r ≤ j, and recursively make optimal BSTs for ki, …, kr-1 as the left subtree and for kr+1, …, kj as the right subtree.

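77 Recursive Solution
Find an optimal BST for ki, …, kj, where i ≥ 1, j ≤ n, j ≥ i-1. When j = i-1, the tree is empty.
A[i, j] = expected search cost of an optimal BST for ki, …, kj:
A[i, j] = 0 if j = i-1
A[i, j] = min over i ≤ r ≤ j of { A[i, r-1] + A[r+1, j] } + p[i, j] if i ≤ j, where p[i, j] = pi + … + pj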

78 Computing an Optimal Solution
For each subproblem (i, j), store the expected search cost in a table A[1..n+1, 0..n]. Only entries A[i, j] with j ≥ i-1 are used.
Root[i, j] = root of the subtree with keys ki, …, kj, for 1 ≤ i ≤ j ≤ n.
p[1..n+1, 0..n] = sums of probabilities:
p[i, i-1] = 0 for 1 ≤ i ≤ n
p[i, j] = p[i, j-1] + pj for 1 ≤ i ≤ j ≤ n
The optimal solution is A[1, n] (minimum search time), with Root[1, n] = r (1 ≤ r ≤ n). Build an optimal binary search tree with kr at the root and keys k1, …, kn.
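One way to fill these tables is by increasing subproblem length, as in the sketch below. It assumes the cost recurrence A[i, j] = min over r of (A[i, r-1] + A[r+1, j]) + p[i, j], keeps the slides' 1-based indices by leaving index 0 of the input unused, and uses the hypothetical names `optimal_bst` and `w` (for the probability sums).

```python
import math


def optimal_bst(p: list[float]) -> tuple[float, list[list[int]]]:
    """p[1..n] are the search probabilities (p[0] is unused).

    Returns (A[1][n], root), where root[i][j] is the index of the root
    chosen for keys i..j.
    """
    n = len(p) - 1
    # Rows 1..n+1 and columns 0..n; a[i][i-1] = 0 stands for the empty tree.
    a = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]   # w[i][j] = p_i + ... + p_j
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + p[j]
            best = math.inf
            for r in range(i, j + 1):             # try every candidate root
                cost = a[i][r - 1] + a[r + 1][j] + w[i][j]
                if cost < best:
                    best, root[i][j] = cost, r
            a[i][j] = best
    return a[1][n], root


cost, root = optimal_bst([0.0, 0.25, 0.2, 0.05, 0.2, 0.3])
print(round(cost, 2))  # 2.1
```

The triple loop gives a running time cubic in n for this straightforward version.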

79 Memoization

80 Idea
Memoization is used for recursive code that solves subproblems more than once. The idea is to modify the recursive code and use a table to store the solution of every problem that was already solved. We can still use recursive calls, but with more efficiency.

81 Initialization
Before you can apply the recursive code, you need to initialize the table to some "impossible value". For example, if the function can have only positive values, you can initialize the table to negative values.

82 Method
The recursive code is changed to first check the table to determine whether you have already solved the current problem. If the value in the table is still the initial value, you know that you have not yet solved the current problem, so you apply the modified recursive code and store the solution in the table. Otherwise, you know that the value is the required solution, and you use or return it directly.

83 Example 1
The following code is for a memoized version of recursive Fibonacci.
MemoizedFib(n)
    for (i=0; i<=n; i++)
        A[i] = -1 // initialize the array
    return LookupFib(n)
LookupFib(n) // compute fn if not yet done
    if A[n] = -1 // initial value
        if n <= 1 // base case
            A[n] = n // store solution
        else // solve and store general case
            A[n] = (LookupFib(n-1) + LookupFib(n-2))
    return A[n] // return fn = A[n]
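A direct Python transcription of this memoized scheme might look like the sketch below. The names mirror the pseudocode but are otherwise hypothetical, and -1 serves as the "impossible value".

```python
def memoized_fib(n: int) -> int:
    """Top-down Fibonacci: each value is computed once and stored in a."""
    a = [-1] * (n + 1)  # -1 marks "not yet computed"

    def lookup(m: int) -> int:
        if a[m] == -1:
            if m <= 1:
                a[m] = m  # base case
            else:
                a[m] = lookup(m - 1) + lookup(m - 2)
        return a[m]

    return lookup(n)


print(memoized_fib(30))  # 832040
```

The recursion still goes n levels deep, so very large n would need an iterative version or a raised recursion limit.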

84 Example 2
The following code is for a memoized version of the recursive code for binomial coefficients.
memoizedBC(n, k)
    for i=0 to n
        for j=0 to min(i, k)
            B[i, j] = -1 // initialize array
    return binomialCoef(n, k)
binomialCoef(n, k)
    if B[n, k] = -1 // B[n, k] not yet computed
        if k = 0 or k = n // base case
            B[n, k] = 1 // solve base case and store
        else // solve and store general case
            B[n, k] = (binomialCoef(n-1, k-1) + binomialCoef(n-1, k))
    return B[n, k]
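The same pattern for binomial coefficients, sketched in Python under the assumption 0 ≤ k ≤ n. For simplicity the table here is a full (n+1) by (k+1) rectangle rather than the trimmed shape in the pseudocode; the names are hypothetical.

```python
def memoized_bc(n: int, k: int) -> int:
    """Top-down binomial coefficient with a table of stored results."""
    b = [[-1] * (k + 1) for _ in range(n + 1)]  # -1 marks "not yet computed"

    def bc(i: int, j: int) -> int:
        if b[i][j] == -1:
            if j == 0 or j == i:  # base case
                b[i][j] = 1
            else:                 # general case, stored after solving
                b[i][j] = bc(i - 1, j - 1) + bc(i - 1, j)
        return b[i][j]

    return bc(n, k)


print(memoized_bc(4, 2))  # 6
```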

85 Example 3
Give a memoized version of the longest common subsequence algorithm that runs in O(mn) time. This version is slightly different, since the base cases are stored directly into the table during initialization.
Memoized-LCS-Length(X, Y)
    m <- length[X]
    n <- length[Y]
    for i <- 1 to m do // initialize column 0
        c[i, 0] <- 0
    for j <- 0 to n do // initialize row 0
        c[0, j] <- 0
    for i <- 1 to m do // initialize remaining matrix
        for j <- 1 to n do
            c[i, j] <- -1
    L-LCS-Length(X, Y, m, n) // call with m and n
    return c, b

86 Example 3 (cont’d)
L-LCS-Length(X, Y, i, j)
    if c[i, j] = -1
        if xi = yj
            L-LCS-Length(X, Y, i-1, j-1)
            c[i, j] <- c[i-1, j-1] + 1
            b[i, j] <- “D”
            return
        L-LCS-Length(X, Y, i-1, j)
        L-LCS-Length(X, Y, i, j-1)
        if c[i-1, j] >= c[i, j-1]
            c[i, j] <- c[i-1, j]
            b[i, j] <- “U”
        else
            c[i, j] <- c[i, j-1]
            b[i, j] <- “L”
    return
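A compact Python sketch of the memoized LCS length computation, assuming 0-indexed strings and omitting the arrow table b. The names `memoized_lcs_length` and `solve` are hypothetical.

```python
def memoized_lcs_length(x: str, y: str) -> int:
    """Top-down LCS length; base cases are stored during initialization."""
    m, n = len(x), len(y)
    c = [[-1] * (n + 1) for _ in range(m + 1)]  # -1 marks "not yet computed"
    for i in range(m + 1):
        c[i][0] = 0  # column 0
    for j in range(n + 1):
        c[0][j] = 0  # row 0

    def solve(i: int, j: int) -> int:
        if c[i][j] == -1:
            if x[i - 1] == y[j - 1]:
                c[i][j] = solve(i - 1, j - 1) + 1
            else:
                c[i][j] = max(solve(i - 1, j), solve(i, j - 1))
        return c[i][j]

    return solve(m, n)


print(memoized_lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Each table entry is filled at most once, which is where the O(mn) bound comes from.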

