Published by Lizbeth McDonald. Modified over 5 years ago.
1
Dynamic Programming-- Longest Common Subsequence
2
Subsequences
A subsequence of a character string $x_0 x_1 x_2 \ldots x_{n-1}$ is a string of the form $x_{i_1} x_{i_2} \ldots x_{i_k}$, where $i_j < i_{j+1}$.
Not the same as a substring! Gaps can exist in subsequences.
Example
String: ABCDEFGHIJK
Subsequence: ACEGIJK
Subsequence: DFGHK
Not a subsequence: DAGH
Dynamic Programming
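A minimal sketch of the subsequence definition, in Python. The function name `is_subsequence` is an assumption made for illustration; it is not part of the slides. It checks the three example strings against ABCDEFGHIJK.

```python
def is_subsequence(s: str, x: str) -> bool:
    """Return True if s can be obtained from x by deleting zero or more characters."""
    remaining = iter(x)
    # `ch in remaining` consumes the iterator up to the first occurrence of ch,
    # so the characters of s must be found in x in increasing index order.
    return all(ch in remaining for ch in s)


if __name__ == "__main__":
    for candidate in ("ACEGIJK", "DFGHK", "DAGH"):
        print(candidate, is_subsequence(candidate, "ABCDEFGHIJK"))
```

DAGH is rejected because, after D is matched, no A remains to its right in the string.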
3
The Longest Common Subsequence (LCS) Problem
Given two strings X and Y, find a longest subsequence common to both X and Y.
Has applications to DNA similarity testing (alphabet is {A,C,G,T}): disease identification, paternity tests.
Example: ABCDEFG and XZACKDFWGH have ACDFG as a longest common subsequence.
4
A Poor Approach to the LCS Problem
A brute-force solution:
Enumerate all subsequences of X.
Test which ones are also subsequences of Y.
Pick the longest one.
Analysis:
If X is of length n, then it has $2^n$ subsequences.
Hint: each character can be present or absent (a binary choice).
This is an exponential-time algorithm!
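The brute-force idea can be sketched in Python as follows. The names `lcs_brute_force` and `is_subsequence` are assumptions for illustration. Candidates are tried from longest to shortest, so the first hit is a longest common subsequence; in the worst case all $2^n$ subsets of positions in X are examined.

```python
from itertools import combinations


def is_subsequence(s: str, y: str) -> bool:
    """True if s is a subsequence of y."""
    remaining = iter(y)
    return all(ch in remaining for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    """Enumerate subsequences of x, longest first, and return the first one found in y."""
    for length in range(len(x), -1, -1):
        # Each choice of `length` positions of x defines one subsequence of x.
        for positions in combinations(range(len(x)), length):
            candidate = "".join(x[p] for p in positions)
            if is_subsequence(candidate, y):
                return candidate
    return ""


if __name__ == "__main__":
    print(lcs_brute_force("ABCDEFG", "XZACKDFWGH"))
```

This is only practical for very short strings, which is the point of the slide.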
5
Observations
Can be viewed as string editing: transform one string to another by keeping/adding/deleting characters.
Can also be viewed as aligning two strings.
Any ideas?
6
Observation: Pairs of indices
i index:            1   2   3   4   5       6       7
elements of X:  --  T   G   C   A   T   --  A   --  C
elements of Y:  A   T   --  C   --  T   G   A   T   C
j index:        1   2       3       4   5   6   7   8
7
Observation: Pairs of indices
X: -- T G C A T -- A -- C
Y: A T -- C -- T G A T C
Index pairs (i, j) along the alignment: (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8)
Matches shown in green.
8
Observation: Pairs of indices
X: -- T G C A T -- A -- C
Y: A T -- C -- T G A T C
Index pairs (i, j) along the alignment: (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8)
Matches shown in green.
Pairs of indices correspond to a path in a 2-D grid.
9
Edit Graph for LCS Problem
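[Figure: edit graph. Rows i = 0 to 7, with row i labeled by the i-th character of X = T G C A T A C; columns j = 0 to 8. The alignment is drawn as the path (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8).]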
10
Observation: Pairs of indices
X: -- T G C A T -- A -- C
Y: A T -- C -- T G A T C
Skipping a letter in Y (increment index j):
(0,0) to (0,1)
(5,4) to (5,5)
(6,6) to (6,7)
11
Edit Graph for LCS Problem
Going right
Increment the column index, but not the row index.
I.e., skipping a column character.
12
Observation: Pairs of indices
X: -- T G C A T -- A -- C
Y: A T -- C -- T G A T C
Skipping a letter in X (increment index i):
(1,2) to (2,2)
(3,3) to (4,3)
13
Edit Graph for LCS Problem
Going down
Increment the row index, but not the column index.
I.e., skipping a row character.
14
Observation: Pairs of indices
X: -- T G C A T -- A -- C
Y: A T -- C -- T G A T C
Matching a letter in X and Y (increment indices i and j):
(0,1) to (1,2)
(2,2) to (3,3)
(4,3) to (5,4)
…
15
Edit Graph for LCS Problem
Going diagonal
Increment the row index and the column index.
I.e., matching a character.
16
Edit Graph for LCS Problem
Brown path (steps 1 to 10)
Skip A (column)
Match T
Skip G (row)
Match C
Skip A (row)
?
17
Edit Graph for LCS Problem
Brown path (steps 1 to 10)
Skip A (column)
Match T
Skip G (row)
Match C
Skip A (row)
Match T
Skip G (column)
Match A
Skip T (column)
Match C
18
Edit Graph for LCS Problem
Can always skip a row character (down edge).
Can always skip a column character (right edge).
Additional diagonal edges where the row and column characters match.
[Figure: edit graph with rows labeled T G C A T A C (i = 1 to 7) and columns j = 1 to 8.]
19
Edit Graph for LCS Problem
Every path is a common subsequence.
Every diagonal edge adds an extra element to the common subsequence.
LCS Problem: find a path with the maximum number of diagonal edges.
[Figure: edit graph with rows labeled T G C A T A C (i = 1 to 7) and columns j = 1 to 8.]
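To make the path view concrete, here is a small Python sketch that reads a common subsequence off a path in the edit graph. The strings and index pairs are taken from the alignment slides; the function name `subsequence_from_path` is an assumption for illustration.

```python
X = "TGCATAC"    # row characters, X[i-1] labels row i
Y = "ATCTGATC"   # column characters, Y[j-1] labels column j
PATH = [(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (4, 3),
        (5, 4), (5, 5), (6, 6), (6, 7), (7, 8)]


def subsequence_from_path(x, y, path):
    """Collect the characters matched by the diagonal edges of a path."""
    matched = []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        step = (i1 - i0, j1 - j0)
        if step == (1, 1):
            # Diagonal edge: only allowed where the two characters agree.
            if x[i1 - 1] != y[j1 - 1]:
                raise ValueError(f"diagonal edge into {(i1, j1)} is not a match")
            matched.append(x[i1 - 1])
        elif step not in ((0, 1), (1, 0)):
            # Right edge skips a column char, down edge skips a row char.
            raise ValueError(f"illegal step {step}")
    return "".join(matched)


if __name__ == "__main__":
    print(subsequence_from_path(X, Y, PATH))
```

The path has five diagonal edges, so it yields a common subsequence of length five.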
20
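Computing LCS
Let $L_{i,j}$ = length of a longest common subsequence of $X[1..i]$ and $Y[1..j]$.
[Figure: cell (i,j) with incoming edges from (i-1,j), (i,j-1), and (i-1,j-1); the diagonal edge has weight 1.]
$$L_{i,j} = \max \begin{cases} L_{i-1,j} & \text{skip char } X_i \\ L_{i,j-1} & \text{skip char } Y_j \\ L_{i-1,j-1} + 1 & \text{if } X_i = Y_j \text{, lengthen the match} \end{cases}$$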
21
Computing LCS
LCS length $L(X_i, Y_j)$ is computed by:
$$L_{i,j} = \max \begin{cases} L_{i-1,j} & \text{skip } X_i \\ L_{i,j-1} & \text{skip } Y_j \\ L_{i-1,j-1} + 1 & \text{if } X_i = Y_j \end{cases}$$
22
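Example for a mismatch
LCS length $L(X_i, Y_j)$ is computed by:
$L_{i,j} = \max\{L_{i-1,j} \text{ (skip } X_i),\ L_{i,j-1} \text{ (skip } Y_j),\ L_{i-1,j-1} + 1 \text{ if } X_i = Y_j\}$
[Figure: a table cell whose two characters (A and C) differ; the value to fill in is marked "?".]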
24
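Example for a match
LCS length $L(X_i, Y_j)$ is computed by:
$L_{i,j} = \max\{L_{i-1,j} \text{ (skip } X_i),\ L_{i,j-1} \text{ (skip } Y_j),\ L_{i-1,j-1} + 1 \text{ if } X_i = Y_j\}$
[Figure: a table cell whose two characters (both A) match; the value to fill in is marked "?".]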
26
Dynamic Programming Example
For simplicity:
Table index starts from zero.
String index starts from 1.
Table index 1 corresponds to string index 1, as shown.
Initialize row 0 and column 0 of the table to zeros.
[Figure: empty table for X and Y, with rows and columns indexed 0 to 7.]
27
Dynamic Programming Example
$L_{i,j} = \max\{L_{i-1,j} \text{ (skip } X_i),\ L_{i,j-1} \text{ (skip } Y_j),\ L_{i-1,j-1} + 1 \text{ if } X_i = Y_j\}$
[Figure: the table with the first cell of row 1 filled in with 1.]
28
Dynamic Programming: Backtracing
Arrows show where the score originated from:
if from the top ($L_{i-1,j}$), skip $X_i$
if from the left ($L_{i,j-1}$), skip $Y_j$
if from the diagonal ($L_{i-1,j-1}$), $X_i = Y_j$ (a match)
29
Dynamic Programming Example
$L_{i,j} = \max\{L_{i-1,j} \text{ (skip } X_i),\ L_{i,j-1} \text{ (skip } Y_j),\ L_{i-1,j-1} + 1 \text{ if } X_i = Y_j\}$
[Figure, slides 29 to 36: the table is filled in cell by cell with this recurrence, starting with row 1 and then the remaining rows.]
37
Dynamic Programming Example
Continuing with the dynamic programming algorithm gives this result (row 0 and column 0 are all zeros and are omitted):
         col 1  col 2  col 3  col 4  col 5  col 6  col 7
row 1      1      1      1      1      1      1      1
row 2      1      2      2      2      2      2      2
row 3      1      2      2      3      3      3      3
row 4      1      2      2      3      4      4      4
row 5      1      2      2      3      4      4      4
row 6      1      2      2      3      4      5      5
row 7      1      2      2      3      4      5      5
38
LCS Algorithm
LCS(X, Y) {                      // n = length of X, m = length of Y
    for i ← 0 to n               // zeros in column 0
        L[i,0] ← 0
    for j ← 0 to m               // zeros in row 0
        L[0,j] ← 0
    for i ← 1 to n               // assume string index starts from 1
        for j ← 1 to m
            L[i,j] ← max(L[i-1,j], L[i,j-1])
            if X[i] = Y[j]
                L[i,j] ← max(L[i,j], L[i-1,j-1] + 1)
    return L
}
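A minimal Python sketch of the table-filling algorithm. The name `lcs_table` is an assumption for illustration. Python strings are indexed from zero, so table index i corresponds to `x[i - 1]`, which is the "book" convention rather than the one used on the slides.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Return the (n+1) x (m+1) table L, where L[i][j] is the LCS length of x[:i] and y[:j]."""
    n, m = len(x), len(y)
    # Row 0 and column 0 stay at zero: an empty prefix has an empty LCS.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(L[i - 1][j], L[i][j - 1])       # skip x[i-1] or skip y[j-1]
            if x[i - 1] == y[j - 1]:
                best = max(best, L[i - 1][j - 1] + 1)  # lengthen the match
            L[i][j] = best
    return L


if __name__ == "__main__":
    table = lcs_table("ABCDEFG", "XZACKDFWGH")
    print(table[-1][-1])  # length of a longest common subsequence
```

The bottom-right entry holds the LCS length for the full strings.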
39
Length but not sequence yet
LCS(X,Y) creates the alignment grid.
Now we need a way to read the best alignment of X and Y.
Follow the arrows backwards from the sink.
40
LCS: Backtracing without “arrows”
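[Figure: the four neighbouring cells $L_{i-1,j-1}$, $L_{i-1,j}$, $L_{i,j-1}$, $L_{i,j}$.]
Case: the pair of characters is not the same and $L_{i-1,j} \ge L_{i,j-1}$.
From which direction did we get $L_{i,j}$?
From above: without a match, $L_{i,j} = \max(L_{i-1,j}, L_{i,j-1}) = L_{i-1,j}$.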
41
LCS: Backtracing without “arrows”
getLCS(X, Y, L)
    i ← n                                 // string index runs from 1 to n in X
    j ← m                                 // and from 1 to m in Y
    S ← empty sequence
    while (L[i,j] > 0)
        if (X[i] = Y[j])                  // from diagonal
            S.insertFront(X[i]); decrement i; decrement j
        else if (L[i-1,j] >= L[i,j-1])    // cell above is at least as large
            decrement i                   // from above
        else
            decrement j                   // from left
    return S
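A Python sketch of the backtrace without stored arrows. The names `lcs_table` and `get_lcs` are assumptions for illustration, and the table builder is repeated so that the block runs on its own. Matched characters are collected in reverse and flipped at the end, which plays the role of `insertFront`.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """L[i][j] = LCS length of x[:i] and y[:j]."""
    L = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            L[i][j] = max(L[i - 1][j], L[i][j - 1])
            if x[i - 1] == y[j - 1]:
                L[i][j] = max(L[i][j], L[i - 1][j - 1] + 1)
    return L


def get_lcs(x: str, y: str) -> str:
    """Walk back from the bottom-right cell and recover one longest common subsequence."""
    L = lcs_table(x, y)
    i, j = len(x), len(y)
    reversed_chars = []
    while L[i][j] > 0:                    # a positive entry implies i > 0 and j > 0
        if x[i - 1] == y[j - 1]:          # came from the diagonal
            reversed_chars.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:  # came from above
            i -= 1
        else:                             # came from the left
            j -= 1
    return "".join(reversed(reversed_chars))


if __name__ == "__main__":
    print(get_lcs("ABCDEFG", "XZACKDFWGH"))
```

When several longest common subsequences exist, the tie-breaking rule (prefer the cell above) decides which one is returned.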
42
Note on book’s implementation
For simplicity (slides):
Table index starts from zero.
String index starts from 1.
Table index 1 corresponds to string index 1, as shown.
Book:
String index starts from zero.
Table index 1 corresponds to string index 0.
43
Time Complexity
Length of two strings: n and m
?
44
Time Complexity
Length of two strings: n and m
O(nm) to build the table
O(m + n) to backtrace
Total: O(nm)
45
Dynamic Programming
The algorithm for LCS is an example.
Key observations:
The problem can be decomposed into smaller subproblems (“decomposition”).
The solution for a problem can be composed from solutions for subproblems (“composition”).
The smallest problems have known solutions (“base case”).
Which technique that we have already learned does this sound like?
46
Dynamic Programming
The algorithm for LCS is an example.
Key observations:
The problem can be decomposed into smaller subproblems (“decomposition”).
The solution for a problem can be composed from solutions for subproblems (“composition”).
The smallest problems have known solutions (“base case”).
Well, it sounds like recursion will do as well.
Yes, but the subproblems are repeated.
Dynamic programming remembers the solutions for subproblems, i.e., it reuses the solutions and so saves time.
47
Reusing solutions
[Figure: subproblems A B C D E F G H I.]
I needs E, F, H
F needs B, C, E
H needs D, E, G
Recursion: E is a subproblem of I, F, and H, so E would be calculated 3 times.
Dynamic programming: E would be calculated only once!
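The effect of reusing solutions can be observed by counting calls. The sketch below is an illustration under assumptions: the function names are hypothetical, and memoization via `functools.lru_cache` stands in for the table (top-down rather than the bottom-up order used on the slides).

```python
from functools import lru_cache


def lcs_length_plain(x: str, y: str) -> tuple[int, int]:
    """Plain recursion. Returns (LCS length, number of calls made)."""
    calls = 0

    def go(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:                   # base case: empty prefix
            return 0
        if x[i - 1] == y[j - 1]:
            return go(i - 1, j - 1) + 1
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(x), len(y)), calls


def lcs_length_memo(x: str, y: str) -> tuple[int, int]:
    """Same recursion, but each subproblem (i, j) is solved only once."""
    calls = 0

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        nonlocal calls
        calls += 1                             # counts only cache misses
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return go(i - 1, j - 1) + 1
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(x), len(y)), calls


if __name__ == "__main__":
    print(lcs_length_plain("ABCD", "WXYZ"))
    print(lcs_length_memo("ABCD", "WXYZ"))
```

With memoization the number of evaluated subproblems is bounded by (n+1)(m+1), whereas the plain recursion revisits the same (i, j) pairs many times.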
48
The General Dynamic Programming Technique
Simple subproblems (decomposition): can be defined in terms of a few variables.
Subproblem optimality (composition): the global optimum value can be defined in terms of optimal subproblems.
Subproblem overlap (repeated subproblems).
Construct solutions bottom-up (starting with “base cases”).
Remember the solutions for subproblems.
49
Skipping the rest
50
Another example of Dynamic Programming
51
Matrix Chain-Products
Dynamic programming can be used.
Review: Matrix Multiplication.
C = A*B
A is d × e and B is e × f
$C[i,j] = \sum_{k=0}^{e-1} A[i,k] \cdot B[k,j]$
O(def) time
[Figure: entry (i,j) of the d × f matrix C is computed from row i of A and column j of B.]
52
Matrix Chain-Products
Compute $A = A_0 * A_1 * \ldots * A_{n-1}$
$A_i$ is $d_i \times d_{i+1}$
Problem: How to parenthesize?
Example
B is 3 × 100
C is 100 × 5
D is 5 × 5
(B*C)*D takes 1500 + 75 = 1575 ops
B*(C*D) takes 1500 + 2500 = 4000 ops
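The two operation counts follow directly from the d·e·f cost of a single product. A short Python check (the helper name `mult_cost` is an assumption for illustration):

```python
def mult_cost(d: int, e: int, f: int) -> int:
    """Scalar multiplications needed to multiply a d x e matrix by an e x f matrix."""
    return d * e * f


# B is 3 x 100, C is 100 x 5, D is 5 x 5
cost_bc_first = mult_cost(3, 100, 5) + mult_cost(3, 5, 5)    # (B*C) is 3 x 5, then times D
cost_cd_first = mult_cost(100, 5, 5) + mult_cost(3, 100, 5)  # (C*D) is 100 x 5, then B times it

if __name__ == "__main__":
    print(cost_bc_first, cost_cd_first)
```

Both orders compute the same 3 × 5 result; only the amount of work differs.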
53
A Brute-force Approach
Algorithm
Try all possible ways to parenthesize $A = A_0 * A_1 * \ldots * A_{n-1}$.
Calculate the number of ops for each one.
Pick the one that is best.
Running time:
The number of parenthesizations is equal to the number of full binary trees with n leaves (one leaf per matrix).
This is exponential! It is the Catalan number $C_{n-1}$, and it grows almost as fast as $4^n$.
This is a terrible algorithm!
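To see how quickly the number of parenthesizations grows, one can count them with a short recurrence: the final multiply splits a chain of n matrices into a left part of k matrices and a right part of n - k. The function name `count_parenthesizations` is an assumption for illustration, and the memoization is there only to keep the counting itself fast.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parenthesizations(n: int) -> int:
    """Number of ways to fully parenthesize a chain of n matrices."""
    if n <= 1:
        return 1
    # Choose where the final multiply splits the chain, then parenthesize both sides.
    return sum(count_parenthesizations(k) * count_parenthesizations(n - k)
               for k in range(1, n))


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10, 20):
        print(n, count_parenthesizations(n))
```

A brute-force search would have to evaluate every one of these parenthesizations.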
54
A Greedy Approach
Idea #1: repeatedly select the product that uses the most operations.
Counter-example:
A is 10 × 5
B is 5 × 10
C is 10 × 5
D is 5 × 10
Greedy idea #1 gives (A*B)*(C*D), which takes 500 + 1000 + 500 = 2000 ops
A*((B*C)*D) takes 500 + 250 + 250 = 1000 ops
55
Another Greedy Approach
Idea #2: repeatedly select the product that uses the fewest operations.
Counter-example:
A is 101 × 11
B is 11 × 9
C is 9 × 100
D is 100 × 99
Greedy idea #2 gives A*((B*C)*D), which takes 109989 + 9900 + 108900 = 228789 ops
(A*B)*(C*D) takes 9999 + 89991 + 89100 = 189090 ops
The greedy approach is not giving us the optimal value.
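Greedy idea #2 can be simulated in a few lines of Python. This is a sketch under assumptions: the function name `greedy_fewest_ops` is hypothetical, the chain is given as its dimension list $d_0, \ldots, d_n$, and ties are broken in favour of the leftmost product.

```python
def greedy_fewest_ops(dims: list[int]) -> int:
    """Repeatedly multiply the adjacent pair that is cheapest right now; return total ops."""
    d = list(dims)
    total = 0
    while len(d) > 2:
        # Multiplying the matrices at positions p and p+1 costs d[p] * d[p+1] * d[p+2].
        p = min(range(len(d) - 2), key=lambda q: d[q] * d[q + 1] * d[q + 2])
        total += d[p] * d[p + 1] * d[p + 2]
        del d[p + 1]  # the merged matrix is d[p] x d[p+2]
    return total


if __name__ == "__main__":
    # A is 101 x 11, B is 11 x 9, C is 9 x 100, D is 100 x 99
    print(greedy_fewest_ops([101, 11, 9, 100, 99]))
    print(101 * 11 * 9 + 9 * 100 * 99 + 101 * 9 * 99)  # cost of (A*B)*(C*D)
```

On this input the greedy choice starts with B*C, the cheapest single product, and ends up with a larger total than (A*B)*(C*D).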
56
A “Recursive” Approach
Define subproblems:
Find the best parenthesization of $A_i * A_{i+1} * \ldots * A_j$.
Let $N_{i,j}$ denote the number of operations done by this subproblem.
The optimal solution for the whole problem is $N_{0,n-1}$.
57
A “Recursive” Approach
Subproblem optimality: the optimal solution can be defined in terms of optimal subproblems.
There has to be a final multiplication (root of the expression tree) for the optimal solution.
Say the final multiply is at index i: $(A_0 * \ldots * A_i) * (A_{i+1} * \ldots * A_{n-1})$.
58
A “Recursive” Approach
Subproblem optimality: the optimal solution can be defined in terms of optimal subproblems.
There has to be a final multiplication (root of the expression tree) for the optimal solution.
Say the final multiply is at index i: $(A_0 * \ldots * A_i) * (A_{i+1} * \ldots * A_{n-1})$.
Then the optimal solution $N_{0,n-1}$ is the sum of two optimal subproblems, $N_{0,i}$ and $N_{i+1,n-1}$, plus the time for the last multiply.
59
A Characterizing Equation
The global optimum has to be defined in terms of optimal subproblems.
Consider all possible places (k) for that final multiply.
Recall that $A_i$ is a $d_i \times d_{i+1}$ dimensional matrix.
A characterizing equation for $N_{i,j}$ is:
$$N_{i,j} = \min_{i \le k < j} \{ N_{i,k} + N_{k+1,j} + d_i d_{k+1} d_{j+1} \}$$
$N_{i,k}$: operations in the left product
$N_{k+1,j}$: operations in the right product
$d_i d_{k+1} d_{j+1}$: operations in multiplying the 2 products
Note that subproblems are not independent: the subproblems overlap. Dynamic Programming!
60
Example
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
Dimensions: $d_0=10$, $d_1=5$, $d_2=10$, $d_3=5$, $d_4=10$
[Table N: rows i and columns j indexed 0 to 3, still empty.]
61
AB
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
Dimensions: $d_0=10$, $d_1=5$, $d_2=10$, $d_3=5$, $d_4=10$
Table N: N[0,1] = 10*5*10 = 500
62
BC
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
Dimensions: $d_0=10$, $d_1=5$, $d_2=10$, $d_3=5$, $d_4=10$
Table N: N[0,1] = 500, N[1,2] = 5*10*5 = 250
63
CD
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
Dimensions: $d_0=10$, $d_1=5$, $d_2=10$, $d_3=5$, $d_4=10$
Table N: N[0,1] = 500, N[1,2] = 250, N[2,3] = 10*5*10 = 500
64
ABC: A(BC) vs (AB)C
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[0,2] for A(BC): N[0,0] + N[1,2] + 10*5*5 = 0 + 250 + 250 = 500
Table N so far: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500
65
ABC: A(BC) vs (AB)C
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[0,2] for A(BC): N[0,0] + N[1,2] + 10*5*5 = 0 + 250 + 250 = 500
N[0,2] for (AB)C: N[0,1] + N[2,2] + 10*10*5 = 500 + 0 + 500 = 1000
Table N so far: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500, N[0,2] = 500 (k=0)
66
BCD: B(CD) vs (BC)D
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[1,3] for B(CD): ?
N[1,3] for (BC)D: ?
Table N so far: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500, N[0,2] = 500 (k=0), N[1,3] = ? (k=?)
67
BCD: B(CD) vs (BC)D
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[1,3] for B(CD): N[1,1] + N[2,3] + 5*10*10 = 0 + 500 + 500 = 1000
N[1,3] for (BC)D: N[1,2] + N[3,3] + 5*5*10 = 250 + 0 + 250 = 500
Table N so far: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500, N[0,2] = 500 (k=0), N[1,3] = 500 (k=2)
68
ABCD: A(BCD) vs (AB)(CD) vs (ABC)D
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[0,3] for A(BCD): N[0,0] + N[1,3] + 10*5*10 = 0 + 500 + 500 = 1000
N[0,3] for (AB)(CD): N[0,1] + N[2,3] + 10*10*10 = 500 + 500 + 1000 = 2000
N[0,3] for (ABC)D: N[0,2] + N[3,3] + 10*5*10 = 500 + 0 + 500 = 1000
Table N: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500, N[0,2] = 500 (k=0), N[1,3] = 500 (k=2), N[0,3] = 1000 (k=0)
69
ABCD
Matrices: A is 10 × 5, B is 5 × 10, C is 10 × 5, D is 5 × 10
N[0,3] (ABCD): k=0 gives A(BCD)
N[1,3] (BCD): k=2 gives (BC)D
Result: A((BC)D)
Table N: N[0,1] = 500, N[1,2] = 250, N[2,3] = 500, N[0,2] = 500 (k=0), N[1,3] = 500 (k=2), N[0,3] = 1000 (k=0)
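The worked example can be reproduced with a memoized version of the characterizing equation that also records the chosen parenthesization. This is a sketch under assumptions: the function name `best_parenthesization` is hypothetical, matrices are named by single characters, and ties are resolved in favour of the smallest k, which matches the k values shown on the slides.

```python
from functools import lru_cache


def best_parenthesization(names: str, d: list[int]) -> tuple[int, str]:
    """Return (minimum ops, parenthesized expression) for the chain names[0] * ... * names[-1].

    Matrix number i is d[i] x d[i+1].
    """
    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> tuple[int, str]:
        if i == j:                       # a single matrix needs no multiplication
            return 0, names[i]
        best_cost, best_expr = None, None
        for k in range(i, j):            # k = last matrix of the left product
            left_cost, left_expr = solve(i, k)
            right_cost, right_expr = solve(k + 1, j)
            cost = left_cost + right_cost + d[i] * d[k + 1] * d[j + 1]
            if best_cost is None or cost < best_cost:
                best_cost, best_expr = cost, f"({left_expr}{right_expr})"
        return best_cost, best_expr

    return solve(0, len(names) - 1)


if __name__ == "__main__":
    print(best_parenthesization("ABCD", [10, 5, 10, 5, 10]))
```

For the example, both k=0 and k=2 give 1000 operations at the top level; the strict comparison keeps the first one found.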
70
Time Complexity
Filling in each entry in the N table takes O(n) time.
$O(n^2)$ entries
Building the table: $O(n^3)$
O(n) for finding the parenthesis locations
Total: $O(n^3)$
[Figure: table N with rows i and columns j indexed 0 to n-1; the answer is the entry $N_{0,n-1}$.]
71
Keeping track of indices
Initialize the diagonal: entries (i,j) with j = i, i.e., (i,i)
        j=0   j=1   j=2   j=3
i=0     0,0
i=1           1,1
i=2                 2,2
i=3                       3,3
72
Keeping track of indices
b = # of products in subchain = 1
2 matrices => 1 product (length of subchain minus 1)
i = 0 to 2 in this example
Generally, i = 0 to n-b-1 [4-1-1 = 2]: n-b combinations, and -1 for the zero-based index
j = 1 to 3 in this example
Generally, j = i+b
        j=0   j=1   j=2   j=3
i=0           0,1
i=1                 1,2
i=2                       2,3
i=3
ABCD: AB, BC, CD
73
Keeping track of indices
b = # of products in subchain = 2
3 matrices => 2 products (length of subchain minus 1)
i = 0 to 1 in this example: 0 to n-b-1 [4-2-1 = 1]
j = 2 to 3 in this example: j = i+b
        j=0   j=1   j=2   j=3
i=0                 0,2
i=1                       1,3
i=2
i=3
ABCD: ABC, BCD
74
Keeping track of indices
b = # of products in subchain = 3
4 matrices => 3 products (length of subchain minus 1)
i = 0 to 0 in this example: 0 to n-b-1 [4-3-1 = 0]
j = 3 to 3 in this example: j = i+b
        j=0   j=1   j=2   j=3
i=0                       0,3
i=1
i=2
i=3
ABCD: ABCD
75
A Dynamic Programming Algorithm
Algorithm matrixChain(S):
    Input: sequence S of n matrices to be multiplied
    Output: number of operations in an optimal parenthesization of S
    for i ← 0 to n-1 do
        N[i,i] ← 0                        // diagonal entries
    for b ← 1 to n-1 do                   // number of products in subchain
        for i ← 0 to n-b-1 do             // start of subchain (first matrix)
            j ← i+b                       // end of subchain (last matrix)
            N[i,j] ← +infinity            // initial value for min
            for k ← i to j-1 do
                N[i,j] ← min{N[i,j], N[i,k] + N[k+1,j] + d[i]*d[k+1]*d[j+1]}
    return N[0,n-1]
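A bottom-up Python sketch of matrixChain. The function name `matrix_chain` and the extra table `K` of split points are assumptions for illustration; the input is the dimension list $d_0, \ldots, d_n$ rather than the matrices themselves.

```python
import math


def matrix_chain(d: list[int]):
    """Fill N bottom-up; matrix i is d[i] x d[i+1]. Returns (N, K).

    N[i][j] is the minimum number of ops for the subchain i..j,
    K[i][j] is the split index k that achieves it (None on the diagonal).
    """
    n = len(d) - 1                            # number of matrices
    N = [[0] * n for _ in range(n)]           # diagonal entries are 0
    K = [[None] * n for _ in range(n)]
    for b in range(1, n):                     # number of products in the subchain
        for i in range(0, n - b):             # start of subchain (first matrix)
            j = i + b                         # end of subchain (last matrix)
            N[i][j] = math.inf                # initial value for min
            for k in range(i, j):
                cost = N[i][k] + N[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                if cost < N[i][j]:            # strict: keeps the smallest k on ties
                    N[i][j], K[i][j] = cost, k
    return N, K


if __name__ == "__main__":
    N, K = matrix_chain([10, 5, 10, 5, 10])   # A, B, C, D from the example slides
    print(N[0][3], K[0][3])
```

The three nested loops over b, i, and k give the $O(n^3)$ running time, and following `K` from `K[0][n-1]` downward recovers the parenthesis locations.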
76
The General Dynamic Programming Technique
Applies to a problem that at first seems to require a lot of time (possibly exponential), provided we have:
Simple subproblems (decomposition): the subproblems can be defined in terms of a few variables, such as j, k, l, m, and so on.
Subproblem optimality (composition): the global optimum value can be defined in terms of optimal subproblems.
Subproblem overlap: the subproblems are not independent, but instead they overlap (hence, solutions should be constructed bottom-up, starting from the base cases).