Longest common subsequence (LCS)
The longest common subsequence (LCS) problem A string : A = b a c a d A subsequence of A: deleting 0 or more symbols from A (not necessarily consecutive). e.g. ad, ac, bac, acad, bacad, bcd. Common subsequences of A = b a c a d and B = a c c b a d c b : ad, ac, bac, acad. The longest common subsequence (LCS) of A and B: a c a d. Subsequence need not be consecutive, but must be in order.
Longest common subsequence INPUT: two strings OUTPUT: longest common subsequence ACTGAACTCTGTGCACT TGACTCAGCACAAAAAC
Longest common subsequence INPUT: two strings OUTPUT: longest common subsequence ACTGAACTCTGTGCACT TGACTCAGCACAAAAAC
Longest common subsequence INPUT: two strings OUTPUT: longest common subsequence ACTGAACTCTGTGCACT TGACTCAGCACAAAAAC
Longest common subsequence INPUT: two strings OUTPUT: longest common subsequence ACTGAACTCTGTGCACT TGACTCAGCACAAAAAC
Longest common subsequence INPUT: two strings OUTPUT: longest common subsequence ACTGAACTCTGTGCACT TGACTCAGCACAAAAAC
Naïve /Brute Force Algorithm For every subsequence of X, check whether it’s a subsequence of Y . X has 2m subsequences. Each subsequence takes Θ(n) time to check: scan Y for first letter, for second, and so on. Time: Θ(n2m).
Optimal Substructure Theorem Let Z = z1, . . . , zk be any LCS of X x1, . . , xk and Y y1, . . , yk . 1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1. 2. If xm yn, then either zk xm and Z is an LCS of Xm-1 and Y . 3. or zk yn and Z is an LCS of X and Yn-1.
Optimal Substructure Proof: (case 1: xm = yn) Theorem Let Z = z1, . . . , zk be any LCS of X and Y . 1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1. Proof: (case 1: xm = yn) Any sequence Z’ that does not end in xm = yn can be made longer by adding xm = yn to the end. Therefore, longest common subsequence (LCS) Z must end in xm = yn. Zk-1 is a common subsequence of Xm-1 and Yn-1, and there is no longer CS of Xm-1 and Yn-1, or Z would not be an LCS.
Optimal Substructure Proof: (case 2: xm yn, and zk xm) Theorem Let Z = z1, . . . , zk be any LCS of X and Y . 2. If xm yn, then either zk xm and Z is an LCS of Xm-1 and Y . Proof: (case 2: xm yn, and zk xm) Since Z does not end in xm, Z is a common subsequence of Xm-1 and Y, and there is no longer CS of yn-1 and x, or Z would not be an LCS.
Optimal Substructure Proof: (case 2: xm yn, and zk yn) Theorem Let Z = z1, . . . , zk be any LCS of X and Y . 2. If xm yn, then zk yn and Z is an LCS of X and Yn-1. Proof: (case 2: xm yn, and zk yn) Since Z does not end in yn, Z is a common subsequence of yn-1 and x, and there is no longer CS of Xm-1 and Y, or Z would not be an LCS.
Recursive Solution Sequences x1,…,xn, and y1,…,ym LCS(i,j) = length of a longest common subsequence of x1,…,xi and y1,…,yj if xi = yj then LCS(i,j) =
Recursive Solution Sequences x1,…,xn, and y1,…,ym LCS(i,j) = length of a longest common subsequence of x1,…,xi and y1,…,yj if xi = yj then LCS(i,j) = 1 + LCS(i-1,j-1)
Recursive Solution Sequences x1,…,xn, and y1,…,ym LCS(i,j) = length of a longest common subsequence of x1,…,xi and y1,…,yj if xi yj then LCS(i,j) = max (LCS(i-1,j),LCS(i,j-1)) xi and yj cannot both be in LCS
Recursive Solution Sequences x1,…,xn, and y1,…,ym LCS(i,j) = length of a longest common subsequence of x1,…,xi and y1,…,yj if xi = yj then LCS(i,j) = 1 + LCS(i-1,j-1) if xi yj then LCS(i,j) = max (LCS(i-1,j),LCS(i,j-1))
Recursive Solution Define c[i, j] = length of LCS of Xi and Yj . We want c[m,n].
Constructing an LCS
LCS Example What is the Longest Common Subsequence of X and Y? We’ll see how LCS algorithm works on the following example: X = ABCB Y = BDCAB What is the Longest Common Subsequence of X and Y? LCS(X, Y) = BCB X = A B C B Y = B D C A B
ABCB BDCAB X = ABCB; m = |X| = 4 Y = BDCAB; n = |Y| = 5 j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 B 2 3 C 4 B X = ABCB; m = |X| = 4 Y = BDCAB; n = |Y| = 5 Allocate array c[5,4]
ABCB BDCAB for i = 1 to m c[i,0] = 0 for j = 1 to n c[0,j] = 0 Yj B D C A B Xi A 1 B 2 3 C 4 B for i = 1 to m c[i,0] = 0 for j = 1 to n c[0,j] = 0
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 B 2 3 C if ( Xi == Yj ) A 1 B 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 B 2 3 C if ( Xi == Yj ) A 1 B 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 B 2 3 C A 1 1 B 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 3 C A 1 1 1 B 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 3 C A 1 1 1 B 2 1 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 3 C A 1 1 1 B 2 1 1 1 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 3 C A 1 1 1 B 2 1 1 1 1 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 2 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 2 2 4 B
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 2 2 4 B 1
ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 2 2 4 B 1 1 2 2
3 ABCB BDCAB j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C A 1 1 1 B 2 1 1 1 1 2 3 C if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 b[i,j] = “ ” if ( c[i-1,j] >= c[i,j] ) c[i,j] = c[i-1,j] b[i,j] = “ ” else c[i,j] = c[i,j-1] 1 1 2 2 2 3 4 B 1 1 2 2
Computing the Length of an LCS
Analysis of an LCS O(m*n) Running time and memory: O(mn) and O(mn). since each c[i,j] is calculated in constant time, and there are m*n elements in the array Running time and memory: O(mn) and O(mn).
How to find actual LCS 3 j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 2 B 1 Xi A 1 1 1 2 B 1 1 1 1 2 3 C 1 1 2 2 2 4 3 B 1 1 2 2
Finding LCS 3 j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 3 4 B 1 1 2 2
How to find actual LCS 3 j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 2 B 1 Xi A 1 1 1 2 B 1 1 1 1 2 3 C 1 1 2 2 2 4 3 B 1 1 2 2 Initial call is PRINT-LCS (b, X,m, n). When b[i, j ] = , we have extended LCS by one character. So LCS = entries with in them. Time: O(m+n) (no of characters to be checked)
3 B C B LCS (reversed order): B C B j 0 1 2 3 4 5 i Yj B D C A B Xi A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 3 4 B 1 1 2 2 B C B LCS (reversed order): B C B (this string turned out to be a palindrome) LCS (straight order):
LCS Example What is the Longest Common Subsequence of X and Y? We’ll see how LCS algorithm works on the following example: X = PRESIDENT Y = PROVIDENCE What is the Longest Common Subsequence of X and Y?
Output: PRIDEN