Longest Common Subsequence

Dynamic Programming Longest Common Subsequence

Dynamic Programming(DP)
• A powerful design technique for optimization problems
• Related to divide and conquer
• However, because DP subproblems overlap, a standard divide-and-conquer solution is not efficient

Dynamic Programming(DP)
Main question: How to set up the subproblem structure?
For DP to be applicable to an optimization problem, it needs:
• Optimal substructure: for the global problem to be solved optimally, each subproblem should be solved optimally
• Polynomially many subproblems
• Overlapping subproblems

Longest Common Subsequence (LCS)
• Application: searching for a substring or pattern in a large piece of text
• The match is not necessarily exact text, but something similar
• Method for measuring the degree of similarity: LCS

Given two sequences x[1..m] and y[1..n], find a longest subsequence common to them both.
x: A B C B D A B
y: B D C A B A
LCS(x, y) = BCBA

LCS is not always unique
X: A B C
Y: B A C
Both AC and BC are longest common subsequences here.

Brute Force Solution?
Check every subsequence of x[1..m] to see if it is also a subsequence of y[1..n].
Analysis
• Checking = O(n) time per subsequence.
• 2^m subsequences of x (each bit-vector of length m determines a distinct subsequence of x).
Thus, exponential time: O(n 2^m) in the worst case.
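The brute-force idea can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `is_subsequence` and `lcs_brute_force` are hypothetical, and the candidates are tried from longest to shortest so that the first hit is a longest common subsequence.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s is a subsequence of t, using one left-to-right scan of t."""
    it = iter(t)
    # 'ch in it' advances the iterator, so matches must appear in order.
    return all(ch in it for ch in s)


def lcs_brute_force(x, y):
    """Try subsequences of x from longest to shortest (exponential time)."""
    for k in range(len(x), -1, -1):
        # Each choice of k positions of x determines one subsequence.
        for positions in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in positions)
            if is_subsequence(candidate, y):
                return candidate
    return ""


if __name__ == "__main__":
    print(lcs_brute_force("ABCBDAB", "BDCABA"))
```

For the example above this returns one common subsequence of length 4; which of the several longest ones comes back depends on the enumeration order.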

DP Formulation for LCS
Subproblems? Consider all pairs of prefixes.
• A prefix of a sequence is just an initial string of characters
• Let X_i denote the prefix of X with i characters
• Let X_0 denote the empty sequence

DP Formulation
Compute the LCS for every possible pair of prefixes.
• C[i,j]: length of the LCS of the prefixes X_i = X[1..i] and Y_j = Y[1..j]
• Then, C[m,n]: length of the LCS of X and Y

Recursive formulation
C[i,0] = C[0,j] = 0 // base case
How can I compute C[i,j] using solutions to subproblems?

Two cases
(1) Last characters match: if X[i] = Y[j], the LCS must end with this same character
LCS(X[1..i], Y[1..j]) = LCS(X[1..i-1], Y[1..j-1]) + X[i]
C[i,j] = C[i-1, j-1] + 1

(2) Last characters DO NOT match: if X[i] ≠ Y[j], the LCS is the longer of
LCS(X[1..i-1], Y[1..j]) // skip X[i]
LCS(X[1..i], Y[1..j-1]) // skip Y[j]
C[i,j] = max(C[i, j-1], C[i-1, j])
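Putting the base case and the two cases together, the recurrence for the length table can be written in one place. The LaTeX below only restates the formulas already given, using the same C[i,j] indexing.

```latex
C[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
C[i-1,\,j-1] + 1 & \text{if } i, j > 0 \text{ and } X[i] = Y[j],\\
\max\bigl(C[i,\,j-1],\; C[i-1,\,j]\bigr) & \text{if } i, j > 0 \text{ and } X[i] \neq Y[j].
\end{cases}
```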

A recursive algorithm
LCS(x, y, i, j) {
  if i = 0 or j = 0
    then return 0   // base case: one prefix is empty
  if x[i] = y[j]
    then c[i, j] ← LCS(x, y, i–1, j–1) + 1
    else c[i, j] ← max{LCS(x, y, i–1, j), LCS(x, y, i, j–1)}
  return c[i, j]
}
Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree
  (3,4)
  (2,4)  (3,3)
  (1,4)  (2,3)  (3,2)
Height = O(m+n)
Running time = potentially exponential!

BUT, we keep recomputing the same subproblems
  (3,4)
  (2,4)  (3,3)
  (1,4)  (2,3)  (2,3)  (3,2)
Subproblem (2,3) is reached from both (2,4) and (3,3).

Overlapping subproblems
• A recursive solution contains a “small” number of distinct subproblems repeated many times
• The number of distinct LCS subproblems for two strings of lengths m and n is only mn
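The gap between the number of recursive calls and the number of distinct subproblems can be measured directly. The sketch below is an illustration only: `count_calls` is a hypothetical helper that instruments the naive recursion, and the count of distinct (i, j) pairs it reports also includes the base cases with an empty prefix, so it is bounded by (m+1)(n+1) rather than mn.

```python
def count_calls(x, y):
    """Run the naive recursion; return (lcs_length, total_calls, distinct_subproblems)."""
    calls = 0
    distinct = set()

    def lcs(i, j):
        nonlocal calls
        calls += 1
        distinct.add((i, j))  # record every (i, j) pair that is ever visited
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return lcs(i - 1, j - 1) + 1
        return max(lcs(i - 1, j), lcs(i, j - 1))

    length = lcs(len(x), len(y))
    return length, calls, len(distinct)


if __name__ == "__main__":
    # Worst case: no characters match, so every call branches twice.
    length, calls, distinct = count_calls("AAAAAA", "BBBBBB")
    print(length, calls, distinct)
```

On inputs with no matching characters, the total call count grows very quickly with the input length while the number of distinct pairs stays small.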

Store the solutions of subproblems
• After computing a solution to a subproblem, store it in a table
• C[0..m, 0..n] stores the lengths of the LCSs
• Also keep a helper array B[0..m, 0..n] that stores pointers used to extract the LCS later
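One way to see the effect of storing solutions is to cache the recursive function. Note that this is a top-down (memoized) variant, not the table-filling procedure the slides go on to present; the name `lcs_length_memo` is hypothetical, and `functools.lru_cache` plays the role of the table C.

```python
from functools import lru_cache


def lcs_length_memo(x, y):
    """LCS length via the same recursion, with each (i, j) computed only once."""

    @lru_cache(maxsize=None)  # the cache stands in for the table C[0..m, 0..n]
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(x), len(y))


if __name__ == "__main__":
    print(lcs_length_memo("ABCBDAB", "BDCABA"))
    # Two 60-character strings with no match: hopeless for the naive recursion,
    # but only about 61 * 61 subproblems here.
    print(lcs_length_memo("A" * 60, "B" * 60))
```

The recursion depth is still about m + n, so very long inputs would run into Python's recursion limit; the bottom-up table avoids that.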

LCS(x[1..m], y[1..n]) {   // running time: O(mn)
  int C[0..m, 0..n];
  int B[0..m, 0..n];
  for i = 0 to m
    C[i,0] = 0; B[i,0] = UP;
  for j = 0 to n
    C[0,j] = 0; B[0,j] = LEFT;
  for i = 1 to m
    for j = 1 to n
      if x[i] == y[j]
        C[i,j] = C[i-1, j-1] + 1; B[i,j] = DIAG;
      else if (C[i-1,j] >= C[i, j-1])
        C[i,j] = C[i-1, j]; B[i,j] = UP;
      else
        C[i,j] = C[i, j-1]; B[i,j] = LEFT;
  return C[m,n];
}
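The table-filling procedure translates to Python roughly as follows. This is a sketch under a few assumptions: the function name `lcs_tables` is hypothetical, the pointer values UP, LEFT and DIAG are represented as plain strings, and both tables are returned (rather than only C[m,n]) so that they can be inspected.

```python
UP, LEFT, DIAG = "UP", "LEFT", "DIAG"  # assumed encoding of the pointers


def lcs_tables(x, y):
    """Fill the length table C and the pointer table B bottom-up in O(mn) time."""
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    B = [[UP] * (n + 1) for _ in range(m + 1)]  # B[i][0] = UP
    for j in range(n + 1):
        B[0][j] = LEFT                          # B[0][j] = LEFT
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:            # slide's x[i] == y[j]
                C[i][j] = C[i - 1][j - 1] + 1
                B[i][j] = DIAG
            elif C[i - 1][j] >= C[i][j - 1]:
                C[i][j] = C[i - 1][j]
                B[i][j] = UP
            else:
                C[i][j] = C[i][j - 1]
                B[i][j] = LEFT
    return C, B


if __name__ == "__main__":
    C, B = lcs_tables("ABCBDAB", "BDCABA")
    print(C[-1][-1])  # length of the LCS of the full sequences
```

Each cell is computed once from its three neighbours, which is where the O(mn) bound comes from.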

Compute tables bottom-up
[Figure: length table C, and tables C and B with pointers, for an example pair of sequences X and Y]
• Start from B[m,n] and follow the pointers
• Extract the entries marked DIAG
• LCS for the above example: BCB

ExtractLCS(B, X, i, j) {   // initially called with (B, X, m, n)
  if i == 0 OR j == 0
    return;
  if B[i,j] == DIAG
    ExtractLCS(B, X, i-1, j-1);
    print X[i];
  else if B[i,j] == UP
    ExtractLCS(B, X, i-1, j);
  else   // LEFT
    ExtractLCS(B, X, i, j-1);
}
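The extraction step can be sketched end to end in Python. Two things here are assumptions rather than part of the slides: the function name `lcs_string` is hypothetical, and the traceback is written as a loop that collects characters in reverse instead of the recursive print shown above. The input pair "BACDB" / "BDCB" in the usage lines is also an assumed example, chosen because it yields BCB under this tie-breaking rule.

```python
UP, LEFT, DIAG = "UP", "LEFT", "DIAG"  # assumed encoding of the pointers


def lcs_string(x, y):
    """Build tables C and B bottom-up, then follow B back from (m, n)."""
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    B = [[None] * (n + 1) for _ in range(m + 1)]  # border cells are never read
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j], B[i][j] = C[i - 1][j - 1] + 1, DIAG
            elif C[i - 1][j] >= C[i][j - 1]:
                C[i][j], B[i][j] = C[i - 1][j], UP
            else:
                C[i][j], B[i][j] = C[i][j - 1], LEFT

    # Traceback: every DIAG entry contributes one character of the LCS.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if B[i][j] == DIAG:
            chars.append(x[i - 1])
            i, j = i - 1, j - 1
        elif B[i][j] == UP:
            i -= 1
        else:  # LEFT
            j -= 1
    return "".join(reversed(chars))  # characters were collected back to front


if __name__ == "__main__":
    print(lcs_string("BACDB", "BDCB"))  # prints BCB
```

The traceback visits at most m + n cells, so extraction adds only O(m+n) time on top of the O(mn) table construction.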
