Longest common subsequence (LCS)

The longest common subsequence (LCS) problem
A string: A = b a c a d
A subsequence of A is obtained by deleting 0 or more symbols from A (the deleted symbols need not be consecutive), e.g. ad, ac, bac, acad, bacad, bcd.
Common subsequences of A = b a c a d and B = a c c b a d c b: ad, ac, bac, acad.
The longest common subsequence (LCS) of A and B: a c a d.
The symbols of a subsequence need not be consecutive in the original string, but they must appear in the same order.
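
The definition of a subsequence can be checked mechanically. The following is a minimal sketch; the function name `is_subsequence` is an illustrative choice and does not come from the slides.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting zero or more symbols."""
    it = iter(t)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later symbols of s can only match later positions of t (order is kept).
    return all(ch in it for ch in s)


print(is_subsequence("acad", "bacad"))     # acad is a subsequence of A
print(is_subsequence("acad", "accbadcb"))  # and of B, so it is common to both
```

Because the scan never moves backwards, one check costs time linear in the length of `t`.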

Longest common subsequence
INPUT: two strings
OUTPUT: longest common subsequence
  ACTGAACTCTGTGCACT
  TGACTCAGCACAAAAAC

Naïve / Brute-Force Algorithm
For every subsequence of X, check whether it is also a subsequence of Y.
X (of length m) has 2^m subsequences.
Each subsequence takes Θ(n) time to check: scan Y for the first letter, continue from there for the second, and so on.
Time: Θ(n · 2^m).
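
A minimal sketch of the brute-force idea, for small inputs only. The names `lcs_brute_force` and `is_subsequence` are illustrative assumptions, and trying the longest candidates first is a convenience of this sketch rather than something the slides prescribe.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    # Scan t once, matching the symbols of s in order.
    it = iter(t)
    return all(ch in it for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    # Enumerate subsequences of x from longest to shortest;
    # the first one that is also a subsequence of y is an LCS.
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # not reached: the empty candidate (k = 0) always matches


print(lcs_brute_force("bacad", "accbadcb"))
```

In the worst case all 2^m index subsets of `x` are tried, each with a linear scan of `y`, which matches the exponential bound stated above.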

Optimal Substructure
Theorem. Let Z = z_1, ..., z_k be any LCS of X = x_1, ..., x_m and Y = y_1, ..., y_n. Write X_i for the prefix x_1, ..., x_i (likewise Y_j and Z_j).
1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
3. If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}.

Optimal Substructure, proof of case 1 (x_m = y_n)
Claim: if x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
Proof: Any common subsequence Z' that does not end in x_m = y_n can be made longer by appending x_m = y_n. Therefore an LCS Z must end in x_m = y_n. Z_{k-1} is then a common subsequence of X_{m-1} and Y_{n-1}, and there is no longer common subsequence of X_{m-1} and Y_{n-1}: appending x_m = y_n to one would give a common subsequence of X and Y longer than Z, so Z would not be an LCS.

Optimal Substructure, proof of case 2 (x_m ≠ y_n and z_k ≠ x_m)
Claim: if x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
Proof: Since Z does not end in x_m, Z is a common subsequence of X_{m-1} and Y. There is no longer common subsequence of X_{m-1} and Y: it would also be a common subsequence of X and Y, so Z would not be an LCS.

Optimal Substructure, proof of case 3 (x_m ≠ y_n and z_k ≠ y_n)
Claim: if x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}.
Proof: Since Z does not end in y_n, Z is a common subsequence of X and Y_{n-1}. There is no longer common subsequence of X and Y_{n-1}: it would also be a common subsequence of X and Y, so Z would not be an LCS.

Recursive Solution
Sequences x_1, ..., x_m and y_1, ..., y_n.
LCS(i, j) = length of a longest common subsequence of x_1, ..., x_i and y_1, ..., y_j.
  if i = 0 or j = 0 then LCS(i, j) = 0
  if x_i = y_j then LCS(i, j) = 1 + LCS(i-1, j-1)
  if x_i ≠ y_j then LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1)), because x_i and y_j cannot both be the last symbol of the LCS

Recursive Solution Define c[i, j] = length of LCS of Xi and Yj . We want c[m,n].
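
The recurrence can be written almost literally as a memoized recursive function. This is a sketch under stated assumptions: the name `lcs_length` and the use of `functools.lru_cache` for memoization are choices made here, not part of the slides. Strings are 0-indexed in Python, so x_i corresponds to `x[i - 1]`.

```python
from functools import lru_cache


def lcs_length(x: str, y: str) -> int:
    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        # c(i, j) = length of an LCS of the prefixes x[:i] and y[:j]
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return 1 + c(i - 1, j - 1)
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(x), len(y))  # the value the slides call c[m, n]


print(lcs_length("ABCB", "BDCAB"))
```

Without the cache the same subproblems would be recomputed many times; with it, each pair (i, j) is evaluated once. The recursion depth can reach m + n, so for long strings a bottom-up table is the safer choice in Python.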

Constructing an LCS

LCS Example
We'll see how the LCS algorithm works on the following example:
X = ABCB
Y = BDCAB
What is the longest common subsequence of X and Y?
LCS(X, Y) = BCB

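LCS Example: table setup
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5]: one row for each prefix of X (i = 0..m) and one column for each prefix of Y (j = 0..n).

       j |  0   1   2   3   4   5
  i      | Yj   B   D   C   A   B
  -------+------------------------
  0  Xi  |
  1  A   |
  2  B   |
  3  C   |
  4  B   |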

LCS Example: initialization
    for i = 1 to m:  c[i,0] = 0
    for j = 0 to n:  c[0,j] = 0
Row 0 and column 0 stand for an empty prefix, whose LCS with any string has length 0.

LCS Example: filling the table
For i = 1 to m and j = 1 to n, each entry is filled by the rule:
    if ( Xi == Yj )
        c[i,j] = c[i-1,j-1] + 1;  b[i,j] = "↖"
    else if ( c[i-1,j] >= c[i,j-1] )
        c[i,j] = c[i-1,j];  b[i,j] = "↑"
    else
        c[i,j] = c[i,j-1];  b[i,j] = "←"

Completed table (filled row by row, left to right):

       j |  0   1   2   3   4   5
  i      | Yj   B   D   C   A   B
  -------+------------------------
  0  Xi  |  0   0   0   0   0   0
  1  A   |  0   0   0   0   1   1
  2  B   |  0   1   1   1   1   2
  3  C   |  0   1   1   2   2   2
  4  B   |  0   1   1   2   2   3

The bottom-right entry c[4,5] = 3 is the length of the LCS.
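
The table fill for this example can be reproduced with a short bottom-up sketch. The function name `lcs_table` and the labels "diag", "up" and "left" stored in `b` are assumptions made here for illustration; they stand in for the arrows on the slides.

```python
def lcs_table(x: str, y: str):
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an LCS of length 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"  # match: extend the LCS of both shorter prefixes
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"    # value taken from the row above
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"  # value taken from the column to the left
    return c, b


c, b = lcs_table("ABCB", "BDCAB")
for row in c:
    print(row)
```

Each entry depends only on its upper, left and upper-left neighbours, which is why filling row by row from the top is a valid order.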

Computing the Length of an LCS

Analysis of LCS
Running time and memory: O(mn) and O(mn), since each c[i,j] is calculated in constant time and there are m*n entries to fill in the array.

How to find the actual LCS
Start at c[m,n] and follow the arrows in b back until row 0 or column 0 is reached. The initial call is PRINT-LCS(b, X, m, n). When b[i,j] = "↖", we have extended the LCS by one character, so the LCS consists of the entries with "↖" in them.
Time: O(m+n), since every step decreases i or j.

       j |  0   1   2   3   4   5
  i      | Yj   B   D   C   A   B
  -------+------------------------
  0  Xi  |  0   0   0   0   0   0
  1  A   |  0   0   0   0   1   1
  2  B   |  0   1   1   1   1   2
  3  C   |  0   1   1   2   2   2
  4  B   |  0   1   1   2   2   3

Path in the example: c[4,5] "↖" (B), c[3,4] "←", c[3,3] "↖" (C), c[2,2] "←", c[2,1] "↖" (B), then column 0 is reached.
LCS (reversed order): B C B
LCS (straight order): B C B (this string turned out to be a palindrome)
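
A minimal sketch of the traceback. Note the swapped-in technique: PRINT-LCS on the slides reads the arrows from b, whereas this sketch recomputes each arrow from c during the walk, so no b table is stored. The function name `lcs` is an illustrative choice.

```python
def lcs(x: str, y: str) -> str:
    m, n = len(x), len(y)
    # Fill the length table c bottom-up.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from c[m][n]; a diagonal step contributes one LCS character.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # same choice as the "up" arrow
        else:
            j -= 1  # same choice as the "left" arrow
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(out))


print(lcs("ABCB", "BDCAB"))
```

The walk takes at most m + n steps because each iteration decreases i, j, or both.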

LCS Example
We'll see how the LCS algorithm works on the following example:
X = PRESIDENT
Y = PROVIDENCE
What is the longest common subsequence of X and Y?

Output: PRIDEN