Longest Common Subsequence

Slides:



Advertisements
Similar presentations
Introduction to Algorithms 6.046J/18.401J/SMA5503
Advertisements

Dynamic Programming.
Lecture 8: Dynamic Programming Shang-Hua Teng. Longest Common Subsequence Biologists need to measure how similar strands of DNA are to determine how closely.
CPSC 335 Dynamic Programming Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Overview What is Dynamic Programming? A Sequence of 4 Steps
Algorithms Dynamic programming Longest Common Subsequence.
COMP8620 Lecture 8 Dynamic Programming.
Dynamic Programming.
Comp 122, Fall 2004 Dynamic Programming. dynprog - 2 Lin / Devi Comp 122, Spring 2004 Longest Common Subsequence  Problem: Given 2 sequences, X =  x.
1 Dynamic Programming (DP) Like divide-and-conquer, solve problem by combining the solutions to sub-problems. Differences between divide-and-conquer and.
1 Longest Common Subsequence (LCS) Problem: Given sequences x[1..m] and y[1..n], find a longest common subsequence of both. Example: x=ABCBDAB and y=BDCABA,
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2006 Lecture 1 (Part 3) Design Patterns for Optimization Problems.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2009 Design Patterns for Optimization Problems Dynamic Programming.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2002 Lecture 1 (Part 3) Tuesday, 9/3/02 Design Patterns for Optimization.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2008 Design Patterns for Optimization Problems Dynamic Programming.
November 7, 2005Copyright © by Erik D. Demaine and Charles E. Leiserson Dynamic programming Design technique, like divide-and-conquer. Example:
1 Dynamic Programming Jose Rolim University of Geneva.
Lecture 7 Topics Dynamic Programming
Longest Common Subsequence
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2005 Design Patterns for Optimization Problems Dynamic Programming.
Dynamic Programming (16.0/15) The 3-d Paradigm 1st = Divide and Conquer 2nd = Greedy Algorithm Dynamic Programming = metatechnique (not a particular algorithm)
Dynamic Programming UNC Chapel Hill Z. Guo.
October 21, Algorithms and Data Structures Lecture X Simonas Šaltenis Nykredit Center for Database Research Aalborg University
Dynamic Programming. Well known algorithm design techniques:. –Divide-and-conquer algorithms Another strategy for designing algorithms is dynamic programming.
ADA: 7. Dynamic Prog.1 Objective o introduce DP, its two hallmarks, and two major programming techniques o look at two examples: the fibonacci.
CS 8833 Algorithms Algorithms Dynamic Programming.
Greedy Methods and Backtracking Dr. Marina Gavrilova Computer Science University of Calgary Canada.
6/4/ ITCS 6114 Dynamic programming Longest Common Subsequence.
Dynamic Programming (Ch. 15) Not a specific algorithm, but a technique (like divide- and-conquer). Developed back in the day when “programming” meant “tabular.
Introduction to Algorithms Jiafen Liu Sept
Part 2 # 68 Longest Common Subsequence T.H. Cormen et al., Introduction to Algorithms, MIT press, 3/e, 2009, pp Example: X=abadcda, Y=acbacadb.
CSC 213 Lecture 19: Dynamic Programming and LCS. Subsequences (§ ) A subsequence of a string x 0 x 1 x 2 …x n-1 is a string of the form x i 1 x.
9/27/10 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis L ECTURE 16 Dynamic.
TU/e Algorithms (2IL15) – Lecture 4 1 DYNAMIC PROGRAMMING II
Dynamic Programming Csc 487/687 Computing for Bioinformatics.
Dynamic Programming Typically applied to optimization problems
Dynamic Programming (DP)
Algorithm Design and Analysis (ADA)
David Meredith Dynamic programming David Meredith
Advanced Algorithms Analysis and Design
Least common subsequence:
Dynamic Programming CISC4080, Computer Algorithms CIS, Fordham Univ.
CS200: Algorithm Analysis
Dynamic Programming Several problems Principle of dynamic programming
Chapter 8 Dynamic Programming.
Dynamic Programming Comp 122, Fall 2004.
CSCE 411 Design and Analysis of Algorithms
Dynamic Programming.
برنامه نویسی پویا (Dynamic Programming)
CS Algorithms Dynamic programming 0-1 Knapsack problem 12/5/2018.
Dynamic Programming Dr. Yingwu Zhu Chapter 15.
CS 3343: Analysis of Algorithms
Merge Sort Dynamic Programming
CS6045: Advanced Algorithms
Longest Common Subsequence
Dynamic Programming Comp 122, Fall 2004.
Lecture 8. Paradigm #6 Dynamic Programming
Trevor Brown DC 2338, Office hour M3-4pm
Dynamic Programming-- Longest Common Subsequence
Introduction to Algorithms: Dynamic Programming
Algorithm Design Techniques Greedy Approach vs Dynamic Programming
Dynamic Programming CISC4080, Computer Algorithms CIS, Fordham Univ.
Algorithms and Data Structures Lecture X
Longest common subsequence (LCS)
Lecture 5 Dynamic Programming
Analysis of Algorithms CS 477/677
Longest Common Subsequence
Dynamic Programming.
Longest Common Subsequence
Algorithm Course Dr. Aref Rashad
Presentation transcript:

Longest Common Subsequence Dynamic Programming Longest Common Subsequence

Dynamic Programming(DP) A powerful design technique for optimization problems Related to divide and conquer However, due to the nature of DP problems, standard divide-and-conquer solution are not efficient

Dynamic Programming(DP) Main question: How to set up the subproblem structure? For DP to be applicable to an optimization problem Optimal substructure: for the global problem to be solved optimally, each subproblem should be solved optimally Polynomially many subproblems Overlapping subproblems

Longest Common Subsequence (LCS) Application: searching for a substring or pattern in a large piece of text Not necessarily exact text but something similar Method for measuring degree of similarity: LCS

Longest Common Subsequence (LCS) Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A

Longest Common Subsequence (LCS) Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A LCS(x,y) = BCBA

LCS Not always unique X : A B C Y: B A C

LCS Not always unique X : A B C Y: B A C

Brute Force Solution? Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . .n]

Brute Force Solution? Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . .n] Analysis • Checking = O(n) time per subsequence. • subsequences of x(each bit-vector of length m determines a distinct subsequence of x). Thus, Exponential time.

DP Formulation for LCS Subproblems? Consider all pairs of prefixes A prefix of a sequence is just an initial string of characters Let denote the prefix of X with i characters Let denote the empty sequence

DP Formulation Compute the LCS for every possible pair of prefixes C[i,j]: length of the LCS of and Then, C[m,n]: length of LCS of X and Y

Recursive formulation C[i,0] = C[0,j] = 0 // base case How can I compute C[i,j] using solutions to subproblems?

Recursive formulation Two cases Last characters match: if X[i] = Y[j] LCS must end with this same character i X A j Y A

Recursive formulation Two cases Last characters match: if X[i] = Y[j] LCS must end with this same character C[i,j] = C[i-1, j-1] + 1 LCS ( ) = LCS( ) + X[i] i X A j Y A

Recursive formulation (2) Last characters DO NOT match: LCS( ) //skip x[i] LCS( ) //skip Y[j] C[i,j] = max (C[i, j-1], C[i-1,j]) B A i j X Y B A i j X Y B A i j X Y

A recursive algorithm LCS(x, y, i, j) { if x[i] = y[ j ] then c[i, j ] ←LCS(x, y, i–1, j–1) + 1 else c[i, j ] ←max{LCS(x, y, i–1, j), LCS(x, y, i, j–1) } Worst-case:x[i] ≠y[ j ], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree Height = O(m+n) Running time = potentially exponential! 3,4 2,4 3,3 1,4 2,3 3,2 Height = O(m+n) Running time = potentially exponential!

BUT, we keep recomputing the same subproblems 3,4 2,4 3,3 1,4 2,3 2,3 3,2

Overlapping subproblems A recursive solution contains a “small” number of distinct subproblems repeated many times The number of distinct LCS subproblems for two strings of lengths m and n is only mn

Store the solutions of subproblems After computing a solution to a subproblem, store it in a table. C[0..m,0..n] that stores lengths of LCS Keep also a helper array B[0..m,0..n] to store some pointers to extract the LCS later

O(mn) LCS(x[1..m],y[1..n]) { int C[0..m,0..n]; int B[0..m,0..n] for i=0 to m C[i,0] = 0; B[i,0] = UP for j=0 to n C[0,j] = 0; B[0,j] = LEFT for i=1 to m for j=1 to n if x[i] == y[j] C[i,j]=c[i-1, j-1] +1; B[i,j] = DIAG; else if (C[i-1,j] >= C[i, j-1]) C[i,j]=C[i-1, j]; B[i,j] = UP; else C[i,j]=C[i, j-1]; B[i,j] = LEFT; return C[m,n]; } O(mn)

Compute tables bottom-up Y Y B A C D 0 0 0 0 0 0 1 1 1 1 0 1 1 2 2 0 1 2 2 2 0 1 2 2 3 B D C B 0 0 0 0 0 0 1 1 1 1 0 1 1 2 2 0 1 2 2 2 0 1 2 2 3 B A X X C D B Length Table: C Tables C and B

Compute tables bottom-up Y Y B A C D 0 0 0 0 0 0 1 1 1 1 0 1 1 2 2 0 1 2 2 2 0 1 2 2 3 B D C B 0 0 0 0 0 0 1 1 1 1 0 1 1 2 2 0 1 2 2 2 0 1 2 2 3 B A X X C D B Length Table: C Tables C and B Start from B[m,n], follow pointers Extract entries with DIAG ( ) LCS for above example: BCB

ExtractLCS(B, X, i, j) { //initially called with (B, X, m, n) if i==0 OR j== 0 return; if B[i,j] == DIAG ExtractLCS(B, X, i-1, j-1); print X[i] else if B[i,j] == UP extractLCS(B, X, i-1, j); else // LEFT extractLCS(B, X, i, j-1) }