Part 2 # 68 Longest Common Subsequence T.H. Cormen et al., Introduction to Algorithms, MIT press, 3/e, 2009, pp. 390-396 Example: X=abadcda, Y=acbacadb.


Part 2 # 68 Longest Common Subsequence

Example: X=abadcda, Y=acbacadb
– bcd is a common subsequence of X, Y
– abacd is an LCS of X, Y (an LCS is not necessarily unique)

The length of an LCS is a good measure of the similarity of two strings:
– the Unix diff command is based on LCS
– similar to the edit distance of two strings

Problem: given two strings X and Y, of lengths m and n, find
– the length of an LCS of X and Y
– an actual LCS

Solution: use Dynamic Programming (DP); the LCS problem has optimal substructure.
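The claims above ("bcd is a common subsequence", "abacd is an LCS") are easy to check mechanically. A minimal sketch (the class and method names are ours, not from the slides): a greedy two-pointer scan decides in O(|t|) time whether s is a subsequence of t.

```java
public class SubseqCheck {
    /** Greedy two-pointer scan: true iff s is a subsequence of t. */
    public static boolean isSubsequence(String s, String t) {
        int i = 0;  // index of the next character of s still to be matched
        for (int j = 0; j < t.length() && i < s.length(); j++)
            if (s.charAt(i) == t.charAt(j))
                i++;                        // matched s(i) against t(j)
        return i == s.length();             // all of s was matched in order
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("bcd", "abadcda"));     // true
        System.out.println(isSubsequence("abacd", "acbacadb"));  // true
    }
}
```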

Part 2 # 69 Iterative DP

Let f(i,j) be the length of an LCS of the i-th prefix X_i = X(1..i) of X and the j-th prefix Y_j = Y(1..j) of Y. Then

    f(i,j) = 1 + f(i-1,j-1)            if X(i) = Y(j)
    f(i,j) = max(f(i,j-1), f(i-1,j))   otherwise

with f(i,0) = f(0,j) = 0 for all i, j.

Proof: let Z be an LCS of X_i and Y_j, with |Z| = k.
Case (i): X(i) = Y(j). Then Z(k) = X(i) = Y(j), and Z_{k-1} is an LCS of X_{i-1} and Y_{j-1}.
Case (ii): X(i) ≠ Y(j) and Z(k) ≠ X(i). Then Z is an LCS of X_{i-1} and Y_j.
Case (iii): X(i) ≠ Y(j) and Z(k) ≠ Y(j). Then Z is an LCS of X_i and Y_{j-1}.

Part 2 # 70 /** Returns the length of the LCS * of Strings x and y * Assume chars of a string of length * r are indexed from 1..r */ public int lcs(String x, String y) { int m = x.length(); int n = y.length(); int [][] l = new int[m+1][n+1]; for (int i = 0; i <= m; i++) l[i][0]=0; for (int j = 1; j <= n; j++) l[0][j]=0; for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (x.charAt(i)==y.charAt(j)) l[i][j] = l[i-1][j-1]+1; else l[i][j] = Math.max(l[i-1][j], l[i][j-1]); } } return l[m][n]; } Algorithm for Iterative DP (simple version - just to find LCS length)

Part 2 # 71 Dynamic programming table - example

          Y   a  c  b  a  c  a  d  b
      X   0   0  0  0  0  0  0  0  0
    1 a   0   1  1  1  1  1  1  1  1
    2 b   0   1  1  2  2  2  2  2  2
    3 a   0   1  1  2  3  3  3  3  3
    4 d   0   1  1  2  3  3  3  4  4
    5 c   0   1  2  2  3  4  4  4  4
    6 d   0   1  2  2  3  4  4  5  5
    7 a   0   1  2  2  3  4  5  5  5

To construct an actual LCS:
– trace a path from bottom right to top left
– draw an arrow from each entry to the entry that led to its value
– interpret diagonal (match) steps as members of an LCS
– the solution is not necessarily unique
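The traceback just described can be sketched in code (a self-contained illustration, names ours): refill the table, then walk from l[m][n] back towards l[0][0], collecting the characters at diagonal steps.

```java
public class LcsTraceback {
    /** Builds the DP table for x, y, then traces back one LCS. */
    public static String lcsString(String x, String y) {
        int m = x.length(), n = y.length();
        int[][] l = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                l[i][j] = (x.charAt(i - 1) == y.charAt(j - 1))
                        ? l[i - 1][j - 1] + 1
                        : Math.max(l[i - 1][j], l[i][j - 1]);
        // Trace back: diagonal steps contribute a character of the LCS.
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                sb.append(x.charAt(i - 1));
                i--; j--;
            } else if (l[i - 1][j] >= l[i][j - 1]) {
                i--;                 // value came from the row above
            } else {
                j--;                 // value came from the column to the left
            }
        }
        return sb.reverse().toString();  // collected right-to-left
    }

    public static void main(String[] args) {
        System.out.println(lcsString("abadcda", "acbacadb"));  // prints "abacd"
    }
}
```

With the tie-breaking rule above (prefer moving up), the running example yields the LCS abacd from slide # 68; other tie-breaking choices may yield a different, equally long LCS.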

Part 2 # 72 Complexity

– Each table entry is evaluated in O(1) time
– So overall, the algorithm uses O(mn) time and space
– The space is easily reduced to O(n) if only the LCS length is required
– There is a subtle divide-and-conquer variant (due to Hirschberg) that allows an actual LCS to be found in O(mn) time and O(n) space

Lazy LCS evaluation
– In the DP table, typically only a subset of the entries is needed (example later)
– An alternative lazy approach evaluates only those entries that are needed, and is therefore potentially more efficient
– The lazy evaluation can be facilitated by the use of recursion
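The O(n)-space reduction for the length-only case can be sketched as follows (this is the simple two-row rolling version, not Hirschberg's algorithm; names are ours): keep only rows i-1 and i of the table.

```java
public class LcsRolling {
    /** Length-only LCS in O(n) extra space: only two rows of the
     *  DP table are kept, rolled after each value of i. */
    public static int lcsLength(String x, String y) {
        int m = x.length(), n = y.length();
        int[] prev = new int[n + 1];   // row i-1 of the table
        int[] curr = new int[n + 1];   // row i, currently being filled
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1))
                    curr[j] = prev[j - 1] + 1;
                else
                    curr[j] = Math.max(prev[j], curr[j - 1]);
            }
            int[] t = prev; prev = curr; curr = t;  // roll the rows
        }
        return prev[n];  // after the final swap, prev holds row m
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("abadcda", "acbacadb"));  // prints 5
    }
}
```

Note that curr's stale contents (row i-2 after a swap) are harmless: every entry curr[1..n] is overwritten before it is read, and curr[0] is always 0.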

Part 2 # 73 /** Return LCS length of x i and y j */ private int rLCS(int i, int j) { if ( i==0 || j==0 ) return 0; else if (x.charAt(i)==y.charAt(j)) return 1 + recLCS(i-1,j-1); else return Math.max(recLCS(i-1,j), recLCS(i,j-1)) } Recursive DP - first attempt Call function externally as rLCS(m,n) Good news: for some values of i and j, rLCS(i,j) is never computed Bad news: for other values of i and j, rLCS(i,j) is computed several times! Example of both things happening in the next slide

Part 2 # 74 Example: X=caab, Y=abac

Tree of recursive calls (each node (i,j) is one call rLCS(i,j)):

(4,4)
|- (3,4)
|  |- (2,4)
|  |  |- (1,4)
|  |  |  '- (0,3)
|  |  '- (2,3)
|  |     '- (1,2)
|  |        |- (0,2)
|  |        '- (1,1)
|  |           |- (0,1)
|  |           '- (1,0)
|  '- (3,3)
|     '- (2,2)
|        |- (1,2)
|        |  |- (0,2)
|        |  '- (1,1)
|        |     |- (0,1)
|        |     '- (1,0)
|        '- (2,1)
|           '- (1,0)
'- (4,3)
   |- (3,3)
   |  '- (2,2)
   |     |- (1,2)
   |     |  |- (0,2)
   |     |  '- (1,1)
   |     |     |- (0,1)
   |     |     '- (1,0)
   |     '- (2,1)
   |        '- (1,0)
   '- (4,2)
      '- (3,1)
         '- (2,0)

– Each of rLCS(1,3), rLCS(3,2), rLCS(4,1) is never computed
– rLCS(3,3) is computed twice
– rLCS(1,2) is computed three times
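These counts can be confirmed by instrumenting rLCS to tally every call (a sketch; the class name and the counting map are ours, and charAt uses Java's 0-indexing for the slides' 1-indexed positions):

```java
import java.util.HashMap;
import java.util.Map;

public class CallCounter {
    static String x = "caab", y = "abac";        // the slide's example
    static Map<String, Integer> calls = new HashMap<>();

    /** rLCS as on the previous slide, tallying each (i,j) it is called with. */
    static int rLCS(int i, int j) {
        calls.merge(i + "," + j, 1, Integer::sum);
        if (i == 0 || j == 0) return 0;
        if (x.charAt(i - 1) == y.charAt(j - 1)) return 1 + rLCS(i - 1, j - 1);
        return Math.max(rLCS(i - 1, j), rLCS(i, j - 1));
    }

    public static void main(String[] args) {
        rLCS(4, 4);
        System.out.println(calls.get("1,2"));          // 3
        System.out.println(calls.get("3,3"));          // 2
        System.out.println(calls.containsKey("1,3"));  // false - never computed
    }
}
```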

Part 2 # 75 Avoiding repeated computation

The technique of memoisation:
– maintains the 2-dimensional array as a global variable
– looks up the array before making a recursive call
– makes the call only if the element has not previously been evaluated
– enters each value into the table when it is first determined

How can we determine whether an element has been previously evaluated?
– easy: just initialise the table, setting every entry to, say, -1
– but this defeats the purpose, since it involves visiting every cell in the table

Part 2 # 76 int [][] l = new int[x.length()+1] [y.length()+1]; /* l is a global variable; assume that l’s * elements initialised with values -1 */ /** Lazy DP for LCS with memoisation * Returns LCS length of x i and y j */ private int mLCS(int i, int j) { if (l[i][j]==-1) // required value not already computed { if (i==0 || j==0) // trivial case l[i][j]=0; else if (x.charAt(i)==y.charAt(j)) // case (i) l[i][j] = 1 + mLCS(i-1,j-1); else // case (ii) l[i][j] = Math.max(mLCS(i-1,j), mLCS(i,j-1)); } return l[i][j]; } Recursive DP with memoisation

Part 2 # 77 Example of lazy evaluation

First string: dbacacbabcda
Second string: dbcbdaadcabd
Length of LCS = 8

[The slide shows the memoisation table after the top-level call: an entry of -1 indicates non-evaluation, confirming that only part of the table is ever needed.]

But: every cell has to be initialised (and this defeats the purpose) - or does it?

Part 2 # 78 Avoiding initialisation

Suppose we want to insert values into an array at unpredictable positions:
– we want to know, in O(1) time, for a given position, whether a value has been inserted
– we want to avoid initialising the whole array

Intuition suggests it can't be done: how do we distinguish a genuine inserted value from a garbage value?

The solution (for a 1-D array; 2-D is similar):
– let the array, a, be an array of pairs (v, p)
– v holds the actual value inserted
– p is a pointer into a second array b
– b is a companion integer array
– keep a count n of the values inserted so far

On inserting the next value, say at position i:
– increment n
– set a(i).v to the required value
– set a(i).p = n
– set b(n) = i

Part 2 # 79 So how does it work? suppose we want to know, at any point, whether a[k].v is a genuine value The algorithm int j = a[k].p; if ( j n ) /* no value can have been * inserted in position k */ return false; else /* if a[k].v genuine then b[j] * must have been set to k, and * if not b[j] must have been * set to a different value */ return ( b[j] == k )