The Longest Common Subsequence Problem CSE 373 Data Structures
Reading: Goodrich and Tamassia, 3rd ed, Chapter 12, section 11.5, pp. 570-574. 12/31/2018 CSE 373 AU 04 -- Longest Common Subsequences
Motivation
Two Problems and Methods for String Comparison:
    The substring problem
    The longest common subsequence problem
In both cases, good algorithms do substantially better than the brute-force methods.
String Matching Problem
Given two strings TEXT and PATTERN, find the first occurrence of PATTERN in TEXT. Useful in text editing, document analysis, genome analysis, etc.
String Matching Problem: Brute-Force Algorithm
Let n be the length of TEXT and m the length of PATTERN.
For i = 0 to n - m {
    For j = 0 to m - 1 {
        If TEXT[i + j] ≠ PATTERN[j] then break
        If j = m - 1 then return i
    }
}
return -1
Suppose TEXT = 0000000000001 and PATTERN = 0000001. On inputs of this type, almost every alignment is compared up to the last character of PATTERN before it fails, so the brute-force algorithm makes about (n - m + 1) * m comparisons, which is O(n^2) when m is proportional to n. A more efficient algorithm is the Boyer-Moore algorithm. (We will not be covering it in this course.)
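The brute-force search can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name brute_force_find and the comparison counter are assumptions added here to make the cost of the search visible.

```python
def brute_force_find(text, pattern):
    """Return (index, comparisons).

    index is the position of the first occurrence of pattern in text,
    or -1 if there is none; comparisons counts character comparisons.
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):            # each candidate starting position
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break                      # mismatch: try the next alignment
            j += 1
        if j == m:                         # all m characters matched
            return i, comparisons
    return -1, comparisons


if __name__ == "__main__":
    print(brute_force_find("0000000000001", "0000001"))  # (6, 49)
```

On the example from the slide, the match is found at index 6 after 49 comparisons: seven alignments, each examining all seven characters of PATTERN.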
Longest Common Subsequence Problem
A Longest Common Subsequence (LCS) of two strings S1 and S2 is a longest string that can be obtained both from S1 and from S2 by deleting zero or more characters, keeping the remaining characters in their original order. For example, S1 = “thoughtful” and S2 = “shuffle” have an LCS: “hufl”. Useful in spelling correction, document comparison, etc.
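To make the definition concrete, here is a small Python sketch that checks whether one string is a subsequence of another. The helper name is_subsequence is an assumption introduced for illustration; it only tests the subsequence property and does not search for a longest one.

```python
def is_subsequence(candidate, s):
    """True if candidate can be obtained from s by deleting zero or more characters."""
    it = iter(s)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later characters of candidate must appear later in s.
    return all(ch in it for ch in candidate)


if __name__ == "__main__":
    print(is_subsequence("hufl", "thoughtful"))  # True
    print(is_subsequence("hufl", "shuffle"))     # True
    print(is_subsequence("fuh", "shuffle"))      # False: order matters
```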
Dynamic Programming
    Analyze the problem in terms of a number of smaller subproblems.
    Solve the subproblems and keep their answers in a table.
    Each subproblem’s answer is easily computed from the answers to its own subproblems.
Longest Common Subsequence: Algorithm using Dynamic Programming
For every prefix of S1 and every prefix of S2, we compute the length L of an LCS of those two prefixes. In the end, we get the length of an LCS for S1 and S2 themselves. The subsequence can be recovered from the matrix of L values. (see demonstration)
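One common way to fill the table of L values is sketched below in Python. It assumes the standard recurrence: L[i][j] = L[i-1][j-1] + 1 when the last characters of the two prefixes match, and max(L[i-1][j], L[i][j-1]) otherwise, with L = 0 whenever either prefix is empty. The function name lcs and the tie-breaking rule in the traceback are choices made here for illustration and may differ from the in-class demonstration.

```python
def lcs(s1, s2):
    """Return (length, subsequence) for one LCS of s1 and s2."""
    n, m = len(s1), len(s2)
    # L[i][j] = length of an LCS of the prefixes s1[:i] and s2[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Recover one LCS by walking back from L[n][m] toward L[0][0].
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])   # this character is part of the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1                    # drop the last character of the s1 prefix
        else:
            j -= 1                    # drop the last character of the s2 prefix
    return L[n][m], "".join(reversed(chars))


if __name__ == "__main__":
    print(lcs("thoughtful", "shuffle"))  # (4, 'hufl')
```

Filling the table takes time and space proportional to n * m, where n and m are the lengths of the two strings, and the traceback takes at most n + m further steps.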