CSCI-256 Data Structures & Algorithm Analysis Lecture Note: Some slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved. 20
Dynamic Programming Review Recipe –Characterize structure of problem –Recursively define value of optimal solution –Compute value of optimal solution –Construct optimal solution from computed information Dynamic programming techniques –Binary choice: weighted interval scheduling –Multi-way choice: segmented least squares –Adding a new variable: knapsack –Dynamic programming over intervals: RNA secondary structure
RNA Secondary Structure RNA: String B = b 1 b 2 b n over alphabet { A, C, G, U } Secondary structure: RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule G U C A GA A G CG A U G A U U A G A CA A C U G A G U C A U C G G G C C G Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA complementary base pairs: A-U, C-G
RNA Secondary Structure Secondary structure: A set of pairs S = { (b i, b j ) } that satisfy –[Watson-Crick] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C –[No sharp turns] The ends of each pair are separated by at least 4 intervening bases. If (b i, b j ) S, then i < j - 4 –[Non-crossing] If (b i, b j ) and (b k, b l ) are two pairs in S, then we cannot have i < k < j < l C GG C A G U U UA A UGUGGCCAU ok GG C A G U UA G A UGGGCAU sharp turn G 44 C GG C A U G U UA A GUUGGCCAU crossing
RNA Secondary Structure Out of all the secondary structures that are possible for a single RNA molecule, which are the ones that are likely to arise? –Free energy: Usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy –Goal: Given an RNA molecule B = b 1 b 2 b n, find a secondary structure S that maximizes the number of base pairs approximate by number of base pairs
RNA Secondary Structure: Subproblems First attempt: OPT(j) = maximum number of base pairs in a secondary structure of the substring b 1 b 2 b j. Either –j is not involved in a pair Finding secondary structure in: b 1 b 2 b j-1 –j pairs with t for some t < j – 4 Finding secondary structure in: b 1 b 2 b t-1 Finding secondary structure in: b t+1 b t+2 b j-1 OPT(j-1) 1 tj WHY ? Not on our list of subproblems, does not begin with b 1. Need more subproblems! Need to be able to work with subproblems that do not begin with b 1.
Dynamic Programming Over Intervals Notation: OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b i b i+1 b j –Case 1: If i j - 4 OPT(i, j) = 0 by no-sharp turns condition –Case 2: Base b j is not involved in a pair OPT(i, j) = OPT(i, j-1) –Case 3: Base b j pairs with b t for some i t < j - 4 non-crossing constraint decouples resulting sub-problems OPT(i, j) = 1 + max t { OPT(i, t-1) + OPT(t+1, j-1) } i tj take max over t such that i t < j-4 and b t and b j are Watson-Crick complements
Dynamic Programming Over Intervals Write down final recurrence (DONE IN CLASS) What order to solve the sub-problems? –Do shortest intervals first –Running time: O(n 3 ) –Example: ACCGGUAGU (DONE IN CLASS) RNA(b 1,…,b n ) { Initialize Opt[i, j] = 0 whenever i j-4 for k = 5, 6, …, n-1 for i = 1, 2, …, n-k j = i + k Compute Opt[i, j] return Opt[1, n] } using recurrence