Sequence Homology
Sequence Similarity (Computation)
M.M. Dalkilic, PhD, SoI, Indiana University, Bloomington, IN
Class II, Monday, September 08, 2008
© 2008

Outline
o New Due Dates for Programs
o New Reading Posted on Website: T-Coffee
o Readings: [Mount] Chap 3, [R] Chaps 3-4
o Most Important Aspect of Bioinformatics: homology search through sequence similarity (cont’d)

Computation (review)
Algorithm: “process or rules for (esp. machine) calculations. The execution of an algorithm must not include any subjective decisions, nor must it require the use of intuition or creativity” [Brassard & Bratley]

Computation (review)
[Figure: growth-rate plot with the labels "constant", "upper bound starts", and "upper bound"]

Computation (Next Lecture)
Divide and Conquer gives rise to Dynamic Programming, the approach used in sequence comparison.

General Technique of Divide and Conquer
o General approach: work on more, smaller pieces
o Key point: data is not shared among processes
o The cost of breaking down the problem, solving the pieces, then reassembling the solution is less than the cost of working on the whole problem directly
[Figure label: constant work]

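The break-down, solve, reassemble pattern can be sketched with merge sort. Merge sort is not named on the slide; it is chosen here only as a familiar illustration, and the function name merge_sort is hypothetical. Note that the two halves are solved without sharing any data.

```python
def merge_sort(items):
    """Sort a list by divide and conquer (illustrative example, not from the slides)."""
    # Base case: a list of 0 or 1 items is already sorted.
    if len(items) <= 1:
        return list(items)

    # Break down: split into two independent halves.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Reassemble: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 4, 1, 3]))
```

Because neither half needs anything from the other, no results have to be stored and shared between the recursive calls.
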
General Technique of Dynamic Programming
o But what if data needs to be shared, or the cost of redundancy is too high?
o Rethink computation: Dynamic Programming, or Recursive Optimization
o Reduce the cost of sharing, and thereby reduce the cost of recursion

General Technique of Dynamic Programming
“Dynamic programming reduces the running time of a recursive function to be at most the time required to evaluate the function for all arguments less than or equal to the given argument, treating the cost of a recursive call as a constant” [Sedgewick]
o Top-down DP
o Bottom-up DP

General Technique of Dynamic Programming
o Top-down DP
Create a “dictionary” of new input-output values as they are encountered. Each time the recursion is called, we look up the entry: if it is blank, we compute the value and add it; otherwise, we reuse the stored value and continue.

General Technique of Dynamic Programming
o Top-down DP
[Figure: new input-output pairs encountered]

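A minimal sketch of top-down DP, assuming the Fibonacci numbers as the recursive function (the slides do not name a specific function; fib_top_down and memo are hypothetical names). The Python dict plays the role of the “dictionary” of input-output pairs.

```python
def fib_top_down(n, memo=None):
    """Fibonacci with a dictionary of already-seen input-output pairs."""
    if memo is None:
        memo = {}
    # Look up the entry: if it is present, reuse it.
    if n in memo:
        return memo[n]
    # The entry is blank: compute it recursively, then add it.
    if n < 2:
        value = n
    else:
        value = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    memo[n] = value
    return value


seen = {}
print(fib_top_down(10, seen))
print(sorted(seen))  # the inputs that were actually encountered
```

Only the inputs the recursion actually reaches end up in the dictionary, which is the sense in which top-down DP may not need every entry.
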
General Technique of Dynamic Programming
o Bottom-up DP
Simply pre-compute all input-output pairs sequentially.

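A minimal sketch of bottom-up DP, again assuming the Fibonacci numbers as the example function (an illustrative choice, not taken from the slides; fib_bottom_up is a hypothetical name). Every input-output pair from 0 up to n is computed in order, with no recursion.

```python
def fib_bottom_up(n):
    """Fibonacci by filling a table of all input-output pairs from 0 to n."""
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    # Each entry depends only on entries that were already computed.
    for k in range(2, n + 1):
        table[k] = table[k - 1] + table[k - 2]
    return table[n]


print(fib_bottom_up(10))
```

The table holds every entry up to n, whether or not a top-down recursion would have needed it.
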
General Technique of Dynamic Programming
o Top-down DP is generally easier
o Memory isn’t so much of an issue
o We might not need every entry in the “dictionary”

General Technique of Dynamic Programming
o DP has state variables that keep information about the current state
o DP has decision variables that are used for making choices
o DP has a return function that is optimized

Edit Substitution to Sequence Alignment
Given that “eine”, “one”, and “bir” all mean 1 in different languages, which two words are more related, based on edit distance (sequence similarity)?
All that remains is to prove that edit distance is essentially sequence alignment.

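The question about the three words can be checked with a short sketch. It assumes the standard unit-cost edit distance (each insertion, deletion, or substitution costs 1), which the slides do not spell out; edit_distance is a hypothetical name, and the table is filled bottom-up.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b (assumed cost model)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i symbols of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j symbols of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


for first, second in combinations(["eine", "one", "bir"], 2):
    print(first, second, edit_distance(first, second))
```

Under these assumed costs, “eine” and “one” are at distance 2, while each of them is at distance 3 from “bir”, so “eine” and “one” are the more related pair.
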
Edit Substitution to Sequence Alignment
A sequence alignment is a grid of cells, each of which contains either a single symbol, -, or a blank. A sequence alignment looks much like a spreadsheet.

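The link between the grid and edit operations can be sketched as follows, assuming unit costs and a two-row grid in which "-" marks a gap (alignment_cost and GAP are hypothetical names). Each column is read as one operation: equal symbols cost nothing, differing symbols are a substitution, and a symbol opposite a gap is an insertion or deletion.

```python
GAP = "-"


def alignment_cost(row_a, row_b):
    """Count the edit operations implied by a two-row alignment (unit costs assumed)."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            raise ValueError("a column may not contain two gaps")
        if x != y:
            # Either a substitution (two different symbols)
            # or an insertion/deletion (a symbol opposite a gap).
            cost += 1
    return cost


print(alignment_cost("eine", "-one"))
print(alignment_cost("eine", "bir-"))
```

Any single alignment gives an upper bound on the edit distance; with unit costs, the cheapest alignment over all possible grids has a cost equal to the edit distance.
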
Edit Substitution to Sequence Alignment
A scientist can then use sequence alignment and be assured that it is nothing more than window dressing on edit distance, which itself is a kind of distance between sequences.
Next class: the algorithm for sequence alignments.