1 Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
• GCGCATGGATTGAGCGA
• TGCGCCATTGATGACCA
A possible alignment:
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
cc: Shlomo Moran
2 Alignments
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
Three elements:
• Perfect matches
• Mismatches
• Insertions & deletions (indels)
3 Choosing Alignments
There are many possible alignments. For example, compare:
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
to
    ------GCGCATGGATTGAGCGA
    TGCGCC----ATTGATGACCA--
Which one is better?
4 Alignment Costs
Replacement: one letter replaced by another
Deletion: deletion of a letter
Insertion: insertion of a letter
• A measure of sequence similarity should account for how many operations took place, and which ones
5 Cost Function
• We define a cost function by specifying a function $\sigma$:
    $\sigma(x, y)$ is the cost of replacing x by y
    $\sigma(x, \text{-})$ is the cost of deleting x
    $\sigma(\text{-}, x)$ is the cost of inserting x
• The cost of an alignment is the sum of the costs of its positions (columns)
6 Simple Cost Function
Cost of each position:
• Match: 0
• Mismatch: 1
• Indel: 2
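The simple cost function can be sketched in a few lines of Python. This is only an illustration: it assumes an alignment is given as two equal-length strings with "-" marking a gap, and the names `sigma` and `alignment_cost` are chosen here, not taken from the slides.

```python
GAP = "-"


def sigma(x: str, y: str) -> int:
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP and y == GAP:
        raise ValueError("a column cannot pair a gap with a gap")
    if x == GAP or y == GAP:
        return 2  # insertion or deletion
    return 0 if x == y else 1  # match or mismatch


def alignment_cost(row_s: str, row_t: str) -> int:
    """Cost of an alignment = sum of its position costs."""
    if len(row_s) != len(row_t):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(sigma(x, y) for x, y in zip(row_s, row_t))


# One mismatch (A/G) and one indel (A/-): 1 + 2 = 3
print(alignment_cost("AAAC", "AG-C"))
```

Applied to the first example alignment (-GCGC-ATGGATTGAGCGA over TGCGCCATTGAT-GACC-A), the same function counts four indels and two mismatches.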
7 The Optimal Cost
The distance between two sequences is the minimal cost over all alignments of these sequences, namely,
$$d(s, t) = \min_{A \text{ an alignment of } s \text{ and } t} \mathrm{cost}(A)$$
8 Recursive Formula for Optimal Cost
Consider any optimal alignment of two sequences s[1..m+1] and t[1..n+1].
The last column in that alignment must be one of:
1. (s[m+1], t[n+1])
2. (s[m+1], -)
3. (-, t[n+1])
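9 Recursive Formula
Consider any optimal alignment of two sequences s[1..m+1] and t[1..n+1]. Removing its last column leaves an optimal alignment of the remaining prefixes, so, writing $\sigma$ for the cost function:
1. If the last column is (s[m+1], t[n+1]): $d(s[1..m+1], t[1..n+1]) = d(s[1..m], t[1..n]) + \sigma(s[m+1], t[n+1])$
2. If the last column is (s[m+1], -): $d(s[1..m+1], t[1..n+1]) = d(s[1..m], t[1..n+1]) + \sigma(s[m+1], \text{-})$
3. If the last column is (-, t[n+1]): $d(s[1..m+1], t[1..n+1]) = d(s[1..m+1], t[1..n]) + \sigma(\text{-}, t[n+1])$
The optimal cost is the minimum of these three values.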
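The three cases for the last column can be written as a recursive function. The sketch below is one possible reading, not the slides' own code: it assumes the simple costs (match 0, mismatch 1, indel 2), uses 0-based Python strings, and memoizes with `functools.lru_cache` so each pair of prefixes is solved once. The names `distance` and `d` are illustrative.

```python
from functools import lru_cache


def distance(s: str, t: str) -> int:
    """Minimal alignment cost of s and t under the simple cost function."""

    def sigma(x: str, y: str) -> int:
        if x == "-" or y == "-":
            return 2  # indel
        return 0 if x == y else 1  # match / mismatch

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the distance between the prefixes s[:i] and t[:j].
        if i == 0 and j == 0:
            return 0  # two empty prefixes: the empty alignment
        options = []
        if i > 0 and j > 0:  # last column is (s[i], t[j])
            options.append(d(i - 1, j - 1) + sigma(s[i - 1], t[j - 1]))
        if i > 0:  # last column is (s[i], -)
            options.append(d(i - 1, j) + sigma(s[i - 1], "-"))
        if j > 0:  # last column is (-, t[j])
            options.append(d(i, j - 1) + sigma("-", t[j - 1]))
        return min(options)

    return d(len(s), len(t))


print(distance("AAAC", "AGC"))
```

Because it recurses, this version is only suitable for short sequences; Python's recursion limit would be reached on long inputs.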
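11 Recursive Formula
Define a matrix V by $V[i,j] = d(s[1..i], t[1..j])$. Using our recursive formula, with $\sigma$ the cost function, we get the following recurrence for V:
$$V[i+1,j+1] = \min \begin{cases} V[i,j] + \sigma(s[i+1], t[j+1]) \\ V[i,j+1] + \sigma(s[i+1], \text{-}) \\ V[i+1,j] + \sigma(\text{-}, t[j+1]) \end{cases}$$
[Figure: the cell V[i+1,j+1] and the three neighbouring cells V[i,j], V[i,j+1], V[i+1,j] it depends on]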
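12 Recursive Formula
• Of course, we also need to handle the base cases in the recursion, where one prefix is empty and every letter of the other is aligned against a gap (for example, AA versus --), with $\sigma$ the cost function:
$$V[0,0] = 0, \qquad V[i+1,0] = V[i,0] + \sigma(s[i+1], \text{-}), \qquad V[0,j+1] = V[0,j] + \sigma(\text{-}, t[j+1])$$
We fill the matrix using the recurrence rule.
[Figure: matrix V with axes labelled S and T]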
13 Dynamic Programming Algorithm
We continue to fill the matrix using the recurrence rule.
[Figure: matrix V with axes labelled S and T]
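14 Dynamic Programming Algorithm
[Figure: the top-left cells V[0,0], V[0,1], V[1,0], V[1,1] of the matrix, with axes labelled S and T. V[0,0] = 0; V[0,1] = 2 and V[1,0] = 2, each a single indel. The figure also shows the candidate alignment A- versus -A.]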
15 Dynamic Programming Algorithm
[Figure: matrix V with axes labelled S and T]
16 Dynamic Programming Algorithm
Conclusion: d(AAAC, AGC) = 3
[Figure: matrix V with axes labelled S and T]
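The matrix-filling procedure can be sketched as follows. This is an illustration under stated assumptions rather than the slides' own code: the costs default to the simple cost function (match 0, mismatch 1, indel 2), rows are indexed by the first sequence and columns by the second, and the name `fill_matrix` is chosen here.

```python
def fill_matrix(s: str, t: str, match: int = 0, mismatch: int = 1, indel: int = 2):
    """Return V, where V[i][j] is the distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: a prefix aligned against the empty sequence is all indels.
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + indel

    # Recurrence: each cell is the cheapest of its three possible last columns.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = min(
                V[i - 1][j - 1] + replace,  # (s[i], t[j])
                V[i - 1][j] + indel,        # (s[i], -)
                V[i][j - 1] + indel,        # (-, t[j])
            )
    return V


V = fill_matrix("AAAC", "AGC")
for row in V:
    print(row)
print("distance:", V[-1][-1])  # bottom-right cell holds d(AAAC, AGC)
```

Filling row by row works because every cell only needs the row above it and the cell to its left, all of which are already computed.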
17 Reconstructing the Best Alignment
• To reconstruct the best alignment, we record which case(s) in the recursive rule minimized the cost
[Figure: matrix V with axes labelled S and T]
18 Reconstructing the Best Alignment
• We now trace back a path that corresponds to the best alignment
    AAAC
    AG-C
[Figure: matrix V with axes labelled S and T]
19 Reconstructing the Best Alignment
• Sometimes, more than one alignment has minimal cost
    AAAC    AAAC    AAAC
    A-GC    -AGC    AG-C
[Figure: matrix V with axes labelled S and T]
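The traceback can be sketched by walking back from the bottom-right cell and following every case that attains the minimum, which yields all minimal-cost alignments. This is a hedged illustration: it assumes the simple costs (match 0, mismatch 1, indel 2), rechecks the minimizing cases during the walk instead of storing pointers, and the name `optimal_alignments` is chosen here.

```python
def optimal_alignments(s: str, t: str, match: int = 0, mismatch: int = 1, indel: int = 2):
    """Return (minimal cost, list of all alignments that attain it)."""
    m, n = len(s), len(t)

    # Fill the matrix: V[i][j] is the distance between s[:i] and t[:j].
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = min(V[i - 1][j - 1] + replace,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)

    results = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        # Trace back from cell (i, j), prepending one column per step.
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0:
            replace = match if s[i - 1] == t[j - 1] else mismatch
            if V[i][j] == V[i - 1][j - 1] + replace:
                walk(i - 1, j - 1, s[i - 1] + top, t[j - 1] + bottom)
        if i > 0 and V[i][j] == V[i - 1][j] + indel:
            walk(i - 1, j, s[i - 1] + top, "-" + bottom)
        if j > 0 and V[i][j] == V[i][j - 1] + indel:
            walk(i, j - 1, "-" + top, t[j - 1] + bottom)

    walk(m, n, "", "")
    return V[m][n], results


cost, alignments = optimal_alignments("AAAC", "AGC")
print("cost:", cost)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

Following a single minimizing case at each step gives one best alignment in at most m+n steps. Enumerating all of them, as done here, can take much longer, since the number of optimal alignments may grow exponentially with the sequence lengths.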
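20 Time Complexity
For sequences of lengths m and n:
Space: O(mn), for the (m+1) × (n+1) matrix
Time: O(mn)
• Filling the matrix: O(mn), constant work per cell
• Backtrack: O(m+n), since each step moves at least one row up or one column left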