Published by Jeffery Clark. Modified over 9 years ago.
1 Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
• GCGCATGGATTGAGCGA
• TGCGCCATTGATGACCA
A possible alignment:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
cc: Shlomo Moran
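As a small illustration of what "an alignment of the two sequences" means, the sketch below checks that two gapped rows really align the two input sequences. The function name `is_alignment_of` is hypothetical, and the rule that no column may hold two gaps is a common convention assumed here, not something the slide states.

```python
def is_alignment_of(row_s, row_t, s, t, gap="-"):
    """Return True if the gapped rows (row_s, row_t) form an alignment of s and t."""
    # Both rows must have the same number of columns.
    if len(row_s) != len(row_t):
        return False
    # Assumed convention: a column never pairs a gap with a gap.
    if any(a == gap and b == gap for a, b in zip(row_s, row_t)):
        return False
    # Removing the gaps must give back the original sequences.
    return row_s.replace(gap, "") == s and row_t.replace(gap, "") == t


s = "GCGCATGGATTGAGCGA"
t = "TGCGCCATTGATGACCA"
print(is_alignment_of("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A", s, t))
```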
2 Alignments
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Three elements:
• Perfect matches
• Mismatches
• Insertions & deletions (indels)
3 Choosing Alignments
There are many possible alignments. For example, compare:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
to
------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Which one is better?
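To get a feel for how many alignments there are, the sketch below counts them for sequences of lengths m and n. It assumes that an alignment is a sequence of columns, that no column holds two gaps, and that two alignments differing only in the order of adjacent indels count as different. The name `count_alignments` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Count alignments of a length-m sequence with a length-n sequence."""
    # If one sequence is empty, the only alignment pairs every
    # remaining letter with a gap.
    if m == 0 or n == 0:
        return 1
    # The last column is (letter, letter), (letter, gap) or (gap, letter).
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))


for size in (1, 2, 3):
    print(size, count_alignments(size, size))  # 3, 13, 63

# The two example sequences both have 17 letters.
print(count_alignments(17, 17))
```

The count grows quickly with the sequence lengths, so comparing alignments one by one is not practical.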
4 Alignment Costs
Replacement: one letter replaced by another
Deletion: deletion of a letter
Insertion: insertion of a letter
• A measure of sequence similarity should account for how many operations took place, and which ones
5 Cost Function
• We define a cost function by specifying a function σ, where:
σ(x, y) is the cost of replacing x by y
σ(x, -) is the cost of deleting x
σ(-, x) is the cost of inserting x
• The cost of an alignment is the sum of the costs of its positions (columns)
6 Simple Cost Function
Cost of each position:
• Match: 0
• Mismatch: 1
• Indel: 2
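The simple cost function can be applied directly to the two alignments compared on slide 3. The sketch below is one way to write it; the names `sigma`, `GAP` and `alignment_cost` are hypothetical.

```python
GAP = "-"


def sigma(x, y):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP or y == GAP:
        return 2
    return 0 if x == y else 1


def alignment_cost(row_s, row_t):
    """Sum the position costs over all columns of an alignment."""
    if len(row_s) != len(row_t):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(sigma(a, b) for a, b in zip(row_s, row_t))


first = alignment_cost("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
second = alignment_cost("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--")
print(first, second)  # 10 30
```

Under this cost function the first alignment (4 indels, 2 mismatches) is much cheaper than the second (12 indels, 6 mismatches).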
7 The Optimal Cost
The distance between two sequences is the minimal cost over all alignments of these sequences, namely,
$$d(s,t) = \min \{\, \mathrm{cost}(A) : A \text{ is an alignment of } s \text{ and } t \,\}$$
8 Recursive Formula for Optimal Cost
Consider any optimal alignment of two sequences: s[1..m+1] and t[1..n+1]
The last column in that alignment must be one of:
1. ( s[m+1], t[n+1] )
2. ( s[m+1], - )
3. ( -, t[n+1] )
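9 Recursive Formula
Consider any optimal alignment of two sequences: s[1..m+1] and t[1..n+1]
The last column in that alignment must be one of:
1. Last column is ( s[m+1], t[n+1] ): the rest is an optimal alignment of s[1..m] and t[1..n]
2. Last column is ( s[m+1], - ): the rest is an optimal alignment of s[1..m] and t[1..n+1]
3. Last column is ( -, t[n+1] ): the rest is an optimal alignment of s[1..m+1] and t[1..n]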
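11 Recursive Formula
Define a matrix V:
$$V[i,j] = d(s[1..i],\, t[1..j])$$
Using our recursive formula, we get the following recurrence for V, where $\sigma$ denotes the position cost function:
$$V[i+1,j+1] = \min\bigl\{\, V[i,j] + \sigma(s[i+1], t[j+1]),\; V[i,j+1] + \sigma(s[i+1], -),\; V[i+1,j] + \sigma(-, t[j+1]) \,\bigr\}$$
[Figure: the cell V[i+1,j+1] and the three neighboring cells it depends on: V[i,j], V[i,j+1] and V[i+1,j]]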
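12 Recursive Formula
• Of course, we also need to handle the base cases in the recursion, where $\sigma$ denotes the position cost function:
$$V[0,0] = 0$$
$$V[i+1,0] = V[i,0] + \sigma(s[i+1], -)$$
$$V[0,j+1] = V[0,j] + \sigma(-, t[j+1])$$
We fill the matrix using the recurrence rule.
[Figure: the matrix V, indexed by S and T]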
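The three last-column cases and the base cases can be transcribed almost literally into a memoized recursion. This is a minimal sketch, assuming the simple cost function of slide 6; the names `sigma`, `GAP` and `distance` are hypothetical, and Python strings are 0-based, so s[i-1] plays the role of s[i] on the slides.

```python
from functools import lru_cache

GAP = "-"


def sigma(x, y):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP or y == GAP:
        return 2
    return 0 if x == y else 1


def distance(s, t):
    """Minimal alignment cost of s and t, computed top-down."""

    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) is the distance between the prefixes s[:i] and t[:j].
        if i == 0 and j == 0:
            return 0
        candidates = []
        if i > 0 and j > 0:   # last column is (s[i], t[j])
            candidates.append(d(i - 1, j - 1) + sigma(s[i - 1], t[j - 1]))
        if i > 0:             # last column is (s[i], -)
            candidates.append(d(i - 1, j) + sigma(s[i - 1], GAP))
        if j > 0:             # last column is (-, t[j])
            candidates.append(d(i, j - 1) + sigma(GAP, t[j - 1]))
        return min(candidates)

    return d(len(s), len(t))


print(distance("AAAC", "AGC"))  # 3
```

The cache is what keeps this polynomial: without it the same prefix pairs would be solved again and again.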
13 Dynamic Programming Algorithm
We continue to fill the matrix using the recurrence rule.
[Figure: the matrix V, indexed by S and T, partially filled]
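14 Dynamic Programming Algorithm
[Figure: the top-left cells V[0,0], V[0,1], V[1,0] and V[1,1] of the matrix, with V[0,0] = 0 and V[0,1] = V[1,0] = 2; the entry V[1,1] compares the candidate alignments of the first letters, including A- versus -A]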
15 Dynamic Programming Algorithm
[Figure: the matrix V, indexed by S and T, filled further]
16 Dynamic Programming Algorithm
Conclusion: d(AAAC, AGC) = 3
[Figure: the completed matrix V, indexed by S and T]
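The matrix for this example can be reproduced with a short bottom-up fill. This is a sketch under the simple cost function of slide 6, taking S = AAAC as the row sequence and T = AGC as the column sequence (an assumption about the figure's layout); `fill_matrix`, `sigma` and `GAP` are hypothetical names.

```python
GAP = "-"


def sigma(x, y):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP or y == GAP:
        return 2
    return 0 if x == y else 1


def fill_matrix(s, t):
    """Return V with V[i][j] = distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: a prefix aligned against the empty sequence.
    for i in range(m):
        V[i + 1][0] = V[i][0] + sigma(s[i], GAP)
    for j in range(n):
        V[0][j + 1] = V[0][j] + sigma(GAP, t[j])
    # Recurrence: each cell depends on its three upper-left neighbors.
    for i in range(m):
        for j in range(n):
            V[i + 1][j + 1] = min(
                V[i][j] + sigma(s[i], t[j]),      # (s[i+1], t[j+1])
                V[i][j + 1] + sigma(s[i], GAP),   # (s[i+1], -)
                V[i + 1][j] + sigma(GAP, t[j]),   # (-, t[j+1])
            )
    return V


V = fill_matrix("AAAC", "AGC")
for row in V:
    print(row)
print("distance:", V[-1][-1])  # distance: 3
```

The bottom-right cell holds d(AAAC, AGC), and the top-left 2x2 block is 0, 2 / 2, 0.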
17 Reconstructing the Best Alignment
• To reconstruct the best alignment, we record which case(s) in the recursive rule minimized the cost
[Figure: the matrix V, indexed by S and T, with the minimizing cases marked]
18 Reconstructing the Best Alignment
• We now trace back a path that corresponds to the best alignment
AAAC
AG-C
[Figure: the traceback path through the matrix V, indexed by S and T]
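A traceback can be sketched without storing explicit pointers: at each cell, re-check which case of the recurrence produced its value and step to that neighbor. The code below assumes the simple cost function of slide 6, and the names `align`, `sigma` and `GAP` are hypothetical. Ties are broken in a fixed order (replacement, then deletion, then insertion), which is an arbitrary choice, so the alignment returned may be a different minimal-cost alignment than the one drawn on the slide.

```python
GAP = "-"


def sigma(x, y):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP or y == GAP:
        return 2
    return 0 if x == y else 1


def align(s, t):
    """Return (cost, row_s, row_t) for one minimal-cost alignment."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        V[i + 1][0] = V[i][0] + sigma(s[i], GAP)
    for j in range(n):
        V[0][j + 1] = V[0][j] + sigma(GAP, t[j])
    for i in range(m):
        for j in range(n):
            V[i + 1][j + 1] = min(V[i][j] + sigma(s[i], t[j]),
                                  V[i][j + 1] + sigma(s[i], GAP),
                                  V[i + 1][j] + sigma(GAP, t[j]))
    # Walk back from the bottom-right cell to V[0][0].
    i, j = m, n
    row_s, row_t = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]):
            row_s.append(s[i - 1]); row_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and V[i][j] == V[i - 1][j] + sigma(s[i - 1], GAP):
            row_s.append(s[i - 1]); row_t.append(GAP)
            i -= 1
        else:
            row_s.append(GAP); row_t.append(t[j - 1])
            j -= 1
    # Columns were collected last to first, so reverse them.
    return V[m][n], "".join(reversed(row_s)), "".join(reversed(row_t))


cost, row_s, row_t = align("AAAC", "AGC")
print(cost)
print(row_s)
print(row_t)
```

The loop visits at most m+n cells, matching the O(m+n) backtracking cost.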
19 Reconstructing the Best Alignment
• Sometimes, more than one alignment has minimal cost
AAAC
A-GC

AAAC
-AGC

AAAC
AG-C
[Figure: several traceback paths through the matrix V, indexed by S and T]
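To list every minimal-cost alignment, the traceback can follow all minimizing cases instead of only one. The sketch below does this recursively, again assuming the simple cost function of slide 6; `all_optimal_alignments`, `sigma` and `GAP` are hypothetical names. The number of optimal alignments can grow very quickly with sequence length, so this is only meant for small examples.

```python
GAP = "-"


def sigma(x, y):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == GAP or y == GAP:
        return 2
    return 0 if x == y else 1


def all_optimal_alignments(s, t):
    """Return a sorted list of (row_s, row_t) for every minimal-cost alignment."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        V[i + 1][0] = V[i][0] + sigma(s[i], GAP)
    for j in range(n):
        V[0][j + 1] = V[0][j] + sigma(GAP, t[j])
    for i in range(m):
        for j in range(n):
            V[i + 1][j + 1] = min(V[i][j] + sigma(s[i], t[j]),
                                  V[i][j + 1] + sigma(s[i], GAP),
                                  V[i + 1][j] + sigma(GAP, t[j]))

    def walk(i, j):
        # Yield every optimal alignment of the prefixes s[:i] and t[:j].
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]):
            for a, b in walk(i - 1, j - 1):
                yield a + s[i - 1], b + t[j - 1]
        if i > 0 and V[i][j] == V[i - 1][j] + sigma(s[i - 1], GAP):
            for a, b in walk(i - 1, j):
                yield a + s[i - 1], b + GAP
        if j > 0 and V[i][j] == V[i][j - 1] + sigma(GAP, t[j - 1]):
            for a, b in walk(i, j - 1):
                yield a + GAP, b + t[j - 1]

    return sorted(walk(m, n))


for row_s, row_t in all_optimal_alignments("AAAC", "AGC"):
    print(row_s)
    print(row_t)
    print()
```

For AAAC and AGC this produces exactly the three alignments listed on the slide.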
20 Time Complexity
Space: O(mn)
Time: O(mn)
• Filling the matrix: O(mn)
• Backtracking: O(m+n)