Sequence Alignment Using Dynamic Programming
Saurabh Sinha
Dynamic Programming
Is not a type of programming language
Is a general algorithmic technique, used to solve many different computational problems
Sequence alignment is one of these problems
We will first see the technique in its general sense
Manhattan Tourist Problem
[Figure: a grid of streets with a weight on every edge, running from the source to the sink]
Find the maximum-weight path from source to sink, moving only along grid edges (right or down).
Manhattan Tourist Problem
[Figure: the same grid with the best path highlighted; its running score reads 1, 3, 13, 16, 20 and reaches 22 at the sink]
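MTP: Greedy Algorithm Is Not Optimal
[Figure: the same grid with the greedy path, which always takes the heavier next edge, highlighted]
The greedy path has a promising start, but leads to bad choices! It reaches the sink with a score of 18, versus 22 for the best path.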
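The grid weights of this slide did not survive extraction intact, so the sketch below runs greedy on the smaller grid from the worked dynamic programming example that follows. The weights are our reading of that figure residue, and edges without a printed label are assumed to weigh 0. The names `RIGHT`, `DOWN` and `greedy_path_score` are hypothetical. On this grid the slides report an optimal score of 16 at the sink, while greedy only collects 8.

```python
# Hypothetical weights read from the worked DP example; unlabeled edges assumed 0.
# RIGHT[i][j]: weight of edge (i, j) -> (i, j+1)
# DOWN[i][j]:  weight of edge (i, j) -> (i+1, j)
RIGHT = [
    [1, 2, 5],
    [-5, 1, -5],
    [-5, 3, 3],
    [0, 0, 0],
]
DOWN = [
    [5, 3, 10, -5],
    [3, 5, -3, 2],
    [0, 0, -5, 1],
]


def greedy_path_score(right, down):
    """Walk from the source (0, 0) to the sink, always taking the heavier next edge."""
    n_rows = len(down) + 1      # number of vertex rows
    n_cols = len(right[0]) + 1  # number of vertex columns
    i = j = 0
    score = 0
    while (i, j) != (n_rows - 1, n_cols - 1):
        can_go_right = j < n_cols - 1
        can_go_down = i < n_rows - 1
        # Ties are broken in favour of moving right.
        if can_go_right and (not can_go_down or right[i][j] >= down[i][j]):
            score += right[i][j]
            j += 1
        else:
            score += down[i][j]
            i += 1
    return score


print(greedy_path_score(RIGHT, DOWN))  # 8
```

Greedy grabs the weight-5 edge out of the source, then follows the left column and bottom row and never reaches the weight-10 edge that the optimal path uses.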
MTP: Dynamic Programming
[Figure: the grid with rows indexed by i and columns by j; the source's two neighbours are scored first: S0,1 = 1, S1,0 = 5]
Calculate the optimal path score for each vertex in the graph.
Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the edge connecting them.
MTP: Dynamic Programming (cont’d)
[Figure: the next vertices scored: S0,2 = 3, S1,1 = 4, S2,0 = 8]
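Spelled out, with edge weights as read from the figure residue (the assignment of each weight to an edge is our reconstruction, checked against the printed scores), these three scores are computed as follows. A vertex on the top row or left column has only one predecessor.

$$
\begin{aligned}
S_{0,2} &= S_{0,1} + 2 = 1 + 2 = 3\\
S_{1,1} &= \max\left(S_{0,1} + 3,\; S_{1,0} + (-5)\right) = \max(4, 0) = 4\\
S_{2,0} &= S_{1,0} + 3 = 5 + 3 = 8
\end{aligned}
$$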
MTP: Dynamic Programming (cont’d)
[Figure: the next vertices scored: S0,3 = 8, S1,2 = 13, S2,1 = 9, S3,0 = 8]
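The same computation for this step, again with weights as read from the figure residue. The edge from (2, 0) to (3, 0) has no printed weight; since $S_{3,0} = S_{2,0} = 8$ and (3, 0) has only that one predecessor, its weight must be 0.

$$
\begin{aligned}
S_{0,3} &= S_{0,2} + 5 = 3 + 5 = 8\\
S_{1,2} &= \max\left(S_{0,2} + 10,\; S_{1,1} + 1\right) = \max(13, 5) = 13\\
S_{2,1} &= \max\left(S_{1,1} + 5,\; S_{2,0} + (-5)\right) = \max(9, 3) = 9\\
S_{3,0} &= S_{2,0} + 0 = 8
\end{aligned}
$$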
MTP: Dynamic Programming (cont’d)
[Figure: the next vertices scored: S1,3 = 8, S2,2 = 12, S3,1 = 9]
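Worked out from the figure residue (our reading of which edge carries which weight). For $S_{3,1}$ both incoming edges are unlabeled and are assumed here to weigh 0, which reproduces the printed value.

$$
\begin{aligned}
S_{1,3} &= \max\left(S_{0,3} + (-5),\; S_{1,2} + (-5)\right) = \max(3, 8) = 8\\
S_{2,2} &= \max\left(S_{1,2} + (-3),\; S_{2,1} + 3\right) = \max(10, 12) = 12\\
S_{3,1} &= \max\left(S_{2,1} + 0,\; S_{3,0} + 0\right) = \max(9, 8) = 9
\end{aligned}
$$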
MTP: Dynamic Programming (cont’d)
[Figure: the next vertices scored: S2,3 = 15, S3,2 = 9]
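Worked out from the figure residue, assuming as before that the unlabeled edge from (3, 1) to (3, 2) weighs 0:

$$
\begin{aligned}
S_{2,3} &= \max\left(S_{1,3} + 2,\; S_{2,2} + 3\right) = \max(10, 15) = 15\\
S_{3,2} &= \max\left(S_{2,2} + (-5),\; S_{3,1} + 0\right) = \max(7, 9) = 9
\end{aligned}
$$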
MTP: Dynamic Programming (cont’d)
Almost done
[Figure: the last vertex scored: S3,3 = 16]
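The final step, with the unlabeled edge from (3, 2) to (3, 3) assumed to weigh 0 (our reading of the figure residue):

$$
S_{3,3} = \max\left(S_{2,3} + 1,\; S_{3,2} + 0\right) = \max(16, 9) = 16
$$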
MTP: Dynamic Programming (cont’d)
Done!
[Figure: the completed grid; the sink's score S3,3 = 16 is the weight of the maximum-weight path]
MTP Dynamic Programming: Formal Description
Compute the score for a point (i, j) by the recurrence relation:
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j} + \text{weight of the edge between } (i-1,j) \text{ and } (i,j)\\
s_{i,j-1} + \text{weight of the edge between } (i,j-1) \text{ and } (i,j)
\end{cases}
$$
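A minimal Python sketch of this recurrence, filling the grid row by row so that both predecessors of (i, j) are always scored first. The weights are our reading of the worked example's figure residue, with unlabeled edges assumed to weigh 0; the names `RIGHT`, `DOWN` and `mtp_scores` are hypothetical. Under these assumptions the computed table matches every score printed on the preceding slides.

```python
# Hypothetical weights read from the worked example; unlabeled edges assumed 0.
# RIGHT[i][j]: weight of edge (i, j) -> (i, j+1)
# DOWN[i][j]:  weight of edge (i, j) -> (i+1, j)
RIGHT = [
    [1, 2, 5],
    [-5, 1, -5],
    [-5, 3, 3],
    [0, 0, 0],
]
DOWN = [
    [5, 3, 10, -5],
    [3, 5, -3, 2],
    [0, 0, -5, 1],
]


def mtp_scores(right, down):
    """Return s[i][j], the maximum path weight from the source (0, 0) to (i, j)."""
    n_rows = len(down) + 1
    n_cols = len(right[0]) + 1
    s = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            if i == 0 and j == 0:
                continue  # the source has score 0
            candidates = []
            if i > 0:  # arrive from above: s[i-1][j] + weight of (i-1, j) -> (i, j)
                candidates.append(s[i - 1][j] + down[i - 1][j])
            if j > 0:  # arrive from the left: s[i][j-1] + weight of (i, j-1) -> (i, j)
                candidates.append(s[i][j - 1] + right[i][j - 1])
            s[i][j] = max(candidates)
    return s


for row in mtp_scores(RIGHT, DOWN):
    print(row)
# [0, 1, 3, 8]
# [5, 4, 13, 8]
# [8, 9, 12, 15]
# [8, 9, 9, 16]
```

Each vertex is visited once and looks at two predecessors, so the table costs time proportional to the number of vertices, instead of enumerating every source-to-sink path.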
Applying Dynamic Programming to Sequence Alignment
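Representing alignments
Alignment: 2 × k matrix (k ≥ m, n)
V = ACCTGGTAAA, n = 10
W = ACATGCGTATA, m = 11
[Figure: V and W written as the two rows of an alignment, with each column marked as a match, mismatch, deletion or insertion]
8 matches, 2 mismatches, 0 deletions, 1 insertion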
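A short sketch that classifies the columns of a 2 × k alignment. It assumes the usual convention that V is the top row, a deletion is a letter of V over a gap, and an insertion is a gap over a letter of W. The example rows are one alignment of the slide's V and W that we reconstructed to match the stated counts (the slide's own columns did not survive extraction); `classify_columns` is a hypothetical name.

```python
def classify_columns(top, bottom, gap="-"):
    """Count matches, mismatches, deletions and insertions in a 2 x k alignment.

    top is V with gaps inserted, bottom is W with gaps inserted.
    """
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length k")
    counts = {"matches": 0, "mismatches": 0, "deletions": 0, "insertions": 0}
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gaps")
        if b == gap:
            counts["deletions"] += 1   # letter of V over a gap
        elif a == gap:
            counts["insertions"] += 1  # gap over a letter of W
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


print(classify_columns("ACCTG-GTAAA", "ACATGCGTATA"))
# {'matches': 8, 'mismatches': 2, 'deletions': 0, 'insertions': 1}
```

Because every column uses at most one letter from each string, matches + mismatches + deletions = n and matches + mismatches + insertions = m, which is why k is at least max(m, n).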
Scoring functions
A simple scoring function: if an alignment has $n_m$ matches, $n_{mis}$ mismatches (substitutions) and $n_g$ gaps, the alignment score is
$$
\text{score} = w_m n_m + w_{mis} n_{mis} + w_g n_g
$$
where $w_m$, $w_{mis}$ and $w_g$ represent the match score, mismatch score and gap score (penalty), respectively.
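The scoring function is just a weighted count, as this small sketch shows (`linear_score` is a hypothetical name). The gap score is passed in as a negative number, so penalties are added rather than subtracted. The example uses the values on the following slides: Match = 20, Mismatch = -10, Gap = -20.

```python
def linear_score(n_matches, n_mismatches, n_gaps, w_m, w_mis, w_g):
    """Return w_m * n_m + w_mis * n_mis + w_g * n_g (w_mis and w_g are usually negative)."""
    return w_m * n_matches + w_mis * n_mismatches + w_g * n_gaps


print(linear_score(8, 2, 1, 20, -10, -20))  # 120
print(linear_score(5, 2, 7, 20, -10, -20))  # -60
```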
Sequence Alignment as an MTP-like problem
Sequence Alignment as an MTP-like problem
Match = 20, Mismatch = -10, Gap = -20
Score of path = 8 matches + 2 mismatches + 1 gap = 8(20) + 2(-10) + 1(-20) = 120
What alignment is this?
[Figure: alignment grid with V and W along its two axes, labeled with their letters, and a path drawn through it]
Match = 20, Mismatch = -10, Gap = -20
Score of path = 5 matches + 2 mismatches + 7 gaps = -60
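Reading an alignment off a path is mechanical: a diagonal step consumes one letter of each string, a vertical step consumes a letter of v only (v aligned with a gap), and a horizontal step consumes a letter of w only (a gap aligned with w). The sketch below assumes v indexes the rows, as in the recurrence on the next slide; the step codes `D`, `V`, `H` and the name `path_to_alignment` are hypothetical. The example path is our reconstruction of an alignment of the earlier V and W, not the path drawn on this slide.

```python
def path_to_alignment(v, w, moves, gap="-"):
    """Convert a source-to-sink path through the alignment grid into two alignment rows.

    'D': (i-1, j-1) -> (i, j), v_i aligned with w_j
    'V': (i-1, j)   -> (i, j), v_i aligned with a gap
    'H': (i, j-1)   -> (i, j), a gap aligned with w_j
    """
    i = j = 0
    top, bottom = [], []
    for step in moves:
        if step == "D":
            top.append(v[i])
            bottom.append(w[j])
            i += 1
            j += 1
        elif step == "V":
            top.append(v[i])
            bottom.append(gap)
            i += 1
        elif step == "H":
            top.append(gap)
            bottom.append(w[j])
            j += 1
        else:
            raise ValueError(f"unknown step {step!r}")
    if (i, j) != (len(v), len(w)):
        raise ValueError("the path does not end at the sink (n, m)")
    return "".join(top), "".join(bottom)


print(path_to_alignment("ACCTGGTAAA", "ACATGCGTATA", "DDDDDHDDDDD"))
# ('ACCTG-GTAAA', 'ACATGCGTATA')
```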
Sequence Alignment, formally
Find the best alignment between two strings under a given scoring scheme.
Input: strings v and w and a scoring scheme
Output: an alignment of maximum score
Dynamic programming recurrence:
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j-1} + \text{score}(v_i, w_j)\\
s_{i-1,j} + \text{gapscore}\\
s_{i,j-1} + \text{gapscore}
\end{cases}
$$
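A minimal Python sketch of this recurrence for global alignment (the Needleman-Wunsch scheme), with a traceback that recovers one optimal alignment. The slide leaves the boundary cases implicit; here we assume the usual choice for aligning the whole of both strings, $s_{i,0} = i \cdot \text{gapscore}$ and $s_{0,j} = j \cdot \text{gapscore}$. The function name `global_alignment` is hypothetical, and the traceback breaks ties by preferring diagonal, then vertical, then horizontal steps, so other alignments with the same score may exist.

```python
def global_alignment(v, w, match, mismatch, gap):
    """Return (best score, aligned v, aligned w, DP matrix) under linear gap scoring."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against the empty string costs one gap per letter.
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    def sub(a, b):
        return match if a == b else mismatch

    # Python strings are 0-based, so v_i is v[i - 1] and w_j is w[j - 1].
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j - 1] + sub(v[i - 1], w[j - 1]),  # v_i aligned with w_j
                s[i - 1][j] + gap,                          # v_i aligned with a gap
                s[i][j - 1] + gap,                          # gap aligned with w_j
            )

    # Traceback from (n, m) to (0, 0), re-deriving which case produced each score.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(v[i - 1], w[j - 1]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom)), s


score, top, bottom, _ = global_alignment("GCTTAGC", "GCATTGC", 3, -2, -3)
print(score)   # 12
print(top)     # GC-TTAGC
print(bottom)  # GCATT-GC
```

Filling the table takes time proportional to n·m, and the traceback walks at most n + m steps back to the source.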
Sequence Alignment: Example Calculate and show the Dynamic Programming matrix and an optimal alignment for the DNA sequences GCTTAGC and GCATTGC, scoring +3 for a match, -2 for a mismatch, and -3 for a gap
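One way to check a hand solution: filling the matrix with the recurrence (our own computation, not part of the original slides), with v = GCTTAGC indexing the rows, w = GCATTGC the columns, $\varepsilon$ the empty prefix, and the boundary assumed to be $s_{i,0} = -3i$, $s_{0,j} = -3j$. The bottom-right entry, 12, is the optimal score.

$$
\begin{array}{c|cccccccc}
 & \varepsilon & G & C & A & T & T & G & C \\ \hline
\varepsilon & 0 & -3 & -6 & -9 & -12 & -15 & -18 & -21 \\
G & -3 & 3 & 0 & -3 & -6 & -9 & -12 & -15 \\
C & -6 & 0 & 6 & 3 & 0 & -3 & -6 & -9 \\
T & -9 & -3 & 3 & 4 & 6 & 3 & 0 & -3 \\
T & -12 & -6 & 0 & 1 & 7 & 9 & 6 & 3 \\
A & -15 & -9 & -3 & 3 & 4 & 6 & 7 & 4 \\
G & -18 & -12 & -6 & 0 & 1 & 3 & 9 & 6 \\
C & -21 & -15 & -9 & -3 & -2 & 0 & 6 & 12
\end{array}
$$

Tracing back from the bottom-right corner gives one optimal alignment, with 6 matches and 2 gaps, so $6(3) + 2(-3) = 12$:

$$
\begin{array}{cccccccc}
G & C & - & T & T & A & G & C\\
G & C & A & T & T & - & G & C
\end{array}
$$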