1
Sequence alignment with Needleman-Wunsch
Usman Roshan
2
Pairwise alignment
X: ACA, Y: GACAT
Match = 8, mismatch = 2, gap = -5

ACA--   -ACA-   --ACA   ACA----
GACAT   GACAT   GACAT   G--ACAT

Score =
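
The score of an alignment is the sum of its column scores. A minimal sketch of that sum for the four alignments on this slide follows. It assumes the parameters read as match = 8, mismatch = 2, and gap = -5 (the slide's "gap-5" is taken to mean a gap score of -5); the function name score_alignment is hypothetical and not from the slides.

```python
def score_alignment(row_x, row_y, match=8, mismatch=2, gap=-5):
    """Sum the column scores of two aligned rows ('-' marks a gap).

    The default parameter values are an assumed reading of the slide.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # one side is a gap
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


alignments = [
    ("ACA--", "GACAT"),
    ("-ACA-", "GACAT"),
    ("--ACA", "GACAT"),
    ("ACA----", "G--ACAT"),
]
scores = [score_alignment(x, y) for x, y in alignments]
for (x, y), s in zip(alignments, scores):
    print(x, y, s)
```

Under these assumed parameters the second alignment, which lines up all three characters of X with matching characters of Y, scores highest.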
3
Traceback
We can compute an alignment of DNA (or protein or RNA) sequences X and Y with a traceback matrix T. Sequence X is aligned along the rows and Y along the columns. Each entry of the matrix T contains D, L, or U, specifying diagonal, left, or upper.
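
To make the D/L/U encoding concrete, the sketch below converts a finished alignment into the sequence of moves it corresponds to, read from left to right. It assumes the convention used by the traceback code in this deck: L places a gap in X and consumes a character of Y, U places a gap in Y and consumes a character of X, and D aligns one character from each. The function name alignment_to_moves and the sample alignment of ACA with TACAG are chosen for illustration only and are not taken from the slides' figure.

```python
def alignment_to_moves(row_x, row_y):
    """Return the D/L/U moves (left to right) for two aligned rows."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    moves = []
    for a, b in zip(row_x, row_y):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-":
            moves.append("L")   # gap in X: move left along the columns (Y)
        elif b == "-":
            moves.append("U")   # gap in Y: move up along the rows (X)
        else:
            moves.append("D")   # both characters aligned: diagonal move
    return "".join(moves)


moves = alignment_to_moves("-ACA-", "TACAG")
print(moves)  # LDDDL
```

A traceback visits these moves in reverse order, starting from the bottom-right entry of T and ending at the top-left.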
4
Traceback
X: ACA, Y: TACAG
(Figure: traceback matrix T, with X along the rows and Y along the columns; entries are L, U, or D.)
6
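Traceback code

# Assumes T[0][j] == "L" and T[i][0] == "U" on the borders,
# so the loop can finish once one sequence is exhausted.
aligned_seq1 = ""
aligned_seq2 = ""
i = len(seq1)
j = len(seq2)
while i != 0 or j != 0:
    if T[i][j] == "L":
        # left: gap in seq1, consume one character of seq2
        aligned_seq1 = "-" + aligned_seq1
        aligned_seq2 = seq2[j-1] + aligned_seq2
        j = j - 1
    elif T[i][j] == "U":
        # up: gap in seq2, consume one character of seq1
        aligned_seq2 = "-" + aligned_seq2
        aligned_seq1 = seq1[i-1] + aligned_seq1
        i = i - 1
    else:
        # diagonal: align seq1[i-1] with seq2[j-1]
        aligned_seq1 = seq1[i-1] + aligned_seq1
        aligned_seq2 = seq2[j-1] + aligned_seq2
        i = i - 1
        j = j - 1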
7
Optimal alignment
An alignment can be specified by the traceback matrix.
How do we determine the traceback for the highest scoring alignment?
Needleman-Wunsch algorithm for global alignment
  First proposed in 1970
  Widely used in genomics/bioinformatics
  Dynamic programming algorithm
8
Needleman-Wunsch (NW)
Input: X = x1x2…xn, Y = y1y2…ym (X is seq1 and Y is seq2)
Notation: X1..i = x1x2…xi
Score(X1..i, Y1..j) = optimal alignment score of sequences X1..i and Y1..j.
Suppose we know the optimal alignment scores of
  X1..i-1 and Y1..j-1
  X1..i and Y1..j-1
  X1..i-1 and Y1..j
9
Needleman-Wunsch (NW)
Then the optimal alignment score of X1..i and Y1..j is the maximum of
  Score(X1..i-1, Y1..j-1) + match/mismatch
  Score(X1..i, Y1..j-1) + gap
  Score(X1..i-1, Y1..j) + gap
We build on this observation to compute Score(X1..n, Y1..m).
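
The three-way maximum can be written directly as a recursive function with memoization, which may help in seeing why a table of prefix scores is enough. This is only a sketch: the name nw_score is hypothetical, the base cases (a prefix aligned against the empty string costs one gap per character) are an assumption not stated on this slide, and the parameter values are the assumed reading match = 8, mismatch = 2, gap = -5 of the earlier pairwise alignment slide.

```python
from functools import lru_cache


def nw_score(x, y, match, mismatch, gap):
    """Optimal global alignment score of x and y via the recurrence."""

    @lru_cache(maxsize=None)
    def score(i, j):
        # score(i, j) is the optimal score of x[:i] against y[:j].
        if i == 0:
            return j * gap      # assumed base case: only gaps in x
        if j == 0:
            return i * gap      # assumed base case: only gaps in y
        sub = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            score(i - 1, j - 1) + sub,  # align x[i-1] with y[j-1]
            score(i, j - 1) + gap,      # gap in x
            score(i - 1, j) + gap,      # gap in y
        )

    return score(len(x), len(y))


best = nw_score("ACA", "GACAT", match=8, mismatch=2, gap=-5)
print(best)
```

The memoized recursion evaluates each (i, j) pair once, which is the same work the table-based formulation does iteratively.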
10
Needleman-Wunsch
Define V to be a two-dimensional matrix with len(X)+1 rows and len(Y)+1 columns.
Let V[i][j] be the score of the optimal alignment of X1..i and Y1..j.
Let m be the match cost, mm be the mismatch cost, and g be the gap cost. (From here on, m denotes the match cost, not the length of Y.)
11
NW pseudocode

Initialization:
V[0][0] = 0;
for i = 1 to length of seq1 { V[i][0] = i*g; T[i][0] = 'U'; }
for j = 1 to length of seq2 { V[0][j] = j*g; T[0][j] = 'L'; }
// The border entries of T are needed so that the traceback loop can reach T[0][0].

Recurrence:
for i = 1 to length of seq1 {
  for j = 1 to length of seq2 {
    V[i][j] = max { V[i-1][j-1] + m (or mm),
                    V[i-1][j] + g,
                    V[i][j-1] + g }
    if (maximum is V[i-1][j-1] + m (or mm)) then T[i][j] = 'D'
    else if (maximum is V[i-1][j] + g) then T[i][j] = 'U'
    else T[i][j] = 'L'
  }
}
12
Example
Input:
seq1: ACA
seq2: GACAT
m = 5
mm = -4
gap = -20
seq1 is lined along the rows and seq2 is along the columns

V
            G     A     C     A     T
      0   -20   -40   -60   -80  -100
A   -20    -4   -15   -35   -55   -75
C   -40   -24    -8   -10   -30   -50
A   -60   -44   -19   -12    -5   -25

T
(Figure: traceback matrix over the same rows and columns; entries are L, U, or D.)
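
The example can be checked with a short Python sketch that combines the NW pseudocode with the traceback loop from the earlier slides. The function name needleman_wunsch and its return values are assumptions made for illustration. The border entries of T (U in the first column, L in the first row) are likewise an assumption, included so that the traceback loop terminates. Ties are broken in the order diagonal, up, left, as in the pseudocode.

```python
def needleman_wunsch(seq1, seq2, m, mm, g):
    """Fill V and T, then trace back one optimal global alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    V = [[0] * cols for _ in range(rows)]
    T = [[""] * cols for _ in range(rows)]

    # Initialization: a prefix aligned against nothing is all gaps.
    for i in range(1, rows):
        V[i][0] = i * g
        T[i][0] = "U"
    for j in range(1, cols):
        V[0][j] = j * g
        T[0][j] = "L"

    # Recurrence: best of diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = V[i - 1][j - 1] + (m if seq1[i - 1] == seq2[j - 1] else mm)
            up = V[i - 1][j] + g
            left = V[i][j - 1] + g
            best = max(diag, up, left)
            V[i][j] = best
            if best == diag:
                T[i][j] = "D"
            elif best == up:
                T[i][j] = "U"
            else:
                T[i][j] = "L"

    # Traceback from the bottom-right corner to the top-left corner.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i != 0 or j != 0:
        if T[i][j] == "L":
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
        elif T[i][j] == "U":
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i -= 1
            j -= 1
    return V, T, "".join(reversed(out1)), "".join(reversed(out2))


V, T, aligned1, aligned2 = needleman_wunsch("ACA", "GACAT", m=5, mm=-4, g=-20)
for row in V:
    print(row)
print(aligned1)
print(aligned2)
```

With the example's parameters the bottom-right entry of V is -25, which corresponds to three matches and two gaps (3*5 + 2*(-20)).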