1
BNFO 136 Sequence alignment Usman Roshan
2
Pairwise alignment
X: ACA, Y: GACAT
Match = 8, mismatch = 2, gap = -5

Four possible alignments, with the column scores summed for each:

X:      ACA--        -ACA-        --ACA        ACA----
Y:      GACAT        GACAT        GACAT        G--ACAT
Sum:    2+2+2-5-5    -5+8+8+8-5   -5-5+2+2+2   2-5-5-5-5-5-5
Score:  -4           14           -4           -28
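The column-by-column sums on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `alignment_score` is made up for illustration, and the default scores are the ones given on the slide (match 8, mismatch 2, gap -5).

```python
def alignment_score(row_x, row_y, match=8, mismatch=2, gap=-5):
    """Sum the column scores of a gapped pairwise alignment."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # one sequence has a gap in this column
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # two different characters
    return total


# The four alignments of X = ACA and Y = GACAT from the slide.
alignments = [("ACA--", "GACAT"), ("-ACA-", "GACAT"),
              ("--ACA", "GACAT"), ("ACA----", "G--ACAT")]
for row_x, row_y in alignments:
    print(row_x, row_y, alignment_score(row_x, row_y))
```

The second alignment, which lines up all three characters of X with matching characters of Y, has the highest score of the four.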
3
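Traceback
We can compute an alignment of DNA (or protein or RNA) sequences X and Y with a traceback matrix T. Sequence X is aligned along the rows and Y along the columns. Each entry of T is D, L, or U, specifying a diagonal, left, or upward move: D aligns the current characters of X and Y, L aligns the current character of Y with a gap, and U aligns the current character of X with a gap.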
4
Traceback
X: ACA, Y: TACAG

       T  A  C  A  G
       L  L  L  L  L
  A    U  D  U  U  L
  C    U  U  D  U  D
  A    U  L  L  D  L
6
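Traceback code
    # seq2 runs along the rows of T, seq1 along the columns.
    aligned_seq1 = ""
    aligned_seq2 = ""
    i = len(seq2)
    j = len(seq1)
    # Walk from the bottom-right corner of T back to the top-left.
    while i != 0 or j != 0:
        if T[i][j] == "L":
            # Left: character of seq1 against a gap in seq2.
            aligned_seq2 = "-" + aligned_seq2
            aligned_seq1 = seq1[j-1] + aligned_seq1
            j = j - 1
        elif T[i][j] == "U":
            # Up: character of seq2 against a gap in seq1.
            aligned_seq1 = "-" + aligned_seq1
            aligned_seq2 = seq2[i-1] + aligned_seq2
            i = i - 1
        else:
            # Diagonal: align one character from each sequence.
            aligned_seq1 = seq1[j-1] + aligned_seq1
            aligned_seq2 = seq2[i-1] + aligned_seq2
            i = i - 1
            j = j - 1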
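To try the traceback loop on its own, it can be wrapped in a function and run on a small hand-written matrix. This is a sketch only: the function name `traceback` and the 3 x 4 matrix `T` below are made up for illustration and do not come from the slides. As on the slides, seq2 runs along the rows and seq1 along the columns.

```python
def traceback(T, seq1, seq2):
    """Rebuild the alignment encoded by traceback matrix T.

    seq2 runs along the rows of T and seq1 along the columns.
    """
    aligned_seq1, aligned_seq2 = "", ""
    i, j = len(seq2), len(seq1)
    while i != 0 or j != 0:
        move = T[i][j]
        if move == "L":      # character of seq1 against a gap
            aligned_seq1 = seq1[j - 1] + aligned_seq1
            aligned_seq2 = "-" + aligned_seq2
            j -= 1
        elif move == "U":    # character of seq2 against a gap
            aligned_seq1 = "-" + aligned_seq1
            aligned_seq2 = seq2[i - 1] + aligned_seq2
            i -= 1
        elif move == "D":    # one character from each sequence
            aligned_seq1 = seq1[j - 1] + aligned_seq1
            aligned_seq2 = seq2[i - 1] + aligned_seq2
            i -= 1
            j -= 1
        else:
            raise ValueError(f"unexpected traceback entry {move!r} at ({i}, {j})")
    return aligned_seq1, aligned_seq2


# Hypothetical traceback matrix for seq2 = "AC" (rows) and seq1 = "GAC" (columns).
T = [
    ["",  "L", "L", "L"],
    ["U", "D", "D", "L"],
    ["U", "U", "U", "D"],
]
top, bottom = traceback(T, seq1="GAC", seq2="AC")
print(top)     # GAC
print(bottom)  # -AC
```

The explicit check for D, with an error for anything else, is a small departure from the slide's plain `else`; it keeps a malformed matrix from being read as a diagonal move.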
7
Optimal alignment
An alignment can be specified by the traceback matrix. How do we determine the traceback for the highest scoring alignment?
Needleman-Wunsch algorithm for global alignment
– First proposed in 1970
– Widely used in genomics/bioinformatics
– Dynamic programming algorithm
8
Needleman-Wunsch (NW)
Input:
– $X = x_1 x_2 \ldots x_n$, $Y = y_1 y_2 \ldots y_m$
– (X is seq2 and Y is seq1)
Notation:
– $X_{1..i} = x_1 x_2 \ldots x_i$
– $\mathrm{Score}(X_{1..i}, Y_{1..j})$ = optimal alignment score of sequences $X_{1..i}$ and $Y_{1..j}$
Suppose we know the optimal alignment scores of
– $X_{1..i-1}$ and $Y_{1..j-1}$
– $X_{1..i}$ and $Y_{1..j-1}$
– $X_{1..i-1}$ and $Y_{1..j}$
9
Needleman-Wunsch (NW)
Then the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ is the maximum of
– $\mathrm{Score}(X_{1..i-1}, Y_{1..j-1})$ + match/mismatch
– $\mathrm{Score}(X_{1..i}, Y_{1..j-1})$ + gap
– $\mathrm{Score}(X_{1..i-1}, Y_{1..j})$ + gap
We build on this observation to compute $\mathrm{Score}(X_{1..n}, Y_{1..m})$.
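The three-way maximum can be written almost word for word as a recursive function. The sketch below is one way to do it, not the form used later in the slides: the name `optimal_score` is made up, memoization via `functools.lru_cache` is a choice made here so that each pair (i, j) is computed only once, and the base cases assume that aligning a prefix against the empty string costs one gap per character.

```python
from functools import lru_cache


def optimal_score(x, y, match, mismatch, gap):
    """Optimal global alignment score of x and y via the recurrence."""

    @lru_cache(maxsize=None)
    def score(i, j):
        # Base cases (assumed): a prefix against the empty string is all gaps.
        if i == 0:
            return j * gap
        if j == 0:
            return i * gap
        sub = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            score(i - 1, j - 1) + sub,  # align x_i with y_j
            score(i, j - 1) + gap,      # align y_j with a gap
            score(i - 1, j) + gap,      # align x_i with a gap
        )

    return score(len(x), len(y))


print(optimal_score("ACA", "GACAT", match=8, mismatch=2, gap=-5))  # 14
```

Recursion is convenient for short sequences but will hit Python's recursion limit on long ones, which is one reason to fill a table row by row instead.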
10
Needleman-Wunsch
Define V to be a two-dimensional matrix with len(X)+1 rows and len(Y)+1 columns. Let V[i][j] be the score of the optimal alignment of $X_{1..i}$ and $Y_{1..j}$. Let m be the match score, mm the mismatch score, and g the gap score. Note that m here is the match score, not the length of Y.
11
NW pseudocode
Initialization:
    V[0][0] = 0
    for i = 1 to len(seq2) { V[i][0] = i*g; T[i][0] = 'U' }
    for j = 1 to len(seq1) { V[0][j] = j*g; T[0][j] = 'L' }
Recurrence:
    for i = 1 to len(seq2) {
      for j = 1 to len(seq1) {
        s = m if the i-th character of seq2 equals the j-th character of seq1, else mm
        V[i][j] = max { V[i-1][j-1] + s,
                        V[i-1][j] + g,
                        V[i][j-1] + g }
        if (maximum is V[i-1][j-1] + s) then T[i][j] = 'D'
        else if (maximum is V[i-1][j] + g) then T[i][j] = 'U'
        else T[i][j] = 'L'
      }
    }
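The pseudocode translates fairly directly into Python. The following is a sketch under a few assumptions: the function name `needleman_wunsch` is made up, the border entries of T are filled with U and L so that a traceback can reach the top-left corner, and ties are broken in the order D, then U, then L, following the order of the tests in the pseudocode.

```python
def needleman_wunsch(seq1, seq2, m, mm, g):
    """Fill the score matrix V and traceback matrix T.

    seq2 runs along the rows, seq1 along the columns.
    m, mm and g are the match, mismatch and gap scores.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    V = [[0] * cols for _ in range(rows)]
    T = [[""] * cols for _ in range(rows)]

    # Initialization: a prefix aligned against nothing is all gaps.
    for i in range(1, rows):
        V[i][0] = i * g
        T[i][0] = "U"
    for j in range(1, cols):
        V[0][j] = j * g
        T[0][j] = "L"

    # Recurrence: each cell is the best of diagonal, up and left.
    for i in range(1, rows):
        for j in range(1, cols):
            s = m if seq2[i - 1] == seq1[j - 1] else mm
            diag = V[i - 1][j - 1] + s
            up = V[i - 1][j] + g
            left = V[i][j - 1] + g
            best = max(diag, up, left)
            V[i][j] = best
            if best == diag:
                T[i][j] = "D"
            elif best == up:
                T[i][j] = "U"
            else:
                T[i][j] = "L"
    return V, T


V, T = needleman_wunsch(seq1="GACAT", seq2="ACA", m=8, mm=2, g=-5)
print(V[-1][-1])  # optimal global alignment score: 14
```

The bottom-right entry of V is the optimal score; the alignment itself is recovered by following T from that cell back to the top-left corner.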
12
Example
Input:
seq2: ACA
seq1: GACAT
m = 5, mm = -4, gap = -20
seq2 is lined along the rows and seq1 is along the columns.

V
               G     A     C     A     T
         0   -20   -40   -60   -80  -100
  A    -20    -4   -15   -35   -55   -75
  C    -40   -24    -8   -10   -30   -50
  A    -60   -44   -19   -12    -5   -25

T
            G  A  C  A  T
            L  L  L  L  L
  A      U  D  D  L  L  L
  C      U  U  D  D  L  L
  A      U  U  D  D  D  L
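One way to read this example is to check that every interior entry of V really is the maximum of its three neighbours plus the appropriate score. The sketch below does that for the V matrix on this slide and also reports the cells where two options tie. The function name `check_matrix` is made up for illustration.

```python
seq2, seq1 = "ACA", "GACAT"   # seq2 on the rows, seq1 on the columns
m, mm, g = 5, -4, -20

# Score matrix V as given in the example.
V = [
    [0, -20, -40, -60, -80, -100],
    [-20, -4, -15, -35, -55, -75],
    [-40, -24, -8, -10, -30, -50],
    [-60, -44, -19, -12, -5, -25],
]


def check_matrix(V, seq1, seq2, m, mm, g):
    """Check V against the recurrence and return the tied cells.

    Each tie is reported as (i, j, letters), where letters names the
    moves (D, U, L) that all reach the maximum in that cell.
    """
    ties = []
    for i in range(1, len(seq2) + 1):
        for j in range(1, len(seq1) + 1):
            s = m if seq2[i - 1] == seq1[j - 1] else mm
            options = {
                "D": V[i - 1][j - 1] + s,
                "U": V[i - 1][j] + g,
                "L": V[i][j - 1] + g,
            }
            best = max(options.values())
            assert V[i][j] == best, f"cell ({i}, {j}) disagrees with the recurrence"
            winners = "".join(k for k, v in options.items() if v == best)
            if len(winners) > 1:
                ties.append((i, j, winners))
    return ties


print(check_matrix(V, seq1, seq2, m, mm, g))
# [(1, 4, 'DL'), (2, 1, 'DU'), (3, 1, 'DU')]
```

In the three tied cells either letter is a valid traceback entry; the T matrix in the example records the gap move (L or U) there. None of those cells lies on the path from the bottom-right corner, so the resulting alignment, -ACA- over GACAT with score -25, is the same either way.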