Developing Sequence Alignment Algorithms in C++
Dr. Nancy Warter-Perez
May 21, 2002
Outline
- Hand out project
- Group assignments
- References for sequence alignment algorithms
- Board example of Needleman-Wunsch
- Discussion of the LCS algorithm and how it can be extended for global alignment (Needleman-Wunsch)
- Extensions: local alignment (Smith-Waterman) and gap penalties
Project Group Members
- Group 1: Bonnie, Eduardo, Sara
- Group 2: Thi, Edain
- Group 3: Michael, Hardik, Daisy
- Group 4: Dennis, Ivonne, Patrick
- Group 5: Chuck, Ronny
Project References
- http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
- http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html
- Lectures: Database Search (4/16) and Rationale for DB Searching (5/16)
- Computational Molecular Biology: An Algorithmic Approach, Pavel Pevzner
- Introduction to Computational Biology: Maps, Sequences, and Genomes, Michael Waterman
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield
Classic Papers
- Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf)
- Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf)
- Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf)
Longest Common Subsequence (LCS) Problem
- Allows insertions and deletions, but no substitutions
- Example: V = ATCTGAT, W = TGCATA, LCS = TCTA
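Because only deletions from each sequence are allowed, checking a candidate is simple: it must be a subsequence of both V and W. A minimal sketch, assuming sequences are plain std::string values; is_subsequence and is_common_subsequence are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>

// True if `sub` can be obtained from `seq` by deleting characters
// (order preserved, no substitutions).
bool is_subsequence(const std::string& sub, const std::string& seq) {
    std::size_t k = 0;  // index of the next character of `sub` to match
    for (char c : seq) {
        if (k < sub.size() && c == sub[k]) {
            ++k;
        }
    }
    return k == sub.size();
}

// A common subsequence of V and W is a subsequence of both.
bool is_common_subsequence(const std::string& sub,
                           const std::string& v, const std::string& w) {
    return is_subsequence(sub, v) && is_subsequence(sub, w);
}
```

For the example above, is_common_subsequence("TCTA", "ATCTGAT", "TGCATA") is true; finding the longest such string is what the dynamic program solves.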
LCS Problem (cont.)
Similarity score:
$$
s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1, & \text{if } v_i = w_j \end{cases}
$$
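The recurrence can be evaluated top-down, caching each s(i, j) so every pair of prefixes is solved only once. This is a sketch, assuming 0-based std::string storage (so v_i is v[i - 1]) and s(i, 0) = s(0, j) = 0 for empty prefixes; lcs_score and lcs_length are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// s(i, j) for prefixes v[0..i) and w[0..j); memo holds -1 until computed.
int lcs_score(const std::string& v, const std::string& w, int i, int j,
              std::vector<std::vector<int>>& memo) {
    if (i == 0 || j == 0) return 0;              // an empty prefix scores 0
    if (memo[i][j] >= 0) return memo[i][j];
    int best = std::max(lcs_score(v, w, i - 1, j, memo),   // s(i-1, j)
                        lcs_score(v, w, i, j - 1, memo));  // s(i, j-1)
    if (v[i - 1] == w[j - 1]) {                            // v_i = w_j
        best = std::max(best, lcs_score(v, w, i - 1, j - 1, memo) + 1);
    }
    memo[i][j] = best;
    return best;
}

int lcs_length(const std::string& v, const std::string& w) {
    std::vector<std::vector<int>> memo(v.size() + 1,
                                       std::vector<int>(w.size() + 1, -1));
    return lcs_score(v, w, static_cast<int>(v.size()),
                     static_cast<int>(w.size()), memo);
}
```

lcs_length("ATCTGAT", "TGCATA") returns 4, the length of TCTA. A bottom-up table computes the same values without recursion.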
Indels: Insertions and Deletions (i.e., Gaps)
- An alignment of V (length n) and W (length m) is a 2 × l matrix A, where l ≥ n and l ≥ m
- The first row of A contains the characters of V interspersed with l - n spaces
- The second row of A contains the characters of W interspersed with l - m spaces
- Space in the first row = insertion (LEFT: from s[i,j-1], consuming w_j)
- Space in the second row = deletion (UP: from s[i-1,j], consuming v_i)
- Match (LCS allows no mismatches) (DIAG)
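To make the 2 × l picture concrete, the sketch below classifies each column of an alignment written as two equal-length rows. It assumes '-' marks a space; ColumnCounts, count_columns and strip_spaces are hypothetical names. The sample alignment of ATCTGAT and TGCATA used in the usage note was built by hand around the LCS TCTA.

```cpp
#include <cstddef>
#include <string>

// Column counts for an alignment written as two rows of equal length l.
// Assumption: '-' marks a space.
struct ColumnCounts {
    bool valid = true;   // false if rows differ in length or a column is all spaces
    int matches = 0;     // same character in both rows
    int mismatches = 0;  // different characters (never produced by LCS)
    int insertions = 0;  // space in the first row
    int deletions = 0;   // space in the second row
};

ColumnCounts count_columns(const std::string& top, const std::string& bottom) {
    ColumnCounts c;
    if (top.size() != bottom.size()) { c.valid = false; return c; }
    for (std::size_t k = 0; k < top.size(); ++k) {
        const char a = top[k], b = bottom[k];
        if (a == '-' && b == '-') c.valid = false;  // no column may be two spaces
        else if (a == '-') ++c.insertions;
        else if (b == '-') ++c.deletions;
        else if (a == b) ++c.matches;
        else ++c.mismatches;
    }
    return c;
}

// Removing the spaces from a row must give back the original sequence.
std::string strip_spaces(const std::string& row) {
    std::string seq;
    for (char ch : row) {
        if (ch != '-') seq += ch;
    }
    return seq;
}
```

For the rows AT-C-TGAT and -TGCAT-A- (l = 9), there are 4 matches, 2 insertions (l - n = 9 - 7) and 3 deletions (l - m = 9 - 6), and stripping the spaces gives back ATCTGAT and TGCATA.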
LCS(V, W) Algorithm
for i = 0 to n
    s[i,0] = 0
for j = 0 to m
    s[0,j] = 0
for i = 1 to n
    for j = 1 to m
        if v[i] = w[j]
            s[i,j] = s[i-1,j-1] + 1; b[i,j] = DIAG
        else if s[i-1,j] >= s[i,j-1]
            s[i,j] = s[i-1,j]; b[i,j] = UP
        else
            s[i,j] = s[i,j-1]; b[i,j] = LEFT
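A direct C++ translation of the pseudocode, as a sketch: it assumes 0-based std::string storage (v_i is v[i - 1]) and a hypothetical Dir enum for the traceback pointers. Ties between UP and LEFT go to UP, exactly as the >= test in the pseudocode does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Dir { NONE, DIAG, UP, LEFT };  // NONE marks row 0 and column 0

// Fills score table s and traceback table b for LCS(V, W).
// Returns s[n][m], the length of the LCS.
int lcs_tables(const std::string& v, const std::string& w,
               std::vector<std::vector<int>>& s,
               std::vector<std::vector<Dir>>& b) {
    const std::size_t n = v.size(), m = w.size();
    s.assign(n + 1, std::vector<int>(m + 1, 0));  // row 0 and column 0 stay 0
    b.assign(n + 1, std::vector<Dir>(m + 1, NONE));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (v[i - 1] == w[j - 1]) {                 // v_i = w_j
                s[i][j] = s[i - 1][j - 1] + 1;
                b[i][j] = DIAG;
            } else if (s[i - 1][j] >= s[i][j - 1]) {    // tie goes to UP
                s[i][j] = s[i - 1][j];
                b[i][j] = UP;
            } else {
                s[i][j] = s[i][j - 1];
                b[i][j] = LEFT;
            }
        }
    }
    return s[n][m];
}
```

For V = ATCTGAT and W = TGCATA this returns 4.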
PRINT-LCS(b, V, i, j)
if i = 0 or j = 0
    return
if b[i,j] = DIAG
    PRINT-LCS(b, V, i-1, j-1)
    print v[i]
else if b[i,j] = UP
    PRINT-LCS(b, V, i-1, j)
else
    PRINT-LCS(b, V, i, j-1)
Initial call: PRINT-LCS(b, V, n, m)
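A recursive C++ sketch of PRINT-LCS. To stay self-contained it rebuilds the traceback table with the same rules as LCS(V, W); lcs_traceback and print_lcs are hypothetical names, and characters are appended to a string rather than printed so the result can be checked.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Dir { NONE, DIAG, UP, LEFT };

// Traceback table b for LCS(V, W); ties between UP and LEFT go to UP.
std::vector<std::vector<Dir>> lcs_traceback(const std::string& v,
                                            const std::string& w) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    std::vector<std::vector<Dir>> b(n + 1, std::vector<Dir>(m + 1, NONE));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (v[i - 1] == w[j - 1]) { s[i][j] = s[i - 1][j - 1] + 1; b[i][j] = DIAG; }
            else if (s[i - 1][j] >= s[i][j - 1]) { s[i][j] = s[i - 1][j]; b[i][j] = UP; }
            else { s[i][j] = s[i][j - 1]; b[i][j] = LEFT; }
        }
    }
    return b;
}

// Recurse first, then emit v_i on DIAG steps, so the LCS comes out
// left to right.
void print_lcs(const std::vector<std::vector<Dir>>& b, const std::string& v,
               std::size_t i, std::size_t j, std::string& out) {
    if (i == 0 || j == 0) return;
    if (b[i][j] == DIAG) {
        print_lcs(b, v, i - 1, j - 1, out);
        out += v[i - 1];  // v_i
    } else if (b[i][j] == UP) {
        print_lcs(b, v, i - 1, j, out);
    } else {
        print_lcs(b, v, i, j - 1, out);
    }
}
```

With these tie-breaking rules, calling print_lcs on the table for V = ATCTGAT, W = TGCATA with i = 7, j = 6 produces TCTA.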
Extend LCS to Global Alignment
$$
s_{i,j} = \max \begin{cases} s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}
$$
- $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the extend gap penalty
- $\delta(v_i, w_j)$ = score for a match or mismatch; it can be fixed, or taken from PAM or BLOSUM
- Modify the LCS and PRINT-LCS algorithms to support global alignment (on-board discussion)
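One possible modification of LCS for global alignment, as a sketch: gap and mismatch scores enter the recurrence, and the first row and column are no longer zero, because a leading run of spaces costs sigma per space. Assumptions: fixed match and mismatch scores passed as parameters (a PAM or BLOSUM lookup could replace them), a linear gap penalty sigma, '-' for spaces, and a traceback re-derived from the score table instead of a stored traceback matrix. global_align is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns s[n][m]; the aligned rows are written to top and bottom.
int global_align(const std::string& v, const std::string& w,
                 int match, int mismatch, int sigma,
                 std::string& top, std::string& bottom) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) s[i][0] = s[i - 1][0] - sigma;
    for (std::size_t j = 1; j <= m; ++j) s[0][j] = s[0][j - 1] - sigma;
    auto delta = [&](std::size_t i, std::size_t j) {  // delta(v_i, w_j)
        return v[i - 1] == w[j - 1] ? match : mismatch;
    };
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            s[i][j] = std::max({s[i - 1][j] - sigma,             // UP: v_i over a space
                                s[i][j - 1] - sigma,             // LEFT: space over w_j
                                s[i - 1][j - 1] + delta(i, j)});  // DIAG
    // Walk back from (n, m) to (0, 0), choosing any move that explains s[i][j].
    top.clear();
    bottom.clear();
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && s[i][j] == s[i - 1][j - 1] + delta(i, j)) {
            top += v[i - 1]; bottom += w[j - 1]; --i; --j;
        } else if (i > 0 && s[i][j] == s[i - 1][j] - sigma) {
            top += v[i - 1]; bottom += '-'; --i;
        } else {
            top += '-'; bottom += w[j - 1]; --j;
        }
    }
    std::reverse(top.begin(), top.end());  // rows were built right to left
    std::reverse(bottom.begin(), bottom.end());
    return s[n][m];
}
```

With match = 1, mismatch = -1 and sigma = 1, aligning ACGT with AGT gives score 2 and the rows ACGT / A-GT.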
Extend to Local Alignment
$$
s_{i,j} = \max \begin{cases} 0 & \text{(no negative scores)} \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}
$$
- $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the extend gap penalty
- $\delta(v_i, w_j)$ = score for a match or mismatch; it can be fixed, or taken from PAM or BLOSUM
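For local alignment the table gets a 0 floor, row 0 and column 0 stay 0, and the answer is the largest cell anywhere rather than s(n, m). A score-only sketch, assuming fixed match/mismatch scores and a linear gap penalty sigma; local_align_score is a hypothetical name. It also reports the cell (end_i, end_j) where the best local alignment ends, which is where a traceback would start, running back until it reaches a 0 cell.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int local_align_score(const std::string& v, const std::string& w,
                      int match, int mismatch, int sigma,
                      std::size_t& end_i, std::size_t& end_j) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    end_i = end_j = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int diag = s[i - 1][j - 1] + (v[i - 1] == w[j - 1] ? match : mismatch);
            // The 0 term lets a local alignment start fresh at any cell.
            s[i][j] = std::max({0, s[i - 1][j] - sigma, s[i][j - 1] - sigma, diag});
            if (s[i][j] > best) {  // keep the first cell holding the maximum
                best = s[i][j];
                end_i = i;
                end_j = j;
            }
        }
    }
    return best;
}
```

For GGACGTCC and TTACGTAA with match = 1, mismatch = -1 and sigma = 1, the best local score is 4 (the shared ACGT), ending at cell (6, 6).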
Discussion on Adding Affine Gap Penalties
- Affine gap penalty: the score for a gap of length x is $-(\rho + \sigma x)$
- where $\rho > 0$ is the insert (gap-opening) penalty and $\sigma > 0$ is the extend gap penalty
- On-board example from http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
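Why the opening penalty matters, as a worked comparison writing ρ for the insert (opening) penalty and σ for the extend penalty: one gap of length 3 versus three separate gaps of length 1 covering the same three positions.

```latex
\begin{aligned}
\text{one gap of length } 3: &\quad -(\rho + 3\sigma) \\
\text{three gaps of length } 1: &\quad 3 \cdot \bigl(-(\rho + \sigma)\bigr) = -(3\rho + 3\sigma) \\
\text{difference:} &\quad -(\rho + 3\sigma) - \bigl(-(3\rho + 3\sigma)\bigr) = 2\rho > 0
\end{aligned}
```

So for any ρ > 0 the single longer gap scores higher, whereas with a purely linear penalty (ρ = 0) the two options tie.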
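Alignment with Gap Penalties
Applies to both global and local algorithms (for local, add 0 to the maximum for $s_{i,j}$). Two extra tables track alignments ending in a gap: $s^{\downarrow}_{i,j}$ ends with $v_i$ over a space (a gap in W), and $s^{\rightarrow}_{i,j}$ ends with a space over $w_j$ (a gap in V). Here $\rho$ is the insert penalty and $\sigma$ the extend penalty.
$$
s^{\downarrow}_{i,j} = \max \begin{cases} s^{\downarrow}_{i-1,j} - \sigma \\ s_{i-1,j} - (\rho + \sigma) \end{cases}
$$
$$
s^{\rightarrow}_{i,j} = \max \begin{cases} s^{\rightarrow}_{i,j-1} - \sigma \\ s_{i,j-1} - (\rho + \sigma) \end{cases}
$$
$$
s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) \\ s^{\downarrow}_{i,j} \\ s^{\rightarrow}_{i,j} \end{cases}
$$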
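A score-only sketch of the three-table recurrences for global alignment, assuming fixed match/mismatch scores and integer penalties rho (insert) and sigma (extend); affine_global_score and its parameter names are hypothetical. The table down holds alignments ending with v_i over a space, right those ending with a space over w_j, and a large negative sentinel stands in for minus infinity where a table has no valid entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Global alignment score with gap penalty -(rho + sigma * x) for length x.
int affine_global_score(const std::string& v, const std::string& w,
                        int match, int mismatch, int rho, int sigma) {
    using Table = std::vector<std::vector<int>>;
    const int NEG = std::numeric_limits<int>::min() / 4;  // "minus infinity" with headroom
    const std::size_t n = v.size(), m = w.size();
    Table s(n + 1, std::vector<int>(m + 1, 0));
    Table down(n + 1, std::vector<int>(m + 1, NEG));
    Table right(n + 1, std::vector<int>(m + 1, NEG));
    for (std::size_t i = 1; i <= n; ++i)  // leading gap of length i in W
        s[i][0] = down[i][0] = -(rho + sigma * static_cast<int>(i));
    for (std::size_t j = 1; j <= m; ++j)  // leading gap of length j in V
        s[0][j] = right[0][j] = -(rho + sigma * static_cast<int>(j));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Either extend an open gap (-sigma) or open a new one (-(rho + sigma)).
            down[i][j] = std::max(down[i - 1][j] - sigma, s[i - 1][j] - (rho + sigma));
            right[i][j] = std::max(right[i][j - 1] - sigma, s[i][j - 1] - (rho + sigma));
            int diag = s[i - 1][j - 1] + (v[i - 1] == w[j - 1] ? match : mismatch);
            s[i][j] = std::max({diag, down[i][j], right[i][j]});
        }
    }
    return s[n][m];
}
```

With match = 1, mismatch = -1, rho = 2 and sigma = 1, ACGT against AT scores -2: two matches plus a single gap of length 2 costing -(2 + 2). A local variant would add 0 to the maximum for s and return the largest cell.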
Implementing a Global Alignment Program in C++
- Keep it simple (e.g., without classes or structures):
  - Score matrix
  - Traceback matrix
- Simple algorithm:
  1. Read in two sequences
  2. Compute the score and traceback matrices (modified LCS)
  3. Print the alignment score = score[n][m]
  4. Print each aligned sequence (modified PRINT-LCS) using the traceback matrix
- For debugging, the score and traceback matrices can also be printed
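The four steps above can be sketched without user-defined classes or structures, using only std::string and std::vector. Assumptions, none of them from the slide: the scoring constants MATCH, MISMATCH and GAP, the 'D'/'U'/'L' characters in the traceback matrix, the preference order DIAG, then UP, then LEFT on ties, and the function names fill_matrices, trace, print_matrices and print_alignment are all hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Assumed scoring constants (hypothetical): fixed match/mismatch, linear gap.
const int MATCH = 1;
const int MISMATCH = -1;
const int GAP = 1;  // each space scores -GAP

// Step 2: score matrix plus traceback matrix of 'D', 'U', 'L'. Returns score[n][m].
int fill_matrices(const std::string& v, const std::string& w,
                  std::vector<std::vector<int>>& score,
                  std::vector<std::string>& tb) {
    const std::size_t n = v.size(), m = w.size();
    score.assign(n + 1, std::vector<int>(m + 1, 0));
    tb.assign(n + 1, std::string(m + 1, 'L'));  // row 0: only LEFT moves
    for (std::size_t i = 1; i <= n; ++i) {
        score[i][0] = score[i - 1][0] - GAP;
        tb[i][0] = 'U';                         // column 0: only UP moves
    }
    for (std::size_t j = 1; j <= m; ++j) score[0][j] = score[0][j - 1] - GAP;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int d = score[i - 1][j - 1] + (v[i - 1] == w[j - 1] ? MATCH : MISMATCH);
            int u = score[i - 1][j] - GAP;
            int l = score[i][j - 1] - GAP;
            if (d >= u && d >= l) { score[i][j] = d; tb[i][j] = 'D'; }
            else if (u >= l)      { score[i][j] = u; tb[i][j] = 'U'; }
            else                  { score[i][j] = l; tb[i][j] = 'L'; }
        }
    }
    return score[n][m];
}

// Step 4: modified PRINT-LCS. Recurse to (0, 0) first, then append one column.
void trace(const std::vector<std::string>& tb, const std::string& v,
           const std::string& w, std::size_t i, std::size_t j,
           std::string& top, std::string& bottom) {
    if (i == 0 && j == 0) return;
    if (tb[i][j] == 'D') {
        trace(tb, v, w, i - 1, j - 1, top, bottom);
        top += v[i - 1]; bottom += w[j - 1];
    } else if (tb[i][j] == 'U') {
        trace(tb, v, w, i - 1, j, top, bottom);
        top += v[i - 1]; bottom += '-';
    } else {
        trace(tb, v, w, i, j - 1, top, bottom);
        top += '-'; bottom += w[j - 1];
    }
}

// Debugging aid: each score row followed by its traceback row.
void print_matrices(const std::vector<std::vector<int>>& score,
                    const std::vector<std::string>& tb, std::ostream& out) {
    for (std::size_t i = 0; i < score.size(); ++i) {
        for (int x : score[i]) out << x << ' ';
        out << "  " << tb[i] << '\n';
    }
}

// Steps 1 to 4 together. Returns false if two sequences cannot be read.
bool print_alignment(std::istream& in, std::ostream& out) {
    std::string v, w;
    if (!(in >> v >> w)) return false;                 // step 1
    std::vector<std::vector<int>> score;
    std::vector<std::string> tb;
    int best = fill_matrices(v, w, score, tb);         // step 2
    out << "Score: " << best << '\n';                  // step 3
    std::string top, bottom;
    trace(tb, v, w, v.size(), w.size(), top, bottom);  // step 4
    out << top << '\n' << bottom << '\n';
    return true;
}
```

A main() could simply call print_alignment(std::cin, std::cout). For the input ACGT AGT it prints the score 2 followed by the rows ACGT and A-GT.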