LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003.

Presentation transcript:


Overview
- Recursion
  - Recursive solution to hydrophobicity sliding window problem
- LCS
- Smith-Waterman Algorithm
- Extensions to LCS
  - Global Alignment
  - Local Alignment
  - Affine Gap Penalties
- Programming Workshop 6

Project References
- _alignments.html
- Computational Molecular Biology: An Algorithmic Approach, Pavel Pevzner
- Introduction to Computational Biology: Maps, Sequences, and Genomes, Michael Waterman
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield

Recursion
- Problems can be solved iteratively or recursively
- Recursion is useful in cases where you are building upon a partial solution
- Consider the hydrophobicity problem

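Main.cpp

    #include <cctype>
    #include <iostream>
    #include <string>
    using namespace std;
    #include "hydro.h"

    // Hydrophobicity value per residue, indexed by (letter - 'A') for 'A'..'Y'
    double hydro[25] = {1.8,0,2.5,-3.5,-3.5,2.8,-0.4,-3.2,4.5,0,-3.9,3.8,1.9,-3.5,0,
                        -1.6,-3.5,-4.5,-0.8,-0.7,0,4.2,-0.9,0,-1.3};

    int main() {
        string seq;
        int ws;
        cout << "This program will compute the hydrophobicity of a sequence of amino acids.\n";
        cout << "Enter the sequence: ";      // prompt wording lost in the transcript; assumed
        cin >> seq;
        // Convert lowercase residues to uppercase so they index hydro[] correctly
        for (size_t i = 0; i < seq.size(); i++)
            seq[i] = toupper(static_cast<unsigned char>(seq[i]));
        cout << "Enter the window size: ";   // prompt wording lost in the transcript; assumed
        cin >> ws;
        compute_hydro(seq, ws);
        return 0;
    }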

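Hydro.cpp

    #include <iostream>
    #include <string>
    using namespace std;
    #include "hydro.h"

    void print_hydro(string seq, int ws, int i, double sum);

    void compute_hydro(string seq, int ws) {
        cout << "\n\nThe hydrophobicity values are:" << endl;
        print_hydro(seq, ws, seq.size() - 1, 0);
    }

    // Recurses from the last residue (i = n-1) down to i = 0, carrying the
    // running window sum. Values are printed as the recursion unwinds, so the
    // output runs from the first window to the last.
    void print_hydro(string seq, int ws, int i, double sum) {
        int n = seq.size();   // signed copy avoids unsigned wraparound in the comparisons
        if (i == -1)
            return;
        if (i >= n - ws)      // still filling the last window (indices n-ws .. n-1)
            sum += hydro[seq[i] - 'A'];
        else                  // slide left: drop residue i+ws, add residue i
            sum = sum - hydro[seq[i + ws] - 'A'] + hydro[seq[i] - 'A'];
        print_hydro(seq, ws, i - 1, sum);
        if (i <= n - ws)
            cout << "Hydrophobicity value:\t" << sum / ws << endl;
    }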

hydro.h

    #include <string>

    extern double hydro[25];
    void compute_hydro(std::string seq, int ws);
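For comparison with the recursive print_hydro in Hydro.cpp, the same sliding-window averages can also be computed iteratively. The following is a minimal sketch and not the course's solution: the name window_averages and its table parameter are hypothetical, and it returns the averages instead of printing them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the slides): returns the mean hydrophobicity
// of every window of length ws, from the first window to the last.
// `table` is indexed by (letter - 'A'), like the hydro[] array in Main.cpp.
std::vector<double> window_averages(const std::string& seq, int ws,
                                    const std::vector<double>& table) {
    assert(ws > 0);
    std::vector<double> out;
    const int n = static_cast<int>(seq.size());
    if (ws > n) return out;              // no complete window fits
    double sum = 0.0;
    for (int i = 0; i < ws; ++i)         // sum of the first window
        sum += table.at(seq[i] - 'A');
    out.push_back(sum / ws);
    for (int i = ws; i < n; ++i) {       // slide: add the new residue, drop the oldest
        sum += table.at(seq[i] - 'A') - table.at(seq[i - ws] - 'A');
        out.push_back(sum / ws);
    }
    return out;
}
```

Both versions do the same amount of arithmetic per window. The recursive one carries the running sum as an argument, while the iterative one keeps it in a local variable.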

Dynamic Programming
- Applied to optimization problems
- Useful when
  - Problem can be recursively divided into sub-problems
  - Sub-problems are not independent

Longest Common Subsequence (LCS) Problem
- Reference: Pevzner
- Can have insertions and deletions but no substitutions (no mismatches)
- Ex:
    V:   ATCTGAT
    W:   TGCATA
    LCS: TCTA

LCS Problem (cont.)
Similarity score:
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j} \\
s_{i,j-1} \\
s_{i-1,j-1} + 1, & \text{if } v_i = w_j
\end{cases}
$$
On board example: Pevzner Fig 6.1

Indels: insertions and deletions (e.g., gaps) in the alignment of V and W
- V = rows of similarity matrix (vertical axis)
- W = columns of similarity matrix (horizontal axis)
- Space (gap) in W → (UP) insertion
- Space (gap) in V → (LEFT) deletion
- Match (no mismatch in LCS) → (DIAG)

LCS(V,W) Algorithm

    for i = 0 to n
        s_{i,0} = 0
    for j = 1 to m
        s_{0,j} = 0
    for i = 1 to n
        for j = 1 to m
            if v_i = w_j
                s_{i,j} = s_{i-1,j-1} + 1;  b_{i,j} = DIAG
            else if s_{i-1,j} >= s_{i,j-1}
                s_{i,j} = s_{i-1,j};  b_{i,j} = UP
            else
                s_{i,j} = s_{i,j-1};  b_{i,j} = LEFT

Print-LCS(V,i,j)

    if i = 0 or j = 0
        return
    if b_{i,j} = DIAG
        PRINT-LCS(V, i-1, j-1)
        print v_i
    else if b_{i,j} = UP
        PRINT-LCS(V, i-1, j)
    else
        PRINT-LCS(V, i, j-1)
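The LCS(V,W) and Print-LCS pseudocode can be sketched in C++ as follows. This is one possible rendering under stated assumptions, not the workshop solution: the function name lcs is hypothetical, s and b are local rather than global matrices, and the traceback is written as a loop that collects the characters instead of a recursive routine that prints them.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum Dir { NONE, DIAG, UP, LEFT };

// Hypothetical function: fills s and b as in LCS(V,W), then walks the
// backtracking pointers from (n, m) as Print-LCS does.
std::string lcs(const std::string& v, const std::string& w) {
    const size_t n = v.size(), m = w.size();
    // Row 0 and column 0 stay at 0, matching the initialization loops.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    std::vector<std::vector<Dir>> b(n + 1, std::vector<Dir>(m + 1, NONE));
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            if (v[i - 1] == w[j - 1]) {              // v_i = w_j (strings are 0-based)
                s[i][j] = s[i - 1][j - 1] + 1; b[i][j] = DIAG;
            } else if (s[i - 1][j] >= s[i][j - 1]) {
                s[i][j] = s[i - 1][j];         b[i][j] = UP;
            } else {
                s[i][j] = s[i][j - 1];         b[i][j] = LEFT;
            }
        }
    }
    // Traceback: characters are collected back to front, then reversed.
    std::string rev;
    size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (b[i][j] == DIAG)    { rev.push_back(v[i - 1]); --i; --j; }
        else if (b[i][j] == UP) { --i; }
        else                    { --j; }
    }
    assert(rev.size() == static_cast<size_t>(s[n][m]));
    return std::string(rev.rbegin(), rev.rend());
}
```

With the example sequences V = ATCTGAT and W = TGCATA, this sketch returns TCTA. Other tie-breaking rules in the UP/LEFT choice could return a different subsequence of the same length.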

Classic Papers
- Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, 1970. ( ssicArticlesArchive/needlemanandwunsch1970.pdf)
- Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, 1981. ( sicArticlesArchive/smithandwaterman1981.pdf)
- Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, 1990. ( iclesArchive/smith1990.pdf)

Smith-Waterman (1 of 3)
Algorithm
The two molecular sequences will be $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$. A similarity $s(a,b)$ is given between sequence elements $a$ and $b$. Deletions of length $k$ are given weight $W_k$. To find pairs of segments with high degrees of similarity, we set up a matrix $H$. First set $H_{k0} = H_{0l} = 0$ for $0 \le k \le n$ and $0 \le l \le m$. Preliminary values of $H$ have the interpretation that $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$, respectively. These values are obtained from the relationship
$$
H_{ij} = \max\Big\{ H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \Big\} \tag{1}
$$
for $1 \le i \le n$ and $1 \le j \le m$.

Smith-Waterman (2 of 3)
The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.
(1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i, b_j)$.
(2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$.
(3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$. (typo in paper)
(4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.

Smith-Waterman (3 of 3)
The pair of segments with maximum similarity is found by first locating the maximum element of H. The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of H equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure to the second largest element of H not associated with the first traceback.
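As an illustration of relationship (1), the sketch below fills H for arbitrary similarity and gap-weight functions and returns its maximum element, which is where the traceback would start. It is a hedged sketch only: the name smith_waterman_max is hypothetical, s and W are supplied by the caller, gap weights are assumed to be non-negative penalties, and the traceback itself is omitted. It follows the formula directly, so it takes O(nm(n+m)) time.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical function: returns the maximum element of H from equation (1).
// s(x, y) is the similarity of two elements; W(k) is the weight of a
// deletion of length k (assumed non-negative here).
double smith_waterman_max(const std::string& a, const std::string& b,
                          const std::function<double(char, char)>& s,
                          const std::function<double(int)>& W) {
    assert(W(1) >= 0.0);
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    // H[k][0] = H[0][l] = 0 comes from the zero initialization.
    std::vector<std::vector<double>> H(n + 1, std::vector<double>(m + 1, 0.0));
    double best = 0.0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            // Case (1): a_i and b_j are associated; case (4): floor at zero.
            double h = std::max(0.0, H[i - 1][j - 1] + s(a[i - 1], b[j - 1]));
            // Case (2): a_i ends a deletion of length k.
            for (int k = 1; k <= i; ++k) h = std::max(h, H[i - k][j] - W(k));
            // Case (3): b_j ends a deletion of length l.
            for (int l = 1; l <= j; ++l) h = std::max(h, H[i][j - l] - W(l));
            H[i][j] = h;
            best = std::max(best, h);
        }
    }
    return best;
}
```

For example, with s = +1 for a match and -1 for a mismatch and an illustrative weight W(k) = 0.5 + 0.25k, aligning AAACCC against AAAGGCCC gives 5: six matches minus one deletion of length 2.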

Extend LCS to Global Alignment
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j) \\
s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
$$
- $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the fixed gap penalty
- $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
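A minimal C++ sketch of the global alignment recurrence follows. Several things here are assumptions rather than part of the slides: the function name global_score, the use of fixed match and mismatch scores in place of a PAM or BLOSUM lookup, and the boundary condition, which charges the fixed gap penalty (called sigma in the code) for every leading gap.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function: global alignment score with a fixed gap penalty.
// match/mismatch stand in for a scoring matrix; sigma is the gap penalty.
int global_score(const std::string& v, const std::string& w,
                 int match, int mismatch, int sigma) {
    assert(sigma >= 0);
    const int n = static_cast<int>(v.size());
    const int m = static_cast<int>(w.size());
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    // Assumed boundary: a prefix aligned entirely against gaps costs sigma per gap.
    for (int i = 1; i <= n; ++i) s[i][0] = -sigma * i;
    for (int j = 1; j <= m; ++j) s[0][j] = -sigma * j;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            const int d = (v[i - 1] == w[j - 1]) ? match : mismatch;
            s[i][j] = std::max({s[i - 1][j] - sigma,     // v_i against a gap
                                s[i][j - 1] - sigma,     // w_j against a gap
                                s[i - 1][j - 1] + d});   // v_i against w_j
        }
    }
    return s[n][m];
}
```

Setting match = 1, mismatch = 0 and sigma = 0 reduces the score to the LCS length, which is one way to see global alignment as an extension of LCS.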

Extend to Local Alignment
$$
s_{i,j} = \max \begin{cases}
0 & \text{(no negative scores)} \\
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j) \\
s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
$$
- $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the fixed gap penalty
- $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
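The local variant differs from the global one only in the zero option and in where the answer is read. The sketch below is illustrative: the function name local_score and the fixed match and mismatch scores are assumptions, and it returns only the best score (the largest cell in the matrix), not the aligned segments.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function: best local alignment score with a fixed gap
// penalty sigma and fixed match/mismatch scores (stand-ins for PAM/BLOSUM).
int local_score(const std::string& v, const std::string& w,
                int match, int mismatch, int sigma) {
    assert(sigma >= 0);
    const int n = static_cast<int>(v.size());
    const int m = static_cast<int>(w.size());
    // Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            const int d = (v[i - 1] == w[j - 1]) ? match : mismatch;
            s[i][j] = std::max({0,                       // restart: no negative scores
                                s[i - 1][j] - sigma,     // v_i against a gap
                                s[i][j - 1] - sigma,     // w_j against a gap
                                s[i - 1][j - 1] + d});   // v_i against w_j
            best = std::max(best, s[i][j]);              // answer is the largest cell
        }
    }
    return best;
}
```

For instance, XXACGTXX and YYACGTYY share only the segment ACGT, so with match = 1, mismatch = -1 and sigma = 2 the best local score is 4.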

Discussion on adding affine gap penalties
- Affine gap penalty
  - Score for a gap of length $x$: $-(\rho + \sigma x)$
  - where $\rho > 0$ is the insert gap penalty and $\sigma > 0$ is the extend gap penalty

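Alignment with Gap Penalties
Can apply to global or local (w/ zero) algorithms
$$
s^{\downarrow}_{i,j} = \max \begin{cases}
s^{\downarrow}_{i-1,j} - \sigma \\
s_{i-1,j} - (\rho + \sigma)
\end{cases}
$$
$$
s^{\rightarrow}_{i,j} = \max \begin{cases}
s^{\rightarrow}_{i,j-1} - \sigma \\
s_{i,j-1} - (\rho + \sigma)
\end{cases}
$$
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j-1} + \delta(v_i, w_j) \\
s^{\downarrow}_{i,j} \\
s^{\rightarrow}_{i,j}
\end{cases}
$$
Note: the arrows follow the traversal order in Figure 6.1, with $\downarrow$ marking a gap reached from row $i-1$ and $\rightarrow$ marking a gap reached from column $j-1$.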
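The three-matrix scheme for gap penalties can be sketched for the global case as below. This is an assumption-laden illustration: the function name affine_global_score, the fixed match and mismatch scores, the boundary values and the large negative sentinel NEG are all choices made for this sketch. Here rho is the insert (gap-open) penalty and sigma the extend penalty, so a gap of length x costs rho + sigma * x.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function: global alignment score with affine gap penalties.
// down[i][j]  = best score ending with v_i against a gap (reached from row i-1)
// right[i][j] = best score ending with w_j against a gap (reached from column j-1)
int affine_global_score(const std::string& v, const std::string& w,
                        int match, int mismatch, int rho, int sigma) {
    assert(rho >= 0 && sigma >= 0);
    const int NEG = -1000000000;   // sentinel for "not reachable"
    const int n = static_cast<int>(v.size());
    const int m = static_cast<int>(w.size());
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, NEG));
    std::vector<std::vector<int>> down = s, right = s;
    s[0][0] = 0;
    // Assumed boundary: a leading gap of length x costs rho + sigma * x.
    for (int i = 1; i <= n; ++i) down[i][0] = s[i][0] = -(rho + sigma * i);
    for (int j = 1; j <= m; ++j) right[0][j] = s[0][j] = -(rho + sigma * j);
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            const int d = (v[i - 1] == w[j - 1]) ? match : mismatch;
            down[i][j]  = std::max(down[i - 1][j] - sigma,          // extend gap
                                   s[i - 1][j] - (rho + sigma));    // open gap
            right[i][j] = std::max(right[i][j - 1] - sigma,         // extend gap
                                   s[i][j - 1] - (rho + sigma));    // open gap
            s[i][j] = std::max({s[i - 1][j - 1] + d, down[i][j], right[i][j]});
        }
    }
    return s[n][m];
}
```

With match = 1, mismatch = -1, rho = 2 and sigma = 1, aligning AAACCC with AAAGGCCC scores 2: six matches minus one gap of length 2 (cost 4). Two separate gaps of length 1 would cost 6, which is why the single longer gap is preferred.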

Programming Workshop 6
- Implement LCS
  - LCS(V,W)
    - b and s are global matrices
  - Print-LCS(V,i,j)
- Write a program that uses LCS and Print-LCS. The program should prompt the user for 2 sequences and print the longest common subsequence.