 -globin ( 141) and  -globin (146) V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS--H---GSAQVKGHGKKVADAL VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF.

Slides:



Advertisements
Similar presentations
Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
Advertisements

Multiple Sequence Alignment
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
6/11/2015 © Bud Mishra, 2001 L7-1 Lecture #7: Local Alignment Computational Biology Lecture #7: Local Alignment Bud Mishra Professor of Computer Science.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
Sequence Alignment Storing, retrieving and comparing DNA sequences in Databases. Comparing two or more sequences for similarities. Searching databases.
Heuristic alignment algorithms and cost matrices
Sequence Alignment Algorithms in Computational Biology Spring 2006 Edited by Itai Sharon Most slides have been created and edited by Nir Friedman, Dan.
CPM '05 Sensitivity Analysis for Ungapped Markov Models of Evolution David Fernández-Baca Department of Computer Science Iowa State University (Joint work.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Dynamic Programming and Biological Sequence Comparison Part I.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Phylogeny Tree Reconstruction
Recap 3 different types of comparisons 1. Whole genome comparison 2. Gene search 3. Motif discovery (shared pattern discovery)
Aligning Alignments Exactly By John Kececioglu, Dean Starrett CS Dept. Univ. of Arizona Appeared in 8 th ACM RECOME 2004, Presented by Jie Meng.
Sequence analysis of nucleic acids and proteins: part 1 Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Bioinformatics Workshop, Fall 2003 Algorithms in Bioinformatics Lawrence D’Antonio Ramapo College of New Jersey.
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Protein Sequence Comparison Patrice Koehl
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey.
Class 2: Basic Sequence Alignment
15-853:Algorithms in the Real World
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
. Sequence Alignment I Lecture #2 This class has been edited from Nir Friedman’s lecture. Changes made by Dan Geiger, then Shlomo Moran. Background Readings:
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Trees, Stars, and Multiple Biological Sequence Alignment Jesse Wolfgang CSE 497 February 19, 2004.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Amino Acid Scoring Matrices Jason Davis. Overview Protein synthesis/evolution Protein synthesis/evolution Computational sequence alignment Computational.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Chapter 3 Computational Molecular Biology Michael Smith
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Approaches to Sequence Analysis s2s2 s3s3 s4s4 s1s1 statistics GT-CAT GTTGGT GT-CA- CT-CA- Parsimony, similarity, optimisation. Data {GTCAT,GTTGGT,GTCA,CTCA}
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space Author: Azzedine Boukerche, Jan M. Correa, Alba.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
DNA, RNA and protein are an alien language
Sequence Comparison I519 Introduction to Bioinformatics, Fall 2012.
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
Aligning Genomes Genome Analysis, 12 Nov 2007 Several slides shamelessly stolen from Chr. Storm.
INTRODUCTION TO BIOINFORMATICS
The ideal approach is simultaneous alignment and tree estimation.
Week 36: Trees, Alignment & Database Search
Sequence Alignment 11/24/2018.
Computational Biology Lecture #6: Matching and Alignment
SMA5422: Special Topics in Biotechnology
Computational Biology Lecture #6: Matching and Alignment
Overview Pairwise Alignment Again Triple – Quadruple - Many
CSE 589 Applied Algorithms Spring 1999
Multiple Sequence Alignment (I)
Optimisation Alignment (60 minutes)
Computational Genomics Lecture #3a
Space-Saving Strategies for Analyzing Biomolecular Sequences
Presentation transcript:

 -globin ( 141) and  -globin (146) V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS--H---GSAQVKGHGKKVADAL VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF TNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR SDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH Optimisation Alignment. (60 minutes) Determines homology at residue/nucleotide level. Current Topics in Computational Molecular Biology Chapter Chapter It often matches functional region with functional region Similarity/Distance between molecules can be evaluated Molecular Evolution studies. Homology/Non-homology depends on it.

Number of alignments, T(n,m) T(n,m) is the number of alignments of s1[1,n] and s2[1,m] then T(n,m)=T(n-1,m)+T(n,m-1)+T(n-1,m-1) T(0,0)=1 T(n,m) > 3 min(n,m) Alignments columns are equivalent to step (0,1), (1,0) and (1,1) in a [0,n][0,m] matrix. Thus alignment by alignment search for best alignment is not realistic. If n- -n n- is equivalent to then alignments are equivalent to choosing two subsets of s1 and and s2 that has to be matched, thus T G T T C T A G G

D i,j =min{D i-1,j-1 + d(s1[i],s2[j]), D i,j-1 + g, D i-1,j +g} Parsimony Alignment of two strings. {CTAG,TTG} AL = Sequences: s1=CTAGG s2=TTGT. 5, indels (g) 10. Cost Additivity Basic operations: transitions 2 (C-T & A-G), transversions 5, indels (g) 10. CTAG CTA G = + TT-G TT- G {CTA,TT} AL + GG (A) {CTA,TTG} AL + G- (B) {CTAG,TT} AL + -G (C) Min [] Initial condition: D 0,0 =0. (D i,j := D(s1[1:i], s2[1:j]))

T G T T C T A G G CTAGG Alignment: i v Cost 17 TT-GT

Complexity of Accelerations of pairwise algorithm. Heuristic acceleration: Smaller band & larger acceleration, but no guarantee of optimum.  { Dynamical Programming: (n+1)(m+1)3=O(nm) Exact acceleration (Ukkonen,Myers). Assume all events cost 1. If d  (s 1,s 2 ) <2  +|l 1 -l 2 |, then d(s 1,s 2 )= d  (s 1,s 2 Backtracking: O(n+m) Recursion without memory: T(n,m) > 3 min(n,m)

Caveat: There are enormous numbers of suboptimal alignments. Close-to-Optimum Alignments (Waterman & Byers, 1983) Alignments within  of optimal Ex.  = * 17 T * / G * / T / T / C T A G G i i v g Cost 19 T T G T -

Sets of positions that are on some suboptimal alignment. Alignments within  of optimal. Ex.  = 2 Hirschberg & Close-to-Optimum Alignments (Hirschberg, 1975). 40/50 32/40 22/30 14/20 9/10 17/0 T 30/40 22/30 12/25 4/15 12/5 22/10 G 20/35 12/25 2/15 12/5 22/10 32/20 T 10/25 2/15 10/15 20/15 30/20 40/30 T 0/17 10/15 20/20 30/25 40/30 50/40 C T A G G Mid point: (3,2) and the alignment problem is then reduced to 2 smaller alignment problems: (CTA + TT) and (GG + GT)

Longer Indels (i,j) (i-2,j) (i,j-1) (i,j-2) Comment: Evolutionary Consistency Condition: g i + g j > g i+j (i-1,j) (i-1,j- 1) TCATGGTACCGTTAGCGT GCA GCAT g k :cost of indel of length k (for instance 10 + log k) Initial condition: D 0,0 =0 D i,j = min { D i-1,j-1 + d(s1[i],s2[j]), D i,j-1 + g 1,D i,j-2 + g 2,, D i-1,j + g 1,D i-2,j + g 2,, } Cubic running time. Quadratic memory.

40/ / / /0.9 9/ /2.9 T 30/ / /-2.1 4/ /2.9 22/-7.2 G 20/ /-7.1 2/8.0 12/ / /-22.3 T 10/ /3.0 10/ / / /-37.4 T 0/0 10/ / / / /-50.5 C T A G G Distance-Similarity (Smith-Waterman-Fitch,1982) S i,j =max{S i-1,j-1 + s(s1[i],s2[j]), S i,j-1 - w, S i-1,j –w} Similarity Distance s(n1,n2) M - d(n1,n2) w 1/(2*M) + g Similarity: Transversions:0 Transitions:3 Identity:5 Indels: /10 Distance: Transitions:2 Transversions 5 Identity 0 Indels:10. M largest dist (5) 1. The Switch from Dist to Sim is highly analogous to Maximizing {- f(x)} instead of Minimizing {f(x)}. 2. Dist will based on a metric: i. d(x,x) =0, ii. d(x,y) >=0, iii. d(x,y) = d(y,x) & iv. d(x,z) + d(z,y) >= d(x,y). There are no analogous restrictions on Sim, giving it a larger parameter space.

Local alignment Smith,Waterman (1981 Global Alignment: S i,j =max{D i-1,j-1 + s(s1[i],s2[j]), S i,j-1 -w, S i-1,j -w} Local: S i,j =max{D i-1,j-1 + s(s1[i],s2[j]), S i,j-1 -w, S i-1,j -w,0} Score Parameters: C Match: 1 A Mismatch -1/3 G / Gap 1 + k/3 C / GCC-UCG U / GCCAUUG A ! C / C / G / U A A C A G C C U C G C U U

Alignment of three sequences. s1=ATCG s2=ATGCC s3=CTCC Consensus sequence: ATCC Alignment: AT-CG ATGCC CT-CC C A A ? Configurations in an alignment column: - - n n n - n - - n - n - n n - n n n n - Initial condition: D 0,0,0 = 0. Recursion: D i,j,k = min{D i-i', j-j', k-k' + d(i,i',j,j',k,k')} Running time : l 1 *l 2 *l 3 *(2 3 -1) Memory requirement: l 1 *l 2 *l 3 New phenomena: ancestral/consensus sequence. AACAAC

G G C C Parsimony Alignment of four sequences s1=ATCG s2=ATGCC s3=CTCC s4=ACGCG Configurations in alignment columns: n n n n - n n n n n - n n - n - - n - n n n - - n - - n - n - n - n n - n n - n n n - - n n n n - n - Alignment: AT-CG ATGCC CT-CC ACGCG Initial condition: D 0 = 0. Memory : l 1 *l 2 *l 3 *l 4 New Phenomena: Cost and alignment is phylogeny dependent GCCGGCCG Computation time: l 1 *l 2 *l 3 *l 4 *2 4 Memory : l 1 *l 2 *l 3 *l 4 Recursion: D i = min{D i-∆ + d(i,∆)} ∆ [{0,1} 4 \{0} 4 ]

Alignment of many sequences. s1=ATCG, s2=ATGCC, , sn=ACGCG Alignment: AT-CG ATGCC..... ACGCG Configurations in an alignment column: 2 n -1 Recursion: D i =min{D i-∆ + d(i,∆)} ∆ [{0,1} n \{0} n ] Initial condition: D 0,0,..0 = 0. Computation time: l n *(2 n -1)*n Memory requirement: l n (l:sequence length, n:number of sequences) s1 s3 s4 \ ! / / \ s2 s5

Sodh Sodb Sodl sddm Sdmz sodsSdpb Progressive Alignment (Feng-Doolittle 1987 J.Mol.Evol.) Can align alignments and given a tree make a multiple alignment. * * alkmny-trwq acdeqrt akkmdyftrwq acdehrt kkkmemftrwq [ P(n,q) + P(n,h) + P(d,q) + P(d,h) + P(e,q) + P(e,h)]/6 * * *** * * * * * * Sodh atkavcvlkgdgpqvqgsinfeqkesdgpvkvwgsikglte-glhgfhvhqfg----ndtagct sagphfnp lsrk Sodb atkavcvlkgdgpqvqgtinfeak-gdtvkvwgsikglte—-glhgfhvhqfg----ndtagct sagphfnp lsrk Sodl atkavcvlkgdgpqvqgsinfeqkesdgpvkvwgsikglte-glhgfhvhqfg----ndtagct sagphfnp lsrk Sddm atkavcvlkgdgpqvq -infeak-gdtvkvwgsikglte—-glhgfhvhqfg----ndtagct sagphfnp lsrk Sdmz atkavcvlkgdgpqvq— infeqkesdgpvkvwgsikglte—glhgfhvhqfg----ndtagct sagphfnp Lsrk Sods vatkavcvlkgdgpqvq— infeak-gdtvkvwgsikgltepnglhgfhvhqfg----ndtagct sagphfnp lsrk Sdpb datkavcvlkgdgpqvq—-infeqkesdgpv----wgsikgltglhgfhvhqfgscasndtagctvlggssagphfnpehtnk

Summary Comparison of 2 Strings Minimize Distance-Maximize Similarity Dynamical Programming Algorithm Local alignment Close-to-Optimal Solutions Comparison of many Strings Simultaneous Phylogeny and Alignment