Sequence Comparison: Introduction (Homology vs. Analogy)


Sequence Comparison: Introduction
- Comparison: homology vs. analogy; identity vs. similarity; pairwise vs. multiple
- Scoring matrices
- Gap vs. indel
- Global vs. local
- Manual alignment: dot plot, visual inspection
- Dynamic programming: Needleman-Wunsch (exhaustive global alignment), Smith-Waterman (exhaustive local alignment)
- Multiple alignment
- Database search: BLAST, FASTA

Multiple sequence alignment (MSA): applications
- Extrapolation: assignment of an uncharacterized sequence to a protein family.
- Phylogenetic analysis: reconstruction of the history of closely related proteins and protein families.
- Pattern identification: identification of regions characteristic of a function through their conserved positions.
- Domain identification: turning an MSA into a domain-specific or protein-family-specific profile can help identify new or remote family members.
- DNA regulatory elements: turning a DNA MSA of a binding site into a weight matrix allows other DNA sequences to be scanned for similar potential binding sites.
- Structure prediction: good MSAs yield high-quality secondary-structure predictions and help in building 3D models.
- PCR analysis: the less degenerate regions of a protein family are useful for fishing out new members by PCR (primer design).
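The DNA regulatory element application can be sketched as follows. This is a minimal example, assuming equal-length, gap-free aligned binding sites, a pseudocount of 1 per base, and a uniform background frequency of 0.25; the sites and the scanned sequence are hypothetical, and `weight_matrix` and `scan` are illustrative names rather than functions of any particular package.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix (log2 of observed/background) from aligned DNA sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for i in range(width):
        column = [site[i].upper() for site in sites]
        total = len(column) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns a list of (start, score) pairs."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        hits.append((start, sum(matrix[i][base] for i, base in enumerate(window))))
    return hits


# Hypothetical aligned binding sites and a hypothetical sequence to scan.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
matrix = weight_matrix(sites)
best_start, best_score = max(scan("GGCTATAATGCC", matrix), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

The pseudocount keeps bases that never occur in the sites from receiving a score of minus infinity; in practice a score threshold, not just the single best window, would be used to report candidate sites.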

Multiple sequence alignment: computational complexity
[Figure: dynamic-programming lattice for a multiple alignment of short protein sequences]
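To see where the cost comes from, here is a minimal sketch of exact dynamic programming for three sequences. It assumes a sum-of-pairs column score with a toy scheme (match +1, mismatch -1, gap -2, gap against gap 0) instead of a real substitution matrix such as BLOSUM; `sp_score` and `align3` are illustrative names. Every cell of the (|x|+1)(|y|+1)(|z|+1) lattice considers 2^3 - 1 = 7 predecessors; for k sequences of length n this generalizes to about n^k cells times 2^k - 1 moves, which is why the running time explodes as sequences are added.

```python
from itertools import product


def sp_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; a gap facing a gap scores 0."""
    total = 0
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            a, b = column[i], column[j]
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def align3(x, y, z):
    """Exact global alignment of three sequences by 3-D dynamic programming."""
    # Each move says which sequences contribute a residue (1) or a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]  # 7 moves
    score = {(0, 0, 0): 0}
    back = {}
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for k in range(len(z) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best, best_move = float("-inf"), None
                for di, dj, dk in moves:
                    prev = (i - di, j - dj, k - dk)
                    if min(prev) < 0:
                        continue
                    column = (x[i - 1] if di else "-",
                              y[j - 1] if dj else "-",
                              z[k - 1] if dk else "-")
                    s = score[prev] + sp_score(column)
                    if s > best:
                        best, best_move = s, (di, dj, dk)
                score[(i, j, k)] = best
                back[(i, j, k)] = best_move
    # Trace back from the far corner to rebuild the alignment columns.
    columns = []
    i, j, k = len(x), len(y), len(z)
    while (i, j, k) != (0, 0, 0):
        di, dj, dk = back[(i, j, k)]
        columns.append((x[i - 1] if di else "-",
                        y[j - 1] if dj else "-",
                        z[k - 1] if dk else "-"))
        i, j, k = i - di, j - dj, k - dk
    columns.reverse()
    rows = ["".join(col[r] for col in columns) for r in range(3)]
    return score[(len(x), len(y), len(z))], rows


total, rows = align3("VSNS", "SNA", "ANS")  # short hypothetical sequences
print(total)
print("\n".join(rows))
```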

Multiple sequence alignment: computational complexity
Alignment of protein sequences of 200 amino acids using dynamic programming:
Number of sequences | CPU time (approx.)
2 | 1 s
4 | 10^4 s (about 2.8 hours)
5 | 10^6 s (about 11.6 days)
6 | 10^8 s (about 3.2 years)
7 | 10^10 s (about 317 years)

Multiple sequence alignment: approximate methods
- Multidimensional dynamic programming (MSA, Lipman 1988)
- Progressive alignment (ClustalW, Higgins 1996; PileUp, Genetics Computer Group (GCG))
- Local alignment (e.g. DiAlign, Morgenstern 1996; many others)
- Iterative methods (e.g. PRRP, Gotoh 1996)
- Statistical methods (e.g. Bayesian hidden Markov models)

Multiple sequence alignment: programs
[Figure: classification of MSA programs. Categories shown: multidimensional dynamic programming, progressive, tree-based, non-tree-based, iterative, genetic algorithms (GAs), and hidden Markov models (HMMs). Programs shown: MSA, Clustal, DCA, T-Coffee, Combalign, DiAlign, OMA, Interalign, PRRP, SAGA, SAM, HMMER.]

Multiple sequence alignment: program comparison
Program | Sequence type | Alignment | Method | Comment
ClustalW | Protein/DNA | Global | Progressive | No format limitation; also runs on Windows
PileUp | Protein/DNA | Global | Progressive | Limited by the format; UNIX based
MultAlin | Protein/DNA | Global | Progressive/iterative | Limited by the format
T-COFFEE | Protein/DNA | Global/local | Progressive | Can be slow

Multiple sequence alignment: ClustalW (X)
ClustalW uses a progressive algorithm: instead of aligning all sequences at once, it adds them little by little.
1. All sequences to be aligned are compared pairwise.
2. Clustering by similarity produces a dendrogram (guide tree).
3. Following the dendrogram topology, ClustalW first aligns the most similar pairs.
4. Each alignment is replaced by a consensus sequence and aligned further as if it were a single sequence.
Because ClustalW treats multiple alignments like single sequences and aligns them progressively, two by two, alignment errors made early in the procedure propagate through the whole MSA.

Multiple sequence alignment: ClustalW (X), principle
1. Pairwise alignment of all sequence pairs (1+2, 1+3, 1+4, 2+3, 2+4, 3+4).
2. Construction of a guide tree from the pairwise results.
3. Multiple alignment, built by adding sequences in the order given by the guide tree.

Multiple sequence alignment: ClustalW (X), pairwise comparison of all sequences
For five sequences, all 10 pairs are compared: 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5, 3:4, 3:5, 4:5.
The similarity score of every pair is converted into a distance score for that pair.
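The pairwise stage can be sketched like this, assuming a plain Needleman-Wunsch global alignment with a toy linear scoring scheme (match +1, mismatch -1, gap -2) and a distance defined as 1 minus the fraction of identical residues among aligned, gap-free positions. ClustalW's own pairwise scoring (substitution matrices, affine gap penalties, a fast k-tuple mode) is more elaborate; the sequences and the helper names `global_align`, `distance` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def sub(p, q):
        return match if p == q else mismatch

    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - identity, counted over positions where neither sequence has a gap."""
    row_a, row_b = global_align(a, b)
    pairs = [(p, q) for p, q in zip(row_a, row_b) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """Distances for every pair of named sequences (n*(n-1)/2 alignments)."""
    D = {name: {other: 0.0 for other in seqs} for name in seqs}
    for p, q in combinations(seqs, 2):
        D[p][q] = D[q][p] = distance(seqs[p], seqs[q])
    return D


seqs = {"1": "GTCCGCAGG", "2": "TTCGCCGG", "3": "TTACTTCCAGG",
        "4": "ATCTCAAT", "5": "CTGTCCCTAG"}  # hypothetical input
D = distance_matrix(seqs)
for p, q in combinations(seqs, 2):
    print(f"{p}:{q} {D[p][q]:.2f}")
```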

Multiple sequence alignment: ClustalW (X), distance matrix and guide tree
[Figure: distance matrix for sequences 1 to 5 and the guide tree derived from it]
The distance matrix displays the distances of all sequence pairs.

Multiple sequence alignment: ClustalW (X), guide tree
[Figure: guide tree; leaf order 1, 5, 2, 3, 4]
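A guide tree can be built from such a distance matrix by repeated clustering. The sketch below uses UPGMA because it is short; ClustalW itself builds its guide tree with the neighbour-joining method, so treat this as a stand-in rather than ClustalW's algorithm. The input is a hypothetical list of names plus a list-of-lists distance matrix; the tree comes back as nested tuples and as a Newick topology string without branch lengths (the .dnd files ClustalW writes also carry branch lengths). `upgma` and `to_newick` are illustrative names.

```python
def upgma(names, dm):
    """UPGMA clustering; dm[i][j] is the distance between names[i] and names[j]."""
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    dist = {(i, j): dm[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k not in (i, j):
                d_ik = dist[(min(i, k), max(i, k))]
                d_jk = dist[(min(j, k), max(j, k))]
                # size-weighted average distance from the merged cluster to k
                dist[(k, new)] = (d_ik * sizes[i] + d_jk * sizes[j]) / (sizes[i] + sizes[j])
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
    return next(iter(trees.values()))


def to_newick(tree):
    """Nested tuples -> Newick topology string (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


names = ["A", "B", "C", "D"]  # hypothetical sequences
dm = [[0, 2, 8, 8],
      [2, 0, 8, 8],
      [8, 8, 0, 4],
      [8, 8, 4, 0]]
tree = upgma(names, dm)
print(to_newick(tree) + ";")  # prints ((A,B),(C,D));
```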


Multiple sequence alignment: ClustalW (X), adding a sequence to an alignment
Existing alignment and new sequence:
GTCCG-CAGG
TT-CGCC-GG
TTACTTCCAGG
The existing gaps are kept, and new gaps are inserted:
GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG
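Adding one sequence to an existing alignment can be sketched as a Needleman-Wunsch alignment of the sequence against the alignment's columns. This is a simplified sum-of-pairs version, assuming toy scores (match +1, mismatch -1, gap -2, gap against gap 0) and no sequence weighting; ClustalW's real profile alignment uses weighted substitution matrices and position-specific gap penalties. Existing columns are never split or edited, so old gaps are kept, and a gap needed in the alignment is added as a new all-gap column. `add_to_alignment` and its helpers are illustrative names.

```python
def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of two characters; a gap facing a gap scores 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_score(column, char):
    """Sum of pair scores between one new character and every row of a column."""
    return sum(pair_score(c, char) for c in column)


def add_to_alignment(rows, seq):
    """Align seq to an existing alignment given as a list of equal-length rows."""
    columns = ["".join(col) for col in zip(*rows)]
    new_gap_column = "-" * len(rows)
    n, m = len(columns), len(seq)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + column_score(columns[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + column_score(new_gap_column, seq[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]),  # residue under a column
                S[i - 1][j] + column_score(columns[i - 1], "-"),            # gap in the new sequence
                S[i][j - 1] + column_score(new_gap_column, seq[j - 1]),     # new gap column
            )
    out_columns, out_chars = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]):
            out_columns.append(columns[i - 1])
            out_chars.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + column_score(columns[i - 1], "-"):
            out_columns.append(columns[i - 1])
            out_chars.append("-")
            i -= 1
        else:
            out_columns.append(new_gap_column)
            out_chars.append(seq[j - 1])
            j -= 1
    out_columns.reverse()
    out_chars.reverse()
    new_rows = ["".join(col[r] for col in out_columns) for r in range(len(rows))]
    return new_rows + ["".join(out_chars)]


alignment = ["GTCCG-CAGG", "TT-CGCC-GG"]
out = add_to_alignment(alignment, "TTACTTCCAGG")
for row in out:
    print(row)
```

The exact placement of the new gaps depends on the scoring scheme, so this toy version need not reproduce the figure column for column.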

Multiple sequence alignment: ClustalW (X), aligning two alignments
Alignment of three sequences:
GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG
Alignment of two further sequences:
ATCT--CAAT
CTGTCCCTAG
Combined alignment (a new gap column is inserted into the second alignment):
GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG
ATC-T--CAAT
CTG-TCCCTAG

Multiple sequence alignment: ClustalW (X), output
[Figure labels: loops, core]
CLUSTAL W (1.74) multiple sequence alignment

sp|P20472|PRVA_HUMAN EDIKKAVGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFILKG
sp|P32848|PRVA_MOUSE EDIKKAIGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSILKG
sp|P18087|PRVA_RANCA GDISKAVEAFAAPDS--FNHKKFFEMCG------LKSKGPDVMKQVFGILDQDRSGFIEEDELCLMLKG
sp|P02629|PRVA_LATCH EDIDKALNTFKEAGS--FDHHKFFNLVG------LKGKPDDTLKEVFGILDQDKSGYIEEEELKFVLKG
sp|P02616|PRVB_AMPME KDIEAALSSVKAAES--FNYKTFFTKCG------LAGKPTDQVKKVFDILDQDKSGYIEEDELQLFLKN
sp|P51879|ONCO_MOUSE DDIAAALQECQDPDT--FEPQKFFQTSG------LSKMSASQLKDIFQFIDNDQSGYLDEDELKYFLQR
sp|P56503|PRVB_MERBI ADVAAALKACEAADS--FNYKAFFAKVG------LTAKSADDIKKAFFVIDQDKSGFIEEDELKLFLQV
sp|P59747|PRVB_SCOJP AEVTAALDGCKAAGS--FDHKKFFKACG------LSGKSTDEVKKAFAIIDQDKSGFIEEEELKLFLQN
sp|P02620|PRVB_MERME ADITAALAACKAEGS--FKHGEFFTKIG------LKGKSAADIKKVFGIIDQDKSDFVEEDELKLFLQN
sp|P02630|PRVA_RAJCL ADITKALEQCAAG----FHHTAFFKASG------LSKKSDAELAEIFNVLDGDQSGYIEVEELKNFLKC
sp|P02586|TPCS_RABIT EELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEIFR-
:: : :. *: : . * .:* : ..::: :** .:: *

* A star indicates an entirely conserved column.
: A colon indicates columns where all residues have roughly the same size and hydropathy.
. A period indicates columns where the size or the hydropathy has been preserved in the course of evolution.
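The three symbols can be computed column by column. This sketch assumes the "strong" and "weak" residue groups listed in the code, which follow the groups documented for Clustal W/X (verify them against the documentation of your Clustal version), upper-case sequences, and that a column containing any gap receives no symbol. The example rows are the first six columns of four of the rows above; the function names are illustrative.

```python
# Residue groups, assumed to match those documented for Clustal W/X.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
               "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # columns with a gap get no symbol in this sketch
    if len(residues) == 1:
        return "*"  # entirely conserved column
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall into one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall into one weak group
    return " "


def conservation_line(rows):
    """Build the conservation line for a list of equal-length aligned rows."""
    return "".join(conservation_symbol(column) for column in zip(*rows))


rows = ["EDIKKA", "EDIKKA", "GDISKA", "KDIEAA"]
print(repr(conservation_line(rows)))
```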


Multiple sequence alignment: ClustalW (X), input sequences (FASTA format)
>hso
MNTWKEAIGQEKQQPYFQHILQQVQQARQSGRTIYPPQEEVFSAFRLTEFDQVRVVILGQDPYHGV
NQAHGLAFSVKPGIAPPPSLVNIYKELSTDIMGFQTPSHGYLVGWAKQGVLLLNTVLTVEQGLAHSHANF
GWETFTDRVIHVLNEQRDHLVFLLWGSHAQKKGQFIDRTKHCVLTSPHPSPLSAHRGFFGCRHFSKTNQY
LRHHNLTEINWQLPMTI
>pmu
MKTWKDVIGTEKTQPYFKHILDQVHQARASGKIVYPPPQEVFSAFQLTEFEAVKVVIIGQDPYHGPNQAH
GLAFSVKPGVVPPPSLMNMYKELTQDIEGFQIPNHGYLVPWAEQGVLLLNTVLTVEQGKAHSHASFGWET
FTDRVIAALNAQREKLVFLLWGSHAQKKGQFIDRQKHCVFTAPHPSPLSAHRGFLGCRHFSKTNAYLMAQ
GLSPIQWQLASL
>hdu
MNSWTEAIGEEKVQPYFQQLLQQVYQARASGKIIYPPQHEVFSAFALTDFKAVKVVILGQDPYHGPNQAH
GLAFSVKPSVVPPPSLVNIYKELAQDIAGFQVPSHGYLIDWAKQGVLLLNTVLTVQQGMAHSHATLGWEI
FTDKVIAQLNDHRENLVFLLWGSHAQKKGQFINRSRHCVLTAPHPSPLSAHRGFFGCQHFSKANAYLQSK
GIATINWQLPLVV
>apl
MNNWTEALGEEKQQPYFQHILQQVHQERMNGVTVFPPQKEVFSAFALTEFKDVKVVILGQDPYHGPNQAH
GLAFSVKPPVAPPPSLVNMYKELAQDVEGFQIPNHGYLVDWAKQGVLLLNTVLTVRQGQAHSHANFGWEI
FTDKVIAQLNQHRENLVFLLWGSHAQKKGQFIDRSRHCVLTAPHPSPLSAYRGFFGCKHFSKTNRYLLSK
GIAPINWQLRLEIDY
>hin
MKNWTDVIGTEKAQPYFQHTLQQVHLARASGKTIYPPQEDVFNAFKYTAFEDVKVVILGQDPYHGPNQAH
GLAFSVKPEVAIPPSLLNIYKELTQDISGFQMPSNGYLVKWAEQGVLLLNTVLTVERGMAHSHANLGWER
FTDKVIAVLNEHREKLVFLLWGSHAQKKGQMIDRTRHLVLTAPHPSPLSAHRGFFGCRHFSKTNSYLESH
GIKPIDWQI
>sfl
MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPG
QAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLG
WETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWL
EQRGETPIDWMPVLPAECE
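Reading a FASTA file like the one above takes only a few lines. A minimal sketch, assuming that the sequence name is the first word after ">" and that wrapped sequence lines are simply concatenated; `parse_fasta` is an illustrative name (libraries such as Biopython provide full-featured parsers). The example input uses shortened, rewrapped fragments of the hin and sfl records.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA text; wrapped lines are joined."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = ">hin\nMKNWTDVIGTEK\nAQPYFQH\n>sfl\nMANELTWHDV\nLAEEKQQ\n"
seqs = parse_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq), seq)
```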


Multiple sequence alignment: ClustalW (X), Clustal file formats (vibrio.aln and vibrio.dnd)
vibrio.aln:
CLUSTAL X (1.81) multiple sequence alignment

hdu ----------------------MN---SWTEAIGEEKVQPYFQQLLQQVYQARASGKIIY
apl ----------------------MN---NWTEALGEEKQQPYFQHILQQVHQERMNGVTVF
hso ----------------------MN---TWKEAIGQEKQQPYFQHILQQVQQARQSGRTIY
pmu ----------------------MK---TWKDVIGTEKTQPYFKHILDQVHQARASGKIVY
hin ----------------------MK---NWTDVIGTEKAQPYFQHTLQQVHLARASGKTIY
sfl ----------------------MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIY
eco -----------------------ANELTWHDVLAEEKQQPHFLNTLQTVASERQSGVTIY
sen ----------------------MATELTWHDVLADEKQQPYFINTLHTVAGERQSGITVY
vvu ----------------------MTQQLTWHDVIGAEKEQSYFQQTLNFVEAERQAGKVIY
vpa ----------------------MNQSPTWHDVIGEEKKQSYFVDTLNFVEAERAAGKAIY
vch ----------------------MSESLTWHDVIGNEKQQAYFQQTLQFVESQRQAGKVIY
ype ----------------------MSPSLTWHDVIGQEKEQPYFKDTLAYVAAERRAGKTIY
vfi ----------------------MA--LTWNSIISAEKKKAYYQSMSEKIDAQRSLGKSIF
vsa ----------------------MN--TSWNDILETEKEKPYYQEMMTYINEARSQGKKIF
son --------------------------MTWPAFIDHQRTQPYYQQLIAFVNQERQVGKVIY
cbl --------------------MPK---LTWQLLLSQEKNLPYFKNIFTILNQQKKSGKIIY
bap --------------------MDNRTLLNWSSILKNEKKKYYFINIINHLFFERQK-KMIF
cbu -------------------MTTMAETQTWQTVLGEEKQEPYFQEILDFVKKERKAGKIIY
dra --MTDQPDLFGLAPDAPRPIIPANLPEDWQEALLPEFSAPYFHELTDFLRQERKE-YTIY
xax --MTE-------------GEGRIQLEPSWKARVGDWLLRPQMRELSAFLRQRKAAGARVF
xca --MTE-------------GEGRIQLEPSWKARVGEWLLQPQMQELSAFLRQRKAANARVF
xfa --MNEQGKAINSS-----AESRIQLESSWKAHVGNWLLRPEMRDLSSFLRARKVAGVSVY
pfl MTMTA--------------DDRIKLEPSWKEALRAEFDQPYMTELRTFLQQERAAGKEIY
psy --MTS--------------DDRIKLEPSWKEALRDEFEQPYMAQLREFLRQEHAAGKEIY
ppu --MTD--------------DDRIKLEPSWKAALRGEFDQPYMHQLREFLRGEYAAGKEIY
pae --MTDN-------------DDRIKLEASWKEALREEFDKPYMKQLGEFLRQEKAAGKAIF
avi --MGRV-------------EDRVRLEASWKEALHDEFEKPYMQELSDFLRREKAAGKEIY
mde --MQPN-------------GKHVQLCESWMQQIGQEFEQPYMAELKAFLLREKKAGKTIY
    * : : ::

vibrio.dnd (excerpt):
( hso:0.11940, hdu:0.08584, apl:0.08905) :0.03531) :0.00478, pmu:0.11739) :0.00668, hin:0.10800) :0.04106, sfl:0.00482, eco:0.00833) :0.03744, sen:0.05007) :0.11285, ype:0.12645, vvu:0.07310, vpa:0.07734) :0.03829, vch:0.09446) :0.02842) :0.00533) :0.01680) :0.01604,
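The .aln file above is easy to read back in. A minimal sketch, assuming the usual layout: a first line starting with CLUSTAL, blocks of "name aligned-sequence" lines separated by blank lines, and conservation lines that begin with whitespace; an optional trailing residue count on a sequence line is ignored. `parse_clustal` is an illustrative name, and the example is a shortened fragment of the hdu and apl rows.

```python
def parse_clustal(text):
    """Return {name: aligned sequence}, joining the blocks of a Clustal .aln file."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    alignment = {}
    for line in lines[1:]:
        if not line.strip() or line[0].isspace():
            continue  # blank separator or conservation line
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        alignment[name] = alignment.get(name, "") + chunk
    return alignment


example = """CLUSTAL X (1.81) multiple sequence alignment

hdu   --MN---SWTEA
apl   --MN---NWTEA
        **   .****

hdu   IGEEK
apl   LGEEK
      :****
"""
aln = parse_clustal(example)
for name, row in aln.items():
    print(name, row)
```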

Multiple sequence alignment: T-Coffee
T-Coffee uses a principle similar to that of ClustalW but yields more accurate alignments, at the cost of computing time. Like ClustalW, it builds a progressive alignment, but it first creates a library containing a complete collection of global (ClustalW) and local (Lalign) pairwise alignments, and thus compares segments across the entire data set.
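The idea of comparing segments across the whole data set can be illustrated with a simplified version of T-Coffee's library extension: the weight W(x, y) of a residue pair is increased by min(W(x, z), W(z, y)) for every residue z of a third sequence that the library aligns to both. This is a sketch under stated assumptions: residues are (sequence name, position) tuples, the primary weights are given directly (in T-Coffee they are derived from the pairwise global and local alignments), and the real program differs in many details. `pair_key` and `extend_library` are illustrative names and the library is hypothetical.

```python
from collections import defaultdict


def pair_key(a, b):
    """Store each residue pair once, in a fixed order."""
    return (a, b) if a <= b else (b, a)


def extend_library(primary):
    """Triplet extension: W'(x, y) = W(x, y) + sum over z of min(W(x, z), W(z, y))."""
    partners = defaultdict(dict)
    for (x, y), w in primary.items():
        partners[x][y] = w
        partners[y][x] = w
    extended = defaultdict(float, primary)
    for z, linked in partners.items():
        for x, w_xz in linked.items():
            for y, w_zy in linked.items():
                # x, y and z must come from three different sequences; count each pair once
                if x >= y or len({x[0], y[0], z[0]}) < 3:
                    continue
                extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


# Hypothetical primary library: residue 0 of sequences A, B and C.
a, b, c = ("A", 0), ("B", 0), ("C", 0)
primary = {pair_key(a, b): 80, pair_key(b, c): 60, pair_key(a, c): 50}
library = extend_library(primary)
print(library[pair_key(a, b)])
```

Pairs that are supported by many intermediate sequences gain weight, so the progressive stage favours alignments that are consistent with the whole data set rather than with one pairwise comparison.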


Multiple sequence alignment: T-Coffee, colored output
The colors grade from red (high-quality segments) through yellow and green to blue (regions that you have no reason to trust).