1
Sequence Comparison
Introduction
Comparison
  Homology -- Analogy
  Identity -- Similarity
  Pairwise -- Multiple
  Scoring matrices
  Gap -- indel
  Global -- Local
Manual alignment, dot plot (visual inspection)
Dynamic programming
  Needleman-Wunsch: exhaustive global alignment
  Smith-Waterman: exhaustive local alignment
Multiple alignment
Database search
  BLAST
  FASTA
2
Sequence Comparison Multiple alignment (Multiple sequence alignment: MSA)
Application: Procedure
Extrapolation: Allocation of an uncharacterized sequence to a protein family.
Phylogenetic analysis: Reconstruction of the history of closely related proteins and protein families.
Pattern identification: Identification of regions characteristic of a function by conserved positions.
Domain identification: Turning an MSA into a domain- or protein-family-specific profile may be useful in identifying new or remote family members.
DNA regulatory elements: Turning DNA MSAs of a binding site into a weight matrix that may be used to scan other DNA sequences for potential similar binding sites.
Structure prediction: Good MSAs yield high-quality predictions of secondary structure and help in building 3D models.
PCR analysis: Identifying the less degenerate regions of a protein family is useful for fishing out new members by PCR (primer design).
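The "DNA regulatory elements" entry can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the aligned binding sites are invented, and the uniform background of 0.25 per base and the pseudocount of 1 are assumptions chosen for simplicity.

```python
import math

def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned DNA sites."""
    matrix = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        total = len(column) + 4 * pseudocount
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix

def scan(sequence, matrix):
    """Score every window of the sequence; return (best_score, best_start)."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical aligned binding sites (illustrative only, not from the slides).
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
PWM = weight_matrix(SITES)

# Scan a made-up sequence for the best-scoring window.
best_score, best_start = scan("GGCGTATAATGCC", PWM)
```

Windows that resemble the aligned sites get positive log-odds scores, so the highest-scoring window is the most likely candidate binding site under these assumptions.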
3
Sequence Comparison Multiple alignment
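Multiple sequence alignment - Computational complexity
[Figure illustrating multidimensional dynamic programming on short sequences; labels as extracted: V S N S _ S N A A N S V S N S]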
4
Sequence Comparison Multiple alignment
Multiple sequence alignment - Computational complexity
Alignment of protein sequences with 200 amino acids using dynamic programming
# of sequences    CPU time (approx.)
                  sec
                  sec – 2.8 hours
                  sec – 11.6 days
                  sec – 3.2 years
                  sec – 371 years
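Why the CPU time explodes can be sketched with a little arithmetic: exhaustive dynamic programming over N sequences of length L fills a lattice of roughly (L + 1)^N cells. The snippet below is a hedged illustration only. `CELLS_PER_SECOND` is a hypothetical throughput, not a figure from the slide, so the printed times are not meant to reproduce the table.

```python
def lattice_cells(seq_length, n_sequences):
    """Number of cells in the N-dimensional dynamic programming lattice."""
    return (seq_length + 1) ** n_sequences

def human_time(seconds):
    """Express a duration in the largest convenient unit."""
    units = (("years", 365 * 24 * 3600), ("days", 24 * 3600), ("hours", 3600))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} sec"

# Assumption: a hypothetical machine that fills 40,000 lattice cells per second.
CELLS_PER_SECOND = 40_000

for n in range(2, 8):
    cells = lattice_cells(200, n)
    print(n, cells, human_time(cells / CELLS_PER_SECOND))
```

Each additional sequence multiplies the work by about 200 here. As a check on the unit conversion, `human_time(10**4)` gives "2.8 hours" and `human_time(10**6)` gives "11.6 days", the same conversions that appear on the slide.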
5
Sequence Comparison Multiple alignment Approximate methods for MSA
Multidimensional dynamic programming (MSA, Lipman 1988)
Progressive alignments (ClustalW, Higgins 1996; PileUp, Genetics Computer Group (GCG))
Local alignments (e.g. DiAlign, Morgenstern 1996; lots of others)
Iterative methods (e.g. PRRP, Gotoh 1996)
Statistical methods (e.g. Bayesian Hidden Markov Models)
6
Sequence Comparison Multiple alignment
Multiple sequence alignment - Programs
[Figure: classification of MSA programs. Category labels: Multidimensional dynamic programming; Progressive; Iterative (tree based, non tree based); GAs; HMMs. Programs shown: Clustal, DCA, T-Coffee, MSA, Combalign, Dalign, OMA, Interalign, Prrp, GA, SAGA, SAM, HMMER]
7
Sequence Comparison Multiple alignment
Multiple sequence alignment - Computational complexity
Program  | Seq type | Alignment    | Method                | Comment
ClustalW | Prot/DNA | Global       | Progressive           | No format limitation; runs on Windows too
PileUp   | Prot/DNA | Global       | Progressive           | Limited by the format; UNIX based
MultAlin | Prot/DNA | Global       | Progressive/Iterative | Limited by the format
T-COFFEE | Prot/DNA | Global/local | Progressive           | Can be slow
8
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
ClustalW uses a progressive algorithm: instead of aligning all sequences at once, it adds them step by step.
1. Pairwise comparison of all sequences to be aligned.
2. "Clustering by similarity", resulting in a dendrogram (the guide tree).
3. Following the dendrogram topology, ClustalW aligns the most similar pairs first.
4. Each alignment is replaced by a consensus sequence and aligned further as if it were a single sequence.
ClustalW treats multiple alignments like single sequences and aligns them progressively, two by two. Thus, alignment errors made early in the procedure propagate throughout the whole MSA.
9
Sequence Comparison Multiple alignment
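Multiple sequence alignment – ClustalW (X)
Principle: Pairwise Alignment -> Guide Tree -> Multiple Alignment by adding sequences
Pairwise alignments: 1 + 2, 1 + 3, 1 + 4, 2 + 3, 2 + 4, 3 + 4
[Figure: guide tree for sequences 1 to 4 and the stepwise multiple alignment built from it]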
10
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
Pairwise comparison of all sequences:
1 : 2, 1 : 3, 1 : 4, 1 : 5
2 : 3, 2 : 4, 2 : 5
3 : 4, 3 : 5
4 : 5
Similarity score of every pair -> distance score of every pair
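The step on this slide, turning a similarity score for every pair into a distance, can be sketched as follows. This is a minimal illustration rather than ClustalW's own scoring: it assumes the sequences already have equal (aligned) length and defines distance as 1 minus the fraction of identical positions. The names `s1` to `s5` and the toy sequences are invented.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of identical residues, ignoring columns that contain a gap."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0

def distance_matrix(seqs):
    """Return {(name_a, name_b): distance} for every unordered pair."""
    names = sorted(seqs)
    return {
        (p, q): 1.0 - identity(seqs[p], seqs[q])
        for p, q in combinations(names, 2)
    }

# Toy input (hypothetical sequences of equal length).
SEQS = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACGA",
    "s4": "TCGAACGA",
    "s5": "TTTTACGA",
}
DIST = distance_matrix(SEQS)
```

For N sequences this produces N(N-1)/2 entries, which is the ten pairs listed on the slide when N = 5.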
11
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
Distance Matrix: displays distances of all sequence pairs (sequences 1 to 5).
Guide Tree: built from the distance matrix (leaf order 1, 5, 2, 3, 4).
12
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
Guide Tree (leaf order 1, 5, 2, 3, 4)
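How a guide tree can be derived from pairwise distances is sketched below. Note the substitution: ClustalW itself builds its guide tree with the neighbor-joining method, whereas this example uses simple average-linkage (UPGMA-style) clustering because it is shorter. The distance values are hypothetical and were chosen only so that the resulting leaf order matches the one on the slide.

```python
from itertools import combinations

def cluster_distance(leaves_a, leaves_b, dist):
    """Average leaf-to-leaf distance between two clusters (average linkage)."""
    pairs = [(x, y) for x in leaves_a for y in leaves_b]
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

def guide_tree(names, dist):
    """Repeatedly merge the two closest clusters; return a nested tuple."""
    # Each cluster is (tree, leaves); a single sequence is its own tree.
    clusters = [(name, (name,)) for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], dist
            ),
        )
        (tree_i, leaves_i), (tree_j, leaves_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), leaves_i + leaves_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical distances for five sequences (not taken from the slides).
NAMES = ["1", "2", "3", "4", "5"]
DIST = {frozenset(pair): 0.8 for pair in combinations(NAMES, 2)}
DIST[frozenset(("1", "5"))] = 0.1
DIST[frozenset(("3", "4"))] = 0.2
DIST[frozenset(("2", "3"))] = 0.4
DIST[frozenset(("2", "4"))] = 0.4

TREE = guide_tree(NAMES, DIST)
```

The nested tuple gives the order in which a progressive aligner would combine sequences: the closest pairs are joined first and the root merge comes last.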
13
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
GTCCG-CAGG
TT-CGCC-GG

TTACTTCCAGG

GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG
14
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
GTCCG-CAGG
TT-CGCC-GG

TTACTTCCAGG

The two-sequence alignment is aligned with the third sequence, and new gaps are inserted.

GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG
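The gap insertion on this slide can be sketched as a single operation: when an existing alignment is treated as one unit, a new gap is added at the same column in every row. The function name is hypothetical, and the column index 6 was read off by comparing the slide's rows before and after the step. Choosing that column is the job of the alignment algorithm and is not shown here.

```python
def insert_gap_column(alignment, position):
    """Insert a gap at the same column in every row of an alignment."""
    return [row[:position] + "-" + row[position:] for row in alignment]

# The two-sequence alignment from the slide.
aligned_pair = ["GTCCG-CAGG", "TT-CGCC-GG"]
third_sequence = "TTACTTCCAGG"

# Widen the pair by one gap column so it matches the third sequence's length.
widened = insert_gap_column(aligned_pair, 6)
combined = widened + [third_sequence]
```

Because the gap goes into both rows at once, the residues that were aligned within the pair stay aligned, which is also why an early misalignment cannot be repaired later.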
15
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
GTCCG--CAGG
TT-CGC-C-GG
TTACTTCCAGG

ATCT--CAAT
CTGTCCCTAG

GTCCG--CAGG
TT-CGC-C-GG
ATC-T--CAAT
CTG-TCCCTAG
TTACTTCCAGG
16
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
(region labels in the figure: loops, core)

CLUSTAL W (1.74) multiple sequence alignment

sp|P20472|PRVA_HUMAN EDIKKAVGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFILKG
sp|P32848|PRVA_MOUSE EDIKKAIGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSILKG
sp|P18087|PRVA_RANCA GDISKAVEAFAAPDS--FNHKKFFEMCG------LKSKGPDVMKQVFGILDQDRSGFIEEDELCLMLKG
sp|P02629|PRVA_LATCH EDIDKALNTFKEAGS--FDHHKFFNLVG------LKGKPDDTLKEVFGILDQDKSGYIEEEELKFVLKG
sp|P02616|PRVB_AMPME KDIEAALSSVKAAES--FNYKTFFTKCG------LAGKPTDQVKKVFDILDQDKSGYIEEDELQLFLKN
sp|P51879|ONCO_MOUSE DDIAAALQECQDPDT--FEPQKFFQTSG------LSKMSASQLKDIFQFIDNDQSGYLDEDELKYFLQR
sp|P56503|PRVB_MERBI ADVAAALKACEAADS--FNYKAFFAKVG------LTAKSADDIKKAFFVIDQDKSGFIEEDELKLFLQV
sp|P59747|PRVB_SCOJP AEVTAALDGCKAAGS--FDHKKFFKACG------LSGKSTDEVKKAFAIIDQDKSGFIEEEELKLFLQN
sp|P02620|PRVB_MERME ADITAALAACKAEGS--FKHGEFFTKIG------LKGKSAADIKKVFGIIDQDKSDFVEEDELKLFLQN
sp|P02630|PRVA_RAJCL ADITKALEQCAAG----FHHTAFFKASG------LSKKSDAELAEIFNVLDGDQSGYIEVEELKNFLKC
sp|P02586|TPCS_RABIT EELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEIFR-
:: : :. *: : . * .:* : ..::: :** .::

* A star indicates an entirely conserved column.
: A colon indicates columns where all residues have roughly the same size and hydropathy.
. A period indicates columns where the size or the hydropathy has been preserved in the course of evolution.
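The star marker in the legend can be reproduced with a short column scan. This is a simplified sketch: it only emits `*` for fully conserved columns. The `:` and `.` markers depend on residue similarity groups that the slides do not list, so they are deliberately left out. The three rows are the small DNA alignment from the earlier ClustalW slide.

```python
def conservation_line(rows):
    """Return a marker line with '*' under every fully conserved column."""
    markers = []
    for column in zip(*rows):
        residues = set(column)
        # A column is conserved when all rows share one residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            markers.append("*")
        else:
            markers.append(" ")
    return "".join(markers)

ROWS = ["GTCCG--CAGG", "TT-CGC-C-GG", "TTACTTCCAGG"]
LINE = conservation_line(ROWS)

for row in ROWS:
    print(row)
print(LINE)
```

Printing the marker line under the rows gives the same visual cue as the last line of a Clustal alignment block, restricted to the star symbol.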
24
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
>hso
MNTWKEAIGQEKQQPYFQHILQQVQQARQSGRTIYPPQEEVFSAFRLTEFDQVRVVILGQDPYHGV
NQAHGLAFSVKPGIAPPPSLVNIYKELSTDIMGFQTPSHGYLVGWAKQGVLLLNTVLTVEQGLAHSHANF
GWETFTDRVIHVLNEQRDHLVFLLWGSHAQKKGQFIDRTKHCVLTSPHPSPLSAHRGFFGCRHFSKTNQY
LRHHNLTEINWQLPMTI
>pmu
MKTWKDVIGTEKTQPYFKHILDQVHQARASGKIVYPPPQEVFSAFQLTEFEAVKVVIIGQDPYHGPNQAH
GLAFSVKPGVVPPPSLMNMYKELTQDIEGFQIPNHGYLVPWAEQGVLLLNTVLTVEQGKAHSHASFGWET
FTDRVIAALNAQREKLVFLLWGSHAQKKGQFIDRQKHCVFTAPHPSPLSAHRGFLGCRHFSKTNAYLMAQ
GLSPIQWQLASL
>hdu
MNSWTEAIGEEKVQPYFQQLLQQVYQARASGKIIYPPQHEVFSAFALTDFKAVKVVILGQDPYHGPNQAH
GLAFSVKPSVVPPPSLVNIYKELAQDIAGFQVPSHGYLIDWAKQGVLLLNTVLTVQQGMAHSHATLGWEI
FTDKVIAQLNDHRENLVFLLWGSHAQKKGQFINRSRHCVLTAPHPSPLSAHRGFFGCQHFSKANAYLQSK
GIATINWQLPLVV
>apl
MNNWTEALGEEKQQPYFQHILQQVHQERMNGVTVFPPQKEVFSAFALTEFKDVKVVILGQDPYHGPNQAH
GLAFSVKPPVAPPPSLVNMYKELAQDVEGFQIPNHGYLVDWAKQGVLLLNTVLTVRQGQAHSHANFGWEI
FTDKVIAQLNQHRENLVFLLWGSHAQKKGQFIDRSRHCVLTAPHPSPLSAYRGFFGCKHFSKTNRYLLSK
GIAPINWQLRLEIDY
>hin
MKNWTDVIGTEKAQPYFQHTLQQVHLARASGKTIYPPQEDVFNAFKYTAFEDVKVVILGQDPYHGPNQAH
GLAFSVKPEVAIPPSLLNIYKELTQDISGFQMPSNGYLVKWAEQGVLLLNTVLTVERGMAHSHANLGWER
FTDKVIAVLNEHREKLVFLLWGSHAQKKGQMIDRTRHLVLTAPHPSPLSAHRGFFGCRHFSKTNSYLESH
GIKPIDWQI
>sfl
MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPG
QAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLG
WETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWL
EQRGETPIDWMPVLPAECE
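Input in this FASTA layout (a `>` header line followed by wrapped sequence lines) can be read with a few lines of Python. The sketch below is a minimal parser under simple assumptions: the first word after `>` is taken as the record name, and names are unique. The example text uses only the first residues of the `hso` and `pmu` records, shortened for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

# Shortened example: only the first residues of two records from the slide.
EXAMPLE = """>hso
MNTWKEAIGQ
EKQQPYFQHI
>pmu
MKTWKDVIGT
"""

SEQUENCES = parse_fasta(EXAMPLE)
```

The resulting dictionary maps each short name to one unbroken sequence string, which is the form most alignment code expects as input.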
26
Sequence Comparison Multiple alignment
Multiple sequence alignment – ClustalW (X)
Clustal file format: vibrio.aln

CLUSTAL X (1.81) multiple sequence alignment

hdu MN---SWTEAIGEEKVQPYFQQLLQQVYQARASGKIIY
apl MN---NWTEALGEEKQQPYFQHILQQVHQERMNGVTVF
hso MN---TWKEAIGQEKQQPYFQHILQQVQQARQSGRTIY
pmu MK---TWKDVIGTEKTQPYFKHILDQVHQARASGKIVY
hin MK---NWTDVIGTEKAQPYFQHTLQQVHLARASGKTIY
sfl MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIY
eco ANELTWHDVLAEEKQQPHFLNTLQTVASERQSGVTIY
sen MATELTWHDVLADEKQQPYFINTLHTVAGERQSGITVY
vvu MTQQLTWHDVIGAEKEQSYFQQTLNFVEAERQAGKVIY
vpa MNQSPTWHDVIGEEKKQSYFVDTLNFVEAERAAGKAIY
vch MSESLTWHDVIGNEKQQAYFQQTLQFVESQRQAGKVIY
ype MSPSLTWHDVIGQEKEQPYFKDTLAYVAAERRAGKTIY
vfi MA--LTWNSIISAEKKKAYYQSMSEKIDAQRSLGKSIF
vsa MN--TSWNDILETEKEKPYYQEMMTYINEARSQGKKIF
son MTWPAFIDHQRTQPYYQQLIAFVNQERQVGKVIY
cbl MPK---LTWQLLLSQEKNLPYFKNIFTILNQQKKSGKIIY
bap MDNRTLLNWSSILKNEKKKYYFINIINHLFFERQK-KMIF
cbu MTTMAETQTWQTVLGEEKQEPYFQEILDFVKKERKAGKIIY
dra MTDQPDLFGLAPDAPRPIIPANLPEDWQEALLPEFSAPYFHELTDFLRQERKE-YTIY
xax MTE GEGRIQLEPSWKARVGDWLLRPQMRELSAFLRQRKAAGARVF
xca MTE GEGRIQLEPSWKARVGEWLLQPQMQELSAFLRQRKAANARVF
xfa MNEQGKAINSS-----AESRIQLESSWKAHVGNWLLRPEMRDLSSFLRARKVAGVSVY
pfl MTMTA DDRIKLEPSWKEALRAEFDQPYMTELRTFLQQERAAGKEIY
psy MTS DDRIKLEPSWKEALRDEFEQPYMAQLREFLRQEHAAGKEIY
ppu MTD DDRIKLEPSWKAALRGEFDQPYMHQLREFLRGEYAAGKEIY
pae MTDN DDRIKLEASWKEALREEFDKPYMKQLGEFLRQEKAAGKAIF
avi MGRV EDRVRLEASWKEALHDEFEKPYMQELSDFLRREKAAGKEIY
mde MQPN GKHVQLCESWMQQIGQEFEQPYMAELKAFLLREKKAGKTIY
* : : ::

Clustal file format: vibrio.dnd

( hso: , hdu: , apl: ) : ) : , pmu: ) : , hin: ) : , sfl: , eco: ) : , sen: ) : , ype: , vvu: , vpa: ) : , vch: ) : ) : ) : ) : ,
27
Sequence Comparison Multiple alignment
Multiple sequence alignment – T-Coffee
T-Coffee uses a principle similar to that of ClustalW. It yields more accurate alignments at the cost of computing time.
It builds a progressive alignment as ClustalW does, but
  creates a library containing a complete collection of global (ClustalW) and local (Lalign) alignments, and thus
  compares segments across the entire data set.
29
Sequence Comparison Multiple alignment
Multiple sequence alignment – T-Coffee
Colour scale of the output: from RED (high-quality segments) through YELLOW and GREEN to BLUE (regions that you have no reason to trust).