Published by Silas Cooper. Modified over 8 years ago.
1
Copyright OpenHelix. No use or reproduction without express written consent
2
ClustalW using EBI Toolbox
Version 1
An Introduction to Multiple Sequence Alignments (MSA) using the alignment program ClustalW2 at the EBI Toolbox site
Materials prepared by: Steffen Schmidt, Ph.D. and Warren C. Lathe III, Ph.D.
www.openhelix.com
Updated: Q2 2011
3
ClustalW Using EBI Interface Agenda
- Introduction & Credits
- Background and Theory
- The EBI Toolbox Site
- Sequence alignment using ClustalW2
- Viewing the multiple sequence alignment
- Summary
- Exercises
ClustalW2: www.clustal.org
ClustalW2 EBI Toolbox: www.ebi.ac.uk/Tools/clustalw2
4
ClustalW Introduction
Multiple sequence alignments (MSA) are the basis of many bioinformatics analyses:
- molecular evolutionary analysis (phylogenetic trees)
- finding functionally important positions in a sequence family
- prediction of secondary and tertiary structure of proteins
Creation of a “correct” MSA is difficult:
- automatic tools often can be improved by human intervention
[Figure labels: MyoD from UniProt; smart.embl.de; PDB MyoD]
5
Literature and Software Sources
7
Theory: Multiple Sequence Alignment (MSA)
- Row: a sequence (protein or nucleotide)
- Column: “equivalent” positions in different sequences
- Gaps can be introduced to slide amino acids to the “correct” position
[Figure labels: sequences, “equivalent” positions, gaps]
8
Theory: Problem
“Equivalent” positions means “evolutionarily related”:
- What is the evolutionary history of the sequences in the alignment?
- How can the alignment be explained by a set of amino acid / nucleotide substitutions, insertions, and deletions?
The problem:
- We only know the sequences of today
- We need to make assumptions about the past
9
Theory: Parsimony
- Parsimony: the simplest explanation is the best
- Penalize events like insertions / deletions
10
Theory: Scoring Matrix
- Substitution of similar amino acids is more likely
Codons:
- Serine: AG(C/T), TC(N)
- Threonine: AC(N)
- Tryptophan: TGG
Probability of substitution:
- Serine / Threonine: frequent
- Serine / Tryptophan: rare
- Threonine / Tryptophan: rare
11
Theory: Substitution or Scoring Matrix
A scoring matrix contains two kinds of probabilities:
- how often an amino acid occurs at random (diagonal)
- how often a substitution occurs (derived from actual alignments)
Score = log2 (observed frequency of amino acid substitution / expected frequency of both amino acids)
(positive values: more common; negative values: less likely)
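The log-odds formula on this slide can be illustrated with a small sketch. It assumes that the "expected frequency of both amino acids" is the product of the two background frequencies; the function name and the frequencies below are hypothetical and chosen only to show how the sign of the score arises.

```python
import math

def log_odds_score(observed_pair_freq, freq_a, freq_b):
    """Score = log2(observed / expected).

    Assumption: the expected frequency of the pair is the product of the
    two background amino acid frequencies.
    """
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)

# Hypothetical frequencies, not taken from any published matrix.
frequent_pair = log_odds_score(0.125, 0.25, 0.25)    # seen more often than expected
rare_pair = log_odds_score(0.03125, 0.25, 0.25)      # seen less often than expected
print(frequent_pair, rare_pair)
```

A pair observed more often than chance gets a positive score, and a pair observed less often than chance gets a negative one.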
12
Theory: Pairwise Alignment
- Multiple sequence alignments are computationally too intensive: need for “shortcuts”
- Pairwise sequence alignments
  - scoring matrix
  - gap penalties
- Two kinds of pairwise sequence alignments:
global
MACMYFASTCAT
---MYFA-TCTT
local
MACMYFASTCAT-
M---YFA-TC-TT
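The difference between global and local pairwise alignment can be sketched with the classic dynamic programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). This is a simplified illustration, not ClustalW's implementation: it assumes a plain match/mismatch score instead of a scoring matrix and a single linear gap penalty, and all numeric values are hypothetical.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,   # align two residues
                       score[i - 1][j] + gap,        # gap in b
                       score[i][j - 1] + gap)        # gap in a
            if local:
                cell = max(cell, 0)                  # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

# The two sequences from the slide, without gap characters.
global_score = alignment_score("MACMYFASTCAT", "MYFATCTT")
local_score = alignment_score("MACMYFASTCAT", "MYFATCTT", local=True)
print(global_score, local_score)
```

The global score covers both complete sequences, while the local score reports only the best-matching stretch, so it is never lower than the global score under the same parameters.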
13
ClustalW Algorithm Overview
1. Pairwise alignment of all sequences against all
2. Create phylogenetic tree / guide tree
3. Progressively assemble alignment guided by the tree
[Figure: pairwise comparisons of four sequences (1-2, 1-3, 1-4, 2-3, 2-4, 3-4), guide tree, progressive alignment]
14
ClustalW Algorithm: Pairwise alignment
Pairwise alignment of all sequences against all:
- aligns the complete sequences (global alignment)
- uses scoring matrices to score similarity
- two types of gap penalties: gap opening & gap extension
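To make the two gap penalty types concrete, the sketch below scores an already gapped pairwise alignment. It assumes a simple convention in which the first position of a gap costs the opening penalty and each further position costs the extension penalty; the function name, the match/mismatch scoring, and all numeric values are hypothetical, not ClustalW defaults.

```python
def score_gapped_alignment(row_a, row_b, match=1, mismatch=-1,
                           gap_open=-10, gap_extend=-1):
    """Score two aligned rows of equal length with affine gap penalties."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    gap_in = None  # which row the current gap run is in: "a", "b", or None
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            owner = "a" if x == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening one.
            total += gap_extend if gap_in == owner else gap_open
            gap_in = owner
        else:
            total += match if x == y else mismatch
            gap_in = None
    return total

# The global alignment shown on the pairwise alignment slide.
slide_score = score_gapped_alignment("MACMYFASTCAT", "---MYFA-TCTT")
print(slide_score)
```

Because opening is much more expensive than extending, one long gap scores better than several short gaps of the same total length.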
15
ClustalW Algorithm: Guide Tree
Create phylogenetic tree / guide tree:
- uses the pairwise distance matrix computed above
- neighbor-joining
(Previous step: pairwise alignment of all sequences against all)
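One way to turn pairwise alignments into the distance matrix that the tree is built from is sketched below. It assumes distance = 1 - fraction of identical residues over the columns where neither sequence has a gap; the slides do not state the exact distance measure, so treat this definition and the helper names as assumptions.

```python
def identity_distance(row_a, row_b):
    """1 - fraction of identical residues over ungapped columns."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    identical = sum(1 for x, y in pairs if x == y)
    return 1.0 - identical / len(pairs)

def distance_matrix(pairwise_alignments, n):
    """Build a symmetric n x n matrix from {(i, j): (row_i, row_j)}."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), (row_i, row_j) in pairwise_alignments.items():
        matrix[i][j] = matrix[j][i] = identity_distance(row_i, row_j)
    return matrix

# Hypothetical pairwise alignments of three toy sequences.
toy_alignments = {
    (0, 1): ("MYFATCTT", "MYFATCAT"),
    (0, 2): ("MYFATCTT", "MY-ASCTT"),
    (1, 2): ("MYFATCAT", "MY-ASCTT"),
}
toy_matrix = distance_matrix(toy_alignments, 3)
print(toy_matrix)
```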
16
ClustalW Algorithm: Assembly
Progressively assemble alignment guided by the tree:
- each alignment is analyzed to build a profile, which is then merged with the profile of the other branch
- gaps introduced in an earlier alignment step will be kept
- gap penalties will be varied depending on:
  - sequence similarity
  - neighboring amino acids (individual scores)
  - hydrophilic stretches (prone to gaps)
  - previous gaps (extension allowed, new gaps penalized)
- scoring matrix varies depending on the estimated divergence
(Previous steps: pairwise alignment of all sequences against all; create phylogenetic tree / guide tree)
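The idea of summarizing an alignment as a profile, and of keeping earlier gaps when a new gap is needed, can be sketched as follows. This is an illustration under simplifying assumptions (plain per-column residue frequencies, no sequence weighting), and both helper names are hypothetical.

```python
from collections import Counter

def build_profile(alignment):
    """Per-column residue frequencies for aligned rows of equal length."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile

def insert_gap_column(alignment, pos):
    """Open a gap at the same position in every row of a sub-alignment.

    Existing gaps stay where they are; the new gap is added to all rows.
    """
    return [row[:pos] + "-" + row[pos:] for row in alignment]

toy_alignment = ["MYFA", "MYFS"]          # hypothetical two-sequence alignment
toy_profile = build_profile(toy_alignment)
widened = insert_gap_column(toy_alignment, 2)
print(toy_profile[3], widened)
```

When two branches are merged, whole profile columns are aligned to each other, so a gap placed in one branch applies to every sequence in that branch at once.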
17
ClustalW2: Improvements
ClustalW2 now offers a choice of tree-building method:
- neighbor joining (more accurate)
- UPGMA (faster, less accurate)
ClustalW2 refinement removes each sequence, re-aligns it, and tests whether the resulting alignment is better. Two possibilities:
a) “alignment”: aligning to the complete alignment (faster)
b) “tree”: aligning at each step of the alignment (more accurate)
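As a rough illustration of the simpler of the two tree methods, here is a minimal UPGMA sketch: repeatedly merge the two closest clusters and average their distances to the rest, weighted by cluster size. It is not ClustalW2's code; the function name, the nested-tuple tree representation, and the distances are assumptions for illustration.

```python
def upgma(names, dist):
    """Return a nested-tuple tree built by UPGMA from a distance matrix."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}   # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                      # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted average distance to the new cluster.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical distances between four sequences named "1".."4".
toy_dist = [
    [0.0, 0.6, 0.1, 0.6],
    [0.6, 0.0, 0.6, 0.2],
    [0.1, 0.6, 0.0, 0.6],
    [0.6, 0.2, 0.6, 0.0],
]
toy_tree = upgma(["1", "2", "3", "4"], toy_dist)
print(toy_tree)
```

The resulting tree gives the order of the progressive alignment: the closest sequences are aligned first, and the resulting groups are merged later.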
18
ClustalW: Summary
ClustalW is a “progressive multiple alignment method”:
- uses global pairwise alignments to create a phylogenetic tree
- stepwise assembly of the MSA guided by the tree
Drawback: the method heavily depends on the initial tree
- no guarantee that this tree is correct
- misaligned regions can’t be corrected later
You need to look critically at your alignment.
20
EBI Toolbox Overview
[Screenshot: http://www.ebi.ac.uk/; callout: Sequence Analysis]
21
EBI Toolbox for Sequence Analysis
[Screenshot callout: ClustalW2]
23
ClustalW2 Overview
[Screenshot callouts: Submit; upload file]
24
ClustalW2 sample query
[Screenshot callout: paste sequences]
>P02647|APOA1_HUMAN Apolipoprotein A-I precursor - Homo sapiens
MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGS
ALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAK
VQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHV
DALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ
GLLPVLESFKVSFLSALEEYTKKLNTQ
>Q00623|APOA1_MOUSE Apolipoprotein A-I precursor - Mus musculus
MKAVVLAVALVFLTGSQAWHVWQQDEPQSQWDKVKDFANVYVDAVKDSGRDYVSQFESSS
LGQQLNLNLLENWDTLGSTVSQLQERLGPLTRDFWDNLEKETDWVRQEMNKDLEEVKQKV
QPYLDEFQKKWKEDVELYRQKVAPLGAELQESARQKLQELQGRLSPVAEEFRDRMRTHVD
SLRTQLAPHSEQMRESLAQRLAELKSNPTLNEYHTRAKTHLKTLGEKARPALEDLRHSLM
PMLETLKTKAQSVIDKASETLTAQ
>Q9Z2L4|APOA1_MESAU Apolipoprotein A-I precursor - Mesocricetus auratus
MKTVVLAVAVLFLTGSQARHFWQRDDPQTPWDRVKDFATVYVDAVKDSGREYVSQFETSA
LGKQLNLNLLENWDTLGSTVGRLQEQLGPVTQEFWDNLEKETEWLRREMNKDLEEVKAKV
QPYLDQFQTKWQEEVALYRQKMEPLGAELRDGARQKLQELQEKLTPLGEDLRDRMRHHVD
ALRTKMTPYSDQMRDRLAERLAQLKDSPTLAEYHTKAADHLKAFGEKAKPALEDLRQGLM
PVFESFKTRIMSMVEEASKKLNAQ
>P08250|APOA1_CHICK Apolipoprotein A-I precursor - Gallus gallus
MRGVLVTLAVLFLTGTQARSFWQHDEPQTPLDRIRDMVDVYLETVKASGKDAIAQFESSA
VGKQLDLKLADNLDTLSAAAAKLREDMAPYYKEVREMWLKDTEALRAELTKDLEEVKEKI
RPFLDQFSAKWTEELEQYRQRLTPVAQELKELTKQKVELMQAKLTPVAEEARDRLRGHVE
ELRKNLAPYSDELRQKLSQKLEEIREKGIPQASEYQAKVMEQLSNLREKMTPLVQEFRER
LTPYAENLKNRLISFLDELQKSVA
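The sample query is in FASTA format: a header line starting with ">" followed by one or more sequence lines. The sketch below shows one way to read such input into a dictionary before pasting or uploading it. The parser is a hypothetical helper, and the example text is a truncated excerpt of the sample query (first lines only), not complete protein sequences.

```python
def parse_fasta(text):
    """Map each record identifier (first word after '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Sequence lines of one record are joined into a single string.
    return {key: "".join(parts) for key, parts in records.items()}

# Truncated excerpt of the sample query, for illustration only.
excerpt = """>P02647|APOA1_HUMAN Apolipoprotein A-I precursor - Homo sapiens
MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGS
ALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAK
>Q00623|APOA1_MOUSE Apolipoprotein A-I precursor - Mus musculus
MKAVVLAVALVFLTGSQAWHVWQQDEPQSQWDKVKDFANVYVDAVKDSGRDYVSQFESSS
"""
records = parse_fasta(excerpt)
for name, seq in records.items():
    print(name, len(seq))
```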
25
ClustalW2 Alignment Method
[Screenshot callout: alignment method]
26
Aligning Sequences: Fine-Tuning Slow Alignment
[Screenshot callouts: options; Fast]
27
Aligning Sequences: Fine-Tuning Fast Alignment
[Screenshot callouts: Step 3; options]
28
Aligning Sequences: Scoring Parameters
29
Aligning Sequences: Iteration Parameters
30
Aligning Sequences: Output Format & Clustering
31
ClustalW2 General Parameters
33
ClustalW2 Alignment
Residue colors:
- AVFPMILW: RED, Small (small + hydrophobic (incl. aromatic - Y))
- DE: BLUE, Acidic
- RK: MAGENTA, Basic
- STYHCNGQ: GREEN, Hydroxyl + Amine + Basic - Q
- Others: Gray
Conservation symbols:
- * Asterisks are identical amino acids.
- : Colons are significantly conservative amino acid substitutions.
- . Periods are amino acid substitutions that suggest some conservation.
[Screenshot callout: conservation]
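The color legend and the asterisk rule can be expressed as a short sketch. The color groups are copied from the slide; the function names are hypothetical, and only the "*" symbol is computed here, because the colon and period symbols depend on substitution groups that the slide does not list.

```python
# Residue groups and colors as listed on the slide.
COLOR_GROUPS = {
    "AVFPMILW": "red",
    "DE": "blue",
    "RK": "magenta",
    "STYHCNGQ": "green",
}

def residue_color(residue):
    """Return the display color for a single one-letter amino acid code."""
    for group, color in COLOR_GROUPS.items():
        if residue in group:
            return color
    return "gray"  # "Others" on the slide

def identity_line(alignment):
    """Mark columns where all rows carry the same residue with '*'."""
    marks = []
    for column in zip(*alignment):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)

# First five residues of three of the sample query sequences (human, mouse, hamster).
toy_rows = ["MKAAV", "MKAVV", "MKTVV"]
toy_marks = identity_line(toy_rows)
print(toy_marks)
print([residue_color(r) for r in "MKDSX"])
```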
34
ClustalW2 Output Overview
[Screenshot callouts: output files; scores]
35
Guide Tree and Cladogram
[Screenshot callout: Right click for display options]
36
Submission Details
[Screenshot callout: Input parameters]
37
Jalview Visualization
[Screenshot callout: Jalview]
38
Jalview Overview
[Screenshot callouts: alignment; conservation; consensus; quality; position]
39
Jalview Editing: Deleting
40
Jalview Editing: Sliding Sequences
[Screenshot callouts: shift; “Q”]
41
Jalview Editing: Removing Columns
42
Jalview: Saving Alignment
44
ClustalW Summary
- Multiple sequence alignments examine relationships
- ClustalW at the EBI Tool Site
- Jalview: a multiple sequence alignment editor