1
Multiple Sequence Alignment (MSA) and Phylogeny
2
One option for obtaining a multiple-sequence Fasta file
4
MSA input: multiple sequence Fasta file
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
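A multi-sequence Fasta file like the one above can be read with a few lines of code. The following is a minimal sketch, not part of the original slides: the function name `read_fasta` and the toy input are illustrative assumptions, and it only handles the plain layout shown here (a `>` header line followed by one or more sequence lines).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line)  # sequence may span several lines
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (hypothetical, shortened sequences).
example = ">seq1 demo\nMNRGV\nPFRHL\n>seq2 demo\nMDPGT\n"
sequences = read_fasta(example)
```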
5
Clustal X
6
Step1: Load the sequences
7
Uploaded sequences
A little unclear…
8
Edit Fasta headers…
Replace each long NCBI header with a short name (the sequences themselves stay unchanged):
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] → >Homo_sapiens_CD4
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes] → >Pan_troglodytes_CD4
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa] → >Sus_scrofa_CD4
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus] → >Rattus_norvegicus_CD4
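The header editing can also be scripted instead of done by hand. This is a rough sketch under the assumption that every header ends with the species in square brackets, as in the four CD4 records; the helper names `short_header` and `rename_fasta` and the `gene` argument are illustrative, not something the slides prescribe.

```python
import re


def short_header(header, gene):
    """Turn an NCBI-style header into '>Genus_species_GENE'."""
    match = re.search(r"\[([^\]]+)\]\s*$", header)
    if match is None:
        raise ValueError(f"no [species] tag at the end of header: {header!r}")
    species = "_".join(match.group(1).split())  # 'Homo sapiens' -> 'Homo_sapiens'
    return f">{species}_{gene}"


def rename_fasta(text, gene):
    """Rewrite every header line of a FASTA text; sequence lines are untouched."""
    out = []
    for line in text.splitlines():
        out.append(short_header(line, gene) if line.startswith(">") else line)
    return "\n".join(out)


old = ">gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]"
new = short_header(old, "CD4")
```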
9
Uploaded sequences
Much better
10
Step2: Perform alignment
11
Multiple Sequence Alignment and conservation view
12
Step 3: Create tree
13
The Newick tree format is used to represent trees as strings
[Figure: a tree with the leaves A, C, B and D]
In Newick format: ((A,C),(B,D));
Each pair of parentheses () encloses a clade in the tree
A comma “,” separates the members of the corresponding clade
A semicolon “;” is always the last character
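To make the three rules concrete, here is a small sketch of a parser for the simplest kind of Newick string (leaf names only). It is an illustration written for this transcript: branch lengths, bootstrap labels and quoted names, which real Newick files may contain, are deliberately not handled, and the nested-tuple representation is an assumption of the example.

```python
def parse_newick(text):
    """Parse a leaf-names-only Newick string into nested tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # '(' opens a clade
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # ',' separates members of the clade
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # ')' closes the clade
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]  # a leaf name

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """List the leaf names of a parsed tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((A,C),(B,D));")
```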
14
Step 4: View tree with NJPlot
Note: unrooted tree
15
Step 4.5: defining an outgroup
16
Swapping nodes
Note: The order inside a split doesn’t matter
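One way to see that swapping nodes does not change the tree is to put the children of every node into a fixed order before comparing. The sketch below assumes trees are stored as nested tuples of leaf names (an assumption of this example, not a format used by NJPlot); the function name `canonical` is likewise illustrative.

```python
def canonical(tree):
    """Return the tree with the children of every node in a fixed order."""
    if isinstance(tree, str):
        return tree  # a leaf
    children = [canonical(child) for child in tree]
    # Sorting by repr gives one deterministic order for leaves and subtrees alike.
    return tuple(sorted(children, key=repr))


original = (("A", "C"), ("B", "D"))
swapped = (("D", "B"), ("C", "A"))  # same splits, nodes swapped
same_tree = canonical(original) == canonical(swapped)
```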
17
Bootstrap
18
A. Resample (100-1000 times).
Original data set (positions 1 2 3 4 5 … 100):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T
Resampled data set (columns 1 1 2 4 4 … x):
1 : AATTT…T
2 : AATTT…G
3 : AACTT…T
4 : AACTT…T
Resampled data set (columns 4 7 7 8 9 … x):
1 : TTTAT…T
2 : TAACC…G
3 : TAACC…T
4 : TGGGA…T
Resampled data set (columns 1 5 5 7 8 … x):
1 : AGGTA…T
2 : AGGAC…G
3 : AAAAC…A
4 : AAAGG…C
19
Bootstrap
B. Reconstruct a tree from each data set.
[Figure: the three resampled data sets (columns 1 1 2 4 4 … x, 4 7 7 8 9 … x and 1 5 5 7 8 … x), each shown with the tree of Sp1, Sp2, Sp3 and Sp4 reconstructed from it]
20
Bootstrap
C. We compute the majority rule consensus.
[Figure: the bootstrap trees of Sp1, Sp2, Sp3 and Sp4 and their consensus tree, with support values of 67% and 100%]
In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
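Steps A and C can be sketched in code; step B (tree reconstruction) is left out because it depends on the tree-building method. Everything here is an illustrative assumption: the alignment is a dict of equal-length strings, each replicate tree is summarized as a set of splits (frozensets of species names), and the function names are made up for this example.

```python
import random


def resample_columns(alignment, rng):
    """Step A: draw alignment columns with replacement, keeping the length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    resampled = {
        name: "".join(seq[i] for i in columns) for name, seq in alignment.items()
    }
    return resampled, columns


def split_support(replicate_splits, split):
    """Step C: fraction of replicate trees that contain the given split."""
    hits = sum(1 for splits in replicate_splits if split in splits)
    return hits / len(replicate_splits)


# First five columns of the data set from step A.
alignment = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}
resampled, columns = resample_columns(alignment, random.Random(0))

# Hypothetical outcome of step B for three replicates.
sp12 = frozenset({"Sp1", "Sp2"})
replicates = [{sp12}, {sp12}, {frozenset({"Sp1", "Sp3"})}]
support = split_support(replicates, sp12)  # 2 of 3 replicates
```

The same sampled column indices are applied to every sequence, so each resampled column is a whole column of the original alignment.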
21
Step 3.5 - Bootstrap
22
Bootstrap values on NJPlot
Note: ClustalX saves trees with a .ph extension. Trees with bootstrap values are saved with a .phb extension
23
Detecting selection forces using phylogeny (ConSeq, ConSurf, Selecton)
24
“Important” sites evolve slower than “unimportant” ones.
25
Conservation = functional/structural importance
26
Use Phylogenetic information
              1234567
Human         DMAAHAM
Chimp         DEAAGGC
Cow           DQAAWAP
Fish          DLAACAL
S. cerevisiae DDGAFAA
S. pombe      DDGALGE
[Figure: tree labeled with the residues A and G]
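For contrast, a naive measure that ignores the tree simply counts the distinct residues in each column. This sketch is a tree-free baseline, not the ConSeq method; the function name is illustrative. On the alignment above it gives columns 3 and 6 the same count, although their A/G patterns are distributed differently among the species (G in both yeasts for column 3, G in Chimp and S. pombe for column 6).

```python
def distinct_residues_per_column(alignment):
    """Count the distinct residues in every column of an alignment."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [len({seq[i] for seq in seqs}) for i in range(length)]


alignment = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}
counts = distinct_residues_per_column(alignment)
```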
27
http://conseq.tau.ac.il Site-specific rate computation tool
28
Using ConSeq
29
ConSeq results
30
ConSeq conservation scores
31
Conservation scores:
The scores are standardized: the average score for all residues is zero, and the standard deviation is one
The lowest score represents the most conserved site in the protein
negative values: slowly evolving (= low evolutionary rate), conserved sites
The highest score represents the most variable site in the protein
positive values: rapidly evolving (= fast evolutionary rate), variable sites
Scores are relative to the protein. Scores of different proteins are not comparable!
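The standardization described above is an ordinary z-score. A minimal sketch follows, assuming the raw per-site rates are already available as a list of numbers (the values below are made up) and using the population standard deviation; whether ConSeq uses the population or the sample formula is not stated in the slides.

```python
import statistics


def standardize(rates):
    """Rescale per-site rates to mean 0 and standard deviation 1."""
    mean = statistics.fmean(rates)
    sd = statistics.pstdev(rates)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(rate - mean) / sd for rate in rates]


raw_rates = [0.2, 1.0, 3.0, 0.6]  # hypothetical rates for four sites
scores = standardize(raw_rates)
most_conserved_site = scores.index(min(scores))  # lowest score
most_variable_site = scores.index(max(scores))   # highest score
```

Because the mean and standard deviation are computed per protein, the same raw rate gets a different score in a different protein, which is why scores cannot be compared across proteins.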
32
ConSeq results
33
Color-coded results
34
Conservation in the structure
Protein core: structurally constrained - usually conserved
Active site: functionally constrained - usually conserved
Surface loops: tolerant to mutations - usually variable
[Figure labels: Hydrophobic core, Surface loops, Active site]
35
Color-coded results
36
http://consurf.tau.ac.il Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein
37
Using ConSurf
38
ConSurf results
39
First Glance in Jmol visual presentation
40
ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Max Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file
41
Solution – look at the DNA
Purifying selection: Syn > Non-syn
Adaptive evolution = Positive selection: Non-syn > Syn
Neutral evolution (no selection): Syn = Non-syn
42
Selection score: Ka/Ks (also known as dn/ds, or ω)
Ka = non-synonymous mutation rate
Ks = synonymous mutation rate
Ka/Ks < 1: purifying selection
Ka/Ks > 1: positive selection
Ka/Ks = 1: no selection
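The three cases translate directly into a comparison of the ratio with 1. This is only a sketch of the rule on the slide: the function name and the example rates are made up, and in practice a ratio is judged with a statistical test rather than by exact comparison.

```python
def selection_regime(ka, ks):
    """Classify a site or gene by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    omega = ka / ks
    if omega < 1:
        return "purifying selection"
    if omega > 1:
        return "positive selection"
    return "no selection"


regime = selection_regime(ka=0.1, ks=0.5)  # hypothetical rates, ratio 0.2
```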
43
http://selecton.tau.ac.il
44
Selecton input
The user must provide the sequences – no PSI-BLAST option
Coding sequences: only ORF, no stop codons
If an MSA is provided it must be codon aligned (RevTrans)
Codon-level sequences !!!
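These input requirements can be checked before submitting. The sketch below is an assumption-laden illustration rather than Selecton's own validation: it assumes DNA written with A, C, G, T, the standard genetic code for stop codons, and "-" as the gap character; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def check_coding_sequence(seq):
    """Return a list of problems found in an unaligned coding sequence."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if set(seq) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        problems.append("contains a stop codon")
    return problems


def is_codon_aligned(aligned_seq):
    """True if gaps only occur as whole codons ('---') in an aligned sequence."""
    if len(aligned_seq) % 3 != 0:
        return False
    for i in range(0, len(aligned_seq), 3):
        if aligned_seq[i:i + 3].count("-") not in (0, 3):
            return False
    return True


ok = check_coding_sequence("ATGGCCAAA")        # no problems expected
with_stop = check_coding_sequence("ATGTAAGCC")  # TAA in frame
```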
45
Comparing H0 and H1 in Selecton
46
Solution: statistics helps us to compare between hypotheses
H0: There’s no positive selection
H1: There is positive selection
H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
H1: compute the probability (likelihood) of the data using a model that does account for positive selection
Perform a statistical test to accept or reject H0 (likelihood ratio test)
P-value > α (α = 0.05): accept H0
P-value < α (α = 0.05): reject H0
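The likelihood ratio test compares the two log-likelihoods through the statistic 2·(lnL1 − lnL0), which is usually compared to a chi-square distribution. The sketch below assumes that approximation and covers only 1 or 2 degrees of freedom, where the p-value has a closed form; the log-likelihood values are made up, and the right number of degrees of freedom depends on which pair of models is compared.

```python
import math


def lrt_p_value(lnl_h0, lnl_h1, df=1):
    """Likelihood ratio test: return (statistic, p-value)."""
    statistic = max(2.0 * (lnl_h1 - lnl_h0), 0.0)
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return statistic, p_value


ALPHA = 0.05
# Hypothetical log-likelihoods of the data under H0 and H1.
statistic, p_value = lrt_p_value(lnl_h0=-1000.0, lnl_h1=-998.0, df=1)
reject_h0 = p_value < ALPHA
```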
47
Comparing H0 and H1 in Selecton
49
Selecton results:
50
Results
Human-rhesus swaps at sites 332, 335-340 confer human resistance to HIV and rhesus resistance to SIV