Published by Berenice Brown. Modified over 9 years ago.
1
Molecular Evolution & Phylogeny: From Basic Concepts to Advanced Applications
By Ofir Cohen, The Bioinformatics Unit, G.S. Wise Faculty of Life Sciences, Tel Aviv University, Israel. May 2012
http://ibis.tau.ac.il/intro_bioinfo/phylogenyWorkshop/
2
2 of 50 The Human Genome Project ("behind the scenes") Venter et al., Science 291:1304-1351 (2001) International Human Genome Sequencing Consortium, Nature 409:860-921 (2001) The club resident J.D. Watson: Back2back with DJ. Venter
3
3 of 50 Genome Sequencing – Ongoing Revolution The race is (still) on…The promise is huge…
4
4 of 50 SRA (Sequence Read Archive) = raw sequences from next-generation machines; Trace = raw sequences from 1990s machines
5
5 of 50 Darwin’s teachings – tree-like evolution Introduction – The tree concept
6
6 of 50 Darwin’s teachings – common descent Introduction – The tree concept
7
7 of 50 Common Descent – Modern evidence Introduction – The tree concept "The unity of life is no less remarkable than its diversity" THEODOSIUS DOBZHANSKY
8
8 of 50 Mathematicians developed tools to analyze trees
A connected graph without cycles is a tree.
[Figure, adapted from Huson et al. 2008: a graph with a cycle (not a tree!), a tree, and a rooted binary tree]
Part of the wider field of graph theory (Bridges of Königsberg)
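The definition on this slide (a tree is a connected graph without cycles) can be checked mechanically. The sketch below is only an illustration: the function name `is_tree` and the convention that nodes are numbered 0 to n-1 are assumptions made for this example, not part of the workshop material. It relies on the standard fact that a connected graph on n nodes is acyclic exactly when it has n-1 edges.

```python
from collections import defaultdict


def is_tree(num_nodes, edges):
    """Return True if the undirected graph is connected and has no cycles.

    Nodes are assumed to be numbered 0 .. num_nodes - 1 (illustrative choice).
    """
    # A tree on n nodes has exactly n - 1 edges.
    if num_nodes == 0 or len(edges) != num_nodes - 1:
        return False

    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Depth-first search from node 0; in a tree every node is reachable.
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == num_nodes


star = [(0, 1), (1, 2), (1, 3)]      # a tree on four nodes
triangle = [(0, 1), (1, 2), (2, 0)]  # contains a cycle
print(is_tree(4, star), is_tree(3, triangle))
```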
9
9 of 50 What is a Phylogenetic Tree?
Phylogenetic tree: a (hypothetical) historical pattern of evolutionary relationships among organisms (Greek: phylon = race and genetic = birth)
Horizontal branch length is proportional to evolutionary distance (unit = substitutions per site, sps)
Introduction – The tree concept
10
10 of 50 Molecular evidence of HIV transmission in a criminal case Introduction - Anecdotes Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297
11
11 of 50 Criminal investigation
In August 1994, a nurse tests negative for HIV and breaks off a messy 10-year affair with a doctor.
Three weeks later, the doctor gives his ex-mistress a vitamin B-12 shot.
In January 1995, the nurse tests positive for both HIV and hepatitis C.
The doctor’s office records from that day are missing (but eventually found).
The doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot.
The nurse had never had contact with either patient.
This is circumstantial evidence that the doctor injected blood from one of his patients into his ex-girlfriend. How can this be tested using a phylogenetic approach?
Introduction - Anecdotes
12
12 of 50 HIV – short background Extreme heterogeneity Within each patient there are many different viral strains ("quasi-species") Introduction - Anecdotes
13
13 of 50 History of the virus: gp120 (Gene tree) PATIENT VICTIM CONTROLS ©2002 National Academy of Sciences, U.S.A. Introduction - Anecdotes
14
14 of 50 History of the virus: RT (Gene tree) VICTIM PATIENT Introduction - Anecdotes Source sequences that are paraphyletic (other sequences are nested within them) with respect to the recipient sequences provide evidence for the direction of transmission.
15
15 of 50 Ernst Haeckel's Monophyletic tree of organisms, 1866 Reconstructing the tree of life
16
16 of 50 Organisms classified into 2 domains: Eukaryotes, including {plants, animals, protists, fungi}; Prokaryotes = Bacteria. Whittaker, 1969
17
17 of 50 Reconstructing the Tree Of Life Carl Woese, 1977: phylogenetic taxonomy of 16S ribosomal RNA. Critiques: Woese un-balanced the tree of life… (too much representation for microbial species)
18
18 of 50 Phylogenetic analysis: Not only among organisms - Cancer phylogeny A phylogeny of acute myeloid leukemia (AML) subtypes Riester et al. 2010; Liu et al. 2009
19
19 of 50 Phylogenetic analysis: Not only in biology – Language evolution Gray and Atkinson, 2003 Researchers study the evolution of languages by treating them like genomes. Instead of COGs (gene families), they analyze COGNATES (word families)
20
20 of 50 Reading Trees: Which tree is more accurate? Reading Trees Haeckel’s pedigree of man Human "on top" – wrong!
21
21 of 50 Rooted vs. un-rooted trees
[Figure: the same four taxa (human, mouse, fugu, Drosophila) drawn as two trees; labels: root, edge, internal node (ancestor), leaf, time]
Reading Trees
22
22 of 50 Gorilla gorilla (Gorilla) Homo sapiens (human) Pan troglodytes (Chimpanzee) Gallus gallus (chicken) How do we root a tree? Reading Trees
23
23 of 50 Rooting based on a priori knowledge: using an outgroup
[Figure: two trees of Human, Chimp, Chicken and Gorilla]
Reading Trees
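As a rough illustration of outgroup rooting, the sketch below places the root on the branch that leads to the outgroup leaf and prints the topology in Newick format. Everything here is assumed for the example: the adjacency-dictionary representation, the function name `root_with_outgroup`, the internal node names "x" and "y", and the choice of Chicken as the outgroup for the three primates. Branch lengths are ignored.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an un-rooted tree on the branch leading to the outgroup leaf.

    `adjacency` maps every node to the list of its neighbours.
    Returns a Newick string describing the topology only.
    """
    (attachment,) = adjacency[outgroup]  # a leaf has exactly one neighbour

    def subtree(node, parent):
        # Walk away from `parent`, so the tree is traversed root-to-leaves.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node
        return "(" + ",".join(subtree(c, node) for c in sorted(children)) + ")"

    ingroup = subtree(attachment, outgroup)
    return "(" + ingroup + "," + outgroup + ");"


# Hypothetical un-rooted tree: Human+Chimp on one side, Gorilla+Chicken on the other.
unrooted = {
    "Human": ["x"], "Chimp": ["x"], "Gorilla": ["y"], "Chicken": ["y"],
    "x": ["Human", "Chimp", "y"],
    "y": ["x", "Gorilla", "Chicken"],
}
print(root_with_outgroup(unrooted, "Chicken"))
# prints ((Gorilla,(Chimp,Human)),Chicken);
```

Rooting on a different leaf changes which node is read as the common ancestor, which is why the choice of outgroup must come from prior knowledge.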
24
24 of 50 Comparative Genomics – "All life is one" Compare homologous sequences – Multiple Sequence Alignments
25
25 of 50 Orthologs
[Figure: a speciation event splits an ancestor into descendant 1 (e.g., human) and descendant 2 (e.g., dog)]
Orthologs typically retain the same or a similar function in the course of evolution.
26
26 of 50 Paralogs
[Figure: a duplication event]
Evolutionary innovation: one copy is released from the original selective pressure, so it is free to mutate and acquire new functions.
27
27 of 50 Alignment and phylogeny are mutually dependent
[Figure labels: unaligned sequences, sequence alignment, MSA, phylogeny reconstruction, inaccurate tree building]
28
28 of 50 Part II: Tools
29
29 of 50 Multiple sequence alignment (MSA)
Several advanced MSA programs are available. Today we will use two:
MAFFT – fast and relatively accurate
PRANK – differs from other MSA programs in its phylogeny-aware treatment of insertions/deletions
Tools - Alignments
30
30 of 50 MAFFT
Web server (& download option): http://mafft.cbrc.jp/alignment/server/index.html
Efficiency-tuned variants: quick & dirty, or slow but accurate
Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, Vol. 30, No. 14, 3059-3066. © 2002 Oxford University Press
Tools - Alignments
31
31 of 50 Choosing a MAFFT strategy quick & dirty slow but accurate Tools - Alignments
32
32 of 50 MAFFT output
A colored view of the alignment
Saving the output: choose a format (Clustal or Fasta), or click "Reformat" to convert to a selection of other formats, then save the page as a text file
Tools - Alignments
33
33 of 50 PRANK Tools - Alignments
34
34 of 50 Classical alignment errors for HIV env Tools - Alignments CLUSTALW PRANK
35
35 of 50 PRANK Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/ Tools - Alignments
36
36 of 50 PRANK output If you need a different format, copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/ Tools - Alignments
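Format conversion can also be done locally. The sketch below writes an alignment in relaxed sequential PHYLIP format (a header line with the number of sequences and the number of sites, then one "name sequence" line per taxon). The function name `to_phylip` and the dictionary input are assumptions for this example, and whether a particular server accepts relaxed rather than strict 10-character PHYLIP names should be checked against that server's documentation.

```python
def to_phylip(alignment):
    """Serialize {name: aligned_sequence} as relaxed sequential PHYLIP text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (n_sites,) = lengths

    lines = [f"{len(alignment)} {n_sites}"]
    for name, seq in alignment.items():
        # Whitespace separates the name from the sequence, so names cannot contain it.
        if any(ch.isspace() for ch in name):
            raise ValueError(f"name contains whitespace: {name!r}")
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"


toy = {"seqA": "MK-LV", "seqB": "MKALV"}  # toy aligned protein fragments
print(to_phylip(toy), end="")
```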
37
37 of 50 Downloadable PRANK http://www.ebi.ac.uk/goldman-srv/prank/prank/ PRANK: A command-line program interface PRANKSTER: A program with graphical user interface Tools - Alignments
38
38
1. Download the sequence files from the web-site http://ibis.tau.ac.il/intro_bioinfo/phylogenyWorkshop/ and open "fahA.fas" in Notepad/Browser – these are 65 protein sequences in FASTA format.
2. Run the PRANK web server http://www.ebi.ac.uk/goldman-srv/webPRANK/
(1)
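To confirm what a FASTA file contains before submitting it, a few lines of Python are enough. This is a minimal sketch: `parse_fasta` is a name made up for this example, and the toy records below are not taken from "fahA.fas".

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()   # header line starts a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


toy_fasta = """>seq1 example protein
MKTLLV
AGH
>seq2
MKSLLV
"""
records = parse_fasta(toy_fasta.splitlines())
print(len(records), "sequences")

# For the workshop file one would read from disk instead, e.g.:
# with open("fahA.fas") as handle:
#     records = parse_fasta(handle)
```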
39
39 of 50 Trees Reconstruction Methods
40
40 of 50 Phylogeny reconstruction
Different approaches (algorithms / programs):
Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but generally less accurate
Maximum parsimony (e.g. MEGA)
Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower
Bayesian methods (e.g. MrBayes): generally considered the most accurate, but very slow
[Figure: an MSA of sequences A-E, the pairwise distance table, and the guide tree]
Tools - Trees
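Distance-based methods start from a pairwise distance table computed from the MSA. The sketch below uses the simplest possible measure, the uncorrected p-distance (fraction of differing sites). That choice is an assumption made for illustration; real programs usually apply a substitution-model correction. The function names are hypothetical, and the toy sequences are the first five columns shown on the bootstrap slide.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_table(alignment):
    """Build a symmetric {name: {name: distance}} table for an alignment."""
    names = list(alignment)
    table = {n: {n: 0.0} for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        table[a][b] = table[b][a] = d
    return table


alignment = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}
table = distance_table(alignment)
for name in alignment:
    print(name, [round(table[name][other], 2) for other in alignment])
```

A neighbor-joining implementation would then repeatedly merge the pair of taxa that minimizes its joining criterion, updating this table after each merge.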
41
41 of 50 PhyML One of the most widely used maximum likelihood (ML) programs Web server (& download): http://www.atgc-montpellier.fr/phyml/ Tools - Trees
42
42 of 50 Downloadable PhyML
Less user-friendly, but allows using local computer power
Run "phyml.bat"
Drag the file from Windows Explorer to the blue window
Enter "d" to switch from DNA to AA
Enter "y" to run
43
43 1.Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the phyML webserver (don't forget to choose "Amino-acids" and enter your email and name) (2)
44
44 of 50 RAxML
Web server: http://phylobench.vital-it.ch/raxml-bb/
Maximum likelihood (ML) methodology similar to PhyML, but much faster
Faster results with bootstrap
Tools - Trees
45
45 of 50 Bootstrapping Now we have a tree, but what is the reliability of this tree?
46
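46 of 50 Bootstrap
A. Generate pseudo-data sets by sampling N alignment positions (columns) with replacement, where N is the alignment length. Do not change the number of sequences. Resample 100-1000 times.
Original alignment (positions 1 2 3 4 5 … 100):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T
Pseudo-data set 1 (sampled positions 1 1 2 4 4 … x):
1 : AATTT…T
2 : AATTT…G
3 : AACTT…T
4 : AACTT…T
Pseudo-data set 2 (sampled positions 4 7 7 8 9 … x):
1 : TTTAT…T
2 : TAACC…G
3 : TAACC…T
4 : TGGGA…T
Pseudo-data set 3 (sampled positions 1 5 5 7 8 … x):
1 : AGGTA…T
2 : AGGAC…G
3 : AAAAC…A
4 : AAAGG…C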
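Step A can be sketched in a few lines: draw as many column indices as there are alignment columns, with replacement, and rebuild every sequence from those columns. The function names `take_columns` and `bootstrap_alignment` are made up for this example, and the toy alignment holds only the first five columns shown on the slide.

```python
import random


def take_columns(alignment, columns):
    """Build a new alignment from the given column indices (0-based)."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; sequences stay the same in number."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return take_columns(alignment, columns)


original = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}

# Sampling positions 1 1 2 4 4 (0-based: 0 0 1 3 3) gives the first five
# columns of pseudo-data set 1 on the slide.
print(take_columns(original, [0, 0, 1, 3, 3]))

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
replicates = [bootstrap_alignment(original, rng) for _ in range(100)]
```

Each replicate would then be passed to the same tree-building program as the original alignment.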
47
47 of 50 Bootstrap
B. Reconstruct a tree from each data set.
[Figure: the three pseudo-data sets, each shown with a tree of Sp1, Sp2, Sp3 and Sp4 reconstructed from it]
48
48 of 50 Bootstrap
C. Compute the majority-rule consensus.
[Figure: bootstrap trees of Sp1, Sp2, Sp3 and Sp4 and a consensus tree with branch support values of 67% and 100%]
In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
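Step C amounts to counting how often each grouping appears among the bootstrap trees. The sketch below makes several simplifying assumptions: trees are written as nested tuples, groupings are counted as rooted clades rather than un-rooted splits, and the three toy trees are invented so that Sp1+Sp2 appears in two of three. The function names are hypothetical.

```python
from collections import Counter


def leaf_sets(tree, found):
    """Return the leaves under `tree`, recording every clade in `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, found) for child in tree))
    found.add(leaves)
    return leaves


def clade_support(trees):
    """Map each clade to the fraction of trees that contain it."""
    counts = Counter()
    for tree in trees:
        found = set()
        leaf_sets(tree, found)
        counts.update(found)  # each clade counted once per tree
    return {clade: count / len(trees) for clade, count in counts.items()}


bootstrap_trees = [
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    ((("Sp1", "Sp2"), "Sp3"), "Sp4"),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]
support = clade_support(bootstrap_trees)
print(round(100 * support[frozenset(["Sp1", "Sp2"])]), "%")

# A majority-rule consensus keeps the clades found in more than half of the trees.
majority = {clade for clade, s in support.items() if s > 0.5}
```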
49
49 1.Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML webserver (don't forget to tick "Protein sequences" and “Maximum likelihood search” and enter your email) (3)
50
50 of 50 FigTree: tree visualization and figure creation http://tree.bio.ed.ac.uk/software/figtree/ Manipulate a node Manipulate a clade Manipulate a taxon
51
51
1. In case the trees are not ready yet, download a tree from the website
2. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree http://tree.bio.ed.ac.uk/software/figtree/
3. Play around with the different options and make a pretty figure!
   1. Find out how to color specific clades, as below
   2. Try each of the three options under "Layout"
4. Export a figure in PDF format (File > Export Graphic…)
(4)
52
52 of 50 Final Questions…