1
Phylogeny
2
Reconstructing a phylogeny
The phylogenetic tree (phylogeny) describes the evolutionary relationships between the studied data.
The data must be composed of homologous types.
In molecular evolution, the studied data are homologous DNA/AA sequences.
Phylogeny reconstruction explicitly assumes that the sequences are aligned: INPUT = MSA.
3
Reminder: MSA and phylogeny are dependent
[Figure labels: Unaligned sequences; Sequence alignment; Inaccurate guide tree; MSA; Phylogeny reconstruction]
4
Phylogeny representation
Visual representation: [Figure: tree with leaves A, C, B, D]
Textual representation (Newick format): ((A,C),(B,D));
Each pair of parentheses () encloses a clade in the tree.
A comma "," separates the members of the corresponding clade.
A semicolon ";" is always the last character.
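The three Newick rules above can be sketched as a small recursive parser. This is a minimal illustration, not a full Newick reader: the function name `parse_newick` is hypothetical, and it assumes plain leaf names with no branch lengths, internal labels, quoting, or whitespace between tokens.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # '(' opens a clade
            children = [parse_node()]
            while text[pos] == ",":       # ',' separates clade members
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # ')' closes the clade
            return tuple(children)
        start = pos                       # otherwise read a leaf name
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree


print(parse_newick("((A,C),(B,D));"))
```

Nested tuples are used here only because they mirror the parentheses directly; a real analysis would normally rely on an established phylogenetics library instead.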
5
Some terminology
Root; internal nodes; internal branches (splits); external nodes (leaves); external branches; neighbors; monophyletic group (clade)
6
Swapping neighbors is meaningless
[Figure: four drawings of the same Human, Chimp, Gorilla tree with neighbors swapped]
(Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
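One way to check this equivalence in code is to sort the children of every node before comparing trees. The sketch below assumes trees are written as nested Python tuples of leaf names; the helper name `canonical` is hypothetical.

```python
def canonical(tree):
    """Return the tree with the children of every node in a fixed order."""
    if isinstance(tree, str):             # a leaf is already canonical
        return tree
    # Sort by repr so that leaves (str) and clades (tuple) can be compared.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


same_tree = [
    ("Gorilla", ("Human", "Chimp")),
    ("Gorilla", ("Chimp", "Human")),
    (("Human", "Chimp"), "Gorilla"),
    (("Chimp", "Human"), "Gorilla"),
]
# All four spellings collapse to a single canonical form.
print(len({canonical(t) for t in same_tree}))
# A genuinely different topology does not.
print(canonical((("Human", "Gorilla"), "Chimp")) == canonical(same_tree[0]))
```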
7
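Rooted vs. unrooted
[Figure: an unrooted tree with leaves A, B, C and branches 1, 2, 3, next to the three rooted trees obtained by placing the root on branch 1, 2, or 3; the three rooted trees are not equal to one another]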
8
In Newick format
Rooted trees: ((C,B),A) ≠ ((A,B),C) ≠ ((A,C),B)
Unrooted tree: (A,B,C)
9
How can we root a tree?
10
Rooting the tree based on a priori knowledge: using an outgroup
[Figure: trees of Human, Chimp, Gorilla (INGROUP) and Chicken (OUTGROUP)]
The outgroup should be close enough for detecting sequence homology, but far enough to be a clear outgroup.
11
The gene tree is not always identical to the species tree
[Figure: a gene tree and a species tree of Human, Chimp, Gorilla, and Chicken; Gene tree ≠ Species tree]
12
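Phylogeny reconstruction approaches
Distance based methods: Neighbor Joining
[Figure: a star tree of A, B, C, D, E, and the tree after A and B are joined into the node "A,B"]
Distance matrix:
      A    B    C    D    E
A     0    2    3    4    4
B          0    3    4    5
C               0    3    4
D                    0    5
E                         0
Distance matrix after joining A and B:
      A,B  C    D    E
A,B   0    2.5  4.5  3.5
C          0    3    4
D               0    5
E                    0
The Minimum Evolution (ME) criterion: in each iteration we separate from the rest the two sequences whose pairing results in the minimal sum of branch lengths.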
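The pair-selection step can be sketched with the standard Neighbor Joining Q criterion, Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r(i) is the sum of distances from i to all other sequences. The slide states the criterion only in words, so using this formula is an assumption, and the function name `nj_pair_to_join` is hypothetical. The distances are those of the five-sequence matrix on the slide.

```python
def nj_pair_to_join(names, dist):
    """Return (Q, name_i, name_j) for the pair with the smallest Q value."""
    n = len(names)
    totals = [sum(row) for row in dist]   # r(i): total distance from i
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best


names = ["A", "B", "C", "D", "E"]
dist = [                                  # symmetric form of the slide's matrix
    [0, 2, 3, 4, 4],
    [2, 0, 3, 4, 5],
    [3, 3, 0, 3, 4],
    [4, 4, 3, 0, 5],
    [4, 5, 4, 5, 0],
]
print(nj_pair_to_join(names, dist))       # the pair chosen in the first iteration
```

For this matrix the criterion picks A and B, the same pair that the slide joins first. The sketch covers only the choice of pair; updating the distance matrix and assigning branch lengths are left out.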
13
Phylogeny reconstruction approaches
Topology search methods: MP, ML
Maximum Parsimony: finds the most parsimonious topology.
Maximum Likelihood: finds the most likely topology, i.e., the one maximizing P(Data|T).
[Figure: sequences Seq 1 to Seq 4 and the three possible topologies for four sequences: (1,3 | 2,4), (1,4 | 2,3), (1,2 | 3,4)]
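To make "most parsimonious" concrete, the sketch below scores one alignment column on each of the three four-sequence topologies using Fitch's algorithm, a standard way to count the minimum number of changes on a fixed binary tree. The slide does not name a scoring algorithm, so Fitch is an assumption here, and the site pattern (A, A, C, C) is invented for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for a binary tree."""
    if isinstance(tree, str):             # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                            # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical states of sequences 1-4 at a single alignment column.
site = {"1": "A", "2": "A", "3": "C", "4": "C"}
topologies = {
    "(1,2 | 3,4)": (("1", "2"), ("3", "4")),
    "(1,3 | 2,4)": (("1", "3"), ("2", "4")),
    "(1,4 | 2,3)": (("1", "4"), ("2", "3")),
}
for label, tree in topologies.items():
    print(label, fitch(tree, site)[1])
```

Summing this score over all columns of the MSA and keeping the topology with the smallest total gives the maximum parsimony tree for four sequences.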
14
Phylogeny reconstruction approaches: summary
Distance based methods
  Neighbor Joining (e.g., using ClustalX): fast, inaccurate
Topology search methods
  Maximum parsimony (e.g., using MEGA): crude, questionable statistical basis
  Maximum likelihood (e.g., using RAxML, PhyML): accurate, slow
Bayesian methods
  Markov chain Monte Carlo (MCMC) (e.g., using MrBayes): most accurate, very slow
15
How robust is our tree?
[Figure: tree with leaves Human, Gorilla, Chimp]
16
Bootstrap for estimating robustness
We need some statistical way to estimate the confidence in the tree topology.
But we don't know anything about the distribution of tree topologies.
The only data source we have is our data (MSA).
So, we must rely on our own resources: "pull up by your own bootstraps".
17
Bootstrap
1. Create n (100-1000) new MSAs (pseudo-MSAs) by randomly sampling K positions from our original MSA with replacement.
Original MSA, positions 1 2 3 4 5 … K:
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T
Pseudo-MSA, sampled positions 1 1 2 4 4 … 3:
1 : AATTT…C
2 : AATTT…C
3 : AACTT…T
4 : AACTT…C
Pseudo-MSA, sampled positions 9 7 4 7 8 … 10:
1 : TTTTA…T
2 : CATAC…A
3 : CATAC…T
4 : AGTGG…A
Pseudo-MSA, sampled positions 5 1 5 7 8 … 12:
1 : GAGTA…T
2 : GAGAC…G
3 : AAAAC…A
4 : AAAGG…C
[Figure: tree with leaves Sp1, Sp2, Sp3, Sp4]
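The column resampling in step 1 can be sketched as follows. The function names `take_columns` and `bootstrap_msas` are hypothetical, the seed is arbitrary, and the toy alignment uses only the first five visible columns of the MSA on the slide.

```python
import random


def take_columns(msa, positions):
    """Build a new alignment from the given 0-based column positions."""
    return ["".join(seq[p] for p in positions) for seq in msa]


def bootstrap_msas(msa, n, seed=0):
    """Create n pseudo-MSAs by sampling K columns with replacement."""
    rng = random.Random(seed)             # fixed seed: reproducible sketch
    k = len(msa[0])                       # K = number of alignment columns
    return [
        take_columns(msa, [rng.randrange(k) for _ in range(k)])
        for _ in range(n)
    ]


msa = ["ATCTG", "ATCTG", "ACTTA", "ACCTA"]
# Positions 1 1 2 4 4 of the slide (0-based: 0 0 1 3 3).
print(take_columns(msa, [0, 0, 1, 3, 3]))
print(bootstrap_msas(msa, 3))
```

Whole columns are sampled, never individual characters, so every pseudo-MSA keeps the same sequences and the same length K as the original.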
18
Bootstrap
2. Reconstruct a pseudo-tree from each pseudo-MSA with the same method used for reconstructing the original tree.
[Figure: the three pseudo-MSAs (sampled positions 1 1 2 4 4 … 3, 9 7 4 7 8 … 10, and 5 1 5 7 8 … 12), each shown with its pseudo-tree of Sp1, Sp2, Sp3, Sp4]
19
Bootstrap
3. For each split in our original tree, we count the number of times it appeared in the pseudo-trees.
[Figure: the pseudo-trees and the original tree of Sp1, Sp2, Sp3, Sp4, with bootstrap supports of 67% and 100% marked on the original tree]
In 67% of the pseudo-trees, the split between Sp1+Sp2 and the rest of the tree was found.
In general, bootstrap support below 80% is considered low.
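The counting in step 3 can be sketched by listing the non-trivial splits of every tree and computing the fraction of pseudo-trees that contain each split of the original tree. Trees are nested tuples of leaf names and are treated as unrooted; the function names and the three pseudo-trees are hypothetical.

```python
def splits(tree):
    """Return the non-trivial splits of a tree, each as a frozenset of leaves."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    reference = min(all_leaves)           # name each split by the side without it
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if reference in below else below
        if 1 < len(side) < len(all_leaves) - 1:   # skip single-leaf splits
            found.add(side)
        return below

    visit(tree)
    return found


def bootstrap_support(original, pseudo_trees):
    """Map each split of the original tree to its share of pseudo-trees."""
    pseudo_splits = [splits(t) for t in pseudo_trees]
    return {
        s: sum(s in ps for ps in pseudo_splits) / len(pseudo_trees)
        for s in splits(original)
    }


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
pseudo_trees = [
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp2", "Sp1"), ("Sp4", "Sp3")),     # same split, neighbors swapped
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),     # a different split
]
for split, support in bootstrap_support(original, pseudo_trees).items():
    print(sorted(split), round(100 * support), "%")
```

With these three made-up pseudo-trees, the split separating Sp1+Sp2 from Sp3+Sp4 is found in two of three, that is about 67%.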
20
ClustalX: NJ phylogeny reconstruction
22
http://phylobench.vital-it.ch/raxml-bb/
24
Viewing the tree with njPlot
25
Note: unrooted tree
26
Defining an outgroup
27
Swapping nodes
28
Bootstrap support
29
FigTree: tree visualization and figure creation
http://tree.bio.ed.ac.uk/software/figtree/
30
Reconstructing the tree of life
31
Darwin’s vision of the tree of life from the Origin of Species
32
The three-domain tree of life based on SSU rRNA MSA
33
But the branching of several kingdoms remains in dispute
34
Lateral Gene Transfer (LGT) challenges the conceptual basis of phylogenetic classification
36
Methodology
Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified.
Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases).
Constructed an MSA for each of the 31 orthogroups.
Concatenated all 31 MSAs to a super-MSA of 8090 columns.
The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach.
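The concatenation step can be sketched as joining, for each species, its aligned sequence from every per-gene MSA in a fixed gene order. The function name `concatenate_msas`, the species names, and the toy sequences are hypothetical; the sketch assumes every MSA contains exactly the same species.

```python
def concatenate_msas(msas):
    """Join per-gene MSAs (dicts of species -> aligned sequence) into one."""
    species = sorted(msas[0])
    for msa in msas:
        if sorted(msa) != species:
            raise ValueError("every MSA must contain the same species")
        if len({len(seq) for seq in msa.values()}) != 1:
            raise ValueError("sequences within an MSA must have equal length")
    # Same gene order for every species keeps the columns aligned.
    return {sp: "".join(msa[sp] for msa in msas) for sp in species}


gene1 = {"sp_a": "MK-L", "sp_b": "MKAL", "sp_c": "MR-L"}
gene2 = {"sp_a": "GTV", "sp_b": "GSV", "sp_c": "G-V"}
super_msa = concatenate_msas([gene1, gene2])
for name, seq in super_msa.items():
    print(name, seq)
```

The width of the super-MSA is the sum of the widths of the individual MSAs, here 4 + 3 = 7 columns.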
37
[Figure: the reconstructed tree with the three domains labeled Archaea, Eukaryota, and Bacteria; http://itol.embl.de]
38
Tree support
81.7% of the splits show bootstrap support of over 80%.
65% of the splits show bootstrap support of 100%.
However, several deep splits show low support.
39
Still, the debate goes on
40
“Tree of one percent of life”
On the one hand, Ciccarelli et al. favor the claim that bacteria adhere to a bifurcating tree of life, provided that the small number of LGT genes is filtered out.
On the other hand, their filtering process left only 31 proteins, which represent ~1% of an average prokaryotic proteome and ~0.1% of a large eukaryotic proteome.
“If throwing out all non-universally distributed genes and all LGT suspects leaves a 1% tree, then we should probably abandon the tree as a working hypothesis”