Phylogeny

Reconstructing a phylogeny
- The phylogenetic tree (phylogeny) describes the evolutionary relationships between the studied data
- The data must be comprised of homologous types
- In molecular evolution, the studied data are homologous DNA/AA sequences
- Phylogeny reconstruction explicitly assumes that the sequences are aligned

INPUT = MSA

Reminder: MSA and phylogeny are dependent
[Figure labels: Unaligned sequences, Sequence alignment, Inaccurate guide tree, MSA, Phylogeny reconstruction]

Phylogeny representation
Visual representation: [Figure: tree with leaves A, C, B, D]
Textual representation (Newick format): ((A,C),(B,D));
- Each pair of parentheses () encloses a clade in the tree
- A comma “,” separates the members of the corresponding clade
- A semicolon “;” is always the last character
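The three Newick rules above can be illustrated with a small parser. This is only a minimal sketch: it assumes plain leaf names with no branch lengths, internal labels, or whitespace inside the tree, and the function name parse_newick is made up for this example.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # "(" opens a clade
            children = [parse_node()]
            while text[pos] == ",":  # "," separates members of the clade
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # ")" closes the clade
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]  # a leaf name

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected characters before ';'")
    return tree


tree = parse_newick("((A,C),(B,D));")
print(tree)  # (('A', 'C'), ('B', 'D'))
```

Each nested tuple corresponds to one clade, so the tree from this slide becomes a pair of two-leaf clades.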

Some terminology
- root
- internal nodes
- internal branches (splits)
- external nodes (leaves)
- external branches
- neighbors
- monophyletic group (clade)

Swapping neighbors is meaningless
[Figure: drawings of the same Human, Chimp, Gorilla tree that differ only in the order in which the leaves are drawn]
(Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
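One way to check that two trees differ only by swapped neighbors is to compare them in an order-independent form. The sketch below is an illustration under assumptions: trees are written as nested tuples of unique leaf names, and the helper name canonical is invented for this example.

```python
def canonical(tree):
    """Order-independent form: a leaf name, or a frozenset of canonical children."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


t1 = ("Gorilla", ("Human", "Chimp"))
t2 = ("Gorilla", ("Chimp", "Human"))
t3 = (("Human", "Chimp"), "Gorilla")
t4 = (("Chimp", "Human"), "Gorilla")
other = (("Gorilla", "Human"), "Chimp")  # a genuinely different topology

same = all(canonical(t) == canonical(t1) for t in (t2, t3, t4))
print(same)                                  # True
print(canonical(other) == canonical(t1))     # False
```

Because sets ignore order, the four Newick strings on this slide map to the same value, while a tree that groups Gorilla with Human does not.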

Rooted vs. unrooted
[Figure: an unrooted tree of A, B, C with three possible root positions (1, 2, 3) and the three different rooted trees they give]

Rooted vs. unrooted, in Newick format
Unrooted tree: (A,B,C)
Rooted trees, one per root position (1, 2, 3): ((C,B),A) ≠ ((A,B),C) ≠ ((A,C),B)

How can we root a tree?

Rooting the tree based on a priori knowledge: using an outgroup
[Figure: trees of Human, Chimp and Gorilla (INGROUP) with Chicken (OUTGROUP)]
The outgroup should be close enough for detecting sequence homology, but far enough to be a clear outgroup.
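Outgroup rooting can be sketched as placing the root on the branch that leads to the outgroup leaf. The example below is a hedged illustration: the unrooted tree is given as an adjacency dictionary, the internal node names x and y are hypothetical, and the topology (Human next to Chimp) is assumed for the example.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf."""
    (neighbor,) = adjacency[outgroup]  # a leaf has exactly one neighbor

    def subtree(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # leaf
        return tuple(subtree(child, node) for child in children)

    # The root has two children: the ingroup subtree and the outgroup.
    return (subtree(neighbor, outgroup), outgroup)


# Hypothetical unrooted tree; "x" and "y" are unnamed internal nodes.
unrooted = {
    "Human": ["x"],
    "Chimp": ["x"],
    "Gorilla": ["y"],
    "Chicken": ["y"],
    "x": ["Human", "Chimp", "y"],
    "y": ["x", "Gorilla", "Chicken"],
}

rooted = root_with_outgroup(unrooted, "Chicken")
print(rooted)  # ((('Human', 'Chimp'), 'Gorilla'), 'Chicken')
```

Choosing a different outgroup leaf would place the root on a different branch and give a different rooted tree from the same unrooted one.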

The gene tree is not always identical to the species tree
[Figure: a gene tree and a species tree of Gorilla, Chimp, Human and Chicken; gene tree ≠ species tree]

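Phylogeny reconstruction approaches
Distance based methods: Neighbor Joining

[Figure: star tree of A, B, C, D, E, then the tree after A and B are joined]

Distance matrix:
     A  B  C  D  E
A    0  2  3  4  4
B       0  3  4  5
C          0  3  4
D             0  5
E                0

After joining A and B, the remaining entries shown are:
     C  D  E
C    0  3  4
D       0  5
E          0

The Minimum Evolution (ME) criterion: in each iteration, we separate the two sequences whose joining results in the minimal sum of branch lengths.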
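As a rough check of the first joining step, the sketch below applies the usual neighbor-joining selection rule, Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r is the sum of a sequence's distances to all others. This Q formula is the standard textbook criterion and is not spelled out on the slide; the distances are the ones from this slide's matrix.

```python
from itertools import combinations

# Pairwise distances from the slide's matrix (upper triangle).
upper = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 5,
    ("C", "D"): 3, ("C", "E"): 4,
    ("D", "E"): 5,
}
names = ["A", "B", "C", "D", "E"]


def dist(i, j):
    """Symmetric lookup into the upper-triangle table."""
    if i == j:
        return 0
    return upper[(i, j)] if (i, j) in upper else upper[(j, i)]


def q_value(i, j):
    """Standard NJ criterion (assumed here): smaller means a better pair to join."""
    n = len(names)
    row_i = sum(dist(i, k) for k in names)
    row_j = sum(dist(j, k) for k in names)
    return (n - 2) * dist(i, j) - row_i - row_j


best_pair = min(combinations(names, 2), key=lambda pair: q_value(*pair))
print(best_pair, q_value(*best_pair))  # ('A', 'B') -21
```

With these distances the pair A, B has the lowest Q value, which agrees with A and B being joined first on the slide.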

Phylogeny reconstruction approaches
Topology search methods: MP, ML
- Maximum Parsimony: finds the most parsimonious topology
  [Figure: Seq 1, Seq 2, Seq 3, Seq 4 placed on a tree]
- Maximum Likelihood: finds the most likely topology, based on P(Data|T)
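To make "most parsimonious" concrete, the sketch below counts the minimum number of changes one alignment column needs on a given rooted binary topology, using Fitch's algorithm. Fitch's algorithm is a standard way to do this and is brought in here as an illustration; the slides do not name it, and the column of states is made up because the slide's sequences are not in the transcript.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # assumes a binary tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical single alignment column.
column = {"Seq1": "A", "Seq2": "A", "Seq3": "C", "Seq4": "C"}

topology_1 = (("Seq1", "Seq2"), ("Seq3", "Seq4"))
topology_2 = (("Seq1", "Seq3"), ("Seq2", "Seq4"))

score_1 = fitch(topology_1, column)[1]
score_2 = fitch(topology_2, column)[1]
print(score_1, score_2)  # 1 2
```

For this column the first topology needs one change and the second needs two, so parsimony prefers the first. A full analysis would sum such scores over all columns and search over topologies.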

Phylogeny reconstruction approaches: summary
- Distance based methods
  - Neighbor Joining (e.g., using ClustalX)
  - Fast, but inaccurate
- Topology search methods
  - Maximum parsimony (e.g., using MEGA): crude, questionable statistical basis
  - Maximum likelihood (e.g., using RAxML, PhyML): accurate, but slow
- Bayesian methods
  - Markov chain Monte Carlo (MCMC) (e.g., using MrBayes): most accurate, but very slow

How robust is our tree?
[Figure: tree of Human, Gorilla and Chimp]

Bootstrap for estimating robustness
- We need some statistical way to estimate the confidence in the tree topology
- But we don’t know anything about the distribution of tree topologies
- The only data source we have is our data (MSA)
- So, we must rely on our own resources: “pull up by your own bootstraps”

Bootstrap
1. Create n ( ) new MSAs (pseudo-MSAs) by randomly sampling K positions from our original MSA with replacement

Original MSA (K positions):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T

Pseudo-MSA from sampled positions 1 1 2 4 4 … 3:
1 : AATTT…C
2 : AATTT…C
3 : AACTT…T
4 : AACTT…C

Pseudo-MSA from sampled positions 9 7 4 7 8 … 10:
1 : TTTTA…T
2 : CATAC…A
3 : CATAC…T
4 : AGTGG…A

Pseudo-MSA from sampled positions 5 1 5 7 8 … 12:
1 : GAGTA…T
2 : GAGAC…G
3 : AAAAC…A
4 : AAAGG…C
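Step 1 can be sketched as sampling column indices with replacement and rebuilding every sequence from the same sampled columns. This is a minimal sketch: only the first five columns shown on the slide are used, and the function name pseudo_msa and the fixed random seed are choices made for this example.

```python
import random


def pseudo_msa(msa, rng):
    """Sample K columns with replacement, where K is the alignment length."""
    k = len(next(iter(msa.values())))
    columns = [rng.randrange(k) for _ in range(k)]
    # Every sequence is rebuilt from the same sampled columns.
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


# First five columns of the original MSA on the slide.
msa = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [pseudo_msa(msa, rng) for _ in range(3)]
for replicate in replicates:
    print(replicate)
```

Whole columns are sampled, never single characters, so each pseudo-MSA column is a copy of some original column and the rows stay aligned.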

Bootstrap
2. Reconstruct a pseudo-tree from each pseudo-MSA with the same method used for reconstructing the original tree

[Figure: the pseudo-MSAs from step 1, each shown with its pseudo-tree of Sp1, Sp2, Sp3, Sp4]

Bootstrap
3. For each split in our original tree, we count the number of times it appeared in the pseudo-trees

[Figure: the original tree of Sp1, Sp2, Sp3, Sp4 with bootstrap values of 67% and 100% on its splits, next to the pseudo-trees]

In 67% of the pseudo-trees, the split between Sp1+Sp2 and the rest of the tree was found.

In general, bootstrap (bp) support < 80% is considered low.
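Step 3 can be sketched by writing every internal branch as a split of the leaves into two groups and counting how often each split of the original tree occurs among the pseudo-trees. The trees below are hypothetical nested tuples chosen so that two of three pseudo-trees contain the Sp1+Sp2 split; the function names are invented for this example.

```python
def leaves(tree):
    """Set of leaf names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial splits (bipartitions of the leaves) induced by internal branches."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def support(original, pseudo_trees):
    """Fraction of pseudo-trees containing each split of the original tree."""
    pseudo_splits = [splits(t) for t in pseudo_trees]
    return {s: sum(s in ps for ps in pseudo_splits) / len(pseudo_trees)
            for s in splits(original)}


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
pseudo_trees = [  # hypothetical pseudo-trees
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp2", "Sp1"), ("Sp4", "Sp3")),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]

result = support(original, pseudo_trees)
for split, value in result.items():
    print(sorted(sorted(side) for side in split), f"{value:.0%}")
```

Here the Sp1+Sp2 split is found in two of the three pseudo-trees, which gives a support of about 67%.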

ClustalX: NJ phylogeny reconstruction

Viewing the tree with njPlot

Note: unrooted tree

Defining an outgroup

Swapping nodes

Bootstrap support

FigTree: tree visualization and figure creation

Reconstructing the tree of life

Darwin’s vision of the tree of life from the Origin of Species

The three-domain tree of life based on SSU rRNA MSA

But the branching of several kingdoms remains in dispute

Lateral Gene Transfer (LGT) challenges the conceptual basis of phylogenetic classification

Methodology
- Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
- Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
- Constructed an MSA for each of the 31 orthogroups
- Concatenated all 31 MSAs into a super-MSA of 8090 columns
- The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach
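The concatenation step can be sketched as joining, for every species, its aligned rows from all per-gene MSAs in a fixed gene order. This is a toy illustration with made-up sequences and an invented function name, and it assumes every species is present in every MSA, as it is for the universal genes described above.

```python
def concatenate(msas):
    """Join per-gene MSAs (dicts of species -> aligned sequence) into one super-MSA."""
    species = list(msas[0])
    for msa in msas:
        if set(msa) != set(species):
            raise ValueError("every MSA must contain the same species")
        if len({len(seq) for seq in msa.values()}) != 1:
            raise ValueError("all rows of an MSA must have the same length")
    return {sp: "".join(msa[sp] for msa in msas) for sp in species}


# Two hypothetical per-gene alignments.
gene_msas = [
    {"Sp1": "ATG", "Sp2": "ATA", "Sp3": "CTG"},
    {"Sp1": "CC", "Sp2": "CA", "Sp3": "TC"},
]

super_msa = concatenate(gene_msas)
print(super_msa)  # {'Sp1': 'ATGCC', 'Sp2': 'ATACA', 'Sp3': 'CTGTC'}
```

The super-MSA length is the sum of the per-gene alignment lengths, which is how 31 alignments add up to a single 8090-column input.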

[Figure: the reconstructed tree, with the Archaea, Eukaryota and Bacteria domains labeled]

Tree support
- 81.7% of the splits show bootstrap support of over 80%
- 65% of the splits show bootstrap support of 100%
- However, several deep splits show low support

Still, the debate goes on

“Tree of one percent of life”
- On the one hand, Ciccarelli et al. favor the claim that bacteria adhere to a bifurcating tree of life, given that the small number of LGT genes is filtered out
- On the other hand, their filtering process left only 31 proteins, which represent ~1% of an average prokaryotic proteome and ~0.1% of a large eukaryotic proteome
- “If throwing out all non-universally distributed genes and all LGT suspects leaves a 1% tree, then we should probably abandon the tree as a working hypothesis”