Christian M Zmasek, PhD 15 June 2010.

1. Why perform phylogenetic inference?
2. Theoretical background
3. Methods
4. Software & Examples

(C) 2010 Christian M. Zmasek

- ‘Tree of life’: the relationships among different species.
- Inferring the functions of proteins from family members in model organisms, or refining existing annotations, through phylogenetic analysis.
- A method to organize/cluster sequences with biological justification.

[Figure: two trees of RAT, MOUSE, HUMAN, RICE, LIZARD and SHARK sequences with nodes labeled X, Y and Z. Legend: query sequence; orthologous to query; most similar to query; gene duplication.]

[Figure: tree of RAT, WHEAT, HUMAN and BARLEY sequences with nodes labeled Y and Z. Legend: query sequence; orthologous to query; most similar to query; gene duplication.]

- A phylogeny is the evolutionary history of a species or a group of species. Lately, the term has also been applied to the evolutionary history of individual DNA or protein sequences.
- The evolutionary history of organisms or sequences can be illustrated with a tree-like diagram: a phylogenetic tree.

- Initially, phylogenetic trees were built from the morphology of organisms.
- Around 1960, molecular sequences were recognized as containing phylogenetic information, and hence as valuable for tree building.
- A tree built from sequence data is called a gene tree, since it represents the evolutionary history of genes.
- A tree illustrating the evolutionary history of organisms is called a species tree.

- Homologs are defined as sequences that share a common ancestor (Fitch, 1966).
- This definition becomes unclear when mosaic proteins, which are composed of structural units originating from different genes, are considered.
- Phylogenetic trees make sense only if they are constructed from homologous sequences (whole genes/proteins, or domains).

- Homologous sequences can be divided into orthologs, paralogs and xenologs:
  - Orthologs: diverged by a speciation event (their last common ancestor on a phylogenetic tree corresponds to a speciation event). IMPORTANT: functional similarity does not imply orthology.
  - Paralogs: diverged by a duplication event (their last common ancestor corresponds to a duplication).
  - Xenologs: related to each other by horizontal gene transfer (via retroviruses, for example).
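Because orthology and paralogy are defined by the event at the last common ancestor, classifying a pair of genes is mechanical once a gene tree has its internal nodes labeled as speciations or duplications. The Python sketch below assumes such an annotated tree is already available (producing the annotation, e.g. by comparing the gene tree with a species tree, is a separate problem); the class `Node`, the helpers and the example tree are hypothetical, and xenologs (horizontal transfer) are not modeled.

```python
class Node:
    """Gene-tree node; internal nodes carry an event label."""

    def __init__(self, name, event=None):
        self.name = name
        self.event = event  # "speciation" or "duplication" (internal nodes only)
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def lineage(node):
    """Nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(x, y):
    above_x = {id(n) for n in lineage(x)}
    for n in lineage(y):  # first shared node walking up from y
        if id(n) in above_x:
            return n
    raise ValueError("nodes are not in the same tree")


def relationship(x, y):
    """Classify two genes by the event at their last common ancestor."""
    event = last_common_ancestor(x, y).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("last common ancestor has no event label")


# Hypothetical gene tree: one duplication, then a Human/Rat speciation in each copy.
root = Node("dup", event="duplication")
copy1 = root.add(Node("spec1", event="speciation"))
copy2 = root.add(Node("spec2", event="speciation"))
human_1, rat_1 = copy1.add(Node("HUMAN_1")), copy1.add(Node("RAT_1"))
human_2, rat_2 = copy2.add(Node("HUMAN_2")), copy2.add(Node("RAT_2"))

print(relationship(human_1, rat_1))    # orthologs
print(relationship(human_1, human_2))  # paralogs
print(relationship(human_1, rat_2))    # paralogs
```

The last pair shows why sequences from different species are not automatically orthologs: HUMAN_1 and RAT_2 meet at the duplication node.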

- Orthologous sequences tend to have more similar “functions” than paralogs.
- Yet orthologs are mathematically defined, whereas sequence “function” has no definition (i.e., it is a subjective term).

- New genes evolve when mutations accumulate while selective constraints are relaxed by gene duplication.
- First recognized by Haldane: “… it [mutation pressure] will favour polyploids, and particularly allopolyploids, which possess several pairs of sets of genes, so that one gene may be altered without disadvantage …”

[Figure: trees of Human, Rat and Wheat genes; labels G1, G2 and S.]

[Figure: overview of phylogenetic inference methods]
Multiple sequence alignment of homologous sequences, followed by:
- Pairwise distance calculation, then
  - algorithmic methods based on pairwise distances: UPGMA, Neighbor Joining
  - optimality criteria based on pairwise distances: Fitch-Margoliash, Minimal Evolution
- Optimality criteria based on character data: Maximum Parsimony, Maximum Likelihood
- Bayesian methods (MCMC)
The methods range from fast to “more accurate” (in general).

The simplest way to measure the distance between two amino acid sequences is their fractional dissimilarity p, where n_d is the number of aligned positions containing non-identical amino acids and n_s is the number of aligned positions containing identical amino acids:

$$p = \frac{n_d}{n_d + n_s}$$
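A minimal Python sketch of the fractional dissimilarity p. The slides do not say how alignment gaps are handled; this sketch assumes positions with a gap (`-`) in either sequence are simply skipped, and the function name `p_distance` is hypothetical.

```python
def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Fractional dissimilarity p = n_d / (n_d + n_s) of two aligned sequences.

    Positions with a gap in either sequence are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    n_d = n_s = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        if a == b:
            n_s += 1  # identical residues
        else:
            n_d += 1  # non-identical residues
    if n_d + n_s == 0:
        raise ValueError("no comparable positions")
    return n_d / (n_d + n_s)


print(p_distance("ARNDCQ", "VRNDCQ"))  # 1 difference in 6 positions, about 0.167
```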

- Unfortunately, this is unrealistic: it does not take into account
  - superimposed changes: multiple mutations at the same sequence position;
  - the different chemical properties of amino acids: for example, changing leucine into isoleucine is more likely, and should be weighted less, than changing leucine into proline.

- A more realistic approach for estimating evolutionary distances is to apply maximum likelihood to empirical amino acid replacement models, such as PAM transition probability matrices.
- The likelihood L_H of a hypothesis H (an evolutionary distance, for example) given some data D (an alignment, for example) is the probability of D given H: $L_H = P(D \mid H)$.
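To make "maximize P(D | H) over the distance" concrete, the sketch below estimates the distance between two aligned protein sequences by maximum likelihood. It does not use PAM matrices (reproducing them is beyond a sketch); instead it swaps in the simplest equal-rates Poisson model with 20 equally frequent amino acids, where P(a|a,t) = 1/20 + 19/20 e^{-ut} and P(b|a,t) = 1/20 - 1/20 e^{-ut} for a ≠ b, and estimates d = ut. The function names and the golden-section search are illustrative choices; for this model a closed form exists, printed alongside as a check.

```python
import math

K = 20  # amino acid states; all assumed equally frequent in this model


def log_likelihood(d, n_same, n_diff):
    """ln P(alignment | d) up to an additive constant, where d = u*t.

    Each identical site contributes P(a|a,t) = 1/K + (K-1)/K * exp(-d);
    each differing site contributes P(b|a,t) = 1/K - 1/K * exp(-d).
    """
    x = math.exp(-d)
    total = 0.0
    if n_same:
        total += n_same * math.log(1 / K + (K - 1) / K * x)
    if n_diff:
        total += n_diff * math.log(1 / K - x / K)
    return total


def ml_distance(n_same, n_diff, lo=1e-9, hi=20.0, tol=1e-10):
    """Maximize the likelihood over d by golden-section search (unimodal here)."""
    if n_diff / (n_same + n_diff) >= (K - 1) / K:
        raise ValueError("sequences too dissimilar: the ML distance is infinite")

    def f(d):
        return log_likelihood(d, n_same, n_diff)

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):      # maximum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - g * (b - a)
        else:                  # maximum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + g * (b - a)
    return (a + b) / 2


# 100 aligned positions, 20 of them different, so p = 0.2
print(round(ml_distance(n_same=80, n_diff=20), 4))   # about 0.2364
# Closed form for this model: d = -ln(1 - K/(K-1) * p)
print(round(-math.log(1 - K / (K - 1) * 0.2), 4))    # about 0.2364
```

The estimate exceeds p = 0.2 because the model accounts for superimposed changes; an empirical matrix such as PAM would also weight different replacements differently.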

- UPGMA stands for unweighted pair group method using arithmetic averages.
- It is a clustering method.
- It produces rooted trees under the assumption of a molecular clock.
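A compact Python sketch of UPGMA, assuming a symmetric distance matrix as input: repeatedly merge the two closest clusters, place the new node at half their distance (the molecular-clock assumption), and set the distance from the merged cluster to every other cluster to the size-weighted average, so each original sequence counts equally. The function name and the small ultrametric example matrix are hypothetical.

```python
from itertools import combinations


def upgma(names, matrix):
    """UPGMA clustering of a symmetric distance matrix; returns a Newick string.

    Node heights are half the average distance between the clusters being
    merged, which yields a rooted, clock-like (ultrametric) tree.
    """
    # cluster id -> (Newick subtree, number of leaves, height of the cluster's node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, size_i, h_i), (tree_j, size_j, h_j) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(pair) / 2
        merged = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        for k in clusters:
            # Unweighted average: each original leaf counts equally.
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    [(tree, _, _)] = clusters.values()
    return tree + ";"


names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))  # (D:5,(C:3,(A:1,B:1):2):2);
```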

- Unlike UPGMA, neighbor joining (NJ) is not misled by the absence of a molecular clock.
- NJ produces phylogenetic trees (not cluster diagrams).
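The sketch below implements the standard neighbor-joining steps in Python: compute each node's total distance r, join the pair minimizing the Q-criterion (n - 2) d(x, y) - r(x) - r(y), split the branch lengths, and replace the pair by a new node. The output is an unrooted tree given as edges; internal node labels `n0`, `n1`, ... and the small additive example matrix are hypothetical. Because the example matrix is additive, the tree's path lengths should reproduce it exactly, which `path_length` lets you check.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Neighbor joining; returns an unrooted tree as (node, node, length) edges.

    Internal nodes get the hypothetical labels "n0", "n1", ... (taxon names
    must not clash with them).
    """
    d = {(x, y): matrix[i][j] for i, x in enumerate(names) for j, y in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x, y] for y in nodes) for x in nodes}
        # Join the pair minimizing Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        x, y = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"n{counter}"
        counter += 1
        length_x = d[x, y] / 2 + (r[x] - r[y]) / (2 * (n - 2))
        edges += [(u, x, length_x), (u, y, d[x, y] - length_x)]
        nodes = [z for z in nodes if z not in (x, y)]
        for z in nodes:
            d[u, z] = d[z, u] = (d[x, z] + d[y, z] - d[x, y]) / 2
        d[u, u] = 0.0
        nodes.append(u)
    edges.append((nodes[0], nodes[1], d[nodes[0], nodes[1]]))
    return edges


def path_length(edges, start, end):
    """Sum of branch lengths on the path between two nodes of the tree."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == end:
            return dist
        stack.extend((nb, node, dist + w) for nb, w in adj[node] if nb != parent)
    raise ValueError("no path between the nodes")


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(names, matrix)
print(path_length(edges, "a", "e"))  # 8.0, equal to the input distance
```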

- Fitch-Margoliash
- Minimal evolution (ME)
- Maximum parsimony (MP)
- Maximum likelihood (ML)

- Branch lengths are fitted to a tree according to an unweighted least-squares criterion, but the optimality criterion used to evaluate and compare trees is minimization of the sum of all branch lengths.
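A brute-force Python sketch of the minimal evolution criterion as described: for each candidate topology, fit branch lengths by unweighted (ordinary) least squares to the pairwise distances, then compare the total tree lengths. Topologies are written as edge lists with hypothetical internal nodes `x`, `y`; the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free. Real ME programs search tree space heuristically rather than enumerating a handful of topologies, and least-squares fits may produce negative branch lengths on wrong topologies.

```python
from itertools import combinations


def path_edges(topology, x, y):
    """Indices of the edges on the path from x to y in an unrooted tree."""
    adj = {}
    for idx, (a, b) in enumerate(topology):
        adj.setdefault(a, []).append((b, idx))
        adj.setdefault(b, []).append((a, idx))
    stack = [(x, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == y:
            return used
        for nb, idx in adj[node]:
            if nb != parent:
                stack.append((nb, node, used + [idx]))
    raise ValueError("no path")


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_tree_length(topology, names, dist):
    """Fit branch lengths by unweighted least squares; return (lengths, total)."""
    rows, targets = [], []
    for i, j in combinations(range(len(names)), 2):
        row = [0.0] * len(topology)
        for idx in path_edges(topology, names[i], names[j]):
            row[idx] = 1.0  # this branch lies on the path between taxa i and j
        rows.append(row)
        targets.append(dist[i][j])
    m = len(topology)
    # Normal equations (A^T A) x = A^T y
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(m)] for p in range(m)]
    aty = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(m)]
    lengths = solve(ata, aty)
    return lengths, sum(lengths)


names = ["A", "B", "C", "D"]
dist = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
ab_cd = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
ac_bd = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")]
for label, topo in [("AB|CD", ab_cd), ("AC|BD", ac_bd)]:
    _, total = ols_tree_length(topo, names, dist)
    print(label, round(total, 6))  # AB|CD has the smaller total (ME choice)
```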

- Evaluate a given topology
- Example:
  Sequence1: TGC
  Sequence2: TAC
  Sequence3: AGG
  Sequence4: AAG
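One common way to evaluate a given topology under maximum parsimony is Fitch's algorithm, which counts the minimum number of changes per site. The Python sketch below applies it to the four example sequences on each of the three possible unrooted topologies (written as rooted nested tuples; the Fitch count does not depend on where an unrooted tree is rooted). Function names and topology labels are illustrative.

```python
def fitch_site(tree, states):
    """Return (state_set, changes) for one site on a rooted binary tree.

    tree: a leaf name (str) or a (left, right) tuple; states: leaf -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total minimum number of changes over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


seqs = {"Sequence1": "TGC", "Sequence2": "TAC", "Sequence3": "AGG", "Sequence4": "AAG"}
topologies = {
    "((1,2),(3,4))": (("Sequence1", "Sequence2"), ("Sequence3", "Sequence4")),
    "((1,3),(2,4))": (("Sequence1", "Sequence3"), ("Sequence2", "Sequence4")),
    "((1,4),(2,3))": (("Sequence1", "Sequence4"), ("Sequence2", "Sequence3")),
}
for label, tree in topologies.items():
    print(label, parsimony_score(tree, seqs))
# ((1,2),(3,4)) 4
# ((1,3),(2,4)) 5
# ((1,4),(2,3)) 6
```

Under parsimony, ((1,2),(3,4)) is preferred because it explains the data with the fewest changes.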

- Probabilistic methods can be used to assign a likelihood to a given tree, and therefore allow selection of the tree that is most likely given the observed sequences.
- The probability for one residue a to change to b in time t along a branch of a tree: P(b|a,t).
- Its actual calculation depends on the model of sequence evolution used.
- Poisson process (u is the substitution rate):
  $P(b \mid a, t) = \frac{1}{20} + \frac{19}{20} e^{-ut}$ for $a = b$
  $P(b \mid a, t) = \frac{1}{20} - \frac{1}{20} e^{-ut}$ for $a \neq b$
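The likelihood of a whole tree is usually computed with Felsenstein's pruning algorithm, which combines P(b|a,t) along branches from the leaves to the root. The Python sketch below assumes the equal-rates Poisson model with P(a|a,t) = 1/K + (K-1)/K e^{-ut} and P(b|a,t) = 1/K - 1/K e^{-ut}, uniform root frequencies 1/K, and independent sites. Since the parsimony example uses DNA, it sets K = 4 (the 20-letter amino acid case works the same way). The tuple tree format, the function names and the branch lengths of 0.1 are arbitrary illustrative choices.

```python
import math

ALPHABET = "ACGT"  # the protein version of the model uses 20 amino acids; K = 4 here
K = len(ALPHABET)


def p_change(a, b, t, u=1.0):
    """Equal-rates Poisson model: P(b | a, t) for a K-letter alphabet."""
    x = math.exp(-u * t)
    return 1 / K + (K - 1) / K * x if a == b else 1 / K - x / K


def partials(node, column):
    """Conditional likelihoods L(state) at `node` for one alignment column.

    A node is (leaf_name, branch_length) or ((child, child), branch_length);
    the branch length is that of the branch above the node.
    """
    label, _ = node
    if isinstance(label, str):  # leaf: only the observed residue is possible
        return {s: 1.0 if s == column[label] else 0.0 for s in ALPHABET}
    result = {s: 1.0 for s in ALPHABET}
    for child in label:
        child_partials, t = partials(child, column), child[1]
        for a in ALPHABET:
            result[a] *= sum(p_change(a, b, t) * child_partials[b] for b in ALPHABET)
    return result


def tree_log_likelihood(tree, seqs):
    """ln P(alignment | tree) with uniform root frequencies 1/K, sites independent."""
    total = 0.0
    for i in range(len(next(iter(seqs.values())))):
        column = {name: s[i] for name, s in seqs.items()}
        root = partials(tree, column)
        total += math.log(sum(root[a] / K for a in ALPHABET))
    return total


seqs = {"Sequence1": "TGC", "Sequence2": "TAC", "Sequence3": "AGG", "Sequence4": "AAG"}
# Topology ((1,2),(3,4)) with every branch length set to 0.1 (arbitrary choice).
left = ((("Sequence1", 0.1), ("Sequence2", 0.1)), 0.1)
right = ((("Sequence3", 0.1), ("Sequence4", 0.1)), 0.1)
tree = ((left, right), 0.0)
print(tree_log_likelihood(tree, seqs))
```

A full ML program would also optimize the branch lengths and search over topologies, keeping the tree with the highest log-likelihood.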

- Example: MrBayes
- Uses a Markov chain Monte Carlo (MCMC) approach to sample over tree space.
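The core MCMC idea can be shown on a toy "tree space" containing only the three unrooted topologies of four taxa. The Python sketch below runs a Metropolis sampler with a symmetric proposal; the fraction of iterations spent on each topology approximates its posterior probability. The weights are hypothetical numbers chosen for illustration, not computed from data, and real programs such as MrBayes also sample branch lengths and model parameters with more elaborate proposals.

```python
import math
import random
from collections import Counter

# Hypothetical, unnormalized posterior weights for the three unrooted
# topologies of four taxa (illustrative numbers, not real results).
WEIGHTS = {
    "((1,2),(3,4))": math.exp(-4),
    "((1,3),(2,4))": math.exp(-5),
    "((1,4),(2,3))": math.exp(-6),
}


def metropolis(weights, n_steps, seed=0):
    """Metropolis sampler over a finite set of topologies; returns visit frequencies."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    counts = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: move to one of the other topologies uniformly.
        proposal = rng.choice([s for s in states if s != current])
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        counts[current] += 1
    return {s: counts[s] / n_steps for s in states}


freqs = metropolis(WEIGHTS, n_steps=200_000, seed=1)
total = sum(WEIGHTS.values())
for topology, weight in WEIGHTS.items():
    # sampled frequency vs. exact normalized weight
    print(topology, round(freqs[topology], 3), round(weight / total, 3))
```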

- Used to assess the reliability of trees
- Resampling with replacement (see the example below)
- What is “good enough”? >60%? >90%?

Original sequence alignment:
Sequence 1: ARNDCQ
Sequence 2: VRNDCQ

Bootstrap resample 1:
Sequence 1: RRQCCA
Sequence 2: RRQCCV

Bootstrap resample 2:
Sequence 1: AQCDCQ
Sequence 2: VQCDCQ
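Each bootstrap replicate keeps the sequences in register by drawing whole alignment columns with replacement, as in the resamples above. A minimal Python sketch (the function name and seed are arbitrary): each replicate would then be analyzed with the same tree-building method, and the support for a group is the percentage of replicate trees that contain it.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: columns drawn with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


alignment = {"Sequence 1": "ARNDCQ", "Sequence 2": "VRNDCQ"}
rng = random.Random(2010)  # fixed seed so the replicates are reproducible
for k in (1, 2):
    print(f"Bootstrap resample {k}:", bootstrap_replicate(alignment, rng))
```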

- Mafft (server available)
- T-Coffee (servers available)
- ClustalW: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/ (server available)
- Probcons (server available)
- Muscle (server available)

List of programs:
- ML pairwise distance calculation (protein): TREE-PUZZLE
- Bootstrapping, pairwise distance calculation, UPGMA, NJ, Fitch-Margoliash, ME: PHYLIP
- ME: FastME (server), MEGA
- ML: PhyML (server), RAxML (server)
- Bayesian (MCMC): MrBayes
- Parsimony (esp. on Macintosh), display: PAUP
- Tree display: Archaeopteryx
- Hypothesis testing: HyPhy

- Download and install MrBayes.
- Read the tutorial.
- Analyze the provided data set (“primates.nex”).
- Download and install PHYLIP.
- Run seqboot (100x), then dnadist, neighbor (NJ) and consense, on “primates.nex” (you need to convert the file format accordingly).
- Compare the results (MrBayes vs. PHYLIP NJ).
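The final consense step summarizes the bootstrap trees by how often each group (clade) appears. The Python sketch below shows the majority-rule idea in simplified form: it assumes each replicate tree has already been rooted (for example with an outgroup) and written as nested tuples, counts non-trivial clades, and keeps those found in more than half of the trees. It is not PHYLIP's implementation and ignores its options; the replicate trees and taxon names are hypothetical, not results for “primates.nex”.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades of a rooted tree written as nested tuples of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade (all leaves) carries no information
    return found


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees, with their frequencies."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c: k / len(trees) for c, k in counts.items() if k / len(trees) > threshold}


# Hypothetical bootstrap trees (illustrative only).
replicates = [
    ((("Human", "Chimp"), "Gorilla"), "Orangutan"),
    ((("Human", "Chimp"), "Gorilla"), "Orangutan"),
    ((("Human", "Gorilla"), "Chimp"), "Orangutan"),
]
for clade, support in majority_rule_clades(replicates).items():
    print(sorted(clade), f"{support:.0%}")
```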