Phylogenetics
Alexei Drummond, CS369 2007

Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there?
(A) 2027025 (B) 34459425 (C) $8.20 \times 10^{21}$ (D) 3.21
Bonus question: What about unrooted trees?

Computational Biology
[Course map: pairwise sequence alignment (global and local), multiple sequence alignment, substitution matrices, database searching (BLAST), sequence statistics, evolutionary tree reconstruction.]
Adapted from a slide by Dannie Durand

Molecules as Documents of Evolutionary History
Macromolecules contain information about the processes and history that formed them:
HIV-1 (UK)   ATCGGATGCTAAAGCATATGACACAGAGGTACATAATGTTT
HIV-1 (USA)  ATCAGATGCTAGAGCTTATGATACAGAGGTACA---TGTTT
However, this information is often fragmentary, camouflaged or lost completely. One of the aims of computational biology is to recover as much of this information as possible and decipher its meaning.
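One simple piece of information that can be recovered from the two HIV-1 fragments is the proportion of aligned sites at which they differ (the p-distance). The sketch below is a minimal illustration, not a method taken from the slides: the function name `p_distance` is hypothetical, and skipping every column with a gap in either sequence is an assumption made here.

```python
def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of aligned sites at which two sequences differ.

    Columns with a gap in either sequence are skipped (pairwise deletion),
    which is an assumption of this sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


uk = "ATCGGATGCTAAAGCATATGACACAGAGGTACATAATGTTT"
usa = "ATCAGATGCTAGAGCTTATGATACAGAGGTACA---TGTTT"
print(p_distance(uk, usa))  # 4 differences over 38 ungapped sites
```

The two fragments differ at 4 of their 38 ungapped sites, so p is about 0.105.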

Phylogenetics
- Views similarity (homology) as evidence of common ancestry
  - Homology: similarity that is the result of inheritance from a common ancestor
- Uses tree diagrams to portray relationships based upon recency of common ancestry
- Monophyletic groups (clades) contain species that are more closely related to each other than to any species outside the group
- Phylogenetics has in recent years become a statistical science based on probabilistic models of evolution.

Types of Phylogenies
[Figure: the same tree of Bacteria 1 to 3 and Eukaryotes 1 to 4 drawn as a cladogram and as a phylogram.]
- Cladograms show clusters
  - Branch lengths are meaningless
- Phylograms show clusters and branch lengths
  - Branch lengths can represent time or genetic distance
  - The vertical dimension is meaningless

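Rooting trees using an outgroup
[Figure: an unrooted tree of archaea, eukaryotes and bacteria, and the same tree rooted by using bacteria as the outgroup, which places the root of the ingroup (eukaryotes and archaea) on the branch leading to the outgroup. Panel labels: "Unrooted tree", "Rooted by outgroup", "Monophyletic Group (clade)".]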

Anatomy of a tree
[Figure: a rooted tree of Bacteria 1 to 3 and Eukaryotes 1 to 4, labelling the root, an internal node, an external node or tip (taxon), an internal branch or edge, and an external branch or edge.]

Problems in Phylogenetics
- Correctly aligning multiple sequences
- Choosing an evolutionary model of sequence change
  - to estimate the genetic distance between sequences
- Inferring phylogenetic trees
- Testing evolutionary hypotheses
  - (we won't cover this material in 369)
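As an illustration of how a model of sequence change turns observed differences into a genetic distance, the sketch below applies the Jukes-Cantor (JC69) correction, one of the simplest substitution models. The slide does not name a model, so choosing JC69 is an assumption, and the function name is hypothetical. It maps an observed proportion of differing sites p to an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which allows for several substitutions having hit the same site.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p). Undefined once p reaches 3/4, the expected
    proportion of differences between unrelated random DNA sequences.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The two HIV-1 fragments on the "Molecules as Documents of Evolutionary
# History" slide differ at 4 of their 38 ungapped sites.
p = 4 / 38
print(round(p, 3), round(jukes_cantor_distance(p), 3))
```

This prints `0.105 0.113`: the corrected distance is always a little larger than p, and the gap widens as sequences diverge.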

How many trees are there?
For n taxa there are $(2n-3)!! = (2n-3)(2n-5)\cdots(3)(1)$ rooted, binary trees.
[Table: n against the number of rooted trees, annotated in order of increasing n: enumerable by hand; enumerable by hand on a rainy day; enumerable by computer; still searchable very quickly on computer; a bit more than the number of hairs on your head; greater than the population of Auckland; ≈ upper limit for exhaustive searching, about the number of possible combinations of numbers in the UK National Lottery; ≈ upper limit for branch-and-bound searching; ≈ the number of particles in the universe; the number of trees to choose from in the "Out of Africa" data (Vigilant et al., 1991).]
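The count is easy to evaluate exactly with integer arithmetic, which also answers the Friday quiz. This is a minimal sketch with hypothetical function names. For the bonus question it uses the standard count of unrooted binary trees, (2n - 5)!!, which does not appear on the slide.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ..., with k!! = 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n: int) -> int:
    """Number of rooted, binary trees with n labelled tips: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n - 3)


def unrooted_trees(n: int) -> int:
    """Number of unrooted, binary trees with n labelled tips: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n - 5)


for n in (4, 9, 10, 20):
    print(n, rooted_trees(n), unrooted_trees(n))
```

`rooted_trees(9)` and `rooted_trees(10)` give 2027025 and 34459425, the two smaller quiz options. `rooted_trees(20)` is 8200794532637891559375 (about 8.20 × 10^21), and `unrooted_trees(20)` is 221643095476699771875 (about 2.2 × 10^20). Python integers have arbitrary precision, so no rounding happens even for large n.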

Phylogenetic Reconstruction
[Figure: taxa A to E shown as a character matrix (taxa by characters) and as a distance matrix.]
There are essentially two types of data for phylogenetic tree estimation:
- Distance data, usually stored in a distance matrix; e.g. DNA×DNA hybridisation data, morphometric differences, immunological data, pairwise genetic distances
- Character data, usually stored in a character array; e.g. a multiple sequence alignment of DNA sequences, morphological characters
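Character data can always be reduced to distance data, though in general not the other way round. The sketch below assumes a gap-free alignment and uses a plain count of differing sites as the distance. The five sequences for taxa A to E are invented for illustration, and `distance_matrix` is a hypothetical helper.

```python
from itertools import combinations


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Count the differing sites for every pair of taxa in a gap-free alignment."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all sequences must have the same length")
    return {
        (x, y): sum(a != b for a, b in zip(alignment[x], alignment[y]))
        for x, y in combinations(sorted(alignment), 2)
    }


# Character data: one row per taxon, one column per character (site).
alignment = {
    "A": "ACGTAC",
    "B": "ACGTTC",
    "C": "ACTTTC",
    "D": "GCTATC",
    "E": "GCTATG",
}
for pair, d in distance_matrix(alignment).items():
    print(pair, d)
```

The ten pairwise counts are all a distance method gets to see. The information about which sites differ, which parsimony and likelihood methods use, is lost.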

Phylogenetic Reconstruction
Given the huge number of possible trees even for small data sets, we have two options:
- Build one according to some clustering algorithm
- Assign a "goodness of fit" criterion (an objective function) and find the tree(s) which optimise(s) this criterion

Phylogenetic Reconstruction
Type of data (columns) against tree-building method (rows):
Tree-building method | Distances               | Nucleotide sites
Clustering algorithm | UPGMA, Neighbor-Joining | -
Optimality criterion | Minimum Evolution       | Maximum Parsimony, Maximum Likelihood
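To make the idea of an optimality criterion concrete, here is a minimal sketch of the maximum-parsimony score of one fixed tree. It uses Fitch's algorithm, the standard way to count the minimum number of changes; the slides do not say which algorithm is used. Trees are written as nested Python tuples and the four-taxon alignment is invented. A tree search would compute this score for many candidate trees and keep the lowest.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (candidate state set, number of changes) for one site on a subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: dict[str, str]) -> int:
    """Sum the minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

The tree grouping A with B needs only one change per site, so parsimony prefers it.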

Clustering Algorithms
The clustering algorithms are usually very fast and simple, but:
- there is no explicit optimality criterion, so we have no measure of how good the tree is!
- we do not get any idea about other potential trees: were there any better trees?
Common methods are Neighbour-Joining and UPGMA.

Clustering Algorithms
The UPGMA and neighbor-joining (NJ) methods are both greedy heuristics that join, at each step, the two closest* sub-trees that are not already joined. They are based on the minimum evolution principle. An important concept in both of these methods is a pair of neighbors, defined as two nodes that are connected via a single node:
[Figure: taxa A and B are neighbors, both connected to Node 1.]
* NJ uses rate-corrected distances.

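UPGMA Example
      A    B    C    D
 A    0
 B    8    0
 C    7    9    0
 D    …    …    …    0
A and C are the closest pair (distance 7), so they are joined first, at height 7/2 = 3.5.

UPGMA Example (continued)
      AC   B    D
 AC   0
 B    8.5  0
 D    …    …    0
The distance from the new cluster AC to B is the average of d(A,B) and d(C,B): (8 + 9)/2 = 8.5.

UPGMA Example (continued)
[Figure: the tree so far (A and C joined at height 3.5, then B), and the reduced distance matrix for the final join with D.]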
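The first two steps of this example can be reproduced with a short UPGMA sketch. It uses only taxa A, B and C, with d(A,B) = 8, d(A,C) = 7 and d(B,C) = 9. The function name, the cluster-naming scheme and the frozenset-keyed distance dictionary are illustrative choices, not part of the original material.

```python
from itertools import combinations


def upgma(taxa: list[str], dist: dict[frozenset, float]) -> list[tuple[str, str, float]]:
    """Cluster taxa with UPGMA.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    Returns the merges in order as (cluster_i, cluster_j, height), where the
    height of the new node is half the distance between the merged clusters.
    """
    size = {t: 1 for t in taxa}  # number of leaves in each current cluster
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(taxa, 2)}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        i, j = sorted(pair)
        new = f"({i},{j})"
        merges.append((i, j, d[pair] / 2))  # clock assumption: equal depths
        # Distance from the new cluster to each other cluster is the
        # size-weighted mean, i.e. the average over all leaf pairs.
        for k in size:
            if k not in pair:
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))] + size[j] * d[frozenset((j, k))]
                ) / (size[i] + size[j])
        size[new] = size.pop(i) + size.pop(j)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
    return merges


dist = {
    frozenset(("A", "B")): 8,
    frozenset(("A", "C")): 7,
    frozenset(("B", "C")): 9,
}
print(upgma(["A", "B", "C"], dist))
# [('A', 'C', 3.5), ('(A,C)', 'B', 4.25)]
```

Using half the joining distance as the height of each new node is exactly the clock-like assumption that makes UPGMA produce a rooted tree.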

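UPGMA weaknesses
[Figure: the distance matrix from the example (d(A,B) = 8, d(A,C) = 7, d(B,C) = 9) beside a tree on A, B, C and D in which A's branch has length 3 and B's branch has length 5.]
There is a (non-clock-like) tree that fits the distance matrix exactly!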
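Restricting attention to A, B and C, the branch lengths of a tree that fits the distances exactly follow from the three pairwise distances. Each leaf's branch is half of (the sum of its two distances minus the distance between the other two leaves). A minimal sketch, with a hypothetical helper name:

```python
def three_point_branch_lengths(d_ab: float, d_ac: float, d_bc: float) -> tuple[float, float, float]:
    """Branch lengths (a, b, c) of the star tree on three leaves A, B, C.

    They satisfy a + b = d_ab, a + c = d_ac and b + c = d_bc exactly.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


a, b, c = three_point_branch_lengths(d_ab=8, d_ac=7, d_bc=9)
print(a, b, c)  # 3.0 5.0 4.0
```

The lengths 3 and 5 for A and B match the tree on the slide. No clock-like (ultrametric) tree can fit these three distances, because that would require the two largest distances to be equal, and 8 ≠ 9.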

UPGMA properties
- UPGMA assumes that the rates of evolution are clock-like
  - i.e. it assumes the rate of substitution is the same on all branches of the tree
- Produces a rooted tree

Neighbor-joining
- The most widely used distance-based method for phylogenetic reconstruction
- UPGMA illustrated that it is not enough to pick the closest neighbors (at least when rates vary across branches)
- Idea: take into account the averaged distances to the other leaves as well
- Produces an unrooted tree

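The basic idea
We start by moving every node i closer to all other nodes by this amount:
$$r_i = \frac{1}{n-2} \sum_{k \neq i} d_{ik}$$
where n is the number of nodes currently in the distance matrix. As a result, the new (squashed) distances are:
$$D_{ij} = d_{ij} - (r_i + r_j)$$
We are pushing node i closer to all other nodes by an amount slightly more than the average distance to all other taxa (we divide by n − 2 rather than n − 1).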

The basic idea (continued)
In effect, the nodes that were far away from everything get pushed towards everything quite a lot. This counteracts the effect of long branches.
[Figure: a four-taxon tree with taxa A, B, C and D.]
UPGMA would incorrectly group A and B, whereas NJ would reconstruct the correct tree in this case.

Neighbor-joining
We use an algorithm very similar to UPGMA to connect the two closest nodes, i and j, using these new squashed distances. We join them into a cluster, create a new node k to correspond to their ancestor, and compute the distances from i, j and all other nodes to k. The squashed distances are updated at each step. See the Durbin book, p. 171, for details.
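A minimal neighbor-joining sketch follows. It uses the squashed distances r_i = (sum over k of d_ik) / (n - 2) and D_ij = d_ij - (r_i + r_j). It also uses the usual update rules for the new node k: d_ik = (d_ij + r_i - r_j) / 2 and d_km = (d_im + d_jm - d_ij) / 2. Finishing by attaching the last three nodes to one central node is a choice made here. The node names `n1`, `n2`, ... and `"centre"` are hypothetical.

The four-taxon distances are invented. They come from the tree ((A,C),(B,D)) with branches A 1, C 5, B 1, D 4 and an internal branch of 1. This makes it a long-branch case like the one above: d(A,B) = 3 is the smallest entry, so UPGMA would join A and B first, but A and B are not neighbors.

```python
from itertools import combinations


def neighbor_joining(taxa: list[str], dist: dict[frozenset, float]):
    """Neighbor-joining sketch; dist maps frozenset({x, y}) to a distance.

    Returns (joins, edges): the pairs joined, in order, and the edges of the
    unrooted tree as (node, neighbour, branch_length) tuples.
    """
    if len(taxa) < 3:
        raise ValueError("need at least three taxa")
    d = dict(dist)  # grows as new internal nodes are added

    def d_of(x: str, y: str) -> float:
        return d[frozenset((x, y))]

    nodes, joins, edges = list(taxa), [], []
    while len(nodes) > 3:
        n = len(nodes)
        # r_i: slightly more than the average distance from i to the others.
        r = {i: sum(d_of(i, k) for k in nodes if k != i) / (n - 2) for i in nodes}
        # Join the pair with the smallest squashed distance d_ij - (r_i + r_j);
        # min() keeps the first pair in case of ties.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d_of(p[0], p[1]) - r[p[0]] - r[p[1]])
        k = f"n{len(joins) + 1}"  # hypothetical name for the new ancestor node
        d_ik = (d_of(i, j) + r[i] - r[j]) / 2
        edges += [(i, k, d_ik), (j, k, d_of(i, j) - d_ik)]
        for m in nodes:
            if m not in (i, j):
                d[frozenset((k, m))] = (d_of(i, m) + d_of(j, m) - d_of(i, j)) / 2
        joins.append((i, j))
        nodes = [m for m in nodes if m not in (i, j)] + [k]
    # Three nodes left: attach them to one central node (three-point formula).
    a, b, c = nodes
    edges += [
        (a, "centre", (d_of(a, b) + d_of(a, c) - d_of(b, c)) / 2),
        (b, "centre", (d_of(a, b) + d_of(b, c) - d_of(a, c)) / 2),
        (c, "centre", (d_of(a, c) + d_of(b, c) - d_of(a, b)) / 2),
    ]
    return joins, edges


pairs = {("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 6,
         ("B", "C"): 7, ("B", "D"): 5, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
joins, edges = neighbor_joining(["A", "B", "C", "D"], dist)
print(joins)  # [('A', 'C')]
for edge in edges:
    print(edge)
```

NJ joins A with C and recovers every branch length of the generating tree (A 1, C 5, B 1, D 4, and 1 for the edge from n1 to centre). With four taxa the two true cherries, (A,C) and (B,D), always tie on D_ij, so either choice gives the same unrooted tree.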

Runtime of the algorithm
Both of these clustering-based algorithms take $O(n^3)$ time once we have the distance matrix. There are n steps, and in each step we:
(1) find the smallest distance
(2) join these two taxa
(3) compute the distance from the new ancestor to all others
Step (1) takes $O(n^2)$ and the other two steps take $O(n)$.