Phylogenetic trees Sushmita Roy BMI/CS 576

Slides:



Advertisements
Similar presentations
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Multiple Sequence Alignment & Phylogenetic Trees.
Lecture 13 CS5661 Phylogenetics Motivation Concepts Algorithms.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Phylogenetic reconstruction
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
Problem Set 2 Solutions Tree Reconstruction Algorithms
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
UPGMA and FM are distance based methods. UPGMA enforces the Molecular Clock Assumption. FM (Fitch-Margoliash) relieves that restriction, but still enforces.
Overview of Phylogeny Artiodactyla (pigs, deer, cattle, goats, sheep, hippopotamuses, camels, etc.) Cetacea (whales, dolphins, porpoises)
Bioinformatics Algorithms and Data Structures
Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.
The Tree of Life From Ernst Haeckel, 1891.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Phylogeny Tree Reconstruction
Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.
Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
. Phylogenetic Trees Lecture 11 Sections 7.1, 7.2, in Durbin et al.
9/1/ Ultrametric phylogenies By Sivan Yogev Based on Chapter 11 from “Inferring Phylogenies” by J. Felsenstein.
Phylogenetic Analysis. 2 Introduction Intension –Using powerful algorithms to reconstruct the evolutionary history of all know organisms. Phylogenetic.
Terminology of phylogenetic trees
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.
1 Chapter 7 Building Phylogenetic Trees. 2 Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances –UPGMA method.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Introduction to Phylogenetic Trees
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Tutorial 5 Phylogenetic Trees.
Phylogenetics.
Phylogenetic Trees - Parsimony Tutorial #13
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Part 9 Phylogenetic Trees
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Phylogenetic trees. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Phylogenetic basis of systematics
Inferring a phylogeny is an estimation procedure.
Goals of Phylogenetic Analysis
The Tree of Life From Ernst Haeckel, 1891.
Phylogenetic Trees.
#30 - Phylogenetics Distance-Based Methods
Lecture 7 – Algorithmic Approaches
Phylogeny.
Presentation transcript:

Phylogenetic trees Sushmita Roy BMI/CS 576 www.biostat.wisc.edu/bmi576/ sroy@biostat.wisc.edu Oct 1st, 2013

Key concepts in this section What are phylogenies or phylogenetic trees? Terminology such as extant, ancestral, branch point, branch length, orthologs, paralogs, taxon Why build phylogenetic trees? How to build phylogenetic trees? Distance-based methods Parsimony methods Minimize the number of changes Probabilistic methods Find the tree that best explains the data using probabilistic models

What are phylogenetic trees? A tree that describes evolutionary relationships among entities Species, genes, strains Leaves represent extant entities Internal nodes represent ancestral species Such a tree is inferred from observations in existing organisms.

Tree of life aims to represents the phylogeny of all species on earth From http://tellapallet.com/tree_of_life.htm

Phylogenetic tree of 29 mammals Lindbald-Toh et al, 2011, Nature

Why phylogenetic trees? Understand how organisms are related Do humans and chimpanzees share a common ancestor or do humans and gorillas? Ask how closely organisms are related Humans and chimpanzees shard a common ancestor 5mya Provide insight into the evolutionary history of species How specific functions have evolved Language evolution Identify signatures of conservation of sequence Conjecture the fate of specific regions of the genome Will the human Y disappear? Inform multiple sequence alignments

Orthologs and paralogs Two sequences in two species that have a a common ancestor Diverged due to a speciation event Used to create a “species tree” Paralogs: Two sequences in the same species that arose from a gene duplication event Captured in a “gene tree”.

Phylogenetic tree basics Leaves represent things (genes, species, individuals/strains) being compared the term taxon (taxa plural) is used to refer to these when they represent species and broader classifications of organisms For example if taxa are species, the tree is a species tree Internal nodes are hypothetical ancestral units Phylogenetic trees can be rooted or unrooted the root represents the common ancestor In a rooted tree, path from root to a node represents an evolutionary path Gives directionality to evolutionary time An unrooted tree specifies relationships among taxa, but not from an ancestor

Tree basics Internal node: Ancestral 2 5 9 8 8 6 Branch length 6 7 7 1 4 1 2 3 4 5 3 Leaf node: Extant Unrooted tree Rooted tree Each tree topology represents a different evolutionary history For a species tree, internal nodes represent speciation events

Internal nodes represent ancestral species Tree of Life project (http://tolweb.org/tree/)

Rooting a tree An unrooted tree can be converted to a rooted tree using an outgroup species Outgroup: a species known to be more distantly related all the species than each of the species themselves Find the branch where the outgroup is selected to be added That gives the root

Tree counting A rooted tree with n leaf nodes has n-1 internal nodes 2n-2 edges/branches An unrooted tree with n leaf nodes has n-2 internal nodes 2n-3 edges/branches A root can be added to any of these branches to give 2n-3 rooted trees for any unrooted tree For three taxa there is one unrooted tree and three rooted trees

Tree counting 1 1 3 3 1 2 3 2 2 1 An unrooted tree 3 3 2 1 2 1 3 2 1 3 Possible positions for root Rooted trees

Tree counting Instead of adding a root we could add a branch for the n+1th taxon 4 1 1 1 3 3 3 2 2 2 1 1 4 3 2 3 2 1 1 3 3 2 2 4

Tree counting With four nodes, we have five branches Each of the branches can give rise to five trees of six nodes Thus we have 3*5 trees In general for n nodes we can have (1)(3)(5)..(2n-5) unrooted trees

Constructing phylogenetic trees Three types of methods Distance based methods Parsimony methods Probabilistic approaches Most methods start with pairwise distance methods We have already seen one method!

Methods for phylogenetic tree reconstruction Distance-based methods UPGMA Neighbor joining Assume additivity and sometimes a “molecular clock” Additivity means we can add up the branch lengths of the tree connecting two nodes and get their distances. Alignment-based methods Parsimony Probabilistic

Defining distance between sequences Fractional alignment difference for two sequences i and j pij = mij/Lij Gives an estimate of changes per site mij: Number of mismatches Assumes that changes have happened only once Underestimates the distance between sequences Assumes all sequences change at the same rate Jukes Cantor distance The simplest, evolutionary distance

UPGMA relies on the molecular clock assumption Sequences diverge at the same rate at different points in the phylogeny Distance from any leaf to root is the same. If this is true the data is said to be ultrametric

The molecular clock assumption & ultrametric data Ultrametric data: for any triplet of sequences, i, j, k, the distances are either all equal, or two are equal and the remaining one is smaller A B C D E 8 5 3 4 3 d_ij <= max(d_ik, d_jk) 2 1 A E D B C

Problems with the molecular clock assumption 3 2 4 2 3 4 1 1 Constructed by UPGMA Actual tree

Neighbor joining The ultra-metric property is too strong Most sequences diverge at different rates A more relaxed requirement is that of additivity Distance between a pair of species/nodes is equal to the sum of the branch lengths Uses a similar idea to construct trees as UPGMA That is consider pairs of nodes and joins them Produces unrooted trees

How to select nodes for merging? Given all pairwise distances for n sequences dij denote the distance between node i and j Should we select node pairs with the smallest dij? A B C D 0.4 0.1 Should we merge A and B?

Need to correct for long branches L: current set of leaves ri : Average distance from other nodes

Defining the distance to a new node dkm? i m k j New node Given dij, dim, djm, how to calculate distance to new node k?

Algorithm for NJ Initialization Iteration Terminate T be set the of leaf nodes L = T Estimate ri for all i in L Estimate Dij Iteration Pick a pair i, j from L such that Dij is smallest Define new node k Estimate dik, djk, add edge between k and i, and between k to j Add k to T, remove i and j from L Estimate Dmn for all nodes m, n in L Terminate If L has two nodes, add the edge between these two.

An example with neighbor joining Consider 5 sequences: A, B, C, D, E Distance matrix What is the tree inferred by the Neighbor joining algorithm? B C D E A 5 4 9 8 10 7 6 B C D E

Can we check for additivity? Check for additivity: For four leaves, i, j, k, l and the distances dij, dik, dil, djk, djl, dkl j i l k The three sums of two distances j j i j i i l l l k k k Should be such that two of these are equal, and larger than the third.