Phylogeny.


Phylogeny

Tree construction methods
Character based:
    Parsimony: Fitch, Sankoff
    Probabilistic: Maximum likelihood
Distance based:
    UPGMA

Maximum Likelihood Method
Input: $n$ strings of length $m$ (a multiple alignment), a substitution matrix, and the character frequencies.
Output: a tree topology with the input strings at the leaves.
Algorithm:
for each possible tree topology with leaf labeling $T$:
    for each position $i$ from 1 to $m$:
        $L_i = \sum_{\text{inner node labelings}} P(\text{specific inner node labeling})$
    $L_T = L_1 \cdot L_2 \cdot \ldots \cdot L_m$
Pick the tree with the maximal likelihood.

Maximum Likelihood Computation for a specific tree
Given a tree topology and a leaf labeling, every possible inner node labeling should be considered.
How many different inner node labelings exist for the given tree? With the DNA alphabet: $4^4 = 256$.
[Figure: the tree, with nodes labeled T, G, C, A]
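The count above can be checked by enumeration. This is a minimal sketch, assuming the slide's tree has four inner nodes (which is what the exponent in 4^4 suggests); the names `ALPHABET` and `N_INNER_NODES` are illustrative and not from the slides.

```python
from itertools import product

ALPHABET = "ACGT"    # DNA alphabet
N_INNER_NODES = 4    # assumption: the slide's tree has four inner nodes

# One labeling assigns a nucleotide to every inner node,
# so the labelings are all length-4 tuples over the alphabet.
labelings = list(product(ALPHABET, repeat=N_INNER_NODES))
n_labelings = len(labelings)

print(n_labelings)
```

The count grows as 4^k in the number k of inner nodes, so this brute-force enumeration is only practical for very small trees.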

Maximum Likelihood Computation for a specific tree
Compute the likelihood for the given tree.
[Figure: the tree with one inner node labeling (nodes labeled A, T, G, C), and a substitution table with the values 0.1, 0.2 and 1]
Character frequencies: $P(A) = 0.3$, $P(T) = 0.3$, $P(C) = 0.2$, $P(G) = 0.2$.
*All other options for inner node labeling were already computed, and their sum is 0.2.
$L(T) = 0.2 + P(A) \cdot P(A \to T) \cdot P(T \to C) \cdot P(C \to T) \cdot P(C \to G) \cdot P(T \to C) \cdot P(A \to G) \cdot P(G \to A) \cdot P(G \to G)$
$= 0.2 + 0.3 \cdot 0.1 \cdot 0.2 \cdot 0.2 \cdot 0.1 \cdot 0.2 \cdot 0.2 \cdot 0.2 \cdot 1 = 0.2 + 9.6 \cdot 10^{-7} = 0.20000096$
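The arithmetic of this slide can be reproduced in a few lines. This is a sketch that only multiplies the numbers exactly as they appear in the slide's product. It does not model the tree itself, and the variable names are illustrative.

```python
import math

prior_root = 0.3  # P(A), the frequency of the root label
# Substitution probabilities in the order they appear in the slide's product:
# A->T, T->C, C->T, C->G, T->C, A->G, G->A, G->G
substitutions = [0.1, 0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 1.0]
other_labelings_sum = 0.2  # given on the slide: sum over all other labelings

# Probability of this one inner node labeling.
this_labeling = prior_root * math.prod(substitutions)
# Likelihood of the tree: sum over all inner node labelings.
likelihood = other_labelings_sum + this_labeling

print(f"{this_labeling:.2e}")
print(f"{likelihood:.8f}")
```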

UPGMA
UPGMA is a greedy algorithm that constructs a phylogenetic tree, given 𝑛 species and a table 𝐷[𝑛×𝑛] of distances between every pair of species.

Some definitions Additive distance matrix: A distance matrix is called additive if there exists a tree in which the distances between the leaves correspond to the matrix’s distances. Another definition is the “4 point criterion”, which is easier to verify.

Additive matrix
The “4 points criterion”: A matrix is additive if every 4 objects (species) can be labeled as $x, y, z, w$ so that the pairwise distances satisfy:
$(a+b) + (c+d) \le (a+x+c) + (b+x+d) = (a+x+d) + (b+x+c)$
[Figure: a tree on the four leaves $x, y, z, w$ with edge labels $a, b, c, d$ and $x$]
In the figure's tree, $a, b, c, d$ are the lengths of the four leaf edges and the $x$ inside the sums is the length of the internal edge. Each of the three sums adds the distances of two disjoint pairs of leaves: the two largest sums must be equal, because both cross the internal edge twice.

Additive matrix
$d(A,B) + d(C,D) = 12 + 6 = 18$
$d(A,C) + d(B,D) = 14 + 12 = 26$
$d(A,D) + d(B,C) = 14 + 12 = 26$
$d(A,B) + d(C,D) \le d(A,C) + d(B,D) = d(A,D) + d(B,C)$, that is, $18 \le 26 = 26$.

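Non-Additive matrix
[Table: distance matrix over A, B, C, D; the entries shown are 2 and 3]
$d(A,B) + d(C,D) = 2 + 2 = 4$
$d(A,C) + d(B,D) = 2 + 2 = 4$
$d(A,D) + d(B,C) = 2 + 3 = 5$
The two largest sums, 4 and 5, are not equal, so no labeling of the four species satisfies the criterion.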
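The four-point check can be sketched in code. This is a minimal illustration rather than a reference implementation: `is_additive` and the dictionary layout are hypothetical names chosen here. The two inputs use the pairwise distances from the additive and non-additive slides, read in the order the sums are written (for instance d(A,D) = 2 and d(B,C) = 3 in the second one, which is an assumption about the lost table).

```python
from itertools import combinations
import math

def is_additive(dist, taxa, tol=1e-9):
    """Four-point check: in every quartet, the two largest of the
    three pair sums must be equal."""
    def d(u, v):
        return dist[frozenset((u, v))]

    for p, q, r, s in combinations(taxa, 4):
        sums = sorted([d(p, q) + d(r, s),
                       d(p, r) + d(q, s),
                       d(p, s) + d(q, r)])
        if not math.isclose(sums[1], sums[2], abs_tol=tol):
            return False
    return True

# Distances taken from the "Additive matrix" slide.
additive_example = {frozenset(k): v for k, v in {
    "AB": 12, "CD": 6, "AC": 14, "BD": 12, "AD": 14, "BC": 12}.items()}

# Distances taken from the "Non-Additive matrix" slide (assumed assignment).
non_additive_example = {frozenset(k): v for k, v in {
    "AB": 2, "CD": 2, "AC": 2, "BD": 2, "AD": 2, "BC": 3}.items()}

print(is_additive(additive_example, "ABCD"))
print(is_additive(non_additive_example, "ABCD"))
```

Sorting the three sums removes the need to try every relabeling of the quartet: the criterion holds for some labeling exactly when the two largest sums coincide.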

Ultrametric Distance Matrix A distance matrix is called ultrametric if there exists a tree corresponding to the matrix’s distances, in which all leaves have equal distance from the root. Notice that by definition, ultrametric is a special case of additive.

Some definitions
The “3 point criterion”: Like the additive case, ultrametricity has an equivalent definition. A matrix is ultrametric if every 3 taxa can be relabeled as $x, y, z$ so that:
$d(x,y) \le d(x,z) = d(y,z)$
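A check for the three-point criterion might look as follows. The sketch assumes the standard form of the condition, namely that among the three pairwise distances of any triple the two largest are equal. The function name and both small matrices are made up for illustration and do not come from the slides.

```python
from itertools import combinations
import math

def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point check: in every triple, the two largest of the
    three pairwise distances must be equal."""
    def d(u, v):
        return dist[frozenset((u, v))]

    for x, y, z in combinations(taxa, 3):
        trio = sorted([d(x, y), d(x, z), d(y, z)])
        if not math.isclose(trio[1], trio[2], abs_tol=tol):
            return False
    return True

# Hypothetical examples: in the first, C is equally far from A and B.
ultrametric_example = {frozenset(k): v for k, v in {
    "AB": 2, "AC": 4, "BC": 4}.items()}
non_ultrametric_example = {frozenset(k): v for k, v in {
    "AB": 2, "AC": 4, "BC": 5}.items()}

print(is_ultrametric(ultrametric_example, "ABC"))
print(is_ultrametric(non_ultrametric_example, "ABC"))
```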

Some definitions Ultrametric distance: For example, this is an ultrametric tree:

UPGMA algorithm
UPGMA: Unweighted Pair Group Method with Arithmetic Mean
Input: a distance matrix D. Each cell [𝑖,𝑗] holds the distance 𝑑(𝑖,𝑗) between species 𝑖 and species 𝑗.
Output: an ultrametric phylogenetic tree T with labeled leaves.

UPGMA algorithm
Input: $D[n \times n]$, a distance matrix
Initialize: $T = \{C_1, \ldots, C_n\}$
While $|T| > 1$, cluster taxa:
    Pick the shortest distance $d(C_i, C_j)$
    $C \leftarrow C_i \cup C_j$
    Define a node at height $d(C_i, C_j) / 2$
    $T \leftarrow T \setminus \{C_i, C_j\}$
    $T \leftarrow T \cup \{C\}$
    Update D: $\forall C_k \in T$, $C_k \ne C$:
        $d(C, C_k) = \frac{d(C_i, C_k)\,|C_i| + d(C_j, C_k)\,|C_j|}{|C_i| + |C_j|}$
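The pseudocode above can be turned into a short Python sketch. This is one possible implementation, not the slides' own code: the function name `upgma`, the integer cluster ids and the nested-tuple tree are choices made here. The example matrix is an assumption too. It is a matrix over A to F that is consistent with the values 2, 4, 6 and 8 and with the worked calculations in the example slides, but the original table did not survive in the transcript.

```python
from itertools import combinations

def upgma(taxa, dist):
    """Cluster `taxa` with UPGMA.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree, merges): tree is a nested tuple of leaf names and
    merges lists (subtree, node height) in the order the merges happened.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(taxa)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[frozenset((taxa[i], taxa[j]))]
         for i, j in combinations(range(len(taxa)), 2)}
    next_id = len(taxa)
    merges = []

    while len(clusters) > 1:
        # Greedy step: closest pair (ties go to the first pair found).
        i, j = min(d, key=d.get)
        height = d[(i, j)] / 2
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Size-weighted average distance from the new cluster to the others.
        new_d = {}
        for k in clusters:
            d_ik = d[tuple(sorted((i, k)))]
            d_jk = d[tuple(sorted((j, k)))]
            new_d[(k, next_id)] = ((d_ik * size_i + d_jk * size_j)
                                   / (size_i + size_j))
        # Drop every distance that involves a merged cluster.
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        merges.append(((tree_i, tree_j), height))
        next_id += 1

    (tree, _), = clusters.values()
    return tree, merges

# Assumed example matrix (see the note above).
taxa = ["A", "B", "C", "D", "E", "F"]
pairs = {"AB": 2, "AC": 4, "BC": 4, "DE": 4,
         "AD": 6, "AE": 6, "BD": 6, "BE": 6, "CD": 6, "CE": 6,
         "AF": 8, "BF": 8, "CF": 8, "DF": 8, "EF": 8}
dist = {frozenset(p): v for p, v in pairs.items()}

tree, merges = upgma(taxa, dist)
for subtree, height in merges:
    print(height, subtree)
```

When several pairs share the minimal distance, as (D, E) and (AB, C) do at distance 4 here, the order of merges depends on tie-breaking. This sketch simply takes the first minimal pair it encounters.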

UPGMA Example
Given the distance matrix below, build a phylogenetic tree using UPGMA.
[Table: distance matrix over the taxa A, B, C, D, E, F; the entries shown are 2, 4, 6 and 8]
We begin by choosing a minimal distance and clustering the two nodes it connects.

UPGMA Example Then we calculate the distances between our new cluster and all the rest of the nodes, to create an updated distance matrix D. The distances not including our cluster’s nodes remain exactly the same.

Example
The distances to the new cluster follow from the update rule:
$d(C, C_k) = \frac{d(C_i, C_k)\,|C_i| + d(C_j, C_k)\,|C_j|}{|C_i| + |C_j|}$
For the new cluster AB:
$d(AB, C_k) = \frac{d(A, C_k)\,|A| + d(B, C_k)\,|B|}{|A| + |B|}$
$d(AB, C) = \frac{4\,|A| + 4\,|B|}{|A| + |B|} = 4$
The updated distance matrix:
[Table: distance matrix over AB, C, D, E, F; the entries shown are 4, 6 and 8]

Example
[Table: distance matrix over AB, C, D, E, F; the entries shown are 4, 6 and 8]
Now we repeat the same procedure until only one cluster is left.

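Example
$d(C, C_k) = \frac{d(C_i, C_k)\,|C_i| + d(C_j, C_k)\,|C_j|}{|C_i| + |C_j|}$
$d(DE, AB) = \frac{d(D, AB)\,|D| + d(E, AB)\,|E|}{|D| + |E|} = \frac{6 + 6}{2} = 6$
[Table: the updated distance matrix over AB, C, DE, F; the entries shown are 4, 6 and 8]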

Example
[Table: distance matrix over AB, C, DE, F; the entries shown are 4, 6 and 8]

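Example
$d(C, C_k) = \frac{d(C_i, C_k)\,|C_i| + d(C_j, C_k)\,|C_j|}{|C_i| + |C_j|}$
$d(ABC, DE) = \frac{d(AB, DE)\,|AB| + d(C, DE)\,|C|}{|AB| + |C|} = \frac{6 \cdot 2 + 6 \cdot 1}{3} = 6$
$d(ABC, F) = \frac{d(AB, F)\,|AB| + d(C, F)\,|C|}{|AB| + |C|} = \frac{8 \cdot 2 + 8 \cdot 1}{3} = 8$
[Table: the updated distance matrix over (AB,C), DE, F; the entries shown are 6 and 8]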

Example
$d(ABCDE, F) = \frac{d(ABC, F)\,|ABC| + d(DE, F)\,|DE|}{|ABC| + |DE|} = \frac{8 \cdot 3 + 8 \cdot 2}{5} = 8$
[Table: the updated distance matrix over ((AB,C),DE) and F; the entry shown is 8]
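The update steps of this example all use the same size-weighted average, so they can be checked with one small helper. The function name `merged_distance` is hypothetical. The numbers are the ones that appear in the worked calculations of the example slides.

```python
def merged_distance(d_ik, size_i, d_jk, size_j):
    """Distance from the merged cluster C_i + C_j to another cluster C_k,
    as the average of d(C_i, C_k) and d(C_j, C_k) weighted by cluster size."""
    return (d_ik * size_i + d_jk * size_j) / (size_i + size_j)

# Each step as computed on the slides.
d_ab_c = merged_distance(4, 1, 4, 1)     # d(AB, C):    A and B are single leaves
d_de_ab = merged_distance(6, 1, 6, 1)    # d(DE, AB):   D and E are single leaves
d_abc_de = merged_distance(6, 2, 6, 1)   # d(ABC, DE):  |AB| = 2, |C| = 1
d_abc_f = merged_distance(8, 2, 8, 1)    # d(ABC, F)
d_abcde_f = merged_distance(8, 3, 8, 2)  # d(ABCDE, F): |ABC| = 3, |DE| = 2

print(d_ab_c, d_de_ab, d_abc_de, d_abc_f, d_abcde_f)
```

Because both distances being averaged are equal in every step of this example, the weights do not change the result. They matter as soon as the two merged clusters lie at different distances from the third.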

UPGMA Example
[Figure: the output tree, joining ((AB,C),DE) with F]
Our output tree! Lovely, isn’t it? Can there be more than one tree?

UPGMA Example
[Table: the original distance matrix over A, B, C, D, E, F; the entries shown are 2, 4, 6 and 8]
What can be said about the distance matrix? Is it additive? Ultrametric?

UPGMA downfalls
UPGMA always returns an ultrametric tree. It assumes that all species mutate at the same rate (a molecular clock). What happens if we try to reconstruct a tree such as this one?
[Figure: a tree whose leaves are not all at the same distance from the root]

UPGMA downfalls
This tree corresponds to the following distance matrix:
[Table: distance matrix over A, B, C, D, E, F; the entries shown are 5, 4, 7, 10, 6, 9, 8 and 11]

UPGMA downfalls
If we run UPGMA on the matrix shown, we get this output:
[Figure: the tree returned by UPGMA]
Compared to the original tree:
[Figure: the original tree]

UPGMA downfalls
UPGMA returns the right tree if the distance matrix is ultrametric. Even then, we can’t be certain that the original tree was also ultrametric.
If the distance matrix D is not additive, UPGMA generates a heuristic solution that does not fit D.