Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.

Phylogenetic Reconstruction
We’d like to study the evolutionary history of species.
Problems:
- No information regarding extinct species
- Many possible tree topologies

Common Terminology
[Figure: rooted tree with leaves A, B, C, D, E]
- Root: the ancestral node
- Internal nodes: common ancestors
- Leaves: the taxa (genes, proteins, species, etc.)
- Edges: represent the distance between nodes

Phylogenetic Reconstruction
Approach 1 (character based):
- Given a probabilistic model (HMM) of evolution, find the most probable tree to yield the known set of species.
- Problem: finding the maximum-likelihood (ML) tree is very hard.
  - Evolutionary models are very complex, with many parameters.
  - Estimating the parameters using EM runs into many local maxima.
  - Small trees (up to 5 taxa) are relatively easy.
  - Big trees (more than 50 taxa) are almost impossible.
Approach 2 (distance based):
- Given ML pairwise (evolutionary) distances between species, find the edge-weighted tree best describing this metric.
- Note: ML pairwise distances = ML trees spanning two species.

Distance-Based Reconstruction
Given ML pairwise (evolutionary) distances between species, find the edge-weighted tree best describing this metric.
The input: a distance matrix D satisfying
- D(i,j) ≥ 0
- D(i,i) = 0
- D(i,j) = D(j,i)
- D(i,j) ≤ D(i,k) + D(k,j)
The output: an edge-weighted tree T
- If D is additive, then D_T = D, where D_T(i,j) is the length of the path between leaves i and j in T.
- Otherwise, return a tree best ‘fitting’ the input D.
Note: ML-estimated pairwise distances are usually not additive, but they are ‘close’ to some additive metric.
[Table: distance matrix over Bear, Raccoon, Weasel, Seal, Dog]
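The input conditions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `is_metric` tests the four listed conditions, and `is_additive` uses the standard four-point condition (of the three pairings of any four taxa, the two largest sums are equal), which the slides do not state explicitly. The function names and both example matrices are hypothetical.

```python
from itertools import combinations, permutations


def is_metric(D, tol=1e-9):
    """Check the four input conditions on a square distance matrix D."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:                  # D(i,i) = 0
            return False
        for j in range(n):
            if D[i][j] < -tol:                  # D(i,j) >= 0
                return False
            if abs(D[i][j] - D[j][i]) > tol:    # D(i,j) = D(j,i)
                return False
    for i, j, k in permutations(range(n), 3):   # triangle inequality
        if D[i][j] > D[i][k] + D[k][j] + tol:
            return False
    return True


def is_additive(D, tol=1e-9):
    """Four-point condition: for every four taxa, the two largest of the
    three pairwise sums are equal. Assumes D already passed is_metric."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical 4-taxon matrices (taxa a, b, c, d), for illustration only.
tree_like = [[0, 11, 12, 21], [11, 0, 3, 12], [12, 3, 0, 11], [21, 12, 11, 0]]
noisy = [[0, 11, 12, 20], [11, 0, 3, 12], [12, 3, 0, 11], [20, 12, 11, 0]]

print(is_metric(tree_like), is_additive(tree_like))
print(is_metric(noisy), is_additive(noisy))
```

The second matrix differs from the first in a single entry; it is still a metric but no longer additive, which is the typical situation for estimated distances.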

Neighbor-Joining Algorithms
Agglomerative (bottom-up) approach:
1. Find a pair of neighboring taxa i,j
2. Connect them to a new internal vertex v (define edge weights)
3. Remove i,j from the taxon set and add v (define distances from v)
4. Return to (1); when only 2 taxa are left, connect them
Neighbors: taxa connected by a 2-edge path.
Consistency: given an additive metric D_T,
- we always choose a pair of neighbors in T (stage 1)
- the reduced distance matrix is consistent with the reduced tree (stage 3)
By induction, we eventually reconstruct T.

UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
UPGMA algorithm:
1. Find a pair of taxa of minimal distance, i,j
2. Connect them to a new internal vertex v
3. Remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k)
4. Return to (1); when only 2 taxa are left, connect them
α and 1-α are proportional to the number of ‘original’ taxa that i and j represent.
Consistency? Given an additive metric D_T, do we always choose a pair of neighbors in T?
[Figure: 4-taxon tree over a, b, c, d with its distance matrix; recoverable entries: D(b,c) = 3, D(b,d) = 15, D(c,d) = 14]
UPGMA chooses b,c: the closest taxon is not necessarily a neighbor.
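To see concretely how the closest pair can fail to be a neighbor pair, here is a small sketch on a hypothetical quartet tree. The edge weights are invented for illustration and are not the ones from the slide's figure. Leaves a and b hang off internal vertex u, leaves c and d hang off internal vertex w, and the two long edges belong to a and d.

```python
from itertools import combinations

# Hypothetical quartet tree: neighbors are (a, b) and (c, d).
edges = {("a", "u"): 10, ("b", "u"): 1, ("u", "w"): 1, ("c", "w"): 1, ("d", "w"): 10}


def path_length(tree_edges, src, dst):
    """Length of the unique path between two vertices of a tree (iterative DFS)."""
    adj = {}
    for (x, y), w in tree_edges.items():
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    stack = [(src, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == dst:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("vertices are not connected")


leaves = ["a", "b", "c", "d"]
# The additive metric D_T induced by the tree.
D = {pair: path_length(edges, *pair) for pair in combinations(leaves, 2)}
closest = min(D, key=D.get)

for pair, dist in D.items():
    print(pair, dist)
print("closest pair:", closest)
```

In this tree the minimal distance is D(b,c) = 3, yet b and c are separated by three edges, so stage 1 of UPGMA would join two taxa that are not neighbors.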

Molecular Clock
Reminder: edge weights correspond to evolutionary distance.
If the rate of evolution is universally constant:
- The root is equidistant from all taxa
- The closest taxon pair is a neighbor pair
[Figure: tree drawn against a time axis]

Molecular Clock
Reminder: edge weights correspond to evolutionary distance.
If the rate of evolution is different in each branch:
- This is the case for most observed evolutionary trees
- The closest taxon pair is not necessarily a neighbor pair
[Figure: tree drawn against a time axis]

Ultrametric Trees
- Ultrametric trees are edge-weighted trees that have a point (the root) equidistant from all leaves.
- Additive metrics consistent with an ultrametric tree are called ultrametrics.
- A distance matrix is ultrametric iff it obeys the 3-point condition: “Any subset of three taxa can be labelled i,j,k such that d(i,j) ≤ d(j,k) = d(i,k).”
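The 3-point condition translates directly into a check over all triples: after sorting the three pairwise distances, the two largest must be equal. A minimal sketch, assuming D is a symmetric matrix with a zero diagonal; the function name and both example matrices are hypothetical.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """3-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(D)), 3):
        smallest, middle, largest = sorted([D[i][j], D[i][k], D[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical examples: in the first, taxa 0 and 1 are equally far from taxon 2.
clock_like = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
not_clock_like = [[0, 2, 6], [2, 0, 5], [6, 5, 0]]

print(is_ultrametric(clock_like))
print(is_ultrametric(not_clock_like))
```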

UPGMA on Ultrametrics
UPGMA algorithm:
1. Find a pair of taxa of minimal distance, i,j
2. Connect them to a new internal vertex v
3. Remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k)
4. Return to (1); when only 2 taxa are left, connect them
Consistency for ultrametrics: given an ultrametric U_T,
- we always choose a pair of neighbors in T (stage 1)
- the reduced distance matrix is consistent with the reduced tree (stage 3)

UPGMA on Ultrametrics
Consistency for ultrametrics, stage 3 (the reduced distance matrix is consistent with the reduced tree):
- If i,j are neighbors in an ultrametric tree, then D(i,k) = D(j,k) for all k ≠ i,j.
- Equivalently: if D(i,j) is minimal in an ultrametric, then D(i,k) = D(j,k) for all k ≠ i,j.
It follows that D(v,k) = αD(i,k) + (1-α)D(j,k) = D(i,k) for any choice of α.
[Figure: neighbors i,j and a third taxon k]

UPGMA on Ultrametrics
Consistency for ultrametrics, stage 1 (we always choose a pair of neighbors in T):
Let i,j be a pair of taxa of minimal distance.
- Assume, to the contrary, that i,j are not neighbors.
- Then the path connecting i,j contains at least 3 non-zero weight edges.
- Let v be the least common ancestor (lca) of i,j.
- There is a taxon k such that D(j,k) (or D(i,k)) is smaller than D(i,j), contradicting the minimality of D(i,j).
[Figure: taxa i, j, k below the common ancestor v]

UPGMA on Non-Ultrametric Data
Edge weights are set so that UPGMA always returns an ultrametric tree (we won’t prove this).
Example:
[Table: distance matrix D over Bear, Raccoon, Weasel, Seal, Dog]
D is not ultrametric.

UPGMA on Non-Ultrametric Data
Example, 1st iteration:
[Table: D over B, R, W, S, D; recoverable entry: D(S,D) = 50]
- Merge B and R into B-R, with α = ½
- Tree label: B-R 13
[Table: reduced D over B-R, W, S, D; recoverable entry: D(S,D) = 50]

UPGMA on Non-Ultrametric Data
Example, 2nd iteration:
[Table: D over B-R, W, S, D; recoverable entry: D(S,D) = 50]
- Merge B-R and S into B-R-S, with α = ⅓
- Tree labels: BR 13, B-R-S = 5.25
[Table: reduced D over B-R-S, W, D; recoverable entries: D(W,D) = 51, and a B-R-S entry with fractional part ⅓]

UPGMA on Non-Ultrametric Data
Example, 3rd iteration:
[Table: D over B-R-S, W, D; recoverable entries: D(W,D) = 51, and a B-R-S entry with fractional part ⅓]
- Merge B-R-S and W into B-R-S-W, with α = ¼
- Tree labels: BR 13, B-R-S = 5.25, BRSW = 1.75
[Table: reduced D over B-R-S-W, D; recoverable entry: D(B-R-S-W, D) = 45¼]

UPGMA on Non-Ultrametric Data
Example, 4th iteration:
[Table: D over B-R-S-W, D]
- Merge B-R-S-W and D into BRSWD
- Tree labels: BR 13, BRS = 5.25, BRSW = 1.75, BRSWD = [value missing]
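The four iterations can be replayed with a short UPGMA sketch. The pairwise distances below are an assumption: the full table is not legible in this transcript, and these values were chosen because they reproduce the numbers that are (D(S,D) = 50, D(W,D) = 51, the label 13 for B-R, the increments 5.25 and 1.75, and the reduced distance 45¼). The function name and data layout are hypothetical, and the height of each new vertex is taken to be half the distance between the merged clusters, which is the usual UPGMA convention rather than something stated on the slides.

```python
from fractions import Fraction

# ASSUMED input distances (not legible in the transcript; see the note above).
raw = {
    ("B", "R"): 26, ("B", "W"): 34, ("B", "S"): 29, ("B", "D"): 32,
    ("R", "W"): 42, ("R", "S"): 44, ("R", "D"): 48,
    ("W", "S"): 44, ("W", "D"): 51, ("S", "D"): 50,
}


def upgma(names, distances):
    """Run UPGMA and return [(merged cluster name, height), ...] in merge order."""
    D = {frozenset(pair): Fraction(d) for pair, d in distances.items()}
    size = {name: 1 for name in names}
    active = list(names)
    merges = []
    while len(active) > 1:
        # Stage 1: find the pair of clusters at minimal distance.
        i, j = min(
            ((x, y) for a, x in enumerate(active) for y in active[a + 1:]),
            key=lambda p: D[frozenset(p)],
        )
        # Stage 2: new vertex v, placed at half the distance between i and j.
        v = i + "-" + j
        height = D[frozenset((i, j))] / 2
        # Stage 3: alpha is the share of original taxa that cluster i represents.
        alpha = Fraction(size[i], size[i] + size[j])
        for k in active:
            if k not in (i, j):
                D[frozenset((v, k))] = (
                    alpha * D[frozenset((i, k))] + (1 - alpha) * D[frozenset((j, k))]
                )
        size[v] = size[i] + size[j]
        active = [v] + [k for k in active if k not in (i, j)]
        merges.append((v, height))
    return merges


merges = upgma(["B", "R", "W", "S", "D"], raw)
for name, height in merges:
    print(name, float(height))
```

Under these assumed distances the merge order is B-R, B-R-S, B-R-S-W, B-R-S-W-D at heights 13, 18.25, 20 and 22.625, so the first two increments are 5.25 and 1.75, matching the tree labels. Exact fractions are used so that ties and thirds are not blurred by floating-point rounding.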

UPGMA
Additional notes:
- In the reduction formula, D(v,k) can be set to any value within the interval defined by D(i,k) and D(j,k).
  - In particular, D(v,k) = ½(D(i,k) + D(j,k)) gives the WPGMA algorithm.
  - If we use D(v,k) = min{D(i,k), D(j,k)}, we get the ‘closest’ ultrametric from below (the unique subdominant ultrametric).
Run-time analysis:
- Naïve implementation: O(n^3)
- By keeping a sorted version of each row in D: O(n^2 log n)
- The third variant can be executed in O(n^2)
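The three reduction rules can be compared by plugging them into one agglomerative loop. This is a naive sketch (cubic time, none of the faster variants), and the 4-taxon input, the function names and the rule signature are all hypothetical. Each run returns the ultrametric U implied by the merges, where U(x,y) is the cluster distance at the moment x and y were first joined.

```python
def agglomerate(names, distances, rule):
    """Agglomerative clustering with a pluggable reduction rule.

    rule(d_ik, d_jk, n_i, n_j) returns the reduced distance D(v,k).
    Returns U: frozenset({x, y}) -> distance at which x and y were merged.
    """
    D = {frozenset(pair): d for pair, d in distances.items()}
    members = {name: [name] for name in names}
    active = list(names)
    U = {}
    while len(active) > 1:
        i, j = min(
            ((x, y) for a, x in enumerate(active) for y in active[a + 1:]),
            key=lambda p: D[frozenset(p)],
        )
        d_ij = D[frozenset((i, j))]
        for x in members[i]:
            for y in members[j]:
                U[frozenset((x, y))] = d_ij
        v = i + j
        for k in active:
            if k not in (i, j):
                D[frozenset((v, k))] = rule(
                    D[frozenset((i, k))], D[frozenset((j, k))],
                    len(members[i]), len(members[j]),
                )
        members[v] = members[i] + members[j]
        active = [v] + [k for k in active if k not in (i, j)]
    return U


def upgma_rule(d_ik, d_jk, n_i, n_j):
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)   # weights follow cluster sizes


def wpgma_rule(d_ik, d_jk, n_i, n_j):
    return (d_ik + d_jk) / 2                          # plain midpoint


def min_rule(d_ik, d_jk, n_i, n_j):
    return min(d_ik, d_jk)                            # lower end of the interval


# Hypothetical input, for illustration only.
raw = {("a", "b"): 2, ("a", "c"): 6, ("a", "d"): 10,
       ("b", "c"): 8, ("b", "d"): 10, ("c", "d"): 12}
taxa = ["a", "b", "c", "d"]

U_upgma = agglomerate(taxa, raw, upgma_rule)
U_wpgma = agglomerate(taxa, raw, wpgma_rule)
U_min = agglomerate(taxa, raw, min_rule)

for pair, d in raw.items():
    key = frozenset(pair)
    print(pair, d, U_upgma[key], U_wpgma[key], U_min[key])
```

On this input the min rule never exceeds the input distance for any pair, as expected of an ultrametric from below, while the UPGMA and WPGMA rules differ only in the distance assigned to pairs involving d.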