Distance methods. UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?

Slides:



Advertisements
Similar presentations
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Clustering (2). Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram –A tree like.
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Brandon Andrews CS6030.  What is a phylogenetic tree?  Goals in a phylogenetic tree generator  Distance based method  Fitch-Margoliash Method Example.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
UPGMA Algorithm.  Main idea: Group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains  Algorithm  Add a.
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
The Evolution Trees From: Computational Biology by R. C. T. Lee S. J. Shyu Department of Computer Science Ming Chuan University.
BNFO 602 Phylogenetics Usman Roshan.
. Distance-Based Phylogenetic Reconstruction ( part II ) Tutorial #11 © Ilan Gronau.
In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Multiple sequence alignment
Lecture 24 Inferring molecular phylogeny Distance methods
Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.
Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic trees Tutorial 6. Distance based methods UPGMA Neighbor Joining Tools Mega phylogeny.fr DrewTree Phylogenetic Trees.
Phylogenetic trees Sushmita Roy BMI/CS 576
9/1/ Ultrametric phylogenies By Sivan Yogev Based on Chapter 11 from “Inferring Phylogenies” by J. Felsenstein.
Phylogenetic Analysis. 2 Introduction Intension –Using powerful algorithms to reconstruct the evolutionary history of all know organisms. Phylogenetic.
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
PHYLOGENETIC TREES Dwyane George February 24,
1 Chapter 7 Building Phylogenetic Trees. 2 Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances –UPGMA method.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
BINF6201/8201 Molecular phylogenetic methods
Taking the Bite (Byte?) Out of Phylogeny Jennifer Galovich Lucy Kluckhohn Jones Holly Pinkart.
The Neighbor Joining Tree-Reconstruction Technique Lecture 13 ©Shlomo Moran & Ilan Gronau.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Phylogenetic Trees Tutorial 5. Agenda How to construct a tree using Neighbor Joining algorithm Phylogeny.fr tool Cool story of the day: Horizontal gene.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Evolutionary tree reconstruction
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
Phylogeny Ch. 7 & 8.
WPGMA Input: Distance matrix Dij; Initially each element is a cluster. nr- size of cluster r Find min element Drs in D; merge clusters r,s Delete elts.
Matrix Algebra Section 7.2. Review of order of matrices 2 rows, 3 columns Order is determined by: (# of rows) x (# of columns)
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
DATA MINING van data naar informatie Ronald Westra Dep. Mathematics Maastricht University.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Tutorial 5 Phylogenetic Trees.
Computational Biology Group. Class prediction of tumor samples Supervised Clustering Detection of Subgroups in a Class.
Example Apply hierarchical clustering with d min to below data where c=3. Nearest neighbor clustering d min d max will form elongated clusters!
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Statistical stuff: models, methods, and performance issues CS 394C September 3, 2009.
Phylogenetics-2 Marek Kimmel (Statistics, Rice)
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Clustering [Idea only, Chapter 10.1, 10.2, 10.4].
Taking the Bite (Byte?) Out of Phylogeny Jennifer Galovich Lucy Kluckhohn Jones Holly Pinkart.
Phylogeny - based on whole genome data
Distance based phylogenetics
Multiple Alignment and Phylogenetic Trees
Hierarchical clustering approaches for high-throughput data
Inferring phylogenetic trees: Distance and maximum likelihood methods
Phylogenetic Trees.
Lecture 7 – Algorithmic Approaches
Phylogeny.
Incorporating uncertainty in distance-matrix phylogenetics
Presentation transcript:

Distance methods

UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?

Additivity

UPGMA UPGMA is not additive but works for ultrametric trees. Takes O(n^3) time ADC B A B C D A B CD

1.Initialize n clusters where each cluster i contains the sequence i 2.Find closest pair of clusters i, j, using distances in matrix D 3.Make them neighbors in the tree by adding new node (ij), and set distance from (ij) to i and j as D ij /2 4.Update distance matrix D: for all clusters k do the following (ni and nj are size of clusters i and j respectively) 5.Delete columns and rows for i and j in D and add new ones corresponding to cluster (ij) with distances as computed above 6.Goto step 2 until only one cluster is left UPGMA

A B C D A B CD ADC B

UPGMA Doesn’t work (in general) for non ultrametric trees A D CB A B C D A B CD

UPGMA UPGMA constructs incorrect tree here A B C D A B CD BDA C

UPGMA Bipartition (BC,AD) is not in true tree BDA C A D CB True treeUPGMA tree

Neighbor joining 1.Additive and O(n^3) time 2.Initialization: same as UPGMA 3.For each species compute 4.Select i and j for which is minimum 5.Make them neighbors in the tree by adding new node (ij), and set distance from (ij) to i and j as

Neighbor joining 6.Update distance matrix D: for all clusters k do the following 7.Delete columns and rows for i and j in D and add new ones corresponding to cluster (ij) with distances as computed above 8.Go to 3 until two nodes/clusters are left

NJ NJ constructs the correct tree for additive matrices A D CB A B C D A B CD

Simulation studies

The true evolutionary tree is never known in practice. Simulation allows us to study accuracy of methods under biologically realistic scenarios Mathematics behind the phylogenetics is often complex and challenging. Simulation allows us to study algorithms when not possible theoretically and also examine algorithm performance under various conditions such as different evolutionary rates, sequence lengths, or numbers of taxa