The Tree of Life From Ernst Haeckel, 1891.

Presentation transcript:

The Tree of Life From Ernst Haeckel, 1891

But is there only one “Tree of Life”? There are many theories of evolution.
Basic idea: speciation is caused by physical separation into groups in which different genetic variants become dominant.
Basic tenet: any two species share a common ancestor at some time in the distant past.

We are generally considering a “Gene Tree” as opposed to a “Species Tree.” Divergence within a gene generally happens before splitting into species occurs. In order to get a picture of evolution involving species, there is a need to look at collections of genes as opposed to individual genes.

Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
Modern biological methods allow for the use of molecular features: gene sequences and protein sequences.
Analysis is based on homologous sequences (e.g., globins) in different species.

Use of Molecular Data: provides an objective criterion for constructing phylogenetic trees.
Basic data include gene sequences and protein sequences.
Analysis is based on homologous sequences in different species.

However, gene/protein sequences can be homologous for different reasons:
Orthologs -- sequences diverged after a speciation event
Paralogs -- sequences diverged after a duplication event
Xenologs -- sequences diverged after a horizontal transfer (e.g., by virus)

Two main kinds of information are contained in trees:
The tree topology -- the form or shape of the tree
The tree metric -- distance or branch length

A tree contains no cycles. [Figure: a graph containing a cycle, labeled NO!, and a tree, labeled YES!] Family trees can look like this. These two trees are topologically equivalent. The one on the right is more common in biology texts.

Trees can be unrooted or rooted. [Figure: an unrooted tree and a rooted tree on taxa I, II, III, IV, and V]

If there are n leaf nodes or taxa, how many different trees are possible?
Number of possible rooted trees:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
Number of possible unrooted trees:
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
The numerator contains the computationally insidious factorial function. Well, so does the denominator, but it is for a much smaller number.

Three leaf nodes (A, B, C): only one unrooted tree is possible.
Four leaf nodes (A, B, C, D): three different unrooted trees are possible, one for each way of pairing the leaves (A-B with C-D, A-C with B-D, A-D with B-C).

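A Table Showing the Growth of Unrooted and Rooted Trees
Leaves (n)   Unrooted trees   Rooted trees
3            1                3
4            3                15
5            15               105
6            105              945
7            945              10395
8            10395            135135
9            135135           2027025
10           2027025          34459425
11           34459425         654729075
12           654729075        13749310575 (about 1.375*10^10)
What do you notice in this table? Why is this true?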
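The table can be regenerated from the standard counting formulas for binary trees on n labeled leaves. The following is a minimal sketch; the function names `num_rooted` and `num_unrooted` are illustrative choices, not part of the original slides.

```python
from math import factorial

def num_rooted(n: int) -> int:
    """Rooted binary trees on n labeled leaves: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def num_unrooted(n: int) -> int:
    """Unrooted binary trees on n labeled leaves: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

# Print one row per leaf count: n, unrooted count, rooted count.
for n in range(3, 13):
    print(n, num_unrooted(n), num_rooted(n))
```

Comparing the two columns row by row shows that the unrooted count for n + 1 leaves equals the rooted count for n leaves, which is one way to approach the question posed on the slide.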

To Further Make the Point: If we are creating a tree with 15 different taxa, there are 213,458,046,676,875 possible rooted trees. Assuming a computer can create a tree in 10^-9 seconds, it would take 2.47 days of computation time to create them all. With 20 species, there are 8,200,794,532,637,891,559,375 possible rooted trees, and the same computer would take 259,867 years to generate this many trees!
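These estimates can be checked with a few lines of arithmetic. This sketch assumes the slide's figure of 10^-9 seconds per tree and a year of 365.25 days; the helper names are illustrative only.

```python
def num_rooted(n: int) -> int:
    """Rooted binary trees on n labeled leaves, as the product 3 * 5 * ... * (2n-3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

SECONDS_PER_TREE = 1e-9            # assumed cost of generating one tree
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def enumeration_seconds(n: int) -> float:
    """Time to generate every rooted tree on n taxa at the assumed rate."""
    return num_rooted(n) * SECONDS_PER_TREE

days_15 = enumeration_seconds(15) / SECONDS_PER_DAY
years_20 = enumeration_seconds(20) / SECONDS_PER_YEAR
print(f"15 taxa: {num_rooted(15):,} trees, about {days_15:.2f} days")
print(f"20 taxa: {num_rooted(20):,} trees, about {years_20:,.0f} years")
```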

If we assume the Molecular Clock is working, then the distance from the root to each leaf is the same. The above tree does not assume a Molecular Clock!

THE MODERN MOLECULAR CLOCK Lindell Bromham and David Penny

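Distance data can be generated from character data:
Jukes-Cantor:
$$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$
where p = fraction of sites that are mismatches (the percent of mismatches expressed as a proportion)
Kimura (two-parameter):
$$d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$$
where P = fraction of sites that are transitions and Q = fraction of sites that are transversions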
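As a rough illustration, the two corrections can be written as small functions. This sketch uses the standard Jukes-Cantor and Kimura two-parameter formulas, with p, P, and Q given as proportions between 0 and 1; the function names are illustrative.

```python
from math import log

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of mismatched sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the logarithm to be defined")
    return -0.75 * log(1 - 4 * p / 3)

def kimura_2p(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the logarithms to be defined")
    return -0.5 * log(a) - 0.25 * log(b)

# The corrected distance always exceeds the raw mismatch proportion.
print(jukes_cantor(0.10))
# With P = p/3 and Q = 2p/3 the Kimura distance reduces to Jukes-Cantor.
print(kimura_2p(0.10 / 3, 0.20 / 3))
```

Both corrections grow faster than the observed mismatch proportion because they account for multiple substitutions at the same site.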

Next we create a matrix of these distances:
species   A     B     C
B         dAB   -     -
C         dAC   dBC   -
D         dAD   dBD   dCD
For Example: species A B C D 9 - 8 11 12 15 10 E 18 13 5
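A pairwise matrix of this kind can be built directly from aligned sequences. The sketch below uses the uncorrected mismatch proportion as the distance; the three short sequences in `toy` are made up purely for illustration and do not come from the slides.

```python
from itertools import combinations

def p_distance(s: str, t: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)

def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair of species (as a tuple) to its distance."""
    return {(u, v): p_distance(seqs[u], seqs[v]) for u, v in combinations(seqs, 2)}

# Hypothetical aligned sequences, 10 sites each.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
}
matrix = distance_matrix(toy)
for (u, v), dist in matrix.items():
    print(f"d({u},{v}) = {dist:.2f}")
```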

Simple Distance-Based Method: UPGMA (unweighted pair-group method with arithmetic mean)
Input: distance matrix between species
Outline: Cluster species together. Initially clusters are singletons. At each iteration, combine the two “closest” clusters to get a new one.

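UPGMA Clustering: Let Ci and Cj be clusters. Define the distance between them to be the average distance over all pairs of members:
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|}\sum_{p \in C_i}\sum_{q \in C_j} d(p, q)$$
When combining two clusters, Ci and Cj, to form a new cluster Ck, then for any other cluster Cl:
$$d(C_k, C_l) = \frac{|C_i|\,d(C_i, C_l) + |C_j|\,d(C_j, C_l)}{|C_i| + |C_j|}$$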

Begin with the following distance matrix:
Species   B   C   D   E
A         4   2   4   3
B         -   4   1   4
C         -   -   4   3
D         -   -   -   4
The closest pair is {B, D}, so cluster them: C1 = {B, D}
d(C1,A) = 1/2 (4 + 4) = 4
d(C1,C) = 1/2 (4 + 4) = 4
d(C1,E) = 1/2 (4 + 4) = 4
Tree at end of Stage 1: B and D are joined, each on a branch of length 0.5.

Create a new matrix that includes C1:
Species   A   C   E
C1        4   4   4
A         -   2   3
C         -   -   3
The closest are A and C, so C2 = {A, C}
d(C1, C2) = 1/2 (4 + 4) = 4
d(C2, E) = 1/2 (3 + 3) = 3
Tree at end of Stage 2: A and C are joined on branches of length 1; B and D are joined on branches of length 0.5.

Once again we revise the distance matrix:
Species   C2   E
C1        4    4
C2        -    3
We create group C3 = {C2, E} = {{A, C}, E}
d(C1, C3) = 1/6 (d(B,A) + d(B,C) + d(B,E) + d(D,A) + d(D,C) + d(D,E)) = 1/6 (4+4+4+4+4+4) = 4
Completed tree: B and D join at height 0.5; A and C join at height 1; E joins {A, C} at height 1.5; C1 and C3 join at the root, at height 2 (half of d(C1, C3) = 4).
NOTE: This tree satisfies the Molecular Clock Assumption. This is a basic property of UPGMA-produced trees.
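The three stages above can be reproduced with a short program. This is a minimal sketch of UPGMA, not a reference implementation: clusters are named by nested parenthesized strings, each merge is placed at half the distance between the merged clusters, and ties are broken by whichever pair is encountered first. The pairwise distances are the ones used in the worked example.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the list of merges as (cluster_label, height) pairs."""
    size = {name: 1 for name in names}                 # members per cluster
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        new = f"({a},{b})"
        height = d[frozenset((a, b))] / 2
        # Size-weighted average distance from the new cluster to every other one.
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((new, height))
    return merges

example = {
    ("A", "B"): 4, ("A", "C"): 2, ("A", "D"): 4, ("A", "E"): 3,
    ("B", "C"): 4, ("B", "D"): 1, ("B", "E"): 4,
    ("C", "D"): 4, ("C", "E"): 3, ("D", "E"): 4,
}
merges = upgma(["A", "B", "C", "D", "E"], example)
for label, height in merges:
    print(f"{label} joined at height {height}")
```

Because every merge is placed at half the inter-cluster distance, all leaves end up the same distance from the root, which is why UPGMA output always satisfies the molecular clock assumption.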

The Fitch-Margoliash (FM) Algorithm: A weaker requirement is additivity. In a “real” tree, the distance between two species is the sum of the distances between the intermediate nodes on the path joining them. [Figure: tree with nodes i, j, k and edge lengths a, b, c]

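Consequences of Additivity: Suppose the input distances are additive. For any three leaves i, j, and m, let k be the internal node where their paths meet, and let a, b, and c be the lengths of the edges from k to i, j, and m:
$$d(i,j) = a + b, \quad d(i,m) = a + c, \quad d(j,m) = b + c$$
Thus
$$c = d(k,m) = \tfrac{1}{2}\left(d(i,m) + d(j,m) - d(i,j)\right)$$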

Applying this idea to three taxa, A, B, and C, joined at a common internal node by edges of length x (to A), y (to B), and z (to C). Using the fact that:
x + y = d(A,B)
x + z = d(A,C)
y + z = d(B,C)
and a little high school algebra, we have
x = 1/2 (d(A,B) + d(A,C) - d(B,C))
y = 1/2 (d(A,B) + d(B,C) - d(A,C))
z = 1/2 (d(A,C) + d(B,C) - d(A,B))
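The three-taxon solution translates directly into code. In this sketch the function name `three_point_branches` is an illustrative choice, and the check uses made-up additive distances built from known branch lengths x = 1, y = 2, z = 3.

```python
def three_point_branches(d_ab: float, d_ac: float, d_bc: float):
    """Solve x + y = d(A,B), x + z = d(A,C), y + z = d(B,C) for (x, y, z)."""
    x = (d_ab + d_ac - d_bc) / 2
    y = (d_ab + d_bc - d_ac) / 2
    z = (d_ac + d_bc - d_ab) / 2
    return x, y, z

# Additive toy distances: d(A,B) = 1 + 2, d(A,C) = 1 + 3, d(B,C) = 2 + 3.
branches = three_point_branches(3, 4, 5)
print(branches)
```

When the input distances are truly additive, the original branch lengths are recovered exactly; with noisy data, one of the values can come out negative.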

We will apply this criterion to the following data:
      B      C      D      E
A     .31    1.01   .75    1.03
B     -      1.00   .69    .90
C     -      -      .61    .42
D     -      -      -      .37
We note that A and B are the closest, but to group them without the assumption of equal distance from a common ancestor, we temporarily group C-D-E and use the three-taxa case:
d(A,C-D-E) = 1/3 (1.01+.75+1.03) = .93
d(B,C-D-E) = 1/3 (1.00+.69+.90) = .863

From the formulas two slides previous, we have the branch lengths A = .1885, B = .1215, and C-D-E = .7415. Recall that the joining of C-D-E was only temporary, so that we could get accurate distances for joining A and B. Separating C, D, and E and combining A and B for the rest of the algorithm gives the table:
        C       D      E
A-B     1.005   .72    .965
C       -       .61    .42
D       -       -      .37

We now have the table:
        C       D      E
A-B     1.005   .72    .965
C       -       .61    .42
D       -       -      .37
The closest pair is D and E, so we combine everything else into a single group A-B-C:
d(D,A-B-C) = 1/3 (.75+.69+.61) = .683
d(E,A-B-C) = 1/3 (1.03+.90+.42) = .783
        D       E
A-B-C   .683    .783
D       -       .37

This yields an intermediate tree with branch lengths D = .135, E = .235, and A-B-C = .548. We keep the edges joining D and E while discarding the grouping A-B-C. We now have four edges of our tree and two groupings.
d(A-B,D-E) = 1/4 (.75+1.03+.69+.90) = .8425
d(A-B,C) = 1/2 (1.01+1.00) = 1.005
d(C,D-E) = 1/2 (.61+.42) = .515

We can now produce the table:
        C       D-E
A-B     1.005   .8425
C       -       .515
Again applying the distance formulas, we have the tree with branch lengths C = .33875, A-B = .66625, and D-E = .17625.

All that remains is to compute a and b, the lengths of the two internal edges: a joins the A-B vertex to the central vertex, and b joins the D-E vertex to the central vertex. The edge lengths found so far are 0.1215 and 0.1885 (for A and B), 0.135 (D), 0.235 (E), and 0.33875 (C). The average distance of A and B from their common vertex is 0.155. For D and E the average distance is .185. So for the value of a we have .66625 - .155 = .51125, and for the value of b we have .17625 - .185 = -.00875. The negative value for b is a cause for concern about the quality of our data. If we are confident of our data, then since .00875 is close to 0, most researchers would assign a value of 0 to b.
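The whole worked example can be retraced in a few lines. This sketch hard-codes the three grouping steps of the example rather than implementing a general Fitch-Margoliash procedure, and the helper names (`d`, `avg`, `three_point`) are illustrative. It uses unrounded averages, so the first-step branch lengths differ from the slide values in the fourth decimal place.

```python
# Pairwise distances from the five-taxon example.
D = {
    ("A", "B"): 0.31, ("A", "C"): 1.01, ("A", "D"): 0.75, ("A", "E"): 1.03,
    ("B", "C"): 1.00, ("B", "D"): 0.69, ("B", "E"): 0.90,
    ("C", "D"): 0.61, ("C", "E"): 0.42, ("D", "E"): 0.37,
}

def d(u, v):
    """Symmetric lookup in the distance table."""
    return D[(u, v)] if (u, v) in D else D[(v, u)]

def avg(g1, g2):
    """Average distance between two groups; a group is a string of one-letter taxa."""
    return sum(d(u, v) for u in g1 for v in g2) / (len(g1) * len(g2))

def three_point(d12, d13, d23):
    """Branch lengths from the central vertex to nodes 1, 2, and 3."""
    return ((d12 + d13 - d23) / 2, (d12 + d23 - d13) / 2, (d13 + d23 - d12) / 2)

# Step 1: A and B against the temporary group C-D-E.
len_a, len_b, _ = three_point(avg("A", "B"), avg("A", "CDE"), avg("B", "CDE"))
# Step 2: D and E against the temporary group A-B-C.
len_d, len_e, _ = three_point(avg("D", "E"), avg("D", "ABC"), avg("E", "ABC"))
# Step 3: the groups A-B, C, and D-E meet at the central vertex.
to_ab, len_c, to_de = three_point(avg("AB", "C"), avg("AB", "DE"), avg("C", "DE"))

# Internal edges: subtract the average leaf depth inside each pair.
a = to_ab - (len_a + len_b) / 2
b = to_de - (len_d + len_e) / 2
print(f"A={len_a:.4f} B={len_b:.4f} C={len_c:.5f} D={len_d:.3f} E={len_e:.3f}")
print(f"a={a:.5f} b={b:.5f}")
```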

One concern is that we have produced an unrooted tree for our five species and have no real clue about where to place the root. Sometimes physical evidence can help with the placement of a root for the tree; however, many times such evidence does not exist. A common heuristic practice is to include an extra taxon that is more distantly related to those under consideration than they are to each other. Such a taxon is called an outgroup. The biological assumption is that this group must have split from the others before they split from one another.