Phylogeny Ch. 7 & 8.

Overview: Evolution and sequence variation. Phylogenetic trees. The meaning of distance. Evolutionary sequence models. Constructing trees. Sequence alignment.

Evolution and Sequence Variation

Sequence similarity may imply common descent. Similarity of genomic and protein sequences is one way to try to infer the relationships among organisms. If two sequences are homologs, they are descended from a common ancestral sequence. This may imply that the ancestral sequence was present in the ancestral organism, but horizontal transfer can occur.

Phylogenetic Trees

Trees are a convenient way to summarize the relationships among a set of (orthologous) sequences or a set of species.

Rooted and Unrooted Trees: “Leaves” are extant species. Internal nodes are ancestral species. Adding a root gives time a direction. It is very difficult to accurately determine where the root should go, so it is best to avoid placing it…

The Data: Phylogenetic trees predate genomic sequence data. Traditional taxonomy used physical characteristics. Qualitative: e.g., fur-bearing. Quantitative: e.g., number of petals. Sequence data is quantitative and plentiful.

What’s in a tree? Cladograms. Additive trees. Ultrametric trees.

Cladograms: Branch lengths are meaningless. A cladogram shows only the evolutionary relationships among “taxa”.

Additive Trees: Branch lengths measure “evolutionary distance”. The total distance between two taxa is the sum of the branch lengths on the path separating them. Additive trees do not have to be rooted.
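As a small illustration of the additive property, the sketch below sums branch lengths along the path between two taxa. The tree, its node names (A, B, C, X, Y) and its branch lengths are made up for the example, and the edge-dictionary representation is just one convenient assumption.

```python
def path_length(edges, start, end):
    """Sum the branch lengths on the unique path between two nodes of a tree.

    edges maps an (a, b) pair of node names to the length of that branch.
    """
    adjacency = {}
    for (a, b), length in edges.items():
        adjacency.setdefault(a, []).append((b, length))
        adjacency.setdefault(b, []).append((a, length))

    # Depth-first walk; in a tree there is exactly one path, so no cycles arise.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == end:
            return total
        for neighbor, length in adjacency.get(node, []):
            if neighbor != parent:
                stack.append((neighbor, node, total + length))
    raise ValueError("the two nodes are not connected")


# Hypothetical unrooted additive tree: leaves A, B, C; internal nodes X, Y.
example_edges = {("A", "X"): 2.0, ("B", "X"): 3.0, ("X", "Y"): 3.0, ("C", "Y"): 4.0}
print(path_length(example_edges, "A", "C"))  # branches A-X, X-Y, Y-C
```

Note that no root is needed: the distance depends only on the branches between the two taxa.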

But how can two species be at different “evolutionary distances” from their ancestor?

Distance ≠ Time: The rate of evolution, r, can vary over time and between lineages. For a given rate, the distance is equal to the rate times the time: d = rt

Ultrametric Trees: Simplest type of rooted, additive tree. Assumes that the rate of evolution is constant over time. With sequences, this assumption is called the “molecular clock”. Horizontal lines have no meaning.

Evolutionary Sequence Models

We want to build phylogenetic trees from orthologous genes or proteins. Evolutionary sequence models give us a way to model how one ancestral sequence evolves (independently) into two daughter sequences.

What is the evolutionary distance between two DNA sequences? Align the two DNA sequences. Count the number of places where they differ (ignoring gaps). Then p = D/L, where D is the number of differences and L is the total number of aligned positions.
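The counting step can be sketched in a few lines of Python. This is a minimal sketch, assuming the two sequences are already aligned to the same length, that gaps are written as "-", and that L counts only the columns where neither sequence has a gap (one reading of "ignoring gaps"). The function name is illustrative.

```python
def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Fraction of differing positions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0      # L: ungapped aligned positions (assumption, see lead-in)
    differences = 0   # D: positions where the two bases differ
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # skip any column that contains a gap
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 2 differences over 8 positions
```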

Is p the evolutionary distance? NO! p is just the observed fraction of positions that differ. What value will p tend towards as evolutionary distance increases?

All things being equal… If all mutations (from one nucleotide to another) are equally likely, p → 3/4. Do you see why?
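One way to see the 3/4 limit: once enough mutations have accumulated, the base at each position is effectively a random draw from the four nucleotides, so two sequences agree at a position with probability 1/4 and differ with probability 3/4. The simulation below is only a sketch of that argument. It assumes equal base frequencies and fully independent positions, and the sequence length and seed are arbitrary choices.

```python
import random


def saturated_p_distance(length: int = 100_000, seed: int = 0) -> float:
    """p-distance between two independent, uniformly random DNA sequences."""
    rng = random.Random(seed)
    seq1 = rng.choices("ACGT", k=length)
    seq2 = rng.choices("ACGT", k=length)
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / length


print(saturated_p_distance())  # close to 0.75
```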

So what is going on here, really? A position can mutate to any of the 3 other nucleotides. If the ancestral sequence is distant, this can happen multiple times. But all we get to see is the final result! So a position with a different nucleotide may be the result of one or more mutation events. And positions with the same nucleotide may also have mutated, for example when both sequences changed to the same base or when one changed and then changed back. Seq 1: A -> T. Seq 2: A -> T.

If we model mutations as a Poisson process: The probability of no mutation at a position in time t is exp(-rt). Both sequences are evolving, so the probability that neither has mutated is exp(-2rt). Let d = 2rt. Then 1 - p = exp(-d), so d = -ln(1 - p).
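The Poisson correction d = -ln(1 - p) is a one-liner. The sketch below (function name is illustrative) also shows how the corrected distance grows faster than p as the sequences diverge.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p) from a p-distance."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


for p in (0.05, 0.1, 0.3, 0.5):
    print(p, round(poisson_distance(p), 4))
```

For small p the two values nearly coincide; the gap widens as unseen multiple mutations become more likely.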

Relationship between p-distance and evolutionary distance

Summary So the branch lengths of the tree are “d=rt”. We must propose an evolutionary model to compute “d” from the observed p-distance. The Poisson model is too simple. It doesn’t capture real evolution.

Other Evolutionary Models: Jukes-Cantor. Assumes all base frequencies are ¼ and all substitutions are equally likely. Has one parameter, α, the substitution rate (per unit time). Distance formula: d = -¾ ln(1 - 4⁄3 p)
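A minimal sketch of the Jukes-Cantor correction, using the standard form d = -(3/4) ln(1 - (4/3) p). The formula is only defined for p < 3/4, the saturation value, so the function (whose name is illustrative) rejects anything at or above that.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


print(round(jukes_cantor_distance(0.1), 6))
```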

Kimura Two-Parameter Model: Models transitions and transversions separately because, in real sequences, transitions occur more often than transversions. Transitions: A<->G, C<->T. Transversions: any purine<->pyrimidine change. Two parameters: transition rate α, transversion rate β. Distance formula: d = -½ ln(1-2P-Q) - ¼ ln(1-2Q), where P and Q are the fractions of positions showing transitions and transversions, respectively.
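The sketch below counts P and Q from a pairwise alignment and applies the standard Kimura two-parameter formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). It assumes the sequences contain only A, C, G, T and "-" for gaps (ambiguity codes are not handled), and the function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq1, seq2, gap="-"):
    """Return (P, Q): fractions of ungapped positions showing a transition
    or a transversion. Assumes only A, C, G, T and the gap character."""
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return transitions / compared, transversions / compared


def k2p_distance(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition/transversion fractions."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


P, Q = transition_transversion_fractions("AACC", "AGCA")
print(P, Q, k2p_distance(0.1, 0.05))
```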

Transitions and Transversions

More General Models: More general models take into account other realities, such as non-uniform base frequencies and non-uniform mutation rates (Gamma correction).

Constructing Phylogenetic Trees

First, construct a multiple alignment. A good multiple alignment is key. The p-distances between pairs of sequences can then be computed. This allows the d-distances between pairs of sequences to be computed. Some tree-building methods (e.g., parsimony methods) use the multiple alignment directly.

Next, choose a tree-building method. UPGMA (1958): Builds rooted, ultrametric trees. Assumes a constant rate of evolution in all branches. Neighbor-joining (1987): Builds unrooted, additive trees. Assumes the best tree has the shortest total branch length. This is the principle of minimum evolution, as with maximum parsimony trees.

Neighbor-Joining: Similar to maximum parsimony, but works with large datasets. Maximum parsimony methods consider many more tree topologies, so they don’t scale to large numbers of species.

Neighbors are separated by one node. Start with a star topology. Everybody’s a neighbor!

Neighbors are separated by one node. Assume Sequences 1 and 2 were nearest neighbors. So they are joined with new node Y. The method computes the new branch lengths.

Find the pair of neighbors that reduces the total branch length the most. N = number of sequences. dij = distance between sequences i and j. Ui = sum of distances from sequence i to all other sequences. δij = dij - (Ui + Uj)/(N-2). Find the pair of sequences with minimum δij.
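The selection criterion δij = dij - (Ui + Uj)/(N-2) can be sketched as a loop over all pairs. The 5 x 5 distance matrix here is a made-up additive example, not the data behind the slides, and the function name is illustrative.

```python
def select_neighbors(d):
    """Return (i, j, delta) for the pair minimising
    delta_ij = d_ij - (U_i + U_j) / (N - 2). Requires N >= 3."""
    n = len(d)
    u = [sum(row) for row in d]  # U_i: total distance from i to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            delta = d[i][j] - (u[i] + u[j]) / (n - 2)
            if best is None or delta < best[2]:
                best = (i, j, delta)
    return best


# Hypothetical symmetric distance matrix for five sequences (indices 0..4).
example_d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(select_neighbors(example_d))
```

In this matrix the pair with the smallest raw distance is (3, 4), yet the criterion picks (0, 1): subtracting the U terms favours pairs that are also far from everything else.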

Initial tree: 5 sequences

Step 1. Join nearest neighbors.

How the new branch lengths are computed: The new branch lengths from the joined neighbors i and j to the new node W are biW = ½(dij + (Ui – Uj)/(N-2)) and bjW = dij – biW, where i = E and j = D in the example.
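A direct transcription of the two branch-length formulas, as a sketch. The input values in the usage line are hypothetical (dij = 5, Ui = 31, Uj = 34, N = 5) and do not come from the slide's example.

```python
def branch_lengths(d_ij: float, u_i: float, u_j: float, n: int):
    """Branch lengths from joined neighbors i and j to the new node W.

    b_iW = 0.5 * (d_ij + (U_i - U_j) / (N - 2)),  b_jW = d_ij - b_iW
    """
    b_iw = 0.5 * (d_ij + (u_i - u_j) / (n - 2))
    b_jw = d_ij - b_iw
    return b_iw, b_jw


print(branch_lengths(5, 31, 34, 5))  # hypothetical inputs
```

The two lengths always sum to dij, so the distance between the joined pair is preserved on the tree.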

Replace joined neighbors with new node W.

Compute distances from new node W to each remaining sequence: The new distance to each remaining sequence k is dWk = ½(dik + djk – dij), where i and j are the nearest neighbors (D and E in this example).
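The update dWk = ½(dik + djk - dij) can be sketched as follows. The matrix is a made-up example and the joined pair (indices 0 and 1) is chosen for illustration, so neither reflects the D/E example on the slides.

```python
def distances_to_new_node(d, i, j):
    """Distances from the new node W (replacing i and j) to every other
    sequence k: d_Wk = 0.5 * (d_ik + d_jk - d_ij)."""
    return {
        k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
        for k in range(len(d))
        if k not in (i, j)
    }


# Hypothetical symmetric distance matrix for five sequences (indices 0..4).
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(distances_to_new_node(example_matrix, 0, 1))
```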

Step 2: Repeat with the new star tree

Replace neighbors with new node X.

Step 3: Repeat again

All done. The tree is now fully resolved (a binary tree), so the procedure is complete.
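Putting the steps together, the whole procedure can be sketched as one loop: pick the pair with minimum δij, compute its branch lengths, replace the pair with a new node, and repeat until two nodes remain. This is a minimal sketch, not an optimized or reference implementation. The labels, the distance values, the W1, W2, ... node names and the frozenset-keyed distance table are all assumptions made for the example, and ties are broken by whichever pair is seen first.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor-joining.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, node, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    joins = 0
    while len(nodes) > 2:
        n = len(nodes)
        u = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}

        # Pair with minimum delta_ij = d_ij - (U_i + U_j) / (N - 2).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: d[frozenset(p)] - (u[p[0]] + u[p[1]]) / (n - 2),
        )
        d_ij = d[frozenset((i, j))]

        # Branch lengths from i and j to the new internal node.
        b_i = 0.5 * (d_ij + (u[i] - u[j]) / (n - 2))
        b_j = d_ij - b_i
        joins += 1
        w = f"W{joins}"
        edges += [(i, w, b_i), (j, w, b_j)]

        # Distances from the new node to every remaining node.
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((w, k))] = 0.5 * (
                d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
            )
        nodes = rest + [w]

    # Two nodes left: connect them with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical distances between five sequences A..E.
labels = ["A", "B", "C", "D", "E"]
rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
table = {frozenset((labels[x], labels[y])): rows[x][y]
         for x in range(5) for y in range(x + 1, 5)}
tree = neighbor_joining(labels, table)
for edge in tree:
    print(edge)
```

For N sequences the result has 2N - 3 branches, which is the edge count of a fully resolved unrooted tree.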