Phylogenetic trees as a visualization tool for evolutionary classification.

Presentation transcript:

Phylogenetic trees as a visualization tool for evolutionary classification

Trees. [Figure: several drawings of the same Human, Chimp, Gorilla tree with the leaves in different left-to-right orders, joined by equals signs.]

Same thing… [Figure: two drawings of one tree on s1 to s5, joined by an equals sign.]
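The point of these two slides, that rotating branches does not change a tree, can be checked mechanically. The following is a minimal sketch, assuming a rooted tree is written as nested Python tuples with leaf names as strings; the function name `canonical` and the three example trees are illustrative and not taken from the slides.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    A leaf is a string; an internal node is a tuple of subtrees.
    Children are stored in a frozenset, so their drawing order is ignored.
    Assumes leaf names are unique.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


# Two drawings of the same tree: the children of each node are swapped.
t1 = (("Human", "Chimp"), "Gorilla")
t2 = ("Gorilla", ("Chimp", "Human"))
# A different topology: here Chimp and Gorilla are sisters.
t3 = (("Chimp", "Gorilla"), "Human")

print(canonical(t1) == canonical(t2))  # True
print(canonical(t1) == canonical(t3))  # False
```

Two drawings compare equal exactly when they group the leaves in the same way, whatever the left-to-right order.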

Bifurcating / Multifurcating. A multifurcation = polytomy; a bifurcation = dichotomy. [Figure: two trees on s1 to s5, one with a polytomy and one fully dichotomous.] There are two types of polytomies: soft (lack of information to resolve the tree) and hard (multiple divergence in short evolutionary time).

A “comb”. [Figure: a comb-shaped tree on s1 to s5.]

Terminology. A branch = an edge. External node = leaf. Internal nodes. The root. [Figure: tree of Human, Chimp, Chicken, Gorilla with these parts labeled.]

Ingroup / Outgroup. [Figure: tree of Human, Chimp, Chicken, Gorilla with the ingroup and the outgroup labeled.]

Subtrees. [Figure: tree of Human, Chimp, Chicken, Gorilla, Duck with a subtree marked.]

Monophyletic groups. The Gorilla+Human+Chimp are monophyletic. A clade is a monophyletic group. [Figure: tree of Human, Chimp, Chicken, Gorilla.]
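One way to make the definition concrete: a group is monophyletic when some node of the tree has exactly that group as its set of leaves. The sketch below assumes a rooted tree stored as nested tuples and an illustrative topology in which Chicken is the sister of the three apes; the function names and the topology are assumptions, not read off the slide.

```python
def leaves(tree):
    """Set of leaf names under a node (leaf = string, internal node = tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the tree has exactly `group` as its leaf set."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Assumed topology, for illustration only.
tree = ((("Human", "Chimp"), "Gorilla"), "Chicken")

print(is_monophyletic(tree, {"Human", "Chimp", "Gorilla"}))  # True
print(is_monophyletic(tree, {"Gorilla", "Chicken"}))         # False
```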

Paraphyletic = non-monophyletic groups. The Zebrafish+Whale are paraphyletic. [Figure: tree of Whale, Chimp, Drosophila, Zebrafish.]

3. Tree building. The maximum parsimony principle.

Genes: 0 = absence, 1 = presence. [Table: species s1 to s5 (rows) by genes g1 to g6 (columns); the 0/1 entries were not preserved.]

Evaluate this tree… [Figure: a candidate tree with leaves s1, s4, s3, s2, s5.]

[Figures: the same tree annotated for each gene in turn, stepping through the numbered options for that gene.]

Number of changes for gene 1 (character 1) = 1
Number of changes for gene 2 (character 2) = 2
Number of changes for gene 3 (character 3) = 1
Number of changes for gene 4 (character 4) = 2
Gene number 5 is the same as gene number 4: number of changes for gene 5 (character 5) = 2
Gene number 6, 1 option only: number of changes for gene 6 (character 6) = 1

Sum of changes:
Number of changes for gene 1 (character 1) = 1
Number of changes for gene 2 (character 2) = 2
Number of changes for gene 3 (character 3) = 1
Number of changes for gene 4 (character 4) = 2
Number of changes for gene 5 (character 5) = 2
Number of changes for gene 6 (character 6) = 1
Sum of changes for this tree topology = 9
Can we do better?

The MP (most parsimonious) tree: [Figure: a tree with leaves s1, s4, s3, s2, s5.] Sum of changes for this tree topology = 8.

How to efficiently compute the MP score of a tree

The Fitch algorithm (1971). Postorder tree scan. At each internal node, if the intersection of the sets of its two children is empty, we apply a union operator; otherwise, an intersection. [Figure: tree of Human, Chimp, Chicken, Gorilla, Duck with leaf characters A, G, C, C, A and internal sets {A,G}, {A,C,G}, {A,C}.]

Number of changes. Total number of changes = number of union operators. [Figure: the same tree and sets as on the previous slide.]
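The postorder scan and the union count can be sketched in a few lines. This is a minimal illustration, not the slides' own code: the topology `(((Human, Chimp), Gorilla), (Chicken, Duck))` and the assignment of characters to species are assumptions, chosen because they reproduce the internal sets {A,G}, {A,C,G} and {A,C} shown in the figure.

```python
def fitch(tree, states):
    """Postorder Fitch pass on a rooted binary tree.

    tree   -- leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character
    Returns (state set at this node, number of changes in its subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:
        # Non-empty intersection: keep it, no extra change.
        return common, changes
    # Empty intersection: take the union and count one change.
    return left_set | right_set, changes + 1


# Assumed topology and leaf characters, for illustration only.
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

root_set, n_changes = fitch(tree, states)
print(sorted(root_set), n_changes)  # ['A', 'C'] 3
```

Under these assumptions three union operators are applied, so the character needs three changes on this tree.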

Patterns: CACAG requires the same number of changes as CACAT, or in general as any position with the pattern XYXYZ. [Figure: the same tree and sets as on the previous slides.]
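Identical patterns can be detected by relabeling each alignment column by order of first appearance. The helper below is a hypothetical sketch; the name `site_pattern` and the third example column are not from the slides.

```python
def site_pattern(column):
    """Relabel a column by order of first appearance, e.g. 'CACAG' -> 'XYXYZ'."""
    labels = "XYZW"  # four labels are enough for the characters A, C, G, T
    seen = {}
    for ch in column:
        if ch not in seen:
            seen[ch] = labels[len(seen)]
    return "".join(seen[ch] for ch in column)


print(site_pattern("CACAG"))  # XYXYZ
print(site_pattern("CACAT"))  # XYXYZ
print(site_pattern("CCAAG"))  # XXYYZ
```

Columns with the same pattern need to be scored only once, and the result can be reused for every other column that shares it.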

Ex: Find the min. number of changes. Point to all identical patterns. [Figure: tree of Human, Chimp, Chicken, Gorilla, Duck with leaf sequences GACA, GGGA, CAAG, GCGA, GAAA.]

Ambiguous characters: R = {A,G} = purine. [Figure: tree of Human, Chimp, Chicken, Gorilla, Duck with leaf characters A, G, C, C, R and internal sets {A,G}, {A,C,G}, {A,G,C}, {A,C,G}.]

Subtrees. Each node has an ID. [Figure: tree of Human, Chimp, Chicken, Gorilla, Duck with numbered nodes; the subtree of node 4 is marked.]

The Sankoff algorithm. Generalization: assume a cost function $C_{ij}$ for changing from $i$ to $j$. If $C_{ij} = 1$ for all $i \neq j$, it just counts the number of changes. We now search for the tree with the minimum cost. Definition: $S_i(k)$ = minimum cost of the subtree of node $i$, given that the assignment of node $i$ = character $k$.

Easy to compute for the leaves. For example, for a leaf 2 carrying A: $S_2(A) = 0$ (no cost, A is there) and $S_2(C) = S_2(G) = S_2(T) = \infty$ (they just can’t be there). [Figure: tree with leaves A, G, A, A, C.]

[Figure: the tree with leaves A, G, A, A, C, each leaf labeled with its cost vector in the order A, C, G, T; the vectors shown are [0, ∞, ∞, ∞], [∞, 0, ∞, ∞], [0, ∞, ∞, ∞], [∞, ∞, 0, ∞].]

Node 0 has children 1 and 2, with vectors [$S_1(A), S_1(C), S_1(G), S_1(T)$] and [$S_2(A), S_2(C), S_2(G), S_2(T)$].

Costs $C_{ij}$ (rows and columns in the order A, C, G, T):
A: 0 3 1 2
C: 3 0 2 1
G: 1 2 0 3
T: 2 1 3 0

$S_0(A) = \min_x (C_{Ax} + S_1(x)) + \min_y (C_{Ay} + S_2(y))$
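Written for an arbitrary character $k$, and assuming a binary internal node $i$ with children $l$ and $r$ (these two labels are introduced here for illustration), the same recurrence, its base case at the leaves and the final answer read:

$$
S_i(k) = \min_x \bigl(C_{kx} + S_l(x)\bigr) + \min_y \bigl(C_{ky} + S_r(y)\bigr)
$$

$$
S_{\text{leaf}}(k) =
\begin{cases}
0 & \text{if the leaf carries character } k \\
\infty & \text{otherwise}
\end{cases}
\qquad
\text{cost of the tree} = \min_k S_{\text{root}}(k)
$$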

Example: node 1 has the vector [13, 17, 22, 14] and node 2 has the vector [15, 14, 21, 17], with the same costs $C_{ij}$.

$S_0(A) = \min\{13+0,\ 17+3,\ 22+1,\ 14+2\} + \min\{15+0,\ 14+3,\ 21+1,\ 17+2\} = 13 + 15 = 28$

$S_0(C) = \min\{13+3,\ 17+0,\ 22+2,\ 14+1\} + \min\{15+3,\ 14+0,\ 21+2,\ 17+1\} = 15 + 14 = 29$

$S_0(G) = \min\{13+1,\ 17+2,\ 22+0,\ 14+3\} + \min\{15+1,\ 14+2,\ 21+0,\ 17+3\} = 14 + 16 = 30$

$S_0(T) = \min\{13+2,\ 17+1,\ 22+3,\ 14+0\} + \min\{15+2,\ 14+1,\ 21+3,\ 17+0\} = 14 + 15 = 29$

The vector at node 0 is therefore [28, 29, 30, 29]. The cost of the tree is the minimum of this vector, which is 28.
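The worked example can be reproduced with a short sketch of a single Sankoff step. The cost matrix and the two child vectors are the ones given on the slides; the function name `combine` is an assumption made for illustration.

```python
# Cost matrix from the slides; rows and columns are in the order A, C, G, T.
COST = [
    [0, 3, 1, 2],  # from A
    [3, 0, 2, 1],  # from C
    [1, 2, 0, 3],  # from G
    [2, 1, 3, 0],  # from T
]


def combine(child_vectors, cost):
    """One Sankoff step: the parent's vector from the vectors of its children."""
    n = len(cost)
    return [
        sum(min(cost[k][x] + vec[x] for x in range(n)) for vec in child_vectors)
        for k in range(n)
    ]


root = combine([[13, 17, 22, 14], [15, 14, 21, 17]], COST)
print(root, min(root))  # [28, 29, 30, 29] 28
```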

Dynamic programming. This is an example of dynamic programming: you first solve some small problems, and then use their solutions recursively to build a solution to a larger problem.
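A full recursion makes the "small problems first" structure visible: the vector of every subtree is computed once and then reused by its parent. The sketch below is only an illustration under stated assumptions. The tree topology and leaf characters are assumed, and the unit cost matrix (cost 1 for any change) is chosen so that the result can be compared with a plain count of changes.

```python
import math

ALPHABET = "ACGT"


def sankoff(tree, states, cost):
    """Return the vector [S(A), S(C), S(G), S(T)] for the root of `tree`.

    tree   -- leaf name (str) or a tuple of subtrees
    states -- dict mapping each leaf name to its character
    cost   -- cost[i][j] = cost of changing ALPHABET[i] into ALPHABET[j]
    """
    if isinstance(tree, str):
        # Base case: 0 for the observed character, infinity for the others.
        return [0 if ch == states[tree] else math.inf for ch in ALPHABET]
    # Solve the small problems (the subtrees) first ...
    child_vectors = [sankoff(child, states, cost) for child in tree]
    # ... then reuse their solutions for the larger one.
    n = len(ALPHABET)
    return [
        sum(min(cost[k][x] + vec[x] for x in range(n)) for vec in child_vectors)
        for k in range(n)
    ]


# Unit costs: every change costs 1, so the minimum is a number of changes.
UNIT = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# Assumed topology and leaf characters, for illustration only.
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

root = sankoff(tree, states, UNIT)
print(root, min(root))  # [3, 3, 4, 5] 3
```

With unit costs the minimum of the root vector is 3, which is the number of changes a Fitch pass gives for the same assumed tree and characters.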

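Exercise. Compute the minimal cost for this tree. [Figure: a tree with leaves A, G, A, C, C and a cost matrix over A, C, G, T; the matrix entries were not fully preserved (surviving fragments: row A: 0, 2.5, 1; row C: 0, 1; row G: 1, 0; row T: 1, 0).] Solution: the vector at the root should be [6, 6, 7, 8]; thus, the answer is 6.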