Phylogenetics Part II (Doug Raiford, Lesson 9)


Doug Raiford Lesson 9

- 3 Approaches
  - Distance
  - Parsimony
  - Maximum Likelihood
- Have already seen a distance method

Phylogenetics Part II, 12/18/2015

- What’s wrong with UPGMA?
- Let’s revisit the example
- Can this be? Doesn’t the derived tree imply that B is equidistant from C and D?
  - The matrix gives a distance of 4 from B to C but 5 from B to D

[Figure: derived UPGMA tree with leaves A, B, C, D]

       A  B  C  D
    A  0  7  6  7
    B     0  4  5
    C        0  3
    D           0

- UPGMA averaged the two and put them both (the branches for C and D) at 1.5
- What if rates of evolution are not equal after a divergence?

[Figure: the same tree and distance matrix for A, B, C, D as on the previous slide]
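To make the averaging concrete, here is a minimal UPGMA sketch run on the four-taxon matrix from the example. The function name `upgma` and the dictionary layout are illustrative assumptions, not code from the lecture, and ties are broken arbitrarily.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster with UPGMA; dist maps frozenset({a, b}) to the leaf distance."""
    clusters = {n: (n,) for n in names}   # cluster label -> member leaves
    d = dict(dist)                        # distances between current clusters
    merges = []                           # (new cluster label, node height)
    while len(clusters) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = f"({a},{b})"
        members = clusters[a] + clusters[b]
        # Distance to every other cluster is the mean over all leaf pairs.
        for c, other in clusters.items():
            if c not in (a, b):
                total = sum(dist[frozenset((x, y))] for x in members for y in other)
                d[frozenset((new, c))] = total / (len(members) * len(other))
        del clusters[a], clusters[b]
        clusters[new] = members
        merges.append((new, height))
    return merges


# Distance matrix from the example slide.
dist = {
    frozenset("AB"): 7, frozenset("AC"): 6, frozenset("AD"): 7,
    frozenset("BC"): 4, frozenset("BD"): 5, frozenset("CD"): 3,
}
merges = upgma("ABCD", dist)
for label, height in merges:
    print(label, round(height, 3))
```

C and D are joined first at height 1.5, then B joins at height 2.25. In the resulting tree B is 4.5 from both C and D, although the matrix says 4 and 5.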

- Differing rates of evolution can sometimes cause problems with UPGMA
- Especially if the sequences are very similar (small distances)

This tree [figure, leaves in the order A, B, C] yields this matrix:

       A  B  C
    A  0  4  3
    B     0  3
    C        0

which yields this tree [figure, leaves in the order B, C, A].

- Also called the minimum evolution method
- Definition of parsimony:
  - 1 a: the quality of being careful with money or resources: thrift
  - 1 b: the quality or state of being stingy
  - 2: economy in the use of means to an end; especially: economy of explanation in conformity with Occam's razor
- Occam's razor: the simplest explanation is usually the best

- Looks at each column of an MSA and attempts to find a tree that describes it
- Builds a consensus tree

    atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
    acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
    t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
    t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
    aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag

- What do we mean when we say “attempts to find a tree that describes” a column?
- Attempts to fit all possible trees to each column and chooses the best
- How do we determine all possible trees?
- How do we determine which one has the best fit?
- Assume that the majority nucleotide represents the ancestor

[Figure: example sequences AGCT and AACT, and one possible tree for a column with leaves A, A, A, G. Node and branch labels: "A", "0", "0", "A or a G", "0 if A", "1 if A".]

- Total mutations that explain this tree = 1
- Pretty darn good
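Counting the mutations needed to explain one column on a fixed tree can be sketched with Fitch's small-parsimony algorithm. The slide does not name an algorithm, so Fitch is a swapped-in standard choice here, and the tree shape and leaf names (`s1` to `s4`) are made up for illustration.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, number of changes) for one column.

    tree is a leaf name (str) or a pair (left, right);
    states maps each leaf name to its character in the column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# One possible tree for a column with leaves A, A, A, G (hypothetical names).
tree = (("s1", "s2"), ("s3", "s4"))
column = {"s1": "A", "s2": "A", "s3": "A", "s4": "G"}
root_states, changes = fitch(tree, column)
print(root_states, changes)
```

For this column the inferred ancestor is A and a single change explains the tree, matching the total of 1 on the slide.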

- When there are two organisms there is only one possible tree

[Figure: tree with leaves A and B]

- What about when there are three?
- The third could go…

[Figure: tree with leaves A and B]

- For each of the previous 3 trees, could add a 4th taxon to any of its branches (or could form a new root)
- Each of the possible trees had 4 branches, so could add to one of 4 locations (or splice in at the top)
- So the total number of trees with 4 leaves:
  - 3*5=15

[Figure: example tree (surviving leaf labels A, B), captioned "If this were the tree"]

- $N_i$ is the number of trees given $i$ taxa
- $B_i$ is the number of branches in a tree given $i$ taxa
- $B_i = B_{i-1} + 2$, also $B_i = 2i - 2$
- $N_i = N_{i-1} \cdot (B_{i-1} + 1)$
  - The plus 1 is due to the possible new root
- $N_2 = 1$
- $B_2 = 2$

    Taxa  Branches  Trees
    2     2         1
    3     4         3
    4     6         15
    5     8         105
    6     10        945
    7     12        10,395
    8     14        135,135
    9     16        2,027,025
    10    18        34,459,425
    11    20        654,729,075

- Defined by a recurrence relation, so … That’s right, as usual, exponential (in fact faster than exponential, since each new factor is larger than the last)
- What does this growth rate look like?
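The recurrence for rooted trees can be evaluated directly. This is a small sketch; the function name `rooted_tree_counts` is an illustrative assumption.

```python
def rooted_tree_counts(max_taxa):
    """Map i -> (B_i, N_i) for rooted trees, using the slide's recurrence."""
    branches, trees = 2, 1                  # B_2 = 2, N_2 = 1
    table = {2: (branches, trees)}
    for i in range(3, max_taxa + 1):
        trees = trees * (branches + 1)      # N_i = N_{i-1} * (B_{i-1} + 1)
        branches = branches + 2             # B_i = B_{i-1} + 2
        table[i] = (branches, trees)
    return table


for taxa, (branches, trees) in rooted_tree_counts(11).items():
    print(f"{taxa:>4} {branches:>8} {trees:>14,}")
```

Each step multiplies by a larger odd number than the step before, which is why the count reaches hundreds of millions by 11 taxa.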

- Rooted vs. un-rooted
- Wherever the root is, un-kink it

- Always bifurcated
- Can never have 3 branches “from” a single node
- What are the odds?

[Figure: un-rooted tree with leaves A, B, C, D]

- Three possible trees
- Are there any other combinations?

[Figure: three un-rooted four-taxon trees, with leaves drawn in the orders A B C D, A D C B, and A C B D]

- For each of the three trees (having 4 taxa), a fifth taxon could be added to any of the 5 branches
- 3*5=15 trees

[Figure: un-rooted tree with leaves A, B, C, D]

- Outgroup
  - Include an organism that is known to be further away from all taxa than they are from each other

[Figure: un-rooted tree with leaves A, B, C, D, annotated "If outgroup goes here…", next to the resulting rooted tree with leaves outgroup, A, B, C, D]

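- $N_i$ is the number of trees given $i$ taxa
- $B_i$ is the number of branches in a tree given $i$ taxa
- $B_i = B_{i-1} + 2$, also $B_i = 2i - 3$
- $N_i = N_{i-1} \cdot B_{i-1}$
  - No need for a “plus 1” for a possible new root because there are no roots
- $N_2 = 1$
- $B_2 = 1$ (an un-rooted tree on two taxa is a single branch, consistent with $2i - 3$)

    Taxa  Branches  Trees
    2     1         1
    3     3         1
    4     5         3
    5     7         15
    6     9         105
    7     11        945
    8     13        10,395
    9     15        135,135
    10    17        2,027,025
    11    19        34,459,425
    12    21        654,729,075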

- Noticed that for un-rooted trees:
  - $B_i = 2i - 3$ (for $i \ge 2$)
- Also noticed
  - $N_i = N_{i-1} \cdot B_{i-1}$
- And reduced to
  - $(2n-5)(2n-7)(2n-9)\cdots(3)(1)$, where $n$ is the number of taxa
  - Shorthand: $(2n-5)!!$
- For rooted
  - $N_i = N_{i-1} \cdot (B_{i-1} + 1)$
- Reduced to
  - $(2n-3)!!$

Derivation for the un-rooted case:

$$N_i = B_{i-1} N_{i-1} = (2(i-1)-3)\,N_{i-1} = (2i-5)\,N_{i-1} = (2i-5)(2i-7)\,N_{i-2} = \cdots$$

and so on, till the $N$ term gets to $N_3 = 1$.

- Double factorial: each successive number is reduced by two
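As a quick check of the closed forms, the double factorial can be computed directly and compared with the counts quoted on the slides. The helper names below are illustrative assumptions.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n):
    """Number of un-rooted bifurcating trees on n >= 2 taxa: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


def rooted_trees(n):
    """Number of rooted bifurcating trees on n >= 2 taxa: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


for n in range(2, 13):
    print(f"{n:>4} {unrooted_trees(n):>14,} {rooted_trees(n):>16,}")
```

The two columns are the same sequence shifted by one row: an un-rooted tree on n taxa has as many shapes as a rooted tree on n - 1 taxa.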

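- Radical reduction in the number
- Still only bought one additional taxon (the un-rooted count for $n$ taxa equals the rooted count for $n-1$ taxa)

    Taxa  Un-rooted trees  Rooted trees
    2     1                1
    3     1                3
    4     3                15
    5     15               105
    6     105              945
    7     945              10,395
    8     10,395           135,135
    9     135,135          2,027,025
    10    2,027,025        34,459,425
    11    34,459,425       654,729,075
    12    654,729,075      13,749,310,575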

- Even brighter mathematicians
- Can you see why?

- Not really a candidate for dynamic programming
  - Don’t repeat a bunch of sub-problems over and over
  - Each sub-problem is a tree, and they are all unique
- Still exponential

- Discard large subsets of possible solutions
- Use heuristics or predictions

[Figure annotation: "Don’t bother"]

- Calculate a reasonable upper bound using a fast algorithm like UPGMA (hierarchical clustering)
- Incrementally grow potential trees
- Stop investigating any branch of the search that goes over the threshold

[Figure: partial trees on leaves A, B, C, D with pruned branches marked X and labeled "Don’t bother, over threshold"]
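The pruning idea can be sketched as follows. Several things here are assumptions for illustration: the upper bound is passed in as a number rather than computed from a UPGMA tree, columns are scored with Fitch's algorithm, taxa are referred to by their index in `seqs`, and un-rooted trees are grown by always hanging taxon 0 off the top.

```python
def fitch(tree, column):
    """Return (candidate ancestral states, changes) for one alignment column."""
    if isinstance(tree, int):                      # leaf: index of a taxon
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1


def score(tree, seqs):
    """Total changes over all columns for the taxa present in the tree."""
    return sum(fitch(tree, [s[j] for s in seqs])[1] for j in range(len(seqs[0])))


def insertions(tree, new):
    """Yield every tree obtained by attaching leaf `new` to one branch."""
    yield (tree, new)
    if not isinstance(tree, int):
        left, right = tree
        for t in insertions(left, new):
            yield (t, right)
        for t in insertions(right, new):
            yield (left, t)


def branch_and_bound(seqs, upper_bound):
    best = {"score": upper_bound, "tree": None}

    def grow(sub, next_taxon):
        # Adding taxa never lowers the score, so a partial tree that is
        # already over the threshold cannot lead to a better full tree.
        s = score((0, sub), seqs)
        if s > best["score"]:
            return                                 # don't bother
        if next_taxon == len(seqs):
            best["score"], best["tree"] = s, (0, sub)
            return
        for bigger in insertions(sub, next_taxon):
            grow(bigger, next_taxon + 1)

    grow(1, 2)                                     # start from taxa 0 and 1
    return best["score"], best["tree"]


seqs = ["AACC", "AACC", "GGTT", "GGTT"]            # toy alignment
result = branch_and_bound(seqs, upper_bound=8)
print(result)
```

Every time a complete tree beats the current threshold, the threshold tightens, so later partial trees are abandoned earlier. With a bound that no tree can meet, the function returns `None` for the tree.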

- Some columns are all the same
  - Add no meaning
  - All trees are minimum
- Columns that are all different
  - Also add no meaning
  - Must have a minimum of 2 nt’s (or aa’s) that are the same
- Useful in one respect
  - If all the same, can infer the makeup of the ancestor

[Figure: example sequences AGCT, AACT, ACCT, and a tree for a column with leaves A, A, A, A; ancestor A, branch costs 0]
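A column filter along these lines might look like the sketch below. It uses the usual textbook test for a parsimony-informative site (at least two different characters, each present in at least two sequences), which is stricter than the wording on the slide; the function name and labels are illustrative assumptions.

```python
from collections import Counter


def classify_column(column):
    """Label one alignment column as constant, informative, or uninformative."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"            # every tree needs 0 changes
    # Characters shared by at least two sequences.
    shared = [ch for ch, n in counts.items() if n >= 2]
    if len(shared) >= 2:
        return "informative"         # some trees need fewer changes than others
    return "uninformative"           # every tree needs the same number of changes


for col in ["AAAA", "ACGT", "AAGG", "AAAG"]:
    print(col, classify_column(col))
```

A column such as AAAG has shared characters but still cannot separate trees, because one change explains it on any tree. That is why the stricter test is used here.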

- Each column yields a tree
- If all agree, done
- If some differ, use majority rule
- If the sample is too small, perform bootstrapping
  - Randomly draw columns (sites) from the MSA, with replacement
  - Generate more trees
  - Label branches with the percentage of bootstrap trees in which they appear
  - Used as a measure of support (repeatability)
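The resampling step and the support percentage can be sketched like this. It assumes the standard phylogenetic bootstrap, which resamples alignment columns with replacement; tree building is left out, and the clade sets at the end are made-up stand-ins for the groupings found in four bootstrap trees.

```python
import random


def bootstrap_alignment(msa, rng):
    """Build one pseudo-replicate by drawing columns with replacement."""
    n_cols = len(msa[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[j] for j in picks) for seq in msa]


def clade_support(replicate_clades, clade):
    """Percentage of bootstrap trees that contain the given clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


msa = ["AGCT", "AACT", "ACCT"]
replicate = bootstrap_alignment(msa, random.Random(0))
print(replicate)

# Hypothetical clades recovered from four bootstrap trees.
replicate_clades = [
    {frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")},
]
print(clade_support(replicate_clades, frozenset("AB")))
```

Each replicate has the same sequences and the same length as the original alignment, so the same tree-building method can be run on it unchanged.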

- Still have maximum likelihood
- Also, some inferential stuff, but that’s all in the next lecture
