Maximum parsimony Kai Müller.

Slides:



Advertisements
Similar presentations
Parsimony Small Parsimony and Search Algorithms Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
Cladogram Building - 1 ß How complex is this problem anyway ? ß NP-complete:  Time needed to find solution in- creases exponentially with size of problem.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Phylogenetic Trees - I.
Phylogenetic reconstruction
IE68 - Biological databases Phylogenetic analysis
Molecular Evolution Revised 29/12/06
- A brief introduction in 4 hours -
“Inferring Phylogenies” Joseph Felsenstein Excellent reference
BIOE 109 Summer 2009 Lecture 4- Part II Phylogenetic Inference.
CISC667, F05, Lec14, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (I) Maximum Parsimony.
Tree Evaluation Tree Evaluation. Tree Evaluation A question often asked of a data set is whether it contains ‘significant cladistic structure’, that is.
NJ was originally described as a method for approximating a tree that minimizes the sum of least- squares branch lengths – the minimum – evolution criterion.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Parsimony Anders Gorm Pedersen
Building Phylogenies Parsimony 2.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic Analysis. 2 Phylogenetic Analysis Overview Insight into evolutionary relationships Inferring or estimating these evolutionary relationships.
Phylogenetic trees Sushmita Roy BMI/CS 576
Lecture 8 – Searching Tree Space. The Search Tree.
What Is Phylogeny? The evolutionary history of a group.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Phylogenetic analyses Kirsi Kostamo. The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among.
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Terminology of phylogenetic trees
Molecular phylogenetics
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Parsimony and searching tree-space Phylogenetics Workhop, August 2006 Barbara Holland.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
PARSIMONY ANALYSIS and Characters. Genetic Relationships Genetic relationships exist between individuals within populations These include ancestor-descendent.
COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Lecture 2: Principles of Phylogenetics
Introduction to Phylogenetics
Newer methods for tree building
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
Cladogram construction Thanks to Leandro Gaetano.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Phylogeny Ch. 7 & 8.
1 Alignment Matrix vs. Distance Matrix Sequence a gene of length m nucleotides in n species to generate an… n x m alignment matrix n x n distance matrix.
Phylogenetics.
Phylogenetic Trees - Parsimony Tutorial #13
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
Parsimony and searching tree-space. The basic idea To infer trees we want to find clades (groups) that are supported by synapomorpies (shared derived.
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
Maximum Parsimony Phenetic (distance based) methods are fast and often accurate but discard data and are not based on explicit character states at each.
Phylogeny and the Tree of Life
Phylogenetic basis of systematics
Phylogenetic Inference
Summary and Recommendations
BNFO 602 Phylogenetics Usman Roshan.
Lecture 8 – Searching Tree Space
Phylogeny.
PARSIMONY ANALYSIS.
Summary and Recommendations
Presentation transcript:

Maximum parsimony Kai Müller

Phylogenetic trees Cladogram Parenthetical notation (Newick) Branch lenghts without meaning Slanted/rectangular Network/unrooted Parenthetical notation (Newick)

Phylogenetic trees Phylogram Chronogram character changes time an ultrametric tree

Free rotation around all nodes a phylogram the same phylogram =

Phylogenetic trees

Polytomies - multifurcations hard soft

Order within polytomies without meaning =

How many trees are there? for n taxa, there are possible rooted trees A B C A B C D D B C A E Double factorial

Too many! for n taxa, there are possible rooted trees A B C

Types of characters Morphological Molecular DNA sequences Amino acid sequences Presence of genes in Genome Genomic rearrangements Presence of introns Single substitutions (SNP, RFLP, AFLP etc.)

Why many characters? So, why do we need a bazillion characters from entire genomes? The short answer is that we generally don’t. However, the idea behind sampling many characters is that although a minority of the characters will have errors that suggest incorrect relationships, such as the astragalus character supposedly did with the whales, the rest of the characters will contain phylogenetic signal and be congruent with one another. Here’s a map analogy from Willi Hennig’s 1965 book. Suppose we’re trying to reconstruct a map from the torn-up fragments. If we only consider the river, the arrangement at the top left makes sense. However, if we also consider the roads, the arrangement on the bottom right makes more sense. Again, we’re looking for congruence among multiple characters.

Different types of sequence homology

Different types of sequence homology Imagine we are unaware of the gene duplication and sample only alpha-chick, alpha-frog, beta-mouse: != erroneously inferred by a mixture of paralogous & orthologous sequences true organismal relationships

Alignment

Alignment

Alignment formats

Alignment sequence alignment = primary homology assessment conjecture, before data analysis, that similarity among certain characters and character states may be evidence for evolutionary groupings of the taxa an assessment of ‘essential sameness’ without reference to an ancestor tree building = secondary homology assessment recognition of congruence among the primary homologies as a result of a treebuilding analysis of the data—the shared derived character states (synapomorphies) on the phylogenetic tree represent homologies an assessment of ‘congruence’ that explains the sameness as resulting from common ancestry

Issues

Overview: treebuilding methods

Data types: discrete characters vs. distances

Maximum parsimony parsimony principle most parsimonious explanation should be preferred to others Ockham‘s razor parsimony analysis ~= cladistics the oldest phyl. reconstr. method still widely used because principle/optimality criterion easy to grasp quick without convincing alternatives for morphological data

Parsimony-informative characters

Parsimony-informative characters >= 2 states, >= 2 of which occur in >=2 taxa

Character conflict character state distributions of different characters do not all support the same topology example for the following tree:

Character conflict: example

Character conflict: example

Character conflict: example suggested topology No 1: 10 steps

Character conflict: example suggested topology No 2: 9 steps

The shortest tree length of tree = #steps = #character state transitions out of the two trees tried, 9 is the best score trying all other out of the 15 possible trees shows: there are more that require 9, but non with fewer steps

Consensus trees strict loose (semi-strict) majority rule Adams . supertrees

Consensus trees strict/semistrict 50% Majority-rule Adams

Types of parsimony different types of parsimony Wagner parsimony binary or ordered multistate characters i.e, states are on a scale and changes occur via intermediate steps only free reversibel, 01 = 10 Fitch parsimony binary or unordered multistate characters not on a scale, no intermediate steps required: cost(02)= cost(01) free reversibel Dollo parsimony polarity: ancestral/primitive vs. derived states defined a priori derived states may occur only once on a tree, but can undergo >1 reversals Generalized parsimony cost (step) matrices define state transition costs

Character types

Character types

Weighted parsimony differential weight given to each character relative importance of upweighted characters grows other tree length results: different trees may be preferred now successive weighting problematic risk of circularity

Calculating tree length initialize L=0; root arbitrarily (root taxon); postorder traversal: assign each node i a state set Si and adjust L: the set for terminal nodes contains the state of this leaf cycle through all internal nodes k for which Sk has not yet been defined, unlike for its child nodes m and n; if if k is parent node to root node, stop; if state at root node not in current Sk,

Calculating tree length Example: tax1 T tax2 G tax3 T tax4 A tax5 A 1 1) + 1 tax6 A L:

Calculating tree length Example: tax1 T tax2 G tax3 T tax4 A tax5 A 2 1 1) + 1 tax6 A L: 2) + 0

Calculating tree length Example: tax1 T tax2 G tax3 T tax4 A tax5 A 2 1 1) + 1 tax6 A 3 3) + 0 L: 2) + 0

Calculating tree length Example: tax1 T tax2 G tax3 T tax4 A tax5 A 2 4 4) + 1 1 1) + 1 tax6 A 3 3) + 0 L: = 2 2) + 0

Tree searching: exhaustive search branch addition algorithm

Branch and bound Lmin=L(random tree) „search tree“ as in branch addition at each level, if L < Lmin go back one level to try another path if at last level, Lmin=L and go back to first level unless all paths have been tried already

Heuristic searches stepwise addition as branch addition, but on each level only the path that follows the shortest tree at this level is searched best

Star decomposition

Branch swapping NNI: nearest neighbour interchanges SPR: subtree pruning and regrafting TBR: tree bisection and reconnection

Tree inference with many terminals general problem of getting trapped in local optima searches under parsimony: parsimony ratchet searches under likelihood: estimation of substitution model parameters branch lengths topology

Parsimony ratchet generate start tree TBR on this and the original matrix perturbe characters by randomly upweighting 5-25%. TBR on best tree found under 2). Go to 2) [200+ times] once more TBR on current best tree & original matrix get best trees from those collected in steps 2) and 4)

Bootstrapping estimates properties of an estimator (such as its variance) by constructing a number of resamples of the observed dataset (and of equal size to the observed dataset), each of which is obtained by random sampling with replacement from the original dataset

Bootstrapping variants FWR (Frequencies within replicates) SC (strict consensus)

Bootstrapping

Bremer support / decay Bremer support (decay analysis) is the number of extra steps needed to "collapse" a branch. searches under reverse constraints: keep trees only that do NOT contain a given node Takes longer than bootstrapping: parsimony ratchet beneficial (~20 iterations)