Phylogenetic Signal with Induction and non-Contradiction: the PhySIC method for building supertrees (CNRS - Université Montpellier 2, France)

1 Phylogenetic Signal with Induction and non-Contradiction: the PhySIC method for building supertrees
CNRS - Université Montpellier 2, France
http://atgc.lirmm.fr/SuperTree/PhySIC
Vincent Berry (1), V. Ranwez (2), A. Criscuolo (1,2), P.-H. Fabre (2), S. Guillemot (1), C. Scornavacca (1,2), E.J.P. Douzery (2)
Funded by ACI IMPBIO & BIOSTIC LR

2 PhySIC: Phylogenetic Signal with Induction and non-Contradiction
Introduction: use of supertrees
Supertrees are useful for:
- Producing well-resolved large phylogenies that provide a framework for broad comparative studies (Gittleman et al 2004)
- Quantitative studies of input-tree congruence, identifying outlier taxa by tree-supertree distance measures (Wilkinson et al 2004)
- Exploring and identifying agreement and disagreement among sets of input trees. The aim is then to reveal conflicts rather than to resolve them. Conflicts are ultimately resolved from additional data or analyses (Wilkinson et al 2001)
- Identifying where limited overlap between the leaf sets of the input trees is an obstacle to their amalgamation, thereby guiding further research (Sanderson et al 1996, Arné et al 2007)

3 Introduction: dealing with conflicts
Dealing with topological contradictions ("conflicts") among source trees:
- Voting methods (MRP, MMC, CLANN, ...) resolve conflicts based on a voting procedure (optimization approach)
- Veto methods (Strict Consensus, Build, SMAST) do not favor any resolution in case of conflict (consensus approach)
[Figure: two example source trees on taxa A, B, C, D]

4 Veto methods
- Proceed from an axiomatic approach: proposed supertrees satisfy specified theoretical properties
- Goal: obtain a reliable, if incomplete, picture of how the source trees fit together
- Motivation:
  - Full congruence with the source trees can be necessary for further applications such as phylogeography, divergence time estimations, etc.
  - Avoid as much as possible the inference of non-supported novel clades, unlike in some existing voting methods

5 Overview
- Some relevant properties for reliable inference
  - Decomposition of a tree into triplets
  - Identifying a tree
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (sketch)
  - BUILD (Aho)
  - PhySIC PC
  - PhySIC PI
- Biological case study: Primate supertree
- Conclusion & prospects

6 Axiomatic approach: important properties
Police investigation | Supertree
The inspector        | The supertree method
The witnesses        | The source trees
The testimonies      | Phylogenetic information contained within source trees
Deducing the true story
Pointing out contradictions in the testimonies
Deducing new facts by cross-checking
Reliable facts are those that can be induced from testimonies and that are not incompatible with any other.

7 Decomposition of trees into building blocks
Triplets (rooted triples): subtrees on 3 taxa
[Figure: T1 is the tree (((a,b),c),d) and T2 is the tree (((e,b),d),c)]
tr(T1) = {ab|c, ab|d, ac|d, bc|d}
tr(T2) = {eb|d, eb|c, ed|c, bd|c}
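The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration, not the PhySIC implementation: the nested-tuple tree encoding and the names `leaves`, `triplets` and `fmt` are assumptions made for the example.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def triplets(tree):
    """Return tr(tree) as a set of (frozenset({x, y}), z) pairs, each meaning xy|z."""
    if isinstance(tree, str):
        return set()
    result = set()
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        result |= triplets(child)
        # x and y sit in the same child, z in another child: this node resolves xy|z
        outside = set().union(*(cl for j, cl in enumerate(child_leaves) if j != i))
        for x, y in combinations(sorted(child_leaves[i]), 2):
            for z in outside:
                result.add((frozenset((x, y)), z))
    return result

def fmt(triplet):
    """Format a triplet as 'xy|z' with the cherry written in alphabetical order."""
    pair, z = triplet
    return "".join(sorted(pair)) + "|" + z

# Hypothetical encoding of the two trees of the slide
T1 = ((("a", "b"), "c"), "d")
T2 = ((("e", "b"), "d"), "c")
print(sorted(fmt(t) for t in triplets(T1)))
print(sorted(fmt(t) for t in triplets(T2)))
```

A fully resolved tree on 4 taxa yields 4 triplets, one per subset of 3 taxa, which matches the sets listed on the slide (with "eb" printed as "be" because of the alphabetical formatting).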

8 Properties of interest: identification
- A tree T displays a set R of triplets iff R ⊆ tr(T)
  - In such a case R is said to be compatible: all triplets of R can be combined into a tree
- R identifies T iff
  - T displays R
  - AND every tree T' displaying R contains all the clades of T
[Figure: T is the tree (((a,b),c),d). R = {bc|d, ab|c} identifies T. R' = {ab|c, ab|d} does not identify T: another tree on a, b, c, d is shown crossed out.]
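As a minimal sketch of the "displays" relation, the code below uses the fact that a tree displays xy|z exactly when one of its clades contains x and y but not z. The nested-tuple encoding, the `"xy|z"` string form with single-letter taxon names, and the helper names are assumptions for illustration.

```python
def clades(tree):
    """Return the leaf sets of all nodes of a nested-tuple tree; the last entry is the full leaf set."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, here = [], set()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        here |= sub[-1]
    found.append(frozenset(here))
    return found

def displays(tree, triplet_strings):
    """True iff the tree displays every triplet, each written 'xy|z' with one-letter taxa."""
    all_clades = clades(tree)
    taxa = all_clades[-1]
    for t in triplet_strings:
        x, y, z = t[0], t[1], t[3]
        if z not in taxa:
            return False
        if not any(x in c and y in c and z not in c for c in all_clades):
            return False
    return True

T = ((("a", "b"), "c"), "d")
T_alt = (("a", "b"), ("c", "d"))   # hypothetical alternative tree, chosen for this example

print(displays(T, ["bc|d", "ab|c"]))        # T displays R
print(displays(T_alt, ["ab|c", "ab|d"]))    # T_alt also displays R'
print(frozenset("abc") in clades(T_alt))    # but T_alt lacks the clade {a,b,c} of T
```

Because `T_alt` displays R' = {ab|c, ab|d} without containing the clade {a,b,c}, R' cannot identify T, which is the situation the slide illustrates.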

9 Properties of interest: identification
- R identifies T, yet R does not contain all triplets of tr(T): additional triplets are induced by those present in R
[Figure: R = {bc|d, ab|c} and T is the tree (((a,b),c),d); ab|d and ac|d are induced]
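One way to test induction on small examples is sketched below. It assumes, beyond what the slide states, the standard characterization that a compatible set R induces ab|c exactly when adding either alternative resolution (ac|b or bc|a) makes the set incompatible. Compatibility is tested with an Aho-style recursion; all function names and the `(x, y, z)` tuple encoding of xy|z are hypothetical.

```python
def compatible(taxa, triplets):
    """Aho-style test: True iff some tree on `taxa` displays every triplet (x, y, z) = xy|z."""
    taxa = set(taxa)
    if len(taxa) <= 2:
        return True
    # Connect x and y for every triplet xy|z, then read off the connected components
    comp = {t: {t} for t in taxa}
    for x, y, _ in triplets:
        if comp[x] is not comp[y]:
            merged = comp[x] | comp[y]
            for t in merged:
                comp[t] = merged
    parts = list({id(c): c for c in comp.values()}.values())
    if len(parts) == 1:
        return False  # a single component on 3 or more taxa: no root split is possible
    # Recurse on each component with the triplets entirely contained in it
    return all(compatible(p, [t for t in triplets if set(t) <= p]) for p in parts)

def induces(taxa, triplets, target):
    """True iff the compatible set `triplets` forces xy|z in every tree displaying it."""
    x, y, z = target
    return (compatible(taxa, triplets)
            and not compatible(taxa, triplets + [(x, z, y)])
            and not compatible(taxa, triplets + [(y, z, x)]))

R = [("b", "c", "d"), ("a", "b", "c")]        # R of the slide: bc|d, ab|c
R_prime = [("a", "b", "c"), ("a", "b", "d")]  # ab|c, ab|d

print(induces("abcd", R, ("a", "b", "d")))        # ab|d
print(induces("abcd", R, ("a", "c", "d")))        # ac|d
print(induces("abcd", R_prime, ("a", "c", "d")))  # ac|d with the weaker set
```

With R both ab|d and ac|d come out as induced, as on the slide, whereas the weaker set {ab|c, ab|d} does not induce ac|d.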

10 Relevant properties: induction (PI)
- We want to infer reliable supertrees: not making arbitrary inferences
- PI: we only accept supertrees T such that tr(T) is present in the data R or induced by hypotheses in R
[Figure: R = {ab|c, ab|d}, from source trees on {a,b,c} and {a,b,d}. Candidate supertrees on a, b, c, d are shown with their triplets; ab|c and ab|d come from R, while ac|d, bc|d and cd|b are marked with a question mark.]

11 Focusing on a coherent subset of hypotheses
- There is no chance that practical data exactly identifies a (super)tree:
  - Lack of overlap between the source trees: missing data
  - Errors due to gene-specific evolution, systematic errors in the source tree inference (long branch attraction, estimated model of evolution)
- However, there is a chance that part of the underlying "correct" tree appears uncorrupted in the data: find a subset R' of R identifying a tree (i.e., a subtree of the underlying tree)
[Figure: R = {ab|c, bc|d, ab|d, ac|d, ad|c, bd|c}, obtained from two source trees on taxa a, b, c, d. Labels: "Supertree method?", "R identifies T".]

12 Relevant properties: non-contradiction
- We search for a subset of R identifying a tree T
- But we want to be reliable: no clade contradicted by the data
- PC: we don't accept hypotheses that are in direct contradiction with discarded hypotheses
  - We reject subsets R' obtained by keeping xy|z and removing xz|y
  - We focus on R(T), the triplets of R resolved by T
[Figure: R = {ab|c, ab|d, bc|d, ac|d, bd|c, ad|c}; R' ⊆ R; R' identifies T, a tree on taxa a, b, c, d]

13 Link between the properties
- "R(T) identifies T" is equivalent to "T satisfies PC and T satisfies PI", where:
  - T satisfies PC (property of non-contradiction): for any triplet ab|c displayed by T, R(T) induces neither bc|a nor ac|b
  - T satisfies PI (property of induction): every triplet ab|c displayed by T is induced by R(T)
- Given a supertree T and a collection of source trees, PI and PC can be checked in polynomial time.
- A given supertree can be modified in polynomial time so that it satisfies PI and PC.
- Why not design a supertree method that proposes supertrees satisfying PI and PC from the start? This is the PhySIC method (Phylogenetic Signal with Induction and non-Contradiction).

14 Overview
- Relevant properties for a veto method (reliable facts)
  - Decomposition of a tree into triplets
  - Tree identification
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (sketch)
  - BUILD (Aho)
  - PhySIC PC
  - PhySIC PI
- Biological case study: Primate supertree
- Conclusion & prospects

15 Algorithmic ideas: BUILD (Aho et al 81)
[Figure: R = {bc|d, ab|c}. A graph on the taxa a, b, c, d has connected components {a,b,c} and {d}; the graph restricted to a, b, c has components {a,b} and {c}. The resulting tree is (((a,b),c),d).]
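The recursion pictured on this slide can be sketched as follows. This is a plain textbook-style rendering of BUILD, not the authors' code: each triplet xy|z adds an edge between x and y, the connected components become the children of the current node, and the procedure recurses on each component. The `(x, y, z)` tuple encoding and the function name `build` are assumptions.

```python
def build(taxa, triplets):
    """Return a nested-tuple tree displaying all triplets (x, y, z) = xy|z, or None if none exists."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    # One edge x-y per triplet xy|z; components are tracked as shared set objects
    comp = {t: {t} for t in taxa}
    for x, y, _ in triplets:
        if comp[x] is not comp[y]:
            merged = comp[x] | comp[y]
            for t in merged:
                comp[t] = merged
    parts = []
    for t in taxa:                      # deterministic order of the components
        if comp[t] not in parts:
            parts.append(comp[t])
    if len(parts) == 1:
        return None                     # connected graph: the triplets are incompatible
    children = []
    for part in parts:
        # Keep only the triplets whose three taxa all lie in this component
        sub = build(part, [t for t in triplets if set(t) <= part])
        if sub is None:
            return None
        children.append(sub)
    return tuple(children)

R = [("b", "c", "d"), ("a", "b", "c")]   # bc|d, ab|c as on the slide
print(build("abcd", R))
print(build("abc", [("a", "b", "c"), ("a", "c", "b")]))
```

On the slide's R the sketch returns the tree (((a,b),c),d); on a set containing two resolutions of the same three taxa it returns `None`.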

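16 Algorithmic ideas: limits of BUILD
- Returns a tree only when R is compatible.
[Figure, first example: R1 = {ab|c, ac|b, bc|d, ab|d, ac|d}, from two source trees on taxa a, b, c, d. The graph on a, b, c, d splits into {a,b,c} and {d}, but the graph on a, b, c is a single connected component.]
[Figure, second example: R2 = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d}, from two source trees on taxa a, b, c, d. The graph on a, b, c, d is a single connected component.]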

17 Algorithmic ideas: PhySIC PC
Idea: temporarily forget the direct contradictions
- At each iteration, if there is a single connected component:
  - Check whether using R' leads to several connected components
  - If so, check that the tree will satisfy PC w.r.t. R
  - Otherwise, propose a multifurcation on those taxa
- We thus obtain a more resolved tree satisfying PC: contradictions affecting basal clades do not always prevent deeper clades from being obtained
[Figure: R = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d}, from two source trees on taxa a, b, c, d. R' is R with the direct contradictions set aside. The graph built from R is a single connected component on a, b, c, d; the graph built from R' has several components.]
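The first step of this idea, setting aside direct contradictions and recomputing the connected components, can be sketched as below. It deliberately omits the subsequent PC check against R, which the slide does not detail. Treating a "direct contradiction" as two different resolutions of the same three taxa is an assumption, as are the function names and the `(x, y, z)` encoding of xy|z.

```python
from collections import defaultdict

def drop_direct_contradictions(triplets):
    """Keep only triplets whose three taxa receive a single resolution in the input."""
    by_taxa = defaultdict(set)
    for x, y, z in triplets:
        by_taxa[frozenset((x, y, z))].add((frozenset((x, y)), z))
    return [(x, y, z) for x, y, z in triplets
            if len(by_taxa[frozenset((x, y, z))]) == 1]

def components(taxa, triplets):
    """Connected components of the graph with one edge x-y per triplet xy|z."""
    comp = {t: {t} for t in taxa}
    for x, y, _ in triplets:
        if comp[x] is not comp[y]:
            merged = comp[x] | comp[y]
            for t in merged:
                comp[t] = merged
    parts = []
    for t in sorted(taxa):
        if comp[t] not in parts:
            parts.append(comp[t])
    return parts

# R of the slide: bc|d, bd|c, ac|d, ad|c, ab|c, ab|d
R = [("b", "c", "d"), ("b", "d", "c"), ("a", "c", "d"),
     ("a", "d", "c"), ("a", "b", "c"), ("a", "b", "d")]
R_prime = drop_direct_contradictions(R)

print(R_prime)
print(components("abcd", R))        # one component: BUILD would stop here
print(components("abcd", R_prime))  # several components once contradictions are set aside
```

With R the graph is a single component, so BUILD fails immediately; with R' the taxa a and b still group together while c and d stay apart, which is what allows the clade {a,b} to be considered.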

18 Algorithmic ideas: limits of BUILD (2)
- When the graph contains several connected components, it is necessary to check that the triplets we are about to create are really induced by R
- Branches that create triplets not induced by R are collapsed (use graph algorithms)
[Figure: R = {ab|c, ef|c}, from source trees on {a,b,c} and {e,f,c}. The graph on a, b, c, e, f has components {a,b}, {c} and {e,f}. Does the resulting tree on a, b, c, e, f justify ef|a?]
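The problem raised here can be reproduced on the slide's example with a short sketch. It assumes a textbook BUILD recursion and a clade-based "displays" test (both hypothetical helper functions, with triplets encoded as `(x, y, z)` for xy|z), and compares BUILD's output with another tree that also displays R but not ef|a.

```python
def build(taxa, triplets):
    """Textbook BUILD: nested-tuple tree displaying all triplets (x, y, z) = xy|z, or None."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    comp = {t: {t} for t in taxa}
    for x, y, _ in triplets:
        if comp[x] is not comp[y]:
            merged = comp[x] | comp[y]
            for t in merged:
                comp[t] = merged
    parts = []
    for t in taxa:
        if comp[t] not in parts:
            parts.append(comp[t])
    if len(parts) == 1:
        return None
    children = [build(p, [t for t in triplets if set(t) <= p]) for p in parts]
    return None if None in children else tuple(children)

def clades(tree):
    """Leaf sets of all nodes of a nested-tuple tree; the last entry is the full leaf set."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, here = [], set()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        here |= sub[-1]
    return found + [frozenset(here)]

def displays(tree, triplet):
    """True iff some clade of the tree contains x and y but not z."""
    x, y, z = triplet
    return any(x in c and y in c and z not in c for c in clades(tree))

R = [("a", "b", "c"), ("e", "f", "c")]       # ab|c, ef|c
T_build = build("abcef", R)
T_other = (("a", "b", "e", "f"), "c")        # hypothetical tree that also displays R

print(T_build)
print(displays(T_build, ("e", "f", "a")))    # BUILD's tree asserts ef|a
print(all(displays(T_other, t) for t in R), displays(T_other, ("e", "f", "a")))
```

Since `T_other` displays both triplets of R without displaying ef|a, that triplet is not induced by R, so the branch creating it in BUILD's tree is one that the slide says should be collapsed.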

19 Algorithmic ideas: a summary
- A supertree draft is proposed by PhySIC PC, ensuring PC
- If a clade is not "strong enough", the corresponding branch is collapsed by PhySIC PI, which also ensures PI
- PhySIC is a polynomial-time supertree method:
  1. Decomposition of the input forest into triplets: O(kn^3)
  2. Creation of a tree satisfying PC: O(n^4)
  3. Collapsing edges displaying triplets not induced by the source trees: O(n^4)
- Overall, the algorithm requires O(kn^3 + n^4) computing time

20 Overview
- Relevant properties for a veto method
  - Decomposition of a tree into triplets
  - Tree identification
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (intuitive presentation)
  - BUILD (Aho)
  - PhySIC PC
  - PhySIC PI
- Biological case study: Primate supertree
- Conclusion & prospects

21 Primate case study: source trees
- ADRA2B and IRBP study (Poux et al. 04, 06)
- SINEs (Roos et al. 04)
- Branches with bootstrap support <50% are collapsed
[Figure: source trees, with the Anthropoids labelled]

22 Primate case study: PC & PI in action
[Figure: the source trees (ADRA2B, IRBP), the PhySIC PC tree and the PhySIC tree]
- PhySIC PC: Platyrrhines are unresolved due to a conflict (PC)
- PhySIC: arbitrary resolution among Anthropoids is removed (PI)

23 Labels indicating source of problems
- PhySIC can tell the reason for the multifurcations it proposes:
  - Lack of overlap or information in the source trees (i)
  - Local contradictions between the source trees (c)
- This guides correction/completion of source trees and primary data

24 Pointing out "problems" in other supertrees
- E.g., MRP is known to have some undesirable features:
  - Inferring "novel clades" not supported by any input tree (Bininda-Emonds & Bryant 98, Goloboff & Pol 01, Goloboff 05)
  - Being affected by a size bias, i.e. when two trees conflict on the resolution of a clade, the tree with the smallest local sampling is ignored (Purvis 95, Bininda-Emonds & Bryant 98, Goloboff 05)
  - Favoring source trees that are more unbalanced (Wilkinson et al 01)
- A supertree already built from a collection of source trees by a usual supertree method can be reanalyzed in the light of PI & PC to identify problems on some dubious nodes.

25 Primate case study: MRP tree analyzed
[Figure: the source trees (ADRA2B, IRBP), the MRP supertree, and the PC-filtered MRP supertree; numeric labels (1, 1, 2) appear on the figure]

26 Online server
http://atgc.lirmm.fr/SuperTree/PhySIC
Contact: Vincent.Lefort@lirmm.fr

27 Conclusion & prospects
Appearing in the November issue of Syst. Biol.
- PI and PC properties
- PhySIC method (http://atgc.lirmm.fr/SuperTree/PhySIC)
  - Supertrees satisfying PI and PC (exact) and as resolved as possible (heuristics)
  - Proposes very reliable supertrees: identified by the data (low type-I error)
  - Polynomial-time method
  - Localization of conflicts and areas with insufficient overlap
  - Makes it possible to check/correct supertrees built by other methods (MRP, ...)
- Further developments:
  - Producing more resolved trees satisfying PC and PI
  - Filtering triplets based on their frequencies
  - Coupling with a database (TreeBase, ...)

28 Thanks
Emmanuel Douzery, Vincent Ranwez, Alexis Criscuolo, Sylvain Guillemot, Pierre-Henri Fabre, Celine Scornavacca, Vincent Lefort
Equipe Méth. et Algor. pour la bioinf., LIRMM
Equipe Phylogénie Moléculaire, ISEM

