A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin.



Who’s Involved
– UT CS: Tandy Warnow, Luay Nakhleh
– UT BIO: Randy Linder
– UNM CS: Bernard Moret

Why Networks?
Lateral gene transfer (LGT)
– Ochman estimated that 755 of the 4,288 ORFs in E. coli came from at least 234 LGT events
Hybridization
– As many as 30% of all plant lineages are estimated to be products of hybridization
– Fish
– Some frogs

Phylogenetic Networks
Rooted, directed, acyclic graphs that actually model the evolutionary process
“Tree” nodes and “network” nodes
Time constraints
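To make the graph definition concrete, here is a minimal sketch that classifies the nodes of a rooted directed graph and checks that it is acyclic. It assumes the common convention that a "tree" node has exactly one parent and a "network" node has two or more; the slide does not spell this out. The function names and the toy edge list are hypothetical, and the time constraints mentioned on the slide are not checked.

```python
from collections import defaultdict

def _index(edges):
    """Build node set, in-degrees, and child lists from (parent, child) pairs."""
    nodes, indeg, children = set(), defaultdict(int), defaultdict(list)
    for parent, child in edges:
        nodes.update((parent, child))
        indeg[child] += 1
        children[parent].append(child)
    return nodes, indeg, children

def classify_nodes(edges):
    """Return (roots, tree_nodes, network_nodes).

    Assumption: tree nodes have in-degree 1, network nodes in-degree >= 2.
    """
    nodes, indeg, _ = _index(edges)
    roots = sorted(n for n in nodes if indeg[n] == 0)
    tree_nodes = {n for n in nodes if indeg[n] == 1}
    network_nodes = {n for n in nodes if indeg[n] >= 2}
    return roots, tree_nodes, network_nodes

def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be peeled off."""
    nodes, indeg, children = _index(edges)
    queue = [n for n in nodes if indeg[n] == 0]
    peeled = 0
    while queue:
        node = queue.pop()
        peeled += 1
        for child in children[node]:
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return peeled == len(nodes)

# Toy network: leaf B descends from the hybrid node h, which has parents x and y.
toy_edges = [("root", "x"), ("root", "y"), ("x", "A"), ("x", "h"),
             ("y", "h"), ("y", "C"), ("h", "B")]
roots, tree_nodes, network_nodes = classify_nodes(toy_edges)
```

In the toy example the only network node is `h`; every other non-root node is a tree node.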

Separate Analysis
Analyze individual genes separately
Reconcile the resulting phylogenies
Contrast with combined analysis, in which the datasets are combined (via concatenation) and the combined dataset is then analyzed
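The difference between the two approaches is already visible in the input handed to the tree-building step. The sketch below is only an illustration: the `concatenate` helper and the toy alignments are hypothetical, and no tree inference is performed.

```python
def concatenate(alignments):
    """Combined analysis input: join the per-gene sequences of each taxon end to end."""
    taxa = set(alignments[0])
    for alignment in alignments:
        if set(alignment) != taxa:
            raise ValueError("every gene must be sampled for the same taxa")
    return {taxon: "".join(a[taxon] for a in alignments) for taxon in sorted(taxa)}

# Toy alignments for two genes on the same three taxa (made-up sequences).
gene_1 = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
gene_2 = {"A": "GGC", "B": "GTC", "C": "GGC"}

# Combined analysis: one dataset, one analysis.
combined = concatenate([gene_1, gene_2])

# Separate analysis: each gene is analyzed on its own, and the
# resulting phylogenies are reconciled afterwards.
separate = [gene_1, gene_2]
```

With concatenation, any conflict between the two gene histories is averaged into a single dataset before analysis; keeping the genes separate preserves that conflict for the reconciliation step.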

Wayne Maddison’s Observation
“What is needed is a method that counts the minimal number of branch moves needed to convert one tree into another, where branch moves are restricted so as not to violate a linear order.” Syst. Biol., 46(3), 1997.

Species Networks
[Figure: diagram on taxa A, B, C, D, E]

Gene Tree I in Species Networks
[Figure: two diagrams on taxa A, B, C, D, E]

Gene Tree II in Species Networks
[Figure: three diagrams on taxa A, B, C, D, E]

The SPR Operation
SPR: Subtree Prune and Regraft
Prune a subtree of tree T1 and regraft it onto another edge (by the same root), thus obtaining another tree T2
The SPR distance between two trees T1 and T2 is the minimum number of SPR moves required to transform T1 into T2
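The operation can be sketched by brute force on small rooted binary trees. The code below is an illustration only, not the algorithm from the talk: trees are written as nested tuples of leaf names, all function names are hypothetical, and it assumes that regrafting above the current root is allowed. It enumerates every tree one SPR move away and uses that to test whether two trees are at SPR distance at most 1.

```python
def canon(tree):
    """Canonical form: order the children of every node so equal trees compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(child) for child in tree), key=repr))

def nodes(tree):
    """Yield every subtree of `tree`, including the tree itself."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from nodes(child)

def prune(tree, sub):
    """Remove subtree `sub` and suppress the node left with a single child."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == sub:
        return right
    if right == sub:
        return left
    return (prune(left, sub), prune(right, sub))

def regraft(tree, sub):
    """Yield every tree obtained by attaching `sub` to one edge of `tree`."""
    yield (tree, sub)  # attach on the edge above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in regraft(left, sub):
            yield (new_left, right)
        for new_right in regraft(right, sub):
            yield (left, new_right)

def spr_neighbors(tree):
    """All trees exactly one SPR move away from `tree`."""
    tree = canon(tree)
    found = set()
    for sub in nodes(tree):
        if sub == tree:
            continue  # the whole tree cannot be pruned
        rest = prune(tree, sub)
        for candidate in regraft(rest, sub):
            found.add(canon(candidate))
    found.discard(tree)
    return found

def spr_distance_at_most_one(t1, t2):
    """True if T1 and T2 are identical or one SPR move apart."""
    t1, t2 = canon(t1), canon(t2)
    return t1 == t2 or t2 in spr_neighbors(t1)

T1 = ((("A", "B"), "C"), ("D", "E"))
T2 = (("A", ("B", "C")), ("D", "E"))   # B moved next to C: one SPR move
T3 = ((("A", "D"), "C"), ("B", "E"))   # B and D swapped: more than one move
```

The enumeration relies on leaf labels being unique, so that a subtree is identified by its value. It grows quickly with the number of leaves and is meant for checking small examples by hand.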

SPR Distances Among Gene Trees
[Figure: three diagrams on taxa A, B, C, D, E; annotation: SPR distance 1]

Maddison’s Method
Given two gene datasets
Construct two gene trees T1 and T2
If SPR(T1,T2)=0
– Return a tree
If SPR(T1,T2)=1
– Return a network with one reticulation event
Open problem: extend to reconstructing a network with m reticulation events

Challenges (1)
Computational
– The computational complexity of computing SPR distances is unknown (probably hard)

Solving the Computational Challenge
Galled networks: reticulation events are independent
For two gene trees T1 and T2 on n leaves we can
– Decide whether SPR(T1,T2)=m in O(mn) time, and
– Construct network N from T1 and T2 in O(mn) time

Challenges (2)
Systematic
– Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.)

Solving the Systematic Challenge: Our Method SpNet
Given the sequences of two genes I & II on a set of species
Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1
Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2
Find binary trees T1 and T2 that refine t1 and t2, respectively, such that SPR(T1,T2)=1
Build network N from T1 and T2
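The refinement step can be pictured with a brute-force sketch that lists every binary tree refining an unresolved consensus tree. This is an exponential enumeration for illustration only and is not the SpNet algorithm; trees are nested tuples of leaf names, a node with more than two children is an unresolved node, and the function names are hypothetical.

```python
from itertools import product

def binary_trees(items):
    """All rooted binary trees whose leaves are `items` (each item treated as an atom)."""
    items = list(items)
    if len(items) == 1:
        yield items[0]
        return
    first, rest = items[0], items[1:]
    n = len(rest)
    # `first` always goes to the left side, so each unordered split is produced once.
    # The mask with all bits set is skipped because the right side must be non-empty.
    for mask in range(2 ** n - 1):
        left = [first] + [rest[i] for i in range(n) if (mask >> i) & 1]
        right = [rest[i] for i in range(n) if not (mask >> i) & 1]
        for left_tree in binary_trees(left):
            for right_tree in binary_trees(right):
                yield (left_tree, right_tree)

def refinements(tree):
    """All binary trees that refine a (possibly unresolved) rooted tree."""
    if isinstance(tree, str):
        yield tree
        return
    child_options = [list(refinements(child)) for child in tree]
    for resolved_children in product(*child_options):
        yield from binary_trees(resolved_children)

# Consensus tree with one unresolved node that has four children.
t1 = ("A", "B", "C", ("D", "E"))
candidates = list(refinements(t1))
```

A search in the spirit of the slide would then look for a pair (T1, T2), one from each candidate list, with SPR(T1,T2)=1. Because the candidate lists grow very quickly, exhaustive search of this kind is only practical for tiny examples.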

SpNet: Running Time
We have a linear-time algorithm for the single-hybrid case (implementation and experimental results are available as well)
We are working on the general case of an arbitrary number of reticulation events

Experimental Study
Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids
Evolved sequences under the GTR+Gamma model of evolution with invariant sites
We studied topological accuracy based on the splits defined by the model and inferred networks

Evaluation Criteria
Detection Quality
– How often did the method infer the correct number of hybrids in the model phylogeny?
Reconstruction Quality
– What is the topological accuracy of the inferred phylogeny?
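One common way to score topological accuracy from splits is through false-negative and false-positive rates. The sketch below assumes that definition, and assumes that the splits of a network are the union of the splits of the trees it contains; neither detail is stated on the slides. Trees are nested tuples of leaf names and all function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def splits(tree):
    """Nontrivial bipartitions of the taxa induced by the internal edges of a tree."""
    taxa = frozenset(leaves(tree))
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        side = frozenset().union(*(visit(child) for child in node))
        other = taxa - side
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))  # unordered pair of sides
        return side

    visit(tree)
    return found

def network_splits(trees):
    """Assumption: a network's splits are the union of its trees' splits."""
    return set().union(*(splits(tree) for tree in trees))

def error_rates(model_splits, inferred_splits):
    """Return (false-negative rate, false-positive rate)."""
    missing = model_splits - inferred_splits
    extra = inferred_splits - model_splits
    fn = len(missing) / len(model_splits) if model_splits else 0.0
    fp = len(extra) / len(inferred_splits) if inferred_splits else 0.0
    return fn, fp

model = ((("A", "B"), "C"), ("D", "E"))
inferred = (("A", ("B", "C")), ("D", "E"))
fn_rate, fp_rate = error_rates(splits(model), splits(inferred))
```

Here the two trees share the split ABC|DE and differ on one split each, so both rates are 0.5.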

Methods
SpNet(i): our method, where we contract i edges
NNet: the NeighborNet method of Bryant and Moulton
NJ: neighbor joining

Detection Quality of SpNet Model Phylogeny: 20-taxon Tree

Detection Quality of SpNet Model Phylogeny: 20-taxon 1-hybrid network

Detection Quality of SpNet Model Phylogeny: 20-taxon 2-hybrid network

Reconstruction Quality Model Phylogeny: 20-taxon tree

Reconstruction Quality Model Phylogeny: 20-taxon 1-hybrid network

Conclusions
Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction
Separate analysis approaches outperform combined analysis approaches

Ongoing research
Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping)
Detection vs. reconstruction: visualization and clustering techniques may also be useful (collaboration with St. John)
Refining unresolved networks
DCM-like network reconstruction