Imputing Supertrees and Supernetworks from Quartets


Imputing Supertrees and Supernetworks from Quartets
By B. Holland, G. Conner, K. Huber, and V. Moulton
Presented by Razieh Nokhbeh Zaeem

This talk
Basic problem: constructing an estimate of a species phylogeny (in this case, a network) from a given set of gene trees.
Input: a set of partial gene trees (trees that do not contain all taxa).
Output: a supernetwork, which can display conflicting signals.
The algorithm by Holland et al. combines quartet imputation with consensus network construction.
Experiments compare the new method with the previous method, Z-closure, and with MRP with respect to false positives (FP) and false negatives (FN).
Q-imputation provides a useful complementary tool.

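Q-imputation
Some definitions: L(T) is the leaf set of a tree T, T|Z is the restriction of T to a subset Z of its leaves, and Q(T) is the set of quartets displayed by T.
Let the input be a collection of trees corresponding to a collection of gene trees, and let X be the set of all taxa that appear in these trees.
For each input tree, we sequentially insert all of the taxa in X that are missing from it, which gives a completed tree on X.
Once all input trees have been completed, we apply the consensus network method to obtain a network.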
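The three definitions above can be made concrete with a small sketch. It assumes a tree is stored as the set of its nontrivial splits, which is a representation chosen here for illustration and not necessarily the one used by the authors; the names make_split, leaf_set, restrict, and quartets are hypothetical.

```python
from itertools import combinations


def make_split(side_a, side_b):
    """A split A|B stored as an unordered pair of taxon sets."""
    return frozenset({frozenset(side_a), frozenset(side_b)})


def leaf_set(splits):
    """L(T): every taxon mentioned by the splits of T."""
    return frozenset(x for s in splits for side in s for x in side)


def restrict(splits, taxa):
    """T|Z: restrict each split to Z and drop splits that become trivial."""
    taxa = frozenset(taxa)
    restricted = set()
    for s in splits:
        a, b = (side & taxa for side in s)
        if len(a) >= 2 and len(b) >= 2:
            restricted.add(make_split(a, b))
    return restricted


def quartets(splits):
    """Q(T): all quartets ab|cd with a, b on one side of a split and c, d on the other."""
    found = set()
    for s in splits:
        a, b = s
        for pair_a in combinations(sorted(a), 2):
            for pair_b in combinations(sorted(b), 2):
                found.add(make_split(pair_a, pair_b))
    return found


# Hypothetical example tree on five taxa with splits AB|CDE and ABC|DE.
example_tree = {make_split("AB", "CDE"), make_split("ABC", "DE")}
```

With this representation, restricting example_tree to the taxa A, B, C, D leaves only the split AB|CD, because ABC|DE becomes trivial.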

Polynomial-time algorithm:
For each input tree:
  For each new taxon y:
    Find a place to add a pendant edge labeled by y.
    Choose the place p that maximizes the number of quartets on which the completed tree agrees with all the other input trees.
    If more than one place achieves the best score, choose among them at random.
    If the maximum score is 0, there is not enough information to place y.
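The selection step above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that the quartets each candidate placement would create have already been computed, and the names quartet, agreement_score, and choose_placement are hypothetical.

```python
import random


def quartet(text):
    """Parse a quartet such as 'FB|AD' into an order-independent form."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def agreement_score(candidate_quartets, other_quartet_sets):
    """Count quartets shared between one candidate and every other input tree."""
    return sum(len(candidate_quartets & other) for other in other_quartet_sets)


def choose_placement(candidates, other_quartet_sets, rng=random):
    """Pick the placement of y with the highest agreement score.

    candidates maps a placement label (assumed to be a string naming an edge)
    to the set of quartets the tree would display if y were attached there.
    Ties are broken at random; None means no placement scored above 0.
    """
    scores = {
        place: agreement_score(qs, other_quartet_sets)
        for place, qs in candidates.items()
    }
    best = max(scores.values())
    if best == 0:
        return None  # not enough information to place y
    tied = sorted(place for place, score in scores.items() if score == best)
    return rng.choice(tied)


# Hypothetical data: two candidate edges and the quartets of two other trees.
candidates = {
    "edge_1": {quartet("FB|AD"), quartet("FB|AE")},
    "edge_2": {quartet("FA|BD"), quartet("FA|BE")},
}
others = [{quartet("BF|DA")}, {quartet("FB|AE"), quartet("FB|DE")}]
```

Here edge_1 shares one quartet with each of the two other trees and edge_2 shares none, so choose_placement(candidates, others) selects edge_1.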

An example: inserting F
FB|AD FB|AE FB|DE FA|DE FA|CE FB|AC FB|AE FB|CE FD|BC

The consensus network
The consensus network (the split network) displays those splits of X that are displayed by more than a certain proportion, t (given as a percentage), of the trees computed by Q-imputation.
In case t = 0 we drop the subscript t: these are the splits that appear at least once.
For example:
If t = 100, then the consensus network is the strict consensus tree.
If t = 50, then the consensus network is the majority-rule consensus tree.
If t < 50, then the consensus network may display conflicting splits.
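A minimal sketch of the thresholding described above, assuming each completed tree is given as a set of hashable splits written in one canonical form. The function name consensus_splits is hypothetical. Treating t = 100 as "in every tree" is an assumption made here to reconcile "more than t" with the strict consensus case.

```python
from collections import Counter


def consensus_splits(trees, t):
    """Return the splits displayed by more than t percent of the trees.

    trees is a list of sets of splits, t is a percentage from 0 to 100.
    Assumption: t = 100 keeps the splits found in all trees (strict consensus).
    """
    counts = Counter(split for tree in trees for split in set(tree))
    n = len(trees)
    kept = set()
    for split, count in counts.items():
        if 100 * count > t * n or (t >= 100 and count == n):
            kept.add(split)
    return kept


# Hypothetical completed trees on taxa A to E, splits written as strings.
completed = [
    {"AB|CDE", "ABC|DE"},
    {"AB|CDE", "ABD|CE"},
    {"AB|CDE", "ABC|DE"},
]
```

With these three trees, t = 0 keeps all three splits, including the conflicting pair ABC|DE and ABD|CE, t = 50 keeps AB|CDE and ABC|DE, and t = 100 keeps only AB|CDE.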

Simulation
Three different types of input (three types of simulation):
1. Evolution is tree-like. Gene trees are correct, but miss taxa.
2. Evolution is tree-like. Gene trees have errors and miss taxa.
3. Evolution is not tree-like. Random input trees.
In each simulation, three parameters were varied:
The species tree: either the completely balanced tree on 16 taxa or the completely unbalanced tree on 16 taxa.
g (the number of gene trees), taking values 2, 4, 8, 16, and 32.
m (the number of taxa missing), taking values 1, 2, 3, 4, 5, and 6, deleted randomly.
One hundred repetitions were carried out for each parameter combination.
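The size of this design can be checked by enumerating the parameter grid. The sketch below uses hypothetical variable names and only counts combinations; it does not simulate any trees.

```python
from itertools import product

species_trees = ["balanced", "unbalanced"]  # the two 16-taxon species trees
g_values = [2, 4, 8, 16, 32]                # values taken by g
m_values = [1, 2, 3, 4, 5, 6]               # number of taxa deleted at random
repetitions = 100

# Every (species tree, g, m) combination used in one type of simulation.
grid = list(product(species_trees, g_values, m_values))
total_runs = len(grid) * repetitions
```

The grid has 2 x 5 x 6 = 60 combinations, so each type of simulation has 6000 runs, which matches the figure of 6000 quoted with the FP results.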

Simulation: measuring FP and FN
The split systems generated were:
MRP: the splits in the majority-rule consensus and in the strict consensus from MRP.
Q-imputation: the splits in the consensus networks built from the trees completed by Q-imputation.
Z-closure: the splits generated using Z-closure.
FP (false positives): splits contained in the output split system that are not in the input.
FN (false negatives): splits in the input that are not in the output split system.
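Under the simplifying assumption that input and output splits are written on the same taxon set and in one canonical form, the two error counts reduce to set differences. The function names are hypothetical, and with partial input trees a real comparison would first have to restrict the output splits to each tree's taxa.

```python
def false_positives(output_splits, input_splits):
    """Splits in the output split system that are not in the input."""
    return set(output_splits) - set(input_splits)


def false_negatives(output_splits, input_splits):
    """Splits in the input that are missing from the output split system."""
    return set(input_splits) - set(output_splits)


# Hypothetical split systems written as canonical strings.
input_splits = {"AB|CDE", "ABC|DE"}
output_splits = {"AB|CDE", "ABD|CE"}
```

In this example ABD|CE is a false positive and ABC|DE is a false negative.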

WIP
Definition: the weak induction property (WIP). For a set of input trees, any split S in the output should restrict to a split in at least one of the input trees.
For Q-imputation, the WIP holds for all output splits when the input trees are all subtrees of a single phylogenetic tree.
There are examples where the WIP does not hold, although Q-imputation generated very few of them.
Z-closure satisfies the WIP.
Any method with the WIP cannot generate FP: every split in the output comes from some tree in the input set, so no split appears in the output but not in the input.
Q-imputation with t = 0 cannot produce FN.
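The WIP definition above can be checked mechanically. The sketch below assumes each input tree is given as a pair (leaf set, set of nontrivial splits) and that only nontrivial restrictions count as matches; both are assumptions made for illustration, and the names make_split, restrict_split, and satisfies_wip are hypothetical.

```python
def make_split(side_a, side_b):
    """A split A|B stored as an unordered pair of taxon sets."""
    return frozenset({frozenset(side_a), frozenset(side_b)})


def restrict_split(split, taxa):
    """Restrict a split to the given taxa; None if one side becomes empty."""
    a, b = (side & taxa for side in split)
    if not a or not b:
        return None
    return make_split(a, b)


def satisfies_wip(output_splits, input_trees):
    """True if every output split restricts to a split of some input tree."""
    for split in output_splits:
        if not any(
            restrict_split(split, frozenset(leaves)) in splits
            for leaves, splits in input_trees
        ):
            return False
    return True


# Hypothetical input: two partial trees on four taxa each.
input_trees = [
    ("ABCD", {make_split("AB", "CD")}),
    ("ABCE", {make_split("AB", "CE")}),
]
```

For these inputs the output split AB|CDE satisfies the WIP because it restricts to AB|CD in the first tree, whereas AC|BDE restricts to AC|BD and AC|BE, neither of which is an input split.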

Simulation results: FP
Z-closure cannot generate FP, so we look only at the splits from Q-imputation and MRP.
6000 runs for each type of simulation (60 parameter combinations, 100 repetitions each).
Normalized numbers (count divided by 6000) in parentheses.
Each tree on 16 taxa has 13 internal edges.
Type Method Simulation 1 Simulation 2 36 (0.006) 35 (0.006) 87 (0.015) 46 (0.008) Simulation 3 56 (0.009) 52 (0.009) 5252 (0.875) 4368 (0.728)

Simulation 1 results: FN, normalized, %
Z-closure Q-imputation20 MRP50
g m 1 2 3 4 5 6
(1b) 0.01 0.17 0.30 0.51 0.63 0.92 0.00 0.06 0.05 0.13 0.32 0.41 8 0.02 16 32
(1u) 0.23 0.44 0.78 0.07 0.15 0.26 0.03

Simulation 2 results: FN, normalized, %
Z-closure Q-imputation20 MRP50
g m 1 2 3 4 5 6
(2b) 0.04 0.16 0.27 0.48 0.65 0.77 0.00 0.61 0.54 0.34 0.30 0.17 0.07 0.05 0.14 0.10 1.67 1.45 1.40 1.16 1.04 0.81 8 0.01 2.89 2.59 2.49 2.06 1.81 1.42 3.30 3.01 2.81 2.47 2.23 1.91 16 6.49 6.00 5.32 4.96 4.42 3.62 6.56 6.02 5.38 5.03 4.45 3.77 32 13.13 12.16 11.15 9.83 8.66 7.61 12.19 9.84 8.67 7.62
(2u) 0.37 0.59 0.58 0.70 0.92 0.84 0.41 0.22 0.08 0.28 0.40 0.44 0.46 2.37 2.09 1.38 1.11 0.89 0.23 0.18 0.13 0.09 0.15 3.78 3.33 2.86 2.33 1.90 1.52 4.46 3.98 3.29 2.42 1.97 8.97 7.53 6.69 5.62 4.71 3.86 9.04 7.64 6.74 5.69 4.82 3.97 18.09 15.50 13.94 11.56 9.59 7.98 18.10 15.52 13.95 9.62 8.05

Simulation 3 results: FN, normalized, %
Z-closure Q-imputation20 MRP50
g m 1 2 3 4 5 6
(3) 0.48 0.88 0.80 0.96 0.82 0.67 0.00 2.15 1.87 1.34 0.99 0.56 0.18 0.07 0.23 0.31 0.41 0.66 5.57 4.92 4.27 3.61 2.96 2.30 8 0.01 0.08 11.38 10.09 9.02 7.64 6.53 5.21 11.95 10.76 9.72 8.34 7.26 5.95 16 25.36 22.89 20.42 18.36 16.09 13.90 24.61 22.45 20.22 17.98 15.74 13.41 32 51.85 46.86 42.29 37.80 33.50 29.21 50.05 45.77 41.52 37.16 32.74 28.31

Discussion on simulation results
By increasing the number of gene trees:
  FN produced by Z-closure decreases (good)
  FN produced by Q-imputation increases (bad)
As a supertree method (simulations 1 and 2), Q-imputation tended to return fewer FP (unsupported) splits, but also fewer supported splits (more FN (?)), than MRP.
As a supernetwork method, Q-imputation tended to give rise to FP but not FN (?), whereas Z-closure gave rise to FN but no FP.
Also, in simulations where there was an underlying species tree, as the number of gene trees increased:
  For Z-closure, the number of FN increased (?)
  For the split system derived from applying a threshold to the trees completed by Q-imputation, the number of FN had the desirable property of decreasing (?)
For the output to be visually palatable, we need some FN to restrict the number of splits that are displayed.
Q-imputation: a natural means to filter out splits. Look at the case study.

Case study
7 genes, 45 taxa
Panels: Z-closure, Q-imputation