The (Supertree) of Life: Procedures, Problems, and Prospects Presented by Usman Roshan.



Supertree Methods Input: Set of trees T1, ..., Tk. Output: Tree T leaf-labeled by the union of L(T1), ..., L(Tk), where L(Ti) is the set of leaves of Ti. Why supertree methods?

Motivation (1) Supertree methods are used as part of divide-and-conquer methods to solve NP-hard problems on large datasets

Motivation (2) Supertree methods are used when we have missing data

Types of supertree methods (1) Direct methods (e.g. strict consensus supertrees, MinCutSupertrees)

Types of supertree methods (2) Indirect methods (e.g. MRP, average consensus)

Types of supertree methods (3) (MRP)
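
The figure for this slide did not survive, so as an illustration only, here is a minimal sketch of the usual matrix-representation coding that MRP starts from: each cluster of each source tree becomes one binary character, scored 1 for taxa inside the cluster, 0 for taxa in that tree but outside the cluster, and ? for taxa absent from that tree. The nested-tuple tree representation and the names leaf_set, clusters and mrp_matrix are assumptions made for this example, not part of the presentation; the parsimony search that MRP then runs on the matrix is not shown.

```python
def leaf_set(tree):
    """Leaf labels under a nested-tuple tree (a non-tuple value is a leaf)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Leaf sets of all internal nodes of the tree, root included."""
    if not isinstance(tree, tuple):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def mrp_matrix(trees):
    """Map each taxon to its string of matrix-representation characters."""
    taxa = sorted(set().union(*(leaf_set(t) for t in trees)))
    rows = {x: [] for x in taxa}
    for t in trees:
        present = leaf_set(t)
        # The root cluster holds every taxon of the tree, so it is skipped
        # as uninformative (a choice made in this sketch).
        for c in sorted(clusters(t) - {present}, key=sorted):
            for x in taxa:
                if x not in present:
                    rows[x].append("?")
                else:
                    rows[x].append("1" if x in c else "0")
    return {x: "".join(chars) for x, chars in rows.items()}


# Two hypothetical rooted source trees on overlapping taxon sets.
sources = [((1, 2), 3), ((2, 3), 4)]
matrix = mrp_matrix(sources)
for taxon, row in matrix.items():
    print(taxon, row)
```

Larger source trees contribute more columns to this matrix, which is the source of the size bias raised in the criticism slide.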

Definitions Contraction: Restriction: If then contains

Optimization problems Subtree Compatibility: Given a set of trees T1, ..., Tk, does there exist a tree T such that T contains every Ti? NP-hard (Steel 1992). Special cases are poly-time (rooted trees, DCM). MRP: also NP-hard.

Limitations of supertree methods Three desirable properties: P1: The method can be applied to any unordered set of input trees. P2: Renaming the species does not change the constructed supertree. P3: If the input trees are compatible, then the output tree is one of the “parent trees”. No supertree method can satisfy P1-P3 when the input trees are unrooted; however, for rooted trees an extension of BUILD satisfies P1-P3.

Rooted subtrees (BUILD) (Aho et al 1981) Input: Set of rooted trees T1, ..., Tk. Output: Tree T that contains every Ti, if such a tree exists.

BUILD (2) - Definitions Cluster: Set of taxa in a rooted subtree. Clusters are a different representation of rooted phylogenetic trees. Let C(T) be the clusters of tree T. In this example C(T) = {{1,2}, {3,4}, {1,2,3,4}, {1,2,3,4,5}}. We write (IJ)K in T if I, J are in some cluster of T which doesn’t contain K; e.g. (12)3 and (34)5 are in T.
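
As a small illustration of these definitions, the sketch below computes C(T) and tests whether (IJ)K is in T. The nested-tuple encoding, the function names, and the tree (((1,2),(3,4)),5) are assumptions: the slide's figure is missing, and this tree was chosen only because its clusters match the C(T) listed above.

```python
def leaf_set(tree):
    """Leaf labels under a nested-tuple tree (a non-tuple value is a leaf)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """C(T): the set of taxa below each internal node of T."""
    if not isinstance(tree, tuple):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def has_triplet(tree, i, j, k):
    """True if (ij)k is in T: some cluster holds i and j but not k."""
    if k not in leaf_set(tree):
        return False
    return any(i in c and j in c and k not in c for c in clusters(tree))


# Assumed example tree; its clusters match the C(T) given in the slide.
T = (((1, 2), (3, 4)), 5)
print(sorted(sorted(c) for c in clusters(T)))
print(has_triplet(T, 1, 2, 3), has_triplet(T, 3, 4, 5), has_triplet(T, 1, 3, 2))
```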

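BUILD (3) - Algorithm 1. Initialize C as the set of input taxa. 2. If |C| = 1 return C, else compute the graph G whose vertices are the taxa in C, with an edge between I and J whenever some input tree contains (IJ)K for I, J, K in C. 3. Let C’ be the sets of taxa in the connected components of G. If |C’| = 1 then the input set is incompatible, else set C = C ∪ C’, and repeat step (2) on each new cluster in C’.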

BUILD (4) - Algorithm

BUILD (5) - Algorithm

BUILD (6) - Algorithm

BUILD (7) - Algorithm
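
The BUILD steps can be sketched in Python as follows. This is a minimal recursive version under stated assumptions, not the presenter's code: the input trees are assumed to be already summarized as rooted triplets (i, j, k) meaning (ij)k, the output is a nested tuple, taxa are assumed to be sortable labels, and the names build and connected_components are invented for the example.

```python
def connected_components(adjacency):
    """Connected components of an undirected graph given as {vertex: neighbours}."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


def build(taxa, triplets):
    """Return a tree containing every triplet, or None if they are incompatible."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    # Only triplets whose three taxa all lie in the current cluster matter.
    relevant = [(i, j, k) for i, j, k in triplets if {i, j, k} <= taxa]
    adjacency = {t: set() for t in taxa}
    for i, j, _ in relevant:
        adjacency[i].add(j)
        adjacency[j].add(i)
    components = connected_components(adjacency)
    if len(components) == 1:
        return None  # a single component means the input is incompatible
    subtrees = []
    for component in components:
        subtree = build(component, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


# Hypothetical triplets: (12)3, (34)1, (13)5 and (24)5.
print(build({1, 2, 3, 4, 5}, [(1, 2, 3), (3, 4, 1), (1, 3, 5), (2, 4, 5)]))
# (12)3 and (13)2 conflict, so no tree contains both.
print(build({1, 2, 3}, [(1, 2, 3), (1, 3, 2)]))
```

Each recursive call corresponds to one pass through steps 2 and 3; a node with more than two children in the result reflects a cluster whose graph split into more than two components.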

Compatible source trees For compatible source trees, MRP or BUILD can be used; however, the strict consensus of MRP trees (or the strict consensus supertree) may not be compatible with the input. BUILD has been extended to output all parent trees; it has also been shown that source trees have a unique parent tree iff BUILD constructs a binary tree.

Incompatible source trees (1) For incompatible source trees there are two strategies: (1) Resolve incompatibilities by using quartet methods or removing troublesome taxa. (2) Use an appropriate algorithm such as MRP or MinCutSupertrees; the latter is an extension of BUILD that always outputs a tree.

Incompatible source trees (2) Desirable property P1: If at least one source tree contains (IJ)K and no source tree contains (IK)J or (JK)I, then the output tree must contain (IJ)K. No method can satisfy P1; however, the weaker condition can be satisfied: if all source trees contain (IJ)K then the output must contain (IJ)K.
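
The weaker, satisfiable condition is easy to check for a candidate supertree. The sketch below is an illustration under assumptions: trees are nested tuples, taxa are sortable labels, and the names triplets, unanimous_triplets and respects_unanimous are invented for this example.

```python
def leaf_set(tree):
    """Leaf labels under a nested-tuple tree (a non-tuple value is a leaf)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Leaf sets of all internal nodes of the tree."""
    if not isinstance(tree, tuple):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def triplets(tree):
    """All (i, j, k) with i < j such that (ij)k is in the tree."""
    everyone = leaf_set(tree)
    found = set()
    for c in clusters(tree):
        for k in everyone - c:
            found |= {(i, j, k) for i in c for j in c if i < j}
    return found


def unanimous_triplets(source_trees):
    """Triplets contained in every source tree."""
    return set.intersection(*(triplets(t) for t in source_trees))


def respects_unanimous(supertree, source_trees):
    """True if the supertree contains every triplet shared by all source trees."""
    return unanimous_triplets(source_trees) <= triplets(supertree)


# Hypothetical source trees; both contain (12)3.
sources = [((1, 2), 3), (((1, 2), 3), 4)]
print(unanimous_triplets(sources))
print(respects_unanimous((((1, 2), 3), 4), sources))
print(respects_unanimous((((1, 3), 2), 4), sources))
```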

Supertree criticism Supertree methods do not take biomolecular sequences into account. Dataset non-independence. MRP: Favors larger source trees because they contribute more characters; may also favor unbalanced source trees. Direct methods: Cannot incorporate support values in the source trees (except for MinCutSupertrees), and cannot compute support values in the supertree (unlike MRP).

Applications of supertrees Systematics: MRP is the standard method used by biologists. Evolutionary models. Rates of cladogenesis. Evolutionary patterns. Biodiversity and conservation.

Bright future for supertree construction Despite the increase in phylogenetic data, species are poorly characterized at the molecular level, giving rise to problems from taxon sampling (non-random sampling), long branch attraction, and missing data. ML analysis: Genes evolve under different models. Non-molecular data.