The (Supertree) of Life: Procedures, Problems, and Prospects Presented by Usman Roshan
Supertree Methods Input: a set of trees. Output: a tree leaf-labeled by the union of the leaf sets of the input trees. Why supertree methods?
Motivation (1) Supertree methods are used as part of divide-and-conquer methods to solve NP-hard problems on large datasets
Motivation (2) Supertree methods are used when we have missing data
Types of supertree methods (1) Direct methods (e.g. strict consensus supertrees, MinCutSupertrees)
Types of supertree methods (2) Indirect methods (e.g. MRP, average consensus)
Types of supertree methods (3) (MRP)
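As an illustration of the matrix representation behind MRP, the sketch below encodes each internal node of each source tree as one binary character: 1 for taxa in that node's cluster, 0 for other taxa of the same source tree, ? for taxa missing from that tree. The nested-tuple tree format, the function names, and the choice to skip the root cluster are assumptions made for this example; the parsimony search on the resulting matrix is not shown.

```python
def leaves(tree):
    """Return the set of leaf labels below a node (a leaf is any non-tuple)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def proper_clusters(tree, is_root=True):
    """List the leaf sets of the internal nodes below the root, in preorder."""
    if not isinstance(tree, tuple):
        return []
    found = [] if is_root else [leaves(tree)]
    for child in tree:
        found += proper_clusters(child, is_root=False)
    return found


def mrp_matrix(trees):
    """Map each taxon to its row of characters, one character per cluster."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for cluster in proper_clusters(tree):
            for taxon in all_taxa:
                if taxon in cluster:
                    rows[taxon].append("1")   # taxon is inside the cluster
                elif taxon in present:
                    rows[taxon].append("0")   # in this source tree, outside the cluster
                else:
                    rows[taxon].append("?")   # taxon absent from this source tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two hypothetical rooted source trees with overlapping taxa.
source_trees = [((1, 2), 3), ((2, 3), 4)]
matrix = mrp_matrix(source_trees)
```

A source tree with more internal nodes adds more columns to the matrix, which is one way to see why larger source trees carry more weight in the parsimony analysis.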
Definitions Contraction: Restriction: If then contains
Optimization problems Subtree Compatibility: Given a set of trees, does there exist a tree that contains every tree in the set? NP-hard (Steel 1992). Special cases are poly-time (rooted trees, DCM). MRP: also NP-hard
Limitations of supertree methods Three desirable properties: P1: Method can be applied to any unordered set of input trees P2: Renaming the species does not change the constructed supertree P3: If the input trees are compatible, then the output tree is one of the “parent trees”. There is no supertree method that can satisfy P1-P3 when the input trees are unrooted; however, for rooted trees an extension of BUILD satisfies P1-P3.
Rooted subtrees (BUILD) (Aho et al 1981) Input: a set of rooted trees. Output: a tree that contains every input tree
BUILD (2) - Definitions Cluster: Set of taxa in a rooted subtree. A different representation of rooted phylogenetic trees. Let C(T) be the clusters of tree T. In this example C(T) = {{1,2}, {3,4}, {1,2,3,4}, {1,2,3,4,5}}. We write (IJ)K in T if I, J are in some cluster of T which doesn’t contain K; e.g. (12)3, (34)5 are in T
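The cluster and (IJ)K definitions can be sketched in a few lines. This is a minimal illustration, assuming rooted trees are written as nested tuples with leaves as plain labels; the function names are hypothetical and not from the talk.

```python
def leaves(tree):
    """Return the set of leaf labels below a node (a leaf is any non-tuple)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return C(T): the leaf set of every internal node, root included."""
    if not isinstance(tree, tuple):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clusters(child)
    return result


def has_triple(tree, i, j, k):
    """True if (IJ)K is in T: some cluster holds I and J but not K."""
    if k not in leaves(tree):
        return False  # K must be a taxon of the tree
    return any(i in c and j in c and k not in c for c in clusters(tree))


# The tree whose clusters are {1,2}, {3,4}, {1,2,3,4}, {1,2,3,4,5}.
T = (((1, 2), (3, 4)), 5)
```

For this tree, `has_triple(T, 1, 2, 3)` and `has_triple(T, 3, 4, 5)` hold, while `has_triple(T, 1, 3, 2)` does not, because every cluster containing both 1 and 3 also contains 2.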
BUILD (3) - Algorithm 1. Initialize C as the set of input taxa. 2. If |C| = 1 return C; otherwise compute the graph G. 3. Let C’ be the sets of taxa in the connected components of G. If |C’| = 1 then the input trees are incompatible; otherwise set C = C ∪ C’ and repeat step (2) on each new cluster in C’.
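The steps above can be sketched recursively. This is a minimal version under stated assumptions, not the implementation from the talk: the input trees are supplied as their rooted triples, each tuple (a, b, c) standing for (ab)c; the graph G is taken to have one vertex per taxon in the current cluster and an edge a-b for every triple whose three taxa all lie in that cluster; and the result is returned as a nested tuple instead of an accumulated cluster set. Taxon labels are assumed to be sortable.

```python
def connected_components(adjacent):
    """Return the vertex sets of the components, ordered by smallest label."""
    seen, components = set(), []
    for start in sorted(adjacent):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adjacent[v] - comp)
        seen |= comp
        components.append(comp)
    return components


def build(taxa, triples):
    """Return a nested-tuple tree containing every triple, or None if incompatible."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    # Keep only the triples whose three taxa all lie in the current cluster.
    local = [(a, b, c) for a, b, c in triples if {a, b, c} <= taxa]
    # Graph G: an edge a-b for every local triple (ab)c.
    adjacent = {t: set() for t in taxa}
    for a, b, _ in local:
        adjacent[a].add(b)
        adjacent[b].add(a)
    components = connected_components(adjacent)
    if len(components) == 1:
        return None  # G is connected, so the triples are incompatible
    subtrees = []
    for comp in components:
        sub = build(comp, local)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


# Hypothetical input: triples (12)3, (34)1, (13)5, (24)5 on taxa 1..5.
tree = build({1, 2, 3, 4, 5}, [(1, 2, 3), (3, 4, 1), (1, 3, 5), (2, 4, 5)])
conflict = build({1, 2, 3}, [(1, 2, 3), (1, 3, 2)])
```

On the first input the top-level graph splits into {1,2,3,4} and {5}, and the recursion on {1,2,3,4} splits into {1,2} and {3,4}. On the second input the graph on {1,2,3} is connected, so the sketch reports incompatibility by returning `None`.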
BUILD (4) - Algorithm
BUILD (5) - Algorithm
BUILD (6) - Algorithm
BUILD (7) - Algorithm
Compatible source trees For compatible source trees, MRP or BUILD can be used; however, the strict consensus of MRP trees (or the strict consensus supertree) may not be compatible with the input. BUILD has been extended to output all parent trees; it has also been shown that the source trees have a unique parent tree iff BUILD constructs a binary tree.
Incompatible source trees (1) For incompatible source trees two strategies: Resolve incompatibilities by using quartet methods or removing troublesome taxa. Use an appropriate algorithm such as MRP or MinCutSupertrees; the latter is an extension of BUILD so that it always outputs a tree.
Incompatible source trees (2) Desirable property P1: If at least one source tree contains (IJ)K and no source tree contains (IK)J or (JK)I, then the output tree must contain (IJ)K. No method can satisfy P1; however, the weaker condition can be satisfied: if all source trees contain (IJ)K, then the output must contain (IJ)K.
Supertree criticism Do not take biomolecular sequences into account Dataset non-independence MRP: Favors larger source trees because they contribute more characters; may also favor unbalanced source trees Direct methods: Cannot incorporate support values in the source trees (except for MinCutSupertrees), and cannot compute support values in the supertree (unlike MRP)
Applications of supertrees Systematics – MRP is the standard method used by biologists Evolutionary models Rates of cladogenesis Evolutionary patterns Biodiversity and conservation
Bright future for supertree construction Despite the increase in phylogenetic data, species are poorly characterized at the molecular level, giving rise to problems from taxon sampling (non-random sampling), long branch attraction, and missing data. ML analysis: Genes evolve under different models. Non-molecular data