1
http://creativecommons.org/licenses/by-sa/2.0/
2
CIS786, Lecture 4 Usman Roshan
3
Iterated local search: escape local optima by perturbation
[Figure labels: local optimum, perturbation, output of perturbation, local search]
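The loop named on this slide can be sketched on a toy problem. This is a minimal illustration, not the lecture's implementation: the integer landscape `COSTS`, the neighborhood, and the `perturb` function are all assumptions made up for the example, and in the MP setting the states would be trees and the cost the parsimony score.

```python
import random

def local_search(x, cost, neighbors):
    """Greedy descent: move to the best neighbor until none improves."""
    while True:
        best = min(neighbors(x), key=cost)
        if cost(best) >= cost(x):
            return x  # x is a local optimum
        x = best

def iterated_local_search(start, cost, neighbors, perturb, rounds, rng):
    """Repeat: perturb the current best, re-run local search, keep improvements."""
    best = local_search(start, cost, neighbors)
    for _ in range(rounds):
        candidate = local_search(perturb(best, rng), cost, neighbors)
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Hypothetical one-dimensional landscape with several local optima.
COSTS = [5, 4, 3, 4, 5, 6, 5, 4, 2, 4, 6, 7, 6, 3, 1, 0, 1, 5, 6, 7, 8]

def cost(i):
    return COSTS[i]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(COSTS)]

def perturb(i, rng):
    # Random jump of up to 6 positions, clamped to the valid range.
    return min(len(COSTS) - 1, max(0, i + rng.randint(-6, 6)))

rng = random.Random(0)
stuck = local_search(0, cost, neighbors)
result = iterated_local_search(0, cost, neighbors, perturb, 200, rng)
print(stuck, cost(stuck), result, cost(result))
```

Plain local search from position 0 stops at the first basin it finds; the perturbation step is what lets the search reach deeper basins.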
4
ILS for MP
We saw that ratchet improves upon iterative improvement.
We saw that TNT’s sophisticated and faster implementation outperforms the ratchet and PAUP* implementations.
But can we do even better?
5
Disk Covering Methods (DCMs)
DCMs are divide-and-conquer booster methods. They divide the dataset into small subproblems, compute subtrees using a given base method, merge the subtrees, and refine the supertree.
DCMs to date
– DCM1: for improving statistical performance of distance-based methods
– DCM2: for improving heuristic search for MP and ML
– DCM3: latest, fastest, and best (in accuracy and optimality) DCM
6
DCM2 technique for speeding up MP searches
1. Decompose sequences into overlapping subproblems
2. Compute subtrees using a base method
3. Merge subtrees using the Strict Consensus Merge (SCM)
4. Refine to make the tree binary
7
DCM1 and DCM2 decompositions
DCM1 decomposition: NJ gets better accuracy on small-diameter subproblems
DCM2 decomposition: getting a smaller number of smaller subproblems speeds up the solution
8
Supertree Methods
9
Strict Consensus Merger
[Figure: example trees on leaves 1 to 7]
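One ingredient of a strict consensus merger is the strict consensus of two trees restricted to the leaves they share. The sketch below shows only that ingredient, under assumptions of our own: each unrooted tree is given as a set of splits (one side of each internal edge, as a frozenset of leaf labels), and the function names are hypothetical. The later steps of the merger (collapsing edges in the input trees and attaching the non-shared leaves) are not shown.

```python
def restrict(splits, shared):
    """Restrict splits to the shared leaves and put them in a canonical form.

    Each split is one side of an internal edge. After restriction, a split
    with fewer than two leaves on either side is trivial and is dropped.
    The canonical side is the one that does not contain the smallest label.
    """
    ref = min(shared)
    out = set()
    for side in splits:
        a = side & shared
        b = shared - a
        if len(a) < 2 or len(b) < 2:
            continue
        out.add(frozenset(b if ref in a else a))
    return out

def strict_consensus_on_shared(splits1, leaves1, splits2, leaves2):
    """Splits on the shared leaves that both trees agree on."""
    shared = frozenset(leaves1) & frozenset(leaves2)
    return restrict(splits1, shared) & restrict(splits2, shared)

# Caterpillar tree on leaves 1..6, written as one side of each internal edge.
t1_leaves = {1, 2, 3, 4, 5, 6}
t1 = {frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})}

# Two alternative trees on leaves 1, 2, 3, 4, 7.
t2_leaves = {1, 2, 3, 4, 7}
t2_agree = {frozenset({1, 2}), frozenset({1, 2, 3})}
t2_conflict = {frozenset({1, 3}), frozenset({1, 2, 3})}

agree = strict_consensus_on_shared(t1, t1_leaves, t2_agree, t2_leaves)
conflict = strict_consensus_on_shared(t1, t1_leaves, t2_conflict, t2_leaves)
print(agree, conflict)
```

When the two trees agree on the shared leaves, the consensus keeps the split 12|34; when they conflict, no nontrivial split survives and the consensus on the shared leaves is a star, which is why a refinement step is needed afterwards.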
10
Tree Refinement
[Figure: example trees on leaves a to h]
11
The big question
Why DCMs? Can DCMs improve upon existing methods such as neighbor-joining, PAUP*, or TNT?
12
Improving sequence length requirements of NJ Can DCM1 improve upon NJ? We examine this question under simulation
13
DCM1(NJ)
15
Computing tree for one threshold
16
Recall simulation studies
17
Experimental results
True tree selection (phase II of DCM1)
Uniformly random trees
Birth-death random trees
Sequence length requirements on birth-death random trees
18
Comparing tree selection techniques
19
Error rates on uniform random trees
20
Error as a function of evolutionary rate
[Plot legend: NJ, DCM1-NJ+MP]
21
100 taxa, 90% accuracy Sequence length requirements as a function of evolutionary rates
22
400 taxa, 90% accuracy
23
Sequence length requirements as a function of #taxa
[Plot legend: DCM1-NJ+MP, NJ]
24
Conclusion
DCM1-NJ+MP improves upon NJ in large and divergent settings.
Why did it work? Smaller datasets with low evolutionary diameters AND a reliable supertree method give accurate subtrees (on subsets), which in turn give an accurate supertree.
25
Conclusion
26
Previously we saw a comparison of DCM components for solving MP
DCM2 better than DCM1 decomposition
SCM better than MRP (in DCM context)
Constrained refinement better than Inferred Ancestral States technique
Higher thresholds take longer but can produce better trees
27
Comparison of DCM components for solving MP
28
I. Comparison of DCMs (1,322 sequences) Base method is the TNT-ratchet.
29
I. Comparison of DCMs (1,322 sequences) Base method is the TNT-ratchet.
30
I. Comparison of DCMs (4583 sequences) Base method is the TNT-ratchet.
31
I. Comparison of DCMs (4583 sequences) Base method is the TNT-ratchet. DCM2 takes almost 10 hours to produce a tree and is too slow to run on larger datasets.
32
DCM2 decomposition on 500 rbcL genes (Zilla dataset)
Blue: separator
Red: subset 1
Pink: subset 2
Visualization produced by the graphviz program, which draws the graph according to specified distances.
Nodes: species in the dataset
Distances: p-distances (Hamming) between the DNAs
1. Separator is very large
2. Subsets are very large
3. Scattered subsets
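The p-distance used for the layout is the fraction of aligned sites at which two sequences differ. A minimal sketch follows; it assumes equal-length aligned sequences and counts every site, including gaps, which is a simplification since real pipelines often treat gaps and ambiguity codes separately. The function names and the toy sequences are made up for the example.

```python
def p_distance(a, b):
    """Normalized Hamming distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def p_distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

toy = ["ACGT", "ACGA", "TCGA"]
matrix = p_distance_matrix(toy)
print(matrix)
```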
33
Doesn’t look anything like this
34
DCM3 decomposition

DCM2
Input: distance matrix d, threshold q, sequences S
Algorithm:
1a. Compute a threshold graph G using q and d
1b. Perform a minimum weight triangulation of G

DCM3
Input: guide tree T on S, sequences S
Algorithm:
1. Compute a short quartet graph G using T. The graph G is provably triangulated.

Both methods then continue with the same two steps:
2. Find a separator X in G which minimizes max_i |A_i ∪ X|, where the A_i are the connected components of G - X
3. Output the subproblems A_i ∪ X

DCM3 advantage: it is faster and produces smaller subproblems than DCM2
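The threshold graph and the way a separator yields subproblems can be sketched as follows. This is an illustration under stated assumptions, not the DCM2 code: an edge joins two taxa when their distance is at most the threshold, each subproblem is taken to be one connected component of the graph minus the separator, together with the separator, and the separator itself is simply given by hand (the triangulation and the search for a good separator are not shown). All function names are hypothetical.

```python
def threshold_graph(d, q):
    """Adjacency sets: i and j are joined when d[i][j] <= q."""
    n = len(d)
    return {i: {j for j in range(n) if j != i and d[i][j] <= q} for i in range(n)}

def subproblems(graph, separator):
    """Each connected component of graph minus separator, unioned with separator."""
    seen = set(separator)
    result = []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(component | set(separator))
    return result

# Four taxa placed on a line, so d[i][j] = |i - j|.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
g = threshold_graph(d, 1)   # path 0 - 1 - 2 - 3
parts = subproblems(g, {1})
print(g, parts)
```

Removing the separator {1} disconnects taxon 0 from taxa 2 and 3, so the two overlapping subproblems are {0, 1} and {1, 2, 3}.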
35
DCM3 decomposition - example
36
Approx centroid-edge DCM3 decomposition: example
1. Locate the centroid edge e (O(n) time)
2. Set the closest leaves around e to be the separator (O(n) time)
3. Remaining leaves in subtrees around e form the subsets (unioned with the separator)
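Step 1 above can be sketched in linear time with a single traversal. The code below assumes that the centroid edge is the edge whose removal splits the leaf set most evenly, and that the tree is given as an undirected edge list; the helper names and the example tree are hypothetical. Choosing the separator leaves (step 2) is not shown.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def centroid_edge(adj):
    """Return (edge, leaves on child side, leaves on the other side)."""
    root = next(iter(adj))
    parent = {root: None}
    order, stack = [], [root]
    while stack:                      # iterative DFS, parents before children
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    # Number of leaves in the subtree hanging below each node.
    below = {u: 1 if len(adj[u]) == 1 else 0 for u in adj}
    for u in reversed(order):         # children before parents
        if parent[u] is not None:
            below[parent[u]] += below[u]
    total = sum(1 for u in adj if len(adj[u]) == 1)
    # Edge (parent[u], u) splits the leaves into below[u] and total - below[u].
    best = min((u for u in adj if parent[u] is not None),
               key=lambda u: abs(total - 2 * below[u]))
    return (parent[best], best), below[best], total - below[best]

# Caterpillar tree with leaves a..f and internal nodes i1..i4.
EDGES = [("a", "i1"), ("b", "i1"), ("i1", "i2"), ("c", "i2"), ("i2", "i3"),
         ("d", "i3"), ("i3", "i4"), ("e", "i4"), ("f", "i4")]
edge, side1, side2 = centroid_edge(build_adjacency(EDGES))
print(edge, side1, side2)
```

In this example the edge between i2 and i3 separates leaves a, b, c from d, e, f, which is the most balanced split available.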
37
Time to compute DCM3 decompositions
An optimal DCM3 decomposition takes O(n^3) time to compute, the same as for DCM2
The centroid edge DCM3 decomposition can be computed in O(n^2) time
An approximate centroid edge decomposition can be computed in O(n) time (from here on we assume we are using the approximate centroid edge decomposition)
38
DCM2 decomposition on 500 rbcL genes (Zilla dataset)
Blue: separator
Red: subset 1
Pink: subset 2
Visualization produced by the graphviz program, which draws the graph according to specified distances.
Nodes: species in the dataset
Distances: p-distances (Hamming) between the DNAs
1. Separator is very large
2. Subsets are very large
3. Scattered subsets
39
DCM3 decomposition on 500 rbcL genes (Zilla dataset)
Blue: separator (and subset)
Red: subset 2
Pink: subset 3
Yellow: subset 4
Visualization produced by the graphviz program, which draws the graph according to specified distances.
Nodes: species in the dataset
Distances: p-distances (Hamming) between the DNAs
1. Separator is small
2. Subsets are small
3. Compact subsets
40
Comparison of DCMs
Dataset: 4583 actinobacteria ssu rRNA from RDP. Base method is the TNT-ratchet.
DCM2 takes almost 10 hours to produce a tree and is too slow to run on larger datasets.
DCM3 followed by TNT-ratchet doesn’t improve over TNT.
Recursive-DCM3 followed by TNT-ratchet doesn’t improve over TNT.
[Plot: average MP score above optimal, shown as a percentage of the optimal (0.00 to 0.30), against time in hours (0 to 24); curves for TNT, DCM2, DCM3, Rec-DCM3]
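The quantity on the vertical axis of these plots can be written as a one-line computation. The sketch assumes that "optimal" means the best MP score known for the dataset (the true optimum is generally unknown for data of this size); the function name and the numbers in the example are made up.

```python
def percent_above_optimal(score, best_known):
    """MP score excess over the best known score, as a percentage of it."""
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (score - best_known) / best_known

# Hypothetical scores: a tree 5 steps worse than a best-known score of 1000.
example = percent_above_optimal(1005, 1000)
print(example)
```

On this scale a value of 0.30 means the tree is 0.3% longer than the best known tree, so the differences between methods in the plots correspond to small relative changes in tree length.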
41
Local optima are a problem
[Figure labels: phylogenetic trees (horizontal axis), cost (vertical axis), global optimum, local optimum]
42
Local optima are a problem
[Plot: average MP score above optimal, shown as a percentage of the optimal, against time in hours]
43
Iterated local search: escape local optima by perturbation
[Figure labels: local optimum, perturbation, output of perturbation, local search]
44
Iterated local search: Recursive-Iterative-DCM3
[Figure labels: local optimum, output of Recursive-DCM3, local search]
45
Comparison of DCMs for solving MP
Rec-I-DCM3(TNT-ratchet) improves upon unboosted TNT-ratchet
[Plot: average MP score above optimal, shown as a percentage of the optimal (0.00 to 0.30), against time in hours (0 to 24); curves for TNT, DCM2, DCM3, Rec-DCM3, Rec-I-DCM3]
46
I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.
47
I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.
48
I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.
49
I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.
50
I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet. Note the improvement in DCMs as we move from the default to recursion to iteration to recursion+iteration.
51
Improving upon TNT
But what happens after 24 hours?
We studied boosting upon TNT-ratchet. Other TNT heuristics are actually better and improving upon them may not be possible.
Can we improve upon the default TNT search?
52
Improving upon TNT
53
2000 Eukaryotes rRNA
54
6722 3-domain+2-org rRNA
55
13921 Proteobacteria rRNA
56
Improving upon TNT
What about better TNT heuristics? Can Rec-I-DCM3 improve upon them?
Rec-I-DCM3 improves upon default TNT, but we don’t know what happens for better TNT heuristics.
Therefore, for a large-scale analysis, first figure out the best settings of the software (e.g. TNT or PAUP*) on the dataset, and then use it in conjunction with Rec-I-DCM3 with various subset sizes.