Published by Ridwan Sudirman. Modified over 6 years ago.
1
Species Trees & Constraint Programming: recent progress and new challenges
By Patrick Prosser. Presented by Chris Unsworth at CP06.
2
Outline
- Tree of life (what’s that then?)
- Previous work (conventional and CP model)
- What’s new? (enhanced model, new problems)
- Conclusions (what have I told you!?)
- Future work (will this never end?)
3
Tree of life
A central goal of systematics: construct the tree of life, a tree that represents the relationships between all living things
- The leaf nodes of the tree are species
- The interior nodes are hypothesized extinct species, where species diverged
5
Not to be confused with this
6
Not to be confused with this
7
Not to be confused with this either
8
Something like this
15
To date, biologists have cataloged about 1.7 million species, yet estimates of the total number of species range from 4 to 100 million.
“Of the 1.7 million species identified only about 80,000 species have been placed in the tree of life” E. Pennisi, “Modernizing the Tree of Life”, Science 300:
16
Properties of a Species Tree
- We have a set of leaf nodes, each labelled with a species
- The interior nodes have no labels (maybe)
- Each interior node has 2 children and one parent (maybe/ideally)
- A bifurcating tree (maybe/ideally)
Note: recently there have been requirements that
- interior nodes have divergence dates
- leaf nodes correspond to other trees (such as a leaf “cats”)
- trees might not bifurcate
17
Super Trees
We are given two trees, T1 and T2
- S1 and S2 are the sets of leaves for T1 and T2 respectively (remember, leaves are species!)
- S1 and S2 have a non-empty intersection: some species appear in both trees
We want to combine T1 and T2, respecting the relationships in T1 and T2, to form a “super tree”
18
superTree combine
19
Overlap is highlighted in the trees and the superTree
20
A simple wee example
Overlap is leaves “a” and “f”
21
Most Recent Common Ancestors (mrca)
[Figure: tree with leaves a, b, c; the root is mrca(a,c) = mrca(b,c) and the node below it is mrca(a,b)]
We have 3 species, a, b, and c. Species a and b are more closely related to each other than they are to c.
The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and b and c): a is closer to b than c.
Reading > as “further from the root than”:
- mrca(a,b) > mrca(a,c)
- mrca(a,b) > mrca(b,c)
- mrca(a,c) = mrca(b,c)
NOTE: mrca(x,y) = mrca(y,x)
22
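Most Recent Common Ancestors (mrca)
[Figure: tree with leaves a, b, c; the root is mrca(a,c) = mrca(b,c) and the node below it is mrca(a,b)]
- mrca(a,b) > mrca(a,c)
- mrca(a,b) > mrca(b,c)
- mrca(a,c) = mrca(b,c)
Note: this defines the ordering > (further from the root). Think of mrca(x,y) as having an integer value, its “depth”.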
23
Ultrametric relationship
Given 3 leaf nodes labelled a, b, and c there are only 4 possible situations: three triples (ab|c, ac|b, bc|a) and one fan (abc)
27
That’s all that there can be, for 3 leaves
28
Another view
A space made up of triangles: given any three vertices a, b, c the triangle is either isosceles or equilateral
29
Ultrametric relationship
Given 3 leaf nodes labelled a, b, and c there are only 4 possible situations
We can represent this using primitive constraints, where D[i,j] is a constrained integer variable representing the depth in the tree of the most recent common ancestor of the ith and jth species
30
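Ultrametric constraint
Therefore the ultrametric constraint is as follows:
(D[a,b] > D[a,c] = D[b,c]) or (D[a,c] > D[a,b] = D[b,c]) or (D[b,c] > D[a,b] = D[a,c]) or (D[a,b] = D[a,c] = D[b,c])
- Constraint acting between leaf nodes/species a, b, and c
- Where D[x,y] is the depth in the tree of mrca(x,y)
- D[x,y] can also be thought of as distance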
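As a rough illustration of the constraint on this slide, the Python sketch below checks whether three mrca depths (and, by extension, a whole depth matrix) are ultrametric. It assumes the reading used in the triple and fan slides: of the three depths, the two smallest must be equal, so either one pair is strictly deeper or all three are equal. The function names are hypothetical and are not taken from the talk's code.

```python
from itertools import combinations

def is_ultrametric_triple(d_ab, d_ac, d_bc):
    """True if the two smallest of the three mrca depths are equal.

    This covers the three triples (one pair strictly deeper) and the fan
    (all three depths equal).
    """
    lo, mid, _ = sorted((d_ab, d_ac, d_bc))
    return lo == mid

def is_ultrametric_matrix(D):
    """Check every triple i < j < k of a symmetric matrix of mrca depths.

    The diagonal of D is never read.
    """
    n = len(D)
    return all(
        is_ultrametric_triple(D[i][j], D[i][k], D[j][k])
        for i, j, k in combinations(range(n), 3)
    )
```

For example, the triple ab|c with D[a,b] = 2 and D[a,c] = D[b,c] = 1 passes the check, while depths 1, 2, 2 do not, because the shallowest value would occur only once.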
31
How it goes (part 1)
Conventional technology (circa 1981)
- Take 2 species trees T1 and T2
- Use the “breakUp” algorithm (Ng & Wormald 1996) on T1 then T2
  - This produces a set of triples and fans
- Use the “oneTree” algorithm (Ng & Wormald 1996)
  - Generates a superTree or fails
This is the “conventional” (non-CP) approach
Different versions of oneTree and breakUp come from Semple and Steel (I think); they treat fans differently (ignore them)
oneTree is essentially the algorithm of Aho, Sagiv, Szymanski and Ullman in SIAM J. Comput. 1981
32
breakUp generates constraints!
1. Find deepest interior node
2. Get its descendants (leaf nodes)
3. Get a cousin or uncle leaf node
4. Generate a triple or fan
5. Delete one of the leaves in 2
6. Take the other leaf in 2 and replace its parent with that leaf
7. Go to 1 unless we are at the root with degree 2
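The steps above can be sketched in Python for the simplest case. This is a simplified variant, not the Ng & Wormald algorithm itself: it assumes a strictly binary tree written as nested 2-tuples with string leaves, so it only ever produces triples and never fans. The names `break_up`, `_reduce`, `height` and `any_leaf` are hypothetical.

```python
def is_leaf(t):
    return isinstance(t, str)

def height(t):
    """Number of edges on the longest path from t down to a leaf."""
    return 0 if is_leaf(t) else 1 + max(height(c) for c in t)

def any_leaf(t):
    """Pick some leaf below t (here: always the leftmost one)."""
    while not is_leaf(t):
        t = t[0]
    return t

def _reduce(node, triples):
    """Perform one breakUp step below node and return the reduced subtree."""
    left, right = node
    # Follow the taller child so that we end up at a deepest interior node.
    if height(left) >= height(right):
        deep, other, deep_is_left = left, right, True
    else:
        deep, other, deep_is_left = right, left, False
    if height(deep) == 1:
        # deep has two leaf children x, y; other supplies an uncle/cousin leaf.
        x, y = deep
        triples.append((x, y, any_leaf(other)))  # the triple xy|z
        new_deep = y  # delete x; y takes the place of its parent
    else:
        new_deep = _reduce(deep, triples)
    return (new_deep, other) if deep_is_left else (other, new_deep)

def break_up(tree):
    """Return a list of triples (x, y, z), each meaning xy|z."""
    triples = []
    while height(tree) > 1:  # stop once the root has two leaf children
        tree = _reduce(tree, triples)
    return triples
```

Calling `break_up(((("A", "B"), "C"), ("D", "E")))` yields three triples for the five leaves, starting with AB|C. Each triple (x, y, z) corresponds to the constraint D[x,z] = D[y,z] < D[x,y].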
33
breakUp generates constraints!
A deepest interior node
Generate triple AB|C
This is the constraint D[A,C] = D[B,C] < D[A,B]
34
breakUp generates constraints!
A deepest interior node
Generate triple DE|C
This is the constraint D[D,C] = D[E,C] < D[D,E]
35
breakUp generates constraints!
A deepest interior node
Generate fan BCE
This is the constraint D[B,C] = D[B,E] = D[C,E]
36
breakUp generates constraints!
A deepest interior node
Generate triple FG|E
This is the constraint D[E,F] = D[E,G] < D[F,G]
37
breakUp generates constraints!
Done
The triples and fans can be viewed as constraints that break the ultrametric disjunctions
38
The 1st CP approach
39
How it goes (part 2)
CP approach (circa 2003)
- Generate an n by n array of constrained integer variables
- For all 0 ≤ i < j < k < n post the ultrametric constraint
  - Yes, we have a cubic number of constraints
  - Yes, we have a quadratic number of variables
  - This gives us an “ultrametric matrix”
- Use breakUp on trees T1 and T2 to produce triples and fans
- Post the triples and fans as constraints, breaking disjunctions
- Find a first solution
- Convert the ultrametric matrix to an ultrametric tree
Algorithm for ultrametric matrix to ultrametric tree given by Dan Gusfield
This is the CP approach proposed by Gent, Prosser, Smith & Wei in CP03 (a great great paper, go read it)
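To make the shape of this model concrete, here is a toy Python sketch that follows the same recipe, with plain brute-force enumeration standing in for the constraint solver. It is not the Claire/Choco or JChoco model from the talk. It assumes depths are integers in 1..n-1, handles triples only (no fans), and is only practical for a handful of species. The name `first_solution` is hypothetical.

```python
from itertools import combinations, product

def ultrametric(p, q, r):
    # Of three mrca depths, the two smallest must be equal.
    lo, mid, _ = sorted((p, q, r))
    return lo == mid

def first_solution(species, triples):
    """Brute-force stand-in for the CP search.

    species: list of names; triples: list of (x, y, z), each meaning xy|z.
    Returns {(x, y): depth} for the first consistent matrix, else None.
    """
    n = len(species)
    index = {name: i for i, name in enumerate(species)}
    pairs = list(combinations(range(n), 2))  # quadratic number of variables
    for values in product(range(1, n), repeat=len(pairs)):
        D = dict(zip(pairs, values))

        def d(i, j):
            return D[(i, j)] if i < j else D[(j, i)]

        # Cubic number of ultrametric constraints, one per i < j < k.
        if not all(ultrametric(d(i, j), d(i, k), d(j, k))
                   for i, j, k in combinations(range(n), 3)):
            continue
        # A triple xy|z breaks the disjunction: D[x,z] = D[y,z] < D[x,y].
        if all(d(index[x], index[z]) == d(index[y], index[z]) < d(index[x], index[y])
               for x, y, z in triples):
            return {(species[i], species[j]): v for (i, j), v in D.items()}
    return None
```

With species a, b, c, d and the triples ab|c and bc|d, the only consistent matrix over depths 1..3 has D[a,b] = 3, D[a,c] = D[b,c] = 2 and every pair involving d at depth 1. Contradictory triples such as ab|c together with ac|b give `None`, which corresponds to incompatible input trees.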
40
Key here is that we have an array of variables representing distances, and this space must be ultrametric
41
A min ultrametric tree and its min ultrametric matrix
[Figure: tree with leaves A to E and interior-node values 3, 4, 5, 8, shown beside its matrix]
- Matrix value is the value of the most recent common ancestor of two leaf nodes
- As we go down a branch, values on interior nodes increase
- Matrix is symmetric
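One simple way to turn such a matrix back into a tree is a recursive partition, sketched below in Python. This is an illustrative approach and is not claimed to be Gusfield's algorithm. It assumes the matrix is already ultrametric and is stored as a dict keyed by `frozenset({x, y})`. An interior node is written as `(value, [children])`. The name `matrix_to_tree` is hypothetical.

```python
from itertools import combinations

def matrix_to_tree(leaves, D):
    """Build a min ultrametric tree from mrca values.

    leaves: list of leaf names.
    D: dict mapping frozenset({x, y}) to the value of mrca(x, y).
    Returns a leaf name, or (value, [subtrees]) for an interior node.
    """
    if len(leaves) == 1:
        return leaves[0]
    # The smallest value among these leaves labels the root of this subtree.
    root_value = min(D[frozenset((x, y))] for x, y in combinations(leaves, 2))
    # Leaves whose mrca is strictly deeper than the root share a child subtree.
    # Comparing against one representative per group is enough because the
    # matrix is assumed to be ultrametric.
    groups = []
    for leaf in leaves:
        for group in groups:
            if D[frozenset((leaf, group[0]))] > root_value:
                group.append(leaf)
                break
        else:
            groups.append([leaf])
    return (root_value, [matrix_to_tree(g, D) for g in groups])
```

Because values increase going down a branch, each recursive call sees a strictly larger minimum, and a fan simply shows up as a node with more than two children.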
42
The state of play in 2003
- Coded up in Claire & Choco
- More a “proof of concept” than a useful tool
- Small data sets only
43
Two species trees of sea birds from the CP03 paper
44
Resultant superTree
On the left by oneTree and on the right by CP model
45
What’s new 2006
- Reimplemented in Java & JChoco (so faster)
- More robust (thanks to Pierre Flener’s help)
- Can now deal with larger trees (about 70 species)
- Can generate all solutions up to symmetry
- Can handle divergence dates on interior nodes
- Reimplemented breakUp & oneTree in Java
- All code available on the web
47
Bigger Trees
Attempted to reconstruct the supertree in Kennedy & Page’s “Seabird supertrees: Combining partial estimates of procellariiform phylogeny” in “The Auk: A Quarterly Journal of Ornithology” 119:
- 7 trees of seabirds (A through G)
- Varying in size from 14 to 90 species
48
From the paper
- Table shows on the diagonal the size of each tree, A through G
- A table entry is the size of the combined tree
- A table entry in () if trees are incompatible
- A table entry of – if trees are too big for CP model
The only compatible trees are A, B, D and F
The resultant supertree has 69 species
This takes 20 seconds to produce
50
A “lifted” representation
Rather than instantiate the “D” variables, why not just break the disjunctions?
Now the decision variables are P[i,j,k]
And yes, we have a cubic number of P variables
51
A “lifted” representation
Rather than instantiate the “D” variables, why not just break the disjunctions?
Now the decision variables are P[i,j,k]
Now we can:
- Enumerate all solutions, eliminating value symmetries
- Allow ranges of values on interior nodes of trees (input and output!)
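A tiny Python sketch of why deciding on P rather than D removes redundant solutions. The slides do not give the domain of P[i,j,k], so the encoding here is an assumption: one label per ultrametric disjunct (“ab|c”, “ac|b”, “bc|a”, “fan”). For three species with depths restricted to {1, 2}, several depth assignments collapse onto the same P value.

```python
from itertools import product

def p_value(d_ab, d_ac, d_bc):
    """Return which ultrametric disjunct the three depths satisfy, or None."""
    if d_ab > d_ac == d_bc:
        return "ab|c"
    if d_ac > d_ab == d_bc:
        return "ac|b"
    if d_bc > d_ab == d_ac:
        return "bc|a"
    if d_ab == d_ac == d_bc:
        return "fan"
    return None  # not ultrametric

# Enumerating on the D variables: every consistent depth assignment counts.
d_solutions = [t for t in product((1, 2), repeat=3) if p_value(*t) is not None]

# Enumerating on the P variable: one solution per tree shape.
p_solutions = {p_value(*t) for t in d_solutions}
```

Here the D view yields 5 assignments, because the fan appears once with all depths 1 and once with all depths 2, whereas the P view yields the 4 distinct shapes. This is the kind of value symmetry the lifted model is described as eliminating.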
52
Ranked Trees
A new problem where input trees have ancestral divergence dates on interior nodes
A new “conventional” technique is the RANKED TREE algorithm
53
Ranked Trees using “lifted” CP model
A new problem where input trees have ancestral divergence dates on interior nodes
We do this in the “lifted” model by merely
1. Reading in divergence dates for pairs of species and posting these as constraints on the “D” variables
2. Then solving using the disjunction-breaking “P” variables
3. Interior nodes retain range values
4. In addition, we can enumerate all solutions, eliminating value symmetries
54
Two trees of cats. Ranks (divergence information) on interior nodes
Common species in boxes
55
Two ranked cats trees on left, and on the right one of the ranked supertrees
NOTE: range of values [6..9] on mrca(PTE,LTI)
56
7 of the 17 solutions have ranges on interior nodes
Without the “lifted” representation we get 30 solutions (some redundant)
57
Is this a 1st?
We think so (or at least Patrick thinks so)
- enumerate all solutions for ranked supertrees
- remove value symmetries
58
What next?
Reduce the size of the model with a specialised ultrametric constraint
- over 3 variables
- over 3 variables plus the P decision variable
- over an entire n by n array
Improve propagation of ultrametric constraint
- Bound GAC
- GAC
New application
- Identify common features (back bone) of all supertrees
- Address nested taxa
- Combine all we have
Already underway with Neil Moore
59
Conclusion
- presented a new (non-conventional) way of addressing the supertree problem
- constraint model has been shown to be versatile
  - enumerate all solutions removing symmetries
  - address divergence dates on interior nodes
  - enumerate all solutions for ranked trees
- model is bulky/large
  - we are working on this
- future extensions
  - find the backbone of forest of supertrees
  - address nested taxa
60
NO WAY! I did it all on my own
61
Thanks for helping: Pierre Flener, Xavier Lorca, Rod Page, Mike Steel, Charles Semple, Chris Unsworth, Neil Moore, Christine Wu Wei, Barbara Smith, Ian Gent
62
Any questions?