1
Species Trees & Constraint Programming
2
Ongoing work with Ian Gent, Barbara Smith, Wu Wei (Christine)
3
The Tree of Life
A central goal of systematics: construct the tree of life,
a tree that represents the relationship between all living things (including constraint programmers).
The leaf nodes of the tree are species.
The interior nodes are hypothesized species: extinct, where species diverged.
4
Properties of a Species Tree
We have a set of leaf nodes, each labelled with a species.
The interior nodes have no labels.
Each interior node has 2 children and one parent, except the root (it has no parent).
If we have n leaf nodes, we then have n - 1 interior nodes.
It is a bifurcating tree.
5
Super Trees
We are given two trees, T1 and T2.
T1 has leaf set S1 and T2 has leaf set S2 (remember, leaves are species!).
But S1 and S2 have a non-empty intersection. Why? How can that happen?
We want to combine T1 and T2. So, why is that a problem?
6
Most Recent Common Ancestors (mrca)
[Figure: tree over leaves a, b, c]
We have 3 species, a, b, and c.
Comparing mrca(a,b) with mrca(a,c), mrca(a,b) with mrca(b,c), and mrca(a,c) with mrca(b,c):
Species a and b are more closely related to each other than they are to c.
The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and of b and c).
7
Triples (and Fans)
Species trees are frequently presented as a set of triples (and fans).
[Figure: example trees over leaves a, b, c and b, c, d]
8
Triples (and Fans)
[Figure: trees over leaves {b, c, d}, {a, b, c}, and {a, b, c, d}]
9
BreakUp & OneTree (circa 1996)
Algorithm BreakUp takes a species tree and produces a set of rooted triples R that define that tree.
Algorithm OneTree takes a set of species and a set of rooted triples, and builds a tree that respects those triples, or reports that no tree exists (in polytime).
OneTree is a specialisation of Build, an algorithm proposed by Aho, Sagiv, Szymanski, and Ullman in 1981.
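As a rough illustration of turning a tree into rooted triples, here is a minimal Python sketch. It is not the 1996 BreakUp algorithm: it simply lists every triple xy|z that a bifurcating tree displays, which may be a larger set than BreakUp returns. The nested-tuple tree representation and the names `leaves` and `all_triples` are assumptions made for this example.

```python
from itertools import combinations

def leaves(tree):
    """Return the leaf labels of a tree given as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def all_triples(tree):
    """Return every rooted triple (x, y, z), read as xy|z, displayed by the tree.

    Naive enumeration (hypothetical helper, not BreakUp itself): at each
    interior node, any two leaves on one side are closer to each other
    than either is to a leaf on the other side.
    """
    if isinstance(tree, str):
        return set()
    left, right = tree
    result = all_triples(left) | all_triples(right)
    for inside, outside in ((left, right), (right, left)):
        for x, y in combinations(sorted(leaves(inside)), 2):
            for z in leaves(outside):
                result.add((x, y, z))
    return result

tree = ((("a", "b"), "c"), "d")
print(sorted(all_triples(tree)))
```

For a bifurcating tree on n leaves this yields one triple per 3-element subset of the leaves, so it grows as n choose 3.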
10
The Flavour of OneTree
Given a set of species S and rooted triples R:
produce a node N
construct a graph G with vertices in S and edge (x,y) if triple xy|z is in R
if G is a single component, fail
else recursively build
on the left, with one component with S’ and R’ (the set of species and triples in that component)
on the right, with the other components
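The recursion above can be sketched in Python as follows. This is a minimal sketch of the idea, not the authors' code: the function name `one_tree`, the tuple encoding `(x, y, z)` for xy|z, the nested-tuple output, and the single-species base case are all assumptions. It expects a non-empty species set and returns `None` when no tree exists.

```python
def one_tree(species, triples):
    """Build a bifurcating tree respecting rooted triples, or return None.

    species: non-empty iterable of labels; triples: (x, y, z) meaning xy|z.
    """
    species = set(species)
    if len(species) == 1:
        return next(iter(species))        # assumed base case: a lone leaf
    # Only triples whose three species are all present matter here.
    relevant = [t for t in triples if set(t) <= species]

    # Union-find over species: edge (x, y) for every triple xy|z.
    parent = {s: s for s in species}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for x, y, _ in relevant:
        parent[find(x)] = find(y)

    components = {}
    for s in species:
        components.setdefault(find(s), set()).add(s)
    groups = sorted(components.values(), key=min)   # deterministic order

    if len(groups) == 1:
        return None                       # G is a single component: fail
    left = one_tree(groups[0], relevant)                  # one component
    right = one_tree(set().union(*groups[1:]), relevant)  # all the others
    if left is None or right is None:
        return None
    return (left, right)

print(one_tree("abcd", [("a", "b", "c"), ("a", "c", "d")]))
print(one_tree("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")]))
```

The second call uses the triples {ab|c, bc|d, cd|a}; their graph is the single path a-b-c-d, so the sketch reports failure at the root.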
11
The Flavour of OneTree
[Figure: worked example on species a, b, c, d]
12
Min-cut Super Trees
What happens if OneTree fails?
Min-cut gives us the best tree it can
by breaking some triples (resulting in fans)
by excluding some species
There are polytime algorithms for this, but they are greedy and biased.
13
Constraint Programming solutions to building
a species tree from a set of rooted triples
14
A naïve constraint encoding (footnotes 756, 789, 794, 796)
n-1 variables as interior nodes
v[i] = j means parent(v[i]) = v[j]
no loops/cycles
Barbara used set variables (ILOG)
Patrick used a specialised constraint (Choco)
Francois then encoded set variables!
n variables as leaf nodes
each takes a value respecting triples
I am sparing you (and me) the details
15
Why was this a naïve constraint encoding?
It produced the right number of trees when given no triples: the Catalan number.
symmetry breaking
It would produce a tree if one existed.
A 2 stage process:
(1) build a tree from the interior nodes; there are Catalan many of these
(2) given an “interior tree”, place the leaf nodes; there are n! ways to do this
if step (2) fails, generate the next interior tree in (1)
Yikes! That’s expensive. Imagine {ab|c, bc|d, cd|a}
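To get a feel for how expensive the two-stage process can be, the following Python sketch multiplies the two counts mentioned above. It assumes that "Catalan many" means the Catalan number with index n - 1 (the number of ordered binary tree shapes with n - 1 interior nodes); the slide does not state the index, and the function names are made up for this example.

```python
from math import comb, factorial

def catalan(k):
    """k-th Catalan number: 1, 1, 2, 5, 14, ..."""
    return comb(2 * k, k) // (k + 1)

def naive_search_space(n):
    """Worst-case (interior tree, leaf placement) pairs for n species.

    Assumption: catalan(n - 1) interior trees, each tried against all
    n! leaf placements before moving on to the next interior tree.
    """
    return catalan(n - 1) * factorial(n)

for n in (4, 6, 8, 10):
    print(n, naive_search_space(n))
```

Under that assumption, 4 species already give 5 * 24 = 120 combinations, and an inconsistent triple set forces the search to exhaust all of them.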
16
Ultrametric Trees & Species Trees (footnotes 803,804,805,810,819)
What is an ultrametric tree?
We are given a 2d symmetric matrix D.
D[i,j] is the time of divergence of species i and j.
mrca(i,j) is labelled with the time of divergence: D[i,j] is the value of mrca(i,j).
Build a bifurcating tree:
n leaves and n - 1 interior nodes
interior nodes labelled with entries from D
any path from the root is a strictly decreasing sequence
17
Ultrametric Trees: here’s one I (well, Dan Gusfield actually ) prepared earlier
[Figure: ultrametric tree with interior labels 8, 3, 5 and leaves B, C, D, E, A]
Note: if the sequence increases, we have a min-ultrametric tree
18
Ultrametric Matrix: necessary & sufficient conditions
An ultrametric matrix cannot have more than n - 1 distinct values, because there are n - 1 interior nodes.
For every 3 indices i, j, k there is a tie for the maximum between D[i,j], D[i,k], D[j,k].
Given an ultrametric matrix, an ultrametric tree can be constructed in O(n^2) …
see Dan Gusfield’s book “Algorithms on Strings, Trees, and Sequences”
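The three-index condition is easy to test directly. The Python sketch below checks it for a matrix given as a list of lists; the function name `is_ultrametric` and the two small matrices are made up for illustration and are not taken from Gusfield's book.

```python
from itertools import combinations

def is_ultrametric(D):
    """Check the tie-for-the-maximum condition on a symmetric matrix D.

    For every three distinct indices i, j, k, the two largest of
    D[i][j], D[i][k], D[j][k] must be equal.
    """
    n = len(D)
    # Symmetry is assumed by the definition, so verify it first.
    if any(D[i][j] != D[j][i] for i, j in combinations(range(n), 2)):
        return False
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if middle != largest:
            return False
    return True

good = [[0, 3, 8],
        [3, 0, 8],
        [8, 8, 0]]   # hypothetical example: the maximum 8 appears twice
bad = [[0, 3, 8],
       [3, 0, 5],
       [8, 5, 0]]    # hypothetical example: the maximum 8 is unique
print(is_ultrametric(good), is_ultrametric(bad))
```

This check runs in O(n^3) time; it only tests the condition and does not build the tree.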
19
A CP encoding of D
We have a 2 dimensional matrix of constrained integer variables D.
We must ensure that for any i, j, k the following holds: [constraint shown as an image on the slide]
Think isosceles triangles, allowing equilateral.
An ultrametric space, composed of isosceles triangles.
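The condition itself is not legible in this transcript. One reading that fits "isosceles triangles, allowing equilateral" and the min-ultrametric claim that follows is a tie for the minimum among the three entries. This is an inferred reconstruction, not the authors' stated formula:

```latex
\[
\forall\, i < j < k:\quad
\bigl(D_{i,j} > D_{i,k} = D_{j,k}\bigr)
\;\lor\;
\bigl(D_{i,k} > D_{i,j} = D_{j,k}\bigr)
\;\lor\;
\bigl(D_{j,k} > D_{i,j} = D_{i,k}\bigr)
\;\lor\;
\bigl(D_{i,j} = D_{i,k} = D_{j,k}\bigr)
\]
```

The first three disjuncts are the isosceles cases, where the odd side is the larger one; the last is the equilateral case.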
20
A CP encoding of D
Any instantiation of the variables in D is now guaranteed to be min-ultrametric.
We get a Catalan number of min-ultrametric solutions.
21
How can we exploit this?
We are given triples and fans, but not distances!
But we can consider a triple ij|k as a constraint.
[Figure: tree with leaves k, j, i]
This over-rides the disjunctions posted across the matrix.
Note: our tree is min-ultrametric!
22
The CP encoding (contd)
We have the “blanket” disjunctive constraint to ensure min-ultrametric.
Triples are constraints that break the disjunctions.
A solution (if one exists) is min-ultrametric, respecting triples.
We can then produce a tree from the matrix, as a post-process.
NOTE: we need a pre-process to break up trees into triples.
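The encoding can be mimicked with a brute-force search, as a stand-in for a real CP solver. The Python sketch below makes several assumptions that are not in the slides: species are numbered 0..n-1, each D[i,j] ranges over 1..n-1, the blanket constraint is read as "tie for the minimum" over every three indices, and a triple ij|k is read as D[i,j] > D[i,k] = D[j,k]. The names `solve` and `entry` are hypothetical.

```python
from itertools import combinations, product

def entry(D, a, b):
    """Look up the symmetric entry D[a,b] stored under the ordered pair."""
    return D[(a, b) if a < b else (b, a)]

def solve(n, triples):
    """Return a min-ultrametric matrix respecting the triples, or None.

    Brute force over all assignments; only sensible for tiny n.
    """
    pairs = list(combinations(range(n), 2))
    for values in product(range(1, n), repeat=len(pairs)):
        D = dict(zip(pairs, values))
        # Triples pick one branch of the disjunction: ij|k forces
        # D[i,j] > D[i,k] = D[j,k] (assumed reading).
        if not all(entry(D, i, j) > entry(D, i, k) == entry(D, j, k)
                   for i, j, k in triples):
            continue
        # Blanket constraint: the two smallest of every three entries tie.
        ok = True
        for i, j, k in combinations(range(n), 3):
            low, mid, _ = sorted((entry(D, i, j), entry(D, i, k), entry(D, j, k)))
            if low != mid:
                ok = False
                break
        if ok:
            return D
    return None

print(solve(3, [(0, 1, 2)]))
print(solve(4, [(0, 1, 2), (1, 2, 3), (2, 3, 0)]))
```

The second call encodes {ab|c, bc|d, cd|a} with a, b, c, d numbered 0 to 3; the triples chain into D[0,2] > D[2,3] > D[0,2], so no assignment survives and the search returns `None`.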
23
So where are we?
Good question:
we have not yet tried real data
we have a number of different micro-encodings
Are we in P for decision? Not sure yet.
How about optimisation? We can see a way, by introducing penalties.
Wu Wei is coding up BreakUp and OneTree, so we have something real to compare with.
We need real data to check this out.
I need to get funding for this: write a grant proposal with DRG, I think!
24
Questions?