Species Trees & Constraint Programming. The Tree of Life A central goal of systematics construct the tree of life a tree that represents the relationship.


1 Species Trees & Constraint Programming

2 The Tree of Life. A central goal of systematics: construct the tree of life, a tree that represents the relationship between all living things (including constraint programmers). The leaf nodes of the tree are species. The interior nodes are hypothesized extinct species, where species diverged.

3 Science 300

4 To date, biologists have cataloged about 1.7 million species, yet estimates of the total number of species range from 4 to 100 million. “Of the 1.7 million species identified only about 80,000 species have been placed in the tree of life.” E. Pennisi, “Modernizing the Tree of Life”, Science 300:1692-1697, 2003

5 Properties of a Species Tree. We have a set of leaf nodes, each labelled with a species. The interior nodes have no labels. Each interior node has 2 children and one parent, except the root (it has no parent). If we have n leaf nodes we then have n - 1 interior nodes. It is a bifurcating tree.

6 Super Trees. We are given two trees, T1 and T2. T1 has leaf set S1 and T2 has leaf set S2 (remember, leaves are species!). But S1 and S2 have a non-empty intersection. Why? How can that happen? We want to combine T1 and T2. So, why is that a problem?

7 Tree: (f,((d,e),((c,(a,b)),g))) Triples: {((a,b),c),((d,e),c),((c,b),e), ((e,b),f),((a,g),f)}

8 Most Recent Common Ancestors (mrca). [Figure: rooted tree for the triple ab|c.] We have 3 species, a, b, and c. Species a and b are more closely related to each other than they are to c. The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and of b and c). Writing > for “further from the root”: mrca(a,b) > mrca(a,c), mrca(a,b) > mrca(b,c), mrca(a,c) = mrca(b,c).

9 Triples (and Fans). [Figure: the triples ab|c and bc|d.] Species trees are frequently presented as a set of triples (and fans).

10 Triples (and Fans). [Figure: the triples ab|c and bc|d, and a tree on a, b, c, d.]

11 BreakUp & OneTree (circa 1996). Algorithm BreakUp takes a species tree and produces a set of rooted triples R that define that tree. Algorithm OneTree takes a set of species and a set of rooted triples, and in polytime either builds a tree that respects those triples or reports that no such tree exists. OneTree is a specialisation of Build, an algorithm proposed by Aho, Sagiv, Szymanski, and Ullman in 1981.
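The idea of turning a tree into rooted triples can be sketched in Python. This is not the BreakUp algorithm itself (which produces a smaller defining set); it is a simple enumeration of every triple the tree displays, assuming trees are written as nested tuples of species names. The function names `leaves` and `induced_triples` are made up for this sketch.

```python
from itertools import combinations

def leaves(tree):
    """Set of species names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def induced_triples(tree):
    """All rooted triples xy|z displayed by the tree, as (frozenset({x, y}), z)."""
    if isinstance(tree, str):
        return set()
    out = set()
    for child in tree:
        out |= induced_triples(child)
    # At this node: x and y sit in the same child, z sits in a different
    # child, so mrca(x, y) is strictly deeper than mrca(x, z).
    everything = leaves(tree)
    for child in tree:
        inside = leaves(child)
        for x, y in combinations(sorted(inside), 2):
            for z in everything - inside:
                out.add((frozenset((x, y)), z))
    return out

# The tree from slide 7, written as nested tuples.
tree = ("f", (("d", "e"), (("c", ("a", "b")), "g")))
triples = induced_triples(tree)
slide_triples = [("a", "b", "c"), ("d", "e", "c"), ("c", "b", "e"),
                 ("e", "b", "f"), ("a", "g", "f")]
print(all((frozenset((x, y)), z) in triples for x, y, z in slide_triples))
```

A bifurcating tree on n leaves displays exactly one triple for each set of 3 leaves, so this enumeration grows as O(n^3); a defining set can be much smaller.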

12 The Flavour of OneTree. Given a set of species S and rooted triples R: produce a node N; construct a graph G with vertices in S and an edge (x,y) if a triple xy|z is in R. If G is a single component, fail. Otherwise recursively build: on the left with one component, using S’ and R’ (the set of species and triples in that component); on the right with the other components.
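The recursion described on this slide can be sketched in Python as follows. It is a minimal illustration under some assumptions, not the published OneTree code: triples are given as tuples (x, y, z) meaning xy|z, trees are returned as nested tuples, a single species is a leaf, and only triples whose three species all lie in the current set contribute edges. The name `one_tree` is hypothetical.

```python
def one_tree(species, triples):
    """Return a nested-tuple tree respecting every triple xy|z, or None."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))
    # R': only triples whose three species all lie in the current set S'.
    local = [t for t in triples if set(t) <= species]
    # Connected components via union-find: join x and y for each xy|z.
    parent = {s: s for s in species}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for x, y, _ in local:
        parent[find(x)] = find(y)
    components = {}
    for s in species:
        components.setdefault(find(s), set()).add(s)
    if len(components) == 1:
        return None  # G is a single component: no tree exists
    # Deterministic order: one component on the left, the rest on the right.
    groups = sorted(components.values(), key=sorted)
    left = one_tree(groups[0], local)
    right = one_tree(set().union(*groups[1:]), local)
    if left is None or right is None:
        return None
    return (left, right)

slide7 = [("a", "b", "c"), ("d", "e", "c"), ("c", "b", "e"),
          ("e", "b", "f"), ("a", "g", "f")]
print(one_tree("abcdefg", slide7))
print(one_tree("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")]))
```

The tree returned for the first call respects the five triples but need not be the tree the triples were taken from, since several trees can respect the same set of triples. The second call fails at once because the three triples connect all four species into one component.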

13 The Flavour of OneTree. [Figure: a graph on the species a, b, c, d and the tree built from it.]

14 Min-cut Super Trees. What happens if OneTree fails? Give the best tree you can: by breaking some triples (resulting in fans), or by excluding some species. There are polytime algorithms for this (minCut supertrees), but they are greedy and biased.

15 Constraint Programming solutions to building a species tree from a set of rooted triples

16 A naïve constraint encoding (footnotes 756, 789, 794, 796). n - 1 variables as interior nodes: v[i] = j means parent(v[i]) = v[j], with no loops/cycles. Barbara used set variables (ILOG), Patrick used a specialised constraint (Choco), Francois then encoded set variables! n variables as leaf nodes, each taking a value respecting the triples. I am sparing you (and me) the details.

17 Why was this a naïve constraint encoding? It produced the right number of trees when there were no triples (the Catalan number, with symmetry breaking). It would produce a tree if one existed. But it is a 2 stage process: (1) build a tree from the interior nodes (there are Catalan many of these); (2) given an “interior tree”, place the leaf nodes (there are n! ways to do this). If step (2) fails, generate the next interior tree in (1). Yikes! That’s expensive. Imagine {ab|c, bc|d, cd|a}.

18 Ultrametric Trees & Species Trees (footnotes 803, 804, 805, 810, 819). What is an ultrametric tree? We are given a 2d symmetric matrix D, where D[i,j] is the time of divergence of species i and j: mrca(i,j) is labelled with the time of divergence, so D[i,j] is the value of mrca(i,j). Build a bifurcating tree with n leaves and n - 1 interior nodes, with interior nodes labelled with entries from D, such that any path from the root is a strictly decreasing sequence.

19 Ultrametric Trees: here’s one I (well, Dan Gusfield actually) prepared earlier. [Figure: an ultrametric tree on leaves A, B, C, D, E with interior nodes labelled 8, 5, 3, 3.] Note: if the sequence increases, we have a min-ultrametric tree.

20 Ultrametric Matrix: necessary & sufficient conditions. It cannot have more than n - 1 distinct values, because there are n - 1 interior nodes. For every 3 indices i,j,k there is a tie for the maximum between D[i,j], D[i,k], D[j,k]. Given an ultrametric matrix, an ultrametric tree can be constructed in O(n^2) … see Dan Gusfield’s book “Algorithms on Strings, Trees, and Sequences”.
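The three-index condition on this slide is easy to check directly. The sketch below assumes D is a symmetric list of lists and tests only the tie for the maximum; the function name and the example values are made up for illustration and are not taken from the slides.

```python
from itertools import combinations

def is_ultrametric(D):
    """True if, for every 3 indices, the two largest of the three distances tie."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if second != largest:
            return False
    return True

# Hypothetical 4-species matrix: species 0,1 diverge at time 3,
# species 2,3 at time 5, and the two pairs at time 8.
good = [[0, 3, 8, 8],
        [3, 0, 8, 8],
        [8, 8, 0, 5],
        [8, 8, 5, 0]]
bad = [row[:] for row in good]
bad[0][2] = bad[2][0] = 4   # breaks the tie for indices 0, 1, 2

print(is_ultrametric(good), is_ultrametric(bad))
```

This check is O(n^3), one test per set of 3 indices, which is the same count as the ternary constraints used later in the constraint encoding.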

21 Why are our species trees ultrametric?

22 Take any rooted tree Mark the interior nodes with their depth/height in the tree Any path to a leaf is an increasing/decreasing sequence The tree is ultrametric

23 So? If you take any 3 leaf nodes x,y,z - the deepest ancestor of x and y is deeper than the deepest ancestor of x (y) and z OR - the deepest ancestor of x and z is deeper than the deepest ancestor of x (z) and y OR - the deepest ancestor of y and z is deeper than the deepest ancestor of y (z) and x

24 A CP encoding of D. We have a 2 dimensional matrix of constrained integer variables D. We must ensure that the following holds: for any 3 indices i,j,k, there is a tie for the maximum. Think isosceles triangles, allowing equilateral. An ultrametric space, composed of isosceles triangles.

25 A CP encoding of D Any instantiation of the variables in D is now guaranteed to be min-ultrametric We get Catalan number of min-ultrametric solutions

26 A geometric view: choose one of these isosceles triangles, each corresponding to a rooted triple. [Figure: isosceles triangles on a, b, c and the corresponding rooted triples.]

27 How can we exploit this? We are given triples and fans, but not distances! But we can consider a triple ij|k as a constraint. [Figure: isosceles triangle on i, j, k.] Note: our tree is min-ultrametric! This over-rides the disjunctions posted across the matrix.

28 The CP encoding (contd). We have the “blanket” disjunctive constraints to ensure min-ultrametric: ∀i ∀j ∀k (i < j < k ⇒ triple(i,j,k) ∨ triple(i,k,j) ∨ triple(j,k,i)), where triple(i,j,k) ⇔ (D[i,k] = D[j,k] ∧ D[i,j] > D[i,k] ∧ D[i,j] > D[j,k]). That is O(n^3) ternary constraints! Triples are constraints that break the disjunctions. A solution (if one exists) is min-ultrametric and respects the triples. We can then produce the tree from the matrix, as a post-process. NOTE: we need a pre-process to break up trees into triples.
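To make the encoding concrete, here is a small generate-and-test sketch in Python. It is not a constraint solver and not the encoding used in the talk's experiments: it simply enumerates every assignment of depths to the entries of D and keeps the first one that satisfies the blanket disjunctions and the given triples. It assumes that triple(i,j,k) means D[i,j] > D[i,k] = D[j,k], that depths are drawn from 1 to n - 1, and that fans are not allowed. The names `is_triple` and `solve` are hypothetical.

```python
from itertools import combinations, product

def is_triple(D, i, j, k):
    """ij|k: mrca(i, j) is strictly deeper than mrca(i, k) = mrca(j, k)."""
    dij = D[frozenset((i, j))]
    dik = D[frozenset((i, k))]
    djk = D[frozenset((j, k))]
    return dik == djk and dij > dik

def solve(species, triples):
    """Brute-force search for a min-ultrametric depth matrix, or None."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    depths = range(1, len(species))  # n - 1 possible interior depths
    for values in product(depths, repeat=len(pairs)):
        D = dict(zip(pairs, values))
        # Blanket constraint: each set of 3 species forms exactly one triple shape.
        blanket = all(
            is_triple(D, i, j, k) or is_triple(D, i, k, j) or is_triple(D, j, k, i)
            for i, j, k in combinations(species, 3))
        # Given triples break the disjunctions.
        if blanket and all(is_triple(D, i, j, k) for i, j, k in triples):
            return D
    return None

consistent = solve("abcd", [("a", "b", "c"), ("b", "c", "d")])
inconsistent = solve("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")])
print(consistent is not None, inconsistent is None)
```

The enumeration is exponential in the number of matrix entries, so it is only usable for a handful of species; a constraint solver would instead propagate the ternary constraints during search.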

29 Some real data sets follow: Thorley’s thesis & Rod Page’s birds

30 Tree: (f,((d,e),((c,(a,b)),g))) Triples: {((a,b),c),((d,e),c),((c,b),e), ((e,b),f),((a,g),f)}

31 The tree is ultrametric & has an ultrametric matrix. Depth is our measure: the depth of mrca(X,Y).
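The depth matrix for the tree on the previous slide can be computed with a short Python sketch. It assumes the tree is written as nested tuples, that the root has depth 0, and that the matrix is stored as a dictionary keyed by unordered pairs of species; the function name `depth_matrix` is made up for this illustration.

```python
from itertools import combinations

def depth_matrix(tree):
    """Map each unordered pair of species to the depth of its mrca (root = 0)."""
    D = {}

    def walk(node, depth):
        if isinstance(node, str):
            return [node]
        groups = [walk(child, depth + 1) for child in node]
        # Species in different children meet for the first time at this node.
        for g1, g2 in combinations(groups, 2):
            for x in g1:
                for y in g2:
                    D[frozenset((x, y))] = depth
        return [leaf for g in groups for leaf in g]

    walk(tree, 0)
    return D

tree = ("f", (("d", "e"), (("c", ("a", "b")), "g")))
D = depth_matrix(tree)
for pair in ("ab", "ac", "ag", "de", "ad", "af"):
    print(pair, D[frozenset(pair)])
```

Because depth grows away from the root, the resulting matrix is min-ultrametric: for any 3 species the two smallest of the three depths tie.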

32

33

34 What’s the performance like? - time? - space?

35 Why bother with a constraint encoding? The challenge of it. Add side constraints, such as data on interior nodes or distances between species. Optimisation. Use variable & value ordering heuristics and state of the art CP search algorithms. Explanations: why are some species closer to each other? Why is species X excluded from the tree? Etc.

36 What are the (initial) challenges? Make triple(i,j,k) more efficient (I think this is easy, and so does Peter Nightingale). Something better than the O(n^3) ternary constraints. An encoding that provably answers the decision problem in polytime.

37 So where are we? Good question: we have tried real data, and we have a number of different micro-encodings. Are we in P for decision? Not sure yet. How about optimisation? We can see a way, by introducing penalties.

38 Questions?

