Phylogenetic Trees: Assumptions All existing species have a common ancestor Each species is descended from a single ancestor Each speciation gives rise to two derived species This leads to a ‘tree’ topology for geneology
Tree structure ‘Leaves’ are existing species Interior nodes are hypothetical common ancestors Tree may be rooted or unrooted Root (if it exists) is universal ancesestor gorillahumanbonobochimp
Limits of these assumptions Early life (probably wasn’t nicely ‘packaged’) Merger of symbiotes (origin of eukaryotes, plants) Lateral gene transfer
Characteristics Relatedness is inferred from features, or characters discrete character data –example: has feathers, number of fingers –data forms a n x m matrix distance data –example: sequence edit distance –data forms an n x n triangular matrix
Ordered or unordered characters Differences in characters are assumed to be the result of transitions from a previous uniform state –(unordered) Such transitions may occur between any two states –(ordered) They may occur only in a fixed sequence
Reversals Reversal - mutation into the primitive (ancestral) state Reversals are possible but unlikely Examples –nucleic acid sequence –toes of a mammal
Perfect phylogeny –Each edge in tree = one state transition –That is, the entire subtree must share this state Central problem of character state phylogeny: –input: set O of n objects. Set c of m characters. Set r of states. –Output (y/n): is there a perfect phylogeny for O?
Solve Perfect Phylogeny problem Problem 1 –acgac –accag –acgtt –tccag –acgat –aagtt Problem 2 –aagtt –atgtc –atgta –atgtt –aaatt –ttgta Problem 3 –aagtt –ttgtt –tcctt –tcaat –tcagt –acagt
Complexity of character problem Ordered states –the problem is polynomial Unordered states –If the number of states (r) is unbounded, PPP is NP-complete. –If the number of states is small (2, 3, 4) an algorithm which runs in reasonable time is known
Relaxing our assumptions Expecting perfect phylogeny is unrealistic, because –biological data contains errors –reversals to occur Two approaches to relaxation –minimize reversals (parsimony) –throw away minimum of offending characters (compatibility criterion)
Ideal distance matrix problem Input: a set O of objects and a (triangular) matrix M of pairwise distances between them Output: a tree in which the nodes are O and the paths have weights M
Ideal distance matrix problem Input: a set O of objects and a (triangular) matrix M of pairwise distances between them Output: a tree in which the nodes are O and the paths have weights M Nonideal considerations: reversals, convergence, error
Relaxed distance matrix problem Input: a set O of objects and a pair of (triangular) matrix M min and M max of pairwise distances between them Output: a tree in which the nodes are O and the path weights are bounded by M min and M max
Relaxed distance matrix problem Input: a set O of objects and a pair of (triangular) matrix M min and M max of pairwise distances between them Output: a tree in which the nodes are O and the path weights are bounded by M min and M max Tractable only for ultrametric trees (all leaves equidistant from root)
Neighbor joining A simple and commonly used heuristic while (|nodes| > 2) find nearest neighbors, A, B substitute with interior node at their midpoint