Phylogenetics-2 Marek Kimmel (Statistics, Rice)
Outline: distance trees and ultrametric distances; existence of a tree given a set of ultrametric distances; UPGMA method; Neighbor Joining method; Maximum Parsimony (independent reading).
Distance axioms
1. Nonnegativity: d(x, y) ≥ 0.
2. Nondegeneracy: d(x, y) = 0 ⇔ x = y.
3. Symmetry: d(x, y) = d(y, x).
4. Triangle inequality: d(x, y) ≤ d(x, z) + d(z, y).
For tree-derived distances:
5. Ultrametricity: for any three points, two of the distances are equal and the third is less than these two, e.g. d(x, y) < d(x, z) = d(z, y).
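Axioms 1 to 4 can be checked mechanically on a finite set of points. The following is a minimal sketch, not part of the original slides: the function name `metric_violations`, the tolerance `tol`, and the two example distance functions are all assumptions made for illustration.

```python
from itertools import permutations, product


def metric_violations(points, d, tol=1e-12):
    """Return descriptions of violated axioms 1-4 (empty list if none).

    `points` is a list of objects and `d(x, y)` a candidate distance function.
    """
    bad = []
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            bad.append(f"nonnegativity fails for ({x}, {y})")
        # d(x, y) = 0 must hold exactly when x and y are the same point
        if (abs(d(x, y)) <= tol) != (x == y):
            bad.append(f"nondegeneracy fails for ({x}, {y})")
        if abs(d(x, y) - d(y, x)) > tol:
            bad.append(f"symmetry fails for ({x}, {y})")
    for x, y, z in permutations(points, 3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            bad.append(f"triangle inequality fails for ({x}, {y}) via {z}")
    return bad


# Hypothetical example: three points on the real line.
points = [0.0, 1.0, 3.5]


def line_distance(x, y):
    return abs(x - y)


def squared_difference(x, y):
    # Nonnegative and symmetric, but it breaks the triangle inequality.
    return (x - y) ** 2


ok = metric_violations(points, line_distance)
broken = metric_violations(points, squared_difference)
```

Here `ok` is empty, while `broken` lists only triangle-inequality failures, which shows that axioms 1 to 3 alone do not make a function a distance.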
Ultrametricity: For any 3-subtree, d(x, y) < d(x, z) = d(z, y). If the distances are tree-derived, then all triplets are ultrametric. Conversely: if all triplets are ultrametric, do the distances uniquely define a tree?
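The three-point condition can be tested directly on a distance table. The sketch below is an illustration only: the helper names `symmetric` and `is_ultrametric` and the example distances are assumptions, not taken from the slides. It checks that the two largest distances in every triple are equal, so it also accepts the borderline case where all three are equal.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric lookup d[a][b] from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(labels, d, tol=1e-9):
    """True if, in every triple, the two largest distances are equal."""
    for x, y, z in combinations(labels, 3):
        _, middle, largest = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(largest - middle) > tol:
            return False
    return True


labels = ["A", "B", "C", "D"]

# Hypothetical distances derived from the rooted tree ((A,B),(C,D)).
tree_like = symmetric({
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 6, ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 6,
})

# Same table with one entry perturbed: the triple (A, B, C) now fails.
perturbed = symmetric({
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 5, ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 6,
})

tree_like_ok = is_ultrametric(labels, tree_like)
perturbed_ok = is_ultrametric(labels, perturbed)
```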
Proof of tree existence: The proof is constructive and proceeds by induction on the number of species, given a set of nodes with ultrametric distances. First step: construct a tree with 2 species.
Step m: Suppose a tree has been constructed for the first m species. Let r denote its root (the old root).
Step m+1: Take x and y as in the previous slide, together with the new species s_{m+1}. Suppose d(s_{m+1}, x) = d(s_{m+1}, y) (the other cases are handled similarly). Consequently, d(x, y) < d(s_{m+1}, x) = d(s_{m+1}, y). [Figure labels: r = old root; new root.]
Induction: Choose x and y and define … These distances are good for x and y; now check them for any z.
Remarks: The proofs for the other two cases are similar. The UPGMA method builds the same trees in a simpler way. UPGMA is not good for non-ultrametric distances, because the closest nodes do not have to be neighbors. The Neighbor Joining method is a remedy (to be continued …).
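To make the UPGMA remark concrete, here is a minimal sketch of the usual textbook formulation: repeatedly merge the two closest clusters, place the new node at half their distance, and update distances by size-weighted averaging. The slides do not spell the algorithm out, so the function name `upgma`, the nested-tuple tree representation, and the example distances are assumptions.

```python
from itertools import combinations


def upgma(labels, d):
    """Cluster leaves by UPGMA; d[a][b] is the distance between leaves a, b.

    Returns (tree, height): a nested tuple of labels and the root height.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (lab, 1) for i, lab in enumerate(labels)}
    # Pairwise distances between active clusters, keyed (smaller id, larger id).
    dist = {(i, j): d[labels[i]][labels[j]]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    height = 0.0
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)      # closest pair of clusters
        height = dist[(i, j)] / 2.0         # new node sits halfway between
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster:
        # average of the two old distances, weighted by cluster size.
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every entry that involves one of the merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, height


# Hypothetical ultrametric distances for the rooted tree ((A,B),(C,D)).
labels = ["A", "B", "C", "D"]
d = {
    "A": {"B": 2, "C": 6, "D": 6},
    "B": {"A": 2, "C": 6, "D": 6},
    "C": {"A": 6, "B": 6, "D": 4},
    "D": {"A": 6, "B": 6, "C": 4},
}
tree, root_height = upgma(labels, d)
```

On this ultrametric input the sketch recovers the grouping (A, B) and (C, D). On non-ultrametric input the first merge may join the closest pair even when they are not neighbors in the true tree, which is the failure mode noted in the remarks.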
Neighbor-joining distance: The neighbor-joining “distance” is not a distance, but it satisfies the following theorem. Theorem: Suppose S is a set of species and d is a tree-derived distance on S obtained from an unrooted tree (so, not necessarily ultrametric). If x and y are such that the neighbor-joining “distance” of (x, y) is minimum, then x and y are neighbors.
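The theorem can be illustrated numerically. The slide text does not show the formula for the neighbor-joining “distance”, so the sketch below assumes the commonly used criterion Q(x, y) = (N − 2) d(x, y) − Σ_z d(x, z) − Σ_z d(y, z). The function name `nj_criterion` and the example tree are also assumptions.

```python
from itertools import combinations


def nj_criterion(labels, d):
    """Assumed neighbor-joining criterion for every pair of leaves.

    Q(x, y) = (N - 2) * d(x, y) - sum_z d(x, z) - sum_z d(y, z)
    """
    n = len(labels)
    row_sum = {x: sum(d[x][z] for z in labels if z != x) for x in labels}
    return {(x, y): (n - 2) * d[x][y] - row_sum[x] - row_sum[y]
            for x, y in combinations(labels, 2)}


# Hypothetical unrooted tree ((A,B),(C,D)) with leaf branch lengths
# A=1, B=3, C=1, D=3 and internal branch length 1. It is not ultrametric.
labels = ["A", "B", "C", "D"]
d = {
    "A": {"B": 4, "C": 3, "D": 5},
    "B": {"A": 4, "C": 5, "D": 7},
    "C": {"A": 3, "B": 5, "D": 4},
    "D": {"A": 5, "B": 7, "C": 4},
}

q = nj_criterion(labels, d)
closest_pair = min(combinations(labels, 2), key=lambda p: d[p[0]][p[1]])
best_pair = min(q, key=q.get)
```

In this example the closest pair by raw distance is (A, C), which are not neighbors, while the criterion is minimized by (A, B), tied with (C, D). Both of those pairs are true neighbor pairs in the assumed tree.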
Proof for N = 4: In a 4-tree, all leaves have neighbors. For the general proof, see the book. For the N.-J. algorithm, see the book.
Gene splitting versus population splitting: Diagram showing that gene splitting (G) usually occurs earlier than population splitting (P) if the population is genetically polymorphic at time P. The evolutionary history of gene splitting resulting in the six alleles denoted a-f is shown in solid lines, and population splitting is shown in broken lines. After Nei (1987).