PHYLOGENETIC TREES
Dwyane George
February 24,
Outline
- Introduction & Motivation
- Definitions
- Algorithm & Proof of Correctness
  - Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
- Algorithm Runtime
Key Ideas
- Phylogenetic trees represent inferred evolutionary relationships.
- They are constructed by various methods, for example:
  - Clustering (distance-based methods such as UPGMA)
  - Maximum likelihood estimation
Definitions
- Phylogeny: the evolutionary relationships among species, populations, individuals, or genes (collectively called taxa).
- Phylogenetic tree: a tree, i.e. a collection of nodes and edges, showing the inferred evolutionary relationships among biological species or other entities.
  - Closely related taxa are placed near each other; evolutionarily distant taxa are far apart.
  - Trees can be rooted or unrooted.
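Number of Trees
Theorem (Cavalli-Sforza & Edwards): The number of rooted binary phylogenetic trees with n labeled leaves (taxa), n ≥ 2, is

$$ (2n-3)!! = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} $$

Proof: by induction on n. A rooted binary tree with n-1 leaves has 2n-4 edges. The n-th leaf can be attached by subdividing any of these edges or by placing a new root above the old one, so each tree with n-1 leaves yields 2n-3 distinct trees with n leaves.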
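The count can be checked by brute force. The sketch below enumerates rooted binary trees exactly as in the induction step (attach each new leaf to every edge, or above the root) and compares the result with the closed form. Trees are represented as nested 2-tuples of integer leaf labels; the function names are hypothetical, chosen only for illustration.

```python
from math import factorial


def num_rooted_trees(n: int) -> int:
    """Closed form (2n-3)! / (2^(n-2) (n-2)!) for n >= 2 labeled leaves."""
    if n < 1:
        raise ValueError("need at least one leaf")
    if n == 1:
        return 1  # a single leaf is the only possible tree
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def attach(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`
    or above its root. Trees are nested 2-tuples; leaves are ints."""
    # Subdivide the edge above this subtree (a new root at the top level).
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach(left, leaf):
            yield (new_left, right)
        for new_right in attach(right, leaf):
            yield (left, new_right)


def enumerate_rooted_trees(n: int):
    """All rooted binary trees on leaves 0..n-1, built one leaf at a time."""
    trees = [0]
    for leaf in range(1, n):
        trees = [t for old in trees for t in attach(old, leaf)]
    return trees


for n in range(1, 8):
    assert len(enumerate_rooted_trees(n)) == num_rooted_trees(n)
print([num_rooted_trees(n) for n in range(1, 8)])  # [1, 1, 3, 15, 105, 945, 10395]
```

A subtree with k leaves has 2k-1 nodes, so `attach` yields 2k-1 trees, matching the 2n-3 choices in the proof.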
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
Let $d_{ij}$ denote the distance between the $i$th and $j$th taxa. The distance matrix D is symmetric with zeros on the diagonal, so only the lower triangle is shown:

Species   A          B          C          D
A         0          -          -          -
B         $d_{AB}$   0          -          -
C         $d_{AC}$   $d_{BC}$   0          -
D         $d_{AD}$   $d_{BD}$   $d_{CD}$   0
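UPGMA Algorithm
1. Initialize each taxon as its own cluster of size 1, at height 0.
2. Find the two clusters $C_i$ and $C_j$ with the smallest distance, $d_{ij} = \min(D)$, and merge them into $C_k = C_i \cup C_j$, placed at height $d_{ij}/2$.
3. Update the distance matrix: remove the rows and columns of $C_i$ and $C_j$, and add a row for $C_k$ holding its distance to every other cluster $C_m$, averaged over all taxa (each weighted by cluster size):
   $d_{km} = \dfrac{|C_i|\,d_{im} + |C_j|\,d_{jm}}{|C_i| + |C_j|}$
   When $|C_i| = |C_j|$ this reduces to $\tfrac{1}{2}(d_{im} + d_{jm})$; applying the $\tfrac{1}{2}$ rule to every merge gives the weighted variant, WPGMA, instead.
4. Repeat steps 2 and 3 until all taxa are in a single cluster ($n-1$ merges in total).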
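In UPGMA the distance between two clusters is the plain mean of all pairwise taxon distances between them, which is why the recursive update in step 3 must weight each side by its cluster size. The sketch below checks this on a small hypothetical 4-taxon matrix; the helper names are illustrative, not from any library.

```python
from itertools import product


def mean_pairwise_distance(dist, cluster_a, cluster_b):
    """Average of d_ab over all taxa a in cluster_a and b in cluster_b."""
    values = [dist[a][b] for a, b in product(cluster_a, cluster_b)]
    return sum(values) / len(values)


def updated_distance(d_im, d_jm, size_i, size_j):
    """Size-weighted UPGMA update for d_km, where C_k = C_i U C_j."""
    return (size_i * d_im + size_j * d_jm) / (size_i + size_j)


# Hypothetical 4-taxon matrix; indices 0..3 stand for A, B, C, D.
dist = [[0, 2, 4, 9],
        [2, 0, 6, 9],
        [4, 6, 0, 12],
        [9, 9, 12, 0]]

C_i, C_j, C_m = [2], [0, 1], [3]  # C_i = {C}, C_j = {A, B}, C_m = {D}
d_im = mean_pairwise_distance(dist, C_i, C_m)  # 12.0
d_jm = mean_pairwise_distance(dist, C_j, C_m)  # 9.0

print(updated_distance(d_im, d_jm, len(C_i), len(C_j)))  # 10.0
print(mean_pairwise_distance(dist, C_i + C_j, C_m))      # 10.0
print((d_im + d_jm) / 2)  # 10.5: the 1/2 rule (WPGMA) gives a different value
```

The recursive update lets each merge run in O(n) time instead of re-averaging over all taxon pairs.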
UPGMA Implementation
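A minimal, naive sketch of the algorithm steps, assuming the distances are given as a full symmetric matrix and using the size-weighted update. The function name `upgma` and the choice to return a Newick string with branch lengths are illustrative assumptions, not a fixed interface. Each of the n-1 merges scans all remaining pairs, so this version runs in O(n^3) time.

```python
from itertools import combinations


def upgma(labels, dist):
    """Naive UPGMA returning a Newick string with branch lengths.

    labels: taxon names; dist: symmetric matrix (list of lists) with
    dist[i][j] = d_ij and zeros on the diagonal.
    """
    def pair(a, b):
        # Each unordered pair is stored once, smaller cluster id first.
        return (a, b) if a < b else (b, a)

    # Active clusters: id -> (Newick subtree, number of taxa, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # Step 2: closest pair of active clusters.
        (i, j), d_ij = min(d.items(), key=lambda item: item[1])
        del d[(i, j)]
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        k, next_id = next_id, next_id + 1
        height = d_ij / 2  # new node sits halfway between the merged clusters

        # Step 3: size-weighted average distance from C_k to every other cluster.
        for m in clusters:
            d_im = d.pop(pair(i, m))
            d_jm = d.pop(pair(j, m))
            d[pair(m, k)] = (size_i * d_im + size_j * d_jm) / (size_i + size_j)

        # Branch length of each child = new height - child's height.
        subtree = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        clusters[k] = (subtree, size_i + size_j, height)

    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


labels = ["A", "B", "C", "D"]
dist = [[0, 2, 4, 9],
        [2, 0, 6, 9],
        [4, 6, 0, 12],
        [9, 9, 12, 0]]
print(upgma(labels, dist))  # (D:5,(C:2.5,(A:1,B:1):1.5):2.5);
```

Every leaf in the output is 5 units from the root, so the result is ultrametric even though this input matrix is not. Ties in `min` are broken by dictionary order, which is one arbitrary but deterministic choice.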
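UPGMA Correctness
Definition: In an ultrametric tree, all leaves (pendant vertices) are equidistant from the root, which corresponds to a constant molecular clock. A distance matrix comes from an ultrametric tree exactly when, for every three taxa i, j, k, the two largest of $d_{ij}$, $d_{ik}$, $d_{jk}$ are equal (the three-point condition).
- UPGMA always builds an ultrametric tree: the node joining $C_i$ and $C_j$ is placed at height $d_{ij}/2$, so all leaves below any node are equidistant from it.
- It is a greedy algorithm: at each step it picks the locally optimal (closest) pair, building the tree from the leaves to the root.
- If the input distances are ultrametric, UPGMA recovers the topologically correct tree; if evolutionary rates differ across lineages (non-ultrametric data), it can return an incorrect topology.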
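Whether UPGMA's correctness guarantee applies can be tested directly on the input. The sketch below checks the three-point condition for every triple of taxa, assuming a symmetric matrix with zeros on the diagonal; `is_ultrametric` and its `tol` parameter are hypothetical names, and the tolerance is there only to absorb floating-point noise.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """True if every triple satisfies the three-point condition:
    the two largest of d_ij, d_ik, d_jk are equal (within tol)."""
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if largest - middle > tol:
            return False
    return True


ultra = [[0, 2, 6, 10],
         [2, 0, 6, 10],
         [6, 6, 0, 10],
         [10, 10, 10, 0]]
clock_violated = [[0, 2, 4, 9],
                  [2, 0, 6, 9],
                  [4, 6, 0, 12],
                  [9, 9, 12, 0]]
print(is_ultrametric(ultra))           # True
print(is_ultrametric(clock_violated))  # False: d_AB=2, d_AC=4, d_BC=6
```

The check costs O(n^3) triples, the same order as the naive UPGMA itself.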
UPGMA Algorithm Runtime
Total runtime: O(n^3).
Possible speedup to O(n^2) by finding the closest pair in linear time per merge (Gronau & Moran, 2006), using a quad-tree data structure.

Operation                                  Time per call   Number of calls   Total time
Hierarchical clustering (find min of D)    O(n^2)          O(n)              O(n^3)
Update D matrix                            O(n)            O(n)              O(n^2)
UPGMA (overall)                            -               -                 O(n^3)
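Summing the per-merge costs makes the cubic bound explicit. Assuming the naive scan, when m clusters remain, finding the closest pair costs O(m^2) and updating the matrix costs O(m), with m running from n down to 2:

$$ T(n) = \sum_{m=2}^{n} \bigl( O(m^2) + O(m) \bigr) = O\!\left( \sum_{m=1}^{n} m^2 \right) = O\!\left( \frac{n(n+1)(2n+1)}{6} \right) = O(n^3) $$

If the closest pair can instead be found in O(m) time per merge, the same sum becomes $\sum_{m=2}^{n} O(m) = O(n^2)$, which is the bound targeted by the speedup.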