Tree Reconstruction
Phylogenetic tree
Nodes – DNA (RNA, mtDNA) sequences, proteins, species = taxonomic units (TUs)
Branches – ancestral relations between TUs
Terminal (extant) nodes, leaves – OTUs (O for operational)
Tree reconstruction
Neighbor joining (distance) methods
Maximum parsimony methods (W. Fitch)
Maximum likelihood methods (J. Felsenstein)
W. H. Li, “Molecular Evolution”, 1997
Rooted, unrooted trees
[Figure: an unrooted tree and a rooted tree, each on the five OTUs A, B, C, D, E]
How many genealogical trees can we propose for a given number of terminal nodes?
n – number of OTUs
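NR – number of rooted trees, NU – number of unrooted trees

 n            NR            NU
 2             1             1
 3             3             1
 4            15             3
 5           105            15
 6           945           105
 7        10 395           945
 8       135 135        10 395
 9     2 027 025       135 135
10    34 459 425     2 027 025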
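The counts for bifurcating trees follow the standard closed forms NR = (2n-3)!! and NU = (2n-5)!!, where !! denotes the product of odd integers. A minimal sketch for checking table entries; the function names are illustrative only.

```python
def odd_double_factorial(k):
    """Product 1 * 3 * 5 * ... * k for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n):
    """Number of rooted bifurcating trees on n >= 2 OTUs: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    return odd_double_factorial(2 * n - 3)


def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees on n >= 2 OTUs: (2n-5)!!."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    return odd_double_factorial(2 * n - 5)


for n in range(2, 11):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

Note that NU for n OTUs equals NR for n-1 OTUs, since placing a root on any of the 2n-3 branches of an unrooted tree gives a distinct rooted tree.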
Neighbor joining
UPGMA – unweighted pair group method with arithmetic mean
Start from the distance matrix
(*) Find the pair of OTUs with the minimum distance and merge them
Update the distance matrix, go to (*) until a single cluster remains
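The UPGMA loop can be sketched as follows. This is a minimal illustration, not a reference implementation: the data layout (a dict keyed by `frozenset` pairs), the nested-tuple tree, and the example distances are all assumptions made for the sketch. The distance from a merged cluster to any other cluster is the size-weighted arithmetic mean of the two old distances.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels (strings).
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns the tree as nested 2-tuples of labels.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of OTUs
    d = dict(dist)                          # working copy of the matrix
    while len(clusters) > 1:
        # (*) find the pair of clusters with the minimum distance
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # update the matrix: size-weighted mean of the old distances
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances, chosen only to exercise the loop.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree = upgma(names, dist)
print(tree)
```

Branch heights are omitted to keep the sketch short; in UPGMA each merge would be placed at half the distance between the two merged clusters.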
Maximum parsimony
The principle of maximum parsimony: search for the tree that requires the smallest number of evolutionary changes to explain the differences among the OTUs
Informative sites
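Informative sites are not defined here. Assuming the usual definition, in which a site is informative when at least two different states each occur in at least two OTUs, they can be picked out of an alignment as sketched below. The function names and the toy alignment are illustrative assumptions.

```python
from collections import Counter


def is_informative(column):
    """True if at least two states each occur in at least two OTUs."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(alignment):
    """Return 0-based indices of informative sites.

    alignment: list of equal-length sequence strings, one per OTU.
    """
    return [i for i, col in enumerate(zip(*alignment)) if is_informative(col)]


# Toy alignment of four OTUs (hypothetical data).
print(informative_sites(["AAGT", "AGGC", "AGAT", "GAAC"]))
```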
Assume a topology of the tree; for each site compute the minimal number of mutations needed to explain the configuration
Rule: the set at an interior node is the intersection of the sets at its two immediate descendants if the intersection is not empty; otherwise it is the union of the descendant sets
Each time a union is taken, one mutation is counted
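The intersection/union rule can be applied recursively from the leaves to the root. A minimal sketch for a single site, assuming the tree is given as nested 2-tuples of OTU labels and the observed states as a dict; both representations and the example states are assumptions of this sketch.

```python
def fitch(tree, states):
    """Apply the intersection/union rule at one site.

    tree:   an OTU label (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping OTU label -> observed nucleotide.
    Returns (state set at this node, number of mutations counted below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_count = fitch(left, states)
    right_set, right_count = fitch(right, states)
    common = left_set & right_set
    if common:
        # non-empty intersection: no extra mutation needed here
        return common, left_count + right_count
    # empty intersection: take the union and count one mutation
    return left_set | right_set, left_count + right_count + 1


# Hypothetical topology and states for five OTUs.
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "C", "B": "T", "C": "G", "D": "T", "E": "A"}
root_set, mutations = fitch(tree, states)
print(root_set, mutations)
```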
The index of the tree is the sum of the per-site indices (minimal numbers of mutations) over all informative sites
Go through all possible trees to search for the optimal one
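For four OTUs there are only three unrooted topologies, so the exhaustive search can be shown in full. The sketch below is an illustration under stated assumptions: trees are nested 2-tuples rooted at an arbitrary branch (the mutation count does not depend on where the root is placed), the alignment is invented, and the sum runs over all sites rather than only informative ones, which adds the same amount to every tree and so leaves the ranking unchanged.

```python
def site_index(tree, column):
    """Minimal number of mutations at one site (intersection/union rule)."""
    def visit(node):
        if isinstance(node, str):
            return {column[node]}, 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return visit(tree)[1]


def tree_index(tree, alignment):
    """Sum of per-site indices; alignment maps OTU label -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        site_index(tree, {otu: seq[i] for otu, seq in alignment.items()})
        for i in range(length)
    )


# The three unrooted topologies for OTUs A, B, C, D.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
alignment = {"A": "AAGT", "B": "AAGC", "C": "GGAT", "D": "GGAC"}  # toy data
best = min(candidates, key=lambda t: tree_index(t, alignment))
print(best, tree_index(best, alignment))
```

Enumerating every topology is feasible only for small n, since the number of trees grows as a product of odd integers.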
Maximum likelihood
Need a probabilistic model for nucleotide substitution
A, C, T, G – 1, 2, 3, 4
We analyze the evolution of one site S
Given S = i at time = 0, what is the probability that S = j at time = t?
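No particular substitution model is named here. As one concrete possibility, the sketch below assumes the one-parameter Jukes-Cantor model, in which every nucleotide changes to each of the other three at the same rate `alpha`. Under that assumption the probability of S = j at time t given S = i at time 0 is 1/4 + (3/4)exp(-4 alpha t) when i = j and 1/4 - (1/4)exp(-4 alpha t) otherwise.

```python
import math


def jc69_prob(i, j, t, alpha=1.0):
    """P(S = j at time t | S = i at time 0) under the Jukes-Cantor model.

    i, j:  nucleotide states (letters or codes 1..4; only equality is used).
    t:     elapsed time, t >= 0.
    alpha: rate of change to each of the three other nucleotides (assumed).
    """
    decay = math.exp(-4.0 * alpha * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# Each row of the transition matrix sums to one.
row = [jc69_prob("A", j, 0.3) for j in "ACTG"]
print(row, sum(row))
```

As t grows, all four probabilities approach 1/4, so the starting state is eventually forgotten.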
Compute the likelihood function for a given tree
Go through all possible trees to search for the optimal one
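For one site, the likelihood of a given tree can be computed by summing over the unknown states at the interior nodes, working from the leaves toward the root. The sketch below is one way to do this, with several assumptions that are not stated in the text: the Jukes-Cantor model for substitutions, equal frequencies of 1/4 at the root, a tree encoded as nested tuples of (child, branch length) pairs, and invented branch lengths.

```python
import math

STATES = "ACGT"


def jc69(i, j, t, alpha=1.0):
    """Jukes-Cantor transition probability (model assumed for this sketch)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def partial(node, column):
    """Return {state: P(observed leaves below node | node is in state)}.

    node:   an OTU label (str) or a tuple of (child, branch_length) pairs.
    column: dict mapping OTU label -> observed nucleotide at this site.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == column[node] else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in node:
        below = partial(child, column)
        for s in STATES:
            # sum over the unknown state x at the lower end of the branch
            result[s] *= sum(jc69(s, x, t) * below[x] for x in STATES)
    return result


def site_likelihood(tree, column):
    """Likelihood of one site, with equal root frequencies (assumed)."""
    root = partial(tree, column)
    return sum(0.25 * root[s] for s in STATES)


# Hypothetical three-OTU tree with made-up branch lengths.
tree = (((("A", 0.1), ("B", 0.1)), 0.2), ("C", 0.3))
print(site_likelihood(tree, {"A": "G", "B": "G", "C": "T"}))
```

If sites are treated as independent, the likelihood of the whole alignment is the product of the per-site values, usually accumulated as a sum of logarithms.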
Software
PHYLIP (Phylogeny Inference Package)
Version 3.57c, by Joseph Felsenstein, July 1995