Phylogenetic Inference
  Data
  Optimality Criteria
  Algorithms
  Results
  Practicalities
9/20/2018 Chuck Staben
Our Goals
  Infer Phylogeny
    Optimality criteria
    Algorithm
  Phylogenetic inference (interesting ones)
Watch Out
  “The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.”
  “…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”
Phylogenetic Models
  No transfer of genetic information by hybridization
  All sequences are homologous
  Each position in alignment homologous
  Observed variation is valid sample from included group
  Positions evolve independently
Steps in Analysis
  Data Model (Alignment)
    alignment method
    “trimming” to a phylogenetic set
  DNA base substitution model
  Build Trees
    Algorithm based vs Criterion based
    Distance based vs Character-based
Choice of Input Data
  Data Type
    Aligned sequences, RFLP, morphological data…
  Molecule of interest
    rRNA (general purpose)
    interesting character
  Number/type of taxa
    ingroup and outgroup
  Informative
rRNA Genes
  Conserved across kingdoms
  Varies within species
  Widely sequenced, easy
  Long, lots of characters
  Duplication?
Multiple Alignment Method
  Computer dependence
  Phylogenetic assumptions
  Alignment parameters (substitution matrix, gap cost)
  Aligned features: primary sequence, structure
  Optimization: statistical, non-statistical
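To make the role of the alignment parameters concrete, here is a minimal sketch of a pairwise global alignment score (Needleman-Wunsch with a linear gap cost). It is the pairwise case only, not a multiple alignment method, and the function name and the default values for `match`, `mismatch`, and `gap` are illustrative assumptions, not parameters taken from any particular program.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b.

    match/mismatch stand in for a (very simple) substitution matrix;
    gap is a linear per-position gap cost. All three are assumed values.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))           # default gap cost
print(global_alignment_score("ACGT", "AGT", gap=-5))   # harsher gap cost
```

Changing the gap cost changes the score, and in general it can change which alignment is optimal, which is why the parameters are part of the phylogenetic data model rather than a neutral preprocessing detail.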
Typical Alignment Method
  CLUSTAL, then manual editing
  Manual editing for phylogeny
  Phylogenetic assumption in guide tree
  Parameters: a priori and dynamic
  Primary structure (with some “influence”)
  Optimization: non-statistical
Substitution Models
  G to A, C to T versus N to N
  amino acid substitution
  forwards and backwards identical?
  site-to-site variation
  Simpler model better
  Estimate from "quick" tree building, observed variation
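The contrast between "G to A, C to T" (transitions) and "N to N" (any change) can be sketched with two standard distance corrections: Jukes-Cantor, which treats all substitutions alike, and Kimura's two-parameter model, which counts transitions and transversions separately. The function names are hypothetical, and skipping gaps and ambiguity codes is an assumption made for this sketch.

```python
import math

PURINES = {"A", "G"}


def count_differences(s1, s2):
    """Return (sites compared, transitions, transversions) for two aligned sequences.

    Columns with gaps or ambiguity codes in either sequence are skipped (an assumption).
    """
    n = ts = tv = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x != y:
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            if (x in PURINES) == (y in PURINES):
                ts += 1
            else:
                tv += 1
    return n, ts, tv


def jc69(s1, s2):
    """Jukes-Cantor distance: one rate for every kind of substitution."""
    n, ts, tv = count_differences(s1, s2)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p(s1, s2):
    """Kimura two-parameter distance: separate transition and transversion rates."""
    n, ts, tv = count_differences(s1, s2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both corrections return a distance larger than the raw proportion of differing sites, because some sites are expected to have changed more than once.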
Tree-Building Methods
  Distance
    UPGMA, NJ, FM, ME
  Character
    Maximum Parsimony (PAUP)
    Maximum Likelihood (PHYLIP)
  Acrimonious Debates
Distance Methods
  Measure distance (dissimilarity)
  Accurate if distances are all summative (ultrametric)
    NEVER true over large distance
  Methods
    UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
    NJ (Neighbor Joining)
    FM (Fitch-Margoliash)
    ME (Minimum Evolution)
  Most Often Wrong! CLUSTAL
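As an illustration of how a distance method turns a matrix into a tree, here is a minimal UPGMA sketch. The nested-tuple tree representation `(left, right, height)` and the function name are assumptions made for this example; UPGMA itself assumes ultrametric distances, which is the condition the slide warns rarely holds.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    dist[i][j] is the distance between names[i] and names[j].
    Returns a nested tuple (left, right, height), where height is the
    distance from the new node down to its leaves.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        new = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new)
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical ultrametric example: A and B are close, C is distant from both.
print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
```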
Which Distance Method?
  UPGMA
    Least accurate, most used
  NJ
    EXTREMELY RAPID
    GIVES ONLY 1 TREE
  ME and FM seem best
    Minimize tree path lengths
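Part of why NJ is so rapid is that each step is a simple matrix calculation. The sketch below shows only the first join: it computes the standard Q criterion for every pair, picks the minimum, and returns the branch lengths from the two joined taxa to their new node. The function name and the example matrix are illustrative assumptions; a full implementation would repeat this step on the reduced matrix.

```python
def nj_first_join(dist):
    """Pick the first pair neighbor joining would merge (requires >= 3 taxa).

    Returns ((i, j), (length_i, length_j)): the indices of the joined taxa
    and their branch lengths to the new internal node.
    """
    n = len(dist)
    r = [sum(row) for row in dist]  # total distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small when i and j are close to each other
            # but far from everything else.
            q = (n - 2) * dist[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    length_i = dist[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return (i, j), (length_i, length_j)


# Small illustrative matrix for five taxa a..e (made-up values).
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(nj_first_join(D))
```

Note that the pair with the smallest raw distance here is d and e, yet the Q criterion joins a and b first; unlike UPGMA, NJ does not assume equal rates along all lineages.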
Character Methods
  Maximum Parsimony
    minimal changes to produce data
    can use different substitution models
  Maximum Likelihood
    turns problem “inside out”
    coin flip analogy
    increasingly popular
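"Minimal changes to produce data" can be sketched with Fitch's algorithm, which counts the minimum number of state changes a given tree requires for each alignment column under equal-cost substitutions. The tree encoding (leaf names as strings, internal nodes as pairs) and the function names are assumptions for this example; it scores a tree that is already given and does not search for the best one.

```python
def fitch(tree, states):
    """Return (possible state set, minimum changes) for one site.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping leaf name -> observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of an alignment (dict name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[k] for name, seq in alignment.items()})[1]
               for k in range(length))


# Hypothetical four-taxon alignment and two candidate trees.
aln = {"A": "AAG", "B": "AAG", "C": "ATG", "D": "ATC"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))
print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The tree with the lower score is the more parsimonious explanation of these data.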
Searching for Trees
Tree Search Algorithms
  Exhaustive
    VERY INTENSIVE
  Branch and Bound
    Compromise
  Heuristic
    FAST (usually start with NJ)
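Why exhaustive search is "very intensive" follows from the number of possible trees. For n taxa there are (2n-5)!! = 1 × 3 × 5 × … × (2n-5) distinct unrooted, fully bifurcating trees. A minimal sketch (the function name is an assumption):

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees for n_taxa >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
```

Ten taxa already give about two million trees, so exhaustive evaluation is realistic only for small data sets; branch and bound prunes part of this space, and heuristics sample it.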
Evaluating Trees
  Consensus Tree
  Randomized Trees
    Skewness tests
  Randomized Character Data
    Permutation tests
  Bootstrap, Jackknife
    resampling techniques
    >70% probably correct; 50% overestimates accuracy
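The resampling step behind bootstrap and jackknife support values can be sketched as follows. Bootstrap draws alignment columns with replacement to build a pseudo-alignment of the original length; jackknife keeps a random subset of columns without replacement. The function names and the 50% default for `keep` are assumptions; in practice a tree is rebuilt from each replicate and support is the fraction of replicates containing a given group.

```python
import random


def _alignment_length(alignment):
    return len(next(iter(alignment.values())))


def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement; the same columns are used for every taxon."""
    length = _alignment_length(alignment)
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def jackknife_alignment(alignment, rng, keep=0.5):
    """Keep a random fraction of columns, each at most once, in original order."""
    length = _alignment_length(alignment)
    cols = sorted(rng.sample(range(length), int(length * keep)))
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical three-taxon alignment; a seeded generator keeps runs repeatable.
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTA"}
rng = random.Random(0)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
```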
Rooting Trees
  Molecular Clock
    Root=midpoint, longest span
    Almost ALWAYS WRONG
  Extrinsic Evidence
    select fungus as root for plants, eg
    long branch attraction can be problem
  Paralog rooting
    long branch problems
Tree Congruence
  Tree-to-Tree Comparison
    2 different characters/same groups
  Important for evaluating biological hypotheses
    lentiviruses diverged within their current hosts only
    plant pathogenicity has arisen many times in fungi
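One common way to quantify a tree-to-tree comparison is the Robinson-Foulds distance: the number of bipartitions (splits) of the taxa found in one tree but not the other. The slide does not name a specific measure, so this is offered as one possible choice; the nested-tuple tree encoding and the function names are assumptions for the sketch.

```python
def leaves(tree):
    """Set of leaf names under a node (leaf = str, internal node = tuple of children)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the taxa implied by the tree's internal branches."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # Skip splits that separate zero or one taxon from the rest: every tree has them.
        if 1 < len(clade) < len(all_taxa) - 1:
            found.add(frozenset([clade, all_taxa - clade]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(t1, t2):
    """Count splits present in exactly one of the two trees (0 = same topology)."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(t1) ^ splits(t2))


# Hypothetical trees for the same five taxa, e.g. built from two different genes.
gene1 = (("A", "B"), ("C", ("D", "E")))
gene2 = (("A", "C"), ("B", ("D", "E")))
print(robinson_foulds(gene1, gene2))
```

Because each split is stored as an unordered pair of taxon sets, the result does not depend on where either tree happens to be rooted.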
Common Software
  GCG
    UNIX (Seqanal)
    Pileup, Lineup, Paupsearch, Paupdisplay
  PAUP
    PAUPSTAR (MACs best!)
  PHYLIP
    UNIX (Seqanal)
Phylogenetic Stories
  HIV
    complete genome accessible
    evolution rapid
    selection, neutralism?
    human interest (dentist and his patients, e.g.)
  Coevolution, host and pathogen
  Big Tree
Phylogenetic Resources
  NCBI Taxonomy Browser
    http://www.ncbi.nlm.nih.gov/Taxonomy/
  RDP database
    http://rdpwww.life.uiuc.edu/
  “Tree of Life”
    http://phylogeny.arizona.edu/tree/phylogeny.html
Practicalities
  Quality of input data critical
  Examine data from all possible angles
    distance, parsimony, likelihood
  Outgroup taxon critical
    problem if outgroup shares a selective property with a subset of ingroup
  Order of input can be problematic
    Jumble them!
Trees
  plagiarized by Chuck Staben, 1998
  Sergeant Joyce Kilmer, 1914