1
Phylogenetic Inference
  Data
  Optimality Criteria
  Algorithms
  Results
  Practicalities
9/20/2018 Chuck Staben
2
Our Goals
  Infer Phylogeny
  Phylogenetic inference
  Optimality criteria
  Algorithm
  Phylogenetic inference (interesting ones)
3
Watch Out
  “The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.”
  “…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”
4
Phylogenetic Models
  No transfer of genetic information by hybridization
  All sequences are homologous
  Each position in alignment homologous
  Observed variation is valid sample from included group
  Positions evolve independently
5
Steps in Analysis
  Data Model (Alignment)
    alignment method
    “trimming” to a phylogenetic set
  DNA base substitution model
  Build Trees
    Algorithm based vs Criterion based
    Distance based vs Character-based
6
Choice of Input Data
  Data Type
    Aligned sequences, RFLP, morphological data…
  Molecule of interest
    rRNA (general purpose)
    interesting character
  Number/type of taxa
    ingroup and outgroup
  Informative
7
rRNA Genes
  Conserved across kingdoms
  Varies within species
  Widely sequenced, easy
  Long, lots of characters
  Duplication?
8
Multiple Alignment Method
  Computer dependence
  Phylogenetic Assumptions
  Alignment parameters (substitution matrix, gap cost)
  Aligned features
    primary sequence, structure
  Optimization
    statistical, non-statistical
9
Typical Alignment Method
  CLUSTAL, then manual editing
  Manual editing for phylogeny
  phylogenetic assumption in guide tree
  parameters a priori and dynamic
  primary structure (with some “influence”)
  optimization non-statistical
10
Substitution Models
  G to A, C to T versus N to N
  amino acid substitution
  forwards and backwards identical?
  site-to-site variation
  Simpler model better
  Estimate from "quick" tree building, Observed Variation
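A substitution model turns raw differences between two aligned sequences into an estimated evolutionary distance. The sketch below is only an illustration, not the method used in the lecture: it counts transitions (G/A, C/T) separately from transversions and applies two standard corrections, Jukes-Cantor (all substitutions equally likely) and Kimura two-parameter (transitions and transversions at different rates). The function names are hypothetical, and sites with gaps or ambiguity codes are simply skipped, which is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions, compared


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), p = fraction of differing sites."""
    ts, tv, n = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) fractions."""
    ts, tv, n = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both corrections fail (math.log raises ValueError) when the sequences are so diverged that the log argument is not positive, which is one concrete sense in which distances break down over large divergence.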
11
Tree-Building Methods
  Distance
    UPGMA, NJ, FM, ME
  Character
    Maximum Parsimony (PAUP)
    Maximum Likelihood (PHYLIP)
  Acrimonious Debates
12
Distance Methods
  Measure distance (dissimilarity)
  Accurate if distances are all summative (ultrametric)
    NEVER true over large distance
  Methods
    UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
    NJ (Neighbor Joining)
    FM (Fitch-Margoliash)
    ME (Minimum Evolution)
  Most Often Wrong!
  CLUSTAL
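The condition on the distances can be tested directly on a distance matrix. Strictly speaking, "summative" (additive) and "ultrametric" are two different properties: ultrametric distances are always additive, but additive distances need not be ultrametric. The sketch below, with hypothetical function names, checks the standard three-point condition (ultrametric) and four-point condition (additive) on a symmetric matrix given as a list of lists.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quadruple, the two largest pair-sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances read off a tree where taxon B sits on a long branch (taxa A, B, C, D).
long_branch = [
    [0, 4, 3, 3],
    [4, 0, 5, 5],
    [3, 5, 0, 2],
    [3, 5, 2, 0],
]

# Distances read off a clock-like tree ((A,B),(C,D)).
clock_like = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
```

Here long_branch passes the additive test but fails the ultrametric one, while clock_like passes both. The four-point check assumes the input is already a metric, and with fewer than four taxa it passes trivially.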
13
Which Distance Method?
  UPGMA
    Least accurate, most used
  NJ
    EXTREMELY RAPID
    GIVES ONLY 1 TREE
  ME and FM seem best
    Minimize tree path lengths
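To make the simplest of these methods concrete, here is a minimal UPGMA sketch. It is an illustration under stated assumptions, not any particular package's implementation: taxa are plain strings, the matrix is a symmetric list of lists, ties are broken by first occurrence, and the tree is returned as nested tuples of the form (left, right, height), where height is half the distance at which the two clusters were joined.

```python
def upgma(names, d):
    """Cluster taxa by UPGMA and return nested tuples (left, right, height)."""
    # cluster id -> (subtree, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances (arithmetic mean over taxa).
        new_dist = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        dist = {pair: v for pair, v in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j, dij / 2), size_i + size_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


example = upgma(
    ["A", "B", "C", "D"],
    [[0, 2, 4, 4],
     [2, 0, 4, 4],
     [4, 4, 0, 2],
     [4, 4, 2, 0]],
)
```

Because every join is placed at half the joining distance, the method forces a molecular clock onto the data, which is one reason it can mislead when rates differ between lineages.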
14
Character Methods
  Maximum Parsimony
    minimal changes to produce data
    can use different substitution models
  Maximum Likelihood
    turns problem “inside out”
    coin flip analogy
    increasingly popular
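"Minimal changes to produce data" can be made concrete with Fitch's algorithm, which counts the minimum number of state changes one alignment column needs on a given tree. The sketch below assumes a rooted binary tree written as nested 2-tuples of taxon names and equal cost for every change; the function names are hypothetical, and this is not how PAUP is implemented.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0          # leaf: its observed state, no changes
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change at this node.
        return common, left_changes + right_changes
    # Children disagree: one change is required somewhere below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
```

On this toy alignment the tree grouping A with B needs 2 changes and the tree grouping A with C needs 4, so parsimony prefers the first.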
15
Searching for Trees
16
Tree Search Algorithms
  Exhaustive
    VERY INTENSIVE
  Branch and Bound
    Compromise
  Heuristic
    FAST (usually start with NJ)
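Why exhaustive search is so intensive follows from the number of possible trees: for n taxa there are (2n-5)!! distinct unrooted binary trees, the product of the odd numbers from 3 up to 2n-5. A short sketch (function names are hypothetical) shows how fast this grows.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted binary trees on n_taxa labelled tips: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


def rooted_tree_count(n_taxa):
    """Number of distinct rooted binary trees: (2n-3)!!, one per branch of each unrooted tree."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return unrooted_tree_count(n_taxa + 1)


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
```

Ten taxa already give 2,027,025 unrooted trees, which is why branch and bound or heuristic searches take over for all but the smallest data sets.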
17
Evaluating Trees
  Consensus Tree
  Randomized Trees
    Skewness tests
  Randomized Character Data
    Permutation tests
  Bootstrap, Jackknife
    resampling techniques
    >70% probably correct; 50% overestimates accuracy
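The bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment, and reports how often a grouping reappears. The sketch below shows only the resampling and counting steps; the tree-building step is represented by a caller-supplied function, here called clade_found, which is a hypothetical stand-in. The example predicate ab_closest uses raw mismatch counts purely for illustration and is not a real tree-building method.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, clade_found, n_replicates=100, seed=0):
    """Fraction of replicates in which clade_found(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(bool(clade_found(bootstrap_replicate(alignment, rng)))
               for _ in range(n_replicates))
    return hits / n_replicates


def mismatches(x, y):
    return sum(a != b for a, b in zip(x, y))


def ab_closest(aln):
    """Hypothetical stand-in for tree building: is A nearer to B than to C or D?"""
    return mismatches(aln["A"], aln["B"]) < min(
        mismatches(aln["A"], aln["C"]), mismatches(aln["A"], aln["D"]))


toy = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
support = bootstrap_support(toy, ab_closest, n_replicates=50)
```

A jackknife differs only in the resampling step: it deletes a subset of columns instead of drawing with replacement.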
18
Rooting Trees
  Molecular Clock
    Root=midpoint, longest span
    Almost ALWAYS WRONG
  Extrinsic Evidence
    select fungus as root for plants, e.g.
    long branch attraction can be problem
  Paralog rooting
    long branch problems
19
Tree Congruence
  Tree-to-Tree Comparison
    2 different characters/same groups
  Important for evaluating biological hypotheses
    lentiviruses diverged within their current hosts only
    plant pathogenicity has arisen many times in fungi
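One simple way to put a number on tree-to-tree agreement is to list the groups (clades) each tree contains and count those found in only one of them, in the spirit of the Robinson-Foulds distance. The sketch below works on rooted trees written as nested tuples of taxon names and compares rooted clades, which is a simplification of the usual unrooted-split version; the function names are hypothetical.

```python
def clades(tree):
    """Return (all_leaves, set of non-trivial clades) for a nested-tuple rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)   # the clade holding every taxon is uninformative
    return all_leaves, found


def clade_distance(tree_a, tree_b):
    """Count clades present in exactly one of two trees on the same taxa."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)


host_tree = (("A", "B"), ("C", "D"))
parasite_tree = ((("A", "B"), "C"), "D")
```

A distance of 0 means the two trees imply exactly the same groups. The two example trees share the clade {A, B} but differ in one clade each, giving a distance of 2.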
20
Common Software
  GCG
    Pileup, Lineup, Paupsearch, Paupdisplay
  PAUP
    PAUPSTAR (Macs best!)
  PHYLIP
    UNIX (Seqanal)
21
Phylogenetic Stories
  HIV
    complete genome accessible
    evolution rapid
    selection, neutralism?
    human interest (dentist and his patients, e.g.)
  Coevolution, host and pathogen
  Big Tree
22
Phylogenetic Resources
  NCBI Taxonomy Browser
  RDP database
  “Tree of Life”
23
Practicalities
  Quality of input data critical
  Examine data from all possible angles
    distance, parsimony, likelihood
  Outgroup taxon critical
    problem if outgroup shares a selective property with a subset of ingroup
  Order of input can be problematic
    Jumble them!
24
Trees
  plagiarized by Chuck Staben, 1998
  Sergeant Joyce Kilmer, 1914