Phylogenetic Inference


1 Phylogenetic Inference
Data
Optimality Criteria
Algorithms
Results
Practicalities
9/20/2018 Chuck Staben

2 Our Goals
Infer Phylogeny: optimality criteria, algorithm
Phylogenetic inference (interesting ones)

3 Watch Out
“The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.”
“…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”

4 Phylogenetic Models
No transfer of genetic information by hybridization
All sequences are homologous
Each position in alignment homologous
Observed variation is valid sample from included group
Positions evolve independently

5 Steps in Analysis
Data Model (Alignment): alignment method; “trimming” to a phylogenetic set
DNA base substitution model
Build Trees: Algorithm based vs Criterion based; Distance based vs Character-based

6 Choice of Input Data
Data Type: aligned sequences, RFLP, morphological data…
Molecule of interest: rRNA (general purpose); interesting character
Number/type of taxa: ingroup and outgroup
Informative

7 rRNA Genes
Conserved across kingdoms
Varies within species
Widely sequenced, easy
Long, lots of characters
Duplication?

8 Multiple Alignment Method
Computer dependence
Phylogenetic Assumptions
Alignment parameters (substitution matrix, gap cost)
Aligned features: primary sequence, structure
Optimization: statistical, non-statistical
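To make the role of the alignment parameters concrete, here is a minimal sketch of a global (Needleman-Wunsch-style) alignment score. The function name and the default match, mismatch, and gap values are illustrative assumptions, not settings from the slides or from any particular program; a flat match/mismatch score stands in for a real substitution matrix, and a linear gap cost stands in for the affine penalties most aligners use.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # against the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against nothing costs i gaps
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))          # gap cost 2
print(global_alignment_score("ACGT", "AGT", gap=-5))  # gap cost 5
```

Changing only the gap cost changes the score of the same pair of sequences, which is why the parameter choice belongs in the list of things to report alongside a tree.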

9 Typical Alignment Method
CLUSTAL, then manual editing
Manual editing for phylogeny
phylogenetic assumption in guide tree
parameters a priori and dynamic
primary structure (with some “influence”
optimization non-statistical

10 Substitution Models
G to A, C to T versus N to N
amino acid substitution
forwards and backwards identical?
site-to-site variation
Simpler model better
Estimate from "quick" tree building, Observed Variation
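The contrast between "G to A, C to T" changes (transitions) and arbitrary "N to N" changes can be sketched with two standard distance corrections: Jukes-Cantor, which treats every substitution alike, and Kimura two-parameter, which counts transitions and transversions separately. The function names below are illustrative assumptions; the sketch skips gapped or ambiguous columns and does not guard against saturated sequences, where the logarithm is undefined.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_changes(seq_a, seq_b):
    """Return (sites, transitions, transversions) over ungapped aligned columns."""
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return sites, transitions, transversions


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: one rate for all substitutions."""
    sites, ts, tv = count_changes(seq_a, seq_b)
    p = (ts + tv) / sites
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance: separate transition/transversion rates."""
    sites, ts, tv = count_changes(seq_a, seq_b)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both corrections return a value larger than the raw proportion of differing sites, because they try to account for multiple hits at the same position.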

11 Tree-Building Methods
Distance: UPGMA, NJ, FM, ME
Character: Maximum Parsimony (PAUP), Maximum Likelihood (PHYLIP)
Acrimonious Debates

12 Distance Methods
Measure distance (dissimilarity)
Accurate if distances are all summative (ultrametric)
NEVER true over large distance
Methods: UPGMA (Unweighted Pair Group Method with Arithmetic Mean), NJ (Neighbor Joining), FM (Fitch-Margoliash), ME (Minimum Evolution)
Most Often Wrong! CLUSTAL
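Whether a distance matrix is ultrametric can be checked directly with the three-point condition: in every triple of taxa, the two largest pairwise distances must be equal. The sketch below is a minimal illustration; the function name, the list-of-lists matrix layout, and the tolerance are assumptions made for the example.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Three-point test on a symmetric distance matrix (list of lists)."""
    for i, j, k in combinations(range(len(dist)), 3):
        # Sort the three pairwise distances; the two largest must tie.
        _, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
rate_varies = [[0, 2, 6],
               [2, 0, 9],
               [6, 9, 0]]
print(is_ultrametric(clock_like), is_ultrametric(rate_varies))
```

Real sequence distances rarely pass this test exactly, which is consistent with the slide's warning about large distances.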

13 Which Distance Method?
UPGMA: least accurate, most used
NJ: EXTREMELY RAPID; GIVES ONLY 1 TREE
ME and FM seem best: minimize tree path lengths
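For readers who have not seen UPGMA written out, the following is a minimal sketch of the clustering loop: repeatedly merge the closest pair of clusters and average their distances to everything else, weighted by cluster size. The nested-tuple output `(left, right, height)` and the function name are assumptions chosen for brevity, not the format of any program named in these slides.

```python
def upgma(names, dist):
    """UPGMA clustering; returns a nested tuple (left, right, node_height)."""
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the rest.
        merged = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            merged[k] = (n_i * dik + n_j * djk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in merged.items():
            d[(k, next_id)] = v  # new ids are always larger than old ones
        clusters[next_id] = ((tree_i, tree_j, dij / 2), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(names, dist))
```

The loop always returns exactly one tree and never revisits a merge, which is part of why such methods are fast and also why they can be confidently wrong when rates differ between lineages.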

14 Character Methods
Maximum Parsimony: minimal changes to produce data; can use different substitution models
Maximum Likelihood: turns problem “inside out”; coin flip analogy; increasingly popular
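"Minimal changes to produce data" can be made concrete with Fitch's algorithm, which counts the fewest state changes a fixed tree needs to explain one alignment column. This is a sketch under simplifying assumptions: rooted binary trees written as nested tuples of taxon names, unordered and equally weighted character states, and illustrative function names.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one column on a rooted binary tree.

    tree is a taxon name (str) or a (left, right) tuple;
    states maps each taxon name to its character at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change count over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))
print(parsimony_score((("A", "C"), ("B", "D")), alignment))
```

The tree that groups A with B needs fewer changes than the one that splits them, so parsimony prefers it; a full analysis repeats this scoring over many candidate trees.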

15 Searching for Trees

16 Tree Search Algorithms
Exhaustive: VERY INTENSIVE
Branch and Bound: compromise
Heuristic: FAST (usually start with NJ)
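Why exhaustive search is so intensive follows from a counting argument: the number of distinct unrooted bifurcating trees for n taxa is (2n - 5)!!, the product of the odd numbers from 3 up to 2n - 5. A short sketch (the function name is an illustrative assumption):

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating tree topologies for n_taxa labeled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
```

Ten taxa already give about two million topologies, which is why branch and bound or heuristic searches take over for anything but small data sets.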

17 Evaluating Trees
Consensus Tree
Randomized Trees: skewness tests
Randomized Character Data: permutation tests
Bootstrap, Jackknife: resampling techniques; >70% probably correct; 50% overestimates accuracy
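The bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment, and reports how often each group reappears. The sketch below shows only that bookkeeping. `infer_clades` is a hypothetical callable supplied by the caller, and `closest_pair` is a deliberately crude toy stand-in for a real tree-building method.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}


def clade_support(alignment, infer_clades, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each inferred clade appears."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        for clade in infer_clades(bootstrap_columns(alignment, rng)):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: n / replicates for clade, n in counts.items()}


def closest_pair(alignment):
    """Toy 'tree builder': report only the pair of taxa with fewest differences."""
    taxa = sorted(alignment)
    best = min(
        (sum(x != y for x, y in zip(alignment[a], alignment[b])), a, b)
        for i, a in enumerate(taxa) for b in taxa[i + 1:]
    )
    return [frozenset(best[1:])]


toy = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "CCCCCCCC", "D": "GGGGGGGG"}
print(clade_support(toy, closest_pair, replicates=50))
```

In practice the tree builder would be the same distance, parsimony, or likelihood method used for the main analysis, and the support values would be read against the thresholds on the slide.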

18 Rooting Trees
Molecular Clock: Root=midpoint, longest span; Almost ALWAYS WRONG
Extrinsic Evidence: select fungus as root for plants, e.g.; long branch attraction can be a problem
Paralog rooting: long branch problems
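Midpoint rooting, the molecular-clock option above, places the root halfway along the longest tip-to-tip path. The sketch below locates that point on an unrooted tree stored as a nested dictionary of branch lengths; the data layout and function names are assumptions for illustration, and the slide's warning about this method applies to its output as well.

```python
def farthest(tree, start):
    """Return (distance, node, path) for the node farthest from start."""
    best = (0.0, start, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, node, path)
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best


def midpoint(tree):
    """Return (u, v, offset): the root lies offset from u along edge u-v."""
    any_node = next(iter(tree))
    _, tip_a, _ = farthest(tree, any_node)      # one end of the longest span
    span, _, path = farthest(tree, tip_a)       # the other end, and the path
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + tree[u][v] >= span / 2:
            return u, v, span / 2 - walked
        walked += tree[u][v]


# Unrooted tree: tips A, B, C, D; internal nodes X and Y.
tree = {
    "A": {"X": 2.0}, "B": {"X": 1.0},
    "C": {"Y": 5.0}, "D": {"Y": 1.0},
    "X": {"A": 2.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0, "D": 1.0},
}
print(midpoint(tree))
```

In this toy tree the long branch leading to C pulls the root onto that branch, which illustrates how a single fast-evolving lineage can misplace a midpoint root.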

19 Tree Congruence
Tree-to-Tree Comparison: 2 different characters/same groups
Important for evaluating biological hypotheses: lentiviruses diverged within their current hosts only; plant pathogenicity has arisen many times in fungi
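One simple way to compare two trees on the same taxa is to list the groups (clades) each tree contains and count the groups found in only one of them, a Robinson-Foulds-style symmetric difference. The sketch assumes rooted trees written as nested tuples of taxon names; the function names are illustrative.

```python
def clades(tree, found=None):
    """Return (leaves_under_tree, set_of_all_clades) for a nested-tuple tree."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clades(child, found)
        leaves |= child_leaves
    found.add(leaves)  # record the group defined by this internal node
    return leaves, found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    _, groups_a = clades(tree_a)
    _, groups_b = clades(tree_b)
    return len(groups_a ^ groups_b)


host_tree = (("A", "B"), ("C", "D"))
virus_tree = (("B", "A"), ("D", "C"))
other_tree = (("A", "C"), ("B", "D"))
print(clade_distance(host_tree, virus_tree), clade_distance(host_tree, other_tree))
```

A distance of zero means the two data sets recover the same groups, as expected under strict host-pathogen codivergence; larger values point to conflicting histories.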

20 Common Software
PAUP: GCG (Pileup, Lineup, Paupsearch, Paupdisplay); PAUPSTAR (MACs best!)
PHYLIP: UNIX (Seqanal)

21 Phylogenetic Stories
HIV: complete genome accessible; evolution rapid; selection, neutralism?; human interest (dentist and his patients, e.g.)
Coevolution, host and pathogen
Big Tree

22 Phylogenetic Resources
NCBI Taxonomy Browser
RDP database
“Tree of Life”

23 Practicalities
Quality of input data critical
Examine data from all possible angles: distance, parsimony, likelihood
Outgroup taxon critical: problem if outgroup shares a selective property with a subset of ingroup
Order of input can be problematic: Jumble them!

24 Trees
plagiarized by Chuck Staben, 1998
Sergeant Joyce Kilmer, 1914

