Phylogenetic Inference

Presentation transcript:

Phylogenetic Inference: Data, Optimality Criteria, Algorithms, Results, Practicalities. 9/20/2018 Chuck Staben

Our Goals: Infer phylogeny (optimality criteria, algorithm). Phylogenetic inference (interesting ones).

Watch Out: “The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.” “…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”

Phylogenetic Models: No transfer of genetic information by hybridization; all sequences are homologous; each position in the alignment is homologous; observed variation is a valid sample from the included group; positions evolve independently.

Steps in Analysis: Data model (alignment): alignment method; “trimming” to a phylogenetic set. DNA base substitution model. Build trees: algorithm-based vs criterion-based; distance-based vs character-based.

Choice of Input Data: Data type: aligned sequences, RFLP, morphological data… Molecule of interest: rRNA (general purpose); interesting character. Number/type of taxa: ingroup and outgroup. Informative.

rRNA Genes: Conserved across kingdoms; varies within species; widely sequenced, easy; long, lots of characters; duplication?

Multiple Alignment Method: Computer dependence; phylogenetic assumptions; alignment parameters (substitution matrix, gap cost); aligned features (primary sequence, structure); optimization (statistical, non-statistical).

Typical Alignment Method: CLUSTAL, then manual editing (manual editing for phylogeny); phylogenetic assumption in guide tree; parameters a priori and dynamic; primary structure (with some “influence”); optimization non-statistical.

Substitution Models: G to A, C to T versus N to N; amino acid substitution; forwards and backwards identical?; site-to-site variation; simpler model better; estimate from "quick" tree building, observed variation.
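
The distinction between "G to A, C to T" changes (transitions) and all other substitutions (transversions) can be made concrete with a small example. The sketch below uses the Kimura two-parameter (K2P) distance, which the slide does not name; it is chosen here only as an assumed illustration of a simple model that treats the two classes separately. The function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption for this sketch: sites with gaps or ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transition differences
    q = transversions / compared  # proportion of transversion differences
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences the corrected distance is only slightly larger than the raw proportion of differing sites; the correction grows as the sequences diverge, which is one reason the choice of model matters more over large distances.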

Tree-Building Methods: Distance: UPGMA, NJ, FM, ME. Character: Maximum Parsimony (PAUP), Maximum Likelihood (PHYLIP). Acrimonious debates.

Distance Methods: Measure distance (dissimilarity); accurate if distances are all summative (ultrametric); NEVER true over large distance. Methods: UPGMA (Unweighted Pair Group Method with Arithmetic Mean), NJ (Neighbor Joining), FM (Fitch-Margoliash), ME (Minimum Evolution). Most often wrong! CLUSTAL.
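
As a rough illustration of how a distance method turns a dissimilarity matrix into a tree, here is a minimal UPGMA sketch. The function name `upgma`, the `frozenset`-keyed distance dictionary, and the nested-tuple tree are assumptions made for this example; branch heights are not computed.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of labels.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of taxa in it
    d = dict(dist)                          # working copy, extended as we merge
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted average,
        # i.e. the arithmetic mean over all original taxon pairs.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(["A", "B", "C", "D"], {frozenset(k): v for k, v in pairs.items()})
print(tree)
```

The toy matrix is ultrametric, so UPGMA recovers the grouping of A with B, then C, then D. When rates differ between lineages the distances are no longer ultrametric and the same procedure can join the wrong taxa.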

Which Distance Method? UPGMA: least accurate, most used. NJ: EXTREMELY RAPID; GIVES ONLY 1 TREE. ME and FM seem best: minimize tree path lengths.

Character Methods: Maximum Parsimony: minimal changes to produce data; can use different substitution models. Maximum Likelihood: turns problem “inside out”; coin flip analogy; increasingly popular.
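
"Minimal changes to produce data" can be sketched by counting, for a fixed tree, the fewest substitutions needed to explain each alignment column. The example below uses Fitch's small-parsimony algorithm with equal costs for all changes; that choice, the nested-tuple tree format, and the names `fitch` and `parsimony_score` are assumptions for illustration, not what PAUP does internally.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one column.

    tree:   rooted binary tree as nested 2-tuples with leaf names as strings.
    states: dict mapping leaf name -> character observed in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACG"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))
print(parsimony_score((("A", "C"), ("B", "D")), alignment))
```

On this toy alignment the tree grouping A with B needs fewer changes than the tree grouping A with C, so parsimony prefers it. Finding the best tree still requires searching over topologies.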

Searching for Trees

Tree Search Algorithms: Exhaustive: VERY INTENSIVE. Branch and Bound: compromise. Heuristic: FAST (usually start with NJ).
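
Why exhaustive search is "very intensive" follows from how fast the number of possible trees grows. A small sketch using the standard counts for fully resolved trees with n labeled taxa, (2n-5)!! unrooted and (2n-3)!! rooted; the function names are hypothetical.

```python
def odd_double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted binary trees on n labeled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted binary trees on n labeled taxa: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Ten taxa already give about two million unrooted trees, which is why branch and bound or heuristic searches are used beyond small data sets.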

Evaluating Trees: Consensus tree. Randomized trees: skewness tests. Randomized character data: permutation tests. Bootstrap, jackknife: resampling techniques; >70% probably correct; 50% overestimates accuracy.
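
The bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate, and reports how often each grouping reappears. The sketch below shows only the resampling step; the function name `bootstrap_alignment` and the dict-of-strings alignment format are assumptions for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng:       a random.Random instance, passed in so runs can be reproduced.
    Columns are drawn with replacement, so some appear several times and
    others not at all; every taxon receives the same sampled columns.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


aln = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
replicate = bootstrap_alignment(aln, random.Random(0))
print(replicate)
```

In practice one would generate many replicates (hundreds is common), build a tree from each with the chosen method, and count the fraction of replicate trees that contain each clade.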

Rooting Trees: Molecular clock: root = midpoint, longest span; almost ALWAYS WRONG. Extrinsic evidence: select fungus as root for plants, e.g.; long branch attraction can be a problem. Paralog rooting: long branch problems.

Tree Congruence: Tree-to-tree comparison; 2 different characters/same groups. Important for evaluating biological hypotheses: lentiviruses diverged within their current hosts only; plant pathogenicity has arisen many times in fungi.

Common Software: PAUP: PAUPSTAR (Macs best!). PHYLIP: UNIX (Seqanal). GCG: UNIX (Seqanal); Pileup, Lineup, Paupsearch, Paupdisplay.

Phylogenetic Stories: HIV: complete genome accessible; evolution rapid; selection, neutralism?; human interest (dentist and his patients, e.g.). Coevolution, host and pathogen. Big Tree.

Phylogenetic Resources: NCBI Taxonomy Browser, http://www.ncbi.nlm.nih.gov/Taxonomy/; RDP database, http://rdpwww.life.uiuc.edu/; “Tree of Life”, http://phylogeny.arizona.edu/tree/phylogeny.html

Practicalities: Quality of input data critical. Examine data from all possible angles: distance, parsimony, likelihood. Outgroup taxon critical: problem if outgroup shares a selective property with a subset of ingroup. Order of input can be problematic: jumble them!

Trees: plagiarized by Chuck Staben, 1998; Sergeant Joyce Kilmer, 1914.