Practical Session: Bayesian evolutionary analysis by sampling trees (BEAST)
Rebecca R. Gray, Ph.D.
Department of Pathology, University of Florida

BEAST:
– is a cross-platform program for Bayesian MCMC analysis of molecular sequences
– is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models
– can be used as a method of reconstructing phylogenies, but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology
– uses MCMC to average over tree space, so that each tree is weighted in proportion to its posterior probability
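The weighting by posterior probability can be illustrated with a toy sampler. The sketch below is not BEAST's algorithm: it assumes a hypothetical space of only three topologies with made-up unnormalised posterior weights, and uses a plain Metropolis rule with a symmetric proposal. It only shows that the chain visits each tree in proportion to its posterior weight.

```python
import random
from collections import Counter


def metropolis_tree_sampler(unnorm_posterior, n_steps, seed=0):
    """Toy Metropolis sampler over a small, fixed set of labelled trees.

    unnorm_posterior maps a tree label to an unnormalised posterior weight.
    Returns the fraction of steps the chain spent on each tree.
    """
    rng = random.Random(seed)
    trees = list(unnorm_posterior)
    current = trees[0]
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick one of the other trees uniformly at random.
        proposal = rng.choice([t for t in trees if t != current])
        ratio = unnorm_posterior[proposal] / unnorm_posterior[current]
        # Accept with probability min(1, posterior ratio).
        if rng.random() < min(1.0, ratio):
            current = proposal
        visits[current] += 1
    return {t: visits[t] / n_steps for t in trees}


# Hypothetical weights for three topologies (illustration only).
weights = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
freqs = metropolis_tree_sampler(weights, n_steps=50_000, seed=1)
for tree, freq in freqs.items():
    print(f"{tree}: visited {freq:.3f} of the time")
```

With these weights the visit frequencies should settle near 0.6, 0.3 and 0.1. Any summary averaged over the sampled trees is therefore weighted by posterior probability, without conditioning on a single topology.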

Citations
The recommended citation for this program is:
– Drummond AJ, Rambaut A (2007) "BEAST: Bayesian evolutionary analysis by sampling trees." BMC Evolutionary Biology 7:214
To cite the relaxed clock model in BEAST:
– Drummond AJ, Ho SYW, Phillips MJ & Rambaut A (2006) PLoS Biology 4, e88
To cite the Bayesian Skyline model in BEAST:
– Drummond AJ, Rambaut A, Shapiro B & Pybus OG (2005) Mol Biol Evol 22,
The original MCMC paper was:
– Drummond AJ, Nicholls GK, Rodrigo AG & Solomon W (2002) Genetics 161,

Basic Pipeline
1) Setting up the XML file (BEAUti)
2) Running the XML file (BEAST)
3) Evaluating the performance of the run (Tracer)
4) Comparing models, obtaining estimates of parameters (Tracer)
5) Summarizing the tree distribution (TreeAnnotator)
6) Viewing the MCC tree (FigTree)

Downloading programs
– Download contains BEAUti, BEAST, TreeAnnotator

PRACTICAL: RIFT VALLEY FEVER VIRUS

Epidemiology of RVF
The virus was first identified in 1931 in the Rift Valley of Kenya
Mosquito vector; primarily infects livestock
1997–1998: a major outbreak occurred in Kenya, Somalia and the United Republic of Tanzania
September 2000: cases were confirmed in Saudi Arabia and Yemen (first reported occurrence of the disease outside the African continent)

Setting up the XML file in BEAUti
Requires a NEXUS file
– Helpful to have dates with the sample name
– Use the finest resolution available
The GUI allows basic selection of parameters
The XML file can be manually edited to test specific hypotheses or tweak the run

BEAUti practical
Import alignment (g_63.nex)
Tip dates – use tip dates, guess dates (years since some time in the past)
Site models – use GTR + G, empirical base frequencies
Test hypothesis of strict vs. relaxed molecular clock
Trees – coalescent tree prior – constant size
5 x 10^7 generations
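The "guess dates" step reads a sampling date out of each sample name. The sketch below illustrates the idea under an assumed naming convention (a four-digit year as the last underscore-delimited field). The labels are hypothetical and are not taken from g_63.nex, and this is not BEAUti's own parser.

```python
import re


def guess_tip_date(taxon_label):
    """Return the sampling year stored in the last '_'-delimited field."""
    last_field = taxon_label.rsplit("_", 1)[-1]
    if not re.fullmatch(r"\d{4}", last_field):
        raise ValueError(f"no four-digit year at the end of {taxon_label!r}")
    return int(last_field)


def tip_heights(labels):
    """Convert sampling years to heights measured back from the newest tip."""
    years = {label: guess_tip_date(label) for label in labels}
    most_recent = max(years.values())
    return {label: most_recent - year for label, year in years.items()}


# Hypothetical sample names (illustration only).
labels = ["isolateA_1998", "isolateB_2000", "isolateC_1977"]
print(tip_heights(labels))
```

Dates given as "years since some time in the past" run forward in time, so the most recent sample sits at height 0 and older samples at larger heights. This is why the slide recommends keeping dates in the sample name at the finest resolution available.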

BEAST
Open the XML file with a text editor
Run in BEAST
Check mixing of the MCMC chain
Open S log files in Tracer
Open L and G2 log files
What can we do about the trace?

Proper mixing
First step – run chain longer
– Open L200 files
Other steps to try:
– Over-parameterization – reduce complexity
– Temporal/phylogenetic signal
– Priors are inappropriate
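Mixing is usually judged by the effective sample size (ESS) that Tracer reports for each parameter. The sketch below is a simple autocorrelation-based ESS estimator applied to two simulated traces. It is an illustration only and not necessarily the estimator Tracer implements; the traces and the `max_lag` cut-off are assumptions.

```python
import random
import statistics


def effective_sample_size(trace, max_lag=1000):
    """Estimate ESS as n / (1 + 2 * sum of positive autocorrelations)."""
    n = len(trace)
    mean = statistics.fmean(trace)
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(42)
n = 2000

# Well-mixed trace: independent draws.
iid_trace = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Poorly mixed trace: each sample stays close to the previous one.
sticky_trace = [0.0]
for _ in range(n - 1):
    sticky_trace.append(0.95 * sticky_trace[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid_trace)
ess_sticky = effective_sample_size(sticky_trace)
print(f"ESS, well-mixed trace: {ess_iid:.0f}")
print(f"ESS, sticky trace:     {ess_sticky:.0f}")
```

Both traces contain the same number of logged samples, but the sticky one carries far fewer independent draws. Running the chain longer raises the ESS, which is why it is the first remedy to try.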

Model testing
Bayes factors:
– Compare estimates of the marginal likelihoods of the models of interest
– 2*(ln marginal likelihood model 1 - ln marginal likelihood model 2)
– >10, strong support for alternative (more complex model)
Strict clock vs. relaxed clock
– Also consider the coefficient of variation
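The Bayes factor comparison above is simple arithmetic once the log marginal likelihoods are in hand. In the sketch below, model 1 is taken to be the alternative (more complex) model, which is an interpretation of the slide, and the two log marginal likelihoods are hypothetical numbers rather than results from this practical.

```python
def two_ln_bayes_factor(ln_ml_alternative, ln_ml_null):
    """Return 2 * (ln ML of alternative model - ln ML of null model)."""
    return 2.0 * (ln_ml_alternative - ln_ml_null)


def strong_support(two_ln_bf, threshold=10.0):
    """Apply the slide's rule of thumb: > 10 is strong support."""
    return two_ln_bf > threshold


# Hypothetical log marginal likelihoods (illustration only).
ln_ml_relaxed = -5230.4  # alternative: relaxed clock
ln_ml_strict = -5241.9   # null: strict clock

stat = two_ln_bayes_factor(ln_ml_relaxed, ln_ml_strict)
print(f"2 ln BF = {stat:.1f}; strong support for relaxed clock: {strong_support(stat)}")
```

Here the difference in log marginal likelihood is 11.5, so 2 ln BF is 23.0 and the relaxed clock would be strongly favoured under the > 10 rule.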

Summarizing tree
TreeAnnotator
– Burnin 10% (501 samples)
– Keep median heights
– MCC tree
Visualizing tree: FigTree
– Posterior probabilities for branches
– Median heights for clades of interest
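The 501-sample burn-in can be reproduced from the chain length under two assumptions that the slides do not state: that trees were logged every 10,000 generations, and that the initial state (generation 0) is logged as well. The sketch below shows that arithmetic only.

```python
def n_logged_samples(chain_length, log_every):
    """Number of logged states, counting the initial state at generation 0."""
    return chain_length // log_every + 1


def burnin_samples(n_samples, percent=10):
    """Burn-in as a whole number of samples, rounded up."""
    return (n_samples * percent + 99) // 100


chain_length = 5 * 10**7  # generations, from the BEAUti practical
log_every = 10_000        # assumed logging interval (not given in the slides)

total = n_logged_samples(chain_length, log_every)
burnin = burnin_samples(total, percent=10)
print(f"{total} logged trees, discard the first {burnin} as burn-in")
```

Under these assumptions the run yields 5001 trees, and 10% rounded up is 501. With a different logging interval the burn-in count changes accordingly.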

Advanced analyses
Different coalescent priors
– Parametric models (exponential, logistic)
– Bayesian skyline plots
Phylogeography
– Lemey et al, 2009, PLoS Computational Biology
Site-specific rates of variation

Figure: Change in effective population size over time (y-axis: log10 Ne)

Figure: Bayesian genealogy of the G gene; 1916 ( )

Additional resources
Tutorials on the BEAST website, Google group
16th International BioInformatics Workshop on Virus Evolution and Molecular Epidemiology
– Johns Hopkins University, Baltimore
– 29 August - 03 September 2010, Bethesda, USA