Lab 3: Bayesian Phylogenetic Inference and MCMC
Department of Bioinformatics & Biostatistics, SJTU


Topics
– Phylogenetics
– Bayesian inference and MCMC: overview
– Bayesian model testing
– MrBayes tutorial and application: the Nexus file, configuring the analysis, executing the analysis, and analyzing the results

Phylogenetics
– From Greek: phylum (tribe, race) + genesis (origin)
– Broad definition: the historical study of how species arise, evolve, and go extinct
– Narrow definition: inferring the relationships among extant species
– Here we use the narrow definition

Example: infer the relationships among three species, A, B, and C, rooted with an outgroup.

Three possible trees (topologies) for A, B, and C: ((A,B),C), ((A,C),B), and ((B,C),A).

(Figure: the prior probability distribution over the three topologies is updated by the data (observations) to give the posterior probability distribution; probabilities are shown on a 0 to 1.0 scale.)

What is needed for inference?
– A probabilistic model of evolution
– Prior distributions on the parameters of the model
– Data
– A method for calculating the posterior distribution given the model, the priors, and the data

Model: topology + branch lengths
Parameters: the tree topology and the branch lengths (the expected amount of change along each branch). (Figure: an unrooted four-taxon tree with taxa A, B, C, D.)

Model: molecular evolution
Parameters: the instantaneous rate matrix of the substitution model (here Jukes-Cantor).
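For reference (not on the original slide), the Jukes-Cantor model assigns equal rates to all substitutions, so the rate matrix has a single overall scale and the transition probabilities along a branch have a closed form. With d the branch length in expected substitutions per site:

Q = \frac{\mu}{4}
\begin{pmatrix}
-3 & 1 & 1 & 1 \\
 1 & -3 & 1 & 1 \\
 1 & 1 & -3 & 1 \\
 1 & 1 & 1 & -3
\end{pmatrix},
\qquad
P_{ij}(d) =
\begin{cases}
\tfrac{1}{4} + \tfrac{3}{4}\, e^{-4d/3}, & i = j,\\
\tfrac{1}{4} - \tfrac{1}{4}\, e^{-4d/3}, & i \neq j,
\end{cases}

with rows and columns ordered A, C, G, T.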

Priors on parameters
– Topology: all unique topologies have equal probabilities
– Branch lengths: an exponential prior puts more weight on small branch lengths; approximately uniform on transition probabilities

Data
The data (alignment):

Taxon   Characters
A       ACG TTA TTA AAT TGT CCT CTT TTC AGA
B       ACG TGT TTC GAT CGT CCT CTT TTC AGA
C       ACG TGT TTA GAC CGA CCT CGG TTA AGG
D       ACA GGA TTA GAT CGT CCG CTT TTC AGA

Bayes' Theorem
f(θ | X) = f(θ) × f(X | θ) / f(X)
where f(θ | X) is the posterior distribution, f(θ) the prior distribution, f(X | θ) the likelihood function, and f(X) the normalizing constant.
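Spelled out for the three-topology example above, with topologies τ1, τ2, τ3:

\Pr(\tau_i \mid X) \;=\; \frac{\Pr(X \mid \tau_i)\,\Pr(\tau_i)}{\sum_{j=1}^{3} \Pr(X \mid \tau_j)\,\Pr(\tau_j)}

Here Pr(X | τ_i) is the likelihood of topology τ_i (integrated over branch lengths and substitution-model parameters), and the sum in the denominator is the normalizing constant.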

(Figure: the posterior probability distribution over parameter space, which is high-dimensional but drawn in one dimension for illustration; the regions of parameter space corresponding to tree 1, tree 2, and tree 3 carry different amounts of posterior probability.)

We can focus on any parameter of interest (there are no nuisance parameters) by marginalizing the posterior over the other parameters, i.e. integrating out the uncertainty in those parameters. (Figure: the same posterior, with marginal probabilities of 20%, 48%, and 32% for tree 1, tree 2, and tree 3.)
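Written out for a topology τ, with branch lengths v and substitution-model parameters θ, the marginal posterior probability of the tree is:

f(\tau \mid X) \;=\; \int\!\!\int f(\tau, v, \theta \mid X)\, dv\, d\theta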

(Figure: a table of joint probabilities of trees and branch-length vectors; summing over the branch-length vectors gives the marginal probabilities of the trees, and summing over the trees gives the marginal probabilities of the branch-length vectors.)

How to estimate the posterior?
– Analytical calculation? Impossible, except for very simple examples
– Independent random sampling of parameter space? Impossible too: computationally infeasible
– Dependent sampling using the MCMC technique? Yes, you got it!

Metropolis-Hastings Sampling
– Assume that the current state of the chain has parameter values θ
– Consider a move to a state with parameter values θ*, drawn according to the proposal density q
– Accept the move with probability r = min(1, prior ratio × likelihood ratio × proposal ratio)
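In the notation of the Bayes' theorem slide, the acceptance probability is:

r = \min\!\left(1,\;
\frac{f(\theta^{*})}{f(\theta)} \times
\frac{f(X \mid \theta^{*})}{f(X \mid \theta)} \times
\frac{q(\theta \mid \theta^{*})}{q(\theta^{*} \mid \theta)}
\right)

i.e. the prior ratio times the likelihood ratio times the proposal (Hastings) ratio.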

Sampling Principles
– For a complex model you typically have many "proposal" or "update" mechanisms ("moves")
– Each mechanism changes one or a few parameters
– At each step (generation) of the chain, one mechanism is chosen at random according to some predetermined probability distribution
– It makes sense to try changing "more difficult" parameters (such as the topology in a phylogenetic analysis) more often

Application example: analysis of 85 insect taxa based on 18S rDNA.

Model parameters 1
General Time Reversible (GTR) substitution model, plus the tree topology and branch lengths. (Figure: a four-taxon tree with taxa A, B, C, D, illustrating the topology and branch-length parameters.)
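For reference, the GTR model has six exchangeability parameters (r_AC, r_AG, r_AT, r_CG, r_CT, r_GT) and four stationary base frequencies π; up to an overall scaling of the matrix, the instantaneous rate of change from state i to state j is:

q_{ij} = r_{ij}\,\pi_j \quad (i \neq j), \qquad q_{ii} = -\sum_{j \neq i} q_{ij}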

Model parameters 2 Gamma-shaped rate variation across sites
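Under the +Γ model, each site's rate is multiplied by a factor r drawn from a gamma distribution with shape α and mean 1 (in practice discretized into a small number of categories, typically four). Small α means strong rate heterogeneity across sites; large α means nearly equal rates:

r \sim \mathrm{Gamma}(\alpha, \alpha), \qquad
f(r) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha-1} e^{-\alpha r}, \qquad \mathrm{E}[r] = 1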

Priors on parameters
– Topology: all unique topologies have equal probability
– Branch lengths: exponential prior; exp(10) means the expected mean is 0.1 (= 1/10)
– State frequencies: Dirichlet prior, Dir(1,1,1,1)
– Rates (revmat): Dirichlet prior, Dir(1,1,1,1,1,1)
– Shape of the gamma distribution of rates: uniform prior, Uni(0,100)
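As an illustration (not from the original slides), a sketch of MrBayes 3.1-style commands that select this model and these priors; they could be placed in a mrbayes block of the Nexus file or typed at the prompt:

begin mrbayes;
   lset nst=6 rates=gamma;                           [ GTR + gamma rate variation ]
   prset topologypr=uniform;                         [ all topologies equally probable ]
   prset brlenspr=unconstrained:exponential(10.0);   [ exponential(10) branch lengths, mean 0.1 ]
   prset statefreqpr=dirichlet(1,1,1,1);             [ flat Dirichlet on state frequencies ]
   prset revmatpr=dirichlet(1,1,1,1,1,1);            [ flat Dirichlet on GTR exchangeabilities ]
   prset shapepr=uniform(0.0,100.0);                 [ uniform prior on the gamma shape ]
end;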

(Figure: an MCMC trace plot showing the burn-in phase followed by the stationary phase, which is sampled with thinning; rapid mixing is essential.)

Majority rule consensus tree from the sampled trees
– Clade frequencies represent the posterior probabilities of the clades
– That is, the probability that a clade is true given the data, the model, and the prior (and given that the MCMC sample is OK)

Mean and 95% credibility interval for model parameters

MrBayes tutorial Introduction/examples

Nexus format input file
The input is in Nexus format; more accurately, a Nexus-like dialect ("nexus(ish)").
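As a concrete illustration, the toy alignment from the Data slide written as a minimal Nexus data block (the taxon names come from that slide; everything else is a sketch of what MrBayes expects):

#NEXUS
[ toy alignment: 4 taxa, 27 sites ]
begin data;
   dimensions ntax=4 nchar=27;
   format datatype=dna gap=- missing=?;
   matrix
      A  ACGTTATTAAATTGTCCTCTTTTCAGA
      B  ACGTGTTTCGATCGTCCTCTTTTCAGA
      C  ACGTGTTTAGACCGACCTCGGTTAAGG
      D  ACAGGATTAGATCGTCCGCTTTTCAGA
   ;
end;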

Running MrBayes
– Use execute to bring the data in a Nexus file into MrBayes
– Set the model and priors using lset and prset
– Run the chain using mcmc
– Summarize the parameter samples using sump
– Summarize the tree samples using sumt
Note that MrBayes 3.1 runs two independent analyses by default.
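A minimal sketch of an interactive session following these steps; the file name lab3.nex and the run length are placeholders, real analyses typically need many more generations, and the priors would be set with prset as on the earlier slide:

MrBayes > execute lab3.nex
MrBayes > lset nst=6 rates=gamma
MrBayes > mcmc ngen=100000 samplefreq=100 printfreq=1000 diagnfreq=1000 nruns=2
MrBayes > sump burnin=250
MrBayes > sumt burnin=250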

Convergence Diagnostics
– By default MrBayes performs two independent analyses starting from different random trees (mcmc nruns=2)
– The average standard deviation of clade frequencies is calculated and presented during the run (mcmc mcmcdiagn=yes diagnfreq=1000) and written to file (.mcmc)
– The standard deviation of each clade frequency and the potential scale reduction for branch lengths are calculated with sumt
– The potential scale reduction is calculated for all substitution model parameters with sump

Bayes' theorem, conditioned on a model
So far we have implicitly conditioned on a model M:
f(θ | X, M) = f(θ | M) × f(X | θ, M) / f(X | M)
The normalizing constant f(X | M) is the marginal likelihood of the model.
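The marginal likelihood is the likelihood averaged over the prior, i.e. the prior-weighted fit of model M to the data:

f(X \mid M) = \int f(\theta \mid M)\, f(X \mid \theta, M)\, d\theta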

Bayesian Model Choice
Posterior model odds: f(M1 | X) / f(M0 | X) = [f(M1) / f(M0)] × [f(X | M1) / f(X | M0)]
Bayes factor: B10 = f(X | M1) / f(X | M0)

Bayesian Model Choice
– The normalizing constant in Bayes' theorem, the marginal likelihood of the model, f(X) or f(X|M), can be used for model choice
– f(X|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run (MrBayes does this automatically with 'sump')
– Any models can be compared: nested, non-nested, data-derived
– No explicit correction for the number of parameters is needed
– The comparison can prefer a simpler model over a more complex model
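The harmonic mean estimator mentioned above is computed from the n post-burn-in MCMC samples θ_1, ..., θ_n (it is simple, but known to be numerically unstable in practice):

\hat{f}(X \mid M) = \left[\frac{1}{n}\sum_{i=1}^{n} \frac{1}{f(X \mid \theta_i, M)}\right]^{-1}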

Bayes Factor Comparisons
Interpretation of the Bayes factor:

2ln(B10)   B10         Evidence against M0
0 to 2     1 to 3      Not worth more than a bare mention
2 to 6     3 to 20     Positive
6 to 10    20 to 150   Strong
> 10       > 150       Very strong