
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Bayesian Inference Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU)

Bayes' Theorem

Reverend Thomas Bayes

P(A|B) = P(B|A) × P(A) / P(B)

P(M|D) = P(D|M) × P(M) / P(D)

P(D|M): probability of the data given the model (the likelihood)
P(M): prior probability of the model
P(D): essentially a normalizing constant, so that the posterior sums to one
P(M|D): posterior probability of the model
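The theorem translates directly into code. A minimal Python sketch (the function and variable names are illustrative, not from the slides; the evidence P(D) is computed by summing over a two-hypothesis model space):

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' theorem: P(M|D) = P(D|M) * P(M) / P(D)."""
    return likelihood * prior / evidence

# Two competing models M and N with made-up likelihoods and priors:
like_M, prior_M = 0.8, 0.5
like_N, prior_N = 0.2, 0.5

# P(D) is the sum of likelihood * prior over all models (law of total probability)
evidence = like_M * prior_M + like_N * prior_N

print(posterior(like_M, prior_M, evidence))  # 0.8
```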

Bayesians vs. Frequentists

Meaning of probability:
– Frequentist: long-run frequency of an event in a repeatable experiment
– Bayesian: degree of belief; a way of quantifying uncertainty

Finding probabilistic models from empirical data:
– Frequentist: parameters are fixed constants whose true values we are trying to find good (point) estimates for
– Bayesian: uncertainty concerning parameter values is expressed by means of a probability distribution over possible parameter values

Bayes' theorem: example

William is concerned that he may have reggaetonitis, a rare inherited disease that affects 1 in 1,000 people. This 1-in-1,000 figure is his prior probability of having the disease.

Doctor Bayes investigates William using a new, efficient test:
– If you have the disease, the test is always positive
– If you do not, the false-positive rate is 1%

William tests positive. What is the probability that he actually has the disease?

Bayes' theorem: example II

Model parameter: d (does William have the disease?)
Possible parameter values: d = Y, d = N

Possible outcomes (data):
+ : test is positive
- : test is negative

Observed data: +

Likelihoods:
P(+|Y) = 1.0 (test positive given disease)
P(-|Y) = 0.0 (test negative given disease)
P(+|N) = 0.01 (test positive given no disease)
P(-|N) = 0.99 (test negative given no disease)

Bayes' theorem: example III

P(M|D) = P(D|M) × P(M) / P(D)

P(Y|+) = P(+|Y) × P(Y) / P(+)
       = P(+|Y) × P(Y) / [P(+|Y) × P(Y) + P(+|N) × P(N)]
       = (1 × 0.001) / (1 × 0.001 + 0.01 × 0.999)
       ≈ 9.1%

Another way of understanding the result:
(1) Test 1,000 people.
(2) Of these, 1 has the disease and tests positive.
(3) 1% of the remaining 999 ≈ 10 people will give a false positive test.
=> Out of 11 positive tests, only 1 is true: P(Y|+) = 1/11 ≈ 9.1%
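The slide's calculation can be reproduced numerically. A short Python sketch (the variable names are my own):

```python
# Prior and likelihoods from the disease-testing example
p_disease = 0.001        # prior P(Y): 1 in 1,000 people have the disease
p_pos_given_Y = 1.0      # test is always positive given disease
p_pos_given_N = 0.01     # false-positive rate of 1%

# Normalizing constant P(+), by the law of total probability
p_pos = p_pos_given_Y * p_disease + p_pos_given_N * (1 - p_disease)

# Posterior P(Y|+) via Bayes' theorem
p_Y_given_pos = p_pos_given_Y * p_disease / p_pos

print(round(p_Y_given_pos, 3))  # 0.091
```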

Bayes' theorem: example IV

In Bayesian statistics we use data to update our ever-changing view of reality.
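One way to see this updating in action is to feed the posterior from one observation back in as the prior for the next. A hypothetical Python sketch extending the disease example to two independent positive tests (the two-test scenario is my own illustration, not on the slides):

```python
def update(prior, p_pos_given_Y=1.0, p_pos_given_N=0.01):
    """Posterior probability of disease after one positive test result."""
    evidence = p_pos_given_Y * prior + p_pos_given_N * (1 - prior)
    return p_pos_given_Y * prior / evidence

belief = 0.001                # initial prior: 1 in 1,000
for _ in range(2):            # two positive test results in a row
    belief = update(belief)   # yesterday's posterior is today's prior
    print(round(belief, 3))
# the first positive test raises the belief to ~0.091; the second to ~0.91
```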

MCMC: Markov chain Monte Carlo

Problem: for complicated models the parameter space is enormous, so it is not easy (or even possible) to find the posterior distribution analytically.

Solution: MCMC (Markov chain Monte Carlo):
– Start at a random position on the probability landscape
– Attempt a step of random length in a random direction
  (a) If the move ends higher up: accept the move
  (b) If the move ends lower down: accept the move with probability P(accept) = P_low / P_high
– Note the parameter values of accepted moves in a file

After many, many repetitions, points will have been sampled in proportion to the height of the probability landscape.
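The recipe above can be sketched as a minimal Metropolis sampler. This is an assumed illustration targeting an unnormalized standard normal density, not the phylogenetic posterior the slides have in mind:

```python
import math
import random

def target(x):
    """Unnormalized standard normal density: the 'probability landscape'."""
    return math.exp(-0.5 * x * x)

def metropolis(n_steps, step_size=1.0, seed=42):
    random.seed(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        # Attempt a step of random length in a random direction
        proposal = x + random.uniform(-step_size, step_size)
        # Uphill moves have ratio >= 1 and are always accepted;
        # downhill moves are accepted with probability P_low / P_high
        if random.random() < target(proposal) / target(x):
            x = proposal
        samples.append(x)   # note the current parameter value
    return samples

samples = metropolis(50_000)
mean = sum(samples) / len(samples)
print(round(mean, 1))  # should be near 0.0 for a standard normal target
```

In proper Metropolis sampling the current value is recorded every iteration (rejected proposals repeat the previous state), which is what the loop above does.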

MCMCMC: Metropolis-coupled Markov chain Monte Carlo

Problem: if there are multiple peaks in the probability landscape, then MCMC may get stuck on one of them.

Solution: Metropolis-coupled Markov chain Monte Carlo (MCMCMC = MC3).

MC3 essential features:
– Run several Markov chains simultaneously
– One chain is "cold": this chain performs the MCMC sampling
– The rest of the chains are "heated": they move faster across valleys
– Each turn, the cold and heated chains may swap positions (the swap probability is proportional to the ratio between the heights)
=> More peaks will be visited

More chains means a better chance of visiting all important peaks, but each additional chain increases run-time.
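The swap step can be sketched as a single function. This assumes the common formulation in which chain k samples the posterior raised to a power beta_k (beta = 1 for the cold chain, beta < 1 for heated chains); the slide's "ratio between heights" is a simplified description of this rule:

```python
def swap_accept_prob(p_i, p_j, beta_i, beta_j):
    """Metropolis acceptance probability for swapping the states of chains i and j.

    p_i, p_j: unnormalized posterior heights at the two chains' current states.
    beta_i, beta_j: the chains' temperatures (cold chain has beta = 1).
    """
    ratio = (p_j ** beta_i * p_i ** beta_j) / (p_i ** beta_i * p_j ** beta_j)
    return min(1.0, ratio)

# The heated chain (beta = 0.5) sits on a higher peak than the cold chain,
# so the swap is certain to be accepted:
print(swap_accept_prob(p_i=0.2, p_j=0.8, beta_i=1.0, beta_j=0.5))  # 1.0
```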

MCMCMC for inference of phylogeny

Result of a run: (a) substitution parameters, (b) tree topologies.

Posterior probability distributions of substitution parameters

Posterior Probability Distribution over Trees

– MAP (maximum a posteriori) estimate of phylogeny: the tree topology occurring most often in the MCMCMC output
– Clade support: the posterior probability of a group = the frequency of that clade among the sampled trees
– 95% credible set of trees: order the trees from highest to lowest posterior probability, then add trees starting from the highest probability until the cumulative posterior probability reaches 0.95
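The credible-set construction can be sketched directly from a list of sampled topologies (the tree labels and counts below are invented for illustration):

```python
from collections import Counter

# Pretend MCMCMC output: 100 sampled tree topologies
sampled = ["T1"] * 70 + ["T2"] * 20 + ["T3"] * 6 + ["T4"] * 4
counts = Counter(sampled)
total = sum(counts.values())

# Order trees from highest to lowest posterior probability, then add trees
# until the cumulative posterior probability reaches 0.95
credible, cumulative = [], 0.0
for tree, n in counts.most_common():
    credible.append(tree)
    cumulative += n / total
    if cumulative >= 0.95:
        break

print(credible)  # ['T1', 'T2', 'T3'] (cumulative probability 0.96)
```

The same frequency table gives the other two quantities: the MAP tree is `counts.most_common(1)`, and the clade support for a group is its frequency among the sampled trees.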