
Bayesian inference
Gil McVean, Department of Statistics
Monday 17th November 2008

Questions to ask…
– What is likelihood-based inference?
– What is Bayesian inference and why is it different?
– How do you estimate parameters in a Bayesian framework?
– How do you choose a suitable prior?
– How do you compare models in Bayesian inference?

A recap on likelihood
– For any model, the maximum information about the model parameters is obtained by considering the likelihood function.
– The likelihood function is proportional to the probability of observing the data given a specified parameter value.
– The likelihood principle states that all information about the parameters of interest is contained in the likelihood function.

An example
– Suppose we have data generated from a Poisson distribution, and we want to estimate the parameter of the distribution.
– The probability of observing a particular value x is P(X = x | λ) = λ^x e^(−λ) / x!
– If we have observed a series of iid Poisson RVs x_1, …, x_n, we obtain the joint likelihood by multiplying the individual probabilities together: L(λ) = ∏_i λ^(x_i) e^(−λ) / x_i!
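As a small illustration (a sketch, not part of the original slides), the joint log-likelihood can be computed by summing the log of each Poisson probability; the counts below are made up purely for illustration:

```python
import numpy as np
from scipy.stats import poisson

def poisson_log_likelihood(lam, counts):
    # Joint log-likelihood of iid Poisson counts: the log of the product of
    # individual pmf terms is the sum of the individual log-pmf terms.
    return poisson.logpmf(counts, mu=lam).sum()

counts = np.array([3, 7, 4, 6, 5])          # illustrative data only
for lam in (3.0, 5.0, 7.0):
    print(lam, poisson_log_likelihood(lam, counts))
```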

Relative likelihood
– We can compare the evidence for different parameter values through their relative likelihood.
– For example, suppose we observe counts of 12, 22, 14 and 8 from a Poisson process. The maximum likelihood estimate is the sample mean, 14.
– The relative likelihood is given by R(λ) = L(λ) / L(λ̂), the likelihood at λ divided by the likelihood at the MLE.
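A short sketch of this comparison: evaluate the likelihood on a grid and divide by its value at the MLE (the grid range here is arbitrary):

```python
import numpy as np
from scipy.stats import poisson

counts = np.array([12, 22, 14, 8])
lam_mle = counts.mean()                      # MLE of the Poisson mean is the sample mean (= 14)

def log_lik(lam):
    return poisson.logpmf(counts, mu=lam).sum()

grid = np.linspace(5, 30, 501)
rel_lik = np.exp([log_lik(l) - log_lik(lam_mle) for l in grid])   # L(lambda) / L(MLE)
print(rel_lik.max())                         # equals 1, attained at the MLE
```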

Maximum likelihood estimation
– The maximum likelihood estimate is the set of parameter values that maximises the probability of observing the data we got.
– The MLE is consistent: it converges to the truth as the sample size gets infinitely large.
– The MLE is asymptotically efficient: it achieves the minimum possible variance (the Cramér-Rao Lower Bound) as n → ∞.
– However, the MLE is often biased for finite sample sizes.
  – For example, the MLE of the variance of a normal distribution is the sample variance that divides by n rather than n − 1, which underestimates the true variance on average.
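A quick simulation (not from the slides) makes the bias concrete: for normal samples of size n, the divide-by-n estimator averages (n − 1)/n times the true variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 5, 4.0, 100_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
mle_var = x.var(axis=1, ddof=0)              # MLE: divide by n
unbiased_var = x.var(axis=1, ddof=1)         # divide by n - 1
print(mle_var.mean())                        # about 3.2 = (n - 1)/n * sigma^2
print(unbiased_var.mean())                   # about 4.0
```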

Confidence intervals and likelihood
– Thanks to the CLT, there is another useful result that allows us to define confidence intervals from the log-likelihood surface.
– Specifically, the set of parameter values for which the log-likelihood is no more than 1.92 units below the maximum log-likelihood defines an approximate 95% confidence interval.
  – In the limit of large sample size the likelihood ratio statistic is approximately chi-squared distributed under the null; 1.92 is half the 95% point of the chi-squared distribution with one degree of freedom (3.84 / 2).
– This is a very useful result, but shouldn't be assumed to hold automatically.
  – i.e. check with simulation.
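Continuing the Poisson example, a sketch of how this interval can be read off the log-likelihood curve numerically (grid range and resolution chosen arbitrarily):

```python
import numpy as np
from scipy.stats import poisson

counts = np.array([12, 22, 14, 8])

def log_lik(lam):
    return poisson.logpmf(counts, mu=lam).sum()

grid = np.linspace(5, 30, 5001)
ll = np.array([log_lik(l) for l in grid])
inside = grid[ll >= ll.max() - 1.92]         # values within 1.92 log-likelihood units of the maximum
print(inside.min(), inside.max())            # approximate 95% confidence interval for lambda
```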

Likelihood ratio tests
– Suppose we have two models, H0 and H1, in which H0 is a special case of H1.
– We can compare the likelihoods of the MLEs for the two models.
  – Note the maximised likelihood under H1 can be no worse than under H0.
– Theory shows that if H0 is true, then twice the difference in log-likelihood is asymptotically χ² distributed, with degrees of freedom equal to the difference in the number of free parameters between H0 and H1.
  – The likelihood ratio test.
– Theory also tells us that if H1 is true, then the likelihood ratio test is the most powerful test for discriminating between H0 and H1.
  – Useful, though perhaps not as useful as it sounds.
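A minimal sketch of the test for the Poisson example, with H0 fixing lambda at a hypothetical value of 10 and H1 leaving it free (one extra parameter, hence one degree of freedom):

```python
import numpy as np
from scipy.stats import poisson, chi2

counts = np.array([12, 22, 14, 8])

def log_lik(lam):
    return poisson.logpmf(counts, mu=lam).sum()

lam0 = 10.0                                  # H0: lambda fixed (illustrative choice)
lam_hat = counts.mean()                      # H1: lambda estimated by maximum likelihood
lrt = 2 * (log_lik(lam_hat) - log_lik(lam0)) # twice the difference in log-likelihood
print(lrt, chi2.sf(lrt, df=1))               # compare to chi-squared with 1 degree of freedom
```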

Criticisms of the frequentist approach
– The choice between models using P-values focuses on rejecting the null rather than establishing the appropriateness of the alternative.
– Representing uncertainty through confidence intervals is messy and unintuitive.
  – You cannot say that the probability of the true parameter lying within the interval is 0.95.
– The frequentist approach requires a predefined experimental design that must be followed through to completion, at which point the data are analysed.
  – Bayesian inference naturally adapts to interim analyses, changes in stopping rules, and combining data from different sources.
– Focusing on point estimation leads to models that are over-fitted to the data.

Bayesian estimators
– Bayesian statistics aims to make statements about the probability attached to different parameter values given the data you have collected.
– It makes use of Bayes' theorem:

  π(θ | x) = P(x | θ) π(θ) / ∫ P(x | θ′) π(θ′) dθ′

  where π(θ) is the prior, P(x | θ) is the likelihood, π(θ | x) is the posterior, and the integral in the denominator is the normalising constant.
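A sketch of Bayes' theorem in action on the Poisson example, using a grid approximation; the Gamma(2, scale 5) prior is a hypothetical choice, not one taken from the lecture:

```python
import numpy as np
from scipy.stats import poisson, gamma

counts = np.array([12, 22, 14, 8])
grid = np.linspace(0.01, 40, 4000)

prior = gamma.pdf(grid, a=2, scale=5)        # hypothetical prior on lambda
lik = np.array([poisson.pmf(counts, mu=l).prod() for l in grid])
unnorm = prior * lik                         # prior x likelihood
posterior = unnorm / unnorm.sum()            # dividing by the total plays the role of the normalising constant
print((grid * posterior).sum())              # posterior mean of lambda
```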

Are parameters random variables?
– The single most important conceptual difference between Bayesian statistics and frequentist statistics is the notion that the parameters you are interested in are themselves random variables.
– This notion is encapsulated in the use of a subjective prior for your parameters.
– Remember that to construct a confidence interval we have to define the set of possible parameter values. A prior does the same thing, but also gives a weight to different values.

Example: coin tossing
– I toss a coin twice and observe two heads.
– I want to perform inference about the probability of obtaining a head on a single throw for the coin in question.
– The MLE of the probability is 1.0, yet I have a very strong prior belief that the answer is 0.5.
– Bayesian statistics forces the researcher to be explicit about prior beliefs but, in return, can be very specific about what information has been gained by performing the experiment.
– It also provides a natural way of combining data from different experiments.
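A sketch of the Bayesian version of this example with a conjugate Beta prior; the Beta(50, 50) prior below is just one way of encoding a strong belief that the probability is near 0.5, not a choice made in the lecture:

```python
from scipy.stats import beta

a_prior, b_prior = 50, 50                    # hypothetical strong prior centred on 0.5
heads, tails = 2, 0                          # two tosses, two heads
a_post, b_post = a_prior + heads, b_prior + tails   # conjugate Beta-Binomial update
print(beta(a_post, b_post).mean())           # posterior mean ~ 0.51, far from the MLE of 1.0
```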

The posterior
– Bayesian inference about parameters is contained in the posterior distribution.
– The posterior can be summarised in various ways, for example by the posterior mean or a credible interval.
[Figure: prior and posterior densities, with the posterior mean and a credible interval indicated.]
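For instance, with the Beta posterior from the coin-tossing sketch above, the posterior mean and a central 95% credible interval can be read off directly:

```python
from scipy.stats import beta

post = beta(52, 50)                          # posterior from the coin-tossing sketch above
print(post.mean())                           # posterior mean
print(post.interval(0.95))                   # central 95% credible interval
```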

Choosing priors
– A prior reflects your belief before the experiment.
– It might be relatively unfocused:
  – uniform distributions in the case of single parameters;
  – the Jeffreys prior (and other 'uninformative' priors).
– Or it might be highly focused:
  – in the coin-tossing experiment, most of my prior mass would be on P = 0.5;
  – in an association study, my prior on a SNP being causal might be 1 in 10^7.

Using posteriors
– Posterior summaries provide statements about point estimates and certainty.
– Posterior prediction makes statements about future events.
– Posterior predictive simulation checks the fit of the model to the data.
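As an illustration of the last point, a posterior predictive check for the Poisson example; the Gamma prior is the same hypothetical one used in the earlier sketch, and conjugacy gives the posterior in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
counts = np.array([12, 22, 14, 8])

# Gamma(shape 2, rate 0.2) prior (the hypothetical prior from the earlier sketch);
# the conjugate posterior is Gamma(shape 2 + sum(x), rate 0.2 + n).
a_post, b_post = 2 + counts.sum(), 0.2 + len(counts)
lam = rng.gamma(shape=a_post, scale=1 / b_post, size=10_000)        # draws from the posterior
y_rep = rng.poisson(lam[:, None], size=(10_000, len(counts)))       # replicated data sets

# Check: is the spread of the observed counts typical of data simulated from the model?
print((y_rep.var(axis=1) >= counts.var()).mean())                   # posterior predictive p-value
```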

Bayes factors
– Bayes factors can be used to compare the evidence for different models.
  – These do not need to be nested.
– Bayes factors generalise the likelihood ratio by integrating the likelihood over the prior.
– Importantly, if model 2 is a subset of model 1, it does not follow that the Bayes factor in favour of model 1 is necessarily greater than 1.
  – The subspace of model 1 that improves the likelihood may be very small, and the extra parameter carries an extra cost.
– It is generally accepted that a BF of 3 is worth a mention, a BF of 10 is strong evidence, and a BF of 100 is decisive (Jeffreys).
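A sketch for the coin example: the Bayes factor for a model with a free head probability (here given an illustrative uniform prior, my choice rather than the lecture's) against the fixed p = 0.5 model is obtained by integrating the likelihood over the prior:

```python
from scipy.integrate import quad
from scipy.stats import beta, binom

heads, n = 2, 2                              # two heads in two tosses

def marginal_likelihood(a, b):
    # Likelihood integrated over a Beta(a, b) prior on the head probability
    integrand = lambda p: binom.pmf(heads, n, p) * beta.pdf(p, a, b)
    return quad(integrand, 0, 1)[0]

m1 = marginal_likelihood(1, 1)               # free-p model with an illustrative uniform prior
m0 = binom.pmf(heads, n, 0.5)                # fixed p = 0.5 model
print(m1 / m0)                               # Bayes factor ~ 1.33: next to no evidence either way
```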

Example
– Consider the crossing data of Bateson and Punnett, in which we want to estimate the recombination fraction.
– I will use a beta prior for the recombination fraction with parameters 3 and 7.

Bateson and Punnett experiment:

Phenotype and genotype      Observed    Expected from 9:3:3:1 ratio
Purple, long (P_L_)         284         ~214
Purple, round (P_ll)        21          72
Red, long (ppL_)            21          72
Red, round (ppll)           55          24

– Conditional on the total sample size (381), the likelihood function is described by the multinomial, with cell probabilities that depend on the recombination fraction r.
– We get a posterior distribution that can be summarised by the posterior mean, the posterior mode, and an equal-tailed posterior interval (ETPI) of 0.10–0.16.
– Comparing the model to one in which r = 0.5 gives a BF of 3.9.
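A sketch of this analysis on a grid, using the Beta(3, 7) prior from the slide. The cell probabilities below assume the standard coupling-phase F2 parameterisation, in which, writing theta = (1 − r)^2, the four phenotype classes have probabilities (2 + theta)/4, (1 − theta)/4, (1 − theta)/4 and theta/4; that parameterisation is my assumption, as the slide's own formula was not preserved in the transcript:

```python
import numpy as np
from scipy.stats import beta, multinomial

obs = np.array([284, 21, 21, 55])            # Bateson-Punnett counts, total 381

def cell_probs(r):
    # Assumed coupling-phase F2 parameterisation (see the note above)
    theta = (1.0 - r) ** 2
    return np.array([2 + theta, 1 - theta, 1 - theta, theta]) / 4

grid = np.linspace(0.001, 0.999, 999)
dr = grid[1] - grid[0]
prior = beta.pdf(grid, 3, 7)                 # Beta(3, 7) prior on the recombination fraction
lik = np.array([multinomial.pmf(obs, n=obs.sum(), p=cell_probs(r)) for r in grid])
post = prior * lik
post /= post.sum() * dr                      # normalise to a density on the grid

cdf = np.cumsum(post) * dr
print((grid * post).sum() * dr)              # posterior mean of r
print(grid[np.searchsorted(cdf, 0.025)],     # equal-tailed posterior interval
      grid[np.searchsorted(cdf, 0.975)])
```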

Bayesian inference and the notion of shrinkage
– The notion of shrinkage is that you can obtain better estimates by assuming a certain degree of similarity among the things you want to estimate, and a lack of complexity.
– Practically, this means three things:
  – borrowing information across observations;
  – penalising inferences that are very different from anything else;
  – penalising more complex models.
– The notion of shrinkage is implicit in the use of priors in Bayesian statistics.
– There are also forms of frequentist inference where shrinkage is used.
  – But NOT MLE.