Bayes Theorem

Prior Probabilities On the way to a party, you ask, "Has Karl already had too many beers?" Your prior probabilities are 20% yes, 80% no.

Prior Odds, Omega The prior odds Ω are the ratio of the two prior probabilities: Ω = P(yes)/P(no) = .20/.80 = .25. What new data would make you revise the priors?

Likelihood Ratio, LR If I have had too many beers, there is a 30% likelihood that I will act awfully. If I have not had too many beers, the likelihood is only 3%. The likelihood ratio is LR = .30/.03 = 10.

Multiplication Rule of Probability P(A ∧ B) = P(A)P(B|A) = P(B)P(A|B). Thus P(B|A) = P(A ∧ B)/P(A) = P(B)P(A|B)/P(A).

Addition Rule of Probability Since B and Not B are mutually exclusive, P(A) = P(A ∧ B) + P(A ∧ ¬B). Substituting this for the denominator of our previous expression, P(B|A) = P(A ∧ B)/[P(A ∧ B) + P(A ∧ ¬B)].

Multiplication Rule Again P(A ∧ B) = P(B)P(A|B) and P(A ∧ ¬B) = P(¬B)P(A|¬B). Now substitute the right-hand expressions in our previous expression, which was P(B|A) = P(A ∧ B)/[P(A ∧ B) + P(A ∧ ¬B)].

Bayes Theorem Yielding P(B|A) = P(B)P(A|B)/[P(B)P(A|B) + P(¬B)P(A|¬B)].

Revising the Prior Probability You arrive at the party. Karl is behaving awfully. You revise your prior probability that Karl has had too many beers, obtaining a posterior probability.

A = behaving awfully, B = had too many beers. Prior probabilities: P(B) = .20, P(¬B) = .80. Likelihoods: P(A|B) = .30, P(A|¬B) = .03.

Posterior Odds Given that Karl is behaving awfully, the probability that he has had too many beers is revised to P(B|A) = (.20)(.30)/[(.20)(.30) + (.80)(.03)] = .06/.084 = .714. And the odds are revised from .25 to .714/.286 = 2.5.

Bayes Theorem Restated The posterior odds = the prior odds × the likelihood ratio: 2.5 = .25 × 10.
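
As a quick check of the arithmetic, here is a minimal Python sketch of the whole update. The numbers are the slides' own; the function posterior_prob is just an illustrative name, not anything from the original presentation.

```python
def posterior_prob(prior_b, like_b, like_not_b):
    """Bayes theorem for a binary hypothesis:
    P(B|A) = P(B)P(A|B) / [P(B)P(A|B) + P(not B)P(A|not B)]."""
    num = prior_b * like_b
    return num / (num + (1 - prior_b) * like_not_b)

p_beers = 0.20          # prior P(B): Karl has had too many beers
p_awful_beers = 0.30    # P(A|B): acts awfully if he has
p_awful_sober = 0.03    # P(A|not B): acts awfully if he has not

prior_odds = p_beers / (1 - p_beers)      # 0.25
lr = p_awful_beers / p_awful_sober        # 10.0
post = posterior_prob(p_beers, p_awful_beers, p_awful_sober)

print(post)               # 0.714...
print(prior_odds * lr)    # posterior odds: 2.5
print(post / (1 - post))  # same posterior odds, computed directly: 2.5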

Bayesian Hypothesis Testing H₀: μ_IQ = 100. H₁: μ_IQ = 110. P(H₀) and P(H₁) are prior probabilities; I'll set both equal to .5. D is the obtained data. P(D|H₀) and P(D|H₁) are the likelihoods. P(D|H₀) is a bit like the p value from classical hypothesis testing.

Compute Test Statistics D: a sample of 25 scores, M = 107. Assume σ = 15, so σ_M = 3. Compute z for each hypothesis: for H₀, z = (107 − 100)/3 = 2.33; for H₁, z = (107 − 110)/3 = −1. Assume that z is normally distributed.

Obtain the Likelihoods and P(D) For each hypothesis, weight the probability density of z by the prior of .5, since we consider the null and the alternative equally likely: P(H₀)p(D|H₀) = .5(.0264) = .0132 and P(H₁)p(D|H₁) = .5(.2420) = .1210. Their sum, P(D) = .0132 + .1210 = .1342, is the denominator of the ratio in Bayes theorem.

Calculate Posterior Probabilities P(H₀|D) = .0132/.1342 ≈ .098 and P(H₁|D) = .1210/.1342 ≈ .902. P(H₀|D) is what many researchers mistakenly think the traditional p value is. The traditional p is P(D|H₀).

Calculate Posterior Odds .902/.098 ≈ 9.2. Given our data, the alternative hypothesis is more than 9 times more likely than the null hypothesis. Is this enough to persuade you to reject the null? No? Then let us gather more data.
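
This first analysis can be sketched in Python, assuming SciPy is available for the normal density; the values match the slides up to rounding.

```python
from scipy.stats import norm

m, sigma_m = 107, 3                 # sample mean, standard error = 15 / sqrt(25)
mu = {"H0": 100, "H1": 110}         # the two point hypotheses
prior = {"H0": 0.5, "H1": 0.5}

# Prior times likelihood (normal density of z) for each hypothesis.
w = {h: prior[h] * norm.pdf((m - mu[h]) / sigma_m) for h in mu}
p_d = sum(w.values())               # P(D), the denominator in Bayes theorem

post = {h: w[h] / p_d for h in mu}
print(post)                         # {'H0': ~0.098, 'H1': ~0.902}
print(post["H1"] / post["H0"])      # posterior odds, ~9.2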

Calculate Likelihoods and P(D) For a new sample of 25, M = 106. z = 2 under the null, probability density .0540; z = −1.33 under the alternative, probability density .1647. These densities are the likelihoods P(D|H₀) and P(D|H₁). The posterior probabilities from the previous analysis, .098 and .902, are the prior probabilities in the new analysis.

Revise the Probabilities, Again P(H₀|D) = .098(.0540)/[.098(.0540) + .902(.1647)] = .0053/.1539 = .0344. With the posterior probability of the null at .0344, we are likely comfortable rejecting it.

Newly Revised Posterior Odds .9656/.0344 ≈ 28. The alternative is more than 28 times more likely than the null.
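
Continuing the same sketch, the posteriors from the first sample become the priors for the second sample of 25 (M = 106):

```python
from scipy.stats import norm

mu = {"H0": 100, "H1": 110}
prior = {"H0": 0.098, "H1": 0.902}  # posteriors from the first sample
m2, sigma_m = 106, 3

w = {h: prior[h] * norm.pdf((m2 - mu[h]) / sigma_m) for h in mu}
p_d = sum(w.values())

post = {h: w[h] / p_d for h in mu}
print(post)                         # {'H0': ~0.034, 'H1': ~0.966}
print(post["H1"] / post["H0"])      # newly revised posterior odds, ~28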

The Alternative Hypothesis Note that the alternative hypothesis here was exact, μ = 110. How do we set it? It could be the prediction of an alternative theory. Or we could make it μ = the value most likely given the observed data (the sample mean).

P(H₀|D) and P(D|H₀) P(H₀|D) is the probability that naïve researchers think they have when they compute a p value. What they really have is P(D or more extreme|H₀). So why don't more researchers use Bayesian stats to get P(H₀|D)? Traditionalists are uncomfortable with the subjectivity involved in setting prior probabilities.

Bayesian Confidence Intervals Parameters are thought of as random variables rather than constant in value. The distribution of a random variable represents our knowledge about what its true value may be. The wider that distribution, the greater our ignorance.

Precision (prc) The prc is the inverse of the variance of the distribution of the parameter. Thus, the greater the prc, the more we know about the parameter. For means, SEM² = s²/N, and the inverse of SEM² is N/s² = precision.

Priors: Informative or Non-informative We may think of the prior distribution of the parameter as noninformative (all possible values equally likely; for example, a uniform distribution from 0 to 1, or a uniform distribution from −∞ to +∞) or as informative (some values more likely than others; for example, a normal distribution with a certain mean).

Posterior Distribution of the Parameter When we receive new data, we revise the prior distribution of the parameter. We can construct a confidence interval from the posterior distribution. Example: We want to estimate μ.

Estimating μ We confess absolute ignorance about the value of μ, but are willing to assume a normal distribution for the parameter. We sample 100 scores: M = 107, s² = 200. The precision = the inverse of the squared standard error = N/s² = 100/200 = .5.

95% Bayesian Confidence Interval With the flat prior, the posterior distribution of μ is normal with mean 107 and SEM² = s²/N = 2, so the 95% interval is 107 ± 1.96√2 = 107 ± 2.77, i.e., 104.23 to 109.77. This is identical to the traditional CI.
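
A short sketch of this computation, again assuming the flat prior described above:

```python
from math import sqrt

n, m, s2 = 100, 107, 200
sem2 = s2 / n                # 2.0, so precision = 1/sem2 = 0.5
half = 1.96 * sqrt(sem2)

print((m - half, m + half))  # (104.23, 109.77) -- same as the traditional CI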

New Data Become Available N = 81, M = 106, s² = 243. Precision = 81/243 = 1/3 = prc_sample. Our prior distribution, the posterior distribution from the first analysis, had M = 107, precision = ½. The new posterior distribution will be characterized by a weighted combination of the prior distribution and the new data.

Revised μ The revised mean is the precision-weighted average of the prior and sample means: μ = (½(107) + ⅓(106))/(½ + ⅓) = 88.83/.8333 = 106.6.

Revised SEM² Revised precision = sum of prior and sample precisions = ½ + ⅓ = .8333. Revised SEM² = inverse of revised precision = 1/.8333 = 1.2.

Revised Confidence Interval 106.6 ± 1.96√1.2 = 106.6 ± 2.15, i.e., 104.45 to 108.75.
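
And the whole precision-weighted update, as a sketch:

```python
from math import sqrt

m_prior, prc_prior = 107, 0.5    # posterior from the first sample: 100/200
m_new, prc_new = 106, 81 / 243   # new-sample mean and precision, 1/3

prc_post = prc_prior + prc_new   # 0.8333: precisions add
m_post = (prc_prior * m_prior + prc_new * m_new) / prc_post  # 106.6
sem2_post = 1 / prc_post         # 1.2, the revised SEM^2

half = 1.96 * sqrt(sem2_post)
print(m_post, (m_post - half, m_post + half))  # 106.6, ~(104.45, 108.75)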