Bayes Theorem, a.k.a. Bayes Rule
P548: Intro Bayesian Stats with Psych Applications
Instructor: John Miyamoto
01/06/2016: Lecture 01-2
Outline
  Bayes Rule
  Odds Form of Bayes Rule
  Application of Bayes Rule to the Interpretation of a Medical Test
  Overview: Bayesian Statistical Inference versus Classical Statistical Inference
  ----------
  Learn R & RStudio – write a Bayesian inference function
Psych 548: Miyamoto, Win ‘16
Bayes Rule
  Reverend Thomas Bayes, 1702 – 1761
  British Protestant minister & mathematician
  Bayes Rule is fundamentally important to:
    Bayesian statistics
    Bayesian decision theory
    Bayesian models in psychology
Bayes Rule – Explanation
  P(Hypothesis | Data) = P(Data | Hypothesis) P(Hypothesis) / P(Data)
    P(Hypothesis | Data) = Posterior Probability of the Hypothesis
    P(Data | Hypothesis) = Likelihood of the Data
    P(Hypothesis) = Prior Probability of the Hypothesis
    P(Data) = Normalizing Constant
Bayes Rule – Explanation
  Information Needed to Compute P(Data):
    P(Data | Hypothesis)
    P(Data | Not-Hypothesis)
    P(Hypothesis)
    P(Not-Hypothesis) = 1 - P(Hypothesis)
  P(Data) = P(Data | Hypothesis) P(Hypothesis) + P(Data | Not-Hypothesis) P(Not-Hypothesis)
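The four quantities listed above are enough to compute a posterior probability. The following is a minimal Python sketch, not part of the original slides; the function name `posterior_probability` and the example numbers are illustrative assumptions.

```python
def posterior_probability(prior_h, p_data_given_h, p_data_given_not_h):
    """Return P(Hypothesis | Data) for a hypothesis and its negation."""
    prior_not_h = 1.0 - prior_h  # P(Not-Hypothesis) = 1 - P(Hypothesis)
    # Normalizing constant P(Data): total probability over H and not-H.
    p_data = p_data_given_h * prior_h + p_data_given_not_h * prior_not_h
    if p_data == 0:
        raise ValueError("P(Data) is zero, so the posterior is undefined.")
    return p_data_given_h * prior_h / p_data


# Hypothetical inputs, chosen only for illustration.
example = posterior_probability(prior_h=0.5,
                                p_data_given_h=0.8,
                                p_data_given_not_h=0.2)
print(round(example, 3))
```

With an even prior, the posterior depends only on the two likelihoods; a lower prior pulls the posterior down even when the likelihoods are unchanged.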
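Bayes Rule – Odds Form
  Bayes Rule for H given D:
    P(H | D) = P(D | H) P(H) / P(D)
  Bayes Rule for not-H given D:
    P(not-H | D) = P(D | not-H) P(not-H) / P(D)
  Odds Form of Bayes Rule (divide the first equation by the second; P(D) cancels):
    P(H | D) / P(not-H | D) = [P(D | H) / P(D | not-H)] x [P(H) / P(not-H)]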
Bayes Rule (Odds Form)
  P(H | D) / P(not-H | D) = [P(D | H) / P(D | not-H)] x [P(H) / P(not-H)]
    Posterior Odds = Likelihood Ratio (diagnosticity) x Prior Odds (base rate)
  The concept of odds:
    E.g., if you think that Obama has 2 to 1 odds to beat McCain, you think Obama is twice as likely as McCain to win.
    E.g., JM’s personal odds are more like 4 to 1 for Obama to beat McCain.
    E.g., if the odds are even (1 to 1) that Obama will beat McCain, then their chances are equal (50%).
  H = a hypothesis, e.g., the hypothesis that the patient has cancer
  not-H = the negation of the hypothesis, e.g., the hypothesis that the patient does not have cancer
  D = the data, e.g., a + result for a cancer test
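The odds form can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the lecture: the function name `posterior_odds` and the example inputs are hypothetical, and the sketch assumes P(H) < 1 and P(D | not-H) > 0 so that both ratios are defined.

```python
def posterior_odds(prior_h, p_data_given_h, p_data_given_not_h):
    """Return the posterior odds P(H | D) / P(not-H | D)."""
    prior_odds = prior_h / (1.0 - prior_h)                   # base rate
    likelihood_ratio = p_data_given_h / p_data_given_not_h   # diagnosticity
    return likelihood_ratio * prior_odds


# Hypothetical inputs: even prior odds and data that are three times
# as likely under H as under not-H.
odds = posterior_odds(prior_h=0.5, p_data_given_h=0.9, p_data_given_not_h=0.3)
print(round(odds, 3))
```

Note that P(D) never has to be computed here, because it cancels when the two versions of Bayes Rule are divided.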
Bayesian Analysis of a Medical Test Result (Look at Handout)
  QUESTION: A physician knows from past experience in his practice that 1% of his patients have cancer (of a specific type) and 99% of his patients do not have the cancer. He also knows the probabilities of a positive test result (+ result) given cancer and given no cancer. These probabilities are:
    P(+ test | Cancer) = .792 and P(+ test | no cancer) = .096
  Suppose Mr. X has a positive test result. What is the probability that Mr. X has cancer?
  Write down your intuitive answer. (Note to JM: Write estimates on board)
Given Information in the Diagnostic Inference from a Medical Test Result
  P(+ test | Cancer) = .792 (true positive rate, a.k.a. hit rate)
  P(+ test | no cancer) = .096 (false positive rate, a.k.a. false alarm rate)
  P(Cancer) = Prior probability of cancer = .01
  P(No Cancer) = Prior probability of no cancer = 1 - P(Cancer) = .99
  Mr. X has a + test result. What is the probability that Mr. X has cancer?
Bayesian Analysis of a Medical Test Result
  P(+ test | Cancer) = 0.792 and P(+ test | no cancer) = 0.096
  P(Cancer) = Prior probability of cancer = 0.01
  P(No Cancer) = Prior probability of no cancer = 0.99
  Posterior odds = (0.792 / 0.096) x (0.01 / 0.99) = 8.25 x 0.0101 = 0.0833, or 1 to 12
  P(Cancer | + test) = 1 / (12 + 1) = 0.077
Digression: Converting Odds to Probabilities
  Let X be the probability of an event.
  If X / (1 – X) = Y = the odds of X versus not-X
  Then X = Y(1 – X) = Y – XY
  So X + XY = Y
  So X(1 + Y) = Y
  So X = Y / (1 + Y)
  Conclusion: If Y is the odds for an event, then Y / (1 + Y) is the probability of the event.
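The conversion derived above, and its inverse, can be written as a pair of small Python functions. This is a sketch for illustration only; the function names are assumptions, and the inputs 2, 4, and 1 reuse the "2 to 1", "4 to 1", and "even" odds mentioned earlier in the lecture.

```python
def probability_to_odds(p):
    """Odds Y = X / (1 - X) for a probability X in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Probability X = Y / (1 + Y) for non-negative odds Y."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


for y in (2.0, 4.0, 1.0):
    print(f"odds {y:g} to 1 -> probability {odds_to_probability(y):.3f}")
```

Applying one function and then the other returns the starting value, which is a quick check that the algebra in the derivation is right.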
Bayesian Analysis of a Medical Test Result
  P(+ test | Cancer) = 0.792 and P(+ test | no cancer) = 0.096
  P(Cancer) = Prior probability of cancer = 0.01
  P(No Cancer) = Prior probability of no cancer = 0.99
  Posterior odds = 1/12, so P(Cancer | + test) = (1/12) / (1 + 1/12) = 1 / (12 + 1) = 0.077
Continue with the Medical Test Problem
  P(Cancer | + Result) = (.792)(.01)/(.103) = .077
  Posterior odds for cancer are (.077)/(1 - .077) = .083, or about 1 to 12 (about 1 chance in 13).
  Notice: The test is very diagnostic but still P(cancer | + result) is low because the base rate is low.
  David Eddy found that about 95 out of 100 physicians stated that P(cancer | + result) is about 75% in this case (very close to the 79% likelihood of a + result given cancer).
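To see how strongly the base rate drives the answer, the same test can be applied at several base rates. The Python sketch below is an illustration, not part of the lecture: the hit rate and false alarm rate come from the problem statement, while the base rates 0.10 and 0.50 are hypothetical values added for comparison.

```python
HIT_RATE = 0.792          # P(+ test | cancer), from the problem statement
FALSE_ALARM_RATE = 0.096  # P(+ test | no cancer), from the problem statement


def p_cancer_given_positive(base_rate):
    """P(cancer | + test) for a given prior probability of cancer."""
    p_positive = HIT_RATE * base_rate + FALSE_ALARM_RATE * (1.0 - base_rate)
    return HIT_RATE * base_rate / p_positive


# 0.01 is the base rate in the problem; 0.10 and 0.50 are hypothetical.
for base_rate in (0.01, 0.10, 0.50):
    posterior = p_cancer_given_positive(base_rate)
    print(f"base rate {base_rate:.2f} -> P(cancer | + test) = {posterior:.3f}")
```

The test itself is identical in every row; only the prior changes, and the posterior rises with it.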
General Characteristics of Bayesian Inference
  The decision maker (DM) is willing to specify the prior probability of the hypotheses of interest.
  DM can specify the likelihood of the data given each hypothesis.
  Using Bayes Rule, we infer the probability of the hypotheses given the data.
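These three steps carry over directly to any finite set of mutually exclusive, exhaustive hypotheses. The following Python sketch is one possible way to write that update; the function name `bayes_update` and the dictionary layout are assumptions made for illustration, and the numbers are those of the medical test problem.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for a finite set of hypotheses.

    priors:      {hypothesis: P(hypothesis)}, assumed to sum to 1
    likelihoods: {hypothesis: P(data | hypothesis)}
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(unnormalized.values())  # normalizing constant P(data)
    if p_data == 0:
        raise ValueError("P(data) is zero, so the posterior is undefined.")
    return {h: value / p_data for h, value in unnormalized.items()}


priors = {"cancer": 0.01, "no cancer": 0.99}
likelihoods = {"cancer": 0.792, "no cancer": 0.096}  # P(+ test | hypothesis)
posterior = bayes_update(priors, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"P({hypothesis} | + test) = {probability:.3f}")
```

Nothing in the function is specific to two hypotheses; adding a third entry to both dictionaries works the same way.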
How Does Bayesian Stats Differ from Classical Stats?
  Bayesian: Common Aspects
    Statistical Models
    Credible Intervals – sets of parameters that have high posterior probability
  Bayesian: Divergent Aspects
    Given data, compute the full posterior probability distribution over all parameters
    Generally null hypothesis testing is nonsensical.
    Posterior probabilities are meaningful; p-values are half-assed.
    MCMC approximations to posterior distributions.
  Classical: Common Aspects
    Statistical Models
    Confidence Intervals – which parameter values are tenable after viewing the data.
  Classical: Divergent Aspects
    No prior distributions in general, so this idea is meaningless or self-deluding.
    Null hypothesis testing
    P-values
    MCMC approximations are sometimes useful but not for computing posterior distributions.
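As a rough illustration of what "the full posterior distribution" and a credible interval look like in practice, the Python sketch below uses a grid approximation for a single binomial proportion. Everything in it is an assumption made for illustration: the data (7 successes in 10 trials), the flat prior, the grid size, and the function names. It uses a plain grid rather than MCMC, and it is not the method used in the course.

```python
from math import comb


def grid_posterior(successes, trials, grid_size=1001):
    """Posterior over a binomial proportion theta on an evenly spaced grid."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # flat prior (an assumption)
    likelihood = [
        comb(trials, successes) * t**successes * (1.0 - t) ** (trials - successes)
        for t in thetas
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)  # normalizing constant P(data)
    return thetas, [u / total for u in unnormalized]


def central_credible_interval(thetas, posterior, mass=0.95):
    """Grid points that cut off equal posterior tails around `mass`."""
    tail = (1.0 - mass) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for theta, p in zip(thetas, posterior):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = theta
        if upper is None and cumulative >= 1.0 - tail:
            upper = theta
    return lower, upper


# Hypothetical data: 7 successes in 10 trials.
thetas, posterior = grid_posterior(successes=7, trials=10)
low, high = central_credible_interval(thetas, posterior)
print(f"95% credible interval for theta: [{low:.2f}, {high:.2f}]")
```

The interval is read directly off the posterior: given the data and the prior, theta lies between the two endpoints with about 95% posterior probability.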
Next
  Go to: Demo01-2: Using R to Think About Bayesian Inference