Bayesian Statistics HSTAT1101: 10 November 2004, Arnoldo Frigessi


Reverend Thomas Bayes: a mathematician who first used probability inductively and established a mathematical basis for probability inference, and the inventor of a "Bayesian analysis" for the binomial model. He set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London. At about the same time, Laplace independently discovered Bayes' Theorem, together with a new analytic tool for approximating integrals. Bayesian statistics was the method of statistics until the early twentieth century.

Statistics estimates unknown parameters (like the mean of a population). Parameters represent things that are unknown: they are properties of the population from which the data arise. Questions of interest are expressed as questions about such parameters: confidence intervals, hypothesis testing, etc. Classical (frequentist) statistics considers parameters as fixed quantities specific to the problem, not subject to random variability. Hence parameters are just unknown numbers; they are not random, and it is not possible to make probabilistic statements about them (like "the parameter has a 35% chance of being larger than 0.75"). Bayesian statistics considers parameters as unknown and random, and hence it is allowed to make probabilistic statements about them (like the one above). In Bayesian statistics parameters are uncertain either because they are genuinely random or because of our imperfect knowledge of them.

Example: "Treatment 2 is more cost-effective than treatment 1 for a certain hospital." Parameters involved in this statement:
- mean cost and mean efficacy for treatment 1
- mean cost and mean efficacy for treatment 2
across all patients in the population for which the hospital is responsible.
Bayesian point of view: we are uncertain about the statement, so this uncertainty is described by a probability. We can calculate exactly the probability that treatment 2 is more cost-effective than treatment 1 for the hospital.
Classical point of view: either treatment 2 is more cost-effective or it is not. Since this "experiment" cannot be repeated (it happens only once), we cannot talk about its probability.

... but, in classical statistics, we can make a test! Null hypothesis Ho: treatment 2 is NOT more cost-effective than treatment 1... and we can obtain a p-value! What is a p-value?

Correct answer: 2 (but it is quite a complicated explanation, isn't it?). Answers 1 and 3 are ways in which significance is commonly interpreted, BUT they are not correct. Answer 3 makes a probabilistic statement about the hypothesis, which is not random but either true or false. Answer 1 is about individual patients, while the test is about cost-efficacy.

We cannot interpret a p-value as the probability that the hypothesis is true, because in the classical setting it is irrelevant how probable the hypothesis was a priori, before the data were collected. Example: Can a healer cure cancer? A healer treated 52 cancer patients and 33 of these were better after one session. Null hypothesis Ho: the healer does not heal. The one-sided p-value is 3.5%, hence we reject at the 5% level. Should we believe that it is 96.5% sure that the healer heals? Most doctors would regard healers as highly unreliable, and in no way would they be persuaded by a single small experiment. After seeing the experiment, most doctors would continue to believe in Ho, concluding that the result was due to chance.
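
The slide does not state the null model behind the 3.5% figure; it is consistent with a one-sided binomial test in which, under Ho, each patient independently has a 50% chance of being better after one session (the 50% is our assumption). A sketch:

```python
from scipy.stats import binom

n, better = 52, 33
p0 = 0.5  # assumed null: improvement is a coin flip, the healer has no effect

# one-sided p-value: probability of 33 or more improvements under Ho
p_value = binom.sf(better - 1, n, p0)  # sf(32) = P(X >= 33)
print(f"one-sided p-value = {p_value:.3f}")  # ~0.035, i.e. about 3.5%
```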

In practice, classical statistics would recognise that much stronger evidence is needed to reject a very likely Ho. So the p-value does not, in reality, mean the same thing in all situations. To interpret the p-value as the probability of the null hypothesis is not only wrong but dangerous when the hypothesis is a priori highly unlikely. Many practising statisticians find it disturbing that a p-value cannot be seen as the probability that the null hypothesis is true. Similarly, it is disturbing that a 95% confidence interval for a treatment difference does NOT mean that the true difference has a 95% chance of lying in this interval.

Classical confidence interval: [3.5, 11.6] is a 95% confidence interval for the mean cost of ... Interpretation: "There is a 95% chance that the mean lies between 3.5 and 11.6." Correct? NO! It cannot mean this, since the mean cost is not random! In the Bayesian context, parameters are random, and when we compute a Bayesian interval for the mean, it means exactly what is usually (wrongly) read into a confidence interval. In classical inference, the words confidence and significance are technical terms and should be interpreted as such!

One widely used way of presenting a cost-effectiveness analysis is the Cost-Effectiveness Acceptability Curve (CEAC), introduced by van Hout et al (1994). For each value of the threshold willingness to pay λ, the CEAC plots the probability that one treatment is more cost-effective than another. This probability can only be meaningful in a Bayesian framework: it refers to the probability of a one-off event (the relative cost-effectiveness of these two particular treatments is one-off, not repeatable).
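
A minimal sketch of how a CEAC is computed in a Bayesian analysis, assuming we already have posterior draws of the incremental effect and incremental cost of one treatment over another (the draws below are hypothetical, not trial data):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical posterior draws: incremental effect (e.g. survival days) and cost (£)
d_effect = rng.normal(1.5, 1.0, 10_000)
d_cost = rng.normal(2000.0, 1500.0, 10_000)

# CEAC: for each willingness-to-pay lambda, the posterior probability that the
# net monetary benefit lambda * dE - dC of the treatment is positive
for lam in (0, 1_000, 2_000, 5_000):
    p = np.mean(lam * d_effect - d_cost > 0)
    print(f"lambda = £{lam}: P(cost-effective) = {p:.2f}")
```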

Example: randomised clinical trial evidence
Studies: 1 RCT (n = 107)
Comparators: dopexamine vs standard care
Follow-up: 28 days
Economics: single cost analysis
Boyd O, Grounds RM, Bennett ED. A randomised clinical trial of the effect of deliberate perioperative increase of oxygen delivery on mortality in high-risk surgical patients. JAMA 1993; 270:

Trial results, costs (£), mean (se):
Standard: 11,885 (3,477)
Adrenaline: 10,847 (3,644)
Dopexamine: 7,976 (1,407)

Trial CEAC curves: the probability that each strategy (Control, Dopexamine, Adrenaline) is cost-effective, plotted against willingness to pay from £0 to £100,000.

The Bayesian method: learn from the data. The role of data is to add to our knowledge and so to update what we can say about hypotheses and parameters. If we want to learn from a new data set, we must first say what we already know about the hypothesis a priori, before we see the data. Bayesian statistics summarises what is known a priori about an unknown parameter (say, the mean cost of something) in a distribution for the unknown quantity, called the prior distribution. The prior distribution synthesises what is known or believed to be true before we analyse the new data. We then analyse the new data and summarise the total information about the unknown hypothesis (or parameter) in a distribution called the posterior distribution. Bayes' formula is the mathematical way to calculate the posterior distribution given the prior distribution and the data.

Figure: three curves showing the prior, the data (likelihood) and the posterior.

Bayes' formula recognises the strength of each curve: the posterior is more influenced by the data than by the prior, since the data have a narrower distribution. Peaks: prior = 0, data = 1.60, posterior = 1.30.

The data curve is called the likelihood, and it is also important in classical statistics. It describes the support that comes from the data for the various possible values of the unknown parameter. Classical statistics uses only the likelihood; Bayesian statistics uses all three curves. The classical estimate here would be the peak of the likelihood (1.6). The Bayes estimate is about 1.3, since this includes our prior belief that the parameter should have a value below 2 or so.
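
A sketch of the precision-weighting behind this compromise, assuming normal prior and likelihood curves; the peak locations (0 and 1.6) come from the slides, while the spreads are illustrative assumptions:

```python
# Conjugate normal-normal update: the posterior mean is a precision-weighted
# average of the prior mean and the likelihood (data) mean.
prior_mean, prior_sd = 0.0, 1.0   # prior peaks at 0 (spread assumed)
data_mean, data_sd = 1.6, 0.5     # likelihood peaks at 1.6 and is narrower

prior_prec = 1 / prior_sd**2      # precision = 1 / variance
data_prec = 1 / data_sd**2        # narrower curve => higher precision => more weight

post_mean = (prior_prec * prior_mean + data_prec * data_mean) / (prior_prec + data_prec)
print(f"posterior peak ~ {post_mean:.2f}")  # 1.28, close to the 1.3 on the slide
```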

The Bayesian estimate is a compromise between the data and prior knowledge. In this sense, Bayesian statistics is able to make use of more information than classical statistics and hence obtain stronger results.

Bayesian statistics reads confidence intervals, estimates, etc. from the posterior distribution. A point estimate for the parameter is the a posteriori most likely value (the peak of the posterior) or the expected value of the posterior. If we have a hypothesis (for example, that the parameter is positive), then we read the posterior probability that the parameter is larger than zero directly off the posterior distribution.

If we are less sure about the parameter a priori, then we use a flatter prior. The consequence is that the posterior looks more similar to the likelihood (the data curve).

posterior = (normalising constant) × prior × likelihood ∝ prior × likelihood
P(parameter | data) ∝ P(parameter) × P(data | parameter)
P(θ | data) ∝ P(θ) × P(data | θ)
P(θ | data) = P(θ) × P(data | θ) / P(data)   (Bayes' formula)
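
As a minimal numerical illustration of the formula, a sketch with hypothetical data (6 successes in 10 binomial trials) and a flat prior over a grid of parameter values:

```python
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.01, 0.99, 99)          # grid of candidate parameter values
prior = np.full(theta.size, 1 / theta.size)  # flat prior: adds no information
likelihood = binom.pmf(6, 10, theta)         # P(data | theta): 6 successes in 10 trials

unnorm = prior * likelihood                  # prior x likelihood
posterior = unnorm / unnorm.sum()            # dividing by P(data) makes it sum to 1

print(f"posterior peak at theta = {theta[posterior.argmax()]:.2f}")  # 0.60
```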

How do we choose a prior distribution? The prior is subjective: two different experts can have different knowledge and beliefs, which would lead to two different priors. If you have no opinion, it is possible to use a totally flat prior, which adds no information to what is in the data. If we want clear probabilistic interpretations of confidence and significance, then we need priors. This is considered a weakness by many who are trained to reject subjectivity whenever possible. BUT:
- Science is not objective in any case: why should the binomial or the Gaussian distribution be the true one for a data set?
- Subjective evidence can be tuned down as much as one wishes.
- If there is no consensus, and different priors lead to different decisions, why hide it?

Example: Cancer at Slater School (example taken from an article by Paul Brodeur in the New Yorker, Dec.). Slater School is an elementary school where the staff was concerned that their high cancer rate could be due to two nearby high-voltage transmission lines.
Key facts:
- there were 8 cases of invasive cancer over a long time among 145 staff members whose average age was between 40 and 44
- based on the national cancer rate among women of this age (approximately 3/100), the expected number of cancers is 4.2
Assumptions:
1) the 145 staff members developed cancer independently of each other
2) the chance of cancer, θ, was the same for each staff member
Therefore, the number of cancers, X, follows a binomial distribution: X ~ Bin(145, θ). How well does each of four simplified competing theories explain the data?
Theory A: θ = .03 (the national rate, i.e. no effect of the lines)
Theory B: θ = .04
Theory C: θ = .05
Theory D: θ = .06

The Likelihood of Theories A-D. To compare the theories, we see how well each explains the data. That is, for each hypothesised θ, we calculate the binomial probability of the observed count:
Theory A: Pr(X = 8 | θ = .03) ≈ .036
Theory B: Pr(X = 8 | θ = .04) ≈ .096
Theory C: Pr(X = 8 | θ = .05) ≈ .134
Theory D: Pr(X = 8 | θ = .06) ≈ .136
This is a ratio of approximately 1:3:4:4. So Theory B explains the data about 3 times as well as Theory A. There seems to be an effect of the lines!

A Bayesian Analysis. There are other sources of information about whether cancer can be induced by proximity to high-voltage transmission lines:
- some epidemiologists have found positive correlations between cancer and proximity;
- other epidemiologists have not found these correlations, and physicists and biologists maintain that the energy in the magnetic fields associated with high-voltage power lines is too small to have an appreciable biological effect.
Suppose we judge the opposing expert opinions equally reliable. Then Theory A (no effect) is as likely as Theories B, C, and D together, and we judge Theories B, C, and D to be equally likely among themselves. So:
Pr(A) ≈ 0.5 ≈ Pr(B) + Pr(C) + Pr(D)
Pr(B) ≈ Pr(C) ≈ Pr(D) ≈ 0.5 / 3 = 1/6
These quantities represent our prior distribution on the four possible hypotheses.

Bayes' Theorem gives:
Pr(A | X = 8) ≈ 0.23
Pr(B | X = 8) ≈ 0.21
Pr(C | X = 8) ≈ 0.28
Pr(D | X = 8) ≈ 0.28
Accordingly, we would say that each of these four theories is almost equally likely. So the probability that there is an effect of the lines at Slater is about 0.21 + 0.28 + 0.28 = 0.77. The probability of an effect is thus pretty high, but not close enough to 1 to count as proof.
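
A sketch that reproduces this calculation with exact binomial probabilities (the results differ slightly from the slide's rounded figures):

```python
import numpy as np
from scipy.stats import binom

thetas = np.array([0.03, 0.04, 0.05, 0.06])  # Theories A, B, C, D
prior = np.array([0.5, 1/6, 1/6, 1/6])       # the prior judgement above

likelihood = binom.pmf(8, 145, thetas)       # Pr(X = 8 | theta) for each theory
posterior = prior * likelihood
posterior /= posterior.sum()                 # normalise: divide by P(data)

for name, p in zip("ABCD", posterior):
    print(f"Pr({name} | X = 8) = {p:.2f}")   # ~0.24, 0.20, 0.28, 0.28
print(f"Pr(effect of the lines) = {posterior[1:].sum():.2f}")  # ~0.76
```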

A non-Bayesian Analysis. A classical test of the hypothesis Ho: θ = .03 (no effect) against the one-sided alternative θ > .03. Calculate the p-value; we find:
p-value = Pr(X=8 | θ=.03) + Pr(X=9 | θ=.03) + Pr(X=10 | θ=.03) + … + Pr(X=145 | θ=.03) (138 terms to be added) ≈ .07
Under a classical hypothesis test, we would not reject the null hypothesis, so there is no indication of an effect of the lines. By comparison, the Bayesian analysis gave Pr(θ > .03) ≈ 0.77.
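
A quick check of this tail sum (scipy's survival function adds exactly those 138 terms):

```python
from scipy.stats import binom

# classical one-sided p-value: P(X >= 8) under Ho: theta = 0.03
p_value = binom.sf(7, 145, 0.03)   # sf(7) = P(X > 7) = P(X=8) + ... + P(X=145)
print(f"p-value = {p_value:.3f}")  # ~0.07: not significant at the 5% level
```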

Today’s posterior is the prior of tomorrow!

Example: Hospitalisation. A new drug seems to have good efficacy relative to a standard treatment. Is it cost-effective? Assume that it would be if it also reduced hospitalisation. Data: 100 patients in each treatment group. Standard treatment group: 25 days in hospital in total (sample variance 1.2). New treatment group: 5 days in total (sample variance 0.248). A classical test (do it!) shows that the difference is significant at the 5% level. The pharmaceutical company would then say: "The mean number of days in hospital under the new treatment is 0.05 per patient (5/100), while it is 0.25 with the standard treatment." Cost-effective!
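
The slide leaves the classical test as an exercise; a sketch of one way to do it, assuming per-patient means and a one-sided two-sample z-test (our reading of the slide's numbers):

```python
from math import sqrt
from scipy.stats import norm

n = 100
mean_std, var_std = 25 / n, 1.2    # standard arm: 0.25 days/patient
mean_new, var_new = 5 / n, 0.248   # new arm: 0.05 days/patient

se = sqrt(var_std / n + var_new / n)   # standard error of the difference in means
z = (mean_std - mean_new) / se         # ~1.66
p = norm.sf(z)                         # one-sided p-value ~0.048
print(f"z = {z:.2f}, one-sided p = {p:.3f}  (significant at the 5% level)")
```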

Example: Hospitalisation and genuine prior information. BUT: this was a rather small trial. Is there other evidence available? Suppose a much larger trial of a similar drug produced a mean number of days in hospital per patient of 0.21, with a standard error of only 0.03. This would suggest that the 0.05 of the new drug is optimistic and casts doubt on the real difference between the new and standard treatment costs. BUT the interpretation of how pertinent this evidence is, is subjective: it was a similar drug, not the same drug. It is, however, reasonable to suppose that the new drug and this similar one behave rather similarly. Because the drug is not the same, we cannot simply pool the two data sets. Classical statistics does not know what to do here, except to lower the required significance level.

Example: Hospitalisation (continued). Bayesian statistics solves the problem by treating the earlier trial as giving prior information for the new trial. Assume that our prior says that the mean number of days in hospital per patient with the new treatment should be 0.21, but with a larger standard deviation, say 0.08, to mark that the two drugs are not the same. Now we compute the posterior estimate, given the new small trial, and obtain that the mean number of days in hospital per patient is about 0.1. This is still better than the standard treatment (0.25 days). In fact, we can compute the probability that the new drug reduces hospitalisation with respect to the standard one, and we get 0.90! Conclusion: the new treatment has a 90% chance of reducing hospitalisation (but not 95%), and the mean number of days is about 0.1 (not 0.05).
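
A sketch of one way to reproduce these numbers, assuming a normal-normal conjugate update for the new drug's mean and only sampling uncertainty (a flat prior) for the standard arm; the slide does not spell out the model, so this structure is our assumption:

```python
from math import sqrt
from scipy.stats import norm

n = 100
prior_mean, prior_sd = 0.21, 0.08           # prior from the larger similar-drug trial
data_mean, data_se = 0.05, sqrt(0.248 / n)  # small new trial

# conjugate normal-normal update for the new drug's mean days/patient
prior_prec, data_prec = 1 / prior_sd**2, 1 / data_se**2
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / (prior_prec + data_prec)
post_sd = (prior_prec + data_prec) ** -0.5

# standard arm: mean 0.25 days/patient with its own sampling error
std_mean, std_se = 0.25, sqrt(1.2 / n)

# P(new drug's mean hospitalisation < standard's)
z = (std_mean - post_mean) / sqrt(post_sd**2 + std_se**2)
print(f"posterior mean = {post_mean:.3f}")       # ~0.095, i.e. about 0.1
print(f"P(new < standard) = {norm.cdf(z):.2f}")  # ~0.91, close to the slide's 0.90
```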

The Bayesian Initiative in Health Economics & Outcomes Research

I hope I have not confused you too much! BUT I also hope that you are a bit confused now, and that at later stages in your education and profession you will want to learn this better! For now: Bayesian statistics is NOT part of the syllabus.