1
Bayesian Statistics. HSTAT1101, 10 November 2004. Arnoldo Frigessi, frigessi@medisin.uio.no
2
Reverend Thomas Bayes (1702-1761): the mathematician who first used probability inductively and established a mathematical basis for probability inference, and the inventor of a “Bayesian analysis” for the binomial model. He set down his findings on probability in “Essay Towards Solving a Problem in the Doctrine of Chances” (1763), published posthumously in the Philosophical Transactions of the Royal Society of London. At about the same time, Laplace independently discovered Bayes’ Theorem, together with a new analytic tool for approximating integrals. Bayesian statistics was the dominant method of statistics until about 1910.
3
Statistics estimates unknown parameters (like the mean of a population). Parameters represent things that are unknown: they are properties of the population from which the data arise. Questions of interest are expressed as questions about such parameters: confidence intervals, hypothesis tests, etc. Classical (frequentist) statistics considers parameters as fixed quantities specific to the problem, not subject to random variability. Hence parameters are just unknown numbers; they are not random, and it is not possible to make probabilistic statements about them (like “the parameter has a 35% chance of being larger than 0.75”). Bayesian statistics considers parameters as unknown and random, and hence it is allowed to make probabilistic statements about them (like the one above). In Bayesian statistics parameters are uncertain either because they are genuinely random or because of our imperfect knowledge of them.
4
Example: ”Treatment 2 is more cost-effective than treatment 1 for a certain hospital.”
Parameters involved in this statement:
- mean cost and mean efficacy for treatment 1
- mean cost and mean efficacy for treatment 2
across all patients in the population for which the hospital is responsible.
Bayesian point of view: we are uncertain about the statement, hence this uncertainty is described by a probability. We can calculate exactly the probability that treatment 2 is more cost-effective than treatment 1 for the hospital.
Classical point of view: either treatment 2 is more cost-effective or it is not. Since this experiment cannot be repeated (it happens only once), we cannot talk about its probability.
5
... but in classical statistics we can make a test! Null hypothesis H0: treatment 2 is NOT more cost-effective than treatment 1... and we can obtain a p-value! What is a p-value?
7
Correct answer: 2 (but it is quite a complicated explanation, isn’t it?). Answers 1 and 3 are ways in which significance is commonly interpreted, BUT they are not correct. Answer 3 makes a probabilistic statement about the hypothesis, which is not random but either true or false. Answer 1 is about individual patients, while the test is about cost-effectiveness.
8
We cannot interpret a p-value as a probability of the hypothesis, because in the classical setting it is irrelevant how probable the hypothesis was a priori, before the data were collected. Example: Can a healer cure cancer? A healer treated 52 cancer patients and 33 of these were better after one session. Null hypothesis H0: the healer does not heal. p-value (one-sided) = 3.5%. Hence we reject H0 at the 5% level. Should we believe that it is 96.5% sure that the healer heals? Most doctors would regard healers as highly unreliable, and in no way would they be persuaded by a single small experiment. After seeing the experiment, most doctors would continue to believe in H0 and attribute the result to chance.
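This p-value can be checked with a short calculation (a sketch in Python; it assumes the null model is that each patient has a 50/50 chance of being better after a session, i.e. X ~ Bin(52, 0.5), which reproduces the 3.5% quoted above):

from scipy.stats import binom

# One-sided p-value: probability of 33 or more improvements among 52 patients
# under the assumed null model of pure chance (p = 0.5)
p_value = binom.sf(32, n=52, p=0.5)   # P(X >= 33) = 1 - P(X <= 32)
print(round(p_value, 3))              # ~0.035, the value quoted above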
9
In practice, classical statisticians recognise that much stronger evidence is needed to reject a very likely H0. So in reality the p-value does not mean the same thing in all situations. To interpret the p-value as the probability of the null hypothesis is not only wrong but dangerous when the hypothesis is a priori highly unlikely. Practical statisticians find it disturbing that a p-value cannot be seen as the probability that the null hypothesis is true. Similarly, it is disturbing that a 95% confidence interval for a treatment difference does NOT mean that the true difference has a 95% chance of lying in this interval.
10
Classical confidence interval: [3.5, 11.6] is a 95% confidence interval for the mean cost of … Interpretation: “There is a 95% chance that the mean lies between 3.5 and 11.6.” Correct? NO! It cannot mean this, since the mean cost is not random! In the Bayesian context, parameters are random, and a Bayesian interval for the mean has exactly the interpretation usually (and wrongly) given to a confidence interval. In classical inference, the words confidence and significance are technical terms and should be interpreted as such!
11
One widely used way of presenting a cost-effectiveness analysis is the Cost-Effectiveness Acceptability Curve (CEAC), introduced by van Hout et al (1994). For each value of the threshold willingness to pay λ, the CEAC plots the probability that one treatment is more cost-effective than another. This probability is only meaningful in a Bayesian framework: it refers to the probability of a one-off event (the relative cost-effectiveness of these two particular treatments is one-off, not repeatable).
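As a sketch of how a CEAC is computed (hypothetical posterior draws are used here; in a real analysis the draws of incremental cost and effect would come from the model fitted to the trial data):

import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000
# Hypothetical posterior draws for treatment 2 minus treatment 1
delta_e = rng.normal(0.5, 0.3, n_draws)    # incremental effectiveness
delta_c = rng.normal(2000, 1500, n_draws)  # incremental cost (£)

for lam in [0, 2000, 5000, 10000, 20000]:  # willingness-to-pay thresholds
    # CEAC value at lam: posterior probability that the net monetary
    # benefit lam * delta_e - delta_c is positive
    ceac = np.mean(lam * delta_e - delta_c > 0)
    print(f"lambda = {lam:>6}: P(treatment 2 is cost-effective) = {ceac:.2f}")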
12
Example: randomised clinical trial evidence
Studies: 1 RCT (n = 107)
Comparators: dopexamine vs standard care
Follow-up: 28 days
Economics: single cost analysis
Boyd O, Grounds RM, Bennett ED. A randomised clinical trial of the effect of deliberate perioperative increase of oxygen delivery on mortality in high-risk surgical patients. JAMA 1993; 270:2699-707.
13
Trial results

              Costs (£)            Survival (days)
              mean      se         mean     se
Standard      11,885    3,477      541      50.2
Adrenaline    10,847    3,644      615      64.1
Dopexamine     7,976    1,407      657      34.7
14
Trial CEAC curves
[Figure: cost-effectiveness acceptability curves for Control, Dopexamine and Adrenaline; x-axis: willingness to pay, £0 to £100,000; y-axis: probability the strategy is cost-effective, 0 to 1.]
15
The Bayesian method: learn from the data. The role of data is to add to our knowledge and so to update what we can say about hypotheses and parameters. If we want to learn from a new data set, we must first say what we already know about the hypothesis, a priori, before we see the data. Bayesian statistics summarises the a priori knowledge about an unknown parameter (say the mean cost of something) in a distribution for the unknown quantity, called the prior distribution. The prior distribution synthesises what is known or believed to be true before we analyse the new data. We then analyse the new data and summarise the total information about the unknown hypothesis (or parameter) in a distribution called the posterior distribution. Bayes’ formula is the mathematical way to calculate the posterior distribution from the prior distribution and the data.
16
[Figure: three overlaid curves labelled prior, data and posterior.]
17
Bayes recognises the strength of each curve: the posterior is more influenced by the data than by the prior, since the data have a narrower distribution. Peaks: prior = 0, data = 1.60, posterior = 1.30.
18
The data curve is called the likelihood, and it is also important in classical statistics. It describes the support that comes from the data for the various possible values of the unknown parameter. Classical statistics uses only the likelihood; Bayesian statistics uses all three curves. The classical estimate here would be the peak of the likelihood (1.6). The Bayes estimate is about 1.3, since this includes our prior belief that the parameter should have a value below 2 or so.
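The peaks quoted above are consistent with the standard normal-normal update, in which the posterior mean is a precision-weighted average of the prior mean and the likelihood peak. A sketch (the standard deviations are assumptions, chosen so that the data curve is the narrower one, since the slides do not give them):

# Conjugate normal-normal update
prior_mean, prior_sd = 0.0, 1.0   # prior peak from the slide; sd assumed
data_mean, data_sd = 1.6, 0.48    # likelihood peak from the slide; sd assumed

w_prior = 1 / prior_sd**2         # precision of the prior
w_data = 1 / data_sd**2           # precision of the data
post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
print(round(post_mean, 2))        # ~1.3: between prior (0) and data (1.6),
                                  # pulled towards the narrower data curve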
19
The Bayesian estimate is a compromise between data and prior knowledge. In this sense, Bayesian statistics is able to make use of more information than classical statistics and hence obtain stronger results.
20
Bayesian statistics reads confidence intervals, estimates, etc. from the posterior distribution. A point estimate for the parameter is the a posteriori most likely value (the peak of the posterior) or the expected value of the posterior. If we have a hypothesis (for example, that the parameter is positive), then we read off from the posterior the probability that the parameter is larger than zero; here this posterior probability is 0.89.
21
If we are less sure about the parameter a priori, then we use a flatter prior. The consequence is that the posterior looks more similar to the likelihood (the data).
22
posterior = (normalising constant) × prior × likelihood

P(parameter | data) ∝ P(parameter) × P(data | parameter)

P(θ | data) = P(θ) P(data | θ) / P(data)    (Bayes’ formula)
23
How do we choose a prior distribution? The prior is subjective. Two different experts can have different knowledge and beliefs, which would lead to two different priors. If you have no opinion, then it is possible to use a totally flat prior, which adds no information to what is in the data. If we want clear probabilistic interpretations of confidence and significance, then we need priors. This is considered a weakness by many who are trained to reject subjectivity whenever possible. BUT:
- Science is not objective in any case. Why should the binomial or the Gaussian distribution be the true one for a data set?
- Subjective evidence can be tuned down as much as one wishes.
- If there is no consensus, and different priors lead to different decisions, why hide it?
25
Example: Cancer at Slater School (taken from an article by Paul Brodeur in The New Yorker, December 1992). Slater School is an elementary school where the staff were concerned that their high cancer rate could be due to two nearby high-voltage transmission lines.
Key facts: there were 8 cases of invasive cancer over a long period among 145 staff members whose average age was between 40 and 44; based on the national cancer rate among women of this age (approximately 3/100), the expected number of cancers is 4.2.
Assumptions: 1) the 145 staff members developed cancer independently of each other; 2) the chance of cancer, θ, was the same for each staff member. Therefore the number of cancers, X, follows a binomial distribution: X ~ Bin(145, θ).
How well does each of four simplified competing theories explain the data?
Theory A: θ = 0.03 (the national rate, i.e. no effect of the lines)
Theory B: θ = 0.04
Theory C: θ = 0.05
Theory D: θ = 0.06
26
The likelihood of Theories A-D. To compare the theories, we see how well each explains the data. That is, for each hypothesised θ, we calculate the binomial probability of the observed count:
Theory A: Pr(X = 8 | θ = 0.03) ≈ 0.036
Theory B: Pr(X = 8 | θ = 0.04) ≈ 0.096
Theory C: Pr(X = 8 | θ = 0.05) ≈ 0.134
Theory D: Pr(X = 8 | θ = 0.06) ≈ 0.136
This is a ratio of approximately 1:3:4:4. So Theory B explains the data about 3 times as well as Theory A. There seems to be an effect of the lines!
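These likelihoods can be checked directly (a sketch; the figures on the slide match the Poisson approximation to the binomial, while the exact binomial gives slightly different values, e.g. about 0.040 for Theory A):

from scipy.stats import binom, poisson

n, x = 145, 8   # 145 staff members, 8 observed cancers
for name, theta in [("A", 0.03), ("B", 0.04), ("C", 0.05), ("D", 0.06)]:
    exact = binom.pmf(x, n, theta)       # exact binomial likelihood
    approx = poisson.pmf(x, n * theta)   # Poisson approximation
    print(f"Theory {name}: binomial {exact:.3f}, Poisson {approx:.3f}")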
27
A Bayesian analysis. There are other sources of information about whether cancer can be induced by proximity to high-voltage transmission lines:
- some epidemiological studies show positive correlations between cancer and proximity;
- other epidemiological studies do not show these correlations, and physicists and biologists maintain that the energy in the magnetic fields associated with high-voltage power lines is too small to have an appreciable biological effect.
Suppose we judge the opposing expert opinions equally reliable. Then Theory A (no effect) is as likely as Theories B, C and D together, and we judge Theories B, C and D to be equally likely among themselves. So:
Pr(A) = 0.5 = Pr(B) + Pr(C) + Pr(D)
Pr(B) = Pr(C) = Pr(D) = 0.5/3 = 1/6
These quantities represent our prior distribution on the four possible hypotheses.
28
Bayes’ Theorem gives:
Pr(A | X = 8) = 0.23
Pr(B | X = 8) = 0.21
Pr(C | X = 8) = 0.28
Pr(D | X = 8) = 0.28
These are the posterior probabilities. Accordingly, each of the four theories is almost equally likely, and the probability that there is an effect of the lines at Slater is about 0.21 + 0.28 + 0.28 = 0.77. So the probability of an effect is fairly high, but not close enough to 1 to be a proof.
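These posteriors follow from Bayes’ formula applied to the four theories; a sketch using the rounded likelihoods from the previous slide (so the results agree with the slide up to rounding):

prior = {"A": 0.5, "B": 1/6, "C": 1/6, "D": 1/6}
like = {"A": 0.036, "B": 0.096, "C": 0.134, "D": 0.136}   # Pr(X = 8 | theory)

evidence = sum(prior[t] * like[t] for t in prior)   # P(X = 8), normalising constant
posterior = {t: prior[t] * like[t] / evidence for t in prior}
print({t: round(p, 2) for t, p in posterior.items()})
# {'A': 0.23, 'B': 0.2, 'C': 0.28, 'D': 0.29}
print(round(1 - posterior["A"], 2))                 # P(effect of the lines) ~0.77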
29
A non-Bayesian analysis. Classical test of the null hypothesis H0: θ = 0.03 (no effect) against the alternative θ > 0.03. Calculate the p-value:
p-value = Pr(X = 8 | θ = 0.03) + Pr(X = 9 | θ = 0.03) + Pr(X = 10 | θ = 0.03) + … + Pr(X = 145 | θ = 0.03) (138 terms to be added) ≈ 0.07.
Under a classical hypothesis test, we would not reject the null hypothesis at the 5% level, so there is no indication of an effect of the lines. By comparison, the Bayesian analysis gave Pr(θ > 0.03 | data) ≈ 0.77.
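The p-value is the upper tail of the binomial distribution under H0 (a sketch):

from scipy.stats import binom

# Sum of the 138 terms Pr(X = 8) + ... + Pr(X = 145) under theta = 0.03
p_value = binom.sf(7, 145, 0.03)   # P(X > 7) = P(X >= 8)
print(round(p_value, 3))           # ~0.07: not significant at the 5% level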
30
Today’s posterior is the prior of tomorrow!
31
Example: Hospitalisation. A new drug seems to have good efficacy relative to a standard treatment. Is it cost-effective? Assume that it would be if it also reduced hospitalisation. Data: 100 patients in each treatment group. Standard treatment group: 25 days in hospital in total, i.e. 0.25 days per patient (sample variance 1.2). New treatment group: 5 days in total, i.e. 0.05 days per patient (sample variance 0.248). A classical test (do it!) shows that the difference is significant at the 5% level. The pharmaceutical company would then say: ”The mean number of days in hospital under the new treatment is 0.05 per patient (5/100), while it is 0.25 with the standard treatment.” Cost-effective!!!!!
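A sketch of the classical test the slide asks for, assuming a one-sided two-sample z-test on the per-patient means with the sample variances given above:

import math
from scipy.stats import norm

n = 100                            # patients per group
mean_std, var_std = 0.25, 1.2      # standard treatment: 25 days / 100 patients
mean_new, var_new = 0.05, 0.248    # new treatment: 5 days / 100 patients

# z statistic for the difference in mean days per patient
se = math.sqrt(var_std / n + var_new / n)
z = (mean_std - mean_new) / se
p_one_sided = norm.sf(z)
print(round(z, 2), round(p_one_sided, 3))   # z ~1.66, p ~0.048: significant at 5% (one-sided)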
32
Example: Hospitalisation and genuine prior information. BUT: this was a rather small trial. Is there other evidence available? Suppose a much larger trial of a similar drug produced a mean number of days in hospital per patient of 0.21, with a standard error of only 0.03. This suggests that the 0.05 of the new drug is optimistic, and casts doubt on the real difference in cost between the new and standard treatments. BUT the interpretation of how pertinent this evidence is, is subjective: it was a similar drug, not the same one. It is nevertheless reasonable to suppose that the new drug and this similar one behave rather similarly. Because the drug is not the same, we cannot simply pool the two data sets. Classical statistics does not know what to do here, except to lower the required significance level.
33
Example: Hospitalisation. Bayesian statistics solves the problem by treating the earlier trial as giving prior information for the new trial. Assume our prior says that the mean number of days in hospital per patient with the new treatment should be 0.21, but with a larger standard deviation, say 0.08, to mark that the two drugs are not the same. We then compute the posterior estimate, given the new small trial, and obtain a mean of 0.095 days in hospital per patient. This is still better than the standard treatment (0.25 days). In fact, we can compute the probability that the new drug reduces hospitalisation relative to the standard one, and we get 0.90! Conclusion: the new treatment has a 90% chance of reducing hospitalisation (not 95%), and the mean number of days is about 0.1 (not 0.05).
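A sketch of this calculation, assuming the usual normal-normal conjugate model (prior mean 0.21 with sd 0.08; likelihood mean 0.05 with standard error sqrt(0.248/100); the standard arm is summarised by its own trial mean and standard error):

import math
from scipy.stats import norm

# Prior for the new drug's mean days per patient, from the earlier large trial
prior_mean, prior_sd = 0.21, 0.08
# Likelihood from the new small trial
data_mean = 0.05
data_se = math.sqrt(0.248 / 100)

# Normal-normal update: precision-weighted average of prior and data
w0, w1 = 1 / prior_sd**2, 1 / data_se**2
post_mean = (w0 * prior_mean + w1 * data_mean) / (w0 + w1)
post_sd = math.sqrt(1 / (w0 + w1))
print(round(post_mean, 3))   # ~0.095 days per patient, as on the slide

# Probability that the new drug reduces hospitalisation vs the standard (0.25 days)
std_mean, std_se = 0.25, math.sqrt(1.2 / 100)
diff_sd = math.sqrt(post_sd**2 + std_se**2)
print(round(norm.cdf((std_mean - post_mean) / diff_sd), 2))   # ~0.91 (the slide quotes 0.90)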
34
http://www.bayesian-initiative.com/ The Bayesian Initiative in Health Economics & Outcomes Research
35
I hope I have not confused you too much! BUT I also hope that you are a bit confused now, and that at later stages in your education and profession you will want to learn this better! For now: Bayesian statistics is NOT part of the required syllabus.