Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)
Informal lecture objectives Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data. Objective 2: To understand the concept of random variation.
... Objective 3: Describe how ‘observed’ values provide knowledge of the ‘true’ values using tests of hypotheses about the true value. Confidence intervals give a range which includes the ‘true’ value with a specific probability.
Neural tube defects in Western Australia ( ) – hypothetical data
Hypothesis testing 1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis were true. 2. If this probability is very small, then either a) something very unlikely has occurred; or b) the hypothesis is wrong. 3. It is then reasonable to conclude that the data are incompatible with the hypothesis. The probability is called a ‘p-value’.
Remember! IMPORTANT: Think of the implications. Rejecting H0 is of little use without a conclusion. p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051. p= and p=0.6 are easy to interpret. False positive and false negative results. Statistical significance depends on sample size: flip a coin 3 times, minimum p=0.25 (i.e. 2×1/8). Statistically significant ≠ clinically important. P values widely used.
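The coin example can be checked directly. The following is a minimal sketch, assuming a fair coin under the hypothesis and a two-sided test; the function name `two_sided_p_fair_coin` is made up for this illustration. It adds up the probability of every outcome at least as extreme as the one observed.

```python
from math import comb


def two_sided_p_fair_coin(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the hypothesis P(heads) = 0.5.

    Sums the probabilities of all outcomes that are at least as far
    from the expected count (flips / 2) as the observed count.
    """
    expected = flips / 2
    observed_distance = abs(heads - expected)
    extreme_outcomes = 0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_distance:
            extreme_outcomes += comb(flips, k)  # sequences giving k heads
    return extreme_outcomes / 2 ** flips


# Most extreme result possible with 3 flips: 3 heads (or 3 tails).
print(two_sided_p_fair_coin(3, 3))    # 0.25, i.e. 2 x 1/8
# The same "all heads" pattern with 10 flips gives a much smaller p-value.
print(two_sided_p_fair_coin(10, 10))  # 0.001953125
```

With only 3 flips no result can reach p<0.05, which is the sense in which statistical significance depends on sample size.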
Conclusions - range of values Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by: a) allowing us to test hypotheses about the true value; b) confidence intervals, which give a range that includes the ‘true’ value with a specific probability.
Any questions?
Estimation In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK: IRR of 1.3. H0 ‘rejected’ (p=0.01). But what is our ‘best guess’ at the true excess risk?
Informally Values outside the range [10% excess risk to 50% excess risk] are in some sense ‘inconsistent’ with the data The range [10% excess risk to 50% excess risk] probably includes the true value
The 95% confidence interval A range which we can be 95% certain includes the true value of the underlying tendency. The IRR for Warwick lies in (1.1, 1.5) with probability 95%. Centred on the observed value (our best guess at the real underlying value). So, the observed value always falls inside the 95% confidence interval
The 95% confidence interval Fortunately, the link between hypothesis tests and confidence intervals means that we don’t have to calculate lots of p-values and check whether to reject the hypothesis. Instead, simply calculate the ‘observed value’ and a second quantity called the ‘error factor’ (e.f.); for this course, just use the ‘error factor’. Then: (observed value ÷ e.f.) is called the lower 95% confidence limit (CL); (observed value × e.f.) is called the upper 95% confidence limit (CL). The full range between the lower and upper 95% CLs is called the 95% confidence interval.
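The divide-and-multiply rule is short enough to write as one function. This is a sketch only: the name `confidence_interval_95` is hypothetical, and the error factor is taken as given rather than computed. The numbers in the usage line (observed value 4, e.f. 2) are the ones used in the summary slides of this lecture.

```python
def confidence_interval_95(observed: float, error_factor: float) -> tuple[float, float]:
    """Return (lower 95% CL, upper 95% CL) from an observed value and its e.f.

    lower = observed / e.f., upper = observed * e.f.
    """
    if observed <= 0:
        raise ValueError("observed value must be positive (a rate, ratio or SMR)")
    if error_factor < 1:
        raise ValueError("error factor must be at least 1")
    return observed / error_factor, observed * error_factor


lower, upper = confidence_interval_95(observed=4, error_factor=2)
print(lower, upper)  # 2.0 8.0
```

Because the limits are obtained by dividing and multiplying, the interval is symmetric around the observed value on a ratio scale, not on an additive scale.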
An example Observe 50 new cases of diabetes in a population of 2,000 people over 5 years. Exposure = 2,000 × 5 = 10,000 person years. New cases = 50. Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years.
Diabetes example continued Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years. Lower 95% CL = 0.005 ÷ 1.33 = 0.0038. Upper 95% CL = 0.005 × 1.33 = 0.0067. So, our best estimate of the true incidence is 5 cases per 1,000 person years and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.
As we get more data We get more and more sure about the underlying value: e.f. gets smaller and the 95% CI narrower. Observe 200 new cases of diabetes in a population of 40,000 people over 1 year. Estimated rate = 200/40,000 = 0.005 per person year (same as before). Lower 95% CL = 0.005 ÷ 1.15 = 0.0043. Upper 95% CL = 0.005 × 1.15 = 0.0058. Best estimate still 5 cases per 1,000 person years, but now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
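The slides quote the error factors 1.33 and 1.15 without showing where they come from. The sketch below assumes the formula e.f. = exp(2 × √(1/d)) for a rate based on d cases. That formula is an assumption, not something stated in the slides, but it does reproduce both quoted values after rounding. The function names are hypothetical.

```python
from math import exp, sqrt


def rate_error_factor(cases: int) -> float:
    """Error factor for a rate based on `cases` events.

    ASSUMPTION: e.f. = exp(2 * sqrt(1 / cases)); not given in the slides.
    """
    return exp(2 * sqrt(1 / cases))


def rate_with_ci(cases: int, person_years: float) -> tuple[float, float, float]:
    """Return (rate, lower 95% CL, upper 95% CL) per person year."""
    rate = cases / person_years
    ef = rate_error_factor(cases)
    return rate, rate / ef, rate * ef


for cases, person_years in [(50, 10_000), (200, 40_000)]:
    rate, lower, upper = rate_with_ci(cases, person_years)
    ef = rate_error_factor(cases)
    print(
        f"{cases} cases: e.f. {ef:.2f}, "
        f"rate {1000 * rate:.1f} per 1,000 p-y, "
        f"95% CI {1000 * lower:.1f} to {1000 * upper:.1f}"
    )
```

The slides round the error factor to two decimals before dividing and multiplying, so the last digit of a limit computed this way can differ slightly from the slide figures. The point of the comparison is unchanged: four times as many cases gives the same rate with a visibly narrower interval.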
Any questions?
Confidence intervals Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height etc. NOT a range within which 95% of individual observations lie
50 cases, 10,000 p-yrs. Estimate = 0.005, 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years. But rates in 3 individual years fall outside this range!!
Another example A sample of 50 students Observed mean height = 1.675m The 95% confidence interval for mean height is 1.65m to 1.70m But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval. This is an important distinction
Inference on a rate ratio Population 1: d1 cases in P1 person years. Population 2: d2 cases in P2 person years. Rate ratio = (d1/P1) ÷ (d2/P2). Confidence interval and test.
Estimation versus hypothesis testing Estimation is more informative. Estimation can incorporate a hypothesis test. Hypothesis: the incidence of diabetes in population A is the same as that in B. Data: Population A: 12 cases in 2,000 patient years; Population B: 16 cases in 4,000 patient years. Rates: A: 12/2,000 = 0.006; B: 16/4,000 = 0.004. Ratio of rates: A ÷ B = 1.5.
Estimation vs hypothesis testing … Estimation can incorporate a hypothesis test: ratio of rates = 1 if rates are the same. Ratio of rates: A ÷ B = 1.5. 95% CI for rate ratio = 1.5 ÷ 2.15 = 0.70 to 1.5 × 2.15 = 3.23. The range [0.70 to 3.23] includes 1.00: data are consistent with the original hypothesis so cannot reject it (p>0.05). This does not prove it’s true!!
Another example 80 deaths in 8,000 person-yrs (male); 50 deaths in 10,000 person-yrs (female). Rate M = 10 per 1,000 p-y; Rate F = 5 per 1,000 p-y. Observed rate ratio (M/F) = 2.0. 95% CI: [2÷1.43 to 2×1.43] = [1.40 to 2.86]. Best estimate of true rate ratio = 2.0, and 95% certain that true rate ratio lies between 1.40 and 2.86. This range does not include 1.00 so able to reject hypothesis of equality (p<0.05).
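Both rate ratio examples can be run through one small function. As a sketch, it assumes the error factor for a rate ratio is e.f. = exp(2 × √(1/d1 + 1/d2)). This formula is not stated in the slides, but it gives the quoted 2.15 and 1.43 after rounding. The function name `rate_ratio_ci` is hypothetical.

```python
from math import exp, sqrt


def rate_ratio_ci(d1: int, p1: float, d2: int, p2: float) -> tuple[float, float, float]:
    """Return (rate ratio, lower 95% CL, upper 95% CL).

    d1, d2: cases in populations 1 and 2; p1, p2: person years at risk.
    ASSUMPTION: e.f. = exp(2 * sqrt(1/d1 + 1/d2)); not given in the slides.
    """
    ratio = (d1 / p1) / (d2 / p2)
    error_factor = exp(2 * sqrt(1 / d1 + 1 / d2))
    return ratio, ratio / error_factor, ratio * error_factor


examples = {
    "diabetes, A vs B": (12, 2_000, 16, 4_000),
    "deaths, M vs F": (80, 8_000, 50, 10_000),
}
for label, counts in examples.items():
    ratio, lower, upper = rate_ratio_ci(*counts)
    verdict = "includes 1.00" if lower <= 1 <= upper else "excludes 1.00"
    print(f"{label}: ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f} ({verdict})")
```

The limits may differ from the slide figures in the last digit because the slides round the error factor first. The conclusions match: the diabetes interval includes 1.00 and the male/female interval does not.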
Inference on an SMR Observe O deaths. Expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population). SMR = (O/E) × 100.
Example for SMR On basis of age specific rates in standard population expect 50 deaths in test population. Observe 60. (O=60, E=50) SMR = (60/50)×100 = 120 95% CI for SMR = 120 ÷/× 1.29 = 93 to 155. CI includes 100 so data consistent with equality of death rate in test and standard populations (p>0.05). But also consistent with e.g. a 50% excess so certainly doesn’t prove equality.
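The SMR example follows the same pattern. This sketch assumes the error factor depends only on the observed deaths, e.f. = exp(2 × √(1/O)). That formula is not stated in the slides, but for O = 60 it gives the quoted 1.29. The function name `smr_ci` is hypothetical.

```python
from math import exp, sqrt


def smr_ci(observed: int, expected: float) -> tuple[float, float, float]:
    """Return (SMR, lower 95% CL, upper 95% CL), with 100 meaning equality.

    ASSUMPTION: e.f. = exp(2 * sqrt(1 / observed)); not given in the slides.
    """
    smr = 100 * observed / expected
    error_factor = exp(2 * sqrt(1 / observed))
    return smr, smr / error_factor, smr * error_factor


smr, lower, upper = smr_ci(observed=60, expected=50)
print(f"SMR {smr:.0f}, 95% CI {lower:.0f} to {upper:.0f}")  # SMR 120, 95% CI 93 to 155
print("includes 100" if lower <= 100 <= upper else "excludes 100")  # includes 100
```

The interval runs from just below 100 to about 155, so the data are compatible both with no excess and with an excess of roughly 50%.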
Any questions?
Summary All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc) are subject to random variation. We always want to know about the underlying tendency = the true value of rates or risks. We use observed data to test hypotheses about the underlying value. We use observed data to estimate the underlying tendency.
Summary In this course the best estimate of the true value of the underlying tendency is the observed value We express uncertainty by calculating error factors and deriving confidence intervals A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%. It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions you can only be uncertain what to conclude
Summary Population A: rate=0.008; B: rate=0.002 Rate ratio = 4, e.f.=2, 95% CI [2 to 8] All values in the 95% CI suggest A higher than B. Can safely conclude A higher than B. This is equivalent to saying the 95% CI does not include 1.00 (null hypothesis) so the rate ratio is significantly different from 1.00 (p<0.05)
Summary Population A: rate=0.01; B: rate=0.005. Rate ratio = 2, 95% CI [0.5 to 8]. Values in the 95% CI are consistent with: A much higher than B; A somewhat lower than B; or both the same. Cannot really conclude anything too firmly. In this case the 95% CI does include 1.00 (the null hypothesis), so the rate ratio is not significantly different from 1.00 (p>0.05) and we cannot reject the hypothesis of equality. But this does not prove that the rates are equal.
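The link between a 95% CI and the p<0.05 threshold in the two summary examples comes down to one check: does the interval contain the null value? A minimal sketch, with a hypothetical function name and the two intervals from the summary slides:

```python
def interpret_ci(lower: float, upper: float, null_value: float = 1.0) -> str:
    """Say whether a 95% CI is compatible with the null value.

    For a rate ratio the null value is 1.00; for an SMR it would be 100.
    """
    if lower <= null_value <= upper:
        # Null value is inside the interval: cannot reject, but not proof of equality.
        return "cannot reject the hypothesis of equality (p>0.05)"
    return "reject the hypothesis of equality (p<0.05)"


print("CI [2 to 8]:  ", interpret_ci(2, 8))
print("CI [0.5 to 8]:", interpret_ci(0.5, 8))
```

The check reports only which side of the threshold the data fall on. The interval itself carries more information, since it also shows how large or small the true ratio could plausibly be.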
Any questions?