Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)


1 Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)

2 Informal lecture objectives  Objective 1:  To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data  Objective 2:  To understand the concept of random variation

3 ... Objective 3  Describe how ‘observed’ values provide knowledge of the ‘true’ values using  tests of hypotheses about the true value  confidence intervals, which give a range that includes the ‘true’ value with a specific probability.

4 Neural tube defects in Western Australia (1975-2000) – hypothetical data

5 Hypothesis testing  1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis were true.  2. If this probability is very small, then either a) something very unlikely has occurred; or b) the hypothesis is wrong.  3. It is then reasonable to conclude that the data are incompatible with the hypothesis.  The probability is called a ‘p-value’.

6 Remember!  IMPORTANT: Think of the implications  Rejecting H0 is of little use without a conclusion  p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051  p=0.0001 and p=0.6 are easy to interpret  False positive and false negative results  Statistical significance depends on sample size. Flip a coin 3 times → minimum p=0.25 (i.e. 2×1/8)  Statistically significant ≠ clinically important  P values widely used
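
The coin example on this slide can be checked with a short Python sketch. It assumes a fair coin and an exact two-sided test that adds up the probabilities of all outcomes no more likely than the one observed; the function name `two_sided_p` is hypothetical and not part of the lecture.

```python
from math import comb

def two_sided_p(heads, flips, p_heads=0.5):
    """Exact two-sided p-value: total probability of every outcome
    that is no more likely than the observed number of heads."""
    def prob(k):
        return comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)
    observed = prob(heads)
    # Small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(flips + 1) if prob(k) <= observed + 1e-12)

# Smallest p-value achievable with only 3 flips (all heads or all tails).
smallest_p_3_flips = min(two_sided_p(h, 3) for h in range(4))
# With 10 flips a far smaller p-value becomes possible.
smallest_p_10_flips = min(two_sided_p(h, 10) for h in range(11))
print(smallest_p_3_flips, smallest_p_10_flips)
```

With 3 flips no result can reach p<0.05, however biased the coin; with 10 flips it can. This is the sense in which statistical significance depends on sample size.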

7 Conclusions - range of values  Objective 3  Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:  a) allowing us to test hypotheses about the true value  b) giving confidence intervals: a range which includes the ‘true’ value with a specific probability.

8 Any questions?

9 Estimation  In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK  IRR of 1.3  H0 ‘rejected’ (p=0.01)  But, what is our ‘best guess’ at the true excess risk?

10  Informally  Values outside the range [10% excess risk to 50% excess risk] are in some sense ‘inconsistent’ with the data  The range [10% excess risk to 50% excess risk] probably includes the true value

11 The 95% confidence interval  A range which we can be 95% certain includes the true value of the underlying tendency.  The IRR for Warwick lies in (1.1, 1.5) with probability 95%.  Centred on the observed value (our best guess at the real underlying value).  So, the observed value always falls inside the 95% confidence interval

12 The 95% confidence interval  Fortunately, the link between hypothesis tests and confidence intervals means that we don’t have to calculate lots of p-values and check whether to reject the hypothesis. For this course, just use ‘error factor’.  Instead, simply calculate the ‘observed value’ and a second quantity called the ‘error factor’ (e.f.). Then:  (observed value ÷ e.f.) is called the lower 95% confidence limit (CL)  (observed value × e.f.) is called the upper 95% confidence limit (CL)  The full range between the lower and upper 95% CLs is called the 95% confidence interval

13 An example  Observe 50 new cases of diabetes in a population of 2,000 people over 5 years.  Exposure = 2,000 × 5 = 10,000 person years  New cases = 50  Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years

14 Diabetes example continued  Incidence = 50/10,000 = 0.005 per person year  = 5 per 1,000 person years  Lower 95% CL = 0.005 ÷ 1.33 = 0.00376  Upper 95% CL = 0.005 × 1.33 = 0.00665  So, our best estimate of the true incidence is 5 cases per 1,000 person years and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.
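
The arithmetic of the diabetes example can be sketched in Python. The transcript does not preserve the formula for the error factor, so this sketch assumes e.f. = exp(2 × √(1/d)) for d observed cases, an assumption chosen because it reproduces the slide's value of 1.33 for d = 50. The function name `rate_ci` is hypothetical.

```python
from math import exp, sqrt

def rate_ci(cases, person_years):
    """Return (rate, error factor, lower 95% CL, upper 95% CL).

    Assumption: e.f. = exp(2 * sqrt(1 / cases)). The slide text does not
    show the formula; this choice gives 1.33 for 50 cases.
    """
    rate = cases / person_years
    ef = exp(2 * sqrt(1 / cases))
    return rate, ef, rate / ef, rate * ef

# 50 new cases in 2,000 people followed for 5 years.
rate, ef, lower, upper = rate_ci(50, 2000 * 5)
print(f"rate = {rate * 1000:.1f} per 1,000 person years, e.f. = {ef:.2f}")
print(f"95% CI = {lower * 1000:.1f} to {upper * 1000:.1f} per 1,000 person years")
```

The slide rounds the error factor to 1.33 before dividing and multiplying, so the last digit of the limits printed here may differ slightly from the slide's figures.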

15 As we get more data  We get more and more sure about the underlying value: e.f. gets smaller and the 95% CI narrower  Observe 200 new cases of diabetes in a population of 40,000 people over 1 year.  Estimated rate = 0.005 (same as before)  lower 95% CL = 0.005 ÷ 1.15 = 0.0043  upper 95% CL = 0.005 × 1.15 = 0.0058  Best estimate still 5 cases per 1,000 person years, but now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
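
A brief sketch of how the error factor shrinks as the number of cases grows. As before, the formula e.f. = exp(2 × √(1/d)) is an assumption (the transcript does not show it); it matches the slide values of 1.33 for 50 cases and 1.15 for 200 cases. The function name `error_factor` is hypothetical.

```python
from math import exp, sqrt

def error_factor(cases):
    # Assumed formula (not shown in the transcript): exp(2 * sqrt(1 / cases)).
    return exp(2 * sqrt(1 / cases))

rate = 0.005  # the estimated rate is the same in both examples
for cases in (50, 200):
    ef = error_factor(cases)
    print(f"{cases:3d} cases: e.f. = {ef:.2f}, "
          f"95% CI = {rate / ef:.4f} to {rate * ef:.4f}")
```

Under this assumed formula the error factor depends only on the number of cases, not on the person years, which is why quadrupling the cases narrows the interval even though the estimated rate is unchanged.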

16 Any questions?

17 Confidence intervals  Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height etc.  NOT a range within which 95% of individual observations lie

18  50 cases, 10,000 p-yrs.  Estimate = 0.005, 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years.  But rates in 3 individual years fall outside this range!!

19 Another example  A sample of 50 students  Observed mean height = 1.675m  The 95% confidence interval for mean height is 1.65m to 1.70m  But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval.  This is an important distinction
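
The distinction can be illustrated with a small Python sketch. The heights below are simulated, not the data behind the slide, and the sketch assumes roughly normal heights and the usual "mean ± 2 standard errors" and "mean ± 2 standard deviations" rules; the slide does not say how its two ranges were computed.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
# Simulated heights in metres for 50 students (illustrative only).
heights = [random.gauss(1.675, 0.08) for _ in range(50)]

m = mean(heights)
sd = stdev(heights)            # spread of individual heights
se = sd / sqrt(len(heights))   # uncertainty about the mean

confidence_interval = (m - 2 * se, m + 2 * se)  # where the true mean plausibly lies
reference_range = (m - 2 * sd, m + 2 * sd)      # where most individuals lie

print("95% CI for the mean:", [round(x, 3) for x in confidence_interval])
print("reference range:    ", [round(x, 3) for x in reference_range])
```

The reference range is wider than the confidence interval by a factor of √n, so with more students the confidence interval keeps narrowing while the reference range stays roughly the same.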

20 Inference on a rate ratio  Population 1: d1 cases in P1 person years  Population 2: d2 cases in P2 person years  Confidence interval and test  Rate ratio = (d1/P1) ÷ (d2/P2)

21 Estimation versus hypothesis testing  Estimation is more informative  Estimation can incorporate a hypothesis test:  Hypothesis: the incidence of diabetes in population A is the same as that in B.  Data: Population A: 12 cases in 2,000 patient years; Population B: 16 cases in 4,000 patient years  Rates: A: 12/2,000 = 0.006; B: 16/4,000 = 0.004  Ratio of rates: A ÷ B = 1.5

22 Estimation vs hypothesis testing ...  Estimation can incorporate a hypothesis test:  Ratio of rates = 1 if rates are the same.  Ratio of rates: A ÷ B = 1.5  95% CI for rate ratio = 1.5 ÷ 2.15 = 0.70 to 1.5 × 2.15 = 3.23. The range [0.70 to 3.23] includes 1.00: data are consistent with the original hypothesis so cannot reject it (p>0.05). This does not prove it’s true!!

23 Another example  80 deaths in 8,000 person-yrs (male)  50 deaths in 10,000 person-yrs (female)  Rate M = 10 per 1,000 p-y; Rate F = 5 per 1,000 p-y  Observed rate ratio (M/F) = 2.0   95% CI: [2÷1.43 to 2×1.43] = [1.40 to 2.86]  Best estimate of true rate ratio=2.0, and 95% certain that true rate ratio lies between 1.40 and 2.86. This range does not include 1.00 so able to reject hypothesis of equality (p<0.05)
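
Both rate-ratio examples (diabetes in populations A and B, and male versus female deaths) can be reproduced with the sketch below. The error-factor formula is again an assumption, since the transcript does not preserve it: e.f. = exp(2 × √(1/d1 + 1/d2)), which gives the slide values of 2.15 and 1.43. The function name `rate_ratio_ci` is hypothetical.

```python
from math import exp, sqrt

def rate_ratio_ci(d1, p1, d2, p2):
    """Rate ratio with 95% CI and a check of whether 1.00 lies outside it.

    Assumption: e.f. = exp(2 * sqrt(1/d1 + 1/d2)); not shown in the transcript.
    """
    ratio = (d1 / p1) / (d2 / p2)
    ef = exp(2 * sqrt(1 / d1 + 1 / d2))
    lower, upper = ratio / ef, ratio * ef
    rejects_equality = not (lower <= 1.0 <= upper)
    return ratio, ef, lower, upper, rejects_equality

# Diabetes: A has 12 cases in 2,000 patient years, B has 16 in 4,000.
diabetes = rate_ratio_ci(12, 2000, 16, 4000)
# Deaths: 80 in 8,000 person years (male), 50 in 10,000 (female).
deaths = rate_ratio_ci(80, 8000, 50, 10000)

for name, (ratio, ef, lower, upper, rejects) in (("diabetes", diabetes), ("deaths", deaths)):
    verdict = "reject equality (p<0.05)" if rejects else "cannot reject equality (p>0.05)"
    print(f"{name}: ratio = {ratio:.2f}, e.f. = {ef:.2f}, "
          f"95% CI = {lower:.2f} to {upper:.2f}: {verdict}")
```

The slides round the error factor to two decimals before dividing and multiplying, so the printed limits can differ from the slide figures in the last digit; the conclusions about whether 1.00 lies inside the interval are unaffected.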

24 Inference on an SMR  Observe O deaths  Expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population)  SMR = (O/E) × 100

25 Example for SMR  On basis of age specific rates in standard population expect 50 deaths in test population. Observe 60. (O=60, E=50)  SMR = (60/50)×100 = 120   95% CI for SMR = 120 ÷/× 1.29 = 93 to 155. CI includes 100 so data consistent with equality of death rate in test and standard populations (p>0.05). But also consistent with e.g. a 50% excess so certainly doesn’t prove equality.
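
The SMR example can be sketched the same way. The error-factor formula is an assumption because the transcript does not show it: e.f. = exp(2 × √(1/O)), which matches the slide's value of 1.29 for O = 60. The function name `smr_ci` is hypothetical.

```python
from math import exp, sqrt

def smr_ci(observed, expected):
    """Return (SMR, error factor, lower 95% CL, upper 95% CL).

    Assumption: e.f. = exp(2 * sqrt(1 / observed)); not shown in the transcript.
    """
    smr = observed / expected * 100
    ef = exp(2 * sqrt(1 / observed))
    return smr, ef, smr / ef, smr * ef

smr, ef, lower, upper = smr_ci(60, 50)
includes_100 = lower <= 100 <= upper
print(f"SMR = {smr:.0f}, e.f. = {ef:.2f}, 95% CI = {lower:.0f} to {upper:.0f}")
print("CI includes 100:", includes_100)
```

An interval that includes 100 means the data are consistent with equal death rates, but, as the slide notes, it is also consistent with a sizeable excess.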

26 Any questions?

27 Summary  All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc) are subject to random variation  We always want to know about the underlying tendency = the true value of rates or risks  We use observed data to test hypotheses about the underlying value  We use observed data to estimate the underlying tendency

28 Summary  In this course the best estimate of the true value of the underlying tendency is the observed value  We express uncertainty by calculating error factors and deriving confidence intervals  A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%.  It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions you can only be uncertain what to conclude

29 Summary  Population A: rate=0.008; B: rate=0.002  Rate ratio = 4, e.f.=2, 95% CI [2 to 8]  All values in the 95% CI suggest A higher than B. Can safely conclude A higher than B. This is equivalent to saying the 95% CI does not include 1.00 (null hypothesis) so the rate ratio is significantly different from 1.00 (p<0.05)

30 Summary  Population A: rate=0.01; B: rate=0.005  Rate ratio = 2, 95% CI [0.5 to 8]  Values in the 95% CI are consistent with: A much higher than B; A somewhat lower than B; or both the same. Cannot really conclude anything too firmly.  In this case the 95% CI does include 1.00 (the null hypothesis) so the rate ratio is not significantly different from 1.00 (p>0.05) so cannot reject hypothesis of equality  But this does not prove that the rates are equal

31 Any questions?

