Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D. Confidence Intervals

1 Confidence Intervals

2 Statistical Inference
- Inferences regarding a population are made based on a sample
- Inferences about population parameters (e.g., μ) are made by examining sample statistics (e.g., the sample mean)

3 Statistical Inference
- 2 primary approaches
  - Hypothesis Testing (of a population parameter)
  - Estimation (of a population parameter)
    - Point Estimation: the sample mean is an estimate of the population mean
    - Interval Estimation: Confidence Intervals (CIs)
- Hypothesis Testing vs. Estimation
  - Closely related

4 When p-values are low…
- …thus indicating an effect…
- Then the natural question is “what is the effect?”
- Answer this question with confidence intervals

5 When p-values are high…
- This does not imply “no effect”
- It only implies that you failed to exclude the possibility of “no effect”
- Construct confidence intervals to identify the effect sizes that can be “ruled out” (i.e., are inconsistent with your data)

6 Thus…
- ALWAYS PRESENT CONFIDENCE INTERVALS WITH P-VALUES

7 Confidence Interval
- A range of values associated with a parameter of interest (such as a population mean or a treatment effect) that is calculated using the data and will cover the TRUE parameter with a specified probability (if the study were repeated a large number of times)

8 Confidence Interval
- Based on sample statistics (estimates of population parameters)
- The width of the CI provides some information regarding the precision of the estimate
- Provides both a range of plausible values and a test for the parameter of interest

9 Confidence Interval
- Roughly speaking: an interval that is expected to cover the true parameter
- Based on the notion of repeated sampling (similar to hypothesis testing)

10 Confidence Interval
- The “percent” indicates the probability (based on repeated sampling) that the CI covers the TRUE parameter
- Not the probability that the parameter falls in the interval
  - The CI is the random entity (it depends on the random sample)
  - The parameter is fixed
- A different sample would produce a different interval; however, the parameter of interest remains unchanged

11 Illustration
- We are analyzing a study to determine if a new drug decreases LDL cholesterol
- We measure the LDL of 100 people before administering the drug and then again after 12 weeks of treatment
- We then calculate the mean change (post - pre) and examine it to see if improvement is observed

12 Illustration
- Assume now that the drug has no effect (i.e., the true mean change is 0)
  - Note that we never know what the true mean is
- We perform the study, note the mean change, and calculate a 95% CI
- Hypothetically, we repeat the study an infinite number of times (each time re-sampling 100 new people)

13 Illustration
- Study 1, Mean=-7, 95% CI (-12, 2)
- Study 2, Mean=-2, 95% CI (-9, 5)
- Study 3, Mean=4, 95% CI (-3, 11)
- Study 4, Mean=0, 95% CI (-7, 7)
- Study 5, Mean=-5, 95% CI (-12, 2)
- …

14 Illustration
- Remember the true change is zero (i.e., the drug is worthless)
- 95% of these intervals will cover the true change (0)
- 5% of the intervals will not cover 0
- In practice, we only perform the study once
- We have no way of knowing if the interval that we calculated is one of the 95% (that covers the true parameter) or one of the 5% that does not
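The repeated-sampling idea in this illustration can be checked with a small simulation. The sketch below is only an illustration under stated assumptions that are not in the slides: changes are drawn from a normal distribution, the population SD is set to a hypothetical 35, and 2,000 simulated studies stand in for "an infinite number". The constant 1.984 is the standard two-sided 95% t critical value for 99 degrees of freedom.

```python
import math
import random
import statistics

# Assumptions (not from the slides): normally distributed changes,
# a hypothetical population SD of 35, and 2,000 simulated studies.
TRUE_MEAN_CHANGE = 0.0   # the drug has no effect
ASSUMED_SD = 35.0        # hypothetical population SD of the LDL change
N_PER_STUDY = 100        # people per study, as in the illustration
T_CRIT_99DF = 1.984      # two-sided 95% t critical value, 99 degrees of freedom


def simulate_coverage(n_studies=2000, seed=1):
    """Return the fraction of simulated 95% CIs that cover the true mean change."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_studies):
        # One simulated study: 100 observed changes (post - pre).
        changes = [rng.gauss(TRUE_MEAN_CHANGE, ASSUMED_SD) for _ in range(N_PER_STUDY)]
        mean = statistics.fmean(changes)
        std_err = statistics.stdev(changes) / math.sqrt(N_PER_STUDY)
        lower = mean - T_CRIT_99DF * std_err
        upper = mean + T_CRIT_99DF * std_err
        if lower <= TRUE_MEAN_CHANGE <= upper:
            covered += 1
    return covered / n_studies


if __name__ == "__main__":
    print(f"Fraction of intervals covering 0: {simulate_coverage():.3f}")
```

The printed fraction should land close to 0.95. Any single simulated interval either covers 0 or does not, which is the point of the slide.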

15 Example
- Two treatments are being compared with respect to “clinical response” for the treatment of nosocomial (hospital-acquired) pneumonia
- The 95% CI for the difference in response rates for the two treatment groups is (-0.116, 0.151)
- What does this mean?

16 Example (continued)
- Based upon the notion of repeated sampling, 95% of the CIs calculated in this manner would cover the true difference in response rates
- We do not know if we have one of the 95% that covers the true difference or one of the 5% that does not
- Thus we are 95% confident that the true between-group difference in the proportion of subjects with clinical response is between -0.116 and 0.151

17 Caution
- Every time that we calculate a 95% CI, there is a 5% chance that the CI does not cover the quantity that we are estimating
- If you perform a large analysis, calculating many CIs for many parameters, then you can expect that 5% of them will not cover the parameters of interest
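A quick arithmetic sketch of this caution follows. The expected number of misses holds for any set of 95% CIs; the probability of at least one miss additionally assumes the intervals are independent, which is an assumption made here and not something the slide states. The function names are hypothetical.

```python
def expected_misses(n_intervals, confidence=0.95):
    """Expected number of CIs that fail to cover their parameter."""
    return n_intervals * (1 - confidence)


def prob_at_least_one_miss(n_intervals, confidence=0.95):
    """Probability that at least one CI misses, ASSUMING independent intervals."""
    return 1 - confidence ** n_intervals


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(k, round(expected_misses(k), 2), round(prob_at_least_one_miss(k), 3))
```

With 20 independent 95% CIs, about one interval is expected to miss, and the chance that at least one misses is roughly 64%.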

18 CIs and Hypothesis Testing
- To use CIs for hypothesis testing: values between the limits are values for which the null hypothesis would not be rejected
- At the α = 0.05 level:
  - We would fail to reject H0: treatment change = 0, since 0 is contained in (-0.116, 0.151)
  - We would reject H0: treatment change = -20, since -20 is not contained in (-0.116, 0.151)
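The decision rule on this slide can be written as a one-line check. This is a minimal sketch with a hypothetical function name; it assumes a two-sided test at level α paired with a two-sided (1 - α) CI.

```python
def rejects_null(null_value, ci_lower, ci_upper):
    """Reject H0 at level alpha when the null value lies outside the (1 - alpha) CI."""
    return not (ci_lower <= null_value <= ci_upper)


if __name__ == "__main__":
    ci = (-0.116, 0.151)  # 95% CI from the pneumonia example
    print(rejects_null(0, *ci))    # null value 0 lies inside the interval
    print(rejects_null(-20, *ci))  # null value -20 lies outside the interval
```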

19 CIs and Hypothesis Testing
- What would be the conclusion of these hypothesis tests if we wanted to test at:
  - α = 0.01?
  - α = 0.10?

20 Confidence Interval Width
- It is desirable to have narrow CIs
  - Implies more precision in your estimate
- Very wide CIs convey little information about the parameter

21 Width of CI
- In general, with all other things being equal:
  - Smaller sample size → wider CIs
  - Higher confidence → wider CIs
  - Larger variability → wider CIs
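These three relationships can be seen from the half-width of a CI for a mean. The sketch below assumes the normal-based form (known SD), half-width = z × SD / √n; the function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-based CI for a mean: z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


if __name__ == "__main__":
    print("n=100 vs n=25:   ", ci_half_width(10, 100), ci_half_width(10, 25))
    print("95% vs 99%:      ", ci_half_width(10, 100, 0.95), ci_half_width(10, 100, 0.99))
    print("SD=10 vs SD=20:  ", ci_half_width(10, 100), ci_half_width(20, 100))
```

In each pair the second value is larger: a smaller sample, a higher confidence level, or a larger SD widens the interval.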

22 In Practice
- It is a good idea to provide confidence intervals in an analysis
- They can be used for hypothesis testing and provide an estimate of the magnitude of the effect (which p-values do not provide)

23 Example: BPNS in NARC 007

24 CIs
- Similar to hypothesis testing:
  - We may choose the confidence level (e.g., 95% ↔ α = 0.05)
- CIs may be:
  - 1-sided: (-∞, value) or (value, ∞), or
  - 2-sided: (value, value)

25 CIs
- Can be constructed for:
  - A population mean
  - The difference between two population means
  - A population proportion
  - The difference between two population proportions

26 Example: Neurological Effects of HCV co-infection in HIV
- Objective
  - Investigate the effect of active HCV replication on neuropsychological function and neuropathy in HIV-infected individuals with (HIV) virologic suppression

27 Methods
- Brief Neuro-Cognitive Screen (BNCS)
  - Administered by trained site personnel (not neuropsychologists)
  - Battery assessing planning ability, visual motor speed, and concentration
  - Tests
    - Trailmaking A
    - Trailmaking B
    - Digit Symbol
  - Raw test scores normalized for age, education, race, gender
    - Heaton et al., 2004, WAIS-III-WMS-III-WIAT-II Scoring Assistant
  - NPZ3: mean of the 3 normalized tests

28 Methods
- Brief Peripheral Neuropathy Screen (BPNS)
  - Administered by trained site personnel (not neurologists)
  - Peripheral neuropathy
    - At least mild loss of vibration sensation in both great toes, or ankle reflexes absent or hypoactive relative to knees
  - Symptomatic peripheral neuropathy
    - Peripheral neuropathy and at least mild symptoms (e.g., pain, aching, numbness, burning, pins and needles)

29 Methods
- Primary analyses
  - 2-group comparison
  - ≤ 40 IU/ml vs. > 40 IU/ml

30 Results: Neuropsychological Tests

                          HCV VL Status
Test            ≤ 40 IU/ml (N=X)   > 40 IU/ml (N=X)   P*     95% CI*
NPZ3            -0.457 (0.830)     -0.589 (0.758)     0.14   -0.266, 0.039
Trailmaking A   -0.231 (1.172)     -0.383 (1.066)     0.56   -0.278, 0.151
Trailmaking B   -0.021 (1.182)     -0.119 (1.086)     0.31   -0.333, 0.106
Digit Symbol    -1.152 (0.777)     -1.254 (0.702)     0.07   -.0268, 0.011

Note: Data are mean (SD).
* Adjusted for IV drug use, education, sex, race, age

31 Results: Symptomatic Neuropathy

                                  HCV VL Status
Symptomatic Neuropathy   ≤ 40 IU/ml   > 40 IU/ml   Total
Yes                          39           30          69
No                          276          129         405
Total                       315          159         474

P*=0.06, OR*=1.65, (0.98, 2.77)
* Adjusted for IV drug use, education, d-drug use, sex, race, age
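For orientation, the crude (unadjusted) odds ratio can be computed directly from the 2×2 counts above. The sketch below uses the standard log-odds-ratio (Woolf) normal approximation for the interval; this method is an assumption made here for illustration, since the slide reports an adjusted estimate and does not say how its interval was obtained. An unadjusted result need not agree with an adjusted one in general.

```python
import math
from statistics import NormalDist

# Counts from the slide: symptomatic neuropathy (yes/no) by HCV VL group.
YES_LOW, NO_LOW = 39, 276     # HCV VL <= 40 IU/ml
YES_HIGH, NO_HIGH = 30, 129   # HCV VL > 40 IU/ml


def crude_odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio (a/b)/(c/d) with a Woolf (log-scale, normal-approximation) CI."""
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Odds of symptomatic neuropathy in the > 40 IU/ml group relative to the <= 40 group.
odds_ratio, lower, upper = crude_odds_ratio_ci(YES_HIGH, NO_HIGH, YES_LOW, NO_LOW)

if __name__ == "__main__":
    print(f"crude OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Because the interval includes 1 (equal odds in both groups), it is consistent with the reported p-value being just above 0.05.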

32 Where are we now?
- We now use the principles of hypothesis testing and confidence interval estimation to conduct hypothesis tests regarding specific parameters of interest and to build confidence intervals for these parameters

33 Parameter of Interest: Population Mean
- Hypothesis tests regarding a population mean can be performed using “t-tests”
- T-tests
  - 1-sample
  - 2-sample
    - Independent samples
    - Paired

34 T-tests
- T-tests use the t-distribution
  - Similar to the normal distribution
- Make use of standardization and the CLT

35 One-Sample T-test
1. H0: μ = μ0
2. HA: μ ≠ μ0 or μ > μ0 or μ < μ0
3. Typically set α at 0.05
4. Test statistic and p-value
5. Conclusion: reject or fail to reject H0?
Assumptions:
- Random (valid) sampling
- Data come from a population that is normally distributed


37 Example: Benzene Concentration in Cigars
- We can then plug in s for σ and use:
  $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
- However, since we have less information than if we knew σ, t is not standard normal (and has more variability)
- It has a t-distribution


40 Example: Benzene Concentration in Cigars
- What is the probability that we would see these data if the true (population) mean of cigars was 81 μg/g?
- If the true mean was 81, the probability of observing a sample mean of 151 or larger is less than 0.0001, since
  $P\left(t \ge \dfrac{151 - 81}{9/\sqrt{7}}\right) = P(t \ge 20.58) < 0.0001$

41 Example: Benzene Concentration in Cigars
- We’ve just computed a p-value
- Since the probability of observing a sample mean of 151 or greater, if the true mean was 81, is very small, we reject the null hypothesis and conclude that there is sufficient evidence that cigars have a higher concentration of benzene than cigarettes

42 STATA Output: benzene concentration example

. ttesti 7 151 9 81, level(95)

                                                    Number of obs =        7
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.        t     P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      151    3.40168   44.3898    0.0000     142.6764    159.3236
------------------------------------------------------------------------------
Degrees of freedom: 6

                           Ho: mean(x) = 81

   Ha: mean < 81            Ha: mean ~= 81             Ha: mean > 81
     t =  20.5781             t =  20.5781               t =  20.5781
 P < t =   1.0000         P > |t| =   0.0000         P > t =   0.0000

Since we are performing a two-sided test, we concentrate on the middle test. Since P > |t| = 0.0000, which is much smaller than 0.05, we reject the null hypothesis.
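The key numbers in this output can be reproduced by hand. The sketch below uses only the summary statistics passed to `ttesti` (n = 7, mean 151, SD 9, null mean 81) and the standard two-sided 95% t critical value for 6 degrees of freedom (2.447, taken from a t table). The function name is hypothetical, and the p-value itself is not recomputed here.

```python
import math

N, SAMPLE_MEAN, SAMPLE_SD, NULL_MEAN = 7, 151, 9, 81
T_CRIT_6DF = 2.447  # two-sided 95% t critical value, 6 degrees of freedom


def one_sample_t(n, sample_mean, sample_sd, null_mean):
    """Return (standard error, t statistic) for a one-sample t-test."""
    std_err = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - null_mean) / std_err
    return std_err, t_stat


std_err, t_stat = one_sample_t(N, SAMPLE_MEAN, SAMPLE_SD, NULL_MEAN)
ci_95 = (SAMPLE_MEAN - T_CRIT_6DF * std_err, SAMPLE_MEAN + T_CRIT_6DF * std_err)

if __name__ == "__main__":
    print(f"Std. Err. = {std_err:.5f}")   # compare with 3.40168 in the output
    print(f"t = {t_stat:.4f}")            # compare with 20.5781 in the output
    print(f"95% CI = ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
    # |t| far above the critical value means rejecting H0 at alpha = 0.05.
    print("reject H0:", abs(t_stat) > T_CRIT_6DF)
```

The t of 44.3898 in the table row is the mean divided by its standard error (151 / 3.40168), whereas the test of Ho: mean = 81 uses t = 20.5781.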

43 Confidence Intervals
- Recall (from the CLT, especially for large sample sizes) that
  $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
- Thus
  $P(-1.96 \le Z \le 1.96) \approx 0.95$
- Z is between -1.96 and 1.96 approximately 95% of the time

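44 Confidence Intervals
- With a bit of mathematical derivation, we have
  $P\left(\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) \approx 0.95$
- Thus a 95% confidence interval for μ is
  $\left(\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right)$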
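The "bit of mathematical derivation" can be sketched as follows, assuming σ is known and writing $\bar{x}$ for the sample mean. Each step only rearranges the inequality inside the probability, so the probability itself is unchanged.

```latex
\begin{align*}
0.95 &\approx P\left(-1.96 \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) \\
     &= P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{multiply through by } \sigma/\sqrt{n} \\
     &= P\left(-\bar{x}-1.96\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x}+1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{subtract } \bar{x} \\
     &= P\left(\bar{x}-1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x}+1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{multiply by } -1 \text{ and reverse the inequalities}
\end{align*}
```

In the last line the random quantities are the endpoints, not μ, which matches the earlier point that the interval is random and the parameter is fixed.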

45 Confidence Intervals
- More generally, a $100(1-\alpha)\%$ confidence interval for the population mean μ has the form
  $\bar{x} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

46 Confidence Intervals
- However, in most cases σ is unknown and must be estimated with s
- We then base inference on the t-distribution with n-1 degrees of freedom

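47 Confidence Intervals
- 2-sided confidence interval
  $\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$
- 1-sided confidence intervals (upper and lower)
  $\left(-\infty,\ \bar{x} + t_{n-1,\,1-\alpha}\,\dfrac{s}{\sqrt{n}}\right)$ and $\left(\bar{x} - t_{n-1,\,1-\alpha}\,\dfrac{s}{\sqrt{n}},\ \infty\right)$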

48 Example
- A sample of n=10 infants was taken to estimate the plasma aluminum level among infants who have taken antacids that contain aluminum
- The sample mean and standard deviation were $\bar{x} = 37.2$ and $s = 7.13$

49 95% Confidence Interval for the Population Mean
- Based on the t-distribution with n-1=9 degrees of freedom
  $37.2 \pm 2.262\,\dfrac{7.13}{\sqrt{10}} = (32.1,\ 42.3)$

50 STATA Output

. cii 10 37.2 7.13

Variable |   Obs      Mean   Std. Err.     [95% Conf. Interval]
---------+------------------------------------------------------
         |    10      37.2   2.254704      32.09951    42.30049

Note: the standard error is the standard deviation divided by the square root of n.

51 Example: plasma aluminum level among infants
We can compare the previous interval to the 95% confidence interval based on the normal distribution, which we would have derived had we known the true population standard deviation to be 7.13 mg/l. This interval is (32.8, 41.6) and has length 8.8 (= 41.6-32.8), whereas the one based on the t-distribution has length 10.2 (= 42.3-32.1). This loss of precision (i.e., wider confidence interval) is the penalty paid for the lack of knowledge of the true population standard deviation.
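The two intervals being compared can be recomputed from the summary statistics (n = 10, mean 37.2, SD 7.13). In this sketch the t critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table, the normal quantile comes from Python's `statistics.NormalDist`, and the helper name `mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist

N, MEAN, SD = 10, 37.2, 7.13
T_CRIT_9DF = 2.262                      # two-sided 95% t critical value, 9 df
Z_CRIT = NormalDist().inv_cdf(0.975)    # about 1.96


def mean_ci(mean, sd, n, critical_value):
    """Two-sided CI for a mean: mean +/- critical_value * sd / sqrt(n)."""
    margin = critical_value * sd / math.sqrt(n)
    return mean - margin, mean + margin


t_interval = mean_ci(MEAN, SD, N, T_CRIT_9DF)   # sigma estimated by s
z_interval = mean_ci(MEAN, SD, N, Z_CRIT)       # sigma treated as known

if __name__ == "__main__":
    for label, (lo, hi) in (("t-based", t_interval), ("normal-based", z_interval)):
        print(f"{label}: ({lo:.1f}, {hi:.1f}), length {hi - lo:.1f}")
```

The only difference between the two calls is the critical value, so the extra width of the t-based interval comes entirely from 2.262 being larger than 1.96.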

52 Caution
STATA only produces two-sided confidence intervals. To obtain one-sided 95% confidence intervals for the aluminum example, request a two-sided 90% interval with the level(#) option, since each limit of a two-sided 90% interval leaves 5% in one tail:

. cii 10 37.2 7.13, level(90)

Variable |   Obs      Mean   Std. Err.     [90% Conf. Interval]
---------+------------------------------------------------------
         |    10      37.2   2.254704      33.06687    41.33313

Thus, an upper 95% confidence interval would be (-∞, 41.3), while a lower 95% confidence interval would be (33.1, +∞).
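The same one-sided bounds can be obtained directly from the 0.95 quantile of the t-distribution with 9 degrees of freedom (1.833, from a standard t table), which is the critical value a two-sided 90% interval uses. This is a minimal sketch with a hypothetical function name.

```python
import math

T_9DF_Q95 = 1.833  # 0.95 quantile of the t-distribution with 9 degrees of freedom


def one_sided_bounds(mean, sd, n, t_quantile):
    """Return (lower_bound, upper_bound) for the two one-sided CIs.

    The lower 95% CI is (lower_bound, +inf); the upper 95% CI is (-inf, upper_bound).
    """
    margin = t_quantile * sd / math.sqrt(n)
    return mean - margin, mean + margin


lower_bound, upper_bound = one_sided_bounds(37.2, 7.13, 10, T_9DF_Q95)

if __name__ == "__main__":
    print(f"upper 95% CI: (-inf, {upper_bound:.1f})")
    print(f"lower 95% CI: ({lower_bound:.1f}, +inf)")
```

Only one of the two one-sided intervals would be reported in a given analysis; using both limits together gives the two-sided 90% interval shown in the output.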

