1
Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D. Confidence Intervals
2
Statistical Inference: Inferences regarding a population are made based on a sample. Inferences about population parameters (e.g., μ) are made by examining sample statistics (e.g., the sample mean).
3
Statistical Inference: 2 primary approaches: Hypothesis Testing (of a population parameter) and Estimation (of a population parameter). Point Estimation: the sample mean is an estimate of the population mean. Interval Estimation: Confidence Intervals (CIs). Hypothesis testing and estimation are closely related.
4
When p-values are low, thus indicating an effect, the natural question is “what is the effect?” Answer this question with confidence intervals.
5
When p-values are high: This does not imply “no effect.” It only implies that you failed to exclude the possibility of “no effect.” Construct confidence intervals to identify the effect sizes that can be “ruled out” (i.e., are inconsistent with your data).
6
Thus: ALWAYS PRESENT CONFIDENCE INTERVALS WITH P-VALUES
7
Confidence Interval: A range of values associated with a parameter of interest (such as a population mean or a treatment effect) that is calculated using the data and will cover the TRUE parameter with a specified probability (if the study were repeated a large number of times).
8
Confidence Interval: Based on sample statistics (estimates of population parameters). The width of the CI provides some information regarding the precision of the estimate. Provides both a range of plausible values and a test for the parameter of interest.
9
Confidence Interval: Roughly speaking, an interval that is expected to cover the true parameter. Based on the notion of repeated sampling (similar to hypothesis testing).
10
Confidence Interval: The “percent” indicates the probability (based on repeated sampling) that the CI covers the TRUE parameter. It is not the probability that the parameter falls in the interval. The CI is the random entity (it depends on the random sample). The parameter is fixed. A different sample would produce a different interval; however, the parameter of interest remains unchanged.
11
Illustration: We are analyzing a study to determine if a new drug decreases LDL cholesterol. We measure the LDL of 100 people before administering the drug and then again after 12 weeks of treatment. We then calculate the mean change (post-pre) and examine it to see if improvement is observed.
12
Illustration: Assume now that the drug has no effect (i.e., the true mean change is 0). Note that we never know what the true mean is. We perform the study, note the mean change, and calculate a 95% CI. Hypothetically, we repeat the study an infinite number of times (each time re-sampling 100 new people).
13
Illustration:
Study 1, Mean=-7, 95% CI (-12, 2)
Study 2, Mean=-2, 95% CI (-9, 5)
Study 3, Mean=4, 95% CI (-3, 11)
Study 4, Mean=0, 95% CI (-7, 7)
Study 5, Mean=-5, 95% CI (-12, 2)
14
Illustration: Remember the true change is zero (i.e., the drug is worthless). 95% of these intervals will cover the true change (0). 5% of the intervals will not cover 0. In practice, we only perform the study once. We have no way of knowing if the interval that we calculated is one of the 95% (that covers the true parameter) or one of the 5% that does not.
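The repeated-sampling idea in this illustration can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: individual LDL changes are drawn from a normal distribution with true mean 0 and a hypothetical standard deviation of 35 (chosen so that the intervals have a half-width of roughly 7, as in the studies listed above), and each interval uses the large-sample normal multiplier with the sample standard deviation.

```python
import random
import statistics
from statistics import NormalDist


def simulate_coverage(n_studies=2000, n_subjects=100, true_mean=0.0,
                      sd=35.0, level=0.95, seed=1):
    """Return the fraction of simulated studies whose CI covers true_mean.

    sd=35.0 is a hypothetical value, not taken from the slides.
    """
    rng = random.Random(seed)
    # Two-sided normal multiplier (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    covered = 0
    for _ in range(n_studies):
        # One "study": the change (post - pre) for each of n_subjects people.
        changes = [rng.gauss(true_mean, sd) for _ in range(n_subjects)]
        mean = statistics.fmean(changes)
        se = statistics.stdev(changes) / n_subjects ** 0.5
        if mean - z * se <= true_mean <= mean + z * se:
            covered += 1
    return covered / n_studies


coverage = simulate_coverage()
print(f"Share of intervals covering the true change: {coverage:.3f}")
```

The printed share should land close to 0.95, and the remaining intervals are the ones that miss the true change of 0.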
15
Example: Two treatments are being compared with respect to “clinical response” for the treatment of nosocomial (hospital-acquired) pneumonia. The 95% CI for the difference in response rates for the two treatment groups is (-0.116, 0.151). What does this mean?
16
Example (continued): Based upon the notion of repeated sampling, 95% of the CIs calculated in this manner would cover the true difference in response rates. We do not know if we have one of the 95% that covers the true difference or one of the 5% that does not. Thus we are 95% confident that the true between-group difference in the proportion of subjects with clinical response is between -0.116 and 0.151.
17
Caution: Every time that we calculate a 95% CI, there is a 5% chance that the CI does not cover the quantity that we are estimating. If you perform a large analysis, calculating many CIs for many parameters, then you can expect that 5% of them will not cover the parameters of interest.
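The arithmetic behind this caution can be sketched as follows. The expected number of misses needs no extra assumptions, but the probability of at least one miss is computed here under the added assumption that the intervals are independent, which the slide does not state.

```python
def expected_misses(n_intervals, level=0.95):
    """Expected number of intervals that fail to cover their parameter."""
    return n_intervals * (1 - level)


def prob_at_least_one_miss(n_intervals, level=0.95):
    """Chance that at least one interval misses, ASSUMING independence."""
    return 1 - level ** n_intervals


for k in (1, 20, 100):
    print(k, round(expected_misses(k), 2), round(prob_at_least_one_miss(k), 3))
```

With 20 intervals, for instance, about one miss is expected on average.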
18
CIs and Hypothesis Testing: To use CIs for hypothesis testing, values between the limits are values for which the null hypothesis would not be rejected. At the α = 0.05 level: We would fail to reject H0: treatment change = 0 since 0 is contained in (-0.116, 0.151). We would reject H0: treatment change = -20 since -20 is not contained in (-0.116, 0.151).
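The decision rule on this slide amounts to a containment check. A minimal sketch, where the function name `rejects_null` is made up for illustration:

```python
def rejects_null(ci, null_value):
    """Reject H0 at level alpha when the null value lies outside the
    100(1 - alpha)% two-sided confidence interval."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


ci_95 = (-0.116, 0.151)  # 95% CI from the pneumonia example
for h0 in (0, -20):
    decision = "reject" if rejects_null(ci_95, h0) else "fail to reject"
    print(f"H0: treatment change = {h0}: {decision} at alpha = 0.05")
```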
19
CIs and Hypothesis Testing: What would be the conclusion of these hypothesis tests if we wanted to test at α = 0.01? At α = 0.10?
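One way to explore this question is to rescale the reported interval to another confidence level. The sketch below assumes the 95% interval (-0.116, 0.151) is a symmetric, normal-based interval (estimate ± z × SE); the slides do not say how it was computed, so the rescaled limits are approximate at best.

```python
from statistics import NormalDist


def rescale_ci(ci, old_level, new_level):
    """Rescale a symmetric normal-based CI to a new confidence level."""
    lower, upper = ci
    estimate = (lower + upper) / 2
    z_old = NormalDist().inv_cdf(1 - (1 - old_level) / 2)
    z_new = NormalDist().inv_cdf(1 - (1 - new_level) / 2)
    se = (upper - lower) / 2 / z_old  # standard error implied by the old CI
    return estimate - z_new * se, estimate + z_new * se


ci_95 = (-0.116, 0.151)
ci_90 = rescale_ci(ci_95, 0.95, 0.90)
ci_99 = rescale_ci(ci_95, 0.95, 0.99)
print("90% CI:", tuple(round(v, 3) for v in ci_90))
print("99% CI:", tuple(round(v, 3) for v in ci_99))
```

Under that assumption, 0 remains inside both the 90% and the 99% interval, so H0: treatment change = 0 would not be rejected at either level.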
20
Confidence Interval Width: It is desirable to have narrow CIs, since a narrow CI implies more precision in your estimate. Wide CIs are consistent with a broad range of parameter values and are therefore less informative.
21
Width of CI: In general, with all other things being equal: smaller sample size → wider CIs; higher confidence → wider CIs; larger variability → wider CIs.
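These three relationships follow from the half-width of a normal-based interval, z × sd / √n. A small sketch with made-up numbers (a standard deviation of 10 or 20 and a sample size of 25 or 100 are illustrative values, not from the slides):

```python
from statistics import NormalDist


def half_width(sd, n, level=0.95):
    """Half-width of a normal-based CI for a mean: z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sd / n ** 0.5


baseline = half_width(sd=10, n=100, level=0.95)
print("baseline          :", round(baseline, 2))
print("smaller sample    :", round(half_width(sd=10, n=25), 2))
print("higher confidence :", round(half_width(sd=10, n=100, level=0.99), 2))
print("larger variability:", round(half_width(sd=20, n=100), 2))
```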
22
In Practice: It is a good idea to provide confidence intervals in an analysis. They can be used for hypothesis testing and provide an estimate of the magnitude of the effect (which p-values do not provide).
23
Example: BPNS in NARC 007
24
CIs: Similar to hypothesis testing, we may choose the confidence level (e.g., 95% corresponds to α = 0.05). CIs may be 1-sided, (-∞, value) or (value, ∞), or 2-sided, (value, value).
25
CIs can be constructed for: a population mean; the difference between two population means; a population proportion; the difference between two population proportions.
26
Example: Neurological Effects of HCV co-infection in HIV. Objective: Investigate the effect of active HCV replication on neuropsychological function and neuropathy in HIV-infected individuals with (HIV) virologic suppression.
27
Methods: Brief Neuro-Cognitive Screen (BNCS). Administered by trained site personnel (not neuropsychologists). Battery assessing planning ability, visual motor speed, and concentration. Tests: Trailmaking A, Trailmaking B, Digit Symbol. Raw test scores normalized for age, education, race, gender (Heaton et al., 2004, WAIS-III-WMS-III-WIAT-II Scoring Assistant). NPZ3: mean of the 3 normalized tests.
28
Methods: Brief Peripheral Neuropathy Screen (BPNS). Administered by trained site personnel (not neurologists). Peripheral neuropathy: at least mild loss of vibration sensation in both great toes, or ankle reflexes absent or hypoactive relative to knees. Symptomatic peripheral neuropathy: peripheral neuropathy and at least mild symptoms (e.g., pain, aching, numbness, burning, pins and needles).
29
Methods: Primary analyses: 2-group comparison, ≤ 40 IU/ml vs. > 40 IU/ml.
30
Results: Neuropsychological Tests, by HCV VL Status
Test            ≤ 40 IU/ml (N=X)   > 40 IU/ml (N=X)   P*     95% CI*
NPZ3            -0.457 (0.830)     -0.589 (0.758)     0.14   -0.266, 0.039
Trailmaking A   -0.231 (1.172)     -0.383 (1.066)     0.56   -0.278, 0.151
Trailmaking B   -0.021 (1.182)     -0.119 (1.086)     0.31   -0.333, 0.106
Digit Symbol    -1.152 (0.777)     -1.254 (0.702)     0.07   -.0268, 0.011
Note: Data are mean (SD). * Adjusted for IV drug use, education, sex, race, age.
31
Results: Symptomatic Neuropathy, by HCV VL Status
Symptomatic Neuropathy   ≤ 40 IU/ml   > 40 IU/ml   Total
Yes                      39           30           69
No                       276          129          405
Total                    315          159          474
P* = 0.06, OR* = 1.65, (0.98, 2.77)
* Adjusted for IV drug use, education, d-drug use, sex, race, age.
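For orientation, a crude (unadjusted) odds ratio and a Woolf-type log-scale interval can be computed directly from the 2×2 counts. This is only a sketch: the slide's OR and interval are adjusted for covariates, so the unadjusted figures computed here are not the same analysis and need not agree with the reported ones. The function name and the choice of the Woolf interval are assumptions made for this example.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a, b: group 1 with / without the outcome
    c, d: group 2 with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Group 1: > 40 IU/ml (30 yes, 129 no); group 2: <= 40 IU/ml (39 yes, 276 no)
crude_or, lower, upper = odds_ratio_ci(30, 129, 39, 276)
print(f"crude OR = {crude_or:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

As with the reported interval, the crude interval contains 1, which corresponds to failing to reject "no association" at the 0.05 level.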
32
Where are we now? We now use the principles of hypothesis testing and confidence interval estimation to conduct hypothesis tests regarding specific parameters of interest and to build confidence intervals for these parameters.
33
Parameter of Interest: Population Mean. Hypothesis tests regarding a population mean can be performed using “t-tests.” T-tests: 1-sample; 2-sample (independent samples, paired).
34
T-tests: T-tests use the t-distribution, which is similar to the normal distribution. They make use of standardization and the CLT.
35
One-Sample T-test
1. $H_0: \mu = \mu_0$
2. $H_A: \mu \neq \mu_0$ or $\mu > \mu_0$ or $\mu < \mu_0$
3. Typically set α at 0.05
4. Test statistic and p-value
5. Conclusion: reject or fail to reject $H_0$?
Assumptions: Random (valid) sampling. Data come from a population that is normally distributed.
37
Example: Benzene Concentration in Cigars. We can then plug in s for σ and use $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. However, since we have less information than if we knew σ, t is not standard normal (and has more variability). It has a t-distribution.
40
Example: Benzene Concentration in Cigars. What is the probability that we would see these data if the true (population) mean of cigars was 81 μg/g? If the true mean was 81, the probability of observing a sample mean of 151 or larger is less than 0.0001, since $t = \frac{151 - 81}{9/\sqrt{7}} = 20.58$ on 6 degrees of freedom.
41
Example: Benzene Concentration in Cigars. We’ve just computed a p-value. Since the probability of observing a sample mean of 151 or greater, if the true mean was 81, is very small, we reject the null hypothesis and conclude that there is sufficient evidence that cigars have a higher concentration of benzene than cigarettes.
42
STATA Output: benzene concentration example

. ttesti 7 151 9 81, level(95)

                                                      Number of obs = 7
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.         t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      151    3.40168    44.3898   0.0000      142.6764    159.3236
------------------------------------------------------------------------------
Degrees of freedom: 6

                          Ho: mean(x) = 81

   Ha: mean < 81          Ha: mean ~= 81           Ha: mean > 81
    t = 20.5781             t = 20.5781              t = 20.5781
  P < t = 1.0000        P > |t| = 0.0000          P > t = 0.0000

Since we are performing a two-sided test, we concentrate on the middle test. Since P > |t| = 0.0000, which is much smaller than 0.05, we reject the null hypothesis.
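The test statistic in this output can be reproduced from the four summary numbers passed to `ttesti` (n = 7, mean = 151, SD = 9, null mean = 81). The sketch below uses only the Python standard library; because the standard library has no t-distribution, the tail probability is approximated by integrating the t density with Simpson's rule, which is a numerical stand-in and not how Stata computes it.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=4000):
    """Approximate P(T >= t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n, sample_mean, sample_sd, mu0 = 7, 151, 9, 81
std_err = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / std_err
p_two_sided = 2 * t_upper_tail(t_stat, n - 1)

print(f"Std. Err. = {std_err:.5f}")
print(f"t = {t_stat:.4f} on {n - 1} degrees of freedom")
print(f"two-sided p = {p_two_sided:.4f}")
```

The standard error and t statistic agree with the 3.40168 and 20.5781 shown in the output, and the p-value rounds to 0.0000 at four decimals.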
43
Confidence Intervals: Recall (from the CLT, especially for large sample sizes) that $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. Thus Z is between -1.96 and 1.96 approximately 95% of the time.
44
Confidence Intervals: With a bit of mathematical derivation, we have $P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$. Thus a 95% confidence interval for μ is $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$.
45
Confidence Intervals: More generally, a $100(1-\alpha)\%$ confidence interval for the population mean μ has the form $\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, where $z_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution.
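For a known population standard deviation σ, the interval is the sample mean plus or minus a standard normal quantile times σ/√n. A minimal sketch, where the input values (mean 50, σ = 10, n = 25) are made up for illustration:

```python
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, level=0.95):
    """Two-sided CI for a mean when sigma is known: mean ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin


# Normal multipliers for common confidence levels.
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: z = {z:.3f}")

low, high = z_confidence_interval(mean=50, sigma=10, n=25)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```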
46
Confidence Intervals: However, in most cases σ is unknown and must be estimated with s. We then base inference on the t-distribution with n-1 degrees of freedom.
47
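Confidence Intervals: The 2-sided confidence interval is $\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$. The 1-sided confidence intervals (upper and lower) are $\left(-\infty,\ \bar{x} + t_{n-1,\,1-\alpha}\,\frac{s}{\sqrt{n}}\right)$ and $\left(\bar{x} - t_{n-1,\,1-\alpha}\,\frac{s}{\sqrt{n}},\ \infty\right)$.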
48
Example: A sample of n=10 infants was taken to estimate the plasma aluminum level among infants who have taken antacids that contain aluminum. The sample mean and standard deviation were $\bar{x} = 37.2$ and $s = 7.13$.
49
95% Confidence Interval for the Population Mean: Based on the t-distribution with n-1=9 degrees of freedom, $\bar{x} \pm t_{9,\,0.975}\,\frac{s}{\sqrt{n}} = 37.2 \pm 2.262 \times \frac{7.13}{\sqrt{10}} = (32.1, 42.3)$.
50
STATA Output

. cii 10 37.2 7.13

Variable |   Obs    Mean    Std. Err.      [95% Conf. Interval]
---------+-------------------------------------------------------
         |    10    37.2    2.254704       32.09951    42.30049

Note: standard error is the standard deviation divided by the square root of n.
51
Example: plasma aluminum level among infants. We can compare the previous interval to the 95% confidence interval based on the normal distribution, which we would use if we had known the true population standard deviation to be 7.13 mg/l. That interval is (32.8, 41.6) and has length 8.8 (= 41.6-32.8), whereas the one based on the t-distribution has length 10.2 (= 42.3-32.1). This loss of precision (i.e., a wider confidence interval) is the penalty paid for the lack of knowledge of the true population standard deviation.
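The comparison between the t-based and normal-based intervals can be reproduced from the summary numbers in this example (n = 10, mean 37.2, SD 7.13). The sketch below stays within the Python standard library, so the t quantile is approximated numerically (Simpson's rule for the CDF plus bisection); that numerical routine is an assumption of this example, not the method Stata uses.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for x >= 0 via Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Approximate t quantile for 0.5 < p < 1 by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, mean, sd = 10, 37.2, 7.13
se = sd / math.sqrt(n)

t_crit = t_quantile(0.975, n - 1)     # close to the table value 2.262
z_crit = NormalDist().inv_cdf(0.975)  # close to 1.96

t_ci = (mean - t_crit * se, mean + t_crit * se)
z_ci = (mean - z_crit * se, mean + z_crit * se)
print(f"t-based : ({t_ci[0]:.1f}, {t_ci[1]:.1f}), length {t_ci[1] - t_ci[0]:.1f}")
print(f"z-based : ({z_ci[0]:.1f}, {z_ci[1]:.1f}), length {z_ci[1] - z_ci[0]:.1f}")
```

The t-based limits agree with the `cii` output to the displayed precision, and the z-based limits round to the (32.8, 41.6) quoted above.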
52
Caution: STATA only produces two-sided confidence intervals. If you want to obtain one-sided 95% confidence intervals for the aluminum example, you have to request a two-sided 90% interval with the level(#) option as follows:

. cii 10 37.2 7.13, level(90)

Variable |   Obs    Mean    Std. Err.      [90% Conf. Interval]
---------+-------------------------------------------------------
         |    10    37.2    2.254704       33.06687    41.33313

Thus, an upper 95% confidence interval would be (-∞, 41.3), while a lower 95% confidence interval would be (33.1, +∞).
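The reason the 90% two-sided limits serve as 95% one-sided bounds is that both use the same multiplier, the upper 5% point of the t-distribution. A short sketch for the aluminum example, with the multiplier 1.8331 for 9 degrees of freedom entered by hand from a standard t table (an assumption of this example, since the Python standard library has no t quantile function):

```python
import math

# Upper 5% point of the t-distribution with 9 degrees of freedom,
# taken from a standard t table (hard-coded assumption).
T_9_UPPER_5PCT = 1.8331

n, mean, sd = 10, 37.2, 7.13
se = sd / math.sqrt(n)
margin = T_9_UPPER_5PCT * se

upper_bound = mean + margin  # upper 95% CI: (-inf, upper_bound)
lower_bound = mean - margin  # lower 95% CI: (lower_bound, +inf)

print(f"upper 95% CI: (-inf, {upper_bound:.1f})")
print(f"lower 95% CI: ({lower_bound:.1f}, +inf)")
print(f"two-sided 90% CI: ({lower_bound:.1f}, {upper_bound:.1f})")
```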