Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,

Statistics for clinicians
Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center, University of South Florida, College of Nursing
Professor, College of Public Health, Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute, Morsani College of Medicine
Tampa, FL, USA

SECTION 3.1 Module Overview and Introduction
Confidence intervals, estimation of parameters, and hypothesis testing.

Module 3 Learning Objectives:
1. Describe the concepts of parameter estimation and confidence intervals
2. Apply the z and t distributions to calculate confidence intervals, depending on sample size
3. Select appropriate z and t values for a desired confidence level (e.g. 90%, 95%, 99%)
4. Calculate and interpret confidence intervals for means, proportions, and relative risk for one- and two-sample designs, including matched designs
5. Use SPSS to calculate confidence intervals
6. Describe the theoretical relationship between the risk ratio and the odds ratio

Module 3 Learning Objectives:
7. List the concept, guidelines, and primary steps involved in hypothesis testing
8. Differentiate between the null and alternative hypotheses
9. Understand and interpret parameters used in hypothesis testing (level of significance, p-value)
10. Differentiate type I and type II errors, and identify factors that affect statistical power
11. Carry out and interpret hypothesis tests for:
a) One sample, continuous outcome
b) One sample, dichotomous outcome
c) One sample, categorical/ordinal outcome
d) Matched design, continuous outcome

Assigned Reading: Textbook: Essentials of Biostatistics in Public Health Chapters 6 and 7

Key terms
Estimation: Process of determining a likely value for a population parameter (e.g. a mean or proportion) based on a sample.
Point Estimate: A single-valued estimate of a population parameter, such as a mean or a proportion.
Confidence Interval (CI): A range of likely values for a population parameter, with a level of confidence attached (e.g. 95% confidence that the interval contains the unknown parameter). The general form of a CI is: point estimate ± margin of error. Common confidence levels are 90%, 95%, and 99%, but, theoretically, any level between 0% and 100% can be selected.

SECTION 3.2 Use of the z and t distributions for calculation of confidence intervals

For the standard normal distribution, P(-1.96 < z < 1.96) = 0.95; i.e., there is a 95% probability that a standard normal variable, denoted z, falls between -1.96 and 1.96. Using the Central Limit Theorem and some algebra, the 95% confidence interval (CI) for the population mean μ is:
$\bar{X} \pm 1.96 \dfrac{\sigma}{\sqrt{n}}$
The general form of a CI can be written as: point estimate ± z × SE(point estimate), where z is the value from the standard normal distribution reflecting the desired confidence level, and SE is the standard error of the point estimate.
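The z value for any confidence level can be recovered from the standard normal distribution rather than read from a printed table. A minimal Python sketch, assuming only the standard library's `statistics.NormalDist`; the helper name `z_critical` is illustrative, not part of any course software:

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z with P(-z < Z < z) = confidence."""
    alpha = 1.0 - confidence
    # Half of alpha sits in each tail, so take the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% this reproduces 1.96, and for 90% it gives 1.645, the value used for 90% intervals in this module.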

For the CI for the mean (or for any other parameter), we often do not know the true value of the population standard deviation (σ).
For large samples (n > 30), σ can be estimated by the sample standard deviation (s), and the z value is still used, based on the Central Limit Theorem.
For small samples (n < 30), this approximation is not adequate, and the t distribution is used instead (Table 2 of the Appendix):
- t values depend on n: smaller samples have larger t values (wider intervals, less precision)
- t values are indexed by degrees of freedom (df = n - 1)

Listing of Selected t Values for Confidence Intervals

       Confidence Level
df     80%     90%     95%     98%     99%
4      1.533   2.132   2.776   3.747   4.604
5      1.476   2.015   2.571   3.365   4.032
6      1.440   1.943   2.447   3.143   3.707
7      1.415   1.895   2.365   2.998   3.499
8      1.397   1.860   2.306   2.896   3.355
9      1.383   1.833   2.262   2.821   3.250

Example: For a confidence interval of a mean with n < 30, use t with df = n - 1:
$\bar{X} \pm t \dfrac{s}{\sqrt{n}}$
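The module's rule of thumb (z for n > 30, otherwise t with df = n - 1) can be written as a small lookup. This is a sketch only: the dictionary holds just the 95% column of the table above plus df = 13 (t = 2.160), and the function name `critical_value_95` is hypothetical; in practice a full t table or statistical software supplies the t value.

```python
from statistics import NormalDist

# 95% two-sided t values: the 95% column of the table above, plus df = 13.
T_95 = {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 13: 2.160}

def critical_value_95(n: int) -> tuple[str, float]:
    """Hypothetical helper: ('z', 1.96) if n > 30, else ('t', t value with df = n - 1)."""
    if n > 30:
        return "z", round(NormalDist().inv_cdf(0.975), 2)
    df = n - 1
    if df not in T_95:
        raise KeyError(f"no t value stored for df = {df}; consult a full t table")
    return "t", T_95[df]

print(critical_value_95(180))  # large sample: z
print(critical_value_95(14))   # small sample: t with df = 13
```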

SECTION 3.3 Calculation and interpretation of confidence intervals: One Sample
a) Continuous outcome
b) Dichotomous outcome

CI for One Sample – Continuous Outcome
Parameter: Mean Body Mass Index (BMI)
Sample N: 180 (n > 30, so use the large-sample z value)
Sample Mean: 28.2
Sample SD: 5.4
Confidence Level: 95%; z value: 1.96
95% Confidence Interval for μ: $\bar{X} \pm z \dfrac{s}{\sqrt{n}}$ = 28.2 ± 1.96 × (5.4 / sqrt(180)) = 28.2 ± 0.79 = (27.4, 29.0)

[Figure: number line showing the point estimate 28.2 between the lower limit 27.4 and the upper limit 29.0]
From the sample, we estimate the mean BMI as 28.2, and we are 95% confident that the true population mean lies between 27.4 and 29.0.
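The interval above is straightforward to script. A minimal Python sketch (the helper `mean_ci` is hypothetical, not from the course materials) that takes the critical value as an argument, so the same function works with z or t:

```python
import math

def mean_ci(xbar: float, s: float, n: int, crit: float) -> tuple[float, float]:
    """Confidence interval for a population mean: xbar +/- crit * s / sqrt(n)."""
    se = s / math.sqrt(n)   # standard error of the sample mean
    margin = crit * se      # margin of error
    return xbar - margin, xbar + margin

# BMI example: n = 180 > 30, so the z value 1.96 is used
lower, upper = mean_ci(28.2, 5.4, 180, 1.96)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")  # 95% CI: (27.4, 29.0)
```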

CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%; z value: ___ or t value: ___
95% Confidence Interval for μ:

CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503 (large sample, n > 30)
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%; z value: 1.96
95% Confidence Interval for μ: = 80.69 ± 1.96 × 10.176 / sqrt(503) = 80.69 ± 0.889 = (79.8, 81.6)

CI for One Sample – Continuous Outcome (Practice)
SPSS: Analyze > Compare Means > One-Sample T Test; Options: 95% confidence interval.
(SPSS uses the t distribution here; with n = 503, the result matches the z-based interval (79.8, 81.6) to one decimal place.)

CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%; z value: ___ or t value: ___
95% Confidence Interval for μ:

CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14 (small sample, n < 30)
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%; t value: 2.160 (df = n - 1 = 13)
95% Confidence Interval for μ: = 63.3 ± 2.160 × 9.5 / sqrt(14) = 63.3 ± 5.484 = (57.8, 68.8)
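To see the cost of a small sample, the sketch below computes the resting-pulse interval with t = 2.160 (df = 13) and, for comparison only, with z = 1.96. The helper name `mean_ci` is hypothetical, and the t value is taken as given rather than computed.

```python
import math

def mean_ci(xbar: float, s: float, n: int, crit: float) -> tuple[float, float]:
    """Hypothetical helper: xbar +/- crit * s / sqrt(n)."""
    margin = crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

n = 14
t_13 = 2.160  # 95% t value for df = n - 1 = 13
lo_t, hi_t = mean_ci(63.3, 9.5, n, t_13)
lo_z, hi_z = mean_ci(63.3, 9.5, n, 1.96)  # comparison only: z is not appropriate for n = 14
print(f"t-based: ({lo_t:.1f}, {hi_t:.1f})")  # t-based: (57.8, 68.8)
print(f"z-based: ({lo_z:.1f}, {hi_z:.1f})")  # z-based: (58.3, 68.3)
```

The t-based interval is wider, reflecting the extra uncertainty from estimating σ with only 14 observations.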

CI for One Sample – Dichotomous Outcome
Parameter: Proportion of population treated for hypertension
Sample N: 3,532 (large sample, so use z value)
Sample Proportion: 0.345 (i.e. 1,219 / 3,532)
Confidence Level: 95%; z value: 1.96
95% Confidence Interval for p: $\hat{p} \pm z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ = 0.345 ± 1.96 × sqrt(0.345 × 0.655 / 3,532) = 0.345 ± 0.016 = (0.329, 0.361)
From the sample, we estimate the proportion of persons treated for hypertension to be 0.345, and we are 95% confident that the true proportion lies between 0.329 and 0.361.

CI for One Sample – Dichotomous Outcome (Practice)
Parameter: Proportion of population with diabetes
Sample N: 501
Sample Proportion: (91 / 501) = _______
Confidence Level: 95%; z value: _______
95% Confidence Interval for p:

CI for One Sample – Dichotomous Outcome (Practice)
Parameter: Proportion of population with diabetes
Sample N: 501 (large sample, so use z value)
Sample Proportion: (91 / 501) = 0.1816
Confidence Level: 95%; z value: 1.96
95% Confidence Interval for p: = 0.1816 ± 0.0338 = (0.148, 0.215)
From the sample, we estimate the proportion of persons with diabetes to be 0.1816, and we are 95% confident that the true proportion lies between 0.148 and 0.215.
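Both proportion intervals above follow $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$. A minimal Python sketch (the helper `proportion_ci` is hypothetical) that starts from the raw counts, so the sample proportion is not rounded before the margin of error is computed:

```python
import math

def proportion_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for a one-sample proportion (normal approximation)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

for label, x, n in [("Treated for hypertension", 1219, 3532), ("Diabetes", 91, 501)]:
    p_hat, lower, upper = proportion_ci(x, n)
    print(f"{label}: {p_hat:.4f} ({lower:.3f}, {upper:.3f})")
```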

SECTION 3.4 Calculation and interpretation of confidence intervals: Two Samples – Matched
a) Continuous outcome

CI for Two Samples – Matched Continuous Outcome
- Often used for intervention studies with a pre- and post-measurement design (e.g. before and after treatment)
- The goal is to compare the mean score before and after the intervention
- Because the samples are matched (the same persons complete the pre- and post-measurements), the pre and post scores cannot be analyzed as two independent groups using their aggregate means; instead, a difference score is computed for each person:

Subject ID   Pre   Post   Difference (Post - Pre)
1            158   132    -26
2            148   138    -10
3            152   158    6
4            155   131    -24

- The parameter of interest is the mean difference, denoted $\mu_d$ (estimated by $\bar{X}_d$)
- The SD of the difference scores, denoted $s_d$, is used to calculate the standard error
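The difference scores in the table above can be reproduced directly. A minimal Python sketch using only the four illustrative subjects, with the standard library's `statistics` module:

```python
import statistics

pre = [158, 148, 152, 155]
post = [132, 138, 158, 131]

# One difference score per subject (post - pre); the pairing is kept by zip().
diffs = [after - before for before, after in zip(pre, post)]
d_bar = statistics.mean(diffs)   # estimate of mu_d
s_d = statistics.stdev(diffs)    # sample SD of the differences (n - 1 denominator)
print(diffs)                     # [-26, -10, 6, -24]
print(d_bar, round(s_d, 2))      # -13.5 14.82
```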

CI for Two Samples – Matched Continuous Outcome
Parameter: Mean difference in depressive symptom scores after taking a new drug: $\bar{X}_d$ = -12.7
Sample N: 100 (number of persons, not measurements)
Sample SD: SD of difference scores, $s_d$ = 8.9
Confidence Level: 95%; z value: 1.96
95% CI for $\mu_d$: $\bar{X}_d \pm z \dfrac{s_d}{\sqrt{n}}$ = -12.7 ± 1.96 × (8.9 / sqrt(100)) = -12.7 ± 1.74 = (-14.4, -11.0)

CI for Two Samples – Matched Continuous Outcome (Practice)
Parameter: Mean difference in anxiety symptom scores after psychotherapy: $\bar{X}_d$ = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores, $s_d$ = 9.6
Confidence Level: 90%; z value: ______

CI for Two Samples – Matched Continuous Outcome (Practice)
Parameter: Mean difference in anxiety symptom scores after psychotherapy: $\bar{X}_d$ = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores, $s_d$ = 9.6
Confidence Level: 90%; z value: 1.645
90% CI for $\mu_d$: = -14.8 ± 1.645 × (9.6 / sqrt(52)) = -14.8 ± 2.19 = (-17.0, -12.6)
From the sample, we estimate a mean difference in anxiety scores of -14.8 after psychotherapy, and we are 90% confident that the true mean difference lies between -17.0 and -12.6.
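Once $\bar{X}_d$ and $s_d$ are known, the matched-design interval is simply a one-sample interval on the difference scores. A minimal Python sketch (the helper `paired_ci` is hypothetical) reproducing the depression and anxiety examples:

```python
import math

def paired_ci(d_bar: float, s_d: float, n: int, z: float) -> tuple[float, float]:
    """CI for the mean difference mu_d; n is the number of persons (pairs)."""
    margin = z * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

depression = paired_ci(-12.7, 8.9, 100, 1.96)  # 95% CI
anxiety = paired_ci(-14.8, 9.6, 52, 1.645)     # 90% CI
print([round(v, 1) for v in depression])  # [-14.4, -11.0]
print([round(v, 1) for v in anxiety])     # [-17.0, -12.6]
```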

SECTION 3.5 Calculation and interpretation of confidence intervals: Two Samples – Independent
a) Continuous – mean difference
b) Dichotomous – risk difference
c) Dichotomous – risk ratio
d) Dichotomous – odds ratio

CI for Two Samples – Independent Continuous Outcome
- The common parameter of interest is the difference in means between the two groups, estimated by $\bar{X}_1 - \bar{X}_2$ and denoted for the population as $\mu_1 - \mu_2$
- Since there are 2 independent groups, we also have two sample sizes ($n_1$, $n_2$) and two sample SDs ($s_1$, $s_2$)
- If the sample variances are approximately equal, we can "pool" the standard deviations $s_1$ and $s_2$. A typical rule of thumb for pooling is: $0.5 < s_1^2 / s_2^2 < 2.0$
- The pooled (common) standard deviation is a weighted average:
$S_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

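CI for Two Samples – Independent Continuous Outcome
Parameter: Mean difference in systolic blood pressure between a sample of men and a sample of women
Men: $\bar{X}_1$ = 128.2; $n_1$ = 1623; $s_1$ = 17.5
Women: $\bar{X}_2$ = 126.5; $n_2$ = 1911; $s_2$ = 20.1
Note: $s_1^2 / s_2^2$ = 0.76, so the pooled SD ($S_p$) can be used
Confidence Level: 95%; z value: 1.96
$S_p$ = sqrt(((1623 - 1)(17.5²) + (1911 - 1)(20.1²)) / (1623 + 1911 - 2)) = sqrt(359.12) = 19.0
Formula: $(\bar{X}_1 - \bar{X}_2) \pm z \, S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$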

CI for Two Samples – Independent Continuous Outcome
Parameter: Mean difference in systolic blood pressure between a sample of men and women
95% CI = (128.2 - 126.5) ± 1.96 × 19.0 × sqrt(1/1623 + 1/1911) = 1.7 ± 1.26 = (0.44, 2.96)

CI for Two Samples – Independent Continuous Outcome (Practice)
Parameter: Mean difference in depression scores between a sample of men and women
Men: $\bar{X}_1$ = 5.77; $n_1$ = 163; $s_1$ = 7.674
Women: $\bar{X}_2$ = 6.86; $n_2$ = 333; $s_2$ = 8.714
Note: $s_1^2 / s_2^2$ = _________
Assume calculation of a 95% confidence interval.

CI for Two Samples – Independent Continuous Outcome (Practice)
Parameter: Mean difference in depression scores between a sample of men and women
Men: $\bar{X}_1$ = 5.77; $n_1$ = 163; $s_1$ = 7.674
Women: $\bar{X}_2$ = 6.86; $n_2$ = 333; $s_2$ = 8.714
Note: $s_1^2 / s_2^2$ = 0.78, so the pooled SD ($S_p$) can be used
$S_p$ = sqrt(((163 - 1)(7.674²) + (333 - 1)(8.714²)) / (163 + 333 - 2)) = sqrt((9540 + 25210) / 494) = 8.39
95% CI = (5.77 - 6.86) ± 1.96 × 8.39 × sqrt(1/163 + 1/333) = -1.09 ± 1.57 = (-2.66, 0.48)

CI for Two Samples – Independent Continuous Outcome (Practice)
SPSS: Analyze > Compare Means > Independent-Samples T Test; enter the Test Variable and the Grouping Variable; under Options, set the confidence interval percentage.
From the sample, we estimate a mean difference in depression scores between men and women of -1.09, and we are 95% confident that the true mean difference lies between -2.66 and 0.48.
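The pooling check, the pooled SD, and the interval can be combined in one function. A minimal Python sketch with hypothetical helper names, assuming the rule of thumb above that pooling is acceptable when $s_1^2/s_2^2$ lies between 0.5 and 2.0. Because it carries $S_p$ unrounded, the systolic blood pressure interval comes out as (0.45, 2.95) rather than the (0.44, 2.96) obtained with $S_p$ rounded to 19.0; the depression-score interval is (-2.66, 0.48).

```python
import math

def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Weighted (pooled) SD of two independent samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def mean_diff_ci(x1, n1, s1, x2, n2, s2, z=1.96):
    """Return (difference, Sp, (lower, upper)) for mu1 - mu2 using the pooled SD."""
    ratio = s1**2 / s2**2
    if not 0.5 < ratio < 2.0:
        # Rule of thumb from this module: do not pool when variances differ this much
        raise ValueError(f"variance ratio {ratio:.2f} is outside (0.5, 2.0)")
    sp = pooled_sd(n1, s1, n2, s2)
    diff = x1 - x2
    margin = z * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff, sp, (diff - margin, diff + margin)

sbp = mean_diff_ci(128.2, 1623, 17.5, 126.5, 1911, 20.1)  # men vs. women
dep = mean_diff_ci(5.77, 163, 7.674, 6.86, 333, 8.714)    # men vs. women
for name, (diff, sp, (lower, upper)) in [("SBP", sbp), ("Depression", dep)]:
    print(f"{name}: diff = {diff:.2f}, Sp = {sp:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```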

CI for Two Samples – Independent: Risk Difference
- The parameter of interest is the risk difference between the incidence proportions in the population, denoted RD = $p_1 - p_2$
- For a sample, the point estimate of the risk difference is $\widehat{RD} = \hat{p}_1 - \hat{p}_2$
Formula: $(\hat{p}_1 - \hat{p}_2) \pm z \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Example: Incidence of CVD in Smokers and Non-Smokers
                  No CVD   CVD        Total   Incidence
Current smoker    663      81 (x1)    744     p1 = 81 / 744 = 0.1089
Non-smoker        2757     298 (x2)   3055    p2 = 298 / 3055 = 0.0975
Total             3420     379        3799

CI for Two Samples – Independent: Risk Difference
Example: Compare the incidence proportion of CVD among smokers (exposed) and non-smokers (not exposed)
Smokers: $n_1$ = 744; with CVD ($x_1$) = 81; $\hat{p}_1$ = 0.1089
Non-smokers: $n_2$ = 3055; with CVD ($x_2$) = 298; $\hat{p}_2$ = 0.0975
Confidence Level: 95%; z value: 1.96
95% CI = (0.1089 - 0.0975) ± 1.96 × sqrt(0.1089(1 - 0.1089)/744 + 0.0975(1 - 0.0975)/3055) = 0.0114 ± 0.0247 = (-0.0133, 0.0361)

CI for Two Samples – Independent: Risk Difference (Practice)
Example: Compare the incidence proportion of sleep disorder among persons on statins (exposed) and not on statins (not exposed)
Confidence Level: 95%; z value: _______

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502

CI for Two Samples – Independent: Risk Difference (Practice)
Example: Compare the incidence proportion of sleep disorder among persons on statins (exposed) and not on statins (not exposed)
Confidence Level: 95%; z value: 1.96

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502

95% CI = (0.1333 - 0.0705) ± 1.96 × sqrt(0.1333(1 - 0.1333)/105 + 0.0705(1 - 0.0705)/397) = 0.063 ± 0.0697 = (-0.007, 0.133)

CI for Two Samples – Independent: Risk Difference (Practice)
From the sample, we estimate that the absolute risk of sleep disorder is 0.063 higher in statin users than in non-users, and we are 95% confident that the true risk difference lies between -0.007 and 0.133.
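A minimal Python sketch of the risk-difference interval (the helper `risk_difference_ci` is hypothetical). It starts from raw counts, so the smoking example gives a lower limit of about -0.0134 instead of the -0.0133 obtained with the rounded proportions 0.1089 and 0.0975; the statin example matches (-0.007, 0.133).

```python
import math

def risk_difference_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Return (RD, lower, upper) for p1 - p2 using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se

for label, counts in [("Smoking / CVD", (81, 744, 298, 3055)),
                      ("Statins / sleep disorder", (14, 105, 28, 397))]:
    rd, lower, upper = risk_difference_ci(*counts)
    print(f"{label}: RD = {rd:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```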

CI for Two Samples – Independent: Risk Ratio
- The parameter of interest is the ratio of the incidence proportions in the population, denoted RR = $p_1 / p_2$
- For a sample, the point estimate of the risk ratio is $\widehat{RR} = \hat{p}_1 / \hat{p}_2$
- The RR does not follow a normal distribution, but its natural log, ln(RR), is approximately normally distributed and is used to calculate the confidence interval. This entails 2 steps:
1. Calculate the CI for ln(RR): $\ln(\widehat{RR}) \pm z \sqrt{\dfrac{(n_1 - x_1)/x_1}{n_1} + \dfrac{(n_2 - x_2)/x_2}{n_2}}$
2. Transform back to a CI for RR: (exp(lower limit), exp(upper limit))

CI for Two Samples – Independent: Risk Ratio
Example: Compare the future risk of CVD among smokers (exposed) and non-smokers (not exposed)
Smokers: $n_1$ = 744; with CVD ($x_1$) = 81; $\hat{p}_1$ = 0.1089
Non-smokers: $n_2$ = 3055; with CVD ($x_2$) = 298; $\hat{p}_2$ = 0.0975
Confidence Level: 95%; z value: 1.96
RR = $\hat{p}_1 / \hat{p}_2$ = 0.1089 / 0.0975 = 1.12
CI for ln(RR): ln(1.12) ± 1.96 × sqrt((663/81)/744 + (2757/298)/3055) = 0.113 ± 0.232 = (-0.119, 0.345)
CI for RR: (exp(-0.119), exp(0.345)) = (0.89, 1.41)

CI for Two Samples – Independent: Risk Ratio (Practice)
Example: Compare the future risk of sleep disorder among statin users (exposed) versus non-statin users (not exposed)
Confidence Level: 95%; z value: _______
RR = $\hat{p}_1 / \hat{p}_2$ = _______
CI for ln(RR): _______
CI for RR: (exp(lower limit), exp(upper limit))

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502

CI for Two Samples – Independent: Risk Ratio (Practice)
Example: Compare the future risk of sleep disorder among statin users (exposed) versus non-statin users (not exposed)
Confidence Level: 95%; z value: 1.96
RR = $\hat{p}_1 / \hat{p}_2$ = 0.1333 / 0.0705 = 1.89
CI for ln(RR): ln(1.89) ± 1.96 × sqrt((91/14)/105 + (369/28)/397) = 0.6366 ± 0.6044 = (0.0322, 1.2410)
CI for RR: (exp(0.0322), exp(1.2410)) = (1.03, 3.46)

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502

CI for Two Samples – Independent: Risk Ratio (Practice)
From the sample, we estimate that the risk of sleep disorder is 1.89 times higher in statin users than in non-users, and we are 95% confident that the true risk ratio lies between 1.03 and 3.46.
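The two-step log-scale procedure can be sketched in Python as follows (the helper `risk_ratio_ci` is hypothetical). Working from raw counts, the statin interval matches (1.03, 3.46); for the smoking example, the unrounded RR (about 1.116) gives a lower limit of 0.88 rather than the 0.89 obtained by rounding the RR to 1.12 before taking the log.

```python
import math

def risk_ratio_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Return (RR, lower, upper); the CI is built on the ln(RR) scale, then exponentiated."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    se_log = math.sqrt((n1 - x1) / (x1 * n1) + (n2 - x2) / (x2 * n2))
    log_lower = math.log(rr) - z * se_log   # step 1: CI for ln(RR)
    log_upper = math.log(rr) + z * se_log
    return rr, math.exp(log_lower), math.exp(log_upper)  # step 2: back-transform

rr, lower, upper = risk_ratio_ci(14, 105, 28, 397)
print(f"Statins: RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```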

CI for Two Samples – Independent: Odds Ratio
- Conceptually similar to the risk ratio, but the parameter of interest is the odds ratio (OR), defined as: odds of exposure among cases / odds of exposure among controls

              Cases   Controls
Exposed       a       b
Not exposed   c       d

OR = (a / c) / (b / d) = ad / bc
CI for ln(OR): $\ln(\widehat{OR}) \pm z \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$
CI for OR: (exp(lower limit), exp(upper limit))

Example: Prevalence of CVD in Smokers and Non-Smokers (95% C.I.)
                      CVD (D+)   No CVD (D-)
Current smoker (E+)   81         663
Non-smoker (E-)       298        2757

OR = (81 / 298) / (663 / 2757) = 1.13; z = 1.96
CI for ln(OR): ln(1.13) ± 1.96 × sqrt(1/81 + 1/663 + 1/298 + 1/2757) = 0.122 ± 0.260 = (-0.138, 0.382)
CI for OR: (exp(-0.138), exp(0.382)) = (0.87, 1.47)

CI for Two Samples – Independent: Odds Ratio (Practice)
OR = odds of exposure among cases / odds of exposure among controls
Prevalence of Sleep Disorder Among Statin and Non-Statin Users (95% C.I.)

              Cases   Controls
Exposed       a       b
Not exposed   c       d

                       Sleep Dx   Sleep OK
Statin user (E+)       14         91
Non-statin user (E-)   28         369

OR = (a / c) / (b / d) = _________; z = ___________
CI for ln(OR): _________
CI for OR: (exp(lower limit), exp(upper limit))

CI for Two Samples – Independent: Odds Ratio (Practice)
OR = odds of exposure among cases / odds of exposure among controls
Example: Prevalence of Sleep Disorder Among Statin and Non-Statin Users

                       Sleep Dx   Sleep OK
Statin user (E+)       14         91
Non-statin user (E-)   28         369

OR = (14 / 28) / (91 / 369) = 2.027; z = 1.96
CI for ln(OR): ln(2.027) ± 1.96 × sqrt(1/14 + 1/91 + 1/28 + 1/369) = 0.7066 ± 0.6813 = (0.0253, 1.3879)
CI for OR: (exp(0.0253), exp(1.3879)) = (1.03, 4.01)

CI for Two Samples – Independent: Odds Ratio (Practice)
Example: Prevalence of Sleep Disorder Among Statin and Non-Statin Users
OR = (14 / 28) / (91 / 369) = 2.027; 95% CI = (1.03, 4.01)
From the sample, we estimate that the odds of statin use among persons with a sleep disorder are 2.03 times the odds of statin use among persons without a sleep disorder, and we are 95% confident that the true odds ratio lies between 1.03 and 4.01.
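The odds-ratio interval uses the same log-then-exponentiate idea, with the cell layout a, b, c, d shown above (exposed cases, exposed controls, unexposed cases, unexposed controls). A minimal Python sketch; the helper `odds_ratio_ci` is hypothetical:

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (OR, lower, upper) with OR = (a/c)/(b/d) = ad/bc and a log-scale CI."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

for label, cells in [("Smoking / CVD", (81, 663, 298, 2757)),
                     ("Statins / sleep disorder", (14, 91, 28, 369))]:
    or_hat, lower, upper = odds_ratio_ci(*cells)
    print(f"{label}: OR = {or_hat:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```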

SECTION 3.6 Use of SPSS to calculate confidence intervals

CI for Two Samples – Independent: Odds Ratio (Practice)
Example: Prevalence of Sleep Disorder Among Statin and Non-Statin Users
OR = 2.027; 95% CI = (1.03, 4.01)
SPSS: Analyze > Descriptive Statistics > Crosstabs; enter the Row and Column variables; under Statistics, check “Risk”.

[Figure: OR = 2.03 with 95% CI from 1.03 to 4.01, plotted on a scale bounded at 0 and unbounded above, with the null value 1.0 marked]
Note: The confidence interval for a continuous parameter, such as a mean or a difference in means, is symmetric around the point estimate. In contrast, the confidence intervals for the risk ratio and odds ratio are asymmetric, extending farther above the point estimate than below it (here, 1.00 below and 1.98 above 2.03). This is because:
a) Values of RR and OR have a lower bound of 0 but no upper bound
b) The CI is calculated on the natural-log scale and then transformed back with the exponential function

SECTION 3.7 Relationship between the risk ratio and the odds ratio

Odds Ratio & Risk Ratio
Relationship between RR and OR: The odds ratio will provide a good estimate of the risk ratio when:
1. The outcome (disease) is rare, or
2. The effect size is small or modest

Odds Ratio & Risk Ratio
The odds ratio will provide a good estimate of the risk ratio when:
1. The outcome (disease) is rare

        D+   D-
E+      a    b
E-      c    d

$OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$, and $RR = \dfrac{a/(a+b)}{c/(c+d)}$

If the disease is rare, cells a and c will be small relative to b and d, so a + b ≈ b and c + d ≈ d. Then:
$RR = \dfrac{a/(a+b)}{c/(c+d)} \approx \dfrac{a/b}{c/d} = \dfrac{ad}{bc} = OR$
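A quick numeric check of the rare-disease argument, using entirely hypothetical cohorts (not course data): 1,000 exposed and 1,000 unexposed persons, with the exposed always at twice the baseline risk, so the true RR is 2. The helper name `or_and_rr` is illustrative.

```python
def or_and_rr(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (OR, RR) for the 2x2 layout: E+ row = (a, b), E- row = (c, d)."""
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    return odds_ratio, risk_ratio

# Hypothetical cohorts: 1,000 exposed and 1,000 unexposed; exposed risk = 2 x baseline
for risk0 in (0.005, 0.10, 0.30):
    c = round(1000 * risk0)      # unexposed cases
    a = round(1000 * 2 * risk0)  # exposed cases
    odds_ratio, risk_ratio = or_and_rr(a, 1000 - a, c, 1000 - c)
    print(f"baseline risk {risk0:.3f}: RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```

With a baseline risk of 0.5% the OR (2.01) is essentially the RR (2.00); at 10% and 30% the OR grows to 2.25 and 3.50 while the RR stays at 2.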

Odds Ratio & Risk Ratio
The odds ratio will provide a good estimate of the risk ratio when:
2. The effect size is small or modest

        D+    D-
E+      40    60
E-      120   180

OR = (40 / 120) / (60 / 180) = 0.333 / 0.333 = 1.0
RR = [40 / (40 + 60)] / [120 / (120 + 180)] = 0.40 / 0.40 = 1.0

Odds Ratio & Risk Ratio

        D+    D-
E+      20    30
E-      10    90

OR = (20 / 10) / (30 / 90) = 2.0 / 0.333 = 6.0
RR = (20 / 50) / (10 / 100) = 0.40 / 0.10 = 4.0

Finally, we expect the risk ratio to be closer to the null value of 1.0 than the odds ratio. Therefore, be especially cautious when interpreting the odds ratio as a measure of relative risk when the outcome is not rare and the effect size is large.
