A short introduction to epidemiology Chapter 9: Data analysis Neil Pearce Centre for Public Health Research Massey University Wellington, New Zealand.

Chapter 9 Data analysis
– Basic principles
– Basic analyses
– Control of confounding

Basic principles
– Effect estimation
– Confidence intervals
– P-values

Testing and estimation

The effect estimate provides an estimate of the effect (e.g. relative risk, risk difference) of exposure on the occurrence of disease. The confidence interval provides a range of values within which it is plausible that the true effect lies. The p-value is the probability that differences as large as, or larger than, those observed could have arisen by chance if the null hypothesis (of no association between exposure and disease) is correct. The principal aim of an individual study should be to estimate the size of the effect (using the effect estimate and confidence interval) rather than just to decide whether or not an effect is present (using the p-value).

Problems of significance testing

The p-value depends on two factors: the size of the effect and the size of the study. A very small difference may be statistically significant if the study is very large, whereas a very large difference may not be significant if the study is very small. The purpose of significance testing is to reach a decision based on a single study. However, decisions should be based on information from all available studies, as well as on non-statistical considerations such as the plausibility and coherence of the effect in the light of current theoretical and empirical knowledge (see chapter 10).
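As a rough illustration of this point, the sketch below compares two hypothetical studies with identical odds ratios, one ten times larger than the other. It assumes a simple Wald test on ln(OR), using the standard error formula for the odds ratio given later in this chapter; the counts and the function name `odds_ratio_wald_p` are hypothetical.

```python
import math


def odds_ratio_wald_p(a, b, c, d):
    """Odds ratio and two-sided Wald p-value for one 2x2 table.

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se_log_or
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area
    return odds_ratio, p


small_study = (6, 4, 4, 6)  # hypothetical counts
large_study = tuple(10 * n for n in small_study)  # same proportions, ten times larger

for label, table in (("small", small_study), ("large", large_study)):
    odds_ratio, p = odds_ratio_wald_p(*table)
    print(f"{label} study: OR = {odds_ratio:.2f}, p = {p:.3f}")
```

Both studies give OR = 2.25, but only the larger one gives p < 0.05.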


Basic analyses

Measures of occurrence:
– Incidence proportion (risk)
– Incidence rate
– Incidence odds

Measures of effect:
– Risk ratio
– Rate ratio
– Odds ratio

Notation for a case-control study:

              Exposed   Non-exposed   Total
Cases            a           b          M1
Controls         c           d          M0
Total            N1          N0         T

The odds ratio is $\mathrm{OR} = ad/bc$.

Example: Smoking and Ovarian Cancer

This $\chi^2$ is based on the assumptions that the marginal totals of the table ($N_1$, $N_0$, $M_1$, $M_0$) are fixed and that the proportion of exposed cases is the same as the proportion of exposed controls (i.e. that the overall proportion exposed, $N_1/T$, applies to both cases and controls).
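The chi-square described above can be sketched as follows, assuming the common form $(a - E(a))^2 / V(a)$ with $E(a) = M_1 N_1 / T$ and the hypergeometric variance $V(a) = M_1 M_0 N_1 N_0 / (T^2 (T - 1))$, which conditions on all four margins. The function name and the counts are illustrative only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square (1 df) for one case-control 2x2 table, conditional on its margins.

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls.
    """
    m1, m0 = a + b, c + d  # cases, controls
    n1, n0 = a + c, b + d  # exposed, non-exposed
    t = m1 + m0
    # Under the null, the overall proportion exposed (n1 / t) applies to the cases too
    expected_a = m1 * n1 / t
    var_a = m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))
    chi2 = (a - expected_a) ** 2 / var_a
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


chi2, p = chi_square_2x2(6, 4, 4, 6)  # hypothetical counts
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

For the hypothetical table a = 6, b = 4, c = 4, d = 6, the expected value of a is 5 and the chi-square is 0.76.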

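The natural logarithm of the odds ratio has (under a binomial model) an approximate standard error of:

$$\mathrm{SE}[\ln(\mathrm{OR})] = \left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right)^{0.5}$$

An approximate 95% confidence interval for the odds ratio is then given by:

$$\mathrm{OR} \times e^{\pm 1.96\,\mathrm{SE}}$$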
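A minimal sketch of the odds ratio and its approximate 95% confidence interval, applying the standard error formula above on the log scale with 1.96 as the normal multiplier. The function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI: OR * exp(+/- z * SE[ln(OR)]).

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se)
    upper = odds_ratio * math.exp(z * se)
    return odds_ratio, lower, upper


estimate, lower, upper = odds_ratio_ci(60, 40, 40, 60)  # hypothetical counts
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For these counts the odds ratio is 2.25 with limits of roughly 1.28 and 3.96; the interval is symmetric on the log scale, not on the odds ratio scale.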

Control of confounding

There are two methods of calculating a summary effect estimate to control confounding:
– Pooling
– Standardisation

Example of pooling: The unadjusted (crude) findings indicate that there is a strong association between smoking and ovarian cancer. Suppose, however, that we are concerned that the effect of smoking is confounded by use of oral contraceptives (OC). This would occur if OC use affected the risk of ovarian cancer and was also associated with smoking. We then need to stratify the data into those who have used oral contraceptives and those who have not.

Table: smoking (yes/no) among cases and controls, stratified by OC use (yes/no)

In those who have used oral contraceptives, the odds ratio for smoking is:

In those who have not used oral contraceptives, the odds ratio for smoking is:

Thus, the crude OR for smoking (0.46) was lower than the stratum-specific odds ratios because of confounding by OC use. When we remove this bias (by stratifying on oral contraceptive use), the odds ratios increase and are close to 1.0.

In this example, the odds ratios are not exactly the same in each stratum. If they are very different (e.g. 1.0 in one stratum and 4.0 in the other), then we would usually report the findings separately for each stratum. However, if the odds ratio estimates are reasonably similar, then we usually wish to summarize our findings as a single summary odds ratio by taking a weighted average of the OR estimates in each stratum:

$$\mathrm{OR} = \frac{\sum_i W_i \, \mathrm{OR}_i}{\sum_i W_i}$$

where $\mathrm{OR}_i$ = OR in stratum $i$ and $W_i$ = weight given to stratum $i$.

One obvious choice of weights would be to weight each stratum by the inverse of its variance (precision-based estimates). However, this method of obtaining a summary odds ratio yields estimates which are unstable and highly affected by small numbers in particular strata.
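The general weighted average, and the precision-based choice of weights just described, can be sketched as below. Averaging on the log-odds scale with weights $1/\mathrm{Var}[\ln(\mathrm{OR}_i)]$ is an assumption here (it is the usual way inverse-variance weighting is applied to odds ratios); the function names are hypothetical. A single zero cell in any stratum makes this estimator undefined, which is one face of the instability mentioned above.

```python
import math


def weighted_average(estimates, weights):
    """Summary estimate = sum(W_i * estimate_i) / sum(W_i)."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def inverse_variance_or(tables):
    """Precision-based summary odds ratio (assumed log-scale, Woolf-type weighting).

    Each table is (a, b, c, d) = exposed cases, non-exposed cases,
    exposed controls, non-exposed controls. Any zero cell makes the
    estimate undefined (math.log or the division fails).
    """
    log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    return math.exp(weighted_average(log_ors, weights))
```

When every stratum has the same odds ratio, the summary equals that common value whatever the weights.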

A better set of weights was developed by Mantel and Haenszel. These use the weights $W_i = b_i c_i / T_i$:

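$$\mathrm{OR}_{MH} = \frac{\sum_i (b_i c_i / T_i)\,\mathrm{OR}_i}{\sum_i b_i c_i / T_i} = \frac{\sum_i a_i d_i / T_i}{\sum_i b_i c_i / T_i}$$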

This set of weights yields summary odds ratio estimates which are very close to being statistically optimal (they are very close to the maximum likelihood estimates) and are very robust in that they are not unduly affected by small numbers in particular strata (provided that the strata do not have any zero marginal totals).
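A minimal sketch of the Mantel-Haenszel summary odds ratio, written directly as $\sum_i a_i d_i / T_i$ over $\sum_i b_i c_i / T_i$ (which is what the weights $b_i c_i / T_i$ reduce to). The strata are hypothetical; the second contains a zero cell but no zero marginal total, and the estimate is still defined.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio: sum(a_i d_i / T_i) / sum(b_i c_i / T_i).

    Same as weighting each stratum OR_i = a_i d_i / (b_i c_i) by W_i = b_i c_i / T_i.
    Each table is (a, b, c, d) = exposed cases, non-exposed cases,
    exposed controls, non-exposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical strata; the second has a zero cell but no zero marginal total
strata = [(10, 20, 15, 40), (0, 5, 3, 12)]
print(f"OR_MH = {mantel_haenszel_or(strata):.2f}")
```

With a single stratum, the function returns the ordinary crude odds ratio ad/bc.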

We can calculate a corresponding chi-square:

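$$\chi^2_{MH} = \frac{\left(\sum_i a_i - \sum_i E(a_i)\right)^2}{\sum_i V(a_i)}, \quad E(a_i) = \frac{M_{1i} N_{1i}}{T_i}, \quad V(a_i) = \frac{M_{1i} M_{0i} N_{1i} N_{0i}}{T_i^2 (T_i - 1)}$$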
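The Mantel-Haenszel chi-square can be sketched by summing observed counts, expected counts and hypergeometric variances across strata, assuming the form $(\sum_i a_i - \sum_i E(a_i))^2 / \sum_i V(a_i)$. The function name is illustrative.

```python
import math


def mantel_haenszel_chi_square(tables):
    """Mantel-Haenszel chi-square (1 df) over strata of case-control 2x2 tables.

    Uses (sum a_i - sum E[a_i])^2 / sum V[a_i] with E[a_i] = M1_i N1_i / T_i and
    V[a_i] = M1_i M0_i N1_i N0_i / (T_i^2 (T_i - 1)). Each table is (a, b, c, d).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        m1, m0, n1, n0 = a + b, c + d, a + c, b + d
        t = m1 + m0
        observed += a
        expected += m1 * n1 / t
        variance += m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # p from chi-square with 1 df
```

Summing before squaring means that departures in opposite directions in different strata partly cancel, so the test is most sensitive to a consistent association across strata.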

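The natural logarithm of the Mantel-Haenszel summary odds ratio has an approximate standard error (the Robins-Breslow-Greenland estimator) of:

$$\mathrm{SE} = \left[\frac{\sum_i P_i R_i}{2R_+^2} + \frac{\sum_i (P_i S_i + Q_i R_i)}{2R_+ S_+} + \frac{\sum_i Q_i S_i}{2S_+^2}\right]^{0.5}$$

where $P_i = (a_i + d_i)/T_i$, $Q_i = (b_i + c_i)/T_i$, $R_i = a_i d_i / T_i$, $S_i = b_i c_i / T_i$, $R_+ = \sum_i R_i$ and $S_+ = \sum_i S_i$.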

An approximate 95% confidence interval for the summary odds ratio is then given by:

$$\mathrm{OR}_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
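A sketch of the standard error formula above and the resulting confidence interval; the function name is hypothetical. One useful check: with a single stratum the formula reduces algebraically to the simple $(1/a + 1/b + 1/c + 1/d)^{0.5}$ used for an unstratified table.

```python
import math


def mh_or_with_ci(tables, z=1.96):
    """MH summary odds ratio with the Robins-Breslow-Greenland SE of ln(OR_MH).

    Each table is (a, b, c, d) = exposed cases, non-exposed cases,
    exposed controls, non-exposed controls.
    """
    sum_pr = sum_ps_qr = sum_qs = r_plus = s_plus = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        p, q = (a + d) / t, (b + c) / t
        r, s = a * d / t, b * c / t
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
        r_plus += r
        s_plus += s
    or_mh = r_plus / s_plus
    var_log_or = (sum_pr / (2 * r_plus ** 2)
                  + sum_ps_qr / (2 * r_plus * s_plus)
                  + sum_qs / (2 * s_plus ** 2))
    se = math.sqrt(var_log_or)
    return or_mh, se, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)
```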

Rate ratios:

                 Exposed   Non-exposed   Total
Cases               a           b          M1
Person-years        Y1          Y0         T


The summary Mantel-Haenszel rate ratio involves taking the weights $W_i = b_i Y_{1i} / T_i$ to yield:

$$\mathrm{RR}_{MH} = \frac{\sum_i a_i Y_{0i} / T_i}{\sum_i b_i Y_{1i} / T_i}$$
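A minimal sketch of the summary rate ratio, with each stratum given as (exposed cases, non-exposed cases, exposed person-years, non-exposed person-years). This layout, the function name and the example numbers are hypothetical.

```python
def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio: sum(a_i Y0_i / T_i) / sum(b_i Y1_i / T_i).

    Same as weighting each stratum rate ratio (a_i / Y1_i) / (b_i / Y0_i)
    by W_i = b_i Y1_i / T_i. Each stratum is (a, b, y1, y0) = exposed cases,
    non-exposed cases, exposed person-years, non-exposed person-years; T_i = y1 + y0.
    """
    numerator = sum(a * y0 / (y1 + y0) for a, b, y1, y0 in strata)
    denominator = sum(b * y1 / (y1 + y0) for a, b, y1, y0 in strata)
    return numerator / denominator


# Hypothetical strata, each with a rate ratio of 2.0
strata = [(20, 10, 1000, 1000), (6, 9, 300, 900)]
print(f"RR_MH = {mantel_haenszel_rate_ratio(strata):.2f}")
```

Because both hypothetical strata have a rate ratio of 2.0, the summary is also 2.0, even though the strata differ in size and in the balance of person-time.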

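The equivalent Mantel-Haenszel chi-square is:

$$\chi^2_{MH} = \frac{\left(\sum_i a_i - \sum_i E(a_i)\right)^2}{\sum_i V(a_i)}, \quad E(a_i) = \frac{M_{1i} Y_{1i}}{T_i}, \quad V(a_i) = \frac{M_{1i} Y_{1i} Y_{0i}}{T_i^2}$$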

This is very similar to the $\chi^2_{MH}$ for case-control studies, but it has some minor modifications to take account of the fact that we are using person-time data rather than binomial data.
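The person-time chi-square can be sketched as below, assuming $E(a_i) = M_{1i} Y_{1i} / T_i$ and $V(a_i) = M_{1i} Y_{1i} Y_{0i} / T_i^2$: the number of cases in each stratum is treated as fixed and split between exposed and non-exposed person-time in proportion to $Y_{1i}/T_i$. The function name is hypothetical.

```python
import math


def mh_chi_square_person_time(strata):
    """Mantel-Haenszel chi-square (1 df) for person-time data.

    Assumes E[a_i] = M1_i Y1_i / T_i and V[a_i] = M1_i Y1_i Y0_i / T_i^2,
    where M1_i = a_i + b_i and T_i = Y1_i + Y0_i. Each stratum is (a, b, y1, y0).
    """
    observed = expected = variance = 0.0
    for a, b, y1, y0 in strata:
        m1, t = a + b, y1 + y0
        observed += a
        expected += m1 * y1 / t
        variance += m1 * y1 * y0 / t ** 2
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # p from chi-square with 1 df
```

Compared with the case-control version, the hypergeometric variance is replaced by a binomial-type variance for the split of a fixed number of cases, with no $(T_i - 1)$ term.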

An approximate standard error for the natural log of the rate ratio is:

$$\mathrm{SE} = \frac{\left[\sum_i M_{1i} Y_{1i} Y_{0i} / T_i^2\right]^{0.5}}{\left[\left(\sum_i a_i Y_{0i} / T_i\right)\left(\sum_i b_i Y_{1i} / T_i\right)\right]^{0.5}}$$

An approximate 95% confidence interval for the rate ratio is then given by:

$$\mathrm{RR}_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
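A sketch combining the summary rate ratio with the standard error formula above; the function name and stratum layout (exposed cases, non-exposed cases, exposed person-years, non-exposed person-years) are hypothetical. As a check, with a single stratum the standard error reduces to $(1/a + 1/b)^{0.5}$.

```python
import math


def mh_rate_ratio_ci(strata, z=1.96):
    """MH rate ratio with an approximate 95% CI.

    SE[ln RR] = sqrt(sum M1_i Y1_i Y0_i / T_i^2)
                / sqrt(sum(a_i Y0_i / T_i) * sum(b_i Y1_i / T_i)).
    Each stratum is (a, b, y1, y0).
    """
    num = den = var_num = 0.0
    for a, b, y1, y0 in strata:
        t = y1 + y0
        num += a * y0 / t
        den += b * y1 / t
        var_num += (a + b) * y1 * y0 / t ** 2
    rr = num / den
    se = math.sqrt(var_num) / math.sqrt(num * den)
    return rr, se, rr * math.exp(-z * se), rr * math.exp(z * se)
```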

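Risk ratios:

               Exposed   Non-exposed   Total
Cases             a           b          M1
Non-cases         c           d          M0
Total             N1          N0         T

The summary Mantel-Haenszel risk ratio is:

$$\mathrm{RR}_{MH} = \frac{\sum_i a_i N_{0i} / T_i}{\sum_i b_i N_{1i} / T_i}$$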

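An approximate standard error for the natural log of the risk ratio is:

$$\mathrm{SE} = \frac{\left[\sum_i \left(M_{1i} N_{1i} N_{0i} / T_i^2 - a_i b_i / T_i\right)\right]^{0.5}}{\left[\left(\sum_i a_i N_{0i} / T_i\right)\left(\sum_i b_i N_{1i} / T_i\right)\right]^{0.5}}$$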

An approximate 95% confidence interval for the risk ratio is then given by:

$$\mathrm{RR}_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
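A sketch of the stratified risk ratio analysis, assuming the Mantel-Haenszel risk ratio $\sum_i a_i N_{0i} / T_i$ over $\sum_i b_i N_{1i} / T_i$ (the two sums in the denominator of the standard error above). The function name is hypothetical. With a single stratum, the standard error reduces to $(1/a - 1/N_1 + 1/b - 1/N_0)^{0.5}$.

```python
import math


def mh_risk_ratio_ci(tables, z=1.96):
    """MH risk ratio with an approximate 95% CI.

    Each table is (a, b, c, d) = exposed cases, non-exposed cases,
    exposed non-cases, non-exposed non-cases.
    """
    num = den = var_num = 0.0
    for a, b, c, d in tables:
        n1, n0, m1 = a + c, b + d, a + b
        t = n1 + n0
        num += a * n0 / t
        den += b * n1 / t
        var_num += m1 * n1 * n0 / t ** 2 - a * b / t
    rr = num / den
    se = math.sqrt(var_num) / math.sqrt(num * den)
    return rr, se, rr * math.exp(-z * se), rr * math.exp(z * se)
```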

Standardization, in contrast to pooling, involves taking a weighted average of the rates in each stratum (e.g. age-group) before taking the ratio of the two standardized rates. Standardization has many advantages in descriptive epidemiology involving comparisons between countries, regions, ethnic groups or gender groups. However, pooling (when done appropriately) has some superior statistical properties when comparing exposed and non-exposed groups in a specific study.
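To contrast with pooling, here is a sketch of direct standardization: the stratum-specific rates in each group are averaged using a common set of standard-population weights, and the ratio is taken afterwards. The age strata, counts, weights and function name are hypothetical.

```python
def directly_standardized_rate(cases, person_years, standard_weights):
    """Weighted average of stratum-specific rates using standard-population weights."""
    rates = [c / py for c, py in zip(cases, person_years)]
    return sum(w * r for w, r in zip(standard_weights, rates)) / sum(standard_weights)


# Hypothetical age strata (younger, older) and standard population weights
standard = [0.6, 0.4]
exposed = directly_standardized_rate([10, 40], [1000, 2000], standard)
non_exposed = directly_standardized_rate([5, 30], [1000, 3000], standard)
print(f"standardized rate ratio = {exposed / non_exposed:.2f}")
```

Here the standardized rates are 0.014 and 0.007 per person-year, giving a standardized rate ratio of 2.0. The weights come from the external standard, not from the precision of each stratum, which is the key difference from pooling.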

Summary of stratified analysis

If we are concerned about confounding by a factor such as age, gender or smoking, then we need to stratify on this factor (or on all factors simultaneously if there is more than one potential confounder) and calculate the exposure effect separately in each stratum. If the effect is very different in different strata, then we would report the findings separately for each stratum.

If the effect is similar in each stratum, then we can obtain a summary estimate by taking a weighted average of the effect in each stratum. If the adjusted effect is different from the crude effect, this means that the crude effect was biased by confounding.

Usually we need to adjust the findings for (i.e. stratify on) age, gender and some other factors. If we have five age-groups and two gender groups, then we need to divide the data into ten age-gender groups. If we have too many strata, then we begin to get strata with zero marginal totals (e.g. with no cases or no controls). The analysis then begins to 'break down' and we have to consider using mathematical modelling.
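A small sketch of the bookkeeping described above: crossing five hypothetical age-groups with two gender groups gives ten strata, and a helper flags any stratum with a zero marginal total. The labels, cut-points and the function name `sparse_strata` are illustrative only.

```python
from itertools import product


def sparse_strata(tables):
    """Return the labels of strata with a zero marginal total.

    `tables` maps a stratum label to (a, b, c, d) = exposed cases,
    non-exposed cases, exposed controls, non-exposed controls.
    """
    flagged = []
    for label, (a, b, c, d) in tables.items():
        margins = (a + b, c + d, a + c, b + d)  # M1, M0, N1, N0
        if 0 in margins:
            flagged.append(label)
    return flagged


age_groups = ["<35", "35-44", "45-54", "55-64", "65+"]  # hypothetical cut-points
genders = ["female", "male"]
strata_labels = list(product(age_groups, genders))
print(len(strata_labels))  # 5 age-groups x 2 gender groups = 10 strata
```

Every extra stratifying factor multiplies the number of strata, so flagged strata accumulate quickly as confounders are added.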
