Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.1 Contingency Tables.

Slides:



Advertisements
Similar presentations
Three or more categorical variables
Advertisements

M2 Medical Epidemiology
KRUSKAL-WALIS ANOVA BY RANK (Nonparametric test)
© Scott Evans, Ph.D. and Lynne Peeples, M.S.
1 Confounding and Interaction: Part II  Methods to Reduce Confounding –during study design: »Randomization »Restriction »Matching –during study analysis:
Chapter 19 Stratified 2-by-2 Tables
Chance, bias and confounding
Statistical Tests Karen H. Hagglund, M.S.
Categorical Data. To identify any association between two categorical data. Example: 1,073 subjects of both genders were recruited for a study where the.
Analysis of frequency counts with Chi square
Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables Chapter 15 1.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
Introduction to Risk Factors & Measures of Effect Meg McCarron, CDC.
Confounding And Interaction Dr. L. Jeyaseelan Department Of Biostatistics CMC, Vellore.
Confidence Intervals © Scott Evans, Ph.D..
EPI 809 / Spring 2008 Final Review EPI 809 / Spring 2008 Ch11 Regression and correlation  Linear regression Model, interpretation. Model, interpretation.
© Scott Evans, Ph.D. and Lynne Peeples, M.S.
1June In Chapter 19: 19.1 Preventing Confounding 19.2 Simpson’s Paradox (Severe Confounding) 19.3 Mantel-Haenszel Methods 19.4 Interaction.
Categorical Data Analysis: Stratified Analyses, Matching, and Agreement Statistics Biostatistics March 2007 Carla Talarico.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
THREE CONCEPTS ABOUT THE RELATIONSHIPS OF VARIABLES IN RESEARCH
Are exposures associated with disease?
Thomas Songer, PhD with acknowledgment to several slides provided by M Rahbar and Moataza Mahmoud Abdel Wahab Introduction to Research Methods In the Internet.
Presentation 12 Chi-Square test.
The Chi-Square Test Used when both outcome and exposure variables are binary (dichotomous) or even multichotomous Allows the researcher to calculate a.
Stratification and Adjustment
INTRODUCTION TO EPIDEMIOLO FOR POME 105. Lesson 3: R H THEKISO:SENIOR PAT TIME LECTURER INE OF PRESENTATION 1.Epidemiologic measures of association 2.Study.
Unit 6: Standardization and Methods to Control Confounding.
Analysis of Categorical Data
Some Introductory Statistics Terminology. Descriptive Statistics Procedures used to summarize, organize, and simplify data (data being a collection of.
Measuring Associations Between Exposure and Outcomes.
Simple Linear Regression
Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs.
Estimation of Various Population Parameters Point Estimation and Confidence Intervals Dr. M. H. Rahbar Professor of Biostatistics Department of Epidemiology.
Evidence-Based Medicine 3 More Knowledge and Skills for Critical Reading Karen E. Schetzina, MD, MPH.
Measures of Association
Amsterdam Rehabilitation Research Center | Reade Multiple regression analysis Analysis of confounding and effectmodification Martin van de Esch, PhD.
R Programming Odds & Odds Ratios 1. Session 3 Overview 1.Odds 2.Odds Ratio (OR) 3.Confidence Intervals for OR’s 4.Inference based on OR’s 2.
Dr.Shaikh Shaffi Ahamed Ph.D., Dept. of Family & Community Medicine
Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.1 Descriptive Statistics, The Normal Distribution, and Standardization.
October 15. In Chapter 19: 19.1 Preventing Confounding 19.2 Simpson’s Paradox 19.3 Mantel-Haenszel Methods 19.4 Interaction.
The binomial applied: absolute and relative risks, chi-square.
HSRP 734: Advanced Statistical Methods May 29, 2008.
1 Chapter 1: Stratified Data Analysis 1.1 Introduction 1.2 Examining Associations among Variables 1.3 Recursive Partitioning 1.4 Introduction to Logistic.
Chapter-8 Chi-square test. Ⅰ The mathematical properties of chi-square distribution  Types of chi-square tests  Chi-square test  Chi-square distribution.
Analytical epidemiology Disease frequency Study design: cohorts & case control Choice of a reference group Biases Alain Moren, 2006 Impact Causality Effect.
Introduction to Inference: Confidence Intervals and Hypothesis Testing Presentation 8 First Part.
Introduction to Inference: Confidence Intervals and Hypothesis Testing Presentation 4 First Part.
Case Control Study : Analysis. Odds and Probability.
A short introduction to epidemiology Chapter 9: Data analysis Neil Pearce Centre for Public Health Research Massey University Wellington, New Zealand.
More Contingency Tables & Paired Categorical Data Lecture 8.
Confounding and effect modification Epidemiology 511 W. A. Kukull November
Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.1 Counts and Proportions.
THE CHI-SQUARE TEST BACKGROUND AND NEED OF THE TEST Data collected in the field of medicine is often qualitative. --- For example, the presence or absence.
Fall 2002Biostat Inference for two-way tables General R x C tables Tests of homogeneity of a factor across groups or independence of two factors.
Introdcution to Epidemiology for Medical Students Université Paris-Descartes Babak Khoshnood INSERM U1153, Equipe EPOPé (Dir. Pierre-Yves Ancel) Obstetric,
(www).
Measures of disease frequency Simon Thornley. Measures of Effect and Disease Frequency Aims – To define and describe the uses of common epidemiological.
Epidemiologic Measures of Association
Epidemiology 503 Confounding.
Hypothesis testing. Chi-square test
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
Saturday, August 06, 2016 Farrokh Alemi, PhD.
Hypothesis testing. Chi-square test
Association, correlation and regression in biomedical research
Kanguk Samsung Hospital, Sungkyunkwan University
Evaluating Effect Measure Modification
Confounders.
Case-control studies: statistics
Effect Modifiers.
Presentation transcript:

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.1 Contingency Tables

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.2 Contingency Tables  We are often interested in determining whether there is an association between two categorical variables.  Note that association does not necessarily imply causality.  In these cases, data may be represented in a two-dimensional table.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.3 Smoking Smoker Non- smoker Lung Cancer Yesac Nobd

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.4 Contingency Tables  The categorical variables can have more than two levels.  The variables may also be ordinal, however this requires more advanced methods. For now, we consider the case in which both variables are nominal.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.5 Chi-Square Test  A hypothesis test:  H 0 : no association  H A : association  Strategy: compare what is observed to what is expected if H 0 is true (i.e., no association).  If difference is large, then there is evidence of association  If difference is not large, then insufficient evidence to conclude an association

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.6 Chi-Square Test  Some limitations:  Does not describe the magnitude or the direction of the association  Relies on “large sample theory” (an assumption), which means that the test may be invalid if expected cell sizes are too small (<5). Thus avoid use under these conditions.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.7 Exact Tests  A hypothesis test:  H 0 : no association  H A : association  Does not rely on the assumption of large samples.  Computes “exact” probability of observing the data in the given study, if no association was present.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.8 Exact Tests  Computationally intensive (particularly for large datasets)  For this reason, it is historically been used as a back-up for the chi-square test when samples were small.  However, given the power of today’s computers, I suggest using this as a primary analysis strategy (instead of a chi-square test) whenever possible.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.9 Exact Tests  Definitely use with small expected cell sizes.  Does not describe the magnitude or the direction of the association  Often called “Fishers exact test” for 2X2 tables. For more general dimensions, it is simply called an “exact test”.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.10 Exact Tests  Insert exact.pdf  Insert Tables_892.pdf  Insert racial_profiling.pdf  Insert contingency_tables.doc

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.11 Odds Ratio  A measure of association indicating magnitude and direction  Commonly used in epidemiology  Ranges from 0 to   Approximates how much more likely (or unlikely) it is for the outcome to be present among those with “exposure” than those without exposure.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.12 Odds Ratio (OR)  OR estimate has a skewed distribution (although it is normal for large samples in theory). However, ln(OR estimate) is normal (note that this is the natural logarithm). Thus we base testing and CIs using this.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.13 OR: Interpretation Example  If y denotes the presence (1) or absence (0) of lung cancer and x denotes whether the person is a smoker (1=smoker, 0=non-smoker), then an estimated odds ratio of 2 implies that lung cancer is twice as likely to occur among smokers than among nonsmokers in the study population.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.14 OR: Interpretation Example  If y denotes the presence (1) or absence (0) of heart disease and x denotes whether the person engages in regular strenuous physical exercise (1= exercise, 0= no exercise), then an estimated odds ratio of 0.5 implies that heart disease is half as likely to occur among those who exercise than those who do not exercise.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.15 OR  Insert OR_examples.pdf

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.16 OR  Useful regardless of how data were collected.  OR~RR when disease is rare  RR: Relative Risk or Risk Ratio  Ratio of the risk of developing a disease if exposed relative to the risk of developing a disease if unexposed

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.17 Sets of Contingency Tables  We may have a scenario, whereby we have a contingency table for each level (stratum) of a third (potentially confounding) factor.  Example: we develop a contingency table to examine the association between coffee consumption and myocardial infarction (MI). We gather these data for both smokers and non-smokers as smoking status may confound our results.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.18 Heterogeneous ORs?  We must first determine if the OR is homogeneous across stratum. This can be done with a hypothesis test  H 0 : OR is homogeneous across strata  H A : OR is heterogeneous across strata  This is equivalent to testing for a statistical “interaction”

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.19 Heterogeneous ORs?  If OR are heterogeneous across strata then:  There is an interaction between the third variable and the association  We need to perform a separate analyses by subgroup.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.20 Interaction and Confounding  Interaction (effect-modification): there is an interaction between x and y when the effect of y on z depend upon the level of x.  Example: if the risk of smoking on developing lung cancer differs between males and females, then there is an interaction between smoking and gender.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.21 Interaction and Confounding  Confounding occurs when the effect of variable x on z is distorted when we fail to control for variable y.  We say that y is a confounder for the effect of x on z.  This is different from interaction

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.22 Mantel-Haenszel (MH) Methods  If OR are not heterogeneous (i.e., homogeneous) across strata then we may use MH methods.  MH OR: measure of association  Controls for the potentially confounding effect of a third variable.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.23 Mantel-Haenszel (MH) Methods  MH OR  Adjusted OR (for confounders)  A weighted average of the individual ORs  Considered a “stratified” analyses.

Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.24 Mantel-Haenszel (MH) Methods  MH methods control for confounding  MH methods are only appropriate when there is no interaction.  We can perform hypothesis tests for the MH OR.  We may also obtain a respective CI for the MH OR.