Analysis of matched data; plus, diagnostic testing.

Presentation transcript:


Correlated Observations
Correlated data arise when pairs or clusters of observations are related and are thus more similar to each other than to other observations in the dataset.
Ignoring correlations will:
– overestimate p-values for within-person or within-cluster comparisons
– underestimate p-values for between-person or between-cluster comparisons

Pair Matching: Why match? Pairing can control for extraneous sources of variability and increase the power of a statistical test. Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.

Example
Johnson and Johnson (NEJM 287: , 1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s. They presented the data as a 2x2 table:

[Table: rows = Hodgkin’s, Sib control; columns = Tonsillectomy, None]

OR=1.47; chi-square=1.53 (NS)
From John A. Rice, “Mathematical Statistics and Data Analysis.”

Example
But several letters to the editor pointed out that the investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to cross-classify the 85 pairs by the case’s exposure and the control’s exposure:

[Table: Case (Tonsillectomy, None) by Control (Tonsillectomy, None)]

OR=2.14*; chi-square=2.91 (p=.09)
From John A. Rice, “Mathematical Statistics and Data Analysis.”

Pair Matching: example
Match each MI case to an MI-free control based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.

Pair Matching: example
Which cells are informative? Just the discordant cells are informative!

                          MI controls
                          Diabetes   No diabetes
MI cases   Diabetes           9          37
           No diabetes       16          82

Pair Matching
OR estimate comes only from discordant pairs! The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs “favor” the case, this indicates OR>1.

P(“favors” case | discordant pair) = $\frac{37}{37+16} = \frac{37}{53} = .70$

odds(“favors” case | discordant pair) = $\frac{37/53}{16/53} = \frac{37}{16}$

OR estimate comes only from discordant pairs! OR = 37/16 = 2.31. Makes sense!
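The arithmetic on the last few slides can be checked with a short sketch. The cell counts (9, 37, 16, 82) are the ones shown in the slides; the variable names are illustrative only.

```python
# Matched-pair odds ratio from the discordant cells of the MI/diabetes table.
both_exposed = 9      # case and control both have diabetes (concordant)
case_only = 37        # case has diabetes, control does not (discordant)
control_only = 16     # control has diabetes, case does not (discordant)
neither = 82          # neither has diabetes (concordant)

# Only the discordant pairs carry information about the odds ratio.
discordant = case_only + control_only
p_favors_case = case_only / discordant
odds_favors_case = p_favors_case / (1 - p_favors_case)
matched_or = case_only / control_only

print(f"P(favors case | discordant) = {p_favors_case:.2f}")
print(f"odds = {odds_favors_case:.2f}, OR = {matched_or:.2f}")
```

The concordant counts are defined but never used, which is the point of the slide: they drop out of the estimate.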

McNemar’s Test
Null hypothesis: P(“favors” case | discordant pair) = .5 (note: equivalent to OR=1.0 or cell b = cell c)
By normal approximation to binomial:
$Z = \frac{37 - 53(.5)}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88$

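McNemar’s Test: generally

                    controls
                    exp    No exp
cases   exp          a       b
        No exp       c       d

By normal approximation to binomial:
$Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}} = \frac{b-c}{\sqrt{b+c}}$
Equivalently:
$\chi^2_1 = \frac{(b-c)^2}{b+c}$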

McNemar’s Test:
$\chi^2_1 = \frac{(37-16)^2}{37+16} = \frac{441}{53} = 8.32$
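As a rough check of the test statistic, here is a minimal sketch of McNemar’s test using the normal approximation. The function name `mcnemar` is hypothetical, and the two-sided p-value is taken from the standard normal tail via `math.erfc`.

```python
import math

def mcnemar(b, c):
    """Return (z, chi_square, two_sided_p) from the discordant counts b and c."""
    z = (b - c) / math.sqrt(b + c)
    chi_square = z ** 2                          # 1 degree of freedom
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return z, chi_square, p_value

# Discordant cells from the MI/diabetes example: 37 favor the case, 16 the control.
z, chi_square, p_value = mcnemar(37, 16)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

No continuity correction is applied here, matching the uncorrected form of the statistic.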

Example: McNemar’s EXACT test
Split-face trial:
– Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during mid-day. The outcome is sunburn (yes/no).
– Unit of observation = side of a face
– Are the observations correlated? Yes.
Russak JE et al. JAAD 2010; 62:

Results ignoring correlation:
Table I -- Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03; Fisher’s exact test)

Sun protection factor   Sunburned   Not sunburned
85                          1            55
50                          8            48

Fisher’s exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice!

Correct analysis of data:
Table 1. Correct presentation of the data (P = .016; McNemar’s exact test).

                          SPF-50 side
SPF-85 side       Sunburned   Not sunburned
Sunburned             1             0
Not sunburned         7            48

McNemar’s exact test: Null hypothesis: X~binomial (n=7, p=.5)
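The exact p-value can be reproduced from the binomial null hypothesis. This is a minimal sketch, assuming a two-sided test computed as twice the smaller tail (capped at 1); the function name `mcnemar_exact` is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, the smaller discordant count follows
    binomial(n = b + c, p = .5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Split-face trial: 7 subjects burned only on the SPF-50 side, 0 only on the SPF-85 side.
p_value = mcnemar_exact(7, 0)
print(f"p = {p_value:.3f}")  # 2 * (1/2)^7 = 0.015625, which rounds to 0.016
```

Only the 7 discordant faces enter the calculation; the 49 concordant subjects contribute nothing.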

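RECALL: 95% confidence interval for a difference in INDEPENDENT proportions
Standard error of the difference of two proportions:
$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Standard error can be estimated by:
$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
95% confidence interval for the difference between two proportions:
$(\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$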

95% CI for difference in dependent proportions
Variance of the difference of two random variables is the sum of their variances minus 2*covariance:
$Var(X - Y) = Var(X) + Var(Y) - 2\,Cov(X, Y)$

95% CI for difference in dependent proportions

                          MI controls
                          Diabetes   No diabetes
MI cases   Diabetes           9          37
           No diabetes       16          82
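The slide’s own formula did not survive in this transcript, so the following is only a sketch under an assumption: it uses the standard large-sample (Wald) interval for a difference in paired proportions, where the difference is (b − c)/n and its estimated variance is [(b + c) − (b − c)²/n]/n². The function name `paired_diff_ci` is hypothetical.

```python
import math

def paired_diff_ci(a, b, c, d, z=1.96):
    """Wald CI for the difference in dependent proportions.

    a, d are the concordant cells and b, c the discordant cells of the
    pair-matched 2x2 table; n is the number of pairs.
    """
    n = a + b + c + d
    diff = (b - c) / n                               # p(case exposed) - p(control exposed)
    se = math.sqrt((b + c) - (b - c) ** 2 / n) / n   # accounts for the covariance term
    return diff, se, (diff - z * se, diff + z * se)

# MI/diabetes example: 46/144 cases vs. 25/144 controls have diabetes.
diff, se, (lower, upper) = paired_diff_ci(9, 37, 16, 82)
print(f"difference = {diff:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

As with the odds ratio, the point estimate depends only on the discordant cells; the concordant cells enter only through the number of pairs.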

The connection between McNemar and Cochran-Mantel-Haenszel Tests

View each pair as its own “age-gender” stratum
Example: Concordant for exposure (cell “a” from before)

               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

               Case (MI)   Control
Diabetes           1          1        x 9
No diabetes        0          0

               Case (MI)   Control
Diabetes           1          0        x 37
No diabetes        0          1

               Case (MI)   Control
Diabetes           0          1        x 16
No diabetes        1          0

               Case (MI)   Control
Diabetes           0          0        x 82
No diabetes        1          1

Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI controlling for age and gender (the matching variables). Mantel-Haenszel methods apply.

RECALL: The Mantel-Haenszel Summary Odds Ratio

               Case   Control
Exposed          a       b
Not Exposed      c       d

$OR_{MH} = \frac{\sum_k a_k d_k / T_k}{\sum_k b_k c_k / T_k}$

Stratum (Case (MI), Control)      ad/T    bc/T    Number of strata
Diabetes, Diabetes                  0       0         x 9
Diabetes, No diabetes              1/2      0         x 37
No diabetes, Diabetes               0      1/2        x 16
No diabetes, No diabetes            0       0         x 82

Mantel-Haenszel Summary OR
$OR_{MH} = \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16} = 2.31$

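Mantel-Haenszel Test Statistic (same as McNemar’s)
$\chi^2_1 = \frac{\left[\sum_k (a_k - E(a_k))\right]^2}{\sum_k Var(a_k)} = \frac{[37(.5) - 16(.5)]^2}{53(.25)} = \frac{(10.5)^2}{13.25} = 8.32$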

Concordant cells contribute nothing to the Mantel-Haenszel statistic (observed=expected).

Discordant cells: each discordant pair contributes +.5 or -.5 to the numerator (observed-expected) and .25 to the denominator (variance).
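The equivalence between the two tests can be verified numerically. The sketch below treats each of the 144 matched pairs as its own 2x2 stratum and applies the usual Mantel-Haenszel formulas (expected value and hypergeometric variance of cell a in each stratum). The helper `stratum` and the variable names are illustrative, not taken from the slides.

```python
def stratum(case_exposed, control_exposed):
    """2x2 table for one matched pair: rows = exposed/unexposed, cols = case/control."""
    a = case_exposed          # exposed case
    b = control_exposed       # exposed control
    c = 1 - case_exposed      # unexposed case
    d = 1 - control_exposed   # unexposed control
    return a, b, c, d

# 9 concordant-exposed, 37 case-exposed, 16 control-exposed, 82 concordant-unexposed pairs.
strata = ([stratum(1, 1)] * 9 + [stratum(1, 0)] * 37
          + [stratum(0, 1)] * 16 + [stratum(0, 0)] * 82)

num_or = den_or = 0.0
obs_minus_exp = variance = 0.0
for a, b, c, d in strata:
    t = a + b + c + d                    # always 2 for a matched pair
    num_or += a * d / t
    den_or += b * c / t
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    obs_minus_exp += a - row1 * col1 / t                         # observed - expected
    variance += row1 * row2 * col1 * col2 / (t ** 2 * (t - 1))   # 0 for concordant pairs

mh_or = num_or / den_or
mh_chi_square = obs_minus_exp ** 2 / variance
print(f"MH OR = {mh_or:.2f}, MH chi-square = {mh_chi_square:.2f}")
```

The concordant strata add zero to every sum, so the result reduces to b/c for the odds ratio and (b − c)²/(b + c) for the test statistic.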

Example: Salmonella Outbreak in France, 1996
From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study” BMJ 312: ; Jan

Epidemic Curve

Matched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for:
– age group ( = 65 years)
– gender
– city of residence

Results

In 2x2 table form: any goat’s cheese
[Table: Cases (Goat’s cheese, None) by Controls (Goat’s cheese, None); totals 29 and 30]

In 2x2 table form: Brand A Goat’s cheese

                               Controls
                               Goat’s cheese B   None
Cases   Goat’s cheese B              8            24
        None                         2            25
Total                               10            49

Stratum (Case, Control)      Number of strata
Brand A, Brand A                  x 8
Brand A, None                     x 24
None, Brand A                     x 2
None, None                        x 25

Summary: 8 concordant-exposed pairs (=strata) contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).
Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).
Using Agresti notation here!

Summary: 2 discordant “control-exposed” pairs contribute -.5 each to the numerator (observed-expected = -.5) and .25 each to the denominator (variance = .25).
Summary: 24 discordant “case-exposed” pairs contribute +.5 each to the numerator (observed-expected = +.5) and .25 each to the denominator (variance = .25).
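Putting the four summaries together gives the test statistic for the Brand A comparison. This sketch simply adds up the per-pair contributions stated above; the resulting values are computed here and are not quoted from the slides.

```python
case_exposed_pairs = 24      # case ate Brand A, control did not: +.5 each
control_exposed_pairs = 2    # control ate Brand A, case did not: -.5 each
# The 8 + 25 concordant pairs contribute 0 to both sums and are omitted.

numerator = 0.5 * case_exposed_pairs - 0.5 * control_exposed_pairs    # sum of observed - expected
denominator = 0.25 * (case_exposed_pairs + control_exposed_pairs)     # sum of variances
chi_square = numerator ** 2 / denominator
odds_ratio = case_exposed_pairs / control_exposed_pairs

print(f"numerator = {numerator}, denominator = {denominator}")
print(f"chi-square = {chi_square:.2f}, OR = {odds_ratio:.1f}")
```

The same number falls out of McNemar’s formula, (24 − 2)²/(24 + 2).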

Diagnostic Testing and Screening Tests

Characteristics of a diagnostic test
Sensitivity = Probability that, if you truly have the disease, the diagnostic test will catch it.
Specificity = Probability that, if you truly do not have the disease, the test will register negative.

Calculating sensitivity and specificity from a 2x2 table

                            Screening Test
                            +      -      Total
Truly have disease   +      a      b      a+b
                     -      c      d      c+d

Sensitivity = a/(a+b): Among those with true disease, how many test positive?
Specificity = d/(c+d): Among those without the disease, how many test negative?

Hypothetical Example

                                  Mammography
                                  +       -      Total
Breast cancer (on biopsy)   +     9       1       10
                            -   109     881      990

Sensitivity = 9/10 = .90
Specificity = 881/990 = .89
1 false negative out of 10 cases
109 false positives out of 990
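The two ratios can be computed directly from the counts in the hypothetical example (9 true positives, 1 false negative, 109 false positives, 881 true negatives). The variable names in this sketch are illustrative.

```python
true_positives = 9       # cancer on biopsy, positive mammogram
false_negatives = 1      # cancer on biopsy, negative mammogram
false_positives = 109    # no cancer, positive mammogram
true_negatives = 881     # no cancer, negative mammogram

# Both measures condition on true disease status (the rows of the table).
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```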

What factors determine the effectiveness of screening?
The prevalence (risk) of disease.
The effectiveness of screening in preventing illness or death.
– Is the test any good at detecting disease/precursor (sensitivity of the test)?
– Is the test detecting a clinically relevant condition?
– Is there anything we can do if disease (or pre-disease) is detected (cures, treatments)?
– Does detecting and treating disease at an earlier stage really result in a better outcome?
The risks of screening, such as false positives and radiation.

Positive predictive value The probability that if you test positive for the disease, you actually have the disease. Depends on the characteristics of the test (sensitivity, specificity) and the prevalence of disease.

Example: Mammography
Mammography utilizes ionizing radiation to image breast tissue. The examination is performed by compressing the breast firmly between a plastic plate and an x-ray cassette that contains special x-ray film. Mammography can identify breast cancers too small to detect on physical examination. Early detection and treatment of breast cancer (before metastasis) can improve a woman’s chances of survival. Studies show that, among year-old women, screening results in 20-35% reductions in mortality from breast cancer.

Mammography
Controversy exists over the efficacy of mammography in reducing mortality from breast cancer in year old women. Mammography has a high rate of false positive tests that cause anxiety and necessitate further costly diagnostic procedures. Mammography exposes a woman to some radiation, which may slightly increase the risk of mutations in breast tissue.

Example
A 60-year-old woman has an abnormal mammogram; what is the chance that she has breast cancer? I.e., what is the positive predictive value?

Calculating PPV and NPV from a 2x2 table

                            Screening Test
                            +      -
Truly have disease   +      a      b
                     -      c      d
Total                      a+c    b+d

PPV = a/(a+c): Among those who test positive, how many truly have the disease?
NPV = d/(b+d): Among those who test negative, how many truly do not have the disease?

Hypothetical Example

                                  Mammography
                                  +       -
Breast cancer (on biopsy)   +     9       1
                            -   109     881
Total                           118     882

Prevalence of disease = 10/1000 = 1%
PPV = 9/118 = 7.6%
NPV = 881/882 = 99.9%
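The predictive values use the same four counts as sensitivity and specificity, but condition on the test result instead of on true disease status. A minimal sketch, with illustrative variable names:

```python
true_positives = 9       # cancer on biopsy, positive mammogram
false_negatives = 1      # cancer on biopsy, negative mammogram
false_positives = 109    # no cancer, positive mammogram
true_negatives = 881     # no cancer, negative mammogram

total = true_positives + false_negatives + false_positives + true_negatives
prevalence = (true_positives + false_negatives) / total

# Both measures condition on the test result (the columns of the table).
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)

print(f"prevalence = {prevalence:.1%}, PPV = {ppv:.1%}, NPV = {npv:.1%}")
```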

What if disease was twice as prevalent in the population?

                                  Mammography
                                  +       -      Total
Breast cancer (on biopsy)   +    18       2       20
                            -   108     872      980

sensitivity = 18/20 = .90
specificity = 872/980 = .89
Sensitivity and specificity are characteristics of the test, so they don’t change!

What if disease was more prevalent?

                                  Mammography
                                  +       -
Breast cancer (on biopsy)   +    18       2
                            -   108     872
Total                           126     874

Prevalence of disease = 20/1000 = 2%
PPV = 18/126 = 14.3%
NPV = 872/874 = 99.8%
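The dependence of PPV on prevalence follows from Bayes’ rule. The sketch below holds sensitivity and specificity fixed at the rounded values from the slides (.90 and .89) and varies prevalence; the 10% prevalence is a hypothetical extra value added for illustration, and the function name `predictive_values` is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

results = {}
for prevalence in (0.01, 0.02, 0.10):   # 0.10 is a hypothetical value, not from the slides
    results[prevalence] = predictive_values(0.90, 0.89, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence = {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Because the specificity is rounded to .89, the values at 1% and 2% prevalence agree with the table-based results (7.6% and 14.3%) only to about one decimal place.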

Conclusions
Positive predictive value increases with increasing prevalence of disease, or if you change the diagnostic test to improve its accuracy.