Evidence Based Diagnosis Mark J. Pletcher, MD MPH 6/28/2012 Combining Tests.


Acknowledgements For this lecture I’ve adapted a slide set from Mike Kohn

Combining Tests: Overview
- A case with 2 simple tests
- Test non-independence
- Approaches to combining tests: looking at all possible combinations of results; recursive partitioning, logistic regression, other
- Overfitting and validation in multitest panels

Combining Tests – A Case
A pregnant woman getting prenatal care is worried about Down's Syndrome (Trisomy 21). Chorionic villus sampling (CVS) is a definitive test, but there is a risk of miscarriage. Should she get this procedure?

Combining Tests – A Case Age helps… Risk goes up with age Our patient is 41, so pretest risk is ~2%...

Combining Tests – A Case
Ultrasound can help even more: it's harmless, and several features predict Trisomy 21 (Down's) at the 11-14 week scan*: nuchal translucency and nasal bone absence.
How do we use these two features together?
*Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3).

Combining Tests – A Case First, nuchal translucency (NT)

Wider translucent “gap” here is predictive of Down’s

Nuchal Translucency Data
Cross-sectional study: 5556 pregnant women undergoing CVS; 333 (6%) with a Trisomy 21 fetus. All had ultrasound at 11-14 weeks.

Dichotomize here for purposes of illustration

Nuchal Translucency Data

                        Trisomy 21
Nuchal Translucency     D+      D-
≥ 3.5 mm (+)            212     478
< 3.5 mm (-)            121     4745
Total                   333     5223

Sensitivity and Specificity? PPV and NPV?

Nuchal Translucency
Sensitivity = 212/333 = 64%
Specificity = 4745/5223 = 91%
and IF we assume that this cross-sectional sample represents our population of interest, then:
Prevalence = 333/(333 + 5223) = 6%
PPV = 212/(212 + 478) = 31%
NPV = 4745/(4745 + 121) = 97.5%

Nuchal Translucency Data

                        Trisomy 21
Nuchal Translucency     D+      D-      LR
≥ 3.5 mm (+)            212     478     7.0
< 3.5 mm (-)            121     4745    0.4
Total                   333     5223

LR+ = P(T+|D+)/P(T+|D-) = (212/333)/(478/5223) = 7.0
LR- = P(T-|D+)/P(T-|D-) = (121/333)/(4745/5223) = 0.4
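The test characteristics on this slide can be recomputed directly from the 2x2 counts; a minimal Python sketch (variable names are illustrative, not from the original study):

```python
# Sensitivity, specificity, and likelihood ratios from the 2x2 table.
# Counts from the nuchal translucency data (D+ = Trisomy 21).
tp, fn = 212, 121   # NT >= 3.5 mm and NT < 3.5 mm among D+
fp, tn = 478, 4745  # same, among D-

sens = tp / (tp + fn)        # P(T+ | D+)
spec = tn / (fp + tn)        # P(T- | D-)
lr_pos = sens / (1 - spec)   # P(T+|D+) / P(T+|D-)
lr_neg = (1 - sens) / spec   # P(T-|D+) / P(T-|D-)

print(f"Sens {sens:.0%}, Spec {spec:.0%}, LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```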

Back to the case… Let’s apply this data to our case, with pre-test probability of 2%

Post-test risk using NT only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS POSITIVE - LR+ = 7.0
Post-Test Odds = Pre-Test Odds x LR+ = 0.0204 x 7.0 = 0.143
Post-Test prob = 0.143/(1 + 0.143) = 12.5%

Post-test risk using NT only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS NEGATIVE - LR- = 0.4
Post-Test Odds = Pre-Test Odds x LR- = 0.0204 x 0.4 = 0.0082
Post-Test prob = 0.0082/(1 + 0.0082) = 0.8%
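The same odds arithmetic as a small Python helper (a sketch mirroring the two slides above):

```python
# Pre-test probability -> pre-test odds -> post-test odds -> probability.
def post_test_prob(pre_prob, lr):
    pre_odds = pre_prob / (1 - pre_prob)   # e.g. 0.02 -> 0.0204
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

p_if_positive = post_test_prob(0.02, 7.0)  # NT positive -> 12.5%
p_if_negative = post_test_prob(0.02, 0.4)  # NT negative -> ~0.8%
```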

Back to the case… Is 0.8% risk low enough to skip CVS? Is 12.5% risk high enough to risk CVS? OTHER ultrasound features are also predictive: nasal bone absence.

Nasal Bone Seen (NBA = "No"): Negative for Trisomy 21
Nasal Bone Absent (NBA = "Yes"): Positive for Trisomy 21

Nasal Bone Absence Test Data

Nasal Bone Absent   Tri21+   Tri21-   LR
Yes                 229      129      27.8
No                  104      5094     0.32
Total               333      5223

Post-test risk using NBA only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS POSITIVE - LR+ = 27.8
Post-Test Odds = Pre-Test Odds x LR+ = 0.0204 x 27.8 = 0.567
Post-Test prob = 0.567/(1 + 0.567) = 36%

Post-test risk using NBA only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS NEGATIVE - LR- = 0.32
Post-Test Odds = Pre-Test Odds x LR- = 0.0204 x 0.32 = 0.0065
Post-Test prob = 0.0065/(1 + 0.0065) = 0.6%

Back to the case… NBA is a bit better than NT, but still important uncertainty… Can we combine our NT results with NBA results and do even better? How do we combine test results?

Combining tests
Approach #1 – Assume independence: knowing the results of one test doesn't influence how you interpret the next test.
(We usually assume the LR is independent of pre-test probability; this is what we did when we used a pre-test risk of 2% instead of 6% in our calculations.)
If the tests are independent, we can just do the calculations sequentially.

Assuming test independence
First do NT; assume it's positive (LR = 7.0): pre-test risk 2% → post-test risk 12.5%
Then do NBA; assume it's also positive (LR = 27.8): pre-test risk 12.5% → post-test risk ~80%

Assuming test independence
What's the mathematical shortcut? LR(1) x LR(2) = LR(1&2)
7.0 x 27.8 = 195

Assuming test independence
What's the mathematical shortcut? LR(1) x LR(2) = LR(1&2)

NT    NBA   LR
Pos   Pos   7.0 x 27.8 = 195
Pos   Neg   7.0 x 0.32 = 2.2
Neg   Pos   0.4 x 27.8 = 11.1
Neg   Neg   0.4 x 0.32 = 0.13

Assuming test independence Slide rule approach (pre-test prob = 6%) Line arrows up without shrinkage

Combining tests Is it reasonable to assume independence? Does nasal bone absence tell you as much if you already know that the nuchal translucency is >3.5 mm? What can we do to figure this out?

Combining tests Approach #2 – evaluate all possible test result combinations

Joint eval of 4 test result combinations

NT     NBA   Trisomy 21+ (%)   Trisomy 21- (%)   LR
Pos    Pos   158 (47%)         36 (0.7%)         69
Pos    Neg   54 (16%)          442 (8.5%)        1.9
Neg    Pos   71 (21%)          93 (1.8%)         12
Neg    Neg   50 (15%)          4652 (89%)        0.2
Totals       333 (100%)        5223 (100%)

vs. the LRs we would expect if the tests were independent (e.g., 7.0 x 27.8 = 195 for Pos/Pos, not 69)…

Combining tests The Answer – the tests are NOT completely independent So we CANNOT just multiply LR’s What should we do in this case? Use LR’s from the combination table

Joint eval of 4 test result combinations

NT     NBA   Trisomy 21+ (%)   Trisomy 21- (%)   LR
Pos    Pos   158 (47%)         36 (0.7%)         69
Pos    Neg   54 (16%)          442 (8.5%)        1.9
Neg    Pos   71 (21%)          93 (1.8%)         12
Neg    Neg   50 (15%)          4652 (89%)        0.2
Totals       333 (100%)        5223 (100%)

Use these!
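The non-independence is easy to check numerically: compute each combination's joint LR from the table and compare it with the product of the single-test LRs. A Python sketch using the counts above:

```python
# Joint LRs from the 4-combination table vs. products of single-test LRs.
combos = {  # (NT result, NBA result): (count D+, count D-)
    ("+", "+"): (158, 36),
    ("+", "-"): (54, 442),
    ("-", "+"): (71, 93),
    ("-", "-"): (50, 4652),
}
n_dpos = sum(dp for dp, _ in combos.values())  # 333
n_dneg = sum(dn for _, dn in combos.values())  # 5223

# Single-test LRs computed from the table margins
lr_nt = {"+": (212 / 333) / (478 / 5223), "-": (121 / 333) / (4745 / 5223)}
lr_nba = {"+": (229 / 333) / (129 / 5223), "-": (104 / 333) / (5094 / 5223)}

joint, product = {}, {}
for key, (dp, dn) in combos.items():
    joint[key] = (dp / n_dpos) / (dn / n_dneg)   # LR of the combination
    product[key] = lr_nt[key[0]] * lr_nba[key[1]]  # what independence implies
    print(f"NT{key[0]} NBA{key[1]}: joint LR {joint[key]:.1f}, "
          f"product if independent {product[key]:.1f}")
```

For the double-positive combination the joint LR (about 69) is far below the independence product (about 195), which is exactly the slide's point.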

Create ROC Table (result categories sorted by decreasing LR; Sens and 1 - Spec are cumulative, starting from 0%)

NT    NBA   Tri21+   Cum. Sens   Tri21-   Cum. (1 - Spec)   LR
Pos   Pos   158      47%         36       0.7%              69
Neg   Pos   71       68%         93       3%                12
Pos   Neg   54       84%         442      11%               1.9
Neg   Neg   50       100%        4652     100%              0.2

AUROC = 0.896
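As a check, the AUROC can be recomputed from the four result categories by the trapezoidal rule (a sketch; counts from the combination table). It comes out near 0.90, close to the slide's 0.896, with small differences likely due to rounding or tie handling:

```python
# Trapezoidal AUROC over the four result categories,
# ordered by decreasing LR: NT+/NBA+, NT-/NBA+, NT+/NBA-, NT-/NBA-.
cells = [(158, 36), (71, 93), (54, 442), (50, 4652)]  # (D+, D-) counts
n_pos = sum(dp for dp, _ in cells)
n_neg = sum(dn for _, dn in cells)

points = [(0.0, 0.0)]  # (FPR, TPR) as the cutoff is relaxed
cum_p = cum_n = 0
for dp, dn in cells:
    cum_p += dp
    cum_n += dn
    points.append((cum_n / n_neg, cum_p / n_pos))

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUROC ~ {auc:.3f}")
```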

Optimal Cutoff Analysis

NT    NBA   LR    Post-Test Prob
Pos   Pos   69    81%
Neg   Pos   12    43%
Pos   Neg   1.9   11%
Neg   Neg   0.2   1%

If we assume: pre-test probability = 6%, threshold for CVS = 2%
Optimal algorithm is "any positive test → CVS"
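The cutoff analysis can be reproduced with the same odds arithmetic (a sketch; LRs from the table above, names illustrative):

```python
# Do CVS if the post-test probability exceeds the 2% threshold,
# starting from the 6% pre-test probability.
def post_test_prob(pre_prob, lr):
    pre_odds = pre_prob / (1 - pre_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lrs = {"NT+/NBA+": 69, "NT-/NBA+": 12, "NT+/NBA-": 1.9, "NT-/NBA-": 0.2}
do_cvs = {combo: post_test_prob(0.06, lr) > 0.02 for combo, lr in lrs.items()}
# Only the double-negative falls below threshold,
# i.e. "any positive test -> CVS".
```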

Non-independence What does non-independence mean?

Non-independence Slide rule approach (pre-test prob = 6%) The total arrow length is NOT equal to the sum of its parts!

Non-independence
Technical definition of independence (must condition on disease status):
In patients with disease, a false negative on Test 1 does not affect the probability of a false negative on Test 2: P(T2-|D+, T1-) = P(T2-|D+).
In patients without disease, a false positive on Test 1 does not affect the probability of a false positive on Test 2: P(T2+|D-, T1+) = P(T2+|D-).
If this stringent definition is not met, the tests are non-independent.

Non-independence
Reasons for non-independence? Tests measure the same aspect of disease.
Simple example: predicting pneumonia
- Cyanosis: LR = 5
- O2 sat 85%-90%: LR = 6
Can't just multiply these LRs, because they really just reflect the same physiologic state!

Non-independence Reasons for non-independence? Tests measure the same aspect of disease. In our example: One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly AND the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.

Non-independence
Other reasons for non-independence?
- Disease is heterogeneous: in severe pneumonia, all tests (e.g., O2 sat and respiratory rate) tend to be abnormal, so each individual test tells you less.
- Non-disease is heterogeneous: in patients with cough but no pneumonia, abnormal tests may still track together. O2 sat and respiratory rate are both abnormal with PE, and both normal with viral URI.
See EBD page 158.

Back to the case… Remember that we actually simplified the case: Nuchal translucency is really a continuous test. How do we take into account actual continuous NT measurement and NBA (and age, race, fetal crown-rump length, etc)?

Back to the case… Can’t do combination table for all possible combinations! 2 dichotomous tests = 4 combinations 4 dichotomous tests = 16 combinations 3 3-level tests = 27 combinations How do we deal with continuous tests?

Combining tests
Approach #3: Recursive partitioning
- Repeatedly split the data to find the optimal testing/decision algorithm
- "Prune" the tree

Combining tests Approach #3: Recursive partitioning

Combining tests Approach #3: Recursive partitioning Non-optimal test ordering

Combining tests Approach #3: Recursive partitioning You might do nasal bone test first, then “prune”

Combining tests Approach #3: Recursive partitioning Final algorithm: do the nasal bone exam first; if the nasal bone is absent, stop and do CVS…

Combining tests Approach #3: Recursive partitioning Sophisticated statistical algorithms optimize cutpoints

Combining tests Approach #3: Recursive partitioning For classic example, see Figure 8.7: Chest pain workup algorithm (Goldman et al)

Combining tests Approach #3: Recursive partitioning BUT: Still requires dichotomizing at cutpoints
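The slides don't specify the splitting criterion; information gain (entropy reduction) is one standard choice. A minimal sketch, using the counts from the combination table, shows why a tree would split on the nasal bone exam first:

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_entropy(groups):
    """Average entropy remaining after a split; groups = [(n D+, n D-), ...]."""
    total = sum(dp + dn for dp, dn in groups)
    return sum((dp + dn) / total * entropy(dp / (dp + dn))
               for dp, dn in groups)

# Split on nasal bone absence (yes/no) vs. nuchal translucency (>= 3.5 mm)
split_nba = weighted_entropy([(229, 129), (104, 5094)])
split_nt = weighted_entropy([(212, 478), (121, 4745)])
# The NBA split leaves less uncertainty, so it is chosen first,
# consistent with the pruned tree above (nasal bone exam first).
```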

Combining tests
Approach #4: Logistic regression
- Uses a statistical model to combine test results and predict disease
- Designed to account for non-independence
- Handles continuous test results
- Can produce a "score": a single integrated continuous test result, which can be subjected to ROC curves, C-statistics, and other standard continuous-test analyses
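A sketch of the idea on the two dichotomized tests, fitting the model by plain gradient ascent on the grouped counts (illustrative only; a real analysis would use a statistics package and the continuous measurements):

```python
import math

# (NT, NBA, count D+, count D-) for the four result combinations
groups = [(1, 1, 158, 36), (1, 0, 54, 442), (0, 1, 71, 93), (0, 0, 50, 4652)]
N = sum(dp + dn for _, _, dp, dn in groups)

# Fit logit P(D+) = b0 + b_nt*NT + b_nba*NBA by gradient ascent
b0 = b_nt = b_nba = 0.0
step = 1.0
for _ in range(50000):
    g0 = g1 = g2 = 0.0
    for nt, nba, dp, dn in groups:
        p = 1 / (1 + math.exp(-(b0 + b_nt * nt + b_nba * nba)))
        resid = dp - (dp + dn) * p  # observed minus expected D+ in this cell
        g0 += resid
        g1 += resid * nt
        g2 += resid * nba
    b0 += step * g0 / N
    b_nt += step * g1 / N
    b_nba += step * g2 / N

def risk_score(nt, nba):
    """Predicted probability of Trisomy 21 given the two test results."""
    return 1 / (1 + math.exp(-(b0 + b_nt * nt + b_nba * nba)))
```

The fitted score orders the four result combinations the same way the combination-table LRs do (both tests positive highest, both negative lowest), without assuming the tests are independent.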

Combining tests Approach #4: Logistic regression For classic example, see Table 8.5: Predicting death in patients with pneumonia – The PORT score

Combining tests Approach #5: Other fancy algorithms Neural networks Random forests Boosting Etc.

Combining tests The Major Pitfall - Overfitting What happens when you throw more variables into a model? Will the model perform better?

Combining tests The Major Pitfall What happens when you throw more variables into a model? Will the model perform better? YES, in the “derivation” set (even random noise will look good!) NO, when you try to apply in the real world!

Combining tests
The more complex your test algorithm, the more important it is to VALIDATE:
- Split your sample into a "derivation set" and a "test set"
- 10-fold cross-validation, etc.
- Validate in an EXTERNAL sample
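The overfitting point can be illustrated with simulated pure-noise "tests" (hypothetical data, unrelated to the studies in these slides): pick the noise variable that looks best in a derivation set, then check it in a held-out set.

```python
import random

rng = random.Random(42)
n, n_features = 200, 200
disease = [rng.random() < 0.3 for _ in range(n)]        # ~30% prevalence
noise = [[rng.gauss(0, 1) for _ in range(n)]            # pure-noise "tests"
         for _ in range(n_features)]

def auroc(scores, labels):
    """AUROC as the probability a diseased case outscores a non-diseased one."""
    pos = [s for s, d in zip(scores, labels) if d]
    neg = [s for s, d in zip(scores, labels) if not d]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

half = n // 2
# "Derive": pick the best-looking noise feature in the first half
best = max(range(n_features),
           key=lambda j: auroc(noise[j][:half], disease[:half]))
train_auc = auroc(noise[best][:half], disease[:half])   # looks impressive
test_auc = auroc(noise[best][half:], disease[half:])    # hovers near 0.5
```

Even random noise looks good in the derivation set once you search over enough candidates; the held-out AUROC reveals it is useless.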

Example 1 - predicting CAC with multiple risk factors Should we do a heart scan for atherosclerosis? Can we predict with clinical characteristics who has atherosclerosis without doing a heart scan?

Example 1 - predicting CAC with multiple risk factors

Models compared on naïve* vs. cross-validated AUC-ROC:
1. Age + sex + race
2. "" + standard CHD RFs
3. "" + all possible race-sex interactions

The last model is the most complex and has the highest "naïve" AUC-ROC, but NOT the highest cross-validated AUC-ROC, because it is "over-fit".

* "Naïve" AUC-ROC refers to the AUC-ROC that you get when you estimate it within the same dataset from which the test algorithm was derived.

Example 2 - predicting CAC with a proteomics “signal” Proteomic analysis is an extreme example of combining test results: hundreds to thousands of signal peak heights, many just noise

Example 2: proteomics-CAC Proteomics algorithm looks great in the derivation set!

Example 2: proteomics-CAC But cross-validation shows that it was all just useless noise (AUC-ROC ~0.5)

VALIDATION No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule.

Combining Tests: Take-home points
- Test non-independence is the rule, not the exception, so you usually CAN'T just multiply LRs together
- In simple cases, look at LRs for all possible test result combinations
- Fancier methods are often used, but look for validation analyses, especially when there are LOTS of tests being combined.