Relationship between performance measures: From statistical evaluations to decision-analysis Ewout Steyerberg Dept of Public Health, Erasmus MC, Rotterdam, the Netherlands

Presentation transcript:

Relationship between performance measures: From statistical evaluations to decision-analysis Ewout Steyerberg Dept of Public Health, Erasmus MC, Rotterdam, the Netherlands Chicago, October 23, 2011

General issues
- Usefulness / clinical utility: what do we mean exactly?
  - Evaluation of predictions
  - Evaluation of decisions
- Adding a marker to a model
  - Statistical significance? Testing β enough (no need to test increase in R², AUC, IDI, …)
  - Clinical relevance: measurement worth the costs? (patient and physician burden, financial costs)

Overview
- Case study: residual masses in testicular cancer
  - Model development
  - Evaluation approach
- Performance evaluation
  - Statistical
    - Overall
    - Calibration and discrimination
  - Decision-analytic
    - Utility-weighted measures

Prediction approach
- Outcome: malignant or benign tissue
- Predictors:
  - primary histology
  - 3 tumor markers
  - tumor size (postchemotherapy, and reduction)
- Model:
  - logistic regression
  - 544 patients, 299 malignant tissue
  - Internal validation by bootstrapping
- External validation in 273 patients, 197 malignant tissue

Logistic regression results

Evaluation approach: graphical assessment

Lessons
1. Plot observed versus expected outcome with the distribution of predictions by outcome ('validation graph')
2. Performance should be assessed in validation sets, since apparent performance is optimistic (model developed in the same data set as used for evaluation)
   - Preferably external validation
   - At least internal validation, e.g. by bootstrap cross-validation

Performance evaluation
- Statistical criteria: predictions close to observed outcomes?
  - Overall: consider residuals y − ŷ, or y − p
  - Discrimination: separate low risk from high risk
  - Calibration: e.g. 70% predicted = 70% observed
- Clinical usefulness: better decision-making?
  - One cut-off, defined by expected utility / relative weight of errors
  - Consecutive cut-offs: decision curve analysis

Predictions close to observed outcomes? Penalty functions
- Logarithmic score: (1 − Y) · log(1 − p) + Y · log(p)
- Quadratic score: Y · (1 − p)² + (1 − Y) · p²

Overall performance measures
- R²: explained variation
  - Logistic / Cox model: Nagelkerke's R²
- Brier score: Y · (1 − p)² + (1 − Y) · p²
  - Brier_scaled = 1 − Brier / Brier_max
  - Brier_max = mean(p) × (1 − mean(p))² + (1 − mean(p)) × mean(p)²
  - Brier_scaled is very similar to Pearson R² for binary outcomes
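
A minimal sketch (not from the original slides) of how these overall measures can be computed from observed outcomes y and predicted risks p:

```python
import numpy as np

def brier_scores(y, p):
    """Brier score and scaled Brier score for binary outcomes y (0/1) and predicted risks p."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    brier = np.mean(y * (1 - p) ** 2 + (1 - y) * p ** 2)
    # Brier_max is based on the mean predicted risk, as in the slide's formula
    m = p.mean()
    brier_max = m * (1 - m) ** 2 + (1 - m) * m ** 2
    brier_scaled = 1 - brier / brier_max
    return brier, brier_scaled
```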

Overall performance in case study

Measures for discrimination
- Concordance statistic, or area under the ROC curve
- Discrimination slope
- Lorenz curve
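
A short sketch (illustrative code, not from the slides) of the first two measures, assuming arrays of outcomes y and predicted risks p:

```python
import numpy as np

def c_statistic(y, p):
    """Concordance statistic (area under the ROC curve) by all-pairs comparison."""
    y = np.asarray(y); p = np.asarray(p)
    cases, controls = p[y == 1], p[y == 0]
    # Proportion of case/control pairs in which the case has the higher predicted risk (ties count 1/2)
    diff = cases[:, None] - controls[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def discrimination_slope(y, p):
    """Difference in mean predicted risk between events and non-events."""
    y = np.asarray(y); p = np.asarray(p)
    return p[y == 1].mean() - p[y == 0].mean()
```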

ROC curves for case study

Box plots with discrimination slope for case study

Lorenz concentration curves: general pattern

Lorenz concentration curves: case study

Discriminative ability of testicular cancer model

Characteristics of measures for discrimination

Measures for calibration
- Graphical assessments
- Cox recalibration framework (1958)
- Tests for miscalibration
  - Cox; Hosmer-Lemeshow; Goeman–Le Cessie
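
A hedged sketch of the recalibration idea (illustrative statsmodels code, not from the slides): the calibration slope is the coefficient of the linear predictor logit(p), and calibration-in-the-large is the intercept when that linear predictor is used as an offset.

```python
import numpy as np
import statsmodels.api as sm

def calibration_measures(y, p):
    """Calibration-in-the-large and calibration slope for outcomes y (0/1) and risks p in (0, 1)."""
    y = np.asarray(y, dtype=float)
    lp = np.log(p / (1 - p))  # linear predictor: logit of the predicted risk

    # Calibration slope: regress the outcome on the linear predictor; slope ~1 indicates good calibration
    slope_fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
    cal_slope = slope_fit.params[1]

    # Calibration-in-the-large: intercept with the linear predictor as offset; ~0 indicates good calibration
    citl_fit = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial(), offset=lp).fit()
    cal_in_the_large = citl_fit.params[0]
    return cal_in_the_large, cal_slope
```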

Calibration: general principle

Calibration: case study

Calibration tests

Hosmer-Lemeshow test for testicular cancer model

Some calibration and goodness-of-fit tests

Lessons
1. Visual inspection of calibration is important at external validation, combined with tests for calibration-in-the-large and the calibration slope

Clinical usefulness: making decisions
- Diagnostic work-up
  - Test ordering
  - Starting treatment
- Therapeutic decision-making
  - Surgery
  - Intensity of treatment

Decision curve analysis Andrew Vickers Departments of Epidemiology and Biostatistics Memorial Sloan-Kettering Cancer Center

How to evaluate predictions? Prediction models are wonderful!

How to evaluate predictions? Prediction models are wonderful! How do you know that they do more good than harm?

Overview of talk
- Traditional statistical and decision analytic methods for evaluating predictions
- Theory of decision curve analysis

Illustrative example
- Men with raised PSA are referred for prostate biopsy
- In the USA, ~25% of men with raised PSA have a positive biopsy
- ~750,000 unnecessary biopsies / year in the US
- Could a new molecular marker help predict prostate cancer?

Molecular markers for prostate cancer detection
- Assess a marker in men undergoing prostate biopsy for elevated PSA
- Create "base" model:
  - Logistic regression: biopsy result as dependent variable; PSA, free PSA, age as predictors
- Create "marker" model:
  - Add marker(s) as predictor to the base model
- Compare "base" and "marker" model

How to evaluate models?
- Biostatistical approach (ROC'ers)
  - P values
  - Accuracy (area under the curve: AUC)
- Decision analytic approach (VOI'ers)
  - Decision tree
  - Preferences / outcomes

PSA velocity
- P value for PSAv in the multivariable model: < 0.001
- PSAv an "independent" predictor
- AUC: base model = …, marker model = 0.626

AUCs and p values
I have no idea whether to use the model or not:
- Is an AUC of … high enough?
- Is an increase in AUC of … enough to make measuring velocity worth it?

Decision analysis
- Identify every possible decision
- Identify every possible consequence
  - Identify the probability of each
  - Identify the value of each

[Decision tree: "apply model" (biopsy if predicted risk is high, otherwise no biopsy) vs. biopsy all vs. no biopsy. Outcomes: a = biopsy & cancer, b = biopsy & no cancer, c = no biopsy & cancer, d = no biopsy & no cancer; branch probabilities p1, p2, p3, with disease prevalence p1 + p3.]

Optimal decision
- Use model: p1·a + p2·b + p3·c + (1 − p1 − p2 − p3)·d
- Treat all: (p1 + p3)·a + (1 − (p1 + p3))·b
- Treat none: (p1 + p3)·c + (1 − (p1 + p3))·d
Which gives the highest value?
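
In code, the three expected values on this slide are (a sketch using the slide's notation, where a–d are the values of the four outcomes):

```python
def expected_values(p1, p2, p3, a, b, c, d):
    """Expected value of each strategy; a = biopsy & cancer, b = biopsy & no cancer,
    c = no biopsy & cancer, d = no biopsy & no cancer."""
    use_model  = p1 * a + p2 * b + p3 * c + (1 - p1 - p2 - p3) * d
    treat_all  = (p1 + p3) * a + (1 - (p1 + p3)) * b
    treat_none = (p1 + p3) * c + (1 - (p1 + p3)) * d
    return use_model, treat_all, treat_none
```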

Drawbacks of traditional decision analysis p’s require a cut-point to be chosen

[Decision tree repeated: "apply model" vs. biopsy all vs. no biopsy, with outcomes a–d and branch probabilities p1, p2, p3.]

Problems with traditional decision analysis
- p's require a cut-point to be chosen
- Extra data needed on the values of the health outcomes (a – d)
  - Harms of biopsy
  - Harms of delayed diagnosis
  - Harms may vary between patients

[Decision tree repeated: "apply model" vs. biopsy all vs. no biopsy, with outcomes a–d and branch probabilities p1, p2, p3.]

Evaluating values of health outcomes
1. Obtain data from the literature on:
   - Benefit of detecting cancer (compared to a missed / delayed cancer)
   - Harms of unnecessary prostate biopsy (compared to no biopsy)
     - Burden: pain and inconvenience
     - Cost of biopsy

Evaluating values of health outcomes
2. Obtain data from the individual patient:
   - What are your views on having a biopsy?
   - How important is it for you to find a cancer?

Either way Investigator: “here is a data set, is my model or marker of value?” Analyst: “I can’t tell you, you have to go away and do a literature search first. Also, you have to ask each and every patient.”

ROCkers and VOIers ROCkers’ methods are simple and elegant but useless VOIers’ methods are useful, but complex and difficult to apply

Solving the decision tree

Threshold probability
- Probability of disease is p̂
- Define a threshold probability of disease as pt
- Patient accepts treatment if p̂ ≥ pt

Solve the decision tree
- pt: cut-point for choosing whether to treat or not
- The harm:benefit ratio defines pt
  - Harm: d − b (FP)
  - Benefit: a − c (TP)
- pt / (1 − pt) = H:B

If P(D=1) = pt, the expected value of biopsy equals the expected value of no biopsy: the threshold probability is the point of indifference between treating and not treating.
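
Written out (a sketch using the outcome values a–d from the decision tree), the indifference condition gives the harm:benefit form of the threshold used on the following slides:

```latex
p_t\,a + (1 - p_t)\,b \;=\; p_t\,c + (1 - p_t)\,d
\quad\Longrightarrow\quad
p_t\,(a - c) \;=\; (1 - p_t)\,(d - b)
\quad\Longrightarrow\quad
\frac{p_t}{1 - p_t} \;=\; \frac{d - b}{a - c} \;=\; \frac{\text{Harm}}{\text{Benefit}}
```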

Intuitively The threshold probability at which a patient will opt for treatment is informative of how a patient weighs the relative harms of false-positive and false-negative results.

Nothing new so far
- The equation has long been used to set the threshold for a positive diagnostic test
- Work out the true harms and benefits of treatment and disease
  - E.g. if missing the disease is 4 times worse than unnecessary treatment, treat all patients with a probability of disease > 20%

A simple decision analysis
1. Select a pt

A simple decision analysis
1. Select a pt
2. Positive test defined as predicted risk ≥ pt

A simple decision analysis
1. Select a pt
2. Positive test defined as predicted risk ≥ pt
3. Count true positives (benefit), false positives (harm)

A simple decision analysis
1. Select a pt
2. Positive test defined as predicted risk ≥ pt
3. Count true positives (benefit), false positives (harm)
4. Calculate "Clinical Net Benefit" as: (TP − FP × pt/(1 − pt)) / N
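
Steps 1–4 as a short sketch (illustrative Python, not the authors' code):

```python
import numpy as np

def net_benefit(y, p, pt):
    """Clinical net benefit at threshold pt: (TP - w*FP) / N, with w = pt/(1-pt)."""
    y = np.asarray(y); p = np.asarray(p)
    pos = p >= pt                       # "positive test": predicted risk at or above the threshold
    tp = np.sum(pos & (y == 1))
    fp = np.sum(pos & (y == 0))
    w = pt / (1 - pt)
    return (tp - w * fp) / len(y)
```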

Long history: Peirce 1884

Peirce 1884

Worked example at pt = 20% (N = 2742). Rows: "Biopsy if risk ≥ 20%" and "Biopsy all men"; columns: negative, true positive, false positive, net benefit calculation (TP − FP × (0.2 ÷ 0.8)) / 2742, and net benefit. [Most cell values not recoverable; the surviving fragment for "biopsy if risk ≥ 20%" is "… − 1743 × (0.2 ÷ 0.8)".]

Net benefit has a simple clinical interpretation
- Net benefit of 0.079 at a pt of 20%
- Using the model is equivalent to a strategy that identified 7.9 cancers per 100 patients with no unnecessary biopsies

Net benefit has a simple clinical interpretation
- Difference between model and treat all at a pt of 20%: 0.005
  - 5/1000 more TPs for an equal number of FPs
- Divide by the weighting: 0.005 / 0.25 = 0.02
  - 20/1000 fewer FPs for an equal number of TPs (= 20/1000 fewer unnecessary biopsies with no missed cancers)

Decision curve analysis
1. Select a pt
2. Positive test defined as predicted risk ≥ pt
3. Calculate "Clinical Net Benefit" as (TP − FP × pt/(1 − pt)) / N
4. Vary pt over an appropriate range
Vickers & Elkin, Med Decis Making 2006;26:565–74
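
Varying pt traces out the decision curve; a minimal sketch (illustrative code, with the "treat all" net benefit written as prevalence − (1 − prevalence) × w):

```python
import numpy as np

def decision_curve(y, p, thresholds=np.arange(0.05, 0.95, 0.01)):
    """Net benefit of the model, 'treat all' and 'treat none' across thresholds pt."""
    y = np.asarray(y); p = np.asarray(p)
    n, prevalence = len(y), y.mean()
    curve = []
    for pt in thresholds:
        w = pt / (1 - pt)
        tp = np.sum((p >= pt) & (y == 1))
        fp = np.sum((p >= pt) & (y == 0))
        nb_model = (tp - w * fp) / n
        nb_all = prevalence - (1 - prevalence) * w   # treat/biopsy everyone
        curve.append((pt, nb_model, nb_all, 0.0))    # 0.0 = treat none
    return curve
```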

Decision curve: theory

Treat none

[Decision curve: "treat all" added, with p(outcome) = 50%; "treat none" shown for reference.]

Decisions with model
[Decision curve: decisions based on the model vs. treat all vs. treat none.]

Points in Decision Curves
- If treat none, NB = ..
- If treat all, and threshold = 0%, NB = …
- If the cut-off is the incidence of the end point: NB treat none = NB treat all = …

Decision curve analysis
- Decision curve analysis tells us about the clinical value of a model where accuracy metrics do not
- Decision curve analysis does not require either:
  - Additional data
  - Individualized assessment
- Simple-to-use software is available to implement decision curve analysis

Decision analysis in the medical research literature
- Only a moderate number of papers devoted to decision analysis
- Many thousands of papers analyzed without reference to decision making (ROC curves, p values)

Decision Curve Analysis With thanks to…. –Elena Elkin –Mike Kattan –Daniel Sargent –Stuart Baker –Barry Kramer –Ewout Steyerberg

Illustrations

Clinical usefulness of testicular cancer model
- Cutoff 70% necrosis / 30% malignant, motivated by:
  - Decision analysis
  - Current practice: ≈ 65%

Net benefit calculations
- Resect all: NB = (299 − 3/7 × 245) / 544 = 0.357
- Resect none: NB = (0 − 0) / 544 = 0
- Model: NB = (275 − 3/7 × 143) / 544 = 0.393
- Difference model − resect all: 0.036, i.e. 36 more resections of tumor per 1000 patients at the same number of unnecessary resections of necrosis
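
The arithmetic on this slide can be reproduced directly (a sketch; the counts are taken from the slide, and w = 3/7 corresponds to the 30% malignancy cutoff):

```python
# Net benefit of each strategy in the testicular cancer case study
n = 544
w = 3 / 7
nb_resect_all  = (299 - w * 245) / n   # ~0.357
nb_resect_none = 0.0
nb_model       = (275 - w * 143) / n   # ~0.393
delta_nb = nb_model - nb_resect_all    # ~0.036: 36 more tumor resections per 1000 patients
```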

Decision curves for testicular cancer model

Comparison of performance measures

Lessons
1. Clinical usefulness may be limited despite reasonable discrimination and calibration

Which performance measure when?  It depends …  Evaluation of usefulness requires weighting and consideration of outcome incidence Hilden J. Prevalence-free utility-respecting summary indices of diagnostic power do not exist. Stat Med. 2000;19(4):  Summary indices vs graphs (e.g. area vs ROC curve, validation graphs, decision curves, reclassification table vs predictiveness curve)

Which performance measure when?
1. Discrimination: if poor, usefulness unlikely, but NB ≥ 0
2. Calibration: if poor in a new setting, risk of NB < 0

Conclusions
- Statistical evaluations are important, but may be at odds with evaluation of clinical usefulness; is an ROC area of 0.8 good? Is 0.6 always poor? NO!
- Decision-analytic performance measures, such as decision curves, are important to consider in evaluating the potential of a prediction model to support individualized decision making

References
- Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. New York: Springer.
- Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 26:565-74, 2006
- Steyerberg EW, Vickers AJ. Decision curve analysis: a discussion. Med Decis Making 28:146, 2008
- Pencina MJ, D'Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med 30:11-21, 2011
- Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology 21:128-38, 2010
- Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B. Assessing the incremental value of diagnostic and prognostic markers: a review and illustration. Eur J Clin Invest
- Steyerberg EW, Van Calster B, Pencina MJ. Performance measures for prediction models and markers: evaluation of predictions and classifications. Rev Esp Cardiol 64, 2011

Evaluation of incremental value of markers

Case study: CVD prediction
- Cohort: 3264 participants in the Framingham Heart Study
  - Age 30 to 74 years
  - 183 developed CHD (10-year risk: 5.6%)
- Data as used in:
  - Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 27, 2008
  - Steyerberg EW, Van Calster B, Pencina MJ. Performance measures for prediction models and markers: evaluation of predictions and classifications. Rev Esp Cardiol 64, 2011

Analysis
- Cox proportional hazards models
  - Time-to-event data
- Reference model:
  - Dichotomous: sex, diabetes, smoking
  - Continuous: age, systolic blood pressure (SBP), total cholesterol
  - All hazard ratios statistically significant
- Add high-density lipoprotein (HDL) cholesterol
  - Continuous predictor, highly significant (hazard ratio = 0.65, P-value < .001)
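
A hypothetical sketch (not the authors' code) of the two Cox models; the simulated data and column names are assumptions, used only to show the shape of the analysis:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated stand-in for the Framingham data (illustrative only)
rng = np.random.default_rng(0)
n = 500
fhs = pd.DataFrame({
    "time": rng.exponential(10, n),        # follow-up time in years (simulated)
    "chd": rng.binomial(1, 0.06, n),       # CHD event indicator (simulated)
    "sex": rng.binomial(1, 0.5, n),
    "diabetes": rng.binomial(1, 0.05, n),
    "smoking": rng.binomial(1, 0.3, n),
    "age": rng.uniform(30, 74, n),
    "sbp": rng.normal(130, 18, n),
    "chol": rng.normal(210, 35, n),
    "hdl": rng.normal(50, 14, n),
})

base_cols = ["time", "chd", "sex", "diabetes", "smoking", "age", "sbp", "chol"]
base_model   = CoxPHFitter().fit(fhs[base_cols], duration_col="time", event_col="chd")
marker_model = CoxPHFitter().fit(fhs[base_cols + ["hdl"]], duration_col="time", event_col="chd")
marker_model.print_summary()   # hazard ratios and p-values (here on simulated data, not the FHS result)
```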

How good are these models?
- Performance of the reference model
- Incremental value of HDL

Performance criteria
- Steyerberg EW, Van Calster B, Pencina MJ. Medidas del rendimiento de modelos de predicción y marcadores pronósticos: evaluación de las predicciones y clasificaciones. Rev Esp Cardiol. doi: /j.recesp

Case study: quality of predictions

Discrimination Area: without HDL vs with HDL

Calibration
- Internal: quite good
- External: more relevant

Performance
- Full range of predictions
  - ROC
  - R², …
- Classifications / decisions
  - Cut-off to define low vs high risk

Determine a cut-off for classification  Data-driven cut-off  Youden’s index: sensitivity + specificity – 1  E.g. sens 80%, spec 80%  Youden = …  E.g. sens 90%, spec 80%  Youden = …  E.g. sens 80%, spec 90%  Youden = …  E.g. sens 40%, spec 60%  Youden = …  E.g. sens 100%, spec 100%  Youden = …  Youden’s index maximized: upper left corner ROC curve  If predictions perfectly calibrated  Upper left corner: cut-off = incidence of the outcome  Incidence = 183/3264 = 5.6%

Determine a cut-off for classification  Data-driven cut-off  Youden’s index: sensitivity + specificity – 1  Decision-analytic  Cut-off determined by clinical context  Relative importance (‘utility’) of the consequence of a true or false classification  True-positive classification: correct treatment  False-positive classification: overtreatment  True-negative classification: no treatment  False-negative classification: undertreatment  Harm: net overtreatment (FP-TN)  Benefit: net correct treatment (TP-FN)  Odds of the cut-off = H:B ratio

Evaluation of performance
- Youden index: "science of the method"
- Net Benefit: "utility of the method"
- References:
  - Peirce, Science 1884
  - Vergouwe, Semin Urol Oncol 2002
  - Vickers, MDM 2006

Net Benefit  Net Benefit = (TP – w FP) / N w = cut-off/ (1 – cut-off)  e.g.: cut-off 50%: w =.5/.5=1; cut-off 20%: w=.2/.8=1/4  w = H : B ratio  “Number of true-positive classifications, penalized for false-positive classifications”

Increase in AUC
- 5.6% cut-off: AUC → …
- 20% cut-off: AUC → 0.579

Continuous variant — Area → 0.774

Addition of a marker to a model
- Typically small improvement in discriminative ability according to AUC (or c statistic)
- c statistic blamed for being insensitive
- Study 'reclassification'

Net Reclassification Index:
improvement in sensitivity + improvement in specificity
= (move up | event − move down | event) + (move down | non-event − move up | non-event)

[Reclassification table for adding HDL; surviving fragments: …/183 = 12% (events), −1/3081 = .03% (non-events).]

NRI for 5.6% cut-off?
- NRI for CHD: 7/183 = 3.8%
- NRI for no CHD: 24/3081 = 0.8%
- NRI = 4.6%
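
The two components can be combined in a small helper (a sketch; the counts are those on the slide):

```python
def nri(net_up_events, n_events, net_down_nonevents, n_nonevents):
    """NRI = (net upward moves among events)/n_events + (net downward moves among non-events)/n_nonevents."""
    return net_up_events / n_events + net_down_nonevents / n_nonevents

# Slide's numbers at the 5.6% cutoff: 7 net upward moves among 183 events,
# 24 net downward moves among 3081 non-events
print(round(nri(7, 183, 24, 3081), 3))   # ~0.046, i.e. NRI = 4.6%
```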

NRI and sens/spec
- NRI = delta sens + delta spec
- Sens without HDL = 135/183 = 73.8%
- Sens with HDL = 142/183 = 77.6%

NRI better than delta AUC?
- NRI = delta(sens) + delta(spec)
- AUC for a binary classification = (sens + spec) / 2

NRI and delta AUC
- NRI = delta(sens) + delta(spec)
- AUC for a binary classification = (sens + spec) / 2
- Delta AUC = (delta(sens) + delta(spec)) / 2
- NRI = 2 × delta(AUC)
- Delta(Youden) = delta(sens) + delta(spec), so NRI = delta(Youden)

NRI has ‘absurd’ weighting?

Decision-analytic performance: NB
- Net Benefit = (TP − w·FP) / N
- Model without HDL:
  - TP = 135
  - FP = 1067
  - w = 0.056 / 0.944 = 0.059
  - N = 3264
  - NB = (135 − 0.059 × 1067) / 3264 = 2.21%
- Model with HDL:
  - NB = (142 − 0.059 × 1043) / 3264 = 2.47%
- Delta(NB)
  - Increase in TP: 10 − 3 = 7
  - Decrease in FP: 166 − 142 = 24
  - Increase in NB: (7 + 0.059 × 24) / 3264 = 0.26%
- Interpretation:
  - "2.6 more true CHD events identified per 1000 subjects, at the same number of FP classifications."
  - "HDL has to be measured in 1 / 0.26% = 385 subjects to identify one more TP"
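
Reproducing this slide's arithmetic (a sketch; counts and cutoff taken from the slide):

```python
# Net benefit with and without HDL at the 5.6% risk cutoff
n = 3264
w = 0.056 / 0.944                        # odds of the cutoff, ~0.059
nb_without_hdl = (135 - w * 1067) / n    # ~0.022 (2.2%)
nb_with_hdl    = (142 - w * 1043) / n    # ~0.025 (2.5%)
delta_nb = nb_with_hdl - nb_without_hdl  # ~0.0026: 2.6 extra TPs per 1000 at an equal number of FPs
print(round(1 / delta_nb))               # ~385-390 subjects need HDL measured to gain one more TP
```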

Application to FHS

Continuous NRI: no categories
- All cut-offs; information similar to the AUC and the decision curve