Criteria for Assessment of Performance of Cancer Risk Prediction Models: Overview Ruth Pfeiffer Cancer Risk Prediction Workshop, May 21, 2004 Division.

Slides:



Advertisements
Similar presentations
Lecture 3 Validity of screening and diagnostic tests
Advertisements

What is a good ensemble forecast? Chris Ferro University of Exeter, UK With thanks to Tom Fricker, Keith Mitchell, Stefan Siegert, David Stephenson, Robin.
Critically Evaluating the Evidence: diagnosis, prognosis, and screening Elizabeth Crabtree, MPH, PhD (c) Director of Evidence-Based Practice, Quality Management.
1 Case-Control Study Design Two groups are selected, one of people with the disease (cases), and the other of people with the same general characteristics.
Sponsored by Division of Cancer Control and Population Sciences Division of Cancer Epidemiology and Genetics Office of Women’s Health National Cancer Institute,
Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur,
Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology.
SOME ADDITIONAL POINTS ON MEASUREMENT ERROR IN EPIDEMIOLOGY Sholom May 28, 2011 Supplement to Prof. Carroll’s talk II.
Lecture 10 Comparison and Evaluation of Alternative System Designs.
How do we know whether a marker or model is any good? A discussion of some simple decision analytic methods Carrie Bennette on behalf of Andrew Vickers.
Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C.
Lucila Ohno-Machado An introduction to calibration and discrimination methods HST951 Medical Decision Support Harvard Medical School Massachusetts Institute.
Sample Size Determination
Cohort Studies Hanna E. Bloomfield, MD, MPH Professor of Medicine Associate Chief of Staff, Research Minneapolis VA Medical Center.
Decision Tree Models in Data Mining
Linear Regression and Correlation Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and the level of.
Thoughts on Biomarker Discovery and Validation Karla Ballman, Ph.D. Division of Biostatistics October 29, 2007.
Screening and Early Detection Epidemiological Basis for Disease Control – Fall 2001 Joel L. Weissfeld, M.D. M.P.H.
AM Recitation 2/10/11.
Statistics in Screening/Diagnosis
BASIC STATISTICS: AN OXYMORON? (With a little EPI thrown in…) URVASHI VAID MD, MS AUG 2012.
 Mean: true average  Median: middle number once ranked  Mode: most repetitive  Range : difference between largest and smallest.
Multiple Choice Questions for discussion
Simple Linear Regression
Risk Prediction in Clinical Practice Joann G. Elmore MD, MPH SCCA June 11, 2014.
8.1 Inference for a Single Proportion
Diving into the Deep: Advanced Concepts Module 9.
Division of Population Health Sciences Royal College of Surgeons in Ireland Coláiste Ríoga na Máinleá in Éirinn Indices of Performances of CPRs Nicola.
Continuous Probability Distributions Continuous random variable –Values from interval of numbers –Absence of gaps Continuous probability distribution –Distribution.
EDRN Approaches to Biomarker Validation DMCC Statisticians Fred Hutchinson Cancer Research Center Margaret Pepe Ziding Feng, Mark Thornquist, Yingye Zheng,
Copyright © 2003, SAS Institute Inc. All rights reserved. Cost-Sensitive Classifier Selection Ross Bettinger Analytical Consultant SAS Services.
How do we know whether a marker or model is any good? A discussion of some simple decision analytic methods Carrie Bennette (on behalf of Andrew Vickers)
Introduction Chapter 1. What is Physical Fitness? Physical fitness is the ability to carry out daily tasks with vigor and alertness without undue fatigue.
Introduction Chapter 1. What is Physical Fitness? Physical fitness is the ability to carry out daily tasks with vigor and alertness without undue fatigue.
D:/rg/folien/ms/ms-USA ppt F 1 Assessment of prediction error of risk prediction models Thomas Gerds and Martin Schumacher Institute of Medical.
Prevention with Finasteride Ian M. Thompson, MD October, 2009.
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
INTRODUCTION Upper respiratory tract infections, including acute pharyngitis, are common in general practice. Although the most common cause of pharyngitis.
MEASURES OF TEST ACCURACY AND ASSOCIATIONS DR ODIFE, U.B SR, EDM DIVISION.
RevMan for Registrars Paul Glue, Psychological Medicine What is EBM? What is EBM? Different approaches/tools Different approaches/tools Systematic reviews.
MBP1010 – Lecture 8: March 1, Odds Ratio/Relative Risk Logistic Regression Survival Analysis Reading: papers on OR and survival analysis (Resources)
Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,
1 Risk Assessment Tests Marina Kondratovich, Ph.D. OIVD/CDRH/FDA March 9, 2011 Molecular and Clinical Genetics Panel for Direct-to-Consumer (DTC) Genetic.
Evaluating Results of Learning Blaž Zupan
1 Lecture 6: Descriptive follow-up studies Natural history of disease and prognosis Survival analysis: Kaplan-Meier survival curves Cox proportional hazards.
Evaluating Risk Adjustment Models Andy Bindman MD Department of Medicine, Epidemiology and Biostatistics.
Evaluating Predictive Models Niels Peek Department of Medical Informatics Academic Medical Center University of Amsterdam.
United Nations Workshop on Revision 3 of Principles and Recommendations for Population and Housing Censuses and Evaluation of Census Data, Amman 19 – 23.
Learning Simio Chapter 10 Analyzing Input Data
Unit 15: Screening. Unit 15 Learning Objectives: 1.Understand the role of screening in the secondary prevention of disease. 2.Recognize the characteristics.
Organization of statistical research. The role of Biostatisticians Biostatisticians play essential roles in designing studies, analyzing data and.
Introduction to Disease Prevalence modelling Day 6 23 rd September 2009 James Hollinshead Paul Fryers Ben Kearns.
XIAO WU DATA ANALYSIS & BASIC STATISTICS.
© 2010 Pearson Prentice Hall. All rights reserved 7-1.
Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics.
Evaluating Classification Performance
Breast Cancer Surveillance Consortium (BCSC): A Research Infrastructure sponsored by the National Cancer Institute Breast Cancer Risk Models William Barlow,
Machine Learning 5. Parametric Methods.
BIOSTATISTICS Lecture 2. The role of Biostatisticians Biostatisticians play essential roles in designing studies, analyzing data and creating methods.
Chapter 5 Joint Probability Distributions and Random Samples  Jointly Distributed Random Variables.2 - Expected Values, Covariance, and Correlation.3.
Testing Predictive Performance of Ecological Niche Models A. Townsend Peterson, STOLEN FROM Richard Pearson.
PTP 560 Research Methods Week 12 Thomas Ruediger, PT.
Biostatistics Board Review Parul Chaudhri, DO Family Medicine Faculty Development Fellow, UPMC St Margaret March 5, 2016.
Assessing the additional value of diagnostic markers: a comparison of traditional and novel measures Ewout W. Steyerberg Professor of Medical Decision.
Chapter 11 Linear Regression and Correlation. Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and.
Biostatistics Class 2 Probability 2/1/2000.
Bootstrap and Model Validation
How many study subjects are required ? (Estimation of Sample size) By Dr.Shaik Shaffi Ahamed Associate Professor Dept. of Family & Community Medicine.
Aiying Chen, Scott Patterson, Fabrice Bailleux and Ehab Bassily
Evidence Based Diagnosis
Presentation transcript:

Criteria for Assessment of Performance of Cancer Risk Prediction Models: Overview Ruth Pfeiffer Cancer Risk Prediction Workshop, May 21, 2004 Division of Cancer Epidemiology and Genetics National Cancer Institute

Cancer Risk Prediction Models Model input: –Individual’s age and risk factors –Age interval at risk Model output: –Estimate of individual’s absolute risk of developing cancer over a given time period (e.g. the next 5 years).

Definition of Absolute Risk for Cancer in [a, a+  ]

Applications of absolute risk prediction models Population level: –Estimate population disease burden –Estimate impact of changing the risk factor distribution in the general population –Plan intervention studies Individual level: –Clinical decision-making: Modification of known risk factors (diet, exercise) Weighing risks and benefits of intervention ( eg chemoprevention) –Screening recommendations

Evaluating the performance of risk models How well does model predict for groups of individuals: Calibration How well does model categorize individuals: Accuracy scores How well does model distinguish between individuals who will and will not experience event: Discriminatory Accuracy

Independent population for validation Assume population of N individuals followed over time period Define

Assessing Model Calibration Goodness-of-fit criteria based on comparing observed (O) with expected (E) number of events overall and in subgroups of risk factors of the population Use Poisson approximation to sum of independent binomial random variables with r i <<1

Assessing Model Calibration, cont. Unbiased (well calibrated) Remark:

Brier Score Brier Score = Mean Squared Error (measure of accuracy) Brier, 1950

Comparison of observed (O) and expected (E) cases of invasive breast cancer (Gail et al Model 2) in placebo arm of Breast Cancer Prevention Trial (Table 4, Costantino et al, JNCI, 1999) Age Group # women OEE/O <= >= All ages

Assess model performance for clinical decision making For clinical decision making a decision rule is needed for some threshold r*

For given threshold r* define sensitivity and specificity of decision rule as

Problem: sensitivity and specificity not always appropriate measures Example: rare disease π=P(Y=1)=0.01 Sensitivity =0.95, specificity=0.95

Accuracy Scores Measure how well true disease outcome predicted Quantify clinical value of decision rule (Zweig & Campbell, 1993) Positive predictive value Negative predictive value Weighted combinations of both Depend on sensitivity, specificity, disease prevalence

Measures of Discrimination for Range of Thresholds ROC curve (plots sensitivity against 1-specificity) Area under the ROC curve (AUC) ~Mann- Whitney-Wilcoxon Rank Sum Test ~ Gini index for rare events Concordance statistic (Rockhill et al, 2001; Bach et al, 2003) Partial area under the curve (Pepe, 2003; Dodd&Pepe, 2003)

Decision Theoretic Framework Specify loss function for each combination of true disease status and decision:

Known Loss Function

If sens(r*)=1 and spec(r*)=1

Special Cases 1. C 00 =C 11 =0; C 10 =C 01 overall loss=misclassification rate: EL minimized for r*=0.5

Special Cases, cont 2.

If sens(r*)=1 and spec(r*)=1 Recall:

Should Mammographic Screen be Recommended Based on a Risk Model? Outcome over next 5 Years No ScreenScreen Y=0 (no cancer) Y=1 (cancer)

Ratio of Expected Loss to Minimum Expected Loss vs Sensitivity

Intervention Setting Two outcomes: eg Y 1 =breast cancer Y 2 =stroke Loss

Intervention Setting Intervention does not change cost, it changes probability function of joint outcomes No intervention: P δ=0 (Y 1, Y 2 ) Intervention: P δ=1 (Y 1, Y 2 )

Ideally we would have joint risk model for both outcomes, Y 1, Y 2 Simplification: P i (Y 1 =1, Y 2 =1|x) = p 2i r i (x) p 21 = p 20 ρ 2 r 1 (x) = r 0 (x)ρ 1

Loss function for clinical decision: should woman take Tamoxifen for breast cancer prevention? Over next 5 years No Breastcancer No Stroke ρ 1 =0.5, ρ 2 =3

Ratio of Expected Loss to Expected Loss with sens=spec=1 vs Sensitivity

Summary For certain applications (screening) high sensitivity and specificity more important than others (clinical decision making) Always want a well calibrated model Discriminatory aspects of models may be less important than accuracy and calibration

Collaborators Mitchell Gail, NCI Andrew Freedman, NCI Patricia Hartge, NCI

References Brier GW, 1950, Monthly Weather Review, 75, 1-3 Dodd LE, Pepe M, 2003, JASA 98 (462): Efron B, 1986, JASA 81 (394): Efron B, 1983, JASA 78 (382): Gail MH et al, 1999, JNCI, 91 (21): Hand DJ, 2001, Statistica Neerlandica, 55 (1): 3-16 Hand DJ, 1997, Construction and assessment of classification rules, Wiley. Pepe MS 2000, JASA, 95 (449): Schumacher M, et al, 2003, Methods of information in medicine 42: Steyerberg EW, et al, 2003, Journal of Clinical Epidemiology 56:

AUC value for the Gail et al Model

Relative Risk Estimates for “Gail Model” Risk Factor Age at menarche (yrs.) (>14, 12-13, <12) Number of Biopsies (0, 1, 2+) Age at first live birth (yrs.) ( 30) # of first degree relatives with breast cancer (0, 1, 2+)

Intervention Setting Two outcomes: eg Y 1 =breast cancer Y 2 =stroke Loss