Combined Predictor Selection for Multiple Clinical Outcomes Using PHREG
Grisell Diaz-Ramirez


Background
- The PHREG procedure has built-in options to perform predictor selection using stepwise methods or best subsets.
- These built-in options do not allow the simultaneous selection of predictors for multiple outcomes.
- Selecting a common set of predictors for multiple outcomes is important in clinical settings: practitioners want to evaluate the effects of risk factors on multiple health-related outcomes in the same study.

Background (cont.)
- There are many variable-selection algorithms for individual outcomes, but few studies on selecting a common set of predictors for multiple outcomes.
- Information criteria such as the Bayesian Information Criterion (BIC) and stepwise regression methods (e.g. backward elimination) are commonly used to select the best set of predictors for individual outcomes.

Objectives
- We describe a SAS macro for selecting a common set of variables for predicting multiple clinical outcomes using PHREG.
- The selection method uses a variant of backward elimination based on the average normalized BIC across multiple outcomes.
- We test our method using the Health and Retirement Study (HRS) data.
- Compare the final predictor set from BIC combined backward elimination with the final predictor sets from BIC backward elimination by outcome.
- Perform a simulation study.

Methods
- Performed BIC backward elimination using a SAS macro we developed.
- Data: HRS.
- Outcomes (4 time-to-event clinical outcomes from HRS):
  - Time to first Activity of Daily Living (ADL) dependence
  - Time to first Instrumental Activity of Daily Living (IADL) difficulty
  - Time to first walking-across-the-room dependence
  - Time to death
- Predictors: 39 health-related and demographic predictors measured at baseline from HRS.
- Measured predictive accuracy using Harrell's C-statistic for all fitted models.
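As a rough illustration of the accuracy measure named above, the following is a minimal Python sketch of Harrell's C-statistic for right-censored data. It is not the SAS code used in the study: the function name `harrell_c`, the toy data, and the simplifications (tied times are ignored, no adjustment for competing risks) are all assumptions made for this example.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C for right-censored data (simplified sketch).

    A pair (i, j) is usable when subject i has an observed event and a
    strictly shorter time than subject j. The pair is concordant when i
    also has the higher predicted risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: higher risk score goes with shorter time.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]   # 1 = event observed, 0 = censored
risk = [4.0, 3.0, 2.0, 1.0]
c_index = harrell_c(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the values around 0.61 to 0.71 reported in the tables of this deck.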

Methods (cont.)
Fitted survival regression models:
- Fine and Gray competing-risk regression: time to first ADL dependence, first IADL difficulty, and first walking-across-the-room dependence. When there are multiple types of events, a competing-risk event (e.g. death) can modify the chance that the event of interest (e.g. first ADL dependence) occurs, so it must be accounted for appropriately.
- Cox regression: time to death. There is only one type of event, namely all-cause mortality.

Methods (cont.)
Used a normalized BIC so that changes in BIC from complex models to simpler models count approximately the same across multiple outcomes:

$$\text{Normalized BIC}(k) = \frac{\text{BIC}(k)}{\text{BIC}_{\text{full model}} - \text{BIC}_{\text{best individual model}}}$$

$$\text{BIC} = -2\log L + k\log n$$

- L: the maximized value of the likelihood function of the fitted model
- k: number of parameters estimated by the fitted model
- n: sample size
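The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numeric inputs below are hypothetical, and in the study the log-likelihoods would come from fitted PHREG models rather than being typed in by hand.

```python
import math


def bic(log_likelihood, k, n):
    """BIC = -2 log L + k log n."""
    return -2.0 * log_likelihood + k * math.log(n)


def normalized_bic(bic_k, bic_full, bic_best):
    """BIC of a candidate model divided by the BIC range for that outcome
    (full model minus best individual model)."""
    return bic_k / (bic_full - bic_best)


def average_normalized_bic(bic_k, bic_full, bic_best):
    """Average the normalized BIC over outcomes; each argument maps outcome -> BIC."""
    values = [normalized_bic(bic_k[o], bic_full[o], bic_best[o]) for o in bic_k]
    return sum(values) / len(values)


# Hypothetical BIC values for two outcomes (not taken from the HRS analysis).
candidate = {"outcome_a": 1000.0, "outcome_b": 2000.0}
full = {"outcome_a": 1010.0, "outcome_b": 2040.0}
best = {"outcome_a": 1000.0, "outcome_b": 2000.0}
avg = average_normalized_bic(candidate, full, best)
```

Dividing by the per-outcome range is what puts outcomes with very different BIC scales on a comparable footing before averaging.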

SAS Macro for BIC Combined Backward Elimination
1. Start with the full model with all p predictors (e.g. p = 39). Select the subset of p-1 predictors with the minimum average normalized BIC (e.g. 38 predictors).
2. Start with the model with p-1 predictors (e.g. 38). Select the subset of p-2 predictors with the minimum average normalized BIC (e.g. 37 predictors).
3. Repeat the same process until there are two variables left in the model. The last model contains only the variables that are forced in (e.g. age and sex).
4. Select the final set of predictors as the one with the minimum average normalized BIC among the models with p-1 down to 2 predictors.
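The elimination loop described on this slide can be sketched in Python as follows. This is not the SAS macro itself: `bic_fn` is a hypothetical callback standing in for "fit the PHREG model for this outcome with these predictors and return its BIC", and the toy BIC function, variable names, and numbers are invented purely so the sketch can run.

```python
def combined_backward_elimination(predictors, forced, outcomes, bic_fn, bic_full, bic_best):
    """Drop one predictor at a time, always keeping the subset with the lowest
    average normalized BIC, then return the best (score, subset) on the path."""

    def avg_norm_bic(subset):
        return sum(bic_fn(o, subset) / (bic_full[o] - bic_best[o]) for o in outcomes) / len(outcomes)

    current = list(predictors)
    path = []
    while len(current) > len(forced):
        # Every subset obtained by removing one non-forced predictor.
        candidates = [[p for p in current if p != drop] for drop in current if drop not in forced]
        current = min(candidates, key=avg_norm_bic)
        path.append((avg_norm_bic(current), current))
    return min(path, key=lambda step: step[0])


# Hypothetical toy setup: x1 helps both outcomes, x2 helps only "b", x3 is noise.
useful = {"a": {"x1"}, "b": {"x1", "x2"}}


def toy_bic(outcome, subset):
    # Invented BIC: reward useful predictors, penalize model size.
    return 1000.0 - 20.0 * len(useful[outcome] & set(subset)) + 5.0 * len(subset)


predictors = ["age", "sex", "x1", "x2", "x3"]
forced = {"age", "sex"}
bic_full = {o: toy_bic(o, predictors) for o in useful}
bic_best = {"a": toy_bic("a", ["age", "sex", "x1"]), "b": toy_bic("b", ["age", "sex", "x1", "x2"])}
best_score, best_subset = combined_backward_elimination(
    predictors, forced, ["a", "b"], toy_bic, bic_full, bic_best
)
```

In this toy run the noise variable is dropped first and the selected subset keeps both forced variables plus `x1` and `x2`. With 39 predictors the loop fits on the order of a few hundred candidate subsets per outcome, which is far fewer than a best-subsets search would require.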

Case Study: HRS data
- Nationally representative cohort of 5,531 community-dwelling seniors enrolled in HRS.
- HRS is a longitudinal survey of a representative sample of all persons in the United States over age 50. Interviews take place every two years and cover subjects such as health care, housing, assets, pensions, employment, and disability.
- Included respondents who were 70 years old or older at the time of their interview in 2000.
- Follow-up ran through 2014.

Case Study: HRS data (cont.)

No. Variables   Average Normalized BIC   Average C-statistic
39              134.271                  0.669
38              134.181
37              134.094                  0.668
…
14              133.492                  0.655
4               133.930                  0.632
3               134.110                  0.628
2               134.410                  0.611

The final model, marked with a red box on the slide, is the one with the minimum average normalized BIC (14 variables, 133.492).

BIC Backward Elimination by Individual Outcome
- Ran the BIC backward elimination macro for each individual outcome and selected, for each outcome, the final model with the minimum BIC.
- Created a set of predictors as the union of all predictors in the final model of each outcome.
- Compared it with the set of predictors obtained using combined outcomes.
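The union-and-compare step above amounts to simple set operations. The short Python sketch below illustrates it; the outcome keys and predictor names are hypothetical placeholders, not the variables actually selected in the HRS analysis.

```python
# Hypothetical final predictor sets from per-outcome backward elimination.
selected_by_outcome = {
    "adl": {"age", "sex", "x1"},
    "iadl": {"age", "sex", "x2"},
    "walk": {"age", "sex", "x1", "x3"},
    "death": {"age", "sex", "x4"},
}

# Union model: every predictor chosen for at least one outcome.
union_set = set().union(*selected_by_outcome.values())

# Hypothetical predictor set from the combined (multi-outcome) selection.
combined_set = {"age", "sex", "x1", "x4"}

# Predictors the union model carries that the combined model does without.
only_in_union = union_set - combined_set
```

Because the union keeps anything that matters for any single outcome, it can only be as large as or larger than each individual model, which is why it tends to be less parsimonious than a jointly selected set.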

BIC Backward Elimination by Individual Outcome (cont.)
Harrell's C-statistic:
- Individual models: time to first ADL dependence (9 variables) 0.637; time to first IADL difficulty (8 variables) 0.635; time to first walking-across-the-room dependence (7 variables) 0.636; time to death (16 variables) 0.711
- Combined model by outcome (14 variables): 0.639, 0.642, 0.706; 0.655 (average of 4 models)
- Union model by outcome (22 variables): 0.647, 0.638, 0.648; 0.661 (average of 4 models)

Findings:
- The combined model is more parsimonious, with 14 variables vs. 22 variables in the union model.
- The C-statistic of the combined model by outcome is very similar to that of the individual and union models.
- The final combined model strikes a good balance between parsimony and predictive accuracy.

Simulation Study
- Created two sets of 100 simulated data sets with the same sample size (N = 5,531) and number of initial predictors (39) as the original data.
- One set of 100 simulated data sets has high correlation among the outcomes; the second set of 100 has low correlation among the outcomes.
- Ran the BIC combined backward elimination macro on each of the simulated data sets.
- Computed the percentage of correct inclusion and the percentage of correct exclusion of predictors in the final simulated models.
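One way to compute the two percentages mentioned above is sketched below in Python. The definitions are assumptions made for illustration: "correct inclusion" is taken to be the share of final simulated models that contain a predictor known to belong in the model, and "correct exclusion" the share that leave out a predictor known not to belong. The function name and toy data are hypothetical.

```python
def inclusion_exclusion_rates(final_models, true_predictors, all_predictors):
    """Return two dicts of percentages over the simulated final models:
    inclusion rate for each true predictor, exclusion rate for each other predictor."""
    n = len(final_models)
    inclusion = {
        p: 100.0 * sum(p in model for model in final_models) / n
        for p in true_predictors
    }
    exclusion = {
        p: 100.0 * sum(p not in model for model in final_models) / n
        for p in all_predictors
        if p not in true_predictors
    }
    return inclusion, exclusion


# Hypothetical final models from four simulated data sets.
final_models = [{"x1", "x2"}, {"x1"}, {"x1", "x3"}, {"x1", "x2"}]
inclusion, exclusion = inclusion_exclusion_rates(
    final_models, true_predictors={"x1", "x2"}, all_predictors=["x1", "x2", "x3"]
)
```

In the study each rate would be taken over 100 final models per correlation scenario rather than the four used here.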

Simulation Study (cont.)

Variable        Percentage correct inclusion (uncorrelated outcomes / correlated outcomes)
COGDLRC3G       100
DIABETES
DRIVE           95 / 89
EDU             78 / 71
EXERCISE        93 / 91
HEARTFAILURE    94 / 96
INCONTINENCE    81
LUNG            73 / 74
OTHERCLIM3G     87 / 82
OTHERPUSH       97 / 92
SMOKING
VOLUNTEER       98
Mean            91.33 / 89.25

- Most of the predictors have a percentage of correct inclusion of 80% or higher. That is, they show up in 80-100% of the final combined models fitted to the simulated data sets.
- The percentage of correct exclusion is almost 100% (result not shown in the table).

Simulation Study (cont.)

Final Combined Model                                          Harrell's C-statistic   No. Variables
Original data set                                             0.655                   14
Simulated data sets, uncorrelated outcomes (mean [95% CI])    0.647 [0.646-0.647]     12.960 [12.793-13.127]
Simulated data sets, correlated outcomes (mean [95% CI])      0.646 [0.645-0.648]     12.720 [12.548-12.892]

- Computed the mean and 95% confidence interval (CI) of the C-statistic and of the number of predictors in the final combined models from the simulated data sets.
- The C-statistic of the final combined models from the simulated data sets is very similar to the C-statistic of the final combined model in the original data.
- The mean number of predictors in the final combined models from the simulated data sets is slightly smaller than the 14 predictors in the original data.
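A summary of the form "mean [95% CI]" over a set of simulated results could be computed as in the Python sketch below. The slides do not say how the intervals were obtained, so the normal-approximation interval (mean plus or minus 1.96 standard errors) is an assumption, as are the function name and the toy values.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI: mean +/- z * sd / sqrt(n)."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical numbers of predictors in the final models of four simulated data sets.
n_predictors = [12, 13, 14, 13]
mean, lower, upper = mean_ci(n_predictors)
```

With 100 simulated data sets per scenario, as in the study, the standard error shrinks by a factor of 10 relative to the standard deviation, which is consistent with the narrow intervals in the table.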

Conclusions
- It is possible to select a common set of predictors for multiple clinical outcomes using PROC PHREG and a variant of backward elimination based on the average normalized Bayesian Information Criterion (BIC) across outcomes.
- Our method provides a straightforward approach without compromising parsimony or predictive accuracy.
- This method could also be applied to continuous or categorical outcome variables, and other information criteria such as the AIC can be used.

Contact Information
Name: Grisell Diaz-Ramirez
Company: University of California, San Francisco
Email: grisell.diaz-ramirez@ucsf.edu