

Consensus Strategy for Variable Selection in Clinical Prediction Rule Development
Miriam R. Elman, MPH(1); Jessina C. McGregor, PhD(2); Jodi Lapidus, PhD(1)
(1) Oregon Health & Science University-Portland State University School of Public Health; (2) Oregon State University/Oregon Health & Science University College of Pharmacy

BACKGROUND
Clinical prediction rules aim to prognostically identify the presence of a diagnosis using baseline patient data. Electronic health record (EHR) data are a rich resource, holding a massive amount of retrospective patient information. Robust and efficient variable reduction likely aids variable selection on multidimensional EHR data while preventing early removal of key predictors.

OBJECTIVE
Apply a consensus strategy to inform a prediction rule developed to direct appropriate selection of antibiotic agents to treat urinary tract infections.

METHODS
Data preparation: data management with SAS v9.4; statistical analysis with R 3.3.3.
STEP 1 Extract EHR Data & Identify Cohort. STEP 2 Split Cohort into Development & Validation Sets. STEP 3 Use Development Set for Prediction Rule.
Model building approach: a consensus strategy built from Random Forest, Group Lasso, and Boosted Classification reduces the candidate predictors, which then enter a multivariable logistic regression.

RESULTS
Twenty-two predictors were selected by the consensus strategy. Predictors selected by each method: Random Forest only (0), Lasso only (8), Boosting only (0), Random Forest and Lasso (3), Lasso and Boosting (5), Random Forest and Boosting (2), all three (4).
No interaction terms were selected for the best subsets model. Three of the 4 predictors selected by all three methods appeared in the final model. Saturated and best subsets model results are shown in the table.

Table. Results of multivariable logistic regression models
Model          AUC      Sensitivity   Specificity
Saturated      0.6631   0.6039        0.6432
Best subsets   0.6382   0.4312        0.7748

CONCLUSIONS
The prediction rule did not meet the minimum acceptable 90% sensitivity and 85% specificity set a priori by clinicians. This is a challenging prediction problem: the predictors are mostly categorical, and key predictors may be missing from retrospective data.

FUTURE DIRECTIONS
Predictors were reviewed with clinical partners, and prospective data collection is being conducted. Further model development with additional data and exploration of additional modeling strategies are planned. The framework developed for the consensus strategy is available for other applications.

Contact: Miriam Elman, elmanm@ohsu.edu
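The conclusion that neither model met the clinicians' targets follows directly from the results table. As a minimal Python sketch (the poster's analysis was done in R; the dictionary layout and the `meets_targets` helper are illustrative assumptions), the reported sensitivities and specificities can be checked against the a priori thresholds of 90% and 85%:

```python
# Sensitivity, specificity, and AUC as reported in the poster's results table.
RESULTS = {
    "Saturated": {"auc": 0.6631, "sensitivity": 0.6039, "specificity": 0.6432},
    "Best subsets": {"auc": 0.6382, "sensitivity": 0.4312, "specificity": 0.7748},
}

# Minimum acceptable performance set a priori by clinicians.
MIN_SENSITIVITY = 0.90
MIN_SPECIFICITY = 0.85


def meets_targets(metrics: dict) -> bool:
    """Return True only if both sensitivity and specificity reach their targets."""
    return (metrics["sensitivity"] >= MIN_SENSITIVITY
            and metrics["specificity"] >= MIN_SPECIFICITY)


for name, metrics in RESULTS.items():
    print(f"{name}: meets targets = {meets_targets(metrics)}")
```

Both models print `False`: the saturated model falls short on both metrics, and best subsets trades sensitivity (0.4312) for specificity (0.7748) without reaching either target.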

STEP 1 Extract EHR Data & Identify Cohort: Extract retrospective EHR data from electronic repositories, and define the cohort, outcome, and predictors.
STEP 2 Split Cohort into Development & Validation Datasets: Randomly split the cohort into development (80%) and validation (20%) datasets.
STEP 3 Use Development Set for Prediction Rule: Construct the prediction rule on the development set, and keep the remaining data set aside for rule validation.
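The random 80%/20% split in STEP 2 can be sketched in a few lines of Python. This assumes one record per patient and a simple (unstratified) random split, since the poster only says the cohort was split randomly; the function name `split_cohort` and the seed value are hypothetical.

```python
import random


def split_cohort(patient_ids, dev_fraction=0.8, seed=2017):
    """Randomly split patient IDs into development and validation sets.

    Hypothetical helper: assumes one record per patient, so splitting IDs
    keeps each patient entirely within one set.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_dev = round(dev_fraction * len(ids))
    return ids[:n_dev], ids[n_dev:]


development, validation = split_cohort(range(1000))
print(len(development), len(validation))  # 800 200
```

Fixing the seed means the same development and validation sets can be regenerated later, which matters because the validation set must stay untouched until the rule is finalized.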

Consensus Strategy
Random Forest: party (1.2-2) implementation used. The algorithm was repeated 3 times with different seeds, and the 10 most important variables were used for each run.
Group Lasso: grpreg (3.0-2) used to select categorical variables as a group. The tuning parameter was identified by minimizing the cross-validated error, then refined.
Boosted Classification: mboost (2.7-0) used. Variables were defined as ordinary least squares base learners to group categorical variables; continuous variables were centered.
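The poster states that the random forest was run with 3 seeds and the 10 most important variables were taken from each run, but not how the three top-10 lists were combined. The Python sketch below leaves that rule as a parameter: `min_runs=1` takes the union, `min_runs=len(runs)` takes the intersection. The importance scores and variable names are hypothetical toy values, not study data, and `top_k`/`pool_runs` are illustrative helpers rather than part of party.

```python
from collections import Counter


def top_k(importances: dict, k: int = 10) -> list:
    """Names of the k variables with the largest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def pool_runs(runs: list, k: int = 10, min_runs: int = 1) -> set:
    """Variables ranked in the top k in at least `min_runs` of the repeated runs.

    min_runs=1 gives the union of the top-k lists; min_runs=len(runs) gives
    their intersection. The pooling rule is an assumption, not from the poster.
    """
    hits = Counter(v for importances in runs for v in top_k(importances, k))
    return {v for v, n in hits.items() if n >= min_runs}


# Hypothetical importance scores from three runs with different seeds.
RUNS = [
    {"var_a": 0.30, "var_b": 0.25, "var_c": 0.05},
    {"var_a": 0.28, "var_c": 0.20, "var_b": 0.10},
    {"var_b": 0.31, "var_a": 0.22, "var_c": 0.01},
]

print(sorted(pool_runs(RUNS, k=2, min_runs=1)))  # ['var_a', 'var_b', 'var_c']
print(sorted(pool_runs(RUNS, k=2, min_runs=3)))  # ['var_a']
```

Requiring agreement across seeds (a higher `min_runs`) guards against variables that rank highly only by chance in a single forest, at the cost of a shorter list.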

Multivariable Logistic Regression
Model selection was conducted with best subsets, minimizing BIC. The model was limited to 4 main effects by design, and interactions were assessed after the main effects were selected. AUC, sensitivity, and specificity were calculated for the saturated model and the selected model, with Youden's index used to choose the sensitivity and specificity cutpoint.
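To make the evaluation step concrete, here is a minimal Python sketch of AUC and a Youden's index cutpoint (J = sensitivity + specificity - 1). It assumes predicted probabilities from a fitted logistic model are already available; the scores and labels below are made-up toy values, and the helper names are hypothetical. AUC is computed as the probability that a random positive scores above a random negative, with ties counted as one half.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutpoint(scores, labels):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


# Toy predicted probabilities and outcomes (not study data).
SCORES = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
LABELS = [0, 0, 0, 1, 1, 0, 1, 1]

print(auc(SCORES, LABELS))              # 0.875
print(youden_cutpoint(SCORES, LABELS))  # (0.75, 0.4, 1.0, 0.75)
```

Because Youden's index weights sensitivity and specificity equally, its cutpoint need not match clinical targets that demand high sensitivity, such as the 90% sensitivity floor set for this rule.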

Predictors selected by each method: Lasso alone (8), Random Forest alone (0), Boosting alone (0), Random Forest and Lasso (3), Lasso and Boosting (5), Random Forest and Boosting (2), all three (4).
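The Venn diagram counts can be tallied to check them against the reported total. In the Python sketch below, `venn_regions` (a hypothetical helper) assigns each variable to the exact combination of methods that selected it; the poster's region counts are then summed, assuming the labels are read as Random Forest and Lasso (3), Lasso and Boosting (5), and Random Forest and Boosting (2). The seven regions sum to 22, matching the twenty-two predictors reported for the consensus strategy, so under this reading the consensus set is the union of all three methods' selections. The per-method totals are derived arithmetic, not figures stated in the poster.

```python
from collections import Counter

METHODS = ("Random Forest", "Lasso", "Boosting")


def venn_regions(selections: dict) -> Counter:
    """Count variables by the exact set of methods that selected each one."""
    all_vars = set().union(*selections.values())
    return Counter(
        frozenset(m for m, chosen in selections.items() if v in chosen)
        for v in all_vars
    )


# Toy selections with hypothetical variable names, to show the tally.
TOY = {"Random Forest": {"x1", "x2"}, "Lasso": {"x1", "x3"}, "Boosting": {"x1"}}
print(venn_regions(TOY)[frozenset(METHODS)])  # 1

# Region counts as read from the poster (variable names were not reported).
POSTER_REGIONS = {
    frozenset({"Lasso"}): 8,
    frozenset({"Random Forest"}): 0,
    frozenset({"Boosting"}): 0,
    frozenset({"Random Forest", "Lasso"}): 3,
    frozenset({"Lasso", "Boosting"}): 5,
    frozenset({"Random Forest", "Boosting"}): 2,
    frozenset(METHODS): 4,
}

union_size = sum(POSTER_REGIONS.values())
per_method = {
    m: sum(n for region, n in POSTER_REGIONS.items() if m in region) for m in METHODS
}
print(union_size)  # 22
print(per_method)  # {'Random Forest': 9, 'Lasso': 20, 'Boosting': 11}
```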