Chaoran Hu1,4, Xiao Tan2,4, Qing Pan3, Yong Ma4, Jaejoon Song4

Presentation transcript:

Random Forests for Exploring Factors Driving Opioid Prescribing in National Outpatient Health Care Data Using Complex Survey Design
Chaoran Hu1,4, Xiao Tan2,4, Qing Pan3, Yong Ma4, Jaejoon Song4
1 University of Connecticut, Department of Statistics
2 George Mason University, Department of Statistics
3 George Washington University, Department of Statistics
4 U.S. Food and Drug Administration, Center for Drug Evaluation and Research
Joint Statistical Meetings, 2019

Disclaimer
This presentation reflects the views of the author and should not be construed to represent FDA’s views or policies.

Background and study goals
The opioid crisis
- Reducing unnecessary prescriptions is key
- Need to understand opioid prescription patterns and identify important predictors of opioid prescription
The NAMCS survey
- Nationwide complex survey conducted by CDC
Penalized logistic regression (PLR) with complex survey data
- LASSO with weighted logistic regression
Random Forest (RF) with complex survey data
- Weighted RF
Goals
- Compare the results of PLR with RF in complex survey data
- Evaluate performance using cross-validation
www.fda.gov
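The slide names "LASSO with weighted logistic regression" without saying how the weights enter the fit. One common approach, assumed here, is to weight each observation's log-likelihood contribution by its sampling weight and add an L1 penalty. The sketch below does this with plain proximal gradient descent on simulated data. The function names, step size, iteration count, and toy data are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_weighted_lasso_logistic(X, y, weights, lam, step=0.5, n_iter=300):
    """Minimise the weight-normalised logistic loss plus lam * ||beta||_1.

    Runs a fixed number of proximal gradient steps (no convergence check).
    The intercept is left unpenalised.
    """
    n_features = len(X[0])
    total_weight = sum(weights)
    intercept = 0.0
    beta = [0.0] * n_features
    for _ in range(n_iter):
        grad_intercept = 0.0
        grad_beta = [0.0] * n_features
        for x_i, y_i, w_i in zip(X, y, weights):
            linear = intercept + sum(b * v for b, v in zip(beta, x_i))
            residual = w_i * (sigmoid(linear) - y_i) / total_weight
            grad_intercept += residual
            for j, v in enumerate(x_i):
                grad_beta[j] += residual * v
        intercept -= step * grad_intercept
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad_beta)]
    return intercept, beta


# Simulated toy data (hypothetical): only the first covariate drives the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(-0.5 + 1.5 * row[0]) else 0 for row in X]
weights = [rng.uniform(0.5, 3.0) for _ in X]  # stand-in sampling weights

for lam in (0.0, 0.05, 10.0):
    _, beta = fit_weighted_lasso_logistic(X, y, weights, lam)
    print(lam, [round(b, 3) for b in beta])
```

Larger values of `lam` push more coefficients to exactly zero, which is what makes the LASSO usable for variable selection. A weighted fit of this kind gives weighted point estimates only; it does not account for strata or clusters when assessing uncertainty.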

Data description
- 2016 National Ambulatory Medical Care Survey (NAMCS) data [1]
- Data collected using a complex survey design, simplified here to stratified two-stage sampling (see figure on the right)
- 10,031 observations
- Response variable: opioid prescription (binary)
- Covariates: 190 deemed relevant; after removing highly correlated covariates, 177 were used, including number of medications other than opioids, physician specialty, tobacco use, total number of chronic conditions, and others
- Sampling weights: at patient level
Figure labels: Stratum: geographical regions; Cluster: physicians; Weight: patients
[1] https://www.cdc.gov/nchs/ahcd/index.htm
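To illustrate why the sampling weights matter, the following minimal sketch compares an unweighted proportion with a weighted one (the ratio of weighted totals). The records, stratum names, physician identifiers, and weight values are made up for illustration and are not NAMCS data.

```python
from collections import defaultdict


def weighted_proportion(outcomes, weights):
    """Weighted share of records with outcome == 1 (ratio of weighted totals)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Hypothetical toy records: (stratum, physician_id, opioid_prescribed, weight)
visits = [
    ("Northeast", "p1", 1, 500.0),
    ("Northeast", "p1", 0, 1500.0),
    ("Northeast", "p2", 0, 4000.0),
    ("South", "p3", 1, 1000.0),
    ("South", "p4", 0, 2000.0),
    ("South", "p4", 1, 1000.0),
]

outcomes = [v[2] for v in visits]
weights = [v[3] for v in visits]

unweighted = sum(outcomes) / len(outcomes)         # treats every record equally
weighted = weighted_proportion(outcomes, weights)  # each record counts by its weight

# The same estimator applied within each stratum (geographical region).
by_stratum = defaultdict(lambda: ([], []))
for stratum, _, outcome, weight in visits:
    by_stratum[stratum][0].append(outcome)
    by_stratum[stratum][1].append(weight)
stratum_estimates = {s: weighted_proportion(o, w) for s, (o, w) in by_stratum.items()}

print(unweighted, weighted, stratum_estimates)
```

In this toy example, half of the records are opioid prescriptions, but they carry small weights, so the weighted estimate is lower. The sketch covers point estimates only; standard errors under a stratified two-stage design would also need the stratum and cluster (physician) identifiers.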

Comparison of LASSO and Random Forest (RF)
- The Y axis lists variables sorted by importance from the RF model (most important at the bottom)
- The X axis shows increasing λ in the LASSO model; the shaded area marks variables remaining in the model
- A perfect match between LASSO and RF would show a shaded area covering the lower 45 degree region
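The figure described above can be thought of as a grid with one row per variable and one column per λ value. The sketch below builds a text version of such a grid and checks whether, at every λ, the variables LASSO keeps are exactly the top-ranked RF variables. The importance scores, λ values, and selected sets are hypothetical, and the agreement check is only one possible reading of a "perfect match".

```python
def agreement_grid(rf_importance, lasso_selected):
    """Rows run from least important (top) to most important (bottom).

    Columns run over increasing lambda; '#' marks a variable still in the
    LASSO model at that lambda, '.' marks one that has dropped out.
    """
    order = sorted(rf_importance, key=rf_importance.get)
    lambdas = sorted(lasso_selected)
    return [
        (var, "".join("#" if var in lasso_selected[lam] else "." for lam in lambdas))
        for var in order
    ]


def rankings_agree(rf_importance, lasso_selected):
    """True if, at every lambda, LASSO keeps exactly the top-k RF variables."""
    ranked = sorted(rf_importance, key=rf_importance.get, reverse=True)
    return all(
        set(ranked[:len(kept)]) == set(kept) for kept in lasso_selected.values()
    )


# Hypothetical RF importance scores (larger means more important).
rf_importance = {
    "n_other_medications": 0.40,
    "physician_specialty": 0.30,
    "n_chronic_conditions": 0.20,
    "tobacco_use": 0.10,
}
# Hypothetical LASSO results: lambda -> variables with nonzero coefficients.
lasso_selected = {
    0.01: {"n_other_medications", "physician_specialty",
           "n_chronic_conditions", "tobacco_use"},
    0.05: {"n_other_medications", "physician_specialty", "tobacco_use"},
    0.10: {"n_other_medications", "physician_specialty"},
    0.20: {"n_other_medications"},
}

grid = agreement_grid(rf_importance, lasso_selected)
for var, cells in grid:
    print(f"{var:22s} {cells}")
print("rankings agree:", rankings_agree(rf_importance, lasso_selected))
```

Here the two methods disagree at λ = 0.05, where the hypothetical LASSO fit keeps `tobacco_use` instead of the more important `n_chronic_conditions`, so the marked area is not a clean lower triangle.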

Comparison of LASSO and Random Forest via cross-validation
- Overall classification error rate
- ROC curves
- Cross-validation: 2/3 of the original data was used to fit a LASSO or RF model and the remaining 1/3 was used for validation; this was repeated 500 times
- Overall, Random Forest performs better than LASSO (RF AUC of 0.83 vs. LASSO AUC of 0.81)
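The repeated 2/3 versus 1/3 validation scheme can be sketched as follows. The AUC is computed as the share of positive/negative pairs that the score ranks correctly. The model is a deliberately simple stand-in (a difference-of-class-means score), not LASSO or RF. The data are simulated and the splits are simple random splits, since the slides do not say whether the splits respected the survey clusters or weights.

```python
import random


def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X_train, y_train):
    """Hypothetical stand-in model: score = x . (mean of positives - mean of negatives)."""
    n_features = len(X_train[0])
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    direction = [
        sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        for j in range(n_features)
    ]
    return lambda x: sum(d * v for d, v in zip(direction, x))


def repeated_holdout_auc(X, y, fit, n_repeats=500, train_fraction=2 / 3, seed=0):
    """Mean validation AUC over repeated random train/validation splits."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_train = round(train_fraction * len(y))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, valid = indices[:n_train], indices[n_train:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        valid_labels = [y[i] for i in valid]
        if len(set(valid_labels)) < 2:
            continue  # AUC is undefined without both classes
        aucs.append(auc(valid_labels, [predict(X[i]) for i in valid]))
    return sum(aucs) / len(aucs)


# Simulated toy data (hypothetical): the first covariate is informative.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(150)]
y = [1 if x[0] + data_rng.gauss(0, 1) > 0 else 0 for x in X]

mean_auc = repeated_holdout_auc(X, y, fit_mean_difference, n_repeats=50)
print(round(mean_auc, 3))
```

Passing a different `fit` function would let two models be compared on the same splits, since the split sequence depends only on `seed`.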

Conclusions
- Random Forest and LASSO produced similar but not identical results. The differences can be explained by the following factors:
  - Functional form of the continuous variables
  - Inclusion of interaction terms
  - Handling of highly correlated covariates
- RF performs better than LASSO in terms of AUC, sensitivity, and overall classification rate at a cutoff of 0.5.
- RF can be used as a tool to build a better LASSO regression model, or vice versa. Alternatively, a Super Learner approach can be used to combine both methods [1].
[1] Van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:Article25.

Acknowledgement
This study was made possible by funding from FDA’s Regulatory Science and Review Enhancement Program (RSR, FY 2019). This project was also supported in part by an appointment to the Oak Ridge Institute for Science and Education (ORISE) Research Participation Program at FDA/CDER, administered by ORISE through an interagency agreement between the U.S. Department of Energy and FDA/CDER.