Random Forests for Exploring Factors Driving Opioid Prescribing in National Outpatient Health Care Data Using Complex Survey Design
Chaoran Hu1,4, Xiao Tan2,4, Qing Pan3, Yong Ma4, Jaejoon Song4
1 University of Connecticut, Department of Statistics
2 George Mason University, Department of Statistics
3 George Washington University, Department of Statistics
4 U.S. Food and Drug Administration, Center for Drug Evaluation and Research
Joint Statistical Meetings, 2019
Disclaimer
This presentation reflects the views of the authors and should not be construed to represent FDA’s views or policies.
Background and study goals
The opioid crisis
  Reducing unnecessary prescriptions is key.
  We need to understand opioid prescribing patterns and identify important predictors of opioid prescription.
The NAMCS survey
  A nationwide complex survey conducted by the CDC.
Penalized logistic regression (PLR) with complex survey data
  LASSO with weighted logistic regression.
Random Forest (RF) with complex survey data
  Weighted RF.
Goals
  Compare the results of PLR with those of RF on complex survey data.
  Evaluate performance using cross-validation.
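The slides do not write out the objective behind "LASSO with weighted logistic regression". One common formulation, given here only as an assumed sketch, is a weighted (pseudo-)log-likelihood with an L1 penalty, where w_i is the sampling weight of visit i, y_i the binary opioid indicator, x_i the covariate vector, and λ the penalty parameter. Software packages differ in how they normalize the weights and the likelihood term, so the scale of λ is not comparable across implementations.

```latex
\[
\hat{\beta}(\lambda)
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\sum_{i=1}^{n} w_i \left[ y_i \eta_i - \log\!\left(1 + e^{\eta_i}\right) \right]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\},
\qquad
\eta_i = \beta_0 + x_i^{\top}\beta .
\]
```

Larger values of λ shrink more coefficients exactly to zero, which is what makes the fit usable for variable selection.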
Data description
2016 National Ambulatory Medical Care Survey (NAMCS) data1
Data collected under a complex survey design, simplified here to stratified two-stage sampling (see figure).
10,031 observations.
Response variable: opioid prescription (binary).
Covariates: 190 were deemed relevant; after removing highly correlated covariates, 177 were used, including the number of medications other than opioids, physician specialty, tobacco use, and total number of chronic conditions.
Sampling weights: at the patient level.
[Figure: sampling design. Stratum: geographical regions. Cluster: physicians. Weight: patients.]
________
1.
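To illustrate why patient-level sampling weights matter, here is a minimal sketch of a weighted proportion, sum(w*y) / sum(w), next to the unweighted one. The four records, the physician identifiers, and the weight values are invented for illustration and are not NAMCS data; the function name is also hypothetical.

```python
def weighted_proportion(outcomes, weights):
    """Weighted proportion of a binary outcome: sum(w * y) / sum(w)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Hypothetical toy records: (stratum, cluster, sampling weight, opioid prescribed)
visits = [
    ("Northeast", "phys_01", 1200.0, 1),
    ("Northeast", "phys_01", 1200.0, 0),
    ("South", "phys_02", 3000.0, 0),
    ("South", "phys_03", 600.0, 1),
]

y = [record[3] for record in visits]
w = [record[2] for record in visits]

unweighted = sum(y) / len(y)          # every visit counts equally
weighted = weighted_proportion(y, w)  # every visit counts by its weight

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this toy example the two estimates differ (0.50 versus 0.30) because the heavily weighted visit has no opioid prescription. Variance estimation under the stratified two-stage design needs the stratum and cluster labels as well and is not shown here.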
Comparison of LASSO and Random Forest (RF)
[Figure: variables retained by LASSO compared with RF variable importance.]
Y axis: variables sorted by importance from the RF model, with the most important at the bottom.
X axis: increasing λ in the LASSO model; shading marks the variables that remain in the model at each λ.
A perfect match between LASSO and RF would show the shaded area covering the lower 45-degree region.
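The visual comparison described above can also be summarized numerically. The sketch below is one possible summary, not the authors' method: for each λ it takes the k variables LASSO retains and measures what fraction of them are also among the top k variables in the RF importance ranking. A perfect match gives 1.0 at every λ. The function name, variable names, and λ values are hypothetical.

```python
def top_k_agreement(rf_ranking, lasso_path):
    """Compare LASSO selections with the RF importance ranking.

    rf_ranking: variable names, most important first.
    lasso_path: iterable of (lambda, set of variables retained at that lambda).
    Returns a list of (lambda, k, overlap fraction), sorted by increasing lambda.
    """
    rows = []
    for lam, selected in sorted(lasso_path, key=lambda item: item[0]):
        k = len(selected)
        top_k = set(rf_ranking[:k])
        # An empty selection trivially agrees with an empty top-k set.
        overlap = len(top_k & set(selected)) / k if k else 1.0
        rows.append((lam, k, overlap))
    return rows


# Hypothetical inputs for illustration only.
rf_ranking = ["n_other_meds", "specialty", "tobacco", "n_chronic"]
lasso_path = [
    (0.01, {"n_other_meds", "specialty", "tobacco", "n_chronic"}),
    (0.05, {"n_other_meds", "specialty", "n_chronic"}),
    (0.10, {"n_other_meds"}),
]

agreement = top_k_agreement(rf_ranking, lasso_path)
for lam, k, overlap in agreement:
    print(f"lambda={lam:.2f}  retained={k}  agreement with RF top-{k}: {overlap:.2f}")
```

Here the middle λ shows partial agreement because LASSO keeps n_chronic while RF ranks tobacco higher, which is the kind of discrepancy that appears in the plot as shading outside the lower triangular region.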
Comparison of LASSO and Random Forest via cross-validation
[Figure panels: overall classification error rate; ROC curves.]
Cross-validation: two-thirds of the original data were used to fit a LASSO or RF model and the remaining one-third was used for validation; this was repeated 500 times.
Overall, Random Forest performs better than LASSO (AUC of 0.83 for RF vs. 0.81 for LASSO).
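The validation scheme above (random 2/3 and 1/3 splits, repeated 500 times, scored by AUC) can be sketched in plain Python as follows. This is an assumed illustration: the slides do not say whether splits respected strata or clusters, or whether AUC was computed with sampling weights, so the optional weights argument and both function names are hypothetical. The pairwise AUC is quadratic in the number of observations and is meant for clarity, not speed.

```python
import random


def weighted_auc(labels, scores, weights=None):
    """AUC as the weighted share of (positive, negative) pairs in which the
    positive has the higher score; tied scores count as half."""
    if weights is None:
        weights = [1.0] * len(labels)
    pos = [(s, w) for y, s, w in zip(labels, scores, weights) if y == 1]
    neg = [(s, w) for y, s, w in zip(labels, scores, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    concordant = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            if s_pos > s_neg:
                concordant += w_pos * w_neg
            elif s_pos == s_neg:
                concordant += 0.5 * w_pos * w_neg
    return concordant / (sum(w for _, w in pos) * sum(w for _, w in neg))


def repeated_holdout(n_obs, n_repeats=500, train_fraction=2 / 3, seed=0):
    """Yield (train_indices, validation_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_train = round(n_obs * train_fraction)
    for _ in range(n_repeats):
        indices = list(range(n_obs))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# Toy check of the AUC function with made-up labels and scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(weighted_auc(labels, scores))                        # unweighted
print(weighted_auc(labels, scores, [1.0, 1.0, 3.0, 1.0]))  # weighted
```

In use, each split from repeated_holdout would fit a model on the training indices, score the validation indices, and record the AUC; averaging over the repeats gives a summary comparable in spirit to the figures reported on the slide.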
Conclusions
Random Forest and LASSO produced broadly similar but not identical results. The differences can be explained by the following factors:
  Functional form of the continuous variables.
  Inclusion of interaction terms.
  Handling of highly correlated covariates.
RF performs better than LASSO in terms of AUC, sensitivity, and overall classification rate at a cutoff of 0.5.
RF can be used as a tool to build a better LASSO regression model, or vice versa.
Alternatively, a Super Learner approach can be used to combine the two methods1.
_______________________
1 Van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:Article 25.
Acknowledgement
This study was made possible by funding from FDA’s Regulatory Science and Review Enhancement Program (RSR, FY 2019). This project was also supported in part by an appointment to the Oak Ridge Institute for Science and Education (ORISE) Research Participation Program at FDA/CDER, administered by ORISE through an interagency agreement between the U.S. Department of Energy and FDA/CDER.