1
Postoperative neonatal mortality prediction using superlearning
Jennifer Cooper, PhD, Peter C. Minneci, MD, MHSc, and Katherine J. Deans, MD, MHSc Center for Surgical Outcomes Research and Center for Innovation in Pediatric Practice The Research Institute at Nationwide Children’s Hospital
2
Background
Logistic regression models have traditionally been the exclusive method used for clinical prediction modeling.
Logistic regression imposes stringent, parametric constraints on the relationship between predictors and the probability of an outcome.
Superlearning is an ensemble machine learning method that uses cross-validation to select the optimal algorithm from among all weighted combinations of a set of candidate algorithms.1,2

1. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol 2007;6:25.
2. Polley EC, van der Laan MJ. SuperLearner: Super Learner Prediction. R package, 2014.
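As a concrete illustration, here is a minimal sketch of fitting a super learner with the SuperLearner R package cited above. The data frame dat, the outcome column died30, and predictor_cols are hypothetical placeholders, and this candidate library is illustrative rather than the authors' exact specification.

# Minimal super learner sketch using the SuperLearner package (reference 2).
# `dat`, `died30`, and `predictor_cols` are hypothetical placeholders.
library(SuperLearner)

sl_library <- c("SL.glm", "SL.glmnet", "SL.gbm", "SL.randomForest", "SL.rpart")

fit <- SuperLearner(
  Y          = dat$died30,             # binary 30-day mortality indicator
  X          = dat[, predictor_cols],  # preoperative predictors
  family     = binomial(),
  SL.library = sl_library,
  method     = "method.NNLS",          # non-negative least squares weighting
  cvControl  = list(V = 10)            # 10-fold cross-validation
)

fit$coef  # cross-validated weight assigned to each candidate algorithm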
3
Objective
To develop and validate a clinical prediction model, using superlearning, for 30-day postoperative mortality in neonates

The objective of this study was to develop and validate a clinical prediction model, using superlearning, for 30-day postoperative mortality in neonates. We chose this problem because the neonatal surgery patient population is a complex, high-risk population, and the mechanisms causing mortality in these patients are highly varied. We therefore thought that a simple parametric prediction model would be unlikely to perform well.
4
Methods
Used National Surgical Quality Improvement Program-Pediatric (NSQIP-Pediatric) data
Development sample: patients treated before 2014 (N=6499, 3.6% mortality)
Validation sample: patients treated in 2014 (N=3552, 3.8% mortality)
211 preoperative predictors and 14 candidate algorithms: 2 stepwise logistic regression models, 3 penalized logistic regression models, 2 generalized boosted regression models, 5 random forest models, and 2 classification tree models
Repeated the analysis after removing predictors with p>0.20 in bivariate analysis (sketched in the code below)
Examined discrimination (AUC) and calibration (calibration intercept and slope) of the superlearner and all constituent algorithms in both the development and validation datasets
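The 14-algorithm library and the screened re-analysis could be expressed in SuperLearner along the following lines; the tuning values and the p<0.20 screening wrapper below are illustrative assumptions, not the authors' exact settings.

# Hypothetical sketch of tuning variants and univariate screening.
# Random forest variants differing in mtry and minimum node size
# (tuning values here are illustrative).
rf_variants <- create.Learner("SL.randomForest",
                              tune = list(mtry = c(10, 50),
                                          nodesize = c(1, 5)))

# Screening wrapper: keep predictors whose bivariate association with
# the outcome has p < 0.20, by loosening screen.corP's default threshold.
screen.P20 <- function(...) screen.corP(..., minPvalue = 0.20)

# Pair each candidate algorithm with the screener for the repeated analysis,
# then pass sl_library_screened to SuperLearner() exactly as before.
sl_library_screened <- lapply(
  c("SL.step", "SL.step.forward", "SL.glmnet", "SL.gbm",
    rf_variants$names, "SL.rpart"),
  function(alg) c(alg, "screen.P20"))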
5
Preoperative patient characteristics
Development sample; values are n (%) or median (IQR).

Characteristic                                       Survived 30 days (N=6267)   Died within 30 days (N=232)
Age at surgery (days)                                16 (3-53)                   14 (7-28)
Gestational age at surgery (weeks)                   39 (36-41)                  33 (29-38)
Female                                               2405 (38.4)                 103 (44.4)
Race: White                                          4041 (64.5)                 131 (56.5)
Race: Black                                          968 (15.4)                  53 (22.8)
Race: Asian                                          136 (2.2)                   5 (2.2)
Race: Other/Unknown                                  1122 (17.9)                 43 (18.5)
Weight at surgery (kg)                               3.09 ( )                    1.60 ( )
Ventilator dependent                                 1503 (24.0)                 201 (86.6)
Bronchopulmonary dysplasia or chronic lung disease   978 (15.6)                  57 (24.6)
Oxygen support                                       1656 (26.4)                 169 (72.8)
Structural pulmonary/airway abnormality              814 (13.0)                  54 (23.3)
Esophageal, gastric, or intestinal disorder          3897 (62.2)                 156 (67.2)
Any cardiac risk factor                              2339 (37.3)                 123 (53.0)
Nutritional support                                  2691 (42.9)                 160 (69.0)
Hematologic disorder                                 784 (12.5)                  78 (33.6)
Congenital malformation                              2855 (45.6)                 83 (35.8)
Most common principal procedures:
  Pyloromyotomy                                      687 (11.0)                  0 (0)
  Creation of VP shunt                               362 (5.8)                   3 (1.3)
  Repair of large omphalocele or gastroschisis       329 (5.2)                   11 (4.7)

This table shows several key demographic and preoperative clinical characteristics at the time of surgery. Patient characteristics were similar in the validation sample. As you can see, there were numerous differences between patients who did and did not die within 30 days of their surgery. Those who died were on average younger, in terms of both chronological age and gestational age at surgery, were more often female, more often black, weighed less at surgery, were much more likely to have respiratory problems, and were more likely to have several particular types of comorbidities at the time of surgery. The most common surgical procedures performed in the total study sample were pyloromyotomies, VP shunt procedures, and repairs of omphalocele or gastroschisis, which are abdominal wall defects.
6
Model discrimination
Discrimination of the superlearner, as measured by the area under the receiver operating characteristic curve (AUC), was excellent. Discrimination of most of the other algorithms was also excellent, ranging from 0.87 for stepwise regression with forward selection to 0.91 for the random forests with 50 variables randomly selected for consideration at each node and a minimum node size of either 1 or 5. The only exceptions were the single classification trees, which each had a very low AUC of only 0.59.
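One way to obtain these AUCs from a fitted SuperLearner object, assuming fit from the earlier sketch and y as the observed outcome vector, is with the pROC package; the authors' actual software is not stated.

# AUC of the ensemble and of each constituent algorithm (pROC package).
library(pROC)

sl_auc <- auc(roc(y, as.vector(fit$SL.predict)))  # ensemble AUC

# One AUC per candidate algorithm (columns of the library predictions)
apply(fit$library.predict, 2, function(p) auc(roc(y, p)))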
7
Model calibration
As this figure shows, the calibration of the superlearner was excellent in the development sample, with the predicted and observed probabilities of mortality lining up quite well. At 0.36 and 1.16, the calibration intercept and slope were very close to their optimal values of 0 and 1, respectively.
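Calibration intercepts and slopes of this kind are conventionally estimated by logistic recalibration, sketched below; p and y are assumed to hold the predicted probabilities and observed outcomes, and this is the standard approach rather than necessarily the authors' exact code.

# Logistic recalibration: slope and intercept of observed vs. predicted risk.
lp <- qlogis(p)  # linear predictor: log-odds of the predicted probabilities

# Calibration slope: coefficient on lp when refitting the outcome (ideal: 1)
slope <- coef(glm(y ~ lp, family = binomial))["lp"]

# Calibration intercept: refit with lp held fixed as an offset (ideal: 0)
intercept <- coef(glm(y ~ offset(lp), family = binomial))["(Intercept)"]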
8
Results in validation sample
Good discrimination: AUC 0.87 (95% CI )
Poor calibration: U statistic 0.064, p<0.001
All results were similar after variable screening

The discrimination of the superlearner was also quite good in the validation dataset, with an AUC of 0.87. As this figure shows, however, the calibration of our superlearner algorithm was unfortunately not good in the validation sample: it overestimated the risk of mortality among patients at low risk and underestimated it among patients at the highest risk.
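A validation summary of this form, including a calibration plot, an AUC, and an unreliability U statistic with its p-value, can be produced with val.prob from the rms R package; that the authors used this function is an assumption, and p_valid and y_valid are hypothetical validation-sample predictions and outcomes.

# Validation-sample check with rms::val.prob (assumed tooling).
library(rms)

stats <- val.prob(p_valid, y_valid)  # also draws the calibration plot
stats[c("C (ROC)", "Intercept", "Slope", "U", "U:p")]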
9
Conclusions
Superlearning provided improved or equivalent accuracy compared to individual regression and machine learning algorithms for predicting neonatal surgical mortality in the development sample, but showed poor calibration in the validation sample
Superlearning offers a flexible alternative to other nonparametric methods because it can include as many candidate algorithms as desired and will perform at least as well as the best individual algorithm in its library
Superlearning should be considered for prediction in large datasets whenever complex mechanisms make parametric modeling assumptions unrealistic
Although the super learner will perform no worse than the best constituent algorithm in a training dataset, there is no guarantee it will perform well in a validation dataset
Poor calibration in our validation data may have been due to 11 new hospitals joining the NSQIP-Pediatric program in 2014

In conclusion, in our validation study we believe that the poor calibration may have been due to the 11 new hospitals, and of course many more new surgeons at those hospitals, that joined the NSQIP-Pediatric program in 2014. Because no hospital or surgeon characteristics are available in the NSQIP dataset that is released to researchers, it is impossible to include these variables in the prediction algorithms. Changes in the distributions of these characteristics over time can therefore clearly decrease the accuracy of a prediction model developed on earlier data from a smaller set of hospitals and surgeons.
10
Contact Information Jennifer Cooper, PhD