1
A machine learning approach to prognostic and predictive covariate identification for subgroup analysis
David A. James and David Ohlssen
Advanced Exploratory Analytics, Novartis Pharmaceuticals
Joint Statistical Meetings, July 2018
2
Objectives
Use of machine learning for:
- Discovering and exploring prognostic and predictive subgroups
- Patient risk stratification
- Risk prediction
Two examples from large cardiovascular trials
Note: non-confirmatory setting
3
Patient stratification by relative risk
[Tree figure; legend: relative risk, #events / #patients, % patients per node]
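To make the stratification concrete, below is a minimal sketch in R of this kind of tree-based risk stratification, using rpart. The dataset `cv_trial` and its columns (`event`, `age`, `heartfn`, `egfr`, `diabetes`) are hypothetical stand-ins, not the actual trial data.

```r
## Hypothetical sketch: grow a classification tree on a binary event
## and stratify patients by each terminal node's relative risk.
library(rpart)

fit <- rpart(factor(event) ~ age + heartfn + egfr + diabetes,
             data = cv_trial, method = "class",
             control = rpart.control(cp = 0.005, xval = 10))

## fit$where maps each patient to a terminal node (risk stratum)
node_rate <- tapply(cv_trial$event, fit$where, mean)  # event rate per node
overall   <- mean(cv_trial$event)                     # overall event rate
round(node_rate / overall, 2)  # relative risk of each stratum
```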
4
Interrogating the tree-building process
What are the top competing predictors for splitting each node?
How much better was the winning splitting predictor than the 2nd, 3rd, ..., 5th contenders?
Which predictors could be used for imputing missing data?
These investigations complement the usual assessments such as cross-validation error estimates, ROC analysis, and the C-index (AUC). A sketch of how to pull this information from a fitted tree follows below.
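In rpart (continuing the hypothetical `fit` from the earlier sketch), the competing and surrogate splits are recorded at fit time and can be inspected directly:

```r
## summary() lists, for every node, the top competing ("primary")
## splits and the surrogate splits usable for missing-data imputation.
summary(fit)

## printcp()/plotcp() give the usual cross-validated error estimates.
printcp(fit)
plotcp(fit)

## How many competitors/surrogates are retained is set at fit time via
## rpart.control(maxcompete = 5, maxsurrogate = 5).
```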
5
Competing splits at each non-terminal node (Baseline predictors)
Here we go into the details of the tree construction. First we note the final tree (displayed in abbreviated form) and the most important variables identified by the tree among all ~60 candidate predictors. Then we note the top 5 predictors for splitting the root node (node 1). The bottom-left panel shows the change in deviance at the top node (versus the sum of the deviances in the daughter nodes), on the order of 145 units out of 8452 in the parent node. The bottom-right panel displays the "surrogate" variables: the covariates most correlated with the primary split "heartfn". This conveys a measure of collinearity between the predictors involved in the splitting of node 1.
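The quantities discussed here (the deviance improvement of each candidate split and the agreement of each surrogate with the primary split) can also be read off the fitted rpart object; a sketch, again assuming the hypothetical `fit` from above:

```r
## One row per candidate split at each node; 'improve' is the reduction
## in deviance achieved by that split, and for surrogate rows 'adj' is
## the adjusted agreement with the primary split (a collinearity-like
## measure between the predictors involved).
head(fit$splits)   # columns: count, ncat, improve, index, adj
```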
8
Searching for predictive factors: model-based (mob) partitioning trees
Objective: assess whether baseline covariates are "predictive".
Methods (a minimal sketch follows below):
- Quantify how much each baseline covariate changes the estimated treatment effects
- Use mob trees to split patients into subgroups according to those baseline covariates that affect the magnitude of the overall treatment effect
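"Mob" here refers to model-based recursive partitioning; a minimal sketch using partykit's glmtree, with hypothetical data and variable names (`cv_trial`, `trt` as the treatment indicator):

```r
## Hypothetical sketch: fit outcome ~ treatment within each node, and
## partition on baseline covariates whose values destabilize the
## estimated treatment effect.
library(partykit)

mob_fit <- glmtree(event ~ trt | age + heartfn + egfr + diabetes,
                   data = cv_trial, family = binomial,
                   alpha = 0.05,    # significance level for splitting
                   minsize = 500)   # minimum patients per subgroup

plot(mob_fit)   # treatment effect estimated separately in each subgroup
coef(mob_fit)   # per-node intercept and treatment coefficient
```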
9
Predicting risk: random survival forests vs. extended Cox
- Performance: 2-year predictions (C-index, calibration plots)
- Nelson-Aalen estimate of survival
- "Out-of-bag" ensemble estimator
A sketch of the comparison appears below.
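A minimal sketch of this comparison in R, assuming the randomForestSRC package for the forest and survival for the Cox model; `cv_trial`, `time`, `event`, and the covariates remain hypothetical placeholders:

```r
library(randomForestSRC)
library(survival)

## Random survival forest; predictions are out-of-bag ensemble
## estimates built from Nelson-Aalen-type terminal-node estimators.
rsf <- rfsrc(Surv(time, event) ~ ., data = cv_trial,
             ntree = 1000, nodesize = 15)
rsf$err.rate[rsf$ntree]   # OOB error = 1 - Harrell's C-index

## Extended Cox model as the comparator
cox <- coxph(Surv(time, event) ~ age + heartfn + egfr + diabetes,
             data = cv_trial)
summary(cox)$concordance  # C-index for the Cox fit
```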