A machine learning approach to prognostic and predictive covariate identification for subgroup analysis David A. James and David Ohlssen Advanced Exploratory.

Presentation transcript:

A machine learning approach to prognostic and predictive covariate identification for subgroup analysis David A. James and David Ohlssen Advanced Exploratory Analytics Novartis Pharmaceuticals Joint Statistical Meetings July 2018

Objectives. Use of machine learning for: discovering and exploring prognostic and predictive subgroups; patient risk stratification; risk prediction. Two examples from large cardiovascular trials. Note: non-confirmatory setting.

Patient stratification by relative risk. [Figure not reproduced. Legend: relative risk; #events / #patients; % of patients.]
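The three legend quantities can be illustrated with a minimal sketch. It assumes that relative risk is measured against the overall event rate in the trial population, which the slide does not state, and every count and subgroup name below is hypothetical.

```python
def stratify_by_relative_risk(nodes):
    """Summarise subgroups given a dict: name -> (events, patients).

    Assumption: relative risk = subgroup event rate / overall event rate.
    """
    total_events = sum(events for events, _ in nodes.values())
    total_patients = sum(patients for _, patients in nodes.values())
    overall_rate = total_events / total_patients

    summary = {}
    for name, (events, patients) in nodes.items():
        summary[name] = {
            "relative_risk": (events / patients) / overall_rate,
            "events_per_patients": f"{events} / {patients}",
            "pct_patients": 100.0 * patients / total_patients,
        }
    return summary


# Hypothetical subgroup counts, for illustration only.
nodes = {"low": (20, 400), "medium": (60, 400), "high": (70, 200)}
summary = stratify_by_relative_risk(nodes)
for name, row in summary.items():
    print(name, round(row["relative_risk"], 2),
          row["events_per_patients"], round(row["pct_patients"], 1))
```

If the original figure used a different reference group (for example the lowest-risk node), only the denominator `overall_rate` would change.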

Interrogating the tree-building process. What are the top competing predictors for splitting each node? How much better was the winning split predictor than the 2nd, 3rd, ..., 5th contenders? What predictors could be used for imputing missing data? Etc. These investigations are in addition to the usual assessments such as cross-validation error estimates, ROC analysis, C-index (AUC), etc.

Competing splits at each non-terminal node (baseline predictors). Here we go into the details of the tree construction. First we note the final tree (displayed in an abbreviated form) and the most important variables identified by the tree among all ~60 candidate predictors. Then we note the top 5 predictors for splitting the root node (node 1). The bottom-left panel shows the change in deviance at the top node (the parent deviance minus the sum of the deviances in the daughter nodes), which is in the range of 145 units out of 8452 in the parent node. The bottom-right panel displays the “surrogate” variables, i.e. the covariates most correlated with the primary split “heartfn”; it conveys a measure of collinearity between the predictors involved in the splitting of node 1.
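The ranking of competing splits by deviance reduction can be sketched as follows. This is an illustrative sketch only: it assumes a binary outcome with binomial deviance and exhaustive midpoint cutpoints, whereas the deviance actually used for the trial endpoint is not stated in the slides. The predictor names `x_strong` and `x_weak` and the data are hypothetical.

```python
import math


def binomial_deviance(y):
    """Deviance of a node whose prediction is the node's event proportion."""
    n = len(y)
    if n == 0:
        return 0.0
    p = sum(y) / n
    if p in (0.0, 1.0):
        return 0.0  # pure node: log-likelihood is zero
    return -2.0 * sum(math.log(p) if yi else math.log(1.0 - p) for yi in y)


def best_split(x, y):
    """Return (deviance reduction, cutpoint) of the best split 'x < cutpoint'."""
    parent = binomial_deviance(y)
    values = sorted(set(x))
    best = (0.0, None)
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        # Parent deviance minus the sum of the daughter-node deviances.
        gain = parent - (binomial_deviance(left) + binomial_deviance(right))
        if gain > best[0]:
            best = (gain, cut)
    return best


def competing_splits(predictors, y, top=5):
    """Rank predictors by the deviance reduction of their best split."""
    rows = [(name, *best_split(x, y)) for name, x in predictors.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


# Hypothetical root-node data: 8 patients, binary outcome.
y = [0, 0, 0, 0, 1, 1, 1, 1]
predictors = {
    "x_strong": [1, 2, 3, 4, 5, 6, 7, 8],
    "x_weak": [1, 8, 2, 7, 3, 6, 4, 5],
}
ranked = competing_splits(predictors, y)
for name, gain, cut in ranked:
    print(f"{name}: deviance reduction {gain:.2f} at cutpoint {cut}")
```

Comparing the first row with the runners-up gives the "how much better was the winner" margin discussed in the slide notes.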

Searching for predictive factors: model-based (mob) partitioning trees. Objective: assess whether baseline covariates are “predictive”. Methods: quantify how much each baseline covariate changes the estimated treatment effect; use mob trees to split patients into subgroups according to those baseline covariates that affect the magnitude of the overall treatment effect.
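The idea of quantifying how a baseline covariate changes the treatment effect can be illustrated with a deliberately simplified stand-in. The sketch below is not the mob algorithm, which relies on parameter-instability tests to select split variables; it only compares a risk-difference treatment effect on either side of a fixed cutpoint. The covariate `age`, the cutpoint, and all records are hypothetical.

```python
def risk_difference(records):
    """Event rate in the treated arm minus event rate in the control arm."""
    treated = [r["event"] for r in records if r["treated"]]
    control = [r["event"] for r in records if not r["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


def effect_modification(records, covariate, cutpoint):
    """Treatment effect below and at/above a cutpoint, and their difference."""
    below = [r for r in records if r[covariate] < cutpoint]
    above = [r for r in records if r[covariate] >= cutpoint]
    effect_below = risk_difference(below)
    effect_above = risk_difference(above)
    return {
        "below": effect_below,
        "above": effect_above,
        "difference": effect_above - effect_below,
    }


def make_arm(events, n, treated, age):
    """Build n hypothetical patient records, the first `events` with an event."""
    return [
        {"event": 1 if i < events else 0, "treated": treated, "age": age}
        for i in range(n)
    ]


# Hypothetical trial: treatment helps younger patients, not older ones.
records = (
    make_arm(2, 10, True, 50) + make_arm(6, 10, False, 50)
    + make_arm(5, 10, True, 70) + make_arm(5, 10, False, 70)
)
result = effect_modification(records, "age", 60)
print(result)
```

A large `difference` suggests the covariate may be predictive; in practice a formal test and adjustment for multiplicity would be needed before drawing any conclusion.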

Predicting risk: random survival forests vs extended Cox. Performance: 2-year predictions (C-index, calibration plots). Nelson-Aalen estimate of survival. “Out-of-bag” ensemble estimator.
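Two of the quantities named on this slide can be sketched in a few lines. The code below is a minimal illustration, not the forest or Cox implementation used in the talk: it computes the Nelson-Aalen cumulative hazard with the derived survival estimate exp(-H(t)), and Harrell's C-index for a vector of risk scores. All follow-up times, event indicators, and risk scores are hypothetical.

```python
import math


def nelson_aalen(times, events):
    """Return [(t, cumulative hazard, exp(-cumulative hazard))] at event times."""
    order = sorted(zip(times, events))
    at_risk = len(order)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        j = i
        deaths = 0
        while j < len(order) and order[j][0] == t:  # handle tied times
            deaths += order[j][1]
            j += 1
        if deaths:
            cum_hazard += deaths / at_risk
            curve.append((t, cum_hazard, math.exp(-cum_hazard)))
        at_risk -= j - i  # events and censorings both leave the risk set
        i = j
    return curve


def c_index(times, events, risk):
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable if i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical data: 1 = event observed, 0 = censored.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
risk = [0.9, 0.5, 0.1, 0.6]  # higher score = higher predicted risk

curve = nelson_aalen(times, events)
cidx = c_index(times, events, risk)
print(curve)
print(cidx)
```

In a random survival forest, out-of-bag predictions would supply the `risk` scores, so that each patient is scored only by trees that did not see that patient during training.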