1 Statistical Considerations for Using Multiple Databases to Build a Biomarker Probability Tool
Shijia Bian, MS (1); Wenting Wang, PhD (1); Nancy Maserejian, ScD (2); Judith Jaeger, PhD (3,4); Robert Engle, MS (1); Timothy Swan, MS (1); James McIninch, PhD (5); Feng Gao, PhD (1)
July 30th, 2018

2 Robust Framework with Intuitive Interpretability and Visualization
Overview
A new simulation framework, combining nested cross-validation with stratified subsampling, was developed to estimate the probability that a biomarker is present. A heat plot was created for intuitive interpretation.

Framework Advantages
- Alleviates problems caused by heterogeneity among the training and testing databases, and facilitates tuning the model for specific target populations (e.g., a particular clinical setting).
- Helps prevent overfitting and increases model robustness and reproducibility.
- Overcomes the randomness introduced by simulation and provides an intuitive interpretation.
- Presents the probability of biomarker presence visually in a heat plot.

Success Criteria
An independent third party will conduct a blinded validation of the model developed with this framework. The validation performance can then be examined for reproducibility and robustness.

3 Framework Incorporated with Nested Cross-Validation & Choice of Modeling Method
[Figures: Simulation Framework; Basic Algorithm]
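The slide's diagrams did not survive in this transcript, so the following is only a minimal sketch of generic nested cross-validation, not the authors' algorithm. It assumes a single quantitative feature, a k-nearest-neighbour probability estimate as a stand-in modeling method, and the Brier score as the tuning criterion; the function names and the synthetic data are all hypothetical.

```python
import random

def knn_prob(train, x, k):
    """Estimate P(biomarker present) as the mean 0/1 label of the k nearest rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def brier(train, test, k):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((knn_prob(train, x, k) - y) ** 2 for x, y in test) / len(test)

def make_folds(rows, n_folds, rng):
    """Shuffle the rows and deal them into n_folds disjoint folds."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def all_but(folds, held_out):
    """Concatenate every fold except the one that is held out."""
    return [row for j, fold in enumerate(folds) if j != held_out for row in fold]

def nested_cv(rows, k_grid, n_outer=5, n_inner=4, seed=0):
    rng = random.Random(seed)
    outer_folds = make_folds(rows, n_outer, rng)
    chosen, scores = [], []
    for i, test in enumerate(outer_folds):
        development = all_but(outer_folds, i)
        inner_folds = make_folds(development, n_inner, rng)

        def inner_score(k):
            # Inner loop: tune k using only the development data.
            return sum(brier(all_but(inner_folds, m), valid, k)
                       for m, valid in enumerate(inner_folds)) / n_inner

        best_k = min(k_grid, key=inner_score)
        chosen.append(best_k)
        # Outer loop: score the tuned model on data never used for tuning.
        scores.append(brier(development, test, best_k))
    return chosen, scores

# Synthetic data (assumption): the biomarker is more likely at larger x.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.uniform(0, 10)
    data.append((x, 1 if data_rng.random() < x / 10 else 0))

k_grid = [1, 5, 15, 35]
chosen_k, outer_scores = nested_cv(data, k_grid)
print("k chosen per outer fold:", chosen_k)
print("outer-fold Brier scores:", [round(s, 3) for s in outer_scores])
```

The key property is that each outer test fold is untouched while the hyperparameter is selected, so the outer scores are not biased by the tuning step.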

4 Model “Averaging” to Overcome Randomness and Help to Develop Heat Plot
Each random split of the data into training, validation, and test sets produced a different optimal trained model at every iteration. Averaging those optimal models gives their consensus: the expected probability of biomarker presence.

A heat plot is a convenient visualization of the probability of biomarker presence that can be readily used in real-world settings.

In addition to the expected probability, other summary statistics, such as the SD and median, can also be computed by model "averaging" and presented in the heat plot.

Example: 1 binary and 2 quantitative features, with a binary response variable (biomarker present or not).

[Figure: averaging]
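One way to picture the averaging step is sketched below. It is an illustration under stated assumptions rather than the presenters' implementation: each iteration fits a k-nearest-neighbour stand-in model on a random 70% subsample, predicts on a fixed grid of two quantitative features, and the per-cell mean, SD, and median are then taken across iterations. The data, the grid, and all names are hypothetical.

```python
import random
import statistics

def knn_prob(train, point, k=15):
    """Mean 0/1 label of the k training rows closest to `point` (squared distance)."""
    nearest = sorted(
        train,
        key=lambda r: (r[0] - point[0]) ** 2 + (r[1] - point[1]) ** 2,
    )[:k]
    return sum(r[2] for r in nearest) / len(nearest)

def averaged_surface(rows, grid_x, grid_y, n_iter=30, frac=0.7, seed=0):
    """Summarize per-cell predictions across models fitted on random subsamples."""
    rng = random.Random(seed)
    per_cell = {(gx, gy): [] for gx in grid_x for gy in grid_y}
    for _ in range(n_iter):
        train = rng.sample(rows, int(frac * len(rows)))  # a new random split
        for cell in per_cell:
            per_cell[cell].append(knn_prob(train, cell))
    return {
        cell: {
            "mean": statistics.mean(preds),      # expected probability
            "sd": statistics.stdev(preds),       # spread across iterations
            "median": statistics.median(preds),
        }
        for cell, preds in per_cell.items()
    }

# Synthetic data (assumption): probability rises with both features.
data_rng = random.Random(1)
rows = []
for _ in range(300):
    x1, x2 = data_rng.uniform(0, 10), data_rng.uniform(0, 10)
    rows.append((x1, x2, 1 if data_rng.random() < (x1 + x2) / 20 else 0))

grid = [1, 3, 5, 7, 9]
surface = averaged_surface(rows, grid, grid)

# Text rendering of the mean surface; a plotting library could color these cells.
for gy in reversed(grid):
    print(" ".join(f"{surface[(gx, gy)]['mean']:.2f}" for gx in grid))
```

Each cell of the printed grid corresponds to one tile of a heat plot; swapping "mean" for "sd" or "median" gives the other summary surfaces mentioned on the slide.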

5 Stratified Sampling in the Framework Can Mimic the Target Populations
The same features are used to predict the probability of biomarker presence (a binary response variable) by applying this framework with 1,000 simulations. The resulting heat plots nevertheless differ because the target populations differ, which the simulation framework accounts for.

[Figure: heat plots of the expected probability of biomarker presence for Target Population 1 and Target Population 2. Labels: Binary Qualitative Feature 1, Quantitative Feature 1, Quantitative Feature 2.]

For the two-feature combination (2, 65), the probability of the biomarker being present is 74% and 65% for Target Populations 1 and 2, respectively.
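The idea of resampling a pooled database so that its composition mimics a target population can be sketched as follows. This is a simplified illustration, assuming a single stratification variable with known target proportions; the stratum labels, proportions, and function name are hypothetical and do not come from the presentation.

```python
import random
from collections import Counter, defaultdict

def stratified_subsample(rows, stratum_of, target_props, size, rng):
    """Draw `size` rows without replacement so the stratum mix matches target_props."""
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[stratum_of(row)].append(row)
    sample = []
    for stratum, prop in target_props.items():
        n = round(prop * size)
        pool = by_stratum[stratum]
        if n > len(pool):
            raise ValueError(f"not enough rows in stratum {stratum!r}")
        sample.extend(rng.sample(pool, n))
    return sample

# Pooled data (assumption): 500 records from each of two hypothetical strata.
pooled = [{"id": i, "stratum": "A" if i < 500 else "B"} for i in range(1000)]

rng = random.Random(0)
target_1 = {"A": 0.3, "B": 0.7}  # hypothetical mix for target population 1
target_2 = {"A": 0.8, "B": 0.2}  # hypothetical mix for target population 2

sample_1 = stratified_subsample(pooled, lambda r: r["stratum"], target_1, 200, rng)
sample_2 = stratified_subsample(pooled, lambda r: r["stratum"], target_2, 200, rng)

counts_1 = Counter(r["stratum"] for r in sample_1)
counts_2 = Counter(r["stratum"] for r in sample_2)
print(counts_1, counts_2)
```

Repeating such a draw inside every simulation iteration would let the same pooled data train models tuned to different target populations, which is one plausible reading of why the two heat plots differ.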

6 Performance Metrics & External Validation to Ensure Reproducibility
- The cut-off for the probability of biomarker presence can be chosen based on cost-benefit considerations.
- AUC, accuracy, PPV, NPV, sensitivity, and specificity can be calculated accordingly.
- The metrics of interest can be selected to match users' needs.

Third-Party Blind Validation
The external validation performance can be retained and compared with the internal validation result to check reproducibility.

Application
This framework can aid in developing a tool for estimating the probability that a biomarker is present. With a potential effective treatment and more readily obtained patient data, it may be used to support efficient use of diagnostic modalities by identifying clinical cases with an elevated likelihood of disease pathology.

Internal Findings
- Robustness: the internal and external third-party validation AUCs are almost identical.
- Generalizability: this framework has been generalized to other fields and scenarios.
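The metrics named on this slide follow from standard definitions, and a small worked example may make them concrete. The predicted probabilities, labels, and the 0.5 cut-off below are made up for illustration; AUC is computed with the rank-based (Mann-Whitney) formulation and does not depend on the cut-off.

```python
def cutoff_metrics(probs, labels, cutoff):
    """Confusion-matrix metrics when 'biomarker present' is called for prob >= cutoff."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    tn = sum(1 for p, y in pairs if p < cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    # This sketch assumes every denominator is non-zero.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }

def auc(probs, labels):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predictions and true labels (assumption, not study data).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

metrics = cutoff_metrics(probs, labels, cutoff=0.5)
area = auc(probs, labels)
print(metrics)  # every metric is 0.75 for this toy data
print(area)     # 0.8125
```

Raising the cut-off would trade sensitivity for specificity, which is where the cost-benefit considerations enter; the same functions can be applied to internal and external validation sets to compare the two.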

7 Thank You

8 Affiliations and contact information
(1) Department of Global Biometrics, Biogen, Cambridge MA, USA
(2) Department of Epidemiology, Biogen, Cambridge MA, USA
(3) CognitionMetrics, 4023 Kennett Pike, Suite 253, Wilmington, DE USA
(4) Department of Psychiatry and Behavioral Sciences, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY USA
(5) Alnylam Pharmaceuticals, Cambridge MA, USA
Contact info: Feng Gao, Shijia Bian

