Regression and Clinical prediction models

Slides:



Advertisements
Similar presentations
Introduction Describe what panel data is and the reasons for using it in this format Assess the importance of fixed and random effects Examine the Hausman.
Advertisements

Brief introduction on Logistic Regression
Chi Squared Tests. Introduction Two statistical techniques are presented. Both are used to analyze nominal data. –A goodness-of-fit test for a multinomial.
Session 8b Decision Models -- Prof. Juran.
1 Statistical Modeling  To develop predictive Models by using sophisticated statistical techniques on large databases.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Departments of Medicine and Biostatistics
HSRP 734: Advanced Statistical Methods July 24, 2008.
Regression Analysis. Unscheduled Maintenance Issue: l 36 flight squadrons l Each experiences unscheduled maintenance actions (UMAs) l UMAs costs $1000.
Chapter 16 Chi Squared Tests.
Ch 15 - Chi-square Nonparametric Methods: Chi-Square Applications
Sparse vs. Ensemble Approaches to Supervised Learning
General Mining Issues a.j.m.m. (ton) weijters Overfitting Noise and Overfitting Quality of mined models (some figures are based on the ML-introduction.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Prelude of Machine Learning 202 Statistical Data Analysis in the Computer Age (1991) Bradely Efron and Robert Tibshirani.
Validation of predictive regression models Ewout W. Steyerberg, PhD Clinical epidemiologist Frank E. Harrell, PhD Biostatistician.
Introduction to Directed Data Mining: Neural Networks
Ensemble Learning (2), Tree and Forest
Chapter 12 Inferential Statistics Gay, Mills, and Airasian
Inferential statistics Hypothesis testing. Questions statistics can help us answer Is the mean score (or variance) for a given population different from.
Discrete Choice Modeling William Greene Stern School of Business New York University.
Simple Linear Regression
Sampling Theory Determining the distribution of Sample statistics.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Lecture 12 Statistical Inference (Estimation) Point and Interval estimation By Aziza Munir.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance David Speights Senior Research Statistician HNC.
Experimental Evaluation of Learning Algorithms Part 1.
Linear correlation and linear regression + summary of tests
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Discriminant Analysis Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor.
Introduction to sample size and power calculations Afshin Ostovar Bushehr University of Medical Sciences.
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
CROSS-VALIDATION AND MODEL SELECTION Many Slides are from: Dr. Thomas Jensen -Expedia.com and Prof. Olga Veksler - CS Learning and Computer Vision.
Linear Discriminant Analysis and Its Variations Abu Minhajuddin CSE 8331 Department of Statistical Science Southern Methodist University April 27, 2002.
IMPORTANCE OF STATISTICS MR.CHITHRAVEL.V ASST.PROFESSOR ACN.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
1 Statistics & R, TiP, 2011/12 Neural Networks  Technique for discrimination & regression problems  More mathematical theoretical foundation  Works.
Case Selection and Resampling Lucila Ohno-Machado HST951.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Chi-Två Test Kapitel 6. Introduction Two statistical techniques are presented, to analyze nominal data. –A goodness-of-fit test for the multinomial experiment.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Survival Skills for Researchers Study Design. Typical Process in Research Design study Generate hypotheses Develop tentative new theories Analyze & interpret.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Bootstrap and Model Validation
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Logic of Hypothesis Testing
BINARY LOGISTIC REGRESSION
Chapter 7. Classification and Prediction
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Chapter 25 Comparing Counts.
Review for Exam 2 Some important themes from Chapters 6-9
Introduction to Predictive Modeling
Categorical Data Analysis Review for Final
CH2. Cleaning and Transforming Data
Chapter 26 Comparing Counts.
Ensemble learning Reminder - Bagging of Trees Random Forest
Lecture 1: Descriptive Statistics and Exploratory
Evidence Based Practice
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Chapter 26 Comparing Counts.
Regression and Clinical prediction models
Regression and Clinical prediction models
Regression and Clinical prediction models
Clinical prediction models
Introduction to Machine learning
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Business Statistics For Contemporary Decision Making 9th Edition
Presentation transcript:

Regression and Clinical prediction models Session 3 Steps in planning and conducting CPM research – Part 2 Pedro E A A do Brasil pedro.brasil@ini.fiocruz.br 2018

Objectives Clinical prediction models have some unique characteristics which make them different from other observational studies. In this session, usual steps in planning and conducting CPM research will be introduced and commented. One must be well aware of which state of development the research line is, to know what additional evidence is necessary to have a prediction model available. 2015 Session 3

Statistical model Choosing the model (tool) is not an easy task Many options with modern modeling and software availability Often advantages of a model over another is theoretical but not confirmed on predictions accuracy Medical readers may be resistant to unusual models, even if they predict better Common outcomes formats helps to choose a model binary, unordered categorical, ordered categorical, continuous, and survival data. 2015 Session 3

Statistical model Some options according to outcomes formats (e.g.) For binary outcome Logistic regression, decision trees, neural network, GAM, MARS, GEE, SVM, Random forest Unordered categorical outcome Multinomial regression, neural network Ordered categorical outcome Ordered logistic regression Continuous outcome Ordinary least squared (linear regression), GAM, SVM, GEE Survival outcome Cox or parametric models, decision trees, neural networks, Random forest 2015 Session 3

Statistical model Before definitely choosing a model one may consider Wonder and possibly test if model assumptions can be met only to the extent that adaptations to the model lead to better predictions Wonder if model assumptions can be flexiblelized or worked around Significant violations of underlying assumptions do not mean that a model predicts poorly Robustness is preferred over flexibility in capturing idiosyncracies Test two or more options of models Transform the outcome of interest To follow model assumptions or facilitate modeling and predictions Be very very careful in back transforming Results of the model should be transparent and presentable to the intended audience. 2015 Session 3

Statistical model Quality of predictions may depend on: The essential quality and appropriateness of the method The actual implementation of the method as a computer program The skill of the “data pilot” 2015 Session 3

Statistical model 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Statistical model 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Statistical model 2015 Session 3

Statistical model Survival analysis Cox regression model provides a default framework for prediction of long-term prognostic outcomes. Kaplan–Meier analysis provides a nonparametric method, but requires categorization of all predictors. It is the equivalent of cross-tables Parametric survival models may be useful for predictive purposes because of their parsimony and robustness, for example at the end of follow-up 2014 Session 3

Statistical model 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Usual steps of CPM Seven modelling steps Data inspection Missing values Coding of predictors Continuous predictors; Combining categorical predictors Restrictions on candidate predictors Model specification Appropriate selection of main effects? Assessment of assumptions (distributional, linearity, and additivity)? Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Usual steps of CPM Model estimation Model validation Shrinkage included? External information used? Model performance Appropriate statistical measures used? Clinical usefulness considered? Model validation Internal validation, including model specification and estimation? External validation? Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Usual steps of CPM Validity Model presentation Internal: overfitting - sufficient attempts to limit and correct for overfitting? External: generalizability - predictions valid for plausibly related populations? Model presentation Format appropriate for audience? Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Model estimation Overfitting and Optimism We are primarily interested in the validity of the predictions for new subjects, outside the sample under study Overfitting causes optimism Overfitting - the data under study are well described, but predictions are not valid for new subjects, usually accuracy is overestimated; a statistical model with too many degrees of freedom in the modelling process Optimism – accuracy overestimation due to overfitting; true performance minus apparent performance The solution is generally named “shrinkage” or penalization Bootstrap resampling is a central technique to quantify optimism in internal model performance 2015 Session 3

Model estimation Overfitting and Optimism 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Model estimation Overfitting and Optimism 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Model estimation What is bootstrap? 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Optimism = Bootstrap performance − Test performance Model estimation Bootstrap for calibration Optimism-corrected performance = Apparent performance in sample − Optimism Optimism = Bootstrap performance − Test performance Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Model estimation Bootstrap for calibration 2015 Session 3 Steyerbeg. Clinical Prediction Models:  A Practical Approach to Development, Validation, and Updating. Springer in 2009. 2015 Session 3

Concluding There are recognized methodological standards that should be adhered to when developing and validating CPRs. The research design must follow the hypothesis question and each choice has it strong and weak points. Several analysis steps not usually included in other observational research must be considered, such as calibration. 2015 Session 3

Steps in planning and conducting CPM research – Part 2 fim Session 3 Steps in planning and conducting CPM research – Part 2 Pedro E A A do Brasil pedro.brasil@ini.fiocruz.br 2018