Making fractional polynomial models more robust Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany.

Presentation transcript:

Making fractional polynomial models more robust. Willi Sauerbrei, Institut of Medical Biometry and Informatics, University Medical Center Freiburg, Germany. Patrick Royston, MRC Clinical Trials Unit, London, UK.

2 An interesting dataset. From Johnson (J Statistics Education 1996). Percent body fat measurements in 252 men. 13 continuous covariates: age, weight, height, and 10 body circumference measurements. Used by Johnson to illustrate some of the problems of multiple regression analysis (collinearity etc.).

3 The problem …

4 Effect of case 39 on FP analysis (P-values for non-linear effects). Non-linearity depends on case 39. This case has an undue influence on the results of the FP analysis. It would have a similar influence on other flexible models, e.g. splines.

5 Brief reminder: Fractional polynomial models. For one covariate $X$, a fractional polynomial of degree $m$ with powers $p_1, \ldots, p_m$ is given by $\mathrm{FP}m(X) = \beta_1 X^{p_1} + \cdots + \beta_m X^{p_m}$. Powers $p_1, \ldots, p_m$ are taken from a special set $\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$. In clinical data, $m = 1$ or $m = 2$ is usually sufficient for a good fit.
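The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software: the names `fp_basis` and `fp_value` are hypothetical, and the code relies on the usual FP conventions that power 0 stands for log X and that a repeated power multiplies the previous term by log X.

```python
import math

# Standard FP power set from the slide; power 0 is read as log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_basis(x, powers):
    """Return the FP basis terms for a positive x.

    `powers` must be in nondecreasing order. A repeated power p
    contributes the previous term multiplied by log(x).
    """
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    if list(powers) != sorted(powers):
        raise ValueError("powers must be in nondecreasing order")
    terms = []
    prev_power, prev_term = None, None
    for p in powers:
        if p == prev_power:
            term = prev_term * math.log(x)  # repeated-power convention
        elif p == 0:
            term = math.log(x)              # power 0 means log(x)
        else:
            term = x ** p
        terms.append(term)
        prev_power, prev_term = p, term
    return terms


def fp_value(x, powers, betas, intercept=0.0):
    """Evaluate intercept + sum of beta_j * (j-th basis term) at x."""
    return intercept + sum(b * t for b, t in zip(betas, fp_basis(x, powers)))
```

For instance, `fp_value(2.0, (-1, 2), (1.0, 1.0), intercept=1.0)` evaluates 1 + 1/X + X^2 at X = 2.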

6 FP1 and FP2 models. FP1 models are simple power transformations: $1/X^2$, $1/X$, $1/\sqrt{X}$, $\log X$, $\sqrt{X}$, $X$, $X^2$, $X^3$; 8 models of the form $\beta_0 + \beta_1 X^p$. FP2 models have combinations of the powers, for example $\beta_0 + \beta_1 (1/X) + \beta_2 X^2$; 28 models. Also 'repeated powers' models, for example $(1, 1)$: $\beta_0 + \beta_1 X + \beta_2 X \log X$; 8 models.
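The model counts on this slide can be checked by enumerating the power set directly. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from itertools import combinations, combinations_with_replacement

# Standard FP power set; power 0 is read as log(X).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# FP1: one power per model.
fp1_models = [(p,) for p in FP_POWERS]

# FP2 with two different powers: unordered pairs from the set.
fp2_distinct = list(combinations(FP_POWERS, 2))

# FP2 'repeated powers' models: (p, p) for each p in the set.
fp2_repeated = [(p, p) for p in FP_POWERS]

# All FP2 models: unordered pairs with repetition allowed.
fp2_models = list(combinations_with_replacement(FP_POWERS, 2))

print(len(fp1_models), len(fp2_distinct), len(fp2_repeated), len(fp2_models))
# prints: 8 28 8 36
```

So an FP2 search for a single covariate compares 28 + 8 = 36 candidate power pairs.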

7 Bodyfat: Case 39 also influences a multivariable FP model Case 39 is extreme for several covariates

8 A conceptual solution: preliminary transformation of X

9 Bodyfat revisited

10 Preliminary transformation: effect on multivariable FP analysis Apply preliminary transformation to all predictors in bodyfat data

11 The transformation (1). Take $\varepsilon = 0.01$ for best results.

12 The transformation (2). $0 < g(z, \varepsilon) < 1$ for any $z$ and $\varepsilon$. $g(z, \varepsilon)$ tends to asymptotes 0 and 1 as $z$ tends to $\pm\infty$. $g(z, \varepsilon)$ looks like a straight line centrally, smoothly truncated at the extremes.
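The formula shown on the slides did not survive in this transcript. One function with exactly the properties listed above is given below as an assumption, not as a confirmed copy of the slide: here $z$ is taken to be the standardized covariate, $\Phi$ the standard normal distribution function, and $\varepsilon^{*}$ a normalizing constant.

```latex
g(z, \varepsilon) =
  \frac{\ln\!\left(\dfrac{\Phi(z) + \varepsilon}{1 - \Phi(z) + \varepsilon}\right) + \varepsilon^{*}}
       {2\,\varepsilon^{*}},
\qquad
\varepsilon^{*} = \ln\!\left(\frac{1 + \varepsilon}{\varepsilon}\right),
\qquad
z = \frac{x - \bar{x}}{s}
```

Under this form, $\Phi(z) \to 0$ gives $g \to (-\varepsilon^{*} + \varepsilon^{*}) / (2\varepsilon^{*}) = 0$, and $\Phi(z) \to 1$ gives $g \to 2\varepsilon^{*} / (2\varepsilon^{*}) = 1$, matching the stated asymptotes.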

13 The transformation (3). With $\varepsilon = 0.01$, $g(z, \varepsilon)$ is nearly linear in the central region.

14 The transformation (4). FP functions (including transformations such as log) are sensitive to values of x near 0. To avoid this effect, shift the origin of $g(z, \varepsilon)$ to the right. A simple linear transformation of $g(z, \varepsilon)$ to the interval $(\delta, 1)$ does this. Simulation studies support $\delta = 0.2$.
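A minimal Python sketch of the two steps follows. The form of `g` is an assumption, since the slide formula is not in the transcript: it is a smoothly truncated logit of the normal distribution function, chosen because it lies in (0, 1) and tends to 0 and 1 at the extremes. Standardizing with the sample mean and SD is also an assumption, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g(z, eps=0.01):
    """Assumed form of g(z, eps): near-linear centrally, flat at the extremes."""
    eps_star = math.log((1.0 + eps) / eps)
    phi = normal_cdf(z)
    return (math.log((phi + eps) / (1.0 - phi + eps)) + eps_star) / (2.0 * eps_star)


def pretransform(xs, eps=0.01, delta=0.2):
    """Standardize xs (assumed: sample mean and SD), apply g, map to (delta, 1)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [delta + (1.0 - delta) * g((x - mean) / sd, eps) for x in xs]
```

Because `delta + (1 - delta) * g` is increasing in `g`, the ordering of the covariate values is preserved, while an extreme value such as 100 in `pretransform([1, 2, 3, 4, 100])` is pulled in towards the upper bound of 1.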

15 Example 2 – Whitehall 1 study. 17,370 male Civil Servants aged years. Covariates: age, cigarette smoking, BP, cholesterol, height, weight, job grade. Outcomes of interest: all-cause mortality → logistic regression. Interested in risk as function of covariates. Several continuous covariates. Risk functions → preliminary transformation.

16 Multivariable FP modelling with or without preliminary transformation. Green vertical lines show 1st and 99th centiles of X.

17 Comments and conclusions. The issue of robustness affects FP and other models. Standard analysis of influence may identify problematic points but does not tell you what to do. The proposed preliminary transformation is effective in reducing leverage of extreme covariate values; it lowers the chance that FP and other flexible models will contain artefacts in curve shape. The transformation looks complicated, but the graph shows the idea is really quite simple, like double truncation. One may be concerned about possible bias in fit at extreme values of X following transformation.