Modelling continuous exposures - fractional polynomials Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg,

Slides:



Advertisements
Similar presentations
Part II: Coping with continuous predictors
Advertisements

Exploring the Shape of the Dose-Response Function.
Analysis of variance (ANOVA)-the General Linear Model (GLM)
/k 2DS00 Statistics 1 for Chemical Engineering lecture 4.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Review of Univariate Linear Regression BMTRY 726 3/4/14.
Departments of Medicine and Biostatistics
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 13 Nonlinear and Multiple Regression.
Interactions With Continuous Variables – Extensions of the Multivariable Fractional Polynomial Approach Willi Sauerbrei Institut of Medical Biometry and.
HSRP 734: Advanced Statistical Methods July 24, 2008.
RELATIVE RISK ESTIMATION IN RANDOMISED CONTROLLED TRIALS: A COMPARISON OF METHODS FOR INDEPENDENT OBSERVATIONS Lisa N Yelland, Amy B Salter, Philip Ryan.
Detecting an interaction between treatment and a continuous covariate: a comparison between two approaches Willi Sauerbrei Institut of Medical Biometry.
Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
x – independent variable (input)
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
Making fractional polynomial models more robust Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany.
Statistics for Managers Using Microsoft® Excel 5th Edition
Flexible modeling of dose-risk relationships with fractional polynomials Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical.
BIOST 536 Lecture 9 1 Lecture 9 – Prediction and Association example Low birth weight dataset Consider a prediction model for low birth weight (< 2500.
1 Chapter 9 Variable Selection and Model building Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Chapter 11 Survival Analysis Part 2. 2 Survival Analysis and Regression Combine lots of information Combine lots of information Look at several variables.
Issues In Multivariable Model Building With Continuous Covariates, With Emphasis On Fractional Polynomials Willi Sauerbrei Institut of Medical Biometry.
Multiple Linear Regression
Multivariable model building with continuous data Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany.
Correlation and Regression Analysis
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Model Checking in the Proportional Hazard model
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Assessing Survival: Cox Proportional Hazards Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.
Diane Stockton Trend analysis. Introduction Why do we want to look at trends over time? –To see how things have changed What is the information used for?
Regression and Correlation Methods Judy Zhong Ph.D.
The Use of Fractional Polynomials in Multivariable Regression Modeling Part I - General considerations and issues in variable selection Willi Sauerbrei.
DIY fractional polynomials Patrick Royston MRC Clinical Trials Unit, London 10 September 2010.
Building multivariable survival models with time-varying effects: an approach using fractional polynomials Willi Sauerbrei Institut of Medical Biometry.
Simple Linear Regression
Improved Use of Continuous Data- Statistical Modeling instead of Categorization Willi Sauerbrei Institut of Medical Biometry and Informatics University.
Lecture 12 Model Building BMTRY 701 Biostatistical Methods II.
Biostatistics Case Studies 2015 Youngju Pak, PhD. Biostatistician Session 4: Regression Models and Multivariate Analyses.
Variable selection and model building Part II. Statement of situation A common situation is that there is a large set of candidate predictor variables.
بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved Chapter 13 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable,
Assessing Survival: Cox Proportional Hazards Model
Week 6: Model selection Overview Questions from last week Model selection in multivariable analysis -bivariate significance -interaction and confounding.
Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,
Multivariable regression modelling – a pragmatic approach based on fractional polynomials for continuous variables Willi Sauerbrei Institut of Medical.
Use of FP and Other Flexible Methods to Assess Changes in the Impact of an exposure over time Willi Sauerbrei Institut of Medical Biometry and Informatics.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
Linear correlation and linear regression + summary of tests
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
Lecture 12: Cox Proportional Hazards Model
Simple linear regression Tron Anders Moger
1 Chapter 16 logistic Regression Analysis. 2 Content Logistic regression Conditional logistic regression Application.
Ch15: Multiple Regression 3 Nov 2011 BUSI275 Dr. Sean Ho HW7 due Tues Please download: 17-Hawlins.xls 17-Hawlins.xls.
1 Introduction to Modeling Beyond the Basics (Chapter 7)
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression.
DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building.
Canadian Bioinformatics Workshops
Stats Methods at IC Lecture 3: Regression.
Logistic Regression APKC – STATS AFAC (2016).
Multivariable regression models with continuous covariates with a practical emphasis on fractional polynomials and applications in clinical epidemiology.
Lecture 12 Model Building
Presentation transcript:

Modelling continuous exposures - fractional polynomials Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany Patrick Royston MRC Clinical Trials Unit, London, UK

2 Overview Part 1General issues in regression models Part 2Fractional polynomial models - Univariate - multivariate Part 3Interactions

3 Observational Studies Several variables, mix of continuous and (ordered) categorical variables Different situations: –prediction –explanation –confounders only Explanation is the main interest here: Identify variables with (strong) influence on the outcome Determine functional form (roughly) for continuous variables The issues are very similar in different types of regression models (linear regression model, GLM, survival models...) Use subject-matter knowledge for modelling but for some variables, data-driven choice inevitable

4 Regression models X=(X 1,...,X p ) covariate, prognostic factors g(x) = ß 1 X 1 + ß 2 X ß p X p (assuming effects are linear) normal errors (linear) regression model Y normally distributed E (Y|X) = ß 0 + g(X) Var (Y|X) = σ 2 I logistic regression model Y binary Logit P (Y|X) = ln survival times T survival time (partly censored) Incorporation of covariates g(X) (g(X))

5 Central issue To select or not to select (full model)? Which variables to include? How to model continuous variables?

6 Continuous variables – The problem “Quantifying epidemiologic risk factors using non- parametric regression: model selection remains the greatest challenge” Rosenberg PS et al, Statistics in Medicine 2003; 22: Discussion of issues in (univariate) modelling with splines Trivial nowadays to fit almost any model To choose a good model is much harder

7 Building multivariable regression models Before dealing with the functional form, the ‚easier‘ problem of model selection: variable selection assuming that the effect of each continuous variable is linear

8 Multivariable models – methods for variable selection Full model –variance inflation in the case of multicollinearity Stepwise procedures  prespecified (  in,  out ) and actual significance level? forward selection (FS) stepwise selection (StS) backward elimination (BE) All subset selection  which criteria? C p Mallows= (SSE / ) - n+ p 2 AICAkaike Information Criterion= n ln (SSE / n)+ p 2 BICBayes Information Criterion= n ln (SSE / n)+ p ln(n) fitpenalty Combining selection with Shrinkage Bayes variable selection Recommendations??? Central issue: MORE OR LESS COMPLEX MODELS?

9 Backward elimination is a sensible approach -Significance level can be chosen -Reduces overfitting Of course required Checks Sensitivity analysis Stability analysis

10 Traditional approaches a) Linear function - may be inadequate functional form - misspecification of functional form may lead to wrong conclusions b) ‘best‘ ‘standard‘ transformation c) Step function (categorial data) - Loss of information - How many cutpoints? - Which cutpoints? - Bias introduced by outcome-dependent choice Continuous variables – what functional form?

11 StatMed 2006, 25:

12 Fractional polynomial models Describe for one covariate, X Fractional polynomial of degree m for X with powers p 1, …, p m is given by FPm(X) =  1 X p 1 + … +  m X p m Powers p 1,…, p m are taken from a special set {  2,  1,  0.5, 0, 0.5, 1, 2, 3} Usually m = 1 or m = 2 is sufficient for a good fit Repeated powers (p 1 =p 2 )  1 X p 1 +  2 X p 1 log X 8 FP1, 36 FP2 models

13 Examples of FP2 curves - varying powers

14 Examples of FP2 curves - single power, different coefficients

15 Our philosophy of function selection Prefer simple (linear) model Use more complex (non-linear) FP1 or FP2 model if indicated by the data Contrasts to more local regression modelling –Already starts with a complex model

events for recurrence-free survival time (RFS) in 686 patients with complete data 7 prognostic factors, of which 5 are continuous Example 1: Prognostic factors GBSG-study in node-positive breast cancer

17 FP analysis for the effect of age

18 χ 2 dfp-value Any effect? Best FP2 versus null Linear function suitable? Best FP2 versus linear FP1 sufficient? Best FP2 vs. best FP Function selection procedure (FSP) Effect of age at 5% level?

19 Many predictors – MFP With many continuous predictors selection of best FP for each becomes more difficult  MFP algorithm as a standardized way to variable and function selection (usually binary and categorical variables are also available) MFP algorithm combines backward elimination with FP function selection procedures

20 P-value Continuous factors Different results with different analyses Age as prognostic factor in breast cancer (adjusted)

21 Results similar? Nodes as prognostic factor in breast cancer (adjusted) P-value

22 Example 2: Risk factors Whitehall 1 –17,370 male Civil Servants aged years, 1670 (9.7%) died –Measurements include: age, cigarette smoking, BP, cholesterol, height, weight, job grade –Outcomes of interest: all-cause mortality at 10 years  logistic regression

23 Whitehall 1 Systolic blood pressure Deviance difference in comparison to a straight line for FP(1) and FP(2) models

24 Similar fit of several functions

25 Presentation of models for continuous covariates The function + 95% CI gives the whole story Functions for important covariates should always be plotted In epidemiology, sometimes useful to give a more conventional table of results in categories This can be done from the fitted function

26 Whitehall 1 Systolic blood pressure Odds ratio from final FP(2) model LogOR= 2.92 – 5.43X -2 –14.30* X –2 log X Presented in categories

27 Whitehall 1 MFP analysis No variables were eliminated by the MFP algorithm Assuming a linear function weight is eliminated by backward elimination

28 Detecting predictive factors (interaction with treatment) Don’t investigate effects in separate subgroups! Investigation of treatment/covariate interaction requires statistical tests Care is needed to avoid over-interpretation Distinguish two cases: -Hypothesis generation: searching several interactions -Specific predefined hypothesis Interactions Motivation – I

29 Continuous by continuous interactions usually linear by linear product term not sensible if main effect is non-linear mismodelling the main effect may introduce spurious interactions Motivation - II

30 (Searching for) treatment – covariate interaction Most popular approach -Treatment effect in separate subgroups -Has several problems (Assman et al 2000) Test of treatment/covariate interaction required -For `binary`covariate standard test for interaction available Continuous covariate -Often categorized into two groups  loss of information, loss of power  which cutpoint?

31 Interactions with treatment - MFPI Have one continuous factor X of interest Use other prognostic factors to build an adjustment model, e.g. by MFP Interaction part – with or without adjustment Find best FP2 transformation of X with same powers in each treatment group LRT of equality of reg coefficients Test against main effects model(no interaction) based on  2 with 2df Interpretation predefined hypothesis or hypothesis searching?

32 RCT: Metastatic renal carcinoma Comparison of MPA with interferon N = 347, 322 Death

33 Is the treatment effect similar in all patients? Sensible questions? -Yes, from our point of view Ten factors available for the investigation of treatment – covariate interactions  Interaction with WCC is significant Overall: Interferon is better (p<0.01)

34 Treatment effect function for WCC Only a result of complex (mis-)modelling? MFPI

35 Treatment effect in subgroups defined by WCC HR (Interferon to MPA; adjusted values similar) overall: 0.75 (0.60 – 0.93) I : 0.53 (0.34 – 0.83) II : 0.69 (0.44 – 1.07) III : 0.89 (0.57 – 1.37) IV : 1.32 (0.85 –2.05) Does the MFPI model agree with the data? Check proposed trend

36 Continuous by continuous interactions MFPIgen Have Z 1, Z 2 continuous and X confounders Apply MFP to X, Z 1 and Z 2, forcing Z 1 and Z 2 into the model. –FP functions f 1 (Z 1 ) and f 2 (Z 2 ) are selected for Z 1 and Z 2 Often f 1 (Z 1 ) and/or f 2 (Z 2 ) are linear Add term f 1 (Z 1 )* f 2 (Z 2 ) to the model chosen and use LRT for test of interaction Check (graphically) interactions for artefacts Check all pairs of continuous variables for an interaction Use forward stepwise if more than one interaction remains Low significance level for interactions

37 Interactions – continuous by continuous Whitehall 1 Consider only age and weight Main effects: age – linear weight – FP2 (-1,3) Interaction? Include age*weight -1 + age*weight 3 into the model LRT: χ 2 = 5.27 (2df, p = 0.07)  no (strong) interaction

38 Erroneously assume that the effect of weight is linear Interaction? Include age*weight into the model LRT: χ 2 = 8.74 (1df, p = 0.003)  intercation appears higly significant

39 Model check: categorize age in (equal sized) groups (e.g. 4 groups) Compute running line smooth of the binary outcome on weight in each group Plot results for each group Interactions: checking the model

40 Whitehall 1: check of age  weight interaction – 4 subgroups for age

41 Interpreting the plot Running line smooth are roughly parallel across age groups  no (strong) interactions Erroneously assume that the effect of weight is linear  estimated slopes of weight in age-groups indicate strong qualitative interaction between age und weight

42 Whitehall 1: 7 variables – any interactions? P-values for two-way interactions from MFPIgen * FP transformations  chol  age highly significant, but needs checking

43 Software sources MFP Most comprehensive implementation is in Stata –Command mfp is part since Stata 8 (now Stata 10) Versions for SAS and R are available –SAS –R version available on CRAN archive mfp package Extensions to investigate interactions So far only in Stata

44 Concluding comments – MFP FPs use full information - in contrast to a priori categorisation FPs search within flexible class of functions (FP1 and FP models) MFP is a well-defined multivariate model-building strategy – combines search for transformations with BE Important that model reflects medical knowledge, e.g. monotonic / asymptotic functional forms

45 Towards recommendations for model-building by selection of variables and functional forms for continuous predictors under several assumptions IssueRecommendation Variable selection procedureBackward elimination; significance level as key tuning parameter, choice depends on the aim of the study Functional form for continuous covariates Linear function as the 'default', check improvement in model fit by fractional polynomials. Check derived function for undetected local features Extreme values or influential pointsCheck at least univariately for outliers and influential points in continuous variables. A preliminary transformation may improve the model selected. For a proposal see R & S 2007 Sensitivity analysisImportant assumptions should be checked by a sensitivity analysis. Highly context dependent Check of model stabilityThe bootstrap is a suitable approach to check for model stability Complexity of a predictorA predictor should be 'as parsimonious as possible' Sauerbrei et al. SiM 2007

46 Interactions Interactions are often ignored by analysts Continuous  categorical has been studied in FP context because clinically very important Continuous  continuous is more complex Interaction with time important for long-term FU survival data

47 MFP extensions MFPI – treatment/covariate interactions MFPIgen – interaction between two continuous variables MFPT – time-varying effects in survival data

48 Summary Getting the big picture right is more important than optimising aspects and ignoring others strong predictors strong non-linearity strong interactions strong non-PH in survival model

49 Harrell FE jr. (2001): Regression Modeling Strategies. Springer. Royston P, Altman DG. (1994): Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics, 43: Royston P, Ambler G, Sauerbrei W. (1999): The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology, 28: Royston P, Sauerbrei W. (2004): A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine, 23: Royston P, Sauerbrei W. (2007): Improving the robustness of fractional polynomial models by preliminary covariate transformation: a pragmatic approach. Computational Statistics and Data Analysis, 51: Royston P, Sauerbrei W (2008): Multivariable Model-Building - A pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Wiley. Royston P, Sauerbrei W (2008): Two techniques for investigating interactions between treatment and continuous covariates in clinical trials. Stata Journal, to appear. Sauerbrei W. (1999): The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 48, Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. (2006): Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50: Sauerbrei W, Royston P. (1999): Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A, 162: Sauerbrei W, Royston P, Binder H (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26: Sauerbrei W, Royston P, Zapien K. (2007): Detecting an interaction between treatment and a continuous covariate: a comparison of two approaches. Computational Statistics and Data Analysis, 51: References