DIY fractional polynomials Patrick Royston MRC Clinical Trials Unit, London 10 September 2010.

Overview
Introduction to fractional polynomials
Going off-piste: DIY fractional polynomials
Examples

Fractional polynomial models
A fractional polynomial of degree 1 with power p1 is defined as FP1 = β1 X^p1
A fractional polynomial of degree 2 with powers (p1, p2) is defined as FP2 = β1 X^p1 + β2 X^p2
Powers (p1, p2) are taken from a predefined set S = {−2, −1, −0.5, 0, 0.5, 1, 2, 3}, where power 0 means log X
There are also 'repeated-powers' FP2 models, with p1 = p2 and the second term multiplied by ln X
Example: FP1 [power 0.5] = β1 X^0.5
Example: FP2 [powers (0.5, 3)] = β1 X^0.5 + β2 X^3
Example: FP2 [powers (3, 3)] = β1 X^3 + β2 X^3 ln X
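The definitions above can be sketched in code. The following is a minimal Python illustration (mine, not part of the talk; the names `fp_term` and `fp2` are invented for exposition) of evaluating an FP1 basis term and an FP2 function, including the power-0-means-log and repeated-powers conventions:

```python
import math

def fp_term(x, p):
    """One FP basis term: x**p, with the convention that power 0 means ln(x)."""
    return math.log(x) if p == 0 else x ** p

def fp2(x, p1, p2, b1, b2):
    """Evaluate an FP2 function b1*t1 + b2*t2.

    For repeated powers (p1 == p2) the second term is x**p1 * ln(x),
    matching the 'repeated-powers' FP2 models described above.
    """
    t1 = fp_term(x, p1)
    t2 = fp_term(x, p2) * math.log(x) if p1 == p2 else fp_term(x, p2)
    return b1 * t1 + b2 * t2

# FP2 with powers (3, 3): b1*x^3 + b2*x^3*ln(x)
print(fp2(2.0, 3, 3, 1.0, 1.0))
```

This is only a pointwise evaluator; in the talk the β's are of course estimated by the regression command, not supplied by hand.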

Some examples of fractional polynomial (FP2) curves Royston P, Altman DG (1994) Applied Statistics 43:

FP analysis for the prognostic effect of age in breast cancer

FP function selection procedure
Simple functions are preferred; more complicated functions are accepted only if the fit is much better
Is the effect of age significant at the 5% level? Each test reports χ², d.f. and P-value:
Any effect? Best FP2 versus null
Linear function suitable? Best FP2 versus linear
FP1 sufficient? Best FP2 versus best FP1

Fractional polynomials in Stata: the fracpoly command
Basic syntax:
. fracpoly [, fp_options]: regn_cmd [yvar] xvar1 [xvars] …
xvar1 is a continuous predictor which may have a curved relationship with yvar
xvars are other predictors, all modelled as linear
The fp_option compare compares the fit of different FP models, using the FP function selection procedure

Example (auto data)
. fracpoly, compare: regress mpg displacement
Fractional polynomial model comparisons for displacement: for each of the models "Not in model", "Linear", "m = 1" and "m = 2", the output reports df, Deviance, Res. SD, Dev. dif., P(*) and Powers
(*) P-value from deviance difference comparing reported model with m = 2 model
Show FP1 and FP2 models in Stata (+ fracplot)

But what if fracpoly can't fit my model … ?
fracpoly supports only some of Stata's rich set of regression-type commands
Provided we know what the command we want to fit looks like with a transformed covariate, we can fit an FP model to the data ourselves
We just create the necessary transformed covariate values, fit the model using them, and assess the fit
A new, simple command, fracpoly_powers, helps by generating strings (local macros) with the required powers:
. fracpoly_powers [, degree(#) s(list_of_powers)]
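fracpoly_powers is the author's own helper command, but the enumeration it performs is easy to sketch. Assuming the standard power set S above, and counting each repeated-powers pair once, a Python equivalent (illustrative only) is:

```python
from itertools import combinations_with_replacement

# Standard FP power set; 0 stands for log X
S = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

# Degree 1: one power per model
fp1_models = [(p,) for p in S]

# Degree 2: unordered pairs of powers, repeated powers included
fp2_models = list(combinations_with_replacement(S, 2))

print(len(fp1_models), len(fp2_models))  # 8 36
```

So an exhaustive FP1 search fits 8 models and an exhaustive FP2 search fits 36, which is what the deviance loops on the following slides iterate over.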

Fitting an FP2 model in the auto example

// Store FP2 powers in local macros
fracpoly_powers, degree(2)
local np = r(np)
forvalues j = 1 / `np' {
    local p`j' `r(p`j')'
}

// Compute deviance for each model with covariate displacement
local x displacement
local y mpg
local devmin 1e30
quietly forvalues j = 1 / `np' {
    fracgen `x' `p`j'', replace
    regress `y' `r(names)'
    local dev = -2 * e(ll)
    if `dev' < `devmin' {
        local pbest `p`j''
        local devmin `dev'
    }
}
di "Best model has powers `pbest', deviance = " `devmin'
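The same select-by-deviance loop can be sketched outside Stata. Below is an illustrative Python version (my sketch, on made-up data, FP1 powers only) using ordinary least squares via numpy; for a Gaussian model, minimizing the deviance −2 log L is equivalent to minimizing the residual sum of squares, so RSS serves as the selection criterion here:

```python
import numpy as np

S = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]  # FP power set; 0 means log x

def fp1_basis(x, p):
    """FP1 transformed covariate for power p (vectorized)."""
    return np.log(x) if p == 0 else x ** p

# Synthetic data whose true curve uses power 0 (a log relationship)
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * np.log(x)

best_p, best_rss = None, np.inf
for p in S:
    # Design matrix: intercept plus the transformed covariate
    X = np.column_stack([np.ones_like(x), fp1_basis(x, p)])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = rss[0] if rss.size else float(((y - X @ beta) ** 2).sum())
    if rss < best_rss:
        best_p, best_rss = p, rss

print("best FP1 power:", best_p)  # power 0, i.e. log x
```

The structure mirrors the Stata loop above: transform, fit, score, keep the minimum.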

A real example: modelling fetal growth
Prospective longitudinal study of n = 50 pregnant women
There are about 6 repeated measurements on each fetus at different gestational ages (gawks = gestational age in weeks)
We wish to model how y = log fetal abdominal circumference changes with gestational age
There is considerable curvature!

The raw data

A mixed model for fetal growth
Multilevel (mixed) model to fit this relationship:
. xtmixed y FP(gawks) || id: FP(gawks), covariance(unstructured)
But how do we implement "FP(gawks)" here?
We want the best-fitting FP function of gawks, with random effects for the parameters (β's) of the FP model

Fitting an FP2 mixed model to the fetal AC data

[First run fracpoly_powers to create local macros with powers]

// Compute deviance for each FP model with covariate gawks
gen x = gawks
gen y = ln(ac)
local devmin 1e30
forvalues j = 1 / `np' {
    qui fracgen x `p`j'', replace adjust(mean)
    qui xtmixed y `r(names)' || id: `r(names)', ///
        nostderr covariance(unstructured)
    local dev = -2 * e(ll)
    if `dev' < `devmin' {
        local p `p`j''
        local devmin `dev'
    }
    di "powers = `p`j''" _col(20) " deviance = " %9.3f `dev'
}
di _n "Best model has powers `p', deviance = " `devmin'

Plots of some results

An "ignorant" example!
I know almost nothing about "seemingly unrelated regression" (Stata's sureg command)
It fits a set of linear regression models which have correlated error terms
The syntax therefore has a set of "equations":
. sureg (depvar1 varlist1) (depvar2 varlist2) ... (depvarN varlistN)
There may be non-linearities lurking in these "equations"
How can we fit FP models to varlist1, varlist2, …?

Example: modelling learning scores
Stata FAQ from UCLA: What is seemingly unrelated regression and how can I perform it in Stata?
Example: High School and Beyond study

Example: modelling learning scores

Contains data from hsb2.dta
  obs:           200    highschool and beyond (200 cases)
 vars:            11    5 Jul :23
 size:         9,600    (99.9% of memory free)

variable name   storage type   display format   value label   variable label
id              float          %9.0g
female          float          %9.0g            fl
race            float          %12.0g           rl
ses             float          %9.0g            sl
schtyp          float          %9.0g            scl           type of school
prog            float          %9.0g            sel           type of program
read            float          %9.0g                          reading score
write           float          %9.0g                          writing score
math            float          %9.0g                          math score
science         float          %9.0g                          science score
socst           float          %9.0g                          social studies score

[It is unclear to me what "ses" (low, middle, high) is]

Example (ctd.)
As an example, suppose we wish to model two outcomes (read, math) as predicted by "socst female ses" and "science female ses", using sureg as follows:
. sureg (read socst female ses) (math science female ses)
Are there non-linearities in read as a function of socst? In math as a function of science?
For simplicity, we restrict ourselves here to FP1 functions of socst and science (not necessary in principle)
We fit the 8 × 8 = 64 FP1 models and look for the best-fitting combination
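The 8 × 8 search over FP1 powers for the two covariates is just a Cartesian product of the power set with itself. An illustrative Python sketch of the candidate grid (a fit-and-score step, here the sureg call, would be applied to each pair):

```python
from itertools import product

S = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]  # FP1 power set; 0 means log x

# One FP1 power for socst paired with one FP1 power for science
grid = list(product(S, S))

print(len(grid))  # 64 candidate (p_socst, p_science) pairs
```

Each pair would be scored by the deviance −2 log L of the fitted two-equation model, exactly as in the single-covariate loops earlier.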

Stata

gen x1 = socst
gen x2 = science
gen y1 = read
gen y2 = math
local devmin 1e30
forvalues j = 1 / `np' {
    qui fracgen x1 `p`j'', replace adjust(mean)
    local x1vars `r(names)'
    forvalues k = 1 / `np' {
        qui fracgen x2 `p`k'', replace adjust(mean)
        local x2vars `r(names)'
        qui sureg (y1 `x1vars' female ses) (y2 `x2vars' female ses)
        local dev = -2 * e(ll)
        if `dev' < `devmin' {
            local px1 `p`j''
            local px2 `p`k''
            local devmin `dev'
        }
    }
}

[Run fpexample3.do in Stata]

Comments
The results suggest that there is indeed curvature in both relationships
We can reject the null hypothesis of linearity at the 1% significance level (FP1 vs. linear: χ² test on 2 d.f.)
This shows the importance of considering non-linearity

read as a function of socst (adjusted for female and ses)

math as a function of science (adjusted for female and ses)

Conclusions Fractional polynomial models are a simple yet very useful extension of linear functions and ordinary polynomials If you are willing to do some straightforward do-file programming, you can apply them in a bespoke manner to a wide range of Stata regression-type commands and get useful results For (much) more, see Royston & Sauerbrei (2008) book