Regression diagnostics

Hein Stigum, January 2019
Presentation, data and programs at: http://folk.uio.no/heins/Talks/

Agenda
- Linear regression diagnostics: assumptions, robustness of results
- (Logistic regression)
- (Poisson regression)
Time: linear, 50-60 min; logistic (binary, conditional), 60 min.

Birth weight by gestational age: linear regression

Workflow
- Scatterplots
- Bivariate analysis
- Regression: model fitting (cofactors in/out, interactions); test of assumptions (independent errors, linear effects, constant error variance); influence (robustness)
Model fitting: adjust for a cofactor if it is a confounder. Problems arise if the cofactor has missing values, is measured with error, or lies on the causal path.
Test of assumptions: in Stata the estimation results remain in memory, so post-estimation commands can be run after the regression. Normally check the assumptions first, then influence. Note the leverage value of the outlier.

Scatterplot
A pregnancy of 280 days is normal. N = 518.

Results
Outcome: birth weight. Covariates: gestational age, sex, parity. Model: linear regression. Note: synthetic data.
Birth weight varies with sex, mother's age and especially gestational age (length of pregnancy in days).
Descriptive (bivariate) analysis.
Expected birth weight = 3531 g for boys with parity = 0 and gestational age = 280 days. To get the average, adjust this value: 3530 - 166/2 + (2/5)*230 + 2*17 = 3573.

Model diagnostics
- Model assumptions: independent errors (residuals), linear effects, constant error variance
- Robustness
Must Y be normal? Compare normal Y with skewed X, and skewed Y with normal X. Y itself need not be normal; any normality assumption concerns the residuals.
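
This minimal Python/NumPy sketch shows why a skewed Y is not by itself a problem. It uses simulated data; the slides use Stata, and the coefficients and seed here are hypothetical. X is exponential, so Y inherits its skew, yet the residuals from the fit are close to symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def skewness(v):
    """Sample skewness: third central moment divided by SD cubed."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Skewed X with normal errors: Y inherits the skew of X
x = rng.exponential(scale=1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit and its residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(f"skew(Y) = {skewness(y):.2f}, skew(residuals) = {skewness(resid):.2f}")
```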

Checking assumptions

1. Independent residuals
There is no diagnostic tool; judge from the study design.
Possible violations: pupils nested in schools (weak correlations); repeated measurements (strong correlations).
Models: adjust for clustering; linear mixed models; GEE.
Birth weight example: a possible violation is repeated births by the same mother.
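
The slide lists "adjust for clustering" alongside mixed models and GEE. One common way to adjust, swapped in here as an illustration, is a cluster-robust (sandwich) standard error; in Stata this is the `vce(cluster clustvar)` option. The Python/NumPy sketch below uses hypothetical simulated data: 50 schools with 20 pupils each, a school-level covariate and a shared school effect. It shows how far the naive standard error can understate the uncertainty.

```python
import numpy as np

rng = np.random.default_rng(7)
G, m = 50, 20                          # hypothetical: 50 schools, 20 pupils each
cluster = np.repeat(np.arange(G), m)
n = G * m
x = np.repeat(rng.normal(size=G), m)   # covariate shared within a school
u = np.repeat(rng.normal(size=G), m)   # shared school effect -> correlated errors
y = 1 + 0.5 * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Naive SE assumes independent residuals
se_naive = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))

# Cluster-robust sandwich: sum over clusters of (X_g' e_g)(X_g' e_g)'
meat = np.zeros((k, k))
for gi in range(G):
    s = X[cluster == gi].T @ e[cluster == gi]
    meat += np.outer(s, s)
c = G / (G - 1) * (n - 1) / (n - k)    # a commonly used small-sample correction
se_cluster = np.sqrt(np.diag(c * XtX_inv @ meat @ XtX_inv))

print("naive SE:", np.round(se_naive, 3), " cluster-robust SE:", np.round(se_cluster, 3))
```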

2. Linear effects
- Save the residuals and predicted values.
- Plot residuals against predicted values.
- If non-linear: plot residuals against each continuous variable, then add a square term or cut the variable into categories.
Example: after adding gest^2, linearity is OK.
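
In Stata, `predict` after `regress` saves the predicted values (option `xb`) and residuals (option `residuals`). The Python/NumPy sketch below uses hypothetical simulated data with a curved effect of gestational age. It replaces the residual-versus-predicted plot with one number: the correlation between the residuals and the squared centred predictions. This is an illustrative summary chosen here, not a standard test. The number is clearly non-zero for the linear model and near zero once gest^2 is added.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS predicted values and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ beta
    return pred, y - pred

def curvature(pred, resid):
    """Correlation between residuals and squared centred predictions:
    one number summarising an arch- or U-shaped residual-vs-predicted plot."""
    q = (pred - pred.mean()) ** 2
    return np.corrcoef(q, resid)[0, 1]

rng = np.random.default_rng(2)
n = 2000
gest = rng.uniform(250, 300, size=n)   # hypothetical gestational age, days
weight = 3500 + 17 * (gest - 280) - 0.8 * (gest - 280) ** 2 + rng.normal(0, 200, size=n)

g = gest - 280
pred1, res1 = ols_fit(np.column_stack([np.ones(n), g]), weight)          # linear only
pred2, res2 = ols_fit(np.column_stack([np.ones(n), g, g ** 2]), weight)  # with gest^2
print(f"curvature, linear model: {curvature(pred1, res1):.2f}")
print(f"curvature, with gest^2:  {curvature(pred2, res2):.2f}")
```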

Linear effect test
Compare Model 1 (only linear terms) with Model 2 (linear terms + square term). A significant square term means non-linearity.
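
This comparison can be written as a nested-model F test on the residual sums of squares. In Stata, `test gest2` after `regress` gives the same kind of test. Below is a minimal Python/NumPy sketch on hypothetical simulated data. When a single term is added, F equals the square of that term's t statistic. For 1 and about 1000 degrees of freedom, the 5% critical value is about 3.85.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(3)
n = 1000
gest = rng.uniform(250, 300, size=n)   # hypothetical gestational age, days
weight = 3500 + 17 * (gest - 280) - 0.8 * (gest - 280) ** 2 + rng.normal(0, 200, size=n)

g = gest - 280                                 # centring reduces collinearity of g and g^2
X1 = np.column_stack([np.ones(n), g])          # Model 1: linear term only
X2 = np.column_stack([np.ones(n), g, g ** 2])  # Model 2: linear + square term

rss1, rss2 = rss(X1, weight), rss(X2, weight)
q = X2.shape[1] - X1.shape[1]                  # number of added terms
df2 = n - X2.shape[1]                          # residual degrees of freedom, Model 2
F = ((rss1 - rss2) / q) / (rss2 / df2)
print(f"F({q}, {df2}) = {F:.1f}")
```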

3. Constant residual variance
Plot residuals against predicted values.
If the variance is not constant: robust regression; weighted regression. SPSS: log transform, use Poisson regression. Look for a missing cofactor.
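
The residual-versus-predicted plot for this assumption can be summarised numerically. Split the predicted values into equal-count bins and compare the residual SD across bins; a "fan" shows up as a steadily growing SD. This is a minimal Python/NumPy sketch on hypothetical simulated data where the error SD grows with x. The choice of five bins is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + rng.normal(0, 0.5 * x)   # error SD grows with x: heteroskedastic

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta
resid = y - pred

# Residual SD within five equal-count bins of the predicted values
edges = np.quantile(pred, [0, 0.2, 0.4, 0.6, 0.8, 1.0])
bins = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, 4)
sd_by_bin = np.array([resid[bins == k].std() for k in range(5)])
print(np.round(sd_by_bin, 2))
```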

Constant variance test
A significant result means non-constant variance.
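
The slide does not name the test. One widely used choice, assumed here, is the Breusch-Pagan test; in Stata this is `estat hettest` after `regress`. The Python/NumPy sketch computes Koenker's version, n times the R^2 from regressing squared residuals on the regressors. It uses hypothetical simulated data and compares a constant-variance case with a fan-shaped one against the chi-square(1) 5% critical value of 3.841.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def bp_lm(X, resid):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from
    regressing squared residuals on the model's regressors."""
    u = resid ** 2
    _, r = ols(X, u)
    r2 = 1 - (r @ r) / ((u - u.mean()) @ (u - u.mean()))
    return len(u) * r2

rng = np.random.default_rng(5)
n = 1000
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])

_, e_const = ols(X, 1 + 2 * x + rng.normal(0, 2.0, size=n))   # constant variance
_, e_fan = ols(X, 1 + 2 * x + rng.normal(0, 0.5 * x))         # variance grows with x

CRIT_5PCT = 3.841  # chi-square(1) 95% quantile; df = number of slope terms
lm_const, lm_fan = bp_lm(X, e_const), bp_lm(X, e_fan)
print(f"constant: LM = {lm_const:.2f}; fan-shaped: LM = {lm_fan:.2f}; 5% cut-off {CRIT_5PCT}")
```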

Weighted regression
Estimate the residual variance and use weights = 1/variance.
Effects: takes care of heteroskedasticity ("robustification").
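
The two steps above can be sketched in Python/NumPy as a two-stage (feasible) weighted least squares fit on hypothetical simulated data. The variance model, regressing the absolute OLS residuals on X, is an assumption made for illustration. Because E|e| is proportional to the error SD, only the shape of the fitted SD matters, not its scale.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + rng.normal(0, 0.5 * x)   # error SD grows with x
X = np.column_stack([np.ones(n), x])

# Step 1: ordinary least squares and its residuals
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols

# Step 2: estimate the residual SD as a function of x (assumed model: |e| linear in x)
g, *_ = np.linalg.lstsq(X, np.abs(e), rcond=None)
sd_hat = np.clip(X @ g, 1e-3, None)      # guard against non-positive fitted SDs

# Step 3: weights = 1 / variance, then weighted least squares via rescaled rows
w = 1.0 / sd_hat ** 2
sw = np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print("OLS:", np.round(b_ols, 3), " WLS:", np.round(b_wls, 3))
```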

Summary of assumptions
- Dependent residuals: linear mixed models, xtmixed (called mixed from Stata 13 onward)
- Non-linear effects:
    gen gest2=gest^2
    regress weight gest gest2 sex
- Non-constant variance (robust standard errors):
    regress weight gest sex, robust
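
The `robust` option leaves the coefficients unchanged and replaces the classical standard errors with a heteroskedasticity-robust sandwich estimate. Below is a minimal Python/NumPy sketch of that estimator with the n/(n-k) small-sample scaling (HC1), using hypothetical simulated data. The claim that this matches Stata's `robust` default is to our knowledge, not taken from the slides. Here the error SD grows with |x|, so the robust SE of the slope comes out larger than the classical one.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients with classical and heteroskedasticity-robust (HC1) SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    se_classic = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = X.T @ (X * (e ** 2)[:, None])             # sum_i e_i^2 x_i x_i'
    V_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv   # sandwich with HC1 scaling
    return beta, se_classic, np.sqrt(np.diag(V_hc1))

rng = np.random.default_rng(13)
n = 2000
x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(0, np.abs(x))   # error SD grows with |x|
X = np.column_stack([np.ones(n), x])

beta, se_classic, se_robust = ols_with_se(X, y)
print("beta:", np.round(beta, 3))
print("classical SE:", np.round(se_classic, 4), " robust (HC1) SE:", np.round(se_robust, 4))
```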

Checking robustness: measures of influence

Measures of influence
Remove observation 1 and see the change; remove observation 2 and see the change; and so on.
Measure the change in: predicted values (y), deviance, coefficients (beta).
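
The remove-one-and-refit idea can be written as a brute-force loop. This minimal Python/NumPy sketch uses hypothetical simulated data with one planted high-leverage, badly fitting point (observation 0). It records, for each left-out observation, the change in the coefficients, in that observation's own predicted value, and in the deviance; for linear regression the deviance is the residual sum of squares.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, r @ r

rng = np.random.default_rng(8)
n = 50
x = rng.uniform(0, 10, size=n)
y = 3 + 1.5 * x + rng.normal(0, 1, size=n)
x[0], y[0] = 20.0, 5.0                 # planted high-leverage, badly fitting point

X = np.column_stack([np.ones(n), x])
beta_all, rss_all = fit(X, y)
yhat_all = X @ beta_all

delta_beta = np.empty((n, 2))
delta_yhat = np.empty(n)
delta_rss = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, rss_i = fit(X[keep], y[keep])
    delta_beta[i] = beta_all - beta_i             # change in coefficients
    delta_yhat[i] = yhat_all[i] - X[i] @ beta_i   # change in own predicted value
    delta_rss[i] = rss_all - rss_i                # change in deviance (RSS)

most = int(np.argmax(np.abs(delta_beta[:, 1])))
print("largest change in slope when dropping obs", most, ":", round(float(delta_beta[most, 1]), 3))
```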

Influence idea
- Outlierness: residuals
- Leverage: distance from the x-mean
- Influence: a combination of both
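
The three ideas can be computed directly. This is a minimal Python/NumPy sketch on hypothetical simulated data. Leverage is the diagonal of the hat matrix and depends on x only; with one covariate it equals 1/n plus the scaled squared distance from the x-mean. Outlierness is the residual. For the combination, the sketch swaps in Cook's distance, one common influence measure not named on the slide; in Stata, `predict` after `regress` offers the options `leverage` and `cooksd`.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1 + 2 * x + rng.normal(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# With one covariate, leverage is a function of the distance from the x-mean
h_formula = 1 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()

# Outlierness: residuals. Influence combines both, here via Cook's distance
e = y - X @ (XtX_inv @ X.T @ y)
s2 = (e @ e) / (n - p)
cooks_d = (e ** 2 / (p * s2)) * h / (1 - h) ** 2
print("highest leverage: obs", int(h.argmax()), " largest Cook's D: obs", int(cooks_d.argmax()))
```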

Leverage versus squared residuals
Added variable plot (partial regression leverage plot): an "adjusted" scatterplot.
Look at observation 321 (high leverage, medium residual) and observation 111 (low leverage, high residual).
These plots cannot show two points with large effects in opposite directions; delta-beta can.
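
An added variable plot puts two sets of residuals on its axes: the outcome and the covariate of interest, each adjusted for the other covariates. Its least-squares slope equals that covariate's coefficient in the full model (the Frisch-Waugh-Lovell result). In Stata the plots are `avplot` and, for leverage against squared residuals, `lvr2plot`. The Python/NumPy sketch below computes the plot's coordinates on hypothetical simulated data; the variable names are illustrative only.

```python
import numpy as np

def resid_on(Z, v):
    """Residuals from regressing v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

rng = np.random.default_rng(10)
n = 500
sex = rng.integers(0, 2, size=n).astype(float)   # hypothetical 0/1 covariate
gest = rng.normal(280, 12, size=n) + 2 * sex     # hypothetical, correlated with sex
weight = 3500 - 150 * sex + 17 * (gest - 280) + rng.normal(0, 400, size=n)

X_full = np.column_stack([np.ones(n), gest, sex])
b_full, *_ = np.linalg.lstsq(X_full, weight, rcond=None)

# Added variable plot for gest: both axes adjusted for the other covariates
Z = np.column_stack([np.ones(n), sex])
x_av = resid_on(Z, gest)     # horizontal axis: gest adjusted for sex
y_av = resid_on(Z, weight)   # vertical axis: weight adjusted for sex
slope_av = (x_av @ y_av) / (x_av @ x_av)   # both axes have mean 0, so no intercept needed
print(f"AV-plot slope {slope_av:.3f}, full-model gest coefficient {b_full[1]:.3f}")
```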

Delta beta (for gestational age)
Advantage: a direct measure of the change in the coefficient we are interested in, in both positive and negative directions.
Disadvantage: one measure for each covariate. Scaled?
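
Delta-beta does not require n refits: b - b_(i) = (X'X)^{-1} x_i e_i / (1 - h_i). The scaled version (DFBETAS) divides by the deleted-residual SD s_(i) and by sqrt of the matching diagonal element of (X'X)^{-1}. This is a minimal Python/NumPy sketch on hypothetical simulated data, with a brute-force refit as a check. A commonly quoted rule of thumb, not from the slides, flags |DFBETAS| > 2/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 40
x = rng.uniform(0, 10, size=n)
y = 2 + 0.8 * x + rng.normal(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii

# Unscaled delta-beta without refitting: row i is b - b_(i)
dfbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]

# Scaled (DFBETAS): divide by s_(i) * sqrt((X'X)^{-1}_jj)
s2_del = ((e @ e) - e ** 2 / (1 - h)) / (n - p - 1)   # deleted-residual variance
dfbetas = dfbeta / (np.sqrt(s2_del)[:, None] * np.sqrt(np.diag(XtX_inv))[None, :])

# Brute-force check: refit without each observation
brute = np.array([beta - np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
                  for i in range(n)])
print("max |DFBETAS| for the slope:", round(float(np.abs(dfbetas[:, 1]).max()), 3))
```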

Delta fitted value (DFITS)
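
DFITS measures how much an observation's own fitted value moves when it is left out, scaled by s_(i)*sqrt(h_i). It equals the externally studentized residual times sqrt(h_i / (1 - h_i)). In Stata it is available through the `dfits` option of `predict` after `regress`. This is a minimal Python/NumPy sketch on hypothetical simulated data, checked against brute-force refits. The cut-off 2*sqrt(p/n) is a commonly quoted rule of thumb, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 40
x = rng.uniform(0, 10, size=n)
y = 2 + 0.8 * x + rng.normal(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Deleted-residual variance s_(i)^2 and externally studentized residuals t_i
s2_del = ((e @ e) - e ** 2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_del * (1 - h))
dfits = t * np.sqrt(h / (1 - h))

# Brute force: change in own fitted value when obs i is left out, scaled
brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    r_i = y[keep] - X[keep] @ b_i
    s_i = np.sqrt((r_i @ r_i) / (n - 1 - p))
    brute[i] = (X[i] @ beta - X[i] @ b_i) / (s_i * np.sqrt(h[i]))

cutoff = 2 * np.sqrt(p / n)   # rule-of-thumb cut-off
print("obs with |DFITS| > cut-off:", np.flatnonzero(np.abs(dfits) > cutoff))
```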

Summary: robustness, influence
Linear regression is sensitive!
Look for influential points: leverage versus residual plots, added variable plots, delta-beta.
Rerun the regression without the influential points and look for changes in the coefficients, the constant term and the p-values.
An influential point was found even in a dataset with N = 30 000 (MoBa, exercise).
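
The rerun step can be sketched in Python/NumPy on hypothetical simulated data with one planted influential point (observation 0). Points are flagged with the rule of thumb |DFBETAS| > 2/sqrt(n), an assumed cut-off not taken from the slides. The sketch then refits without them and compares coefficients and t statistics; p-values would follow from the t statistics.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, t statistics, (X'X)^{-1} and residuals."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    return b, b / se, XtX_inv, e

rng = np.random.default_rng(12)
n = 100
x = rng.uniform(0, 10, size=n)
y = 3 + 0.3 * x + rng.normal(0, 2, size=n)
x[0], y[0] = 25.0, 40.0                # planted influential point

X = np.column_stack([np.ones(n), x])
p = X.shape[1]
b, t, XtX_inv, e = ols(X, y)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
s2_del = ((e @ e) - e ** 2 / (1 - h)) / (n - p - 1)
dfbetas = (X @ XtX_inv) * (e / (1 - h))[:, None] / (
    np.sqrt(s2_del)[:, None] * np.sqrt(np.diag(XtX_inv))[None, :])

flag = np.any(np.abs(dfbetas) > 2 / np.sqrt(n), axis=1)   # rule-of-thumb cut-off
b2, t2, _, _ = ols(X[~flag], y[~flag])
print("flagged:", np.flatnonzero(flag))
print("all data:      b =", np.round(b, 3), " t =", np.round(t, 2))
print("without flags: b =", np.round(b2, 3), " t =", np.round(t2, 2))
```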

Logistic, Poisson regression
Assumptions:
- Independent errors: as before
- Linear effects: as before
- Constant error variance: no! (the variance is determined by the mean)
Robustness: linear regression is not robust; Poisson regression is moderately robust; logistic regression is fairly robust.