Stata: Linear Regression (Stata 3)
Hein Stigum
Presentation, data and programs at: courses

SYNTHETIC DATA EXAMPLE: Birth weight by gestational age

Linear regression: Birth weight by gestational age

Regression idea

Model, measure and assumptions
Model: y = β0 + β1x1 + β2x2 + ... + ε
Association measure: β1 = change in y for a one-unit increase in x1
Assumptions
– Independent errors
– Linear effects
– Constant error variance
Robustness
– Influence

Association measure
Model / Start with / Hence: derivation of the association measure β1 (worked out below)
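A worked version of this derivation, assuming the simple one-covariate model (a standard result; β1 is the association measure defined on the previous slide):

\[ E(y \mid x_1) = \beta_0 + \beta_1 x_1 \]
\[ E(y \mid x_1 + 1) - E(y \mid x_1) = [\beta_0 + \beta_1 (x_1 + 1)] - [\beta_0 + \beta_1 x_1] = \beta_1 \]

Hence β1 is the expected change in y for a one-unit increase in x1.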

Purpose of regression
Estimation
– Estimate the association between outcome and exposure, adjusted for other covariates
Prediction
– Use an estimated model to predict the outcome given covariates in a new dataset
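A minimal Stata sketch of the two uses, assuming the bw, gest, educ and sex variables from the example data; the file newbirths.dta is a hypothetical new dataset with the same covariates:

regress bw gest i.educ sex        // estimation: adjusted association between bw and gest
use newbirths, clear              // hypothetical new data containing gest, educ, sex
predict bw_hat, xb                // prediction: apply the stored coefficients to the new data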

Outcome distributions by exposure: depending on the shape of the outcome distribution, use
– linear regression
– quantile regression, or dichotomize at a cutoff and use logistic regression
– linear regression, or transform the outcome and use linear regression
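A sketch of the corresponding Stata commands, assuming the bw and gest variables from the example data; the 2500 g cutoff and the log transform are illustrative assumptions, not values from the slides:

regress bw gest                        // linear regression
qreg bw gest                           // quantile (median) regression
gen lowbw = (bw < 2500) if bw < .      // dichotomize at an assumed 2500 g cutoff
logistic lowbw gest                    // logistic regression on the dichotomized outcome
gen lnbw = ln(bw)                      // transform the outcome (illustrative)
regress lnbw gest                      // linear regression on the transformed outcome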

Workflow
DAG
Scatter- and density plots
Bivariate analysis
Regression
– Model estimation
– Test of assumptions: independent errors, linear effects, constant error variance
– Robustness: influence
(DAG nodes: E = gestational age (exposure), D = birth weight (outcome), C1 = sex, C2 = education)

Scatter and density plots
Scatter of birth weight by gestational age: look for deviations from linearity and outliers
Distribution of birth weight for low/high gestational age: look for shift in shape
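A sketch of these plots in Stata, assuming the bw and gest variables used later in the slides; the 280-day split into low/high gestational age is an illustrative choice:

twoway scatter bw gest                      // birth weight by gestational age
twoway (kdensity bw if gest < 280) ///
       (kdensity bw if gest >= 280)         // bw distribution for low/high gestational age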

Syntax
Estimation
– regress y x1 x2              linear regression
– regress y c.age i.sex        continuous age, categorical sex
– regress y c.age##i.sex       main effects + interaction
Compare models
– estimates store m1           save model
– estimates table m1 m2        compare coefficients
– estimates stats m1 m2        compare model fit
Post estimation
– predict res, residuals       predict residuals

Model 1: outcome + exposure
regress bw gest                crude model
estimates store m1             store model results

Model 2 and 3: Add covariates
regress bw gest i.educ sex     add covariates
estimates table m1 m2 m3       compare coefficients
estimates stats m1 m2 m3       compare model fit (m3 more precise?)
Estimate association: m1 is biased, m2 = m3
Prediction: m3 is best

Factor (categorical) variables
Variable
– educ = 1, 2, 3 for low, medium and high education
Built in
– i.educ          use educ = 1 as base (reference)
– ib3.educ        use educ = 3 as base (reference)
Manual "dummies"
– educ = 1 as base, make dummies for 2 and 3
– generate Medium = (educ==2) if educ<.
– generate High = (educ==3) if educ<.
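A usage sketch of the three ways to code education in the regression, assuming the bw, gest, educ and sex variables from these slides:

regress bw gest i.educ sex          // educ = 1 (low) as reference
regress bw gest ib3.educ sex        // educ = 3 (high) as reference
regress bw gest Medium High sex     // manual dummies, educ = 1 as reference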

Create meaningful constant
Expected birth weight at:
– gest = 0, educ = 1, sex = 0: not meaningful
– gest = 280, educ = 1, sex = 0: meaningful
Margins:
margins, at(gest=0 educ=1 sex=0)       not meaningful
margins, at(gest=280 educ=1 sex=0)     expected birth weight = 3426
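An alternative way to get a meaningful constant is to centre gestational age before fitting (a centring sketch, not the margins approach shown on this slide):

gen gest280 = gest - 280            // gestational age centred at 280 days
regress bw gest280 i.educ sex       // _cons = expected bw at gest 280, educ 1, sex 0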

Results so far
Would normally check for interaction now!

ASSUMPTIONS

Test of assumptions
Assumptions
– Independent residuals: discuss
– Linear effects: plot residuals versus predicted y
    predict res, residuals
    predict pred, xb
    scatter res pred
– Constant variance: estat hettest     p = 0.9, no heteroskedasticity

Violations of assumptions
Dependent residuals: use mixed models or GEE
Non-linear effects: add square term or spline
Non-constant variance: use robust variance estimation
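A sketch of the corresponding Stata remedies, assuming the bw, gest, educ and sex variables from these slides; the cluster variable momid is hypothetical:

regress bw gest i.educ sex, vce(robust)      // robust variance for non-constant variance
regress bw c.gest##c.gest i.educ sex         // square term for a non-linear effect of gest
mixed bw gest i.educ sex || momid:           // mixed model for dependent (clustered) residuals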

ROBUSTNESS: Measures of influence

Influence idea

Measures of influence
Measure change in:
– Predicted outcome
– Deviance
– Coefficients (beta): delta-beta
Idea: remove obs 1, see the change; remove obs 2, see the change; and so on.
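A sketch of how two such measures can be obtained after regress, assuming the model from these slides; dfits summarizes the change in the predicted outcome, dfbeta the change in a single coefficient (the following slides use dfbeta and lvr2plot):

regress bw gest i.educ sex
predict dfit, dfits            // change in predicted outcome when each obs is removed
dfbeta(gest)                   // change in the gest coefficient (delta-beta)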

Leverage versus squared residuals
lvr2plot, mlabel(id)           points far to the upper right have high influence

Delta-beta for gestational age
dfbeta(gest)                   delta-beta (variable specific)
scatter _dfbeta_1 id           plot against observation id
If obs nr 370 is removed, beta will change from 17.9 to 18.6

Removing outlier
regress bw gest i.educ sex if id!=370
est store m4
est table m3 m4, b(%8.1f)

Removing outlier
Full model: N = 5000; outlier removed: N = 4999
One outlier affected several estimates
Final model

Help
Linear regression
– help regress                    syntax and options
– help regress postestimation     dfbeta, estat hettest, lvr2plot, predict, margins

NON-LINEAR EFFECTS (bw2)

bw2: Non-linear effects
Handle: add polynomial or spline

Non-linear effects: polynomial
regress bw2 c.gest##c.gest i.educ sex     2nd order polynomial in gest
margins, at(gest=(250(10)310))            predicted bw2 by gest
marginsplot                               plot

Non-linear effects: spline
Cubic spline
mkspline g=gest, cubic nknots(4)          make cubic spline with 4 knots
regress bw2 g1 g2 g3 i.educ sex           regression with spline
Plot
gen igest=5*round(gest/5)                 5-day integer values of gest
margins, over(igest)                      predicted bw by gest
marginsplot
Linear spline
mkspline g1 280 g2=gest                   make linear spline with knot at 280
regress bw2 g1 g2 i.educ sex              regression with spline

INTERACTION (bw3)

Interaction definitions
Interaction: combined effect of two variables
Scale
– Linear models: additive
    y = b0 + b1x1 + b2x2; effect of a one-unit increase in both x1 and x2 = b1 + b2
– Logistic, Poisson, Cox: multiplicative
Interaction
– deviation from additivity (multiplicativity)
– effect of x1 depends on x2
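Written out for the linear case, the additive model and the model with an interaction term (standard notation, matching the b's on this slide):

\[ E(y) = b_0 + b_1 x_1 + b_2 x_2 \quad \text{(additive: joint effect of a one-unit increase in both } x_1 \text{ and } x_2 \text{ is } b_1 + b_2\text{)} \]
\[ E(y) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \quad \text{(interaction: effect of } x_1 \text{ is } b_1 + b_3 x_2\text{, so it depends on } x_2\text{)} \]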

bw3: Interaction (only linear effects)
Add interaction terms
regress bw3 c.gest##i.sex i.educ          gest-sex interaction
Show results
margins, dydx(gest) at(sex=0)             effect of gest for boys
margins, dydx(gest) at(sex=1)             effect of gest for girls

Summing up 1
Build model
– regress bw gest                     crude model
– est store m1                        store
– regress bw gest i.educ sex          full model
– est store m2
– est table m1 m2                     compare coefficients
Interaction
– regress bw3 c.gest##i.sex i.educ    test interaction
– margins, dydx(gest) at(sex=0)       gest for boys
Assumptions
– predict res, residuals              residuals
– predict pred, xb                    predicted
– scatter res pred                    plot

Summing up 2
Non-linearity (linear spline)
– mkspline g1 280 g2=gest             spline with knot at 280
– regress bw2 g1 g2 i.educ sex        regression with spline
Robustness
– dfbeta(gest)                        delta-beta
– scatter _dfbeta_1 id                plot versus id