Multiple regression refresher. Austin Troy, NR 245. Based primarily on material from Garson, G. David. 2010. Multiple Regression. Statnotes: Topics in Multivariate Analysis.



Purpose
– Model Y (dependent) as a function of a vector of X's (independent): Y = a + b1X1 + b2X2 + … + bnXn + e
– Test whether each coefficient b = 0
– Each X adds a dimension
– With multiple X's, the coefficient on Xi gives the effect of Xi controlling for all other X's
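The equation above can be fit by ordinary least squares; a minimal sketch in NumPy, where the data and coefficient values are made up for illustration:

```python
import numpy as np

# Illustrative data: y depends on two predictors plus random error
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates of [a, b1, b2]
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
```

With little noise, the estimates land close to the true values used to generate the data (a near 1.0, b1 near 2.0, b2 near -0.5).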

Assumptions
– Proper specification of the model
– Linearity of relationships. Nonlinearity is usually not a problem when the SD of Y is greater than the SD of the residuals.
– Normality of the error term (not of Y)
– Same underlying distribution for all variables
– Homoscedasticity (constant variance). Heteroscedasticity may indicate an omitted interaction effect. Can use weighted least squares regression or a transformation.
– No outliers. Check leverage statistics.
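The leverage statistics mentioned in the last bullet come from the diagonal of the hat matrix H = X(X'X)⁻¹X'. A sketch in NumPy on made-up data, with one deliberately extreme x value:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
X[0, 1] = 10.0  # one deliberately extreme x value

# Hat matrix H = X (X'X)^-1 X'; its diagonal gives each point's leverage
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Leverages sum to the number of parameters p; a common flag is leverage > 2p/n
n, p = X.shape
high = leverage > 2 * p / n
```

The extreme point gets leverage far above the 2p/n cutoff, while ordinary points fall below it.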

Assumptions (continued)
– Interval, continuous, unbounded data
– Non-simultaneity/recursivity: causality runs one way
– Absence of perfect or high partial multicollinearity
– Population error is uncorrelated with each of the independents ("assumption of mean independence": mean error does not vary with X)
– Independent observations (absence of autocorrelation), leading to uncorrelated error terms; no spatial or temporal autocorrelation
– Mean population error = 0
– Random sampling

Outputs of regression
– Model fit: R2 = 1 - (SSE/SST), where SSE = error sum of squares and SST = total sum of squares
– Coefficients table: intercept, betas, standard errors, t statistics, p values
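The R2 formula above, computed directly on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(size=100)

# Fit a simple linear model, then form the two sums of squares
X = np.column_stack([np.ones(100), x])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

sse = np.sum(resid ** 2)            # error sum of squares
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
r2 = 1 - sse / sst
```

Here the signal (slope 3) dominates the unit-variance noise, so R2 comes out high.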

A simple univariate model

A simple multivariate model

Another example: car price

Addressing multicollinearity
– Multicollinearity is intercorrelation of the X's. When excessive, the standard errors of the beta coefficients become large and it is hard to assess the relative importance of the X's.
– It is a problem when the research purpose includes causal modeling.
– Increasing sample size can offset it.
– Options:
– Mean-center the data
– Combine variables into a composite variable
– Remove the most intercorrelated variable(s) from the analysis
– Use partial least squares, which does not assume no multicollinearity
– Ways to check: correlation matrix, variance inflation factors (VIF). VIF > 4 is a common rule of thumb.
– Example: VIF from the last model for diasbp, age, generaldiet, exercise, drinker; and VIF when we regress BMI, age and weight against blood pressure, for age, bmi, wt (values not preserved in the transcript).
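A VIF can be computed by regressing each predictor on the others: VIF_j = 1/(1 - R2_j), where R2_j is from that auxiliary regression. A sketch on made-up, deliberately collinear data (variable names are illustrative, not from the slide's models):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the remaining columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, _, _, _ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=100)
b = a + rng.normal(scale=0.3, size=100)  # strongly correlated with a
c = rng.normal(size=100)                 # independent of both
vifs = vif(np.column_stack([a, b, c]))
```

The two collinear columns exceed the VIF > 4 rule of thumb; the independent column stays near 1.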

Addressing nonconstant variance
– Diagnosed with residual plots (or absolute-residual plots); look for a funnel shape. (The bottom graph on the original slide shows the ideal, constant-variance pattern.)
– Generally suggests the need for:
– a generalized linear model,
– a transformation,
– weighted least squares, or
– addition of variables (with which the error is correlated)
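Weighted least squares, one of the remedies listed above, can be sketched by rescaling each observation by 1/sigma_i and then running ordinary least squares. Here the error SDs are assumed known, purely for illustration; in practice they must be estimated:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 200)
sigma = 0.5 * x                           # error SD grows with x: the funnel shape
y = 2.0 + 1.5 * x + rng.normal(scale=sigma)

X = np.column_stack([np.ones_like(x), x])

# WLS: multiply each row by 1/sigma_i, then solve by ordinary least squares
w = 1.0 / sigma
coef, _, _, _ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
a, b = coef
```

The weighting downweights the noisy large-x observations, so the estimates stay close to the true intercept 2.0 and slope 1.5.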

Considerations: model specification
– A U shape or upside-down U in the residual plot suggests a nonlinear relationship between the X's and Y.
– Note: full-model residual plots versus partial residual plots
– Possible transformations: semi-log, log-log, square root, inverse, power, Box-Cox
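As an illustration of one transformation from the list above, a log-log fit linearizes a power relationship y = c·x^b (data and the exponent 0.7 are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 100, size=300)
# Power relationship with multiplicative error
y = 2.0 * x ** 0.7 * np.exp(rng.normal(scale=0.05, size=300))

# log-log transform: log y = log c + b * log x, a straight line
X = np.column_stack([np.ones(300), np.log(x)])
coef, _, _, _ = np.linalg.lstsq(X, np.log(y), rcond=None)
log_c, b = coef
```

The fitted slope recovers the exponent (b near 0.7) and exp(log_c) recovers the multiplier (near 2.0), which is why residual-plot curvature often disappears after such a transformation.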

Considerations: normality
– Check the error term with a normal quantile plot.
– Close to normal: points follow a straight line.
– A population skewed to the right (i.e. with a long right-hand tail) shows curvature.
– Heavy-tailed populations are symmetric, with more members at greater remove from the population mean than in a normal population with the same standard deviation.
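The right-skew diagnosis above can also be quantified numerically with sample skewness, which is near zero for symmetric data and positive for a long right tail. A minimal sketch using made-up samples:

```python
import numpy as np

def skewness(a):
    """Sample skewness: the third standardized moment."""
    a = np.asarray(a, dtype=float)
    z = (a - a.mean()) / a.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(6)
symmetric = rng.normal(size=5000)
right_skewed = rng.exponential(size=5000)  # long right-hand tail
```

The normal sample scores near 0, while the exponential sample scores well above 1, matching what the quantile plot shows visually.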