
Multiple Regression

What is Multiple Regression? y = b0 + b1x1 + b2x2 + … + bkxk. One dependent variable, y; several independent variables, x1, x2, …, xk.
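As a quick illustration, here is a minimal sketch of fitting such an equation with Python's statsmodels; the data frame and column names (df, y, x1, x2) are hypothetical placeholders, not part of the original slides.

```python
# Minimal sketch: fitting y = b0 + b1*x1 + b2*x2 by least squares.
# The data frame and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "y":  [4, 7, 6, 9, 11, 10],
    "x1": [1, 2, 2, 3, 4, 4],
    "x2": [0, 1, 0, 1, 1, 0],
})

X = sm.add_constant(df[["x1", "x2"]])  # adds the intercept column (b0)
model = sm.OLS(df["y"], X).fit()       # least-squares estimates of b0..bk
print(model.params)                    # fitted b0, b1, b2
```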

Assumptions
- Quantitative data
- Independent observations
- For each value of the IVs, the distribution of the DV must be normal
- The variance of the distribution of the DV should be constant for all values of the IVs (homoscedasticity)
- The relationship between the DV and each IV should be linear
- Limited linear correlation among the independent variables (i.e. no serious multicollinearity)
- Residuals of the predicted DV values should be random

Using the SWI data set
DV: Trust towards social workers (Q2)
6 IVs:
- Social workers make people rely on welfare (Q1c)
- Social workers bring hope to those in adverse situations (Q1n)
- Social workers help the disadvantaged (Q1m)
- Age (Q13)
- Family income (Q17)
- Sex (Q12)

A model (or equation) involves the DV and some combination of the IVs. With 5 independent variables, a model can include 0, 1, 2, 3, 4, or 5 of them. Since each IV is either in or out of the model, there can be as many as 2^5 = 32 models.
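A quick way to verify that count, as a hedged sketch in Python (the variable names echo the SWI items, but the check itself is just counting subsets):

```python
# Counting all possible models: each of the 5 IVs is either in or out,
# so there are 2**5 = 32 subsets (including the empty model).
from itertools import combinations

ivs = ["Q1c", "Q1n", "Q1m", "Q13", "Q17"]  # illustrative 5-IV list
n_models = sum(1 for k in range(len(ivs) + 1)
               for _ in combinations(ivs, k))
print(n_models)  # 32, i.e. 2**5
```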

Method Enter: includes all the independent variables in the model. Some variables might contribute little to the model, yet they still affect the coefficients of the equation (or the model). Other methods of inclusion (Forward, Stepwise) include only those variables with greater contributions (larger changes in R2) and, usually, a coefficient b that differs significantly from 0.

Null hypotheses: the ANOVA table is used to test several equivalent null hypotheses: there is no linear relationship in the population between the dependent variable and the independent variables; all of the population partial regression coefficients are 0; and the population value of multiple R2 is 0. The alternative hypothesis says only that at least one partial regression coefficient is not 0.

For each individual variable, the null hypothesis that its population partial regression coefficient is 0 is tested using the t statistic and its observed significance level.
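Continuing the hypothetical statsmodels sketch from above, both the overall F test and the per-coefficient t tests can be read off the fitted model:

```python
# Overall ANOVA F test and per-coefficient t tests, using the fitted
# 'model' from the earlier sketch.
print(model.fvalue, model.f_pvalue)  # F statistic and its p-value
print(model.tvalues)                 # t statistic for each coefficient
print(model.pvalues)                 # observed significance level of each b
```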

The equation: Trust of social workers = 3.3 + 0.759 × Q1n − 0.233 × Q1c + 0.375 × Q1m − 0.145 × Q17

Understanding partial correlation: it removes from both the given IV and the DV all variance accounted for by the other IVs (the control variables), and then correlates the unique component of the IV with the unique component of the DV. That is, it is the correlation between an IV and the DV after the variance explained by the other IVs (the controlling variables) has been removed. We will say: the partial correlation of an IV is its correlation with the DV after the influence of the other IVs in the model has been controlled.
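This definition translates directly into a residual-based computation; here is a minimal sketch, where df, dv, iv, and controls are hypothetical placeholders:

```python
# Partial correlation via residuals: regress both the IV of interest and
# the DV on the control IVs, then correlate the two residual series.
import numpy as np
import statsmodels.api as sm

def partial_corr(df, dv, iv, controls):
    Z = sm.add_constant(df[controls])
    res_iv = sm.OLS(df[iv], Z).fit().resid  # unique component of the IV
    res_dv = sm.OLS(df[dv], Z).fit().resid  # unique component of the DV
    return np.corrcoef(res_iv, res_dv)[0, 1]
```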

[Diagram: overlapping variance among IV1, IV2, and the DV]

Method Stepwise: this method enters IVs one at a time, selecting the one that is significant at the 0.05 level (you can change this) and that produces the largest change in the model's R2. It does one more thing: when an IV enters the model, it checks whether any of the variables already in the model has had its partial correlation reduced so much that its significance level now exceeds 0.10 (this is the default; you can change it yourself). If so, that variable is excluded from the model. A rough sketch of the procedure follows.
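The sketch below approximates this logic with statsmodels p-values (for a single added variable, the smallest p-value corresponds to the largest change in R2); it is an illustration of the idea, not a reproduction of SPSS's exact algorithm, and the names df, dv, and candidates are hypothetical:

```python
# Rough sketch of stepwise selection: entry when p < .05, removal when
# p > .10 (the defaults described above). Not SPSS's exact algorithm.
import statsmodels.api as sm

def stepwise(df, dv, candidates, p_enter=0.05, p_remove=0.10):
    included = []
    while True:
        changed = False
        # Entry step: among the excluded IVs, find the most significant one.
        pvals = {}
        for c in (c for c in candidates if c not in included):
            X = sm.add_constant(df[included + [c]])
            pvals[c] = sm.OLS(df[dv], X).fit().pvalues[c]
        if pvals and min(pvals.values()) < p_enter:
            included.append(min(pvals, key=pvals.get))
            changed = True
        # Removal step: drop an included IV whose p-value now exceeds p_remove.
        if included:
            X = sm.add_constant(df[included])
            fit_p = sm.OLS(df[dv], X).fit().pvalues.drop("const")
            worst = fit_p.idxmax()
            if fit_p[worst] > p_remove:
                included.remove(worst)
                changed = True
        if not changed:
            return included
```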

Examining the standardized residuals. ZPRED (the standardized predicted values of the dependent variable based on the model): these are standardized forms of the values predicted by the model. ZRESID (the standardized residuals, or errors): these are the standardized differences between the observed data and the values that the model predicts.
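A common check is to plot ZRESID against ZPRED and look for a random, patternless cloud; a minimal sketch, reusing the hypothetical fitted 'model' from above:

```python
# Plot standardized residuals (ZRESID) against standardized predicted
# values (ZPRED); a patternless cloud supports the model's assumptions.
import matplotlib.pyplot as plt
from scipy import stats

zpred = stats.zscore(model.fittedvalues)
zresid = stats.zscore(model.resid)
plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("ZPRED (standardized predicted values)")
plt.ylabel("ZRESID (standardized residuals)")
plt.show()
```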

Multicollinearity: the existence of a high degree of linear correlation among two or more independent variables in a multiple regression model. In the presence of multicollinearity, it is difficult to assess the individual effect of each independent variable on the dependent variable. A tolerance of less than 0.20 and/or a VIF of 5 or above indicates a multicollinearity problem (Wikipedia).
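Tolerance and VIF can be computed outside SPSS as well; here is a minimal sketch with statsmodels, continuing the hypothetical df from the first example:

```python
# VIF for each IV (tolerance = 1/VIF); flag VIF >= 5 or tolerance < 0.20.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["x1", "x2"]])
for i, name in enumerate(X.columns):
    if name != "const":                                # skip the intercept
        vif = variance_inflation_factor(X.values, i)   # VIF of this IV
        print(name, round(vif, 2), round(1 / vif, 2))  # name, VIF, tolerance
```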

Data Transformation. Use the data set (Transformation.sav). Plot the scatter-plot of Life-satisfaction (Y-axis) against Income (X-axis) (Graphs > Legacy Dialogs > Scatter/Dot).

After transforming monthly income into LnIncome (its natural logarithm), we obtain a new scatter-plot.

For a curve like this, the best approach is to transform the independent variable (dosage of drug) into its inverse (1/x). With the new variable on the X-axis, the plot becomes approximately linear. Both transformations are sketched below.
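A minimal Python sketch of the two transformations just described; the data frame and column names (income, dosage) are hypothetical placeholders:

```python
# The log and inverse transformations described above.
import numpy as np
import pandas as pd

tf = pd.DataFrame({"income": [8000, 12000, 20000, 45000],
                   "dosage": [1.0, 2.0, 4.0, 8.0]})
tf["LnIncome"] = np.log(tf["income"])  # natural log, as in LnIncome
tf["InvDosage"] = 1.0 / tf["dosage"]   # inverse (1/x) of the dosage
# Re-plot the DV against the transformed IV to check for linearity.
```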

Dummy variable: if you have a categorical (i.e. nominal) variable that you want to include in your model as an independent variable, you need to recode it into a number of dummy variables. For a dichotomous variable such as gender (male = 1, female = 0), gender = 1 means that the case is male. You can rename the variable Male (1 = male, 0 = not male); Male is called a dummy variable. In a multiple regression equation, we can then have something like this:

Marital satisfaction (MS) = a + b1 × Years married + b2 × Male + b3 × Income
For males: MS = a + b1 × Years married + b2 × (1) + b3 × Income
For females: MS = a + b1 × Years married + b2 × (0) + b3 × Income

For place of residence (HK, Kowloon, and NT), suppose you choose HK as the reference group. You need two dummy variables: Kowloon and NT. For a person living in Kowloon, enter 1 for Kowloon and 0 for NT. If a person lives in NT, enter 0 for Kowloon and 1 for NT. For a person living in HK, both Kowloon and NT will be 0.

MS = a + b1 × Years married + b2 × Male + b3 × Income + b4 × Kowloon + b5 × NT
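This recoding is easy to automate; a minimal pandas sketch, where the column name residence is a hypothetical placeholder:

```python
# Build the Kowloon and NT dummies, treating HK as the reference group.
import pandas as pd

d = pd.DataFrame({"residence": ["HK", "Kowloon", "NT", "Kowloon"]})
dummies = pd.get_dummies(d["residence"])[["Kowloon", "NT"]].astype(int)
d = d.join(dummies)  # a HK resident gets Kowloon = 0 and NT = 0
print(d)
```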

http://www.stattucino.com/berrie/dsl/regression/regression.html

Null hypothesis for the ANOVA test in regression: there is no linear relationship in the population between the dependent variable and the independent variables. Alternative: at least one population partial regression coefficient is not 0.

Null hypothesis for the t-test of a regression coefficient (b): the slope of the regression line fitting the two variables in the population is equal to zero. Equivalently, for the regression equation y = a + bx:
H0: b = 0
Ha: b ≠ 0