Analysis of Economic Data


Analysis of Economic Data
Dr. Ka-fu Wong
ECON1003 Analysis of Economic Data

Chapter Twelve: Multiple Regression and Correlation Analysis

GOALS
- Describe the relationship between two or more independent variables and the dependent variable using a multiple regression equation.
- Compute and interpret the multiple standard error of estimate and the coefficient of determination.
- Interpret a correlation matrix.
- Set up and interpret an ANOVA table.
- Conduct a test of hypothesis to determine whether any of the set of regression coefficients differ from zero.
- Conduct a test of hypothesis on each of the regression coefficients.

Multiple Regression Analysis

For two independent variables, the general form of the multiple regression equation is:

Yi = E(Yi | X1i, X2i) + ei = b0 + b1 X1i + b2 X2i + ei

X1i and X2i are the i-th observations of the independent variables. b0 is the Y-intercept. b1 is the net change in Y for each unit change in X1, holding X2 constant. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

Visualize the multiple linear regression in a plot

The simple linear regression model allows for one independent variable x: y = b0 + b1x + e. Note how the straight line becomes a plane when a second variable is added: the multiple linear regression model allows for more than one independent variable, y = b0 + b1x1 + b2x2 + e.


Visualize the multiple non-linear regression in a plot

In two dimensions, y = b0 + b1x^2 is a parabola. Adding a second variable, y = b0 + b1x1^2 + b2x2, the parabola becomes a parabolic surface.

Multiple Regression Analysis

The general multiple regression with k independent variables is given by:

Yi = E(Yi | X1i, X2i, ..., Xki) + ei = b0 + b1 X1i + b2 X2i + ... + bk Xki + ei

When k > 2, it is impossible to visualize the regression equation in a plot. The least squares criterion is used to estimate this equation. Because determining b1, b2, etc. by hand is very tedious, a software package such as Excel or other statistical software is recommended for estimating them.

Choosing the line that fits best: the Ordinary Least Squares (OLS) principle

A linear equation can be described generally by Yi = b0 + b1 X1i + b2 X2i + ... + bk Xki. Finding the best-fitting line means choosing the coefficients that minimize the sum of squared differences between the observed and fitted values:

min over b0, b1, ..., bk of Σi [Yi - (b0 + b1 X1i + ... + bk Xki)]²

Let b0*, b1*, b2*, ..., bk* be the solution of the above problem. Y* = b0* + b1*X1 + b2*X2 + ... + bk*Xk is known as the "average predicted value" (or simply "predicted value") of Y for any vector (X1, X2, ..., Xk).

Coefficient estimates from the ordinary least squares (OLS) principle

Solving the minimization problem implies the first-order conditions: differentiating with respect to each bj and setting the derivative to zero gives, for j = 0, 1, ..., k (with X0i = 1),

Σi [Yi - (b0* + b1* X1i + ... + bk* Xki)] Xji = 0

Coefficient estimates from the ordinary least squares (OLS) principle

Solving the first-order conditions yields the solution b0*, b1*, b2*, ..., bk*. Y* = b0* + b1*X1 + b2*X2 + ... + bk*Xk is known as the "average predicted value" (or simply "predicted value") of Y for any vector (X1, X2, ..., Xk).
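In matrix form the first-order conditions are the normal equations (X'X)b = X'Y, so b* = (X'X)^(-1) X'Y. A minimal sketch in Python with NumPy (the slides themselves use Excel; the data below is made up) shows that solving the normal equations and minimizing the sum of squared residuals directly give the same coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n = 50 observations, k = 2 independent variables.
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), X1, X2])

# Solve the normal equations (X'X) b = X'Y.
b_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq minimizes the sum of squared residuals directly;
# both routes give the same b0*, b1*, b2*.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b_normal)
```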

Multiple Linear Regression Equations

Estimating the coefficients by hand is too complicated, so we use a computational software package such as Excel.

Interpretation of Estimated Coefficients

Slope (bk): the estimated Y changes by bk for each 1-unit increase in Xk, holding all other variables constant. Note that with

Yi = E(Yi | X1i, X2i, ..., Xki) + ei = b0 + b1 X1i + b2 X2i + ... + bk Xki + ei,

E(Yi | X1i, X2i, ..., Xki = g+1) - E(Yi | X1i, X2i, ..., Xki = g)
= b0 + b1 X1i + b2 X2i + ... + bk (g+1) - [b0 + b1 X1i + b2 X2i + ... + bk g]
= bk

Example: if b1 = 2, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X1), given the number of sales reps (X2).

Y-intercept (b0): the average value of Y when all Xk = 0.
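The algebra above can be checked numerically: raising one regressor by a unit while holding the other fixed moves the prediction by exactly that coefficient. A small sketch with made-up coefficients (the sales/advertising names are just the slide's example, not real data):

```python
# Made-up fitted coefficients: sales on advertising (x1) and sales reps (x2).
b0, b1, b2 = 10.0, 2.0, 5.0

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# A 1-unit increase in advertising, holding reps constant, raises the
# prediction by exactly b1 = 2, as in the slide's example.
delta = predict(4.0, 3.0) - predict(3.0, 3.0)
print(delta)  # 2.0
```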

Parameter Estimation Example

You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). You've collected the following data:

Resp  Size  Circ
  1     1     2
  4     8     8
  1     3     1
  3     5     7
  2     6     4
  4    10     6

Is this model specified correctly? What other variables could be used (color, photos, etc.)?

Parameter Estimation Computer Output

Parameter Estimates

                     Parameter  Standard  T for H0:
Variable        DF   Estimate   Error     Param=0   Prob>|T|
INTERCEP (b0)    1   0.0640     0.2599    0.246     0.8214
ADSIZE   (b1)    1   0.2049     0.0588    3.656     0.0399
CIRC     (b2)    1   0.2805     0.0686    4.089     0.0264
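The printed estimates can be reproduced from the six observations on the previous slide. A sketch using NumPy's least-squares routine (the slides themselves show SAS-style output):

```python
import numpy as np

# Data from the example: responses (00), ad size (sq. in.), circulation (000).
resp = np.array([1.0, 4, 1, 3, 2, 4])
size = np.array([1.0, 8, 3, 5, 6, 10])
circ = np.array([2.0, 8, 1, 7, 4, 6])

X = np.column_stack([np.ones(len(resp)), size, circ])
b, *_ = np.linalg.lstsq(X, resp, rcond=None)

# Matches the printout: b0 ~ 0.0640 (INTERCEP), b1 ~ 0.2049 (ADSIZE),
# b2 ~ 0.2805 (CIRC).
print(np.round(b, 4))
```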

Interpretation of Coefficients Solution

Slope (b1): the number of responses to the ad is expected to increase by .2049 (i.e., 20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant.

Slope (b2): the number of responses is expected to increase by .2805 (i.e., 28.05 responses) for each 1-unit (1,000) increase in circulation, holding ad size constant.

The Y-intercept is difficult to interpret: how can you have any responses with no circulation?

Multiple Standard Error of Estimate The multiple standard error of estimate is a measure of the effectiveness of the regression equation. It is measured in the same units as the dependent variable. It is difficult to determine what is a large value and what is a small value of the standard error.

Multiple Standard Error of Estimate

The formula is:

s = sqrt( Σ(Yi - Yi*)² / (n - (k+1)) )

Because k+1 parameters must be estimated in computing Y*, we lose k+1 degrees of freedom. The interpretation is similar to that in simple linear regression.
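Continuing the newspaper-ad example, the multiple standard error of estimate can be computed directly from the residuals (a sketch; the slides leave this calculation to software):

```python
import numpy as np

resp = np.array([1.0, 4, 1, 3, 2, 4])
size = np.array([1.0, 8, 3, 5, 6, 10])
circ = np.array([2.0, 8, 1, 7, 4, 6])

n, k = len(resp), 2
X = np.column_stack([np.ones(n), size, circ])
b, *_ = np.linalg.lstsq(X, resp, rcond=None)

# Sum of squared residuals; k + 1 = 3 parameters were estimated,
# so n - (k + 1) = 3 degrees of freedom remain.
sse = np.sum((resp - X @ b) ** 2)
s_e = np.sqrt(sse / (n - (k + 1)))
print(s_e)  # roughly 0.29
```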

Multiple Regression and Correlation Assumptions
- The independent variables and the dependent variable have a linear relationship.
- The dependent variable must be continuous and at least interval-scale.
- The variation in the residuals (Y - Y*) must be the same for all values of Y. When this is the case, we say the residuals exhibit homoscedasticity.
- The residuals should follow a normal distribution with mean 0.
- Successive values of the dependent variable must be uncorrelated.

The ANOVA Table

The ANOVA table reports the variation in the dependent variable, which is divided into two components:
- The explained variation is that accounted for by the set of independent variables.
- The unexplained, or random, variation is not accounted for by the independent variables.

Correlation Matrix

A correlation matrix is used to show all possible simple correlation coefficients among the variables. It shows which Xj are most correlated with Y, and which Xj are strongly correlated with each other.

Multicollinearity
- High correlation between X variables.
- Multicollinearity makes it difficult to separate the effect of X1 on Y from the effect of X2 on Y.
- It leads to unstable coefficients that depend on which X variables are in the model.
- Some multicollinearity always exists; it is a matter of degree.
- Example: using both age and height as explanatory variables in the same model.

Detecting Multicollinearity
- Examine the correlation matrix: correlations between pairs of X variables that are higher than their correlations with the Y variable signal trouble.
- There are few remedies: obtain new sample data, or eliminate one of the correlated X variables.
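A numeric diagnostic not shown on the slides is the variance inflation factor, VIF_j = 1/(1 - R_j²), where R_j² comes from regressing X_j on the other X's. With only two regressors, R_j² reduces to their squared correlation; a sketch using the ad-example data:

```python
import numpy as np

size = np.array([1.0, 8, 3, 5, 6, 10])
circ = np.array([2.0, 8, 1, 7, 4, 6])

# With two X's, R_j^2 for either variable is just r12 squared.
r12 = np.corrcoef(size, circ)[0, 1]
vif = 1.0 / (1.0 - r12 ** 2)
print(round(vif, 2))  # about 2.22 -- far below the usual alarm level of 10
```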

Correlation Matrix Computer Output

Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6

          RESPONSE  ADSIZE   CIRC
RESPONSE  1.00000   0.90932  0.93117
          0.0       0.0120   0.0069
ADSIZE    0.90932   1.00000  0.74118
          0.0120    0.0      0.0918
CIRC      0.93117   0.74118  1.00000
          0.0069    0.0918   0.0

The diagonal is all 1's. rY1 is the correlation between RESPONSE and ADSIZE, rY2 the correlation between RESPONSE and CIRC, and r12 the correlation between ADSIZE and CIRC.
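The correlation coefficients in this printout can be reproduced from the raw data; a NumPy sketch:

```python
import numpy as np

resp = np.array([1.0, 4, 1, 3, 2, 4])
size = np.array([1.0, 8, 3, 5, 6, 10])
circ = np.array([2.0, 8, 1, 7, 4, 6])

# Rows/columns: RESPONSE, ADSIZE, CIRC -- matching the printout.
R = np.corrcoef([resp, size, circ])
print(np.round(R, 5))
```

The off-diagonal entries reproduce rY1 ≈ 0.90932, rY2 ≈ 0.93117, and r12 ≈ 0.74118.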

Global Test

The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:

H0: β1 = β2 = ... = βk = 0
H1: at least one βj ≠ 0

The test statistic follows an F distribution with k (the number of independent variables) and n - (k+1) degrees of freedom, where n is the sample size.

Test for Individual Variables

This test is used to determine which independent variables have nonzero regression coefficients; for each variable Xj, the hypotheses are H0: βj = 0 against H1: βj ≠ 0. Variables with zero regression coefficients are usually dropped from the analysis. The test statistic follows the t distribution with n - (k+1) degrees of freedom.

EXAMPLE 1 A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in $00, size of family (Size), and whether the family has children in college (College).

Example 1 continued

Note the following regarding the regression equation. The variable College is called a dummy or indicator variable: it can take only one of two possible values, indicating whether or not a child in the family is a college student. Other examples of dummy variables include gender, whether a part is acceptable or unacceptable, and whether a voter will or will not vote for the incumbent governor. We usually code one value of the dummy variable as "1" and the other as "0."

EXAMPLE 1 continued

[Data table for 12 families with columns Family, Food, Income, Size, and Student; several cells were lost in transcription. Recoverable values include Food figures such as 3900, 5300, 4300, 4900, 6400, 7300, 6100, 7400, and 5800, and Income figures such as 376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, and 563.]

EXAMPLE 1 continued

Use a computer software package, such as Excel, to develop a correlation matrix. From the analysis provided by Excel, write out the regression equation:

Y* = 954 + 1.09X1 + 748X2 + 565X3

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

EXAMPLE 1 continued The regression equation is Food = 954 + 1.09 Income + 748 Size + 565 Student Predictor Coef SE Coef T P Constant 954 1581 0.60 0.563 Income 1.092 3.153 0.35 0.738 Size 748.4 303.0 2.47 0.039 Student 564.5 495.1 1.14 0.287 S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1% Analysis of Variance Source DF SS MS F P Regression 3 10762903 3587634 10.94 0.003 Residual Error 8 2623764 327970 Total 11 13386667

EXAMPLE 1 continued

From the regression output we note:
- The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.
- Each additional $100 of income per year will increase the amount spent on food by $109 per year.
- An additional family member will increase the amount spent per year on food by $748.
- A family with a college student will spend $565 more per year on food than one without a college student.

EXAMPLE 1 continued

The correlation matrix is as follows:

         Food    Income  Size
Income   0.587
Size     0.876   0.609
Student  0.773   0.491   0.743

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food. Most of the correlations among the independent variables fall between -.70 and .70 and should not cause problems, though the correlation between Size and Student (.743) slightly exceeds that rule of thumb and is worth watching.

EXAMPLE 1 continued

The estimated food expenditure for a family of 4 with a $500 (that is, $50,000) income and no college student is $4,491:

Y* = 954 + 1.09(500) + 748(4) + 565(0) = 4491
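The arithmetic of this prediction, as a quick check:

```python
# Coefficients from the Example 1 regression output.
b0, b_income, b_size, b_student = 954.0, 1.09, 748.0, 565.0

# Family of 4, income $50,000 (coded as 500), no college student.
food = b0 + b_income * 500 + b_size * 4 + b_student * 0
print(food)  # approximately 4491
```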

EXAMPLE 1 continued

Conduct a global test of hypothesis to determine whether any of the regression coefficients are nonzero. At the 5% level, H0 is rejected if F > 4.07. From the computer output, the computed value of F is 10.94. Decision: H0 is rejected; not all the regression coefficients are zero.
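The F statistic itself follows from the ANOVA table in the earlier output slide: F = MSR/MSE = (SSR/k) / (SSE/(n - (k+1))). A quick check:

```python
# ANOVA figures from the Example 1 regression output.
ss_regression = 10762903.0
ss_error = 2623764.0
n, k = 12, 3  # 12 families; income, size, student

msr = ss_regression / k
mse = ss_error / (n - (k + 1))
f_stat = msr / mse
print(round(f_stat, 2))  # 10.94, which exceeds the critical value of 4.07
```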

EXAMPLE 1 continued

Conduct an individual test to determine which coefficients are not zero; for the independent variable family size, for example, the hypotheses are H0: β2 = 0 against H1: β2 ≠ 0. Using the 5% level of significance, reject H0 if the p-value < .05. From the computer output, the only significant variable is Size (family size) using the p-values. The other variables can be omitted from the model.

EXAMPLE 1 continued

We rerun the analysis using only the significant independent variable, family size. The new regression equation is:

Y* = 340 + 1031X2

The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-squared term was reduced by only 3.6 percentage points.

Example 1 continued Regression Analysis: Food versus Size The regression equation is Food = 340 + 1031 Size Predictor Coef SE Coef T P Constant 339.7 940.7 0.36 0.726 Size 1031.0 179.4 5.75 0.000 S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4% Analysis of Variance Source DF SS MS F P Regression 1 10275977 10275977 33.03 0.000 Residual Error 10 3110690 311069 Total 11 13386667
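Both R-squared figures in this output follow from the ANOVA sums of squares: R² = SSR/SST, and adjusted R² = 1 - (SSE/(n-k-1)) / (SST/(n-1)). A quick check:

```python
# ANOVA figures from the reduced (Size-only) model output.
ss_regression = 10275977.0
ss_error = 3110690.0
ss_total = 13386667.0
n, k = 12, 1

r_sq = ss_regression / ss_total
adj_r_sq = 1.0 - (ss_error / (n - (k + 1))) / (ss_total / (n - 1))
print(round(100 * r_sq, 1), round(100 * adj_r_sq, 1))  # 76.8 74.4
```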

Evaluating the Model

Yi* = b0 + b1X1i + ... + bkXki

Most of the procedures used to evaluate the multiple regression model are the same as those discussed in the chapter on simple regression models:
- Residual analysis and tests for linearity.
- The global F-test.
- The test for the coefficient of correlation (irrelevant here).
- Tests for the individual slopes of the regression line (not enough on their own).
- Non-independence of the error variables: the Durbin-Watson statistic.
- Outliers.
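As one illustration, the Durbin-Watson statistic mentioned above is d = Σ(e_t - e_{t-1})² / Σ e_t². A sketch with made-up residuals (not data from the slides):

```python
import numpy as np

# Made-up residuals, ordered in time.
e = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1])

# d near 2 suggests no first-order autocorrelation;
# d near 0 suggests positive, and d near 4 negative, autocorrelation.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(round(dw, 2))
```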

Chapter Twelve Multiple Regression and Correlation Analysis - END -