1 An Investigation into Regression Model using EVIEWS. Prepared by: Sayed Hossain, Lecturer in Economics, Multimedia University. Personal website: www.sayedhossain.com



2 Seven assumptions of a good regression model:
1. The regression line must fit the data strongly.
2. Most of the independent variables should be individually significant in explaining the dependent variable.
3. The independent variables should be jointly significant in influencing or explaining the dependent variable.
4. The signs of the coefficients should follow economic theory, expectation, experience, or intuition.
5. No serial or auto-correlation in the residual (u).
6. The variance of the residual (u) should be constant, that is, homoscedasticity.
7. The residual (u) should be normally distributed.

3 (Assumption no. 1) The regression line must fit the data strongly (goodness of fit). Guideline: an R² of 60 percent (0.60) or more is preferred.

4 Goodness of fit: The data must be fitted reasonably well; that is, the value of R² should be reasonably high, more than 60 percent. The higher the R², the better the fit.
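The R² calculation described above can be sketched in a few lines. This is a minimal illustration with hypothetical hand-made data (not the sample from these slides), fitting a one-regressor line by ordinary least squares and computing R² as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
import numpy as np

# Hypothetical sample: one regressor, hand-made data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = b0 + b1*x by ordinary least squares
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

# R^2 = 1 - SSR/SST: the share of the variation in y explained by the model
ssr = np.sum((y - fitted) ** 2)       # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ssr / sst
print(r_squared)                      # close to 1: the line fits this data strongly
```

Because the toy data lie almost exactly on a straight line, R² here comes out well above the 0.60 guideline.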

5 (Assumption no. 2) Most of the independent variables should be individually significant (t-test). The t-test is done to determine whether each independent variable (X1, X2, X3 here) is individually significant in influencing the dependent variable, Y.

6 Individual significance of the variables: Most of the independent variables should be individually significant. This can be checked using a t-test. If the p-value of the t-statistic is less than 5 percent (0.05), we reject the null and accept the alternative hypothesis. Rejecting the null hypothesis means that the particular independent variable is significant in influencing the dependent variable in the population.

7 For example, we have four variables: Y, X1, X2, X3. Here Y is dependent and X1, X2, X3 are independent.
Population regression model: Y = B0 + B1X1 + B2X2 + B3X3 + u
Sample regression model: Y = b0 + b1X1 + b2X2 + b3X3 + e
The sample regression line is an estimator of the population regression line. Our target is to estimate the population regression line (which is almost impossible, or too time- and money-consuming, to estimate directly) from the sample regression line. The small b1, b2, and b3 are estimators of the big B1, B2, and B3. Here, u is the residual of the population regression line while e is the residual of the sample regression line; e is the estimator of u, and we want to learn the nature of u from e.
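The relationship between the big B's and the small b's can be illustrated with a simulation. This is a sketch under assumed values: the population parameters B are chosen by hand, a synthetic sample of 35 observations is drawn (the same sample size as the slides' data), and OLS recovers the sample estimates b and residuals e:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population model: Y = B0 + B1*X1 + B2*X2 + B3*X3 + u
B = np.array([1.0, 0.5, -0.3, 0.8])      # the "big B" parameters, unknown in practice
n = 35
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
u = rng.normal(scale=0.5, size=n)        # population disturbance
Y = X @ B + u

# Sample regression line: the "small b" estimates of the big B's
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b                            # sample residuals, the estimators of u
print(b)                                 # b0..b3 should lie near 1.0, 0.5, -0.3, 0.8
```

With an intercept in the model, the sample residuals e sum to zero by construction, one of the ways e mimics the behaviour assumed for u.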

Tips: If the sample is collected according to statistical guidelines (proper random procedures), the sample regression line can be representative of the population regression line. Our target is to estimate the population regression line from a sample regression line.

9 Setting hypotheses for the t-test: an example
Null hypothesis: B0 = 0; Alternative hypothesis: B0 ≠ 0
Null hypothesis: B1 = 0; Alternative hypothesis: B1 ≠ 0
Null hypothesis: B2 = 0; Alternative hypothesis: B2 ≠ 0
Null hypothesis: B3 = 0; Alternative hypothesis: B3 ≠ 0
Hypothesis setting is always done for the population, not for the sample. That is why all hypotheses use the big B's (from the population regression line), not the small b's from the sample regression line.

Hypothesis setting
Null hypothesis: B1 = 0; Alternative hypothesis: B1 ≠ 0
Since the direction of the alternative hypothesis is ≠, we assume that a relationship exists between the independent variable (X1 here) and the dependent variable (Y here) in the population, but we cannot say whether the relationship is negative or positive. This ≠ direction is a two-tail hypothesis.
Null hypothesis: B1 = 0; Alternative hypothesis: B1 < 0
If we set the hypotheses as above instead, we assume that a negative relationship exists between X1 and Y in the population, as the direction of the alternative hypothesis is <. It requires a one-tail test.
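The link between the two-tail and one-tail versions can be shown with hypothetical numbers. Because the t distribution is symmetric, when the sign of the estimate agrees with the one-sided alternative, the one-tail p-value is half the two-tail p-value:

```python
# Hypothetical numbers, not from the slides' regression output
t_stat = -2.15       # estimated b1 divided by its standard error (negative here)
p_two_tail = 0.04    # two-tail p-value for H1: B1 != 0

# Testing H1: B1 < 0 (one tail): the sign of t agrees with the alternative,
# so the one-tail p-value is half the two-tail one
if t_stat < 0:
    p_one_tail = p_two_tail / 2
else:
    p_one_tail = 1 - p_two_tail / 2
print(p_one_tail)    # 0.02: significant at 5 percent in the one-tail test
```

Note that the halving only applies when the estimate's sign matches the direction of the alternative; otherwise the one-tail p-value is larger than the two-tail one.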

11 (Assumption no. 3) Joint significance: the independent variables should be jointly significant in explaining the dependent variable. F-test, ANOVA (Analysis of Variance).

12 Joint significance: The independent variables should be jointly significant in explaining Y. This can be checked using an F-test. If the p-value of the F-statistic is less than 5 percent (0.05), we reject the null and accept the alternative hypothesis. Rejecting the null hypothesis means that the independent variables (X1, X2, X3) can jointly influence the dependent variable, Y.

13 Setting the joint hypothesis
Null hypothesis H0: B1 = B2 = B3 = 0
Alternative H1: not all B's are simultaneously equal to zero
Here B0 is dropped as it is not associated with any variable. Again, the capital B's (population parameters) are used.

14 A few things: Residual (u or e) = actual Y − estimated (fitted) Y. Residual, error term, and disturbance term all mean the same thing. Serial correlation and auto-correlation also mean the same thing.

15 (Assumption no. 4) The signs of the coefficients should follow economic theory, expectation, the experiences of others (literature review), or intuition.

Residual Analysis

17 (Assumption no. 5) No serial or auto-correlation in the residual (u). Breusch-Godfrey serial correlation LM test (BG test).

18 Serial correlation: Serial correlation is a statistical term used to describe the situation in which the residual is correlated with lagged values of itself. In other words, if the residuals are correlated, we call this serial correlation, which is not desirable.

19 How does serial correlation arise in the model? Incorrect model specification, omitted variables, incorrect functional form, or incorrectly transformed data.

20 Detecting serial correlation: There are many ways to detect serial correlation in a model. One approach is the Breusch-Godfrey serial correlation LM test (BG test).

21 Hypothesis setting
Null hypothesis H0: no serial correlation (no correlation between residuals ui and uj)
Alternative hypothesis H1: serial correlation (correlation between residuals ui and uj)
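The BG test above can be sketched directly from its definition. This is a simplified illustration with a synthetic sample (mimicking the slides' n = 35 with three regressors): regress the OLS residuals on the original regressors plus lagged residuals, and take LM = n times the R² of that auxiliary regression; presample lagged residuals are padded with zeros, a common convention:

```python
import numpy as np

def breusch_godfrey_lm(e, X, lags=2):
    """LM statistic of the Breusch-Godfrey test: regress the residuals on the
    original regressors plus `lags` lagged residuals (presample values padded
    with zeros); LM = n * R^2 of that auxiliary regression, chi-square with
    `lags` degrees of freedom under the null of no serial correlation."""
    n = len(e)
    lagged = np.column_stack(
        [np.concatenate([np.zeros(k), e[:-k]]) for k in range(1, lags + 1)]
    )
    Z = np.column_stack([X, lagged])
    b_aux, *_ = np.linalg.lstsq(Z, e, rcond=None)
    resid_aux = e - Z @ b_aux
    r2_aux = 1 - (resid_aux @ resid_aux) / (e @ e)  # e has mean ~0, so SST = e'e
    return n * r2_aux

# Hypothetical sample mimicking the slides: n = 35, three regressors
rng = np.random.default_rng(1)
n = 35
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

lm = breusch_godfrey_lm(e, X, lags=2)
print(lm)  # compare to the chi-square(2) 5% critical value, about 5.99
```

In EViews the same statistic is reported as Obs*R-squared on the BG test output; an LM value below the chi-square critical value means the null of no serial correlation cannot be rejected.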

22 (Assumption no. 6) The variance of the residual (u) is constant (homoscedasticity). Breusch-Pagan-Godfrey test.

23 Heteroscedasticity is a term used to describe the situation in which the variance of the residuals from a model is not constant. When the variance of the residuals is constant, we call it homoscedasticity, which is desirable. If the residuals do not have constant variance, we call it heteroscedasticity, which is not desirable.

24 How may heteroscedasticity arise? Incorrect model specification, or incorrectly transformed data.

25 Hypothesis setting for heteroscedasticity
Null hypothesis H0: homoscedasticity (the variance of the residual (u) is constant)
Alternative hypothesis H1: heteroscedasticity (the variance of the residual (u) is not constant)

26 Detecting heteroscedasticity: There are many tests available to detect heteroscedasticity. One of them is the Breusch-Pagan-Godfrey test, which we will employ here.
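The Breusch-Pagan idea can also be sketched from its definition. This is an illustration under assumptions (synthetic homoscedastic data, Koenker's studentized variant of the statistic rather than the exact EViews implementation): regress the squared residuals on the regressors and take LM = n times the auxiliary R²:

```python
import numpy as np

def breusch_pagan_lm(e, X):
    """Koenker's version of the Breusch-Pagan LM statistic: regress the
    squared residuals on the regressors; LM = n * R^2 of the auxiliary
    regression, chi-square with (number of slope regressors) degrees of
    freedom under the null of homoscedasticity."""
    n = len(e)
    g = e ** 2
    b_aux, *_ = np.linalg.lstsq(X, g, rcond=None)
    resid_aux = g - X @ b_aux
    sst = np.sum((g - g.mean()) ** 2)
    return n * (1 - (resid_aux @ resid_aux) / sst)

# Hypothetical homoscedastic sample: n = 35, three regressors plus a constant
rng = np.random.default_rng(2)
n = 35
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

lm = breusch_pagan_lm(e, X)
print(lm)  # compare to the chi-square(3) 5% critical value, about 7.81
```

If the squared residuals are unrelated to the regressors (constant variance), the auxiliary R² is small and the LM statistic falls below the critical value, so the null of homoscedasticity is not rejected.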

27 (Assumption no. 7) Residuals (u) should be normally distributed. Jarque-Bera statistic.

28 Setting the hypothesis:
Null hypothesis H0: normal distribution (the residual (u) follows a normal distribution)
Alternative hypothesis H1: not a normal distribution (the residual (u) does not follow a normal distribution)
Detecting residual normality: histogram normality test (perform the Jarque-Bera statistic). If the p-value of the Jarque-Bera statistic is less than 5 percent (0.05), we reject the null and accept the alternative; that is, the residuals (u) are not normally distributed.
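The Jarque-Bera statistic itself is a simple function of the sample skewness and kurtosis. This minimal sketch computes it for a hand-made, perfectly symmetric set of toy residuals (not the slides' residuals), so the skewness term vanishes:

```python
import numpy as np

def jarque_bera(e):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the sample kurtosis; chi-square with 2 degrees of
    freedom under the null of normality."""
    n = len(e)
    m = e - e.mean()
    s2 = np.mean(m ** 2)
    skew = np.mean(m ** 3) / s2 ** 1.5
    kurt = np.mean(m ** 4) / s2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Toy residuals: perfectly symmetric, so the skewness term is exactly zero
e = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
jb = jarque_bera(e)
print(round(jb, 3))  # 0.352, far below the chi-square(2) 5% critical value of 5.99
```

A small JB value (large p-value) means the null of normality cannot be rejected, which is the desirable outcome for assumption 7.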

29 An Empirical Model Development

30 Our hypothetical model
Variables: We have four variables: Y, X1, X2, X3. Here Y is dependent and X1, X2, X3 are independent.
Population regression model: Y = B0 + B1X1 + B2X2 + B3X3 + u
Sample regression line: Y = b0 + b1X1 + b2X2 + b3X3 + e

DATA: The sample size is 35, taken from the population.

DATA (columns: obs, RESID, X1, X2, X3, Y, YF)

DATA (columns: obs, RESID, X1, X2, X3, Y, YF)
Y, X1, X2, and X3 are actual sample data collected from the population.
YF = estimated, forecasted, or predicted Y.
RESID (e) = residuals of the sample regression line; that is, e = actual Y − predicted (fitted) Y.

Regression Output

35 Regression output
Dependent Variable: Y
Method: Least Squares
Included observations: 35

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C
X1          -2.11E
X2
X3          -3.95E

R-squared: 0.1684            Mean dependent var
Adjusted R-squared: 0.087    S.D. dependent var
S.E. of regression: 0.3736   Akaike info criterion
Sum squared resid: 4.328     Schwarz criterion: 1.15
Log likelihood: -13.08       F-statistic: 2.093
Durbin-Watson stat: 2.184    Prob(F-statistic)

A few things: t-statistic = coefficient / standard error. The t-statistic (in absolute value) and the p-value always move in opposite directions.
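The formula above is plain arithmetic, shown here with hypothetical numbers (not taken from the regression output on the previous slide):

```python
# t-statistic = coefficient / standard error, with hypothetical numbers
coef = 0.493
std_err = 0.214
t_stat = coef / std_err
print(round(t_stat, 3))  # 2.304: an |t| above roughly 2 usually means p < 0.05
```

Halving the standard error would double the t-statistic, which is why larger |t| values go hand in hand with smaller p-values.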

Output: table of actual Y, fitted Y, and residual for each observation, with a residual plot.


Actual Y, Fitted Y and Residual

Sample residual

41 (Assumption no. 1) Goodness of fit
R-squared is 0.1684, meaning that about 16.84 percent of the variation in Y can be explained jointly by the three independent variables X1, X2, and X3. The remaining 83.16 percent of the variation in Y is explained by the residuals, or by variables other than X1, X2, and X3.

42 (Assumption no. 3) Joint hypothesis: F-statistic
F-statistic: 2.093
Null hypothesis H0: B1 = B2 = B3 = 0
Alternative H1: not all B's are simultaneously equal to zero
Since the p-value of the F-statistic is more than 5 percent, we cannot reject the null. In other words, the independent variables (X1, X2, X3) cannot jointly explain or influence Y in the population.
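The F-statistic on the regression output slide can be recomputed from R² alone, which is a useful consistency check. With k = 3 regressors and n = 35 observations as in this model:

```python
# Recomputing the F-statistic from R-squared:
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k = 3 regressors, n = 35
r_squared = 0.1684
n, k = 35, 3
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_stat, 3))  # 2.093, matching the reported F-statistic
```

The low R² and the insignificant F-statistic are two views of the same weakness: the regressors explain little of the variation in Y, so they also fail the joint significance test.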

43 (Assumption no. 2) Individual variable significance
For X1: Null hypothesis: B1 = 0; Alternative hypothesis: B1 ≠ 0. Since the p-value is more than 5 percent (0.05), we cannot reject the null; that is, we accept B1 = 0. In other words, X1 cannot influence Y in the population.
For X2, p-value: 0.0305 (3.05 percent). Null hypothesis: B2 = 0; Alternative hypothesis: B2 ≠ 0. Since the p-value (0.0305) is less than 5 percent, we reject the null and accept the alternative hypothesis. This means that X2 can influence Y in the population, but we cannot say in which direction, as the alternative hypothesis is ≠.
For X3: since its p-value is also more than 5 percent, X3 is not significant in explaining Y.

(Assumption no. 4) Signs of the coefficients
Our sample model: Y = b0 + b1X1 + b2X2 + b3X3 + e
The signs we expected after estimation: Y = b0 − b1X1 + b2X2 − b3X3
Decision: the estimated signs did not match our expectations, so assumption 4 is violated.

45 (Assumption no. 5) Serial or autocorrelation
Breusch-Godfrey Serial Correlation LM Test: F-statistic 1.01, Prob. F(2,29); Obs*R-squared, Prob. Chi-Square(2)
Null hypothesis: no serial correlation in the residuals (u)
Alternative: there is serial correlation in the residuals (u)
Since the p-value of Obs*R-squared is more than 5 percent (p > 0.05), we cannot reject the null hypothesis, meaning that the residuals (u) are not serially correlated, which is desirable.

46 (Assumption no. 6) Heteroscedasticity test
Breusch-Pagan-Godfrey test (B-P-G test): F-statistic 1.84, Probability; Obs*R-squared, Probability
Null hypothesis: residuals (u) are homoscedastic
Alternative: residuals (u) are heteroscedastic
The p-value of Obs*R-squared shows that we cannot reject the null. So the residuals do have constant variance, which is desirable, meaning that the residuals are homoscedastic. The B-P-G test is normally done for large samples.

47 (Assumption no. 7) Residual (u) normality test
Null hypothesis: residuals (u) are normally distributed
Alternative: residuals (u) are not normally distributed
Since the p-value of the Jarque-Bera statistic is more than 5 percent, we accept the null, meaning that the population residual (u) is normally distributed, which fulfills the assumption of a good regression line.

48 Evaluation of our model on the basis of the assumptions
1. R-squared is very low (bad sign)
2. There is no serial correlation (good sign)
3. The independent variables are not jointly significant in influencing Y (bad sign)
4. The signs are not as expected (bad sign)
5. Only X2 out of the three variables is significant (bad sign)
6. There is no heteroscedasticity problem (good sign)
7. The residuals are normally distributed (good sign)

References
Essentials of Econometrics by Damodar Gujarati, McGraw-Hill.
Basic Econometrics by Damodar Gujarati, McGraw-Hill.
An Introduction to Econometrics by Cheng Ming Yu, Sayed Hossain and Law Siong Hook, McGraw-Hill.

50 Prepared by: Sayed Hossain, Lecturer for Economics, Multimedia University, Malaysia. Personal website: Year: 2009. Use the information on this website at your own risk. This website shall not be responsible for any loss or expense suffered in connection with the use of this website. PLEASE COMMENT IN MY GUESTBOOK AT: