Topic 3: Regression.

Correlation Analysis Correlation analysis expresses the relationship between two data series using a single number. The correlation coefficient measures how closely related two data series are; specifically, it measures the linear association between two variables.

Variables with Perfect Positive Correlation

Variables with Perfect Negative Correlation

Variables with a Correlation of 0

Variables with a Strong Non-Linear Association

Correlation Coefficient The sample correlation coefficient is r = cov(X, Y) / (sX · sY), where cov(X, Y) is the sample covariance between X and Y, and sX and sY are the sample standard deviations of X and Y.

Correlations Among Stock Return Series

Testing the Significance of the Correlation Coefficient A t-test can be used to test the significance of the correlation coefficient. With n observations, the test statistic is t = r·sqrt(n − 2) / sqrt(1 − r²), which follows a t-distribution with n − 2 degrees of freedom under the null hypothesis of zero correlation.
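
A minimal sketch of this test in Python, assuming NumPy and SciPy are available; the simulated series x and y are placeholders for real data:

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for two data series.
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 0.5 * x + rng.normal(size=60)

r, p_value = stats.pearsonr(x, y)  # sample correlation and its p-value

# The same test by hand: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
n = len(x)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.3f}, p (pearsonr) = {p_value:.4f}, p (manual) = {p_manual:.4f}")
```

The two p-values agree, since pearsonr uses the same t-based test.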

Linear Regression Linear regression assumes a linear relationship between the dependent variable (Y) and the independent variable (X): Yi = b0 + b1·Xi + εi.

Assumptions of the Linear Regression Model
- The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1.
- The independent variable, X, is not random.
- The expected value of the error term is 0.
- The variance of the error term is the same for all observations.
- The error term, ε, is uncorrelated across observations.
- The error term, ε, is normally distributed.

Linear Regression Model Linear regression chooses the estimated (fitted) parameters to minimize the sum of squared residuals, SSE = Σ(Yi − b0 − b1·Xi)². Standard Error of the Estimate: the standard error of the estimate, SEE = sqrt(SSE / (n − 2)), measures the typical deviation of the observations around the fitted regression line.
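
In the single-variable case the minimization has a closed-form solution. A short NumPy sketch on simulated data (the true coefficients 2.0 and 0.8 are arbitrary) that computes the fitted parameters and the standard error of the estimate:

```python
import numpy as np

# Simulated data standing in for an (X, Y) sample.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=50)

# OLS slope and intercept that minimize the sum of squared residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)          # sum of squared errors
see = np.sqrt(sse / (len(x) - 2))     # standard error of the estimate
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SEE = {see:.3f}")
```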

Coefficient of Determination The coefficient of determination, R², measures the fraction of the total variation in the dependent variable that is explained by the independent variable: R² = explained variation / total variation = 1 − SSE/SST.

Hypothesis Testing We can test whether the slope coefficient is significant by using a t-test: t = (estimated b1 − hypothesized b1) / standard error of b1, with n − 2 degrees of freedom. We can also construct a confidence interval for b1 as the estimate plus or minus the critical t-value times the standard error of b1.
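
Statistical packages report these quantities directly. A sketch using statsmodels on simulated data (the true slope of 0.8 is arbitrary):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=50)

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.tvalues)               # t-statistics for the intercept and slope
print(results.pvalues)               # corresponding p-values
print(results.conf_int(alpha=0.05))  # 95% confidence intervals for b0 and b1
```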

ANOVA Analysis of variance (ANOVA) is a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources: SST = RSS + SSE, where SST is the total sum of squares, RSS is the regression (explained) sum of squares, and SSE is the residual (unexplained) sum of squares. The resulting F-statistic, F = (RSS/k) / (SSE/(n − k − 1)), tests the overall significance of the regression.

Limitations of Regression Analysis
- Regression relations can change over time, a problem known as parameter instability.
- An issue specific to investment contexts is that public knowledge of regression relationships may negate their future usefulness.
- If the regression assumptions are violated, hypothesis tests and predictions based on linear regression will not be valid.

Multiple Linear Regression Model Multiple linear regression allows us to determine the effect of more than one independent variable on a particular dependent variable: Yi = b0 + b1·X1i + b2·X2i + ... + bk·Xki + εi. A slope coefficient, bj, measures how much the dependent variable, Y, changes when the independent variable, Xj, changes by one unit, holding all other independent variables constant. In practice, software programs are used to estimate the multiple regression model.

Assumptions of the Multiple Linear Regression Model
- The relationship between the dependent variable, Y, and the independent variables, X1, X2, ..., Xk, is linear.
- The independent variables (X1, X2, ..., Xk) are not random, and no exact linear relation exists between two or more of the independent variables.
- The expected value of the error term, conditioned on the independent variables, is 0: E(ε | X1, X2, ..., Xk) = 0.
- The variance of the error term is the same for all observations.
- The error term is uncorrelated across observations.
- The error term is normally distributed.

Testing Whether All Population Regression Coefficients Equal 0 We have illustrated how to conduct hypothesis tests on regression coefficients individually using a t-test. But what about the significance of the regression as a whole? We test the null hypothesis that all the slope coefficients in a regression are simultaneously equal to 0 by using an F-test with k and n − k − 1 degrees of freedom.
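
With a fitted multiple regression, the F-test is reported alongside the coefficient t-tests. A brief statsmodels sketch on simulated data with two regressors (the coefficients 0.5 and −0.3 are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: y depends on two regressors plus noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=100)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(f"F = {results.fvalue:.2f}, p = {results.f_pvalue:.4f}")  # H0: b1 = b2 = 0
```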

ANOVA Analysis of variance (ANOVA) is a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources.

Adjusted R2 Adjusted R² is a measure of goodness of fit that accounts for the number of explanatory variables: adjusted R² = 1 − (1 − R²)·(n − 1)/(n − k − 1). Unlike R², it does not automatically increase when another variable is added to the regression.

Using Dummy Variables A dummy variable is a qualitative variable that takes on a value of 1 if a particular condition is true and 0 if that condition is false. Dummy variables are used to account for qualitative factors such as sex (male or female), month-of-the-year effects, etc. To avoid perfect multicollinearity, a regression with an intercept should use n − 1 dummy variables to represent n categories.

Month-of-the-Year Effects on Small Stock Returns Suppose we want to test whether total returns to one small-stock index, the Russell 2000 Index, differ by month. We can use dummy variables to estimate the following regression: Returns_t = b0 + b1·Jan_t + b2·Feb_t + ... + b11·Nov_t + ε_t, where each monthly dummy equals 1 when the observation falls in that month and 0 otherwise; the intercept captures the omitted month (December).
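
One way to set this up, sketched with pandas and statsmodels on a simulated return series (the Russell 2000 data itself is not reproduced here):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical monthly return series standing in for the index returns.
rng = np.random.default_rng(4)
idx = pd.date_range("2010-01-31", periods=120, freq="ME")  # month-end; "M" on older pandas
returns = pd.Series(rng.normal(0.01, 0.05, size=120), index=idx)

# Eleven month dummies; December is the omitted base month.
dummies = pd.get_dummies(idx.month, prefix="m").astype(float)
X = sm.add_constant(dummies.drop(columns="m_12"))

results = sm.OLS(returns.to_numpy(), X.to_numpy()).fit()
print(results.params)  # intercept = mean December return; slopes = monthly differences
```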

Violations of Regression Assumptions Inference based on an estimated regression model rests on certain assumptions being met. Violations may cause the inferences made to be invalid. Heteroskedasticity occurs when the variance of the errors differs across observations. It:
- does not affect the consistency of the regression parameter estimates;
- causes the F-test for overall significance to be unreliable;
- makes t-tests for the significance of individual regression coefficients unreliable, because heteroskedasticity introduces bias into estimators of the standard error of regression coefficients.

Regressions with Homoskedasticity

Regressions with Heteroskedasticity

Testing for Heteroskedasticity The Breusch–Pagan test consists of regressing the squared residuals from the estimated regression equation on the independent variables in the regression. If no conditional heteroskedasticity exists, the independent variables will not explain much of the variation in the squared residuals. If conditional heteroskedasticity is present in the original regression, however, the independent variables will explain a significant portion of the variation in the squared residuals. The test statistic is n·R² from this auxiliary regression, distributed chi-squared with k degrees of freedom under the null.
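
statsmodels implements this test as het_breuschpagan. A sketch on data simulated so that the error variance grows with x:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Error variance grows with |x|: conditional heteroskedasticity by construction.
rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200) * (0.5 + np.abs(x))

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(f"LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```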

Correcting for Heteroskedasticity Two different methods can correct for the effects of conditional heteroskedasticity:
- Computing robust standard errors corrects the standard errors of the linear regression model's estimated coefficients to account for the conditional heteroskedasticity.
- Generalized least squares modifies the original equation in an attempt to eliminate the heteroskedasticity.
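
For the first method, statsmodels can refit the same model with White-type robust standard errors via the cov_type argument; a sketch on the same kind of simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200) * (0.5 + np.abs(x))
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                   # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust (White) SEs

print(ols.bse)     # ordinary standard errors
print(robust.bse)  # robust standard errors; the coefficients themselves are unchanged
```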

Serial Correlation When regression errors are correlated across observations, we say that they are serially correlated (or autocorrelated). Serial correlation most typically arises in time-series regressions. The principal problem caused by serial correlation in a linear regression is an incorrect estimate of the regression coefficient standard errors. Positive serial correlation is serial correlation in which a positive error for one observation increases the chance of a positive error for another observation.

Testing for Serial Correlation The Durbin–Watson (DW) statistic is used to test for serial correlation; it is approximately equal to 2(1 − r), where r is the sample autocorrelation of the residuals.
- When the DW statistic is less than dl, we reject the null hypothesis of no positive serial correlation.
- When the DW statistic falls between dl and du, the test results are inconclusive.
- When the DW statistic is greater than du, we fail to reject the null hypothesis of no positive serial correlation.
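
A sketch computing the DW statistic with statsmodels, on data whose errors follow an AR(1) process by construction (the 0.7 autocorrelation is arbitrary):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# AR(1) errors, so the residuals are positively autocorrelated by design.
rng = np.random.default_rng(7)
n = 200
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + e

results = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(results.resid)
print(f"DW = {dw:.2f}")  # well below 2, pointing to positive serial correlation
```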

Value of the Durbin–Watson Statistic

Correcting for Serial Correlation Two alternative remedial steps when a regression has significant serial correlation:
- Adjust the coefficient standard errors for the linear regression parameter estimates to account for the serial correlation.
- Modify the regression equation itself to eliminate the serial correlation.
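
The first remedy is commonly implemented with Newey-West (HAC) standard errors; a statsmodels sketch, where maxlags = 4 is an arbitrary tuning choice:

```python
import numpy as np
import statsmodels.api as sm

# Same AR(1)-error setup as in the Durbin-Watson sketch.
rng = np.random.default_rng(8)
n = 200
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + e
X = sm.add_constant(x)

# Newey-West (HAC) standard errors are robust to serial correlation.
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)
```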

Multicollinearity Multicollinearity occurs when two or more independent variables (or combinations of independent variables) are highly (but not perfectly) correlated with each other. It does not affect the consistency of the OLS estimates of the regression coefficients, but the estimates become extremely imprecise and unreliable. The classic symptom of multicollinearity is a high R² (and significant F-statistic) even though the t-statistics on the estimated slope coefficients are not significant. The most direct solution to multicollinearity is excluding one or more of the correlated regression variables.
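
A common diagnostic is the variance inflation factor (VIF); values above roughly 10 are often read as problematic, though that cutoff is only a rule of thumb. A sketch with two nearly identical regressors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# x2 is nearly a copy of x1, so the two regressors are highly collinear.
rng = np.random.default_rng(9)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)
X = sm.add_constant(np.column_stack([x1, x2]))

for i in range(1, X.shape[1]):  # skip the constant column
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```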

Problems in Linear Regression and Their Solutions

Problem              Effect                         Solution
Heteroskedasticity   Incorrect standard errors      Correct for conditional heteroskedasticity
Serial correlation   Incorrect standard errors      Correct for serial correlation
Multicollinearity    High R2 and low t-statistics   Remove one or more independent variables

Model Specification Model specification refers to the set of variables included in the regression and the regression equation's functional form. Possible misspecifications include:
- One or more important variables could be omitted from the regression.
- One or more of the regression variables may need to be transformed (for example, by taking the natural logarithm of the variable) before estimating the regression.
- The regression model pools data from different samples that should not be pooled.

Models with Qualitative Dependent Variables Qualitative dependent variables are dummy variables used as dependent variables instead of as independent variables. The probit model, which is based on the normal distribution, estimates the probability that Y = 1 (a condition is fulfilled) given the value of the independent variable X. The logit model is identical except that it is based on the logistic distribution rather than the normal distribution. Discriminant analysis yields a linear function, similar to a regression equation, that can then be used to create an overall score; based on the score, an observation can be classified into categories such as bankrupt or not bankrupt.
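
Both models are available in statsmodels; a sketch on data simulated from a logistic model (the coefficients 0.5 and 1.2 are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

# Binary outcome generated from a logistic model.
rng = np.random.default_rng(10)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit(disp=0)    # based on the logistic distribution
probit = sm.Probit(y, X).fit(disp=0)  # based on the normal distribution
print(logit.params, probit.params)

# Predicted probability that Y = 1 for one observation with x = 0.5:
print(logit.predict([[1.0, 0.5]]))
```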