Chapter 5: The Simple Regression Model


Econometrics Econ. 405

I. Understanding the Definition of the Simple Linear Regression Model. There are two types of regression models: simple and multiple. The simple regression model can be used to study the relationship between any two variables, and it is appropriate as an empirical tool. It also provides good practice for studying the multiple regression model (covered in later chapters).

The analysis of applied econometrics begins with the following: Y and X are two variables representing some population. We are interested in "explaining Y in terms of X," or, equivalently, in studying "how much Y varies with changes in X."

Recall: explaining the variable y in terms of the variable x:

y = β0 + β1x + u

where β0 is the intercept and β1 is the slope parameter; y is the dependent variable (also called the explained variable, response variable, predicted variable, or regressand); u is the error term (also called the disturbance, capturing unobservables); and x is the independent variable (also called the explanatory variable, control variable, predictor variable, or regressor).

The Simple Regression Model. In order to estimate the regression model, data is needed: a random sample of n observations {(xi, yi): i = 1, …, n}, where yi is the value of the dependent variable and xi is the value of the explanatory variable for the i-th observation.

Revisit the regression equation, written for each observation in the sample: yi = β0 + β1xi + ui, for i = 1, …, n.

The Simple Regression Model. Fit a regression line through the data points as well as possible: the fitted regression line ŷ = β̂0 + β̂1x.

II. Ordinary Least Squares Technique. Regression analysis refers to techniques that allow you to estimate an economic relationship using data. There are three main techniques for estimating a regression function: Generalized Method of Moments (GMM), Maximum Likelihood (ML), and Ordinary Least Squares (OLS). The method used most frequently is OLS. Although the OLS technique is popular and relatively simple, its application becomes more complicated as you add more independent variables to your regression model.

Justifying the Least Squares Principle. When estimating a sample regression function (SRF), the most common econometric method is OLS. The method of OLS uses the least squares principle to fit a pre-specified regression. The least squares principle states that the SRF should be constructed (with constant and slope terms) so that the sum of squared distances between the observed values of Y and the values estimated from your SRF is minimized (takes the smallest possible value).
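To see the principle in action, here is a minimal numerical sketch (the data values are made up for illustration) that searches directly for the constant and slope minimizing the sum of squared residuals; the closed-form OLS solution derived below gives the same answer:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (assumed values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(params):
    """Sum of squared residuals for a candidate intercept/slope pair."""
    b0, b1 = params
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Numerically search for the (b0, b1) pair that minimizes the SSR
result = minimize(ssr, x0=[0.0, 0.0])
print("intercept, slope:", result.x)  # approx. [0.14, 1.96]
```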

Reasons for OLS Popularity. A) OLS is easier than the alternatives: although other methods can be used to estimate the same regression functions, they require more mathematical sophistication.

B) OLS is sensible: by squaring the residuals, you avoid positive and negative residuals canceling each other out, and you find a regression line that is as close as possible to the observed data points. How?

Choose the estimates β̂0 and β̂1 to minimize the sum of squared residuals:

min Σᵢ (yi − β̂0 − β̂1xi)²

Setting the partial derivative with respect to each of β̂0 and β̂1 equal to zero gives the first-order conditions (the normal equations):

Σᵢ (yi − β̂0 − β̂1xi) = 0
Σᵢ xi (yi − β̂0 − β̂1xi) = 0

Solving these two equations yields the OLS estimators:

β̂1 = Σᵢ (xi − x̄)(yi − ȳ) / Σᵢ (xi − x̄)²
β̂0 = ȳ − β̂1x̄

where x̄ and ȳ denote the sample means of x and y.

The numerical properties of estimators obtained by the method of OLS: the OLS estimators are expressed only in terms of the observable quantities (i.e., X and Y), so they can be easily computed. They are point estimators: given the sample, each estimator provides only a single (point, not interval) value of the relevant population parameter. Once the OLS estimates are obtained from the sample data, the sample regression line can be easily obtained.

Recall the algebraic properties of OLS regression:

ŷi = β̂0 + β̂1xi (fitted or predicted values)
ûi = yi − ŷi (deviations from the regression line, i.e., residuals)
Σᵢ ûi = 0 (deviations from the regression line sum up to zero)
Σᵢ xiûi = 0 (correlation between deviations and regressors is zero)
ȳ = β̂0 + β̂1x̄ (sample averages of y and x lie on the regression line)
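A minimal Python sketch (same assumed toy data as above) that computes the closed-form estimators derived earlier and verifies these algebraic properties numerically:

```python
import numpy as np

# Toy data (assumed values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimators derived above
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # approx. 0.14 and 1.96, matching the numerical search above

y_hat = b0 + b1 * x   # fitted or predicted values
u_hat = y - y_hat     # residuals (deviations from the regression line)

print(np.isclose(u_hat.sum(), 0.0))              # residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))        # residuals uncorrelated with x
print(np.isclose(y.mean(), b0 + b1 * x.mean()))  # (x-bar, y-bar) lies on the line
```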

Accordingly, the goodness-of-fit measures use this decomposition of variation to answer the question: "How well does the explanatory variable explain the dependent variable?"

TSS = Σᵢ (yi − ȳ)², the total sum of squares, represents the total variation in the dependent variable.
ESS = Σᵢ (ŷi − ȳ)², the explained sum of squares, represents the variation explained by the regression.
RSS = Σᵢ ûi², the residual sum of squares, represents the variation not explained by the regression.

Total variation = explained part + unexplained part: TSS = ESS + RSS.

R-squared, R² = ESS/TSS = 1 − RSS/TSS, measures the fraction of the total variation that is explained by the regression.
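A short sketch computing the decomposition and R² for the same assumed toy data:

```python
import numpy as np

# Same assumed toy data as in the earlier sketches
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)      # total variation
ess = np.sum((y_hat - y.mean()) ** 2)  # explained variation
rss = np.sum((y - y_hat) ** 2)         # unexplained variation

print(np.isclose(tss, ess + rss))  # the decomposition TSS = ESS + RSS holds
print("R-squared:", ess / tss)     # fraction of total variation explained
```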

III. OLS Assumptions - the Classical Linear Regression Model (CLRM). In regression analysis our objective is not only to obtain β̂1 and β̂2 but also to draw inferences about the true β1 and β2. For example, we would like to know how close β̂1 and β̂2 are to their counterparts in the population, or how close Ŷi is to the true E(Y | Xi). The econometric model shows that Yi depends on both Xi and ui. The assumptions made about the Xi variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates. When deciding whether OLS is the best technique for your estimation problem, some requirements must be met; they are called the OLS assumptions, or the classical linear regression model (CLRM).

First Assumption: the regression model is linear in the parameters. Keep in mind that the regressand Y and the regressor X themselves may be nonlinear.

Second Assumption: the values taken by the regressor X are fixed in repeated sampling. This means the regression analysis is conditional on the given values of the regressor(s) X.

Third Assumption: zero mean value of the disturbance, E(ui | Xi) = 0. Revisit property (3): each Y population corresponding to a given X is distributed around its mean value, with some Y values above the mean and some below it; the mean value of these deviations, for any given X, should be zero. Note that the assumption E(ui | Xi) = 0 implies that E(Yi | Xi) = β1 + β2Xi.

Fourth Assumption: homoscedasticity, or equal variance of ui: var(ui | Xi) = σ². In this situation, the variation around the regression line (which is the line of the average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies.

In contrast, when the conditional variance of the Y population varies with X, the situation is known as heteroscedasticity, or unequal spread (variance). Symbolically, in this situation Assumption (4) is violated and var(ui | Xi) = σᵢ², where the subscript on σ² indicates that the variance differs across observations, e.g., var(u | X1) < var(u | X2) < · · · < var(u | Xi). Therefore, the likelihood is that the Y observations coming from the population with X = X1 will be closer to the PRF than those coming from populations corresponding to X = X2, X = X3, and so on. In short, not all Y values corresponding to the various X's will be equally reliable, reliability being judged by how closely or distantly the Y values are distributed around their means, that is, around the points on the PRF.

Fifth Assumption: no autocorrelation between the disturbances: cov(ui, uj | Xi, Xj) = 0 for i ≠ j.

According to the figures, the disturbances (deviations) may follow systematic patterns: in Figure a, the u's are positively correlated (a positive u followed by a positive u, or a negative u followed by a negative u); in Figure b, the u's are negatively correlated (a positive u followed by a negative u, and vice versa). In both cases there is auto- or serial correlation. Figure c shows no systematic pattern in the u's, indicating zero correlation; in other words, auto- or serial correlation is absent.
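A minimal simulation sketch of heteroscedasticity (all data-generating numbers are assumed for illustration): the error standard deviation grows with X, so the residual spread differs across the range of X, violating Assumption 4:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, illustration only
n = 500
x = rng.uniform(1, 10, n)

# Heteroscedastic errors: standard deviation grows with x
u = rng.normal(0, 0.5 * x)      # var(u_i | x_i) = (0.5 * x_i)^2, not constant
y = 1.0 + 2.0 * x + u

# Fit OLS and compare residual spread in the low-x and high-x halves
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

low, high = x < np.median(x), x >= np.median(x)
print("residual std, low x: ", u_hat[low].std())   # noticeably smaller
print("residual std, high x:", u_hat[high].std())  # noticeably larger
```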

Sixth Assumption: the disturbance u and the explanatory variable X are uncorrelated. That is, it is assumed that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. If X and u are positively correlated, X increases when u increases and decreases when u decreases; similarly, if X and u are negatively correlated, X increases when u decreases and decreases when u increases. In either case, it is difficult to isolate the influence of X and u on Y.
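A small Monte Carlo sketch (assumed data-generating values) of what goes wrong when this assumption fails: x and u share a common component, so cov(x, u) > 0, and the OLS slope no longer centers on the true value:

```python
import numpy as np

rng = np.random.default_rng(3)   # assumed seed
n, reps = 200, 2000
estimates = []
for _ in range(reps):
    v = rng.normal(size=n)
    x = rng.normal(size=n) + v   # x and u share the common component v,
    u = rng.normal(size=n) + v   # so cov(x, u) > 0 by construction
    y = 1.0 + 2.0 * x + u        # true slope is 2.0
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))  # approx. 2.5, not 2.0: OLS cannot isolate the effect of x
```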

Seventh Assumption: the number of observations must exceed the number of regressors and the number of parameters to be estimated. Note: n > # of X's, n > # of β's.

Eighth Assumption: variability in X values. The X values in a given sample must not all be the same; technically, var(X) must be a finite positive number.

Ninth Assumption: the regression model is correctly specified; there is no specification bias. This raises questions such as: (1) What variables should be included in the model? (2) What is the functional form of the model: is it linear in the parameters, the variables, or both? (3) What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?

Tenth Assumption: there is no perfect multicollinearity among the regressors. This assumption applies to models beyond the two-variable model, which contain several regressors (discussed in the next chapter).

IV. Gauss–Markov Theorem

Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of linear unbiased estimators, have minimum variance; that is, they are BLUE (Best Linear Unbiased Estimators).

For the slope estimator β̂2 this means: (1) it is linear, a linear function of the random variable Y; (2) it is unbiased, E(β̂2) = β2; and (3) it has minimum variance in the class of all linear unbiased estimators of β2. Similarly, it can be shown that the same holds for β̂1.

According to the graphs: for convenience, assume that β*2, like β̂2, is unbiased, that is, its average or expected value is equal to β2. Assume further that both β̂2 and β*2 are linear estimators, that is, they are linear functions of Y. Which estimator, β̂2 or β*2, would you choose? Although both β̂2 and β*2 are unbiased, the distribution of β*2 is more diffused or widespread around the mean value than the distribution of β̂2; in other words, the variance of β*2 is larger than the variance of β̂2. Given two estimators that are both linear and unbiased, one would choose the estimator with the smaller variance, because it is more likely to be close to β2 than the alternative. In short, one would choose the BLUE estimator.
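A minimal Monte Carlo sketch of this comparison (data-generating values assumed): the OLS slope β̂2 is compared with an alternative estimator that is also linear and unbiased, the "endpoints" estimator (yn − y1)/(xn − x1). Both center on the true β2, but OLS has the smaller variance, as the Gauss–Markov theorem promises:

```python
import numpy as np

rng = np.random.default_rng(1)   # assumed seed
x = np.linspace(1, 10, 20)       # fixed regressor values (Assumption 2)
beta1, beta2 = 1.0, 2.0          # assumed true intercept and slope
reps = 10_000

ols, endpoints = [], []
for _ in range(reps):
    y = beta1 + beta2 * x + rng.normal(0, 1, x.size)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    ols.append(b2)
    # Alternative linear unbiased estimator: slope through the two endpoints
    endpoints.append((y[-1] - y[0]) / (x[-1] - x[0]))

print("mean:", np.mean(ols), np.mean(endpoints))  # both close to 2.0 (unbiased)
print("var: ", np.var(ols), np.var(endpoints))    # OLS variance is much smaller
```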

V. Evaluating Fit versus Quality. In economic settings, a high R² (close to 1) may indicate a violation of the assumptions; it is more likely to signal that something is wrong with the regression than to certify high-quality results. Why not use R² as the only measure of your regression's quality? You may have a high R² but no meaningful interpretation (check the validity of your economic theory), and using a small dataset (or incorrect data) can lead to a high R² value but deceptive results.
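A short simulation sketch (assumed seed and sample size) of this point: regressing one independent random walk on another has no meaningful interpretation, yet it routinely produces a high R²:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed
n = 200

# Two independent random walks: no true relationship between them
y = np.cumsum(rng.normal(size=n))
x = np.cumsum(rng.normal(size=n))

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("R-squared:", r2)  # often surprisingly high despite no real relationship
```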