Best Fitting Line Clint’s Assignment Simple Regression Model

Lecture 5 Preview: Ordinary Least Squares Estimation Procedure - The Mechanics
Best Fitting Line
Clint’s Assignment
Simple Regression Model
  Parameters of the Model
  Error Term
Best Fitting Line
  Needed: A Systematic Procedure to Determine the Best Fitting Line
Ordinary Least Squares (OLS) Estimation Procedure
  Sum of Squared Residuals Criterion
  Finding the Best Fitting Line
Importance of the Error Term
  Absence of Random Influences: A What If Question
  Presence of Random Influences: Back to Reality
Error Terms and Random Influences: A Closer Look
Clint’s Assignment: The Two Parts

Income and Savings

Theory: Additional income increases savings.

[Data table with columns Year, Income, Savings]
[Scatter Diagram: Income versus Savings]

Do the data support the theory? In general, yes.

How can we estimate the relationship between savings and income more precisely?
Best fitting line: y = .14x  10

What does the .14 coefficient suggest? An additional $1 of income increases savings by $.14; equivalently, an additional $1,000 of income increases savings by $140.

Aside: Random Influences
In the real world, random influences prevent the data from ever revealing the relationship between savings and income perfectly.

Clint’s Assignment: Studying and Quiz Scores
Three students are enrolled in Professor Jeff Lord’s 8:30 am class. Every week, he gives a quiz. Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly.

Theory: Additional studying increases quiz scores.
Our “theory” suggests that a student’s score on the quiz increases when he or she studies more. Also, it is generally believed that Professor Lord awards students some points just for showing up for a quiz that early in the morning.

First Quiz:
Student   Minutes   Score
1         5         66
2         15        87
3         25        90

[Scatter Diagram: minutes studied versus quiz score, one point per student (Std 1, Std 2, Std 3)]

Question: Do the data support the theory? Yes.

The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Score received by student t: Dependent Variable
  $x_t$ = Minutes studied by student t: Explanatory Variable
  $e_t$ = Error term for student t

Interpretation of the Parameters
  $\beta_{Const}$ represents the points given by Professor Lord for just showing up.
  $\beta_x$ represents the additional points received for each additional minute studied.

Interpretation of the Error Term
  $e_t$ is a random variable; it represents random influences, the factors that cannot be anticipated or determined with certainty before the quiz is given.
  When will $e_t$ be positive; that is, when will $y_t$ be unusually high?
  When will $e_t$ be negative; that is, when will $y_t$ be unusually low?

Notation: β’s denote the actual values; b’s denote the estimates.
Theory: Additional studying increases quiz scores.

[First Quiz data table: Student, Minutes, Score]
[Scatter Diagram: minutes studied versus quiz score]

The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Actual quiz score for student t
  $x_t$ = Actual number of minutes studied by student t
  $e_t$ = Error term for student t

Parameters of the Model
  $\beta_{Const}$: Points given for just showing up
  $\beta_x$: Additional points for each additional minute studied

Model’s Implicit Assumptions:
  Professor Lord gives each student the same number of points for showing up.
  The points earned for each minute studied are the same for each student.

Clint’s Assignment: Find the values of $\beta_{Const}$ and $\beta_x$.
But we cannot observe $\beta_{Const}$ and $\beta_x$. What can Clint do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, do the best you can by estimating the value using the information you do have.

Notation: β’s denote the actual values; b’s denote the estimates.

How can we estimate the relationship between scores and studying?
Strategy: Use the intercept and slope of the best fitting line to estimate $\beta_{Const}$ and $\beta_x$.
  $b_{Const}$ = Intercept of the best fitting line; $b_{Const}$ estimates the value of $\beta_{Const}$
  $b_x$ = Slope of the best fitting line; $b_x$ estimates the value of $\beta_x$

Problem: Different individuals would “eye” the best fitting line differently.
Needed: A systematic procedure to determine the best fitting line.

Ordinary Least Squares (OLS) Estimation Procedure
The most commonly used method to find the best fitting line.
OLS Criterion: Minimize the sum of squared residuals.

Step 1: Define the sum of squared residuals (SSR)

The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Actual quiz score received by student t: Dependent variable
  $x_t$ = Actual number of minutes studied by student t: Explanatory variable
  $e_t$ = Error term for student t
  $\beta_{Const}$ = Actual constant: Points awarded for showing up
  $\beta_x$ = Actual coefficient: Additional points earned for an additional minute studied

The Estimate: $Esty_t = b_{Const} + b_x x_t$
  $Esty_t$ = Estimated quiz score for student t
  $b_{Const}$ = Estimated constant; that is, $b_{Const}$ estimates the value of $\beta_{Const}$
  $b_x$ = Estimated coefficient; that is, $b_x$ estimates the value of $\beta_x$

The Residual: $Res_t = y_t - Esty_t$
  $Res_t$ = Residual for student t = Actual quiz score for student t - Estimated quiz score for student t

For the three students:
  $Esty_1 = b_{Const} + b_x x_1$
  $Esty_2 = b_{Const} + b_x x_2$
  $Esty_3 = b_{Const} + b_x x_3$

  $Res_1 = y_1 - Esty_1 = y_1 - b_{Const} - b_x x_1$
  $Res_2 = y_2 - Esty_2 = y_2 - b_{Const} - b_x x_2$
  $Res_3 = y_3 - Esty_3 = y_3 - b_{Const} - b_x x_3$

  $SSR = Res_1^2 + Res_2^2 + Res_3^2 = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$
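The SSR criterion in Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `ssr` and the "eyeballed" candidate line (60, 1.5) are hypothetical, and the data are the first-quiz values implied by the lecture's residual calculations (5, 15, 25 minutes; scores 66, 87, 90).

```python
def ssr(b_const, b_x, xs, ys):
    """Sum of squared residuals for the line Esty = b_const + b_x * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        est_y = b_const + b_x * x  # estimated quiz score
        res = y - est_y            # residual = actual - estimated
        total += res ** 2
    return total


# First-quiz data as implied by the lecture's residual calculations.
minutes = [5, 15, 25]
scores = [66, 87, 90]

# A hypothetical line someone might "eye" from the scatter diagram.
ssr_candidate = ssr(60.0, 1.5, minutes, scores)

# The line the lecture reports as best fitting (bConst = 63, bx = 1.2).
ssr_ols = ssr(63.0, 1.2, minutes, scores)

print(ssr_candidate, ssr_ols)
```

Comparing the two values shows what the criterion does: a line is "better" when its SSR is smaller, so different eyeballed lines can be ranked without any judgment call.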

Step 2: Differentiate the sum of squared residuals (SSR) with respect to bConst
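$SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$

Differentiate with respect to $b_{Const}$ and set the derivative equal to 0:
$\frac{dSSR}{db_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) = 0$

Divide by $-2$:
$(y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) = 0$

Collect terms:
$(y_1 + y_2 + y_3) - (b_{Const} + b_{Const} + b_{Const}) + (-b_x x_1 - b_x x_2 - b_x x_3) = 0$
$(y_1 + y_2 + y_3) - 3b_{Const} - b_x (x_1 + x_2 + x_3) = 0$

Divide by 3:
$\frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} = 0$
$\bar{y} - b_{Const} - b_x \bar{x} = 0$
$\bar{y} = b_{Const} + b_x \bar{x}$
$b_{Const} = \bar{y} - b_x \bar{x}$

For future reference, note that: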

Step 3: Differentiate the sum of squared residuals (SSR) with respect to bx
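One way this step can be carried out is sketched below. It assumes the same three-observation SSR, $SSR = \sum_{t=1}^{3} (y_t - b_{Const} - b_x x_t)^2$, and uses the relation $b_{Const} = \bar{y} - b_x \bar{x}$ obtained by differentiating with respect to $b_{Const}$. The resulting formula agrees with the deviation-based calculations used for the first quiz, but the intermediate steps are a reconstruction, not necessarily the lecture's own.

```latex
\begin{align*}
\frac{dSSR}{db_x}
  &= -2\sum_{t=1}^{3} x_t\,(y_t - b_{Const} - b_x x_t) = 0 \\
\sum_{t=1}^{3} x_t\,(y_t - b_{Const} - b_x x_t) &= 0
  && \text{divide by } -2 \\
\sum_{t=1}^{3} x_t\,\bigl[(y_t - \bar{y}) - b_x (x_t - \bar{x})\bigr] &= 0
  && \text{substitute } b_{Const} = \bar{y} - b_x \bar{x} \\
\sum_{t=1}^{3} (x_t - \bar{x})\,\bigl[(y_t - \bar{y}) - b_x (x_t - \bar{x})\bigr] &= 0
  && \text{since } \textstyle\sum (y_t - \bar{y}) = \sum (x_t - \bar{x}) = 0 \\
b_x &= \frac{\sum_{t=1}^{3} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{3} (x_t - \bar{x})^2}
\end{align*}
```

The fourth line subtracts $\bar{x} \sum \bigl[(y_t - \bar{y}) - b_x (x_t - \bar{x})\bigr]$, which equals 0 because deviations from a mean always sum to 0.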

Ordinary Least Squares (OLS) Estimates - Calculations
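First Quiz:
Student   x     y
1         5     66
2         15    87
3         25    90
x = Minutes studied, y = Quiz score

The equations:
  $b_x = \frac{\sum_{t=1}^{3} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{3} (x_t - \bar{x})^2}$
  $b_{Const} = \bar{y} - b_x \bar{x}$

The means: $\bar{x} = 15$, $\bar{y} = 81$

The deviations from the means:
Student   y - ȳ    x - x̄
1         -15      -10
2         6        0
3         9        10

Products of x and y deviations and squared x deviations:
Student   (y - ȳ)(x - x̄)       (x - x̄)²
1         (-15)(-10) = 150     (-10)² = 100
2         (6)(0) = 0           (0)² = 0
3         (9)(10) = 90         (10)² = 100
          Sum = 240            Sum = 200

$b_x = \frac{240}{200} = 1.2$
$b_{Const} = 81 - 1.2 \times 15 = 81 - 18 = 63$

OLS Best Fitting Line: $Esty = 63 + 1.2x$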
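The hand calculation above (means, deviations, products, squared deviations) can be checked with a short Python sketch. The helper name `ols_simple` is hypothetical; the data are the first-quiz values implied by the lecture's residual calculations.

```python
def ols_simple(xs, ys):
    """OLS intercept and slope for one explanatory variable,
    computed from deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of the y and x deviations (240 for the first quiz).
    sum_products = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    # Sum of squared x deviations (200 for the first quiz).
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    b_x = sum_products / sum_sq_x
    b_const = y_bar - b_x * x_bar
    return b_const, b_x


minutes = [5, 15, 25]
scores = [66, 87, 90]

b_const, b_x = ols_simple(minutes, scores)
print(b_const, b_x)
```

Writing the slope as a ratio of two sums mirrors the two "Sum" columns of the table, so each intermediate value can be compared with the slide.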

Ordinary Least Squares (OLS)
The Sum of Squared Residuals for the Best Fitting Line

The Residual: $Res_t = y_t - Esty_t$ = Actual quiz score for student t - Estimated quiz score for student t
Best Fitting Line: $b_{Const} = 63$ and $b_x = 1.2$.

Student   x_t   y_t   Esty_t = 63 + 1.2x_t   Res_t = y_t - Esty_t   Res_t²
1         5     66    69                     66 - 69 = -3           9
2         15    87    81                     87 - 81 = 6            36
3         25    90    93                     90 - 93 = -3           9
                                                                    SSR = 54

EViews, Lab 5.1

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
x                          1.2                           0.2601
Const                      63                            0.0891
Number of Observations     3
Sum Squared Residuals      54

In the output: $b_x = 1.2$, $b_{Const} = 63$, $SSR = 54$, so $Esty = 63 + 1.2x$.
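The SE, t-Statistic and Prob columns of the output can be approximated from the same three observations. The sketch below is an illustration under stated assumptions, not the software's own routine: it uses the textbook standard-error formulas for a simple regression (not given in the lecture), treats Prob as a two-tailed p-value for the hypothesis that the parameter equals 0, and uses the fact that a t distribution with 1 degree of freedom is the standard Cauchy distribution. The function name `two_tailed_prob_df1` is hypothetical.

```python
import math

minutes = [5, 15, 25]
scores = [66, 87, 90]
b_const, b_x = 63.0, 1.2  # estimates reported in the lecture

n = len(minutes)
x_bar = sum(minutes) / n
ssr = sum((y - (b_const + b_x * x)) ** 2 for x, y in zip(minutes, scores))
sum_sq_x = sum((x - x_bar) ** 2 for x in minutes)

# Estimated error variance: SSR divided by n - 2 degrees of freedom
# (two parameters are estimated). Assumed textbook formula.
s2 = ssr / (n - 2)
se_x = math.sqrt(s2 / sum_sq_x)
se_const = math.sqrt(s2 * (1 / n + x_bar ** 2 / sum_sq_x))

t_x = b_x / se_x
t_const = b_const / se_const


def two_tailed_prob_df1(t):
    """Two-tailed p-value for a t-statistic with exactly 1 degree of
    freedom, where the t distribution is the standard Cauchy."""
    return 1 - (2 / math.pi) * math.atan(abs(t))


prob_x = two_tailed_prob_df1(t_x)
prob_const = two_tailed_prob_df1(t_const)

print(round(se_x, 4), round(t_x, 4), round(prob_x, 4))
print(round(se_const, 4), round(t_const, 4), round(prob_const, 4))
```

Under these assumptions the two probabilities come out close to the Prob values shown in the output (0.2601 for x and 0.0891 for Const), which suggests the data and estimates above are the ones behind the table.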

Dependent Variable: y
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
x                          1.2                           0.2601
Const                      63                            0.0891
Number of Observations     3
Sum Squared Residuals      54

In the output: $b_x = 1.2$ and $b_{Const} = 63$, so $Esty = 63 + 1.2x$.

The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Score received by student t
  $x_t$ = Minutes studied by student t
  $e_t$ = Error term for student t

Theory: Additional studying increases quiz scores.
Our “theory” suggests that a student’s score on the quiz increases when he or she studies more. Also, it is generally believed that Professor Lord awards students some points just for showing up for a quiz that early in the morning.
  $\beta_{Const}$ represents the points given by Professor Lord for just showing up.
  $\beta_x$ represents the additional points received for each additional minute studied.

Interpretation: We estimate that Professor Lord gives students 63 points for showing up and that studying one additional minute results in 1.2 additional points.

Lab 5.2: Importance of the Error Term
The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
Assume $\beta_{Const} = 50$ and $\beta_x = 2$.
$e_t$, the error term, is a random variable; it represents the factors that cannot be anticipated or determined before the quiz is given. It represents all the random influences.

WHAT IF Question: What if there were no random influences; that is, what if there were no error term?

First Quiz:
                      Absence of Random Influences   With Random Influences
Student   Minutes x_t   Score (y_t = 50 + 2x_t)        Score (y_t = 50 + 2x_t + e_t)
1         5             50 + 2×5 = 60                  66
2         15            50 + 2×15 = 80                 87
3         25            50 + 2×25 = 100                90

Without the error term: $y_t = 50 + 2x_t$.
In the absence of random influences (the error term), the best fitting line fits the data perfectly. We can determine the actual value of the coefficient by calculating the slope of the line using any two points.

Back to Reality: There are random influences in the real world.
With the error term: $y_t = 50 + 2x_t + e_t$.

[Scatter diagram: the three students’ points (Std 1, Std 2, Std 3) with two lines, Actual: y = 50 + 2x and OLS Estimate: y = 63 + 1.2x]
  Student 1: $e_1 > 0$
  Student 2: $e_2 > 0$
  Student 3: $e_3 < 0$
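The "what if" comparison can be sketched in Python. The helper `ols_simple` is hypothetical, and the error values 6, 7 and -10 are not stated on the slide; they are implied by subtracting 50 + 2x from the first-quiz scores 66, 87 and 90.

```python
def ols_simple(xs, ys):
    """OLS intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b_x = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b_x * x_bar, b_x


BETA_CONST, BETA_X = 50.0, 2.0  # actual values assumed in the lecture
minutes = [5, 15, 25]
errors = [6, 7, -10]  # implied: first-quiz score minus (50 + 2 * minutes)

# Absence of random influences: scores lie exactly on the actual line.
scores_no_error = [BETA_CONST + BETA_X * x for x in minutes]

# Presence of random influences: each score is shifted by its error term.
scores_with_error = [s + e for s, e in zip(scores_no_error, errors)]

fit_no_error = ols_simple(minutes, scores_no_error)
fit_with_error = ols_simple(minutes, scores_with_error)

print(fit_no_error)    # intercept and slope without error terms
print(fit_with_error)  # intercept and slope with error terms
```

Without error terms the fitted intercept and slope coincide with the actual 50 and 2; with them, the same procedure returns the 63 and 1.2 reported for the first quiz.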

Lab 5.3: The Constant and Coefficient Estimates Are Random Variables
Real World: Random influences are present, as represented by the regression model’s error term.

[Scatter diagram: the three students’ points (Std 1, Std 2, Std 3) with two lines, Actual: y = 50 + 2x and OLS Estimate: y = 63 + 1.2x]

Claim: As a consequence of random influences, we cannot expect the intercept and slope of the best fitting line to equal the actual constant and coefficient.

In fact, even if we knew the actual values of the constant and coefficient, we could not predict the constant and coefficient of the best fitting line with certainty before the quiz is given.

As a consequence of the random influences, we can be all but certain that
  the intercept of the best fitting line will not equal the actual intercept, 50;
  the slope of the best fitting line will not equal the actual slope, 2.

The intercept and slope of the best fitting line, $b_{Const}$ and $b_x$, are random variables.
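A small simulation can illustrate the claim that the estimates are random variables. It is only a sketch under assumptions the lecture does not state: error terms are drawn independently from a normal distribution with mean 0 and a standard deviation of 10, and the names `simulate_quiz` and `ols_simple` are hypothetical.

```python
import random


def ols_simple(xs, ys):
    """OLS intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b_x = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b_x * x_bar, b_x


def simulate_quiz(rng, minutes, beta_const=50.0, beta_x=2.0, error_sd=10.0):
    """One quiz: actual line plus a random error term for each student.
    Normal errors with standard deviation 10 are an assumption."""
    return [beta_const + beta_x * x + rng.gauss(0.0, error_sd)
            for x in minutes]


rng = random.Random(0)  # fixed seed so the run is repeatable
minutes = [5, 15, 25]

# Repeat the quiz many times and re-estimate the line each time.
estimates = [ols_simple(minutes, simulate_quiz(rng, minutes))
             for _ in range(2000)]
slopes = [b_x for _, b_x in estimates]
intercepts = [b_const for b_const, _ in estimates]

print(min(slopes), max(slopes))
print(min(intercepts), max(intercepts))
```

Even though the actual constant and coefficient are held at 50 and 2 in every repetition, the fitted intercept and slope change from quiz to quiz, which is what it means for $b_{Const}$ and $b_x$ to be random variables.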

The Error Term Reflects Random Influences: A Closer Look
The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
$e_t$ is a random variable.

Before the experiment is conducted:
  Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty before the experiment is conducted.
  Good news. What we do know: We can often calculate the random variable’s probability distribution, which tells us how likely it is for the random variable to equal each of its possible numerical values.

Intuition: What happens after many, many quizzes?
Since the error term represents the random influences, a student’s error term, $e_t$, should be
  positive about half the time, indicating that the student performs better than “usual”;
  negative about half the time, indicating that the student performs worse than “usual.”
“In the long run,” after many, many repetitions, the error terms should average out to 0.

Lab 5.4: Error Terms and Random Influences
The Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$

We shall illustrate two points:
  The error term is a random variable.
  The error term represents random influences.

After many, many repetitions:
  Mean[$e_1$] = 0, Mean[$e_2$] = 0, Mean[$e_3$] = 0.
  Each of $e_1$, $e_2$ and $e_3$ is positive half the time and negative half the time.
  $e_1$ has no systematic effect on Student 1’s quiz score, $e_2$ has none on Student 2’s, and $e_3$ has none on Student 3’s.
  Each of $e_1$, $e_2$ and $e_3$ represents a random influence.

Summary:
  The mean of the probability distribution for each student’s error term equals 0.
  The chances that a student’s error term will be positive in any one quiz are about equal to the chances that it will be negative.
  A student’s error term has no systematic effect on his or her quiz score.
  A student’s error term represents a random influence.
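The "many, many repetitions" idea can be illustrated with a short simulation for one student's error term. This is a sketch, not the lab itself: the normal distribution, the standard deviation of 10 and the number of repetitions are all assumptions chosen for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
REPETITIONS = 100_000   # assumed number of repeated quizzes
ERROR_SD = 10.0         # assumed spread of the random influences

# One student's error term, drawn once per repeated quiz.
draws = [rng.gauss(0.0, ERROR_SD) for _ in range(REPETITIONS)]

mean_error = sum(draws) / REPETITIONS
share_positive = sum(1 for e in draws if e > 0) / REPETITIONS

print(mean_error)      # should be close to 0
print(share_positive)  # should be close to one half
```

Any single draw can be far from 0, but the average over many repetitions settles near 0 and about half the draws are positive, which is the sense in which the error term has no systematic effect on the quiz score.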

Clint’s Assignment: Where Do We Stand?

Theory: Additional studying increases quiz scores.

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
x                          1.2                           0.2601
Const                      63                            0.0891
Number of Observations     3
Sum Squared Residuals      54

Summary
  The OLS estimate for the value of the coefficient is 1.2; Clint estimates that an additional minute of studying results in 1.2 additional points, suggesting that the theory is correct.
  But since random influences are present in the real world, we know that the coefficient estimate is a random variable. We are all but certain that the numerical value of the coefficient estimate, 1.2, does NOT equal the actual value of the coefficient.

What should Clint do? We will proceed by dividing Clint’s assignment into two related parts:
  Coefficient Reliability: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?
  Theory Confidence: How much confidence should Clint have in the theory that studying more leads to higher quiz scores?
