Best Fitting Line Clint’s Assignment Simple Regression Model

Presentation transcript:

Lecture 5 Preview: Ordinary Least Squares Estimation Procedure: The Mechanics

Best Fitting Line
Clint’s Assignment
  Simple Regression Model
  Parameters of the Model
  Error Term
  Best Fitting Line
  Needed: A Systematic Procedure to Determine the Best Fitting Line
Ordinary Least Squares (OLS) Estimation Procedure
  Sum of Squared Residuals Criterion
  Finding the Best Fitting Line
Importance of the Error Term
  Absence of Random Influences: A What If Question
  Presence of Random Influences: Back to Reality
Error Terms and Random Influences: A Closer Look
Clint’s Assignment: The Two Parts

Income and Savings

Year  Income  Savings    Year  Income  Savings    Year  Income  Savings
1950   210.1     17.9    1959   350.5     32.9    1968   625.0     67.0
1951   231.0     22.5    1960   365.4     33.7    1969   674.0     68.8
1952   243.4     23.9    1961   381.8     39.7    1970   735.7     87.2
1953   258.6     25.5    1962   405.1     41.8    1971   801.8     99.9
1954   264.3     24.3    1963   425.1     42.4    1972   869.1     98.5
1955   283.3     24.5    1964   462.5     51.1    1973   978.3    125.9
1956   303.0     31.3    1965   498.1     54.3    1974  1071.6    138.2
1957   319.8     32.9    1966   537.5     56.6    1975  1187.4    153.0
1958   330.5     34.3    1967   575.3     67.5

Theory: Additional income increases savings.
Question: Do the data support the theory? In general, yes.

[Figure: Scatter diagram of income versus savings]

How can we estimate the relationship between savings and income more precisely?
Best fitting line: \( y = 0.14x - 10 \)

What does the 0.14 coefficient suggest? An additional $1 of income increases savings by $0.14; equivalently, an additional $1,000 of income increases savings by $140.

Aside: Random Influences. In the real world, random influences keep the data from ever revealing the relationship between savings and income perfectly.
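As a rough check on the income and savings line, the table can be fed to the least-squares formulas that this lecture goes on to develop. The sketch below is an illustration rather than the slide's own calculation: the helper name `fit_line` is hypothetical, and because the slide reports a rounded line, the fitted numbers should land near it rather than match it exactly.

```python
# Income and savings, transcribed from the table above (1950-1975).
income = [210.1, 231.0, 243.4, 258.6, 264.3, 283.3, 303.0, 319.8, 330.5,
          350.5, 365.4, 381.8, 405.1, 425.1, 462.5, 498.1, 537.5, 575.3,
          625.0, 674.0, 735.7, 801.8, 869.1, 978.3, 1071.6, 1187.4]
savings = [17.9, 22.5, 23.9, 25.5, 24.3, 24.5, 31.3, 32.9, 34.3,
           32.9, 33.7, 39.7, 41.8, 42.4, 51.1, 54.3, 56.6, 67.5,
           67.0, 68.8, 87.2, 99.9, 98.5, 125.9, 138.2, 153.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations and sum of squared x deviations.
    sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(income, savings)
print(f"savings = {intercept:.2f} + {slope:.3f} * income")
```

The slope is the quantity of interest here: it estimates how much savings change for each additional dollar of income.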

Clint’s Assignment: Studying and Quiz Scores

Three students are enrolled in Professor Jeff Lord’s 8:30 am class. Every week, he gives a quiz. Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly.

Theory: Additional studying increases quiz scores. Our “theory” suggests that a student’s score on the quiz increases when he/she studies more. Also, it is generally believed that Professor Lord awards students some points just for showing up for a quiz that early in the morning.

First Quiz:
Student  Minutes  Score
   1        5       66
   2       15       87
   3       25       90

[Figure: Scatter diagram of minutes studied versus quiz score, with points labeled Std 1, Std 2, and Std 3]

Question: Do the data support the theory? Yes.

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
  \( y_t \) = Score received by student t: Dependent Variable
  \( x_t \) = Minutes studied by student t: Explanatory Variable
  \( e_t \) = Error term for student t

Interpretation of the Parameters
  \( \beta_{Const} \) represents the points given by Professor Lord for just showing up.
  \( \beta_x \) represents the additional points received for each additional minute studied.

Interpretation of the Error Term
  \( e_t \) is a random variable; it represents random influences, the factors that cannot be anticipated or determined with certainty before the quiz is given.
  When will \( e_t \) be positive; that is, when will \( y_t \) be unusually high?
  When will \( e_t \) be negative; that is, when will \( y_t \) be unusually low?

Theory: Additional studying increases quiz scores.

First Quiz:
Student  Minutes  Score
   1        5       66
   2       15       87
   3       25       90

[Figure: Scatter diagram of minutes studied versus quiz score]

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
  \( y_t \) = Actual quiz score for student t
  \( x_t \) = Actual number of minutes studied by student t
  \( e_t \) = Error term for student t

Parameters of the Model
  \( \beta_{Const} \): Points given for just showing up
  \( \beta_x \): Additional points for each additional minute studied

Model’s Implicit Assumptions:
  Professor Lord gives each student the same number of points for showing up.
  The points earned for each minute studied are the same for each student.

Clint’s Assignment: Find the values of \( \beta_{Const} \) and \( \beta_x \).
But we cannot observe \( \beta_{Const} \) and \( \beta_x \). What can Clint do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, do the best you can by estimating the value using the information you do have.

Notation: \( \beta \)’s denote the actual values; \( b \)’s denote the estimates.

How can we estimate the relationship between scores and studying?
Strategy: Use the intercept and slope of the best fitting line to estimate \( \beta_{Const} \) and \( \beta_x \).
  \( b_{Const} \) = Intercept of the best fitting line; \( b_{Const} \) estimates the value of \( \beta_{Const} \)
  \( b_x \) = Slope of the best fitting line; \( b_x \) estimates the value of \( \beta_x \)

Problem: Different individuals would “eye” the best fitting line differently.
Needed: A systematic procedure to determine the best fitting line.

Ordinary Least Squares (OLS) Estimation Procedure

The most commonly used method to find the best fitting line.
OLS Criterion: Minimize the sum of squared residuals.

Step 1: Define the sum of squared residuals (SSR)

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
  \( y_t \) = Actual quiz score received by student t: Dependent variable
  \( x_t \) = Actual number of minutes studied by student t: Explanatory variable
  \( e_t \) = Error term for student t
  \( \beta_{Const} \) = Actual constant: Points awarded for showing up
  \( \beta_x \) = Actual coefficient: Additional points earned for an additional minute studied

The Estimate: \( \mathit{Esty}_t = b_{Const} + b_x x_t \)
  \( \mathit{Esty}_t \) = Estimated quiz score for student t
  \( b_{Const} \) = Estimated constant; that is, \( b_{Const} \) estimates the value of \( \beta_{Const} \)
  \( b_x \) = Estimated coefficient; that is, \( b_x \) estimates the value of \( \beta_x \)

The Residual: \( \mathit{Res}_t = y_t - \mathit{Esty}_t \)
  \( \mathit{Res}_t \) = Residual for student t = Actual quiz score for student t - Estimated quiz score for student t

For the three students:
\[ \mathit{Esty}_1 = b_{Const} + b_x x_1 \qquad \mathit{Esty}_2 = b_{Const} + b_x x_2 \qquad \mathit{Esty}_3 = b_{Const} + b_x x_3 \]
\[ \mathit{Res}_1 = y_1 - \mathit{Esty}_1 \qquad \mathit{Res}_2 = y_2 - \mathit{Esty}_2 \qquad \mathit{Res}_3 = y_3 - \mathit{Esty}_3 \]
\[ \mathit{Res}_1 = y_1 - b_{Const} - b_x x_1 \qquad \mathit{Res}_2 = y_2 - b_{Const} - b_x x_2 \qquad \mathit{Res}_3 = y_3 - b_{Const} - b_x x_3 \]

The sum of squared residuals:
\[ SSR = \mathit{Res}_1^2 + \mathit{Res}_2^2 + \mathit{Res}_3^2 = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]
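To make the criterion concrete, the sum of squared residuals can be evaluated for any candidate line. The sketch below uses the first-quiz data; the function name `ssr` and the three candidate lines are hypothetical choices for illustration, not values from the slides.

```python
minutes = [5, 15, 25]   # x_t: minutes studied, first quiz
scores = [66, 87, 90]   # y_t: quiz scores, first quiz


def ssr(b_const, b_x, x=minutes, y=scores):
    """Sum of squared residuals for the candidate line Esty = b_const + b_x * x."""
    return sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))


# Hypothetical candidate lines, each given as (intercept, slope).
candidates = [(60.0, 1.0), (66.0, 1.0), (50.0, 2.0)]
for b_const, b_x in candidates:
    print(f"Esty = {b_const} + {b_x}x  ->  SSR = {ssr(b_const, b_x)}")
```

Under the OLS criterion, the candidate with the smaller SSR is the better fitting line; the remaining steps find the intercept and slope that make SSR as small as possible.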

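Step 2: Differentiate the sum of squared residuals (SSR) with respect to \( b_{Const} \)

\[ SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]
\[ \frac{dSSR}{db_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) \]

Setting the derivative equal to 0 and dividing by -2:
\[ (y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) = 0 \]
\[ (y_1 + y_2 + y_3) - (b_{Const} + b_{Const} + b_{Const}) + (-b_x x_1 - b_x x_2 - b_x x_3) = 0 \]
\[ (y_1 + y_2 + y_3) - 3 b_{Const} - b_x (x_1 + x_2 + x_3) = 0 \]

Dividing by 3:
\[ \frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} = 0 \]
\[ \bar{y} - b_{Const} - b_x \bar{x} = 0 \]

For future reference, note that \( \bar{y} = b_{Const} + b_x \bar{x} \), or equivalently \( b_{Const} = \bar{y} - b_x \bar{x} \).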

Step 3: Differentiate the sum of squared residuals (SSR) with respect to \( b_x \)
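Only the heading of this step appears in the transcript. A sketch of the standard derivation follows; it starts from the same SSR used when differentiating with respect to the constant and ends at the ratio of sums that the calculation slide evaluates (240/200). The individual steps are an assumption about what the slide showed, and the original may have arranged them differently.

```latex
\begin{align*}
SSR &= \sum_{t=1}^{3} (y_t - b_{Const} - b_x x_t)^2 \\
\frac{dSSR}{db_x} &= -2 \sum_{t=1}^{3} x_t \,(y_t - b_{Const} - b_x x_t) = 0 \\
% Substitute b_Const = \bar{y} - b_x \bar{x} from the derivative with respect to b_Const:
\sum_{t=1}^{3} x_t \,\bigl[(y_t - \bar{y}) - b_x (x_t - \bar{x})\bigr] &= 0 \\
\sum_{t=1}^{3} x_t (y_t - \bar{y}) &= b_x \sum_{t=1}^{3} x_t (x_t - \bar{x}) \\
% Deviations from a mean sum to zero, so x_t may be replaced by (x_t - \bar{x}) on both sides:
\sum_{t=1}^{3} (x_t - \bar{x})(y_t - \bar{y}) &= b_x \sum_{t=1}^{3} (x_t - \bar{x})^2 \\
b_x &= \frac{\sum_{t=1}^{3} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{3} (x_t - \bar{x})^2}
\end{align*}
```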

Ordinary Least Squares (OLS) Estimates - Calculations

First Quiz (x = Minutes studied, y = Quiz score):
Student    x     y
   1       5    66
   2      15    87
   3      25    90

The means: \( \bar{x} = 15 \), \( \bar{y} = 81 \)

The deviations from the means:
Student   y_t   y-bar   y_t - y-bar    x_t   x-bar   x_t - x-bar
   1       66     81        -15          5     15        -10
   2       87     81          6         15     15          0
   3       90     81          9         25     15         10

Products of x and y deviations and squared x deviations:
Student   (y_t - y-bar)(x_t - x-bar)   (x_t - x-bar)^2
   1        (-15)(-10) = 150             (-10)^2 = 100
   2          (6)(0)   =   0               (0)^2 =   0
   3          (9)(10)  =  90              (10)^2 = 100
                   Sum = 240                 Sum = 200

The equations:
\[ b_x = \frac{\sum (y_t - \bar{y})(x_t - \bar{x})}{\sum (x_t - \bar{x})^2} = \frac{240}{200} = 1.2 \]
\[ b_{Const} = \bar{y} - b_x \bar{x} = 81 - 1.2 \times 15 = 81 - 18 = 63 \]

OLS Best Fitting Line: \( \mathit{Esty} = 63 + 1.2x \)
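The hand calculation above can be mirrored step by step in a few lines. This is a minimal sketch of the same arithmetic; the variable names are illustrative.

```python
minutes = [5, 15, 25]   # x: minutes studied
scores = [66, 87, 90]   # y: quiz scores
n = len(minutes)

# The means.
x_bar = sum(minutes) / n
y_bar = sum(scores) / n

# The deviations from the means.
x_dev = [x - x_bar for x in minutes]
y_dev = [y - y_bar for y in scores]

# Products of x and y deviations, and squared x deviations.
sum_products = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
sum_squares = sum(dx ** 2 for dx in x_dev)

# The equations for the slope and intercept of the best fitting line.
b_x = sum_products / sum_squares
b_const = y_bar - b_x * x_bar

print(f"means: x = {x_bar}, y = {y_bar}")
print(f"sum of products = {sum_products}, sum of squared x deviations = {sum_squares}")
print(f"Esty = {b_const:.1f} + {b_x:.1f}x")
```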

Ordinary Least Squares (OLS): The Sum of Squared Residuals for the Best Fitting Line

The Residual: \( \mathit{Res}_t = y_t - \mathit{Esty}_t \) = Actual quiz score for student t - Estimated quiz score for student t

Best Fitting Line: \( b_{Const} = 63 \) and \( b_x = 1.2 \).

Student   x_t   y_t   Res_t = y_t - Esty_t   Res_t^2
   1        5    66      66 - 69 = -3            9
   2       15    87      87 - 81 =  6           36
   3       25    90      90 - 93 = -3            9
                                         SSR =  54

(EViews; Lab 5.1)

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate    SE         t-Statistic   Prob
x                          1.200000    0.519615   2.309401      0.2601
Const                      63.00000    8.874120   7.099296      0.0891
Number of Observations     3
Sum Squared Residuals      54.00000

\( b_x = 1.2 \), \( b_{Const} = 63 \), \( \mathit{Esty} = 63 + 1.2x \), \( SSR = 54 \)
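One way to see that 63 and 1.2 do minimize the sum of squared residuals is to recompute the residuals and then nudge the intercept and slope slightly in each direction. The sketch below does this; the perturbation sizes (0.5 points and 0.05 points per minute) are arbitrary choices made for illustration.

```python
minutes = [5, 15, 25]
scores = [66, 87, 90]
B_CONST, B_X = 63.0, 1.2   # OLS estimates reported above


def ssr(b_const, b_x):
    """Sum of squared residuals for the line Esty = b_const + b_x * x."""
    return sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(minutes, scores))


# Residual for each student: actual score minus estimated score.
residuals = [yt - (B_CONST + B_X * xt) for xt, yt in zip(minutes, scores)]
best = ssr(B_CONST, B_X)
print("residuals:", [round(r, 6) for r in residuals])
print("SSR at the OLS estimates:", round(best, 6))

# Nearby lines: shift the intercept and/or the slope a little (arbitrary step sizes).
nearby = [(B_CONST + dc, B_X + dx)
          for dc in (-0.5, 0.0, 0.5)
          for dx in (-0.05, 0.0, 0.05)
          if (dc, dx) != (0.0, 0.0)]
for b_const, b_x in nearby:
    print(f"Esty = {b_const} + {b_x:.2f}x  ->  SSR = {ssr(b_const, b_x):.4f}")
```

Every nearby line produces a larger SSR than the OLS line, which is what the criterion requires.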

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate    SE         t-Statistic   Prob
x                          1.200000    0.519615   2.309401      0.2601
Const                      63.00000    8.874120   7.099296      0.0891
Number of Observations     3
Sum Squared Residuals      54.00000

\( b_x = 1.2 \), \( b_{Const} = 63 \), \( \mathit{Esty} = 63 + 1.2x \)

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
  \( y_t \) = Score received by student t
  \( x_t \) = Minutes studied by student t
  \( e_t \) = Error term for student t

Theory: Additional studying increases quiz scores. Our “theory” suggests that a student’s score on the quiz increases when he/she studies more. Also, it is generally believed that Professor Lord awards students some points just for showing up for a quiz that early in the morning.
  \( \beta_{Const} \) represents the points given by Professor Lord for just showing up.
  \( \beta_x \) represents the additional points received for each additional minute studied.

Interpretation: We estimate that Professor Lord gives students 63 points for showing up. Studying one additional minute results in 1.2 additional points.

Lab 5.2

Importance of the Error Term

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
Assume \( \beta_{Const} = 50 \) and \( \beta_x = 2 \).

\( e_t \), the error term, is a random variable; it represents the factors that cannot be anticipated and/or determined before the quiz is given. It represents all the random influences.

First Quiz:
Student   Minutes (x_t)   Absence of Random Influences     With Random Influences
                          Score (y_t = 50 + 2x_t)          Score (y_t = 50 + 2x_t + e_t)
   1           5          50 + 2(5)  = 50 + 10 =  60                66
   2          15          50 + 2(15) = 50 + 30 =  80                87
   3          25          50 + 2(25) = 50 + 50 = 100                90

WHAT IF Question: What if there were no random influences; that is, what if there were no error term?
Without the error term, with \( \beta_{Const} = 50 \) and \( \beta_x = 2 \): In the absence of random influences (the error term), the best fitting line fits the data perfectly. We can determine the actual value of the coefficient by calculating the slope of the line using any two points.

Back to Reality: There are random influences in the real world, so \( y_t = 50 + 2x_t + e_t \).
  Student 1: \( e_1 > 0 \)
  Student 2: \( e_2 > 0 \)
  Student 3: \( e_3 < 0 \)

[Figure: Scatter diagram with points Std 1, Std 2, and Std 3, the actual line y = 50 + 2x, and the OLS estimate y = 63 + 1.2x]
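The what-if comparison can be worked through numerically. The sketch below takes the slide's assumed actual values (constant 50, coefficient 2), computes the scores that would occur without random influences, recovers the line from two of those points, and backs out each student's error term from the actual first-quiz scores. Variable names are illustrative.

```python
BETA_CONST, BETA_X = 50, 2     # actual values assumed on the slide
minutes = [5, 15, 25]
actual_scores = [66, 87, 90]   # first-quiz scores, with random influences

# Scores in the absence of random influences: y_t = 50 + 2 * x_t.
no_noise_scores = [BETA_CONST + BETA_X * xt for xt in minutes]

# Without an error term, any two points pin down the line exactly.
slope = (no_noise_scores[1] - no_noise_scores[0]) / (minutes[1] - minutes[0])
intercept = no_noise_scores[0] - slope * minutes[0]

# Error term for each student: actual score minus the no-random-influence score.
errors = [yt - st for yt, st in zip(actual_scores, no_noise_scores)]

print("scores without random influences:", no_noise_scores)
print(f"line through two points: y = {intercept} + {slope}x")
print("error terms e_1, e_2, e_3:", errors)
```

The signs of the computed error terms match the slide: positive for Students 1 and 2, negative for Student 3.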

Lab 5.3

The Constant and Coefficient Estimates Are Random Variables

Real World: Random influences are present, as represented by the regression model’s error term.

[Figure: Scatter diagram with points Std 1, Std 2, and Std 3, the OLS estimate y = 63 + 1.2x, and the actual line y = 50 + 2x]

Claim: As a consequence of random influences, we cannot expect the intercept and slope of the best fitting line to equal the actual constant and coefficient.

As a consequence of the random influences, we can be all but certain that
  the intercept of the best fitting line will not equal the actual intercept, 50;
  the slope of the best fitting line will not equal the actual slope, 2.

In fact, even if we knew the actual values of the constant and coefficient, we could not predict the constant and coefficient of the best fitting line with certainty before the quiz is given.

The intercept and slope of the best fitting line, \( b_{Const} \) and \( b_x \), are random variables.
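A small simulation can illustrate the claim. The sketch below is not the lab's own simulation: it assumes the error term is normally distributed with a standard deviation of 5 points (a hypothetical choice, since the slides do not specify a distribution), generates several quizzes from the actual line y = 50 + 2x, and fits the best fitting line to each.

```python
import random

MINUTES = [5, 15, 25]          # minutes studied, from the first quiz
BETA_CONST, BETA_X = 50, 2     # actual values assumed on the slide
ERROR_SD = 5.0                 # hypothetical spread of the error term (not from the slides)


def ols(x, y):
    """Return (b_const, b_x) for the best fitting line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b_x = (sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
           / sum((xt - x_bar) ** 2 for xt in x))
    return y_bar - b_x * x_bar, b_x


def simulate_quiz(rng):
    """Draw one quiz: the actual line plus a fresh random error for each student."""
    return [BETA_CONST + BETA_X * xt + rng.gauss(0, ERROR_SD) for xt in MINUTES]


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = [ols(MINUTES, simulate_quiz(rng)) for _ in range(5)]
for quiz, (b_const, b_x) in enumerate(estimates, start=1):
    print(f"Quiz {quiz}: b_Const = {b_const:6.2f}, b_x = {b_x:5.2f}")
```

Each simulated quiz yields a different intercept and slope, even though the actual constant and coefficient never change; that is the sense in which the estimates are random variables.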

The Error Term Reflects Random Influences: A Closer Look

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
\( e_t \) is a random variable.

Before the experiment is conducted:
  Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty before the experiment is conducted.
  Good news. What we do know: On the other hand, we can often calculate the random variable’s probability distribution, which tells us how likely it is for the random variable to equal each of its possible numerical values.

Intuition: What happens after many, many quizzes?
Since the error term represents the random influences, a student’s error term, \( e_t \), should be:
  positive about half the time, indicating that the student performs better than “usual”;
  negative about half the time, indicating that the student performs worse than “usual.”
“In the long run,” after many, many repetitions, the error terms should average out to 0.

Lab 5.4

Error Terms and Random Influences

The Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)

We shall illustrate two points:
  The error term is a random variable.
  The error term represents random influences.

After many, many repetitions:
  Mean[\( e_1 \)] = 0: \( e_1 \) is positive half the time and negative half the time; \( e_1 \) has no systematic effect on Student 1’s quiz score; \( e_1 \) represents a random influence.
  Mean[\( e_2 \)] = 0: \( e_2 \) is positive half the time and negative half the time; \( e_2 \) has no systematic effect on Student 2’s quiz score; \( e_2 \) represents a random influence.
  Mean[\( e_3 \)] = 0: \( e_3 \) is positive half the time and negative half the time; \( e_3 \) has no systematic effect on Student 3’s quiz score; \( e_3 \) represents a random influence.

Summary:
  The mean of the probability distribution for each student’s error term equals 0.
  The chances that a student’s error term will be positive in any one quiz are about equal to the chances that it will be negative.
  A student’s error term has no systematic effect on his/her quiz score.
  A student’s error term represents a random influence.
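The long-run behavior described above can be sketched with a simulation. As an assumption for illustration only, each student's error term is drawn from a normal distribution with mean 0 and a standard deviation of 5 points; the slides do not say which distribution the lab uses, and the number of repetitions is likewise an arbitrary choice.

```python
import random

ERROR_SD = 5.0          # hypothetical spread of the error term (not from the slides)
REPETITIONS = 100_000   # "many, many" quizzes

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for student in (1, 2, 3):
    # One error term per repetition for this student.
    errors = [rng.gauss(0, ERROR_SD) for _ in range(REPETITIONS)]
    mean_error = sum(errors) / REPETITIONS
    share_positive = sum(e > 0 for e in errors) / REPETITIONS
    results[student] = (mean_error, share_positive)
    print(f"Student {student}: mean error = {mean_error:+.3f}, "
          f"share of quizzes with a positive error = {share_positive:.3f}")
```

With this many repetitions, each student's average error should sit close to 0 and the share of positive errors close to one half, in line with the summary.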

Clint’s Assignment: Where Do We Stand?

Theory: Additional studying increases quiz scores.

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate    SE         t-Statistic   Prob
x                          1.200000    0.519615   2.309401      0.2601
Const                      63.00000    8.874120   7.099296      0.0891
Number of Observations     3
Sum Squared Residuals      54.00000

Summary
The OLS estimate for the value of the coefficient is 1.2; Clint estimates that an additional minute of studying results in 1.2 additional points, suggesting that the theory is correct.
But since random influences are present in the real world, we know that the coefficient estimate is a random variable. We are all but certain that the numerical value of the coefficient estimate, 1.2, does NOT equal the actual value of the coefficient.

What should Clint do? We will proceed by dividing Clint’s assignment into two related parts:
  Coefficient Reliability: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?
  Theory Confidence: How much confidence should Clint have in the theory that studying more leads to higher quiz scores?