Simple Linear Regression


1 Simple Linear Regression
Chapter 4 Simple Linear Regression

2 Learning Objectives
Understand the goals of simple linear regression analysis
Consider what the error term contains
Define the population regression model and the sample regression function
Estimate the sample regression function
Interpret the estimated sample regression function
Predict outcomes based on our estimated sample regression function
Assess the goodness-of-fit of the estimated sample regression function
Understand how to read regression output in Excel
Understand the difference between correlation and causation

3

4 Understand the Goals of Simple Linear Regression Analysis
Regression analysis is used to:
Obtain the marginal effect that a one-unit change in the independent variable has on the dependent variable
Predict the value of the dependent variable based on the value of the independent variable
Dependent variable: the variable we wish to explain
Independent (or explanatory) variable: the variable used to explain the dependent variable

5 Simple Linear Regression Model
The term simple means that there is only one independent variable, x.
The relationship between x and y is described by a linear function.
Regression refers to the manner in which the relationship is estimated.
Changes in y are assumed to be caused by changes in x (although this is typically not the case).

6 Types of Regression Models
[Figure: four scatter plot panels titled Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, and No Relationship.]

7 Population Linear Regression Model
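The population regression model:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where $y_i$ is the dependent variable, $\beta_0$ is the population y intercept, $\beta_1$ is the population slope coefficient, $x_i$ is the independent variable, and $\varepsilon_i$ is the random error term. $\beta_0 + \beta_1 x_i$ is the linear component and $\varepsilon_i$ is the random error component.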
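As a rough illustration of the population model, the sketch below draws a few observations as a linear component plus a random error. The parameter values, the range of x, and the function name `simulate_population` are assumptions chosen only for this example; they do not come from the presentation.

```python
import random

def simulate_population(n, beta0, beta1, sigma, seed=0):
    """Draw n observations from y_i = beta0 + beta1 * x_i + eps_i."""
    rng = random.Random(seed)
    xs, ys, eps = [], [], []
    for _ in range(n):
        x = rng.uniform(8, 20)      # hypothetical range for the independent variable
        e = rng.gauss(0, sigma)     # random error component
        xs.append(x)
        eps.append(e)
        ys.append(beta0 + beta1 * x + e)  # linear component + random error
    return xs, ys, eps

# Hypothetical population parameters, chosen only for illustration.
xs, ys, eps = simulate_population(n=5, beta0=10.0, beta1=2.0, sigma=1.0)
for x, y, e in zip(xs, ys, eps):
    print(f"x={x:.2f}  y={y:.2f}  error={e:+.2f}")
```

In real data only x and y are observed; the error column is visible here only because the data are simulated.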

8 Population Linear Regression
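[Figure: the population regression line plotted against x, with intercept = $\beta_0$ and slope = $\beta_1$. For a given $x_i$, the plot marks the observed value of y, the predicted value of y, and the random error $\varepsilon_i$ for this x value.]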

9 Consider What the Random Error Component, ε, Contains
Omitted Variables: independent variables that are related to the dependent variable, y, but are not in the regression model (i.e., they are omitted).
Measurement Error: the difference between the measured value of the observation and the true value. This can occur if there is a data entry error or if a person, firm, etc. does not know the true value and instead reports an incorrect value.

10 Consider What the Random Error Component, ε, Contains
Incorrect Functional Form: the wrong model is fit to the data. For example, a linear function is fit between y and x but the true relationship is quadratic.
Random Component: the variable being studied is inherently random. Even if two people have the same number of years of education, they may earn different salaries due to random factors aside from the factors listed above.

11 Estimated Regression Function
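The sample regression line provides an estimate of the population regression line:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
where $\hat{y}_i$ is the estimated (or predicted) y value, $\hat{\beta}_0$ is the estimate of the regression intercept, $\hat{\beta}_1$ is the estimate of the regression slope, and $x_i$ is the independent variable.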

12 What is a Residual?
A residual is the difference between the observed value of y and the predicted value of y: $e_i = y_i - \hat{y}_i$. It is the sample estimate of the error term, $\varepsilon_i$: the error term belongs to the population model, while the residual is computed from the sample.
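A minimal sketch of the residual calculation, using a small toy data set that is not from the presentation. The coefficients 2.2 and 0.6 are the least squares estimates for these five points; all variable names are illustrative.

```python
# Toy data, chosen only for illustration (not the salary.xls data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares estimates for this toy data set.
b0_hat, b1_hat = 2.2, 0.6

# Predicted value for each observation, then residual = observed - predicted.
y_hat = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

for xi, yi, yhi, ei in zip(x, y, y_hat, residuals):
    print(f"x={xi}  observed={yi}  predicted={yhi:.1f}  residual={ei:+.1f}")
```

When the line is fit by least squares with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the calculation.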

13 Graph of the Sample Regression Function

14 Graph of Predictions and Residuals for Multiple Observations

15 Estimate the Sample Regression Function
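$\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained by minimizing the sum of the squared residuals, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$.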
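The minimization can be sketched as follows. The symbol $S$ for the sum of squared residuals is notation introduced here for convenience; it is not taken from the slides.

```latex
S(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2

% First-order conditions: set both partial derivatives to zero.
\frac{\partial S}{\partial \hat{\beta}_0}
  = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0

\frac{\partial S}{\partial \hat{\beta}_1}
  = -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0
```

Solving these two equations for the two unknowns gives the least squares estimates of the intercept and the slope.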

16 The Least Squares Equation
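The formulas for $\hat{\beta}_1$ and $\hat{\beta}_0$ are:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
and
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$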
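The least squares formulas (slope = sum of cross-deviations of x and y divided by the sum of squared deviations of x; intercept = mean of y minus slope times mean of x) can be sketched in a few lines of plain Python. The function name `ols_fit` and the toy data are assumptions for illustration, not part of the presentation.

```python
def ols_fit(x, y):
    """Return (b0_hat, b1_hat) from the closed-form least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-deviations; denominator: sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Toy data, chosen only for illustration (not the salary.xls data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = ols_fit(x, y)
print(f"intercept = {b0_hat:.2f}, slope = {b1_hat:.2f}")  # intercept = 2.20, slope = 0.60
```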

17 Interpretation of the Slope and the Intercept
$\hat{\beta}_0$ is, on average, the estimated value of y when x is equal to zero.
$\hat{\beta}_1$ is, on average, the estimated change in the value of y as a result of a one-unit change in x.

18 Salary (y) vs. Education (x) Example in salary.xls

19 Example continued

20 A Graphical Representation of the Estimated Regression Line

21 Using Excel to Compute the Estimated Regression Equation in a Scatter Plot
Create a scatter diagram in Excel
Position the mouse over any data point and right click
Select the Add Trendline option
When the Add Trendline dialog box appears:
On the Type tab select Linear (it is the default)
On the Options tab select the Display equation on chart box (note the equation is displayed with the slope first and the intercept second)
Click OK

22 Interpret the Estimated Sample Regression Function
Slope: On average, if education goes up by one year, then salary will go up by $11,
Intercept: On average, if an individual has 0 years of education, then their estimated salary is $-121, (this estimate is obviously unrealistic, since a salary cannot be negative)

23 Predict Outcomes Based on our Estimated Sample Regression Function
Say we want to predict salary for a person with 12 years of education. We put x = 12 into the sample regression function, multiplying the estimated slope by 12 and adding the estimated intercept. We predict a salary of $13, for a person with 12 years of education.
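Prediction is a direct substitution into the estimated line. The sketch below uses hypothetical coefficients (2.2 and 0.6), not the estimates from salary.xls, and the function name `predict` is an assumption for illustration.

```python
def predict(b0_hat, b1_hat, x_new):
    """Predicted y for a given x from the estimated sample regression function."""
    return b0_hat + b1_hat * x_new

# Hypothetical estimates, not the salary.xls results.
b0_hat, b1_hat = 2.2, 0.6
y_pred = predict(b0_hat, b1_hat, 12)
print(f"predicted y at x = 12: {y_pred:.1f}")
```

Predictions are most trustworthy for x values inside the range of the data used to estimate the line.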

24 Assess the Goodness-of-Fit of the Estimated Regression Function
Goodness-of-fit is how well the regression model describes the observed data. Two measures of goodness-of-fit are R-squared and the standard error of the regression.

25 Comparing the Goodness-of-Fit of Two Hypothetical Data Sets

26 A Venn Diagram Demonstrating Joint Variation between y and x

27 The Sample Regression Function Explains None of the Variation in y

28 The Sample Regression Function Explains All of the Variation in y

29 The Sample Regression Function Explains All of the Variation in y

30 Explained and Unexplained Variation
Total variation in the dependent variable is made up of two parts:
$TSS = ESS + USS$
(Total Sum of Squares = Explained Sum of Squares + Unexplained Sum of Squares)
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
where:
$\bar{y}$ = average value of the dependent variable
$y_i$ = observed values of the dependent variable
$\hat{y}_i$ = estimated value of y for the given x value

31 Explained and Unexplained Variation
TSS = total sum of squares: measures the total variation of the $y_i$ values around the mean of y. This is the numerator of the variance of y.
ESS = explained sum of squares: the portion of the variation in y that is explained by the independent variable x.
USS = unexplained sum of squares: variation in y attributable to factors other than the relationship between x and y.
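The three sums of squares can be computed directly from a fitted line. This is a minimal sketch on toy data (not from the presentation); the function name `sums_of_squares` is illustrative. It also checks that the total equals explained plus unexplained, which holds for a least squares fit with an intercept.

```python
def sums_of_squares(x, y):
    """Fit y on x by least squares and return (total, explained, unexplained)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)                 # observed vs. mean
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)         # predicted vs. mean
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # observed vs. predicted
    return total, explained, unexplained

# Toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, explained, unexplained = sums_of_squares(x, y)
print(f"total={total:.2f}  explained={explained:.2f}  unexplained={unexplained:.2f}")
print("decomposition holds:", abs(total - (explained + unexplained)) < 1e-9)
```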

32 Explained and Unexplained Variation
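[Figure: for an observation at $x_i$, the plot marks the deviations behind each sum of squares: $USS = \sum (y_i - \hat{y}_i)^2$, $TSS = \sum (y_i - \bar{y})^2$, and $ESS = \sum (\hat{y}_i - \bar{y})^2$.]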

33 Coefficient of Determination, R2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as $R^2$:
$R^2 = \frac{ESS}{TSS}$, where $0 \le R^2 \le 1$
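Since R-squared is the explained share of the total variation, it can be computed as the explained sum of squares divided by the total sum of squares. A minimal sketch on toy data (not the salary.xls data); the function name `r_squared` is an assumption for illustration.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a least squares line in x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    total = sum((yi - y_bar) ** 2 for yi in y)
    return explained / total

# Toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"R-squared = {r2:.2f}")  # R-squared = 0.60
```

For this toy data, 60% of the variation in y is explained by x.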

34 Coefficient of Determination, R2
Note: In simple linear regression, $R^2$ is equal to the correlation coefficient squared:
$R^2 = r_{xy}^2$
where:
$R^2$ = coefficient of determination
$r_{xy}$ = correlation coefficient between x and y

35 How are the Correlation Coefficient and the Coefficient of Determination Related?
$R^2 = r_{xy}^2$
Note that this relationship holds only in simple linear regression.
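The relationship can be checked numerically. The sketch below computes the sample correlation coefficient and R-squared separately on a toy data set with a negative slope (an assumption for illustration, not from the presentation) and compares them. Function names are illustrative.

```python
def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / (s_xx * s_yy) ** 0.5

def r_squared(x, y):
    """1 - unexplained/total for a least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - unexplained / total

# Toy data with a negative relationship, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 4, 2]
r = correlation(x, y)
r2 = r_squared(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R-squared = {r2:.4f}")
```

The correlation coefficient is negative here while R-squared is positive, so R-squared alone does not reveal the direction of the relationship.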

36 What is the Intuition Behind This Relationship?
In the case of a linear relationship between two variables, both the coefficient of determination and the sample correlation coefficient provide measures of the strength of the relationship. The coefficient of determination provides a measure between 0 and 1, whereas the correlation coefficient provides a measure between -1 and 1. The coefficient of determination can be used for nonlinear relationships and for relationships that have two or more independent variables. Why might the correlation coefficient be preferred to the coefficient of determination?

37 Examples of Approximate R2 Values
[Figure: two plots of y against x, each labeled $R^2 = 1$.]
$R^2 = 1$: Perfect linear relationship between x and y. 100% of the variation in y is explained by variation in x. Note that $R^2$ is positive even if the line has a negative slope.

38 Examples of Approximate R2 Values
[Figure: two plots of y against x.]
$0 < R^2 < 1$: Weaker linear relationship between x and y. Some but not all of the variation in y is explained by variation in x.

39 Examples of Approximate R2 Values
[Figure: plot of y against x.]
$R^2 = 0$: No linear relationship between x and y. The value of y does not depend on x (none of the variation in y is explained by variation in x).

40 What does R2 mean?
$R^2$ means that $R^2 \times 100\%$ of the variation in y is explained by x. For example, if $R^2 = 0.85$, we would say that 85% of the variation in y is explained by x.

41 Calculating R2 for the salary.xls example
This says 63.73% of the variation in salary is explained by education

42 Using Excel to Compute the Coefficient of Determination
Position the mouse pointer over any data point in the scatter diagram and right click to display the chart menu.
Select the Add Trendline option
When the Add Trendline dialog box appears: on the Options tab, select the Display R-squared value on chart box and click OK.

43 The Standard Error of the Estimated Sample Regression Function
The standard error of the regression function measures, on average, how far the points fall away from the regression line:
$s = \sqrt{\frac{USS}{n - k - 1}}$
where n = the number of observations and k = the number of explanatory variables. In simple linear regression k = 1.
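A minimal sketch of the standard error of the regression, taken as the square root of the unexplained sum of squares divided by n - k - 1, with n the number of observations. The toy data and the function name `standard_error` are assumptions for illustration, not the salary.xls calculation.

```python
import math

def standard_error(x, y, k=1):
    """Standard error of a least squares line: sqrt(unexplained SS / (n - k - 1))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # k = 1 explanatory variable in simple linear regression, so the divisor is n - 2.
    return math.sqrt(unexplained / (n - k - 1))

# Toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s = standard_error(x, y)
print(f"standard error of the regression = {s:.4f}")
```

The result is in the same units as y, which makes it easier to interpret than a sum of squares.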

44 Calculation of the Standard Error for the salary.xls Example

45 Reading Regression Output in Excel: Intercept and Slope

46 Reading Regression Output in Excel: R2
63.73% of the variation in salary is explained by the variation in education.
[Annotated Excel output: the Explained, Unexplained, and Total sums of squares are labeled.]

47 Reading Regression Output in Excel: Standard Error
[Annotated Excel output: the Explained, Unexplained, and Total sums of squares are labeled.]

48 Excel’s Regression Tool
Select the Tools menu
Choose the Data Analysis option
Choose Regression from the list of Analysis Tools
Input y into the Input Y Range
Input x into the Input X Range
Select Labels
Select Output Range in the sheet
Click OK

49 Understand the Difference between Correlation and Causation
Correlation is when there is a linear relationship between two random variables.
Causation occurs between two random variables when changes in one variable (say x) cause changes in another variable (say y).
Spurious correlation occurs when the correlation between two random variables results from each variable's relationship with a third random variable.

50 Understand the Difference between Correlation and Causation
Just because there is correlation between two random variables, it does not mean there is causation. Examples:
The number of firemen at a fire is linked to increased monetary damages from the fire.
The number of shark attacks and ice cream sales are positively related.
Students who are tutored tend to get worse grades than students who are not tutored.
See Google Correlate for more real-world examples of this phenomenon.
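The shark attack and ice cream example can be mimicked with simulated data. In the sketch below a third variable (temperature) drives both series, and neither series affects the other. All numbers, coefficients, and names are invented for illustration; nothing here is real data.

```python
import random

def correlation(a, b):
    """Sample correlation coefficient between two lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / (s_aa * s_bb) ** 0.5

def residuals_on(z, v):
    """Residuals from a simple least squares regression of v on z."""
    n = len(z)
    z_bar, v_bar = sum(z) / n, sum(v) / n
    b1 = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    b0 = v_bar - b1 * z_bar
    return [vi - (b0 + b1 * zi) for zi, vi in zip(z, v)]

rng = random.Random(42)
# Hypothetical third variable that drives both series.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [5.0 * t + rng.gauss(0, 10) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = correlation(ice_cream, shark_attacks)
# Remove the part of each series explained by temperature, then correlate the rest.
adjusted = correlation(
    residuals_on(temperature, ice_cream),
    residuals_on(temperature, shark_attacks),
)
print(f"raw correlation: {raw:.2f}")
print(f"correlation after removing temperature: {adjusted:.2f}")
```

The raw correlation is strongly positive even though, by construction, ice cream sales have no effect on shark attacks; once temperature is accounted for, the correlation falls to near zero.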

