Economics 173 Business Statistics Lecture 10 Fall, 2001 Professor J. Petry
Simple Linear Regression and Correlation Chapter 17
17.1 Introduction In this chapter we employ Regression Analysis to examine the relationship among quantitative variables. The technique is used to predict the value of one variable (the dependent variable - y)based on the value of other variables (independent variables x 1, x 2,…x k.)
17.2 The Model The first order linear model y = dependent variable x = independent variable 0 = y-intercept 1 = slope of the line = error variable x y 00 Run Rise = Rise/Run 0 and 1 are unknown, therefore, are estimated from the data.
17.3 Estimating the Coefficients The estimates are determined by –drawing a sample from the population of interest, –calculating sample statistics. –producing a straight line that cuts into the data. The question is: Which straight line fits best? x y
3 3 The best line is the one that minimizes the sum of squared vertical differences between the points and the line. (1,2) 2 2 (2,4) (3,1.5) Sum of squared differences =(2 - 1) 2 +(4 - 2) 2 +( ) 2 + (4,3.2) ( ) 2 = 6.89 Sum of squared differences =(2 -2.5) 2 +( ) 2 +( ) 2 +( ) 2 = Let us compare two lines The second line is horizontal The smaller the sum of squared differences the better the fit of the line to the data.
To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas: The regression equation that estimates the equation of the first order linear model is:
Example 17.1 Relationship between odometer reading and a used car’s selling price. –A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. –A random sample of 100 cars is selected, and the data recorded. –Find the regression line. Independent variable x Dependent variable y
Solution using first formula –Solving by hand To calculate b 0 and b 1 we need to calculate several statistics first; where n = 100.
Solution –Solving by hand using the second formula First we obtain the necessary numbers to enter: n = 100.
–Using the computer (see file Xm17-01.xls) Tools > Data analysis > Regression > [Shade the y range and the x range] > OK
This is the slope of the line. For each additional mile on the odometer, the price decreases by an average of $ The intercept is b 0 = No data Do not interpret the intercept as the “Price of cars that have not been driven”
Example – Armani’s Pizza –Armani’s Pizza is considering locating at the U of I campus. To do their financial analysis, they first need to estimate sales for their product. –They have data from their existing 10 locations on other college campuses. –Estimate sales for the University of Illinois, with a college population of 35,000.
Example – Armani’s Pizza –Begin by plotting the data –Followed by calculating the statistics of your regression
Example – Armani’s Pizza –Solve for b 1, then b 0 using the second formula –After obtaining values for your regression line, plug in the value for your independent variable (35, not 35,000) into the formula, and predict sales!
Example – Armani’s Pizza Excel Output
17.4 Error Variable: Required Conditions The error is a critical part of the regression model. Four requirements involving the distribution of must be satisfied. –The probability distribution of is normal. –The mean of is zero: E( ) = 0. –The standard deviation of is for all values of x. –The set of errors associated with different values of y are all independent.
From the first three assumptions we have: y is normally distributed with mean E(y) = 0 + 1 x, and a constant standard deviation From the first three assumptions we have: y is normally distributed with mean E(y) = 0 + 1 x, and a constant standard deviation 0 + 1 x 1 0 + 1 x 2 0 + 1 x 3 E(y|x 2 ) E(y|x 3 ) x1x1 x2x2 x3x3 E(y|x 1 ) The standard deviation remains constant, but the mean value changes with x