Ordinary Least Squares (OLS) Regression What is it? Closely allied with correlation: we are interested in the strength of the linear relationship between two variables. One variable is specified as the dependent variable; the other variable is the independent (or explanatory) variable.
Regression Model Y = a + bx + e What is Y? What is a? What is b? What is x? What is e?
Elements of the Regression Line a = Y intercept (what Y is predicted to equal when X = 0) b = Slope (indicates the change in Y associated with a unit increase in X) e = error (the difference between the predicted Y (Y hat) and the observed Y)
Regression Has the ability to quantify precisely the relative importance of a variable Has the ability to quantify how much variance is explained by a variable (or variables) Used more often than any other statistical technique
The Regression Line Y = a + bx + e Y = sentence length X = prior convictions Each point represents the number of priors (X) and sentence length (Y) of a particular defendant The regression line is the best fit line through the overall scatter of points
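The sentencing example above can be sketched in Python. The priors/sentence-length numbers below are hypothetical (invented for illustration), and the slope and intercept are computed with the standard least-squares formulas derived later in these slides.

```python
# Hypothetical data: number of prior convictions (X) and
# sentence length in months (Y) for five defendants.
x = [0, 1, 2, 5, 7]
y = [12, 18, 24, 40, 54]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sum of products over sum of squares for X
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)

# Intercept: the line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

print(f"Y-hat = {a:.2f} + {b:.2f} * X")
```

Each observed point generally sits off this line; the vertical gap between a point and the line is that defendant's error term e.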
X and Y are observed. We need to estimate a & b
Calculus 101 Least Squares Method and differential calculus Differentiation is a very powerful tool that is used extensively in model estimation. Practical examples of differentiation are usually in the form of minimization/optimization problems or rate of change problems.
How do you draw a line when the line can be drawn in almost any direction? The Method of Least Squares: drawing the line that minimizes the sum of the squared distances from the line (Σe2) This is a minimization problem, and therefore we can use differential calculus to estimate this line.
Least Squares Method

x   y   Deviation = y - (a + bx)   d2
0   1   1 - a        (1 - a)2 = 1 - 2a + a2
1   3   3 - a - b    (3 - a - b)2 = 9 - 6a + a2 - 6b + 2ab + b2
2   2   2 - a - 2b   (2 - a - 2b)2 = 4 - 4a + a2 - 8b + 4ab + 4b2
3   4   4 - a - 3b   (4 - a - 3b)2 = 16 - 8a + a2 - 24b + 6ab + 9b2
4   5   5 - a - 4b   (5 - a - 4b)2 = 25 - 10a + a2 - 40b + 8ab + 16b2
Summing the squares of the deviations yields: f(a, b) = 55 - 30a + 5a2 - 78b + 20ab + 30b2 Calculate the first-order partial derivatives of f(a, b): fa = -30 + 10a + 20b and fb = -78 + 20a + 60b
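As a quick sanity check, the two partial derivatives given above can be verified numerically against f(a, b) with a central-difference approximation; this is a minimal sketch, with the evaluation point (a0, b0) chosen arbitrarily.

```python
# f(a, b): the sum of squared deviations from the table
def f(a, b):
    return 55 - 30*a + 5*a**2 - 78*b + 20*a*b + 30*b**2

def fa(a, b):   # partial derivative with respect to a
    return -30 + 10*a + 20*b

def fb(a, b):   # partial derivative with respect to b
    return -78 + 20*a + 60*b

# Central-difference approximation at an arbitrary point (a0, b0)
h, a0, b0 = 1e-6, 2.0, 1.5
num_fa = (f(a0 + h, b0) - f(a0 - h, b0)) / (2 * h)
num_fb = (f(a0, b0 + h) - f(a0, b0 - h)) / (2 * h)

print(num_fa, fa(a0, b0))   # the two values should agree
print(num_fb, fb(a0, b0))
```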
Set each partial derivative to zero: Manipulate fa: 0 = -30 + 10a + 20b 10a = 30 - 20b a= 3 - 2b
Substitute (3 - 2b) into fb: 0 = -78 + 20a + 60b = -78 + 20(3 - 2b) + 60b = -78 + 60 - 40b + 60b = -18 + 20b 20b = 18 b = 0.9 Slope = 0.9
Substituting this value of b back into fa to obtain a: a = 3 - 2b = 3 - 2(0.9) = 3 - 1.8 = 1.2 Y-intercept = 1.2
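The whole derivation above can be replayed in Python: solve the two first-order conditions for a and b, then confirm that the result actually minimizes the sum of squared errors on the five data points from the table.

```python
# Data points from the least-squares table
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 4, 5]

# First-order conditions from the slides:
#   fa: 0 = -30 + 10a + 20b   ->   a = 3 - 2b
#   fb: 0 = -78 + 20a + 60b   ->   20b = 18
b = 18 / 20        # slope = 0.9
a = 3 - 2 * b      # intercept = 1.2 (up to float rounding)

# Sum of squared errors for any candidate line
def sse(a_, b_):
    return sum((yi - (a_ + b_ * xi)) ** 2 for xi, yi in zip(x, y))

# (a, b) should beat any slightly perturbed line
assert sse(a, b) < sse(a + 0.01, b)
assert sse(a, b) < sse(a, b + 0.01)
print(a, b)
```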
Estimating the model (the easy way) Calculating the slope (b): b = Sum of Products / Sum of Squares for X = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)2
Sum of Squares for X Sum of Squares for Y Sum of Products
We’ve seen these values before
Regression is strongly related to Correlation
Calculating the Y-intercept (a): a = ȳ - b·x̄ Calculating the error term (e): e = Y - Ŷ Y hat (Ŷ) = predicted value of Y e will be different for every observation. It is a measure of how much we are off in our prediction.
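The intercept and error terms above can be sketched on the worked-example data: compute a from the means, form a predicted value Ŷ for each observation, and take the residual e for each one. A classic property of OLS is that the residuals sum to (essentially) zero.

```python
# Worked-example data from the least-squares slides
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

b = 0.9               # slope found earlier
a = my - b * mx       # Y-intercept: a = y-bar - b * x-bar

y_hat = [a + b * xi for xi in x]            # predicted values (Y hat)
e = [yi - yh for yi, yh in zip(y, y_hat)]   # one residual per observation

print(a)
print(e)
print(sum(e))   # residuals sum to (about) zero
```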