Introduction to Regression Analysis Prepared by: Bhakti Joshi February 03, 2012
Assumption of Linearity
Merriam-Webster defines "linear" as:
“…of, relating to, resembling, or having a graph that is a line and especially a straight line”
“…involving a single dimension”
Linearity Examples
Examples
- Savings and interest rates
- Investment and interest rates
- Rise in prices and demand for food
- Fall in prices and cost of raw material
- Rise in oil prices and costs of transportation
- Rise in oil prices and prices of food
Regression Analysis
… is a tool to investigate relationships between variables.
Dependent Variable
- Variable to be studied
- Variable that is affected
- Variable that is the presumed effect
- Known as the “actual” variable
Independent Variable
- Variable that possibly explains the dependent variable
- Variable that is the presumed cause
- Known as the “explanatory” variable
Error term
- Deviation of the actual value of the dependent variable from the value explained by the independent variable
Regression Equation
Yi = α + β Xi + εi
- Yi: dependent variable
- Xi: independent variable
- εi: error term
- α, β: regression coefficients
This equation is also described as the “Regression of Y on X”.
Regression Coefficients
- ‘α’ is the y-intercept or constant of the regression: the starting point of the regression line.
- ‘β’ is the slope and represents the rate of change in Yi with changes in Xi.
- α and β are also known as the unknown parameters.
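Calculating unknown parameters
Yi = α + β Xi + εi, assuming Σ εi = 0

$$\beta = \frac{\operatorname{Cov}(X, Y)}{s_X^2} \quad\text{or}\quad \beta = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \quad\text{or}\quad \beta = r \cdot \frac{s_Y}{s_X}$$

and

$$\alpha = \bar{Y} - \beta \bar{X}$$

Here X̄ and Ȳ are the sample means, s_X and s_Y the sample standard deviations, and r the correlation coefficient between X and Y.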
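The expression for α can be recovered from the regression equation in a few steps. The sketch below assumes, as the slide does, that the error terms sum to zero; the symbol n (the number of observations) is introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
\sum_{i=1}^{n} Y_i &= n\alpha + \beta \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} \varepsilon_i \\
\bar{Y} &= \alpha + \beta \bar{X}
  && \text{(divide by } n \text{ and use } \textstyle\sum \varepsilon_i = 0\text{)} \\
\alpha &= \bar{Y} - \beta \bar{X}
\end{align*}
```

In other words, the regression line always passes through the point of means (X̄, Ȳ).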
Example 1

Independent Variable (X)    Dependent Variable (Y)
 6                           2
 5                           3
11                           9
 7                           1
 5                           8
 4                           7
 4                           5
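As a rough check of the formulas for the unknown parameters, the following Python sketch fits Example 1. It assumes the data are the seven (X, Y) pairs that appear in the regression-line and error-term slides of this deck; the function name `fit_line` is hypothetical and chosen only for illustration.

```python
# Example 1 data (assumed: the seven pairs used in the later slides).
xs = [6, 5, 11, 7, 5, 4, 4]
ys = [2, 3, 9, 1, 8, 7, 5]


def fit_line(xs, ys):
    """Return (alpha, beta) for the line Y = alpha + beta * X."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    # Numerator: sum of (Xi - X_bar) * (Yi - Y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of (Xi - X_bar)^2
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


alpha, beta = fit_line(xs, ys)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With these data the script prints values of about 3.167 and 0.306, which agree with the 3.16 and 0.305 quoted in the slides if those figures were truncated rather than rounded.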
Interpretation
The y-intercept or constant ‘α’ is the value of the regression line when X equals zero.
- A negative ‘α’ implies that the line crosses the Y-axis below zero (between the 3rd and 4th quadrants).
- A positive ‘α’ implies that the line crosses the Y-axis above zero.
Interpretation of the slope ‘β’ is two-fold:
- It shows the direction of the line. A negative ‘β’ implies a downward-sloping line and a positive ‘β’ implies an upward-sloping line.
- It reflects the rate of change in Y for each unit change in X. If the slope is (positive) 0.305, then for every 1-unit increase in X, Y will increase by 0.305. If the slope is (negative) 0.305, then for every 1-unit increase in X, Y will decrease by 0.305.
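The two readings of the coefficients can be illustrated with a small Python sketch. It assumes the coefficient values quoted in the slides (3.16 and 0.305); the function name `predict` is hypothetical.

```python
ALPHA = 3.16   # intercept quoted in the slides
BETA = 0.305   # slope quoted in the slides


def predict(x, alpha=ALPHA, beta=BETA):
    """Value of the regression line alpha + beta * x."""
    return alpha + beta * x


# The intercept is the value of the line at X = 0.
at_zero = predict(0)

# The slope is the change in Y for a 1-unit increase in X.
change = predict(8) - predict(7)

# With a negative slope, the same 1-unit increase lowers Y.
change_negative = predict(8, beta=-BETA) - predict(7, beta=-BETA)

print(at_zero, round(change, 3), round(change_negative, 3))
```

The choice of X = 7 and X = 8 is arbitrary; any two values one unit apart give the same difference, because the line is straight.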
Estimated Values of Y, or Ŷ
Ŷ = α̂ + β̂ X

 X    Y    Ŷ = α̂ + β̂ X
 6    2    4.99 = 3.16 + 0.305 * 6
 5    3    4.69 = 3.16 + 0.305 * 5
11    9    6.52 = 3.16 + 0.305 * 11
 7    1    5.30 = 3.16 + 0.305 * 7
 5    8    4.69 = 3.16 + 0.305 * 5
 4    7    4.38 = 3.16 + 0.305 * 4
 4    5    4.38 = 3.16 + 0.305 * 4
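The estimated values can be reproduced with a short Python sketch. It assumes the coefficients 3.16 and 0.305 quoted in the slides and the seven X values of Example 1; the variable names are illustrative.

```python
ALPHA, BETA = 3.16, 0.305       # coefficients quoted in the slides
xs = [6, 5, 11, 7, 5, 4, 4]     # X values of Example 1 (assumed)

# Estimated value of Y for each X: Y_hat = alpha + beta * X
fitted = [ALPHA + BETA * x for x in xs]

for x, y_hat in zip(xs, fitted):
    print(f"X = {x:2d}   Y_hat = {y_hat:.3f}")
```

Note that observations with the same X (for example the two rows with X = 4) always receive the same estimated value, whatever their actual Y.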
Regression Line
[Figure: scatter plot of the data points (11,9), (5,8), (4,7), (4,5), (5,3), (6,2) with the fitted line Ŷ = α̂ + β̂ X]
Error Terms
ε̂ = Y − Ŷ = Y − (α̂ + β̂ X)

 X    Y    ε̂ = Y − (α̂ + β̂ X)
 6    2    -2.99 = 2 – 3.16 – 0.305 * 6
 5    3    -1.69 = 3 – 3.16 – 0.305 * 5
11    9     2.49 = 9 – 3.16 – 0.305 * 11
 7    1    -4.30 = 1 – 3.16 – 0.305 * 7
 5    8     3.32 = 8 – 3.16 – 0.305 * 5
 4    7     2.62 = 7 – 3.16 – 0.305 * 4
 4    5     0.62 = 5 – 3.16 – 0.305 * 4
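The error terms can be recomputed in the same way. The Python sketch below assumes the seven (X, Y) pairs of Example 1 and the coefficients 3.16 and 0.305 quoted in the slides; the names `data` and `residuals` are illustrative.

```python
ALPHA, BETA = 3.16, 0.305  # coefficients quoted in the slides
data = [(6, 2), (5, 3), (11, 9), (7, 1), (5, 8), (4, 7), (4, 5)]  # (X, Y), assumed

# Estimated error for each observation: actual Y minus estimated Y.
residuals = [y - (ALPHA + BETA * x) for x, y in data]

for (x, y), e in zip(data, residuals):
    print(f"X = {x:2d}   Y = {y}   error = {e:+.3f}")

print(f"sum of errors = {sum(residuals):.2f}")
```

With the truncated coefficients the errors sum to about 0.07 rather than exactly zero; the small difference comes from the truncation of α and β.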
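Interpretation of Errors
Σ ε̂ ≠ Σ ε in general, but Σ ε̂ should be close to or equal to Σ ε, which is assumed to be 0.
If Σ ε̂ = 0, the estimated line is consistent with that assumption. Because α̂ and β̂ calculated from the formulas above give Σ ε̂ = 0 apart from rounding, how well X explains Y, and whether the data is a good fit, depends on how small the individual errors ε̂ are.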
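To see how the sum of the estimated errors behaves, the Python sketch below refits Example 1 with unrounded coefficients and reports both the sum of the errors and the sum of their squares. It assumes the seven (X, Y) pairs of Example 1; the sum of squared errors is an additional summary not shown in the slides and is included here only as one possible measure of the size of the errors.

```python
data = [(6, 2), (5, 3), (11, 9), (7, 1), (5, 8), (4, 7), (4, 5)]  # (X, Y), assumed

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Unrounded slope and intercept from the formulas for the unknown parameters.
beta = (sum((x - x_bar) * (y - y_bar) for x, y in data)
        / sum((x - x_bar) ** 2 for x, _ in data))
alpha = y_bar - beta * x_bar

residuals = [y - (alpha + beta * x) for x, y in data]

residual_sum = sum(residuals)                 # zero up to floating-point rounding
squared_sum = sum(e ** 2 for e in residuals)  # size of the individual errors

print(f"sum of errors         = {residual_sum:.6f}")
print(f"sum of squared errors = {squared_sum:.3f}")
```

For these data the errors sum to zero while the squared errors sum to roughly 54.6, so the two figures answer different questions.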
Example 2
Independent Value (X) Dependent Value (Y) 12 14 9 8 6 10 11 13 7 3
β = 1.64
α = -7.43
Email: bhaktij@gmail.com Website: www.headscratchingnotes.net