Linear Regression: A method of calculating a linear equation for the relationship between two or more variables using multiple data points.
Linear Regression General form: Y = a + bX + e, where “Y” is the dependent variable, “X” is the explanatory variable, “a” is the intercept parameter, “b” is the slope parameter, and “e” is an error term or residual.
Regression Results: Y and X come from data. A computer program calculates estimates of a and b. e is the difference between the actual value of Y corresponding to X and a + bX, that is, e = Y - (a + bX). OLS (Ordinary Least Squares) estimates of a and b minimize the sum of the squared residuals, ∑e^2.
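As a minimal sketch of what the computer program does, the closed-form OLS estimates for a single explanatory variable can be computed directly. The function name `ols_fit` and the four data points are illustrative assumptions, not values from the electricity data.

```python
def ols_fit(x, y):
    """Return (a, b) that minimize the sum of squared residuals of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]

a, b = ols_fit(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # e = Y - (a + bX)
ssr = sum(e ** 2 for e in residuals)                     # sum of squared residuals
print(a, b, ssr)
```

With an intercept in the equation, the OLS residuals sum to zero, which is a quick check that the fit was computed correctly.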
The Regression Line and the Residual
Electricity Demand Example: Data for residential customers in U.S. states. Y is millions of kilowatt-hours sold. X is population. Other data include per capita income, price of electricity (cents/kWh), and price of natural gas.
Variable Means 2004
Variable Means 2012
Excel Regression: Milkwh
Actual vs. Predicted Milkwh
Useful Statistics and Tests: t-statistic (is the estimated coefficient significantly different from zero?); coefficient of determination or R-square (% of variation in Y explained); F-statistic (statistical significance of the entire regression equation, or equivalently: is the R-square significantly different from zero?). Find these on the Excel regression output.
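The three statistics can be sketched for a one-variable regression as follows. This is an illustration under stated assumptions rather than a reproduction of Excel's output: the function name `simple_regression_stats` and the data are made up, and P-values are omitted because they require the t and F distributions.

```python
import math


def simple_regression_stats(x, y):
    """R-square, slope t-statistic, and F-statistic for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar

    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
    r_square = 1.0 - ssr / sst

    # Standard error of the slope uses n - 2 degrees of freedom.
    se_b = math.sqrt((ssr / (n - 2)) / s_xx)
    t_slope = b / se_b

    # F-statistic with 1 explanatory variable and n - 2 residual degrees of freedom.
    f_stat = (r_square / 1) / ((1.0 - r_square) / (n - 2))
    return {"r_square": r_square, "t_slope": t_slope, "f_stat": f_stat}


# Made-up data for illustration only.
stats = simple_regression_stats([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 10.0])
print(stats)
```

With a single explanatory variable, the F-statistic equals the square of the slope's t-statistic, so the two tests agree; with several explanatory variables they answer different questions.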
Confidence and Significance Levels: 99% Confidence = 1% Significance (P-value of 0.01 or less); 95% Confidence = 5% Significance (P-value of 0.05 or less); 90% Confidence = 10% Significance (P-value of 0.10 or less). Smaller significance levels are better. Find P-values for t and F statistics on the Excel regression output.
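The mapping from a P-value to a significance level can be written as a small helper. The function name `significance_label` is hypothetical; the thresholds are the ones listed on the slide.

```python
def significance_label(p_value):
    """Classify a P-value using the 1%, 5%, and 10% thresholds."""
    if p_value <= 0.01:
        return "significant at the 1% level (99% confidence)"
    if p_value <= 0.05:
        return "significant at the 5% level (95% confidence)"
    if p_value <= 0.10:
        return "significant at the 10% level (90% confidence)"
    return "not significant at the 10% level"


# Example P-values as they might be read off a regression output (made up).
for p in (0.004, 0.03, 0.08, 0.25):
    print(p, "->", significance_label(p))
```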
Multiple & Nonlinear Regression: Multiple regression is Y = a + bX + cW + dZ. Nonlinear regression includes the quadratic form, Y = a + bX + cX^2, and the log-linear form, Y = aX^b Z^c, or equivalently ln Y = (ln a) + b(ln X) + c(ln Z).
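The log-linear form becomes an ordinary linear regression once logs are taken. The sketch below shows this for the single-variable special case Y = aX^b, an assumption made to keep the example short; the helper `ols_fit`, the parameter values, and the data are all made up for illustration.

```python
import math


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Generate exact data from Y = a * X^b with assumed a = 2 and b = 1.5.
a_true, b_true = 2.0, 1.5
X = [1.0, 2.0, 4.0, 8.0]
Y = [a_true * xi ** b_true for xi in X]

# Regress ln Y on ln X: the intercept estimates ln a and the slope estimates b.
ln_a_hat, b_hat = ols_fit([math.log(xi) for xi in X], [math.log(yi) for yi in Y])
a_hat = math.exp(ln_a_hat)
print(a_hat, b_hat)
```

Because the data contain no error term, the regression recovers a and b up to floating-point rounding. In this form the slope b is the elasticity of Y with respect to X.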
Multiple Regression: Milkwh
Quadratic Regression: Milkwh
Log Linear Regression: LnMilkwh
Demand Regression Project: DATA: ElecDemandData2012.xls under Project Materials on D2L. 1. Using the data file above, run a linear regression with Dependent Variable: Milkwh and Explanatory Variables: Pop, Pkwh, PGas, Income. Which coefficients (including the constant) are statistically significant at the 10% level or better? Which are not significant? How much of the variation in the dependent variable is explained by the estimated equation? Is the equation as a whole statistically significant? At what level?
Finding the Marginal Revenue Equation: Overview. Evaluate estimated demand at the means of all explanatory variables except price. Calculate the average effect of the non-price variables to get the demand equation in this form: Q = A - b(P). Rearrange to find the Inverse Demand equation: P = (A/b) - (1/b)Q. MR has twice the slope of inverse demand: MR = (A/b) - (2/b)Q. The end result is an equation, not a number.
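The reason MR has twice the slope follows from differentiating total revenue. The symbol R for total revenue is introduced here for the derivation; A, b, P, and Q are as defined on the slide.

```latex
\begin{align*}
P  &= \frac{A}{b} - \frac{1}{b}\,Q \\
R  &= P \cdot Q = \frac{A}{b}\,Q - \frac{1}{b}\,Q^{2} \\
MR &= \frac{dR}{dQ} = \frac{A}{b} - \frac{2}{b}\,Q
\end{align*}
```

The intercept A/b is unchanged, and the slope doubles from -1/b to -2/b.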
Finding the Marginal Revenue Equation: Example. Write your regression equation in this form: Milkwh = 11,000 - 3600 Pkwh + […] Pop + 2150 PGas. 11,000 is the intercept or constant coefficient; -3600, […], and 2150 are estimated coefficients. These are made-up numbers for this example. Use the mean values of the non-electricity-price variables: Pop = 5,756,577 and PGas = 11.4. Substitute into your regression equation and simplify: Milkwh = 11,000 - 3600 Pkwh + […](5,756,577) + 2150(11.4); Milkwh = [11,000 + […] + 24,510] - 3600(Pkwh); Milkwh = 59,[…] - 3600(Pkwh). This is Q = A - bP from the previous slide.
Finding Marginal Revenue Example, Continued. From the end of the previous slide: Milkwh = 59,[…] - 3600(Pkwh). Rearrange to find Inverse Demand: P = (A/b) - (1/b)Q = (A - Q)/b, so Pkwh = (59,[…] - Milkwh)/(3600), or Pkwh = (59,[…]/3600) - (1/3600)(Milkwh). This is the inverse demand equation. Marginal Revenue has twice the slope: MR = (59,[…]/3600) - (2/3600)(Milkwh).
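The steps on the last three slides can be sketched as one function. The function name `demand_to_mr`, the variable names, and the small numbers below are hypothetical and chosen for easy arithmetic; they are not the values from the slides or from the project data.

```python
def demand_to_mr(intercept, price_coef, other_coefs, other_means):
    """Collapse a linear demand regression to Q = A - b*P, then derive MR.

    other_coefs and other_means are dicts keyed by non-price variable name.
    """
    # A: intercept plus each non-price coefficient times that variable's mean.
    A = intercept + sum(other_coefs[name] * other_means[name] for name in other_coefs)
    # Demand is Q = A - b*P, so b is the negated (negative) price coefficient.
    b = -price_coef
    inverse_intercept = A / b          # P = (A/b) - (1/b)Q
    inverse_slope = -1.0 / b
    mr_slope = 2.0 * inverse_slope     # MR = (A/b) - (2/b)Q
    return {
        "A": A,
        "b": b,
        "inverse_intercept": inverse_intercept,
        "inverse_slope": inverse_slope,
        "mr_intercept": inverse_intercept,
        "mr_slope": mr_slope,
    }


# Hypothetical regression: Q = 100 - 4*P + 2*Income, with mean Income = 10.
result = demand_to_mr(100.0, -4.0, {"Income": 2.0}, {"Income": 10.0})
print(result)
```

For these toy numbers the demand equation is Q = 120 - 4P, inverse demand is P = 30 - 0.25Q, and marginal revenue is MR = 30 - 0.5Q.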