Demand Estimation and Forecasting
FIN 30210: Managerial Economics
There are three ways we can attempt to forecast consumer demand:
- A cross-sectional analysis looks at variation across multiple locations at a single point in time.
- A time series analysis looks at variation at a single location across time.
- A panel analysis looks at variation across both locations and time.
Example: Sherwin-Williams has added an 11th sales region. They are contemplating a price of $16/Gal. in this new region. Can we forecast sales at this price?

Region   Sales (x1,000 Gal)   Price
1        160                  15
2        220                  13.5
3        140                  16.5
4        190                  14.5
5        130                  17
6        …                    16
7        200                  13
8        150                  18
9        210                  12
10       …                    15.5
SUMMARY OUTPUT (linear model)

Regression Statistics
Multiple R          0.87
R Square            0.75
Adjusted R Square   0.72
Standard Error      16.43
Observations        10

ANOVA
            df    SS        MS
Regression  1     6489.81   6489.81
Residual    8     2160.19   270.02
Total       9     8650.00

            Coefficients   Std Error   t Stat
Intercept   390.38         44.24       8.82
Price       -14.26         2.91        -4.90

Every $1 increase in price lowers sales of paint by 14,260 gallons.
At the contemplated price of $16, the estimated demand curve forecasts sales of 390.38 - 14.26(16) ≈ 162.2 thousand gallons. The implied price elasticity of demand at that point is -14.26 × (16/162.2) ≈ -1.41.
Based on the available data, we can solve for the revenue-maximizing price. With linear demand Q = 390.38 - 14.26P, revenue PQ is maximized where marginal revenue is zero (equivalently, where elasticity equals -1): P* = 390.38/(2 × 14.26) ≈ $13.69, Q* ≈ 195.2 thousand gallons, for revenue of about $2,672.29 thousand.
Issue #1: Model Specification

Which functional form should we use?

- Linear model: Q = a + b·P. A $1 increase in price causes a b-unit change in quantity.
- Semi-log model: ln(Q) = a + b·P. A $1 increase in price causes a (100 × b)-percent change in quantity.
- Log-linear model: ln(Q) = a + b·ln(P). A 1 percent increase in price causes a b-percent change in quantity.
Semi-Log Model

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.86
R Square            0.74
Adjusted R Square   0.70
Standard Error      0.10
Observations        10

ANOVA
            df    SS     MS
Regression  1     0.22   0.22
Residual    8     0.08   0.01
Total       9     0.29

            Coefficients   Std Error   t Stat
Intercept   6.39           0.26        24.20
Price       -0.08          0.02        -4.74

Every $1 increase in price lowers sales by about 8%.
At $16, the semi-log model forecasts ln(Q) = 6.39 - 0.08(16) = 5.11, so Q = e^5.11 ≈ 165.7 thousand gallons (±10.6%). The implied price elasticity of demand at $16 is b·P = -0.08 × 16 ≈ -1.28.
Log-Linear Model

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.85
R Square            0.73
Adjusted R Square   0.70
Standard Error      0.10
Observations        10

ANOVA
            df    SS     MS
Regression  1     0.21   0.21
Residual    8     0.08   0.01
Total       9     0.29

            Coefficients   Std Error   t Stat
Intercept   8.43           0.71        11.93
ln(price)   -1.21          0.26        -4.64

A 1% increase in price lowers sales by about 1.21%.
At $16, the log-linear model forecasts ln(Q) = 8.43 - 1.21·ln(16) ≈ 5.08, so Q ≈ 159.42 thousand gallons (±10.6%), using unrounded coefficients. In this specification the price elasticity of demand is the coefficient itself: -1.21 at every price.
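Plugging the proposed $16 price into each estimated specification (coefficients as reported in the regression tables above) shows how close the three forecasts are:

```python
import math

P = 16  # proposed price in the new region

# Coefficients as reported in the three regression tables
linear = 390.38 - 14.26 * P                     # Q = a + b*P
semilog = math.exp(6.39 - 0.08 * P)             # ln(Q) = a + b*P
loglin = math.exp(8.43 - 1.21 * math.log(P))    # ln(Q) = a + b*ln(P)

# Rounding of the reported log-linear coefficients makes that forecast
# differ slightly from the slide's 159.42.
print(f"linear: {linear:.1f}, semi-log: {semilog:.1f}, log-linear: {loglin:.1f}")
```

All three land in the same neighborhood (roughly 159 to 166 thousand gallons), but they imply different elasticities, which matters for the pricing decision.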
Issue #2: Model Specification Error

Suppose that I had some additional data:

Region   Sales (x1,000 Gal)   Price   Income (x $1,000)
1        160                  15      19
2        220                  13.5    17.5
3        140                  16.5    14
4        190                  14.5    21
5        130                  17      15.5
6        …                    16      …
7        200                  13      21.5
8        150                  18      …
9        210                  12      18.5
10       …                    15.5    20

Income is negatively correlated with price (when price is high, income tends to be low). Income is positively correlated with sales (when income is high, sales tend to be high). We don't have income in our regression... this creates a problem!
Issue #2: Model Specification Error

My regression: Q = a + b·P + e
"True" regression: Q = a + b·P + c·Income + e

When price is high, quantity is lower, and income is also lower (which further lowers quantity). My estimated price coefficient is biased because it captures the effect of income as well as that of price: income raises sales (c > 0) but moves opposite to price, so the omitted income effect gets loaded onto the price coefficient. My estimated coefficient is "too negative."
Let's see if we can fix this...

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.89
R Square            0.79
Adjusted R Square   0.73
Standard Error      16.13
Observations        10

ANOVA
            df    SS        MS
Regression  2     6829.43   3414.72
Residual    7     1820.57   260.08
Total       9     8650.00

            Coefficients   Std Error   t Stat
Intercept   311.61         81.46       3.83
Price       -12.31         3.33        -3.70
Income      2.75           2.40        1.14
At a price of $16 and assuming average income (I'm cheating a little here!), the expanded model forecasts sales of about 163.9 thousand gallons. The implied price elasticity at that point is -12.31 × (16/163.9) ≈ -1.20, and the income elasticity can be evaluated the same way from the income coefficient.
Issue #2: Model Specification Error

Comparing the regression without income to the "true" regression: when price is high, quantity is lower, and income is lower too (which also lowers quantity). Leaving income out loaded both effects onto price, which is why the one-variable regression produced a price elasticity of -1.41, more elastic than the model with income implies.
Issue #3: Multicollinearity

By solving one problem, I have created another: because price and income are so highly correlated, it's difficult to determine the separate influences of price and income. My coefficients are no longer biased, but the standard errors are biased upwards (note income's t-statistic of only 1.14).

Possible solution: find a variable X to serve as a proxy for income. We want our proxy to be highly correlated with income, but uncorrelated with price.
Issue #4: Simultaneity and Identification

Many prices are determined in markets by the interaction of supply and demand. This creates a potential problem. Suppose the demand equation's error term increases: quantity demanded rises, and in equilibrium quantity supplied rises to match. But an increase in quantity supplied is caused by a rise in price. So price and the error term are correlated. Uh oh!
Issue #4: Simultaneity and Identification

The same equilibrium observations could have been generated by a high elasticity of demand with demand shifts, or by a low elasticity of demand with no demand shifts. I can't distinguish between these two cases!
Issue #5: Autocorrelation

Sometimes we use time series to estimate empirical relationships, in which case we can run into autocorrelation.

Example: macroeconomic trends/cycles. During economic expansions, many variables are above trend for several quarters, so positive error terms tend to be followed by positive error terms.

Example: undershooting/overshooting. If a business buys too much product this month, it purposefully under-buys the following month, so positive error terms tend to be followed by negative error terms.
Issue #5: Autocorrelation

Example: e-commerce retail sales (millions of dollars) against nominal GDP (billions of dollars). The scatter looks like a nonlinear relationship, so I will use logs.
Issue #5: Autocorrelation

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.99
R Square            0.99
Adjusted R Square   0.99
Standard Error      0.08
Observations        66

ANOVA
            df    SS      MS
Regression  1     40.77   40.77
Residual    64    0.44    0.01
Total       65    41.21

            Coefficients   Std Error   t Stat
Intercept   -31.92         0.55        -58.31
LN(GDP)     4.42           0.06        77.07

A 1% rise in GDP raises retail e-commerce sales by about 4.4%.
Issue #5: Autocorrelation

Plotting actual vs. predicted log sales, the residuals are positive for a long time followed by being negative for a long time: positive autocorrelation!
Issue #5: Autocorrelation

There is a test for autocorrelation called the Durbin-Watson statistic. The Durbin-Watson statistic always lies between 0 and 4: values near 2 indicate no autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation. The DW statistic in my e-commerce example confirms it: we definitely have autocorrelation. With autocorrelation, my coefficient estimates are still unbiased, but the standard errors are wrong, so the t-statistics will be off.
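The statistic can be computed directly from the residuals. A small sketch (the residual series below are made up to illustrate the two patterns):

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals.
    Near 2: no autocorrelation; near 0: positive; near 4: negative."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that stay positive, then negative, for long stretches
# (like the e-commerce regression) give a DW far below 2:
persistent = [1, 2, 2, 3, 2, 1, -1, -2, -3, -2, -2, -1]
print(round(durbin_watson(persistent), 2))

# Residuals that alternate in sign give a DW well above 2:
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
print(round(durbin_watson(alternating), 2))
```

When successive residuals are similar, the numerator's differences are small, which is why persistence pushes DW toward 0.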
Issue #5: Autocorrelation

One possible solution is to rerun the regression using differences.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.44
R Square            0.19
Adjusted R Square   0.18
Standard Error      0.04
Observations        65

ANOVA
              df    SS     MS
Regression    1     0.02   0.02
Residual      63    0.09   0.00
Total         64    0.11

              Coefficients   Std Error   t Stat
Intercept     0.01           …           3.07
X Variable 1  2.49           0.65        3.85
Issue #5: Autocorrelation

The residuals from the differenced regression: we didn't get rid of all the autocorrelation, but things look a lot better!
Issue #6: Heteroscedasticity

Heteroscedasticity refers to situations where the magnitude of the error term is related in some way to one or more of the independent variables. This also screws up your standard errors. In the normal (homoscedastic) case, the size of the errors is independent of x; with heteroscedasticity present, the size of the errors increases with x.
Issue #6: Heteroscedasticity

Example: gross domestic product and total savings (billions of dollars). This looks pretty linear to me.
Issue #6: Heteroscedasticity

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.991
R Square            0.983
Adjusted R Square   0.982
Standard Error      145.826
Observations        277

ANOVA
            df    SS              MS
Regression  1     328999299.063   328999299.063
Residual    275   5847915.277     21265.146
Total       276   334847214.340

            Coefficients   Std Error   t Stat
Intercept   -1.845         12.321      -0.150
GDP         0.198          0.002       124.384

Every $1 increase in GDP raises savings by about 20 cents.
Issue #6: Heteroscedasticity

Plotting the residuals against GDP: the errors grow as GDP grows. (Note: there is autocorrelation here too!)
Issue #6: Heteroscedasticity

One way to deal with this is to take logs; that alters the relationship between the variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R          1.00
R Square            0.995
Adjusted R Square   …
Standard Error      0.10
Observations        277

ANOVA
            df    SS       MS       F
Regression  1     523.24   523.24   55155.78
Residual    275   2.61     0.01
Total       276   525.85

            Coefficients   Std Error   t Stat   P-value
Intercept   -1.65          0.03        -48.27   0.00
LN(GDP)     …              …           234.85   …

Every 1% increase in GDP raises savings by about 1%.
Issue #6: Heteroscedasticity

Plotting LN(Savings) against LN(GDP): the spread of the errors is now much more even. (Note: there is autocorrelation here too!)
Issue #6: Heteroscedasticity

Another way is to divide everything by the independent variable that is causing the problem; the dependent variable becomes the savings rate.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.06
R Square            0.00
Adjusted R Square   …
Standard Error      0.02
Observations        277

ANOVA
            df    SS     MS   F
Regression  1     …      …    0.95
Residual    275   0.10   …
Total       276   …

            Coefficients   Std Error   t Stat   P-value
Intercept   0.20           …           87.69    …
Time        …              …           0.98     0.33

The savings rate has averaged 20%.
Issue #6: Heteroscedasticity

Plotting the savings rate over time: dividing through by GDP removes the growing error variance. (Note: there is autocorrelation here too!)
Forecasting Using Time Series

Consider the following time series; it has several different components:
- Trend (long term: many years)
- Seasonal cycle (months)
- Business cycle (1-5 years)
- Noise (days/weeks)

We need to be able to decompose the data into these various components.
Issue #1: Dealing with the Trend

What type of trend best fits the data? We have many choices:
- Linear trend: quantity increases at a constant rate. Estimated as Q = a + b·t.
- Exponential trend: quantity increases at a constant percentage rate. Estimated as ln(Q) = a + b·t.
- Diffusion trend: quantity increases at a decreasing percentage rate (an S-shaped path).
Example: Retail and Food Service Sales, Monthly (millions of dollars, seasonally adjusted). Now, how about a linear trend: Sales = a + b·t, where t = months away from Jan. 1992.
SUMMARY OUTPUT (linear trend)

Regression Statistics
Multiple R          0.99
R Square            0.98
Adjusted R Square   0.98
Standard Error      12590.39
Observations        294

ANOVA
            df    SS                 MS
Regression  1     2000751437979.34   2000751437979.34
Residual    292   46287213152.80     158517853.26
Total       293   2047038651132.14

            Coefficients   Std Error   t Stat
Intercept   167114.77      1464.83     114.08
Time        972.01         8.65        112.35

Sales increase by $972M per month.
Now, how about an exponential trend: ln(Sales) = a + b·t, where t = months away from Jan. 1992.
SUMMARY OUTPUT (exponential trend)

Regression Statistics
Multiple R          0.98
R Square            0.96
Adjusted R Square   0.96
Standard Error      0.06
Observations        294

ANOVA
            df    SS      MS
Regression  1     23.35   23.35
Residual    292   1.02    0.00
Total       293   24.37

            Coefficients   Std Error   t Stat
Intercept   12.12          0.01        1762.62
Time        0.0033         …           81.78

Sales increase by 0.33% per month (3.96% annualized).
Forecasting with the Trend

Extending each fitted trend out of sample, from the estimation period (Jan 1992 through June 2016, t = 293) to June 2017 (t = 305): the linear trend forecasts June 2017 sales of about 463,576, while the exponential trend forecasts about 503,603.
In both cases, if I plot the residuals, what I should have is the business cycle component plus the noise: the deviations from trend (in dollars for the linear model, in percent for the exponential model) dip during each recession.
Issue #2: Dealing with Seasonality

The easiest way to deal with seasonality is to include dummy variables for quarters.
Example: New One Family Home Sales in the United States (in thousands, monthly: 1963-2016). Regress sales on time (months from Jan 1963) and dummies for Q1, Q2, and Q3.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.30
R Square            0.09
Adjusted R Square   0.08
Standard Error      18.53
Observations        642

ANOVA
            df    SS          MS
Regression  4     22202.96    5550.74
Residual    637   218617.78   343.20
Total       641   240820.74

            Coefficients   Std Error   t Stat
Intercept   41.19          1.94        21.18
Time        0.02           0.00        4.26
D1          8.09           2.07        3.91
D2          13.95          2.07        6.75
D3          8.88           2.08        4.27

The trend says sales increase by about 20 homes per month.

Quarter   Home sales relative to the 4th quarter
1         +8,090
2         +13,950
3         +8,880
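The dummy-variable regression needs multiple regressors, which the standard library's one-variable `linear_regression` can't handle, so this sketch solves the normal equations directly. The data are synthetic, constructed noise-free from the slide's coefficients, so the regression recovers them exactly:

```python
# Multiple regression in pure Python: solve (X'X)b = X'y by
# Gaussian elimination with partial pivoting.

def ols(rows, y):
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):                       # forward elimination
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
            xty[r] -= f * xty[col]
    b = [0.0] * k                              # back substitution
    for r in range(k - 1, -1, -1):
        b[r] = (xty[r] - sum(xtx[r][j] * b[j] for j in range(r + 1, k))) / xtx[r][r]
    return b

# Monthly data: intercept, time trend, and dummies for quarters 1-3.
X, y = [], []
for t in range(648):                           # 54 years of months
    quarter = (t % 12) // 3 + 1
    d1, d2, d3 = (1 if quarter == q else 0 for q in (1, 2, 3))
    X.append([1, t, d1, d2, d3])
    y.append(41.19 + 0.02 * t + 8.09 * d1 + 13.95 * d2 + 8.88 * d3)

print([round(c, 2) for c in ols(X, y)])  # → [41.19, 0.02, 8.09, 13.95, 8.88]
```

Each quarterly dummy coefficient is the average sales difference relative to the omitted fourth quarter, holding the trend fixed.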
Example: New One Family Home Sales in the United States (in thousands, monthly: 1963-2016). The chart compares the fitted series with the quarterly dummies (seasonalized) to the fitted trend alone (unseasonalized).
Extending the model out of sample, from the estimation period (Jan 1963 through June 2016, t = 642) to June 2017 (t = 654), yields a forecast of about 66.13 thousand homes. I'm cheating a little here!
Issue #3: Dealing with Noise

Option #1: Moving averages. A moving average of length N forecasts the next observation as the average of the previous N:

F(t+1) = [y(t) + y(t-1) + ... + y(t-N+1)] / N
Example: 30 Year Mortgage Rate (percent, monthly: 1971-2016). There is no seasonality here, and there is no trend (at least, there shouldn't be).
A moving average simply takes the average of the previous N observations.

Date             Rate    MA(4)   MA(8)
January 1981     14.90   ---     ---
February 1981    15.13
March 1981       15.40
April 1981       15.58
May 1981         16.40   15.25
June 1981        16.70   15.63
July 1981        16.83   16.02
August 1981      17.29   16.38
September 1981   18.16   16.81   16.03
October 1981     18.45   17.25   16.44
November 1981    17.83   17.68   16.85
December 1981    16.92   17.93   17.16
January 1982     17.40   17.84   17.32
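The MA(N) rule can be sketched in a few lines; the rates below are the 1981-82 values from the table:

```python
def moving_average_forecasts(series, n):
    """Forecast each period as the average of the previous n observations."""
    return [sum(series[i - n:i]) / n for i in range(n, len(series) + 1)]

# Mortgage rates, Jan 1981 - Jan 1982, from the table above
rates = [14.90, 15.13, 15.40, 15.58, 16.40, 16.70,
         16.83, 17.29, 18.16, 18.45, 17.83, 16.92, 17.40]

ma4 = moving_average_forecasts(rates, 4)
print([round(f, 2) for f in ma4])
# first forecast (for May 1981): (14.90+15.13+15.40+15.58)/4 ≈ 15.25
```

The first N periods have no forecast (there aren't N prior observations yet), which is why the table's MA(4) column starts in May and the MA(8) column starts in September.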
Example: 30 Year Mortgage Rate (percent, monthly: 1971-2016). Moving averages "smooth out" the noise.
So, how do we choose N? Compare the forecast errors:

Date             Rate    MA(4)   Error   Squared Error
January 1981     14.90   ---
February 1981    15.13
March 1981       15.40
April 1981       15.58
May 1981         16.40   15.25   -1.15   1.317
June 1981        16.70   15.63   -1.07   1.150
July 1981        16.83   16.02   -0.81   0.656
August 1981      17.29   16.38   -0.91   0.833
September 1981   18.16   16.81   -1.36   1.836
October 1981     18.45   17.25   -1.21   1.452
November 1981    17.83   17.68   -0.15   0.022
December 1981    16.92   17.93   1.01    1.025
January 1982     17.40   17.84   0.44    0.194
Total                                    8.484

Then summarize them with the root mean squared error.
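The same series lets us compute the RMSE for any N and confirm the table's total of 8.484 squared errors over nine forecasts:

```python
import math

def rmse_of_ma(series, n):
    """Root mean squared one-step forecast error of an MA(n) rule."""
    errors = [sum(series[i - n:i]) / n - series[i] for i in range(n, len(series))]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

rates = [14.90, 15.13, 15.40, 15.58, 16.40, 16.70,
         16.83, 17.29, 18.16, 18.45, 17.83, 16.92, 17.40]

print(round(rmse_of_ma(rates, 4), 3))  # matches the table: sqrt(8.484/9) ≈ 0.971
```

Running this for several values of n on the full sample is exactly how the comparison table on the next slide is built.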
So, how do we choose N? Pick the N with the lowest root mean squared error. For the whole sample, the MA(4) is the best:

        MA4    MA8    MA12   MA24
RMSE    .467   .620   .751   1.02
Exponential Smoothing

One criticism of moving average models is that they give an equal weight to all observations used for the forecast. An alternative would be exponential smoothing:

F(t+1) = w·y(t) + (1 - w)·F(t)

where w is the "smoothing parameter" and F(t) is the prior forecast. Substituting in our prior forecasts repeatedly, the forecast becomes a weighted average of all past observations, with weights w, w(1-w), w(1-w)^2, ... that decline geometrically.
Another way to write this is

F(t+1) = F(t) + w·[y(t) - F(t)]

the prior forecast plus a fraction w of the prior forecast error.
Example: 30 Year Mortgage Rate. Use the first observation as your initial forecast, then apply the recursion for each value of w.
Date             Rate    w=.2    w=.4    w=.6
January 1981     14.90   ---     ---     ---
February 1981    15.13   14.90   14.90   14.90
March 1981       15.40   14.95   14.99   15.04
April 1981       15.58   15.04   15.16   15.26
May 1981         16.40   15.15   15.33   15.45
June 1981        16.70   15.40   15.76   16.02
July 1981        16.83   15.66   16.13   16.43
August 1981      17.29   15.89   16.41   16.67
September 1981   18.16   16.17   16.76   17.04
October 1981     18.45   16.57   17.32   17.71
November 1981    17.83   16.95   17.77   18.15
December 1981    16.92   17.12   17.80   17.96
January 1982     17.40   17.08   17.45   17.34
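The recursion, seeded with the first observation as the initial forecast, reproduces the columns of the table above:

```python
def exponential_smoothing(series, w):
    """F(t+1) = F(t) + w*(y(t) - F(t)), seeded with the first observation."""
    forecasts = [series[0]]          # forecast for period 2 is the first value
    for y in series[1:]:
        forecasts.append(forecasts[-1] + w * (y - forecasts[-1]))
    return forecasts

rates = [14.90, 15.13, 15.40, 15.58, 16.40, 16.70,
         16.83, 17.29, 18.16, 18.45, 17.83, 16.92, 17.40]

for w in (0.2, 0.4, 0.6):
    print(w, [round(f, 2) for f in exponential_smoothing(rates, w)])
```

A larger w puts more weight on the most recent observation, so the w = .6 column reacts to the 1981 rate spike much faster than the w = .2 column.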
Here's the whole series: exponential smoothing also gives us a "smoothed out" series.
So, how do we choose w? Again, use the root mean squared error. For the whole sample, w = .6 is the best:

        w=.1   w=.2   w=.4   w=.6
RMSE    .81    .58    .41    .35