Department of Business Administration FALL 2016-17 Demand Estimation by Prof. Dr. Sami Fethi
Demand Estimation The demand for a commodity arises from the consumers' willingness and ability to purchase the commodity. Consumer demand theory postulates that the quantity demanded of a commodity is a function of, or depends on, the price of the commodity, the consumers' income, the prices of related commodities, and the tastes of the consumer: Qdx = f(P, In;Ii, Pc;Ps, T), with expected signs (-, +/-, -/+, +). To use this important demand relationship in decision analysis, we need to estimate empirically the structural form and parameters of the demand function; this is demand estimation.
Demand Estimation In general, we will seek the answers to the following questions: How much will the revenue of the firm change after increasing the price of the commodity? How much will the quantity demanded of the commodity increase if consumers' income increases? What if the firm doubles its advertising expenditure? What if the competitors lower their prices? Firms should know the answers to these questions if they want to achieve the objective of maximizing their value.
The Identification Problem The demand curve for a commodity is generally estimated from market data on the quantity purchased of the commodity at various prices over time (i.e., time-series data) or by various consuming units at one point in time (i.e., cross-sectional data). Simply joining price-quantity observations on a graph does not generate the demand curve for a commodity. The reason is that each price-quantity observation is given by the intersection of a different and unobserved demand and supply curve for the commodity. In other words, the difficulty of deriving the demand curve for a commodity from observed price-quantity points that result from the intersection of different and unobserved demand and supply curves for the commodity is referred to as the identification problem.
The Identification Problem In the following figure, the observed price-quantity data points E1, E2, E3, and E4 result, respectively, from the intersection of the unobserved demand and supply curves D1 and S1, D2 and S2, D3 and S3, and D4 and S4. Therefore, the dashed line connecting the observed points E1, E2, E3, and E4 is not the demand curve for the commodity. To derive a demand curve for the commodity, say D2, we allow the supply curve to shift, or to be different, and correct, through regression analysis, for the forces that cause demand curve D2 to shift or to be different, as can be seen at points E2 and E'2.
Demand Estimation: Marketing Research Approaches Consumer surveys, observational research, consumer clinics, and market experiments. These approaches are usually covered extensively in marketing courses; however, the most important of these are consumer surveys and market experiments.
Demand Estimation: Marketing Research Approaches Consumer surveys: these surveys require questioning a firm's customers in an attempt to estimate the relationship between the demand for its products and a variety of variables perceived to be important for the marketing and profit-planning functions. These surveys can be conducted by simply stopping and questioning people at a shopping centre or by administering sophisticated questionnaires to a carefully constructed representative sample of consumers by trained interviewers.
Demand Estimation: Marketing Research Approaches Major advantages: they may provide the only information available; they can be made as simple as possible; the researcher can ask exactly the questions they want. Major disadvantages: consumers may be unable or unwilling to provide reliable answers; careful and extensive surveys can be very expensive.
Demand Estimation: Marketing Research Approaches Market experiments: attempts by the firm to estimate the demand for the commodity by changing the price and other determinants of demand in the actual marketplace.
Demand Estimation: Marketing Research Approaches Major advantages: consumers are in a real market situation; they do not know that they are being observed; the experiments can be conducted on a large scale to ensure the validity of the results. Major disadvantages: in order to keep costs down, the experiment may be too limited, so the outcome can be questionable; competitors could try to sabotage the experiment by changing prices and other determinants of demand under their control; competitors can monitor the experiment to gain very useful information that the firm would prefer not to disclose.
Purpose of Regression Analysis Regression analysis is used primarily to model causality and provide prediction: to predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable, and to explain the effect of the independent variables on the dependent variable. The relationship between X and Y can be shown on a scatter diagram.
Scatter Diagram It is a two-dimensional graph of plotted points in which the vertical axis represents values of the dependent variable and the horizontal axis represents values of the independent or explanatory variable. The pattern of the plotted points can graphically show the relationship between the variables. A scatter diagram is often used to support or cast doubt on a cause-and-effect relationship. The following example shows the relationship between a firm's advertising expenditure and its sales revenue.
Scatter Diagram-Example
Scatter Diagram The scatter diagram shows a positive relationship between the relevant variables, and the relationship is approximately linear. This gives us a rough estimate of the linear relationship between the variables in the form of an equation such as Y = a + bX
Regression Analysis In the equation, a is the vertical intercept of the estimated linear relationship and gives the value of Y when X = 0, while b is the slope of the line and gives an estimate of the increase in Y resulting from each one-unit increase in X. The difficulty with the scatter diagram is that different researchers would probably obtain different results, even if they used the same data points. The solution is to use regression analysis.
Regression Analysis Regression analysis is a statistical technique for obtaining the line that best fits the data points, so that all researchers reach the same result. The regression line, or line of best fit, minimizes the sum of the squared vertical deviations (et) of each point from the line. This method is called Ordinary Least Squares (OLS).
Regression Analysis In the table, Y1 refers to the actual or observed sales revenue of $44 mn associated with the advertising expenditure of $10 mn in the first year for which data were collected. In the corresponding graph, Y^1 is the sales revenue of the firm estimated from the regression line for the advertising expenditure of $10 mn in the first year. The symbol e1 is the corresponding vertical deviation, or error, of the actual sales revenue from the value estimated from the regression line in the first year. This can be expressed as e1 = Y1 - Y^1.
Regression Analysis Since there are 10 observation points, we obviously have 10 vertical deviations or errors (i.e., e1 to e10). The regression line obtained is the line that best fits the data points in the sense that the sum of the squared (vertical) deviations from the line is minimized: each of the 10 e values is first squared and then summed, and the line chosen is the one for which this sum is smallest.
Simple Regression Analysis Now we are in a position to calculate the value of a (the vertical intercept) and the value of b (the slope coefficient) of the regression line, conduct tests of significance of the parameter estimates, construct confidence intervals for the true parameters, and test the overall explanatory power of the regression.
Simple Linear Regression Model The regression line is a straight line that describes the dependence of the average value of one variable on the other: Yt = a + bXt + et, where Y is the dependent (response) variable, X is the independent (explanatory) variable, a is the Y-intercept, b is the slope coefficient, and et is the random error term; the fitted line is the regression line.
Ordinary Least Squares (OLS) Model: Yt = a + bXt + et, with fitted values Yt^ = a^ + b^Xt and residuals (errors) et = Yt - Yt^.
Ordinary Least Squares (OLS) Objective: Determine the slope and intercept that minimize the sum of the squared errors.
Ordinary Least Squares (OLS) Estimation Procedure
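Minimizing the sum of squared errors with respect to a and b yields the familiar least-squares estimators; a minimal statement of them, assuming the standard textbook formulas behind this slide, is

\hat{b} = \frac{\sum_{t}(X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t}(X_t - \bar{X})^2}, \qquad \hat{a} = \bar{Y} - \hat{b}\,\bar{X}

where \bar{X} and \bar{Y} are the sample means of X and Y.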
Estimation-Example Using the relevant information in the table, calculate the value of a (the vertical intercept) and the value of b (the slope coefficient) of the regression line and construct the equation. Conduct tests of significance of the parameter estimates. Test the overall explanatory power of the regression.
Ordinary Least Squares (OLS) Estimation Example
The Equation of the Regression Line The equation of the regression line can be written as Yt^ = 7.60 + 3.53Xt. When X = 0 (zero advertising expenditure), the expected sales revenue of the firm is $7.60 mn. In the first year, when X = $10 mn, Y1^ = $42.90 mn. Strictly speaking, the regression line should be used only to estimate sales revenues for advertising expenditures that lie within the range of the observed data.
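As a rough illustration of how a and b are computed, here is a minimal Python sketch of the OLS calculation. The data arrays are assumptions: illustrative values whose first pair matches the (X = 10, Y = 44) observation cited earlier and which reproduce a = 7.60 and b = 3.53; substitute the actual figures from the lecture's table if they differ.

import numpy as np

# Hypothetical advertising expenditures ($ mn) and sales revenues ($ mn)
X = np.array([10, 9, 11, 12, 11, 12, 13, 13, 14, 15], dtype=float)
Y = np.array([44, 40, 42, 46, 48, 52, 54, 58, 56, 60], dtype=float)

# OLS slope and intercept from the least-squares formulas
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_hat = a + b * X          # fitted (estimated) sales revenues
e = Y - Y_hat              # residuals e_t = Y_t - Y_hat_t

print(f"a = {a:.2f}, b = {b:.2f}")                 # a = 7.60, b = 3.53 with these values
print(f"fitted value at X = 10: {a + b*10:.2f}")   # close to the $42.90 mn on the slide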
Crucial Assumptions The error term is normally distributed. The error term has zero expected value or mean. The error term has constant variance in each time period and for all values of X (violation of this assumption is heteroscedasticity). The error term's value in one time period is unrelated to its value in any other period (violation of this assumption is autocorrelation).
Tests of Significance: Standard Error To test the hypothesis that b is statistically significant (i.e., that advertising positively affects sales), we first need to calculate the standard error (standard deviation) of b^. The standard error is calculated from the following expression:
Tests of Significance: Standard Error of the Slope Estimate
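A minimal statement of that expression, assuming the standard OLS formula for the standard error of the slope with n observations and k estimated parameters:

s_{\hat{b}} = \sqrt{\frac{\sum_t e_t^2}{(n-k)\sum_t (X_t - \bar{X})^2}} = \sqrt{\frac{\sum_t (Y_t - \hat{Y}_t)^2}{(n-k)\sum_t (X_t - \bar{X})^2}}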
Tests of Significance: Example Calculation Yt^ = 7.60 + 3.53Xt; for X = 10, Y1^ = 7.60 + 3.53(10) = 42.90.
Tests of Significance: Calculation of the t Statistic Degrees of freedom = n - k = 10 - 2 = 8; critical (tabulated) value at the 5% level = 2.306.
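A worked version of the test, using the standard error of roughly 0.52 for b^ that appears with the confidence interval on the next slide:

t = \frac{\hat{b}}{s_{\hat{b}}} = \frac{3.53}{0.52} \approx 6.79 > 2.306

so we reject the null hypothesis that b = 0 and conclude that advertising has a statistically significant effect on sales at the 5% level.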
Confidence Interval We can also construct a confidence interval for the true parameter from the estimated coefficient, having accepted the alternative hypothesis that there is a relationship between X and Y. Using the tabulated value of t = 2.306 for the 5% level and 8 df in our example, the true value of b will lie between 2.33 and 4.73: b = b^ +/- 2.306(sb^) = 3.53 +/- 2.306(0.52)
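The arithmetic behind the interval:

b = 3.53 \pm 2.306(0.52) = 3.53 \pm 1.20, \qquad \text{so } 2.33 \le b \le 4.73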
Tests of Significance: Decomposition of Sum of Squares Total variation = explained variation + unexplained variation.
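In symbols, using the notation of the earlier slides:

\underbrace{\sum_t (Y_t - \bar{Y})^2}_{\text{total variation}} = \underbrace{\sum_t (\hat{Y}_t - \bar{Y})^2}_{\text{explained variation}} + \underbrace{\sum_t (Y_t - \hat{Y}_t)^2}_{\text{unexplained variation}}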
Coefficient of Determination The coefficient of determination is defined as the proportion of the total variation, or dispersion, in the dependent variable that is explained by the variation in the explanatory variables in the regression. In our example, it measures how much of the variation in the firm's sales is explained by the variation in its advertising expenditures.
Tests of Significance: Coefficient of Determination
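A minimal statement of the definition, in the notation used above:

R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum_t (\hat{Y}_t - \bar{Y})^2}{\sum_t (Y_t - \bar{Y})^2}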
Coefficient of Correlation The coefficient of correlation (r) is the square root of the coefficient of determination. It is simply a measure of the degree of association or co-variation that exists between variables X and Y. In our example, this means that variables X and Y vary together 92% of the time. The sign of r is always the same as the sign of b^.
Tests of Significance: Coefficient of Correlation The coefficient of correlation measures only the degree of association or co-variation between the two variables. In this case there is a strong and positive relationship between Y and X.
Multiple Regression Analysis Model:
Multiple Regression Analysis The relationship between one dependent and two or more independent variables is a linear function: Y is the dependent (response) variable, the X's are the independent (explanatory) variables, and the equation contains a Y-intercept, slope coefficients, and a random error term.
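Written out with k explanatory variables (the general form; the slide's own equation is assumed to follow this standard notation):

Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i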
Multiple Regression Analysis
Multiple Regression Analysis Too complicated by hand! Ouch!
Multiple Regression Model: Example Develop a model for estimating the heating oil used by a single-family home in the month of January, based on the average temperature and the amount of insulation in inches.
Multiple Regression Model: Example (Excel Output) For each one-degree increase in average temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
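The Excel output itself is not reproduced here. As an alternative sketch of the same kind of estimation, the regression could be run in Python with statsmodels. The data below are synthetic placeholders generated to roughly mimic the reported slopes; they are not the lecture's actual 15 observations.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 15
temp = rng.uniform(10, 70, n)                  # average January temperature (synthetic)
insulation = rng.choice([3.0, 6.0, 10.0], n)   # attic insulation in inches (synthetic)
# Synthetic oil usage built around slopes similar to those reported on the slide
oil = 560.0 - 5.4 * temp - 20.0 * insulation + rng.normal(0.0, 15.0, n)

X = sm.add_constant(np.column_stack([temp, insulation]))  # intercept + 2 regressors
model = sm.OLS(oil, X).fit()

print(model.params)                        # intercept, temperature slope, insulation slope
print(model.rsquared, model.rsquared_adj)  # R-squared and adjusted R-squared
print(model.fvalue, model.f_pvalue)        # overall F statistic and its p-value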
Multiple Regression Analysis Adjusted Coefficient of Determination
Interpretation of the Coefficient of Multiple Determination 96.56% of the total variation in heating oil usage can be explained by temperature and the amount of insulation. 95.99% of the total variation in heating oil usage can be explained by temperature and the amount of insulation after adjusting for the number of explanatory variables and the sample size.
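The adjustment formula behind the second figure, taking n = 15 observations and k = 3 estimated parameters (consistent with the 2 and 12 degrees of freedom reported in the F test below):

\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k} = 1 - (1 - 0.9656)\,\frac{14}{12} \approx 0.9599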
Testing for Overall Significance Shows whether Y depends linearly on all of the X variables together as a group; use the F test statistic. Hypotheses: H0: β1 = β2 = ... = βk = 0 (no linear relationship); H1: at least one βi ≠ 0 (at least one independent variable affects Y). The null hypothesis is a very strong statement and is almost always rejected.
Multiple Regression Analysis Analysis of Variance and F Statistic
Test for Overall Significance, Excel Output: Example k = 3, the number of estimated parameters; k - 1 = 2, the number of explanatory variables; the output also reports n - 1 (total degrees of freedom) and the p-value of the F statistic.
Test for Overall Significance: Example Solution H0: β1 = β2 = ... = βk = 0; H1: at least one βj ≠ 0; α = 0.05; df = 2 and 12; critical value = 3.89. Test statistic: F = 168.47. Decision: reject H0 at α = 0.05. Conclusion: there is evidence that at least one independent variable affects Y.
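The F statistic can be recovered from the coefficient of determination reported earlier:

F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.9656/2}{0.0344/12} \approx 168.5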
t Test Statistic, Excel Output: Example The output reports the t test statistic for X1 (Temperature) and the t test statistic for X2 (Insulation).
t Test: Example Solution Does temperature have a significant effect on the monthly consumption of heating oil? Test at α = 0.05. H0: β1 = 0; H1: β1 ≠ 0; df = 12; critical values = ±2.1788 (rejection regions of 0.025 in each tail). Test statistic: t = -16.1699. Decision: reject H0 at α = 0.05. Conclusion: there is evidence of a significant effect of temperature on oil consumption, holding the effect of insulation constant.
Problems in Regression Analysis Multicollinearity: two or more explanatory variables are highly correlated. Heteroskedasticity: the variance of the error term is not constant (it is not independent of the level of the variables). Autocorrelation: consecutive error terms are correlated. Functional form: the model is misspecified, for example by the omission of a relevant variable. Normality: whether or not the residuals are normally distributed.
Practical Consequences of Multicollinearity Large variances and standard errors of the estimates; wider confidence intervals; insignificant t-ratios; a high R2 value but few significant t-ratios; OLS estimators and their standard errors that tend to be unstable; wrong signs for regression coefficients.
Multicollinearity How can multicollinearity be overcome? By increasing the number of observations, acquiring additional data or a new sample, using experience from a previous study, transforming the variables, or dropping a variable from the model. Dropping a variable is the simplest solution but the worst one with respect to the economic model, since it introduces a model specification error.
Heteroskedasticity Heteroskedasticity: the variance of the error term is not independent of the Y variable, i.e., it is unequal or non-constant. This means that as both the response and explanatory variables increase, the variance of the response variable does not remain the same at all levels of the explanatory variables (typical of cross-sectional data). Homoscedasticity: as both the response and explanatory variables increase, the variance of the response variable around its mean value remains the same at all levels of the explanatory variables (equal variance).
Residual Analysis for Homoscedasticity The standardized residuals (SR) are plotted against X: under homoscedasticity the spread of the residuals is roughly constant across the values of X, while under heteroscedasticity the spread changes systematically with X.
Autocorrelation or Serial Correlation Autocorrelation: correlation between members of a series of observations ordered in time, as in time-series data (i.e., the residuals are correlated, with consecutive errors tending to have the same sign). Detecting autocorrelation: this can be done in several ways; the most commonly used is the Durbin-Watson (DW) statistic.
Durbin-Watson Statistic Test for Autocorrelation If d=2, autocorrelation is absent.
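The statistic itself, in its standard form (with e_t the regression residuals):

d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

Values near 2 indicate no first-order autocorrelation; values toward 0 indicate positive autocorrelation and values toward 4 indicate negative autocorrelation.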
Residual Analysis for Independence: The Durbin-Watson Statistic It is used when data are collected over time to detect autocorrelation (residuals in one time period related to residuals in another period) and measures violation of the independence assumption. It should be close to 2; if not, examine the model for autocorrelation.
Residual Analysis for Independence (Graphical Approach) The residuals are plotted against time to detect any autocorrelation: a cyclical pattern indicates that the errors are not independent, whereas no particular pattern indicates independence.
Using the Durbin-Watson Statistic H0: no autocorrelation (the error terms are independent); H1: there is autocorrelation (the error terms are not independent). On the 0-to-4 scale, reject H0 in favour of positive autocorrelation when d is below dL; the test is inconclusive between dL and dU; accept H0 (no autocorrelation) between dU and 4 - dU (around d = 2); the test is inconclusive between 4 - dU and 4 - dL; and reject H0 in favour of negative autocorrelation when d is above 4 - dL.
Steps in Demand Estimation (1) Model specification: identify the variables; (2) collect the data; (3) specify the functional form; (4) estimate the function; (5) test the results.
Functional Form Specifications Linear Function: Power Function: Estimation Format:
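The formulas on this slide are not reproduced above; the typical specifications used in demand estimation (an assumption about the forms the slide showed) are:

Linear function: Q_X = a_0 + a_1 P_X + a_2 I + a_3 P_Y + e

Power function: Q_X = a\,P_X^{b_1} I^{b_2}

Estimation format (taking logarithms so the power function can be estimated by OLS): \ln Q_X = \ln a + b_1 \ln P_X + b_2 \ln I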
Dummy-Variable Models When an explanatory variable is qualitative in nature, it is represented by a dummy variable. These are also called indicator variables, binary variables, categorical variables, or dichotomous variables, such as a variable D that can take only the values 0 or 1.
Dummy-Variable Models A categorical explanatory variable with two or more levels (yes or no, on or off, male or female) is handled using dummy variables coded as 0 or 1. Only the intercepts differ: the model assumes equal slopes across categories, and the regression model has the same form. (Can the dependent variable itself be a dummy?)
Dummy-Variable Models Given: Y = assessed value of the house; X1 = square footage of the house; X2 = desirability of the neighbourhood, a dummy variable coded X2 = 1 if the neighbourhood is desirable and X2 = 0 if it is undesirable. The two categories are assumed to have the same slope on X1.
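With this coding, the single estimated equation produces two parallel lines, which is the standard dummy-variable setup assumed here:

\hat{Y} = \hat{b}_0 + \hat{b}_1 X_1 + \hat{b}_2 X_2

Desirable neighbourhood (X_2 = 1): \hat{Y} = (\hat{b}_0 + \hat{b}_2) + \hat{b}_1 X_1

Undesirable neighbourhood (X_2 = 0): \hat{Y} = \hat{b}_0 + \hat{b}_1 X_1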
Simple and Multiple Regression Compared: Example
Regression Analysis in Practice Suppose we have an employment (labour demand) function as follows: N = constant + K + W + AD + P + WT, where N is the number of employees in employment, K is capital accumulation, W is the value of real wages, AD is the aggregate deficit, P is the effect of world manufacturing exports on employment, and WT is the deviation of world trade from trend.
Output by Microfit v4.0w: Ordinary Least Squares Estimation
Dependent variable is LOGN; 39 observations used for estimation from 1956 to 1994
Regressor    Coefficient    Standard Error    T-Ratio [Prob]
CON          4.9921         .98407            5.0729 [.000]
LOGK         .040394        .012998           3.1078 [.004]
LOGW         .024737        .010982           2.2526 [.032]
AD           -.9174E-7      .1587E-6          .57798 [.567]
LOGP         .026977        .0099796          2.7032 [.011]
LOGWT        -.053944       .024279           2.2219 [.034]
R-Squared .82476; R-Bar-Squared .78519; S.E. of Regression .012467
F-statistic F(5, 34) = 20.8432 [.000]; Residual Sum of Squares .0048181
Mean of Dependent Variable 10.0098; S.D. of Dependent Variable .026899
Maximum of Log-likelihood 120.1407; DW-statistic 1.8538
Diagnostic Tests
Test                      LM Version                       F Version
A: Serial Correlation     CHI-SQ(1) = .051656 [.820]       F(1,30) = .039788 [.843]
B: Functional Form        CHI-SQ(1) = .056872 [.812]       F(1,30) = .043812 [.836]
C: Normality              CHI-SQ(2) = 1.2819 [.527]        Not applicable
D: Heteroscedasticity     CHI-SQ(1) = 1.0065 [.316]        F(1,37) = .98022 [.329]
A: Lagrange multiplier test of residual serial correlation
B: Ramsey's RESET test using the square of the fitted values
C: Based on a test of skewness and kurtosis of residuals
D: Based on the regression of squared residuals on squared fitted values
Summary of the estimated equation. Dependent variable: LOGN (t-ratios in parentheses)
CON      4.9921       (5.07)
LOGK     0.0404       (3.10)
LOGW     0.0247       (2.25)
AD       -0.9174E-7   (-0.577)
LOGP     0.0269       (2.70)
LOGWT    -0.0539      (-2.22)
R2 = 0.87; R2-bar = 0.83; DW = 2.16; SER = 0.021
X2SC = .05165 [.820]; X2FF = .05687 [.812]; X2NORM = 1.2819 [.527]; X2HET = 1.0065 [.316]
Interpretation: t-test (individual significance) Let us first examine the significance of each variable: n = 39, k = 5, hence d.f. = 39 - 5 = 34; α = 0.05 (our confidence level is 95%). With α = 0.05 and d.f. = 34, ttab = 2.042. Our hypotheses are H0: βs = 0 (not significant) and H1: βs ≠ 0 (significant). Using the t distribution, we can decide whether the estimated t-values of the individual variables are significant by comparing them with this tabulated value: LOGK (3.11), LOGW (2.25), LOGP (2.70), and LOGWT (2.22) exceed 2.042 in absolute value and are therefore significant, while AD (0.58) is not.
F-test (overall significance) Our result is F(5,34) = 20.8432, with k - 1 = 5 and n - k = 34; α = 0.05 (our confidence level is 95%). With α = 0.05 and F(5,34), Ftab = 2.53 (from the table at 5 and 30 d.f.). Our hypotheses are H0: R2 = 0 (not significant) and H1: R2 ≠ 0 (significant). Since the estimated F of 20.84 exceeds the tabulated value of 2.53, we reject H0 and conclude that the regression is significant overall.
Diagnostic Tests: Serial Correlation H0: ρ = 0 (no autocorrelation); H1: ρ ≠ 0 (autocorrelation exists). Since CHI-SQ(1) = 0.051656 < X2 = 3.841, we accept H0: the estimated regression does not suffer from first-order serial correlation or autocorrelation.
Functional Form H0: no misspecification; H1: misspecification exists. The estimated LM version of the CHI-SQ statistic is 0.056872, and with α = 0.05 the tabulated value is X2 = 3.841. Because CHI-SQ(1) = 0.056872 < X2 = 3.841, we accept the null hypothesis that there is no misspecification.
Normality H0: the residuals are normally distributed; H1: the residuals are not normally distributed. Our estimated result of the LM version for normality is CHI-SQ(2) = 1.2819, and the tabulated value with 2 restrictions at α = 0.05 is X2 = 5.991. Since CHI-SQ(2) = 1.2819 < X2 = 5.991, the test result shows that the null hypothesis of normality of the residuals is accepted.
Heteroscedasticity H0: the variance of the error term is constant (homoscedasticity); H1: the variance of the error term is not constant (heteroscedasticity). The LM version of our result for heteroscedasticity is CHI-SQ(1) = 1.0065, and the tabulated critical value with 1 restriction at α = 0.05 is X2 = 3.841. Since CHI-SQ(1) = 1.0065 < X2 = 3.841, we accept the null hypothesis that the variance of the error term is constant.
The End Thanks