Published by Damon Davis. Modified over 9 years ago.
1
Lecture 3-3: Summarizing relationships among variables
2
Topics covered in this lecture note
- We will cover several topics about ordinary least squares (OLS) estimation.
  1. Testing the statistical significance of the estimated coefficient using t-statistics (i.e., testing whether advertisement spending has any effect on revenue).
  2. Ordinary least squares estimation when there is more than one explanatory variable.
  3. An introduction to panel data (repeated observations over time).
3
1. Testing the statistical significance of the estimated coefficient: Example
The graph above shows the relationship between advertisement spending and revenue, along with the estimated linear equation. The estimated slope coefficient is 13.45. This means that for every 1000 yen you spend on advertisement, revenue increases by 13.45 thousand yen.
4
Testing the statistical significance of the estimated coefficient: Example, cont'd
However, the graph also seems to indicate that there is not much of a relationship between advertisement spending and revenue. When we estimate a linear equation, we typically want to know whether advertisement has any effect on revenue. To answer such a question, estimating β0 and β1 alone is not enough. We need more information.
5
Testing the statistical significance of the estimated coefficient: Example, cont'd
The following slides describe the procedure to answer the following question: “Would the advertisement have any impact on the revenue?”
6
Testing the statistical significance of the estimated coefficient: Example, cont'd
- To test whether advertisement spending has any impact on revenue, we need to test whether the slope coefficient is “significantly” different from zero.
  1. If the slope coefficient is significantly different from zero, we may conclude that advertisement spending has some effect on revenue.
  2. If the slope coefficient is not significantly different from zero, we cannot conclude that advertisement spending has an effect on revenue; the data provide no evidence of one.
- What, then, is the criterion for deciding whether the slope coefficient is “significantly” different from zero? See the next slide.
7
Testing the statistical significance of the estimated coefficient: Example, cont'd
- To decide whether the slope coefficient is significantly different from zero, we use the “t-statistic”.
- The OLS estimation procedure produces much more than β0 and β1; its output also includes the t-statistic. We will now obtain some of this extra information from OLS estimation using Excel.
8
Testing the statistical significance of the estimated coefficient: Example, cont'd
- Open the data set “OLS Exercise 2-Advertisement and Revenue”. This is the data set used to produce the graph in the previous slides. Now use “Data Analysis” to estimate the following model:
  (Revenue)=β0+β1(Advertisement Spending)
9
Testing the statistical significance of the estimated coefficient: Example, cont'd
The table below is the result of the OLS regression.
  1. Intercept coefficient (β0) = 15440.18
  2. Slope coefficient (β1) = 13.45
  3. We have some extra information, such as the standard error and the t-statistic (t Stat in the table). These are the pieces of information needed to test whether the slope coefficient is significantly different from zero. See next slides.

                        Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                   15440.18        2796.811  5.520639  5.87E-05   9478.923   21401.45     9478.923     21401.45
Advertisement Spending      13.45107        60.32826  0.222965  0.826571   -115.136   142.0377     -115.136     142.0377
10
Testing the statistical significance of the estimated coefficient: Example -Standard Error-
Since the data contain a lot of noise (unexpected rises and falls in revenue, etc.), the effect of advertisement on revenue (β1) is estimated with some error. Standard errors show the expected size of the error in the estimated coefficients. See next slides.
11
Testing the statistical significance of the estimated coefficient: Example -Standard Error, cont'd-
For example, the standard error for the slope coefficient is 60.3. Roughly speaking, this means that the estimate of the slope coefficient (β1) is typically off by about ±60.3. Thus, the smaller the standard error for β1, the more precise the estimate of the impact of advertisement.
12
Testing the statistical significance of the estimated coefficient: Example -t statistic-
The t-statistic is obtained by dividing the coefficient by its standard error. For example, the t-statistic for the slope coefficient is
  13.45107/60.32826=0.222965
Our confidence that advertisement spending has some impact on revenue increases as the absolute value of the t-statistic increases (this happens when the standard error decreases or the coefficient moves further from zero). We use the t-statistic to test whether the slope coefficient is significantly different from zero.
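The division described on this slide can be reproduced outside Excel. The following is a minimal Python sketch, not part of the original lecture; the numbers are copied from the regression output, and the names `estimates` and `t_statistic` are illustrative.

```python
# Coefficients and standard errors copied from the regression output above.
estimates = {
    "Intercept": (15440.18, 2796.811),
    "Advertisement Spending": (13.45107, 60.32826),
}


def t_statistic(coefficient, standard_error):
    """Return the t-statistic: the coefficient divided by its standard error."""
    return coefficient / standard_error


# Compute one t-statistic per row of the table.
t_stats = {name: t_statistic(coef, se) for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}")
```

The results should agree with the "t Stat" column of the Excel output up to rounding.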
13
The procedure to test the statistical significance of the estimated coefficient
- The following is the procedure to test whether a coefficient is significantly different from zero.
  1. Obtain the t-statistic.
  2. Check whether the absolute value of the t-statistic is greater than or equal to 2 (that is, t-stat ≤ -2 or t-stat ≥ +2).
  3. If the absolute value of the t-statistic is greater than or equal to 2, the coefficient is statistically significantly different from zero.
  4. If the absolute value of the t-statistic is smaller than 2, the coefficient is not statistically significantly different from zero.
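The rule of thumb above can be written as a one-line check. This is a small illustrative sketch in Python; the function name `is_significant` and the third example value (-2.4) are hypothetical, while the first two t-statistics come from the advertisement regression.

```python
def is_significant(t_stat, threshold=2.0):
    """Apply the lecture's rule of thumb: significant if |t| >= threshold."""
    return abs(t_stat) >= threshold


# t-statistics from the advertisement regression, plus one hypothetical
# negative value to show that only the absolute value matters.
examples = {
    "Advertisement Spending (slope)": 0.222965,
    "Intercept": 5.520639,
    "hypothetical negative t": -2.4,
}

for label, t in examples.items():
    verdict = "significant" if is_significant(t) else "not significant"
    print(f"{label}: t = {t} -> {verdict}")
```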
14
A note on the test of statistical significance of the estimated coefficient 1
- When the coefficient is statistically significantly different from zero, we simply say “the coefficient is statistically significant”.
  1. If the coefficient is statistically significant, we conclude that advertisement spending has some impact on revenue.
  2. If the coefficient is not statistically significant, we conclude that we have found no evidence that advertisement spending has an impact on revenue.
15
A note on the test of statistical significance of the estimated coefficient 2 (Optional)
The criterion value for the t-statistic that we used for testing statistical significance was 2. More precisely, this criterion value depends on the number of observations and the number of parameters to be estimated. This topic will be discussed in more detail later in the class. Roughly speaking, when you use the criterion value of 2, you are testing the statistical significance of the slope coefficient at the 5% significance level.
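One way to see the exact criterion value behind a given Excel output is to back it out of the reported 95% interval. This sketch assumes that interval is computed as coefficient ± (critical value) × (standard error), so dividing the half-width by the standard error recovers the critical value. The function name is illustrative, and the numbers are the slope row of the advertisement regression.

```python
def implied_critical_value(coefficient, standard_error, upper_95):
    """Recover the critical value from a reported 95% confidence interval.

    Assumes upper_95 = coefficient + critical_value * standard_error.
    """
    return (upper_95 - coefficient) / standard_error


# Slope row of the advertisement regression output.
critical = implied_critical_value(
    coefficient=13.45107,
    standard_error=60.32826,
    upper_95=142.0377,
)

print(f"Implied 5% critical value: {critical:.3f}")
```

For this data set the implied value comes out a little above 2 (about 2.13), which illustrates why 2 is only a rough criterion.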
16
Exercise
- Exercise 1: Open the data “Statistical Significance Exercise”. Use the Product A data to estimate the effect of promotion on revenue by estimating the following model. Pay particular attention to the statistical significance of the slope coefficient.
  (Revenue)=β0+β1(Number of promotion)
- Exercise 2: Use the data “Statistical Significance Exercise”. Use the Product C data to estimate the same model.
17
Exercise 1 Answer
The estimated effect of promotion on revenue is 99060.15, with a t-statistic equal to 5.07. Since the t-statistic is greater than 2, we conclude that the effect of promotion on revenue is statistically significant. Given the statistical significance of the coefficient, the estimated slope coefficient of 99060 indicates that, if we increase the number of promotions by one, revenue is likely to increase by 99060 yen.

Product A             Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                 105827.1        254311.3  0.416132  0.689775    -495524   707177.8      -495524     707177.8
Number of promotions      99060.15        19523.94  5.073779  0.001441   52893.37   145226.9     52893.37     145226.9
18
Exercise 2 Answer
The estimated effect of promotion on revenue is -11751.1, with a t-statistic equal to -1.3. Since the absolute value of the t-statistic is smaller than 2, we conclude that the slope coefficient is not statistically significant. In other words, we did not find evidence that promotion has any impact on the revenue from product C.

Product C             Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                 341540.1        111203.4   3.07131  0.018034   78585.82   604494.4     78585.82     604494.4
Number of promotions      -11751.1         8970.74  -1.30993  0.231567   -32963.5   9461.373     -32963.5     9461.373
19
2. OLS with multiple explanatory variables: Introduction
- So far, we have considered a model with only one explanatory variable.
  Y=β0+β1X
- Often, we have more than one explanatory variable. For example, in addition to promotion, the company may increase the number of salespersons. If we have data about the number of salespersons, we can also incorporate such a variable.
20
OLS with multiple regressors -Example: Returns on Education-
- Suppose you are considering pursuing more education (going to graduate school, etc.). You may then want to know whether it is worth the effort.
21
OLS with multiple regressors -Example: Returns on Education-
- To investigate by how much extra education increases your future salary, we can use OLS regression.
- Open the data “Returns on education”. The data set contains three variables, collected for 935 persons: weekly wage in dollars, number of years of education, and number of years of work experience.
- As an exercise, find the mean, variance, and standard deviation of the three variables.
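For readers working outside Excel, the summary statistics in this exercise can be computed with Python's standard `statistics` module. The wage values below are hypothetical placeholders, not the lecture's 935 observations. Note that `variance` and `stdev` are the sample (n - 1) versions.

```python
import statistics

# Hypothetical weekly wages in dollars (NOT taken from the lecture's data set).
weekly_wage = [500.0, 650.0, 800.0, 950.0, 1100.0]

summary = {
    "mean": statistics.mean(weekly_wage),
    # Sample variance: squared deviations divided by (n - 1).
    "variance": statistics.variance(weekly_wage),
    # Sample standard deviation: square root of the sample variance.
    "std_dev": statistics.stdev(weekly_wage),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same three calls would be repeated for the education and experience columns.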
22
OLS with multiple regressors -Example: Returns on Education-
- To investigate the effect of education on wage, we may estimate the OLS regression:
  (wage)=β0+β1(education)
- However, wage is affected not only by education but also by the number of years of work experience. Therefore, it seems better to incorporate “work experience” in the model.
- The simplest way to incorporate experience in the model is the following:
  (wage)=β0+β1(education)+β2(experience)
- Notice that this OLS equation has two explanatory variables on the right-hand side.
23
OLS with multiple regressors -Example: Returns on Education-
- Excel estimates the coefficients β0, β1 and β2 automatically.
  (wage)=β0+β1(education)+β2(experience)
- The estimated β1 is the effect of education on wage, holding experience constant. This is the big advantage of OLS with multiple explanatory variables. In the data, education and experience vary at the same time, so it is difficult to see the effect of education separately from the effect of experience just by looking at the data. By incorporating both variables, we can separate the effect of experience from the effect of education.
- Exercise: Estimate the model above using Excel.
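To show what such a two-variable estimation does, here is a minimal pure-Python sketch that solves the OLS normal equations (X'X)b = X'y by Gaussian elimination. It is not how Excel computes its results internally, and the data are hypothetical: wages are generated without noise from wage = 100 + 50(education) + 10(experience), so the sketch should recover those three numbers.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Return OLS coefficients [b0, b1, ...] for y regressed on the columns of rows."""
    design = [[1.0] + list(r) for r in rows]  # prepend a constant for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical (education, experience) pairs, in years.
people = [(12, 10), (12, 4), (16, 8), (16, 2), (18, 5), (10, 15)]
# Hypothetical noise-free wages built from known coefficients.
wage = [100 + 50 * educ + 10 * exper for educ, exper in people]

b0, b1, b2 = ols(people, wage)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Because education and experience both appear in the design matrix, b1 is the education effect with experience held constant, which is the point made on this slide.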
24
OLS with multiple regressors -Example: Returns on Education-
The estimated coefficients are β0=-272.5, β1=76.2 and β2=17.6. Also notice that the t-statistic for β1 is 12.1, which is bigger than 2. Therefore, the estimated β1 is statistically significant, and we conclude that education does have an impact on wage. Given the statistical significance of β1, we can say that, holding experience constant, increasing education by one year would increase the weekly wage by $76.2. This also means that if you go to graduate school for 2 years, your annual salary would increase by $76.2*(2 years of education)*(52 weeks)=$7924.8.

                            Coefficients  Standard Error    t Stat      P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                       -272.528     107.2627094  -2.54075   0.01122266   -483.032   -62.0234     -483.032     -62.0234
education (in years)            76.21639     6.296603998  12.10436  1.98778E-31   63.85922   88.57355     63.85922     88.57355
work experience (in years)      17.63777       3.1617754  5.578439  3.18016E-08   11.43275   23.84279     11.43275     23.84279
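The graduate-school arithmetic can be spelled out step by step. This short sketch simply restates the slide's calculation in Python; the variable names are illustrative, and it treats the rounded estimate 76.2 as given.

```python
# Estimated weekly wage gain per extra year of education (rounded β1 from the slide).
weekly_gain_per_year_of_education = 76.2
extra_years_of_education = 2   # two years of graduate school
weeks_per_year = 52

# Step 1: two extra years of education raise the weekly wage by 2 * 76.2 dollars.
weekly_gain = weekly_gain_per_year_of_education * extra_years_of_education

# Step 2: convert the weekly gain into an annual gain.
annual_gain = weekly_gain * weeks_per_year

print(f"Weekly gain: ${weekly_gain:.1f}")
print(f"Annual gain: ${annual_gain:.1f}")
```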
25
Exercise 2
- Open the data “Returns on education 2”.
- This is the same data set as “Returns on education 1”, except that it has more variables: it also contains the age and the IQ test score of each person.
Exercise: Add IQ to the model. Does this change the results?
26
OLS with multiple variables: Application -Making a model more flexible-
- When you specify a model for OLS estimation, the first criterion is simplicity. For example:
  (Revenue)=β0+β1(Promotion)
- Such a simple equation gives a clear idea of the effect of promotion on revenue.
- However, simplicity comes at a cost: a simple model is often not flexible.
27
OLS with multiple variables: Application -Making a model more flexible-
- The model implicitly assumes that the effect on revenue of increasing the number of promotions by one is constant. That is, the model assumes that the effect of increasing the number of promotions from 10 to 11 is the same as the effect of increasing it from 40 to 41.
- However, it is reasonable to think that the effect of promotion would diminish, due to the law of diminishing marginal returns.
- See the next example.
28
-Making a model more flexible: An example-
- Open the data set “Making a model more flexible”. The data show the relationship between the number of promotions and revenue for product D.
- Plot the relationship between the number of promotions and revenue, then describe the relationship.
29
-Making a model more flexible: An example-
The relationship seems to be a curve, not a straight line. The effectiveness of promotion seems to diminish as the number of promotions increases. How do we incorporate the “diminishing effectiveness” of promotion in the model?
30
-Making a model more flexible: An example-
- To incorporate the “diminishing effectiveness” in the model, we need to specify a model that can “curve”.
- A simple way to achieve this is to estimate the following model:
  (Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2
31
-Making a model more flexible: Exercise-
- Use the data “Making a model more flexible” and estimate the following model:
  (Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2
32
Exercise: Answer
The estimated equation is
  (Revenue)=-295299.7+181554.72(Number of promotion)-2629.84(Number of promotion)^2
Note that both β1 and β2 are statistically significant.

                          Coefficients  Standard Error       t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                    -295299.7     166846.4598  -1.76988887  0.093683    -645831   55231.71      -645831     55231.71
Number of promotions         181554.72     16497.65368  11.00488133  2.01E-09   146894.4     216215     146894.4       216215
(Number of promotion)^2      -2629.838     359.2650349   -7.3200508  8.48E-07   -3384.63   -1875.05     -3384.63     -1875.05
33
More exercises
- Exercise 1: Using the estimated equation, compute the “predicted” revenue for each observation.
- Exercise 2: Now plot the predicted revenue against the number of promotions. Also plot the actual revenue against the number of promotions on the same graph. See how well the model predicts the outcome.
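Computing a predicted value means plugging a number of promotions into the estimated equation. The sketch below uses the coefficients from the regression table for product D. The promotion counts in `promotion_counts` are hypothetical, since the actual observations are in the Excel file and are not reproduced in these slides.

```python
# Coefficients copied from the regression table for product D.
B0 = -295299.7    # intercept
B1 = 181554.72    # coefficient on number of promotions
B2 = -2629.838    # coefficient on (number of promotions)^2


def predicted_revenue(promotions):
    """Predicted revenue from the estimated quadratic equation."""
    return B0 + B1 * promotions + B2 * promotions ** 2


# Hypothetical promotion counts standing in for the observations in the data set.
promotion_counts = [5, 10, 20, 30]
predictions = [predicted_revenue(n) for n in promotion_counts]

for n, revenue in zip(promotion_counts, predictions):
    print(f"{n} promotions -> predicted revenue {revenue:,.1f}")
```

In Excel, the same calculation is one formula per row; plotting these predictions next to the actual revenue gives the comparison asked for in Exercise 2.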
34
More exercises
- Exercise 3: Using the estimated results, compute the expected increase in revenue when you increase the number of promotions from 10 to 11, and from 25 to 26.
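One way to check your answer to Exercise 3 is to take the difference between two predicted values. In the quadratic model, going from n to n+1 promotions changes predicted revenue by β1+β2(2n+1), so the increase shrinks as n grows when β2 is negative. This sketch uses the coefficients from the regression table; the function names are illustrative.

```python
# Coefficients copied from the regression table for product D.
B0 = -295299.7
B1 = 181554.72
B2 = -2629.838


def predicted_revenue(promotions):
    """Predicted revenue from the estimated quadratic equation."""
    return B0 + B1 * promotions + B2 * promotions ** 2


def increase_from_one_more(promotions):
    """Expected revenue change when promotions go from n to n + 1.

    Algebraically this equals B1 + B2 * (2 * n + 1).
    """
    return predicted_revenue(promotions + 1) - predicted_revenue(promotions)


increase_10_to_11 = increase_from_one_more(10)
increase_25_to_26 = increase_from_one_more(25)

print(f"10 -> 11 promotions: {increase_10_to_11:,.1f}")
print(f"25 -> 26 promotions: {increase_25_to_26:,.1f}")
```

The second increase is much smaller than the first, which is the diminishing effectiveness the squared term was added to capture.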
35
OLS with multiple variables: Application 2 -Dummy Variables-
- Often, our data contain qualitative variables. For example, if you have data about your clients, you may know for each client whether the person is male or female. Such data (about gender) are not quantitative but qualitative.
36
OLS with multiple variables: Application 2 -Dummy Variables-
- However, such a qualitative variable is also important in analyzing data. For example, you may want to answer the following question: “Which gender consumes more?”
37
- To incorporate such a qualitative variable into the OLS equation, we first convert the qualitative information into a quantitative variable called a “dummy variable”.
- A dummy variable is a variable that takes 1 if a particular criterion is satisfied, and takes 0 otherwise.
- If you would like to incorporate gender information in your model, create the following dummy variable:
  Female =1 if the client is female
         =0 if the client is male
  Then you can estimate
  (Consumer spending)=β0+β1(Number of promotion)+β2(Female)
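Creating the dummy variable is a simple recoding step. The sketch below shows it in Python on a few hypothetical client records; the field names `gender`, `promotions` and `female` are illustrative and not taken from the lecture's data files.

```python
# Hypothetical client records with a qualitative "gender" field.
clients = [
    {"gender": "female", "promotions": 3},
    {"gender": "male", "promotions": 5},
    {"gender": "female", "promotions": 1},
]


def female_dummy(gender):
    """Return 1 if the client is female, 0 otherwise."""
    return 1 if gender == "female" else 0


# Add the dummy variable as a new numeric column.
for client in clients:
    client["female"] = female_dummy(client["gender"])

female_column = [client["female"] for client in clients]
print(female_column)
```

The new 0/1 column can then be used as an ordinary explanatory variable in the regression.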
38
OLS with multiple variables: Application 2 -Dummy Variables-
- A dummy variable is very versatile. Suppose you would like to know whether there is a wage differential between races (for example, between white and black workers). Then you can use a dummy variable that takes 1 if the person is black, and 0 otherwise.
- Dummy variables can be created for many other situations. The use of dummy variables is one of the most important techniques in regression analysis.
39
Dummy variable exercise
- Open the data “Dummy variable Exercise”. This data set contains four dummy variables.
  Black   =1 if the person is black, =0 otherwise
  Married =1 if the person is married, =0 otherwise
  South   =1 if the person lives in the South of the USA, =0 otherwise
  Urban   =1 if the person lives in an urban area, =0 otherwise
40
Dummy variable exercise
- Exercise 1: Estimate the following model:
  (Wage)=β0+β1(Education)+β2(Experience)+β3(Age)+β4(IQ)+β5(Black)
  Then interpret the results.
41
Dummy variable exercise: Answer
The coefficient on the dummy variable for a black person is -124.65. The t-statistic is -3.19; its absolute value is greater than 2. Therefore, the coefficient is statistically significant. The results indicate that, holding education, experience, age, and IQ constant, the weekly wage is lower for a black person by $124.65. There seems to be a large wage gap between white and black workers.

            Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept       -726.121        165.4365  -4.38912  1.27E-05   -1050.79   -401.448     -1050.79     -401.448
education       52.70889        7.266112   7.25407  8.52E-13   38.44899   66.96879     38.44899     66.96879
experience      11.27217        3.699921  3.046597   0.00238   4.010995   18.53334     4.010995     18.53334
age             13.38011        4.646612  2.879541  0.004074   4.261035   22.49918     4.261035     22.49918
IQ              4.119113        0.997874  4.127889  3.99E-05   2.160765   6.077462     2.160765     6.077462
black           -124.653        39.04528  -3.19253  0.001458    -201.28   -48.0259      -201.28     -48.0259
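The phrase "holding education, experience, age, and IQ constant" can be made concrete by predicting the wage of two people who differ only in the dummy variable. The sketch below uses the coefficients from the regression table; the person's characteristics (12 years of education, 10 years of experience, age 30, IQ 100) are hypothetical.

```python
# Coefficients copied from the regression table.
COEF = {
    "education": 52.70889,
    "experience": 11.27217,
    "age": 13.38011,
    "IQ": 4.119113,
    "black": -124.653,
}
INTERCEPT = -726.121


def predicted_wage(person):
    """Predicted weekly wage in dollars from the estimated equation."""
    return INTERCEPT + sum(COEF[name] * value for name, value in person.items())


# Two hypothetical people, identical except for the dummy variable.
base = {"education": 12, "experience": 10, "age": 30, "IQ": 100}
person_a = {**base, "black": 0}
person_b = {**base, "black": 1}

gap = predicted_wage(person_b) - predicted_wage(person_a)
print(f"Predicted weekly wage gap: {gap:.2f} dollars")
```

Whatever values are chosen for the other variables, the gap equals the coefficient on the dummy variable, because everything else cancels out.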
42
Dummy variable: More exercises
- Use the data “Dummy Variable Exercise”. Specify your own model, estimate it, and interpret the results.