1
Lectures 3&4 Univariate regression
2
The effect of X on Y
- Are female bosses better?
- Does having a PhD (in science) help to innovate?
- Is website design A better than design B in terms of sales?
- Do people of different ages buy different gadgets from Elisa?
- Are promotions of substitute products of the same firm run at the same time?
3
Let’s look at Miller’s beer
Maxim Sinitsyn (2015), "Managing Price Promotions Within a Product Line," Marketing Science. Published online in Articles in Advance, 12 Oct 2015.
4
Two products – m12 and m24
Data over 221 weeks: Miller Lite 12/12 oz ("m12") and Miller Lite 24/12 oz ("m24"). Information on the prices of m12 and m24 relative to their "regular price", in %. Define a promotion as "price < regular price".
5
Variables
- m12 = price of Miller Lite 12/12 oz relative to regular price
- m24 = price of Miller Lite 24/12 oz relative to regular price
- m12_prom = 0 if no promotion, 1 if promotion (dummy variable) for m12
- m24_prom = 0 if no promotion, 1 if promotion (dummy variable) for m24
- lnm12 = natural logarithm of m12
- lnm24 = natural logarithm of m24
6
Descriptive statistics
Variable    Obs    Mean    Std. Dev.   Min     Max
miller12    221    0.95    0.08        0.74    1.00
miller24    221    0.94    0.06        0.79
m12_prom    221    0.29    0.45        0.00    1.00
m24_prom    221    0.59    0.49        0.00    1.00
lnm12       221   -0.05    0.09       -0.30    0.00
lnm24       221   -0.06               -0.24

sum *
7
Frequency of promotions
8
twoway kdensity miller12 || kdensity miller24, ///
    title("Distribution of price of m12 and m24") ///
    legend(lab(1 "m12") lab(2 "m24")) ///
    xtitle("price")
9
Conditional descriptive statistics
sum miller* *prom if miller12 < 1

Variable    Obs    Mean    Std. Dev.   Min     Max
miller12    63     0.83    0.05        0.74    0.97
miller24    63     0.93    0.06        0.79    1.00
m12_prom    63     1.00    0.00        1.00    1.00
m24_prom    63     0.68    0.47        0.00    1.00

sum miller* *prom if miller24 < 1

Variable    Obs    Mean    Std. Dev.   Min     Max
miller12    130    0.94    0.09
miller24    130    0.90    0.04                0.99
m12_prom    130    0.33    0.47        0.00    1.00
m24_prom    130    1.00    0.00        1.00    1.00
10
Joint distribution of promotions
tab m12_prom m24_prom, cell

                    m24_prom
m12_prom        0         1     Total
0              71        87       158
            32.13     39.37     71.49
1              20        43        63
             9.05     19.46     28.51
Total          91       130       221
            41.18     58.82    100.00

(cell counts, with cell percentages of the total below each count)
11
Modeling
Q1: What is the object you want to model ("explain")? Let's call this Y.
Q2: What is the object whose effect on Y you want to understand? Let's call this X.
12
Modeling
Where do these (decisions) come from? Theory. What is theory?
- A mathematical model.
- A conceptualization of existing qualitative knowledge.
- A conceptualization of existing quantitative knowledge.
13
Modeling the relationship between m12 and m24 prices?
$m12 = f(m24)$
$m12\_prom = f(m24\_prom)$
$\ln m12 = f(\ln m24)$
In general: $Y = f(X)$. What do we know about $f(X)$? How can we learn about it?
14
Quick aside - correlation
$corr(Y,X) = \dfrac{cov(Y,X)}{\sqrt{var(X)}\,\sqrt{var(Y)}}$
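As a sanity check of this formula, one can rebuild the correlation from the covariance and variances that Stata returns; a minimal sketch, assuming the beer data are in memory:

* corr(Y,X) rebuilt from cov and var, using lnm12 (Y) and lnm24 (X)
quietly correlate lnm12 lnm24, covariance
display "by hand: " r(cov_12) / sqrt(r(Var_1) * r(Var_2))
correlate lnm12 lnm24    // Stata's own correlation, for comparison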
16
More structure - linear
$Y = \beta_0 + \beta_1 X$
This is the so-called population regression line (populaatioregressio).
$Y$ = dependent variable (vastemuuttuja) / endogenous variable.
$X$ = independent variable (selittävä muuttuja) / exogenous variable / regressor.
$\beta_0$, $\beta_1$ = parameters of the model.
17
Parameters
$Y = \beta_0 + \beta_1 X$
$\beta_0$, $\beta_1$. Interpretation? Intercept, slope.
What is now assumed about what can influence $Y$?
18
How to allow for other factors?
$Y = f(X, u) = \beta_0 + \beta_1 X + u$
$u$ = error term / residual (virhetermi / jäännöstermi). Why such a name? It shows how much our model misses in terms of determining $Y$. It measures those things that 1) affect $Y$ and 2) we don't observe.
19
What is known about $u$? How large should the error be on average?
Zero. Why? $E(u \mid X) = 0$.
20
How to get $\beta_0$, $\beta_1$?
21
How to get $\beta_0$, $\beta_1$?
22
How to get $\beta_0$, $\beta_1$?
23
How to get $\beta_0$, $\beta_1$? OLS
Ordinary Least Squares (pienimmän neliösumman menetelmä).
$Y = \beta_0 + \beta_1 X + u$
$E[\,Y - (\beta_0 + \beta_1 X) \mid X\,] = E(u \mid X) = 0$
$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \big(Y_i - (\beta_0 + \beta_1 X_i)\big)^2$
24
How to get $\beta_0$, $\beta_1$? OLS
Notice the link to the estimation of a mean: set $\beta_1 = 0$:
$\min_{\beta_0} \sum_{i=1}^{n} (Y_i - \beta_0)^2$
Now $\hat\beta_0 = m = \hat\mu_Y$.
25
How to get $\beta_0$, $\beta_1$? OLS
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)} = \dfrac{\widehat{cov}(Y,X)}{\widehat{var}(X)}$
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$: predicted value ("ennuste")
$\hat u_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$: prediction error ("ennustevirhe")
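A minimal sketch of these formulas in Stata, assuming the beer data are in memory; regress should reproduce the same numbers:

* OLS "by hand": b1 = cov(Y,X)/var(X), b0 = Ybar - b1*Xbar
quietly correlate lnm12 lnm24, covariance
scalar b1 = r(cov_12) / r(Var_2)    // slope
quietly summarize lnm12
scalar ybar = r(mean)
quietly summarize lnm24
scalar b0 = ybar - b1 * r(mean)     // intercept
display "b0 = " b0 "    b1 = " b1
regress lnm12 lnm24                 // should match b0 and b1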
26
Back to beer...
(Scatter plot of the points $(\ln m24_i, \ln m12_i)$ with the fitted line $\ln m12 = \hat\beta_0 + \hat\beta_1 \ln m24$; the residual $\hat u_i = \ln m12_i - (\hat\beta_0 + \hat\beta_1 \ln m24_i)$ is the vertical distance of point $i$ from the line.)
27
Back to beer…

regr miller12 miller24
estimates store lin_est
regr lnm12 lnm24
estimates store ln_est
regr m12_prom m24_prom
estimates store pr_est
estimates table lin_est ln_est pr_est, b(%7.3f) se(%7.3f) p(%7.3f) stats(r2)
28
Back to beer...

Dependent variable:   miller12    lnm12      m12_prom
miller24              0.214
                     (0.095)
                     [0.025]
lnm24                             0.220
                                 (0.098)
                                 [0.026]
m24_prom                                     0.111
                                            (0.062)
                                            [0.073]
constant              0.749      -0.041      0.220
                     (0.090)    (0.009)    (0.047)
                     [0.000]    [0.000]    [0.000]
R²                    0.023       0.023      0.015

Coefficient / parameter estimate (kerroin), (standard error / keskivirhe), [p-value / p-arvo].
29
What are these numbers? How good is the model's fit? How much does it explain? Of what? Of the variation in Y.
30
What are these numbers?
$ESS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$: explained sum of squares (selitetty neliösumma)
$TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2$: total sum of squares (kokonaisneliösumma)
$RSS = \sum_{i=1}^{n} \hat u_i^2$: residual sum of squares (jäännöstermin neliösumma)
31
What are these numbers?
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}, \qquad R^2 \in [0,1]$
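These sums of squares are available after any regress call; a small sketch that recovers R² both ways:

* R2 from the sums of squares (e(mss) = ESS, e(rss) = RSS)
quietly regress lnm12 lnm24
display "ESS/TSS     = " e(mss) / (e(mss) + e(rss))
display "1 - RSS/TSS = " 1 - e(rss) / (e(mss) + e(rss))
display "e(r2)       = " e(r2)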
32
What are these numbers?

Dependent variable:   miller12   lnm12   m12_prom
R²                    0.023      0.023   0.015
33
What are these numbers?

Dependent variable: miller12
miller24    0.214    (0.095)    [0.025]
constant    0.749    (0.090)    [0.000]
R²          0.023
34
What are these numbers?

Dependent variable: lnm12
lnm24       0.220    (0.098)    [0.026]
constant   -0.041    (0.009)    [0.000]
R²          0.023
35
What are these numbers?

Dependent variable: m12_prom
m24_prom    0.111    (0.062)    [0.073]
constant    0.220    (0.047)    [0.000]
R²          0.015
36
What are these numbers? So economic interpretation & significance are of key importance. What about statistical significance? Under assumptions that we'll discuss in a moment, $\hat\beta_0$, $\hat\beta_1$ are normally distributed with a known mean and variance.
37
What are these numbers? $\hat\beta_0$, $\hat\beta_1$ are unbiased, consistent, and efficient (the last with an extra assumption), under a set of assumptions.
38
Let's have a look at $\ln m12 = \beta_0 + \beta_1 \ln m24 + u$.
Let's vary the sample size. How do we do this?
39
(Monte Carlo) simulation
Let's use artificial data that has "appealing" features. Artificial data = ask the computer to generate it; the researcher chooses what the data look like. Monte Carlo simulation = estimate a statistical model S times on (repeatedly generated) artificial data and look at the means and distributions of the parameter estimates.
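A minimal Monte Carlo sketch along these lines (the DGP constants are illustrative assumptions, loosely based on the beer estimates):

* One draw = generate artificial data, estimate, return the slope
capture program drop onedraw
program define onedraw, rclass
    clear
    set obs 221
    generate x = rnormal()
    generate y = -0.041 + 0.220 * x + rnormal()
    regress y x
    return scalar b1 = _b[x]
end
* Repeat S = 1000 times and look at the distribution of b1
simulate b1 = r(b1), reps(1000) nodots: onedraw
summarize b1    // the mean should be close to the true 0.220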
40
(Monte Carlo) simulation
e1 = rnormal() × (standard deviation of lnm24)
41
(Diagram: e1 → lnm24; lnm24 and u → lnm12.)
42
(Monte Carlo) simulation
e1 = rnormal() × (standard deviation of lnm24)
ln m24 = (mean of lnm24) + e1
44
(Monte Carlo) simulation
e1 = rnormal() × (standard deviation of lnm24)
ln m24 = (mean of lnm24) + e1
u = rnormal() × (standard deviation of lnm12, after the variation in lnm24 is taken into account)
46
(Monte Carlo) simulation
e1 = rnormal() × (standard deviation of lnm24)
ln m24 = (mean of lnm24) + e1
u = rnormal() × (residual standard deviation of lnm12)
$\beta_0$ = (the true intercept), $\beta_1$ = (the true slope)
$\ln m12 = \beta_0 + \beta_1 \ln m24 + u$
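In Stata this DGP might look as follows; the slide's exact constants are not all shown, so the means and standard deviations below are assumptions read off the descriptive statistics and estimates above:

* Artificial beer data (all numeric values are assumptions)
clear
set obs 221
set seed 42
generate e1    = rnormal() * 0.06             // sd of lnm24 (assumed)
generate lnm24 = -0.06 + e1                   // mean of lnm24 (from descriptives)
generate u     = rnormal() * 0.09             // residual sd (assumed)
generate lnm12 = -0.041 + 0.220 * lnm24 + u   // beta0 and beta1 from the estimates
regress lnm12 lnm24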
48
Effect of sample size: different sample sizes

Variable     truth    n=10     100      1000     10K      100K     1M       10M
m24          0.220    0.668    0.256    0.242    0.214    0.217    0.219
  (s.e.)             (0.392)  (0.118)  (0.039)  (0.012)  (0.004)  (0.001)
  [p]                [0.127]  [0.032]  [0.000]  [0.000]  [0.000]  [0.000]
constant    -0.041    0.05    -0.038   -0.028            -0.042
  (s.e.)             (0.155)  (0.031)  (0.01)   (0.003)  (0.001)
  [p]                [0.757]  [0.223]  [0.005]
R²                    0.266    0.046    0.037    0.029    0.029    0.03
55
Effect of sample size
Increasing the sample size:
- brings the coefficients closer to their true values;
- reduces the standard errors of the coefficients.
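A sketch of how such a table can be produced, reusing the assumed DGP constants from the simulation sketch above:

* Re-estimate the same DGP at increasing sample sizes
foreach n of numlist 10 100 1000 10000 100000 1000000 {
    quietly {
        clear
        set obs `n'
        generate lnm24 = -0.06 + rnormal() * 0.06
        generate lnm12 = -0.041 + 0.220 * lnm24 + rnormal() * 0.09
        regress lnm12 lnm24
    }
    display "n = " %9.0g `n' "  b1 = " %6.3f _b[lnm24] ///
            "  se = " %6.4f _se[lnm24]
}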
56
OLS assumptions
It is important to understand that any mathematical model of an economic question rests on assumptions. So does a statistical model, and the same applies to an econometric model.
57
OLS assumptions
One needs to understand the assumptions that allow a particular interpretation of the results. It is crucial to understand the assumptions and their implications, and to form an opinion on / test the validity of the assumptions and/or the robustness of the results to them.
58
OLS assumption #1
$E(u \mid X) = 0$. Implies that $u$ and $X$ are uncorrelated: if $E(u \mid X) = 0$, then $cov(u, X) = 0$. Not the other way round (as correlation is about a linear relationship only).
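Spelled out, the first claim is a one-line law-of-iterated-expectations argument:

$cov(u,X) = E(uX) - E(u)\,E(X) = E\big(X \cdot E(u \mid X)\big) - E\big(E(u \mid X)\big) \cdot E(X) = E(X \cdot 0) - 0 \cdot E(X) = 0.$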
59
Back to beer...
(Scatter plot of the points $(\ln m24_i, \ln m12_i)$ with the fitted line $\ln m12 = \hat\beta_0 + \hat\beta_1 \ln m24$ and the residuals $\hat u_i = \ln m12_i - (\hat\beta_0 + \hat\beta_1 \ln m24_i)$.)
60
OLS assumption #2
$(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
The same concept as before, but now over the joint distribution of two variables. Cases to think about: experiments where X is chosen; time series.
61
OLS assumption #3
$X_i$ and $Y_i$ have nonzero finite fourth moments, i.e., they have finite kurtosis. This is needed to ensure that the estimates and their standard errors come from a normal distribution (the 4th moment is, roughly, the variance of the variance). It means that large outliers are (extremely) unlikely.
62
OLS assumption #4 (auxiliary)
$u_i$ is homoscedastic (as opposed to heteroscedastic). Means $var(u_i \mid X_i = x) = \sigma^2$ for $i = 1, \ldots, n$. Alternative: $var(u_i \mid X_i = x) = \sigma_i^2$.
63
The Gauss-Markov Theorem
If A.1 – A.4 hold, then OLS is BLUE (Best Linear conditionally Unbiased Estimator).
64
Why these assumptions?
65
Assumption #4: homoscedasticity
66
What to assume about the variance of u?
In practice, data almost always have (or lead to) heteroscedastic errors. There are easy and efficient ways to correct for heteroscedasticity; the modern default is to use (heteroscedasticity-)robust standard errors. A wrong assumption about the variance of the error term biases the standard errors, not the coefficients.
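In Stata this is one option away; the coefficients are identical to plain OLS, only the standard errors change:

* Heteroscedasticity-robust standard errors (modern default)
regress lnm12 lnm24, vce(robust)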
67
Let's illustrate
The data-generating process: $X = 2 + rnormal()$
Case #1: $u = rnormal()$
Case #2: $u_{het} = rnormal() \times (1 + 0.15 \times X)$
Notice: both satisfy $E(u \mid X) = 0$.
68
Let's illustrate
$Y = 1 + X + u$
$Y_{het} = 1 + X + u_{het}$
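A sketch of the two cases. The slides write rnormal() in both error terms, but the correlations reported below (corr(u, u_het) ≈ 0.99) suggest the same draw is reused, so this sketch recycles u:

* Homoscedastic vs. heteroscedastic errors
clear
set obs 10000
set seed 1
generate X     = 2 + rnormal()
generate u     = rnormal()
generate u_het = u * (1 + 0.15 * X)    // error variance grows with X
generate Y     = 1 + X + u
generate Y_het = 1 + X + u_het
regress Y_het X                 // classical standard errors
regress Y_het X, vce(robust)    // robust standard errors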
69
Data

Variable    Obs      Mean     Std. Dev.   Min       Max
X           10000    1.9961   0.9925     -1.6111    5.6107
u           10000             0.9999                4.5606
u_het       10000             1.3083                5.3741
Y           10000    2.9943   1.4091                8.5691
Y_het       10000    2.9937   1.6431
70
Correlations

           X        u        u_het    Y        Y_het
X          1
u          0.0003   1
u_het      0.0011   0.9934   1
Y          0.7046   0.7098   0.7057   1
Y_het      0.605    0.7912   0.7969   0.9876   1
71
Comparison
72
Let’s illustrate further
$u_{het} = rnormal() \times (1 + a \times X)$. Let $a = 1, \ldots, 10$.
$Y_{het} = 1 + X + u_{het}$
Notice: the constant and the coefficient of X both = 1.
73
Variable   het_0    het_1    het_2    het_3    het_4    het_5    het_6    het_7    het_8    het_9    het_10
X          1.0000   1.0080   1.0160   1.0250   1.0330   1.0410   1.0490   1.0570   1.0650   1.0730   1.0810
 (s.e.)    0.0100   0.0320   0.0540   0.0770   0.0990   0.1220   0.1440   0.1670   0.1890   0.2120   0.2340
 [p]       0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
Const      0.9980   0.9780   0.9590   0.9390   0.9200   0.9000   0.8800   0.8610   0.8410   0.8220   0.8020
 (s.e.)    0.0220   0.0710   0.1210   0.1710   0.2210   0.2710   0.3210   0.3720   0.4220   0.4720   0.5220
 [p]       0.0000   0.0000   0.0000   0.0000   0.0000   0.0010   0.0060   0.0210   0.0460   0.0820   0.1240
r2         0.4960   0.0910   0.0340   0.0180   0.0110   0.0070   0.0050   0.0040   0.0030   0.0020   0.0020
77
Assumption #3: no (large) outliers
(Large) outliers may lead to a biased estimate. The difficulty, of course, is to determine what is large. For illustration, let's change the value of X for one observation to 50. Recall: E[X] = 2, var[X] = 1, min[X] = -1.6, max[X] = 5.6. The value we replace with 50 is thus 48 standard deviations above the mean.
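A sketch of the experiment, with X and Y as generated in the heteroscedasticity illustration above:

* Inject a single large outlier into X and re-estimate
quietly regress Y X     // baseline on clean data
replace X = 50 in 1     // overwrite one observation of X only
regress Y X             // slope and R2 change substantially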
78
Assumption #3: no (large) outliers

Outlier value:   real     50       100      1000     5000     10000
X                1.000   -0.017    0.005    0.283    0.677    0.803
  (s.e.)         0.010    0.029    0.030    0.023    0.014
  [p]            0.000    0.552    0.857    0.000    0.000
const            0.998    2.894             2.367    1.633    1.387
  (s.e.)         0.022    0.211    0.164    0.061    0.032
r2               0.496    0.007    0.007    0.134    0.331    0.395
84
(Figures: original data; original data with the scale of the X-axis changed; outlier data.)
85
OLS assumption #2
$(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
If this is not true, then we are not taking a truly random sample from the whole population. The effect depends on how the assumption is violated. Illustration: pick $(X_i, Y_i)$ only if $X_i > 3$. Recall: E[X] = 2.
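The illustration itself is a one-line restriction on the estimation sample:

* Estimate on a truncated (non-random) subsample
regress Y X             // full sample, n = 10000
regress Y X if X > 3    // keep only observations with X > 3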
86
Non-random sample

Variable    real     X>2.5    X>3      X>3.5
X           1.0000   1.0110   1.0170   1.0890
  (s.e.)    0.0100   0.0360   0.0600   0.1080
  [p]       0.0000   0.0000   0.0000   0.0000
const       0.9980   0.9590   0.9330   0.6460
  (s.e.)    0.0220   0.1140   0.2150   0.4250
  [p]       0.0000   0.0000   0.0000   0.1290
r2          0.4960   0.2070   0.1540   0.1360
nobs        10000    3031     1554     650
90
OLS assumption #2
Notice: all the estimates in the table are correct for the (sub)sample they are based on; the question is how to interpret the results. Notice too: all the coefficients of X are "close" to 1 (compare to the s.e.). Why? Think of the data-generating process outlined earlier.
91
OLS assumption #1
$E(u \mid X) = 0$. Implies that $u$ and $X$ are uncorrelated. Not the other way round (as correlation is about a linear relationship only).
92
What would a violation of A#1 mean?
$cov(u, X) \neq 0$. So what? Let's find out.
93
Let’s make X and u positively correlated.
Let 𝑐𝑜𝑟𝑟(𝑋,𝑢) go from 0 to 0.9 in steps of 0.1. Re-estimate the model each time. What would you expect to happen?
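One way to build such an error term is to mix the standardized X with fresh noise; a sketch with corr(X,u) = 0.5 (the mixing formula is an assumption, not necessarily the slides' method):

* Error term with corr(X,u) = rho
clear
set obs 10000
set seed 2
local rho = 0.5
generate X = 2 + rnormal()               // X - 2 is standard normal
generate u = `rho' * (X - 2) + sqrt(1 - `rho'^2) * rnormal()
generate Y = 1 + X + u
regress Y X    // slope is biased upwards, roughly 1 + rho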
94
Positive correlation between X and u

corr(X,u)   0        0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9      1
x           1.0000   1.0020   1.1010   1.2070   1.3030   1.3930   1.4900   1.5940   1.7070   1.8000   1.9000
_cons       0.9980   0.9940   0.7940   0.5960   0.3750   0.2060   0.0140
r2          0.4960   0.5040   0.5480            0.6480   0.6910   0.7460   0.7960   0.8500   0.9030   0.9490

The standard errors shrink as corr(X,u) rises (x: from 0.0100 down to 0.0040; _cons: from 0.0220 down to 0.0130). The p-values of x are 0.0000 throughout; the one insignificant estimate is the near-zero _cons 0.0140 (p = 0.4540).
98
Let’s make X and u negatively correlated.
Let 𝑐𝑜𝑟𝑟(𝑋,𝑢) go from 0 to -0.9 in steps of 0.1. Re-estimate the model each time. What would you expect to happen?
99
Negative correlation between X and u
corr(X,u)   0        -0.1     -0.2     -0.3     -0.4     -0.5     -0.6     -0.7     -0.8     -0.9     -1
x           1.0000   0.9950   0.9190   0.8060   0.7080   0.5890   0.5010   0.3970   0.2910   0.2000   0.1050
_cons       0.9980   1.0280   1.1830   1.3970   1.5710   1.8320   2.0100   2.2010   2.4140   2.5980   2.7830
r2          0.4960   0.5020   0.4620   0.4130   0.3510   0.2900   0.2510   0.2030   0.1430   0.1010   0.0540

The standard errors (x: from 0.0100 down to 0.0040; _cons: from 0.0220 down to 0.0130) and the p-values of x (0.0000 throughout) behave as in the positive-correlation case.
100
Violation of A#1 leads to biased coefficient estimates.
The bias increases in a systematic fashion. It is important to understand how this happens.