Lectures 3&4 Univariate regression


The effect of X on Y Are female bosses better? Does having a PhD (in science) help to innovate? Is website design A better than design B in terms of sales? Do people of different ages buy different gadgets from Elisa? Are promotions of substitute products of the same firm run at the same time?

Let's look at Miller's beer. Maxim Sinitsyn (2015), "Managing Price Promotions Within a Product Line," Marketing Science, published online in Articles in Advance, 12 Oct 2015. http://pubsonline.informs.org/doi/abs/10.1287/mksc.2015.0938

Two products – m12 and m24. Data over 221 weeks on Miller Lite 12/12 oz ("m12") and Miller Lite 24/12 oz ("m24"). Information on the prices of m12 and m24 relative to their "regular price", in %. Define a promotion as "price < regular price".

Variables
m12 = price of Miller Lite 12/12 oz relative to regular price
m24 = price of Miller Lite 24/12 oz relative to regular price
m12_prom = 0 if no promotion, 1 if promotion (dummy variable) for m12
m24_prom = 0 if no promotion, 1 if promotion (dummy variable) for m24
lnm12 = natural logarithm of m12
lnm24 = natural logarithm of m24
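A minimal Stata sketch of how these variables could be constructed, assuming the relative prices are already loaded under the names miller12 and miller24 used in the output below:

gen m12_prom = (miller12 < 1)    // promotion dummy: price strictly below regular price
gen m24_prom = (miller24 < 1)
gen lnm12 = ln(miller12)         // natural logarithm of the relative price
gen lnm24 = ln(miller24)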

Descriptive statistics (sum *)

Variable   Obs   Mean    Std. Dev.   Min     Max
miller12   221    0.95   0.08         0.74   1.00
miller24   221    0.94   0.06         0.79   1.00
m12_prom   221    0.29   0.45         0.00   1.00
m24_prom   221    0.59   0.49         0.00   1.00
lnm12      221   -0.05   0.09        -0.30   0.00
lnm24      221   -0.06   0.06        -0.24   0.00

Frequency of promotions

twoway kdensity miller12 || kdensity miller24, ///
    title("Distribution of price of m12 and m24") ///
    legend(lab(1 "m12") lab(2 "m24")) ///
    xtitle("price")

Conditional descriptive statistics (sum miller* *prom if miller12 < 1, and likewise for miller24)

If miller12 < 1:
Variable   Obs   Mean   Std. Dev.   Min    Max
miller12   63    0.83   0.05        0.74   0.97
miller24   63    0.93   0.06        0.79   1.00
m12_prom   63    1.00   0.00        1.00   1.00
m24_prom   63    0.68   0.47        0.00   1.00

If miller24 < 1:
Variable   Obs   Mean   Std. Dev.   Min    Max
miller12   130   0.94   0.09        0.74   1.00
miller24   130   0.90   0.04        0.79   0.99
m12_prom   130   0.33   0.47        0.00   1.00
m24_prom   130   1.00   0.00        1.00   1.00

Joint distribution of promotions (tab m12_prom m24_prom, cell; counts with cell percentages)

                m24_prom = 0    m24_prom = 1    Total
m12_prom = 0    71 (32.13%)     87 (39.37%)     158 (71.49%)
m12_prom = 1    20 (9.05%)      43 (19.46%)     63 (28.51%)
Total           91 (41.18%)     130 (58.82%)    221 (100%)

Modeling Q1: what is the object you want to model (”explain”)? Let’s call this Y. Q2: what is the object whose effect on Y you want to understand? Let’s call this X.

Modeling Where do these (decisions) come from? Theory. What is theory? A mathematical model. A conceptualization of existing qualitative knowledge. A conceptualization of existing quantitative knowledge.

Modeling the relationship between m12 and m24 prices? m12 = f(m24); m12_prom = f(m24_prom); lnm12 = f(lnm24). In general: Y = f(X). What do we know about f(X)? How can we learn about it?

Quick aside - correlation: $corr(Y,X) = \dfrac{cov(Y,X)}{\sqrt{var(X)\,var(Y)}}$

More structure - linear: $Y = \beta_0 + \beta_1 X$. This is the so-called population regression line (populaatio regressio). Y = dependent variable (vastemuuttuja) / endogenous variable. X = independent variable (selittävä muuttuja) / exogenous variable / regressor. $\beta_0$, $\beta_1$ = parameters of the model.

Parameters $Y = \beta_0 + \beta_1 X$. $\beta_0$, $\beta_1$: interpretation? Intercept and slope. What is now assumed about what can influence Y?

How to allow for other factors? $Y = f(X,u) = \beta_0 + \beta_1 X + u$. u = error term / residual (virhetermi / jäännöstermi). Why such a name? It shows how much our model misses in terms of determining Y. It measures those things that 1) affect Y and 2) we don't observe.

What is known about u? How large should the error be on average? 0. Why? $E[u \mid X] = 0$.

How to get $\beta_0$ and $\beta_1$?

How to get $\beta_0$ and $\beta_1$? OLS: Ordinary Least Squares (pienimmän neliösumman menetelmä).
$Y = \beta_0 + \beta_1 X + u$
$E[Y - (\beta_0 + \beta_1 X) \mid X] = E[u \mid X] = 0$
$\min_{\beta_0,\beta_1} \sum_{i=1}^{n} \big(Y_i - (\beta_0 + \beta_1 X_i)\big)^2$

How to get $\beta_0$ and $\beta_1$? OLS. Notice the link to estimation of the mean: set $\beta_1 = 0$. The problem becomes $\min_{\beta_0} \sum_{i=1}^{n} (Y_i - \beta_0)^2$, and now $\hat\beta_0 = m = \hat\mu_Y$ (the sample mean).

How to get $\beta_0$ and $\beta_1$? OLS.
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} = \dfrac{\widehat{cov}(Y,X)}{\widehat{var}(X)}$
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ = predicted value ("ennuste")
$\hat u_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$ = prediction error ("ennustevirhe")
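These formulas drop out of the first-order conditions of the least-squares problem; a short derivation:

\[
\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \big(Y_i - \beta_0 - \beta_1 X_i\big)^2 = 0
\;\Rightarrow\; \hat\beta_0 = \bar Y - \hat\beta_1 \bar X ,
\]
\[
\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} \big(Y_i - \beta_0 - \beta_1 X_i\big)^2 = 0
\;\Rightarrow\; \sum_{i=1}^{n} X_i \big(Y_i - \hat\beta_0 - \hat\beta_1 X_i\big) = 0 ,
\]
and substituting the first condition into the second gives
\[
\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} .
\]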

Back to beer... [Scatter of the points $(lnm24_i, lnm12_i)$ with the fitted line $\widehat{lnm12} = \hat\beta_0 + \hat\beta_1\, lnm24$; the vertical distance from a point to the line is the residual $\hat u_i = lnm12_i - (\hat\beta_0 + \hat\beta_1\, lnm24_i)$.]

Back to beer…
regr miller12 miller24
estimates store lin_est
regr lnm12 lnm24
estimates store ln_est
regr m12_prom m24_prom
estimates store pr_est
estimates table lin_est ln_est pr_est, b(%7.3f) se(%7.3f) p(%7.3f) stats(r2)

Back to beer...

Dependent variable:   miller12   lnm12     m12_prom
miller24              0.214
                      (0.095)
                      [0.025]
lnm24                            0.220
                                 (0.098)
                                 [0.026]
m24_prom                                   0.111
                                           (0.062)
                                           [0.073]
constant              0.749      -0.041    0.220
                      (0.090)    (0.009)   (0.047)
                      [0.000]    [0.000]   [0.000]
R²                    0.023      0.023     0.015

Coefficient / parameter estimate (kerroin); standard error (keskivirhe) in parentheses; p-value (p-arvo) in brackets.

What are these numbers? How good is the model's fit? How much does it explain? Of what…? Of the variation in Y.

What are these numbers?
$ESS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$ = explained sum of squares (selitetty neliösumma)
$TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2$ = total sum of squares (kokonaisneliösumma)
$RSS = \sum_{i=1}^{n} \hat u_i^2$ = residual sum of squares (jäännöstermin neliösumma)

What are these numbers? $R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$, with $R^2 \in [0,1]$.
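A small Stata sketch that reproduces R² from these definitions for the lnm12 regression (yhat, uhat, and the *_i variables are illustrative names, not from the slides):

quietly regress lnm12 lnm24
predict yhat, xb                      // fitted values
predict uhat, residuals               // residuals
quietly summarize lnm12
scalar ybar = r(mean)
gen double ess_i = (yhat - ybar)^2
gen double tss_i = (lnm12 - ybar)^2
gen double rss_i = uhat^2
quietly summarize ess_i
scalar ESS = r(sum)
quietly summarize tss_i
scalar TSS = r(sum)
quietly summarize rss_i
scalar RSS = r(sum)
display "R2 = " ESS/TSS " = " 1 - RSS/TSS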

What are these numbers?
Dependent variable:   miller12   lnm12   m12_prom
R²                    0.023      0.023   0.015

What are these numbers?
Dependent variable: miller12
miller24   0.214 (0.095) [0.025]
constant   0.749 (0.090) [0.000]
R²         0.023

What are these numbers?
Dependent variable: lnm12
lnm24      0.220 (0.098) [0.026]
constant   -0.041 (0.009) [0.000]
R²         0.023

What are these numbers?
Dependent variable: m12_prom
m24_prom   0.111 (0.062) [0.073]
constant   0.220 (0.047) [0.000]
R²         0.015

What are these numbers? So economic interpretation & significance are of key importance. What about statistical significance? Under assumptions that we'll discuss in a moment, $\hat\beta_0$ and $\hat\beta_1$ are normally distributed with a known mean and variance.

What are these numbers? Under a set of assumptions, $\hat\beta_0$ and $\hat\beta_1$ are unbiased and consistent, and (with an extra assumption) efficient.

Let's have a look at $lnm12 = \beta_0 + \beta_1\, lnm24 + u$. Let's vary the sample size. How do we do this?

(Monte Carlo) simulation Let's use artificial data that has "appealing" features. Artificial data = ask the computer to generate it → the researcher chooses what the data looks like. Monte Carlo simulation = estimate a statistical model S times on artificial data, and look at the means and distributions of the parameters.

(Monte Carlo) simulation Step 1: e1 = rnormal() * 0.057871, where 0.057871 = standard deviation of lnm24.

e1           lnm24   u   lnm12
0.0516265
-0.2681188
0.5102922
-0.2609462
-0.132724
-0.0660694
-0.7103841
-0.5395586
-0.3153001
0.0673407

(Monte Carlo) simulation Step 2: lnm24 = −.0616435 + e1, where −.0616435 = mean of lnm24.

e1           lnm24        u   lnm12
0.0516265    -0.010017
-0.2681188   -0.3297623
0.5102922    0.4486487
-0.2609462   -0.3225898
-0.132724    -0.1943675
-0.0660694   -0.1277129
-0.7103841   -0.7720276
-0.5395586   -0.6012021
-0.3153001   -0.3769436
0.0673407    0.0056972

(Monte Carlo) simulation Step 3: u = rnormal() * .0908283, where .0908283 = standard deviation of lnm12 (after variation in lnm24 is taken into account).

e1           lnm24        u            lnm12
0.0516265    -0.010017    0.2623541
-0.2681188   -0.3297623   -0.6086857
0.5102922    0.4486487    0.0321724
-0.2609462   -0.3225898   -0.0425321
-0.132724    -0.1943675   -0.1827693
-0.0660694   -0.1277129   0.8126694
-0.7103841   -0.7720276   -0.4009946
-0.5395586   -0.6012021   0.1826611
-0.3153001   -0.3769436   -0.3657082
0.0673407    0.0056972    0.194316

(Monte Carlo) simulation The full data generating process:
e1 = rnormal() * 0.057871
lnm24 = −.0616435 + e1
u = rnormal() * .0908283
lnm12 = $\beta_0$ + $\beta_1$ lnm24 + u, with $\beta_0$ = −.0410283 and $\beta_1$ = .2196687

e1           lnm24        u            lnm12
0.0516265    -0.010017    0.2623541    0.2191254
-0.2681188   -0.3297623   -0.6086857   -0.7221525
0.5102922    0.4486487    0.0321724    0.0896982
-0.2609462   -0.3225898   -0.0425321   -0.1544232
-0.132724    -0.1943675   -0.1827693   -0.266494
-0.0660694   -0.1277129   0.8126694    0.7435866
-0.7103841   -0.7720276   -0.4009946   -0.6116132
-0.5395586   -0.6012021   0.1826611    0.0095676
-0.3153001   -0.3769436   -0.3657082   -0.4895392
0.0673407    0.0056972    0.194316     0.1545392
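Putting the steps together, a minimal Stata sketch of this data generating process (the seed is an illustrative assumption; changing set obs reproduces the sample-size experiment below):

clear
set seed 12345                      // illustrative seed
set obs 221                         // vary this: 10, 100, 1000, ...
gen e1    = rnormal() * 0.057871    // sd of lnm24
gen lnm24 = -0.0616435 + e1         // mean of lnm24
gen u     = rnormal() * 0.0908283   // residual sd of lnm12
gen lnm12 = -0.0410283 + 0.2196687*lnm24 + u
regress lnm12 lnm24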

Effect of sample size (dependent variable lnm12; coefficient, standard error, p-value; "truth" = the DGP values; · = not reported):

            truth    10 obs   100      1000     10K      100K     1M       10M
m24         0.220    0.668    0.256    0.242    0.214    0.217    0.219    0.220
 (se)                0.392    0.118    0.039    0.012    0.004    0.001    0.000
 [p]                 0.127    0.032    0.000    0.000    0.000    0.000    0.000
constant    -0.041   0.05     -0.038   -0.028   -0.041   -0.042   ·        ·
 (se)                0.155    0.031    0.01     0.003    0.001    ·        ·
 [p]                 0.757    0.223    0.005    0.000    0.000    ·        ·
R²                   0.266    0.046    0.037    0.029    0.03     ·        ·

Effect of sample size Increasing the sample size brings the coefficients closer to their true values and reduces the standard errors of the coefficients.

OLS assumptions It is important to understand that any mathematical model of an economic question rests on assumptions. So does a statistical model → the same applies to an econometric model.

OLS assumptions One needs to understand the assumptions that allow a particular interpretation of the results. It is crucial to understand the assumptions & their implications, and to form an opinion about / test the validity of the assumptions and/or the robustness of the results to them.

OLS assumption #1 $E[u \mid X] = 0$. This implies that u and X are uncorrelated: if $E[u \mid X] = 0$, then $cov(u,X) = 0$. Not the other way round (as correlation is about a linear relationship only).
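The implication follows from the law of iterated expectations:

\[
\mathrm{cov}(u,X) = E[uX] - E[u]\,E[X]
= E\big[X\,E[u \mid X]\big] - E\big[E[u \mid X]\big]\,E[X] = 0 - 0 = 0 .
\]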

Back to beer... [Scatter of the points $(lnm24_i, lnm12_i)$ with the fitted line $\widehat{lnm12} = \hat\beta_0 + \hat\beta_1\, lnm24$; the vertical distance from a point to the line is the residual $\hat u_i = lnm12_i - (\hat\beta_0 + \hat\beta_1\, lnm24_i)$.]

OLS assumption #2 $(X_i, Y_i)$, $i = 1, \dots, n$ are i.i.d. The same concept as before, but now over the joint distribution of two variables. Potential exceptions: experiments where X is chosen, and time series.

OLS assumption #3 $X_i$ and $Y_i$ have nonzero finite fourth moments, i.e., they have finite kurtosis. This is needed to ensure that the standard errors come from a normal distribution (the 4th moment is roughly the variance of the variance). It means that large outliers are (extremely) unlikely.

OLS assumption #4 (auxiliary) $u_i$ is homoscedastic (as opposed to heteroscedastic). This means $var(u_i \mid X_i = x) = \sigma^2$ for $i = 1, \dots, n$. The alternative: $var(u_i \mid X_i = x) = \sigma_i^2$.

The Gauss-Markov Theorem If A.1 – A.4 hold, then OLS is BLUE (Best Linear conditionally Unbiased Estimator).

Why these assumptions?

Assumption #4: homoscedasticity

What to assume about the variance of u? In practice, data almost always have (or lead to) heteroscedastic errors → there are easy and efficient ways to correct for heteroscedasticity. The modern default is to use (heteroscedasticity-)robust standard errors. A wrong assumption about the variance of the error term biases the standard errors, not the coefficients.
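In Stata the correction is a single option on the estimation command; a sketch with the beer regression from earlier:

regress lnm12 lnm24                 // classical standard errors (assume homoscedasticity)
regress lnm12 lnm24, vce(robust)    // heteroscedasticity-robust standard errors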

Let's illustrate The data generating process: X = 2 + rnormal(). Case #1: u = rnormal(). Case #2: u_het = rnormal() × (1 + 0.15 × X). Notice: both satisfy $E[u \mid X] = 0$.

Let's illustrate Y = 1 + X + u and Y_het = 1 + X + u_het.
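A Stata sketch of the two cases (the seed is an illustrative assumption; the correlation table below suggests that the same normal draw is reused for u and u_het):

clear
set seed 12345                  // illustrative seed
set obs 10000
gen x     = 2 + rnormal()
gen u     = rnormal()           // case #1: homoscedastic error
gen u_het = u * (1 + 0.15*x)    // case #2: error spread grows with x
gen y     = 1 + x + u
gen y_het = 1 + x + u_het
regress y_het x                 // classical standard errors
regress y_het x, vce(robust)    // robust standard errors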

Data

Variable   Obs     Mean      Std. Dev.   Min       Max
X          10000    1.9961   0.9925      -1.6111    5.6107
u          10000   -0.0018   0.9999      -4.1546    4.5606
u_het      10000   -0.0023   1.3083      -4.7413    5.3741
Y          10000    2.9943   1.4091      -2.6366    8.5691
Y_het      10000    2.9937   1.6431      -2.9594   10.4786

Correlations

         X        u        u_het    Y        Y_het
X        1
u        0.0003   1
u_het    0.0011   0.9934   1
Y        0.7046   0.7098   0.7057   1
Y_het    0.605    0.7912   0.7969   0.9876   1

Comparison

Let's illustrate further u_het = rnormal() × (1 + a × X), letting a = 0, 1, …, 10, and Y_het = 1 + X + u_het. Notice: the constant and the coefficient of X both = 1.

Results as a increases (coefficient, standard error, p-value for X and the constant, plus R²):

          X        (se)     [p]      Const    (se)     [p]      R²
het_0     1.0000   0.0100   0.0000   0.9980   0.0220   0.0000   0.4960
het_1     1.0080   0.0320   0.0000   0.9780   0.0710   0.0000   0.0910
het_2     1.0160   0.0540   0.0000   0.9590   0.1210   0.0000   0.0340
het_3     1.0250   0.0770   0.0000   0.9390   0.1710   0.0000   0.0180
het_4     1.0330   0.0990   0.0000   0.9200   0.2210   0.0000   0.0110
het_5     1.0410   0.1220   0.0000   0.9000   0.2710   0.0010   0.0070
het_6     1.0490   0.1440   0.0000   0.8800   0.3210   0.0060   0.0050
het_7     1.0570   0.1670   0.0000   0.8610   0.3720   0.0210   0.0040
het_8     1.0650   0.1890   0.0000   0.8410   0.4220   0.0460   0.0030
het_9     1.0730   0.2120   0.0000   0.8220   0.4720   0.0820   0.0020
het_10    1.0810   0.2340   0.0000   0.8020   0.5220   0.1240   0.0020

Assumption #3: no (large) outliers (Large) outliers may lead to a biased estimate. The difficulty is of course to determine what is large. For illustration, let's change the value of X for one obs to 50. Recall: E[X] = 2, var[X] = 1, min[X] = -1.6, max[X] = 5.6. The value we replace with 50 is -0.084.
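A sketch of the experiment, reusing the simulated (x, y) data from above (the observation index is an illustrative assumption):

gen x_out = x
replace x_out = 50 in 1         // contaminate a single observation
regress y x_out in 1/100        // small sample: the outlier dominates
regress y x_out                 // full sample: the outlier matters less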

Assumption #3: no (large) outliers. Results when the sample containing the outlier grows; columns are sample sizes ("real" = the original 10,000 obs without the outlier; · = not reported):

Variable   real     50       100      1000     5000     10000
X          1.000    -0.017   0.005    0.283    0.677    0.803
 (se)      0.010    0.029    0.030    0.023    0.014    ·
 [p]       0.000    0.552    0.857    0.000    0.000    0.000
const      0.998    2.894    2.367    ·        1.633    1.387
 (se)      0.022    0.211    0.164    0.061    0.032    ·
 [p]       0.000    0.000    0.000    0.000    0.000    0.000
R²         0.496    0.007    ·        0.134    0.331    0.395

[Figures: original data; original data with the scale of the X-axis changed; outlier data]

OLS assumption #2 $(X_i, Y_i)$, $i = 1, \dots, n$ are i.i.d. If this is not true, then we are not taking a truly random sample from the whole population. The effect depends on how the assumption is violated. Illustration: pick $(X_i, Y_i)$ only if $X_i$ exceeds a cutoff (2.5, 3, 3.5). Recall: E[X] = 2.
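A sketch, again reusing the simulated data:

regress y x if x > 2.5    // keep only observations with large x
regress y x if x > 3
regress y x if x > 3.5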

Non-random sample

Variable   real     X>2.5    X>3      X>3.5
X          1.0000   1.0110   1.0170   1.0890
 (se)      0.0100   0.0360   0.0600   0.1080
 [p]       0.0000   0.0000   0.0000   0.0000
const      0.9980   0.9590   0.9330   0.6460
 (se)      0.0220   0.1140   0.2150   0.4250
 [p]       0.0000   0.0000   0.0000   0.1290
R²         0.4960   0.2070   0.1540   0.1360
nobs       10000    3031     1554     650

OLS assumption #2 Notice: all the estimators in the table are correct for the (sub)sample they are based on. The question is how to interpret the results. Notice too: all the coefficients of X are "close" to 1 (compare to the s.e.). Why? Think of the data generating process outlined earlier.

OLS assumption #1 $E[u \mid X] = 0$. This implies that u and X are uncorrelated. Not the other way round (as correlation is about a linear relationship only).

What would a violation of A#1 mean? $cov(u,X) \neq 0$. So what? Let's find out.

Let's make X and u positively correlated. Let $corr(X,u)$ go from 0 to 0.9 in steps of 0.1, re-estimating the model each time (one way to construct such data is sketched below). What would you expect to happen?
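One way to induce a target correlation between X and u, given that x = 2 + e with e standard normal (the construction itself is an assumption, not from the slides):

local rho = 0.5                                           // target corr(x,u)
gen u_corr = `rho'*(x - 2) + sqrt(1 - `rho'^2)*rnormal()  // corr(x, u_corr) = rho, var = 1
gen y_corr = 1 + x + u_corr
regress y_corr x                                          // slope biased upward by about rho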

Positive correlation between X and u ("real" = the original data; remaining columns: corr(X,u) = 0, 0.1, …, 0.9; · = not reported):

          real     0        0.1      0.2      0.3      0.4      0.5      0.6       0.7       0.8       0.9
x         1.0000   1.0020   1.1010   1.2070   1.3030   1.3930   1.4900   1.5940    1.7070    1.8000    1.9000
_cons     0.9980   0.9940   0.7940   0.5960   0.3750   0.2060   0.0140   -0.2020   -0.4350   -0.5980   -0.8000
r2        0.4960   0.5040   0.5480   ·        0.6480   0.6910   0.7460   0.7960    0.8500    0.9030    0.9490

The standard error of x falls from 0.0100 to 0.0040, and that of the constant from 0.0220 to 0.0130, across the columns; all p-values are 0.0000, except for the constant at corr 0.5 (coefficient 0.0140, p = 0.4540).

Let's make X and u negatively correlated. Let $corr(X,u)$ go from 0 to −0.9 in steps of 0.1. Re-estimate the model each time. What would you expect to happen?

Negative correlation between X and u ("real" = the original data; remaining columns: |corr(X,u)| = 0, 0.1, …, 0.9):

          real     0        0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
x         1.0000   0.9950   0.9190   0.8060   0.7080   0.5890   0.5010   0.3970   0.2910   0.2000   0.1050
_cons     0.9980   1.0280   1.1830   1.3970   1.5710   1.8320   2.0100   2.2010   2.4140   2.5980   2.7830
r2        0.4960   0.5020   0.4620   0.4130   0.3510   0.2900   0.2510   0.2030   0.1430   0.1010   0.0540

The standard error of x falls from 0.0100 to 0.0040, and that of the constant from 0.0220 to 0.0130, across the columns; all p-values are 0.0000.

A violation of A#1 leads to biased coefficient estimates. The bias increases in a systematic fashion. It is important to understand how this happens.