Introduction and Identification Todd Wagner Econometrics with Observational Data.

Goals for Course
To enable researchers to conduct careful analyses with existing VA (and non-VA) datasets. We will:
– describe econometric tools and their strengths and limitations
– use examples to reinforce learning

Goals of Today’s Class
– Understanding causation with observational data
– Describing the elements of an equation
– Working through an example of an equation
– Assumptions of the classic linear model

Terminology
Confusing terminology is a major barrier to interdisciplinary research:
– multivariable or multivariate
– endogeneity or confounding
– interaction or moderation
– right or wrong
Maciejewski ML, Weaver ML, Hebert PL. Med Care Res Rev. 2011;68(2):

Polls

Understanding Causation: Randomized Clinical Trials
RCTs are the gold-standard research design for assessing causality. What is unique about a randomized trial? The treatment or exposure is randomly assigned. The benefit of randomization is that it supports causal inference.

Randomization
Random assignment distinguishes experimental from non-experimental designs. Random assignment should not be confused with random selection:
– selection can be important for generalizability (e.g., randomly selected survey participants)
– random assignment is required for understanding causation

Limitations of RCTs
– Generalizability to real life may be low; exclusion criteria may result in a select sample
– Hawthorne effect (in both arms)
– RCTs are expensive and slow
– It can be unethical to randomize people to certain treatments or conditions
Quasi-experimental designs can fill an important role.

Can Secondary Data Help Us Understand Causation?
Example headlines about coffee:
– Coffee, exercise may decrease risk of skin cancer
– Study: Coffee may make you lazy
– Coffee may make high achievers slack off
– Coffee not linked to psoriasis
– Coffee: An effective weight loss tool
– Coffee poses no threat to hearts, may reduce diabetes risk: EPIC data

Observational Data
– Widely available (especially in VA)
– Permit quick data analysis at a low cost
– May be realistic and generalizable
– But the key independent variable may not be exogenous; it may be endogenous

Endogeneity
A variable is endogenous when it is correlated with the error term (a violation of assumption 4 of the classic linear model). If there is a loop of causality between the independent and dependent variables of a model, then there is endogeneity.

Endogeneity
Endogeneity can come from:
– measurement error
– autoregression with autocorrelated errors
– simultaneity
– omitted variables
– sample selection

Elements of an Equation
Maciejewski ML, Diehr P, Smith MA, Hebert P. Common methodological terms in health services research and their synonyms. Med Care. Jun 2002;40(6):

Terms
– Univariate: the statistical expression of one variable
– Bivariate: the expression of two variables
– Multivariate: the expression of more than one variable (these can be dependent or independent variables)

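$y_i = \beta_0 + \beta_1 x_i + u_i$
where $y_i$ is the dependent variable (outcome measure), $\beta_0$ is the intercept, $x_i$ is the covariate (also called the RHS variable, predictor, or independent variable), and $u_i$ is the error term. Note the similarity to the equation of a line, $y = mx + b$: $\beta_1$ plays the role of the slope $m$ and $\beta_0$ the role of the intercept $b$.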

The subscript $i$ is an index. If we are analyzing people, it typically refers to the person. There may be other indexes.

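With two covariates:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$
where $y_i$ is the dependent variable (DV), $x_{1i}$ and $x_{2i}$ are the covariates, $\beta_0$ is the intercept, and $u_i$ is the error term.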

With $j = 1, \dots, k$ covariates, in different (summation) notation:
$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ji} + u_i$

Error term
Error exists because:
1. other important variables might be omitted
2. of measurement error
3. of human indeterminacy
Understand the error structure and minimize error. Error can be additive or multiplicative. See Kennedy, P. A Guide to Econometrics.
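The two error forms can be written side by side in the single-covariate notation used above. This is a minimal sketch; the multiplicative form assumes $y_i > 0$ and $\varepsilon_i > 0$ so that the logarithm exists:

```latex
\begin{aligned}
\text{additive:} \quad & y_i = \beta_0 + \beta_1 x_i + u_i \\
\text{multiplicative:} \quad & y_i = e^{\beta_0 + \beta_1 x_i}\,\varepsilon_i
\;\Longrightarrow\; \ln y_i = \beta_0 + \beta_1 x_i + \ln \varepsilon_i
\end{aligned}
```

Taking logs turns a multiplicative error into an additive one, so a model with multiplicative error can still be estimated as a linear regression on $\ln y_i$.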

Example: is height associated with income?

Y = income; X = height. Hypothesis: height is not related to income ($\beta_1 = 0$). If $\beta_1 = 0$, then what is $\beta_0$?
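One way to answer the question, borrowing the zero-mean-error assumption $E(u_i) = 0$ (listed later as Assumption 2):

```latex
\beta_1 = 0 \;\Longrightarrow\; y_i = \beta_0 + u_i
\;\Longrightarrow\; E(y_i) = \beta_0 + E(u_i) = \beta_0
```

Under the null hypothesis the intercept is simply mean income. Correspondingly, the OLS intercept $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ reduces to the sample mean $\bar{y}$ when the slope is zero.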

Height and Income
How do we want to describe the data?

Estimator
A statistic that provides information on the parameter of interest (e.g., the effect of height on income). It is generated by applying a function to the data. There are many common estimators:
– mean and median (univariate estimators)
– ordinary least squares, OLS (multivariate estimator)

Ordinary Least Squares (OLS)
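OLS chooses the intercept and slope that minimize the sum of squared residuals, $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$. Below is a minimal standard-library Python sketch of the closed-form solution for the one-covariate case; the height and income numbers are invented for illustration and are not the lecture's data.

```python
def ols_simple(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1*x_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

height = [60, 64, 68, 72, 76]  # hypothetical heights (inches)
income = [40, 44, 50, 52, 59]  # hypothetical incomes ($1,000s)
b0, b1 = ols_simple(height, income)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(height, income)]
print(f"intercept {b0:.2f}, slope {b1:.2f}, sum of residuals {sum(residuals):.1e}")
```

With these made-up numbers the slope is 1.15 (about $1,150 per inch in the hypothetical units), and the residuals sum to zero up to rounding, a property of OLS whenever an intercept is included.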

Other estimators
– Least absolute deviations
– Maximum likelihood
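A simple way to see how estimators differ: in an intercept-only model, the least-squares estimate is the sample mean and the least-absolute-deviations (LAD) estimate is the sample median. (For a linear regression with normally distributed errors, maximum likelihood gives the same coefficient estimates as OLS.) The sketch below checks the mean/median claim by brute-force grid search on invented income data containing one large value. The grid search is only for transparency; it is not how LAD is computed in practice.

```python
import statistics

y = [20, 35, 40, 55, 250]  # hypothetical incomes ($1,000s) with one outlier

def sum_sq(c):
    """Least-squares criterion for a constant prediction c."""
    return sum((yi - c) ** 2 for yi in y)

def sum_abs(c):
    """Least-absolute-deviations criterion for a constant prediction c."""
    return sum(abs(yi - c) for yi in y)

grid = [i / 2 for i in range(0, 601)]  # candidate constants 0.0, 0.5, ..., 300.0
ls_min = min(grid, key=sum_sq)
lad_min = min(grid, key=sum_abs)

print("least squares minimum:", ls_min, "| mean:", statistics.mean(y))
print("LAD minimum:", lad_min, "| median:", statistics.median(y))
```

The least-squares minimum lands at 80 (the mean, pulled up by the 250), while the LAD minimum stays at 40 (the median), which is why LAD is less sensitive to extreme values.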

Choosing an Estimator
– Least squares
– Unbiasedness
– Efficiency (minimum variance)
– Asymptotic properties
– Maximum likelihood
– Goodness of fit
We’ll talk more about identifying the “right” estimator throughout this course.

How is the OLS fit?

What about gender?
How could gender affect the relationship between height and income?
– gender-specific intercept
– interaction

Gender Indicator Variable
$\text{income}_i = \beta_0 + \beta_1\,\text{height}_i + \beta_2\,\text{gender}_i + u_i$

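Gender-specific Indicator
Both groups share the slope $\beta_1$; the intercept is $\beta_0$ for the reference group (gender = 0) and $\beta_0 + \beta_2$ for the other group (gender = 1).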

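Interaction Term (Effect Modification, Modifier)
$\text{income}_i = \beta_0 + \beta_1\,\text{height}_i + \beta_2\,\text{gender}_i + \beta_3\,(\text{height}_i \times \text{gender}_i) + u_i$
Note: the gender “main effect” variable is still in the model.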

Gender Interaction
An interaction allows the two groups to have different slopes.
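A minimal standard-library Python sketch of the interaction model, fit by OLS on made-up, noise-free data. The generating coefficients (10, 0.5, -5, 0.1) and the 0/1 gender coding are hypothetical; `solve` and `ols` are helpers written for this example, not a library API. Because the data contain no error term, OLS should recover the generating coefficients up to rounding.

```python
def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(X, y):
    """OLS via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

rows = [(h, g) for h in (60, 64, 68, 72) for g in (0, 1)]  # (height, gender)
# Hypothetical noise-free model: income = 10 + 0.5*height - 5*gender + 0.1*height*gender
income = [10 + 0.5 * h - 5 * g + 0.1 * h * g for h, g in rows]

# Columns: intercept, height, gender main effect, height x gender interaction
X = [[1.0, h, g, h * g] for h, g in rows]
b0, b1, b2, b3 = ols(X, income)

print(f"gender=0: intercept {b0:.2f}, slope {b1:.2f}")
print(f"gender=1: intercept {b0 + b2:.2f}, slope {b1 + b3:.2f}")
```

This should report intercept 10 and slope 0.5 for gender = 0, and intercept 5 and slope 0.6 for gender = 1. Dropping the `h * g` column gives the gender-specific-intercept model, in which both groups are forced to share one slope.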

Assumptions of Classic Linear Regression (CLR)

Classic Linear Regression
There is no “superestimator.” CLR models are often used as the starting point for analyses. The CLR model rests on 5 assumptions; departures from these assumptions will guide your choice of estimator (and the happiness of your reviewers).

Assumption 1
The dependent variable can be calculated as a linear function of a specific set of independent variables, plus an error term. For example, $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$.

Violations of Assumption 1
– Omitted variables
– Non-linearities. Note: by transforming independent variables (e.g., adding $x_i^2$ or using $\ln x_i$), a nonlinear relationship can be estimated with a model that is still linear in its parameters.

Testing Assumption 1
– Theory-based transformations
– Empirically based transformations
– Common sense
– Ramsey RESET test
– Pregibon link test
Ramsey J. Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society, Series B. 1969;31:
Pregibon D. Logistic regression diagnostics. Annals of Statistics. 1981;9(4):
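A minimal sketch of the idea behind the Ramsey RESET test for a one-covariate model: fit the model, add powers of the fitted values (here squared and cubed), and compute an F statistic for whether the added terms improve the fit. The data are simulated with an arbitrary seed; `solve`, `ols`, `rss` and `reset_f` are helpers defined here, not a library API. The p-value step (comparing F with an F(q, n − k) distribution) is omitted because the Python standard library has no F distribution. In Stata, the corresponding post-estimation commands are `estat ovtest` (RESET) and `linktest` (Pregibon).

```python
import random
import statistics

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(X, y):
    """OLS via the normal equations."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def rss(X, y, beta):
    """Residual sum of squares."""
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def reset_f(x, y, powers=(2, 3)):
    """RESET-style F statistic for y = b0 + b1*x + u (sketch)."""
    X = [[1.0, xi] for xi in x]
    beta = ols(X, y)
    rss_r = rss(X, y, beta)
    fitted = [beta[0] + beta[1] * xi for xi in x]
    # Standardizing the fitted values before taking powers spans the same
    # column space (so F is unchanged) but keeps X'X well conditioned.
    mu, sd = statistics.mean(fitted), statistics.pstdev(fitted)
    z = [(f - mu) / sd for f in fitted]
    X_aug = [row + [zi ** p for p in powers] for row, zi in zip(X, z)]
    rss_u = rss(X_aug, y, ols(X_aug, y))
    q = len(powers)
    df = len(y) - len(X_aug[0])
    return ((rss_r - rss_u) / q) / (rss_u / df)

random.seed(1)
x = [i / 10 for i in range(1, 51)]
y_linear = [1 + 2 * xi + random.gauss(0, 0.3) for xi in x]
y_curved = [1 + 2 * xi + 0.5 * xi ** 2 + random.gauss(0, 0.3) for xi in x]
f_linear = reset_f(x, y_linear)
f_curved = reset_f(x, y_curved)
print(f"RESET F, correctly specified: {f_linear:.2f}")
print(f"RESET F, omitted x^2 term:   {f_curved:.2f}")
```

The curved outcome, whose true model includes an $x^2$ term that the fitted line omits, should produce a far larger F than the correctly specified one.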

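Assumption 1 and Stepwise Regression
Statistical software allows for building models in a “stepwise” fashion. Be careful when using it:
– there is little penalty for adding a nuisance variable
– there is a BIG penalty for missing an important covariate (omitting a covariate that is correlated with the included ones biases their coefficients)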
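The asymmetry can be made concrete. Adding an irrelevant (nuisance) variable leaves OLS unbiased but can inflate its variance; omitting a relevant one shifts the short-regression slope to $\beta_1 + \beta_2 \delta_1$, where $\delta_1$ is the slope from regressing the omitted $x_2$ on the included $x_1$. A minimal standard-library sketch with hypothetical, noise-free data (so the identity holds exactly):

```python
def slope(x, y):
    """Simple-regression OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

x1 = [1, 2, 3, 4, 5]  # hypothetical included covariate
x2 = [2, 1, 4, 3, 6]  # hypothetical omitted covariate, correlated with x1
beta0, beta1, beta2 = 1.0, 2.0, 3.0  # hypothetical true coefficients
y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]  # noise-free outcome

short = slope(x1, y)   # regression that omits x2
delta = slope(x1, x2)  # auxiliary regression of x2 on x1
print(f"short-regression slope {short:.2f} vs beta1 + beta2*delta = {beta1 + beta2 * delta:.2f}")
```

Here $\delta_1 = 1$, so the regression that omits $x_2$ reports a slope of 5 instead of the true 2.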

Assumption 2
The expected value of the error term is 0: $E(u_i) = 0$. Violations lead to a biased intercept. This is a concern when analyzing cost data.
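Why a nonzero error mean shows up in the intercept rather than the slope, assuming the error mean $\mu$ is the same constant for every observation:

```latex
E(u_i) = \mu \neq 0: \qquad
y_i = (\beta_0 + \mu) + \beta_1 x_i + (u_i - \mu), \qquad E(u_i - \mu) = 0
```

The rewritten error term has mean zero, so OLS estimates $\beta_0 + \mu$ as the intercept while the slope $\beta_1$ is unaffected.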

Assumption 3
IID: independent and identically distributed error terms
– No autocorrelation: errors are uncorrelated with each other
– Homoskedasticity: errors are identically distributed (constant variance)

Heteroskedasticity

Violating Assumption 3
Effects:
– OLS coefficients are unbiased
– OLS is inefficient
– standard errors are biased
Plotting is often very helpful. There are different statistical tests for heteroskedasticity (e.g., GWHet), but statistical tests have limited power.

Fixes for Assumption 3
– Transforming the dependent variable may eliminate it
– Robust standard errors (Huber-White or sandwich estimators)
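A minimal sketch comparing conventional and robust standard errors for a one-covariate OLS slope. The robust version shown is HC0, the basic Huber-White sandwich form; statistical packages often apply a small-sample scaling on top of it. The heteroskedastic data are simulated, and the seed and error pattern are arbitrary assumptions.

```python
import math
import random

def slope_with_ses(x, y):
    """Return (b1, conventional SE, HC0 robust SE) for y = b0 + b1*x + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional SE: assumes one common error variance, estimated as RSS/(n-2)
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)
    # HC0 sandwich SE: each observation keeps its own squared residual
    se_hc0 = math.sqrt(sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, e))) / sxx
    return b1, se_conv, se_hc0

random.seed(7)
x = [i / 100 for i in range(1, 1001)]
# Hypothetical heteroskedastic errors: the error SD grows with x squared
y = [2 + 1.5 * xi + random.gauss(0, 0.2 * xi ** 2) for xi in x]
b1, se_conv, se_hc0 = slope_with_ses(x, y)
print(f"slope {b1:.3f}, conventional SE {se_conv:.4f}, robust (HC0) SE {se_hc0:.4f}")
```

The slope estimate is the same either way; only the standard error changes. Because the error spread grows with x, and high-x observations also carry the most weight for the slope, the HC0 standard error should come out larger than the conventional one here.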

Assumption 4
Observations on independent variables are considered fixed in repeated samples: $E(x_i u_i) = 0$.
Violations (endogeneity):
– errors in variables
– autoregression
– simultaneity

Assumption 4: Errors in Variables
Measurement error in the dependent variable (DV) is absorbed into the error term. OLS assumes that covariates are measured without error; error in measuring covariates can be problematic.
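The standard textbook result for classical measurement error in a single covariate illustrates why. Assume we observe $\tilde{x}_i = x_i + e_i$, where $e_i$ has mean zero and is independent of $x_i$ and $u_i$ (a simplifying assumption, not necessarily the setting on the slide):

```latex
\operatorname{plim} \hat\beta_1 = \beta_1 \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_e^2}
```

Because the ratio is below 1, the estimated slope is biased toward zero (attenuation). By contrast, classical error $v_i$ in the dependent variable, $\tilde{y}_i = y_i + v_i$, only enlarges the error term to $u_i + v_i$: if $v_i$ is uncorrelated with the covariates, OLS stays unbiased but less precise.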

Common Violations
– Including a lagged dependent variable(s) as a covariate
– Contemporaneous correlation, which can be tested with the Hausman test (but the test is very weak in small samples)
Instrumental variables offer a potential solution.
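A minimal simulation sketch of instrumental variables with one endogenous regressor `x` and one instrument `z`. With a single instrument the IV estimator reduces to $\operatorname{Cov}(z, y) / \operatorname{Cov}(z, x)$. All data are synthetic, and the seed, sample size and coefficients are arbitrary choices: `x` depends on the structural error `u`, so OLS is biased, while `z` shifts `x` but is independent of `u`.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

random.seed(42)
n = 100_000
z = [random.gauss(0, 1) for _ in range(n)]  # hypothetical instrument
u = [random.gauss(0, 1) for _ in range(n)]  # structural error
# Endogenous regressor: correlated with u, shifted by z
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true slope is 2.0

b_ols = cov(x, y) / cov(x, x)  # biased because Cov(x, u) != 0
b_iv = cov(z, y) / cov(z, x)   # consistent if z is relevant and exogenous
print(f"OLS slope {b_ols:.3f}, IV slope {b_iv:.3f}")
```

In this design $\operatorname{Var}(x) = 3$ and $\operatorname{Cov}(x, u) = 1$, so OLS should land near $2 + 1/3 \approx 2.33$ while IV should be close to the true slope of 2. A Hausman-style comparison asks whether a gap like this between OLS and IV is larger than sampling error alone would explain.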

Assumption 5
– More observations than covariates
– No multicollinearity
Solutions:
– remove perfectly collinear variables
– increase sample size
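OLS needs $X'X$ to be invertible, which holds exactly when the design matrix $X$ has full column rank. Both parts of Assumption 5 are about that rank. A minimal standard-library sketch computes the rank exactly with rational arithmetic; the height and weight values are hypothetical, and recording height in both inches and centimeters creates perfect collinearity.

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break  # rank cannot exceed the number of rows
    return r

height_in = [60, 64, 68, 72]
height_cm = [Fraction(254, 100) * h for h in height_in]  # exact unit conversion
weight_lb = [150, 170, 160, 190]

X_collinear = [[1, h, c] for h, c in zip(height_in, height_cm)]
X_ok = [[1, h, w] for h, w in zip(height_in, weight_lb)]
X_too_few = X_ok[:2]  # 2 observations, 3 columns

for name, X in [("collinear", X_collinear), ("ok", X_ok), ("too few obs", X_too_few)]:
    print(f"{name}: rank {rank(X)} of {len(X[0])} columns")
```

Only the second design reaches full column rank (3 of 3). The collinear design and the design with fewer observations than columns both stop at rank 2, so $X'X$ is singular and OLS has no unique solution.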

Any Questions?

Statistical Software
I frequently use SAS for data management and Stata for my analyses; Stat/Transfer converts data files between formats.

Regression References
– Kennedy. A Guide to Econometrics.
– Greene. Econometric Analysis.
– Wooldridge. Econometric Analysis of Cross Section and Panel Data.
– Winship and Morgan (1999). The Estimation of Causal Effects from Observational Data. Annual Review of Sociology, pp