Econometrics with Observational Data Will begin at 2PM ET For conference audio, dial 800.767.1750 and use access code 45043 After entry please dial *6.

Slides:



Advertisements
Similar presentations
Introduction Describe what panel data is and the reasons for using it in this format Assess the importance of fixed and random effects Examine the Hausman.
Advertisements

Structural Equation Modeling. What is SEM Swiss Army Knife of Statistics Can replicate virtually any model from “canned” stats packages (some limitations.
Multiple Regression W&W, Chapter 13, 15(3-4). Introduction Multiple regression is an extension of bivariate regression to take into account more than.
Welcome to Econ 420 Applied Regression Analysis
Methods of Economic Investigation Lecture 2
Economics 20 - Prof. Anderson1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
A Short Introduction to Curve Fitting and Regression by Brad Morantz
Introduction and Identification Todd Wagner Econometrics with Observational Data.
Lecture 4 Econ 488. Ordinary Least Squares (OLS) Objective of OLS  Minimize the sum of squared residuals: where Remember that OLS is not the only possible.
1 Lecture 2: ANOVA, Prediction, Assumptions and Properties Graduate School Social Science Statistics II Gwilym Pryce
Econometric Details -- the market model Assume that asset returns are jointly multivariate normal and independently and identically distributed through.
The Simple Linear Regression Model: Specification and Estimation
Multiple Linear Regression Model
Chapter 10 Simple Regression.
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems.
Additional Topics in Regression Analysis
Regression Hal Varian 10 April What is regression? History Curve fitting v statistics Correlation and causation Statistical models Gauss-Markov.
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
Statistical Analysis SC504/HS927 Spring Term 2008 Session 7: Week 23: 7 th March 2008 Complex independent variables and regression diagnostics.
Economics 20 - Prof. Anderson
Topic 3: Regression.
Empirical Estimation Review EconS 451: Lecture # 8 Describe in general terms what we are attempting to solve with empirical estimation. Understand why.
Basics of regression analysis
Linear Regression Models Powerful modeling technique Tease out relationships between “independent” variables and 1 “dependent” variable Models not perfect…need.
Ordinary Least Squares
Correlation & Regression
3. Multiple Regression Analysis: Estimation -Although bivariate linear regressions are sometimes useful, they are often unrealistic -SLR.4, that all factors.
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition
Hypothesis Testing in Linear Regression Analysis
Simple Linear Regression
Understanding Multivariate Research Berry & Sanders.
2-1 MGMG 522 : Session #2 Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4)
Statistics and Quantitative Analysis U4320 Segment 8 Prof. Sharyn O’Halloran.
Welcome to Econ 420 Applied Regression Analysis Study Guide Week Two Ending Sunday, September 9 (Note: You must go over these slides and complete every.
Introduction and Identification Todd Wagner Econometrics with Observational Data.
Random Regressors and Moment Based Estimation Prepared by Vera Tabakova, East Carolina University.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
10. Basic Regressions with Times Series Data 10.1 The Nature of Time Series Data 10.2 Examples of Time Series Regression Models 10.3 Finite Sample Properties.
Methods of Economic Investigation: Lent Term Radha Iyengar Office Hour: Monday Office: R425.
2.4 Units of Measurement and Functional Form -Two important econometric issues are: 1) Changing measurement -When does scaling variables have an effect.
Christine Pal Chee October 9, 2013 Research Design.
Right Hand Side (Independent) Variables Ciaran S. Phibbs June 6, 2012.
Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,
Right Hand Side (Independent) Variables Ciaran S. Phibbs.
Chapter 4 The Classical Model Copyright © 2011 Pearson Addison-Wesley. All rights reserved. Slides by Niels-Hugo Blunch Washington and Lee University.
LECTURE 9 Tuesday, 24 FEBRUARY STA291 Fall Administrative 4.2 Measures of Variation (Empirical Rule) 4.4 Measures of Linear Relationship Suggested.
I271B QUANTITATIVE METHODS Regression and Diagnostics.
Dynamic Models, Autocorrelation and Forecasting ECON 6002 Econometrics Memorial University of Newfoundland Adapted from Vera Tabakova’s notes.
Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques
More on regression Petter Mostad More on indicator variables If an independent variable is an indicator variable, cases where it is 1 will.
Todd Wagner, PhD February 2011 Propensity Scores.
ANCOVA.
Lecture 1 Introduction to econometrics
Quantitative Methods. Bivariate Regression (OLS) We’ll start with OLS regression. Stands for  Ordinary Least Squares Regression. Relatively basic multivariate.
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
By Randall Munroe, xkcd.com Econometrics: The Search for Causal Relationships.
Lecture 6 Feb. 2, 2015 ANNOUNCEMENT: Lab session will go from 4:20-5:20 based on the poll. (The majority indicated that it would not be a problem to chance,
Chapter 13 Simple Linear Regression
The Simple Linear Regression Model: Specification and Estimation
Econometrics ITFD Week 8.
Kakhramon Yusupov June 15th, :30pm – 3:00pm Session 3
Ch3: Model Building through Regression
Fundamentals of regression analysis
…Don’t be afraid of others, because they are bigger than you
I271B Quantitative Methods
Migration and the Labour Market
Introduction to Logistic Regression
BY: Mohammed Hussien Feb 2019 A Seminar Presentation on Longitudinal data analysis Bahir Dar University School of Public Health Post Graduate Program.
Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015
Presentation transcript:

Econometrics with Observational Data Will begin at 2PM ET For conference audio, dial and use access code After entry please dial *6 to mute your line.

Econometrics with Observational Data Introduction and Identification Todd Wagner

Goals for Course To enable researchers to conduct careful analyses with existing VA (and non-VA) datasets. To enable researchers to conduct careful analyses with existing VA (and non-VA) datasets. We will We will –Describe econometric tools and their strengths and limitations –Use examples to reinforce learning

Requirements Familiarity with multivariate analysis Familiarity with multivariate analysis We expect you to We expect you to –do the readings –ask questions if you don’t understand something –do your best

Course faculty Todd Wagner, PhD Todd Wagner, PhD Paul Barnett, PhD Paul Barnett, PhD Ciaran Phibbs, PhD Ciaran Phibbs, PhD Mark Smith, PhD Mark Smith, PhD

Course Dates Introduction and identification: (March 12 th ) Introduction and identification: (March 12 th ) Cost as the dependent variable (I): (March 26th) Cost as the dependent variable (I): (March 26th) Cost as the dependent variable (II): (April 9th) Cost as the dependent variable (II): (April 9th) Non-linear dependent variables: (April 30th) Non-linear dependent variables: (April 30th) Right-hand-side variables: (May 14) Right-hand-side variables: (May 14) Research design (May 28 th ) Research design (May 28 th ) Endogeneity and simultaneity: (June 11th) Endogeneity and simultaneity: (June 11th)

LiveMeeting Rules Mute your phone Mute your phone Don’t use the hold button Don’t use the hold button If you have to make or answer another call, please hang up and then dial back in

Features of web system Main page Main page –Slides, white board, applications demo Side page Side page –Chat –Questions and answers –Informal poll –Polls

Virtual Interaction Polls Polls me (or the presenter) with questions or me (or the presenter) with questions or If you are in a group setting, please me the next two items. If you are in a group setting, please me the next two items.

Randomized Clinical Trial RCTs are the gold-standard research design for assessing causality RCTs are the gold-standard research design for assessing causality What is unique about a randomized trial? What is unique about a randomized trial? The treatment / exposure is randomly assigned Benefits of randomization: Benefits of randomization: Causal inferences

Randomization Random assignment distinguishes experimental and non-experimental design Random assignment distinguishes experimental and non-experimental design Random assignment should not be confused with random selection Random assignment should not be confused with random selection –Selection can be important for generalizability (e.g., randomly-selected survey participants) –Random assignment is required for understanding causation

Limitations of RCTs Generalizability to real life may be low Generalizability to real life may be low Hawthorne effect (both arms) Hawthorne effect (both arms) RCTs are expensive and slow RCTs are expensive and slow Can be unethical to randomize people to certain treatments or conditions Can be unethical to randomize people to certain treatments or conditions Quasi-experimental design can fill an important role Quasi-experimental design can fill an important role

Elements of an Equation Maciejewski ML, Diehr P, Smith MA, Hebert P. Common methodological terms in health services research and their synonyms. Med Care. Jun 2002;40(6):

Dependent variable Outcome measure Error Term Intercept Covariate, RHS variable, Predictor, independent variable

“i” is an index. If we are analyzing people, then this typically refers to the person There may be other indexes

DV Two covariates Error Term Intercept

DV j covariates Error Term Intercept Different notation

Error term Error exists because Error exists because 1.Other important variables might be omitted 2.Measurement error 3.Human indeterminancy Understand and minimize error Understand and minimize error Error can be additive or multiplicative Error can be additive or multiplicative See Kennedy, P. A Guide to Econometrics

Example: is height associated with income?

Y=income; X=height Y=income; X=height Hypothesis: Height is not related to income (B 1 =0) Hypothesis: Height is not related to income (B 1 =0) If B 1 =0, then what is B 0 ? If B 1 =0, then what is B 0 ? To test Hypothesis, we need to estimate B 1 with a sample of data To test Hypothesis, we need to estimate B 1 with a sample of data

Estimators

OLS

Other estimators Least absolute deviations Least absolute deviations Maximum likelihood Maximum likelihood

Choosing an Estimator Least squares Least squares Unbiasedness Unbiasedness Efficiency (minimum variance) Efficiency (minimum variance) Asymptotic properties Asymptotic properties Maximum likelihood Maximum likelihood We’ll talk more about estimators in courses We’ll talk more about estimators in courses 2- 4.

What is driving the error?

What about gender How could gender affect the relationship between height and income? How could gender affect the relationship between height and income? –Gender-specific intercept –Interaction

Gender Indicator Variable Gender Intercept height

Gender-specific Indicator B0B0 B2B2

Interaction Term, Effect modification, Modifier Interaction Note: the gender “main effect” variable is still in the model height gender

Gender Interaction

Classic Linear Regression (CLR) Assumptions

Classic Linear Regression No “superestimator” No “superestimator” CLR models are often used as the starting point for analyses CLR models are often used as the starting point for analyses 5 assumptions for the CLR 5 assumptions for the CLR Variations in these assumption will guide your choice of estimator (and happiness of your reviewers) Variations in these assumption will guide your choice of estimator (and happiness of your reviewers)

Assumption 1 The dependent variable can be calculated as a linear function of a specific set of independent variables, plus an error term The dependent variable can be calculated as a linear function of a specific set of independent variables, plus an error term For example, For example,

Violations to Assumption 1 Omitted variables Omitted variables Non-linearities Non-linearities –Note: by transforming independent variables, a nonlinear function can be made from a linear function

Testing Assumption 1 Theory-based transformations Theory-based transformations Empirically-based transformations Empirically-based transformations Common sense Common sense Ramsey RESET test Ramsey RESET test Pregibon Link test Pregibon Link test Ramsey J. Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society. 1969;Series B(31): Pregibon D. Logistic regression diagnostics. Annals of Statistics. 1981;9(4):

Assumption 1 and Stepwise Statistical software allows for creating models in a “stepwise” fashion Statistical software allows for creating models in a “stepwise” fashion Be careful when using it. Be careful when using it. Why? Why? –Little penalty for adding a nuisance variable –BIG penalty for missing an important covariate

Assumption 2 Expected value of the error term is 0 Expected value of the error term is 0 E(u i )=0 Violations lead to biased intercept A concern when analyzing cost data (Wei will talk about the smearing estimator)

Assumption 3 IID– Independent and identically distributed error terms IID– Independent and identically distributed error terms –Autocorrelation: Errors are uncorrelated with each other –Homoskedasticity: Errors are identically distributed

Heteroskedasticity

Violating Assumption 3 Effects Effects –OLS coefficients are unbiased –OLS is inefficient –Standard errors are biased Plotting is often very helpful Plotting is often very helpful Different statistical tests for heteroskedasticity Different statistical tests for heteroskedasticity –GWHet--but statistical tests have limited power

Fixes for Assumption 3 Transforming dependent variable may eliminate it Transforming dependent variable may eliminate it Robust standard errors (Huber White or sandwich estimators) Robust standard errors (Huber White or sandwich estimators) Wei and Ciaran will address this issue in more detail in later courses Wei and Ciaran will address this issue in more detail in later courses

Assumption 4 Observations on independent variables are considered fixed in repeated samples Observations on independent variables are considered fixed in repeated samples E(x i u i )=0 E(x i u i )=0 Violations Violations –Errors in variables –Autoregression –Simultaneity

Assumption 4: Errors in Variables Measurement error of dependent variable (DV) is maintained in error term. Measurement error of dependent variable (DV) is maintained in error term. Error in measuring covariates can be problematic Error in measuring covariates can be problematic –Is error correlated with error from DV?

Common Violations Including a lagged dependent variable(s) as a covariate Including a lagged dependent variable(s) as a covariate Contemporaneous correlation—often called endogeneity Contemporaneous correlation—often called endogeneity –Hausman test (but very weak in small samples) Mark Smith will talk more about this. Mark Smith will talk more about this.

Assumption 5 Observations > covariates Observations > covariates No multicollinearity No multicollinearity Solutions Solutions –Remove perfectly collinear variables –Increase sample size

Any Questions?

Statistical Software SAS is for data management SAS is for data management R and Stata are for analyses R and Stata are for analyses – Stattransfer Stattransfer (Always transfer SCRSSN as double precision)

Reading for Next Week Maciejewski ML et al. Common methodological terms in health services research and their synonyms. Med Care. Jun 2002;40(6): Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. Jul 2001;20(4):

Regression References Kennedy A Guide to Econometrics Kennedy A Guide to Econometrics Greene. Econometric Analysis. Greene. Econometric Analysis. Wooldridge. Econometric Analysis of Cross Section and Panel Data. Wooldridge. Econometric Analysis of Cross Section and Panel Data.