Methods of Economic Investigation Lecture 2

Presentation transcript:

Data Structures: Methods of Economic Investigation, Lecture 2

Why are we doing this?
Thus far, most econometrics teaching has been theory-based.
The type of data can drive what you can do, and it affects both the credibility of an analysis and the problems it runs into.
It can be hard to translate equations into applications, and even into reading papers.
The rest of this course is based on applications; this lecture will help with both the lectures and the exercises.

Choosing your data
Suppose you are interested in the causal effect of X on y: how would you test this?
If you could choose the way in which X is determined in your sample, what would you do? This may seem fanciful, but field experiments are becoming more common in economics.
It is also a good thought experiment: if you could have any data in the world, would this question be answerable? (If not, move on!)
This is a good reason to choose a randomized controlled experiment.
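
The thought experiment above can be made concrete with a small simulation. This is only a sketch under invented assumptions: the sample size, the "true" effect of 2.0, and the unobserved "ability" term are all hypothetical and do not come from the lecture. It illustrates that when X is assigned by a coin flip, a simple difference in mean outcomes lands close to the causal effect, even though ability is never observed.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

N = 10_000          # hypothetical sample size
TRUE_EFFECT = 2.0   # hypothetical causal effect of X on y

treated, control = [], []
for _ in range(N):
    ability = random.gauss(0, 1)   # unobserved factor that also drives y
    x = random.random() < 0.5      # X assigned by coin flip, independent of ability
    y = TRUE_EFFECT * x + ability + random.gauss(0, 1)
    (treated if x else control).append(y)


def mean(values):
    return sum(values) / len(values)


# Difference in mean outcomes between the two randomly assigned groups.
estimated_effect = mean(treated) - mean(control)
print(round(estimated_effect, 2))
```

If X were instead chosen by individuals in a way related to ability, the same difference in means would mix the effect of X with the effect of ability.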

Where does data come from?
Surveys: watch the response rate, stratification and clustering, and reporting or measurement error.
Administrative records: these come from lots of different places and are often kept in real time (which addresses “reporting” or “recollection” errors). Data may be missing, and that might not be random.
Researchers (and you!): data are often collected for a specific project, so be careful what they contain. They tend to be more “unique”, with different types of data (e.g. content analysis).

Who Collects Data
Government: official statistics (unemployment, GDP, etc.); surveys (labor force, consumption, etc.); records (justice system, social programs).
Service providers: often the data are administrative (e.g. hospital records); sometimes there are internal surveys or evaluations, which can be useful if you can get them.
Third parties: critical for places with limited capacity (e.g. the World Bank is a big source for developing countries); university or survey research programs; newspapers and media sources, which compile LOTS of things.

Different Types of Data
Cross-sectional data, time series data, panel data, and repeated cross-sections.

Cross-Sectional Data
Cross-section data cover a cross section of the population; information is collected from this cross section during a given period of time.
What does this look like? Rows are units of observation (e.g. individuals); columns are variables.

Cross-Section Data
Simple descriptive statistics across individuals: we can get the sample mean and variance of the various X’s.
Regressions: the standard formulas apply.
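
For reference, the standard formulas can be written out as follows. The notation is an assumption chosen to match the data example used later in the lecture: x_{ji} is variable j for individual i, N is the number of individuals and k the number of X variables. The intercept β_0 is also an assumption; the slides only mention the k slope coefficients.

```latex
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ji},
\qquad
s_j^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_{ji} - \bar{x}_j\right)^2

\min_{\beta_0,\beta_1,\dots,\beta_k}\;
\sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{k}\beta_j x_{ji}\Bigr)^2
```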

Algebra → Reality: Outcome Variables
Try to get a sense of the data, so as to translate the matrix algebra into reality.
Example question: what is the effect of education on income?
We have an outcome “y”, for example income.

Algebra → Reality: RHS Variables
There may be several different X’s (labeled by k).
So usually we think of this as meaning that X is of dimensionality k x n and that we will estimate k coefficients.
Our X variables look like the columns of the data shown next.

Our Data Looks Like:

ID   Income   Race   Sex   Education
1    y1       x11    x21   x31
2    y2       x12    x22   x32
3    y3       x13    x23   x33
4    y4       x14    x24   x34
5    y5       x15    x25   x35

Our data example has N=5 and k=3. We can index our individuals by ID (useful later).
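
As a rough illustration of "rows are individuals, columns are variables", the sketch below stores a cross-section as a list of records and computes the sample mean and variance of one column. All numbers are hypothetical; the slide gives only symbols (y1, x31, and so on), and coding race and sex as 0/1 is an assumption made here for convenience.

```python
from statistics import mean, variance

# Hypothetical values standing in for the symbols in the table (N=5, k=3).
rows = [
    {"id": 1, "income": 30.0, "race": 0, "sex": 1, "education": 12},
    {"id": 2, "income": 39.0, "race": 1, "sex": 0, "education": 16},
    {"id": 3, "income": 25.0, "race": 0, "sex": 0, "education": 10},
    {"id": 4, "income": 44.0, "race": 1, "sex": 1, "education": 18},
    {"id": 5, "income": 34.0, "race": 0, "sex": 1, "education": 14},
]

# Pull one column (variable) out of the rows (individuals).
education = [row["education"] for row in rows]

mean_education = mean(education)     # sample mean across individuals
var_education = variance(education)  # sample variance (divides by N - 1)
```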

What does a regression tell us?
Remember, it minimizes the (squared) errors and will pick the 3 coefficients (one on race, one on sex, and one on education) that do so.
We are interested in the coefficient on education to tell us the “effect of education on earnings”.
We might still care about the effects of race and gender as “control” variables.
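
To make "picking the coefficients that minimize the squared errors" concrete, here is a minimal pure-Python sketch that solves the OLS normal equations by Gaussian elimination. It is not how Stata computes its results. The data are hypothetical and constructed so that income equals 5 + 2*race + 1*sex + 2*education exactly, which means the known coefficients should be recovered; the intercept column is an added assumption. The data are stored one row per individual, as in the data table.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Return the coefficients that minimize the sum of squared residuals."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data. Columns: constant, race, sex, education.
X = [
    [1.0, 0.0, 1.0, 12.0],
    [1.0, 1.0, 0.0, 16.0],
    [1.0, 0.0, 0.0, 10.0],
    [1.0, 1.0, 1.0, 18.0],
    [1.0, 0.0, 1.0, 14.0],
]
income = [30.0, 39.0, 25.0, 44.0, 34.0]

intercept, b_race, b_sex, b_education = ols(X, income)
```

Here `b_education` plays the role of the coefficient of interest, while `b_race` and `b_sex` are the coefficients on the controls.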

Stata Output

Algebra → Reality: Stata Output
Using our “data”, suppose we regress y on our X’s.
To do this in Stata we would type: regress income race sex education
The output reports coefficients, standard errors, and R-squared.

Limitations…
Lots of things vary over time, and we can’t control for them in cross-section data.
The only source of variation is across individuals (or whatever the unit of observation is).
Identification: we need observations that have similar time characteristics (because we can’t control for these) but differ on some variable of interest.

Now to time series data
Pretty similar to cross-section data, except that the data are indexed by time instead of by individual.

Year   Income   Inflation   Growth   Unempl
2000   y1       x11         x21      x31
2001   y2       x12         x22      x32
2002   y3       x13         x23      x33
2003   y4       x14         x24      x34
2004   y5       x15         x25      x35

Why is time series different?
Correlation between different observations violates OLS assumptions (the estimates may be OK, but the usual inference is not valid). More on this later.
Lots of things about individuals are time-invariant, so they don’t make sense in this context.
Other things, often in time series data, are common across individuals (e.g. macroeconomic trends).
This limits what we can do with these variables: we CAN’T “control” for time-invariant characteristics, so all variation comes from variation over time.

Estimating with Time Series Data
Two critical issues:
Stationarity: the mean and variance do not change over time. Stronger conditions are sometimes required, namely that the whole distribution (e.g. all moments) is the same over time/space. You may need to do something to make your data stationary (e.g. de-mean, detrend, difference, etc.).
Ergodicity: given a sufficiently long set of realizations, we can estimate the statistical properties of the process.
Worry about unit roots (more on this later).
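
As a small illustration of "doing something to make your data stationary", the sketch below first-differences a hypothetical series with a linear trend. The series is invented for the example, and comparing the means of the first and second halves is only an informal check, not a stationarity test.

```python
def difference(series):
    """First difference: x_t - x_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def half_means(series):
    """Mean of the first half and of the last half of a series."""
    half = len(series) // 2
    first, second = series[:half], series[-half:]
    return sum(first) / half, sum(second) / half


# Hypothetical series: linear trend (10 + 2t) plus an alternating +/-1 wiggle.
trend_series = [10 + 2 * t + (1 if t % 2 else -1) for t in range(9)]
diffed = difference(trend_series)

level_means = half_means(trend_series)  # the mean drifts upward over time
diff_means = half_means(diffed)         # the mean no longer changes over time
```

In levels the mean of the second half is well above the mean of the first half; after differencing the two halves have the same mean.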

Panel Data
Repeated observations on the same individuals.
Common example: Labor Force Surveys, which take information about individuals.
Usually contains characteristics that are time-invariant for any individual (race, sex, education level).
Usually contains characteristics that are time-varying for any given individual (employed last week).
Can contain, or link to, variables that are time-varying but the same across groups of individuals (local unemployment rate).

Example of Panel Data
Multi-dimensional, so indexed by time and individual.

ID   Year   Income    Employment   Sex        Education
1    2000   Y1,2000   X11,2000     X21,2000   X31,2000
1    2001   Y1,2001   X11,2001     X21,2001   X31,2001
2    2000   Y2,2000   X12,2000     X22,2000   X32,2000
2    2001   Y2,2001   X12,2001     X22,2001   X32,2001
2    2002   Y2,2002   X12,2002     X22,2002   X32,2002

Panel Data Regressions
Regressions need to be indexed by all dimensions (our example is time and individual, but it could be time, state, and individual).
We may allow an intercept shift (e.g. add a dummy for each year).
We may allow a slope shift (e.g. allow different coefficients for men and women).
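
One way to write the intercept and slope shifts is sketched below. The notation is an assumption that follows the panel example: x_{ki,t} is variable k for individual i in year t. The base year s_0, the year effects δ_s, the dummy male_i and the coefficient γ are hypothetical names introduced only for this illustration.

```latex
% Intercept shift: one dummy per year (omitting a base year s_0)
y_{it} = \beta_0 + \sum_{k} \beta_k x_{ki,t}
       + \sum_{s \neq s_0} \delta_s \,\mathbf{1}[t = s] + u_{it}

% Slope shift: the coefficient on x_1 may differ for men and women
y_{it} = \beta_0 + \beta_1 x_{1i,t}
       + \gamma \left(\mathit{male}_i \times x_{1i,t}\right) + u_{it}
```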

What’s so great about Panel Data?
We can control for individual-specific factors (e.g. with error component models, ECM).
ECM may solve some of our omitted variable bias issues (individual controls).
We can use both “within” variation (for an individual over time) and “between” variation (across individuals at a given time).
But it can be rare to have long panels. They tend to span very short periods of time, which may make it difficult to study trends; we can only see “breaks” at big changes.
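
The "within" idea can be sketched as follows: subtract each individual's own mean from every variable, so anything fixed for that individual drops out. The panel below is hypothetical and constructed so that y equals an individual-specific constant plus 3*x, with the high-x individual having the lower constant. This is a simplified stand-in for an error component model, not the lecture's own derivation.

```python
from collections import defaultdict

# Hypothetical panel: individual 1 has constant +10, individual 2 has -10, slope 3.
panel = [
    {"id": 1, "year": 2000, "x": 1.0, "y": 13.0},
    {"id": 1, "year": 2001, "x": 2.0, "y": 16.0},
    {"id": 2, "year": 2000, "x": 4.0, "y": 2.0},
    {"id": 2, "year": 2001, "x": 5.0, "y": 5.0},
    {"id": 2, "year": 2002, "x": 7.0, "y": 11.0},
]


def slope(xs, ys):
    """OLS slope from a regression of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def demean_by_id(rows, key):
    """Subtract each individual's own mean of `key` (the within transformation)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["id"]].append(r[key])
    means = {i: sum(v) / len(v) for i, v in groups.items()}
    return [r[key] - means[r["id"]] for r in rows]


# Pooled regression ignores who is who; the within regression uses demeaned data.
pooled_slope = slope([r["x"] for r in panel], [r["y"] for r in panel])
within_slope = slope(demean_by_id(panel, "x"), demean_by_id(panel, "y"))
```

In this constructed example the pooled slope is negative because the individual constants are correlated with x, while the within slope recovers 3.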

Repeated Cross-Section Data
More common: annual or frequent surveys that do not always cover the same people.
We get repeated cross-sections of different cohorts of individuals.
We can do several things: construct a panel at a more aggregate level, or use the time-series aspects to compare cohorts.
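
"Constructing a panel at a more aggregate level" can be sketched by averaging within group-by-year cells. The survey records below are hypothetical, and grouping by sex is just one possible choice of group.

```python
from collections import defaultdict

# Hypothetical repeated cross-section: different people are surveyed each year.
survey = [
    {"id": 1, "year": 2000, "sex": "M", "income": 30.0},
    {"id": 2, "year": 2000, "sex": "F", "income": 28.0},
    {"id": 3, "year": 2000, "sex": "M", "income": 34.0},
    {"id": 4, "year": 2001, "sex": "F", "income": 31.0},
    {"id": 5, "year": 2001, "sex": "M", "income": 36.0},
    {"id": 6, "year": 2001, "sex": "F", "income": 29.0},
]

# Collect incomes into (group, year) cells.
cells = defaultdict(list)
for person in survey:
    cells[(person["sex"], person["year"])].append(person["income"])

# One row per group and year: the same groups are now followed over time.
group_panel = {cell: sum(v) / len(v) for cell, v in sorted(cells.items())}
```

No individual appears twice, but each group appears in every year, so the cell means form a panel of groups.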

Example of Repeated Cross-Section Data
Multi-dimensional, so indexed by time and individual.

ID   Year   Income    Employment   Sex        Education
1    2000   Y1,2000   X11,2000     X21,2000   X31,2000
2    2001   Y2,2001   X12,2001     X22,2001   X32,2001
3    2000   Y3,2000   X13,2000     X23,2000   X33,2000
4    2001   Y4,2001   X14,2001     X24,2001   X34,2001
5    2002   Y5,2002   X15,2002     X25,2002   X35,2002

Repeated Cross-Section Regressions
Index by time and whatever “group” you want to use. For example, if group 1 is men and group 2 is women, you estimate the regression with observations indexed by group and time.
This uses similarities between groups, but can’t control for individual-specific issues.
Watch for cohort-specific changes (e.g. selection issues).
We can allow “fixed effects” for time or group, but these are not as believable as a way to control for unobservables.
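
One common way to write such a regression is sketched below. This is an assumed form for illustration, not necessarily the equation from the original slide: i indexes individuals, g the group (1 for men, 2 for women), t the survey year, α_g is a group effect and δ_t a time effect.

```latex
y_{igt} = \alpha_g + \delta_t + \beta\, x_{igt} + u_{igt},
\qquad g \in \{1, 2\}
```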

Next Steps
Using data, can we: describe the data to understand what we’ve got; develop some “questions” to answer; test our hypotheses?
This is an application-based class and will use Stata examples.