Esman M. Nyamongo, Central Bank of Kenya. Econometrics course organized by the COMESA Monetary Institute (CMI) on 2-11 June 2014, KSMS, Nairobi, Kenya.


 There are 3 types of data:
 Cross-sectional data
 Time-series data
 Panel data
The cross-section and the time series are the primary building blocks of panel data

 A time series is a set of observations on the values that a variable takes at different times
 Such data may be collected at regular time intervals:
◦ Minutely or hourly- collected virtually continuously (the so-called real-time quote)
◦ Daily- e.g., financial time series (stock prices, exchange rates); weather reports (rainfall, temperature)
◦ Weekly- e.g., money supply
◦ Monthly- e.g., consumer price index
◦ Quarterly- e.g., GDP
◦ Semi-annually- e.g., fiscal data
◦ Annually- e.g., fiscal data
◦ Quinquennially (every 5 years)- e.g., manufacturing survey
◦ Decennially (every 10 years)- e.g., population census data

 Illustration of a time series (figure)

 The model setup: y_t = α + βx_t + u_t
 where t = 1, …, T indexes the time series

 Cross-section data are data on one or more variables collected at a particular point in time
 Survey data- a questionnaire is designed to capture all the variables a researcher is looking for
 Macro data relating to different economic entities (countries, banks) at a particular point in time
 Other data


 The model setup: y_i = α + βx_i + u_i
 where i = 1, …, N indexes the cross-section

 Panel data are a combination of time-series and cross-section data
 A specialized type is longitudinal or micropanel data, where a cross-sectional unit (say, an individual, family or firm) is surveyed over time
 Surveying the same individuals over time provides useful information on the dynamics of individual/household/firm behavior


 Time series + cross-section: y_it = α + βX_it + u_it
 where i = 1, …, N indexes the cross-section and t = 1, …, T indexes the time series

 Pooled data (illustration)

 Longitudinal/micropanel data (illustration)

 Controlling for individual heterogeneity: panel data allow for the possibility that individuals, firms, states or countries are heterogeneous
 Panel data give more informative data, more variability, less collinearity among the variables, more degrees of freedom and more efficiency
 Panel data are better able to study the dynamics of adjustment
 Panel data are better able to identify and measure effects that are simply not detectable in pure cross-section or pure time-series data

 Panel data models allow us to construct and test more complicated behavioral models than purely cross-section or time-series data
 Micro panel data gathered on individuals, firms and households may be more accurately measured than similar variables measured at the macro level; biases resulting from aggregation over firms or individuals may be reduced or eliminated (see Blundell, 1988; Klevmarken, 1989)
 Macro panel data, on the other hand, have a longer time dimension; and unlike unit-root tests in pure time-series analysis, which have nonstandard distributions, panel unit-root tests have standard asymptotic distributions

 Design and data-collection problems
 Distortions from measurement errors: these may arise from faulty responses due to unclear questions, memory errors, deliberate distortion of responses (e.g., prestige bias), inappropriate informants, misrecording of responses and interviewer effects
 Selectivity problems: self-selectivity, nonresponse, attrition
 Short time-series dimension
 Cross-section dependence

 Emphasizes the joint estimation of the coefficients and ignores the panel structure of the data: y_it = α + βX_it + u_it
 where y is the dependent variable and the Xs are regressors
 E(u_it) = 0 and Var(u_it) = σ² for all i and t, i.e.:
 For a given cross-section, observations are serially uncorrelated
 Across cross-sections and time, the errors are homoscedastic

 These classical assumptions suggest we estimate the equation by OLS
 Pooling increases degrees of freedom, potentially lowering the standard errors on the coefficients
 This involves stacking the cross-sections in the data set
 This form assumes the same intercept and the same slope coefficients for all units
 But how realistic is it to ignore the panel structure of the data?
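Pooled OLS on stacked panel data can be sketched as follows; the data are simulated purely for illustration (the unit counts, coefficients and noise level are assumptions, not taken from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 10                              # 4 cross-section units, 10 time periods
x = rng.normal(size=(N, T))               # one regressor
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=(N, T))  # common intercept and slope

# Stack the cross-sections: each unit's T observations follow the previous unit's
X = np.column_stack([np.ones(N * T), x.ravel()])
beta_hat, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
```

With the panel structure ignored, a single intercept and slope are fitted to all NT observations; here `beta_hat` should land near the true values (1.0, 2.0).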

Estimation of the pooled model: preparation of the data for use in panel estimation (software illustration)

 Here the same slope and intercept are assumed for all units
 Homogeneity of the cross-section units

 Recall: y_it = α + βX_it + u_it
 In this model u_it may display 3 different schemes:
 (i) u_it = λ_t + μ_i + ν_it, consisting of 3 individual shocks, each assumed to be independent of the others:
 λ_t is a cross-section-invariant shock- the time effect
 μ_i is a time-invariant shock- the individual effect
 ν_it is the error term with the usual properties- uncorrelated with X_it

 (ii) u_it = μ_i + ν_it, to yield the cross-section fixed effects model
 (iii) u_it = λ_t + ν_it, to yield the time fixed effects model
 Schemes (ii) and (iii) are referred to as ONE-WAY ERROR COMPONENT MODELS
 Scheme (i) is the TWO-WAY ERROR COMPONENT MODEL

 Follows from relaxing the restrictions imposed by the pooled model, i.e., a joint intercept and slope for i = 1, 2, …, N and t = 1, 2, …, T
 The one-way error component model allows cross-section heterogeneity in the error term
 The error term u_it becomes the sum of an individual-specific effect μ_i (time-invariant) and a 'well-behaved' disturbance ν_it

 In this formulation we have:
 The first part varies across cross-section units but is constant across time
 The second part varies unsystematically (independently) across time and individuals
 There are two ways to estimate a regression model with error terms that are assumed to consist of several error components:
 Fixed-effects model- (1) each equation's constant is a separate parameter; (2) the values of v_i are potentially correlated with the other regressors
 Random-effects model- (1) the differences in the v_i are randomly distributed across units; (2) the values of v_i are uncorrelated with the other regressors

 Recall: y_it = v_i + βX_it + ε_it
 Fixed-effects model-
 (1) each equation's constant is a separate parameter
 (2) the values of v_i are potentially correlated with the other regressors

 The main question is whether X_it is correlated with u_it:
 If not, then we have a seemingly unrelated regression
 If so, then we have a multi-equation system with common coefficients and endogenous regressors
 How, then, do we account for this endogeneity?
 In time series we use instrumental-variable estimation methods (2SLS, 3SLS, etc.)
 In panels, however, we can deal with this, under certain assumptions, without using instruments. How?

 There are 3 approaches to doing this in panels:
 The least-squares dummy variables (LSDV) estimator
 The within-group estimator
 The first-difference estimator
 These are discussed in turn

 The LSDV method applies OLS to the levels with group-specific dummies added to the list of regressors
 This explains why the estimator is called the least-squares dummy variables (LSDV) estimator
 Consider the general model: y_it = μ_i + βX_it + ν_it
 Stack the observations over t to obtain: y_i = μ_i 1_T + X_i β + ν_i

 The pooled regression is then: y = (I_N ⊗ 1_T)μ + Xβ + ν
 where ⊗ is the Kronecker product
 Since 1_T is a regressor alongside X_it, X must contain no time-invariant regressors (they would be perfectly collinear with the dummies)

 This model is appealing, but consider the number of parameters to be estimated: K + 1 + (N − 1) = K + N
 K parameters for the original X-regressors
 1 parameter for the intercept
 N − 1 parameters for the cross-section fixed effects (the omitted cross-section is captured by the intercept)
 Too many parameters (especially with large N)!
 Is there another way? Yes
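The parameter count is visible directly in a minimal LSDV sketch (simulated data; the sizes and coefficients are assumptions). Dropping the common intercept, the regressor matrix carries N unit dummies plus K slope columns, i.e. N + K parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 5, 20, 1
mu = rng.normal(scale=3.0, size=N)                 # true individual effects
x = rng.normal(size=(N, T))
y = mu[:, None] + 2.0 * x + rng.normal(scale=0.5, size=(N, T))

D = np.kron(np.eye(N), np.ones((T, 1)))            # (NT x N) block of unit dummies
X = np.column_stack([D, x.ravel()])                # N dummies + K regressors
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
mu_hat, beta_hat = coef[:N], coef[N]               # fixed effects and slope
```

With N = 5 this is harmless; with thousands of units the dummy block alone dominates the parameter space, which motivates the within transformation below.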

 But we need to proceed!
 Even though OLS would be valid as an estimation method, it is inappropriate, as the parameter space is too large:
 There are N + K regression parameters, and N is usually very large
 We estimate β using OLS following the Frisch-Waugh-Lovell (FWL) theorem on partitioned regressions

 Based on this result, we can show that the fixed-effects estimator is the partitioned OLS estimator of β in the pooled regression: β̂_FE = (X'QX)^(-1) X'Qy
 where Q = I_NT − D(D'D)^(-1)D' and D = I_N ⊗ 1_T is the matrix of individual dummies

 Whence it follows that: β̂_FE = (X*'X*)^(-1) X*'y*, with X* = QX and y* = Qy
 This is basically a pooled OLS estimator on transformed data


 This still assumes individual effects, although we no longer directly estimate them
 We demean the data- wiping out the individual effects- to estimate only β
 How do we wipe out the individual effects?
 We define a Q matrix such that Qy stacks the deviations from individual means: Q = I_NT − P, where P = D(D'D)^(-1)D' averages each unit's observations over time

 Consider a simple regression: y_it = v_i + βx_it + ε_it
 The trick is to remove the fixed effect, v_i
 How? Step 1: average over time t for each i: ȳ_i = v_i + βx̄_i + ε̄_i

 Step 2: subtract the averages to obtain the transformed regression: y_it − ȳ_i = β(x_it − x̄_i) + (ε_it − ε̄_i), or ÿ_it = βẍ_it + ε̈_it
 Then stack by observation for t = 1, …, T, resulting in the 'giant' (pooled) regression: ÿ = Ẍβ + ε̈

 The within-group fixed-effects estimator is pooled OLS on the transformed regression that has been stacked by observations: β̂_W = (Ẍ'Ẍ)^(-1) Ẍ'ÿ
 Degrees of freedom of the FE estimator: NT − K − N = N(T − 1) − K
 Why? We lose 1 degree of freedom for each fixed effect estimated, and there are K βs to be estimated as well
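The within transformation can be sketched on simulated data (all magnitudes are illustrative assumptions): demeaning each unit's series over its own time average wipes out the fixed effects, and pooled OLS on the demeaned data recovers β:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 5, 20, 1
mu = rng.normal(scale=3.0, size=N)                 # fixed effects to be swept out
x = rng.normal(size=(N, T))
y = mu[:, None] + 2.0 * x + rng.normal(scale=0.5, size=(N, T))

# Within transformation: subtract each unit's own time average
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)

# Pooled OLS on the demeaned data (no intercept) is the within estimator
beta_w = (x_dm.ravel() @ y_dm.ravel()) / (x_dm.ravel() @ x_dm.ravel())

dof = N * T - K - N                                # = N(T - 1) - K
```

Note the estimator uses only one slope parameter, yet the degrees of freedom still charge one per demeaned unit, matching the NT − K − N count above.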

 Consider the general model: y_it = v_i + βX_it + ε_it
 Stack individual i's observations for t = 1, …, T, giving: y_i = v_i 1_T + X_i β + ε_i, for i = 1, 2, …, N
 where 1_T is a (T x 1) vector of ones

 The fixed-effects estimator is applied to a T-equation system transformed from the original system above
 The matrix used for the transformation is the so-called annihilator associated with 1_T:
 Q_T = I_T − (1/T) 1_T 1_T', where Q_T is a T x T matrix

 The matrices Q_T and P_T = (1/T) 1_T 1_T' are such that Q_T + P_T = I_T
 What does this mean? Q_T y_i gives the deviations of y_i from its time average, while P_T y_i replaces each observation with that average; together they decompose y into its within and between parts

 The transformed error-components model is then: Q_T y_i = Q_T X_i β + Q_T ε_i (the fixed effect is annihilated, since Q_T 1_T = 0)
 The pooled regression is obtained by stacking these transformed equations over i

 The fixed-effects estimator is again obtained by pooled OLS on the transformed system
 Why?
 Q_T is idempotent, i.e., Q_T Q_T = Q_T
 In other words: β̂_FE = (Σ_i X_i'Q_T X_i)^(-1) Σ_i X_i'Q_T y_i
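The stated properties of Q_T and the companion averaging matrix P_T can be checked numerically (T = 4 here is an arbitrary choice):

```python
import numpy as np

T = 4
ones = np.ones((T, 1))
P = ones @ ones.T / T            # P @ y replaces every entry of y by its time average
Q = np.eye(T) - P                # Q @ y gives deviations from the time average

y = np.array([1.0, 2.0, 3.0, 6.0])
demeaned = Q @ y                 # y minus its mean (3.0)
```

Q is idempotent (Q @ Q equals Q) and annihilates the constant (Q @ ones is zero), which is exactly why the fixed effect v_i 1_T disappears under the transformation.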

 P is the matrix that averages across time for each individual cross-section
 Thus pre-multiplying the regression by Q obtains deviations from the means WITHIN each cross-section
 Qy is an NT x 1 vector of 'stacked' deviations
 The OLS estimator is therefore: β̂_W = (X'QX)^(-1) X'Qy
 Demeaning the data will not change the estimates of β
 This is similar to running a regression with the line of best fit passing through the origin

 Thus the ‘WITHIN’ model becomes a simple regression: ÿ_it = βẍ_it + ε̈_it
 The individual effects can be solved for (not estimated), given the assumption that the time-averaged errors are zero, ε̄_i = 0
 Solving: v̂_i = ȳ_i − β̂ x̄_i

 Notice the fixed effects are not estimated; they are computed instead

 The computed FE are indicated (software output). But what are these FE?

Disadvantages of the method
 Demeaning the data means X-regressors that are themselves time-invariant dummy variables (sex, religion, etc.) cannot be used- they are wiped out by the transformation

 Both coefficients are positive and significant, and therefore consistent with some theory!
 Which one do we choose?

 The null hypothesis H0: μ_1 = μ_2 = … = μ_(N−1) = 0
 The alternative HA: not all μ_i equal to 0
 We test the null hypothesis of no individual effects with an applied Chow or F-test, comparing the residual sums of squares for the regression with the constraints imposed (under the null) and without them (under the alternative)
 The recipe:
 RRSS- OLS on the pooled model (constant intercept)
 URSS- OLS on the LSDV model

 The F-statistic is: F = [(RRSS − URSS)/(N − 1)] / [URSS/(NT − N − K)] ~ F(N − 1, NT − N − K)
 If N is 'large enough' one can use 'WITHIN' estimation instead of LSDV for the URSS
 The decision rule:
 P-value < 0.05: reject the null hypothesis => the FE are not redundant
 P-value > 0.05: fail to reject the null hypothesis => the FE are redundant, suggesting the pooled model is valid
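The redundant-fixed-effects F-test can be sketched as follows on simulated data with strong individual effects (the data-generating numbers are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 6, 15, 1
mu = rng.normal(scale=3.0, size=N)                 # strong individual effects
x = rng.normal(size=(N, T))
y = (mu[:, None] + 2.0 * x + rng.normal(scale=0.5, size=(N, T))).ravel()

def rss(X, y):
    # Residual sum of squares from an OLS fit of y on X
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

X_pool = np.column_stack([np.ones(N * T), x.ravel()])   # restricted: one intercept
D = np.kron(np.eye(N), np.ones((T, 1)))
X_lsdv = np.column_stack([D, x.ravel()])                # unrestricted: N dummies

RRSS, URSS = rss(X_pool, y), rss(X_lsdv, y)
F = ((RRSS - URSS) / (N - 1)) / (URSS / (N * T - N - K))
# A large F (compared with F(N-1, NT-N-K)) rejects H0: the FE are not redundant
```

Because the simulated individual effects are large relative to the noise, the F statistic comes out far above any conventional critical value, so the pooled model would be rejected here.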

 Recall: y_it = α + βX_it + u_it, with u = Z_μ μ + Z_λ λ + ν; i = 1, …, N; t = 1, …, T
 where μ_i = unobservable individual effect; λ_t = unobservable time effect; ν_it = stochastic disturbance
 Z_μ = selector matrix of ones and zeros for the individual effects
 Z_λ = selector matrix of time dummies

 Here we still assume μ_i and λ_t are fixed parameters to be estimated, and ν_it is the usual stochastic disturbance
 Estimation with LSDV requires the estimation of (N − 1) + (T − 1) dummies
 This can introduce a rather severe loss of degrees of freedom
 Once again, to avoid this problem we perform the 'WITHIN' transformation (similar to the one-way model); now, however, we must demean across both dimensions

 Here we work with the transformation that demeans across both dimensions: ỹ_it = y_it − ȳ_i. − ȳ.t + ȳ..
 where J_N is a matrix of ones of dimension N; subtracting ȳ_i. removes the deviations across i, and subtracting ȳ.t removes the deviations across t
 Transforming with Q sweeps out both the time and the individual effects

 Here we have a simple regression (one X-regressor): ỹ_it = βx̃_it + ν̃_it
 We now need 2 constraints to pin down the individual and time effects: Σ_i μ_i = 0 and Σ_t λ_t = 0
 Then we can compute the intercept using α̂ = ȳ.. − β̂ x̄..
 Again, as in the one-way model, we cannot use time-invariant or individual-invariant (dummy) regressors, as Q wipes them out
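The two-way within transformation can be sketched on simulated data (all magnitudes are illustrative assumptions). Subtracting both sets of means and adding back the grand mean sweeps out μ_i and λ_t simultaneously:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 6, 12
mu = rng.normal(scale=2.0, size=N)        # individual effects
lam = rng.normal(scale=2.0, size=T)       # time effects
x = rng.normal(size=(N, T))
y = mu[:, None] + lam[None, :] + 2.0 * x + rng.normal(scale=0.3, size=(N, T))

def two_way_demean(z):
    # z_it - zbar_i. - zbar_.t + zbar_..
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()

x_dd, y_dd = two_way_demean(x), two_way_demean(y)
beta_hat = (x_dd.ravel() @ y_dd.ravel()) / (x_dd.ravel() @ x_dd.ravel())
```

Despite sizeable individual and time effects in the simulated data, the slope estimate recovered from the doubly demeaned data is close to the true value of 2.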

 How do we interpret these results? (software output)


 Same results


 Consider the partitioned regression equation: y = X_1 β_1 + X_2 β_2 + u
 The least-squares estimators of β_1 and β_2 can be expressed as: β̂_1 = (X_1'M_2 X_1)^(-1) X_1'M_2 y and β̂_2 = (X_2'M_1 X_2)^(-1) X_2'M_1 y
 where M_j = I − X_j(X_j'X_j)^(-1)X_j'
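The FWL result is easy to verify numerically: regressing M_2 y on M_2 X_1 reproduces the X_1 coefficients from the full regression (the data below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X1 = rng.normal(size=(n, 2))
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X1 @ np.array([1.5, -0.5]) + X2 @ np.array([2.0, 1.0]) + rng.normal(scale=0.2, size=n)

# Full regression on [X1, X2]; the first two coefficients belong to X1
b_full, *_ = np.linalg.lstsq(np.column_stack([X1, X2]), y, rcond=None)

# FWL: partial X2 out of both y and X1 with the annihilator M2, then regress
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
b_fwl, *_ = np.linalg.lstsq(M2 @ X1, M2 @ y, rcond=None)
```

The two coefficient vectors agree to machine precision, which is precisely the trick the within estimator uses with the dummy block playing the role of X_2.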

The residual maker and the hat matrix
 Some useful matrices:
 We know β̂ = (X'X)^(-1)X'y
 Meaning: e = y − Xβ̂ = [I − X(X'X)^(-1)X']y = My
 where M = I − X(X'X)^(-1)X' is called the residual maker, since it makes residuals out of y
 The matrix M is idempotent [M² = MM = M]

 M is a square matrix and idempotent (show this)
 The matrix M has the following properties as well: M is symmetric (M' = M) and MX = 0
 The hat matrix H makes ŷ out of y: ŷ = Hy
 where H = X(X'X)^(-1)X'
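Both matrices can be constructed and their properties checked directly (arbitrary simulated X and y):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix: y_hat = H y
M = np.eye(n) - H                        # residual maker: e = M y

y_hat, e = H @ y, M @ y                  # fitted values and residuals
```

H and M are both symmetric and idempotent, M annihilates X (MX = 0), and y decomposes exactly into fitted values plus residuals, Hy + My.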