CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are.

CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are the work of Kevin McKinney (U.S. Census Bureau) and Ian Schmutte (University of Georgia) June 3, 2013

Topics May 30: Basics of analyzing complex linked data June 3: Basics of graph theory with applications to labor markets June 6: Matching and sorting models June 10: Endogenous mobility models Online course materials 3 June 2013© John M. Abowd and others, 20132

Lecture 2 Basic graph theory for linked employer employee data models Estimation by fixed and mixed effect methods More basic graph theory Sampling employer-employee graphs Modularity and community models for employer-employee graphs 3 June 2013© John M. Abowd and others, 20133

BASIC GRAPH THEORY 3 June 2013© John M. Abowd and others, 20134

Graph Basics 3 June 2013© John M. Abowd and others, 20135

Graphs Fully connected graph E-mail network 3 June 2013© John M. Abowd and others, 20136

The Bipartite Labor Market Graph 3 June 2013© John M. Abowd and others, 20137

Realized Employment Network 3 June 2013© John M. Abowd and others, 20138

Adjacency Matrices 3 June 2013© John M. Abowd and others, 20139

STATISTICAL MODELING: A FIRST EXAMPLE BASED ON AKM 3 June 2013© John M. Abowd and others, 201310

The dependent variable is compensation The function J(i,t) indicates the employer of i at date t. The first component is the person effect. The second component is the firm effect. The third component is the measured characteristics effect. The fourth component is the statistical residual, orthogonal to all other effects in the model. Basic Statistical Model 3 June 2013© John M. Abowd and others, 201311

Matrix Notation: Basic Model All vectors/matrices have row dimensionality equal to the total number of observations. Data are sorted by person-ID and ordered chronologically for each person. D is the design matrix for the person effect: columns equal to the number of unique person IDs. F is the design matrix for the firm effect: columns equal to the number of unique firm IDs times the number of effects per firm. 3 June 2013© John M. Abowd and others, 201312

True Industry Effect Model The function K(j) indicates the industry of firm j The first component is the person effect The second component is the firm effect net of the true industry effect The third component is the true industry effect, an aggregation of firm effects since industry is a property of the employer The fourth component is the effect of personal characteristics The fifth component is the statistical residual 3 June 2013© John M. Abowd and others, 201313

Matrix Notation: True Industry Effect Model The matrix A is the classification matrix taking firms into industries The matrix FA is the design matrix for the true industry effect The true industry effect  can be expressed as 3 June 2013© John M. Abowd and others, 201314

Raw Industry Effect Model The first component is the raw industry effect The second component is the measured personal characteristics effect The third component is the statistical residual The raw industry effect is an aggregation of the appropriately weighted average person and average firm effects within the industry, since both have been excluded from the model The true industry effect is only an aggregation of the appropriately weighted average firm effect within the industry, as shown above 3 June 2013© John M. Abowd and others, 201315

Industry Effects Adjusted for Person Effects Model The first component is the industry effect adjusted for person effects The second component is individual effect (with firm effects omitted) The third component is the measured personal characteristics effect The fourth component is the statistical residual The industry effects adjusted for person effects are also biased 3 June 2013© John M. Abowd and others, 201316

Relation: True and Raw Industry Effects The vector  ** of industry effects can be expressed as the true industry effect  plus a bias that depends upon both the person and firm effects The matrix M is the residual matrix (column null space) after projection onto the column space of the matrix in the subscript. For example, 3 June 2013© John M. Abowd and others, 201317

Relation: Industry, Person and Firm Effects The vector  ** of raw industry effects can be expressed as a matrix weighted average of the person effects  and the firm effects  The matrix weights are related to the personal characteristics X, and the design matrices for the person and firm effects (see Abowd, Kramarz and Margolis, 1999) 3 June 2013© John M. Abowd and others, 201318

IDENTIFICATION 3 June 2013© John M. Abowd and others, 201319

Estimation by Fixed-effect Methods The normal equations for least squares estimation of fixed person, firm and characteristic effects are very high dimension Estimation of the full model by either fixed- effect or mixed-effect methods requires special algorithms to deal with the high dimensionality of the problem 3 June 2013© John M. Abowd and others, 201320

Least Squares Normal Equations The full least squares solution to the basic estimation problem solves these normal equations for all identified effects 3 June 2013© John M. Abowd and others, 201321

Identification of Effects Use of the decomposition formula for the industry (or firm-size) effect requires a solution for the identified person, firm and characteristic effects The usual technique of eliminating singular row/column combinations from the normal equations won’t work if the least squares problem is solved directly. 3 June 2013© John M. Abowd and others, 201322

Identification by Finding connected Sub-graphs (Depth-first) Firm 1 is in group g = 1 Repeat until no more persons or firms are added: – Add all persons employed by a firm in group 1 to group 1 – Add all firms that have employed a person in group 1 to group 1 For g= 2,..., repeat until no firms remain: – The first firm not assigned to a group is in group g. – Repeat until no more firms or persons are added to group g: Add all persons employed by a firm in group g to group g. Add all firms that have employed a person in group g to group g. Each group g is a connected sub-graph Identification of  : drop one firm from each group g Identification of  : impose one linear restriction 3 June 2013© John M. Abowd and others, 201323

Connected Sub-graphs of the Labor Market 3 June 2013© John M. Abowd and others, 201324

Normal Equations after Group Blocking The normal equations have a sub-matrix with block diagonal components This matrix is of full rank and the solution for (  ) is unique 3 June 2013© John M. Abowd and others, 201325

Necessity of Identification Conditions For necessity, we want to show that exactly N+J-G person and firm effects are identified (estimable), including the grand mean  y Because X and y are expressed in deviations from the mean, all N effects are included in the equation but one is redundant because both sides of the equation have a zero mean by construction So the grand mean plus the person effects constitute N effects There are at most N + J-1 person and firm effects including the grand mean. The grouping conditions imply that at most G group means are identified (or, the grand mean plus G-1 group deviations) Within each group g, at most N g and J g -1 person and firm effects are identified. Thus the maximum number of identifiable person and firm effects is: 3 June 2013© John M. Abowd and others, 201326

Sufficiency of Identification Conditions For sufficiency, we use an induction proof Consider an economy with J firms and N workers Denote by E[y it ] the projection of worker i's wage at date t on the column space generated by the person and firm identifiers. For simplicity, suppress the effects of observable variables X The firms are connected into G groups, then all effects  j, in group g are separately identified up to a constraint of the form: 3 June 2013© John M. Abowd and others, 201327

Sufficiency of Identification Conditions II Suppose G=1 and J=2 Then, by the grouping condition, at least one person, say 1, is employed by both firms and we have So, exactly N+2-1 effects are identified 3 June 2013© John M. Abowd and others, 201328

Sufficiency of Identification Conditions III Next, suppose there is a connected group g with J g firms and exactly J g -1 firm effects identified Consider the addition of one more connected firm to such a group Because the new firm is connected to the existing J g firms in the group there exists at least one individual, say worker 1 who works for a firm in the identified group, say firm J g, at date 1 and for the supplementary firm at date 2. Then, we have two relations So, exactly J g effects are identified with the new information 3 June 2013© John M. Abowd and others, 201329

ESTIMATION BY FIXED-EFFECTS METHODS 3 June 2013© John M. Abowd and others, 201330

Estimation by Direct Solution of Least Squares Once the grouping algorithm has identified all estimable effects, we solve for the least squares estimates by direct minimization of the sum of squared residuals This method, widely used in animal breeding and genetics research, produces a unique solution for all estimable effects Software: SAS (GLM); Stata (xtreg); R (lme); cgcg 3 June 2013© John M. Abowd and others, 201331

Least Squares Conjugate Gradient Algorithm The matrix  is chosen to precondition the normal equations The data matrices and parameter vectors are redefined as shown 3 June 2013© John M. Abowd and others, 201332

LSCG (II) The goal is to find  to solve the least squares problem shown The gradient vector g figures prominently in the equations The initial conditions for the algorithm are shown. – e is the vector of residuals – d is the direction of the search 3 June 2013© John M. Abowd and others, 201333

LSCG (III) The loop shown has the following features: – The search direction d is the current gradient plus a fraction of the old direction – The parameter vector  is updated by moving a positive amount in the current direction – The gradient, g, and residuals, e, are updated. – The original parameters are recovered from the preconditioning matrix 3 June 2013© John M. Abowd and others, 201334

LSCG (IV) Verify that the residuals are uncorrelated with the three components of the model. – Yes: the LS estimates are calculated as shown. – No: certain constants in the loop are updated and the next parameter vector is calculated. 3 June 2013© John M. Abowd and others, 201335

ESTIMATION BY MIXED-EFFECTS METHODS 3 June 2013© John M. Abowd and others, 201336

Mixed Effects Assumptions The assumptions above specify the complete error structure with the firm and person effects random. For maximum likelihood or restricted maximum likelihood estimation assume joint normality. Software: ASREML, cgmixedASREMLcgmixed 3 June 2013© John M. Abowd and others, 201337

Estimation by Mixed Effects Methods Solve the mixed effects equations Techniques: Bayesian EM, Restricted ML 3 June 2013© John M. Abowd and others, 201338

Bayesian ECM The algorithm is illustrated for the special case of uncorrelated residuals and uncorrelated random effects. The initial conditions are taken directly from the LS solution to the fixed effects problem. 3 June 2013© John M. Abowd and others, 201339

Bayesian ECM (II) At each loop of the algorithm the E step is used to compute the parameters The conditional M step is used to update the variances. 3 June 2013© John M. Abowd and others, 201340

Relation Between Fixed and Mixed Effects Models Under the conditions shown above, the ME and estimators of all parameters approaches the FE estimator 3 June 2013© John M. Abowd and others, 201341

Correlated Random Effects vs. Orthogonal Design Orthogonal design means that characteristics, person design, firm design are orthogonal Uncorrelated random effects means that  is diagonal 3 June 2013© John M. Abowd and others, 201342

Software SAS: proc mixed ASREML aML SPSS: Linear Mixed Models STATA: xtreg, gllamm, xtmixed R: the lme() function S+: linear mixed models Gauss Matlab (pcg) Genstat: REML Grouping (connected sub-graphs) 3 June 2013© John M. Abowd and others, 201343

EXAMPLE 3 June 2013© John M. Abowd and others, 201344

Inter-Industry Wage Differences From Abowd, Kramarz, Lengermann, Roux, and Schmutte (2012) 3 June 2013© John M. Abowd and others, 201345

The Three Labor Market Graphs Person-Firm Graph – The core set of connections or relationships Firm-to-Firm Graph – Projection of the PFG onto the firm nodes Person-to-Person Graph – Projection of the PFG onto the person nodes (will not discuss today) Today’s talk is non-technical, although a formal mathematical document is available on request. 52

Person-Firm Graph G=(V,E) V=Nodes (Persons and Firms) E=Edges (Jobs) – Graph is bipartite; nodes are made up of two disjoint sets (persons and firms). – Persons cannot employ other persons and firms cannot employ other firms. An edge always contains one node from the set of persons and one node from the set of firms. – Edges (jobs) are generated when person i is employed at firm j and positive earnings are reported to a participating state UI system. 53

A Person-Firm Edge in Detail 55 1 1 Person Node (PIK) Labels DOB Sex Race Person Node (PIK) Labels DOB Sex Race 6 6 Edge (Job) Labels Quarterly Earnings History Edge (Job) Labels Quarterly Earnings History Firm Node (SEIN) Labels Industry Location Size Firm Node (SEIN) Labels Industry Location Size

LEHD from a Graph Theoretic POV Person-Firm Graph – Graph is not complete. Every person does not have a job at every firm. – Adjacency matrix is sparse; realized person-firm edges are only ~3/1000 th of one percent of the total possible. Edge Labels – Earnings history for each job is stored in the EHF – SAS statement to recover person-firm graph from the EHF: proc sort data=ehf (keep=pik sein) out=pfg nodupkey; by pik sein; Node Labels Stored in Separate Files – Person – ICF – Firm – ECF 56

LEHD Products that use the Person- Firm Graph QWI – Sum edges (jobs), where edge and node labels meet certain criteria. For example, sum all edges (jobs) for males age 16-24 with positive earnings in 1992:4, working in the construction industry at a firm located in Miami-Dade county Florida. OTM – Attaches location information to the person and firm nodes. Plots each active edge in 2 dimensional space, where the node locations determine the placement of the edge. Annual Job Flows – Uses a directed weighted firm graph. We will talk more about Job Flows in subsequent slides. 57

The Firm Graph Projection of the Person Firm Graph onto the Firm Nodes ‒Workers with multiple employers generate (j*(j- 1))/2 edges. ‒Workers with a single employer generate a loop. If loops are included, the result is a multi-graph. Shows the Connections Between Firms Workers employed at multiple firms bring skills and knowledge from previous jobs to their current job. Different Versions Undirected versus directed Unweighted versus weighted Loops versus no loops 58

Going from Person-Firm to Firm Edges 1 1 6 6 8 8 5 5 6 6 8 8 2 2 7 7 8 8 4 4 3 3 7 7 8 8 Two Loops Workers 2 and 3 are employed at only one firm Two Loops Workers 2 and 3 are employed at only one firm Three Firm-to-Firm Edges Workers 1,3, and 5 each generate one edge Three Firm-to-Firm Edges Workers 1,3, and 5 each generate one edge 60

Undirected Unweighted Firm-to-Firm Graph with no Loops Edge Weight of 2 Edge Weight of 1 61

Undirected Weighted Firm-to-Firm Graph A A B B E E C C D D F F 50 110 20 5 Largest Connected Component G G 1 62

Weighted Directed Firm-to-Firm Graph A A B B E E C C D D F F 20 1 3 5 4 15 Largest Strongly Connected Component G G Weakly Connected 1 30 1 7 5 15 63

Data Select workers on the EHF with positive earnings some time during 2000:1 to 20010:4. – All States except MA, plus DC – Some states are not available until after 2000:1: AL (2001:1), AR(2002:3), DC(2002:2), MS (2003:3), and NH (2003:1). Workers with more than 40 jobs are removed. – PIK with a large number of jobs is unlikely to represent the work history of a single individual. – A worker with a large number of jobs generates a large number of edges. Max edges per PIK=780. About 220 million persons, 15 million firms, and 6.3 billion person firm year quarter observations. 64

Graph Characteristics Total number of person-to-firm edges is about 1 billion Total number of firm-to-firm edges is about 4 billion (including loops) – Direct relationship between the distribution of the number of jobs per person and the number of person- firm edges – Direct relationship between the distribution of the number of jobs per person and the number of firm-to- firm edges Edges per person=(j*(j-1))/2 except if j=1 then Edges=1 65

Distribution of Jobs and Firm Edges by Number of Jobs per Person 66

Cumulative Distribution of Jobs and Firm Edges by Number of Jobs per Person 67

Creating the State Level Firm-to-Firm Graph Collapse the firm edges by removing duplicates. For example, multiple workers may have been employed at the same two firms. Average multiplicity is approximately 2, but a little over 80% of the edges come from only one PIK (multiplicity of 1). Attach state geography labels to each firm-to-firm node and sum edges within each state pair. Result is a new graph with state-to-state edges 68

69 50 Largest Edges (Top 10 in Red)

71 Largest Edge for Each State (Reciprocal Edges in Red)

Summary The connectedness of the bipartite graph between employers and employees yields the identification conditions for longitudinally linked employer-employee data models at the job level These can be estimated using either fixed or mixed effects methods. 3 June 2013© John M. Abowd and others, 201373

CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are.

Similar presentations

Presentation on theme: "CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are.

Similar presentations

Presentation on theme: "CREST-ENSAE Mini-course Microeconometrics of Modeling Labor Markets Using Linked Employer-Employee Data John M. Abowd portions of today’s lecture are."— Presentation transcript:

Similar presentations

About project

Feedback