© John M. Abowd 2005, all rights reserved Statistical Tools for Data Integration John M. Abowd April 2005

© John M. Abowd 2005, all rights reserved Outline
– Estimating Models from Linked Data
– Building Models with Heterogeneity for Linked Data
– Fixed Effect Estimation
   – Identification
   – Calculation
– Mixed Effect Estimation

© John M. Abowd 2005, all rights reserved Estimating Models from Linked Files
– Linked files are usually analyzed as if the linkage were without error.
– Most of this class focuses on such methods.
– There are good reasons to believe that this assumption should be examined more closely.

© John M. Abowd 2005, all rights reserved Lahiri and Larsen
– Consider regression analysis when the data are imperfectly linked.
– See their article in the March 2005 issue of JASA for the full discussion.

© John M. Abowd 2005, all rights reserved Setup of Lahiri and Larsen
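A minimal sketch of the Lahiri–Larsen setup, where q_{ij} denotes the probability that record i is linked to record j (the symbols z and Q are notation assumed here for exposition):

  y = X\beta + \varepsilon, \qquad E[\varepsilon \mid X] = 0
  z_i = y_j \ \text{with probability}\ q_{ij} \qquad \text{(imperfect linkage)}
  E[z \mid X] = QX\beta, \qquad Q = (q_{ij})

Here z is the response vector actually paired with X after linkage, so regressing z on X is biased whenever Q differs from the identity.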

© John M. Abowd 2005, all rights reserved Estimators
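A hedged reconstruction of the estimators compared on the following slides. The naive and unbiased forms follow from the setup above; the Scheuren–Winkler (SW) estimator subtracts an estimated bias term from the naive estimator, and its exact form is in Scheuren and Winkler (1993):

  \hat\beta_N = (X'X)^{-1}X'z \qquad \text{(naive: treats z as correctly linked)}
  \hat\beta_U = (W'W)^{-1}W'z, \quad W = QX \qquad \text{(unbiased, since } E[z \mid X] = W\beta\text{)}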

© John M. Abowd 2005, all rights reserved Problem: Estimating B
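If B denotes the matrix of linkage probabilities (the Q of the sketch above; this reading is an assumption), the difficulty is that its entries

  q_{ij} = \Pr(z_i = y_j)

are not observed and must themselves be estimated from the record-linkage model, for example from the matching weights produced by the linkage procedure.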

© John M. Abowd 2005, all rights reserved Does It Matter? Yes.
– The bias from the naïve estimator is very large as the average q_ii moves away from 1.
– The SW (Scheuren–Winkler) estimator does better.
– The U (unbiased) estimator does very well, at least in simulations.

© John M. Abowd 2005, all rights reserved Building Linked Data
– Examples from the LEHD infrastructure files.
– Analysis can be done using workers, jobs, or employers as the basic observation unit.
– Want to model heterogeneity due to the workers and employers for job-level analyses.
– Want to model heterogeneity due to the jobs and workers for employer-level analyses.
– Want to model heterogeneity due to the jobs and employers for individual analyses.

© John M. Abowd 2005, all rights reserved The Longitudinal Employer-Household Dynamics Program
[Diagram: the LEHD linkage infrastructure, files linked through shared identifiers]
– Link Record: Person-ID, Employer-ID, Data
– Business Register: Employer-ID, Census Entity-ID, Data
– Economic Censuses and Surveys: Census Entity-ID, Data
– Demographic Surveys: Household Record (Household-ID, Data); Person Record (Household-ID, Person-ID, Data)

© John M. Abowd 2005, all rights reserved Basic model
– The dependent variable is some individual-level outcome, usually the log wage rate.
– The function J(i,t) indicates the employer of individual i at date t.
– The first component is the measured-characteristics effect.
– The second component is the person effect.
– The third component is the firm effect.
– The fourth component is the statistical residual, orthogonal to all other effects in the model.
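Reconstructing the equation from the four components just listed (this is the person-and-firm-effects decomposition of Abowd, Kramarz, and Margolis):

  y_{it} = x_{it}'\beta + \theta_i + \psi_{J(i,t)} + \varepsilon_{it}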

© John M. Abowd 2005, all rights reserved Matrix Notation: Basic Statistical Model
– All vectors/matrices have row dimensionality equal to the total number of observations.
– Data are sorted by person-ID and ordered chronologically for each person.
– D is the design matrix for the person effect: columns equal to the number of unique person IDs, plus the columns of u_i.
– F is the design matrix for the firm effect: columns equal to the number of unique firm IDs times the number of effects per firm.
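In matrix notation, the model the slide describes (a reconstruction from the definitions of D and F above):

  y = X\beta + D\theta + F\psi + \varepsilon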

© John M. Abowd 2005, all rights reserved Estimation by Fixed-effect Methods
– The normal equations for least-squares estimation of fixed person, firm, and characteristic effects are of very high dimension.
– Estimation of the full model by either fixed-effect or mixed-effect methods requires special algorithms to deal with the high dimensionality of the problem.

© John M. Abowd 2005, all rights reserved Least Squares Normal Equations
The full least-squares solution to the basic estimation problem solves the normal equations below for all identified effects.
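A reconstruction of the normal equations for the matrix form above; this is standard least-squares algebra, so only the notation is assumed:

  \begin{bmatrix} X'X & X'D & X'F \\ D'X & D'D & D'F \\ F'X & F'D & F'F \end{bmatrix}
  \begin{bmatrix} \hat\beta \\ \hat\theta \\ \hat\psi \end{bmatrix}
  =
  \begin{bmatrix} X'y \\ D'y \\ F'y \end{bmatrix}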

© John M. Abowd 2005, all rights reserved Identification of Effects
– Use of the decomposition formula for the industry (or firm-size) effect requires a solution for the identified person, firm, and characteristic effects.
– The usual technique of eliminating singular row/column combinations from the normal equations won't work if the least-squares problem is solved directly.

© John M. Abowd 2005, all rights reserved Identification by Grouping
Firm 1 is in group g = 1. Repeat until no more persons or firms are added:
– Add all persons employed by a firm in group 1 to group 1.
– Add all firms that have employed a person in group 1 to group 1.
For g = 2, ..., repeat until no firms remain:
– The first firm not assigned to a group is in group g.
– Repeat until no more firms or persons are added to group g:
   Add all persons employed by a firm in group g to group g.
   Add all firms that have employed a person in group g to group g.
Identification of the firm effects ψ: drop one firm from each group g.
Identification of the person effects θ: impose one linear restriction.
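The grouping above is a connected-components computation on the bipartite graph of persons and firms. A minimal sketch in Python, using a union-find rather than the slide's repeated sweeps (equivalent output; all names are illustrative, not from the LEHD code):

  def find(parent, x):
      # Path-compressing find for the union-find structure.
      while parent[x] != x:
          parent[x] = parent[parent[x]]
          x = parent[x]
      return x

  def group_firms_and_persons(jobs):
      # `jobs` is an iterable of (person_id, firm_id) pairs, one per job.
      parent = {}
      for person, firm in jobs:
          p = ("P", person)   # tag nodes so person and firm IDs cannot collide
          f = ("F", firm)
          parent.setdefault(p, p)
          parent.setdefault(f, f)
          rp, rf = find(parent, p), find(parent, f)
          if rp != rf:
              parent[rf] = rp  # union: the job connects this person and firm
      # Collect the components and number them 1, 2, ... as the groups g.
      groups, labels = {}, {}
      for node in parent:
          root = find(parent, node)
          g = labels.setdefault(root, len(labels) + 1)
          groups[node] = g
      return groups  # maps ("P", person_id) and ("F", firm_id) to a group number

For example, group_firms_and_persons([(1, "A"), (1, "B"), (2, "B"), (3, "C")]) places persons 1 and 2 and firms A and B in one group, and person 3 and firm C in another.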

© John M. Abowd 2005, all rights reserved Normal Equations after Group Blocking
The normal equations have a sub-matrix with block-diagonal components. This matrix is of full rank, and the solution for (β, θ, ψ) is unique.

© John M. Abowd 2005, all rights reserved Necessity of Identification Conditions
– For necessity, we want to show that exactly N + J − G person and firm effects are identified (estimable), including the grand mean of y.
– Because X and y are expressed in deviations from the mean, all N person effects are included in the equation, but one is redundant because both sides of the equation have a zero mean by construction. So the grand mean plus the person effects constitute N effects.
– There are at most N + J − 1 person and firm effects, including the grand mean.
– The grouping conditions imply that at most G group means are identified (or, the grand mean plus G − 1 group deviations).
– Within each group g, at most N_g person effects and J_g − 1 firm effects are identified.
– Thus the maximum number of identifiable person and firm effects is given below.
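The count the final bullet points to follows by summing over groups (a reconstruction consistent with the first bullet):

  \sum_{g=1}^{G} (N_g + J_g - 1) = N + J - G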

© John M. Abowd 2005, all rights reserved Sufficiency of Identification Conditions
– For sufficiency, we use an induction proof.
– Consider an economy with J firms and N workers.
– Denote by E[y_it] the projection of worker i's wage at date t on the column space generated by the person and firm identifiers.
– For simplicity, suppress the effects of the observable variables X.
– If the firms are connected into G groups, then all effects θ_i and ψ_j in group g are separately identified up to a constraint of the form given below.
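A sketch of the constraint (a reconstruction: within a connected group, a constant can be moved between the person and firm effects without changing fitted values):

  E[y_{it}] = (\theta_i + c_g) + (\psi_{J(i,t)} - c_g) \quad \text{for any constant } c_g,

so one normalization per group, such as \psi_{j_g} = 0 for a reference firm j_g, pins the effects down.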

© John M. Abowd 2005, all rights reserved Sufficiency of Identification Conditions II
Suppose G = 1 and J = 2. Then, by the grouping condition, at least one person, say person 1, is employed by both firms, and we have the relations shown below. So exactly N + 2 − 1 effects are identified.
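A reconstruction of the missing relations (suppressing X as above; person 1 works for firm 1 at date 1 and firm 2 at date 2):

  E[y_{11}] = \theta_1 + \psi_1
  E[y_{12}] = \theta_1 + \psi_2

The difference identifies \psi_2 - \psi_1; with one normalization (say \psi_1 = 0), the N person effects and \psi_2 are identified, which is N + 2 − 1 effects.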

© John M. Abowd 2005, all rights reserved Sufficiency of Identification Conditions III
Next, suppose there is a connected group g with J_g firms and exactly J_g − 1 firm effects identified. Consider the addition of one more connected firm to such a group. Because the new firm is connected to the existing J_g firms in the group, there exists at least one individual, say worker 1, who works for a firm in the identified group, say firm J_g, at date 1 and for the supplementary firm at date 2. Then we have the two relations shown below. So exactly J_g firm effects are identified with the new information.
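A reconstruction of the two relations (again suppressing X):

  E[y_{11}] = \theta_1 + \psi_{J_g}
  E[y_{12}] = \theta_1 + \psi_{J_g+1}

Because \theta_1 and \psi_{J_g} are already identified within the group, the second relation identifies \psi_{J_g+1}, bringing the count of identified firm effects to J_g.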

© John M. Abowd 2005, all rights reserved Estimation by Direct Solution of Least Squares Once the grouping algorithm has identified all estimable effects, we solve for the least squares estimates by direct minimization of the sum of squared residuals. This method, widely used in animal breeding and genetics research, produces a unique solution for all estimable effects.

© John M. Abowd 2005, all rights reserved Least Squares Conjugate Gradient Algorithm
A preconditioning matrix is chosen for the normal equations, and the data matrices and parameter vectors are redefined in terms of it (a sketch of the full loop appears after LSCG (IV) below).

© John M. Abowd 2005, all rights reserved LSCG (II)
– The goal is to find the parameter vector that solves the least-squares problem.
– The gradient vector g figures prominently in the equations.
– The initial conditions for the algorithm are shown below:
   e is the vector of residuals;
   d is the direction of the search.
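A reconstruction of the standard conjugate-gradient least-squares initial conditions the bullets describe, writing M for the stacked design [X D F] and \delta for the stacked parameter vector (both symbols are notation assumed here):

  e_0 = y - M\delta_0, \qquad g_0 = M'e_0, \qquad d_0 = g_0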

© John M. Abowd 2005, all rights reserved LSCG (III)
The loop has the following features:
– The search direction d is the current gradient plus a fraction of the old direction.
– The parameter vector is updated by moving a positive amount in the current direction.
– The gradient, g, and the residuals, e, are updated.
– The original parameters are recovered from the preconditioning matrix.

© John M. Abowd 2005, all rights reserved LSCG (IV)
Verify that the residuals are uncorrelated with the three components of the model:
– Yes: the LS estimates are calculated as shown.
– No: certain constants in the loop are updated and the next parameter vector is calculated.
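A compact sketch of the loop that slides LSCG (II)–(IV) describe. This is the textbook conjugate-gradient least-squares (CGLS) iteration, not the LEHD production code; preconditioning is omitted for clarity and all names are illustrative:

  import numpy as np

  def cgls(M, y, tol=1e-10, max_iter=1000):
      # Solve min ||y - M d||^2 by conjugate gradients on the normal equations.
      # M plays the role of the stacked design [X D F]; d stacks (beta, theta, psi).
      d = np.zeros(M.shape[1])
      e = y - M @ d                    # residuals
      g = M.T @ e                      # gradient
      p = g.copy()                     # search direction
      gg = g @ g
      if np.sqrt(gg) < tol:
          return d                     # already a least-squares solution
      for _ in range(max_iter):
          Mp = M @ p
          alpha = gg / (Mp @ Mp)       # positive step in the current direction
          d = d + alpha * p            # update the parameter vector
          e = e - alpha * Mp           # update the residuals
          g = M.T @ e                  # update the gradient
          gg_new = g @ g
          if np.sqrt(gg_new) < tol:    # residuals orthogonal to the columns of M:
              break                    # the convergence check of LSCG (IV)
          p = g + (gg_new / gg) * p    # new direction: gradient plus a fraction
          gg = gg_new                  # of the old direction, as in LSCG (III)
      return d

On a full-column-rank design, cgls(M, y) agrees with np.linalg.lstsq(M, y, rcond=None)[0].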

© John M. Abowd 2005, all rights reserved Mixed Effects Assumptions
The assumptions (sketched below) specify the complete error structure, with the firm and person effects random. For maximum likelihood or restricted maximum likelihood estimation, assume joint normality.
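A sketch of one standard set of such assumptions (the specific covariance structure on the original slide is an assumption here):

  \theta \sim (0, \sigma^2_\theta I_N), \qquad \psi \sim (0, \sigma^2_\psi I_J), \qquad \varepsilon \sim (0, \sigma^2_\varepsilon I)

with \theta, \psi, and \varepsilon mutually uncorrelated.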

© John M. Abowd 2005, all rights reserved Estimation by Mixed Effects Methods
– Solve the mixed effects equations (sketched below).
– Techniques: Bayesian EM, Restricted ML.
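In this literature the mixed effects equations are Henderson's mixed-model equations. Writing Z = [D F] for the combined random-effect design, u = (\theta', \psi')', Var(u) = G, and Var(\varepsilon) = R (the grouping into Z and u is notation assumed here):

  \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}
  \begin{bmatrix} \hat\beta \\ \hat{u} \end{bmatrix}
  =
  \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}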

© John M. Abowd 2005, all rights reserved Relation Between Fixed and Mixed Effects Models
Under the conditions stated above, the ME estimators of all parameters approach the FE estimators.
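One standard version of this result (assumed here to be the condition the slide refers to): as the random-effects precision G^{-1} \to 0 in the mixed-model equations above, the shrinkage term vanishes,

  Z'R^{-1}Z + G^{-1} \to Z'R^{-1}Z,

and the mixed-effects solution converges to the fixed-effects least-squares solution.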

© John M. Abowd 2005, all rights reserved Correlated Random Effects vs. Orthogonal Design
– Orthogonal design means that the characteristics design, the person design, and the firm design are mutually orthogonal.
– Uncorrelated random effects means that the covariance matrix of the random effects is diagonal.