Small area estimation combining information from several sources Jae-Kwang Kim, Iowa State University Seo-Young Kim, Statistical Research Institute July 19, 2011 The 4 th Conference of ESRA, Lausanne, Switzerland July 2011
Introduction Basic Theory Applications to Unemployment Discussion Outline 1
■ To improve the direct estimators using auxiliary variables, ■ from Census ■ From administrative data, other independent survey data ■ To link the direct estimator and the auxiliary variable ■ In our study; ■ Area-level model approach, ■ Several source of auxiliary information, ■ A measurement error model ■ To propose Composite estimator ■ Using a generalized least squares method ■ In the unemployment survey in Korea 1. Introduction In the paper 2
■ General Setup ■ Survey A: Computes, subject to sampling error. ■ Survey B: Computes, subject to sampling error. ■ Administrative records (Data C): Measures, subject to coverage error. ■ Census: Measures. ■ Target measurement item is Y, but can be helpful to predict Y. ■ Goal: Improve estimation of by incorporating various types of auxiliary information. 1. Introduction Survey Integration 3
■ General Setup ■ Survey A: Measures, subject to sampling error. ■ Survey B: Measures, subject to sampling error. ■ due to the structural difference between the surveys ■ Structural difference (or systematic difference) ■ due to different mode of survey ■ due to time difference ■ due to frame difference ■ Example: Economically Active Population Survey at national level vs. Local employment data in more small area level 2. Basic Theory GMM (or GLS) method 4
■ Two error models (for area i) ■ Sampling error model Where represents the sampling error such that ■ Structural error model Where is the (known) population size of area i. 2. Basic Theory GMM (or GLS) method 5
■ Structural error model describes the relationship between the two survey measurement up to sampling error. - Y : target measurement item (variable of primary interest) - X : inaccurate measurement of Y with possible systematic bias. ■ If both X and Y measure the same item (with different survey modes), structural error model is essentially a measurement error model. ( means no measurement bias.) ■ Why consider instead of ? : We want to treat fixed rather than treating fixed. 2. Basic Theory GMM (or GLS) method 6
■ If the parameters in the structural error model are known, is also an unbiased estimator of, computed from called survey B. Estimator, using consistent is often called synthetic estimator. ■ How to combine the two estimators ? : GLS (or GMM) approach 2. Basic Theory GMM (or GLS) method 7
■ GLS approach to combine two error models: Where. Thus 2. Basic Theory GMM (or GLS) method 8
■ GLS estimator : Best linear unbiased estimator of based on the linear combination of ■ Under the current setup, Where ■ The GLS estimator is sometimes called composite estimator. In practice, we need to use and 2. Basic Theory GMM (or GLS) method 9
■ To estimate, we express ■ Direct regression of on does not work because has sampling error. ■ The area-level model takes the form of measurement error model (Fuller, 1987). ■ Parameter estimation can be performed using the measurement error model estimation methods. (Details skipped.) 2. Basic Theory Model parameter estimation 10
■ Several sources of information for unemployment of Korea ■ Economically Activity Population Survey (EAPS) data ■ Local area employment survey data ■ The number of claims from Unemployment Insurance system (UI) 3. Korea LFS application 11
■ Several sources of information for unemployment (for area i: analysis district) ■ : estimates from EAPS (subject to sampling error) ■ : estimates from Local area employment ■ : estimates from the Claimant data, UI system ■ Even though Local area employment data is a little subject to sampling error, it’s estimates are subject to measurement error due to the effect of interviewer, etc.. ■ Claimant data estimates can be modified to reduce the coverage error by using structural error model. 3. Korea LFS application 12
■ First, we can construct structural error models of where 3. Korea LFS application To discuss estimation of 13 ■ Thus, by the method of moment (1)
■ Sampling error model ■ We can rewrite (1) in terms of population mean where 3. Korea LFS application 14 (2) (3)
■ Combining (2) and (3) ■ A consistent estimator of : Minimize 3. Korea LFS application 15
■ Next, for the Claimant data, we can construct a model let be the age*gender group, : the mean of the measure for y for group g in area i : the population mean of y for group g in area i : the estimate of obtained from survey A 3. Korea LFS application 16 Where represent the error associated with incomplete coverage
■Synthetic estimator of 3. Korea LFS application 17 ■ Then, Composite estimator ■ We can estimate using the argument similar to previous case, respectively.
■ GLS method can be applied to separate structural error models to get where and are two different synthetic estimators of ■ We may use a simpler covariance matrix to achieve computational simplicity. 3. Korea LFS application 18
■ CV of estimators; 3. Korea LFS application 19 cv Estimators Y ia x1 Y ib Y ic Y i * Y i *: composite estimator CV reduction effect from composite estimator
3. Korea LFS application 20 region 조사구수 경활공표 (Y_ha) 지역고용 _ 직 접추정 (x1) com1.Y_hcom2.Y_h alternative 경활 지역고 용 서울 ,000208,226199,434166,842192,971 부산 ,00054,18354,04949,54157,136 대구 ,00050,02747,82340,78347,027 인천 ,00064,26561,70549,80657,503 광주 ,00022,60122,07419,99322,894 대전 ,00021,07820,23219,74822,133 울산 ,00017,94318,23716,89819,048 경기 ,000166,772174,245168,539187,982 강원 ,00018,58019,34918,839 충북 ,00018,41018,50218,600 충남 ,00028,54929,48228,568 전북 ,00022,04623,33022,967 전남 ,00018,03220,12119,778 경북 ,00031,11534,18833,422 경남 ,00038,73942,34041,546 제주 ,0005,4275,9266,068 1,8628,136857,000785,993791,037721, ,482 ■ Comparison of estimates in large area level
■ Two error models: Sampling error model and structural error model. ■ GLS method provides a useful tool for combining two models. ■ Does not rely on parametric distributional assumptions. ■ Requires correct specification of the variance-covariance matrix for optimal estimation. Requires variance component estimation. ■ Simpler form of covariance matrix can be used for computational simplicity over statistical efficiency. 4. Discussion 21
D EMING, W. E. &S TEPHAN, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11, F AY, R. & H ERRIOT, R. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. J. Am. Statist. Assoc. 74, F ULLER, W. (1987). Measurement Error Models. Hoboken, New Jersey: John Wiley & Sons, Inc. Z IESCHANG, K. (1990). Sample weighting method and estimation of totals in the consumer expenditure survey. J. Am. Statist. Assoc. 85, References 22