Published by Rodney Summers. Modified over 8 years ago.
1
Small area estimation combining information from several sources
Jae-Kwang Kim, Iowa State University
Seo-Young Kim, Statistical Research Institute
July 19, 2011
The 4th Conference of ESRA, Lausanne, Switzerland, 18-22 July 2011
2
Outline
1. Introduction
2. Basic Theory
3. Applications to Unemployment
4. Discussion
3
1. Introduction: In the paper
■ To improve the direct estimators using auxiliary variables
  - from the Census
  - from administrative data and other independent survey data
■ To link the direct estimator and the auxiliary variables
■ In our study:
  - Area-level model approach
  - Several sources of auxiliary information
  - A measurement error model
■ To propose a composite estimator
  - using a generalized least squares method
  - applied to the unemployment survey in Korea
4
1. Introduction: Survey Integration
■ General Setup
  - Survey A: computes an estimate subject to sampling error.
  - Survey B: computes an estimate subject to sampling error.
  - Administrative records (Data C): measurements subject to coverage error.
  - Census: measures auxiliary variables.
■ The target measurement item is Y, but X can be helpful in predicting Y.
■ Goal: improve estimation of Y by incorporating various types of auxiliary information.
5
2. Basic Theory: GMM (or GLS) method
■ General Setup
  - Survey A: measures Y, subject to sampling error.
  - Survey B: measures X, subject to sampling error.
  - X and Y differ because of the structural difference between the surveys.
■ Structural difference (or systematic difference)
  - due to different survey modes
  - due to time difference
  - due to frame difference
■ Example: the Economically Active Population Survey at the national level vs. Local employment data at a smaller area level
6
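2. Basic Theory: GMM (or GLS) method
■ Two error models (for area i)
  - Sampling error model: each survey estimate equals the corresponding area-level quantity plus a sampling error.
  - Structural error model: relates the quantity measured by survey B to the quantity measured by survey A, using the (known) population size of area i.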
7
2. Basic Theory: GMM (or GLS) method
■ The structural error model describes the relationship between the two survey measurements up to sampling error.
  - Y: target measurement item (variable of primary interest)
  - X: inaccurate measurement of Y with possible systematic bias
■ If both X and Y measure the same item (with different survey modes), the structural error model is essentially a measurement error model. (A specific value of the model parameters corresponds to no measurement bias.)
■ Why model X as a function of Y instead of Y as a function of X? We want to treat Y as fixed rather than treating X as fixed.
8
2. Basic Theory: GMM (or GLS) method
■ If the parameters in the structural error model are known, an estimator computed from survey B is also an unbiased estimator of Y. When consistent parameter estimates are used instead, the estimator is often called a synthetic estimator.
■ How to combine the two estimators? The GLS (or GMM) approach.
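As a rough sketch of why an estimator built from survey B can be unbiased, suppose (this specific form is an assumption, not taken from the slides) that the structural error model is linear in the area means with known parameters \beta_0 and \beta_1, and that both the model error e_i and the sampling error b_i have mean zero. The symbols \tilde{Y}_i, e_i and b_i are illustrative names only.

```latex
\begin{aligned}
\bar{X}_i &= \beta_0 + \beta_1 \bar{Y}_i + e_i, & E(e_i) &= 0, \\
\hat{\bar{X}}_i &= \bar{X}_i + b_i, & E(b_i) &= 0, \\
\tilde{Y}_i &= \beta_1^{-1}\bigl(\hat{\bar{X}}_i - \beta_0\bigr)
             = \bar{Y}_i + \beta_1^{-1}(e_i + b_i), & E\bigl(\tilde{Y}_i - \bar{Y}_i\bigr) &= 0 .
\end{aligned}
```

Under these assumptions the survey B estimate, once inverted through the structural model, targets the same quantity as the direct estimator from survey A, which is what makes combining the two meaningful.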
9
2. Basic Theory: GMM (or GLS) method
■ GLS approach to combine the two error models
10
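2. Basic Theory: GMM (or GLS) method
■ GLS estimator: the best linear unbiased estimator of Y based on a linear combination of the direct and synthetic estimators.
■ Under the current setup, the GLS estimator is a weighted average of the two estimators.
■ The GLS estimator is sometimes called a composite estimator. In practice, the unknown parameters and variances must be replaced by estimates.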
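The idea of a composite estimator can be illustrated with a minimal sketch. It assumes two unbiased estimators of the same quantity whose errors are uncorrelated, in which case the best linear unbiased combination weights each estimator by the other's share of the total variance. The function name and the numbers are hypothetical, and the weights used in the talk may differ (for example, if the errors are correlated).

```python
def composite(direct, var_direct, synthetic, var_synthetic):
    """Combine two unbiased, uncorrelated estimators of the same quantity.

    Assumption: zero covariance between the two errors, so the optimal
    weight on the direct estimator is var_synthetic / (var_direct + var_synthetic).
    """
    alpha = var_synthetic / (var_direct + var_synthetic)
    estimate = alpha * direct + (1.0 - alpha) * synthetic
    # Variance of the optimally weighted combination.
    variance = var_direct * var_synthetic / (var_direct + var_synthetic)
    return estimate, variance, alpha


# Hypothetical inputs, for illustration only.
est, var, alpha = composite(direct=20.0, var_direct=4.0,
                            synthetic=22.0, var_synthetic=1.0)
print(est, var, alpha)
```

The combined variance is never larger than the smaller of the two input variances, which is the source of the efficiency gain over the direct estimator alone.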
11
2. Basic Theory: Model parameter estimation
■ To estimate the model parameters, we express the model in terms of the survey estimates.
■ Direct regression of one survey estimate on the other does not work because the regressor has sampling error.
■ The area-level model takes the form of a measurement error model (Fuller, 1987).
■ Parameter estimation can be performed using measurement error model estimation methods. (Details skipped.)
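The problem with direct regression can be seen in a small simulation. The sketch below assumes a linear structural model with hypothetical parameters and a known sampling variance for the regressor, and applies a textbook method-of-moments correction in the spirit of measurement error models. It is an illustration only, not the estimator used by the authors.

```python
import random

random.seed(0)

beta0, beta1 = 1.0, 0.8   # hypothetical true structural parameters
n_areas = 20000           # large, so the simulation noise is small
var_sampling = 1.0        # assumed known sampling variance of y_hat

y_hat, x = [], []
for _ in range(n_areas):
    y = random.gauss(10.0, 2.0)                       # true area value
    y_hat.append(y + random.gauss(0.0, var_sampling ** 0.5))
    x.append(beta0 + beta1 * y + random.gauss(0.0, 0.5))


def mean(values):
    return sum(values) / len(values)


m_y, m_x = mean(y_hat), mean(x)
s_yy = sum((a - m_y) ** 2 for a in y_hat) / (n_areas - 1)
s_xy = sum((a - m_y) * (b - m_x) for a, b in zip(y_hat, x)) / (n_areas - 1)

# Naive least squares slope: attenuated because y_hat contains sampling error.
naive_slope = s_xy / s_yy
# Method-of-moments correction: remove the sampling variance from s_yy.
corrected_slope = s_xy / (s_yy - var_sampling)

print(naive_slope, corrected_slope)
```

The naive slope is pulled toward zero, while the corrected slope is close to the hypothetical true value of 0.8.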
12
3. Korea LFS application
■ Several sources of information on unemployment in Korea
  - Economically Active Population Survey (EAPS) data
  - Local area employment survey data
  - The number of claims from the Unemployment Insurance (UI) system
13
3. Korea LFS application
■ Several sources of information on unemployment (for area i: analysis district)
  - Estimates from EAPS (subject to sampling error)
  - Estimates from the Local area employment survey
  - Estimates from the Claimant data (UI system)
■ Even though the Local area employment data are only slightly subject to sampling error, their estimates are subject to measurement error due to interviewer effects, etc.
■ Claimant data estimates can be modified to reduce the coverage error by using the structural error model.
14
3. Korea LFS application
■ First, we can construct a structural error model for the Local area employment estimates, referred to as (1).
■ To discuss estimation of the model parameters, we use the method of moments.
15
3. Korea LFS application
■ Sampling error model: referred to as (2).
■ We can rewrite (1) in terms of the population mean: referred to as (3).
16
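3. Korea LFS application
■ Combining (2) and (3) gives a model in terms of the survey estimates.
■ A consistent estimator of the model parameters is obtained by minimizing an objective function.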
17
3. Korea LFS application
■ Next, for the Claimant data, we can construct a model in which the error term represents the error associated with incomplete coverage.
■ Let g be the age*gender group. The model involves:
  - the mean of the measure for y for group g in area i
  - the population mean of y for group g in area i
  - the estimate of that population mean obtained from survey A
18
3. Korea LFS application
■ Synthetic estimator of Y based on the Claimant data model
■ Then, the composite estimator
■ We can estimate the model parameters using an argument similar to the previous case.
19
3. Korea LFS application
■ The GLS method can be applied to the separate structural error models to get a composite estimator, where Y_ib and Y_ic are two different synthetic estimators of Y_i.
■ We may use a simpler covariance matrix to achieve computational simplicity.
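A sketch of how GLS combines several estimators of the same quantity: the weights are proportional to the row sums of the inverse covariance matrix. The covariance matrix and the estimates below are hypothetical, and the diagonal version is one possible "simpler covariance matrix", not necessarily the one used in the talk.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gls_weights(cov):
    """Weights proportional to V^{-1} 1, normalised to sum to one."""
    u = solve(cov, [1.0] * len(cov))
    total = sum(u)
    return [v / total for v in u]


def variance_of(weights, cov):
    """Variance w' V w of a linear combination under covariance V."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical direct estimate and two synthetic estimates for one area.
estimates = [20.0, 22.0, 21.0]
cov_full = [[4.0, 0.0, 0.0],
            [0.0, 1.0, 0.5],
            [0.0, 0.5, 2.0]]
# Simplified covariance: keep the variances, drop the covariances.
cov_diag = [[cov_full[i][j] if i == j else 0.0 for j in range(3)]
            for i in range(3)]

w_full = gls_weights(cov_full)
w_diag = gls_weights(cov_diag)
composite_full = sum(w * y for w, y in zip(w_full, estimates))
composite_diag = sum(w * y for w, y in zip(w_diag, estimates))
var_full = variance_of(w_full, cov_full)
var_diag = variance_of(w_diag, cov_full)  # evaluated under the true covariance
print(w_full, var_full)
print(w_diag, var_diag)
```

Both sets of weights sum to one, so both composites remain unbiased when the inputs are unbiased. The simplified weights give a slightly larger variance, which is the efficiency cost of the simpler covariance matrix.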
20
3. Korea LFS application
■ CV of estimators
[Figure: cv by estimator (Y_ia, x1, Y_ib, Y_ic, Y_i*), where Y_i* is the composite estimator]
■ CV reduction effect from the composite estimator
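The coefficient of variation (CV) compared here is the standard error of an estimator divided by the estimate itself. A minimal sketch of the calculation follows; the function name and all numbers are hypothetical and are not taken from the survey results.

```python
import math


def coefficient_of_variation(estimate, variance):
    """Return the CV: standard error divided by the point estimate."""
    return math.sqrt(variance) / estimate


# Hypothetical values, for illustration only.
cv_direct = coefficient_of_variation(estimate=20000.0, variance=4.0e6)
cv_composite = coefficient_of_variation(estimate=20500.0, variance=1.0e6)
# Relative CV reduction achieved by the composite estimator.
reduction = 1.0 - cv_composite / cv_direct
print(cv_direct, cv_composite, reduction)
```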
21
3. Korea LFS application
■ Comparison of estimates at the large-area level
(EDs = number of enumeration districts)

Region     EDs (EAPS)  EDs (Local)  EAPS published (Y_ha)  Local direct (x1)  com1.Y_h  com2.Y_h  alternative
Seoul      230         710          225,000                208,226            199,434   166,842   192,971
Busan      120         400          58,000                 54,183             54,049    49,541    57,136
Daegu      100         219          46,000                 50,027             47,823    40,783    47,027
Incheon    112         257          68,000                 64,265             61,705    49,806    57,503
Gwangju    90          144          20,000                 22,601             22,074    19,993    22,894
Daejeon    90          142          18,000                 21,078             20,232    19,748    22,133
Ulsan      80          142          19,000                 17,943             18,237    16,898    19,048
Gyeonggi   220         1,521        219,000                166,772            174,245   168,539   187,982
Gangwon    100         659          17,000                 18,580             19,349    18,839
Chungbuk   100         467          15,000                 18,410             18,502    18,600
Chungnam   100         617          33,000                 28,549             29,482    28,568
Jeonbuk    90          492          18,000                 22,046             23,330    22,967
Jeonnam    137         740          16,000                 18,032             20,121    19,778
Gyeongbuk  133         819          35,000                 31,115             34,188    33,422
Gyeongnam  110         702          48,000                 38,739             42,340    41,546
Jeju       50          105          5,000                  5,427              5,926     6,068
Total      1,862       8,136        857,000                785,993            791,037   721,938   796,482
22
4. Discussion
■ Two error models: the sampling error model and the structural error model.
■ The GLS method provides a useful tool for combining the two models.
  - It does not rely on parametric distributional assumptions.
  - It requires correct specification of the variance-covariance matrix for optimal estimation, and therefore variance component estimation.
  - A simpler form of the covariance matrix can be used, trading statistical efficiency for computational simplicity.
23
References
Deming, W. E. & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11, 427-444.
Fay, R. & Herriot, R. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. J. Am. Statist. Assoc. 74, 269-277.
Fuller, W. (1987). Measurement Error Models. Hoboken, New Jersey: John Wiley & Sons, Inc.
Zieschang, K. (1990). Sample weighting method and estimation of totals in the consumer expenditure survey. J. Am. Statist. Assoc. 85, 986-1001.