Small area estimation combining information from several sources Jae-Kwang Kim, Iowa State University Seo-Young Kim, Statistical Research Institute July.


Small area estimation combining information from several sources
Jae-Kwang Kim, Iowa State University
Seo-Young Kim, Statistical Research Institute
July 19, 2011
The 4th Conference of ESRA, Lausanne, Switzerland

Outline
■ Introduction
■ Basic Theory
■ Applications to Unemployment
■ Discussion

1. Introduction: In the paper
■ To improve the direct estimators using auxiliary variables
  ■ from the Census
  ■ from administrative data and other independent survey data
■ To link the direct estimator and the auxiliary variable
■ In our study:
  ■ Area-level model approach
  ■ Several sources of auxiliary information
  ■ A measurement error model
■ To propose a composite estimator
  ■ Using a generalized least squares method
  ■ In the unemployment survey in Korea

1. Introduction: Survey integration
■ General setup
  ■ Survey A: computes an estimate of Y, subject to sampling error.
  ■ Survey B: computes an auxiliary variable, subject to sampling error.
  ■ Administrative records (Data C): measure an auxiliary variable, subject to coverage error.
  ■ Census: measures an auxiliary variable.
■ The target measurement item is Y, but the auxiliary variables can be helpful to predict Y.
■ Goal: improve estimation of Y by incorporating various types of auxiliary information.

2. Basic Theory: GMM (or GLS) method
■ General setup
  ■ Survey A: measures Y, subject to sampling error.
  ■ Survey B: measures X, subject to sampling error.
  ■ X and Y differ due to the structural difference between the surveys.
■ Structural difference (or systematic difference)
  ■ due to different survey modes
  ■ due to time difference
  ■ due to frame difference
■ Example: the Economically Active Population Survey at the national level vs. local employment data at a smaller area level

2. Basic Theory: GMM (or GLS) method
■ Two error models (for area i)
  ■ Sampling error model: [equation not preserved], with an error term that represents the sampling error.
  ■ Structural error model: [equation not preserved], which involves the (known) population size of area i.
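The equations on this slide are missing from the transcript. As a hedged sketch only, two error models of this kind are often written as follows. The symbols a_i, b_i, e_i, β0, β1 and the use of per-unit means are assumptions made for illustration; they are not recovered from the slides.

```latex
\begin{align*}
\text{Sampling error model:} \quad
  & \hat{Y}_i = Y_i + a_i, \quad \hat{X}_i = X_i + b_i, \quad E(a_i) = E(b_i) = 0, \\
\text{Structural error model:} \quad
  & \bar{X}_i = \beta_0 + \beta_1 \bar{Y}_i + e_i, \quad E(e_i) = 0, \quad
    (\bar{X}_i, \bar{Y}_i) = N_i^{-1} (X_i, Y_i),
\end{align*}
```

Here N_i denotes the known population size of area i, and the structural model is written with X as a function of Y so that Y can be treated as fixed.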

2. Basic Theory: GMM (or GLS) method
■ The structural error model describes the relationship between the two survey measurements up to sampling error.
  - Y: target measurement item (variable of primary interest)
  - X: inaccurate measurement of Y with possible systematic bias
■ If both X and Y measure the same item (with different survey modes), the structural error model is essentially a measurement error model. ([condition not preserved] means no measurement bias.)
■ Why model X given Y instead of Y given X? We want to treat Y as fixed rather than treating X as fixed.

2. Basic Theory: GMM (or GLS) method
■ If the parameters in the structural error model are known, an unbiased estimator of Y can also be computed from the Survey B estimate. This estimator, with the parameters replaced by consistent estimates, is often called the synthetic estimator.
■ How to combine the two estimators? GLS (or GMM) approach.
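A minimal sketch of a synthetic estimator, assuming a linear structural model of the form x_hat = beta0 + beta1 * Y + e + b, where e is the structural error and b is the Survey B sampling error. The function name, the argument names, and the model form are illustrative assumptions, not taken from the slides.

```python
def synthetic_estimate(x_hat, beta0, beta1, var_b, var_e):
    """Invert an assumed linear structural model to estimate Y from Survey B.

    Assumed model (hypothetical): x_hat = beta0 + beta1 * Y + e + b,
    with E(e) = E(b) = 0, Var(e) = var_e, Var(b) = var_b, e and b uncorrelated.
    Then (x_hat - beta0) / beta1 = Y + (e + b) / beta1 is unbiased for Y.
    """
    if beta1 == 0:
        raise ValueError("beta1 must be nonzero to invert the structural model")
    estimate = (x_hat - beta0) / beta1
    # Variance of (e + b) / beta1 under the assumptions above.
    variance = (var_b + var_e) / beta1 ** 2
    return estimate, variance


# Illustrative numbers only.
est, var = synthetic_estimate(x_hat=52.0, beta0=2.0, beta1=0.5, var_b=3.0, var_e=1.0)
print(est, var)
```

In practice beta0 and beta1 would be replaced by consistent estimates, which adds estimation error that this sketch ignores.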

2. Basic Theory: GMM (or GLS) method
■ GLS approach to combine the two error models: [matrix equation and resulting expression not preserved]

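2. Basic Theory: GMM (or GLS) method
■ GLS estimator: the best linear unbiased estimator of Y among linear combinations of the two survey estimates.
■ Under the current setup: [expression and weight definition not preserved]
■ The GLS estimator is sometimes called the composite estimator. In practice, the unknown model parameters and variances must be replaced by estimates.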
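To make the composite idea concrete, here is a small sketch that combines a direct estimator and a synthetic estimator of the same area value. It assumes both are unbiased with known variances and uncorrelated errors, in which case the best linear unbiased combination uses inverse-variance weights. The function and argument names are hypothetical.

```python
def composite_estimate(y_direct, var_direct, y_synth, var_synth):
    """Inverse-variance weighted combination of two unbiased estimators.

    Assumes the two estimation errors are uncorrelated (an assumption of
    this sketch, not a statement about the slides).
    """
    if var_direct <= 0 or var_synth <= 0:
        raise ValueError("variances must be positive")
    # Weight on the direct estimator: larger when the synthetic one is noisier.
    alpha = var_synth / (var_direct + var_synth)
    estimate = alpha * y_direct + (1.0 - alpha) * y_synth
    # Variance of the combination; never larger than either input variance.
    variance = var_direct * var_synth / (var_direct + var_synth)
    return estimate, variance, alpha


# Illustrative numbers only.
est, var, alpha = composite_estimate(10.0, 4.0, 20.0, 12.0)
print(est, var, alpha)
```

With estimated rather than known variances the result is no longer exactly optimal, which is the practical caveat raised on the slide.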

2. Basic Theory: Model parameter estimation
■ To estimate the model parameters, we express [equation not preserved].
■ Direct regression of one survey estimate on the other does not work because the regressor has sampling error.
■ The area-level model takes the form of a measurement error model (Fuller, 1987).
■ Parameter estimation can be performed using measurement error model estimation methods. (Details skipped.)
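The slides skip the estimation details. As one hedged illustration of why naive regression fails and how a measurement error correction works, the sketch below uses a simple method-of-moments estimator with known sampling variances of the regressor. It assumes the model x_hat_i = beta0 + beta1 * Y_i + error and y_hat_i = Y_i + a_i with Var(a_i) known; the function name and the model form are assumptions, and this is not necessarily the method used by the authors.

```python
import numpy as np


def fit_measurement_error_line(x_hat, y_hat, var_a):
    """Moment estimator of (beta0, beta1) when the regressor y_hat is noisy.

    var_a holds the known sampling variance(s) of y_hat (scalar or per area).
    The naive slope S_xy / S_yy is attenuated because S_yy includes sampling
    noise; subtracting the average noise variance corrects the denominator.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    var_a = np.asarray(var_a, dtype=float)

    s_yy = np.var(y_hat, ddof=1)
    s_xy = np.cov(x_hat, y_hat, ddof=1)[0, 1]
    signal_var = s_yy - var_a.mean()  # estimated variance of the true Y_i
    if signal_var <= 0:
        raise ValueError("estimated signal variance is not positive")

    beta1 = s_xy / signal_var
    beta0 = x_hat.mean() - beta1 * y_hat.mean()
    return float(beta0), float(beta1)


# Illustrative data: the corrected slope is steeper than the naive one.
b0, b1 = fit_measurement_error_line([1.0, 2.0, 3.0], [0.0, 2.0, 4.0], var_a=2.0)
print(b0, b1)
```

Setting var_a to zero reduces the estimator to ordinary least squares, which makes the size of the correction easy to inspect.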

3. Korea LFS application
■ Several sources of information on unemployment in Korea
  ■ Economically Active Population Survey (EAPS) data
  ■ Local area employment survey data
  ■ The number of claims from the Unemployment Insurance (UI) system

3. Korea LFS application
■ Several sources of information on unemployment (for area i: analysis district)
  ■ Estimates from EAPS (subject to sampling error)
  ■ Estimates from the local area employment survey
  ■ Estimates from the claimant data of the UI system
■ Even though the local area employment data are only slightly subject to sampling error, its estimates are subject to measurement error due to interviewer effects, etc.
■ Claimant data estimates can be modified to reduce the coverage error by using the structural error model.

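3. Korea LFS application
■ First, we can construct the structural error model: [equation not preserved]
■ To discuss estimation of its parameters: [intermediate expressions not preserved]
■ Thus, by the method of moments: [equation (1) not preserved]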

3. Korea LFS application
■ Sampling error model: [equation not preserved]
■ We can rewrite (1) in terms of the population mean: [equation not preserved]
(The two equations on this slide are labeled (2) and (3).)

3. Korea LFS application
■ Combining (2) and (3): [equation not preserved]
■ A consistent estimator of the parameters is obtained by minimizing [objective function not preserved].

3. Korea LFS application
■ Next, for the claimant data, we can construct a model: [equation not preserved]
■ Let g index the age × gender group. The model involves:
  ■ the mean of the measure for y for group g in area i
  ■ the population mean of y for group g in area i
  ■ the estimate of that population mean obtained from Survey A
  ■ an error term representing the error associated with incomplete coverage

3. Korea LFS application
■ Synthetic estimator of the area value: [equation not preserved]
■ Then, the composite estimator: [equation not preserved]
■ The required parameters can be estimated using an argument similar to the previous case.

3. Korea LFS application
■ The GLS method can be applied to the separate structural error models to get a composite estimator [equation not preserved] that combines the direct estimator with two different synthetic estimators.
■ We may use a simpler covariance matrix to achieve computational simplicity.
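The following sketch illustrates the general GLS combination of several unbiased estimators of one quantity, and what happens when the full covariance matrix is replaced by its diagonal. The numbers and the function name `gls_combine` are made up for illustration; the covariance structure actually used in the study is not preserved in the transcript.

```python
import numpy as np


def gls_combine(estimates, cov):
    """Best linear unbiased combination of unbiased estimators of one value.

    Weights are V^{-1} 1 / (1' V^{-1} 1); the returned variance 1 / (1' V^{-1} 1)
    is correct only if `cov` is the true covariance matrix.
    """
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(cov, dtype=float)
    ones = np.ones_like(y)
    v_inv_ones = np.linalg.solve(v, ones)
    denom = ones @ v_inv_ones
    weights = v_inv_ones / denom
    return float(weights @ y), float(1.0 / denom), weights


# Hypothetical direct estimate and two synthetic estimates for one area.
estimates = [10.0, 20.0, 14.0]
full_cov = np.array([[4.0, 1.0, 0.5],
                     [1.0, 12.0, 2.0],
                     [0.5, 2.0, 8.0]])

est_full, var_full, w_full = gls_combine(estimates, full_cov)

# Simplified version: ignore the off-diagonal covariances when forming weights.
diag_cov = np.diag(np.diag(full_cov))
est_diag, _, w_diag = gls_combine(estimates, diag_cov)

# Actual variance of the simplified estimator under the full covariance.
var_diag_true = float(w_diag @ full_cov @ w_diag)
print(est_full, var_full)
print(est_diag, var_diag_true)
```

The simplified weights still give an unbiased estimator because they sum to one; the cost is a variance that is at least as large as that of the full GLS estimator.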

3. Korea LFS application
■ CV of estimators
[Figure: CV (vertical axis) by estimator (Y_ia, x1, Y_ib, Y_ic, Y_i*), where Y_i* is the composite estimator. The figure shows the CV reduction effect from the composite estimator.]
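For reference, the coefficient of variation compared in the figure is the standard error relative to the estimate. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math


def cv_percent(estimate, variance):
    """Coefficient of variation in percent: 100 * sqrt(variance) / |estimate|."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return 100.0 * math.sqrt(variance) / abs(estimate)


# Illustrative numbers only: a smaller variance gives a smaller CV.
print(cv_percent(200.0, 100.0))
print(cv_percent(200.0, 25.0))
```

A composite estimator with smaller variance than the direct estimator will show a lower CV for a similar point estimate, which is the effect the figure summarizes.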

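3. Korea LFS application
■ Comparison of estimates at the large-area level

Columns: Y_ha = EAPS published estimate; x1 = local employment direct estimate; com1.Y_h, com2.Y_h = composite estimates; alternative = alternative estimate. "…" marks digits or cells lost in the transcript.

Region      Y_ha      x1        com1.Y_h  com2.Y_h  alternative
Seoul       …,000     208,226   199,434   166,842   192,971
Busan       …,000     54,183    54,049    49,541    57,136
Daegu       …,000     50,027    47,823    40,783    47,027
Incheon     …,000     64,265    61,705    49,806    57,503
Gwangju     …,000     22,601    22,074    19,993    22,894
Daejeon     …,000     21,078    20,232    19,748    22,133
Ulsan       …,000     17,943    18,237    16,898    19,048
Gyeonggi    …,000     166,772   174,245   168,539   187,982
Gangwon     …,000     18,580    19,349    18,839
Chungbuk    …,000     18,410    18,502    18,600
Chungnam    …,000     28,549    29,482    28,568
Jeonbuk     …,000     22,046    23,330    22,967
Jeonnam     …,000     18,032    20,121    19,778
Gyeongbuk   …,000     31,115    34,188    33,422
Gyeongnam   …,000     38,739    42,340    41,546
Jeju        …,000     5,427     5,926     6,068
Total       857,000   785,993   791,037   721,…     …,482

Number of enumeration districts (total): EAPS 1,862; local employment survey 8,136. Per-region district counts are not preserved.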

4. Discussion
■ Two error models: the sampling error model and the structural error model.
■ The GLS method provides a useful tool for combining the two models.
  ■ It does not rely on parametric distributional assumptions.
  ■ It requires correct specification of the variance-covariance matrix for optimal estimation, and hence variance component estimation.
■ A simpler form of the covariance matrix can be used, trading statistical efficiency for computational simplicity.

References
Deming, W. E. & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11.
Fay, R. & Herriot, R. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. J. Am. Statist. Assoc. 74.
Fuller, W. (1987). Measurement Error Models. Hoboken, New Jersey: John Wiley & Sons, Inc.
Zieschang, K. (1990). Sample weighting method and estimation of totals in the consumer expenditure survey. J. Am. Statist. Assoc. 85.