1
www.statistik.at
We move information
Data Imputation and Estimation for the Austrian Register-Based Census Test
Reinhard Fiedler, Peter Schodl
April 23rd, 2008
© STATISTIK AUSTRIA
2
Welcome 4/23/2008
3
Introduction
- Background Information
  - Time Plan
  - Pros and Cons
  - Registers used for the RBCT (Register-Based Census Test)
- Estimation procedures
  - Record Linkage
  - Estimation
  - Hot-deck technique
  - Clustering
4
Background Information
Time Plan
- 2001: last conventional census
- …: reference date for the RBCT
- April 2008: first report on the RBCT
- 2010: first register-based census
5
Background Information
Pros and cons
Pros:
- economic efficiency
- faster results
- can be carried out more often
- less burden on respondents
- privacy
Cons:
- incomplete data
- inconsistent data
- timeliness
6
Background Information
Registers used for the RBCT
- 8 base registers, e.g.
  - Central Population Register (CPR)
  - Central Social Security Register (CSSR)
  - Register of Educational Attainment
- 7 comparison registers for cross-checks, e.g.
  - Register of Family Allowance
  - Register of Social Welfare
- Linkage by unique keys:
  - Branch-specific identification number (bPK), a sector-specific personal code
  - Social Security Number (RBCT)
7
Background Information
Missing data
- Low missing rates (covered by more than one data source):
  - Sex (<1% missing)
  - Date of birth (<1% missing)
- Medium to high missing rates:
  - Marital status (11% missing)
  - Graduates (7% missing)
- Not included in any register:
  - Occupation (100% missing)
8
Estimation Strategy
- Record linkage: for all registers
- Estimation:
  - Marital status (high missing rate)
  - Occupation (not contained in any register)
  - Graduates (immigrants since the last census)
9
Record Linkage
Problem: imperfect linkage of registers
- Wrong or missing keys
- Attributes used: date of birth, address, nationality, sex
- Standardization of notations
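The linkage steps above can be sketched as a two-pass procedure in Python. This is a minimal illustration, not the linkage actually used for the RBCT: it assumes each register is a list of dicts with hypothetical field names (`key`, `dob`, `address`, `nationality`, `sex`) and hypothetical standardization rules (lowercasing, `ß` to `ss`, removing accents and punctuation, expanding a trailing `str` to `strasse`). Pass 1 links on the unique key; pass 2 links the remaining records on the standardized attributes and accepts a match only when exactly one candidate is left.

```python
import re
import unicodedata

# Hypothetical notation rule: expand the abbreviation "str" at the end of a word
ABBREVIATIONS = {r"str\b": "strasse"}

def standardize(text):
    """Standardize notation: lowercase, ß -> ss, strip accents and punctuation."""
    if not text:
        return ""
    text = text.lower().replace("ß", "ss")
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    for pattern, replacement in ABBREVIATIONS.items():
        text = re.sub(pattern, replacement, text)
    return text

def attribute_key(rec):
    """Linkage attributes from the slide: date of birth, address, nationality, sex."""
    return (rec.get("dob"), standardize(rec.get("address")),
            standardize(rec.get("nationality")), standardize(rec.get("sex")))

def link(left, right):
    """Return (left_index, right_index) pairs of linked records."""
    pairs, used_left, used_right = [], set(), set()

    # Pass 1: exact match on the unique key
    by_key = {r["key"]: j for j, r in enumerate(right) if r.get("key")}
    for i, rec in enumerate(left):
        j = by_key.get(rec.get("key")) if rec.get("key") else None
        if j is not None and j not in used_right:
            pairs.append((i, j))
            used_left.add(i)
            used_right.add(j)

    # Pass 2: wrong or missing key -> match on standardized attributes,
    # accepting only unambiguous (single-candidate) matches
    by_attr = {}
    for j, r in enumerate(right):
        if j not in used_right:
            by_attr.setdefault(attribute_key(r), []).append(j)
    for i, rec in enumerate(left):
        if i in used_left:
            continue
        candidates = [j for j in by_attr.get(attribute_key(rec), []) if j not in used_right]
        if len(candidates) == 1:
            pairs.append((i, candidates[0]))
            used_left.add(i)
            used_right.add(candidates[0])
    return pairs

# Hypothetical records: the second person has a missing key on one side
# and a wrong key on the other, but identical standardized attributes
left = [
    {"key": "K1", "dob": "1970-01-01", "address": "Hauptstraße 5", "nationality": "AT", "sex": "m"},
    {"key": None, "dob": "1985-06-30", "address": "Ringstr. 12", "nationality": "Österreich", "sex": "f"},
]
right = [
    {"key": "K1", "dob": "1970-01-01", "address": "Hauptstrasse 5", "nationality": "AT", "sex": "M"},
    {"key": "K9", "dob": "1985-06-30", "address": "Ringstrasse 12", "nationality": "Osterreich", "sex": "F"},
]
print(link(left, right))  # [(0, 0), (1, 1)]
```

Requiring a single candidate in pass 2 trades some missed links for fewer false links; a real system would also need rules for ties and partial agreement.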
10
Record Linkage
Example: current school enrolment
Through record linkage, the number of school-age people without a current school enrolment was reduced by 40%.
11
Estimation
Occupation and graduates
Graduates:
- Source: the RBCT itself (people with graduation)
- Target: people with missing graduation
Occupation:
- Source: Labour Force Survey (quarterly sample survey), about … people with an occupation in the survey
- Target: people with missing occupation (all working persons)
12
Estimation
Basic idea:
- Same procedure for the estimation of occupation and of graduates
- Estimation at person level
- Target distribution: groups are built so that the distribution of the source can be transferred to the corresponding group of the target
- Groups are formed by attributes with a significant influence on the target variable
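The grouping idea above can be sketched in Python. This assumes records are dicts and that `group_vars` lists the attributes defining a group; the attribute names (`region`, `age_group`, `occupation`) and the records are hypothetical. For each source group the function computes the relative frequencies of the target variable; a target record receives the distribution of the group with the same attribute values, or `None` if the source has no donor in that group.

```python
from collections import Counter, defaultdict

def group_distributions(source, group_vars, target_var):
    """Relative frequency of each value of target_var within each source group."""
    counts = defaultdict(Counter)
    for rec in source:
        group = tuple(rec[v] for v in group_vars)
        counts[group][rec[target_var]] += 1
    return {
        group: {value: n / sum(c.values()) for value, n in c.items()}
        for group, c in counts.items()
    }

def distribution_for(record, distributions, group_vars):
    """Distribution transferred to a target record, or None if its group has no donors."""
    return distributions.get(tuple(record[v] for v in group_vars))

# Hypothetical source records (e.g. survey respondents with a known occupation)
source = [
    {"region": "Tyrol", "age_group": "30-34", "occupation": "A"},
    {"region": "Tyrol", "age_group": "30-34", "occupation": "B"},
    {"region": "Tyrol", "age_group": "30-34", "occupation": "B"},
    {"region": "Tyrol", "age_group": "35-39", "occupation": "C"},
]
group_vars = ("region", "age_group")
dists = group_distributions(source, group_vars, "occupation")

# A target record with a missing occupation takes its group's distribution
target = {"region": "Tyrol", "age_group": "30-34", "occupation": None}
print(distribution_for(target, dists, group_vars))
```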
13
Hot-deck technique
Example:
- 1000 people aged 30 to 34 living in Tyrol form one deck in the Labour Force Survey:
  - 200 with occupation A
  - 300 with occupation B
  - 500 with occupation C
- The resulting weighting scheme is applied to all people within the same deck in the RBCT:
  - 20% probability of occupation A
  - 30% probability of occupation B
  - 50% probability of occupation C
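The weighting scheme in this example can be sketched in Python. The donor counts (200, 300, 500) are taken from the slide. The number of RBCT persons in the deck (4000 here) is a hypothetical value, and drawing each person's occupation independently with `random.choices` is one plausible reading of "probability", not a confirmed detail of the method.

```python
import random
from collections import Counter

def deck_probabilities(donor_counts):
    """Turn donor counts within one deck into assignment probabilities."""
    total = sum(donor_counts.values())
    return {value: n / total for value, n in donor_counts.items()}

def impute_deck(n_persons, probabilities, seed=None):
    """Draw a value for each target person in the deck according to the probabilities."""
    rng = random.Random(seed)
    values = list(probabilities)
    weights = [probabilities[v] for v in values]
    return rng.choices(values, weights=weights, k=n_persons)

# Deck from the slide: people aged 30 to 34 living in Tyrol (Labour Force Survey donors)
lfs_counts = {"A": 200, "B": 300, "C": 500}
probs = deck_probabilities(lfs_counts)
print(probs)  # {'A': 0.2, 'B': 0.3, 'C': 0.5}

# Hypothetical number of RBCT persons with missing occupation in the same deck
imputed = impute_deck(4000, probs, seed=1)
print(Counter(imputed))
```

Random draws reproduce the deck distribution only approximately; the counts printed in the last line vary around 800, 1200 and 2000.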
14
Which attributes have influence?
Graduates:
- Age
- Status in employment
- Sex
- Nationality
- Urban / rural environment
Occupation:
- Age
- Status in employment
- Sex
- Nationality
- Region
- NACE of employment
- Level of educational attainment
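The slides do not say how the influence of an attribute was tested. One common choice for a categorical attribute and a categorical target variable is Pearson's chi-square test of independence, sketched below in plain Python. The contingency table (for example sex by occupation) is made up, and the statistic is compared with the tabulated 5% critical value for its degrees of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# 5% critical values of the chi-square distribution
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical counts: rows = male / female, columns = occupation A / B
table = [[30, 70], [60, 40]]
stat, dof = chi_square_independence(table)
print(round(stat, 2), dof, stat > CRITICAL_5_PERCENT[dof])  # 18.18 1 True
```

An attribute whose statistic exceeds the critical value would be a candidate for defining the groups.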
15
Clustering
Groups must not be too small:
- No donor for many persons
- Wrong distribution
Example:
Source:
- Tyrol, male, 87 years, German nationality: 10 persons, 5 with occupation A (50% A), 5 with occupation B (50% B)
- Tyrol, female, 87 years, German nationality: 1 person, with occupation B (100% B)
Target:
- Tyrol, male, 87 years, German nationality: … persons, 500 A, 500 B
- Tyrol, female, 87 years, German nationality: … persons, 1000 B
- Tyrol, male, 87 years, German nationality: … occupation A, 1500 occupation B
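The arithmetic of this example can be reproduced in Python. The donor counts come from the slide. The target size of 1000 persons per cell is an assumption that matches the 500 A / 500 B and 1000 B figures, and `MIN_DONORS` is a hypothetical threshold for flagging cells with too few donors.

```python
def expected_allocation(donors, target_size):
    """Expected number of target persons per value; None if the cell has no donors."""
    total = sum(donors.values())
    if total == 0:
        return None  # no donor: the distribution cannot be transferred
    return {value: target_size * n / total for value, n in donors.items()}

MIN_DONORS = 5  # hypothetical threshold for a "too small" group

cells = {
    # cell: (donor counts in the source, persons in the target)
    ("Tyrol", "male", 87, "German"): ({"A": 5, "B": 5}, 1000),
    ("Tyrol", "female", 87, "German"): ({"B": 1}, 1000),
}

total = {}
too_small = []
for cell, (donors, target_size) in cells.items():
    if sum(donors.values()) < MIN_DONORS:
        too_small.append(cell)
    allocation = expected_allocation(donors, target_size) or {}
    for value, n in allocation.items():
        total[value] = total.get(value, 0) + n

print(total)      # {'A': 500.0, 'B': 1500.0}
print(too_small)  # [('Tyrol', 'female', 87, 'German')]
```

The single female donor fixes the occupation of all 1000 female target persons, which is why a group this small produces a wrong distribution.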
16
Clustering
Optimal groups by cluster analysis
Groups must not be too big:
- The distribution is correct only at the highest level and can be incorrect at lower levels.
- Example: 2 groups, male / female. The distributions for males and for females are transferred to the target, so the distributions of source and target agree for males and for females. But the distributions by region, age, … can be incorrect!
17
Clustering
Occupation
- First clustering: variables with many values (age, nationality, region, …)
- Second clustering: since the groups after the first clustering are still too small, they are clustered again
[Figure: first and second aggregation of groups by nationality and age]
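The slides do not specify the clustering algorithm. As a stand-in, the sketch below merges adjacent categories of one ordered variable (hypothetical age classes with made-up occupation counts): while the smallest group has fewer donors than `min_size`, it is merged with the neighbouring group whose occupation distribution is closest in total variation distance. This is a simple greedy agglomerative heuristic for illustration, not the cluster analysis used in the RBCT.

```python
from collections import Counter

def tv_distance(c1, c2):
    """Total variation distance between the relative frequencies of two Counters."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    if n1 == 0 or n2 == 0:
        return 0.0  # an empty group can join any neighbour
    keys = set(c1) | set(c2)
    return 0.5 * sum(abs(c1[k] / n1 - c2[k] / n2) for k in keys)

def merge_small_groups(groups, min_size):
    """Merge adjacent groups until every group has at least min_size donors.

    groups: list of (label, counts) ordered along one variable, e.g. age classes.
    """
    groups = [(label, Counter(counts)) for label, counts in groups]
    while len(groups) > 1:
        sizes = [sum(c.values()) for _, c in groups]
        i = min(range(len(groups)), key=lambda k: sizes[k])
        if sizes[i] >= min_size:
            break  # the smallest group is large enough, so all groups are
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(groups)]
        j = min(neighbours, key=lambda k: tv_distance(groups[i][1], groups[k][1]))
        a, b = sorted((i, j))
        merged = (f"{groups[a][0]}+{groups[b][0]}", groups[a][1] + groups[b][1])
        groups[a:b + 1] = [merged]
    return groups

# Hypothetical donor counts of occupations A and B per age class
age_classes = [
    ("30-34", {"A": 40, "B": 60}),
    ("35-39", {"A": 4, "B": 6}),
    ("40-44", {"A": 10, "B": 90}),
]
result = merge_small_groups(age_classes, min_size=50)
for label, counts in result:
    print(label, dict(counts))
```

The small 35-39 class joins 30-34 rather than 40-44 because its occupation mix is identical to that of 30-34, so merging changes the transferred distribution as little as possible.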
18
Results
Graduates:
- No second clustering; 457 groups with about … persons in each group
- Never more than 2.4% deviation from the Labour Force Survey 2006 at the highest level
Occupation:
- 65 groups after the second clustering, with about 500 persons in each group
- Never more than 1.7% deviation from the traditional census at the highest level
- Never more than 3.2% deviation at medium levels
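The slides report deviations without defining them. Assuming "deviation" means the absolute difference in category shares, in percentage points, between the estimated distribution and the benchmark (Labour Force Survey or traditional census), the maximum over all categories at one level can be computed as below. The counts are made up for illustration.

```python
def max_share_deviation(estimated, benchmark):
    """Largest absolute difference between category shares, in percentage points."""
    n_est, n_ref = sum(estimated.values()), sum(benchmark.values())
    categories = set(estimated) | set(benchmark)
    return max(
        abs(100 * estimated.get(c, 0) / n_est - 100 * benchmark.get(c, 0) / n_ref)
        for c in categories
    )

# Hypothetical counts: imputed occupations vs. a benchmark distribution
estimated = {"A": 210, "B": 290, "C": 500}
benchmark = {"A": 200, "B": 300, "C": 500}
print(max_share_deviation(estimated, benchmark))  # 1.0
```

Evaluating the same function on finer breakdowns (for example by region or age) would give the deviations at medium levels.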