Record matching for census purposes in the Netherlands Eric Schulte Nordholt Senior researcher and project leader of the Census Statistics Netherlands.


1 Record matching for census purposes in the Netherlands
Eric Schulte Nordholt, Senior researcher and project leader of the Census
Statistics Netherlands, Division Social and Spatial Statistics, Department Support and Development, Section Research and Development
ESLE@CBS.NL
Joint UNECE/Eurostat Meeting on Population and Housing Censuses in Astana, 4-6 June 2007

2 Contents
- History of the Dutch Census
- Data sources
- Micro linkage
- Micro integration
- Social Statistical Database
- Estimation aspects
- Statistical confidentiality
- Conclusions

3 History of the Dutch Census
TRADITIONAL CENSUS
- Ministry of Home Affairs: 1829, 1839, 1849, 1859, 1869, 1879 and 1889
- Statistics Netherlands: 1899, 1909, 1920, 1930, 1947, 1960 and 1971
- Unwillingness (nonresponse) and reduction of expenses → no more traditional censuses
ALTERNATIVE: VIRTUAL CENSUS
- 1981 and 1991: Population Register and surveys
- Development in the 1990s: more registers → 2001: integrated set of registers and surveys, SSD

4 Data sources
Registers:
- Population Register (PR), 16 million records: demographic variables (sex, age, household status etc.)
- Jobs file, employees (6.5 million records) and self-employed persons (790 thousand records): dates of job, branch of economic activity
- Fiscal administration (FIBASE), jobs (7.2 million records) and pensions and life insurance benefits (2.7 million records)
- Social Security administrations, 2 million records: auxiliary information for the integration process
Surveys:
- Survey on Employment and Earnings (SEE), 3 million records: working hours, place of work
- Labour Force Survey (LFS), 2 years, 230,000 records: education, occupation, (economic) activity

5 Matching process
- Matching of registers and datasets to a self-constructed Central Matching File
- Records are identified by a surrogate identifier (RIN)
- One unique table RIN-Social Security Number
- Minimal set of identifying variables
- Every step in the process is a deterministic match

6 Statistics Netherlands’ backbone of persons
The Central Matching File (April 2007): 46,436,060 records, 16,334,210 unique persons

Variable                           Availability
Social security number (sofi)      < 0.03% unknown for 1995-2007
Date of birth                      < 0.5% unknown month and/or day
Gender                             always
Postal code                        < 0.05% unknown
House number                       < 0.05% unknown
RIN Person                         always
RIN Address                        always
Time frame of variable validity    always

7 Matching process
1. Social security number matching, with a check on date of birth and gender. A match is valid when no more than one of the variables year, month, day of birth and gender differs; else
2. Matching using other variables such as postal code, house number, date of birth and gender. All keys must match; else
3. Match on social security number without any check on other variables
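
The three-step cascade on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not Statistics Netherlands' production code: the `Record` class, its field names, the in-memory `backbone` list and the example values are all hypothetical, and requiring a unique hit in step 2 is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical layout; the slides do not specify field names.
    sofi: Optional[str]
    year: Optional[int]
    month: Optional[int]
    day: Optional[int]
    gender: Optional[str]
    postal_code: Optional[str]
    house_number: Optional[str]
    rin: Optional[int] = None  # surrogate identifier, known only in the backbone


CHECK_VARS = ("year", "month", "day", "gender")
STEP2_KEYS = ("postal_code", "house_number", "year", "month", "day", "gender")


def match(rec: Record, backbone: List[Record]) -> Tuple[Optional[int], Optional[int]]:
    """Return (RIN, step number) for the first step that matches, else (None, None)."""
    same_sofi = [p for p in backbone if rec.sofi is not None and p.sofi == rec.sofi]

    # Step 1: sofi number matches and at most one check variable differs.
    for p in same_sofi:
        differing = sum(getattr(p, v) != getattr(rec, v) for v in CHECK_VARS)
        if differing <= 1:
            return p.rin, 1

    # Step 2: all other keys must match (assumption: exactly one candidate).
    if all(getattr(rec, k) is not None for k in STEP2_KEYS):
        hits = [p for p in backbone
                if all(getattr(p, k) == getattr(rec, k) for k in STEP2_KEYS)]
        if len(hits) == 1:
            return hits[0].rin, 2

    # Step 3: sofi number alone, without any check on other variables.
    if same_sofi:
        return same_sofi[0].rin, 3
    return None, None


backbone = [
    Record("111", 1970, 5, 17, "F", "2490HA", "12", rin=9001),
    Record("222", 1982, 11, 3, "M", "6401CZ", "4", rin=9002),
]
day_typo = Record("111", 1970, 5, 18, "F", None, None)
no_sofi = Record(None, 1982, 11, 3, "M", "6401CZ", "4")
sofi_only = Record("222", 1983, 11, 3, "F", "1000AA", "1")
unknown = Record(None, 1990, 1, 1, "M", None, None)
```

Each step is deterministic: a record either satisfies the exact rule or falls through to the next step, which is why the step number can be reported alongside the RIN.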

8 Micro data with Surrogate Identifier
[Diagram. Labels: Registers; Surveys; Direct Identifier; Surrogate Identifier (RIN); Micro data; Preparation and documentation; YearMonthBirth, gender, municipality, civil status; employment; income, jobs; education; social security, ..; Municipal Population Register; RIN de-identification table; de-identified micro data; Selection from Municipal population register; production environment; SN Micro data Services; Social Statistics Database]

9 Example: Employment and Wages survey

                                                      Records      %
2003                                                3,801,246  100.0
Total matched                                       3,747,976   98.6
  1 Sofi number, year of birth, month, day, gender  3,577,090   94.1
  2 Postal code, year of birth, month, day, gender    164,267    4.3
  3 Sofi number                                         6,619    0.2
Not matched                                            53,270    1.4
  Valid sofi number                                    21,194    0.6
    valid postal code                                   5,799    0.2
    invalid postal code                                10,294    0.3
    non-resident                                        5,101    0.1
  Unknown or invalid sofi number                       32,076    0.8
    valid postal code                                   8,718    0.2
    invalid postal code                                20,052    0.5
    non-resident                                        3,306    0.1
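
As a quick arithmetic check of the figures on this slide, the counts per matching step can be summed and converted to percentages of the total. The dictionary keys below are illustrative labels chosen for this sketch; the counts are the ones shown on the slide.

```python
total = 3801246

# Records matched in each step of the matching process.
matched_by_step = {"step_1": 3577090, "step_2": 164267, "step_3": 6619}

# Unmatched records, split by whether the sofi number was valid.
not_matched = {"valid_sofi": 21194, "unknown_or_invalid_sofi": 32076}

total_matched = sum(matched_by_step.values())
total_not_matched = sum(not_matched.values())


def share(count: int) -> float:
    """Percentage of all records, rounded to one decimal as on the slide."""
    return round(100 * count / total, 1)


step_shares = {step: share(n) for step, n in matched_by_step.items()}
match_rate = share(total_matched)
```

The matched and unmatched counts add up to the total, and the overall match rate of 98.6% follows from the three steps combined.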

10 Micro integration (1)
The aim of micro integration is:
- To check the linked data and modify incorrect records,
- In such a way that the results that are to be published are of higher quality than the original sources

11 Micro integration (2)
To fulfil this demand, an integrated process of data editing, derivation of statistical variables, and imputation is executed

12 Micro integration (3)
Constraints and limitations:
- Only variables that are to be published are micro integrated
- Identity rules are necessary, e.g. the same variable in two sources or a relationship between two or more variables in one or more sources
- No mass imputation
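
To make the two kinds of identity rule concrete, the sketch below checks one rule of each type on a linked record. The field names (`register_birth_year`, `survey_birth_year`, `job_start`, `job_end`) and the rule names are hypothetical; the slides do not list the actual rules, and the sketch only flags violations rather than correcting them.

```python
from datetime import date
from typing import Dict, List


def violated_rules(linked: Dict[str, object]) -> List[str]:
    """Return the names of the identity rules that a linked record violates."""
    violations = []

    # Rule type 1: the same variable observed in two sources must agree.
    if linked["register_birth_year"] != linked["survey_birth_year"]:
        violations.append("birth_year_equal_across_sources")

    # Rule type 2: a relationship between two variables must hold.
    if linked["job_start"] > linked["job_end"]:
        violations.append("job_start_not_after_job_end")

    return violations


consistent = {
    "register_birth_year": 1970, "survey_birth_year": 1970,
    "job_start": date(2000, 1, 1), "job_end": date(2000, 12, 31),
}
conflicting = {
    "register_birth_year": 1970, "survey_birth_year": 1971,
    "job_start": date(2001, 6, 1), "job_end": date(2001, 3, 1),
}
```

Records with a non-empty result would be candidates for data editing in the micro integration process.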

13 Social Statistical Database (SSD)
Social Statistical Database (SSD):
- Set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits
- No remaining internal conflicting information
SSD set:
- Population Register (backbone)
- Integrated jobs file
- Integrated file of (social and other) benefits
- Surveys, e.g. LFS
Combining element: RIN-person

14 Core and satellites (1)
[Diagram: SSD core and satellites]

15 Core and satellites (2)
Core:
- contains only integral register information
- contains the most important demographic and socio-economic information
- contains only information that is used in at least two satellites

16 Core and satellites (3)
Satellites are produced in two steps:
- Copying and derivation of the relevant information from the core SSD
- Adding of the unique information on a specific theme from registers and surveys
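
The two production steps can be pictured with plain dictionaries keyed by RIN. Everything in this sketch is hypothetical (the variable names, the derived `age_group` and the example theme data); it only illustrates the copy-and-derive step followed by the theme-specific addition.

```python
from typing import Dict, Iterable


def build_satellite(core: Dict[int, dict], core_vars: Iterable[str],
                    theme_data: Dict[int, dict]) -> Dict[int, dict]:
    """Build one satellite from the core plus theme-specific information."""
    core_vars = list(core_vars)
    satellite = {}
    for rin, row in core.items():
        # Step 1: copy the relevant core variables and derive new ones.
        sat_row = {v: row[v] for v in core_vars}
        sat_row["age_group"] = "15-64" if 15 <= row["age"] <= 64 else "other"
        # Step 2: add information unique to this theme, where available.
        sat_row.update(theme_data.get(rin, {}))
        satellite[rin] = sat_row
    return satellite


core = {
    101: {"age": 34, "gender": "F", "municipality": "Voorburg"},
    102: {"age": 70, "gender": "M", "municipality": "Heerlen"},
}
education_theme = {101: {"education": "higher"}}
education_satellite = build_satellite(core, ["age", "gender"], education_theme)
```

Because every satellite starts from the same core, variables shared between satellites have identical values by construction.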

17 Conclusions SSD
The SSD diminishes the administrative burden
The SSD increases:
- The efficiency of statistics production
- The accuracy of statistical outputs
- The relevance of social statistics
- The possibilities for social policy research

18 Estimation aspects
- Surveys are samples from the population
- If surveys are enriched with register information, estimations of the register part of the enriched survey will lead to inconsistencies with the counts from the entire register
- Statistics Netherlands developed the method of consistent and repeated weighting to solve these inconsistencies
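
The inconsistency described here, and the general idea of removing it by adjusting weights, can be illustrated with a toy example. Note that the sketch uses ordinary post-stratification on a single register variable as a simple stand-in; it is not the repeated weighting method itself, and all numbers and names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Counts from the entire register (invented numbers).
register_counts = {"M": 600, "F": 400}

# A survey sample enriched with the register variable "gender";
# every respondent starts with a design weight of 100.
sample = (
    [{"gender": "M", "weight": 100.0, "higher_educated": e}
     for e in (True, False, False, True)]
    + [{"gender": "F", "weight": 100.0, "higher_educated": e}
       for e in (True, True, False, False, False, True)]
)


def weighted_totals(rows: List[dict], weight_key: str) -> Dict[str, float]:
    """Weighted count per gender, estimated from the sample."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["gender"]] += row[weight_key]
    return dict(totals)


def poststratify(rows: List[dict], targets: Dict[str, int]) -> None:
    """Scale weights so the estimated gender counts equal the register counts."""
    current = weighted_totals(rows, "weight")
    for row in rows:
        factor = targets[row["gender"]] / current[row["gender"]]
        row["cal_weight"] = row["weight"] * factor


before = weighted_totals(sample, "weight")      # disagrees with the register
poststratify(sample, register_counts)
after = weighted_totals(sample, "cal_weight")   # agrees with the register
higher_educated = sum(r["cal_weight"] for r in sample if r["higher_educated"])
```

With the design weights the sample suggests 400 men and 600 women, contradicting the register; after the adjustment the estimated counts reproduce the register, and survey variables such as education are estimated with the adjusted weights.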

19 Statistical confidentiality
[Diagram. Labels: Administrative sources (IDs, Variables); Household surveys (IDs, Variables); PERSONS BACKBONE, full range of all persons as from 1995, with Identifiers (PINs, sex, date of birth, address) and Characteristics]
IDs in sources are replaced by random Record Identification Numbers (RINs)
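
Replacing direct identifiers by random RINs can be sketched as follows. The function names, the nine-digit RIN range and the example records are assumptions made for this illustration; the slides do not describe how RINs are actually generated or stored.

```python
import secrets
from typing import Dict, Iterable, List, Optional


def assign_rins(direct_ids: Iterable[str],
                table: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Extend the direct-ID-to-RIN table with a random, unused RIN per new ID."""
    table = {} if table is None else table
    used = set(table.values())
    for direct_id in direct_ids:
        if direct_id not in table:
            rin = secrets.randbelow(10**9)
            while rin in used:  # redraw on the rare collision
                rin = secrets.randbelow(10**9)
            table[direct_id] = rin
            used.add(rin)
    return table


def deidentify(records: List[dict], id_field: str,
               table: Dict[str, int]) -> List[dict]:
    """Drop the direct identifier from each record and attach the RIN instead."""
    result = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != id_field}
        row["rin"] = table[rec[id_field]]
        result.append(row)
    return result


jobs = [{"sofi": "111", "branch": "retail"}, {"sofi": "222", "branch": "care"}]
survey = [{"sofi": "111", "education": "higher"}]

rin_table = assign_rins(r["sofi"] for r in jobs + survey)
jobs_deid = deidentify(jobs, "sofi", rin_table)
survey_deid = deidentify(survey, "sofi", rin_table)
```

Because the same table is used for every source, de-identified files can still be linked on the RIN while the direct identifier stays out of the micro data.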

20 Conclusions
- Matching is relatively cheap
- Matching is relatively quick (short production time)
- Micro integration remains important
- The SSD has found its place in the organisation
- Repeated weighting method guarantees consistent estimates
- Statistical confidentiality aspects have become very important

21 Time for questions and discussion

