1
The Census Data Enhancement Project
Glenys Bishop
2
Outline
• Brief description of Census Data Enhancement Project
• Focus on Statistical Longitudinal Census Dataset (SLCD)
• Simulation to determine likely quality
3
Census Data Enhancement
• Formation of Statistical Longitudinal Census Dataset (SLCD)
  • 5% of the 2006 Census linked to 2011, 2016, ...
  • augmented with a 5% sample of intercensal births and immigrants
• Statistical studies
  • approved projects involving linking the SLCD and other data sets
    – Births and deaths
    – Long-term migrations
    – Disease registers
• No names and addresses
4
Quality Studies
• Quality studies use the whole Census
  • with name and address during the Census processing period
  • without name and address at other times
• Two types during the 2006 Census processing period
  • Assess feasibility and quality of linking without names and addresses
  • Improve ABS outputs
6
Quality Studies for 2006
1. Indigenous mortality
  • Linking deaths since the Census and the Census
2. Assessing automatic matching for Post Enumeration Survey 2011
  • Linking 2006 PES and Census
3. Undercoverage in Labour Force Survey
  • Linking LFS August 2006 and Census
4. Conditions of entry and settlement outcomes for immigrants
  • Linking migrant settlements database and Census
5. Simulation of SLCD formation
7
Mesh Blocks
• Micro-level geographical unit for statistics
• 314,369 spatial MB covering Australia
• Residential MB contain ~30 to 60 dwellings
New building block of statistical and administrative geography
[Figure: Canberra]
8
Simulated SLCD Formation
• Census Dress Rehearsal 2005, Census 2006
• assess the feasibility of forming the SLCD without names and addresses
• make defensible statements about quality of the linked data
9
What Linkages
Census Dress Rehearsal to Census
• Gold Standard using name, address, mesh block and other variables
  • Names and addresses were destroyed at end of Census processing
• Silver Standard using ~12,000 hash numbers, mesh block and other variables
  • Hash numbers were destroyed at end of Census processing
• Bronze Standard using mesh block and other variables
10
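Data Linking Process
[Diagram: File A and File B feed a record pair comparison, which produces weights. Pairs above the upper cut-off become links, pairs below the lower cut-off become non-links, and pairs between the two cut-offs go to clerical review. Diagram label: 20m]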
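The flow in the diagram above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how the comparison weights are computed, so the sketch assumes a Fellegi-Sunter style sum of log2 agreement ratios. The field names, the m- and u-probabilities, the cut-off values and the handling of weights exactly on a cut-off are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical (m, u) probabilities per field, illustrative values only:
# m = P(field agrees | pair is a match), u = P(field agrees | pair is a non-match).
M_U: Dict[str, Tuple[float, float]] = {
    "mesh_block": (0.90, 0.001),
    "sex": (0.99, 0.50),
    "birth_year": (0.95, 0.02),
}


def comparison_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 agreement/disagreement weights over the compared fields."""
    weight = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            weight += math.log2(m / u)              # agreement raises the weight
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return weight


def classify_pair(weight: float, lower: float, upper: float) -> str:
    """Assign a link status from the weight and the two cut-offs.

    Treating a weight exactly on a cut-off as decided (not reviewed)
    is an assumption of this sketch.
    """
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


if __name__ == "__main__":
    file_a = {"mesh_block": "MB1", "sex": "F", "birth_year": 1980}
    file_b = {"mesh_block": "MB2", "sex": "F", "birth_year": 1980}
    w = comparison_weight(file_a, file_b)
    print(round(w, 2), classify_pair(w, lower=0.0, upper=10.0))
```

Widening the gap between the two cut-offs sends more pairs to clerical review, which is the cost the next slide raises.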
11
Issues
• Setting cut-offs
• Clerical review is very time consuming
12
[Figure: frequency against comparison weight, with separate distributions for matches and non-matches]
13
[Figure: frequency against comparison weight]
14
[Figure: estimated cumulative matches linked and estimated cumulative non-matches linked, with regions labelled "Accept these links" and "Reject these links"]
15
incorrect non-links = P(non-link | match)
false links = P(link | non-match)
[Diagram: record pairs divided into matches and non-matches, and into links and non-links]
16
Determining Quality of Links

                                      Match status (true)
Link status (assigned
by linking method)     Matches                      Non-matches                  Total
Links                  True links                   Non-matches that are linked  Total links
Non-links              Matches that are not linked  True non-links               Total non-links
Total                  Total matches                Total non-matches            Total record pairs
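As a worked illustration of the table above, the sketch below derives the two conditional probabilities defined on the previous slide, plus the link accuracy and match-link rate named later in the talk, from the four cell counts. The class name, field names and counts are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class LinkageTable:
    """Cell counts of the link status by match status table."""
    true_links: int       # matches that are linked
    false_links: int      # non-matches that are linked
    missed_matches: int   # matches that are not linked
    true_non_links: int   # non-matches that are not linked

    @property
    def total_links(self) -> int:
        return self.true_links + self.false_links

    @property
    def total_matches(self) -> int:
        return self.true_links + self.missed_matches

    @property
    def total_non_matches(self) -> int:
        return self.false_links + self.true_non_links

    def link_accuracy(self) -> float:
        """Share of links that are true matches."""
        return self.true_links / self.total_links

    def match_link_rate(self) -> float:
        """Share of matches that were linked."""
        return self.true_links / self.total_matches

    def incorrect_non_link_rate(self) -> float:
        """P(non-link | match)."""
        return self.missed_matches / self.total_matches

    def false_link_rate(self) -> float:
        """P(link | non-match)."""
        return self.false_links / self.total_non_matches


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = LinkageTable(true_links=900, false_links=50,
                         missed_matches=100, true_non_links=8950)
    print(f"link accuracy   {table.link_accuracy():.3f}")
    print(f"match-link rate {table.match_link_rate():.3f}")
    print(f"P(non-link | match) {table.incorrect_non_link_rate():.3f}")
    print(f"P(link | non-match) {table.false_link_rate():.4f}")
```

The match status column needs a trusted benchmark; in this study that role is played by the Gold Standard linkage.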
17
Match Status
• Gold Standard linkage uses name and address
• Use this as benchmark for Bronze and Silver Standards
18
Comparing Quality of Bronze and Silver Linkages
19
What is Important?
• High link accuracy
  • most links are correct but many matches are missed
• High match-link rate
  • most matches are linked but many links are incorrect
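The trade-off on this slide can be seen by moving a single cut-off over a set of weighted record pairs whose true match status is known. The sketch below is a toy illustration: the pair weights and match flags are made up, and using one cut-off (no clerical review band) is a simplifying assumption.

```python
import math
from typing import List, Tuple

# Hypothetical (comparison weight, is_match) pairs, for illustration only.
HYPOTHETICAL_PAIRS: List[Tuple[float, bool]] = [
    (18.0, True), (15.0, True), (12.0, True), (9.0, False), (8.0, True),
    (5.0, True), (4.0, False), (1.0, False), (-3.0, False), (-6.0, False),
]


def quality_at_cutoff(pairs: List[Tuple[float, bool]],
                      cutoff: float) -> Tuple[float, float]:
    """Return (link accuracy, match-link rate) when pairs with
    weight >= cutoff are accepted as links."""
    linked_status = [is_match for weight, is_match in pairs if weight >= cutoff]
    true_links = sum(linked_status)
    total_matches = sum(is_match for _, is_match in pairs)
    link_accuracy = true_links / len(linked_status) if linked_status else math.nan
    match_link_rate = true_links / total_matches if total_matches else math.nan
    return link_accuracy, match_link_rate


if __name__ == "__main__":
    for cutoff in (10.0, 6.0, 0.0):
        accuracy, rate = quality_at_cutoff(HYPOTHETICAL_PAIRS, cutoff)
        print(f"cut-off {cutoff:5.1f}: link accuracy {accuracy:.3f}, "
              f"match-link rate {rate:.3f}")
```

With these made-up pairs, a high cut-off gives perfectly accurate links but misses two of the five matches, while a low cut-off links every match at the price of three false links.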
20
Comparison Bronze Linkages
21
Silver and Bronze VL
22
Univariate Summary
The higher the cut-off, the more likely some subpopulations are to be missed
• under-represented: 0-19 year-olds, Indigenous people, people employed in agriculture, people from non-families
• over-represented: people born overseas, the more highly educated, professional and clerical occupations, married people
• trends weaker or non-existent in Silver
23
Odds Ratios
Response: employed / (unemployed, NILF) in 2006
Explanatory variables (all from 2005)
• sex, indigenous status, age, tenure status, required assistance
  – moved house in previous year
• education characteristics
• work characteristics in 2005, income
  – volunteer, occupation labourer, sales/retail

Variable      Gold    Silver  Bronze
moved         NS              1.133
volunteer     NS      1.142   1.116
labourer      0.752   0.771   NS
sales/retail  0.854   NS
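For readers less familiar with the measure, the sketch below computes an unadjusted odds ratio from a 2x2 table of counts. This is a simplification: the values on the slide appear to come from a model with many explanatory variables, so they would be adjusted odds ratios, and the counts used here are hypothetical rather than taken from the study.

```python
def odds_ratio(exposed_yes: int, exposed_no: int,
               unexposed_yes: int, unexposed_no: int) -> float:
    """Odds of the outcome in the exposed group divided by the odds
    in the unexposed group: (a / b) / (c / d) = (a * d) / (b * c)."""
    if min(exposed_yes, exposed_no, unexposed_yes, unexposed_no) <= 0:
        raise ValueError("all four counts must be positive")
    return (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)


if __name__ == "__main__":
    # Hypothetical counts: employment in 2006 by labourer status in 2005.
    ratio = odds_ratio(exposed_yes=300, exposed_no=200,      # labourers
                       unexposed_yes=800, unexposed_no=400)  # everyone else
    print(f"odds ratio = {ratio:.3f}")
```

An odds ratio below 1 means the group has lower odds of being employed than the reference group, and a value above 1 means higher odds. Comparing the same ratio across the Gold, Silver and Bronze linked files shows whether linkage quality changes the conclusion.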
24
Test Cases
• Female, aged 25-39, worked 15 hours in sales, weekly income $400-$599, dwelling being purchased, provided child care
• Male, indigenous, weekly income $250-$399, actively seeking work
• Male, non-indigenous, married, worked 40 hours, weekly income $600-$799, no degree, owned house, did not move in previous year
25
Assessing the Linked Data
• The properties of the CDR records that did not get linked to a Census record in the Gold Standard.
• Match-link rate and link accuracy of the different Silver and Bronze Standard linkages compared with Gold.
• The over- or under-representation of sub-groups in the various linked data sets compared with the Gold Standard.
• The effects of this over- or under-representation on some representative analyses and models fitted to linked data.
• Weighted analyses to counteract under-representation.
• Methods for modifying the fitted models to account for inexact linkage and disparities in the representation of sub-groups of interest.
• How well linking two files collected one year apart can represent linking two files collected five years apart.
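One simple way to picture "weighted analyses to counteract under-representation" is a post-stratification style adjustment, sketched below. The slides do not say which weighting method was used, so this approach is an assumption, and the group labels and counts are hypothetical.

```python
from typing import Dict


def representation_weights(benchmark: Dict[str, int],
                           linked: Dict[str, int]) -> Dict[str, float]:
    """Weight each sub-group by (benchmark share) / (linked share), so that
    weighted linked records reproduce the benchmark sub-group shares."""
    benchmark_total = sum(benchmark.values())
    linked_total = sum(linked.values())
    weights: Dict[str, float] = {}
    for group, n_linked in linked.items():
        if n_linked == 0:
            raise ValueError(f"no linked records in group {group!r}")
        benchmark_share = benchmark[group] / benchmark_total
        linked_share = n_linked / linked_total
        weights[group] = benchmark_share / linked_share
    return weights


if __name__ == "__main__":
    # Hypothetical counts: the younger group is under-represented among links.
    benchmark_counts = {"0-19": 300, "20+": 700}
    linked_counts = {"0-19": 200, "20+": 600}
    for group, weight in representation_weights(benchmark_counts,
                                                linked_counts).items():
        print(f"{group}: weight {weight:.3f}")
```

A weight above 1 marks a sub-group that is under-represented in the linked file relative to the benchmark. Weights of this kind correct the sub-group shares only; they do not correct false links within a sub-group.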
26
Finding Out More about CDE
• Already published papers included in handout
• Today two new papers using Indigenous Mortality QS results
  – Information Paper: Census Data Enhancement - Indigenous Mortality Quality Study 2006-07 (Cat. No. 4723.0)
  – Discussion Paper: Assessment of Methods for Developing Life Tables for Aboriginal and Torres Strait Islander Australians, 2006 (Cat. No. 3302.0.55.002)
• Early in 2009, several in research paper series
  – Methodological report on each QS (4)
  – Analysis of probabilistically linked data
  – Acceptance sampling & clerical review
  – Assessment of quality of SLCD