Using Reported Data as Matching Variables in Record Linkage

Using Reported Data as Matching Variables in Record Linkage
By Bill Iwig, Kara Daniel, Tom Pordugal, and Stan Hoge
National Agricultural Statistics Service

NASS Use of Record Linkage
- Match new list sources to the Farm Register
- Identify duplication within the Farm Register
- Match Area Frame records to the Farm Register for measuring coverage

Record Linkage Procedures
- Matching variables are divided into components
- Matching components are assigned agreement and disagreement weights
- Records are only compared within blocks
- Sum of agreement and disagreement weights is compared to thresholds
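The procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NASS system: the component names, the weight values, the two thresholds, and the choice of state as the blocking variable are all hypothetical, and comparison is by exact equality.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (agreement, disagreement) weights for each matching component.
WEIGHTS = {
    "last_name": (4.0, -2.0),
    "first_name": (2.0, -1.0),
    "zip": (3.0, -3.0),
}
# Hypothetical upper and lower thresholds on the summed weight.
UPPER, LOWER = 6.0, 2.0


def pair_score(a, b):
    """Sum agreement or disagreement weights over all components."""
    total = 0.0
    for comp, (agree, disagree) in WEIGHTS.items():
        total += agree if a[comp] == b[comp] else disagree
    return total


def classify(score):
    """Compare the summed weight to the two thresholds."""
    if score >= UPPER:
        return "link"
    if score >= LOWER:
        return "possible link"
    return "non-link"


def link_within_blocks(records, block_key):
    """Compare only those record pairs that share the same blocking value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[block_key]].append(rec)
    results = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = pair_score(a, b)
            results.append((a["id"], b["id"], score, classify(score)))
    return results


# Made-up records for illustration only.
records = [
    {"id": 1, "state": "IA", "last_name": "SMITH", "first_name": "JOHN", "zip": "50010"},
    {"id": 2, "state": "IA", "last_name": "SMITH", "first_name": "JON", "zip": "50010"},
    {"id": 3, "state": "IA", "last_name": "JONES", "first_name": "JOHN", "zip": "50014"},
    {"id": 4, "state": "NE", "last_name": "SMITH", "first_name": "JOHN", "zip": "50010"},
]
results = link_within_blocks(records, "state")
```

Because record 4 sits alone in its block, it is never compared with record 1 even though every component agrees. That is the trade-off blocking makes: far fewer comparisons, at the cost of missing matches that fall in different blocks.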

Record Linkage System Enhancement
- Use data items as matching variables
- Provided through SuperMatch software feature
- Parameters allow “close” values to match and be assigned a reduced agreement weight

Identifying Duplication on 2002 Census of Agriculture Data File
- 2.85 million records on the Census Mail List
- Positive data for 1.1 million at the time of record linkage
- Numerous steps to eliminate duplication prior to data capture
- Duplication still exists!

Using Census Reported Data as Matching Variables to Identify Duplication
- 40 data items used
- “0” values not considered for matching
- Fewer than 10 positive values for most records

Initial Record Linkage Parameters
- Agreement weight = 1
- Disagreement weight = 0
- “Non-tolerable” percentage difference = 11
- Sum of weights threshold = 5

Pro-rated Agreement Weight Examples
- A = 100, B = 95, Wt = .52
- A = 20, B = 19, Wt = .52
- A = 20, B = 18, Wt = 0
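The slides do not state the pro-rating formula. One rule that reproduces all three examples, given the initial parameters (agreement weight 1, disagreement weight 0, non-tolerable difference 11), is a linear reduction of the agreement weight based on the percentage difference relative to the smaller value. The sketch below assumes that rule. It also assumes an item is skipped when either record reports 0. The function names and item names are hypothetical and are not the SuperMatch API.

```python
def prorated_weight(a, b, agree_wt=1.0, disagree_wt=0.0, non_tolerable_pct=11.0):
    """Assumed rule: scale the weight linearly with the percentage difference.

    The percentage difference is taken relative to the smaller value
    (an assumption that happens to reproduce the slide examples).
    """
    if a <= 0 or b <= 0:
        raise ValueError("only positive values are compared")
    pct_diff = 100.0 * abs(a - b) / min(a, b)
    if pct_diff >= non_tolerable_pct:
        return disagree_wt
    share = 1.0 - pct_diff / non_tolerable_pct
    return disagree_wt + (agree_wt - disagree_wt) * share


def data_match_score(rec_a, rec_b):
    """Sum pro-rated weights over items that are positive on both records."""
    total = 0.0
    for item, a in rec_a.items():
        b = rec_b.get(item, 0)
        if a > 0 and b > 0:  # assumed handling: "0" values are not considered
            total += prorated_weight(a, b)
    return total


# The three slide examples.
w1 = round(prorated_weight(100, 95), 2)  # 0.52
w2 = round(prorated_weight(20, 19), 2)   # 0.52
w3 = prorated_weight(20, 18)             # 0.0

# Made-up item values for illustration; real records carry up to 40 data items.
rec_a = {"item1": 100, "item2": 20, "item3": 50, "item4": 0, "item5": 7, "item6": 300, "item7": 12}
rec_b = {"item1": 95, "item2": 20, "item3": 50, "item4": 40, "item5": 7, "item6": 300, "item7": 12}
score = data_match_score(rec_a, rec_b)
is_potential_duplicate = score >= 5  # sum of weights threshold from the slides
```

Under this rule, 100 versus 95 and 20 versus 19 both differ by about 5.26 percent of the smaller value, giving 1 − 5.26/11 ≈ 0.52, while 20 versus 18 differs by about 11.1 percent, which exceeds the non-tolerable limit and receives the disagreement weight.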

Results
- Approximately 1500 potential duplicates identified
- Actual number of duplicates less than 500

Recommendations for Effective Application of Data Matching Feature
- Evaluate distribution of response differences for true duplicates
- Evaluate handling of “0” values
- Highly correlated variables
- Edited and imputed variables
- Threshold value for matching
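As a rough illustration of the first recommendation, the sketch below computes, for a set of known duplicate pairs, the share of reported values that fall inside the tolerance. It assumes the percentage difference is measured relative to the smaller value and that pairs containing a 0 are left out. The function names and the sample values are made up for illustration and are not census data.

```python
def pct_diff(a, b):
    """Percentage difference relative to the smaller value (an assumption)."""
    return 100.0 * abs(a - b) / min(a, b)


def share_within_tolerance(pairs, non_tolerable_pct=11.0):
    """Share of positive-valued pairs whose difference is below the tolerance."""
    usable = [(a, b) for a, b in pairs if a > 0 and b > 0]
    if not usable:
        return None
    within = sum(1 for a, b in usable if pct_diff(a, b) < non_tolerable_pct)
    return within / len(usable)


# Made-up values of one data item as reported on pairs of known duplicates.
known_duplicate_pairs = [(100, 95), (20, 18), (50, 50), (200, 150), (0, 30)]
share = share_within_tolerance(known_duplicate_pairs)  # 0.5
```

A low share on real duplicates would suggest the tolerance is too tight for that item; rerunning the same check with other values of `non_tolerable_pct` shows how sensitive the result is to the choice.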