Using Reported Data as Matching Variables in Record Linkage

1 Using Reported Data as Matching Variables in Record Linkage
By Bill Iwig, Kara Daniel, Tom Pordugal, and Stan Hoge
National Agricultural Statistics Service

2 NASS Use of Record Linkage
Match new list sources to the Farm Register
Identify duplication within the Farm Register
Match Area Frame records to the Farm Register for measuring coverage

3 Record Linkage Procedures
Matching variables are divided into components
Matching components are assigned agreement and disagreement weights
Records are only compared within blocks
Sum of agreement and disagreement weights compared to thresholds
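
The four steps on this slide can be sketched in a few lines of Python. This is only an illustration of the general procedure, not the NASS system: the component names, the weights, the blocking variable (`county`), and the use of an upper and a lower threshold with a review zone in between are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical matching components with (agreement, disagreement) weights.
WEIGHTS = {
    "last_name": (4.0, -2.0),
    "first_name": (2.0, -1.0),
    "phone": (3.0, -3.0),
}
UPPER = 5.0  # assumed: at or above this sum, the pair is a link
LOWER = 0.0  # assumed: below this sum, the pair is a non-link

def score(a, b):
    """Sum the agreement or disagreement weight of every component."""
    total = 0.0
    for component, (agree, disagree) in WEIGHTS.items():
        total += agree if a[component] == b[component] else disagree
    return total

def classify(total):
    """Compare the summed weight to the two thresholds."""
    if total >= UPPER:
        return "link"
    if total < LOWER:
        return "non-link"
    return "review"

def link(records, block_key):
    """Compare records only within blocks that share the same block_key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[block_key]].append(record)
    results = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            results.append((a["id"], b["id"], classify(score(a, b))))
    return results

# Made-up records for illustration only.
RECORDS = [
    {"id": 1, "county": "A", "last_name": "Smith", "first_name": "John", "phone": "555-0100"},
    {"id": 2, "county": "A", "last_name": "Smith", "first_name": "Jon", "phone": "555-0100"},
    {"id": 3, "county": "A", "last_name": "Jones", "first_name": "Mary", "phone": "555-0199"},
    {"id": 4, "county": "B", "last_name": "Smith", "first_name": "John", "phone": "555-0100"},
]

pairs = link(RECORDS, "county")
# → [(1, 2, 'link'), (1, 3, 'non-link'), (2, 3, 'non-link')]
```

Record 4 agrees with record 1 on every component but is never compared with it, because the two fall in different blocks. That is the trade-off blocking makes: far fewer comparisons, at the cost of missing matches that disagree on the blocking variable.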

4 Record Linkage System Enhancement
Use data items as matching variables
Provided through SuperMatch software feature
Parameters allow “close” values to match and be assigned a reduced agreement weight

5 Identifying Duplication on 2002 Census of Agriculture Data File
2.85 million records on the Census Mail List
Positive data for 1.1 million at the time of record linkage
Numerous steps to eliminate duplication prior to data capture
Duplication still exists!

6 Using Census Reported Data as Matching Variables to Identify Duplication
40 data items used
“0” values not considered for matching
Fewer than 10 positive values for most records

7 Initial Record Linkage Parameters
Agreement weight = 1
Disagreement weight = 0
“Non-tolerable” percentage difference = 11
Sum of weights threshold = 5

8 Pro-rated Agreement Weight Examples
A = 100, B = 95, Wt = .52
A = 20, B = 19, Wt = .52
A = 20, B = 18, Wt = 0
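
The slides do not state the formula behind these weights. One formula that reproduces all three examples, given the parameters on the previous slide, measures the percentage difference relative to the smaller value and scales the agreement weight linearly down to zero at the non-tolerable difference. The sketch below assumes that formula. It is an inference from the examples, not the documented SuperMatch behavior, and the function names, the item names, and the `>=` comparisons are hypothetical.

```python
AGREEMENT_WT = 1.0
DISAGREEMENT_WT = 0.0
NON_TOLERABLE_PCT = 11.0
THRESHOLD = 5.0

def prorated_weight(a, b):
    """Weight for one data item reported as a on one record and b on the other.

    Returns None when either value is not positive, since "0" values
    are not considered for matching.
    """
    if a <= 0 or b <= 0:
        return None
    # Assumed: percentage difference is taken relative to the smaller value.
    pct_diff = 100.0 * abs(a - b) / min(a, b)
    if pct_diff >= NON_TOLERABLE_PCT:
        return DISAGREEMENT_WT
    # Assumed: linear reduction of the agreement weight.
    return AGREEMENT_WT * (1.0 - pct_diff / NON_TOLERABLE_PCT)

def score_pair(rec_a, rec_b):
    """Sum the pro-rated weights over data items present on both records."""
    total = 0.0
    for item in rec_a.keys() & rec_b.keys():
        weight = prorated_weight(rec_a[item], rec_b[item])
        if weight is not None:
            total += weight
    return total

def is_potential_duplicate(rec_a, rec_b):
    """Flag the pair when the summed weight reaches the threshold."""
    return score_pair(rec_a, rec_b) >= THRESHOLD

print(round(prorated_weight(100, 95), 2))  # 0.52
print(round(prorated_weight(20, 19), 2))   # 0.52
print(prorated_weight(20, 18))             # 0.0
```

With an agreement weight of 1 and a threshold of 5, a pair needs at least five positive data items in close agreement to be flagged. Since most records have fewer than 10 positive values, only a handful of items decides each comparison.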

9 Results
Approximately 1500 potential duplicates identified
Actual number of duplicates less than 500

10 Recommendations for Effective Application of Data Matching Feature
Evaluate distribution of response differences for true duplicates
Evaluate handling of “0” values
Highly correlated variables
Edited and imputed variables
Threshold value for matching
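
The first recommendation could be carried out along the following lines: collect the reported values of a data item from pairs already confirmed as duplicates, compute their percentage differences, and see how many would fall inside a candidate "non-tolerable" cutoff. The sketch assumes the percentage difference is taken relative to the smaller value, which the slides do not state. The function names and the sample pairs are made up for illustration.

```python
from statistics import median

def pct_differences(pairs):
    """Percentage differences for (a, b) value pairs from confirmed duplicates.

    Pairs with a non-positive value are skipped, mirroring the rule that
    "0" values are not considered for matching.
    """
    return [100.0 * abs(a - b) / min(a, b) for a, b in pairs if a > 0 and b > 0]

def share_within(diffs, cutoff_pct):
    """Fraction of differences strictly below the candidate cutoff."""
    return sum(1 for d in diffs if d < cutoff_pct) / len(diffs)

# Hypothetical reported values for one data item on confirmed duplicate pairs.
sample_pairs = [(100, 100), (100, 95), (20, 18), (50, 40), (10, 10), (0, 30)]

diffs = pct_differences(sample_pairs)
print(median(diffs))             # typical size of a difference
print(share_within(diffs, 11.0)) # share that would earn some agreement weight
```

If a large share of true duplicates falls outside the cutoff, the cutoff is too tight for that item. If nearly all fall well inside it, a tighter cutoff may cut the number of false potential duplicates.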
