Improved Register Data Matching and its Impact on Survey Population Estimates Steve Vale Office for National Statistics, UK.

Contents
- Background
- Current matching systems
- Enhancements
- Impact on survey populations

Background
- No common business identifier in the UK
- Data from different sources matched using name, address and postcode
- Software based around SSAName3
- Limited clerical input for the "possible match" category (>10 employment)
- Quality marker ("inquiry stop") used to indicate the probability of duplication and to exclude some enterprises from survey populations


Inquiry Stop 6 Units: Time Series
[Chart: monthly count of units with inquiry stop 6, June 2002 into 2004; vertical axis from 100,000 to 200,000 units]

The Project
- Aim to improve the quality of automatic matching
- Reduce the number of units on the register that are not included in survey populations
- Improve certainty about the probability of duplication
- Part funded by Eurostat

Matching Process 1
- The name is standardised to form a name key
- Name keys are checked against existing records at decreasing levels of accuracy until possible matches are found
- The names, addresses and postcodes of possible matches are compared, and a score out of 100 is calculated
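The standardise-then-score steps can be illustrated with a minimal sketch. SSAName3 is proprietary, so this is not its algorithm: the noise-word list, the use of Python's `difflib.SequenceMatcher` as the comparator, and the 0.5/0.3/0.2 weights for name, address and postcode are all hypothetical assumptions chosen only to show how three pieces of evidence can be combined into a score out of 100. The candidate search "at decreasing levels of accuracy" is not shown.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise words dropped when building a name key. The real
# SSAName3 standardisation rules are proprietary and are NOT reproduced here.
NOISE_WORDS = {"THE", "AND", "CO", "COMPANY", "LTD", "LIMITED", "PLC"}


def name_key(name: str) -> str:
    """Standardise a business name: upper-case, strip punctuation, drop noise words."""
    words = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(w for w in words if w not in NOISE_WORDS)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; difflib is a stand-in comparator."""
    return SequenceMatcher(None, a, b).ratio()


def same_postcode(a: str, b: str) -> bool:
    """Compare postcodes ignoring case and spacing."""
    return a.replace(" ", "").upper() == b.replace(" ", "").upper()


def match_score(rec_a: dict, rec_b: dict, weights=(0.5, 0.3, 0.2)) -> int:
    """Combine name, address and postcode evidence into a score out of 100.

    The weights are illustrative assumptions, not the values used by ONS.
    """
    w_name, w_addr, w_pc = weights
    name_sim = similarity(name_key(rec_a["name"]), name_key(rec_b["name"]))
    addr_sim = similarity(rec_a["address"].upper(), rec_b["address"].upper())
    pc_sim = 1.0 if same_postcode(rec_a["postcode"], rec_b["postcode"]) else 0.0
    return round(100 * (w_name * name_sim + w_addr * addr_sim + w_pc * pc_sim))


a = {"name": "Smiths Bakery Ltd", "address": "1 High Street", "postcode": "CF10 1AA"}
b = {"name": "SMITHS BAKERY LIMITED", "address": "1 HIGH STREET", "postcode": "cf101aa"}
print(match_score(a, b))  # → 100
```

Because both names reduce to the key `SMITHS BAKERY`, differences in legal form and letter case do not lower the score in this sketch.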

Matching Process 2
- If the score is >79, it is considered to be a definite match
- If the score is between 60 and 79, it is considered a possible match and is reported for clerical checking
- If the score is <60, it is considered a non-match
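The three score bands translate directly into a small classification function. This sketch assumes integer scores, so "between 60 and 79" is read as inclusive at both ends; the function name is hypothetical.

```python
def classify(score: int) -> str:
    """Map an integer match score (0-100) onto the three bands on the slide."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 79:
        return "definite match"
    if score >= 60:
        return "possible match"  # reported for clerical checking
    return "non-match"


for s in (85, 79, 60, 59):
    print(s, classify(s))
```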

Matching Process 3
- Possible matches are checked clerically and linked where appropriate using an on-line system
- Non-matches with >9 employment are checked; if no link is found, they are sent a Business Register Survey form
- Samples of definite matches and smaller non-matches are checked periodically
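The follow-up rules can be sketched as a routing function that takes a unit's match category and employment. The `MatchResult` type, its `unit_id` field and the action strings are hypothetical; the assumption that definite matches are linked without clerical review is inferred from the slides rather than stated on them.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    unit_id: str      # hypothetical identifier for the incoming unit
    category: str     # "definite match", "possible match" or "non-match"
    employment: int


def route(result: MatchResult) -> str:
    """Decide the follow-up action for one matched or unmatched unit."""
    if result.category == "definite match":
        return "link automatically (subject to periodic sample checks)"
    if result.category == "possible match":
        return "clerical check and link on-line where appropriate"
    if result.category == "non-match":
        if result.employment > 9:
            return "clerical check; send Business Register Survey form if no link found"
        return "leave unlinked (subject to periodic sample checks)"
    raise ValueError(f"unknown category: {result.category}")


print(route(MatchResult("U2", "non-match", 25)))
```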

Improvements 1
- Re-matching using cleaned addresses
- Gains from timing
- Gains from cleaning and standardising addresses
- Needs extra storage space on the register for cleaned addresses (approx. 3 GB)
- Address cleaning tool used: Matchcode5 by Capscan
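Matchcode5 is a commercial tool that works against reference address data, so the following is only a toy, rule-based stand-in showing what "cleaning and standardising" an address means. The abbreviation table and the postcode regular expression are assumptions: the pattern checks the shape of a standard UK postcode but does not validate it against any postcode file.

```python
import re
from typing import Optional

# Hypothetical abbreviation table. "ST" is deliberately left out because it
# can mean STREET or SAINT; a real tool resolves this with reference data.
ABBREVIATIONS = {"RD": "ROAD", "AVE": "AVENUE", "LN": "LANE", "CRES": "CRESCENT"}

# Shape of a standard UK postcode: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})$")


def clean_address(address: str) -> str:
    """Upper-case, remove punctuation and expand common abbreviations."""
    words = re.sub(r"[^A-Z0-9 ]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def clean_postcode(postcode: str) -> Optional[str]:
    """Return the postcode in 'OUTWARD INWARD' form, or None if malformed."""
    m = POSTCODE_RE.match(postcode.strip().upper())
    if m is None:
        return None
    return f"{m.group(1)} {m.group(2)}"


print(clean_address("10 High Rd."))  # → 10 HIGH ROAD
print(clean_postcode("np108xg"))     # → NP10 8XG
```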

Improvements 2
- Enhancing name keys:
  - Standardised creation
  - Inclusion of part of the postcode
  - Better treatment of compound names, e.g. "John Smith trading as Smiths Bakery"
- More use of data on company registrations to assist matching of corporate units
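The two name-key enhancements can be sketched together: a compound name is split into its components, and each key carries part of the postcode. The slide does not say which part of the postcode is used; taking the outward code (everything before the final three characters) is an assumption here, as are the "trading as" markers and the `NAME|DISTRICT` key layout.

```python
import re
from typing import List

# Hypothetical markers that join a proprietor's name to a trading name.
TRADING_AS = re.compile(r"\b(?:TRADING AS|T/A)\b")


def outward_code(postcode: str) -> str:
    """Return the outward part of a UK postcode (all but the last 3 characters)."""
    pc = postcode.replace(" ", "").upper()
    return pc[:-3] if len(pc) >= 5 else pc


def enhanced_name_keys(name: str, postcode: str) -> List[str]:
    """Build one key per name component, each suffixed with the outward code."""
    district = outward_code(postcode)
    keys = []
    for part in TRADING_AS.split(name.upper()):
        words = re.sub(r"[^A-Z0-9 ]", " ", part).split()
        if words:
            keys.append(" ".join(words) + "|" + district)
    return keys


print(enhanced_name_keys("John Smith trading as Smiths Bakery", "CF10 1AA"))
# → ['JOHN SMITH|CF10', 'SMITHS BAKERY|CF10']
```

In this sketch, the postcode suffix limits key comparisons to records in the same postcode district, and splitting the compound name lets either the proprietor or the trading name find a match.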

Results 1
- Approximately 30% of units outside survey populations will match to units already in those populations
- Less than 5% of the remainder are duplicates of units in the survey populations
- Some units in survey populations were found to be duplicates (possibly around 1%)
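Taking the slide's figures at face value, the share of units outside survey populations that would stay unmatched without being duplicates can be bounded with a short calculation. Here $N$ is a hypothetical count of units outside survey populations; because the 30% figure is approximate, the results are approximate too.

$$
\begin{aligned}
\text{matched to existing units} &\approx 0.30N \\
\text{remainder} &\approx 0.70N \\
\text{duplicates within remainder} &< 0.05 \times 0.70N = 0.035N \\
\text{remainder that are not duplicates} &> 0.70N - 0.035N = 0.665N
\end{aligned}
$$

On these figures, roughly two thirds of the units currently outside survey populations would be unmatched units that are not duplicates.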

Results 2
Overall impact:
- 6% more units in survey populations
- Maximum of 1.4% increase in employment
- Timing of change is an issue
- The risk of duplication will be less than the risk of under-coverage

Conclusions
- Matching rates will be improved by regular re-matching using cleaned addresses.
- Initial matching by name can be improved if part of the postcode is included.
- Improvements to matching increase the certainty that the remaining unmatched units are genuinely single source.
- Desk profiling and clerical matching can reduce duplication still further if targeted at high-risk units.

Further information
Any questions?
www.statistics.gov.uk/idbr
steve.vale@ons.gov.uk

