
Work Package 5: Integrating data from different sources in the production of business statistics Daniel Lewis Office for National Statistics (UK)


1 Work Package 5: Integrating data from different sources in the production of business statistics Daniel Lewis Office for National Statistics (UK)

2 Overview: Issues with integrating admin and survey data. Plans for Work Package 5. Work stream B. Improving predicted values.

3 Issues with integrating admin data: Admin data may have different unique identifiers to survey data. One-to-one matches, and matches of many admin units to one survey unit, are fine. Many-to-many matches, and matches of one admin unit to many survey units, are more difficult. Matching can result in duplicate and missing units. Common variables from admin and survey sources often have different definitions. Admin data are often of different periodicity to survey data (dealt with in WP4).
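The match types listed on this slide can be illustrated with a short Python sketch. It is not taken from the presentation: the function name `classify_links` and the example identifiers are hypothetical, and the classification is made per link by counting partners on each side, which is a simplification (a fuller treatment would look at whole connected groups of linked units).

```python
from collections import defaultdict

def classify_links(links):
    """Classify each (admin_id, survey_id) link by match cardinality.

    Hypothetical helper for illustration; labels are assigned per link.
    """
    survey_per_admin = defaultdict(set)
    admin_per_survey = defaultdict(set)
    for admin_id, survey_id in links:
        survey_per_admin[admin_id].add(survey_id)
        admin_per_survey[survey_id].add(admin_id)

    result = {}
    for admin_id, survey_id in links:
        # Number of survey units linked to this admin unit.
        admin_fanout = len(survey_per_admin[admin_id])
        # Number of admin units linked to this survey unit.
        survey_fanin = len(admin_per_survey[survey_id])
        if admin_fanout == 1 and survey_fanin == 1:
            kind = "one-to-one"
        elif admin_fanout == 1:
            kind = "many-to-one"   # many admin units to one survey unit
        elif survey_fanin == 1:
            kind = "one-to-many"   # one admin unit to many survey units
        else:
            kind = "many-to-many"
        result[(admin_id, survey_id)] = kind
    return result

# Made-up identifiers, purely for illustration.
example_links = [
    ("A1", "S1"),                               # one-to-one
    ("A2", "S2"), ("A3", "S2"),                 # many admin units, one survey unit
    ("A4", "S3"), ("A4", "S4"),                 # one admin unit, many survey units
    ("A5", "S5"), ("A5", "S6"), ("A6", "S6"),   # tangled group
]
link_kinds = classify_links(example_links)
```

Links labelled "one-to-one" or "many-to-one" correspond to the cases the slide describes as fine (admin values can be used directly or summed to the survey unit); the other two labels mark the links that need further treatment.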

4 Plans for WP5: Split into two work streams to focus on specific topics relating to integrating data from multiple sources. Work stream A: methods for editing integrated data to ensure they are error-free and consistent. Work stream B: combining admin data with survey data to improve editing and imputation.

5 Work stream B (1/3): Collaboration between UK, Belgium and Italy. Improving editing and imputation by using admin data integrated with survey data. Initially concentrating on Structural Business Statistics. Hope to extend to Short Term Statistics following progress by WP4 on periodicity issues.

6 Work stream B (2/3): For some countries, admin data offer the possibility of imputing rather than re-weighting to account for non-response. In other cases, admin data can improve the accuracy of predicted values used in both editing and imputation. Begin by analysing integrated admin and survey data available in UK, Belgium and Italy. Test whether admin data offer benefits over other available predictors.

7 Work stream B (3/3): Ultimately aiming for ESS-wide recommendations. Research the availability of admin data, and the editing and imputation methods used, in other ESS countries for SBS surveys. Use this information to undertake further analysis which will be applicable to other European countries.

8 Use of predicted values: Predicted values are often used when editing and imputing survey data. Traditional edit rules often compare survey responses with predicted values; large deviations are deemed suspicious. Selective editing relies on having predicted values for each business in order to estimate the importance of potential errors on survey estimates. Item non-response can be dealt with by modelling the relationship between survey and other (related) data.
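The first two uses of predicted values on this slide can be sketched in Python. Nothing below comes from the presentation itself: the function names, the ratio bounds of 0.5 and 2.0, and the example figures are all assumptions, and the score shown is one commonly used form of selective editing score rather than the specific one used by any of the participating offices.

```python
def fails_ratio_edit(response, predicted, lower=0.5, upper=2.0):
    """Flag a response whose ratio to its predicted value is outside [lower, upper].

    The bounds are illustrative assumptions, not values from the slides.
    """
    if predicted <= 0:
        # No usable predictor: refer the response for review (an assumption).
        return True
    ratio = response / predicted
    return ratio < lower or ratio > upper

def selective_editing_score(response, predicted, weight, estimated_total):
    """Weighted absolute deviation from the predictor, relative to the total.

    A larger score means a potential error in this response would have a
    larger impact on the survey estimate.
    """
    return weight * abs(response - predicted) / estimated_total

# Made-up figures: a reported turnover of 1000 against a predictor of 100.
suspicious = fails_ratio_edit(1000, 100)
plausible = fails_ratio_edit(120, 100)
score = selective_editing_score(150, 100, weight=10, estimated_total=10000)
```

In both functions the quality of the predictor drives the result: a poor predictor produces spurious edit failures and misleading scores, which is why better predictors can reduce editing effort.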

9 Improving predicted values: More accurate predictors can improve the ability of editing methods to identify erroneous responses. This can improve the quality of data, or keep quality the same whilst reducing survey costs and the burden on businesses. It is also possible to improve outputs through better imputation, using predictors directly as imputations or modelling them with survey data.

10 Evaluating predicted values (1/2): Estimate the error for each predictor. Estimate savings to editing by comparing edit failures under edit rules that use the current and the new predictors.
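The error formula shown on this slide was not preserved in the transcript. As a stand-in, the sketch below uses the mean absolute relative error, which is an assumption and not necessarily the measure the author used. It also counts edit failures under a simple ratio edit so that two predictors can be compared; the function names, bounds, and data are hypothetical.

```python
def mean_abs_relative_error(responses, predictions):
    """Mean of |prediction - response| / |response| over non-zero responses.

    Stand-in error measure; the slide's own formula is not available.
    """
    pairs = [(y, p) for y, p in zip(responses, predictions) if y != 0]
    if not pairs:
        raise ValueError("no non-zero responses to evaluate")
    return sum(abs(p - y) / abs(y) for y, p in pairs) / len(pairs)

def count_edit_failures(responses, predictions, lower=0.5, upper=2.0):
    """Count responses whose ratio to the predictor is outside [lower, upper]."""
    failures = 0
    for y, p in zip(responses, predictions):
        if p <= 0 or not (lower <= y / p <= upper):
            failures += 1
    return failures

# Made-up data: clean survey responses and two candidate predictors.
responses = [100, 200, 400]
current_predictor = [50, 400, 100]   # e.g. a register value
new_predictor = [110, 190, 380]      # e.g. an admin-based value

error_current = mean_abs_relative_error(responses, current_predictor)
error_new = mean_abs_relative_error(responses, new_predictor)
failures_current = count_edit_failures(responses, current_predictor)
failures_new = count_edit_failures(responses, new_predictor)
```

When the responses are known to be clean, every edit failure is a false alarm, so a drop in failures under the new predictor is a rough indication of the editing effort that could be saved.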

11 Evaluating predicted values (2/2): Use an imputation study to estimate the imputation bias for methods based on each predictor. Check the distribution of the imputed data sets.
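An imputation study of this kind can be sketched as follows: known responses are deleted at random, imputed from a predictor, and the imputed total is compared with the true total over many replications. The ratio imputation method, the 30% missingness rate, the function names, and the data are all assumptions chosen for illustration; the presentation does not specify them.

```python
import random

def ratio_impute(responses, predictors, missing):
    """Impute missing responses as predictor * (observed responses / observed predictors)."""
    observed = [i for i in range(len(responses)) if i not in missing]
    ratio = (sum(responses[i] for i in observed)
             / sum(predictors[i] for i in observed))
    return [predictors[i] * ratio if i in missing else responses[i]
            for i in range(len(responses))]

def imputation_bias(responses, predictors, missing_rate=0.3,
                    replications=200, seed=0):
    """Average relative error of the imputed total over simulated non-response."""
    rng = random.Random(seed)
    n = len(responses)
    # Keep at least one unit missing and at least one unit observed.
    n_missing = min(n - 1, max(1, round(n * missing_rate)))
    true_total = sum(responses)
    errors = []
    for _ in range(replications):
        missing = set(rng.sample(range(n), n_missing))
        imputed = ratio_impute(responses, predictors, missing)
        errors.append(sum(imputed) - true_total)
    return sum(errors) / len(errors) / true_total

# Made-up data: a predictor exactly proportional to the responses,
# so ratio imputation reproduces the deleted values.
responses = [100, 200, 300, 400, 500]
proportional_predictor = [200, 400, 600, 800, 1000]
bias = imputation_bias(responses, proportional_predictor)
```

Running the same study with each candidate predictor gives comparable bias estimates. This sketch deletes values completely at random; a realistic study would mimic the observed non-response pattern and would also compare the distribution of imputed values with that of the true values.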

12 UK example of improved predicted values: Used modelled and unmodelled VAT turnover and expenditure to predict 5 key SBS variables (Turnover, Purchases, Employment Costs, Net Capital Expenditure, Gross Value Added). Compared these with existing predictors (previous values where available, register values). Restricted the study to one-to-one matches between sources.

13 UK example of improved predicted values: Previous period values are generally the best predictors, but are only available for a third of the sample. For Turnover, Purchases, Employment Costs and Gross Value Added, VAT predictors were better than register values, often significantly so. This illustrates the potential benefits of using admin data integrated with survey data. It also highlights some of the problems that need to be addressed when integrating data from multiple sources.

14 Summary: WP5 focuses on integrating data from multiple sources in the production of business statistics. Two work streams look at different aspects of this: editing integrated data, and the use of admin data to improve editing and imputation of survey data. Previous research suggests that improvements to survey outputs and reductions in costs and burden are possible. WP5 will ultimately produce guidelines for the ESS.

