Testing the use of administrative data to edit the 2009 Agriculture Census
Dolores Lorca, National Statistical Institute of Spain
Summary
A selective editing procedure is applied to test the use of administrative data for editing the Agriculture Census.
Case study: data from the previous Agriculture Census and the 2005 Farm Structural Survey (FSS).
1. Introduction
The Spanish NSI carries out an Agriculture Census every 10 years.
Many interviewers collect a large number of questionnaires in a short time, covering different kinds of holdings, so the data can contain quite a few errors that must be corrected during editing.
1. Introduction
Different editing approaches are applied to the complex set of collected data:
– In the data collection phase, simple checks are applied using built-in edits in a CAPI system
– Selective editing is applied to determine the units that will be manually reviewed
– Automatic editing is applied to the rest of the units using the DIA system
– Macro-editing
2. Selective editing procedure
Using administrative data for editing census data
Selective editing procedure: determines and prioritises the suspect units to be manually reviewed
2. Selective editing procedure
Using simple expansion estimators:
$$\hat{X} = \sum_{i=1}^{n} w_i X_i$$
where $w_i$ is the sample weight for unit $i$, $n$ is the sample size, and $X_i$ denotes the value of variable $X$ for unit $i$
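The expansion estimator can be illustrated with a minimal Python sketch. The function name `expansion_total` and the sample figures are hypothetical and are not taken from the census data; the default of unit weights mirrors the census case, where every holding represents only itself.

```python
def expansion_total(values, weights=None):
    """Simple expansion estimate of a total: the sum of w_i * X_i over the n units."""
    if weights is None:
        # Census case (assumption used as the default here): w_i = 1 for every unit.
        weights = [1.0] * len(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical olive grove areas (ha) and sample weights for three holdings.
areas = [2.0, 5.0, 1.5]
weights = [10.0, 4.0, 20.0]
estimated_total = expansion_total(areas, weights)  # 2*10 + 5*4 + 1.5*20
```

With all weights equal to 1 the estimate reduces to the plain sum of the reported values.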
2. Selective editing procedure
Local score function:
$$LS_i = w_i \, |X_i - \tilde{X}_i|$$
where $X_i$ is the reported value, $\tilde{X}_i$ is the administrative value, and $w_i = 1$ (census data)
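2. Selective editing procedure
Scaled local score:
$$\frac{LS_i}{\hat{\tilde{X}}} = \frac{w_i \, |X_i - \tilde{X}_i|}{\hat{\tilde{X}}}$$
where $X_i$ is the reported value and $\tilde{X}_i$ the administrative value for unit $i$, and $\hat{\tilde{X}}$ is the total estimate of variable $X$ calculated using administrative data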
2. Selective editing procedure
Global score: $GS_i = \max(LS_i)$, the largest of the local scores of unit $i$
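The three scores can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure as implemented at the NSI: the local score is taken to be the weighted absolute difference between the reported and administrative values, the function names are hypothetical, and the second variable (`vineyard`) is invented purely to show the maximum over variables, since the case study scores a single variable.

```python
def local_scores(reported, admin, weights=None):
    """Weighted absolute differences w_i * |reported_i - admin_i|, one per unit."""
    if weights is None:
        weights = [1.0] * len(reported)  # census data: w_i = 1
    return [w * abs(x - a) for w, x, a in zip(weights, reported, admin)]


def scaled_local_scores(reported, admin, weights=None):
    """Local scores divided by the total estimated from the administrative data."""
    if weights is None:
        weights = [1.0] * len(reported)
    admin_total = sum(w * a for w, a in zip(weights, admin))
    if admin_total == 0:
        raise ValueError("administrative total is zero; the scaled score is undefined")
    return [s / admin_total for s in local_scores(reported, admin, weights)]


def global_scores(scores_by_variable):
    """Per-unit maximum over the score lists of the individual variables."""
    return [max(unit_scores) for unit_scores in zip(*scores_by_variable)]


# Hypothetical reported and administrative areas for three holdings.
olive = scaled_local_scores([10.0, 0.0, 5.0], [8.0, 2.0, 5.0])
vineyard = scaled_local_scores([1.0, 1.0, 4.0], [1.0, 3.0, 1.0])
gs = global_scores([olive, vineyard])
```

Scaling by the administrative total makes scores comparable across variables measured on different scales, which is what allows the maximum to be taken over them.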
2. Selective editing procedure
Selection of the thresholds: the simulation study approach of Lawrence and McKenzie (2000)
Absolute pseudo-bias $PB_p$ of Latouche and Berthelot (1992), computed from $\hat{X}_p$, the total estimate obtained by replacing the reported values of all units whose global score exceeds the predetermined threshold (the percentile $p$) with their administrative values and leaving the reported values in place for the other units
2. Selective editing procedure
Re-contact rate: the number of units with a global score larger than the predetermined threshold (percentile p) divided by the total number of units
The thresholds are chosen to balance a low pseudo-bias against a low re-contact rate
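The trade-off between pseudo-bias and re-contact rate can be explored with a small simulation. The sketch below rests on several assumptions that the slides do not confirm: census weights of 1, a nearest-rank definition of the percentile, and a pseudo-bias measured as the absolute relative difference between the partially replaced total and the total obtained when every reported value is replaced by its administrative value. The function names and the ten-unit data set are hypothetical.

```python
import math


def percentile(scores, p):
    """Nearest-rank percentile p (0 < p <= 100) of the scores."""
    ordered = sorted(scores)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_threshold(reported, admin, scores, p):
    """Return (pseudo_bias, recontact_rate) for the threshold at percentile p."""
    threshold = percentile(scores, p)
    flagged = [s > threshold for s in scores]
    # Flagged units take their administrative value; the others keep the reported one.
    total_p = sum(a if f else x for x, a, f in zip(reported, admin, flagged))
    # ASSUMPTION: the reference total replaces every reported value (w_i = 1).
    total_ref = sum(admin)
    pseudo_bias = abs(total_p - total_ref) / total_ref
    recontact_rate = sum(flagged) / len(flagged)
    return pseudo_bias, recontact_rate


# Hypothetical data: ten holdings, four of which disagree with the register.
reported = [10.0] * 6 + [12.0, 7.0, 15.0, 30.0]
admin = [10.0] * 10
scores = [abs(x - a) / sum(admin) for x, a in zip(reported, admin)]

results = {p: evaluate_threshold(reported, admin, scores, p) for p in (100, 90, 80)}
```

In this toy data set, lowering the threshold from the 100th to the 90th percentile removes most of the pseudo-bias by re-contacting a single unit, while going down to the 80th percentile doubles the re-contacts for a much smaller gain.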
3. Case study
Selected variable: area planted with olive groves
– 1999 Agriculture Census and administrative register: 253,038 units
– 2005 FSS and administrative register: 5,804 units
Selection of thresholds by geographical area:
– PB_p for p = 95, 90, 85, 80, 75, 70
– Re-contacts for each p
3. Case study
With a re-contact rate of 5%, the reduction in pseudo-bias is much greater than the further reduction gained at the other rates.
For most regions, at least, we would use the percentile p=95 of the global score distribution as the threshold.
4. Conclusions
The pilot study shows that this selective editing approach could help us prioritise follow-up actions for units with significant discrepancies from the administrative data