MULTI-SOURCE: Administrative data vs CAPI, CATI


1 MULTI-SOURCE: Administrative data vs CAPI, CATI
WORKSHOP ON THE BEST PRACTICES FOR EU-SILC, London, September 2015
Martina Stare

2 Background of EU-SILC (1)
EU-SILC framework regulation
Output-harmonized survey (more than 300 EU variables with very detailed guidelines; more than 1000 SI variables)
Covers different areas: living conditions; housing conditions; work and employment; health; child care; material deprivation; incomes; ad hoc modules
Legal ground for using registers in EU-SILC: under the National Statistical Act, SURS has the right to obtain all administrative sources in Slovenia and use them for statistical purposes
Pilot survey in 2003 and 2004 (sample of 300 HH)
  We tested the questionnaires, CATI interviewing and some possibilities of using data from the registers
  The whole process of data collection, editing and imputation was not tested

3 Background of EU-SILC (2)
Regular survey from 2005 onwards
2005 EU-SILC: the first household sample survey that used administrative sources
Panel survey (4 years, 4 waves)
Selected respondent model: persons aged 16 and over are selected at random from the Central Population Register
Sample size every year: approximately [...] HH; data collected for HH and persons (cross-sectional)
Response rate: 70-75% (cross-sectional)
Longitudinal response:
  2011, wave 1 (DB075=9): initial sample 4928, response 3052 (62%)
  2014: only 1907 of the HH from the 2011 wave 1 (39%)
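The two longitudinal percentages above can be reproduced from the counts on the slide. The sketch below assumes that both shares are taken relative to the initial 2011 sample of 4928 households, which is the reading that matches the published 62% and 39%; the variable names are illustrative.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

initial_sample = 4928   # households selected for wave 1 in 2011
responded_2011 = 3052   # households that responded in wave 1
remaining_2014 = 1907   # households from that wave still present in 2014

wave1_rate = share(responded_2011, initial_sample)
retention_2014 = share(remaining_2014, initial_sample)

print(f"wave 1 response: {wave1_rate:.0f} %")
print(f"still in panel in 2014: {retention_2014:.0f} %")
```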

4 Data sources for EU-SILC (1)
Primary source: questionnaire(s)
  The frame -> defines the persons who are included in the EU-SILC database
  PAPI: only in 2005
  CAPI + CATI: from 2006 onwards
Secondary sources: administrative data and registers
  7 institutions outside SURS
  SURS

5 Data sources for EU-SILC (2)
Outside sources:
  Tax authority
  Ministry of Labour, Family, Social Affairs and Equal Opportunities
  Employment Service of Slovenia
  Agency for Agricultural Markets and Rural Development
  Ministry of the Interior - Central Population Register
  Pension and Disability Insurance Institute
  Health Insurance Institute
Questionnaire:
  some demographic data
  housing conditions
  dwelling costs
  material deprivation
  financial situation
  child care
  health
  incomes which are not included in admin. sources
  some data about employment
  overall life satisfaction
  ad hoc modules (opinion questions, ...)
Inside sources (SURS):
  Statistical Register of Employment
  Survey on scholarships
  Demographic base
"Partial" databases -> EU-SILC "integrated" database

6 Data sources for EU-SILC (3)
Questionnaire
Started with PAPI (only in 2005)
  A lot of time was devoted to preparing the questionnaire (navigation, ...)
  Training for the interviewers took several days (theoretical and practical work)
  A lot of work and time was spent on data entry and checking (logical controls, syntax errors, ...)
  In the field we tried to obtain a telephone number (to make CATI possible in the following years)
  At the beginning we met several unexpected difficulties with the data from administrative sources
  First release of the Social cohesion indicators was in N+2 (February) -> problems with timeliness
From 2006 onwards -> progress in data collection:
  CATI: waves 2, 3, 4 (6,500 that had already participated: approx. 1,300 on mobile phone, the rest on fixed phone), from January to March
  CAPI: wave 1 and those from waves 2, 3, 4 without a phone number (6,000 HH), from January to June
  In May about [...] HH are transferred to CAPI: HH from CATI (no answer, disconnected, unable to answer on the phone, moved households) plus moved selected respondents (wave 1: special procedure to transfer those HH to interviewers)
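The mode assignment described for 2006 onwards can be written as a small rule. This is only a sketch of the logic as stated on the slide; the function names and the outcome labels are hypothetical and are not taken from any SURS system.

```python
from typing import Optional

# Hypothetical labels for CATI attempts that lead to a transfer to CAPI.
CATI_TRANSFER_OUTCOMES = {"no answer", "disconnected", "unable to answer", "moved"}

def initial_mode(wave: int, phone: Optional[str]) -> str:
    """Wave 1 and households without a phone number start in CAPI, the rest in CATI."""
    if wave == 1 or not phone:
        return "CAPI"
    return "CATI"

def final_mode(wave: int, phone: Optional[str], cati_outcome: Optional[str] = None) -> str:
    """Apply the follow-up step: unsuccessful CATI cases are transferred to CAPI."""
    mode = initial_mode(wave, phone)
    if mode == "CATI" and cati_outcome in CATI_TRANSFER_OUTCOMES:
        return "CAPI"
    return mode

print(final_mode(wave=3, phone="on file", cati_outcome="disconnected"))
```

Keeping the initial assignment and the follow-up as separate functions mirrors the two phases on the slide: the January allocation and the transfer in May.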

7 Data sources for EU-SILC (4)
Questionnaire
PREPARATION OF THE QUESTIONNAIRE was a huge job in 2006 (and every year since we have tried to improve it)
  Computer-assisted interviewing requires a completely different way of questioning
  Clear and short questions (sometimes it is very hard to ask as simply as possible)
  The order of questions is important
  Instructions for the entry survey
  Data entry: the Blaise program (navigation, logical controls, syntax checks: active signal or hard error) -> the program leads the interviewer, which is easier and quicker
  Questionnaire testing -> very important (whether the questions are clear, whether the navigation is OK, ...); cognitive testing
  The form of the questions is the same for CATI and CAPI

8 Data sources for EU-SILC (5)
Questionnaire
INSTRUCTIONS: methodological and organizational guidelines (specifically for CAPI, CATI)
TRAINING FOR THE INTERVIEWERS:
  Specifically for CAPI: 50 to 60 interviewers (self-employed persons or those working occasionally by contract)
    more than half of the interviewers are experienced, the rest are not
  Specifically for CATI: 25 to 30 interviewers (mostly students, plus 4 self-employed persons who work for SURS the whole year)
    almost all the students are already familiar with CATI surveys, but they lack life experience, for example with the cost of utilities, pension insurance, etc.
  Theoretical part and practical work on the computer; experienced interviewers (1 day) vs. non-experienced interviewers (more detailed explanation, 2 days)
MONITORING OF INTERVIEWERS AND DATA COLLECTION IN THE FIELD
  The Survey studio department monitors (controls) the interviewers; monitoring is better with CATI (we listen to the interviewers)

9 Data sources for EU-SILC (6)
+/- Questionnaire
(+) Combination of CAPI and CATI
  Because of the selected respondent model, we can use CATI for waves 2, 3 and 4
  Shorter questionnaire (some of the data are transferred and some are only checked)
  Better monitoring of interviewers with CATI
  Better response rate than CAPI
  Possibility of follow-up: HH (approx. 800) are moved from CATI (no answer, disconnected, unable to answer on the phone) to CAPI

Response rate (%)
        2011   2012   2013   2014
CAPI    63.1   63.7   60.1   61.6
CATI    81.7   82.9   83.0   86.0
Total   72.7   72.6   70.9   72.5
Source: SURS, EU-SILC

10 Data sources for EU-SILC (7)
+/- Questionnaire
(-) Problems with proxy answers
  For variables collected for all persons (e.g. supplement for meals and transport to work, contributions to individual private pension plans, etc.) -> one person answers for all other HH members
  The share of proxy answers is calculated only for the selected respondent (around 20%)
  Higher share of proxy answers in CATI (the person who answers is less likely to hand the phone to another person / the selected respondent)

Proxy answers (%)
        2011   2012   2013   2014
CAPI    20.1   18.0   20.2   18.5
CATI    22.8   22.0   25.7   20.6
Total   21.7   [...]  23.2   19.6
Source: SURS, EU-SILC

11 Data sources for EU-SILC (8)
Outside sources (institution, followed by its sources)
Tax Authority
  Income tax register
  Tax register for income from self-employment
  Problem: incomes from agriculture are not covered completely
Ministry of Labour, Family and Social Affairs
  Family allowances (parental allowance, childbirth allowance, child allowance, large family allowance, allowance for care of a child needing special care and protection, part payment for lost income and compensation for childbirth leave)
  Social allowances
Pension and Disability Insurance Institute
  Old-age, survivor and disability benefits
  Untaxable allowances for handicapped persons
Employment Service of Slovenia
  Register of unemployed persons
  Unemployment benefits
Health Insurance Institute
  Activity status for inactive persons
Ministry of the Interior - Central Population Register
  Addresses (for sampling), degree of urbanization, marital status, birthday and gender, country of birth, citizenship
Ministry of Agriculture and the Environment
  Housing allowance
  Subsidies from agriculture

12 Data sources for EU-SILC (9)
Inside sources (SURS)
Institution: Statistical Office
  Statistical Register of Employment
  Survey on scholarships (the data are collected by the Agency of the Republic of Slovenia for Public Legal Records and Related Services)
  Demographic base (highest ISCED level attained), from 2014 onwards
Possible additional source, from 2016 onwards:
  Institution: The Surveying and Mapping Authority of the Republic of Slovenia
  Source: Real estate register

13 Composing the database
Questionnaire: data on name, surname, birthdate and sex -> these questionnaire data are linked with the Central Population Register to obtain the PINs of all members of the HH
Administrative sources: PIN
PIN: the key for linking data from the questionnaire and administrative sources
We had to derive the PINs from the questionnaire data:
  93.09% of PINs were found by a computer program
  6.90% of PINs were found by manual searching
  0.01% of PINs were imputed
If we take into account only the first wave: manual searching was used for 19.66% of PINs
Source: SURS, EU-SILC 2014
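The automatic part of the PIN search can be pictured as an exact match on the four questionnaire fields, with everything that does not match uniquely left for manual searching. The sketch below is an illustration under that assumption, not the program SURS uses; the `Identity` type, the `link_pins` function, the normalisation rule and all sample records are invented for the example.

```python
from typing import Dict, List, NamedTuple, Tuple

class Identity(NamedTuple):
    name: str
    surname: str
    birthdate: str  # ISO format, e.g. "1980-05-17"
    sex: str

def normalise(p: Identity) -> Identity:
    """Make matching insensitive to letter case and surrounding whitespace."""
    return Identity(p.name.strip().lower(), p.surname.strip().lower(),
                    p.birthdate.strip(), p.sex.strip().upper())

def link_pins(members: List[Identity],
              register: Dict[str, Identity]) -> Tuple[Dict[Identity, str], List[Identity]]:
    """Return (automatically linked PINs, members left for manual searching)."""
    index: Dict[Identity, List[str]] = {}
    for pin, person in register.items():
        index.setdefault(normalise(person), []).append(pin)
    linked: Dict[Identity, str] = {}
    manual: List[Identity] = []
    for member in members:
        candidates = index.get(normalise(member), [])
        if len(candidates) == 1:      # unique match: accept automatically
            linked[member] = candidates[0]
        else:                         # no match or ambiguous: manual search
            manual.append(member)
    return linked, manual

# Invented register extract and household members (not real data).
register = {
    "PIN-0001": Identity("Ana", "Novak", "1980-05-17", "F"),
    "PIN-0002": Identity("Marko", "Kovac", "1975-01-02", "M"),
    "PIN-0003": Identity("Marko", "Kovac", "1975-01-02", "M"),  # same identity twice
}
members = [
    Identity(" ana ", "NOVAK", "1980-05-17", "f"),
    Identity("Marko", "Kovac", "1975-01-02", "M"),
    Identity("Eva", "Zupan", "1990-09-09", "F"),
]
linked, manual = link_pins(members, register)
print(len(linked), "linked automatically,", len(manual), "left for manual search")
```

Ambiguous matches are deliberately sent to the manual list rather than resolved by guessing, since a wrong PIN would attach another person's administrative data to the respondent.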

14 [Diagram: composing the integrated database]
Total population: Income tax data; Statistical Register of Employment; Other administrative sources (each followed by editing, imputations)
Link via PIN with Questionnaire: Personal data
EU-SILC population, partial databases: Questionnaire: Personal data; Income tax data; Statistical Register of Employment; Other administrative sources (each followed by editing, imputations)
Data on personal level: integrated database composed from all 4 partial databases + variables composed for Eurostat
Data on HH level: Questionnaire: Household data -> integrated database from the questionnaire on HH level + variables composed for Eurostat

15 Statistical data processing
Establishing data processing in 2007 was urgent -> repeatable and transparent data processing (from raw data to final data) in separate steps:
  Logical controls
  Transfer of data from the previous year
  Editing
  Imputations
  Final editing + aggregation and preparation of tables for releases
Every year we try to improve the controls and checking programs (cross-sectional and longitudinal); the editing and imputation methods stay mostly the same
All data are transferred to the ORACLE database: 5 "partial" tables
After checking the data in the partial databases, we produce the so-called "integrated" database, which includes all data; additional controls are run there -> we find some inconsistencies between the data from different sources
Editing and imputations are always done on the "partial" tables
Our data processing is quite complex and time-consuming (especially when the rules are set for the first time), but data editing gets quicker every following year
In 2014 the editing process was upgraded with a better technical solution
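One way to picture "repeatable and transparent" processing in separate steps is a fixed list of functions applied in order, with a log of what ran. The sketch below is an assumption-laden illustration: it implements only two of the five steps, the negative-income check is an invented logical control, and median imputation is a stand-in, since the slides do not say which imputation methods are used.

```python
from statistics import median
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]

def logical_controls(rows: List[Record]) -> List[Record]:
    """Invented control: blank out impossible (negative) income values."""
    return [{**r, "income": None if r["income"] is not None and r["income"] < 0
             else r["income"]} for r in rows]

def impute_median(rows: List[Record]) -> List[Record]:
    """Stand-in imputation: fill missing income with the median of known values."""
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = median(known) if known else None
    return [{**r,
             "income": fill if r["income"] is None else r["income"],
             "imputed": r["income"] is None}   # flag so imputed cases stay traceable
            for r in rows]

def run_pipeline(rows: List[Record], steps: List[Step]) -> Tuple[List[Record], List[str]]:
    """Apply the steps in a fixed order and record which ones ran."""
    log: List[str] = []
    for step in steps:
        rows = step(rows)
        log.append(step.__name__)
    return rows, log

raw = [{"income": 100.0}, {"income": -5.0}, {"income": 300.0}, {"income": None}]
final, log = run_pipeline(raw, [logical_controls, impute_median])
print(log)
print([r["income"] for r in final])
```

Because each step returns new records instead of changing the input, the raw table stays available for comparison with the final data, which is the kind of before/after check the later imputation tables rely on.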

16 The advantages of using registers (by EU-SILC)
A shorter questionnaire reduces the reporting burden
Skipping the most difficult and sensitive questions about income
More accurate data available
Item non-response and unit non-response are lower
Lower costs (possibility of using CATI)

17 The disadvantages of using registers (by EU-SILC)
Additional work searching for the PINs of all persons
Changes in the definitions and meaning of variables
A lot of work is required to ensure the logical integrity of the data: there are differences between the data in administrative sources and in questionnaires
Data cleaning, editing and imputations take much more time because of inconsistencies
Timeliness is a problem: administrative data become available relatively late
  The largest problem is with the main source for EU-SILC, the data from the Tax Authority -> final data for N-1 are not available before December of year N
  GRANT APPLICATION (until the end of 2016) -> to improve timeliness
Some persons are not in administrative sources
The abolition of an administrative source

18 Phases of work on data before publication
[Timeline 1: administrative and register data are used (SI EU-SILC). Axis: year N-1, year N, year N+1. Phases: reference period for income; CAPI + CATI; searching PINs (KEY for linking data); acquisition of data from the Tax Authority (N-1); statistical data editing; publication of provisional data; publication of final data]
[Timeline 2: only survey data are used (no administrative sources). Axis: year N-1, year N, year N+1]

19 Who is not in administrative sources? (1)
If we looked only at the main activity status (PL211) -> 7.7% without data (approx. [...] persons without activity for all 12 months)
Analysis of the administrative sources used to compose the activity status (PL211) and income (social allowances are taken into account)
Altogether 1.9% of persons without data:
  0.5% foreigners
  0.3% live near the border; some of them work abroad and are not included in the Statistical Register of Employment; the Tax Authority receives income data from abroad later, so they are not included in the data which SURS receives in December
  1.2% other reasons
Source: SURS, EU-SILC 2014

20 Who is not in administrative sources? (2)
It is impossible to verify the full coverage of the data in administrative sources
The hidden economy is not included
Problems with editing and imputations: income depends on employment status and employment status depends on income
Each year we spend a lot of time determining the main activity status of all persons for each month

21 Who is not in administrative sources? (3)
In 2014 we introduced new variables into the questionnaire (Q): "How many months did the person have a certain activity status during the income reference period?" (PL211):
  no problems with collecting these data
  they helped us determine the 'main activity status' of those who are not in administrative sources
Very good matching for employed and retired persons (more than 96%) if we take into account those who declared in Q that they were employed or retired for all 12 months
Slightly less successful matching for the categories "unemployed" and "other inactive"
For "pupils and students" we do not have these data in any administrative source, so we used Q in all cases
Setting new rules, priority:
  if the person is active, the Statistical Register of Employment and the Register of Unemployed have priority
  if the person is inactive, the questionnaire has priority
-> The questionnaire gained greater importance
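The priority rule for the main activity status can be sketched as a two-branch function. The reading assumed here is that "active" means the registers show the person as employed or unemployed; the status labels and the function name are hypothetical, and the real rules are applied month by month with further checks.

```python
from typing import Optional, Tuple

# Assumption: statuses that count as "active" when found in the registers.
ACTIVE_STATUSES = {"employed", "unemployed"}

def main_activity_status(register_status: Optional[str],
                         questionnaire_status: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (chosen status, source that supplied it)."""
    if register_status in ACTIVE_STATUSES:
        # Active persons: Statistical Register of Employment / Register of Unemployed win.
        return register_status, "register"
    if questionnaire_status is not None:
        # Inactive persons, and anyone missing from the registers
        # (e.g. pupils and students): the questionnaire wins.
        return questionnaire_status, "questionnaire"
    # Nothing reported in the questionnaire: fall back to whatever the registers hold.
    return register_status, "register"

print(main_activity_status("employed", "retired"))
print(main_activity_status(None, "pupil or student"))
```

Returning the source alongside the status keeps a record of which rule fired, which helps when the share of questionnaire-based statuses has to be reported.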

22 Who is not in administrative sources? (4)
Status according to the questionnaire for those who were not in admin. sources (1.9% of persons)
For all 12 months they were:
  0.3% in work
  1.3% unemployed
  0.1% retired
  0.03% other inactive
0.2% had more than one activity status during the year
Source: SURS, EU-SILC 2014

23 Share of imputations for different income variables
Columns: (a) % of persons/households with income; (b) % of cases without imputations; (c) % of cases with partial imputations; (d) % of cases where all income was completely imputed

Variable  Description                                               (a)    (b)     (c)    (d)
PY010N    Employee cash or near cash income - net                   56.5   68.04   30.6   1.39
PY035N    Contributions to individual private pension plans - net   13.3   69.3    0.9    29.8
PY090N    Unemployment benefits - net                               4.5    99.0    0.1    0.0
Source: SURS, EU-SILC 2014

24 Share of imputations according to original data for the variable "employee cash or near cash income - net"
PY010N_I                               Frequency   Percent
Entire income imputed                  186         1.39
Imputed more than 75%                  69          0.51
Imputed more than 50% up to 75%        67          0.50
Imputed more than 25% up to 50%        196         1.46
Imputed more than 10% up to 25%        1245        9.28
Imputed up to 10%                      1978        14.74
No imputations                         9128        68.04
Up to 10% decreased income             273         2.03
Income decreased from 10% up to 20%    60          0.45
Income decreased for more than 20%     214         1.60

Aggregate PY010N, in million
Raw data (weighted)      9 572
Final data (weighted)    9 914
Source: SURS, EU-SILC 2014
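The percentage column follows directly from the frequencies, and the two weighted aggregates show how much editing and imputation changed the total. The short check below recomputes both from the figures on the slide; the dictionary keys simply repeat the row labels, and the variable names are illustrative.

```python
frequencies = {
    "Entire income imputed": 186,
    "Imputed more than 75%": 69,
    "Imputed more than 50% up to 75%": 67,
    "Imputed more than 25% up to 50%": 196,
    "Imputed more than 10% up to 25%": 1245,
    "Imputed up to 10%": 1978,
    "No imputations": 9128,
    "Up to 10% decreased income": 273,
    "Income decreased from 10% up to 20%": 60,
    "Income decreased for more than 20%": 214,
}

total_cases = sum(frequencies.values())
shares = {label: round(100 * n / total_cases, 2) for label, n in frequencies.items()}

raw_aggregate = 9572     # weighted PY010N before editing, in million
final_aggregate = 9914   # weighted PY010N after editing and imputation, in million
relative_change = 100 * (final_aggregate - raw_aggregate) / raw_aggregate

for label, pct in shares.items():
    print(f"{label:<38}{pct:>6.2f}")
print(f"aggregate change: {relative_change:+.1f} %")
```

The rows other than "No imputations" and "Entire income imputed" add up to about 30.6%, which matches the partial-imputation share reported for PY010N on the previous slide.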

25 Conclusions
Trade-off between timeliness and quality of data
Using multiple sources has advantages as well as disadvantages
It is very important to have a good EU-SILC team: the results and the quality of the data depend on very good cooperation among staff

26 Thank you for your attention
Questions?

