MULTI-SOURCE: Administrative data vs CAPI, CATI

WORKSHOP ON THE BEST PRACTICES FOR EU-SILC, London, 16-17 September 2015
MULTI-SOURCE: Administrative data vs CAPI, CATI
Martina Stare, martina.stare@gov.si

Background of EU-SILC (1)
- EU-SILC framework regulation
- Output-harmonized survey (more than 300 EU variables with very detailed guidelines; more than 1,000 national (SI) variables)
- Covers different areas: living conditions; housing conditions; work and employment; health; child care; material deprivation; income; ad hoc modules
- Legal basis for using registers in EU-SILC: under the National Statistics Act, SURS has the right to obtain all administrative sources in Slovenia and use them for statistical purposes
- Pilot survey in 2003 and 2004 (sample of 300 households)
  - We tested the questionnaires, CATI interviewing and some possibilities for using register data
  - The whole process of data collection, editing and imputation was not tested

Background of EU-SILC (2)
- Regular survey from 2005 onwards; EU-SILC 2005 was the first household sample survey that used administrative sources
- Panel survey (4 years, 4 waves)
- Selected respondent model: persons aged 16 and over are selected at random from the Central Population Register
- Sample size every year: approximately 12,500 households; data collected for 9,000 households and 28,000 persons (cross-sectional)
- Response rate: 70-75% (cross-sectional)
- Longitudinal response:
  - 2011, wave 1 (DB075=9): initial sample 4,928 households, responding 3,052 (62%)
  - 2014: only 1,907 of the households from the 2011 wave 1 (39%)
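The longitudinal response figures are simple ratios to the 2011 wave-1 initial sample. A minimal Python sketch that reproduces them from the counts on the slide; the function name is hypothetical, and labelling 2014 as wave 4 follows from the four-wave panel design.

```python
def retention_rate(initial_sample: int, responding: int) -> float:
    """Responding households as a percentage of a reference group."""
    return 100.0 * responding / initial_sample


INITIAL_SAMPLE_2011 = 4928  # 2011 wave 1 (DB075=9), initial sample

responding = {
    "2011 (wave 1)": 3052,
    "2014 (wave 4)": 1907,  # same rotation group, three years later
}

for label, count in responding.items():
    print(f"{label}: {retention_rate(INITIAL_SAMPLE_2011, count):.0f}%")

# Retention relative to wave-1 respondents instead of the initial sample.
print(f"2014 of 2011 respondents: {retention_rate(3052, 1907):.0f}%")
```

Running it prints 62%, 39% and 62%: roughly six in ten wave-1 respondents were still responding in the fourth wave.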

Data sources for EU-SILC (1)
- Primary source: questionnaire(s)
  - The frame defines the persons who are included in the EU-SILC database
  - PAPI: only in 2005
  - CAPI + CATI: from 2006 onwards
- Secondary sources: administrative data and registers
  - 7 institutions outside SURS
  - SURS

Data sources for EU-SILC (2)
Outside sources:
- Tax Authority
- Ministry of Labour, Family, Social Affairs and Equal Opportunities
- Employment Service of Slovenia
- Agency for Agricultural Markets and Rural Development
- Ministry of the Interior: Central Population Register
- Pension and Disability Insurance Institute
- Health Insurance Institute
Questionnaire: some demographic data; housing conditions; dwelling costs; material deprivation; financial situation; child care; health; income not included in administrative sources; some data about employment; overall life satisfaction; ad hoc modules (opinion questions, ...)
Inside sources (SURS): Statistical Register of Employment; Survey on scholarships; Demographic base
All sources feed "partial" databases, which are combined into the EU-SILC "integrated" database.

Data sources for EU-SILC (3)
Questionnaire
- Started with PAPI (only in 2005)
  - A lot of time was devoted to preparing the questionnaire (navigation, ...)
  - Interviewer training was organized over several days (theoretical and practical work)
  - A lot of work and time was spent on data entry and checking (logical controls, syntax errors, ...)
  - In the field we tried to obtain a telephone number (to make CATI possible in the following years)
  - At the beginning we encountered several unexpected difficulties with the data from administrative sources
  - The first release of social cohesion indicators was in N+2 (February): problems of timeliness
- From 2006 onwards: progress in data collection
  - CATI: waves 2, 3 and 4 (6,500 households that had already participated: approx. 1,300 on mobile phones, the rest on fixed lines), January to March
  - CAPI: wave 1 and households from waves 2, 3 and 4 without a phone number (6,000 households), January to June
  - In May about 800-1,000 households are transferred from CATI to CAPI (no answer, disconnected, unable to answer by phone, moved households), together with moved selected respondents (wave 1: special procedure for transferring these households to interviewers)
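The fieldwork allocation used from 2006 onwards can be read as a simple rule. A minimal Python sketch, assuming each household record carries its panel wave and whether a phone number was obtained earlier; the `Household` class, the outcome codes and the function names are hypothetical, not SURS's production system, and the special procedure for moved wave-1 selected respondents is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical CATI outcome codes that trigger the May transfer to CAPI.
TRANSFER_TO_CAPI = {"no_answer", "disconnected", "unable_to_answer", "moved"}


@dataclass
class Household:
    hh_id: int
    wave: int         # 1 to 4 in the four-year panel
    has_phone: bool   # telephone number obtained in an earlier wave


def initial_mode(hh: Household) -> str:
    """Waves 2-4 with a phone number start on CATI; all others go to CAPI."""
    if hh.wave >= 2 and hh.has_phone:
        return "CATI"
    return "CAPI"


def mode_after_may(hh: Household, cati_outcome: Optional[str]) -> str:
    """Unresolved CATI cases are handed over to CAPI interviewers in May."""
    mode = initial_mode(hh)
    if mode == "CATI" and cati_outcome in TRANSFER_TO_CAPI:
        return "CAPI"
    return mode


if __name__ == "__main__":
    sample = [
        (Household(1, wave=1, has_phone=True), None),
        (Household(2, wave=3, has_phone=True), "completed"),
        (Household(3, wave=2, has_phone=True), "disconnected"),
        (Household(4, wave=4, has_phone=False), None),
    ]
    for hh, outcome in sample:
        print(hh.hh_id, initial_mode(hh), "->", mode_after_may(hh, outcome))
```

Only CATI cases can be transferred, so a wave-1 household always stays on CAPI regardless of its phone status.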

Data sources for EU-SILC (4)
Questionnaire
- PREPARATION OF THE QUESTIONNAIRE was a huge job in 2006 (and every year since we have tried to improve it)
- Computer-assisted interviewing requires a completely different way of questioning
  - Clear and short questions (it is sometimes very hard to ask as simply as possible)
  - The order of questions is important
- Instructions for data entry: the Blaise program (navigation, logical controls, syntax checks: active signal or hard error); the program guides the interviewer, which makes the work easier and quicker
- Questionnaire testing is very important (whether the questions are clear, whether the navigation works, ...); cognitive testing
- The question wording is the same for CATI and CAPI

Data sources for EU-SILC (5)
Questionnaire
- INSTRUCTIONS: methodological and organizational guidelines (specific to CAPI and CATI)
- TRAINING FOR THE INTERVIEWERS:
  - CAPI: 50 to 60 interviewers (self-employed persons or people working occasionally under contract); more than half are experienced, the rest are new
  - CATI: 25 to 30 interviewers (mostly students; also 4 self-employed persons who work for SURS all year); almost all the students are already familiar with CATI surveys, but they lack life experience, for example with utility costs, pension insurance, etc.
  - Theoretical part and practical work on the computer; experienced interviewers (1 day) vs. inexperienced interviewers (more detailed explanation, 2 days)
- MONITORING OF INTERVIEWERS AND DATA COLLECTION IN THE FIELD: the Survey Studio department monitors (checks) the interviewers; monitoring is better with CATI (we listen to the interviewers)

Data sources for EU-SILC (6)
Questionnaire: combination of CAPI and CATI (+/-)
+ Because of the selected respondent model, we can use CATI for waves 2, 3 and 4
+ Shorter questionnaire (some data are transferred and some are only checked)
+ Better monitoring of interviewers with CATI
+ Better response rate than CAPI
+ Possibility of follow-up: households (approx. 800) from CATI (no answer, disconnected, unable to answer by phone) are transferred to CAPI

Response rate (%)
         2011   2012   2013   2014
CAPI     63.1   63.7   60.1   61.6
CATI     81.7   82.9   83.0   86.0
Total    72.7   72.6   70.9   72.5
Source: SURS, EU-SILC 2011-2014
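To put a number on the response-rate advantage of CATI in the table, a short Python sketch computes the CATI minus CAPI gap in percentage points. The values are copied from the table; the function name is hypothetical. Because CATI is used only for households that already responded in an earlier wave, the gap mixes a mode effect with a selection effect and should not be read as a pure mode effect.

```python
from typing import Dict

# Response rates (%) by mode, SURS, EU-SILC 2011-2014.
RESPONSE: Dict[int, Dict[str, float]] = {
    2011: {"CAPI": 63.1, "CATI": 81.7, "Total": 72.7},
    2012: {"CAPI": 63.7, "CATI": 82.9, "Total": 72.6},
    2013: {"CAPI": 60.1, "CATI": 83.0, "Total": 70.9},
    2014: {"CAPI": 61.6, "CATI": 86.0, "Total": 72.5},
}


def cati_advantage(rates: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Percentage-point difference between CATI and CAPI response rates per year."""
    # Round to one decimal to match the precision of the published table.
    return {year: round(r["CATI"] - r["CAPI"], 1) for year, r in rates.items()}


for year, gap in cati_advantage(RESPONSE).items():
    print(year, f"+{gap} pp")
```

The gap widens from 18.6 points in 2011 to 24.4 points in 2014.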

Data sources for EU-SILC (7)
Questionnaire (+/-)
- Problems with proxy answers
  - For variables collected for all persons (e.g. allowances for meals and for transport to work, contributions to individual private pension plans, etc.), one person answers for all other household members
  - The share of proxy answers is calculated only for the selected respondent (around 20%)
  - Higher share of proxy answers on CATI (a person who answers the phone is less likely to hand it over to another person / the selected respondent)

PROXY answers (%)
         2011   2012   2013   2014
CAPI     20.1   18.0   20.2   18.5
CATI     22.8   22.0   25.7   20.6
Total (only three values given for four years): 21.7, 23.2, 19.6
Source: SURS, EU-SILC 2011-2014

Data sources for EU-SILC (8)
Outside sources (institution: source)
- Tax Authority: income tax register; tax register for income from self-employment (problem: income from agriculture is not fully covered)
- Ministry of Labour, Family and Social Affairs: family allowances (parental allowance, childbirth allowance, child allowance, large family allowance, allowance for care of a child needing special care and protection, partial payment for lost income, compensation for childbirth leave); social allowances
- Pension and Disability Insurance Institute: old-age, survivor and disability benefits; untaxable allowances for handicapped persons
- Employment Service of Slovenia: register of unemployed persons; unemployment benefits
- Health Insurance Institute: activity status for inactive persons
- Ministry of the Interior, Central Population Register: addresses (for sampling), degree of urbanization, marital status, date of birth and sex, country of birth, citizenship
- Ministry of Agriculture and the Environment: housing allowance; subsidies from agriculture

Data sources for EU-SILC (9)
Inside sources (SURS) (institution: source)
- Statistical Office: Statistical Register of Employment; Survey on scholarships (the data are collected by the Agency of the Republic of Slovenia for Public Legal Records and Related Services); Demographic base (highest ISCED level attained), from 2014 onwards
Possible additional source, from 2016 onwards:
- The Surveying and Mapping Authority of the Republic of Slovenia: real estate register

Composing the database
- Questionnaire: name, surname, date of birth and sex; these questionnaire data are linked with the Central Population Register to obtain the PINs of all household members
- Administrative sources: PIN
- The PIN is the key for linking data from the questionnaire and the administrative sources
- We had to derive the PINs from the questionnaire data:
  - 93.09% of PINs were found by a computer program
  - 6.90% of PINs were found by manual searching
  - 0.01% of PINs were imputed
  - First wave only: manual searching was needed for 19.66% of PINs
Source: SURS, EU-SILC 2014
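The automatic part of the PIN search amounts to record linkage on name, surname, date of birth and sex. A minimal Python sketch, assuming exact matching after light normalization (case and surrounding spaces); the slide does not describe SURS's actual matching program, so `make_key`, `link_pins` and all records below are hypothetical. Members with no candidate, or with more than one, are routed to manual search.

```python
from typing import Dict, List, Tuple

# (name, surname, date of birth as ISO string, sex)
Key = Tuple[str, str, str, str]


def make_key(name: str, surname: str, birthdate: str, sex: str) -> Key:
    """Normalize the linkage variables so case/spacing differences still match."""
    return (name.strip().lower(), surname.strip().lower(), birthdate, sex.strip().upper())


def link_pins(
    members: List[Tuple[int, str, str, str, str]],
    register: List[Tuple[str, str, str, str, str]],
) -> Tuple[Dict[int, str], List[int]]:
    """Exact deterministic matching of questionnaire members to register records."""
    index: Dict[Key, List[str]] = {}
    for pin, name, surname, birthdate, sex in register:
        index.setdefault(make_key(name, surname, birthdate, sex), []).append(pin)

    found: Dict[int, str] = {}
    manual: List[int] = []
    for member_id, name, surname, birthdate, sex in members:
        candidates = index.get(make_key(name, surname, birthdate, sex), [])
        if len(candidates) == 1:
            found[member_id] = candidates[0]
        else:
            manual.append(member_id)  # no match or ambiguous: manual search
    return found, manual


# Hypothetical register extract and household roster.
register = [
    ("PIN-001", "Ana", "Novak", "1975-03-14", "F"),
    ("PIN-002", "Marko", "Novak", "1972-11-02", "M"),
    ("PIN-003", "Luka", "Novak", "2004-06-30", "M"),
]
members = [
    (1, "ana ", "NOVAK", "1975-03-14", "f"),  # matches despite case/spacing
    (2, "Marko", "Novak", "1972-11-02", "M"),
    (3, "Luka", "Novak", "2004-06-03", "M"),  # date of birth mistyped
]
found, manual = link_pins(members, register)
print(found)   # {1: 'PIN-001', 2: 'PIN-002'}
print(manual)  # [3]
```

A real linkage would likely add tolerant matching (e.g. on spelling variants), which is one way to shrink the manual-search share; that extension is not shown here.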

Diagram: composition of the partial and integrated databases
Data at personal level:
- Questionnaire: personal data (EU-SILC population)
- Income tax data, Statistical Register of Employment and other administrative sources (total population)
- Each partial database is edited and imputed separately; the databases are linked by PIN
- Integrated database composed from all 4 partial databases + variables composed for Eurostat
Data at household level:
- Questionnaire: household data (editing, imputations)
- Integrated database from the questionnaire at household level + variables composed for Eurostat
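The step from partial to integrated database is essentially a join on PIN followed by cross-source checks. A minimal Python sketch, assuming each partial table is a mapping from PIN to already-edited variables; the source prefixes (`quest`, `tax`, `sre`), the variable names and the consistency rule are hypothetical illustrations, not SURS's Oracle schema.

```python
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]  # PIN -> {variable: value}


def integrate(partials: Dict[str, Table]) -> Dict[str, Dict[str, float]]:
    """Join edited partial tables on PIN, prefixing each variable with its source."""
    integrated: Dict[str, Dict[str, float]] = {}
    for source, table in partials.items():
        for pin, record in table.items():
            row = integrated.setdefault(pin, {})
            for variable, value in record.items():
                row[f"{source}.{variable}"] = value
    return integrated


def cross_source_checks(integrated: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Hypothetical rule: employment register and tax data should agree."""
    flags = []
    for pin, row in integrated.items():
        months = row.get("sre.months_employed", 0)
        income = row.get("tax.employee_income", 0.0)
        if months > 0 and income == 0:
            flags.append((pin, "employed in register, no employee income"))
        elif months == 0 and income > 0:
            flags.append((pin, "employee income, not in employment register"))
    return flags


partials = {
    "quest": {"P1": {"age": 45}, "P2": {"age": 30}, "P3": {"age": 52}},
    "tax": {"P1": {"employee_income": 18000.0}, "P3": {"employee_income": 9500.0}},
    "sre": {"P1": {"months_employed": 12}, "P2": {"months_employed": 6}},
}
integrated = integrate(partials)
for pin, reason in cross_source_checks(integrated):
    print(pin, reason)
```

Keeping the source prefix on each variable makes it possible to trace every inconsistency back to the partial table where it should be edited.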

Statistical data processing
- Establishing the data processing in 2007 was urgent: repeatable and transparent processing (from raw data to final data) in separate steps:
  1. Logical controls
  2. Transfer of data from the previous year
  3. Editing
  4. Imputation
  5. Final editing + aggregation and preparation of tables for releases
- Every year we tried to improve the controls and checking programs (cross-sectional and longitudinal); the editing and imputation methods are mostly unchanged
- All data are transferred to the ORACLE database: 5 "partial" tables
- After checking the data in the partial databases, we produce the so-called "integrated" database, which includes all data; additional controls are done there, and we find some inconsistencies between data from different sources
- Editing and imputation are always done on the "partial" tables
- Our data processing is quite complex and time-consuming (especially when the rules are set for the first time), but data editing gets quicker every year
- In 2014 the editing process was upgraded with a better technical solution
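Repeatability comes from running the same ordered steps on a copy of the raw data every time. A minimal Python sketch of such a pipeline; the step names follow the slide, but every rule inside them (negative income treated as missing, tenure carried over from last year, owners paying no rent, median imputation) is a hypothetical stand-in for SURS's actual controls and imputation methods.

```python
import statistics
from typing import Callable, Dict, List

Row = Dict[str, object]
Previous = Dict[str, Row]  # PIN -> last year's final record
Step = Callable[[List[Row], Previous], List[Row]]


def logical_controls(rows: List[Row], previous: Previous) -> List[Row]:
    # Hypothetical control: negative income is impossible, treat it as missing.
    for r in rows:
        if r["income"] is not None and r["income"] < 0:
            r["income"] = None
            r["control_failed"] = True
    return rows


def transfer_previous_year(rows: List[Row], previous: Previous) -> List[Row]:
    # Carry over a slowly changing variable (tenure status) when it is missing.
    for r in rows:
        if r["tenure"] is None and r["pin"] in previous:
            r["tenure"] = previous[r["pin"]]["tenure"]
    return rows


def editing(rows: List[Row], previous: Previous) -> List[Row]:
    # Hypothetical edit rule: owners do not pay rent.
    for r in rows:
        if r["tenure"] == "owner" and r["rent"]:
            r["rent"] = 0.0
    return rows


def imputation(rows: List[Row], previous: Previous) -> List[Row]:
    # Median imputation of missing income, with an imputation flag.
    fill = statistics.median(r["income"] for r in rows if r["income"] is not None)
    for r in rows:
        r["income_imputed"] = r["income"] is None
        if r["income"] is None:
            r["income"] = fill
    return rows


STEPS: List[Step] = [logical_controls, transfer_previous_year, editing, imputation]


def run(raw: List[Row], previous: Previous) -> List[Row]:
    """Apply all steps in a fixed order to a copy, so the raw data stay untouched."""
    rows = [dict(r) for r in raw]
    for step in STEPS:
        rows = step(rows, previous)
    return rows


def aggregate(rows: List[Row]) -> float:
    return sum(r["income"] for r in rows)


previous = {"P2": {"tenure": "owner"}}
raw_rows = [
    {"pin": "P1", "income": 12000.0, "tenure": "tenant", "rent": 300.0},
    {"pin": "P2", "income": -5.0, "tenure": None, "rent": 50.0},
    {"pin": "P3", "income": None, "tenure": "owner", "rent": 0.0},
    {"pin": "P4", "income": 20000.0, "tenure": "tenant", "rent": 400.0},
]
final_rows = run(raw_rows, previous)
print(aggregate(final_rows))
```

Because each step reads only the output of the previous one, a rule can be changed and the whole year re-run from raw data without manual intervention.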

The advantages of using registers (in EU-SILC)
- A shorter questionnaire reduces the response burden
- The most difficult and sensitive questions about income can be skipped
- More accurate data are available
- Item non-response and unit non-response are lower
- Lower costs (possibility of using CATI)

The disadvantages of using registers (in EU-SILC)
- Additional work searching for the PINs of all persons
- Changes in the definitions and meaning of variables
- A lot of work is required to ensure the logical consistency of the data: there are differences between the data in administrative sources and in questionnaires
- Data cleaning, editing and imputation take much more time because of inconsistencies
- Timeliness is a problem: administrative data become available relatively late
  - Our largest problem is with the main source for EU-SILC, the data from the Tax Authority: final data for N-1 are not available before December of year N
  - GRANT APPLICATION (until the end of 2016) to improve timeliness
- Some persons are not in administrative sources
- An administrative source can be abolished

Diagram: phases of work on data before publication, years N-1 to N+1 (SI EU-SILC)
- When administrative and register data are used: reference period for income (year N-1); CAPI + CATI fieldwork (year N); acquisition of data from the Tax Authority (for N-1); searching for PINs (key for linking data); statistical data editing; publication of provisional data; publication of final data
- Compared with: the phases of work when only survey data are used (no administrative sources)

Who is not in administrative sources? (1)
- Looking only at the main activity status (PL211): 7.7% without data (approx. 1,800 persons without an activity for all 12 months)
- Analysis of the administrative sources used to compose the activity status (PL211) and income (social allowances taken into account):
  - Altogether 1.9% of persons without data:
    - 0.5% foreigners
    - 0.3% living near the border: some of them work abroad and are not included in the Statistical Register of Employment; the Tax Authority receives income data from abroad later, so these are not included in the data SURS receives in December
    - 1.2% other reasons
Source: SURS, EU-SILC 2014

Who is not in administrative sources? (2)
- It is impossible to check the full coverage of the data in administrative sources
- The hidden economy is not included
- Problems with editing and imputation: income depends on employment status, and employment status depends on income
- Every year we spend a lot of time determining the main activity status of every person for each month

Who is not in administrative sources? (3)
- In 2014 we introduced new variables into the questionnaire (Q): "How many months did the person have a certain activity status during the income reference period?" (PL211)
  - no problems collecting these data
  - they helped us determine the "main activity status" of those who are not in administrative sources
- Very good matching for employed and retired persons (more than 96%), taking into account those who declared in Q that they were employed or retired for all 12 months
- Slightly less successful matching for the categories "unemployed" and "other inactive"
- For "pupils and students" we have no data in any administrative source, so Q was used in all cases
- New priority rules:
  - if a person is active, the Statistical Register of Employment and the Register of Unemployed have priority
  - if a person is inactive, the questionnaire has priority
  - the questionnaire gained greater importance
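The new priority rules can be written as a small decision function applied to each person-month. A minimal Python sketch, assuming the register and questionnaire statuses have already been mapped to common codes; the codes, the treatment of a missing register record as "inactive" and the function name are hypothetical readings of the slide, not SURS's edit rules.

```python
from typing import List, Optional

# Statuses for which the registers (SRE, Register of Unemployed) are trusted.
ACTIVE = {"employed", "unemployed"}


def monthly_status(register: Optional[str], questionnaire: Optional[str]) -> Optional[str]:
    """Combine register and questionnaire (Q) status for one person-month."""
    # Pupils and students appear in no administrative source, so Q is used.
    if questionnaire == "student":
        return "student"
    # Active persons: the registers take priority.
    if register in ACTIVE:
        return register
    # Inactive or not in registers: Q takes priority, register as fallback.
    return questionnaire if questionnaire is not None else register


# Hypothetical person: employed in the register for six months, then absent
# from all registers; Q declares six months of unemployment afterwards.
register_months: List[Optional[str]] = ["employed"] * 6 + [None] * 6
questionnaire_months: List[Optional[str]] = ["employed"] * 6 + ["unemployed"] * 6

statuses = [monthly_status(r, q) for r, q in zip(register_months, questionnaire_months)]
print(statuses)
```

Resolving the status month by month, rather than once per year, mirrors the slide's point that the main activity status has to be determined for each month.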

Who is not in administrative sources? (4)
According to the questionnaire, of those who were not in administrative sources (1.9% of persons), for all 12 months:
- 0.3% were in work
- 1.3% were unemployed
- 0.1% were retired
- 0.03% were other inactive
- 0.2% had more than one activity status during the year
Source: SURS, EU-SILC 2014

Share of imputations for different income variables
Columns: (a) % of persons/households with income; (b) % of cases without imputation; (c) % of cases with partial imputation; (d) % of cases where all income was imputed

Variable                                                          (a)     (b)     (c)     (d)
PY010N Employee cash or near cash income - net                   56.5   68.04    30.6    1.39
PY035N Contributions to individual private pension plans - net   13.3    69.3     0.9    29.8
PY090N Unemployment benefits - net                                4.5    99.0     0.1     0.0
Source: SURS, EU-SILC 2014

Share of imputations according to the original data for the variable "employee cash or near cash income - net"

PY010N_I                               Frequency   Percent
Entire income imputed                        186      1.39
Imputed more than 75%                         69      0.51
Imputed more than 50% up to 75%               67      0.50
Imputed more than 25% up to 50%              196      1.46
Imputed more than 10% up to 25%             1245      9.28
Imputed up to 10%                           1978     14.74
No imputations                              9128     68.04
Up to 10% decreased income                   273      2.03
Income decreased from 10% up to 20%           60      0.45
Income decreased for more than 20%           214      1.60

Aggregate PY010N (in million)
Raw data (weighted)      9,572
Final data (weighted)    9,914
Source: SURS, EU-SILC 2014
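The slide does not say how each person's PY010N_I category is derived. A minimal Python sketch, assuming the imputed share is measured as (final - raw) / final when income goes up and the decrease as (raw - final) / raw when it goes down; `imputation_category` and both definitions are hypothetical. The second part only re-derives the published percentages from the frequencies (which sum to 13,416 persons) and the relative change in the weighted aggregate.

```python
def imputation_category(raw: float, final: float) -> str:
    """Assign a PY010N_I-style category (assumed definitions, see lead-in)."""
    if final > raw:
        if raw == 0:
            return "Entire income imputed"
        share = (final - raw) / final  # imputed part as a share of final income
        if share > 0.75:
            return "Imputed more than 75%"
        if share > 0.50:
            return "Imputed more than 50% up to 75%"
        if share > 0.25:
            return "Imputed more than 25% up to 50%"
        if share > 0.10:
            return "Imputed more than 10% up to 25%"
        return "Imputed up to 10%"
    if final < raw:
        drop = (raw - final) / raw  # decrease relative to raw income
        if drop > 0.20:
            return "Income decreased for more than 20%"
        if drop > 0.10:
            return "Income decreased from 10% up to 20%"
        return "Up to 10% decreased income"
    return "No imputations"


# Frequencies from the table (SURS, EU-SILC 2014).
FREQUENCIES = {
    "Entire income imputed": 186,
    "Imputed more than 75%": 69,
    "Imputed more than 50% up to 75%": 67,
    "Imputed more than 25% up to 50%": 196,
    "Imputed more than 10% up to 25%": 1245,
    "Imputed up to 10%": 1978,
    "No imputations": 9128,
    "Up to 10% decreased income": 273,
    "Income decreased from 10% up to 20%": 60,
    "Income decreased for more than 20%": 214,
}

total = sum(FREQUENCIES.values())
percent = {k: round(100 * v / total, 2) for k, v in FREQUENCIES.items()}
print(total, percent["No imputations"])

raw_aggregate, final_aggregate = 9572, 9914  # weighted PY010N, in million
change = 100 * (final_aggregate - raw_aggregate) / raw_aggregate
print(f"{change:.1f}%")
```

The weighted aggregate rises by about 3.6% from raw to final data, so editing and imputation raise employee income overall despite the 547 cases whose income was decreased.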

Conclusions
- Trade-off between timeliness and data quality
- Using multiple sources has both advantages and disadvantages
- It is very important to have a good EU-SILC team: the results and the quality of the data depend on very good cooperation among staff

Thank you for your attention
Questions?