Preliminaries Training Course «Statistical Matching» Rome, 6-8 November 2013 Mauro Scanu Dept. Integration, Quality, Research and Production Networks.


Preliminaries Training Course «Statistical Matching» Rome, 6-8 November 2013 Mauro Scanu Dept. Integration, Quality, Research and Production Networks Development, Istat scanu [at] istat.it

Outline: Notation and statistical matching phases
- Notation
- Matching phases
- Harmonising the sources
  - Metadata consistency
  - Accuracy
  - Statistical coherence

Notation
- Target variables (Y, Z): variables that are not jointly observed but whose joint distribution is the target of the analysis.
- Common variables: variables available in both samples.
- Matching variables (X): the subset of the common variables used for "building" joint information on the target variables.
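The notation can be illustrated with a minimal sketch. The labels A and B for the two samples and all variable names are hypothetical, chosen only for illustration; the slides do not name them.

```python
# Hypothetical variable lists for two samples (names are illustrative only).
sample_a_vars = {"age", "sex", "region", "income"}       # A observes X and Y
sample_b_vars = {"age", "sex", "region", "expenditure"}  # B observes X and Z

# Common variables: those available in both samples.
common_vars = sample_a_vars & sample_b_vars

# Target variables: observed in only one sample, hence never jointly observed.
target_y = sample_a_vars - common_vars
target_z = sample_b_vars - common_vars

# Matching variables: a subset of the common variables (here picked by hand;
# the actual selection is a statistical step).
matching_vars = {"age", "sex"}
assert matching_vars <= common_vars
```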

Matching phases

Harmonising the sources: 3 main topics
To harmonise the two data sources it is necessary to check for:
- metadata consistency;
- overall accuracy;
- statistical coherence.

1) Metadata consistency
There are some very fortunate cases. An example was the Dutch Household Survey on Living Conditions (POLS), a survey composed of different modules:
- first module: questionnaire on demographic aspects (age, nationality, birth place, ...) and socio-economic aspects (education, income, ...);
- second module: screening questions on living conditions;
- third module: specific questionnaires on different aspects of living conditions (culture, work, health, ...).
The first two modules were filled in by all the units in the sample. The sample was then split into distinct subsamples, and each subsample answered only one specific questionnaire on living conditions.

1) Metadata consistency
If joint information on a pair of variables included in different subsamples is of interest, statistical matching can be used. In a design like POLS the metadata are already perfectly harmonised. In general, to harmonise metadata, these steps should be considered:
[a] harmonisation of unit definitions
[b] harmonisation of reference periods
[c] completeness of the reference populations
[d] variable harmonisation
[e] classification harmonisation
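As an illustration of step [e], here is a minimal sketch of recoding two source-specific classifications to a shared, coarser one. The education codes, the mappings and the function name `harmonise` are all hypothetical; a real mapping would come from the documented classifications of the two sources.

```python
# Hypothetical source-specific codes mapped to a shared coarse classification.
MAP_A = {"primary": "low", "lower secondary": "low",
         "upper secondary": "medium", "tertiary": "high"}
MAP_B = {1: "low", 2: "low", 3: "medium", 4: "high", 5: "high"}


def harmonise(values, mapping):
    """Recode values to the common classification; fail loudly on unmapped codes."""
    unmapped = sorted({v for v in values if v not in mapping}, key=str)
    if unmapped:
        raise ValueError(f"unmapped codes: {unmapped}")
    return [mapping[v] for v in values]


education_a = harmonise(["primary", "tertiary"], MAP_A)
education_b = harmonise([3, 5], MAP_B)
```

Raising an error on unmapped codes, rather than silently dropping them, keeps gaps in the mapping visible before any matching is attempted.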

2) Overall accuracy
The selection of the matching variables from the set of common variables is essentially a statistical phase, which will be illustrated later. A "common sense" rule is to use only highly accurate variables:
- preferably, select the matching variables among the common variables that are completely observed;
- if this is not possible, prefer variables with as few missing items as possible;
- as far as missing data treatment is concerned, avoid imputed data where possible.
If necessary:
- split the data set according to the missing data structure;
- use different statistical matching strategies (for the selection of the matching variables) in the different sub data sets.
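The splitting step can be sketched as follows. This is only one possible reading of "missing data structure": records are grouped by the set of common variables they actually observe, with `None` standing for a missing item. The function name and the example records are hypothetical.

```python
from collections import defaultdict


def split_by_missing_pattern(records, common_vars):
    """Group records by which common variables are observed (not None).

    The key of each group is the tuple of observed common variables, i.e. the
    candidate matching variables for that sub data set.
    """
    groups = defaultdict(list)
    for rec in records:
        pattern = tuple(v for v in common_vars if rec.get(v) is not None)
        groups[pattern].append(rec)
    return dict(groups)


# Hypothetical records with some missing items.
records = [
    {"age": 34, "sex": "f", "region": "north"},
    {"age": 51, "sex": "m", "region": None},
    {"age": 27, "sex": "f", "region": None},
]
subsets = split_by_missing_pattern(records, ["age", "sex", "region"])
```

Each group can then be handled with its own choice of matching variables, restricted to those that are fully observed within the group.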

3) Statistical coherence
A second "common sense" rule relates to the statistical information contained in the matching variables. To obtain meaningful results, the statistical content of the matching variables should be the same in the two sources, as if they referred to the same population. More precisely, it is important to check whether the distribution of the common variables is the same. To do this one can use:
- statistical tests for "homogeneity" (not appropriate for large data sets, and care is needed with complex designs);
- ad hoc rules (for instance, cell relative frequencies should not differ by more than 5%).
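The ad hoc rule can be sketched as a simple check. The slide does not say whether "5%" is an absolute or a relative difference; the sketch below assumes an absolute difference of 0.05 between relative frequencies. The function names are hypothetical, and the optional weights are only a crude way to account for survey weights, not a treatment of complex designs.

```python
from collections import Counter


def relative_frequencies(values, weights=None):
    """Relative frequency of each category (weighted if weights are given)."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {cat: tot / grand_total for cat, tot in totals.items()}


def cells_exceeding(values_a, values_b, threshold=0.05,
                    weights_a=None, weights_b=None):
    """Return the cells whose relative frequencies differ by more than threshold.

    Assumption: the threshold is an absolute difference between proportions.
    """
    freq_a = relative_frequencies(values_a, weights_a)
    freq_b = relative_frequencies(values_b, weights_b)
    flagged = {}
    for cell in set(freq_a) | set(freq_b):
        diff = abs(freq_a.get(cell, 0.0) - freq_b.get(cell, 0.0))
        if diff > threshold:
            flagged[cell] = diff
    return flagged


# Hypothetical distributions of one common variable in the two sources.
sex_a = ["m"] * 50 + ["f"] * 50
sex_b = ["m"] * 60 + ["f"] * 40
flagged = cells_exceeding(sex_a, sex_b)
```

An empty result means that no cell violates the rule for that variable; flagged cells point to common variables that may need further harmonisation before being used for matching.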

Selected references
Laan, P. van der, 2000. Integrating Administrative Registers and Household Surveys. Netherlands Official Statistics, Vol. 15 (Summer 2000): Special Issue, Integrating Administrative Registers and Household Surveys, eds. P.G. Al and B.F.M. Bakker, pp. 7-15.