ESSnet Workshop ― Köln, 27-28 October 2011 ESSnet on Data Integration Results of a Project on Record Linkage, Statistical Matching and Micro Integration.

1 ESSnet Workshop, Köln, 27-28 October 2011
ESSnet on Data Integration: Results of a Project on Record Linkage, Statistical Matching and Micro Integration
Miguel Guigó (INE – Spain), Mauro Scanu (ISTAT – Italy)

2 Basic data: project partners
Istituto Nazionale di Statistica (ISTAT) – Italy (Project Co-ordinator)
Centraal Bureau voor de Statistiek (CBS) – Netherlands
Główny Urząd Statystyczny (GUS) – Poland
Instituto Nacional de Estadística (INE) – Spain
Swiss Federal Statistical Office (SFSO) – Switzerland
Statistisk Sentralbyrå (SSB) – Norway
Observers: European Central Bank (ECB) – EU

3 Basic data: previous project
CENEX (then ESSnet) Statistical Methodology Project on Integration of Survey and Administrative Data (ISAD): http://cenex-isad.istat.it
5 partners: ISTAT (Italy), CzSO (Czech Republic), CBS (Netherlands), INE (Spain), STAT (Austria)
December 2006 – June 2008 (18 months)
Results (state of the art on methodologies, recommendations for integration methods, software, dissemination) were presented to the SPC in November 2008, gathering unanimous support for the project's continuation.

4 Basic data: current project
6 partners
January 2010 – December 2011 (24 months)
Objective: to complete the results achieved by the CENEX ISAD project
ESSnet portal: http://www.essnet-portal.eu/di/data-integration
The ESSnet Projects Space (registration required): https://webgate.ec.europa.eu/fpfis/wikis/display/ESSnet/ESSnet+Data+Integration
Co-ordinator / Contact: Mauro Scanu / essnet.di@istat.it

5 Basic data
3 methodological areas: record linkage; statistical matching; micro-integration processing
5 areas for action (5 WPs, plus a WP for management issues):
Providing common knowledge (theoretical background)
Developing methodologies (specific subdomains)
Implementing tools (software applications)
Fostering knowledge transfer (case studies, recommendations)
Dissemination throughout the ESS (communication activities)

6 Background
Data integration issues are becoming increasingly relevant within the ESS, for quality and budgetary reasons.
Projects: ESSnet DI (ESSnet CENEX ISAD); ESSnet on Use of Administrative Data; ESSnet on Small Area Estimation; ESSnet DataWarehouse; ...

7 Background – joint use of data sources
Today's increasing availability of data is an opportunity for NSIs:
It allows analyses beyond single data sources [1].
Efficient joint use of existing sources makes it possible to meet information needs without setting up new surveys, thus reducing costs and response burden while providing timely results.
Nevertheless, sources must be combined with caution: users need to be aware of the statistical properties of the integrated data set.
[1] Sample surveys, censuses, administrative registers

8 Background – Data integration
Definition (for operational purposes): the combination of two (or more) sources of data, at the unit level, for the production of statistics.
Goal: to improve the information available on a unit (business, person, household) by increasing the number or quality of variables.
[Figure: diagram labelled Units, Variables and Aggregates]

9 Background – DI methodologies (1)
The ESSnet DI focuses on:
A) Statistical methodologies for performing data integration: two different sources (datasets) regarding the same population
Record linkage: a procedure to determine whether two records from different datasets belong to the same entity or not (the samples are assumed to overlap totally or partially).
Statistical matching: a procedure to add joint information on variables not jointly observed in one survey, but available in two different sample surveys (the two samples are assumed to have no units in common, so record linkage is not possible).
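
The contrast between the two methodologies can be illustrated with a minimal Python sketch. It is not taken from the ESSnet DI deliverables: the toy records, the field names and the functions `link_records` and `match_records` are hypothetical, the linkage step uses simple exact agreement on identifying keys, and the matching step uses a nearest-neighbour distance hot deck as one possible technique among several.

```python
# Toy illustration only: data, field names and helper names are hypothetical.

def link_records(file_a, file_b, keys):
    """Record linkage: pair records that agree on all identifying keys,
    i.e. records assumed to refer to the same unit."""
    index = {}
    for rec in file_b:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    pairs = []
    for rec in file_a:
        for cand in index.get(tuple(rec[k] for k in keys), []):
            pairs.append((rec["id"], cand["id"]))
    return pairs


def match_records(recipient, donor, common, target):
    """Statistical matching (distance hot deck): the units do not overlap, so
    each recipient borrows `target` from the donor closest on `common`."""
    fused = []
    for rec in recipient:
        nearest = min(donor, key=lambda d: abs(d[common] - rec[common]))
        fused.append({**rec, target: nearest[target]})
    return fused


survey = [
    {"id": "s1", "name": "rossi", "birth_year": 1970, "age": 41, "income": 30},
    {"id": "s2", "name": "garcia", "birth_year": 1985, "age": 26, "income": 22},
]
register = [  # same population, overlapping units: linkage is meaningful
    {"id": "r1", "name": "garcia", "birth_year": 1985},
    {"id": "r2", "name": "rossi", "birth_year": 1970},
]
other_survey = [  # different units, only `age` in common: matching applies
    {"id": "t1", "age": 25, "spending": 18},
    {"id": "t2", "age": 43, "spending": 27},
]

linked = link_records(survey, register, keys=("name", "birth_year"))
fused = match_records(survey, other_survey, common="age", target="spending")
print(linked)
print([(r["id"], r["spending"]) for r in fused])
```

In the linked pairs both records describe the same person, whereas the fused file only attaches a plausible `spending` value taken from a different, similar unit; this is why the statistical properties of a matched file need separate evaluation.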

10 Background – DI methodologies (2)
The ESSnet DI focuses on:
B) Statistical methodologies for ensuring the usability of the integrated data sets
Micro-integration processing: methods to improve data quality in combined sources by searching for and correcting errors at the unit level (ensuring reliability and consistency of the data from the different sources, and comparability in space and time).

11 What has been achieved so far
ESSnet DI Gantt chart, September 2011 [figure not reproduced]. Work packages shown:
WP1 State of the art
WP2 Methodological developments
WP3 Software
WP4 Case studies
WP5 Dissemination

12 What has been achieved so far: WP1 – State of the art
1. Updated reference available from the former project (CENEX on ISAD) on statistical methods for data integration: [Draft] Report of WP1, State of the art on statistical methodologies for data integration (22/06/2010) [1]
2. Bibliography and available papers
Bibliography on: record linkage [2], statistical matching [3], software for record linkage [4], selected papers on record linkage for the ESSnet DI project [5], statistical matching and statistical disclosure control (SDC) [6] (07/05/2010)
Selected papers: record linkage [7], statistical matching [8], statistical matching and SDC [9] (07/05/2010)

13 What has been achieved so far: WP1 – State of the art (2): record linkage, statistical matching
The present WP1 document updates the results of the former document with regard to record linkage and statistical matching, covering the following areas, amongst others:
Advances in the basic Fellegi-Sunter theory for record linkage
Bayesian methods for record linkage
Statistical matching for data from complex survey sampling
Uncertainty and nonparametric procedures for statistical matching
Connections between data integration methods and other applied areas: data disclosure, ecological inference
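
As background to the first item, the basic Fellegi-Sunter decision rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the WP1 report: comparison fields are treated as conditionally independent, and the m-probabilities (agreement given a true match), u-probabilities (agreement given a non-match) and the two thresholds are invented values; in practice they would be estimated from the data.

```python
import math

def match_weight(agreements, m, u):
    """Sum of log2 likelihood ratios over the comparison fields,
    assuming conditional independence between fields."""
    weight = 0.0
    for field, agrees in agreements.items():
        if agrees:
            weight += math.log2(m[field] / u[field])
        else:
            weight += math.log2((1 - m[field]) / (1 - u[field]))
    return weight


def classify(weight, lower, upper):
    """Three-way decision: link, non-link, or possible link (clerical review)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Hypothetical parameters, chosen only for illustration.
m = {"surname": 0.9, "birth_year": 0.95}
u = {"surname": 0.1, "birth_year": 0.05}
LOWER, UPPER = -2.0, 5.0

both_agree = match_weight({"surname": True, "birth_year": True}, m, u)
mixed = match_weight({"surname": True, "birth_year": False}, m, u)
none_agree = match_weight({"surname": False, "birth_year": False}, m, u)

for w in (both_agree, mixed, none_agree):
    print(round(w, 2), classify(w, LOWER, UPPER))
```

Pairs falling between the two thresholds are the ones usually sent to clerical review; moving the thresholds trades false links against missed links.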

14 What has been achieved so far: WP1 – State of the art (3): micro integration processing
The WP1 document shows:
A model for possible errors in the joint analysis of two data sets
A set of micro-integration techniques
[Figure: error model with two sides leading to the register outcome. Measurement side (stages annotated with the administrative or statistical concept): administrative concept, operationalisation, response, corrected response; errors: validity, measurement error, processing error. Representation side: target population, registered population elements, linked population elements, post-linking corrections; errors: coverage errors, linking error, correction error.]

15 What has been achieved so far: WP2 – Methodological developments
Aim: to tackle methodological problems in order to make data integration methods applicable in the ESS.
a) Record linkage
Methodological developments on quality measures: estimation of probabilities of correct linkage
Editing errors in the relations between units when linking economic data sets to a population frame [1]
b) Inference with multiple and incomplete sources
Methodological developments on the use of samples drawn according to complex survey designs
Handling incompleteness after linkage to a population frame: incoherence in unit types, variables and periods [2]

16 What has been achieved so far: WP2 – Methodological developments (2)
Aim: to tackle methodological problems in order to make data integration methods applicable in the ESS.
b) Inference with multiple and incomplete sources (2)
Bootstrapping combined estimators based on register and survey data [3]
c) Micro and macro consistency
Models and algorithms for micro-integration [4]
Applications of macro-integration [5]

17 What has been achieved so far: WP3 – Common software tools
Aim: to provide the ESS with data integration software tools.
Record linkage: updating RELAIS (REcord Linkage At IStat) [1]. Current version: 2.1 [2]; User's guide [3]
Statistical matching: improving the R package StatMatch [4]. Current version: 1.0.2 [5]; User's guide [6] (04/07/2011)
New functionalities: statistical matching when dealing with complex sample survey data via weight calibration; Fréchet bounds for uncertainty evaluation of frequencies in a contingency table (categorical variables)
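
The idea behind Fréchet bounds can be sketched as follows. When two categorical variables Y and Z are never observed together, their marginal distributions alone only restrict each joint cell probability to the interval max(0, p_Y + p_Z - 1) to min(p_Y, p_Z). The Python sketch below computes these unconditional bounds; it does not reproduce the StatMatch implementation, and the function name `frechet_bounds` and the marginal values are hypothetical.

```python
def frechet_bounds(p_y, p_z):
    """Cell-wise Fréchet bounds for the joint distribution of Y and Z
    when only the two marginal distributions are known."""
    lower = [[max(0.0, pj + pk - 1.0) for pk in p_z] for pj in p_y]
    upper = [[min(pj, pk) for pk in p_z] for pj in p_y]
    return lower, upper


# Hypothetical marginals: Y estimated from one survey, Z from another.
p_y = [0.7, 0.3]
p_z = [0.6, 0.4]

lower, upper = frechet_bounds(p_y, p_z)
for j, row in enumerate(zip(lower, upper)):
    for k, (lo, hi) in enumerate(zip(*row)):
        print(f"P(Y={j}, Z={k}) lies in [{lo:.2f}, {hi:.2f}]")
```

The width of each interval is a measure of the uncertainty left by the lack of joint observation; conditioning on variables observed in both sources typically narrows the intervals.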

18 What has been achieved so far: WP4 – Case studies
Development of case studies and associated recommendations on representative problems in data integration in the ESS
[Draft] Report of WP4, Case studies (29/07/2011) [1]
Includes 5 sections on 4 case studies:
1. (a) Register-based employment statistics. A case of microintegration
1. (b) The approach to quality evaluation of the micro-integrated employment statistics
2. Combining data from administrative sources and sample surveys; the single-variable case. Case study: Educational Attainment
3. First Steps in Profiling Italian Patenting Enterprises
4. The framework for error in an integrated survey

19 What has been achieved so far: WP5 – Dissemination activities towards the ESS
1. Course on Data Integration: record linkage, statistical matching and micro integration processing; methods, applications and software tools. Rome, 09/2011 [1]
2. Training on the job (upon request of other ESS countries)
On-the-job training on record linkage: Southampton, 25-28/01/2011; Riga, 4-7/07/2011 [2], [3]
On-the-job training on statistical matching: Poznan, 20-22/10/2010 [4], [5]

20 What has been achieved so far: WP5 – Meetings with experts of other statistical areas
Exchange of know-how between experts in different areas
5 meetings have been held at the partners' premises. Along with issues related to the organization of the ESSnet DI itself, they included several seminars, forums and presentations in the area of Data Integration. Slides of the presentations are available.

21 What has been achieved so far: WP5 – Meetings with experts of other statistical areas (2)
Rome, 28-29 January 2010: [1] Data Integration in Small Area Estimation (SAE); [2] DI in statistical disclosure control; [3] DI for business statistics
Den Haag, 27-28 May 2010: [4] AdminData ESSnet
Poznań, 21-22 October 2010: [5] ESSnet SAE: Report on the analysis of questionnaires; [6] Integration of census and LFS data: first results
Neuchâtel, 10-11 February 2011: [7] Record linkage: a key and challenging problem for CATI surveys; [8] Turning enterprise into local unit data; [9] Use of statistical matching procedures for the production of social indicators

22 The future: WP5 – ESSnet DI Workshop on Data Integration (Madrid, November 2011)
Following the success of the ESSnet ISAD workshop (Vienna, 29-30 May 2008), the ESSnet DI includes a workshop open to other ESS countries, with a twofold aim:
dissemination of the ESSnet DI results (discussion of the reports, deliverables and other results)
presentations and proceedings by researchers in the ESS on results, methods and problems in the area of Data Integration

23 The future: WP5 – ESSnet DI Workshop on Data Integration (Madrid, November 2011) (2)
Workshop web page: http://www.ine.es/e/essnetdi_ws2011.html
Dates: 24-25 November 2011
Venue: Instituto Nacional de Estadística (INE Spain), Paseo de la Castellana, 183, 28046 Madrid. 1st floor, Main conference room (118)
Addressed to both participating and non-participating countries throughout the ESS, and also open to researchers on state-of-the-art methodologies on Data Integration

24 The future: WP5 – ESSnet DI Workshop on Data Integration (Madrid, November 2011) (3)
Five invited speakers from both NSIs and academia: William E. Winkler (Bureau of Census, US); Pier Luigi Conti (Univ. La Sapienza, Rome, IT); Elżbieta Gołata (Economic Univ. in Poznan, PL); Manuela Lenk (STAT); Brunero Liseo (Univ. La Sapienza, Rome, IT)
Sessions on: record linkage; statistical matching; micro integration processing; experiences on Data Integration and related domains; register-based statistics; integration of administrative data and surveys
15 expected contributions so far
Deadline for registration: 9 November 2011

25 How to sustain ESSnet DI results
Data integration is becoming strategic in different NSIs. Among the available ESSnets, data integration relates not only to methodology, but also to IT, metadata and information management.
Workshop. Workshops on nonresponse, editing, small area estimation and business statistics are regularly held. For this reason, the organization of a regular DI workshop for information exchange and networking would certainly be welcome. Given the current financial constraints, a practical difficulty is finding a host. Would it be possible for Eurostat to act as a sponsor?
Networking with other geographical areas (U.S.A., Australia, New Zealand, Brazil) is also important. Networking with other ESSnets is a key issue.

26 How to sustain ESSnet DI results (2)
A sustained knowledge base. Why not use the ESSnet portal as a tool for maintaining the knowledge produced by ESSnets (e.g. state of the art), helpful for practitioners but also for other projects and for a renewed ESSnet?
Task Force in addition to, or as an alternative to, projects. The aim is to increase agility and decrease the administrative overhead of managing a project team. The partners must make a "business case" including a cost-benefit assessment and tangible outputs. The partners may work on their own and meet regularly, as is common in ESSnet/framework projects. But they may also get together to solve a problem in a concentrated time period. This is an important aspect of business-case planning.

27 Project members
Cristina Casciano, Nicoletta Cibella, Paolo Consolini, Marco Di Zio, Marcello D’Orazio, Marco Fortini, Daniela Ichim, Filippo Oropallo, Laura Peci, Francesca Romana Pogelli, Mauro Scanu, Monica Scannapieco, Giovanni Seri, Tiziana Tuoto, Luca Valentino, Jeroen Pannekoek, Arnout van Delden, Bart Bakker, Paul Knottnerus, Léander Kuijvenhoven, Frank Linder, Nino Mushkudiani, Dominique van Roon, Eric Schulte Nordholt, Jean-Pierre Renfer, Daniel Kilchmann, Marcin Szymkowiak, Adam Ambroziak, Dehnel Grażyna, Tomasz Józefowski, Tomasz Klimanek, Jacek Kowalewski, Ewa Kowalka, Andrzej Młodak, Artur Owczarkowski, Jan Paradysz, Wojciech Roszka, Pietrzak Beata Rynarzewska, Magdalena Zakrzewska, Francisco Hernandez Jimenez, Gervasio-Luís Fernández Trasobares, Miguel Guigó Pérez, Johan Fosen, Li-Chun Zhang
Thank you

