WP7 – COMBINING BIG DATA - STATISTICAL DOMAINS
Meeting to prepare SGA-1, ESSnet Big Data, 7-8 January 2016
Agenda
- Introduction
- TASK 1. Data availability/Data inventory
- TASK 2. Data feasibility
- TASK 3. Data combination
- TASK 4. Summary plus future perspectives
Introduction
The aim of this work package is to find out how a combination of:
- big data sources,
- administrative data,
- statistical data
may enrich statistical output in the domains ‘Population’, ‘Tourism/border crossings’ and ‘Agriculture’.
In many cases, a single data source will not suffice for producing official statistics, so different data sources have to be combined.
This work package is scientific in nature. From the methodological, quality and technical points of view, the work must be carried out with professional independence.
According to Eurostat: “Perform a conceptual investigation of the potential of combining multiple sources. This investigation should be organized by statistical domain. The following domains should be targeted: demography and migration, tourism, agriculture.”
Introduction
From the methodological, quality and technical points of view, the work must be carried out with professional independence. However, WP7 should draw on the experience of the other work packages, especially the source-oriented ones. For this reason, some WP7 tasks should take place after the tasks of other work packages.
TASK 1. Data availability/Data inventory
Identify big data sources, taking into account sustainability and availability in several countries.
Establish an inventory of these sources by:
- brainstorming: a review of potential sources
- preparing a questionnaire on the sources used by the project participants
- sending the questionnaire to the participants
- gathering the answers and preparing them for analysis
- assessing the possibility of using the sources for big data analysis in the domains of population, tourism/border crossings and agriculture
- building the list of potential sources
TASK 1. Data availability/Data inventory
Identify which results or new products from the source-oriented pilots may contribute to these domains.
- Match the sources from the list of potential sources to the following domains: Population, Tourism/border crossings, Agriculture
- Carry out a preliminary analysis of the possibility of using the sources in each domain, including:
  - consideration of the legal aspects
  - consideration of availability
  - preliminary analysis of the methodological aspects
  - consideration of the quality issues
  - preparation of initial technical requirements
- Build the list of exploitable sources for each domain
Describe the added value that linking these sources brings to current statistics.
- Analyze the list of exploitable sources for each domain
- Prepare a map of linkages between big data sources (e.g. which aspect of one data source can be used in several domains)
- Describe the added value for each domain
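As an illustration only, the matching of sources to domains and the map of linkages could be kept in a structure like the one sketched below. The source names are hypothetical placeholders, not entries from the WP7 inventory, and the function names are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical inventory: source name -> domains it may contribute to.
# The source names are illustrative placeholders, not WP7 results.
inventory = {
    "mobile_phone_data": ["Population", "Tourism/border crossings"],
    "satellite_images": ["Agriculture"],
    "web_scraped_accommodation": ["Tourism/border crossings"],
    "road_sensor_data": ["Tourism/border crossings", "Population"],
}


def sources_by_domain(inv):
    """Invert the inventory: domain -> sorted list of candidate sources."""
    by_domain = defaultdict(list)
    for source, domains in inv.items():
        for domain in domains:
            by_domain[domain].append(source)
    return {domain: sorted(names) for domain, names in by_domain.items()}


def cross_domain_sources(inv):
    """Sources that appear in more than one domain (candidate linkages)."""
    return sorted(s for s, domains in inv.items() if len(set(domains)) > 1)


domain_map = sources_by_domain(inventory)
shared = cross_domain_sources(inventory)

for domain, names in sorted(domain_map.items()):
    print(domain, "->", ", ".join(names))
print("Usable in several domains:", ", ".join(shared))
```

Keeping the inventory keyed by source makes it easy to add the legal, availability, methodology, quality and technical notes per source later, while the inverted view gives the per-domain list of exploitable sources.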
TASK 1. Data availability/Data inventory
Milestone 1. List of available big data sources in the domain(s); by M6
TASK 2. Data feasibility
Carry out explorative analyses on two or three big data sources in the domain of population, tourism/border crossings or agriculture.
Select the most valuable big data sources for each domain:
- evaluation of the legal aspects
- evaluation of availability
- evaluation of methodology
- evaluation of the quality
- evaluation of technical requirements
- analysis of the results
- preliminary assessment of usefulness: developing the assessment factors
Select and recommend two or three big data sources for use in the domains of population, tourism/border crossings and agriculture:
- prepare a SWOT analysis (positive and negative factors of using several sources)
- recommend the most important and useful sources
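One possible way to turn the five evaluations above into a shortlist is a simple weighted score per source. The sketch below is an assumption for illustration: the 1 to 5 scale, the equal default weights, the source names and the scores are all hypothetical, and the slides do not prescribe any particular scoring method.

```python
# The five evaluation criteria named in Task 2.
CRITERIA = ("legal", "availability", "methodology", "quality", "technical")

# Hypothetical scores on a 1 (poor) to 5 (good) scale; not real assessments.
candidates = {
    "source_a": {"legal": 4, "availability": 3, "methodology": 4, "quality": 3, "technical": 5},
    "source_b": {"legal": 2, "availability": 5, "methodology": 3, "quality": 4, "technical": 3},
    "source_c": {"legal": 5, "availability": 4, "methodology": 4, "quality": 4, "technical": 4},
    "source_d": {"legal": 1, "availability": 2, "methodology": 3, "quality": 2, "technical": 2},
}


def total_score(scores, weights=None):
    """Weighted sum over the five criteria (equal weights by default)."""
    if weights is None:
        weights = {criterion: 1.0 for criterion in CRITERIA}
    return sum(weights[c] * scores[c] for c in CRITERIA)


def recommend(cands, top_n=3, weights=None):
    """Return the top_n source names, best first; ties broken by name."""
    ranked = sorted(
        cands,
        key=lambda name: (-total_score(cands[name], weights), name),
    )
    return ranked[:top_n]


recommended = recommend(candidates, top_n=3)
print(recommended)
```

A score of this kind would only support the SWOT analysis, not replace it: a source with a blocking legal issue may need to be excluded regardless of its total.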
TASK 2. Data feasibility
Milestone 2. Recommendation for using two or three big data sources in the domain(s), based on an investigation of whether those sources could be useful; by M13
SGA-1 SUMMARY
Milestone 1. List of available big data sources in the domain(s); by M6
Milestone 2. Recommendation for using two or three big data sources in the domain(s); by M13
Deliverable (by M14): the partial report for each domain, containing basic information on:
- the data access (with legal and privacy aspects)
- the data quality issues
- the methodology (with a focus on combining data)
- the technical aspects
TASK 3. Data combination
Experimental work (practical work where possible; otherwise theoretical considerations, including consultation with practice, e.g. the sandbox):
- data collection
- data preparation
- data analysis
Describe the practical, technical and methodological aspects of combining big data outputs in the statistical system, for example differences in definitions, populations and volatility.
- Provide first answers on quality issues when combining big data with traditional outputs
- Provide answers to the question of whether micro-data have to be used when combining big data estimates with traditional outputs, or whether data at an aggregated level can be considered
- Analyze the advantages and disadvantages of combining data
- Prepare a list of criteria for combining data
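To make the aggregated-level option concrete, the sketch below compares a big data estimate with a traditional output region by region, without touching micro-data. All figures, region codes and function names are hypothetical and serve only to illustrate the idea; the official values are assumed to be non-zero.

```python
# Hypothetical aggregated figures per region; not real statistics.
official = {"R1": 1000.0, "R2": 500.0, "R3": 250.0}
big_data = {"R1": 1100.0, "R2": 450.0, "R4": 300.0}


def combine_aggregates(official, big_data):
    """Join the two outputs on region and compute the relative difference.

    Only regions present in both outputs are combined. Official values
    are assumed to be non-zero.
    """
    common = sorted(official.keys() & big_data.keys())
    return {
        region: {
            "official": official[region],
            "big_data": big_data[region],
            "rel_diff": (big_data[region] - official[region]) / official[region],
        }
        for region in common
    }


def unmatched(official, big_data):
    """Regions covered by only one of the two outputs."""
    return sorted(official.keys() ^ big_data.keys())


combined = combine_aggregates(official, big_data)
missing = unmatched(official, big_data)

for region, row in combined.items():
    print(region, f"{row['rel_diff']:+.1%}")
print("Covered by one output only:", missing)
```

The unmatched regions hint at the population differences mentioned above: when the two outputs do not cover the same units, an aggregated comparison can show the gap but cannot explain it, which is one argument for considering micro-data.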
TASK 3. Data combination
Milestone 3. Analysis of combining data; by M19
TASK 4. Summary plus future perspectives
Suggest pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2018.
- Recommendation on legal aspects
- Recommendation on availability
- Recommendation on methodology
- Recommendation on quality
- Recommendation on technical requirements
- Conclusion
TASK 4. Summary plus future perspectives
Milestone 4. List of potential pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2018; by M23
Deliverable (by M24): THE GENERAL REPORT FOR DOMAINS, including:
- the data access (with legal and privacy aspects)
- the data quality issues
- the methodology (with a focus on combining data)
- the technical aspects
SGA-2 SUMMARY
Milestone 3. Analysis of combining data; by M19
Milestone 4. List of pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2018; by M23
Deliverable (by M24): THE GENERAL REPORT FOR DOMAINS, including:
- the data access (with legal and privacy aspects)
- the data quality issues
- the methodology (with a focus on combining data)
- the technical aspects
TIMETABLE FOR WP7 COMBINING BIG DATA - STATISTICAL DOMAINS
INVENTORY PREPARATION SELECTION COMBINING REPORT AND RECOMMEND
Milestone 1. List of available big data sources in the domain(s); by M6
Milestone 2. Recommendation for using two or three big data sources in the domain(s); by M13
Deliverable: the partial report for each domain; by M14
Milestone 3. Analysis of combining data; by M19
Milestone 4. List of pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2018; by M23
Deliverable: THE GENERAL REPORT FOR DOMAINS; by M24
a.nowicka@stat.gov.pl w.jazwinska@stat.gov.pl