Istituto Nazionale di Statistica – Istat


ESSnet Big Data WP2 Workplan
Monica Scannapieco, Istituto Nazionale di Statistica – Istat

Web Scraping Enterprises Characteristics: Objectives
Main objectives:
- to demonstrate whether business registers can be improved by predicting the values of some key variables from scraped data
- to verify whether statistical outputs can be produced using the predicted data

Web Scraping Enterprises Characteristics: Use cases
Initial set of use cases in the proposal:
- whether an enterprise performs e-commerce or not
- whether an enterprise manages job vacancies on its site
- presence in social media
- contact information: location, contact emails, etc.
- profiling information: type of activity, links with other enterprises, etc.
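As an illustration of how some of these use cases might be turned into indicators, here is a minimal sketch that derives crude flags from the raw HTML of a single page. It is not the method used in WP2: the keyword lists, the list of social media domains, the function name `extract_indicators` and the sample page are all assumptions made for this example.

```python
import re

# Hypothetical keyword lists and domains, chosen only for illustration.
ECOMMERCE_TERMS = ("add to cart", "shopping cart", "checkout")
JOB_TERMS = ("careers", "job vacancies", "work with us")
SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "linkedin.com")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_indicators(html: str) -> dict:
    """Derive crude use-case indicators from the raw HTML of one page."""
    text = html.lower()
    return {
        # True if any e-commerce keyword occurs anywhere in the page
        "ecommerce": any(term in text for term in ECOMMERCE_TERMS),
        # True if any job-vacancy keyword occurs anywhere in the page
        "job_vacancies": any(term in text for term in JOB_TERMS),
        # Social media domains mentioned in the page (e.g. in links)
        "social_media": sorted(d for d in SOCIAL_DOMAINS if d in text),
        # Distinct email addresses found in the page
        "emails": sorted(set(EMAIL_RE.findall(html))),
    }


# Made-up sample page, not real scraped data.
sample = (
    '<a href="https://www.facebook.com/acme">Follow us</a>'
    "<button>Add to cart</button>"
    "<p>Contact: info@acme.example</p>"
)
indicators = extract_indicators(sample)
```

Keyword matching of this kind is only a baseline. The workplan foresees predicting such characteristics with text and data mining techniques rather than fixed rules.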

Work organization - 1
Four tasks:
- Task 1: Data access
- Task 2: Data handling
- Task 3: Testing of Methods and Techniques
- Task 4: Finalization of Methods and Techniques
Tasks 1, 2 and 3 fall in SGA-1 (by 31/7/2017); Task 4 is foreseen for SGA-2.

Work organization - 2
Participants (effort in person-months):
- IT: 92
- BG: 200
- NL: 45
- PL: 100
- SE: 55
- UK: 50

Task 1: Data access
1.1 Inventory of the enterprises targeted by the web scraping
- Dependence on Task 2: use case refinement and "specialization" for each country
1.2 Identification of URLs
- Ad hoc software tools to retrieve them when not available
1.3 Legal aspects and privacy issues
- Jointly with WP1
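One way to picture the URL identification step (1.2) is as a ranking problem: given an enterprise name and a list of candidate URLs obtained elsewhere, pick the URL whose host name looks most like the enterprise name. The sketch below is an assumption for illustration only, not the tool referred to in the workplan. The function names, the similarity measure (`difflib.SequenceMatcher`) and the candidate URLs are all hypothetical.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def host_similarity(enterprise_name: str, url: str) -> float:
    """Similarity in [0, 1] between an enterprise name and a URL's host name."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return SequenceMatcher(None, normalize(enterprise_name), normalize(host)).ratio()


def rank_candidate_urls(enterprise_name: str, candidates: list) -> list:
    """Return candidate URLs sorted from most to least similar host name."""
    return sorted(
        candidates,
        key=lambda url: host_similarity(enterprise_name, url),
        reverse=True,
    )


# Made-up candidates, e.g. as a search step might return them.
candidates = [
    "https://news.example/story",
    "https://www.yellowpages.example/acme",
    "https://www.acme-widgets.example/",
]
ranked = rank_candidate_urls("Acme Widgets", candidates)
```

String similarity on host names alone would misrank enterprises whose site name differs from their registered name, so in practice it would need to be combined with other evidence, such as contact details found on the page.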

Task 2: Data handling
2.1 Detailed use cases definition
- Coordination with ESS.VIP “European System of Interoperable Statistical Business Registers”
2.2 Choice of techniques and technologies and set-up of the working environment
- Sandbox?
2.3 Carrying out scraping activities and sharing of results among participants

Task 3: Testing of Methods and Techniques
Testing activity that will be enriched and finalized in SGA-2:
- Select, from the defined use cases, a subset that is representative of the overall potential statistical outputs and of the information that can enrich business registers.
- Build a proof of concept for the selected use cases that predicts characteristics of the enterprises by applying text and data mining techniques.
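To make the idea of predicting an enterprise characteristic from scraped text more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The slides do not say which text mining techniques are used, so the choice of classifier is an assumption, and the training texts and labels are invented toy data, not project results.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set: website text labelled by e-commerce presence.
training = [
    ("add to cart and checkout online", "ecommerce"),
    ("buy now free shipping cart", "ecommerce"),
    ("our company history and mission", "no_ecommerce"),
    ("contact our office for a quote", "no_ecommerce"),
]
model = train(training)
label_a = predict(model, "free shipping on checkout")
label_b = predict(model, "company mission")
```

A real proof of concept would need a labelled sample of enterprise websites, for instance built from existing survey answers, and a held-out set to measure prediction quality.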

Deliverables and milestones for SGA-1
Note: to anticipate for reviewing.
Deliverables (with due date):
- Report with legal aspects: Month 12 (January 2017)
- Technical and methodological report describing web scraping, prediction and inference procedures: Month 18 (July 2017)
Milestone (with due date):
- Progress and technical report of the first internal WP meeting: Month 4 (May 2016)

Gantt: Proposal
[Gantt chart; the bars are not recoverable from the transcript. Time axis: M1 (Feb), M3 (April), M6 (July), M9 (October), M12 (Jan).]
Task 1: Data Access: 1.1 Inventory; 1.2 URLs; 1.3 Legal aspects
Task 2: Data Handling: 2.1 Use cases; 2.2 IT architecture; 2.3 Scraping
Task 3: Testing: 3.1 Proof of Concept

Agenda of the meeting, 23 March 2016
14:00-14:30 Overview of WP2 workplan (M. Scannapieco, Istat)
14:30-16:30 Sharing of previous experiences on scraping
- Istat’s experience (G. Barcaroli)
- ONS’s experience (R. Breton)
- CBS’s experience (O. ten Bosch)
16:30-17:30 Characteristics of National Business Registers (all participants)

Agenda of the meeting, 24 March 2016
9:30-10:30 The issues of URLs retrieval (G. Barcaroli, M. Scannapieco, Istat)
10:30-11:30 Legal issues (all participants)
11:30-12:30 Use case definition and stakeholder involvement (all participants)
Lunch break
14:00-15:00 Working environment and tools (Istat)
15:00-15:30 Interaction with WP1 (M. Scannapieco, R. Breton)
15:30-16:00 Wrap-up and to-do activities (M. Scannapieco)