Big Data activities at SURS Statistical Office of the Republic of Slovenia DIME/ITDG meeting, February 2016.

Aim of official statistics
Support data users:
- Government
- Politicians and legislators
- Markets
- The public
- The media
- International community

Data Sources
- Surveys
- Administrative sources
- Big Data

Big Data – Possible usage
- New statistics
- New (or combined) sources for existing statistics
- Validation ("benchmarking") of data and statistics
- Different mode of data collection
- "Flash" statistics
- Faster release of statistics
AAPOR Report on Big Data (2015): “Surveys and Big Data are complementary data sources, not competing data sources”.

Current activities at SURS
- Analysis of different types of Big Data and of their possible use in regular statistical production (mobile positioning data, scanned price data, web scraping, etc.)
- IT infrastructure
- Partnership with stakeholders (data owners, academia, etc.)
- Active participation in international task forces (Eurostat BD Task Force, UNECE BD Task Force) and projects (ESSNET grant pilots)

Statistical model and new sources
- Scanned and scraped data on prices and job vacancies
- New type of statistics based on mobile positioning data
- Comparison between job vacancy statistics from the survey and from scraped data

Web scraping system for identifying job advertisements

Process of creating the collection tool
- Spider: takes a company website and finds all webpages (sub-links) on that website that relate to employment.
- Downloader: downloads the content of the saved URL links (PDF files and HTTPS pages cause problems).
- Splitter: splits the content of a given URL into separate documents.
- Determinator: detects the job vacancy (JV) ads in the documents produced by Splitter.
- Classifier: classifies the detected JV ads, for example by occupation, deadline, address, region.
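The Spider step described above can be sketched as follows. This is only an illustration, not the SURS tool: the hint words in EMPLOYMENT_HINTS and the names LinkCollector and find_employment_links are assumptions, and the sketch works on an already downloaded HTML string rather than crawling a live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical hint words; the slides do not list the terms actually used.
EMPLOYMENT_HINTS = ("job", "career", "vacanc", "employment", "zaposlitev")


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_employment_links(base_url, html):
    """Return absolute URLs of links whose href or text looks employment-related."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href, text in collector.links:
        haystack = (href + " " + text).lower()
        if any(hint in haystack for hint in EMPLOYMENT_HINTS):
            url = urljoin(base_url, href)
            if url not in found:  # keep first occurrence, drop repeats
                found.append(url)
    return found
```

The returned URLs would be the input saved for a Downloader-like step; a real spider would also follow the sub-links recursively and respect the site's crawling rules.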

Process of creating the collection tool
Two different approaches to detecting the JV ads are currently being tested:
- A "decision tree" applied to the content of downloaded URLs
- A list of common key words and phrases (whitelist and blacklist of words) used to detect the JV ads in the content of downloaded URLs
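The whitelist/blacklist approach could look roughly like the sketch below. The word lists, the min_hits threshold and the function name looks_like_job_ad are hypothetical; the slides do not give the actual lists or decision rule.

```python
import re

# Hypothetical word lists for illustration only.
WHITELIST = {"vacancy", "apply", "candidate", "deadline", "salary"}
BLACKLIST = {"cookie", "newsletter", "webshop"}


def looks_like_job_ad(text, min_hits=2):
    """Flag a document as a JV ad if it contains enough whitelist words
    and no blacklist word (an assumed rule, not the documented one)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & BLACKLIST:
        return False
    return len(words & WHITELIST) >= min_hits
```

For example, looks_like_job_ad("Vacancy: data analyst. Apply before the deadline.") is flagged as a job ad, while a text that mentions a newsletter is rejected regardless of its other words.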

Job Ads Statistics - initial results

Mobile positioning and statistical derivatives
Mobile operators
- 4 mobile network operators
- 3 service providers
- 3 re-sellers
- The 4 network operators are the primary data providers
- All network operators and service providers could be/are important!

Mobile data
- For investigation purposes, SURS had access to data from the second largest mobile operator in Slovenia
- Data from April to October 2014 (1 billion records)
- Three variables:
  - Anonymized IMEI
  - Time of event (outgoing call, outgoing SMS, connecting to the internet using a mobile phone)
  - Coordinates of antennas
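With the three variables listed above, a simple day/night density indicator can be derived by counting distinct devices per antenna. The sketch below is an assumption-laden illustration, not the SURS method: the record layout (device id, ISO timestamp, antenna coordinates), the 08:00 to 20:00 definition of "day" and the function name devices_per_antenna are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def devices_per_antenna(records, day_start=8, day_end=20):
    """Count distinct devices seen at each antenna, split into day and night.

    records: iterable of (anonymized_id, iso_timestamp, (lat, lon)) tuples
    (an assumed layout). The day/night hour boundaries are assumptions too.
    """
    seen = {"day": defaultdict(set), "night": defaultdict(set)}
    for device_id, timestamp, antenna in records:
        hour = datetime.fromisoformat(timestamp).hour
        period = "day" if day_start <= hour < day_end else "night"
        seen[period][antenna].add(device_id)
    return {
        period: {antenna: len(devices) for antenna, devices in by_antenna.items()}
        for period, by_antenna in seen.items()
    }
```

Distinct device counts are only a crude proxy for the number of people present: they cover one operator's customers and only devices that generated an event in the period.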

Density of people in Ljubljana during the day

Density of people in Ljubljana during the night

BD activities in 2016 (1)
In February, a set of workshops will be organized with the subject-matter statisticians.
Goal: brainstorm ideas and prepare business cases for the use of BD in different domains of statistical production

BD activities in 2016 (2)
Deepen cooperation with Slovenian universities. Goals:
- Education of colleagues
- Use of data mining (and collection) tools developed by Slovenian faculties, or cooperation in projects

BD activities in 2016 (3)
- Active part in the ESSNET BD project (one of the WP leaders)
- Organization of the Eurostat Big Data Workshop in Slovenia and contribution to the ethical review and ethical guidelines, which are to be prepared this year
- Continuation of ongoing work in local projects (job vacancy data from enterprise websites and CEMODE)

Open questions
- Access to data (legal issues, partnership, etc.)
- Big data are used for different purposes (different definitions)
- There is no control over the collection process
- Data could change or even disappear
- Public perception
- IT and methodological skills
- IT infrastructure
- Quality of data