Big Data activities at SURS Statistical Office of the Republic of Slovenia DIME/ITDG meeting, February 2016.


1 Big Data activities at SURS
Statistical Office of the Republic of Slovenia
DIME/ITDG meeting, February 2016

2 Aim of official statistics
Support data users:
- Government
- Politicians and legislators
- Markets
- The public
- The media
- International community

3 Data Sources
- Surveys
- Administrative sources
- Big Data

4 Big Data – Possible usage
- New statistics
- New (or combined) sources for existing statistics
- Validation ("benchmarking") of data and statistics
- Different mode of data collection
- "Flash" statistics
- Faster release of statistics
AAPOR Report on Big Data (2015): “Surveys and Big Data are complementary data sources, not competing data sources”.

5 Current activities at SURS
- Analysis of different types of Big Data and of the possibilities for using them in regular statistical production (mobile positioning data, scanned price data, web scraping, etc.)
- IT infrastructure
- Partnership with stakeholders (data owners, academia, etc.)
- Active participation in international task forces (Eurostat BD Task Force, UNECE BD Task Force) and projects (ESSNET grant pilots)

6 Statistical model and new sources
- Scanned & scraped data of prices and job vacancies
- New type of statistics on mobile positioning data
- Comparison between job vacancy statistics from the survey and from scraped data

7 Web scraping system for identifying job advertisements

8 Process of creating the collection tool
- Spider: takes a company website and finds all webpages (sub-links) on that website that relate to employment.
- Downloader: simply downloads the content of the saved URL links (PDF files and HTTPS cause problems).
- Splitter: splits the content of a given URL into separate documents.
- Determinator: detects the job vacancy (JV) ads in the documents produced by Splitter.
- Classifier: classifies the detected JV ads, for example by occupation, deadline, address, region.
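
The Spider step described above can be illustrated with a minimal sketch. This is not the SURS implementation: the keyword list, the class name LinkCollector and the function employment_links are hypothetical, and the sketch only inspects one already-downloaded HTML page instead of crawling a whole site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical hints; the slides do not list the rules the real Spider uses.
EMPLOYMENT_HINTS = ("zaposlitev", "kariera", "job", "career", "vacanc")


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def employment_links(html, base_url):
    """Return same-site URLs whose address or anchor text hints at employment."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(base_url).netloc
    found = []
    for href, text in parser.links:
        url = urljoin(base_url, href)  # resolve relative links
        if urlparse(url).netloc != site:
            continue  # stay on the company website
        haystack = (url + " " + text).lower()
        if any(hint in haystack for hint in EMPLOYMENT_HINTS) and url not in found:
            found.append(url)
    return found
```

The returned URLs would then be handed to the Downloader. A production spider would also need to follow links recursively and respect crawl limits, which this sketch leaves out.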

9 Process of creating the collection tool
Two different approaches to detecting JV ads are currently being tested:
- Applying a decision tree to the content of the downloaded URLs
- Using a list of common key words and phrases (a whitelist and a blacklist of words) to detect JV ads in the content of the downloaded URLs
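
The whitelist/blacklist approach can be sketched as a simple scoring rule. The word lists, the threshold min_score and the function names below are assumptions made for illustration; the slides do not give the actual lists or the decision rule used at SURS.

```python
import re

# Hypothetical word lists; the real whitelist and blacklist are not published in the slides.
WHITELIST = {"vacancy", "apply", "deadline", "candidate", "full-time"}
BLACKLIST = {"cookie", "newsletter", "privacy"}


def tokenize(text):
    """Lower-case the text and split it into words (hyphens are kept inside words)."""
    return re.findall(r"[a-z\-]+", text.lower())


def looks_like_job_ad(text, min_score=2):
    """Score = whitelist hits minus blacklist hits; flag the text if the score is high enough."""
    tokens = tokenize(text)
    score = sum(t in WHITELIST for t in tokens) - sum(t in BLACKLIST for t in tokens)
    return score >= min_score
```

Each document produced by Splitter would be passed to looks_like_job_ad. The threshold trades missed ads against false detections and would have to be tuned on labelled examples.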

10 Job Ads Statistics - initial results

11 Mobile positioning and statistical derivatives
Mobile operators:
- 4 mobile network operators
- 3 service providers
- 3 re-sellers
- The first 4 are primary data providers
- All network operators and service providers could be/are important!

12 Mobile data
For investigation purposes, SURS had access to data from the second largest mobile operator in Slovenia.
Data from April to October 2014 (1 billion records).
Three variables:
- Anonymized IMEI
- Time of event (outgoing call, outgoing SMS, connecting to the internet using a mobile phone)
- Coordinates of antennas

13 Density of people in Ljubljana during the day

14 Density of people in Ljubljana during the night
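
Day and night density figures of this kind could, in principle, be derived from the three variables (anonymized IMEI, time of event, antenna coordinates) by counting distinct devices per antenna in each period. The sketch below shows one way to do this; the record layout, the 08:00-20:00 definition of "day" and the function name density_by_period are assumptions, not the method used by SURS.

```python
from collections import defaultdict
from datetime import datetime


def density_by_period(records, day_start=8, day_end=20):
    """Count distinct devices seen at each antenna, separately for day and night.

    records: iterable of (device_id, iso_timestamp, antenna) tuples, where
    antenna is a (latitude, longitude) pair. The day window is an assumption.
    """
    seen = defaultdict(set)
    for device_id, timestamp, antenna in records:
        hour = datetime.fromisoformat(timestamp).hour
        period = "day" if day_start <= hour < day_end else "night"
        seen[(antenna, period)].add(device_id)
    return {key: len(devices) for key, devices in seen.items()}
```

Counting distinct devices rather than events avoids giving heavy phone users more weight. With about 1 billion records, the same aggregation would more realistically be run in a database or a distributed framework than in a single Python loop.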

15 BD activities in 2016 (1)
In February, a set of workshops will be organized with subject-matter statisticians.
Goal: brainstorm ideas and prepare business cases for the use of BD in different domains of statistical production.

16 BD activities in 2016 (2)
Deepen cooperation with Slovenian universities. Goals:
- Education of colleagues
- Usage of data mining (and collection) tools developed by Slovenian faculties: http://orange.biolab.si/ or http://newsfeed.ijs.si/
- Cooperation in projects

17 BD activities in 2016 (3)
- Active part in the ESSNET BD project (one of the WP leaders)
- Organization of the Eurostat Big Data Workshop in Slovenia and contribution to the ethical review and ethical guidelines, which are to be prepared this year
- Continuation of ongoing work in local projects (job vacancies data from enterprise websites, and CEMODE)

18 Open questions
- Access to data (legal issues, partnership, etc.)
- Big data are used for different purposes (different definitions)
- There is no control over the collection process
- Data could change or even disappear
- Public perception
- IT and methodological skills
- IT infrastructure
- Quality of data

