Download presentation
Presentation is loading. Please wait.
Published byRussell French Modified over 9 years ago
1
United Nations Economic Commission for Europe Statistical Division NTTS 2015 – Satellite Workshop on Big Data March 9, 2015 The Big Data Project – The Sandbox Antonino Virgillito Project Consultant, UNECE Istat
2
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 The HLG Big Data Project Objectives – to identify the main possibilities offered by Big Data to statistical organizations – to demonstrate the feasibility of efficient production of both novel products and 'mainstream' official statistics using Big Data sources 75 participants from 20 Organizations – National Statistical Offices and International Organizations Ran from January to December 2014 4 task teams – Quality – Partnership – Privacy – Technology: hands-on work on Big Data tools and dataset on common, shared computation environment - The Sandbox 2
3
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 The Sandbox Shared computation environment for the storage and the analysis of large-scale datasets Used as a platform for collaboration across participating institutions Explore tools and methods Test feasibility of producing Big Data-derived statistics Replicate outputs across countries Hortonworks Data Platform PentahoRHadoop Created with support from: - CSO Central Statistics Office of Ireland - ICHEC Irish Centre for High-End Computing Cluster of 28 machines Accessible through web and SSH Software: full Hadoop stack, visual analytics, R, RDBMS, NoSQL DB Objectives 3
4
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 The Sandbox Web Interface 4
5
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Social Media Mobile Phones Prices Smart Meters Job Vacancies Ads Web Scraping Traffic Loops Each experiment team produced a detailed report on its activity, available on the UNECE wiki 5
6
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Cheaper More timely Novel Statistics We showed some of the possible improvements that can be obtained using Big Data sources 6
7
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Skills All available tools were used in the experiments by both researchers and techicians with no previous experience The Sandbox can represent a capacity building platform for participating institutions Crucial for building “data scientist” skills Projects in planning were less likely to use tools generally associated with “Big Data”. Often this decision was made due to a lack of familiarity with new tools or a deficit of secure “Big Data” infrastructure (e.g. parallel processing no- SQL data stores such as Hadoop). UNSD Big Data Questionnaire At present there is insufficient training in the skills that were identified as most important for people working with Big Data Skills on Hadoop/NoSQL DBs indicated as “planned in the near future” by majority of organizations UNECE Big Data Questionnaire 7
8
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Technology Big Data tools are necessary when dealing with data ranging from hundreds of Gb on – Effective starting from tenths of Gb – “Traditional” tools perform better with smaller datasets Researchers/technicians should be able to master different tools and be ready to deal with immature software – Highly dynamic situation with frequent updates and new tools spawning frequently Need strong IT skills for managing the tools – Support from software companies might be required in early phases 8
9
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Acquisition 7 datasets were loaded – Initial project proposal required “one or more” Difficult to retrieve “interesting” (i.e., meaningful, disaggregated…) datasets – Privacy and size issues This also applies to web sources that are only apparently easy to retrieve – Issues with quality, in terms of coverage and representativeness 9
10
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Sharing Naturally achieved sharing of methods and datasets Many data sets have the same form in all countries – Methods can be developed and tested in the shared environment and then applied to real counterparts within each NSI Privacy constraints on datasets limit the possibility of sharing – Can be partly bypassed through the use of synthetic data sets 10
11
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 Extension of the Project in 2015 Production of Multi-national statistics only basing on Big Data sources – Objective: present results in a press conference in November 2015 Continuation of experiments – Consolidated technical skills that now can be used more effectively in experiments Possibility of testing new models of partnership – Moving data is too difficult. Why not trying to involve partners in running our programs on their data in their data centers? 11
12
The Role of Big Data in the Modernisation of Statistical Production and Services NTTS 2015 March 9, 2015 12 Project output available on UNECE Wiki http://www1.unece.org/stat/platform/display/bigdata/2014+Project
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.