United Nations Economic Commission for Europe, Statistical Division. UNECE Big Data Work. Steven Vale, UNECE
Big Data
Big Data: Data sources with high volume, velocity and variety of data, which require new tools and methods to capture, curate, manage, and process them in an efficient way. Source: UNECE and UNSD
What does Big Data mean for official statistics? Priorities: Partnerships – Guidelines; Privacy – Guidelines; Quality – Guidelines; Skills – Survey, Skills profile; IT / methodological issues – Sandbox
Sandbox: The Irish Centre for High-end Computing is hosting a Big Data ‘sandbox’ containing data and tools for international experiments. “Play is the highest form of research” – Einstein
Sandbox: Aims. Is remote access and processing a feasible approach for statistical production? Can existing statistical standards / methods be applied to Big Data? Which Big Data software tools are most useful for statistical organisations? What are the potential uses, advantages and disadvantages of Big Data? “Learning by doing”. Can we build an international collaboration community on the use of Big Data for statistics?
Social Media, Mobile Phones, Prices, Smart Meters, Job Vacancies, Web Scraping, Traffic Loops
7 Sandbox Experiment Teams; 75 Individuals from 25 countries / organisations; 3 Task Teams; Executive Board, Modernisation Committees; 1 Project Manager; 2 Coordinators; Partners
Results
Project output available on UNECE Wiki
2015 Project: More sandbox experiments; More data (UNSD Comtrade, Wikipedia, Twitter, Enterprise web site data); Future of the sandbox approach; “Sprint” – Cork, June
2015 Project: Challenge from the High-level Group: produce and release a set of internationally comparable statistics from one or more Big Data sources by November 2015!
Get involved! Anyone is welcome to contribute! Contact: More Information: Big Data Wiki: www1.unece.org/stat/platform/display/bigdata; LinkedIn group: “Modernising official statistics”; Big Data Inventory: