Download presentation
Published byJustina Rodgers Modified over 9 years ago
1
WebDataNet Conference 2015 Salamanca, 26th – 28th May 2015
Big data and official statistics - progress at European level Fernando Reis, Big Data Task-Force European Commission (Eurostat)
2
The data deluge © Copyright Brett Ryder 2010
3
The data deluge The origins Decreasing storage price
Source: JC McCallum Some methodology and references can be found in: JC McCallum, Price-Performance of Computer Technology (1st edition), pages 4-1 to 4-18 in The Computer Engineering Handbook, Vojin G. Oklobdzija, editor, CRC Press, Boca Raton, Florida, USA, 2002
4
The data deluge The origins Decreasing storage price
Developments in distributed computing paradigm Copyright © 2014 The Apache Software Foundation.
5
The data deluge The 3 V’s of big data Volume (large N and large P)
Velocity (data streaming and real-time statistics) Variety
6
The data deluge Organic data / exhaust data / digital footprint
Data ubiquity: text, sound, images, video Emergent use of unstructured data Reality mining
7
The data deluge Communication Mobile phone data Social Media WWW Web Searches Businesses' Websites E-commerce websites Job advertisements Real estate websites Sensors Traffic loops Smart meters Vessel Identification Satellite Images Process generated data Flight Booking transactions Supermarket Cashier Data Financial transactions Crowd sourcing VGI websites (OpenStreetMap) Community pictures collection The key critical success factor in the action plan is an agreement at ESS level to embark on a number of concrete pilot projects. A real understanding of the implications of big data for official statistics can only be gained through hands-on experience, ‘learning by doing’. Different actors have already gained experience conducting pilots in their respective organisations at global, European and at national level. The basic principle for the identification of the pilots should be their potential for producing official statistics. In addition, the pilots should cover a variety of characteristics related to big data and production of statistics. On the basis of the principles described in the action plan and roadmap, the he ESS Big Data ask Force should make proposals for pilots which would have to be agreed by the ESSC. Participation in the pilots will be guided by the collaboration principles of the ESS Vision on a case-by-case approach.
8
Analytics This photo, “Cartoon: Big Data” is copyright (c) 2014 Thierry Gregorius and made available under an Attribution 2.0 Generic license.
9
Analytics How to deal with exhaust data? Massive datasets
Dealt by machine learning / predictive analytics Massive datasets Foster machine learning Data science: a new discipline? Multiple inference Overfitting
10
Analytics Visualisation Data-driven applications Predictive analytics
The end of theory Predictive analytics Ex: Google translate, voice recognition, suggestions systems, health applications New types of data products Official stat: chances of getting a new job Ex: Inflation Calculator: Bureau of Labor Statistics
11
Predicting Personality Using Mobile Phone-Based Metrics
de Montjoye, Yves-Alexandre, et al. "Predicting personality using novel mobile phone-based metrics." Social Computing, Behavioral-Cultural Modeling and Prediction. Springer Berlin Heidelberg,
12
Multidimensional Poverty Index
(Lighter colour indicates higher poverty) Poverty map estimated based on mobile phone data Poverty map in finer granularity estimated based on mobile phone data Smith, Christopher, Afra Mashhadi, and Licia Capra. "Ubiquitous sensing for mapping poverty in developing countries." Paper submitted to the Orange D4D Challenge (2013).
13
Big data industry sector
14
An emergent market Monetisation of data: Data is the new oil
Data as a new factor of production (competitive differentiating factor for businesses) A threat to official statistics? (ex: Argentina) Data ecosystem The cases of Google and Facebook
15
What does big data mean for official statistics?
Change of paradigm From: finite population sampling methodology To: additional statistical modelling and machine learning from designers of data collection processes to designers of statistical products Privacy Use of digital footprint Data subject lack of control of data High data detail and insight from analytics
16
Big data strategy of the European Statistical system
Scheveningen Memorandum ESS big data action plan
17
Scheveningen Memorandum
September 2013 Big data strategy Roadmap Heads of the National Statistical Institutes of the EU
18
Scheveningen Memorandum on Big Data
Examine the potential of Big Data sources for official statistics Official Statistics Big Data strategy as part of wider government strategy Address privacy and data protection Collaboration at European and global level Address need for skills Partnerships between different stakeholders (government, academics, private sector) Developments in Methodology, quality assessment and IT Adopt action plan and roadmap for the European Statistical System At their meeting in September 2013, the directors general of the European Statistical System agreed on a memorandum on Big Data. The so called Scheveningen memorandum called for the submission of the action plan and roadmap on Big Data. It also expresses the main points of such an action plan, such as the integration of a Big Data strategy for statistics into an overall government strategy, the need for skills, the collaboration at European as well as at global level, the need for methodological developments and the protection of personal data. The action plan and roadmap was adopted by the ESS at its meeting on 25 Sep 2014.
19
Big data strategy Start with concrete pilots 3 time-frames
Short-term: pilots, where we are, what do we need Medium-term: Long-term: Review the roadmap
20
Big data sources integrated in ESS official statistics production
Roadmap "As is" versus "To be" Long term Vision Medium term aims Short term objectives Topics developed for integration of Big Data into official statistics Government Strategy Pilots finalised IT infrastructure, Methods, Quality Framework Skills Partnerships Ethics > 2020 Big data sources integrated in ESS official statistics production By 2020 Commission big data strategy Identification and analysis of output portfolio Pilot projects launched Skills and training requirements identified Horizon 2020 Research Framework Programme Communication strategy Exchange with stakeholders Long Term vision beyond 2020: Big data sources integrated in ESS official statistics production By 2020, we want to prepare the grounds for the successful integration of Big Data into official statistics. The issue should be tackled by the topics which have already been identified in the Scheveningen memorandum and further developed in the action plan. Medium term aims: Topics developed for integration of Big Data into official statistics Government Strategy Pilots finalised IT infrastructure, Methods, Quality Framework Skills Partnerships Ethics Short term goals: Commission big data strategy Identification and analysis of output portfolio Pilot projects launched Skills and training requirements identified Horizon 2020 Research Framework Programme Communication strategy Exchange with stakeholders Long term vision Big data sources are available to the ESS (in such a way that business continuity is guaranteed) The European and national legislations enable Big Data usage Statistics graduates with data science skills available across the ESS Methods, tools, IT infrastructures and quality frameworks are reviewed and adjusted to new requirements Medium term aims Official Statistics integrated in governmental big data strategies at national levels across the ESS Pilots are finalised and provide input as regards implementation of statistics based on big data IT infrastructures are designed for processing big data sources Methodological and quality frameworks are developed (and/or reviewed) Data science skills integral part of official statistics education Public Private Partnerships are in place on big data and Official Statistics. Ethical guidelines for big data usage are in place Implementation of communication strategy ensures positive attitude of general public towards use of big data sources in official statistics. Short term goals Official statistics integrated in the Commission big data strategy Identification and analysis of output portfolio of big data sources Pilot projects on big data applications in official statistics launched and delivering first results The requirements in terms of skills and ESS professional training programmes are established Research lines on big data relevant to official statistics are included into the Work Programme of the Horizon 2020 Research Framework Programme Communication to the general public on planned and ongoing big data activities of the ESS with users/policy makers on their needs, and with big data owners on data aspirations Exchange of information with stakeholders within the statistical system and the research community, e.g. a European conference on big data in Official Statistics By 2016
21
Ethics / Communication
T O P I C S Policy Quality Skills Experience sharing Legislation IT Infrastructures Methods Ethics / Communication Pilots The slide shows the horizontal topics that have been developed in the action plan and roadmap. The ESSC reinforced in its opinion the legal and ethical issues as well as the development of appropriate skills. Some of the topics should be directly further developed such as policy, communication or skills, while other topics should be further elaborated as part of the pilots that arer planned to be run at EU level.
22
Blending of Sources and multipurpose Statistics
Mobile Phone Data Tourism Statistics Population Statistics Migration Statistics Traffic Statistics Commuting Statistics Statistics Population Mobile phone data Smart Meters VGI websites Satellite Images A Big Data source could provide data for different statistical areas. For example, mobile phone data could be used in the context of tourism statistics or population statistics. The pilots should explore the potential of different data sources for different statistical domains. On the other hand, different statistical domains could be enhanced from information that is derived from different data sources.
23
Thank you for your attention Eurostat Task Force on Big Data
Fernando Reis Eurostat Task Force on Big Data
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.