El valor de la información: el reto del Big Data

Presentation transcript:

El valor de la información: el reto del Big Data
Instituto de Estadística y Cartografía de Andalucía, 5 Feb 2016
Big data in official statistics in the European Statistical System: the Big Data Action Plan & Roadmap
EUROSTAT – Fernando Reis – 'Task Force Big Data'

Datafication; Sensors; Digital footprint
Good afternoon. I have the pleasure of welcoming you to this session on the activities of Eurostat on big data and on possible collaboration between UNSD and Eurostat. <fade in "Datafication"> What I actually want to talk about at the beginning of this session is DATAFICATION. This concept was introduced in the May/June 2013 issue of Foreign Affairs, in an article by Kenneth Neil Cukier and Viktor Mayer-Schoenberger called “The Rise of Big Data”. In it they discuss the concept of datafication, and their example is how we quantify friendships with “likes”: it is the way everything we do, online or otherwise, ends up recorded for later examination in someone’s data storage units. Or maybe multiple storage units, and maybe also for sale. They define datafication as a process of “taking all aspects of life and turning them into data”. For instance, Twitter 'datafies' stray thoughts and LinkedIn 'datafies' professional networks.
Datafication is an interesting concept, and it led us to consider its importance with respect to people’s intentions about sharing their own data. We are being datafied all the time. Or rather: our actions are. When we “like” someone or something online, we should expect to be 'datafied'. When we merely browse the web, we are unintentionally, or at least passively, being datafied through cookies that we might or might not be aware of. And when we walk around in a store, or even on the street, we are being datafied in a completely unintentional way, via sensors, cameras, or Google Street View cars. <fade in "footprints & sensors"> As such, we can distinguish two "tools": the digital footprint passively left behind by an individual, and sensors actively gathering information.
So, knowingly or not, every one of you left a footprint behind when you switched on your mobile phone last night or this morning to call home, some weeks ago when you booked your flight via Amadeus or looked for a hotel via booking.com, some days ago when you checked on Google how to get to this building by bus in the morning, or this morning during breakfast when you wrote on Facebook that you were going to attend a very interesting session on big data.
Our challenge as statisticians is to exploit this so-called datafication and use big data for producing statistics. What will be the impact of ubiquitous data collection and networking on official statistics?

Big Data and Official Statistics
What will be the impact of ubiquitous data collection and networking (mobile communication, Internet of [every]Things, social media, wearables, autonomous traffic, smart systems, …) on official statistics?

Expected benefits of using big data?
Outward-looking: more adequate and flexible response to user needs; wider range of statistical products and services (without increasing burden); better understanding of the quality aspects of new sources.
Inward-looking: acquisition of new competences for NSIs; increased efficiency in producing statistics.
We remain key players for statistical information.
(self-explanatory)

Big data at Eurostat – key points
ESS (European Statistical System) Scheveningen Memorandum, Sept 2013:
Examine the potential of big data sources for official statistics; official statistics big data strategy as part of wider government strategy; address privacy and data protection; collaboration at European and global level; address need for skills; partnerships between different stakeholders (government, academics, private sector); developments in methodology, quality assessment and IT; adopt action plan and roadmap for the ESS.
The real kick-off for the ESS work on big data was the Scheveningen Memorandum, adopted by the heads of the national statistical offices. List of objectives (bullet points). I will not go further into the details here, but let me pick two important results <next slide>.

Big data at Eurostat – key points
ESS (European Statistical System): Scheveningen Memorandum (Sep 2013) → Task Force Big Data → Big Data Roadmap and Action Plan 1.0 (June 2014) → ESS Pilots (2016 - 2020).
Implementation of ESS Vision 2020: Big Data project = integral part of the portfolio.
European Commission: Communication "Towards a thriving data driven economy"; Private Public Partnership on big data.
International cooperation (UNSD, UNECE, etc.): UN/ECE project “Big data in official statistics” (Sandbox); UNSD Global WG on Big Data.
Firstly, the creation of a Task Force on Big Data. We have an internal TF here at Eurostat (of which Albrecht is a full-time member) and an ESS Task Force. The latter is currently composed of 16 statistical offices but also includes experts from the ECB, OECD and UNECE, academic experts and experts from other Commission services (DG CNECT, DG JRC); it is chaired by Eurostat. Secondly, the ESS Task Force drafted a Big Data Roadmap and Action Plan. An important axis of this roadmap and action plan concerns the ESS Pilots that will be carried out until 2019, but I will come back to this later. <fade in "ESS vision 2020"> Obviously, big data is also an important element for achieving the implementation of the ESS VISION 2020. <fade in "EC" and "International cooperation"> Apart from the work already mentioned, initiatives are also being taken at the level of the Commission, and Eurostat is also cooperating closely with other international organisations such as the UNSD and UNECE, for instance via the Global Working Group on Big Data for Official Statistics.

Big Data Action Plan and Roadmap at a glance
Policy; Quality; Skills; Experience sharing; Legislation; IT infrastructures; Methods; Ethics / Communication; Big data sources; Governance; Pilots

Challenges: cooperation, sharing of know-how; development of a sound methodology ("from design-based to model-based approach"); exploration & tentative implementation; looking for partners.
Action (example): pilot projects, carried out by the Member States (ESSnet), 2015 – 2019 (FPA / SGA construction); exploring different big data sources (but also IT architecture, partnerships), developing generic guidelines and frameworks; establishing partnerships with data providers and with research and international organisations; cooperation with UN (lead) on a methodological framework.
A first set of challenges refers to cooperation and the exchange of best practices, the methodology and the transition to the "real use" of data. These are perhaps the areas that are closest to a statistician's heart. One way of tackling these issues is the launch of a series of PILOT PROJECTS. A Framework Partnership Agreement between Eurostat and 20 NSIs was signed in Nov 2015. In Dec 2015 Eurostat launched the Specific Grant Agreements that will provide the resources for the NSIs to carry out the work. In this context, close cooperation between the ESS and the GWG will be necessary in order to avoid duplication of work and ensure synergies between the two groups. These pilot projects will be an important pillar of the big data activities in the ESS in the coming years and should pave the way towards data production driven by big data.

Action (example) – continued
List of pilot projects (Framework Partnership Agreement signed): Web scraping [job vacancies; enterprise characteristics]; Smart meters [electricity consumption; temporary vacant dwellings]; AIS data [vessel identification systems]; Mobile phone data.
"The big data for official statistics competition" (2016)

Challenges: new skills for NSI staff: statisticians vs. data scientists? computing capacity, hardware? analytical tools, software? storage?
Action (example): Training program for European statisticians (ESTP). In the next years: dedicated courses on big data; focus on big data sources and on big data tools; acquiring the skills needed to assess sources and their quality, and the skills to use tools and to explore big data sources.
Secondly, important enablers for a successful move towards big data are SKILLS and IT INFRASTRUCTURE. Our staff will slowly but steadily need new skills, and our IT architecture & infrastructure will need to adapt to the new sources. The impact on hardware needs will be significant. Experiments are ongoing, for instance the "sandbox" environment for big data experiments hosted by the Irish Central Statistics Office, in cooperation between, among others, Eurostat and UNECE. A concrete action in the pipeline is the set-up of a series of training courses under the umbrella of the ESTP. Our Task Force on Big Data is currently preparing the outline of such a programme for 2016. The courses will focus on sources and on tools and will be structured to address basic/new users or management as well as more experienced users.

ESTP courses supporting big data (2016)
[Diagram of courses and dates, in slide order: 12 – 15 Sep | Big data sources - Web, Social media and text analytics | 29 Feb – 2 Mar | 21 – 24 Jun | Introduction to big data and its tools | Hands-on immersion on big data tools | Nowcasting | 7 – 10 Nov | Advanced big data sources - Mobile phone and other sensors | 5 – 7 Apr | 8 – 10 Jun | 24 – 26 Feb | The use of R in official statistics: model based estimates | Can a statistician become a data scientist? | Time-series econometrics. Legend: Big data courses; Methodology courses; Activity.]
Arrows represent suggested learning paths and not mandatory precedencies. The set of knowledge and skills that staff (statisticians) need in order to work on the processing of big data is too large to be covered well in one single course. It therefore needs to be covered by several courses with well-defined precedencies. [click for first animation] Besides these precedencies between the big data courses, there are important skills not exclusively related to big data which are required when working with big data sources (e.g. machine learning). These are covered by ESTP training courses in other domains, in particular methodology. [click for second animation] Methodology training courses also provide important skills required in the production of statistical products which build on the potential of big data sources, in particular nowcasting.

ESTP courses supporting big data (2016)
12 – 15 Sep; Big data sources - Web, Social media and text analytics: web scraping; content and sentiment analysis on social media; text mining.
29 Feb – 2 Mar; 21 – 24 Jun; Introduction to big data and its tools; Hands-on immersion on big data tools: Hadoop; MapReduce; Pig and Hive; Spark; NoSQL databases; RHadoop.
Nowcasting; 7 – 10 Nov; Advanced big data sources - Mobile phone and other sensors.
Big data and the several digital traces people leave; overview of big data sources: sensors and the IoT, process-mediated data, human-sourced data; the implications of big data for official statistics; international big data initiatives in official statistics; privacy and personal data protection; examples of use of big data for producing statistics; methodological challenges of big data, e.g. over-fitting, multiple inference, and model-based inference; visualisation and its importance in the analysis of big data; data science and its role in big data analytics; overview of big data tools, e.g. distributed computing.
Mobile phone operators data; road sensor data; satellite images; vessels and planes identification systems.
5 – 7 Apr; 8 – 10 Jun; 24 – 26 Feb; The use of R in official statistics: model based estimates.
Can a statistician become a data scientist?; Time-series econometrics.
Methods of statistical inference: design-based, model-based and algorithm-based estimation; statistical learning; geo-spatial analysis; network analysis and Web analytics; graph database and advanced data visualisation.
Essentials of R; descriptive statistics with R; data visualization with R; programming with R; applications of R in an NSI.
Introduction to time series analysis; forecasting with time series models, uncertainty and confidence in forecasting; univariate time series modelling: ARIMA, ARCH and GARCH models; multivariate time series modelling: cointegration and VAR and VECM models; other developments: nowcasting, combination of forecasting, etc.; brief introduction to state space modelling.
Legend: Big data courses; Methodology courses; Activity.

Challenges: integrating official statistics in big data strategies; getting access to data & continuity of access; data security & privacy concerns; compensate for the burden?
Action (example): project on the analysis of legislation and strategy (but also ethics and communication), 2015-2017 (22 months); analysis for the EU and for Member States at national level. See also the Feasibility study on the use of mobile positioning data for tourism statistics (report on feasibility of access).
Other important areas relate to the policy / political framework and the regulatory framework. Given the interaction between policy and regulation, it is very important to work on these areas in parallel and in close cooperation. One aspect of policy will be the integration of (official) statistics into any strategy related to big data. This is essential to put statistics on the map and to open doors to actually accessing data. It should be kept in mind that big data are often held or stored by private companies, e.g. mobile network operators. The discussion of access is not limited to initial entry but should include a long-term vision, in other words a certain continuity of access; this is a conditio sine qua non for a sound statistical system that is based, fully or partially, on big data sources. A main barrier to access is data security and privacy concerns, as was also highlighted in the feasibility study carried out with respect to tourism statistics. Another important challenge is finding a sustainable business model for big data in official statistics, taking into account the budgetary impact for statistical offices and for those "holding" the data. To address these questions, I can mention that Eurostat recently launched a Call for Tender with the objective of analysing the legal frameworks at EU and national level.

Challenges: transversal challenges to all big data activities: quality and ethics & communication; big data vs. statistics: "goodness of fit" (concepts, representativeness, …); impact on public opinion of privacy and security concerns?
Action (example): cooperation with UN (lead) on a quality framework for big data; project on the analysis of ethics and communication (but also legislation and strategy), 2015-2017 (22 months); analysis for the EU and for Member States at national level.
As I already mentioned, all of the areas in the roadmap are interrelated. Two areas in particular are of a more horizontal, transversal nature. On the one hand, "quality": the quality framework as we know it will not be suited to the new data sources. Eurostat is contributing to the UN's work on a quality framework for big data. Quality issues will appear in the pilots, when assessing the access to data, etc. Just think of conceptual issues (can statistical definitions be maintained when using big data?), timeliness and flexibility of access, coverage and sampling issues, etc. On the other hand, "ethics and communication" will play an important if not decisive role. Policy makers and businesses will be reluctant to cooperate or to launch big data initiatives if "public opinion" does not support such approaches. Protection of data will become even more important than it already is now.

Communication: mobile phone data; social media.
WWW: web searches; businesses' websites; e-commerce websites; job advertisements; real estate websites.
Sensors: traffic loops; smart meters; vessel identification; satellite images.
Process generated data: flight booking transactions; supermarket cashier data; financial transactions.
Crowd sourcing: VGI websites (OpenStreetMap); community pictures collection.
Mobile phone data: currently a focal data source for big data; exists in all countries (≠ accessible in all countries); many promising studies/experiments available; potential relevance to many areas of official statistics (synergies!); most available studies linking big data to tourism statistics are based on mobile phone data.

Mobile phone data
Eurostat: Feasibility study on the use of mobile positioning data for tourism statistics (2012-2014); included in the forthcoming ESS Pilots on Big Data (2016-2019).
GWG Big Data Pilot.
NSIs (and tourism researchers): many small or larger scale projects ongoing!
GWG Big Data Task Team Mobile Phone Data.

… slow data vs. quick data…
An article released one day after the 2015 Easter weekend about tourism on the Belgian coast: 150 000 same-day visitors on Sunday, 400 000 during the entire long weekend. Data based on monitoring by the regional tourism board, in cooperation with the main mobile network operator Proximus and the road infrastructure administration.
In comparison: Eurostat will receive data on same-day visitors for the 2nd quarter of 2015 (not a particular weekend) on 30 June 2016 (not the day after) for the entire country (not a coastal strip within a NUTS2 region).
The methodology is not clear, but it is a nice example of how flash estimates based on big data decrease the relevance of official statistics.

Multiple sources & Multiple outputs
Big data = multiple sources & multiple outputs
[Diagram: mobile phone data, smart meters, VGI websites and satellite images feeding population statistics; mobile phone data feeding tourism, population, migration, traffic and commuting statistics.]
We can expect that in the coming years, big data will influence our work via many different entry points. The wide range of big data sources will not replace the current statistics in the short term but will be used to enhance, improve and complement statistics, in many areas simultaneously. For example, mobile phone data can contribute to producing statistical data in different domains, such as tourism or mobility, but also population or migration. On the other hand, various big data sources can interact and contribute to providing statistical data for a specific domain. For example, population statistics can be fed by mobile phone data, volunteered geographic information sources and smart meters.

Lifecycle for the coming years?
Inputs to domain statistics: household & business surveys; mobile phone data; payment cards data; other big data.
SHORT TERM: 'traditional' surveys as main input for tourism statistics; big data sources slowly becoming auxiliary information.

Lifecycle for the coming years? (2)
Inputs to domain statistics: household & business surveys; mobile phone data; payment cards data; other big data.
MID TERM: weight of surveys decreases in favour of big data? Surveys no longer 'main filter' but 'one of the sources'?

Lifecycle for the coming years? (3)
Inputs to domain statistics: household & business surveys; mobile phone data; payment cards data; other big data; NEW: web (prices), bookings (nowcast/forecast).
LONGER TERM: replacement of surveys continues (smaller samples, less frequent collection)? Enhanced tourism statistics via embedding of newer sources?

The statistical office of the future
Data flows in addition to surveys and censuses; embedded in data flow (smart statistics); product designers in addition to data collection designers; statistical modelling will be a major activity; from descriptive indicators to nowcasting (and forecasting); trust and quality will be key; new role in teaching digital literacy; accreditation and certification instead of pure production.
Address issues linked to quality & transparency, privacy & confidentiality, access to third party data sources & data sharing, scientific standards & methodology, professional ethics, skills, …
To close, and before handing over to the next speakers, let's jump a bit further in time and try to imagine what the statistical office of the future could look like. We will move away from the traditional sources that have often been in place since the scientist after whom this meeting room is named, Quetelet, put them at the core of statistical production. Surveys and censuses will be "competing" with data flows, and somehow the NSIs will become embedded in such data flows. In terms of skills, we will no longer design data collection; instead we will be designing statistical products (using the available sources). Modelling and nowcasting will become common terminology. Partnerships and trust will become more important than ever. Users and producers will need new types of digital literacy and skills. In terms of quality, the NSI will lose control over the entire production chain from interview to indicator, and quality assessment will focus more on accreditation and certification of big data sources and of statistical output based on big data.

Thank you for your attention
Fernando Reis, Eurostat Task Force on Big Data
fernando.reis@ec.europa.eu
https://github.com/reisfe/
https://twitter.com/reisfe/
https://linkedin.com/in/reisfe/