Strategies for European statistics: New insights using micro data
Aurel Schubert, Director-General Statistics
ESS Big Data Workshop 2016, Ljubljana, 13-14 October 2016
ECB-UNRESTRICTED
Disclaimer: The opinions expressed in this presentation are not necessarily those of the European Central Bank (ECB) or the European System of Central Banks (ESCB).
Overview
1. Big Data – The challenge of moving to more granular data
2. Big Data – Discovery and piloting
3. Strategy for new insights
New policy needs drive new statistics
- Statistics need to stay relevant
- An increasingly heterogeneous world
- Broader coverage
- Further breakdowns
- European Statistics
- New policy needs for micro-data
- Increasing responsibilities and new challenges for statistics
Paradigm shift - Moving to more granular data
Statistics & Analytics: reliable, trustworthy and sustainable
- Macro-level statistics
- Micro-level data
- Big data
Quality: reliability, consistency

Example: Micro-level statistics
- Security-by-security statistics
- Holdings of individual securities
- Money market statistical reporting (MMSR)
- Loan-by-loan register (AnaCredit)
- Register of Financial Institutions
- Individual supervisory data

Conny: Aurel, apart from the new and expanded responsibilities of DG-S, how has this affected the collection and availability of statistics?
Aurel: A substantial part. User demand now focuses less on euro area statistics and more on countries, sectors and individual banks. It is a paradigm shift from macro-level to micro-level statistics. Previously we focused on producing macro-level statistics such as the euro area balance sheet statistics, monetary statistics and euro area securities statistics. The financial crisis demonstrated the need for micro-level statistics: banks' transactional data on the money market (MMSR), meaning banks providing short-term loans to other banks; individual securities issuance and holdings statistics, meaning which banks are issuing and holding which securities; and, beyond that, information on each individual bank's loans to its customers. This shows, in near real time, the holdings and activities of the banks, and hence their associated risk and exposure profiles. The challenge is to match the micro-level statistics with the macro-level aggregates. We are also looking into big data, and have data on the search terms used in Google.
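The challenge of matching micro-level statistics with macro-level aggregates can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the identifiers, the amounts, the function names `aggregate_by_sector` and `reconcile`, and the 1% tolerance are hypothetical and do not describe any actual ECB dataset or procedure.

```python
from collections import defaultdict

# Hypothetical security-by-security holdings:
# (holder sector, security identifier, amount in EUR millions)
holdings = [
    ("banks", "XS0000000001", 120.0),
    ("banks", "XS0000000002", 80.0),
    ("insurers", "XS0000000001", 50.0),
]

# Hypothetical macro-level aggregates for the same sectors
macro_aggregates = {"banks": 200.0, "insurers": 55.0}


def aggregate_by_sector(records):
    """Sum the micro-level amounts per holder sector."""
    totals = defaultdict(float)
    for sector, _security_id, amount in records:
        totals[sector] += amount
    return dict(totals)


def reconcile(micro_totals, macro_totals, tolerance=0.01):
    """Return sectors whose micro sum deviates from the macro figure
    by more than the relative tolerance (an assumed threshold)."""
    gaps = {}
    for sector, macro in macro_totals.items():
        micro = micro_totals.get(sector, 0.0)
        if abs(micro - macro) > tolerance * abs(macro):
            gaps[sector] = micro - macro
    return gaps


micro_totals = aggregate_by_sector(holdings)
gaps = reconcile(micro_totals, macro_aggregates)
print(gaps)
```

In this toy example only the insurers sector is flagged, because its micro-level sum falls short of the macro figure. In practice the gap would need to be explained by coverage, valuation or classification differences.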
Need to connect the dots: Linking and integrating
Data Management
Semantics
- Map and link data (sets)
- Information Model / Data Dictionary
- Standards, Identifiers, Master Data
- Standardisation of Micro Data
Database infrastructure
- Data Quality / Methodology
- Common Platform, Data marts
- Linking Data Warehouse(s)
- Analytical toolbox, user sandbox
Statistics analysis
- Best practice in analytics
- Data Discovery
- Business Analysis
Communication outreach
- Visualisation & Presentations
- Communication
- Outreach to frequent users
- Explainers
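Linking datasets through common identifiers and master data can be sketched as below. This is an illustration under stated assumptions only: the register, the loan records, the identifier values and the helper `link_to_register` are all hypothetical, chosen to show why shared identifiers matter when integrating micro data.

```python
# Hypothetical master data register keyed by an entity identifier
register = {
    "ID001": {"name": "Bank A", "country": "AT"},
    "ID002": {"name": "Bank B", "country": "SI"},
}

# Hypothetical loan-level records referring to lenders by the same identifier
loans = [
    {"lender_id": "ID001", "amount": 10.0},
    {"lender_id": "ID002", "amount": 4.0},
    {"lender_id": "ID999", "amount": 1.5},  # identifier missing from the register
]


def link_to_register(records, master, key):
    """Enrich each record with master data; collect records that cannot be linked."""
    linked, unmatched = [], []
    for record in records:
        entity = master.get(record[key])
        if entity is None:
            unmatched.append(record)
        else:
            linked.append({**record, **entity})
    return linked, unmatched


linked, unmatched = link_to_register(loans, register, "lender_id")
print(len(linked), "linked;", len(unmatched), "unmatched")
```

Keeping the unmatched records separate, rather than dropping them silently, makes identifier gaps visible as a data quality issue.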
The human dimension – extended Swiss Army knife
Core skills of a statistician
- Economics: understanding of the economic phenomena
- Statistics: statistical methodology and concepts
- Research: modelling, algorithms and error terms
- Lawyer: drafting legal regulations and guidelines
- IT: building infrastructures, programming and databases
- Project management: planning and implementing
- Detective: assuring quality and detecting errors
- Coordinator: STC/ESCB and country knowledge
- Analyst: analysing results
- Communicator: presenting results and methodology
New skills
- Data science: large datasets, engineering and mathematics
- Technical skills: machine learning and data mining
- IT skills: Hadoop, Spark, NoSQL, Python
- Visualisations: patterns, discovery, new tools (Tableau)
Big Data – Discovery and piloting
Should statisticians play a role, contribute and develop the concept of “Big data”, or is it only a temporary phenomenon?
IFC Survey on Big Data
- Aim was to assess central banks’ experiences and interest in exploring big data related to financial and economic topics of interest to central banks
- IFC on-line survey with 69 responses (83% response rate)
- Big data is not just about large data sets: “Pretty Big” Data
Big Data – Discovery and piloting
- At senior policy level, there is significant interest in big data within the central banking community (66%)
- Despite the interest, central banks have limited experience in the use of big data (30%)
- Central banks are interested in cooperating on specific topics to explore the usefulness of big data (71%)
Address key statistics topics
- Relevance of sources
- Quality
- New indicators/statistics
- Statistics methods
- Sampling & representativeness
Big Data – Discovery and piloting
Big data can be useful for central banking purposes and is perceived as useful for supporting central banking policies
- Macro-economics
- Forecasting/nowcasting
- Financial stability
- Business cycle analysis
- Supervisory purposes (micro-economics)
- Sentiment and behaviour indexes
- Improve quality
Central banks are interested in cooperating in a structured approach
- Setting up a road map
- Identifying joint pilot projects
- Sharing experience
Barriers and challenges
- Resources and costs
- Skilled human capital
- IT constraints
Explore synergies to overcome barriers and challenges
Big Data – Discovery and piloting
IFC way forward: define and contribute to a “big data” roadmap
Share and contribute to selected big data pilot projects
- Administrative dataset (e.g. corporate balance sheet data)
- Web search dataset (e.g. Google-type search information)
- Commercial dataset (e.g. credit card operations)
- Financial market data (e.g. high-frequency trading, price spreads)
Strategic directions for statistics
Managing micro & big data to derive useful statistics
- New semantics and methodology (standardisations)
- Linking and integrating datasets (overcoming silos)
- New and efficient production platforms
- New skill-sets and staff training
- International collaborations
- Communication and outreach
Thank you for your attention
Any questions?
Annex: ECB & Google search data
- The ECB receives weekly Google search data in a CSV file
- The data is an index of weekly volume changes of Google queries by geographic location and category
- Google search data is more accurate and uses much larger samples than Google Trends
- Google search data covers the following 14 countries: Austria, France, Italy, Slovenia, USA, Belgium, Germany, Netherlands, Spain, United Kingdom, Denmark, Ireland, Portugal, Sweden
- Google search data includes 26 categories and 269 subcategories, e.g. Finance is a category and Banking is a subcategory
- The data are normalised to start at 1, so one can see the relative change in Google searches by category, but nothing can be said about the absolute search volumes
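Why an index normalised to start at 1 shows relative changes but hides absolute volumes can be seen in a minimal sketch. The normalisation applied to the delivered data is not described here, so the simple rebasing below (dividing by the first observation) is an assumption, and the weekly volumes and the name `normalise_to_base` are hypothetical.

```python
def normalise_to_base(volumes):
    """Rescale a series so that its first observation equals 1.0.
    Assumed rebasing rule, for illustration only."""
    if not volumes:
        raise ValueError("series is empty")
    base = volumes[0]
    if base == 0:
        raise ValueError("first observation must be non-zero")
    return [value / base for value in volumes]


# Two hypothetical weekly query counts with very different absolute levels
small_category = normalise_to_base([200, 220, 180])
large_category = normalise_to_base([2_000_000, 2_200_000, 1_800_000])

# Both series produce the same index, so the absolute level is not recoverable
print(small_category == large_category)
```

The two index series are identical even though the underlying volumes differ by a factor of 10,000, which is why comparisons across categories are limited to growth rates.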
Annex: ECB & uses of Google search data/big data
Findings of the ECB Statistics Paper Series released on this topic

“Nowcasting GDP with electronic payments data” by John W. Galbraith and Greg Tkacz
- Electronic payment transactions and cheques can be used to formulate nowcasts of current gross domestic product growth
- Assesses this technique and finds that debit card transactions contribute most to forecast accuracy

“Social media sentiment and consumer confidence” by Piet J. H. Daas and Marco J. H. Puts
- What is the relationship between changes in Dutch consumer confidence and Dutch public social media?
- Changes in social media sentiment appear to reflect the same underlying phenomenon as Dutch consumer confidence
- Could be used as an indicator for changes in consumer confidence and as an early indicator

“Quantifying the effects of online bullishness on international financial markets” by Huina Mao, Scott Counts and Johan Bollen
- The researchers develop a measure of investor sentiment based on Twitter content and Google search queries
- Twitter and Google bullishness are positively correlated with investor sentiment
- Twitter bullishness is able to predict increases in stock returns
- The results appear to support the investor sentiment hypothesis in behavioural finance
Annex: ECB & uses of Google search data/big data (cont’d)
Pipeline publications by the ECB staff

“Big data – the hunt for timely insights and decision certainty: Central banking reflections on the use of big data for policy purposes” by Per Nymand-Andersen
- Big data might lead to new economic theories, with statistical algorithms applied to multiple big data sources from various disciplines finding new causations
- Big data is an opportunity for central banks to apply their expertise in testing existing and new models, data sets and theories; to explore new data sources; and to obtain new, timely knowledge from the feedback loop between monetary policy and market reactions
- Central banks need to start by taking a structural approach to systematically testing the use of non-official big data sources

“Predicting euro area unemployment rate using Google data: Central banks interest and use of big data” by Per Nymand-Andersen and Heikki Koivupalo
- Explores how Google search data has been used for macro-economic and financial purposes within the literature
- Tests how Google search data can be used for predicting the euro area unemployment rate in advance of the official statistics
- Demonstrates that applying Google data within a simple model can improve the predictability of the euro area unemployment rate
- Further testing of the Google search data is needed to establish its usefulness for the central banking statistical and analytical toolkit