SDMX: Enabling World Bank to automate data ingestion
Siddhesh Kaushik, World Bank
SDMX Global Conference (Oct 2-3 2017, Addis Ababa)

Good morning everyone.
Our Focus
World Development Indicators
- The primary World Bank collection of development indicators
- Cross-country data compiled from officially recognized sources
- The most current and accurate global development data, including national, regional and global estimates
- 1,400+ indicators across statistical domains, updated quarterly

The WDI is the World Bank's primary collection of cross-country comparable development indicators; it has over 1,400 indicators and is updated quarterly.
WDI Production Process
- Data Extraction: Download, Clean, Transform, Load
- Consolidate, Verify
- Publish

Data is pulled in from multiple agencies and in different formats. In the past we pulled data from the IMF via backdoor access to their server and populated an in-house query tool so staff could get easier access to the data. The data is then cleaned to remove aggregates we do not use, transformed into the format needed to load into the Data Management System, and loaded. The loaded data is consolidated, verified and published (a rough sketch of one such cycle follows below).
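For concreteness, here is a minimal sketch in R of what one manual clean-transform-load cycle amounted to, assuming a small wide-format indicator extract; the column names, aggregate codes and SQLite staging file are illustrative stand-ins, not the actual WDI production schema.

```r
# Illustrative sketch of a manual clean -> transform -> load cycle.
# The input data, aggregate codes and staging database are hypothetical.
library(DBI)
library(RSQLite)

# A tiny stand-in for a downloaded wide-format extract (one column per year).
raw <- data.frame(country_code   = c("IND", "DEU", "WLD"),
                  indicator_code = "NY.GDP.MKTP.CD",
                  Y2015 = c(2.1, 3.4, 75.0),
                  Y2016 = c(2.3, 3.5, 76.1),
                  stringsAsFactors = FALSE)

# Clean: drop aggregate rows we do not use (codes are illustrative).
aggregates <- c("WLD", "EUU", "ARB")
clean <- raw[!raw$country_code %in% aggregates, ]

# Transform: wide to long (country, indicator, year, value).
year_cols <- grep("^Y\\d{4}$", names(clean), value = TRUE)
long <- reshape(clean, direction = "long",
                varying = year_cols, v.names = "value",
                timevar = "year",
                times   = as.integer(sub("^Y", "", year_cols)),
                idvar   = c("country_code", "indicator_code"))

# Load into the Data Management System (a local SQLite file stands in here).
con <- dbConnect(SQLite(), "dms_staging.sqlite")
dbWriteTable(con, "indicator_staging", long, overwrite = TRUE)
dbDisconnect(con)
```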
Challenges in Data Extraction
- Different sources & formats
- Clean and validate
- Time consuming
- Manual process
- More quality checks

Many of you will have figured out the challenges from the previous slide, since most of us face similar problems. The major ones: data comes from different sources, in different formats, obtained through different methods. Excel files carry a lot of additional information meant for human consumption that has to be cleaned out. For security reasons the backdoor connection to the IMF was closed, and we had to run multiple queries on the IMF site to get a manageable amount of data onto a desktop; large databases like Direction of Trade could no longer be updated. The majority of the cleaning and validation work is manual, which makes it very time consuming. A manual process also means more quality checks, especially when dealing with large volumes of data.
SDMX to the rescue

SDMX, the "Super Data Machine Exchange", was called in to help us with the situation.
SDMX Implementation
- Scheduler
- SDMX Connector
- Mapping Management
- Transform
- Database Connector
- SDMX Web Service
- Dissemination

This is how we implemented it. A scheduler lets us specify the SDMX web service or file to be processed, the frequency of processing and the people to be notified on completion. Using a combination of the Eurostat SDMX Source and the R SDMX package we implemented a connector to fetch and read SDMX data. A mapping and transformation component lets us map SDMX codes to internal codes and merge dimensions to match our internal structure (a sketch of these steps follows below). Database connectors let us push to different databases, primarily the management and dissemination databases, with the flexibility to push to additional databases if needed. We then used Eurostat's SDMX-RI to provide an SDMX-enabled web service for WDI dissemination.
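Below is a minimal sketch of the connector, mapping and database steps, assuming the R SDMX package mentioned on the slide is rsdmx. The Eurostat dataflow, the code-mapping table and the SQLite target are illustrative stand-ins for the production configuration, not the actual implementation.

```r
# Sketch of connector -> mapping -> database push, under the assumptions above.
library(rsdmx)
library(DBI)
library(RSQLite)

# 1. SDMX connector: fetch a dataflow from a registered provider (Eurostat here;
#    the flow and period are illustrative).
sdmx <- readSDMX(providerId = "ESTAT", resource = "data",
                 flowRef = "avia_paoc", start = 2015, end = 2016)
df <- as.data.frame(sdmx)

# 2. Mapping: translate SDMX dimension codes to internal codes
#    (hypothetical lookup table; production mappings are configuration-driven).
code_map <- data.frame(sdmx_code     = c("DE", "FR"),
                       internal_code = c("DEU", "FRA"),
                       stringsAsFactors = FALSE)
df <- merge(df, code_map, by.x = "GEO", by.y = "sdmx_code", all.x = TRUE)

# 3. Database connector: push to the management database
#    (a local SQLite file stands in; production would append rather than overwrite).
con <- dbConnect(SQLite(), "management_db.sqlite")
dbWriteTable(con, "sdmx_ingest", df, overwrite = TRUE)
dbDisconnect(con)
```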
Benefits
- Single tool
- Easy to add new data sources
- Time saving
- Unexpected bonus: code reuse, access to 8,000+ datasets

The most visible benefit is time saving for staff, giving them more time for more useful work than cleaning data. We now have a single tool to pull and process SDMX data, and it can deposit the data into multiple systems. The databases we used to update through backdoor access are being updated again, and more frequently; data is now updated weekly. To accommodate a new SDMX source we only have to add configuration (a sketch follows after this slide). All of this leads to better data quality and timeliness, and the code can be reused for other activities. While the focus of the project was to automate the datasets needed for WDI, we realized we now have access to over 8,000 datasets from various agencies. For example, we can now access Eurostat data such as air traffic statistics that will be valuable to the Transport group in the Bank. This posed a challenge too: we could not pull all of this data by hand.
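As an illustration of the configuration point, a new source can be described as one more entry in a config list that a generic fetch routine and the scheduler work through. Everything below (provider IDs, flow IDs, schedules, addresses) is hypothetical and again assumes rsdmx.

```r
# Hypothetical configuration-driven source list; adding a source = adding an entry.
library(rsdmx)

sources <- list(
  list(name = "Eurostat air passengers", providerId = "ESTAT",
       flowRef = "avia_paoc", schedule = "weekly",
       notify = "wdi-team@example.org"),
  list(name = "Another agency dataset",  providerId = "OECD",
       flowRef = "SOME_FLOW",            # placeholder flow ID
       schedule = "weekly", notify = "wdi-team@example.org")
)

# Generic fetch routine the scheduler calls for each configured source.
fetch_source <- function(src) {
  message("Fetching ", src$name, " from ", src$providerId)
  as.data.frame(readSDMX(providerId = src$providerId,
                         resource = "data", flowRef = src$flowRef))
}

# The scheduler would loop over sources on their schedule; one pull shown here.
df <- fetch_source(sources[[1]])
```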
SDMX Browser
- Access to 8,000+ datasets from 9 organizations

To facilitate access to this data we built an SDMX browser, and saved a lot of time thanks to OECD making their work available on GitHub. (Screen design courtesy of OECD.)
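For a flavour of what the browser does at the data layer, rsdmx can enumerate known providers and the dataflows each one exposes. The provider registry bundled with rsdmx is not identical to the nine organizations in our browser, so treat this only as a sketch.

```r
# Enumerate SDMX providers known to rsdmx and list one provider's dataflows.
library(rsdmx)

providers <- as.data.frame(getSDMXServiceProviders())
head(providers)

# Catalogue of dataflows exposed by one provider (Eurostat here).
flows <- as.data.frame(readSDMX(providerId = "ESTAT", resource = "dataflow"))
head(flows)
```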
Future plans
- More Bank data in SDMX

We have around 56 datasets in our dissemination system and we will start SDMX-enabling them. We also need validation routines for the data being pulled in (sketched below). The other missing piece in the automation puzzle is metadata ingestion. Finally, we have to face the reality of Excel data and automate its ingestion, reusing a lot of the support components built for the SDMX automation.
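On the validation point, a routine along these lines is the kind of thing we have in mind; the column names and thresholds are illustrative, not the internal WDI structure.

```r
# Hedged sketch of a validation routine for incoming pulls; checks are illustrative.
validate_pull <- function(df) {
  issues <- character(0)

  # Required columns must be present.
  required <- c("internal_code", "year", "value")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0)
    issues <- c(issues, paste("missing columns:", paste(missing_cols, collapse = ", ")))

  # Observation values must be numeric and finite.
  if ("value" %in% names(df) && any(!is.finite(df$value)))
    issues <- c(issues, "non-numeric or missing observation values")

  # Years must fall in a plausible range.
  this_year <- as.integer(format(Sys.Date(), "%Y"))
  if ("year" %in% names(df) && any(df$year < 1960 | df$year > this_year))
    issues <- c(issues, "years outside the expected range")

  issues  # an empty character vector means the pull passed these checks
}

# Example: a clean one-row pull passes.
validate_pull(data.frame(internal_code = "DEU", year = 2016, value = 3.5))
```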
Thank you