Alessandro Capezzuoli, Emanuela Recchini


A tool for the automatic collection of administrative data to produce official statistics Alessandro Capezzuoli, Emanuela Recchini Conference of European Statistics Stakeholders Budapest, 20-21 October 2016

1. Official statistics and data integration
2. Technology
3. Model
4. Architecture
5. Concluding remarks

DataSTAT Hub: a tool for the automatic collection of administrative data to produce official statistics Alessandro Capezzuoli, Emanuela Recchini – Conference of European Statistics Stakeholders, Budapest, 20-21 October 2016

1. Official statistics and data integration
Introductory remarks (1)

There is worldwide recognition of the increasing role played by administrative data in producing statistics that are more timely, more disaggregated and more frequent than those based on traditional survey data.

The efficient use of all available information to produce timely, accurate and high-quality statistics is a challenge for National Statistical Offices (NSOs), which are increasingly committed to developing suitable methods and tools for the production, collection, standardization and integration of different types of statistical data.

Bringing together information from different sources makes it possible to fill information gaps, to provide insights that cannot be gleaned from unlinked data, and to improve the knowledge and understanding of specific phenomena.

1. Official statistics and data integration
Introductory remarks (2)

Nowadays, the exploitation of administrative data for statistical purposes is normal practice for a large number of NSOs. It improves the quality of statistical outputs, reduces the statistical burden on respondents and minimizes costs.

The Italian National Institute of Statistics (Istat) collects and manages large amounts of administrative data from different sources, including:
- Italian Agency of Revenue
- Bank of Italy
- Ministries
- Social Security Institutions
- Government Institutions
- Private Institutions
- …

From 2009 to 2015, the administrative data supplied to Istat trebled.

1. Official statistics and data integration
The Italian legislation on data collection

According to the provisions of the Italian Digital Administration Code:
- before proceeding to the collection of new data, public administrations are required to verify whether the information they need can be acquired through access to information already in the possession of other public authorities or public bodies;
- the technical options for the usability of data are:
  - web access through the website of the supplier institution or an ad hoc thematic website;
  - interoperability among public administrations for data collection and data integration;
- the user may process the collected data exclusively in pursuit of its institutional goals;
- the transfer of data from one information system to another does not change data ownership.

(Guidelines for the drafting of conventions on the usability of Public Administrations data; Legislative Decree n. 82/2005, commonly referred to as the "Digital Administration Code", modified by Legislative Decree n. 235/2010)

1. Official statistics and data integration
Administrative data collected by Istat

The data collected by Istat differ widely in type, content and structure.

1. Official statistics and data integration
Data collection process (1)

DATA COLLECTOR
- manages data requests
- defines methods and standards
- manages reminders
- stores data and metadata
- standardizes and disseminates data

DATA SUPPLIER
- receives data requests
- elaborates data requests
- prepares data to be sent
- sends data to data collector

1. Official statistics and data integration
Data collection process (2)

Current solutions:
- Data collection through File Transfer Protocol (FTP)
- Data uploading through an ad hoc website to manage reminders and data supply requests

Critical issues:
- Management of data requests and reminders
- Complex IT infrastructure
- Burden for data suppliers
- Human resources for transaction management

THESE SOLUTIONS DO NOT PERMIT PROCESS AUTOMATION

2. Technology
Representational State Transfer (REST)

…the World Wide Web offers a possible solution!

HTTP (Hypertext Transfer Protocol), the set of rules for transferring files on the Web, can be conveniently used for data collection and data exchange. It is a request/response protocol based on the client-server architecture.

Representational State Transfer (REST):
- is not a standard, but an architectural style for designing networked applications;
- defines a set of guidelines for using the HTTP protocol to perform the four operations summarized in the acronym CRUD (Create, Read, Update, Delete), by means of an API (Application Programming Interface).
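The correspondence between CRUD operations and HTTP methods can be sketched as follows. This is only an illustration: the mapping shown (POST, GET, PUT, DELETE) is the common convention rather than something stated in the presentation, and the `example.org` URLs and the `build_request` helper are hypothetical. The requests are built but not sent.

```python
import json
from urllib.request import Request

# Conventional CRUD-to-HTTP mapping (an assumption; some APIs also use PUT to create).
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}

def build_request(operation, url, payload=None):
    """Build (but do not send) an HTTP request for a CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)

# Hypothetical resource URLs, for illustration only.
create_req = build_request("create", "http://example.org/resources", {"name": "x"})
read_req = build_request("read", "http://example.org/resources/1")
```

Passing one of these objects to `urllib.request.urlopen` would actually perform the call against a real server.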

2. Technology
CRUD principles

REST is a service concept that may be summarized by the CRUD principles.

REST allows data suppliers to create, read and update resources with a logic similar to that used to perform operations on any SQL database.
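For comparison, the SQL side of the analogy might look like the sketch below, run on an in-memory SQLite database. The table and column names are hypothetical, and the HTTP methods in the comments reflect the usual convention, not a rule stated in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT, value REAL)")

# Create (conventionally HTTP POST)
conn.execute("INSERT INTO resource (id, name, value) VALUES (?, ?, ?)",
             (1, "indicator", 10.0))

# Read (conventionally HTTP GET)
created = conn.execute("SELECT name, value FROM resource WHERE id = ?", (1,)).fetchone()

# Update (conventionally HTTP PUT)
conn.execute("UPDATE resource SET value = ? WHERE id = ?", (12.5, 1))
updated = conn.execute("SELECT value FROM resource WHERE id = ?", (1,)).fetchone()[0]

# Delete (conventionally HTTP DELETE)
conn.execute("DELETE FROM resource WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM resource").fetchone()[0]
```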

2. Technology

The REST architecture makes it possible to separate the relational DB from the client through an API, which uses HTTP to transmit data and exchange information.
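A minimal sketch of this separation is given below, assuming a WSGI-style Python callable as the API layer and an in-memory SQLite table as the relational DB. All names (`api`, `call`, the `supply` table) are hypothetical and do not describe the DataSTAT Hub implementation. The client only sees HTTP methods and JSON; the SQL stays behind the API.

```python
import io
import json
import sqlite3

# Relational store hidden behind the API; clients never query it directly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE supply (id TEXT PRIMARY KEY, body TEXT)")

def api(environ, start_response):
    """WSGI-style callable: PUT stores a JSON body under the path, GET returns it."""
    key = environ["PATH_INFO"].strip("/")
    method = environ["REQUEST_METHOD"]
    if method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        json.loads(body)  # raises ValueError if the payload is not valid JSON
        db.execute("INSERT OR REPLACE INTO supply VALUES (?, ?)", (key, body))
        status, payload = "200 OK", body
    elif method == "GET":
        row = db.execute("SELECT body FROM supply WHERE id = ?", (key,)).fetchone()
        status, payload = ("200 OK", row[0]) if row else ("404 Not Found", "{}")
    else:
        status, payload = "405 Method Not Allowed", "{}"
    start_response(status, [("Content-Type", "application/json")])
    return [payload.encode("utf-8")]

def call(method, path, body=b""):
    """Invoke the app in-process with a minimal environ, without any network."""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = api(environ, start_response)
    return captured["status"], b"".join(chunks).decode("utf-8")

put_status, _ = call("PUT", "/dataset-1", b'{"a": 1}')
get_status, get_body = call("GET", "/dataset-1")
```

The same callable could be exposed over real HTTP with `wsgiref.simple_server.make_server` from the standard library.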

3. Model

The different types of data, IT tools and skills of data suppliers call for a model with the following features:
- UNSTRUCTURED DATA - a model that collects data in their essential form (key/value) is more convenient and immediate than defining multiple standards for data representation;
- SCALABILITY - a highly extensible architecture is needed to accommodate possible future conceptual or architectural upgrades;
- INTUITIVE SCHEMA - the model should be easy for data suppliers to apply, without requiring complex study of any imposed standard;
- BIG-DATA-ORIENTED ARCHITECTURE - the system should be in line with big-data processing techniques;
- INTEGRATION WITH MODERN IT TOOLS FOR BIG DATA - storage is closely linked to the tools used for semantic search, data analysis and data visualization. Elasticsearch, Hadoop, Solr and Cassandra provide a complete integrated environment for managing them.

3. Model
KEY/VALUE storage model

The format best suited for use over HTTP is JSON (JavaScript Object Notation), with which different models for data representation can be associated. In particular, when dealing with highly heterogeneous data, it is advisable to use a model that represents them in their simplest form: a key/value pair.

Statistical Key Value Data Model:

    {
      "keyspace": {
        "columnfamily": {
          "rowkey": {
            "supercolumn": {
              "column name": "column value"
            }
          }
        }
      }
    }
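As an illustration, a record following this nesting (keyspace, column family, row key, super column, columns) could be assembled and serialized to JSON as sketched below. The helper names and every sample value are hypothetical placeholders, not Istat data.

```python
import json

def make_record(keyspace, column_family, row_key, super_column, columns):
    """Nest key/value columns under the four levels of the model."""
    return {keyspace: {column_family: {row_key: {super_column: dict(columns)}}}}

def get_column(record, keyspace, column_family, row_key, super_column, name):
    """Look up a single column value by walking down the nested keys."""
    return record[keyspace][column_family][row_key][super_column][name]

# Placeholder names and values, for illustration only.
record = make_record(
    "example_keyspace", "example_family", "row-001", "measure",
    {"unit": "persons", "value": "1000"},
)
as_json = json.dumps(record, indent=2)
```

Because every level is a plain key, suppliers with very different data can use the same structure without agreeing on a richer schema first.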

4. Architecture

DataSTAT Hub is a data collection tool that takes advantage of the potential offered by HTTP/2 and the REST architecture, and relies on the CRUD operations (Create, Read, Update, Delete).

4. Architecture

Elasticsearch is an open source search engine that can be conveniently used for the collection and release of data. With Elasticsearch, documents/data can be indexed and mapped through querystrings sent via HTTP in JSON format.

DOCUMENT
Most entities or objects in most applications can be serialized into a JSON object, with keys and values. A key is the name of a field or property, and a value can be a string, a number, a Boolean, another object, an array of values, or some other specialized type such as a string representing a date or an object representing a geolocation.

INDEX / TYPE
Documents are indexed (stored and made searchable) by using the index API; the index, type and ID of a document together identify it uniquely.

MAPPING
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
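The two HTTP calls involved, defining a mapping and indexing a document, might be prepared as in the sketch below. It assumes the `/{index}/{type}/{id}` URL layout of the Elasticsearch releases that still had mapping types (later releases removed types, so paths and mapping bodies should be checked against the deployed version). The host, index, type and field names are hypothetical. The requests are built but not sent.

```python
import json
from urllib.request import Request

# 9200 is the default Elasticsearch HTTP port; the host is an assumption.
BASE_URL = "http://localhost:9200"

def json_request(method, path, body):
    """Build (but do not send) a JSON request against the cluster."""
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

def mapping_request(index, doc_type, properties):
    """Create an index and declare how the fields of a type are stored and indexed."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return json_request("PUT", f"/{index}", body)

def index_request(index, doc_type, doc_id, document):
    """Index one document under index/type/id, the triple that identifies it."""
    return json_request("PUT", f"/{index}/{doc_type}/{doc_id}", document)

# Hypothetical index, type and fields.
mapping = mapping_request("supplies", "record", {
    "reference_date": {"type": "date"},
    "value": {"type": "double"},
})
doc = index_request("supplies", "record", "1",
                    {"reference_date": "2016-10-20", "value": 1.5})
```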

4. Architecture

[Diagram: DATA SUPPLIER, ELASTICSEARCH, DATA STORAGE, OUTPUT CHANNEL]

DATA SUPPLIER
Data suppliers can autonomously create data indexes, describe data content and perform any operation on them (put/update/delete/get).

DATA STORAGE
Data contained in the index can be easily stored in a database that uses the key/value model (e.g. Cassandra).

OUTPUT CHANNEL
Indexed data have an immediate dissemination channel, to which Elasticsearch can be attached as a powerful engine for searching big data, possibly together with an API that standardizes the output.
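One way an indexed JSON document could be reduced to key/value pairs before being written to a key/value store is sketched below. This is an illustrative flattening routine only: the dotted-path convention and the sample document are assumptions, and a plain dictionary stands in for the store rather than any Cassandra API.

```python
def flatten(document, prefix=""):
    """Turn a nested JSON-like dict into a flat {dotted.path: value} mapping."""
    pairs = {}
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            pairs.update(flatten(value, path))
        else:
            pairs[path] = value
    return pairs

# Hypothetical indexed document.
document = {"supplier": {"name": "example", "type": "ministry"}, "year": 2015}
store = flatten(document)  # a plain dict stands in for the key/value store
```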

4. Architecture
DataSTAT Hub applied to statistical classifications: www.statisticlass.eu

[Diagram: DATA SUPPLIER, SEARCH ENGINE, REST WEBSERVICES, ELASTICSEARCH, WIDGET / USER INTERFACE]

5. Concluding remarks

DataSTAT Hub is a suitable and easy-to-use tool for the automated collection, standardization and integration of administrative data.

- Reduction of burden on users: the hub does not require knowledge of the internal database, since updating is performed through HTTP querystrings, and it can be used with any programming language; once created, the procedure can be reused for every subsequent data supply.
- Reduction of costs in terms of the human resources employed for organizational, bureaucratic and IT management.

By allowing us to overcome some critical issues related to the use of administrative data, including those connected with privacy and security, a tool such as DataSTAT Hub is time-saving and cost-effective.

It is a user-friendly tool developed with open source technologies; it can be conveniently shared among NSOs and extended to any other institution.

THANK YOU FOR YOUR ATTENTION FOR ANY QUESTIONS CONTACT US: alessandro.capezzuoli@istat.it emanuela.recchini@istat.it