Highlighting the added value of Statistical Linked Open Data
Monica Scannapieco, Raffaella Aracri, Andrea Pagano, Paolo Pizzo, Laura Tosco, Luca Valentino (Istat)
Giovanni Corcione (Oracle)
Istat’s Linked Open Data Portal
As of today, the LOD Portal gives access to about 900 million RDF triples.
Traffic for year 2016: 24,000 unique visitors and 750,000 hits.
Istat LOD Portal: http://datiopen.istat.it
English version: http://datiopen.istat.it/index.php?language=eng
Highlighting the added value of Statistical Linked Open Data, Monica Scannapieco – Brussels, NTTS, 14-16 March 2017
Big investment on ontology modeling
Ontologies: Territory, Population, Households, Dwellings.
Territory: describes the administrative and geographical features of the Italian territory.
Population: describes the measures and dimensions of the indicators with respect to people.
Households: describes the measures and dimensions of the indicators with respect to households.
Dwellings: describes the measures and dimensions of the indicators with respect to dwellings.
Territory classes shown: Administrative Boundaries, Statistical-geographical Boundaries, Special Areas, Special Units.
Examples shown: Regions, Provinces, Municipalities, Localities, Census sections, Abbeys, Hospitals.
Indicators are described by Measures and Dimensions (for example Sex, Age, Marital status).
IT Architecture of the Portal
[Architecture diagram: open source components and Oracle 12c]
First scenario: use case
A bookseller wants to open a new international bookshop. While inspecting an available location, he wants a market analysis of the people resident in the areas adjacent to the possible location of the store, broken down by age, country of origin, educational level and employment status.
First scenario: workflow
An app on the bookseller's smartphone detects the local GPS coordinates.
It sends this information to the datiopen.istat.it SPARQL endpoint, which accepts queries to Istat’s triple store over the HTTP protocol.
The endpoint returns the required information.
Data are visualized on the smartphone.
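The request step of this workflow can be sketched as follows. This is a minimal sketch, not the portal's actual client code: the `/sparql` path in `ENDPOINT` and the helper names are assumptions (the slides only name the portal host), while the `query` parameter and the JSON results media type come from the standard SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Assumption: the "/sparql" path is illustrative; the slides only give the portal host.
ENDPOINT = "http://datiopen.istat.it/sparql"


def position_to_wkt(lon: float, lat: float) -> str:
    """Format a GPS position as a WKT point (longitude first, then latitude)."""
    return f"POINT({lon} {lat})"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (without sending) an HTTP GET request that carries the query
    in the `query` parameter, as defined by the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access.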
First scenario: SPARQL query
A GeoSPARQL query retrieves the results. It identifies the census sections nearest to the detected position and returns, for each of them, the related WKT geometry and the resident population according to the specified profile.
The query has two parts: one identifies the census section nearest to the detected position, the other selects population indicators by census section.
WKT (Well-known text): a language for representing vector geometry objects on a map, a coordinate reference system (a projection), and a transformation between coordinate systems. A binary equivalent, Well-Known Binary (WKB), is generally used to store the same information in a database. The format is maintained by the Open Geospatial Consortium (OGC).
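The query shown on the original slide is not available in this text, so the following is only a sketch of what such a GeoSPARQL query might look like. The `geo:`, `geof:` and `uom:` terms are from the OGC GeoSPARQL standard; every `ex:` class and property is a hypothetical placeholder, not a term of the Istat ontologies, and the population profile (age, country of origin and so on) is reduced to a single value for brevity.

```python
def nearest_sections_query(lon: float, lat: float, limit: int = 5) -> str:
    """Return a GeoSPARQL query for the census sections nearest to a position.

    All ex: terms are hypothetical placeholders for the real ontology terms.
    """
    point = f"POINT({lon} {lat})"  # WKT point, longitude first
    return f"""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex:   <http://example.org/census#>

SELECT ?section ?wkt ?population ?distance
WHERE {{
  ?section a ex:CensusSection ;
           ex:residentPopulation ?population ;
           geo:hasGeometry ?geometry .
  ?geometry geo:asWKT ?wkt .
  BIND(geof:distance(?wkt, "{point}"^^geo:wktLiteral, uom:metre) AS ?distance)
}}
ORDER BY ASC(?distance)
LIMIT {limit}
"""
```

Ordering by `geof:distance` and applying `LIMIT` is one simple way to express "nearest"; a production query would likely restrict the candidate sections first, since computing the distance to every census section can be expensive.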
First scenario: result
Example of visualization of the result on a smartphone.
Second scenario: use case
The head of the technical office of an Italian province has to analyze the state of degradation of the buildings in the municipalities of the province in relation to land use.
Second scenario: Federated Querying
With LOD, it is easy to carry out analyses that compare data coming from different sources (linked, for example, at territorial level).
A federated query on ISTAT and ISPRA (Italian National Institute for Environmental Protection and Research) is issued on one portal and accesses both the ISTAT and ISPRA portals.
Results are dynamically retrieved from both portals.
Second scenario: workflow
The application builds a federated query between the data published by ISTAT and the data published by ISPRA.
Results are retrieved from both triple stores (data have been linked at the municipality level).
The data obtained can be visualized on a chart.
Second scenario: SPARQL query
For the municipalities in the province, the federated query selects: name, cadastral code, resident population and an indicator of the number of buildings in a bad state of preservation (from the Istat triple store), and an indicator of land use expressed as a percentage (from the ISPRA triple store).
The query has two parts: a query to the local endpoint (ISTAT) and a query to the remote endpoint (ISPRA).
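A federated query of this shape can be sketched with the SPARQL 1.1 `SERVICE` keyword, which delegates part of the pattern to a remote endpoint. The original query is not reproduced in this text, so everything below is illustrative: the `ex:` and `rx:` terms, the province filter and the remote endpoint URL passed in by the caller are hypothetical, and joining the two sources on a shared cadastral code is an assumption about how the municipality-level link is expressed.

```python
def federated_query(province_code: str, remote_endpoint: str) -> str:
    """Return a federated SPARQL query joining local and remote data
    on the municipality's cadastral code (assumed join key).

    All ex:/rx: terms are hypothetical placeholders.
    """
    return f"""
PREFIX ex: <http://example.org/istat#>
PREFIX rx: <http://example.org/ispra#>

SELECT ?name ?cadastralCode ?population ?badStateBuildings ?landUsePct
WHERE {{
  # Local part: evaluated on the endpoint that receives the query (ISTAT)
  ?municipality a ex:Municipality ;
                ex:inProvince "{province_code}" ;
                ex:name ?name ;
                ex:cadastralCode ?cadastralCode ;
                ex:residentPopulation ?population ;
                ex:buildingsInBadState ?badStateBuildings .
  # Remote part: sent to the other endpoint (ISPRA) via SERVICE
  SERVICE <{remote_endpoint}> {{
    ?area rx:cadastralCode ?cadastralCode ;
          rx:landUsePercentage ?landUsePct .
  }}
}}
ORDER BY ?name
"""
```

Because `?cadastralCode` appears both outside and inside the `SERVICE` block, the query engine joins the local and remote solutions on it, which is what lets a single query return indicators from both portals.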
Second scenario: result
Example of resulting chart. The municipalities are represented by their cadastral code. The details of all retrieved information for a single municipality appear when hovering over its point.
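Turning query results into chart points can be sketched as below, assuming the endpoint returns the standard SPARQL 1.1 JSON results format. The function and variable names are hypothetical, and the slides do not say which indicator goes on which axis, so the axes are left to the caller.

```python
import json


def bindings_to_points(results_json: str, x_var: str, y_var: str, label_var: str) -> list:
    """Convert SPARQL JSON results into a list of labelled (x, y) chart points."""
    data = json.loads(results_json)
    points = []
    for row in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding; skip such rows.
        if x_var not in row or y_var not in row or label_var not in row:
            continue
        points.append({
            "label": row[label_var]["value"],   # e.g. the cadastral code
            "x": float(row[x_var]["value"]),
            "y": float(row[y_var]["value"]),
        })
    return points
```

Each resulting dictionary holds what a charting library needs to draw one point and show its label on hover.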
Conclusions
The next planned release is the National Italian Registry of Addresses (with civic numbers), recognized as a priority also by the Italian Agency for IT in Public Administration.
The advanced services described in the use cases could be particularly suitable for these next releases as well.
A dissemination strategy based on open data puts Official Statistics users at the center by: reaching them through different channels, e.g. apps; making it easier for them to retrieve data, e.g. federated queries that make the distribution of data across different portals transparent; and providing richer services to them, e.g. spatial querying and dynamic visualizations.