Toward a framework for statistical data integration Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, A Min Tjoa Linked Data Lab,

Slides:



Advertisements
Similar presentations
Improving Human-Semantic Web Interaction: The Rhizomer Experience Roberto García and Rosa Gil GRIHO - Human Computer Interaction Research Group Universitat.
Advertisements

Status on the Mapping of Metadata Standards
National Institute of Statistics, Geography and Informatics (INEGI) Implementation of SDMX in Mexico.
Digital Repositories – Linked Open Data – the possible Role of D4Science Workshop, December 2010, FAO use cases A tool to create Linked Data providers.
Supported by EU projects 12/12/2013 Athens, Greece Open Data in Agriculture Hands-on with data infrastructures that can power your agricultural data products.
Semantic Web Introduction
CZECH STATISTICAL OFFICE | Na padesatem 81, Prague 10 | Jitka Prokop, Czech Statistical Office SMS-QUALITY The project and application.
0 General information Rate of acceptance 37% Papers from 15 Countries and 5 Geographical Areas –North America 5 –South America 2 –Europe 20 –Asia 2 –Australia.
Geographical Service: Gianluca Correndo, Manuel Salvadores, Yang Yang, Nicholas Gibbins, Nigel Shadbolt A compass for the Web of Data.
A division of Publishing Technology Facet Building Web Pages With SPARQL SWIG-UK Event, HP Labs November 23 rd 2007 Leigh Dodds Chief Technology Officer,
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
RDF: Building Block for the Semantic Web Jim Ellenberger UCCS CS5260 Spring 2011.
© Copyright 2008 STI INNSBRUCK Rhizomer “The Rhizomer Semantic Content Management System” Roberto Garcia, Juan.
Distributed Data Analysis & Dissemination System (D-DADS) Prepared by Stefan Falke Rudolf Husar Bret Schichtel June 2000.
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Publishing data on the Web (with.
Rajashree Deka Tetherless World Constellation Rensselaer Polytechnic Institute.
From Web 1.0  Web 3.0: Is RDF access to RDB enough? Vipul Kashyap Senior Medical Informatician, Clinical Informatics R&D Partners.
WP.5 - DDI-SDMX Integration E.S.S. cross-cutting project on Information Models and Standards Marco Pellegrino, Denis Grofils Eurostat METIS Work Session6-8.
Workshop – 10, December 2014, Berlin ICCS / NTUA Greece Efthymios Chondrogiannis An Intelligent Ontology Alignment Tool Dealing with Complicated Mismatches.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
1 Country report 2014 – Statistics Norway PC-Axis Reference Group meeting
CHRIS NELSON METADATA TECHNOLOGY WORK SESSION ON STATISTICAL METADATA GENEVA 6-8 MAY 2013 Designing a Metadata Repository Metadata Technology Ltd.
Quality issues on the way from survey to administrative data: the case of SBS statistics of microenterprises in Slovakia Andrej Vallo, Andrea Bielakova.
Dynamic Hypermedia Generations through a Mediator using CRM and Web Service Jen-Shin Hong National ChiNan University,Taiwan
CountryData Technologies for Data Exchange SDMX Information Model: An Introduction.
Boris Villazón-Terrazas, Ghislain Atemezing FI, UPM, EURECOM, Introduction to Linked Data.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
Using Semantic Mapping to Manage Heterogeneity in XLIFF Interoperability by Dave Lewis, Rob Brennan, Alan Meehan, Declan O’Sullivan CNGL Centre for Global.
Slide 1 Eurostat Unit B3 – Statistical Information Technologies CoRD Meeting – 4 June 2007 Agenda Item 8 Preliminary ideas for a 2011 census hub Giuseppe.
Semantic Enhancement: Key to Massive and Heterogeneous Data Pools Violeta Damjanovic, Thomas Kurz, Rupert Westenthaler, Wernher Behrendt, Andreas Gruber,
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
Eurostat 1 7a. Practical use case 1: Pesticides Use Project Blanaru Cristina Eurostat Unit B5: “Central data and metadata services” SDMX Basics course,
A Data Structure Definition for Eurostat's short-term business statistics Jan Planovsky, Eurostat SDMX Global Conference Bangkok, September 2015.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Formal Representation and Harmonization of Open Government Data with Semantic Web Technologies Markus Ungersböck.
SDMX IT Tools Introduction
Visualization Four groups Design pattern for information visualization
COMMUNITY. Data Acquisition and Usage Value Chain.
RDFPath: Path Query Processing on Large RDF Graph with MapReduce Martin Przyjaciel-Zablocki et al. University of Freiburg ESWC May 2013 SNU IDB.
Distributed Data Analysis & Dissemination System (D-DADS ) Special Interest Group on Data Integration June 2000.
7b. SDMX practical use case: Census Hub
CENSUS OUTPUTS Dissemination Plans Chris Ashford 2011 Census Outputs : Technical Delivery.
The Semantic Web. What is the Semantic Web? The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling.
Steven Perry Dave Vieglais. W a s a b i Web Applications for the Semantic Architecture of Biodiversity Informatics Overview WASABI is a framework for.
CoRD Meeting 12 March 2003 STIPES (Lot 4) STIPES = Statistical Inquiries from Popular European Software.
Renovation of Eurostat dissemination chain
Chapter 04 Semantic Web Application Architecture 23 November 2015 A Team 오혜성, 조형헌, 권윤, 신동준, 이인용.
Linked Open Data for European Earth Observation Products Carlo Matteo Scalzo CTO, Epistematica epistematica.
SDMX Basics course, March 2016 Eurostat SDMX Basics course, March Introducing the Roadmap Marco Pellegrino Eurostat Unit B5: “Data and.
XML and Distributed Applications By Quddus Chong Presentation for CS551 – Fall 2001.
1 Mashup Workflow. 2 What We Have 3 Challenges with REST APIs * Only ask what its built to answer * No standard - must relearn each time * Opaque - no.
ΕΚΤ Access to Knowledge ΕΚΤ Access to Knowledge R&D Statistics Information System: An Interoperability Tail between CERIF and SDMX Dimitris Karaiskos Dimitrios.
Integrating Data for Archaeology
Query Rewriting Framework for Spatial Data
EU-US Open Data project
Interoperable data formats: SDMX
SDMX Information Model
The Re3gistry software and the INSPIRE Registry
Linked Data for SDG Reporting
11. The future of SDMX Introducing the SDMX Roadmap 2020
SDMX Information Model: An Introduction
LOD reference architecture
Developing a Data Model
Ioannis Xirouchakis / Unit B3
TOOLS & Projects overview
Item 7.3 (b) SDMX for UOE data collection
NTTS 2019 Conference / Brussels / Belgium
9. Practical use case 3: Pesticides Use Project
GISCO Working Party Mirosław Migacz Chief GIS Specialist
“Argentina´s first steps in SDMX”
Presentation transcript:

Toward a framework for statistical data integration Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, A Min Tjoa Linked Data Lab, Vienna University of Technology

Toward a framework for statistical data integration Agenda  Introduction  Data Integration Framework  Use case  Conclusion and Future Work 2 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Problem Statement  Statistical data  is growing fast  opens up opportunities for interesting applications  But: it is difficult to integrate statistical data due to  diversity in access mechanisms  inconsistencies in vocabulary and entity naming  limited adoption of existing standards 3 Data Cube Vocabulary Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Integration Approach Propose a framework for statistical data integration based on:  Consolidation of  Data format: RDF  Access mechanism: SPARQL query  Standard adoption:  vocabulary - Data cube vocabulary  property, code list – SDMX’s content-oriented guidelines (COG)  Components 1.RML mapping service: transforms data from non-RDF formats to RDF format 2.Metadata repository: uses standards to provide interconnection between data sets 3.Mediator: queries and integrates data 4 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Example use case Compare the population of the UK according to three different data sources 5 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Architecture 6 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration RML Mapping Service  RML (RDF Mapping Language)  transform data from non-RDF formats into RDF format  supported formats: JSON, XML, CSV, TSV  Our service  Our extensions  XLS format support  parameterized RML mappings: variables in mapping e.g., country code, indicators 7 JSON XML CSV TSV RML Mapping Service Mappings to RDF following Data cube vocabulary URI Data set RDF Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Two main parts in structure of metadata 1. Structure and access method for the data set  represent components in each data set  represent method to query this data set: SPARQL, API, and RML mapping 2. Co-reference information: links each component/value to its corresponding identifier  use properties in COG: sdmx-d:refArea,sdmx-m:obsValue …  use code lists in COG: code list of sdmx-d:sex  define new code lists which are not available in COG: spatial values (use Google Geo coding service) and temporal values (use UK time reference service) 8 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Structure of Semantic Metadata 9 Metadata WB’s indicators Sparql, Api, Rml Co-reference value Co-reference property Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Issue with COG  Issue  COG has only one measure property, sdmx- m:obsValue Measure a Measure b  loss of meaning of observed measure  querying and integrating data is difficult  a data set can have multiple measures  Approach  use World Bank’s indicators as topic set  assign topic to each data set  split a multi-measure data set into multiple single-measure data sets 10 DS a Measure a Measure b sdmx-m:obsValue RDF DS b sdmx-m:obsValue Introduction Data Integration Framework Use Case Conclusion and Future Work Topic a Topic b

Toward a framework for statistical data integration Inconsistent Number of Dimensions  Inconsistent number of dimensions in each data set  WB, UK data sets: 2 dimensions  EU data set: 5 dimensions  Approach  identify a fixed value (if any) for each dimension  value: unique or aggregated value  assign dimensions that do not appear in both data sets with their fixed values 11 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Mediator 12 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Mediator 13 Query Acceptance Input: a SPARQL query uses consolidated properties/values Query Rewriting Identify suitable data sets in the repository Rewrite the input query based on co-reference information Send queries to SPARQL endpoints or RML mapping service Result Rewriting Rewrite each result based on co-reference information Apply filter conditions to each result Consolidate different units/scales to a common unit/scale Integrate results Return Result Output: Return integrated result to user/application Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Role of Stakeholders 1. Data providers  publish data in varying formats  create RML mapping to transform their data from non-RDF formats to RDF format 2. Developers  build innovative data integration applications  create RML mapping for their interest data sets 3. End users  lack semantic web knowledge and programming skills  use appropriate tools to generate SPARQL queries, compare, and visualize data 14 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Requirements for Statistical Data Integration  Principles  Object: two single-measure data sets  Comparison: based on the co-reference identifiers of data sets  Requirements of data structure  They have the same sets of dimensions, same measure, and same topic  They have the same sets of dimensions, same measure, but different topics  They have the same measure and same topic. We can use fixed values to assign dimensions that do not appear in both data sets  Requirement of value: same values 15 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Example: data integration requirements 16 UK World Bank EU Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration PREFIX qb: SELECT * WHERE { ?ds dc:subject. ?o qb:dataSet ?ds. ?o sdmx-m:obsValue ?obsValue. ?o sdmx-d:refPeriod ?refPeriod. ?o sdmx-d:refArea ?refArea. Filter(?refArea= )} Query Acceptance Query Rewriting Result Rewriting Return Result Mediator 17 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Query Acceptance Query Rewriting Result Rewriting Return Result Mediator 18 SELECT * WHERE { ?o qb:dataSet. ?o sdmx-m:obsValue ?obsValue. ?o sdmx-d:timePeriod ?timePeriod. ?o sdmx-d:refArea ?refArea. ?o sdmx-d:freq. ?o sdmx-d:age. ?o sdmx-d:sex. FILTER(?refArea= ) } source/wb&subject= //pebbie.org/ns/wb/countries/GB source/ons_pop EU data set query WB data set query UK data set query Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Query Acceptance Query Rewriting Result Rewriting Return Result Mediator 19 => => ^^ => =>  Rewrite each result based on co-reference information  Apply filter conditions (refArea – UK) to results  Consolidate scales in three results  EU and WB data sets: absolute number scaling  UK data set: millions scale => multiply one million  Integrate results Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Query Acceptance Query Rewriting Result Rewriting Return Result Mediator 20 JSON format XML format HTML format Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Conclusion  A framework for statistical data integration  RML mapping service transforms data into RDF  Semantic metadata repository provides interconnection between data sets  Mediator facilitates cross-datasets querying  Preliminary results  RML mapping service supports CSV, XML, TSV, JSON, XSL formats  Repository focus on properties: sdmx-d:refArea, sdmx-d:refPeriod, sdmx-d: sex, sdmx-m:obsValue generated semi-automatically  Mediator allows simple queries 21 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Future Work  RML Mapping Service  support users to create RML mappings  Repository  increase the number of data sources  split integrated properties/values into separated properties/values  consider to reuse available properties/code list from data publishers  Mediator  support complex queries  Framework  evaluate the performance of the framework 22 Introduction Data Integration Framework Use Case Conclusion and Future Work

Toward a framework for statistical data integration Thank you very much for your attention! 23 Contact: Ba-Lam Do Linked Data Lab Vienna University of Technology, Austria Introduction Data Integration Framework Use Case Conclusion and Future Work