Lifting Data Portals to the Web of Data

Presentation transcript:

Lifting Data Portals to the Web of Data
Sebastian Neumaier, Jürgen Umbrich, Axel Polleres
Vienna University of Economics and Business, Vienna, Austria

Open Data Portal Watch Project
Monitoring and QA over evolving data portals
3/2015 [1]:
- 90 portals
- Only CKAN
8/2015 [2]:
- 6 quality metrics
- QA
6/2016 [3]:
- 260 portals
- CKAN, Socrata, OpenDataSoft
- 18 metrics
[1] Towards assessing the quality evolution of open data portals. In: ODQ2015: Open Data Quality Workshop, Munich, Germany
[2] Quality assessment & evolution of open data portals. In: International Conference on Open and Big Data, Rome, Italy (2015)
[3] Automated quality assessment of metadata across open data portals. ACM Journal of Data and Information Quality (2016)

Identified challenges
Metadata is heterogeneous and (partially) messy
- Software-specific metadata (CKAN vs Socrata vs …)
- Portal-specific metadata
- Missing metadata (file formats, API descriptions, …)
Metadata not available as Linked Data
- Only partially in DCAT vocabulary
- No mappings for additional metadata fields
Poor discoverability of datasets
- No content information in metadata (e.g., CSV headers)
- Datasets' metadata not optimized for search engines
Metadata still in silos -> bring it to the Web

Current progress
Enable access
- SPARQL, Memento, sitemap.xml
Mapping to standard vocabularies
- DCAT, Schema.org
Enrich the datasets
- DQV, CSVW, PROV

Mapping to standard vocabularies
DCAT export of metadata from CKAN, Socrata and OpenDataSoft portals
- Mapping of most frequent (portal/domain specific) extra-metadata fields
Mapping and publishing of Schema.org:
- Mapping of DCAT to Schema.org's dataset vocabulary
- Enabling integration into knowledge graphs of major search engines
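The DCAT-to-Schema.org step can be illustrated with a small sketch. The property table below is an assumption chosen for illustration, not the project's actual mapping; the input record, its keys and the example URLs are hypothetical as well.

```python
import json

# Illustrative property mapping (an assumption, not the Portal Watch table).
DATASET_MAP = {
    "dct:title": "name",
    "dct:description": "description",
    "dcat:keyword": "keywords",
    "dct:license": "license",
}
DISTRIBUTION_MAP = {
    "dcat:accessURL": "contentUrl",
    "dcat:mediaType": "encodingFormat",
}


def dcat_to_schema_org(dataset):
    """Translate a flat DCAT-style dict into a Schema.org JSON-LD dict."""
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    for dcat_key, schema_key in DATASET_MAP.items():
        if dcat_key in dataset:
            doc[schema_key] = dataset[dcat_key]
    distributions = []
    for dist in dataset.get("dcat:distribution", []):
        entry = {"@type": "DataDownload"}
        for dcat_key, schema_key in DISTRIBUTION_MAP.items():
            if dcat_key in dist:
                entry[schema_key] = dist[dcat_key]
        distributions.append(entry)
    if distributions:
        doc["distribution"] = distributions
    return doc


# Hypothetical input record.
example = {
    "dct:title": "City population",
    "dcat:keyword": ["population", "cities"],
    "dcat:distribution": [
        {"dcat:accessURL": "http://example.org/data.csv",
         "dcat:mediaType": "text/csv"}
    ],
}
schema_doc = dcat_to_schema_org(example)
print(json.dumps(schema_doc, indent=2))
```

The resulting JSON-LD could be embedded in a dataset's HTML page inside a script element of type application/ld+json, which is one common way to expose Schema.org markup to search engines.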

Enrich the datasets
- Portal Watch quality dimensions: Data Quality Vocabulary (DQV)
- Metadata for tables: CSV on the Web vocabulary (CSVW)
- Record provenance: PROV ontology

CSV on the Web metadata
Dialect properties:
- HTTP Content-Type → dcat:mediaType
- Encoding detection → csvw:encoding
- Delimiter detection → csvw:delimiter
Schema properties:
- CSV header line → csvw:name
- Column type detection → xsd:datatype
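A minimal sketch of how such dialect and schema properties might be derived from a raw CSV file, using only the Python standard library. The detection heuristics (UTF-8 with a Latin-1 fallback, csv.Sniffer for the delimiter, a three-way type guess) and the sample data are assumptions for illustration and are not taken from the Portal Watch implementation.

```python
import csv
import io
import json


def detect_encoding(raw):
    """Naive encoding guess: UTF-8 if it decodes cleanly, else Latin-1."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def guess_datatype(values):
    """Return a CSVW datatype name that fits every value in the column."""
    for name, cast in (("integer", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def describe_csv(raw, url):
    """Build a CSVW-style metadata dict for the CSV bytes in `raw`."""
    encoding = detect_encoding(raw)
    text = raw.decode(encoding)
    dialect = csv.Sniffer().sniff(text, delimiters=";,\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, body = rows[0], rows[1:]
    columns = [
        {"name": name, "datatype": guess_datatype([r[i] for r in body])}
        for i, name in enumerate(header)
    ]
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": url,
        "dialect": {"encoding": encoding, "delimiter": dialect.delimiter,
                    "header": True},
        "tableSchema": {"columns": columns},
    }


# Hypothetical sample file and URL.
sample = ("city;population;area_km2\n"
          "Vienna;1897000;414.9\n"
          "Graz;291000;127.6\n").encode("utf-8")
metadata = describe_csv(sample, "http://example.org/cities.csv")
print(json.dumps(metadata, indent=2))
```

The sketch assumes the first row is a header and that every row has the same number of cells; real portal data would need checks for both.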

The big picture
Bold arrows -> cross-vocabulary links

Enable Access
SPARQL endpoint [1]:
- Three versions in RDF triple store (as named graphs)
- 120 million triples each
Memento framework:
- Datetime negotiation on "Accept-Datetime" HTTP header
- Access to original metadata, DCAT, and DQV measures
Schema.org via sitemap.xml [2]:
- Publishing of all 850k datasets as HTML-embedded Schema.org
[1] http://data.wu.ac.at/portalwatch/sparql
[2] http://data.wu.ac.at/odso/
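Both access paths can be sketched as plain HTTP requests. The endpoint URL is the one given on the slide; the query, the graph layout it assumes, and RESOURCE_URI are hypothetical, since the slides do not state graph names or the URI pattern used for datetime negotiation. The code only builds the requests and does not send them.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "http://data.wu.ac.at/portalwatch/sparql"  # from the slide
RESOURCE_URI = "http://example.org/some/dataset"  # hypothetical placeholder

# Hypothetical query: count DCAT datasets per named graph (one graph per version).
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?g (COUNT(?d) AS ?datasets)
WHERE { GRAPH ?g { ?d a dcat:Dataset } }
GROUP BY ?g
"""


def sparql_request(endpoint, query):
    """SPARQL protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def memento_request(uri, when):
    """Memento datetime negotiation: ask for the state of `uri` at `when`."""
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(uri, headers={"Accept-Datetime": stamp})


sparql_req = sparql_request(SPARQL_ENDPOINT, QUERY)
memento_req = memento_request(
    RESOURCE_URI, datetime(2016, 6, 1, tzinfo=timezone.utc)
)
print(sparql_req.full_url)
print(memento_req.get_header("Accept-datetime"))
```

Passing either request object to urllib.request.urlopen would send it; whether the endpoint is still reachable is not something this sketch can confirm.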

Outlook & Future Work
data.wu.ac.at/portalwatch
Richer CSVW metadata
- XSD datatypes, date(time) patterns, …
Access/query historical data
- Named graphs vs change-based vs timestamp-based
Add links to datasets and external knowledge
- Enrich/link metadata (e.g., tags/keywords)
- Derive semantic labels from data sources
Enrichment:
- CSV to RDF
- Metadata to RDF
Historical data: 2 years of data -> 5 billion statements. Allow users to query over prior versions
- Linked Data Fragments
- BEAR work on benchmarking the evolution of RDF (put reference in)
Sebastian Neumaier
WU Vienna, Institute for Information Business
email: sebastian.neumaier@wu.ac.at
url: https://sebneumaier.wordpress.com/
twitter: @sebneum

Backup Slides

Portal Watch quality measures
Existence
- Is certain metadata available?
Conformance
- Valid emails/URLs/dates
Open Data
- Open format
- Machine-readable
- Open license
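To make the three dimensions concrete, here is a rough sketch of how such measures could be computed for one metadata record. The field names, the list of open formats, the email pattern and the scoring formulas are all assumptions for illustration; the project's actual metric definitions are in the papers cited on the earlier slide.

```python
import re
from urllib.parse import urlparse

# Hypothetical field names and format list, for illustration only.
EXPECTED_FIELDS = ("title", "description", "license", "contact_email", "url")
OPEN_FORMATS = {"csv", "json", "xml", "rdf"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def existence(meta):
    """Share of expected fields that are present and non-empty."""
    present = sum(1 for field in EXPECTED_FIELDS if meta.get(field))
    return present / len(EXPECTED_FIELDS)


def valid_url(value):
    """True if the value parses as an absolute http(s) URL."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def conformance(meta):
    """Share of provided email/URL values that are syntactically valid."""
    checks = []
    if meta.get("contact_email"):
        checks.append(bool(EMAIL_RE.match(meta["contact_email"])))
    if meta.get("url"):
        checks.append(valid_url(meta["url"]))
    return sum(checks) / len(checks) if checks else None


def open_format(meta):
    """True if the declared file format is in the open-format list."""
    return meta.get("format", "").lower() in OPEN_FORMATS


# Hypothetical record with an empty description and a malformed email.
record = {
    "title": "City population",
    "description": "",
    "license": "CC-BY-4.0",
    "contact_email": "not-an-email",
    "url": "http://example.org/d/1",
    "format": "CSV",
}
scores = {
    "existence": existence(record),
    "conformance": conformance(record),
    "open_format": open_format(record),
}
print(scores)
```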

Add provenance information
We annotate:
- Mapped DCAT description
- Quality measurements
- CSVW metadata
PROV activities:
- "fetch" activity
- "csvw" activity