Introduction to metadata cleansing using SPARQL update queries

April 2014, PwC EU Services

Learning objectives
By the end of this module, you will have an understanding of:
- how to transform your metadata using simple SPARQL Update queries;
- how to conform to the ADMS-AP to get your interoperability solutions ready to be shared on Joinup;
- the main types of errors that you could face when uploading metadata of interoperability solutions on Joinup.

How can this tutorial help you?
Owners of interoperability solutions may be able to generate the descriptive metadata of their solutions automatically in RDF. Sometimes this metadata does not conform to the ADMS Application Profile for Joinup (ADMS-AP), which prevents it from being uploaded on Joinup. This tutorial provides basic knowledge on how to transform and cleanse RDF metadata using SPARQL Update queries so that it conforms to the ADMS-AP. SPARQL is the query language for RDF; it also allows for creating, updating and deleting RDF triples.
“Since its launch in 2011 Joinup has been steadily growing in popularity. It currently receives more than 60.000 visits per month and is hosting some 130 online communities.”
ADMS-AP: https://joinup.ec.europa.eu/asset/adms/asset_release/adms-application-profile-joinup

Outline
1. The context: ADMS-AP for describing your interoperability solutions; about SPARQL; about RDF
2. Construct ADMS-AP compliant RDF: why?; construct queries
3. Metadata cleansing: the main queries; 3 examples
4. Metadata upload to Joinup

What is the ADMS Application Profile for Joinup (ADMS-AP)?
The Asset Description Metadata Schema Application Profile is a common vocabulary used for all types of interoperability solutions. It allows providers of interoperability solutions to describe their solutions and easily upload the descriptions on Joinup. It allows users to easily discover and reuse interoperability solutions from Joinup through that common vocabulary.

ADMS-AP for describing your interoperability solutions on Joinup
[Figure: repositories of public administrations, standardisation bodies, academia and businesses (including your repository) describe their solutions with ADMS-AP and feed them into Joinup; using the ADMS Application Profile, users can explore, find, select and obtain solutions.]

Automatic or manual path to generate ADMS-AP
[Figure: interoperability solutions follow either a manual path (transformation with Open Refine) or an automatic path (cleansing with SPARQL).]
This tutorial focuses on the automatic path to generate ADMS-AP compliant RDF. See how to transform with Open Refine: https://joinup.ec.europa.eu/svn/adms/trainings/Introduction_to_Open_Refine_RDF_tool.pptx

SPARQL Protocol and RDF Query Language (SPARQL)
- SPARQL is the standard language to query graph data represented as RDF triples.
- It is one of the three core standards of the Semantic Web, along with RDF and OWL.
- It became a W3C standard in January 2008.
- SPARQL 1.1 has been the standard since 2013.

The Resource Description Framework (RDF)
RDF represents data as (subject, predicate, object) triples. A set of triples is an RDF graph.
[Figure: the resource http://myasset.eu/ is linked by rdf:type to adms:Asset and by dct:title to the literal "My asset name".]
- Resources (URIs), often abbreviated with a prefix, e.g. rdf:type, adms:Asset
- Plain literals: "Text", "Text"@en
- Typed literals: "42"^^xsd:integer, "2014-01-01"^^xsd:date
NB: subjects and objects may also be blank nodes.
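
The triple model can be mirrored in a few lines of plain Python. This is a toy sketch and does not reflect how RDF libraries work internally. IRIs (full or prefixed) are plain strings, literals use a hypothetical `Literal` tuple, and a graph is a Python set of 3-tuples. Because a graph is a set, adding the same triple twice changes nothing.

```python
from typing import NamedTuple, Optional, Union


class Literal(NamedTuple):
    """An RDF literal: lexical value plus an optional language tag or datatype."""
    value: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


# Toy encoding: IRIs (full or prefixed) are plain Python strings.
Term = Union[str, Literal]
Triple = tuple[Term, Term, Term]

# The graph drawn on the slide: one subject, two triples.
graph: set[Triple] = {
    ("http://myasset.eu/", "rdf:type", "adms:Asset"),
    ("http://myasset.eu/", "dct:title", Literal("My asset name")),
}

# The other literal forms listed on the slide.
tagged = Literal("Text", lang="en")
typed_int = Literal("42", datatype="xsd:integer")
typed_date = Literal("2014-01-01", datatype="xsd:date")


def subjects(g: set[Triple]) -> set[Term]:
    """Return the distinct subjects of a graph."""
    return {s for s, _, _ in g}


# A graph is a set: adding a triple it already contains changes nothing.
graph.add(("http://myasset.eu/", "rdf:type", "adms:Asset"))
print(len(graph), subjects(graph))  # 2 {'http://myasset.eu/'}
```

Real tools, such as the rdflib Python library, provide proper term classes and parsers for the syntaxes discussed next.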

A graph can be represented with different syntaxes
- RDF/XML, required by Joinup:
    <rdf:Description rdf:about="http://myasset.eu/">
      <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
      <dct:title>My asset name</dct:title>
      <dct:description>Description of the asset</dct:description>
      <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-01-01T00:00:00Z</dct:modified>
    </rdf:Description>
- Turtle, used in SPARQL and in this tutorial:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My asset name" ;
      dct:description "Description of the asset" ;
      dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .
Syntaxes are equivalent. It is easy to transform one into another.

SPARQL is a query language for RDF data
Data:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My asset name" ;
      dct:description "Description of the asset" ;
      dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .
    <http://yourasset.eu/> a adms:Asset ;
      dct:title "Your asset name" ;
      dct:description "Another asset" .
Query:
    SELECT * WHERE {
      ?asset a adms:Asset ;
        dct:title ?title .
    }
Graph pattern: an RDF graph with placeholder variables (e.g., ?asset).
Results:
    ?asset                    ?title
    <http://myasset.eu/>      "My asset name"
    <http://yourasset.eu/>    "Your asset name"
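
A basic graph pattern like the one above can be evaluated by matching each triple pattern in turn and keeping only the variable bindings that agree. The sketch below is a stdlib-only toy under simplifying assumptions:
- every term is a string;
- `?name` marks a variable;
- FILTER and OPTIONAL are not supported.

`match` and `select` are hypothetical helpers, not part of any SPARQL library. Both assets are typed adms:Asset in the data, so both result rows are produced.

```python
from typing import Optional

# Toy encoding: every term is a string; names starting with "?" are variables.
Triple = tuple[str, str, str]
Binding = dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None if impossible."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # An unbound variable takes the value; a bound one must agree with it.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def select(patterns: list[Triple], graph: set[Triple]) -> list[Binding]:
    """Return every binding that satisfies all triple patterns (a basic graph pattern)."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            b2
            for b in solutions
            for triple in graph
            if (b2 := match(pattern, triple, b)) is not None
        ]
    return solutions


data: set[Triple] = {
    ("<http://myasset.eu/>", "a", "adms:Asset"),
    ("<http://myasset.eu/>", "dct:title", '"My asset name"'),
    ("<http://myasset.eu/>", "dct:description", '"Description of the asset"'),
    ("<http://yourasset.eu/>", "a", "adms:Asset"),
    ("<http://yourasset.eu/>", "dct:title", '"Your asset name"'),
    ("<http://yourasset.eu/>", "dct:description", '"Another asset"'),
}

# SELECT * WHERE { ?asset a adms:Asset ; dct:title ?title . }
query = [("?asset", "a", "adms:Asset"), ("?asset", "dct:title", "?title")]
for row in sorted(select(query, data), key=lambda r: r["?asset"]):
    print(row["?asset"], row["?title"])
```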

SPARQL queries have many forms
- SPARQL SELECT to query data from a graph (not used in this tutorial)
- SPARQL CONSTRUCT to transform one graph into another (used for creating ADMS-AP from existing RDF)
- SPARQL Update to modify a graph in place (used to cleanse ADMS-AP metadata)

A useful tool to transform RDF files: TopBraid Composer
Used to create and edit RDF files and run SPARQL queries over them. A free version is also available.
“TopBraid Composer is the leading industrial-strength RDF editor and OWL ontology editor, as well as the best SPARQL tool on the market.” Source: http://semanticweb.org/
For download: http://www.topquadrant.com/downloads/

Construct ADMS-AP from existing RDF: why?
You may already have the metadata description of your interoperability solutions in an RDF file that is not compliant with ADMS-AP (e.g. it lacks mandatory properties or does not use the recommended controlled vocabularies). The following slides help you create a compliant ADMS-AP RDF graph from your initial RDF.

Construct ADMS-AP from existing RDF … using a SPARQL CONSTRUCT query
    CONSTRUCT {                                  # result graph to construct
      ?asset a adms:Asset ;
        dct:title ?title ;
        dct:description ?description ;
        dct:modified ?modified ;
        dct:type <http://purl.org/adms/assettype/Ontology> ;
        dct:relation ?related ;
        dcat:distribution ?d .
      ?d a adms:AssetDistribution ;
        dcat:accessURL ?asset .
    }
    WHERE {                                      # graph pattern to query
      ?asset a voaf:Vocabulary ;
        dct:title ?title ;
        dct:description ?description ;
        dct:modified ?modified .
      OPTIONAL { ?asset voaf:similar ?related }  # recommended and optional fields
      BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)  # construct new URIs using expressions
    }

Construct ADMS-AP from existing RDF … the result is a new RDF graph
Input (example from the Linked Open Vocabularies repository):
    <http://data.lirmm.fr/ontologies/food> a voaf:Vocabulary ;
      dct:title "Food Ontology"@en ;
      dct:description "This ontology…"@en ;
      dct:modified "2013-09-24" ;
      voaf:similar <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .
    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food>
      dct:title "Food Ontology in OWL"@en ;
      dct:description "Along with…"@en ;
      dct:modified "2003-12-15" .
Result:
    <http://data.lirmm.fr/ontologies/food> a adms:Asset ;
      dct:title "Food Ontology"@en ;
      dct:description "This ontology…"@en ;
      dct:modified "2013-09-24" ;
      dct:type <http://purl.org/adms/assettype/Ontology> ;
      dct:relation <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> ;
      dcat:distribution <http://data.lirmm.fr/ontologies/food?type=distribution> .
    <http://data.lirmm.fr/ontologies/food?type=distribution> a adms:AssetDistribution ;
      dcat:accessURL <http://data.lirmm.fr/ontologies/food> .
    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food>
      dct:title "Food Ontology in OWL"@en ;
      dct:description "Along with…"@en ;
      dct:modified "2003-12-15" ;
      dcat:distribution <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution> .
    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution>
      dcat:accessURL <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .

Metadata cleansing: why?
You may need to make some small modifications to your RDF graph to make it fully compliant with ADMS-AP, because only ADMS-AP compliant descriptive metadata can be uploaded on Joinup. Joinup has a built-in ADMS-AP validation feature to help you pinpoint inconsistencies with ADMS-AP.

Metadata cleansing … with SPARQL Update queries
- Add static triples (INSERT DATA)
- Remove static triples (DELETE DATA)
- Modify static triples (combine INSERT DATA and DELETE DATA)
- Add triples based on query results (INSERT)
- Remove triples based on query results (DELETE)
- Modify triples based on query results (DELETE/INSERT)
For more info: http://www.w3.org/TR/sparql11-update/#graphUpdate
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
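
The SPARQL 1.1 Update specification defines DELETE/INSERT by three rules:
- the WHERE clause is evaluated once against the unchanged graph;
- deletions are applied before insertions;
- any template triple left with an unbound variable is skipped.

INSERT DATA and DELETE DATA behave like the same operation with ground templates and a single empty solution. The stdlib-only Python sketch below models just those rules on a set of triples. `delete_insert`, `instantiate`, `ground` and `titles` are hypothetical names chosen for illustration, not a library API.

```python
from typing import Callable, Iterable

# Toy encoding: every term is a string; template terms starting with "?" are variables.
Triple = tuple[str, str, str]
Solution = dict[str, str]
Where = Callable[[set[Triple]], Iterable[Solution]]


def instantiate(template: list[Triple], solution: Solution) -> set[Triple]:
    """Fill a template with one solution, skipping triples left with an unbound variable."""
    out: set[Triple] = set()
    for triple in template:
        terms = tuple(solution.get(t) if t.startswith("?") else t for t in triple)
        if None not in terms:
            out.add(terms)
    return out


def delete_insert(graph: set[Triple], delete: list[Triple],
                  insert: list[Triple], where: Where) -> None:
    """DELETE/INSERT in place: evaluate WHERE once, then delete, then insert."""
    solutions = list(where(graph))  # computed on the graph before any change
    removed = set().union(*(instantiate(delete, s) for s in solutions))
    added = set().union(*(instantiate(insert, s) for s in solutions))
    graph -= removed
    graph |= added


def ground(graph: set[Triple]) -> list[Solution]:
    """WHERE part of INSERT DATA / DELETE DATA: a single empty solution."""
    return [{}]


def titles(graph: set[Triple]) -> list[Solution]:
    """WHERE { ?s dct:title ?t }, which never binds ?new."""
    return [{"?s": s, "?t": o} for (s, p, o) in graph if p == "dct:title"]


# Modify static triples: DELETE DATA, then INSERT DATA.
g: set[Triple] = {
    ("<http://myasset.eu/>", "a", "adms:Asset"),
    ("<http://myasset.eu/>", "dct:title", '"Asset name"@en'),
}
delete_insert(g, [("<http://myasset.eu/>", "dct:title", '"Asset name"@en')], [], ground)
delete_insert(g, [], [("<http://myasset.eu/>", "dct:title", '"My asset name"@en')], ground)

# Pitfall: the DELETE template is fully bound, the INSERT template is not.
h: set[Triple] = {("<http://x.eu/>", "dct:title", '"Old"')}
delete_insert(h, [("?s", "dct:title", "?t")], [("?s", "dct:title", "?new")], titles)
print(h)  # set()
```

The second example shows why the unbound-variable rule matters in practice. A DELETE whose matching INSERT cannot be filled in removes data without replacing it.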

Metadata cleansing … add static triples
Example: add the title of a specific interoperability solution (modelled as an adms:Asset)
Query:
    INSERT DATA {
      <http://myasset.eu/> dct:title "Asset name"@en .
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:description "Description…" .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" .

Metadata cleansing … remove static triples
Example: remove an erroneous date of a specific asset
Query:
    DELETE DATA {
      <http://myasset.eu/> dct:issued "2242-01-01"^^xsd:date .
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" ;
      dct:issued "2242-01-01"^^xsd:date .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" .

Metadata cleansing … modify static triples
Example: modify the title of a specific asset (the two operations are separated by ";")
Query:
    DELETE DATA {
      <http://myasset.eu/> dct:title "Asset name"@en .
    } ;
    INSERT DATA {
      <http://myasset.eu/> dct:title "My asset name"@en .
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My asset name"@en ;
      dct:description "Description…" .

Metadata cleansing … add triples based on query results
Example: add an asset type to all assets whose title contains "Schema"
Query:
    INSERT { ?asset dct:type <http://purl.org/adms/assettype/Schema> . }
    WHERE {
      ?asset a adms:Asset ;
        dct:title ?title .
      FILTER(CONTAINS(?title, "Schema"))
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My Asset Schema" ;
      dct:description "Description…" .
    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My Asset Schema" ;
      dct:description "Description…" ;
      dct:type <http://purl.org/adms/assettype/Schema> .
    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" .

Metadata cleansing … remove triples based on query results
Example: remove all asset modification dates in the future
Query:
    DELETE { ?asset dct:modified ?date . }
    WHERE {
      ?asset a adms:Asset ;
        dct:modified ?date .
      FILTER(?date > NOW())
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" ;
      dct:modified "2242-01-01T00:00:00Z"^^xsd:dateTime .
    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" ;
      dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…" .
    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" ;
      dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .
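
Below is a plain-Python mirror of this DELETE. It assumes xsd:dateTime values are represented as timezone-aware `datetime` objects. The hypothetical `drop_future_modified` takes an explicit `now`, so the result does not depend on the day it is run. Values that are not dates are skipped, much as a SPARQL comparison between a plain string and NOW() is a type error that makes the FILTER reject the row.

```python
from datetime import datetime, timezone
from typing import Optional

# Toy encoding: typed xsd:dateTime literals are timezone-aware datetime objects.
Triple = tuple[str, str, object]


def drop_future_modified(graph: set[Triple], now: Optional[datetime] = None) -> None:
    """DELETE { ?asset dct:modified ?date } WHERE { ?asset a adms:Asset ;
    dct:modified ?date . FILTER(?date > NOW()) }"""
    now = now or datetime.now(timezone.utc)
    assets = {s for (s, p, o) in graph if p == "rdf:type" and o == "adms:Asset"}
    future = {(s, p, o) for (s, p, o) in graph
              if s in assets and p == "dct:modified"
              and isinstance(o, datetime) and o > now}  # non-dates never match
    graph -= future


utc = timezone.utc
g: set[Triple] = {
    ("http://myasset.eu/", "rdf:type", "adms:Asset"),
    ("http://myasset.eu/", "dct:modified", datetime(2242, 1, 1, tzinfo=utc)),
    ("http://yourasset.eu/", "dct:modified", datetime(2000, 8, 12, 11, 42, 22, tzinfo=utc)),
}
drop_future_modified(g, now=datetime(2014, 4, 1, tzinfo=utc))
```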

Metadata cleansing … modify triples based on query results
Example: replace a word in all asset titles
Query:
    DELETE { ?asset dct:title ?title . }
    INSERT { ?asset dct:title ?newtitle . }
    WHERE {
      ?asset a adms:Asset ;
        dct:title ?title .
      BIND(REPLACE(?title, "grt", "great") AS ?newtitle)
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My grt asset" .
    <http://yourasset.eu/> dct:title "Your asset" .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "My great asset" .
    <http://yourasset.eu/> dct:title "Your asset" .
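
The DELETE/INSERT pattern above can be sketched in Python, with `re.sub` standing in for REPLACE. This substitution is an assumption for illustration: SPARQL uses XPath regular expressions, whose syntax overlaps with Python's but is not identical. The hypothetical `replace_in_titles` deletes every matched title and re-inserts the rewritten one, so titles without "grt" come back unchanged.

```python
import re

Triple = tuple[str, str, str]


def replace_in_titles(graph: set[Triple], pattern: str, replacement: str) -> None:
    """DELETE { ?asset dct:title ?title } INSERT { ?asset dct:title ?newtitle }
    WHERE { ?asset a adms:Asset ; dct:title ?title .
            BIND(REPLACE(?title, pattern, replacement) AS ?newtitle) }"""
    assets = {s for (s, p, o) in graph if p == "rdf:type" and o == "adms:Asset"}
    old = {(s, p, o) for (s, p, o) in graph if s in assets and p == "dct:title"}
    new = {(s, p, re.sub(pattern, replacement, o)) for (s, p, o) in old}
    graph -= old  # delete first ...
    graph |= new  # ... then insert; unchanged titles are simply re-added


g: set[Triple] = {
    ("http://myasset.eu/", "rdf:type", "adms:Asset"),
    ("http://myasset.eu/", "dct:title", "My grt asset"),
    ("http://yourasset.eu/", "dct:title", "Your asset"),
}
replace_in_titles(g, "grt", "great")
```

One practical difference concerns captured groups in the replacement string. SPARQL's REPLACE refers to them as $1, while Python's `re.sub` uses \1.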

Metadata cleansing: proposed fixes for 3 common issues
- Ensure all text fields have a language tag
- Transform date strings into xsd:dateTime values
- Add missing asset modification dates

Metadata cleansing: ensure all text fields have a language tag
Query:
    DELETE { ?s ?p ?o . }
    INSERT { ?s ?p ?olang . }
    WHERE {
      ?s ?p ?o .
      FILTER(?p IN (foaf:name, dct:title, dct:description))
      FILTER(LANG(?o) = "")
      BIND(STRLANG(?o, "en") AS ?olang)
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name" ;
      dct:description "Description…"@en .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…"@en .
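
Here is the language-tag fix sketched in plain Python under a toy encoding: a hypothetical `Literal` tuple with an optional `lang`, and a hypothetical `add_default_language` function. Like the query, it tags every untagged value of foaf:name, dct:title and dct:description as English. Check that this assumption holds for your metadata before running either version.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    value: str
    lang: Optional[str] = None


Triple = tuple[str, str, object]
TEXT_PROPERTIES = {"foaf:name", "dct:title", "dct:description"}


def add_default_language(graph: set[Triple], lang: str = "en") -> None:
    """Give every untagged literal of the listed text properties the tag `lang`."""
    untagged = {(s, p, o) for (s, p, o) in graph
                if p in TEXT_PROPERTIES and isinstance(o, Literal) and o.lang is None}
    graph -= untagged
    graph |= {(s, p, Literal(o.value, lang)) for (s, p, o) in untagged}


g: set[Triple] = {
    ("http://myasset.eu/", "rdf:type", "adms:Asset"),
    ("http://myasset.eu/", "dct:title", Literal("Asset name")),
    ("http://myasset.eu/", "dct:description", Literal("Description...", "en")),
}
add_default_language(g)
```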

Metadata cleansing: transform "YYYY-MM-DD" strings into xsd:dateTime values
Query:
    DELETE { ?s dct:modified ?str . }
    INSERT { ?s dct:modified ?date . }
    WHERE {
      ?s dct:modified ?str .
      BIND(xsd:dateTime(CONCAT(STR(?str), "T00:00:00Z")) AS ?date)
      FILTER(BOUND(?date))
    }
The FILTER(BOUND(?date)) line matters. Without it, a value that cannot be converted (for example one that is already an xsd:dateTime) leaves ?date unbound, so the DELETE removes the original triple while the INSERT adds nothing.
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…"@en ;
      dct:modified "2014-02-24" .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:description "Description…"@en ;
      dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .
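
The date conversion can also be sketched in Python under a similar toy encoding. Untyped values are plain strings, and xsd:dateTime values are timezone-aware `datetime` objects. `parse_date` and `dates_to_datetimes` are hypothetical names. The sketch only rewrites values it can actually convert. Anything else, such as an existing datetime or an invalid date like "2014-13-45", stays in place, which is the behaviour a FILTER(BOUND(?date)) guard gives the SPARQL query.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Toy encoding: untyped values are str, xsd:dateTime values are aware datetimes.
Triple = tuple[str, str, object]
DATE_ONLY = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_date(text: str) -> Optional[datetime]:
    """'YYYY-MM-DD' -> midnight UTC, or None if the text is not a valid date."""
    if not DATE_ONLY.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    except ValueError:  # e.g. month 13
        return None


def dates_to_datetimes(graph: set[Triple]) -> None:
    """Rewrite convertible dct:modified strings in place; leave all other triples alone."""
    converted = {}
    for (s, p, o) in graph:
        if p == "dct:modified" and isinstance(o, str):
            value = parse_date(o)
            if value is not None:  # same role as FILTER(BOUND(?date))
                converted[(s, p, o)] = (s, p, value)
    graph.difference_update(converted)  # iterating a dict yields its keys
    graph.update(converted.values())


utc = timezone.utc
g: set[Triple] = {
    ("http://myasset.eu/", "dct:modified", "2014-02-24"),
    ("http://yourasset.eu/", "dct:modified", datetime(2014, 3, 4, tzinfo=utc)),
    ("http://hypothetical.eu/", "dct:modified", "2014-13-45"),  # invalid month
}
dates_to_datetimes(g)
```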

Metadata cleansing: add missing asset modification dates, copying the issue date (dct:issued)
Query:
    INSERT { ?asset dct:modified ?date . }
    WHERE {
      ?asset a adms:Asset ;
        dct:issued ?date .
      FILTER NOT EXISTS { ?asset dct:modified ?modified }
    }
Before:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime .
    <http://yourasset.eu/> a adms:Asset ;
      dct:title "Your asset"@en ;
      dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .
After:
    <http://myasset.eu/> a adms:Asset ;
      dct:title "Asset name"@en ;
      dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .
    <http://yourasset.eu/> a adms:Asset ;
      dct:title "Your asset"@en ;
      dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .
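
Below is a plain-Python equivalent of this INSERT. Dates are kept as strings here, since they are only copied, and `copy_issued_to_modified` is a hypothetical name. The `has_modified` set plays the role of FILTER NOT EXISTS. As a result, http://yourasset.eu/, which already has a modification date, is not given a second one.

```python
Triple = tuple[str, str, str]


def copy_issued_to_modified(graph: set[Triple]) -> None:
    """INSERT { ?asset dct:modified ?date } WHERE { ?asset a adms:Asset ;
    dct:issued ?date . FILTER NOT EXISTS { ?asset dct:modified ?modified } }"""
    assets = {s for (s, p, o) in graph if p == "rdf:type" and o == "adms:Asset"}
    has_modified = {s for (s, p, o) in graph if p == "dct:modified"}
    # The comprehension is fully built before |= changes the set.
    graph |= {(s, "dct:modified", o) for (s, p, o) in graph
              if p == "dct:issued" and s in assets and s not in has_modified}


MY, YOUR = "http://myasset.eu/", "http://yourasset.eu/"
g: set[Triple] = {
    (MY, "rdf:type", "adms:Asset"),
    (MY, "dct:issued", "2014-02-24T00:00:00Z"),
    (YOUR, "rdf:type", "adms:Asset"),
    (YOUR, "dct:issued", "2012-01-01T00:00:00Z"),
    (YOUR, "dct:modified", "2014-03-04T00:00:00Z"),
}
copy_issued_to_modified(g)
```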

Metadata upload to Joinup: upload an RDF/XML file to Joinup
1. On your repository page, click on “Upload metadata”.
2. Select the RDF/XML file.
3. Click on “Upload the metadata file”.

Metadata upload to Joinup: get the upload status
1. Log in with your account.
2. Go to the repository page.
3. Click on “Report file”.

Metadata upload to Joinup: reading the upload log
Lines have the format "timestamp level - message", for example:
    2013-08-30 17:36:02 INFO - Treatment of the repository …
Levels:
- INFO: information message
- WARN: warning (you may ignore it)
- ERROR: error (you should fix it)
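
Given the text of the report file, a few lines of Python can pull out the ERROR entries. This sketch assumes every relevant line follows exactly the shape shown on the slide (`timestamp LEVEL - message`). The real report may contain other lines, which the hypothetical `parse_log` simply skips. Only the first sample line comes from the slide; the WARN and ERROR lines are made up.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed line shape, taken from the example on the slide.
LINE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (INFO|WARN|ERROR) - (.*)")


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    message: str


def parse_log(text: str) -> list[LogEntry]:
    """Parse the lines that match the assumed format; silently skip any others."""
    entries = []
    for line in text.splitlines():
        m = LINE.fullmatch(line.strip())
        if m:
            stamp, level, message = m.groups()
            entries.append(LogEntry(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), level, message))
    return entries


def to_fix(entries: list[LogEntry]) -> list[LogEntry]:
    """ERROR lines should be fixed; WARN lines may be ignored."""
    return [e for e in entries if e.level == "ERROR"]


sample = """2013-08-30 17:36:02 INFO - Treatment of the repository ...
2013-08-30 17:36:03 WARN - hypothetical warning message
2013-08-30 17:36:04 ERROR - hypothetical error message"""

for entry in to_fix(parse_log(sample)):
    print(entry.timestamp, entry.message)
```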

Related learning resources
- Introduction to ADMS-AP
- How to import and export ADMS-AP conform metadata of interoperability solutions on Joinup
- Introduction to the Open Refine RDF tool
- Using Joinup as catalogue for interoperability solutions
- Introduction to the advanced search functionality of EFIR

Disclaimers The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Visit our initiatives. Get involved:
- Follow @Joinup_EU on Twitter
- Join the CISR community on Joinup
Project Officer: Szabolcs.SZEKACS@ec.europa.eu
Contractors: Nikolaos.Loutas@be.pwc.com, Joan.Bremers@be.pwc.com
Joinup and ADMS are funded by the ISA Programme