Introduction to metadata cleansing using SPARQL update queries


1 Introduction to metadata cleansing using SPARQL update queries
April 2014 PwC EU Services

2 Learning objectives
By the end of this module, you will have an understanding of:
- How to transform your metadata using simple SPARQL Update queries
- How to conform to the ADMS-AP to get your interoperability solutions ready to be shared on Joinup
- The main types of errors that you could face when uploading metadata of interoperability solutions on Joinup

3 How can this tutorial help you?
Owners of interoperability solutions may be able to generate the descriptive metadata of their solutions automatically in RDF. Sometimes this metadata does not conform to the ADMS Application Profile for Joinup (ADMS-AP), which prevents it from being uploaded to Joinup. This tutorial provides basic knowledge on how to transform and cleanse RDF metadata using SPARQL Update queries so that it conforms to the ADMS-AP. SPARQL is the query language for RDF; it also allows creating, updating and deleting RDF triples.
“Since its launch in 2011 Joinup has been steadily growing in popularity. It currently receives more than … visits per month and is hosting some 130 online communities.”

4 Outline
1. The context
   - ADMS-AP for describing your interoperability solutions
   - About SPARQL
   - About RDF
2. Construct ADMS-AP compliant RDF
   - Why?
   - Construct queries
3. Metadata cleansing
   - The main queries
   - 3 examples
4. Metadata upload to Joinup

5 What is the ADMS Application Profile for Joinup (ADMS-AP)?
The Asset Description Metadata Schema Application Profile (ADMS-AP) is a common vocabulary for describing all types of interoperability solutions.
- It allows providers of interoperability solutions to describe their solutions and easily upload the descriptions to Joinup.
- It allows users to easily discover and re-use interoperability solutions from Joinup through that common vocabulary.

6 ADMS-AP for describing your interoperability solutions on Joinup
[Diagram: repositories run by public administrations, standardisation bodies, academia and businesses, as well as your own repository, describe their solutions using the ADMS Application Profile, so that users can explore, find, select and obtain them.]

7 Automatic or manual path to generate ADMS-AP
[Diagram: interoperability solutions can be described through a manual path (transformation with Open Refine) or an automatic path (cleansing with SPARQL).]
This tutorial focuses on the automatic path to generate ADMS-AP compliant RDF. The manual path with Open Refine is covered in a separate tutorial.

8 SPARQL Protocol and RDF Query Language (SPARQL)
SPARQL is the standard language to query graph data represented as RDF triples. It is one of the three core standards of the Semantic Web, along with RDF and OWL. SPARQL 1.0 became a W3C standard in January 2008; SPARQL 1.1 has been the standard since 2013.

9 The Resource Description Framework (RDF)
RDF represents data as (subject, predicate, object) triples. A set of triples is an RDF graph.
[Diagram: a resource node linked by rdf:type to adms:Asset and by dct:title to the literal "My asset name".]
- Resources are identified by URIs, often abbreviated.
- Literals are either plain ("Text") or typed ("42"^^xsd:integer, "…"^^xsd:date).
NB: subjects and objects may also be blank nodes.

10 A graph can be represented with different syntaxes
RDF/XML (required by Joinup):
  <rdf:Description rdf:about="…">
    <rdf:type rdf:resource="…"/>
    <dct:title>My asset name</dct:title>
    <dct:description>Description of the asset</dct:description>
    <dct:modified rdf:datatype="…">…T00:00:00Z</dct:modified>
  </rdf:Description>
Turtle (used in SPARQL and in this tutorial):
  <…> a adms:Asset ;
    dct:title "My asset name" ;
    dct:description "Description of the asset" ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
The syntaxes are equivalent, and it is easy to transform one into another.

11 SPARQL is a query language for RDF data
Data:
  <…> a adms:Asset ;
    dct:title "My asset name" ;
    dct:description "Description of the asset" ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
  <…> a adms:Asset ;
    dct:title "Your asset name" ;
    dct:description "Another asset" .
Query:
  SELECT * WHERE {
    ?asset a adms:Asset ;
      dct:title ?title .
  }
The WHERE clause holds a graph pattern: an RDF graph with placeholder variables (e.g., ?asset).
Results:
  ?asset   ?title
  <…>      "My asset name"
  <…>      "Your asset name"
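Written out in full, the query needs PREFIX declarations before an engine will accept it. The following is a minimal sketch, assuming the standard namespace URIs for adms: and dct:. The explicit variable list and the ORDER BY clause are additions for readable, stable output and are not part of the slide.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

# List every resource typed as adms:Asset together with its title.
SELECT ?asset ?title
WHERE {
  ?asset a adms:Asset ;
         dct:title ?title .
}
ORDER BY ?title
```

A resource that has a title but no adms:Asset type does not match the pattern and is left out of the results.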

12 SPARQL queries have many forms
- SPARQL SELECT to query data from a graph (not used in this tutorial)
- SPARQL CONSTRUCT to transform one graph into another (used for creating ADMS-AP from existing RDF)
- SPARQL Update to modify a graph in place (used to cleanse ADMS-AP metadata)

13 A useful tool to transform RDF files
TopBraid Composer is used to create and edit RDF files and run SPARQL queries over them. A free version is also available.
“TopBraid Composer is the leading industrial-strength RDF editor and OWL ontology editor, as well as the best SPARQL tool on the market.”

15 Construct ADMS-AP from existing RDF: why?
You may already have the metadata description of your interoperability solutions in an RDF file that is not compliant with ADMS-AP (e.g. it lacks mandatory properties or does not use the recommended controlled vocabularies). The following slides help you to create a compliant ADMS-AP RDF graph from your initial RDF.

16 Construct ADMS-AP from existing RDF … using a SPARQL CONSTRUCT query
  CONSTRUCT {
    ?asset a adms:Asset ;
      dct:title ?title ;
      dct:description ?description ;
      dct:modified ?modified ;
      dct:type <…> ;
      dct:relation ?related ;
      dcat:distribution ?d .
    ?d a adms:AssetDistribution ;
      dcat:accessURL ?asset .
  }
  WHERE {
    ?asset a voaf:Vocabulary ;
      dct:title ?title ;
      dct:description ?description ;
      dct:modified ?modified .
    OPTIONAL { ?asset voaf:similar ?related }
    BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)
  }
- CONSTRUCT { … }: the result graph to construct
- WHERE { … }: the graph pattern to query
- OPTIONAL { … }: recommended and optional fields
- BIND(IRI(…)): construct new URIs using expressions
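As a self-contained sketch, the transformation could be written as follows. The namespace URIs are the commonly published ones for each vocabulary. The asset type IRI `http://example.org/asset-type/placeholder` is a hypothetical stand-in, because the real value is not legible in the slide. Requiring a title and a description in the WHERE clause is an assumption about the source data.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX voaf: <http://purl.org/vocommons/voaf#>

CONSTRUCT {
  ?asset a adms:Asset ;
         dct:title ?title ;
         dct:description ?description ;
         dct:modified ?modified ;
         dct:type <http://example.org/asset-type/placeholder> ;  # hypothetical IRI
         dct:relation ?related ;
         dcat:distribution ?d .
  ?d a adms:AssetDistribution ;
     dcat:accessURL ?asset .
}
WHERE {
  # Every variable used in the template has to be bound here:
  # template triples that contain an unbound variable are skipped.
  ?asset a voaf:Vocabulary ;
         dct:title ?title ;
         dct:description ?description ;
         dct:modified ?modified .
  # ?related stays unbound when there is no voaf:similar link,
  # so the dct:relation triple is simply omitted for that asset.
  OPTIONAL { ?asset voaf:similar ?related }
  # Derive a distribution IRI from the asset IRI.
  BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)
}
```

A CONSTRUCT query does not change the source graph. It returns a new graph that can then be saved, for example as RDF/XML.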

17 Construct ADMS-AP from existing RDF … the result is a new RDF graph
Result graph:
  <…> a adms:Asset ;
    dct:title "Food…" ;
    dct:description "This…" ;
    dct:modified "…" ;
    dct:type <…> ;
    dct:relation <…> ;
    dcat:distribution <…> .
  <…> a adms:AssetDistribution ;
    dcat:accessURL <…> .
  <…> dct:title "Food Ontology in…" ;
    dct:description "Along…" ;
    dct:modified "…" ;
    dcat:distribution <…> .
  <…> dcat:accessURL <…> .
Source graph:
  <…> a voaf:Vocabulary ;
    dct:title "Food…" ;
    dct:description "This…" ;
    dct:modified "…" ;
    voaf:similar <…> .
  <…> dct:title "Food Ontology in…" ;
    dct:description "Along…" ;
    dct:modified "…" .
Example from the Linked Open Vocabulary repository.

19 Metadata cleansing Why?
- You may need to make some small modifications to your RDF graph to make it fully compliant with ADMS-AP.
- Only ADMS-AP compliant descriptive metadata can be uploaded to Joinup.
- Joinup has a built-in ADMS-AP validation feature to help you pinpoint inconsistencies with the standard.

20 Metadata cleansing … with SPARQL update queries
- Add static triples (INSERT DATA)
- Remove static triples (DELETE DATA)
- Modify static triples (combine INSERT DATA and DELETE DATA)
- Add triples based on query results (INSERT)
- Remove triples based on query results (DELETE)
- Modify triples based on query results (DELETE/INSERT)

21 Metadata cleansing … add static triples
Example: add the title of a specific interoperability solution (modelled as an adms:Asset)
Query:
  INSERT DATA {
    <…> dct:title "Asset…" .
  }
Before:
  <…> a adms:Asset ;
    dct:description "Description…" .
After:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" .
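A runnable form of this update might look like the sketch below. The asset IRI `http://example.org/asset/1` and the title text are hypothetical stand-ins for the values that are not legible in the slide; dct: is the usual Dublin Core Terms namespace.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>

# INSERT DATA takes concrete triples only: no variables and no WHERE clause.
INSERT DATA {
  <http://example.org/asset/1> dct:title "Asset name"@en .
}
```

Inserting a triple that is already in the graph has no effect, so the update can be re-run safely.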

22 Metadata cleansing … remove static triples
Example: remove an erroneous date of a specific asset
Query:
  DELETE DATA {
    <…> dct:issued "…"^^xsd:date .
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" ;
    dct:issued "…"^^xsd:date .
After:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" .
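For illustration, a complete DELETE DATA request could look like this. The asset IRI and the date value are hypothetical; the point of the sketch is that the triple has to be spelled out exactly as it is stored.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# The triple must match the stored one exactly, datatype included.
# The same date written as a plain string would not match,
# and the request would then remove nothing.
DELETE DATA {
  <http://example.org/asset/1> dct:issued "2041-01-01"^^xsd:date .
}
```

Like INSERT DATA, this form accepts no variables. When the exact stored value is not known, a DELETE with a WHERE clause is the better fit.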

23 Metadata cleansing … modify static triples
Example: modify the title of a specific asset
Query:
  DELETE DATA {
    <…> dct:title "Asset…" .
  } ;
  INSERT DATA {
    <…> dct:title "My asset…" .
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" .
After:
  <…> a adms:Asset ;
    dct:title "My asset…" ;
    dct:description "Description…" .
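When both operations are sent in one request, SPARQL Update expects a semicolon between them. The sketch below shows one way to write this, with a hypothetical asset IRI and hypothetical title values. The PREFIX line is repeated before the second operation so that the request does not rely on prefixes carrying over between operations.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
# Step 1: remove the old title (must match the stored literal exactly).
DELETE DATA {
  <http://example.org/asset/1> dct:title "Asset name"@en .
} ;

PREFIX dct: <http://purl.org/dc/terms/>
# Step 2: add the corrected title.
INSERT DATA {
  <http://example.org/asset/1> dct:title "My asset name"@en .
}
```

The operations run in the order written, so the old title is gone before the new one is added.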

24 Metadata cleansing … add triples based on query results
Example: add an asset type to all assets whose title contains "Schema"
Query:
  INSERT { ?asset dct:type <…> . }
  WHERE {
    ?asset a adms:Asset ;
      dct:title ?title .
    FILTER(CONTAINS(?title, "Schema"))
  }
Before:
  <…> a adms:Asset ;
    dct:title "My Asset Schema" ;
    dct:description "Description…" .
  <…> dct:title "Your Asset Vocabulary" .
After:
  <…> a adms:Asset ;
    dct:title "My Asset Schema" ;
    dct:description "Description…" ;
    dct:type <…> .
  <…> dct:title "Your Asset Vocabulary" .
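A full version of such an update might be sketched as follows. The type IRI `http://example.org/asset-type/Schema` is a hypothetical placeholder for the real asset type, which is not legible in the slide. Matching case-insensitively is a deliberate variation on the slide's query, not something the original does.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

INSERT {
  ?asset dct:type <http://example.org/asset-type/Schema> .  # hypothetical IRI
}
WHERE {
  ?asset a adms:Asset ;
         dct:title ?title .
  # STR() drops any language tag; LCASE() makes the match case-insensitive.
  FILTER(CONTAINS(LCASE(STR(?title)), "schema"))
}
```

The template is instantiated once per solution of the WHERE clause, so every matching asset receives the same dct:type triple.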

25 Metadata cleansing … remove triples based on query results
Example: remove all asset modification dates in the future
Query:
  DELETE { ?asset dct:modified ?date . }
  WHERE {
    ?asset a adms:Asset ;
      dct:modified ?date .
    FILTER(?date > NOW())
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
  <…> dct:title "Your Asset Vocabulary" ;
    dct:modified "…T11:42:22Z"^^xsd:dateTime .
After:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "Description…" .
  <…> dct:title "Your Asset Vocabulary" ;
    dct:modified "…T11:42:22Z"^^xsd:dateTime .
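Before running a destructive DELETE, it can help to preview the matches with a SELECT that uses the same graph pattern. The sketch below does that. The extra datatype check is an assumption about the data: comparing a plain string or an xsd:date against NOW() may raise an error in a strictly standard engine, in which case such values would silently never match.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Dry run: list the triples that the DELETE would remove.
SELECT ?asset ?date
WHERE {
  ?asset a adms:Asset ;
         dct:modified ?date .
  # Only xsd:dateTime values are compared against the current time.
  FILTER(DATATYPE(?date) = xsd:dateTime && ?date > NOW())
}
```

Once the listed rows look right, the same WHERE clause can be reused under `DELETE { ?asset dct:modified ?date . }`.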

26 Metadata cleansing … modify triples based on query results
Example: replace a word in all asset titles
Query:
  DELETE { ?asset dct:title ?title . }
  INSERT { ?asset dct:title ?newtitle . }
  WHERE {
    ?asset a adms:Asset ;
      dct:title ?title .
    BIND(REPLACE(?title, "grt", "great") AS ?newtitle)
  }
Before:
  <…> a adms:Asset ;
    dct:title "My grt asset" .
  <…> dct:title "Your asset" .
After:
  <…> a adms:Asset ;
    dct:title "My great asset" .
  <…> dct:title "Your asset" .
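One possible refinement, shown here as a sketch rather than as the tutorial's own query, is to restrict the update to titles that actually contain the word. Titles that would not change are then never deleted and re-inserted.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

DELETE { ?asset dct:title ?title . }
INSERT { ?asset dct:title ?newtitle . }
WHERE {
  ?asset a adms:Asset ;
         dct:title ?title .
  # Skip titles that would not change.
  FILTER(CONTAINS(?title, "grt"))
  # The second argument of REPLACE is a regular expression, not a plain string,
  # so characters such as "." or "(" need escaping when they are meant literally.
  BIND(REPLACE(?title, "grt", "great") AS ?newtitle)
}
```

Note that "grt" is replaced wherever it occurs, including inside longer words.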

27 Metadata cleansing Proposed fixes for 3 common issues
- Ensure all text fields have a language tag
- Transform date strings into xsd:dateTime values
- Add missing asset modification dates

28 Metadata cleansing Ensure all text fields have a language tag
Query:
  DELETE { ?s ?p ?o . }
  INSERT { ?s ?p ?olang . }
  WHERE {
    ?s ?p ?o .
    FILTER(?p IN (foaf:name, dct:title, dct:description))
    FILTER(LANG(?o) = "")
    BIND(STRLANG(?o, "en") AS ?olang)
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset name" ;
    dct:description "…" .
After:
  <…> a adms:Asset ;
    dct:title "Asset name"@en ;
    dct:description "…"@en .
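A slightly more defensive variant is sketched below. It assumes that only plain string literals should receive a tag and that English ("en") is the right language for all of them. The reason for the stricter filter is that `LANG(?o) = ""` is also true for typed literals such as numbers or dates: for those, STRLANG raises an error, ?olang stays unbound, and the triple would be deleted without a replacement being inserted.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

DELETE { ?s ?p ?o . }
INSERT { ?s ?p ?olang . }
WHERE {
  ?s ?p ?o .
  FILTER(?p IN (foaf:name, dct:title, dct:description))
  # Keep only plain strings: literals that already carry a language tag,
  # other typed literals and IRIs are all excluded by this test.
  FILTER(isLiteral(?o) && DATATYPE(?o) = xsd:string)
  # Attach the assumed language tag.
  BIND(STRLANG(?o, "en") AS ?olang)
}
```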

29 Metadata cleansing Transform "YYYY-MM-DD" strings into xsd:dateTime values
Query:
  DELETE { ?s dct:modified ?str . }
  INSERT { ?s dct:modified ?date . }
  WHERE {
    ?s dct:modified ?str .
    BIND(xsd:dateTime(CONCAT(?str, "T00:00:00Z")) AS ?date)
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "…" ;
    dct:modified "…" .
After:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:description "…" ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
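If the graph mixes date strings with values that are already typed, a guarded version may be safer. The sketch below is one way to do it, under the assumption that only plain "YYYY-MM-DD" strings should be converted. Without the guards, any value for which the conversion fails leaves ?date unbound, and the DELETE part would still remove the original triple.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DELETE { ?s dct:modified ?str . }
INSERT { ?s dct:modified ?date . }
WHERE {
  ?s dct:modified ?str .
  # Only touch plain strings shaped like YYYY-MM-DD; values that are
  # already typed (for example as xsd:dateTime) are left alone.
  FILTER(DATATYPE(?str) = xsd:string &&
         REGEX(?str, "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"))
  BIND(xsd:dateTime(CONCAT(?str, "T00:00:00Z")) AS ?date)
  # Drop solutions where the cast failed (for example month 13),
  # so that nothing is deleted without a replacement.
  FILTER(BOUND(?date))
}
```

The fixed "T00:00:00Z" suffix follows the slide and treats every date as midnight UTC.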

30 Metadata cleansing Add missing asset modification dates, copying the creation date
Query:
  INSERT { ?asset dct:modified ?date . }
  WHERE {
    ?asset a adms:Asset ;
      dct:issued ?date .
    FILTER NOT EXISTS { ?asset dct:modified ?modified }
  }
Before:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:issued "…T00:00:00Z"^^xsd:dateTime .
  <…> a adms:Asset ;
    dct:title "Your…" ;
    dct:issued "…T00:00:00Z"^^xsd:dateTime ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
After:
  <…> a adms:Asset ;
    dct:title "Asset…" ;
    dct:issued "…T00:00:00Z"^^xsd:dateTime ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
  <…> a adms:Asset ;
    dct:title "Your…" ;
    dct:issued "…T00:00:00Z"^^xsd:dateTime ;
    dct:modified "…T00:00:00Z"^^xsd:dateTime .
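Assets that have neither dct:issued nor dct:modified are not touched by this update, because the WHERE clause requires a dct:issued value to copy. A follow-up check along these lines (a sketch, using the standard adms: and dct: namespaces) lists any assets that still lack a modification date:

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

# Assets that still have no dct:modified value after the update.
SELECT ?asset
WHERE {
  ?asset a adms:Asset .
  FILTER NOT EXISTS { ?asset dct:modified ?modified }
}
```

An empty result means every asset now carries a modification date.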

32 Metadata upload to Joinup
Upload an RDF/XML file to Joinup:
1. On your repository page, click on “Upload metadata”
2. Select the RDF/XML file
3. Click on “Upload the metadata file”

33 Metadata upload to Joinup
Get the upload status:
1. Log in with your account
2. Go to the repository page
3. Click on “Report file”

34 Metadata upload to Joinup
Reading the upload log
Each line consists of a timestamp, a level and a message, for example:
  …:36:02 INFO - Treatment of the repository …
Level   Meaning
INFO    Information message
WARN    Warning (you may ignore it)
ERROR   Error (you should fix it)

35 Related learning resources
- Introduction to ADMS-AP
- How to import and export ADMS-AP conform metadata of interoperability solutions on Joinup
- Introduction to the Open Refine RDF tool
- Using Joinup as catalogue for interoperability solutions
- Introduction to the advanced search functionality of EFIR

36 Disclaimers
The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

37 Visit our initiatives and get involved
- Follow @Joinup_EU on Twitter
- Join the CISR community on Joinup
Joinup and ADMS are funded by the ISA Programme.

