Linked Data Best Practices (and Abuses): Lessons Learned in IBM Rational. Arthur Ryman, 2014-04-15.


Best Practices
• Publishing vocabularies
• Data model customization
• Real-world things
• JSON and RDF
• Multi-valued and optional properties
• Provenance and inverse properties
• Ontologies and constraints

PUBLISHING VOCABULARIES

Publishing vocabularies
• We should use established vocabularies if they exist
  – W3C, Dublin Core, OSLC, …
• Any new terms we define should be described in vocabulary documents rooted at
  – propose generally useful terms to OSLC
• When you look up an RDF term, you should get its vocabulary document
  – HTML for web browsers
  – RDF for programs, e.g. query builders
  – e.g.
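
One way a vocabulary server could return HTML to browsers and RDF to programs is HTTP content negotiation on the Accept header. The following is a minimal sketch under stated assumptions: the function name, the list of supported media types, and the HTML default are illustrative choices, not taken from any IBM or OSLC implementation.

```python
def choose_representation(accept_header: str) -> str:
    """Pick a media type for a vocabulary document from an HTTP Accept header."""
    # Media types this hypothetical vocabulary server can produce.
    supported = ["text/html", "text/turtle", "application/rdf+xml"]
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type, quality = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if media_type in supported and quality > 0:
            candidates.append((-quality, position, media_type))
    # Highest q wins; ties go to the type listed first. Default to HTML (assumption).
    return min(candidates)[2] if candidates else "text/html"
```

A query builder would send `Accept: text/turtle` and receive the RDF form, while a browser's default header selects the HTML page for the same term URI.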

Vocabulary page for

How to publish a vocabulary
• We have a new public wiki!
  – Read the guidelines
• Create a wiki page and attach the HTML, Turtle, and RDF/XML files
• Request a review from Nelson
  – Allow dev time to address issues
• Arthur will redirect jazz.net/ns to the wiki

LinkedData wiki

Abuses
• You published your vocabulary but skimped on the content
  – e.g. minimal or cryptic comments
• You published your vocabulary, but didn’t keep it up-to-date
  – e.g. Focal Point
• You created some new terms but didn’t publish your vocabulary
  – e.g. JLIP Tracked Resource Set

DATA MODEL CUSTOMIZATION

Data model customization
• Many of our tools allow customization
  – e.g. RTC work items
• We need to expose the custom data elements as RDF
• Tools should allow users to map custom data elements to externally defined RDF terms
  – industry standards
  – corporate standards
• When no mapping is specified, tools should generate local RDF terms and vocabularies
  – vocabularies are needed by query authors
  – tools must host the vocabularies they generate
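
The mapping rule above (use the externally defined term when one is configured, otherwise generate a local term) can be sketched as follows. The function name, the mapping table, and both namespace URIs are hypothetical; the sketch only assumes that a readable local name is preferable to an opaque identifier.

```python
from urllib.parse import quote


def rdf_term_for(attribute_id, mappings, local_vocab_base):
    """Return the RDF property URI for a custom attribute."""
    if attribute_id in mappings:
        # An administrator mapped this attribute to an external term.
        return mappings[attribute_id]
    # No mapping: generate a local term whose name stays human-readable.
    return local_vocab_base + quote(attribute_id, safe="")


# Hypothetical configuration for illustration only.
MAPPINGS = {"priority": "http://example.org/ns/corp#priority"}
LOCAL_BASE = "http://tool.example.org/ns/project1#"
```

The tool would still need to host a vocabulary document at the local base URI so that query authors can look the generated terms up.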

Abuses
• Your tool generates a cryptic URI for local RDF terms
  – Obfuscates meaning
  – Forces humans to access vocabulary document
• Your tool does not generate a vocabulary document for local RDF terms
  – e.g. RTC
  – see following case study
• When the mapping to RDF is changed, your tool does not create TRS change events for just the affected resources

Case study: RTC Work Items
• Some attributes are built-in
• Some are defined by OSLC CM 2.0
• Some are user defined
• Consider Priority

Project area editor allows customization

Enumerated values should specify RDF URIs (External Value)

Priority values are enumerated

Get the resource URL

Look for priority in the RDF representation of Task

RDF triple for priority
• Subject (good)
  resource/itemName/com.ibm.team.workitem.WorkItem/
• Predicate (bad)
• Object (ugly)
  oslc/enumerations/_QYx2UBIzEd6bpunPP4ZLOA/priority/priority.literal.l3

Object of priority is not an RDF vocabulary term

Problems
• The priority predicate comes from a non-existent vocabulary (bad)
  – RDF vocabularies should be dereferenceable
  – OSLC should publish it, tagged as archaic
• The object is a dereferenceable URI (good), but not a vocabulary term (ugly)
  – Need rdfs:label, rdfs:comment for query authors
• Result: no easy way to write queries based on priority

Best Practice for external vocabularies
• RTC project template should refer to external vocabularies for standard terms
  – OSLC CM V3 defines priority and 4 values
• Teach and enable clients to create corporate standard vocabularies for reuse of common terms (UA)
  – Needed for cross-project queries
• Provide export/import UI to manage vocabularies
  – E.g. Focal Point uses simple spreadsheet format

Best Practice for local vocabularies
• RTC (and all other tools) should generate a local RDF vocabulary for all user-defined terms
  – Include rdfs:label, rdfs:comment for query authors (and other consumers)
• LQE admin should load user-defined vocabularies into LQE to make them available to queries
  – provide programmatic integration, e.g. a special purpose vocabulary TRS
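
A generated local vocabulary can be as small as one Turtle block per user-defined term, carrying rdfs:label and rdfs:comment. The sketch below shows one possible generator; the function names and the input shape (a list of name, label, comment tuples) are assumptions for illustration and do not describe how RTC or any other tool does it.

```python
def turtle_literal(text):
    """Quote a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def local_vocabulary(vocab_uri, terms):
    """Build a Turtle vocabulary document.

    vocab_uri: namespace URI ending in '#' or '/' (hypothetical).
    terms: list of (local_name, label, comment) tuples.
    """
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for local_name, label, comment in terms:
        lines += [
            f"<{vocab_uri}{local_name}> a rdf:Property ;",
            f"    rdfs:label {turtle_literal(label)} ;",
            f"    rdfs:comment {turtle_literal(comment)} ;",
            f"    rdfs:isDefinedBy <{vocab_uri}> .",
            "",
        ]
    return "\n".join(lines)
```

Calling `local_vocabulary("http://tool.example.org/ns/project1#", [("riskLevel", "Risk level", "How risky the work item is.")])` yields a document that a query builder can load to show labels instead of raw URIs.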

Best Practice for all vocabularies
• When an administrator changes the RDF representation of a set of resources, corresponding change events MUST be added to the TRS change log
  – Add/remove custom attributes and values
  – Modify mapping to RDF URIs
• Allow the administrator to make multiple representation changes and then manually trigger the generation of change events
  – Batch multiple representation changes together to minimize re-indexing time and server load
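
The batching idea can be sketched as a small accumulator: each representation change records the resources it affects, and a manual trigger emits one change event per distinct resource. The class and its dictionary event format are hypothetical; only the notions of an ordered change log and a modification event per changed resource are taken from TRS.

```python
class ChangeBatcher:
    """Collect affected resources across several representation changes."""

    def __init__(self):
        self._pending = set()   # resources touched since the last flush
        self._next_order = 1    # monotonically increasing event order
        self.change_log = []    # all events emitted so far

    def record(self, resource_uris):
        """Note the resources affected by one representation change."""
        self._pending.update(resource_uris)

    def flush(self):
        """Emit one Modification event per distinct affected resource."""
        events = []
        for uri in sorted(self._pending):
            events.append(
                {"order": self._next_order, "type": "Modification", "changed": uri}
            )
            self._next_order += 1
        self.change_log.extend(events)
        self._pending.clear()
        return events
```

A resource touched by three mapping edits before the flush is re-indexed once rather than three times, which is the point of deferring event generation.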

REAL-WORLD THINGS

La Trahison des Images
"The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it's just a representation, is it not? So if I had written on my picture 'This is a pipe', I'd have been lying!" - René Magritte

Real-world things
• Linked Data differentiates between two kinds of thing
  – Information, e.g. a document on the web
  – Real-world, e.g. a person
• Both kinds should be identified with HTTP URIs
• Looking up a real-world URI should result in an information resource that contains information about the real-world thing
  – URI-references (hash URIs)
  – HTTP redirect: 303 See Other (303 URIs)
• Refer to Cool URIs for the Semantic Web

Example foaf:Person
• Suppose you create a document, about John Smith on
• The following is nonsense because John Smith was not created on :
  a foaf:Person.
  dcterms:created " "^^xsd:date.
• The following makes sense:
  a foaf:Person.
  dcterms:created " "^^xsd:date.
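
The distinction can be illustrated with a hash URI: the person gets the document URL plus a fragment, and an HTTP client strips the fragment before making the request, so looking up the person returns the document. The URIs, the name literal, and the date below are hypothetical placeholders, not the values from the slide.

```python
from urllib.parse import urldefrag


def document_for(uri):
    """Return the URL an HTTP client actually requests for a hash URI."""
    return urldefrag(uri).url


DOC = "http://example.org/people/john-smith"  # hypothetical document URL
PERSON = DOC + "#me"                          # hypothetical hash URI for the person

# The creation date is a statement about the document, not about the person.
TURTLE = f"""
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<{PERSON}> a foaf:Person ;
    foaf:name "John Smith" .

<{DOC}> dcterms:created "2014-04-15"^^xsd:date ;
    foaf:primaryTopic <{PERSON}> .
"""
```

Because the two URIs differ, a query for documents created on a given date never returns a person by accident.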

Abuses
• Failure to differentiate between a person and an account owned by a person
  – Leads to nonsense triples
  – Focal Point Defect
  – JTS Defect
  – See following JTS users case study
• NOTE: address is the preferred way to identify people across tools

Work items refer to people

JTS Users
• OSLC Core specifies that the object of dcterms:creator, dcterms:contributor, oslc:modifiedBy should be a resource of class foaf:Agent or foaf:Person (real-world)
• RTC implements OSLC CM and has triples like:
  dcterms:creator, dcterms:contributor.

RDF representation of person contains nonsense

Best Practice
• The property j.1:archived applies to the user account (information resource), not the person (real-world)
• Solution 1: use hash URIs for people:
• Solution 2: use 303 URIs for accounts (preferred by Philippe):

303 URI
jfs:.
a foaf:OnlineAccount, jfs:archived false.
a foaf:Person;
  foaf:account,
  foaf:img ;
  foaf:mbox ;
  foaf:name "Arthur Ryman";
  foaf:nick "ryman".

JSON AND RDF

JSON
• Familiar to OO and Web developers
• Popularity fueled by Cloud, e.g. Amazon uses JSON as the payload in AWS REST APIs as an alternative to SOAP and XML
  – Simpler/faster to handle by web clients
• Use is spreading across the stack
  – MongoDB, CouchDB/Cloudant
  – node.js

JSON and RDF
• Some developers are saying: “JSON is simpler and more popular than RDF. Let’s use JSON instead of RDF.”
  – This is a false dichotomy
• JSON is just as problematic as XML for data integration
  – JSON and XML are message formats
• Linked Data is our integration strategy
  – RDF expresses semantics
• Use JSON-LD, now a W3C standard
  – OSLC and Rational should publish standard contexts
• See following LQE Security Context case study

Initial JSON design
• Simple, but no explicit semantics
• Use of UUIDs instead of HTTP URIs
[
  {
    "security_context_id" : "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "name" : "Resources for Alpha project"
  },
  {
    "security_context_id" : "urn:uuid:g92e5gbf-8efd-22e1-b876-11b1d02f7cg7",
    "name" : "Resources for Beta project"
  }
]

Equivalent JSON-LD design
{ { " "dcterms": " }, [ { "#1", "dcterms:title": "Resources for Alpha project" }, { "#2", "dcterms:title": "Resources for Beta project" } ] }
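
A JSON-LD document of this general shape can be produced from the initial JSON records with a few lines of code. This is a sketch, assuming the standard JSON-LD keywords `@context`, `@base`, `@graph`, and `@id`; the function name and the base URL passed in are hypothetical, and the result is not claimed to match the slide's original markup.

```python
import json


def to_json_ld(records, base):
    """Wrap plain name records in a JSON-LD document with a dcterms context."""
    return {
        "@context": {"@base": base, "dcterms": "http://purl.org/dc/terms/"},
        "@graph": [
            # Relative ids such as "#1" resolve against @base to HTTP URIs.
            {"@id": f"#{index}", "dcterms:title": record["name"]}
            for index, record in enumerate(records, start=1)
        ],
    }


RECORDS = [
    {"name": "Resources for Alpha project"},
    {"name": "Resources for Beta project"},
]
# Hypothetical base URL for illustration only.
DOCUMENT = to_json_ld(RECORDS, "http://example.org/security-contexts")
SERIALIZED = json.dumps(DOCUMENT, indent=2)
```

The payload stays ordinary JSON for web clients, while the context tells an RDF consumer that each title is a dcterms:title and that each id is an HTTP URI.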

Final JSON-LD design with type info
{ [ { " " }, { " " " "Resources for Alpha project" }, { " " " "Resources for Beta project" } ] }

MULTI-VALUED AND OPTIONAL PROPERTIES

Multi-valued and optional properties
• RDF documents contain sets of triples
• Model multi-valued properties by a set of triples that share a common subject and predicate
• Model the absence of an optional property by an empty set of triples

Abuses
• Model multiple values of a property by concatenating the values into a single object
  – Defeats database indexing
  – Slows queries since substring matching must be used
• Model the absence of an optional value using the presence of an empty string
  – Adds many unnecessary triples
  – Slows queries (longer scans)
  – Sometimes an empty string is a meaningful value
  – Sometimes an empty string is lexically invalid
• See following RTC tag case study
  – Defect

“Tags” is multi-valued
“Estimate” is optional

RDF
dcterms:subject "datagap, oslc, next_release_candidate, data_gap, reporting-gap"^^xsd:string ;
…
rtc_cm:estimate ""^^xsd:long.
Syntax validated OK. There were warnings:
Typed literal has an invalid lexical value: Input string was not in the correct format: s.Length==0.: ""^^.
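
The fix for both problems can be sketched in a few lines: emit one dcterms:subject triple per tag, and emit no rtc_cm:estimate triple at all when the estimate is absent. The function name and the tuple-based triple representation are illustrative assumptions, not the tool's actual serializer.

```python
def work_item_triples(subject, tags, estimate):
    """Build triples for a work item's tags and optional estimate."""
    triples = []
    for tag in tags:
        tag = tag.strip()
        if tag:
            # One triple per value: same subject and predicate, different object.
            triples.append((subject, "dcterms:subject", f'"{tag}"'))
    if estimate is not None:
        # Only state the estimate when there is one; never emit ""^^xsd:long.
        triples.append((subject, "rtc_cm:estimate", f'"{int(estimate)}"^^xsd:long'))
    return triples


# Splitting the concatenated string from the case study into separate values.
TAGS = "datagap, oslc, next_release_candidate, data_gap, reporting-gap".split(",")
```

With separate triples, a query for work items tagged oslc becomes an exact match on the object instead of a substring scan.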

PROVENANCE AND INVERSE PROPERTIES

Provenance: Where did the triple come from?
• A statement is represented by a triple
• Triples from multiple documents may be merged and queried
  – Default graph is a triple store
• When storing RDF documents, the document URL is often used as the name of a graph (e.g. in LQE)
  – triple + graph name = quad
  – triple stores are really quad stores
• Provenance of triples is important in several use cases
  – Updating a document
  – Access control
  – VVC (which version)
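
The "triple + graph name = quad" idea can be sketched with an in-memory store that uses the document URL as the graph name. The class and method names are hypothetical and say nothing about how LQE is implemented; the sketch only shows why the graph name makes document updates and provenance lookups straightforward.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "subject predicate object graph")


class QuadStore:
    """Tiny in-memory quad store keyed by source document URL."""

    def __init__(self):
        self._quads = set()

    def load(self, document_url, triples):
        """Replace the named graph for a document with its current triples."""
        self._quads = {q for q in self._quads if q.graph != document_url}
        self._quads |= {Quad(s, p, o, document_url) for s, p, o in triples}

    def graphs_stating(self, subject, predicate, obj):
        """Return the documents that contain the given triple."""
        return sorted(
            q.graph
            for q in self._quads
            if (q.subject, q.predicate, q.object) == (subject, predicate, obj)
        )
```

Updating a document is then a matter of reloading its graph, and access control can be decided per graph name.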

Provenance and authority
• The authority (trust) of a triple depends on the author of the document that contains the triple
• Triples should be placed in the document that the author is authorized to modify
  – When creating a link from A to B, put the link in the document that the author is editing, not necessarily A or B or both
  – Document C may contain links from A to B

Inverse properties
• Directed relations between resources (links) may be stated in two equivalent ways, e.g.
  – Testcase1 validates Requirement2.
  – Requirement2 isValidatedBy Testcase1.
• There is no benefit to having mutual inverse pairs of properties
• The existence of mutual inverse pairs of properties makes query authoring more complex, and query execution more expensive
• A triple should be put in the document that the author of the triple is editing (provenance)
  – There is no special significance attached to being the subject of a triple
• See OSLC guidance on preferred direction of properties
  – Direction should be from downstream to upstream, e.g. test case validates requirement

Abuses
• OSLC domain specs define many pairs of mutual inverse predicates
• Recommendation
  – Deprecate one member of each pair
  – Replace deprecated property in all RDF representations and queries
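
Replacing a deprecated inverse property amounts to swapping subject and object and substituting the preferred predicate. A minimal sketch follows, assuming a hypothetical `ex:` prefix and a hand-maintained table of deprecated-to-preferred predicates.

```python
# Hypothetical table: deprecated predicate -> preferred inverse predicate.
INVERSE_OF = {"ex:isValidatedBy": "ex:validates"}


def normalize(triples):
    """Rewrite triples so that only the preferred direction remains."""
    result = set()
    for s, p, o in triples:
        if p in INVERSE_OF:
            # Swap subject and object and use the preferred predicate.
            s, p, o = o, INVERSE_OF[p], s
        result.add((s, p, o))
    return result
```

After normalization the two equivalent statements collapse into one triple, so queries need to match only a single predicate.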

ONTOLOGIES AND CONSTRAINTS

Vocabularies and Ontologies
• A vocabulary defines the meaning of terms
  – Use RDFS: rdfs:label, rdfs:comment, rdfs:isDefinedBy, …
• An ontology defines inference rules
  – Given a set of triples, infer more triples
  – Use RDFS: rdfs:domain, rdfs:range, rdfs:subClassOf, …
  – Use OWL for more complex inference rules

Ontologies and Constraints
• Ontologies are not designed to define integrity constraints
  – See Linked Data Interfaces for examples
• An RDFS or OWL reasoner will add triples to create a model for the ontology
• A reasoner will report an inconsistency if it cannot create a model
  – However, this mechanism cannot in practice be used to check for typical integrity constraints

Best Practice: Ontologies
• Your triples may end up in a reasoner one day, so only add inference rules when they produce the intended results
• If you define generic properties, such as “uses”, then you probably SHOULD NOT define rdfs:domain and rdfs:range
• If you define type-specific properties, such as “usesTestCase”, then rdfs:domain and rdfs:range MAY make sense
• e.g. If you intend to infer that the object of oslc_qm:usesTestCase is an oslc_qm:TestCase then include the following triple in an ontology:
  oslc_qm:usesTestCase rdfs:range oslc_qm:TestCase.
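
What a reasoner does with an rdfs:range statement can be sketched with the RDFS range rule alone: if a property has range C and a triple uses that property, the object is inferred to have type C. The function below implements only that one rule over tuples of prefixed names; it is an illustration, not a full RDFS reasoner, and the `ex:` resources in the usage note are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def infer_range_types(triples):
    """Apply the RDFS range rule once and return the enlarged triple set."""
    # Collect (property, class) pairs declared with rdfs:range.
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set(triples)
    for s, p, o in triples:
        for prop, cls in ranges:
            if p == prop:
                # The object of the property is inferred to be of the range class.
                inferred.add((o, RDF_TYPE, cls))
    return inferred
```

Given `ex:plan1 oslc_qm:usesTestCase ex:req7`, the rule adds `ex:req7 rdf:type oslc_qm:TestCase` even if ex:req7 is really a requirement. The reasoner adds a triple rather than reporting an error, which is why range declarations are a poor substitute for integrity constraints.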

Best Practice: Constraints
• W3C is starting an activity on RDF validation
  – See W3C workshop
• We have submitted the OSLC Resource Shape specification to W3C
  – See Resource Shape 2.0
• Use Resource Shape 2.0 to describe integrity constraints on RDF documents
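
An integrity constraint, unlike an inference rule, checks the data and reports violations. The sketch below checks property cardinalities for one subject, borrowing the four occurrence names used by OSLC Resource Shapes (Exactly-one, Zero-or-one, Zero-or-many, One-or-many). The dictionary-based shape and the function name are simplifying assumptions; this is not the Resource Shape syntax or a validator for it.

```python
# Occurrence name -> (minimum, maximum); None means unbounded.
OCCURS = {
    "Exactly-one": (1, 1),
    "Zero-or-one": (0, 1),
    "Zero-or-many": (0, None),
    "One-or-many": (1, None),
}


def check_cardinality(triples, subject, shape):
    """Return a list of cardinality violations for one subject.

    shape: dict mapping a property name to an occurrence name (hypothetical format).
    """
    problems = []
    for prop, occurs in shape.items():
        low, high = OCCURS[occurs]
        count = sum(1 for s, p, _ in triples if s == subject and p == prop)
        if count < low or (high is not None and count > high):
            problems.append(f"{prop}: expected {occurs}, found {count}")
    return problems
```

A work item with two dcterms:title values fails an Exactly-one check here, whereas a reasoner would accept both triples without complaint.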

Other topics
• Blank nodes
  – Mean “there exists” or “some”
  – use fragment ids for internal resources
• Containers
  – Avoid Seq, Bag, List
  – Use Linked Data Platform containers
• Consuming external vocabularies
  – Tools should gracefully degrade when external resources are unreachable
  – Be a well-behaved HTTP client wrt caching, etc.