Semantic CMS Community: Semantic Data Access. SRDC Ltd., August 2011. Co-funded by the European Union. Copyright IKS Consortium.
Page: Outline. Semantic Data: Semantic Web, RDF. Semantic Data Storage: Triple Stores. Semantic Data Access: SPARQL, RQL, API Calls.
Page: Semantic Data. Stands for machine-understandable information. Allows computers to interpret data without user intervention. Allows computers to act intelligently without being programmed for each task.
Page: Semantic Data. Provides the infrastructure to get practical results. Applications discover further information by following existing relations (e.g. Eiffel Tower -> Paris -> France). Allows reasoning, i.e. the extraction of related information that is not directly linked.
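The Eiffel Tower example can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the LOCATED_IN map and the chain helper are hypothetical names, and a real application would follow such relations in an RDF graph rather than in a hard-coded map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RelationChainSketch {
    // Hypothetical facts: each key is directly "located in" its value.
    static final Map<String, String> LOCATED_IN = Map.of(
            "Eiffel Tower", "Paris",
            "Paris", "France");

    // Follow the relation from a starting resource until no further link exists.
    static List<String> chain(String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) { // stop on cycles
            path.add(current);
            current = LOCATED_IN.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Eiffel Tower and France are never linked directly, yet the chain relates them.
        System.out.println(chain("Eiffel Tower")); // [Eiffel Tower, Paris, France]
    }
}
```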
Page: Semantic Web. A classic generic description: “Web of data”. Extends the World Wide Web by encouraging: a common language for representing data that is transformable to and from disparate sources such as relational databases, XML, etc. (RDF); a common, reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.); rules that enable applications to reason over the information (SWRL).
Page: Semantic Web Stack
Page: Semantic Web. Many organizations publish their data in different domains: media, geographic, government, … The whole set contains approximately 30 billion triples. One of the largest collections is DBpedia, a semantified version of Wikipedia. Example: obtain the cities of China that have a population over 20 million. This needs efficient storage and querying of semantic data.
Page: Representation of Semantic Data: RDF. The common data format. An abstract model with several serialization formats. Consists of statements, referred to as triples, of the form (subject, predicate, object), where: Subject: any resource identifier. Predicate: the resource identifier of a property. Object: either a resource identifier or a literal value.
Page: RDF. Two types of resource identifiers: URIRef and BNode (blank node). Property: used when talking about a particular aspect of a resource; must be represented by a URIRef and may not be given a BNode identifier. Literal: used when the object of the statement has no resource identifier; it holds the value itself.
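The rules above can be captured in a small type model. The following is a minimal sketch in Java 16 or later, not the API of any RDF library: the type names (Node, UriRef, BNode, Literal, Triple) and the example.org URIs are assumptions made for illustration.

```java
final class RdfModelSketch {
    // A node is either a URI reference, a blank node or a literal.
    interface Node {}
    record UriRef(String uri) implements Node {}
    record BNode(String label) implements Node {}
    record Literal(String value) implements Node {}

    // The predicate is typed as UriRef, so a BNode or Literal predicate does not compile.
    record Triple(Node subject, UriRef predicate, Node object) {
        Triple {
            // The subject must be a resource identifier (URIRef or BNode), never a literal.
            if (subject instanceof Literal) {
                throw new IllegalArgumentException("the subject must be a resource identifier");
            }
        }
    }

    public static void main(String[] args) {
        UriRef tokyo = new UriRef("http://example.org/Tokyo");
        Triple named = new Triple(tokyo, new UriRef("http://example.org/name"), new Literal("Tokyo"));
        Triple anonymous = new Triple(new BNode("b0"), new UriRef("http://example.org/city"), tokyo);
        System.out.println(named);
        System.out.println(anonymous);
    }
}
```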
Page: RDF Serialization Formats: RDF/XML, N3, N-Triples, TriG, TriX, Turtle, JSON, JSON-LD, RDFa.
Page: RDF Serialization Formats. RDF/XML is one of the most widely used serialization formats. It makes use of relative and absolute URIs, namespaces, XSD datatypes, and the terms declared in the XML Information Set.
Page: RDF Serialization Formats. Notation 3 (N3): a more human-readable RDF serialization. Aims to integrate logic and data in the same language by allowing smooth integration of rules with RDF.
Page: RDF Examples. RDF/XML: <rdf:RDF xml:base="..." xmlns:rdf="..." xmlns:dbpprop="..." xmlns:dbpont="..." ...
Page: RDF Examples Cont’d. N3 notation of the previous example:
    ex:Japan rdf:type dbpont:Country ;
        dbpprop:areaKm ... ;
        dbpprop:city ex:Tokyo .
    ex:Tokyo rdf:type dbpont:City .
Page: Storing Semantic Data. Triple collections need specialized storage designs. Two modalities: relational databases and triple stores. Triple stores are the most commonly used option for storage, with many implementations; they can also be RDB based.
Page: Triple Store. A purpose-built database for the storage and retrieval of RDF data. Optimized for adding, removing and querying triples. Each triple in the triple store has the form (subject, predicate, object).
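The three operations named above (add, remove, query) can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any production triple store is built: the class and method names are hypothetical, terms are plain strings, and queries scan all triples instead of using indexes. It requires Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TripleStoreSketch {
    record Triple(String subject, String predicate, String object) {}

    // A set, so adding the same statement twice stores it once.
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    // Pattern query: null in any position acts as a wildcard.
    List<Triple> match(String s, String p, String o) {
        return triples.stream()
                .filter(t -> (s == null || s.equals(t.subject()))
                        && (p == null || p.equals(t.predicate()))
                        && (o == null || o.equals(t.object())))
                .toList();
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        store.add("ex:Japan", "rdf:type", "dbpont:Country");
        store.add("ex:Japan", "dbpprop:city", "ex:Tokyo");
        store.add("ex:Tokyo", "rdf:type", "dbpont:City");
        System.out.println(store.match("ex:Japan", null, null)); // both statements about ex:Japan
        store.remove("ex:Japan", "dbpprop:city", "ex:Tokyo");
        System.out.println(store.match(null, "rdf:type", null).size()); // 2
    }
}
```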
Page: Considering XML Databases. XML databases are existing storage systems for semi-structured data. Idea: transform RDF to XML and store it in an XML database. Yet the XML data model is not exactly the same as that of semantic data: the XML data model is a tree-like structure, whereas RDF data is represented as a graph without a hierarchy.
Page: Considering XML Databases. XML databases are not suitable for storing and querying RDF: only simple manipulations can be handled through XML query languages; RDF Schema processing and inference are not possible; the standard RDF/XML mapping is unsuitable.
Page: Monolithic approach for DB Based Triple Stores. A generic representation for all RDF schemas. Only two tables are used: a Resources table and a Triples table.
Page: Monolithic approach for DB Based Triple Stores. [Figure: a Triples table with columns predid, subid, objid and objvalue, and a Resources table with columns id and uri that maps numeric ids to resource URIs.]
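The two-table layout can be mimicked in memory to show how statements are broken into resource ids. The sketch below is an illustration only: the column names follow the figure, but the class and method names are hypothetical, and the reading that objid is empty when the object is a literal kept in objvalue is an assumption. It requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MonolithicSchemaSketch {
    // One row of the Triples table; objid is null when the object is a literal (assumption).
    record TripleRow(int subid, int predid, Integer objid, String objvalue) {}

    // Resources table: uri -> id, with ids assigned from 1 in insertion order.
    final Map<String, Integer> resources = new LinkedHashMap<>();
    final List<TripleRow> triples = new ArrayList<>();

    // Look up the id of a URI, registering the URI first if it is new.
    int resourceId(String uri) {
        Integer id = resources.get(uri);
        if (id == null) {
            id = resources.size() + 1;
            resources.put(uri, id);
        }
        return id;
    }

    void addResourceTriple(String s, String p, String o) {
        triples.add(new TripleRow(resourceId(s), resourceId(p), resourceId(o), null));
    }

    void addLiteralTriple(String s, String p, String literal) {
        triples.add(new TripleRow(resourceId(s), resourceId(p), null, literal));
    }

    public static void main(String[] args) {
        MonolithicSchemaSketch db = new MonolithicSchemaSketch();
        db.addResourceTriple("ex:Japan", "rdf:type", "dbpont:Country");
        db.addLiteralTriple("ex:Japan", "rdfs:label", "Japan");
        System.out.println(db.resources); // {ex:Japan=1, rdf:type=2, dbpont:Country=3, rdfs:label=4}
        db.triples.forEach(System.out::println);
    }
}
```

Because every schema maps onto the same two tables, nothing in this layout changes when new classes or properties appear; the cost is that each lookup of a URI goes through the Resources table.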
Page: Triple Stores. Can be grouped into three categories. In-memory triple stores: used for certain operations like benchmarking, caching, etc. Native triple stores: provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …). Non-memory, non-native triple stores: built on third-party databases (Jena SDB, Kaon, …).
Page: Functionalities provided by Triple Stores. RDBMS support; general RDF model access; query language support in the store, such as RQL or SPARQL. Some stores provide: provenance (tracking of who said what); APIs for accessing the triple store over a network. Very few stores provide: full-text search; inference and rule languages.
Page: Example Triple Store implementations. RDFSuite: based on an ORDBMS model (Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001). Sesame: relational databases (MySQL, PostgreSQL, Oracle). Jena: relational databases (MySQL, PostgreSQL, Oracle). Virtuoso: native RDF quad storage (physical quads).
Page: RDFSuite (ICS-FORTH)*. * IST C-Web, IST Mesmuses
Page: How triples are stored and accessed in RDFSuite. Separate tables are created to store resources: properties, subClasses, subProperties and instances. Indices are built on attributes such as URI, source and target. Querying is possible through RQL.
Page: How triples are stored and accessed in RDFSuite. [Figure from *] *Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Page: Sesame Architecture. DBMS-independent API for accessing triple repositories. SAIL API: a set of Java interfaces between the other modules and the repository; it abstracts from the actual storage mechanism. Query Module: RQL support. Different ways to communicate with clients, through protocol handlers. *Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Page: SAIL API over PostgreSQL. PostgreSQL is an object-relational DBMS. It supports sub-table relations between its tables, which are used to provide RDF Schema class and property subsumption. Individuals are stored in separate tables created for resources. Adding tables is difficult. *Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Page: SAIL API over MySQL. The database schema does not change when the RDFS changes. This is an advantage where the RDFS is unstable. *Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Page: Jena2 Architecture
Page: Jena2 Architecture. *Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
Page: Jena2. Denormalized schema: avoids unnecessary joins by storing URIs and literals directly in the statements table. Multiple statement tables: better locality and caching. Property tables.
Page: Normalized vs Denormalized Tables
Page: Property Tables

Triple Store Only:
Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male

Triple Store:
Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8

Person Property Table:
ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male

*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
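The split shown in the tables above can be reproduced with a few lines of Java. This is a rough sketch of the idea, not Jena2's implementation: the class and field names are hypothetical, the choice of name, age and gender as clustered columns is taken from the example, and each clustered property is assumed to be single-valued. It requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PropertyTableSketch {
    record Triple(String subject, String property, String object) {}

    // Properties clustered into the Person property table (taken from the example).
    static final Set<String> PERSON_COLUMNS = Set.of("name", "age", "gender");

    // Person property table: one row per subject, one column per clustered property.
    final Map<String, Map<String, String>> personTable = new LinkedHashMap<>();
    // Everything else stays in the generic triple table.
    final List<Triple> remaining = new ArrayList<>();

    void add(Triple t) {
        if (PERSON_COLUMNS.contains(t.property())) {
            personTable.computeIfAbsent(t.subject(), s -> new LinkedHashMap<>())
                    .put(t.property(), t.object());
        } else {
            remaining.add(t);
        }
    }

    public static void main(String[] args) {
        PropertyTableSketch store = new PropertyTableSketch();
        List.of(
                new Triple("person1", "name", "Alice"),
                new Triple("person1", "age", "32"),
                new Triple("person1", "twinOf", "person2"),
                new Triple("person1", "faxPhone", "x1234"),
                new Triple("person1", "adminPh", "x5678"),
                new Triple("person2", "name", "Bob"),
                new Triple("person2", "age", "35"),
                new Triple("person2", "adopteeOf", "person6"),
                new Triple("person2", "friendOf", "person8"),
                new Triple("person2", "gender", "male")
        ).forEach(store::add);
        System.out.println(store.personTable);
        System.out.println(store.remaining.size()); // 5
    }
}
```

Reading the name and age of a person now touches a single row instead of two separate triples, which is the locality benefit the property table is meant to provide.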
Page: Jena Persistence Options: SDB. Scalable storage and query for RDF. Specifically designed for SPARQL support. Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2. Scales to graphs of 100 million triples.
Page: Jena Persistence Options: TDB. Provides large-scale storage and query of RDF datasets using a pure Java engine. Supports SPARQL. A non-transactional, faster database solution for use by a single system. It scales well beyond SDB and is simpler to set up.
Page: Virtuoso. General-purpose RDBMS with extensive RDF adaptations. RDF data is stored as quads, i.e. (graph, subject, predicate, object) tuples, so it supports RDF with named graphs. The columns are G for graph, S for subject, P for predicate and O for object.
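The difference between a triple and a quad is just the extra graph column. The sketch below only illustrates that idea in Java 16 or later; it is not Virtuoso's storage layout, and the Quad record, the inGraph helper and the graph names are hypothetical.

```java
import java.util.List;

final class QuadSketch {
    // Field names follow the column letters: G graph, S subject, P predicate, O object.
    record Quad(String g, String s, String p, String o) {}

    // All statements that belong to one named graph.
    static List<Quad> inGraph(List<Quad> quads, String graph) {
        return quads.stream().filter(q -> q.g().equals(graph)).toList();
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
                new Quad("ex:geo", "ex:Japan", "dbpprop:city", "ex:Tokyo"),
                new Quad("ex:people", "ex:person1", "foaf:name", "Alice"));
        System.out.println(inGraph(quads, "ex:geo"));
    }
}
```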
Page: Querying Semantic Data. Semantic data can be queried from triple stores through: various query languages (SPARQL, with different endpoints provided; RQL; RDQL; SeRQL; …); API calls, through the proprietary APIs of different projects; Linked Data.
Page: SPARQL. Is an RDF query language. Standardized by the W3C. Plays a role similar to that of SQL for databases. Syntactically resembles SQL. Operates on RDF graphs instead of databases.
Page: SPARQL. Provides four types of query. SELECT: used for querying an RDF graph by selecting certain fields from the query pattern. CONSTRUCT: used for constructing a single RDF graph specified by a graph template. ASK: used for checking the existence of a resource that matches the query pattern. DESCRIBE: used for constructing an RDF graph whose structure, unlike with CONSTRUCT, is determined by the SPARQL query processor.
Page: SPARQL. SPARQL SELECT queries are composed of the following parts: prefix declarations, field declarations, dataset selection, query pattern, query modifiers.
Page: Prefix Declaration. Prefix declarations make it possible to use short names instead of full URIs.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
Page: Field declarations. Desired fields from the query pattern are specified after the SELECT keyword.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    SELECT ?person ?name
Page: Dataset selection. The RDF graphs to be queried are stated with a FROM clause.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    SELECT ?person ?name
    FROM <...>
Page: Query Pattern. A set of triple patterns that determines the target resources to be selected.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    SELECT ?person ?name
    FROM <...>
    WHERE { ?person foaf:name ?name. }
Page: Query Modifiers. SPARQL has plenty of query modifiers for post-processing query results: Order, Limit, Distinct, Offset, Reduced, Projection.
Page: Query Modifiers. They are used to eliminate duplicate results, order the results, limit the number of returned results, etc.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    SELECT ?person ?name
    FROM <...>
    WHERE { ?person foaf:name ?name. }
    ORDER BY ?name
    LIMIT 5
    OFFSET 10
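The effect of these modifiers on a result sequence can be imitated with Java streams. The following is only an analogy, assuming a plain list of name strings as the result set: the page helper is a hypothetical name, and DISTINCT is added for illustration even though the query above does not use it. It requires Java 16 or later.

```java
import java.util.List;
import java.util.stream.IntStream;

final class QueryModifierSketch {
    // Mirrors ORDER BY ?name, DISTINCT, OFFSET offset, LIMIT limit, applied in that order.
    static List<String> page(List<String> names, int offset, int limit) {
        return names.stream()
                .sorted()      // ORDER BY ?name
                .distinct()    // DISTINCT
                .skip(offset)  // OFFSET
                .limit(limit)  // LIMIT
                .toList();
    }

    public static void main(String[] args) {
        // 20 hypothetical names "name00" .. "name19", each listed twice.
        List<String> names = IntStream.range(0, 40)
                .mapToObj(i -> String.format("name%02d", i % 20))
                .toList();
        System.out.println(page(names, 10, 5)); // [name10, name11, name12, name13, name14]
    }
}
```

Note that the offset is applied before the limit, so LIMIT 5 OFFSET 10 returns results 11 to 15 of the ordered sequence regardless of the order in which the two keywords are written.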
Page: FILTER and OPTIONAL Clauses. FILTER allows a subset of the functions defined by the XQuery specification to be called in the query pattern of a SPARQL query.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    PREFIX fn: <...>
    SELECT ?person ?name
    FROM <...>
    WHERE {
        ?person foaf:name ?name.
        FILTER (fn:string-length(?name) > 10)
    }
Page: FILTER and OPTIONAL Clauses. The OPTIONAL clause specifies query patterns whose match is not obligatory: a solution is kept even when the optional pattern does not match.
    PREFIX rdf: <...>
    PREFIX foaf: <...>
    PREFIX fn: <...>
    PREFIX info: <...>
    SELECT ?person ?name
    FROM <...>
    WHERE {
        ?person foaf:name ?name.
        FILTER (fn:string-length(?name) > 10)
        OPTIONAL { ?person foaf:homepage ?page. }
    }
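The semantics of the two clauses can be imitated over in-memory maps: FILTER drops solutions, while OPTIONAL behaves like a left join that never drops one. This is a sketch under stated assumptions, not a SPARQL engine: the Row record, the query helper, the person ids and the example.org homepage are all hypothetical, and String.length is used as a stand-in for fn:string-length, which is adequate for plain ASCII names. It requires Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class FilterOptionalSketch {
    // One solution; page stays empty when the optional pattern has no match.
    record Row(String person, String name, Optional<String> page) {}

    static List<Row> query(Map<String, String> names, Map<String, String> homepages) {
        return names.entrySet().stream()
                // FILTER (fn:string-length(?name) > 10): removes non-matching solutions
                .filter(e -> e.getValue().length() > 10)
                // OPTIONAL { ?person foaf:homepage ?page }: keeps the solution either way
                .map(e -> new Row(e.getKey(), e.getValue(),
                        Optional.ofNullable(homepages.get(e.getKey()))))
                // sorted only to make the output order deterministic
                .sorted(Comparator.comparing(Row::person))
                .toList();
    }

    public static void main(String[] args) {
        Map<String, String> names = Map.of(
                "p1", "Tim Berners-Lee",
                "p2", "Bob",
                "p3", "Alice Liddell");
        Map<String, String> homepages = Map.of("p1", "http://example.org/tim");
        // p2 is filtered out; p3 is kept although it has no homepage.
        query(names, homepages).forEach(System.out::println);
    }
}
```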
Page: ASK Query Example. Checks the existence of a resource having "Tim Berners-Lee" as foaf:name.
    PREFIX foaf: <...>
    ASK WHERE { ?person foaf:name "Tim Berners-Lee". }
Page: CONSTRUCT Query Example. The example below constructs a new RDF graph by changing the type values of skos:Concept resources. In other words, it transforms the SKOS vocabulary into a new custom vocabulary.
    PREFIX rdf: <...>
    PREFIX skos: <...>
    PREFIX myvocab: <...>
    CONSTRUCT { ?person rdf:type myvocab:MyType. }
    WHERE { ?person rdf:type skos:Concept. }
Page: SPARQL Endpoints. Provide functionality to query the knowledge base via the SPARQL language. Accept queries and return results over HTTP. Query results can be in different formats such as RDF, XML, HTML, JSON, CSV.
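Sending a query to an endpoint amounts to an ordinary HTTP request. The sketch below builds such a request with the JDK's own HTTP client (Java 11 or later) in the style of the SPARQL Protocol, where the query is passed in the query URL parameter. The endpoint address http://example.org/sparql and the buildRequest helper are assumptions for illustration, and the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SparqlEndpointSketch {
    // Builds a GET request that carries the URL-encoded query in the "query" parameter.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                // ask for SELECT/ASK results as JSON; other formats use other media types
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://example.org/sparql", "ASK WHERE { ?s ?p ?o }");
        System.out.println(request.uri());
        // To execute against a real endpoint:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The Accept header is where the result format from the list above is negotiated; which formats are actually available depends on the endpoint.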
Page: Semantic Data Access With API Calls. Open source projects provide APIs to manipulate RDF data: Jena, Apache Clerezza, Sesame, JRDF.
Page: Jena. Jena provides a rich API to manipulate the RDF stored in the underlying triple store: a Model to represent graphs; CRUD methods for triples; querying methods for existing resources. See the next slide for the code snippet…
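Page: Jena Code Snippet
    // package names of Apache Jena 3 and later; older releases use com.hp.hpl.jena
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    String personURI = "...";
    String givenName = "John";
    String familyName = "Smith";
    String fullName = givenName + " " + familyName;

    // create an empty Model which represents an RDF graph
    Model model = ModelFactory.createDefaultModel();

    // create the resource which will produce the triples in the next slide
    Resource johnSmith = model.createResource(personURI)
        .addProperty(VCARD.FN, fullName)
        .addProperty(VCARD.N, model.createResource()   // blank node for the structured name
            .addProperty(VCARD.Given, givenName)
            .addProperty(VCARD.Family, familyName));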
Page: Jena. Triples created by the code snippet in the previous slide: (personURI, VCARD.FN, "John Smith"), (personURI, VCARD.N, _), (_, VCARD.Given, "John"), (_, VCARD.Family, "Smith"). Note that the _ symbol represents a blank node.
Page: Apache Clerezza. Provides an API that is independent of the different triple stores it supports. Its API provides a model to represent RDF graphs and to manipulate those graphs. Also provides a SPARQL endpoint to query the stored knowledge.
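Page: Apache Clerezza Code Snippet. Simple code snippet adding two triples to the graph (rdfType, foafPerson and vcardFn stand for Strings holding the full URIs of rdf:type, foaf:Person and VCARD FN):
    String base = "...";
    MGraph g = new SimpleMGraph();
    // (JohnSmith, rdf:type, foaf:Person)
    g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef(rdfType),
        new UriRef(foafPerson)));
    // (JohnSmith, VCARD FN, "John")
    g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef(vcardFn),
        LiteralFactory.getInstance().createTypedLiteral("John")));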
Page: Linked Data. Interrelated datasets on the Web, published so that computers can explore them. Has a standard format to be accessed and managed. Provides integration and reasoning over a huge amount of data on the Web.
Page: Linked Data. The four well-known principles of linked data, presented by Tim Berners-Lee: 1. Use URIs as names of things. 2. Use HTTP URIs to provide dereferenceable data to people. 3. When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL). 4. Provide links to other URIs to make the discovery of related data possible.
Page: Linked Data
Page: Linking Open Data Project. Is a W3C SWEO project. Aims to make data freely available to everyone. Aims to publish open datasets as RDF and set semantic relationships between them. Serves information in a machine-readable format. Enriches content. Reduces duplication. The number of linked datasets is increasing rapidly; a large number of datasets are linked already.
Page: Linked Datasets As of October 2008
Page: Linked Datasets As of September 2010
Page: 2011
Page: Access Data In The Cloud. Follow the RDF links representing the “things”. SPARQL endpoints. Ready-to-use software to discover linked data (see the next slide).
Page: Linked Data Applications. Lots of applications on top of linked data: Tabulator, Marbles, OpenLink RDF Browser, … Just google "RDF Crawlers" and "RDF Browsers". Also see the following link containing a number of linked data applications: LinkingOpenData/Applications
Page: Available SPARQL Endpoints. To see possible SPARQL endpoints providing a certain URI, see ... See also a list of live SPARQL endpoints: ...
Page: References
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle: The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Jeen Broekstra, Arjohn Kampman, Frank van Harmelen: Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases