RDF Triple Stores Nipun Bhatia Department of Computer Science. Stanford University.


1 RDF Triple Stores Nipun Bhatia Department of Computer Science. Stanford University

2 Contents
- Introduction
- Different Architectures
  - Implications
- An Example: Jena SDB
- Evaluations
  - Evaluations using LUBM/DBpedia
- Open Research Issues
- Which RDF store to choose for a particular application?
- Possible system diagram for Phenotype Annotations

3 Introduction
- What is an RDF store? A system that provides persistent storage of, and access to, RDF graphs.
- Potential application areas: plenty! Backend for Protege, BioPortal, Phenotype Annotations.

4 Different Architectures
- Based on their implementation, RDF stores can be divided into 3 broad categories: In-memory, Native, and Non-native Non-memory.
- In-memory: the RDF graph is stored as triples in main memory. E.g. storing an RDF graph using the Jena API or the Sesame API.
- Native: persistent storage systems with their own database implementation. E.g. Sesame Native, Virtuoso, AllegroGraph, Oracle 11g.
- Non-native Non-memory: persistent storage systems set up to run on third-party databases. E.g. Jena SDB.
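
To make the in-memory category concrete, here is a minimal sketch of a store that keeps every triple in main memory and answers pattern lookups by scanning. The class name `InMemoryStore`, its methods, and the `ex:` terms are hypothetical and only illustrate the idea; this is not the Jena or Sesame API.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class InMemoryStore:
    """Hypothetical toy store: all triples live in a Python set in RAM."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Sets ignore duplicates, so adding the same triple twice is harmless.
        self._triples.add((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        # None acts as a wildcard for that position.
        for t in self._triples:
            if (
                (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)
            ):
                yield t


store = InMemoryStore()
store.add("ex:alice", "rdf:type", "ex:Student")
store.add("ex:alice", "ex:memberOf", "ex:dept0")
store.add("ex:bob", "rdf:type", "ex:Professor")

# All subjects typed as ex:Student.
students = sorted(t[0] for t in store.match(p="rdf:type", o="ex:Student"))
```

Nothing here survives a process restart, which is the practical difference from the two persistent categories.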

5 Implications
- Scalability
- Different query languages are supported to varying degrees. Sesame: SeRQL; Oracle 11g: its own query language.
- Different levels of inferencing. Sesame supports RDFS inference; AllegroGraph: RDFS++; Oracle 11g: RDFS++, OWL Prime.
- Lack of interoperability and portability, more pronounced in native stores.

6 Jena SDB
- SDB is basically a Java loader.
- Multiple stores supported: MySQL, PostgreSQL, Oracle, DB2.
- Takes incoming triples and breaks them down into components ready for the database.
- Multiple layouts.
- Integration with the Joseki server.
- SPARQL supported.
(Non) Interest Declaration: I was previously an intern at HP Labs with the Jena team.
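
The "breaks them down into components" step can be sketched as follows. This is an illustration under stated assumptions, not SDB's actual schema or loader: SQLite stands in for the MySQL/PostgreSQL backends, and the table names `nodes` and `triples` and the helper `node_id` are hypothetical. Each term is stored once in a node table under a hash-derived id, and the triple table holds only ids.

```python
import hashlib
import sqlite3


def node_id(term: str) -> int:
    """Hypothetical helper: derive a stable integer id from a term's text."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    # 7 bytes keeps the value inside SQLite's signed 64-bit integer range.
    return int.from_bytes(digest[:7], "big")


conn = sqlite3.connect(":memory:")  # stand-in for a third-party database
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE triples (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o))"
)


def load_triple(s: str, p: str, o: str) -> None:
    """Split one triple into its three terms and store them by id."""
    ids = []
    for term in (s, p, o):
        nid = node_id(term)
        conn.execute(
            "INSERT OR IGNORE INTO nodes (id, lex) VALUES (?, ?)", (nid, term)
        )
        ids.append(nid)
    conn.execute("INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", ids)


load_triple("ex:alice", "rdf:type", "ex:Student")
load_triple("ex:bob", "rdf:type", "ex:Student")

node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]

# Reading back requires a join from ids to the lexical forms.
subjects = [
    row[0]
    for row in conn.execute(
        "SELECT n.lex FROM triples t JOIN nodes n ON n.id = t.s "
        "WHERE t.p = ? ORDER BY n.lex",
        (node_id("rdf:type"),),
    )
]
```

Shared terms such as `rdf:type` are stored once, at the cost of a join on every read.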

7 Evaluations
- Third-party evaluations for Sesame, Jena SDB, Virtuoso
- Oracle 11g: company evaluations
- Methodology
  - LUBM (Lehigh University Benchmark)
  - DBpedia
  - Multiple queries
  - Load times

8 Evaluations
- DBpedia: database of structured information extracted from Wikipedia. Information about places, persons, music albums and films. [2]
- LUBM: synthetically generated RDF data containing universities, departments, students etc. [1]
- Dataset size:
  - DataSet 1: 15,472,624 triples; 2.1 GB
  - DataSet 2: LUBM 50 (2.75 million) and LUBM 1000 (55.09 million)
- 3 queries

9 Loading Time-DataSet1

10 Results – Query 1
- Simple select query with 2 variables.

11 Query 2
- Unconstrained select query: only the predicate was specified.

12 Query 3
- Complex query: uses a filter.
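
The transcript does not preserve the benchmark queries themselves, so the following only illustrates the three query shapes named on slides 10 to 12. The data, the `ex:` names, and the SPARQL-like comments are all hypothetical.

```python
# Hypothetical sample data; not taken from the benchmark datasets.
triples = [
    ("ex:Berlin", "ex:population", 3400000),
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Paris", "ex:population", 2100000),
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Vaduz", "ex:population", 5000),
]

# Shape of Query 1: a simple select returning two variables.
# Roughly: SELECT ?p ?o WHERE { ex:Berlin ?p ?o }
q1 = [(p, o) for (s, p, o) in triples if s == "ex:Berlin"]

# Shape of Query 2: unconstrained except for the predicate.
# Roughly: SELECT ?s ?o WHERE { ?s rdfs:label ?o }
q2 = [(s, o) for (s, p, o) in triples if p == "rdfs:label"]

# Shape of Query 3: a pattern plus a filter on the matched value.
# Roughly: SELECT ?s WHERE { ?s ex:population ?n . FILTER(?n > 1000000) }
q3 = [s for (s, p, o) in triples if p == "ex:population" and o > 1000000]
```

The second shape is typically the most expensive of the three on a large dataset, because a predicate alone rarely narrows the result much.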

13 Oracle 11g – DataSet 2

Ontology (size)           RDFS                      OWL Prime
                          Triples    Time           Triples    Time
LUBM 50 (6.8 million)     2.75 M     12.14 min      3.05 M     8.01 min
LUBM 1000 (133.6 M)       55.09 M    7 h 19 min     65.25 M    7 h 12 min
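
As a reading aid for the Oracle 11g figures, this sketch converts each Triples/Time pair into a rate and compares how triples and time grow from LUBM 50 to LUBM 1000. It assumes each Triples value pairs with the Time next to it; the transcript does not say exactly what the Time column measures, so the rates are arithmetic on the slide's numbers, not a benchmark claim.

```python
# (triples, seconds) copied from the slide; key names are illustrative.
rows = {
    ("LUBM 50", "RDFS"): (2.75e6, 12.14 * 60),
    ("LUBM 50", "OWL Prime"): (3.05e6, 8.01 * 60),
    ("LUBM 1000", "RDFS"): (55.09e6, 7 * 3600 + 19 * 60),
    ("LUBM 1000", "OWL Prime"): (65.25e6, 7 * 3600 + 12 * 60),
}

# Triples per second for every row.
rates = {key: triples / seconds for key, (triples, seconds) in rows.items()}

# Growth from LUBM 50 to LUBM 1000 under RDFS: triples versus time.
small = rows[("LUBM 50", "RDFS")]
large = rows[("LUBM 1000", "RDFS")]
triple_growth = large[0] / small[0]
time_growth = large[1] / small[1]

for key, rate in sorted(rates.items()):
    print(f"{key[0]:10s} {key[1]:10s} {rate:8.0f} triples/s")
print(f"RDFS: triples x{triple_growth:.1f}, time x{time_growth:.1f}")
```

If time grows faster than the triple count, the rate drops at the larger scale, which is worth checking against the "linear load times" criterion on slide 17.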

14 Observations
- Native stores perform better than systems using third-party stores. Optimizations are possible.
- Each of the systems uses a different database layout. Virtuoso: OGPS, POGS, PSOG, SOPG. SDB: SPO, GSPO.
- Hashing on SDB performs very poorly.
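
The letter orders above (G = graph, S = subject, P = predicate, O = object) name the column order of an index. The sketch below shows why the order matters: a lookup is cheap only when the bound components form a leading prefix of some index. The function names and sample quads are hypothetical, and this is not a description of how Virtuoso or SDB implement their indexes.

```python
from bisect import bisect_left
from typing import Dict, List, Set, Tuple

Quad = Dict[str, str]  # keys: "G", "S", "P", "O"


def build_index(quads: List[Quad], order: str) -> List[Tuple[str, ...]]:
    """Sorted list of quads with components permuted into `order`."""
    return sorted(tuple(q[k] for k in order) for q in quads)


def choose_order(bound: Set[str], orders: List[str]) -> str:
    """Pick the order whose leading run of bound components is longest."""

    def prefix_len(order: str) -> int:
        n = 0
        for k in order:
            if k not in bound:
                break
            n += 1
        return n

    return max(orders, key=prefix_len)


def prefix_scan(index: List[Tuple[str, ...]], prefix: Tuple[str, ...]):
    """Binary-search to the first entry with this prefix, then scan forward."""
    i = bisect_left(index, prefix)
    hits = []
    while i < len(index) and index[i][: len(prefix)] == prefix:
        hits.append(index[i])
        i += 1
    return hits


quads = [
    {"G": "ex:g1", "S": "ex:alice", "P": "rdf:type", "O": "ex:Student"},
    {"G": "ex:g1", "S": "ex:alice", "P": "ex:memberOf", "O": "ex:dept0"},
    {"G": "ex:g2", "S": "ex:bob", "P": "rdf:type", "O": "ex:Student"},
]
orders = ["OGPS", "POGS", "PSOG", "SOPG"]  # the Virtuoso orders from the slide
indexes = {order: build_index(quads, order) for order in orders}

# A pattern with predicate and subject bound is best served by PSOG.
best = choose_order({"P", "S"}, orders)
hits = prefix_scan(indexes[best], ("rdf:type", "ex:alice"))
```

A store with fewer orders, such as the SPO and GSPO pair listed for SDB, has fewer patterns it can answer with a prefix scan.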

15 Open Research Issues
- Inferencing [4]
  - Present common implementations: make a number of small queries to propagate the effects of rule firing. Each of these queries creates an interaction with the database. Not very efficient.
  - Approaches:
    - Snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine.
    - Perform inferencing in-stream: precompute the inference closure of the ontology, analyze the incoming data streams, and add triples based on that closure.
      - Assumes a rigid separation of the RDF data (A-box) and the ontology data (T-box).
      - Even this may not work for very large ontologies, e.g. biomedical ontologies.
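
A minimal sketch of the in-stream approach, restricted to one rule (subclass reasoning over `rdf:type`) to keep it short. It assumes the T-box is available up front as (subclass, superclass) pairs; the function names and `ex:` terms are hypothetical, and this is not the pipeline from [4].

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


def subclass_closure(tbox: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Precompute, once, every superclass reachable from each class."""
    direct: Dict[str, Set[str]] = {}
    for sub, sup in tbox:
        direct.setdefault(sub, set()).add(sup)

    closure: Dict[str, Set[str]] = {}
    for cls in direct:
        seen: Set[str] = set()
        stack = list(direct[cls])
        while stack:
            c = stack.pop()
            if c not in seen:  # the seen set also guards against cycles
                seen.add(c)
                stack.extend(direct.get(c, ()))
        closure[cls] = seen
    return closure


def infer_in_stream(
    abox: Iterable[Triple], closure: Dict[str, Set[str]]
) -> Iterator[Triple]:
    """Pass each incoming triple through and emit inferred types after it."""
    for s, p, o in abox:
        yield (s, p, o)
        if p == "rdf:type":
            for sup in sorted(closure.get(o, ())):
                if sup != o:
                    yield (s, "rdf:type", sup)


tbox = [("ex:GraduateStudent", "ex:Student"), ("ex:Student", "ex:Person")]
abox = [
    ("ex:alice", "rdf:type", "ex:GraduateStudent"),
    ("ex:alice", "ex:memberOf", "ex:dept0"),
]
closure = subclass_closure(tbox)
out = list(infer_in_stream(abox, closure))
```

No database round trip happens during inference, but the whole closure must fit in memory, which is the limitation the slide raises for very large ontologies.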

16 Open Research Issues
- Query optimization
  - Third-party stores undo any optimization done at the API level. The better performance of native stores points in that direction.
  - Some work exists on optimizing SPARQL queries for in-memory stores.
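
As one example of an API-level optimization, the sketch below reorders the triple patterns of a query so that the most selective pattern is evaluated first. This is an illustrative technique chosen for this note, not necessarily the work the slide refers to; the function names and data are hypothetical, and the exact counting used here would be replaced by cheap estimates in a real system.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def estimate(pattern: Triple, triples: List[Triple]) -> int:
    """Count the triples a single pattern matches (exact here, for clarity)."""
    return sum(
        1
        for t in triples
        if all(is_var(q) or q == v for q, v in zip(pattern, t))
    )


def reorder(patterns: List[Triple], triples: List[Triple]) -> List[Triple]:
    """Evaluate the patterns with the fewest matches first."""
    return sorted(patterns, key=lambda pat: estimate(pat, triples))


triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:carol", "rdf:type", "ex:Student"),
    ("ex:carol", "ex:headOf", "ex:dept0"),
]
patterns = [
    ("?x", "rdf:type", "ex:Student"),
    ("?x", "ex:headOf", "ex:dept0"),
]
ordered = reorder(patterns, triples)
```

When such a query is translated to SQL for a third-party database, that database's own planner chooses the join order, so an ordering picked at the API level may not survive.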

17 Which RDF store to choose for an app?
- Frequency of loads that the application would perform.
- Single scaling factor and linear load times.
- Level of inferencing.
- Support for which query language; W3C recommendations.
- Special system needs. E.g. AllegroGraph needs a 64-bit processor.

18 Phenotype Annotations
[System diagram. Labels: set of ontologies required for Phenotype Annotations (e.g. PATO, Fly etc.); Phenotype Annotations; Jena API; Inferencing; Jena Model; SDB; MySQL / Virtuoso.]

19 References
- [1] http://esw.w3.org/topic/RdfStoreBenchmarking
- [2] http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
- [3] Kurt Rohloff et al.: An Evaluation of Triple-Store Technologies for Large Data Stores. Comparing Sesame, Jena and AllegroGraph. 2007.
- [4] N. Bhatia, A. Seaborne: 'Ingestion pipeline for RDF'.

