The Design and Implementation of Minimal RDFS Backward Reasoning in 4store Manuel Salvadores, Gianluca Correndo, Steve Harris, Nick Gibbins, and Nigel.

Slides:



Advertisements
Similar presentations
SPARQL Dimitar Kazakov, with references to material by Noureddin Sadawi ARIN, 2014.
Advertisements

CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
WIMS 2014, June 2-4Thessaloniki, Greece1 Optimized Backward Chaining Reasoning System for a Semantic Web Hui Shi, Kurt Maly, and Steven Zeil Contact:
An Introduction to RDF(S) and a Quick Tour of OWL
Department of Computer Science and Engineering University of Washington Brian N. Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gun Sirer, Marc E. Fiuczynski,
Building and Analyzing Social Networks Web Data and Semantics in Social Network Applications Dr. Bhavani Thuraisingham February 15, 2013.
WIMS 2011, Sogndal, Norway1 Comparison of Ontology Reasoning Systems Using Custom Rules Hui Shi, Kurt Maly, Steven Zeil, and Mohammad Zubair Contact:
Triple Stores
On Triple Dissemination, Forward- Chaining and Load Balancing in DHT Based RDF Stores Dominic Battre, Felix Heine, Andre Höing, and Odej Kao Presented.
 Copyright 2008 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute Context Dependent Reasoning.
Geographical Service: Gianluca Correndo, Manuel Salvadores, Yang Yang, Nicholas Gibbins, Nigel Shadbolt A compass for the Web of Data.
Embedded Network Controller with Web Interface Bradley University Department of Electrical & Computer Engineering By: Ed Siok Advisor: Dr. Malinowski.
Audumbar Chormale Advisor: Dr. Anupam Joshi M.S. Thesis Defense
Triple Stores.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Berlin SPARQL Benchmark (BSBM) Presented by: Nikhil Rajguru Christian Bizer and Andreas Schultz.
1. Motivation Knowledge in the Semantic Web must be shared and modularly organised. The semantics of the modular ERDF framework has been defined model.
RDF Triple Stores Nipun Bhatia Department of Computer Science. Stanford University.
-By Mohamed Ershad Junaid UTD ID :
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Market Blended Insight Modeling propensity to buy with the Semantic Web ISWC 2008 Manuel Salvadores, Landong Zuo, SM Hazzaz Imtiaz, John Darlington, Nicholas.
1 CENTRIA, Dept. Informática da Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal. 2 Institute of Computer Science,
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
Data Intensive Query Processing for Large RDF Graphs Using Cloud Computing Tools Mohammad Farhan Husain, Latifur Khan, Murat Kantarcioglu and Bhavani Thuraisingham.
VLDB2012 Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5 #National University of Singapore, †University of California,
1 © 2012 OpenLink Software, All rights reserved. Virtuoso - Column Store, Adaptive Techniques for RDF Orri Erling Program Manager, Virtuoso Openlink Software.
Hadoop Hardware Infrastructure considerations ©2013 OpalSoft Big Data.
Trisolda Jakub Yaghob Charles University in Prague, Czech Rep.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
IDB, SNU Dong-Hyuk Im Efficient Computing Deltas between RDF Models using RDFS Entailment Rules (working title)
KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association Institute of Applied Informatics.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, Bhavani Thuraisingham University.
Ontologies and Lexical Semantic Networks, Their Editing and Browsing Pavel Smrž and Martin Povolný Faculty of Informatics,
Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.
Srihari Makineni & Ravi Iyer Communications Technology Lab
Comparison of BaseVISor, Jena and Jess Rule Engines Jakub Moskal, Northeastern University Chris Matheus, Vistology, Inc.
TAKE – A Derivation Rule Compiler for Java Jens Dietrich, Massey University Jochen Hiller, TopLogic Bastian Schenke, BTU Cottbus.
SPARQL Query Graph Model (How to improve query evaluation?) Ralf Heese and Olaf Hartig Humboldt-Universität zu Berlin.
Semantic Web Programming in Python an Introduction Biju B Jaganath G.
Authors: Haowei Yuan and Patrick Crowley Publisher: 2013 Proceedings IEEE INFOCOM Presenter: Chia-Yi Chu Date: 2013/08/14 1.
Large-scale Linked Data Management Marko Grobelnik, Andreas Harth (Günter Ladwig), Dumitru Roman Big Linked Data Tutorial Semantic Days 2012.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
RDF languages and storages part 1 - expressivness Maciej Janik Conrad Ibanez CSCI 8350, Fall 2004.
Scalable Distributed Reasoning Using MapReduce Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen Department of Computer Science, Vrije.
Bigscholar 2014, April 8, Seoul, South Korea1 Trust and Hybrid Reasoning for Ontological Knowledge Bases Hui Shi, Kurt Maly, and Steven Zeil Contact:
PHS / Department of General Practice Royal College of Surgeons in Ireland Coláiste Ríoga na Máinleá in Éirinn Knowledge representation in TRANSFoRm AMIA.
Semantic Publishing Benchmark Task Force Fourth TUC Meeting, Amsterdam, 03 April 2014.
TAKE – A Derivation Rule Compiler for Java Jens Dietrich, Massey University Jochen Hiller, TopLogic Bastian Schenke, BTU Cottbus/REWERSE.
Triple Stores. What is a triple store? A specialized database for RDF triples Can ingest RDF in a variety of formats Supports a query language – SPARQL.
Building an Operational Product Ontology System Written by Taehee Lee, Ig-hoon Lee, Suekyung Lee, Sang-goo Lee (IDS Lab. SNU) Dongkyu Kim, Jonghoon Chun.
 Introduction  Architecture NameNode, DataNodes, HDFS Client, CheckpointNode, BackupNode, Snapshots  File I/O Operations and Replica Management File.
Features Of SQL Server 2000: 1. Internet Integration: SQL Server 2000 works with other products to form a stable and secure data store for internet and.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 1.
Introduction to the Semantic Web Jeff Heflin Lehigh University.
GPFS: A Shared-Disk File System for Large Computing Clusters Frank Schmuck & Roger Haskin IBM Almaden Research Center.
RDF storages and indexes Maciej Janik September 1, 2005 Enterprise Integration – Semantic Web.
Cluster computing. 1.What is cluster computing? 2.Need of cluster computing. 3.Architecture 4.Applications of cluster computing 5.Advantages of cluster.
1 Benchmarking Cloud Serving Systems with YCSB Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears Yahoo! Research.
WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg.
Performing Fault-tolerant, Scalable Data Collection and Analysis James Jolly University of Wisconsin-Madison Visualization and Scientific Computing Dept.
Stream Reasoning with Linked Data Open Data Open Day 2013 Sina Samangooei, Nick Gibbins 26 June 2013.
Managing Large RDF Graphs Vaibhav Khadilkar Dr. Bhavani Thuraisingham Department of Computer Science, The University of Texas at Dallas December 2008.
Semantic Web in Depth RDF Schema Dr Nicholas Gibbins –
Rya Query Inference.
Virtualization OVERVIEW
Triple Stores.
Semantic Event-based Service Oriented Architecture
Analyzing and Securing Social Networks
Triple Stores.
Triple Stores.
Presentation transcript:

The Design and Implementation of Minimal RDFS Backward Reasoning in 4store Manuel Salvadores, Gianluca Correndo, Steve Harris, Nick Gibbins, and Nigel Shadbolt

Contents Motivation Background –4store –Minimal RDFS 4sr –Distributed Model –Design and Implementation LUBM Scalability Evaluation Conclusions 2

Motivation 3 Triple/Quad stores are good for schema-less data engineering. Semantics in Triple/Quad stores are even better! Forward chained reasoning can be very expensive in space. Moreover, updates force to re-compute entailments. Data changes regularly and SPARQL/Update is in process of standardization … we need to improve backward chained reasoning.

4store 4 4store is a clustered RDF storage and SPARQL query system that became open source under the GNU license in July Clustered/Distributed (quads allocated on segment based on subject hash modulo) Written in C. Native storage (2 radix tries per predicate PO/PS, 1 hash for context) Native communication protocol on top of TCP/IP Fast, last LUBM Benchmark (2 nd on import, 2 nd on query and 1 st on updates)

4store: bind operation 5 B0 ⇐ bind (NULL,NULL,{basedNear},{London}) B1 ⇐ bind (NULL,B0s,{name,homePage},NULL}) QE SPARQL RESULTSET

Minimal RDFS 6 Minimal RDFS refers to the RDFS fragment published in: Simple and Efficient Minimal RDFS Muñoz, S., Pérez, J., Gutierrez, C.:. Journal of Web Semantics 7, 220–234 (September 2009) RDFS Issues: –RDFS can generate inconsistencies. –Decidability issues. –No differentiation between language constructors and ontology vocabulary. Minimal RDFS is built upon the ρdf fragment which includes the following RDFS constructors: rdfs:subPropertyOf, rdfs:subClassOf, rdfs:domain, rdfs:range and rdf:type

4sr’s Distributed Model 7 Definitions –ρdf = {sc, sp, dom, range, type} –A quad (m,s,p,o) is an mrdf-quad iff p ∈ ρdf - {type}, and Gmrdf is a graph with all the mrdf- quads from every graph in a KB.

4sr’s Distributed Model 8

9

4sr’s Design and Implementation 10

4sr’s Design and Implementation 11

4sr’s Design and Implementation 12

4sr’s Design and Implementation 13

LUBM Scalability Evaluation 14 LUBM(100), LUBM(200), LUBM(400), …, LUBM(1000). From 13M to 138M Triples. Measurement point

15 LUBM Scalability Evaluation Hardware Specs: –Server set-up: One Dell PowerEdge R410 with 2 dual quad processors (8 cores - 16 threads) at 2.40GHz, 48G memory and 15k rpm SATA disks. –Cluster set-up: An infrastructure made of 5 Dell PowerEdge R410s, each of them with 4 dual core processors at 2.27 GHz, 48G memory and 15k rpm SATA disks. The network connectivity is standard gigabit ethernet and all the servers are connected to the same network switch. For the server infrastructure we have measured configurations of 1, 2, 4, 8, 16, and 32 segments. For the cluster infrastructure we measured 4, 8, 16 and 32 - it makes no sense to measure fewer than 4 segments in a cluster made up of four physical nodes.

16 LUBM Scalability Evaluation Faculty {?s type Faculty} Person {?s type Person} Organisation {?s type Organisation} degreeFrom {?s degreeFrom ?o} worksFor {?s worksFor ?o}

17 LUBM Scalability Evaluation – server setup

18 LUBM Scalability Evaluation – cluster setup

19 Conclusions Backward chained reasoning can scale in a distributed environment for Minimal RDFS and the ρdf fragment. 4sr can concurrently perform search in indexes (radix tries) with awareness of RDFS semantics by replicating a small subset of triples. The small subset of triples to replicate are the ones that use the ρdf constructors. Backward chain reasoning benefits: More economic in space – number of quads. No need to re-compute entailments between updates.

20 4sr latest release

21 Future Work Implement more OWL constructors by studying subsets to replicate sameAs, TransitiveProperty, inverseProperty, … Merge with 4store main distribution. Probably with a compile option that will include RDFS reasoning. Look at overhead of subset replication when running SPARQL update(s).

Acknowledgments EnAKTing project This work was supported by the EnAKTing project funded by the Engineering and Physical Sciences Research Council under contract EP/G008493/1. 22

23 Thank you, Questions