1 Efficient Processing of Transitive Closure Queries in Ontology Store using Graph Labeling Kim, Jongnam SNU OOPSLA Lab. Dec. 3, 2004.



2 Contents
- Introduction
- Motivation
- Our Approach
- Experiments
- Related Work
- Closing Remarks

3 Introduction (1/2)
- What are ontologies?
  - "Document that formally defines the relations among terms"
  - Hierarchical taxonomy and a set of inference rules
- Gene Ontology
  - Gene Ontology Consortium
  - Information about the role of gene products within an organism
- Jena
  - Hewlett-Packard
  - The most general framework for ontology and semantic web
  - RDF/OWL API, inference support, RDBMS persistence
[Figure: fragment of the Gene Ontology under "Molecular function", with nodes including Enzyme activator, Protease activator, Apoptosis regulator, Apoptosis activator, Apoptotic protease activator, Coalation activator, Coalation synthesis, Protease synthesis, Galactos synthesis, Galactos activator]

4 Introduction (2/2)
- What are transitive closure queries?
  - "Find all enzyme genes"
  - "Find transitive correlations* between terms"
- Why are they important in ontology queries?
  - To find 'enzyme' genes, we must also look into 'helicase', 'DNA helicase', etc.
  - Transitive closure computation is expensive
[Figure: is_a hierarchy with explicit and implied edges over molecular function, ligand binding or carrier, nucleic acid binding, DNA binding, enzyme, helicase, DNA helicase]
* correlation: whether two terms have the same gene products

5 Motivation (1/3)
- Naïve approaches for transitive closure queries
  - Dynamic approach
    - Most implementations of SQL do not support recursive querying
    - Requires multiple SQL calls
  - Static approach
    - Not space-efficient
- "Pre-computation is essential"
G (the data set):
  B subClassOf A
  C subClassOf B
  D subClassOf C
  E subClassOf D
G* (its materialized transitive closure):
  B subClassOf A
  C subClassOf B
  C subClassOf A
  D subClassOf C
  D subClassOf B
  D subClassOf A
  E subClassOf D
  E subClassOf C
  E subClassOf B
  E subClassOf A
[Figure: chain of classes A, B, C, D, E]
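
To make the space cost of the static approach concrete, here is a minimal Python sketch that materializes G* from the four subClassOf triples of G shown on this slide. The function name `transitive_closure` and the tuple encoding `(subclass, superclass)` are assumptions made for illustration; they are not taken from the presentation.

```python
def transitive_closure(edges):
    """Return every (subclass, superclass) pair implied by the given edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Join pairs (a, b) and (b, d) into (a, d) until nothing new appears.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# G from the slide: a chain E -> D -> C -> B -> A of subClassOf edges.
G = {("B", "A"), ("C", "B"), ("D", "C"), ("E", "D")}
G_star = transitive_closure(G)
print(len(G), len(G_star))  # 4 10
```

For a chain of n classes the closure holds n(n-1)/2 pairs, which is the quadratic growth the static approach pays for.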

6 Motivation (2/3)
- Approach in Jena
  - Space-efficient, but not time-efficient
  - Most of the work in Jena goes into transitive reduction
  - Transitive closure is computed by brute force (graph traversal)
- Is this reasonable for a very large ontology?
Ontology (input triples):
  C subClassOf A
  B creator "kim"
  B date "12-03"
  B subClassOf C
  C date "10-12"
  B subClassOf D
  D name "blar"
  D subClassOf C
  E subClassOf C
  E subClassOf D
Jena Transitive Reasoner (triples held in memory):
  C subClassOf A
  B subClassOf C
  D subClassOf B
  E subClassOf D
[Figure: graph G and its transitive reduction G- over nodes A, B, C, D, E]

7 Motivation (3/3)
- Approach in Jena (cont.)
[Figure: relations in the gene ontology file (is_a, part_of, develops_from) and the OWL constructs used for them (subClassOf, anonymous Restriction, onProperty, someValuesFrom)]

8 Our Approach: Interval-based Labeling for Graph
- We propose an approach that is efficient in both space and time
- Labeling is a one-time activity, and the labels can be reused for every query
[Figure: labeling of a seven-node example graph, from initial to final labels]
  {(1,1)} {(2,2)} {(6,6)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}
  {(1,1)} {(2,5)} {(6,6)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}
  {(1,1)} {(2,5)} {(4,4),(6,7)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}
  {(1,7)} {(2,5)} {(4,4),(6,7)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}

9 Our Approach: Data Structures
- Interval = (start, end)
- Node_ID = start
- Node_Label = { (start, end), ..., (start, end) }
- B+-tree index over the start number
- For best performance, we maintain a separate interval list for each relation type (e.g. is_a, part_of)
[Figure: B+-tree index over the interval list of one relation]
  Interval list: (3,3) (4,4) (5,5) (2,5) (7,7) (6,7) (1,7)
  Node labels: {(1,7)} {(2,5)} {(4,4),(6,7)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}
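
The data structures on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a sorted list searched with `bisect` stands in for the B+-tree, the class name `RelationIndex` is hypothetical, and the node with label {(4,4),(6,7)} is assumed to have ID 6 (the start of the interval covering its own subtree).

```python
import bisect

class RelationIndex:
    """Interval labels for one relation type, with node IDs kept sorted."""

    def __init__(self, labels):
        self.labels = labels              # node ID -> list of (start, end)
        self.sorted_ids = sorted(labels)  # sorted list stands in for the B+-tree

    def ids_between(self, start, end):
        """Range scan: every node ID n with start <= n <= end."""
        lo = bisect.bisect_left(self.sorted_ids, start)
        hi = bisect.bisect_right(self.sorted_ids, end)
        return self.sorted_ids[lo:hi]

# Labels from the slide, keyed by the assumed node IDs.
is_a_labels = {
    1: [(1, 7)], 2: [(2, 5)], 3: [(3, 3)], 4: [(4, 4)],
    5: [(5, 5)], 6: [(4, 4), (6, 7)], 7: [(7, 7)],
}

# One index per relation type; only is_a is populated in this sketch.
indexes = {"is_a": RelationIndex(is_a_labels)}

hits = []
for start, end in indexes["is_a"].labels[6]:
    hits.extend(indexes["is_a"].ids_between(start, end))
print(sorted(hits))  # [4, 6, 7]
```

Keeping one index per relation type means a part_of query never has to scan is_a intervals.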

10 Our Approach: Algorithms
- Preprocessing
  1. Find the roots of each relation
  2. Label the graph of each relation separately
  3. Materialize
- Transitive closure queries*
  - Descendants(v) = {u} s.t. start(v) <= start(u) ^ end(v) >= end(u)
  - Ancestors(v) = {u} s.t. start(v) >= start(u) ^ end(v) <= end(u)
  - Nearest Common Ancestor(v, w) = {u} s.t. start(u) <= i ^ end(u) >= p ^ ~∃ u' s.t. start(u') <= i ^ end(u') >= p ^ start(u') <= end(u) ^ end(u') < end(u)
    where i = minStart(v, w), p = maxEnd(v, w)
* See appendix 2
[Figure: separate graphs for is_a, part_of, develops_from]
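
The labeling step of the preprocessing can be sketched as a depth-first traversal. The sketch below is an assumption-laden illustration rather than the authors' algorithm: it numbers nodes in visit order, gives each node the interval from its own number to the largest number in its subtree, and copies the intervals of children reached over non-tree edges. The names `label_dag` and `merge` and the example edge set are hypothetical; the edges were chosen so that the output matches the labels shown on the slides. The graph must be acyclic and have a single root.

```python
def merge(intervals):
    """Sort intervals and fold away any that overlap or lie inside another."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def label_dag(children, root):
    """Return (node -> ID, node -> interval label) for a DAG rooted at root."""
    node_id, labels = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        node_id[node] = counter
        inherited = []
        for child in children.get(node, []):
            if child not in node_id:
                visit(child)                # tree edge: label the subtree first
            inherited.extend(labels[child])  # tree or non-tree edge: inherit
        own = (node_id[node], counter)       # covers every ID in the subtree
        labels[node] = merge([own] + inherited)

    visit(root)
    return node_id, labels

# Hypothetical edges (parent -> children); n6 -> n4 is a non-tree edge.
children = {"n1": ["n2", "n6"], "n2": ["n3", "n4", "n5"], "n6": ["n4", "n7"]}
node_id, labels = label_dag(children, "n1")
print(labels["n1"], labels["n2"], labels["n6"])
# [(1, 7)] [(2, 5)] [(4, 4), (6, 7)]
```

A relation with several roots could be handled by running the traversal once per root with a shared counter, or by adding a virtual root above them.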

11 Our Approach: Analytical Efficiency
- Space (n := # of nodes)
  - Naïve: n + (n-1) + ... + 1 = O(n^2)
  - Jena: O(n)
  - Our approach: O(n) on average
- Time (k := # of answer nodes)
  - Jena: O(k)
  - Our approach
    - subclass: O(1)
    - superclass: O(k)
- For a very large ontology, where the necessary triples cannot all be loaded into memory
  - Jena behaves like the naïve approach, except that it uses transitive reduction
Triples:
  B subClassOf A
  C subClassOf B
  D subClassOf C
  E subClassOf D
Jena:
  listSubClasses(A) {
    for each child C of A
      add C to result
      listSubClasses(C)
    until A has no child
  }
Our approach:
  listSubClasses(A) {
    L := label(A)
    for each interval Lk in L
      add the nodes contained in Lk to result
  }
[Figure: chain A, B, C, D, E and the labeled example graph {(1,7)} {(2,5)} {(4,4),(6,7)} {(7,7)} {(5,5)} {(3,3)} {(4,4)}]

12 Experiments (1/2)
- Data
  - Gene Ontology (term-db/owl)
  - Information about the role of gene products within an organism
- Subject of evaluation
  - Naïve approach
  - Jena transitive reasoner (i.e. OWL_MEM_TRANS_INF)
  - Our approach
[Table: number of terms and edges for Molecular function, Biological process, Cellular component, and Total]
* Edges: is_a: 17602, part_of: 2100, total: 19702

13 Experiments (2/2)
- Query set
  Q1  Find all (is_a) subclasses of one class
  Q2  Find all (part_of) subclasses of one class
  Q3  Find all superclasses of one class
  Q4  Find the nearest common ancestor of two classes
- Results
[Figures: results for the memory version and the disk version]

14 Related Work
[1] Indexing Techniques for Object-Oriented Databases. W. Kim. Object-Oriented Concepts, Databases, and Applications, 1989
[2] Efficient processing of regular path joins using PID. J. Kim. Information and Software Technology, 2002
[3] On supporting containment queries in relational database management systems. C. Zhang. ACM SIGMOD, 2001
[4] The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. S. Alexaki. Semantic Web Workshop, 2001
[5] Efficient RDF storage and retrieval in Jena2. K. Wilkinson. SWDB, 2003
[6] Sesame: An Architecture for Storing and Querying RDF Data and Schema Information. J. Broekstra. Semantics for the WWW, 2001
[7] Gene Ontology Consortium.

15 Closing Remarks
- We present a technique for processing transitive closure queries using interval-based labeling
- We present both analytical and empirical evidence of its efficiency compared with Jena
- For very large ontologies, our approach and data structures reduce response time remarkably

16 Transitive Closure & Reduction (Appendix 1)
- Transitive closure (G*)
  - Given a digraph G, the transitive closure of G is the digraph G* s.t.
    - G* has the same vertices as G
    - if G has a directed path from u to v, G* has a directed edge from u to v
  - The transitive closure provides reachability information about a digraph
- Transitive reduction (G-)
  - The digraph G- with the smallest number of edges that still has a path between every pair of vertices connected by a path in G
[Figure: G*, G, and G- over nodes A, B, C, D, E]
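
As a small illustration of the reduction defined on this slide, the Python sketch below drops every edge that is implied by a longer path. It assumes the input is acyclic (as a subclass hierarchy normally is). The function names and the `(subclass, superclass)` tuple encoding are assumptions made for this example and say nothing about how Jena implements its reducer.

```python
def has_other_path(adj, src, dst):
    """True if dst is reachable from src without the direct edge src -> dst."""
    stack = [n for n in adj.get(src, ()) if n != dst]
    seen = set(stack)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def transitive_reduction(edges):
    """Keep only the edges of a DAG that no longer path already implies."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
    return {(a, b) for a, b in edges if not has_other_path(adj, a, b)}

# The ten-edge closure of the chain E -> D -> C -> B -> A.
order = ["A", "B", "C", "D", "E"]
G_star = {(order[j], order[i]) for i in range(5) for j in range(i + 1, 5)}
G_minus = transitive_reduction(G_star)
print(sorted(G_minus))
# [('B', 'A'), ('C', 'B'), ('D', 'C'), ('E', 'D')]
```

The reduction returns the four original chain edges, so it stores O(n) triples where the closure stores O(n^2).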

17 Algorithms for Transitive Closure Queries (Appendix 2)
- listSubClasses
- listSuperClasses
- Nearest Common Ancestor

listSubClasses(target) {
  for i = target.start to target.end
    find node of i
    add to result
  return result
}

listSuperClasses(target) {
  for each node s.t. node.end >= target.end
    if node.start <= target.start
      add to result
  return result
}

getNCA(target1, target2) {
  let target1 have the larger postorder number
  for each node s.t. node.end >= target1.end
    if node.start <= target1.start and node.start <= target2.start
      return node
}
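
The three queries above can be made runnable against the example labels from the earlier slides. The Python sketch below is a hedged adaptation rather than a transcription: it extends the pseudocode to labels with several intervals (which the pseudocode does not spell out), it scans a plain dictionary instead of a B+-tree, and it returns a list of nearest common ancestors because a DAG can have more than one. The node IDs, including ID 6 for the node labeled {(4,4),(6,7)}, are assumptions.

```python
# Example labels from the slides, keyed by assumed node IDs.
LABELS = {
    1: [(1, 7)], 2: [(2, 5)], 3: [(3, 3)], 4: [(4, 4)],
    5: [(5, 5)], 6: [(4, 4), (6, 7)], 7: [(7, 7)],
}

def contains(label, node_id):
    """True if any interval of the label covers node_id."""
    return any(start <= node_id <= end for start, end in label)

def list_subclasses(target):
    """IDs covered by the target's intervals (the target itself included)."""
    covered = {i for start, end in LABELS[target] for i in range(start, end + 1)}
    return sorted(i for i in covered if i in LABELS)

def list_superclasses(target):
    """IDs of all nodes whose label covers the target (itself included)."""
    return sorted(n for n, label in LABELS.items() if contains(label, target))

def nearest_common_ancestors(a, b):
    """Common ancestors that have no other common ancestor beneath them."""
    common = set(list_superclasses(a)) & set(list_superclasses(b))
    return sorted(
        c for c in common
        if not any(o != c and contains(LABELS[c], o) for o in common)
    )

print(list_subclasses(6))              # [4, 6, 7]
print(list_superclasses(4))            # [1, 2, 4, 6]
print(nearest_common_ancestors(4, 7))  # [6]
```

The superclass scan here touches every label; in the presented design the B+-tree over start numbers would narrow that scan.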

18 Incremental Maintenance (Appendix 3)
- Leave gaps between postorder numbers (e.g. 10)
- Addition: give the new node an unused number inside a gap
- Deletion: just delete
Before: {(1,60)} {(10,40)} {(30,30),(50,60)} {(60,60)} {(40,40)} {(20,20)} {(30,30)}
After adding one node: {(1,60)} {(10,40)} {(30,30),(50,60)} {(60,60)} {(40,40)} {(20,20)} {(30,30)} {(15,15)}
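
The addition case can be sketched as follows, using the gapped labels from this slide. This is an assumption-laden illustration: the function `add_leaf`, its arguments, and the midpoint rule for picking the new number are hypothetical, and node IDs are again taken to be the start of each node's own interval.

```python
import bisect

# Gapped labels from the slide, keyed by assumed node IDs.
labels = {
    1: [(1, 60)], 10: [(10, 40)], 20: [(20, 20)], 30: [(30, 30)],
    40: [(40, 40)], 50: [(30, 30), (50, 60)], 60: [(60, 60)],
}

def add_leaf(labels, parent_id, after_id):
    """Insert a leaf under parent_id, numbered in the gap right after after_id."""
    ids = sorted(labels)
    pos = bisect.bisect_right(ids, after_id)
    if pos == len(ids):
        raise ValueError("no larger ID to bound the gap; relabeling needed")
    new_id = (after_id + ids[pos]) // 2  # midpoint of the free gap
    if not after_id < new_id < ids[pos]:
        raise ValueError("gap exhausted; relabeling needed")
    if not any(start <= new_id <= end for start, end in labels[parent_id]):
        raise ValueError("new ID falls outside the parent's intervals")
    # The parent's interval already covers new_id, so no ancestor label changes.
    labels[new_id] = [(new_id, new_id)]
    return new_id

new_id = add_leaf(labels, parent_id=10, after_id=10)
print(new_id, labels[new_id])  # 15 [(15, 15)]
```

Because the new number falls inside an interval that every ancestor already covers, the insert writes one label and leaves the rest untouched. The sketch raises an error when the gap is exhausted or the parent has no slack.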