Reducing Search Space Scheme using RDF-Schema Domain and Range Information for Efficient RDF Query Processing
Sungtae Kim, SNU OOPSLA Lab., December 3, 2004

1 Title slide (Korean title, translated): Data Search-Space Reduction Technique Based on RDF-Schema Domain and Range Information for Efficient RDF Query Processing

2 Contents
- Introduction
- Motivation
- Related work
- RDF-Schema information
  - rdfs:Class, rdfs:domain, rdfs:range
- Our Approach
- Experiments
- Conclusion and Future work

3 Introduction (1/2)
- Semantic Web definition
  - An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation
- RDF (Resource Description Framework)
  - W3C Recommendation for formulating metadata
  - Triple structure: Subject --Predicate--> Object
- RDF-Schema
  - Specifies domain vocabulary, resource structure, and relations
  - rdfs:Class, rdfs:domain, rdfs:range

4 Introduction (2/2)
- Ontology data
  - Wine Ontology: recommends wines to accompany meal courses
  - Gene Ontology: information about the genes and proteins shared among diverse organisms
- Jena
  - A leading Semantic Web framework (HP Labs)
  - "Efficient RDF Storage and Retrieval in Jena2", K. Wilkinson, C. Sayers, H. Kuno, D. Reynolds, SWDB

5 Motivation (1/2)
- Jena2 database schema
  - Statement tables: Jena_gntn_stmt (Subj, Prop, Obj, GraphID), Jena_sys_stmt (Subj, Prop, Obj, GraphID), Jena_gntn_reif (Subj, Prop, Obj, GraphID, Stmt, HasType)
  - Long objects: Jena_long_lit (ID, Head, CHKSum, Tail), Jena_long_uri (ID, Head, CHKSum, Tail)
  - Prefixes: Jena_prefix (ID, Head, CHKSum, Tail)
  - Model info: Jena_graph (ID, Name), referenced through GraphID

6 Motivation (2/2)
- Triple database
  - Ontology data is mapped into a single statement table (Subject, Predicate, Object)
  - Problems of the triple mapping: (1) duplicates, (2) long strings, (3) object references
  - Querying requires multiple self-joins of this large table (Statement ⋈ Statement ⋈ … → Result)
- Question: can we reduce the search space of the table by using the RDF-Schema rdfs:domain and rdfs:range information?
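To make the self-join cost concrete, here is a minimal Python/sqlite3 sketch. It assumes a single three-column statement table. The table name `Statement`, its columns and all rows are hypothetical and are not Jena2's actual schema. Each of the four triple patterns in the slides' example query becomes another alias of the same table.

```python
import sqlite3

# One shared statement table for every class (hypothetical layout and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Statement (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO Statement VALUES (?, ?, ?)",
    [
        ("term:1", "name", "antioxidant activity"),
        ("term:1", "association", "assoc:1"),
        ("assoc:1", "gene_product", "gp:1"),
        ("gp:1", "name", "T14G11.18"),
        ("gp:2", "name", "antioxidant activity"),  # same predicate, other class
    ],
)

# Four patterns -> four aliases of the SAME table, joined on shared variables;
# every alias ranges over all rows, whatever class they describe.
query = """
SELECT s1.subj
FROM Statement s1, Statement s2, Statement s3, Statement s4
WHERE s1.pred = 'name' AND s1.obj = 'antioxidant activity'
  AND s2.subj = s1.subj AND s2.pred = 'association'
  AND s3.subj = s2.obj  AND s3.pred = 'gene_product'
  AND s4.subj = s3.obj  AND s4.pred = 'name' AND s4.obj = 'T14G11.18'
"""
print(conn.execute(query).fetchall())  # [('term:1',)]
```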

7 Related Work
- "Efficient RDF Storage and Retrieval in Jena2", Kevin Wilkinson, Craig Sayers, Harumi Kuno and Dave Reynolds, HP Laboratories, SWDB 2003
  - Introduces Jena2's storage of OWL data using a denormalized triple structure
- "Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema", Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, On-To-Knowledge Project, ISWC 2002
  - Stores triples using a normalized schema and supports semantic-level queries
- "Database Schema Design and Analysis for the Efficient OWL Semantic Information Processing", Kyung-Hyen Tak, Hag-Soo Kim, Hyun-Seok Cha, Jin-Hyun Son, Hanyang University, KDBC 2004
  - Proposes a new database schema that eliminates unnecessary tables from Sesame's schema

8 RDF-Schema information
- rdfs:Class (owl:Class)
  - Similar to the type system of object-oriented programming
- rdfs:domain
  - States that any resource that is the subject of the given predicate is an instance of the domain class
  - Triple structure (Subject, Predicate, Object)
- rdfs:range
  - States that the values (objects) of a property are instances of the range class
- Example (figure): Painter --paints--> Painting --exhibited--> Museum
  - paints: rdfs:domain Painter, rdfs:range Painting
  - exhibited: rdfs:domain Painting, rdfs:range Museum
  - Subject = {Picasso, Michelangelo, …}, Object = {Louvre Museum, Rodin Museum, …}
  - Other classes in the figure: Designer, Sculptor, Musician, Brush, ART
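The rdfs:domain and rdfs:range semantics above correspond to the RDFS entailment rules rdfs2 and rdfs3. The minimal Python sketch below applies both rules once to the Painter/Painting/Museum example. The instance `painting:1` and its exhibited-at fact are made up for illustration.

```python
from collections import defaultdict

RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def infer_types(triples):
    """Apply RDFS rules rdfs2 (domain) and rdfs3 (range) once."""
    domains, ranges = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == DOMAIN:
            domains[s].add(o)   # property s has domain class o
        elif p == RANGE:
            ranges[s].add(o)    # property s has range class o
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))  # subject is in the domain class
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))  # object is in the range class
    return inferred

# Schema from the slide's example; the instance data is hypothetical.
schema = [
    ("paints", DOMAIN, "Painter"), ("paints", RANGE, "Painting"),
    ("exhibited", DOMAIN, "Painting"), ("exhibited", RANGE, "Museum"),
]
data = [("Picasso", "paints", "painting:1"),
        ("painting:1", "exhibited", "Louvre Museum")]
for triple in sorted(infer_types(schema + data)):
    print(triple)
```

A single pass is enough here because no derived triple triggers further domain or range declarations.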

9 Our Approach (1/4)
- System flow
  - Schema analysis: the ontology schema (classes Term, GeneProduct, Association, Dbxref, Evidence, History) is analyzed, and the data is stored in multiple class statement tables (Term, GeneProduct, Association, Evidence, …, each with Subj, Pred, Obj columns) plus a DefaultTriple table
  - Query processing: the triple-pattern (S, P, O) query is passed to the Query Analyzer, which extracts the target tables. Each pattern is either resolved directly against one class table or by joining class tables (e.g., Term ⋈ Association), and the generated SQL query returns the result
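The schema-analysis step could route triples into per-class statement tables in several ways. The sketch below assumes one possible rule and is not the authors' loader. A triple goes to the table of its subject's class when the data states that class and the schema agrees with it. Otherwise it goes to the unique rdfs:domain of its predicate, and failing that to DefaultTriple. `partition`, the predicate `comment` and the identifiers `term:1`, `assoc:1`, `gp:1` and `x:9` are hypothetical.

```python
from collections import defaultdict

DEFAULT_TABLE = "DefaultTriple"

def partition(triples, domain_of, type_of):
    """Split triples into per-class statement tables (hypothetical rule).

    domain_of: predicate -> set of rdfs:domain classes (from schema analysis)
    type_of:   subject -> class, when the data states it
    """
    tables = defaultdict(list)
    for s, p, o in triples:
        domains = domain_of.get(p, set())
        if type_of.get(s) in domains:
            table = type_of[s]            # subject's class agrees with the schema
        elif len(domains) == 1:
            table = next(iter(domains))   # a unique domain fixes the table
        else:
            table = DEFAULT_TABLE         # no usable domain information
        tables[table].append((s, p, o))
    return dict(tables)

DOMAIN_OF = {"name": {"Term", "GeneProduct"},
             "association": {"Term"},
             "gene_product": {"Association"}}
TYPE_OF = {"term:1": "Term", "gp:1": "GeneProduct"}
TRIPLES = [
    ("term:1", "name", "antioxidant activity"),
    ("term:1", "association", "assoc:1"),
    ("assoc:1", "gene_product", "gp:1"),
    ("gp:1", "name", "T14G11.18"),
    ("x:9", "comment", "free text"),
]
for table, rows in partition(TRIPLES, DOMAIN_OF, TYPE_OF).items():
    print(table, rows)
```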

10 Our Approach (2/4)
- Example query: what is the term whose name is "antioxidant activity"(a) and whose related GeneProduct's name is "T14G11.18"?
- Triple-pattern input query
  - Pattern 1: (?X, name, 'antioxidant activity')
  - Pattern 2: (?X, association, ?Y)
  - Pattern 3: (?Y, gene_product, ?Z)
  - Pattern 4: (?Z, name, 'T14G11.18')
- Twig query tree: &Term --name--> 'antioxidant activity'; &Term --association--> &Association --gene_product--> &GeneProduct --name--> 'T14G11.18'
- Problem: the same predicate, name, appears in two patterns. Which class does each occurrence belong to?
- DomainRange table:
    Pred          Domain        Range
    name          Term          null
    gene_product  Association   GeneProduct
    name          GeneProduct   null
    …
(a) Antioxidant: a chemical compound or substance that inhibits oxidation
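The ambiguity can be detected mechanically from the DomainRange table. The hypothetical helpers below are not the authors' code. They flag a predicate as duplicated when it is declared for more than one domain, which gives a flag of 1 for name and 0 for gene_product. They also list the class tables a pattern with that predicate could use.

```python
from collections import Counter

# DomainRange rows as listed on the slide: (domain, predicate, range); None = null.
DOMAIN_RANGE = [
    ("Term", "name", None),
    ("Association", "gene_product", "GeneProduct"),
    ("GeneProduct", "name", None),
]

def pred_duplicate(rows):
    """Duplicate flag: 1 if a predicate is declared for more than one domain."""
    counts = Counter(pred for _, pred, _ in rows)
    return {pred: int(n > 1) for pred, n in counts.items()}

def candidate_tables(pred, rows):
    """Class tables a triple pattern with this predicate could be stored in."""
    return [domain for domain, p, _ in rows if p == pred]

print(pred_duplicate(DOMAIN_RANGE))             # {'name': 1, 'gene_product': 0}
print(candidate_tables("name", DOMAIN_RANGE))   # ['Term', 'GeneProduct']
```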

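11 Our Approach (3/4)
- Edge reverse tracing
- SQL query (one alias per triple pattern; Term is used twice, once for name and once for association):
    SELECT t1.*
    FROM Term t1, Term t2, Association a, GeneProduct g
    WHERE t1.pred = 'name' AND t1.obj = 'antioxidant activity'
      AND t2.subj = t1.subj AND t2.pred = 'association'
      AND a.subj = t2.obj AND a.pred = 'gene_product'
      AND g.subj = a.obj AND g.pred = 'name' AND g.obj = 'T14G11.18'
- Reverse tracing and use of range values
  - name is a duplicated predicate, so the pattern (?Z, name, 'T14G11.18') cannot be mapped to a table by its domain alone. The analyzer traces back to the parent predicate gene_product and uses its rdfs:range, GeneProduct, to select the table (figure steps: 1. rdfs:domain, 2. rdfs:range)
- DomainRange table (Domain, Pred, Range): (Term, name, null), (Association, gene_product, GeneProduct), (GeneProduct, name, null), …
- PropDuplicate table (Pred, Dupli): name = 1, gene_product = 0, …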
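Below is a minimal sqlite3 sketch of the per-class join, in which each pattern scans only its own class table. The (subj, pred, obj) layout and all rows are hypothetical toy data. The row `gp:2` has the name value 'antioxidant activity', but the Term aliases never consider it because it is stored in GeneProduct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("Term", "Association", "GeneProduct"):
    conn.execute(f"CREATE TABLE {table} (subj TEXT, pred TEXT, obj TEXT)")

# Hypothetical toy rows, already partitioned by class.
conn.executemany("INSERT INTO Term VALUES (?, ?, ?)", [
    ("term:1", "name", "antioxidant activity"),
    ("term:1", "association", "assoc:1"),
])
conn.executemany("INSERT INTO Association VALUES (?, ?, ?)", [
    ("assoc:1", "gene_product", "gp:1"),
])
conn.executemany("INSERT INTO GeneProduct VALUES (?, ?, ?)", [
    ("gp:1", "name", "T14G11.18"),
    ("gp:2", "name", "antioxidant activity"),
])

# Each alias ranges over its class table only, not over all triples.
query = """
SELECT t1.subj
FROM Term t1, Term t2, Association a, GeneProduct g
WHERE t1.pred = 'name' AND t1.obj = 'antioxidant activity'
  AND t2.subj = t1.subj AND t2.pred = 'association'
  AND a.subj = t2.obj   AND a.pred = 'gene_product'
  AND g.subj = a.obj    AND g.pred = 'name' AND g.obj = 'T14G11.18'
"""
print(conn.execute(query).fetchall())  # [('term:1',)]
```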

12 Our Approach (4/4)
- Multiple edge reverse tracing
  - Stack operation on (Domain, Predicate) pairs
- PropDuplicate table (Pred, Dupli): name = 1, gene_product = 0, association = 0, …
- DomainRange table (Domain, Pred, Range): (Term, name, null), (Association, gene_product, GeneProduct), (GeneProduct, name, null), (Term, association, Association), …
- Stack trace in the figure: the stack holds (&y, gene_product) on top of (&x, name). Tracing stops at association, whose duplicate flag is 0 (association == 0), and the stacked pairs are then resolved to the classes Association and GeneProduct
- Query tree: &Term --name--> 'antioxidant activity'; &Term --association--> &Association --gene_product--> &GeneProduct --name--> 'T14G11.18'
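The following hypothetical Python sketch shows multiple-edge reverse tracing over the DomainRange rows on this slide. To classify a query node, it first looks for a non-duplicated outgoing predicate, whose rdfs:domain gives the class. If there is none, it pushes the incoming predicate on a stack and moves to the parent node. Once a class is fixed, it pops the stack and follows each predicate's rdfs:range back down. The function names and the DefaultTriple fallback are assumptions modeled on the slides, not the authors' implementation.

```python
from collections import defaultdict

# DomainRange rows from the slide: (domain, predicate, range); None = null.
DOMAIN_RANGE = [
    ("Term", "name", None),
    ("Association", "gene_product", "GeneProduct"),
    ("GeneProduct", "name", None),
    ("Term", "association", "Association"),
]
DEFAULT_TABLE = "DefaultTriple"

def build_index(rows):
    """Map each predicate to its (domain, range) declarations."""
    index = defaultdict(list)
    for domain, pred, rng in rows:
        index[pred].append((domain, rng))
    return index

def resolve_class(var, patterns, index):
    """Infer the class of a query node, or None if the schema cannot decide."""
    stack, seen, cur = [], set(), var
    while True:
        if cur in seen:
            return None                     # guard against cyclic patterns
        seen.add(cur)
        # A non-duplicated outgoing predicate fixes the class via rdfs:domain.
        unique = [p for s, p, _ in patterns if s == cur and len(index[p]) == 1]
        if unique:
            cls = index[unique[0]][0][0]
            break
        # Otherwise trace the incoming edge back and remember it on the stack.
        incoming = [(s, p) for s, p, o in patterns if o == cur]
        if not incoming:
            return None
        parent, pred = incoming[0]
        stack.append(pred)
        cur = parent
    # Unwind: each stacked predicate's rdfs:range gives the child's class.
    while stack:
        pred = stack.pop()
        ranges = [rng for dom, rng in index[pred] if dom == cls]
        if not ranges or ranges[0] is None:
            return None
        cls = ranges[0]
    return cls

def tables_for(patterns, rows=DOMAIN_RANGE):
    """Pick a statement table for every (subject, predicate, object) pattern."""
    index = build_index(rows)
    result = []
    for s, p, _ in patterns:
        cls = resolve_class(s, patterns, index)
        declared = [dom for dom, _ in index.get(p, [])]
        result.append(cls if cls in declared else DEFAULT_TABLE)
    return result

PATTERNS = [
    ("?X", "name", "antioxidant activity"),
    ("?X", "association", "?Y"),
    ("?Y", "gene_product", "?Z"),
    ("?Z", "name", "T14G11.18"),
]
print(tables_for(PATTERNS))  # ['Term', 'Term', 'Association', 'GeneProduct']
```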

13 Experiments (1/2)
- Environment
  - Intel Pentium 4 1.6 GHz, 1 GB RAM
  - OS: Windows XP
  - Database: MySQL 4.0
  - Implementation language: Java
  - Data set: Gene Ontology termDB
- Query set
  - Q1: Find the term whose accession is ‘GO: ’ and whose related evidence code value is ‘ISS’
  - Q2: Find the Q1 term that is also related to a database symbol ‘PMID’
  - Q3: Find the parent term whose child term's definition contains ‘amino acid’
  - Q4: Find the term whose name is ‘antioxidant’ and that is related to the GeneProduct whose name is ‘T14G11.18’

14 Experiments (2/2)
[Charts: response time (sec) and size of database (%); plotted values not recoverable]

15 Conclusion and Future work
- Reorganize the database schema for storing triple data
- Reduce the search space by using both
  - semantic information (rdfs:domain and rdfs:range) and
  - multiple statement tables
- Reduce the physical size of tables
  - Eliminate redundant namespace values
- Overhead
  - Requires schema analysis
  - Requires maintaining the DomainRange and PredicateDuplicate tables
- Future work
  - An ontology schema analysis engine for semi-automatically inserting rdfs:domain and rdfs:range

16 Query Analyzer Algorithm (Appendix 1)
Function Query
Input parameters: user query, ModelRDB model
for all input triples do
    if the triple belongs to a known domain and predicate then
        if the predicate conflicts (is duplicated) then
            get the parent predicate for its range value
        endif
        check the domain value and extract the table name
    else
        use the default triple table
    endif
endfor
build SQL
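The final "build SQL" step can be sketched as follows. The sketch assumes the analyzer has already picked one statement table per triple pattern and that every table has (subj, pred, obj) columns. Constants become bound parameters. The first occurrence of a variable defines its column, and later occurrences become join conditions. `build_sql` and its parameters are hypothetical names.

```python
def build_sql(patterns, tables, answer_var):
    """Translate triple patterns into a single SQL join over statement tables.

    patterns:   list of (subject, predicate, object); terms starting with '?' are variables
    tables:     statement table chosen for each pattern, in the same order
                (table names must come from the trusted schema, not user input)
    answer_var: the variable whose binding is returned
    Returns (sql, params) for a DB-API driver that uses '?' placeholders.
    """
    where, params, bound = [], [], {}  # bound: variable -> first column binding it
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        where.append(f"{alias}.pred = ?")
        params.append(p)
        for term, column in ((s, "subj"), (o, "obj")):
            ref = f"{alias}.{column}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")              # constant: bound parameter
                params.append(term)
            elif term in bound:
                where.append(f"{ref} = {bound[term]}")  # shared variable: join
            else:
                bound[term] = ref                       # first occurrence
    from_clause = ", ".join(f"{table} t{i}" for i, table in enumerate(tables))
    sql = f"SELECT {bound[answer_var]} FROM {from_clause} WHERE {' AND '.join(where)}"
    return sql, params

PATTERNS = [
    ("?X", "name", "antioxidant activity"),
    ("?X", "association", "?Y"),
    ("?Y", "gene_product", "?Z"),
    ("?Z", "name", "T14G11.18"),
]
TABLES = ["Term", "Term", "Association", "GeneProduct"]
sql, params = build_sql(PATTERNS, TABLES, "?X")
print(sql)
print(params)
```

Binding constants as parameters keeps literal values such as 'T14G11.18' out of the SQL text.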

17 Statement Table Feature (Appendix 2)

18 Additional Database Schema (Appendix 3)
- Reorganize the database schema
  - Construct an AllNameSpace table (ID, NameSpace)
  - Add a namespace-referencing column to each statement table: Statement (Subj, NS, Pred, Obj)
- Reduces the physical table size
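Below is a minimal sketch of the namespace factoring. It assumes that the NS column refers to the namespace of the subject URI and that URIs are split at the last '#' or '/'. `split_uri`, `NamespaceTable` and `encode` are hypothetical names, and the example.org URIs are placeholders.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]

class NamespaceTable:
    """Hypothetical AllNameSpace table: each namespace string is stored once."""

    def __init__(self):
        self.ids = {}  # namespace string -> integer ID

    def intern(self, namespace):
        if namespace not in self.ids:
            self.ids[namespace] = len(self.ids) + 1
        return self.ids[namespace]

def encode(triple, namespaces):
    """Turn (subject URI, pred, obj) into a (Subj, NS, Pred, Obj) row."""
    subj, pred, obj = triple
    namespace, local = split_uri(subj)
    return (local, namespaces.intern(namespace), pred, obj)

ns_table = NamespaceTable()
rows = [encode(t, ns_table) for t in [
    ("http://example.org/go#term1", "name", "antioxidant activity"),
    ("http://example.org/go#term2", "name", "example term"),
]]
print(rows)
print(ns_table.ids)
```

Both rows share namespace ID 1, so the long namespace string is stored only once.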

19 Sesame Database Schema (Appendix 4)
- Tables
  - Namespaces (id, prefix, name)
  - Resources (id, namespace, localname)
  - Literal (id, language, value)
  - Triples (subject, predicate, object, explicit)
  - Class (id), Property (id)
  - Instanceof (inst, class), Proper_Instanceof (inst, class)
  - Domain (property, class), Range (property, class)
  - Subclassof (sub, super), Direct_subclassof (sub, super)
  - Subpropertyof (sub, super), Direct_subpropertyof (sub, super)
- Relationships shown in the figure: namespace assignment; resource-to-subject, resource-to-predicate, resource-to-object; literal-to-object; resource-to-inst; resource-to-property; resource assignment; class-to-proper_instanceof; id-to-sub and id-to-super

20 Gene Ontology Schema (Appendix 5)
- Classes: Term, GeneProduct, Association, Dbxref, Evidence
- Predicates in the figure: accession, name, definition, is_a, dbxref, database_symbol, reference, association, gene_product, evidence, evidence_code
- Example values in the figure: ‘ go#GO: go#GO: ’, ‘GO: ’, ‘Antioxidant Activity’, ‘ISS’, ‘MGI’, ‘MGI: ’, ‘ C22Rik’, ‘….’