VLDB 2005 An Efficient SQL-based RDF Querying Scheme Eugene Inseok Chong Souripriya Das George Eadon Jagannathan Srinivasan New England Development Center.

2 An Efficient SQL-based RDF Querying Scheme (VLDB 2005)
Eugene Inseok Chong, Souripriya Das, George Eadon, Jagannathan Srinivasan
New England Development Center, Oracle

3 Talk Outline
- Introduction
- Functionality
- Design and Implementation
- Performance
- Conclusions and Future Work

4 Introduction

5 RDF (Resource Description Framework)
- RDF is a W3C standard for describing resources on the web.
- Uniform Resource Identifiers (URIs) are used to identify resources.
  Example: http://www.oracle.com/people#John
- RDF triples are used to make statements about a resource.
  Format: (subject predicate object)
  Example: (:John :brotherOf :Mary)
- A triple represents a directed, labeled edge in an RDF graph:
  :John --:brotherOf--> :Mary

6 RDF Data and Graph Example
Family data:
  (:John :brotherOf :Mary)
  (:Mary :parentOf :Matt)
  (:John :name "John")
  (:Mary :name "Mary")
  (:Matt :name "Matt")
Graph:
  :John --:brotherOf--> :Mary
  :Mary --:parentOf--> :Matt
  :John --:name--> "John"
  :Mary --:name--> "Mary"
  :Matt --:name--> "Matt"

7 RDF Querying Problem
Given:
- RDF graphs: the data set to be searched
- Graph pattern: contains a set of variables
Find: matching subgraphs
Return: sets of variable bindings, where each set corresponds to a matching subgraph

8 RDF Query Example
Family data:
  (:John :brotherOf :Mary)
  (:Mary :parentOf :Matt)
  (:John :name "John")
  (:Mary :name "Mary")
  (:Matt :name "Matt")
Graph pattern (names of Mary's brothers):
  (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n)
Variable bindings:
  x = :John, y = :Mary, n = "John"
Matching subgraph:
  (:John :brotherOf :Mary) (:Mary :name "Mary") (:John :name "John")
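
The matching step in this example can be sketched in a few lines of Python. This is only an illustrative in-memory matcher, not the scheme presented in the talk; the function name match_pattern and the representation of triples as Python tuples are assumptions made for the sketch.

```python
# Triples from the slide's family data; ":"-prefixed strings stand for URIs,
# plain strings stand for literals.
DATA = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
    (":Matt", ":name", "Matt"),
]


def match_pattern(pattern, data):
    """Return every set of variable bindings under which all pattern triples occur in data."""
    solutions = [{}]  # start with a single empty binding set
    for pat in pattern:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        # A variable must keep the value it was bound to earlier.
                        if candidate.setdefault(term[1:], value) != value:
                            ok = False
                            break
                    elif term != value:
                        # A constant must match the triple component exactly.
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


PATTERN = [("?x", ":brotherOf", "?y"), ("?y", ":name", "Mary"), ("?x", ":name", "?n")]
bindings = match_pattern(PATTERN, DATA)
print(bindings)
```

For the family data this yields the single binding set shown on the slide (x = :John, y = :Mary, n = "John").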

9 RDF Storage Issues
- RDF triples must be stored, and their individual components can be URIs, blank nodes, or literals.
- Namespaces used in URIs can be long.
- Multiple triples describe a resource, resulting in repetition of (possibly long) URIs.
- A literal occurring in multiple triples can have different representations, e.g. 120, 120.0, 12.0e+1, 1.20e+2.
- An RDF graph may include schema triples, e.g. (:brotherOf rdfs:domain :Male).

10 RDF Querying Issues in SQL
- Support specification of graph pattern-based SQL queries.
- The same variable can occur in multiple triples of a graph pattern, so processing requires self-joins,
  e.g. (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n)
- Query processing (e.g. for filter conditions, ORDER BY) requires datatype-specific comparison semantics:
  Schema triple: (:age rdfs:range xsd:int)
  Graph pattern: (?x :age ?a)
  Filter condition: a > 60
  ORDER BY: a DESCENDING
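
To see why datatype-specific comparison matters for the filter a > 60, the following Python sketch compares the same values as strings and as typed integers. The age values and the helper typed_value are hypothetical; the (value, type) pairs only loosely mirror the value and type columns that RDF_MATCH returns for literals.

```python
# Hypothetical (lexical value, datatype) pairs for ?a, assuming (:age rdfs:range xsd:int).
ages = [("7", "xsd:int"), ("61", "xsd:int"), ("100", "xsd:int")]


def typed_value(lexical, datatype):
    """Convert a lexical form to a comparable Python value based on its datatype."""
    if datatype == "xsd:int":
        return int(lexical)
    return lexical  # unknown datatypes fall back to string comparison


# Comparing as strings gives the wrong answer: "7" > "60" but "100" < "60".
string_filter = [v for v, _ in ages if v > "60"]

# Comparing with datatype-aware semantics gives the intended answer.
typed_filter = [v for v, t in ages if typed_value(v, t) > 60]

# ORDER BY a DESCENDING, again using typed values.
typed_desc = [v for v, t in sorted(ages, key=lambda p: typed_value(*p), reverse=True)]

print(string_filter, typed_filter, typed_desc)
```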

11 RDF Querying Issues: Inference
Query processing may involve inferencing. Example:
  Data: (:Jim :brotherOf :John) (:John :fatherOf :Mary)
  Graph pattern: (?x :uncleOf ?y)
  Result without inference: empty
  Rule: (?x :brotherOf ?y) (?y :fatherOf ?z) => (?x :uncleOf ?z)
  Inferred data: (:Jim :uncleOf :Mary)
  Result with the rule applied: x = :Jim, y = :Mary
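
The effect of the rule can be reproduced with a small Python sketch. It hard-codes the single uncleOf rule from the slide rather than implementing a general rule engine, and the function names are assumptions made for illustration.

```python
def apply_uncle_rule(triples):
    """Apply (?x :brotherOf ?y) (?y :fatherOf ?z) => (?x :uncleOf ?z) once."""
    inferred = set()
    for x, p1, y in triples:
        if p1 != ":brotherOf":
            continue
        for y2, p2, z in triples:
            # The shared variable ?y joins the two antecedent triples.
            if p2 == ":fatherOf" and y2 == y:
                inferred.add((x, ":uncleOf", z))
    return inferred


def uncles(triples):
    """Answer the graph pattern (?x :uncleOf ?y) as sorted (x, y) pairs."""
    return sorted((s, o) for s, p, o in triples if p == ":uncleOf")


data = {(":Jim", ":brotherOf", ":John"), (":John", ":fatherOf", ":Mary")}

before = uncles(data)                           # query over the asserted data only
after = uncles(data | apply_uncle_rule(data))   # query over asserted plus inferred data
print(before, after)
```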

12 RDF Querying Approach
General approach:
- Create a new (declarative, SQL-like) query language, e.g. RQL, SeRQL, TRIPLE, N3, Versa, SPARQL, RDQL, RDFQL, SquishQL, RSQL.
SQL-based approach:
- Introduces a SQL table function, RDF_MATCH, that uses a SPARQL-like graph pattern to express RDF queries.
Benefits of the SQL-based approach:
- Leverages all the powerful constructs in SQL (e.g. SELECT / FROM / WHERE, ORDER BY, GROUP BY, aggregates, joins) to process graph query results.
- RDF queries can easily be combined with conventional queries on database tables, thereby avoiding staging.

13 Embedding RDF Query in SQL
  SELECT … FROM …, TABLE ( <RDF query> ) t, … WHERE …;
- <RDF query> is expressed as an RDF_MATCH table function invocation.
- Use of the RDF_MATCH table function allows embedding a graph query in a SQL query.

14 Functionality

15 RDF_MATCH Table Function
Input parameters:
  RDF_MATCH (
    Pattern,    -- graph pattern
    Models,     -- data (set of RDF graphs)
    RuleBases,  -- rules (0 or more rulebases)
    Aliases     -- list of prefixes for namespaces
  )
Returns a set of columns containing variable bindings:
- A variable matching a URI is returned as a single VARCHAR2 column with the same name (e.g. x for ?x).
- A variable matching a literal is returned as a pair of VARCHAR2 columns: one with the name (e.g. x for ?x) and one with the type (x$type for ?x).

16 RDF_MATCH Example
Example: student reviewers less than 25 years old
  SELECT t.r reviewer, t.c conf, t.a age
  FROM TABLE (
    RDF_MATCH (
      '(?r rdf:type :Student) (?r :reviewerOf ?c) (?r :age ?a)',
      RDFModels('reviewers'),
      NULL,
      RDFAliases(…))
  ) t
  WHERE t.a < 25;

17 Specifying Rules
- RDFS rulebase: pre-loaded
- User-defined rules can be added
Rule: "Chairperson of Conference is also a reviewer"
  ('rb',                      -- rulebase name
   'ChairpersonRule',         -- rule name
   '(?r :ChairpersonOf ?c)',  -- antecedents
   NULL,                      -- filter condition
   NULL,                      -- aliases
   '(?r :ReviewerOf ?c)')     -- consequents

18 RDF_MATCH Example with Rulebase
Query: find reviewers of conferences
  SELECT t.r reviewer
  FROM TABLE (
    RDF_MATCH (
      '(?r :ReviewerOf ?c)',
      RDFModels('reviewers'),
      RDFRules('rb'),
      NULL)) t;
Data: (:Mary :ChairpersonOf :IDBC2005)
Inferred data: (:Mary :ReviewerOf :IDBC2005)

19 Design & Implementation

20 RDF Data Storage
Triple data is stored, after normalization, in two tables:
- UriMap (UriID, UriValue, …) maps URIs, blank nodes, and literals to internal identifiers.
- IdTriples (ModelID, SubjectID, PropertyID, ObjectID, …) contains the triple information encoded as three identifiers.
Multiple representations of literals:
- The first occurrence is treated as canonical, and the rest are mapped to the canonical representation,
  e.g. 120.0, 1.20e+2, and 12.0e+1 all map to 120.
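
A minimal sketch of this normalization, using Python's sqlite3 module as a stand-in for the database. Only the table and column names come from the slide; the column types, the helper functions, the :boxA/:weight sample data, and the use of float parsing to detect equivalent numeric literals are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UriMap (UriID INTEGER PRIMARY KEY, UriValue TEXT UNIQUE);
    CREATE TABLE IdTriples (ModelID INTEGER, SubjectID INTEGER,
                            PropertyID INTEGER, ObjectID INTEGER);
""")

canonical_numbers = {}  # numeric value -> first lexical form seen for it


def canonical(term):
    """Map a numeric literal to the first representation seen; leave other terms alone."""
    try:
        number = float(term)
    except ValueError:
        return term  # URIs and non-numeric literals are stored unchanged
    return canonical_numbers.setdefault(number, term)


def term_id(term):
    """Return the internal identifier for a term, inserting it into UriMap if new."""
    value = canonical(term)
    conn.execute("INSERT OR IGNORE INTO UriMap (UriValue) VALUES (?)", (value,))
    row = conn.execute("SELECT UriID FROM UriMap WHERE UriValue = ?", (value,)).fetchone()
    return row[0]


def add_triple(model_id, s, p, o):
    """Store one triple as three internal identifiers."""
    conn.execute("INSERT INTO IdTriples VALUES (?, ?, ?, ?)",
                 (model_id, term_id(s), term_id(p), term_id(o)))


# Hypothetical data: three spellings of the same number.
add_triple(1, ":boxA", ":weight", "120")
add_triple(1, ":boxB", ":weight", "120.0")
add_triple(1, ":boxC", ":weight", "1.20e+2")

object_ids = [row[0] for row in conn.execute("SELECT ObjectID FROM IdTriples")]
values = [row[0] for row in conn.execute("SELECT UriValue FROM UriMap ORDER BY UriID")]
print(object_ids, values)
```

All three triples end up sharing one ObjectID, and UriMap holds the long strings only once each.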

21 RDF_MATCH Query Processing
- Substitute aliases with namespaces in the search pattern.
- Convert URIs and literals to internal IDs.
- Generate the query:
  - Generate a self-join query based on matching variables.
  - Generate SQL subqueries for the rulebases component (if any).
  - Generate the join result by joining internal IDs with the UriMap table.
  - Use model IDs to restrict the IdTriples table.
- Compile and execute the generated query.
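
The query generation steps can be sketched as follows, again with Python and sqlite3 standing in for the database. The shape of the generated SQL is an assumption: the slides describe the steps but do not show the actual generated text. The function build_sql and the ids dictionary are hypothetical, and alias substitution and rulebase subqueries are left out.

```python
import sqlite3


def build_sql(pattern, model_id, ids):
    """Translate a graph pattern into a self-join over IdTriples plus joins to UriMap."""
    columns = ("SubjectID", "PropertyID", "ObjectID")
    first_seen = {}  # variable name -> "tN.Column" where the variable first occurs
    where = []
    for i, triple in enumerate(pattern):
        where.append(f"t{i}.ModelID = {int(model_id)}")  # restrict to the model
        for term, col in zip(triple, columns):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                var = term[1:]
                if var in first_seen:
                    # A repeated variable becomes a self-join condition.
                    where.append(f"{ref} = {first_seen[var]}")
                else:
                    first_seen[var] = ref
            else:
                # Constants are compared by their internal ID.
                where.append(f"{ref} = {int(ids[term])}")
    tables = [f"IdTriples t{i}" for i in range(len(pattern))]
    select = []
    for var, ref in first_seen.items():
        # Join each variable with UriMap to turn IDs back into values.
        tables.append(f"UriMap u_{var}")
        where.append(f"u_{var}.UriID = {ref}")
        select.append(f"u_{var}.UriValue AS {var}")
    return (f"SELECT {', '.join(select)} FROM {', '.join(tables)} "
            f"WHERE {' AND '.join(where)}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UriMap (UriID INTEGER PRIMARY KEY, UriValue TEXT UNIQUE);
    CREATE TABLE IdTriples (ModelID INTEGER, SubjectID INTEGER,
                            PropertyID INTEGER, ObjectID INTEGER);
""")
family = [(":John", ":brotherOf", ":Mary"), (":Mary", ":parentOf", ":Matt"),
          (":John", ":name", "John"), (":Mary", ":name", "Mary"),
          (":Matt", ":name", "Matt")]
ids = {}  # term -> internal ID
for triple in family:
    for term in triple:
        if term not in ids:
            ids[term] = len(ids) + 1
            conn.execute("INSERT INTO UriMap VALUES (?, ?)", (ids[term], term))
    conn.execute("INSERT INTO IdTriples VALUES (1, ?, ?, ?)",
                 tuple(ids[t] for t in triple))

pattern = [("?x", ":brotherOf", "?y"), ("?y", ":name", "Mary"), ("?x", ":name", "?n")]
sql = build_sql(pattern, 1, ids)
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

A pattern with three triples produces three IdTriples instances in the FROM clause, which is the self-join the slide refers to. In this sketch a constant that is missing from ids raises a KeyError; a fuller version would simply return no rows.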

22 Optimization: Table Function Rewrite
TableRewriteSQL( ):
- Takes the RDF query (specified via arguments) as input and generates a SQL string.
- The table function call is substituted with the generated SQL string.
- The resulting query is reparsed and executed.
Advantages:
- Avoids the execution-time overhead (linear in the number of result rows) associated with the table function infrastructure.
- Leverages SQL optimizer capabilities to optimize the resulting query (including filter condition pushdown).
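
The substitution step can be imitated at the string level. The sketch below is not Oracle's TableRewriteSQL mechanism; it is a hypothetical Python helper that finds a TABLE ( … ) call by counting parentheses and replaces it with a generated subquery, with a trivial stand-in for the generator. It assumes the parentheses inside string literals are balanced, as they are in graph patterns.

```python
import re
import sqlite3


def rewrite_table_function(sql, generate_sql):
    """Replace the first TABLE ( ... ) call with a parenthesized generated subquery."""
    match = re.search(r"TABLE\s*\(", sql, flags=re.IGNORECASE)
    if match is None:
        return sql  # nothing to rewrite
    open_pos = match.end() - 1  # position of the "(" that follows TABLE
    depth = 0
    for pos in range(open_pos, len(sql)):
        if sql[pos] == "(":
            depth += 1
        elif sql[pos] == ")":
            depth -= 1
            if depth == 0:
                call = sql[open_pos + 1:pos]  # text of the table function call
                return sql[:match.start()] + "(" + generate_sql(call) + ")" + sql[pos + 1:]
    raise ValueError("unbalanced parentheses in TABLE ( ... ) call")


original = ("SELECT t.r FROM TABLE ( RDF_MATCH('(?r :ReviewerOf ?c)', "
            "RDFModels('reviewers'), NULL, NULL) ) t WHERE t.r IS NOT NULL")

# Stand-in generator: a real one would build the self-join query for the pattern.
rewritten = rewrite_table_function(original, lambda call: "SELECT ':Mary' AS r")

# The rewritten text is ordinary SQL, so the optimizer sees the whole query.
result = sqlite3.connect(":memory:").execute(rewritten).fetchall()
print(rewritten)
print(result)
```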

23 Optimization: Materialized Join Views
- Generic materialized join views (MJVs): Subject-Subject, Object-Subject, …
- Subject-property matrix MJVs (SPMJVs): custom, workload-based (e.g. frequent search patterns)
Example: select student name, university, and age
  Select r, u, a …… '(?r rdf:type :Student) (?r :enrolledAt ?u) (?r :age ?a)' ……
[SPMJV figure not preserved in the transcript]
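
One way to picture a subject-property matrix is as one row per subject with one column per property, so that a frequent pattern becomes a scan instead of a self-join. The Python sketch below builds such a matrix in memory; the sample students, the function build_spm, and the assumption that each property is single-valued are all hypothetical, since the slide's SPMJV figure is not available.

```python
# Hypothetical triples for two students.
triples = [
    (":Ann", "rdf:type", ":Student"), (":Ann", ":enrolledAt", ":MIT"), (":Ann", ":age", "22"),
    (":Bob", "rdf:type", ":Student"), (":Bob", ":enrolledAt", ":BU"), (":Bob", ":age", "27"),
    (":Ann", ":reviewerOf", ":VLDB2005"),
]


def build_spm(triples, rdf_type, properties):
    """Pivot triples into {subject: (value of each property, ...)} for subjects of one type."""
    members = {s for s, p, o in triples if p == "rdf:type" and o == rdf_type}
    rows = {s: {} for s in members}
    for s, p, o in triples:
        if s in members and p in properties:
            rows[s][p] = o  # assumes each property is single-valued per subject
    # Keep only subjects that have a value for every requested property.
    return {s: tuple(vals[p] for p in properties)
            for s, vals in rows.items() if len(vals) == len(properties)}


spm = build_spm(triples, ":Student", (":enrolledAt", ":age"))

# The pattern (?r rdf:type :Student) (?r :enrolledAt ?u) (?r :age ?a)
# can now be answered by scanning the matrix, with no self-join.
answers = sorted((r, u, a) for r, (u, a) in spm.items())
print(answers)
```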

24 Performance

25 Dataset
- WordNet: lexical database for the English language
- UniProt: large-scale (80 million triples) protein and annotation data

26 Experiments
- Varying number of triples in search pattern
- Varying filter conditions
- Varying projection list
- Large-scale RDF data
- Subject-property MJVs

27 Varying Number of Triples
  '(?a wn:hyponymOf ?b) (?b wn:hyponymOf ?c) …..
- Increasing number of self-joins

28 Varying Number of Triples

29 Varying Projection List
  '(?c0 wn:wordForm ?word) (?c0 wn:wordForm ?syn1) (?c1 wn:wordForm ?syn1) …. (5 triples)
Benefit of the projection list optimization:
- Eliminates joins with the UriMap table for variables not referenced outside of RDF_MATCH

30 Varying Projection List

31 Large-Scale RDF Data
- UniProt: 10M, 20M, 40M, 80M triples
- 6 example queries given with UniProt
- The number of matches remains constant as the dataset size changes (ROWNUM)

32 UniProt Sample Queries
Description | Pattern | Projection | Result limit
Q1: Display the ranges of transmembrane regions | 6 triples, 5 vars | 3 vars | 15000 rows
Q2: List proteins with publications by authors with matching names | 5 triples, 5 vars, 1 LIKE pred. | 3 vars | 10 rows
Q3: Count the number of times a publication by a specific author is cited | 3 triples, 2 vars | 0 vars | 32 rows
Q4: List resources that are related to proteins annotated with a specific keyword | 3 triples, 2 vars | 1 var | 3000 rows
Q5: List genes associated with human diseases | 7 triples, 5 vars | 3 vars | 750 rows
Q6: List recently modified entries | 2 triples, 2 vars, 1 range pred. | 2 vars | 8000 rows

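33 Query Response Times
RDF_MATCH Performance Scalability
Query columns: Q1, Q2, Q3, Q4, Q5, Q6 (only five values per row survive in the transcript, so they are listed in order without column assignment)
10 M triples: 0.86, < 0.01, 0.03, 0.18, 0.46
20 M triples: 0.95, < 0.01, 0.03, 0.19, 0.47
40 M triples: 0.96, < 0.01, 0.03, 0.18, 0.47
80 M triples: 1.03, < 0.01, 0.03, 0.20, 0.49
Maximum: 0.054, 0.002, 0.011, 0.065, 0.07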

34 Conclusions

35 Conclusions and Future Work
SQL-based RDF querying scheme:
- RDF_MATCH table function
- Supports graph pattern-based queries on RDF data with RDFS and user-defined rules
Efficient execution:
- Table function rewrite
- Materialized join views: generic and subject-property
- Rule indexes
Future work:
- OPTIONAL support (outer join)
- Provenance support
