1
Comparing path-based and vertically-partitioned RDF databases
  Preetha Lakshmi & Chris Mueller
  12/10/2007
  CSCI 8715
  Shashi Shekhar
2
Outline
  Motivation
  Background and related work
  Problem statement
  Our contributions
  Assumptions
  Experimental process
  Results
  Conclusions
3
Motivation
  Semantic Web
  libraries
  scientific databases
  industry
  social networks
  Computer-to-computer communication
4
RDF Schema
  [Figure labels: Schema, Instance]
5
RDF Schema
  [Figure label: RDF Triples]
6
Related Work
  Triple store
  Property tables
  Class property tables
  Dynamic table model
  Vertically partitioned tables (Abadi et al., 2007)
  Path-based approach (Matono et al., 2005)
  Require more self-joins, normal joins, NULL value storage
7
Vertical Partitioning
  A table is created for each property
  First
    Subject  Object
    'r1'     'Picasso'
    'r4'     'August'
  Last
    Subject  Object
    'r1'     'Picasso'
    'r4'     'Rodin'
  Paints
    Subject  Object
    'r1'     'r2'
    'r1'     'r3'
  ... etc.
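The one-table-per-property idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name vertically_partition is hypothetical, the tables are plain in-memory lists rather than database tables, and the sample triples are a subset of the rows shown on the slide.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Group (subject, property, object) triples into one
    two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    return dict(tables)


# Sample triples mirroring some rows from the slide's tables.
triples = [
    ("r1", "first", "Picasso"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
]

tables = vertically_partition(triples)
for prop, rows in tables.items():
    print(prop, rows)
```

Each key of the resulting dictionary plays the role of a table name, so a query that touches k properties needs to combine k of these tables.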
8
Path-based Model
  Path signatures relate to instance data
  Path
    pathid  pathexp
    1       ''
    2       '#first'
    3       '#last'
    4       '#paints'
    5       '#title<#paints'
    6       '#sculpts'
    7       '#title<#sculpts'
  Resource
    name       pathid  root
    'r1'       1       'r1'
    'r2'       4       'r1'
    'r3'       4       'r1'
    'r4'       1       'r4'
    'Picasso'  2       'r1'
    'Pablo'    3       'r1'
    'August'   2       'r4'
    'Rodin'    3       'r4'
    ...
  Our enhancement
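A rough sketch of how Path and Resource rows like those on the slide could be derived from triples follows. It rests on several assumptions: the function name build_path_tables and its roots parameter are hypothetical, the root nodes are supplied by the caller, path expressions are written with the most recent property first (as '#title<#paints' suggests), and the 'Guernica' title triple is illustrative. Path ids are assigned in order of discovery, so they will not necessarily match the ids on the slide.

```python
from collections import defaultdict, deque


def build_path_tables(triples, roots):
    """Return (path_ids, resources) where path_ids maps a path
    expression to an id and resources holds (name, pathid, root) rows."""
    edges = defaultdict(list)
    for subject, prop, obj in triples:
        edges[subject].append((prop, obj))

    path_ids = {}
    resources = []

    def path_id(expr):
        # Assign ids in order of first appearance, starting at 1.
        if expr not in path_ids:
            path_ids[expr] = len(path_ids) + 1
        return path_ids[expr]

    for root in roots:
        # Breadth-first walk; `seen` guards against cycles on a path.
        queue = deque([(root, "", frozenset([root]))])
        while queue:
            node, expr, seen = queue.popleft()
            resources.append((node, path_id(expr), root))
            for prop, obj in edges.get(node, []):
                if obj in seen:
                    continue
                step = "#" + prop
                new_expr = step if not expr else step + "<" + expr
                queue.append((obj, new_expr, seen | {obj}))
    return path_ids, resources


triples = [
    ("r1", "first", "Picasso"),
    ("r1", "paints", "r2"),
    ("r2", "title", "Guernica"),  # illustrative triple
]
path_ids, resources = build_path_tables(triples, roots=["r1"])
print(path_ids)
print(resources)
```

Because every resource row carries the full path signature from its root, a path query can be answered by matching on pathexp instead of chaining one join per edge.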
9
Problem Statement
  Given:
    A set of RDF triples
    Vertical partitioning storage model
    Path-based storage model
  Find: Query plans for the various categories of queries under these two storage schemes.
  Objective: To determine which query types perform comparatively better or worse in the two storage models.
  Why is this challenging?
    Need for efficient storage of structured data
    Different application domains use RDF, so generic storage schemes should support a diverse workload.
10
Contributions
  Identification of benchmark queries
    schema, instance, path, and aggregate queries
  Enhancement to the path-based schema that addresses different types of workloads
  Comparison of path-based model and vertical partitioning
  Analysis of cyclic queries
11
Query Types
  Schema queries
    find all types of artists
    list all property names
    list nodes with 2 or more descendants
    find the transitive sub-classes of a class 'sculpture'
    list properties with 2 or more descendants
  Instance queries
    find the titles of all paintings by Picasso
    select all nodes within one edge-length of r4
    list all the properties of node r4
  [Figure: query taxonomy. Labels: Schema vs Instance, Path, Non-path, Aggregate, Cycle, Relationship, Diameter, Constraints (intermediate node, terminal node), Connection, List]
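To make the join cost of an instance query concrete, here is a hedged sketch of "find the titles of all paintings by Picasso" against vertically partitioned tables, using Python's built-in sqlite3 module rather than the Oracle 10g setup used in the experiment. The table names prop_first, prop_paints, and prop_title and the 'Guernica' row are assumptions for illustration, and the query is not claimed to be the authors' plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prop_first  (subject TEXT, object TEXT);
    CREATE TABLE prop_paints (subject TEXT, object TEXT);
    CREATE TABLE prop_title  (subject TEXT, object TEXT);

    INSERT INTO prop_first  VALUES ('r1', 'Picasso'), ('r4', 'August');
    INSERT INTO prop_paints VALUES ('r1', 'r2'), ('r1', 'r3');
    INSERT INTO prop_title  VALUES ('r2', 'Guernica');  -- illustrative row
    """
)

# One join per edge on the path: first -> paints -> title.
rows = conn.execute(
    """
    SELECT t.object
    FROM prop_first AS f
    JOIN prop_paints AS p ON p.subject = f.subject
    JOIN prop_title  AS t ON t.subject = p.object
    WHERE f.object = 'Picasso'
    """
).fetchall()

titles = [row[0] for row in rows]
print(titles)
conn.close()
```

The query needs two joins for a path of length two below the name lookup, which is the kind of count the "number of joins" validation parameter would capture.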
12
Query Types
  Path queries
    find the title of any painting painted by anyone
    display all the titles of work done by artists
    find the names of all the sculptors
    ...with constraint on intermediate node
      find an artist's name where the artifact is a painting
    ...with terminal node constraints
      display all the titles of work done by Picasso
13
Query Types
  Path queries
    connection queries
      list all the properties of node r4
      is there a connection between 'Picasso' and 'Guernica'?
    diameter queries
      select all nodes in the graph within one edge-length of r4
    non-simple path queries
      detect loops in the dataset starting at 'Picasso'
      detect loops in the whole dataset
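A loop query with a given start node can be sketched as a depth-first search over the directed graph. This is an in-memory illustration only, not the Oracle-based approach used in the experiment; the function name find_cycle_from and the adjacency-dictionary input are assumptions, and the recursion is intended for small graphs.

```python
def find_cycle_from(edges, start):
    """Return one directed cycle reachable from `start` as a list of
    nodes (first node repeated at the end), or None if there is none.
    `edges` maps each node to a list of its successor nodes."""
    path = []        # nodes on the current DFS path, in order
    on_path = set()  # same nodes, for O(1) membership checks
    done = set()     # nodes fully explored without finding a cycle

    def visit(node):
        path.append(node)
        on_path.add(node)
        for nxt in edges.get(node, ()):
            if nxt in on_path:
                # Back edge: the cycle is the path suffix starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        on_path.discard(node)
        done.add(node)
        path.pop()
        return None

    return visit(start)


cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
acyclic = {"a": ["b"], "b": ["c"]}
print(find_cycle_from(cyclic, "a"))
print(find_cycle_from(acyclic, "a"))
```

Detecting loops in the whole dataset could reuse the same routine by calling it once per node, at the cost of repeated work that a shared `done` set would avoid.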
14
Query Types
  Aggregate queries
    find all nodes with 2 or more properties
    list all subjects that have two instances of a single property
  Relationship queries
    find any relationship between r1 and r4
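The first aggregate query can be sketched directly over triples. The sketch assumes "2 or more properties" means two or more distinct property names per subject, which the slide does not state; the function name subjects_with_min_properties and the sample triples are illustrative.

```python
from collections import defaultdict


def subjects_with_min_properties(triples, minimum=2):
    """Return, sorted, the subjects that have at least `minimum`
    distinct property names among their triples."""
    props = defaultdict(set)
    for subject, prop, _obj in triples:
        props[subject].add(prop)
    return sorted(s for s, names in props.items() if len(names) >= minimum)


triples = [
    ("r1", "first", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),  # illustrative triple
]
result = subjects_with_min_properties(triples)
print(result)
```

Counting rows per (subject, property) pair instead of distinct names would give the second aggregate query, subjects with two instances of a single property.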
15
Assumptions
  Using a small dataset, with the assumption that the number of joins and the efficiency of the queries will not change significantly with larger datasets
  No explicit storage of the RDF schema in the vertically-partitioned scheme (application independent)
  INSERT, UPDATE, and DELETE are insignificant compared to SELECT
  Key nodes in the path-based model are well-defined
    In practice, key nodes would be generated dynamically after user load analysis
16
Experimental Process
  Validation parameters
    Nodes
    Edges
    Number of joins
    Number of tables
    CPU cost
    Storage bytes
  Set up both schemes in Oracle 10g for the RDF graph shown earlier
  Materialized path lengths in path-based scheme
  Generated query plans
  Analyzed queries based on the validation parameters
  Cycle queries: joins are not supported
17
Dataset used for experiment
18
Experimental Results
  * For CPU cost and bytes (storage), the entry in the table indicates which scheme used fewer CPU cycles or occupied less space. In cases where both required an identical or similar amount of computation or storage, we indicate this with "same". Queries which cannot be answered are indicated by '--'.
19
Conclusions & Observations
  Vertical Partitioning
    performs well for short path lengths and terminal node constraints
    offers storage benefits for instance queries without path expressions
  Enhanced Path-Based model
    performs well for schema queries, path queries, cycle queries
  Queries which the original path-based model could not address and the enhanced model could answer:
    Connection queries and diameter queries
    Path queries with intermediate node constraints
20
Conclusion (Cont'd)
  Both schemes show the same performance on instance queries without path expressions.
  Neither scheme addresses relationship queries.
  Interesting results for cycle queries
    specifying the start node performs worse than leaving the start node unspecified
    specifying the start node uses Oracle Filter
21
Future Work
  Test large and diverse datasets
  Test vertical partitioning with a column-oriented database like MonetDB
  Pruning strategies for cycle queries
  Impose join indexes
  Find approaches to answer relationship queries
  Storage classification based on the application domain
22
Thank You Questions? Please see http://www.cs.umn.edu/~cmueller/cs8715 for a copy of the report that accompanies this presentation, including a full bibliography