Published by Buddy Barton. Modified over 9 years ago.
1
GRIN: A Graph Based RDF Index
Octavian Udrea (1), Andrea Pugliese (2), V. S. Subrahmanian (1)
(1) University of Maryland, College Park
(2) Università di Calabria
2
Motivation
Plenty of large RDF datasets: TAP, GovTrack, ChefMoz, CIA World Factbook
  Many more (see rdfdata.org)
Query languages: RDQL, RQL, SPARQL
DB systems: Jena, Sesame, RDFBroker
Indexing?
  Based on relational database indexes
  Has to be rooted in the characteristics of the query language
3
Contributions
Lightweight mechanism for indexing large RDF datasets
  GRIN: Graph-based RDF INdex
Query answering algorithms for SPARQL-like queries
Evaluation on two real-world datasets: TAP (Stanford) and ChefMoz (chefmoz.org)
4
Outline
RDF data and queries
The GRIN Index structure
Answering queries
Experimental evaluation
5
RDF graph example (ChefMoz)
6
RDF query example
7
Query example in SPARQL
SELECT ?v1 ?v2 ?v3
WHERE {
  {(?v1 attire ?v3). (?v1 cuisine Italian)}
  {(?v2 attire ?v3). (?v2 cuisine Italian). (?v2 location Norfolk)}
  {(Norfolk locatedIn NE/USA)}
}
FROM ChefMoz
8
Native RDF systems: Jena2
Stores RDF as (subject, property, value) triples in a relational table
Indexes on each of the three attributes
Translates SPARQL/RDQL into SQL
Example query: 6 self-joins
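A minimal sketch of why a single triple table leads to self-joins, using Python's built-in sqlite3. The table name `triples`, the toy rows (the restaurant "Vivace" and the attire value "casual" are made up), and the SQL text are illustrative assumptions, not Jena2's actual schema or generated SQL. Only the first two patterns of the example query are translated; each further pattern would add one more alias of the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
# One index per attribute, as described on the slide.
for col in ("subject", "property", "value"):
    conn.execute(f"CREATE INDEX idx_{col} ON triples ({col})")

# Hypothetical toy data in the spirit of the ChefMoz example.
rows = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "casual"),
    ("Vivace", "cuisine", "Italian"),
    ("Vivace", "attire", "casual"),
    ("Vivace", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)

# Patterns (?v1 attire ?v3) and (?v1 cuisine Italian):
# two aliases of the same table, joined on the shared variable ?v1.
sql = """
SELECT t1.subject AS v1, t1.value AS v3
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
WHERE t1.property = 'attire'
  AND t2.property = 'cuisine'
  AND t2.value = 'Italian'
ORDER BY v1
"""
result = conn.execute(sql).fetchall()
print(result)
```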
9
Native RDF systems: Sesame
Broekstra et al., ISWC 2002
The Sesame SAIL API improves on Jena:
  Supports RDF Schema inference
  Separates RDFS from the triple table
  Supports database schema generation based on the underlying RDF schema of a dataset
The problem of too many joins remains
10
Native RDF systems: RDFBroker
Sintek et al., ESWC 2006
The database schema is built from signatures: the set of properties used on a resource
Reduces the number of joins between tables
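The signature idea can be sketched roughly as follows. This only illustrates grouping resources by the set of properties used on them; the function name `group_by_signature` and the toy triples are assumptions, and RDFBroker's real schema construction is not reproduced here.

```python
from collections import defaultdict

def group_by_signature(triples):
    """Map each signature (set of properties) to the resources that use it."""
    props = defaultdict(set)
    for subject, prop, _value in triples:
        props[subject].add(prop)
    tables = defaultdict(list)
    for resource, used in props.items():
        tables[frozenset(used)].append(resource)
    return dict(tables)

# Hypothetical toy data; "Vivace" and "casual" are made up.
triples = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "casual"),
    ("Vivace", "cuisine", "Italian"),
    ("Vivace", "attire", "casual"),
    ("Vivace", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
tables = group_by_signature(triples)
for signature, resources in tables.items():
    print(sorted(signature), resources)
```

In this sketch each signature would correspond to one table, so all properties of a resource sit in one row instead of being spread over several triple rows.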
11
The human perspective (figure sequence, slides 11 to 15)
16
Outline
RDF data and queries
The GRIN Index structure
Answering queries
Experimental evaluation
17
GRIN intuition
Resources “closer” in the RDF graph are more likely to be part of the same answer
  Hence they should appear on the same page
GRIN groups resources in circles around selected center resources
Query evaluation:
  Find the smallest circle that contains the answer
  Evaluate the query only on resources in that circle
18
The GRIN Index structure
GRIN is a binary tree in which:
  Leaf nodes are sets of resources (and the associated triples)
  Inner nodes are circles consisting of a center resource and a radius
  Each node is fully contained in its parent
Distance metric: shortest path distance in the undirected graph
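One possible in-memory shape for such a tree, together with the distance metric, is sketched below. The class name `GrinNode`, the helper names, and the toy triples ("Vivace" and "casual" are made up) are assumptions for illustration; the authors' actual implementation is not shown in the slides.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass
class GrinNode:
    # Inner node: a circle given by center and radius, with two children.
    center: Optional[str] = None
    radius: Optional[int] = None
    left: Optional["GrinNode"] = None
    right: Optional["GrinNode"] = None
    # Leaf node: the set of resources stored on this page.
    resources: FrozenSet[str] = frozenset()

def undirected_adjacency(triples):
    """Ignore edge direction and labels; keep only who is adjacent to whom."""
    adj = {}
    for subject, _prop, value in triples:
        adj.setdefault(subject, set()).add(value)
        adj.setdefault(value, set()).add(subject)
    return adj

def distances_from(adj, source):
    """Breadth-first search: shortest path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def circle(adj, center, radius):
    """All resources within the given distance of the center."""
    return {n for n, d in distances_from(adj, center).items() if d <= radius}

triples = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "casual"),
    ("Vivace", "cuisine", "Italian"),
    ("Vivace", "attire", "casual"),
    ("Vivace", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
adj = undirected_adjacency(triples)
leaf_a = GrinNode(resources=frozenset(circle(adj, "Grivanti", 1)))
leaf_b = GrinNode(resources=frozenset(circle(adj, "Norfolk", 1)))
root = GrinNode(center="Grivanti", radius=4, left=leaf_a, right=leaf_b)
```

Here both leaves lie inside the root circle (Grivanti, 4), which matches the containment property listed on the slide.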
19
Building the index: clustering (figure sequence, slides 19 to 22)
23
Building the index: clustering
Standard k-medoids clustering (Kaufman & Rousseeuw, 1987)
How many clusters? (formula not preserved)
  R is the set of resources
  M is the maximum number of resources per page
Average link gives the best performance for the inter-cluster distance
24
Building the index: the tree (figure sequence, slides 24 to 26)
27
Outline
RDF data and queries
The GRIN Index structure
Answering queries
Experimental evaluation
28
Queries to constraints
Extract constraints from the query:
  d(?v1, Italian) ≤ 1
  d(?v2, Norfolk) ≤ 1
  d(?v3, Italian) ≤ 2
  …and so on
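One way to read these constraints: in any answer, a path between a variable and a constant in the query maps to a path of at most the same length in the data, so the shortest path length in the (undirected) query graph is an upper bound on the distance. The sketch below derives the bounds this way; the function name `extract_constraints` and the tuple encoding of patterns are assumptions, not the authors' code.

```python
from collections import deque

def extract_constraints(patterns):
    """Return {(variable, constant): bound} from a list of triple patterns."""
    # Undirected query graph over subjects and objects.
    adj = {}
    for s, _p, o in patterns:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    constraints = {}
    for const in (n for n in adj if not n.startswith("?")):
        # Breadth-first search from each constant.
        dist = {const: 0}
        queue = deque([const])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for node, d in dist.items():
            if node.startswith("?"):
                constraints[(node, const)] = d
    return constraints

# The example query from the earlier slide.
patterns = [
    ("?v1", "attire", "?v3"),
    ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"),
    ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
constraints = extract_constraints(patterns)
```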
29
Query evaluation
Goal: identify the smallest circle that is guaranteed to contain an answer to the query
1. Perform a depth-first traversal
2. For each index node, evaluate the constraints
3. If the constraints guarantee an answer, perform subgraph matching
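The traversal might be sketched as below. This is one reading of the three steps, assuming a caller-supplied predicate `contains_answer` that stands in for the constraint evaluation; the `Node` class and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def smallest_candidates(node: Optional[Node],
                        contains_answer: Callable[[Node], bool]) -> List[Node]:
    """Depth-first search for the deepest nodes that still pass the check."""
    if node is None or not contains_answer(node):
        return []
    deeper = (smallest_candidates(node.left, contains_answer)
              + smallest_candidates(node.right, contains_answer))
    # If no child passes, this node is the smallest circle found on this branch.
    return deeper if deeper else [node]

# Hypothetical tree: root -> (A -> (A1, A2), B)
tree = Node("root", Node("A", Node("A1"), Node("A2")), Node("B"))
passing = {"root", "A"}
found = smallest_candidates(tree, lambda n: n.name in passing)
print([n.name for n in found])
```

Subgraph matching would then run only on the resources under the returned nodes.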
30
Query evaluation
31
Evaluating constraints
Constraints: d(?v1, Italian) ≤ 1, d(?v2, Norfolk) ≤ 1, d(?v3, Italian) ≤ 2
Question: is ?v1 in the circle (Grivanti, 3)?
  By the triangle inequality: d(Grivanti, ?v1) ≤ d(Grivanti, Italian) + d(?v1, Italian) ≤ 1 + 1 = 2
  Since 2 ≤ 3, ?v1 must be in the circle (Grivanti, 3)
32
Evaluating constraints
Question: is ?v3 in the circle (Grivanti, 3)?
  d(Grivanti, ?v3) ≤ d(Grivanti, Italian) + d(Italian, ?v3) ≤ 1 + 2 = 3
  Since 3 ≤ 3, ?v3 must be in (Grivanti, 3)
Similarly, ?v2 is in the same circle
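The triangle-inequality check on these two slides can be sketched as a small function. The names `upper_bounds` and `circle_contains_all` are assumptions; the only data distance used is d(Grivanti, Italian) = 1 from the slide, and the bound d(?v2, Italian) ≤ 1 follows from the pattern (?v2 cuisine Italian) in the example query.

```python
def upper_bounds(center_dist, constraints):
    """Tightest bound on d(center, variable) obtainable via each constant.

    center_dist: {constant: d(center, constant)}
    constraints: {(variable, constant): bound on d(variable, constant)}
    """
    bounds = {}
    for (var, const), limit in constraints.items():
        if const in center_dist:
            candidate = center_dist[const] + limit  # triangle inequality
            bounds[var] = min(bounds.get(var, candidate), candidate)
    return bounds

def circle_contains_all(center_dist, constraints, radius, variables):
    """True if every variable is guaranteed to lie within the circle."""
    bounds = upper_bounds(center_dist, constraints)
    return all(v in bounds and bounds[v] <= radius for v in variables)

center_dist = {"Italian": 1}  # d(Grivanti, Italian) = 1
constraints = {
    ("?v1", "Italian"): 1,
    ("?v2", "Italian"): 1,
    ("?v3", "Italian"): 2,
}
variables = ["?v1", "?v2", "?v3"]
bounds = upper_bounds(center_dist, constraints)
in_radius_3 = circle_contains_all(center_dist, constraints, 3, variables)
in_radius_2 = circle_contains_all(center_dist, constraints, 2, variables)
```

With radius 2 the check fails for ?v3, which means only that the guarantee is lost, not that ?v3 is necessarily outside the circle.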
33
Subgraph matching
Perform subgraph matching on the resources in the circles guaranteed to contain an answer
Algorithm by Cordella et al., IEEE PAMI 26(10), 2004
Worst-case time complexity of O(N!), where N is the maximum number of nodes in either graph
In practice, GRIN makes N very small
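To make the matching step concrete, here is a naive backtracking matcher over triple patterns. It is not the algorithm by Cordella et al.; it also allows two variables to bind to the same resource, so it is pattern matching rather than strict subgraph isomorphism. The function name `match` and the toy triples ("Vivace" and "casual" are made up) are assumptions.

```python
def match(patterns, triples, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    (s, p, o), rest = patterns[0], patterns[1:]
    results = []
    for ts, tp, to in triples:
        if tp != p:
            continue
        candidate = dict(binding)
        ok = True
        for term, value in ((s, ts), (o, to)):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results

triples = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "casual"),
    ("Vivace", "cuisine", "Italian"),
    ("Vivace", "attire", "casual"),
    ("Vivace", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
patterns = [
    ("?v1", "attire", "?v3"),
    ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"),
    ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
answers = match(patterns, triples)
```

Because the search is exponential in the worst case, restricting `triples` to the resources of a small circle is what keeps it cheap.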
34
Outline
RDF data and queries
The GRIN Index structure
Answering queries
Experimental evaluation
35
Experimental framework
Comparison between GRIN, Sesame, Jena2 and RDFBroker (in-memory):
  Index build time
  Memory consumption at query time
  Query time
Two real-world datasets:
  TAP (Stanford): datasets between 1.5 MB and 300 MB
  ChefMoz (chefmoz.org): 220 MB
36
Index build time
37
Memory consumption
38
Query time
39
Average degree of a query node
40
Conclusions
Method for indexing large RDF graphs, adapted to the characteristics of RDF queries
  Avoids expensive join operations
  Gives better query times than Jena2, Sesame and RDFBroker
Current and future work:
  Disk-based index
  Analysis of overlap and coverage