1 RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan
2 Maintenance of RDF Aggregate Views
- Introduction to RDF and RDQL
- RDQL Extension for Aggregate Views
- Aggregate View Maintenance Algorithms AMX
- Implementation and Experiments
- Related Work
3 Introduction
- Resource Description Framework (RDF): a W3C Recommendation
- Represents metadata about resources identifiable on the web by a Uniform Resource Identifier (URI)
- Triple: (Resource, Property, Value)
  - (Artist, rdf:type, rdfs:Class)
  - (Painter, rdf:type, rdfs:Class)
  - (Painter, rdfs:subClassOf, Artist)
[Figures: an RDF/XML document containing an RDF Schema and an RDF Instance, shown as equivalent to a graph with nodes Artist, Painter, String, and &r1, edges subClassOf and fname, and the fname value "Guy" for &r1.]
7 RDQL: RDF Query Language
SELECT ?highprice
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact), (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND <= ?date <=
USING ns1 FOR
(The WHERE clause forms the graph pattern.)
8 RDQL Extension for Aggregates and Views
CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact), (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND <= ?date <=
USING ns1 FOR
9 Aggregate Query
- Aggregate operators, e.g. min, max, sum, count, average
- GROUP BY clause
- Output: a table of tuples
- The output can be (i) an RDF instance or (ii) a table
- Advantage of (i): the result can itself be queried further
- However, (ii) allows tables of any form, including an RDF instance when the table consists of a set of RDF tuples
We extend the syntax of RDQL to allow constants in the SELECT clause, which is equivalent to creating new resources from those constants. For example, the previous query can be modified as follows:
CREATEVIEW AS
SELECT,, max(?highprice)
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact), (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND <= ?date <=
USING ns1 FOR
The result is a valid RDF statement: (,, "800000"^^ns1:USD)
11 Aggregate View Maintenance: Relational Approach
- Store all triples in one relational table with schema (Resource, Property, Value), OR
- Store the resources and values of each property in a separate relational table with schema (Resource, Value)
- #self-joins = (#triples in WHERE clause) – 1
- Relational view maintenance then requires a large number of delta rules, which is expensive
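The self-join count above can be illustrated with a minimal sketch, assuming a single triples table held in SQLite. The helper `build_self_join_sql`, the table layout, and the property names `fname`, `paints`, and `price` are hypothetical illustrations, not part of the authors' implementation.

```python
import sqlite3

COLUMNS = ("resource", "property", "value")

def build_self_join_sql(patterns):
    """Translate triple patterns into one SELECT over a single triples table.

    Terms starting with '?' are variables; all other terms are constants.
    Every pattern gets its own alias of the table, so n patterns need n - 1 self-joins.
    """
    conditions, params, first_use = [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_use:
                    # Repeated variable: becomes a join condition.
                    conditions.append(f"{ref} = {first_use[term]}")
                else:
                    first_use[term] = ref
            else:
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_use.items())
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT {select} FROM {tables}{where}", params

# Hypothetical data: one artist with two artifacts, another with one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (resource TEXT, property TEXT, value)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("a1", "fname", "Guy"), ("a1", "paints", "p1"), ("a1", "paints", "p2"),
    ("p1", "price", 800000), ("p2", "price", 300000),
    ("a2", "fname", "Rose"), ("a2", "paints", "p3"), ("p3", "price", 1000000),
])

patterns = [
    ("?artist", "fname", "Guy"),
    ("?artist", "paints", "?artifact"),
    ("?artifact", "price", "?highprice"),
]
sql, params = build_self_join_sql(patterns)
self_joins = len(patterns) - 1
result = conn.execute(f"SELECT max(highprice) FROM ({sql})", params).fetchone()[0]
```

With three triple patterns the generated query references the table three times, that is, two self-joins, and every additional pattern adds another.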
12 Aggregate View Maintenance: Our Approach
- Localized search in RDF graphs
- Modified version of breadth-first search starting at the inserted/deleted edge
- Auxiliary data are needed for certain aggregate views: min, max, avg
13 Distributive Aggregate Function
- An aggregate function f is distributive w.r.t. a source update operation if and only if its updated value can be computed from its old value and the update alone, without reference to the source.
- Examples: count, sum, and average w.r.t. insertion, deletion, and update
- For average, we need an additional attribute, size, which stores the size of the intermediate result S, in order to compute the correct updated value (alternatively, it can be calculated from sum and count)
- max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update
- Auxiliary data computed from S avoid the need to refer to the source.
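As a minimal sketch of the "sum and count" alternative mentioned for average, the class below keeps only a running total and a count and never looks at the source again. The name `AvgState` and its methods are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Average kept as (total, count), so every update uses only old state and the delta."""
    total: float = 0.0
    count: int = 0

    def insert(self, values):
        self.total += sum(values)
        self.count += len(values)

    def delete(self, values):
        self.total -= sum(values)
        self.count -= len(values)

    @property
    def average(self):
        # An empty intermediate result has no average.
        return self.total / self.count if self.count else None

state = AvgState()
state.insert([100, 200, 300])
avg_initial = state.average
state.insert([400])
avg_after_insert = state.average
state.delete([100])
avg_after_delete = state.average
```

The same trick does not work for max or min under deletion: once the current maximum is removed, the old value and the delta alone do not say what the next largest value is.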
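[Figures: the graph pattern matched against the RDF graph; the matched ?highprice values collected into a bag (BAG); SELECT max(?highprice) evaluated over the bag, giving 800000.]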
18 Compute Aggregates Algorithm CAA
Algorithm CAA(I, Q)
/* Input: RDF graph I, query Q */
/* Output: table T(Q, I) */
1) GP ← BuildGP(Q); X ← aggregate variables of Q;
2) Y ← GROUP BY variables of Q;
3) S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)];
4) Return T(Q, I) ← TCompute(S, Q);
19 Aggregate View Maintenance Algorithms AMX
- AMI – Insertion
- AMD – Deletion
- AMT – Triple Modification
- AMR – Resource Modification
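The steps of CAA can be sketched as follows, assuming the graph is a list of (resource, property, value) tuples and the graph pattern is a list of triple patterns whose variables start with "?". The functions `unify`, `match_all`, and `caa` are hypothetical stand-ins for MSearchAll, VRetrieve, and TCompute, and the matcher is a plain backtracking scan rather than the authors' search.

```python
from collections import defaultdict

def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on a clash."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def match_all(patterns, triples, binding=None):
    """Yield every binding θ under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match_all(patterns[1:], triples, extended)

def caa(triples, patterns, agg_var, group_vars, aggregate):
    """Build the bag S per group, then apply the aggregate to each group."""
    groups = defaultdict(list)  # lists keep duplicates, so S stays a bag
    for theta in match_all(patterns, triples):
        key = tuple(theta[v] for v in group_vars)
        groups[key].append(theta[agg_var])
    return {key: aggregate(values) for key, values in groups.items()}

# Hypothetical graph and pattern.
graph = [
    ("a1", "paints", "p1"), ("a1", "paints", "p2"), ("a2", "paints", "p3"),
    ("p1", "price", 800000), ("p2", "price", 300000), ("p3", "price", 500000),
]
GP = [("?artist", "paints", "?artifact"), ("?artifact", "price", "?highprice")]

overall = caa(graph, GP, "?highprice", [], max)
per_artist = caa(graph, GP, "?highprice", ["?artist"], max)
```

Without GROUP BY variables every match falls into the single group keyed by the empty tuple, so the result table has one row.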
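[Figures, Update: Insertion: a "paints" edge is inserted into the RDF graph; the bag (BAG) gains the newly matched value; SELECT max(?highprice) is re-evaluated over the updated bag.]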
23 AMI for Insertion
Algorithm AMI(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */
/* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)];
   b) return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintain_I(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));
24 Algorithm MSearch(GP, Q, t, I)
/* Input: graph pattern GP, query Q, triple t, RDF graph I */
/* Output: Θ = {θ | θ is a pattern matching} */
1) Θ ← ∅;
2) for each t’ ∈ GP s.t. ∃θ’, tθ’ = t’θ’,
   a) for each θ ∈ bSearch(t, t’, GP, I),
      i. if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ};
3) return Θ;
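The idea of seeding the search at the updated triple can be sketched as follows, under the assumption that the graph is a set of tuples and variables start with "?". The names `unify`, `extend`, and `msearch` are hypothetical, and `extend` is a simple backtracking scan standing in for bSearch, not the modified breadth-first search of the talk.

```python
def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on a clash."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def extend(patterns, triples, binding):
    """Yield every completion of binding that matches all remaining patterns."""
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from extend(patterns[1:], triples, extended)

def msearch(patterns, t, triples, constraint=lambda theta: True):
    """Return only the matches in which some pattern triple is mapped onto t."""
    matches = []
    for i, pattern in enumerate(patterns):
        seed = unify(pattern, t, {})      # t' unifies with t
        if seed is None:
            continue
        rest = patterns[:i] + patterns[i + 1:]
        for theta in extend(rest, triples, seed):
            # Skip a match already found through another seed pattern.
            if constraint(theta) and theta not in matches:
                matches.append(theta)
    return matches

# Hypothetical graph: p2 already has a price but no painter yet.
GP = [("?artist", "fname", "Guy"), ("?artist", "paints", "?artifact"),
      ("?artifact", "price", "?highprice")]
graph = {("a1", "fname", "Guy"), ("a1", "paints", "p1"),
         ("p1", "price", 800000), ("p2", "price", 900000)}

t = ("a1", "paints", "p2")               # inserted triple
graph_after = graph | {t}
delta = [theta["?highprice"] for theta in msearch(GP, t, graph_after)]
old_max = 800000
new_max = max([old_max] + delta)         # max is distributive w.r.t. insertion
unrelated = msearch(GP, ("a9", "fname", "Rose"), graph_after)
```

Only matches that pass through the inserted triple are produced, so the existing view value is combined with ΔS instead of being recomputed from the whole graph.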
25 Handling GROUP BY
- By the GROUP BY clause, each tuple in ΔS affects one particular group.
- TMaintain_I maintains only the affected groups (and their corresponding auxiliary data), using the affecting tuples.
- Empty groups are deleted and new groups are inserted.
26 TMaintain_I: Handling sum, count, min, max
- No auxiliary data required
- Suppose f(x) is an aggregate function on attribute x, F the original result, and F’ the new result:
  F’ = F + Σ_{v ∈ π_x(ΔS)} v if f = sum
  F’ = F + |ΔS| if f = count
  F’ = min([F] ∪ π_x(ΔS)) if f = min
  F’ = max([F] ∪ π_x(ΔS)) if f = max
- π_x(ΔS) projects a bag of values of x from ΔS
27 TMaintain_I: Handling average
- We need the size of S as auxiliary data
- size’ = size + |ΔS|
- F’ = (F × size + Σ_{v ∈ π_x(ΔS)} v) / size’
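The insertion rules for a single group can be sketched as below, assuming ΔS has already been projected onto x and is passed as a list of values. The function name `tmaintain_insert` and its signature are hypothetical.

```python
def tmaintain_insert(f, F, delta, size=None):
    """Return (F', size') for one group after inserting the bag delta = π_x(ΔS).

    size is only consulted and updated for f == "avg"; other functions pass it through.
    """
    if not delta:
        return F, size
    if f == "sum":
        return F + sum(delta), size
    if f == "count":
        return F + len(delta), size
    if f == "min":
        return min([F] + delta), size
    if f == "max":
        return max([F] + delta), size
    if f == "avg":
        new_size = size + len(delta)
        return (F * size + sum(delta)) / new_size, new_size
    raise ValueError(f"unsupported aggregate: {f}")

new_sum, _ = tmaintain_insert("sum", 10, [1, 2])
new_count, _ = tmaintain_insert("count", 3, [7, 7])
new_max, _ = tmaintain_insert("max", 800000, [900000])
new_avg, new_size = tmaintain_insert("avg", 200, [400], size=3)
```

Only average carries auxiliary state here, which mirrors the slides: the old result alone is enough for sum, count, min, and max under insertion.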
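[Figures, Update: Deletion: a "paints" edge is deleted from the RDF graph; the bag (BAG) loses the values that no longer match; SELECT max(?highprice) is re-evaluated over the reduced bag.]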
31 AMD for Deletion
Algorithm AMD(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */
/* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)];
   b) return (T(Q, I - t), A(Q, I - t)) ← TMaintain_D(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));
32 TMaintain_D: Handling min, max
- min and max are not distributive w.r.t. deletion
- We need to store π_x(S), which projects a bag of values of x from S
- The new aggregate value F’ is obtained by:
  F’ = min(π_x(S - ΔS)) if f = min
  F’ = max(π_x(S - ΔS)) if f = max
- π_x(S) must then be updated to π_x(S) - π_x(ΔS)
33 Implementation and Experiment
- Implemented in Java
- Jena: the RDQL engine from HP
- Comparison with the relational approach (standard view maintenance algorithm on relational tables)
  - Counting Algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993
- Dataset: Chef Moz Project RDF dump
- Data stored in memory
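The deletion rule for min and max can be sketched with the auxiliary bag π_x(S) stored as a multiset. Using `collections.Counter` for the bag is an assumption made for illustration, and `tmaintain_delete` is a hypothetical name.

```python
from collections import Counter

def tmaintain_delete(f, bag, delta):
    """Return (F', updated bag) after removing the bag delta = π_x(ΔS) from π_x(S).

    bag maps each value of x to its multiplicity in S.
    """
    remaining = bag - Counter(delta)   # Counter subtraction drops exhausted values
    if not remaining:
        return None, remaining         # the group has become empty
    if f == "min":
        return min(remaining), remaining
    if f == "max":
        return max(remaining), remaining
    raise ValueError(f"unsupported aggregate: {f}")

# Hypothetical bag in which the maximum occurs twice.
aux = Counter([800000, 800000, 500000])
max_once, aux = tmaintain_delete("max", aux, [800000])
max_twice, aux = tmaintain_delete("max", aux, [800000])
max_empty, aux = tmaintain_delete("max", aux, [500000])
```

Keeping multiplicities matters: deleting one of two equal maxima must leave the view unchanged, which a plain set of values would get wrong.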
35 Other Related Work
- Volz, Oberle, Studer [DBFUSION’02] were the first to introduce a view mechanism for RDF data. Their views require that
  1. the results contain class instances (i.e., a subject or object variable), or
  2. the result itself has the pattern of an RDF statement (i.e., a triple containing subject, predicate and object).
- Magkanaraki et al. [ISWC’03] proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies, so that new resources, property values, classes and property types can be created.
- Neither of these works specifically addresses (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.
36 Summary
- Aggregate views are important for RDF applications
- RDQL Extension for Views and Aggregates
- Aggregate View Maintenance Algorithms AMX
- Localized search in RDF graphs
37 Thank you very much! Questions and Answers