/ department of mathematics and computer science, TU/e technische universiteit eindhoven
WISE 2002, December 12
RAL: an RDF Algebra
Flavius Frasincar, Geert-Jan Houben, Richard Vdovjak, Peter Barna
Contents
1. Introduction
2. RAL Goals
3. RAL Data Model
4. RAL Operators
5. Conclusion
Introduction
Metadata is machine understandable information about web resources or other things [Source: Tim Berners-Lee, “Metadata Architecture”]
RDF (Resource Description Framework) is the metadata language for the Web
RDF extends the syntactic interoperability of XML to semantic interoperability, making it the foundation for the Semantic Web
Semantic Web Architecture “Layer Cake”
[Source: Tim Berners-Lee, Director W3C, keynote speech at XML2000, “RDF and the Semantic Web” (Washington DC, 6 Dec. 2000)]
Hera
Hera research project: Web Information Systems (WIS) and web (hypermedia) generation in WIS
WIS use RDF to represent and query application data for:
–Semantic integration of data coming from heterogeneous sources
–Semantic information presentation
–Semantic querying
Huge quantities of data and metadata need to be processed in real time: optimization is crucial
Hera Methodology/Suite
RDF Representations
Primitive semantics: Subject Predicate Object
Three alternative notations:
–Triple: ( painted_by, “Rembrandt”)
–RDF/XML: Rembrandt
–Graph: painted_by Rembrandt
RDF Query Languages
Triple-based:
–Triple [successor of SiLRI] (Horn logic)
–Metalog (Datalog)
XML-based:
–RDF Query
–RQuery (XQuery)
Graph-based (but not graphical):
–RQL (OQL)
RAL Goals
Support the formal specification of RDF query languages
Provide a reference framework for comparing different RDF query languages
Consider the result construction phase
–presently neglected by RDF query languages, which focus only on extraction
Enable algebraic query optimization
RAL
RAL Data Model: specifies what information in an RDF graph is accessible to the RAL operators
–Nodes: Resources and Literals
–Edges: Properties
RAL Operators: operators working on collections of nodes from the RAL Data Model
–Extraction Operators
–Loop Operators
–Construction Operators
RAL Data Model
R is the set of resources, R = U ∪ B
U is the set of URI references, rdf:Property ∈ U
B is the set of blank nodes
L is the set of literals
U, B, L are pairwise disjoint
P is the set of properties, P ⊆ R, rdf:type ∈ P
[Figure: diagram of the sets R (with subsets U and B), L, and P, with rdf:type and rdf:Property marked]
An RDF model M is a finite set of triples (statements)
M ⊆ R × U × (R ∪ L)
The set of properties of an RDF model M
P_M = {p | (s, p, o) ∈ M ∨ (p, rdf:type, rdf:Property) ∈ M}
The RDF graph model is similar to a directed labeled graph (DLG)
–It is not a DLG since it allows for multiple edges between two nodes
–It is not a general multigraph because different edges between two nodes cannot share the same label
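The definition of P_M can be illustrated with a minimal Python sketch. It assumes the two conditions in the definition are combined disjunctively (a property is either used as a predicate or declared with rdf:type rdf:Property); the resource names r1, r2, r4 and the property created_by are hypothetical sample data, and URIs are abbreviated to plain strings.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"

# Hypothetical model: a finite set of (subject, predicate, object) triples.
model = {
    ("r1", "tname", "Chiaroscuro"),
    ("r1", "exemplified_by", "r2"),
    ("r2", "painted_by", "r4"),
    ("exemplified_by", RDF_TYPE, RDF_PROPERTY),
    ("created_by", RDF_TYPE, RDF_PROPERTY),  # declared but never used
}


def properties_of(m):
    """P_M: predicates used in m, plus resources declared as rdf:Property."""
    used = {p for (_, p, _) in m}
    declared = {s for (s, p, o) in m if p == RDF_TYPE and o == RDF_PROPERTY}
    return used | declared


p_m = properties_of(model)
```

Because the model is a set, the same (s, p, o) triple cannot occur twice, which matches the remark that two edges between the same pair of nodes cannot share a label.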
The RDF graph model corresponding to an RDF model M is defined by
G_M = (N, E, l_N, l_E), l_N : N → R ∪ L, l_E : E → P
using the following construction mechanism: for each (s, p, o) ∈ M
–add nodes n_s, n_o to N (different only if s ≠ o)
–assign l_N(n_s) = s, l_N(n_o) = o
–add e_p to E as a directed edge between n_s and n_o
–assign l_E(e_p) = p
Observations:
–l_N(·) is an injective partial function
–l_E(·) is a total function
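The construction mechanism can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' implementation: nodes and edges are represented by integer ids, every node receives a label (so blank nodes and the partiality of l_N are not modelled), and the sample triples are hypothetical.

```python
def build_graph(model):
    """Build (N, E, l_N, l_E) from a set of (s, p, o) triples."""
    node_of = {}  # label -> node id, so equal labels share one node
    l_N = {}      # node id -> label
    E = []        # edges as (edge id, source node id, target node id)
    l_E = {}      # edge id -> property label

    def node(label):
        if label not in node_of:
            node_of[label] = len(node_of)
            l_N[node_of[label]] = label
        return node_of[label]

    # sorted() only makes the assigned ids deterministic
    for i, (s, p, o) in enumerate(sorted(model)):
        E.append((i, node(s), node(o)))
        l_E[i] = p
    return set(l_N), E, l_N, l_E


# Hypothetical sample model
model = {
    ("r1", "tname", "Chiaroscuro"),
    ("r1", "exemplified_by", "r2"),
    ("r2", "painted_by", "r4"),
}
N, E, l_N, l_E = build_graph(model)
```

Here l_N is injective because a node is created only for a label not seen before, and l_E is total because every edge is labelled when it is added.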
Basic Properties
Nodes
Basic Property    Result for resources    Result for literals
id                l_N(u), u ∈ U           l_N(s), s ∈ L
type              Resource                Literal
Edges
Basic Property    Result
name              l_E(p), p ∈ P
subject           r, r ∈ R
object            o, o ∈ R ∪ L
Two non-blank nodes are equal if they have the same id
Two blank nodes are equal if they have the same properties and the corresponding property values are equal
RDF(S)-Closure
RDF Model Theory defines the RDF-closure and RDFS-closure of an RDF model M by proposing a set of rules for generating new triples
Extensional data: the triples of the original model M
Intensional data: the new triples generated by the RDF(S)-closure
RAL operators work on extensional + intensional data
Variants of the operators can be defined to neglect the intensional data (similar to the RQL strict interpretation)
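To make the extensional/intensional split concrete, the following Python sketch applies just two of the RDFS rules (transitivity of rdfs:subClassOf, and propagation of rdf:type along rdfs:subClassOf) until no new triple appears. It is a deliberately partial illustration: the full RDF(S)-closure has more rules, and the class name Creator is hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure_sketch(extensional):
    """Fixpoint of two RDFS rules only; not the complete RDF(S)-closure."""
    closed = set(extensional)
    while True:
        new = set()
        for (a, p, b) in closed:
            for (c, q, d) in closed:
                if b != c or q != SUBCLASS:
                    continue
                if p == SUBCLASS:   # a ⊑ b, b ⊑ d  =>  a ⊑ d
                    new.add((a, SUBCLASS, d))
                elif p == TYPE:     # a : b, b ⊑ d  =>  a : d
                    new.add((a, TYPE, d))
        if new <= closed:
            return closed
        closed |= new


extensional = {
    ("r4", TYPE, "Painter"),
    ("Painter", SUBCLASS, "Creator"),
    ("Creator", SUBCLASS, "rdfs:Resource"),
}
intensional = closure_sketch(extensional) - extensional
```

An operator variant that neglects intensional data would simply be evaluated against `extensional` instead of the closed set.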
RAL Operators
All operators have the following form
o[f](x_1, x_2, …, x_n : expression)
where an expression is a collection of nodes and f is a function that takes and returns collections of nodes
Extraction Operators: retrieve the needed information from an RDF graph
Loop Operators: control the repeated application of certain operators
Construction Operators: build new RDF graphs from the extracted data
4.1 Extraction Operators
Projection
π[re_name](e: expression)
computes the values of the properties whose name matches the regular expression re_name (over strings) for the nodes in the input collection e
Example
π[(P|p)aint[s]#](r4)
returns the resources painted by r4
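A minimal Python sketch of projection over a set of triples follows. It assumes collections are plain sets and that a property name must match the regular expression in full; the pattern `(P|p)aints?` is written in Python's `re` syntax and stands in for the slide's expression, and the triples are hypothetical.

```python
import re


def project(model, re_name, nodes):
    """Values of all properties of `nodes` whose name fully matches re_name."""
    pattern = re.compile(re_name)
    return {o for (s, p, o) in model if s in nodes and pattern.fullmatch(p)}


# Hypothetical sample model
model = {
    ("r4", "paints", "r2"),
    ("r4", "Paints", "r3"),
    ("r4", "name", "Rembrandt"),
    ("r1", "paints", "r5"),
}
painted_by_r4 = project(model, r"(P|p)aints?", {"r4"})
```

The result contains r2 and r3 only: the name property does not match the pattern, and r5 belongs to a node outside the input collection.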
Selection
σ[condition](e: expression)
selects the nodes of the input collection that fulfil the given condition
Example
σ[π[tname] = “Chiaroscuro”](c)
where c is the collection of input resources r1, r2, r3, and r4, returns the resources representing the painting technique with the name “Chiaroscuro”
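Selection can be sketched in Python as a filter over the input collection. The condition is passed as a function of a single node; comparing a projected collection with a value is interpreted here as membership, which is an assumption of this sketch, as are the sample triples.

```python
def values(model, name, node):
    """Values of the property `name` for one node (exact name match)."""
    return {o for (s, p, o) in model if s == node and p == name}


def select(condition, nodes):
    """Keep the nodes of the input collection that fulfil the condition."""
    return {n for n in nodes if condition(n)}


# Hypothetical sample model
model = {
    ("r1", "tname", "Chiaroscuro"),
    ("r3", "tname", "Impasto"),
    ("r2", "painted_by", "r4"),
}
c = {"r1", "r2", "r3", "r4"}
chiaroscuro = select(lambda n: "Chiaroscuro" in values(model, "tname", n), c)
```

Nodes without a tname property (r2 and r4 here) are dropped because their projection is empty.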
Cartesian Product
(x: expression) × (y: expression)
for each element in the Cartesian product of the input collections, a blank node that has all properties of both originating nodes is added to the result
Example
σ[π[rdf:type] = Technique](c) × σ[π[rdf:type] = Painter](c)
returns a collection of blank nodes, each blank node having all the properties of the corresponding pair from the Cartesian product (the new nodes have both types Technique and Painter)
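The following Python sketch shows one way to realise the Cartesian product on triples: each pair of input nodes yields a fresh blank node that copies the outgoing properties of both. The blank-node naming scheme `_:b0`, `_:b1`, … and the sample data are assumptions of the sketch.

```python
from itertools import count, product


def cartesian_product(model, xs, ys):
    """Return the new blank nodes and the triples describing them."""
    fresh = count()
    blanks = []
    triples = set()
    for x, y in product(sorted(xs), sorted(ys)):
        b = f"_:b{next(fresh)}"  # hypothetical blank-node naming
        blanks.append(b)
        # copy every property of both originating nodes onto the blank node
        triples |= {(b, p, o) for (s, p, o) in model if s in (x, y)}
    return blanks, triples


# Hypothetical sample model
model = {
    ("r1", "rdf:type", "Technique"),
    ("r1", "tname", "Chiaroscuro"),
    ("r4", "rdf:type", "Painter"),
    ("r4", "name", "Rembrandt"),
}
blanks, triples = cartesian_product(model, {"r1"}, {"r4"})
types = {o for (s, p, o) in triples if p == "rdf:type"}
```

As on the slide, the single resulting blank node carries both types, Technique and Painter.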
Join
(x: expression) ⋈[condition] (y: expression) = σ[condition](x × y)
is a Cartesian product followed by a selection
Example
(x: σ[π[rdf:type] = Technique](c)) ⋈[π[exemplified_by](x) = π[paints](y)] (y: σ[π[rdf:type] = Painter](c))
returns a collection of blank nodes, each blank node having all the properties of the corresponding pair from the Cartesian product that satisfies the given condition
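A join can be sketched in Python by filtering the pairs of the Cartesian product with a condition over (x, y) and merging the properties of each surviving pair into a blank node. Equality of two projections is read here as equality of the projected value sets, which is an assumption; the data and the `_:b` naming are hypothetical.

```python
from itertools import product


def values(model, name, node):
    """Values of the property `name` for one node."""
    return {o for (s, p, o) in model if s == node and p == name}


def join(model, xs, ys, condition):
    """Blank node -> merged (property, value) pairs, for pairs passing condition."""
    pairs = [(x, y) for x, y in product(sorted(xs), sorted(ys)) if condition(x, y)]
    return {
        f"_:b{i}": {(p, o) for (s, p, o) in model if s in (x, y)}
        for i, (x, y) in enumerate(pairs)
    }


# Hypothetical sample model
model = {
    ("r1", "rdf:type", "Technique"),
    ("r1", "exemplified_by", "r2"),
    ("r4", "rdf:type", "Painter"),
    ("r4", "paints", "r2"),
    ("r5", "rdf:type", "Painter"),
    ("r5", "paints", "r3"),
}
result = join(
    model, {"r1"}, {"r4", "r5"},
    lambda x, y: values(model, "exemplified_by", x) == values(model, "paints", y),
)
```

Only the pair (r1, r4) survives, since r5 paints a different resource than the one exemplifying r1.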
Union, Difference, Intersection
(x: expression) op (y: expression)
where op ∈ {∪, −, ∩}, defined as in set theory
Example
σ[π[rdf:type] = Technique](c) ∪ σ[π[rdf:type] = Painter](c)
returns the collection of resources obtained by combining the two collections (these two collections are obtained using two selections)
Loop Operators
Map
map[f](e: expression)
applies the function f to each element of the input collection; the function results are added to the output collection
Example
map[π[rdfs:subClassOf]](Painting, Painter)
computes the parent classes using the property rdfs:subClassOf for the collection consisting of Painting and Painter
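The Map operator can be sketched in Python as applying f to each element separately and accumulating the results. The sketch assumes set-valued collections; the superclasses Artifact and Creator are hypothetical sample data.

```python
def project(model, name, nodes):
    """Values of the property `name` for the given collection of nodes."""
    return {o for (s, p, o) in model if s in nodes and p == name}


def ral_map(f, nodes):
    """Apply f to every element on its own and collect all results."""
    out = set()
    for n in nodes:
        out |= f({n})
    return out


# Hypothetical class hierarchy
model = {
    ("Painting", "rdfs:subClassOf", "Artifact"),
    ("Painter", "rdfs:subClassOf", "Creator"),
    ("Artifact", "rdfs:subClassOf", "rdfs:Resource"),
}
parents = ral_map(
    lambda c: project(model, "rdfs:subClassOf", c), {"Painting", "Painter"}
)
```

Only the direct parents are returned; following the hierarchy further is the job of a repetition operator rather than of Map.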
Kleene Star
*[f](e: expression)
repeats the function f, possibly infinitely many times, starting with the given input collection; at each iteration the results of the function are added to the input of the next iteration
Example
*[π[rdfs:subClassOf]](Painting)
computes the transitive closure of the property rdfs:subClassOf starting from Painting, i.e. Painting and all its superclasses
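On a finite graph with set-valued collections, the Kleene Star can be computed as a fixpoint: iterate until an application of f yields no node that has not been seen already. The Python sketch below makes exactly that assumption, so it terminates even on cyclic hierarchies; the class names other than Painting are hypothetical.

```python
def kleene_star(f, nodes):
    """Input collection plus everything reachable by repeatedly applying f."""
    result = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = f(frontier) - result  # keep only nodes not seen before
        result |= frontier
    return result


# Hypothetical class hierarchy
model = {
    ("Painting", "rdfs:subClassOf", "Artifact"),
    ("Artifact", "rdfs:subClassOf", "rdfs:Resource"),
    ("Painter", "rdfs:subClassOf", "Creator"),
}


def superclasses(nodes):
    return {o for (s, p, o) in model if s in nodes and p == "rdfs:subClassOf"}


ancestors = kleene_star(superclasses, {"Painting"})
```

The result contains Painting itself together with all of its superclasses, matching the description of the transitive closure starting from Painting.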
Construction Operators
Create Node
node[type, id]()
adds a new node with the given type and id to the graph (the id is omitted for blank nodes) and returns this node; if a resource is created, an rdf:type edge is added between the resource and the node representing rdfs:Resource
The Create Node operator assigns each created node an internal identifier that is unique within the resulting RDF graph
Example
node[Resource]() and node[Literal, “Caravagio”]()
create a Resource representing a blank node and a Literal representing the string “Caravagio”
[Figure: a blank resource node with an rdf:type edge to rdfs:Resource, and a literal node “Caravagio”]
Create Edge
edge[name, subject](object: expression)
adds edges between the subject node and each of the nodes in the object collection, and returns the subject node; the label of the edges is given by name, which is the id of a property resource
The Create Node and Create Edge operators abort if the “well-formed RDF(S) graph” conditions (e.g. rdf:type cannot refer to a literal, literals cannot have properties, etc.) are not met after construction
Example
edge[name, node[Resource]()](node[Literal, “Caravagio”]())
creates an edge labeled with name between the nodes defined in the previous example
[Figure: the blank resource node with an rdf:type edge to rdfs:Resource and a name edge to the literal “Caravagio”]
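The two construction operators can be sketched as methods of a small mutable graph class in Python. The class `Graph`, its fields, and the use of integer internal identifiers are hypothetical choices for illustration; only two of the well-formedness conditions mentioned above are checked, and they are checked before any edge is added rather than after construction.

```python
from itertools import count


class Graph:
    """Minimal graph for illustrating Create Node and Create Edge."""

    def __init__(self):
        self.kind = {}       # internal id -> "Resource" or "Literal"
        self.label = {}      # internal id -> URI or literal value (none for blank nodes)
        self.edges = set()   # (subject id, property name, object id)
        self._ids = count()  # source of unique internal identifiers
        self._rdfs_resource = None

    def _new(self, kind, ident):
        n = next(self._ids)
        self.kind[n] = kind
        if ident is not None:
            self.label[n] = ident
        return n

    def node(self, kind, ident=None):
        if kind not in ("Resource", "Literal"):
            raise ValueError(f"unknown node type: {kind}")
        if kind == "Literal" and ident is None:
            raise ValueError("a literal needs a value")
        n = self._new(kind, ident)
        if kind == "Resource":
            # the rdfs:Resource node is created lazily on first use
            if self._rdfs_resource is None:
                self._rdfs_resource = self._new("Resource", "rdfs:Resource")
            self.edges.add((n, "rdf:type", self._rdfs_resource))
        return n

    def edge(self, name, subject, objects):
        objects = list(objects)
        if self.kind[subject] == "Literal":
            raise ValueError("literals cannot have properties")
        if name == "rdf:type" and any(self.kind[o] == "Literal" for o in objects):
            raise ValueError("rdf:type cannot refer to a literal")
        self.edges.update((subject, name, o) for o in objects)
        return subject


g = Graph()
painter = g.node("Resource")                # blank node: no id given
literal = g.node("Literal", "Caravagio")
subject = g.edge("name", painter, [literal])
```

Because `edge` returns the subject node, calls can be nested in the same way as in the example above, with the result of `node` passed directly as the subject.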
Conclusion
The RAL algebra is developed from a DB perspective and proposes a set of operators similar to their relational algebra counterparts:
–Extraction Operators: Projection, Selection, Cartesian Product, Join, Union, Difference, Intersection
Like existing semi-structured query languages, RAL provides powerful repetition operators:
–Loop Operators: Map, Kleene Star
Unlike present RDF query languages, RAL supports result construction:
–Construction Operators: Create Node, Create Edge
Future Work
Analyze the expressive power of RAL compared to RQL, currently a popular RDF query language (build a translation scheme from RQL to RAL)
Formally specify the semantics of other RDF query languages in terms of RAL
Compare the expressive power of different RDF query languages using RAL as the reference language
Explore equivalence rules for RAL expressions to be used in query optimization
Develop an RDF query optimization algorithm on RAL