Lu Xing CS59000GDM Sept 7th, 2018
Knowledge Graph
The Knowledge Graph is a knowledge base used by Google and its services to enhance its search engine's results with information gathered from a variety of sources.
[Figure: example knowledge-graph entity Cristiano Ronaldo, with edges Occupation → Soccer player, Current team → Juventus, Salary → 30 million EUR, Birth date → Feb 5th, 1985, Birth place → Funchal, Portugal, Height → 6’2’’]
RDF
Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model.
[Figure: an RDF triple drawn as Subject → Object, with the edge labeled Predicate]
Subject: a URI or a blank node, both of which denote resources
Predicate: a URI
Object: a URI, a blank node, or a Unicode string literal
URI (uniform resource identifier): a string of characters designed for unambiguous identification of resources and extensibility via the URI scheme.
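To make the three kinds of terms concrete, here is a minimal Python sketch that enforces the subject/predicate/object constraints above and stores part of the Ronaldo example as triples. The class names (`URI`, `BlankNode`, `Literal`, `Triple`), the `http://example.org/` namespace, and the property names are hypothetical choices for illustration; they are not part of RDF or of any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str  # a Unicode string literal


Term = Union[URI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Term
    predicate: Term
    obj: Term

    def __post_init__(self) -> None:
        # Subject: URI or blank node
        if not isinstance(self.subject, (URI, BlankNode)):
            raise TypeError("subject must be a URI or a blank node")
        # Predicate: URI only
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        # Object: URI, blank node, or literal
        if not isinstance(self.obj, (URI, BlankNode, Literal)):
            raise TypeError("object must be a URI, a blank node, or a literal")


EX = "http://example.org/"  # hypothetical namespace


def ex(name: str) -> URI:
    return URI(EX + name)


ronaldo = ex("Cristiano_Ronaldo")
triples = [
    Triple(ronaldo, ex("occupation"), ex("Soccer_player")),
    Triple(ronaldo, ex("currentTeam"), ex("Juventus")),
    Triple(ronaldo, ex("birthDate"), Literal("1985-02-05")),
    Triple(ronaldo, ex("birthPlace"), Literal("Funchal, Portugal")),
]

# A literal cannot appear in subject position:
try:
    Triple(Literal("Juventus"), ex("hasPlayer"), ronaldo)
except TypeError as err:
    print(err)  # subject must be a URI or a blank node
```

Validating in `__post_init__` means an ill-formed triple can never be constructed, so later code can rely on the three constraints holding.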
SPARQL
SPARQL Protocol and RDF Query Language
Graphical queries: a SPARQL query can be drawn as a graph pattern whose nodes are resources or variables.
How do we find a matching pattern in an RDF graph? RDF indexing!
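Without an index, the baseline is brute force. This Python sketch represents triples as plain string tuples, treats any term starting with `?` as a variable (a convention borrowed from SPARQL syntax), and matches a conjunctive pattern by backtracking. Each pattern triggers a full scan of the graph, which is the cost an RDF index is meant to avoid. The graph, the query, and all function names are made up for illustration; this is not GRIN.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def match_bgp(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Naive backtracking over all triples for every pattern."""
    results: List[Binding] = []

    def extend(i: int, binding: Binding) -> None:
        if i == len(patterns):
            results.append(binding)
            return
        for triple in graph:  # full scan: this is what an index avoids
            b = match_triple(patterns[i], triple, binding)
            if b is not None:
                extend(i + 1, b)

    extend(0, {})
    return results


graph = [
    ("Ronaldo", "currentTeam", "Juventus"),
    ("Ronaldo", "birthPlace", "Funchal"),
    ("Funchal", "locatedIn", "Portugal"),
]
query = [("?p", "birthPlace", "?city"), ("?city", "locatedIn", "Portugal")]
print(match_bgp(query, graph))  # [{'?p': 'Ronaldo', '?city': 'Funchal'}]
```

With k patterns the worst case touches on the order of |G|^k triple combinations, which is why indexing matters on large graphs.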
Idea at a high level
- Identify certain centers in the graph
- Propose a notion of radius around these center vertices
- Associate the vertices within the radius of a center with that center
- Define a tree data structure to store the regions specified by the centers
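A minimal Python sketch of these steps, under stated assumptions: the graph is undirected and connected, distance is the hop count, and the centers and radius are supplied by the caller (how GRIN actually chooses them is not covered here). As a further assumption, an internal node's radius is set just large enough to cover its children's balls via the triangle inequality. All names are hypothetical, and regions may overlap in this sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # undirected adjacency lists


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


@dataclass
class Node:
    center: str
    radius: int
    children: List["Node"] = field(default_factory=list)
    vertices: Set[str] = field(default_factory=set)  # filled for leaves only


def leaf_regions(graph: Graph, centers: List[str], radius: int) -> List[Node]:
    """One leaf per center, holding every vertex within `radius` hops of it."""
    leaves = []
    for c in centers:
        d = bfs_distances(graph, c)
        leaves.append(Node(c, radius, vertices={v for v, k in d.items() if k <= radius}))
    return leaves


def parent_of(graph: Graph, center: str, children: List[Node]) -> Node:
    """Internal node whose ball contains every child ball (triangle inequality).
    Assumes every child center is reachable from `center`."""
    d = bfs_distances(graph, center)
    r = max(d[ch.center] + ch.radius for ch in children)
    return Node(center, r, children=children)


# Path graph a-b-c-d-e
g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
leaves = leaf_regions(g, centers=["b", "d"], radius=1)
root = parent_of(g, "c", leaves)
print([sorted(leaf.vertices) for leaf in leaves])  # [['a', 'b', 'c'], ['c', 'd', 'e']]
print(root.radius)  # 2
```

Because every vertex within distance r of a child center is within d(c, c_i) + r_i of the parent center, a search can discard a whole subtree once the parent's ball is ruled out.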
GRIN index
Query Evaluation
Given a graphical query q = (N, V, E, λ_n), derive inequality constraints:
- For any path of length l on edges from E_s connecting a resource c and a variable v, we write d(c, v) ≤ l. The rule applies similarly to paths from v to c.
- For any edge (c, v, P, l) ∈ E_d, we write d(λ_n(c), λ_n(v)) ≤ l. The rule applies similarly to edges from v to c.
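The two derivation rules can be sketched in Python as follows. Assumptions: query nodes are identified with their labels λ_n, E_s edges are `(u, predicate, w)` tuples, E_d edges are `(c, v, P, l)` tuples with `P` a set of properties (unused here), and E_s edges are walked in either direction, mirroring "applies similarly to paths from v to c". Since every path of length l yields d(c, v) ≤ l, the sketch keeps the tightest such bound, the shortest path length. Only constant/variable pairs are recorded; the property names and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

EsEdge = Tuple[str, str, str]                  # (u, predicate, w)
EdEdge = Tuple[str, str, FrozenSet[str], int]  # (c, v, P, l)


def hop_distances(adj: Dict[str, Set[str]], src: str) -> Dict[str, int]:
    """Breadth-first search: shortest hop count from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, set()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def derive_constraints(constants: Set[str], variables: Set[str],
                       es: List[EsEdge], ed: List[EdEdge]) -> Dict[Tuple[str, str], int]:
    """Return {(c, v): l}, read as d(c, v) <= l."""
    adj: Dict[str, Set[str]] = {}
    for u, _pred, w in es:
        # Walk E_s edges both ways: paths c -> v and v -> c give the same bound.
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    bounds: Dict[Tuple[str, str], int] = {}
    for c in constants:
        for v, hops in hop_distances(adj, c).items():
            if v in variables:
                bounds[(c, v)] = hops  # shortest path = tightest bound
    for a, b, _props, length in ed:
        # Accept the E_d edge in either direction, (c, v) or (v, c).
        c, v = (a, b) if a in constants else (b, a)
        if c in constants and v in variables:
            bounds[(c, v)] = min(length, bounds.get((c, v), length))
    return bounds


es = [("Ronaldo", "ex:currentTeam", "?team"), ("?team", "ex:basedIn", "?city")]
ed = [("?city", "Ronaldo", frozenset({"ex:p"}), 1)]
print(derive_constraints({"Ronaldo"}, {"?team", "?city"}, es, ed))
# {('Ronaldo', '?team'): 1, ('Ronaldo', '?city'): 1}
```

Here the E_s path gives d(Ronaldo, ?city) ≤ 2, and the E_d edge tightens it to ≤ 1.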
Two rules for rejection
- Rule 1: for any constant (resource) x in q, reject (c, r) if d(c, x) > z (?)
- Rule 2: if there exists v ∈ V such that (c, r) does not definitely satisfy v, then reject (c, r)
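A hedged Python sketch of how rejection rules can prune a search over an index tree of (c, r) nodes: a rejected node's whole subtree is skipped, and the surviving leaves are returned as candidates. The slide leaves the threshold z in Rule 1 open, so the sketch takes z as a caller-supplied function of the node; the demo plugs in z = r purely for illustration, not as GRIN's definition. Likewise, "definitely satisfies" in Rule 2 is not defined here, so it is passed in as a predicate. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class IndexNode:
    center: str
    radius: int
    children: List["IndexNode"] = field(default_factory=list)


RejectRule = Callable[[IndexNode], bool]
Distance = Callable[[str, str], float]


def rule1(d: Distance, constants: Iterable[str],
          z: Callable[[IndexNode], float]) -> RejectRule:
    """Rule 1: reject (c, r) if d(c, x) > z for some constant x in the query."""
    consts = list(constants)
    return lambda node: any(d(node.center, x) > z(node) for x in consts)


def rule2(variables: Iterable[str],
          definitely_satisfies: Callable[[IndexNode, str], bool]) -> RejectRule:
    """Rule 2: reject (c, r) if some variable v is not definitely satisfied."""
    vs = list(variables)
    return lambda node: any(not definitely_satisfies(node, v) for v in vs)


def search(node: IndexNode, rules: List[RejectRule]) -> List[IndexNode]:
    """Collect leaves that no rule rejects; rejected subtrees are pruned."""
    if any(rule(node) for rule in rules):
        return []
    if not node.children:
        return [node]
    found: List[IndexNode] = []
    for child in node.children:
        found.extend(search(child, rules))
    return found


# Demo on the path graph a-b-c-d-e, where hop distance is the position gap.
pos = {v: i for i, v in enumerate("abcde")}


def dist(u: str, v: str) -> float:
    return abs(pos[u] - pos[v])


root = IndexNode("c", 2, [IndexNode("b", 1), IndexNode("d", 1)])
rules = [
    rule1(dist, constants=["a"], z=lambda node: node.radius),  # z = r: illustrative only
    rule2(["?v"], lambda node, v: True),                        # placeholder predicate
]
print([leaf.center for leaf in search(root, rules)])  # ['b']
```

Passing the rules in as functions keeps the tree walk independent of how each rule is defined, so a precise z or satisfaction test can be swapped in later.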
Compared to other work
Drawbacks? (from other groups)
- It focuses mainly on path-like queries over RDF data.
- It shows good performance on small- to medium-sized data and for hand-specified execution plans.
- It entails difficult optimization problems in identifying the most beneficial paths.
Thank you!
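Before building the index…
- M: the maximum number of RDF graph vertices per page
- C: the number of leaf nodes
- |R|: the number of resources and literals
If the |R| vertices are spread over C leaf pages of at most M vertices each, then |R| / C ≤ M ⇒ C ≥ |R| / M, so at least ⌈|R| / M⌉ leaf nodes are needed.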
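The bound is easy to compute. This small Python helper (hypothetical name) returns the smallest integer C satisfying |R| / C ≤ M.

```python
def min_leaf_nodes(num_resources: int, max_vertices_per_page: int) -> int:
    """Smallest integer C with |R| / C <= M, i.e. C = ceil(|R| / M)."""
    if num_resources < 0 or max_vertices_per_page <= 0:
        raise ValueError("need |R| >= 0 and M > 0")
    # Integer ceiling division avoids floating-point rounding for large |R|.
    return -(-num_resources // max_vertices_per_page)


print(min_leaf_nodes(1000, 64))  # 16 (1000 / 64 = 15.625)
```

Rounding up matters: 15 pages of 64 vertices hold only 960 of the 1000 vertices.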