Yuzhong Qu (瞿裕忠), Department of Computer Science and Technology

About blank nodes
Yuzhong Qu (瞿裕忠), yzqu@nju.edu.cn
Department of Computer Science and Technology

Reading material
Aidan Hogan, Marcelo Arenas, Alejandro Mallea, Axel Polleres: Everything you always wanted to know about blank nodes. J. Web Sem. 27 (2014).
Extended version of "On blank nodes" (ISWC 2011).

Introduction
Blank nodes are one of the core features of RDF, yet they are often misunderstood, misinterpreted, or ignored.
There is an inconsistency between the standard and the way blank nodes are actually used.
Are the semantics and the current definition of blank nodes appropriate for the needs of the Web community?

Running Example

Preliminaries
U, B, and L denote the sets of IRIs, blank nodes, and literals; concatenation denotes union (e.g. UB = U ∪ B).
An RDF graph G is a finite set of RDF triples (s, p, o) ∈ UB × U × UBL.
terms(G) is the set of elements of UBL occurring in G, and voc(G) = terms(G) ∩ UL.
A graph G is ground if terms(G) ∩ B = ∅.
A map is a partial function μ : UBL → UBL such that μ(u) = u for all u ∈ dom(μ) ∩ UL.
μ(G) = {(μ(s), μ(p), μ(o)) | (s, p, o) ∈ G}.
We write μ : G1 → G2 if dom(μ) = terms(G1) and μ(G1) ⊆ G2.
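The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are plain strings, a term starting with `_:` is taken to be a blank node, and the example graphs and the `:wins` predicate are hypothetical.

```python
def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def terms(graph):
    # terms(G): every term occurring in some triple of G.
    return {t for triple in graph for t in triple}

def voc(graph):
    # voc(G) = terms(G) ∩ UL: the non-blank terms of G.
    return {t for t in terms(graph) if not is_blank(t)}

def is_ground(graph):
    # G is ground if it contains no blank nodes.
    return not any(is_blank(t) for t in terms(graph))

def apply_map(mu, graph):
    # mu(G) = {(mu(s), mu(p), mu(o)) | (s, p, o) in G}.
    # For convenience, terms outside dom(mu) are left unchanged.
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}

def is_map(mu):
    # A map must be the identity on IRIs and literals.
    return all(is_blank(t) or mu[t] == t for t in mu)

def is_map_between(mu, g1, g2):
    # mu : G1 -> G2 iff dom(mu) = terms(G1) and mu(G1) ⊆ G2.
    return is_map(mu) and set(mu) == terms(g1) and apply_map(mu, g1) <= g2

# Hypothetical example graphs.
G1 = {("_:x", ":wins", ":Wimbledon")}
G2 = {(":Federer", ":wins", ":Wimbledon")}
mu = {"_:x": ":Federer", ":wins": ":wins", ":Wimbledon": ":Wimbledon"}
```

Here `is_map_between(mu, G1, G2)` holds because `mu` sends the blank node `_:x` to `:Federer` and fixes every other term.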

Preliminaries
Isomorphic: G1 and G2 are isomorphic if there is a map μ that maps blank nodes to blank nodes on a one-to-one basis, such that (s, p, o) ∈ G1 if and only if (μ(s), μ(p), μ(o)) ∈ G2.
A map μ is consistent with G if μ(G) is an RDF graph: if s is the subject of a triple in G, then μ(s) ∈ UB; if p is the predicate of a triple in G, then μ(p) ∈ U; and so on.
If μ is consistent with G, μ(G) is called an instance of G.
An instance μ(G) of G is proper if μ(G) has fewer blank nodes than G.
A merge of G1 and G2, denoted G1 + G2, is the union G′1 ∪ G′2, where G′1 and G′2 are isomorphic copies of G1 and G2, respectively, and the sets of blank nodes in G′1 and G′2 are disjoint from each other.
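A merge can be illustrated by renaming blank nodes apart before taking the union. The sketch below assumes string terms with a `_:` prefix for blank nodes; the suffix-based renaming is just one possible way to build the isomorphic copies, and the example triples are hypothetical.

```python
def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def blank_nodes(graph):
    return {t for triple in graph for t in triple if is_blank(t)}

def rename_apart(graph, suffix):
    # Build an isomorphic copy: every blank node gets the same suffix,
    # so the renaming is one-to-one and non-blank terms are untouched.
    return {
        tuple(f"{t}_{suffix}" if is_blank(t) else t for t in triple)
        for triple in graph
    }

def merge(g1, g2):
    # G1 + G2: union of copies whose blank-node sets are disjoint.
    return rename_apart(g1, "g1") | rename_apart(g2, "g2")

# Two hypothetical graphs that happen to reuse the label _:b.
G1 = {("_:b", ":wins", ":Wimbledon")}
G2 = {("_:b", ":wins", ":FrenchOpen")}
merged = merge(G1, G2)
```

A plain union `G1 | G2` would treat `_:b` as a single shared node, whereas the merge keeps the two blank nodes distinct.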

Preliminaries
G simple-entails H, denoted G ⊨ H, if every model of G is also a model of H.
The simple entailment G ⊨ H holds if and only if there is a map μ : H → G.
The intractability of deciding whether an RDF graph G simple-entails a graph H depends only on the structure of the subgraph of H induced by its blank nodes.
An RDF graph G is lean if there is no map μ such that μ(G) is a proper subgraph of G.
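The map-based characterisation suggests a direct, if naive, decision procedure: enumerate every assignment of the blank nodes of H to terms of G and test whether the image lies inside G. The sketch below does exactly that, and reuses the same enumeration for a leanness check. It is exponential in the number of blank nodes and is meant only to make the definitions concrete; the string encoding of terms and the example graphs are assumptions.

```python
from itertools import product

def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def all_terms(graph):
    return sorted({t for triple in graph for t in triple})

def apply_map(mu, graph):
    # Terms not mentioned in mu (IRIs, literals) are mapped to themselves.
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}

def bnode_maps(source, target_terms):
    # Enumerate every assignment of the blank nodes of `source`
    # to terms drawn from `target_terms`.
    bnodes = [t for t in all_terms(source) if is_blank(t)]
    for image in product(target_terms, repeat=len(bnodes)):
        yield dict(zip(bnodes, image))

def simple_entails(g, h):
    # G |= H iff some map mu satisfies mu(H) ⊆ G.
    return any(apply_map(mu, h) <= g for mu in bnode_maps(h, all_terms(g)))

def is_lean(g):
    # G is lean iff no map sends G onto a proper subgraph of itself.
    return not any(apply_map(mu, g) < g for mu in bnode_maps(g, all_terms(g)))

# Hypothetical example graphs.
G = {(":Federer", ":wins", ":Wimbledon")}
H = {("_:x", ":wins", ":Wimbledon")}
NON_LEAN = {(":Federer", ":wins", "_:a"), (":Federer", ":wins", ":Wimbledon")}
```

In this example G entails H (map `_:x` to `:Federer`) but not the other way round, and `NON_LEAN` is not lean because mapping `_:a` to `:Wimbledon` collapses it onto a proper subgraph.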

Running Example

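Existential variables in first-order logic
RDF graphs correspond to existential first-order formulas without negation and disjunction, with blank nodes playing the role of existential variables.
ρ : UBL → UVL is a one-to-one map that is the identity on UL, where V is a set of variables.
For a triple t = (s, p, o), ρ(t) is the atom triple(ρ(s), ρ(p), ρ(o)).
Skolemisation: replace existentially quantified variables by "fresh" constants (or, under a universal quantifier, by fresh function terms).
∃x ∀y R(x, y) becomes ∀y R(c, y)
∀x ∃y (P(x) → Q(y)) becomes ∀x (P(x) → Q(f(x)))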
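Applied to an RDF graph, where every blank node is an outermost existential variable, Skolemisation amounts to replacing each blank node with a fresh constant. The following is a minimal sketch under assumptions: terms are strings, blank nodes carry a `_:` prefix, and the base IRI `http://example.org/.well-known/genid/` is a placeholder that is assumed not to occur anywhere else in the data.

```python
from itertools import count

def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def skolemise(graph, base="http://example.org/.well-known/genid/"):
    # `base` is a hypothetical prefix for minting fresh constants.
    fresh = count()
    constants = {}

    def sk(term):
        if not is_blank(term):
            return term
        # The same blank node always receives the same fresh constant.
        if term not in constants:
            constants[term] = f"<{base}{next(fresh)}>"
        return constants[term]

    # Iterate in sorted order so the numbering is reproducible.
    return {tuple(sk(t) for t in triple) for triple in sorted(graph)}

# Hypothetical example graph with two blank nodes.
G = {
    (":Federer", ":wins", "_:b1"),
    ("_:b1", ":event", ":Wimbledon"),
    (":Federer", ":wins", "_:b2"),
}
skolemised = skolemise(G)
```

The result is a ground graph with the same number of triples; the price is that the fresh constants no longer behave as existentials under simple entailment.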

Simple entailment checks
The simple entailment check G ⊨ H has the upper bound O(n^2 + m·n^(2k)), where k = tw(blank(H)) + 1.
blank(H): the blank graph of H.
tw(·): the treewidth of a given graph.
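To make blank(H) concrete, the sketch below builds an undirected graph whose vertices are the blank nodes of H, with an edge between two distinct blank nodes whenever they appear together as subject and object of a triple. That reading of the blank graph, like the string encoding of terms, is an assumption of this example. Computing treewidth in general is hard, so the code only tests the easy case: if blank(H) is a forest, its treewidth is at most 1, hence k is at most 2.

```python
def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def blank_graph(graph):
    # Vertices: blank nodes in subject or object position.
    vertices = {t for (s, _, o) in graph for t in (s, o) if is_blank(t)}
    # Edges: unordered pairs of distinct blank nodes sharing a triple.
    edges = {
        frozenset((s, o))
        for (s, _, o) in graph
        if is_blank(s) and is_blank(o) and s != o
    }
    return vertices, edges

def is_forest(vertices, edges):
    # Union-find: an edge whose endpoints are already connected closes a cycle.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in edges:
        a, b = (find(v) for v in edge)
        if a == b:
            return False
        parent[a] = b
    return True

# Hypothetical graphs: a blank-node path and a blank-node triangle.
PATH = {("_:a", ":p", "_:b"), ("_:b", ":p", "_:c"), ("_:c", ":p", ":x")}
TRIANGLE = {("_:a", ":p", "_:b"), ("_:b", ":p", "_:c"), ("_:c", ":p", "_:a")}
```

For `PATH` the blank graph is acyclic, so the bound above specialises to a low-degree polynomial; for `TRIANGLE` the check fails and the treewidth has to be determined by other means.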

Blank nodes in the standards
Turtle can avoid the auxiliary blank nodes in some cases.
The JSON-LD specification permits the use of blank nodes in the predicate position. To map such data to RDF, IRIs must first be minted for the predicate terms.
Jena offers sound and complete methods for checking the isomorphism of two RDF graphs.

Blank nodes in the standards
The incompleteness of the RDFS entailment rules: we cannot infer the triple :Federer rdf:type :Competitor.

SPARQL support for blank nodes
When querying over blank nodes in the dataset, SPARQL treats them as constants that are local to the scoping graph they appear in.
Example solutions: {{(?X, _:b1)}, {(?X, _:b3)}}

Running Example

SPARQL support for blank nodes
SPARQL uses blank nodes in the WHERE clause of a query to represent non-distinguished variables.
A second use of blank nodes is within CONSTRUCT templates, which generate RDF data from solution mappings.
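The two roles of blank nodes in querying can be illustrated with a toy basic-graph-pattern matcher. This is not SPARQL and uses no SPARQL library: it is a small sketch in which data blank nodes are matched like any other constant, while blank nodes in the pattern are bound like variables but cannot be projected. The data, the predicate names, and the use of a set for results (which corresponds to DISTINCT) are assumptions of the example.

```python
def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def is_var(term):
    return term.startswith("?")

def match_bgp(patterns, data):
    # Extend partial solutions one triple pattern at a time.
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in data:
                candidate = dict(solution)
                ok = True
                for p_term, d_term in zip(pattern, triple):
                    if is_var(p_term) or is_blank(p_term):
                        # Pattern blank nodes bind like variables.
                        if candidate.setdefault(p_term, d_term) != d_term:
                            ok = False
                            break
                    elif p_term != d_term:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions

def select(variables, patterns, data):
    # Project onto named variables only; pattern blank nodes stay hidden.
    return {
        tuple(solution[v] for v in variables)
        for solution in match_bgp(patterns, data)
    }

# Hypothetical data containing one blank node.
DATA = {
    (":Federer", ":wins", "_:b1"),
    ("_:b1", ":event", ":Wimbledon"),
}
as_constant = select(["?X"], [(":Federer", ":wins", "?X")], DATA)
as_variable = select(
    ["?p"],
    [("?p", ":wins", "_:any"), ("_:any", ":event", ":Wimbledon")],
    DATA,
)
```

The first query returns the data blank node `_:b1` as a value for `?X`; the second uses `_:any` in the pattern purely as a join variable and returns only `:Federer`.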

Blank nodes in publishing
BTC-2012 corpus:
1.230 billion unique quadruples
8.373 million RDF documents
829 different pay-level domains (PLDs)
Prevalence of blank nodes in Web data:
274 M (22.3%) triples had a blank node as subject, and 94 M (7.7%) triples had a blank node in the object position.
88 M (25.9%) of all RDF terms were blank nodes.
3.758 M (44.9%) documents featured at least one blank node.
549 (66.2%) PLDs featured at least one blank node.

Structure of blank nodes in Web data
1.477 M (39.3%) documents contained connected blank nodes.
3.334 M non-singleton components contained M blank nodes; each component contained on average 18.8 blank nodes.
71.0% of blank nodes were connected.
37.7% of blank-node components contained cycles.
17 domains published blank-node components with cycles.
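Statistics of this kind rest on the connected components of the blank graph of each document. The sketch below shows how such components could be computed with a breadth-first search; it is not the procedure used for the corpus analysis, and the string encoding of terms and the sample graph are assumptions.

```python
from collections import deque

def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")

def blank_components(graph):
    # Adjacency between distinct blank nodes that share a triple.
    adjacency = {}
    for s, _, o in graph:
        for t in (s, o):
            if is_blank(t):
                adjacency.setdefault(t, set())
        if is_blank(s) and is_blank(o) and s != o:
            adjacency[s].add(o)
            adjacency[o].add(s)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

def non_singleton_stats(graph):
    # Number of components with more than one blank node, and their mean size.
    sizes = [len(c) for c in blank_components(graph) if len(c) > 1]
    average = sum(sizes) / len(sizes) if sizes else 0.0
    return len(sizes), average

# Hypothetical document: one component of three blank nodes, one singleton.
DOC = {
    ("_:a", ":p", "_:b"),
    ("_:b", ":p", "_:c"),
    ("_:d", ":p", ":x"),
}
```

For `DOC`, `non_singleton_stats` reports one non-singleton component of size 3, while `_:d` forms a singleton that the connected-node counts would exclude.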

Of the 1,258,774 components with a treewidth of 2, 1,257,229 (99.9%) originated from "data.gov.uk".
Only 19 components have a treewidth of three or more.

Distribution of degree of connected BNodes in directed blank graphs (log/log)

(Non-) Lean blank nodes in web data
5.378 M (6.07%) BNode are non-lean. The vast majority are isomorphic cases, main reasons: documents copies blank nodes are left ‘‘underspecified’’ and thus referentially ambiguous, where we would conjecture that the intent is often to refer to different real world things with each blank node.

Alternatives for blank nodes
Deprecate/disallow blank nodes
Discourage the "unnecessary" use of blank nodes
Ground semantics
Well-behaved RDF
Acyclic blank nodes
No change

Summary
Semantics of blank nodes
Simple entailment, leanness checking
Tree → Treewidth
Experimental analysis

Research Issue
Sentence pattern: subtree mining
Subtree isomorphism problem
Inexact/approximate graph matching

Related readings
J.-F. Baget: RDF entailment as a graph homomorphism. In: International Semantic Web Conference, 2005, pp. 82–96.
J.J. Carroll: Signing RDF graphs. In: ISWC, 2003, pp. 369–384.
Y. Tzitzikas, C. Lantzaki, D. Zeginis: Blank node matching and RDF/S comparison functions. In: International Semantic Web Conference, 2012, pp. 591–607.
J. de Bruijn, S. Heymans: Logical foundations of (e)RDF(S): complexity and reasoning. In: ISWC/ASWC, 2007, pp. 86–99.
H.J. ter Horst: Completeness, Decidability and Complexity of Entailment for RDF Schema and a Semantic Extension Involving the OWL Vocabulary. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3).
P. Hayes: RDF Semantics. W3C Recommendation, February 2004.

Related readings: Mining Subtree
Yun Chi, Richard R. Muntz, Siegfried Nijssen, Joost N. Kok: Frequent Subtree Mining - An Overview. Fundam. Inform. 66(1-2) (2005).
Yun Chi, Yi Xia, Yirong Yang, Richard R. Muntz: Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees. IEEE Trans. Knowl. Data Eng. 17(2) (2005).
Yun Chi, Yirong Yang, Richard R. Muntz: Canonical forms for labelled trees and their applications in frequent subtree mining. Knowl. Inf. Syst. 8(2) (2005).

Related readings: Inexact Graph Matching
Jason Tsong-Li Wang, Kaizhong Zhang, Gung-Wei Chirn: Algorithms for Approximate Graph Matching. Inf. Sci. 82(1-2) (1995).
Kaspar Riesen, Xiaoyi Jiang, Horst Bunke: Exact and Inexact Graph Matching: Methodology and Applications. Advances in Database Systems, Volume 40, 2010.
Yuanyuan Tian, Jignesh M. Patel: TALE: A Tool for Approximate Large Graph Matching. ICDE 2008.

Related readings: Inexact Graph Matching
Endika Bengoetxea: Inexact Graph Matching Using Estimation of Distribution Algorithms. PhD Thesis, 2002.
Mongiovì M, Di Natale R, Giugno R, Pulvirenti A, Ferro A, Sharan R: SIGMA: a set-cover-based inexact graph matching algorithm. J Bioinform Comput Biol, Apr; 8(2).
Laura A. Zager, George C. Verghese: Graph similarity scoring and matching. Applied Mathematics Letters, Volume 21, Issue 1, January 2008.

Q&A
Discussion welcome
