Claudio Gutierrez, Carlos Hurtado, Alberto O. Mendelzon



• The Web is a huge collection of varied, interconnected data that lacks semantics and is therefore understandable only by humans.
• Goal: to allow anyone to say anything about anything.
• The Semantic Web is based on the idea of adding machine-understandable semantics to web information via annotations, so that machines can perform more of the tedious work involved in finding, sharing and combining information on the web.

• The rows represent the things you are storing information about.
• The columns represent the properties of those things.
• The intersection gives the value of that property for that thing.

[Figure: the statement (book, title, JavaScript), with its parts labeled subject, property and value.]

• Resource Description Framework (RDF).
• The RDF model was designed with the following goals: a simple data model, formal semantics and provable inference, an extensible URI-based vocabulary, and allowing anyone to make statements about any resource.
• An RDF statement describes a resource (anything that can have a URI) through its properties, using a binary predicate that relates it to another resource.

• RDF statement: (Subject, Predicate, Object), e.g. (…, …, "Wikipedia").
• Or in XML format: <rdf:RDF xmlns:rdf="…" xmlns:dc="…"> … Wikipedia …

• RDF by itself cannot express relations between kinds of objects (e.g. Cat is an Animal, Book has an Author).
• RDF Schema (also called the RDFS vocabulary) adds information about the classes and properties of resources and the relations between them.

• RDFS main constructs: Class, subClassOf, Property, subPropertyOf, Object, Predicate, Subject, Range, Domain, Type, etc.
  A: (John, Type, Man)
  B: (Man, subClassOf, Person)
  C: (A, Subject, John)
• Enables "duck typing".
• Reification: statement C is a statement about statement A.

• Given data represented in RDF format, a query language (e.g. SPARQL) makes it possible to retrieve and manipulate the data.
• As in other query languages, we would like to filter and reorganize the data. Although the data may be spread over different databases and represented in different formats, its semantics is represented with RDFS and ontologies that are common to all of the data.

[Figures: three diagrams labeled "RDF DB".]

• Different representations of the same data (no normal form) and redundancy elimination.
• Equivalence (of DBs, queries and answers).
• Entailment and containment of queries.
• The impact of predefined semantics (RDFS vocabulary), blank nodes, reification and premises on queries.
• Complexity issues.

• A blank node is a resource in an RDF database (or graph) that is not identified by a URI (Uniform Resource Identifier).
  (John, knows, _:p1)
  (_:p1, birthDate, 04-21)
  "There exists some _:p1 who is known by John and whose date of birth is the 21st of April."
• Enables partial understanding when information is missing.
• We will use the letters N, X, Y, … to denote blank nodes.

• UBL (the resources) is the union of three sets: U (URIs), B (blank nodes) and L (literals).
• For a given triple (Subj, Pred, Obj): Subj ∈ U ∪ B, Pred ∈ U, Obj ∈ U ∪ B ∪ L.
• An RDF graph G is a set of triples.

• The universe of a graph G, universe(G), is the set of elements of UBL that occur in the triples of G.
• The vocabulary of a graph G is the set of elements of UL that occur in the triples of G.
• A graph is ground if it has no blank nodes.
• The union of G1 and G2, denoted G1 ∪ G2, is the union of their sets of triples.
• The merge of G1 and G2, denoted G1 + G2, is the union of their sets of triples after blank nodes have been renamed so that the two sets of blank nodes are disjoint (the merge is safe).
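The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are tuples of strings, blank nodes are the strings starting with `_:`, and the function names and the fresh-name scheme `_:m0, _:m1, …` are hypothetical choices, not part of the presentation.

```python
from itertools import count

def is_blank(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def universe(graph):
    """All elements of UBL that occur in the triples of the graph."""
    return {term for triple in graph for term in triple}

def vocabulary(graph):
    """Elements of UL (URIs and literals) that occur in the graph."""
    return {term for term in universe(graph) if not is_blank(term)}

def is_ground(graph):
    return not any(is_blank(term) for term in universe(graph))

def union(g1, g2):
    """Plain set union: blank nodes with the same name are identified."""
    return g1 | g2

def merge(g1, g2):
    """Union after renaming the blank nodes of g2 that also occur in g1."""
    used = {t for t in universe(g1) | universe(g2) if is_blank(t)}
    clashes = {t for t in universe(g1) & universe(g2) if is_blank(t)}
    fresh = (f"_:m{i}" for i in count())
    renaming = {}
    for blank in sorted(clashes):
        name = next(n for n in fresh if n not in used)
        used.add(name)
        renaming[blank] = name
    renamed = {tuple(renaming.get(t, t) for t in triple) for triple in g2}
    return g1 | renamed

g1 = {("_:X", "sc", "a")}
g2 = {("_:X", "sc", "c")}
print(union(g1, g2))  # one shared blank node _:X
print(merge(g1, g2))  # two distinct blank nodes
```

In the union the blank node `_:X` links the two triples; in the merge it does not, which is why the merge is the safe choice when the graphs come from unrelated sources.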

[Figure: example graphs G1 and G2 with blank nodes X and Y, their union G1 ∪ G2 and their merge G1 + G2.]

• The RDFS vocabulary describes properties, such as attributes of resources, and relationships between them.
• It also makes it possible to state things about statements (reification).
[Figure: reification of a triple (a, b, c) by a node N, with edges (N, type, stat), (N, subj, a), (N, pred, b) and (N, obj, c).]
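As a small illustration of reification, the following sketch builds the four triples that describe a statement through a node N. The function name `reify` is hypothetical; the vocabulary names `type`, `stat`, `subj`, `pred` and `obj` follow the labels used in the slides.

```python
def reify(triple, node):
    """Return the triples that describe `triple` through the node `node`."""
    subject, predicate, obj = triple
    return {
        (node, "type", "stat"),
        (node, "subj", subject),
        (node, "pred", predicate),
        (node, "obj", obj),
    }

reified = reify(("a", "b", "c"), "_:N")
print(sorted(reified))
```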

• A map is a function μ: UBL → UBL that preserves URIs and literals, i.e. μ(u) = u for every u ∈ UL.
• μ is consistent with a graph G if μ(G) is an RDF graph; μ(G) is then called an instance of G.
• An instance is proper if it has fewer blank nodes than G.
• Overloading the notation, we write μ: G1 → G2 if μ is a map such that μ(G1) is a subgraph of G2.
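A minimal sketch of maps and instances, assuming triples are tuples of strings and blank nodes start with `_:`. A map is represented as a dictionary on blank nodes only, so URIs and literals stay fixed by construction. The helper names and the value `Peter` are hypothetical.

```python
def is_blank(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def blank_nodes(graph):
    return {t for triple in graph for t in triple if is_blank(t)}

def apply_map(mu, graph):
    """Apply mu to every blank node; URIs and literals are left untouched."""
    return {tuple(mu.get(t, t) if is_blank(t) else t for t in triple)
            for triple in graph}

def is_proper_instance(mu, graph):
    """The instance mu(graph) is proper if it has fewer blank nodes."""
    return len(blank_nodes(apply_map(mu, graph))) < len(blank_nodes(graph))

def is_map_between(mu, g1, g2):
    """mu: g1 -> g2 holds if mu(g1) is a subgraph of g2."""
    return apply_map(mu, g1) <= g2

g = {("John", "knows", "_:p1"), ("_:p1", "birthDate", "04-21")}
mu = {"_:p1": "Peter"}
instance = apply_map(mu, g)
print(sorted(instance))
```

Here the instance replaces the unknown person `_:p1` with a concrete resource, so it is a proper instance of `g`.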

• Two graphs G1 and G2 are isomorphic, denoted G1 ≃ G2, if there are maps μ1 and μ2 such that μ1(G1) = G2 and μ2(G2) = G1.

[Figure: example of a graph isomorphism ƒ with ƒ(a) = 1, ƒ(b) = 6, ƒ(c) = 8, ƒ(d) = 3, ƒ(g) = 5, ƒ(h) = 2, ƒ(i) = 4, ƒ(j) = 7.]

• A graph G is lean if there is no map μ such that μ(G) is a proper subgraph of G.
[Figure: example graphs G1 and G2.]
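The definition of leanness can be checked directly by brute force: try every assignment of the blank nodes to terms of the graph and see whether the image is a proper subgraph. The sketch below assumes triples are tuples of strings and blank nodes start with `_:`; it is exponential in the number of blank nodes and is meant only to make the definition concrete. The example graphs are hypothetical.

```python
from itertools import product

def is_blank(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def apply_map(mu, graph):
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}

def is_lean(graph):
    """True if no map sends the graph onto a proper subgraph of itself."""
    terms = sorted({t for triple in graph for t in triple})
    blanks = [t for t in terms if is_blank(t)]
    for images in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, images))
        if apply_map(mu, graph) < graph:  # proper subset
            return False
    return True

redundant = {("a", "p", "_:X"), ("a", "p", "_:Y")}
lean = {("a", "p", "_:X"), ("_:X", "q", "b"),
        ("a", "p", "_:Y"), ("_:Y", "r", "b")}
print(is_lean(redundant), is_lean(lean))  # False True
```

In `redundant`, mapping `_:Y` to `_:X` collapses the graph to a single triple, so it is not lean. In `lean`, the extra `q` and `r` edges force every map into the graph to be the identity.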

• Theorem: Each RDF graph G contains a unique (up to isomorphism) lean subgraph that is an instance of G. We denote this subgraph by core(G).
• Theorem:
  - Deciding whether G is lean is coNP-complete (reduction to tautology).
  - Deciding whether G' ≃ core(G) is DP-complete.

• An interpretation I of an RDF graph G consists of:
  1. A non-empty set of resources Res.
  2. The literals, a subset Lit ⊆ Res.
  3. A set of binary properties Prop ⊆ Res × Res.
  4. A mapping from the vocabulary of G: U → Res ∪ Prop and L → Lit.

• An RDF graph G1 entails G2, denoted G1 |= G2, iff every interpretation over the vocabulary of G1 ∪ G2 that satisfies G1 also satisfies G2.
• Two graphs are equivalent, denoted G1 ≡ G2, if G1 |= G2 and G2 |= G1.

• A simple RDF graph is a graph that does not use vocabulary with a predefined semantics.
• Theorem: A simple RDF graph G1 entails G2, denoted G1 |= G2, if and only if there is a map G2 → G1.
• A graph entails any of its subgraphs.
[Figure: a graph over the nodes a, b, c with predicates p and q entails the same graph with a replaced by the blank node X.]
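The map characterization gives a direct, if exponential, test for simple entailment: search for an assignment of the blank nodes of G2 to terms of G1 that sends G2 into G1. The sketch below assumes triples are tuples of strings and blank nodes start with `_:`; the function name and the example graphs are hypothetical.

```python
from itertools import product

def is_blank(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def apply_map(mu, graph):
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}

def entails_simple(g1, g2):
    """G1 |= G2 for simple graphs: look for a map G2 -> G1."""
    blanks = sorted({t for triple in g2 for t in triple if is_blank(t)})
    targets = sorted({t for triple in g1 for t in triple})
    for images in product(targets, repeat=len(blanks)):
        mu = dict(zip(blanks, images))
        if apply_map(mu, g2) <= g1:
            return True
    return False

ground = {("a", "p", "b"), ("a", "q", "c")}
with_blank = {("_:X", "p", "b"), ("_:X", "q", "c")}
print(entails_simple(ground, with_blank))  # True
print(entails_simple(with_blank, ground))  # False
```

The search tries at most v^n assignments for v terms in G1 and n blank nodes in G2, which makes the dependence on the number of blank nodes visible.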

• Theorem:
  1. Deciding entailment of simple RDF graphs is NP-complete.
  2. Deciding equivalence of simple RDF graphs is isomorphism-complete.
• Both depend heavily on the number of blank nodes: they can be decided in O(v^n), where v is the number of nodes and n the number of blank nodes.
• Theorem: If G is simple, then core(G) is the unique minimal graph equivalent to G.

The following deductive system is sound and complete (each rule is written premises / conclusion):

Group A (simple graphs)
  1) G / G', for a map μ: G' → G

Group B (sp)
  2) (a, type, prop) / (a, sp, a)
  3) (a, sp, b) (b, sp, c) / (a, sp, c)
  4) (a, sp, b) (x, a, y) / (x, b, y)

Group C (sc)
  5) (a, type, class) / (a, sc, a)
  6) (a, sc, b) (b, sc, c) / (a, sc, c)
  7) (a, sc, b) (x, type, a) / (x, type, b)

Group D (typing)
  8) (a, dom, c) (x, a, y) / (x, type, c)
  9) (a, range, d) (x, a, y) / (y, type, d)
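Rules 2) to 9) can be applied forward until nothing new is derived. The sketch below does exactly that and nothing more: it does not implement rule 1), it is not the presenters' algorithm, and it skips the checks that would keep literals out of subject position. Triples are assumed to be tuples of strings, blank nodes start with `_:`, and the function name `rdfs_saturate` is hypothetical.

```python
def is_blank(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def rdfs_saturate(graph):
    """Apply rules 2) to 9) repeatedly until a fixpoint is reached."""
    g = set(graph)
    while True:
        new = set()
        for (a, p, b) in g:
            if (p, b) == ("type", "prop"):
                new.add((a, "sp", a))            # rule 2
            if (p, b) == ("type", "class"):
                new.add((a, "sc", a))            # rule 5
            for (x, q, y) in g:
                if p == "sp" and q == "sp" and x == b:
                    new.add((a, "sp", y))        # rule 3
                if p == "sp" and q == a and not is_blank(b):
                    new.add((x, b, y))           # rule 4 (predicate must not be blank)
                if p == "sc" and q == "sc" and x == b:
                    new.add((a, "sc", y))        # rule 6
                if p == "sc" and q == "type" and y == a:
                    new.add((x, "type", b))      # rule 7
                if p == "dom" and q == a:
                    new.add((x, "type", b))      # rule 8
                if p == "range" and q == a:
                    new.add((y, "type", b))      # rule 9
        if new <= g:
            return g
        g |= new

data = {("Man", "sc", "Person"), ("John", "type", "Man"),
        ("son", "sp", "relative"), ("Tom", "son", "John")}
saturated = rdfs_saturate(data)
print(sorted(saturated - data))
```

The loop terminates because every derived triple is built from terms that already occur in the graph, so only finitely many triples can ever be added.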

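• Theorem: G1 |= G2 if and only if there is a sequence of rule applications that starts from G1 and ends with G2. Deciding this is NP-complete.
• With RDFS vocabulary there may be no map G2 → G1 even though G1 |= G2, as in the example in the figure.
• The idea is to "close" the graph with all possible triples.
[Figure: example graphs G1 and G2 over the nodes a, b, c, d and the blank node X, with sc edges.]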

• A closure of a graph G is a maximal set of triples G' over universe(G) plus the RDFS vocabulary such that G' contains G and is equivalent to it.
• There can be more than one closure for a graph.
• A closure may contain redundancies.
• The problem of deciding whether G' is a closure of G is DP-complete.
[Figure: example graph over the nodes a, b, c, d and the blank node X, with predicates p, q and r.]

• A normal form of a graph G, denoted nf(G), is core(G') for a closure G' of G.
• Theorem: Let G, G1 and G2 be RDF graphs:
  1. The normal form nf(G) is unique.
  2. G1 |= G2 if and only if there is a map nf(G2) → nf(G1).
  3. G1 ≡ G2 if and only if nf(G1) ≃ nf(G2).
• The problem of deciding whether G' is the normal form of G is DP-complete.

[Figure: the example graphs G1 and G2 and their normal form nf(G_i).]
• The normal form is not the most compact representation.

• The RDF database is an RDF graph.
• Let V be a set of variables, denoted ?X, ?Y, ….
• Queries have a Datalog-like form H ← B, where the head H and the body B may contain variables:
  (?X, ancestor, ?Y) ← (?X, ancestor, ?Z), (?Z, ancestor, ?Y)
• The condition var(H) ⊆ var(B) avoids free variables in the head of the query.
• Blank nodes in the body would play the same role as variables, and are therefore unnecessary.

• A query can also have a set of premises P and a set of constraints C, so a query is a tuple (H, B, P, C).
• The constraints C let the user discriminate between blank and ground nodes in the answer.
• The premise P represents information the user supplies to the database in order to answer the query, e.g. to query incomplete information by supplying facts that are not in the DB, or to add semantic information such as (son, sp, relative).

• Let q = (H, B, P, C) be a query, D a database and V the set of variables.
• A valuation is a function v: V → UBL. It satisfies the constraints, written v |= C, if v(x) is not a blank node for every variable x in C.
• The pre-answer to q over D is the set of single answers v(H):
  preans(q, D) = {v(H) : v(B) ⊆ nf(D + P) and v |= C}
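The pre-answer can be sketched as a nested-loop matcher. To stay short, the sketch assumes that the argument `data` already stands for nf(D + P), that variables start with `?`, that blank nodes start with `_:`, and that the constraints are given as the set of variables that must not be bound to blank nodes. Blank nodes in the head are left as they are. All function names are hypothetical. The code uses the `:=` operator and therefore needs Python 3.8 or later.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended

def valuations(body, data):
    """All valuations v of the body variables with v(body) a subset of data."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for triple in data
                    if (b2 := match(pattern, triple, b)) is not None]
    return bindings

def preans(head, body, data, constraints=()):
    """Set of single answers v(head); each single answer is a frozenset of triples."""
    answers = set()
    for v in valuations(body, data):
        if any(v.get(x, "").startswith("_:") for x in constraints):
            continue  # v does not satisfy the constraints
        answers.add(frozenset(tuple(v.get(t, t) for t in triple)
                              for triple in head))
    return answers

data = {("a", "ancestor", "b"), ("b", "ancestor", "c")}
head = [("?X", "ancestor", "?Y")]
body = [("?X", "ancestor", "?Z"), ("?Z", "ancestor", "?Y")]
print(preans(head, body, data))
```

With the ancestor query from the earlier slide, the only valuation is ?X = a, ?Z = b, ?Y = c, so the pre-answer contains the single answer {(a, ancestor, c)}.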

• Composing a complex query from simpler ones: the single answers can be combined in two ways.
  1. ans_∪(q, D) is the union of all single answers (blank nodes play the role of bridges between two single answers).
  2. ans_+(q, D) is the merge of all single answers (blank nodes are renamed to avoid name clashes). Useful when querying several sources.
• Let q be a query:
  1. If D' |= D then ans(q, D') |= ans(q, D).
  2. For all D, ans_∪(q, D) |= ans_+(q, D) (the converse is not true).
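The difference between the two ways of combining single answers can be sketched as follows. Single answers are assumed to be sets of triples (tuples of strings) given in a list, blank nodes start with `_:`, and the renaming scheme (appending the index of the single answer) is a hypothetical choice made for this illustration.

```python
def ans_union(single_answers):
    """Union of all single answers; equal blank node names are identified."""
    return set().union(*single_answers)

def ans_merge(single_answers):
    """Merge of all single answers; blank nodes are renamed per answer."""
    merged = set()
    for i, answer in enumerate(single_answers):
        for triple in answer:
            merged.add(tuple(f"{t}_{i}" if t.startswith("_:") else t
                             for t in triple))
    return merged

single_answers = [{("a", "p", "_:X")}, {("_:X", "q", "b")}]
print(ans_union(single_answers))
print(ans_merge(single_answers))
```

In the union, `_:X` bridges the two single answers and connects a to b. In the merge, the two occurrences become unrelated blank nodes, so the merge says less than the union.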

• The ability to identify RDF statements.
• With a blank node in the head of the query, one can identify a statement:
  (N, value, true), (N, type, stat), (N, subj, ?X), (N, pred, ?Y), (N, obj, ?Z) ← (?X, ?Y, ?Z)
• This can lead to an infinite DB: if i1: (a, b, c) is a valid statement, then so is i2: (i1, subj, a), then (i2, subj, i1), and so on.

• Exploring different notions of query containment.
• In relational databases, set-theoretical inclusion of tuples captures this requirement.
• Let q and q' be queries:
  1. q ⊆_p q' iff, for all databases D, preans(q, D) ⊆ preans(q', D) up to isomorphism.
  2. q ⊆_m q' iff, for all databases D, ans(q', D) |= ans(q, D).
• q ⊆_p q' implies q ⊆_m q'. The converse is not true.
• Theorem: Deciding either kind of containment is NP-complete.

For example, let H = B = (X, sc, Y), (Y, sc, Z) and H' = B' = (X, sc, Y), (Y, sc, Z), (X, sc, Z). Then both q' ⊆_m q and q ⊆_m q' hold, but neither q' ⊆_p q nor q ⊆_p q' holds.

• Consider the queries q = (H, B, P, C) and q' = (H', B', P', C'), and assume H, H', B, B', P, P' are simple graphs.
• Theorem: q ⊆_p q' if and only if for each map μ on the variables of B there is a substitution Θ_μ (of variables and blank nodes) such that:
  1. Θ_μ(B') ⊆ P' + (B − μ(B, P)), where μ(B, P) is the set of triples t of B such that μ(t) ∈ P.
  2. Θ_μ(H') = H.
  3. Θ_μ(C') ⊆ C.

• Consider the queries q = (H, B, P, C) and q' = (H', B', P', C'), and assume H, H', B, B', P, P' are simple graphs.
• Theorem: q ⊆_m q' if and only if there are substitutions (of variables) Θ_1, …, Θ_n such that:
  1. Θ_j(B') ⊆ nf(B).
  2. ∪_j Θ_j(H') |= H.
  3. Θ_j(C') ⊆ C.

• The complexity of the evaluation problem, i.e. testing emptiness of the query answer set, in two versions:
  1. Query complexity: for a fixed database D, given a query q, is q(D) non-empty? NP-complete.
  2. Data complexity: for a fixed query q, given a database D, is q(D) non-empty? Polynomial.
• The size of the answer set is bounded by |D|^|q|.

• A reduction of a graph G is a minimal graph G_r that is equivalent to G and contained in G.
• Algorithm computing the reduction of a graph G:
  1. G ← nf(G).
  2. Apply reverse rules 7), 8), 9), 4), 3) and 6), in this order, until no longer applicable.
  3. Apply any reverse rule in any order until no longer applicable.
• Theorem: The problem of deciding whether G' is the reduction of G is DP-complete.

• Redundancy in query answers can be reduced by using lean query heads.
• A lean query body is not always possible, and may cause an answer to be missed.
• Even lean databases and queries with lean heads and bodies do not avoid redundancy. For example, G1 is the answer to the query (?Z, p, ?U) ← (?Z, p, ?U) on G2.
[Figure: the graphs G1 and G2.]

• The naive approach to eliminating redundancy in answers is to compute (1) ans(q, D) and (2) a lean equivalent of ans(q, D).
• Theorem: Given a lean database D and a query q, deciding whether ans_∪(q, D) is lean is coNP-complete (in the size of D).
• Theorem: Given a lean database D and a query q, deciding whether ans_+(q, D) is lean can be done in polynomial time in the size of D.

• Normal form.
• A formal definition of a query language for RDF and its main features.
• Query containment and processing.
• Redundancy elimination.
• From entailment to mapping between graphs.
• Complexity issues.

• Foundations of Semantic Web Databases, Claudio Gutierrez, Carlos Hurtado, Alberto O. Mendelzon (2004).
• RDF Semantics, W3C Working Draft (2003).
• Composing Web Services on the Semantic Web, Vadim Eisenberg.
• Special thanks to Google and Wikipedia.
