Foundations of Semantic Web Databases Gutierrez, Hurtado and Mendelzon Presented by: Nir Zepkowitz.

Background The web is a huge collection of interconnected data. The web lacks semantic information, so managing and processing the data is hard. Semantic web: a proposal to build an infrastructure of machine-readable semantics for the data on the web.

Semantic web "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila. "If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database." -- Tim Berners-Lee, Weaving the Web, 1999

Background (cont) In 1998 the W3C proposed the language that serves as the basis for that infrastructure: the Resource Description Framework (RDF). RDF is being implemented in many worldwide initiatives and is getting a lot of attention.

Why are we here? Query languages for RDF were developed side by side with RDF. There has been little research on the foundations of RDF and its query languages. This research is necessary because of the new features that arise in querying RDF graphs (as opposed to standard DBs).

Some problems The RDF data model allows several representations for the same information. Is there a normal form? Is there a way to check equivalence?

What are we going to do? Study formal aspects of querying DBs containing RDF data. Introduce a new notion of normal form for RDF graphs. Give a formal definition of a query language for RDF. Investigate theoretical and complexity aspects related to query processing and redundancy.

The RDF model U: RDF URI references. B: blank nodes. L: RDF literals. UBL denotes the union U ∪ B ∪ L. RDF triple: (v1, v2, v3) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where v1 is the subject, v2 the predicate and v3 the object.

Definitions A graph is a set of triples. Universe(G): the set of UBL elements that appear in a triple of G. Vocabulary of G: the elements of universe(G) that are URIs or literals, i.e. universe(G) ∩ (U ∪ L). A graph is ground if it has no blank nodes.

Definitions Map: a function μ: UBL -> UBL preserving URIs and literals (μ(u) = u for every u in U ∪ L). μ(G): the set of triples (μ(s), μ(p), μ(o)) s.t. (s,p,o) is in G. μ is consistent with G if μ(G) is an RDF graph. In this case we call μ(G) an instance of G. An instance is proper if μ(G) has fewer blank nodes than G.

Definitions G1, G2 are isomorphic (G1 ≅ G2) if there are maps μ1, μ2 s.t. μ1(G1) = G2 and μ2(G2) = G1. The union of graphs (G1 ∪ G2) is the union of their triples. The merge of graphs (G1 + G2) is G1 ∪ G2’, where G2’ is isomorphic to G2 and its blank nodes are disjoint from those of G1 (the blank nodes of the two graphs are treated as unrelated).
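
The definitions of map, union and merge can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a graph is a set of string 3-tuples and that blank nodes are the strings starting with "_:"; the helper names (is_blank, apply_map, merge) are hypothetical and do not come from the paper.

```python
from itertools import count


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def blanks(graph):
    """Set of blank nodes occurring anywhere in the graph."""
    return {t for triple in graph for t in triple if is_blank(t)}


def apply_map(mu, graph):
    """mu(G): replace blank nodes according to mu; URIs and literals are preserved."""
    return {tuple(mu.get(t, t) if is_blank(t) else t for t in triple)
            for triple in graph}


def union(g1, g2):
    """G1 ∪ G2: blank nodes with the same label are shared."""
    return set(g1) | set(g2)


def merge(g1, g2):
    """G1 + G2: rename the blank nodes of G2 that clash with G1, then take the union."""
    used = blanks(g1) | blanks(g2)
    fresh = (f"_:m{i}" for i in count())
    renaming = {}
    for b in sorted(blanks(g1) & blanks(g2)):
        name = next(n for n in fresh if n not in used)
        used.add(name)
        renaming[b] = name
    return set(g1) | apply_map(renaming, g2)


g1 = {("_:x", "paints", "Guernica")}
g2 = {("_:x", "type", "cubist")}
print(len(blanks(union(g1, g2))))  # the blank node _:x is shared
print(len(blanks(merge(g1, g2))))  # the blank node of g2 was renamed apart
```

With the union, the two triples talk about the same unknown resource; with the merge, they talk about two unrelated ones.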

RDFS Extended version of RDF. Defines classes and properties that may be used for describing groups of resources and relationships between resources. Supports: reification (making statements about statements), typing and inheritance. The triple (a,b,c) occurs in page http:

Lean graphs G is lean if there is no map μ s.t. μ(G) is a proper sub-graph of G. (Figure: two example graphs; G1 is not lean, G2 is lean.)

Core(G) Theorem: each RDF graph G contains a unique lean sub-graph which is an instance of G. We will denote this unique sub-graph: core(G).
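
A brute-force sketch of leanness and core(G) in Python may help make the definitions concrete. It tries every map from the blank nodes of G into universe(G), so it is exponential and only meant for tiny examples; the representation (sets of string triples, blank nodes prefixed with "_:") and the function names are assumptions of this sketch, and the example graphs are hypothetical, not the ones from the lost figure.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def apply_map(mu, graph):
    # mu only has blank nodes as keys, so URIs and literals are preserved.
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}


def proper_subgraph_instance(graph):
    """Return some mu(G) that is a proper sub-graph of G, or None if G is lean."""
    universe = sorted({t for triple in graph for t in triple})
    bs = sorted(t for t in universe if is_blank(t))
    for image in product(universe, repeat=len(bs)):
        candidate = apply_map(dict(zip(bs, image)), graph)
        if candidate < graph:  # proper subset
            return candidate
    return None


def is_lean(graph):
    return proper_subgraph_instance(set(graph)) is None


def core(graph):
    """Shrink G through proper sub-graph instances until a lean graph remains."""
    g = set(graph)
    while True:
        smaller = proper_subgraph_instance(g)
        if smaller is None:
            return g
        g = smaller


not_lean = {("a", "p", "_:x"), ("a", "p", "_:y")}
lean = {("a", "p", "_:x"), ("_:x", "q", "b")}
print(is_lean(not_lean), is_lean(lean))
print(core(not_lean))
```

In the first graph the two blank nodes can be collapsed into one, so one triple is redundant; in the second graph no map sends the graph into a strictly smaller part of itself.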

Semantics of RDF graphs Theorem 3: Let G1, G2 be simple graphs (graphs that do not use predefined semantics). G1 entails G2 (G1 ╞ G2) iff there is a map G2 -> G1 (i.e. a map μ s.t. μ(G2) is a sub-graph of G1).

It follows that there is a cubist that painted Guernica. It does not follow that Rivera painted Zapata.
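
The map-based test for simple entailment can be sketched in Python as a brute-force search over all assignments of the blank nodes of G2 to elements of universe(G1). The example graph below is a hypothetical reconstruction in the spirit of the painters example (the original figure is not available), and the representation (string triples, "_:" prefix for blank nodes) is an assumption of this sketch.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def entails_simple(g1, g2):
    """True iff there is a map mu with mu(G2) a sub-graph of G1 (simple graphs only)."""
    universe = sorted({t for triple in g1 for t in triple})
    bs = sorted({t for triple in g2 for t in triple if is_blank(t)})
    for image in product(universe, repeat=len(bs)):
        mu = dict(zip(bs, image))
        if all(tuple(mu.get(t, t) for t in triple) in g1 for triple in g2):
            return True
    return False


# Hypothetical data, not taken from the slides.
g1 = {
    ("Picasso", "type", "cubist"),
    ("Picasso", "paints", "Guernica"),
    ("Rivera", "type", "cubist"),
    ("Rivera", "paints", "_:w"),
}
some_cubist_painted_guernica = {("_:x", "type", "cubist"), ("_:x", "paints", "Guernica")}
rivera_painted_zapata = {("Rivera", "paints", "Zapata")}

print(entails_simple(g1, some_cubist_painted_guernica))
print(entails_simple(g1, rivera_painted_zapata))
```

The first query graph is entailed because its blank node can be mapped to Picasso; the second is ground, so it would have to be a triple of g1 itself.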

Equivalence G1 and G2 are equivalent (G1 ≡ G2) if G1 ╞ G2 and G2 ╞ G1. Theorem: if G is simple, then core(G), the unique lean sub-graph of G, is the unique (up to isomorphism) minimal (w.r.t. number of triples) graph equivalent to G.

There is a sound and complete set of rules for ╞ in graphs with RDFS-vocabulary. For example, with sc standing for sub-class: (a,sc,b), (b,sc,c) -> (a,sc,c). In non-simple graphs we cannot use Theorem 3 (which characterizes entailment between simple graphs by a map G2 -> G1) because of issues like transitivity.

The two are equivalent, but there is no mapping G1 -> G2.

Closure To avoid the problem we “close” the graph with all possible triples that are entailed by the existing ones. A closure of G is a maximal set of triples G’ over universe(G) plus the RDFS-vocabulary s.t. G’ contains G and is equivalent to G.

RDFS-closure The closure of G under the set of RDFS rules. Using this definition we can prove that G1 ╞ G2 iff there is a map from G2 to the RDFS-closure of G1.
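
As a rough illustration of closing a graph under rules, the Python sketch below applies four well-known RDFS rules until nothing new is derived: transitivity of sc, transitivity of sp, inheritance of type along sc, and inheritance of properties along sp. This is only a subset of the full RDFS rule set, chosen as an assumption of this sketch; sc, sp and type are the abbreviations used in these slides.

```python
def rdfs_closure(graph):
    """Fixpoint of a small subset of the RDFS rules (sketch, not the full rule set)."""
    closed = set(graph)
    while True:
        new = set()
        for (a, p, b) in closed:
            for (c, q, d) in closed:
                # (a,sc,b), (b,sc,d) -> (a,sc,d)  and the same for sp
                if p == q and p in ("sc", "sp") and b == c:
                    new.add((a, p, d))
                # (a,type,b), (b,sc,d) -> (a,type,d)
                if p == "type" and q == "sc" and b == c:
                    new.add((a, "type", d))
                # (a,p,b), (p,sp,d) -> (a,d,b)
                if q == "sp" and c == p:
                    new.add((a, d, b))
        if new <= closed:
            return closed
        closed |= new


g = {
    ("a", "sc", "b"),
    ("b", "sc", "c"),
    ("x", "type", "a"),
    ("son", "sp", "relative"),
    ("Tom", "son", "Peter"),
}
for triple in sorted(rdfs_closure(g) - g):
    print(triple)
```

Entailment for graphs that use this vocabulary could then be tested by searching for a map from G2 into the closure of G1, instead of into G1 itself.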

Redundancy From the data representation point of view, “closure” and “RDFS-closure” may have redundancies. They are not the best choice to work with. The operator core does not eliminate all redundant triples.

Normal form G’s normal form, nf(G), is core(G’), where G’ is a closure of G. If G is an RDF graph then: – nf(G) is unique. – G1 ╞ G2 iff there is a map nf(G2) -> nf(G1). – G1 ≡ G2 iff nf(G1) ≅ nf(G2).

Redundancy Normal forms are not the most compact representation. A reduction of a graph G is a minimal graph Gr equivalent to G and contained in G. The authors of the paper present an algorithm to compute the reduction of a graph. The basic idea is to delete triples that can be deduced by RDFS rules, e.g. (a,sc,c) when (a,sc,b) and (b,sc,c) are present.
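
The idea of deleting deducible triples can be sketched for the single rule quoted above. The Python snippet below is not the algorithm from the paper: it only removes sc triples implied by transitivity (a transitive reduction of the sc hierarchy) and assumes, as a simplification, that the sc relation has no cycles. The function name reduce_sc is hypothetical.

```python
def reduce_sc(graph):
    """Drop every (a,sc,c) that follows from a longer sc path (acyclic sc assumed)."""
    sc = {(a, b) for (a, p, b) in graph if p == "sc"}

    # Transitive closure of the sc edges.
    reach = set(sc)
    while True:
        extra = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if extra <= reach:
            break
        reach |= extra

    # An edge (a,c) is redundant if some intermediate b lies between a and c.
    mids = {b for (_, b) in reach}
    redundant = {(a, c) for (a, c) in sc
                 if any((a, b) in reach and (b, c) in reach for b in mids)}

    return {(s, p, o) for (s, p, o) in graph
            if not (p == "sc" and (s, o) in redundant)}


g = {("a", "sc", "b"), ("b", "sc", "c"), ("a", "sc", "c"), ("x", "type", "a")}
print(sorted(reduce_sc(g)))
```

The triple (a,sc,c) disappears because it can be re-derived from the two remaining sc triples, while the type triple is left untouched.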

Querying RDF Databases An RDF graph can be viewed as a standard relational database with a single table. Each tuple in the table is a triple with the attributes subject, predicate and object.

Query language Variables (disjoint from UBL) will be denoted ?X, ?Y, ?person. The query language is similar to Datalog: (?A,creates,?Y) <- (?A,type,Flemish), (?A,paints,?Y), (?Y,exhibited,Gordon), meaning “the artifacts created by Flemish artists that are exhibited in the Gordon gallery”.

Tableau (H<-B) A tableau is a pair (H,B), where H and B are RDF graphs in which some elements of UBL are replaced by variables from V. All variables in H occur in B.

Query A query is a tableau (H,B) plus a set of premises P and a set of constraints C. P is a graph over UBL. C is a subset of the variables occurring in H. We can think of a query as the tuple (H,B,P,C). When P or C is omitted, it is assumed to be the empty set ∅.

Constraints Constraints make it possible to discriminate between blank and ground nodes in an answer (similar to IS NOT NULL). Adding the constraint {?A} means that the variable ?A must be bound to a non-blank element in each answer to the query.

Premises The premise represents information that the user supplies to the database in order to answer the query. Example: (?X,relative,Peter) <- (?X,relative,Peter) with P = {(son,sp,relative)} asks for all relatives of Peter, knowing that “son” is a sub-property of “relative”.

Premises (cont) Premises allow hypothetical analysis. Premises are fixed throughout the query. Premises may contain blank nodes but no variables.

Answering a query A valuation is a function v: V -> UBL. For a set C of variables, the valuation v satisfies the constraint C if v(x) is not blank for all x in C. v(B) is the graph obtained by replacing every occurrence of a variable x in B with v(x).

Matching A matching of a graph B in a DB D is a valuation v s.t. v(B) ⊆ nf(D). The matchings that interest us are the ones that satisfy C.

Single answer Let q = (H,B,P,C) be a query and D a DB. The pre-answer of q over D is: – preans(q,D) = {v(H) : v is a matching of B in D+P and v satisfies C}. A graph v(H) in preans(q,D) is called a single answer of query q over D.
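
A small Python sketch of valuations, matchings and pre-answers follows. It makes two simplifying assumptions that differ from the definitions: the body is matched against the plain union of D and P rather than against a normal form of D + P, and blank nodes in the body are treated like constants. Variables are strings starting with "?", blank nodes strings starting with "_:"; the data and the function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def is_blank(term):
    return term.startswith("_:")


def matchings(body, data):
    """Yield every valuation v (a dict) such that v(B) is a subset of data."""
    def extend(v, remaining):
        if not remaining:
            yield dict(v)
            return
        pattern, rest = remaining[0], remaining[1:]
        for triple in data:
            w = dict(v)
            ok = True
            for pt, dt in zip(pattern, triple):
                if is_var(pt):
                    # Bind the variable, or check an earlier binding.
                    if w.setdefault(pt, dt) != dt:
                        ok = False
                        break
                elif pt != dt:
                    ok = False
                    break
            if ok:
                yield from extend(w, rest)

    yield from extend({}, list(body))


def preanswers(head, body, data, premises=frozenset(), constraints=frozenset()):
    """Set of single answers v(H); simplification: match against D ∪ P directly."""
    target = set(data) | set(premises)
    answers = set()
    for v in matchings(body, target):
        if any(is_blank(v[x]) for x in constraints):
            continue  # the constraint demands a non-blank binding
        answers.add(frozenset(tuple(v.get(t, t) for t in triple) for triple in head))
    return answers


data = {
    ("Rubens", "type", "Flemish"), ("Rubens", "paints", "_:w"),
    ("_:w", "exhibited", "Gordon"),
    ("_:a", "type", "Flemish"), ("_:a", "paints", "Samson"),
    ("Samson", "exhibited", "Gordon"),
}
head = [("?A", "creates", "?Y")]
body = [("?A", "type", "Flemish"), ("?A", "paints", "?Y"), ("?Y", "exhibited", "Gordon")]

print(len(preanswers(head, body, data)))
print(preanswers(head, body, data, constraints={"?A"}))
```

Without constraints both the named and the anonymous Flemish painter produce a single answer; with the constraint {?A} only the answer whose ?A is bound to a non-blank element survives.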

Complex queries We would like complex queries to be composed from simple ones: – ans∪(q,D) (union of the single answers): good when we want blank nodes to play the role of bridges between two queries. – ans+(q,D) (merge of the single answers): renames blank nodes to avoid name clashes. Good when querying several unrelated DBs.

Reification We allow blank nodes in the head of queries. The main motivation is the reification vocabulary. In the RDF semantics a statement does not have an identifier. To refer to a statement we must give it a name (a blank node): this is the reification process.

Reification It allows us to say something about statements. (N,value,true), (N,type,stat), (N,subj,?X), (N,pred,?Y), (N,obj,?Z) <- (?X,?Y,?Z). If the DB is Encyclopedia Britannica, then the query states: “all statements made by Encyclopedia Britannica are true”.
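
The reification query above can be imitated with a few lines of Python. The sketch assumes that the head blank node N is instantiated by a fresh blank node for each matching (each triple of the DB); that choice, the "_:stmt" naming scheme and the sample data are assumptions of this illustration, and the scheme further assumes the DB does not already use labels of that form.

```python
def reify_all(data):
    """For every triple (x,y,z) in the DB, emit five triples describing it as a true statement."""
    out = set()
    for i, (x, y, z) in enumerate(sorted(data)):
        n = f"_:stmt{i}"  # fresh blank node naming this statement (assumed scheme)
        out |= {
            (n, "value", "true"),
            (n, "type", "stat"),
            (n, "subj", x),
            (n, "pred", y),
            (n, "obj", z),
        }
    return out


# Hypothetical two-triple database.
db = {("Picasso", "paints", "Guernica"), ("Picasso", "type", "cubist")}
for triple in sorted(reify_all(db)):
    print(triple)
```

Each original triple becomes a named statement with its own subject, predicate and object descriptions, which is what lets a query say something about statements themselves.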

Reification - Problem By the RDF specification, an RDF graph (DB) is a finite set of objects. Answers to queries are also finite sets. If a triple itself is an object i1, then having (a,b,c) in the DB would imply (i1,subj,a)…… and we get infinite sets.

Query complexity We consider a simpler version of the evaluation problem: – Query complexity version: for a fixed DB D, given a query q, is q(D) non-empty? – Data complexity version: for a fixed query q, given a DB D, is q(D) non-empty? Theorem: the evaluation problem is NP-complete for the query complexity version and polynomial for the data complexity version.

Query complexity The size of the set of answers of a query q over a DB D is bounded by |D|^|q|, where |D| is the size of the normal form of D and |q| is the number of symbols in the query.

What we saw RDF model. RDF semantics. Normal forms of RDF graphs. Querying RDF databases. Query complexity.