Joint work with Werner Nutt Free University of Bozen-Bolzano Completeness of Queries over Incomplete Databases Simon Razniewski.

Slides:



Advertisements
Similar presentations
Metadata in the TAP context (1) The Problem: learn about which tables, tablesets,... are available from a TAP server for each of the tables / tablesets,
Advertisements

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos & A. Pavlo Lecture#6: Rel. model - SQL part1 (R&G, chapter.
1 CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS. 2 Introduction - We discuss here two mathematical formalisms which can be used as the basis for stating and.
Logic.
Copyright © Cengage Learning. All rights reserved.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
1 Lecture 12: Further relational algebra, further SQL
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Oct 28, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
Logical Agents Chapter 7. Why Do We Need Logic? Problem-solving agents were very inflexible: hard code every possible state. Search is almost always exponential.
Chapter 3. 2 Chapter 3 - Objectives Terminology of relational model. Terminology of relational model. How tables are used to represent data. How tables.
Logical Agents Chapter 7. Why Do We Need Logic? Problem-solving agents were very inflexible: hard code every possible state. Search is almost always exponential.
1 Relational Model. 2 Relational Database: Definitions  Relational database: a set of relations  Relation: made up of 2 parts: – Instance : a table,
The Relational Model Lecture 3 Book Chapter 3 Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC) From ER to Relational.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #3.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
The Relational Model Codd (1970): based on set theory Relational model: represents the database as a collection of relations (a table of values --> file)
Nov 18, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
In collaboration with Werner Nutt Free University of Bozen-Bolzano Data Quality Simon Razniewski.
Murali Mani Relational Algebra. Murali Mani What is Relational Algebra? Defines operations (data retrieval) for relational model SQL’s DML (Data Manipulation.
Midterm 1 Concepts Relational Algebra (DB4) SQL Querying and updating (DB5) Constraints and Triggers (DB11) Unified Modeling Language (DB9) Relational.
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Relational Model – SQL Part I (based on notes by Silberchatz,Korth,
Completeness of Queries over Incomplete Databases Werner Nutt joint work with Marco Montali, Sergey Paramonov, Simon Razniewski, Ognjen Savkovic, Alex.
Relational Model & Relational Algebra. 2 Relational Model u Terminology of relational model. u How tables are used to represent data. u Connection between.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
Relational Query Languages. Languages of DBMS  Data Definition Language DDL  define the schema and storage stored in a Data Dictionary  Data Manipulation.
Lecture 3 [Self Study] Relational Calculus
FEN  Concepts and terminology  Operations (relational algebra)  Integrity constraints The relational model.
General Database Statistics Using Maximum Entropy Raghav Kaushik 1, Christopher Ré 2, and Dan Suciu 3 1 Microsoft Research 2 University of Wisconsin--Madison.
1 The Relational Model. 2 Why Study the Relational Model? v Most widely used model. – Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. v “Legacy.
FALL 2004CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra.
FEN Introduction to the database field:  The Relational Model Seminar: Introduction to relational databases.
NULLs & Outer Joins Objectives of the Lecture : To consider the use of NULLs in SQL. To consider Outer Join Operations, and their implementation in SQL.
ICS 321 Fall 2011 The Relational Model of Data (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 8/29/20111Lipyeow.
Chapter 6 The Relational Algebra Copyright © 2004 Ramez Elmasri and Shamkant Navathe.
Copyright © Cengage Learning. All rights reserved.
Database Systems (Atzeni, Ceri, Paraboschi, Torlone) Chapter 3 : Relational algebra and calculus McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Relational model.
Chapter 2 Introduction to Relational Model. Example of a Relation attributes (or columns) tuples (or rows) Introduction to Relational Model 2.
IST 210 The Relational Language Todd S. Bacastow January 2004.
ICS 321 Spring 2011 The Database Language SQL (iii) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/14/20111Lipyeow.
1 CSCE Database Systems Anxiao (Andrew) Jiang The Database Language SQL.
 CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
Querying Big Data by Accessing Small Data Wenfei FanUniversity of Edinburgh & Beihang University Floris GeertsUniversity of Antwerp Yang CaoUniversity.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
SE305 Database System Technology 25/09/2014 Quiz-1.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
Chapter 3 The Relational Model. Objectives u Terminology of relational model. u How tables are used to represent data. u Connection between mathematical.
Relational Algebra Instructor: Mohamed Eltabakh 1.
Chapter 4 The Relational Model Pearson Education © 2009.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Chapter 3 The Relational Model. Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. “Legacy.
1 CS122A: Introduction to Data Management Lecture #4 (E-R  Relational Translation) Instructor: Chen Li.
The Mediator: What Next? Talk by: Andy Cooke Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt Heriot-Watt University.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
IT 5433 LM3 Relational Data Model. Learning Objectives: List the 5 properties of relations List the properties of a candidate key, primary key and foreign.
CS 3630 Database Design and Implementation
Logical Agents Chapter 7.
Instructor: Mohamed Eltabakh
The Relational Model Textbook /7/2018.
Chapter 2: Intro to Relational Model
Ontologies and Databases
Equivalence of Aggregate Queries in Conjunctive QL
Relational Algebra Chapter 4 - part I.
Presentation transcript:

Joint work with Werner Nutt Free University of Bozen-Bolzano Completeness of Queries over Incomplete Databases Simon Razniewski

Introduction Data completeness: important aspect of data quality Query answering over incomplete data: extensively studied Query Completeness: little work Completeness of Queries over Incomplete Databases2

Bolzano is in the Province of South Tyrol Autonomous, trilingual province in the north of Italy Completeness of Queries over Incomplete Databases3 Bolzano

School Data in South Tyrol Decentrally maintained database Statistical reports Completeness of Queries over Incomplete Databases4 ?? notoriously incomplete correctness important

Example Database Schema Pupil(pname, age, sname) School(sname, type, language) Completeness of Queries over Incomplete Databases5

Completeness Reasoning Example Suppose we have data about pupils from all –German schools –Italian schools, except the high school “Da Vinci“ –Ladin schools, except the middle school “Gherdëna“ Will the following query get a correct answer? “How many pupils are at German primary schools?“  Yes Completeness of Queries over Incomplete Databases6 (if we also have all German primary schools)

Completeness Reasoning Example (Cntd) Suppose we have data about pupils from all –German schools –Italian schools, except the high school “Da Vinci“ –Ladin schools, except the middle school “Gherdëna“ Will the following query get a correct answer? “How many Ladin pupils are there?  Maybe not, pupils from “Gherdëna“ could be missing Completeness of Queries over Incomplete Databases7

Overview Formalization –Incomplete Database –Query Completeness –Table Completeness Reasoning for Conjunctive Queries –Bag Semantics –Set Semantics –Aggregate Queries Completeness of Queries over Incomplete Databases8

Incomplete Database (Motro 1989) Completeness of Queries over Incomplete Databases9

Incomplete Database - Example D = (D i, D a ) “Paul and Andrea are pupils in the ideal database” D i = { pupil(‘Paul‘, 11, ‘Da Vinci‘), pupil(‘Andrea‘, 14, ‘Gherdëna‘) } “Our available database misses the fact that Andrea is a pupil“ D a = { pupil(‘Paul‘, 11, ‘Da Vinci‘) } Completeness of Queries over Incomplete Databases10

Query Completeness (Motro 1989) Completeness of Queries over Incomplete Databases11

Table Completeness (Levy 1996) Completeness of Queries over Incomplete Databases12 This is a full TGD (= tuple generating dependency) This is a full TGD (= tuple generating dependency)

Table Completeness (Cntd) Completeness of Queries over Incomplete Databases13

Completeness Reasoning Completeness of Queries over Incomplete Databases14 TC Statements C QC Statement Compl(Q)

Completeness Reasoning (Cntd) Completeness of Queries over Incomplete Databases15

What is Known? Completeness of Queries over Incomplete Databases16

TC-QC bag – Canonical TC Statements “How many 12-year old pupils are at the Italian schools?'' Q(COUNT(p)) :− pupil(p, 12, s), school(s, t, ‘Italian')‏ Q can be answered correctly if - every 12-year old pupil from an Italian school is there - every Italian school with a 12-year old pupil is there That is, if the database satisfies - Compl(pupil(p, a, s); school(s, t, ‘Italian'), a = 12) ‏ - Compl(school(s, t, l); pupil(p, 12, s), l = ‘Italian') ‏ Completeness of Queries over Incomplete Databases17 canonical completeness statements for Q

TC-QC bag – Canonical TC Statements (Cntd) Completeness of Queries over Incomplete Databases18

TC-QC bag Reduces to TC-TC Completeness of Queries over Incomplete Databases19 TC-QC TC-TC

TC-TC Entailment = Query Containment Completeness of Queries over Incomplete Databases20

TC-TC Entailment = Query Containment (Cntd) TC statements describe parts of tables that are complete TC statements entail each other if the parts described are contained  Entailment of TC from TC can naturally be reduced to query containment Completeness of Queries over Incomplete Databases21 Theorem: Let L be a class of conjunctive queries that (i) contains for every relation the identity query (ii) is closed under intersection Then TC-TC entailment and containment of unions of queries can be reduced to each other in linear time. Theorem: Let L be a class of conjunctive queries that (i) contains for every relation the identity query (ii) is closed under intersection Then TC-TC entailment and containment of unions of queries can be reduced to each other in linear time.

Complexity Classes of conjunctive queries:  CQ: Conjunctive queries with comparisons over dense orders  RQ: Relational conjunctive queries (i.e., without comparisons)  LCQ: Linear conjunctive queries (i.e., without self-joins)  LRQ: Linear relational conjunctive queries Completeness of Queries over Incomplete Databases22

TC-QC bag - Complexity Completeness of Queries over Incomplete Databases23 Query Language LRQLCQRQCQ TC Statement Language LRQin PTIME NP RQin PTIME NP LCQcoNP CQcoNP

TC-QC set Completeness of Queries over Incomplete Databases24

Completeness of Queries over Incomplete Databases25 Query Language LRQLCQRQCQ TC Statement Language LRQin PTIME NP RQin PTIME NP LCQcoNP CQcoNP TC-QC bag - ComplexityTC-QC set

Completeness Reasoning for Aggregate Queries SUM and COUNT: similar to bag semantics MIN and MAX: similar to set semantics Completeness of Queries over Incomplete Databases26

QC-QC and Query Determinacy Motro’s idea: Look for rewritings Given Q 1 (x) :− R(x), S(x) Q 2 (x) :− T(x) Suppose we know Compl(Q 1 ) and Compl(Q 2 ) Consider Q(x) :− R(x), S(x), T(x) We see: Q can be rewritten as Q(x) :− Q 1 (x), Q 2 (x) Therefore, we conclude Compl(Q) Completeness of Queries over Incomplete Databases27

QC-QC and Query Determinacy (Cntd) Queries Q 1, …, Q n, Q Completeness of Queries over Incomplete Databases28

QC-QC and Query Determinacy (Cntd) However: –Decidability of determinacy for conj. queries is open (Segoufin/Vianu ‘05) –Necessity of determinacy for QC-QC entailment is open Completeness of Queries over Incomplete Databases29 Theorem: For boolean queries, existence of rewritings, determinacy and QC-QC entailment coincide Theorem: For boolean queries, existence of rewritings, determinacy and QC-QC entailment coincide

Where Can Completeness Statements Come From? Any conclusion only as correct as the statements it is derived from ~> On which basis can someone give a completeness statement?  Someone knows some part of the real world E.g., a class teacher knows all his students –The method of data collection is known to be complete E.g., at the deadline for enrolment all forms must be present –Cardinalities of parts of the real world are known and the method of data collection is correct E.g., no nonexisting schools are registered and the number of schools in South Tyrol is known Completeness of Queries over Incomplete Databases30

Conclusion Framework for modelling completeness –query answers (Motro: QC statements) –parts of databases (Levy: TC statements) Reasoning –Complexity analysis of TC-TC and TC-QC –Connection between determinacy and QC-QC –Reasoning in the presence of instances Current work –Schema constraints (keys, foreign keys, finite domains) –Null values –Prototypical implementation Completeness of Queries over Incomplete Databases31