Default-all is dangerous! Wolfgang Gatterbauer Alexandra Meliou Dan Suciu Database group University of Washington.

Slides:



Advertisements
Similar presentations
University of Washington Database Group The Complexity of Causality and Responsibility for Query Answers and non-Answers Alexandra Meliou, Wolfgang Gatterbauer,
Advertisements

A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Copyright © C. J. Date 2005page 97 S#Y S1DURINGS3DURING [d04:d10][d08:d10] S2DURINGS4DURING [d02:d04][d04:d10] [d08:d10] WITH ( EXTEND T2 ADD ( COLLAPSE.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database.
Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query? Aditya Telang Sharma Chakravarthy, Chengkai Li.
The Power of How-to Queries joint work with Dan Suciu (University of Washington) Alexandra Meliou.
University of Washington Database Group Tiresias The Database Oracle for How-To Queries Alexandra Meliou § ✜ Dan Suciu ✜ § University of Massachusetts.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A Modified by Donghui Zhang.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
Efficient Query Evaluation on Probabilistic Databases
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
1 Lecture 11: Provenance and Data privacy December 8, 2010 Dan Suciu -- CSEP544 Fall 2010.
Rules of Thumb for Information Acquisition from Large and Redundant Data Wolfgang Gatterbauer Database group University of Washington.
An Overview of Data Provenance
University of Washington Database Group Reverse Data Management … and the case for Reverse What-If queries 1 Alexandra Meliou, Wolfgang Gatterbauer, Dan.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania Principles of Provenance (PrOPr) Philadelphia, PA June 26, 2007.
Managing XML and Semistructured Data Lecture 16: Indexes Prof. Dan Suciu Spring 2001.
1 Query Planning with Limited Source Capabilities Chen Li Stanford University Edward Y. Chang University of California, Santa Barbara.
Rutgers University Relational Algebra 198:541 Rutgers University.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Table of Contents The independent variable, x, denotes a member of the domain and the dependent variable, y, denotes a member of the range. We say, "y.
Function: Definition A function is a correspondence from a first set, called the domain, to a second set, called the range, such that each element in the.
Interoperability for Provenance-aware Databases using PROV and JSON Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy Oracle Corporation Raghav Kapoor,
IST Databases and DBMSs Todd S. Bacastow January 2005.
Computing Provenance and Annotations of Derived Data Wang-Chiew Tan UC Santa Cruz.
44220: Database Design & Implementation Logical Data Modelling Ian Perry Room: C48 Tel Ext.: 7287
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
Entity Framework Overview. Entity Framework A set of technologies in ADO.NET that support the development of data-oriented software applications A component.
Lecture 05 Structured Query Language. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
Chapter 2 Adapted from Silberschatz, et al. CHECK SLIDE 16.
® © 2009 Open Geospatial Consortium, Inc. Towards a common information model for water 71st OGC Technical Committee Mountain View, CA. USA Rob Atkinson.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
Christopher Re and Dan Suciu University of Washington Efficient Evaluation of HAVING Queries on a Probabilistic Database.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra.
Databases Shortfalls of file management systems Structure of a database Database administration Database Management system Hierarchical Databases Network.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 4 Relational Algebra.
Computing & Information Sciences Kansas State University Friday, 26 Jan 2008CIS 560: Database System Concepts Lecture 2 of 42 Friday, 29 August 2008 William.
Answering Queries Using Views: The Last Frontier.
Believe It or Not – Adding belief annotations to databases Wolfgang Gatterbauer, Magda Balazinska, Nodira Khoussainova, and Dan Suciu University of Washington.
Managing Structured Collections of Community Data
1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania PODS 2007.
Query Caching and View Selection for XML Databases Bhushan Mandhani Dan Suciu University of Washington Seattle, USA.
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB
Database Systems Logical Data Modelling Tutor:Ian Perry Tel: Web:
ARGUMENTS Chapter 15. INTRODUCTION All research projects require some argumentation An argument simply ‘combines’ existing facts to derive new facts,
Querying and storing XML
5 1 Normalization of Database Tables. 5 2 Database Tables and Normalization Normalization –Process for evaluating and correcting table structures to minimize.
Dynamic/Deferred Document Sharing (D3S) Profile for 2010 presented to the IT Infrastructure Technical Committee Karen Witting February 1, 2010.
Module 2: Intro to Relational Model
Relational Model By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany)
Relational Algebra Chapter 4 1.
STRUCTURE OF PRESENTATION :
Relational Algebra Chapter 4, Part A
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Lecture 16: Probabilistic Databases
Relational Algebra 1.
Relational Algebra Chapter 4 1.
Joining Interval Data in Relational Databases
Probabilistic Databases
Chapter 2: Intro to Relational Model
Example of a Relation attributes (or columns) tuples (or rows)
Chapter 2: Intro to Relational Model
Entity-Relationship Modelling
Presentation transcript:

Default-all is dangerous! Wolfgang Gatterbauer Alexandra Meliou Dan Suciu Database group University of Washington Version June 20, rd USENIX Workshop on the Theory and Praxis of Provenance (Tapp'11)

2 Overview Provenance Definitions Why? Why-provenance = witness basis (α w ) Minimal witness basis (α w m ) Where-provenance = propagation (α p ) Default-all propagation (α p d ) Where? Naive Provenance definition QRI definition (Query-Rewrite- Insensitive) Witness"SQL interpretation" Buneman et al. [ICDT’01] Bhagwat et al. [VLDB’04] Buneman et al. [PODS’02] Buneman et al. [ICDT’01] Minimal propagation (α p m ) Proposed in this paper! Has problems if one interprets annotations on attribute values We do not discuss here whether QRI is desirable (see also ), but merely point out that, if aiming for QRI, care has to be taken about the ramifications of the proposed semantics. Glavic, Miller [Tapp'11] Independent work presented at this WS

3 Overview Provenance Definitions Why? Why-provenance = witness basis (α w ) Minimal witness basis (α w m ) Where-provenance = propagation (α p ) Default-all propagation (α p d ) Where? Naive Provenance definition QRI definition (Query-Rewrite- Insensitive) Witness"SQL interpretation" Buneman et al. [ICDT’01] Bhagwat et al. [VLDB’04] Buneman et al. [PODS’02] Buneman et al. [ICDT’01] Glavic, Miller [Tapp'11] Note that Minimal propagation is "stable", in contrast to Default-all Minimal propagation (α p m ) Has problems if one interprets annotations on attribute values Proposed in this paper!

4 Example 1: Query-Rewrite-Insensitivity (QRI) 1 a 1 c 2 e A B 2 b 3 d 2 f A B t1t1 t2t2 t3t A B {{t 1 },{t 1,t 3 }} {{t 2 }} {{t 3 },{t 1,t 3 }} {{t 1 }} {{t 2 }} {{t 3 }} {t 1,t 3 } {t 2 } {t 1,t 3 } Q 1 (x,y):-R(x,y) A B {{t 1 }} {{t 2 }} {{t 3 }} R Why Query 1Input Why-provenance = witness basis (α w ) Minimal witness basis (α w m ) Lineage (α l ) Q 2 (x,y):-R(x,y),R(_,y) RaRa Input Where 1 a 1 c 2 e A B 2 b 3 d 2 f Q 1 (x,y):-R a (x,y) Query 1 1 a 1 c 2 e A B 2 b,f 3 d 2 b,f 1 a,c 2 e A B 2 b,f 3 d 2 b,f Where-provenance = propagation (α p ) Q 2 (x,y):-R a (x,y),R a (_,y) Query 2 ≡ Query 1 Default-all propagation (α p d ) Example adapted from 1 a 1 c 2 e A B 2 b 3 d 2 f Minimal propagation (α p m ) Query 2 ≡ Query 1 Cheney et al. [Foundations and Trends in DBs’09]

5 Real example: Why Default-all is dangerous Default-all propagation makes her drink the milk: LF Milk SC Water FoodContent Cesium-137 b Calcium d Cesium-137 f Bob, March 18, 2011 Don't drink, lots of Cesium! Fuyumi, March 19, 2011 No Cesium, save to drink! RaRa Content Cesium-137 ??? Q (y):-R a (‘LF Milk’,y) b f Hanako queries a community DB for contents of LF-milk * : Community DatabaseHanako's query Content Cesium-137 bf Minimal propagation (α p m )Default-all propagation (α p d ) Bob, March 18, 2011 Don't drink, lots of Cesium! Fuyumi, March 19, 2011 No Cesium, save to drink! b f Content Cesium-137 b Bob, March 18, 2011 Don't drink, lots of Cesium! b * Note the one-to-one correspondence of this example with example 1 Calcium d "semantically irrelevant information": annota- tions leak over from SC Water tuple to LF Milk "all relevant and only relevant"

6 Definition Minimal propagation (α p m ) 1 a 1 c 2 e A B 2 b 3 d 2 f t1t1 t2t2 t3t3 RaRa Example 1 1 a 1 c 2 e A B 2 b,f 3 d 2 b,f Q 2 (x,y):-R a (x,y),R a (_,y) Intuition: Return the intersection between: query-specific where-provenanc (α p ) and QRI minimal witness basis (α w m ) {{t 1 }} {{t 2 }} {{t 3 }} Minimal witness basis (α w m ) 1 a 1 c 2 e A B 2 b 3 d 2 f transforms 'sets of sets' into 'sets', hence something like QRI lineage t4t4 t5t5 t6t6 InputQuery 2 Where provenance (α p ) {t 1 } {t 2 } {t 3 } α w m Minimal propagation (α p m ) "all relevant... and only relevant"

7 Example 1: Illustration of "minimal" versus "all" Why-provenance Where-provenance Where-provenance (α p ) Minimal witness basis (α w m ) Why-provenance (α w ) Minimal propagation (α p m ) Default-all propagation (α p d )

8 Interpretation of Annotations 1: Attribute Value * * Interpretation of annotations on entity attribute values favored by us and underlying our model

9 Interpretation of Annotations 1: Attribute Value * Annotations on values of an attribute (here "population") for a particular entity (here "Athens") Argument: Interpreting cell annotations as relevant to the tuple (entity) adds something that is not trivially modeled with normalized tables. * Interpretation of annotations on entity attribute values favored by us and underlying our model

10 Interpretation of Annotations 2: Domain Value * * Alternative interpretation suggested by Wang-Chiew Tan (example created after conversation at Sigmod 2011) 1 a 1 c 2 e A B 2 b 3 d 2 f Input R a : Bob, March 18, 2011 This number is a prime number. Fuyumi, March 19, 2011 Two is not a prime number because it is even. b f... Date Dec Dec 25 This is a holiday. b This is a holiday too !!! f Domain value annotations * Input S a : Argument for default-all: If annotations are on domain values, then retrieving all annotations are relevant. Counter-Argument: But then these anno- tations can be modeled in a separate table as normalized tables. Alternative representation 2 2 B annotation b: Bob, March 18, 2011 This number is a prime number. f: Fuyumi, March 19, 2011 Two is not a prime number because it is even Annotation table S a : Dec 25 Date annotation This is a holiday. Annotation table S a :

α w m (~QRI lineage) 11 Backup: Detailed Example 2 1 a 1 c 2 e A B 2 b 3 d 2 f t1t1 t2t2 t3t3 RaRa 1 a,c 2 e,g A B 2 b,e,g 2 e,f,g Q 5 (x,y):-R a (x,y),R a (y,_),R a (x,_) Where-provenance (α p ) {{t 1,t 3 },{t 1,t 2,t 3 },{t 1,t 4 },{t 1,t 2,t 4 }} {{t 3 },{t 3,t 4 }} {t 1,t 3,t 4 } {t 3 } Minimal witness basis (α w m ) 1 a,c 2 e,g AB 2 b,e,f,g 2 b,e,f {{t 1,t 3 }, {t 1,t 4 }} {{t 3 }} Why-provenance (α w ) t5t5 t6t6 1 a 2 e A B 2 b,e,g 2 e,f t4t4 t5t5 Minimal propagation (α p m )Default-all propagation (α p d ) Q 6 (x,y):-R a (x,y),R a (y,_),R a (x,_),S a (_,y) α p d (t 4,B,Q 5 ) = α p (t 4,B,Q 6 ) with 2 g 4 h t4t4 Note minimal propagation is not equivalent to just evaluating the where-provenance for the query: Q 7 (x,y):-R a (x,y),R a (y,_). E.g. α p (t 5,B,Q 7 ) = {e,f,g}