On Propagation of Deletions and Annotations through Views. Wang-Chiew Tan, University of Pennsylvania Database Group. Joint work with Peter Buneman and Sanjeev Khanna.

Presentation transcript:

On Propagation of Deletions and Annotations through Views
Wang-Chiew Tan, University of Pennsylvania Database Group
Joint work with Peter Buneman and Sanjeev Khanna

Wang-Chiew Tan, Penn Database Group, slide 2: Data Annotations (share annotations)
- Knowledge sharing through annotations
- Annotations on data at various levels of granularity, including annotations on annotations
- Improve accuracy of data
  - data and annotations can be reviewed by independent parties
- Annotations:
  - loosely structured
- Source data:
  - proprietary
  - fixed schema
- A system that overlays annotations on existing data
- Big business in scientific databases

Slide 3: Data Annotations (share annotations)

NYRestaurants (Source Table)
Restaurant         | Cost | Type     | Zip
Peacock Alley      | $$$  | French   | 10022
Bull & Bear        | $$$  | Seafood  | 10022
Pacifica           | $    | Chinese  | 10013
Soho Kitchen & Bar | $    | American | 10022

All Restaurants (View 1)
Restaurant         | Cost | Type
Peacock Alley      | $$$  | French
Bull & Bear        | $$$  | Seafood
Pacifica           | $    | Chinese
Soho Kitchen & Bar | $    | American

Cheap Restaurants (View 2)
Restaurant         | Cost | Type
Pacifica           | $    | Chinese
Soho Kitchen & Bar | $    | American

Annotations shown on the slide:
- "Yummy chicken curry!!"
- "Serves fine French Cuisine in elegant setting. Jackets required. Extensive wine list!"

Slide 4: Data Annotations
- Communicate metadata through annotations
  - bounce or spread annotations around by piggybacking annotations on data items in the source-query-view model
- An annotation is placed in the view
  - where do we place the annotation on the source?
- Annotation placement problem presented in a relational setting
  - results carry over to fragments of XML (hierarchical model)
- Model:
  - Source: relational database
  - Query
  - View: result of the query applied to the source
- Not an easy problem!

Slide 5: Location and Propagation Rules
- A location is a triple (R, t, A): R is a relation name, t is a tuple in R, and A is an attribute in the schema of R
- Propagation rules:
  - Select
  - Project
  - Join
  - Union
[Figure: one diagram per rule, drawn over attributes A1, A2, A3 and relations R, R1, R2]
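The rule diagrams are not recoverable from the transcript, so the following is only a minimal sketch of one natural reading of forward propagation: a location follows the value it sits on through each operator. The representation (lists of dicts, the `loc` helper) and the exact rules are assumptions made for illustration, not the definitions from the talk.

```python
# Assumed representation (not from the slides): a relation is a list of dicts,
# and a location inside one relation is (frozenset of the tuple's items, attribute).

def loc(t, attr):
    return (frozenset(t.items()), attr)

def select(rows, anns, pred):
    """Keep tuples satisfying pred; a location survives if its tuple does."""
    out = [t for t in rows if pred(t)]
    kept = {frozenset(t.items()) for t in out}
    return out, {(k, a) for (k, a) in anns if k in kept}

def project(rows, anns, attrs):
    """Project on attrs; (t, A) propagates to (t[attrs], A) when A is kept."""
    out, out_anns = [], set()
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
        out_anns |= {loc(p, a) for a in attrs if loc(t, a) in anns}
    return out, out_anns

def join(rows1, anns1, rows2, anns2):
    """Natural join; a location propagates to every joined tuple built from its tuple."""
    out, out_anns = [], set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                t = {**t1, **t2}
                out.append(t)
                out_anns |= {loc(t, a) for a in t1 if loc(t1, a) in anns1}
                out_anns |= {loc(t, a) for a in t2 if loc(t2, a) in anns2}
    return out, out_anns

def union(rows1, anns1, rows2, anns2):
    """Union of same-schema relations; locations on equal tuples merge."""
    out = list(rows1) + [t for t in rows2 if t not in rows1]
    return out, set(anns1) | set(anns2)

# Two rows of NYRestaurants from slide 3; the annotated location is hypothetical.
ny = [
    {"Restaurant": "Peacock Alley", "Cost": "$$$", "Type": "French", "Zip": "10022"},
    {"Restaurant": "Pacifica", "Cost": "$", "Type": "Chinese", "Zip": "10013"},
]
anns = {loc(ny[1], "Type")}
cheap, cheap_anns = select(ny, anns, lambda t: t["Cost"] == "$")
view2, view2_anns = project(cheap, cheap_anns, ["Restaurant", "Cost", "Type"])
```

Under these assumed rules an annotation on Pacifica's Type reaches the Cheap Restaurants view, while an annotation on its Zip would be dropped by the projection.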

Slide 6: Annotation Placement Problem
- Annotation placement problem: given a view V = Q(S), where Q is a query and S is a data source, and an annotation A placed in the view V, decide whether there is an annotation in the source that, when propagated to the view, produces no annotation other than A.
- Side-effect-free annotation: an annotation on the source that produces no annotation other than A in the view.
[Figure: S -> Q -> V = Q(S)]
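To make the definition concrete, here is a small brute-force sketch for one fixed query, V = R1 JOIN R2. The relations, the helper names, and the propagation rule (a source location propagates to every joined tuple built from its tuple) are all assumptions chosen for illustration.

```python
# Hypothetical source relations: R1 has schema (A, B), R2 has schema (B, C).
R1 = [("a", "b")]
R2 = [("b", "c1"), ("b", "c2")]

def view_locations(src):
    """Propagate one source location (relation, tuple, attribute) through R1 JOIN R2."""
    rel, t, attr = src
    out = set()
    for (a, b) in R1:
        for (b2, c) in R2:
            if b != b2:
                continue
            v = (a, b, c)  # joined view tuple with schema (A, B, C)
            if rel == "R1" and t == (a, b):
                out.add((v, attr))
            if rel == "R2" and t == (b2, c):
                out.add((v, attr))
    return out

def source_locations():
    """Enumerate every location (relation, tuple, attribute) in the source."""
    for t in R1:
        for attr in "AB":
            yield ("R1", t, attr)
    for t in R2:
        for attr in "BC":
            yield ("R2", t, attr)

def side_effect_free(target):
    """Source locations whose propagation yields exactly the target view location."""
    return [s for s in source_locations() if view_locations(s) == {target}]

on_a = side_effect_free((("a", "b", "c1"), "A"))
on_c = side_effect_free((("a", "b", "c1"), "C"))
```

In this toy instance the annotation on attribute A of the first view tuple has no side-effect-free placement, because the only candidate source location also annotates the second view tuple; the annotation on attribute C does have one. Exhaustive search like this is only feasible because the instance is tiny.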

Slide 7: A Dichotomy
Theorem:
(a) It is NP-hard to decide whether there is a side-effect-free annotation for a PJ (project-join) query.
(b) There is a polynomial-time algorithm for queries that do not simultaneously contain a Project and a Join operation.
[Figure: S -> Q -> V = Q(S)]

Slide 8: Project and Join Query
Intuition: PJ can encode 3SAT.
Formula: (x1 + x2 + x3) ... (x3 + x5 + x2), with clauses C1, ..., Cm (T = true, F = false).
[Figure: one relation per clause. The relation for C1 has columns x1, x2, x3 and a clause column; it holds the assignment tuples (all possible satisfying assignments for C1) and a dummy tuple (d, d, d). The relation for Cm is built the same way over x3, x5, x2.]
Query: Join, then Project on C1 ... Cm.
[Figure: query output over columns C1 ... Cm]

Slide 9: Project and Join Query
Intuition: PJ can encode 3SAT, (x1 + x2 + x3) ... (x3 + x5 + x2).
[Figure: the construction of slide 8 repeated, showing the assignment tuples and dummy tuple (d, d, d) for C1 through Cm, and the output of "Join, then Project on C1 ... Cm" with its dummy tuples marked. T = true, F = false.]

Slide 10: Related Work on Annotations
- Superimposed Information (D. Maier, L. Delcambre [WebDB99])
  - data placed over existing information, e.g. bookmark files, schema of a database
- Annotation systems
  - Annotea (W3C)
    - annotates web pages
    - location is defined with XPointer
  - Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
    - annotates PDF files, HTML, etc.
    - robust locations
  - BioDAS (Distributed Annotation Server) (L. Stein et al.)
    - annotates genome sequences
    - notion of location is genome-specific
- No one has formally studied the annotation placement problem

Slide 11: The classical view deletion problem
- A view tuple is to be deleted
  - What changes should be made to the source?
- Many kinds of view-to-source deletion translations
  - e.g. deletion-to-insertion, deletion-to-modification, etc.
- Update Semantics of Relational Views (F. Bancilhon, N. Spyratos [TODS81])
- On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein [TODS82])
- Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller [PODS85])
  - deletion-to-deletion
- Run-Time Translations of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom [2001])
  - exploits lineage information to find side-effect-free deletions whenever possible

Slide 12: View Deletion Problem (deletion-to-deletion translation)
- View deletion problem (minimize view side-effect): given a view V = Q(S) and a tuple t in V, decide whether there is a side-effect-free deletion for t.
- Side-effect-free deletion: a set of source tuples whose removal from the database removes only t from the view.
- Model: Source (relational database) -> Query -> View (result of the query applied to the source)
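As an illustration of the definition, the sketch below searches exhaustively for a side-effect-free deletion under the fixed query PROJ A,C (R1 JOIN R2). The relation contents and function names are hypothetical, and the search is exponential in the number of source tuples, so it is meant only for toy instances.

```python
from itertools import combinations

def view(r1, r2):
    """PROJ A,C (R1 JOIN R2) for R1(A, B) and R2(B, C), both given as sets of pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def side_effect_free_deletion(r1, r2, t):
    """Return a smallest set of source tuples whose removal deletes only t, or None.

    Assumes t is in the current view.
    """
    tagged = [("R1", x) for x in sorted(r1)] + [("R2", x) for x in sorted(r2)]
    want = view(r1, r2) - {t}
    for k in range(1, len(tagged) + 1):
        for chosen in combinations(tagged, k):
            d1 = {x for (rel, x) in chosen if rel == "R1"}
            d2 = {x for (rel, x) in chosen if rel == "R2"}
            if view(r1 - d1, r2 - d2) == want:
                return set(chosen)
    return None

# Hypothetical instance where (a, c) has two derivations and can be removed cleanly.
r1 = {("a", "b1"), ("a", "b2")}
r2 = {("b1", "c"), ("b2", "c"), ("b2", "d")}
found = side_effect_free_deletion(r1, r2, ("a", "c"))

# Hypothetical instance where every way of removing (a, c) also removes another view tuple.
s1 = {("a", "b"), ("a2", "b")}
s2 = {("b", "c"), ("b", "c2")}
missing = side_effect_free_deletion(s1, s2, ("a", "c"))
```

In the second instance, deleting ("a", "b") also removes ("a", "c2") from the view, and deleting ("b", "c") also removes ("a2", "c"), so no side-effect-free deletion exists.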

Slide 13: A Dichotomy
Theorem:
(a) It is NP-hard to decide whether there is a side-effect-free deletion for a PJ or JU query in normal form.
(b) There is a polynomial-time algorithm to find the set of source deletions with minimum side effects for all other queries, i.e., queries that involve only S, P, U or only S, J operators.
Part (a) holds even for a constant-size PJ query involving only two relations: PROJ A,C (R1 JOIN R2).

Slide 14: View Deletion: PJ Query
Theorem: It is NP-hard to decide whether there is a side-effect-free deletion for a PJ query in normal form.
Formula: (x1 + x2 + x3)(x2 + x4 + x5)(x4 + x1 + x3), with clauses c1, c2, c3.

R1 (A, B):
(a, x1), (a, x2), (a, x3), (a, x4), (a, x5)
(c2, x2), (c2, x4), (c2, x5)

R2 (B, C):
(x1, c), (x2, c), (x3, c), (x4, c), (x5, c)
(x1, c1), (x2, c1), (x3, c1)
(x4, c3), (x1, c3), (x3, c3)

PROJ A,C (R1 JOIN R2), with columns (A, C):
(a, c), (a, c1), (a, c3), (c2, c), (c2, c1), (c2, c3)

For each x_i, decide whether to delete (a, x_i) or (x_i, c).
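The slide's last line can be checked mechanically on this instance. The sketch below is built on a reconstruction of the figure, which is an assumption: R1 holds (a, x_i) for every variable plus (c2, x) for the variables of the second clause, and R2 holds (x_i, c) for every variable plus (x, c1) and (x, c3) for the variables of the first and third clauses. It tries each of the 32 per-variable choices and keeps those that delete only (a, c).

```python
from itertools import product

VARS = ["x1", "x2", "x3", "x4", "x5"]

# Assumed reconstruction of the slide's relations R1(A, B) and R2(B, C).
R1 = {("a", x) for x in VARS} | {("c2", x) for x in ("x2", "x4", "x5")}
R2 = ({(x, "c") for x in VARS}
      | {(x, "c1") for x in ("x1", "x2", "x3")}
      | {(x, "c3") for x in ("x4", "x1", "x3")})

def view(r1, r2):
    """PROJ A,C (R1 JOIN R2)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

target = ("a", "c")
want = view(R1, R2) - {target}

solutions = []
for choice in product((True, False), repeat=len(VARS)):
    # choice[i] is True: delete (x_i, c) from R2 and keep (a, x_i).
    # choice[i] is False: delete (a, x_i) from R1 and keep (x_i, c).
    d1 = {("a", x) for x, flag in zip(VARS, choice) if not flag}
    d2 = {(x, "c") for x, flag in zip(VARS, choice) if flag}
    if view(R1 - d1, R2 - d2) == want:
        solutions.append(dict(zip(VARS, choice)))
```

Under this reconstruction, a choice is side-effect-free exactly when it keeps (a, x) for some variable of c1 and of c3 and keeps (x, c) for some variable of c2. That matches satisfying assignments of the formula only if the c2 clause is read with negated literals, which is an assumption about negation bars that may have been lost from the transcript.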

Slide 15: Ongoing and Future Work
- Implementation of annotation system
  - on RDBMS
  - on XML
  - effects on query languages?
- Special cases of PJ queries with polynomial-time algorithm
  - PJ queries that do not project out key information

Slide 16: Do we need an annotation-conscious QL?

The same query in different languages, but different annotation behavior:
Emp(Name, Sal, Dept): [Name:Joe, Sal:50K, Dept:Marketing]
Department(Dept, Manager): [Dept:Marketing, Manager:Jane]
Relational Algebra: Emp JOIN Department
SQL: SELECT e.Name, e.Sal, e.Dept, d.Manager FROM Emp e, Department d WHERE e.Dept = d.Dept
Result: [Name:Joe, Sal:50K, Dept:Marketing, Manager:Jane]

Equivalent queries in the same language, but different annotation behavior:
Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = 50K
Q2 = SELECT e.Name, 50K AS Sal FROM Emp e WHERE e.Sal = 50K
Result: [Name:Joe, Sal:50K]
[Figure label: =a]
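The Q1/Q2 contrast can be sketched in a few lines. The code assumes a copy-based rule: an output value carries an annotation only if it was copied from a source attribute, while a constant written in the SELECT list carries none. That rule, the `run` helper, and the annotation text are illustrative assumptions rather than the semantics defined in the talk.

```python
# One Emp row from the slide, plus an invented annotation on Joe's salary.
emp = [{"Name": "Joe", "Sal": "50K", "Dept": "Marketing"}]
emp_anns = {(0, "Sal"): "figure confirmed by payroll"}  # keyed by (row index, attribute)

def run(emp, emp_anns, copied, constants):
    """Evaluate SELECT <copied attrs>, <constants> FROM Emp e WHERE e.Sal = 50K.

    copied: output attributes taken from e (these carry e's annotations).
    constants: output attribute -> literal value (these carry no annotation).
    """
    rows, anns = [], {}
    for i, e in enumerate(emp):
        if e["Sal"] != "50K":
            continue
        row = {a: e[a] for a in copied}
        row.update(constants)
        for a in copied:
            if (i, a) in emp_anns:
                anns[(len(rows), a)] = emp_anns[(i, a)]
        rows.append(row)
    return rows, anns

# Q1 copies e.Sal; Q2 writes the constant 50K under the name Sal.
q1_rows, q1_anns = run(emp, emp_anns, copied=["Name", "Sal"], constants={})
q2_rows, q2_anns = run(emp, emp_anns, copied=["Name"], constants={"Sal": "50K"})
```

Both queries return the same row, but under the assumed rule only Q1 carries the salary annotation into the result.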

Slide 17: Do we need an annotation-conscious QL?
- Relational algebra seems to suggest a natural set of propagation rules
- SQL seems to suggest another natural propagation rule
  - one that is based on variable bindings
- It is not clear how to extend the semantics of query languages so that annotation propagation is well-behaved
- Should a query language be annotation-conscious? OR should the user be allowed to control which annotation gets propagated where?

Slide 18: End of Talk