A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies. 2012 ACM SIGMOD/PODS Conference (PODS 2012), Scottsdale, Arizona, USA.

Benny Kimelfeld, IBM Research – Almaden

Deletion Propagation (the subject of this work)
Translate a tuple deletion on the view back to the source relations, properly.
– A classic database problem, specializing the more general view-update problem [Dayal & Bernstein 1982; Cosmadakis & Papadimitriou 1984; Keller 1986; Cui & Widom 2001; Buneman, Khanna & Tan 2002; Cong, Fan & Geerts 2006; …]
– Renewed motivation: debugging and causality for false positives [K, Vondrák, Williams 2011]
Various definitions of "properly" have been studied:
– Minimize the view side effect: the number of view tuples lost, other than the intended one
– Minimize the source side effect: the number of source tuples deleted; this corresponds to maximal "responsibility" for an answer [Meliou et al. 2010]

Example: File Access
Access(u, f) :– UserGroup(u, g), GroupFile(g, f), i.e., Access = UserGroup ⋈ GroupFile
UserGroup(user, group): (Emma, ai), (Emma, db), (Olivia, os), (Olivia, db), (Jacob, ai)
GroupFile(group, file): (ai, a.txt), (ai, b.txt), (db, a.txt), (db, b.txt), (os, a.txt)
Access(user, file): (Emma, a.txt), (Emma, b.txt), (Olivia, a.txt), (Olivia, b.txt), (Jacob, a.txt), (Jacob, b.txt)
Goal: delete source rows so that Emma can no longer access a.txt, while maintaining maximum access permissions [Cui & Widom 2001; Buneman et al. 2002].
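To make the join concrete, here is a minimal Python sketch that evaluates Access over the two source tables of the example. The names `USER_GROUP`, `GROUP_FILE` and `access` are illustrative choices, not taken from the talk; the tuples are copied from the slide.

```python
# Source relations of the File Access example (tuples copied from the slide).
USER_GROUP = {
    ("Emma", "ai"), ("Emma", "db"),
    ("Olivia", "os"), ("Olivia", "db"),
    ("Jacob", "ai"),
}
GROUP_FILE = {
    ("ai", "a.txt"), ("ai", "b.txt"),
    ("db", "a.txt"), ("db", "b.txt"),
    ("os", "a.txt"),
}


def access(user_group, group_file):
    """Evaluate Access(u, f) :- UserGroup(u, g), GroupFile(g, f).

    g is existential, so it is projected away; building a set drops
    duplicate (u, f) pairs that are derived through several groups.
    """
    return {
        (u, f)
        for (u, g) in user_group
        for (g2, f) in group_file
        if g == g2
    }


if __name__ == "__main__":
    for row in sorted(access(USER_GROUP, GROUP_FILE)):
        print(row)
```

Note that (Emma, a.txt) has two derivations, via ai and via db, which is why a single deletion is not enough to remove it.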

Example: File Access (a solution)
Same query and instances as above. Deleting (Emma, ai) from UserGroup and (db, a.txt) from GroupFile removes (Emma, a.txt) from Access and nothing else: the solution is side-effect free (and hence has minimal side effect) [Cui & Widom 2001; Buneman et al. 2002].
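The side-effect-free claim can be checked mechanically. This sketch recomputes Access after a candidate deletion and compares the result with Q(D) minus {a}. The helper `is_side_effect_free` is hypothetical, and the tables are the illustrative ones from the example.

```python
USER_GROUP = {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
              ("Olivia", "db"), ("Jacob", "ai")}
GROUP_FILE = {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
              ("db", "b.txt"), ("os", "a.txt")}


def access(user_group, group_file):
    """Access(u, f) :- UserGroup(u, g), GroupFile(g, f)."""
    return {(u, f) for (u, g) in user_group
            for (g2, f) in group_file if g == g2}


def is_side_effect_free(del_user_group, del_group_file, answer):
    """True iff deleting the given source tuples removes exactly `answer`
    from the view, i.e. Q(E) = Q(D) - {a}."""
    before = access(USER_GROUP, GROUP_FILE)
    after = access(USER_GROUP - del_user_group, GROUP_FILE - del_group_file)
    return after == before - {answer}


A = ("Emma", "a.txt")
# The solution above: one tuple from each relation.
print(is_side_effect_free({("Emma", "ai")}, {("db", "a.txt")}, A))
# Deleting all of Emma's memberships also loses (Emma, b.txt).
print(is_side_effect_free({("Emma", "ai"), ("Emma", "db")}, set(), A))
```

The first call succeeds because Emma still reaches b.txt through db, and Olivia still reaches a.txt through os.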

Formal Definitions
Schema S: relation symbols R1, …, Rm, plus functional dependencies (FDs) of the form Ri: attribute-set → attribute.
Conjunctive query (CQ) Q, with head variables and existential variables, and no self joins; each Ri(…) in the body is an atom. Example (the yi are head variables, the xi existential):
Q(y1, y2, y3) :– R1(x1, y1), R2(x1, 'ibm'), R3(x2, y1, y2, x3), R4(x4, y3)
Input: a database D over S, and an answer a ∈ Q(D) to delete.
Solution: E ⊆ D such that a ∉ Q(E).
Side-effect free: Q(E) = Q(D) − {a}.
Optimal: |Q(E)| is maximal.
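For toy instances the definitions can be tested by brute force: enumerate every E ⊆ D, discard those with a ∈ Q(E), and keep one that maximizes |Q(E)|. This is exponential in |D| and only illustrates the definitions; it is not an algorithm from the talk. The encoding of a CQ as a head tuple plus a list of (relation, variables) atoms, and the names `eval_cq` and `optimal_solution`, are our own assumptions.

```python
from itertools import product


def eval_cq(head, body, db):
    """Evaluate a CQ by backtracking over its atoms.

    head: tuple of head variables; body: list of (relation, argument vars);
    db: dict mapping each relation name to a set of tuples.
    """
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for t in db[rel]:
            new = dict(binding)
            # Bind unbound variables; reject t if a bound variable disagrees.
            if all(new.setdefault(v, c) == c for v, c in zip(args, t)):
                extend(i + 1, new)

    extend(0, {})
    return results


def optimal_solution(head, body, db, answer):
    """Try every E subset of D; keep one maximizing |Q(E)| with answer not in Q(E)."""
    facts = [(rel, t) for rel in sorted(db) for t in sorted(db[rel])]
    best_e, best_q = None, None
    for keep in product((False, True), repeat=len(facts)):
        e = {rel: set() for rel in db}
        for flag, (rel, t) in zip(keep, facts):
            if flag:
                e[rel].add(t)
        q = eval_cq(head, body, e)
        if answer not in q and (best_q is None or len(q) > len(best_q)):
            best_e, best_q = e, q
    return best_e, best_q


HEAD = ("u", "f")
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {
    "UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                  ("Olivia", "db"), ("Jacob", "ai")},
    "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                  ("db", "b.txt"), ("os", "a.txt")},
}
A = ("Emma", "a.txt")

E, QE = optimal_solution(HEAD, BODY, D, A)
deleted = sorted((rel, t) for rel in D for t in D[rel] - E[rel])
print(len(QE), deleted)
```

On the File Access instance the optimum keeps 5 of the 6 answers, so it is side-effect free, and it is reached by deleting exactly (Emma, ai) and (db, a.txt).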

Complexity Questions
What is the complexity of:
– deciding whether a side-effect-free solution exists?
– finding an optimal solution?
– finding one with approximately minimal side effect?
– finding one with an approximately maximal number of surviving answers? (These two approximation goals are not the same [K, Vondrák, Williams 2011].)
We study data complexity: the schema S and the CQ Q are fixed; the input is a database D over S and an answer a ∈ Q(D) to delete.

Unirelation Algorithm (1Rel): Example
Delete a = (Emma, a.txt) from Access(u, f) :– UserGroup(u, g), GroupFile(g, f), over the UserGroup and GroupFile instances of the File Access example [Buneman et al. 2002].

1Rel considers one relation at a time. Deleting from GroupFile every tuple consistent with a (every tuple with file a.txt) leaves 3 of the 6 answers; deleting from UserGroup every tuple consistent with a (every tuple with user Emma) leaves 4. The latter is better, so 1Rel selects it. Recall that an even better, side-effect-free solution exists, which 1Rel does not find.

1Rel: General Case
Let Q have k atoms, over relations R1, …, Rk, and let a ∈ Q(D) be the undesired answer. For each i = 1, …, k, solution i deletes from Ri each tuple consistent with a and leaves the rest of D unchanged. 1Rel selects the best of the k solutions.
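A minimal sketch of 1Rel, assuming that "tuple t of Ri is consistent with a" means that t agrees with a on every head variable occurring in the atom (our reading, not stated on the slide). It assumes no self joins, so candidate solutions can be keyed by relation name; `one_rel` and `consistent` are hypothetical names.

```python
def eval_cq(head, body, db):
    """Evaluate a CQ (head vars, list of (relation, vars)) by backtracking."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for t in db[rel]:
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(args, t)):
                extend(i + 1, new)

    extend(0, {})
    return results


def consistent(args, t, head, answer):
    """Assumed reading: t agrees with a on each head variable of its atom."""
    return all(answer[head.index(v)] == c
               for v, c in zip(args, t) if v in head)


def one_rel(head, body, db, answer):
    """Candidate i empties R_i of tuples consistent with a; keep the best."""
    sizes, best = {}, None
    for rel, args in body:
        e = {r: set(ts) for r, ts in db.items()}
        e[rel] = {t for t in db[rel] if not consistent(args, t, head, answer)}
        q = eval_cq(head, body, e)
        sizes[rel] = len(q)
        if best is None or len(q) > len(best[2]):
            best = (rel, e, q)
    return best, sizes


HEAD = ("u", "f")
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {
    "UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                  ("Olivia", "db"), ("Jacob", "ai")},
    "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                  ("db", "b.txt"), ("os", "a.txt")},
}
A = ("Emma", "a.txt")

(best_rel, E, QE), sizes = one_rel(HEAD, BODY, D, A)
print(sizes, best_rel)
```

Every derivation of a uses a tuple of Ri that is consistent with a, so each candidate is a valid solution; 1Rel only has to compare k candidates, which is polynomial in the data.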

Head Domination [K, Vondrák, Williams 2011]
Let Q be a CQ over a schema S. G∃[Q] is the graph whose nodes are atoms(Q), with an edge between two atoms that share at least one existential variable; CC(G∃[Q]) is its set of connected components. Q has head domination if
$\forall C \in \mathrm{CC}(G_\exists[Q])\ \exists \alpha \in \mathrm{atoms}(Q):\ \mathrm{headVars}(C) \subseteq \mathrm{vars}(\alpha)$
Examples:
Q(y1, y2, y3) :– R1(x1, y1), R2(x1, y2), R3(y1, y2), R4(x2, y2, y3)
Q(y1, y2) :– R1(x, y1), R2(x, y2)   (the form of Access(u, f))
Q(y1, y2) :– R1(x1, y1), R2(x1, y2), R3(x1, y1, y2)
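A minimal sketch of the head-domination test: build the components of G∃[Q] with union-find over shared existential variables, then look for a dominating atom anywhere in Q (per the definition, α need not belong to C). The query encoding and the function names are our own, hypothetical choices.

```python
def components(body, head_vars):
    """Connected components of G_exists[Q], as lists of atom indices.

    Two atoms are adjacent iff they share a variable not in head_vars.
    """
    parent = list(range(len(body)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # existential variable -> first atom index that contains it
    for i, (_, args) in enumerate(body):
        for v in args:
            if v in head_vars:
                continue
            if v in owner:
                parent[find(i)] = find(owner[v])
            else:
                owner[v] = i
    groups = {}
    for i in range(len(body)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def has_head_domination(head, body):
    """For each component C, some atom of Q must contain headVars(C)."""
    for comp in components(body, head):
        hv = {v for i in comp for v in body[i][1] if v in head}
        if not any(hv <= set(args) for _, args in body):
            return False
    return True


Q1 = (("y1", "y2", "y3"), [("R1", ("x1", "y1")), ("R2", ("x1", "y2")),
                           ("R3", ("y1", "y2")), ("R4", ("x2", "y2", "y3"))])
Q2 = (("y1", "y2"), [("R1", ("x", "y1")), ("R2", ("x", "y2"))])
Q3 = (("y1", "y2"), [("R1", ("x1", "y1")), ("R2", ("x1", "y2")),
                     ("R3", ("x1", "y1", "y2"))])

for name, (h, b) in [("Q1", Q1), ("Q2", Q2), ("Q3", Q3)]:
    print(name, has_head_domination(h, b))
```

In Q1 the component {R1, R2} is dominated by R3, an atom outside that component; in Q2, the Access shape, no atom contains both y1 and y2.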

Previous Dichotomy Theorem [K, Vondrák, Williams 2011], no FDs
Let Q be a CQ over a schema S, without self joins.
– If Q has head domination, then 1Rel returns an optimal solution (in PTime).
– Otherwise, deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an optimal solution, even α_Q-approximately.
Examples:
Q(y1, y2, y3) :– R1(x1, y1), R2(x1, y2), R3(y1, y2), R4(x2, y2, y3): PTime (1Rel)
Q(y1, y2) :– R1(x, y1), R2(x, y2): NP-hard (the form of Access(u, f))
Q(y1, y2) :– R1(x1, y1), R2(x1, y2), R3(x1, y1, y2): PTime

Access Example Revisited
Delete (Emma, a.txt) from Access(u, f) :– UserGroup(u, g), GroupFile(g, f). Without FDs, the problem is NP-hard. With any one of the following FDs, it is in PTime:
– user → group
– user ← group (i.e., group → user)
– group → file
– group ← file (i.e., file → group)
Every nontrivial set of FDs brings the problem to PTime.

Additional Examples
Q(y, y1, y2) :– R1(y1, x1), R(x1, y, x2), R2(y2, x2): PTime
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2): NP-hard

Dichotomy with FDs
Let Q be a CQ over a schema S, without self joins.
[K, Vondrák, Williams 2011], no FDs:
– If Q has head domination, 1Rel returns an optimal solution (in PTime).
– Otherwise, deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an optimal solution, even α_Q-approximately.
This paper, with FDs:
– If Q+ has functional head domination, 1Rel* returns an optimal solution (in PTime). 1Rel* removes a tuple only if it is used for the undesired answer.
– Otherwise, deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an optimal solution, even α_Q-approximately.
Depending on the CQ and the FDs, the problem is either straightforward or hard.
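The slide describes 1Rel* only by the phrase "remove a tuple only if it is used for the undesired answer". The sketch below implements that phrase literally on the FD-free Access example: candidate i deletes from Ri exactly the tuples that occur in some homomorphism mapping the head to a. This is our reading, not the paper's algorithm; in particular, how 1Rel* uses Q+ and the FDs is not modeled. All names are hypothetical.

```python
def eval_cq(head, body, db):
    """Evaluate a CQ (head vars, list of (relation, vars)) by backtracking."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for t in db[rel]:
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(args, t)):
                extend(i + 1, new)

    extend(0, {})
    return results


def used_tuples(head, body, db, answer):
    """used[i] = tuples of atom i's relation in some derivation of `answer`."""
    used = [set() for _ in body]

    def extend(i, binding, chosen):
        if i == len(body):
            for j, t in enumerate(chosen):
                used[j].add(t)
            return
        rel, args = body[i]
        for t in db[rel]:
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(args, t)):
                extend(i + 1, new, chosen + [t])

    # Start with the head variables already bound to the undesired answer.
    extend(0, dict(zip(head, answer)), [])
    return used


def one_rel_star(head, body, db, answer):
    used = used_tuples(head, body, db, answer)
    sizes, best = {}, None
    for i, (rel, _) in enumerate(body):
        e = {r: set(ts) for r, ts in db.items()}
        e[rel] -= used[i]
        q = eval_cq(head, body, e)
        sizes[rel] = len(q)
        if best is None or len(q) > len(best[2]):
            best = (rel, e, q)
    return best, sizes


HEAD = ("u", "f")
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {
    "UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                  ("Olivia", "db"), ("Jacob", "ai")},
    "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                  ("db", "b.txt"), ("os", "a.txt")},
}
A = ("Emma", "a.txt")

(best_rel, E, QE), sizes = one_rel_star(HEAD, BODY, D, A)
print(sizes, best_rel)
```

Compared with the 1Rel reading, the GroupFile candidate now keeps (os, a.txt), which is consistent with a but not used for it, so both candidates leave 4 answers.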

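FDs Among Variables
Definition: let Q be a CQ over schema S, and U, V ⊆ variables(Q). Then U → V holds if
$\forall D \in \mathrm{db}(S)\ \forall h_1, h_2 \in \mathrm{hom}(Q \to D):\ h_1 = h_2 \text{ on } U \Rightarrow h_1 = h_2 \text{ on } V$
Example: Access(u, f) :– UserGroup(u, g), GroupFile(g, f).
– With the FD group → file: g → f, hence {u, g} → f.
– With the FD user → group: u → g, hence u → f.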
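A minimal sketch for computing variable-level FDs. It assumes a self-join-free CQ without constants, for which we take the implied FDs among variables to be those obtained by instantiating each schema FD at its atom and then applying the usual attribute-closure fixpoint. The encoding of schema FDs as (lhs positions, rhs position) pairs is hypothetical.

```python
def variable_fds(body, schema_fds):
    """Instantiate schema FDs at each atom of a self-join-free CQ.

    schema_fds maps a relation name to a list of (lhs_positions, rhs_position)
    pairs, e.g. {"GroupFile": [((0,), 1)]} encodes group -> file.
    """
    fds = []
    for rel, args in body:
        for lhs, rhs in schema_fds.get(rel, []):
            fds.append((frozenset(args[p] for p in lhs), args[rhs]))
    return fds


def closure(variables, fds):
    """Standard attribute-closure fixpoint over variable-level FDs."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]

# FD group -> file on GroupFile(group, file): {u, g} determines f.
fds_gf = variable_fds(BODY, {"GroupFile": [((0,), 1)]})
print(sorted(closure({"u", "g"}, fds_gf)))

# FD user -> group on UserGroup(user, group): u determines g, then f is not
# reached unless group -> file also holds.
fds_ug = variable_fds(BODY, {"UserGroup": [((0,), 1)]})
print(sorted(closure({"u"}, fds_ug)))
```

With only user → group, closure({u}) is {u, g}; the slide's u → f follows when group → file is also present, which the test below checks by combining both FDs.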

The CQ Q+
Q+ is obtained from Q by adding to Q's head every variable x such that $\mathrm{headVars}(Q) \to x$.
Example: Access(u, f) :– UserGroup(u, g), GroupFile(g, f) with the FD group ← file (i.e., file → group). Then {u, f} → g, so
Access+(u, g, f) :– UserGroup(u, g), GroupFile(g, f)
Tractability condition: Q+ has functional head domination.
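Building on the closure computation, this sketch derives the head of Q+ by appending every body variable that the original head determines. It makes the same assumptions as before (self-join-free CQ, no constants, FDs given as hypothetical position pairs); `q_plus_head` is an illustrative name.

```python
def variable_fds(body, schema_fds):
    """Instantiate schema FDs (lhs positions, rhs position) at each atom."""
    fds = []
    for rel, args in body:
        for lhs, rhs in schema_fds.get(rel, []):
            fds.append((frozenset(args[p] for p in lhs), args[rhs]))
    return fds


def closure(variables, fds):
    """Attribute-closure fixpoint over variable-level FDs."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def q_plus_head(head, body, schema_fds):
    """Head of Q+: Q's head plus every variable x with headVars(Q) -> x,
    appended in order of first appearance in the body."""
    determined = closure(head, variable_fds(body, schema_fds))
    extra = []
    for _, args in body:
        for v in args:
            if v in determined and v not in head and v not in extra:
                extra.append(v)
    return tuple(head) + tuple(extra)


HEAD = ("u", "f")
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]

print(q_plus_head(HEAD, BODY, {"GroupFile": [((1,), 0)]}))  # file -> group
print(q_plus_head(HEAD, BODY, {}))                          # no FDs
```

With file → group, g joins the head, so Access+ has no existential variables; without FDs, Q+ is just Q.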

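Functional Head Domination
Recall that, for a CQ Q over a schema S, G∃[Q] has node set atoms(Q) and an edge between any two atoms that share at least one existential variable.
Head domination: $\forall C \in \mathrm{CC}(G_\exists[Q])\ \exists \alpha \in \mathrm{atoms}(Q):\ \mathrm{vars}(\alpha) \supseteq \mathrm{headVars}(C)$
Functional head domination: $\forall C \in \mathrm{CC}(G_\exists[Q])\ \exists \alpha \in \mathrm{atoms}(Q):\ \mathrm{vars}(\alpha) \to \mathrm{headVars}(C)$
Head domination implies functional head domination, since vars(α) ⊇ headVars(C) trivially gives vars(α) → headVars(C).
Example: in Access(u, f) :– UserGroup(u, g), GroupFile(g, f) with group → file, we have {u, g} → {u, f}, so the atom UserGroup(u, g) functionally dominates the single component.
Tractability condition: Q+ has functional head domination.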
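Putting the pieces together, this sketch tests the tractability condition: compute the head of Q+, build the components of G∃[Q+], and for each component look for an atom whose variable closure covers the component's head variables. It reuses the same hypothetical encodings and assumptions as the earlier sketches and checks the Access claims from the slides.

```python
def variable_fds(body, schema_fds):
    """Instantiate schema FDs (lhs positions, rhs position) at each atom."""
    return [(frozenset(args[p] for p in lhs), args[rhs])
            for rel, args in body for lhs, rhs in schema_fds.get(rel, [])]


def closure(variables, fds):
    """Attribute-closure fixpoint over variable-level FDs."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def components(body, head_vars):
    """Components of G_exists: atoms joined by shared non-head variables."""
    parent = list(range(len(body)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}
    for i, (_, args) in enumerate(body):
        for v in args:
            if v not in head_vars:
                if v in owner:
                    parent[find(i)] = find(owner[v])
                else:
                    owner[v] = i
    groups = {}
    for i in range(len(body)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def tractable(head, body, schema_fds):
    """True iff Q+ has functional head domination."""
    fds = variable_fds(body, schema_fds)
    head_plus = closure(head, fds)  # head variables of Q+
    for comp in components(body, head_plus):
        hv = {v for i in comp for v in body[i][1] if v in head_plus}
        if not any(hv <= closure(args, fds) for _, args in body):
            return False
    return True


HEAD = ("u", "f")
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
SINGLE_FDS = {
    "user -> group": {"UserGroup": [((0,), 1)]},
    "group -> user": {"UserGroup": [((1,), 0)]},
    "group -> file": {"GroupFile": [((0,), 1)]},
    "file -> group": {"GroupFile": [((1,), 0)]},
}

print("no FDs", tractable(HEAD, BODY, {}))
for name, fds in SINGLE_FDS.items():
    print(name, tractable(HEAD, BODY, fds))
```

Without FDs the check reduces to plain head domination and fails for Access; each single nontrivial FD makes it succeed, matching the "Access Example Revisited" slide.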

Examples
Tractability condition: Q+ has functional head domination. The same query can be tractable or hard depending on the FDs:
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
– Under one set of FDs, {y, y1, y2} → x2 holds, giving Q+(y, y1, y2, x2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2); the problem is in PTime (1Rel*).
– Under a different set of FDs, the same query is NP-hard.

Example: Key-Preserving Views
Theorem [Cong, Fan, Geerts 2006]: if Q preserves keys*, then deletion propagation is in PTime.
*Each relation has a key, and none of the key attributes are projected out.
For CQs without self joins, this follows directly from our positive side (tractability condition: Q+ has functional head domination):
Q preserves keys ⇒ Q+ has no existential variables ⇒ G∃[Q+] has no edges ⇒ Q+ trivially has functional head domination (every connected component is a single node, dominated by itself) ⇒ 1Rel* returns an optimal solution.
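The first step of the chain can be checked on a small example: if every key variable stays in the head, the head determines every variable through the key FDs, so Q+ has no existential variables. The view V(u, g) and the choice of keys below are hypothetical, chosen only for illustration; keys are encoded as tuples of positions.

```python
def closure(variables, fds):
    """Attribute-closure fixpoint over variable-level FDs."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def key_fds(body, keys):
    """Instantiate 'key -> every attribute' at each atom.

    keys maps a relation name to the tuple of its key positions.
    """
    fds = []
    for rel, args in body:
        key_vars = frozenset(args[p] for p in keys[rel])
        fds.extend((key_vars, v) for v in args)
    return fds


def preserves_keys(head, body, keys):
    """True iff no key attribute of any atom is projected out of the head."""
    return all(args[p] in head for rel, args in body for p in keys[rel])


def existential_vars_of_q_plus(head, body, keys):
    """Variables of Q that the head does NOT functionally determine."""
    determined = closure(head, key_fds(body, keys))
    return {v for _, args in body for v in args} - determined


# Hypothetical keys: user for UserGroup, group for GroupFile.
KEYS = {"UserGroup": (0,), "GroupFile": (0,)}
BODY = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
V_HEAD = ("u", "g")  # hypothetical view V(u, g): f is existential in Q

print(preserves_keys(V_HEAD, BODY, KEYS))
print(existential_vars_of_q_plus(V_HEAD, BODY, KEYS))
print(preserves_keys(("u", "f"), BODY, KEYS))  # Access projects out g
```

Here f is existential in V but determined by the key g, so V+ keeps every variable in the head and G∃[V+] has no edges.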

About the Proof
The positive side is fairly simple, once the tractability condition is found. The negative side is intricate:
– Reduction from the special case of the Access CQ
– Challenge: simulating Access(u, f) by an instance that satisfies all the FDs
– Central concept: graph separation on the variable graph of the CQ
Illustration: Q(y1, y2) :– R1(y1, x), R2(x, y2)  →  Q'(y1, y2) :– R1(y1, x1, x), R2(x, x2, y2), R3(x1, x2)

Conclusions & Ongoing Work
– Studied deletion propagation in the presence of functional dependencies
– Established a dichotomy in complexity: PTime by a straightforward algorithm vs. hardness (of approximation)
– Generalizes previously established special cases: no FDs, and key-preserving views
– Ongoing work: deletion of multiple answers. Preview: a trichotomy into straightforward; hard but approximable (by a constant factor); and hard to approximate
Questions?