A Dichotomy on the Complexity of Consistent Query Answering for Atoms with Simple Keys
Paris Koutris, Dan Suciu
University of Washington



Repairs
An uncertain instance I for a schema with key constraints.
A repair r of I is a subinstance of I that satisfies the key constraints and is maximal.
Example, where x is the key of R(x, y):
R(x, y) = {(a1, b1), (a1, b2), (a2, b2), (a3, b3), (a3, b4), (a4, b4)}
The 4 possible repairs:
– {(a1, b1), (a2, b2), (a3, b4), (a4, b4)}
– {(a1, b1), (a2, b2), (a3, b3), (a4, b4)}
– {(a1, b2), (a2, b2), (a3, b4), (a4, b4)}
– {(a1, b2), (a2, b2), (a3, b3), (a4, b4)}
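The repair definition can be made concrete with a short enumeration. This is a minimal sketch, assuming the key is always the first attribute of a tuple; the function name `repairs` is illustrative and does not come from the slides.

```python
from collections import defaultdict
from itertools import product


def repairs(tuples):
    """Yield every repair of a relation whose key is the first attribute."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)  # tuples that share a key value conflict
    # A maximal consistent subinstance keeps exactly one tuple per key value.
    for choice in product(*groups.values()):
        yield frozenset(choice)


R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
all_repairs = set(repairs(R))
print(len(all_repairs))  # 4
```

The number of repairs is the product of the sizes of the key groups, so it can grow exponentially with the number of conflicting keys. That is why brute-force enumeration is only useful on toy instances.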

Consistent Query Answering
If Q is boolean, we say that I is certain for Q, written I |= Q, if Q(r) is true for every repair r of I.
Example:
R(x, y) = {(a1, b1), (a1, b2), (a2, b2), (a3, b3), (a3, b4), (a4, b4)}
S(y, z) = {(b1, c1), (b2, c1), (b2, c2), (b3, c3)}
Q() = R(x, y), S(y, z)
Here I |= Q.
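For small instances, certainty can be checked directly from the definition by testing the query on every repair. The sketch below assumes the first attribute of each relation is its key and hard-codes the query Q() = R(x, y), S(y, z); the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def repairs(tuples):
    """All repairs of a relation whose key is the first attribute."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return [set(choice) for choice in product(*groups.values())]


def q_holds(r, s):
    """Q() = R(x, y), S(y, z): some R-tuple joins some S-tuple on y."""
    return any(y == y2 for (_, y) in r for (y2, _) in s)


def is_certain(R, S):
    """I |= Q iff Q is true on every combination of repairs of R and S."""
    return all(q_holds(r, s) for r in repairs(R) for s in repairs(S))


R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "c1"), ("b2", "c1"), ("b2", "c2"), ("b3", "c3")]
certain = is_certain(R, S)
print(certain)
```

On this instance every repair of R keeps (a2, b2) and every repair of S keeps some tuple with key b2, so the join is never empty.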

Problem Statement
CERTAINTY(Q): given as input an instance I, does I |= Q, where Q is a boolean CQ?
In general, CERTAINTY(Q) is in coNP:
– Q1 = R(x, y), S(y, z): expressible as a first-order query
– Q2 = R(x, y), S(z, y): coNP-complete
– Q3 = R(x, y), S(y, x): in PTIME but not first-order expressible
Conjecture: For every boolean conjunctive query Q, CERTAINTY(Q) is either in PTIME or coNP-complete.

Progress So Far
[Wijsen, 2010]
– Syntactic characterization of FO-expressible acyclic CQs w/o self-joins
[Kolaitis and Pema, 2012]
– A trichotomy for CQs with 2 atoms and no self-joins
[Wijsen, 2010 & 2013]
– PTIME algorithm for cyclic queries: C_k = R1(x1, x2), …, R_k(x_k, x1)
– Further classification of acyclic CQs w/o self-joins

Our Contribution
A dichotomy for CQs w/o self-joins where atoms have either
– simple keys: R(x, y, z), or
– keys that consist of all attributes: S(x, y, z)
Theorem: For every boolean CQ Q w/o self-joins where for each atom the key consists of either one attribute or all attributes, CERTAINTY(Q) is either in PTIME or coNP-complete.

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

The Query Graph
We equivalently study boolean CQs consisting only of binary relations where one attribute is the key: R(x, y).
Relations can be consistent (R^c) or inconsistent (R^i).
Query graph G[Q]: a directed edge (u, v) for each atom R(u, v).
Example: Q = R^i(x, y), S^i(z, w), T^c(y, w) gives G[Q] with edges R: x -> y, S: z -> w, T: y -> w.
For an edge R, u_R denotes its source node and v_R its end node.

Definitions
– x^{+,R}: the set of nodes reachable from node x (through a directed path) once we remove the edge R
– R ~ S [source-equivalent]: the source nodes u_R, u_S are in the same SCC
– [R]: the equivalence class of R w.r.t. ~
[Figure: example graph on nodes x, y, z, u, v, w with edges R, S, T, U, V]
In the example: x^{+,R} = {x, v, w}; R ~ T and [R] = {R, T}.

Coupled Edges
coupled+(R) = the edges in [R], plus any inconsistent edge S such that the source node u_S is connected to the end node v_R through an (undirected) path that does not intersect u_R^{+,R}.
[Figure: the example graph with x = u_R, y = v_R, u = u_V, and the set u_R^{+,R} marked]
In the example, coupled+(R):
– contains R, T: [R] = {R, T}
– contains V: the path from y (= v_R) to u (= u_V) does not contain U

Splittable Graphs
Two inconsistent edges R, S are coupled if
– S is in coupled+(R) and R is in coupled+(S)
A graph G[Q] is:
– unsplittable if it contains a pair of coupled edges that are not source-equivalent
– splittable otherwise
In the example graph:
– coupled+(R) = {R, T, V}
– coupled+(T) = {R, T, V}
– coupled+(V) = {V}
– coupled+(U) = {U, V, R, T}
Only R, T are coupled, and R ~ T, so the graph is splittable.

The Dichotomy Condition
Dichotomy Theorem:
– If G[Q] is splittable, CERTAINTY(Q) is in PTIME
– If G[Q] is unsplittable, CERTAINTY(Q) is coNP-complete
The example graph is splittable, so CERTAINTY(Q) is in PTIME.

Examples
– R(x, y), S(y, z): PTIME
– R(x, y), S(y, z), T^c(x, z): coNP-complete
– R(x, y), S(y, z), U^c(z, y): PTIME
– R(x, y), S(z, y), U^c(y, z): coNP-complete
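The splittability test can be tried mechanically on the four example queries. The sketch below reads the slide definitions of x^{+,R}, source-equivalence and coupled+ literally; the paper's full definitions may treat more cases, and all function names and the edge encoding `name -> (source, end, is_consistent)` are assumptions made for illustration.

```python
from itertools import combinations


def reach(edges, start, skip=None):
    """Nodes reachable from start by directed paths, ignoring edge `skip`."""
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for name, (u, v, _) in edges.items():
            if name != skip and u == n and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def connected(edges, a, b, blocked):
    """Is there an undirected path from a to b avoiding the blocked nodes?"""
    if a in blocked or b in blocked:
        return False
    seen, stack = {a}, [a]
    while stack:
        n = stack.pop()
        for u, v, _ in edges.values():
            for p, q in ((u, v), (v, u)):
                if p == n and q not in seen and q not in blocked:
                    seen.add(q)
                    stack.append(q)
    return b in seen


def source_equivalent(edges, r, s):
    """R ~ S: the source nodes lie in the same strongly connected component."""
    ur, us = edges[r][0], edges[s][0]
    return us in reach(edges, ur) and ur in reach(edges, us)


def coupled_plus(edges, r):
    u_r, v_r, _ = edges[r]
    blocked = reach(edges, u_r, skip=r)  # the set u_R^{+,R}
    result = {s for s in edges if source_equivalent(edges, r, s)}  # [R]
    for s, (u_s, _, consistent) in edges.items():
        if not consistent and connected(edges, v_r, u_s, blocked):
            result.add(s)
    return result


def is_splittable(edges):
    inconsistent = [e for e, (_, _, c) in edges.items() if not c]
    for r, s in combinations(inconsistent, 2):
        coupled = s in coupled_plus(edges, r) and r in coupled_plus(edges, s)
        if coupled and not source_equivalent(edges, r, s):
            return False
    return True


q_path = {"R": ("x", "y", False), "S": ("y", "z", False)}
q_triangle = dict(q_path, T=("x", "z", True))
q_back_edge = dict(q_path, U=("z", "y", True))
q_shared_end = {"R": ("x", "y", False), "S": ("z", "y", False),
                "U": ("y", "z", True)}
for q in (q_path, q_triangle, q_back_edge, q_shared_end):
    print(sorted(q), "splittable" if is_splittable(q) else "unsplittable")
```

Under this reading the four queries come out as splittable, unsplittable, splittable and unsplittable, matching the PTIME and coNP-complete labels on the slide.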

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

Frugal Repairs (1)
Definition: A repair r of an instance I is frugal for a boolean query Q if for any other repair r' of I, Q_f(r') is not strictly contained in Q_f(r).
Q_f = the full query: all body variables moved to the head.
Example:
R(x, y) = {(a1, b1), (a1, b2), (a2, b3), (a3, b4), (a4, b4)}
S(y, x) = {(b1, a1), (b3, a2), (b4, a3), (b4, a4)}
Repair r1 = {R(a1, b1), R(a2, b3), R(a3, b4), R(a4, b4), S(b1, a1), S(b3, a2), S(b4, a3)}
– Q_f(r1) = {(a1, b1), (a2, b3), (a3, b4)}: not frugal
Repair r2 = {R(a1, b2), R(a2, b3), R(a3, b4), R(a4, b4), S(b1, a1), S(b3, a2), S(b4, a3)}
– Q_f(r2) = {(a2, b3), (a3, b4)}: frugal

Frugal Repairs (2)
I |= Q if and only if every frugal repair satisfies Q.
We lose no generality if we study only frugal repairs!
Example:
R(x, y) = {(a1, b1), (a1, b2), (a2, b3), (a3, b4), (a4, b4)}
S(y, x) = {(b1, a1), (b3, a2), (b4, a3), (b4, a4)}
Only two frugal repairs:
– Q_f(r2) = {(a2, b3), (a3, b4)}
– Q_f(r3) = {(a2, b3), (a4, b4)}
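The two frugal answer sets can be reproduced by brute force. This sketch assumes the query is Q = R(x, y), S(y, x), which is consistent with the answer tuples shown, and that the first attribute of each relation is its key; the helper names are illustrative only.

```python
from collections import defaultdict
from itertools import product


def repairs(tuples):
    """All repairs of a relation whose key is the first attribute."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return [set(choice) for choice in product(*groups.values())]


def full_answers(r, s):
    """Q_f(x, y) = R(x, y), S(y, x): all bindings of the body variables."""
    return frozenset((x, y) for (x, y) in r if (y, x) in s)


def frugal_answer_sets(R, S):
    answers = {full_answers(r, s) for r in repairs(R) for s in repairs(S)}
    # A repair is frugal when no other repair has a strictly smaller answer set.
    return {a for a in answers if not any(b < a for b in answers)}


R = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "a1"), ("b3", "a2"), ("b4", "a3"), ("b4", "a4")]
frugal = frugal_answer_sets(R, S)
for answer_set in sorted(sorted(a) for a in frugal):
    print(answer_set)
```

Every non-frugal repair contains the answers of some frugal one, so if all frugal repairs satisfy Q then all repairs do. This is why restricting attention to frugal repairs loses nothing.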

Or-Sets
Efficiently represent all answer sets of frugal repairs.
We use or-sets: <1, 2, 3> means 1 or 2 or 3
– A = <…>
– We can “compress” A as B = {<…>, <…>}
– [Libkin and Wong, ‘93] “decompression” α operator: α(B) = A
The or-set of answer sets for frugal repairs of I for Q:
– M_Q(I) = <{(a2, b3), (a3, b4)}, {(a2, b3), (a4, b4)}>
Compressed form (set of or-sets):
– A_Q(I) = {<(a2, b3)>, <(a3, b4), (a4, b4)>}
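The decompression step can be illustrated in a few lines: α turns a set of or-sets into an or-set of sets by picking one element from each or-set in every possible way. In this sketch or-sets are modelled as Python tuples, and the compressed input is an assumption chosen so that decompression yields the two frugal answer sets {(a2, b3), (a3, b4)} and {(a2, b3), (a4, b4)} from the Frugal Repairs (2) slide.

```python
from itertools import product


def alpha(set_of_or_sets):
    """Decompress: choose one element from each or-set, in all possible ways."""
    return {frozenset(choice) for choice in product(*set_of_or_sets)}


# Assumed compressed form: two or-sets, each written as a tuple of answers.
compressed = [
    (("a2", "b3"),),
    (("a3", "b4"), ("a4", "b4")),
]
decompressed = alpha(compressed)
for answer_set in sorted(sorted(a) for a in decompressed):
    print(answer_set)
```

The compressed form has size linear in the number of answers, while the decompressed or-set of sets can be exponentially larger, which is the point of working with the compression.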

Representability (1)
An or-set-of-sets S is representable if there exists a set-of-or-sets S_0 (its compression) such that:
– α(S_0) = S
– For any distinct or-sets A, B in S_0, the tuples in A and B use distinct constants in all coordinates
The compression of a representable set with active domain of size n has size polynomial in n.
[Figure: one example of a compression and one example that is not representable]

Representability (2)
I |= Q iff the compression A_Q(I) is not empty.
If we can compute A_Q(I) in polynomial time, deciding whether I |= Q is in PTIME.
Theorem: If G[Q] is a strongly connected graph, M_Q(I) is representable and its compression can be computed in time polynomial in the size of I.

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

Cycles
C_k = R1(x1, x2), R2(x2, x3), …, R_k(x_k, x1)
The purified instance contains a collection of disjoint SCCs.
Algorithm FrugalC:
– Find the SCCs that contain no directed cycle of length > k
– For each such SCC i, create an or-set A_i that contains all cycles of length k
– Output A_{C_k}(I) = {A1, A2, …}
Example for C_3:
R(x, y) = {(a1, b1), (a2, b2), (a2, b3)}
S(y, z) = {(b1, c1), (b2, c2), (b3, c2)}
T(z, x) = {(c1, a1), (c2, a2)}
[Figure: instance graph with two SCCs, one on a1, b1, c1 and one on a2, b2, b3, c2]
A_{C_3}(I) = {<(a1, b1, c1)>, <(a2, b2, c2), (a2, b3, c2)>}
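The three steps of FrugalC can be sketched by brute force on small inputs. The code below is an illustration only: it assumes the instance is already purified, enumerates simple cycles exhaustively (exponential in general, unlike the polynomial-time algorithm the slides refer to), and uses hypothetical names throughout. Nodes are tagged with their position in the cycle so that constants from different attributes never collide.

```python
def reach(adj, start):
    """Nodes reachable from start in the instance graph."""
    seen, stack = {start}, [start]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def simple_cycles(adj, nodes):
    """All simple cycles inside `nodes`, each listed from its smallest node."""
    cycles = []
    for s in sorted(nodes):
        stack = [(s, [s])]
        while stack:
            n, path = stack.pop()
            for m in adj[n]:
                if m == s:
                    cycles.append(path)
                elif m in nodes and m > s and m not in path:
                    stack.append((m, path + [m]))
    return cycles


def frugal_c(relations):
    """relations[i] holds the tuples of the i-th atom of the cycle query C_k."""
    k = len(relations)
    adj = {}
    for i, rel in enumerate(relations):
        for a, b in rel:
            adj.setdefault((i, a), set()).add(((i + 1) % k, b))
            adj.setdefault(((i + 1) % k, b), set())
    reachable = {n: reach(adj, n) for n in adj}
    sccs = {frozenset(m for m in reachable[n] if n in reachable[m])
            for n in adj}
    or_sets = set()
    for scc in sccs:
        cycles = simple_cycles(adj, scc)
        # Keep only SCCs with at least one cycle and none longer than k.
        if cycles and max(len(c) for c in cycles) <= k:
            or_sets.add(frozenset(tuple(const for _, const in c)
                                  for c in cycles))
    return or_sets


R = [("a1", "b1"), ("a2", "b2"), ("a2", "b3")]
S = [("b1", "c1"), ("b2", "c2"), ("b3", "c2")]
T = [("c1", "a1"), ("c2", "a2")]
result = frugal_c([R, S, T])
for or_set in sorted(sorted(o) for o in result):
    print(or_set)
```

On this instance the two SCCs yield one or-set with the single cycle (a1, b1, c1) and one with the two cycles (a2, b2, c2) and (a2, b3, c2).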

General Case: SCCs (1)
Recursively split an SCC G into an SCC G' and a directed path P that intersects G' only at its start and end node.
The set A_{G'}(I) can be recursively computed.
[Figure: SCC G on nodes x, y, z, t with edges R, S, T, U, V; the subgraph G' is marked]
The path P = y -> t -> z
A_{G'}(I) = {A1, A2}

General Case: SCCs (2)
A_{G'}(I) = {A1, A2}, with A1 = <[a1 b1 c1]> and A2 = <[a2 b2 c2], [a2 b3 c2]>
Replacement of G' by new relations:
B(a, b) = {(A1, [a1 b1 c1]), (A2, [a2 b2 c2]), (A2, [a2 b3 c2])}
B1^c(b, y) = {([a1 b1 c1], b1), ([a2 b2 c2], b2), ([a2 b3 c2], b3)}
B2^c(b, z) = {([a1 b1 c1], c1), ([a2 b2 c2], c2), ([a2 b3 c2], c2)}
B0^c(z, a) = {(c1, A1), (c2, A2)}
Any value belongs to a unique or-set.
[Figure: graph on nodes a, b, y, t, z with edges B, B1^c, U, V, B0^c and chord B2^c]
The result is a cycle C = a -> b -> y -> t -> z -> a plus a chord B2 that is a consistent relation.

Rest of the Proof
PTIME algorithm for splittable graphs:
– Find a separator in G[Q] (one always exists if the graph is splittable)
– The separator splits G[Q] into cases with fewer inconsistent edges, which are solved recursively
– Base case: all edges are consistent (check whether Q(I) is true)
coNP-hardness:
– Reduction from the Monotone-3SAT problem

Conclusions
Significant progress towards proving the dichotomy for the complexity of Certain Query Answering for Conjunctive Queries.
Next: settle the dichotomy (or trichotomy) even for queries with self-joins!

Thank you!

General Case: SCCs (3)
Replacement of G': a cycle C = a -> b -> y -> t -> z -> a plus a chord B2 that is a consistent relation.
[Figure: graph on nodes a, b, y, t, z with edges B, B1^c, U, V, B0^c and chord B2^c]
– Compute A_C for the modified input
– Throw away any or-sets that have a tuple that does not agree with B2
B(a, b) = {(A1, [a1 b1 c1]), (A2, [a2 b2 c2]), (A2, [a2 b3 c2])}
B1^c(b, y) = {([a1 b1 c1], b1), ([a2 b2 c2], b2), ([a2 b3 c2], b3)}
B2^c(b, z) = {([a1 b1 c1], c1), ([a2 b2 c2], c2), ([a2 b3 c2], c2)}
B0^c(z, a) = {(c1, A1), (c2, A2)}

Overview
A query graph G[Q] is associated with the query Q.
The condition for PTIME (splittability) is defined on G[Q].
PTIME case:
– We introduce the notions of frugal repairs & representable answers
– Algorithm for strongly connected graphs
– Use the notion of separators to recursively split the query graph (self-reducibility)
coNP-complete case:
– Reduction from the Monotone-3SAT problem