Answering Conjunctive Queries with Inequalities. Paris Koutris (1), Tova Milo (2), Sudeepa Roy (1), Dan Suciu (1). ICDT 2015. (1) University of Washington, (2) Tel Aviv University.

Presentation transcript:

Answering Conjunctive Queries with Inequalities
Paris Koutris (1), Tova Milo (2), Sudeepa Roy (1), Dan Suciu (1)
ICDT 2015
(1) University of Washington, (2) Tel Aviv University

Problem
What is the combined complexity of computing conjunctive queries with inequalities (CQ≠)?
Query (q, I): q = R(x,y), S(y,z), T(z,w) with I = {x ≠ z, y ≠ w}
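As a baseline, the example query can be evaluated with nested loops that check each inequality as soon as both of its variables are bound. The sketch below is illustrative only: the function name `eval_cq_neq` and the sample relations are made up here and do not come from the talk.

```python
def eval_cq_neq(R, S, T):
    """Naive evaluation of q = R(x,y), S(y,z), T(z,w) with I = {x != z, y != w}.

    Each relation is a set of pairs. Returns the set of answers (x, y, z, w).
    """
    answers = set()
    for (x, y) in R:
        for (y2, z) in S:
            # Join condition on y, then the inequality x != z.
            if y2 != y or x == z:
                continue
            for (z2, w) in T:
                # Join condition on z, then the inequality y != w.
                if z2 == z and y != w:
                    answers.add((x, y, z, w))
    return answers


# Hypothetical sample instance.
R = {(1, 2), (3, 2)}
S = {(2, 1), (2, 4)}
T = {(1, 5), (4, 2), (4, 7)}
print(sorted(eval_cq_neq(R, S, T)))
```

The running time of this baseline grows with the product of the relation sizes, which is what the techniques in the talk aim to avoid.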

Example: Path Query
Path query (of length k): $P_k = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_k(x_k, x_{k+1})$
Acyclic query: polynomial combined complexity.
[Figure: path $x_1, x_2, x_3, \ldots, x_k, x_{k+1}$ with edges labeled $R_1, R_2, R_3, \ldots, R_k$]

Example: Path Query
Path query + inequalities: $P_k = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_k(x_k, x_{k+1})$ with $I = \{x_i \neq x_j \text{ for all } i < j\}$
Equivalent to Hamiltonian path: NP-hard.
[Figure: the path $x_1, \ldots, x_{k+1}$ with edges $R_1, \ldots, R_k$, together with its inequality graph]
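To make the connection concrete: if every R_i is the edge relation of a directed graph, the Boolean path query with all pairwise inequalities asks for a simple path on k+1 vertices, and with k+1 equal to the number of vertices that is a Hamiltonian path. The following sketch simply backtracks over candidate paths, so it is exponential in the worst case. The function name and the test graphs are assumptions for illustration.

```python
def has_simple_path(edges, k):
    """Boolean P_k with I = {x_i != x_j for all i < j}, where every R_i = edges.

    Returns True iff there are pairwise distinct x_1, ..., x_{k+1} such that
    (x_i, x_{i+1}) is an edge for every i.
    """
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    def extend(last, used, length):
        # `length` counts the variables bound so far.
        if length == k + 1:
            return True
        for v in successors.get(last, []):
            if v not in used and extend(v, used | {v}, length + 1):
                return True
        return False

    return any(extend(u, {u}, 1) for u in {u for u, _ in edges})


cycle_plus_tail = {(1, 2), (2, 3), (3, 1), (1, 4)}
print(has_simple_path(cycle_plus_tail, 3))
```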

Example: Path Query
Path query + inequalities: $P_k = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_k(x_k, x_{k+1})$ with $I = \{x_i \neq x_{i+2} \text{ for all } i\}$
Polynomial combined complexity.
[Figure: path $x_1, x_2, x_3, \ldots, x_k, x_{k+1}$ with edges labeled $R_1, R_2, R_3, \ldots, R_k$]
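One way to see why this variant stays polynomial (a sketch of my own, not necessarily the argument used in the paper): each inequality only relates variables two steps apart, so it is enough to remember the last two values of the path. The function below keeps the set of reachable pairs (x_i, x_{i+1}); that set never exceeds the size of the current relation. All names are hypothetical.

```python
def path_with_skip_inequalities(relations):
    """Boolean P_k with I = {x_i != x_{i+2}}, for relations = [R_1, ..., R_k], k >= 1.

    The frontier holds every pair (x_i, x_{i+1}) that ends some valid prefix.
    """
    frontier = set(relations[0])
    for R in relations[1:]:
        by_source = {}
        for b, c in R:
            by_source.setdefault(b, []).append(c)
        # Extend (a, b) with an edge (b, c) only if c differs from a.
        frontier = {
            (b, c)
            for (a, b) in frontier
            for c in by_source.get(b, [])
            if c != a
        }
        if not frontier:
            return False
    return bool(frontier)


triangle = {(1, 2), (2, 3), (3, 1)}
print(path_with_skip_inequalities([triangle] * 5))
```

Each step touches every frontier pair and its outgoing edges once, so the total work is polynomial in k and the database size.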

Contribution
How does the combined complexity of computing CQs change when we add inequalities?
- Given any black-box algorithm that computes q, we can compute (q, I) with a g(q, I) · log(|D|) blowup.
- Given any Selection-Projection-Join plan that computes q, we can compute (q, I) with an f(q, I) blowup.

Outline
- Color Coding
- The Main Technique
- Query Plans for Inequalities

Background: fixed-parameter tractability
[Papadimitriou, Yannakakis '97] Let q be a Boolean acyclic CQ≠ and D be a database instance. Then q can be evaluated in time […], where k = number of variables in the inequality graph.

Color Coding: Idea [Alon, Yuster, Zwick '97]
Pick a random coloring h: Dom → {1, …, k} that maps values to k colors.
If a tuple t belongs to the answer of the full query, then the colors satisfy the inequalities with probability ≥ e^{-k}.
Example: q = R(x,y), S(y,z), T(z,w), I = {x ≠ z, y ≠ w}

tuple         a  b  c  d   valid?
coloring #1   1  2  1  4   no (a and c share a color, but x ≠ z is required)
coloring #2   1  2  3  3   yes
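The validity check from the table, and the probability that a uniformly random coloring is valid for one fixed answer tuple, can be worked out by enumeration. This is a small sketch under my own naming (`colors_satisfy`, `success_probability`); it enumerates all colorings of the values that occur in the tuple, which is only feasible for toy inputs.

```python
from fractions import Fraction
from itertools import product
import math


def colors_satisfy(h, assignment, inequalities):
    """True iff, for every inequality u != v, the two values get different colors."""
    return all(h[assignment[u]] != h[assignment[v]] for u, v in inequalities)


def success_probability(assignment, inequalities, k):
    """Exact probability that a uniform random coloring with k colors is valid.

    Only the values used by the assignment matter, so only those are colored.
    """
    values = sorted(set(assignment.values()))
    good = total = 0
    for colors in product(range(1, k + 1), repeat=len(values)):
        h = dict(zip(values, colors))
        total += 1
        good += colors_satisfy(h, assignment, inequalities)
    return Fraction(good, total)


assignment = {"x": "a", "y": "b", "z": "c", "w": "d"}
inequalities = [("x", "z"), ("y", "w")]
p = success_probability(assignment, inequalities, k=4)
print(p, float(p) >= math.exp(-4))
```

For this tuple the two inequalities involve disjoint values, so the probability is (3/4)·(3/4) = 9/16, comfortably above the e^{-k} bound for k = 4.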

Color Coding: Theorem
Theorem. Let q be a CQ that can be computed in time T(|q|, |D|). Then (q, I) can be computed in time […].
- Color coding demands the construction of a k-perfect hash family for every instance.
- There is an additional log(|D|) factor.
- The algorithm is oblivious to the combined structure of the query and the inequalities.

Outline
- Color Coding
- The Main Technique
- Query Plans for Inequalities

Main Technique
$q = R(x_1, \ldots, x_m), S(y_1, \ldots, y_l)$ + inequalities
How do we compute (q, I)?
- Cartesian product, then apply the inequalities: time O(ml|R||S|).
- Idea: compress R to a representation R' of size independent of |R|, then compute the product of R' and S.

Running Example
Inequality graph H (bipartite) between the variables $x_1, x_2$ and $y_1, y_2, y_3$ [figure not reproduced]
$R(x_1, x_2)$ = {(1,1), (1,2), (1,4), (1,8), (2,3), (2,1), (3,2), (5,2), (2,2), (2,4)}

H-Accepted Tuples
A tuple t over the schema of S is H-accepted by R if, for some t' in R, t and t' satisfy the inequalities in H.
With R and H from the running example: t = (2,1,3) is H-accepted; t = (2,1,2) is not.
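The definition translates directly into a membership test. The edge set of H is not recoverable from the transcript itself, so the sketch below assumes H = {x1≠y1, x1≠y2, x2≠y2, x2≠y3}, an edge set I reconstructed because it agrees with the examples given on the slides. Treat it, and the function name, as assumptions.

```python
# Relation R from the running example.
R = [(1, 1), (1, 2), (1, 4), (1, 8), (2, 3), (2, 1), (3, 2), (5, 2), (2, 2), (2, 4)]

# ASSUMED inequality graph: a pair (i, j) means x_{i+1} != y_{j+1}.
H = [(0, 0), (0, 1), (1, 1), (1, 2)]


def is_h_accepted(t, R, H):
    """True iff some tuple r in R satisfies every inequality of H together with t."""
    return any(all(r[i] != t[j] for i, j in H) for r in R)


print(is_h_accepted((2, 1, 3), R, H))  # witnessed by r = (3, 2)
print(is_h_accepted((2, 1, 2), R, H))
```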

H-Equivalence
Relations $R_1$, $R_2$ are H-equivalent if, for any tuple t, t is H-accepted by $R_1$ if and only if t is H-accepted by $R_2$.
Lemma. There exists a sub-instance R' of R such that:
- R' and R are H-equivalent;
- |R'| ≤ f(H), independent of R;
- R' can be computed in time O(f(H) |R|).

H-Forbidden Tuples
A tuple t over Dom ∪ {-} is H-forbidden for R if, for every tuple t' in R, at least one inequality between t and t' is violated (a position holding - is left unspecified).
With R from the running example: t = (1,2,3) is H-forbidden; t = (1,2,-) is also H-forbidden.
There are infinitely many H-forbidden tuples, but only finitely many minimally H-forbidden ones.
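A partial tuple can be checked in the same style, with `None` standing in for the "-" placeholder. As before, the edge set H is an assumption reconstructed from the slide examples rather than something stated in the transcript, and the function name is hypothetical.

```python
# Relation R from the running example.
R = [(1, 1), (1, 2), (1, 4), (1, 8), (2, 3), (2, 1), (3, 2), (5, 2), (2, 2), (2, 4)]

# ASSUMED inequality graph: a pair (i, j) means x_{i+1} != y_{j+1}.
H = [(0, 0), (0, 1), (1, 1), (1, 2)]


def is_h_forbidden(t, R, H):
    """True iff every r in R violates at least one inequality with t.

    Positions of t equal to None are unspecified and cannot cause a violation.
    """
    return all(
        any(t[j] is not None and r[i] == t[j] for i, j in H)
        for r in R
    )


print(is_h_forbidden((1, 2, 3), R, H))
print(is_h_forbidden((1, 2, None), R, H))
print(is_h_forbidden((1, None, None), R, H))
```

Dropping the 2 from (1,2,-) breaks the property, because (2,3) in R no longer clashes with anything.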

The Algorithm
[Figure sequence: a tree of partial tuples over Dom ∪ {-} is built while the tuples of R are processed one at a time; R is the relation of the running example]
- (1,1): the root (-,-,-) is expanded into (1,-,-), (-,1,-), (-,-,1).
- (1,2): (1,-,-) remains H-forbidden, (-,1,-) remains H-forbidden, (-,-,1) is not; new nodes (-,2,1), (-,1,1), (1,-,1).
- (1,4): only the rightmost node needs expansion; new node (1,2,1).
- (1,8): the tuple expands no node.
- (2,3): new nodes (2,1,1), (1,2,-), (1,3,-), (1,-,3), (2,1,-), (-,1,3), (1,2,1), (1,3,1).
- (2,1): new nodes (1,3,1), (1,1,3), (1,2,3).
- (3,2): new nodes (2,1,2), (3,1,3); one node should be expanded, but has no "space".
- (5,2): labels further nodes; no new nodes appear.
- Final tree: the nodes (1,2,-), (1,2,3), (2,1,2), (1,2,1) are highlighted.

Analysis
- Relations with the same tree are H-equivalent.
- Tuples that do not expand a node can be removed.
- The tree has only f(H) nodes.
- $E_H(R)$ = constant-size relation that is H-equivalent to R.
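The tree construction and the pruning rule can be sketched as a single pass over R that tracks only the current leaves. This is my reading of the slides, not the authors' code: the edge set H is reconstructed from the slide examples, `None` plays the role of "-", and all function names are hypothetical. A leaf that clashes with the incoming tuple stays; otherwise it is replaced by one child per inequality whose y-position is still free (possibly none, when there is no "space").

```python
from itertools import product

R = [(1, 1), (1, 2), (1, 4), (1, 8), (2, 3), (2, 1), (3, 2), (5, 2), (2, 2), (2, 4)]
H = [(0, 0), (0, 1), (1, 1), (1, 2)]  # ASSUMED: (i, j) means x_{i+1} != y_{j+1}


def is_h_accepted(t, R, H):
    return any(all(r[i] != t[j] for i, j in H) for r in R)


def compress(R, H, arity):
    """Return (kept, leaves): the tuples of R that expanded a node, and the final leaves.

    Invariant: a full tuple is H-forbidden for the processed prefix of R
    iff it agrees with some leaf on all of that leaf's specified positions.
    """
    leaves = [(None,) * arity]
    kept = []
    for r in R:
        next_leaves, expanded = [], False
        for node in leaves:
            if any(node[j] == r[i] for i, j in H):
                next_leaves.append(node)  # still forbidden, keep the leaf
                continue
            expanded = True
            for i, j in H:
                if node[j] is None:  # free position: force a clash with r
                    next_leaves.append(node[:j] + (r[i],) + node[j + 1:])
        if expanded:
            kept.append(r)
        leaves = list(dict.fromkeys(next_leaves))  # drop duplicate leaves
    return kept, leaves


def equivalent_on(domain, R1, R2, H, arity):
    """Brute-force H-equivalence check over a finite domain."""
    return all(
        is_h_accepted(t, R1, H) == is_h_accepted(t, R2, H)
        for t in product(domain, repeat=arity)
    )


kept, leaves = compress(R, H, arity=3)
print(kept)
print(equivalent_on(range(10), R, kept, H, 3))
```

Tuples skipped by `compress` leave the leaves untouched, so running it on `kept` alone would build the same leaves, which is the reason `kept` accepts exactly the same tuples as R. In this run (1,8) is dropped, matching the slide remark that it expands no node.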

Outline
- Color Coding
- The Main Technique
- Query Plans for Inequalities

The H-Projection
Let $R(A_1, \ldots, A_m)$ be a relation, X a subset of $A = \{A_1, \ldots, A_m\}$, and H a bipartite graph between A \ X and some set B.
The size of the H-projection is at most f(H) times the size of the projection.

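SPJ Plans
q(w) = R(x,y,'a'), S(y,z), T(z,w) with I = {x ≠ z, y ≠ w, x ≠ w}
[Figure: SPJ plan over R(A,B,E), S(B',C), T(C',D) using the operators $\sigma_{E='a'}$, the joins B=B' and C=C', and the projections $\Pi_{C,E}$ and $\Pi_D$]
Inequalities cannot be trivially added to the plan.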

SPJ Plans: Step One
Push projections to the top of the plan.
[Figure: the original plan, and the plan after the step, in which $\Pi_{C,E}$ is gone and only $\Pi_D$ remains above the joins B=B' and C=C' and the selection $\sigma_{E='a'}$]

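SPJ Plans: Step Two
Add the inequalities after the projection: $\sigma_{A \neq C, B \neq D, A \neq D}$.
Introduce an H-projection with the empty graph $H_0$: $\Pi_D^{H_0}$.
[Figure: the plan over R(A,B,E), S(B',C), T(C',D) with $\sigma_{E='a'}$, the joins B=B' and C=C', the new selection, and $\Pi_D^{H_0}$]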

SPJ Plans: Step Three
Push the projections back to their initial place.
[Figure: the plan before the step, with $\sigma_{A \neq C, B \neq D, A \neq D}$ and $\Pi_D^{H_0}$, and the plan after it, with $\sigma_{A \neq C}$, $\Pi_{C,E}^{H_2}$, $\sigma_{B \neq D, A \neq D}$ and $\Pi_D^{H_0}$; the graph $H_2$ is drawn over A, B, D]

Main Result
Theorem. Let q be a CQ that can be evaluated in time T(|q|, |D|) using a Select-Project-Join plan. Then we can compute (q, I) in time […].
The function g depends on the joint structure of the query plan and the inequalities.
[Figure: path $x_1, x_2, x_3, \ldots, x_k, x_{k+1}$ with edges labeled $R_1, R_2, R_3, \ldots, R_k$]

Conclusion
What is the complexity of computing CQ≠?
- color coding for any CQ≠
- SPJ query plans with inequalities
In the paper: analysis of other structural properties.
Open questions:
- Can we apply the technique to arbitrary join algorithms?
- Other classes of queries: UCQs, Datalog.

Thank you!

Color Coding: Algorithm
For any (valid) k-coloring c of the inequality graph, and any hash function h:
- For each relation R, compute the sub-relation $R_{c,h}$ that satisfies the colors of c.
- Apply the black-box join algorithm on the sub-instance with relations $R_{c,h}$.
Output the union over all possible colorings and hash functions.
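The steps above can be sketched as follows. Several parts are assumptions of this illustration: a naive nested-loop join stands in for the black-box algorithm, the hash functions are enumerated exhaustively over a toy domain instead of being drawn from a k-perfect hash family, and each relation name is assumed to occur in only one atom. All function names are hypothetical.

```python
from itertools import product


def naive_join(atoms, relations):
    """Stand-in for the black-box algorithm: all assignments satisfying the atoms."""
    results = [{}]
    for name, variables in atoms:
        extended = []
        for partial in results:
            for tup in relations[name]:
                cand = dict(partial)
                # setdefault binds a new variable or returns the existing binding.
                if all(cand.setdefault(v, val) == val for v, val in zip(variables, tup)):
                    extended.append(cand)
        results = extended
    return results


def all_hash_functions(domain, k):
    """Every function domain -> {0..k-1}; a toy replacement for a k-perfect family."""
    return [dict(zip(domain, colors)) for colors in product(range(k), repeat=len(domain))]


def color_coding_eval(atoms, relations, inequalities, hash_functions, k):
    ineq_vars = sorted({v for pair in inequalities for v in pair})
    answers = set()
    for colors in product(range(k), repeat=len(ineq_vars)):
        c = dict(zip(ineq_vars, colors))
        if any(c[u] == c[v] for u, v in inequalities):
            continue  # not a valid coloring of the inequality graph
        for h in hash_functions:
            # R_{c,h}: keep tuples whose hashed values match the variable colors.
            sub = {
                name: [t for t in relations[name]
                       if all(h[val] == c[var]
                              for var, val in zip(variables, t) if var in c)]
                for name, variables in atoms
            }
            for a in naive_join(atoms, sub):
                answers.add(tuple(sorted(a.items())))
    return answers


atoms = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "w"))]
inequalities = [("x", "z"), ("y", "w")]
relations = {"R": [(1, 2)], "S": [(2, 3)], "T": [(3, 1), (3, 2)]}
hashes = all_hash_functions([1, 2, 3], k=4)
print(color_coding_eval(atoms, relations, inequalities, hashes, k=4))
```

Every assignment returned by the join on a sub-instance satisfies the inequalities automatically: variables joined by an inequality have different colors, so their values hash differently and must be distinct.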